In reply to Parsing Prince Edward Island Legislation: Understanding Styles
Of potential interest for readers/hackers: the Apache POI project can (sometimes) read Microsoft Office documents and exposes them through a Java API. For example, it includes a 'WordToHtmlConverter' class, though that may not be sufficient for the goal here (which is likely extracting the data model?). See http://poi.apache.org/
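For the curious, here is a rough sketch of how the 'WordToHtmlConverter' might be used. It assumes the POI core and scratchpad jars are on the classpath and that the input is an older binary .doc file (this converter does not handle .docx). The class name DocToHtml and the command-line argument are made up for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of POI itself.
public class DocToHtml {

    // Reads a binary Word (.doc) stream and returns its HTML rendering as a string.
    public static String convert(InputStream in) throws Exception {
        HWPFDocument word = new HWPFDocument(in);

        // The converter writes into an empty DOM document that we supply.
        Document html = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        WordToHtmlConverter converter = new WordToHtmlConverter(html);
        converter.processDocument(word);

        // Serialize the resulting DOM tree as HTML text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(converter.getDocument()), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a .doc file, e.g. one of the legislation documents.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(convert(in));
        }
    }
}
```

The HTML that comes out carries the Word paragraph styles as CSS classes, so it may be a starting point for recovering structure, but it is still presentation markup rather than a data model.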
ps. Happy to post an example to GitHub, if that is useful
pps. This site/project is fascinating stuff!