HtmlParser is a collection of Html processing and querying projects. The first element, the low-level parser, is based on and extends HTML::Parser from CPAN. This core is an event-producing document parser, with all other tools and libraries acting as subscribers to its events.

With this library it will be possible to process, query (via XPath and similar), build and modify Html-based documents.
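
The producer/subscriber arrangement described above can be sketched roughly as follows. This is not the project's code: it is a minimal Python illustration that uses the standard library's `html.parser.HTMLParser` as a stand-in event source, and the names `EventParser`, `subscribe` and `run_demo` are hypothetical.

```python
from html.parser import HTMLParser


class EventParser(HTMLParser):
    """Hypothetical event source: forwards each parse event to its subscribers."""

    def __init__(self):
        super().__init__()
        self._subscribers = []

    def subscribe(self, callback):
        # callback(kind, data) is invoked once per published event.
        self._subscribers.append(callback)

    def _publish(self, kind, data):
        for callback in self._subscribers:
            callback(kind, data)

    # HTMLParser calls these hooks as it walks the input.
    def handle_starttag(self, tag, attrs):
        self._publish("start", tag)

    def handle_endtag(self, tag):
        self._publish("end", tag)

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self._publish("text", data)


def run_demo(html):
    """Attach two independent subscribers and return what each collected."""
    tags, text = [], []
    parser = EventParser()
    parser.subscribe(lambda kind, data: tags.append(data) if kind == "start" else None)
    parser.subscribe(lambda kind, data: text.append(data) if kind == "text" else None)
    parser.feed(html)
    parser.close()
    return tags, text
```

Each subscriber sees the same event stream and keeps only what it cares about, which is how a DOM builder, a tidy tool or a query layer could sit on top of a single parser core.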

The first version of this software is based on a nibbling regex parser, which is an almost direct port of an old version of HTML::Parser. The first two goals of the project are FxCop compliance and the application of good coding standards (this is a release of a private pet project).


  • FxCop compliance
  • Initial code review
  • Parser test cases
  • General parser consumers (identified from CPAN, PEAR et al)
  • General user guide (free ebook and a printed version to support the project)
  • Applications
    • DOM visualisation
    • Xml from Html doc builder (with XPath query support)
    • Configurable Html tidy application



FxCop compliance is complete against v1.35. The project has been restructured and a unit test project introduced. I'm still getting used to Teamprise, so there are a few useless revisions in the source tree. The test project currently contains a single unit test, which counts the events the parser produces for a demo file. This is taken as the first release, v0.1. I'll now be working on adding test cases to the project.
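
For readers unfamiliar with the terms, the idea of a nibbling regex parser and of an event-counting test can be sketched as below. This is a deliberately simplified Python illustration, not a port of the project's parser: the patterns ignore comments, doctypes and `>` inside quoted attributes, and the names `TOKEN_PATTERNS`, `nibble` and `count_events` are hypothetical.

```python
import re

# Patterns tried in order against the start of the remaining input.
TOKEN_PATTERNS = [
    ("end", re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")),
    ("start", re.compile(r"<([A-Za-z][A-Za-z0-9]*)[^>]*>")),
    ("text", re.compile(r"([^<]+)")),
]


def nibble(html):
    """Yield (kind, value) events by repeatedly biting off the front of the input."""
    rest = html
    while rest:
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(rest)
            if match:
                yield kind, match.group(1)
                rest = rest[match.end():]  # drop the consumed prefix
                break
        else:
            # A stray '<' that starts no recognised tag: emit it as text.
            yield "text", rest[0]
            rest = rest[1:]


def count_events(html):
    """Count events by kind, in the spirit of an event-counting unit test."""
    counts = {}
    for kind, _ in nibble(html):
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

A test along these lines would feed a known demo document to the parser and assert on the counts, for example that `count_events("<a><b>x</b></a>")` reports two start events, two end events and one text event.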

Last edited Jul 4, 2007 at 10:10 PM by simonproctor, version 8