This application is designed to test the functionality of free-text parsers. A high-level description of the application is as follows:
1. The user specifies a list of URLs, each pointing either to a domain or to a single HTML page, and indicates which is which.
2. If a full domain is specified, all HTML pages on the domain are fetched. If the user specifies a single web page, only that page is fetched. It is intended that this functionality is simply implemented with wget.
3. The HTML pages are parsed with parsers that will be specified (and will also be provided).
4. The XML output of each parser is inserted into a Microsoft SQL Server database.
5. Lucene is used to enable search of the XML output files from the parsers.
6. A small, simple web-based interface enables the user to search the output files and to examine and edit the database records.
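As a rough illustration of steps 1 and 2, the fetch could be a thin Java wrapper that builds a wget command line per URL. This is only a sketch under assumptions: the class name `FetchCommand`, its method names, and the output-directory parameter are hypothetical and not part of the specification, and the exact wget options (for example, whether to filter by file extension) would need to be confirmed against the final requirements.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; none of these names come from the project specification.
final class FetchCommand {

    // Builds the wget argument list for one user-supplied URL.
    // wholeDomain is the flag the user sets to mark a URL as a domain
    // rather than a single page.
    static List<String> build(String url, boolean wholeDomain, String outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wget");
        cmd.add("--directory-prefix=" + outputDir); // where fetched files are saved
        if (wholeDomain) {
            cmd.add("--recursive");       // follow links from the start page
            cmd.add("--level=inf");       // no depth limit (wget defaults to 5)
            cmd.add("--no-parent");       // never ascend above the start URL
            cmd.add("--accept=html,htm"); // assumption: keep only .html/.htm files
        }
        cmd.add(url);
        return cmd;
    }

    // Runs the command and returns wget's exit code (0 means success).
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

A caller would invoke `FetchCommand.run(FetchCommand.build(url, wholeDomain, "pages"))` once per entry in the user's list and then hand the files in the output directory to the parsers.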
The project requires Java experience, as well as some .NET experience (one of the parsers has been developed in .NET). Bids will be accepted for developing the project on either platform, although Java is preferred.
Complete specifications will be provided to the winning coder. The project requirements are expected to be minimal and to go no further than what is described above.
## Deliverables
The final deliverable would ideally be a web container file with the specified functionality implemented.