My web logs, like everybody's, consist mostly of image requests and other extraneous data that is of no interest to me. I am looking for a program that will parse a web log for lines that contain the text '.htm' (which also matches '.html') between 'GET' and 'HTTP'. The program would write each such line, in its entirety, to another file.
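As an illustration only (any language is fine), the filtering rule could be sketched in Python roughly as follows. The names `HTM_REQUEST` and `is_page_hit` are hypothetical, and the sketch assumes that the request path contains no spaces and that '.htm' should be matched case-insensitively.

```python
import re

# Assumption: the request looks like  GET <path> HTTP/x.y  with no spaces
# inside <path>, and '.htm' may appear in any letter case.
HTM_REQUEST = re.compile(r'GET\s+\S*\.htm\S*\s+HTTP', re.IGNORECASE)


def is_page_hit(line: str) -> bool:
    """Return True if '.htm' appears between 'GET' and 'HTTP' on this line."""
    return HTM_REQUEST.search(line) is not None


def filter_lines(lines):
    """Yield only the log lines that request an .htm/.html page."""
    for line in lines:
        if is_page_hit(line):
            yield line
```

Because the pattern only looks at the text between 'GET' and 'HTTP', a '.htm' that appears elsewhere on the line (for example in the referrer field of an image request) does not cause the line to be kept.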
After extracting the '.htm' lines, the program would sort the resulting file so that each viewer's hits are clustered together by IP address. This is not an alphabetical sort. The groups would be ordered by date, which is how the original log file is sorted, so each IP's group is positioned according to that IP's first hit. Thus if IP 5.5.5.5 hit the web site first but its hits were intermixed with those of IP 2.2.2.2, all the hits from 5.5.5.5 would be grouped together first, then all the hits from 2.2.2.2, etc. This would retain the basic date sort while providing IP grouping.
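The grouping described above amounts to ordering groups by first appearance rather than sorting by IP value. A minimal Python sketch follows, under two assumptions that are not stated in the log format itself: the viewer's IP address is the first '|'-delimited field, and the filtered lines fit in memory. The function name `group_by_first_seen` is hypothetical.

```python
def group_by_first_seen(lines):
    """Cluster lines by IP, ordering groups by each IP's first appearance.

    Assumption: the IP address is the first '|'-separated field on the line.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+), i.e. first-seen order
    for line in lines:
        ip = line.split("|", 1)[0].strip()
        groups.setdefault(ip, []).append(line)
    # Within each group the lines stay in their original (date) order.
    return [line for hits in groups.values() for line in hits]
```

For example, hits arriving in the order 5.5.5.5, 2.2.2.2, 5.5.5.5 come out as both 5.5.5.5 lines followed by the 2.2.2.2 line.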
The program should display a percent-done indication while it runs.
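One possible way to drive a percent-done display, sketched in Python, is to compare the bytes read so far against the size of the input file. The names `lines_with_progress` and `report` are hypothetical; `report` stands for whatever callback updates the display. The latin-1 decoding is also an assumption, chosen only because it never fails on arbitrary bytes.

```python
import os


def lines_with_progress(path, report):
    """Yield lines from the file at `path`, calling report(percent) as it advances."""
    total = os.path.getsize(path)
    done = 0
    last = -1
    # Binary mode so the byte count matches the size reported by the OS.
    with open(path, "rb") as f:
        for raw in f:
            done += len(raw)
            pct = done * 100 // total
            if pct != last:  # only report when the whole-number percent changes
                report(pct)
                last = pct
            yield raw.decode("latin-1")
```

Reporting only when the integer percentage changes keeps the display update cost negligible even on a file of roughly 200 MB.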
The goal is to have a condensed log file that lists only requests for viewable .htm or .html files.
The current web logs are approximately 200 MB and have the following form. Of the two lines below, I would want the first kept and the second discarded. See the attached sample.
[login to view URL] | - | [01/Jul/2004:00:05:13 -0400] | "GET /GO/[login to view URL] HTTP/1.1" | 200 | 12444 | [login to view URL] | [login to view URL] | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)"
[login to view URL] | - | [01/Jul/2004:00:05:13 -0400] | "GET /images/[login to view URL] HTTP/1.1" | 200 | 360 | - | [login to view URL] | "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)"
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
The program would run in a Windows XP environment, launched from a desktop icon. I would require a 'browse' button to select the input file and a 'browse' line to enter the output file name.
The program would be fast, compact, and stand-alone. It should not require installation. I do not care what base language the program is written in.