This is an ideal project for a MySQL interfacing / data warehousing expert.
I am looking for two applications written in C/C++ for Console Linux, compilable via GCC. FULLY COMMENTED code please.
_Details of the programs are in the 'Deliverables' field. Please see there._
**With your bid, please include a small proposal showing you understand what I am asking, and how you would approach this**
I am more than happy to quickly answer any questions about these applications that you post. Ask away in the chat room or send a message!
**----- EDIT ADDITION ------** The Deliverables List was Garbled but I fixed it. Sorry.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
My Deliverables List
The two applications are:
Parse a multiline text log file delimited by square brackets and import it into MySQL.
'[field1] [field2] [field3] [field4]' etc... into uniqueidcolumn, column1, column2, column3, column4
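As a rough illustration of the field format above, here is a minimal sketch in C of splitting one record into its bracketed fields. The names `split_bracket_fields` and `FIELD_LEN` are hypothetical, and the sketch assumes a whole record is already in one buffer and that field values never contain `]`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_LEN 256   /* assumed maximum field size, including the NUL */

/* Copies up to max_fields bracketed fields from `line` into `out`.
 * Returns the number of fields found. Over-long fields are truncated. */
size_t split_bracket_fields(const char *line, char out[][FIELD_LEN],
                            size_t max_fields)
{
    size_t count = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);

    while (count < max_fields) {
        const char *open = strchr(p, '[');
        if (open == NULL)
            break;                          /* no more fields */

        const char *close = strchr(open + 1, ']');
        if (close == NULL)
            break;                          /* unterminated field: stop */

        size_t len = (size_t)(close - open - 1);
        if (len >= FIELD_LEN)
            len = FIELD_LEN - 1;            /* truncate to fit */

        memcpy(out[count], open + 1, len);
        out[count][len] = '\0';
        count++;
        p = close + 1;                      /* continue after this field */
    }
    return count;
}
```

Scanning with `strchr` keeps the parse to a single pass over the buffer with no allocation per field, which matters for very large input files.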
The parser MUST use/create separate tables in the database for each day, i.e. name them *TABLENAME*_*datestring*
One of the fields is the date and time represented like:
[Wednesday 09th of November 2005 09:53:05 AM]
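One possible way to turn that date field into a table-name suffix is sketched below. The `YYYYMMDD` format for *datestring* is an assumption (the exact format is open), and `date_suffix` is a hypothetical helper name. It expects the field contents without the surrounding brackets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses e.g. "Wednesday 09th of November 2005 09:53:05 AM" and writes
 * "20051109" into `out` (needs at least 9 bytes).
 * Returns 0 on success, -1 if the field cannot be parsed. */
int date_suffix(const char *field, char *out, size_t out_len)
{
    static const char *const months[12] = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };
    char month[16];
    int day, year, m;

    assert(field != NULL && out != NULL);
    if (out_len < 9)
        return -1;

    /* Skip the weekday, read the day, skip the 2-letter ordinal suffix,
     * then read the month name and the year. %d (not %i) is used so that
     * "09" is read as decimal nine. */
    if (sscanf(field, "%*s %d%*2s of %15s %d", &day, month, &year) != 3)
        return -1;

    for (m = 0; m < 12; m++)
        if (strcmp(month, months[m]) == 0)
            break;

    if (m == 12 || day < 1 || day > 31 || year < 1000 || year > 9999)
        return -1;

    snprintf(out, out_len, "%04d%02d%02d", year, m + 1, day);
    return 0;
}
```

For the sample field above this yields the suffix `20051109`, so the target table would be *TABLENAME*_20051109 under the assumed naming.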
So the parser should first check whether a table for that day exists in the database and, if not, create it and insert there. Ideally, during further processing of that file, the day should then be flagged as 'table created/exists' so that each import requires only a simple local variable check rather than a MySQL query.
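The 'table created/exists' flag could look something like the sketch below: a small in-memory list of days already handled. The names `table_cache` and `table_cache_check`, the 8-character day key and the limit of 64 days per run are all assumptions for illustration. When the function returns 0, the caller would issue a `CREATE TABLE IF NOT EXISTS` statement once for that day.

```c
#include <assert.h>
#include <string.h>

#define MAX_DAYS 64     /* assumed upper bound on distinct days per run */
#define DAY_LEN  8      /* assumed "YYYYMMDD" key */

typedef struct {
    char   days[MAX_DAYS][DAY_LEN + 1];
    size_t count;
} table_cache;

/* Returns 1 if the day's table is already known to exist (no query needed),
 * 0 if the day was just added (caller should create the table now),
 * -1 if the key is malformed or the cache is full. */
int table_cache_check(table_cache *c, const char *day)
{
    assert(c != NULL && day != NULL);

    if (strlen(day) != DAY_LEN)
        return -1;

    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->days[i], day) == 0)
            return 1;                       /* seen before: skip the query */

    if (c->count == MAX_DAYS)
        return -1;

    strcpy(c->days[c->count++], day);       /* length checked above */
    return 0;
}
```

A linear scan is used because a log file normally spans only a handful of days; if the input is in chronological order, comparing against the most recent entry first would make the common case a single `strcmp`.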
As well, there should be a command line argument to have each insert do a duplicate record check in the database before inserting (and skip the record if it's a dupe), but this should be disabled by default for speed purposes.
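A minimal sketch of that switch follows. The flag name `--check-dupes`, the `import_options` struct and `parse_args` are hypothetical; the only point taken from the requirement is that the check is off unless explicitly requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int         check_dupes;  /* 0 = off (default, fastest), 1 = on */
    const char *path;         /* input log file */
} import_options;

/* Fills `opt` from the command line.
 * Returns 0 on success, -1 on an unknown option or a missing/extra file. */
int parse_args(int argc, char **argv, import_options *opt)
{
    assert(argv != NULL && opt != NULL);

    opt->check_dupes = 0;                   /* disabled unless requested */
    opt->path = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--check-dupes") == 0)
            opt->check_dupes = 1;
        else if (argv[i][0] == '-')
            return -1;                      /* unknown option */
        else if (opt->path == NULL)
            opt->path = argv[i];
        else
            return -1;                      /* more than one input file */
    }
    return opt->path != NULL ? 0 : -1;
}
```

Under these assumed names, `logimport access.log` would run without the check and `logimport --check-dupes access.log` would enable it.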
I can provide a sample file after bidding but will not post here. If you have any questions please let me know and I can clarify further.
Parse a text log with raw keywords (one per line) into MYSQL.
The mysql table should contain 2 columns - KEYWORD and COUNT.
I have grabbed a cut and paste from metaspy to use as an example of the inputfile:
neil diamond tour dates
fernandez fox krystal radio sports
disenos de casas
new king james bible
new york daily news online
shoes for women with small feet
dhl package tracking 7702954210
el diario de hoy
soalan-soalan percubaan spm
The MySQL table name must contain the date as above (*TABLENAME*_*datestring*), where the date of the data being entered can be passed to the program via the command line. If the table does not exist, the program should create it.
If a keyword is encountered that already has an entry in the table, increment the COUNT column instead of adding a new entry.
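One way to get this insert-or-increment behaviour in a single round trip is MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`, sketched below as a statement builder. This is an assumption about the approach, not a requirement: it only works if KEYWORD has a PRIMARY KEY or UNIQUE index, and `build_upsert_sql` is a hypothetical helper. The sketch also assumes the keyword has already been filtered down to letters, digits and spaces; anything else should go through `mysql_real_escape_string` first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the statement into `out`. Returns its length, or -1 if the buffer
 * is too small or the keyword contains a character that would need escaping. */
int build_upsert_sql(char *out, size_t out_len,
                     const char *table, const char *date, const char *keyword)
{
    int n;

    assert(out != NULL && table != NULL && date != NULL && keyword != NULL);

    /* Refuse unescaped quotes/backslashes rather than emit broken SQL. */
    if (strchr(keyword, '\'') != NULL || strchr(keyword, '\\') != NULL)
        return -1;

    n = snprintf(out, out_len,
                 "INSERT INTO `%s_%s` (`KEYWORD`, `COUNT`) VALUES ('%s', 1) "
                 "ON DUPLICATE KEY UPDATE `COUNT` = `COUNT` + 1",
                 table, date, keyword);

    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

Compared with a SELECT followed by an INSERT or UPDATE, this halves the number of queries per keyword, which should help with the speed requirement.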
As well, before inserting into the table, this application MUST, in this order :
1 - convert all characters in the string to lower case
2 - replace dashes with spaces
3 - strip off non-alphanumeric characters
4 - check whether the line contains one of the strings in a blacklist of 'bad' strings that are not to be stored in the database. If it matches, skip the line.
Then, attempt to insert.
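The four steps above, in that order, could be sketched roughly as follows. `normalize_keyword` is a hypothetical name. Two points are assumptions rather than part of the spec: spaces are kept in step 3 (otherwise the words would run together), and input is treated as plain ASCII, so any other bytes are dropped.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Normalizes `s` in place, then checks it against the blacklist.
 * Returns 1 if the keyword should be inserted, 0 if it must be skipped. */
int normalize_keyword(char *s, const char *const *blacklist, size_t n_black)
{
    char *dst = s;

    assert(s != NULL);

    for (const char *src = s; *src != '\0'; src++) {
        /* Step 1: lower-case the character. */
        unsigned char ch = (unsigned char)tolower((unsigned char)*src);

        /* Step 2: dashes become spaces. */
        if (ch == '-')
            ch = ' ';

        /* Step 3: keep only letters, digits and (by assumption) spaces.
         * This also removes the trailing newline left by fgets(). */
        if (isalnum(ch) || ch == ' ')
            *dst++ = (char)ch;
    }
    *dst = '\0';

    /* Step 4: substring match against each blacklist entry. */
    for (size_t i = 0; i < n_black; i++)
        if (strstr(s, blacklist[i]) != NULL)
            return 0;

    return 1;
}
```

Steps 1 to 3 are folded into one pass over the string so each line is touched only once. Because the blacklist check runs last, blacklist entries need to be written in their normalized form (lower case, spaces instead of dashes).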
The most important part about both applications is their speed. Both types of input files are going to be HUGE, and the faster the applications run, the better.
I have the majority of this written in PHP already and it is not fast enough.
Console output/logging is a must: errors should be reported, and stats such as the number of records inserted, the number of records filtered and the time taken should be given at the end of both programs.
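As an indication of the kind of end-of-run summary meant here, a small sketch follows. The `import_stats` struct, the function names and the output layout are all hypothetical; the elapsed time is passed in and could be measured with, for example, `time()` and `difftime()` around the import loop.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    unsigned long inserted;   /* rows written to the database */
    unsigned long filtered;   /* lines skipped (blacklist, dupes, ...) */
    unsigned long errors;     /* failed parses or queries */
} import_stats;

/* Throughput in records per second; 0 if no time has elapsed. */
double records_per_second(const import_stats *st, double seconds)
{
    assert(st != NULL);
    return seconds > 0.0 ? (double)st->inserted / seconds : 0.0;
}

/* Prints the summary block at the end of a run. */
void print_stats(FILE *out, const import_stats *st, double seconds)
{
    assert(out != NULL && st != NULL);
    fprintf(out,
            "inserted: %lu\n"
            "filtered: %lu\n"
            "errors:   %lu\n"
            "elapsed:  %.2f s (%.0f records/s)\n",
            st->inserted, st->filtered, st->errors,
            seconds, records_per_second(st, seconds));
}
```

Errors during the run would go to `stderr` as they happen, with this summary printed once at the end.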
The MYSQL server will be on the same box/localhost.
Bidding is for both applications as they are very similar in scope.
Delivery of both FULLY COMMENTED sources that I can compile in GCC is required.
**These will be separate programs entirely**
My architecture is AMD 64bit under Fedora Core 4 64bit. Any improvements based on this are welcome.
Console Linux (Fedora Core 4)