
HTML page scraper

$30-100 USD

In progress
Posted almost 15 years ago

Paid on delivery
HTML files residing on a local drive need to be scraped for data, and that data placed into either a MySQL or SQLite table according to a definition table.

## Deliverables

I need a Delphi 7 application that scrapes data from HTML files residing on a local hard drive and places the data into either a MySQL table or a SQLite table.

The application will have a string constant that points to the location of the HTML files. The application must search that location for *.htm files, including any subfolders that exist. E.g.

const SourcePath : String = 'c:\data\';

If there are any subfolders under c:\data, they must also be searched for *.htm files.

The data to be scraped is defined by a table with 3 fields:

BeginTag:
EndTag:
DBField:

Each definition (record) in this table must be applied to each HTM file found in SourcePath. Here is an example of this definition table:

BeginTag: **Date:**
EndTag: *
DBField: THEDATE

BeginTag: **ID:**
EndTag: *
DBField: IDNUMBER

BeginTag: Rank: #
EndTag: in
DBField: RANKING

Note: The actual definition table will hold more than just 3 definitions. The app must handle every definition record it finds.

Here is how the definition table works. Using the 3 definitions above as an example, we start with the "**Date:**" BeginTag. The app searches the HTML code in the first file for the first instance of "**Date:**". Starting at the character position immediately after this BeginTag, it stores the characters it finds in a temporary string until it reaches the EndTag, which in this case is " * ". The temporary string found between the BeginTag and EndTag is written to a different table (we'll call it the RESULTS DATA table) only AFTER all of the definitions have been iterated through.
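The search and extraction steps described so far can be sketched as follows. This is a minimal illustration in Python rather than the Delphi 7 code the brief asks for, intended only to pin down the logic. The function names, the sample text, the third definition's EndTag (`in`), and the trimming of surrounding whitespace are all assumptions, not part of the posting.

```python
import os
from typing import Dict, List, Optional, Tuple

# Example definition records as (BeginTag, EndTag, DBField).
# The third EndTag ("in") is an assumption made for this sketch.
DEFINITIONS: List[Tuple[str, str, str]] = [
    ("**Date:**", "*", "THEDATE"),
    ("**ID:**", "*", "IDNUMBER"),
    ("Rank: #", "in", "RANKING"),
]

# Hypothetical file content, invented for illustration.
SAMPLE_TEXT = "**Date:** July 8, 2008 * **ID:** A9023 * Rank: #23 in Books"


def extract_between(text: str, begin_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between the first begin_tag and the next end_tag."""
    start = text.find(begin_tag)
    if start == -1:
        return None  # BeginTag not present in this file
    start += len(begin_tag)  # first character after the BeginTag
    end = text.find(end_tag, start)
    if end == -1:
        return None  # EndTag never appears after the BeginTag
    # Trimming whitespace is an assumption; the brief does not require it.
    return text[start:end].strip()


def scrape(text: str, definitions: List[Tuple[str, str, str]]) -> Dict[str, Optional[str]]:
    """Apply every definition to one file's text; keys are the DBField names."""
    return {
        db_field: extract_between(text, begin_tag, end_tag)
        for begin_tag, end_tag, db_field in definitions
    }


def find_htm_files(source_path: str) -> List[str]:
    """Collect *.htm files under source_path, including all subfolders."""
    found = []
    for folder, _subfolders, names in os.walk(source_path):
        for name in names:
            if name.lower().endswith(".htm"):
                found.append(os.path.join(folder, name))
    return sorted(found)


print(scrape(SAMPLE_TEXT, DEFINITIONS))
# {'THEDATE': 'July 8, 2008', 'IDNUMBER': 'A9023', 'RANKING': '23'}
```

Collecting all values for a file in one dictionary before touching the database mirrors the requirement that nothing is written until every definition has been applied.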
The app then moves on to the next definition record (BeginTag: **ID:** EndTag: * ) and likewise scrapes the data into a temporary string, then moves on to the next definition, and so on. Once all the definitions have been iterated through, the scraped data is written to a record in the RESULTS DATA table. In the example above, 3 strings of data would be written to the fields THEDATE, IDNUMBER and RANKING.

The app then moves on to the next HTM file it finds, repeats the scraping based upon the definitions, and saves the scraped data to another record in the RESULTS DATA table. And so on.

Before writing the scraped data to the RESULTS DATA table, the app must check whether a matching record already exists there. We don't want duplicate records! The app only needs to check a single field to determine whether a record already exists. That field is defined by a string constant, INDEXFIELD, e.g.:

const IndexField : String = 'IDNUMBER';

If a record already exists, it will be replaced. If a record does not exist, a new one will be added to the RESULTS DATA table.

Before moving on to the next HTM file, the app will rename the original HTM file by appending the extension ".processed" to its file name.

A progress bar is required, showing the current status of completion based upon how many HTM files still need to be processed.

A TMemo will be placed on the main form for output, logging and debugging purposes. For each processed HTM file, the following will be logged to the TMemo:

1) The full path to the file name
2) The scraped data found within that file

e.g.
c:\data\[login to view URL]
THEDATE: July 8, 2008
IDNUMBER: A9023
RANKING: 23

c:\data\[login to view URL]
THEDATE: June 18, 2000
IDNUMBER: B1234
RANKING: 567

Before the program exits, the contents of the TMemo must be written to disk, using the following file format: [login to view URL] in a \LOGS folder (placed under the application folder).

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others, including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

Windows 32-bit
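Returning to the duplicate check, file renaming and logging requirements in the deliverables, the per-file bookkeeping might look like the sketch below. It is written in Python against SQLite purely to illustrate the logic, not as the Delphi 7 deliverable. The table name `RESULTS`, the function names, and the delete-then-insert approach to "replace" are assumptions for illustration.

```python
import os
import sqlite3
from typing import Dict, List

INDEX_FIELD = "IDNUMBER"  # mirrors the IndexField constant from the brief


def save_record(conn: sqlite3.Connection, table: str,
                record: Dict[str, str], index_field: str = INDEX_FIELD) -> str:
    """Replace the row whose index_field matches, otherwise add a new row.

    Table and field names are assumed to come from trusted constants,
    so they are quoted into the SQL text; values are bound as parameters.
    """
    key = record[index_field]
    cur = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE "{index_field}" = ?', (key,)
    )
    exists = cur.fetchone()[0] > 0
    if exists:
        conn.execute(f'DELETE FROM "{table}" WHERE "{index_field}" = ?', (key,))
    columns = ", ".join(f'"{name}"' for name in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
        tuple(record.values()),
    )
    conn.commit()
    return "replaced" if exists else "added"


def mark_processed(path: str) -> str:
    """Rename the source file by appending '.processed' to its name."""
    new_path = path + ".processed"
    os.rename(path, new_path)
    return new_path


def log_lines(path: str, record: Dict[str, str]) -> List[str]:
    """Build the log entry for one file: full path, then one line per field."""
    return [path] + [f"{field}: {value}" for field, value in record.items()]
```

A typical call sequence per file would be `save_record`, then `log_lines`, then `mark_processed`, so that a file is only renamed once its data has been stored.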
Project ID: 2806574

About the project

13 proposals
Remote project
Active 15 years ago

Awarded to:
See private message.
$25.50 USD in 10 days
5.0 (71 reviews)
6.2

13 freelancers are bidding an average of $64 USD for this job

See private message.
$102 USD in 10 days
5.0 (29 reviews)
5.6

See private message.
$59.50 USD in 10 days
5.0 (63 reviews)
5.1

See private message.
$50.15 USD in 10 days
5.0 (46 reviews)
5.0

See private message.
$51 USD in 10 days
5.0 (8 reviews)
3.6

See private message.
$85 USD in 10 days
5.0 (24 reviews)
3.6

See private message.
$29.75 USD in 10 days
4.9 (19 reviews)
3.5

See private message.
$51 USD in 10 days
5.0 (4 reviews)
2.4

See private message.
$21.25 USD in 10 days
5.0 (2 reviews)
1.3

See private message.
$212.50 USD in 10 days
0.0 (0 reviews)
0.0

See private message.
$80.75 USD in 10 days
0.0 (0 reviews)
0.0

See private message.
$21.25 USD in 10 days
0.0 (0 reviews)
0.0

See private message.
$42.50 USD in 10 days
0.0 (1 review)
0.0

About the client

Fredericksburg, United States
4.9
29
Payment method verified
Member since March 7, 2009
