549190 Meta Tag and WHOIS Scraper

In progress
Posted over 12 years ago

Paid on delivery
We're looking to do some research on a list of domain names. For each domain name we want to know the following: Domain, Company, Industry, Country, State, City, Zip/Postal, title tag, meta description, meta keywords. That's it.

The input will be the list of domain names pasted into a text area on a web page. The output should be a downloadable CSV or TAB delimited file I can load into Excel. There should also be visible output on the web page while the script is running so we can see progress.

Your script will have a list of lists containing the industry information, formatted as specified below or in whatever way is easiest for you. We should be able to add, edit, and delete entries in this list as much as we want (hardcoded within the program is OK). The INDUSTRY "list of lists" could look like this:

Industry,tag1,tag2,tag3,...

Basically, if the domain name, home page title, meta description, or meta keywords ("the data fields") contain either the industry name or any of its tags, then that is the industry the domain should be assigned. Here's an example of what the industry lists might look like, but you can format them any way that works best for you:

$legalwords = array("legal", "law", "lawyer", "attorney", "advoca");
$consultantwords = array("consultant", "consult", "advisor");
$medicalwords = array("medical", "medicine", "doctor", "surgic", "stem cell", "scienc", "research", "laborat");
$contractorwords = array("contractor", "construction");

Since it is possible for a company to be in more than one of the industries, I'd like some logic that determines the most appropriate industry, maybe by counting how many matches there are for each category in the data we are looking at.

The location information should come from the WHOIS database.

We need error checking built into the program so that if a domain no longer exists, or if it redirects elsewhere, the script does not crash but continues to the next URL.
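As an illustration of the match-counting idea above, here is a minimal PHP sketch. It is not part of the original spec: the function name `pickIndustry`, the single `$industries` map keyed by industry name, and the tie-breaking rule (the industry listed first wins) are all assumptions made for the example. It counts case-insensitive substring hits, so overlapping tags such as "law" and "lawyer" both score on the word "lawyer".

```php
<?php
// Hypothetical industry => tags map, adapted from the example arrays in the spec.
$industries = [
    'legal'      => ['legal', 'law', 'lawyer', 'attorney', 'advoca'],
    'consultant' => ['consultant', 'consult', 'advisor'],
    'medical'    => ['medical', 'medicine', 'doctor', 'surgic', 'stem cell',
                     'scienc', 'research', 'laborat'],
    'contractor' => ['contractor', 'construction'],
];

/**
 * Count substring hits of each industry's name and tags across the data
 * fields (domain, title, meta description, meta keywords) and return the
 * industry with the most hits. Returns '' when nothing matches.
 * Assumption: on a tie, the industry listed first in the map wins.
 */
function pickIndustry(array $fields, array $industries): string
{
    $haystack = strtolower(implode(' ', $fields));
    $best = '';
    $bestScore = 0;

    foreach ($industries as $name => $tags) {
        $score = 0;
        // The industry name counts as a tag; array_unique avoids scoring it twice.
        foreach (array_unique(array_merge([$name], $tags)) as $tag) {
            $needle = strtolower((string) $tag);
            if ($needle === '') {
                continue; // substr_count() rejects an empty needle
            }
            $score += substr_count($haystack, $needle);
        }
        if ($score > $bestScore) {
            $best = (string) $name;
            $bestScore = $score;
        }
    }

    return $best;
}
```

Calling `pickIndustry([$domain, $title, $description, $keywords], $industries)` once per domain would give the value for the Industry column. Because the map is a plain array, adding, editing, or deleting industries and tags only means changing that one structure.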
The output should be a TAB delimited file that we can easily load into Excel to do some analysis. That's the whole project.

Once the project is awarded to you, I will send you a list of sample domains and a more complete list of industries and tags.

When you reply, put the word "orange" in the subject line of your PM or bid. If you don't do that, I'll know you didn't read this spec completely, and I won't read your bid or PM.

I need this done in the next 12 hours, but that should be easy, as it's an extremely small and simple project for someone who knows PHP even reasonably well. If you're an expert, this is probably an hour or less.

Thanks.
Mark
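For the TAB delimited output, one possible approach is PHP's built-in `fputcsv()` with a tab as the delimiter. The sketch below is only an illustration: the `COLUMNS` constant, the `writeTsv` function name, and the row layout (an associative array keyed by column name) are assumptions, not something the spec prescribes. Missing values, for example after a failed WHOIS lookup, become empty cells, so one bad domain does not break the file.

```php
<?php
// Column order taken from the field list in the spec; the names are illustrative.
const COLUMNS = ['Domain', 'Company', 'Industry', 'Country', 'State', 'City',
                 'Zip/Postal', 'Title', 'Meta Description', 'Meta Keywords'];

/**
 * Write a header line plus one tab-delimited line per row to an open stream.
 * Each row is an associative array keyed by the names in COLUMNS.
 */
function writeTsv($handle, array $rows): void
{
    fputcsv($handle, COLUMNS, "\t", '"', '');
    foreach ($rows as $row) {
        $line = [];
        foreach (COLUMNS as $col) {
            // Missing keys become empty cells; embedded tabs and newlines
            // are collapsed to single spaces so each record stays on one line.
            $value = (string) ($row[$col] ?? '');
            $line[] = trim((string) preg_replace('/\s+/', ' ', $value));
        }
        fputcsv($handle, $line, "\t", '"', '');
    }
}
```

To offer the result as a download, the script could send `Content-Type` and `Content-Disposition: attachment` headers and then call `writeTsv(fopen('php://output', 'w'), $rows)`. Note that `fputcsv()` wraps fields containing spaces in double quotes, which Excel reads back without trouble.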
Project ID: 2295134

About the project

Remote project
Active 12 years ago

About the client

Winnetka, United States
5.0
6
Member since Apr 19, 2010
