In progress

Open Access Harvest Project (data gathering scripting)

We are looking to build an Open Access archive of freely available scholarly journals. [url removed, login to view] gives a good explanation of the content and the field this project relates to.


A. Create a harvesting engine in the language of your choice (parallel processing has produced the best results) that can:

1.) Crawl specific Internet sites (targets). We will help with the target choices; OAI is one method some sites support.

2.) If not crawling, read from an input file to glean the data; some sites supply one.

3.) Ensure the data is accurate and test URLs for correctness

4.) Dump the defined data to a delimited text file

5.) Transfer the data to us via FTP

B. Work with us to find new resources and refresh existing sources on a monthly basis.

C. Provide new and updated data feeds continually

D. Provide your own platform to run the harvests; a multi-core processor should be sufficient.
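As a rough illustration of steps A.1 to A.4, here is a minimal Python sketch. It assumes a target that answers OAI-PMH `ListRecords` requests with `oai_dc` metadata, which is only one of the methods mentioned above. The function names, the two-column output layout, and the sample record are hypothetical and not part of the project specification. The URL test is syntactic only; a live check would need an actual HTTP request.

```python
import csv
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response used in place of a real target.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example Journal</dc:title>
        <dc:identifier>https://journal.example.org/issue/1</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Broken Entry</dc:title>
        <dc:identifier>not a url</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""


def parse_oai_records(xml_text):
    """Return (title, url) pairs from one ListRecords response."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        url = record.findtext(".//dc:identifier", default="", namespaces=NS)
        rows.append((title.strip(), url.strip()))
    return rows


def is_plausible_url(url):
    """Syntactic check only: http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def harvest(sources, fetch, max_workers=4):
    """Fetch all sources in parallel, then keep rows with plausible URLs.

    `fetch` is supplied by the caller: it could download a page or read
    a supplied input file, and must return the XML as a string.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch, sources))
    rows = []
    for page in pages:
        rows.extend(r for r in parse_oai_records(page) if is_plausible_url(r[1]))
    return rows


def to_delimited(rows, delimiter="\t"):
    """Serialize rows as delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["title", "url"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # The stub fetcher returns the sample instead of touching the network.
    print(to_delimited(harvest(["source-1"], lambda source: SAMPLE)), end="")
```

Threads are used here on the assumption that harvesting is mostly network-bound; a process pool may suit CPU-heavy parsing better. For step A.5, the resulting file could be sent with the standard library's `ftplib` (for example `FTP.storbinary`), using whatever host and credentials are agreed on.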

Skills: C Programming, Java, Perl, Ruby on Rails, Web Scraping

About the employer:
(5 reviews) Windsor, United States

Project ID: #1608716

4 freelancers have bid an average of $600 for this job


I already have this system built, but I need the list of sites and the output format.

$500 USD in 3 days
(2 reviews)

I can start on this project

$400 USD in 20 days
(1 review)

Great to know about this task. I am interested and will do it for you.

$1000 USD in 23 days
(0 reviews)

Hi, I have experience harvesting OAI data. Please see the demo in the PMB. Thank you very much.

$500 USD in 30 days
(0 reviews)