In progress

Open Access Harvest Project (data gathering scripting)

We are looking to build an Open Access archive of freely available scholarly journals. [url removed, login to view] is a good explanation of the content and the field this project relates to.

Requirements:

A. Create a harvesting engine in the programming language of your choice (parallel processing has produced the best results) that can:

1.) Crawl specific Internet sites (targets). We will help with the choice of targets; OAI is one method that some sites support

2.) If not crawling, read from an input file to glean the data; some sites supply one

3.) Ensure the data is accurate and test URLs for correctness

4.) Dump the defined data to a delimited text file

5.) Transfer the data to us via FTP
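
Steps 1, 3, and 5 above could be sketched roughly as follows, assuming a target that exposes OAI-PMH with Dublin Core metadata. The sample response, the `example.org` identifiers, and the function names are hypothetical; only the two namespace URIs come from the OAI-PMH and Dublin Core specifications. The URL check is purely syntactic, so a real harvest would still need a live request to confirm that each link resolves, and the FTP helper is shown but not run.

```python
import ftplib
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical ListRecords response, trimmed to a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:publisher>Example Press</dc:publisher>
          <dc:identifier>not a url</dc:identifier>
          <dc:identifier>http://example.org/article/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def is_well_formed_url(value):
    """Syntactic check only: an http(s) scheme and a host are present."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def first_text(record, tag):
    """Return the text of the first Dublin Core element with this tag."""
    element = record.find(".//" + DC + tag)
    return (element.text or "").strip() if element is not None else ""


def parse_records(xml_text):
    """Extract the publisher and the first well-formed URL of each record."""
    records = []
    for record in ET.fromstring(xml_text).iter(OAI + "record"):
        urls = [
            (element.text or "").strip()
            for element in record.iter(DC + "identifier")
            if is_well_formed_url(element.text or "")
        ]
        records.append({
            "Publisher": first_text(record, "publisher"),
            "HTML URL": urls[0] if urls else "",
        })
    return records


def upload_via_ftp(host, user, password, local_path):
    """Send one file with the standard-library FTP client (not run here)."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "rb") as handle:
            ftp.storbinary("STOR " + os.path.basename(local_path), handle)


if __name__ == "__main__":
    for row in parse_records(SAMPLE_RESPONSE):
        print(row)
```

Sites without OAI support would need a site-specific crawler or the input-file route from step 2 in place of `parse_records`.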

B. Work with us to find new resources and refresh existing sources on a monthly basis.

C. Provide new and updated data feeds continually

D. Provide your own platform to run the harvests; a multi-core processor should be sufficient
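
One way the parallel processing mentioned in requirement A might look on a single multi-core machine is sketched below. It assumes the per-site work is mostly network-bound, so a thread pool is used; CPU-heavy parsing might suit a process pool better. `harvest_target` is a hypothetical placeholder for the real per-site harvest, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def harvest_target(target):
    """Hypothetical placeholder for the real per-site crawl or OAI harvest."""
    return {"target": target, "records": []}


def harvest_all(targets, worker=harvest_target, max_workers=8):
    """Run one worker call per target in parallel and collect the outcomes.

    A failure on one target is recorded and does not stop the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, target): target for target in targets}
        for future in as_completed(futures):
            target = futures[future]
            try:
                results[target] = future.result()
            except Exception as exc:  # keep going; report the failure later
                errors[target] = exc
    return results, errors


if __name__ == "__main__":
    done, failed = harvest_all(["site-a", "site-b"])
    print(sorted(done), sorted(failed))
```

Keeping failures in a separate mapping makes it easier to retry only the broken targets during the monthly refresh.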

Note: we are looking for someone to develop and maintain this on a monthly basis.

Note: This is also known as data extraction. The data provided will be article-level data for each journal. The detail data will need these output fields:

"Publisher", "Journal Title","ISSN", "Alternate ISSN", "Journal Year", "JournalVol","JournalIssue", "HTML URL", "PDF URL", "Start Page", "End Page"

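As a minimal sketch, the output fields listed above could be written to a delimited text file with Python's standard `csv` module. The field names are taken from the list as posted; the tab delimiter, the quoting of every value, the header row, and the sample values are assumptions, since the posting does not specify them.

```python
import csv
import io

# Field names as given in the posting, in the same order.
FIELDS = [
    "Publisher", "Journal Title", "ISSN", "Alternate ISSN", "Journal Year",
    "JournalVol", "JournalIssue", "HTML URL", "PDF URL", "Start Page",
    "End Page",
]


def write_articles(rows, out, delimiter="\t"):
    """Write article rows (dicts keyed by FIELDS) to an open text stream.

    Missing fields are written as empty strings and unknown keys are ignored.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample row; real values would come from the harvest.
    sample = [{
        "Publisher": "Example Press",
        "Journal Title": "Journal of Examples",
        "Journal Year": "2012",
        "HTML URL": "http://example.org/article/1",
    }]
    buffer = io.StringIO()
    write_articles(sample, buffer)
    print(buffer.getvalue())
```

Writing to any open text stream keeps the function usable for both files on disk and in-memory buffers.
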
We will be selecting two developers for this project.

Skills: C Programming, Java, Perl, Ruby on Rails, Web Scraping


About the employer:
(5 reviews) Windsor, United States

Project ID: #1608716

4 freelancers have bid an average of $600 for this job

MagedGazzar

I already have this system built, but I need the sites and the output format.

$500 USD in 3 days
(2 reviews)
4.7
dolphin3456

I can start on this project

$400 USD in 20 days
(1 review)
1.5
tutorapex

Great to know about this task. I am interested and will do it for you.

$1000 USD in 23 days
(0 reviews)
0.0
Efor3C

Hi, I have experience harvesting OAI data. Please see a demo as in the PMB. Thank you very much.

$500 USD in 30 days
(0 reviews)
0.0