Data Search and Parsing - Schools and Universities Globally

This project is for [url removed, login to view]. Please create an account and set up a private test journal to familiarise yourself with how the site works.



You must have the following skills:

* good search and research skills - finding datasets

* any scripting language (your choice), ideally on Linux

* web knowledge for retrieving and parsing data from sites (e.g. using the CURL library)

* good understanding of SQL, preferably PostgreSQL

* knowledge of character sets and encodings (we use UTF-8 for our database and text)

* a tidy, meticulous approach that produces high-quality data and code

We will hold in reserve up to 25% on top of the bid price as a bonus, paid according to the quality of the work. You will also be considered for future work, as we are an ongoing operation with several further requirements. Consider this your trial. Note that a lot of the research has already been done, so please look at the attached files and resources before submitting your quote.

Your bid is more likely to succeed if you send some code samples, so that we can check formatting, style, quality and structure.



I would like to gather a list of all schools, colleges and universities globally. Your deliverables will be the scripts that gather this information (from the respective websites or from downloaded files) and the resulting .sql files containing the INSERT statements for the data.

You will need to:

* take the existing online resources (see below) and search for additional data sources covering educational institutions globally

* ensure that these data sources contain the mandatory fields

* download these data sources, or write scripts to harvest the data from the web pages where the information is present

* deliver the resulting .sql files of INSERT statements that load these rows into the PostgreSQL database table, plus a list of sources, codes and references explaining where you obtained the data
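
As an illustration of the last step, here is a minimal Python sketch of turning one harvested row into an INSERT statement. The table name `institution` and the column names are assumptions made for the example, not the real schema, and the quoting assumes PostgreSQL's default `standard_conforming_strings = on` (single quotes are doubled, backslashes are left alone).

```python
def sql_literal(value):
    """Render a Python value as a SQL literal for a PostgreSQL INSERT."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table, row):
    """Build one INSERT statement from a dict of column name -> value.

    `table` and the keys of `row` are trusted identifiers here (hypothetical
    names chosen by the script author), so only the values are escaped.
    """
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


def write_sql_file(path, table, rows):
    """Write one INSERT per row to a UTF-8 encoded .sql file."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(insert_statement(table, row) + "\n")
```

Opening the output file with an explicit `encoding="utf-8"` keeps names such as "Université de Montréal" intact regardless of the locale of the machine running the script.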



Each educational institution MUST HAVE THE FOLLOWING:

* name

- (UTF-8 encoding please)

* two-letter country code, e.g. US, NZ, AU, IN, TH

- (ISO 3166-1 alpha-2 country code, see the attached base_country SQL file)

* longitude and latitude (geographical coordinates)

- these must be _decimal formatted_ and given to at least three decimal places, i.e. [url removed, login to view]

- by decimal formatted, I mean decimal degrees: where a source gives coordinates in degrees, minutes and seconds (parts out of 60), they must be converted to a decimal fraction of a degree (parts out of 100)

* type (either 'primary', 'secondary' or 'tertiary')

- primary schools take pupils up to the age of approximately 10-14

- secondary schools are more commonly known as high schools

- tertiary covers universities (US: colleges), polytechnics, technical schools, MBA courses, medical schools and professional training schools
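
The coordinate conversion and the mandatory-field rules above can be sketched in Python as follows. This is a minimal illustration only: the record keys (`name`, `country_code`, `latitude`, `longitude`, `type`) are assumed names, and the check does not verify the country code against the base_country file or count decimal places.

```python
VALID_TYPES = {"primary", "secondary", "tertiary"}


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west are negative in decimal notation.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return round(value, 6)


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.get("name"):
        problems.append("missing name")
    code = record.get("country_code") or ""
    if not (len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()):
        problems.append("country code must be two uppercase letters")
    for field, limit in (("latitude", 90), ("longitude", 180)):
        value = record.get(field)
        if not isinstance(value, (int, float)) or not -limit <= value <= limit:
            problems.append(f"{field} missing or out of range")
    if record.get("type") not in VALID_TYPES:
        problems.append("type must be primary, secondary or tertiary")
    return problems
```

For example, `dms_to_decimal(51, 30, 0, "N")` gives `51.5`, and a record whose type is 'college' would be reported by `validate` rather than silently inserted.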

OPTIONAL fields (where available)


In decreasing order of importance:

* name of town/city

* ideally: the state, region, province or territory where the institution is found (e.g. CA for California, or Surat Thani in Thailand) - any format, e.g. text

* type of school (e.g. IT, medical studies, arts, physics, engineering etc)

* address of the educational institution

* website of the institution

* phone, email, fax of the institution



It would be good to have at least secondary and tertiary coverage globally, i.e. for every country. However, we must have full coverage (primary, secondary, tertiary) for the following countries: United States, Canada, South Africa, Australia, New Zealand, European countries (Britain, France, Germany, Sweden, Denmark, Norway, Spain, Italy, Poland etc), Asian countries (Japan, Korea, India, Thailand, ideally China, Hong Kong, Taiwan) and Ukraine/Russia (if available).

Other countries (Bangladesh, Indonesia, African countries etc) will help earn your bonus, if you can manage them.

Note that the trickiest countries are those with non-Latin scripts (Chinese, Korean, Japanese, Russian). You can quote separately for these if you wish, after looking at what is available online, but you may well find a list that covers these places anyway.

Resources and Reading


List of colleges and universities (tertiary institutions) by country:

[url removed, login to view]

[url removed, login to view]

Colleges and Universities by Country (seems to lack coordinates, please check)

[url removed, login to view]

Schools in the World (seems to lack country and type)

[url removed, login to view]

[url removed, login to view]

Site with lists of these things:

[url removed, login to view]

Search Google for other data sets.

A function which converts decimal latitude and longitude into traditional (degrees/60 seconds) format:
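
The function itself does not appear in the text above. As a stand-in, and not as the poster's own code, here is a minimal Python sketch of that direction of conversion. The function name, the `is_latitude` flag and the return shape are assumptions made for the example.

```python
def decimal_to_dms(value, is_latitude):
    """Convert signed decimal degrees to (degrees, minutes, seconds, hemisphere)."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Round the total number of seconds first so that seconds never show as 60.
    total_seconds = round(abs(value) * 3600, 2)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(degrees), int(minutes), round(seconds, 2), hemisphere
```

This is mainly useful for spot-checking a converted coordinate against a source page that lists degrees, minutes and seconds.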

Skills: .NET, Data Processing, PHP, Python, XML

About the employer:
( 2 reviews ) London, United Kingdom

Project number: #24938