I understand that the completed task will produce an Excel file of 50,000+ entries (assuming it covers primary, secondary, independent, special schools, etc.).
Though it appears to be a simple task, the key here is accuracy. I propose to finish the project with 100% proven accuracy. To that end, I propose additional validation of the extracted data: county validation using a mapping technique (a dictionary of county names), email validation using regular expressions, and school-name validation against the site used as input to this task (additionally by name annotation).
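As a rough illustration of the county and email checks, here is a minimal Python sketch. The county entries, the column names (`county`, `email`), and the helper names are my own placeholders rather than anything taken from the project, and the email pattern is deliberately simple: it is meant to catch obvious extraction errors, not to cover every valid address.

```python
import re

# Hypothetical sample of the county dictionary; a real run would load the
# full list of canonical names plus known spelling variants.
COUNTY_LOOKUP = {
    "kent": "Kent",
    "essex": "Essex",
    "durham": "County Durham",
    "county durham": "County Durham",
}

# Simple pattern (an assumption, not a full RFC 5322 check): a local part,
# an "@", and a domain containing at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def normalize_county(raw):
    """Return the canonical county name, or None if it is not in the dictionary."""
    key = " ".join(raw.strip().lower().split())  # trim and collapse whitespace
    return COUNTY_LOOKUP.get(key)


def is_valid_email(raw):
    """Return True if the whole string matches the simple email pattern."""
    return EMAIL_RE.fullmatch(raw.strip()) is not None


def validate_row(row):
    """Return the names of the fields in this row that fail validation."""
    problems = []
    if normalize_county(row.get("county", "")) is None:
        problems.append("county")
    if not is_valid_email(row.get("email", "")):
        problems.append("email")
    return problems
```

Rows for which `validate_row` returns a non-empty list would be flagged for manual review rather than silently corrected, so that every entry in the final sheet has either passed the checks or been looked at by hand.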
Interestingly, I also see a good research opportunity in this data: combined with more features, such as demographic data, it could be used to predict the future demand for teachers and schools in each county.