Scraping (getting a computer to capture information from online sources) is one of the most powerful techniques for data-savvy journalists who want to get to the story first, or find exclusives that no one else has spotted.
This three-day scraping workshop is designed for reporters with no prior knowledge of scraping or programming. It teaches the essential skills for finding original stories by compiling data from a range of online sources. By the end of the workshop, you will be able to use specialist scraping tools (without programming) and begin writing your own, more advanced scrapers. You will also be able to communicate with programmers on relevant projects.
Timetable
Tuesday, 23 January: Scraping basics
10-10:30am Registration
10:30-11:15am Introduction: what scraping is and how news organisations are using it
11:30am-12:15pm Pitching story ideas involving scraping
12:15-1pm Scraping basics: finding structure in HTML and URLs
1-2pm Lunch
2-3:45pm Simple scraping jobs: checking a webpage every day; identifying information using XPath
4-5pm Introduction to scraping tools: OutWit Hub
Wednesday, 24 January: Looking at what's available
9-10am Advanced OutWit Hub: scraping multiple pages
10-10:15am What's possible with programming: APIs, regex and loops
10:30am-12pm Scraping text that fits a pattern: regex
12-1pm Lunch
1-3:45pm Basic scraping with Python and Morph.io
4-5pm Scraping database search results by following links: loops
Thursday, 25 January: Advanced techniques
9-10am Advanced scraping: spreadsheets
10-11am Advanced scraping: PDFs
11am-12pm Scraping lab: problem solving
12-1pm Lunch
1-4pm Scraping lab: problem solving
4-5pm Wrap up, final results
Prices:
Big organisations (10+ people) - £405
Freelancers and small organisations (9 people or fewer) - £305
Students (correspondence/evening course) - £205 (limited availability)
Students (full-time) - £155 (limited availability)
Full-time Goldsmiths students get a 20% discount on all CIJ courses. Please contact marina(at)tcij.org for more details. (Limited availability)