Web scraping: how to extract structured data from a website
Web scraping is a technique that uses software tools to extract data from web pages. It collects unstructured data and converts it into structured data that can later be processed in databases or spreadsheets. The workshop will take a practical approach to web scraping, so that attendees can extract and process useful information in their own projects.
The meeting will establish an ongoing line of work on data and data visualization, coordinated by the Montera34 group and following on from the Maps&Data workshops held at Hirikilabs in 2016 and 2017, one result of which was the Report on the Airbnb effect in Donostia and the Basque Country. The objective of this new line of work, consisting of meetings and workshops, is to feed into DataCommonsLab, a new open group that will work on data on an ongoing basis and meet periodically at Hirikilabs.
February 6, Tuesday
Introduction: Presentation of the activity, its context, and the aims of the workshop.
Introduction to scraping: Explanation of how the web works (HTML, JSON, APIs...) and introduction to ways of storing the information obtained.
Scraper development: Introduction and hands-on use of basic scraping tools (Postman, Python, Beautiful Soup, etc.).
February 7, Wednesday
Scraper development: Continuation of the previous day's session.
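As a small taste of the kind of work covered in the sessions, the sketch below uses Python with Beautiful Soup to turn an HTML table into structured records. The HTML string, element names, and `#listings` id are invented for illustration; in a real project the page would be downloaded first (for example with `requests.get(url).text`).

```python
# Minimal scraping sketch: parse an HTML table into a list of dicts.
# The inline HTML stands in for a downloaded page.
from bs4 import BeautifulSoup

html = """
<table id="listings">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Flat A</td><td>80</td></tr>
  <tr><td>Flat B</td><td>95</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("#listings tr")[1:]:  # skip the header row
    name, price = (td.get_text(strip=True) for td in tr.find_all("td"))
    rows.append({"name": name, "price": int(price)})

print(rows)
```

The resulting list of dictionaries is structured data that can be written straight to a CSV file or a database, which is exactly the unstructured-to-structured conversion the workshop is about.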