Word Document Scraping Workshop

Author

Joseph Bodenheimer

Material Description

This workshop walks through building a complete Python workflow for scraping public data from Kansas Department of Health and Environment (KDHE) Consumer Confidence Report (CCR) .docx files. It covers how Word documents are organized and how to use python-docx, regular expressions, and pandas to extract key fields and combine the results across many reports.
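As a rough illustration of the workflow described above, the following is a minimal sketch and not the workshop's actual code. The field names and regular expressions (`system_name`, `report_year`) and the helper function names are hypothetical placeholders, since the real CCR layout is not shown here. It assumes python-docx and pandas are installed.

```python
import re
from pathlib import Path

# Hypothetical patterns: the real field labels in KDHE CCR files may differ.
FIELD_PATTERNS = {
    "system_name": re.compile(r"System Name:\s*(.+)"),
    "report_year": re.compile(r"\b((?:19|20)\d{2})\b\s+Consumer Confidence Report"),
}


def extract_fields(text):
    """Apply each pattern to the text; fields with no match are set to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def read_docx_text(path):
    """Collect paragraph text and table cell text from one .docx file."""
    from docx import Document  # provided by the python-docx package

    document = Document(str(path))
    parts = [paragraph.text for paragraph in document.paragraphs]
    # Tables are stored separately from body paragraphs, so read them too.
    for table in document.tables:
        for row in table.rows:
            parts.append("\t".join(cell.text for cell in row.cells))
    return "\n".join(parts)


def scrape_reports(folder):
    """Build one DataFrame row per .docx report found in the folder."""
    import pandas as pd

    rows = []
    for path in sorted(Path(folder).glob("*.docx")):
        fields = extract_fields(read_docx_text(path))
        fields["source_file"] = path.name
        rows.append(fields)
    return pd.DataFrame(rows)
```

Calling `scrape_reports("reports")` on a folder of .docx files (the folder name is an assumption) would return a table with one row per report. Keeping the regular-expression step separate from file reading makes the patterns easy to test on plain strings before running them across many documents.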

Module Materials

You can access the material here

This material is licensed under CC BY.
