Harry Potter and the Goodreads Comments
May 2020
My very first Digital Humanities project from waaaay back in undergrad. Mostly here as a benchmark for how far I've come.
The goal of this project was to follow Sinclair and Rockwell’s hermeneutic spiral approach to text analysis. My project partners and I created a corpus of reader comments about the Harry Potter series using the since-discontinued Goodreads API (RIP) and then explored it using a combination of Voyant and various Python scripts.
The final project is hosted as a Google Site. If you want to use our corpus for any reason, it’s available on my GitHub: HP Comments.
Topics
- Digital Humanities
- Topic Modeling
- Classification
Technologies Used
- Voyant
- Gensim
- Scikit-learn
- SciPy
- Matplotlib
- BeautifulSoup
Challenges & Learnings
This may have been the first project where I really used Python. It was definitely the first web scraper I ever built. A lot of the code we used for topic modeling and clustering was adapted from sample code from Fernando Nascimento’s digital text analysis class, and our findings weren’t terribly robust, but this project definitely started me on my journey into DH.