Harry Potter and the Goodreads Comments

May 2020

My very first Digital Humanities project from waaaay back in undergrad. Mostly here as a benchmark for how far I've come.

The goal of this project was to follow Sinclair and Rockwell’s hermeneutic spiral approach to text analysis. My project partners and I created a corpus of reader comments about the Harry Potter series using the since-discontinued Goodreads API (RIP) and then explored it using a combination of Voyant and various Python scripts.

The final project is hosted as a Google Site. If you want to use our corpus for any reason, it’s available on my GitHub: HP Comments.

Topics

  • Digital Humanities
  • Topic Modeling
  • Classification

Technologies Used

  • Voyant
  • GenSim
  • Scikit-learn
  • SciPy
  • Matplotlib
  • BeautifulSoup

Challenges & Learnings

This may have been the first project where I really used Python. It was definitely the first web scraper I ever built. A lot of the code we used for topic modeling and clustering was adapted from sample code from Fernando Nascimento’s digital text analysis class, and our findings weren’t terribly robust, but this project definitely started me on my journey into DH.