One of the most interesting conversations I had at Women’s History in the Digital World occurred with Cameron Blevins and Bridget Baird over lunch, and then Jeri Wieringa joined us. I was desperate to find out what sort of metric they applied in their fascinating analysis of Martha Ballard and Elizabeth Drinker’s diaries.
So based on my corpus of about 1.5m words (1800+ items), they suggested 60 topics, more than double what I’d ever tried.
We also discussed how topic modeling works well for some sources (things with short, relatively coherent content, like newspapers and diaries) but potentially not so well for others. Jeri noted that she and Fred Gibbs had discussed this issue as well and concluded that different algorithms might be necessary for topic modeling different sources. Of course, if we get into individual writing idiosyncrasies, well, we are looking at some custom scripting then, right? (Which is when Bridget told me I could never pass one of her classes. OH SNAP. She is FABULOUS, BTW.)
ANYWHO, using David Newman’s MALLET tool, I had yet ANOTHER RUN at Off Our Backs, and DANG if it didn’t work out pretty nicely. HOLLA Jon Goodwin!