So Heather has continued her gracious teaching via email.  My second lesson has more to do with those expectations.

It’s like this: I don’t need a computer to tell me that The Tempest is about an island, Moby Dick is about whales, or that Les Miserables is about France. These seem like pretty bald-faced, obvious statements. Of course The Tempest is about an island (they’re on an island), Moby Dick is about whales (they’re whale-hunting), and Les Miserables is about France (all the French words/names; also that whole thing about Paris).

Computers are very good at finding patterns that are harder to see with the naked eye, that is, through what we call “linear reading”: picking up a book and reading from the first word on the first page until the last word on the last page. You might not notice that halfway through Moby Dick they stop talking about whales as much to talk about boats for a while (this does happen), because it’s a book about whales.
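To make that concrete, here is a rough sketch of the kind of check a computer can do: split a text into equal runs of words and count a few terms in each run. The function name `term_counts_by_segment` and the toy text are made up for illustration (it is not the real novel, and not any particular tool's method); with a real plain-text file you would read the file into `text` instead.

```python
import re
from collections import Counter

def term_counts_by_segment(text, terms, n_segments=10):
    """Split text into roughly equal runs of words and count each term per run."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so that we get at most n_segments runs.
    size = max(1, -(-len(words) // n_segments))
    rows = []
    for start in range(0, len(words), size):
        counts = Counter(words[start:start + size])
        rows.append({term: counts[term] for term in terms})
    return rows

# Toy stand-in text (an assumption for the example, not Moby Dick itself):
# the first half is mostly "whale", the second half mostly "boat".
toy = "whale whale ship whale " * 5 + "boat boat whale boat " * 5

for i, row in enumerate(term_counts_by_segment(toy, ["whale", "boat"], n_segments=2)):
    print(i, row)
```

A drop in one term and a rise in another between runs is exactly the sort of shift that linear reading tends to smooth over.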

Computers can show you where this is happening, leaving you to ask why. This is all leading up to more expectations. One of the things that was most compelling about your very early topic-modeling results was that they were able to pick out examples of intersection in ways that I hadn’t seen a topic model do before. I remain wary of topic modeling mostly because nobody seems to have any consensus about good practice, how to do it, why you want to, etc. That is another story, though; for now let’s take what you did get out of that and see where to go from here.

racism can __________ be ended, addressed 

racism is __________ bad  

_________ is heterosexist 

the family, marriage 

_________ of class   problems of 

As I’m on break this week and Heather was working the lab, we were able to work together via Google Docs.  Check it out.


  • clusters for racism tie it to other isms; the most common cluster is “racism and”
  • discovered “s culture” -> cluster: 82 instances of women’s culture, contravening the initial impression from the collocates, which were feminist, lesbian, white, American
  • looking at words that characterize collective discourse vs. oppositional discourse
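
For anyone wondering what a “cluster” count amounts to, here is a minimal sketch, assuming a cluster simply means a run of adjacent words that starts or ends with the search word. The function name `clusters` and the sample sentence are invented for illustration; this is not necessarily how the software we used computes them. Note that the tokenizer splits on apostrophes, which is one plausible reason “s culture” shows up as its own item.

```python
import re
from collections import Counter

def clusters(text, node, size=2, node_first=True):
    """Count runs of `size` adjacent words that start (or end) with `node`."""
    # Letters only, so "women's" becomes the two tokens "women" and "s".
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        anchor = gram[0] if node_first else gram[-1]
        if anchor == node:
            counts[" ".join(gram)] += 1
    return counts

# Invented sample text, only to show the shape of the output.
sample = ("Racism and sexism persist. Racism and classism overlap. "
          "Women's culture matters; racism is bad.")

print(clusters(sample, "racism").most_common())
print(clusters(sample, "culture", size=3, node_first=False).most_common())
```

Sorting the counts with `most_common()` is what surfaces a dominant pattern like “racism and” at the top of the list.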