Mendenhall, who is also a professor of African American studies and urban and regional planning, is heading up the interdisciplinary team of researchers and computer scientists working on the big data project, which aims to better understand black women’s experience over time. The challenge in a project like this is that documents that record the history of black women, particularly in the slave era, aren’t necessarily going to be straightforward explanations of women’s feelings, resistance, or movement. Instead, Mendenhall and her team are looking for keywords that point to organizations or connections between groups that can indicate larger movements and experiences.
Using a supercomputer in Pittsburgh, they’ve culled 20,000 documents that discuss black women’s experience from a 100,000 document corpus (collection of written texts). “What we’re now trying to do is retrain a model based on those 20,000 documents, and then do a search on a larger corpus of 800,000, and see if there are more of those documents that have more information about black women,” Mendenhall added…
Using topic modeling and data visualization, they have started to identify clues that could lead to further research. For example, according to Phys.Org, finding documents that include the words “vote” and “women” could indicate black women’s participation in the suffrage movement. They’ve also preliminarily found some new texts that weren’t previously tagged as by or about black women.
Next up Mendenhall is interested in collecting and analyzing data about current movements, such as Black Lives Matter.
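The keyword example in the excerpt (documents containing both “vote” and “women”) can be illustrated with a minimal sketch. This is not the team's actual pipeline, which the article does not describe in detail. The function names, the simple tokenizer, and the toy corpus below are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a document and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def find_cooccurring(documents, keywords):
    """Return ids of documents that contain every keyword (exact word match)."""
    wanted = {k.lower() for k in keywords}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= tokenize(text)]


# Hypothetical toy corpus, invented purely for illustration.
corpus = {
    "doc1": "The women of the club organized to demand the vote.",
    "doc2": "Shipping records for the port, 1850.",
    "doc3": "Women gathered at the church on Sunday.",
}

matches = find_cooccurring(corpus, ["vote", "women"])
print(matches)
```

Exact matching like this misses variants such as “voting” or “woman”. A real project would likely add stemming or the topic modeling the article mentions, which groups related words statistically instead of relying on a fixed keyword list.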
It sounds like the project depends on assembling an algorithm that can recognize patterns across far more text than humans could read in a reasonable time. That requires both good programming and a large collection of texts. Three questions come quickly to mind:
- How would one report findings from this data in typical outlets for sociological or historical research?
- How easy would it be to apply this to other areas of inquiry?
- Is this data mining, or are there hypotheses that can be tested?
Big data opens up many possibilities like this, but it remains to be seen how useful they will be for research.