Using analytics and statistics in sports and society: a ways to go

TrueHoop has been doing a fine job covering the 2013 MIT Sloan Sports Analytics Conference. One post from last Saturday highlighted five quotes “On how far people have delved into the potential of analytics”:

“We are nowhere yet.”
— Morey

“There is a human element in sports that is not quantifiable. These players bleed for you, give you everything they have, and there’s a bond there.”
— Bill Polian, ESPN NFL analyst

“When visualizing data, it’s not about how much can I put in but how much can I take out.”
— Joe Ward, The New York Times sports graphics editor

“If you are not becoming a digital CMO (Chief Marketing Officer), you are becoming extinct.”
— Tim McDermott, Philadelphia Eagles CMO

“Even if God came down and said this model is correct … there is still randomness, and you can be wrong.”
— Phil Birnbaum, By The Numbers editor

In other words, there is a lot of potential in these statistics and models, but we have a long way to go in deploying them correctly. I think this is a good reminder when thinking about big data as well: simply having the numbers and recognizing that they might mean something is a long way from making sense of them and improving lives with that new knowledge.

History – facts = sociology?

Lamenting how history is taught in today’s schools, one writer argues that history without facts is just sociology:

My son’s teacher confirmed that this is broadly true. The teaching of history in British schools is increasingly influenced by US methods of presenting the past thematically rather than chronologically. Thus pupils might study crime and punishment, or kingship, and dip in and out of different centuries. Consequently, dates lose their value. So 1605, which for me means the Gunpowder Plot, for my son simply means that he is five minutes late for games.

I didn’t argue with his teacher, and in any case there is more than one way to skin a cat, as Torquemada (1420-1498) knew. Besides, a slant on history that was good enough for two of our greatest historians, WC Sellar and RJ Yeatman, ought to be good enough for me. The subtitle of their enduringly delightful 1930 book, 1066 And All That, was A Memorable History of England comprising all the parts you can remember, including 103 Good Things, 5 Bad Kings, and 2 Genuine Dates.

Maybe it wasn’t crusty American academics but Sellar and Yeatman, having a laugh, who really popularised the notion that history can be taught largely without dates. “The first date in English history is 55BC,” they wrote, referring to the arrival of Julius Caesar and his legions on the pebbly shores of Kent. “For the other date, see Chapter 11, William the Conqueror.” They didn’t specify the year in which the King of Spain “sent the Great Spanish Armadillo to ravish the shores of England”.

Whatever, I can see the logic of going down the thematic rather than the chronological route. And I made sympathetic noises when my son’s teacher explained that “it’s helpful for those pupils who struggle to take in lots of facts”. But even if we leave out dates, aren’t facts what history is all about? The rest, as they say, is sociology.

This is not an unusual complaint: each new generation always seems to know less history and, perhaps even more troubling, seems not to care.

A few other thoughts:

1. Why can’t you have both dates and thematic approaches? Knowing dates doesn’t necessarily mean that a student knows what to do with the information or understands the broad sweep of historical change.

2. I think the argument in the final sentence is that sociology is devoid of facts. While sociologists may indeed care about certain topics (such as race, class, and gender) that others care less about, we also care about facts. For example, many undergraduate sociology programs require students to take statistics and research methods courses. We don’t want students or sociologists simply interpreting data and information without making sure their findings are reliable (replicable) and valid (measuring what we say we are measuring). There is a lot of debate within the field about how we can best know the world and determine what is causing or influencing what. This is not easy work, since most social situations are quite complex and involve many variables.

3. Why can’t history and sociology coexist? As an overgeneralization, history tends to tell us what happened and sociology helps us think through why these things happened. Why can’t sociology help inform us about history, particularly about how certain historical narratives develop and then become part of our collective memory?

A call to collect better data in order to predict economic crises

Economist Robert Shiller says that we would be better able to predict economic crises if we had better data:

Eventually, these advances led to quantitative macroeconomic models with substantial predictive power — and to a better understanding of the economy’s instabilities. It is likely that the “great moderation,” the relative stability of the economy in the years before the recent crisis, owes something to better public policy informed by that data.

Since then, however, there hasn’t been a major revolution in data collection. Notably, the Flow of Funds Accounts have become less valuable. Over the last few decades, financial institutions have taken on systemic risks, using leverage and derivative instruments that don’t show up in these reports.

Some financial economists have begun to suggest the kinds of measurements of leverage and liquidity that should be collected. We need another measurement revolution like that of G.D.P. or flow-of-funds accounting. For example, Markus Brunnermeier of Princeton, Gary Gorton of Yale and Arvind Krishnamurthy of Northwestern are developing what they call “risk topography.” They explain how modern financial theory can guide the collection of new data to provide revealing views of potentially big economic problems.

Even if more data were collected, it would still require interpretation. If we had had the right data before the current economic crisis, I wonder how confident Shiller would be that we would have made the right predictions (50%? 70%? 95%?). From the public narrative that has developed, it looks like there was enough evidence that the mortgage industry was doing some interesting things, but few people were looking at the data or putting the story together.

And for the future, do we even know what data we might need to be looking at in order to figure out what might go wrong next?

Interpreting data regarding scientists and religion

Looking at some data on what scientists think about religion, one commentator offers this observation about interpreting sociological data:

The point about asking such questions is not because we know the answers but to emphasise that the interpretation of sociological data is a tricky business. From the perspective of science, ants and humans are far more complex than stars and rocks. A discussion of atheism and science in the US context leads us straight to a discussion of the structure of the American educational system, the role of elites, the present polarisation of the political electorate along religious faultlines, and much else besides…

The challenge then is to think hard about the complex data and not be too dogmatic about the interpretations.

The phrase “tricky business” seems to refer to the complex nature of the social world: to understand the relationship between science and religion, one must account for a variety of possible factors. It is one thing to say that the same data allow multiple interpretations, another to say that some people twist data to support their personal interpretations, and yet another to suggest that we can arrive at the correct interpretation if we properly account for complexity.

While this commentary is ultimately about using caution when interpreting statistics on the religious beliefs of scientists, it also offers a brief summary of social science research on the topic. It discusses Elaine Howard Ecklund’s 2010 book Science vs. Religion as well as a few other works.

Interpreting the FBI’s 2009 hate crime report

Hate crime legislation is a topic that seems to rile people up. The Atlantic provides five sources that try to summarize and make sense of the latest annual data released by the FBI:

Agence France-Presse reports that “out of 6,604 hate crimes committed in the United States in 2009, some 4,000 were racially motivated and nearly 1,600 were driven by hatred for a particular religion … Blacks made up around three-quarters of victims of the racially motivated hate crimes and Jews made up the same percentage of victims of anti-religious hate crimes.” The report also notes that “anti-Muslim crimes were a distant second to crimes against Jews, making up just eight percent of the hate crimes driven by religious intolerance.” Finally, the report notes a drop in hate crimes overall: “Some 8,300 people fell victim to hate crimes in 2009, down from 9,700 the previous year.”

This is a reminder that there is a lot of data out there, particularly data generated by government agencies, but we need qualified and skilled people to interpret what it means.
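As a small illustration of what that interpretation involves, here is a rough Python sketch that turns the AFP figures quoted above into shares and a year-over-year change. It uses only the rounded numbers in the quote ("some 4,000", "nearly 1,600", "some 8,300"), so the results are approximate, not official FBI statistics. It also highlights one easy mistake: the 6,604 figure counts incidents, while the reported drop is measured in victims, and one incident can involve more than one victim.

```python
# Approximate 2009 figures as quoted by Agence France-Presse. Words like
# "some" and "nearly" in the quote mean these are rounded, not exact FBI counts.
TOTAL_INCIDENTS = 6604      # hate crime incidents reported for 2009
RACIAL_INCIDENTS = 4000     # "some 4,000" racially motivated
RELIGIOUS_INCIDENTS = 1600  # "nearly 1,600" driven by religious hatred
VICTIMS_2009 = 8300         # "some 8,300 people fell victim" in 2009
VICTIMS_2008 = 9700         # "down from 9,700 the previous year"


def share(part: float, whole: float) -> float:
    """Return the percentage of `whole` that `part` represents, rounded to 0.1."""
    return round(100 * part / whole, 1)


def percent_change(old: float, new: float) -> float:
    """Return the percentage change from `old` to `new`, rounded to 0.1."""
    return round(100 * (new - old) / old, 1)


racial_share = share(RACIAL_INCIDENTS, TOTAL_INCIDENTS)
religious_share = share(RELIGIOUS_INCIDENTS, TOTAL_INCIDENTS)
# The remainder lumps together every other bias motivation in the report.
other_share = share(TOTAL_INCIDENTS - RACIAL_INCIDENTS - RELIGIOUS_INCIDENTS,
                    TOTAL_INCIDENTS)
# The year-over-year drop compares victims, not incidents.
victim_change = percent_change(VICTIMS_2008, VICTIMS_2009)
# Incidents and victims are different units; this ratio shows they differ.
victims_per_incident = round(VICTIMS_2009 / TOTAL_INCIDENTS, 2)

if __name__ == "__main__":
    print(f"Racially motivated:   ~{racial_share}% of incidents")
    print(f"Religiously motivated: ~{religious_share}% of incidents")
    print(f"All other motivations: ~{other_share}% of incidents")
    print(f"Change in victims, 2008 to 2009: {victim_change}%")
    print(f"Victims per incident, 2009: ~{victims_per_incident}")
```

Keeping incidents and victims in separate variables makes the difference in units explicit. Roughly 60% of incidents were racially motivated, and the number of victims fell by about 14%, but those two numbers describe different things and shouldn't be combined casually.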

You can find the hate crime data on the FBI’s Uniform Crime Reports website, along with the FBI’s summary of the incidents, 6,604 in all.