Varying statistics about DNA matches

New Scientist has a story about a criminal case that demonstrates how scientists can disagree about statistics regarding DNA analysis:

The DNA analyst who testified in Smith’s trial said the chances of the DNA coming from someone other than Jackson were 1 in 95,000. But both the prosecution and the analyst’s supervisor said the odds were more like 1 in 47. A later review of the evidence suggested that the chances of the second person’s DNA coming from someone other than Jackson were closer to 1 in 13, while a different statistical method said the chance of seeing this evidence if the DNA came from Jackson is only twice that of the chance of seeing it if it came from someone else…

[W]e show how, even when analysts agree that someone could be a match for a piece of DNA evidence, the statistical weight assigned to that match can vary enormously.

I recall reading something recently suggesting that while the public thinks DNA evidence makes a criminal case very clear, this is not necessarily so. This article suggests the interpretation is a lot more complicated and depends on which lab and which scientists are looking at the DNA samples.
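The last figure in the quote is a likelihood ratio: the probability of seeing the evidence if the DNA came from Jackson, divided by the probability of seeing it if it came from someone else. A minimal sketch with hypothetical probabilities (not numbers from the case) shows why a ratio of 2 is weak support, even when the samples "match":

```python
def likelihood_ratio(p_evidence_if_source, p_evidence_if_not_source):
    """LR > 1 favors the hypothesis that the suspect is the source of
    the DNA; an LR of 2 means the evidence is only twice as likely
    under that hypothesis -- far from conclusive."""
    return p_evidence_if_source / p_evidence_if_not_source

# Hypothetical probabilities for illustration only:
print(likelihood_ratio(0.8, 0.4))  # -> 2.0
```

Compare that to odds like 1 in 95,000, and it becomes clear how much the choice of statistical method can change the apparent weight of the same evidence.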

How statistics may change golf

Statistics are part of many sports and are often used by managers, coaches, and players to make decisions.

Golf is not yet up to par with other sports (see the Moneyball craze in baseball or the efforts of some NBA teams to analyze games), but that moment might be just around the corner, according to Slate:

We’re in a golden age for golf research because the PGA Tour has opened ShotLink’s books to researchers. Two professors at the Wharton school, for example, looked at 1.6 million tour putts and concluded that professional golfers are risk-averse. They examined putts for par and putts for birdie from the same distances and discovered that pros make the birdie putts less often. They suggest that pros leave these birdie putts short out of fear of making bogey, and then calculate that this bogey terror—and the resultant failure to approach birdie putts in the same way as par putts—costs the average tour player about one stroke per tournament.

It’s insights like this that offer the provocative notion that a Moneyball-type revolution awaits golf.

It would seem to be an advantage for players to have this kind of data and analysis in hand, as long as they don’t completely overrule their instincts for the game. Just because statistics are available doesn’t necessarily mean they will be used judiciously.

h/t Instapundit

Emerging businesses looking for social scientists who can data mine and use statistics

The Economic Times of India contains an interview with Prabhakar Raghavan, chief scientist for Yahoo! and head of their labs. Raghavan talks about their studies of social networking and social influence. Then Raghavan was asked about the people undertaking these studies:

What is the percentage of social scientists in Yahoo! Labs who anchor such work?

They constitute around 10% of our people. We are interested in social scientists who can work on data mining. But in most colleges, the sociology department doesn’t teach data mining and the statistics department does not offer sociology. That’s why emerging businesses face a serious dearth of such social scientists.

A reminder that all sorts of businesses are looking for sociology students who are well-versed in statistics (and data mining). Since many students don’t think sociology and statistics naturally go together, it is up to colleges (and sociology statistics instructors) to help them put the two together. Sociology may often be billed as a discipline that will help students understand, analyze, and change the world, but one often needs to be able to work with and analyze data in these efforts.

This interview is also a reminder that social science degree holders are not relegated solely to careers in academia.

A disappearing middle class?

Yahoo Finance has a story that contains 22 statistics meant to “prove” the American middle class is “radically shrinking.” Interestingly, some of these statistics don’t prove much of anything about the middle class even if they do indicate something about America as a whole. The post does show that the wealthy have gotten wealthier, but without more context (comparable statistics from the past, rates from other nations, etc.), there are better statistics to use to make this argument. Some of the statistics are tied to the latest economic downturn, such as a rising number of bankruptcies and longer job searches.

Some examples of weaker statistics:

-“36 percent of Americans say that they don’t contribute anything to retirement savings.” How does this compare to previous rates? Perhaps the Americans of today don’t save like people in the past?

-“More than 40 percent of Americans who actually are employed are now working in service jobs, which are often very low paying.” Service jobs are often low paying – but we don’t know much more from this statistic.

-“For the first time in U.S. history, more than 40 million Americans are on food stamps, and the U.S. Department of Agriculture projects that number will go up to 43 million Americans in 2011.” Sounds bad – but since we now have more people in the country, a percentage would be a much better measure.

-“Average Wall Street bonuses for 2009 were up 17 percent when compared with 2008.” This is a shot at Wall Street more than an explanation about the middle class.
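The food-stamp item above illustrates the count-versus-rate problem. Assuming a U.S. population of roughly 309 million (the approximate 2010 census figure), converting the raw count to a percentage takes one line and makes comparisons across years possible:

```python
# Convert a raw count into a rate so it can be compared across years.
# The population figure (~309 million in 2010) is an approximation.
food_stamp_recipients = 40_000_000
us_population = 309_000_000

rate = food_stamp_recipients / us_population
print(f"{rate:.1%}")  # -> 12.9%
```

Whether 12.9% is historically high or low is exactly the kind of context the original list never supplies.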

Other statistics do back up the author’s point (even though they would all benefit from more explanation):

-“66 percent of the income growth between 2001 and 2007 went to the top 1% of all Americans.”

-“Only the top 5 percent of U.S. households have earned enough additional income to match the rise in housing costs since 1975.”

-“The bottom 50 percent of income earners in the United States now collectively own less than 1 percent of the nation’s wealth.”

On the whole, this reads more like an alarmist piece. There is evidence to back up the author’s argument – but the evidence here is not presented well and needs a lot more context.

Discovering fake randomness

In the midst of a story involving fake data generated for DailyKos by the polling firm Research 2000, TechDirt summarizes how exactly it was discovered that Research 2000 was faking the data. Several statisticians approached Kos after seeing irregularities in the cross-tab (table) data. The summary and the original analysis on DailyKos are fascinating: even truly random data follows certain parameters. One takeaway: faking random data is a lot harder than it looks. Another takeaway (for me, at least): statistics can be both useful and enjoyable.

The three issues as summarized on DailyKos:

Issue one: astronomically low odds that the male and female figures would consistently share parity (both even or both odd).

In one respect, however, the numbers for M and F do not differ: if one is even, so is the other, and likewise for odd. Given that the M and F results usually differ, knowing that say 43% of M were favorable (Fav) to Obama gives essentially no clue as to whether say 59% or say 60% of F would be. Thus knowing whether M Fav is even or odd tells us essentially nothing about whether F Fav would be even or odd.
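Issue one lends itself to a quick back-of-the-envelope check. If the male and female percentages were independent random draws (a simplification of the actual DailyKos analysis), their parities would match only about half the time in any given week, so matching every single week quickly becomes astronomically unlikely. A minimal simulation sketch, not taken from the original analysis:

```python
import random

def parity_match_prob(weeks, trials=100_000):
    """Estimate the chance that two independent random percentages
    (0-100) share parity -- both even or both odd -- in every one
    of `weeks` consecutive polls."""
    hits = 0
    for _ in range(trials):
        if all(random.randint(0, 100) % 2 == random.randint(0, 100) % 2
               for _ in range(weeks)):
            hits += 1
    return hits / trials

# Each week the parities match only about half the time, so matching
# every week for n weeks has probability roughly (1/2)^n -- already
# under 1 in 1,000 after just 10 weeks of polls.
```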

Issue two: the margin between favorability and unfavorability ratings did not display enough variance. If the polls were truly working with random samples, there would be a broader range of values.

What little variation there was in the difference of those cross-tab margins seemed to happen slowly over many weeks, not like the week-to-week random jitter expected for real statistics.

Issue three: the favorability ratings changed too often from week to week. In most tracking polls like this, the most common result is no change. Research 2000’s results had too many changes from week to week – often small ones, a percentage point either way.
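Issue three can also be illustrated by simulation. Assuming independent weekly samples of 1,600 respondents from a population whose true rate sits steady at 50% (hypothetical numbers chosen for illustration, not Research 2000’s actual design), a change of zero is the single most common week-to-week result in the rounded percentage:

```python
import random
from collections import Counter

def weekly_change_distribution(n=1600, p=0.5, weeks=8000):
    """Distribution of week-to-week changes in a rounded poll percentage,
    assuming an independent sample of n respondents each week."""
    pcts = [round(100 * sum(random.random() < p for _ in range(n)) / n)
            for _ in range(weeks)]
    return Counter(later - earlier for earlier, later in zip(pcts, pcts[1:]))

changes = weekly_change_distribution()
# A change of 0 is the most common single outcome; Research 2000's
# numbers moved almost every week, which is what raised suspicion.
```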

For each individual issue, the odds are quite low that it would arise with truly random data. Put all three together in the same data and the odds are lower still.

Beyond the issues regarding the integrity of data collection (it becomes clearer why many people harbor distrust toward polls and statistics), this is a great example of statistical detective work. Too often, many of us see numbers and quickly trust them (or distrust them). In reality, it takes just a little work to dig deeper into figures and discover what exactly is being measured and how. The “what” and “how” matter tremendously, as they can radically alter the interpretation of the data. Citizens and journalists need some of these abilities to decipher all the numbers we encounter on a daily basis.