Hard numbers

As I’ve mentioned before (including yesterday), everybody seems to be beating up on the legal job market these days. The American Bar Association apparently decided that it was time to inject some actual numbers into the discussion:

[Most prior discussion has] been based in great part on the tools of journalism: anecdote, instinct and the oft-competing wisdom of any experts we can find.

With this issue, however, the ABA Journal is offering our readers a new—and we believe different—view of the business and the profession.

We’ve teamed up with a nationally recognized expert on trends in the legal profession, William D. Henderson of the Center on the Global Legal Profession at Indiana University’s Maurer School of Law. We asked Henderson, a pioneer in the empirical study of the legal industry, to identify and map the movements of jobs and money.

There’s a separate page that allows county-by-county data searching.

Here’s the thing: based on my look at the publicly available U.S. Bureau of Labor Statistics data underlying the ABA’s “report,” I’m not quite sure what the ABA has added to the discussion here. Sure, they’ve generated some colorful graphs and county-by-county maps. But as far as I can tell, all (and I do mean all) of this data has been around since at least May 14, 2010. And it’s not like the ABA has done much analysis here; they’ve basically just sorted salaries by metro region and announced a few “surprises.”

Even more problematically, I’m not sure there are many clear takeaways due to the inherent shortcomings of this data.  Per the bottom of the article’s main page:

The [U.S. Bureau of Labor Statistics] data are a representative sample of employed lawyers. The sample includes lawyers employed in law firms, state and local government, federal government, in-house lawyers in businesses, and nonprofits. Lawyers, as defined by the BLS classification (SOC), “represent clients in criminal and civil litigation and other legal proceedings, draw up legal documents, and manage or advise clients on legal transactions. May specialize in a single area or may practice broadly in many areas of law.” Equity partners and solo practitioners are not included in the survey. [emphasis added]

In other words:

  1. This data leaves out solo practitioners — fully 35% of all lawyers according to Harvard Law School’s research.  Analysis: these salary numbers skew high (see the rough sketch after this list).  (I suppose the lack of focus on solos isn’t too surprising, since only about 7% of all solos belong to the ABA anyway.)
  2. This data only applies to employed lawyers.  Analysis:  This article tells us nothing about the marginal earning prospects of unemployed lawyers, including recently graduated J.D.’s who are “temporarily” employed in other industries (e.g., as servers in restaurants).
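To see why excluding solo practitioners pushes the published averages up, here is a minimal back-of-the-envelope sketch. The dollar figures are invented purely for illustration; only the 35% share comes from the Harvard figure cited above, and none of these numbers are from the BLS survey itself.

```python
# Back-of-the-envelope illustration of how excluding a lower-earning group
# (here, solo practitioners) raises the average salary that gets reported.
# All dollar figures below are made up for illustration only.

share_solo = 0.35               # roughly 35% of lawyers are solos (Harvard figure cited above)
avg_salary_employed = 130_000   # hypothetical mean for employed (surveyed) lawyers
avg_salary_solo = 70_000        # hypothetical mean for solo practitioners

# Average over all lawyers, weighting each group by its share of the profession.
overall_avg = (1 - share_solo) * avg_salary_employed + share_solo * avg_salary_solo

print(f"Survey-only average: ${avg_salary_employed:,.0f}")
print(f"All-lawyer average:  ${overall_avg:,.0f}")
print(f"Upward skew from dropping solos: ${avg_salary_employed - overall_avg:,.0f}")
```

If solos really do earn less on average, any survey that omits them will report a higher figure than the profession-wide mean, which is the sense in which the ABA’s numbers skew high.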

I get that this is “the first installment of a periodic series.”  But come on, ABA.  It’s more than a little disingenuous to claim that “the ABA Journal is offering our readers a new—and we believe different—view of the business and the profession” by “identify[ing] and map[ping] the movements of jobs and money” when you’re simply re-publishing eight-month-old government data with an arguably misleading slant and without substantive analysis.

Chart of total carbon emissions and emissions per capita

Miller-McCune has put together two charts showing total carbon emissions by country and also emissions per capita by country. See the two charts here.

This is colorful and vibrant. And it is nice to have the charts side by side, as one can easily make comparisons. For example, the US is #2 in total emissions but #9 in per capita emissions. As The Infrastructurist points out, the chart gives some insight into how many countries might need to deal with per capita emissions rather than point fingers at the countries with the largest total emissions.

But there is a lot of information compressed into this chart – it is hard to see many of the smaller countries with small circles. Additionally, why are the countries in the order they are? It appears that regions are grouped together, but the order is not the same for both charts and it certainly isn’t rank-ordered (China and the US are on opposite ends of the chart for total emissions). Color and vibrancy seem to have been more important to the chart-makers than a logical ordering of the countries.
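The reason the two orderings can differ so much is simple: per capita emissions are just total emissions divided by population, and sorting by each metric produces very different lists. Here is a quick sketch with rough, illustrative figures (not the Miller-McCune data):

```python
# Illustrative only: rough total CO2 emissions (million tonnes) and populations
# (millions), to show how rank by total differs from rank by per capita.
countries = {
    "China":         (7_000, 1_330),
    "United States": (5_400,   310),
    "India":         (1_700, 1_170),
    "Russia":        (1_600,   142),
    "Australia":     (  400,    22),
    "Qatar":         (   60,   1.7),
}

per_capita = {c: total / pop for c, (total, pop) in countries.items()}

by_total = sorted(countries, key=lambda c: countries[c][0], reverse=True)
by_per_capita = sorted(per_capita, key=per_capita.get, reverse=True)

print("Rank by total emissions:     ", by_total)
print("Rank by per capita emissions:", by_per_capita)
```

Small, wealthy countries can sit near the bottom of the total-emissions list while topping the per capita list, which is exactly the comparison the side-by-side charts are trying to make visible.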

h/t The Infrastructurist

Graphic comparing US to other developed nations on nine measures

This particular graphic provides a look at how the United States stacks up against other developed nations on nine key measures, such as the Gini index, Gallup’s global wellbeing index, and life expectancy at birth.

As a graphic, this is both interesting and confusing. It is interesting in that one can take a quick glance at all of these measures at once and the color shading helps mark the higher and lower values. This is the goal of graphics or charts: condense a lot of information into an engaging format. However, there are a few problems: there is a lot of information to look at, it is unclear why the countries are listed in the order they are, and it takes some work to compare the countries marked with the different colors because they may be at the top or bottom of the list.

(By the way, the United States doesn’t compare well to some of the other countries on this list. Are there other overall measures in which the United States would compare more favorably?)

Modeling “wordquakes”

Several researchers suggest that certain words on the Internet are used in patterns similar to those of earthquakes:

News tends to move quickly through the public consciousness, noted physicist Peter Klimek of the Medical University of Vienna and colleagues in a paper posted on arXiv.org. Readers usually absorb a story, discuss it with their friends, and then forget it. But some events send lasting reverberations through society, changing opinions and even governments.

“It is tempting to see such media events as a human, social excitable medium,” wrote Klimek’s team. “One may view them as a social analog to earthquakes.”…

Events that came from outside the blogosphere also seemed to exhibit aftershocks that line up with Omori’s law for the frequency of earthquake aftershocks.

“We show that the public reception of news reports follow a similar statistic as earthquakes do,” the researchers conclude. “One might also think of a ‘Richter scale’ for media events.”

“I always think it’s interesting when people exploit the scale of online media to try to understand human behavior,” said Duncan Watts, a researcher at Yahoo! Research who describes himself as a “reformed physicist who has become a sociologist.”

But he notes that drawing mathematical analogies between unrelated phenomena doesn’t mean there’s any deeper connection. A lot of systems, including views on YouTube, activity on Facebook, number of tweets on Twitter, avalanches, forest fires, power outages and hurricanes all show frequency graphs similar to earthquakes.

“But they’re all generated by different processes,” Watts said. “To suggest that the same mechanism is at work here is kind of absurd. It sort of can’t be true.”

A few things are of note:

1. One of the advantages of the Internet as a medium is that people can fairly easily track these sorts of social phenomena. The data is often right in front of our eyes, and once it is collected and put into a spreadsheet or statistical program, it is like any other dataset.

2. An interesting quote from the story: the “reformed physicist who has become a sociologist.” This pattern that looks similar to an earthquake is interesting. But sociologists would also want to know why this is the case and what factors affect the initial “wordquake” and subsequent aftershocks. (But it is interesting that the paper was developed by physicists: how many sociologists would look at this word frequency data and think of an earthquake pattern?)

2a. Just thinking about these word frequencies, how does this earthquake model differ from other options for looking at this sort of data? For example, researchers have used diffusion models to examine the spread of riots. Is a diffusion model better than an earthquake model for this phenomenon?

3. Does this model offer any predictive power? That is, does it give us any insights into what words may set off “wordquakes” in the future?
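To make the earthquake analogy concrete, here is a minimal sketch of what fitting Omori’s law to word-frequency data might look like. Omori’s law says the aftershock rate decays roughly as n(t) = K / (t + c)^p. The daily mention counts below are invented, and this is not the procedure Klimek’s team actually used – it is just one way to turn the analogy, and the question about predictive power, into a calculation.

```python
import numpy as np
from scipy.optimize import curve_fit

def omori(t, K, c, p):
    """Omori's law: aftershock (here, mention) rate t days after the main event."""
    return K / (t + c) ** p

# Hypothetical daily counts of a word's mentions after a big news event (day 1, 2, ...).
days = np.arange(1, 15)
mentions = np.array([980, 610, 430, 330, 260, 215, 180, 160, 140, 125, 110, 100, 95, 88])

# Fit K, c, p; for earthquake aftershocks p is typically reported near 1.
(K, c, p), _ = curve_fit(omori, days, mentions, p0=(1000.0, 1.0, 1.0), bounds=(0, np.inf))
print(f"Fitted Omori parameters: K={K:.0f}, c={c:.2f}, p={p:.2f}")

# A crude check of 'predictive power': extrapolate the fitted decay a week ahead.
future = np.arange(15, 22)
print("Projected mentions, days 15-21:", omori(future, K, c, p).round().astype(int))
```

A fit like this only describes how attention decays after the event; it says nothing about which words will set off the next “wordquake,” which is the harder (and more sociological) question.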

Using data to describe the Anacostia neighborhood in Washington, D.C.

A recent NPR report described the changes taking place in the Anacostia neighborhood in Washington, D.C. The report, in addition to calling Washington “Chocolate City” (which set off another line of debate), left one of the residents quoted in the story unhappy with how the neighborhood was portrayed:

Kellogg wrote that “in recent years, even areas like Anacostia — a community that was virtually all-black and more often than not poor — have seen dramatic increases in property values. The median sales price of a home east of the river — for years a no-go zone for whites and many blacks — was just under $300,000 in 2009, two to three times what it was in the mid-’90s.” After profiling one black resident who moved out, Kellogg spoke with David Garber, a “newcomer” among those who “see themselves as trailblazers fighting to preserve the integrity of historic Anacostia.”

But Garber and others didn’t like the portrayal, as even WAMU’s Anna John noted in her DCentric blog, where she headlined a post “‘Morning Edition’ Chokes On Chocolate City.”

On his own blog And Now, Anacostia, Garber wrote that the NPR story “was a dishonest portrayal of the changes that are happening in Anacostia. First, his evidence that black people are being forced out is based entirely on the story of one man who chose to buy a larger and more expensive house in PG County than one he was considering near Anacostia. Second, he attempts to prove that Anacostia is becoming ‘more vanilla’ by talking about one white person, me — and I don’t even live there anymore.”

Garber also complained that Kellogg “chose to sensationalize my move out of Anacostia” by linking it to a break-in at his home, which Garber says was unrelated to his move. Garber says Kellogg chose to repeat the “canned story” of Anacostia — which We Love D.C. bluntly calls a “quick and dirty race narrative.”

Garber continues, “White people are moving into Anacostia. So are black people. So are Asian people, Middle Eastern people, gay people, straight people, and every other mix. And good for them for believing in a neighborhood in spite of its challenges, and for meeting its hurdles head on and its new amenities with a sense of excitement.”

This seems like it could all be solved rather easily: let’s just look at the data on what is happening in this neighborhood. I have not listened to the initial NPR report. But it would be fairly easy for NPR or Garber or anyone else to look up some Census figures for this neighborhood to see who is moving in or out. If the NPR story is built around Garber’s story (and some other anecdotal evidence), then it is lacking. If it has the hard data but the story is one-sided or doesn’t give the complete picture, then that is a different issue. Either way, we can then have a conversation about whether Garber’s story is an appropriate or representative illustration.
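Here is a minimal sketch of the kind of comparison I mean, using invented figures for two Census readings of a hypothetical Anacostia tract; the real check would pull tract-level Census/ACS data for the neighborhood.

```python
# Invented figures for illustration: racial composition of a hypothetical
# Anacostia census tract at two points in time. A real check would use
# tract-level Census/ACS data, not these numbers.
population_2000 = {"Black": 9_200, "White": 150, "Asian": 60, "Other": 190}
population_2009 = {"Black": 8_700, "White": 450, "Asian": 120, "Other": 230}

def shares(pop):
    """Convert raw counts into each group's share of the tract population."""
    total = sum(pop.values())
    return {group: count / total for group, count in pop.items()}

s2000, s2009 = shares(population_2000), shares(population_2009)
for group in population_2000:
    change = (s2009[group] - s2000[group]) * 100
    print(f"{group:>5}: {s2000[group]:.1%} -> {s2009[group]:.1%} ({change:+.1f} pts)")
```

With numbers like these in hand, the argument shifts from dueling anecdotes to how large the changes actually are and whether one mover’s story represents them.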

Beyond the data issue, Garber also hints at another issue: a “canned story” or image of a community versus what residents experience on the ground. This is a question about the “character” of a location, and the perspectives of insiders (residents) and outsiders (like journalists) can differ. But both perspectives could be correct; each view has merit but a different scope. A journalist is liable to try to place Anacostia in the larger framework of the whole city (or perhaps the whole nation), while a resident is likely working from their personal experiences and observations.

Thinking about a legal framework for a potential apocalypse

This story about the State of New York thinking through the legal challenges of an apocalyptic event might cause one to wonder: why are they spending time on this when there are other pressing concerns? Here is a description of some of the issues that could arise should an apocalypse occur:

Quarantines. The closing of businesses. Mass evacuations. Warrantless searches of homes. The slaughter of infected animals and the seizing of property. When laws can be suspended and whether infectious people can be isolated against their will or subjected to mandatory treatment. It is all there, in dry legalese, in the manual, published by the state court system and the state bar association.

The most startling legal realities are handled with lawyerly understatement. It notes that the government has broad power to declare a state of emergency. “Once having done so,” it continues, “local authorities may establish curfews, quarantine wide areas, close businesses, restrict public assemblies and, under certain circumstances, suspend local ordinances.”…

“It is a very grim read,” Mr. Younkins said. “This is for potentially very grim situations in which difficult decisions have to be made.”…

The manual provides a catalog of potential terrorism nightmares, like smallpox, anthrax or botulism episodes. It notes that courts have recognized far more rights over the past century or so than existed at the time of Typhoid Mary’s troubles. It details procedures for assuring that people affected by emergency rules get hearings and lawyers. It mentions that in the event of an attack, officials can control traffic, communications and utilities. If they expect an attack, it says, they can compel mass evacuations.

But the guide also presents a sober rendition of what the realities might be in dire times. The suspension of laws, it says, is subject to constitutional rights. But then it adds, “This should not prove to be an obstacle, because federal and state constitutional restraints permit expeditious actions in emergency situations.”

Isn’t it better that authorities are doing some thinking about these situations now rather than simply reacting if something major happens? This reminds me of Nassim Taleb’s book The Black Swan, where he argues that a problem we face as a society is that we don’t consider the odd things that could, and still do (even if rarely), happen. Taleb suggests we tend to extrapolate from past historical events, but the past is a poor predictor of future happenings.

Depending on the size or scope of the problem, it may be that government is limited or even unable to respond. Then we would have the landscape painted by numerous books and movies of the last few decades, where every person simply has to find a way to survive. But even a limited yet effective government response would be better than no response.

It would be interesting to know how much time has been spent putting together this manual.

Scorecasting looks at data: Cubs not unlucky, just bad

The recently published book Scorecasting (read a quick summary here) has a chapter that tackles the question of whether the Chicago Cubs are cursed. The authors’ conclusion after looking at the data: the team has simply been bad.

But how can anyone disprove the existence of a curse? According to the authors, teams that frequently field good teams but finish in second place, or make the playoffs but fail to win a title, justifiably can claim to be unlucky. So, too, can teams that have impressive batting, hitting and defensive statistics, but whose strong numbers don’t translate into victories.

On both scores, the Cubs proved to be “less unlucky” than the average team. That is, not unlucky, just bad.

“Relative to other teams, we could easily explain the Cubs’ lack of success from the data — both their on the field statistics and where they finished in the standings,” Moskowitz said.

Since their last Series appearance in 1945, the Cubs have finished second fewer times than they have finished first. They also have finished last or next to last close to 40 percent of the time. According to the book, the odds of this happening by chance are 527 to 1.

The authors of “Scorecasting” believe that what has been stopping the Cubs the last three decades is the extreme loyalty of their fans, which has served to reduce the incentive for Cubs management to win.

According to their analysis, which is primarily based on attendance records and the team’s won-loss percentage from 1982-2009, Cubs fans are the least sensitive to the team’s winning percentage, while White Sox fans are among the most sensitive.

There are two interesting arguments going on here, both of which commonly come up in conversation in Chicago:

1. The data suggests that the Cubs have just been a bad team. It is not as if they have reached the playoffs or World Series multiple times and lost. It is not that they have impressive statistics that haven’t translated into wins. They just haven’t been very good. It would be interesting to read the rest of this chapter to see if the authors talk about the MLB teams that have been truly unlucky. I don’t know if a chapter like this will put the talk of a Cubs curse to rest, but it is good to hear that there is data that could quiet it. (Perhaps the curse is what Cubs fans want to believe – it means that neither the team nor the fans are at fault.) A rough simulation of the “by chance” figure appears after this list.

2. Cubs fans like to think that they are loyal while White Sox fans argue that Cubs fans will go to Wrigley Field no matter what. So is the answer for more Cubs fans to stay away from the ballpark until the team and the Ricketts show that they are serious about winning?
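As promised above, here is a minimal Monte Carlo sketch of the “odds of this happening by chance” claim. It assumes, purely for simplicity, that the Cubs compete against seven other teams each season and that every team is equally likely to land in any spot in the standings; the book’s actual calculation almost certainly accounts for the changing league and division structure since 1945, so the number printed here is only a ballpark figure, not a reproduction of the 527 to 1.

```python
import random

# Simplifying assumptions (not the book's method): 63 seasons since 1945,
# an 8-team group each year, and a finish drawn uniformly at random each season.
SEASONS = 63
GROUP_SIZE = 8
P_BOTTOM_TWO = 2 / GROUP_SIZE          # chance of finishing last or next to last
THRESHOLD = 0.40 * SEASONS             # "close to 40 percent of the time"

TRIALS = 200_000
random.seed(42)

hits = 0
for _ in range(TRIALS):
    bottom_two_finishes = sum(random.random() < P_BOTTOM_TWO for _ in range(SEASONS))
    if bottom_two_finishes >= THRESHOLD:
        hits += 1

prob = hits / TRIALS
print(f"P(bottom-two finishes in >= 40% of seasons, by luck alone) ~ {prob:.5f}")
print(f"Roughly {1 / prob:,.0f} to 1 against, under these toy assumptions"
      if prob else "Did not occur in any simulated trial")
```

The point is not the exact odds but the shape of the argument: finishing near the bottom that often is very hard to chalk up to bad luck alone.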

Just how much did Facebook and Twitter contribute to changes in Egypt?

With the resignation of Hosni Mubarak, there is more talk about how the Internet, specifically social media sites like Facebook and Twitter, helped bring down a dictator in Egypt:

Dictators are toppled by people, not by media platforms. But Egyptian activists, especially the young, clearly harnessed the power and potential of social media, leading to the mass mobilizations in Tahrir Square and throughout Egypt. The Mubarak regime recognized early on that social media could loosen its grip on power. The government began disrupting Facebook and Twitter as protesters hit the streets on Jan. 25 before shutting down the Internet two days later.

In addition to organizing, Egyptian activists used Facebook, YouTube, and Twitter to share information and videos. Many of these digital offerings made the rounds online but were later amplified by Al Jazeera and news outlets around the world. “This revolution started online,” Ghonim told Blitzer. “This revolution started on Facebook.”

Egypt’s uprising followed on the heels of Tunisia’s. In each case, protestors employed social media to help oust an authoritarian government–a role some Western commentators expected Twitter to play in Iran during the election protests of 2009.

This article, and others like it, seem to want it both ways. On one hand, it sounds like social media played a role. But when considering whether social media was the main factor, the articles back away. Here is how this same article concludes:

It’s true that tweeting alone–especially from safe environs in the West–will not cause a revolution in the Middle East. But as Egypt and Tunisia have proven, social media tools can play a significant role as activists battle authoritarian regimes, particularly given the tight control dictators typically wield over the official media. Tomorrow’s revolution, as Ghonim would likely attest, may be taking shape on Facebook today.

Or it may not. Ultimately, we need more data. For example, we could match Facebook or Twitter activity regarding Egypt with the level of protests on specific days – did more online traffic or activity lead to bigger protests? This would at least establish a correlation. Why can’t we match GPS information from people using Facebook or Twitter while they were protesting on the streets? This would require more private data, primarily from cell phone companies, but it would be fascinating to look for patterns in this data. And how exactly do these cases from Egypt and Tunisia help us understand what didn’t happen in Iran?
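Here is a minimal sketch of the kind of correlation check suggested above. The daily figures are invented; a real analysis would need actual Twitter/Facebook volume and crowd estimates, neither of which is used here.

```python
import numpy as np

# Hypothetical daily series for the Jan 25 - Feb 11 period: social media posts
# mentioning the protests (thousands) and rough crowd estimates (thousands).
# Both series are invented purely to show the calculation.
posts  = np.array([ 80, 120,  60,  20,  15, 140, 180, 200, 260, 240, 220, 300, 340, 310, 280, 330, 360, 400])
crowds = np.array([ 50,  90,  70,  40,  30, 100, 150, 250, 200, 180, 160, 220, 260, 240, 230, 250, 300, 350])

# Same-day correlation: do bigger online spikes coincide with bigger protests?
same_day_r = np.corrcoef(posts, crowds)[0, 1]

# Lagged correlation: does today's online activity track tomorrow's turnout?
lagged_r = np.corrcoef(posts[:-1], crowds[1:])[0, 1]

print(f"Same-day correlation:           r = {same_day_r:.2f}")
print(f"Posts today vs crowds tomorrow: r = {lagged_r:.2f}")
# Either way, correlation is not causation: both series could be driven by the
# same offline events (crackdowns, speeches, Al Jazeera coverage).
```

Even a strong correlation here would only be a starting point; it would not tell us whether social media mobilized the crowds or simply mirrored what was already happening in the streets.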

These questions about the role of social media need some answers, and perhaps some innovative approaches to data collection. And a thought from another commentator is helpful to keep in mind:

Evgeny Morozov writes in his new book, “The Net Delusion: The Dark Side of Internet Freedom,” that only a small minority of Iranians were actually Twitter users. Presumably, many tweeting about revolution were doing so far from the streets of Tehran.

“Iran’s Twitter Revolution revealed the intense Western longing for a world where information technology is the liberator rather than the oppressor,” Morozov wrote, according to a recent Slate review. In his book, Morozov describes how authoritarian regimes can use the Internet and social media to oppress people, rather than such platforms only working the other way around.

Perhaps we simply want it to be true that social media use can lead to revolution. If enough articles are written suggesting that social media helped in Egypt and Tunisia, does that make it more likely that in the future social media will play a pivotal and even decisive role in social movements? Morozov seems to suggest this is a Western idea, probably rooted in Enlightenment ideals where information can (and should?) disrupt tradition and authoritarianism.

Scorecasting: Freakonomics for the sports world

A movement has been growing in the sports world over the last few decades: the use of large amounts of data to make decisions. Some of this data goes against “conventional wisdom,” such as ideas about whether players can be “clutch” (there is some good analysis of which NBA players you would want taking the final shot with the game on the line) and what should actually be valued in free agents (MLB’s shift toward statistics like on-base percentage over home runs and RBIs).

A new book, Scorecasting, tackles a number of sports issues from a quantitative perspective. Read an interview (including a few examples from the book) with one of the authors here.

It will be interesting to see just how mainstream these sorts of ideas become. Does the average sports fan, or even the average sports broadcaster, want to rely on these kinds of data as opposed to intuition or gut feeling? Numbers may provide a better explanation – but numbers have all sorts of perceptions tied to them, including the idea that people are just twisting the data to fit their explanation and that numbers about sports are developed by geeks who can’t play sports (or something along these lines).

I, for one, would like to have more quantitative data available to me when watching sports. Information like a batter’s average for particular parts of the plate (usually split into nine segments) or on a particular count would be useful. The data might seem overwhelming, but ultimately I think it helps people see the patterns underlying their favorite sport. For example, a home run hit on an 0-2 count in the 9th inning to win the game is impressive in its own right. But knowing how rarely home runs are hit on 0-2 counts, even more so for some batters, adds to the feat.
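As a sketch of the kind of split I have in mind, here is what computing a home run rate by count might look like. The plate-appearance records and their labels are hypothetical; real data would come from a play-by-play source such as Retrosheet.

```python
from collections import defaultdict

# Hypothetical play-by-play records: (count when the plate appearance ended, outcome).
plate_appearances = [
    ("0-2", "strikeout"), ("0-2", "groundout"), ("0-2", "single"), ("0-2", "home_run"),
    ("3-1", "home_run"),  ("3-1", "walk"),      ("3-1", "double"), ("3-1", "flyout"),
    ("1-1", "single"),    ("1-1", "flyout"),    ("1-1", "home_run"),
]

totals = defaultdict(int)
homers = defaultdict(int)
for count, outcome in plate_appearances:
    totals[count] += 1
    if outcome == "home_run":
        homers[count] += 1

# Home run rate by count: the kind of split a broadcast could show on screen.
for count in sorted(totals):
    rate = homers[count] / totals[count]
    print(f"Count {count}: {homers[count]} HR in {totals[count]} PA ({rate:.1%})")
```

The same structure extends naturally to splits by pitch location, inning, or score, which is exactly the sort of context that makes a 9th-inning, 0-2 home run feel even more remarkable.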

Study about drunk fans has a limited sample

A recently released study suggests that 8 percent of fans leave sporting events drunk. This may be an interesting finding – but the newspaper description of the sample suggests there may be issues:

University of Minnesota researchers tested the blood alcohol content of 362 people to see how much folks drink when they go to professional baseball and football games. In their study, released Tuesday, they determined that 40 percent of the participants had some alcohol in their system and 8 percent were drunk, meaning their blood alcohol content was .08 or higher.

“Given the number of attendees at these sporting events, we can be talking about thousands of people leaving a professional sporting event who are legally intoxicated,” lead author Darin Erickson said. The study did not address what percentage, if any, of those fans intended to drive.

To collect the data, research staff waited outside 13 Major League Baseball and three National Football League games and randomly approached fans as they left. Those who consented took a breath test and answered questions about when, where and how much they drank on game day.

So the researchers waited outside 16 sporting events. Across these 16 events, they performed voluntary breath tests on 362 people, which averages out to 22.625 fans per event.

Let’s say the events average at least 30,000 fans – not an unreasonable expectation for MLB and NFL games. If they tested about 23 fans at each event, that is less than one-tenth of 1 percent of the fans at each game. How could these findings be considered generalizable? First, you would need to test more fans. Second, could there be something different about the fans who were willing to volunteer for this test after a game?
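Here is the arithmetic behind that concern as a quick sketch; the 30,000 attendance figure is the same rough assumption used above, and the last line simply applies the study’s 8 percent rate to it.

```python
# Quick arithmetic on the study's coverage, using the rough assumptions above.
fans_tested = 362
events = 16
assumed_attendance = 30_000        # rough per-event attendance assumed above
drunk_rate = 0.08                  # study's headline finding (BAC >= .08)

tested_per_event = fans_tested / events
sampling_fraction = tested_per_event / assumed_attendance

print(f"Fans tested per event:      {tested_per_event:.1f}")
print(f"Share of attendance tested: {sampling_fraction:.3%}")   # well under 0.1%
print(f"Implied intoxicated fans per event if the 8% rate holds: "
      f"{drunk_rate * assumed_attendance:,.0f}")
```

The extrapolated per-event figure is what the lead author means by “thousands of people leaving a professional sporting event who are legally intoxicated” – but that extrapolation is only as good as the tiny, self-selected sample behind the 8 percent.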

Another report on this study bumps the sample number up a bit, to 382 people. This doesn’t change the averages much. Also, this may be the first study to examine the particular phenomenon of drinking at sporting events. However, the sample still seems too small, even though the study is going to be published in Alcoholism: Clinical & Experimental Research.