Four tips for making a good infographic

The head of a new infographic website suggests four tips for making a good infographic:

1. Apply a journalist’s code of ethics

An infographic starts with a great data set. Even if you’re not a journalist — but an advertiser or independent contractor, say — you need to represent the data ethically in order to preserve your credibility with your audience. Don’t source from blogs. Don’t source from Wikipedia. Don’t misrepresent your data with images.

2. Find the story in the data

There’s a popular misconception that creating a great infographic just requires hiring a great graphic designer. But even the best designer can only do so much with poor material. Mapping out the key points in your narrative should be the first order of business. “The most accessible graphics we’ve ever done are the ones that tell a story. It should have an arc, a climax and a conclusion,” Langille says. When you find a great data set, mock up your visualization first and figure out what you want to say, before contacting a designer.

3. Make it mobile and personal

As the media becomes more sophisticated, designers are developing non-static infographics. An interactive infographic might seem pretty “sexy,” Langille says, but it’s much less shareable. A video infographic, on the other hand, is both interactive and easy to port from site to site. Another way to involve readers is to create a graphic that allows them to input and share their own information.

4. Don’t let the code out

One of the easiest ways to protect your work is to share it on a community site. Visual.ly offers Creative Commons licensing to users who upload a graphic to the site. When visitors who want to use the graphic grab embed code from the site, the embedded image automatically links back to its creator. Langille suggests adding branding to the bottom of your work and never releasing the actual source file — only the PNG, JPEG, or PDF. And what if your work goes viral without proper credit? For god’s sake, don’t be a pain and demand that the thieves take it down. “It’s better to let it go and ask for a link back and credits on the graphics,” Langille said.

The first two points apply to all charts and graphs: you need good and compelling data, and then a graphic that tells that story. An infographic should make the relevant data easier to understand than dense text would. An easy temptation is to try new ways of displaying data without thinking through whether they are easily readable.

It would be interesting to know whether infographics are actually more effective in conveying information to viewers. In other words, is a traditional bar graph made in Excel really worse at the basic task of sharing information than a snazzy infographic? I imagine websites and publications would rather have infographics because they look better and take advantage of newer tools, but a better visual does not necessarily mean connecting more with viewers.

Side note: the “meta Infographic” at the beginning of this article and the “Most Popular Infographics You Can Find Around the Web” at the end are amusing.

What happens when you let Boston residents crowdsource neighborhood boundaries

Here is a fascinating online experiment: let residents of a city, in this case, Boston, illustrate how they would draw neighborhood boundaries. Here are the conclusions of the effort thus far:

Although we talk a lot about boundaries, this post included, the maps here should also remind us that neighborhoods are not defined by their edges—essentially, what is outside the neighborhood—but rather by their contents. And it’s not just a collection of roads and things you see on a map; it’s about some shared history, activities, architecture, and culture. So while the neighborhood summaries above rely on edges to describe the maps, let’s also think about the areas represented by the shapes and what’s inside them. What are the characteristics of these areas? Why are they the shapes that they are? Why is consensus easy or difficult in different areas? What is the significance of the differences in opinion between residents of a neighborhood and people outside the neighborhood?

We’ll revisit those questions in further detail in future posts, and also generate maps of other facets of the data. Next up: areas of overlap between neighborhoods. Here we’ve looked neighborhood-by-neighborhood at how much people agree, so now let’s map those zones that exhibit disagreement. Meanwhile, thanks so much for all the submissions for this project; and if you haven’t drawn some neighborhoods, what’s your problem? Get on it!

This gets at a recurring issue for urban sociologists: how best to define communities or neighborhoods. The most practical option with data is to use Census boundaries such as tracts, block groups, blocks, and perhaps zip codes. These data are collected regularly and in depth, and can be easily downloaded. However, these boundaries are crude approximations of culturally defined neighborhoods. People on the ground have little knowledge of what Census tract they live in (though this is easy to figure out online).

So if Census definitions are not the best for the on-the-ground experience, what is left? This crowdsourcing project is a modern way of doing what some researchers have done: ask the residents themselves and also observe what happens. What streets are not crossed? Which features or landmarks define a neighborhood? Who “belongs” where? What are typical activities in different places? Of course, this is a much messier process than working with clearly defined and reliable Census data, but it illustrates a key aspect of neighborhoods: they are continually changing and being redefined by their own residents and others.
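One rough way to quantify the kind of agreement the project maps could look like this: overlay every resident's drawn shape on a grid and count, for each cell, the share of submissions that include it. The grid cells, drawings, and 80% threshold below are invented for illustration and are not taken from the Boston project:

```python
# Sketch: measuring agreement among crowdsourced neighborhood boundaries.
# Each submission is modeled as the set of grid cells a resident's drawn
# shape covers; real data would come from rasterizing the drawn polygons.
from collections import Counter

def agreement_map(submissions):
    """Fraction of submissions that include each grid cell."""
    counts = Counter(cell for cells in submissions for cell in cells)
    total = len(submissions)
    return {cell: n / total for cell, n in counts.items()}

def consensus_core(submissions, threshold=0.8):
    """Cells that at least `threshold` of residents agree are in the neighborhood."""
    return {c for c, share in agreement_map(submissions).items() if share >= threshold}

# Three hypothetical residents drawing "their" neighborhood on a tiny grid:
drawings = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},
    {(0, 0), (0, 1), (1, 1), (1, 2)},
    {(0, 1), (1, 1), (1, 0)},
]
print(consensus_core(drawings))  # the cells everyone includes
```

Cells with high agreement form the consensus core of a neighborhood, while cells with middling scores are exactly the contested edges the post promises to map next.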

Sociologist: 70% of murders in two high-crime Chicago neighborhoods tied to social network of 1,600 people

Social networks can be part of more nefarious activities: sociologist Andrew Papachristos looked at two high-crime Chicago neighborhoods and found that a majority of the murders involved a small percentage of the population.

Papachristos looked at murders that occurred between 2005 and 2010 in West Garfield Park and North Lawndale, two low-income West Side neighborhoods. Over that period, Papachristos found that 191 people in those neighborhoods were killed.

Murder occasionally is random, but, more often, he found, the victims have links either to their killers or to others linked to the killers. Seventy percent of the killings he studied occurred within what Papachristos determined was a social network of only about 1,600 people — out of a population in those neighborhoods of about 80,000.

Each person in that network of 1,600 people had been arrested at some point with at least one other person in the same network.

For those inside the network, the risk of being murdered, Papachristos found, was about 30 out of 1,000. In contrast, the risk of getting killed for others in those neighborhoods was less than one in 1,000.

On one hand, this isn’t too surprising, especially considering the prevalence of gangs. At the same time, these numbers are striking: if a resident is in this small network, the risk of being murdered jumps roughly thirtyfold.
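Simple arithmetic on the reported rates (about 30 per 1,000 inside the network, less than 1 per 1,000 outside) shows the size of the gap:

```python
# Relative risk of murder inside vs. outside the 1,600-person network,
# using the per-1,000 figures reported in the article.
deaths_per_1000_inside = 30   # risk for people in the co-arrest network
deaths_per_1000_outside = 1   # "less than one in 1,000" for everyone else

relative_risk = deaths_per_1000_inside / deaths_per_1000_outside
percent_increase = (relative_risk - 1) * 100

print(relative_risk)     # 30.0, roughly thirtyfold
print(percent_increase)  # 2900.0, i.e. close to a 3,000% increase
```

Since the outside rate is given only as "less than one in 1,000," the true ratio is at least thirtyfold.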

I would be interested to know how closely the Chicago Police have mapped social networks like these. Do they use special social network software that helps them visualize the network and see nodes? Indeed, the article suggests the police are doing something like this:

Now, he wants to tap the same social networking analysis techniques that Papachristos, the Yale sociologist, developed to identify potential shooting victims, only McCarthy wants to use it to identify potential killers.

Police brass will cross-reference murder victims and killers with their known associates — the people projected as most likely to be involved in future shootings.

“Hot people,” McCarthy calls them.

Those deemed most likely to commit violence will be targeted first: parolees and people who have outstanding arrest warrants.

McCarthy said his staff estimates there are 26,000 “hot people” living in Chicago.

It would also be worthwhile to see how effective such strategies are. This isn’t the first time that organizations/agencies have tried to identify at-risk individuals. So how effective is it in the long run?
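As a sketch of how such a network could be mapped, the co-arrest rule described above (an edge between any two people arrested together) is easy to express in code. The arrest records below are invented, and this is not the actual software Papachristos or the Chicago Police use:

```python
# Sketch of a co-arrest network: people are nodes, and an edge links any two
# people who appear in the same arrest record. The records here are invented.
from collections import defaultdict
from itertools import combinations

arrest_records = [
    ["ana", "ben"],
    ["ben", "carl", "dee"],
    ["eve", "frank"],
    ["gil"],  # arrested alone, so no co-arrest ties
]

adj = defaultdict(set)
people = set()
for record in arrest_records:
    people.update(record)
    for a, b in combinations(record, 2):
        adj[a].add(b)
        adj[b].add(a)

# The "network" in the study's sense: anyone with at least one co-arrest tie.
network = {p for p in people if adj[p]}

def component(start):
    """Everyone reachable from `start` through chains of co-arrests."""
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen

print(sorted(network))           # everyone except "gil"
print(sorted(component("ana")))  # ana's cluster: ana, ben, carl, dee
```

Dedicated network tools add layout and centrality measures on top of this, but the underlying data structure is just this adjacency list.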

Losing something in the research process with such easy access to information

A retired academic laments that the thrill of the research hunt has diminished with easy access to information and data:

It’s a long stretch, but it seems to me that “ease of access” and the quite miraculous enquiry-request-delivery systems now available to the scholar have had an effect on research. The turn to theory – attention to textuality rather than physical things such as books, manuscripts, letters and paraphernalia of various kinds – has, I think, coincided with big changes in method. Discovery has been replaced by critical discourse and by dialectic.

Fieldwork was, typically, solitary. Lonely sometimes. The new styles at the professional end of the subject are collective – if sometimes less than collegial. The conference is now central to the profession, particularly the conference at which everyone is a speaker, a colloquiast and a verbal “participant”.

One can see something similar at the undergraduate level. I suspect that in my subject (English), some undergraduates are nowadays doing their three years without feeling ever obliged to go to the library. Gutenberg, iBook, Wikipedia, SparkNotes, Google and the preowned, dirt-cheap texts on AbeBooks have rendered the library nothing more than emergency back-up and a warm place to work, using wi-fi to access extramural materials. The seminar (the undergraduate equivalent of the conference), not the one-on-one tutorial or the know-it-all lecture, is the central feature of the teaching programme.

There may be something to this. Discovering new sources, objects, and data that no one has examined before in out-of-the-way places is certainly exciting. However, I wonder if the research hunt hasn’t simply shifted. As this academic argues, it is not hard to find information these days. But today, the hunt is more about what story to tell or how to interpret the accessible data. As I tell my students, anyone with some computer skills can do a search, find a dataset, and download it within a few minutes. This does not mean that everyone can understand how to work with the data and interpret it. (The same would apply to non-numeric/qualitative data that could be quickly found, such as analyzing online interactions or profiles.) Clearing a way through the flood of information is no easy task and can have its own kind of charm.

Perhaps the problem is that students and academics today feel that quick access to information already takes care of a large part of their research. Simply go to Google, type in some terms, look at the first few results, and there isn’t much left to do – it is all magic, after all. Perhaps the searching for information that one used to do wasn’t really about getting the information but rather about the amount of time it required, as this led to more profitable thinking, reflection, and writing time.

Measuring “peak car” in the United States

With data suggesting that congestion, the number of teenagers with driver’s licenses, and the number of miles driven have all dropped in recent years, Scientific American asks whether we have reached “peak car”:

According to the Federal Highway Administration’s “2011 Urban Congestion Trends” report, there was a 1.2 percent decline in vehicle miles traveled (VMT) last year compared with 2010. The drop follows years of stagnant growth in vehicle travel following a peak in 2007, before the economic downturn…

Her observation is true for the entire country. Rather than maintain the 50-year legacy of a 2 to 4 percent increase in vehicle travel each year, the annual number of VMT in the United States has stalled and even gone into reverse. The total number of miles driven in the United States today is the same as in 2004…

The interesting thing for Roy Kienitz, transportation infrastructure consultant and former undersecretary for policy at the Department of Transportation, is that American drivers actually started changing their individual driving habits years before the recession started.

The overall number of miles traveled by road peaked just before the market collapsed, but the number of VMT per capita peaked in 2004 and declined over the next eight years until today, according to Kienitz’s research, which is based on publicly available data.

Interesting. But I’m not sure this is the best way to measure “peak car.” While miles driven by road may be important to note, there are other factors that matter. Here are a few:

-The number of vehicles bought.

-The number of vehicles licensed.

-The number or % of people with driver’s licenses.

-The average number of trips people make on a daily basis. This gives you different information than the number of miles driven per year.

-Whether travel by other modes has increased or overall miles traveled is down. This would help show whether people are using cars less specifically or whether all travel is down.

Looking at all of these figures would help provide a more complete picture of whether we are at “peak car.”

Also, even if Americans are driving less overall, this doesn’t necessarily mean that cars are valued less or are less culturally important. Driving less doesn’t automatically mean most or even a significant number of Americans want to get rid of their cars or the freedom and individualism they represent.

Facebook’s Data Science Team running experiments

Facebook’s Data Science Team of 12 researchers is working with all of its data (from over 900 million users) and running experiments:

“Recently the Data Science Team has begun to use its unique position to experiment with the way Facebook works, tweaking the site — the way scientists might prod an ant’s nest — to see how users react… So [Eytan Bakshy] messed with how Facebook operated for a quarter of a billion users. Over a seven-week period, the 76 million links that those users shared with each other were logged. Then, on 219 million randomly chosen occasions, Facebook prevented someone from seeing a link shared by a friend. Hiding links this way created a control group so that Bakshy could assess how often people end up promoting the same links because they have similar information sources and interests.

“He found that our close friends strongly sway which information we share, but overall their impact is dwarfed by the collective influence of numerous more distant contacts — what sociologists call “weak ties.” It is our diverse collection of weak ties that most powerfully determines what information we’re exposed to.”
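The randomized holdout design described in this excerpt can be simulated to show why hiding links creates a usable control group. The sharing probabilities below are invented; the point is that random assignment lets the difference in rates estimate the causal effect of seeing a friend's share, separate from people finding the same links on their own:

```python
# Sketch of a randomized holdout: for each occasion a link could be shown,
# randomly hide it for some users. Comparing sharing rates between the shown
# and hidden groups separates social influence from independent discovery.
import random

random.seed(0)

# Invented behavior: people share 30% of links a friend shows them, but
# would have found and shared 10% of those links on their own anyway.
P_SHARE_IF_SHOWN = 0.30
P_SHARE_ANYWAY = 0.10

def run_trial(shown):
    p = P_SHARE_IF_SHOWN if shown else P_SHARE_ANYWAY
    return random.random() < p

occasions = 100_000
results = {True: [], False: []}
for _ in range(occasions):
    shown = random.random() < 0.5  # random assignment to shown vs. hidden
    results[shown].append(run_trial(shown))

rate_shown = sum(results[True]) / len(results[True])
rate_hidden = sum(results[False]) / len(results[False])
print(round(rate_shown - rate_hidden, 3))  # estimated causal effect, near 0.20
```

Without the holdout, an observer would see only the 30% rate and could not tell how much of it was influence rather than shared interests.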

But if that sounds a little creepy, it shouldn’t. Well, not too creepy, because these kinds of experiments aren’t designed to influence us but rather to understand us. The piece continues:

“Marlow says his team wants to divine the rules of online social life to understand what’s going on inside Facebook, not to develop ways to manipulate it. “Our goal is not to change the pattern of communication in society,” he says. “Our goal is to understand it so we can adapt our platform to give people the experience that they want.” But some of his team’s work and the attitudes of Facebook’s leaders show that the company is not above using its platform to tweak users’ behavior. Unlike academic social scientists, Facebook’s employees have a short path from an idea to an experiment on hundreds of millions of people.”

I think there is a lot of room to explore the world of weak ties on Facebook and similar websites. Just how much do friends of friends affect us? What is the impact of people a few ties along in our network? For example, the book Connected shows that traits like obesity and happiness are tied to network behavior, which could be examined on Facebook.

I would guess some people may not like hearing this but there are at least three points in Facebook’s favor here:

1. They are not the only online company running such experiments. Google has been doing such things with search results for quite a while. Theoretically, these experiments could help create a better user experience.

2. People are voluntarily giving their data. I don’t think these companies have to explain that users’ data might be used in experiments…but perhaps I am wrong?

3. This is “Big Data” writ large. Facebook and others would love to be able to run randomized trials with this large group and with all of the information available to researchers.

Finding the most extroverted town in America in Iowa

A “marketing research firm” recently named Keota, Iowa as the most extroverted town in America. How exactly does a researcher determine the most extroverted town?

Pyco, which claims to specialize in “psychological profiling,” ranked 61.639 percent of adults in Keota (pop. 1,009, according to the 2010 census) as extroverts — just beating Manchester, N.Y.’s 60.570 percent for the title of most outgoing. Yet despite this designation, locals are reportedly confused as to how they ranked so high…

In fact, nobody outside Pyco quite understands the methodology for the rankings. According to the Register, the firm collected data in part from other research firms, and processed the numbers with a proprietary 2,000 page algorithm. Keith Streckenbach, the company’s chief operating officer, could not specify which factors most affected whether a person was deemed extroverted.

Keota’s designation has led to a series of stories in Iowa media examining the honor. One piece on the blog Eastern Iowa News Now interviewed Kevin Leicht, the chairman of the University of Iowa’s Sociology Department, and found that extroversion may be a trait inherent to small towns…

Pyco’s algorithm found that only about 57 percent of New York City adults are extroverts.

Several questions follow:

1. I would be really curious to know how this proprietary data was collected. Is it culled from the Internet? Could it be partially determined by the number of local businesses or “third places” (found in the Yellow Pages or some other kind of community listings)?

2. The differences between Keota and New York City are not huge: 61.6% to 57%. If you factor in the margin of error from these estimates (possibly fairly large, since how many data points could there be in each town of more than 1,000 people across the US?), these figures may be close to the same. It would be worthwhile to see how broad the range of data for communities really is: are there towns in the US where less than 40% of people are extroverts?

3. Would we expect an extroverted community to know they are more extroverted than another community? Put another way, are extroverts more self-aware of their extroversion or are introverts the ones that are more likely to be aware of these things?

4. Since this data was collected by a marketing firm, I assume they would want to sell this information to companies and other organizations. So if Keota is the most extroverted town, will residents now see different kinds of promotional campaigns in the near future?
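As a rough check on point 2, the normal-approximation margin of error for a proportion is 1.96 times the square root of p(1 − p)/n. Since Pyco does not disclose its sample sizes, the n below is a guess:

```python
# Rough 95% margin of error for an estimated proportion, using the normal
# approximation: moe = 1.96 * sqrt(p * (1 - p) / n).
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for proportion p."""
    return z * math.sqrt(p * (1 - p) / n)

# If Keota's 61.6% figure were based on, say, 200 sampled adults:
moe_keota = margin_of_error(0.616, 200)
print(round(moe_keota, 3))  # about 0.067, i.e. roughly plus or minus 6.7 points
```

With uncertainty on that order, 61.6% in Keota and 57% in New York City could plausibly come from overlapping ranges; only with much larger samples would the gap clearly be meaningful.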

Sharing data among scientists vs. “Big Data”

In a quest to make data available to other researchers to verify research results, researchers have come up against one kind of data that is not made publicly available: “big data” from big Internet firms.

The issue came to a boil last month at a scientific conference in Lyon, France, when three scientists from Google and the University of Cambridge declined to release data they had compiled for a paper on the popularity of YouTube videos in different countries.

The chairman of the conference panel — Bernardo A. Huberman, a physicist who directs the social computing group at HP Labs here — responded angrily. In the future, he said, the conference should not accept papers from authors who did not make their data public. He was greeted by applause from the audience…

At leading social science journals, there are few clear guidelines on data sharing. “The American Journal of Sociology does not at present have a formal position on proprietary data,” its editor, Andrew Abbott, a sociologist at the University of Chicago, wrote in an e-mail. “Nor does it at present have formal policies enforcing the sharing of data.”

The problem is not limited to the social sciences. A recent review found that 44 of 50 leading scientific journals instructed their authors on sharing data but that fewer than 30 percent of the papers they published fully adhered to the instructions. A 2008 review of sharing requirements for genetics data found that 40 of 70 journals surveyed had policies, and that 17 of those were “weak.”

Who will win the battle between proprietary data and science? The article makes it sound like scientists are all on one side, particularly because of an interest in fighting issues like scientific fraud. At the same time, scientific journals don’t seem to be “enforcing” their guidelines or the individual scientists who are publishing in these journals aren’t following these guidelines.

The other side of this debate is not presented in this story: what do these big Internet firms, like Google, Yahoo, and Facebook, think about sharing this data? This is not a small issue: these firms are spending a good amount of money on analyzing this data and probably hoping to use it for their own business and research purposes. For example, Microsoft recently set up a lab with several well-known researchers in New York City. Would the social scientists who work in such labs want to insist that the data be open? Should these companies have to open up their proprietary data to satisfy the requirements of the larger scientific community?

I suspect this will be an ongoing issue as social scientists look to analyze more innovative data that big companies have collected and that are more difficult for researchers to collect on their own. Will researchers be willing to forgo sharing this kind of data with the wider scientific community if they can get their hands on unique data?

Debating the reliability of social science research

A philosopher argues social science research is not that reliable and therefore should have a limited impact on public policy:

Without a strong track record of experiments leading to successful predictions, there is seldom a basis for taking social scientific results as definitive.  Jim Manzi, in his recent book, “Uncontrolled,” offers a careful and informed survey of the problems of research in the social sciences and concludes that “nonexperimental social science is not capable of making useful, reliable and nonobvious predictions for the effects of most proposed policy interventions.”

Even if social science were able to greatly increase their use of randomized controlled experiments, Manzi’s judgment is that “it will not be able to adjudicate most policy debates.” Because of the many interrelated causes at work in social systems, many questions are simply “impervious to experimentation.”   But even when we can get reliable experimental results, the causal complexity restricts us to “extremely conditional, statistical statements,” which severely limit the range of cases to which the results apply.

My conclusion is not that our policy discussions should simply ignore social scientific research.  We should, as Manzi himself proposes, find ways of injecting more experimental data into government decisions.  But above all, we need to develop a much better sense of the severely limited reliability of social scientific results.   Media reports of research should pay far more attention to these limitations, and scientists reporting the results need to emphasize what they don’t show as much as what they do.

Given the limited predictive success and the lack of consensus in social sciences, their conclusions can seldom be primary guides to setting policy.  At best, they can supplement the general knowledge, practical experience, good sense and critical intelligence that we can only hope our political leaders will have.

Several quick thoughts:

1. There seems to be some misunderstanding about the differences between the social and natural sciences. The social sciences don’t have laws in the same sense that the natural sciences do. People don’t operate like planets (to pick up on one of the examples). Social behaviors change over time in response to changing conditions and this makes study more difficult.

2. There is a heavy emphasis in this article on experiments. However, these are more difficult to conduct in the social realm: it is hard to control for all sorts of possibly influential factors and to assemble a sizable enough N to make generalizations. Moreover, experiments in the “harder sciences” like medicine have some of their own issues (see this critique of medical studies).

3. Saying the social sciences have some or a little predictive ability is different than saying they have none. Having some knowledge of social life is better than none when crafting policy, right?

4. Leaders should have “the general knowledge, practical experience, good sense and critical intelligence” to be able to make good decisions. Are these qualities simply individualistic or could social science help inform and create these abilities?

5. While there are limitations to doing social science research, there are also ways that researchers can increase the reliability and validity of studies. These techniques are not inconsequential; there are big differences between good research methods and bad research methods in what kind of data they produce. There is a need within social science to think about “big science” more often rather than pursuing smaller, limited studies, but studies that can speak to broader questions typically require more data and analysis, which in turn requires more resources and time.

Sociology grad student: scholars need to and can make their research and writing more public

Sociology PhD student Nathan Jurgenson argues that scholars need to make their research more public:

To echo folks like Steven Sideman or danah boyd, we have an obligation to change this; academics have a responsibility to make their work relevant for the society they exist within.

The good news is that the tools to counter this deficiency in academic relevance are here for the taking. Now we need the culture of academia to catch up. Simply, to become more relevant, academics need to make their ideas more accessible.

There are two different, yet equally important, ways academics need to make their ideas accessible:

(1) Accessible by availability: ideas should not be locked behind paywalls.

(2) Accessible by design: ideas should be expressed in ways that are interesting, readable and engaging.

Considering that Jurgenson researches social media (see my earlier post on another of his arguments), I’m not surprised to see him make this argument. Though most of his argument is tilted toward the brokenness of the current system, Jurgenson wants to help the academic world see that we now have the tools, particularly online, to do some new things.

A few other thoughts:

1. Does every generation of graduate students suggest the current system is broken or is this really a point in time where a big shift could occur?

2. Jurgenson also hints that academics need to be more able to write for larger publics. So it is not just about the tools but about the style and rhetoric needed to speak through these other means. I can’t imagine any “Blogging Sociology” courses in grad schools anytime soon but Jurgenson is bringing up a familiar complaint: academics sometimes have difficulty making their case to people who are not academics.

3. Jurgenson doesn’t really get at this but these new tools also mean that data, not just writing, can be shared more widely. This could also become an important piece of a more open academia.

4. The idea that academic writing should or could be fun is intriguing. How many academics could pull this off? Might this reduce the gravitas of academic research?