Sociologist receives award in part for one article being cited over 24,000 times

Mark Granovetter’s 1973 article “The Strength of Weak Ties” is a sociological classic and still is cited frequently in top sociology journals (see 2012 data here). This impressive number of citations contributed to the naming of Granovetter as the recipient of an award:

Cited over 24,000 times, Granovetter’s 1973 paper “The Strength of Weak Ties” is a social science classic and a milestone in network theory. Our close friends are strongly in touch with us and each other, he wrote, but our acquaintances – weak ties – are crucial bridges to other densely knit clumps of close friends. The more weak ties we have, the more in touch we are with ideas, fashions, job openings and whatever else is going on in diverse and far-flung communities.

The award honors the late Everett M. Rogers, a former associate dean at the University of Southern California’s Annenberg School for Communication and Journalism and an influential communication scholar whose Diffusion of Innovations is the second-most cited book in the social sciences.  Presented since 2007 on behalf of USC Annenberg by its Norman Lear Center, the award recognizes outstanding scholars and practitioners whose work has contributed path-breaking insights in areas of Rogers’s legacy.

At the USC Annenberg School on Wednesday, September 18 at 12 noon, Granovetter will present “The Strength of Weak Ties” Revisited.  He will discuss how he came to write it; where it fits in the history of social network analysis; how its argument has held up over the years; and its significance in recent social revolutions, where it’s often been claimed that social networks are at the core of the new political developments.  The event is free and open to the public but RSVP is required. (RSVP is available online at: http://bit.ly/189ayDM)

There is no doubt that being cited over 24,000 times is impressive. Granovetter’s work has been utilized in multiple disciplines and came at the forefront of an explosion of research on social networks and their effects.
At the same time, the press release makes a big deal about citations twice while also highlighting Granovetter’s specific findings. Which is more important in the world of science today: the number of citations, a measure of importance, or the actual findings and how they pushed science forward? This award can contribute to existing debates about the importance of citations as a measure. What exactly do they tell us, and should we recognize those who are cited the most?

The value of using maps to see the rise and fall of Detroit

Here is a series of maps that show both the growth and decline of Detroit over its history. When looking at these maps, I’m reminded that it is quite difficult to talk about either the rise or decline of a major city just by discussing raw numbers, such as population increases or losses or economic figures, or by showing photographs. For example, we could talk about the rise of Houston in recent decades and contrast this with the sharp population decrease in Detroit. Moving past statistics, we could include photographs of a city. Detroit has been photographed many times in recent years, with often bleak scenes illustrating economic and social decline.

In some middle ground between numbers and photos and in-depth analysis (of which there does not seem to be much about Detroit recently – the mainstream media has primarily focused on short snippets of information) are maps. A good map has sufficient information to provide a top-down view of the city and give some indication of the city’s infrastructure. Additionally, it is much easier today to provide multiple layers of mapped information based on Census data and other sources. Growth is relatively easy to see as new streets and points of interest start showing up. On the other hand, decline might be harder to show as the streets may be empty and the points of interest might be decaying. Still, a current map shows the scope of the problem facing Detroit: population and economic decline plus a large chunk of land and structures that are difficult to maintain.

Altogether, I’m advocating for more widespread use of maps in reporting on and discussions about cities, whether they are struggling or thriving. Maps can help us move beyond seeing vacant houses or economic developments and take in the big picture all at once.

Wired’s five tips for “p-hacking” your way to a positive study result

As part of its “Cheat Code to Life,” Wired includes five tips for researchers to obtain positive results in their studies:

Many a budding scientist has found themself one awesome result from tenure and unable to achieve that all-important statistical significance. Don’t let such setbacks deter you from a life of discovery. In a recent paper, Joseph Simmons, Leif Nelson, and Uri Simonsohn describe “p-hacking”—common tricks that researchers use to fish for positive results. Just promise us you’ll be more responsible when you’re a full professor. —MATTHEW HUTSON

Create Options. Let’s say you want to prove that listening to dubstep boosts IQ (aka the Skrillex effect). The key is to avoid predefining what exactly the study measures—then bury the failed attempts. So use two different IQ tests; if only one shows a pattern, toss the other.

Expand the Pool. Test 20 dubstep subjects and 20 control subjects. If the findings reach significance, publish. If not, run 10 more subjects in each group and give the stats another whirl. Those extra data points might randomly support the hypothesis.

Get Inessential. Measure an extraneous variable like gender. If there’s no pattern in the group at large, look for one in just men or women.

Run Three Groups. Have some people listen for zero hours, some for one, and some for 10. Now test for differences between groups A and B, B and C, and A and C. If all comparisons show significance, great. If only one does, then forget about the existence of the p-value poopers.

Wait for the NSF Grant. Use all four of these fudges and, even if your theory is flat wrong, you’re more likely than not to confirm it—with the necessary 95 percent confidence.

This might be summed up as “things that are done but would never be explicitly taught in a research methods course.” Several quick thoughts:

1. This is a reminder of how important 95% significance is in the world of science. My students often ask why the cut-point is 95% – why do we accept 5% error and not 10% (which people sometimes “get away with” in some studies) or 1% (wouldn’t we be more sure of our results?).

2. Even if significance is important and scientists hack their way to more positive results, they can still have humility about their findings. Reaching 95% significance still means there is a 5% chance of error. Problems arise when findings are countered or disproven, but we should expect this to happen occasionally. Additionally, results can be statistically significant but have little substantive significance. Altogether, having a significant finding is not the end of the process for the scientist: it still needs to be interpreted and then tested again.

3. This is also tied to the pressure of needing to find positive results. In other words, publishing an academic study is more likely if you reject the null hypothesis. At the same time, failing to reject the null hypothesis is still useful knowledge, and such studies should also be published. Think of the example of Edison’s quest to find the proper material for a lightbulb filament. The story is often told in such a way as to suggest that he went through a lot of work to finally find the right answer. But this is often how science works: you go through a lot of ideas and data before the right answer emerges.
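The inflation these tricks produce is easy to demonstrate. Here is a minimal Monte Carlo sketch of the “Expand the Pool” trick: test 20 subjects per group, and if the result misses significance, add 10 more per group and test again. All numbers are invented, and the p-value uses a normal approximation rather than an exact t-distribution, but the qualitative point holds: peeking twice pushes the false-positive rate above the nominal 5%.

```python
import math
import random

def two_sample_p(a, b):
    """Approximate two-sided p-value for a two-sample test of equal means.
    Uses a normal approximation, which is reasonable at these sample sizes."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
trials, hacked_hits, honest_hits = 5000, 0, 0
for _ in range(trials):
    # both groups drawn from the same distribution: the null is TRUE
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    if two_sample_p(a[:20], b[:20]) < 0.05:
        honest_hits += 1      # significant on the first, honest look
        hacked_hits += 1
    elif two_sample_p(a, b) < 0.05:
        hacked_hits += 1      # "significant" only after adding 10 per group

print(f"honest false-positive rate: {honest_hits / trials:.3f}")  # ~0.05
print(f"hacked false-positive rate: {hacked_hits / trials:.3f}")
```

Stacking the other tricks on top (extra measures, subgroup splits, multiple group comparisons) compounds the inflation in the same way, which is how the quoted piece gets to “more likely than not to confirm it.”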

Comparing ethnography and journalism, stories vs. data

A letter writer to the New York Times unfavorably compares ethnography and social scientific methods to journalism:

David Brooks’s review of George Packer’s book “The Unwinding: An Inner History of the New America” (June 9) is befuddling. First, Brooks praises Packer’s “gripping narrative survey” of recession-era life, comparing it to earlier efforts like that of John Dos Passos. Then, bizarrely, he faults Packer for not providing a “theoretical framework and worldview” that would include “sociology, economics or political analysis.” Narrative description and evocation has for centuries been among the most powerful forms of argument — so powerful, in fact, that the social psychologists Brooks admires appropriated the styles and cloaked them in the pseudoscientific garb of “ethnography” (which we used to call “journalism”).

Have we reached a point where devotion to instrumental reason is so maniacal we can’t handle mere stories anymore? Or perhaps we accept stories only when they’re accompanied by the tenuous methodology of social “scientists.” I would bet that a single profile by Packer, one of America’s best journalists, provides a better snapshot of real life than the legions of sociology and economics articles published since the crash.

Here is someone suspicious of social science. This is not an uncommon position. There is no doubt that stories and narratives are powerful and have a longer history than the social sciences, which developed in and after the Enlightenment. Yet we also live in a world where science and data have become powerful arguments.

Intriguingly, ethnography is a social scientific method that might help bridge this gap between narrative and data. This method differs from journalism in some important ways but also shares some similarities. The ethnographer doesn’t just work with statistics and data from a distance or through a few interviews. Through an extended engagement with the research subject, even living with the subjects for months or years, the researcher gets an insider perspective while also trying to maintain objectivity. The participant observer is engaged with larger social science theories and ideas, trying to understand how more specific experiences and groups line up with larger theories and models. The research case is of interest but the connection to the bigger picture is very important in the end. At the same time, ethnographies are often written in a more narrative style than social science journal articles (unless we are talking about journal articles utilizing ethnography).

Stories and data can both be illuminating. I know which side I tend to favor, hence I’m a sociologist, but I also enjoy narratives and “mere stories.”

Can you name “America’s 50 Healthiest Counties for Kids” when you only account for 38% of US counties?

US News & World Report recently released a list of “America’s 50 Healthiest Counties for Kids.” However, there is a problem with the rankings: more than half of American counties aren’t included in the data.

About 1,200 of the nation’s 3,143 counties (a total that takes in county equivalents such as Louisiana’s parishes) were evaluated for the rankings. Many states don’t collect county-level information on residents’ health, whereas populous states, such as California, Florida and New York, tend to gather and report more data. In some counties, the population is so small that the numbers are unreliable, or the few events fall below state or federal reporting thresholds. And because states don’t collect county-level information on childhood smoking and obesity, the rankings incorporated percentages for adults. Catlin says this is justified because more adult smokers mean more children are exposed to secondhand smoke, a demonstrated health risk. Studies have also shown a moderately strong correlation between adult and childhood obesity, she says.

The experts who study community health yearn for more and better data. “We don’t have county-level data on kids with diabetes, controlled or uncontrolled, or on childhood obesity rates,” says Ali Mokdad of the Institute for Health Metrics and Evaluation at the University of Washington. “Almost every kid in this country goes to school. We could measure height and weight, but nobody’s connecting the dots.”

This won’t stop counties high on the list from touting their position. See this Daily Herald article about DuPage County coming in at #20. But, there should be some disclaimer on this list if a majority of US counties aren’t even considered. Or, perhaps such a list shouldn’t be put together at all.

Methodological issues with the “average” American wedding costing $27,000

Recent news reports suggest the average American wedding costs $27,000. But, there may be some important methodological issues with this figure: selection bias and using an average rather than a median.

The first problem with the figure is what statisticians call selection bias. One of the most extensive surveys, and perhaps the most widely cited, is the “Real Weddings Study” conducted each year by TheKnot.com and WeddingChannel.com. (It’s the sole source for the Reuters and CNN Money stories, among others.) They survey some 20,000 brides per annum, an impressive figure. But all of them are drawn from the sites’ own online membership, surely a more gung-ho group than the brides who don’t sign up for wedding websites, let alone those who lack regular Internet access. Similarly, Brides magazine’s “American Wedding Study” draws solely from that glossy Condé Nast publication’s subscribers and website visitors. So before they do a single calculation, the big wedding studies have excluded the poorest and the most low-key couples from their samples. This isn’t intentional, but it skews the results nonetheless.

But an even bigger problem with the average wedding cost is right there in the phrase itself: the word “average.” You calculate an average, also known as a mean, by adding up all the figures in your sample and dividing by the number of respondents. So if you have 99 couples who spend $10,000 apiece, and just one ultra-wealthy couple splashes $1 million on a lavish Big Sur affair, your average wedding cost is almost $20,000—even though virtually everyone spent far less than that. What you want, if you’re trying to get an idea of what the typical couple spends, is not the average but the median. That’s the amount spent by the couple that’s right smack in the middle of all couples in terms of its spending. In the example above, the median is $10,000—a much better yardstick for any normal couple trying to figure out what they might need to spend.

Apologies to those for whom this is basic knowledge, but the distinction apparently eludes not only the media but some of the people responsible for the surveys. I asked Rebecca Dolgin, editor in chief of TheKnot.com, via email why the Real Weddings Study publishes the average cost but never the median. She began by making a valid point, which is that the study is not intended to give couples a barometer for how much they should spend but rather to give the industry a sense of how much couples are spending. More on that in a moment. But then she added, “If the average cost in a given area is, let’s say, $35,000, that’s just it—an average. Half of couples spend less than the average and half spend more.” No, no, no. Half of couples spend less than the median and half spend more.

When I pressed TheKnot.com on why they don’t just publish both figures, they told me they didn’t want to confuse people. To their credit, they did disclose the figure to me when I asked, but this number gets very little attention. Are you ready? In 2012, when the average wedding cost was $27,427, the median was $18,086. In 2011, when the average was $27,021, the median was $16,886. In Manhattan, where the widely reported average is $76,687, the median is $55,104. And in Alaska, where the average is $15,504, the median is a mere $8,440. In all cases, the proportion of couples who spent the “average” or more was actually a minority. And remember, we’re still talking only about the subset of couples who sign up for wedding websites and respond to their online surveys. The actual median is probably even lower.

These are common issues with figures reported in the media. Indeed, these are two questions the average reader should ask when seeing a statistic like the average cost of a wedding:

1. How was the data collected? If this journalist is correct about these wedding cost studies, then this data is likely very skewed. What we would want to see is a more representative sample of weddings rather than having subscribers or readers volunteer how much their wedding cost.

2. What statistic is reported? Confusing the mean and median is a big problem and pops up with issues as varied as the average vs. median college debt, the average vs. median credit card debt, and the average vs. median square footage of new homes. This journalist is correct to point out that the media should know better and shouldn’t get the two confused. However, reporting a higher average with skewed data tends to make the number more sensationalistic. It also wouldn’t hurt to have more media consumers know the difference and adjust accordingly.

It sounds like the median wedding cost would likely be significantly lower than the $27,000 bandied about in the media if some basic methodological questions were asked.
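To make the arithmetic concrete, here is the journalist’s own hypothetical (99 couples spending $10,000 each plus one $1 million wedding) run through Python’s statistics module:

```python
from statistics import mean, median

# the article's hypothetical sample: 99 modest weddings plus one outlier
costs = [10_000] * 99 + [1_000_000]

print(mean(costs))    # the "average": $19,900, nearly double what 99 of 100 couples paid
print(median(costs))  # the median: $10,000, what the couple in the middle actually spent
```

One extreme value pulls the mean far above what virtually everyone spent, while the median is untouched, which is exactly why the median is the better yardstick for a typical couple.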

American median income, poverty rates, and inequality by county

Check out these maps of American inequality and income using the latest American Community Survey data.

The below five maps were created by Calvin Metcalf, Kyle Box and Laura Evans using the latest five-year American Community Survey estimates provided by the Census Bureau for last weekend’s National Day of Civic Hacking (we’re geeking out on these projects this week).

Working from Boston, the group has so far mapped nearly a dozen demographic points from the data, including a few they calculated on their own (be sure to check out the very bizarre map of America’s gender ratios by county). These five maps, however, jumped out at us for how they each illustrate deep and lingering differences between the American North and South, as seen through several different data points. Of course, the patterns aren’t perfect, and exceptions abound; major cities in the North turn out to be hotspots of inequality on par with much of the Deep South…

Median income (in annual dollars)

Population living below the poverty line (by percent)

Income inequality (as measured by the Gini coefficient, the closer to zero the better)

There do seem to be some regional differences. But, these three maps raise other questions:

1. This may be a good place for population weighted maps. While counties are one unit of geographic measure, they can obscure finer-grained data. For example, the map of median income shows higher incomes in urban areas but this glosses over poor urban and suburban neighborhoods. Plus, many of the counties in the South, Great Plains, and Mountain West have relatively fewer people.

2. The income map shows one story – generally higher incomes in urban areas – and the inequality map, based on the Gini coefficient, shows that these same urban areas have high levels of inequality. This may be an issue with the county measure, but it also highlights that while cities are economic engines, they are also home to pronounced inequality.
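For readers unfamiliar with the Gini coefficient these maps rely on, here is a minimal sketch: it is the mean absolute difference between every pair of incomes, scaled by twice the mean income. Zero means everyone earns the same; values approaching one mean extreme concentration. The two ten-person “counties” below are invented for illustration.

```python
def gini(incomes):
    """Gini coefficient via pairwise absolute differences (O(n^2), fine for a sketch)."""
    n = len(incomes)
    mean_income = sum(incomes) / n
    total_diff = sum(abs(x - y) for x in incomes for y in incomes)
    return total_diff / (2 * n * n * mean_income)

equal_county = [50_000] * 10
unequal_county = [20_000] * 9 + [1_000_000]  # one very high earner

print(round(gini(equal_county), 3))    # 0.0 — perfect equality
print(round(gini(unequal_county), 3))  # ~0.75 — heavy concentration at the top
```

A county can have a high median income and a high Gini at the same time, which is the pattern the maps show for major Northern cities.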

Adding creative endeavors to GDP

The federal government is set to change how it measures GDP and the new measure will include creative work:

The change is relatively simple: The BEA will incorporate into GDP all the creative, innovative work that is the backbone of much of what the United States now produces. Research and development has long been recognized as a core economic asset, yet spending on it has not been included in national accounts. So, as the Wall Street Journal noted, a Lady Gaga concert and album are included in GDP, but the money spent writing the songs and recording the album are not. Factories buying new robots counted; Pfizer’s expenditures on inventing drugs were not.

As the BEA explains, it will now count “creative work undertaken on a systematic basis to increase the stock of knowledge, and use of this stock of knowledge for the purpose of discovering or developing new products, including improved versions or qualities of existing products, or discovering or developing new or more efficient processes of production.” That is a formal way of saying, “This stuff is a really big deal, and an increasingly important part of the modern economy.”

The BEA estimates that in 2007, for example, adding in business R&D would have added 2 percent to U.S. GDP, or about $300 billion. Adding in the various inputs into creative endeavors such as movies, television and music will mean an additional $70 billion. A few other categories bring the total addition to over $400 billion. That is larger than the GDP of more than 160 countries…

The new framework will not stop the needless and often harmful fetishizing of these numbers. GDP is such a simple round number that it is catnip to commentators and politicians. It will still be used, incorrectly, as a proxy for our economic lives, and it will still frame our spending decisions more than it should. Whether GDP is up 2 percent or down 2 percent affects most people minimally (down a lot, quickly, is a different story). The wealth created by R&D that was statistically less visible until now benefited its owners even though the figures didn’t reflect that, and faster GDP growth today doesn’t help a welder when the next factory will use a robot. How wealth is used, who benefits from it and whether it is being deployed for sustainable future growth, that is consequential. GDP figures, even restated, don’t tell us that.

On one hand, changing a measure so that it more accurately reflects the economy is a good thing. This could help increase the validity of the measure. On the other hand, measures still can be used well or poorly, the change may not be a complete improvement over previous measures, and it may be difficult to reconcile new figures with past figures. It is not quite as easy as simply “improving” a measure; a lot of other factors are involved. It will be interesting to see how this measurement change sorts out in the coming years and how the information is utilized.

Is this meaningful data: Chicago the “slowest-growing major city” between 2011 and 2012?

New figures from the Census show that Chicago doesn’t fare well compared to other cities in recent population growth:

Chicago gained nearly 10,000 people from July 2011 to July 2012, but was the slowest-growing major city in the country according to U.S. Census Bureau estimates released Thursday.

It was the second year in a row that population grew here, but the increase so far shows no signs of making up for the loss of 200,000 people over the previous decade…

Among cities with more than one million people, sun-belt metropolises like Dallas, San Antonio, Phoenix, Houston and San Diego all posted gains of more than 1.3 percent, while Chicago grew by little more than one-third of 1 percent.

With a total estimated population of 2,714,856, Chicago held on to its spot as the third-largest city. But the two largest cities padded their leads, with New York City adding 67,000 in 2012 and No. 2 Los Angeles gaining 34,000 people.

While I’m sure some will use these figures to judge Chicago’s politics and development efforts, I’m not sure these figures mean anything. Here’s why:

1. The data only cover one year. This is just one time point. The story does a little bit to provide a wider context by referencing the 2000-2010 population figures, but it would also be helpful to know the year-to-year figures for the last several years. In other words, what is the trend in the last several years in Chicago? Are the nearly 10,000 new people much different from the gains in 2011 or 2010 or 2009?

2. These are population estimates meaning there is a margin of error for the estimate. Thus, that error might cover a decent amount of population growth in all of these cities.

In the end, we need more data over time to know whether there are long-term trends going on in these major cities.
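A quick back-of-the-envelope using the figures quoted above shows why the margin-of-error point matters. The ±1 percent margin of error below is a hypothetical round number, not the Census Bureau’s published figure, but even a modest error band dwarfs a gain this small.

```python
# Chicago's 2012 estimate and one-year gain, from the Census figures quoted above
pop_2012 = 2_714_856
gain = 10_000
pop_2011 = pop_2012 - gain

growth_rate = gain / pop_2011
print(f"growth rate: {growth_rate:.2%}")  # ~0.37%, the "one-third of 1 percent"

moe = 0.01  # hypothetical +/-1% margin of error on each year's estimate
uncertainty = moe * pop_2012
print(f"a +/-1% margin of error is +/-{uncertainty:,.0f} people, "
      f"several times the reported gain")
```

If the uncertainty in each year’s estimate is larger than the year-over-year change, we simply cannot say much about a single year’s “growth.”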

Two other interesting notes from the Census data:

1. The population growth in the Sunbelt continues:

Eight of the 15 fastest-growing large U.S. cities and towns for the year ending July 1, 2012 were in Texas, according to population estimates released today by the U.S. Census Bureau. The Lone Star State also stood out in terms of the size of population growth, with five of the 10 cities and towns that added the most people over the year…

No state other than Texas had more than one city on the list of the 15 fastest-growing large cities and towns. However, all but one were in the South or West.

This fits with what Joel Kotkin has been saying for a while.

2. Many Americans continue to live in communities with fewer than 50,000 people:

Of the 19,516 incorporated places in the United States, only 3.7 percent (726) had populations of 50,000 or more in 2012.

However, many of these smaller communities are suburbs near big cities. It’s too bad there aren’t figures here about what percentage of Americans live in those 726 communities of 50,000 or more.

Defining what makes for a luxury home

Here is how one data firm defines what it means to be a luxury housing unit:

Although upscale housing is selling better in some cities than in others, a monthly analysis by the Altos Research data firm for the Institute for Luxury Home Marketing says that overall, that segment of the market is gaining momentum and prices are rising…

Q: “Luxury home” is probably one of the most abused phrases in real estate-ese. How do you define it?

A: A price range that’s considered the high end of the market in one place might be something that’s average in another. So, “luxury” is local: Our organization generally defines it as the top 10 percent of an area’s sales in the past 12 months. But for the purposes of the research that we do with Altos for our monthly Luxury Market Report, we’ve taken the ZIP codes within each of 31 markets that have the highest median prices, and for about five years we’ve tracked the sales of homes in those (areas) that are $500,000 and above.

There are two techniques proposed here:

1. The highest 10 percent of a local housing market. Thus, the prices are all relative and the data is based on the highest end in each place. So, there could be some major differences in luxury prices across zip codes or metropolitan regions.

2. Breaking it down first by geography to the wealthiest places (so this is based on geographic clustering) and then setting a clear cut point at $500,000. In these wealthiest zip codes, wouldn’t most of the units be over $500,000? Why the 31 wealthiest markets and not 20 or 40?

Each of these approaches has strengths and weaknesses, but I imagine the data here could change quite a bit based on what operationalization is utilized.
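The first definition is straightforward to operationalize: take the 90th percentile of a market’s sale prices over the past 12 months and call everything above it luxury. Here is a sketch with invented, log-normally distributed sale prices; real work would of course draw on a local MLS feed.

```python
import random
from statistics import quantiles

random.seed(1)
# hypothetical 12 months of sale prices in one market (log-normal, so right-skewed)
sales = [random.lognormvariate(12.5, 0.4) for _ in range(500)]

# definition 1: "luxury" is the top 10 percent, i.e. everything above the 90th percentile
luxury_cutoff = quantiles(sales, n=10)[-1]
luxury_sales = [p for p in sales if p > luxury_cutoff]

print(f"luxury threshold: ${luxury_cutoff:,.0f}")
print(f"luxury share of sales: {len(luxury_sales) / len(sales):.0%}")
```

Because the cutoff is relative, running this on two different markets gives two different dollar thresholds, which is exactly why “luxury is local” under this definition and why the second definition’s flat $500,000 cutoff behaves so differently across regions.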

Interestingly, the firm found that luxury sales rebounded quicker than the rest of the market:

The interesting thing about this recovery is that the luxury segment, that group of affluent households, was able to recover fairly quickly. They shifted their assets around, and a lot of them were able to see opportunities in the down market. By 2010, there were almost as many high-end households as before the downturn, not just in the United States, but internationally, as well. This group focused on residential real estate as a pretty desirable asset — for them, a second or third home turned out to be a portfolio play.

This shouldn’t be too surprising – when an economic crisis hits, the wealthier members of society have more of a cushion. While the upper end is doing all right, others have argued that the bottom end of the market, those looking for starter homes, is having a tougher time.