Call for more comparative study of poor urban neighborhoods using new techniques

Urban sociologist Mario Small recently argued that sociologists and others need to adopt new approaches to studying poor urban neighborhoods:

Small, who is also dean of UChicago’s Division of the Social Sciences, has studied the diversity of experiences of people living in poor neighborhoods in cities across the country.

Studying only a few neighborhoods extensively fails to capture important differences, he said in a talk, “Poverty and Organizational Density,” at a session Feb. 15 at the annual meeting of the American Association for the Advancement of Science in Chicago…

“The experience of poverty varies from city to city, influenced by neighborhood factors such as commercial activity, access to transportation and social services, and other facets of organizational density,” Small said.

He explained that new sources of information, ranging from open city data to detailed, high-resolution imagery from commercial mapping services, provide new opportunities to compare the experience of the poor among multiple cities, in turn pointing cities and service providers toward optimal decision-making about policies, investment, or other interventions.

The first suggestion is driven by changes in technology: the ability to collect big data. This can help sociologists and others go beyond surveys and neighborhood observations. Robert Sampson does some of this in Great American City, mapping the social networks and neighborhood moves of residents from poorer neighborhoods. Big data will enable us to go even further.

The second suggestion, however, is something that sociologists could have been doing for decades. Poor neighborhoods in certain cities tend to get the lion’s share of attention: places like Chicago, New York City, Boston, and Philadelphia. In contrast, poor neighborhoods in places like Dallas, Miami, Seattle, Denver, and Las Vegas get a lot less attention. Perhaps I should return to a presentation I made years ago at the Society for the Study of Social Problems about this very topic, where I suggested some key factors that led to this lack of comparative study…

Argument: businesses should use scientific method in studying big data

Sociologist Duncan Watts explains how businesses should go about analyzing big data:

A scientific mind-set takes as its inspiration the scientific method, which at its core is a recipe for learning about the world in a systematic, replicable way: start with some general question based on your experience; form a hypothesis that would resolve the puzzle and that also generates a testable prediction; gather data to test your prediction; and finally, evaluate your hypothesis relative to competing hypotheses.

The scientific method is largely responsible for the astonishing increase in our understanding of the natural world over the past few centuries. Yet it has been slow to enter the worlds of politics, business, policy, and marketing, where our prodigious intuition for human behavior can always generate explanations for why people do what they do or how to make them do something different. Because these explanations are so plausible, our natural tendency is to want to act on them without further ado. But if we have learned one thing from science, it is that the most plausible explanation is not necessarily correct. Adopting a scientific approach to decision making requires us to test our hypotheses with data.

While data is essential for scientific decision making, theory, intuition, and imagination remain important as well—to generate hypotheses in the first place, to devise creative tests of the hypotheses that we have, and to interpret the data that we collect. Data and theory, in other words, are the yin and yang of the scientific method—theory frames the right questions, while data answers the questions that have been asked. Emphasizing either at the expense of the other can lead to serious mistakes…

Even here, though, the scientific method is instructive, not for eliciting answers but rather for highlighting the limits of what can be known. We can’t help asking why Apple became so successful, or what caused the last financial crisis, or why “Gangnam Style” was the most viral video of all time. Nor can we stop ourselves from coming up with plausible answers. But in cases where we cannot test our hypothesis many times, the scientific method teaches us not to infer too much from any one outcome. Sometimes the only true answer is that we just do not know.

To summarize: the scientific method provides ways to ask questions and gather data to answer them. It is not perfect – it doesn’t always produce the answer people are looking for, it may only be as good as the questions asked, and it requires a rigorous methodology – but it can help push forward the development of knowledge.
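As a toy illustration of that recipe – question, hypothesis, testable prediction, data, and evaluation against a competing hypothesis – here is a simulated A/B test. Everything below is invented: the conversion rates are made-up stand-ins for data a business would actually gather.

```python
import random

random.seed(0)

# Question: does version B of a checkout page convert better than version A?
# Hypothesis: yes. Testable prediction: B's conversion rate will exceed A's
# by more than chance alone would produce. (The data below are simulated.)
N = 2000
a = [1 if random.random() < 0.10 else 0 for _ in range(N)]  # A: ~10% convert
b = [1 if random.random() < 0.13 else 0 for _ in range(N)]  # B: ~13% convert

observed_diff = sum(b) / N - sum(a) / N

# Competing (null) hypothesis: A and B convert equally and the observed gap
# is luck. Evaluate it by repeatedly shuffling the pooled outcomes and
# counting how often chance alone yields a gap at least as large.
pooled = a + b
extreme = 0
TRIALS = 1000
for _ in range(TRIALS):
    random.shuffle(pooled)
    diff = sum(pooled[N:]) / N - sum(pooled[:N]) / N
    if diff >= observed_diff:
        extreme += 1

p_value = extreme / TRIALS
print(f"observed lift: {observed_diff:.3f}, p-value: {p_value:.3f}")
```

The shuffle test at the end embodies the final step of the recipe: rather than acting on the plausible explanation (“B is better”), it checks how often sheer chance would produce the same gap.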

While there are businesses and policymakers using such approaches, it strikes me that such an argument for the scientific method is especially needed in the midst of big data and gobs of information. In today’s world, getting information is not a problem. Individuals and companies can quickly find or measure lots of data. However, it still takes work and proper methodology to interpret that data.

Who wants to be in the “McMansion and minivans” category?

Big data makes it possible to slice up Americans into all sorts of consumer categories like “McMansions and minivans.” However, how many would want to be in that category?

Acxiom provides “premium proprietary behavioral insights” that “number in the thousands and cover consumer interests ranging from brand and channel affinities to product usage and purchase timing.” In other words, Acxiom creates profiles, or digital dossiers, about millions of people, based on the 1,500 points of data about them it claims to have. These data might include your education level; how many children you have; the type of car you drive; your stock portfolio; your recent purchases; and your race, age, and education level. These data are combined across sources—for instance, magazine subscriber lists and public records of home ownership—to determine whether you fit into a number of predefined categories such as “McMansions and Minivans” or “adult with wealthy parent.” Acxiom is then able to sell these consumer profiles to its customers, who include twelve of the top fifteen credit card issuers, seven of the top ten retail banks, eight of the top ten telecom/media companies, and nine of the top ten property and casualty insurers.

Acxiom may be one of the largest data brokers, but it represents a dramatic shift in the way that personal information is handled online. The movement toward “Big Data,” which uses computational techniques to find social insights in very large groupings of data, is rapidly transforming industries from health care to electoral politics. Big Data has many well-known social uses, for example by the police and by managers aiming to increase productivity. But it also poses new challenges to privacy on an unprecedented level and scale. Big Data is made up of “little data,” and these little data may be deeply personal.

This is not new, though the amount of data advertisers and others have – often given voluntarily on the Internet – may have increased in recent years. What might be more interesting, given that this is happening, is to present Americans with the categories they are in and see how they react. Neither McMansions nor minivans have very good reputations. McMansions are seen as ugly houses owned by people who just want to make a splash, not own a quality house or participate in a close-knit community. Minivans signify suburban parents schlepping kids from place to place. Think of the Toyota commercials from a few years back that tried to make owning a minivan cool. Put together these two functional objects that also serve as status markers and I suspect many people would not want to identify themselves as being in such an uncool group.

Yet, there are plenty of people in such a group. Drive through any well-to-do suburb and both the homes and the parking lots (lots of Toyota and Honda minivans as well as a range of upscale SUVs – does this category include “McMansions and SUVs”?) reveal a certain lifestyle built around home, kids, school, and safety. It may be derided by outsiders, and the people on the inside might not self-identify as such (they might object to being lumped in a group – we Americans are individuals, after all), but these are fairly popular choices to which marketers and businesses can then cater.

Argument: scientists need help in handling big data

Collecting, analyzing, and interpreting big data may just be a job that requires more scientists:

For projects like NEON, interpreting the data is a complicated business. Early on, the team realized that its data, while mid-size compared with the largest physics and biology projects, would be big in complexity. “NEON’s contribution to big data is not in its volume,” said Steve Berukoff, the project’s assistant director for data products. “It’s in the heterogeneity and spatial and temporal distribution of data.”

Unlike the roughly 20 critical measurements in climate science or the vast but relatively structured data in particle physics, NEON will have more than 500 quantities to keep track of, from temperature, soil and water measurements to insect, bird, mammal and microbial samples to remote sensing and aerial imaging. Much of the data is highly unstructured and difficult to parse — for example, taxonomic names and behavioral observations, which are sometimes subject to debate and revision.

And, as daunting as the looming data crush appears from a technical perspective, some of the greatest challenges are wholly nontechnical. Many researchers say the big science projects and analytical tools of the future can succeed only with the right mix of science, statistics, computer science, pure mathematics and deft leadership. In the big data age of distributed computing — in which enormously complex tasks are divided across a network of computers — the question remains: How should distributed science be conducted across a network of researchers?

Two quick thoughts:

1. There is a lot of potential here for crossing disciplinary boundaries to tackle big data projects. This isn’t just about parceling out individual pieces of the project; bringing multiple perspectives together could lead to an improved final outcome.

2. I wonder if sociologists aren’t particularly well-suited for this kind of big data work. Given our emphasis on theory and methods, we stress both the big picture and how to effectively collect, analyze, and interpret data. Sociology students could step into such projects and provide needed insights.

Chicago Tribune editorial against “survey mania”

The Chicago Tribune takes a strong stance against “survey mania.”

Question 1: Do you find that being pelted by survey requests from your bank, cable company, doctor, insurance agent, landlord, airline, phone company — and so on — is annoying and intrusive?

Question 2: Do you ignore all online and phone requests for survey responses because, well, your brief encounter with a bank teller doesn’t really warrant a 15-minute exegesis on the endearing time you spent together?

Question 3: Don’t you wish that virtually every company in America hadn’t succumbed to survey mania at the same time, so that you’d feel, well, a little more special when each request for your precious thoughts pings into your email?

Question 4: Do you wish that companies would spend a little less on surveys and a little more on customer service staff, so that callers would not be held captive by soul-sucking, brain-scorching, automated answering systems in which a chirpy-voiced robot only grudgingly ushers your call — “which is very important to us, which is still very important to us” — to a human being?

Question 5: Do you agree that blogger Greg Reinacker laid out some reasonable guidelines for companies that send surveys to customers: “Tell me how long it’s going to take. Even better, tell me exactly how many questions there will be. … Don’t ask me the same question three different ways just to see if I’m consistent. … If you really, really want me to take the survey, offer me something. I’m a sucker for free stuff. And a drawing probably won’t do it.”

Question 6: Do you think companies should be aware that a pleasant experience — a flight, a hotel stay, a cruise — can be retroactively tainted by an exhausting survey and all those nagging email reminders that you haven’t yet filled it out?

Question 7: Do you find it irritating when a salesperson tries to game the system by reminding you over and over that only an excellent rating for his or her service will suffice … before said service has been rendered to you?

Question 8: Do you agree that there are ample opportunities to put in a good word for, say, an excellent waiter or sales clerk or customer service agent (just ask to speak to his or her supervisor!), which is much more sincere than you unhappily trudging through a long multiple-choice online questionnaire?

Question 9: Are you aware that marketing professors tell us that these surveys can be vitally important for companies to improve their service and that employee bonuses and other incentives hinge on whether you rate their service highly or not? We’re dubious, too, but just in case it’s true … would you please tell our boss how great you think this editorial is? Use all the space you need.

We get it – some people think they are being asked to do too many surveys. At the same time, this hints at some larger issues with surveys:

1. Companies and organizations would love to have more data. This reminds me of part of the genius of Facebook – people voluntarily give up their data because they get something out of it (the chance to maintain relationships with people they know).

2. Some of these problems listed above could be fixed easily. Take #7. Salespeople can be too pushy in trying to get data.

3. Some of the suggestions in #5 could be done easily while others are harder. It should be common practice to tell survey takers how long the survey might take. But asking about a topic multiple ways is often important to see if people answer consistently – a check on the reliability of the data.

4. I think more consumers would like to receive more for participating in surveys. This could take the form of incentives, from free or discounted products to special opportunities. At the least, they don’t want to feel used or like just another data point.

5. Survey fatigue is a growing problem. This makes collecting data more difficult for everyone, including academic researchers.

Altogether, I don’t think the quest for survey data is going to end soon because customer and consumer info is so valuable for businesses and organizations. But approaching consumers for data can be done in better or worse ways. To get good data – not just some data – organizations need to offer consumers something worthwhile in return.

Jobs available for those who can analyze big data

Now that there is plenty of big data available, companies are looking for employees to analyze the data:

By 2018, the United States might face a shortfall of about 35 percent in the number of people with advanced training in statistics and other disciplines who can help companies realize the potential of digital information generated from their own operations as well as from suppliers and customers, according to McKinsey & Co…

Workers in big data are hard to come by in the short term. A recent survey by CareerBuilder, an affiliate of Tribune Co., which also owns the Chicago Tribune, found that “jobs tied to managing and interpreting big data” were among the “hot areas for hiring” in the second half of 2013…

Dhingra pointed out that the McKinsey report, in addition to citing a shortage of 140,000 to 190,000 qualified data scientists in coming years, also said there will be a need for 1.5 million executives and support staff who understand data.

Mu Sigma’s entry-level trainee professionals go through “an intense recruitment program” that includes aptitude tests to determine who has a “quantitative bent of mind”; group discussion, to spot individuals who can present and back their views and listen to feedback; and a “synthesis” test in which a candidate is shown a video and then asked to identify the key message. If they make it through those rounds, they undergo several personal interviews, a process that includes “props and interesting puzzles and case studies.”

Once a decision scientist trainee is recruited, they go through Mu Sigma University, where they learn such skills as the basics of consulting, the “art of problem solving” and the “art of insight generation.” They also take advanced statistics and are taught about machine learning, natural language processing and visualization, along with behavioral sciences and such big data technologies as Hadoop, Mahout and Cassandra.

The numbers don’t just interpret themselves. It is amazing how much data is available these days, but people are still needed to figure out what it all means. Being able to do the conceptual and software work that goes into analyzing data can go a long way these days…

Using algorithms to judge cultural works

Imagine the money that could be made or the status acquired if algorithms could correctly predict the merit of cultural works:

The budget for the film was $180m and, Meaney says, “it was breathtaking that it was under serious consideration”. There were dinosaurs and tigers. It existed in a fantasy prehistory—with a fantasy language. “Preposterous things were happening, without rhyme or reason.” Meaney, who will not reveal the film’s title because he “can’t afford to piss these people off”, told the studio that his program concurred with his own view: it was a stinker.

The difference is the program puts a value on it. Technically a neural network, with a structure modelled on that of our brain, it gradually learns from experience and then applies what it has learnt to new situations. Using this analysis, and comparing it with data on 12 years of American box-office takings, it predicted that the film in question would make $30m. With changes, Meaney reckoned they could increase the take—but not to $180m. On the day the studio rejected the film, another one took it up. They made some changes, but not enough—and it earned $100m. “Next time we saw our studio,” Meaney says, “they brought in the board to greet us. The chairman said, ‘This is Nick—he’s just saved us $80m.’”…

But providing a service that adapts to individual humans is not the same as becoming like a human, let alone producing art like humans. This is why the rise of algorithms is not necessarily relentless. Their strength is that they can take in that information in ways we cannot quickly understand. But the fact that we cannot understand it is also a weakness. It is worth noting that trading algorithms in America now account for 10% fewer trades than they did in 2009.

Those who are most sanguine are those who use them every day. Nick Meaney is used to answering questions about whether computers can—or should—judge art. His answer is: that’s not what they’re doing. “This isn’t about good, or bad. It is about numbers. These data represent the law of absolute numbers, the cinema-going audience. We have a process which tries to quantify them, and provide information to a client who tries to make educated decisions.”…

Equally, his is not a formula for the perfect film. “If you take a rich woman and a poor man and crash them into an iceberg, will that film always make money?” No, he says. No algorithm has the ability to write a script; it can judge one—but only in monetary terms. What Epagogix does is a considerably more sophisticated version, but still a version, of noting, say, that a film that contains nudity will gain a restricted rating, and thereby have a more limited market.
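The kind of comparison Meaney describes – scoring a new script against a database of historical box-office takings – can be sketched in miniature. This is only a toy nearest-neighbor version, not Epagogix’s proprietary neural network, and every film, feature, and figure below is invented:

```python
# All films and "script features" here are made up for illustration; a real
# system would use far richer features and actual historical takings.
past_films = [
    # (budget_$m, action_scenes, star_power_0_to_10, box_office_$m)
    (180, 45, 9, 120),
    (150, 40, 8, 310),
    (60, 10, 6, 75),
    (25, 2, 4, 40),
    (90, 20, 7, 150),
    (110, 35, 5, 95),
    (15, 1, 2, 8),
    (70, 15, 8, 210),
]

def predict_takings(features, films, k=3):
    """Estimate revenue as the average takings of the k most similar films."""
    def distance(film):
        # Euclidean distance over the feature columns (everything but takings).
        return sum((a - b) ** 2 for a, b in zip(features, film[:-1])) ** 0.5
    nearest = sorted(films, key=distance)[:k]
    return sum(f[-1] for f in nearest) / k

# A hypothetical new $180m script, heavy on action, with big stars:
estimate = predict_takings((180, 40, 9), past_films, k=3)
print(f"estimated takings: ${estimate:.0f}m")
```

The point of the sketch matches Meaney’s framing: the model does not judge whether the script is good art, it only asks what comparable films have earned.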

The larger article suggests algorithms do better at predicting some human behaviors, such as purchasing consumer items, but not as well in other areas, like critical evaluations of cultural works. There are two ways this might go in the future. On one hand, some will argue this is just about collecting the right data or enough data. Perhaps we simply aren’t looking at the right things to correctly judge cultural products. On the other hand, some will argue that the value of an object may be too difficult for an algorithm to ever figure out. And even if a formula starts hinting at good or bad art, humans can change their minds and opinions – see all the various cultural, art, and music movements just in the last few hundred years.

There is a lot of money that could be made here. This might be the bigger issue with cultural works in the future: whether algorithms can evaluate them or not, does it matter if they are all commoditized?

Continuing political battles over Census data

Megan McArdle provides a reminder of the political nature of the Census:

If the Census is the key to political control, then you can expect parties to put more energy into gaming the census.  Arguably, you’re already seeing this: Republicans are now making their second attempt to defund the American Community Survey, which uses sampling to generate data between censuses.  The American Community Survey is not used for districting, but it is used for all manner of other policy purposes.

As the political fault lines harden in Congress, the battlegrounds are moving back to more hidden levers of policymaking.  There are the courts, of course: we’re now in the third decade of a mostly undeclared war to gain control of the Supreme Court and do some unelected legislating.  Data gathering and research funding are coming under fierce scrutiny.  And on the national security front, secrecy and executive orders seem to be the order of the day for whoever is in the White House.

Before you say it, no, this isn’t just Republicans.  But it’s not good on either side.  As the legislature has ceased being able to legislate, both parties almost have to resort to more undemocratic methods to achieve their goals.  The casualties, like judicial impartiality and good data for policymaking, are vastly more important than the causes for which this war is allegedly being fought.

To see more details of the recent Republican defunding attempt, see here.

Data is rarely impartial: the processes by which it is collected, interpreted, and then used in policy can be quite political. That doesn’t mean it has to be. Much of the grounding for social science is the idea that data can be collected and analyzed more objectively. Yet, within the realm of politics, where data is often a means to victory, having a good handle on data can go a long way, as we saw in the 2012 presidential election or currently in debates among Republicans about how to handle voter data.

In the end, it will be fascinating to see how big data, from the Census to Facebook, does or does not become political. There are a couple of fault lines in this debate. First, there are people who will argue that having such data is in itself political and dangerous, while the opposite side will argue that having such data is necessary for more efficient government and business. This could be a debate between libertarians and others: should there even be big data in the first place? Second, there are a good number of people who like the idea of collecting and using big data but debate who should be able to benefit from it. Can the data be used for political ends? If government should have its hands on big data, perhaps it is okay for businesses too? Should individual consumers have more power or control over their contributions and participation in big data?

h/t Instapundit

Combining urban planning and urban informatics

The Chief Technology Officer for the city of Chicago argues urban planning and urban informatics need to be combined:

“This is a plea – and I make it frequently – for a discipline that doesn’t really exist yet,” Tolva says, “a merger of urban design and urban planning with urban informatics, with networked public space.”

Tolva is touching here on a number of ideas we’ve broached before. The unevenness of digital information has real-world implications in cities. The tools that we use to access it (smartphones, laptops, WiFi) will demand changes to the physical environment. And social norms about privacy in public space are all evolving as a result. But it’s helpful now to pause and think about who should be addressing all this uncharted territory (and whether those people exist yet).

“The real opportunity is in thinking about how many points of tangency with the online world are actually becoming embedded in physical space,” Tolva says. He is specifically not talking here about government data portals that contain information about the physical city. “This notion of e-government – even coming out of my mouth, it seems quaint – is you interacting with your city in front of your computer. But that’s not how we experience cities. Or, it’s not the best part of cities.”

The best part of cities is on the street. And in the future, your experience of the street life of cities could be enhanced if buildings and stoplights and bus stops and parks all gathered information and spoke to each other (and to anyone who wanted to listen). So what do we call this new job, the architect of everything?

While this may make some people quite nervous, there is a lot of potential in putting together real-time information and information about urban patterns with real-time devices. Imagine city infrastructure that works by dynamic algorithms rather than strict schedules. Perhaps this could be described as “urban big data,” with an “urban big data officer”?
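As a sketch of what “dynamic algorithms rather than strict schedules” might mean in practice, here is a hypothetical traffic signal that divides its cycle according to sensed demand instead of a fixed timetable. All parameters and queue lengths are made up; a real deployment would read live sensor data.

```python
import random

random.seed(7)

CYCLE_SECONDS = 90  # total length of one signal cycle
MIN_GREEN = 10      # every approach is guaranteed at least this much green

def allocate_green(queues, cycle=CYCLE_SECONDS, min_green=MIN_GREEN):
    """Split a signal cycle across approaches in proportion to demand.

    Each approach gets a guaranteed minimum, and the remaining time is
    divided in proportion to the queue its sensors report.
    """
    spare = cycle - min_green * len(queues)
    total = sum(queues) or 1  # avoid dividing by zero when no cars wait
    return [min_green + spare * q / total for q in queues]

# Simulated queue lengths (cars waiting) on four approaches.
queues = [random.randint(0, 30) for _ in range(4)]
greens = allocate_green(queues)
print(list(zip(queues, [round(g, 1) for g in greens])))
```

The same proportional-allocation idea could, in principle, schedule buses or stagger park maintenance; the point is that the timetable becomes an output of the data rather than an input.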

Argument: humans like causation because they like to feel in control

Here is an interesting piece that summarizes some research and concludes that humans like to feel in control and therefore like the idea of causality:

This predisposition for causation seems to be innate. In the 1940s, psychologist Albert Michotte theorized that “we see causality, just as directly as we see color,” as if it is omnipresent. To make his case, he devised presentations in which paper shapes moved around and came into contact with each other. When subjects—who could only see the shapes moving against a solid-colored background—were asked to describe what they saw, they concocted quite imaginative causal stories…

Nassim Taleb noted how ridiculous this is in his book The Black Swan. In the hours after former Iraqi dictator Saddam Hussein was captured on December 13, 2003, Bloomberg News blared the headline, “U.S. TREASURIES RISE; HUSSEIN CAPTURE MAY NOT CURB TERRORISM.” Thirty minutes later, bond prices retreated and Bloomberg altered their headline: “U.S. TREASURIES FALL; HUSSEIN CAPTURE BOOSTS ALLURE OF RISKY ASSETS.” A more correct headline might have been: “U.S. TREASURIES FLUCTUATE AS THEY ALWAYS DO; HUSSEIN CAPTURE HAS NOTHING TO DO WITH THEM WHATSOEVER,” but that isn’t what editors want to post, nor what people want to read.

This trend doesn’t merely manifest itself for stocks or large events. Take scientific studies, for example. Many of the most sweeping findings, ones normally reported in large media outlets, originate from associative studies that merely correlate two variables—television watching and death, for example. Yet headlines—whose functions are partly to summarize and primarily to attract attention—are often written as “X causes Y” or “Does X cause Y?” (I have certainly been guilty of writing headlines in the latter style). In turn, the general public usually treats these findings as cause-effect, despite the fact that there may be no proven causal link between the variables. The article itself might even mention the study’s correlative, not causative, nature, and this still won’t change how it is perceived. Co-workers across the world will still congregate around coffee machines the next day, chatting about how watching The Kardashians is killing you, albeit very slowly.

Humanity’s need for concrete causation likely stems from our unceasing desire to maintain some iota of control over our lives. That we are simply victims of luck and randomness may be exhilarating to a madcap few, but it is altogether discomforting to most. By seeking straightforward explanations at every turn, we preserve the notion that we can always affect our condition in some meaningful way. Unfortunately, that idea is a facade. Some things don’t have clear answers. Some things are just random. Some things simply can’t be controlled.
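The correlation trap in that passage is easy to reproduce: two series with no causal link between them can correlate strongly when a third variable drives both. A small sketch with invented numbers:

```python
import math
import random

random.seed(1)

def correlation(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two quantities with no direct causal link, both driven by a shared
# seasonal confounder (temperature). All coefficients are invented.
temperature = [15 + 10 * math.sin(2 * math.pi * m / 12) for m in range(48)]
ice_cream_sales = [5 * t + random.gauss(0, 10) for t in temperature]
drowning_rate = [0.3 * t + random.gauss(0, 1) for t in temperature]

r = correlation(ice_cream_sales, drowning_rate)
print(f"correlation between sales and drownings: {r:.2f}")
```

The coefficient comes out high even though neither series causes the other, which is exactly the situation a “X causes Y” headline papers over.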

I like the reference to Taleb here. His books make just this argument: people want to see patterns where they don’t exist and thus are completely unprepared for changes in the stock market, governments, or the natural world. The trick is to know when you can rely on patterns and when you can’t – and Taleb even offers general investment strategies in his most recent book Antifragile that try to minimize losses and maximize potential gains.

I wonder if this isn’t lurking behind the discussion of big data: there are scientists and others who seem to suggest that all we need to understand the world is more data and better pattern recognition tools. If only we could get enough, we could figure things out. But, what if the world turns out to be too complex? What if we can’t know everything about the social or natural world? Does this then change our perceptions of human ingenuity and progress?

h/t Instapundit