The human eyes and hours needed to review CCTV footage to find terrorists

A common tool in fighting urban terrorism today is the closed-circuit camera system. However, it still takes a tremendous amount of personnel and time to go through all of the available footage. Here is a summary of what was required to put together the narrative of the 2005 bombings in London:

Six days after the attack, police start linking these events together. “By 13 July, the police had strong evidence that Khan, Tanweer, Hussain and Lindsay were the bombers and that they had died in the attacks.” But it was no small feat: Police collected 80,000 CCTV tapes, amounting to hundreds of thousands of hours of footage. The London police brought on some 400 extra officers to help with the grunt of it.

“The scale is enormous,” the narrative concluded.

As Alexis Madrigal writes at The Atlantic, although we have the technology to capture and record every inch of a city in real time, the process still depends very much on human eyes to do the analysis. “Right now, there is no video software that can do this type of analysis,” he writes, “not even in a first-pass way.”

Even so, given the history here, it seems likely that with enough time the perpetrators of the bombing will be found on camera. Whether the police can connect the threads among all the disparate sources of information is another matter.

In other words, you can collect big data, but it still requires humans to make sense of it all. I imagine there is a big opportunity here for someone to create reliable recognition software, though this may be a task where humans are simply better.

Wired says the data in Boston is being crowdsourced but the investigation will not be:

It is unclear whether law enforcement had overhead cameras mounted in helicopters or other aircraft over the Marathon. (Boston-area cops don’t have spy drones — yet.) But the era of readily-accessible commercial imaging tools provides a twist on the exponential growth of surveillance tech used by law enforcement and homeland security. The data on your phone can become an adjunct to police during the highest-profile investigations.

That isn’t an unfettered benefit to police. The military has found that its explosion of imagery data has stressed its ability to process it, to the point where its futurists are hunting for algorithms that can pre-select images a human analyst sees. [Boston Police Commissioner Ed] Davis requested that any spectator providing media showing the attacks indicate the time they collected the data so police “don’t need to go through the electronic signature.”
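To give a sense of how crude an automated “first pass” would be with off-the-shelf tools, here is a toy sketch of the simplest imaginable pre-selection filter: flag the moments in a clip where the picture changes a lot between consecutive frames. The file name is made up, and this is nothing like what investigators actually use; it only illustrates why Madrigal says no software can do the real analysis yet.

```python
# Toy first-pass filter: flag timestamps in a CCTV clip where consecutive
# frames differ enough to suggest motion. A sketch only; triaging real footage
# is far harder (crowds move constantly, cameras pan, lighting shifts).
import cv2  # pip install opencv-python

def flag_motion(path, threshold=25, min_changed_pixels=5000):
    """Return a list of (seconds, changed_pixel_count) for 'busy' frames."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    prev = None
    flagged = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        if prev is not None:
            diff = cv2.absdiff(prev, gray)
            _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
            changed = cv2.countNonZero(mask)
            if changed > min_changed_pixels:
                flagged.append((frame_idx / fps, changed))
        prev = gray
        frame_idx += 1
    cap.release()
    return flagged

# Hypothetical file name; prints the moments a human analyst might look at first.
for t, n in flag_motion("camera_17_platform.mp4"):
    print(f"{t:8.1f}s  {n} pixels changed")
```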

Lots of work to do.

Hoping Chicago can become a big data hub

A fundraiser held by Tom Pritzker this weekend in Chicago was part of a plan to make Chicago a center for big data:

University of Chicago computer scientist Ian Foster pressed the clicker and up popped a map of the most sophisticated fiber-optic networks in the world.

On that map, at least, Chicago appeared to be the center of everything, a crossroads of information dwarfing Beijing, London and New York in importance.

Fiber-optic lines lace this city — because they are often laid along railroad lines. And the University of Chicago is working to use that geographic advantage to build the largest storage hub in the world for genetic and medical information, called the bionimbus cloud. The goal is to harness massive amounts of data and computing power to solve the riddle of diseases such as cancer.

Last week, Hyatt Hotels Corp. Chairman Tom Pritzker and his wife, Margot, hosted a fundraiser at the Park Hyatt Chicago to introduce the project to about 50 friends, including CDW Corp. founder Michael Krasny; Melvin and Ellen Gordon, the CEO and president of Tootsie Roll Industries, respectively; Crate and Barrel founders Carole and Gordon Segal; Wheels Inc. Chief Executive Jim Frank; and Charles Evans, president of the Federal Reserve Bank of Chicago…

Grasping the magnitude of the data the medical community needs to collect and analyze is almost impossible.

But understanding a railroad hub — and the transport of grain, meat or oil — is not.

“Business, innovation, discovery, jobs still depend on taking raw materials and turning them into refined products,” Foster said. “Often, nowadays, the raw material is data and the refined material is knowledge.”

This is an interesting comparison to make in Chicago, a city that depends heavily on its transportation infrastructure, from busy airports to the large share of US rail traffic that passes through the region, and that has served as a commodity trading center for decades. So why not data analysis and infrastructure? At the same time, I’m guessing Chicago has a ways to go compared to established tech and data centers like Silicon Valley and Seattle, not to mention places like Austin and Boston.

Argument: Big Data reduces humans to something less than human

One commentator, Leon Wieseltier, suggests Big Data can’t quite capture what makes humans human:

I have been browsing in the literature on “sentiment analysis,” a branch of digital analytics that—in the words of a scientific paper—“seeks to identify the viewpoint(s) underlying a text span.” This is accomplished by mechanically identifying the words in a proposition that originate in “subjectivity,” and thereby obtaining an accurate understanding of the feelings and the preferences that animate the utterance. This finding can then be tabulated and integrated with similar findings, with millions of them, so that a vast repository of information about inwardness can be created: the Big Data of the Heart. The purpose of this accumulated information is to detect patterns that will enable prediction: a world with uncertainty steadily decreasing to zero, as if that is a dream and not a nightmare. I found a scientific paper that even provided a mathematical model for grief, which it bizarrely defined as “dissatisfaction.” It called its discovery the Good Grief Algorithm.

The mathematization of subjectivity will founder upon the resplendent fact that we are ambiguous beings. We frequently have mixed feelings, and are divided against ourselves. We use different words to communicate similar thoughts, but those words are not synonyms. Though we dream of exactitude and transparency, our meanings are often approximate and obscure. What algorithm will capture “the feel of not to feel it, / when there is none to heal it,” or “half in love with easeful Death”? How will the sentiment analysis of those words advance the comprehension of bleak emotions? (In my safari into sentiment analysis I found some recognition of the problem of ambiguity, but it was treated as merely a technical obstacle.) We are also self-interpreting beings—that is, we deceive ourselves and each other. We even lie. It is true that we make choices, and translate our feelings into actions; but a choice is often a coarse and inadequate translation of a feeling, and a full picture of our inner states cannot always be inferred from it. I have never voted wholeheartedly in a general election.

For the purpose of the outcome of an election, of course, it does not matter that I vote complicatedly. All that matters is that I vote. The same is true of what I buy. A business does not want my heart; it wants my money. Its interest in my heart is owed to its interest in my money. (For business, dissatisfaction is grief.) It will come as no surprise that the most common application of the datafication of subjectivity is to commerce, in which I include politics. Again and again in the scholarly papers on sentiment analysis the examples given are restaurant reviews and movie reviews. This is fine: the study of the consumer is one of capitalism’s oldest techniques. But it is not fine that the consumer is mistaken for the entirety of the person. Mayer-Schönberger and Cukier exult that “datafication is a mental outlook that may penetrate all areas of life.” This is the revolution: the Rotten Tomatoes view of life. “Datafication represents an essential enrichment in human comprehension.” It is this inflated claim that gives offense. It would be more proper to say that datafication represents an essential enrichment in human marketing. But marketing is hardly the supreme or most consequential human activity. Subjectivity is not most fully achieved in shopping. Or is it, in our wired consumerist satyricon?

“With the help of big data,” Mayer-Schönberger and Cukier continue, “we will no longer regard our world as a string of happenings that we explain as natural and social phenomena, but as a universe comprised essentially of information.” An improvement! Can anyone seriously accept that information is the essence of the world? Of our world, perhaps; but we are making this world, and acquiescing in its making. The religion of information is another superstition, another distorting totalism, another counterfeit deliverance. In some ways the technology is transforming us into brilliant fools. In the riot of words and numbers in which we live so smartly and so articulately, in the comprehensively quantified existence in which we presume to believe that eventually we will know everything, in the expanding universe of prediction in which hope and longing will come to seem obsolete and merely ignorant, we are renouncing some of the primary human experiences. We are certainly renouncing the inexpressible. The other day I was listening to Mahler in my library. When I caught sight of the computer on the table, it looked small.
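To make the mechanics concrete: the “mechanical identification” of subjective words that Wieseltier describes is, in its simplest form, little more than a lexicon lookup plus a tally. A toy sketch, with made-up word lists far smaller than any real sentiment lexicon:

```python
# Toy lexicon-based sentiment scoring: count "subjective" words and tally
# their polarity. Made-up word lists; real systems use large lexicons and
# statistical models, but the basic move (reduce an utterance to a score
# that can be aggregated by the millions) is the one Wieseltier objects to.
POSITIVE = {"love", "great", "wonderful", "delicious", "satisfied"}
NEGATIVE = {"hate", "terrible", "awful", "bland", "dissatisfied"}

def sentiment_score(text: str) -> float:
    """Score in [-1, 1]: +1 all positive words, -1 all negative, 0 if balanced or none."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

reviews = [
    "The pasta was wonderful and the service was great.",
    "Bland food, terrible wait, I hate this place.",
    "I love the food and I hate the noise at the same time.",  # mixed feelings flatten to 0.0
]
for r in reviews:
    print(f"{sentiment_score(r):+.2f}  {r}")
```

Note how the third review, with genuinely mixed feelings, flattens to a score of zero, which is one small version of the ambiguity problem Wieseltier raises.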

I think a couple of arguments are possible about the limitations of big data, and Wieseltier is making a particular one. He does not appear to be saying that big data can’t predict or model human complexity; to that objection, fans of big data would probably respond that the biggest issue is that we simply don’t have enough data yet and that we are developing better and better models. In other words, our abilities and data will eventually catch up to the problem of complexity. But I think Wieseltier is arguing something else: he, along with many others, does not want humans to be reduced to information. Even if we had the best models, it is one thing to see people as complex individuals and quite another to say they are simply another piece of information. Doing the latter takes away people’s dignity. Reducing people to data means we stop seeing them as people who can change their minds, be creative, and confound predictions.

It will be interesting to see how this plays out in the coming years. I think this is the same fear many people have about statistics. Particularly in our modern world, where we see ourselves as sovereign individuals, describing statistical trends to people strikes them as reducing their agency and negating their experiences. Of course, this is not what statistics is about, and it is something more training in statistics could help change. But how we talk about data and its uses might go a long way toward shaping how big data is viewed in the future.

Scholars suggest switch from urban studies to urban science and the DNA of cities

Several scholars recently called for pursuing urban science:

William Solecki compares the current study of cities to natural history in the 19th century. Back then most natural scientists were content to explore and document the extent of biological and behavioral differences in the world. Only recently has science moved from cataloguing life to understanding the genetic code that forms its very basis.

It’s time for urban studies to evolve the same way, says Solecki, a geographer at Hunter College who’s also director of the C.U.N.Y. Institute for Sustainable Cities. Scholars from any number of disciplines — economics and history to ecology and psychology — have explored and documented various aspects of city life through their own unique lenses. What’s needed now, Solecki contends, is a new science of urbanization that looks beyond the surface of cities to the fundamental laws that form their very basis too…

In Environment, the researchers outline three basic research goals for their proposed science of urbanization:

  1. To define the basic components of urbanization across time, space, and place.
  2. To identify the universal laws of city-building, presenting urbanization as a natural system.
  3. To link this new system of urbanization with other fundamental processes that occur in the world.

The result, Solecki believes, will be a stronger understanding of the “DNA” of cities — and, by extension, an improved ability to address urban problems in a systemic manner. Right now, for instance, urban transport scholars respond to the problem of sprawl and congestion with ideas like bike lanes or bus-rapid transit lines. Those programs can be great for cities, but in a way they fix a symptom of a problem that still lingers. An improved science of urbanization would isolate the underlying processes that caused this unsustainable development in the first place.

Two quick thoughts:

1. I think this assumes we have the kind of data and methodology that could get at the “DNA of cities.” Presumably, this is big data collected in innovative ways. To use the natural science metaphor, it is one thing to know about the existence of DNA and it is another thing to collect and analyze it. With this new kind of data, cities can then be viewed as complex systems with lots of moving pieces.

2. Are there necessarily universal laws underlying cities? We are currently in an academic world with a variety of theories about urban growth, but they tend to be idiosyncratic to particular cities, to apply to particular time periods, and to emphasize different aspects of social, economic, and political life. Is this because no one has really put it all together yet, or because it is really hard to find universal laws?


Measuring audience reaction: from the applause of crowds to Facebook likes

Megan Garber provides an overview of applause, “the big data of the ancient world”:

Scholars aren’t quite sure about the origins of applause. What they do know is that clapping is very old, and very common, and very tenacious — “a remarkably stable facet of human culture.” Babies do it, seemingly instinctually. The Bible makes many mentions of applause – as acclamation, and as celebration. (“And they proclaimed him king and anointed him, and they clapped their hands and said, ‘Long live the king!'”)

But clapping was formalized — in Western culture, at least — in the theater. “Plaudits” (the word comes from the Latin “to strike,” and also “to explode”) were the common way of ending a play. At the close of the performance, the chief actor would yell, “Valete et plaudite!” (“Goodbye and applause!”) — thus signaling to the audience, in the subtle manner preferred by centuries of thespians, that it was time to give praise. And thus turning himself into, ostensibly, one of the world’s first human applause signs…

As theater and politics merged — particularly as the Roman Republic gave way to the Roman Empire — applause became a way for leaders to interact directly (and also, of course, completely indirectly) with their citizens. One of the chief methods politicians used to evaluate their standing with the people was by gauging the greetings they got when they entered the arena. (Cicero’s letters seem to take for granted the fact that “the feelings of the Roman people are best shown in the theater.”) Leaders became astute human applause-o-meters, reading the volume — and the speed, and the rhythm, and the length — of the crowd’s claps for clues about their political fortunes.

“You can almost think of this as an ancient poll,” says Greg Aldrete, a professor of history and humanistic studies at the University of Wisconsin, and the author of Gestures and Acclamations in Ancient Rome. “This is how you gauge the people. This is how you poll their feelings.” Before telephones allowed for Gallup-style surveys, before SMS allowed for real-time voting, before the Web allowed for “buy” buttons and cookies, Roman leaders were gathering data about people by listening to their applause. And they were, being humans and politicians at the same time, comparing their results to other people’s polls — to the applause inspired by their fellow performers. After an actor received more favorable plaudits than he did, the emperor Caligula (while clutching, it’s nice to imagine, his sword) remarked, “I wish that the Roman people had one neck.”…

So the subtleties of the Roman arena — the claps and the snaps and the shades of meaning — gave way, in later centuries, to applause that was standardized and institutionalized and, as a result, a little bit promiscuous. Laugh tracks guffawed with mechanized abandon. Applause became an expectation rather than a reward. And artists saw it for what it was becoming: ritual, rote. As Barbra Streisand, no stranger to public adoration, once complained: “What does it mean when people applaud? Should I give ’em money? Say thank you? Lift my dress?” The lack of applause, on the other hand — the unexpected thing, the relatively communicative thing — “that I can respond to.”…

Mostly, though, we’ve used the affordances of the digital world to remake public praise. We link and like and share, our thumbs-ups and props washing like waves through our networks. Within the great arena of the Internet, we become part of the performance simply by participating in it, demonstrating our appreciation — and our approval — by amplifying, and extending, the show. And we are aware of ourselves, of the new role a new world gives us. We’re audience and actors at once. Our applause is, in a very real sense, part of the spectacle. We are all, in our way, claqueurs.

Fascinating, from the human tendency across cultures to clap, to the planting of people in the audience to clap and cheer, to the rules that developed around clapping.

A few thoughts:

1. Are there notable moments in history when politicians and others thought the crowd was going one way because of applause but quickly found out that wasn’t the case? Simply going by the loudest noise seems rather limited, particularly with large crowds and outdoors.

2. The translation of clapping into Facebook likes loses the embodied nature of clapping and crowds. Yes, likes let you see that you are joining with others. But something about the social energy of a crowd is completely lost. Durkheim would describe this as collective effervescence, and Randall Collins describes the physical nature of the “emotional energy” that can be generated when humans are in close physical proximity to each other. Clapping is primarily a group behavior and is difficult to transfer to a more individualistic setting.

3. I have noticed in my lifetime the seemingly increasing prevalence of standing ovations. Pretty much every theater show I have attended in recent years has ended with a standing ovation. My understanding is that such ovations were once reserved for truly spectacular performances, but now they are simply the norm. Thus, the standing ovation has taken on a very different meaning.

Getting the data to model society like we model the natural world

A recent session at the American Association for the Advancement of Science included a discussion of how to model the social world:

Dirk Helbing was speaking at a session entitled “Predictability: from physical to data sciences”. This was an opportunity for participating scientists to share ways in which they have applied statistical methodologies they usually use in the physical sciences to issues which are more ‘societal’ in nature. Examples stretched from use of Twitter data to accurately predict where a person is at any moment of each day, to use of social network data in identifying the tipping point at which opinions held by a minority of committed individuals influence the majority view (essentially looking at how new social movements develop) through to reducing travel time across an entire road system by analysing mobile phone and GIS (Geographical Information Systems) data…

With their eye on the big picture, Dr Helbing and multidisciplinary colleagues are collaborating on FuturICT, a 10-year, 1 billion EUR programme which, starting in 2013, is set to explore social and economic life on earth to create a huge computer simulation intended to simulate the interactions of all aspects of social and physical processes on the planet. This open resource will be available to us all and particularly targeted at policy and decision makers. The simulation will make clear the conditions and mechanisms underpinning systemic instabilities in areas as diverse as finance, security, health, the environment and crime. It is hoped that knowing why and being able to see how global crises and social breakdown happen, will mean that we will be able to prevent or mitigate them.

Modelling so many complex matters will take time but in the future, we should be able to use tools to predict collective social phenomena as confidently as we predict physical pheno[men]a such as the weather now.

This will require a tremendous amount of data. It may also require asking individual members of society for far more data than has been asked of them so far. To this point, individuals have been willing to volunteer information in places like Facebook and Twitter, but we will need much more consistent information than that to develop models like the ones suggested here. Additionally, once that minute-to-minute information is collected, it needs to be put in a central dataset or location so that all the possible connections can be seen. Who is going to keep and police this information? People might be convinced to participate if they could see the payoff: what exactly will a social model be able to do? Limit or stop crime or wars? Help reduce discrimination? Thus, getting the data from people might be as much of a problem as knowing what to do with it once it is obtained.

You can collect lots of Moneyball-type data but it still has to be used well

Another report from the MIT Sloan Sports Analytics Conference provides this useful reminder about statistics and big data:

Politics didn’t come up at the conference, except for a single question to Nate Silver, the FiveThirtyEight election oracle who got his start doing statistical analysis on baseball players. Silver suggested there wasn’t much comparison between the two worlds.

But even if there’s no direct correlation, there was an underlying message I heard consistently throughout the conference that applies to both: Data is an incredibly valuable resource for organizations, but you must be able to communicate its value to stakeholders making decisions — whether that’s in the pursuit of athletes or voters.

And the Obama 2012 campaign successfully put this together. Here is one example:

Data played a major role. There’s perhaps no better example than the constant testing of email subject lines. The performance of the Obama email with the subject line “I will be outspent” earned the campaign an estimated $2.6 million. Had the campaign gone with the lowest-performing subject line, it would have raised $2.2 million less, according to “Inside the Cave,” a detailed report from Republican strategist Patrick Ruffini and the team at Engage.
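The mechanics behind that kind of finding are ordinary A/B testing: send each candidate subject line to a random slice of the list, measure the response, and roll the winner out to everyone else. A rough sketch of how such a comparison might be evaluated, with invented numbers (the campaign’s actual counts and procedures are not spelled out at this level of detail):

```python
# Toy A/B comparison of two email subject lines using a two-proportion z-test.
# The counts below are invented; only the procedure is the point.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: the two donation rates are equal."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical test: each subject line goes to 50,000 addresses.
donors_a, sent_a = 620, 50_000   # "I will be outspent"
donors_b, sent_b = 480, 50_000   # lowest-performing line
z, p = two_proportion_z(donors_a, sent_a, donors_b, sent_b)
print(f"rate A = {donors_a/sent_a:.3%}, rate B = {donors_b/sent_b:.3%}")
print(f"z = {z:.2f}, p = {p:.4f}")  # small p: roll out line A to the full list
```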

This is an important reminder about statistics: they still have to be used well and effectively shared with leaders and the public. We are now in a world where more data is available than ever before but this doesn’t necessarily mean life is getting better.

I was recently in a conversation about the value of statistics. I suggested that if colleges and others can effectively train today’s students in statistics and in how to use them in the real world, we might be better off as a society in a few decades, as these students go on to become leaders who make statistics a regular part of their decision-making. We’ll see if this happens…

Using analytics and statistics in sports and society: a ways to go

TrueHoop has been doing a fine job covering the 2013 MIT Sloan Sports Analytics Conference. One post from last Saturday highlighted five quotes “On how far people have delved into the potential of analytics”:

“We are nowhere yet.”
— Daryl Morey, Houston Rockets general manager

“There is a human element in sports that is not quantifiable. These players bleed for you, give you everything they have, and there’s a bond there.”
— Bill Polian, ESPN NFL analyst

“When visualizing data, it’s not about how much can I put in but how much can I take out.”
— Joe Ward, The New York Times sports graphics editor

“If you are not becoming a digital CMO (Chief Marketing Officer), you are becoming extinct.”
— Tim McDermott, Philadelphia Eagles CMO

“Even if God came down and said this model is correct … there is still randomness, and you can be wrong.”
— Phil Birnbaum, By The Numbers editor

In other words, there is a lot of potential in these statistics and models but we have a long way to go in deploying them correctly. I think this is a good reminder when thinking about big data as well: simply having the numbers and recognizing they might mean something is a long way from making sense of the numbers and improving lives because of our new knowledge.

Call for more social science modeling for Social Security

An op-ed in the New York Times explains how poorly financial forecasts for Social Security are made and suggests social scientists can help:

Remarkably, since Social Security was created in 1935, the government’s forecasting methods have barely changed, even as a revolution in big data and statistics has transformed everything from baseball to retailing.

This omission can be explained by the fact that the Office of the Chief Actuary, the branch of the Social Security Administration that is responsible for the forecasts, is almost exclusively composed of, well, actuaries — without any serious representation of statisticians or social science methodologists. While these actuaries are highly responsible and careful and do excellent work curating and describing the data that go into the forecasts, their job is not to make statistical predictions. Yet the agency badly needs such expertise.

With considerable help from the actuaries and other officials at the Social Security Administration, we unearthed how the agency makes mortality forecasts and uses them to predict the program’s solvency. We learned that the methods are antiquated, subjective and needlessly complicated — and, as a result, are prone to error and to potential interference from political appointees. This may explain why the agency’s forecasts have, at times, changed significantly from year to year, even when there was little change in the underlying data.

We have made our methods, calculations and software available online at j.mp/SSecurity so that others can replicate or improve our forecasts. The implications of our findings go beyond social science. As the wave of retirement by the baby boomers continues, doing nothing to shore up Social Security’s solvency is irresponsible. If the amount of money coming in through payroll taxes does not increase and if the amount of money going out as benefits remains the same, the trust funds will become insolvent less than 20 years from now.
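To illustrate the kind of statistical forecasting being advocated here, as opposed to hand-tuned actuarial assumptions, consider a deliberately simple sketch: fit a log-linear trend to historical death rates for a single age group and extrapolate it forward. The rates below are invented, and the authors’ actual models (available with their code at j.mp/SSecurity) are considerably more sophisticated.

```python
# Toy mortality forecast: fit log(death rate) = a + b*year by least squares
# and extrapolate. Invented rates for a single age group; the real work uses
# richer models across ages, sexes, and cohorts.
import numpy as np

years = np.arange(1990, 2011)                      # observed period
rates = 0.012 * np.exp(-0.015 * (years - 1990))    # fake declining death rates
rates *= np.exp(np.random.default_rng(0).normal(0, 0.01, len(years)))  # add noise

b, a = np.polyfit(years, np.log(rates), 1)         # slope, intercept of the log trend
future = np.arange(2011, 2031)
forecast = np.exp(a + b * future)

for y, r in zip(future[::5], forecast[::5]):
    print(f"{y}: projected death rate {r:.5f}")
```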

Sociologists seem to be looking for ways to get involved in major policy issues so perhaps this is one way to do that. It is also interesting to note this op-ed is based on a 2012 article in Demography titled “Statistical Security for Social Security.” Not too many articles can make such a claim…

Also, I’m sure this doesn’t inspire confidence among some in the government’s ability to keep track of all of its data. Does the federal government have the ability to hire and train the kind of people it needs? Can it compete with the private sector or political campaigns (think of what the lauded big data workers from the 2012 Obama campaign might be able to do)?

New study says congestion could be lessened by reducing a small number of trips from certain neighborhoods

A new study suggests a targeted reduction of trips from certain locations could greatly reduce congestion:

To learn more about traffic congestion in the hope of finding ways of relieving it, an international team of scientists analyzed road use patterns in the San Francisco Bay area and the Boston area. They used mobile phone information from more than 1 million users over the course of three weeks to map out where drivers were concentrated on roads. (The data was rendered anonymous before the investigators looked at it, the study authors noted.)

Based on their analysis, the researchers suggest that certain neighborhoods in these urban areas were home to drivers that caused major congestion. The scientists found that canceling just 1 percent of trips from these neighborhoods could drastically reduce travel time that was otherwise added due to congestion.

“In the Boston area, we found that canceling 1 percent of trips by select drivers in the Massachusetts municipalities of Everett, Marlborough, Lawrence, Lowell and Waltham would cut all drivers’ additional commuting time caused by traffic congestion by 18 percent,” said researcher Marta González, a complex-systems scientist at the Massachusetts Institute of Technology. “In the San Francisco area, canceling trips by drivers from Dublin, Hayward, San Jose, San Rafael and parts of San Ramon would cut 14 percent from the travel time of other drivers.”

The location of these neighborhoods apparently makes it easy for them to impact their cities. “Being able to detect and then release the congestion in the most affected arteries improves the functioning of the entire coronary system,” González  told TechNewsDaily.

There are many ways people might reduce the number of drivers hitting the road from these key neighborhoods, the scientists said. For instance, the authorities might encourage alternatives “such as public transportation, carpooling, flex time and working from home,” González said. Mobile phone apps that connect people using the same roads might help them coordinate carpooling, she added.
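The study couples anonymized phone traces to a road network model; a much cruder sketch of the general idea, ranking home zones by how many of their trips land on the most congested road segments, might look like this (the zone names, road segments, and trips below are invented for illustration):

```python
# Crude sketch of ranking origin neighborhoods by their load on congested roads.
# Each trip: (origin_zone, road_segments_used). All data below is invented;
# the MIT study builds a full network model from anonymized phone traces.
from collections import Counter

congested_segments = {"I-93 N", "Storrow Dr", "Rt 9 E"}

trips = [
    ("Everett",   ["I-93 N", "Storrow Dr"]),
    ("Everett",   ["I-93 N"]),
    ("Waltham",   ["Rt 9 E"]),
    ("Brookline", ["Beacon St"]),
    ("Lowell",    ["I-93 N", "Rt 9 E"]),
]

load = Counter()
for origin, segments in trips:
    load[origin] += sum(seg in congested_segments for seg in segments)

# Zones whose drivers put the most trips onto congested segments come first;
# those are the places where trimming a small share of trips would do the most good.
for zone, hits in load.most_common():
    print(zone, hits)
```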

Two things stand out to me:

1. It seems like the advantage of this method is that it allows officials and drivers to target traffic flows from particular locations and then plan accordingly. More often, we settle for traffic solutions like adding more lanes over a stretch of highway or extending mass transit to a particular location. This kind of analysis helps people target particular areas rather than having to apply catch-all solutions.

2. Collecting and using this data sounds very interesting. This is big data at work: taking information collected from more than 1 million cell phone users and using it in a new way. It also allows researchers to see the system as a whole.

My next question would then be: is it politically easy to target particular areas for congestion reduction?