Better software to reduce traffic

Adaptive traffic software has helped reduce congestion in Ann Arbor:

Ann Arbor’s adaptive traffic signal control system has been playing god for more than a decade, but fiddling engineers continue to tweak its inputs and algorithms. Now it reduces weekday travel times on affected corridors by 12 percent, and weekend travel time by 21 percent. A trip along one busy corridor that took under three minutes just 15 percent of the time in 2005 now comes in under that mark 70 percent of the time. That’s enough to convince Ann Arbor’s traffic engineers, who just announced they’ll extend this system to all its downtown traffic lights and its most trafficked corridors.

To combat congestion, each hopped-up signal uses pavement-embedded sensors or cameras to spot cars waiting at red lights. The signals send that information via fiber network to the Big Computer back at traffic management base, which compiles the data.

This stuff works on a macro and micro level: If there are four cars lined up to go one way through an intersection, and zero cars lined up to move perpendicular to them, the light might turn green for the four. But a network of connected lights—like in Ann Arbor—will analyze the entire grid, and figure out who to prioritize to get the most people to their destinations the fastest. Advanced traffic control systems can even predict delays and congestion build-up before they happen, based on the ebb and flow of commutes…

The system knows when to lay off the change. “People kind of freak out if the signal is really different from yesterday or different from what it was five years ago,” says Richard Wallace, who directs the Center for Automotive Research’s transportation systems analysis group. For the most part, the system looks to tweak light patterns, not reshape the whole shebang from one hour to the next.
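Stripped down, the quoted description amounts to a weighted comparison of queues, bounded by minimum and maximum green times. Here is a minimal sketch of that idea; the phase names, thresholds, and queue counts are my own placeholders, not Ann Arbor's actual system:

```python
# Toy sketch of queue-based signal prioritization (illustrative only).
MIN_GREEN = 10   # seconds; assumed minimum green to avoid rapid switching
MAX_GREEN = 60   # seconds; assumed cap so cross traffic is never starved

def pick_phase(queues, current_phase, time_in_phase):
    """Return the phase (e.g., 'north-south' or 'east-west') to serve next.

    queues: dict mapping each phase to the number of vehicles its sensors detect.
    """
    # Respect the minimum green time before considering a switch.
    if time_in_phase < MIN_GREEN:
        return current_phase

    # Force a switch if the current phase has held the green too long.
    if time_in_phase >= MAX_GREEN:
        others = {p: q for p, q in queues.items() if p != current_phase}
        return max(others, key=others.get)

    # Otherwise serve whichever phase has the longest queue.
    return max(queues, key=queues.get)

# Four cars waiting north-south and none east-west: switch to north-south.
print(pick_phase({"north-south": 4, "east-west": 0}, "east-west", 25))
```

A networked system presumably runs something like this across many intersections at once and coordinates the timing between them, which is where the central computer and the fiber network come in.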

As we wait for the complete takeover by driverless cars, this could help ease congestion in the meantime. Small but consistent improvements like this could make a big difference to many commuters. Of course, it could also encourage more people to drive once they see that the commute is not so bad. Perhaps that is an argument for keeping the lights somewhat haphazard; it might unnerve a few of those would-be drivers.

I assume there are some costs associated with putting in sensors and cameras as well as in developing the software and having employees to set up and run the system. How do these costs compare to the money saved in shorter driving trips? Or, what if this money had been put into opportunities like mass transit that would remove drivers from the roads?

Using a supercomputer and big data to find stories of black women

A sociologist is using computational methods to uncover more historical knowledge about black women:

Mendenhall, who is also a professor of African American studies and urban and regional planning, is heading up the interdisciplinary team of researchers and computer scientists working on the big data project, which aims to better understand black women’s experience over time. The challenge in a project like this is that documents that record the history of black women, particularly in the slave era, aren’t necessarily going to be straightforward explanations of women’s feelings, resistance, or movement. Instead, Mendenhall and her team are looking for keywords that point to organizations or connections between groups that can indicate larger movements and experiences.

Using a supercomputer in Pittsburgh, they’ve culled 20,000 documents that discuss black women’s experience from a 100,000 document corpus (collection of written texts). “What we’re now trying to do is retrain a model based on those 20,000 documents, and then do a search on a larger corpus of 800,000, and see if there are more of those documents that have more information about black women,” Mendenhall added…

Using topic modeling and data visualization, they have started to identify clues that could lead to further research. For example, according to Phys.Org, finding documents that include the words “vote” and “women” could indicate black women’s participation in the suffrage movement. They’ve also preliminarily found some new texts that weren’t previously tagged as by or about black women.

Next up, Mendenhall is interested in collecting and analyzing data about current movements, such as Black Lives Matter.
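The "retrain a model, then search a larger corpus" step is, in machine-learning terms, a standard text-classification loop: train on the documents already judged relevant, then score everything else. Here is a minimal sketch of that workflow; the sample texts, labels, and choice of model are my own assumptions, not the team's actual pipeline:

```python
# Illustrative text-classification workflow: train on documents already judged
# relevant, then score a much larger corpus for likely matches.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled examples standing in for the 20,000 reviewed documents.
labeled_texts = [
    "record of a women's mutual aid society organized in Georgia",
    "ledger of grain shipments with no mention of people",
]
labels = [1, 0]  # 1 = discusses black women's experience, 0 = does not

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(labeled_texts)

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# Score a stand-in for the 800,000-document corpus and rank by probability.
larger_corpus = ["an unread document about a women's club", "another unread document"]
scores = model.predict_proba(vectorizer.transform(larger_corpus))[:, 1]
ranked = sorted(zip(scores, larger_corpus), reverse=True)
print(ranked[:10])  # highest-scoring documents go to human reviewers
```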

It sounds like this involves putting together an algorithm to do pattern recognition at a scale that humans could not match in any reasonable amount of time. That requires both good programming and a sizable collection of texts. Three questions come quickly to mind:

  1. How would one report findings from this data in typical outlets for sociological or historical research?
  2. How easy would it be to apply this to other areas of inquiry?
  3. Is this data mining or are there hypotheses that can be tested?

There are lots of possibilities like this with big data, but it remains to be seen how useful such approaches will be for research.
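The topic modeling mentioned in the excerpt is also within reach of off-the-shelf tools. A bare-bones version might look like the following, where the placeholder corpus and the number of topics are mine, not the project's settings:

```python
# Bare-bones topic modeling over a document collection (illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "placeholder text mentioning women and a vote drive",
    "placeholder text about a church relief organization",
    "placeholder text about freight schedules and prices",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)

# Top words per topic; terms like "vote" and "women" clustering together
# would be the kind of clue described above.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```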

Social scientists critique Facebook’s study claiming the news feed algorithm doesn’t lead to a filter bubble

Several social scientists have some concerns about Facebook’s recent findings that its news feed algorithm is less important than the choices of individual users in limiting what they see to what they already agree with:

But even that’s [sample size] not the biggest problem, Jurgenson and others say. The biggest issue is that the Facebook study pretends that individuals choosing to limit their exposure to different topics is a completely separate thing from the Facebook algorithm doing so. The study makes it seem like the two are disconnected and can be compared to each other on some kind of equal basis. But in reality, says Jurgenson, the latter exaggerates the former, because personal choices are what the algorithmic filtering is ultimately based on: “Individual users choosing news they agree with and Facebook’s algorithm providing what those individuals already agree with is not either-or but additive. That people seek that which they agree with is a pretty well-established social-psychological trend… what’s important is the finding that [the newsfeed] algorithm exacerbates and furthers this filter bubble.”

Sociologist and social-media expert Zeynep Tufekci points out in a post on Medium that trying to separate and compare these two things represents the worst “apples to oranges comparison I’ve seen recently,” since the two things that Facebook is pretending are unrelated have significant cumulative effects, and in fact are tied directly to each other. In other words, Facebook’s algorithmic filter magnifies the already human tendency to avoid news or opinions that we don’t agree with…

Christian Sandvig, an associate professor at the University of Michigan, calls the Facebook research the “not our fault” study, since it is clearly designed to absolve the social network of blame for people not being exposed to contrary news and opinion. In addition to the framing of the research — which tries to claim that being exposed to differing opinions isn’t necessarily a positive thing for society — the conclusion that user choice is the big problem just doesn’t ring true, says Sandvig (who has written a paper about the biased nature of Facebook’s algorithm).
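Jurgenson's either-or-versus-additive point is easier to see with made-up numbers. Suppose, purely for illustration (these are not the study's figures), that friends share 100 cross-cutting posts, the algorithm surfaces 70 percent of them, and the user then reads half of what is surfaced:

```python
# Toy illustration with invented numbers: the two filters compound.
crosscutting_shared = 100     # cross-cutting posts shared by a user's friends
algorithm_pass_rate = 0.70    # assumed share the news feed algorithm surfaces
user_click_rate = 0.50        # assumed share of surfaced posts the user reads

surfaced = crosscutting_shared * algorithm_pass_rate   # 70 reach the feed
read = surfaced * user_click_rate                      # 35 actually get read
print(surfaced, read)
```

Only 35 of the 100 get read, and neither filter alone accounts for the drop; treating the two as separate, comparable causes understates their combined effect.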

Research on political echo chambers has grown in recent years and has included examinations of blogs and TV news channels. Is Facebook “bad” if it follows the pattern of reinforcing boundaries? While it may not be surprising if it does, I’m reminded of what I’ve read about Mark Zuckerberg’s intentions for what Facebook would do: bring people together in ways that wouldn’t happen otherwise. So, if Facebook itself has the goal of crossing traditional boundaries, which are usually limited by homophily (people choosing to associate with people largely like themselves) and protecting the in-group against out-group interlopers, then does this mean the company is not meeting its intended goals? I just took a user survey from them recently that didn’t include much about crossing boundaries and instead asked about things like having fun, being satisfied with the Facebook experience, and whether I was satisfied with the number of my friends.

Google+ a “sociologically simple and elegant solution”?

According to one reviewer, Google+ takes advantage of sociological principles with its circles:

You also don’t have to ask anybody to be your “friend”. Nor do you have to reply to anybody’s “friend request”. You simply put people into the discrete/discreet spheres they already inhabit in your life…

Now, if you had asked me which company I considered least likely to come up with such a sociologically simple and elegant solution, I might well have answered: Google.

Its founders and honchos worship algorithms more than Mark Zuckerberg does. (I used to exploit this geekiness as “color” in my profiles of Google from that era.) Google then seemed to live down to our worst fears by making several seriously awkward attempts at “social” (called Buzz and Wave and so forth).

But these calamities seem to have been blessings. Google seems to have been humbled into honesty and introspection. It then seems to have done the unthinkable and consulted not only engineers but … sociologists (yuck). And now it has come back with … this.

Why exactly do algorithms and sociological principles have to be in opposition to each other? It is a matter of what informs these algorithms: brute efficiency, sociological principles, or something else…

Ultimately, couldn’t we also argue that the sociological validity of Google+ will be demonstrated by whether it catches on or not? Facebook may not be elegant or “correct” but people have found it useful and at least worthwhile to join (even if some loathe it). Perhaps this is too pragmatic of an answer (if it works, it is successful) but this seems to make sense with social media.

This reminds me as well of the idea expressed in The Facebook Effect (quick review here) that Facebook wishes to reach a point where people are willing to share their information with lots of people they may not know. If this is still the goal, Google+ then is more conservative in that people can restrict information by circle. I suspect it will be a while before a majority of people are willing to go the route suggested by Facebook but perhaps Facebook is being more “progressive” in the long run by trying to push people in a new direction.
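The mechanic itself may be sociologically informed, but it is computationally simple: a circle is just a named set of contacts, and each post records which circles may see it. A toy sketch of that idea (the names and function are mine, not Google's implementation):

```python
# Minimal sketch of circle-based sharing (illustrative, not Google's code).
circles = {
    "family":    {"mom", "dad", "sister"},
    "coworkers": {"alice", "bob"},
    "college":   {"carol", "dave"},
}

def audience(post_circles):
    """Return everyone allowed to see a post shared with the given circles."""
    people = set()
    for name in post_circles:
        people |= circles.get(name, set())
    return people

# A post shared only with "family" never reaches coworkers.
print(audience({"family"}))             # the three family members
print("alice" in audience({"family"}))  # False
```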

Using a sociological approach in “e-discovery technologies”

Legal cases can generate a tremendous number of documents that each side needs to examine. With new search technology, legal teams can now go through a lot more data for a lot less money. In one example, “Blackstone Discovery of Palo Alto, Calif., helped analyze 1.5 million documents for less than $100,000.” But within this discussion, the writer suggests that these searches can be done in two ways:

E-discovery technologies generally fall into two broad categories that can be described as “linguistic” and “sociological.”

The most basic linguistic approach uses specific search words to find and sort relevant documents. More advanced programs filter documents through a large web of word and phrase definitions. A user who types “dog” will also find documents that mention “man’s best friend” and even the notion of a “walk.”

The sociological approach adds an inferential layer of analysis, mimicking the deductive powers of a human Sherlock Holmes. Engineers and linguists at Cataphora, an information-sifting company based in Silicon Valley, have their software mine documents for the activities and interactions of people — who did what when, and who talks to whom. The software seeks to visualize chains of events. It identifies discussions that might have taken place across e-mail, instant messages and telephone calls…

The Cataphora software can also recognize the sentiment in an e-mail message — whether a person is positive or negative, or what the company calls “loud talking” — unusual emphasis that might give hints that a document is about a stressful situation. The software can also detect subtle changes in the style of an e-mail communication.

A shift in an author’s e-mail style, from breezy to unusually formal, can raise a red flag about illegal activity.
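A crude version of the “who talks to whom” and “loud talking” pieces can be pieced together from message metadata and text alone. The features and example messages below are my own guesses at the kinds of signals being described, not Cataphora's methods:

```python
# Rough sketch of the "sociological" signals described above (illustrative).
from collections import Counter

def who_talks_to_whom(messages):
    """Count sender-to-recipient pairs to map the communication network."""
    edges = Counter()
    for msg in messages:
        for recipient in msg["to"]:
            edges[(msg["from"], recipient)] += 1
    return edges

def loud_talking_score(text):
    """Crude proxy for unusual emphasis: all-caps words and exclamation marks."""
    words = text.split()
    caps = sum(1 for w in words if len(w) > 2 and w.isupper())
    return (caps + text.count("!")) / max(len(words), 1)

messages = [
    {"from": "pat", "to": ["lee"], "body": "Please BOOK the trade NOW!!"},
    {"from": "lee", "to": ["pat"], "body": "Done, confirming tomorrow."},
]
print(who_talks_to_whom(messages))
print([round(loud_talking_score(m["body"]), 2) for m in messages])
```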

So this second technique gets branded as “sociological” because it is looking for patterns of behavior and interaction. If you wondered how the programmers set up their code to do this kind of analysis, it sounds like some academics have been working on the problem for almost a decade:

[A computer scientist] bought a copy of the database [of Enron emails] for $10,000 and made it freely available to academic and corporate researchers. Since then, it has become the foundation of a wealth of new science — and its value has endured, since privacy constraints usually keep large collections of e-mail out of reach. “It’s made a massive difference in the research community,” Dr. McCallum said.

The Enron Corpus has led to a better understanding of how language is used and how social networks function, and it has improved efforts to uncover social groups based on e-mail communication.

Were any sociologists involved in this project to provide input on what the programs should be looking for in human interactions?

This sort of analysis software could be very handy for sociological research when one has hundreds of documents or sources to look through. Of course, the algorithms might have to be changed for specific projects or settings, but I wonder if this sort of software might be widely available in a few years. Would this analysis be better than going through documents one by one in coding software like ATLAS.ti or NVivo?

The prospect of the automated grading of essays

As the American public debates the exploits of Watson (and one commentator suggests it should, among other things, sort out Charlie Sheen’s problem), how about turning essay grading over to computers? There are programs in the works to make this happen:

At George Mason University Saturday, at the Fourth International Conference on Writing Research, the Educational Testing Service presented evidence that a pilot test of automated grading of freshman writing placement tests at the New Jersey Institute of Technology showed that computer programs can be trusted with the job. The NJIT results represent the first “validity testing” — in which a series of tests are conducted to make sure that the scoring was accurate — that ETS has conducted of automated grading of college students’ essays. Based on the positive results, ETS plans to sign up more colleges to grade placement tests in this way — and is already doing so.

But a writing scholar at the Massachusetts Institute of Technology presented research questioning the ETS findings, and arguing that the testing service’s formula for automated essay grading favors verbosity over originality. Further, the critique suggested that ETS was able to get good results only because it tested short answer essays with limited time for students — and an ETS official admitted that the testing service has not conducted any validity studies on longer form, and longer timed, writing.

Such programs are only as good as the algorithms and methods behind them. And it sounds like this program from ETS still has some issues. The process of grading is a skill that teachers develop. Much of it can be quantified and placed into rubrics. But I would also guess that many teachers develop an intuition that helps them quickly apply these important factors to the work that they read and grade.
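To see how a scoring formula could end up rewarding verbosity, consider the kind of surface features an automated grader might extract. These features are hypothetical stand-ins, not ETS's actual model:

```python
# Hypothetical surface features an automated grader might use (not ETS's model).
import re

def essay_features(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "unique_word_ratio": len(set(w.lower() for w in words)) / max(len(words), 1),
    }

# If a model trained on human scores leans on word count and sentence length,
# padding an essay raises its score regardless of originality.
short = "Cats are independent. They are also quiet."
padded = short + " Moreover, it is widely acknowledged that cats are, in many respects, independent animals."
print(essay_features(short))
print(essay_features(padded))
```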

But on a broader scale, what would happen if the right programs could be developed? Could we soon reach a point where professors and teachers would agree that a program could effectively grade writing?

Trying to count the people on the streets in Cairo

This is a problem that occasionally pops up at American marches or rallies: how exactly should one estimate the number of people in the crowd? This has actually been quite controversial at points, as certain organizers of rallies have produced larger figures than official government or media estimates. And with the ongoing protests taking place in Cairo, the same question has arisen: just how many Egyptians have taken to the streets? There is a more scientific process for this than a journalist simply making a guess:

To fact-check varying claims of Cairo crowd sizes, Clark McPhail, a sociologist at the University of Illinois and a veteran crowd counter, started by figuring out the area of Tahrir Square. McPhail used Google Earth’s satellite imagery, taken before the protest, and came up with a maximum area of 380,000 square feet that could hold protesters. He used a technique of area and density pioneered in the 1960s by Herbert A. Jacobs, a former newspaper reporter who later in his career lectured at the University of California, Berkeley, as chronicled in a Time Magazine article noting that “If the crowd is largely coeducational, he adds, it is conceivable that people might press closer together just for the fun of it.”

Such calculations of capacity say more about the size of potential gathering places than they do about the intensity of the political movements giving rise to the rallies. A government that wants to limit reported crowd sizes could cut off access to its cities’ biggest open areas.

From what I have read in the past on this topic, this is the common approach: calculate how much space is available to protesters or marchers, calculate how much space an individual needs, and then look at photos to see how much of that total space is used. The estimates can then vary quite a bit depending on how much space it is estimated each person wants or needs. These days, the quest to count is aided by better photographs and satellite images:

That is because to ensure an accurate count, some computerized systems require multiple cameras, to get high-resolution images of many parts of the crowd, in case density varies. “I don’t know of real technological solutions for this problem,” said Nuno Vasconcelos, associate professor of electrical and computer engineering at the University of California, San Diego. “You will have to go with the ‘photograph and ruler’ gurus right now. Interestingly, this stuff seems to be mostly of interest to journalists. The funding agencies for example, don’t seem to think that this problem is very important. For example, our project is more or less on stand-by right now, for lack of funding.”

Without any such camera setup, many have turned to some of the companies that collect terrestrial images using satellites, but these companies have collected images mostly before and after the peak of protests this week. “GeoEye and its regional affiliate e-GEOS tasked its GeoEye-1 satellite on Jan. 29, 2011 to collect half-meter resolution imagery showing central Cairo, Egypt,” GeoEye’s senior vice president of marketing, Tony Frazier, said in a written statement. “We provided the imagery to several customers, including Google Earth. GeoEye normally relies on our partners to provide their expert analysis of our imagery, such as counting the number of people in these protests.” This image was taken before the big midweek protests. DigitalGlobe, another satellite-imagery company, also didn’t capture images of the protests, according to a spokeswoman, but did take images later in the week.

Because these images are difficult to come by in Egypt, it is then difficult to make an estimate. As the article notes, this is why you will get vague estimates for crowd sizes in news stories like “thousands” or “tens of thousands.”
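When usable images do exist, the arithmetic itself is simple; what moves the estimate is the assumed density. Here is a worked example using the 380,000-square-foot figure above, with an occupancy share and per-person space figures that are my own approximations of commonly cited Jacobs-style densities:

```python
# Area-times-density estimate ("photograph and ruler"), with illustrative numbers.
usable_area_sqft = 380_000         # maximum protest area cited above
occupied_fraction = 0.8            # assumed share of that area actually filled

# Approximate square feet per person at different densities (assumed values).
sqft_per_person = {"loose": 10.0, "moderate": 4.5, "dense": 2.5}

for label, sqft in sqft_per_person.items():
    estimate = usable_area_sqft * occupied_fraction / sqft
    print(f"{label} crowd: roughly {estimate:,.0f} people")
```

The spread between those three figures is a big part of why organizer counts and official counts can diverge so sharply.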

Since this is a problem that does come up now and then, can’t someone put together a better method for making crowd estimates? If certain kinds of images could be obtained, it seems like an algorithm could be developed that would scan the image and somehow differentiate between people.

Predicting future crimes

Professor Richard Berk from the University of Pennsylvania has developed software that predicts which criminals on probation or parole will commit future crimes. His software is already being used in Baltimore and Philadelphia and soon will be used in Washington, D.C.

Here is a quick description of how the algorithm was developed:

Beginning several years ago, the researchers assembled a dataset of more than 60,000 various crimes, including homicides. Using an algorithm they developed, they found a subset of people much more likely to commit homicide when paroled or probated. Instead of finding one murderer in 100, the UPenn researchers could identify eight future murderers out of 100.

Berk’s software examines roughly two dozen variables, from criminal record to geographic location. The type of crime, and more importantly, the age at which that crime was committed, were two of the most predictive variables.
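In broad strokes, this is a supervised classification problem: historical cases with known outcomes train a model that then scores people coming up for parole or probation. The sketch below is generic; the three variables, the made-up records, and the choice of a random forest are my assumptions, not necessarily Berk's actual model, which reportedly draws on roughly two dozen variables:

```python
# Generic risk-scoring sketch (illustrative; not Berk's actual model or data).
from sklearn.ensemble import RandomForestClassifier

# Hypothetical records: [age_at_offense, prior_offenses, crime_severity]
X_train = [
    [16, 4, 3],
    [35, 1, 1],
    [19, 6, 2],
    [42, 0, 1],
]
y_train = [1, 0, 1, 0]  # 1 = committed a serious new offense while on parole

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Score a new parolee; supervision intensity would hinge on this probability.
new_case = [[18, 3, 3]]
print(model.predict_proba(new_case)[:, 1])
```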

Of course, there could be some problems with this:

But Berk’s scientific answer leaves policymakers with difficult questions, said Bushway. By labeling one group of people as high risk, and monitoring them with increased vigilance, there should be fewer murders, which the potential victims should be happy about.

It also means that those high-risk individuals will be monitored more aggressively. For inmate rights advocates, that is tantamount to harassment, “punishing people who, most likely, will not commit a crime in the future,” said Bushway.

“It comes down to a question of whether you would rather make these errors or those errors,” said Bushway.

I would be curious to see reports on the effectiveness of this software over time. And determining whether this software is effective in areas like reducing crime would present some interesting measurement issues.