Using smartphones to collect important economic data

Virginia Postrel describes a new app used in a number of countries to gather economic data on the ground:

Founded in 2012, the San Francisco-based startup Premise began by looking for a way to supplement official price indices with a quick-turnaround measure of inflation and relative currency values. It needed “a scalable, cost-effective way to collect a lot of price data,” chief executive David Soloff said in an interview. The answer was an Android app and more than 30,000 smart-phone-wielding contractors in 32 countries.

The contractors, who are paid by the usable photo and average about $100 a month, take pictures aimed at answering specific economic questions: How do the prices in government-run stores compare to those in private shops? Which brands of cigarette packages in which locations carry the required tax stamp? How many houses are hooked into power lines? What’s happening to food prices? Whatever the question, the data needed to answer it must be something a camera can capture…

The result is a collection of price indices updated much more frequently and with less time lag — although also fewer indicative items — than monthly government statistics. For Bloomberg terminal subscribers, Premise tracks food and beverage prices in the U.S., China, India, Brazil and Argentina, using indices mirroring government statistics. It gets new information daily; Bloomberg publishes new data twice a week. Premise tracks a similar index in Nigeria for Standard Chartered bank, which has made the aggregate data public. (Premise clients can drill down to see differences across products, types of retailers, or regions.) While more volatile than official statistics, the figures generally anticipate them, serving as an early-warning system for economic trends…

Premise has government clients, and it carefully positions its work as a complement to official statistics, as well as to the academic Billion Prices Project, which scrapes massive amounts of price data from online sources but can’t say what cooking oil sells for in a corner shop. Make no mistake, however: Its methods also provide valuable competition to the official data. The point, after all, is to find out what’s actually happening, not what government reports will say in a few weeks.

This is an innovative way to get data more quickly. It would be interesting to see how reliable this data is. Now it remains to be seen how markets, governments, and others will use more up-to-date information.
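
To make the idea of a frequently updated price index more concrete, here is a minimal sketch of how a handful of photographed shelf prices could be turned into an index number. This is not Premise's method, which the article does not describe; I am assuming a simple Jevons-style index (the geometric mean of price ratios), and the function name, item names, and prices are all made up for illustration.

```python
from math import prod


def jevons_index(base_prices, current_prices):
    """Geometric mean of price ratios, scaled so the base period equals 100.

    Only items observed in both periods are compared (an assumption of this sketch).
    """
    items = sorted(set(base_prices) & set(current_prices))
    if not items:
        raise ValueError("no items observed in both periods")
    ratios = [current_prices[item] / base_prices[item] for item in items]
    return 100 * prod(ratios) ** (1 / len(ratios))


# Hypothetical observations: item -> price in local currency.
base = {"rice_1kg": 2.00, "cooking_oil_1l": 4.00, "bread_loaf": 1.00}
today = {"rice_1kg": 2.20, "cooking_oil_1l": 4.00, "bread_loaf": 1.10}

index_today = jevons_index(base, today)
print(f"Index today: {index_today:.1f} (base period = 100)")
```

With only a few items, one odd price moves the index a lot, which fits the article's point that these figures are more volatile than official statistics.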

More broadly, smartphones could be used to collect all sorts of data. See previous posts on using the microphone and the use of additional apps such as Twitter and Waze.

“The most misleading charts of 2015, fixed”

Here are improved versions of some charts first put forward by politicians, advocacy groups, and the media in 2015.

I’m not sure exactly how they picked “the most misleading charts” (is there bias in this selection?) but it is interesting that several involve a misleading y-axis. I’m not sure that I would count the last example as a misleading chart since it involves a definition issue before getting to the chart.

And what is the purpose of the original, poorly done graphics? Changing the presentation of the data provides evidence for a particular viewpoint. Change the graphic depiction of the data and another story could be told. Unfortunately, it is actions like these that tend to cast doubt on the use of data for making public arguments – the data is simply too easy to manipulate so why rely on data at all? Of course, that assumes people look closely at the chart and the data source and know what questions to ask…
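
A rough way to see how much a truncated y-axis can distort a comparison is to compare the ratio of the drawn bar heights to the ratio of the underlying values. The sketch below is only an illustration: the function name and the two values are hypothetical and do not come from any of the charts in the article.

```python
def apparent_ratio(smaller, larger, axis_min=0.0):
    """Ratio of drawn bar heights when the y-axis starts at axis_min."""
    if axis_min >= min(smaller, larger):
        raise ValueError("axis_min must be below both values")
    return (larger - axis_min) / (smaller - axis_min)


# Hypothetical values: 35.0 versus 39.6 (about a 13 percent difference).
honest = apparent_ratio(35.0, 39.6)                     # axis starts at zero
truncated = apparent_ratio(35.0, 39.6, axis_min=34.0)   # axis starts just below the data

print(f"Zero-based axis: the second bar looks {honest:.2f}x as tall")
print(f"Truncated axis:  the second bar looks {truncated:.2f}x as tall")
```

The data is identical in both cases; only the starting point of the axis changes how large the difference appears.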

Quick Review: The Third Coast: When Chicago Built the American Dream

Thomas Dyja has a provocative argument in The Third Coast: while New York and LA are widely viewed as America’s cultural centers, Chicago of the mid-1900s contributed more than people think to American culture. My quick review of the book:

  1. The fact that the book is built on impressionistic vignettes is both its greatest strength and weakness. Dyja tells a number of interesting stories about cultural figures in Chicago from author Nelson Algren to Bauhaus member László Moholy-Nagy to University of Chicago president Robert Hutchins to puppeteer and TV show creator Burr Tillstrom to magazine creator Hugh Hefner. The characters he profiles have highs and lows but they are all marked by a sort of middle America creativity based on hard work, connecting with audiences, and not being flashy.
  2. Yet, stringing together a set of characters doesn’t help him make his larger argument that Chicago was influential. We get pieces of evidence – an important contribution to television here, the importance of Chess Records, a clear contribution to architecture there – but no comparative element. By his lack of attention, Dyja suggests Chicago didn’t contribute much in some areas – art is one, given the lack of a vibrant modern art scene (though what TripAdvisor ratings say is the world’s #1 museum does not get much space). Just how much did these actions in Chicago change the broader American culture? What was going on in New York and LA at those times? The data is anecdotal and difficult to judge.
  3. A few of the more interesting pieces of the book: he suggests Chicago contributed more to the Civil Rights Movement than many people remember (particularly due to the Emmett Till case); Chicago music, especially through Muddy Waters and Howlin’ Wolf, was influential elsewhere; Mayor Richard J. Daley was supportive of the arts but only in a functional sense, and the arts scene slowly died away into the early 1960s as creative types went elsewhere.

Ultimately, it is hard to know whether these contributions from Chicago really mattered or not. The one that gets the most attention – architecture through former members of the Bauhaus and then the International Style – probably really was a major contribution for both American and global cities. But even there, the focus of this book is on the people and not necessarily on their buildings or how normal Chicagoans experienced those structures or how the changes fit within the larger social-political-economic scene in Chicago.

Debate over data on the mental fragility of college students

A recent study suggests there is a need for more data to claim that today’s college students are more fragile:

The point, overall, is that given the dizzying array of possible factors at work here, it’s much too pat a story to say that kids are getting more “fragile” as a result of some cultural bugaboo. “I think it’s not only an oversimplification, I think it’s unfair to the kids, many of whom are very hardworking and tremendously diligent, and working in systems that are often very competitive,” said Schwartz. “Many of the kids are doing extraordinarily well, and I think it’s unfair to portray this whole group of people as being somehow weakhearted or weak-minded in some sense, when there’s no evidence to really support it.”

It hasn’t gone unnoticed among those who study college mental health that there’s an interesting divide at work here: College counselors are so convinced kids’ mental health is getting worse that it’s become dogma in some quarters, and yet it’s been tricky to find any solid, rigorous evidence of this. Some researchers have tried to dig into counseling-center data in an attempt to explain this discrepancy. One recent effort, published in the October issue of the Journal of College Student Psychopathology, comes from Allan J. Schwartz, a psychiatry professor at the University of Rochester who has devoted a chunk of his career to studying college suicide. Schwartz examined data from “4,755 clients spanning a 15-year period from 1992-2007” at one university, poring over the records to determine whether students who came in contact with that school’s counseling services had, over that period, exhibited increasing levels of distress in the form of suicidality, anxiety and phobic disorders, overall signs of serious mental illness, and other measures. (The same caveat I mentioned above applies here — such a study can only tell us about rates of pathology among kids who go to counseling centers. But it can at least help determine whether counselors are right that among the kids they see every day, things are getting worse.)

Schwartz found no evidence to support the pessimistic view. With the exception of suicidality, where he noted a “significant decline” over the years, every other measure he looked at held stable over the study’s 15-year span. In his paper, Schwartz rightly notes that there are limitations to what we can extrapolate from a study of a single campus. But he goes on to explain that four other, similar studies, published between 1996 and 2007, also sought to track changes in pathology over time in single-university settings, and they too found no empirical evidence that things have been getting worse. This doesn’t definitively prove that kids who seek counseling aren’t getting sicker, of course. But statistically, Schwartz argues, it’s unlikely that five studies looking at different schools would all come up with null findings if, in fact, there was a widespread increase in student pathology overall.

I don’t know this area of research but it sounds like there is room for disagreement and/or need for more definitive data about what is going on among college students.
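
The statistical argument at the end of the excerpt can be sketched with a little arithmetic. If student pathology really were increasing, and each single-campus study independently had some chance of detecting it, then the chance that all five studies come up empty shrinks quickly. The detection chances below are hypothetical (the article does not report them), and treating the studies as independent is my simplifying assumption.

```python
def prob_all_null(detection_chance, n_studies=5):
    """Probability that every study misses a real effect, assuming independent studies."""
    if not 0.0 <= detection_chance <= 1.0:
        raise ValueError("detection_chance must be between 0 and 1")
    return (1.0 - detection_chance) ** n_studies


# Hypothetical per-study chances of detecting a real increase.
for chance in (0.3, 0.5, 0.8):
    print(f"detection chance {chance:.0%}: all five null with probability "
          f"{prob_all_null(chance):.4f}")
```

If the individual studies were weak, five null results would be less surprising, which is one reason more definitive data would help.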

A broader observation: claims about cultural zeitgeists are not always backed with data. On one hand, perhaps the change is coming so quickly or so far under the radar (it takes time for scientists and others to measure things) that data simply can’t be found. On the other hand, claims about trends are often based on anecdotes and particular points of view that break down pretty quickly when compared to data that is available.

The FBI doesn’t collect every piece of data about crime

The FBI released the 2014 Uniform Crime Report Monday but it doesn’t have every piece of information we might wish to have:

As I noted in May, much statistical information about the U.S. criminal-justice system simply isn’t collected. The number of people kept in solitary confinement in the U.S., for example, is unknown. (A recent estimate suggested that it might be as many as 80,000 and 100,000 people.) Basic data on prison conditions is rarely gathered; even federal statistics about prison rape are generally unreliable. Statistics from prosecutors’ offices on plea bargains, sentencing rates, or racial disparities, for example, are virtually nonexistent.

Without reliable data on crime and justice, anecdotal evidence dominates the conversation. There may be no better example than the so-called “Ferguson effect,” first proposed by the Manhattan Institute’s Heather MacDonald in May. She suggested a rise in urban violence in recent months could be attributed to the Black Lives Matter movement and police-reform advocates…

Gathering even this basic data on homicides—the least malleable crime statistic—in major U.S. cities was an uphill task. Bialik called police departments individually and combed local media reports to find the raw numbers because no reliable, centralized data was available. The UCR is released on a one-year delay, so official numbers on crime in 2015 won’t be available until most of 2016 is over.

These delays, gaps, and weaknesses seem exclusive to federal criminal-justice statistics. The U.S. Department of Labor produces monthly unemployment reports with relative ease. NASA has battalions of satellites devoted to tracking climate change and global temperature variations. The U.S. Department of Transportation even monitors how often airlines are on time. But if you want to know how many people were murdered in American cities last month, good luck.

There could be several issues at play including:

  1. A lack of measurement ability. Perhaps we have some major disagreements about how to count certain things.
  2. Local law enforcement jurisdictions want some flexibility in working with the data.
  3. A lack of political will to get all this information.

My guess is that the most important issue is #3. If we wanted this data we could get this data. Yet, it may require concerted efforts by individuals or groups to make the issues enough of a social problem to ask that we collect good data. This means that the government and/or public needs a compelling enough reason to get uniformity in measurement and consistency in reporting.

How about this reason: having consistent and timely reporting on such data would help cut down on anecdotes and instead correctly keep the American public up to date. They could then make more informed political and civic choices. Right now, many Americans don’t quite know what is happening with crime rates as their primary sources are anecdotes or mass media reports (which can be quite sensationalistic).

To pay or not to pay for Facebook

Would you rather pay Facebook with money or data?

Not long ago, Zeynep Tufekci, a sociologist who studies social media, wrote that she wanted to pay for Facebook. More precisely, she wants the company to offer a cash option (about twenty cents a month, she calculates) for people who value their privacy, but also want a rough idea of what their friends’ children look like. In return for Facebook agreeing not to record what she does—and to not show her targeted ads—she would give them roughly the amount of money that they make selling the ads that she sees right now. Not surprisingly, her request seems to have been ignored. But the question remains: just why doesn’t Facebook want Tufekci’s money? One reason, I think, is that it would expose the arbitrage scheme at the core of Facebook’s business model and the ridiculous degree to which people undervalue their personal data…

The trick is that most people think they are getting a good deal out of Facebook; we think of Facebook to be “free,” and, as marketing professors explain, “consumers overreact to free.” Most people don’t feel like they are actually paying when the payment is personal data and when there is no specific sensation of having handed anything over. If you give each of your friends a hundred dollars, you might be out of money and will have a harder time buying dinner. But you can hand over your personal details or photos to one hundred merchants without feeling any poorer.

So what does it really mean, then, to pay with data? Something subtler is going on than with the more traditional means of payment. Jaron Lanier, the author of “Who Owns the Future,” sees our personal data not unlike labor—you don’t lose by giving it away, but if you don’t get anything back you’re not receiving what you deserve. Information, he points out, is inherently valuable. When billions of people hand data over to just a few companies, the effect is a giant wealth transfer from the many to the few…

Ultimately, Tufekci wants us to think harder about what it means when we pay with data or attention instead of money, which is what makes her proposition so interesting. While every business has slightly mixed motives, those companies that we pay live and die by how they serve the customer. In contrast, the businesses we are paying with attention or data are conflicted. We are their customers, but we are also their products, ultimately resold to others. We are unlikely to stop loving free stuff. But we always pay in the end—and it is worth asking how.

Perhaps we are headed toward a world where companies like Facebook would have to show customers (1) how much data they actually have about the person and (2) what that data is worth. But I imagine the corporations would like to avoid this because it is better for them if the user is unaware and shares all sorts of things. And what would it take for customers to demand such transparency? Or do we simply like the allure of Facebook and credit cards and other products too much to pull back the curtain?

Is it going too far to suggest that personal data is the most important asset individuals will have in the future?

The ongoing mystery of counting website visitors

The headline says it all: “It’s 2015 – You’d Think We’d Have Figured Out How to Measure Web Traffic By Now.”

ComScore was one of the first businesses to take the approach Nielsen uses for TV and apply it to the Web. Nielsen comes up with TV ratings by tracking the viewing habits of its panel — those Nielsen families — and taking them as stand-ins for the population at large. Sometimes they track people with boxes that report what people watch; sometimes they mail them TV-watching diaries to fill out. ComScore gets people to install the comScore tracker onto their computers and then does the same thing.

Nielsen gets by with a panel of about 50,000 people as stand-ins for the entire American TV market. ComScore uses a panel of about 225,000 people to create their monthly Media Metrix numbers, Chasin said – the numbers have to be much higher because Internet usage is so much more particular to each user. The results are just estimates, but at least comScore knows basic demographic data about the people on its panel, and, crucial in the cookie economy, knows that they are actually people.

As Chasin noted, though, the game has changed. Mobile users are more difficult to wrangle into statistically significant panels for a basic technical reason: Mobile apps don’t continue running at full capacity in the background when not in use, so comScore can’t collect the constant usage data that it relies on for its PC panel. So when more and more users started going mobile, comScore decided to mix things up…

Each measurement company comes up with different numbers each month, because they all have different proprietary models, and the data gets more tenuous when they start to break it out into age brackets or household income or spending habits, almost all of which is user-reported. (And I can’t be the only person who intentionally lies, extravagantly, on every online survey that I come across.)…

And that’s assuming that real people are even visiting your site in the first place. A study published this year by a Web security company found that bots make up 56 percent of all traffic for larger websites, and up to 80 percent of all traffic for the mom-and-pop blogs out there. More than half of those bots are “good” bots, like the crawlers that Google uses to generate its search rankings, and are discounted from traffic number reports. But the rest are “bad” bots, many of which are designed to register as human users — that same report found that 22 percent of Web traffic was made up of these “impersonator” bots.

This is an interesting data problem to solve with multiple interested parties: measurement firms, website owners, people who create search engines, and, perhaps most important of all, advertisers who want to quantify exactly which advertisements are seen and by whom. And the goalposts keep moving: new technologies like mobile devices change how visits are tracked and measured.
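
One way to see why a web panel has to be larger than a TV panel is to look at the margin of error for small audiences. The sketch below uses the textbook formula for a proportion from a simple random sample, which is my assumption; real panels are weighted and are not simple random samples, and comScore's actual model is proprietary. The panel size comes from the excerpt, while the two audience shares are hypothetical.

```python
from math import sqrt


def margin_of_error(share, panel_size, z=1.96):
    """Approximate 95% margin of error for an audience share from a simple random sample."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    return z * sqrt(share * (1.0 - share) / panel_size)


PANEL_SIZE = 225_000  # panel size quoted in the article

# Hypothetical shares: a very popular site versus a niche site.
popular_share, niche_share = 0.20, 0.001
popular_relative = margin_of_error(popular_share, PANEL_SIZE) / popular_share
niche_relative = margin_of_error(niche_share, PANEL_SIZE) / niche_share

print(f"Popular site: estimate is off by up to about {popular_relative:.1%} of its value")
print(f"Niche site:   estimate is off by up to about {niche_relative:.1%} of its value")
```

The smaller the audience, the larger the uncertainty relative to the estimate, and that is before breaking the numbers out by age bracket or income.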

How long until we get an official number from a reputable organization? Could some of these measurement groups and techniques merge? Consolidation to cut costs seems to be popular in the business world these days. In the end, it might not be good measurement that wins out but rather which companies can throw their weight around most effectively to eliminate their competition.

New gadgets, apps want more location data from users

Location data is valuable and more new gadgets make use of the information:

Location-tracking lets developers build fast, useful, personalized apps. They’re enticing, but they come with tradeoffs: your gadgets and apps maintain a log of where you’ve been and what you’re doing, and more of them than you think are sharing that data with others.

It’s going to advertisers, mostly, so they can lure you into the Starbucks a block away or the merch tent at Coachella. It’s as creepy as any other targeted marketing, but most of us have come to accept that it comes with the territory. Jennifer Lynch, a senior staff attorney at the Electronic Frontier Foundation, says it goes deeper. Your data might get sold to your credit reporting agency, which wants to know more about you as it determines your credit score. It might go to your insurance company, which is very interested in your whereabouts. It might be subpoenaed by the government, for just about any reason. Maybe none of that is happening. Maybe all of it is. There’s really no way for us to know…

Your phone’s ability to pinpoint your exact location and use that info to deliver services—a meal, a ride, a tip, a coupon—is reason for excitement. But this world of always-on GPS raises questions about what happens to our data. How much privacy are we willing to surrender? What can these services learn about our activities? What keeps detailed maps of our lives from being sold to the highest bidder? These have been issues as long as we’ve had cellphones, but they are more pressing than ever.

This is another major trade-off that I suspect most users will make without much fuss in the coming years. The cynical take on the advantages for the user is that this is primarily about customizable marketing that can account for both your individual traits and where exactly you are. In other words, sharing location data will give consumers new opportunities. More consumerism! On the flip side, it is less clear how or when location data might be used against you. But, when it is, it probably won’t be good.

The broader issue here is whether people should have geographical freedom that is not known to others. This is increasingly difficult in today’s world, even as we celebrate the mobility Americans have to move within their own communities and country and to travel throughout the world.

“We don’t lie to our search engine. We’re more intimate with it than with our friends, lovers, or family members.”

Wired has an interesting excerpt from a new book Data and Goliath:

One experiment from Stanford University examined the phone metadata of about 500 volunteers over several months. The personal nature of what the researchers could deduce from the metadata surprised even them, and the report is worth quoting:

Participant A communicated with multiple local neurology groups, a specialty pharmacy, a rare condition management service, and a hotline for a pharmaceutical used solely to treat relapsing multiple sclerosis…

That’s a multiple sclerosis sufferer, a heart attack victim, a semiautomatic weapons owner, a home marijuana grower, and someone who had an abortion, all from a single stream of metadata.

Web search data is another source of intimate information that can be used for surveillance. (You can argue whether this is data or metadata. The NSA claims it’s metadata because your search terms are embedded in the URLs.) We don’t lie to our search engine. We’re more intimate with it than with our friends, lovers, or family members. We always tell it exactly what we’re thinking about, in as clear words as possible.

The gist of the excerpt is that while people might be worried about the NSA, corporations know a lot about us: who we have talked to, where we have been, and who we have interacted with through metadata, plus more personal information through search data. And perhaps the trick to all of this is that (1) we generally give up this data voluntarily online (2) because we perceive some benefits and (3) we can’t imagine life without all of this stuff (even though many important sites and social media barely existed a decade or two ago).

The reason I pulled the particular quote out for the headline is that it has some interesting implications: have we traded close social relationships for the intimacy of the Internet? We may not have to deal with so much ignorance – just Google everything now – but we don’t need to interact with people in the same ways.

Also, this highlights the need for tech companies to put a positive spin on all of their products and actions. “Trust us – we have your best interests at heart.” Yet, like most corporations, their best interests deal with money rather than solely helping people live better lives.

Picking apart the top cities for singles rankings

Rankings of the top cities for singles may not be that valid:

“It doesn’t make much difference” where millennials live in terms of their marriage prospects, Andrew Cherlin, director of Johns Hopkins’ sociology department, wrote in an email. He said most major cities now have about the same rate of millennial inhabitants…

And indeed, most of the top cities for this category were near military installations. No. 2 on Wang’s list was San Luis Obispo, which is less than an hour from Vandenberg Air Force base, the third-largest air force base in the country. No. 4, in Hanford, Calif., has a large Navy presence…

So what does predict whether you’ll get married? The reigning champ of marriage indicators is Mormonism, even for millennials. Utah towns occupy the top three slots among 18-34 year-old marriage rates (nearly 2/3rds of millennials are already spoken for in western Utah County, Utah). And the U.S.’s top-three Mormon states, Utah, Wyoming and Idaho, occupy the top three slots for states.

Surprise, surprise; rankings found on the Internet may not be that great. Sometimes this has to do with methodology: what is included in the rankings and how are the different dimensions rated? This is discussed here: do you want to look at millennial composition (where Washington D.C. leads the pack) or millennial marriage rate (Washington D.C. doesn’t do as well)? One lesson might be to have more specific rankings – do you really mean it is best for singles if your data is based on the marriage rate?

Additionally, two other issues arise. First, what if the cities aren’t that different from each other? Rankings are intended to differentiate between options but mathematical differences do not necessarily equal substantive significance. Second, why are the rankings in this order? Here, what related factors – such as the proximity of military installations – might be relevant? This may be hard to pick up at times because not all the cities may be affected by the same phenomena. Thus, the researcher has to do some extra digging to try to explain the rankings rather than just simplistically report them.
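
As a rough check on the first issue, one could ask whether the gap between two adjacent cities in a ranking is even larger than sampling noise. The sketch below uses a standard two-proportion z-test on made-up survey counts; the city numbers are hypothetical and do not come from the rankings discussed here. Passing such a test would still not show that a difference matters substantively, but failing it suggests the ordering of the two cities means little.

```python
from math import sqrt


def two_proportion_z(married_a, sampled_a, married_b, sampled_b):
    """z statistic for the difference between two sample proportions (pooled standard error)."""
    rate_a = married_a / sampled_a
    rate_b = married_b / sampled_b
    pooled = (married_a + married_b) / (sampled_a + sampled_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / sampled_a + 1 / sampled_b))
    return (rate_a - rate_b) / std_err


# Hypothetical survey counts: married millennials out of 1,000 sampled per city.
close_call = two_proportion_z(312, 1000, 300, 1000)   # 31.2% vs. 30.0%
clear_gap = two_proportion_z(400, 1000, 300, 1000)    # 40.0% vs. 30.0%

for label, z in (("31.2% vs 30.0%", close_call), ("40.0% vs 30.0%", clear_gap)):
    verdict = "beyond sampling noise" if abs(z) > 1.96 else "within sampling noise"
    print(f"{label}: z = {z:.2f}, {verdict}")
```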

Even with the argument from Richard Florida about the creative class seeking out cities with enticing culture and entertainment, how many people move where they do because of such rankings?