Cruz campaign using psychological data to reach potential voters

Campaigns not working with big data are behind the curve: Ted Cruz’s campaign is working with unique psychological data as it tries to secure the Republican nomination.

To build its data-gathering operation widely, the Cruz campaign hired Cambridge Analytica, a Massachusetts company reportedly owned in part by hedge fund executive Robert Mercer, who has given $11 million to a super PAC supporting Cruz. Cambridge, the U.S. affiliate of London-based behavioral research company SCL Group, has been paid more than $750,000 by the Cruz campaign, according to Federal Election Commission records.

To develop its psychographic models, Cambridge surveyed more than 150,000 households across the country and scored individuals using five basic traits: openness, conscientiousness, extraversion, agreeableness and neuroticism. A top Cambridge official didn’t respond to a request for comment, but Cruz campaign officials said the company developed its correlations in part by using data from Facebook that included subscribers’ likes. That data helped make the Cambridge data particularly powerful, campaign officials said…

The Cruz campaign modified the Cambridge template, renaming some psychological categories and adding subcategories to the list, such as “stoic traditionalist” and “true believer.” The campaign then did its own field surveys in battleground states to develop a more precise predictive model based on issues preferences.

The Cruz algorithm was then applied to what the campaign calls an “enhanced voter file,” which can contain as many as 50,000 data points gathered from voting records, popular websites and consumer information such as magazine subscriptions, car ownership and preferences for food and clothing.
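None of the Cambridge or Cruz models are public, but the general mechanics are easy to sketch: score each voter-file record on the five traits from whatever attributes are available, then map the scores onto messaging segments. Here is a minimal, purely illustrative sketch; every attribute name, weight, and threshold below is invented.

```python
# Purely illustrative sketch: a toy version of scoring a voter-file record on
# one Big Five trait and mapping the score to a messaging segment. The actual
# Cambridge/Cruz models, features, and weights are not public; every attribute
# name, weight, and threshold here is invented.
from dataclasses import dataclass

# Hypothetical weights linking a few voter-file attributes to "openness"
OPENNESS_WEIGHTS = {
    "subscribes_travel_magazine": 0.4,
    "owns_hybrid_car": 0.3,
    "visits_arts_websites": 0.5,
}

@dataclass
class VoterRecord:
    voter_id: str
    attributes: dict  # e.g., {"owns_hybrid_car": True, ...}

def score_trait(record: VoterRecord, weights: dict) -> float:
    """Sum the weights of the attributes this voter exhibits."""
    return sum(w for attr, w in weights.items() if record.attributes.get(attr))

def assign_segment(openness_score: float) -> str:
    """Toy mapping from a trait score to a campaign-style subcategory."""
    return "true believer" if openness_score >= 0.5 else "stoic traditionalist"

voter = VoterRecord("V001", {"owns_hybrid_car": True, "visits_arts_websites": True})
score = score_trait(voter, OPENNESS_WEIGHTS)
print(score, assign_segment(score))
```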

Building a big data operation behind a major political candidate seems pretty par for the course these days. The success of the Obama campaigns was often attributed to tech whizzes behind the scenes. Since this is now fairly standard, perhaps we need to move on to other questions: what do voters think about such microtargeting and how do they experience it? Does this contribute to political fragmentation? What is the role of the mass media amid more targeted approaches? How valid are the predictions about voters and their behavior (since they are based on particular social science data and theories)? How does all of this significantly change political campaigns?

How far are we from just getting rid of the candidates altogether and putting together AI apps/machines/data programs that garner support…


Subjective decisions can affect home appraisals

The final appraisal price for a home can be influenced by numerous subjective factors:

A massive, first-of-its-kind study of 1.3 million individual appraisal reports from 2012 through this year conducted by real estate analytics firm CoreLogic offers a suggestion: You should look at what are called adjustments to appraisals that involve relatively subjective estimations — the appraiser’s opinions on the overall quality level of your house, its condition, location and view — rather than more objectively determinable items such as living space square footage, lot size, number of baths and bedrooms, etc…

Adjustments are made in 99.8 percent of all appraisals, according to the CoreLogic study. The most frequent adjustments involve objective features of houses: Living area, rooms, car storage, porch and deck were all adjusted in more than 50 percent of the study’s 1.3 million appraisals, according to CoreLogic. (As a rule, the adjustments on objective features were not large in dollar terms. For example, room adjustments were made in nearly three-quarters of all appraisals but averaged only $2,246 and did not affect the final appraised value dramatically.)

Adjustments involving more-subjective matters — the overall quality or condition of the house — were less common, but they typically triggered much bigger dollar changes. The average adjustment based on quality was nearly $15,000, which is more than enough to complicate a home sale. Some subjective adjustments on the view or location of high-cost homes ran into the hundreds of thousands or even millions of dollars…

Research released last week by Platinum Data Solutions, which reviewed 300,000 appraisals made between July and September, found that fully 39 percent of “quality” or “condition” ratings conflicted with previous ratings on the same property. That inevitably invites controversy.

In other words, appraisals are an inexact science. What makes this particularly frustrating is that the stakes can be big: sellers and buyers are dealing with one of the biggest financial investments of their lives.
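As a side note, the CoreLogic pattern described above — frequent but small objective adjustments, rarer but much larger subjective ones — is the kind of summary that falls out of a simple group-by on adjustment-level data. A rough sketch; the column names and sample rows below are hypothetical, not CoreLogic's actual schema.

```python
# Rough sketch of summarizing appraisal adjustments: how often each adjustment
# type appears and how large it is in dollars. The columns and sample rows are
# hypothetical, not CoreLogic's actual data.
import pandas as pd

adjustments = pd.DataFrame([
    {"appraisal_id": 1, "type": "rooms",   "amount": 2500,   "subjective": False},
    {"appraisal_id": 1, "type": "quality", "amount": 18000,  "subjective": True},
    {"appraisal_id": 2, "type": "rooms",   "amount": 1800,   "subjective": False},
    {"appraisal_id": 3, "type": "view",    "amount": 120000, "subjective": True},
])

n_appraisals = adjustments["appraisal_id"].nunique()
summary = (adjustments.groupby("type")
           .agg(share_of_appraisals=("appraisal_id", lambda s: s.nunique() / n_appraisals),
                avg_dollar_change=("amount", "mean"))
           .sort_values("avg_dollar_change", ascending=False))
print(summary)
```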

Two more thoughts about these findings:

  1. In order to cut down on the variation in findings, would it be better to regularly have multiple appraisers for the same property or some sort of blinded review?
  2. Here is an example of how big data can help reveal patterns across numerous properties and appraisers. But it would be particularly interesting – and perhaps some money could be made – if research identified individual appraisers who consistently had high or low findings.

The perils of analyzing big real estate data

Two leaders of Zillow recently wrote Zillow Talk: The New Rules of Real Estate, which is a sort of Freakonomics look at all the real estate data they have. While it is an interesting book, it also illustrates the difficulties of analyzing big data:

1. The key to the book is all the data Zillow has harnessed to track real estate prices and make predictions on current and future prices. They don’t say much about their models. This could be for two good reasons: this is aimed at a mass market and the models are their trade secrets. Yet, I wanted to hear more about all the fascinating data – at least in an appendix?

2. Problems of aggregation: the data is usually analyzed at a metro area or national level. There are hints at smaller markets – a chapter on NYC, for example, and another looking at some unusual markets like Las Vegas – but there are no separate chapters on cheaper/starter homes or luxury homes. An unanswered question: is real estate more similar within markets or across them? Put another way, are the features of the Chicago market unique and patterned, or are cheaper homes in the Chicago region more like comparable homes in Atlanta or Los Angeles than like more expensive homes in their own market?

3. Most provocative argument: in Chapter 24, the authors suggest that pushing homeownership for lower-income Americans is a bad idea as it can often trap them in properties that don’t appreciate. This was a big problem in the 2000s: Presidents Clinton and Bush pushed homeownership, but after housing values dropped in the late 2000s, poorer neighborhoods were hit hard, leaving many homeowners in default or seriously underwater. Unfortunately, unless demand picks up in these neighborhoods (and gentrification is pretty rare), these homes are not good investments.

4. The individual chapters often discuss small effects that may be significant but don’t have large substantive effects. For example, there is a section on male vs. female real estate agents. The effects for each gender are small: at most, a few percentage points difference in selling price as well as slight variations in speed of sale. (Women are better in both categories: higher prices, faster sales.)

5. The authors are pretty good at repeatedly pointing out that correlation does not mean causation. Yet, they don’t catch all of these moments and at other times present patterns in ways that distort the axes. For example, here is a chart from page 202:

[Chart from Zillow Talk, p. 202]

These two things may be correlated (as one goes up, so does the other, and vice versa), but why set the axes so that you are comparing half-percentage increments to five-percentage-point increments? (A quick sketch of this axis mismatch appears after this list.)

6. Continuing #4, I suppose a buyer and seller would want to use all the tricks they can, but the tips here mean that those in the real estate market are supposed to string together all of these small effects to maximize what they get. On the final page, they write: “These are small actions that add up to a big difference.” Maybe. With margins of error on the effects, some buyers and sellers aren’t going to get the effects outlined here: some will benefit more but some will benefit less.

7. The moral of the whole story? Use data to your advantage even as it is not a guarantee:

In the new realm of real estate, everyone faces a rather stark choice. The operative question now is: Do you wield the power of data to your advantage? Or do you ignore the data, to your peril?

The same is true of the housing market writ large. Certainly, many macro-level dynamics are out of any one person’s control. And yet, we’re better equipped than ever before to choose wisely in the present – to make the kinds of measured judgments that can prevent another coast-to-coast bubble and calamitous burst. (p.252)
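To return to point #5: the axis problem is easy to reproduce. Here is a toy sketch (all data invented) that puts a slowly creeping series next to a rapidly rising one, first with separate y-axes and then on a shared axis.

```python
# Toy demonstration of the axis problem from point #5: pairing a half-percent
# scale with a five-percentage-point scale makes a nearly flat series appear
# to move in lockstep with a rapidly rising one. All data here are invented.
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2000, 2015)
small_series = 1.0 + 0.05 * np.arange(len(years))   # creeps up ~0.7 points total
large_series = 10.0 + 5.0 * np.arange(len(years))   # climbs ~70 points total

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: each series gets its own y-axis, so both fill the frame
ax_left.plot(years, small_series, color="tab:blue")
twin = ax_left.twinx()
twin.plot(years, large_series, color="tab:red")
ax_left.set_title("Separate axes: the lines appear to track")

# Right panel: same y-axis for both, which shows how different the changes are
ax_right.plot(years, small_series, color="tab:blue")
ax_right.plot(years, large_series, color="tab:red")
ax_right.set_title("Shared axis: one line is nearly flat")

plt.tight_layout()
plt.show()
```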

In the end, this book is aimed at the mass market where a buyer or seller could hope to string together a number of these small advantages. Yet, there are no guarantees and the effects are often small. Having more data may be good for markets and may make participants feel more knowledgeable (or perhaps more overwhelmed) but not everyone can take advantage of this information.

Using Chicago as a new big data laboratory

University of Chicago sociologist Robert Park once said that the city was a laboratory. A new venture seeks to use Chicago as just that:

On the heels of the University of Chicago’s $1 million Innovation Challenge for urban policy solutions, today’s announcement that UI Labs (“universities and industries”) will open CityWorks, a private R&D partnership that will be based on Goose Island, sets up the city to be a center for urban studies, technology and innovations. Founding partners Microsoft, Accenture, ComEd and Siemens will operate a bit like angel investors, according to Jason Harris, a spokesman for UI Labs. This project will seek to “level up Chicago as a center for the built environment.” The city’s mix of university and industry partners, government leadership and legacy of architecture and design innovation place it in a perfect position for this kind of incubator, according to Harris.

CityWorks wants to seed 6-8 ideas this year, focused on energy, physical infrastructure, transportation and water and sanitation, Harris says (funding amounts aren’t being released). “Our vision is that we have projects that can use the city as a testbed and try out ideas not being tested in other cities,” he says.

CityWorks will award grants to university and private researchers, with a focus on digital planning and the Internet of Things. Chicago is vying to be an important center for this potentially lucrative field. With the recent introduction of the Array of Things, a cutting-edge system of sensors that researchers and computer scientists are hoping will prove the value of real-time, open-source city data, and the recent opening of Uptake, a Brad Keywell-backed startup looking to bring custom data analytics solutions to businesses, the city is well-positioned to become a leader in the field.

I’ll be interested to see what comes out of this. It sounds like the goal is to use big data collected at the city scale to find solutions to urban business issues. I do wonder if this is primarily about making profits or more about addressing urban social problems.

Some might be surprised to see such a project going forward in Chicago. After all, isn’t it a Rust Belt city struggling with big financial problems and violence? At the same time, this project highlights Chicago as a center of innovation (which requires a particular social context), a place where businesses want to locate, and home to a good amount of human capital (in both research interests and educated workers).

Using Google Street View to collect large-scale neighborhood data

One sociologist has plans to use a new Google Street View app to study neighborhoods:

Michael Bader, a professor of sociology at American University, revealed that the app he developed is called the Computer Assisted Neighborhood Visual Assessment System (CANVAS). The app rated 150 different features of neighborhoods in some major metropolitan areas in the U.S. The researchers claim the app reduces the cost and time of research.

With the help of Google Street View, the new app connects images and creates panoramic views of the required rural areas as well as cities. Bader explains that without the Google app researchers would have to cover many square miles for data collection, which is a painstaking job…

The app has already received funding of around $250,000 and is also supposed to be the first app that examines the scope and reliability of Google Street View when it comes to rating neighborhoods in the U.S.

Bader reveals he is currently using CANVAS for research on the Washington D.C. area. He revealed that the share of the region’s population aged 65 and over will reach 15.3 percent by 2030. Bader hopes to understand why elderly people leave their community and what stops them from spending the remainder of their lives in the region. Bader’s research wants to understand the challenges elders face in Washington D.C.

As an urban sociologist, I think this has a lot of potential: Google Street View contains an incredible amount of data and offers many possibilities for visual sociology. While the tradition in urban sociology might involve long-term study of a neighborhood (or perhaps a lot of walking within a single city), this offers a way to easily compare street scenes within and across cities.
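CANVAS itself isn’t publicly available, but the basic building block — pulling street-level images for a list of locations so observers can rate them — can be sketched with Google’s Street View Static API. A minimal sketch, assuming you have your own API key; the addresses and file names below are placeholders.

```python
# Minimal sketch (not the CANVAS app itself) of pulling street-level images
# for a list of addresses via Google's Street View Static API, which could
# then be rated by trained observers. Requires your own API key; the
# addresses and file naming here are placeholders.
import requests

STREET_VIEW_URL = "https://maps.googleapis.com/maps/api/streetview"
API_KEY = "YOUR_API_KEY"  # placeholder

addresses = [
    "1600 Pennsylvania Ave NW, Washington, DC",
    "233 S Wacker Dr, Chicago, IL",
]

for i, address in enumerate(addresses):
    params = {"size": "640x400", "location": address, "fov": 90, "key": API_KEY}
    resp = requests.get(STREET_VIEW_URL, params=params, timeout=30)
    resp.raise_for_status()
    with open(f"streetview_{i}.jpg", "wb") as f:
        f.write(resp.content)  # save the image for later neighborhood rating
```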

Deciphering the words in home listings; “quaint” = 1,299 square feet, “cute” = 1,128 square feet

An analysis of Zillow data looks at the text accompanying real estate listings:

Longer is almost always better, though above a certain length, you didn’t get any added value — you don’t need to write a novel. Over 250 words, it doesn’t seem to matter. Our takeaway was that if you’ve got it, flaunt it. Descriptive words are very helpful. “Stainless steel,” “granite,” “view” and “landscaped” were found in listings that got a higher sales price than comparable homes.

And there are words you should stay away from, especially “nice.” We think that in the American dialect, you say “nice” if you don’t have anything more to say. And then there are the words that immediately tell a buyer that the house is small: When we analyzed the data, we found that homes described as “charming” averaged 1,487 square feet, “quaint” averaged 1,299 square feet, and “cute” averaged 1,128. All of them were smaller than the average house in our sampling.

The impact of words seems to vary by price tier. For example, “spotless” in a lower-priced house seemed to pay off in a 2 percent bonus in the final price, but it didn’t seem to affect more pricey homes. “Captivating” paid off by 6.5 percent in top-tier homes, but didn’t seem to matter in lower-priced ones.

There are certainly codes in real estate listings that are necessary due to the limited space for words. But, as the article notes, some of the words are more precise than others. If someone says they have stainless steel appliances, the potential buyer has a really good idea of what is there. But other words are much more ambiguous. Just how “new” are big-ticket items like roofs or flooring or furnaces? The big data of real estate listings allows us to see the patterns tied to these words. Just remember the order for size: cute is small, quaint is slightly larger, and charming slightly bigger still.
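For what it’s worth, the underlying analysis is straightforward to reproduce on any set of listings: flag which descriptions contain a keyword and compare the average square footage (or price) for that group to the rest of the sample. A toy sketch with invented rows:

```python
# Rough sketch of the Zillow-style word analysis: for each keyword, find the
# listings whose description mentions it and compare average square footage
# to the overall sample. The DataFrame rows below are invented.
import pandas as pd

listings = pd.DataFrame([
    {"description": "Cute starter home near park",         "sqft": 1100, "price": 180000},
    {"description": "Quaint bungalow with updated kitchen", "sqft": 1300, "price": 210000},
    {"description": "Charming colonial, granite counters",  "sqft": 1500, "price": 295000},
    {"description": "Spacious home with a view",            "sqft": 2400, "price": 450000},
])

keywords = ["cute", "quaint", "charming", "view", "granite"]
overall_avg = listings["sqft"].mean()

for word in keywords:
    mask = listings["description"].str.contains(word, case=False)
    if mask.any():
        print(f"{word!r}: avg {listings.loc[mask, 'sqft'].mean():.0f} sq ft "
              f"(sample avg {overall_avg:.0f})")
```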

If I’m Zillow, is it time to sell this info to select real estate professionals?

Mapping every road in the United States

The United States has a lot of roads and you can see them all on these state and national maps:

Roads, it turns out, are fantastic indicators of geographies, as evidenced by Fathom’s All Streets series of posters. A few years ago the Boston design studio released All Streets, a detailed look at all the streets in the United States. The team has since produced a set of All Streets for individual states and countries.

Using data from the U.S. Census Bureau’s TIGER/Line data files (Open Maps for other countries), the designers are able to paint a clear picture of where our infrastructure bumps into nature-made dead ends. In states like North Dakota and Iowa you see flat expanses of grids. Nebraska has a dense set of roads in the east near its more populated cities that dissipates as you head west towards the rural Sandhills prairie. A dark spot near the southern tip of Nevada punctuates the otherwise desert-heavy state, conversely, the Adirondack mountain range provides an expanse of white in a dark stretch of New York roads.

I find the smaller maps or smaller-scale views more interesting because they show some differences. Looking at the national level doesn’t reveal all that much because we are now used to such images based on infrastructure and big data, whether based on cell phone coverage or interstates or lights seen from space or population distributions.

I could see hanging one of these – perhaps the Illinois version?
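For anyone tempted to make their own version, the raw material is freely available: the Census Bureau publishes county-level TIGER/Line roads shapefiles. Here is a minimal sketch, not Fathom’s actual process; the file name below is a hypothetical DuPage County, Illinois download.

```python
# Not Fathom's actual process; just a minimal sketch of drawing every road in
# one county from a Census TIGER/Line roads shapefile with geopandas. The
# file path is a placeholder for whatever county file you download from
# census.gov (the "ROADS" series).
import geopandas as gpd
import matplotlib.pyplot as plt

# Placeholder path: a single county's TIGER/Line roads shapefile, unzipped locally
roads = gpd.read_file("tl_2015_17043_roads.shp")  # hypothetical DuPage County, IL file

ax = roads.plot(linewidth=0.2, color="black", figsize=(10, 10))
ax.set_axis_off()  # drop the axes so only the road network shows, poster-style
plt.savefig("county_roads.png", dpi=300, bbox_inches="tight")
```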

I like the band name “Big Data”

Band names can often reflect societal trends so I’m not surprised that a group selected the name Big Data. I like the name: it sounds current and menacing. I’ve only heard their songs a few times on the radio so I’ll have to reserve judgment on what they have actually created.

It might be interesting to think of what sociological terms and ideas could easily translate into good band names. One term that sometimes intrigues my intro students – interactional vandalism – could work. Conspicuous consumption? Cultural lag? Differential association? The culture industry? Impression management? The iron cage? Social mobility? The hidden curriculum?

“New Apps Instantly Convert Spreadsheets Into Something Actually Readable”

Several new apps transform spreadsheet data into a chart or graph without having to spend much or any time with the raw data:

It’s called Project Elastic, and he unveiled the thing this fall at a conference run by his company, Tableau. The Seattle-based company has been massively successful selling software that helps big businesses “visualize” the massive amount of online data they generate—transform all those words and numbers into charts and graphics their data scientists can more readily digest—but Project Elastic is something different. It’s not meant for big businesses. It’s meant for everyone.

The idea is that, when someone emails a spreadsheet to your iPad, the app will open it up—but not as a series of rows and columns. It will open the thing as a chart or graph, and with a swipe of the finger, you can reformat the data into a new chart or graph. The hope is that this will make it easier for anyone to read a digital spreadsheet—an age-old computer creation that still looks like Greek to so many people. “We think that seeing and understanding your data is a human right,” says Story, the Tableau vice president in charge of the project.

And Story isn’t the only one. A startup called ChartCube has developed a similar tool that can turn raw data into easy-to-understand charts and graphs, and just this week, the new-age publishing outfit Medium released a tool called Charted that can visualize data in similar ways. So many companies aim to democratize access to online data, but for all the different data analysis tools out on the market, this is still the domain of experts—people schooled in the art of data analysis. These projects aim to put the democracy in democratize.
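None of these tools expose their internals, but the core idea — read a spreadsheet and render a sensible default chart without the user touching rows and columns — is simple to illustrate. A bare-bones sketch with a placeholder file name:

```python
# Bare-bones illustration of the idea behind these tools (not their actual
# code): read a spreadsheet export and render a default chart automatically.
# The file name and the choice of a bar chart are placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")            # placeholder spreadsheet export
df.plot(x=df.columns[0], kind="bar")     # naive default: first column as labels
plt.title("Auto-generated chart")
plt.tight_layout()
plt.show()
```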

Two quick thoughts:

1. I understand the impulse to create charts and graphs that communicate patterns. Yet, such devices are not infallible in themselves. I would suggest we need more education in interpreting and using the information in infographics. Additionally, this might be a temporary solution, but wouldn’t it be better in the long run if more people knew how to read and use a spreadsheet?

2. Interesting quote: “We think that seeing and understanding your data is a human right.” I have a right to data or to the graphing and charting of my data? This adds to a collection of voices arguing for a human right to information and data.

Don’t see social media as representative of full populations

This should be obvious, but computer scientists remind us that social media users are not representative of full populations:

One of the major problems with sites like Twitter, Pinterest or Facebook is ‘population bias’ where platforms are populated by a very narrow section of society.

Latest figures on Twitter suggest that just five per cent of over 65s use the platform compared with 35 per cent for those aged 18-29. Similarly far more men use the social networking site than women.

Instagram has a particular appeal to younger adults, urban dwellers, and non-whites.

In contrast, the picture-posting site Pinterest is dominated by females aged between 25 and 34. LinkedIn is especially popular among graduates and internet users in higher income households.

Although Facebook is popular across a diverse mix of demographic groups scientists warn that postings can be skewed because there is no ‘dislike’ button. There are also more women using Facebook than men, 76 per cent of female internet users use the site compared with 66 per cent of males.

Who does the data from social media represent? The people who use social media, who, as pointed out above, tend to skew younger across the board and differ in other ways depending on the service. Just because people are willing to put information out there doesn’t mean that it is a widely shared perspective, even if a Twitter account has millions of followers or a Facebook group has a lot of likes. Until we have a world where everyone participates in social media in similar ways and makes much of the same information public, we need to be careful about social media samples.
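Using just the Twitter figures quoted above, the skew is easy to quantify:

```python
# Quick illustration of the skew described above, using the usage rates quoted
# in the article (share of each age group that uses Twitter). Treating these
# as the only inputs, the younger group is about seven times more likely to
# appear in Twitter data than the over-65 group, so raw counts overweight the young.
usage_rate = {"18-29": 0.35, "65+": 0.05}  # from the figures quoted above

relative_overrepresentation = usage_rate["18-29"] / usage_rate["65+"]
print(f"18-29 year olds are {relative_overrepresentation:.0f}x "
      "more likely than over-65s to show up in Twitter data")
```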