Nations vying for big data hegemony

Big data is out there – but who will control it or oversee it?


The rise of Big Data—the vast digital output of daily life, including data Google and Facebook collect from their users and convert into advertising dollars—is now a matter of national security, according to some policymakers. The fear is that China is vacuuming up data about the U.S. and its citizens not just to steal secrets from U.S. companies or to influence citizens but also to build the foundation of technological hegemony in the not-too-distant future. Data—lots of it, the more the better—has, along with the rise of artificial intelligence, taken on strategic importance…

Broad fears of technological hegemony may be overblown, some policy experts say. And harsh measures against China could alienate allies and trigger a rash of similarly harsh measures by countries abroad toward U.S. tech firms.

In any case, the U.S. is in an exceedingly weak position to lead a moral crusade for the sanctity of data. The concept of harvesting clicks, text, internet addresses and other data from unsuspecting citizens and exploiting them for commercial and national-security ends was invented in the halls of the National Security Agency, the CIA and the tech startups of Silicon Valley. Facebook (now Meta), Google, Amazon, Microsoft and Apple currently lead a vast industry based on trading and compiling user data. Taking measures to protect the data of American citizens from the ravages of Silicon Valley would go a long way to protecting them from China, too. Any measures directed solely against China would likely be ineffective because vast troves of consumer data would still be available for purchase on secondary data markets…

Whatever the case, some suggest the world is already moving inexorably towards a bipolar digital world—a move that will only accelerate as the burgeoning race for AI dominance between China and America picks up steam.

So data becomes just another arena in which powerful nations fight? Does data, with all of its potential and pitfalls, simply become an instrument of national power?

There could be other options here, though it might be hard to know whether any of them would be preferable to states controlling big data:

  1. In the hands of users. Move data toward consumers and individuals rather than leaving it in the hands of, or accessible to, nations and corporations.
  2. In the hands of corporations. They often generate and collect much of this data and operate across nations and contexts.
  3. In the hands of some other neutral actors. Such actors may not exist yet, or may lack much power, but could they emerge in the future?

This bears watching: it could go well or badly, and either outcome would have wide consequences.

Facebook releases big data to researchers outside the company

Researchers can now access a large dataset of Facebook sharing activity:

Social Science One is an effort to get the Holy Grail of data sets into the hands of private researchers. That Holy Grail is Facebook data. Yep, that same unthinkably massive trove that brought us Cambridge Analytica.

In the Foo Camp session, Stanford Law School’s Nate Persily, cohead of Social Science One, said that after 20 months of negotiations, Facebook was finally releasing the data to researchers. (The researchers had thought all of that would be settled in two months.) A Facebook data scientist who worked on the team dedicated to this project beamed in confirmation. Indeed, the official announcement came a few days later…

This is a new chapter in the somewhat tortured history of Facebook data research. The company hires top data scientists, sociologists, and statisticians, but their primary job is not to conduct academic research, it’s to use research to improve Facebook’s products and promote growth. These internal researchers sometimes do publish their findings, but after a disastrous 2014 Facebook study that involved showing users negative posts to see if their mood was affected, the company became super cautious about what it shared publicly. So this week’s data drop really is a big step in transparency, especially since there’s some likelihood that the researchers may discover uncomfortable truths about the way Facebook spreads lies and misinformation.

See the codebook here and the request for proposals to use the data here. According to the RFP, the data involves shared URLs and who interacted with those links:

Through Social Science One, researchers can apply for access to a unique Facebook dataset to study questions related to the effect of social media on democracy. The dataset contains approximately an exabyte (a quintillion bytes, or a billion gigabytes) of raw data from the platform, a total of more than 10 trillion numbers that summarize information about 38 million URLs shared more than 100 times publicly on Facebook (between 1/1/2017 and 7/31/2019).  It also includes characteristics of the URLs (such as whether they were fact-checked or flagged by users as hate speech) and the aggregated data concerning the types of people who viewed, shared, liked, reacted to, shared without viewing, and otherwise interacted with these links. This dataset enables social scientists to study some of the most important questions of our time about the effects of social media on democracy and elections with information to which they have never before had access.
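The codebook defines the actual schema, but as a rough mental model, each row is an aggregated summary of one widely shared URL rather than a record of individual user activity. Here is a minimal sketch of what such a record might look like; the field names are my own invention for illustration, not the dataset’s real columns.

```python
from dataclasses import dataclass

# Hypothetical illustration only: these field names are invented for this sketch
# and do not come from the Social Science One codebook.
@dataclass
class UrlRecord:
    url_id: str                      # identifier for one widely shared URL
    share_count: int                 # public shares (>= 100 for inclusion)
    view_count: int                  # aggregated views of posts containing the URL
    share_without_click_count: int   # shares where the link was not opened first
    fact_checked: bool               # whether the URL was fact-checked
    flagged_as_hate_speech: bool     # whether users flagged it
    first_shared: str                # date within the 1/1/2017-7/31/2019 window

record = UrlRecord("url_000001", 1042, 250_000, 310, True, False, "2018-03-14")
print(f"Share-without-viewing rate: {record.share_without_click_count / record.share_count:.2%}")
```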

Now to see what social scientists can do with the data. The emphasis appears to be on democracy, political posts, and misinformation but given what is shared on Facebook, I imagine there are connections to numerous other topics.

Linking nicer cars to a suburb on the rise

From the Australian suburbs: one insider suggests that the appearance of nicer cars in driveways signals good prospects for the suburban community.

The gentrification of the driveway happens before the gentrification of a suburb, says the boss of a data analytics company.

Upmarket vehicles beginning to appear in the carports and garages of houses is often a forerunner of a suburb on the rise, as renovators move in...

When more models such as a BMW X5 or an Audi SUV begin appearing in the driveway of houses and apartments in particular suburban streets, it is a reliable predictor of a suburb undergoing gentrification and becoming much more popular with renovators. Extra investment in community infrastructure often followed, and there was a broad flow on to higher property prices…

He said households who were taking out a loan for $500,000 to buy a rundown home in an up-and-coming area were often also purchasing a $30,000 to $40,000 car to fit the aspirational lifestyle.

The article chalks this up as a big data insight: bringing together multiple pieces of information helped reveal the relationship. I can see how this information might help investors, but it is less clear how it would help residents or local governments.

More broadly, this gets at something my dad always said: look at the cars in driveways, on the street, or in parking spots and it gives you a sense of the people who live there. In societies that prize cars, such as in the United States and Australia and particularly their suburbs, a vehicle becomes an important social marker. The one-to-one relationship might not always work as some people buy more expensive cars than their housing might indicate and vice versa (recall the stories of millionaires driving old reliable cars). Yet, on the whole, people of different social classes drive different vehicles in varying states of repair. Hence, various brands aim at different segments of the market. Famously, General Motors did this early in the 20th century with five different car lines to appeal to different kinds of buyers.

UPDATE: I probably did not contribute to this upward trend with long-term ownership of a Toyota Echo. But, it looked good for its age.

 

Collecting big data the slow way

One of the interesting side effects of the era of big data is finding out how much information is not actually automatically collected (or is at least not available to the general public or researchers without paying money). A quick example from the work of sociologist Matthew Desmond:

The new data, assembled from about 83 million court records going back to 2000, suggest that the most pervasive problems aren’t necessarily in the most expensive regions. Evictions are accumulating across Michigan and Indiana. And several factors build on one another in Richmond: It’s in the Southeast, where the poverty rates are high and the minimum wage is low; it’s in Virginia, which lacks some tenant rights available in other states; and it’s a city where many poor African-Americans live in low-quality housing with limited means of escaping it.

According to the Eviction Lab, here is how they collected the data:

First, we requested a bulk report of cases directly from courts. These reports included all recorded information related to eviction-related cases. Second, we conducted automated record collection from online portals, via web scraping and text parsing protocols. Third, we partnered with companies that carry out manual collection of records, going directly into the courts and extracting the relevant case information by hand.

In other words, it took a lot of work to put together such a database: various courts, websites, and companies held different pieces of information, and it took researchers to access all of that data and stitch it together.
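The second collection method described in the quote above, automated collection from online portals, might look roughly like the sketch below. The portal URL, page markup, and field names here are entirely hypothetical; the Eviction Lab’s actual scrapers and parsers are certainly more involved.

```python
# Minimal sketch of automated record collection via web scraping, assuming a
# hypothetical court portal that lists eviction cases in an HTML table.
# The URL, markup structure, and column order are invented for illustration.
import csv
import requests
from bs4 import BeautifulSoup

PORTAL_URL = "https://example-court-portal.test/evictions?page={page}"  # hypothetical

def scrape_page(page: int) -> list[dict]:
    """Fetch one results page and parse each case row into a dict."""
    html = requests.get(PORTAL_URL.format(page=page), timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    cases = []
    for row in soup.select("table.case-list tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 4:
            cases.append({
                "case_number": cells[0],
                "filing_date": cells[1],
                "defendant_address": cells[2],
                "outcome": cells[3],
            })
    return cases

if __name__ == "__main__":
    all_cases = [case for page in range(1, 6) for case in scrape_page(page)]
    with open("raw_cases.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["case_number", "filing_date",
                                               "defendant_address", "outcome"])
        writer.writeheader()
        writer.writerows(all_cases)
```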

Without a researcher or a company or government body explicitly starting to record or collect certain information, a big dataset on that particular topic will not happen. Someone or some institution, typically with resources at its disposal, needs to set a process into motion. And simply having the data is not enough; it needs to be cleaned up so it all works with the other pieces. Again, from the Eviction Lab:

To create the best estimates, all data we obtained underwent a rigorous cleaning protocol. This included formatting the data so that each observation represented a household; cleaning and standardizing the names and addresses; and dropping duplicate cases. The details of this process can be found in the Methodology Report (PDF).
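Here is a minimal sketch of that kind of cleaning pass, assuming pandas and invented column names; the real protocol is spelled out in the Methodology Report.

```python
# Sketch of a cleaning pass like the one described above: standardize names and
# addresses, drop duplicates, and collapse records so each row represents a
# household. Column names are assumptions, not the Eviction Lab's schema.
import pandas as pd

raw = pd.read_csv("raw_cases.csv")

# Standardize free-text fields so the same household matches across sources.
raw["defendant_name"] = (raw["defendant_name"].str.upper()
                                              .str.strip()
                                              .str.replace(r"\s+", " ", regex=True))
raw["defendant_address"] = (raw["defendant_address"].str.upper()
                                                    .str.replace(r"\bSTREET\b", "ST", regex=True)
                                                    .str.replace(r"\bAVENUE\b", "AVE", regex=True))

# Drop exact duplicate filings, then keep one row per household per filing date.
cleaned = (raw.drop_duplicates()
              .groupby(["defendant_name", "defendant_address", "filing_date"],
                       as_index=False)
              .first())

cleaned.to_csv("cases_clean.csv", index=False)
```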

All of this work can yield a fascinating dataset of roughly 83 million records on an important topic.

We are probably still a ways off from a scenario where this information would automatically become part of a dataset. This data had a definite start and required much work. There are many other areas of social life that require similar efforts before researchers and the public have big data to examine and learn from.

When the candidate with the big data advantage didn’t win the presidency

Much was made of the effective use of big data by Barack Obama’s campaigns. That analytic advantage didn’t help the Clinton campaign:

Clinton can be paranoid and self-destructively self-protective, but she’s also capable of assessing her own deficiencies as a politician in a bracingly clear-eyed way. And the conclusion that she drew from her 2008 defeat was essentially an indictment of her own management style: Eight years earlier, she had personally presided over a talented, sloppy, squabbling, sprawling menagerie of pals, longtime advisers and hangers-on who somehow managed to bungle the building of a basic political infrastructure to oppose Obama’s efficient, data-driven operation.

To do so, Mook hired a buddy who had helped Terry McAuliffe squeak out a win in the 2014 Virginia governor’s race: Elan Kriegel, a little-known data specialist who would, in many ways, exert more influence over the candidate than any of the all-star team of veteran consultants. Kriegel’s campaign-within-a-campaign conducted dozens of targeted surveys—to test messaging and track voter sentiment day-by-day, especially in battleground states—and fed them into a computer algorithm, which ran hundreds of thousands of simulations that were used to steer ad spending, the candidate’s travel schedule, even the celebrities Clinton would invite to rallies.

The data operation, five staffers told me, was the source of Mook’s power within the campaign, and a source of perpetual tension: Many of Clinton’s top consultants groused that Mook and Kriegel withheld data from them, balking at the long lead time—a three-day delay—between tracking reports. A few of them even thought Mook was cherry-picking rosy polling to make the infamously edgy Clinton feel more confident…

In numerous interviews conducted throughout the campaign, Clinton staffers attested to Mook’s upbeat attitude and mastery of detail. But, in the end, Brooklyn simply failed to predict the tidal wave that swamped Clinton—a pro-Trump uprising in rural and exurban white America that wasn’t reflected in the polls—and his candidate failed to generate enough enthusiasm to compensate with big turnouts in Detroit, Milwaukee and the Philadelphia suburbs.

It would be fascinating to hear more. The pollsters didn’t get it right – but neither did the Clinton campaign internally?

The real question is what this will do to future campaigns. Was Donald Trump’s lack of campaign infrastructure and reliance on celebrity and media coverage (also highlighted nicely in the article above) something that others can or will replicate? Or, would the close margins in this recent presidential election highlight even more the need for finely-tuned data and microtargeting? I’m guessing the influence of big data in campaigns will only continue, but data will only get you so far if (1) it isn’t great data in the first place and (2) people don’t know how to use it well.
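The article does not describe Kriegel’s model itself. Purely to illustrate the general approach of feeding tracking polls into repeated simulations, here is a toy Monte Carlo sketch; the states, margins, and error assumptions are placeholders, not anything from the campaign.

```python
# Toy Monte Carlo sketch of the general idea: simulate many elections from noisy
# polling averages and use the resulting win probabilities to prioritize resources.
# All states, margins, and the error size below are placeholder data.
import random

battlegrounds = {          # state: (polling margin in points, electoral votes)
    "StateA": (1.5, 20),
    "StateB": (-0.5, 16),
    "StateC": (3.0, 10),
}
POLL_ERROR_SD = 3.0        # assumed standard deviation of polling error
N_SIMULATIONS = 100_000

wins_by_state = {state: 0 for state in battlegrounds}
for _ in range(N_SIMULATIONS):
    for state, (margin, _votes) in battlegrounds.items():
        simulated_margin = margin + random.gauss(0, POLL_ERROR_SD)
        if simulated_margin > 0:
            wins_by_state[state] += 1

for state, wins in wins_by_state.items():
    print(f"{state}: win probability {wins / N_SIMULATIONS:.2f}")
```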

Let Amazon’s big data tractor trailer drive to you

Americans like big trucks and hard drive space, so why not put the two together?

Amazon announced the new service, confusingly named Snowmobile, at its Re:Invent conference in Las Vegas this week. It’s designed to shuttle as many as 100 petabytes–around 100,000 terabytes–per truck. That’s enough storage to hold five copies of the Internet Archive (a comprehensive backup of the web both present and past), which contains “only” about 18.5 petabytes of unique data...

Using multiple semis to shuttle data around might seem like overkill. But for such massive amounts of data, hitting the open road is still the most efficient way to go. Even with a one gigabit per-second connection such as Google Fiber, uploading 100 petabytes over the internet would take more than 28 years. At an average speed of 65 mph, on the other hand, you could drive a Snowmobile from San Francisco to New York City in about 45 hours—about 4,970 gigabits per second. That doesn’t count the time it takes to actually transfer the data onto Snowmobile–which Amazon estimates will take less than 10 days–or from the Snowmobile onto Amazon’s servers. But all told, that still makes the truck much, much faster. And because Amazon has data centers throughout the country, your data probably won’t need to travel cross-country anyway.
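The back-of-the-envelope arithmetic is easy to reproduce. The sketch below uses decimal petabytes, which gives roughly 25 years for the upload and just under 5,000 gigabits per second for the truck; the article’s “more than 28 years” figure appears to assume binary (power-of-two) petabytes.

```python
# Back-of-the-envelope comparison of shipping 100 PB by truck vs. uploading it.
# Uses decimal units (1 PB = 10**15 bytes); assuming binary petabytes instead
# raises the upload time by about 12%, closer to the article's figure.
DATA_BYTES = 100 * 10**15          # 100 petabytes
DATA_BITS = DATA_BYTES * 8

UPLOAD_BPS = 10**9                 # 1 gigabit per second (e.g., Google Fiber)
upload_seconds = DATA_BITS / UPLOAD_BPS
print(f"Upload at 1 Gbps: {upload_seconds / (3600 * 24 * 365):.1f} years")

DRIVE_HOURS = 45                   # SF to NYC at ~65 mph, per the article
effective_bps = DATA_BITS / (DRIVE_HOURS * 3600)
print(f"Truck, drive time only: {effective_bps / 10**9:,.0f} gigabits per second")

LOAD_DAYS = 10                     # Amazon's estimate for loading the Snowmobile
total_seconds = (DRIVE_HOURS + LOAD_DAYS * 24) * 3600
print(f"Including ~10 days of loading: {DATA_BITS / total_seconds / 10**9:,.0f} Gbps effective")
```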

One could make a strong case that semis make America go. And all the money that the government has put into highways and roads certainly helps.

Guidelines for using big data to improve colleges

A group of researchers and other interested parties recently made suggestions about how big data can be used for good within higher education:

To Stevens and others, this massive data is full of promise –­­ but also peril. The researchers talk excitedly about big data helping higher education discover its Holy Grail: learning that is so deeply personalized that it both keeps struggling students from dropping out and pushes star performers to excel…

The guidelines center on four core ideas. The first calls on all players in higher education, including students and vendors, to recognize that data collection is a joint venture with clearly defined goals and limits. The second states that students be told how their data are collected and analyzed, and be allowed to appeal what they see as misinformation. The third emphasizes that schools have an obligation to use data-driven insights to improve their teaching. And the fourth establishes that education is about opening up opportunities for students, not closing them.

While numbers one and two deal with handling the data, numbers three and four address its purposes: will the data actually help students in the long run? Such data could serve a lot of interested parties: faculty, administrators, alumni, donors, governments, accreditation groups, and others. I suspect faculty would worry that administrators would try to squeeze more efficiencies out of the college, donors might want to see what exactly is going on at the college, the government could set new regulatory guidelines, and so on.

Yet, big data doesn’t necessarily provide quick answers to these purposes even as it might provide insights into broader patterns. Take improving teaching: there is a lot of disagreement over this topic. Or, opening opportunities for students: which ones? Who chooses which options students should have?

One takeaway: big data offers much potential to reveal new patterns and give decision makers better tools. However, it does not guarantee better or worse outcomes; it can be used well or misused like any other kind of data. I like the idea of getting out ahead of the data to set some common guidelines, but I imagine it will take some time to work out best practices.

Claim that McMansions have proportionally lost resale value

A recent study by Trulia suggests McMansions don’t hold their value:

The premium that buyers can expect to pay for a McMansion in Fort Lauderdale, Fla., declined by 84 percent from 2012 to 2016, according to data compiled by Trulia. In Las Vegas, the premium dropped by 46 percent and in Phoenix, by 42 percent.

Real estate agents don’t usually tag their listings #McMansion, so to compile the data, Trulia created a proxy, measuring the price appreciation of homes built between 2001 and 2007 that have 3,000 to 5,000 square feet. While there’s no single size designation, and plenty of McMansions were built outside that time window, those specifications capture homes built at the height of the trend.

McMansions cost more to build than your average starter ranch home does, and they will sell for more. But the return on investment has dropped like a stone. The additional cash that buyers should be willing to part with to get a McMansion fell in 85 of the 100 largest U.S. metropolitan areas. For example, four years ago a typical McMansion in Fort Lauderdale was valued at $477,000, a 274 percent premium over all other homes in the area. This year, those McMansions are worth about $611,000, or 190 percent more than the rest of the homes on the market.

The few areas in which McMansions are gaining value faster than more tasteful housing stock are located primarily in the Midwest and the eastern New York suburbs that make up Long Island. The McMansion premium in Long Island has increased by 10 percent over the last four years.

Read the Trulia report here.
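Trulia’s proxy can be reconstructed approximately as follows. The size and vintage filters come from the article; the column names and the exact premium formula are my assumptions for this sketch.

```python
# Approximate reconstruction of Trulia's McMansion proxy and premium measure.
# The size/vintage filter comes from the article; the column names and the
# premium formula itself are assumptions for this sketch.
import pandas as pd

def mcmansion_premium(listings: pd.DataFrame) -> float:
    """Return the % price premium of proxy McMansions over all other homes."""
    is_proxy = (listings["year_built"].between(2001, 2007)
                & listings["sqft"].between(3000, 5000))
    mcmansion_value = listings.loc[is_proxy, "price"].median()
    other_value = listings.loc[~is_proxy, "price"].median()
    return (mcmansion_value / other_value - 1) * 100

# Toy example with invented listings
listings = pd.DataFrame({
    "year_built": [2004, 2006, 1978, 1990, 2015],
    "sqft":       [4200, 3600, 1400, 1800, 2200],
    "price":      [611_000, 590_000, 205_000, 215_000, 240_000],
})
print(f"McMansion premium: {mcmansion_premium(listings):.0f}%")
```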

Interesting claim. After the housing bubble burst, some commentators suggested that Americans should go back to not viewing homes as goods with significant returns on investment; instead, homes should be viewed as appreciating, but relatively slowly. This article would seem to suggest that return on investment is a key factor in buying a home. How often does this factor into the decisions of buyers versus other concerns (such as having more space or locating in the right neighborhoods)? And just how much of a premium should homeowners expect – is 190% more than the rest of the market not enough?

This analysis also appears to illustrate both the advantages and pitfalls of big data. On one hand, sites like Trulia and Zillow can look at purchases and sales all across the country. Patterns can be found and certain causal factors – such as the housing market – can be examined. Yet, they are still limited by the parameters of their data collection, which, in this case, severely restricts their definition of McMansions to homes of a certain size built over a particular time period. As others might attest, big homes aren’t necessarily McMansions unless they have bad architecture or are teardowns. This sort of analysis would be very difficult to do without big data, but it is not self-evident that such analyses are always worthwhile.

Using a supercomputer and big data to find stories of black women

A sociologist is utilizing unique methods to uncover more historical knowledge about black women:

Mendenhall, who is also a professor of African American studies and urban and regional planning, is heading up the interdisciplinary team of researchers and computer scientists working on the big data project, which aims to better understand black women’s experience over time. The challenge in a project like this is that documents that record the history of black women, particularly in the slave era, aren’t necessarily going to be straightforward explanations of women’s feelings, resistance, or movement. Instead, Mendenhall and her team are looking for keywords that point to organizations or connections between groups that can indicate larger movements and experiences.

Using a supercomputer in Pittsburgh, they’ve culled 20,000 documents that discuss black women’s experience from a 100,000 document corpus (collection of written texts). “What we’re now trying to do is retrain a model based on those 20,000 documents, and then do a search on a larger corpus of 800,000, and see if there are more of those documents that have more information about black women,” Mendenhall added…

Using topic modeling and data visualization, they have started to identify clues that could lead to further research. For example, according to Phys.Org, finding documents that include the words “vote” and “women” could indicate black women’s participation in the suffrage movement. They’ve also preliminarily found some new texts that weren’t previously tagged as by or about black women.

Next up Mendenhall is interested in collecting and analyzing data about current movements, such as Black Lives Matter.

It sounds like this involves putting together the best possible algorithm for pattern recognition at a scale that would take humans far too long to work through by hand. This can only be done with some good programming as well as a significant collection of texts (a rough sketch of the retrain-and-search loop follows the questions below). Three questions come quickly to mind:

  1. How would one report findings from this data in typical outlets for sociological or historical research?
  2. How easy would it be to apply this to other areas of inquiry?
  3. Is this data mining, or are there hypotheses that can be tested?

There are lots of possibilities like this with big data, but it remains to be seen how useful such approaches will be for research.
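As promised above, here is a rough sketch of the retrain-and-search loop: fit a classifier on the hand-culled documents, then score a much larger corpus to flag likely matches for human review. The example texts, labels, and threshold are placeholders, and the team’s actual models may differ considerably.

```python
# Minimal sketch of a retrain-and-search loop: fit a text classifier on a small
# labeled corpus, then score a much larger corpus to surface likely matches.
# The texts, labels, and threshold are placeholders, not the project's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# labeled_docs: texts known to discuss (1) or not discuss (0) Black women's experience
labeled_docs = ["... document text ...", "... another document ..."]
labels = [1, 0]

model = make_pipeline(
    TfidfVectorizer(stop_words="english", max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(labeled_docs, labels)

# Score the larger, unlabeled corpus and keep the most likely matches for review.
larger_corpus = ["... unlabeled document ...", "... yet another ..."]
scores = model.predict_proba(larger_corpus)[:, 1]
candidates = [doc for doc, score in zip(larger_corpus, scores) if score > 0.8]
print(f"{len(candidates)} candidate documents flagged for human review")
```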

The first wave of big data – in the early 1800s

Big data may appear to be a recent phenomenon, but the big data of the 1800s allowed for new questions and discoveries:

Fortunately for Quetelet, his decision to study social behavior came during a propitious moment in history. Europe was awash in the first wave of “big data” in history. As nations started developing large-scale bureaucracies and militaries in the early 19th century, they began tabulating and publishing huge amounts of data about their citizenry, such as the number of births and deaths each month, the number of criminals incarcerated each year, and the number of incidences of disease in each city. This was the inception of modern data collection, but nobody knew how to usefully interpret this hodgepodge of numbers. Most scientists of the time believed that human data was far too messy to analyze—until Quetelet decided to apply the mathematics of astronomy…

In the early 1840s, Quetelet analyzed a data set published in an Edinburgh medical journal that listed the chest circumference, in inches, of 5,738 Scottish soldiers. This was one of the most important, if uncelebrated, studies of human beings in the annals of science. Quetelet added together each of the measurements, then divided the sum by the total number of soldiers. The result came out to just over 39 ¾ inches—the average chest circumference of a Scottish soldier. This number represented one of the very first times a scientist had calculated the average of any human feature. But it was not Quetelet’s arithmetic that was history-making—it was his answer to a rather simple-seeming question: What, precisely, did this average actually mean?

Scholars and thinkers in every field hailed Quetelet as a genius for uncovering the hidden laws governing society. Florence Nightingale adopted his ideas in nursing, declaring that the Average Man embodied “God’s Will.” Karl Marx drew on Quetelet’s ideas to develop his theory of Communism, announcing that the Average Man proved the existence of historical determinism. The physicist James Maxwell was inspired by Quetelet’s mathematics to formulate the classical theory of gas mechanics. The physician John Snow used Quetelet’s ideas to fight cholera in London, marking the start of the field of public health. Wilhelm Wundt, the father of experimental psychology, read Quetelet and proclaimed, “It can be stated without exaggeration that more psychology can be learned from statistical averages than from all philosophers, except Aristotle.”

Is it a surprise, then, that sociology emerged in the same period, with greater access to data on societies in Europe and around the globe? Many of us are so used to having data and information at our fingertips that it is easy to forget what a revolution this must have been: large-scale data within stable nation-states opened up all sorts of possibilities.