Trying to count the social patterns that have not happened yet, AI job takeover edition

Posted on February 11, 2026 by legallysociable

It is hard to know how many jobs AI might eliminate when we cannot yet count many jobs eliminated:

Measurement doesn’t abolish injustice; it rarely even settles arguments. But the act of counting—of trying to see clearly, of committing the government to a shared set of facts—signals an intention to be fair, or at least to be caught trying. Over time, that intention matters. It’s one way a republic earns the right to be believed in.

The BLS remains a small miracle of civilization. It sends out detailed surveys to about 60,000 households and 120,000 businesses and government agencies every month, supplemented by qualitative research it uses to check and occasionally correct its findings. It deserves at least some credit for the scoreboard. America: 250 years without violent class warfare. And you have to appreciate the entertainment value of its minutiae. The BLS is how we know that, in 2024, 44,119 people worked in mobile food services (a.k.a. food trucks), up 907 percent since 2000; that nonveterinary pet care (grooming, training) employed 190,984 people, up 513 percent; and that the United States had almost 100,000 massage therapists, with five times the national concentration in Napa, California.

These and thousands of other BLS statistics describe a society that has grown more prosperous, and a workforce endlessly adaptive to change. But like all statistical bodies, the BLS has its limits. It’s excellent at revealing what has happened and only moderately useful at telling us what’s about to. The data can’t foresee recessions or pandemics—or the arrival of a technology that might do to the workforce what an asteroid did to the dinosaurs…

This was the point Goolsbee wanted to emphasize: Economists are constrained by numbers. And numerically speaking, nothing indicates that AI has had an impact on people’s jobs. “It’s just too early,” he said.

A lack of certainty should not be mistaken for a lack of concern.

This sounds like a classic issue facing those concerned about particular social problems: can the numbers help you build a case that this issue is important and worthy of the attention of others? With all the possible social problems that need attention, having clear data regarding the problem can help make the case to the public and leaders. But, if this is largely speculation regarding AI, how many will act based on that?

Another important factor regarding counting: it is a key way of trying to make sense of a large and complex society. When you have a country with over 330 million residents, 50 states, and numerous important social patterns occurring, having data to look at can help make sense of what is happening on the broad scale. Anecdotes offer little on a large scale; case studies might provide some insight. Having statistics on a society-wide scale is necessary.

A third way to think about this: those who can generate numerical predictions, or who work in small sectors that could provide early data, could be helpful to others.

Beat the lottery odds by buying all the lottery tickets

Posted on April 18, 2025 by legallysociable

I have used a similar example in Statistics class when we learn about the central limit theorem: for a better chance of winning the lottery, buy more tickets and get closer to a certain win.

Bernard Marantelli had a plan in mind. He and his partners would buy nearly every possible number in a coming drawing. There were 25.8 million potential number combinations. The tickets were $1 apiece. The jackpot was heading to $95 million. If nobody else also picked the winning numbers, the profit would be nearly $60 million.

Marantelli flew to the U.S. with a few trusted lieutenants. They set up shop in a defunct dentist’s office, a warehouse and two other spots in Texas. The crew worked out a way to get official ticket-printing terminals. Trucks hauled in dozens of them and reams of paper.

Over three days, the machines—manned by a disparate bunch of associates and some of their children—screeched away nearly around the clock, spitting out 100 or more tickets every second. Texas politicians later likened the operation to a sweatshop…

Over the years, Ranogajec and his partners have won hundreds of millions of dollars by applying Wall Street-style analytics to betting opportunities around the world. Like card counters at a blackjack table, they use data and math to hunt for situations ripe for flipping the house edge in their favor. Then they throw piles of money at it, betting an estimated $10 billion annually.

How representative sampling works: get a large enough sample with characteristics that mirror those of the larger population and you can have confidence that the sample results are within several percentage points of the results you would get if you measured the same things for the whole population. And as your sample size increases, you get closer and closer to the characteristics of the whole population.
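To put rough numbers on "within several percentage points," here is a minimal sketch of the usual margin-of-error calculation for a sample proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the function name and the sample sizes are illustrative, not taken from any particular survey.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n. p = 0.5 is the
    worst case (widest margin); z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Illustrative sample sizes only.
    for n in (100, 1_000, 10_000):
        print(f"n = {n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions a sample of about 1,000 lands near three percentage points, and quadrupling the sample size only halves the margin.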

Buying one lottery ticket means the buyer has really small odds of winning. Super small. Buy more tickets and the odds of winning increase. Buy nearly all the tickets and your odds go way up. Buy them all and you win.

It sounds like the gamblers above compare the cost of buying all the tickets to the jackpot and go all in when there is a large enough gap. But the central limit theorem suggests they could drastically increase their odds without buying every ticket; might that be worth it financially?
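One way to explore that closing question is a deliberately simple model, sketched here under loud assumptions: every ticket covers a distinct combination, the headline jackpot is paid in full, nobody else wins, and taxes and lump-sum discounts are ignored. Because of those simplifications it will not reproduce the profit figure in the quoted article. The combination count, ticket price, and jackpot come from the excerpt above; the function names are hypothetical.

```python
COMBINATIONS = 25_800_000   # possible number combinations, per the quoted article
TICKET_PRICE = 1.0          # dollars per ticket, per the quoted article
JACKPOT = 95_000_000.0      # headline jackpot; an assumption that it is paid in full


def win_probability(tickets: int, combinations: int = COMBINATIONS) -> float:
    """Chance of holding the winning combination with `tickets` distinct picks."""
    return min(tickets, combinations) / combinations


def expected_profit(tickets: int, jackpot: float = JACKPOT,
                    price: float = TICKET_PRICE) -> float:
    """Expected profit, ignoring split jackpots, taxes, and smaller prizes."""
    return win_probability(tickets) * jackpot - tickets * price


if __name__ == "__main__":
    for share in (0.01, 0.25, 0.50, 1.00):
        k = int(COMBINATIONS * share)
        print(f"{share:>5.0%} of combinations: "
              f"P(win) = {win_probability(k):.2f}, "
              f"expected profit = ${expected_profit(k):,.0f}")
```

In this toy model the chance of winning grows in a straight line with the number of distinct tickets, so buying half the combinations gives a 50% chance and half the expected profit. What buying everything adds is certainty, not a better return per ticket.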

The importance of statistics on college campuses

Posted on March 7, 2023 by legallysociable

Within a longer look at the fate of the humanities, one Harvard student suggests statistics dominates campus conversations:

I asked Haimo whether there seemed to be a dominant vernacular at Harvard. (When I was a student there, people talked a lot about things being “reified.”) Haimo told me that there was: the language of statistics. One of the leading courses at Harvard now is introductory statistics, enrolling some seven hundred students a semester, up from ninety in 2005. “Even if I’m in the humanities, and giving my impression of something, somebody might point out to me, ‘Well, who was your sample? How are you gathering your data?’ ” he said. “I mean, statistics is everywhere. It’s part of any good critical analysis of things.”

It struck me that I knew at once what Haimo meant: on social media, and in the press that sends data visualizations skittering across it, statistics is now everywhere, our language for exchanging knowledge. Today, a quantitative idea of rigor underlies even a lot of arguments about the humanities’ special value. Last school year, Spencer Glassman, a history major, argued in a column for the student paper that Harvard’s humanities “need to be more rigorous,” because they set no standards comparable to the “tangible things that any student who completes Stat 110 or Physics 16 must know.” He told me, “One could easily walk away with an A or A-minus and not have learned anything. All the STEM concentrators have this attitude that humanities are a joke.”…

Haimo and I turned back toward Harvard Square. “I think the problem for the humanities is you can feel like you’re not really going anywhere, and that’s very scary,” he said. “You write one essay better than the other from one semester to the next. That’s not the same as, you know, being able to solve this economics problem, or code this thing, or do policy analysis.” This has always been true, but students now recognized less of the long-term value of writing better or thinking more deeply than they previously had. Last summer, Haimo worked at the HistoryMakers, an organization building an archive of African American oral history. He said, “When I was applying, I kept thinking, What qualifies me for this job? Sure, I can research, I can write things.” He leaned forward to check for passing traffic. “But those skills are very difficult to demonstrate, and it’s frankly not what the world at large seems in demand of.”

I suspect this level of authority is not just true on a college campus: numbers have a particular power in the world today. They convey proof. Patterns and trends. There can often be little space to ask where the numbers came from or what they mean.

Is this the only way to understand the world? No. We need to consider all sorts of data to understand and explain what is going on. Stories and narratives do not just exist to flesh out quantitative patterns; they can convey deep truths and raise important questions.

But what if we only care today about what is most efficient and most able to directly translate into money? If college students and others prioritize jobs over everything else, does this advantage numbers and their connections to STEM and to the occupations that are, or are perceived to be, the only sure paths to wealth and a return on investment? From later in the article:

In a quantitative society for which optimization—getting the most output from your input—has become a self-evident good, universities prize actions that shift numbers, and pre-professionalism lends itself to traceable change.

If American society prizes money and a certain kind of success above all else, are these patterns that surprising?

Helping readers see patterns and the bigger picture in new housing price data

Posted on August 25, 2022 by legallysociable

The headline reads:

Home prices fell for the first time in 3 years last month – and it was the biggest decline since 2011

This quickly relays information about recent trends – prices went down for the first time in a while – as well as longer patterns – biggest drop in over a decade.

Next are some figures on housing affordability:

Now, housing affordability is at its lowest level in 30 years. It requires 32.7% of the median household income to purchase the average home using a 20% down payment on a 30-year mortgage, according to Black Knight. That is about 13 percentage points more than it did entering the pandemic and significantly more than both the years before and after the Great Recession. The 25-year average is 23.5%.

The housing affordability statistic is put into terms accessible to a broad audience: nearly 33% of the median household income is needed to buy the average house with common mortgage terms. Additionally, this percentage is higher than in recent years and higher than the 25-year average.

Some housing markets are seeing bigger price declines than others:

Some local markets are seeing even steeper declines over the last few months. San Jose, California, saw the largest, with home prices now down 10% in recent months, followed by Seattle (-7.7%), San Francisco (-7.4%), San Diego (-5.6%), Los Angeles (-4.3%) and Denver (-4.2%).

It could be noted that these are expensive and hot real estate markets. Yes, they had larger drops but they had been pushed higher in recent years than many other markets.

And the article ends with information on mortgage rates:

The average rate on the popular 30-year fixed mortgage began this year right around 3%, according to Mortgage News Daily. It climbed slowly month to month, pulling back slightly in May but then shot more dramatically to just over 6% in June. It is now hovering around 5.75%.

This highlights the rise in mortgage rates this year. Some broader context might be helpful; what was the average rate before COVID-19 or over the last 10 years?
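To see what a move from roughly 3% to roughly 5.75% means for a borrower, here is a small sketch using the standard fixed-rate amortization formula. The $300,000 loan amount is purely hypothetical and is not from the article; only the two rates come from the excerpt above.

```python
def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
    """Monthly principal-and-interest payment on a fixed-rate loan.

    Standard amortization formula; excludes taxes, insurance, and fees.
    """
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


if __name__ == "__main__":
    loan = 300_000  # hypothetical loan amount, for illustration only
    for rate in (0.03, 0.0575):
        print(f"{rate:.2%}: ${monthly_payment(loan, rate):,.2f} per month")
```

On the same hypothetical loan, the higher rate adds several hundred dollars a month, which is the kind of context that helps readers interpret a bare percentage.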

This article provides numerous statistics and often puts the figures in context. Yet, it does leave one lingering question: what is the state of housing prices overall? One answer might be change after a period of trends during COVID-19. Another might be to focus on the different actors involved: how does this affect the housing industry? What about the difficulty some have getting into the housing market? Or is this a story about higher housing values for many homeowners?

Statistics are not just facts thrown into a void; they require interpretation and are often applied to particular concerns or issues.

Americans overestimate the size of smaller groups, underestimate the size of larger groups

Posted on March 25, 2022 by legallysociable

Recent YouGov survey data shows Americans have a hard time estimating the population of a number of groups:

https://today.yougov.com/topics/politics/articles-reports/2022/03/15/americans-misestimate-small-subgroups-population

When people’s average perceptions of group sizes are compared to actual population estimates, an intriguing pattern emerges: Americans tend to vastly overestimate the size of minority groups. This holds for sexual minorities, including the proportion of gays and lesbians (estimate: 30%, true: 3%), bisexuals (estimate: 29%, true: 4%), and people who are transgender (estimate: 21%, true: 0.6%).
It also applies to religious minorities, such as Muslim Americans (estimate: 27%, true: 1%) and Jewish Americans (estimate: 30%, true: 2%). And we find the same sorts of overestimates for racial and ethnic minorities, such as Native Americans (estimate: 27%, true: 1%), Asian Americans (estimate: 29%, true: 6%), and Black Americans (estimate: 41%, true: 12%)…
A parallel pattern emerges when we look at estimates of majority groups: People tend to underestimate rather than overestimate their size relative to their actual share of the adult population. For instance, we find that people underestimate the proportion of American adults who are Christian (estimate: 58%, true: 70%) and the proportion who have at least a high school degree (estimate: 65%, true: 89%)…
Misperceptions of the size of minority groups have been identified in prior surveys, which observers have often attributed to social causes: fear of out-groups, lack of personal exposure, or portrayals in the media. Yet consistent with prior research, we find that the tendency to misestimate the size of demographic groups is actually one instance of a broader tendency to overestimate small proportions and underestimate large ones, regardless of the topic.

I wonder how much this might be connected to a general sense of innumeracy. Big numbers can be difficult to understand and the United States has over 330,000,000 residents. Percentages and absolute numbers regarding certain groups are not always provided. I am more familiar with some of these percentages and numbers because my work requires it but it does not come up in all fields or settings.

Additionally, where would this information be taught or regularly shared? Civics classes alongside information about government structures and national history? Math classes as examples of relevant information? On television programs or in print materials? At political events or sports games? I would be interested in making all of this more publicly visible so not just those who read the Statistical Abstract of the United States or have Census.gov as a top bookmark know this information.

Thinking about probabilistic futures

Posted on January 3, 2022 by legallysociable

When looking to predict the future, one historian of science suggests we need to think probabilistically:

The central message sent from the history of the future is that it’s not helpful to think about “the Future.” A much more productive strategy is to think about futures; rather than “prediction,” it pays to think probabilistically about a range of potential outcomes and evaluate them against a range of different sources. Technology has a significant role to play here, but it’s critical to bear in mind the lessons from World3 and Limits to Growth about the impact that assumptions have on eventual outcomes. The danger is that modern predictions with an AI imprint are considered more scientific, and hence more likely to be accurate, than those produced by older systems of divination. But the assumptions underpinning the algorithms that forecast criminal activity, or identify potential customer disloyalty, often reflect the expectations of their coders in much the same way as earlier methods of prediction did.

Social scientists have long hoped to contribute to accurate predictions. We want to both better understand what is happening now as well as provide insights into what will come after.

The idea of thinking probabilistically is a key part of the Statistics course I teach each fall semester. We can easily fall into using language that suggests we “prove” things or relationships. This implies certainty and we often think science leads to certainty, laws, and cause and effect. However, when using statistics we are usually making estimates about the population from the samples and information we have in front of us. Instead of “proving” things, we can speak to the likelihood of something happening or the degree to which one variable affects another. Our certainty of these relationships or outcomes might be higher or lower, depending on the information we are working with.

All of this relates to predictions. We can work to improve our current models to better understand current or past conditions but the future involves changes that are harder to know. Like inferential statistics, making predictions involves using certain information we have now to come to conclusions.

The idea of thinking both (1) probabilistically and (2) in terms of plural futures can help us understand our limitations in considering the future. In regards to probabilities, we can assign higher or lower likelihoods to our predictions of what will happen. In thinking of plural futures, we can work with multiple options or pathways that may occur. All of this should be accompanied by humility and creativity as it is difficult to predict the future, even with great information today.

Fighting math-phobia in America

Posted on October 26, 2019 by legallysociable

The president of Barnard College offers three suggestions for making math more enticing and relevant for Americans:

First, we can work to bring math to those who might shy away from it. Requiring that all students take courses that push them to think empirically with data, regardless of major, is one such approach. At Barnard — a college long known for its writers and dancers — empirical reasoning requirements are built into our core curriculum. And, for those who struggle to meet the demands of data-heavy classes, we provide access (via help rooms) to tutors who focus on diminishing a student’s belief that they “just aren’t good at math.”

Second, employers should encourage applications from and be open to having students with diverse educational interests in their STEM-related internships. Don’t only seek out the computer science majors. This means potentially taking a student who doesn’t come with all the computation chops in hand but does have a good attitude and a willingness to learn. More often than not, such opportunities will surprise both intern and employee. When bright students are given opportunities to tackle problems head on and learn how to work with and manipulate data to address them, even those anxious about math tend to find meaning in what they are doing and succeed. STEM internships also allow students to connect with senior leaders who might have had to overcome a similar experience of questioning their mathematical or computational skills…

Finally, we need to reject the social acceptability of being bad at math. Think about it: You don’t hear highly intelligent people proclaiming that they can’t read, but you do hear many of these same individuals talking about “not being a math person.” When we echo negative sentiments like that to ourselves and each other, we perpetuate a myth that increases overall levels of math phobia. When students reject math, they pigeonhole themselves into certain jobs and career paths, foregoing others only because they can’t imagine doing more computational work. Many people think math ability is an immutable trait, but evidence clearly shows this is a subject in which we can all learn and succeed.

Fighting innumeracy – an inability to use or understand numbers – is a worthwhile goal. I like the efforts suggested above though I worry a bit if they are tied too heavily to jobs and national competitiveness. These goals can veer toward efficiency and utilitarianism rather than more tangible results like a better understanding of, and interaction with, society and self. Fighting stigma will be hard if it means invoking more pressure – the US is falling behind! your future career is on the line! – rather than showing how numbers can help people.

This is why I would be in favor of more statistics training for students at all levels. The math required to do statistics can be tailored to different levels, statistical tests, and subjects. The basic knowledge can be helpful in all sorts of areas citizens run into: interpreting reports on surveys and polls, calculating odds and risks (including in finances and sports), and understanding research results. The math does not have to be complicated and instruction can address understanding where statistics come from and how they can be used.

I wonder how much of this might also be connected to the complicated relationship Americans have with expertise and advanced degrees. Think of the typical Hollywood scene of a genius at work: do they look crazy or unusual? Think about presidential candidates: do Americans want people with experience and knowledge or someone they can identify with and have dinner with? Math, in being unknowable to people of average intelligence, may be connected to those smart eccentrics who are necessary for helping society progress but not necessarily the people you would want to be or hang out with.

The retraction of a study provides a reminder of the importance of levels of measurement

Posted on September 30, 2019 by legallysociable

Early in Statistics courses, students learn about different ways that variables can be measured. This is often broken down into three categories: nominal variables (unordered, unranked), ordinal variables (ranked but with varied category widths), and interval-ratio variables (ranked and with consistent spaces between categories). Decisions about how to measure variables can have significant influence on what can be done with the data later. For example, here is a study that received a lot of attention when published but in which the researchers miscoded a nominal variable:

In 2015, a paper by Jean Decety and co-authors reported that children who were brought up religiously were less generous. The paper received a great deal of attention, and was covered by over 80 media outlets including The Economist, the Boston Globe, the Los Angeles Times, and Scientific American. As it turned out, however, the paper by Decety was wrong. Another scholar, Azim Shariff, a leading expert on religion and pro-social behavior, was surprised by the results, as his own research and meta-analysis (combining evidence across studies from many authors) indicated that religious participation, in most settings, increased generosity. Shariff requested the data to try to understand more clearly what might explain the discrepancy.

To Decety’s credit, he released the data. And upon re-analysis, Shariff discovered that the results were due to a coding error. The data had been collected across numerous countries, e.g. United States, Canada, Turkey, etc. and the country information had been coded as “1, 2, 3…” Although Decety’s paper had reported that they had controlled for country, they had accidentally not controlled for each country, but just treated it as a single continuous variable so that, for example “Canada” (coded as 2) was twice the “United States” (coded as 1). Regardless of what one might think about the relative merits and rankings of countries, this is obviously not the right way to analyze data. When it was correctly analyzed, using separate indicators for each country, Decety’s “findings” disappeared. Shariff’s re-analysis and correction was published in the same journal, Current Biology, in 2016. The media, however, did not follow along. While it covered extensively the initial incorrect results, only four media outlets picked up the correction.

In fact, Decety’s paper has continued to be cited in media articles on religion. Just last month two such articles appeared (one on Buzzworthy and one on TruthTheory) citing Decety’s paper that religious children were less generous. The paper’s influence seems to continue even after it has been shown to be wrong.

Last month, however, the journal, Current Biology, at last formally retracted the paper. If one looks for the paper on the journal’s website, it gives notice of the retraction by the authors. Correction mechanisms in science can sometimes work slowly, but they did, in the end, seem to be effective here. More work still needs to be done as to how this might translate into corrections in media reporting as well: The two articles above were both published after the formal retraction of the paper.

To reiterate, the researchers treated country – a nominal variable in this case since the countries were not ranked or ordered in any particular way – incorrectly, which then threw off the overall results. When country was used correctly – from the description above, it sounds like as a set of dummy variables coded 1 and 0 – the findings that received all the attention disappeared.
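The difference between the two codings can be sketched with a toy example. The data below are entirely made up and have nothing to do with the retracted study. The sketch compares a straight-line fit on the numeric country code with per-country means, which is what a regression on a full set of country indicators (with no intercept) would recover.

```python
from statistics import mean

# Hypothetical (country_code, outcome) pairs: codes 1, 2, 3 are just labels.
ROWS = [(1, 5.0), (1, 5.0), (2, 1.0), (2, 1.0), (3, 5.0), (3, 5.0)]


def linear_fit(xs, ys):
    """Least-squares slope and intercept, treating the code as a real number."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dummy_code(codes):
    """One 0/1 indicator column per distinct country code."""
    levels = sorted(set(codes))
    return [[1 if c == level else 0 for level in levels] for c in codes]


def group_means(rows):
    """Mean outcome per country: what a full set of indicators estimates."""
    codes = sorted({c for c, _ in rows})
    return {c: mean(y for code, y in rows if code == c) for c in codes}


if __name__ == "__main__":
    xs = [c for c, _ in ROWS]
    ys = [y for _, y in ROWS]
    print("slope, intercept with country as a number:", linear_fit(xs, ys))
    print("indicator columns for codes 1, 2, 3:", dummy_code([1, 2, 3]))
    print("per-country means:", group_means(ROWS))
```

In this made-up data the middle country differs sharply from the other two, yet the straight-line fit on the numeric code finds no country effect at all because it can only describe a steady rise or fall from code 1 to code 3.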

The other issue at play here is whether corrections to academic studies or retractions are treated as such. It is hard to notify readers that a previously published study had flaws and the results have changed.

All that to say, paying attention to level of measurement earlier in the process helps avoid problems down the road.

Recommendations to help with SCOTUS’ innumeracy

Posted on October 18, 2017 by legallysociable

In the wake of recent comments about “sociological gobbledygook” and measures of gerrymandering, here are some suggestions for how the Supreme Court can better use statistical evidence:

McGhee, who helped develop the efficiency gap measure, wondered if the court should hire a trusted staff of social scientists to help the justices parse empirical arguments. Levinson, the Texas professor, felt that the problem was a lack of rigorous empirical training at most elite law schools, so the long-term solution would be a change in curriculum. Enos and his coauthors proposed “that courts alter their norms and standards regarding the consideration of statistical evidence”; judges are free to ignore statistical evidence, so perhaps nothing will change unless they take this category of evidence more seriously.

But maybe this allergy to statistical evidence is really a smoke screen — a convenient way to make a decision based on ideology while couching it in terms of practicality.

“I don’t put much stock in the claim that the Supreme Court is afraid of adjudicating partisan gerrymanders because it’s afraid of math,” Daniel Hemel, who teaches law at the University of Chicago, told me. “[Roberts] is very smart and so are the judges who would be adjudicating partisan gerrymandering claims — I’m sure he and they could wrap their minds around the math. The ‘gobbledygook’ argument seems to be masking whatever his real objection might be.”

If there is indeed innumeracy present, the justices would not be alone in this. Many Americans do not receive an education in statistics, let alone have enough training to make sense of the statistics regularly used in academic studies.

At the same time, we might go further than the argument made above: should judges make decisions based on statistics (roughly facts) more than ideology or arguments (roughly interpretation)? Again, many Americans struggle with this: there can be broad empirical patterns or even correlations but some would insist that their own personal experiences do not match these. Should judicial decisions be guided by principles and existing case law or by current statistical realities? The courts are not the only social spheres that struggle with this.

Using a GRIM method to find unlikely published results

Posted on March 27, 2017 by legallysociable

Discovering which published studies may be incorrect or fraudulent takes some work and here is a newer tool: GRIM.

GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either (a) made a mistake or (b) is making things up.

GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of the ones that provided enough necessary data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are “honest errors or slightly sloppy reporting.”…

After spotting the Wansink post, Anaya took the numbers in the papers and — to coin a verb — GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM was an actual machine, rather than a humble piece of code, its alarms would have been blaring. “This lights up like a Christmas tree,” Brown said after highlighting on his computer screen the errors Anaya had identified…

Anaya, along with Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab.”

I wonder how long it will be before journals employ such methods for submitted manuscripts. Imagine Turnitin for academic studies. Then, what would happen to authors if problems are found?

It also sounds like a program like this could make it easy to do mass analysis of published studies to help answer questions like how many findings are fraudulent.
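The core check described in the quoted article is simple enough to sketch. This is a minimal illustration of the idea, not the authors' code: it assumes responses are whole numbers, takes the reported mean as a string so trailing zeros are preserved, and accepts either of two common rounding conventions. The function name is hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int, items: int = 1) -> bool:
    """Return True if some integer total could round to the reported mean.

    Assumes every response is a whole number. `items` is the number of
    scale items averaged per respondent (1 for a single question).
    """
    mean = Decimal(reported_mean)
    # Number of decimal places reported, e.g. "5.19" -> 2.
    decimals = max(0, -mean.as_tuple().exponent)
    quantum = Decimal(10) ** -decimals
    cells = n * items                      # total count of integer responses
    lowest = math.floor(mean * cells)      # nearest candidate totals
    for total in (lowest, lowest + 1):
        exact = Decimal(total) / Decimal(cells)
        # Be lenient: accept half-up or half-even rounding.
        for rounding in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact.quantize(quantum, rounding=rounding) == mean:
                return True
    return False


if __name__ == "__main__":
    # With 28 whole-number responses, no total averages to 5.19.
    print(grim_consistent("5.19", 28))
    print(grim_consistent("5.18", 28))
```

A failed check only flags a mean as impossible under these assumptions; as the quoted passage notes, most such inconsistencies turn out to be honest errors or sloppy reporting.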

Perhaps it is too easy to ask whether GRIM has been vetted by outside persons…

Legally Sociable

Pleasant Musings on Sociology, McMansions and Housing, Suburbs and Cities, and Miscellaneous Errata.
