Who is affected by unusual sports champions like Indiana in college football or Leicester City in the Premier League?

Posted on January 20, 2026 by legallysociable

Sports leagues often have a set of consistent winners who regularly contend for championships. They may have a history and resources. They are known by all in the sport. They may be disliked by plenty of others whose teams do not have regular success or do not challenge for championships.

Sometimes these hierarchies are upset. Last night was one such occurrence with Indiana University beating Miami to cap the college football playoff. Indiana is the football champion for the first time ever. A basketball school won the football championship. One commentator, summing up how it happened, concluded that it may never happen again:

All of which makes this a singular moment in the sport. The Indiana football program still has the third most losses of any team in FBS history, and I’m not sensing that Northwestern or Wake Forest is all that close to hanging a championship banner. Maybe, though. College football was a static sport for a long time. The last year a team won its program’s first national title was 1996, when the Florida Gators did it. A new economic structure will create new first-time champs on a quicker timeline than that going forward. It will just never, ever yield a two-year flip job like the one Indiana just put on.

In the recent past, I also remember Leicester City winning the Premier League in 2016. This team had finished second in the top tier once in the distant past (late 1920s) and had fluctuated between the top tier and second tier for decades. But 2015-2016 was a magical season where the team overcame great odds to win the league. Ten years later, they are back in the second tier.

Who is affected by these unusual championship victories? Certainly it is good for supporters of these teams. They will remember this forever. Their team won it all when they typically are not even competing for the top spot. The teams will enjoy this success for years, perhaps with new fans and resources, and with a higher-status legacy.

What about the broader public? Perhaps some others will join in for the exciting ride of the unusual championship. How many college football fans joined the Indiana bandwagon from their success the previous year through their just-completed undefeated year? How many fans enjoyed Leicester City beating the top teams that tend to dominate the Premier League?

At the same time, this success does not last forever. Do sports championships change people’s day to day lives? Will the regular powers in the sport reassert their dominance?

Maybe the most enduring legacy will be the hope that any team may have that they too could have these unusual seasons. Get the right coach. Attract the right star player. The top teams might falter. It could all come together for one season. It probably won’t – there can only be one champion each year – but it could. Remember when Indiana or Leicester City or other unexpected champions won it all? The great outlier season could happen. The odds that another unexpected champion could arise have to be greater than 0%, right?

(The 2016 World Series victory by the Chicago Cubs might be a similar unexpected championship – see one comparison to Indiana’s win here. The Cubs’ win led to a large public celebration. For multiple reasons, I did not include them in this post.)

From outlier to outlier in unemployment data

Posted on May 4, 2020 by legallysociable

With the responses to COVID-19, unemployment is expected to approach or hit a record high among recorded data:

April’s employment report, to be released Friday, will almost certainly show that the coronavirus pandemic inflicted the largest one-month blow to the U.S. labor market on record.

Economists surveyed by The Wall Street Journal forecast the new report will show that unemployment rose to 16.1% in April and that employers shed 22 million nonfarm payroll jobs—the equivalent of eliminating every job created in the past decade.

The losses in jobs would produce the highest unemployment rate since records began in 1948, eclipsing the 10.8% rate touched in late 1982 at the end of the double-dip recession early in President Reagan’s first term. The monthly number of jobs lost would be the biggest in records going back to 1939—far steeper than the 1.96 million jobs eliminated in September 1945, at the end of World War II.

But, also noteworthy is what these rapid changes follow:

Combined with the rise in unemployment and the loss of jobs in March, the new figures will underscore the labor market’s sharp reversal since February, when joblessness was at a half-century low of 3.5% and the country notched a record 113 straight months of job creation.

In other words, the United States has experienced both a record low in unemployment and a record high within three months. A few thoughts connected to this:

1. Either outlier is noteworthy; having them occur so close to each other is more unusual.

2. Their close occurrence makes it more difficult to ascertain what is “normal” unemployment for this period of history. The fallout of COVID-19 is unusual. But the 3.5% unemployment can also be considered unusual compared to historical data.

3. Given these two outliers, it might be relatively easy to dismiss either as aberrations. Yet, while people are living through the situations and the fallout, they cannot simply be dismissed. If unemployment now is around 16%, this requires attention even if historically this is a very unusual period.

4. With these two outliers, predicting the future regarding unemployment (and other social measures) is very difficult. Will the economy quickly restart in the United States and around the world? Will COVID-19 be largely under control within a few months or will there be new outbreaks for a longer period of time (and will governments and people react in the same ways)?

In sum, dealing with extreme data – outliers – is a difficult task for everyone.

Home value algorithms show consumers data with outliers, mortgage companies take the outliers out

Posted on January 15, 2019 by legallysociable

A homeowner can look online to get an estimate of the value of their home but that number may not match what a lender computes:

Different AVMs are designed to deliver different types of valuations. And therein lies confusion.

Consumers don’t realize that there’s an AVM for nearly any purpose, which explains why different algorithms serve up different results, said Ann Regan, an executive product manager with real estate analytic firm CoreLogic. “The scores presented to consumers are not the same version that is being used by lenders to make decisions,” she said. “The consumer-facing AVMs are designed for consumer marketing purposes.”

For instance, more accurate models used by lenders do not include outliers — properties that sold for extremely high or low prices and that consequently would skew the averages and the comparable sales for a particular house, like yours. But models used by consumer websites, such as brokers’ sites and national listing sites, scoop in as much “sold” data as possible when concocting a valuation, because then they can claim to include all available data. That’s true, said Regan, but it’s more accurate to weed out misleading data.

AVMs used by lenders send along “confidence scores” that indicate how firm the estimate is. That is a factor typically not included alongside consumer AVMs, she added.

This is an interesting trade-off. The assumption is the consumer wants to see that all the data is accounted for, which makes it seem that the estimate is more worthwhile. More data = more accuracy. On the other hand, those that work with data know that measures of central tendency and variability can be thrown off by unusual cases, often known as outliers. If the value of a home is too high or too low, and there are many reasons why this could be the case, the rest of the data can be thrown off. If there are significant outliers, more data does not equal more accuracy.
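To make the point concrete, here is a minimal sketch of how one extreme sale can pull an average away from the typical home. The sale prices are invented, and the 1.5 × IQR cutoff is just one common rule of thumb for flagging outliers; nothing here is meant to describe how CoreLogic or any lender's AVM actually works.

```python
from statistics import mean, median, quantiles

# Hypothetical sale prices (in dollars) for comparable homes. The last value
# is an invented outlier, such as an unusually expensive sale.
comparable_sales = [240_000, 255_000, 250_000, 262_000, 248_000, 1_200_000]


def drop_outliers_iqr(prices, k=1.5):
    """Keep prices within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if low <= p <= high]


trimmed_sales = drop_outliers_iqr(comparable_sales)

print(f"All sales:    mean={mean(comparable_sales):,.0f}  median={median(comparable_sales):,.0f}")
print(f"Outliers out: mean={mean(trimmed_sales):,.0f}  median={median(trimmed_sales):,.0f}")
```

With these made-up numbers, the single large sale pushes the mean well above every ordinary sale, while the median barely moves. Once the outlier is dropped, the mean and median sit close together.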

Since this knowledge is out there (at least printed in a major newspaper), does this mean consumers will be informed of these algorithm features when they look at websites like Zillow? I imagine it could be tricky to easily explain how removing some of the housing comparison data is actually a good thing but if the long-term goal is better numeracy for the public, this could be a good addition to such websites.

When a few people generate most of the complaints about a public nuisance

Posted on October 29, 2014 by legallysociable

The newest runway at O’Hare Airport has generated more noise complaints than ever. However, a good portion of the complaints come from a small number of people.

She now ranks among the area’s most prolific complainers and is one of 11 people responsible for 44 percent of the noise complaints leveled in August, according to the city’s Department of Aviation.

The city, which operates the airport, pokes at her serial reporting in its monthly report by isolating the number of complaints from a single address in various towns. It’s a move meant to downplay the significant surge in noise complaints since the airport’s fourth east-west runway opened last fall, but it only seems to energize Morong…

Chicago tallied 138,106 complaints during the first eight months of the year, according to the Department of Aviation. That figure surpassed the total number of noise complaints from 2007 to 2013.

The city, however, literally puts an asterisk next to this year’s numbers in monthly reports and notes that a few addresses are responsible for thousands of complaints. The August report, for example, states that 11 addresses were responsible for more than 13,000 complaints during that 31-day period…

But even excluding the serial reporters, the city still logged about 16,000 complaints in August, about eight times the number it received in August 2013.

There are two trends going on here:

1. The overall number of complaints is still up, even without the more serial complainers. This could mean several things: there are more people now affected by noise, a wider range of people are complaining, and/or this system of filing complaints online has caught on.

2. A lot of the complaints are generated by outliers, including the main woman in the story who peaked one day at 600 complaints. It is interesting that the City of Chicago has taken to pointing this out, probably in an attempt to downplay the overall surge in complaints.

This is not an easy issue to solve. The runway issues and O’Hare’s path to being the world’s busiest airport again mean that there is more flight traffic and more noise. This is not desirable for some residents who feel like they are not heard. Yet, it is probably good for the whole region as Chicago tries to build on its transportation advantages. What might the residents accept as “being heard”? Changing whole traffic patterns or efforts at limiting the sound? Balancing local and regional interests is often very difficult but I don’t see how this is going to get much better for the residents.
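The August figures quoted above can be checked with some quick arithmetic. This is only a rough sketch: the article gives the counts as "more than 13,000" and "about 16,000," so the rounded inputs below are approximations and the variable names are my own.

```python
# Approximate August figures from the article (rounded, so results are rough).
serial_complaints = 13_000   # "more than 13,000" from 11 addresses
other_complaints = 16_000    # "about 16,000" from everyone else
serial_addresses = 11

total_complaints = serial_complaints + other_complaints
serial_share = serial_complaints / total_complaints
per_serial_address = serial_complaints / serial_addresses

# "About eight times" the August 2013 count implies roughly this baseline.
implied_august_2013 = other_complaints / 8

print(f"Share from serial reporters: {serial_share:.0%}")
print(f"Average per serial address:  {per_serial_address:,.0f}")
print(f"Implied August 2013 total:   {implied_august_2013:,.0f}")
```

The share comes out near 45 percent, in line with the 44 percent the city reported, and each of the 11 addresses averaged well over a thousand complaints in a month. Both trends show up in the same numbers: a handful of outliers and a much higher baseline.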

Statistical anomalies show problems with Chicago’s red light cameras

Posted on July 23, 2014 by legallysociable

There has been a lot of fallout from the Chicago Tribune’s report on problems with Chicago’s red light cameras. And the smoking gun was the improbable spikes in tickets handed out on single days or in short stretches:

From April 29 to June 19, 2011, one of the two cameras at Wague’s West Pullman intersection tagged drivers for 1,717 red light violations. That was more violations in 52 days than the camera captured in the previous year and a half…

On the Near West Side, the corner of North Ashland Avenue and West Madison Street generated 949 tickets in a 17-day period beginning June 23, 2013. That is a rate of about 56 tickets per day. In the previous two years, that camera on Ashland averaged 1.3 tickets per day…

City officials insisted the city has not changed its enforcement practices. They also said they have no records indicating camera malfunctions or adjustments that would have affected the volume of tickets.

The lack of records is significant, because Redflex was required to document any time the operation of a camera was disrupted for more than a day, as well as work “that will affect incident volume” — in other words, adjustments or repairs that could increase or decrease the number of violations.

Put differently, graphs of the number of tickets over time show big spikes. Here is one such graph from the intersection of Halsted and 119th Street:

http://apps.chicagotribune.com/news/local/red-light-camera-tickets/#intersection/halsted-and-119th

As the article notes, there are a number of these big outliers in the data, outliers that would be difficult to miss if anyone was examining the data like they were supposed to. Given the regularities in traffic, you would expect fairly similar patterns over time but graphs like this suggest something else at work. Outside of someone directly testifying to underhanded activities, it is difficult to imagine more damaging evidence than graphs like these.
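The size of these spikes is easy to quantify from the figures the Tribune reports. The sketch below computes the daily ticket rates and then shows a very simple way someone monitoring the data might have flagged unusual days. The daily counts and the "10 times the baseline" threshold are hypothetical choices of mine, not anything the city or Redflex used.

```python
def daily_rate(tickets, days):
    """Average number of tickets per day over a period."""
    return tickets / days


# Figures reported in the article.
ashland_spike_rate = daily_rate(949, 17)      # 17-day stretch in 2013
ashland_baseline_rate = 1.3                   # average over the prior two years
pullman_spike_rate = daily_rate(1717, 52)     # April 29 to June 19, 2011

spike_multiple = ashland_spike_rate / ashland_baseline_rate


def flag_spikes(daily_counts, baseline, factor=10):
    """Return the indexes of days whose count exceeds factor * baseline."""
    threshold = factor * baseline
    return [i for i, count in enumerate(daily_counts) if count > threshold]


# Hypothetical daily counts for one camera, invented for illustration.
example_days = [1, 2, 0, 1, 58, 61, 2]
flagged_days = flag_spikes(example_days, ashland_baseline_rate)

print(f"Ashland spike: {ashland_spike_rate:.1f}/day, {spike_multiple:.0f}x the baseline")
print(f"West Pullman spike: {pullman_spike_rate:.1f}/day")
print(f"Flagged day indexes: {flagged_days}")
```

A jump to roughly 43 times the usual rate is not subtle. Even a crude check like this one would have caught it.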

Guinness World Records for housing

Posted on January 6, 2014 by legallysociable

Here is a roundup of some of the 2014 Guinness World Records in housing:

Knapp, who died in 1988, lived in the same house in Montgomery Township, Pa., for 110 years. And for that feat, she earns the title as the person who has lived the longest time ever in one residence, according to the 2014 edition of the “Guinness World Records.”…

While we’re at it, a nod to the world’s tallest real estate agents: Laurie and Wayne Hallquist are 6’6″ and 6’10”, respectively. She’s a full-time agent with Prudential California Realty in Stockton, Calif., while he’s a part-timer with the company…

The skinniest house on record is in Warsaw. It is three feet two inches wide at its narrowest point and just about five feet at its widest. It contains a floor area of 151 square feet, and instead of stairs, occupants climb a ladder to reach the bedrooms above…

The tallest resident-only building is in Dubai. Princess Tower is 1,356-feet high, with the highest occupied floor at 1,171 feet. But the title of tallest residential apartments belongs to Burj Khalifa, also in Dubai, which combines a hotel, offices and apartments. There, the highest residential floor, the 108th, is at 1,263 feet.

Houses, their furnishings, and apparently, their agents, come in all shapes and sizes. However, when I think about these records, it strikes me that most housing in the United States is relatively uniform. I don’t mean that the housing all looks the same – this is a common criticism of suburban housing and I don’t think it is particularly fair – but that most housing is within a standard deviation or two of the average. Give or take a few rooms, a few decades, and some furnishings and decorations, most housing is “normal.” The housing cited in Guinness tends to consist of unusual, extreme outliers.

Long tail: 17% of the seven foot tall men between ages 20 and 40 in the US play in the NBA

Posted on August 27, 2013 by legallysociable

As part of dissecting whether Shaq can really fit in a Buick Lacrosse (I’ve asked this myself when watching the commercial), Car & Driver drops in this little statistic about men in the United States who are seven feet tall:

The population of seven-footers is infinitesimal. In 2011, Sports Illustrated estimated that there are fewer than 70 men between the ages of 20 and 40 in the United States who stand seven feet or taller. A shocking 17 percent of them play in the NBA.

In the distribution of heights in the United States, being at least seven feet tall is quite unusual and at the far right side of a fairly normal distribution. But, being that tall increases the odds of playing in the NBA by quite a lot. As a Forbes post suggests, “Being 7 Feet Tall [may be] the Fastest Way To Get Rich in America”:

Drawing on Centers for Disease Control data, Sports Illustrated’s Pablo Torre estimated that no more than 70 American men are between the ages of 20 and 40 and at least 7 feet tall. “While the probability of, say, an American between 6’6″ and 6’8″ being an NBA player today stands at a mere 0.07%, it’s a staggering 17% for someone 7 feet or taller,” Torre writes.

(While that claim might seem like a tall tale, more than 42 U.S.-born players listed at 7 feet did debut in NBA games between 1993 and 2013. Even accounting for the typical 1-inch inflation in players’ listed heights would still mean that 15 “true” 7-footers made it to the NBA, out of Torre’s hypothetical pool of about 70 men.)…

And given the market need for players who can protect the rim, there are extra rewards for this extra height. The league’s median player last season was 6 feet 7 inches tall, and paid about $2.5 million for his service. But consider the rarified air of the 7-footer-and-up club. The average salary of those 35 NBA players: $6.1 million.

(How much does one more inch matter? The 39 players listed at 6 feet 11 inches were paid an average of $4.9 million, or about 20% less than the 7 footers.)

Standing as an outlier at the far end of the distribution seems to pay off in this case.
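The quoted figures imply a few numbers that are worth working out. This is a back-of-the-envelope sketch using only the statistics cited above, with Torre's pool of 70 treated as an upper bound. The variable names are mine.

```python
# Figures quoted from Sports Illustrated and Forbes above.
seven_footer_pool = 70        # estimated upper bound, U.S. men aged 20 to 40
share_in_nba = 0.17           # 17% of seven-footers
share_6ft6_to_6ft8 = 0.0007   # 0.07% of men between 6'6" and 6'8"

implied_nba_seven_footers = seven_footer_pool * share_in_nba
odds_multiple = share_in_nba / share_6ft6_to_6ft8

avg_salary_7ft = 6.1          # millions of dollars
avg_salary_6ft11 = 4.9        # millions of dollars
salary_gap = (avg_salary_7ft - avg_salary_6ft11) / avg_salary_7ft

print(f"Implied American seven-footers in the NBA: about {implied_nba_seven_footers:.0f}")
print(f"Odds versus a 6'6\" to 6'8\" man: about {odds_multiple:.0f} times higher")
print(f"Pay gap for being one inch shorter: about {salary_gap:.0%}")
```

So 17 percent of a pool of 70 is only about a dozen players, yet the chance of making the league is a couple hundred times higher than for men who are merely very tall.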

Bad logic: stories of successful college dropouts obscure advantages of going to college

Posted on March 19, 2013 by legallysociable

The president of the University of Chicago writes that holding up successful college dropouts as models takes away attention from the advantages of a college degree:

Names like Jobs, Gates, Dell, and others lend star power to the myth of the wildly successful college dropout. One recent New York Times homage to the phenomenon compared dropping out to “lighting out for the territories to strike gold,” with one young executive describing it as “almost a badge of honor” among startup entrepreneurs. Like any myth, this story has a kernel of truth: There are exceptional individuals whose hard work, determination, and intelligence make up for the lack of a college degree. If they could do it, one might think, why can’t everybody?

Such a question ignores the outlier status of these exceptional drop-out entrepreneurs and innovators.

Those who are able to achieve such success often rely on a set of skills already developed before they get to college. They know how to educate themselves, get a bank loan, and manage their time and their money. They may benefit from a network of family, friends and acquaintances who open doors and provide a safety net.

But what happens to young people without access to these important resources? For them, skipping college to pursue business success is like investing their savings in lottery tickets in the hopes they will be a multimillion-dollar winner, or failing to pursue an education because they expect to be an NBA superstar. The reality is that the next college dropout will not be LeBron James, James Cameron, or Mark Zuckerberg. He will likely belong to the millions of college drop-outs you don’t hear the press singing about. These are the 34 million Americans over 25 with some college credits but no diploma. Nearly as large as the state of California, this group is 71 percent more likely to be unemployed and four times more likely to default on student loans. Far from being millionaires, they earn 32 percent less than college graduates, on average.

I’ve seen this logic used in arguments about not having to spend lots of money on college or from those who see college as liberal indoctrination. As Zimmer argues, using outliers to build a theory is just not a good idea. These famous cases are held up partly because they are so rare, not because this is necessarily a good path to pursue. This is similar to the logic used in holding up rags to riches stories; while it is true that social mobility, upward and downward, occurs in the United States, a phenomenal change in position over one lifetime is much rarer.

I’ve used this very example with my Introduction to Sociology class when talking about why people go to college. I ask them if they are aware of wealthy college dropouts like Bill Gates and Steve Jobs. They say yes. I then ask if they dropped out of college, would their parents accept these stories as good rationale? They answer no. I then tell them a little of the Bill Gates story as relayed by Malcolm Gladwell in Outliers. Gates attended a pretty good high school that through one student’s parent who worked for a computer company was able to purchase a used mainframe computer. Gates then had a rare opportunity at the time for a high school student to spend hours with the mainframe and learn about it. He was then able to build on this background and later founded Microsoft with Paul Allen. Gladwell uses this as an example of the Matthew effect where those who come from more advantaged backgrounds (or who happened to be the oldest hockey players) tend to get more opportunities later in life.

Five main methods of detecting patterns in data mining

Posted on April 5, 2012 by legallysociable

Here is a summary of five of the main methods utilized to uncover patterns when data mining:

Anomaly detection: in a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For instance, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.

Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.

Cluster detection: one type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups. This is one of the black-box type of algorithms that are hard to understand. But in a simple example – again with purchasing behavior – we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.

Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this – large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.

Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
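Of the five methods in the summary above, association learning may be the easiest to sketch in a few lines. The example below counts which items show up together in a handful of invented shopping baskets and computes a simple "confidence" score for a rule. This is a toy illustration under my own assumptions, not a description of how Amazon or Netflix actually build recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book"},
    {"recipe book", "garden gloves"},
    {"martini glasses"},
]


def pair_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing all antecedent items that also hold consequent."""
    antecedent = set(antecedent)
    matching = [b for b in baskets if antecedent <= b]
    if not matching:
        return 0.0
    return sum(consequent in b for b in matching) / len(matching)


counts = pair_counts(baskets)
rule_confidence = confidence(
    baskets, {"cocktail shaker", "recipe book"}, "martini glasses"
)

print(counts.most_common(3))
print(f"Shaker + recipe book buyers who also bought glasses: {rule_confidence:.0%}")
```

Real systems work with far more data and more careful measures, but the basic idea is the same: look for items that co-occur more often than one would expect.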

Several of these seem similar to methods commonly used by sociologists:

1. Anomaly detection seems like looking for outliers. On one hand, outliers can throw off basic measures of central tendency or dispersion. On the other hand, outliers can help prompt researchers to reassess their models and/or theories to account for the unusual cases.

2. Cluster detection and/or classification appear similar to factor analysis. This involves a statistical analysis of a set of variables to see which ones “hang together.” This can be helpful for finding categories and reducing the number of variables in an analysis to a lesser number of important concepts.

3. Regression is used all the time both for modeling and predictions.
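The first point, that outliers can throw off measures of central tendency and dispersion, has a practical consequence for anomaly detection itself. In the sketch below, one extreme value inflates the mean and standard deviation so much that an ordinary z-score fails to flag it, while a score based on the median and the median absolute deviation (MAD) catches it easily. The data are invented, and the cutoffs of 3 and 3.5 are conventional rules of thumb rather than fixed standards.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standard scores based on the mean and (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def modified_z_scores(values):
    """Robust scores based on the median and the median absolute deviation.

    Assumes the MAD is not zero, which holds for the example data below.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical measurements with one obvious outlier.
values = [10, 12, 11, 13, 12, 11, 95]

flagged_by_z = [v for v, z in zip(values, z_scores(values)) if abs(z) > 3]
flagged_by_mad = [v for v, z in zip(values, modified_z_scores(values)) if abs(z) > 3.5]

print(f"Flagged by z-score:          {flagged_by_z}")
print(f"Flagged by modified z-score: {flagged_by_mad}")
```

The ordinary z-score flags nothing here because the outlier drags the standard deviation up along with the mean. The median-based version is not fooled, which is one reason analysts often prefer robust statistics when hunting for unusual cases.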

This all reminds me of what I heard in graduate school about the difference between data mining and statistical research: data mining amounted to atheoretical analysis. In other words, you might find relationships between variables (or apparent relationships between variables – could always be a spurious association or there could be suppressor or distorter effects) but you wouldn’t have compelling explanations for these relationships. While you might be able to develop some explanations, this is a different process than hypothesis testing where you set out to look and test for relationships and patterns.

Legally Sociable

Pleasant Musings on Sociology, McMansions and Housing, Suburbs and Cities, and Miscellaneous Errata.

Tag Archives: outliers
