Just how many McMansions have actually collapsed like Trump’s polling figures?

One common critique of McMansions is that they are poorly built. The story goes that because they are mass-produced, builders use bottom-of-the-line materials and do nothing more than necessary in the search for big profits. This idea surfaced in a recent story about the decline of Donald Trump’s polling figures:

Public Policy Polling finds Donald Trump’s numbers collapsing like a poorly-built McMansion.

Some people might find this phrase redundant: aren’t McMansions poorly built by definition?

Perhaps I am being too literal here, but this gets me thinking about how many McMansions have actually collapsed. I would guess that not too many have collapsed on their own, so the more appropriate figures to search for might measure how many McMansions needed major renovations or fixes and how those numbers compare to other kinds of homes. Would HGTV, the network always in search of homes that need help, be a good source for figures? This is probably not the kind of data builders would want to keep, and it would be difficult to collate the information from millions of individual homeowners.

And what would be a better metaphor for the collapse of Donald Trump’s polling numbers?

Discovering fake randomness

In the midst of a story involving fake data generated for DailyKos by the polling firm Research 2000, TechDirt summarizes exactly how it was discovered that Research 2000 was faking the data. Several statisticians approached Kos after seeing irregularities in the cross-tab (table) data. The summary and the original analysis on DailyKos are fascinating: even truly random data follows certain parameters. One takeaway: faking random data is a lot harder than it looks. Another takeaway (for me at least): statistics can be both useful and enjoyable.

The three issues as summarized on DailyKos:

Issue one: astronomically low odds that the male and female figures would both be even or both be odd.

In one respect, however, the numbers for M and F do not differ: if one is even, so is the other, and likewise for odd. Given that the M and F results usually differ, knowing that say 43% of M were favorable (Fav) to Obama gives essentially no clue as to whether say 59% or say 60% of F would be. Thus knowing whether M Fav is even or odd tells us essentially nothing about whether F Fav would be even or odd.
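To see why matched parity is so damning, here is a minimal simulation sketch of my own (not the analysts’ code). It assumes the male and female percentages are independent and that their last digits are effectively random, so the chance of matching parity in any given week is about one half; the 60-week span and the trial count are illustrative figures, not numbers from the analysis.

```python
import random

weeks = 60          # illustrative span of weekly cross-tabs
trials = 100_000    # simulated honest polling histories

all_match = 0
for _ in range(trials):
    # Count weeks in which two independent percentages share parity
    matches = sum(
        random.randint(0, 100) % 2 == random.randint(0, 100) % 2
        for _ in range(weeks)
    )
    if matches == weeks:
        all_match += 1

print(f"Histories where parity matched every single week: {all_match} of {trials}")
print(f"Theoretical chance of a full match: roughly 0.5**{weeks} = {0.5**weeks:.1e}")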

Issue two: the margin between favorability and unfavorability ratings did not display enough variance. If the polls were truly working with random samples, there would be a broader range of values.

What little variation there was in the difference of those cross-tab margins seemed to happen slowly over many weeks, not like the week-to-week random jitter expected for real statistics.
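Here is a rough sketch of the sampling-jitter point, simplified to a single cross-tab rather than the male/female difference the analysts examined. The 400-respondent cell, the 55/35 favorable/unfavorable split, and the 60-week span are all assumptions for illustration; the point is simply that genuine samples make the margin bounce around by several points.

```python
import random
import statistics

def weekly_margin(n=400, p_fav=0.55, p_unfav=0.35):
    """Favorable-minus-unfavorable margin, in points, from one simulated weekly sample."""
    fav = unfav = 0
    for _ in range(n):
        r = random.random()
        if r < p_fav:
            fav += 1
        elif r < p_fav + p_unfav:
            unfav += 1
    return 100 * (fav - unfav) / n

margins = [weekly_margin() for _ in range(60)]
print(f"Simulated margins range from {min(margins):.0f} to {max(margins):.0f} points")
print(f"Week-to-week standard deviation: {statistics.stdev(margins):.1f} points")
```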

Issue three: the favorability ratings changed too often from week to week. In most polls like this that track week to week, the most common result is no change at all. The Research 2000 results showed too many changes from week to week – often small ones, a percent either way.
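A quick simulation, again my own illustration rather than the DailyKos analysts’ method, shows why “no change” should be the single most common outcome. It assumes a stable true rating of 52% and a weekly sample of roughly 2,400 respondents; both figures are placeholders.

```python
import random
from collections import Counter

def weekly_percent(n=2400, p=0.52):
    """One week's reported rating: sample n respondents, round to a whole percent."""
    hits = sum(random.random() < p for _ in range(n))
    return round(100 * hits / n)

reported = [weekly_percent() for _ in range(1000)]   # 1,000 simulated weeks
changes = Counter(b - a for a, b in zip(reported, reported[1:]))

# With genuine sampling noise, a change of 0 points should top this table.
for delta in sorted(changes):
    print(f"change of {delta:+d} points: {changes[delta]} week pairs")
```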

For each issue on its own, the odds that it would arise with truly random data are quite low. Put all three together in the same dataset and the odds shrink even further.
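If the three issues are treated as independent, their probabilities multiply. The numbers below are made-up placeholders, not figures from the DailyKos analysis; they only show how quickly the combined odds vanish.

```python
# Made-up placeholder probabilities, purely to illustrate the arithmetic.
p_parity, p_flat_margins, p_too_few_repeats = 1e-18, 1e-6, 1e-4

# Under independence, the chance of all three appearing in the same data multiplies out.
print(f"Combined chance: {p_parity * p_flat_margins * p_too_few_repeats:.0e}")
```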

Beyond the questions this raises about the integrity of data collection (and it becomes clearer why many people harbor a distrust of polls and statistics), this is a great example of statistical detective work. Too often, many of us see numbers and quickly trust them (or distrust them). In reality, it takes just a little work to dig deeper into figures and discover what exactly is being measured and how it is being measured. The “what” and the “how” matter tremendously because they can radically alter the interpretation of the data. Citizens and journalists alike need some of these abilities to decipher all the numbers we encounter on a daily basis.