Metrics are dependent on proper segmentation
I'm writing this before the election, so it's not some retroactive defense of bad data or the usual 'but it was within the margin of error!' argument. Which, by the way, is not a bad argument, but feels less true after watching it happen twice in a row...
So, I put money on Trump winning the 2020 election before the coronavirus hit. I think if he loses, it will be because he downplayed the virus too much, did not encourage mask wearing, and overall gave the impression that he didn't really have a clear national strategy, even though I think his team does have a decent one.
However, despite the polls, I think Trump will win, and it will be convincing. One of the reasons is that I see parallels between how polls look at data and how marketers look at data. I worked on some of the world's largest YouTube marketing campaigns, and one thing we always struggled with was trying to align our metrics with TV ratings to show like-for-like comparisons. Basically, we wanted TV budgets, and we were trying to show that people were moving away from TV and toward digital, so budgets should be moving as well. Despite having clear data that this was true, it was incredibly difficult to shift budgets from TV to digital. I think there were 3 main reasons for this:
1. Change Inertia: It's difficult to change what you're doing
- TV has long purchasing cycles, and a very well-established and respected measuring, buying, and purchasing process in place. The TV rating system in the UK (BARB) is considered the Gold Standard in marketing metrics, so if you can't prove that your digital media is getting similar results, you get a share of the social media budget, not the sacred TV budget.
2. Lies, Damn Lies, and Marketing Metrics
- Metrics are complicated, which is why the average marketer can't really explain how the metrics work or how to translate them into the digital world. That's what they hire their analysts for, or at least, that's what they hope their analysts are doing. For example, TV ratings are based on Total Universe Sizes, but the secret sauce is that these total universe sizes are actually the size of the TV audience. So if you have 10M 16-34 year olds in a country, and one year all 10M watch TV, and then 10 years later only 5M watch TV, you have shrunk your Total Universe significantly. But you don't see the total universe sizes; instead, you buy a % of the total universe size. So if you buy 50% of the total universe size, 10 years ago that would have gotten your ad viewed by 5M people, whereas today it would only get your ad viewed by 2.5M. That difference is hidden, however, because the TV Ratings are the same (you're still buying 50% of the target audience). This is just one example of the many tricks that can be played with metrics. Of course, when you are the single source of truth and start to encounter issues like this, you like to ride your reputation for a while and figure out how to frame the changes to your metrics in a way that won't lose you a lot of money. Another example follows in point 3 below, with Nielsen and online metrics.
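The universe-size trick above is easy to see in arithmetic. Here's a minimal sketch, using the hypothetical numbers from the example (10M 16-34 year olds, a 50% rating), showing how the same rating quietly reaches half as many people when the universe shrinks:

```python
def viewers_reached(tv_universe_size: int, rating_pct: float) -> float:
    """People reached when buying `rating_pct`% of the TV-watching universe.

    The catch: the universe is the *TV audience*, not the whole population,
    so the universe itself can shrink while the rating stays constant.
    """
    return tv_universe_size * rating_pct / 100

# 10 years ago: all 10M 16-34 year olds watched TV.
then = viewers_reached(10_000_000, 50)  # 5,000,000 viewers
# Today: only 5M of them still watch TV, but the rating sold is still "50%".
now = viewers_reached(5_000_000, 50)    # 2,500,000 viewers

print(f"Same 50% rating: then {then:,.0f} viewers, now {now:,.0f} viewers")
```

The buyer sees the same "50 rating" line item both years; the halved reach only shows up if someone publishes the underlying universe size.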
3. Fragmentation means less accuracy
- When online exploded, there was an initial fragmentation between offline and online media. Then mobile came in and made that problem 10x more complicated. The major measurement companies were completely unprepared for this, with the gold standard in digital tracking, Nielsen, not having a mobile solution for about 4-5 years after the mobile revolution took off. They literally could only track desktop/laptop browser activity, and then just extrapolated the rest, which was obviously inaccurate because of how different user behavior was on mobile devices and tablets. Even big guns that led the mobile revolution, like Google and Facebook, had a hard time measuring the fragmented marketplace.
So what does this have to do with polling? Well, similar to TV modeling, polls often depend on pre-selected panels that are meant to reflect the overall population, and then extrapolate. Even direct-calling polls have to extrapolate based on the segmentation of the audiences they contact, and while probably more accurate than panels, they are still extrapolations.
The reason I believe this is no longer working is because, similar to the mobile revolution, we are seeing political fragmentation. You're also seeing change inertia, so instead of trying to reflect current reality, all the polling models are based on previous assumptions and historical patterns. In reality, I think the assumptions about gender, income, race, etc. in terms of voting segments are now outdated. This is because of the fragmentation of information and the variety of information sources that now exist. Not only because people can live in their own echo-chamber bubbles, but also because people can leave their echo-chamber bubbles, so you have a lot more cross-pollination of ideas.
Last time, the pollsters claimed they underestimated non-college-educated white voters. I think this is probably correct, but I think that is only ONE segment that they missed. I think black voters have started to fragment, and there is a higher percentage voting for Trump. That means previous assumptions and models around black voters will be skewed. Usually polls try to account for these errors by calculating a margin of error, but the margins of error are calculated based on previous trends. If the baseline assumptions are wrong, then the margin of error calculations are also wrong. In the black voter scenario above, if Trump gets twice the black vote of any previous Republican candidate, which is possible, then the entire weighting of black voters in the polling results will skew heavily in Biden's direction, and the margin of error calculations will also be off.
Here is another concrete example of how this works in terms of weighting. Say you have a panel or polling result with 500 women and 500 men. Because women have outvoted men in previous elections, the 500 women's responses are weighted slightly higher, maybe 530 to 470. That means if the Biden-Trump vote in the original panel was split evenly, with women voting Biden 60-40 and men voting Trump 60-40, then after weighting the women higher (again, based on historical voting data) Biden now has a lead of over a point. However, on election day, more men show up than women, and so Trump wins. Now imagine this weighting is done for every segmentation, and you can see how getting the segmentations wrong will completely skew your results...
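The weighting example above can be sketched in a few lines. This is a simplified illustration with the hypothetical numbers from the paragraph (real pollsters weight many overlapping segments at once), showing how re-weighting 500/500 to 530/470 turns a dead heat into a Biden lead:

```python
def weighted_biden_share(groups: list[tuple[float, float]]) -> float:
    """groups: list of (segment_weight, biden_share_in_segment).

    Returns Biden's overall share after weighting each segment.
    """
    total_weight = sum(w for w, _ in groups)
    return sum(w * share for w, share in groups) / total_weight

# Raw panel: 500 women voting Biden 60-40, 500 men voting Trump 60-40.
raw = weighted_biden_share([(500, 0.60), (500, 0.40)])       # 0.500, a tie
# Weighted panel: women bumped to 530, men cut to 470 (historical turnout).
weighted = weighted_biden_share([(530, 0.60), (470, 0.40)])  # 0.506

print(f"Raw Biden share: {raw:.1%}, weighted: {weighted:.1%}")
```

Nothing about the respondents changed; only the turnout assumption did, and it alone moved the topline by more than a point. If the true turnout skews the other way, the error doubles.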
Now, the last question comes down to why this change would favor Trump over Biden. Well, my theory is that the segmentation that is happening is partially because of Trump. Trump is a shock to the system, so the people that are moving out of normal voting patterns are breaking towards Trump. The perfect example would be the black American voters I mention above. The men and women split is also a good example, as maybe men are more pumped to vote this year due to Trump. The truth is, the polls are probably nowhere near their margins of error or historical levels of accuracy, and just like the digital marketing examples I mentioned above, I think they're riding on their reputation more than actually reacting to the changes that are happening in the electoral landscape. It's too hard to account for everything that has happened, and you can't just say " 'eff it! we don't know!", so they resort to the easiest, tried-and-true methods, kind of knowing they're no longer relevant but not really having other options they can use.
Their best defense is to leave a 10% chance that they're wrong, so if they are wrong, they can always point to that 10% chance and just say "that's how statistics work!". And maybe they're right.
Or maybe I am. ;)
Ely Loew
2040 Presidential Candidate
Youtube.com/ElyLoew2040