Truly a Representative Sample
This post is going to focus on something that has been bugging me ever since I read this column by Roger Simon, the chief political columnist for Politico. Simon essentially demonstrates that he knows very little about survey methodology and analysis. This lack of knowledge is not a big deal if you are the average person but, in my opinion, it is a big deal if you are the chief political columnist for a website like Politico (which currently gets more hits in a minute than Democracy Chronicles gets in a month). My annoyance is not only with Simon’s column; it was simply the tipping point for my frustration, and it serves to introduce the larger issue: unfortunately, this lack of knowledge appears to be fairly widespread among members of the media, who tend to overgeneralize polling data and rarely provide much-needed critical analysis.
Consider the following example. This past weekend Reuters/Ipsos released a poll showing a 21-point advantage for Mitt Romney in South Carolina. Many in the media reported this poll (e.g., here, here, here, here, and here; there are plenty more) and a good number failed to report some basic facts about the methodology that, as Nate Silver addressed, are quite important:
“The Reuters/Ipsos poll was conducted online from January 10-13 with a sample of 995 South Carolina registered voters. It included 398 Republicans and 380 Democrats.”
Check out Nate’s article for all the details, but in a nutshell: the poll was conducted online and not in one of the more traditional formats (phone or face-to-face interview); it sampled registered voters and not likely primary voters; and the number of Republicans sampled is much smaller than in most of the other polls conducted in South Carolina. I want to focus on two of these three issues, 1) the sampling of registered voters versus likely voters and 2) the broader category of sample size and the idea of a representative sample, to make a few broader points about how the media generally fails to analyze polling data effectively. This failure ultimately leads to a column like the one by Roger Simon.
The distinction between a registered voter and a likely voter is often blurred by the media, and it is particularly important for a low-turnout election such as a primary. Simply put, not all registered voters actually vote. Polling data based on likely voters is therefore generally more accurate at predicting election results, and is even more valuable in low-turnout elections. Yet, while the media often reports how people were surveyed, whether they are registered or likely voters, and the margin of error, it often does so in a fashion that treats all polls as equal. It is time to acknowledge that not all polls are equal. Identifying inferior polls and explaining why they are inferior is the job of those analyzing and reporting on the polling data. Why discuss and possibly even focus on inferior polling data? Is it possible for a report on a poll to include some kind of basic analysis at the end that recommends how much weight one should give the data? My reason for suggesting the media is largely uninformed on this topic is that such analysis is often absent from reports and discussions of polling data. A basic statistics course covers all the topics necessary to perform such an analysis, and I personally do not think such knowledge is too much to expect of someone who is analyzing and reporting on polling data.
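To make the registered-versus-likely distinction concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and does not come from the Reuters/Ipsos poll or any other real survey; the function name support_share and the two groups are my own assumptions. It simply shows how a candidate's support can look quite different depending on whether the people unlikely to vote are counted.

```python
def support_share(groups):
    """Return candidate support across groups given as (respondents, support_rate)."""
    total = sum(size for size, _ in groups)
    return sum(size * rate for size, rate in groups) / total


# Hypothetical sample, invented purely for illustration (not real polling data).
likely_voters = (300, 0.40)    # 300 respondents who will probably vote; 40% back Candidate A
unlikely_voters = (700, 0.55)  # 700 registered respondents who probably will not; 55% back A

# A registered-voter poll counts everyone; a likely-voter poll keeps only the first group.
registered_estimate = support_share([likely_voters, unlikely_voters])
likely_estimate = support_share([likely_voters])

print(f"All registered voters: {registered_estimate:.1%} for Candidate A")
print(f"Likely voters only:    {likely_estimate:.1%} for Candidate A")
print(f"Gap: {(registered_estimate - likely_estimate) * 100:.1f} points")
```

With these made-up inputs the registered-voter figure is about 50.5 percent while the likely-voter figure is 40 percent. The size of the gap depends entirely on the assumed turnout and preferences, which is exactly why a report should say which population was sampled.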
This brings us full circle to Roger Simon. In his critique of polling data he proudly proclaims:
“I have never been called by a political pollster and don’t know anybody who has, but I know some pollsters, who assure me they don’t make the numbers up, and I believe them.”
He goes on to critique a finding in a Washington Post-ABC News poll that Gingrich and Romney were equally favored by Republicans and Republican-leaning independents:
“We are a nation of nearly 313 million people. So how many people did the pollsters actually speak to? If you have extremely good eyes, you can find the answer in tiny type at the bottom of a chart: The Post-ABC poll was conducted by phone “among a random sample of 1,005 adults.” That represents 0.0003 percent of the nation at large. (The number of Republicans and Republican-leaning independents was an even smaller sample of 395 people).”
As he is reaching his conclusion he returns to critiquing the Washington Post-ABC News poll, this time doubting the polling data that suggests most Americans are optimistic about their personal finances:
“Is this how “most Americans” — based on a survey of 1,005 of them — really feel?”
When one considers all of these comments, it strongly suggests that the person making them has little to no knowledge of the methodologies pollsters employ to obtain a representative sample, a concept relevant not only to public opinion data but to scientific research in general. I am sure Roger Simon is thankful for antibiotics when he gets a bacterial infection, and I am also sure he has not questioned the sample sizes used in the drug trials for the medication he is prescribed. What is downright humorous, however, is that Simon makes a good point, that the poll only sampled 395 Republicans and Republican-leaning independents, but fails to explain the reason his point is important. Some mention of what one might consider a good representative sample for such a poll, and/or how to obtain such a sample, would be helpful. I would assume it is difficult to offer such analysis if one is unaware of the methodologies pollsters use to obtain their samples.
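For readers who want to see why 1,005 respondents can be enough while 395 is a weaker basis, here is a rough sketch of the textbook margin-of-error calculation in Python. It assumes a simple random sample, a 95 percent confidence level (z = 1.96), and the worst-case proportion of 0.5. Real polls use weighting and more complex designs, so their published margins can differ; the function name margin_of_error and its parameters are my own, not anything taken from the Post-ABC methodology.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, population=None):
    """Approximate margin of error for a proportion from a simple random sample.

    n          -- number of respondents
    p          -- assumed true proportion (0.5 is the worst case)
    z          -- z-score for the confidence level (1.96 for roughly 95%)
    population -- if given, apply the finite population correction
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: matters only when n is a large share of the population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


US_POPULATION = 313_000_000  # "nearly 313 million", the figure Simon cites

full_sample = margin_of_error(1005)
full_sample_fpc = margin_of_error(1005, population=US_POPULATION)
republican_subsample = margin_of_error(395)

print(f"Share of the nation sampled: {1005 / US_POPULATION * 100:.4f} percent")
print(f"n=1005: +/- {full_sample * 100:.1f} points")
print(f"n=1005 with population correction: +/- {full_sample_fpc * 100:.1f} points")
print(f"n=395:  +/- {republican_subsample * 100:.1f} points")
```

Under these assumptions the full sample comes out at roughly plus or minus 3 points and the 395-person subsample at roughly plus or minus 5. Accounting for the size of the country changes the result by a negligible amount, which is the point Simon misses: precision is driven by how many people are sampled and how they are chosen, not by the fraction of the nation they represent.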
Ultimately, Simon’s article is somewhat of a shame because he is able to identify what is wrong with the poll he chooses to criticize but unable to correctly explain why the factors he has identified are problems. He instead tries to make a broader argument about how he just doubts the polling industry at large, which again betrays a lack of knowledge of sampling and provides yet another example of the media overgeneralizing one event and trying to apply it across the board. Our analysis of polling data can be more sophisticated and nuanced. The ideas behind random sampling and how to obtain a representative sample are not exceedingly complex; they just are not something many people think about on a day-to-day basis. The next post at The Rabbit Hole will offer an introduction to the topic in an attempt to broaden awareness of it. Understanding such key concepts as random sampling and representative samples will make it easier to engage in discussions about the topics this blog intends to discuss. Expect that post in a few days…