In my opinion, 538 did some of the best analytics work in the 2016 election. They were the only ones who came anywhere close to properly estimating the chances of the outcome that actually occurred, and they were one of only a couple of models that took sufficiently seriously how unusual things were, particularly the massive unpopularity of both candidates.
I'm getting really irritated with people (in this thread and others) attacking 538 along partisan or personal lines for not making the same predictions they did personally. 538 made a model, not a poll; they are only ever as good as the polls they represent, and even then, they still managed to be better than quite a lot of public polling and commentary. If Hillary Clinton had won, the same kinds of people on the other side would be attacking 538 for being too bullish on Trump.
The worst part is that a lot of this is predicated on the idea that giving something a 1 in 3 chance is the same as saying it'll never happen, when it's almost the opposite. There are some critical misunderstandings of statistics going on here, on all sides.
After the election, Silver said something to this effect: given the polling data available, there was no way anyone could have justifiably favored Trump. Clinton was leading by a large margin in a large percentage of polls. You would have to apply some seriously creative statistics to build a model that favored Trump.
Systematic polling error is a thing that happens. It has happened in previous elections. The polls are much more accurate than chance, but they aren't infallible. In this election in particular, it was difficult to predict voter turnout from the polls.
538 was one of the only models that took that into account. And as far as I know, they gave Trump better odds than anyone else who tried to predict the election with statistical methods. They certainly have a better track record than political pundits, who have never been better than chance.
Yep. And it's also worth remembering that FiveThirtyEight's thing is that they do analysis based on data as opposed to punditry. Yes, they apply corrections and otherwise filter the data sources. But some people seem to think that Silver should have gotten up a couple of days before the election, stated that he had a bad feeling about things, dismissed the best forecast he could make from some shaky polling data, and gone with Trump on gut feel. Sure, he would have ended up being correct. It would also have run counter to the fundamental philosophy of the site.
People forget that Silver's approach is basically weighting polls to figure out how the election will go, and he's the best "poll weigher" out there. But if the polls are systematically wrong, being the best at weighing them is a subtlety that won't impress the layperson.
To the extent that 2018 brings increasingly contentious elections, we will likely see polling remain inaccurate due to voters' unwillingness to state unpopular opinions. In that case, multi-input "big data" approaches may prevail.
(Of course, it's also possible that 2016 is a "top" in terms of divisive rhetoric... only time will tell.)
There's also the truism that if your model predicts a 25% chance for something, then the thing should happen about 25% of the time. That's all a well-calibrated forecast claims.
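You can convince yourself of that with a toy simulation. A minimal sketch in R, assuming events that genuinely carry a 25% chance (this is just an illustration of calibration, not anything 538 actually ran):

```r
# Simulate 10,000 independent events, each with a true 25% chance of occurring,
# and check how often a "25% forecast" would have come true.
set.seed(538)
n <- 10000
happened <- runif(n) < 0.25
mean(happened)   # ~0.25: the 1-in-4 event really does occur about a quarter of the time
```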
Even if the model said 95% for Clinton, there's still a small possibility for Trump to win. It's hard to accept because 95% is really high! But hey, people win the lottery, right?
I am curious whether there is a statistically significant number of people who give pollsters false answers, and whether that number is growing.
Plus, do polls take into account where they are polling? Some areas are hostile to one party or the other, and pollsters may not get good responses there regardless of intent.
I don't think it was that hard to see that there might be some bias in the polling data itself. Trump seemed very much like a candidate who would get a nontrivial number of votes from people who would not "admit" to it in a poll. I'd say he's exactly the kind of candidate whose supporters either very loudly say they voted for him or keep completely quiet about it, with less of a middle ground.
Granted hindsight etc.
I'd say the same will be true for some of the middle-right wing parties in Europe. Notably I think AfD in Germany will (unfortunately) get more votes than polls will indicate in the upcoming election(s). Le Pen in France could be a similar case but I'm not well informed enough on French politics to feel sure about that statement.
They gave the actual outcome (more than 300 electors for Trump) odds of 1 in 10. That's a bigger error than people attribute to them based on "odds of Trump winning: 1 in 3".
This is a rare case where the "data-driven" cult is forced by reality to realize how worthless everything it praises actually is, because data doesn't measure the right thing. Normally people simply ignore the difference between what's measured and what you think you're measuring and all the wrong data-based conclusions become religion every civilized person is expected to believe in. That's why it's important to drive home the point how badly 538 did (and yes, others did even worse, but 538 didn't do "pretty well", it did tremendously badly.)
And BTW there were better "data-driven" predictions than 538's, just not based on polls. Allan Lichtman's model is one that got it right. Incidentally, the incentive to outright falsify polling data (not only for pollsters but to some degree, for those polled!) is much larger than the incentive or the ability to falsify the kind of data his model looks at.
> If Hillary Clinton had won, the same kinds of people on the other side would be attacking 538 for being too bullish on Trump.
In fact, that very thing did happen in the week leading up to the election, so sure were some that Clinton would win. There was a raft of articles nitpicking the model, particularly the adjustments that it applied to polls and the level of uncertainty that it assumed.
> In fact, that very thing did happen in the week leading up to the election, so sure were some that Clinton would win. There was a raft of articles nitpicking the model, particularly the adjustments that it applied to polls and the level of uncertainty that it assumed.
Ugh, yes, like that HuffPo guy attacking Nate Silver. He apologized later, at least.
> The worst part is that a lot of this is predicated on the idea that giving something a 1 in 3 chance is the same as saying it'll never happen, when it's almost the opposite. There are some critical misunderstandings of statistics going on here, on all sides.
On the morning of the election, FiveThirtyEight said that the chances of Trump winning were roughly equal to the chances of a random coin flip coming up heads twice in a row (25%).
In that situation, a good statistician would certainly say that the expected outcome is for at least one of the coin flips to land on "tails" (75% chance). But if the coin lands on heads twice, you wouldn't say that "statistics is broken"[0] or that the statistician predicting the outcome was wrong.
Similarly, let's say that you had a separate person predicting the probability of the double-heads coin flip to be 90%. If it does happen, that doesn't mean they made a good prediction, just because the outcome was correct. They predicted the correct outcome, but the degree of confidence they expressed in it was completely wrong.
Predictions aren't just about getting the outcome right - they're also about expressing the right degree of confidence in the results.
[0] Which is literally what some otherwise-respectable journalists claimed after the election
You're totally right. The trouble is that when 538 was saying "wait, hold up, we're not sure," and being very clear in all their commentary that that 25% chance was a real, important possibility, people were taking it as "oh, they think Hillary has it in the bag." People now treat the fact that 538 was rightfully less certain than anyone else using the same data as if it meant they were just as wrong as everyone else, when the real problem was the data itself. They weren't just as wrong.
(Of course, the degree to which the data was lacking and the reasons for that lack is another issue in and of itself.)
> 538 made a model, not a poll; they are only ever as good as the polls they represent, and even then, they still managed to be better than quite a lot of public polling and commentary.
I completely agree with almost all of your post, but want to defend the polls, even though you didn't explicitly blame them. FiveThirtyEight themselves consistently defend the polls, even in the 2016 election. For example, they ran an article a few days before the election arguing that Trump was only "a normal polling error" away from Clinton [1], and in their podcasts and articles since the election they have stated that an average polling error was in fact exactly what happened. The problem with a normal polling error in this case was that the election was close, and in particular, states that were critical for Clinton were very close. And FiveThirtyEight's model was the only one to really recognize the degree of uncertainty this created, giving Trump a 25-33% chance.
I think sometimes people get confused about percent CHANCE versus actual estimated results.
All those people need to do is go to the election forecast page, which is still up[1], and look at the percentages under "Popular vote". Or down lower under "How the forecast has changed", click POPULAR VOTE to see the percentages over time.
For instance, the final estimate before the election gives:
- CHANCE TO WIN of 71.4% Clinton vs. 28.6% Trump, around a 1 in 3.5 chance for Trump to win.
- RESULTS ESTIMATE of 48.5% Clinton vs. 44.9% Trump, with a big overlapping margin of error.
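Those two numbers are connected: the chance to win is roughly what you get when you put an uncertainty band around the vote-share estimate and ask how often the leader still comes out ahead. A crude back-of-envelope in R; the error size here is an assumption picked for illustration, not 538's actual model (which works state by state):

```r
# Turn a vote-share estimate plus an assumed error into a rough "chance to win".
lead   <- 48.5 - 44.9   # estimated Clinton margin from the forecast page, in points
err_sd <- 6             # assumed standard deviation of the true margin (illustrative)
pnorm(lead / err_sd)    # ~0.73: the same ballpark as that 71.4% chance to win
```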
June 16, 2015: Why Donald Trump Isn’t A Real Candidate, In One Chart
July 16, 2015: Two Good Reasons Not To Take The Donald Trump ‘Surge’ Seriously
July 20, 2015: Donald Trump Is The World’s Greatest Troll
Aug. 6, 2015: Donald Trump’s Six Stages of Doom
Aug. 11, 2015: Donald Trump Is Winning The Polls, And Losing The Nomination
Nov. 23, 2015: Dear Media, Stop Freaking Out About Donald Trump’s Polls
Donald Trump Comes Out Of Iowa Looking Like Pat Buchanan
No, but you're cherry-picking very old articles, so I'm not terribly convinced. Commentary and articles as the general election against Clinton proceeded took him quite seriously.
8 Nov: There's A Wide Range Of Outcomes, And Most of Them Come Up Clinton
6 Nov: How Much Did Comey Hurt Clinton's Chances?
6 Nov: Don't Ignore the Polls – Clinton Leads, But It's A Close Race
5 Nov: Why We Don't Know How Much Sexism Is Hurting Clinton's Campaign
4 Nov: National Polls Show Clinton's Lead Stabilizing – State Polls, Not So Much
4 Nov: Trump Is Just A Normal Polling Error Behind Clinton
3 Nov: Why Clinton's Position Is Worse Than Obama's
2 Nov: The How-Full-Is-This-Glass Edition
2 Nov: Trump's Chance Of Victory Has Doubled In The Past Two Weeks
1 Nov: Yes, Donald Trump Has A Path To Victory
1 Nov: On A Scale From 1 To 10, How Much Should Democrats Panic?
31 Oct: The Odds Of A Popular Vote-Electoral College Split Are Increasing
31 Oct: Comey Or Not, Trump Continues To Narrow Gap With Clinton
27 Oct: Don't Read Too Much Into Early Voting
26 Oct: Is The Presidential Race Tightening?
24 Oct: Why Our Models Are Much More Bullish Than Others On Trump
From:brentbbi@webtv.net : ... I see Dan, Nate Silver and various other insider pundits saying what they are saying, I have been to enough rodeos to know that thoughts are being planted from the Obama-Clinton consultant class (not you, and you know who I mean). This is not helpful to Hillary, quite the contrary, ...
Are you referring to this “The Cubs Have A Smaller Chance Of Winning Than Trump Does”[0]? If so, I don’t understand why you would consider such a headline to be inaccurate or not objective.
It's worth noting that, per the linked talks, 538's data visualizations are indeed created with R and ggplot2; the fancy annotations are done in Illustrator after exporting from ggplot2 to SVG/PDF.
I strongly recommend using Plotly, specifically ggplotly (https://plot.ly/ggplot2/), which converts ggplot2 charts to interactive d3 charts with good parity. Plotly also has WebGL and 3D data visualization support. One of the Plotly developers has a very good book on using Plotly in R: https://cpsievert.github.io/plotly_book/
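For anyone who hasn't tried it, the conversion really is about one function call. A minimal sketch using a built-in dataset (any ggplot2 object works the same way):

```r
library(ggplot2)
library(plotly)

# An ordinary static ggplot2 chart...
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "MPG", color = "Cylinders")

# ...handed to ggplotly() becomes an interactive plotly.js/d3 chart
# (tooltips, zoom, pan) that can be embedded in HTML as an htmlwidget.
ggplotly(p)
```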
You can build Shiny apps in RStudio and publish them with one click to shinyapps.io. Works very similarly to publishing with Plotly, even though Shiny is a server-side package vs. client-side for Plotly.
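For comparison, here's roughly what the Shiny side looks like: a minimal app with one control and one server-rendered plot (deploying it to shinyapps.io is then the one-click publish step in RStudio):

```r
library(shiny)
library(ggplot2)

# Minimal Shiny app: one slider input, one ggplot rendered on the server.
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    ggplot(faithful, aes(x = waiting)) +
      geom_histogram(bins = input$bins)
  })
}

shinyApp(ui, server)
```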
(You can build a Plotly chart in ggplot, embed it in a web page and then script it with JS to get a visualization. Frankly, that's a PITA. You can write R/shiny code that will generate a visualization, with HTML controls etc, and you end up writing a web app in R and that's a different PITA. I would like to see Plotly generate a web wrapper that automates wiring up a plotly graph and scripting it from an app so you can update data on the fly to make it a visualization. Supposedly they are working on it with their dash framework, but it's not released yet.)
The pricing on shinyapps.io is prohibitive if the post gets any more than a trickle of page views.
The embed-directly-into-HTML approach is the one I use for my Jekyll blog, and it does not require much effort. (I just have to set a YAML flag to load the Plotly library.)
Yes, for embedding a chart and getting some basic active functionality Plotly is very easy.
The part that requires some custom Plotly coding is when you want to script the chart: add HTML buttons and sliders to filter, recompute, and re-visualize the data.
It's beautiful that it even works, and you can programmatically update the JSON of an embedded chart. But it would be nice if you could give Plotly a dataset, plot the data, and say, give me a button to filter rows using these criteria for this column (or run some other code on the data) and re-render the chart. Right now you have to code that manually and update the JSON model.
Visualization used to mean charts that were animated and/or would dynamically update but now it means any old chart LOL.
Good lord, even after all this time, ggvis' demo is still rather a disaster on mobile. Oh well, it simply isn't possible for everything to be top priority. RStudio has done some amazing work, but I'm afraid ggvis is not yet up to the standard of their other projects.
I find it interesting that there isn't a single comment here about the package itself.
There are some interesting datasets in here[1]! Everything from "How Americans Like Their Steak"[2] to "The Most Common Unisex Names In America: Is Yours One Of Them?"[3]
As a (mostly) Python person I'd like to see these in Python!
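Until someone ports them, getting at the same data from R is only a few lines via the fivethirtyeight CRAN package. A quick sketch; I believe the steak survey is exposed as `steak_survey`, but check the dataset listing for exact names:

```r
# install.packages("fivethirtyeight")   # CRAN package bundling 538's datasets
library(fivethirtyeight)

# List every dataset the package ships, with short descriptions.
data(package = "fivethirtyeight")

# Load one and take a look (name assumed -- see the listing above).
data(steak_survey)
str(steak_survey)
head(steak_survey)
```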
I remember HuffingtonPost showing a 98% chance of Hillary winning. Seemed way too optimistic. I wonder how many voters that influenced to "safely" vote for a non-mainstream candidate...
Here is their methodology -- apparently quite flawed?
There is nothing fake about the news coverage that CNN provides. It cannot be honestly or rationally equated with the fake news phenomenon that we saw on Facebook during the election.
Also, 538 was the most accurate model during the election. They were the only ones fighting against the narrative that it was a sure win for HC.
Because they keep publishing fake news, hiding inconvenient truths, and lying. They report any and all unverified anti-Trump rumors, no matter how ridiculous, but refuse to report fairly when legitimate accusations are made against others. I used to like CNN, but over the past 2 years they have embarrassed themselves terribly, and their ratings show it.
Can you give some examples of things they have published which are fake, or lies that have been exposed? Most of the cases where people have claimed this, it's been due to a critical failure to read the reporting CNN actually did.
For instance, in the recent case where Buzzfeed released a massively unsubstantiated intelligence "dossier," CNN reported that Buzzfeed had done it and that it had been given to Trump and Obama in a briefing. Those two pieces of information are important and valid to report. CNN never said the document was accurate and didn't itself publish it specifically because it's not verifiable. How is that "fake news?"
CNN said “it’s illegal to possess these stolen documents [the Hillary emails from WikiLeaks]. It’s different for the media. So everything you learn about this, you’re learning from us.”
That's fake and a lie, and they were quickly called out on it.
Agreed. It's a lie, and a particularly boneheaded one.
Is that occurrence enough to conclude that all their reporting, on everything, is flawed? Because if so, I think I could find similar examples from effectively any mainstream source, and from any partisan one (HuffPo, Breitbart, etc.) as well.
Who exactly should we trust when our criteria exclude everyone?
That is an opinion piece at one of the Washington Post's blogs, the Volokh Conspiracy, it is not a news article. Furthermore, the blog authors are not Post employees and they are editorially independent from the rest of the newspaper.
That being said, I strongly believe that reputable news sources should not publish unmoderated opinion blogs from their own domain. No matter how many times the page says "opinion" or "blog" or "unaffiliated", if it's published on their site, on their platform, on their domain, it's going to be equated with that news outlet.
February. False story about Carson dropping out. Their own "fact checker" lied about it.
August. CNN edits footage of sister of man shot by Milwaukee police to make it sound like she was calling for peace. She was calling for riots in white cities.
I don't think you know what fake news is. It isn't biased news, incorrect news, or unsubstantiated rumors.
It is websites that steal CSS and styling from official news sites, then pose as a fake newspaper and publish 100% fiction. The Denver Guardian http://www.snopes.com/fbi-agent-murder-suicide/ is an example of "Fake News". Nothing published by CNN or FiveThirtyEight is fake news. Incorrect, sure, possibly biased, but not "Fake News".
You don't get to redefine fake news. I'm a huge fan of real journalism, and the ethics and integrity behind it. When journalists fail those ethics and report things they haven't vetted, that is not news, it's fake news.
A) 538 was one of the few sources that actually gave Donald Trump a decent chance, and IMHO 538 is one of the few sources that was discussing the limitations of traditional polling. A lot of other sources were more blindly trusting of the polls.
B) The data in the link is not just about politics. It includes such non-political topics as "Where Do People Drink The Most Beer, Wine And Spirits?" and "How Americans Like Their Steak".
"Fake news" has unfortunately become a convenient euphemism that usually means the person saying it doesn't like a particular site's political stance. Fine, but what do the above type statistical sets have to do with election politics in the first place?
C) To be honest, since the data sets actually are present, a "this is fake news" sort of post seems lazy. If you have issues with 538's reporting, dive into R and report on any issues you find. Or do you think all of these datasets are fake? That's a tall claim without proof.
Yep. Beautifully wrought confirmations of bad hypotheses. After this election I'm taking anything stats-related on 538 (most everything) with a huge grain of salt. Can't stand another "Here's why the numbers show Hillary will win in a landslide" etc.
I'm pretty sure 538 were the only ones not predicting a strong EC victory for Hillary. And they kept mentioning how terrible the polls were in Wisconsin and Michigan, and that they had to interpolate based on national polls. And Hillary did win the popular vote by a clear margin.
So this is a statistics package by the people who best interpreted the available evidence. A 66% confidence forecast misses about 1 out of every 3 times; it's not a big miss.
You do realize that of all the models available, 538's was the closest to the actual outcome and gave Trump by far the largest chance?
538 was very clear from the outset that the model's goal was predicting likelihoods, not being an oracle. It said there was a 1 in 3 chance of the election coming out like it did. That's not inaccurate - it was unlikely, but it still happened.
Plus, 538 didn't make the data. It aggregated data others made.
It's not a pejorative. It's a common phrase meaning "the quality of your output is only as good as your input data".
In this case, input data was skewed/biased/inaccurate for a number of reasons (sample choices, underreporting, etc.). Thus even the best predictions were wrong.
That's true, but the trouble is that if you look back to past elections, those polls are also wrong in very similar ways. They were just wrong in a way that overemphasized the success of the right candidate, whereas polls this year were wrong in a way that suggested success for the wrong candidate. So the errors have existed for a long time, and in fact they were lesser this year than in previous years. Polls are getting better, not worse - but the narrative has become "never trust polls again because they were wrong" when the truth is closer to "learn lessons from the polls because they got close but made serious missteps."
That's only an issue if you don't account for uncertainty in your polling data. If you ever read their articles or listened to the Model Talk podcast, you'd know that a huge amount of effort was put into trying to quantify the uncertainty in the polls, based on historical polling errors and factors unique to this year.
They gave Trump a 28.6% chance and pointed out that the model had an unusual amount of uncertainty.
Given the lead Clinton ended up with in the popular vote and the small margin Trump won by in the critical states, the prediction seems pretty reasonable to me.
Closer to 50/50. It was obvious to me a year ago he was going to win. Go watch some pre-election videos of literally any news channel talking about the "path to 270". There are some incredible delusions at play. They could not separate their own feelings and desires from the work.
This is nonsense. They ran a statistical model - their feelings have nothing to do with it. They never tampered with the model to favor a particular candidate. They barely change the model at all, just feed it with polling data.
Good job predicting the outcome of the election. But no way will you be able to do better than 538 in the long run.
> Go watch some pre-election videos of literally any news channel talking about the "path to 270". There are some incredible delusions at play. They could not separate their own feelings and desires from the work.
I agree with this entirely (assuming you are talking generally, not about 538 specifically). There were people in the GJP Superforecasting project giving Clinton a 98% chance of winning two months out, which in a two-person race is just ridiculous. (There were people giving Trump that as well, but a lot fewer of them.)
How did you come to this 50/50 prediction, though? I can't find the report now, but I believe Trump's data science team gave him a 35% chance of winning, and that was only in the last 10 days of the campaign (they gave him much less prior to that).
To go from the 28%/35% (538/Trump Team) range to 50% seems a pretty big jump.
I haven't done the maths, but to make it 50/50, doesn't that mean the polls in Michigan etc. would have had to be around 10% off (to give him sufficient margin to make it that sure)?
That seems a pretty big difference to what the error actually was.
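A rough sanity check on that maths: if you treat the final margin in a tipping-point state as the poll lead plus a normally distributed error, pure noise alone never gets you to 50/50 while the lead is your best guess; you'd have to assume a systematic bias roughly the size of the lead itself. A sketch with purely illustrative numbers (not anyone's actual model):

```r
lead <- 4   # hypothetical Clinton poll lead in a tipping-point state, in points

# Chance Trump carries that state under increasing amounts of pure noise:
# it creeps toward 50% but never reaches it as long as the lead is unbiased.
for (err_sd in c(3, 5, 10, 20)) {
  cat("error sd =", err_sd, "->  P(Trump wins state) =",
      round(pnorm(-lead / err_sd), 3), "\n")
}

# A genuine 50/50 call requires assuming the polls were *biased* against
# Trump by about the full size of the lead, not merely noisy.
pnorm(0)   # lead fully cancelled by a systematic 4-point bias -> 0.5
```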