The Blind Spots of Big Data Prediction
Statistician Nate Silver isn't famous because he's a mathematical genius. (Although he is.) Silver is well known because he knows how to apply his craft to the real world. The country's most popular data cruncher is known for his spot-on election predictions -- he accurately called the winner in all 50 states of November's presidential election; in 2008, he went 49 for 50 -- but Silver's big data analytics have also translated to the worlds of sports (March Madness, Major League Baseball), gambling (Silver will play in his third World Series of Poker event this summer), and even dating. Silver once wrote for the baseball website Baseball Prospectus but has since expanded his offerings; he is now a published author, a political pundit, and the creator of his very own New York Times blog, FiveThirtyEight.

Silver was in San Francisco Thursday to talk analytics as the keynote speaker at Lithium Technologies' annual LiNC Conference. Fortune sat down with him to talk about big data's limitations, its role in the stock market, how it applies to dating, and even his predictions for the 2016 presidential election. A lightly edited transcript follows.

Fortune: I'm sure you get people coming up to you all the time to discuss how you helped them win their NCAA March Madness pool.

Nate Silver: I went against my bracket in my own pool because I thought other people would be using it. I would have gotten second place if I had taken my own advice.

Fortune: Maybe take a small royalty fee next year?

Nate Silver: Absolutely. Or we need to put out a fake bracket [first], and then put out a real one [later]. Oops, there was a coding error! [Laughs]

Fortune: You started out using stats to better understand and predict success in baseball -- why did you move towards politics?
Nate Silver: Of course it's easy to say in retrospect why you did certain things instead of what rational motivations were pushing you in that direction in real time, but I think part was that I was involved working for Baseball Prospectus for about five years -- 2003 to 2008 -- and you saw a great amount of progress in the baseball industry during that time. The start of that era was the era described in [the book-turned-movie] Moneyball, where you really had a lot of tension between stat-heads and traditionalists. People were terrified that nerds would come over and take their jobs. And really now that's been totally reversed, where it's not just that you have some stat-head that you've hired and have locked into a closet somewhere, but that every team -- almost every team, there are some exceptions -- understands analytics at different levels of the organization.

But seeing how quickly that progressed in a span of just a few years, and how behind politics coverage seemed to be, where it's all about the narrative -- there's a lot of bullshit basically both in the news coverage of politics and from politicians themselves -- so it seemed like it was ripe to apply some very basic analytics tools to the coverage of elections.