The Limits of Big Data
Arbesman's provocative core thesis is that there is a virtual physics of facts. "Facts" obey established laws and trajectories that depend on how they are defined and measured. "When we read the news each day, we may be confronted with a fact about our world that is completely different from what we thought we knew," he writes. "But it turns out that these rapid changes, though they appear to us as genuine phase transitions, are neither unexpected nor random. We can understand how they behave in the aggregate by using probability, but we can also predict these changes by searching for the slower, regular changes in our knowledge that underlie them. Fast changes in facts, like everything else we have seen, have an order of their own, one that is measurable and predictable."
What do we mean by "measurable" and "predictable"? Arbesman is quite good at describing the institutional, individual and probabilistic biases that skew how both science and scientists assess, publish and extinguish "facts."
"The clearest example of this is in the world of negative results," Arbesman writes. He cites evolutionary biologist John Maynard Smith, who noted that "statistics is the science that lets you do twenty experiments a year and publish one false result in Nature. However, if it were one experiment being replicated by twenty separate scientists, nineteen of those would be a bust, with nineteen careers unable to move forward. Annoying, certainly ... but that's how science operates. Most ideas and experiments are unsuccessful. But crucially, unsuccessful results are rarely published."
The point is not that the science of statistics or the statistics of science are pathologically flawed but that known pathologies and flaws can create incentives to rethink, revise and redesign what we measure and test. We need "facts" to help us renew our insights and understandings about "facts." Science -- and the increasingly digital technologies that both drive and support it -- offers a powerful model for enterprises struggling to make sense of and add value to their growing mountains of data.
In that respect, The Half-Life of Facts offers a pop science primer on the epidemiology of epistemology -- that is, the process by which ideas about the nature of knowledge and knowing spread throughout a discipline, a profession and a culture. Arbesman's work challenges decision-makers worldwide to rethink how they want their organizations to turn intriguing data into useful facts.
Nate Silver, a statistician who writes the FiveThirtyEight blog for The New York Times site, takes a different but compatible approach to knowledge, fact and predictability. Almost overstuffed with detailed examples and vignettes, his book delivers a sobering portfolio of warnings about predictive hubris. "This book is less about what we know," Silver writes, "than about the difference between what we know and what we think we know."
From weather to earthquakes to global warming to football to subprime mortgages to the global financial crisis, Silver explains how modelers and forecasters struggle to convert yesterday's data into tomorrow's "you can bet on it" predictions. These miniature case studies, while necessarily superficial, don't shy away from the math and consistently take a fair-minded view of the most important assumptions. A better editor might have pushed Silver to sacrifice quantity for keener insight, but the breadth of examples undeniably reveals a "pathology of prediction."