
Interview with a Top Google Scientist: How Far Is Artificial Intelligence from Going Mainstream?

Fortune China (財富中文網), December 25, 2016

One of Google’s top researchers talks to Fortune about the development of artificial intelligence.

By Jonathan Vanian

Translator: 樸成奎

The next time you enter a query into Google’s search engine or consult the company’s map service for directions to a movie theater, remember that a big brain is working behind the scenes to provide relevant search results and make sure you don’t get lost while driving.

Well, not a real brain per se, but the Google Brain research team. As Fortune’s Roger Parloff wrote, the Google Brain research team has created over 1,000 so-called deep learning projects that have supercharged many of Google’s products over the past few years like YouTube, translation, and photos. With deep learning, researchers can feed huge amounts of data into software systems called neural nets that learn to recognize patterns within the vast information faster than humans.

In an interview with Fortune, one of Google Brain’s co-founders and leaders, Jeff Dean, talks about cutting-edge A.I. research, the challenges involved, and using A.I. in its products. The following has been edited for length and clarity.

What are some challenges researchers face in pushing the field of artificial intelligence forward?

A lot of human learning comes from unsupervised learning where you’re just sort of observing the world around you and understanding how things behave. That’s a very active area of machine-learning research, but it’s not a solved problem to the extent that supervised learning is.

So unsupervised learning refers to how one learns from observation and perception, and if computers could observe and perceive on their own that could help solve more complex problems?

Right, human vision is trained mostly by unsupervised learning. You’re a small child and you observe the world, but occasionally you get a supervised signal where someone would say, “That’s a giraffe” or “That’s a car.” And your natural mental model of the world adjusts in response to that small amount of supervised data you got.

We need to use more of a combination of supervised and unsupervised learning. We’re not really there yet, in terms of how most of our machine learning systems work.

Can you explain the A.I. technique called reinforcement learning?

The idea behind reinforcement learning is you don’t necessarily know the actions you might take, so you explore the sequence of actions you should take by taking one that you think is a good idea and then observing how the world reacts. Like in a board game where you can react to how your opponent plays. Eventually after a whole sequence of these actions you get some sort of reward signal.

Reinforcement learning is the idea of being able to assign credit or blame to all the actions you took along the way while you were getting that reward signal. It’s really effective in some domains today.

I think where reinforcement learning has some challenges is when the space of actions you may take is incredibly broad and large. A human operating in the real world might take an incredibly broad set of actions at any given moment. Whereas in a board game there’s a limited set of moves you can take, and the rules of the game constrain things a bit and the reward signal is also much clearer. You either won or lost.

If my goal was to make a cup of coffee or something, there’s a whole bunch of actions I might want to take, and the reward signal is a little less clear.
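The credit-assignment idea described above can be illustrated with discounted returns, one common way of spreading a final reward back over earlier actions. This is a minimal sketch, not a description of any system Google uses; the function name `discounted_returns`, the discount factor `gamma`, and the four-move episode are all assumptions made for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Give each step credit for its own reward plus discounted later rewards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step accumulates the discounted future reward.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical four-move game: the only reward signal is a win (+1) at the end.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
credit = discounted_returns(episode_rewards, gamma=0.5)
print(credit)  # [0.125, 0.25, 0.5, 1.0]
```

Moves closer to the win receive more credit here, which is a design choice of the discounting scheme rather than a claim about which move actually mattered.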

But you can still break the steps down, right? For instance, while making a cup of coffee, you could learn that you didn’t fully grind the beans before they were brewed, and that it resulted in bad coffee.

Right. I think one of the things about reinforcement learning is that it tends to require exploration. So using it in the context of physical systems is somewhat hard. We are starting to try to use it in robotics. When a robot has to actually take some action, it’s limited in the number of actions it can take in a given day. Whereas in computer simulations, it’s much easier to use a lot of computers and get a million examples.

Is Google incorporating reinforcement learning in the core search product?

The main place we’ve applied reinforcement learning in our core products is through collaboration between DeepMind [the AI startup Google bought in 2014] and our data center operations folks. They used reinforcement learning to set the air conditioning knobs within the data center and to achieve the same, safe cooling operations and operating conditions with much lower power usage. They were able to explore which knob settings make sense and how the system reacts when you turn something this way or that way.

Through reinforcement learning they were able to discover knob settings for these 18 or however many knobs that weren’t considered by the people doing that task. People who knew about the system were like, “Oh, that’s a weird setting,” but then it turned out that it worked pretty well.

What makes a task more appropriate for incorporating reinforcement learning?

The data center scenario works well because there are not that many different actions you can take at a time. There’s like 18 knobs, you turn a knob up or down, and you’re there. The outcome is pretty measurable. You have a reward for better power usage assuming you’re operating within the appropriate margins of acceptable temperatures. From that perspective, it’s almost an ideal reinforcement learning problem.
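To make the shape of such a problem concrete, here is a toy sketch with a small set of discrete knob settings, a measurable reward (lower power), and a temperature margin. It uses an epsilon-greedy bandit, a deliberately simple stand-in and not the method DeepMind used. The two knobs, the power and temperature formulas, and the 24-degree limit are all invented for illustration.

```python
import itertools
import random
from typing import Dict, Tuple

Setting = Tuple[int, int]  # (fan_level, chiller_level): hypothetical knobs


def simulate(setting: Setting, rng: random.Random) -> float:
    """Toy stand-in for a data center: noisy reward for one trial."""
    fan, chiller = setting
    power = 5 + 2 * fan + 3 * chiller          # made-up power model
    temperature = 30 - 2 * fan - 4 * chiller   # made-up thermal model
    noise = rng.uniform(-0.1, 0.1)             # measurement noise
    if temperature > 24:                       # outside the safe margin
        return -100.0 + noise
    return -power + noise                      # lower power, higher reward


def tune_knobs(steps: int = 500, epsilon: float = 0.1, seed: int = 0) -> Setting:
    """Epsilon-greedy search for the setting with the best average reward."""
    rng = random.Random(seed)
    arms = list(itertools.product(range(3), repeat=2))
    if steps < len(arms):
        raise ValueError("need at least one trial per setting")
    totals: Dict[Setting, float] = {arm: 0.0 for arm in arms}
    counts: Dict[Setting, int] = {arm: 0 for arm in arms}

    def mean(arm: Setting) -> float:
        return totals[arm] / counts[arm]

    for step in range(steps):
        if step < len(arms):
            arm = arms[step]                   # try every setting once
        elif rng.random() < epsilon:
            arm = rng.choice(arms)             # explore
        else:
            arm = max(arms, key=mean)          # exploit the best so far
        totals[arm] += simulate(arm, rng)
        counts[arm] += 1
    return max(arms, key=mean)


best = tune_knobs()
print(best)  # (1, 1): the cheapest setting that stays within the margin
```

A real facility could not freely try unsafe settings the way this simulation does; the sketch only shows why a small action set and a crisp reward make the search tractable.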

An example of a messier reinforcement learning problem is perhaps trying to use it to decide which search results to show. There’s a much broader set of search results I can show in response to different queries, and the reward signal is a little noisy. Like if a user looks at a search result and likes it or doesn’t like it, that’s not that obvious.

How would you even measure if they didn’t like a certain result?

Right. It’s a bit tricky. I think that’s an example of where reinforcement learning is maybe not quite mature enough to really operate in these incredibly unconstrained environments where the reward signals are less crisp.

What are some of the biggest challenges in applying what you’ve learned doing research to actual products people use each day?

One of the things is that a lot of machine learning solutions and research into those solutions can be reused in different domains. For example, we collaborated with our Map team on some research. They wanted to be able to read all the business names and signs that appeared in street images to understand the world better, and know if something’s a pizzeria or whatever.

It turns out that to actually find text in these images, you can train a machine learning model where you give it some example data where people have drawn circles or boxes around the text. You can actually use that to train a model to detect which pixels in the image contain text.
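One way to picture the training data described above is to turn each human-drawn box into a per-pixel label mask, which a model could then be trained to predict. The sketch below covers only that label-preparation step, under assumptions: the box format `(top, left, bottom, right)` with exclusive bottom and right edges and the function name `boxes_to_mask` are hypothetical, and no model is trained.

```python
from typing import List, Tuple

# Hypothetical box format: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def boxes_to_mask(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Mark every pixel inside an annotated box as 1 (text), the rest as 0."""
    mask = [[0] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        # Clip each box to the image so stray annotations cannot overflow.
        for row in range(max(top, 0), min(bottom, height)):
            for col in range(max(left, 0), min(right, width)):
                mask[row][col] = 1
    return mask


# A 4x6 image with one annotated sign covering rows 1-2 and columns 1-3.
mask = boxes_to_mask(4, 6, [(1, 1, 3, 4)])
for row in mask:
    print(row)
```

Each row of the mask lines up with a row of image pixels, so the mask can serve as the target when training a model to decide which pixels contain text.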

That turns out to be a generally useful capability, and a different part of the Map team is able to reuse that for a satellite-imagery analysis task where they wanted to find rooftops in the U.S. or around the world to estimate the location of solar panel installations on rooftops.

And then we’ve found that the same kind of model can help us on preliminary work on medical imaging problems. Now you have medical images and you’re trying to find interesting parts of those images that are clinically relevant.
