Wednesday, November 28, 2012

Scientists See Promise in Deep-Learning Programs



Hao Zhang/The New York Times
A voice recognition program translated a speech given by Richard F. Rashid, Microsoft’s top scientist, into Mandarin Chinese.

Using an artificial intelligence technique inspired by theories about how the brain recognizes patterns, technology companies are reporting startling gains in fields as diverse as computer vision, speech recognition and the identification of promising new molecules for designing drugs.
The advances have led to widespread enthusiasm among researchers who design software to perform human activities like seeing, listening and thinking. They offer the promise of machines that converse with humans and perform tasks like driving cars and working in factories, raising the specter of automated robots that could replace human workers.
The technology, called deep learning, has already been put to use in services like Apple’s Siri virtual personal assistant, which is based on Nuance Communications’ speech recognition service, and in Google’s Street View, which uses machine vision to identify specific addresses.
But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just “neural nets” for their resemblance to the neural connections in the brain.
“There has been a number of stunning new results with deep-learning methods,” said Yann LeCun, a computer scientist at New York University who did pioneering research in handwriting recognition at Bell Laboratories. “The kind of jump we are seeing in the accuracy of these systems is very rare indeed.”
Deep learning was given a particularly audacious display at a conference last month in Tianjin, China, when Richard F. Rashid, Microsoft’s top scientist, gave a lecture in a cavernous auditorium while a computer program recognized his words and simultaneously displayed them in English on a large screen above his head.
Then, in a demonstration that led to stunned applause, he paused after each sentence and the words were translated into Mandarin Chinese characters, accompanied by a simulation of his own voice in that language, which Dr. Rashid has never spoken.
The feat was made possible, in part, by deep-learning techniques that have spurred improvements in the accuracy of speech recognition.
Dr. Rashid, who oversees Microsoft’s worldwide research organization, acknowledged that while his company’s new speech recognition software made 30 percent fewer errors than previous models, it was “still far from perfect.”
“Rather than having one word in four or five incorrect, now the error rate is one word in seven or eight,” he wrote on Microsoft’s Web site. Still, he added that this was “the most dramatic change in accuracy” since 1979, “and as we add more data to the training we believe that we will get even better results.”
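To see how those two framings of the error rate line up, the short sketch below converts "one word in four or five incorrect" into what remains after a 30 percent relative reduction. This is back-of-the-envelope arithmetic for illustration only, not Microsoft's published evaluation, and the exact figures depend on the test material.

```python
# Back-of-the-envelope arithmetic relating the two framings of the error rate.
# Illustrative only; the published figures depend on the exact test material.
reduction = 0.30                              # "30 percent fewer errors"
for words_per_error in (4, 5):                # "one word in four or five incorrect"
    old_rate = 1 / words_per_error
    new_rate = old_rate * (1 - reduction)
    print(f"1 in {words_per_error}: {old_rate:.1%} -> {new_rate:.1%}, "
          f"about 1 word in {1 / new_rate:.1f}")
```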
Artificial intelligence researchers are acutely aware of the dangers of being overly optimistic. Their field has long been plagued by outbursts of misplaced enthusiasm followed by equally striking declines.
In the 1960s, some computer scientists believed that a workable artificial intelligence system was just 10 years away. In the 1980s, a wave of commercial start-ups collapsed, leading to what some people called the “A.I. winter.”
But recent achievements have impressed a wide spectrum of computer experts. In October, for example, a team of graduate students studying with the University of Toronto computer scientist Geoffrey E. Hinton won the top prize in a contest sponsored by Merck to design software to help find molecules that might lead to new drugs.
From a data set describing the chemical structure of 15 different molecules, they used deep-learning software to determine which molecule was most likely to be an effective drug agent.
The achievement was particularly impressive because the team decided to enter the contest at the last minute and designed its software with no specific knowledge about how the molecules bind to their targets. The students were also working with a relatively small set of data; neural nets typically perform well only with very large ones.
“This is a really breathtaking result because it is the first time that deep learning won, and more significantly it won on a data set that it wouldn’t have been expected to win at,” said Anthony Goldbloom, chief executive and founder of Kaggle, a company that organizes data science competitions, including the Merck contest.
Advances in pattern recognition hold implications not just for drug development but for an array of applications, including marketing and law enforcement. With greater accuracy, for example, marketers can comb large databases of consumer behavior to get more precise information on buying habits. And improvements in facial recognition are likely to make surveillance technology cheaper and more commonplace.
Artificial neural networks, an idea going back to the 1950s, seek to mimic the way the brain absorbs information and learns from it. In recent decades, Dr. Hinton, 64 (a great-great-grandson of the 19th-century mathematician George Boole, whose work in logic is the foundation for modern digital computers), has pioneered powerful new techniques for helping the artificial networks recognize patterns.
Modern artificial neural networks are composed of an array of software components, divided into inputs, hidden layers and outputs. The arrays can be “trained” by repeated exposures to recognize patterns like images or sounds.
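As a concrete illustration of that structure, the sketch below builds such a network in Python with NumPy: an input layer, a single hidden layer and an output, trained by repeated exposure to a toy pattern (the XOR of two bits). It is a minimal teaching example under those assumptions, not the software used by any of the groups described in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pattern to learn: the XOR of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Input layer (2 values) -> hidden layer (8 units) -> output (1 value).
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
for step in range(10000):                 # "repeated exposures" to the pattern
    # Forward pass: input -> hidden layer -> output.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass (backpropagation): push the output error back through the layers.
    delta_out = (output - y) * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    # Nudge every weight a little in the direction that reduces the error.
    W2 -= learning_rate * hidden.T @ delta_out
    b2 -= learning_rate * delta_out.sum(axis=0)
    W1 -= learning_rate * X.T @ delta_hidden
    b1 -= learning_rate * delta_hidden.sum(axis=0)

# After training, the outputs should be close to the target pattern 0, 1, 1, 0.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2).ravel())
```

Running it prints outputs close to the pattern it was exposed to; the deep-learning systems described in this article stack many such hidden layers and train on millions of examples rather than four.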
These techniques, aided by the growing speed and power of modern computers, have led to rapid improvements in speech recognition, drug discovery and computer vision.
Deep-learning systems have recently outperformed humans in certain limited recognition tests.
Last year, for example, a program created by scientists at the Swiss A. I. Lab at the University of Lugano won a pattern recognition contest by outperforming both competing software systems and a human expert in identifying images in a database of German traffic signs.
The winning program accurately identified 99.46 percent of the images in a set of 50,000; the top score in a group of 32 human participants was 99.22 percent, and the average for the humans was 98.84 percent.
This summer, Jeff Dean, a Google technical fellow, and Andrew Y. Ng, a Stanford computer scientist, programmed a cluster of 16,000 computers to train itself to automatically recognize images in a library of 14 million pictures of 20,000 different objects. Although the accuracy rate was low — 15.8 percent — the system did 70 percent better than the most advanced previous one.
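As a rough reading of that comparison (illustrative arithmetic, not the researchers' own reporting), "did 70 percent better" taken as a 1.7-times relative gain implies the previous best system identified roughly 9 percent of the images correctly:

```python
# Illustrative reading of "did 70 percent better" as a 1.7x relative gain in accuracy.
new_accuracy = 0.158                      # the reported 15.8 percent
previous_accuracy = new_accuracy / 1.70
print(f"implied accuracy of the previous system: about {previous_accuracy:.1%}")
```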
One of the most striking aspects of the research led by Dr. Hinton is that it has taken place largely without the patent restrictions and bitter infighting over intellectual property that characterize high-technology fields.
“We decided early on not to make money out of this, but just to sort of spread it to infect everybody,” he said. “These companies are terribly pleased with this.”
Referring to the rapid deep-learning advances made possible by greater computing power, and especially the rise of graphics processors, he added:
“The point about this approach is that it scales beautifully. Basically you just need to keep making it bigger and faster, and it will get better. There’s no looking back now.”


