Wednesday, March 26, 2025

Humanity’s Last Exam

 The questions on Humanity’s Last Exam went through a two-step filtering process. First, submitted questions were given to leading A.I. models to solve.

If the models couldn’t answer them (or if, in the case of multiple-choice questions, the models did worse than by random guessing), the questions were given to a set of human reviewers, who refined them and verified the correct answers. Experts who wrote top-rated questions were paid between $500 and $5,000 per question, as well as receiving credit for contributing to the exam.


Mr. Hendrycks, who helped create a widely used A.I. test known as Massive Multitask Language Understanding, or M.M.L.U., said he was inspired to create harder A.I. tests by a conversation with Elon Musk. (Mr. Hendrycks is also a safety advisor to Mr. Musk’s A.I. company, xAI.) Mr. Musk, he said, raised concerns about the existing tests given to A.I. models, which he thought were too easy.

Once the list of questions had been compiled, the researchers gave Humanity’s Last Exam to six leading A.I. models, including Google’s Gemini 1.5 Pro and Anthropic’s Claude 3.5 Sonnet. All of them failed miserably. OpenAI’s o1 system scored the highest of the bunch, with a score of 8.3 percent.



Mr. Zhou, the theoretical particle physics researcher who submitted questions to Humanity’s Last Exam, told me that while A.I. models were often impressive at answering complex questions, he didn’t consider them a threat to him and his colleagues, because their jobs involve much more than spitting out correct answers.

“There’s a big gulf between what it means to take an exam and what it means to be a practicing physicist and researcher,” he said. “Even an A.I. that can answer these questions might not be ready to help in research, which is inherently less structured.”


spit it out
phrase of spit
  1. informal
    used to urge someone to say or confess something quickly.
    "spit it out, man, I haven't got all day"

Thursday, March 20, 2025

Daniel Kahneman and others

  Thinking, Fast and Slow. by Daniel Kahneman


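Kahneman was one of the world's most influential thinkers, a psychologist at Princeton University and a winner of the Nobel Prize in economics.
His book Thinking, Fast and Slow, first published in 2011, drew a strong international response. Throughout his long career, he devoted himself to studying the imperfections and inconsistencies of human decision-making. By most accounts, he was still in fairly good physical and mental health when he chose to end his life.
In mid-March 2024, Daniel Kahneman and his partner flew from New York to Paris to reunite with his daughter's family. They spent several days walking around the city, visiting museums, watching the ballet, and sampling soufflés and chocolate mousse.
Around March 22, Kahneman, who had turned 90 that month, began sending a personal email to the several dozen people closest to him. It read: This is a farewell letter I am writing to my friends, to tell you that I am on my way to Switzerland, where my life will end on March 27.
On March 26, Kahneman left his family and flew to Switzerland.
Unsurprisingly, news of Kahneman's death prompted widespread mourning as soon as it was announced. But only close family and friends knew that he had died at an assisted-suicide facility in Switzerland. To this day, some people still struggle to understand his decision...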
[Image: "International debate: how much control should we have over death? Ending life voluntarily at 90: the final choice of Kahneman, author of Thinking, Fast and Slow." Storm Media × WSJ. Source: The Wall Street Journal.]



 
Mathematician Adam Kucharski on decision-makers' trust in logic and why algorithms are not necessarily the answer.
 
Luce: From Chuck Schumer to Gavin Newsom, fear and muddled thinking are stopping Trump's opponents from acting to defend a democratic system in danger.