Wednesday, February 12, 2025


How Did DeepSeek Build Its A.I. With Less Money?

The Chinese start-up used several technological tricks, including a method called “mixture of experts,” to significantly reduce the cost of building the technology.


Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.

The experts still needed to trade some information with one another, and the generalist — which had a decent but not detailed understanding of each subject — could help coordinate interactions between the experts.

It is a bit like an editor overseeing a newsroom filled with specialist reporters.
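The pairing of specialists with a generalist can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DeepSeek's actual code: the names `moe_layer`, `make_expert`, `router_weights` and `shared_expert` are hypothetical, each "expert" is just a function that scales its input, and the "generalist" is modeled as one extra expert that runs on every input while a router picks only the top few specialists.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: multiplies every element of the input by `scale`."""
    return lambda x: [scale * v for v in x]


def moe_layer(x, router_weights, experts, shared_expert, top_k=2):
    """Combine an always-on generalist with the top_k best-matching specialists."""
    # One router score per expert: dot product of the input with that expert's row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Only the top_k highest-scoring specialists do any work for this input.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    # The generalist (assumed here to be a "shared" expert) always contributes.
    out = shared_expert(x)
    for gate, i in zip(gates, chosen):
        expert_out = experts[i](x)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, chosen


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
shared_expert = make_expert(0.5)

output, chosen = moe_layer([1.0, 0.0], router_weights, experts, shared_expert)
print("specialists consulted:", chosen)
print("layer output:", output)
```

The saving comes from the `top_k` step: four specialists exist, but only two run for any given input, so the cost of each calculation grows with the number consulted rather than the number available.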

That is much more efficient. But it is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers his or her elementary school math class can understand.


Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory — half the space. In essence, it lopped several decimals from each number.

This meant that each calculation was less accurate. But that didn’t matter. The calculations were accurate enough to produce a really powerful neural network.
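To see what "lopping decimals" looks like in practice, here is a small Python sketch. It assumes, for illustration only, that the 16-bit format keeps 10 bits of mantissa (as standard half precision does) and that the 8-bit format keeps 3 (as the common "E4M3" layout does); the article does not name the formats. The helper `round_to_mantissa_bits` is hypothetical, and it ignores the limited exponent range, saturation and subnormal numbers of real 8-bit formats.

```python
import math


def round_to_mantissa_bits(x, mantissa_bits):
    """Round x to the nearest value with `mantissa_bits` stored mantissa bits.

    Simplified model: exponent range, overflow and subnormals are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e, with 0.5 <= |m| < 1
    # Stored bits plus the implicit leading bit give the significant digits kept.
    scale = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(m * scale) / scale, e)


value = 3.14159
as_16_bit = round_to_mantissa_bits(value, 10)  # assumed 16-bit precision
as_8_bit = round_to_mantissa_bits(value, 3)    # assumed 8-bit precision

print("original:", value)
print("16-bit style:", as_16_bit)  # 3.140625
print(" 8-bit style:", as_8_bit)   # 3.25
```

The 8-bit version is visibly cruder, but it is still in the right neighborhood, which is the point the article makes about the calculations being accurate enough.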


After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem — making a key calculation that would help decide how the neural network would operate — it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
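Why the wider answer matters can be shown with a running total. The sketch below is an illustration under assumptions, not DeepSeek's implementation: it multiplies 1,000 pairs of crude 8-bit-style numbers and adds up the products twice, once keeping the running total at the same crude precision and once keeping it in 32 bits. The helpers `round_to_mantissa_bits`, `to_float32` and `dot` are hypothetical names, and the 3-bit mantissa stands in for an 8-bit format.

```python
import math
import struct


def round_to_mantissa_bits(x, mantissa_bits):
    """Round x to `mantissa_bits` stored mantissa bits (exponent limits ignored)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scale = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(m * scale) / scale, e)


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Multiply pairs and add them up, rounding the running total each step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc


# 1,000 copies of 0.1, squeezed to the assumed 8-bit precision.
small = round_to_mantissa_bits(0.1, 3)
a = [small] * 1000
b = [small] * 1000

narrow = dot(a, b, lambda v: round_to_mantissa_bits(v, 3))  # crude running total
wide = dot(a, b, to_float32)                                # 32-bit running total
exact = 1000 * small * small

print("crude accumulator: ", narrow)
print("32-bit accumulator:", wide)
print("exact sum:         ", exact)
```

With the crude accumulator, the total stops growing once it is large enough that each new product is smaller than the rounding step, so most of the work is silently thrown away. The 32-bit accumulator keeps every contribution, even though the inputs themselves are still stored in the cheaper format.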






To really stand out in a crowded marketplace, an AI lab needs not just to build a high-quality model, but to build it cheaply. We explain the tricks labs are using to make models more efficient: https://econ.st/3EKesJo
Photograph: Alberto Miranda
[Image: graphic with the text “For a fistful of dollars. A $6m large language model isn't cool. But a $6 one is.”]


