ポスト

They are releasing a combined text-audio-vision model that processes all three modalities in one single neural network, which can then do real-time voice translation as a special case afterthought, if you ask it to. (fixed it for you)

メニューを開く

Andrej Karpathy@karpathy

みんなのコメント

人気ポスト

もっと見る
Yahoo!リアルタイム検索アプリ