Post

My best guess is that they initially wanted to set a *compute-optimal* 1-trillion-parameter model as the threshold but, because that's too subjective, they resorted to something more quantitative. In the Chinchilla paper, it was estimated that 10^26 FLOPs gives the lower bound for 1T-parameter…
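
A rough sanity check on the 10^26 figure, as a minimal sketch. It assumes two common rules of thumb that are not stated in the post: training compute C ≈ 6·N·D (N parameters, D training tokens) and a compute-optimal ratio of about 20 tokens per parameter. The paper's own fitted estimates differ somewhat, so treat the result as an order-of-magnitude check only. The function and constant names are hypothetical.

```python
# Back-of-the-envelope training compute for a compute-optimal model.
# Assumptions (not from the post): C ≈ 6 * N * D and D ≈ 20 * N.

FLOPS_PER_PARAM_PER_TOKEN = 6  # common approximation for forward + backward pass
TOKENS_PER_PARAM = 20          # rule-of-thumb compute-optimal ratio


def compute_optimal_flops(n_params: float,
                          tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Estimate training FLOPs for a model trained on tokens_per_param * n_params tokens."""
    n_tokens = tokens_per_param * n_params
    return FLOPS_PER_PARAM_PER_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    flops = compute_optimal_flops(1e12)  # 1 trillion parameters
    print(f"{flops:.2e} FLOPs")  # about 1.2e26, just above the 10^26 threshold
```

Under these assumptions a 1T-parameter model lands slightly above 10^26, which is consistent with reading 10^26 as a lower bound for that scale.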
