The H800 has lower NVLink bandwidth than the H100, which limits multi-GPU communication performance. DeepSeek-V3 required a total of 2.79 million GPU-hours for pretraining ...