
Colossal-AI
Learn about the distributed techniques of Colossal-AI to maximize the runtime performance of your large neural networks.
Colossal-AI Overview
Colossal-AI is an integrated system that provides a comprehensive set of training methods to the user. You can find common training methods such as mixed precision training and gradient accumulation. In addition, we provide a range of parallelism techniques, including data parallelism, tensor parallelism, and pipeline parallelism.
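The idea behind gradient accumulation can be sketched without any framework: gradients from several micro-batches are summed, and the optimizer steps only once on the averaged result. This is a minimal illustration of the technique on a one-parameter model, not Colossal-AI's actual API.

```python
# Minimal sketch of gradient accumulation on a 1-D linear model y = w * x.
# Illustrative only; not Colossal-AI's API.

def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

def train(micro_batches, w=0.0, lr=0.01, accum_steps=4):
    """Sum gradients over `accum_steps` micro-batches, then update once."""
    acc = 0.0
    for step, (x, y) in enumerate(micro_batches, start=1):
        acc += grad(w, x, y)               # accumulate instead of updating
        if step % accum_steps == 0:
            w -= lr * (acc / accum_steps)  # one optimizer step on the average
            acc = 0.0
    return w

# Four micro-batches drawn from the relation y = 2x; one accumulated update.
w = train([(1.0, 2.0), (2.0, 4.0), (1.0, 2.0), (3.0, 6.0)])
```

This lets a single GPU mimic the effective batch size of a larger cluster at the cost of extra steps per update.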
Quick Demo - Colossal-AI
Colossal-AI is an integrated large-scale deep learning system with efficient parallelization techniques. The system can accelerate model training on distributed systems with multiple GPUs by applying parallelization techniques, and it can also run on systems with only a single GPU. The quick demos below show how to use Colossal-AI.
Introduction - Colossal-AI
The Colossal-AI system uses a device-mesh, similar to PyTorch's latest DTensor release, to manage its cluster. Colossal-AI uses a sharding-spec to annotate the storage status of each tensor and facilitate their distribution across the cluster.
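The effect of a sharding spec over a device mesh can be sketched as a small helper that computes each device's local shard shape. The function name, the mesh layout, and the spec format below are all hypothetical, chosen only to illustrate the idea of annotating which tensor dimension is sharded along which mesh dimension.

```python
# Toy sketch of a device mesh plus sharding spec, in the spirit of
# Colossal-AI's design. Names and layout here are hypothetical.

def local_shard_shape(global_shape, mesh_shape, spec):
    """Compute the per-device shard shape of a tensor.

    `spec` maps a tensor dimension to the mesh dimension along which it
    is sharded; tensor dimensions not listed are replicated on every device.
    """
    shape = list(global_shape)
    for tensor_dim, mesh_dim in spec.items():
        assert shape[tensor_dim] % mesh_shape[mesh_dim] == 0, "uneven shard"
        shape[tensor_dim] //= mesh_shape[mesh_dim]
    return tuple(shape)

# A 2x4 device mesh (8 GPUs): shard rows over mesh dim 0, columns over dim 1,
# so each GPU stores a 512x256 slice of a 1024x1024 tensor.
shard = local_shard_shape((1024, 1024), (2, 4), {0: 0, 1: 1})
```

With an empty spec the tensor is fully replicated, and every device holds the global shape unchanged.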
Setup - Colossal-AI
The installed version of Colossal-AI will track the main branch of the repository. Feel free to raise an issue if you encounter any problems.
git clone https://github.com/hpcaitech/ColossalAI.git
Reading Roadmap - Colossal-AI
These tutorials cover the basic usage of Colossal-AI, realizing simple functions such as data parallelism and mixed precision training. Finally, if you wish to apply more advanced techniques, such as running hybrid parallelism on a model like GPT-3, the advanced tutorials section is the place to go!
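The core of data parallelism mentioned above is that each worker computes gradients on its own shard of a batch, and an all-reduce averages them; for equal-size shards, the averaged gradient equals the gradient of the full batch. The sketch below simulates this with plain Python "workers"; `all_reduce_mean` is a stand-in for a real collective, not a library call.

```python
# Sketch of the all-reduce averaging at the heart of data parallelism.
# Illustrative only: workers are simulated in-process.

def grad(w, batch):
    """Mean gradient of (w*x - y)**2 over a list of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(values):
    """Stand-in for a collective all-reduce that averages across workers."""
    return sum(values) / len(values)

w = 0.5
full_batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Split the batch across two "workers" and average their local gradients.
shards = [full_batch[:2], full_batch[2:]]
avg = all_reduce_mean([grad(w, s) for s in shards])
```

Because each worker only ever sees its shard, activation memory per device shrinks while the model update stays mathematically equivalent to single-device training on the full batch.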
Distributed Training - Colossal-AI
Distributed training is a common practice when researchers and engineers develop AI models. There are several reasons behind this trend; one is that model sizes are increasing rapidly.