  1. djsamseng/CudaAwareMPINumba - GitHub

    How to install and run Cuda aware MPI with Numba and send device (GPU) memory via MPI

  2. NVIDIA/multi-gpu-programming-models - GitHub

    MPI: The mpi and mpi_overlap variants require a CUDA-aware implementation. For NVSHMEM, NCCL and multi_node_p2p, a non-CUDA-aware MPI is sufficient. The examples have been developed and tested with OpenMPI. NVSHMEM (version 0.4.1 or later): Required by the NVSHMEM variant.

  3. Apache TVM

    Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.

  4. apache-tvm - PyPI

    Jun 21, 2023 · Apache TVM is a compiler stack for deep learning systems. It is designed to close the gap between the productivity-focused deep learning frameworks, and the performance- and efficiency-focused hardware backends.

  5. C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))

    Matmul: Operator Specification

    for yo in range(128):
      for xo in range(128):
        C[yo*8:yo*8+8][xo*8:xo*8+8] = 0
        for ko in range(128):
          for yi in range(8):
            for xi in range(8):
              for ki in range(8):
                C[yo*8+yi][xo*8+xi] += A[ko*8+ki][yo*8+yi] * B[ko*8+ki][xo*8+xi]

    Loop Tiling for ...

  6. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding.

  7. Apache TVM Documentation - tvm 0.21.dev0 documentation

    Welcome to the documentation for Apache TVM, a deep learning compiler that enables access to high-performance machine learning anywhere for everyone. TVM’s diverse community of hardware vendors, compiler engineers and ML researchers work together to build a unified, programmable software stack, that enriches the entire ML technology ecosystem ...

  8. TI TVM User’s Guide, Release TIDL_PSDK_8.6.0. Texas Instruments’ fork of the Apache Tensor Virtual Machine (TVM) enables support for the TDA4 family of processors. These processors use C7x DSPs and Matrix Multiplication Accelerators (MMA) to accelerate inference by machine learning models. For additional information ...

  9. tvm.runtime.disco — tvm 0.21.dev0 documentation

    class tvm.runtime.disco.ProcessSession(num_workers: int, num_groups: int = 1, entrypoint: str = 'tvm.exec.disco_worker') A Disco session backed by pipe-based multi-processing.

  10. cld / ml / tvm - GitLab

    Mirror of https://github.com/dmlc/tvm for internal development. Check other branches for active development. Don't forget to run git submodule init and git submodule update!

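
The loop-tiling snippet in result 5 can be checked in plain Python. The sketch below is not TVM code and makes no claim about how TVM generates its loops; it only assumes the operator is C[y][x] = sum over k of A[k][y] * B[k][x], as the tvm.compute line states, and confirms that splitting each loop into an outer and inner part leaves the result unchanged. The function names, the matrix size n and the tile size are hypothetical, chosen small so the check runs quickly (the snippet uses 128 tiles of 8).

```python
import random


def matmul_reference(A, B, n):
    """Untiled operator: C[y][x] = sum over k of A[k][y] * B[k][x]."""
    return [[sum(A[k][y] * B[k][x] for k in range(n)) for x in range(n)]
            for y in range(n)]


def matmul_tiled(A, B, n, tile):
    """Same operator with the y, x and k loops each split into outer/inner parts."""
    assert n % tile == 0, "n must be a multiple of the tile size"
    outer = n // tile
    C = [[0] * n for _ in range(n)]  # zero-initialise every output element
    for yo in range(outer):
        for xo in range(outer):
            for ko in range(outer):
                for yi in range(tile):
                    for xi in range(tile):
                        for ki in range(tile):
                            y = yo * tile + yi
                            x = xo * tile + xi
                            k = ko * tile + ki
                            C[y][x] += A[k][y] * B[k][x]
    return C


# Integer inputs keep the comparison exact (no floating-point rounding).
random.seed(0)
n, tile = 8, 4
A = [[random.randint(-3, 3) for _ in range(n)] for _ in range(n)]
B = [[random.randint(-3, 3) for _ in range(n)] for _ in range(n)]
tiled_matches = matmul_tiled(A, B, n, tile) == matmul_reference(A, B, n)
print(tiled_matches)
```

Tiling only reorders the additions into each C[y][x], so with integer inputs the two results are identical; with floating-point inputs they could differ by rounding.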