AI TECH INSIGHT
Machine learning frameworks such as PyTorch and TensorFlow are the de facto tools that AI developers use to train models and develop AI applications because of the powerful capabilities they provide. In this article, we introduce the Voyager SDK, which developers can use to deploy such applications to the Metis AI PU quickly, effortlessly and with high performance.
What is different at the Edge?
Machine learning frameworks are designed around 32-bit floating-point data, which provides the precision needed to train models with standard backpropagation techniques. Models are typically trained in the data center on powerful but expensive, energy-hungry GPUs, and in the past the same hardware was often used for inference as well. However, this class of hardware is no longer needed to achieve high inference accuracy; today's challenge is how to efficiently deploy these models to lower-cost, power-constrained devices operating at the network edge.
A complete AI application involves a pipeline of multiple tasks. For example, a computer vision application typically combines a deep learning model that operates on tensor data with various pre- and post-processing tasks that operate on non-tensor data such as pixels, labels and key points. These tasks, also referred to as non-neural tasks, either prepare data for input to the deep learning model, for example by scaling an image to the model's input resolution and encoding it in the required tensor format, or interpret the predicted output tensors, for example by generating an array of bounding boxes.
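As a concrete illustration of a non-neural preprocessing task, the sketch below resizes a camera frame and encodes it as a normalized NCHW tensor. It is a minimal NumPy sketch, not SDK code; nearest-neighbour sampling is used here only to keep the example dependency-free, and the 224x224 input resolution is an assumption.

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an HxWx3 uint8 image with nearest-neighbour sampling,
    then encode it as a normalized NCHW float32 tensor."""
    h, w, _ = image.shape
    ys = np.arange(size) * h // size            # source row indices
    xs = np.arange(size) * w // size            # source column indices
    resized = image[ys][:, xs]                  # (size, size, 3)
    tensor = resized.astype(np.float32) / 255.0 # scale to [0, 1]
    return tensor.transpose(2, 0, 1)[np.newaxis]  # (1, 3, size, size)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(preprocess(frame).shape)  # (1, 3, 224, 224)
```

A production pipeline would perform the same steps with an interpolating resizer, often offloaded to a media accelerator rather than the CPU.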
For ease of development, most models are implemented and trained in high-level languages such as Python. Most inference devices, however, rely on low-level embedded programming to achieve the requisite performance. The core deep learning model is usually defined within the tight constraints of the ML framework, which enables quantization tools to optimize and compile it to run as native assembly on the target AI accelerator. The non-neural tasks are often more general-purpose in their design, and their optimal placement may vary from one platform to the next: in the example above, preprocessing elements might be offloaded to an embedded media accelerator and visualization elements reimplemented as OpenGL kernels on an embedded GPU. Combining these heterogeneous components efficiently requires a low-level language such as C++, together with libraries that enable efficient buffer sharing and synchronization between devices. Many application developers are not familiar with this kind of low-level system design, so providing them with easy-to-use pipeline deployment tools is a prerequisite for the mass adoption of new Edge AI hardware accelerators in the market.
Simplifying AI development for the Edge
The Voyager SDK offers a fast and easy way for developers to build powerful, high-performance applications for Axelera AI’s Metis AI platform. Developers describe their end-to-end pipelines declaratively, in a simple YAML configuration file, which can include one or more deep learning models along with multiple non-neural pre- and post-processing elements. The SDK toolchain automatically compiles and deploys the models in the pipeline for the Metis AI platform and allocates pre- and post-processing components to available computing elements on the host, such as the CPU, embedded GPU or media accelerator. The compiled pipeline can then be used directly as a first-class object from Python or C++ application code, as an “inference input/output stream”.
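A declarative pipeline of this kind is typically expressed as an ordered list of elements. The fragment below is an illustrative sketch only: the element and field names (`preprocess`, `nms`, the model name and so on) are hypothetical and do not reflect the actual Voyager YAML schema.

```yaml
# Illustrative sketch only -- field names are hypothetical,
# not the actual Voyager YAML schema.
pipeline:
  - input:
      source: rtsp              # camera, file or network stream
  - preprocess:
      resize: [640, 640]
      format: RGB
  - model:
      name: yolov5s
      weights: yolov5s.pt       # pretrained PyTorch checkpoint
  - postprocess:
      nms:
        iou_threshold: 0.45
        score_threshold: 0.25
```

The value of the declarative form is that the toolchain, not the developer, decides where each element runs: the model on the Metis accelerator, the resize on a media accelerator, and so on.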
The Voyager SDK provides turnkey pipelines for state-of-the-art models for image classification, object detection, segmentation, keypoint detection, face recognition and other computer vision tasks. It also comes with a library of non-neural processing elements such as scaling, cropping, normalization, format conversion and non-maximal suppression (NMS). Developers can modify the provided sample models to work with their own datasets, or they can port their proprietary models to Metis by writing a simple Python helper script with hooks to their original model and dataloader code. The Voyager SDK supports PyTorch models at initial release, with support for other frameworks, including TensorFlow and ONNX, to follow.
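To make one of these non-neural elements concrete, here is a minimal NumPy sketch of greedy non-maximal suppression, the standard algorithm behind the NMS element mentioned above (this is a generic textbook implementation, not the SDK's own code):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy non-maximal suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The two overlapping boxes are merged into the higher-scoring detection, while the distant third box survives.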
The pipeline generated by the low-code deployment workflow can be directly embedded within the Axelera Inference Server to obtain a variety of preconfigured, out-of-the-box solutions. These solutions range from fully embedded processing of a CSI MIPI camera to distributed processing of multiple RTSP streams across networked devices. Customers can also add their own Python/C++ callback functions, enabling seamless integration of their own business logic within an end-to-end pipeline. Under the hood, pipelines use GStreamer libraries, plugins and extensions to achieve efficient buffer sharing and synchronization across all devices in the system. While most customers have no need to develop at this level, advanced users retain full control over the generated code, enabling them to extend support for additional devices and kernels.
The highest performance with no compromise to model accuracy
The compiler fully automates the deployment of models to the Metis AI processor. Built on the Apache TVM compiler framework, it takes models pretrained in industry-standard frameworks such as PyTorch as input and outputs code tuned for Metis hardware. During compilation, the compiler quantizes the model using proprietary, state-of-the-art algorithms and, at the same time, partitions it for optimal execution on the Metis hardware elements. Convolution layers are performed on quantized 8-bit integer data within the Digital In-Memory Compute Engine (D-IMC), ensuring the highest performance and energy efficiency. Activations and other operations are allocated to floating-point units, with re-quantization operations inserted where necessary. Using only a small number of calibration images and no manual intervention, the compiler generates code whose accuracy is practically indistinguishable from that of the original model. A fallback path offloads any operations unsupported by the hardware to the host, ensuring full coverage of all customer models. Models deployed on the Metis AI processor thus achieve the highest performance and energy efficiency while retaining the full accuracy of the original FP32 model.
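Axelera's quantization algorithms are proprietary, but the basic idea of calibration-based post-training quantization can be sketched generically: a handful of calibration samples determine a scale that maps FP32 activations onto the int8 range, bounding the round-trip error. The sketch below assumes simple symmetric max-calibration, which is far cruder than a production quantizer:

```python
import numpy as np

def calibrate_scale(samples: np.ndarray) -> float:
    """Pick a symmetric int8 scale from calibration activations."""
    return float(np.abs(samples).max()) / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Map FP32 values onto the signed 8-bit integer grid."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0, 1, 10_000).astype(np.float32)  # stand-in calibration data
scale = calibrate_scale(acts)
err = np.abs(dequantize(quantize(acts, scale), scale) - acts).max()
print(err <= scale / 2)  # round-trip error bounded by half a quantization step
```

Production compilers improve on this with per-channel scales, outlier-aware calibration and learned rounding, which is how accuracy practically indistinguishable from FP32 is achieved with only a few calibration images.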
The Voyager SDK is a game-changer for developers looking to build powerful AI applications on the Metis AI platform. Its end-to-end approach to building and optimizing AI pipelines bridges the gap between machine learning applications developed in the cloud and low-cost energy-efficient devices operating at the Edge. With Voyager, it’s never been easier to build, optimize, and deploy your models for maximum performance and efficiency at the Edge!