AI TECH INSIGHT

2024-04-09

Using the oneAPI Construction Kit
to enable open standards programming for the Metis AIPU

Open standards enable developers to more easily harness the power of AI accelerators, especially in heterogeneous computing. Here you can read in detail why and how we implemented OpenCL using oneAPI on Metis.

Manuel Mohr | Staff Software Engineer
AXELERA AI

The necessity of dedicated AI hardware accelerators

AI applications have an endless hunger for computational power. Increasing model sizes and cranking up parameter counts has not yet reached the point of diminishing returns, so ever-larger models still outperform their predecessors.

At the same time, new application areas for AI tools are being explored and discovered almost daily. Building dedicated AI hardware accelerators is therefore extremely attractive. In some situations it is even a necessity, as it enables running more powerful AI applications while using less energy on cheaper hardware.

Welcome to the hardware jungle

Such specialized accelerator hardware poses great challenges to software developers, as it instantly transforms a regular computer into a heterogeneous supercomputer in which the accelerator is distinctly different from the host processor. Moreover, each accelerator is different in its own way and must be programmed appropriately to actually reap the potential performance and efficiency benefits.

In his 2011 article [1], Herb Sutter heralded this age with the words “welcome to the hardware jungle”. And a thick jungle it has indeed become since, with multiple specialized hardware accelerators now commonplace across all device categories, from low-end phones to high-end servers.

So what’s the machete that developers can use to make their way through this jungle without getting lost?

Why custom accelerator interfaces are a bad idea

The answer lies in the creation of a suitable programming interface for those accelerators. Creating a custom interface that is completely tailored for a new accelerator silicon could let a developer exploit every little feature that the hardware has to offer to achieve maximum performance.

However, upon closer inspection, this is a bad idea for a variety of reasons. Firstly, while there might be the possibility of achieving peak performance with a custom interface, it would require expertise that is already hard to come by for existing devices and even rarer for new devices. The necessary developer training is time-intensive and costly.

Even more importantly, using a different bespoke interface to program each accelerator can also result in vendor lock-in if the created software completely relies on such a custom interface, making it highly challenging and significantly more expensive to switch to a different hardware accelerator. The choice of programming interface is thus crucial not only from a technical perspective, but also from a business standpoint. At Axelera, we therefore believe that the answer to the question of how to best bushwhack through the accelerator jungle is to embrace open standards, such as OpenCL [2] and SYCL [3].

Open standards for open interaction

OpenCL and SYCL are open standards defined by the Khronos Group. They define an application programming interface (API) for interacting with all kinds of devices, as well as programming languages for implementing compute kernels to run on these devices.

SYCL provides high-level programming concepts for heterogeneous computing architectures, together with the ability to maintain code for host and device inside a shared source file.

But providing a standard-conformant implementation of such open standards poses a daunting challenge for creators of new hardware accelerators. The OpenCL API consists of more than 100 functions, and OpenCL C specifies over 10,000 built-in functions that compute kernels can use. It would be great if these open standards were also accompanied by high-quality open-source implementations that are easy to port to new silicon. Fortunately, in the case of OpenCL and SYCL, this is indeed the case.

Increased developer productivity

Open standards such as OpenCL & SYCL promise portability across different hardware devices and also foster collaboration and code reuse. After all, it suddenly becomes possible and worthwhile to create optimized libraries that are usable for many devices, which ultimately increases developer productivity.

Axelera is a member of the UXL Foundation [4], a group that governs optimized libraries implemented using SYCL. These libraries are compatible with this software stack, offering math and AI operations through standard APIs.


Conquering the jungle with the oneAPI Construction Kit

The open source oneAPI Construction Kit from Codeplay is a collection of high-quality implementations of open standards, such as OpenCL and Vulkan Compute, that are designed from the ground up to be easily portable to new hardware targets. We want to share our experiences using the Construction Kit to unlock OpenCL and SYCL for our Metis AI Processing Unit (AIPU) [5].

Prerequisites for deployment

In order to enable porting an existing OpenCL implementation to a new device, two prerequisites must be fulfilled:

  1. There must be a compiler backend able to generate code for the device’s compute units. As the oneAPI Construction Kit, like virtually all OpenCL implementations, is based on the LLVM compiler framework, in this case this means having an LLVM code generator backend for the target instruction set architecture (ISA). As our Metis AIPU’s compute units are based on the RISC-V ISA, we could just use the RISC-V backend that’s part of the upstream LLVM distribution to get us started. If the accelerator uses a non-standard ISA, an adapted version of LLVM with a custom backend can of course be used with the Construction Kit as well.
  2. There must be some way for low-level interaction with the device, to perform actions like reading or writing device memory, or triggering the execution of a newly loaded piece of machine code. As we already supported another API before looking into OpenCL, such a fundamental library was already in place. In our case, it was a kernel driver exposing the minimal needed functionality to user space (essentially handling interrupts and providing access to device memory), accompanied by a very thin user space library wrapping those exposed primitives.

Implementing the HAL

With these prerequisites met, we started following the Construction Kit’s documentation [6]. The first step is implementing what the Construction Kit calls the “hardware abstraction layer” (HAL). The HAL comprises a minimal interface that covers the second item of the above list and consists of just eight functions: allocating/freeing device memory, reading/writing device memory, loading/freeing programs on the device, and finding/executing a kernel contained in an already loaded program.
In order to avoid having to deal with the full complexity of OpenCL from the get-go, a smaller helper library called “clik” is provided by the Construction Kit to implement the HAL. This library is essentially a severely stripped-down version of OpenCL, with some especially complex parts, like online kernel compilation, completely absent. Hence, the clik library serves as a stepping stone for getting the HAL implemented function by function, and provides matching test cases to ensure that the HAL implementation fulfills the contract expected by the Construction Kit. After all tests pass, this scaffolding can be removed, and the resulting HAL implementation can be used to bring up a full OpenCL implementation.
In our case, implementing the HAL was straightforward. The tests enabled a quick development cycle: every time new functionality was added, more tests started passing, or failures pointed out where the HAL implementation didn’t meet the Construction Kit’s expectations. In total, it took about two weeks of full-time work by one developer without prior Construction Kit knowledge to go from starting the work to passing all clik tests.

Configuring a complete OpenCL stack
After gaining confidence that the Metis HAL implementation was functional, we could continue with the next step and bring up a complete OpenCL stack [7]. This, too, was surprisingly quick, taking roughly another two person-weeks of developer time. The Construction Kit again provides an extensive unit test suite, whose tests can be used to guide development by pointing out specific areas that aren’t working yet.

Testing our Metis OpenCL implementation
All bring-up work was initially performed in an internal simulator environment, but after passing all tests there, we could quickly move to working on actual silicon (see [8]). As the first real-world litmus test for our Metis OpenCL implementation, we picked an OpenCL C kernel that is currently used for preprocessing as part of our production vision pipeline. By default, the kernel is offloaded to the host’s GPU. However, with Metis now being a possible offloading target for OpenCL workloads as well, we pointed the existing host application at our Metis OpenCL library and gave it a try. We were very happy to see that, without any modifications to the host application, we were able to run the vision pipeline while offloading the computations to Metis instead of the host GPU. The transition to actual silicon took another week of developer time. In total, it took us around five person-weeks of development effort to go from having no OpenCL support to having a prototype implementation capable of offloading an existing, production OpenCL C kernel to our accelerator.
Hence, in our experience, OpenCL and the oneAPI Construction Kit fully delivered on the promises of easy portability and avoiding vendor lock-in.

Opening up possibilities

Having a functional OpenCL implementation is also an important building block that opens up many other possibilities. OpenCL can be used as a backend for the DPC++ SYCL implementation [9], which enables a more modern single-source style for programming accelerators.
Even more importantly, a SYCL implementation makes it possible to tap into the wider SYCL ecosystem. This includes optimized libraries, such as portBLAS [10] providing linear algebra routines and portDNN [11] providing neural-network-related routines, but also brings the potential to support the UXL Foundation libraries including oneMKL [12], oneDPL [13], and oneDNN [14]. Alongside these libraries, it also includes tools like SYCLomatic [15], which assists with migrating existing CUDA codebases to SYCL. Thus, it offers an important migration path to escape from vendor lock-in.

Why oneAPI simplifies AI accelerator implementation

The best way to bushwhack through the accelerator jungle and enable heterogeneous computing is to embrace open standards. Open standards play a crucial role in the evolution and adoption of heterogeneous computing by addressing some of the fundamental challenges of developing for diverse hardware architectures. They provide standardized programming models and APIs, such as oneAPI, that allow software to communicate with various hardware components, including CPUs, GPUs, DSPs, and FPGAs, irrespective of the vendor.

Overall, we found the oneAPI Construction Kit to be key to unlocking access to open standards. Through the use of oneAPI, the integration of AI accelerators can be significantly simplified and made more efficient and future-proof, because oneAPI enables seamless, hardware-agnostic interoperation between tools and libraries. This accelerates the development process, ensures that applications can leverage the latest advancements in AI hardware and software technologies, and keeps them compatible with future hardware innovations, reducing the need for costly rewrites or re-optimization. At Axelera AI, we are excited to continue on this path.

References

  1. H. Sutter, "Welcome to the Jungle," 2011. [Online]. Available: https://herbsutter.com/welcome-to-the-jungle/.
  2. The Khronos Group, "OpenCL Overview," [Online]. Available: https://www.khronos.org/opencl/.
  3. The Khronos Group, "SYCL Overview," [Online]. Available: https://www.khronos.org/sycl/.
  4. UXL Foundation, "UXL Foundation: Unified Acceleration," [Online]. Available: https://uxlfoundation.org/.
  5. Axelera AI, "Metis AIPU Product Page," [Online]. Available: https://www.axelera.ai/metis-aipu.
  6. Codeplay Software Ltd, "Guide: Creating a new HAL," [Online]. Available: https://developer.codeplay.com/products/oneapi/construction-kit/3.0.0/guides/overview/tutorials/creating-a-new-hal.
  7. Codeplay Software Ltd, "Guide: Creating a new ComputeMux Target," [Online]. Available: https://developer.codeplay.com/products/oneapi/construction-kit/3.0.0/guides/overview/tutorials/creating-a-new-mux-target.
  8. Axelera AI, "First Customers Receive World’s Most Powerful Edge AI Solutions from Axelera AI," 12 September 2023. [Online]. Available: https://www.axelera.ai/news/first-customers-receive-worlds-most-powerful-edge-ai-solutions-from-axelera-ai.
  9. Intel Corporation, "Intel® oneAPI DPC++/C++ Compiler," [Online]. Available: https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html.
  10. Codeplay Software Ltd, "portBLAS: Basic Linear Algebra Subroutines using SYCL," [Online]. Available: https://github.com/codeplaysoftware/portBLAS.
  11. Codeplay Software Ltd, "portDNN: neural network acceleration library using SYCL," [Online]. Available: https://github.com/codeplaysoftware/portDNN.
  12. Intel Corporation, "Intel® oneAPI Math Kernel Library (oneMKL)," [Online]. Available: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html.
  13. Intel Corporation, "Intel® oneAPI DPC++ Library," [Online]. Available: https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-library.html.
  14. Intel Corporation, "Intel® oneAPI Deep Neural Network Library," [Online]. Available: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onednn.html.
  15. Intel Corporation, "SYCLomatic: CUDA to SYCL migration tool," [Online]. Available: https://github.com/oneapi-src/SYCLomatic.