2021-11-15

What’s Next for Data Processing?
A Closer Look at In-Memory Computing

Evangelos Eleftheriou | CTO at AXELERA AI

Technology is progressing at an incredible pace, and no technology is moving faster than Artificial Intelligence (AI). Indeed, we are on the cusp of an AI revolution that is already reshaping our lives. AI technologies can automate or augment human work, with applications including autonomous driving, advances in sensory perception and the acceleration of scientific discovery using machine learning. In the past five years, AI has become synonymous with Deep Learning (DL), another area seeing fast and dramatic progress. We are at a point where Deep Neural Networks (DNNs) for image and speech recognition can provide accuracy on par with, or even better than, that achieved by the human brain.

Most of the fundamental algorithmic developments around DL go back decades. However, the recent success has stemmed from the availability of large amounts of data and immense computing power for training neural networks. From around 2010, the exponential increase in single-precision floating-point operations offered by Graphics Processing Units (GPUs) ran in parallel with the explosion of neural network sizes and computational requirements. Specifically, the amount of compute used in the largest AI training runs has doubled roughly every 3.5 months during the last decade. At the same time, the size of state-of-the-art models increased from 26M weights for ResNet-50 to 1.5B for GPT-2. This phenomenal increase in model size is reflected directly in the cost of training such complex models. For example, the cost of training the bidirectional transformer network BERT, for Natural Language Processing applications, is estimated at $61,000, whereas training XLNet, which outperformed BERT, costs about nine times as much. A major concern, however, is not only the cost of the substantial energy consumption needed to train complex networks but also the significant environmental impact incurred in the form of CO2 emissions.

As the world looks to reduce carbon emissions, there is an even greater need for higher performance with lower power consumption. This is true not only for AI applications in the data center, but also at the Edge, which is where we expect the next revolution to take place. AI at the Edge refers to processing data where it is collected, as opposed to moving it to separate processing centers. There is a wealth of applications at the Edge: AI for mobile devices, including authentication, speech recognition, and mixed/augmented reality; AI for embedded processing in IoT devices, including smart cities and homes, prosthetics, wearables, and personalized healthcare; and AI for real-time video analytics for autonomous navigation and control. However, these embedded applications are all energy and memory constrained, meaning energy efficiency matters even more at the Edge. The end of Moore’s and Dennard’s laws is compounding these challenges. Thus, there are compelling motivations to explore novel computing architectures with inspiration from the most efficient computer on the planet, the human brain.

Traditional Computing Systems: Current State of Play

Traditional digital computing systems, based on the von Neumann architecture, consist of separate processing and memory units. Therefore, performing computations typically results in a significant amount of data being moved back and forth between the physically separated memory and processing units. This data movement costs latency and energy and creates an inherent performance bottleneck. The latency associated with the growing disparity between the speed of memory and processing units, commonly known as the memory wall, is one example of a crucial performance bottleneck for a variety of AI workloads. Similarly, the energy cost associated with shuttling data represents another key challenge, both for computing systems that are severely power limited due to cooling constraints and for the plethora of battery-operated mobile devices. In general, the energy cost of multiplying two numbers is orders of magnitude lower than that of accessing numbers from memory. It is therefore clear to AI developers that there is a need to explore novel computing architectures that provide better colocation of processing and memory subsystems. One suggested concept in this area is near-memory computing, which aims to reduce the physical distance and time needed to access memory. This approach heavily leverages recent advances in die stacking and new technologies such as the hybrid memory cube (HMC) and high bandwidth memory (HBM).

In-Memory Computing: A Radical New Approach

In-memory computing is a radically different approach to data processing, in which certain computational tasks are performed in place in the memory itself (Sebastian 2020). This is achieved by organizing the memory as a crossbar array and by exploiting the physical attributes of the memory devices. The peripheral circuitry and the control logic play a key role in creating what we call an in-memory computing (IMC) unit or computational memory unit (CMU). In addition to overcoming the latency and energy issues associated with data movement, in-memory computing has the potential to significantly improve the computational time complexity associated with certain computational tasks. This is primarily a result of the massive parallelism created by a dense array of millions of memory devices simultaneously performing computations.

For instance, crossbar arrays of such memory devices can be used to store a matrix and perform matrix-vector multiplications (MVMs) in constant, O(1), time without intermediate movement of data. Efficient matrix-vector multiplication via in-memory computing is very attractive for training and inference of deep neural networks, particularly for inference applications at the Edge, where high energy efficiency is critical. In fact, matrix-vector multiplications constitute 70-90% of all deep learning operations. Thus, applications requiring numerous AI components, such as computer vision, natural language processing, reasoning and autonomous driving, can explore this technology in new and innovative ways. Novel dedicated hardware with massive on-chip memory, part of which is enhanced with in-memory computation capabilities, could lead to very efficient training and inference engines for ultra-large neural networks comprising potentially billions of synaptic weights.
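As a rough illustration of why a crossbar computes a matrix-vector product in a single physical step, the sketch below models an idealized array in plain Python. Each weight is stored as a conductance, the inputs are applied as voltages on the rows, and each column wire sums the resulting currents (Ohm's law for the products, Kirchhoff's current law for the sums). The function name `crossbar_mvm`, the differential-pair mapping of signed weights and the value of `g_max` are assumptions made for this example; they are not taken from any particular device or product.

```python
def crossbar_mvm(weights, voltages, g_max=1e-4):
    """Idealized crossbar: returns one output per column of `weights`.

    weights  : list of rows, weights[i][j] in [-1, 1] (row i, column j)
    voltages : list of input voltages, one per row
    g_max    : assumed maximum device conductance in siemens (hypothetical)
    """
    n_rows, n_cols = len(weights), len(weights[0])
    assert len(voltages) == n_rows

    # Signed weights are mapped onto two non-negative conductances
    # (a differential pair), because a physical conductance cannot be negative.
    g_pos = [[max(w, 0.0) * g_max for w in row] for row in weights]
    g_neg = [[max(-w, 0.0) * g_max for w in row] for row in weights]

    outputs = []
    for j in range(n_cols):
        # Ohm's law gives the per-device current V_i * G_ij; Kirchhoff's
        # current law sums them on the shared column wire. In hardware all
        # columns settle at the same time; the loop only emulates that.
        i_pos = sum(voltages[i] * g_pos[i][j] for i in range(n_rows))
        i_neg = sum(voltages[i] * g_neg[i][j] for i in range(n_rows))
        # Rescale the differential current back to weight units.
        outputs.append((i_pos - i_neg) / g_max)
    return outputs


weights = [[0.5, -1.0],
           [0.25, 0.5],
           [-0.5, 0.0]]
voltages = [1.0, 2.0, 4.0]
result = crossbar_mvm(weights, voltages)
print(result)
```

The software loop still takes time proportional to the matrix size. The constant-time property belongs to the physical array, where every device conducts simultaneously and the column currents are available after one read.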

The core technology of IMC is memory. In general, there are two classes of memory devices. The conventional one, in which information is stored in the presence or absence of charge, includes dynamic random-access memory (DRAM), static random-access memory (SRAM) and Flash memory. There is also an emerging class of memory devices, in which information is stored in terms of the atomic arrangements within nanoscale volumes of materials, as opposed to charge on a capacitor. Generally speaking, one atomic configuration corresponds to one logic state, and the other corresponds to another logic state. These differences in atomic configuration manifest as a change in resistance, and thus these devices are collectively called resistive memory devices or memristors. Traditional and emerging memory technologies can perform a range of in-memory logic and arithmetic operations. In addition, SRAM, Flash and all memristive memories can also be used for MVM operations.

The most important characteristics of a memory device are its read and write times, that is, how fast a device can store and retrieve information. Equally important are the cycling endurance, which refers to the number of times a memory device can be switched from one state to the other, the energy required to store information in a memory cell, and the size of the memory cell. Table 1 compares the traditional DRAM, SRAM and NOR Flash with the most popular emerging resistive-memory technologies, such as spin-transfer torque RAM (STT-RAM), phase-change memory (PCM) and resistive RAM (ReRAM).

Table 1 – Comparing different memory technologies. Sources: (B. Li 2019), (Marinella 2013)

Which Memory Technology for Which Operation? Considerations to Keep in Mind

There are many trade-offs involved in selecting which memory technology is suitable for MVM operations for the target DL workloads. For example, read latency, to a large extent, determines the performance of the system, also known as throughput, in operations per second (OPS). This means it also indirectly affects the system’s efficiency, measured in OPS/W. On the other hand, memory volatility, as well as the write time, determine whether the system supports static or reloadable weights. Cycling endurance is another important characteristic to keep in mind, as it determines the suitability of a memory technology for training and/or inference applications. For example, the limited endurance of PCM, ReRAM and Flash memory devices precludes them from DL training applications. The cell size also has an impact on the compute density. Specifically, it affects the die area and therefore the ASIC cost.
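The link between read latency, throughput and efficiency can be made concrete with a little arithmetic. The sketch below assumes that one full MVM completes per read latency and counts each multiply-accumulate as two operations, which is a common convention. The array size, latencies and power figure are hypothetical placeholders chosen only to make the arithmetic visible; they are not measurements of any device, and holding power constant while latency changes is a deliberate simplification.

```python
def mvm_throughput_ops(rows, cols, read_latency_s):
    """Operations per second for one array doing back-to-back MVMs.

    Assumes one full MVM completes per read latency and counts each
    multiply-accumulate as two operations (a common convention).
    """
    ops_per_mvm = 2 * rows * cols
    return ops_per_mvm / read_latency_s


def efficiency_ops_per_watt(throughput_ops, power_w):
    """Energy efficiency in OPS/W for a given average power draw."""
    return throughput_ops / power_w


# Hypothetical placeholders, not measurements of any real device.
ROWS, COLS = 512, 512
POWER_W = 2.0

fast = mvm_throughput_ops(ROWS, COLS, read_latency_s=10e-9)
slow = mvm_throughput_ops(ROWS, COLS, read_latency_s=20e-9)

print(f"10 ns read: {fast / 1e12:.1f} TOPS, "
      f"{efficiency_ops_per_watt(fast, POWER_W) / 1e12:.1f} TOPS/W")
print(f"20 ns read: {slow / 1e12:.1f} TOPS, "
      f"{efficiency_ops_per_watt(slow, POWER_W) / 1e12:.1f} TOPS/W")
```

Under these assumptions, doubling the read latency halves both the throughput and the OPS/W figure, which is why read latency weighs so heavily in the choice of memory technology.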

It is also important to look at temperature stability, drift phenomena and noise effects. In general, all memory devices exhibit intra-device variability and randomness that is intrinsic to how they operate. However, resistive memory devices appear to be more prone to noise (read and write), nonlinear behaviour, inter-device variability and inhomogeneity across an array. Thus, the precision achieved when using memristive technologies for analogue matrix-vector operations is typically not very high and requires the use of additional hardware-aware training techniques to achieve FP32-equivalent accuracies. Finally, the compatibility of the manufacturing process for memory devices with the CMOS technology and their scalability to lower lithography nodes are very important considerations for the successful commercialization of IMC technology and its future roadmap.
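To give a feel for how device noise limits analogue precision, the following sketch perturbs every stored weight with Gaussian noise before computing a matrix-vector product and reports the worst-case output error. The Gaussian noise model, the `sigma` values and the function names are assumptions for illustration. Real devices show more structured effects, such as drift and nonlinearity, which this toy model ignores, and hardware-aware training is not shown.

```python
import random


def mvm(weights, x):
    """Exact matrix-vector product: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def noisy_mvm(weights, x, sigma, rng):
    """MVM where every stored weight is read with additive Gaussian noise.

    sigma is the assumed standard deviation of the read noise, expressed
    in the same units as the weights (a hypothetical, simplified model).
    """
    noisy = [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
    return mvm(noisy, x)


def max_abs_error(a, b):
    """Largest absolute difference between two equally long vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [[rng.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(8)]
x = [rng.uniform(0.0, 1.0) for _ in range(64)]

exact = mvm(weights, x)
for sigma in (0.0, 0.01, 0.05):
    err = max_abs_error(exact, noisy_mvm(weights, x, sigma, rng))
    print(f"sigma={sigma:<5} max |error| = {err:.4f}")
```

Because each output accumulates 64 noisy products, small per-device perturbations add up, which is one intuition for why analogue MVM results fall short of FP32 precision without additional training techniques.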

SRAM has a unique advantage in that it exhibits the fastest read and write times and the highest endurance of the memory devices considered here. Thus, SRAM enables high-performance, reprogrammable IMC engines for both inference and training applications. Moreover, SRAM follows the scaling of CMOS technology to low lithography nodes and requires standard materials and processes that are readily available to foundries. On the other hand, it is a volatile memory technology that consumes energy even when idle, because it must stay powered to retain data. In addition, SRAM’s cell size, approximately 100 F², is the largest of all charge- and resistance-based memory technologies. However, volatility is not a serious drawback, as applications very rarely dictate static models. In fact, the fast write time of SRAM is a crucial advantage, allowing the system to switch between DL models through very fast reprogramming. Finally, from a system architecture standpoint, the fast re-programmability of SRAM means there is no need to map the entire DNN onto multiple crossbar arrays of memory devices, which would result in a large and costly ASIC.
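The cell-size figure can be turned into silicon area with simple arithmetic: a cell of size k F² occupies k times the square of the feature size F. The sketch below applies this to the approximate 100 F² quoted above for SRAM. The feature size and array dimensions are hypothetical inputs, the function name is made up for this example, and the calculation covers only the bare cell array, not the peripheral circuitry.

```python
def array_area_mm2(n_cells, cell_size_f2, feature_nm):
    """Area of the bare cell array in mm^2 (periphery not included).

    cell_size_f2 : cell footprint in units of F^2 (about 100 for SRAM,
                   per the text)
    feature_nm   : feature size F in nanometres (assumed input)
    """
    cell_area_nm2 = cell_size_f2 * feature_nm ** 2
    return n_cells * cell_area_nm2 * 1e-12  # 1 mm^2 = 1e12 nm^2


SRAM_CELL_F2 = 100      # approximate value quoted in the text
FEATURE_NM = 16         # hypothetical feature size, for illustration only

# One bit per cell; an INT8 weight therefore needs eight cells.
weights_per_array = 512 * 512
cells = weights_per_array * 8
area = array_area_mm2(cells, SRAM_CELL_F2, FEATURE_NM)
print(f"{weights_per_array} INT8 weights -> {area:.4f} mm^2 of SRAM cells")
```

Since area grows linearly with the number of stored weights, holding an entire large network on chip quickly becomes expensive, which is the motivation for reloading weights rather than mapping the whole DNN at once.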

Recently, imec reported an SRAM-based IMC Multiply-Accumulate (MAC) unit with a record energy efficiency of 2900 TOPS/W using ternary weights (imec 2020). There are also experimental prototype SRAM demonstrators that support INT8 activations and weights whose precision scales linearly with latency, power consumption and area. Clearly, the in-memory MAC implementation and operation are only one part of a multi-faceted IMC-based system. Other digital units are needed to support element-wise vector processing operations, including activation functions, depth-wise convolution, affine scaling, batch normalization and more. Moreover, the performance and usability of a multicore IMC engine also depend on multiple characteristics: an optimized memory hierarchy, a well-balanced fabric, a fine-tuned quantization flow, optimized weight-mapping strategies and a versatile compiler and software tool chain.
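Ternary weights restrict each weight to -1, 0 or +1, so a multiply-accumulate collapses into additions and subtractions. The sketch below shows one simple way to ternarize weights with a fixed threshold and to evaluate the resulting dot product. The threshold rule, its value and the function names are assumptions for illustration and do not describe the quantization scheme used in the imec result or in any Axelera product.

```python
def ternarize(weights, threshold=0.05):
    """Map real-valued weights to {-1, 0, +1} using a fixed threshold.

    The threshold value is a hypothetical choice for this example.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_mac(ternary_weights, activations):
    """Dot product with ternary weights: only adds and subtracts are needed."""
    acc = 0
    for t, a in zip(ternary_weights, activations):
        if t == 1:
            acc += a
        elif t == -1:
            acc -= a
        # t == 0 contributes nothing, so the activation is skipped
    return acc


weights = [0.42, -0.03, -0.77, 0.01, 0.18]
activations = [3, 7, 2, 9, 4]          # e.g. INT8 activations

t = ternarize(weights)                 # [1, 0, -1, 0, 1]
print(t, ternary_mac(t, activations))  # 3 - 2 + 4 = 5
```

Removing the multiplier from the inner loop is one reason ternary designs can report very high TOPS/W figures, at the price of a coarser weight representation.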

There have been many advances in the computing sector, with even more to come. Our customers, and the industry as a whole, have made it clear that they would like a system that offers high throughput, high efficiency and high accuracy (the three highs), and that is also easy to use and, of course, cost-effective. At Axelera AI, we are working to design a system that offers all these capabilities and much more. Our AI solution will be based on a novel multicore in-memory computing paradigm combined with an innovative custom dataflow architecture.

Stay tuned to learn more about our progress in upcoming blog posts, and be sure to subscribe to our newsletter using the form on our homepage!

References

[1] B. Li, B. Yan, H. Li. 2019. “An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive Applications.” Great Lakes Symposium on VLSI.

[2] imec. 2020. Imec and GLOBALFOUNDRIES Announce Breakthrough in AI Chip, Bringing Deep Neural Network Calculations to IoT Edge Devices. Jul. Accessed Nov 2021. https://www.imec-int.com/en/articles/imec-and-globalfoundries-announce-breakthrough-in-ai-chip-bringing-deep-neural-network-calculations-to-iot-edge-devices.

[3] Marinella, M. 2013. “ERD Memory Planning – updated from last weeks telecon.”

[4] Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R., Eleftheriou, E. 2020. “Memory devices and applications for in-memory computing.” Nature Nanotechnology.

Links

https://www.nature.com/articles/s41565-020-0655-z

https://arxiv.org/abs/1906.06603

https://www.osti.gov/biblio/1658057

https://www.imec-int.com/en/articles/imec-and-globalfoundries-announce-breakthrough-in-ai-chip-bringing-deep-neural-network-calculations-to-iot-edge-devices
