2022-04-01

Multilayer perceptrons (MLP)
in Computer Vision

Bram Verhoef | Head of Machine Learning & Co-Founder at AXELERA AI

Multilayer Perceptrons (MLP) in Computer Vision

Summary

Convolutional neural networks (CNNs) still dominate today’s computer vision. Recently, however, networks based on transformer blocks have also been applied to typical computer vision tasks such as object classification, detection, and segmentation, attaining state-of-the-art results on standard benchmark datasets.

However, these vision-transformers (ViTs) are usually pre-trained on extremely large datasets and may consist of billions of parameters, requiring teraflops of computing power. Furthermore, the self-attention mechanism inherent to classical transformers builds on quadratically complex computations.

To mitigate some of the problems posed by ViTs, a new type of network based solely on multilayer perceptrons (MLPs), has recently been proposed. These vision-MLPs (V-MLP) shrug off classical self-attention but still achieve global processing through their fully connected layers.

In this blog post, we review the V-MLP literature, compare V-MLPs to CNNs and ViTs, and attempt to extract the ingredients that really matter for efficient and accurate deep learning-based computer vision.

Introduction

In computer vision, CNNs have been the de facto standard networks for years. Early CNNs, like AlexNet [1] and VGGNet [2], consisted of a stack of convolutional layers, ultimately terminating in several large fully connected layers used for classification. Later, networks were made progressively more efficient by reducing the size of the classifying fully connected layers using global average pooling [3]. Furthermore these more efficient networks, among other adjustments, reduce the spatial size of convolutional kernels [4, 5], employ bottleneck layers and depthwise convolutions [5, 6], and use compound scaling of the depth, width and resolution of the network [7]. These architectural improvements, together with several improved training methods [8] and larger datasets have led to highly efficient and accurate CNNs for computer vision.

Despite their tremendous success, CNNs have their limitations. For example, their small kernels (e.g., 3×3) give rise to small receptive fields in the early layers of the network. This means that information processing in early convolutional layers is local and often insufficient to capture an object’s shape for classification, detection, segmentation, etc. This problem can be mitigated using deeper networks, increased strides, pooling layers, dilated convolutions, skip connections, etc., but these solutions either lose information or increase the computational cost. Another limitation of CNNs stems from the inductive bias induced by the weight sharing across the spatial dimensions of the input. Such weight sharing is modeled after early sensory cortices in the brain and (hence) is well adapted to efficiently capture natural image statistics. However, it also limits the model’s capacity and restricts the tasks to which CNNs can be applied.

Recently, there has been much research to solve the problems posed by CNNs by employing transformer blocks to encode and decode visual information. These so-called Vision Transformers (ViTs) are inspired by the success of transformer networks in Natural Language Processing (NLP) [9] and rely on global self-attention to encode global visual information in the early layers of the network. The original ViT was isotropic (it maintains an equal-resolution-and-size representation across layers), permutation invariant, based entirely on fully connected layers and relying on global self attention [10]. As such, the ViT solved the above-mentioned problems related to CNNs by providing larger (dynamic) receptive fields in a network with less inductive bias.

This is exciting research but it soon became clear that the ViT was hard to train, not competitive with CNNs when trained on relatively small datasets (e.g., IM-1K, [11]), and computationally complex as a result of the quadratic complexity of self-attention. Consequently, further studies sought to facilitate training. One approach was using network distillation [12]. Another was to insert CNNs at the early stages of the network [13]. Further attempts to improve ViTs re-introduced inductive biases found in CNNs (e.g., using local self attention [14] and hierarchical/pyramidal network structures [15]). There were also efforts to replace dot-product QKV-self-attention with alternatives [e.g. 16]. With these modifications now in place, vision transformers can compete with CNNs with respect to computational efficiency and accuracy, even when trained on relatively small datasets [see this blog post by Bert Moons for more discussion on ViTs].

Vision MLPs

Notwithstanding the success of recent vision transformers, several studies demonstrate that models building solely on multilayer perceptrons (MLPs) — so-called vision MLPs (V-MLPs) — can achieve surprisingly good results on typical computer vision tasks like object classification, detection and segmentation. These models aim for global spatial processing, but without the computationally complex self-attention. At the same time, these models are easy to scale (high model capacity) and seek to retain a model structure with low inductive bias, which makes them applicable to a wide range of tasks [17].

Like ViTs, the V-MLPs first decompose the images into non-overlapping patches, called tokens, which form the input into a V-MLP block. A typical V-MLP block consists of a spatial MLP (token mixer) and a channel MLP (channel mixer), interleaved by (layer) normalization and complemented with residual connections. This is illustrated in Figure 1.

Table 1. Overview of some V-MLPs. For each V-MLP, we present the accuracy of the largest reported model that is trained on IM-1K only.

Here the spatial MLP captures the global correlations between tokens, while the channel MLP combines information across features. This can be formulated as follows:

Y=spatialMLP(LN(X))+X, Z=channelMLP(LN(Y))+Y,

Here X is a matrix containing the tokens, Y consists of intermediate features, LN denotes layer normalization, and Z is the output feature of the block. In these equations, spatialMLP and channelMLP can be any nonlinear function represented by some type of MLP with activation function (e.g. GeLU).

In practice, the channelMLP is often implemented by one or more 1×1 convolutions, and most of the innovation found in different studies lies in the structure of the spatialMLP submodule. And, here’s where history repeats itself. Where ViTs started as isotropic models with global spatial processing (e.g., ViT [10] or DeiT [12]), V-MLPs did so too (e.g., MLP-Mixer [17] or ResMLP [18]). Where recent ViTs improved their accuracy and performance on visual tasks by adhering to a hierarchical structure with local spatial processing (e.g., Swin-transformer [14] or NesT [19]), recent V-MLPs do so too (e.g., Hire-MLP [20] or S^2-MLPv2 [21]). These modifications made the models more computationally efficient (fewer parameters and FLOPs), easier to train and more accurate, especially when trained on relatively small datasets. Hence, over time both ViTs and V-MLPs re-introduced the inductive biases well known from CNNs.

Due to their fully connected nature, V-MLPs are not permutation invariant and thus do not necessitate the type of positional encoding frequently used in ViTs. However, one important drawback of pure V-MLPs is the fixed input resolution required for the spatialMLP submodule. This makes transfer to downstream tasks, such as object detection and segmentation, difficult. To mitigate this problem, some researchers have inserted convolutional layers or, similarly, bicubic interpolation layers, into the V-MLP (e.g., ConvMLP [22] or RaftMLP [23]). Of course, to some degree, this defies the purpose of V-MLPs. Other studies have attempted to solve this problem using MLPs only (e.g., [20, 21, 30]), but the data-shuffling needed to formulate the problem as an MLP results in an operation that is very similar or even equivalent to some form of (grouped) convolution.

See Table 1 for an overview of different V-MLPs. Note how some of the V-MLP models are very competitive with (or better than) state-of-the-art CNNs, e.g. ConvNeXt-B with 89M parameters, 45G FLOPs and 83.5% accuracy [28].

What matters?

It is important to note that the high-level structure of V-MLPs is not new. Depthwise-separable convolutions for example, as used in MobileNets [6], consist of a depthwise convolution (spatial mixer) and a pointwise 1×1 convolution (channel mixer). Furthermore, the standard transformer block comprises a self-attention layer (spatial mixer) and a pointwise MLP (channel mixer). This suggests that the good performance and accuracy obtained with these models results at least partly from the high-level structure of layers used inside V-MLPs and related models. Specifically, (1) the use of non-overlapping spatial patch embeddings as inputs, (2) some combination of independent spatial (with large enough spatial kernels) and channel processing, (3) some interleaved normalization, and (4) residual connections. Recently, such a block structure has been dubbed “Metaformer” ([24], Figure 2), referring to the high-level structure of the block, rather than the particular implementation of its subcomponents. Some evidence for this hypothesis comes from [27], who used a simple isotropic purely convolutional model, called “ConvMixer,” that takes non-overlapping patch embeddings as inputs. Given an equal parameter budget, their model shows improved accuracy compared to standard ResNets and DeiT. A more thorough analysis of this hypothesis was performed by “A ConvNet for the 2020s,” [28], which systematically examined the impact of block elements 1-4, finding a purely convolutional model reaching SOTA performance on ImageNet, even when trained on IN-1K alone.

Figure 2. a. V-MLP, b. Transformer and c. MetaFormer. Adapted from [24].

Evaluate industry defining AI inference technology today.

The Axelera AI Metis Platform accelerates prototyping and deploying (vision) AI acceleration by providing a comprehensive hardware and software solution with unmatched usability and cost-efficiency.

Be among the first to accelerate your innovation and experience true freedom to innovate. Order your Metis evaluation kit and be a part of shaping the future of Edge AI.

GET YOUR EVALUATION KIT

Which Evaluation kit do you want to order?1/3.

Which evaluation kit do you want?

This field is required!

Company name

This field is required!

What is your focus industry/application?

This field is required!

Other industry segment

This is not correct

What best describes your company?

This is not correct.

Other company type

This is not correct

Back

Your contact details2/3.

First name

This field is required!

Last name

This field is required!

Job Title

This field is required!

Country

United States
Canada
Afghanistan
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antarctica
Antigua and Barbuda
Argentina
Armenia
Aruba
Australia
Austria
Azerbaijan
Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Benin
Bermuda
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
British Indian Ocean Territory
British Virgin Islands
Brunei
Bulgaria
Burkina Faso
Burundi
Cambodia
Cameroon
Cape Verde
Cayman Islands
Central African Republic
Chad
Chile
China
Christmas Island
Cocos (Keeling) Islands
Colombia
Comoros
Congo
Cook Islands
Costa Rica
Croatia
Cuba
Curaçao
Cyprus
Czech Republic
Côte d’Ivoire
Democratic Republic of the Congo
Denmark
Djibouti
Dominica
Dominican Republic
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Estonia
Ethiopia
Falkland Islands
Faroe Islands
Fiji
Finland
France
French Guiana
French Polynesia
French Southern Territories
Gabon
Gambia
Georgia
Germany
Ghana
Gibraltar
Greece
Greenland
Grenada
Guadeloupe
Guam
Guatemala
Guernsey
Guinea
Guinea-Bissau
Guyana
Haiti
Honduras
Hong Kong S.A.R., China
Hungary
Iceland
India
Indonesia
Iran
Iraq
Ireland
Isle of Man
Israel
Italy
Jamaica
Japan
Jersey
Jordan
Kazakhstan
Kenya
Kiribati
Kuwait
Kyrgyzstan
Laos
Latvia
Lebanon
Lesotho
Liberia
Libya
Liechtenstein
Lithuania
Luxembourg
Macao S.A.R., China
Macedonia
Madagascar
Malawi
Malaysia
Maldives
Mali
Malta
Marshall Islands
Martinique
Mauritania
Mauritius
Mayotte
Mexico
Micronesia
Moldova
Monaco
Mongolia
Montenegro
Montserrat
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Netherlands
New Caledonia
New Zealand
Nicaragua
Niger
Nigeria
Niue
Norfolk Island
North Korea
Northern Mariana Islands
Norway
Oman
Pakistan
Palau
Palestinian Territory
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Pitcairn
Poland
Portugal
Puerto Rico
Qatar
Romania
Russia
Rwanda
Réunion
Saint Barthélemy
Saint Helena
Saint Kitts and Nevis
Saint Lucia
Saint Pierre and Miquelon
Saint Vincent and the Grenadines
Samoa
San Marino
Sao Tome and Principe
Saudi Arabia
Senegal
Serbia
Seychelles
Sierra Leone
Singapore
Slovakia
Slovenia
Solomon Islands
Somalia
South Africa
South Korea
South Sudan
Spain
Sri Lanka
Sudan
Suriname
Svalbard and Jan Mayen
Swaziland
Sweden
Switzerland
Syria
Taiwan
Tajikistan
Tanzania
Thailand
Timor-Leste
Togo
Tokelau
Tonga
Trinidad and Tobago
Tunisia
Turkey
Turkmenistan
Turks and Caicos Islands
Tuvalu
U.S. Virgin Islands
Uganda
Ukraine
United Arab Emirates
United Kingdom
United States Minor Outlying Islands
Uruguay
Uzbekistan
Vanuatu
Vatican
Venezuela
Viet Nam
Wallis and Futuna
Western Sahara
Yemen
Zambia
Zimbabwe

This is not correct.

This field is required!

Phone number

This field is required!

Back

Your project info3/3.

This field is required!

Back

One more thing...

How did you hear about us?

This field is required!

Other media channel

This is not correct

By submitting your information, you consent to ourprivacy policyand authorize us to store your personal data and contact you regarding organizational details.

Join our monthly updates about the future of edge-AI! By signing up, you agree to receive regular updates from Axelera AI, as per ourprivacy policy, and stay at the forefront of AI innovation.

Back

Thank you for your ordering your Axelera Metis Evaluation Kit!

We've received your order, and a confirmation email has been sent to the provided email address. Our team is excited to review your order.

After evaluating your input, we will be in touch within the next 2 business days to discuss the next steps and how your order can benefit your innovative projects.
Stay tuned for more details coming your way soon!

Conclusion

Taken together, these studies suggest that what matters for efficient and accurate vision models are the particular layer ingredients found in the Metaformer block (tokenization, independent spatial and channel processing, normalization and residual blocks) and the inductive biases typically found in CNNs (local processing with weight sharing and a hierarchical network structure). Clearly, this conclusion does not imply a special role for MLPs, as the Metaformer structure building on purely convolutional layers works (almost) just as well.

So are there other reasons for the recent focus on V-MLPs? The above-mentioned convolutional Metaformers were all tested on vision tasks and it is well known that the convolutional structure matches well with natural image statistics. Indeed, as mentioned above the best performing V-MLPs and ViTs (re-)introduce the inductive biases, such as local hierarchical processing, typically found in CNNs. However, if one is interested in a generic model that performs well in multimodal tasks and has lower computational complexity than standard transformers, an MLP-based network can be a good choice. For example, some initial results show that MLP-based Metaformers also perform well on NLP tasks [18, 29].

An additional benefit of isotropic MLPs is that they scale more easily. This scalability can make it easier to implement them on compute infrastructure that relies on regular compute patterns. Furthermore, it facilitates capturing the high information content of large (multimodal) datasets.

So based on current findings we can formulate the following practical guidelines: for settings that are significanlty resource- and data-constrained, such as edge computing, there is currently little evidence that V-MLPs, like ViTs, are a superior alternative to CNNs. However, when datasets are large and/or multimodal, and compute is more abundant, pure MLP-based models may be a more efficient and generic choice compared to CNNs and transformer-based models that rely on self-attention.

We are still in the early days of examining the possibilities of MLP-based models. In just 9 months the accuracy of V-MLPs on ImageNet classification increased by a stunning ~8%. It is expected that these models will improve further and that hybrid networks, which properly combine MLPs, CNNs and attention mechanisms, have the potential to significantly outperform existing models (e.g. [30]). We are excited to be part of this future.

References

Stay tuned to learn more about our progress in upcoming blog posts, and be sure to subscribe to our newsletter using the form on our homepage! References

[1] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems 25 (2012): 1097-1105.

[2] Karen Simonyan, Andrew Zisserman: “Very Deep Convolutional Networks for Large-Scale Image Recognition,” 2014; [http://arxiv.org/abs/1409.1556 arXiv:1409.1556].

[3] Min Lin, Qiang Chen, Shuicheng Yan: “Network In Network,” 2013; [http://arxiv.org/abs/1312.4400 arXiv:1312.4400].

[4] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer: “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” 2016; [http://arxiv.org/abs/1602.07360 arXiv:1602.07360].

[5] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[6] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam: “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” 2017; [http://arxiv.org/abs/1704.04861 arXiv:1704.04861].

[7] Mingxing Tan, Quoc V. Le: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” 2019, International Conference on Machine Learning, 2019; [http://arxiv.org/abs/1905.11946 arXiv:1905.11946].

[8] Ross Wightman, Hugo Touvron, Hervé Jégou: “ResNet strikes back: An improved training procedure in timm,” 2021; [http://arxiv.org/abs/2110.00476 arXiv:2110.00476].

[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” 2018; [http://arxiv.org/abs/1810.04805 arXiv:1810.04805].

[10] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby: “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale,” 2020; [http://arxiv.org/abs/2010.11929 arXiv:2010.11929].

[11] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255).

[12] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou: “Training data-efficient image transformers & distillation through attention,” 2020; [http://arxiv.org/abs/2012.12877 arXiv:2012.12877].

[13] Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick: “Early Convolutions Help Transformers See Better,” 2021; [http://arxiv.org/abs/2106.14881 arXiv:2106.14881].

[14] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo: “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” 2021; [http://arxiv.org/abs/2103.14030 arXiv:2103.14030].

[15] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao: “Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions,” 2021; [http://arxiv.org/abs/2102.12122 arXiv:2102.12122].

[16] Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira: “Perceiver: General Perception with Iterative Attention,” 2021; [http://arxiv.org/abs/2103.03206 arXiv:2103.03206].

[17] Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy: “MLP-Mixer: An all-MLP Architecture for Vision,” 2021; [http://arxiv.org/abs/2105.01601 arXiv:2105.01601].

[18] Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou: “ResMLP: Feedforward networks for image classification with data-efficient training,” 2021; [http://arxiv.org/abs/2105.03404 arXiv:2105.03404].

[19] Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister: “Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding,” 2021; [http://arxiv.org/abs/2105.12723 arXiv:2105.12723].

[20] Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang: “Hire-MLP: Vision MLP via Hierarchical Rearrangement,” 2021; [http://arxiv.org/abs/2108.13341 arXiv:2108.13341].

[21] Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li: “S^2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision,” 2021; [http://arxiv.org/abs/2108.01072 arXiv:2108.01072].

[22] Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi: “ConvMLP: Hierarchical Convolutional MLPs for Vision,” 2021; [http://arxiv.org/abs/2109.04454 arXiv:2109.04454].

[23] Yuki Tatsunami, Masato Taki: “RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?” 2021; [http://arxiv.org/abs/2108.04384 arXiv:2108.04384].

[24] Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan: “MetaFormer is Actually What You Need for Vision,” 2021; [http://arxiv.org/abs/2111.11418 arXiv:2111.11418].

[25] Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang: “An Image Patch is a Wave: Quantum Inspired Vision MLP,” 2021; [http://arxiv.org/abs/2111.12294 arXiv:2111.12294].

[26] Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu: “DynaMixer: A Vision MLP Architecture with Dynamic Mixing,” 2022; [http://arxiv.org/abs/2201.12083 arXiv:2201.12083].

[27] Asher Trockman, J. Zico Kolter: “Patches Are All You Need?” 2022; [http://arxiv.org/abs/2201.09792 arXiv:2201.09792].

[28] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie: “A ConvNet for the 2020s,” 2022; [http://arxiv.org/abs/2201.03545 arXiv:2201.03545].

[29] Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le: “Pay Attention to MLPs,” 2021; [http://arxiv.org/abs/2105.08050 arXiv:2105.08050].

[30] Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou: “Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs,” 2022; [http://arxiv.org/abs/2202.06510 arXiv:2202.06510].

2024-07-10

AI TECH INSIGHT

How our quantization methods make the Metis AI PU highly efficient and accurate

Read all about our unique quantization techniques that obsolete model retraining & enable the most powerful and energy-efficient AI accelerators.

2024-04-23

AI TECH INSIGHT

AI access control: How to accelerate verification without sacrificing accuracy

Vision AI can make access control less invasive. AI accelerators can increase verification speed in AI Access Control without increasing false positives in security.

2024-04-09

AI TECH INSIGHT

Using oneAPI construction kit to enable open standards programming for the Metis AIPU

Open standards enable developers to more easily harness the power of AI accelerators, especially in heterogenous computing. Here you can read in detail why and how we implemented OpenCL using oneAPI on Metis.

2024-01-22

Davos 2024: AI’s Evolution and the Edge Revolution

At this year’s World Economic Forum in Davos, the spotlight was firmly placed on artificial intelligence (AI), reflecting its growing importance across various sectors. The discussions not only highlighted AI’s expansive role but also emphasized the evolving trend of edge computing, driven by specialized hardware accelerators.

2023-5-02

How Will Generative AI Revolutionize Our Work?

On Labor Day, a day dedicated to celebrating the achievements and perseverance of the workforce, we find ourselves on the cusp of a new era where artificial intelligence (AI) is poised to transform the labor market.

2023-12-15

The Metis AI Platform A technical Deepdive

The Metis AI Platform is a one-of-a-kind holistic hardware and software solution establishing best-in-class performance, efficiency, and ease of use for AI inferencing of computer vision workloads at the Edge.

2023-11-14

Interview with Stephen Owen, Axelera AI Advisor

Stephen Owen, Axelera AI Advisor, is an experienced Board Level International Executive with over 16 years of executive-level experience in an S&P Top 500 Semiconductor Company and extensive global leadership and organizational expertise.

2023-10-11

Harnessing the RISC-V Wave: The Future is Now

RISC-V is inevitable – it became the mantra of RISC-V, and it’s true. But before we see why that is, let’s step back and discuss what RISC-V is and why we should care.

2023-06-14

Cheap Computing and the Balancing Act of Population Decline

Imagine a world where computing power reaches a historic practical equivalent of two human brains. In this blog article by our Director of Systems Software, Cristian Olar explores how our revolutionary Metis AIPU achieves a remarkable 200 TOPS result at a fraction of today’s costs.

2023-06-12

Decoding Transformers on Edge Devices

This blog introduces the transformer encoder and decoder architecture and discusses the challenges in inferring them on an edge device: high computational cost, high peak memory requirements, low arithmetic intensity.

HTC5, High Tech Campus
5656 AE Eindhoven
The Netherlands
Email: info@axelera.ai

Reducing CO2 with
Axelera’s Forest

Thank you for your newsletter subscription

Multilayer perceptrons (MLP)
in Computer Vision

Multilayer Perceptrons (MLP) in Computer Vision

Summary