This blog post introduces the transformer encoder and decoder architectures and discusses the challenges of running inference on an edge device: high computational cost, high peak memory requirements, and low arithmetic intensity.
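The low-arithmetic-intensity point can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only; the layer size and data type are assumptions, not figures from the post:

```python
# Illustrative sketch: arithmetic intensity (FLOPs per byte moved) of a
# matrix-vector product, the core operation in autoregressive decoding.
# The 4096x4096 size and fp16 dtype are assumptions for illustration.

def arithmetic_intensity(m: int, n: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte for y = W @ x, with W of shape (m, n)."""
    flops = 2 * m * n                                   # one multiply + one add per weight
    bytes_moved = (m * n + n + m) * bytes_per_element   # read W and x, write y
    return flops / bytes_moved

# A 4096x4096 fp16 layer yields roughly 1 FLOP per byte, far below the
# compute/bandwidth balance point of typical accelerators, so decoding
# tends to be memory-bandwidth-bound rather than compute-bound.
print(round(arithmetic_intensity(4096, 4096), 2))  # → 1.0
```

Because each weight is read once and used for only two floating-point operations, adding more compute units does not speed this up; only more memory bandwidth (or batching/weight reuse) does.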
Imagine a world where computing power reaches a historic practical equivalent of two human brains. In this blog article, our Director of Systems Software, Cristian Olar, explores how our revolutionary Metis AIPU achieves a remarkable 200 TOPS at a fraction of today’s costs.
We are delighted to share the presentation “Insights and Trends of Machine Learning for Computer Vision”, recently given at several conferences by our Head of Machine Learning, Bram-Ernst Verhoef, and our Algorithm and Quantisation researcher, Martino Dazzi.
In this blog post, we review the V-MLP literature, compare V-MLPs to CNNs and ViTs, and attempt to extract the ingredients that really matter for efficient and accurate deep learning-based computer vision.
Convolutional Neural Networks (CNNs) have been dominant in computer vision applications for over a decade. Today, they are being outperformed and replaced by Vision Transformers (ViTs), which offer a higher learning capacity. The fastest ViTs are essentially CNN/Transformer hybrids, combining the best of both worlds.