What are transformers?

Transformers are a neural network architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. The design revolutionized natural language processing (NLP) by building entirely on self-attention, a mechanism that lets a model weigh the relationship between every pair of words in a sequence at once, capturing long-range context more effectively than earlier architectures like RNNs.
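
To make that concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside a transformer layer. It is written in plain NumPy with toy sizes; the dimensions and random weights are illustrative only, and real transformers add multiple attention heads, masking, and positional information on top of this.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project each token into query/key/value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every token with every other token
    weights = softmax(scores, axis=-1)        # each row is a probability distribution over tokens
    return weights @ V                        # each output is a weighted mix of all value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                       # toy sizes: 4 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8): one updated vector per token
```

The key point the code makes visible: every token's output depends on every other token in the sequence in a single step, rather than information flowing one position at a time as in an RNN.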

Because the transformer is a general-purpose blueprint, many teams have since used it to build models for tasks like translation, summarization, and text generation, including OpenAI's GPT.

What is the difference between GPT and a transformer?

If the transformer is the general architecture, GPT (Generative Pre-trained Transformer) is a specific implementation of it developed by OpenAI. It is a family of models (GPT-1, GPT-2, GPT-3, GPT-4, and so on) designed for generative tasks such as text completion and conversation. Each model is pretrained on large text corpora and can then be prompted or fine-tuned for specific downstream tasks.
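
To show what "prompted for downstream tasks" looks like in practice, here is a minimal sketch using the Hugging Face transformers library (the original does not name a toolkit, so that choice is an assumption). It loads GPT-2, the most recent GPT whose weights are openly downloadable, and generates a continuation of a prompt; it assumes transformers and PyTorch are installed.

```python
# A minimal sketch, assuming `pip install transformers torch` has been run.
# GPT-2 is used because its weights are openly available; later GPT models
# are accessed through OpenAI's API rather than downloaded.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=25, do_sample=False)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```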

A few transformers we will look at today

Llama 3

Nous Hermes 2 (Mistral DPO)

DeepSeek R1
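
All three are open-weight models, so each can be loaded with the same Hugging Face transformers API sketched above. Here is a hedged example for Nous Hermes 2; the repository id is my assumption about the Hub naming, so verify it before running. A 7B model in float16 needs roughly 16 GB of memory, and Llama 3 additionally requires accepting Meta's license on the Hub.

```python
# A sketch only: the repo id below is an assumption about the Hugging Face Hub
# naming for Nous Hermes 2 (Mistral DPO); check the Hub for the exact id.
# device_map="auto" also requires the `accelerate` package to be installed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="NousResearch/Nous-Hermes-2-Mistral-7B-DPO",  # assumed repo id
    torch_dtype=torch.float16,   # halve memory use versus float32
    device_map="auto",           # place weights on GPU(s) if available
)
print(generator("What is a transformer?", max_new_tokens=50)[0]["generated_text"])
```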