What are transformers?

Transformers are a neural network architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. The design revolutionized natural language processing (NLP) by building entirely on self-attention, a mechanism that lets a model weigh the relationship between every pair of words in a sequence at once, capturing long-range context more effectively than earlier architectures like RNNs.
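
To make that concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside a transformer layer. It is written in plain NumPy with toy sizes; the dimensions and random weights are illustrative only, and real transformers add multiple attention heads, masking, and positional information on top of this.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project each token into query/key/value vectors
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every token with every other token
    weights = softmax(scores, axis=-1)        # each row is a probability distribution over tokens
    return weights @ V                        # each output is a weighted mix of all value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                       # toy sizes: 4 tokens, 8-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8): one updated vector per token
```

The key point the code makes visible: every token's output depends on every other token in the sequence in a single step, rather than information flowing one position at a time as in an RNN.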

Because the transformer is a general-purpose blueprint, many teams have since used it to build models for tasks like translation, summarization, and text generation, including OpenAI's GPT.

What is the difference between GPT and a transformer?

If the transformer is the general architecture, GPT (Generative Pre-trained Transformer) is a specific implementation of it developed by OpenAI. It is a family of models (GPT-1, GPT-2, GPT-3, GPT-4, and so on) designed for generative tasks such as text completion and conversation. Each model is pretrained on large text corpora and can then be prompted or fine-tuned for specific downstream tasks.
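
To show what "prompted for downstream tasks" looks like in practice, here is a minimal sketch using the Hugging Face transformers library (the original does not name a toolkit, so that choice is an assumption). It loads GPT-2, the most recent GPT whose weights are openly downloadable, and generates a continuation of a prompt; it assumes transformers and PyTorch are installed.

```python
# A minimal sketch, assuming `pip install transformers torch` has been run.
# GPT-2 is used because its weights are openly available; later GPT models
# are accessed through OpenAI's API rather than downloaded.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=25, do_sample=False)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```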

A few transformers we will look at today

Llama 3

Nous Hermes 2 (Mistral DPO)

DeepSeek R1
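
All three are open-weight models, so each can be loaded with the same Hugging Face transformers API sketched above. Here is a hedged example for Nous Hermes 2; the repository id is my assumption about the Hub naming, so verify it before running. A 7B model in float16 needs roughly 16 GB of memory, and Llama 3 additionally requires accepting Meta's license on the Hub.

```python
# A sketch only: the repo id below is an assumption about the Hugging Face Hub
# naming for Nous Hermes 2 (Mistral DPO); check the Hub for the exact id.
# device_map="auto" also requires the `accelerate` package to be installed.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="NousResearch/Nous-Hermes-2-Mistral-7B-DPO",  # assumed repo id
    torch_dtype=torch.float16,   # halve memory use versus float32
    device_map="auto",           # place weights on GPU(s) if available
)
print(generator("What is a transformer?", max_new_tokens=50)[0]["generated_text"])
```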