Chapter 5: Positional Encoding in Transformers
5.2 Understanding Positional Encoding
The positional encoding for a position $p$ in the sequence and a dimension $i$ in the embedding space is computed as:
$PE_{(p,\,2i)} = \sin\!\left(p / 10000^{2i/d_{model}}\right)$
$PE_{(p,\,2i+1)} = \cos\!\left(p / 10000^{2i/d_{model}}\right)$
where:
- $PE_{(p,2i)}$ and $PE_{(p,2i+1)}$ are the positional encodings for the position $p$ and dimensions $2i$ and $2i+1$.
- $p$ is the position in the sequence.
- $i$ indexes the dimension pairs, so $2i$ and $2i+1$ run over the even and odd embedding dimensions ($0 \le i < d_{model}/2$).
- $d_{model}$ is the dimensionality of the embeddings.
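As a quick worked example with $d_{model} = 512$: the first dimension pair ($i = 0$) has frequency $1/10000^{0} = 1$, so at position $p = 3$ we get $PE_{(3,0)} = \sin(3) \approx 0.141$ and $PE_{(3,1)} = \cos(3) \approx -0.990$. The second pair ($i = 1$) has frequency $1/10000^{2/512} \approx 0.965$, so $PE_{(3,2)} = \sin(3 \times 0.965) \approx 0.245$.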
These formulas assign each position a vector of sinusoids. Each dimension pair $(2i, 2i+1)$ oscillates at its own fixed frequency $1/10000^{2i/d_{model}}$, so the wavelengths form a geometric progression from $2\pi$ (for $i = 0$) up to $10000 \cdot 2\pi$ (for the last pair); nothing about this encoding is learned. This structure makes it easy for the model to attend to relative positions, because for any fixed offset $k$, $PE_{pos+k}$ can be expressed as a linear function of $PE_{pos}$, as the short derivation below shows.
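Writing $\omega_i = 1/10000^{2i/d_{model}}$ for the frequency of pair $i$ (a shorthand introduced here only for compactness), the angle-addition identities give:
$PE_{(p+k,\,2i)} = \sin((p+k)\,\omega_i) = PE_{(p,\,2i)}\cos(k\,\omega_i) + PE_{(p,\,2i+1)}\sin(k\,\omega_i)$
$PE_{(p+k,\,2i+1)} = \cos((p+k)\,\omega_i) = PE_{(p,\,2i+1)}\cos(k\,\omega_i) - PE_{(p,\,2i)}\sin(k\,\omega_i)$
The coefficients $\cos(k\,\omega_i)$ and $\sin(k\,\omega_i)$ depend only on the offset $k$, not on the position $p$, so moving $k$ steps along the sequence amounts to applying a fixed rotation to each dimension pair.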
Let's take a look at how to implement these positional encoding formulas in Python:
import numpy as np
import matplotlib.pyplot as plt

def positional_encoding(sequence_length, d_model):
    # Column vector of positions: shape (sequence_length, 1).
    positions = np.arange(sequence_length)[:, np.newaxis]
    # One frequency per dimension pair: 1 / 10000^(2i / d_model), shape (d_model // 2,).
    div_terms = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))
    pos_enc = np.zeros((sequence_length, d_model))
    # Even dimensions get the sine, odd dimensions the cosine.
    pos_enc[:, 0::2] = np.sin(positions * div_terms)
    pos_enc[:, 1::2] = np.cos(positions * div_terms)
    return pos_enc

pos_enc = positional_encoding(50, 512)

# Visualize the encodings as a heatmap: one row per position, one column per dimension.
plt.figure(figsize=(12, 8))
plt.pcolormesh(pos_enc, cmap='viridis')
plt.xlabel('Depth')
plt.xlim((0, 512))
plt.ylim((50, 0))
plt.ylabel('Position')
plt.colorbar()
plt.show()
This code generates a 2D NumPy array of positional encodings for a sequence of length 50 and an embedding dimension of 512, then renders it as a heatmap with one row per position and one column per embedding dimension. All values lie between -1 and 1; the low-index dimensions oscillate rapidly as you move through the positions, while the high-index dimensions vary so slowly that they are nearly constant over 50 positions.
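In a Transformer, this encoding is added element-wise to the token embeddings before the first attention layer. The snippet below is a minimal sketch of that step; `token_embeddings` is a randomly generated stand-in introduced here purely for illustration (in a real model it would come from a learned embedding lookup).

# Hypothetical stand-in for the output of an embedding layer (illustration only).
sequence_length, d_model = 50, 512
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(sequence_length, d_model))

# The positional encoding is added element-wise, row by row, so each token's
# vector now also carries information about where it sits in the sequence.
model_input = token_embeddings + positional_encoding(sequence_length, d_model)
print(model_input.shape)  # (50, 512)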