Quiz Part I
Answers
Multiple Choice Questions
- c) To enable machines to process, understand, and generate human language.
- b) It ignores word order and context.
- c) Supervised Learning
- b) To introduce non-linearity to the model.
- b) They capture semantic relationships between words.
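The claim that embeddings capture semantic relationships can be illustrated with cosine similarity. This is a minimal sketch using hand-picked toy vectors, not values from any trained model:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional embeddings (illustrative values only)
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])
apple = np.array([0.1, 0.2, 0.95])

print(cosine_similarity(king, queen))  # high: semantically related words
print(cosine_similarity(king, apple))  # low: unrelated words
```

In a real embedding space (e.g. word2vec or GloVe), nearby vectors correspond to words that appear in similar contexts, which is what makes the "semantic relationships" answer correct.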
True/False Questions
- False - Self-attention allows tokens to attend to all other tokens in a sequence, not just preceding tokens.
- True - Sparse attention reduces computation by focusing on relevant subsets.
- False - Transformers eliminate the need for RNNs by using attention mechanisms and parallel processing.
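The distinction behind the first True/False answer — attending to all tokens versus only preceding ones — comes down to an optional causal mask. A minimal sketch (the mask construction via `np.triu` is one common idiom, not the only one):

```python
import numpy as np

def attention_weights(Q, K, causal=False):
    """Softmax attention weights, optionally with a causal (look-ahead) mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # Position i may only attend to positions j <= i
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

X = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, d_k = 8
W = attention_weights(X, X, causal=True)
print(np.round(W, 2))  # upper triangle is zero: no attending to future tokens
```

With `causal=False` every token attends to every other token (standard self-attention); with `causal=True` the upper-triangular scores are masked out, as in decoder-style language models.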
Short Answer Questions
- RNNs face challenges with long-range dependencies because they process sequences sequentially, making it difficult to retain information from earlier parts of a long sequence. Additionally, the vanishing gradient problem during backpropagation limits the effective learning of distant dependencies.
- In the attention mechanism:
- Query (Q): Represents the token for which the model seeks relevant context.
- Key (K): Encodes the features of all tokens in the sequence.
- Value (V): Holds the actual information associated with each token.
The model computes the dot product of Q and K to obtain attention scores, normalizes them with a softmax, and uses the resulting weights to combine the values (V) into the final context-aware representation.
Code-Based Question
Solution:
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """
    Compute scaled dot-product attention.
    Q: Queries, shape (n_q, d_k)
    K: Keys,    shape (n_k, d_k)
    V: Values,  shape (n_k, d_v)
    """
    d_k = Q.shape[-1]
    scores = np.dot(Q, K.T) / np.sqrt(d_k)        # Scaled dot product
    scores -= scores.max(axis=-1, keepdims=True)  # Shift scores for numerical stability
    weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)  # Softmax
    output = np.dot(weights, V)                   # Weighted sum of values
    return output, weights

# Example inputs
Q = np.array([[1, 0, 1]])
K = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
V = np.array([[0.5, 1.0], [0.2, 0.8], [0.9, 0.3]])

output, weights = scaled_dot_product_attention(Q, K, V)
print("Attention Weights:\n", weights)
print("Attention Output:\n", output)
```
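Beyond the single-head solution above, Transformers run several attention heads in parallel and concatenate their outputs. The sketch below illustrates the shape bookkeeping only; the random matrices stand in for learned projection weights, so the name `multi_head_attention` and its signature are illustrative, not a standard API:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Stable softmax attention: returns the weighted sum of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(X, num_heads, rng):
    """Toy multi-head attention: random projections stand in for learned weights."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        heads.append(scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv))
    return np.concatenate(heads, axis=-1)  # (seq_len, d_model)

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 8))  # 4 tokens, d_model = 8
out = multi_head_attention(X, num_heads=2, rng=rng)
print(out.shape)  # (4, 8)
```

Each head projects the input into a smaller subspace (`d_head = d_model // num_heads`), applies the same attention computation as the solution above, and the concatenated result restores the original model dimension.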
Congratulations!
Completing this quiz demonstrates your understanding of the foundational concepts of NLP, machine learning, and attention mechanisms. As you proceed to the next part, you’ll build on these ideas to explore Transformers and their transformative applications.