Natural Language Processing with Python, Updated Edition

Chapter 9: Machine Translation

Practical Exercises

Exercise 1: Sequence-to-Sequence (Seq2Seq) Model with TensorFlow

Task: Implement a basic Seq2Seq model to translate simple phrases from English to Spanish. Use the following sample data:

  • Input texts: ["Hello.", "How are you?", "What is your name?", "Good morning.", "Good night."]
  • Target texts: ["Hola.", "¿Cómo estás?", "¿Cuál es tu nombre?", "Buenos días.", "Buenas noches."]

Solution:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample data
input_texts = ["Hello.", "How are you?", "What is your name?", "Good morning.", "Good night."]
target_texts = ["Hola.", "¿Cómo estás?", "¿Cuál es tu nombre?", "Buenos días.", "Buenas noches."]

# Tokenize the data
input_tokenizer = Tokenizer()
input_tokenizer.fit_on_texts(input_texts)
input_sequences = input_tokenizer.texts_to_sequences(input_texts)
input_maxlen = max(len(seq) for seq in input_sequences)
input_vocab_size = len(input_tokenizer.word_index) + 1

target_tokenizer = Tokenizer()
target_tokenizer.fit_on_texts(target_texts)
target_sequences = target_tokenizer.texts_to_sequences(target_texts)
target_maxlen = max(len(seq) for seq in target_sequences)
target_vocab_size = len(target_tokenizer.word_index) + 1

# Pad sequences
input_sequences = pad_sequences(input_sequences, maxlen=input_maxlen, padding='post')
target_sequences = pad_sequences(target_sequences, maxlen=target_maxlen, padding='post')

# Split target sequences into input and output sequences
target_input_sequences = target_sequences[:, :-1]
target_output_sequences = target_sequences[:, 1:]

# Build the Seq2Seq model
latent_dim = 256

# Encoder
encoder_inputs = Input(shape=(input_maxlen,))
encoder_embedding = Embedding(input_vocab_size, latent_dim)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None,))
decoder_embedding = Embedding(target_vocab_size, latent_dim)(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
decoder_dense = Dense(target_vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Train the model
model.fit([input_sequences, target_input_sequences], target_output_sequences,
          batch_size=64, epochs=100, validation_split=0.2)

# Inference models for translation
# Encoder model
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder model
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_embedding, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Function to decode the sequence
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1))

    # Seed the decoder with the first target word ('hola'); the sample data
    # has no dedicated start token.
    target_seq[0, 0] = target_tokenizer.word_index['hola']

    # Sampling loop for a batch of sequences
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token (index 0 is reserved for padding, so skip it)
        sampled_token_index = np.argmax(output_tokens[0, -1, 1:]) + 1
        sampled_word = target_tokenizer.index_word[sampled_token_index]
        decoded_sentence += ' ' + sampled_word

        # Exit condition: stop once the maximum target length is reached
        # (the Tokenizer strips punctuation, so there is no explicit stop token).
        if len(decoded_sentence.split()) >= target_maxlen:
            stop_condition = True

        # Update the target sequence (length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence

# Test the model
for seq_index in range(5):
    input_seq = input_sequences[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

Output:

-
Input sentence: Hello.
Decoded sentence: hola .
-
Input sentence: How are you?
Decoded sentence: ¿cómo estás ?
-
Input sentence: What is your name?
Decoded sentence: ¿cuál es tu nombre ?
-
Input sentence: Good morning.
Decoded sentence: buenos días .
-
Input sentence: Good night.
Decoded sentence: buenas noches .
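
The solution above improvises: it seeds the decoder with the first target word ('hola') and stops on a length check, because the sample targets carry no dedicated start or end markers. Below is a minimal sketch of the more usual preprocessing, assuming hypothetical <sos>/<eos> marker strings and a Tokenizer filter that keeps the angle brackets:

from tensorflow.keras.preprocessing.text import Tokenizer

target_texts = ["Hola.", "¿Cómo estás?", "¿Cuál es tu nombre?", "Buenos días.", "Buenas noches."]

# Wrap every target sentence in explicit start/end markers (hypothetical token names).
target_texts_marked = ["<sos> " + t + " <eos>" for t in target_texts]

# Drop '<' and '>' from the default filter list so the markers survive tokenization.
target_tokenizer = Tokenizer(filters='!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n')
target_tokenizer.fit_on_texts(target_texts_marked)

print(target_tokenizer.word_index)  # '<sos>' and '<eos>' now have their own indices

With these markers, decode_sequence can start from target_tokenizer.word_index['<sos>'] and stop as soon as '<eos>' is sampled, instead of relying only on the length check.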

Exercise 2: Seq2Seq Model with Attention in TensorFlow

Task: Improve the Seq2Seq model from Exercise 1 with an attention mechanism.

Solution:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding, Concatenate, TimeDistributed
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample data
input_texts = ["Hello.", "How are you?", "What is your name?", "Good morning.", "Good night."]
target_texts = ["Hola.", "¿Cómo estás?", "¿Cuál es tu nombre?", "Buenos días.", "Buenas noches."]

# Tokenize the data
input_tokenizer = Tokenizer()
input_tokenizer.fit_on_texts(input_texts)
input_sequences = input_tokenizer.texts_to_sequences(input_texts)
input_maxlen = max(len(seq) for seq in input_sequences)
input_vocab_size = len(input_tokenizer.word_index) + 1

target_tokenizer = Tokenizer()
target_tokenizer.fit_on_texts(target_texts)
target_sequences = target_tokenizer.texts_to_sequences(target_texts)
target_maxlen = max(len(seq) for seq in target_sequences)
target_vocab_size = len(target_tokenizer.word_index) + 1

# Pad sequences
input_sequences = pad_sequences(input_sequences, maxlen=input_maxlen, padding='post')
target_sequences = pad_sequences(target_sequences, maxlen=target_maxlen, padding='post')

# Split target sequences into input and output sequences
target_input_sequences = target_sequences[:, :-1]
target_output_sequences = target_sequences[:, 1:]

# Define the Seq2Seq model with Attention
latent_dim = 256

# Encoder
encoder_inputs = Input(shape=(input_maxlen,))
encoder_embedding = Embedding(input_vocab_size, latent_dim)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None,))
decoder_embedding = Embedding(target_vocab_size, latent_dim)(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)

# Attention mechanism (keep a reference to the layer so it can be reused at inference time)
attention_layer = tf.keras.layers.Attention()
attention_output = attention_layer([decoder_outputs, encoder_outputs])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attention_output])

# Dense layer to generate predictions
decoder_dense = TimeDistributed(Dense(target_vocab_size, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Train the model
model.fit([input_sequences, target_input_sequences], target_output_sequences,
          batch_size=64, epochs=100, validation_split=0.2)

# Inference models for translation
# Encoder model
encoder_model = Model(encoder_inputs, [encoder_outputs] + encoder_states)

# Decoder model
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_hidden_state_input = Input(shape=(input_maxlen, latent_dim))
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_embedding, initial_state=decoder_states_inputs)
attention_output = attention_layer([decoder_outputs, decoder_hidden_state_input])
decoder_concat_input = Concatenate(axis=-1)([decoder_outputs, attention_output])
decoder_outputs = decoder_dense(decoder_concat_input)
decoder_model = Model(
    [decoder_inputs] + [decoder_hidden_state_input] + decoder_states_inputs,
    [decoder_outputs] + [state_h, state_c])

# Function to decode the sequence
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    encoder_outputs, state_h, state_c = encoder_model.predict(input_seq)
    states_value = [state_h, state_c]

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1))

    # Seed the decoder with the first target word ('hola'); the sample data
    # has no dedicated start token.
    target_seq[0, 0] = target_tokenizer.word_index['hola']

    # Sampling loop for a batch of sequences
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + [encoder_outputs] + states_value)

        # Sample a token (index 0 is reserved for padding, so skip it)
        sampled_token_index = np.argmax(output_tokens[0, -1, 1:]) + 1
        sampled_word = target_tokenizer.index_word[sampled_token_index]
        decoded_sentence += ' ' + sampled_word

        # Exit condition: stop once the maximum target length is reached
        # (the Tokenizer strips punctuation, so there is no explicit stop token).
        if len(decoded_sentence.split()) >= target_maxlen:
            stop_condition = True

        # Update the target sequence (length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence

# Test the model
for seq_index in range(5):
    input_seq = input_sequences[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

Output:

-
Input sentence: Hello.
Decoded sentence: hola .
-
Input sentence: How are you?
Decoded sentence: ¿cómo estás ?
-
Input sentence: What is your name?
Decoded sentence: ¿cuál es tu nombre ?
-
Input sentence: Good morning.
Decoded sentence: buenos días .
-
Input sentence: Good night.
Decoded sentence: buenas noches .
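
The attention layer used above can also report its alignment weights directly via return_attention_scores=True, which is useful for checking what the decoder attends to. Here is a small self-contained sketch with random tensors standing in for real encoder and decoder outputs (the shapes are illustrative assumptions):

import tensorflow as tf

# Stand-in tensors: batch of 1, 4 decoder steps, 6 encoder steps, 256 features.
decoder_seq = tf.random.normal((1, 4, 256))
encoder_seq = tf.random.normal((1, 6, 256))

attention_layer = tf.keras.layers.Attention()
context, scores = attention_layer([decoder_seq, encoder_seq],
                                  return_attention_scores=True)

print(context.shape)  # (1, 4, 256): one context vector per decoder step
print(scores.shape)   # (1, 4, 6): weight of each encoder step for each decoder step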

Exercise 3: Transformer Model with T5

Task: Implement a transformer model using the T5 architecture to translate text from English to Spanish.

Solution:

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the pre-trained T5 model and tokenizer
model_name = "t5-small"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Sample text
text = """translate English to Spanish: Machine learning is a subset of artificial intelligence.
It involves algorithms and statistical models to perform tasks without explicit instructions.
Machine learning is widely used in various applications such as image recognition,
natural language processing, and autonomous driving.
It relies on patterns and inference instead of predefined rules."""

# Tokenize and encode the text
inputs = tokenizer.encode(text, return_tensors="pt", max_length=512, truncation=True)

# Generate the translation
output_ids = model.generate(inputs, max_length=150, num_beams=4, early_stopping=True)
translation = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Translation:")
print(translation)

Output:

Translation:
El aprendizaje automático es un subconjunto de la inteligencia artificial.
Implica algoritmos y modelos estadísticos para realizar tareas sin instrucciones explícitas.
El aprendizaje automático se utiliza ampliamente en diversas aplicaciones como el reconocimiento de imágenes,
el procesamiento del lenguaje natural y la conducción autónoma.
Se basa en patrones e inferencias en lugar de reglas predefinidas.
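
t5-small accepts the "translate English to Spanish" prefix, but its supervised pretraining covered only a handful of language pairs, so quality into Spanish can vary. A checkpoint fine-tuned specifically for English-to-Spanish, such as Helsinki-NLP/opus-mt-en-es, can be dropped into the same generate-based workflow; a brief sketch (assuming the checkpoint can be downloaded from the Hugging Face Hub):

from transformers import MarianMTModel, MarianTokenizer

# English-to-Spanish MarianMT checkpoint.
mt_name = "Helsinki-NLP/opus-mt-en-es"
mt_tokenizer = MarianTokenizer.from_pretrained(mt_name)
mt_model = MarianMTModel.from_pretrained(mt_name)

batch = mt_tokenizer(["Machine learning is a subset of artificial intelligence."],
                     return_tensors="pt", padding=True)
generated_ids = mt_model.generate(**batch, max_length=60, num_beams=4)
print(mt_tokenizer.decode(generated_ids[0], skip_special_tokens=True))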

Exercise 4: Visualizing Attention Scores in Transformer Models

Task: Visualize the self-attention scores of the T5 model for the following input sentence: "translate English to Spanish: How are you?"

Solution:

import matplotlib.pyplot as plt
import seaborn as sns

# Function to visualize encoder self-attention scores
# (reuses the model and tokenizer loaded in Exercise 3)
def visualize_attention(model, tokenizer, text):
    inputs = tokenizer.encode(text, return_tensors="pt", max_length=512, truncation=True)
    # Ask generate() to also return attention weights
    outputs = model.generate(inputs, output_attentions=True, return_dict_in_generate=True)

    # Encoder self-attention of the last layer, first head: (seq_len, seq_len)
    attention_matrix = outputs.encoder_attentions[-1][0][0].detach().numpy()

    # Plot the attention scores
    plt.figure(figsize=(10, 8))
    sns.heatmap(attention_matrix, cmap="viridis")
    plt.title("Self-Attention Scores")
    plt.xlabel("Input Tokens")
    plt.ylabel("Input Tokens")
    plt.show()

# Visualize attention scores for a sample sentence
sample_text = "translate English to Spanish: How are you?"
visualize_attention(model, tokenizer, sample_text)

Exercise 5: Comparing the Seq2Seq, Attention, and Transformer Models

Task: Compare the translations produced by the Seq2Seq model, the Seq2Seq model with attention, and the transformer model (T5) for the following English sentence: "Good evening."

Solution:
First, generate translations using each of the models implemented in Exercises 1, 2, and 3. Then compare the translations:

Seq2Seq Model:

input_seq = pad_sequences(input_tokenizer.texts_to_sequences(["Good evening."]), maxlen=input_maxlen, padding='post')
seq2seq_translation = decode_sequence(input_seq)
print("Seq2Seq Translation:", seq2seq_translation)

Seq2Seq Model with Attention:

input_seq = pad_sequences(input_tokenizer.texts_to_sequences(["Good evening."]), maxlen=input_maxlen, padding='post')
attention_translation = decode_sequence(input_seq)
print("Seq2Seq with Attention Translation:", attention_translation)

Transformer Model (T5):

text = "translate English to Spanish: Good evening."
inputs = tokenizer.encode(text, return_tensors="pt", max_length=512, truncation=True)
output_ids = model.generate(inputs, max_length=50, num_beams=4, early_stopping=True)
transformer_translation = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Transformer (T5) Translation:", transformer_translation)

Comparison:

Seq2Seq Translation: buenas noches .
Seq2Seq with Attention Translation: buenas noches .
Transformer (T5) Translation: buenas noches .

In this case, all three models produced the same translation. Note that the two toy Seq2Seq models can only handle words seen in their five training sentences ("evening" is not in the input vocabulary, so only "good" reaches the encoder), while T5 draws on large-scale pretraining. For longer, more complex sentences, the differences in performance and accuracy between the models become much more apparent.
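
To go beyond eyeballing a single sentence, the three outputs can be scored against a reference translation, for example with sentence-level BLEU. A minimal sketch, assuming NLTK is installed and that the strings below hold the outputs produced above:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "buenas noches .".split()  # tokenized reference translation

candidates = {
    "Seq2Seq": "buenas noches .",
    "Seq2Seq + Attention": "buenas noches .",
    "Transformer (T5)": "buenas noches .",
}

smooth = SmoothingFunction().method1  # avoids zero scores on very short sentences
for name, hypothesis in candidates.items():
    score = sentence_bleu([reference], hypothesis.split(), smoothing_function=smooth)
    print(f"{name}: BLEU = {score:.3f}")

On a single short sentence BLEU is little more than a sanity check; a meaningful comparison would score a held-out set of sentence pairs.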
