Project 1: Machine Translation with MarianMT
Step 4: Exploring Additional Language Pairs
MarianMT supports many language pairs, each published as a separate pre-trained model. You can experiment with models such as:
- Helsinki-NLP/opus-mt-en-de for English to German.
- Helsinki-NLP/opus-mt-fr-en for French to English.
Simply replace the model name in the model_name variable to load a different language pair. Here’s an example for English to German:
from transformers import MarianMTModel, MarianTokenizer

# Load the English-to-German tokenizer and model
model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
# Translate a sentence
text_to_translate = ["Welcome to the world of transformers!"]
inputs = tokenizer(text_to_translate, return_tensors="pt", padding=True)
translated_outputs = model.generate(**inputs)
# Decode each generated sequence back into text
translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_outputs]
print(f"Translated Text (EN to DE): {translated_texts[0]}")
Let's break down this code example:
1. Model Setup:
- Sets up the English to German translation model using "Helsinki-NLP/opus-mt-en-de"
- Initializes both the tokenizer and model from the pre-trained weights
2. Translation Process:
- Creates a list containing one sample sentence: "Welcome to the world of transformers!"
- Uses the tokenizer to convert the text into padded PyTorch tensors of token IDs that the model can process
- Generates the translation using the model's generate method
- Decodes the output back into readable text, skipping special tokens
3. Output:
- Finally prints the translated text, showing the English to German conversion
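To make the "replace the model name" step concrete, here is a minimal sketch that wraps the same calls in a reusable function. The helper names build_model_name and translate are hypothetical, not part of the transformers library, and the sketch assumes the model follows the Helsinki-NLP/opus-mt-{source}-{target} naming seen in the two examples above. Not every language pair is available under that pattern, so check that the model exists before relying on it.

```python
MODEL_PREFIX = "Helsinki-NLP/opus-mt"


def build_model_name(src_lang: str, tgt_lang: str) -> str:
    """Return the model name for a language pair, e.g. ('fr', 'en').

    Assumes the opus-mt-{src}-{tgt} naming pattern; the model may not exist
    for every pair.
    """
    return f"{MODEL_PREFIX}-{src_lang}-{tgt_lang}"


def translate(texts, src_lang, tgt_lang):
    """Translate a list of sentences with the MarianMT model for the given pair."""
    # Imported here so build_model_name can be used without loading transformers.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = build_model_name(src_lang, tgt_lang)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Same pipeline as the example above: tokenize, generate, decode.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in outputs]
```

For instance, calling translate(["Bienvenue dans le monde des transformers !"], "fr", "en") would load Helsinki-NLP/opus-mt-fr-en (downloading it on first use) and return a list with one English sentence. Note that this sketch reloads the model on every call; for repeated use you would likely load the tokenizer and model once and reuse them.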