NLP with Transformers: Advanced Techniques and Multimodal Applications

Project 1: Machine Translation with MarianMT

Step 4: Exploring Additional Language Pairs

MarianMT models are published as separate checkpoints, typically one per translation direction. You can experiment with models such as:

  • Helsinki-NLP/opus-mt-en-de for English to German.
  • Helsinki-NLP/opus-mt-fr-en for French to English.

Simply replace the model name in the model_name variable to load a different language pair. Here’s an example for English to German:

from transformers import MarianMTModel, MarianTokenizer

# Load the tokenizer and model for the chosen language pair
model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate a sentence
text_to_translate = ["Welcome to the world of transformers!"]
inputs = tokenizer(text_to_translate, return_tensors="pt", padding=True)
translated_outputs = model.generate(**inputs)
translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated_outputs]

print(f"Translated Text (EN to DE): {translated_texts[0]}")

Let's break down this code example:

1. Model Setup:

  • Sets up the English to German translation model using "Helsinki-NLP/opus-mt-en-de"
  • Initializes both the tokenizer and model from the pre-trained weights

2. Translation Process:

  • Creates a list containing one sample sentence: "Welcome to the world of transformers!"
  • Uses the tokenizer to convert the text into token IDs, returned as PyTorch tensors (return_tensors="pt") and padded to a common length (padding=True)
  • Generates the translation using the model's generate method
  • Decodes the output back into readable text, skipping special tokens

3. Output:

  • Finally prints the translated text, showing the English to German conversion
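
Since switching language pairs only means changing the model name, the steps above can be wrapped in a small helper. The following is a minimal sketch, not part of the original project: the function names marian_model_name and translate are hypothetical, and the naming pattern "Helsinki-NLP/opus-mt-{src}-{tgt}" is assumed from the two checkpoints listed above. Not every combination of language codes is guaranteed to exist as a published checkpoint, so an unavailable pair will raise an error at load time.

```python
from typing import List


def marian_model_name(src: str, tgt: str) -> str:
    """Build a checkpoint name from two language codes (hypothetical helper).

    Assumes the "Helsinki-NLP/opus-mt-{src}-{tgt}" pattern seen in the
    models listed above; the resulting checkpoint may not exist for every pair.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(texts: List[str], src: str, tgt: str) -> List[str]:
    """Translate a batch of sentences from `src` to `tgt` (hypothetical helper)."""
    # Imported lazily so the name helper can be used without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = marian_model_name(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # padding=True lets sentences of different lengths share one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, translate(["Bonjour le monde !"], "fr", "en") would load the French to English model listed above; the first call downloads the weights. This sketch reloads the model on every call, which keeps it short but is slow for repeated use, so caching the tokenizer and model per language pair would be a reasonable next step.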
