
NLP con Transformers: fundamentos y aplicaciones principales

Project 2: News Categorization Using BERT

8. Step 5: Testing with New Data

You can test your model on custom news articles to see how well it categorizes them.

# Define a custom news article
custom_text = "The stock market saw significant gains today as tech stocks rallied."

# Tokenize and predict (inference mode: dropout disabled, no gradient tracking)
model.eval()
inputs = tokenizer(custom_text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()

# Map the predicted index to its category name
categories = ['World', 'Sports', 'Business', 'Sci/Tech']
print(f"Predicted Category: {categories[predicted_label]}")

Let's break down this code that tests the BERT model with new data:

1. Input Definition:

custom_text = "The stock market saw significant gains today as tech stocks rallied."

This line creates a sample news article text that we want to categorize.

2. Processing the Input:

  • The tokenizer() function converts the text into a format BERT can understand, with these parameters:
    • return_tensors="pt": returns PyTorch tensors
    • truncation=True: truncates text that exceeds the model's maximum input length (512 tokens for BERT)
    • padding=True: pads the sequences so every input in the batch has the same length
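As a toy illustration of what truncation and padding do together, here is a sketch using whitespace-separated words instead of BERT's real WordPiece vocabulary (the token counts are purely illustrative; the real tokenizer also adds special tokens like [CLS] and [SEP]):

```python
def toy_tokenize(texts, max_length=8):
    """Mimic the batching behavior of truncation=True and padding=True."""
    batch = [text.split()[:max_length] for text in texts]   # truncation
    width = max(len(tokens) for tokens in batch)
    return [tokens + ["[PAD]"] * (width - len(tokens))      # padding
            for tokens in batch]

batch = toy_tokenize(["Tech stocks rallied today", "Markets up"])
print(batch)
# Both sequences now have the same length, so they can be stacked into one tensor.
```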

3. Making Predictions:

  • model(**inputs) runs the tokenized text through the fine-tuned BERT model, producing one logit (raw score) per category
  • outputs.logits.argmax(-1).item() selects the index of the highest-scoring logit, i.e. the most likely category
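To see exactly what argmax(-1) does, here is a self-contained sketch with a hand-made logits tensor (the values are invented for illustration; in practice the model computes them from the input):

```python
import torch

categories = ['World', 'Sports', 'Business', 'Sci/Tech']

# Fake logits for one input: shape (1, 4) — one raw score per category.
logits = torch.tensor([[-1.2, 0.3, 2.7, 0.1]])

predicted_label = logits.argmax(-1).item()  # index of the highest score
probs = logits.softmax(-1)                  # optional: turn raw scores into probabilities
print(categories[predicted_label])          # Business
print(f"confidence: {probs[0, predicted_label]:.2f}")
```

Applying softmax is optional here because argmax gives the same answer either way, but it is useful when you also want a confidence score.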

4. Category Mapping:

  • The code maps the numerical prediction to one of four categories: World, Sports, Business, or Sci/Tech
  • Finally, it prints the predicted category for the input text

This code represents the practical application of the BERT model, allowing it to categorize any new news article into one of these predefined categories.
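The steps above can be wrapped in a small reusable helper. The function name predict_category is our own, and the tokenizer and model arguments are assumed to be the objects loaded in the earlier steps of this project:

```python
import torch

def predict_category(text, tokenizer, model,
                     categories=('World', 'Sports', 'Business', 'Sci/Tech')):
    """Tokenize a text, run it through the model, and return the category name."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # inference only: no gradients needed
        outputs = model(**inputs)
    return categories[outputs.logits.argmax(-1).item()]
```

For example, predict_category(custom_text, tokenizer, model) reproduces the print statement above in a single call.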
