NLP con Transformers, técnicas avanzadas y aplicaciones multimodales
Project 4: Named Entity Recognition (NER) Pipeline with Custom Fine-Tuning

Step 5: Build the NER Pipeline

Create a pipeline that will handle three essential tasks in sequence:

  1. Process text input by breaking it down into tokens that the model can understand
  2. Use the fine-tuned model to predict and identify entities within the text, including their types and confidence scores
  3. Map these predictions back to the original text, ensuring that entity boundaries and classifications are properly aligned with the input text's structure

This pipeline will serve as the core component for transforming raw text into structured entity information that can be used in downstream applications.

from transformers import pipeline

# Load the fine-tuned model from the Trainer's output directory.
# model_name is the base checkpoint defined in the earlier steps.
ner_pipeline = pipeline(
    "ner",
    model="./results",
    tokenizer=model_name,
    aggregation_strategy="simple",  # merge subword tokens into whole entities
)

# Process a sample text input
text = "Barack Obama was born in Hawaii."
entities = ner_pipeline(text)

# Print each recognized entity with its type and confidence score
for entity in entities:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Confidence: {entity['score']:.2f}")

Here's a breakdown of what the code does:

1. Pipeline Setup

  • Imports the pipeline function from the transformers library
  • Creates a NER pipeline from the previously fine-tuned model stored in "./results"
  • Points the tokenizer at the base checkpoint (model_name) and uses the "simple" aggregation strategy to merge subword tokens into whole entities
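The effect of the "simple" aggregation strategy can be illustrated in plain Python. The subword tokens, tags, and scores below are hypothetical stand-ins for the per-token output the pipeline would return with aggregation_strategy="none"; the real merging logic lives inside transformers, so this is only a rough sketch of the idea:

```python
# Hypothetical per-token predictions (the shape you'd get without aggregation):
# BERT-style tokenizers mark subword continuations with a "##" prefix.
raw_tokens = [
    {"word": "Ba",     "entity": "B-PER", "score": 0.99},
    {"word": "##rack", "entity": "I-PER", "score": 0.98},
    {"word": "Obama",  "entity": "I-PER", "score": 0.99},
    {"word": "Hawaii", "entity": "B-LOC", "score": 0.97},
]

def aggregate_simple(tokens):
    """Merge consecutive tokens of the same entity type into one group,
    averaging their scores -- a sketch of what "simple" aggregation does."""
    groups = []
    for tok in tokens:
        tag = tok["entity"].split("-", 1)[-1]  # strip the B-/I- prefix
        if tok["entity"].startswith("B-") or not groups or groups[-1]["entity_group"] != tag:
            groups.append({"entity_group": tag, "word": "", "scores": []})
        g = groups[-1]
        if tok["word"].startswith("##"):              # subword continuation: glue on
            g["word"] += tok["word"][2:]
        else:                                         # new word: separate with a space
            g["word"] += (" " if g["word"] else "") + tok["word"]
        g["scores"].append(tok["score"])
    return [
        {"entity_group": g["entity_group"],
         "word": g["word"],
         "score": sum(g["scores"]) / len(g["scores"])}
        for g in groups
    ]

for e in aggregate_simple(raw_tokens):
    print(f"Entity: {e['word']}, Type: {e['entity_group']}, Confidence: {e['score']:.2f}")
# Entity: Barack Obama, Type: PER, Confidence: 0.99
# Entity: Hawaii, Type: LOC, Confidence: 0.97
```

Without aggregation you would have to stitch "Ba" and "##rack" back together yourself; with it, the pipeline hands you whole entities directly.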

2. Text Processing

  • Takes a sample text input ("Barack Obama was born in Hawaii.")
  • Processes the text through the NER pipeline to identify entities

3. Output Format

  • Loops through the detected entities
  • For each entity, prints three pieces of information:
    • The word/phrase identified as an entity
    • The entity type (e.g., PER for person, LOC for location)
    • A confidence score indicating how certain the model is about its prediction
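Beyond printing, each aggregated entity also carries "start" and "end" character offsets (when a fast tokenizer is used), which let you map predictions back onto the original string. A minimal sketch, using a hand-written entities list in the same shape as the pipeline output rather than real model predictions:

```python
text = "Barack Obama was born in Hawaii."

# Hand-written example in the shape the aggregated pipeline returns;
# real scores and offsets would come from ner_pipeline(text).
entities = [
    {"entity_group": "PER", "word": "Barack Obama", "score": 0.99, "start": 0,  "end": 12},
    {"entity_group": "LOC", "word": "Hawaii",       "score": 0.97, "start": 25, "end": 31},
]

def annotate(text, entities):
    """Wrap each entity span as [span](TYPE), splicing right-to-left so
    earlier offsets stay valid as the string grows."""
    out = text
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = (out[:ent["start"]]
               + f"[{out[ent['start']:ent['end']]}]({ent['entity_group']})"
               + out[ent["end"]:])
    return out

print(annotate(text, entities))
# [Barack Obama](PER) was born in [Hawaii](LOC).
```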

This pipeline serves as a core component for converting raw text into structured entity information that can be used in various applications.
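As a taste of downstream use, the flat entity list can be folded into a structure keyed by entity type, optionally dropping low-confidence predictions. The sample entities and the 0.5 threshold below are illustrative assumptions, not values from the course:

```python
from collections import defaultdict

# Sample list in the shape the pipeline returns; real values would come
# from entities = ner_pipeline(text).
entities = [
    {"entity_group": "PER", "word": "Barack Obama", "score": 0.99},
    {"entity_group": "LOC", "word": "Hawaii",       "score": 0.97},
    {"entity_group": "ORG", "word": "Senate",       "score": 0.40},  # low confidence
]

def group_entities(entities, min_score=0.5):
    """Bucket entity mentions by type, skipping predictions below min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

print(group_entities(entities))
# {'PER': ['Barack Obama'], 'LOC': ['Hawaii']}
```

A sensible threshold depends on your application: higher values trade recall for precision.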
