Project 4: Named Entity Recognition (NER) Pipeline with Custom Fine-Tuning
Step 5: Build the NER Pipeline
Create a pipeline that will handle three essential tasks in sequence:
- Process text input by breaking it down into tokens that the model can understand
- Use the fine-tuned model to predict and identify entities within the text, including their types and confidence scores
- Map these predictions back to the original text, ensuring that entity boundaries and classifications are properly aligned with the input text's structure
This pipeline will serve as the core component for transforming raw text into structured entity information that can be used in downstream applications.
```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the training output directory.
# This assumes the tokenizer was saved alongside the model
# (tokenizer.save_pretrained("./results")); otherwise, pass the base checkpoint name.
ner_pipeline = pipeline(
    "ner",
    model="./results",
    tokenizer="./results",
    aggregation_strategy="simple",  # merge subword tokens into whole entities
)

# Process text input
text = "Barack Obama was born in Hawaii."
entities = ner_pipeline(text)

# Print recognized entities
for entity in entities:
    print(f"Entity: {entity['word']}, Type: {entity['entity_group']}, Confidence: {entity['score']:.2f}")
```
Here's a breakdown of what the code does:
1. Pipeline Setup
- Imports the pipeline function from the transformers library
- Creates a NER pipeline using the previously fine-tuned model stored in "./results"
- Loads the matching tokenizer and uses the "simple" aggregation strategy to merge subword tokens into whole entities
2. Text Processing
- Takes a sample text input ("Barack Obama was born in Hawaii.")
- Processes the text through the NER pipeline to identify entities
3. Output Format
- Loops through the detected entities
- For each entity, prints three pieces of information:
- The word/phrase identified as an entity
- The entity type (e.g., PER for person, LOC for location)
- A confidence score indicating how certain the model is about its prediction
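Because each prediction carries a confidence score, a common downstream step is to discard low-confidence entities before using them. The sketch below assumes entity dicts shaped like the pipeline's aggregated output; the scores and the 0.90 threshold are illustrative, not from a real run:

```python
# Hypothetical entities shaped like the pipeline's aggregated output;
# scores are illustrative placeholders, not real model outputs.
entities = [
    {"entity_group": "PER", "score": 0.99, "word": "Barack Obama"},
    {"entity_group": "LOC", "score": 0.62, "word": "Hawaii"},
]

# Keep only predictions the model is reasonably confident about.
THRESHOLD = 0.90
confident = [e for e in entities if e["score"] >= THRESHOLD]
for e in confident:
    print(f"{e['word']} ({e['entity_group']})")
```

Tune the threshold against a held-out set; a fixed cutoff trades recall for precision.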
This pipeline serves as a core component for converting raw text into structured entity information that can be used in various applications.
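With aggregation_strategy="simple", each entity dict also includes start and end character offsets, which is what makes the mapping back to the original text reliable. The entity values below are stand-ins for a real pipeline result, but the slicing logic is the point:

```python
text = "Barack Obama was born in Hawaii."

# Stand-in for a real pipeline result; the start/end fields are character
# offsets into the original string, as returned by the "simple" strategy.
entities = [
    {"entity_group": "PER", "score": 0.99, "word": "Barack Obama", "start": 0, "end": 12},
    {"entity_group": "LOC", "score": 0.98, "word": "Hawaii", "start": 25, "end": 31},
]

# Recover each entity span directly from the source text via its offsets.
spans = [text[e["start"]:e["end"]] for e in entities]
for e, span in zip(entities, spans):
    print(f"{span!r} [{e['entity_group']}]")
```

Slicing by offsets rather than trusting the "word" field avoids artifacts from tokenizer de-tokenization (stray "##" markers, collapsed whitespace).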