Chapter 4: Deploying and Scaling Transformer Models
4.4 Practical Exercises
This section provides practical exercises to reinforce your understanding of deploying and scaling transformer models using ONNX, TensorFlow Lite, FastAPI, and Hugging Face Spaces. Each exercise includes a solution with detailed code examples to guide your implementation.
Exercise 1: Convert a Transformer Model to ONNX
Task: Convert a Hugging Face transformer model to the ONNX format and perform inference using ONNXRuntime.
Instructions:
- Load a pretrained transformer model.
- Convert the model to ONNX.
- Use ONNXRuntime for inference.
Solution:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import onnx
import onnxruntime as ort
# Step 1: Load the model and tokenizer
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Step 2: Export the model to ONNX
model.eval()  # export in inference mode
dummy_input = tokenizer("This is a test input.", return_tensors="pt")
torch.onnx.export(
    model,
    args=(dummy_input["input_ids"], dummy_input["attention_mask"]),
    f="bert_model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    # Mark both batch size and sequence length as dynamic so the exported
    # graph accepts inputs of any length, not just the dummy input's shape
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "logits": {0: "batch_size"},
    },
    opset_version=14,  # opset 14 covers the ops used by recent BERT implementations
)
onnx.checker.check_model("bert_model.onnx")  # sanity-check the exported graph
# Step 3: Perform inference with ONNXRuntime
onnx_session = ort.InferenceSession("bert_model.onnx")
inputs = tokenizer("This is a great day!", return_tensors="np")
onnx_outputs = onnx_session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
print("ONNX Inference Result:", onnx_outputs[0])
Exercise 2: Deploy a Transformer Model with TensorFlow Lite
Task: Convert a Hugging Face transformer model to TensorFlow Lite and perform inference.
Instructions:
- Load a TensorFlow model.
- Convert it to TensorFlow Lite format.
- Use the TensorFlow Lite interpreter for inference.
Solution:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
import tensorflow as tf
# Step 1: Load the model and tokenizer
model_name = "bert-base-uncased"
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Step 2: Save the model as a TensorFlow SavedModel and convert it to TFLite
model.save("saved_model")
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
# Some BERT ops may fall outside the builtin TFLite op set, so allow TensorFlow ops as a fallback
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
with open("bert_model.tflite", "wb") as f:
    f.write(tflite_model)
# Step 3: Perform inference with TensorFlow Lite
interpreter = tf.lite.Interpreter(model_path="bert_model.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Pad to a fixed length and resize the interpreter's inputs to match that shape
inputs = tokenizer("TensorFlow Lite is efficient!", return_tensors="np",
                   padding="max_length", max_length=128, truncation=True)
for detail in input_details:
    interpreter.resize_tensor_input(detail["index"], [1, 128])
interpreter.allocate_tensors()
# BERT expects input_ids, attention_mask, and token_type_ids; feed each tensor by name
for detail in interpreter.get_input_details():
    for key in ("input_ids", "attention_mask", "token_type_ids"):
        if key in detail["name"]:
            interpreter.set_tensor(detail["index"], inputs[key].astype("int32"))
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]["index"])
print("TFLite Inference Result:", output_data)
Exercise 3: Create an API with FastAPI
Task: Build a FastAPI application for sentiment analysis using a pretrained model.
Instructions:
- Set up a FastAPI application.
- Load a Hugging Face sentiment analysis pipeline.
- Create an endpoint to analyze the sentiment of input text.
Solution:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import pipeline
# Step 1: Initialize FastAPI and define input schema
class TextInput(BaseModel):
    text: str

app = FastAPI()
# pipeline("sentiment-analysis") defaults to distilbert-base-uncased-finetuned-sst-2-english;
# pin the checkpoint explicitly in production so library upgrades don't change behavior
model_pipeline = pipeline("sentiment-analysis")

# Step 2: Define the API endpoint
@app.post("/analyze_sentiment")
def analyze_sentiment(input: TextInput):
    try:
        result = model_pipeline(input.text)
        return {"text": input.text, "sentiment": result[0]}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Step 3: Run the server with uvicorn (assumes this file is saved as app.py)
# Command: uvicorn app:app --reload
Test the API:
curl -X POST "http://127.0.0.1:8000/analyze_sentiment" \
-H "Content-Type: application/json" \
-d '{"text": "Transformers are incredible!"}'
Response:
{
  "text": "Transformers are incredible!",
  "sentiment": {
    "label": "POSITIVE",
    "score": 0.9998
  }
}
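Beyond curl, you can exercise the endpoint directly from Python. A minimal test sketch using FastAPI's bundled TestClient; it assumes the application above is saved as app.py and that the httpx package is installed:
from fastapi.testclient import TestClient
from app import app  # assumes the FastAPI code above lives in app.py

client = TestClient(app)

def test_analyze_sentiment():
    response = client.post("/analyze_sentiment", json={"text": "Transformers are incredible!"})
    assert response.status_code == 200
    body = response.json()
    # The default checkpoint returns POSITIVE or NEGATIVE labels
    assert body["sentiment"]["label"] in {"POSITIVE", "NEGATIVE"}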
Exercise 4: Deploy a Gradio App on Hugging Face Spaces
Task: Create and deploy a Gradio app for text summarization.
Instructions:
- Create a Gradio app for summarization.
- Deploy it on Hugging Face Spaces.
Solution:
import gradio as gr
from transformers import pipeline
# Step 1: Load the summarization pipeline
summarizer = pipeline("summarization", model="t5-small")
# Step 2: Define the Gradio function
def summarize_text(input_text):
    summary = summarizer(input_text, max_length=50, min_length=20, do_sample=False)
    return summary[0]["summary_text"]

# Step 3: Create the Gradio interface
interface = gr.Interface(
    fn=summarize_text,
    inputs="text",
    outputs="text",
    title="Text Summarizer",
    description="Paste in a passage of text and get a summarized version."
)
# Step 4: Launch the app locally
interface.launch()
To Deploy on Hugging Face Spaces:
- Create a new Space on huggingface.co and select the Gradio SDK.
- Push this code as app.py, along with a requirements.txt, to the Space's Git repository (or upload the files through the web UI).
- Your app will be hosted at https://huggingface.co/spaces/<username>/<space_name>.
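The Space installs the app's Python dependencies from a requirements.txt at the repository root. A minimal sketch for this summarizer; the package list is an assumption based on the code above, and the Gradio SDK already provides gradio itself, so it is not listed:
transformers
torch
sentencepiece  # tokenizer dependency commonly needed for T5-family models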
These exercises provided hands-on experience with deploying transformer models using ONNX, TensorFlow Lite, FastAPI, and Hugging Face Spaces. By completing these tasks, you’ve gained practical knowledge in optimizing models, creating APIs, and deploying interactive applications. Experiment further with these tools to build scalable NLP solutions for real-world use cases.