Chapter 4: Deploying and Scaling Transformer Models
4.4 Practical Exercises
This section provides practical exercises to reinforce your understanding of deploying and scaling transformer models using ONNX, TensorFlow Lite, FastAPI, and Hugging Face Spaces. Each exercise includes a solution with detailed code examples to guide your implementation.
Exercise 1: Convert a Transformer Model to ONNX
Task: Convert a Hugging Face transformer model to the ONNX format and perform inference using ONNXRuntime.
Instructions:
- Load a pretrained transformer model.
- Convert the model to ONNX.
- Use ONNXRuntime for inference.
Solution:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import onnx
import onnxruntime as ort
# Step 1: Load the model and tokenizer
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Step 2: Export the model to ONNX
model.eval()  # export in inference mode
dummy_input = tokenizer("This is a test input.", return_tensors="pt")
torch.onnx.export(
    model,
    args=(dummy_input["input_ids"], dummy_input["attention_mask"]),
    f="bert_model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    # Mark both batch size and sequence length as dynamic so the exported
    # graph accepts inputs of any length, not just the dummy input's shape
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "logits": {0: "batch_size"},
    },
    opset_version=14,  # opset 14 covers the ops used by recent BERT implementations
)
onnx.checker.check_model("bert_model.onnx")  # sanity-check the exported graph
# Step 3: Perform inference with ONNXRuntime
onnx_session = ort.InferenceSession("bert_model.onnx")
inputs = tokenizer("This is a great day!", return_tensors="np")
onnx_outputs = onnx_session.run(None, {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
})
print("ONNX Inference Result:", onnx_outputs[0])
Exercise 2: Deploy a Transformer Model with TensorFlow Lite
Task: Convert a Hugging Face transformer model to TensorFlow Lite and perform inference.
Instructions:
- Load a TensorFlow model.
- Convert it to TensorFlow Lite format.
- Use the TensorFlow Lite interpreter for inference.
Solution:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
import tensorflow as tf
# Step 1: Load the model and tokenizer
model_name = "bert-base-uncased"
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Step 2: Save the model as a TensorFlow SavedModel and convert it to TFLite
model.save("saved_model")
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
# Some BERT ops may fall outside the builtin TFLite op set, so allow TensorFlow ops as a fallback
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
with open("bert_model.tflite", "wb") as f:
    f.write(tflite_model)
# Step 3: Perform inference with TensorFlow Lite
interpreter = tf.lite.Interpreter(model_path="bert_model.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Pad to a fixed length and resize the interpreter's inputs to match that shape
inputs = tokenizer("TensorFlow Lite is efficient!", return_tensors="np",
                   padding="max_length", max_length=128, truncation=True)
for detail in input_details:
    interpreter.resize_tensor_input(detail["index"], [1, 128])
interpreter.allocate_tensors()
# BERT expects input_ids, attention_mask, and token_type_ids; feed each tensor by name
for detail in interpreter.get_input_details():
    for key in ("input_ids", "attention_mask", "token_type_ids"):
        if key in detail["name"]:
            interpreter.set_tensor(detail["index"], inputs[key].astype("int32"))
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]["index"])
print("TFLite Inference Result:", output_data)
Exercise 3: Create an API with FastAPI
Task: Build a FastAPI application for sentiment analysis using a pretrained model.
Instructions:
- Set up a FastAPI application.
- Load a Hugging Face sentiment analysis pipeline.
- Create an endpoint to analyze the sentiment of input text.
Solution:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import pipeline
# Step 1: Initialize FastAPI and define input schema
class TextInput(BaseModel):
    text: str

app = FastAPI()
# pipeline("sentiment-analysis") defaults to distilbert-base-uncased-finetuned-sst-2-english;
# pin the checkpoint explicitly in production so library upgrades don't change behavior
model_pipeline = pipeline("sentiment-analysis")

# Step 2: Define the API endpoint
@app.post("/analyze_sentiment")
def analyze_sentiment(input: TextInput):
    try:
        result = model_pipeline(input.text)
        return {"text": input.text, "sentiment": result[0]}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Step 3: Run the server with uvicorn (assumes this file is saved as app.py)
# Command: uvicorn app:app --reload
Test the API:
curl -X POST "http://127.0.0.1:8000/analyze_sentiment" \
-H "Content-Type: application/json" \
-d '{"text": "Transformers are incredible!"}'
Response:
{
  "text": "Transformers are incredible!",
  "sentiment": {
    "label": "POSITIVE",
    "score": 0.9998
  }
}
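Beyond curl, you can exercise the endpoint directly from Python. A minimal test sketch using FastAPI's bundled TestClient; it assumes the application above is saved as app.py and that the httpx package is installed:
from fastapi.testclient import TestClient
from app import app  # assumes the FastAPI code above lives in app.py

client = TestClient(app)

def test_analyze_sentiment():
    response = client.post("/analyze_sentiment", json={"text": "Transformers are incredible!"})
    assert response.status_code == 200
    body = response.json()
    # The default checkpoint returns POSITIVE or NEGATIVE labels
    assert body["sentiment"]["label"] in {"POSITIVE", "NEGATIVE"}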
Exercise 4: Deploy a Gradio App on Hugging Face Spaces
Task: Create and deploy a Gradio app for text summarization.
Instructions:
- Create a Gradio app for summarization.
- Deploy it on Hugging Face Spaces.
Solution:
import gradio as gr
from transformers import pipeline
# Step 1: Load the summarization pipeline
summarizer = pipeline("summarization", model="t5-small")
# Step 2: Define the Gradio function
def summarize_text(input_text):
    summary = summarizer(input_text, max_length=50, min_length=20, do_sample=False)
    return summary[0]["summary_text"]

# Step 3: Create the Gradio interface
interface = gr.Interface(
    fn=summarize_text,
    inputs="text",
    outputs="text",
    title="Text Summarizer",
    description="Paste in a passage of text and get a summarized version."
)
# Step 4: Launch the app locally
interface.launch()
To Deploy on Hugging Face Spaces:
- Create a new Space on huggingface.co and select the Gradio SDK.
- Push this code as app.py, along with a requirements.txt, to the Space's Git repository (or upload the files through the web UI).
- Your app will be hosted at https://huggingface.co/spaces/<username>/<space_name>.
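The Space installs the app's Python dependencies from a requirements.txt at the repository root. A minimal sketch for this summarizer; the package list is an assumption based on the code above, and the Gradio SDK already provides gradio itself, so it is not listed:
transformers
torch
sentencepiece  # tokenizer dependency commonly needed for T5-family models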
These exercises provided hands-on experience with deploying transformer models using ONNX, TensorFlow Lite, FastAPI, and Hugging Face Spaces. By completing these tasks, you’ve gained practical knowledge in optimizing models, creating APIs, and deploying interactive applications. Experiment further with these tools to build scalable NLP solutions for real-world use cases.