Code icon

The App is Under a Quick Maintenance

We apologize for the inconvenience. Please come back later

Menu iconMenu iconNLP con Transformers: fundamentos y aplicaciones principales
NLP con Transformers: fundamentos y aplicaciones principales

Project 1: Sentiment Analysis with BERT

6. Step 3: Tokenizing the Dataset

BERT requires tokenized input, so we’ll use its tokenizer to preprocess the text.

Code Example: Tokenization

# Load BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Apply tokenization
tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_test = test_data.map(tokenize_function, batched=True)

6. Step 3: Tokenizing the Dataset

BERT requires tokenized input, so we’ll use its tokenizer to preprocess the text.

Code Example: Tokenization

# Load BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Apply tokenization
tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_test = test_data.map(tokenize_function, batched=True)

6. Step 3: Tokenizing the Dataset

BERT requires tokenized input, so we’ll use its tokenizer to preprocess the text.

Code Example: Tokenization

# Load BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Apply tokenization
tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_test = test_data.map(tokenize_function, batched=True)

6. Step 3: Tokenizing the Dataset

BERT requires tokenized input, so we’ll use its tokenizer to preprocess the text.

Code Example: Tokenization

# Load BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Apply tokenization
tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_test = test_data.map(tokenize_function, batched=True)