
Chapter 6: Advanced Level Exercises

Advanced Level Exercises Part 1

Exercise 1: File Parsing
Exercise 2: Data Analysis
Exercise 3: Web Scraping
Exercise 4: Multithreading
Exercise 5: Machine Learning
Exercise 6: Natural Language Processing
Exercise 7: Web Development
Exercise 8: Data Visualization
Exercise 9: Machine Learning
Exercise 10: Data Analysis
Exercise 11: Computer Vision
Exercise 12: Natural Language Processing
Exercise 13: Web Scraping
Exercise 14: Big Data Processing
Exercise 15: DevOps
Exercise 16: Reinforcement Learning
Exercise 17: Time Series Analysis
Exercise 18: Computer Networking
Exercise 19: Data Analysis and Visualization
Exercise 20: Machine Learning
Exercise 21: Natural Language Processing
Exercise 22: Web Scraping
Exercise 23: Database Interaction
Exercise 24: Parallel Processing
Exercise 25: Image Processing
Exercise 26: Machine Learning
Exercise 27: Web Development
Exercise 28: Data Streaming
Exercise 29: Natural Language Processing
Exercise 30: Distributed Systems
Exercise 31: Data Visualization
Exercise 32: Data Engineering
Exercise 33: Natural Language Generation
Exercise 34: Machine Learning
Exercise 35: Computer Vision
Exercise 36: Network Programming
Exercise 37: Cloud Computing
Exercise 38: Natural Language Processing
Exercise 39: Deep Learning
Exercise 40: Data Analysis
Exercise 41: Data Science
Exercise 42: Machine Learning
Exercise 43: Web Scraping
Exercise 44: Database Programming
Exercise 45: Cloud Computing
Exercise 46: Natural Language Processing
Exercise 47: Big Data
Exercise 48: Cybersecurity
Exercise 49: Machine Learning
Exercise 50: Computer Vision

Exercise 1: File Parsing

Concepts:

File I/O
Regular expressions

Description: Write a Python script that reads a text file and extracts all the URLs present in it. The output should be a list of URLs.

Solution:

import re

# Open the file for reading
with open('input_file.txt', 'r') as f:
    # Read the file contents
    file_contents = f.read()

    # Use regular expression to extract URLs
    urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)

# Print the list of URLs
print(urls)
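
To try the idea without a file on disk, the sketch below runs a URL pattern against an in-memory string. The sample text and the simplified pattern are illustrative assumptions, not the pattern used in the solution: it simply captures everything after the scheme up to the next whitespace, quote, or angle bracket.

```python
import re

# Simplified, hypothetical pattern: the scheme, then any run of characters
# that are not whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')

def extract_urls(text):
    """Return all URL-like substrings found in text, in order of appearance."""
    return URL_PATTERN.findall(text)

sample = 'Docs at https://example.com/docs and mirror at http://example.org/a?b=1 today.'
print(extract_urls(sample))
```

A permissive pattern like this one can pick up trailing punctuation (for example a final period), so real inputs may need extra cleanup.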

Exercise 2: Data Analysis

Concepts:

File I/O
Data manipulation
Pandas library

Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.

Solution:

import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()

# Print the total revenue for each product category
print(total_revenue)
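
To make clear what the group-and-sum step computes, here is a minimal standard-library sketch of the same aggregation. The sample rows are made up for illustration; the column names match the ones assumed by the solution.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for sales_data.csv
SAMPLE_CSV = """Product Category,Sales Revenue
Books,10.5
Toys,4.0
Books,2.5
"""

def revenue_by_category(csv_file):
    """Sum 'Sales Revenue' per 'Product Category' from an open CSV file."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row['Product Category']] += float(row['Sales Revenue'])
    return dict(totals)

print(revenue_by_category(io.StringIO(SAMPLE_CSV)))
```

For anything beyond a toy file, the pandas version is likely the better choice; this sketch only spells out the logic.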

Exercise 3: Web Scraping

Concepts:

Web scraping
Requests library
Beautiful Soup library
CSV file I/O

Description: Write a Python script that extracts the title and price of every product listed on an e-commerce website and stores them in a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Make a GET request to the website
response = requests.get('https://www.example.com/products')

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all product titles and prices
titles = [title.text for title in soup.find_all('h3', class_='product-title')]
prices = [price.text for price in soup.find_all('div', class_='product-price')]

# Zip the titles and prices together
data = list(zip(titles, prices))

# Write the data to a CSV file
with open('product_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)
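
One thing to keep in mind is that zip silently stops at the shorter list, so a product with a missing price would shift or drop rows. The sketch below is one possible guard, writing to an in-memory buffer with a header row; the function name, the header labels, and the sample values are assumptions made for illustration.

```python
import csv
import io

def rows_to_csv(titles, prices):
    """Pair titles with prices and return the CSV text, header included."""
    if len(titles) != len(prices):
        raise ValueError('titles and prices must have the same length')
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(['title', 'price'])      # header row
    writer.writerows(zip(titles, prices))    # one row per product
    return buffer.getvalue()

print(rows_to_csv(['Widget', 'Gadget'], ['$9.99', '$19.50']))
```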

Exercise 4: Multithreading

Concepts:

Multithreading
Requests library
Threading library

Description: Write a Python script that uses multithreading to download several images from a list of URLs concurrently.

Solution:

import requests
import threading

# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']

# Function to download an image from a URL
def download_image(url):
    response = requests.get(url)
    with open(url.split('/')[-1], 'wb') as f:
        f.write(response.content)

# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
    thread = threading.Thread(target=download_image, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
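
An alternative to managing Thread objects by hand is a thread pool from the standard library, which caps the number of concurrent workers. The sketch below assumes a stand-in worker (fake_download, a hypothetical name) that only derives the file name, so it runs without network access; a real worker would perform the request and write the file.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    """Stand-in for a real download: return the file name taken from the URL."""
    return url.split('/')[-1]

def download_all(urls, worker=fake_download, max_workers=4):
    """Run worker on every URL in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))

urls = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg']
print(download_all(urls))
```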

Exercise 5: Machine Learning

Concepts:

Machine learning
scikit-learn library

Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)

In this exercise, we first read a dataset into a pandas dataframe. We then split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module. We train a linear regression model on the training data using the LinearRegression class from the sklearn.linear_model module. Finally, we use the trained model to predict the output for the testing data and evaluate its performance using the mean squared error metric.
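
The mean squared error is simply the average of the squared differences between the true and predicted values. A minimal pure-Python sketch, with made-up numbers, shows the arithmetic:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Squared errors are 1, 0 and 4, so the result is 5 / 3
print(mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```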

Exercise 6: Natural Language Processing

Concepts:

Natural language processing
Sentiment analysis
NLTK library

Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained natural language processing model.

Solution:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon (only needed the first time)
nltk.download('vader_lexicon')

# Read the text file into a string
with open('input_file.txt', 'r') as f:
    text = f.read()

# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()

# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)

# Print the sentiment scores
print(scores)

In this exercise, we first read a text file into a string. We then create a SentimentIntensityAnalyzer object from the nltk.sentiment.vader module. We use its polarity_scores method to perform sentiment analysis on the text and obtain a dictionary of sentiment scores.
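
The dictionary returned by polarity_scores has the keys neg, neu, pos, and compound. One common way to turn it into a single label is to threshold the compound score; the cutoff of 0.05 used below is a widely used convention rather than something fixed by the exercise, and the sample scores are made up.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER-style score dictionary to 'positive', 'negative' or 'neutral'."""
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Hypothetical scores, shaped like the output of polarity_scores
example_scores = {'neg': 0.0, 'neu': 0.3, 'pos': 0.7, 'compound': 0.8}
print(label_sentiment(example_scores))
```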

Exercise 7: Web Development

Concepts:

Web development
Flask framework
File uploads

Description: Write a Python script that creates a web application with the Flask framework that lets users upload a file and performs some processing on it.

Solution:

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename
import os

app = Flask(__name__)

# Set the path for file uploads and make sure the folder exists
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Route for the home page
@app.route('/')
def index():
    return render_template('index.html')

# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
    # Get the uploaded file
    file = request.files['file']

    # Sanitize the client-supplied name before using it in a path
    filename = secure_filename(file.filename)

    # Save the file to the uploads folder
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))

    # Perform processing on the file
    # ...

    return 'File uploaded successfully'

if __name__ == '__main__':
    app.run(debug=True)

In this exercise, we first import the Flask module and create a Flask application. We set up a route for the home page that returns an HTML template. We set up a route for file uploads that receives an uploaded file and saves it to a designated uploads folder. Any processing of the uploaded file can be done inside the upload function.
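
Before processing an upload, it is usually worth checking that the file type is one the application expects. The helper below is a small sketch of such a check; the name allowed_file and the set of extensions are assumptions chosen for illustration, and the function could be called at the top of the upload function.

```python
# Hypothetical whitelist of extensions the application accepts
ALLOWED_EXTENSIONS = {'txt', 'csv', 'png', 'jpg'}

def allowed_file(filename):
    """Return True if filename has an extension from ALLOWED_EXTENSIONS."""
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

print(allowed_file('report.CSV'), allowed_file('script.exe'))
```

An extension check is only a first filter; it says nothing about the actual content of the file.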

Exercise 8: Data Visualization

Concepts:

Data visualization
Matplotlib library
Candlestick charts

Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_finance import candlestick_ohlc
import matplotlib.dates as mdates

# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])

# Convert the date column to Matplotlib dates format
df['Date'] = df['Date'].apply(mdates.date2num)

# Create a figure and axis objects
fig, ax = plt.subplots()

# Plot the candlestick chart (each row must be ordered as date, open, high, low, close)
candlestick_ohlc(ax, df.values, width=0.6, colorup='green', colordown='red')

# Format the x-axis as dates
ax.xaxis_date()

# Set the axis labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.set_title('Stock Market Data')

# Display the chart
plt.show()

In this exercise, we first read a CSV file containing stock market data into a pandas dataframe. We convert the date column to Matplotlib's date format and create the figure and axes objects. We plot the candlestick chart using the candlestick_ohlc function from the mpl_finance module. We format the x-axis as dates and set the axis labels and title. Finally, we display the chart using the show function from the matplotlib.pyplot module.

Exercise 9: Machine Learning

Concepts:

Machine learning
scikit-learn library

Description: Write a Python script that reads a dataset containing information about different types of flowers and trains a machine learning model to predict the flower type from its features.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']], df['species'], test_size=0.2, random_state=42)

# Train a logistic regression model on the training data
model = LogisticRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In this exercise, we first read a dataset containing information about different types of flowers into a pandas dataframe. We split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module. We train a logistic regression model on the training data using the LogisticRegression class from the sklearn.linear_model module. Finally, we use the trained model to predict the output for the testing data and evaluate its performance using the accuracy score metric.
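
Accuracy is the fraction of predictions that match the true labels. The following pure-Python sketch, with made-up labels, shows the computation:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    if not y_true:
        raise ValueError('at least one label is required')
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

true_labels = ['setosa', 'versicolor', 'virginica', 'setosa']
predicted = ['setosa', 'virginica', 'virginica', 'setosa']
print(accuracy(true_labels, predicted))  # 3 of 4 labels match
```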

Exercise 10: Data Analysis

Concepts:

Data analysis
Recommender systems
Collaborative filtering
Surprise library

Description: Write a Python script that reads a CSV file containing customer purchase data and builds a recommender system that suggests products to customers based on their purchase history.

Solution:

import pandas as pd
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise import accuracy
from surprise.model_selection import train_test_split

# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')

# Convert the pandas dataframe to a surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)

# Use the model to predict the output for the testing data
predictions = model.test(testset)

# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)

# Recommend products to customers based on their purchase history
for customer_id in df['customer_id'].unique():
    products = df[df['customer_id'] == customer_id]['product_id'].values
    for product_id in df['product_id'].unique():
        if product_id not in products:
            rating = model.predict(customer_id, product_id).est
            print(f"Customer {customer_id} might like product {product_id} with rating {rating}")

In this exercise, we first read a CSV file containing customer purchase data into a pandas dataframe. We convert the pandas dataframe into a Surprise dataset using the Reader and Dataset classes from the surprise module. We split the data into training and testing sets using the train_test_split function from the surprise.model_selection module. We train an SVD model on the training data using the SVD class from the surprise module. We use the trained model to predict the output for the testing data and evaluate its performance using the root mean squared error metric. Finally, we recommend products to customers based on their purchase history using the trained model.
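
The solution prints a predicted rating for every product a customer has not bought, which can be a long list. A possible refinement is to keep only the highest-scoring items. The sketch below does this with the standard library; the dictionary of ratings is a made-up stand-in for the values that model.predict(...).est would return.

```python
import heapq

def top_n(candidates, predict, n=3):
    """Return the n (item, score) pairs with the highest predicted score."""
    scored = ((item, predict(item)) for item in candidates)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])

# Hypothetical predicted ratings for products one customer has not bought
fake_ratings = {'p1': 3.2, 'p2': 4.7, 'p3': 4.1, 'p4': 2.0}
print(top_n(fake_ratings, fake_ratings.get, n=2))
```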

Exercise 11: Computer Vision

Concepts:

Computer vision
Object detection
OpenCV library
Pre-trained models

Description: Write a Python script that reads an image and performs object detection on it using a pre-trained object detection model.

Solution:

import cv2

# Read the image file
img = cv2.imread('image.jpg')

# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

# Set the input image and perform object detection
model.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))
output = model.forward()

# Loop through the detected objects and draw bounding boxes around them
for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > 0.5:
        x1 = int(detection[3] * img.shape[1])
        y1 = int(detection[4] * img.shape[0])
        x2 = int(detection[5] * img.shape[1])
        y2 = int(detection[6] * img.shape[0])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Display the image with the detected objects
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this exercise, we first read an image file into a NumPy array using the imread function from OpenCV's cv2 module. We load a pre-trained object detection model using the readNetFromTensorflow function from the cv2.dnn module. We set the input image for the model and perform object detection using the setInput and forward methods of the model object. Finally, we loop through the detected objects and draw bounding boxes around them using the rectangle function from the cv2 module.
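
In the loop above, each detection row stores its box corners as fractions of the image size at positions 3 to 6, and they are scaled by the image width and height to get pixel coordinates. The sketch below isolates that conversion; the function name and the sample detection row are made up for illustration.

```python
def to_pixel_box(detection, width, height):
    """Scale the normalized corners stored at detection[3:7] to integer pixels."""
    x1 = int(detection[3] * width)
    y1 = int(detection[4] * height)
    x2 = int(detection[5] * width)
    y2 = int(detection[6] * height)
    return x1, y1, x2, y2

# Hypothetical row: the first three entries (including the confidence at index 2)
# are followed by the normalized box corners
detection = [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0]
print(to_pixel_box(detection, width=400, height=200))
```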

Exercise 12: Natural Language Processing

Concepts:

Natural language processing
Topic modeling
Latent Dirichlet Allocation
Gensim library

Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel

# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
    text = f.readlines()

# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]

# Tokenize the text into words
tokens = [line.split() for line in text]

# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)

# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]

# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)

# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
    print(topic)

In this exercise, we first read a text file into a list of strings. We preprocess the text by removing newlines, converting it to lowercase, and tokenizing it into words using the split method. We create a dictionary of words and their frequency, and build a bag-of-words representation of the text using the doc2bow method of the dictionary object. We train an LDA model on the corpus using the LdaModel class from the gensim.models module. Finally, we print the topics and their associated words using the print_topics method of the model object.
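
A bag-of-words representation replaces each document with a list of (word id, count) pairs. The pure-Python sketch below imitates that idea on two tiny made-up documents; the helper names are hypothetical, and the ids it assigns are not necessarily the ones Gensim would choose.

```python
from collections import Counter

def build_vocabulary(docs):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for the tokens of doc found in vocab."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())

docs = [['the', 'cat', 'sat'], ['the', 'cat', 'saw', 'the', 'dog']]
vocab = build_vocabulary(docs)
print(to_bow(docs[1], vocab))
```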

Exercise 13: Web Scraping

Concepts:

Web scraping
Beautiful Soup library
Requests library
CSV file handling

Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL of the website to scrape
url = 'https://www.example.com/products'

# Send a request to the website and get the response
response = requests.get(url)

# Parse the HTML content of the response using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')

# Write the product information to a CSV file
with open('products.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])
    for listing in listings:
        name = listing.find('h3').text
        price = listing.find('span', class_='price').text
        description = listing.find('p').text
        writer.writerow([name, price, description])

In this exercise, we first define the URL of the website to scrape and send a request to it using the get function from the requests module. We parse the HTML content of the response using Beautiful Soup and find all the product listings on the page using the find_all method. We write the product information to a CSV file using the csv module.

Exercise 14: Big Data Processing

Concepts:

Big data processing
PySpark
Data transformations
Aggregation
Parquet file format

Description: Write a PySpark script that reads a CSV file containing customer purchase data, performs some data transformations and aggregation, and saves the results to a Parquet file.

Solution:

from pyspark.sql import SparkSession

# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()

# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)

# Perform some data transformations
df = df.filter(df['purchase_date'].between('2020-01-01', '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
df = df.groupBy('customer_id').sum('price')

# Save the results to a Parquet file
df.write.parquet('customer_spending.parquet')

In this exercise, we first create a SparkSession object using the SparkSession class from the pyspark.sql module. We read a CSV file containing customer purchase data into a Spark DataFrame using the read.csv method. We perform some data transformations on the DataFrame using the filter, select, and groupBy methods. Finally, we save the results to a Parquet file using the write.parquet method.

Exercise 15: DevOps

Concepts:

DevOps
Fabric library

Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.

Solution:

from fabric import Connection

# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = 'password'

# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'

# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})

# Upload the local files to the remote server
c.put(local_path, remote_path)

# Install any required dependencies on the remote server
c.run('sudo apt-get update && sudo apt-get install -y python3-pip')
c.run('pip3 install -r requirements.txt')

# Start the web application on the remote server
c.run('python3 app.py')

In this exercise, we first define the host and user credentials for the remote server. We define the path to the web application on the local machine and on the remote server. We create a connection to the remote server using the Connection class from the fabric module. We upload the local files to the remote server using the put method of the connection object. We install any required dependencies on the remote server using the run method of the connection object. Finally, we start the web application on the remote server using the run method.

Exercise 16: Reinforcement Learning

Concepts:

Reinforcement learning
Q-learning
OpenAI Gym library

Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.

Solution:

import gym
import numpy as np

# Create an OpenAI Gym environment for the game
env = gym.make('FrozenLake-v0')

# Define the Q-table for the agent
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set the hyperparameters for the algorithm
alpha = 0.8
gamma = 0.95
epsilon = 0.1
num_episodes = 2000

# Train the agent using the Q-learning algorithm
for i in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])
        next_state, reward, done, _ = env.step(action)
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
        state = next_state

# Test the agent by playing the game using the Q-table
state = env.reset()
done = False
while not done:
    action = np.argmax(Q[state, :])
    next_state, reward, done, _ = env.step(action)
    state = next_state
    env.render()

In this exercise, we first create an OpenAI Gym environment for the game using the make function from the gym module. We define the Q-table for the agent as a NumPy array and set the hyperparameters for the Q-learning algorithm. We train the agent with Q-learning by iterating over a specified number of episodes and updating the Q-table based on the rewards and next states. Finally, we test the agent by playing the game using the Q-table and visualize the game using the render method.
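
The heart of the training loop is the Q-table update, which blends the old estimate with the reward plus the discounted best value of the next state. The sketch below isolates that single step using plain lists; the function name and the tiny two-state table are made up for illustration, and the default hyperparameters match the ones in the solution.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.8, gamma=0.95):
    """Apply one Q-learning update in place and return the new Q-value."""
    best_next = max(q_table[next_state])            # value of the best next action
    target = reward + gamma * best_next             # bootstrapped return estimate
    q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target
    return q_table[state][action]

# Hypothetical table: 2 states x 2 actions
q = [[0.0, 0.0], [0.0, 1.0]]
print(q_update(q, state=0, action=1, reward=0.0, next_state=1))
```

With these numbers the old value is 0 and the target is 0.95, so the updated entry moves 80% of the way toward the target.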

Exercise 17: Time Series Analysis

Concepts:

Time series analysis
Data preprocessing
Data visualization
ARIMA model
Statsmodels library

Description: Write a Python script that reads a CSV file containing time series data, performs some data preprocessing and visualization, and fits a time series model to the data.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')

# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Resample the data to a monthly frequency and forward-fill any missing values
df = df.resample('M').mean()
df = df.ffill()

# Visualize the data
plt.plot(df)
plt.show()

# Fit an ARIMA model to the data
model = sm.tsa.ARIMA(df, order=(1, 1, 1))
results = model.fit()

# Print the model summary
print(results.summary())

In this exercise, we first read a CSV file containing time series data into a pandas dataframe. We convert the date column to datetime objects and set it as the index. We resample the data to a monthly frequency and fill any missing values using forward fill. We visualize the data using the plot function from the matplotlib.pyplot module. Finally, we fit an ARIMA model to the data using the ARIMA class exposed through the statsmodels.api module, and print the model summary using the summary method of the results object.
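
Forward fill replaces each missing value with the most recent known value before it. A minimal pure-Python sketch of the idea, using None to mark gaps in a made-up series:

```python
def forward_fill(values):
    """Replace each None with the last non-None value seen before it."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

print(forward_fill([None, 1.0, None, None, 2.5, None]))
```

A gap at the very start has nothing to copy from, so it stays empty.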

Exercise 18: Computer Networking

Concepts:

Computer networking
TCP/IP protocol
Socket programming

Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.

Solution:

import socket

# Define the host and port for the server
host = 'localhost'
port = 12345

# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the host and port
s.bind((host, port))

# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)

# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)

# Send data to the client
conn.sendall(b'Hello, client!')

# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())

# Close the connection and the listening socket
conn.close()
s.close()

In this exercise, we first define the host and port for the server. We create a socket object using the socket function from the socket module and bind the socket to the host and port using the bind method. We listen for incoming connections using the listen method and accept a client connection using the accept method, which returns a connection object and the client's address. We send data to the client using the sendall method of the connection object and receive data from the client using the recv method. Finally, we close the connection using the close method.
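
To exercise the server, a matching client is needed. The sketch below is one possible client, not part of the original solution: the function name run_client and the reply text are assumptions. It reads the server's greeting first and then sends a reply, mirroring the order used by the server.

```python
import socket

def run_client(host, port, message=b'Hello, server!'):
    """Connect to the server, read its greeting, send a reply, return the greeting."""
    with socket.create_connection((host, port), timeout=5) as sock:
        greeting = sock.recv(1024)   # the server sends first
        sock.sendall(message)        # then waits for our reply
    return greeting

# With the server above running in another terminal:
# print(run_client('localhost', 12345).decode())
```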

Exercise 19: Data Analysis and Visualization

Concepts:

Data analysis
Data visualization
PDF report generation
Pandas library
Matplotlib library
ReportLab library

Description: Write a Python script that reads a CSV file containing sales data for a retail store, performs some data analysis and visualization, and saves the results to a PDF report.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Calculate the total sales by category and month
totals = df.groupby(['category', 'month'])['sales'].sum()

# Plot the total sales by category and month
fig, axes = plt.subplots(nrows=len(df['category'].unique()), ncols=1, figsize=(8.5, 11))
for i, category in enumerate(df['category'].unique()):
    totals[category].plot(ax=axes[i], kind='bar', title=category)
plt.tight_layout()

# Save the plot to a PDF report
c = canvas.Canvas('sales_report.pdf', pagesize=letter)
c.drawString(50, 750, 'Sales Report')
c.drawString(50, 700, 'Total Sales by Category and Month')
plt.savefig('sales_plot.png')
c.drawImage('sales_plot.png', 50, 500, 500, 250)
c.showPage()
c.save()

In this exercise, we first read a CSV file containing sales data for a retail store into a pandas dataframe. We calculate the total sales by category and month using the groupby and sum methods. We plot the total sales by category and month using the pandas plot method on Matplotlib axes and save the figure to a PNG file. Finally, we generate a PDF report using the Canvas class from the reportlab.pdfgen module and its drawImage method.

Exercise 20: Machine Learning

Concepts:

Machine learning
Convolutional neural networks
Keras library
MNIST dataset

Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.

Solution:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Define the convolutional neural network model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)

In this exercise, we first load the MNIST dataset using the load_data function from the keras.datasets.mnist module. We normalize the pixel values and reshape the data using NumPy. We define a convolutional neural network model using the Sequential class and several layers from the Keras layers module. We compile the model using the compile method with the Adam optimizer and the sparse categorical cross-entropy loss function. We train the model using the fit method and evaluate it on the test data using the evaluate method.
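
Sparse categorical cross-entropy takes an integer class label and the predicted probability vector, and returns the negative logarithm of the probability assigned to the true class. The pure-Python sketch below shows this for a single sample; the clipping constant eps and the example probabilities are assumptions made for illustration.

```python
import math

def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """Loss for one sample: -log of the probability given to the true class."""
    p = min(max(probs[label], eps), 1.0)  # clip to avoid log(0)
    return -math.log(p)

# Hypothetical softmax output over three classes; the true class is 1
probs = [0.1, 0.7, 0.2]
print(sparse_categorical_crossentropy(1, probs))
```

The loss shrinks toward zero as the probability of the true class approaches 1, and grows quickly when the model is confidently wrong.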

Exercise 21: Natural Language Processing

Concepts:

Natural language processing
Text preprocessing
Text representation
Topic modeling
Latent Dirichlet Allocation
Gensim library

Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import pandas as pd

# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')

# Define the stop words and remove them from the text data
stop_words = stopwords.words('english')
df['text'] = df['text'].apply(lambda x: ' '.join([word for word in word_tokenize(x.lower()) if word not in stop_words]))

# Create a document-term matrix from the text data
texts = df['text'].tolist()
tokenized = [word_tokenize(text) for text in texts]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(text) for text in tokenized]

# Perform topic modeling using LDA
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)

# Print the topics and top words for each topic
for topic in lda_model.show_topics(num_topics=num_topics):
    print('Topic {}:'.format(topic[0]))
    print(', '.join(word for word, _ in lda_model.show_topic(topic[0])))

# Extract the topic distribution for each document as a list of
# (topic_id, probability) pairs
df['topic_dist'] = [lda_model.get_document_topics(bow) for bow in corpus]

# Save the results to a CSV file
df.to_csv('text_data_topics.csv', index=False)

In this exercise, we first read a corpus of text data into a pandas dataframe. We load the English stop words with stopwords.words from the nltk.corpus module and remove them from the text using a list comprehension and the pandas apply method. We build a bag-of-words corpus from the tokenized text using gensim's Dictionary class and its doc2bow method. We perform topic modeling with Latent Dirichlet Allocation (LDA) using the LdaModel class and extract the topic distribution for each document. Finally, we save the results to a CSV file using the pandas to_csv method.
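
The bag-of-words step is easier to follow on a tiny corpus. The sketch below imitates the idea in plain Python and is not gensim's implementation: the helpers build_vocabulary and to_bow are hypothetical, the two documents are made up, and the ids gensim assigns to tokens may differ from the ones shown here.

```python
from collections import Counter

def build_vocabulary(tokenized_docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for one tokenized document."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())

# Made-up corpus, already tokenized and stripped of stop words
docs = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats", "cats"],
]
vocab = build_vocabulary(docs)
bow_corpus = [to_bow(doc, vocab) for doc in docs]

print(vocab)
print(bow_corpus)
```

Each document becomes a sparse list of (id, count) pairs, which is the shape of input that an LDA model trains on.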

Exercise 22: Web Scraping

Concepts:

Web Scraping
HTML Parsing
BeautifulSoup Library
CSV File Input/Output

Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL to scrape
url = 'https://www.example.com'

# Send a GET request to the URL and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the data from the HTML content
data = []
for item in soup.find_all('div', {'class': 'item'}):
    name = item.find('h3').text
    price = item.find('span', {'class': 'price'}).text
    data.append([name, price])

# Save the data to a CSV file
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])
    for row in data:
        writer.writerow(row)

In this exercise, we first define the URL to scrape, fetch the page with the requests library, and parse the HTML content with the BeautifulSoup library. We extract the data using the find_all method of the soup object and the find method of each matched element. Finally, we save the data to a CSV file using the csv module.
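
The same extract-then-write flow can be tried offline. The sketch below swaps BeautifulSoup for the standard library's html.parser so that it runs without network access or third-party packages. The sample HTML, the ItemParser class, and the scrape_to_csv helper are all invented for illustration, and the markup is assumed to follow the item / h3 / price structure used in the solution.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page fragment that mimics the structure the solution expects
SAMPLE_HTML = """
<div class="item"><h3>Widget</h3><span class="price">$9.99</span></div>
<div class="item"><h3>Gadget</h3><span class="price">$19.50</span></div>
"""

class ItemParser(HTMLParser):
    """Collect [name, price] rows from <h3> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._field = "name"
        elif tag == "span" and ("class", "price") in attrs:
            self._field = "price"

    def handle_data(self, data):
        if self._field == "name":
            self.rows.append([data.strip(), ""])
        elif self._field == "price" and self.rows:
            self.rows[-1][1] = data.strip()

    def handle_endtag(self, tag):
        self._field = None

def scrape_to_csv(html):
    """Parse the HTML and return the extracted rows as CSV text."""
    parser = ItemParser()
    parser.feed(html)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Name", "Price"])
    writer.writerows(parser.rows)
    return buffer.getvalue()

csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing to an in-memory buffer keeps the example free of side effects; replacing it with an open file reproduces the file output of the solution.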

Exercise 23: Database Interaction

Concepts:

Database Interaction
SQLite Database
SQL Queries
SQLite3 Module

Description: Write a Python script that interacts with a database to retrieve and manipulate data.

Solution:

import sqlite3

# Connect to the database
conn = sqlite3.connect('example.db')

# Create a cursor object
c = conn.cursor()

# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
             (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')

# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")

# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)

# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")

# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")

# Commit the changes to the database
conn.commit()

# Close the database connection
conn.close()

In this exercise, we first connect to a SQLite database using the connect function from the sqlite3 module. We create a cursor object with the connection's cursor method and run SQL statements with the cursor's execute method. We retrieve data from the table with the fetchall method and print the results. We update data in the table with the UPDATE statement and delete data with the DELETE statement. Finally, we commit the changes to the database and close the connection.
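
When the values come from user input rather than fixed strings, the usual approach is to pass them as parameters instead of embedding them in the SQL text. The following is a small sketch of that variation, not the original solution: it uses an in-memory database so nothing is written to disk, and the run_demo function name is hypothetical.

```python
import sqlite3

def run_demo():
    """Insert, update, and read back one customer using ? placeholders."""
    conn = sqlite3.connect(":memory:")  # temporary database, discarded on close
    try:
        c = conn.cursor()
        c.execute(
            "CREATE TABLE customers "
            "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
        )
        # Values are passed separately from the SQL text
        c.execute(
            "INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)",
            ("John Smith", "john@example.com", "555-1234"),
        )
        c.execute(
            "UPDATE customers SET phone = ? WHERE name = ?",
            ("555-5678", "John Smith"),
        )
        conn.commit()
        c.execute(
            "SELECT name, phone FROM customers WHERE name = ?",
            ("John Smith",),
        )
        return c.fetchall()
    finally:
        conn.close()

rows = run_demo()
print(rows)
```

The try/finally block closes the connection even if one of the statements raises an error.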

Exercise 24: Parallel Processing

Concepts:

Parallel Processing
Multiprocessing
Process Pool
CPU-Bound Tasks

Description: Write a Python script that performs a time-consuming computation using parallel processing to speed it up.

Solution:

import time
import multiprocessing

# Define a CPU-bound function that takes a long time to compute
def compute(num):
    result = 0
    for i in range(num):
        result += i
    return result

if __name__ == '__main__':
    # Create a process pool with the number of CPUs available; the with
    # block shuts the pool down once the work is finished
    num_cpus = multiprocessing.cpu_count()
    with multiprocessing.Pool(num_cpus) as pool:
        # Generate a list of numbers to compute
        num_list = [10000000] * num_cpus

        # Compute the results using parallel processing
        start_time = time.time()
        results = pool.map(compute, num_list)
        end_time = time.time()

    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')

In this exercise, we first define a CPU-bound function that takes a long time to compute. We then create a process pool using the Pool class from the multiprocessing module, sized to the number of available CPUs. We generate a list of numbers to process and compute the results with the pool's map method. Finally, we print the results and the computation time.
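
The solution gives every worker the same job. A related pattern is to split one large job into chunks so that each worker handles a different part. The sketch below illustrates that idea under stated assumptions: partial_sum and split_range are hypothetical helpers, and the result is checked against the closed form n * (n - 1) / 2 for the sum of range(n).

```python
import multiprocessing

def partial_sum(bounds):
    """Sum the integers in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(range(start, stop))

def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

if __name__ == "__main__":
    n = 10_000_000
    workers = multiprocessing.cpu_count()
    # The with block shuts the pool down once the results are collected
    with multiprocessing.Pool(workers) as pool:
        total = sum(pool.map(partial_sum, split_range(n, workers)))
    print("Matches closed form:", total == n * (n - 1) // 2)
```

The `if __name__ == "__main__":` guard matters here for the same reason as in the solution: worker processes may re-import the script, and the guard keeps them from creating pools of their own.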

Exercise 25: Image Processing

Concepts:

Image Processing
Pillow Library
Image Manipulation
Image Filtering

Description: Write a Python script that performs basic image processing operations on an image file.

Solution:

from PIL import Image, ImageFilter

# Open the image file
image = Image.open('example.jpg')

# Display the original image
image.show()

# Resize the image
image = image.resize((500, 500))

# Convert the image to grayscale
image = image.convert('L')

# Apply a Gaussian blur filter
image = image.filter(ImageFilter.GaussianBlur(radius=2))

# Save the processed image to a file
image.save('processed.jpg')

# Display the processed image
image.show()

In this exercise, we first open an image file using the Image.open function from the Pillow library. We resize the image with the resize method and convert it to grayscale with the convert method and mode 'L'. We apply a Gaussian blur using the filter method with the GaussianBlur class from the ImageFilter module. Finally, we save the processed image to a file with the save method and display it with the show method.
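
The grayscale step can be illustrated without an image file. Pillow documents the conversion to mode 'L' as the weighted sum L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula to a made-up 2x2 image stored as nested lists; the to_grayscale helper is hypothetical, and Pillow's internal rounding may differ slightly from the integer division used here.

```python
def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luma values."""
    return [
        [(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
        for row in pixels
    ]

# Made-up 2x2 image: red, green, blue, and white pixels
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(image)
print(gray)
```

Green contributes the most to the result and blue the least, which is why pure green maps to a much lighter gray than pure blue.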

Advanced Level Exercises Part 1

Exercise 1: File Parsing

Concepts:

File Input/Output
Regular Expressions

Description: Write a Python script that reads a text file and extracts all the URLs it contains. The output should be a list of URLs.

Solution:

import re

# Open the file for reading
with open('input_file.txt', 'r') as f:
    # Read the file contents
    file_contents = f.read()

    # Use regular expression to extract URLs
    urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)

# Print the list of URLs
print(urls)
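
To try the pattern without creating input_file.txt, it can be run against an inline string. This is a minimal sketch using the same regular expression as the solution; the sample text and the URL_PATTERN name are made up for illustration.

```python
import re

# Same pattern as in the solution, compiled once for reuse
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

sample_text = (
    "Docs live at https://www.example.com/docs and the old mirror is "
    "http://example.org/archive?id=42 for now."
)
urls = URL_PATTERN.findall(sample_text)
print(urls)
```

Because the character classes include '.' and ',', punctuation that directly follows a URL (such as a sentence-ending period) is captured as part of the match, so real input may need a cleanup step.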

Exercise 2: Data Analysis

Concepts:

File Input/Output
Data Manipulation
Pandas Library

Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.

Solution:

import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()

# Print the total revenue for each product category
print(total_revenue)

Exercise 3: Web Scraping

Concepts:

Web Scraping
Requests Library
Beautiful Soup Library
CSV File Input/Output

Description: Write a Python script that extracts the title and price of every product listed on an e-commerce website and stores them in a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Make a GET request to the website
response = requests.get('https://www.example.com/products')

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all product titles and prices
titles = [title.text for title in soup.find_all('h3', class_='product-title')]
prices = [price.text for price in soup.find_all('div', class_='product-price')]

# Zip the titles and prices together
data = list(zip(titles, prices))

# Write the data to a CSV file
with open('product_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

Exercise 4: Multithreading

Concepts:

Multithreading
Requests Library
Threading Library

Description: Write a Python script that uses multithreading to download several images from a list of URLs concurrently.

Solution:

import requests
import threading

# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']

# Function to download an image from a URL
def download_image(url):
    response = requests.get(url)
    with open(url.split('/')[-1], 'wb') as f:
        f.write(response.content)

# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
    thread = threading.Thread(target=download_image, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

Exercise 5: Machine Learning

Concepts:

Machine Learning
scikit-learn Library

Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)

Exercise 6: Natural Language Processing

Concepts:

Natural Language Processing
Sentiment Analysis
NLTK Library

Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained natural language processing model.

Solution:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Read the text file into a string
with open('input_file.txt', 'r') as f:
    text = f.read()

# Download the VADER lexicon (only needed once) and create the analyzer
nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()

# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)

# Print the sentiment scores
print(scores)

Exercise 7: Web Development

Concepts:

Web Development
Flask Framework
File Uploads

Description: Write a Python script that creates a web application using the Flask framework that lets users upload a file and performs some processing on the file.

Solution:

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename
import os

app = Flask(__name__)

# Set the path for file uploads and make sure the folder exists
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Route for the home page
@app.route('/')
def index():
    return render_template('index.html')

# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
    # Get the uploaded file
    file = request.files['file']

    # Sanitize the client-supplied name so it cannot escape the uploads folder
    filename = secure_filename(file.filename)

    # Save the file to the uploads folder
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))

    # Perform processing on the file
    # ...

    return 'File uploaded successfully'

if __name__ == '__main__':
    app.run(debug=True)

Exercise 8: Data Visualization

Concepts:

Data Visualization
Matplotlib Library
Candlestick Charts

Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
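# mpl_finance is deprecated; its successor, mplfinance, keeps this function in original_flavor
from mplfinance.original_flavor import candlestick_ohlc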
import matplotlib.dates as mdates

# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])

# Convert the date column to Matplotlib dates format
df['Date'] = df['Date'].apply(mdates.date2num)

# Create a figure and axis objects
fig, ax = plt.subplots()

# Plot the candlestick chart (the first five CSV columns must be
# Date, Open, High, Low, Close, in that order)
candlestick_ohlc(ax, df.values, width=0.6, colorup='green', colordown='red')

# Format the x-axis as dates
ax.xaxis_date()

# Set the axis labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.set_title('Stock Market Data')

# Display the chart
plt.show()

Exercise 9: Machine Learning

Concepts:

Machine Learning
Scikit-learn Library

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']], df['species'], test_size=0.2, random_state=42)

# Train a logistic regression model on the training data
model = LogisticRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Exercise 10: Data Analysis

Concepts:

Data Analysis
Recommender Systems
Collaborative Filtering
Surprise Library

Solution:

import pandas as pd
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise import accuracy
from surprise.model_selection import train_test_split

# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')

# Convert the pandas dataframe to a surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)

# Use the model to predict the output for the testing data
predictions = model.test(testset)

# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)

# Recommend products to customers based on their purchase history
for customer_id in df['customer_id'].unique():
    products = df[df['customer_id'] == customer_id]['product_id'].values
    for product_id in df['product_id'].unique():
        if product_id not in products:
            rating = model.predict(customer_id, product_id).est
            print(f"Customer {customer_id} might like product {product_id} with rating {rating}")

Exercise 11: Computer Vision

Concepts:

Computer Vision
Object Detection
OpenCV Library
Pre-trained Models

Description: Write a Python script that reads an image and performs object detection on it using a pre-trained object detection model.

Solution:

import cv2

# Read the image file
img = cv2.imread('image.jpg')

# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

# Set the input image and perform object detection
model.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))
output = model.forward()

# Loop through the detected objects and draw bounding boxes around them
for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > 0.5:
        x1 = int(detection[3] * img.shape[1])
        y1 = int(detection[4] * img.shape[0])
        x2 = int(detection[5] * img.shape[1])
        y2 = int(detection[6] * img.shape[0])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Display the image with the detected objects
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Exercise 12: Natural Language Processing

Concepts:

Natural Language Processing
Topic Modeling
Latent Dirichlet Allocation
Gensim Library

Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel

# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
    text = f.readlines()

# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]

# Tokenize the text into words
tokens = [line.split() for line in text]

# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)

# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]

# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)

# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
    print(topic)

Exercise 13: Web Scraping

Concepts:

Web Scraping
Beautiful Soup Library
Requests Library
CSV File Handling

Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL of the website to scrape
url = 'https://www.example.com/products'

# Send a request to the website and get the response
response = requests.get(url)

# Parse the HTML content of the response using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')

# Write the product information to a CSV file
with open('products.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])
    for listing in listings:
        name = listing.find('h3').text
        price = listing.find('span', class_='price').text
        description = listing.find('p').text
        writer.writerow([name, price, description])

Exercise 14: Big Data Processing

Concepts:

Big Data Processing
PySpark
Data Transformations
Aggregation
Parquet File Format

Solution:

from pyspark.sql import SparkSession

# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()

# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)

# Perform some data transformations
df = df.filter(df['purchase_date'].between('2020-01-01', '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
df = df.groupBy('customer_id').sum('price')

# Save the results to a Parquet file
df.write.parquet('customer_spending.parquet')

Exercise 15: DevOps

Concepts:

DevOps
Fabric Library

Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.

Solution:

from fabric import Connection

# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = 'password'

# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'

# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})

# Upload the application to the remote server
# (Connection.put transfers a single file, so local_path should point to
# a file such as an archive of the app rather than a directory)
c.put(local_path, remote_path)

# Install any required dependencies on the remote server
c.run('sudo apt-get update && sudo apt-get install -y python3-pip')
c.run('pip3 install -r requirements.txt')

# Start the web application on the remote server
c.run('python3 app.py')

Exercise 16: Reinforcement Learning

Concepts:

Reinforcement Learning
Q-Learning
OpenAI Gym Library

Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.

Solution:

import gym
import numpy as np

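# Create an OpenAI Gym environment for the game
# (this script targets the classic Gym API, where reset() returns only the
# state and step() returns four values; newer Gym and Gymnasium releases
# register the environment as 'FrozenLake-v1' and changed both calls)
env = gym.make('FrozenLake-v0')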

# Define the Q-table for the agent
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set the hyperparameters for the algorithm
alpha = 0.8
gamma = 0.95
epsilon = 0.1
num_episodes = 2000

# Train the agent using the Q-learning algorithm
for i in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])
        next_state, reward, done, _ = env.step(action)
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
        state = next_state

# Test the agent by playing the game using the Q-table
state = env.reset()
done = False
while not done:
    action = np.argmax(Q[state, :])
    next_state, reward, done, _ = env.step(action)
    state = next_state
    env.render()
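
The update rule in the training loop can be checked by hand on a toy problem. The sketch below applies the same formula, with the same alpha and gamma, to a hypothetical two-state, two-action table. It does not use Gym, and the states, actions, and rewards are invented for illustration.

```python
def q_update(q_value, reward, max_next_q, alpha=0.8, gamma=0.95):
    """One Q-learning update, written the same way as in the training loop."""
    return (1 - alpha) * q_value + alpha * (reward + gamma * max_next_q)

# Hypothetical table: Q[state][action], all zeros to start
Q = [[0.0, 0.0], [0.0, 0.0]]

# Step 1: action 1 in state 1 reaches the goal (terminal) and earns reward 1
Q[1][1] = q_update(Q[1][1], reward=1.0, max_next_q=0.0)

# Step 2: action 0 in state 0 leads to state 1 with no immediate reward
Q[0][0] = q_update(Q[0][0], reward=0.0, max_next_q=max(Q[1]))

print(Q)
```

After these two steps the value learned at the goal has propagated one state backwards, discounted by gamma, which is how the agent gradually learns which early moves lead to a reward.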

Exercise 17: Time Series Analysis

Concepts:

Time Series Analysis
Data Preprocessing
Data Visualization
ARIMA Model
Statsmodels Library

Solution:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')

# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Resample the data to a monthly frequency and fill any missing values
df = df.resample('M').mean()
df = df.ffill()

# Visualize the data
plt.plot(df)
plt.show()

# Fit an ARIMA model to the data
model = sm.tsa.ARIMA(df, order=(1, 1, 1))
results = model.fit()

# Print the model summary
print(results.summary())

Exercise 18: Computer Networking

Concepts:

Computer Networking
TCP/IP Protocol
Socket Programming

Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.

Solution:

import socket

# Define the host and port for the server
host = 'localhost'
port = 12345

# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the host and port
s.bind((host, port))

# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)

# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)

# Send data to the client
conn.sendall(b'Hello, client!')

# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())

# Close the client connection and the listening socket
conn.close()
s.close()
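
The server above needs a client to talk to. The sketch below runs both sides in one script so the exchange can be observed end to end. It is an illustrative variation, not the original solution: the serve_once helper is hypothetical, the server runs in a background thread, and it binds to port 0 so the operating system picks a free port instead of the fixed 12345.

```python
import socket
import threading

def serve_once(server_sock, received):
    """Accept one client, greet it, and record what it sends back."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(b'Hello, client!')
        received.append(conn.recv(1024))

# Server side: bind to port 0 so the operating system picks a free port
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
host, port = server.getsockname()

received = []
worker = threading.Thread(target=serve_once, args=(server, received))
worker.start()

# Client side: connect, read the greeting, send a reply
with socket.create_connection((host, port)) as client:
    greeting = client.recv(1024)
    client.sendall(b'Hello, server!')

worker.join()
server.close()

print('Client received:', greeting.decode())
print('Server received:', received[0].decode())
```

For short messages like these a single recv call is enough; longer payloads would need a loop, because TCP may deliver the data in several pieces.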

Exercise 19: Data Analysis and Visualization

Concepts:

Data Analysis
Data Visualization
PDF Report Generation
Pandas Library
Matplotlib Library
ReportLab Library

Solution:

import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Calculate the total sales by category and month
# (selecting the column first avoids summing unrelated or non-numeric columns)
totals = df.groupby(['category', 'month'])['sales'].sum()

# Plot the total sales by category and month
fig, axes = plt.subplots(nrows=len(df['category'].unique()), ncols=1, figsize=(8.5, 11))
for i, category in enumerate(df['category'].unique()):
    totals[category].plot(ax=axes[i], kind='bar', title=category)
plt.tight_layout()

# Save the plot to a PDF report
c = canvas.Canvas('sales_report.pdf', pagesize=letter)
c.drawString(50, 750, 'Sales Report')
c.drawString(50, 700, 'Total Sales by Category and Month')
plt.savefig('sales_plot.png')
c.drawImage('sales_plot.png', 50, 500, 500, 250)
c.showPage()
c.save()

Ejercicio 21: Procesamiento del Lenguaje Natural

Conceptos:

Procesamiento del Lenguaje Natural
Preprocesamiento de Texto
Representación de Texto
Modelado de Temas
Asignación Latente de Dirichlet
Biblioteca Gensim

Descripción: Escribe un script en Python que utilice técnicas de procesamiento del lenguaje natural para analizar un corpus de datos de texto y extraer ideas útiles.

Solución:

pythonCopy code
import gensim
from gensim import corpora
from gensim.models import LdaModel
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import pandas as pd

# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')

# Define the stop words and remove them from the text data
stop_words = stopwords.words('english')
df['text'] = df['text'].apply(lambda x: ' '.join([word for word in word_tokenize(x.lower()) if word not in stop_words]))

# Create a document-term matrix from the text data
texts = df['text'].tolist()
tokenized = [word_tokenize(text) for text in texts]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(text) for text in tokenized]

# Perform topic modeling using LDA
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)

# Print the topics and top words for each topic
for topic in lda_model.show_topics(num_topics=num_topics):
    print('Topic {}:'.format(topic[0]))
    print(', '.join(word for word, _ in lda_model.show_topic(topic[0])))

# Extract the topic distributions for each document
topic_dists = lda_model[corpus]
df['topic_dist'] = topic_dists

# Save the results to a CSV file
df.to_csv('text_data_topics.csv', index=False)

Ejercicio 22: Web Scraping

Conceptos:

Web Scraping
Análisis de HTML
Biblioteca BeautifulSoup
Entrada/Salida de Archivos CSV

Descripción: Escribe un script en Python que haga scraping de datos de un sitio web utilizando la biblioteca BeautifulSoup y los guarde en un archivo CSV.

Solución:

pythonCopy code
import requests
from bs4 import BeautifulSoup
import csv

# Define the URL to scrape
url = 'https://www.example.com'

# Send a GET request to the URL and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the data from the HTML content
data = []
for item in soup.find_all('div', {'class': 'item'}):
    name = item.find('h3').text
    price = item.find('span', {'class': 'price'}).text
    data.append([name, price])

# Save the data to a CSV file
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])
    for row in data:
        writer.writerow(row)

Ejercicio 23: Interacción con Bases de Datos

Conceptos:

Interacción con Bases de Datos
Base de datos SQLite
Consultas SQL
Módulo SQLite3

Descripción: Escribe un script en Python que interactúe con una base de datos para recuperar y manipular datos.

Solución:

pythonCopy code
import sqlite3

# Connect to the database
conn = sqlite3.connect('example.db')

# Create a cursor object
c = conn.cursor()

# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
             (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')

# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")

# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)

# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")

# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")

# Commit the changes to the database
conn.commit()

# Close the database connection
conn.close()

Exercise 24: Parallel Processing

Concepts:

Parallel Processing
Multiprocessing
Process Pool
CPU-Bound Tasks

Description: Write a Python script that performs a time-consuming computation and uses parallel processing to speed it up.

Solution:

import time
import multiprocessing

# Define a CPU-bound function that takes a long time to compute
def compute(num):
    result = 0
    for i in range(num):
        result += i
    return result

if __name__ == '__main__':
    # Use one worker process per available CPU
    num_cpus = multiprocessing.cpu_count()

    # Generate a list of numbers to compute
    num_list = [10000000] * num_cpus

    # Compute the results in parallel; the with-block shuts the pool down afterwards
    start_time = time.time()
    with multiprocessing.Pool(num_cpus) as pool:
        results = pool.map(compute, num_list)
    end_time = time.time()

    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')

Exercise 25: Image Processing

Concepts:

Image Processing
Pillow library
Image Manipulation
Image Filtering

Description: Write a Python script that performs basic image processing operations on an image file.

Solution:

from PIL import Image, ImageFilter

# Open the image file
image = Image.open('example.jpg')

# Display the original image
image.show()

# Resize the image
image = image.resize((500, 500))

# Convert the image to grayscale
image = image.convert('L')

# Apply a Gaussian blur filter
image = image.filter(ImageFilter.GaussianBlur(radius=2))

# Save the processed image to a file
image.save('processed.jpg')

# Display the processed image
image.show()


Advanced Level Exercises Part 1

Exercise 1: File Parsing

Concepts:

File I/O
Regular expressions

Description: Write a Python script that reads a text file and extracts all the URLs it contains. The output should be a list of URLs.

Solution:

import re

# Open the file for reading
with open('input_file.txt', 'r') as f:
    # Read the file contents
    file_contents = f.read()

    # Use regular expression to extract URLs
    urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)

# Print the list of URLs
print(urls)
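
To try the extraction step without a file on disk, here is a minimal sketch that applies a simpler pattern to an inline string. The pattern `https?://[^\s<>"']+` and the `extract_urls` helper are assumptions chosen for readability. They are not the pattern used in the solution, and they will keep trailing punctuation that sits directly after a URL.

```python
import re

# Simplified pattern (assumption): a scheme followed by anything that is not
# whitespace, an angle bracket, or a quote character.
URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')


def extract_urls(text):
    """Return every substring of text that looks like an http(s) URL."""
    return URL_PATTERN.findall(text)


sample = 'Docs at https://example.com/docs and http://example.org/a?b=1 are useful.'
urls = extract_urls(sample)
print(urls)
```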

Exercise 2: Data Analysis

Concepts:

File I/O
Data Manipulation
Pandas library

Description: Write a Python script that reads a CSV file containing sales data and calculates the total sales revenue for each product category.

Solution:

import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()

# Print the total revenue for each product category
print(total_revenue)
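
To see what the group-and-sum step computes, the same aggregation can be sketched with only the standard library. This is a plain-Python stand-in for the pandas call, not a replacement for it. The inline CSV text and its values are made up, and only the two column names come from the solution.

```python
import csv
import io
from collections import defaultdict

# Inline sample data standing in for sales_data.csv (made-up values).
sample_csv = """Product Category,Sales Revenue
Books,10.5
Toys,4.0
Books,2.5
"""

# Accumulate revenue per category, one CSV row at a time.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample_csv)):
    totals[row["Product Category"]] += float(row["Sales Revenue"])

print(dict(totals))
```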

Exercise 3: Web Scraping

Concepts:

Web scraping
Requests library
Beautiful Soup library
CSV File I/O

Description: Write a Python script that extracts the title and price of every product listed on an e-commerce website and stores them in a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Make a GET request to the website
response = requests.get('https://www.example.com/products')

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all product titles and prices
titles = [title.text for title in soup.find_all('h3', class_='product-title')]
prices = [price.text for price in soup.find_all('div', class_='product-price')]

# Zip the titles and prices together
data = list(zip(titles, prices))

# Write the data to a CSV file, starting with a header row
with open('product_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Price'])
    writer.writerows(data)

Exercise 4: Multithreading

Concepts:

Multithreading
Requests library
Threading library

Description: Write a Python script that uses multithreading to download several images from a list of URLs simultaneously.

Solution:

import requests
import threading

# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']

# Function to download an image from a URL
def download_image(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly instead of saving an error page
    with open(url.split('/')[-1], 'wb') as f:
        f.write(response.content)

# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
    thread = threading.Thread(target=download_image, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()
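
The start-then-join pattern can also be written with a thread pool, which caps the number of concurrent workers and collects return values. A minimal sketch, assuming a stand-in `fake_download` function that sleeps instead of touching the network (it is hypothetical and only keeps the example self-contained):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Stand-in for a network request (hypothetical): wait briefly, return the file name."""
    time.sleep(0.05)
    return url.split('/')[-1]


url_list = [
    'https://www.example.com/image1.jpg',
    'https://www.example.com/image2.jpg',
    'https://www.example.com/image3.jpg',
]

# The pool caps concurrency at max_workers; map() yields results in input order.
with ThreadPoolExecutor(max_workers=3) as executor:
    file_names = list(executor.map(fake_download, url_list))

print(file_names)
```

Leaving the `with` block waits for all submitted work, so no explicit `join()` loop is needed.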

Exercise 5: Machine Learning

Concepts:

Machine learning
scikit-learn library

Description: Write a Python script that trains a machine learning model on a dataset and uses it to predict the output for new data.

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)
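
The evaluation line computes the mean squared error, which is the average of the squared differences between targets and predictions. A minimal plain-Python sketch of that formula follows. The `mean_squared_error` helper here is a hand-written illustration, and the numbers are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions, only to exercise the formula.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
print(mse)  # (1 + 0 + 4 + 1) / 4 = 1.5
```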

Exercise 6: Natural Language Processing

Concepts:

Natural Language Processing
Sentiment Analysis
NLTK library

Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pre-trained natural language processing model.

Solution:

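import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon the analyzer depends on (a no-op if it is already present)
nltk.download('vader_lexicon', quiet=True)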

# Read the text file into a string
with open('input_file.txt', 'r') as f:
    text = f.read()

# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()

# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)

# Print the sentiment scores
print(scores)
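
`polarity_scores` returns a dictionary with `neg`, `neu`, `pos` and `compound` entries. A minimal sketch of turning the `compound` value into a label follows. The ±0.05 cutoff is a commonly used convention and is treated here as an assumption. The `label_sentiment` helper and the example dictionary are hypothetical.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER-style scores dict to a label using its 'compound' value."""
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Hand-written dict in the shape polarity_scores returns (values are made up).
example_scores = {'neg': 0.0, 'neu': 0.6, 'pos': 0.4, 'compound': 0.6}
label = label_sentiment(example_scores)
print(label)
```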

Exercise 7: Web Development

Concepts:

Web Development
Flask framework
File uploads

Description: Write a Python script that creates a web application with the Flask framework, lets users upload a file, and performs some processing on that file.

Solution:

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename
import os

app = Flask(__name__)

# Set the path for file uploads and make sure the folder exists
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Route for the home page
@app.route('/')
def index():
    return render_template('index.html')

# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
    # Get the uploaded file
    file = request.files['file']

    # Save the file to the uploads folder under a sanitized name,
    # so a crafted file name cannot escape the folder
    filename = secure_filename(file.filename)
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))

    # Perform processing on the file
    # ...

    return 'File uploaded successfully'

if __name__ == '__main__':
    app.run(debug=True)

Exercise 8: Data Visualization

Concepts:

Data Visualization
Matplotlib library
Candlestick Charts

Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.

Solution:

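import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# mpl_finance is deprecated; its successor mplfinance keeps the old function here
from mplfinance.original_flavor import candlestick_ohlc

# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])

# Convert the date column to Matplotlib dates format
df['Date'] = df['Date'].apply(mdates.date2num)

# Create a figure and axis objects
fig, ax = plt.subplots()

# Plot the candlestick chart; the function expects (date, open, high, low, close) rows.
# The price column names below are assumed; adjust them to match the CSV.
ohlc = df[['Date', 'Open', 'High', 'Low', 'Close']].values
candlestick_ohlc(ax, ohlc, width=0.6, colorup='green', colordown='red')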

# Format the x-axis as dates
ax.xaxis_date()

# Set the axis labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.set_title('Stock Market Data')

# Display the chart
plt.show()

Exercise 9: Machine Learning

Concepts:

Machine Learning
Scikit-learn library

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']], df['species'], test_size=0.2, random_state=42)

# Train a logistic regression model on the training data
# (a higher max_iter gives the solver room to converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Exercise 10: Data Analysis

Concepts:

Data Analysis
Recommender Systems
Collaborative Filtering
Surprise library

Solution:

import pandas as pd
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise import accuracy
from surprise.model_selection import train_test_split

# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')

# Convert the pandas dataframe to a surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)

# Use the model to predict the output for the testing data
predictions = model.test(testset)

# Evaluate the model performance using the root mean squared error metric.
# verbose=False stops accuracy.rmse from printing the value a second time.
rmse = accuracy.rmse(predictions, verbose=False)
print("RMSE:", rmse)

# Recommend products to customers based on their purchase history
for customer_id in df['customer_id'].unique():
    products = df[df['customer_id'] == customer_id]['product_id'].values
    for product_id in df['product_id'].unique():
        if product_id not in products:
            rating = model.predict(customer_id, product_id).est
            print(f"Customer {customer_id} might like product {product_id} with rating {rating}")

Exercise 11: Computer Vision

Concepts:

Computer Vision
Object Detection
OpenCV library
Pre-trained Models

Description: Write a Python script that reads an image and performs object detection on it using a pre-trained object detection model.

Solution:

import cv2

# Read the image file
img = cv2.imread('image.jpg')

# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

# Set the input image and perform object detection
model.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))
output = model.forward()

# Loop through the detected objects and draw bounding boxes around them
for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > 0.5:
        x1 = int(detection[3] * img.shape[1])
        y1 = int(detection[4] * img.shape[0])
        x2 = int(detection[5] * img.shape[1])
        y2 = int(detection[6] * img.shape[0])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Display the image with the detected objects
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
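
The drawing loop treats each detection row as holding a confidence at index 2 and a normalized box at indices 3 to 6. A minimal sketch of that filtering and scaling step in isolation follows, so it can be checked without OpenCV or a model file. Both helper functions and the detection rows are made up for illustration.

```python
def to_pixel_box(detection, image_width, image_height):
    """Scale the normalized (left, top, right, bottom) at indices 3..6 to pixels."""
    left, top, right, bottom = detection[3:7]
    return (int(left * image_width), int(top * image_height),
            int(right * image_width), int(bottom * image_height))


def confident_boxes(detections, image_width, image_height, min_confidence=0.5):
    """Keep only detections whose confidence (index 2) exceeds the threshold."""
    return [to_pixel_box(d, image_width, image_height)
            for d in detections if d[2] > min_confidence]


# Made-up rows laid out as the loop assumes:
# [image_id, class_id, confidence, left, top, right, bottom]
fake_detections = [
    [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
    [0, 3, 0.3, 0.0, 0.0, 1.0, 1.0],
]
boxes = confident_boxes(fake_detections, image_width=640, image_height=480)
print(boxes)
```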

Exercise 12: Natural Language Processing

Concepts:

Natural Language Processing
Topic Modeling
Latent Dirichlet Allocation
Gensim library

Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel

# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
    text = f.readlines()

# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]

# Tokenize the text into words
tokens = [line.split() for line in text]

# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)

# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]

# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)

# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
    print(topic)
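
The bag-of-words step turns each tokenized line into `(token_id, count)` pairs. A minimal standard-library sketch of that representation follows. It is meant only to show the shape of the data: the id numbering is an assumption of this sketch and need not match the ids Gensim assigns, and both helpers are hypothetical.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for one tokenized document."""
    counts = Counter(vocab[token] for token in doc)
    return sorted(counts.items())


docs = [['cats', 'chase', 'mice'], ['mice', 'fear', 'cats', 'cats']]
vocab = build_vocabulary(docs)
bow = [to_bow(doc, vocab) for doc in docs]
print(bow)
```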

Exercise 13: Web Scraping

Concepts:

Web Scraping
Beautiful Soup library
Requests library
CSV file handling

Description: Write a Python script that scrapes a website for product information and saves that information to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL of the website to scrape
url = 'https://www.example.com/products'

# Send a request to the website and get the response
response = requests.get(url)

# Parse the HTML content of the response using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')

# Write the product information to a CSV file
with open('products.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])
    for listing in listings:
        name = listing.find('h3').text
        price = listing.find('span', class_='price').text
        description = listing.find('p').text
        writer.writerow([name, price, description])

Exercise 14: Big Data Processing

Concepts:

Big Data Processing
PySpark
Data Transformations
Aggregation
Parquet file format

Solution:

from pyspark.sql import SparkSession

# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()

# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)

# Perform some data transformations
df = df.filter(df['purchase_date'].between('2020-01-01', '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
df = df.groupBy('customer_id').sum('price')

# Save the results to a Parquet file
df.write.parquet('customer_spending.parquet')

Exercise 15: DevOps

Concepts:

DevOps
Fabric library

Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.

Solution:

from fabric import Connection

# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = 'password'

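# Define the local archive of the web application and the remote target directory.
# Connection.put transfers a single file, so the app is uploaded as an archive
# (the archive path and the /tmp staging location are placeholders).
local_archive = '/path/to/local/app.tar.gz'
remote_path = '/path/to/remote/app'

# Create a connection to the remote server
c = Connection(host=host, user=user, connect_kwargs={'password': password})

# Upload the archive and unpack it into the target directory
c.put(local_archive, '/tmp/app.tar.gz')
c.run(f'mkdir -p {remote_path} && tar -xzf /tmp/app.tar.gz -C {remote_path}')

# Install any required dependencies on the remote server
c.run('sudo apt-get update && sudo apt-get install -y python3-pip')

# Run the remaining commands from inside the application directory
with c.cd(remote_path):
    c.run('pip3 install -r requirements.txt')

    # Start the web application on the remote server
    c.run('python3 app.py')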

Exercise 16: Reinforcement Learning

Concepts:

Reinforcement Learning
Q-Learning
OpenAI Gym library

Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.

Solution:

import gym
import numpy as np

# Create an OpenAI Gym environment for the game.
# Written for gym >= 0.26 (or gymnasium imported as gym): reset() returns
# (observation, info) and step() returns five values.
env = gym.make('FrozenLake-v1')

# Define the Q-table for the agent
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set the hyperparameters for the algorithm
alpha = 0.8
gamma = 0.95
epsilon = 0.1
num_episodes = 2000

# Train the agent using the Q-learning algorithm
for i in range(num_episodes):
    state, _ = env.reset()
    done = False
    while not done:
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state, :]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
        state = next_state

# Test the agent by playing the game using the Q-table.
# A second environment with render_mode='ansi' makes render() return a text frame.
test_env = gym.make('FrozenLake-v1', render_mode='ansi')
state, _ = test_env.reset()
done = False
while not done:
    action = int(np.argmax(Q[state, :]))
    state, reward, terminated, truncated, _ = test_env.step(action)
    done = terminated or truncated
    print(test_env.render())
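
The Q-learning update can also be tried without installing Gym. The sketch below assumes a tiny hand-written corridor environment with five cells, a goal at the right end, and a reward of 1.0 on reaching it. The environment, the helper names, the hyperparameters and the random tie-breaking between equally valued actions are all assumptions made for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor dynamics: reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking among the best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(num_episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(num_episodes):
        state = 0
        for _step in range(max_steps):
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for every non-goal state.
policy = [row.index(max(row)) for row in q_table[:-1]]
print(policy)
```

Random tie-breaking is included because an all-zero table would otherwise always pick the first action and the agent would rarely reach the goal early in training.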

Exercise 17: Time Series Analysis

Concepts:

Time Series Analysis
Data Preprocessing
Data Visualization
ARIMA model
Statsmodels library

Solution:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')

# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Resample the data to a monthly frequency and forward-fill any missing values
df = df.resample('M').mean()
df = df.ffill()

# Visualize the data
plt.plot(df)
plt.show()

# Fit an ARIMA model to the data
model = sm.tsa.ARIMA(df, order=(1, 1, 1))
results = model.fit()

# Print the model summary
print(results.summary())

Exercise 18: Computer Networking

Concepts:

Computer Networking
TCP/IP protocol
Socket Programming

Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.

Solution:

import socket

# Define the host and port for the server
host = 'localhost'
port = 12345

# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the host and port
s.bind((host, port))

# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)

# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)

# Send data to the client
conn.sendall(b'Hello, client!')

# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())

# Close the client connection and the listening socket
conn.close()
s.close()
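
A server like this needs a client to talk to. The sketch below runs a one-shot server in a background thread and connects to it from the same script, so the full exchange can be seen in one run. Binding to port 0 (letting the OS pick a free port), the `serve_once` helper and the reply text are assumptions made for illustration.

```python
import socket
import threading


def serve_once(server_sock, received):
    """Accept one client, send a greeting, and store the client's reply."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(b'Hello, client!')
        received.append(conn.recv(1024))


# Port 0 asks the OS for any free port, which avoids clashes in this sketch.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(('127.0.0.1', 0))
server_sock.listen(1)
port = server_sock.getsockname()[1]

received = []
server_thread = threading.Thread(target=serve_once, args=(server_sock, received))
server_thread.start()

# Client side: connect, read the greeting, then answer.
with socket.create_connection(('127.0.0.1', port)) as client:
    greeting = client.recv(1024)
    client.sendall(b'Hello, server!')

server_thread.join()
server_sock.close()
print(greeting.decode(), '|', received[0].decode())
```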

Exercise 19: Data Analysis and Visualization

Concepts:

Data Analysis
Data Visualization
PDF Report Generation
Pandas library
Matplotlib library
ReportLab library

Solution:

import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Calculate the total sales by category and month
totals = df.groupby(['category', 'month'])['sales'].sum()

# Plot the total sales by category and month
# (squeeze=False keeps axes two-dimensional even when there is a single category)
categories = df['category'].unique()
fig, axes = plt.subplots(nrows=len(categories), ncols=1, figsize=(8.5, 11), squeeze=False)
for i, category in enumerate(categories):
    totals[category].plot(ax=axes[i, 0], kind='bar', title=category)
plt.tight_layout()

# Save the plot to a PDF report
c = canvas.Canvas('sales_report.pdf', pagesize=letter)
c.drawString(50, 750, 'Sales Report')
c.drawString(50, 700, 'Total Sales by Category and Month')
plt.savefig('sales_plot.png')
c.drawImage('sales_plot.png', 50, 500, 500, 250)
c.showPage()
c.save()

Exercise 20: Machine Learning

Concepts:

Machine Learning
Convolutional Neural Networks
Keras library
MNIST dataset

Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.

Solution:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Define the convolutional neural network model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)

Exercise 21: Natural Language Processing

Concepts:

Natural Language Processing
Text Preprocessing
Text Representation
Topic Modeling
Latent Dirichlet Allocation
Gensim library

Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import pandas as pd

# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')

# Define the stop words and remove them from the text data
stop_words = stopwords.words('english')
df['text'] = df['text'].apply(lambda x: ' '.join([word for word in word_tokenize(x.lower()) if word not in stop_words]))

# Create a document-term matrix from the text data
texts = df['text'].tolist()
tokenized = [word_tokenize(text) for text in texts]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(text) for text in tokenized]

# Perform topic modeling using LDA
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)

# Print the topics and top words for each topic
for topic in lda_model.show_topics(num_topics=num_topics):
    print('Topic {}:'.format(topic[0]))
    print(', '.join(word for word, _ in lda_model.show_topic(topic[0])))

# Extract the topic distribution for each document as a list of (topic_id, probability) pairs
df['topic_dist'] = [lda_model.get_document_topics(doc) for doc in corpus]

# Save the results to a CSV file
df.to_csv('text_data_topics.csv', index=False)

Exercise 22: Web Scraping

Concepts:

Web Scraping
HTML Parsing
BeautifulSoup library
CSV File I/O

Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL to scrape
url = 'https://www.example.com'

# Send a GET request to the URL and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the data from the HTML content
data = []
for item in soup.find_all('div', {'class': 'item'}):
    name = item.find('h3').text
    price = item.find('span', {'class': 'price'}).text
    data.append([name, price])

# Save the data to a CSV file
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])
    for row in data:
        writer.writerow(row)

Ejercicios de Nivel Avanzado Parte 1

Ejercicio 1: Análisis de Archivos

Conceptos:

Entrada/Salida de Archivos
Expresiones regulares

Descripción: Escribe un script en Python que lea un archivo de texto y extraiga todas las URLs que estén presentes en el archivo. La salida debe ser una lista de URLs.

Solución:

pythonCopy code
import re

# Open the file for reading
with open('input_file.txt', 'r') as f:
    # Read the file contents
    file_contents = f.read()

    # Use regular expression to extract URLs
    urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file_contents)

# Print the list of URLs
print(urls)

Ejercicio 2: Análisis de Datos

Conceptos:

Entrada/Salida de Archivos
Manipulación de Datos
Biblioteca Pandas

Descripción: Escribe un script en Python que lea un archivo CSV que contenga datos de ventas y calcule los ingresos totales de ventas para cada categoría de producto.

Solución:

pythonCopy code
import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Group the data by product category and sum the sales revenue
total_revenue = df.groupby('Product Category')['Sales Revenue'].sum()

# Print the total revenue for each product category
print(total_revenue)

Ejercicio 3: Web Scraping

Conceptos:

Web scraping
Biblioteca Requests
Biblioteca Beautiful Soup
Entrada/Salida de Archivos CSV

Descripción: Escribe un script en Python que extraiga el título y el precio de todos los productos listados en un sitio web de comercio electrónico y los almacene en un archivo CSV.

Solución:

pythonCopy code
import requests
from bs4 import BeautifulSoup
import csv

# Make a GET request to the website
response = requests.get('https://www.example.com/products')

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all product titles and prices
titles = [title.text for title in soup.find_all('h3', class_='product-title')]
prices = [price.text for price in soup.find_all('div', class_='product-price')]

# Zip the titles and prices together
data = list(zip(titles, prices))

# Write the data to a CSV file
with open('product_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

Ejercicio 4: Multithreading

Conceptos:

Multithreading
Biblioteca Requests
Biblioteca Threading

Descripción: Escribe un script en Python que utilice el multithreading para descargar varias imágenes de una lista de URLs simultáneamente.

Solución:

pythonCopy code
import requests
import threading

# URL list of images to download
url_list = ['https://www.example.com/image1.jpg', 'https://www.example.com/image2.jpg', 'https://www.example.com/image3.jpg']

# Function to download an image from a URL
def download_image(url):
    response = requests.get(url)
    with open(url.split('/')[-1], 'wb') as f:
        f.write(response.content)

# Create a thread for each URL and start them all simultaneously
threads = []
for url in url_list:
    thread = threading.Thread(target=download_image, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

Ejercicio 5: Aprendizaje automático

Conceptos:

Aprendizaje automático
Biblioteca scikit-learn

Descripción: Escribe un script en Python que entrene un modelo de aprendizaje automático en un conjunto de datos y lo utilice para predecir la salida para nuevos datos.

Solución:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the dataset into a pandas dataframe
df = pd.read_csv('dataset.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['feature1', 'feature2']], df['target'], test_size=0.2, random_state=42)

# Train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the mean squared error metric
mse = ((y_test - y_pred) ** 2).mean()
print("Mean squared error:", mse)

Exercise 6: Natural Language Processing

Concepts:

Natural Language Processing
Sentiment Analysis
NLTK library

Description: Write a Python script that reads a text file and performs sentiment analysis on the text using a pretrained natural language processing model.

Solution:

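import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER needs its lexicon; download it once (a no-op if it is already installed)
nltk.download('vader_lexicon')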

# Read the text file into a string
with open('input_file.txt', 'r') as f:
    text = f.read()

# Create a SentimentIntensityAnalyzer object
sid = SentimentIntensityAnalyzer()

# Perform sentiment analysis on the text
scores = sid.polarity_scores(text)

# Print the sentiment scores
print(scores)
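
`polarity_scores` returns a dictionary with `neg`, `neu`, `pos`, and `compound` entries. A small sketch of how the `compound` value could be turned into a label, assuming the commonly used cut-offs of +0.05 and -0.05 suggested by VADER's authors; the function name `label_sentiment` and the `threshold` parameter are illustrative.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER score dictionary to 'positive', 'negative', or 'neutral'.

    scores is the dictionary returned by polarity_scores(); only its
    'compound' entry (a value between -1 and 1) is used here.
    """
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'
```

For a long file it may be more informative to score each sentence separately and label those, since a single compound score for the whole text tends to saturate near the extremes.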

Exercise 7: Web Development

Concepts:

Web Development
Flask framework
File uploads

Description: Write a Python script that creates a web application with the Flask framework that lets users upload a file and performs some processing on it.

Solution:

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename
import os

app = Flask(__name__)

# Set the path for file uploads and make sure the folder exists
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Route for the home page
@app.route('/')
def index():
    return render_template('index.html')

# Route for file uploads
@app.route('/upload', methods=['POST'])
def upload():
    # Get the uploaded file
    file = request.files['file']

    # Sanitize the client-supplied name before using it as a path component
    filename = secure_filename(file.filename)

    # Save the file to the uploads folder
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))

    # Perform processing on the file
    # ...

    return 'File uploaded successfully'

if __name__ == '__main__':
    app.run(debug=True)
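
The upload route accepts any file type. A minimal sketch of an extension check that could run before `file.save(...)`; the whitelist contents and the name `allowed_file` are assumptions chosen for illustration.

```python
# Hypothetical whitelist; adjust it to whatever the processing step can handle
ALLOWED_EXTENSIONS = {'txt', 'csv', 'png', 'jpg'}


def allowed_file(filename, allowed=ALLOWED_EXTENSIONS):
    """Return True if filename has an extension that appears in the whitelist."""
    # rsplit with maxsplit=1 looks only at the text after the last dot
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in allowed
```

Inside `upload()`, a request whose file fails this check could be rejected with a 400 response instead of being written to disk.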

Exercise 8: Data Visualization

Concepts:

Data Visualization
Matplotlib library
Candlestick charts

Description: Write a Python script that reads a CSV file containing stock market data and plots a candlestick chart of the data.

Solution:

import pandas as pd
import matplotlib.pyplot as plt
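# mpl_finance is no longer maintained; its successor, mplfinance, keeps candlestick_ohlc in original_flavor
from mplfinance.original_flavor import candlestick_ohlc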
import matplotlib.dates as mdates

# Read the CSV file into a pandas dataframe
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])

# Convert the date column to Matplotlib dates format
df['Date'] = df['Date'].apply(mdates.date2num)

# Create a figure and axis objects
fig, ax = plt.subplots()

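# Plot the candlestick chart; candlestick_ohlc expects rows of (date, open, high, low, close).
# The column names 'Open', 'High', 'Low', and 'Close' are assumed here.
ohlc = df[['Date', 'Open', 'High', 'Low', 'Close']].values
candlestick_ohlc(ax, ohlc, width=0.6, colorup='green', colordown='red')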

# Format the x-axis as dates
ax.xaxis_date()

# Set the axis labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.set_title('Stock Market Data')

# Display the chart
plt.show()

Exercise 9: Machine Learning

Concepts:

Machine Learning
Scikit-learn library

Solution:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read the dataset into a pandas dataframe
df = pd.read_csv('flower_data.csv')

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']], df['species'], test_size=0.2, random_state=42)

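# Train a logistic regression model on the training data
# (max_iter is raised because the default solver may stop before converging on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)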

# Use the model to predict the output for the testing data
y_pred = model.predict(X_test)

# Evaluate the model performance using the accuracy score metric
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Exercise 10: Data Analysis

Concepts:

Data Analysis
Recommender Systems
Collaborative Filtering
Surprise library

Solution:

import pandas as pd
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise import accuracy
from surprise.model_selection import train_test_split

# Read the CSV file into a pandas dataframe
df = pd.read_csv('purchase_data.csv')

# Convert the pandas dataframe to a surprise dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'rating']], reader)

# Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Train an SVD model on the training data
model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(trainset)

# Use the model to predict the output for the testing data
predictions = model.test(testset)

# Evaluate the model performance using the root mean squared error metric
rmse = accuracy.rmse(predictions)
print("RMSE:", rmse)

# Recommend products to customers based on their purchase history
for customer_id in df['customer_id'].unique():
    products = df[df['customer_id'] == customer_id]['product_id'].values
    for product_id in df['product_id'].unique():
        if product_id not in products:
            rating = model.predict(customer_id, product_id).est
            print(f"Customer {customer_id} might like product {product_id} with rating {rating}")
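
The final loop prints a predicted rating for every unseen customer and product pair, which grows quickly. A sketch of keeping only the best few suggestions per customer with `heapq.nlargest`; the function `top_n_recommendations` and its `predict` parameter are hypothetical. With the model above, `predict` could be `lambda c, p: model.predict(c, p).est`.

```python
import heapq


def top_n_recommendations(customer_id, all_products, purchased, predict, n=3):
    """Return the n unseen products with the highest predicted rating.

    predict(customer_id, product_id) must return a numeric rating estimate.
    The result is a list of (product_id, rating) pairs, best first.
    """
    # Skip anything the customer already bought
    candidates = (p for p in all_products if p not in purchased)
    scored = ((p, predict(customer_id, p)) for p in candidates)
    # nlargest avoids sorting the full candidate list when n is small
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])
```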

Exercise 11: Computer Vision

Concepts:

Computer Vision
Object Detection
OpenCV library
Pretrained Models

Description: Write a Python script that reads an image and performs object detection on it using a pretrained object detection model.

Solution:

import cv2

# Read the image file
img = cv2.imread('image.jpg')

# Load the pre-trained object detection model
model = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

# Set the input image and perform object detection
model.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))
output = model.forward()

# Loop through the detected objects and draw bounding boxes around them
for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > 0.5:
        x1 = int(detection[3] * img.shape[1])
        y1 = int(detection[4] * img.shape[0])
        x2 = int(detection[5] * img.shape[1])
        y2 = int(detection[6] * img.shape[0])
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Display the image with the detected objects
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
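
In the loop above, each detection row stores its confidence at index 2 and a bounding box at indices 3 to 6 as fractions of the image size. A small sketch of that conversion as a standalone helper, with clamping added so boxes that spill past the border stay inside the image; the name `to_pixel_box` and the clamping step are assumptions, not part of the original solution.

```python
def to_pixel_box(detection, img_width, img_height):
    """Convert normalized box corners (indices 3..6) to clamped pixel coordinates.

    detection is any sequence laid out like the rows iterated above:
    index 2 is the confidence and indices 3..6 are x1, y1, x2, y2 in [0, 1].
    """
    x1, y1, x2, y2 = detection[3:7]

    def clamp(value, upper):
        # Truncate to an integer pixel and keep it within [0, upper]
        return max(0, min(int(value), upper))

    return (
        clamp(x1 * img_width, img_width - 1),
        clamp(y1 * img_height, img_height - 1),
        clamp(x2 * img_width, img_width - 1),
        clamp(y2 * img_height, img_height - 1),
    )
```

With an OpenCV image, `img.shape[1]` is the width and `img.shape[0]` is the height, matching the indices used in the solution.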

Exercise 12: Natural Language Processing

Concepts:

Natural Language Processing
Topic Modeling
Latent Dirichlet Allocation
Gensim library

Description: Write a Python script that reads a text file and performs topic modeling on the text using Latent Dirichlet Allocation (LDA).

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel

# Read the text file into a list of strings
with open('input_file.txt', 'r') as f:
    text = f.readlines()

# Remove newlines and convert to lowercase
text = [line.strip().lower() for line in text]

# Tokenize the text into words
tokens = [line.split() for line in text]

# Create a dictionary of words and their frequency
dictionary = corpora.Dictionary(tokens)

# Create a bag-of-words representation of the text
corpus = [dictionary.doc2bow(token) for token in tokens]

# Train an LDA model on the text
model = LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)

# Print the topics and their associated words
for topic in model.print_topics(num_words=5):
    print(topic)
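
The dictionary and bag-of-words steps can be hard to picture. The following is a conceptual sketch in plain Python of what they produce: an id for each distinct word, and for each document a list of (word id, count) pairs. It is not Gensim's implementation, and the ids Gensim assigns may differ; `build_vocabulary` and `to_bow` are illustrative names.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign an integer id to each distinct word, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for word in doc:
            vocab.setdefault(word, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return sorted (word_id, count) pairs for one tokenized document."""
    # Words missing from the vocabulary are ignored
    counts = Counter(vocab[word] for word in doc if word in vocab)
    return sorted(counts.items())
```

LDA only ever sees these counts, so word order inside a line plays no role in the topics it finds.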

Exercise 13: Web Scraping

Concepts:

Web Scraping
Beautiful Soup library
Requests library
CSV file handling

Description: Write a Python script that scrapes a website for product information and saves the information to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL of the website to scrape
url = 'https://www.example.com/products'

# Send a request to the website and get the response
response = requests.get(url)

# Parse the HTML content of the response using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find all the product listings on the page
listings = soup.find_all('div', class_='product-listing')

# Write the product information to a CSV file
with open('products.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Price', 'Description'])
    for listing in listings:
        name = listing.find('h3').text
        price = listing.find('span', class_='price').text
        description = listing.find('p').text
        writer.writerow([name, price, description])
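
Beautiful Soup's `find` returns `None` when nothing matches, so a listing without a price or description would make `.text` raise `AttributeError` and stop the whole run. A minimal sketch of a guard, assuming an empty string is an acceptable placeholder; the helper name `text_or_default` is hypothetical.

```python
def text_or_default(node, default=''):
    """Return the stripped text of a parsed element, or default if it is missing.

    node is whatever find() returned: an element with a .text attribute,
    or None when the page had no matching tag.
    """
    if node is None:
        return default
    return node.text.strip()
```

In the loop above this would be used as `price = text_or_default(listing.find('span', class_='price'))`, so incomplete listings still produce a CSV row.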

Exercise 14: Big Data Processing

Concepts:

Big Data Processing
PySpark
Data Transformations
Aggregation
Parquet file format

Solution:

from pyspark.sql import SparkSession

# Create a SparkSession object
spark = SparkSession.builder.appName('customer-purchases').getOrCreate()

# Read the CSV file into a Spark DataFrame
df = spark.read.csv('customer_purchases.csv', header=True, inferSchema=True)

# Perform some data transformations
df = df.filter(df['purchase_date'].between('2020-01-01', '2020-12-31'))
df = df.select('customer_id', 'product_id', 'price')
df = df.groupBy('customer_id').sum('price')

# Save the results to a Parquet file
df.write.parquet('customer_spending.parquet')

Exercise 15: DevOps

Concepts:

DevOps
Fabric library

Description: Write a Python script that automates the deployment of a web application to a remote server using the Fabric library.

Solution:

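import tarfile
from fabric import Config, Connection

# Define the host and user credentials for the remote server
host = 'example.com'
user = 'user'
password = 'password'

# Define the path to the web application on the local machine and the remote server
local_path = '/path/to/local/app'
remote_path = '/path/to/remote/app'

# Create a connection to the remote server; the same password is reused for sudo
config = Config(overrides={'sudo': {'password': password}})
c = Connection(host=host, user=user, connect_kwargs={'password': password}, config=config)

# put() transfers a single file, so pack the application directory into an archive first
archive = 'app.tar.gz'
with tarfile.open(archive, 'w:gz') as tar:
    tar.add(local_path, arcname='.')

# Upload the archive to the remote server
c.run(f'mkdir -p {remote_path}')
c.put(archive, f'{remote_path}/{archive}')

# Install any required dependencies on the remote server
c.sudo('apt-get update')
c.sudo('apt-get install -y python3-pip')

# Each run() starts a fresh shell, so change into the app directory explicitly
with c.cd(remote_path):
    c.run(f'tar -xzf {archive}')
    c.run('pip3 install -r requirements.txt')
    # Start the web application on the remote server
    c.run('python3 app.py')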

Exercise 16: Reinforcement Learning

Concepts:

Reinforcement Learning
Q-Learning
OpenAI Gym library

Description: Write a Python script that implements a reinforcement learning algorithm to teach an agent to play a simple game.

Solution:

import gym
import numpy as np

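# Create an OpenAI Gym environment for the game.
# Note: this solution uses the classic Gym API, where reset() returns only the
# observation and step() returns four values; Gym 0.26+ and Gymnasium changed both.
env = gym.make('FrozenLake-v0')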

# Define the Q-table for the agent
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set the hyperparameters for the algorithm
alpha = 0.8
gamma = 0.95
epsilon = 0.1
num_episodes = 2000

# Train the agent using the Q-learning algorithm
for i in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])
        next_state, reward, done, _ = env.step(action)
        Q[state, action] = (1 - alpha) * Q[state, action] + alpha * (reward + gamma * np.max(Q[next_state, :]))
        state = next_state

# Test the agent by playing the game using the Q-table
state = env.reset()
done = False
while not done:
    action = np.argmax(Q[state, :])
    next_state, reward, done, _ = env.step(action)
    state = next_state
    env.render()
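
The core of the training loop is the single line that updates `Q[state, action]`. A sketch of that update as a standalone function on a plain list-of-lists table, so the arithmetic can be checked by hand without an environment; the function name `q_update` is illustrative, and the default `alpha` and `gamma` mirror the values used above.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.8, gamma=0.95):
    """Apply one Q-learning update in place and return the new Q-value.

    The new value blends the old estimate with the target
    reward + gamma * max(Q[next_state]), weighted by the learning rate alpha.
    """
    best_next = max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target
    return q_table[state][action]
```

For example, with an old value of 0, a reward of 1, and a best next-state value of 0.5, the target is 1 + 0.95 × 0.5 = 1.475 and the updated value is 0.8 × 1.475 = 1.18.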

Exercise 17: Time Series Analysis

Concepts:

Time Series Analysis
Data Preprocessing
Data Visualization
ARIMA Model
Statsmodels library

Solution:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Read the CSV file into a pandas dataframe
df = pd.read_csv('time_series.csv')

# Convert the date column to a datetime object and set it as the index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Resample the data to a monthly frequency and forward-fill any missing values
df = df.resample('M').mean()
df = df.ffill()

# Visualize the data
plt.plot(df)
plt.show()

# Fit an ARIMA model to the data
model = sm.tsa.ARIMA(df, order=(1, 1, 1))
results = model.fit()

# Print the model summary
print(results.summary())

Exercise 18: Computer Networks

Concepts:

Computer Networks
TCP/IP protocol
Socket programming

Description: Write a Python script that implements a simple TCP server that accepts client connections and sends and receives data.

Solution:

import socket

# Define the host and port for the server
host = 'localhost'
port = 12345

# Create a socket object
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the host and port
s.bind((host, port))

# Listen for incoming connections
s.listen(1)
print('Server listening on', host, port)

# Accept a client connection
conn, addr = s.accept()
print('Connected by', addr)

# Send data to the client
conn.sendall(b'Hello, client!')

# Receive data from the client
data = conn.recv(1024)
print('Received:', data.decode())

# Close the client connection and the listening socket
conn.close()
s.close()
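
The server above needs a client to talk to. The following is a minimal sketch of a matching client, together with a helper that runs a one-shot server and the client in the same process for a quick local check. The names `serve_once`, `run_client`, and `demo`, the client message, and the use of port 0 (letting the operating system pick a free port) are assumptions made for illustration.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client, send the greeting, and return what the client sent."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(b'Hello, client!')
        return conn.recv(1024)


def run_client(host, port, message=b'Hello, server!'):
    """Connect, read the server's greeting, send a reply, and return the greeting."""
    with socket.create_connection((host, port), timeout=5) as sock:
        greeting = sock.recv(1024)
        sock.sendall(message)
    return greeting


def demo():
    """Run the server in a background thread and the client in this thread."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(('127.0.0.1', 0))  # port 0: the OS picks a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        received = []
        worker = threading.Thread(
            target=lambda: received.append(serve_once(server_sock)))
        worker.start()
        greeting = run_client('127.0.0.1', port)
        worker.join()
    return greeting, received[0]
```

The listening socket is created before the client starts, so the connection attempt cannot arrive before the server is ready. To talk to the original script instead, `run_client('localhost', 12345)` would be called from a second terminal.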

Exercise 19: Data Analysis and Visualization

Concepts:

Data Analysis
Data Visualization
PDF Report Generation
Pandas library
Matplotlib library
ReportLab library

Solution:

import pandas as pd
import matplotlib.pyplot as plt
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Read the CSV file into a pandas dataframe
df = pd.read_csv('sales_data.csv')

# Calculate the total sales by category and month (select the column before summing)
totals = df.groupby(['category', 'month'])['sales'].sum()

# Plot the total sales by category and month
# (squeeze=False keeps axes two-dimensional even when there is only one category)
categories = df['category'].unique()
fig, axes = plt.subplots(nrows=len(categories), ncols=1, figsize=(8.5, 11), squeeze=False)
for i, category in enumerate(categories):
    totals[category].plot(ax=axes[i, 0], kind='bar', title=category)
plt.tight_layout()

# Save the plot to a PDF report
c = canvas.Canvas('sales_report.pdf', pagesize=letter)
c.drawString(50, 750, 'Sales Report')
c.drawString(50, 700, 'Total Sales by Category and Month')
plt.savefig('sales_plot.png')
c.drawImage('sales_plot.png', 50, 500, 500, 250)
c.showPage()
c.save()

Exercise 20: Machine Learning

Concepts:

Machine Learning
Convolutional Neural Networks
Keras library
MNIST dataset

Description: Write a Python script that trains a machine learning model to classify images of handwritten digits from the MNIST dataset.

Solution:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize the pixel values and reshape the data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Define the convolutional neural network model
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)
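
It can help to trace how the image size shrinks through the layers before `Flatten`. A sketch of that arithmetic in plain Python, assuming the Keras defaults the model relies on: unpadded ("valid") convolutions with stride 1, and pooling with a stride equal to the pool size. The helper names are illustrative.

```python
def conv_output(size, kernel):
    """Side length after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_output(size, pool):
    """Side length after max pooling with stride equal to the pool size."""
    return size // pool


def flattened_size(input_size=28, last_filters=64):
    """Number of values fed to the Dense layer for the model defined above."""
    size = conv_output(input_size, 3)  # 28 -> 26
    size = pool_output(size, 2)        # 26 -> 13
    size = conv_output(size, 3)        # 13 -> 11
    size = pool_output(size, 2)        # 11 -> 5
    return size * size * last_filters  # 5 * 5 * 64
```

So the final `Dense(10)` layer receives 1600 inputs per image, which `model.summary()` should confirm.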

Exercise 21: Natural Language Processing

Concepts:

Natural Language Processing
Text Preprocessing
Text Representation
Topic Modeling
Latent Dirichlet Allocation
Gensim library

Description: Write a Python script that uses natural language processing techniques to analyze a corpus of text data and extract useful insights.

Solution:

import gensim
from gensim import corpora
from gensim.models import LdaModel
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import pandas as pd

# Read the text data into a pandas dataframe
df = pd.read_csv('text_data.csv')

# Define the stop words and remove them from the text data
# (the NLTK stopwords corpus and tokenizer data must be installed beforehand via nltk.download)
stop_words = set(stopwords.words('english'))
df['text'] = df['text'].apply(lambda x: ' '.join([word for word in word_tokenize(x.lower()) if word not in stop_words]))

# Create a document-term matrix from the text data
texts = df['text'].tolist()
tokenized = [word_tokenize(text) for text in texts]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(text) for text in tokenized]

# Perform topic modeling using LDA
num_topics = 5
lda_model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)

# Print the topics and top words for each topic
for topic in lda_model.show_topics(num_topics=num_topics):
    print('Topic {}:'.format(topic[0]))
    print(', '.join(word for word, _ in lda_model.show_topic(topic[0])))

# Extract the topic distribution for each document as a plain list of (topic_id, probability) pairs
df['topic_dist'] = [lda_model.get_document_topics(bow) for bow in corpus]

# Save the results to a CSV file
df.to_csv('text_data_topics.csv', index=False)

Exercise 22: Web Scraping

Concepts:

Web Scraping
HTML Parsing
BeautifulSoup library
CSV File I/O

Description: Write a Python script that scrapes data from a website using the BeautifulSoup library and saves it to a CSV file.

Solution:

import requests
from bs4 import BeautifulSoup
import csv

# Define the URL to scrape
url = 'https://www.example.com'

# Send a GET request to the URL and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the data from the HTML content
data = []
for item in soup.find_all('div', {'class': 'item'}):
    name = item.find('h3').text
    price = item.find('span', {'class': 'price'}).text
    data.append([name, price])

# Save the data to a CSV file
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Name', 'Price'])
    for row in data:
        writer.writerow(row)

Exercise 23: Database Interaction

Concepts:

Database Interaction
SQLite database
SQL queries
sqlite3 module

Description: Write a Python script that interacts with a database to retrieve and manipulate data.

Solution:

import sqlite3

# Connect to the database
conn = sqlite3.connect('example.db')

# Create a cursor object
c = conn.cursor()

# Execute an SQL query to create a table
c.execute('''CREATE TABLE IF NOT EXISTS customers
             (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')

# Execute an SQL query to insert data into the table
c.execute("INSERT INTO customers (name, email, phone) VALUES ('John Smith', 'john@example.com', '555-1234')")

# Execute an SQL query to retrieve data from the table
c.execute("SELECT * FROM customers")
rows = c.fetchall()
for row in rows:
    print(row)

# Execute an SQL query to update data in the table
c.execute("UPDATE customers SET phone='555-5678' WHERE name='John Smith'")

# Execute an SQL query to delete data from the table
c.execute("DELETE FROM customers WHERE name='John Smith'")

# Commit the changes to the database
conn.commit()

# Close the database connection
conn.close()
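
The queries above embed their values directly in the SQL text, which breaks as soon as a value contains a quote and is unsafe with user input. A sketch of the same insert and lookup using `?` placeholders, which let `sqlite3` handle the quoting. It runs against an in-memory database so nothing is written to disk; the helper names `make_demo_db`, `add_customer`, and `find_by_name` are illustrative.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with the customers table used above."""
    conn = sqlite3.connect(':memory:')
    conn.execute('''CREATE TABLE customers
                    (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)''')
    return conn


def add_customer(conn, name, email, phone):
    """Insert one customer with bound parameters and return the new row id."""
    cursor = conn.execute(
        'INSERT INTO customers (name, email, phone) VALUES (?, ?, ?)',
        (name, email, phone))
    return cursor.lastrowid


def find_by_name(conn, name):
    """Return (name, email, phone) rows whose name matches exactly."""
    cursor = conn.execute(
        'SELECT name, email, phone FROM customers WHERE name = ?', (name,))
    return cursor.fetchall()
```

A name such as `O'Brien` would end the string literal early in the hand-built query, whereas the placeholder version stores and retrieves it unchanged.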

Exercise 24: Parallel Processing

Concepts:

Parallel Processing
Multiprocessing
Process Pool
CPU-Bound Tasks

Description: Write a Python script that performs a time-consuming computation using parallel processing to speed it up.

Solution:

import time
import multiprocessing

# Define a CPU-bound function that takes a long time to compute
def compute(num):
    result = 0
    for i in range(num):
        result += i
    return result

if __name__ == '__main__':
    # Use one worker process per available CPU
    num_cpus = multiprocessing.cpu_count()

    # Generate a list of numbers to compute
    num_list = [10000000] * num_cpus

    # Compute the results using parallel processing;
    # the with-block shuts the pool down once the work is done
    start_time = time.time()
    with multiprocessing.Pool(num_cpus) as pool:
        results = pool.map(compute, num_list)
    end_time = time.time()

    # Print the results and computation time
    print('Results:', results)
    print('Computation time:', end_time - start_time, 'seconds')
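
The solution gives every worker the same full-size job. To speed up a single large computation, the work has to be split instead. The following sketch divides `range(n)` into contiguous chunks, sums each chunk in a separate process, and adds the partial sums; `split_range`, `partial_sum`, and `parallel_sum` are hypothetical names introduced for this example.

```python
from multiprocessing import Pool


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum(bounds):
    """Sum the integers in [start, stop); this is the unit of work for one process."""
    start, stop = bounds
    return sum(range(start, stop))


def parallel_sum(n, workers=4):
    """Sum range(n) by distributing the chunks over a process pool."""
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))
```

As in the solution, `parallel_sum` should be called from under an `if __name__ == '__main__':` guard. The result can be checked against the closed form `n * (n - 1) // 2`.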

Exercise 25: Image Processing

Concepts:

Image Processing
Pillow library
Image Manipulation
Image Filtering

Description: Write a Python script that performs basic image processing operations on an image file.

Solution:

from PIL import Image, ImageFilter

# Open the image file
image = Image.open('example.jpg')

# Display the original image
image.show()

# Resize the image
image = image.resize((500, 500))

# Convert the image to grayscale
image = image.convert('L')

# Apply a Gaussian blur filter
image = image.filter(ImageFilter.GaussianBlur(radius=2))

# Save the processed image to a file
image.save('processed.jpg')

# Display the processed image
image.show()
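
`image.resize((500, 500))` forces the result to be square, which stretches any image that is not already square. A sketch of computing a target size that fits inside a bounding box while keeping the aspect ratio; the helper name `fit_within` is an assumption, and its result could be passed to `image.resize(...)` in place of the fixed tuple.

```python
def fit_within(size, max_size):
    """Scale (width, height) to fit inside max_size while keeping the aspect ratio.

    Both arguments are (width, height) tuples in pixels. Note that images
    smaller than max_size are scaled up by this function.
    """
    width, height = size
    max_width, max_height = max_size
    # The tighter of the two constraints decides the scale factor
    scale = min(max_width / width, max_height / height)
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

In the solution this would read `image = image.resize(fit_within(image.size, (500, 500)))`. Pillow's `Image.thumbnail` offers similar behavior for shrinking an image in place.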
