Menu iconMenu iconPython & SQL Bible
Python & SQL Bible

Chapter 18: Data Analysis with Python and SQL

18.6 Practical Exercises of Chapter 18: Data Analysis with Python and SQL

Exercise 1: Data Cleaning

You have a table in your SQLite database named employee_data with the columns idnameageemaildepartment, and salary. Unfortunately, some rows contain missing values (None in Python, NULL in SQL), and some email entries are not in the proper format (they should be something@domain.com).

Write a Python script using sqlite3 module to:

  1. Remove all the rows with any column having None/NULL.
  2. Validate the email entries and remove the rows with invalid email format.
import sqlite3
import re

# Connect to the database
conn = sqlite3.connect('my_database.db')

# Create a cursor object
c = conn.cursor()

# Remove rows with any NULL value
c.execute("DELETE FROM employee_data WHERE id IS NULL OR name IS NULL OR age IS NULL OR email IS NULL OR department IS NULL OR salary IS NULL")

# Validate email format and remove rows with invalid emails
c.execute("SELECT * FROM employee_data")
rows = c.fetchall()
for row in rows:
    if not re.match(r"[^@]+@[^@]+\.[^@]+", row[3]):
        c.execute("DELETE FROM employee_data WHERE id=?", (row[0],))

# Commit the changes and close the connection
conn.commit()
conn.close()

Exercise 2: Data Transformation

Assuming you have a table in your SQLite database named sales with the columns idregiontotal_sales, and date:

  1. Write a SQL query to add a new column profit, which is 10% of total_sales.
  2. Write a Python script using sqlite3 to implement the SQL query.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Add a new column 'profit'
c.execute("ALTER TABLE sales ADD COLUMN profit REAL")

# Update 'profit' as 10% of 'total_sales'
c.execute("UPDATE sales SET profit = total_sales * 0.1")

conn.commit()
conn.close()

Exercise 3: Querying SQL Database from Python

Using the sales table in your SQLite database:

  1. Write a Python script using sqlite3 module to fetch all the rows where region is 'West', and print each row.
  2. Calculate the average total_sales for the 'West' region in Python.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Fetch and print rows where 'region' is 'West'
c.execute("SELECT * FROM sales WHERE region = 'West'")
rows = c.fetchall()
for row in rows:
    print(row)

# Calculate the average 'total_sales' for the 'West' region
c.execute("SELECT AVG(total_sales) FROM sales WHERE region = 'West'")
average_sales = c.fetchone()[0]
print(f'Average sales in the West region: {average_sales}')

conn.close()

Chapter 18 Conclusion

This chapter provided a comprehensive overview of how Python and SQL can work in harmony to provide efficient and flexible solutions for data analysis tasks. The process begins with data cleaning, a crucial step to ensure the quality of data analysis. We explored how to handle missing data and duplicates, both in Python with pandas and directly in SQL.

We dived into the world of data manipulation and transformation, demonstrating how you can leverage the power of SQL's syntax and Python's pandas library to extract, convert, and create new data from existing datasets. SQL proved itself to be a powerful tool for manipulating data in place, while Python offered a flexible and intuitive environment for complex transformations and operations.

The chapter also emphasized the significance of exploratory data analysis, the practice of summarizing the main characteristics of a dataset, often through visual means. Here, we saw how Python's pandas library could be used to generate meaningful insights from our data, which can help inform further data analysis steps or business decisions.

Next, we dived into the art of SQL queries for data analysis. Advanced SQL concepts like joining tables, using aggregate functions, and crafting complex queries became accessible and practical. We learned how we could use these tools not just for extracting data, but also for conducting substantive data analysis directly within a SQL environment.

Finally, we provided a set of practical exercises, cementing the concepts covered in this chapter and providing hands-on experience in working with Python and SQL together in the context of data analysis.

The skills and knowledge you've gained in this chapter are valuable tools for any aspiring data scientist or data analyst. Mastering them will give you an edge in your ability to handle, analyze, and gain insights from data. Going forward, these will form the bedrock for more advanced techniques in data science and machine learning.

In the next chapter, we will continue to build on these foundations, diving into more advanced SQL functionalities within Python and integrating more sophisticated data analysis techniques into our toolkit. Stay tuned!

18.6 Practical Exercises of Chapter 18: Data Analysis with Python and SQL

Exercise 1: Data Cleaning

You have a table in your SQLite database named employee_data with the columns idnameageemaildepartment, and salary. Unfortunately, some rows contain missing values (None in Python, NULL in SQL), and some email entries are not in the proper format (they should be something@domain.com).

Write a Python script using sqlite3 module to:

  1. Remove all the rows with any column having None/NULL.
  2. Validate the email entries and remove the rows with invalid email format.
import sqlite3
import re

# Connect to the database
conn = sqlite3.connect('my_database.db')

# Create a cursor object
c = conn.cursor()

# Remove rows with any NULL value
c.execute("DELETE FROM employee_data WHERE id IS NULL OR name IS NULL OR age IS NULL OR email IS NULL OR department IS NULL OR salary IS NULL")

# Validate email format and remove rows with invalid emails
c.execute("SELECT * FROM employee_data")
rows = c.fetchall()
for row in rows:
    if not re.match(r"[^@]+@[^@]+\.[^@]+", row[3]):
        c.execute("DELETE FROM employee_data WHERE id=?", (row[0],))

# Commit the changes and close the connection
conn.commit()
conn.close()

Exercise 2: Data Transformation

Assuming you have a table in your SQLite database named sales with the columns idregiontotal_sales, and date:

  1. Write a SQL query to add a new column profit, which is 10% of total_sales.
  2. Write a Python script using sqlite3 to implement the SQL query.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Add a new column 'profit'
c.execute("ALTER TABLE sales ADD COLUMN profit REAL")

# Update 'profit' as 10% of 'total_sales'
c.execute("UPDATE sales SET profit = total_sales * 0.1")

conn.commit()
conn.close()

Exercise 3: Querying SQL Database from Python

Using the sales table in your SQLite database:

  1. Write a Python script using sqlite3 module to fetch all the rows where region is 'West', and print each row.
  2. Calculate the average total_sales for the 'West' region in Python.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Fetch and print rows where 'region' is 'West'
c.execute("SELECT * FROM sales WHERE region = 'West'")
rows = c.fetchall()
for row in rows:
    print(row)

# Calculate the average 'total_sales' for the 'West' region
c.execute("SELECT AVG(total_sales) FROM sales WHERE region = 'West'")
average_sales = c.fetchone()[0]
print(f'Average sales in the West region: {average_sales}')

conn.close()

Chapter 18 Conclusion

This chapter provided a comprehensive overview of how Python and SQL can work in harmony to provide efficient and flexible solutions for data analysis tasks. The process begins with data cleaning, a crucial step to ensure the quality of data analysis. We explored how to handle missing data and duplicates, both in Python with pandas and directly in SQL.

We dived into the world of data manipulation and transformation, demonstrating how you can leverage the power of SQL's syntax and Python's pandas library to extract, convert, and create new data from existing datasets. SQL proved itself to be a powerful tool for manipulating data in place, while Python offered a flexible and intuitive environment for complex transformations and operations.

The chapter also emphasized the significance of exploratory data analysis, the practice of summarizing the main characteristics of a dataset, often through visual means. Here, we saw how Python's pandas library could be used to generate meaningful insights from our data, which can help inform further data analysis steps or business decisions.

Next, we dived into the art of SQL queries for data analysis. Advanced SQL concepts like joining tables, using aggregate functions, and crafting complex queries became accessible and practical. We learned how we could use these tools not just for extracting data, but also for conducting substantive data analysis directly within a SQL environment.

Finally, we provided a set of practical exercises, cementing the concepts covered in this chapter and providing hands-on experience in working with Python and SQL together in the context of data analysis.

The skills and knowledge you've gained in this chapter are valuable tools for any aspiring data scientist or data analyst. Mastering them will give you an edge in your ability to handle, analyze, and gain insights from data. Going forward, these will form the bedrock for more advanced techniques in data science and machine learning.

In the next chapter, we will continue to build on these foundations, diving into more advanced SQL functionalities within Python and integrating more sophisticated data analysis techniques into our toolkit. Stay tuned!

18.6 Practical Exercises of Chapter 18: Data Analysis with Python and SQL

Exercise 1: Data Cleaning

You have a table in your SQLite database named employee_data with the columns idnameageemaildepartment, and salary. Unfortunately, some rows contain missing values (None in Python, NULL in SQL), and some email entries are not in the proper format (they should be something@domain.com).

Write a Python script using sqlite3 module to:

  1. Remove all the rows with any column having None/NULL.
  2. Validate the email entries and remove the rows with invalid email format.
import sqlite3
import re

# Connect to the database
conn = sqlite3.connect('my_database.db')

# Create a cursor object
c = conn.cursor()

# Remove rows with any NULL value
c.execute("DELETE FROM employee_data WHERE id IS NULL OR name IS NULL OR age IS NULL OR email IS NULL OR department IS NULL OR salary IS NULL")

# Validate email format and remove rows with invalid emails
c.execute("SELECT * FROM employee_data")
rows = c.fetchall()
for row in rows:
    if not re.match(r"[^@]+@[^@]+\.[^@]+", row[3]):
        c.execute("DELETE FROM employee_data WHERE id=?", (row[0],))

# Commit the changes and close the connection
conn.commit()
conn.close()

Exercise 2: Data Transformation

Assuming you have a table in your SQLite database named sales with the columns idregiontotal_sales, and date:

  1. Write a SQL query to add a new column profit, which is 10% of total_sales.
  2. Write a Python script using sqlite3 to implement the SQL query.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Add a new column 'profit'
c.execute("ALTER TABLE sales ADD COLUMN profit REAL")

# Update 'profit' as 10% of 'total_sales'
c.execute("UPDATE sales SET profit = total_sales * 0.1")

conn.commit()
conn.close()

Exercise 3: Querying SQL Database from Python

Using the sales table in your SQLite database:

  1. Write a Python script using sqlite3 module to fetch all the rows where region is 'West', and print each row.
  2. Calculate the average total_sales for the 'West' region in Python.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Fetch and print rows where 'region' is 'West'
c.execute("SELECT * FROM sales WHERE region = 'West'")
rows = c.fetchall()
for row in rows:
    print(row)

# Calculate the average 'total_sales' for the 'West' region
c.execute("SELECT AVG(total_sales) FROM sales WHERE region = 'West'")
average_sales = c.fetchone()[0]
print(f'Average sales in the West region: {average_sales}')

conn.close()

Chapter 18 Conclusion

This chapter provided a comprehensive overview of how Python and SQL can work in harmony to provide efficient and flexible solutions for data analysis tasks. The process begins with data cleaning, a crucial step to ensure the quality of data analysis. We explored how to handle missing data and duplicates, both in Python with pandas and directly in SQL.

We dived into the world of data manipulation and transformation, demonstrating how you can leverage the power of SQL's syntax and Python's pandas library to extract, convert, and create new data from existing datasets. SQL proved itself to be a powerful tool for manipulating data in place, while Python offered a flexible and intuitive environment for complex transformations and operations.

The chapter also emphasized the significance of exploratory data analysis, the practice of summarizing the main characteristics of a dataset, often through visual means. Here, we saw how Python's pandas library could be used to generate meaningful insights from our data, which can help inform further data analysis steps or business decisions.

Next, we dived into the art of SQL queries for data analysis. Advanced SQL concepts like joining tables, using aggregate functions, and crafting complex queries became accessible and practical. We learned how we could use these tools not just for extracting data, but also for conducting substantive data analysis directly within a SQL environment.

Finally, we provided a set of practical exercises, cementing the concepts covered in this chapter and providing hands-on experience in working with Python and SQL together in the context of data analysis.

The skills and knowledge you've gained in this chapter are valuable tools for any aspiring data scientist or data analyst. Mastering them will give you an edge in your ability to handle, analyze, and gain insights from data. Going forward, these will form the bedrock for more advanced techniques in data science and machine learning.

In the next chapter, we will continue to build on these foundations, diving into more advanced SQL functionalities within Python and integrating more sophisticated data analysis techniques into our toolkit. Stay tuned!

18.6 Practical Exercises of Chapter 18: Data Analysis with Python and SQL

Exercise 1: Data Cleaning

You have a table in your SQLite database named employee_data with the columns idnameageemaildepartment, and salary. Unfortunately, some rows contain missing values (None in Python, NULL in SQL), and some email entries are not in the proper format (they should be something@domain.com).

Write a Python script using sqlite3 module to:

  1. Remove all the rows with any column having None/NULL.
  2. Validate the email entries and remove the rows with invalid email format.
import sqlite3
import re

# Connect to the database
conn = sqlite3.connect('my_database.db')

# Create a cursor object
c = conn.cursor()

# Remove rows with any NULL value
c.execute("DELETE FROM employee_data WHERE id IS NULL OR name IS NULL OR age IS NULL OR email IS NULL OR department IS NULL OR salary IS NULL")

# Validate email format and remove rows with invalid emails
c.execute("SELECT * FROM employee_data")
rows = c.fetchall()
for row in rows:
    if not re.match(r"[^@]+@[^@]+\.[^@]+", row[3]):
        c.execute("DELETE FROM employee_data WHERE id=?", (row[0],))

# Commit the changes and close the connection
conn.commit()
conn.close()

Exercise 2: Data Transformation

Assuming you have a table in your SQLite database named sales with the columns idregiontotal_sales, and date:

  1. Write a SQL query to add a new column profit, which is 10% of total_sales.
  2. Write a Python script using sqlite3 to implement the SQL query.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Add a new column 'profit'
c.execute("ALTER TABLE sales ADD COLUMN profit REAL")

# Update 'profit' as 10% of 'total_sales'
c.execute("UPDATE sales SET profit = total_sales * 0.1")

conn.commit()
conn.close()

Exercise 3: Querying SQL Database from Python

Using the sales table in your SQLite database:

  1. Write a Python script using sqlite3 module to fetch all the rows where region is 'West', and print each row.
  2. Calculate the average total_sales for the 'West' region in Python.
import sqlite3

conn = sqlite3.connect('my_database.db')
c = conn.cursor()

# Fetch and print rows where 'region' is 'West'
c.execute("SELECT * FROM sales WHERE region = 'West'")
rows = c.fetchall()
for row in rows:
    print(row)

# Calculate the average 'total_sales' for the 'West' region
c.execute("SELECT AVG(total_sales) FROM sales WHERE region = 'West'")
average_sales = c.fetchone()[0]
print(f'Average sales in the West region: {average_sales}')

conn.close()

Chapter 18 Conclusion

This chapter provided a comprehensive overview of how Python and SQL can work in harmony to provide efficient and flexible solutions for data analysis tasks. The process begins with data cleaning, a crucial step to ensure the quality of data analysis. We explored how to handle missing data and duplicates, both in Python with pandas and directly in SQL.

We dived into the world of data manipulation and transformation, demonstrating how you can leverage the power of SQL's syntax and Python's pandas library to extract, convert, and create new data from existing datasets. SQL proved itself to be a powerful tool for manipulating data in place, while Python offered a flexible and intuitive environment for complex transformations and operations.

The chapter also emphasized the significance of exploratory data analysis, the practice of summarizing the main characteristics of a dataset, often through visual means. Here, we saw how Python's pandas library could be used to generate meaningful insights from our data, which can help inform further data analysis steps or business decisions.

Next, we dived into the art of SQL queries for data analysis. Advanced SQL concepts like joining tables, using aggregate functions, and crafting complex queries became accessible and practical. We learned how we could use these tools not just for extracting data, but also for conducting substantive data analysis directly within a SQL environment.

Finally, we provided a set of practical exercises, cementing the concepts covered in this chapter and providing hands-on experience in working with Python and SQL together in the context of data analysis.

The skills and knowledge you've gained in this chapter are valuable tools for any aspiring data scientist or data analyst. Mastering them will give you an edge in your ability to handle, analyze, and gain insights from data. Going forward, these will form the bedrock for more advanced techniques in data science and machine learning.

In the next chapter, we will continue to build on these foundations, diving into more advanced SQL functionalities within Python and integrating more sophisticated data analysis techniques into our toolkit. Stay tuned!