Quiz Part 3: Data Cleaning and Preprocessing
Questions
This quiz tests your understanding of the techniques covered in Part 3: advanced data cleaning, time series handling, and dimensionality reduction.
1. Which of the following methods is most effective for removing features with very low variance?
a) Recursive Feature Elimination (RFE)
b) Variance Thresholding
c) Principal Component Analysis (PCA)
d) Lasso Regression
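For reference after answering: variance-based filtering can be sketched with scikit-learn's `VarianceThreshold`. The toy matrix and the `0.01` threshold below are illustrative choices, not values from the quiz.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: the middle column is nearly constant (very low variance).
X = np.array([
    [1.0, 0.0, 10.0],
    [2.0, 0.0, 20.0],
    [3.0, 0.1, 30.0],
    [4.0, 0.0, 40.0],
])

# Drop every feature whose variance falls below the threshold.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)  # the low-variance column is gone
```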
2. When working with time series data, which technique would be most appropriate for handling missing dates to maintain the dataset’s temporal consistency?
a) Removing all rows with missing dates
b) Filling missing dates with the average date
c) Reindexing the data to a regular frequency and using forward-fill or backward-fill
d) Replacing missing dates with a constant date value
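For reference: reindexing onto a regular frequency and then filling is a one-liner in pandas. The dates and values here are made up for illustration.

```python
import pandas as pd

# A daily series with two dates missing (Jan 3 and Jan 5).
s = pd.Series(
    [10.0, 11.0, 13.0, 14.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-06"]),
)

# Reindex onto a complete daily range, then forward-fill the gaps
# so each missing day carries the last observed value.
full_index = pd.date_range(s.index.min(), s.index.max(), freq="D")
s_filled = s.reindex(full_index).ffill()

print(s_filled)
```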
3. What does the explained variance ratio in PCA represent?
a) The variance of the dataset captured by each original feature
b) The number of components selected to reach 100% variance
c) The proportion of the dataset’s variance captured by each principal component
d) The total variance of the transformed data
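For reference: scikit-learn exposes the quantity this question asks about as `explained_variance_ratio_`. The synthetic data below is an illustrative sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two strongly correlated features plus one independent noisy feature.
x = rng.normal(size=200)
X = np.column_stack([
    x,
    x * 2 + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

pca = PCA(n_components=3)
pca.fit(X)

# Each entry is the fraction of the dataset's total variance
# captured by the corresponding principal component.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())  # 1.0 when all components are kept
```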
4. In which of the following scenarios would using cyclical encoding (sine and cosine transformation) for a feature be most appropriate?
a) Encoding daily sales figures
b) Encoding categorical variables like product categories
c) Encoding features with cyclical patterns, like day of the week
d) Encoding user IDs in a dataset
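For reference: the sine/cosine transformation maps a cyclical value onto the unit circle, so the "ends" of the cycle sit next to each other. A minimal sketch for day of week:

```python
import numpy as np

# Day of week: 0 = Monday ... 6 = Sunday.
day = np.arange(7)

# Place each day on the unit circle so that Sunday (6) and
# Monday (0) end up adjacent, not 6 units apart.
day_sin = np.sin(2 * np.pi * day / 7)
day_cos = np.cos(2 * np.pi * day / 7)
```

With a plain integer encoding, a model would see Sunday and Monday as maximally distant; on the circle, every consecutive pair of days is equidistant.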
5. Which of the following is NOT a characteristic of filter methods in feature selection?
a) They rank features independently of a specific model
b) They are computationally efficient
c) They rely on model training to determine feature importance
d) They use metrics like correlation and variance for feature selection
6. Which dimensionality reduction technique uses a linear transformation to create new axes that capture maximum variance?
a) Linear Discriminant Analysis (LDA)
b) Principal Component Analysis (PCA)
c) Recursive Feature Elimination (RFE)
d) Lasso Regression
7. If two features in a dataset have a correlation coefficient close to 1, which technique would help reduce redundancy without losing critical information?
a) Recursive Feature Elimination (RFE)
b) Variance Thresholding
c) Correlation Thresholding
d) Lasso Regression
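For reference: one common way to apply a correlation threshold is to scan the upper triangle of the correlation matrix and drop one feature from each highly correlated pair. The `0.95` cutoff and the toy DataFrame are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "b": a * 3 + rng.normal(scale=0.01, size=100),  # near-duplicate of "a"
    "c": rng.normal(size=100),
})

# Absolute correlations, upper triangle only (so each pair is seen once).
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop one feature from every pair above the threshold.
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df.drop(columns=to_drop)
```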
8. In which situation might wrapper methods for feature selection be more effective than filter methods?
a) When computational efficiency is the top priority
b) When working with a dataset with many highly correlated features
c) When interaction effects between features need to be captured
d) When ranking features based on statistical properties alone
9. What is a common drawback of using Lasso regression for feature selection?
a) It may fail to remove any features from the model
b) It cannot be combined with other dimensionality reduction methods
c) It may over-penalize features, potentially leading to underfitting
d) It does not assign a coefficient of zero to unimportant features
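For reference: the sketch below shows both sides of the trade-off this question probes. A moderate penalty zeroes out irrelevant coefficients; a very strong penalty shrinks even informative ones away. The alpha values and synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
# Only the first two features actually drive the target.
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Moderate penalty: irrelevant coefficients are driven to exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)

# Very strong penalty: even the informative features are shrunk
# toward zero, which can underfit.
lasso_strong = Lasso(alpha=10.0).fit(X, y)
print(lasso_strong.coef_)
```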
10. Why is it recommended to use cross-validation when applying wrapper methods like Recursive Feature Elimination (RFE) on small datasets?
a) To maximize feature interactions
b) To ensure that selected features generalize well to new data
c) To avoid data leakage
d) To prioritize computational efficiency
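For reference: scikit-learn combines RFE with cross-validation in `RFECV`, which keeps a feature only if it helps held-out performance rather than just training fit. The dataset sizes and estimator below are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset: 5 informative features out of 10.
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=5,
    n_redundant=0, random_state=0,
)

# Recursive elimination scored by 5-fold cross-validation.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
selector.fit(X, y)

print(selector.n_features_)  # number of features retained
print(selector.support_)     # boolean mask of kept features
```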