Dataframe Column Matches Properties
Check if a DataFrame column matches specified properties like type, length, uniqueness, and more.
Location of the snippet: python/pandas/dataframes/dataframe_column_matches_properties
This snippet checks whether a column in the student's DataFrame matches the properties you specify: data type, length, number of unique values, null count, and minimum or maximum value.
| Device Type |
|---|
| Jupyter |
Variables:
| Variable Name | Variable Description | Type | Required? | Default |
|---|---|---|---|---|
| df_variable_name | Name of the student's DataFrame variable. | str | Yes | |
| column_name | Name of the column to check. | str | Yes | |
| data_type | Expected data type of the column. | str | No | |
| length | Expected length of the column. | int | No | |
| num_unique | Expected number of unique values in the column. | int | No | |
| null_count | Expected number of null values in the column. | int | No | |
| max_value | Expected maximum value in the column. | number | No | |
| min_value | Expected minimum value in the column. | number | No | |
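Each property in the table corresponds to a standard pandas operation. The following is a minimal sketch of how such a check could work, not the snippet's actual implementation. The function name `column_matches_properties`, the mapping from `data_type` strings to pandas dtype predicates (only `int` appears in the examples), and the choice to compare `max_value` and `min_value` for exact equality (following the table's wording "Expected maximum/minimum value") are all assumptions.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical mapping from data_type strings to pandas dtype predicates.
# Only "int" appears in the examples; the other names are assumptions.
DTYPE_CHECKS = {
    "int": ptypes.is_integer_dtype,
    "float": ptypes.is_float_dtype,
    "bool": ptypes.is_bool_dtype,
    "str": ptypes.is_string_dtype,
    "datetime": ptypes.is_datetime64_any_dtype,
}


def column_matches_properties(df, column_name, data_type=None, length=None,
                              num_unique=None, null_count=None,
                              max_value=None, min_value=None):
    """Hypothetical checker: return a list of mismatches (empty means the column matches)."""
    if column_name not in df.columns:
        return [f"column {column_name!r} not found"]
    col = df[column_name]
    problems = []
    if data_type is not None:
        check = DTYPE_CHECKS.get(data_type)
        if check is None:
            raise ValueError(f"unsupported data_type: {data_type!r}")
        if not check(col):
            problems.append(f"data_type: expected {data_type}, got {col.dtype}")
    if length is not None and len(col) != length:
        problems.append(f"length: expected {length}, got {len(col)}")
    # nunique() skips NaN by default, so nulls are not counted as a unique value
    if num_unique is not None and col.nunique() != num_unique:
        problems.append(f"num_unique: expected {num_unique}, got {col.nunique()}")
    if null_count is not None and int(col.isna().sum()) != null_count:
        problems.append(f"null_count: expected {null_count}, got {int(col.isna().sum())}")
    # Assumption: min/max must equal the observed extremes (not just bound them)
    if max_value is not None and col.max() != max_value:
        problems.append(f"max_value: expected {max_value}, got {col.max()}")
    if min_value is not None and col.min() != min_value:
        problems.append(f"min_value: expected {min_value}, got {col.min()}")
    return problems


# Usage: only the properties you pass are checked
temps = pd.DataFrame({"Temperature": [25.5, 50.0, 0.0, 100.0]})
print(column_matches_properties(temps, "Temperature", min_value=0.0, max_value=100.0))  # []
```

Returning a list of mismatches rather than a single boolean lets the feedback name every property that failed at once.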
Examples:
1. Checking Data Type and Length
This example verifies that a specific column in the student's DataFrame has the correct data type and the expected number of entries.
Scenario: A student is loading a dataset and you want to ensure the 'CustomerID' column is of integer type and contains records for 100 customers.
Task:
Load or create a DataFrame named customer_data_df that includes a CustomerID column of integer type with 100 entries.
Example customer_data_df (in student's notebook):
```
    CustomerID CustomerName
0          101        Alice
1          102          Bob
..         ...          ...
99         200          Zoe
```
Placeholder:
```python
customer_data_df = ...
```
Solution:
```python
import pandas as pd

# Example: 100 customers with integer IDs 101 through 200
data = {'CustomerID': list(range(101, 201)),
        'CustomerName': [f'Name_{i}' for i in range(100)]}
customer_data_df = pd.DataFrame(data)
```
Snippet for the assertion:
| Variable Name | Value |
|---|---|
| df_variable_name | customer_data_df |
| column_name | CustomerID |
| data_type | int |
| length | 100 |
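For reference, the two properties configured above can be checked directly in pandas. This is a minimal sketch that assumes `int` means any pandas integer dtype (such as `int64`); the snippet's own interpretation of the type name may differ.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Same DataFrame as the solution above
data = {'CustomerID': list(range(101, 201)),
        'CustomerName': [f'Name_{i}' for i in range(100)]}
customer_data_df = pd.DataFrame(data)

col = customer_data_df['CustomerID']
dtype_ok = is_integer_dtype(col)  # data_type = int (any integer dtype)
length_ok = len(col) == 100       # length = 100
print(dtype_ok, length_ok)  # True True

# A single missing ID makes pandas store the column as float64,
# so the same integer check fails
print(is_integer_dtype(pd.Series([101, None])))  # False
```

The second check is a common reason a student's CustomerID column fails the data type check even though every present value looks like an integer.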
2. Checking Uniqueness and Null Count
This example checks the number of unique values and the number of missing values in a column, both of which are crucial for identifier columns.
Scenario: The 'OrderID' column in the transactions_df DataFrame must be unique and contain no null values to maintain data integrity.
Task:
Create a DataFrame named transactions_df where the OrderID column is unique and has no missing values.
Example transactions_df (in student's notebook):
```
  OrderID      Item  Amount
0    T101    Laptop  1200.0
1    T102     Mouse    25.0
2    T103  Keyboard    45.0
```
Placeholder:
```python
transactions_df = ...
```
Solution:
```python
import pandas as pd

data = {'OrderID': ['T101', 'T102', 'T103'],
        'Item': ['Laptop', 'Mouse', 'Keyboard'],
        'Amount': [1200.0, 25.0, 45.0]}
transactions_df = pd.DataFrame(data)
```
Snippet for the assertion:
| Variable Name | Value |
|---|---|
| df_variable_name | transactions_df |
| column_name | OrderID |
| num_unique | 3 |
| null_count | 0 |
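The uniqueness and null-count properties correspond to `Series.nunique()` and `Series.isna().sum()` in pandas. This minimal sketch runs those checks on the solution DataFrame, assuming the snippet counts unique values the way `nunique()` does by default (ignoring nulls):

```python
import pandas as pd

# Same DataFrame as the solution above
data = {'OrderID': ['T101', 'T102', 'T103'],
        'Item': ['Laptop', 'Mouse', 'Keyboard'],
        'Amount': [1200.0, 25.0, 45.0]}
transactions_df = pd.DataFrame(data)

order_ids = transactions_df['OrderID']
unique_count = order_ids.nunique()           # num_unique; NaN excluded by default
missing_count = int(order_ids.isna().sum())  # null_count
print(unique_count, missing_count)  # 3 0

# A duplicated ID and a missing ID both change the counts
bad_ids = pd.Series(['T101', 'T101', None])
print(bad_ids.nunique(), int(bad_ids.isna().sum()))  # 1 1
```

Note that `num_unique = 3` on its own would also be satisfied by a four-row column that repeats one of the three IDs. If the intent is that every OrderID is distinct, adding `length | 3` to the assertion would make the check stricter.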
3. Checking Value Range (Min/Max)
This example checks that the numerical values in a column fall within an expected range, which is useful for sensor readings or calculated metrics.
Scenario: After a data transformation, the 'Temperature' column in a sensor_data_df DataFrame should have values between 0.0 and 100.0, inclusive.
Task:
Create a DataFrame named sensor_data_df with a Temperature column where all values are within the range 0.0 to 100.0.
Example sensor_data_df (in student's notebook):
```
  SensorID  Temperature  Humidity
0      S01         25.5        60
1      S02         50.0        65
2      S03          0.0        55
3      S04        100.0        70
```
Placeholder:
```python
sensor_data_df = ...
```
Solution:
```python
import pandas as pd

data = {'SensorID': ['S01', 'S02', 'S03', 'S04'],
        'Temperature': [25.5, 50.0, 0.0, 100.0],
        'Humidity': [60, 65, 55, 70]}
sensor_data_df = pd.DataFrame(data)
```
Snippet for the assertion:
| Variable Name | Value |
|---|---|
| df_variable_name | sensor_data_df |
| column_name | Temperature |
| min_value | 0.0 |
| max_value | 100.0 |
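The variable table describes `min_value` and `max_value` as the expected minimum and maximum values, while this scenario speaks of values lying between 0.0 and 100.0. Both readings can be expressed in pandas. Which one the snippet applies is not stated, so this sketch shows both:

```python
import pandas as pd

# Same DataFrame as the solution above
data = {'SensorID': ['S01', 'S02', 'S03', 'S04'],
        'Temperature': [25.5, 50.0, 0.0, 100.0],
        'Humidity': [60, 65, 55, 70]}
sensor_data_df = pd.DataFrame(data)

temps = sensor_data_df['Temperature']
# Reading 1: min_value/max_value are the exact observed extremes
exact_ok = temps.min() == 0.0 and temps.max() == 100.0
# Reading 2: min_value/max_value are inclusive bounds on every value
# (Series.between includes both endpoints by default)
bounds_ok = temps.between(0.0, 100.0).all()
print(exact_ok, bounds_ok)  # True True
```

With the solution data the two readings agree, because the readings include 0.0 and 100.0 exactly. A column such as `[10.0, 20.0]` stays within the bounds but would not match an exact minimum of 0.0 or maximum of 100.0.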