
Dataframe Column Matches Properties

Check if a DataFrame column matches specified properties like type, length, uniqueness, and more.

Location of the snippet: python/pandas/dataframes/dataframe_column_matches_properties


Device Type
Jupyter

Variables:

| Variable Name | Variable Description | Type | Required? | Default |
| --- | --- | --- | --- | --- |
| df_variable_name | Name of the student's DataFrame variable. | str | Yes | |
| column_name | Name of the column to check. | str | Yes | |
| data_type | Expected data type of the column. | str | No | |
| length | Expected length of the column. | int | No | |
| num_unique | Expected number of unique values in the column. | int | No | |
| null_count | Expected number of null values in the column. | int | No | |
| max_value | Expected maximum value in the column. | number | No | |
| min_value | Expected minimum value in the column. | number | No | |
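
The snippet's internals are not shown on this page, so the following is only a rough sketch of how checks like these could be written in plain pandas. The function name `column_matches_properties`, its return value (a list of mismatch messages), and the substring-based `data_type` matching are all assumptions made for illustration, not the snippet's actual behavior.

```python
import pandas as pd


def column_matches_properties(df, column_name, data_type=None, length=None,
                              num_unique=None, null_count=None,
                              max_value=None, min_value=None):
    """Hypothetical helper: return a list of mismatch messages (empty if all pass)."""
    if column_name not in df.columns:
        return [f"column {column_name!r} not found"]
    col = df[column_name]
    problems = []
    # Assumption: data_type is matched as a substring of the dtype name,
    # so 'int' matches 'int64' and 'float' matches 'float64'.
    if data_type is not None and data_type not in str(col.dtype):
        problems.append(f"dtype is {col.dtype}, expected {data_type}")
    if length is not None and len(col) != length:
        problems.append(f"length is {len(col)}, expected {length}")
    # Series.nunique() ignores null values by default.
    if num_unique is not None and col.nunique() != num_unique:
        problems.append(f"unique count is {col.nunique()}, expected {num_unique}")
    if null_count is not None and int(col.isna().sum()) != null_count:
        problems.append(f"null count is {int(col.isna().sum())}, expected {null_count}")
    # Assumption: max_value and min_value are compared to the observed extremes.
    if max_value is not None and col.max() != max_value:
        problems.append(f"max is {col.max()}, expected {max_value}")
    if min_value is not None and col.min() != min_value:
        problems.append(f"min is {col.min()}, expected {min_value}")
    return problems


# Usage with a small illustrative DataFrame (not from the examples on this page).
demo_df = pd.DataFrame({"score": [1, 2, 2, 3]})
result = column_matches_properties(
    demo_df, "score", data_type="int", length=4,
    num_unique=3, null_count=0, min_value=1, max_value=3,
)
print(result)
```

Only the properties that are passed get checked, which mirrors the optional variables in the table above.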

Examples:

1. Checking Data Type and Length

This example verifies that a specific column in the student's DataFrame has the correct data type and the expected number of entries.

Scenario: A student is loading a dataset and you want to ensure the 'CustomerID' column is of integer type and contains records for 100 customers.

Task: Load or create a DataFrame named customer_data_df that includes a CustomerID column, ensuring it's an integer type and has 100 entries.

Example customer_data_df (in student's notebook):

    CustomerID  CustomerName
0          101         Alice
1          102           Bob
..         ...           ...
99         200           Zoe

Placeholder:

customer_data_df = ...

Solution:

import pandas as pd
# Example: 100 customers with integer IDs 101 through 200
data = {'CustomerID': list(range(101, 201)),
        'CustomerName': [f'Name_{i}' for i in range(100)]}
customer_data_df = pd.DataFrame(data)

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| df_variable_name | customer_data_df |
| column_name | CustomerID |
| data_type | int |
| length | 100 |
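
For reference, the two properties configured above correspond roughly to the following plain-pandas checks. This is a sketch of equivalent logic, not the snippet's own code; in particular, using `is_integer_dtype` for the `int` data type is an assumption.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Same data as the solution above.
customer_data_df = pd.DataFrame({
    "CustomerID": list(range(101, 201)),
    "CustomerName": [f"Name_{i}" for i in range(100)],
})

customer_ids = customer_data_df["CustomerID"]
dtype_ok = is_integer_dtype(customer_ids)   # True for int64, int32, and similar
length_ok = len(customer_ids) == 100        # one entry per customer

print(dtype_ok, length_ok)
```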

2. Checking Uniqueness and Null Count

This example checks the number of unique values and the number of missing values in a column, which is crucial for identifier columns.

Scenario: The 'OrderID' column in a transactions_df DataFrame must be unique and contain no null values to maintain data integrity.

Task: Create a DataFrame named transactions_df where the OrderID column is unique and has no missing values.

Example transactions_df (in student's notebook):

   OrderID      Item  Amount
0     T101    Laptop  1200.0
1     T102     Mouse    25.0
2     T103  Keyboard    45.0

Placeholder:

transactions_df = ...

Solution:

import pandas as pd

data = {'OrderID': ['T101', 'T102', 'T103'],
        'Item': ['Laptop', 'Mouse', 'Keyboard'],
        'Amount': [1200.0, 25.0, 45.0]}

transactions_df = pd.DataFrame(data)

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| df_variable_name | transactions_df |
| column_name | OrderID |
| num_unique | 3 |
| null_count | 0 |
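
A rough plain-pandas equivalent of these two properties is sketched below; it is illustrative only and not taken from the snippet's implementation. Note that `num_unique` set to 3 implies full uniqueness here only because the DataFrame has exactly 3 rows; `Series.is_unique` expresses the same requirement independently of row count.

```python
import pandas as pd

# Same data as the solution above.
transactions_df = pd.DataFrame({
    "OrderID": ["T101", "T102", "T103"],
    "Item": ["Laptop", "Mouse", "Keyboard"],
    "Amount": [1200.0, 25.0, 45.0],
})

order_ids = transactions_df["OrderID"]
unique_count = order_ids.nunique()            # distinct non-null values
missing_count = int(order_ids.isna().sum())   # number of null entries
all_unique = order_ids.is_unique              # True when no value repeats

print(unique_count, missing_count, all_unique)
```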

3. Checking Value Range (Min/Max)

This example validates that numerical values within a column fall within an expected range, useful for validating sensor readings or calculated metrics.

Scenario: After a data transformation, the 'Temperature' column in a sensor_data_df DataFrame should have values between 0.0 and 100.0, inclusive.

Task: Create a DataFrame named sensor_data_df with a Temperature column where all values are within the range 0.0 to 100.0.

Example sensor_data_df (in student's notebook):

   SensorID  Temperature  Humidity
0       S01         25.5        60
1       S02         50.0        65
2       S03          0.0        55
3       S04        100.0        70

Placeholder:

sensor_data_df = ...

Solution:

import pandas as pd

data = {'SensorID': ['S01', 'S02', 'S03', 'S04'],
        'Temperature': [25.5, 50.0, 0.0, 100.0],
        'Humidity': [60, 65, 55, 70]}

sensor_data_df = pd.DataFrame(data)

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| df_variable_name | sensor_data_df |
| column_name | Temperature |
| min_value | 0.0 |
| max_value | 100.0 |
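
The page does not say whether the snippet compares the column's observed minimum and maximum against these values exactly or treats them as bounds. The sketch below shows both readings in plain pandas, as an illustration under that uncertainty rather than a description of the snippet.

```python
import pandas as pd

# Same data as the solution above.
sensor_data_df = pd.DataFrame({
    "SensorID": ["S01", "S02", "S03", "S04"],
    "Temperature": [25.5, 50.0, 0.0, 100.0],
    "Humidity": [60, 65, 55, 70],
})

temps = sensor_data_df["Temperature"]

# Reading 1: the observed extremes equal the expected values.
observed_min = temps.min()
observed_max = temps.max()

# Reading 2: every value lies within the inclusive range 0.0 to 100.0.
in_range = bool(temps.between(0.0, 100.0).all())

print(observed_min, observed_max, in_range)
```

With the sample data both readings pass, because the column contains the endpoints 0.0 and 100.0 themselves.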