Dataframe Column Matches Properties

Check if a DataFrame column matches specified properties like type, length, uniqueness, and more.

Location of the snippet: python/pandas/dataframes/dataframe_column_matches_properties

Device Type
Jupyter

Variables:

Variable Name	Variable Description	Type	Required?
`df_variable_name`	Name of the student's DataFrame variable.	`str`	Yes
`column_name`	Name of the column to check.	`str`	Yes
`data_type`	Expected data type of the column.	`str`	No
`length`	Expected length of the column.	`int`	No
`num_unique`	Expected number of unique values in the column.	`int`	No
`null_count`	Expected number of null values in the column.	`int`	No
`max_value`	Expected maximum value in the column.	`number`	No
`min_value`	Expected minimum value in the column.	`number`	No
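
The variables above map naturally onto a handful of pandas calls. The following is a minimal sketch of how such a check could be written, not the snippet's actual implementation: the function name `column_matches_properties`, the substring matching used for `data_type`, and the exact-equality comparison for `min_value` and `max_value` are all assumptions made for illustration.

```python
import pandas as pd


def column_matches_properties(df, column_name, data_type=None, length=None,
                              num_unique=None, null_count=None,
                              max_value=None, min_value=None):
    """Hypothetical helper: return a list of mismatch messages (empty if all checks pass)."""
    if column_name not in df.columns:
        return [f"column {column_name!r} not found"]

    col = df[column_name]
    problems = []

    # Assumption: data_type is matched as a substring of the dtype name,
    # so 'int' matches 'int64' and 'float' matches 'float64'.
    if data_type is not None and data_type not in str(col.dtype):
        problems.append(f"dtype is {col.dtype}, expected {data_type}")
    if length is not None and len(col) != length:
        problems.append(f"length is {len(col)}, expected {length}")
    # Series.nunique() ignores null values by default.
    if num_unique is not None and col.nunique() != num_unique:
        problems.append(f"{col.nunique()} unique values, expected {num_unique}")
    if null_count is not None and col.isna().sum() != null_count:
        problems.append(f"{col.isna().sum()} null values, expected {null_count}")
    # Assumption: min/max are compared for exact equality with the expected values.
    if max_value is not None and col.max() != max_value:
        problems.append(f"max is {col.max()}, expected {max_value}")
    if min_value is not None and col.min() != min_value:
        problems.append(f"min is {col.min()}, expected {min_value}")
    return problems


# Usage: only the properties that are passed get checked.
example_df = pd.DataFrame({'CustomerID': list(range(101, 201))})
problems = column_matches_properties(example_df, 'CustomerID',
                                     data_type='int', length=100)
print(problems)
```

Returning a list of messages rather than a single boolean makes it easier to tell a student which property failed.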

Examples:

1. Checking Data Type and Length

This example verifies that a specific column in the student's DataFrame has the correct data type and the expected number of entries.

Scenario: A student is loading a dataset and you want to ensure the 'CustomerID' column is of integer type and contains records for 100 customers.

Task: Load or create a DataFrame named customer_data_df that includes a CustomerID column, ensuring it's an integer type and has 100 entries.

Example customer_data_df (in student's notebook):

   CustomerID CustomerName
0         101        Alice
1         102          Bob
..        ...          ...
99        200          Zoe

Placeholder:

customer_data_df = ...

Solution:

import pandas as pd
# Example:
data = {'CustomerID': list(range(101, 201)),
        'CustomerName': [f'Name_{i}' for i in range(100)]}
customer_data_df = pd.DataFrame(data)

Snippet for the assertion:

Variable Name	Value
`df_variable_name`	`customer_data_df`
`column_name`	`CustomerID`
`data_type`	`int`
`length`	`100`
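
For reference, the two properties configured above can be reproduced by hand with plain pandas. This is only a sketch of an equivalent manual check: it assumes that `int` is meant to accept any integer dtype (such as `int64`), which the snippet itself may or may not do.

```python
import pandas as pd

# Same data as the solution above.
customer_data_df = pd.DataFrame({
    'CustomerID': list(range(101, 201)),
    'CustomerName': [f'Name_{i}' for i in range(100)],
})

column = customer_data_df['CustomerID']

# Assumption: 'int' is interpreted as "any integer dtype".
type_ok = pd.api.types.is_integer_dtype(column)
# The length counts every row, including rows holding null values.
length_ok = len(column) == 100

print(type_ok, length_ok)
```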

2. Checking Uniqueness and Null Count

This example checks the number of unique values and the number of missing values in a column, which is crucial for identifier columns.

Scenario: The 'OrderID' column in a transactions_df DataFrame must be unique and contain no null values to maintain data integrity.

Task: Create a DataFrame named transactions_df where the OrderID column is unique and has no missing values.

Example transactions_df (in student's notebook):

   OrderID      Item  Amount
   T101     Laptop  1200.0
   T102      Mouse    25.0
   T103   Keyboard    45.0

Placeholder:

transactions_df = ...

Solution:

import pandas as pd

data = {'OrderID': ['T101', 'T102', 'T103'],
        'Item': ['Laptop', 'Mouse', 'Keyboard'],
        'Amount': [1200.0, 25.0, 45.0]}

transactions_df = pd.DataFrame(data)

Snippet for the assertion:

Variable Name	Value
`df_variable_name`	`transactions_df`
`column_name`	`OrderID`
`num_unique`	`3`
`null_count`	`0`
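
The same two properties can be inspected manually with `Series.nunique()` and `Series.isna()`. The sketch below is an illustration of what the configured values mean, not the snippet's own code. Note that `nunique()` skips null values by default, so a column with nulls can still report the expected unique count; that is why checking `null_count` alongside it is useful.

```python
import pandas as pd

# Same data as the solution above.
transactions_df = pd.DataFrame({
    'OrderID': ['T101', 'T102', 'T103'],
    'Item': ['Laptop', 'Mouse', 'Keyboard'],
    'Amount': [1200.0, 25.0, 45.0],
})

order_ids = transactions_df['OrderID']

unique_count = order_ids.nunique()      # distinct non-null values
null_count = int(order_ids.isna().sum())  # number of missing values

# With 3 rows, 3 unique values and 0 nulls mean every OrderID is distinct.
all_distinct = unique_count == len(order_ids) and null_count == 0

print(unique_count, null_count, all_distinct)
```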

3. Checking Value Range (Min/Max)

This example validates the minimum and maximum values of a numerical column, which is useful for confirming that sensor readings or calculated metrics stay within an expected range.

Scenario: After a data transformation, the 'Temperature' column in a sensor_data_df DataFrame should have values between 0.0 and 100.0, inclusive.

Task: Create a DataFrame named sensor_data_df with a Temperature column where all values are within the range 0.0 to 100.0.

Example sensor_data_df (in student's notebook):

   SensorID  Temperature  Humidity
     S01         25.5        60
     S02         50.0        65
     S03          0.0        55
     S04        100.0        70

Placeholder:

sensor_data_df = ...

Solution:

import pandas as pd

data = {'SensorID': ['S01', 'S02', 'S03', 'S04'],
        'Temperature': [25.5, 50.0, 0.0, 100.0],
        'Humidity': [60, 65, 55, 70]}

sensor_data_df = pd.DataFrame(data)

Snippet for the assertion:

Variable Name	Value
`df_variable_name`	`sensor_data_df`
`column_name`	`Temperature`
`min_value`	`0.0`
`max_value`	`100.0`
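
There are two ways to read a min/max check, and they are not equivalent. The sketch below shows both with plain pandas. Which one the snippet applies is not stated here, so treat the "exact match" reading (suggested by the variable descriptions "Expected minimum value" and "Expected maximum value") as an assumption.

```python
import pandas as pd

# Same data as the solution above.
sensor_data_df = pd.DataFrame({
    'SensorID': ['S01', 'S02', 'S03', 'S04'],
    'Temperature': [25.5, 50.0, 0.0, 100.0],
    'Humidity': [60, 65, 55, 70],
})

temps = sensor_data_df['Temperature']

# Reading 1 (assumed): the column's extremes must equal the expected values.
extremes_match = temps.min() == 0.0 and temps.max() == 100.0

# Reading 2: every value lies in the inclusive range [0.0, 100.0].
all_in_range = temps.between(0.0, 100.0).all()

# A column can satisfy reading 2 without satisfying reading 1.
narrow = pd.Series([10.0, 20.0])
narrow_in_range = narrow.between(0.0, 100.0).all()
narrow_extremes_match = narrow.min() == 0.0 and narrow.max() == 100.0

print(extremes_match, all_in_range, narrow_in_range, narrow_extremes_match)
```

If the task only requires values to stay within a range, make sure the expected data actually contains the boundary values, as the example DataFrame above does.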
