Series Variable Matches Serialized Series

Verifies whether a student's Series matches a reference serialized Series (pickle or CSV).

Location of the snippet: python/pandas/series/series_variable_matches_serialized_series

The snippet reads the serialized reference Series from a file and compares it with the student's Pandas Series variable.

The location variable is optional and defaults to /root/.cache/.local/.trash/, the directory where the serialized Series file is stored by default. Set location only when the file is stored in a different directory.

Device Type: Jupyter

Variables:

| Variable Name | Variable Description | Type | Required? | Default |
| --- | --- | --- | --- | --- |
| series_variable_name | Name of the student's Series variable. | str | Yes | |
| serialized_series_file_name | File name of the serialized reference Series. | str | Yes | |
| location | Directory where the serialized file is stored. | str | No | /root/.cache/.local/.trash/ |
| read_kwargs | Optional dictionary of additional keyword arguments for reading the file. | dict | No | |
| testing_kwargs | Optional dictionary of additional keyword arguments for testing the Series. | dict | No | |
| serialization_method | Explicit method for serialization. If blank, it is inferred from the file extension. Choices: pickle, csv. | str | No | |
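
To make the roles of location, serialization_method, and read_kwargs concrete, here is a minimal sketch of how a loader could resolve the file and infer the method from its extension. The function name load_reference_series, the extension mapping, and the use of pandas.read_pickle and pandas.read_csv are assumptions for illustration; the snippet's actual implementation is not shown in this document.

```python
import os
import tempfile

import pandas as pd

DEFAULT_LOCATION = "/root/.cache/.local/.trash/"  # default stated in this document


def load_reference_series(file_name, location=DEFAULT_LOCATION,
                          serialization_method=None, read_kwargs=None):
    """Hypothetical loader; not the snippet's real implementation."""
    path = os.path.join(location, file_name)
    read_kwargs = dict(read_kwargs or {})
    method = serialization_method
    if not method:
        # Assumed mapping from file extension to serialization method.
        extension = os.path.splitext(file_name)[1].lower()
        method = {".pkl": "pickle", ".pickle": "pickle", ".csv": "csv"}.get(extension)
    if method == "pickle":
        return pd.read_pickle(path, **read_kwargs)
    if method == "csv":
        # squeeze("columns") turns a one-column DataFrame into a Series.
        return pd.read_csv(path, **read_kwargs).squeeze("columns")
    raise ValueError(f"Cannot determine serialization method for {file_name!r}")


# Usage in a throwaway directory instead of the default location.
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.Series([1, 2, 3]).to_pickle(os.path.join(tmp_dir, "reference.pkl"))
    loaded = load_reference_series("reference.pkl", location=tmp_dir)

print(loaded.tolist())
```

Passing serialization_method explicitly would bypass the extension lookup, which matters when a reference file has an unusual or missing extension.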

Examples:

1. Basic Series Comparison (Pickle Serialization, Default Location)

This example checks if a student's Pandas Series exactly matches a reference Series saved using Pickle.

Scenario: Students need to create a Series student_ages containing the ages of a small group of students.

Task: Create a Pandas Series named student_ages with values [22, 24, 21, 23] and a default integer index.

Placeholder:

student_ages = ...

Solution:

import pandas as pd
student_ages = pd.Series([22, 24, 21, 23])

Serialization:

import pandas as pd
import pickle
import os

# Create the reference Series
expected_ages = pd.Series([22, 24, 21, 23])

# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True) # Ensure the directory exists

# Define the reference file path
reference_file_name = "expected_ages_ex1.pkl"
reference_file_path = os.path.join(default_location, reference_file_name)

# Serialize the Series using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_ages, f)

print(f"Reference Series saved to: {reference_file_path}")

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| series_variable_name | student_ages |
| serialized_series_file_name | expected_ages_ex1.pkl |
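
As a rough illustration of what an exact match means here, the sketch below round-trips the reference Series through pickle and compares it with pandas.testing.assert_series_equal. Using that function is an assumption about how the comparison is performed, and a temporary directory stands in for the default location so the example runs anywhere.

```python
import os
import pickle
import tempfile

import pandas as pd

student_ages = pd.Series([22, 24, 21, 23])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "expected_ages_ex1.pkl")
    # Write the reference Series, then read it back as a checker would.
    with open(path, "wb") as f:
        pickle.dump(pd.Series([22, 24, 21, 23]), f)
    with open(path, "rb") as f:
        expected_ages = pickle.load(f)

# Raises AssertionError on any difference; passes silently here.
pd.testing.assert_series_equal(student_ages, expected_ages)

# With default settings, the same values stored as floats do not match,
# because the dtype is compared as well.
float_ages = pd.Series([22.0, 24.0, 21.0, 23.0])
try:
    pd.testing.assert_series_equal(float_ages, expected_ages)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

print(dtype_mismatch_detected)
```

If dtype differences should be tolerated, a keyword such as check_dtype=False could presumably be supplied through testing_kwargs, assuming those are forwarded to the comparison function.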

2. Series Comparison (CSV Serialization, Custom Location, with Name)

This example verifies a Series saved as a CSV, stored in a custom directory, and checks its values as well as its name.

Scenario: Students need to calculate the total sales for each product, resulting in a Series named total_product_sales with product names as the index.

Task: Create a Pandas Series variable total_product_sales with index ['Laptop', 'Mouse', 'Keyboard'], values [12000, 750, 1350], and the name 'Total Sales'.

Placeholder:

total_product_sales = ...

Solution:

import pandas as pd
total_product_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales'  # The name is checked, so it must match the reference
)

Serialization:

import pandas as pd
import os

# Create the reference Series
expected_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales'  # Ensure the reference Series has the expected name
)

# Define a custom location
custom_location = "/sales_data/summaries/"
os.makedirs(custom_location, exist_ok=True) # Ensure the directory exists

# Define the reference file path
reference_file_name = "expected_product_sales_ex2.csv"
reference_file_path = os.path.join(custom_location, reference_file_name)

# Serialize the Series to CSV
expected_sales.to_csv(reference_file_path)

print(f"Reference Series saved to: {reference_file_path}")

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| series_variable_name | total_product_sales |
| serialized_series_file_name | expected_product_sales_ex2.csv |
| location | /sales_data/summaries/ |
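
A CSV file stores the index as an ordinary column, so how the file is read back affects whether the index and name are restored. The sketch below shows the round trip with plain pandas calls. Whether the snippet applies index_col on its own is not stated in this document, so treating it as a read_kwargs entry forwarded to pandas.read_csv is an assumption; a temporary directory replaces the custom location.

```python
import os
import tempfile

import pandas as pd

expected_sales = pd.Series(
    [12000, 750, 1350],
    index=["Laptop", "Mouse", "Keyboard"],
    name="Total Sales",
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "expected_product_sales_ex2.csv")
    expected_sales.to_csv(path)

    # Read without options: the index comes back as a regular column,
    # giving a two-column DataFrame rather than a Series.
    raw = pd.read_csv(path)

    # Hypothetical read_kwargs that restore the index from the first column.
    read_kwargs = {"index_col": 0}
    loaded = pd.read_csv(path, **read_kwargs).squeeze("columns")

# The Series name travels in the CSV header row.
print(loaded.name)
pd.testing.assert_series_equal(loaded, expected_sales)
```

If a CSV reference fails to match an apparently correct student answer, checking how the index column is read is a reasonable first step.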

3. Series Comparison with Numerical Tolerance (testing_kwargs)

This example demonstrates using testing_kwargs to allow for slight floating-point differences when comparing numerical Series.

Scenario: Students calculate a Series average_temperatures that may have floating-point precision variations.

Task: Create a Pandas Series named average_temperatures with values [25.1234, 26.5678, 24.9012].

Placeholder:

average_temperatures = ...

Solution:

import pandas as pd
average_temperatures = pd.Series([25.1234, 26.5678, 24.9012])

Serialization:

import pandas as pd
import pickle
import os

# Create the reference Series with slightly different precision
expected_temps = pd.Series([25.123, 26.568, 24.901])

# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True)

# Define the reference file path
reference_file_name = "expected_temperatures_ex4.pkl"
reference_file_path = os.path.join(default_location, reference_file_name)

# Serialize the Series using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_temps, f)

print(f"Reference Series saved to: {reference_file_path}")

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| series_variable_name | average_temperatures |
| serialized_series_file_name | expected_temperatures_ex4.pkl |
| testing_kwargs | {"rtol": 1e-3} |
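
To see why the tolerance is needed, the sketch below compares the task values against the reference values with and without rtol. It assumes testing_kwargs is forwarded to pandas.testing.assert_series_equal, which accepts an rtol argument; the document does not confirm which comparison function the snippet uses.

```python
import pandas as pd

average_temperatures = pd.Series([25.1234, 26.5678, 24.9012])
expected_temps = pd.Series([25.123, 26.568, 24.901])

# Assumed forwarding of testing_kwargs to the comparison function.
testing_kwargs = {"rtol": 1e-3}
pd.testing.assert_series_equal(
    average_temperatures, expected_temps, **testing_kwargs
)

# With the default relative tolerance (1e-5), the same comparison fails:
# 25.1234 and 25.123 differ by about 4e-4, above the allowed ~2.5e-4.
try:
    pd.testing.assert_series_equal(average_temperatures, expected_temps)
    passes_with_defaults = True
except AssertionError:
    passes_with_defaults = False

print(passes_with_defaults)
```

A relative tolerance scales with the magnitude of the values, so rtol=1e-3 accepts deviations of roughly 0.025 for temperatures near 25.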