Series Variable Matches Serialized Series
Verifies whether a student's Series matches a reference serialized Series (pickle or CSV).
Location of the snippet: python/pandas/series/series_variable_matches_serialized_series
This snippet compares a student's Pandas Series variable with a reference Series that was serialized to a file. It reads the file, reconstructs the reference Series, and checks the student's variable against it.
The location variable is optional and defaults to /root/.cache/.local/.trash/. Set it only if the serialized Series file is stored in a different directory.
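The snippet's implementation is not shown in this document, so the following is only a minimal sketch of the load-and-compare flow described above. It assumes a pickled reference and a comparison based on pd.testing.assert_series_equal; the helper name series_matches_reference and its signature are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Default directory named in this document
DEFAULT_LOCATION = "/root/.cache/.local/.trash/"


def series_matches_reference(student_series, file_name,
                             location=DEFAULT_LOCATION, testing_kwargs=None):
    """Hypothetical helper: load a pickled reference Series and compare it."""
    reference = pd.read_pickle(os.path.join(location, file_name))
    try:
        # Assumption: extra testing options are forwarded to the pandas assertion
        pd.testing.assert_series_equal(student_series, reference,
                                       **(testing_kwargs or {}))
    except AssertionError:
        return False
    return True


# A temporary directory stands in for the real location in this sketch
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.Series([1, 2, 3]).to_pickle(os.path.join(tmp_dir, "reference.pkl"))
    matches = series_matches_reference(
        pd.Series([1, 2, 3]), "reference.pkl", location=tmp_dir)
    differs = series_matches_reference(
        pd.Series([1, 2, 4]), "reference.pkl", location=tmp_dir)
```

Here matches is True and differs is False, because the second Series has a different last value.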
| Device Type |
|---|
| Jupyter |
Variables:
| Variable Name | Variable Description | Type | Required? | Default |
|---|---|---|---|---|
| series_variable_name | Name of the student's Series variable. | str | Yes | |
| serialized_series_file_name | File name of the serialized reference Series. | str | Yes | |
| location | Directory where the serialized file is stored. | str | No | /root/.cache/.local/.trash/ |
| read_kwargs | Optional dictionary of additional keyword arguments for reading the file. | dict | No | |
| testing_kwargs | Optional dictionary of additional keyword arguments for testing the Series. | dict | No | |
| serialization_method | Explicit method for serialization. If blank, it is inferred from the file extension. Choices: pickle, csv. | str | No | |
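To illustrate how a blank serialization_method could be inferred from the file extension, here is a small sketch. The extension-to-method mapping and the helper name resolve_serialization_method are assumptions made for illustration; the snippet's actual rules may differ.

```python
import os

# Assumed mapping from file extension to method; the real snippet may differ
EXTENSION_TO_METHOD = {".pkl": "pickle", ".pickle": "pickle", ".csv": "csv"}


def resolve_serialization_method(file_name, serialization_method=None):
    """Hypothetical helper: prefer the explicit method, else use the extension."""
    if serialization_method:
        return serialization_method
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in EXTENSION_TO_METHOD:
        raise ValueError(f"Cannot infer serialization method from {file_name!r}")
    return EXTENSION_TO_METHOD[extension]


inferred_pickle = resolve_serialization_method("expected_ages_ex1.pkl")
inferred_csv = resolve_serialization_method("expected_product_sales_ex2.csv")
explicit = resolve_serialization_method("reference.dat", "pickle")
```

Setting serialization_method explicitly is mainly useful when the file name has no recognizable extension, as in the last call.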
Examples:
1. Basic Series Comparison (Pickle Serialization, Default Location)
This example checks if a student's Pandas Series exactly matches a reference Series saved using Pickle.
Scenario: Students need to create a Series student_ages containing the ages of a small group of students.
Task:
Create a Pandas Series named student_ages with values [22, 24, 21, 23] and a default integer index.
Placeholder:
student_ages = ...
Solution:
import pandas as pd
student_ages = pd.Series([22, 24, 21, 23])
Serialization:
import pandas as pd
import pickle
import os
# Create the reference Series
expected_ages = pd.Series([22, 24, 21, 23])
# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True) # Ensure the directory exists
# Define the reference file path
reference_file_name = "expected_ages_ex1.pkl"
reference_file_path = os.path.join(default_location, reference_file_name)
# Serialize the Series using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_ages, f)
print(f"Reference Series saved to: {reference_file_path}")
Snippet for the assertion:
| Variable Name | Value |
|---|---|
| series_variable_name | student_ages |
| serialized_series_file_name | expected_ages_ex1.pkl |
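To show what an exact match can mean in practice, the sketch below compares the reference from this example with a float version of the same ages. It assumes the comparison behaves like pd.testing.assert_series_equal with default settings and that testing_kwargs are forwarded to it; neither is confirmed by this document.

```python
import pandas as pd

expected_ages = pd.Series([22, 24, 21, 23])        # reference, integer dtype
float_ages = pd.Series([22.0, 24.0, 21.0, 23.0])   # same values, float dtype

# The intended solution matches the reference exactly
pd.testing.assert_series_equal(pd.Series([22, 24, 21, 23]), expected_ages)

# With default settings, a dtype difference alone is reported as a mismatch
try:
    pd.testing.assert_series_equal(float_ages, expected_ages)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

# Relaxing the dtype check accepts the float version
pd.testing.assert_series_equal(float_ages, expected_ages, check_dtype=False)
```

Under those assumptions, passing {"check_dtype": False} as testing_kwargs would accept answers that differ from the reference only in dtype.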
2. Series Comparison (CSV Serialization, Custom Location, with Name)
This example verifies a Series saved as a CSV, stored in a custom directory, and checks its values as well as its name.
Scenario: Students need to calculate the total sales for each product, resulting in a Series named total_product_sales with product names as the index.
Task:
Create a Pandas Series named total_product_sales with index ['Laptop', 'Mouse', 'Keyboard'] and values [12000, 750, 1350], and set its name to 'Total Sales'.
Placeholder:
total_product_sales = ...
Solution:
import pandas as pd
total_product_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales'  # The name must match the reference Series
)
Serialization:
import pandas as pd
import os
# Create the reference Series
expected_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales'  # Ensure the reference Series has the expected name
)
# Define a custom location
custom_location = "/sales_data/summaries/"
os.makedirs(custom_location, exist_ok=True) # Ensure the directory exists
# Define the reference file path
reference_file_name = "expected_product_sales_ex2.csv"
reference_file_path = os.path.join(custom_location, reference_file_name)
# Serialize the Series to CSV
expected_sales.to_csv(reference_file_path)
print(f"Reference Series saved to: {reference_file_path}")
Snippet for the assertion:
| Variable Name | Value |
|---|---|
| series_variable_name | total_product_sales |
| serialized_series_file_name | expected_product_sales_ex2.csv |
| location | /sales_data/summaries/ |
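A CSV file does not store a Series directly, so the reference has to be rebuilt when it is read back. The sketch below shows one way the file written above could be restored with its index and name intact. It uses a temporary directory instead of /sales_data/summaries/ so it runs anywhere, and the read options shown (index_col=0 followed by squeeze) are an assumption about how the snippet or its read_kwargs might handle this.

```python
import os
import tempfile

import pandas as pd

expected_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales'
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "expected_product_sales_ex2.csv")
    expected_sales.to_csv(path)  # Header row carries the Series name
    # read_csv returns a DataFrame; the first column becomes the index
    loaded_frame = pd.read_csv(path, index_col=0)

# Collapse the single-column DataFrame back into a Series
reloaded = loaded_frame.squeeze("columns")

# Values, index labels, and name all survive the round trip
pd.testing.assert_series_equal(reloaded, expected_sales)
```

If the reference is written with different options (for example without the index), the read options need to change accordingly, which is where read_kwargs would come in.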
3. Series Comparison with Numerical Tolerance (testing_kwargs)
This example demonstrates using testing_kwargs to allow for slight floating-point differences when comparing numerical Series.
Scenario: Students calculate a Series average_temperatures that may have floating-point precision variations.
Task:
Create a Pandas Series named average_temperatures with values [25.1234, 26.5678, 24.9012].
Placeholder:
average_temperatures = ...
Solution:
import pandas as pd
average_temperatures = pd.Series([25.1234, 26.5678, 24.9012])
Serialization:
import pandas as pd
import pickle
import os
# Create the reference Series with slightly different precision
expected_temps = pd.Series([25.123, 26.568, 24.901])
# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True)
# Define the reference file path
reference_file_name = "expected_temperatures_ex4.pkl"
reference_file_path = os.path.join(default_location, reference_file_name)
# Serialize the Series using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_temps, f)
print(f"Reference Series saved to: {reference_file_path}")
Snippet for the assertion:
| Variable Name | Value |
|---|---|
| series_variable_name | average_temperatures |
| serialized_series_file_name | expected_temperatures_ex4.pkl |
| testing_kwargs | {"rtol": 1e-3} |
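The effect of the tolerance can be checked directly with pandas. This sketch assumes that testing_kwargs are passed through to pd.testing.assert_series_equal, which this document does not confirm; the values are the ones from the task and the reference above.

```python
import pandas as pd

average_temperatures = pd.Series([25.1234, 26.5678, 24.9012])  # student values
expected_temps = pd.Series([25.123, 26.568, 24.901])           # reference values

# With the default relative tolerance (1e-5), the first value is too far off
try:
    pd.testing.assert_series_equal(average_temperatures, expected_temps)
    default_tolerance_fails = False
except AssertionError:
    default_tolerance_fails = True

# With rtol=1e-3, the same comparison passes without raising
pd.testing.assert_series_equal(average_temperatures, expected_temps, rtol=1e-3)
```

The first pair differs by about 0.0004, which exceeds roughly 0.00025 allowed by the default tolerance but is well within the 0.025 allowed by rtol=1e-3.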