Series Variable Matches Serialized Series

Verifies whether a student's Series matches a reference serialized Series (pickle or CSV).

Location of the snippet: python/pandas/series/series_variable_matches_serialized_series

This snippet reads a reference Series from a serialized file and compares it with the student's Pandas Series variable.

The location variable is optional and defaults to /root/.cache/.local/.trash/, the directory where the serialized Series file is expected to be stored. Set location only if the file is stored somewhere else.

Device Type
Jupyter

Variables:

Variable Name	Variable Description	Type	Required?
`series_variable_name`	Name of the student's Series variable.	`str`	Yes
`serialized_series_file_name`	File name of the serialized reference Series.	`str`	Yes
`location`	Directory where the serialized file is stored.	`str`	No
`read_kwargs`	Optional dictionary of additional keyword arguments for reading the file.	`dict`	No
`testing_kwargs`	Optional dictionary of additional keyword arguments for testing the Series.	`dict`	No
`serialization_method`	Explicit method for serialization. If blank, it is inferred from the file extension. Choices: pickle, csv.	`str`	No
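
The way these variables fit together can be sketched as follows. This is a minimal illustration, not the snippet's actual implementation: the helper names `infer_method`, `load_reference` and `series_matches` are hypothetical, the extension-to-method mapping is an assumption, and the comparison is assumed to forward `testing_kwargs` to `pandas.testing.assert_series_equal`. The sketch also takes the Series object directly, whereas the snippet receives the variable's name as a string.

```python
import os
import tempfile

import pandas as pd

DEFAULT_LOCATION = "/root/.cache/.local/.trash/"  # default stated in this document


def infer_method(file_name, serialization_method=""):
    """Return 'pickle' or 'csv', preferring an explicit choice (hypothetical helper)."""
    if serialization_method:
        return serialization_method
    extension = os.path.splitext(file_name)[1].lower()
    # Assumed mapping from file extension to method; the real snippet may differ.
    methods = {".pkl": "pickle", ".pickle": "pickle", ".csv": "csv"}
    if extension not in methods:
        raise ValueError(f"Cannot infer serialization method from {file_name!r}")
    return methods[extension]


def load_reference(file_name, location=DEFAULT_LOCATION, read_kwargs=None,
                   serialization_method=""):
    """Read the reference Series from disk (hypothetical helper)."""
    path = os.path.join(location, file_name)
    read_kwargs = read_kwargs or {}
    if infer_method(file_name, serialization_method) == "pickle":
        return pd.read_pickle(path, **read_kwargs)
    # A Series written with to_csv is read back as a one-column DataFrame,
    # so use the first column as the index and squeeze the result to a Series.
    csv_kwargs = {"index_col": 0, **read_kwargs}
    return pd.read_csv(path, **csv_kwargs).squeeze("columns")


def series_matches(student_series, file_name, location=DEFAULT_LOCATION,
                   read_kwargs=None, testing_kwargs=None, serialization_method=""):
    """Return True when the student's Series equals the reference (hypothetical helper)."""
    expected = load_reference(file_name, location, read_kwargs, serialization_method)
    try:
        pd.testing.assert_series_equal(student_series, expected, **(testing_kwargs or {}))
    except AssertionError:
        return False
    return True


# Usage with a temporary directory standing in for the real location.
with tempfile.TemporaryDirectory() as tmp_location:
    pd.Series([1, 2, 3]).to_pickle(os.path.join(tmp_location, "reference.pkl"))
    matches = series_matches(pd.Series([1, 2, 3]), "reference.pkl", location=tmp_location)
    differs = series_matches(pd.Series([1, 2, 4]), "reference.pkl", location=tmp_location)
```

Under these assumptions, an explicit `serialization_method` takes priority over the file extension, and `read_kwargs` only affects how the file is read.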

Examples:

1. Basic Series Comparison (Pickle Serialization, Default Location)

This example checks if a student's Pandas Series exactly matches a reference Series saved using Pickle.

Scenario: Students need to create a Series student_ages containing the ages of a small group of students.

Task: Create a Pandas Series named student_ages with values [22, 24, 21, 23] and a default integer index.

Placeholder:

student_ages = ...

Solution:

import pandas as pd
student_ages = pd.Series([22, 24, 21, 23])

Serialization:

import pandas as pd
import pickle
import os

# Create the reference Series
expected_ages = pd.Series([22, 24, 21, 23])

# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True) # Ensure the directory exists

# Define the reference file path
reference_file_name = "expected_ages_ex1.pkl"
reference_file_path = os.path.join(default_location, reference_file_name)

# Serialize the Series using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_ages, f)

print(f"Reference Series saved to: {reference_file_path}")

Snippet for the assertion:

Variable Name	Value
`series_variable_name`	`student_ages`
`serialized_series_file_name`	`expected_ages_ex1.pkl`
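
To see what an exact match implies, the comparison can be reproduced by hand. The sketch below assumes the snippet compares with `pandas.testing.assert_series_equal` and its default settings, which this document does not confirm; a temporary directory stands in for the default location so the code runs anywhere.

```python
import os
import pickle
import tempfile

import pandas as pd

student_ages = pd.Series([22, 24, 21, 23])

# Write and read back the reference, using a temporary directory
# as a stand-in for the default location.
with tempfile.TemporaryDirectory() as location:
    path = os.path.join(location, "expected_ages_ex1.pkl")
    with open(path, "wb") as f:
        pickle.dump(pd.Series([22, 24, 21, 23]), f)
    expected_ages = pd.read_pickle(path)

# Passes silently: values, index and dtype all agree.
pd.testing.assert_series_equal(student_ages, expected_ages)

# The same values stored as floats fail, because dtype is checked by default.
float_ages = pd.Series([22.0, 24.0, 21.0, 23.0])
try:
    pd.testing.assert_series_equal(float_ages, expected_ages)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True
```

If the snippet does forward `testing_kwargs` to this function, a value such as `{"check_dtype": False}` would relax the dtype check.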

2. Series Comparison (CSV Serialization, Custom Location, with Name)

This example verifies a Series saved as a CSV, stored in a custom directory, and checks its values as well as its name.

Scenario: Students need to calculate the total sales for each product, resulting in a Series named total_product_sales with product names as the index.

Task: Create a Pandas Series in the variable total_product_sales with index ['Laptop', 'Mouse', 'Keyboard'] and values [12000, 750, 1350], and set its name to 'Total Sales'.

Placeholder:

total_product_sales = ...

Solution:

import pandas as pd
total_product_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales'  # Name that is checked against the reference Series
)

Serialization:

import pandas as pd
import os

# Create the reference Series
expected_sales = pd.Series(
    [12000, 750, 1350],
    index=['Laptop', 'Mouse', 'Keyboard'],
    name='Total Sales' # Ensure the reference Series has the expected name
)

# Define a custom location
custom_location = "/sales_data/summaries/"
os.makedirs(custom_location, exist_ok=True) # Ensure the directory exists

# Define the reference file path
reference_file_name = "expected_product_sales_ex2.csv"
reference_file_path = os.path.join(custom_location, reference_file_name)

# Serialize the Series to CSV
expected_sales.to_csv(reference_file_path)

print(f"Reference Series saved to: {reference_file_path}")

Snippet for the assertion:

Variable Name	Value
`series_variable_name`	`total_product_sales`
`serialized_series_file_name`	`expected_product_sales_ex2.csv`
`location`	`/sales_data/summaries/`
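
A CSV file does not store a Series as such, so how the file is read back matters. The sketch below shows one way to restore the Series from the file written above; the `index_col=0` and `squeeze("columns")` steps are assumptions about what the snippet (or a suitable `read_kwargs` value) needs to do, and a temporary directory stands in for the custom location.

```python
import os
import tempfile

import pandas as pd

expected_sales = pd.Series(
    [12000, 750, 1350],
    index=["Laptop", "Mouse", "Keyboard"],
    name="Total Sales",
)

with tempfile.TemporaryDirectory() as location:
    path = os.path.join(location, "expected_product_sales_ex2.csv")
    expected_sales.to_csv(path)

    # A plain read returns a two-column DataFrame, not a Series.
    raw = pd.read_csv(path)

    # Using the first column as the index and squeezing the single
    # remaining column restores a Series whose name is the column header.
    restored = pd.read_csv(path, index_col=0).squeeze("columns")

# Values, index labels and name survive the round trip.
pd.testing.assert_series_equal(restored, expected_sales)
```

The Series name travels in the CSV header row, which is why the reference Series is given its name before `to_csv` is called.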

3. Series Comparison with Numerical Tolerance (`testing_kwargs`)

This example demonstrates using testing_kwargs to allow for slight floating-point differences when comparing numerical Series.

Scenario: Students calculate a Series average_temperatures that may have floating-point precision variations.

Task: Create a Pandas Series named average_temperatures with values [25.1234, 26.5678, 24.9012].

Placeholder:

average_temperatures = ...

Solution:

import pandas as pd
average_temperatures = pd.Series([25.1234, 26.5678, 24.9012])

Serialization:

import pandas as pd
import pickle
import os

# Create the reference Series with slightly different precision
expected_temps = pd.Series([25.123, 26.568, 24.901])

# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True)

# Define the reference file path
reference_file_name = "expected_temperatures_ex4.pkl"
reference_file_path = os.path.join(default_location, reference_file_name)

# Serialize the Series using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_temps, f)

print(f"Reference Series saved to: {reference_file_path}")

Snippet for the assertion:

Variable Name	Value
`series_variable_name`	`average_temperatures`
`serialized_series_file_name`	`expected_temperatures_ex4.pkl`
`testing_kwargs`	`{"rtol": 1e-3}`
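
The effect of the tolerance can be checked directly. This sketch assumes `testing_kwargs` is forwarded to `pandas.testing.assert_series_equal`, which accepts an `rtol` argument; the values are the ones given in the task and in the reference Series.

```python
import pandas as pd

average_temperatures = pd.Series([25.1234, 26.5678, 24.9012])  # values from the task
expected_temps = pd.Series([25.123, 26.568, 24.901])           # reference values

# With a relative tolerance of 1e-3 the comparison passes silently.
pd.testing.assert_series_equal(average_temperatures, expected_temps, rtol=1e-3)

# With the default tolerance (rtol=1e-5) the first value differs by too much.
try:
    pd.testing.assert_series_equal(average_temperatures, expected_temps)
    passes_with_defaults = True
except AssertionError:
    passes_with_defaults = False
```

The first pair differs by about 0.0004, which is above the default allowance of roughly 0.00025 for values near 25 but well below the 0.025 allowed by `rtol=1e-3`.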
