Numpy Array Variable Matches Serialized Array

Compare a NumPy array variable with a serialized NumPy array

Location of the snippet: python/numpy/numpy_array_variable_matches_serialized_array

This snippet compares a NumPy array variable with a serialized NumPy array file. The serialized file can be in .npy, .pkl, or .parquet format. The default location for serialized data is always /root/.cache/.local/.trash/, which you should REMOVE before your students have access to the project. You can override the default with the location parameter. If location is left blank, the assertion looks for the file in the default location, /root/.cache/.local/.trash/.
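The lookup rule described above can be sketched as follows. This is a minimal illustration, not the snippet's actual implementation: the helper name resolve_reference_path and the constant DEFAULT_LOCATION are hypothetical and do not come from the snippet itself.

```python
import os

# Default directory named in this documentation.
DEFAULT_LOCATION = "/root/.cache/.local/.trash/"


def resolve_reference_path(file_name, location=None):
    """Return the path where the serialized array is expected (hypothetical helper)."""
    # A blank or missing location falls back to the default directory.
    base_dir = location if location else DEFAULT_LOCATION
    return os.path.join(base_dir, file_name)


# Default location (location left blank).
print(resolve_reference_path("expected_data_points_ex1.npy"))
# Custom location passed explicitly.
print(resolve_reference_path("expected_image_filter_ex2.pkl", "/ml_models/filters/"))
```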

Device Type
Jupyter

Variables:

| Variable Name | Variable Description | Type | Required? | Default |
| --- | --- | --- | --- | --- |
| student_variable_name | Name of student's numpy array variable | str | Yes | |
| serialized_array_file_name | Name of the file containing the serialized numpy array | str | Yes | |
| location | Location of the serialized numpy array file | str | No | |
| read_kwargs | Optional dictionary of additional keyword arguments for reading the file. | dict | No | |
| testing_kwargs | Optional dictionary of additional keyword arguments for testing the array. | dict | No | |
| serialization_method | (Optional) Method used to serialize the numpy array (npy, pickle, parquet) | str | No | |
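To illustrate how serialization_method and read_kwargs might interact when the file is read, here is a rough sketch. It is not the snippet's real code: load_reference_array is a hypothetical helper, inferring the method from the file extension is an assumption, and parquet is left out because this page does not describe how arrays are laid out in that format.

```python
import os
import pickle
import tempfile

import numpy as np


def load_reference_array(path, serialization_method=None, read_kwargs=None):
    """Load a serialized array (hypothetical helper; covers npy and pickle only)."""
    read_kwargs = read_kwargs or {}
    if serialization_method is None:
        # Assumption: fall back to the file extension when no method is given.
        extension = os.path.splitext(path)[1].lower()
        serialization_method = {".npy": "npy", ".pkl": "pickle"}.get(extension)
    if serialization_method == "npy":
        # read_kwargs is forwarded to np.load, e.g. {"allow_pickle": False}.
        return np.load(path, **read_kwargs)
    if serialization_method == "pickle":
        with open(path, "rb") as f:
            return pickle.load(f, **read_kwargs)
    raise ValueError(f"Unsupported serialization method: {serialization_method!r}")


# Round-trip demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    reference = np.array([10, 20, 30, 40, 50])

    npy_path = os.path.join(tmp_dir, "reference.npy")
    np.save(npy_path, reference)
    loaded_npy = load_reference_array(npy_path, read_kwargs={"allow_pickle": False})

    pkl_path = os.path.join(tmp_dir, "reference.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(reference, f)
    loaded_pkl = load_reference_array(pkl_path, serialization_method="pickle")
```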

Examples:

1. Basic 1D Array Comparison (NPY Serialization, Default Location)

This example checks if a student's 1D NumPy array exactly matches a reference array saved in the .npy format.

Scenario: Students need to create a NumPy array data_points containing a sequence of numbers.

Task: Create a NumPy array named data_points with the values [10, 20, 30, 40, 50].

Placeholder:

data_points = ...

Solution:

import numpy as np
data_points = np.array([10, 20, 30, 40, 50])

Serialization:

import numpy as np
import os

# Create the reference array
expected_array = np.array([10, 20, 30, 40, 50])

# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True) # Ensure the directory exists

# Define the reference file path
reference_file_name = "expected_data_points_ex1.npy"
reference_file_path = os.path.join(default_location, reference_file_name)

# Serialize the array using np.save (NPY format)
np.save(reference_file_path, expected_array)

print(f"Reference array saved to: {reference_file_path}")

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| student_variable_name | data_points |
| serialized_array_file_name | expected_data_points_ex1.npy |
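Before publishing the project, it can help to confirm that the reference file round-trips and that an exact comparison behaves as expected. The sketch below assumes an exact check equivalent to np.array_equal, which this page does not confirm, and writes to a temporary directory rather than the default location. The variable wrong_points is a made-up incorrect answer.

```python
import os
import tempfile

import numpy as np

expected_array = np.array([10, 20, 30, 40, 50])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the default location used in the example.
    reference_file_path = os.path.join(tmp_dir, "expected_data_points_ex1.npy")
    np.save(reference_file_path, expected_array)
    reference = np.load(reference_file_path)

# A correct student answer and a hypothetical incorrect one (missing a value).
data_points = np.array([10, 20, 30, 40, 50])
wrong_points = np.array([10, 20, 30, 40])

correct_matches = np.array_equal(data_points, reference)
wrong_matches = np.array_equal(wrong_points, reference)
print(correct_matches, wrong_matches)
```

np.array_equal returns False when the shapes differ, so a truncated answer is reported as a mismatch rather than raising an error.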

2. 2D Array Comparison (Pickle Serialization, Custom Location)

This example verifies a 2D NumPy array, serialized using Python's pickle module, and stored in a specific custom directory.

Scenario: Students need to create a matrix image_filter for a basic image processing task.

Task: Create a 2D NumPy array named image_filter representing a 3x3 kernel: [[1, 1, 1], [1, 0, 1], [1, 1, 1]].

Placeholder:

image_filter = ...

Solution:

import numpy as np
image_filter = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])

Serialization:

import numpy as np
import pickle
import os

# Create the reference array
expected_filter = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])

# Define a custom location
custom_location = "/ml_models/filters/"
os.makedirs(custom_location, exist_ok=True) # Ensure the directory exists

# Define the reference file path
reference_file_name = "expected_image_filter_ex2.pkl"
reference_file_path = os.path.join(custom_location, reference_file_name)

# Serialize the array using pickle
with open(reference_file_path, 'wb') as f:
    pickle.dump(expected_filter, f)

print(f"Reference array saved to: {reference_file_path}")

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| student_variable_name | image_filter |
| serialized_array_file_name | expected_image_filter_ex2.pkl |
| location | /ml_models/filters/ |
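As a sanity check for the pickle case, the following sketch reloads the pickled kernel from a custom directory and compares it with two candidate answers. It is illustrative only: a temporary directory stands in for /ml_models/filters/, the comparison via np.array_equal is an assumption about how the assertion behaves, and wrong_filter is a made-up incorrect answer.

```python
import os
import pickle
import tempfile

import numpy as np

expected_filter = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
])

with tempfile.TemporaryDirectory() as custom_location:
    # Serialize the reference kernel with pickle, as in the example.
    reference_file_path = os.path.join(custom_location, "expected_image_filter_ex2.pkl")
    with open(reference_file_path, "wb") as f:
        pickle.dump(expected_filter, f)

    # Read it back the way a checker would have to.
    with open(reference_file_path, "rb") as f:
        reference = pickle.load(f)

image_filter = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
wrong_filter = np.ones((3, 3), dtype=int)  # hypothetical answer with the center left at 1

filter_matches = np.array_equal(image_filter, reference)
wrong_filter_matches = np.array_equal(wrong_filter, reference)
print(filter_matches, wrong_filter_matches)
```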

3. Array Comparison with Numerical Tolerance (testing_kwargs)

This example demonstrates how to allow for small numerical differences when comparing arrays, essential for floating-point calculations.

Scenario: Students perform a calculation resulting in output_vector, which might have minor floating-point inaccuracies.

Task: Compute a NumPy array named output_vector representing [1/3, 2/3, 1/2].

Placeholder:

output_vector = ...

Solution:

import numpy as np
output_vector = np.array([1/3, 2/3, 1/2]) # Will result in floats

Serialization:

import numpy as np
import os

# Create the reference array with potentially slightly different precision
expected_vector = np.array([0.33333333, 0.66666667, 0.5])

# Define the default location
default_location = "/root/.cache/.local/.trash/"
os.makedirs(default_location, exist_ok=True)

# Define the reference file path
reference_file_name = "expected_output_vector_ex3.npy"
reference_file_path = os.path.join(default_location, reference_file_name)

# Serialize the array using np.save
np.save(reference_file_path, expected_vector)

print(f"Reference array saved to: {reference_file_path}")

Snippet for the assertion:

| Variable Name | Value |
| --- | --- |
| student_variable_name | output_vector |
| serialized_array_file_name | expected_output_vector_ex3.npy |
| testing_kwargs | {'rtol': 1e-5, 'atol': 1e-8} |
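To see why the tolerance matters here, the sketch below compares the two arrays from this example both exactly and with the rtol and atol values above. It assumes the tolerances are applied with the semantics of np.allclose and np.testing.assert_allclose; this page does not state which NumPy function the snippet forwards testing_kwargs to.

```python
import numpy as np

# Reference values as serialized in this example (8 decimal places).
expected_vector = np.array([0.33333333, 0.66666667, 0.5])

# Student result computed at full double precision.
output_vector = np.array([1/3, 2/3, 1/2])

testing_kwargs = {"rtol": 1e-5, "atol": 1e-8}

# An exact comparison fails because 1/3 != 0.33333333.
exact_match = np.array_equal(output_vector, expected_vector)

# A tolerance-based comparison passes:
# |actual - expected| <= atol + rtol * |expected| holds for every element.
close_match = np.allclose(output_vector, expected_vector, **testing_kwargs)

# assert_allclose raises AssertionError on a mismatch and returns None otherwise.
np.testing.assert_allclose(output_vector, expected_vector, **testing_kwargs)

print(exact_match, close_match)
```

The largest difference between the two arrays is roughly 3.3e-9, well inside the allowed margin of about 3.3e-6 for the first element, so the tolerant comparison accepts the student's result while the exact one rejects it.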