Bug in ImputationKernel: Handling of imputation_order #90

nmehran · 2024-09-18T16:54:27Z

First off, thank you for maintaining such a useful library, I truly appreciate it.

A bug has been identified in the ImputationKernel class regarding the imputation_order parameter. The handling of "ascending" and "descending" orders is incorrect due to a typo and improper sorting configuration.

Problematic Code

# imputation_kernel.py
if imputation_order in ["ascending", "descending"]:
    _na_counts = {
        key: value
        for key, value in self.na_counts.items()
        if key in self.imputed_variables
    }
    self.imputation_order = list(
        Series(_na_counts).sort_values(ascending=False).index
    )
    if imputation_order == "decending":
        self.imputation_order.reverse()

Issues

Typo: "decending" should be "descending".
Incorrect Sorting Configuration: The parameter ascending=False should be ascending=True for ascending order.

Proposed Fix

Update the code as follows:

if imputation_order in ["ascending", "descending"]:
    _na_counts = {
        key: value
        for key, value in self.na_counts.items()
        if key in self.imputed_variables
    }
    self.imputation_order = list(
        Series(_na_counts).sort_values(ascending=True).index
    )
    if imputation_order == "descending":
        self.imputation_order.reverse()

Steps to Verify

Test the following code to compare imputation_order for "ascending" and "descending":

import pandas as pd
import numpy as np
import miceforest as mf
from sklearn.datasets import load_iris

# Load the Iris dataset
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# Introduce missing values
np.random.seed(42)
missing_indices = np.random.choice(iris_df.index, size=10, replace=False)
iris_df.loc[missing_indices, 'sepal length (cm)'] = np.nan
iris_df.loc[np.random.choice(iris_df.index, size=8, replace=False), 'sepal width (cm)'] = np.nan
iris_df.loc[np.random.choice(iris_df.index, size=4, replace=False), 'petal length (cm)'] = np.nan
iris_df.loc[np.random.choice(iris_df.index, size=2, replace=False), 'petal width (cm)'] = np.nan

# Initialize ImputationKernel with 'ascending' order
kernel_ascending = mf.ImputationKernel(iris_df, imputation_order='ascending')

# Initialize ImputationKernel with 'descending' order
kernel_descending = mf.ImputationKernel(iris_df, imputation_order='descending')

# Compare imputation orders
ascending_order = kernel_ascending.imputation_order
descending_order = kernel_descending.imputation_order

# Print equality comparison result
print("Are the imputation orders equal?")
print(ascending_order == descending_order)

Additional Information

Version: 6.0.3

The text was updated successfully, but these errors were encountered:

- Fixed typo "decending" to "descending". - Updated sorting configuration for ascending order to use ascending=True. Related to issue AnotherSamWilson#90

nmehran mentioned this issue Sep 18, 2024

Fixed imputation order handling in ImputationKernel #91

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in ImputationKernel: Handling of imputation_order #90

Bug in ImputationKernel: Handling of imputation_order #90

nmehran commented Sep 18, 2024

Bug in ImputationKernel: Handling of imputation_order #90

Bug in ImputationKernel: Handling of imputation_order #90

Comments

nmehran commented Sep 18, 2024

Problematic Code

Issues

Proposed Fix

Steps to Verify

Additional Information