modifications to propensity.py #811

Open
ras44 opened this issue Feb 10, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

ras44 (Collaborator) commented Feb 10, 2025

Is your feature request related to a problem? Please describe.

A variety of topics came up re propensity.py in the last CausalML call:

  • compute_propensity_score could either:

    • enforce that the p_model arg is an instance of the PropensityModel class
      • if so, alter the PropensityModel class's predict and predict_proba methods so that predict falls back to predict_proba whenever the underlying model's predict produces classifications rather than scores (see the sketch after this list)
    • accept any model passed (for instance, a Naive Bayes classifier)
      • if so, call predict_proba or predict from within compute_propensity_score based on whether the passed model provides predict_proba, so that propensity scores, and not classifications, are always produced
  • when calling compute_propensity_score with calibrate_p=True, the model that is returned is not the model that produces the returned propensity scores (those come from the calibration function's model). This could lead to confusion if someone attempted to reproduce the scores using the returned model.

    • Would it make sense here to return None for the model? Or emit some other warning if calibrate_p=True?
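
A minimal sketch of the first option, assuming a hypothetical wrapper class (the name WrappedPropensityModel and its interface are illustrative, not part of the current propensity.py):

class WrappedPropensityModel:
    """Wrap any estimator so that predict always yields propensity scores."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict_proba(self, X):
        return self.model.predict_proba(X)

    def predict(self, X):
        # fall back to probability scores when the wrapped model's
        # predict would return hard classifications
        if hasattr(self.model, "predict_proba"):
            return self.model.predict_proba(X)[:, 1]
        return self.model.predict(X)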

Describe the solution you'd like
I'm not sure about the tradeoffs between accepting any model in compute_propensity_score vs. requiring that the model be an instance of PropensityModel, but I'd be inclined to accept any model (e.g. a Naive Bayes classifier, as was used in the calibration example) and then test for the existence of predict_proba to produce scores, falling back to predict if predict_proba doesn't exist. Something like:

import numpy as np

# ElasticNetPropensityModel and calibrate_iso are assumed to be in scope,
# as they are within propensity.py


def compute_propensity_score(
    X, treatment, p_model=None, X_pred=None, treatment_pred=None, calibrate_p="iso"
):
    """Generate the propensity score if the user didn't provide one.

    Args:
        X (np.matrix): features for training
        treatment (np.array or pd.Series): a treatment vector for training
        p_model (model object, optional): any model exposing fit and
            predict/predict_proba; ElasticNetPropensityModel by default
        X_pred (np.matrix, optional): features for prediction
        treatment_pred (np.array or pd.Series, optional): a treatment vector for prediction
        calibrate_p (str or bool, optional): whether to calibrate the propensity
            score; any truthy value applies isotonic calibration

    Returns:
        (tuple)
            - p (numpy.ndarray): propensity score
            - p_model (model object or None): the trained model, or None when
              the scores were calibrated
    """
    if treatment_pred is None:
        treatment_pred = treatment.copy()
    if p_model is None:
        p_model = ElasticNetPropensityModel()

    p_model.fit(X, treatment)

    X_score = X if X_pred is None else X_pred
    if hasattr(p_model, "predict_proba"):
        p = p_model.predict_proba(X_score)[:, 1]
    else:
        print("predict_proba not available, using predict instead")
        p = p_model.predict(X_score)

    if calibrate_p:
        print("Isotonic calibrating propensity scores only. Returning model=None.")
        p = calibrate_iso(p, treatment_pred)
        p_model = None

    # force the p values into the open interval (0, 1)
    eps = np.finfo(float).eps
    p = np.where(p < 0 + eps, 0 + eps * 1.001, p)
    p = np.where(p > 1 - eps, 1 - eps * 1.001, p)

    return p, p_model
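
As a usage sketch (GaussianNB stands in for "any model", per the calibration example; the data here is synthetic):

import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.random.rand(100, 3)
treatment = np.random.binomial(1, 0.5, size=100)

# GaussianNB is not a PropensityModel, but it exposes predict_proba,
# so the proposed function returns probability scores rather than 0/1 labels
p, model = compute_propensity_score(X, treatment, p_model=GaussianNB(), calibrate_p=False)
assert ((p > 0) & (p < 1)).all()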

Refs:

  • class PropensityModel(metaclass=ABCMeta): (propensity.py)
  • def compute_propensity_score( (propensity.py)
ras44 added the enhancement label on Feb 10, 2025