Interaction terms and polynomials in Julia #206

Merged
merged 1 commit into from
Oct 7, 2023
36 changes: 34 additions & 2 deletions Model_Estimation/OLS/interaction_terms_and_polynomials.md
@@ -15,7 +15,7 @@ $$
Y = \beta_0+\beta_1X_1+\beta_2X_2
$$

However, if the independent variables have a nonlinear effect on the outcome, this linear model will be misspecified. The problem can be fixed by modeling the nonlinearity directly, that is, by including the nonlinear terms in the index.
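
As a quick illustration of the misspecification point, here is a minimal sketch in Python with NumPy on simulated data. The data-generating process, the coefficient values, and the `fit_ols` helper are all made up for this example and are not part of the page's datasets or any library API.

```python
import numpy as np

# Simulated data (an assumption for illustration): the true relationship
# between x and y is quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=500)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)


def fit_ols(columns, y):
    """Hypothetical helper: OLS with an intercept via least squares.

    Returns the coefficient vector and the R-squared of the fit.
    """
    X = np.column_stack([np.ones_like(y)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return beta, r2


# Linear-only index: misspecified, because the x^2 effect is left out.
beta_lin, r2_lin = fit_ols([x], y)

# Index that includes the squared term: matches the simulated relationship.
beta_quad, r2_quad = fit_ols([x, x**2], y)

print("linear only:    R^2 =", round(r2_lin, 3))
print("with x squared: R^2 =", round(r2_quad, 3))
print("estimated coefficient on x^2:", round(beta_quad[2], 3))
```

Under these assumptions the linear-only fit leaves most of the curvature in the residuals, while adding the squared term recovers a coefficient close to the simulated one.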

The two most common ways to do this are to include interactions or polynomial terms. With an interaction, the effect of one variable varies according to the value of another:

@@ -44,6 +44,38 @@ $$

# Implementations

## Julia

Thanks to the [**StatsModels.jl**](https://juliastats.org/StatsModels.jl/stable/) and [**GLM.jl**](https://juliastats.org/GLM.jl/stable/) packages from the JuliaStats project, we can match the R and Python code very closely.

```julia
using StatsModels, GLM, DataFrames, CSV

# Load the R mtcars dataset from a URL
mtcars = CSV.read(download("https://github.com/LOST-STATS/lost-stats.github.io/raw/source/Data/mtcars.csv"), DataFrame)

# Specify a model with linear, quadratic, and cubic `hp` terms.
# Any Julia function or operator, including user-defined ones,
# can be used inside a `@formula` expression.
# We also pass `dropcollinear=false`; otherwise `lm` will drop
# the intercept during fitting as soon as it finds that the model's terms
# are not linearly independent. Collinear terms are a dubious thing to have
# in a linear model, but the aim here is only to show how to write down
# a particular model, not to decide which model is right for the data.
model1 = lm(@formula(mpg ~ hp + hp^2 + hp^3 + cyl), mtcars, dropcollinear=false)
print(model1)

# Include an interaction term and the variables by themselves using `*`.
# The interaction term appears in the output as `hp & cyl`.
model2 = lm(@formula(mpg ~ hp * cyl), mtcars)
print(model2)

# Include only the interaction term and not the variables themselves with `&`
# Hard to interpret! Occasionally useful though.
model3 = lm(@formula(mpg ~ hp&cyl), mtcars)
print(model3)
```
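
For reference, and assuming the implicit intercept that `@formula` adds by default, the three formulas above correspond to the following models (the $$\beta$$ labels are just notation for this note, not names that appear in the output):

$$
\text{model1:}\quad mpg = \beta_0 + \beta_1 hp + \beta_2 hp^2 + \beta_3 hp^3 + \beta_4 cyl
$$

$$
\text{model2:}\quad mpg = \beta_0 + \beta_1 hp + \beta_2 cyl + \beta_3 (hp \times cyl)
$$

$$
\text{model3:}\quad mpg = \beta_0 + \beta_1 (hp \times cyl)
$$

In other words, `hp * cyl` is shorthand for `hp + cyl + hp&cyl`, while `hp&cyl` on its own keeps only the product term.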

## Python

Using the [**statsmodels**](https://www.statsmodels.org/stable/index.html) package, we can use a formulation similar to the `R` example below.
@@ -126,7 +158,7 @@ reg mpg c.weight##c.weight##c.weight foreign
It is also possible to use other types of functions and still obtain correct marginal effects. For example, say that you want to estimate the model:

$$ y = a_0 + a_1 x + a_2 \frac{1}{x} + e $$

and you want to estimate the marginal effects with respect to $$x$$. You can do this as follows:

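
As a check on whatever the software reports, the marginal effect for the reciprocal model $$y = a_0 + a_1 x + a_2/x + e$$ can be worked out by hand. Differentiating the conditional mean with respect to $$x$$ (and assuming $$x \neq 0$$ and that $$e$$ does not depend on $$x$$) gives:

$$
\frac{\partial E[y \mid x]}{\partial x} = a_1 - \frac{a_2}{x^2}
$$

So the marginal effect is not a single coefficient: it depends on the value of $$x$$ at which it is evaluated, which is why it has to be computed from both $$a_1$$ and $$a_2$$ rather than read off the regression table.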