Commit

Interaction terms and polynomials in Julia
gabriel-fallen committed Oct 7, 2023
1 parent bf59d3c commit d8e43cc
Showing 1 changed file with 34 additions and 2 deletions.
36 changes: 34 additions & 2 deletions Model_Estimation/OLS/interaction_terms_and_polynomials.md
@@ -15,7 +15,7 @@ $$
Y = \beta_0+\beta_1X_1+\beta_2X_2
$$

However, if the independent variables have a nonlinear effect on the outcome, the model above will be incorrectly specified. The specification is still fine as long as that nonlinearity is modeled by including the corresponding nonlinear terms in the index.
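
As a rough illustration of this point, the sketch below fits synthetic, noise-free data generated from a quadratic relationship, first with a linear-only model and then with a squared term added. The data, the coefficient values, and the helper names `solve` and `ols` are all made up for this example; it uses plain Python and the normal equations rather than any particular regression package.

```python
# Illustrative sketch: synthetic data with a known quadratic effect.
# All names and coefficient values are assumptions made for this example.

def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

x = [float(i) for i in range(-5, 6)]
y = [1.0 + 2.0 * xi + 3.0 * xi**2 for xi in x]  # true model is quadratic
ones = [1.0] * len(x)

# Linear-only model: y ~ 1 + x
b_lin = ols([ones, x], y)
resid_lin = [yi - (b_lin[0] + b_lin[1] * xi) for xi, yi in zip(x, y)]

# Model that includes the squared term: y ~ 1 + x + x^2
x_sq = [xi**2 for xi in x]
b_quad = ols([ones, x, x_sq], y)

print("linear-only coefficients:", [round(b, 3) for b in b_lin])
print("linear-only residuals:   ", [round(r, 1) for r in resid_lin])
print("with squared term:       ", [round(b, 3) for b in b_quad])
```

In this setup the linear-only fit leaves residuals that are positive at both extremes of `x` and negative in the middle, which is the signature of an omitted squared term. Once `x^2` is included, the fit recovers the coefficients used to generate the data.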

The two most common ways to do this are to include interactions or polynomial terms. With an interaction, the effect of one variable varies according to the value of another:

@@ -44,6 +44,38 @@ $$

# Implementations

## Julia

Thanks to the [**StatsModels.jl**](https://juliastats.org/StatsModels.jl/stable/) and [**GLM**](https://juliastats.org/GLM.jl/stable/) packages from the JuliaStats project, we can match the R and Python code very closely.

```julia
using StatsModels, GLM, DataFrames, CSV

# Load the R mtcars dataset from a URL
mtcars = CSV.read(download("https://github.com/LOST-STATS/lost-stats.github.io/raw/source/Data/mtcars.csv"), DataFrame)

# Here we specify a model with linear, quadratic, and cubic `hp` terms.
# Any Julia function or operator, including user-defined ones, can be used
# in a `@formula` expression.
# We also pass `dropcollinear=false`; otherwise `lm` will drop the intercept
# during fitting once it decides the model's terms are not linearly
# independent. Such collinearity is a dubious thing to have in a linear model,
# but the point here is only how to write down a particular model, not which
# model is the right one for the given data. :)
model1 = lm(@formula(mpg ~ hp + hp^2 + hp^3 + cyl), mtcars, dropcollinear=false)
print(model1)

# Include an interaction term and the variables by themselves using `*`
# The interaction term is shown as `hp & cyl` in the output
model2 = lm(@formula(mpg ~ hp * cyl), mtcars)
print(model2)

# Include only the interaction term and not the variables themselves with `&`
# Hard to interpret! Occasionally useful though.
model3 = lm(@formula(mpg ~ hp&cyl), mtcars)
print(model3)
```
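
To make the three formulas above more concrete, here is a small Python sketch of the regressor columns each one corresponds to. It assumes the usual implicit intercept; the function names are hypothetical, and the `(hp, cyl)` pairs are just a few illustrative values rather than the full dataset. It is only meant to show the difference between `*` and `&`, not how StatsModels.jl builds its model matrix internally.

```python
# Hypothetical helpers: one function per formula, returning the regressors
# for a single observation. The intercept is assumed to be included.

def poly_columns(hp, cyl):
    """Columns for mpg ~ hp + hp^2 + hp^3 + cyl."""
    return [1.0, hp, hp**2, hp**3, cyl]

def star_columns(hp, cyl):
    """Columns for mpg ~ hp * cyl: both variables plus their product."""
    return [1.0, hp, cyl, hp * cyl]

def amp_columns(hp, cyl):
    """Columns for mpg ~ hp & cyl: only the product, no separate hp or cyl."""
    return [1.0, hp * cyl]

# A few illustrative (hp, cyl) pairs.
rows = [(110.0, 6.0), (93.0, 4.0), (175.0, 8.0)]

for hp, cyl in rows:
    print("poly:", poly_columns(hp, cyl))
    print("star:", star_columns(hp, cyl))
    print("amp: ", amp_columns(hp, cyl))
```

Seen this way, `hp * cyl` is shorthand for `hp + cyl + hp & cyl`, while `hp & cyl` on its own forces the effect of each variable to work only through the product, which is why that model is hard to interpret.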

## Python

Using the [**statsmodels**](https://www.statsmodels.org/stable/index.html) package, we can use a formulation similar to the `R` example below.
@@ -126,7 +158,7 @@ reg mpg c.weight##c.weight##c.weight foreign
It is also possible to use other types of functions and still obtain correct marginal effects. For example, say that you want to estimate the model:

$$ y = a_0 + a_1 x + a_2 \frac{1}{x} + e $$

and you want to estimate the marginal effects with respect to $$x$$. You can do this as follows:
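
As a reference point for what such a marginal effect should equal, differentiating the model with respect to $$x$$ gives $$a_1 - a_2 / x^2$$, so the effect changes with the value of $$x$$. The Python sketch below checks this against a numerical derivative. The coefficient values and function names are made up for illustration and are not estimates from any dataset.

```python
# Made-up coefficients for y = a0 + a1*x + a2*(1/x); not estimated from data.
a0, a1, a2 = 1.0, 0.5, 4.0

def model(x):
    """Conditional mean of y at x (the error term is ignored)."""
    return a0 + a1 * x + a2 / x

def marginal_effect(x):
    """Analytic derivative of the model with respect to x."""
    return a1 - a2 / x**2

def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (1.0, 2.0, 4.0):
    print(x, marginal_effect(x), numeric_derivative(model, x))
```

With these values the marginal effect is negative for small `x` and turns positive as `x` grows, which is why it has to be evaluated at specific values of `x` (or averaged over the sample) rather than read off a single coefficient.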
