
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Chapter 3 Factor investing and asset pricing anomalies | Machine Learning for Factor Investing</title>
<meta name="author" content="Guillaume Coqueret and Tony Guida">
<meta name="generator" content="bookdown 0.32 with bs4_book()">
<meta property="og:title" content="Chapter 3 Factor investing and asset pricing anomalies | Machine Learning for Factor Investing">
<meta property="og:type" content="book">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="Chapter 3 Factor investing and asset pricing anomalies | Machine Learning for Factor Investing">
<!-- JS --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://kit.fontawesome.com/6ecbd6c532.js" crossorigin="anonymous"></script><script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="libs/bootstrap-4.6.0/bootstrap.min.css" rel="stylesheet">
<script src="libs/bootstrap-4.6.0/bootstrap.bundle.min.js"></script><script src="libs/bs3compat-0.4.2/transition.js"></script><script src="libs/bs3compat-0.4.2/tabs.js"></script><script src="libs/bs3compat-0.4.2/bs3compat.js"></script><link href="libs/bs4_book-1.0.0/bs4_book.css" rel="stylesheet">
<script src="libs/bs4_book-1.0.0/bs4_book.js"></script><script src="libs/kePrint-0.0.1/kePrint.js"></script><link href="libs/lightable-0.0.1/lightable.css" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- CSS --><style type="text/css">
    
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
  </style>
<style type="text/css">
    /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
    div.csl-bib-body { }
    div.csl-entry {
      clear: both;
        }
    .hanging div.csl-entry {
      margin-left:2em;
      text-indent:-2em;
    }
    div.csl-left-margin {
      min-width:2em;
      float:left;
    }
    div.csl-right-inline {
      margin-left:2em;
      padding-left:1em;
    }
    div.csl-indent {
      margin-left: 2em;
    }
  </style>
<meta name="description" content="Asset pricing anomalies are the foundations of factor investing. In this chapter our aim is twofold: present simple ideas and concepts: basic factor...">
<meta property="og:description" content="Asset pricing anomalies are the foundations of factor investing. In this chapter our aim is twofold: present simple ideas and concepts: basic factor...">
<meta name="twitter:description" content="Asset pricing anomalies are the foundations of factor investing. In this chapter our aim is twofold: present simple ideas and concepts: basic factor...">
</head>
<body data-spy="scroll" data-target="#toc">

<div class="container-fluid">
<div class="row">
  <header class="col-sm-12 col-lg-3 sidebar sidebar-book"><a class="sr-only sr-only-focusable" href="#content">Skip to main content</a>

    <div class="d-flex align-items-start justify-content-between">
      <h1>
        <a href="index.html" title="">Machine Learning for Factor Investing</a>
      </h1>
      <button class="btn btn-outline-primary d-lg-none ml-2 mt-1" type="button" data-toggle="collapse" data-target="#main-nav" aria-expanded="true" aria-controls="main-nav"><i class="fas fa-bars"></i><span class="sr-only">Show table of contents</span></button>
    </div>

    <div id="main-nav" class="collapse-lg">
      <form role="search">
        <input id="search" class="form-control" type="search" placeholder="Search" aria-label="Search">
</form>

      <nav aria-label="Table of contents"><h2>Table of contents</h2>
        <ul class="book-toc list-unstyled">
<li><a class="" href="index.html">Preface</a></li>
<li class="book-part">Introduction</li>
<li><a class="" href="notdata.html"><span class="header-section-number">1</span> Notations and data</a></li>
<li><a class="" href="intro.html"><span class="header-section-number">2</span> Introduction</a></li>
<li><a class="active" href="factor.html"><span class="header-section-number">3</span> Factor investing and asset pricing anomalies</a></li>
<li><a class="" href="Data.html"><span class="header-section-number">4</span> Data preprocessing</a></li>
<li class="book-part">Common supervised algorithms</li>
<li><a class="" href="lasso.html"><span class="header-section-number">5</span> Penalized regressions and sparse hedging for minimum variance portfolios</a></li>
<li><a class="" href="trees.html"><span class="header-section-number">6</span> Tree-based methods</a></li>
<li><a class="" href="NN.html"><span class="header-section-number">7</span> Neural networks</a></li>
<li><a class="" href="svm.html"><span class="header-section-number">8</span> Support vector machines</a></li>
<li><a class="" href="bayes.html"><span class="header-section-number">9</span> Bayesian methods</a></li>
<li class="book-part">From predictions to portfolios</li>
<li><a class="" href="valtune.html"><span class="header-section-number">10</span> Validating and tuning</a></li>
<li><a class="" href="ensemble.html"><span class="header-section-number">11</span> Ensemble models</a></li>
<li><a class="" href="backtest.html"><span class="header-section-number">12</span> Portfolio backtesting</a></li>
<li class="book-part">Further important topics</li>
<li><a class="" href="interp.html"><span class="header-section-number">13</span> Interpretability</a></li>
<li><a class="" href="causality.html"><span class="header-section-number">14</span> Two key concepts: causality and non-stationarity</a></li>
<li><a class="" href="unsup.html"><span class="header-section-number">15</span> Unsupervised learning</a></li>
<li><a class="" href="RL.html"><span class="header-section-number">16</span> Reinforcement learning</a></li>
<li class="book-part">Appendix</li>
<li><a class="" href="data-description.html"><span class="header-section-number">17</span> Data description</a></li>
<li><a class="" href="python.html"><span class="header-section-number">18</span> Python notebooks</a></li>
<li><a class="" href="solutions-to-exercises.html"><span class="header-section-number">19</span> Solutions to exercises</a></li>
</ul>

        <div class="book-extra">
          
        </div>
      </nav>
</div>
  </header><main class="col-sm-12 col-md-9 col-lg-7" id="content"><div id="factor" class="section level1" number="3">
<h1>
<span class="header-section-number">3</span> Factor investing and asset pricing anomalies<a class="anchor" aria-label="anchor" href="#factor"><i class="fas fa-link"></i></a>
</h1>
<style>
.container-fluid main {
max-width: 60rem;
}
</style>
<p>Asset pricing anomalies are the foundations of <strong>factor investing</strong>. In this chapter our aim is twofold:</p>
<ul>
<li>present simple ideas and concepts: basic factor models and common empirical facts (time-varying nature of returns and risk premia);<br>
</li>
<li>provide the reader with lists of articles that go much deeper to stimulate and satisfy curiosity.</li>
</ul>
<p>The purpose of this chapter is not to provide a full treatment of the many topics related to factor investing. Rather, it is intended to give a broad overview and cover the essential themes so that the reader is guided towards the relevant references. As such, it can serve as a short, non-exhaustive, review of the literature. The subject of factor modelling in finance is incredibly vast and the number of papers dedicated to it is substantial and still rapidly increasing.</p>
<p>The universe of peer-reviewed financial journals can be split in two. The first kind consists of <strong>academic</strong> journals. Their articles are mostly written by professors, and the audience consists mostly of scholars. The articles are long and often technical. Prominent examples are the <em>Journal of Finance</em>, the <em>Review of Financial Studies</em> and the <em>Journal of Financial Economics</em>. The second kind is more <strong>practitioner</strong>-oriented. The papers are shorter, easier to read, and predominantly target finance professionals. Two emblematic examples are the <em>Journal of Portfolio Management</em> and the <em>Financial Analysts Journal</em>. This chapter mostly reviews articles published in the first family of journals.</p>
<p>Beyond academic articles, several monographs are already dedicated to the topic of style allocation (a synonym of factor investing used for instance in theoretical articles (<span class="citation">Barberis and Shleifer (<a href="solutions-to-exercises.html#ref-barberis2003style">2003</a>)</span>) or practitioner papers (<span class="citation">Clifford Asness et al. (<a href="solutions-to-exercises.html#ref-asness2015investing">2015</a>)</span>)). To cite but a few, we mention:</p>
<ul>
<li>
<span class="citation">Ilmanen (<a href="solutions-to-exercises.html#ref-ilmanen2011expected">2011</a>)</span>: an exhaustive excursion into risk premia, across many asset classes, with a large spectrum of descriptive statistics (across factors and periods),<br>
</li>
<li>
<span class="citation">Ang (<a href="solutions-to-exercises.html#ref-ang2014asset">2014</a>)</span>: covers factor investing with a strong focus on the money management industry,<br>
</li>
<li>
<span class="citation">Bali, Engle, and Murray (<a href="solutions-to-exercises.html#ref-bali2016empirical">2016</a>)</span>: very complete book on the cross-section of signals with statistical analyses (univariate metrics, correlations, persistence, etc.),<br>
</li>
<li>
<span class="citation">Jurczenko (<a href="solutions-to-exercises.html#ref-jurczenko2017factor">2017</a>)</span>: a tour on various topics given by field experts (factor purity, predictability, selection versus weighting, factor timing, etc.).</li>
</ul>
<p>Finally, we mention a few wide-scope papers on this topic: <span class="citation">Goyal (<a href="solutions-to-exercises.html#ref-goyal2012empirical">2012</a>)</span>, <span class="citation">Cazalet and Roncalli (<a href="solutions-to-exercises.html#ref-cazalet2014facts">2014</a>)</span> and <span class="citation">Baz et al. (<a href="solutions-to-exercises.html#ref-baz2015dissecting">2015</a>)</span>.</p>
<div id="introduction" class="section level2" number="3.1">
<h2>
<span class="header-section-number">3.1</span> Introduction<a class="anchor" aria-label="anchor" href="#introduction"><i class="fas fa-link"></i></a>
</h2>
<p>The topic of factor investing, though a decades-old academic theme, has gained traction concurrently with the rise of exchange-traded funds (ETFs) as investment vehicles. Both gathered momentum in the 2010s. Not so surprisingly, the feedback loop between practical financial engineering and academic research has stimulated both sides in a mutually beneficial manner. Practitioners rely on key scholarly findings (e.g., asset pricing anomalies) while researchers dig deeper into pragmatic topics (e.g., factor exposure or transaction costs). Recently, researchers have also tried to quantify and qualify the impact of factor indices on financial markets. For instance, <span class="citation">Krkoska and Schenk-Hoppé (<a href="solutions-to-exercises.html#ref-krkoska2019herding">2019</a>)</span> analyze herding behaviors while <span class="citation">Cong and Xu (<a href="solutions-to-exercises.html#ref-cong2019rise">2019</a>)</span> show that the introduction of composite securities increases volatility and cross-asset correlations.</p>
<p>The core aim of factor models is to understand the <strong>drivers of asset prices</strong>. Broadly speaking, the rationale behind factor investing is that the financial performance of firms depends on factors, whether they be latent and unobservable, or related to intrinsic characteristics (like accounting ratios for instance). Indeed, as <span class="citation">Cochrane (<a href="solutions-to-exercises.html#ref-cochrane2011presidential">2011</a>)</span> frames it, the first essential question is <em>which characteristics really provide independent information about average returns?</em> Answering this question helps understand the cross-section of returns and may open the door to their prediction.</p>
<p>Theoretically, linear factor models can be viewed as special cases of the arbitrage pricing theory (APT) of <span class="citation">Ross (<a href="solutions-to-exercises.html#ref-ross1976arbitrage">1976</a>)</span>, which assumes that the return of an asset <span class="math inline">\(n\)</span> can be modelled as a linear combination of underlying factors <span class="math inline">\(f_k\)</span>:
<span class="math display" id="eq:apt">\[\begin{equation}
\tag{3.1}
r_{t,n}= \alpha_n+\sum_{k=1}^K\beta_{n,k}f_{t,k}+\epsilon_{t,n},
\end{equation}\]</span></p>
<p>where the usual econometric constraints on linear models hold: <span class="math inline">\(\mathbb{E}[\epsilon_{t,n}]=0\)</span>, <span class="math inline">\(\text{cov}(\epsilon_{t,n},\epsilon_{t,m})=0\)</span> for <span class="math inline">\(n\neq m\)</span> and <span class="math inline">\(\text{cov}(\textbf{f}_k,\boldsymbol{\epsilon}_n)=0\)</span>. If such factors do exist (beyond the market itself), then they are in contradiction with the cornerstone model in asset pricing: the capital asset pricing model (CAPM) of <span class="citation">Sharpe (<a href="solutions-to-exercises.html#ref-sharpe1964capital">1964</a>)</span>, <span class="citation">Lintner (<a href="solutions-to-exercises.html#ref-lintner1965valuation">1965</a>)</span> and <span class="citation">Mossin (<a href="solutions-to-exercises.html#ref-mossin1966equilibrium">1966</a>)</span>. Indeed, according to the CAPM, the only driver of returns is the market portfolio. This explains why factors are also called ‘anomalies’. <span class="citation">Pesaran and Smith (<a href="solutions-to-exercises.html#ref-pesaran2021factor">2021</a>)</span> define the strength of factors using the sum of squared <span class="math inline">\(\beta_{n,k}\)</span> (across firms).</p>
<p>Empirical evidence of asset pricing anomalies has accumulated since the dual publication of <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1992cross">1992</a>)</span> and <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span>. This seminal work has paved the way for a blossoming stream of literature that has its meta-studies (e.g., <span class="citation">Green, Hand, and Zhang (<a href="solutions-to-exercises.html#ref-green2013supraview">2013</a>)</span>, <span class="citation">C. R. Harvey, Liu, and Zhu (<a href="solutions-to-exercises.html#ref-harvey2016and">2016</a>)</span> and <span class="citation">McLean and Pontiff (<a href="solutions-to-exercises.html#ref-mclean2016does">2016</a>)</span>). The regression <a href="factor.html#eq:apt">(3.1)</a> can be evaluated once (unconditionally) or sequentially over different time frames. In the latter case, the parameters (coefficient estimates) change and the models are thus called <em>conditional</em> (we refer to <span class="citation">Ang and Kristensen (<a href="solutions-to-exercises.html#ref-ang2012testing">2012</a>)</span> and to <span class="citation">Cooper and Maio (<a href="solutions-to-exercises.html#ref-cooper2018new">2019</a>)</span> for recent results on this topic as well as for a detailed review on the related research). Conditional models are more flexible because they acknowledge that the drivers of asset prices may not be constant, which seems like a reasonable postulate.</p>
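<p>To make the distinction between unconditional and conditional estimation concrete, the following minimal sketch fits the regression <a href="factor.html#eq:apt">(3.1)</a> for a single asset, first on the full sample and then on rolling windows. Everything in it is an illustrative assumption rather than part of the book's dataset: the returns and the two factors (<code>f1</code>, <code>f2</code>) are simulated, and the loadings (0.8 and -0.3), the 240-month sample and the 60-month window are arbitrary choices.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R">set.seed(42)                                            # Reproducibility
nb_dates &lt;- 240                                         # Hypothetical sample: 240 months
f &lt;- matrix(rnorm(nb_dates * 2, sd = 0.03), ncol = 2)   # Two simulated factors
colnames(f) &lt;- c("f1", "f2")
eps &lt;- rnorm(nb_dates, sd = 0.02)                       # Idiosyncratic errors
r &lt;- 0.001 + f %*% c(0.8, -0.3) + eps                   # Returns generated as in (3.1)
sim &lt;- data.frame(r = as.numeric(r), f)                 # Columns: r, f1, f2

fit_full &lt;- lm(r ~ f1 + f2, data = sim)                 # Unconditional: one fit on the full sample
coef(fit_full)                                          # Estimates of alpha and the two betas

window &lt;- 60                                            # Assumed rolling window (in months)
roll_beta &lt;- t(sapply(window:nb_dates, function(last) { # Conditional: one fit per window
    coef(lm(r ~ f1 + f2, data = sim[(last - window + 1):last, ]))
}))
head(roll_beta)                                         # Time-varying coefficient estimates</code></pre></div>
<p>Each row of <code>roll_beta</code> holds the estimates obtained on one window. Because the simulated loadings are constant, the rolling estimates should only fluctuate around the full-sample ones; with real data, persistent drifts in these series are what conditional models aim to capture.</p>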
</div>
<div id="detecting-anomalies" class="section level2" number="3.2">
<h2>
<span class="header-section-number">3.2</span> Detecting anomalies<a class="anchor" aria-label="anchor" href="#detecting-anomalies"><i class="fas fa-link"></i></a>
</h2>
<div id="challenges" class="section level3" number="3.2.1">
<h3>
<span class="header-section-number">3.2.1</span> Challenges<a class="anchor" aria-label="anchor" href="#challenges"><i class="fas fa-link"></i></a>
</h3>
<p>Obviously, a crucial step is to be able to identify an anomaly and the complexity of this task should not be underestimated. Given the publication bias towards positive results (see, e.g., <span class="citation">C. R. Harvey (<a href="solutions-to-exercises.html#ref-harvey2017presidential">2017</a>)</span> in financial economics), researchers are often tempted to report partial results that are sometimes invalidated by further studies. The need for replication is therefore high and many findings prove short-lived (<span class="citation">Linnainmaa and Roberts (<a href="solutions-to-exercises.html#ref-linnainmaa2018history">2018</a>)</span>, <span class="citation">Johannesson, Ohlson, and Zhai (<a href="solutions-to-exercises.html#ref-johannesson2020explanatory">2020</a>)</span>, <span class="citation">Cakici and Zaremba (<a href="solutions-to-exercises.html#ref-cakici2021size">2021</a>)</span>), especially if transaction costs are taken into account (<span class="citation">Patton and Weller (<a href="solutions-to-exercises.html#ref-patton2020you">2020</a>)</span>, <span class="citation">A. Y. Chen and Velikov (<a href="solutions-to-exercises.html#ref-chen2020zeroing">2020</a>)</span>). Nevertheless, as is demonstrated by <span class="citation">A. Y. Chen (<a href="solutions-to-exercises.html#ref-chen2019limits">2019</a>)</span>, <span class="math inline">\(p\)</span>-hacking alone cannot account for all the anomalies documented in the literature. One way to reduce the risk of spurious detection is to raise the hurdles (often, the <span class="math inline">\(t\)</span>-statistics), though the debate is still ongoing (<span class="citation">C. R. Harvey, Liu, and Zhu (<a href="solutions-to-exercises.html#ref-harvey2016and">2016</a>)</span>, <span class="citation">A. Y. Chen (<a href="solutions-to-exercises.html#ref-chen2020t">2020a</a>)</span>, <span class="citation">C. R. Harvey and Liu (<a href="solutions-to-exercises.html#ref-harvey2021uncovering">2021</a>)</span>). Another is to resort to multiple testing (<span class="citation">C. R. Harvey, Liu, and Saretto (<a href="solutions-to-exercises.html#ref-harvey2019evaluation">2020</a>)</span>, <span class="citation">Vincent, Hsu, and Lin (<a href="solutions-to-exercises.html#ref-vincent2020investment">2020</a>)</span>). In addition, the large sample sizes used in finance may mechanically lead to very low <span class="math inline">\(p\)</span>-values; we refer to <span class="citation">Michaelides (<a href="solutions-to-exercises.html#ref-michaelides2020large">2020</a>)</span> for a discussion on this topic.</p>
<p>Some researchers document fading anomalies because of publication: once the anomaly becomes public, agents invest in it, which pushes prices up and the anomaly disappears. <span class="citation">McLean and Pontiff (<a href="solutions-to-exercises.html#ref-mclean2016does">2016</a>)</span> and <span class="citation">Shanaev and Ghimire (<a href="solutions-to-exercises.html#ref-shanaev2020efficient">2020</a>)</span> document this effect in the US but <span class="citation">H. Jacobs and Müller (<a href="solutions-to-exercises.html#ref-jacobs2019anomalies">2020</a>)</span> find that all other countries experience sustained post-publication factor returns (see also <span class="citation">Zaremba, Umutlu, and Maydubura (<a href="solutions-to-exercises.html#ref-zaremba2020have">2020</a>)</span>). With a different methodology, <span class="citation">A. Y. Chen and Zimmermann (<a href="solutions-to-exercises.html#ref-chen2020publication">2020</a>)</span> introduce a publication bias adjustment for returns and the authors note that this (negative) adjustment is in fact rather small. Likewise, <span class="citation">A. Y. Chen (<a href="solutions-to-exercises.html#ref-chen2020limits">2020b</a>)</span> finds that <span class="math inline">\(p\)</span>-hacking cannot be responsible for all the anomalies reported in the literature.
<span class="citation">Penasse (<a href="solutions-to-exercises.html#ref-penasse2018understanding">2022</a>)</span> recommends the notion of <em>alpha decay</em> to study the persistence or attenuation of anomalies (see also <span class="citation">Falck, Rej, and Thesmar (<a href="solutions-to-exercises.html#ref-falck2021and">2021</a>)</span> on this matter). <span class="citation">Horenstein (<a href="solutions-to-exercises.html#ref-horenstein2020unintended">2020</a>)</span> even builds a model in which agents invest according to anomalies reported in academic research.</p>
<p>The destruction of factor premia may be due to herding (<span class="citation">Krkoska and Schenk-Hoppé (<a href="solutions-to-exercises.html#ref-krkoska2019herding">2019</a>)</span>, <span class="citation">Volpati et al. (<a href="solutions-to-exercises.html#ref-volpati2020zooming">2020</a>)</span>) and could be accelerated by the democratization of so-called smart-beta products (ETFs notably) that allow investors to directly invest in particular styles (value, low volatility, etc.); see <span class="citation">S. Huang, Song, and Xiang (<a href="solutions-to-exercises.html#ref-huang2020smart">2020</a>)</span>. For a theoretical perspective on the attractiveness of factor investing, we refer to <span class="citation">Jin (<a href="solutions-to-exercises.html#ref-jin2019drivers">2019</a>)</span> and for its impact on the active fund industry, to <span class="citation">Densmore (<a href="solutions-to-exercises.html#ref-densmore2021growth">2021</a>)</span>. For an empirical study that links crowding to factor returns, we point to <span class="citation">Kang, Rouwenhorst, and Tang (<a href="solutions-to-exercises.html#ref-kang2021crowding">2021</a>)</span>. <span class="citation">D. H. Bailey and Lopez de Prado (<a href="solutions-to-exercises.html#ref-bailey2021finance">2021</a>)</span> (via <span class="citation">Brightman, Li, and Liu (<a href="solutions-to-exercises.html#ref-brightman2015chasing">2015</a>)</span>) recall that ETFs report a 5% excess return before their launch, but experience a 0% return on average after it.</p>
<p>On the other hand, <span class="citation">DeMiguel, Martin Utrera, and Uppal (<a href="solutions-to-exercises.html#ref-demiguel2019crowding">2019</a>)</span> argue that the price impact of crowding in the smart-beta universe is mitigated by trading diversification stemming from external institutions that trade according to strategies outside this space (e.g., high frequency traders betting via order-book algorithms).</p>
<p>The remainder of this subsection is inspired by <span class="citation">Baker, Luo, and Taliaferro (<a href="solutions-to-exercises.html#ref-baker2017detecting">2017</a>)</span> and <span class="citation">C. Harvey and Liu (<a href="solutions-to-exercises.html#ref-harvey2017lucky">2019</a>)</span>.</p>
</div>
<div id="simple-portfolio-sorts" class="section level3" number="3.2.2">
<h3>
<span class="header-section-number">3.2.2</span> Simple portfolio sorts <a class="anchor" aria-label="anchor" href="#simple-portfolio-sorts"><i class="fas fa-link"></i></a>
</h3>
<p>This is the most common procedure and the one used in <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1992cross">1992</a>)</span>. The idea is simple. On one date,</p>
<ol style="list-style-type: decimal">
<li>rank firms according to a particular criterion (e.g., size, book-to-market ratio);<br>
</li>
<li>form <span class="math inline">\(J\ge 2\)</span> portfolios (i.e., homogeneous groups) consisting of the same number of stocks according to the ranking (usually, <span class="math inline">\(J=2\)</span>, <span class="math inline">\(J=3\)</span>, <span class="math inline">\(J=5\)</span> or <span class="math inline">\(J=10\)</span> portfolios are built, based on the median, terciles, quintiles or deciles of the criterion);<br>
</li>
<li>weight the stocks inside each portfolio either uniformly (equal weights) or proportionally to market capitalization;<br>
</li>
<li>at a future date (usually one month later), report the returns of the portfolios.</li>
</ol>
<p>Then, iterate the procedure until the chronological end of the sample is reached.</p>
<p>The outcome is a time series of portfolio returns <span class="math inline">\(r_t^j\)</span> for each grouping <span class="math inline">\(j\)</span>. An anomaly is identified if the <span class="math inline">\(t\)</span>-test between the first (<span class="math inline">\(j=1\)</span>) and the last group (<span class="math inline">\(j=J\)</span>) unveils a significant difference in average returns. More robust tests are described in <span class="citation">Cattaneo et al. (<a href="solutions-to-exercises.html#ref-cattaneo2019characteristic">2020</a>)</span>. A strong limitation of this approach is that the sorting criterion could have a non-monotonic impact on returns, which a test based on the two extreme portfolios would not detect. Several articles address this concern, for instance <span class="citation">Patton and Timmermann (<a href="solutions-to-exercises.html#ref-patton2010monotonicity">2010</a>)</span> and <span class="citation">Romano and Wolf (<a href="solutions-to-exercises.html#ref-romano2013testing">2013</a>)</span>. Another concern is that these sorted portfolios may capture not only the priced risk associated with the characteristic, but also some unpriced risk. <span class="citation">K. Daniel et al. (<a href="solutions-to-exercises.html#ref-daniel2020cross">2020</a>)</span> show that it is possible to disentangle the two and make the most of altered sorted portfolios.</p>
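<p>The sorting procedure and the test on the extreme portfolios can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the panel is simulated (120 dates, 500 stocks), the characteristic <code>charac</code> is uniform noise, the future return <code>ret</code> is built to decrease slightly with the characteristic, and <span class="math inline">\(J=5\)</span> equally weighted portfolios are formed. None of these names or values come from the book's dataset.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode R">set.seed(1)                                             # Reproducibility
nb_dates &lt;- 120; nb_stocks &lt;- 500; J &lt;- 5               # Hypothetical panel dimensions
sim &lt;- expand.grid(stock = 1:nb_stocks, date = 1:nb_dates)
sim$charac &lt;- runif(nrow(sim))                          # Simulated sorting criterion
sim$ret &lt;- 0.01 - 0.005 * sim$charac +                  # Future return: decreasing in the
    rnorm(nrow(sim), sd = 0.08)                         # criterion (assumption) plus noise

# Steps 1-2: rank stocks at each date and split them into J groups of equal size
sim$group &lt;- ave(sim$charac, sim$date,
                 FUN = function(x) ceiling(J * rank(x) / length(x)))

# Steps 3-4: equally weighted return of each portfolio at each date
port &lt;- tapply(sim$ret, list(sim$date, sim$group), mean)  # nb_dates x J matrix

spread &lt;- port[, 1] - port[, J]                         # First minus last group
t.test(spread)                                          # Is the average spread non-zero?</code></pre></div>
<p>The test only compares the two extreme groups, so it is subject to the limitation discussed above: a non-monotonic pattern across the intermediate columns of <code>port</code> would go unnoticed.</p>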
<p>Instead of focusing on only one criterion, it is possible to group assets according to several characteristics. The original paper <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1992cross">1992</a>)</span> also combines market capitalization with book-to-market ratios. Each characteristic is divided into 10 buckets, which makes 100 portfolios in total. Beyond data availability, there is no upper bound on the number of features that can be included in the sorting process. In fact, some authors investigate more complex sorting algorithms that can manage a potentially large number of characteristics (see, e.g., <span class="citation">Feng, Polson, and Xu (<a href="solutions-to-exercises.html#ref-feng2019deep">2019</a>)</span> and <span class="citation">Bryzgalova, Pelger, and Zhu (<a href="solutions-to-exercises.html#ref-bryzgalova2019forest">2019</a>)</span>).</p>
<p>Finally, we refer to <span class="citation">Olivier Ledoit, Wolf, and Zhao (<a href="solutions-to-exercises.html#ref-ledoit2018efficient">2020</a>)</span> for refinements that take into account the covariance structure of asset returns and to <span class="citation">Cattaneo et al. (<a href="solutions-to-exercises.html#ref-cattaneo2019characteristic">2020</a>)</span> for a theoretical study on the statistical properties of the sorting procedure (including theoretical links with regression-based approaches). Notably, the latter paper discusses the optimal number of portfolios and suggests that it is probably larger than the usual 10 often used in the literature.</p>
<p>In the code and Figure <a href="factor.html#fig:factportsort">3.1</a> below, we compute size portfolios (equally weighted: above versus below the median capitalization). According to the size anomaly, the firms with below-median market capitalization should earn higher returns on average. This is verified whenever the orange bar (below-median firms) in the plot is above the blue one (above-median firms), which happens most of the time.</p>
<div class="sourceCode" id="cb12"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">data_ml</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span><span class="op">(</span><span class="va">date</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                            </span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>large <span class="op">=</span> <span class="va">Mkt_Cap_12M_Usd</span> <span class="op">&gt;</span> <span class="fu"><a href="https://rdrr.io/r/stats/median.html">median</a></span><span class="op">(</span><span class="va">Mkt_Cap_12M_Usd</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> <span class="co"># Creates the cap sort</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/group_by.html">ungroup</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                                 <span class="co"># Ungroup</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>year <span class="op">=</span> <span class="fu">lubridate</span><span class="fu">::</span><span class="fu"><a href="https://lubridate.tidyverse.org/reference/year.html">year</a></span><span class="op">(</span><span class="va">date</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                      <span class="co"># Creates a year variable</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span><span class="op">(</span><span class="va">year</span>, <span class="va">large</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                     <span class="co"># Analyze by year &amp; cap</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span><span class="op">(</span>avg_return <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/mean.html">mean</a></span><span class="op">(</span><span class="va">R1M_Usd</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                     <span class="co"># Compute average return</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span><span class="op">(</span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span><span class="op">(</span>x <span class="op">=</span> <span class="va">year</span>, y <span class="op">=</span> <span class="va">avg_return</span>, fill <span class="op">=</span> <span class="va">large</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span>         <span class="co"># Plot!</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_col</a></span><span class="op">(</span>position <span class="op">=</span> <span class="st">"dodge"</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_light</a></span><span class="op">(</span><span class="op">)</span> <span class="op">+</span>                <span class="co"># Bars side-to-side</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span><span class="op">(</span>legend.position <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="fl">0.8</span>, <span class="fl">0.2</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span>                        <span class="co"># Legend location</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/coord_fixed.html">coord_fixed</a></span><span class="op">(</span><span class="fl">124</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span><span class="op">(</span>legend.title<span class="op">=</span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_blank</a></span><span class="op">(</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span>      <span class="co"># x/y aspect ratio</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/scale_manual.html">scale_fill_manual</a></span><span class="op">(</span>values<span class="op">=</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"#F87E1F"</span>, <span class="st">"#0570EA"</span><span class="op">)</span>, name <span class="op">=</span> <span class="st">""</span>,  <span class="co"># Colors</span></span>
<span>                      labels<span class="op">=</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"Small"</span>, <span class="st">"Large"</span><span class="op">)</span><span class="op">)</span>  <span class="op">+</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/labs.html">ylab</a></span><span class="op">(</span><span class="st">"Average returns"</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme</a></span><span class="op">(</span>legend.text<span class="op">=</span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/element.html">element_text</a></span><span class="op">(</span>size<span class="op">=</span><span class="fl">9</span><span class="op">)</span><span class="op">)</span> </span></code></pre></div>
<div class="figure" style="text-align: center">
<span style="display:block;" id="fig:factportsort"></span>
<img src="ML_factor_files/figure-html/factportsort-1.png" alt="The size factor: average returns of small versus large firms." width="700px"><p class="caption">
FIGURE 3.1: The size factor: average returns of small versus large firms.
</p>
</div>
<p></p>
</div>
<div id="factors" class="section level3" number="3.2.3">
<h3>
<span class="header-section-number">3.2.3</span> Factors<a class="anchor" aria-label="anchor" href="#factors"><i class="fas fa-link"></i></a>
</h3>
<p>The construction of so-called factors follows the same lines as above. Portfolios are sorted on one characteristic and the factor is a long-short combination of one extreme portfolio minus the opposite extreme (small minus large for the size factor, or high book-to-market ratio minus low book-to-market ratio for the value factor). Some constructions add subtleties such as bivariate sorts and the aggregation of several portfolios, as in the original contribution of <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span>. We refer to the books listed at the beginning of the chapter for a more exhaustive treatment of factor idiosyncrasies. For most anomalies, theoretical justifications have been brought forward, whether risk-based or behavioral. We list the most frequently cited factors below, along with a few references:</p>
<ul>
<li>Size (<strong>SMB</strong> = small firms minus large firms): <span class="citation">Banz (<a href="solutions-to-exercises.html#ref-banz1981relationship">1981</a>)</span>, <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1992cross">1992</a>)</span>, <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span>, <span class="citation">Van Dijk (<a href="solutions-to-exercises.html#ref-van2011size">2011</a>)</span>, <span class="citation">Clifford Asness et al. (<a href="solutions-to-exercises.html#ref-asness2018size">2018</a>)</span> and <span class="citation">Astakhov, Havranek, and Novak (<a href="solutions-to-exercises.html#ref-astakhov2019firm">2019</a>)</span>.<br>
</li>
<li>Value (<strong>HML</strong> = high minus low: undervalued minus ‘growth’ firms): <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1992cross">1992</a>)</span>, <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span>, <span class="citation">C. S. Asness, Moskowitz, and Pedersen (<a href="solutions-to-exercises.html#ref-asness2013value">2013</a>)</span>. See <span class="citation">Israel, Laursen, and Richardson (<a href="solutions-to-exercises.html#ref-israel2020systematic">2020</a>)</span> and <span class="citation">Roca (<a href="solutions-to-exercises.html#ref-roca2021new">2021</a>)</span> for recent discussions.<br>
</li>
<li>Momentum (<strong>WML</strong> = winners minus losers): <span class="citation">Jegadeesh and Titman (<a href="solutions-to-exercises.html#ref-jegadeesh1993returns">1993</a>)</span>, <span class="citation">Carhart (<a href="solutions-to-exercises.html#ref-carhart1997persistence">1997</a>)</span> and <span class="citation">C. S. Asness, Moskowitz, and Pedersen (<a href="solutions-to-exercises.html#ref-asness2013value">2013</a>)</span>. The winners are the assets that have experienced the highest returns over the last year (sometimes the computation of the return is truncated to omit the last month). Cross-sectional momentum is linked, but not equivalent, to time series momentum (trend following), see e.g., <span class="citation">Moskowitz, Ooi, and Pedersen (<a href="solutions-to-exercises.html#ref-moskowitz2012time">2012</a>)</span> and <span class="citation">Lempérière et al. (<a href="solutions-to-exercises.html#ref-lemperiere2014two">2014</a>)</span>. Momentum is also related to contrarian movements that occur both at higher and lower frequencies (short-term and long-term reversals), see <span class="citation">Luo, Subrahmanyam, and Titman (<a href="solutions-to-exercises.html#ref-luo2020momentum">2020</a>)</span>.<br>
</li>
<li>Profitability (<strong>RMW</strong> = robust minus weak profits): <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama2015five">2015</a>)</span>, <span class="citation">Bouchaud et al. (<a href="solutions-to-exercises.html#ref-bouchaud2019sticky">2019</a>)</span>. In the former reference, profitability is measured as (revenues - (cost and expenses))/equity.<br>
</li>
<li>Investment (<strong>CMA</strong> = conservative minus aggressive): <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama2015five">2015</a>)</span>, <span class="citation">Hou, Xue, and Zhang (<a href="solutions-to-exercises.html#ref-hou2015digesting">2015</a>)</span>. Investment is measured via the growth of total assets (divided by total assets). Aggressive firms are those that experience the largest growth in assets.<br>
</li>
<li>Low ‘risk’ (sometimes, <strong>BAB</strong> = betting against beta): <span class="citation">Ang et al. (<a href="solutions-to-exercises.html#ref-ang2006cross">2006</a>)</span>, <span class="citation">Baker, Bradley, and Wurgler (<a href="solutions-to-exercises.html#ref-baker2011benchmarks">2011</a>)</span>, <span class="citation">Frazzini and Pedersen (<a href="solutions-to-exercises.html#ref-frazzini2014betting">2014</a>)</span>, <span class="citation">Boloorforoosh et al. (<a href="solutions-to-exercises.html#ref-boloorforoosh2019beta">2020</a>)</span>, <span class="citation">Baker, Hoeyer, and Wurgler (<a href="solutions-to-exercises.html#ref-baker2019leverage">2020</a>)</span> and <span class="citation">Cliff Asness et al. (<a href="solutions-to-exercises.html#ref-asness2020betting">2020</a>)</span>. In this case, the computation of risk changes from one article to the other (simple volatility, market beta, idiosyncratic volatility, etc.).</li>
</ul>
<p>With the notable exception of the low risk premium, the most mainstream anomalies are kept and updated in the data library of Kenneth French (<a href="https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html" class="uri">https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html</a>). Of course, the computation of the factors follows a particular set of rules, but they are generally accepted in the academic sphere. Another source of data is the AQR repository: <a href="https://www.aqr.com/Insights/Datasets" class="uri">https://www.aqr.com/Insights/Datasets</a>.</p>
<p>In the dataset we use for the book, we proxy the value anomaly not with the book-to-market ratio but with the price-to-book ratio (the book value is located in the denominator). As is shown in <span class="citation">Clifford Asness and Frazzini (<a href="solutions-to-exercises.html#ref-asness2013devil">2013</a>)</span>, the choice of the variable for value can have sizable effects.</p>
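<p>The long-short construction described at the start of this subsection can be sketched on simulated data. The snippet below is only an illustration under stated assumptions: the panel, the column names (<code>mkt_cap</code>, <code>ret</code>), the helper <code>long_short()</code> and the choice of equally-weighted quintiles are all hypothetical and do not reproduce the rules of any published factor. Because the simulated returns are pure noise, the resulting series carries no premium; only the mechanics matter.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">set.seed(42)                                               # Reproducibility
n_stocks &lt;- 300                                            # Hypothetical universe size
n_months &lt;- 24                                             # Hypothetical number of dates
panel &lt;- expand.grid(stock = 1:n_stocks, month = 1:n_months)
panel$mkt_cap &lt;- exp(rnorm(nrow(panel), mean = 7, sd = 1.5))  # Simulated sorting characteristic
panel$ret &lt;- rnorm(nrow(panel), mean = 0.01, sd = 0.08)       # Simulated returns (pure noise)

long_short &lt;- function(df, n_groups = 5) {                 # One date: sort, then long minus short
    breaks &lt;- quantile(df$mkt_cap, probs = seq(0, 1, length.out = n_groups + 1))
    grp &lt;- cut(df$mkt_cap, breaks = breaks,                # Assign each stock to a quantile group
               labels = FALSE, include.lowest = TRUE)
    mean(df$ret[grp == 1]) - mean(df$ret[grp == n_groups]) # Smallest group minus largest group
}

smb_sketch &lt;- sapply(split(panel, panel$month), long_short) # One long-short return per month
head(smb_sketch)                                           # A look at the first values</code></pre></div>
<p>Sorting on another characteristic only requires changing the column used in <code>quantile()</code> and <code>cut()</code>. For a value factor proxied by price-to-book, the long leg would be the group with the lowest ratios.</p>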
<p>Below, we import data from Ken French’s data library. A word of caution: the data is updated frequently and sometimes undergoes methodological changes. We refer to <span class="citation">Akey, Robertson, and Simutin (<a href="solutions-to-exercises.html#ref-akey2021noisy">2021</a>)</span> for a study of such changes in the common Fama-French factors. We will use this data later on in the chapter.</p>
<div class="sourceCode" id="cb13"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="http://www.quantmod.com">quantmod</a></span><span class="op">)</span>                         <span class="co"># Package for data extraction</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="http://xtable.r-forge.r-project.org/">xtable</a></span><span class="op">)</span>                           <span class="co"># Package for LaTeX exports </span></span>
<span><span class="va">min_date</span> <span class="op">&lt;-</span> <span class="st">"1963-07-31"</span>                  <span class="co"># Start date</span></span>
<span><span class="va">max_date</span> <span class="op">&lt;-</span> <span class="st">"2020-03-28"</span>                  <span class="co"># Stop date</span></span>
<span><span class="va">temp</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/tempfile.html">tempfile</a></span><span class="op">(</span><span class="op">)</span></span>
<span><span class="va">KF_website</span> <span class="op">&lt;-</span> <span class="st">"http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/"</span></span>
<span><span class="va">KF_file</span> <span class="op">&lt;-</span> <span class="st">"ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip"</span></span>
<span><span class="va">link</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste0</a></span><span class="op">(</span><span class="va">KF_website</span>, <span class="va">KF_file</span><span class="op">)</span>       <span class="co"># Link of the file</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/download.file.html">download.file</a></span><span class="op">(</span><span class="va">link</span>, <span class="va">temp</span>, quiet <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span>   <span class="co"># Download!</span></span>
<span><span class="va">FF_factors</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://readr.tidyverse.org/reference/read_delim.html">read_csv</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/connections.html">unz</a></span><span class="op">(</span><span class="va">temp</span>, <span class="st">"F-F_Research_Data_5_Factors_2x3.csv"</span><span class="op">)</span>, </span>
<span>                       skip <span class="op">=</span> <span class="fl">3</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>          <span class="co"># Check the number of lines to skip!</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/rename.html">rename</a></span><span class="op">(</span>date <span class="op">=</span> <span class="va">`...1`</span>, MKT_RF <span class="op">=</span> <span class="va">`Mkt-RF`</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>  <span class="co"># Change the name of first columns</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate_all.html">mutate_at</a></span><span class="op">(</span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/vars.html">vars</a></span><span class="op">(</span><span class="op">-</span><span class="va">date</span><span class="op">)</span>, <span class="va">as.numeric</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                 <span class="co"># Convert values to number</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>date <span class="op">=</span> <span class="fu"><a href="https://lubridate.tidyverse.org/reference/ymd.html">ymd</a></span><span class="op">(</span><span class="fu"><a href="https://lubridate.tidyverse.org/reference/parse_date_time.html">parse_date_time</a></span><span class="op">(</span><span class="va">date</span>, <span class="st">"%Y%m"</span><span class="op">)</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>  <span class="co"># Date in right format</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>date <span class="op">=</span> <span class="fu"><a href="https://lubridate.tidyverse.org/reference/rollbackward.html">rollback</a></span><span class="op">(</span><span class="va">date</span> <span class="op">+</span> <span class="fu"><a href="https://rdrr.io/r/base/weekday.POSIXt.html">months</a></span><span class="op">(</span><span class="fl">1</span><span class="op">)</span><span class="op">)</span><span class="op">)</span>              <span class="co"># End of month date</span></span>
<span><span class="va">FF_factors</span> <span class="op">&lt;-</span> <span class="va">FF_factors</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>MKT_RF <span class="op">=</span> <span class="va">MKT_RF</span> <span class="op">/</span> <span class="fl">100</span>, <span class="co"># Scale returns</span></span>
<span>                                    SMB <span class="op">=</span> <span class="va">SMB</span> <span class="op">/</span> <span class="fl">100</span>,</span>
<span>                                    HML <span class="op">=</span> <span class="va">HML</span> <span class="op">/</span> <span class="fl">100</span>,</span>
<span>                                    RMW <span class="op">=</span> <span class="va">RMW</span> <span class="op">/</span> <span class="fl">100</span>,</span>
<span>                                    CMA <span class="op">=</span> <span class="va">CMA</span> <span class="op">/</span> <span class="fl">100</span>,</span>
<span>                                    RF <span class="op">=</span> <span class="va">RF</span><span class="op">/</span><span class="fl">100</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span><span class="op">(</span><span class="va">date</span> <span class="op">&gt;=</span> <span class="va">min_date</span>, <span class="va">date</span> <span class="op">&lt;=</span> <span class="va">max_date</span><span class="op">)</span>             <span class="co"># Finally, keep only recent points</span></span>
<span><span class="fu">knitr</span><span class="fu">::</span><span class="fu"><a href="https://rdrr.io/pkg/knitr/man/kable.html">kable</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/utils/head.html">head</a></span><span class="op">(</span><span class="va">FF_factors</span><span class="op">)</span>,  booktabs <span class="op">=</span> <span class="cn">TRUE</span>,</span>
<span>             caption <span class="op">=</span> <span class="st">"Sample of monthly factor returns."</span><span class="op">)</span> <span class="co"># A look at the data (see table)                   </span></span></code></pre></div>
<div class="inline-table"><table class="table table-sm">
<caption>
<span id="tab:factorImport">TABLE 3.1: </span>Sample of monthly factor returns.
</caption>
<thead><tr>
<th style="text-align:left;">
date
</th>
<th style="text-align:right;">
MKT_RF
</th>
<th style="text-align:right;">
SMB
</th>
<th style="text-align:right;">
HML
</th>
<th style="text-align:right;">
RMW
</th>
<th style="text-align:right;">
CMA
</th>
<th style="text-align:right;">
RF
</th>
</tr></thead>
<tbody>
<tr>
<td style="text-align:left;">
1963-07-31
</td>
<td style="text-align:right;">
-0.0039
</td>
<td style="text-align:right;">
-0.0041
</td>
<td style="text-align:right;">
-0.0097
</td>
<td style="text-align:right;">
0.0068
</td>
<td style="text-align:right;">
-0.0118
</td>
<td style="text-align:right;">
0.0027
</td>
</tr>
<tr>
<td style="text-align:left;">
1963-08-31
</td>
<td style="text-align:right;">
0.0507
</td>
<td style="text-align:right;">
-0.0080
</td>
<td style="text-align:right;">
0.0180
</td>
<td style="text-align:right;">
0.0036
</td>
<td style="text-align:right;">
-0.0035
</td>
<td style="text-align:right;">
0.0025
</td>
</tr>
<tr>
<td style="text-align:left;">
1963-09-30
</td>
<td style="text-align:right;">
-0.0157
</td>
<td style="text-align:right;">
-0.0052
</td>
<td style="text-align:right;">
0.0013
</td>
<td style="text-align:right;">
-0.0071
</td>
<td style="text-align:right;">
0.0029
</td>
<td style="text-align:right;">
0.0027
</td>
</tr>
<tr>
<td style="text-align:left;">
1963-10-31
</td>
<td style="text-align:right;">
0.0253
</td>
<td style="text-align:right;">
-0.0139
</td>
<td style="text-align:right;">
-0.0010
</td>
<td style="text-align:right;">
0.0280
</td>
<td style="text-align:right;">
-0.0201
</td>
<td style="text-align:right;">
0.0029
</td>
</tr>
<tr>
<td style="text-align:left;">
1963-11-30
</td>
<td style="text-align:right;">
-0.0085
</td>
<td style="text-align:right;">
-0.0088
</td>
<td style="text-align:right;">
0.0175
</td>
<td style="text-align:right;">
-0.0051
</td>
<td style="text-align:right;">
0.0224
</td>
<td style="text-align:right;">
0.0027
</td>
</tr>
<tr>
<td style="text-align:left;">
1963-12-31
</td>
<td style="text-align:right;">
0.0183
</td>
<td style="text-align:right;">
-0.0210
</td>
<td style="text-align:right;">
-0.0002
</td>
<td style="text-align:right;">
0.0003
</td>
<td style="text-align:right;">
-0.0007
</td>
<td style="text-align:right;">
0.0029
</td>
</tr>
</tbody>
</table></div>
<p></p>
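<p>The dates in the raw file are stored as year-month strings (e.g., 196307), and the import chunk maps them to end-of-month dates. The sketch below reproduces that step in base R as an alternative to the <code>lubridate</code> calls used above; the variable names are hypothetical and the three sample strings are chosen only to cover a regular month, a leap-year February and a December.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">yyyymm &lt;- c("196307", "196402", "196312")                        # Year-month strings
first_day &lt;- as.Date(paste0(yyyymm, "01"), format = "%Y%m%d")    # First day of each month
next_month &lt;- as.Date(sapply(first_day, function(d) {            # First day of the following month
    as.character(seq(d, by = "month", length.out = 2)[2])
}))
end_of_month &lt;- next_month - 1                                   # Step back one day
end_of_month                                                     # End-of-month dates</code></pre></div>
<p>Moving to the first day of the next month and stepping back one day avoids hard-coding month lengths, which is also the logic behind adding one month and rolling back.</p>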
<p>Following the discovery of these stylized facts, some contributions have aimed at building theoretical models that capture these properties. We cite a handful below:</p>
<ul>
<li>
<strong>size</strong> and <strong>value</strong>: <span class="citation">Berk, Green, and Naik (<a href="solutions-to-exercises.html#ref-berk1999optimal">1999</a>)</span>, <span class="citation">K. D. Daniel, Hirshleifer, and Subrahmanyam (<a href="solutions-to-exercises.html#ref-daniel2001overconfidence">2001</a>)</span>, <span class="citation">Barberis and Shleifer (<a href="solutions-to-exercises.html#ref-barberis2003style">2003</a>)</span>, <span class="citation">Gomes, Kogan, and Zhang (<a href="solutions-to-exercises.html#ref-gomes2003equilibrium">2003</a>)</span>, <span class="citation">Carlson, Fisher, and Giammarino (<a href="solutions-to-exercises.html#ref-carlson2004corporate">2004</a>)</span>, <span class="citation">R. D. Arnott et al. (<a href="solutions-to-exercises.html#ref-arnott2014can">2014</a>)</span>;</li>
<li>
<strong>momentum</strong>: <span class="citation">T. C. Johnson (<a href="solutions-to-exercises.html#ref-johnson2002rational">2002</a>)</span>, <span class="citation">Grinblatt and Han (<a href="solutions-to-exercises.html#ref-grinblatt2005prospect">2005</a>)</span>, <span class="citation">Vayanos and Woolley (<a href="solutions-to-exercises.html#ref-vayanos2013institutional">2013</a>)</span>, <span class="citation">Choi and Kim (<a href="solutions-to-exercises.html#ref-choi2014momentum">2014</a>)</span>.</li>
</ul>
<p>In addition, recent bridges have been built between risk-based factor representations and behavioral theories. We refer essentially to <span class="citation">Barberis, Mukherjee, and Wang (<a href="solutions-to-exercises.html#ref-barberis2016prospect">2016</a>)</span> and <span class="citation">K. Daniel, Hirshleifer, and Sun (<a href="solutions-to-exercises.html#ref-daniel2019short">2020</a>)</span> and the references therein.</p>
<p>While these factors (i.e., long-short portfolios) exhibit time-varying risk premia and are magnified by corporate news and announcements (<span class="citation">Engelberg, McLean, and Pontiff (<a href="solutions-to-exercises.html#ref-engelberg2018anomalies">2018</a>)</span>), it is well documented (and accepted) that they deliver positive returns over long horizons.<a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;This has been a puzzle for the value factor during the 2010s, a decade in which the factor performed poorly (see &lt;span class="citation"&gt;Bellone et al. (&lt;a href="solutions-to-exercises.html#ref-bellone2020equity"&gt;2020&lt;/a&gt;)&lt;/span&gt;, &lt;span class="citation"&gt;Cornell and Damodaran (&lt;a href="solutions-to-exercises.html#ref-cornell2021value"&gt;2021&lt;/a&gt;)&lt;/span&gt; and &lt;span class="citation"&gt;Stagnol et al. (&lt;a href="solutions-to-exercises.html#ref-stagnol2021understanding"&gt;2021&lt;/a&gt;)&lt;/span&gt;). &lt;span class="citation"&gt;Shea and Radatz (&lt;a href="solutions-to-exercises.html#ref-shea2020searching"&gt;2020&lt;/a&gt;)&lt;/span&gt; argue that it is because some fundamentals of value firms (like ROE) have not improved at the rate of those of growth firms. This underlines that it is hard to pick which fundamental metrics matter and that their importance varies with time. 
&lt;span class="citation"&gt;Binz, Schipper, and Standridge (&lt;a href="solutions-to-exercises.html#ref-binz2020can"&gt;2020&lt;/a&gt;)&lt;/span&gt; even find that resorting to AI to make sense of (and mine) the fundamentals’ zoo only helps marginally.&lt;/p&gt;'><sup>6</sup></a> We refer to <span class="citation">Gagliardini, Ossola, and Scaillet (<a href="solutions-to-exercises.html#ref-gagliardini2016time">2016</a>)</span> and to the survey <span class="citation">Gagliardini, Ossola, and Scaillet (<a href="solutions-to-exercises.html#ref-gagliardini2019estimation">2019</a>)</span>, as well as to the related bibliography for technical details on estimation procedures of risk premia and the corresponding empirical results. Large sample studies that document regime changes in factor premia were also carried out by <span class="citation">Ilmanen et al. (<a href="solutions-to-exercises.html#ref-ilmanen2019factor">2019</a>)</span>, <span class="citation">S. Smith and Timmermann (<a href="solutions-to-exercises.html#ref-smith2020instability">2021</a>)</span> and <span class="citation">Chib, Zhao, and Zhou (<a href="solutions-to-exercises.html#ref-chib2021finding">2021</a>)</span>. Moreover, the predictability of returns is also time-varying (as documented in <span class="citation">Farmer, Schmidt, and Timmermann (<a href="solutions-to-exercises.html#ref-farmer2019pockets">2019</a>)</span>, <span class="citation">Tsiakas, Li, and Zhang (<a href="solutions-to-exercises.html#ref-tsiakas2020equity">2020</a>)</span> and <span class="citation">Liu, Pan, and Wang (<a href="solutions-to-exercises.html#ref-liu2020can">2020</a>)</span>), and estimation methods can be improved (<span class="citation">T. L. Johnson (<a href="solutions-to-exercises.html#ref-johnson2019fresh">2019</a>)</span>).</p>
<p>In Figure <a href="factor.html#fig:riskpremiaFF">3.2</a>, we plot the average monthly return aggregated over each calendar year for five common factors, along with the risk-free rate. The risk-free rate (which is not a factor per se) is the most stable, while the market factor (aggregate market returns minus the risk-free rate) is the most volatile. This makes sense because it is the only long equity factor among the series shown; the others are long-short portfolios.</p>
<div class="sourceCode" id="cb14"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">FF_factors</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>date <span class="op">=</span> <span class="fu"><a href="https://lubridate.tidyverse.org/reference/year.html">year</a></span><span class="op">(</span><span class="va">date</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                       <span class="co"># Turn date into year</span></span>
<span>    <span class="fu"><a href="https://tidyr.tidyverse.org/reference/gather.html">gather</a></span><span class="op">(</span>key <span class="op">=</span> <span class="va">factor</span>, value <span class="op">=</span> <span class="va">value</span>, <span class="op">-</span> <span class="va">date</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>     <span class="co"># Put in tidy shape</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span><span class="op">(</span><span class="va">date</span>, <span class="va">factor</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                          <span class="co"># Group by year and factor</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise</a></span><span class="op">(</span>value <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/mean.html">mean</a></span><span class="op">(</span><span class="va">value</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                  <span class="co"># Compute average return</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span><span class="op">(</span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span><span class="op">(</span>x <span class="op">=</span> <span class="va">date</span>, y <span class="op">=</span> <span class="va">value</span>, color <span class="op">=</span> <span class="va">factor</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span>  <span class="co"># Plot</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/geom_path.html">geom_line</a></span><span class="op">(</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/coord_fixed.html">coord_fixed</a></span><span class="op">(</span><span class="fl">500</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_light</a></span><span class="op">(</span><span class="op">)</span>      <span class="co"># Fix x/y ratio + theme</span></span></code></pre></div>
<div class="figure">
<span style="display:block;" id="fig:riskpremiaFF"></span>
<img src="ML_factor_files/figure-html/riskpremiaFF-1.png" alt="Average returns of common anomalies (1963-2020). Source: Ken French library." width="672"><p class="caption">
FIGURE 3.2: Average returns of common anomalies (1963-2020). Source: Ken French library.
</p>
</div>
<p></p>
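<p>The relative stability of the series can also be gauged numerically. As a rough illustration, the sketch below computes the standard deviation of each column using only the six monthly observations displayed in Table 3.1; the object names are hypothetical, and six months are of course far too few to draw any conclusion about the full 1963-2020 sample.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">sample_factors &lt;- data.frame(                              # Six rows copied from Table 3.1
    MKT_RF = c(-0.0039,  0.0507, -0.0157,  0.0253, -0.0085,  0.0183),
    SMB    = c(-0.0041, -0.0080, -0.0052, -0.0139, -0.0088, -0.0210),
    HML    = c(-0.0097,  0.0180,  0.0013, -0.0010,  0.0175, -0.0002),
    RMW    = c( 0.0068,  0.0036, -0.0071,  0.0280, -0.0051,  0.0003),
    CMA    = c(-0.0118, -0.0035,  0.0029, -0.0201,  0.0224, -0.0007),
    RF     = c( 0.0027,  0.0025,  0.0027,  0.0029,  0.0027,  0.0029)
)
vol &lt;- sapply(sample_factors, sd)                          # Standard deviation of each series
sort(vol)                                                  # From most stable to most volatile</code></pre></div>
<p>On these six rows, the risk-free rate has the smallest standard deviation and the market factor the largest. Applying the same call to the numeric columns of the full <code>FF_factors</code> table would give the corresponding ranking over the whole sample.</p>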
<p>The individual attributes of investors who allocate towards particular factors are a blossoming topic. We list a few references below, even though they lie somewhat outside the scope of this book. <span class="citation">Betermier, Calvet, and Sodini (<a href="solutions-to-exercises.html#ref-betermier2017value">2017</a>)</span> show that value investors are older, wealthier and face lower income risk compared to growth investors who are those in the best position to take financial risks. The study by <span class="citation">Cronqvist, Siegel, and Yu (<a href="solutions-to-exercises.html#ref-cronqvist2015value">2015</a>)</span> leads to different conclusions: it finds that the propensity to invest in value versus growth assets has roots in genetics and in life events (the latter effect being confirmed in <span class="citation">Cocco, Gomes, and Lopes (<a href="solutions-to-exercises.html#ref-cocco2019evidence">2020</a>)</span>, and the former being further detailed in a more general context in <span class="citation">Cronqvist et al. (<a href="solutions-to-exercises.html#ref-cronqvist2015fetal">2015</a>)</span>). Psychological traits can also explain some factors: when agents extrapolate, they are likely to fuel momentum (this topic is thoroughly reviewed in <span class="citation">Barberis (<a href="solutions-to-exercises.html#ref-barberis2018psychology">2018</a>)</span>). Micro- and macro-economic consequences of these preferences are detailed in <span class="citation">Bhamra and Uppal (<a href="solutions-to-exercises.html#ref-bhamra2019does">2019</a>)</span>. To conclude this paragraph, we mention that theoretical models have also been proposed that link agents’ preferences and beliefs (via prospect theory) to market anomalies (see for instance <span class="citation">Barberis, Jin, and Wang (<a href="solutions-to-exercises.html#ref-barberis2019prospect">2020</a>)</span>).</p>
<p>Finally, we highlight the need for replicability of factor premia and echo the recent editorial by <span class="citation">C. R. Harvey (<a href="solutions-to-exercises.html#ref-harvey2020replication">2020</a>)</span>. As is shown by <span class="citation">Linnainmaa and Roberts (<a href="solutions-to-exercises.html#ref-linnainmaa2018history">2018</a>)</span> and <span class="citation">Hou, Xue, and Zhang (<a href="solutions-to-exercises.html#ref-hou2019replicating">2020</a>)</span>, many proclaimed factors are in fact very much data-dependent and often fail to deliver sustained profitability when the investment universe is altered or when the definition of the sorting variable changes (<span class="citation">Clifford Asness and Frazzini (<a href="solutions-to-exercises.html#ref-asness2013devil">2013</a>)</span>).</p>
<p>In a series of papers, Campbell Harvey and his co-authors synthesize the research on factors: <span class="citation">C. R. Harvey, Liu, and Zhu (<a href="solutions-to-exercises.html#ref-harvey2016and">2016</a>)</span>, <span class="citation">C. Harvey and Liu (<a href="solutions-to-exercises.html#ref-harvey2017lucky">2019</a>)</span> and <span class="citation">C. R. Harvey and Liu (<a href="solutions-to-exercises.html#ref-harvey2019census">2019</a>)</span>. This work underlines the need to set a high bar for an anomaly to be called a ‘true’ factor. Increasing thresholds for <span class="math inline">\(p\)</span>-values is only a partial answer, as it is always possible to resort to data snooping in order to find an optimized strategy that will fail out-of-sample but that will deliver a <span class="math inline">\(t\)</span>-statistic larger than three (or even four). <span class="citation">C. R. Harvey (<a href="solutions-to-exercises.html#ref-harvey2017presidential">2017</a>)</span> recommends resorting to a Bayesian approach which blends data-based significance with a prior into a so-called Bayesianized <em>p</em>-value (see subsection below).</p>
<p>Following this work, researchers have continued to explore the richness of this zoo. <span class="citation">Bryzgalova, Huang, and Julliard (<a href="solutions-to-exercises.html#ref-bryzgalova2019bayesian">2019</a>)</span> propose a tractable Bayesian estimation of large-dimensional factor models and evaluate all possible combinations of more than 50 factors, yielding an extremely large number of coefficients. Combined with a Bayesianized <span class="citation">Fama and MacBeth (<a href="solutions-to-exercises.html#ref-fama1973risk">1973</a>)</span> procedure, this makes it possible to distinguish between pervasive and superfluous factors. <span class="citation">Chordia, Goyal, and Saretto (<a href="solutions-to-exercises.html#ref-chordia2020anomalies">2020</a>)</span> use simulations of 2 million trading strategies to estimate the rate of <em>false discoveries</em>, that is, cases in which a spurious factor is detected (type I error). They also advise using thresholds for <em>t</em>-statistics that are well above three. In a similar vein, <span class="citation">C. R. Harvey and Liu (<a href="solutions-to-exercises.html#ref-harvey2019false">2020</a>)</span> underline that <em>true</em> anomalies may sometimes be missed because of a one-time <span class="math inline">\(t\)</span>-statistic that is too low (type II error).</p>
<p>The propensity of journals to publish positive results has led researchers to estimate the difference between reported returns and <em>true</em> returns. <span class="citation">A. Y. Chen and Zimmermann (<a href="solutions-to-exercises.html#ref-chen2020publication">2020</a>)</span> call this difference the <em>publication bias</em> and estimate it as roughly 12%. That is, if a published average return is 8%, the actual value may in fact be closer to <span class="math inline">\((1-12\%)\times 8\%\approx 7\%\)</span>. Qualitatively, this estimate of 12% is smaller than the out-of-sample reduction in returns found in <span class="citation">McLean and Pontiff (<a href="solutions-to-exercises.html#ref-mclean2016does">2016</a>)</span>.</p>
</div>
<div id="predictive-regressions-sorts-and-p-value-issues" class="section level3" number="3.2.4">
<h3>
<span class="header-section-number">3.2.4</span> Predictive regressions, sorts, and p-value issues<a class="anchor" aria-label="anchor" href="#predictive-regressions-sorts-and-p-value-issues"><i class="fas fa-link"></i></a>
</h3>
<p>For simplicity, we assume that the predictive regression takes a simple linear form:
<span class="math display" id="eq:factsimple">\[\begin{equation}
\tag{3.2}
\textbf{r} = a+b\textbf{x}+\textbf{e},
\end{equation}\]</span>
where the vector <span class="math inline">\(\textbf{r}\)</span> stacks all returns of all stocks and <span class="math inline">\(\textbf{x}\)</span> is a lagged variable so that the regression is indeed predictive. If the estimate <span class="math inline">\(\hat{b}\)</span> is significant given a specified threshold, then it can be tempting to conclude that <span class="math inline">\(\textbf{x}\)</span> does a good job of predicting returns, and hence that long-short portfolios related to extreme values of <span class="math inline">\(\textbf{x}\)</span> (mind the sign of <span class="math inline">\(\hat{b}\)</span>) can be expected to generate profits. This is unfortunately often false because <span class="math inline">\(\hat{b}\)</span> only gives information on the <em>past</em> ability of <span class="math inline">\(\textbf{x}\)</span> to forecast returns. What happens in the future may be another story.</p>
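<p>As a minimal sketch of this caveat, the snippet below fits a regression of the form <a href="factor.html#eq:factsimple">(3.2)</a> on the first half of a simulated sample and then checks the forecasts on the second half. The data are simulated so that the predictor is pure noise by construction; all names (<code>sim</code>, <code>train</code>, <code>test</code>, <code>fit</code>) and parameter values are illustrative assumptions and are unrelated to the book’s dataset.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">set.seed(42)                                   # Reproducibility of the simulation
n_obs &lt;- 240                                   # Hypothetical number of observations
x &lt;- rnorm(n_obs)                              # Lagged predictor: pure noise by construction
r &lt;- 0.005 + rnorm(n_obs, sd = 0.05)           # Returns, unrelated to x
sim &lt;- data.frame(r = r, x = x)                # Simulated sample
train &lt;- sim[1:120, ]                          # In-sample half
test  &lt;- sim[121:240, ]                        # Out-of-sample half
fit &lt;- lm(r ~ x, data = train)                 # Estimate a and b in-sample
summary(fit)$coefficients["x", ]               # b-hat, std. error, t-statistic, p-value
pred &lt;- predict(fit, newdata = test)           # Forecasts on unseen data
cor(pred, test$r)                              # Out-of-sample link: forecasts vs realized returns</code></pre></div>
<p>Comparing the in-sample <em>t</em>-statistic with the out-of-sample correlation is one simple way to check whether an apparent relationship persists beyond the estimation window.</p>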
<p>Statistical tests are also used for portfolio sorts. Assume two extreme portfolios are expected to yield very different average returns (like very small cap versus very large cap, or strong winners versus bad losers). The portfolio returns are written <span class="math inline">\(r_t^+\)</span> and <span class="math inline">\(r_t^-\)</span>. The simplest test for the mean is <span class="math inline">\(t=\sqrt{T}\frac{m_{r_+}-m_{r_-}}{\sigma_{r_+-r_-}}\)</span>, where <span class="math inline">\(T\)</span> is the number of observations, <span class="math inline">\(m_{r_\pm}\)</span> denotes the mean returns of the two portfolios, and <span class="math inline">\(\sigma_{r_+-r_-}\)</span> is the standard deviation of the difference between the two series, i.e., the volatility of the long-short portfolio. In short, the statistic can be viewed as a scaled Sharpe ratio (though usually these ratios are computed for long-only portfolios) and can in turn be used to compute <span class="math inline">\(p\)</span>-values to assess the robustness of an anomaly. As is shown in <span class="citation">Linnainmaa and Roberts (<a href="solutions-to-exercises.html#ref-linnainmaa2018history">2018</a>)</span> and <span class="citation">Hou, Xue, and Zhang (<a href="solutions-to-exercises.html#ref-hou2019replicating">2020</a>)</span>, many factors discovered by researchers fail to survive in out-of-sample tests.</p>
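<p>The statistic above can be computed in a few lines. The sketch below relies on simulated returns for the two extreme portfolios; the variable names (<code>r_plus</code>, <code>r_minus</code>, <code>ls_ret</code>) and the means and volatilities are hypothetical choices made only for illustration.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">set.seed(1)                                          # Reproducibility
T_obs &lt;- 120                                         # Hypothetical number of months
r_plus  &lt;- rnorm(T_obs, mean = 0.010, sd = 0.05)     # Simulated returns of the long leg
r_minus &lt;- rnorm(T_obs, mean = 0.004, sd = 0.05)     # Simulated returns of the short leg
ls_ret &lt;- r_plus - r_minus                           # Long-short portfolio returns
t_stat &lt;- sqrt(T_obs) * mean(ls_ret) / sd(ls_ret)    # Test statistic from the text
p_val  &lt;- 2 * (1 - pt(abs(t_stat), df = T_obs - 1))  # Two-sided p-value (Student distribution)
c(t_stat = t_stat, p_value = p_val)                  # Display both values
t.test(ls_ret)$statistic                             # Same statistic via the built-in test</code></pre></div>
<p>Because the difference of the means equals the mean of the differences, the statistic coincides with a one-sample <em>t</em>-test on the long-short returns, which is why <code>t.test()</code> returns the same value.</p>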
<p>One reason why people are overly optimistic about anomalies they detect is the widespread reverse interpretation of the <em>p</em>-value. Often, it is thought of as the probability of one hypothesis (e.g., my anomaly exists) given the data. In fact, it is the opposite: it is the probability of observing data at least as extreme as the sample, assuming that the tested (null) hypothesis holds.
<span class="math display">\[\begin{align*}
p\text{-value} &amp;= P[D|H] \\
\text{target prob.}&amp; = P[H|D]=\frac{P[D|H]}{P[D]}\times P[H],
\end{align*}\]</span>
where <span class="math inline">\(H\)</span> stands for the hypothesis and <span class="math inline">\(D\)</span> for the data. The equality in the second row is a plain application of Bayes’ identity: the interesting probability is in fact a transform of the <span class="math inline">\(p\)</span>-value.</p>
<p>Two articles (at least) discuss this idea. <span class="citation">C. R. Harvey (<a href="solutions-to-exercises.html#ref-harvey2017presidential">2017</a>)</span> introduces <strong>Bayesianized</strong> <span class="math inline">\(p\)</span>-<strong>values</strong>:
<span class="math display" id="eq:Bpv">\[\begin{equation}
\tag{3.3}
\text{Bayesianized } p\text{-value}=\text{Bpv}= e^{-t^2/2}\times\frac{\text{prior}}{1+e^{-t^2/2}\times \text{prior}} ,
\end{equation}\]</span>
where <span class="math inline">\(t\)</span> is the <span class="math inline">\(t\)</span>-statistic obtained from the regression (i.e., the one that defines the <em>p</em>-value) and prior is the analyst’s estimate of the odds that the null hypothesis holds (i.e., that there is no anomaly). The prior is coded as follows. Suppose the null holds with probability <span class="math inline">\(p\)</span> (i.e., the anomaly exists with probability <span class="math inline">\(1-p\)</span>). The odds are then coded as <span class="math inline">\(p/(1-p)\)</span>.
Thus, if the <em>t</em>-statistic is equal to 2 (corresponding to a <em>p</em>-value of 5% roughly) and the prior odds are equal to 6, then the Bpv is equal to <span class="math inline">\(e^{-2}\times 6 \times(1+e^{-2}\times 6)^{-1}\approx 0.448\)</span> and there is a 44.8% chance that the null is true. This interpretation stands in sharp contrast with the original <span class="math inline">\(p\)</span>-value, which cannot be viewed as a probability that the null holds. Of course, one drawback is that the level of the prior is crucial and solely user-specified.</p>
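<p>Formula <a href="factor.html#eq:Bpv">(3.3)</a> is straightforward to code. The helper below is a minimal sketch; the function name <code>bayes_pval</code> and its argument names are our own illustrative choices, not those of an existing package.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">bayes_pval &lt;- function(t_stat, prior_odds){       # Hypothetical helper implementing (3.3)
    bf &lt;- exp(-t_stat^2 / 2)                      # Exponential term in (3.3)
    bf * prior_odds / (1 + bf * prior_odds)       # Bayesianized p-value
}
bayes_pval(t_stat = 2, prior_odds = 6)            # Example from the text: approx. 0.448
bayes_pval(t_stat = 2, prior_odds = c(1, 6, 20))  # Sensitivity to the prior odds
bayes_pval(t_stat = c(2, 3, 4), prior_odds = 6)   # Sensitivity to the t-statistic</code></pre></div>
<p>The function is vectorized, so it can be used to gauge how strongly the conclusion depends on the user-specified prior.</p>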
<p>The work of <span class="citation">Alexander Chinco, Neuhierl, and Weber (<a href="solutions-to-exercises.html#ref-chinco2019estimating">2020</a>)</span> is very different but shares some key concepts, like the introduction of Bayesian priors in regression outputs. They show that constraining the predictive regression with an <span class="math inline">\(L^2\)</span> penalty (see the ridge regression in Chapter <a href="lasso.html#lasso">5</a>) amounts to introducing views on what the true distribution of <span class="math inline">\(b\)</span> is. The stronger the constraint, the more the estimate <span class="math inline">\(\hat{b}\)</span> will be shrunk towards zero. One key idea in their work is the assumption of a distribution for the true <span class="math inline">\(b\)</span> across many anomalies. It is assumed to be Gaussian and centered. The interesting parameter is the standard deviation: the larger it is, the more frequently significant anomalies are discovered. Notably, the authors show that this parameter changes through time; we refer to the original paper for more details on this subject.</p>
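<p>The link between an <span class="math inline">\(L^2\)</span> penalty and a Gaussian prior can be made explicit in a stylized setting. The derivation below is a standard textbook sketch and not the authors’ own: it assumes no intercept, Gaussian errors <span class="math inline">\(\textbf{e}\sim N(\textbf{0},\sigma^2\textbf{I})\)</span> and a centered Gaussian prior <span class="math inline">\(b\sim N(0,\tau^2)\)</span>. The symbols <span class="math inline">\(\sigma\)</span>, <span class="math inline">\(\tau\)</span> and <span class="math inline">\(\lambda\)</span> are notation introduced here for illustration only.
<span class="math display">\[\begin{align*}
\hat{b}_{\text{ridge}} &amp;= \underset{b}{\text{argmin}} \left\{ \|\textbf{r}-b\textbf{x}\|^2+\lambda b^2 \right\} = \frac{\textbf{x}'\textbf{r}}{\textbf{x}'\textbf{x}+\lambda}, \\
\mathbb{E}[b|\textbf{r},\textbf{x}] &amp;= \frac{\textbf{x}'\textbf{r}/\sigma^2}{\textbf{x}'\textbf{x}/\sigma^2+1/\tau^2} = \frac{\textbf{x}'\textbf{r}}{\textbf{x}'\textbf{x}+\sigma^2/\tau^2}.
\end{align*}\]</span>
Both expressions coincide when <span class="math inline">\(\lambda=\sigma^2/\tau^2\)</span>: a tight prior (small <span class="math inline">\(\tau\)</span>) corresponds to a strong penalty and hence to an estimate that is shrunk more heavily towards zero.</p>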
</div>
<div id="fama-macbeth-regressions" class="section level3" number="3.2.5">
<h3>
<span class="header-section-number">3.2.5</span> Fama-MacBeth regressions<a class="anchor" aria-label="anchor" href="#fama-macbeth-regressions"><i class="fas fa-link"></i></a>
</h3>
<p>
Another detection method was proposed by <span class="citation">Fama and MacBeth (<a href="solutions-to-exercises.html#ref-fama1973risk">1973</a>)</span> through a two-stage regression analysis of risk premia. The first stage is a simple estimation of the relationship <a href="factor.html#eq:apt">(3.1)</a>: the regressions are run on a stock-by-stock basis over the corresponding time series. The resulting estimates <span class="math inline">\(\hat{\beta}_{n,k}\)</span> are then plugged into a second series of regressions:
<span class="math display">\[\begin{equation}
r_{t,n}= \gamma_{t,0} + \sum_{k=1}^K\gamma_{t,k}\hat{\beta}_{n,k} + \varepsilon_{t,n},
\end{equation}\]</span>
which are run date-by-date on the cross-section of assets.<a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;Originally, &lt;span class="citation"&gt;Fama and MacBeth (&lt;a href="solutions-to-exercises.html#ref-fama1973risk"&gt;1973&lt;/a&gt;)&lt;/span&gt; worked with the market beta only: &lt;span class="math inline"&gt;\(r_{t,n}=\alpha_n+\beta_nr_{t,M}+\epsilon_{t,n}\)&lt;/span&gt; and the second pass included nonlinear terms: &lt;span class="math inline"&gt;\(r_{t,n}=\gamma_{t,0}+\gamma_{t,1}\hat{\beta}_{n}+\gamma_{t,2}\hat{\beta}^2_n+\gamma_{t,3}\hat{s}_n+\eta_{t,n}\)&lt;/span&gt;, where the &lt;span class="math inline"&gt;\(\hat{s}_n\)&lt;/span&gt; are risk estimates for the assets that are not related to the betas. It is then possible to perform asset pricing tests to infer some properties. For instance, one can test whether betas have a linear influence on returns or not (&lt;span class="math inline"&gt;\(\mathbb{E}[\gamma_{t,2}]=0\)&lt;/span&gt;), or test the validity of the CAPM (which implies &lt;span class="math inline"&gt;\(\mathbb{E}[\gamma_{t,0}]=0\)&lt;/span&gt;).&lt;/p&gt;'><sup>7</sup></a> Theoretically, the betas would be known and the regression would be run on the <span class="math inline">\(\beta_{n,k}\)</span> instead of their estimated values.
The <span class="math inline">\(\hat{\gamma}_{t,k}\)</span> estimate the premia of factor <span class="math inline">\(k\)</span> at time <span class="math inline">\(t\)</span>. Under suitable distributional assumptions on the <span class="math inline">\(\varepsilon_{t,n}\)</span>, statistical tests can be performed to determine whether these premia are significant or not. Typically, the statistic on the time-aggregated (average) premia <span class="math inline">\(\hat{\gamma}_k=\frac{1}{T}\sum_{t=1}^T\hat{\gamma}_{t,k}\)</span>:
<span class="math display">\[t_k=\frac{\hat{\gamma}_k}{\hat{\sigma}_k/\sqrt{T}}\]</span>
is often used in pure Gaussian contexts to assess whether or not the factor is significant (<span class="math inline">\(\hat{\sigma}_k\)</span> is the standard deviation of the <span class="math inline">\(\hat{\gamma}_{t,k}\)</span>).</p>
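<p>To illustrate the statistic, the sketch below computes <span class="math inline">\(t_k\)</span> from a matrix of date-by-date premia. The matrix <code>gam_sim</code> is simulated (its dimensions, factor names and parameters are hypothetical) and merely stands in for the output of actual second-pass regressions.</p>
<div class="sourceCode"><pre class="downlit sourceCode r">
<code class="sourceCode R">set.seed(7)                                                   # Reproducibility
n_dates &lt;- 200                                                # Hypothetical number of dates (T)
gam_sim &lt;- matrix(rnorm(n_dates * 3, mean = 0.002, sd = 0.03), # Simulated premia: one row per date
                  ncol = 3,
                  dimnames = list(NULL, c("MKT_RF", "SMB", "HML")))
gamma_bar &lt;- colMeans(gam_sim)                                # Time-averaged premia
sigma_hat &lt;- apply(gam_sim, 2, sd)                            # Std. dev. of the date-by-date premia
t_k &lt;- gamma_bar / (sigma_hat / sqrt(n_dates))                # Statistic from the text
t_k                                                           # One t-statistic per factor</code></pre></div>
<p>Each entry is a one-sample <em>t</em>-statistic on the time series of the corresponding premium, so it treats the date-by-date estimates as independent draws.</p>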
<p>We refer to <span class="citation">Jagannathan and Wang (<a href="solutions-to-exercises.html#ref-jagannathan1998asymptotic">1998</a>)</span> and <span class="citation">Petersen (<a href="solutions-to-exercises.html#ref-petersen2009estimating">2009</a>)</span> for technical discussions on the biases and losses in accuracy that can be induced by standard ordinary least squares (OLS) estimations. Moreover, as the <span class="math inline">\(\hat{\beta}_{n,k}\)</span> in the second-pass regression are <em>estimates</em>, a second level of errors can arise (the so-called errors in variables). The interested reader will find some extensions and solutions in <span class="citation">Shanken (<a href="solutions-to-exercises.html#ref-shanken1992estimation">1992</a>)</span>, <span class="citation">Ang, Liu, and Schwarz (<a href="solutions-to-exercises.html#ref-ang2018using">2018</a>)</span> and <span class="citation">Jegadeesh et al. (<a href="solutions-to-exercises.html#ref-jegadeesh2019empirical">2019</a>)</span>.</p>
<p>Below, we perform <span class="citation">Fama and MacBeth (<a href="solutions-to-exercises.html#ref-fama1973risk">1973</a>)</span> regressions on our sample. We start with the first pass: the individual estimation of betas. We build a dedicated function below and use some functional programming to automate the process. We stick to the original implementation of the estimation and perform synchronous regressions.</p>
<div class="sourceCode" id="cb15"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">nb_factors</span> <span class="op">&lt;-</span> <span class="fl">5</span>                                                     <span class="co"># Number of factors</span></span>
<span><span class="va">data_FM</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">left_join</a></span><span class="op">(</span><span class="va">data_ml</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                    <span class="co"># Join the 2 datasets</span></span>
<span>                         <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="va">date</span>, <span class="va">stock_id</span>, <span class="va">R1M_Usd</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> <span class="co"># (with returns...</span></span>
<span>                         <span class="fu"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span><span class="op">(</span><span class="va">stock_id</span> <span class="op"><a href="https://rdrr.io/r/base/match.html">%in%</a></span> <span class="va">stock_ids_short</span><span class="op">)</span>,     <span class="co"># ... over some stocks)</span></span>
<span>                     <span class="va">FF_factors</span>, </span>
<span>                     by <span class="op">=</span> <span class="st">"date"</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> </span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by</a></span><span class="op">(</span><span class="va">stock_id</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                          <span class="co"># Grouping</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate</a></span><span class="op">(</span>R1M_Usd <span class="op">=</span> <span class="fu"><a href="https://dplyr.tidyverse.org/reference/lead-lag.html">lag</a></span><span class="op">(</span><span class="va">R1M_Usd</span><span class="op">)</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                              <span class="co"># Lag returns (sync with factors)</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/group_by.html">ungroup</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/stats/na.fail.html">na.omit</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                                   <span class="co"># Remove missing points</span></span>
<span>    <span class="fu"><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider</a></span><span class="op">(</span>names_from <span class="op">=</span> <span class="st">"stock_id"</span>, values_from <span class="op">=</span> <span class="st">"R1M_Usd"</span><span class="op">)</span></span>
<span><span class="va">models</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/lapply.html">lapply</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste0</a></span><span class="op">(</span><span class="st">"`"</span>, <span class="va">stock_ids_short</span>, </span>
<span>                        <span class="st">'` ~  MKT_RF + SMB + HML + RMW + CMA'</span><span class="op">)</span>,           <span class="co"># Model spec</span></span>
<span>                 <span class="kw">function</span><span class="op">(</span><span class="va">f</span><span class="op">)</span><span class="op">{</span> <span class="fu"><a href="https://rdrr.io/r/stats/lm.html">lm</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/stats/formula.html">as.formula</a></span><span class="op">(</span><span class="va">f</span><span class="op">)</span>, data <span class="op">=</span> <span class="va">data_FM</span>,           <span class="co"># Call lm(.)</span></span>
<span>                                 na.action<span class="op">=</span><span class="st">"na.exclude"</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>       </span>
<span>                         <span class="fu"><a href="https://rdrr.io/r/base/summary.html">summary</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                    <span class="co"># Gather the output</span></span>
<span>                         <span class="st">"$"</span><span class="op">(</span><span class="va">coef</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                    <span class="co"># Keep only coefs</span></span>
<span>                         <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                 <span class="co"># Convert to dataframe</span></span>
<span>                         <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="va">Estimate</span><span class="op">)</span><span class="op">}</span>                         <span class="co"># Keep the estimates</span></span>
<span>                 <span class="op">)</span></span>
<span><span class="va">betas</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/matrix.html">matrix</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/unlist.html">unlist</a></span><span class="op">(</span><span class="va">models</span><span class="op">)</span>, ncol <span class="op">=</span> <span class="va">nb_factors</span> <span class="op">+</span> <span class="fl">1</span>, byrow <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>  <span class="co"># Extract the betas</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span>row.names <span class="op">=</span> <span class="va">stock_ids_short</span><span class="op">)</span>                               <span class="co"># Format: row names</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/colnames.html">colnames</a></span><span class="op">(</span><span class="va">betas</span><span class="op">)</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"Constant"</span>, <span class="st">"MKT_RF"</span>, <span class="st">"SMB"</span>, <span class="st">"HML"</span>, <span class="st">"RMW"</span>, <span class="st">"CMA"</span><span class="op">)</span>    <span class="co"># Format: col names</span></span></code></pre></div>
<p></p>
<div class="inline-table"><table class="table table-sm">
<caption>
<span id="tab:FMreg">TABLE 3.2: </span>Sample of beta values (row numbers are stock IDs).
</caption>
<thead><tr>
<th style="text-align:left;">
</th>
<th style="text-align:right;">
Constant
</th>
<th style="text-align:right;">
MKT_RF
</th>
<th style="text-align:right;">
SMB
</th>
<th style="text-align:right;">
HML
</th>
<th style="text-align:right;">
RMW
</th>
<th style="text-align:right;">
CMA
</th>
</tr></thead>
<tbody>
<tr>
<td style="text-align:left;">
1
</td>
<td style="text-align:right;">
0.008
</td>
<td style="text-align:right;">
1.417
</td>
<td style="text-align:right;">
0.529
</td>
<td style="text-align:right;">
0.621
</td>
<td style="text-align:right;">
0.980
</td>
<td style="text-align:right;">
-0.379
</td>
</tr>
<tr>
<td style="text-align:left;">
3
</td>
<td style="text-align:right;">
-0.002
</td>
<td style="text-align:right;">
0.812
</td>
<td style="text-align:right;">
1.108
</td>
<td style="text-align:right;">
0.882
</td>
<td style="text-align:right;">
0.300
</td>
<td style="text-align:right;">
-0.552
</td>
</tr>
<tr>
<td style="text-align:left;">
4
</td>
<td style="text-align:right;">
0.004
</td>
<td style="text-align:right;">
0.363
</td>
<td style="text-align:right;">
0.306
</td>
<td style="text-align:right;">
-0.050
</td>
<td style="text-align:right;">
0.595
</td>
<td style="text-align:right;">
0.200
</td>
</tr>
<tr>
<td style="text-align:left;">
7
</td>
<td style="text-align:right;">
0.005
</td>
<td style="text-align:right;">
0.431
</td>
<td style="text-align:right;">
0.675
</td>
<td style="text-align:right;">
0.230
</td>
<td style="text-align:right;">
0.322
</td>
<td style="text-align:right;">
0.177
</td>
</tr>
<tr>
<td style="text-align:left;">
9
</td>
<td style="text-align:right;">
0.004
</td>
<td style="text-align:right;">
0.838
</td>
<td style="text-align:right;">
0.678
</td>
<td style="text-align:right;">
1.057
</td>
<td style="text-align:right;">
0.078
</td>
<td style="text-align:right;">
0.062
</td>
</tr>
<tr>
<td style="text-align:left;">
11
</td>
<td style="text-align:right;">
-0.001
</td>
<td style="text-align:right;">
0.986
</td>
<td style="text-align:right;">
0.121
</td>
<td style="text-align:right;">
0.483
</td>
<td style="text-align:right;">
-0.124
</td>
<td style="text-align:right;">
0.018
</td>
</tr>
</tbody>
</table></div>
<p>In the table, <em>MKT_RF</em> is the market return minus the risk-free rate. The corresponding coefficient is often referred to as the beta, especially in univariate regressions. We then reformat these betas from Table <a href="factor.html#tab:FMreg">3.2</a> to prepare the second pass. Each row corresponds to one asset: the first 5 columns are the estimated factor loadings and the remaining ones are the asset returns (date by date).</p>
<div class="sourceCode" id="cb16"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">loadings</span> <span class="op">&lt;-</span> <span class="va">betas</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                            <span class="co"># Start from loadings (betas)</span></span>
<span>    <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="op">-</span><span class="va">Constant</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                 <span class="co"># Remove constant</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span><span class="op">)</span>                                 <span class="co"># Convert to dataframe             </span></span>
<span><span class="va">ret</span> <span class="op">&lt;-</span> <span class="va">returns</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                               <span class="co"># Start from returns</span></span>
<span>    <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="op">-</span><span class="va">date</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                     <span class="co"># Keep the returns only</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span>row.names <span class="op">=</span> <span class="va">returns</span><span class="op">$</span><span class="va">date</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>     <span class="co"># Set row names</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/t.html">t</a></span><span class="op">(</span><span class="op">)</span>                                          <span class="co"># Transpose</span></span>
<span><span class="va">FM_data</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/cbind.html">cbind</a></span><span class="op">(</span><span class="va">loadings</span>, <span class="va">ret</span><span class="op">)</span>                  <span class="co"># Aggregate both</span></span></code></pre></div>
<p></p>
<div class="inline-table"><table class="table table-sm">
<caption>
<span id="tab:betaformat">TABLE 3.3: </span>Sample of reformatted beta values (ready for regression).
</caption>
<thead><tr>
<th style="text-align:left;">
</th>
<th style="text-align:right;">
MKT_RF
</th>
<th style="text-align:right;">
SMB
</th>
<th style="text-align:right;">
HML
</th>
<th style="text-align:right;">
RMW
</th>
<th style="text-align:right;">
CMA
</th>
<th style="text-align:right;">
2000-01-31
</th>
<th style="text-align:right;">
2000-02-29
</th>
<th style="text-align:right;">
2000-03-31
</th>
</tr></thead>
<tbody>
<tr>
<td style="text-align:left;">
1
</td>
<td style="text-align:right;">
1.4173293
</td>
<td style="text-align:right;">
0.5292414
</td>
<td style="text-align:right;">
0.6206285
</td>
<td style="text-align:right;">
0.9800055
</td>
<td style="text-align:right;">
-0.3788295
</td>
<td style="text-align:right;">
-0.036
</td>
<td style="text-align:right;">
0.263
</td>
<td style="text-align:right;">
0.031
</td>
</tr>
<tr>
<td style="text-align:left;">
3
</td>
<td style="text-align:right;">
0.8120909
</td>
<td style="text-align:right;">
1.1083495
</td>
<td style="text-align:right;">
0.8824799
</td>
<td style="text-align:right;">
0.3002839
</td>
<td style="text-align:right;">
-0.5520309
</td>
<td style="text-align:right;">
0.077
</td>
<td style="text-align:right;">
-0.024
</td>
<td style="text-align:right;">
0.018
</td>
</tr>
<tr>
<td style="text-align:left;">
4
</td>
<td style="text-align:right;">
0.3629688
</td>
<td style="text-align:right;">
0.3061975
</td>
<td style="text-align:right;">
-0.0504485
</td>
<td style="text-align:right;">
0.5954709
</td>
<td style="text-align:right;">
0.2003223
</td>
<td style="text-align:right;">
-0.016
</td>
<td style="text-align:right;">
0.000
</td>
<td style="text-align:right;">
0.153
</td>
</tr>
<tr>
<td style="text-align:left;">
7
</td>
<td style="text-align:right;">
0.4314569
</td>
<td style="text-align:right;">
0.6748355
</td>
<td style="text-align:right;">
0.2303770
</td>
<td style="text-align:right;">
0.3220782
</td>
<td style="text-align:right;">
0.1773031
</td>
<td style="text-align:right;">
-0.009
</td>
<td style="text-align:right;">
0.027
</td>
<td style="text-align:right;">
0.000
</td>
</tr>
<tr>
<td style="text-align:left;">
9
</td>
<td style="text-align:right;">
0.8381647
</td>
<td style="text-align:right;">
0.6775922
</td>
<td style="text-align:right;">
1.0571755
</td>
<td style="text-align:right;">
0.0777108
</td>
<td style="text-align:right;">
0.0622119
</td>
<td style="text-align:right;">
0.032
</td>
<td style="text-align:right;">
0.076
</td>
<td style="text-align:right;">
-0.025
</td>
</tr>
<tr>
<td style="text-align:left;">
11
</td>
<td style="text-align:right;">
0.9858317
</td>
<td style="text-align:right;">
0.1205505
</td>
<td style="text-align:right;">
0.4833864
</td>
<td style="text-align:right;">
-0.1244214
</td>
<td style="text-align:right;">
0.0175315
</td>
<td style="text-align:right;">
0.144
</td>
<td style="text-align:right;">
0.258
</td>
<td style="text-align:right;">
0.049
</td>
</tr>
</tbody>
</table></div>
<p></p>
<p>We observe that the values of the first column (market betas) revolve around one, which is what we would expect.
Finally, we are ready for the second round of regressions.</p>
<div class="sourceCode" id="cb17"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">models</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/lapply.html">lapply</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste</a></span><span class="op">(</span><span class="st">"`"</span>, <span class="va">returns</span><span class="op">$</span><span class="va">date</span>, <span class="st">"`"</span>, <span class="st">' ~  MKT_RF + SMB + HML + RMW + CMA'</span>, sep <span class="op">=</span> <span class="st">""</span><span class="op">)</span>,</span>
<span><span class="kw">function</span><span class="op">(</span><span class="va">f</span><span class="op">)</span><span class="op">{</span> <span class="fu"><a href="https://rdrr.io/r/stats/lm.html">lm</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/stats/formula.html">as.formula</a></span><span class="op">(</span><span class="va">f</span><span class="op">)</span>, data <span class="op">=</span> <span class="va">FM_data</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                        <span class="co"># Call lm(.)</span></span>
<span>                         <span class="fu"><a href="https://rdrr.io/r/base/summary.html">summary</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                    <span class="co"># Gather the output</span></span>
<span>                         <span class="st">"$"</span><span class="op">(</span><span class="va">coef</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                    <span class="co"># Keep only the coefs</span></span>
<span>                         <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                 <span class="co"># Convert to dataframe</span></span>
<span>                         <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="va">Estimate</span><span class="op">)</span><span class="op">}</span>                         <span class="co"># Keep only estimates</span></span>
<span>                 <span class="op">)</span></span>
<span><span class="va">gammas</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/matrix.html">matrix</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/unlist.html">unlist</a></span><span class="op">(</span><span class="va">models</span><span class="op">)</span>, ncol <span class="op">=</span> <span class="va">nb_factors</span> <span class="op">+</span> <span class="fl">1</span>, byrow <span class="op">=</span> <span class="cn">T</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>    <span class="co"># Switch to dataframe</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span>row.names <span class="op">=</span> <span class="va">returns</span><span class="op">$</span><span class="va">date</span><span class="op">)</span>                                  <span class="co"># &amp; set row names</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/colnames.html">colnames</a></span><span class="op">(</span><span class="va">gammas</span><span class="op">)</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"Constant"</span>, <span class="st">"MKT_RF"</span>, <span class="st">"SMB"</span>, <span class="st">"HML"</span>, <span class="st">"RMW"</span>, <span class="st">"CMA"</span><span class="op">)</span>   <span class="co"># Set col names</span></span></code></pre></div>
<p></p>
<div class="inline-table"><table class="table table-sm">
<caption>
<span id="tab:FamaMacBeth2b">TABLE 3.4: </span>Sample of gamma (premia) values.
</caption>
<thead><tr>
<th style="text-align:left;">
</th>
<th style="text-align:right;">
Constant
</th>
<th style="text-align:right;">
MKT_RF
</th>
<th style="text-align:right;">
SMB
</th>
<th style="text-align:right;">
HML
</th>
<th style="text-align:right;">
RMW
</th>
<th style="text-align:right;">
CMA
</th>
</tr></thead>
<tbody>
<tr>
<td style="text-align:left;">
2000-01-31
</td>
<td style="text-align:right;">
-0.011
</td>
<td style="text-align:right;">
0.041
</td>
<td style="text-align:right;">
0.223
</td>
<td style="text-align:right;">
-0.143
</td>
<td style="text-align:right;">
-0.276
</td>
<td style="text-align:right;">
0.033
</td>
</tr>
<tr>
<td style="text-align:left;">
2000-02-29
</td>
<td style="text-align:right;">
0.014
</td>
<td style="text-align:right;">
0.075
</td>
<td style="text-align:right;">
-0.133
</td>
<td style="text-align:right;">
0.052
</td>
<td style="text-align:right;">
0.085
</td>
<td style="text-align:right;">
-0.036
</td>
</tr>
<tr>
<td style="text-align:left;">
2000-03-31
</td>
<td style="text-align:right;">
0.004
</td>
<td style="text-align:right;">
-0.010
</td>
<td style="text-align:right;">
-0.013
</td>
<td style="text-align:right;">
0.049
</td>
<td style="text-align:right;">
0.040
</td>
<td style="text-align:right;">
0.050
</td>
</tr>
<tr>
<td style="text-align:left;">
2000-04-30
</td>
<td style="text-align:right;">
0.125
</td>
<td style="text-align:right;">
-0.147
</td>
<td style="text-align:right;">
-0.095
</td>
<td style="text-align:right;">
0.157
</td>
<td style="text-align:right;">
0.076
</td>
<td style="text-align:right;">
-0.021
</td>
</tr>
<tr>
<td style="text-align:left;">
2000-05-31
</td>
<td style="text-align:right;">
0.052
</td>
<td style="text-align:right;">
-0.011
</td>
<td style="text-align:right;">
0.074
</td>
<td style="text-align:right;">
-0.096
</td>
<td style="text-align:right;">
-0.095
</td>
<td style="text-align:right;">
-0.056
</td>
</tr>
<tr>
<td style="text-align:left;">
2000-06-30
</td>
<td style="text-align:right;">
0.027
</td>
<td style="text-align:right;">
-0.030
</td>
<td style="text-align:right;">
-0.018
</td>
<td style="text-align:right;">
0.054
</td>
<td style="text-align:right;">
0.043
</td>
<td style="text-align:right;">
0.016
</td>
</tr>
</tbody>
</table></div>
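<p>The gammas in Table 3.4 are per-date estimates. In the Fama-MacBeth procedure, they are usually summarized by their time-series averages (the premia) and the associated <em>t</em>-statistics. The sketch below illustrates this last step on simulated data: <code>gammas_sim</code> and all its parameters are hypothetical stand-ins, not the estimates of this chapter, and the plain standard errors ignore possible autocorrelation in the gammas.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">set.seed(42)                                              # For reproducibility
nb_dates &lt;- 120                                           # Hypothetical number of dates
gammas_sim &lt;- data.frame(                                 # Simulated gammas (NOT the book's values)
    MKT_RF = rnorm(nb_dates, mean = 0.005, sd = 0.05),
    SMB    = rnorm(nb_dates, mean = 0.000, sd = 0.05),
    HML    = rnorm(nb_dates, mean = 0.002, sd = 0.05)
)
premia  &lt;- colMeans(gammas_sim)                           # Time-series averages = premia
std_err &lt;- apply(gammas_sim, 2, sd) / sqrt(nb_dates)      # Standard errors of the averages
t_stat  &lt;- premia / std_err                               # t-statistics of the premia
fm_summary &lt;- data.frame(premium = premia,                # Gather everything
                         std_err = std_err,
                         t_stat = t_stat)
fm_summary                                                # One row per factor</code></pre></div>
<p>With the actual estimates, the same three lines could be applied to the columns of <code>gammas</code> (after dropping the first row, as in the plot below).</p>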
<p></p>
<p>Visually, the estimated premia are also very volatile. We plot their estimated values for the market, SMB and HML factors.</p>
<div class="sourceCode" id="cb18"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">gammas</span><span class="op">[</span><span class="fl">2</span><span class="op">:</span><span class="fu"><a href="https://rdrr.io/r/base/nrow.html">nrow</a></span><span class="op">(</span><span class="va">gammas</span><span class="op">)</span>,<span class="op">]</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                                         <span class="co"># Take gammas:</span></span>
<span>    <span class="co"># The first row is omitted because the first row of returns is undefined</span></span>
<span>    <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="va">MKT_RF</span>, <span class="va">SMB</span>, <span class="va">HML</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                             <span class="co"># Select 3 factors</span></span>
<span>    <span class="fu"><a href="https://dplyr.tidyverse.org/reference/bind_cols.html">bind_cols</a></span><span class="op">(</span>date <span class="op">=</span> <span class="va">data_FM</span><span class="op">$</span><span class="va">date</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                              <span class="co"># Add date</span></span>
<span>    <span class="fu"><a href="https://tidyr.tidyverse.org/reference/gather.html">gather</a></span><span class="op">(</span>key <span class="op">=</span> <span class="va">factor</span>, value <span class="op">=</span> <span class="va">gamma</span>, <span class="op">-</span><span class="va">date</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                  <span class="co"># Put in tidy shape</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span><span class="op">(</span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span><span class="op">(</span>x <span class="op">=</span> <span class="va">date</span>, y <span class="op">=</span> <span class="va">gamma</span>, color <span class="op">=</span> <span class="va">factor</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span>              <span class="co"># Plot</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/geom_path.html">geom_line</a></span><span class="op">(</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/facet_grid.html">facet_grid</a></span><span class="op">(</span> <span class="va">factor</span><span class="op">~</span><span class="va">.</span> <span class="op">)</span> <span class="op">+</span>                          <span class="co"># Lines &amp; facets</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/scale_manual.html">scale_color_manual</a></span><span class="op">(</span>values<span class="op">=</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"#F87E1F"</span>, <span class="st">"#0570EA"</span>, <span class="st">"#F81F40"</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span> <span class="co"># Colors</span></span>
<span>    <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/coord_fixed.html">coord_fixed</a></span><span class="op">(</span><span class="fl">980</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">theme_light</a></span><span class="op">(</span><span class="op">)</span>                                <span class="co"># Fix x/y ratio</span></span></code></pre></div>
<div class="figure">
<span style="display:block;" id="fig:premiaplot"></span>
<img src="ML_factor_files/figure-html/premiaplot-1.png" alt="Time series plot of gammas (premia) in Fama-Macbeth regressions." width="672"><p class="caption">
FIGURE 3.3: Time series plot of gammas (premia) in Fama-Macbeth regressions.
</p>
</div>
<p></p>
<p>The two spikes at the end of the sample signal potential collinearity issues; two factors seem to offset each other, with an unclear aggregate effect. This underlines the usefulness of penalized estimates (see Chapter <a href="lasso.html#lasso">5</a>).</p>
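<p>To see why collinearity can produce such behavior, the following minimal sketch regresses a simulated variable on two nearly identical predictors. All names and parameters (<code>x1</code>, <code>x2</code>, the noise levels) are illustrative assumptions and are unrelated to the data used above.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">set.seed(1)                                               # For reproducibility
n  &lt;- 200                                                 # Hypothetical sample size
x1 &lt;- rnorm(n)                                            # First predictor
x2 &lt;- x1 + rnorm(n, sd = 0.01)                            # Second predictor: almost a copy of x1
y  &lt;- x1 + rnorm(n, sd = 0.5)                             # Only x1 drives y
fit_both &lt;- lm(y ~ x1 + x2)                               # Regression with both predictors
fit_one  &lt;- lm(y ~ x1)                                    # Regression with x1 only
se_both  &lt;- summary(fit_both)$coefficients["x1", "Std. Error"]
se_one   &lt;- summary(fit_one)$coefficients["x1", "Std. Error"]
cor(x1, x2)                                               # Correlation between predictors
c(se_both = se_both, se_one = se_one)                     # Std. error of the x1 coefficient
coef(fit_both)                                            # Individual loadings are poorly identified</code></pre></div>
<p>The standard error of the coefficient of <code>x1</code> is much larger when <code>x2</code> is included: the two loadings are poorly identified individually, even though their sum is well estimated. This is the kind of instability that penalization is meant to mitigate.</p>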
</div>
<div id="factor-competition" class="section level3" number="3.2.6">
<h3>
<span class="header-section-number">3.2.6</span> Factor competition<a class="anchor" aria-label="anchor" href="#factor-competition"><i class="fas fa-link"></i></a>
</h3>
<p>
The core purpose of factors is to explain the cross-section of stock returns. For theoretical and practical reasons, it is preferable if redundancies within factors are avoided. Indeed, redundancies imply collinearity, which is known to perturb estimates (<span class="citation">Belsley, Kuh, and Welsch (<a href="solutions-to-exercises.html#ref-belsley2005regression">2005</a>)</span>). In addition, when asset managers decompose their returns into factor contributions, overlaps (high absolute correlations) between factors yield exposures that are less interpretable; positive and negative exposures compensate each other spuriously.</p>
<p>A simple protocol to sort out redundant factors is to run regressions of each factor against all others:
<span class="math display" id="eq:faccompet">\[\begin{equation}
\tag{3.4}
f_{t,k} = a_k +\sum_{j\neq k} \delta_{k,j} f_{t,j} + \epsilon_{t,k}.
\end{equation}\]</span>
The interesting metric is then the test statistic associated with the estimation of <span class="math inline">\(a_k\)</span>. If <span class="math inline">\(a_k\)</span> is significantly different from zero, then the cross-section of (other) factors fails to fully explain the average return of factor <span class="math inline">\(k\)</span>. Otherwise, the return of the factor can be captured by exposures to the other factors and is thus redundant.</p>
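<p>As a minimal sketch of this protocol, the code below builds three simulated factors in which the third one is, by construction, spanned by the first two, and then inspects the intercept of the corresponding regression. The names <code>f1</code>, <code>f2</code>, <code>f3</code> and all parameters are hypothetical and only serve the illustration.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">set.seed(123)                                             # For reproducibility
nb_obs &lt;- 600                                             # Hypothetical number of months
f1 &lt;- rnorm(nb_obs, mean = 0.005, sd = 0.04)              # First simulated factor
f2 &lt;- rnorm(nb_obs, mean = 0.003, sd = 0.03)              # Second simulated factor
f3 &lt;- 0.5 * f1 + 0.8 * f2 + rnorm(nb_obs, sd = 0.01)      # Spanned by f1 &amp; f2: redundant by design
sim_factors &lt;- data.frame(f1, f2, f3)                     # Gather in a dataframe
fit &lt;- lm(f3 ~ f1 + f2, data = sim_factors)               # Regress f3 on the other factors
a_3 &lt;- summary(fit)$coefficients["(Intercept)",           # Keep the intercept row:
                                 c("Estimate", "t value", "Pr(&gt;|t|)")]
a_3                                                       # Estimate, t-stat &amp; p-value</code></pre></div>
<p>Since <code>f3</code> has no average return beyond what its exposures to <code>f1</code> and <code>f2</code> deliver, the estimated intercept is expected to be close to zero, while the two slopes should be close to 0.5 and 0.8.</p>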
<p>One mainstream application of this technique was performed in <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama2015five">2015</a>)</span>, in which the authors show that the HML factor is redundant when taking into account four other factors (Market, SMB, RMW and CMA). Below, we reproduce their analysis on an updated sample. We start our analysis directly with the database maintained by Kenneth French. </p>
<p>We can run the regressions that determine the redundancy of factors via the procedure defined in Equation <a href="factor.html#eq:faccompet">(3.4)</a>.</p>
<div class="sourceCode" id="cb19"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">factors</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"MKT_RF"</span>, <span class="st">"SMB"</span>, <span class="st">"HML"</span>, <span class="st">"RMW"</span>, <span class="st">"CMA"</span><span class="op">)</span></span>
<span><span class="va">models</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/lapply.html">lapply</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste</a></span><span class="op">(</span><span class="va">factors</span>, <span class="st">' ~  MKT_RF + SMB + HML + RMW + CMA-'</span>,<span class="va">factors</span><span class="op">)</span>,</span>
<span> <span class="kw">function</span><span class="op">(</span><span class="va">f</span><span class="op">)</span><span class="op">{</span> <span class="fu"><a href="https://rdrr.io/r/stats/lm.html">lm</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/stats/formula.html">as.formula</a></span><span class="op">(</span><span class="va">f</span><span class="op">)</span>, data <span class="op">=</span> <span class="va">FF_factors</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>               <span class="co"># Call lm(.)</span></span>
<span>                         <span class="fu"><a href="https://rdrr.io/r/base/summary.html">summary</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                               <span class="co"># Gather the output</span></span>
<span>                         <span class="st">"$"</span><span class="op">(</span><span class="va">coef</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                               <span class="co"># Keep only the coefs</span></span>
<span>                         <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>                            <span class="co"># Convert to dataframe</span></span>
<span>                         <span class="fu"><a href="https://dplyr.tidyverse.org/reference/filter.html">filter</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/colnames.html">rownames</a></span><span class="op">(</span><span class="va">.</span><span class="op">)</span> <span class="op">==</span> <span class="st">"(Intercept)"</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>    <span class="co"># Keep only the Intercept</span></span>
<span>                         <span class="fu">dplyr</span><span class="fu">::</span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/select.html">select</a></span><span class="op">(</span><span class="va">Estimate</span>,<span class="va">`Pr...t..`</span><span class="op">)</span><span class="op">}</span>         <span class="co"># Keep the coef &amp; p-value</span></span>
<span>                 <span class="op">)</span></span>
<span><span class="va">alphas</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/matrix.html">matrix</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/unlist.html">unlist</a></span><span class="op">(</span><span class="va">models</span><span class="op">)</span>, ncol <span class="op">=</span> <span class="fl">2</span>, byrow <span class="op">=</span> <span class="cn">T</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span>       <span class="co"># Switch from list to dataframe</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span>row.names <span class="op">=</span> <span class="va">factors</span><span class="op">)</span></span>
<span><span class="co"># alphas # To see the alphas (optional)</span></span></code></pre></div>
<p></p>
<p>We obtain the intercepts <span class="math inline">\(a_k\)</span> (the alphas) from Equation <a href="factor.html#eq:faccompet">(3.4)</a>, together with their <span class="math inline">\(p\)</span>-values. Below, we format the full set of coefficients along with <span class="math inline">\(p\)</span>-value thresholds and export them in a summary table. The significance levels of the coefficients are coded as follows: <span class="math inline">\(0&lt;(***)&lt;0.001&lt;(**)&lt;0.01&lt;(*)&lt;0.05\)</span>.</p>
<div class="sourceCode" id="cb20"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">results</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/matrix.html">matrix</a></span><span class="op">(</span><span class="cn">NA</span>, nrow <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/length.html">length</a></span><span class="op">(</span><span class="va">factors</span><span class="op">)</span>, ncol <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/length.html">length</a></span><span class="op">(</span><span class="va">factors</span><span class="op">)</span> <span class="op">+</span> <span class="fl">1</span><span class="op">)</span>   <span class="co"># Coefs</span></span>
<span><span class="va">signif</span>  <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/matrix.html">matrix</a></span><span class="op">(</span><span class="cn">NA</span>, nrow <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/length.html">length</a></span><span class="op">(</span><span class="va">factors</span><span class="op">)</span>, ncol <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/length.html">length</a></span><span class="op">(</span><span class="va">factors</span><span class="op">)</span> <span class="op">+</span> <span class="fl">1</span><span class="op">)</span>   <span class="co"># p-values</span></span>
<span><span class="kw">for</span><span class="op">(</span><span class="va">j</span> <span class="kw">in</span> <span class="fl">1</span><span class="op">:</span><span class="fu"><a href="https://rdrr.io/r/base/length.html">length</a></span><span class="op">(</span><span class="va">factors</span><span class="op">)</span><span class="op">)</span><span class="op">{</span></span>
<span>    <span class="va">form</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste</a></span><span class="op">(</span><span class="va">factors</span><span class="op">[</span><span class="va">j</span><span class="op">]</span>,</span>
<span>                  <span class="st">' ~  MKT_RF + SMB + HML + RMW + CMA-'</span>,<span class="va">factors</span><span class="op">[</span><span class="va">j</span><span class="op">]</span><span class="op">)</span>         <span class="co"># Build model</span></span>
<span>    <span class="va">fit</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/stats/lm.html">lm</a></span><span class="op">(</span><span class="va">form</span>, data <span class="op">=</span> <span class="va">FF_factors</span><span class="op">)</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> <span class="fu"><a href="https://rdrr.io/r/base/summary.html">summary</a></span><span class="op">(</span><span class="op">)</span>                        <span class="co"># Estimate model</span></span>
<span>    <span class="va">coef</span> <span class="op">&lt;-</span> <span class="va">fit</span><span class="op">$</span><span class="va">coefficients</span><span class="op">[</span>,<span class="fl">1</span><span class="op">]</span>                                            <span class="co"># Keep coefficients</span></span>
<span>    <span class="va">p_val</span> <span class="op">&lt;-</span> <span class="va">fit</span><span class="op">$</span><span class="va">coefficients</span><span class="op">[</span>,<span class="fl">4</span><span class="op">]</span>                                           <span class="co"># Keep p-values</span></span>
<span>    <span class="va">results</span><span class="op">[</span><span class="va">j</span>,<span class="op">-</span><span class="op">(</span><span class="va">j</span><span class="op">+</span><span class="fl">1</span><span class="op">)</span><span class="op">]</span> <span class="op">&lt;-</span> <span class="va">coef</span>                                               <span class="co"># Fill matrix</span></span>
<span>    <span class="va">signif</span><span class="op">[</span><span class="va">j</span>,<span class="op">-</span><span class="op">(</span><span class="va">j</span><span class="op">+</span><span class="fl">1</span><span class="op">)</span><span class="op">]</span> <span class="op">&lt;-</span> <span class="va">p_val</span></span>
<span><span class="op">}</span></span>
<span><span class="va">signif</span><span class="op">[</span><span class="fu"><a href="https://rdrr.io/r/base/NA.html">is.na</a></span><span class="op">(</span><span class="va">signif</span><span class="op">)</span><span class="op">]</span> <span class="op">&lt;-</span> <span class="fl">1</span>                                                  <span class="co"># Kick out NAs</span></span>
<span><span class="va">results</span> <span class="op">&lt;-</span> <span class="va">results</span> <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> <span class="fu"><a href="https://rdrr.io/r/base/Round.html">round</a></span><span class="op">(</span><span class="fl">3</span><span class="op">)</span>  <span class="op"><a href="https://magrittr.tidyverse.org/reference/pipe.html">%&gt;%</a></span> <span class="fu"><a href="https://rdrr.io/r/base/data.frame.html">data.frame</a></span><span class="op">(</span><span class="op">)</span>                           <span class="co"># Basic formatting</span></span>
<span><span class="va">results</span><span class="op">[</span><span class="va">signif</span><span class="op">&lt;</span><span class="fl">0.001</span><span class="op">]</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste</a></span><span class="op">(</span><span class="va">results</span><span class="op">[</span><span class="va">signif</span><span class="op">&lt;</span><span class="fl">0.001</span><span class="op">]</span>,<span class="st">" (***)"</span><span class="op">)</span>              <span class="co"># 3 star signif</span></span>
<span><span class="va">results</span><span class="op">[</span><span class="va">signif</span><span class="op">&gt;</span><span class="fl">0.001</span><span class="op">&amp;</span><span class="va">signif</span><span class="op">&lt;</span><span class="fl">0.01</span><span class="op">]</span> <span class="op">&lt;-</span>                                        <span class="co"># 2 star signif</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste</a></span><span class="op">(</span><span class="va">results</span><span class="op">[</span><span class="va">signif</span><span class="op">&gt;</span><span class="fl">0.001</span><span class="op">&amp;</span><span class="va">signif</span><span class="op">&lt;</span><span class="fl">0.01</span><span class="op">]</span>,<span class="st">" (**)"</span><span class="op">)</span></span>
<span><span class="va">results</span><span class="op">[</span><span class="va">signif</span><span class="op">&gt;</span><span class="fl">0.01</span><span class="op">&amp;</span><span class="va">signif</span><span class="op">&lt;</span><span class="fl">0.05</span><span class="op">]</span> <span class="op">&lt;-</span>                                         <span class="co"># 1 star signif</span></span>
<span>    <span class="fu"><a href="https://rdrr.io/r/base/paste.html">paste</a></span><span class="op">(</span><span class="va">results</span><span class="op">[</span><span class="va">signif</span><span class="op">&gt;</span><span class="fl">0.01</span><span class="op">&amp;</span><span class="va">signif</span><span class="op">&lt;</span><span class="fl">0.05</span><span class="op">]</span>,<span class="st">" (*)"</span><span class="op">)</span>     </span>
<span><span class="va">results</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/cbind.html">cbind</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/character.html">as.character</a></span><span class="op">(</span><span class="va">factors</span><span class="op">)</span>, <span class="va">results</span><span class="op">)</span>                            <span class="co"># Add dep. variable</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/colnames.html">colnames</a></span><span class="op">(</span><span class="va">results</span><span class="op">)</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"Dep. Variable"</span>,<span class="st">"Intercept"</span>, <span class="va">factors</span><span class="op">)</span>                <span class="co"># Add column names</span></span></code></pre></div>
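<p>The star coding described above can also be written as a small helper based on <code><a href="https://rdrr.io/r/base/cut.html">cut</a></code>. This is only a sketch: the function name <code>p_to_stars</code> and the input values are hypothetical. Note that, unlike the strict inequalities used above, this version assigns stars to <span class="math inline">\(p\)</span>-values that fall exactly on a threshold.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">p_to_stars &lt;- function(p){                                # Map p-values to star labels
    as.character(cut(p,
                     breaks = c(0, 0.001, 0.01, 0.05, 1), # Thresholds of the coding
                     labels = c("(***)", "(**)", "(*)", ""),
                     include.lowest = TRUE))              # So that p = 0 is also labelled
}
coefs  &lt;- c(0.008, 0.12)                                  # Illustrative coefficients
p_vals &lt;- c(0.0004, 0.2)                                  # Illustrative p-values
stars  &lt;- p_to_stars(c(0.0004, 0.005, 0.03, 0.2))         # "(***)" "(**)" "(*)" ""
labels &lt;- trimws(paste(coefs, p_to_stars(p_vals)))        # "0.008 (***)" "0.12"</code></pre></div>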
<p>
</p>
<div class="inline-table"><table class="table table-sm">
<caption>
<span id="tab:faccompet2">TABLE 3.5: </span>Factor competition among the Fama and French (2015) five factors.
</caption>
<thead><tr>
<th style="text-align:left;">
Dep. Variable
</th>
<th style="text-align:left;">
Intercept
</th>
<th style="text-align:left;">
MKT_RF
</th>
<th style="text-align:left;">
SMB
</th>
<th style="text-align:left;">
HML
</th>
<th style="text-align:left;">
RMW
</th>
<th style="text-align:left;">
CMA
</th>
</tr></thead>
<tbody>
<tr>
<td style="text-align:left;">
MKT_RF
</td>
<td style="text-align:left;">
0.008 (***)
</td>
<td style="text-align:left;">
NA
</td>
<td style="text-align:left;">
0.257 (***)
</td>
<td style="text-align:left;">
0.12
</td>
<td style="text-align:left;">
-0.363 (***)
</td>
<td style="text-align:left;">
-0.945 (***)
</td>
</tr>
<tr>
<td style="text-align:left;">
SMB
</td>
<td style="text-align:left;">
0.003 (*)
</td>
<td style="text-align:left;">
0.131 (***)
</td>
<td style="text-align:left;">
NA
</td>
<td style="text-align:left;">
0.083
</td>
<td style="text-align:left;">
-0.435 (***)
</td>
<td style="text-align:left;">
-0.139
</td>
</tr>
<tr>
<td style="text-align:left;">
HML
</td>
<td style="text-align:left;">
-0.001
</td>
<td style="text-align:left;">
0.032
</td>
<td style="text-align:left;">
0.044
</td>
<td style="text-align:left;">
NA
</td>
<td style="text-align:left;">
0.169 (***)
</td>
<td style="text-align:left;">
1.027 (***)
</td>
</tr>
<tr>
<td style="text-align:left;">
RMW
</td>
<td style="text-align:left;">
0.004 (***)
</td>
<td style="text-align:left;">
-0.096 (***)
</td>
<td style="text-align:left;">
-0.225 (***)
</td>
<td style="text-align:left;">
0.165 (***)
</td>
<td style="text-align:left;">
NA
</td>
<td style="text-align:left;">
-0.319 (***)
</td>
</tr>
<tr>
<td style="text-align:left;">
CMA
</td>
<td style="text-align:left;">
0.002 (***)
</td>
<td style="text-align:left;">
-0.112 (***)
</td>
<td style="text-align:left;">
-0.032
</td>
<td style="text-align:left;">
0.45 (***)
</td>
<td style="text-align:left;">
-0.144 (***)
</td>
<td style="text-align:left;">
NA
</td>
</tr>
</tbody>
</table></div>
<p></p>
<p>We confirm that the HML factor remains redundant when the four others are present in the asset pricing model. The figures we obtain are very close to the ones in the original paper (<span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama2015five">2015</a>)</span>), which makes sense, since we only add 5 years to their initial sample.</p>
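<p>To make this concrete, the HML row of Table 3.5 corresponds to the following fitted version of Equation <a href="factor.html#eq:faccompet">(3.4)</a>, where <span class="math inline">\(\text{MKT}_t\)</span> denotes the MKT_RF factor and the residual is omitted:
<span class="math display">\[\begin{equation}
\widehat{\text{HML}}_t = -0.001 + 0.032 \, \text{MKT}_t + 0.044 \, \text{SMB}_t + 0.169 \, \text{RMW}_t + 1.027 \, \text{CMA}_t.
\end{equation}\]</span>
The intercept (<span class="math inline">\(-0.001\)</span>) carries no significance star, whereas the loadings on RMW and CMA are significant at the 0.1% level. In this sample, the average return of HML is thus captured by its exposures to the other factors, chiefly CMA.</p>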
<p>At a more macro-level, researchers also try to figure out which models (i.e., combinations of factors) are the most likely, given the data empirically observed (and possibly given priors formulated by the econometrician). For instance, this stream of literature seeks to quantify the extent to which the 3-factor model of <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span> outperforms the 5-factor model in <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama2015five">2015</a>)</span>. In this direction, <span class="citation">De Moor, Dhaene, and Sercu (<a href="solutions-to-exercises.html#ref-de2015comparing">2015</a>)</span> introduce a novel computation for <em>p</em>-values that compare the relative likelihood that two models pass a zero-alpha test. More generally, the Bayesian method of <span class="citation">Barillas and Shanken (<a href="solutions-to-exercises.html#ref-barillas2018comparing">2018</a>)</span> was subsequently improved by <span class="citation">Chib, Zeng, and Zhao (<a href="solutions-to-exercises.html#ref-chib2019comparing">2020</a>)</span>; see also <span class="citation">Chib and Zeng (<a href="solutions-to-exercises.html#ref-chib2020factors">2020</a>)</span> and <span class="citation">Chib et al. (<a href="solutions-to-exercises.html#ref-chib2021winners">2020</a>)</span> (an R package exists for the former: czfactor). For a discussion on model comparison from a transaction cost perspective, we refer to <span class="citation">S. A. Li, DeMiguel, and Martin-Utrera (<a href="solutions-to-exercises.html#ref-li2020factors">2020</a>)</span>. </p>
<p>Lastly, even the optimal number of factors is a subject of disagreement in recent work. While the traditional literature focuses on a limited number (3-5) of factors (see also <span class="citation">Hwang and Rubesam (<a href="solutions-to-exercises.html#ref-hwang2021bayesian">2021</a>)</span>), more recent research by <span class="citation">DeMiguel et al. (<a href="solutions-to-exercises.html#ref-martin2018transaction">2020</a>)</span>, <span class="citation">A. He, Huang, and Zhou (<a href="solutions-to-exercises.html#ref-he2019factors">2020</a>)</span>, <span class="citation">Kozak, Nagel, and Santosh (<a href="solutions-to-exercises.html#ref-kozak2019shrinking">2019</a>)</span> and <span class="citation">Freyberger, Neuhierl, and Weber (<a href="solutions-to-exercises.html#ref-freyberger2020dissecting">2020</a>)</span> advocates using at least 15 (in contrast, <span class="citation">Kelly, Pruitt, and Su (<a href="solutions-to-exercises.html#ref-kelly2019characteristics">2019</a>)</span> argue that a small number of <strong>latent</strong> factors may suffice). <span class="citation">Green, Hand, and Zhang (<a href="solutions-to-exercises.html#ref-green2017characteristics">2017</a>)</span> even find that the number of characteristics that help explain the cross-section of returns varies in time.<a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;Older tests for the number of factors in linear models include &lt;span class="citation"&gt;Connor and Korajczyk (&lt;a href="solutions-to-exercises.html#ref-connor1993test"&gt;1993&lt;/a&gt;)&lt;/span&gt; and &lt;span class="citation"&gt;Bai and Ng (&lt;a href="solutions-to-exercises.html#ref-bai2002determining"&gt;2002&lt;/a&gt;)&lt;/span&gt;.&lt;/p&gt;'><sup>8</sup></a> </p>
</div>
<div id="advanced-techniques" class="section level3" number="3.2.7">
<h3>
<span class="header-section-number">3.2.7</span> Advanced techniques<a class="anchor" aria-label="anchor" href="#advanced-techniques"><i class="fas fa-link"></i></a>
</h3>
<p>The ever-increasing number of factors, combined with their importance in asset management, has led researchers to craft more subtle methods in order to “organize” the so-called <em>factor zoo</em> and, more importantly, to detect spurious anomalies and compare different asset pricing model specifications. We list a few of them below.</p>
<ul>
<li><span class="citation">Feng, Giglio, and Xiu (<a href="solutions-to-exercises.html#ref-feng2019taming">2020</a>)</span> combine LASSO selection with Fama-MacBeth regressions to test whether new factors are worth adding. They quantify the gain of adding one new factor to a set of predefined factors and show that many factors reported in papers published in the 2010s do not add much incremental value;</li>
<li><span class="citation">C. Harvey and Liu (<a href="solutions-to-exercises.html#ref-harvey2017lucky">2019</a>)</span> (in a similar vein) use bootstrap on orthogonalized factors. They make the case that correlation among predictors is a major issue and their method aims at solving this problem. Their lengthy procedure seeks to test whether the maximal additional contribution of a candidate variable is significant;</li>
<li><span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama2018choosing">2018</a>)</span> compare asset pricing models through squared maximum Sharpe ratios;</li>
<li><span class="citation">Giglio and Xiu (<a href="solutions-to-exercises.html#ref-giglio2018asset">2019</a>)</span> estimate factor risk premia using a three-pass method based on principal component analysis;</li>
<li><span class="citation">Pukthuanthong, Roll, and Subrahmanyam (<a href="solutions-to-exercises.html#ref-pukthuanthong2018protocol">2018</a>)</span> disentangle priced and non-priced factors via a combination of principal component analysis and <span class="citation">Fama and MacBeth (<a href="solutions-to-exercises.html#ref-fama1973risk">1973</a>)</span> regressions;</li>
<li><span class="citation">Gospodinov, Kan, and Robotti (<a href="solutions-to-exercises.html#ref-gospodinov2019too">2019</a>)</span> warn against factor misspecification (when spurious factors are included in the list of regressors). Traded factors (resp. macro-economic factors) seem more likely (resp. less likely) to yield robust identifications (see also <span class="citation">Bryzgalova (<a href="solutions-to-exercises.html#ref-bryzgalova2019spurious">2019</a>)</span>).</li>
</ul>
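<p>To make the squared maximum Sharpe ratio criterion mentioned in the list above more concrete: for a set of traded factors with mean vector <span class="math inline">\(\boldsymbol{\mu}\)</span> and covariance matrix <span class="math inline">\(\boldsymbol{\Sigma}\)</span>, the maximum squared Sharpe ratio attainable by combining them is <span class="math inline">\(\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\)</span>. The sketch below evaluates this quantity on simulated data. The factor names, the sample length and the moments are placeholders chosen for illustration only, and the helper <code>max_sq_sharpe</code> is hypothetical: it is not the procedure of the paper, which also has to deal with sampling error.</p>

```r
set.seed(1)                                        # Reproducibility
n_months <- 240                                    # Hypothetical sample length
fac_names <- c("MKT", "SMB", "HML", "RMW", "CMA")  # Placeholder factor names
sim_factors <- matrix(rnorm(n_months * 5, mean = 0.004, sd = 0.03),
                      ncol = 5,
                      dimnames = list(NULL, fac_names))  # Simulated returns, NOT real data
max_sq_sharpe <- function(f) {                     # Hypothetical helper
  mu <- colMeans(f)                                # Sample mean vector
  sigma <- cov(f)                                  # Sample covariance matrix
  drop(crossprod(mu, solve(sigma, mu)))            # mu' Sigma^{-1} mu
}
sh2_three <- max_sq_sharpe(sim_factors[, c("MKT", "SMB", "HML")])  # Nested 3-factor set
sh2_five  <- max_sq_sharpe(sim_factors)                            # Full 5-factor set
c(three_factor = sh2_three, five_factor = sh2_five)
```

<p>In sample, adding factors can never lower this quantity, so the larger model always looks at least as good as any model nested in it. The comparison only becomes informative once estimation error is taken into account.</p>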
<p>There is obviously no infallible method, but the number of contributions in the field highlights the need for robustness. This is evidently a major concern when crafting investment decisions based on factor intuitions. One major hurdle for short-term strategies is the likely time-varying feature of factors. We refer for instance to <span class="citation">Ang and Kristensen (<a href="solutions-to-exercises.html#ref-ang2012testing">2012</a>)</span>, <span class="citation">Cooper and Maio (<a href="solutions-to-exercises.html#ref-cooper2018new">2019</a>)</span>, and <span class="citation">Briere and Szafarz (<a href="solutions-to-exercises.html#ref-briere2021rains">2021</a>)</span> for practical results and to <span class="citation">Gagliardini, Ossola, and Scaillet (<a href="solutions-to-exercises.html#ref-gagliardini2016time">2016</a>)</span> and <span class="citation">S. Ma et al. (<a href="solutions-to-exercises.html#ref-ma2018testing">2020</a>)</span> for more theoretical treatments (with additional empirical results).</p>
</div>
</div>
<div id="factors-or-characteristics" class="section level2" number="3.3">
<h2>
<span class="header-section-number">3.3</span> Factors or characteristics?<a class="anchor" aria-label="anchor" href="#factors-or-characteristics"><i class="fas fa-link"></i></a>
</h2>
<p>The decomposition of returns into linear factor models is convenient because of its simple interpretation. There is nonetheless a debate in the academic literature about whether firm returns are indeed explained by exposure to macro-economic factors or simply by the characteristics of firms. In their early study, <span class="citation">Lakonishok, Shleifer, and Vishny (<a href="solutions-to-exercises.html#ref-lakonishok1994contrarian">1994</a>)</span> argue that one explanation of the value premium comes from incorrect extrapolation of past earnings growth rates. Investors are overly optimistic about firms that have recently been profitable. Consequently, future returns are (also) driven by the core (accounting) features of the firm. The question is then to disentangle which effect is the most pronounced when explaining returns: characteristics versus exposures to macro-economic factors.</p>
<p>In their seminal contribution on this topic, <span class="citation">K. Daniel and Titman (<a href="solutions-to-exercises.html#ref-daniel1997evidence">1997</a>)</span> provide evidence in favour of the former (two follow-up papers are <span class="citation">K. Daniel, Titman, and Wei (<a href="solutions-to-exercises.html#ref-daniel2001explaining">2001</a>)</span> and <span class="citation">K. Daniel and Titman (<a href="solutions-to-exercises.html#ref-daniel2012testing">2012</a>)</span>). They show that firms with high book-to-market ratios or small capitalizations display higher average returns, even if they are negatively loaded on the HML or SMB factors. Therefore, it seems that it is indeed the intrinsic characteristics that matter, and not the factor exposure. For further material on characteristics’ role in return explanation or prediction, we refer to the following contributions:</p>
<ul>
<li><span class="citation">Haugen and Baker (<a href="solutions-to-exercises.html#ref-haugen1996commonality">1996</a>)</span> estimate predictive regressions based on firm characteristics and show that it is possible to build profitable portfolios based on the resulting predictions. Their method was subsequently enhanced with the adaptive LASSO by <span class="citation">Guo (<a href="solutions-to-exercises.html#ref-guo2020sparse">2020</a>)</span>;</li>
<li>Section 2.5.2 in <span class="citation">Goyal (<a href="solutions-to-exercises.html#ref-goyal2012empirical">2012</a>)</span> surveys pre-2010 results on this topic;</li>
<li><span class="citation">Chordia, Goyal, and Shanken (<a href="solutions-to-exercises.html#ref-chordia2015cross">2019</a>)</span> find that characteristics explain a larger proportion of variation in estimated expected returns than factor loadings;</li>
<li><span class="citation">Kozak, Nagel, and Santosh (<a href="solutions-to-exercises.html#ref-kozak2018interpreting">2018</a>)</span> reconcile factor-based explanations of premia with a theoretical model in which some agents’ demands are sentiment driven;</li>
<li><span class="citation">Han et al. (<a href="solutions-to-exercises.html#ref-han2018firm">2019</a>)</span> show with penalized regressions that 20 to 30 characteristics (out of 94) are useful for the prediction of monthly returns of US stocks. Their methodology is interesting: they regress returns against characteristics to build forecasts and then regress the returns on the forecasts to assess whether they are reliable. The latter regression uses a LASSO-type penalization (see Chapter <a href="lasso.html#lasso">5</a>) so that useless characteristics are excluded from the model. The penalization is extended to the elastic net in <span class="citation">D. Rapach and Zhou (<a href="solutions-to-exercises.html#ref-rapach2019time">2019</a>)</span>;</li>
<li><span class="citation">Kelly, Pruitt, and Su (<a href="solutions-to-exercises.html#ref-kelly2019characteristics">2019</a>)</span> and <span class="citation">S. Kim, Korajczyk, and Neuhierl (<a href="solutions-to-exercises.html#ref-kim2019arbitrage">2019</a>)</span> both estimate models in which <em>factors</em> are <em>latent</em> but loadings (betas) and possibly alphas depend on characteristics. <span class="citation">Kirby (<a href="solutions-to-exercises.html#ref-kirby2020firm">2020</a>)</span> generalizes the first approach by introducing regime-switching. In contrast, <span class="citation">Lettau and Pelger (<a href="solutions-to-exercises.html#ref-lettau2018estimating">2020a</a>)</span> and <span class="citation">Lettau and Pelger (<a href="solutions-to-exercises.html#ref-lettau2018factors">2020b</a>)</span> estimate latent factors without any link to particular characteristics (and provide large sample asymptotic properties of their methods);</li>
<li>In the same vein as <span class="citation">Gospodinov, Kan, and Robotti (<a href="solutions-to-exercises.html#ref-gospodinov2019too">2019</a>)</span> and <span class="citation">Bryzgalova (<a href="solutions-to-exercises.html#ref-bryzgalova2019spurious">2019</a>)</span>, <span class="citation">Hoechle, Schmid, and Zimmermann (<a href="solutions-to-exercises.html#ref-hoechle2018correcting">2018</a>)</span> discuss potential errors that arise when working with portfolio sorts that yield long-short returns. The authors show that in some cases, tests based on this procedure may be misleading. This happens when the characteristic chosen to perform the sort is correlated with an external (unobservable) factor. They propose a novel regression-based approach aimed at bypassing this problem.</li>
</ul>
<p>More recently and in a separate stream of literature, <span class="citation">R. S. J. Koijen and Yogo (<a href="solutions-to-exercises.html#ref-koijen2019demand">2019</a>)</span> have introduced a demand model in which investors form their portfolios according to their preferences towards particular firm characteristics. They show that this allows them to mimic the portfolios of large institutional investors. In their model, aggregate demands (and hence, prices) are directly linked to characteristics, not to factors. In a follow-up paper, <span class="citation">R. S. Koijen, Richmond, and Yogo (<a href="solutions-to-exercises.html#ref-koijen2019investors">2019</a>)</span> show that a few sets of characteristics suffice to predict future returns. They also show that, based on institutional holdings from the UK and the US, the largest investors are those who are the most influential in the formation of prices. In a similar vein, <span class="citation">Betermier, Calvet, and Jo (<a href="solutions-to-exercises.html#ref-betermier2019supply">2019</a>)</span> derive an elegant (theoretical) general equilibrium model that generates some well-documented anomalies (size, book-to-market). The models of <span class="citation">R. D. Arnott et al. (<a href="solutions-to-exercises.html#ref-arnott2014can">2014</a>)</span> and <span class="citation">Alti and Titman (<a href="solutions-to-exercises.html#ref-alti2019dynamic">2019</a>)</span> are also able to theoretically generate known anomalies. Finally, in <span class="citation">I. Martin and Nagel (<a href="solutions-to-exercises.html#ref-martin2019market">2019</a>)</span>, characteristics influence returns via the role they play in the predictability of dividend growth. This paper discusses the asymptotic case when the number of assets and the number of characteristics are proportional and both increase to infinity.</p>
</div>
<div id="hot-topics-momentum-timing-and-esg" class="section level2" number="3.4">
<h2>
<span class="header-section-number">3.4</span> Hot topics: momentum, timing and ESG<a class="anchor" aria-label="anchor" href="#hot-topics-momentum-timing-and-esg"><i class="fas fa-link"></i></a>
</h2>
<div id="factor-momentum" class="section level3" number="3.4.1">
<h3>
<span class="header-section-number">3.4.1</span> Factor momentum<a class="anchor" aria-label="anchor" href="#factor-momentum"><i class="fas fa-link"></i></a>
</h3>
<p>A recent body of literature unveils a time series momentum property of factor returns. For instance, <span class="citation">T. Gupta and Kelly (<a href="solutions-to-exercises.html#ref-gupta2019factor">2019</a>)</span> report that autocorrelation patterns within these returns are statistically significant.<a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;Autocorrelation in aggregate/portfolio returns is a widely documented effect since the seminal paper &lt;span class="citation"&gt;Lo and MacKinlay (&lt;a href="solutions-to-exercises.html#ref-lo1990contrarian"&gt;1990&lt;/a&gt;)&lt;/span&gt; (see also &lt;span class="citation"&gt;Moskowitz, Ooi, and Pedersen (&lt;a href="solutions-to-exercises.html#ref-moskowitz2012time"&gt;2012&lt;/a&gt;)&lt;/span&gt;).&lt;/p&gt;'><sup>9</sup></a> Similar results are obtained in <span class="citation">Falck, Rej, and Thesmar (<a href="solutions-to-exercises.html#ref-falck2020factor">2022</a>)</span>. In the same vein, <span class="citation">R. D. Arnott et al. (<a href="solutions-to-exercises.html#ref-arnott2019factor">2021</a>)</span> make the case that the industry momentum found in <span class="citation">Moskowitz and Grinblatt (<a href="solutions-to-exercises.html#ref-moskowitz1999industries">1999</a>)</span> can in fact be explained by this factor momentum. Going even further, <span class="citation">Ehsani and Linnainmaa (<a href="solutions-to-exercises.html#ref-ehsani2019factor">2022</a>)</span> conclude that the original momentum factor is in fact the aggregation of the autocorrelation that can be found in all other factors. More recently, the strength of factor momentum was scrutinized by <span class="citation">Fan et al. (<a href="solutions-to-exercises.html#ref-fan2021reexamination">2021</a>)</span>. The authors find that it is only robust for a small number of factors.</p>
<p>Acknowledging the profitability of factor momentum, <span class="citation">H. Yang (<a href="solutions-to-exercises.html#ref-yang2020decomposing">2020b</a>)</span> seeks to understand its source and decomposes stock factor momentum portfolios into two components: a factor timing portfolio and a static portfolio. The former seeks to profit from the serial correlations of factor returns while the latter tries to harness factor premia. The author shows that it is the static portfolio that explains the larger portion of factor momentum returns. In <span class="citation">H. Yang (<a href="solutions-to-exercises.html#ref-yang2020weighted">2020a</a>)</span>, the same author presents a new estimator to gauge factor momentum predictability. Words of caution are provided in <span class="citation">Leippold and Yang (<a href="solutions-to-exercises.html#ref-leippold2021anatomy">2021</a>)</span>.</p>
<p>Lastly, <span class="citation">Garcia, Medeiros, and Ribeiro (<a href="solutions-to-exercises.html#ref-garcia2021factor">2021</a>)</span> document factor momentum at the daily frequency.</p>
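<p>Using the data obtained from Ken French’s website, we compute the autocorrelation function (ACF) of factors. We recall that the autocorrelation at lag <span class="math inline">\(k\)</span> is the autocovariance scaled by the variance:
<span class="math display">\[\text{ACF}_k(\textbf{x}_t)=\frac{\mathbb{E}[(\textbf{x}_t-\bar{\textbf{x}})(\textbf{x}_{t+k}-\bar{\textbf{x}})]}{\mathbb{E}[(\textbf{x}_t-\bar{\textbf{x}})^2]}.\]</span></p>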
<div class="sourceCode" id="cb21"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="https://wilkelab.org/cowplot/">cowplot</a></span><span class="op">)</span>                   <span class="co"># For stacking plots</span></span>
<span><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="https://pkg.robjhyndman.com/forecast/">forecast</a></span><span class="op">)</span>                  <span class="co"># For autocorrelation function</span></span>
<span><span class="va">acf_SMB</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://pkg.robjhyndman.com/forecast/reference/autoplot.acf.html">ggAcf</a></span><span class="op">(</span><span class="va">FF_factors</span><span class="op">$</span><span class="va">SMB</span>, lag.max <span class="op">=</span> <span class="fl">10</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs</a></span><span class="op">(</span>title <span class="op">=</span> <span class="st">""</span><span class="op">)</span>  <span class="co"># ACF SMB</span></span>
<span><span class="va">acf_HML</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://pkg.robjhyndman.com/forecast/reference/autoplot.acf.html">ggAcf</a></span><span class="op">(</span><span class="va">FF_factors</span><span class="op">$</span><span class="va">HML</span>, lag.max <span class="op">=</span> <span class="fl">10</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs</a></span><span class="op">(</span>title <span class="op">=</span> <span class="st">""</span><span class="op">)</span>  <span class="co"># ACF HML</span></span>
<span><span class="va">acf_RMW</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://pkg.robjhyndman.com/forecast/reference/autoplot.acf.html">ggAcf</a></span><span class="op">(</span><span class="va">FF_factors</span><span class="op">$</span><span class="va">RMW</span>, lag.max <span class="op">=</span> <span class="fl">10</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs</a></span><span class="op">(</span>title <span class="op">=</span> <span class="st">""</span><span class="op">)</span>  <span class="co"># ACF RMW</span></span>
<span><span class="va">acf_CMA</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://pkg.robjhyndman.com/forecast/reference/autoplot.acf.html">ggAcf</a></span><span class="op">(</span><span class="va">FF_factors</span><span class="op">$</span><span class="va">CMA</span>, lag.max <span class="op">=</span> <span class="fl">10</span><span class="op">)</span> <span class="op">+</span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs</a></span><span class="op">(</span>title <span class="op">=</span> <span class="st">""</span><span class="op">)</span>  <span class="co"># ACF CMA</span></span>
<span><span class="fu"><a href="https://wilkelab.org/cowplot/reference/plot_grid.html">plot_grid</a></span><span class="op">(</span><span class="va">acf_SMB</span>, <span class="va">acf_HML</span>, <span class="va">acf_RMW</span>, <span class="va">acf_CMA</span>,  <span class="co"># Plot</span></span>
<span>          labels <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">'SMB'</span>, <span class="st">'HML'</span>, <span class="st">'RMW'</span>, <span class="st">'CMA'</span><span class="op">)</span><span class="op">)</span> </span></code></pre></div>
<div class="figure" style="text-align: center">
<span style="display:block;" id="fig:facautocorr"></span>
<img src="ML_factor_files/figure-html/facautocorr-1.png" alt="Autocorrelograms of common factor portfolios." width="432"><p class="caption">
FIGURE 3.4: Autocorrelograms of common factor portfolios.
</p>
</div>
<p>Of the four chosen series, only the size factor is not significantly autocorrelated at the first order.</p>
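<p>The notion of significance at the first order can be checked by hand. The sketch below computes the lag-1 sample autocorrelation (the sample autocovariance divided by the sample variance), compares it with the value returned by <code>acf()</code> from the <code>stats</code> package, and sets it against the usual approximate 95% band under the white-noise hypothesis, <span class="math inline">\(\pm 1.96/\sqrt{n}\)</span>. The series is a simulated AR(1) process with an arbitrary coefficient of 0.3, used here only as a stand-in: it is not one of the factor series above, and the sample length is an assumption.</p>

```r
set.seed(42)                                       # Reproducibility
n_obs <- 600                                       # Hypothetical number of monthly observations
x <- as.numeric(arima.sim(model = list(ar = 0.3),  # Simulated AR(1) series,
                          n = n_obs))              # NOT real factor returns
x_bar <- mean(x)                                   # Sample mean
rho1_manual <- sum((x[-n_obs] - x_bar) * (x[-1] - x_bar)) /
  sum((x - x_bar)^2)                               # Lag-1 sample autocorrelation
rho1_acf <- acf(x, lag.max = 1, plot = FALSE)$acf[2]  # Same quantity via stats::acf
band <- qnorm(0.975) / sqrt(n_obs)                 # Approximate 95% band under white noise
c(manual = rho1_manual, acf = rho1_acf, band = band)
abs(rho1_manual) > band                            # TRUE: significant at the first order
```

<p>Replacing <code>x</code> with a column of factor returns (for instance <code>FF_factors$SMB</code>, if that object is available in the session) applies the same check to the series plotted in the figure.</p>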
</div>
<div id="factor-timing" class="section level3" number="3.4.2">
<h3>
<span class="header-section-number">3.4.2</span> Factor timing<a class="anchor" aria-label="anchor" href="#factor-timing"><i class="fas fa-link"></i></a>
</h3>
<p>Given the abundance of evidence of the time-varying nature of factor premia, it is legitimate to wonder if it is possible to predict when factors will perform well or badly. The evidence on the effectiveness of timing is diverse: positive for <span class="citation">Greenwood and Hanson (<a href="solutions-to-exercises.html#ref-greenwood2012share">2012</a>)</span>, <span class="citation">Hodges et al. (<a href="solutions-to-exercises.html#ref-hodges2017factor">2017</a>)</span>, <span class="citation">Hasler, Khapko, and Marfe (<a href="solutions-to-exercises.html#ref-hasler2019should">2019</a>)</span>, <span class="citation">Haddad, Kozak, and Santosh (<a href="solutions-to-exercises.html#ref-haddad2020economics">2020</a>)</span>, <span class="citation">Lioui and Tarelli (<a href="solutions-to-exercises.html#ref-lioui2020factor">2020</a>)</span> and <span class="citation">Neuhierl et al. (<a href="solutions-to-exercises.html#ref-neuhierl2023timing">2023</a>)</span> more recently, but negative for <span class="citation">Clifford Asness et al. (<a href="solutions-to-exercises.html#ref-asness2017contrarian">2017</a>)</span> and mixed for <span class="citation">Dichtl et al. (<a href="solutions-to-exercises.html#ref-dichtl2019optimal">2019</a>)</span>. The majority of positive findings may be due to the publication bias towards positive results - but this is pure speculation.</p>
<p>There is no consensus on which predictors to use (general macroeconomic indicators in <span class="citation">Hodges et al. (<a href="solutions-to-exercises.html#ref-hodges2017factor">2017</a>)</span>, stock issuances versus repurchases in <span class="citation">Greenwood and Hanson (<a href="solutions-to-exercises.html#ref-greenwood2012share">2012</a>)</span>, and aggregate fundamental data in <span class="citation">Dichtl et al. (<a href="solutions-to-exercises.html#ref-dichtl2019optimal">2019</a>)</span>). A method for building reasonable timing strategies for long-only portfolios with sustainable transaction costs is laid out in <span class="citation">Leippold and Rüegg (<a href="solutions-to-exercises.html#ref-leippold2020fama">2020</a>)</span>. The cross-section of characteristics is used for factor timing purposes in <span class="citation">Kagkadis et al. (<a href="solutions-to-exercises.html#ref-kagkadis2021factor">2021</a>)</span>. In <span class="citation">Vincenz and Zeissler (<a href="solutions-to-exercises.html#ref-vincenz2021time">2021</a>)</span>, it is found that macro variables are the best for this purpose. In ML-based factor investing, it is possible to resort to more granularity by combining firm-specific attributes with large-scale economic data as we explain in Section <a href="Data.html#macrovar">4.7.2</a>.</p>
</div>
<div id="the-green-factors" class="section level3" number="3.4.3">
<h3>
<span class="header-section-number">3.4.3</span> The green factors<a class="anchor" aria-label="anchor" href="#the-green-factors"><i class="fas fa-link"></i></a>
</h3>
<p>The demand for ethical financial products rose sharply during the 2010s, leading to the creation of funds dedicated to socially responsible investing (SRI - see <span class="citation">Camilleri (<a href="solutions-to-exercises.html#ref-camilleri2020market">2020</a>)</span>). Though this phenomenon is not really new (<span class="citation">Schueth (<a href="solutions-to-exercises.html#ref-schueth2003socially">2003</a>)</span>, <span class="citation">Hill et al. (<a href="solutions-to-exercises.html#ref-hill2007corporate">2007</a>)</span>), its acceleration has prompted research about whether or not characteristics related to ESG criteria (environment, social, governance) are priced. Dozens and even possibly hundreds of papers have been devoted to this question, but no consensus has been reached. More and more, researchers study the financial impact of climate change (see <span class="citation">Bernstein, Gustafson, and Lewis (<a href="solutions-to-exercises.html#ref-bernstein2019disaster">2019</a>)</span>, <span class="citation">Hong, Li, and Xu (<a href="solutions-to-exercises.html#ref-hong2019climate">2019</a>)</span> and <span class="citation">Hong, Karolyi, and Scheinkman (<a href="solutions-to-exercises.html#ref-hong2020climate">2020</a>)</span>) and the societal push for responsible corporate behavior (<span class="citation">Fabozzi (<a href="solutions-to-exercises.html#ref-fabozzi2020introduction">2020</a>)</span>, <span class="citation">Kurtz (<a href="solutions-to-exercises.html#ref-kurtz2020three">2020</a>)</span>).
We gather below a very short list of papers that suggest conflicting results:</p>
<ul>
<li>
<strong>favorable</strong>: ESG investing works (<span class="citation">Kempf and Osthoff (<a href="solutions-to-exercises.html#ref-kempf2007effect">2007</a>)</span>, <span class="citation">Cheema-Fox et al. (<a href="solutions-to-exercises.html#ref-cheema2020decarbonization">2020</a>)</span>), can work (<span class="citation">Nagy, Kassam, and Lee (<a href="solutions-to-exercises.html#ref-nagy2016can">2016</a>)</span>, <span class="citation">Alessandrini and Jondeau (<a href="solutions-to-exercises.html#ref-alessandrini2020optimal">2020</a>)</span>), or can at least be rendered efficient (<span class="citation">Branch and Cai (<a href="solutions-to-exercises.html#ref-branch2012socially">2012</a>)</span>). A large meta-study reports overwhelmingly favorable results (<span class="citation">Friede, Busch, and Bassen (<a href="solutions-to-exercises.html#ref-friede2015esg">2015</a>)</span>), but of course, they could well stem from the publication bias towards positive results.<br>
</li>
<li>
<strong>unfavorable</strong>: Ethical investing is not profitable according to <span class="citation">Adler and Kritzman (<a href="solutions-to-exercises.html#ref-adler2008cost">2008</a>)</span> and <span class="citation">Blitz and Swinkels (<a href="solutions-to-exercises.html#ref-blitz2020exclusion">2020</a>)</span>. An ESG factor should be long unethical firms and short ethical ones (<span class="citation">Lioui (<a href="solutions-to-exercises.html#ref-lioui2018esg">2018</a>)</span>).<br>
</li>
<li>
<strong>mixed</strong>: ESG investing may be beneficial globally but not locally (<span class="citation">Chakrabarti and Sen (<a href="solutions-to-exercises.html#ref-chakrabarti2020time">2020</a>)</span>). Portfolios relying on ESG screening do not significantly outperform those with no screening but are subject to lower levels of volatility (<span class="citation">Gibson et al. (<a href="solutions-to-exercises.html#ref-gibson2020responsible">2020</a>)</span>, <span class="citation">Gougler and Utz (<a href="solutions-to-exercises.html#ref-gougler2020factor">2020</a>)</span>). As is often the case, the devil is in the details, and results depend on whether the E, S or G criteria are used (<span class="citation">Bruder et al. (<a href="solutions-to-exercises.html#ref-bruder2019integration">2019</a>)</span>).</li>
</ul>
<p>On top of these contradictory results, several articles point towards complexities in the measurement of ESG. Depending on the chosen criteria and on the data provider, results can change drastically (see <span class="citation">Galema, Plantinga, and Scholtens (<a href="solutions-to-exercises.html#ref-galema2008stocks">2008</a>)</span>, <span class="citation">Berg, Koelbel, and Rigobon (<a href="solutions-to-exercises.html#ref-berg2019aggregate">2020</a>)</span> and <span class="citation">Atta-Darkua et al. (<a href="solutions-to-exercises.html#ref-atta2020strategies">2020</a>)</span>).</p>
<p>We end this short section by noting that ESG criteria can of course be integrated directly into ML models, as is for instance done in <span class="citation">Franco et al. (<a href="solutions-to-exercises.html#ref-de2020esg">2020</a>)</span>.</p>
</div>
</div>
<div id="the-links-with-machine-learning" class="section level2" number="3.5">
<h2>
<span class="header-section-number">3.5</span> The links with machine learning<a class="anchor" aria-label="anchor" href="#the-links-with-machine-learning"><i class="fas fa-link"></i></a>
</h2>
<p>Given the exponential increase in data availability, the obvious temptation of any asset manager is to try to infer future returns from the abundance of attributes available at the firm level. We allude to classical data like accounting ratios and to alternative data, such as sentiment. This task is precisely the aim of Machine Learning. Given a large set of predictor variables (<span class="math inline">\(\mathbf{X}\)</span>), the goal is to predict a proxy for future performance <span class="math inline">\(\mathbf{y}\)</span> through a model of the form <a href="intro.html#eq:ML">(2.1)</a>.</p>
<p>If fundamental data (accounting ratios, earnings, relative valuations, etc.) help predict returns, then one refinement is to predict this fundamental data upfront. This may make it possible to anticipate changes or gain informational edges. Recent contributions in this direction include <span class="citation">K. Cao and You (<a href="solutions-to-exercises.html#ref-cao2020fundamental">2020</a>)</span> and <span class="citation">D. Huang et al. (<a href="solutions-to-exercises.html#ref-huang2020fundamental">2020</a>)</span>.</p>
<p>Some earlier attempts have already been made that aim to explain and predict returns with firm attributes (e.g., <span class="citation">Brandt, Santa-Clara, and Valkanov (<a href="solutions-to-exercises.html#ref-brandt2009parametric">2009</a>)</span>, <span class="citation">Hjalmarsson and Manchev (<a href="solutions-to-exercises.html#ref-hjalmarsson2012characteristic">2012</a>)</span>, <span class="citation">Ammann, Coqueret, and Schade (<a href="solutions-to-exercises.html#ref-ammann2016characteristics">2016</a>)</span>, <span class="citation">DeMiguel et al. (<a href="solutions-to-exercises.html#ref-martin2018transaction">2020</a>)</span> and <span class="citation">McGee and Olmo (<a href="solutions-to-exercises.html#ref-mcgee2020optimal">2020</a>)</span>), but not with any ML intent or focus originally. In retrospect, these approaches do share some links with ML tools. The general formulation is the following. At time <span class="math inline">\(T\)</span>, the agent or investor seeks to solve the following program:
<span class="math display">\[\begin{align*}
\underset{\boldsymbol{\theta}_T}{\max} \ \mathbb{E}_T\left[ u(r_{p,T+1})\right] = \underset{\boldsymbol{\theta}_T}{\max} \ \mathbb{E}_T\left[ u\left(\left(\bar{\textbf{w}}_T+\textbf{x}_T\boldsymbol{\theta}_T\right)'\textbf{r}_{T+1}\right)\right] ,
\end{align*}\]</span>
where <span class="math inline">\(u\)</span> is some utility function and <span class="math inline">\(r_{p,T+1}=\left(\bar{\textbf{w}}_T+\textbf{x}_T\boldsymbol{\theta}_T\right)'\textbf{r}_{T+1}\)</span> is the return of the portfolio, which is defined as a benchmark <span class="math inline">\(\bar{\textbf{w}}_T\)</span> plus some deviations from this benchmark that are a linear function of features <span class="math inline">\(\textbf{x}_T\boldsymbol{\theta}_T\)</span>. The above program may be subject to some external constraints (e.g., to limit leverage).</p>
<p>In practice, the vector <span class="math inline">\(\boldsymbol{\theta}_T\)</span> must be estimated using past data (from <span class="math inline">\(T-\tau\)</span> to <span class="math inline">\(T-1\)</span>): the agent seeks the solution of
<span class="math display" id="eq:optchar">\[\begin{align}
\tag{3.5}
\underset{\boldsymbol{\theta}_T}{\text{max}} \ \frac{1}{\tau} \sum_{t=T-\tau}^{T-1} u \left( \sum_{i=1}^{N_T}\left(\bar{w}_{i,t}+ \boldsymbol{\theta}'_T \textbf{x}_{i,t} \right)r_{i,t+1} \right)
\end{align}\]</span></p>
<p>on a sample of size <span class="math inline">\(\tau\)</span> where <span class="math inline">\(N_T\)</span> is the number of assets in the universe. The above formulation can be viewed as a learning task in which the parameters are chosen such that the reward (the average utility of realized portfolio returns) is maximized.</p>
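<p>A minimal sketch of the sample program <a href="#eq:optchar">(3.5)</a> is given below, under strong simplifying assumptions that are ours and not part of the formulation above: the data are simulated (only the first feature carries information about returns), the benchmark is equally weighted, the utility is quadratic with an arbitrary risk aversion, the number of assets is constant through time and no constraint is imposed. The function <code>avg_utility</code> and all variable names are hypothetical.</p>

```r
set.seed(42)                                       # Reproducibility
n_dates <- 60; n_assets <- 50; n_feat <- 3         # Hypothetical tau, N and number of features
gamma <- 5                                         # Hypothetical risk aversion
x <- array(rnorm(n_dates * n_assets * n_feat),     # Simulated features x_{i,t}
           dim = c(n_dates, n_assets, n_feat))
r_next <- 0.01 + 0.02 * x[, , 1] +                 # Simulated returns r_{i,t+1}:
  matrix(rnorm(n_dates * n_assets, sd = 0.05),     # only feature 1 matters
         n_dates, n_assets)
w_bench <- matrix(1 / n_assets, n_dates, n_assets) # Equal-weight benchmark
u <- function(r) r - gamma / 2 * r^2               # Quadratic utility (an assumption)
avg_utility <- function(theta) {                   # Sample objective in (3.5)
  tilt <- matrix(matrix(x, ncol = n_feat) %*% theta,
                 n_dates, n_assets)                # theta' x_{i,t} for all dates and assets
  r_p <- rowSums((w_bench + tilt) * r_next)        # Portfolio return at each date
  mean(u(r_p))                                     # Average utility over the sample
}
fit <- optim(par = rep(0, n_feat), fn = avg_utility, method = "BFGS",
             control = list(fnscale = -1))         # fnscale = -1 makes optim maximize
fit$par                                            # Estimated theta
```

<p>Starting from <span class="math inline">\(\boldsymbol{\theta}=\textbf{0}\)</span> means that the search begins at the benchmark portfolio, so the value reached by the optimizer is, in sample, at least as high as that of the benchmark. Whether the estimated tilts remain useful out of sample is a separate question.</p>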
<div id="a-short-list-of-recent-references" class="section level3" number="3.5.1">
<h3>
<span class="header-section-number">3.5.1</span> A short list of recent references<a class="anchor" aria-label="anchor" href="#a-short-list-of-recent-references"><i class="fas fa-link"></i></a>
</h3>
<p>Independent of a characteristics-based approach, ML applications in finance have blossomed, initially working with price data only and later on integrating firm characteristics as predictors. We cite a few references below, grouped by methodological approach:</p>
<ul>
<li>penalized quadratic programming: <span class="citation">Goto and Xu (<a href="solutions-to-exercises.html#ref-goto2015improving">2015</a>)</span>, <span class="citation">Ban, El Karoui, and Lim (<a href="solutions-to-exercises.html#ref-ban2016machine">2016</a>)</span> and <span class="citation">Perrin and Roncalli (<a href="solutions-to-exercises.html#ref-perrin2019machine">2019</a>)</span>,</li>
<li>regularized predictive regressions: <span class="citation">D. E. Rapach, Strauss, and Zhou (<a href="solutions-to-exercises.html#ref-rapach2013international">2013</a>)</span> and <span class="citation">Alexander Chinco, Clark-Joseph, and Ye (<a href="solutions-to-exercises.html#ref-chinco2019sparse">2019</a>)</span>, </li>
<li>support vector machines: <span class="citation">L.-J. Cao and Tay (<a href="solutions-to-exercises.html#ref-cao2003support">2003</a>)</span> (and the references therein), </li>
<li>model comparison and/or aggregation: <span class="citation">K. Kim (<a href="solutions-to-exercises.html#ref-kim2003financial">2003</a>)</span>, <span class="citation">W. Huang, Nakamori, and Wang (<a href="solutions-to-exercises.html#ref-huang2005forecasting">2005</a>)</span>, <span class="citation">Matías and Reboredo (<a href="solutions-to-exercises.html#ref-matias2012forecasting">2012</a>)</span>, <span class="citation">Reboredo, Matías, and Garcia-Rubio (<a href="solutions-to-exercises.html#ref-reboredo2012nonlinearity">2012</a>)</span>, <span class="citation">Dunis et al. (<a href="solutions-to-exercises.html#ref-dunis2013hybrid">2013</a>)</span>, <span class="citation">Gu, Kelly, and Xiu (<a href="solutions-to-exercises.html#ref-gu2018empirical">2020b</a>)</span>, <span class="citation">Guida and Coqueret (<a href="solutions-to-exercises.html#ref-guida2018machine">2018b</a>)</span> and <span class="citation">Tobek and Hronec (<a href="solutions-to-exercises.html#ref-tobek2021does">2021</a>)</span>. The last two, more recent, articles work with a large cross-section of characteristics.</li>
</ul>
<p>We provide more detailed lists for tree-based methods, neural networks and reinforcement learning techniques in Chapters <a href="trees.html#trees">6</a>, <a href="NN.html#NN">7</a> and <a href="RL.html#RL">16</a>, respectively. Moreover, we refer to <span class="citation">Ballings et al. (<a href="solutions-to-exercises.html#ref-ballings2015evaluating">2015</a>)</span> for a comparison of classifiers and to <span class="citation">Henrique, Sobreiro, and Kimura (<a href="solutions-to-exercises.html#ref-henrique2019literature">2019</a>)</span> and <span class="citation">Bustos and Pomares-Quimbaya (<a href="solutions-to-exercises.html#ref-bustos2020stock">2020</a>)</span> for surveys on ML-based forecasting techniques.</p>
</div>
<div id="explicit-connections-with-asset-pricing-models" class="section level3" number="3.5.2">
<h3>
<span class="header-section-number">3.5.2</span> Explicit connections with asset pricing models<a class="anchor" aria-label="anchor" href="#explicit-connections-with-asset-pricing-models"><i class="fas fa-link"></i></a>
</h3>
<p>The first and obvious link between factor investing and asset pricing is (average) return prediction. The main canonical academic reference is <span class="citation">Gu, Kelly, and Xiu (<a href="solutions-to-exercises.html#ref-gu2018empirical">2020b</a>)</span>. Let us first write the general equation and then comment on it:
<span class="math display" id="eq:genML">\[\begin{equation}
\tag{3.6}
r_{t+1,n}=g(\textbf{x}_{t,n}) + \epsilon_{t+1,n}.
\end{equation}\]</span></p>
<p>The interesting discussion lies in the differences between the above model and that of Equation <a href="factor.html#eq:apt">(3.1)</a>. The first obvious difference is the introduction of the nonlinear function <span class="math inline">\(g\)</span>: indeed, there is no reason (beyond simplicity and interpretability) why we should restrict the model to linear relationships. One early reference for nonlinearities in asset pricing kernels is <span class="citation">Bansal and Viswanathan (<a href="solutions-to-exercises.html#ref-bansal1993no">1993</a>)</span>.</p>
<p>More importantly, the second difference between <a href="factor.html#eq:genML">(3.6)</a> and <a href="factor.html#eq:apt">(3.1)</a> is the shift in the time index. Indeed, from an investor’s perspective, the goal is to <em>predict</em> some information about the structure of the cross-section of assets. Explaining asset returns with synchronous factors is not useful because the realization of factor values is not known in advance. Hence, if one seeks to extract value from the model, there needs to be a time interval between the observation of the state space (which we call <span class="math inline">\(\textbf{x}_{t,n}\)</span>) and the occurrence of the returns. Once the model <span class="math inline">\(\hat{g}\)</span> is estimated, the time-<span class="math inline">\(t\)</span> (measurable) value <span class="math inline">\(\hat{g}(\textbf{x}_{t,n})\)</span> will give a forecast for the (average) future returns. These predictions can then serve as signals in the crafting of portfolio weights (see Chapter <a href="backtest.html#backtest">12</a> for more on that topic).</p>
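<p>The time shift in Equation (3.6) is easy to get wrong in practice, so a minimal sketch of the alignment may help. The data are simulated, and the quadratic basis with a ridge penalty is only a stand-in for a flexible <span class="math inline">\(g\)</span>; the variable names, the penalty value and the date-based split are assumptions made for the example.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_dates, n_assets, n_char = 60, 100, 2

# x[t] holds the characteristics observed at date t.
x = rng.uniform(-1.0, 1.0, size=(n_dates, n_assets, n_char))

# returns[t] is realized at date t and depends (nonlinearly) on x[t - 1].
returns = np.empty((n_dates, n_assets))
returns[0] = rng.normal(0.0, 0.03, size=n_assets)
returns[1:] = (0.03 * x[:-1, :, 0] ** 2 - 0.02 * x[:-1, :, 1]
               + rng.normal(0.0, 0.03, size=(n_dates - 1, n_assets)))

def expand(chars):
    """Quadratic basis (1, x_k, x_k^2), an assumed stand-in for a flexible g."""
    flat = chars.reshape(-1, chars.shape[-1])
    return np.column_stack([np.ones(len(flat)), flat, flat ** 2])

# Pair x_t with r_{t+1}; split by date so that testing is strictly out-of-sample.
n_train = 40
f_train, y_train = expand(x[:n_train]), returns[1:n_train + 1].reshape(-1)
f_test, y_test = expand(x[n_train:-1]), returns[n_train + 1:].reshape(-1)

# Ridge estimate of the coefficients of g_hat.
penalty = 1e-3
coef = np.linalg.solve(f_train.T @ f_train + penalty * np.eye(f_train.shape[1]),
                       f_train.T @ y_train)

# Out-of-sample check, then the forecast for the not-yet-observed next period.
oos_corr = np.corrcoef(f_test @ coef, y_test)[0, 1]
forecast = expand(x[-1:]) @ coef          # one predicted return per asset
print("out-of-sample correlation:", oos_corr)
```

<p>The last line of the computation is the one that matters for an investor: it uses only characteristics known at the final date to produce one predicted return per asset.</p>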
<p>While most studies do work with returns on the l.h.s. of <a href="factor.html#eq:genML">(3.6)</a>, there is no reason why other indicators should not be used. Returns are straightforward and simple to compute, but they could very well be replaced by more sophisticated metrics, like the Sharpe ratio, for instance. The firms’ features would then be used to predict a risk-adjusted performance rather than simple returns.</p>
<p>Beyond the explicit form of Equation <a href="factor.html#eq:genML">(3.6)</a>, several other ML-related tools can also be used to estimate asset pricing models. This can be achieved in several ways, some of which we list below.</p>
<p>First, one mainstream problem in asset pricing is to characterize the stochastic discount factor (SDF) <span class="math inline">\(M_t\)</span>, which satisfies <span class="math inline">\(\mathbb{E}_t[M_{t+1}(r_{t+1,n}-r_{t+1,f})]=0\)</span> for any asset <span class="math inline">\(n\)</span> (see <span class="citation">Cochrane (<a href="solutions-to-exercises.html#ref-cochrane2009asset">2009</a>)</span>). This equation is a natural playing field for the generalized method of moments (<span class="citation">Hansen (<a href="solutions-to-exercises.html#ref-hansen1982large">1982</a>)</span>): <span class="math inline">\(M_t\)</span> must be such that
<span class="math display" id="eq:SDFGMM">\[\begin{equation}
\tag{3.7}
\mathbb{E}[M_{t+1}R_{t+1,n}g(V_t)]=0,
\end{equation}\]</span>
where the instrumental variables <span class="math inline">\(V_t\)</span> are <span class="math inline">\(\mathcal{F}_t\)</span>-measurable (i.e., are known at time <span class="math inline">\(t\)</span>) and the capital <span class="math inline">\(R_{t+1,n}\)</span> denotes the excess return of asset <span class="math inline">\(n\)</span>. In order to reduce and simplify the estimation problem, it is customary to define the SDF as a portfolio of assets (see chapter 3 in <span class="citation">Back (<a href="solutions-to-exercises.html#ref-back2010asset">2010</a>)</span>). In <span class="citation">Luyang Chen, Pelger, and Zhu (<a href="solutions-to-exercises.html#ref-chen2019deep">2020</a>)</span>, the authors use a generative adversarial network (GAN, see Section <a href="NN.html#generative-aversarial-networks">7.7.1</a>) to estimate the weights of the portfolios that are closest to satisfying <a href="factor.html#eq:SDFGMM">(3.7)</a> under a strongly penalizing form.</p>
<p>A second approach is to try to model asset returns as linear combinations of factors, just as in <a href="factor.html#eq:apt">(3.1)</a>. We write in compact notation <span class="math display">\[r_{t,n}=\alpha_n+\boldsymbol{\beta}_{t,n}'\textbf{f}_t+\epsilon_{t,n},\]</span>
and we allow the loadings <span class="math inline">\(\boldsymbol{\beta}_{t,n}\)</span> to be time-dependent. The trick is then to introduce the firm characteristics in the above equation. Traditionally, the characteristics are present in the definition of factors (as in the seminal definition of <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span>). The decomposition of the return is made according to the exposure of the firm’s return to these factors constructed according to market size, accounting ratios, past performance, etc. Given the exposures, the performance of the stock is attributed to particular style profiles (e.g., small stock, or value stock, etc.).</p>
<p>Typically, the factors are heuristic portfolios constructed from simple rules like thresholding. For instance, firms below the 1/3 quantile in book-to-market are growth firms and those above the 2/3 quantile are the value firms. A value factor can then be defined by the long-short portfolio of these two sets, with uniform weights. Note that <span class="citation">Fama and French (<a href="solutions-to-exercises.html#ref-fama1993common">1993</a>)</span> use a more complex approach which takes market capitalization into account both in the weighting scheme and in the composition of the portfolios.</p>
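<p>The thresholding rule described above can be sketched in a few lines for a single period. This is the simple equally weighted version, not the size-adjusted construction of Fama and French (1993); the function name and the toy numbers are hypothetical.</p>

```python
import numpy as np

def long_short_value_factor(book_to_market, returns):
    """One-period value factor: equally weighted value minus growth.

    Growth firms lie below the 1/3 quantile of book-to-market,
    value firms lie above the 2/3 quantile.
    """
    low, high = np.quantile(book_to_market, [1 / 3, 2 / 3])
    growth = book_to_market < low
    value = book_to_market > high
    return returns[value].mean() - returns[growth].mean()

# Toy cross-section of nine firms (made-up numbers).
book_to_market = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
returns = np.array([0.01, 0.02, 0.03, 0.00, 0.00, 0.00, 0.04, 0.05, 0.06])

factor_return = long_short_value_factor(book_to_market, returns)
print("value factor return:", factor_return)
```

<p>Here the three firms with the lowest ratios form the growth leg (average return of 2%) and the three with the highest ratios form the value leg (average return of 5%), so the factor return for the period is close to 3%. Repeating the computation at every date yields the time series of the factor.</p>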
<p>One of the advances enabled by machine learning is to automate the construction of the factors. This is, for instance, the approach of <span class="citation">Feng, Polson, and Xu (<a href="solutions-to-exercises.html#ref-feng2019deep">2019</a>)</span>. Instead of building the factors heuristically, the authors optimize the construction to maximize the fit in the cross-section of returns. The optimization is performed via a relatively deep feed-forward neural network and the feature space is lagged so that the relationship is indeed predictive, as in Equation <a href="factor.html#eq:genML">(3.6)</a>. Theoretically, the resulting factors help explain a substantially larger proportion of the in-sample variance in the returns. The prediction ability of the model depends on how well it generalizes out-of-sample.</p>
<p>A third approach is that of <span class="citation">Kelly, Pruitt, and Su (<a href="solutions-to-exercises.html#ref-kelly2019characteristics">2019</a>)</span> (though the statistical treatment is not machine learning per se).<a class="footnote-ref" tabindex="0" data-toggle="popover" data-content='&lt;p&gt;In the same spirit, see also &lt;span class="citation"&gt;Lettau and Pelger (&lt;a href="solutions-to-exercises.html#ref-lettau2018estimating"&gt;2020a&lt;/a&gt;)&lt;/span&gt; and &lt;span class="citation"&gt;Lettau and Pelger (&lt;a href="solutions-to-exercises.html#ref-lettau2018factors"&gt;2020b&lt;/a&gt;)&lt;/span&gt;.&lt;/p&gt;'><sup>10</sup></a> Their idea is the opposite: factors are latent (unobserved) and it is the betas (loadings) that depend on the characteristics. This allows many degrees of freedom because in <span class="math inline">\(r_{t,n}=\alpha_n+(\boldsymbol{\beta}_{t,n}(\textbf{x}_{t-1,n}))'\textbf{f}_t+\epsilon_{t,n},\)</span>
only the characteristics <span class="math inline">\(\textbf{x}_{t-1,n}\)</span> are known and both the factors <span class="math inline">\(\textbf{f}_t\)</span> and the functional forms <span class="math inline">\(\boldsymbol{\beta}_{t,n}(\cdot)\)</span> must be estimated. In their article, <span class="citation">Kelly, Pruitt, and Su (<a href="solutions-to-exercises.html#ref-kelly2019characteristics">2019</a>)</span> work with a linear form, which is naturally more tractable.</p>
<p>Lastly, a fourth approach (introduced in <span class="citation">Gu, Kelly, and Xiu (<a href="solutions-to-exercises.html#ref-gu2019autoencoder">2020a</a>)</span>) goes even further and combines two neural network architectures. The first neural network takes characteristics <span class="math inline">\(\textbf{x}_{t-1}\)</span> as inputs and generates factor loadings <span class="math inline">\(\boldsymbol{\beta}_{t-1}(\textbf{x}_{t-1})\)</span>. The second network transforms returns <span class="math inline">\(\textbf{r}_t\)</span> into factor values <span class="math inline">\(\textbf{f}_t(\textbf{r}_t)\)</span> (as in <span class="citation">Feng, Polson, and Xu (<a href="solutions-to-exercises.html#ref-feng2019deep">2019</a>)</span>). The aggregate model can then be written:
<span class="math display" id="eq:AEearly">\[\begin{equation}
\tag{3.8}
\textbf{r}_t=\boldsymbol{\beta}_{t-1}(\textbf{x}_{t-1})'\textbf{f}_t(\textbf{r}_t)+\boldsymbol{\epsilon}_t.
\end{equation}\]</span></p>
<p>The above specification is quite special because the output (on the l.h.s.) is also present as input (on the r.h.s.). In machine learning, autoencoders (see Section <a href="NN.html#autoencoders">7.7.2</a>) share the same property. Their aim, just like in principal component analysis, is to find a parsimonious nonlinear representation of a dataset (in this case, returns). In Equation <a href="factor.html#eq:AEearly">(3.8)</a>, the input is <span class="math inline">\(\textbf{r}_t\)</span> and the output function is <span class="math inline">\(\boldsymbol{\beta}_{t-1}(\textbf{x}_{t-1})'\textbf{f}_t(\textbf{r}_t)\)</span>. The aim is to minimize the difference between the two, just as in any regression-like model.</p>
<p>Autoencoders are neural networks which have outputs as close as possible to the inputs with an objective of dimensional reduction. The innovation in <span class="citation">Gu, Kelly, and Xiu (<a href="solutions-to-exercises.html#ref-gu2019autoencoder">2020a</a>)</span> is that the pure autoencoder part is merged with a vanilla perceptron used to model the loadings. The structure of the neural network is summarized below.</p>
<p><span class="math display">\[\left. \begin{array}{rl}
\text{returns } (\textbf{r}_t) &amp; \overset{NN_1}{\longrightarrow} \quad \text{ factors } (\textbf{f}_t=NN_1(\textbf{r}_t)) \\
\text{characteristics } (\textbf{x}_{t-1}) &amp; \overset{NN_2}{\longrightarrow} \quad \text{ loadings } (\boldsymbol{\beta}_{t-1}=NN_2(\textbf{x}_{t-1}))
\end{array} \right\} \longrightarrow \text{ returns } (\textbf{r}_t)\]</span></p>
<p>A simple autoencoder would consist of only the first line of the model. This specification is discussed in more detail in Section <a href="NN.html#autoencoders">7.7.2</a>.</p>
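<p>To make the two-branch structure concrete, here is a minimal sketch of a single forward pass through Equation (3.8). It is not the architecture of Gu, Kelly, and Xiu (2020a): the layer sizes, the linear factor network, the one-hidden-layer ReLU perceptron for the loadings and all variable names are assumptions, and the weights are random (untrained). A training loop would adjust them to minimize the reconstruction loss computed at the end.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n_assets, n_char, n_factors, n_hidden = 8, 5, 3, 4

r_t = rng.normal(0.0, 0.05, size=n_assets)      # returns at date t
x_prev = rng.normal(size=(n_assets, n_char))    # characteristics at date t - 1

# NN_1: maps the vector of returns to a small number of factors (linear here).
w_factor = rng.normal(scale=0.1, size=(n_factors, n_assets))

def nn1(returns):
    return w_factor @ returns                    # shape (n_factors,)

# NN_2: perceptron applied asset by asset, mapping characteristics to loadings.
w_hidden = rng.normal(scale=0.1, size=(n_char, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=(n_hidden, n_factors))
b_out = np.zeros(n_factors)

def nn2(chars):
    hidden = np.maximum(chars @ w_hidden + b_hidden, 0.0)   # ReLU activation
    return hidden @ w_out + b_out                # shape (n_assets, n_factors)

f_t = nn1(r_t)                                   # factors
beta_prev = nn2(x_prev)                          # loadings, one row per asset
r_hat = beta_prev @ f_t                          # reconstructed returns, as in (3.8)
loss = np.mean((r_t - r_hat) ** 2)               # quantity a training loop would minimize
print("reconstruction loss:", loss)
```

<p>Because the number of factors is smaller than the number of assets, the first branch acts as the bottleneck of the autoencoder, while the second branch lets the loadings vary with the lagged characteristics.</p>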
<p>As a conclusion of this chapter, it appears undeniable that the intersection between the two fields of asset pricing and machine learning offers a rich variety of applications. The literature is already extensive and it is often hard to disentangle the noise from the great ideas in the continuous flow of publications on these topics. Practice and implementation are the only way forward to extricate value from hype. This is especially true because agents often tend to overestimate the role of factors in the allocation decision process of real-world investors (see <span class="citation">Alex Chinco, Hartzmark, and Sussman (<a href="solutions-to-exercises.html#ref-chinco2019risk">2019</a>)</span> and <span class="citation">Castaneda and Sabat (<a href="solutions-to-exercises.html#ref-castaneda2019microfounding">2019</a>)</span>).</p>
</div>
</div>
<div id="coding-exercises" class="section level2" number="3.6">
<h2>
<span class="header-section-number">3.6</span> Coding exercises<a class="anchor" aria-label="anchor" href="#coding-exercises"><i class="fas fa-link"></i></a>
</h2>
<ol style="list-style-type: decimal">
<li>Compute annual returns of the growth versus value portfolios, that is, the average return of firms with above-median versus below-median price-to-book ratio (the variable is called <strong>Pb</strong> in the dataset).<br>
</li>
<li>Same exercise, but compute the monthly returns and plot the value (through time) of the corresponding portfolios.<br>
</li>
<li>Instead of a unique threshold, compute simple sorted portfolios based on quartiles of market capitalization. Compute their annual returns and plot them.</li>
</ol>
</div>
</div>

  <div class="chapter-nav">
<div class="prev"><a href="intro.html"><span class="header-section-number">2</span> Introduction</a></div>
<div class="next"><a href="Data.html"><span class="header-section-number">4</span> Data preprocessing</a></div>
</div></main><div class="col-md-3 col-lg-2 d-none d-md-block sidebar sidebar-chapter">
    <nav id="toc" data-toggle="toc" aria-label="On this page"><h2>On this page</h2>
      <ul class="nav navbar-nav">
<li><a class="nav-link" href="#factor"><span class="header-section-number">3</span> Factor investing and asset pricing anomalies</a></li>
<li><a class="nav-link" href="#introduction"><span class="header-section-number">3.1</span> Introduction</a></li>
<li>
<a class="nav-link" href="#detecting-anomalies"><span class="header-section-number">3.2</span> Detecting anomalies</a><ul class="nav navbar-nav">
<li><a class="nav-link" href="#challenges"><span class="header-section-number">3.2.1</span> Challenges</a></li>
<li><a class="nav-link" href="#simple-portfolio-sorts"><span class="header-section-number">3.2.2</span> Simple portfolio sorts </a></li>
<li><a class="nav-link" href="#factors"><span class="header-section-number">3.2.3</span> Factors</a></li>
<li><a class="nav-link" href="#predictive-regressions-sorts-and-p-value-issues"><span class="header-section-number">3.2.4</span> Predictive regressions, sorts, and p-value issues</a></li>
<li><a class="nav-link" href="#fama-macbeth-regressions"><span class="header-section-number">3.2.5</span> Fama-Macbeth regressions</a></li>
<li><a class="nav-link" href="#factor-competition"><span class="header-section-number">3.2.6</span> Factor competition</a></li>
<li><a class="nav-link" href="#advanced-techniques"><span class="header-section-number">3.2.7</span> Advanced techniques</a></li>
</ul>
</li>
<li><a class="nav-link" href="#factors-or-characteristics"><span class="header-section-number">3.3</span> Factors or characteristics?</a></li>
<li>
<a class="nav-link" href="#hot-topics-momentum-timing-and-esg"><span class="header-section-number">3.4</span> Hot topics: momentum, timing and ESG</a><ul class="nav navbar-nav">
<li><a class="nav-link" href="#factor-momentum"><span class="header-section-number">3.4.1</span> Factor momentum</a></li>
<li><a class="nav-link" href="#factor-timing"><span class="header-section-number">3.4.2</span> Factor timing</a></li>
<li><a class="nav-link" href="#the-green-factors"><span class="header-section-number">3.4.3</span> The green factors</a></li>
</ul>
</li>
<li>
<a class="nav-link" href="#the-links-with-machine-learning"><span class="header-section-number">3.5</span> The links with machine learning</a><ul class="nav navbar-nav">
<li><a class="nav-link" href="#a-short-list-of-recent-references"><span class="header-section-number">3.5.1</span> A short list of recent references</a></li>
<li><a class="nav-link" href="#explicit-connections-with-asset-pricing-models"><span class="header-section-number">3.5.2</span> Explicit connections with asset pricing models</a></li>
</ul>
</li>
<li><a class="nav-link" href="#coding-exercises"><span class="header-section-number">3.6</span> Coding exercises</a></li>
</ul>

      <div class="book-extra">
        <ul class="list-unstyled">
          
        </ul>
</div>
    </nav>
</div>

</div>
</div> <!-- .container -->

<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

  <div class="col-12 col-md-6 mt-3">
    <p>"<strong>Machine Learning for Factor Investing</strong>" was written by Guillaume Coqueret and Tony Guida. It was last built on 2023-07-17.</p>
  </div>

  <div class="col-12 col-md-6 mt-3">
    <p>This book was built by the <a class="text-light" href="https://bookdown.org">bookdown</a> R package.</p>
  </div>

</div></div>
</footer><!-- dynamically load mathjax for compatibility with self-contained --><script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    var src = "true";
    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
    if (location.protocol !== "file:")
      if (/^https?:/.test(src))
        src = src.replace(/^https?:/, '');
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script><script type="text/x-mathjax-config">const popovers = document.querySelectorAll('a.footnote-ref[data-toggle="popover"]');
for (let popover of popovers) {
  const div = document.createElement('div');
  div.setAttribute('style', 'position: absolute; top: 0; left: 0; width: 0; height: 0; overflow: hidden; visibility: hidden;');
  div.innerHTML = popover.getAttribute('data-content');

  var has_math = div.querySelector("span.math");
  if (has_math) {
    document.body.appendChild(div);
    MathJax.Hub.Queue(["Typeset", MathJax.Hub, div]);
    MathJax.Hub.Queue(function() {
      popover.setAttribute('data-content', div.innerHTML);
      document.body.removeChild(div);
    })
  }
}
</script>
</body>
</html>