RFinance_2017_notes.txt



## talking points

Machine learning from a very high level, no SVM or neural nets.
How applying ensembles and shrinkage can improve model performance.
How to build models less sensitive to lookback window.
One of the crucial meta-parameters is the length of the rolling lookback window.  
Backtesting the lookback window is in itself a form of overfitting.
Simple coin flipping model: identify skilled manager.
Selecting From Among Multiple Managers: p-hacking.
Effect of Lookback Window Length: in reality we don't have the luxury of increasing the lookback window.
On the plus side, we don't need a very high confidence level to trade successfully.
Simulating Managers with Time-dependent Skill: red manager has higher skill, but underperforms by chance.
Trend-following: hard to Select Best Manager From Previous Period, longer lookback window doesn't help.
Ensemble: Select Top Two Managers From Previous Period works much better, because betting that best manager is among top two is easier.
Selecting Top Two Managers is really an ensemble of two models.
An ensemble shrinks the weights toward zero.
? We suffer from bias (we don't choose the best strategy), but we gain from lower variance.
Long-short Ensemble: is momentum, or betting on relative outperformance.
Conclusion: whatever machine learning model you're using, it's much easier to bet on performance of ensemble or on relative outperformance.


## temp

Machine learning has been successfully applied to speech and image recognition, but has faced challenges in automated trading.  For example, forecasting models which rely on historical prices perform very poorly, because asset returns have a very low signal-to-noise ratio.  As a result, model forecasts have large standard errors, making them difficult to apply in practice.  The problem is often exacerbated by the high dimensionality of many financial models, for example portfolio optimization.

We study several techniques which can help to reduce the problem of large standard errors, such as parameter regularization (shrinkage), dimensionality reduction, and ensembles of models.  We use bootstrap simulation to estimate the standard errors, and to quantify the effect of these techniques. 

Performance of strategies is time-dependent
Switching between strategies

False Discovery Rate Problem {.smaller}  
- p-hacking or false-discovery rate problem:
What is the probability of obtaining a very large number of heads?
plot probability as function of number of coins

- If we Flip Many Coins Then We Will Always Find One That We Think is Biased

- How long should the lookback window be, to avoid training a model on noise?  

In practice we're faced with a much more difficult problem of choosing among many different strategies.
Let's imagine we flip many coins.
We Will Always Find One That We Think is Biased
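
A minimal sketch of the "many coins" effect, assuming fair coins and independent flips. The function name `prob_false_discovery` and the choice of 10 flips with a threshold of 8 heads are illustrative assumptions, not values from the talk. It computes the probability that at least one fair coin looks biased, as a function of the number of coins.

```r
# Probability that at least one of n_coins fair coins shows
# n_heads or more heads in n_flips flips (illustrative defaults)
prob_false_discovery <- function(n_coins, n_flips = 10, n_heads = 8) {
  # P(X >= n_heads) for a single fair coin
  p_single <- pbinom(n_heads - 1, size = n_flips, prob = 0.5, lower.tail = FALSE)
  # Complement of "no coin reaches the threshold"
  1 - (1 - p_single)^n_coins
}

n_coins <- 1:100
probs <- prob_false_discovery(n_coins)

# Plot probability as a function of the number of coins
if (interactive()) {
  plot(n_coins, probs, type = "l", xlab = "number of coins",
       ylab = "P(at least one coin looks biased)")
}
```

The probability rises toward 1 as more coins are flipped, which is the sense in which we will always find one that we think is biased.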

potential solution of false-discovery rate problem:
control false discoveries using the Bonferroni method or the Šidák correction  
this increases the adjusted p-values (equivalently, lowers the per-test significance threshold)
not a good solution
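
A short sketch of the two corrections, using only base R. The significance level of 5% and the example p-values are assumptions for illustration. One possible reading of "not a good solution" is that the per-test threshold shrinks roughly as one over the number of strategies, so even more data is needed to pass it.

```r
alpha <- 0.05                  # assumed family-wise significance level
n_tests <- c(1, 10, 100)       # number of strategies tested

# Per-test significance thresholds under each correction
bonferroni <- alpha / n_tests
sidak <- 1 - (1 - alpha)^(1 / n_tests)
thresholds <- cbind(n_tests, bonferroni, sidak)

# Equivalent view: adjust the p-values upward instead (base R p.adjust)
p_values <- c(0.001, 0.01, 0.04)   # hypothetical raw p-values
p_adjusted <- p.adjust(p_values, method = "bonferroni")
```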


We're allowed to select a fixed number of coins at random, and then to flip them together a fixed number of times.
How many coins should we select at random, in order to maximize our chances of selecting the biased coin?

another solution: increase the amount of data - use minutely data - but doesn't work

in real world return distributions are not stationary and drifts aren't static
time-varying drift (alpha)
simulate asset returns with static drifts plus a random noise.
Model with static drifts
Higher volatility requires more data
use $1\%$ daily vol and daily drift equal to $0.031\%$
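
A rough sketch of how much data the static-drift model needs, assuming i.i.d. normal daily returns, a two-sided 95% confidence level, and 252 trading days per year (all three are assumptions, as is the function name `min_sample_size`). The t-statistic of the mean return is `drift / vol * sqrt(n)`, so the required sample size grows with the square of the volatility.

```r
# Number of daily observations needed for the mean return to be
# significantly different from zero (assumes i.i.d. normal returns)
min_sample_size <- function(drift, vol, conf_level = 0.95) {
  z_score <- qnorm(1 - (1 - conf_level) / 2)
  (z_score * vol / drift)^2
}

vol <- 0.01          # 1% daily vol
drift <- 0.00031     # 0.031% daily drift
n_days <- min_sample_size(drift, vol)
n_years <- n_days / 252              # assumed 252 trading days per year
sharpe_annual <- drift / vol * sqrt(252)
```

With these inputs the requirement comes to roughly 4,000 daily observations, or about 16 years, and doubling the volatility quadruples it.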

simulate asset returns with time-dependent drifts plus a random noise.
bias-variance tradeoff
optimal length of the lookback interval
Higher volatility requires longer lookback interval
faster drift requires shorter lookback interval

use $1\%$ daily vol and daily drift equal to $0.031\%$
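
A simulation sketch of the bias-variance tradeoff in the lookback length. The sinusoidal drift, its amplitude (0.1% daily, larger than the value in the notes so that the effect is visible), the 5,000-day period, and the seed are all illustrative assumptions. The drift is estimated with a trailing mean over each window, and the mean squared error against the true drift is compared across windows.

```r
set.seed(1121)       # arbitrary seed, for reproducibility only
n_days <- 50000
vol <- 0.01          # 1% daily vol
amp <- 0.001         # assumed drift amplitude
period <- 5000       # assumed period of the drift cycle, in days

# Time-dependent drift plus random noise
drift <- amp * sin(2 * pi * (1:n_days) / period)
returns <- drift + vol * rnorm(n_days)

# Mean squared error of the trailing-mean drift estimate, per window
windows <- c(100, 250, 500, 1000, 2500)
mse <- sapply(windows, function(win) {
  est <- as.numeric(stats::filter(returns, rep(1 / win, win), sides = 1))
  mean((est - drift)^2, na.rm = TRUE)
})
names(mse) <- windows
```

Short windows suffer from variance (noise dominates) and long windows from bias (the estimate lags the drift), so the error is lowest at an intermediate window. Under this setup, raising `vol` should push the best window longer and shortening `period` should push it shorter.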

first conclusion is that we need to use minutely data
daily data isn't sufficient to switch between strategies

second conclusion is that we need to use shrinkage


## plan

+ [ ] Explain Probability of Backtest Overfitting by López de Prado.
+ [ ] Explain Minimum Backtest Length formula by López de Prado.
+ [x] Simulate several managers, with only one manager with skill.
+ [ ] The remaining managers underperform slightly, so that average performance is zero.
+ [x] Simulate several managers, with their skill changing over time.
+ [x] Plot the prices of three managers.
+ [x] Apply trend-following rule: select the best performing manager in previous period. 
+ [x] Does this strategy always outperform selecting a manager at random?
+ [ ] Demonstrate that out-of-sample pnl increases with the length of the lookback window, but only up to a point.
+ [ ] Demonstrate bias-variance tradeoff.
+ [ ] Demonstrate that out-of-sample performance decreases with greater number of managers.
+ [ ] Calculate the p-values in each period, and demonstrate that they are meaningless.
+ [ ] Apply ensemble rule: long top manager and short bottom manager.
+ [x] Demonstrate that ensemble lookback window profile is lower and flatter.
+ [x] Explain that ensemble is a form of shrinkage, because the sum of squared weights is smaller.
+ [ ] Apply momentum rule: long top 3 managers.
+ [ ] Apply momentum rule: long top 3 managers and short bottom 3 managers.
+ [ ] Demonstrate that momentum lookback window profile is lower and flatter.
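
A compact sketch of several plan items together: simulate managers with time-dependent skill, apply the trend-following rule (hold the best manager of the previous period), and compare it with the top-two ensemble. The skill model (slow out-of-phase sinusoidal drifts), all parameter values, and the names `select_weights`, `pnl_top1`, `pnl_top2` are assumptions for illustration, not the simulation used in the talk.

```r
set.seed(1121)        # arbitrary seed, for reproducibility only
n_managers <- 3
n_periods <- 400
look_back <- 250      # days per period (assumed)
n_days <- n_periods * look_back
vol <- 0.01           # 1% daily vol

# Assumed skill model: each manager's drift oscillates slowly, out of phase
time_idx <- 1:n_days
phases <- 2 * pi * (0:(n_managers - 1)) / n_managers
drifts <- sapply(phases, function(ph) 0.0005 * sin(2 * pi * time_idx / 5000 + ph))
returns <- drifts + vol * matrix(rnorm(n_days * n_managers), ncol = n_managers)

# Aggregate daily returns into one return per period and manager
period_rets <- rowsum(returns, rep(1:n_periods, each = look_back))

# Equal weights on the n_top managers with the best previous-period return
select_weights <- function(past, n_top) {
  weights <- numeric(length(past))
  weights[order(past, decreasing = TRUE)[1:n_top]] <- 1 / n_top
  weights
}
weights_top1 <- t(apply(period_rets[-n_periods, ], 1, select_weights, n_top = 1))
weights_top2 <- t(apply(period_rets[-n_periods, ], 1, select_weights, n_top = 2))

# Out-of-sample pnl: weights from period t applied to returns in period t+1
pnl_top1 <- rowSums(weights_top1 * period_rets[-1, ])
pnl_top2 <- rowSums(weights_top2 * period_rets[-1, ])

# Sum of squared weights: 1 for a single manager, 0.5 for the top-two ensemble
sum_sq <- c(top1 = sum(weights_top1[1, ]^2), top2 = sum(weights_top2[1, ]^2))
# Mean-to-sd ratio of the period pnls, as a rough performance summary
perf <- c(top1 = mean(pnl_top1) / sd(pnl_top1), top2 = mean(pnl_top2) / sd(pnl_top2))
```

The smaller sum of squared weights is the sense in which the ensemble acts as shrinkage. Whether `perf` favors the ensemble depends on the assumed parameters, so it is worth rerunning with different values of `look_back` and `n_managers`.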


## thoughts

add plots with distributions skilled and unskilled managers

In trading we don't need 95% confidence, just a small edge, with the odds slightly in our favor

Model with static drifts
Higher volatility requires more data

![rolling_windows](C:/Develop/R/lecture_slides/figure/rolling_windows.png)  

    smaller: yes

## Machine learning paradigm

- At each point we compare many strategies over the lookback window, and switch to the best performing strategy.  
- But backtesting is a black box.
- The length of the rolling lookback window is a meta-parameter, which determines how quickly the model adapts to new information.  
- In machine learning we can simulate the learning process using backtesting.  

## Analogy With Coin Flipping {.smaller}  

which strategy is best and we switch to that one 
backtesting paradigm
rolling window
insert backtest png
learn 

Let's start with a very simple analogy of coin flipping.
How Many Coin Flips are Needed to Decide Which Coin is Biased?

- How many years of performance data is needed to select a manager with skill?  

- What is the probability of selecting the skilled manager from among multiple managers?  

- What is the probability of correctly selecting a biased coin from a set of unbiased coins, after flipping the coins simultaneously $n$ times?  

- How much performance data is needed to select a manager with skill?  

- The challenge is that a manager without skill can have good performance, based on chance alone.

- If we Flip Many Coins Then We Will Always Find One That We Think is Biased

Let's apply this perspective to the problem of identifying active investment managers who can beat the market. Assume that for every such exceptional manager who can generate an excess return-to-risk ratio of 0.2 (after fees and against his benchmark), there are three other managers who have an expected Sharpe Ratio of 0. If anything, this may be optimistic given S&P Dow Jones reports only 1 in 10 active managers have been outperforming their benchmarks. How many 'flips' would you need to observe from all four managers, in parallel, to identify the good one with $95\%$ confidence? 

That's saying you would want to see 225 years of returns to be able to identify the good manager with $95\%$ confidence. You might be tempted to think, 'What if I observe the managers on a weekly basis, won't that give me 225 flips in about 4½ years?' Unfortunately, that won't help, because when we observe the managers on a more frequent basis,

We have a set of unbiased coins, except for a single biased one, which has a probability of heads greater than 50% (say 60%).  

To find the biased coin, we flip the coins simultaneously $n$ times, and select the coin that produces the most heads.  

If several coins tie for the most heads, then we select one of them at random. 

What is the probability of correctly selecting the biased coin, after flipping the coins simultaneously $n$ times?  

We're given an urn with unbiased coins, but one of the coins is biased (with a 60% probability of heads).  We want to find the biased coin, so we flip the coins simultaneously, and select the coin that produces the most heads.  

Plot probabilities of selecting the biased coin
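
A sketch of the exact calculation, assuming one biased coin with a 60% probability of heads among otherwise fair coins, and ties broken uniformly at random. The function name `prob_select` and its arguments are hypothetical. For each possible number of heads of the biased coin, it sums over the number of fair coins that tie with it, with all remaining fair coins strictly below.

```r
# Probability of correctly selecting the single biased coin after
# flipping n_coins coins n_flips times each (ties broken at random)
prob_select <- function(n_flips, n_coins, p_biased = 0.6) {
  heads <- 0:n_flips
  p_b <- dbinom(heads, n_flips, p_biased)   # biased coin shows k heads
  p_eq <- dbinom(heads, n_flips, 0.5)       # a fair coin ties at k heads
  p_lt <- pbinom(heads - 1, n_flips, 0.5)   # a fair coin shows fewer than k
  n_fair <- n_coins - 1
  ties <- 0:n_fair
  # Chance the biased coin is picked, given that it shows k heads
  p_win <- sapply(seq_along(heads), function(i) {
    sum(choose(n_fair, ties) * p_eq[i]^ties * p_lt[i]^(n_fair - ties) / (ties + 1))
  })
  sum(p_b * p_win)
}

# Probabilities for four coins, as a function of the number of flips
n_flips <- seq(10, 500, by = 10)
probs <- sapply(n_flips, prob_select, n_coins = 4)
if (interactive()) {
  plot(n_flips, probs, type = "l", xlab = "number of flips",
       ylab = "P(selecting the biased coin)")
}
```

With zero flips the function reduces to a random pick, `1 / n_coins`, which is a quick sanity check on the tie-breaking term.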

We're only allowed to select a fixed number of coins at random, and then to flip them together a fixed number of times.
How many coins should we select at random, in order to maximize our chances of selecting the biased coin?


You also know that the probability of getting either a biased or an unbiased coin is 50-50.  

but you don't know if it's biased or not, so you flip it many times to see if it produces more heads than tails.  

# Calculate probabilities of selecting the biased coin, after first selecting a given number of coins from a bin of 20 coins
# plot(num_coins*prob_s/20, t="l")

# Calculate probability of selecting the biased coin, when there's a tie in number of heads
# prob_tie <- function(num_coins, pro_b)
#   sum(choose(num_coins-1, 1:(num_coins-1)) * pro_b^(1:(num_coins-1)) * (1-pro_b)^((num_coins-2):0) / (2:num_coins))