Python Progress

A quick update on my Python coding progress.

I am proud to say that I am actually writing code in Python that actually does stuff. Boy, starting something new is time consuming: the most mundane tasks, which one could do in Excel in a flash, take absolute hours in a new language.

I also constantly wonder why I am doing this when I already know how to code many of these functions in R.

After a couple of weeks of learning I have accumulated some knowledge, which was further enhanced by the arrival of a few more Python textbooks last week. In summary, I am fast falling in love with the pandas data library, which is like learning a whole new language on its own. Pandas has one of the most comprehensive sets of data manipulation functions a data scientist could dream of. I love it, and I am sure that if I stay with the learning I will be doing really cool things with the data I come across in the weeks and months to come.

So far I haven’t come across a Python library like PerformanceAnalytics in R; once I find one, I will feel complete. I believe there is a way to pass R functionality into a Python project via a wrapper, and I may research this later today. However, my first prize is to stay completely Python native, so anyone with insights into a PerformanceAnalytics-type library in Python, please let me know.

Alpha Stable Distributions

I am doing some work with return distributions. In a previous gig we did a lot of work on this subject, and I was really encouraged by the track of work we were following.

A quick recap: we know that trading returns do not typically follow a normal (Gaussian) distribution, yet most models in modern finance still assume one. It’s typical human nature: the normal distribution gives us a nice, elegant, quick solution most of the time. Because a more accurate solution is more difficult to figure out, we would rather dismiss the fact that bad things happen more often than we anticipate, all in the name of progress.

We were determined to find a more realistic solution, in keeping with my previous post: keeping it real 🙂.

In the two charts I illustrate how our maximum likelihood estimate (MLE) model using alpha-stable (Lévy) distributions does a much better job than the normal distribution model shown by the blue line.
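To make the "bad things happen more often than we anticipate" point concrete, here is a minimal Python sketch using only the standard library. It does not fit an alpha-stable distribution; it only compares how often moves beyond three standard deviations occur in a sample against what a Gaussian would predict. The return series is synthetic and made up for illustration, and the function names are my own.

```python
import random
from statistics import NormalDist, mean, stdev


def tail_frequency(returns, k=3.0):
    """Share of observations more than k sample standard deviations from the mean."""
    mu, sigma = mean(returns), stdev(returns)
    extreme = sum(1 for r in returns if abs(r - mu) > k * sigma)
    return extreme / len(returns)


def normal_tail_probability(k=3.0):
    """Probability that a Gaussian variable lands more than k sigmas from its mean."""
    return 2.0 * (1.0 - NormalDist().cdf(k))


# Synthetic, made-up "returns": mostly calm days plus occasional turbulent ones.
rng = random.Random(42)
returns = [rng.gauss(0.0, 0.03 if rng.random() < 0.05 else 0.008) for _ in range(5000)]

observed = tail_frequency(returns)
expected = normal_tail_probability()
print(f"observed 3-sigma share: {observed:.4f}, Gaussian prediction: {expected:.4f}")
```

With real return data in place of the synthetic list, a gap between the two numbers is the fat-tail effect that motivates a heavier-tailed model.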

fitted distribution

fitted distribution 2

Quantitative Ramblings

This post is an opportunity to play with a few ideas buzzing in my head this afternoon.

Here is some data on the hedge fund industry taken from the EDHEC dataset. See website for more details:

Here is an overview of monthly performance from 1 Jan 1997 – 31 Dec 2014.



On a risk adjusted basis the Equity Market Neutral returns are the clear star performer.






Now let us look at some of these returns through the lens of a normal distribution.

Rplot16 Rplot15 Rplot14 Rplot13





There is a very strong argument that the markets are random. If this is the case, fund managers should not be able to demonstrate any consistent pattern of returns. One way to determine whether performance persists is to test for auto-correlation. In essence, auto-correlation measures the correlation of a time series with lagged copies of itself. Naturally the 0 lag has a perfect correlation of 1 (100%); what you are looking for is whether the other lags produce a statistically significant correlation by piercing the horizontal dashed lines. If a statistically significant auto-correlation only appears after many lags, I think we can dismiss it as spurious; we are looking for significance after a few lags.

I wasn’t surprised to find that the Equity Market Neutral strategy produced auto-correlation. L/S Equity was also able to produce auto-correlation.
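The lag test described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the return series is made up, the function names are hypothetical, and the dashed lines are approximated by the usual white-noise band of plus or minus 1.96 divided by the square root of the number of observations.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (lag 0 returns 1.0)."""
    n = len(series)
    mu = sum(series) / n
    denom = sum((x - mu) ** 2 for x in series)
    numer = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n))
    return numer / denom


def significance_band(n, z=1.96):
    """Approximate 95% band for a white-noise series: +/- z / sqrt(n)."""
    return z / math.sqrt(n)


# Made-up monthly returns, for illustration only.
returns = [0.010, 0.012, 0.011, 0.013, 0.009, -0.002,
           -0.004, -0.003, -0.005, -0.001, 0.008, 0.010]

band = significance_band(len(returns))
for lag in range(0, 4):
    rho = autocorrelation(returns, lag)
    flag = "significant" if lag > 0 and abs(rho) > band else ""
    print(lag, round(rho, 3), flag)
```

A lag whose value falls outside the band is the numeric equivalent of a bar piercing the dashed line on the chart.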


Rplot18 Rplot19 Rplot20


To conclude, this post is just me rambling along while watching some news. I will do more in-depth analysis some time in the future, but I think the data presented makes a strong argument for managers who try to take market direction out of their trading. I think this makes a lot of sense: if forecasting the markets is as random as many suggest, then a manager's best chance of producing alpha is to see the investing world within the relative scope of a market neutral environment.


Momentum Traders

We all hear that many institutions follow the momentum strategy where you go long when the 50 day moving average is above the 200 day moving average. As you will see below, this has been a pretty effective strategy since 1993, using daily data applied to the S&P 500 (via SPY). For your information, the strategy is still long despite the recent volatility.

Here is a chart showing the moving averages; the yellow is the 50 day and the red is the 200 day.



Here are the backtest results:

2015-01-23_1313 Rplot043

For those wanting to see the R code that generates these charts and stats, here it is:

# load the packages used below
library(quantmod)
library(PerformanceAnalytics)
# get the data and fill out the moving averages
getSymbols('SPY', from='1950-01-01')
SPY$ma200 <- SMA(Cl(SPY), 200)
SPY$ma50 <- SMA(Cl(SPY), 50)
# let's look at it from 1990 to 2015
spy <- SPY['1990/2015']
# our baseline, unfiltered results
ret <- ROC(Cl(spy))
# our comparison, filtered result: long (1) when the 50 day MA was above the 200 day MA the day before
ma_sig <- Lag(ifelse(spy$ma50 > spy$ma200, 1, 0))
ma_ret <- ret * ma_sig
golden <- cbind(ma_ret, ret)
colnames(golden) <- c('GoldCross','Buy&Hold')
# plot to visually see the actual moving averages
chartSeries(spy,
            type = "line",
            name = "Moving Average : Golden Cross",
            TA = c(addSMA(50, col = 'yellow'), addSMA(200, col = 'red')))
# let's see what the latest signals are, 1 being a buy signal
tail(ma_sig)
# performance stats and chart, using the same daily risk-free rate in both
table.AnnualizedReturns(golden, Rf = 0.02/252)
charts.PerformanceSummary(golden, Rf = 0.02/252, main="Golden Cross", geometric=FALSE)
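For anyone who would rather follow the logic in Python, here is a rough standard-library sketch of the same idea: compute two simple moving averages, go long when the fast one is above the slow one, and lag the signal by one day so that today's position only uses yesterday's information. The function names and the toy price series are my own illustrative assumptions, and unlike the R code this version stays flat (0) rather than NA while the moving averages are still warming up.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_signal(prices, fast=50, slow=200):
    """1 when yesterday's fast SMA was above yesterday's slow SMA, else 0."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    raw = [1 if f is not None and s is not None and f > s else 0
           for f, s in zip(fast_ma, slow_ma)]
    return [0] + raw[:-1]  # lag by one day to avoid look-ahead


# Toy example: a steadily rising price series with short windows.
prices = [float(p) for p in range(1, 11)]
signal = crossover_signal(prices, fast=2, slow=4)
returns = [0.0] + [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
strategy = [s * r for s, r in zip(signal, returns)]
print(signal)
```

The strategy return on each day is simply the market return multiplied by the lagged signal, which mirrors the `ROC(...) * ma_sig` step in the R listing.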


Volatility Clustering

In case you thought volatility is isolated, the charts below give you an idea of how volatility leads to more volatility – duh! Knowing this doesn’t license you to print money; life would be way too boring if that were the case. The problem is not knowing that volatility clusters; it is that we have no idea how long we will stay in a low or high volatility environment, or how large the swings will be. In essence, the billion dollar question is identifying when the regime shifts.

There is a ton of research in this area, and I have to confess I am attracted to hidden Markov models over the shorter time horizons. Later in the year I hope to share some of my research into this subject. Let’s get back to the clustering:

In the top chart I look at the S&P 500, counting moves of 2% or more over a rolling 100 days; the bottom chart shows the Shanghai Stock Index, which took a 7% bath yesterday and, with GDP numbers coming out later today, promises to provide further fireworks. What I can say from comparing the two charts is that China is starting to look like it's entering a more volatile period ahead of the US.

For those interested in the R code, I have included it as well. (Hat tip to John Hussman for the graphic idea. I wish my code were more elegant, but I think it does the job; ignore the errors in my labelling.)



Shanghai Stock Index

Shanghai Stock Index

# load the package used below
library(quantmod)
# get the data
getSymbols('^SSEC', from='1990-01-01')
# let's look at it from 1990 to 2015
#spy <- SPY['1990/2015']
Shang <- SSEC['1990/2015']
# daily returns
ret <- ROC(Cl(Shang))
# flag drops of more than 2%, then count them over a rolling 100 days
filter.d <- Lag(ifelse(ret < -0.02, 1, 0))
drops <- rollapply(filter.d==1, 100, sum)
# flag gains of more than 2% (note the > sign), then count them the same way
filter.g <- Lag(ifelse(ret > 0.02, 1, 0))
gain <- rollapply(filter.g==1, 100, sum)
plot(drops, main = "Drop and Gain Clustering", sub = "sum of 2% movements over 100 prior days", ylab ="drops")
plot(gain, main = "Drop and Gain Clustering", labels = FALSE, col = "red")
axis(side = 4)
mtext("gains", side = 4)
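The counting step is simple enough to sketch in plain Python as well. This is a minimal illustration, not a port of the R listing: the function names and the tiny return series are made up, and it counts moves in the trailing window including the current day, whereas the R code lags the flags by one day first.

```python
def rolling_count(flags, window):
    """Number of True flags in each trailing window; None until the window is full."""
    out = []
    for i in range(len(flags)):
        if i < window - 1:
            out.append(None)
        else:
            out.append(sum(flags[i - window + 1 : i + 1]))
    return out


def big_move_clusters(returns, threshold=0.02, window=100):
    """Rolling counts of drops below -threshold and gains above +threshold."""
    drops = [r < -threshold for r in returns]
    gains = [r > threshold for r in returns]
    return rolling_count(drops, window), rolling_count(gains, window)


# Made-up daily returns and a short window, for illustration only.
sample = [0.03, -0.025, 0.001, -0.03, 0.021]
drop_counts, gain_counts = big_move_clusters(sample, threshold=0.02, window=3)
print(drop_counts, gain_counts)
```

Plotting the two count series against time gives the clustering picture: stretches where the counts stay high are the volatile regimes.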


Shiller PE Model

I am on a roll, so I thought I would pull out my pride and joy, the Shiller PE model, and see what it has to say.


Now we are talking: you can see significant outperformance. Here we are looking at more than 114 years' worth of data. I guess that would include all cycles 😉

Before I get too carried away, I am a bit concerned about the date structuring over the last year. I have used data from Quandl’s (MULTPL) dataset and I seem to be getting more than one signal per month. According to the model it is currently invested and long.


I am going to post the code below, but be warned that I plan to do a little bit of work in the near future to double check that I am manipulating the dates correctly. Enjoy if this is your thing.

# load the packages used below
library(quantmod)
library(PerformanceAnalytics)
library(grid)       # provides grid.newpage
library(gridExtra)  # provides grid.table
# I am pulling the data from Quandl and the MULTPL dataset.
multpl <- read.csv('', colClasses=c('Date'='Date'))
snp <- read.csv('', colClasses=c('Date'='Date'))
# build an xts series of S&P 500 levels and its discrete returns
date <- snp$Date
values <- snp[,2]
snp.obj <- as.xts(values, order.by = as.Date(date, "%d/%m/%Y"))
snprets <- ROC(snp.obj, type = "discrete", n = 1)
# build an xts series of the Shiller PE
date <- multpl$Date
values <- multpl[,2]
PE.obj <- as.xts(values, order.by = as.Date(date, "%d/%m/%Y"))
Shiller <- merge(snp.obj, PE.obj, snprets)
Shiller.sub <- Shiller['1900-01-01::']
colnames(Shiller.sub) <- c('S&P500','Shiller PE','S&P500 returns')
# signal: 1 when the PE is above its 48-period rolling mean plus one rolling sd, lagged one period
pe.mean <- rollapply(PE.obj, 48, mean)
sdsig <- rollapply(PE.obj, 48, sd) + pe.mean
over <- Lag(ifelse(PE.obj > sdsig, 1, 0))
pe_ret <- snprets * over
PEtimer <- cbind(pe_ret, snprets)
colnames(PEtimer) <- c('PE-Timer','Buy&Hold')
# draw the stats table, then print the stats and the performance chart
grid.newpage(recording = FALSE)
grid.table(table.AnnualizedReturns(PEtimer, Rf = 0))
table.AnnualizedReturns(PEtimer, Rf = 0)
charts.PerformanceSummary(PEtimer, Rf = 0, main="Shiller PE Timer", geometric=FALSE)



There are a number of sites that market their signals as great indicators for outperforming the market. Let me highlight how careful one needs to be when looking at backtests.

In the strategy I am about to show you, we create a ratio between the S&P 500 and the Dow and trigger a signal to buy the market when the current ratio is at or above its rolling mean. I will include the code at the bottom for those who want to understand the details, but the table and chart will illustrate my point. A further important point that I must emphasize, and will continue to highlight, is that I look at risk-adjusted returns as my proxy for out-performance.

So let's begin:
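The signal itself can be sketched in a few lines of standard-library Python. Treat this as an illustration under assumptions: the function names and the toy index levels are made up, the 20-period window mirrors the R code further down, and the signal is lagged one period so that each position only uses the previous period's ratio.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` observations; None until the window is full."""
    return [
        sum(values[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


def ratio_signal(index_a, index_b, window=20):
    """1 when the previous period's ratio a/b was at or above its rolling mean, else 0."""
    ratio = [a / b for a, b in zip(index_a, index_b)]
    avg = rolling_mean(ratio, window)
    raw = [1 if m is not None and r >= m else 0 for r, m in zip(ratio, avg)]
    return [0] + raw[:-1]  # lag by one period to avoid look-ahead


# Made-up weekly index levels and a short window, for illustration only.
spx = [100.0, 102.0, 104.0, 103.0, 101.0, 105.0]
dji = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
print(ratio_signal(spx, dji, window=2))
```

Multiplying the market's periodic returns by this 0/1 series gives the timed strategy's returns, which can then be compared with buy and hold on a risk-adjusted basis.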

Rplot06 2015-01-15_1532

Yes Sir, you beauty! We have here a 24 year backtest where our model handily outperforms buy and hold, with a Sharpe Ratio of 0.32 vs 0.22. So should we bet the farm on this baby? Not so fast, I say. Let's look at this strategy over some more data: in the table below we look at performance from 1950, a lengthy 64 years.


What we see here is underperformance, so it is very important when considering a model to ensure that the starting date isn’t cherry picked. In this illustration there are very few parameters, and we only tweaked the date. Many people pushing automated models love to “curve-fit” parameters to satisfy the backtest, with no basis in reality.
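One cheap guard against a cherry-picked start date is to recompute the risk-adjusted comparison from several starting points and see whether the conclusion survives. Here is a minimal Python sketch of that check, assuming a simple mean-over-standard-deviation Sharpe ratio scaled by the square root of the number of periods per year. The function names and both return series are made up for illustration.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(returns, periods_per_year=52, rf_per_period=0.0):
    """Mean excess return over its standard deviation, scaled to a yearly figure."""
    excess = [r - rf_per_period for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def sharpe_by_start(strategy, benchmark, start_indices, periods_per_year=52):
    """Map each start index to (strategy Sharpe, benchmark Sharpe) from that point on."""
    return {
        s: (annualized_sharpe(strategy[s:], periods_per_year),
            annualized_sharpe(benchmark[s:], periods_per_year))
        for s in start_indices
    }


# Made-up weekly returns: the strategy looks great only if you skip the early period.
strategy = [-0.02, -0.03, -0.01, -0.02, 0.01, 0.02, 0.015, 0.01]
benchmark = [0.01, 0.0, 0.02, 0.01, 0.005, 0.0, 0.01, -0.005]

results = sharpe_by_start(strategy, benchmark, start_indices=[0, 4])
for start, (strat, bench) in results.items():
    print(start, round(strat, 2), round(bench, 2))
```

If the ranking of strategy versus benchmark flips as the start index moves, as it does in this toy data, the backtest result depends on the chosen window rather than on the rule.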

Here is the R code for those that are interested:

# load the packages used below
library(quantmod)
library(PerformanceAnalytics)
# weekly S&P 500 levels (adjusted close) and discrete returns
getSymbols("^GSPC", from= "1900-01-01")
sp500.weekly <- GSPC[endpoints(GSPC, "weeks"),6]
sp500rets <- ROC(sp500.weekly, type = "discrete", n = 1)
# weekly Dow levels and discrete returns from a CSV file
DJ <- read.csv('', colClasses=c('Date'='Date'))
date <- DJ$Date
values <- DJ[,2]
DJ_xts <- as.xts(values, order.by = as.Date(date, "%d/%m/%Y"))
dj.weekly <- DJ_xts[endpoints(DJ_xts, "weeks"),1]
djrets <- ROC(dj.weekly, type = "discrete", n = 1)
# ratio of the S&P 500 to the Dow against its 20-week rolling mean
data <- merge(sp500.weekly, dj.weekly)
data.sub <- data['1950-02-05::']
ratio <- data.sub[,1]/data.sub[,2]
ave.ratio <- rollapply(ratio, 20, mean)
lead.lag <- ifelse(ratio >= ave.ratio, "Lead", "Lag")
# filtered results investing in S&P500 with the signal
ma_sig <- Lag(ifelse(lead.lag=="Lead", 1, 0))
ma_ret <- sp500rets * ma_sig
dowtimer <- cbind(ma_ret, sp500rets)
colnames(dowtimer) <- c('SPX/DJI-Timer','Buy&Hold')
table.AnnualizedReturns(dowtimer, Rf = 0.04/52)
charts.PerformanceSummary(dowtimer, Rf = 0.04/52, main="SPX/DJI Timer", geometric=FALSE)
