# Strategy Replication – Evolutionary Optimization based on Financial Sentiment Data

Wow, I enjoyed replicating this neatly written paper by Ronald Hochreiter.
Ronald is an Assistant Professor at the Vienna University of Economics and Business (Institute for Statistics and Mathematics).

In his paper he applies evolutionary optimization techniques to compute optimal rule-based trading strategies based on financial sentiment data.

The evolutionary technique is a general Genetic Algorithm (GA).

The GA is a mathematical optimization algorithm that draws inspiration from the processes of biological evolution to breed solutions to problems. Each member of the population (genotype) encodes a solution (phenotype) to the problem. Evolution in the population of encodings is simulated by means of three evolutionary processes: selection, crossover and mutation.
Selection exploits information in the current population, concentrating interest on high-fitness solutions. Crossover and mutation perturb these solutions in an attempt to uncover better solutions. Mutation does this by introducing new gene values into the population, while crossover allows the recombination of fragments of existing solutions to create new ones.
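
To make the three operators concrete, here is a minimal Python sketch of a generic GA on a toy bit-string problem (maximise the number of 1s). It is not my R implementation and not the setup from the paper; the function names, tournament selection, elitism, and all parameter values are assumptions chosen for illustration.

```python
import random


def tournament(pop, fitness, rng):
    """Selection: pick two members at random and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(fitness, n_genes=20, pop_size=30, generations=50,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Toy GA over bit strings; returns the best member and the best-fitness history."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            if rng.random() < crossover_rate:
                # Crossover: recombine fragments of two existing solutions.
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


best, history = evolve(sum)  # fitness = number of 1s in the bit string
print(sum(best), history[0], history[-1])
```

Setting `crossover_rate=0` turns this into a selection-plus-mutation variant. The stopping rule here is simply a fixed number of generations.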

After reading Ronald’s paper I immediately wanted to test the hypothesis that the model is good at predicting the 1-day-ahead direction of returns. For example, when the rule decides to go long, are the next-day returns positive, and when the rule exits the long position or stays flat, are the next-day returns negative? The results are not much better than a coin flip (see the results in the attachment below). Also, turnover is high (see the plot), which may render the strategy useless.
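
The directional check can be sketched in a few lines of Python. This is an illustration rather than the code in the attachment: the function names, the 0/1 position encoding, the definition of turnover as the fraction of days on which the position changes, and the sample numbers are all assumptions.

```python
def hit_rate(positions, returns):
    """Share of days on which the position taken on day t matches the sign of
    the return on day t+1 (long -> positive, flat -> negative)."""
    pairs = list(zip(positions[:-1], returns[1:]))
    hits = sum(1 for pos, r in pairs
               if (pos == 1 and r > 0) or (pos == 0 and r < 0))
    return hits / len(pairs)


def turnover(positions):
    """Share of day-to-day transitions on which the position changes."""
    changes = sum(1 for a, b in zip(positions, positions[1:]) if a != b)
    return changes / (len(positions) - 1)


# Made-up data: 1 = long, 0 = flat; returns are daily simple returns.
positions = [1, 1, 0, 0, 1]
returns = [0.00, 0.01, -0.02, -0.01, 0.03]

print(hit_rate(positions, returns))  # 0.5
print(turnover(positions))           # 0.5
```

A hit rate close to 0.5 is what a coin flip would deliver, which is the benchmark the rule has to beat.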

However, many variations on this genetic algorithm exist: different selection and mutation operators could be tested, and a crossover operator could be added. Instead of financial sentiment data, a variety of technical indicators could be used to generate an optimal trading rule – e.g. see “Evolving Trading Rule-Based Policies”.

I emailed Ronald to get clarification regarding several questions I had. He kindly and swiftly responded with appropriate answers.

• No crossover is used as the chromosome is too short.
• The target return for the Markowitz portfolio is calculated as the mean of the scenario means, i.e. mean of the mean vector.
• Pyramiding is not considered. The rule just checks whether we are invested (long) in the asset or not.
• A maximum number of iterations is specified as the stopping rule.
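
As a small illustration of the second point, the target return can be computed as the mean of the per-asset scenario means. The sketch below is in Python with made-up numbers; the function name and the row-per-scenario layout are assumptions, and the quadratic program itself (which I solved with quadprog in R) is not shown.

```python
from statistics import mean


def target_return(scenarios):
    """scenarios: one row per scenario, one column per asset.
    Returns (target, asset_means), where target is the mean of the mean vector."""
    asset_means = [mean(column) for column in zip(*scenarios)]
    return mean(asset_means), asset_means


# Made-up returns in percent: two scenarios, two assets.
scenarios = [[1.0, 4.0],
             [3.0, 2.0]]
target, asset_means = target_return(scenarios)
print(asset_means, target)  # [2.0, 3.0] 2.5
```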

As in my earlier post, End-of-Day (EOD) stock prices are sourced from QuoteMedia through Quandl’s premium subscription and the StockTwits data is sourced from PsychSignal.

The following comparisons and portfolios were constructed:

1. In-Sample single stock results – Long-only buy-and-hold strategy vs. Optimal rule-based trading strategy
2. Out-of-Sample Buy-and-hold optimal Markowitz portfolio
3. Out-of-Sample Buy-and-hold 1-over-N portfolio
4. Out-of-Sample Equally weighted portfolio of the single investment evolutionary strategies

I used the R packages quadprog and PerformanceAnalytics, but I wrote my own Genetic Algorithm. I’ll continue using this algorithm to evaluate other indicators, signals and rules 🙂

Here’s some code with the results. The evolutionary risk metrics (pg. 11) are not as good as those in the original paper (I used 100 generations for my GA) but as you can see, my output is almost identical to Ronald’s. A clone perhaps – hehe.

replicatoR_mendel_ronh

If you have a specific paper or strategy that you would like replicated, either for viewing publicly or for private use, please contact me.

## 16 thoughts on “Strategy Replication – Evolutionary Optimization based on Financial Sentiment Data”

1. Love it! 🙂 Very well done, Nick!

1. Hi Arnaud, thanks for the feedback.

The rgenoud package certainly looks good !
I like your blog site too.

BTW – If you ever want to catch up in London, let me know.
Nick

2. jacky says:

Interesting paper and results! Thanks for sharing.

3. ciupka says:

Excellent work

4. @ jacky and ciupka

I’m glad you like the replication.

I think these evolutionary algorithms are useful in hybrid models too (e.g. ANNs with GP or GAs).
I’ll post something covering this topic in the future.

5. Interesting work. I talk about the Bollen paper in my new book http://arxiv.org/abs/1010.3003

The objections raised deal with data snooping. How did you deal with it? To be specific: did you use the out-of-sample data only once, or did you apply the GA until you got good results?

6. Hi Michael, thanks for commenting.

Firstly, that’s an interesting article you wrote recently 😉
http://www.priceactionlab.com/Blog/2015/09/hypothesis-testing/

But in reality, it’s not a great example of how a real backtest should work. Fortunately, the story is a work of fiction.

The hypothesis was on the overall strategy performance – a better approach is to define a hypothesis for each component and test each one separately.

There was no Optimization, no evaluation for Parameter Robustness, no Walk Forward Analysis, no Regime Analysis, no Risk Management….
I hope you cover these topics in your book ?

If Alice and Don had started their in-sample backtest in 2000 instead of 1993, what results would they have gotten ?

I think that if hypothesis testing is done correctly it’s the best approach to developing systematic trading strategies !

Now, to answer your question, the trading rules were optimized only on the in-sample dataset. These rules were then used on the out-of-sample data (once only).
Never optimize your parameters on the out-of-sample results! 🙂

In production I would re-calibrate the rules every so often, as most strategies do not have stable parameters through time.

Best,
Nick

1. Hello Nick,

You wrote about my blog:

“There was no Optimization, no evaluation for Parameter Robustness, no Walk Forward Analysis, no Regime Analysis, no Risk Management….I hope you cover these topics in your book ?”

Yes I do and all of them can be useless and even dangerous. The more tests you run, the more data-snooping you generate. Note for example that over-fitted systems are often robust to parameter variations because this is usually what that means.

You also wrote:

“I think that if hypothesis testing is done correctly it’s the best approach to developing systematic trading strategies !”

All hypothesis tests are conditioned on historical data and subject to large errors (Type I and II) if market conditions change. Therefore, hypothesis testing is a naive way of doing things in more ways than people think.

The problems with your analysis are due to multiple comparisons of the GA algorithm and are briefly discussed in this post:

http://www.priceactionlab.com/Blog/2015/09/counter-intuitive-backtesting/

In fact, any p-values you calculate from a hypothesis test must be corrected for multiple comparisons and almost always do not provide support for the alternative.

Finally, you are correct, but with a caveat, when you say:

“Now, to answer your question, the trading rules were optimized only on the in-sample dataset. These rules were then used on the out-of-sample data (once only).”

Out-of-sample tests are suitable for an independent hypothesis test. The problem is that a GA does multiple comparisons and what you do not realize is that if you do that frequently at the end you will get fooled by randomness due to multiple comparisons. Maybe “at the end” was this time.

Just a note of caution.

Best.

1. Hi Michael, I welcome your views and comments.

I think you are determined to dismiss Hypothesis Driven strategy development so I won’t try and change your mind.

Whatever methods and framework you apply, extreme care must be taken. There is no silver bullet !

Some approaches I take are to ensure that the input parameter set is robust (i.e. a small change in the inputs doesn’t result in a large change in the output performance). Basically, look for stable regions.
Use as few input parameters as possible.
Walk forward analysis can help minimize Data Mining Bias and overfitting.
Re-calibrate the parameters and rules every so often, as most strategies do not have stable parameters through time.
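
For anyone unfamiliar with walk-forward analysis, here is a minimal Python sketch of how the rolling windows can be generated. The function name and the window sizes are assumptions for illustration; the replication in the post uses a single in-sample/out-of-sample split.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges that roll forward by test_size each step."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# 10 observations, calibrate on 4, then evaluate on the following 2.
for train, test in walk_forward_splits(10, 4, 2):
    print(list(train), list(test))
```

The rules are re-calibrated on each `train` window and evaluated only on the `test` window that follows it, so every out-of-sample observation is used once.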

All the best and good luck ! 🙂

7. Hi Nick,
I didn’t read the original paper, so forgive me for my naive question:
Did your GA predict just the sign of the next-day return, or the magnitude as well? If just the sign, what do you use for the expected return magnitude in your Markowitz portfolio optimization?
Thanks,
Ernie

8. Hi Ernie. Many thanks for commenting !

The GA is used to predict the best trading rules for each stock where each genotype encodes a trading rule-set. For example, AXP is (1, 1, 1, 0, 0.44, 0.41, 0.31, 0.17).
This translates to:
IF (ibull >= 0.44) AND (rbull >= 0.41) THEN long position
IF (ibear >= 0.31) THEN exit position
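
A rough Python sketch of how such a genotype could be decoded and applied is shown here. It assumes the first four genes are on/off switches and the last four are the matching thresholds, that the fourth indicator is called `rbear`, and that multiple active conditions are combined with AND; only the AXP example itself comes from the replication. The sentiment values are made up.

```python
INDICATORS = ("ibull", "rbull", "ibear", "rbear")  # "rbear" is an assumed name


def decode(chromosome):
    """Split a genotype into entry and exit conditions (indicator -> threshold)."""
    bits, thresholds = chromosome[:4], chromosome[4:]
    active = {name: thr
              for name, bit, thr in zip(INDICATORS, bits, thresholds) if bit == 1}
    entry = {k: v for k, v in active.items() if k in ("ibull", "rbull")}
    exit_ = {k: v for k, v in active.items() if k in ("ibear", "rbear")}
    return entry, exit_


def positions(chromosome, sentiment):
    """Daily 0/1 position. No pyramiding: we are either long or flat."""
    entry, exit_ = decode(chromosome)
    invested, out = False, []
    for day in sentiment:
        if not invested and entry and all(day[k] >= v for k, v in entry.items()):
            invested = True
        elif invested and exit_ and all(day[k] >= v for k, v in exit_.items()):
            invested = False
        out.append(int(invested))
    return out


axp = (1, 1, 1, 0, 0.44, 0.41, 0.31, 0.17)
sentiment = [  # made-up daily sentiment readings
    {"ibull": 0.5, "rbull": 0.5, "ibear": 0.1, "rbear": 0.0},
    {"ibull": 0.6, "rbull": 0.6, "ibear": 0.2, "rbear": 0.0},
    {"ibull": 0.2, "rbull": 0.3, "ibear": 0.4, "rbear": 0.0},
    {"ibull": 0.3, "rbull": 0.5, "ibear": 0.1, "rbear": 0.0},
]
print(decode(axp))                # ({'ibull': 0.44, 'rbull': 0.41}, {'ibear': 0.31})
print(positions(axp, sentiment))  # [1, 1, 0, 0]
```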

So, hopefully these optimized parameters are good at predicting the direction of returns.
Hence, my evaluation of the GA accuracy looked at how well it predicted the sign (only) of the next-day return.
(Instead of using the Sharpe ratio as the fitness function, perhaps the accuracy could be used).

The expected return magnitude in the Markowitz portfolio optimization took the mean of the in-sample expected returns of the individual stocks.

(please see page 3 of the source paper for a description of the rule-set)

P.S. – see you in London 
http://www.globalmarkets-training.co.uk/artificialintelligence.html

Best,
Nick

9. Reblogged this on Insight Corporation and commented:
Mintegration with this interesting post on Evolutionary Optimization applied to Portfolio Management :

STRATEGY REPLICATION – EVOLUTIONARY OPTIMIZATION BASED ON FINANCIAL SENTIMENT DATA