Trading Journal: Day 9 – Hunting for Alpha

I had an apparent breakthrough in ML model building, but it was pretty obviously too good to be true. While triple-checking whether the results were real, I found the issue: train data had gotten mixed into the test/validation data. That’s about the 5th time it has happened to me, so at this point I understand how hard it is to get machine learning models to generalize.
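A minimal sketch of the kind of guard that catches this sort of leak, assuming each sample carries a timestamp and that a strictly chronological split is acceptable. The names `chronological_split`, `assert_no_leakage`, and `gap` are made up for illustration and are not from my actual pipeline:

```python
from datetime import datetime, timedelta


def chronological_split(timestamps, test_fraction=0.2, gap=0):
    """Return (train_idx, test_idx) with the test set taken from the end of the timeline.

    `gap` drops that many rows just before the test set, so features built
    from look-back windows cannot straddle the boundary.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_test = int(len(order) * test_fraction)
    split = len(order) - n_test
    train_idx = order[: max(split - gap, 0)]
    test_idx = order[split:]
    return train_idx, test_idx


def assert_no_leakage(timestamps, train_idx, test_idx):
    """Raise if the two sets share rows or overlap in time."""
    if set(train_idx) & set(test_idx):
        raise ValueError("train and test share rows")
    if train_idx and test_idx:
        latest_train = max(timestamps[i] for i in train_idx)
        earliest_test = min(timestamps[i] for i in test_idx)
        if latest_train >= earliest_test:
            raise ValueError("train data is not strictly earlier than test data")


# Hypothetical data: 100 one-minute bars.
start = datetime(2024, 1, 1, 9, 30)
bar_times = [start + timedelta(minutes=i) for i in range(100)]

train_idx, test_idx = chronological_split(bar_times, test_fraction=0.2, gap=5)
assert_no_leakage(bar_times, train_idx, test_idx)
```

Running the check right before every training run is cheap, and it fails loudly instead of quietly producing results that look like a breakthrough.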

After Ash went to bed last night, I did a bunch of research into the night and found a couple of useful threads:

Lots of OHLCV and L2 data – it’s typically really expensive to find/acquire heaps of fine-grained economic data. The best you can generally hope for is about 60 days of 1m OHLCV data (Yahoo Finance), or else you’re going to have nothing but daily price data. I found some sub-minute sample data from Indian exchanges as well, and a bunch of order book data. I now have about 10 GB of sample data to train/test on and search for generalizable models.
Papers – I found models producing good results with fairly standard indicator data. I dug up the code for the related model building, which gives me an avenue to build on. The results in the tests presented show a Sharpe ratio over 2, which is what I’m using as a target for myself before I can consider any degree of success in the search.
Competitions – this was a big one. I found a competition looking to get Confidence of Variation to 5 for predictions based on order book data in a fixed sample. It shows promise, but there are a lot of competitors constantly hunting the same alpha, so the landscape changes a lot. Having to divide a fixed sample into test and train sets also makes it hard to validate that a model generalizes. Hopefully using lots of recent data and picking less liquid markets will be enough to avoid getting trampled.
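To make the Sharpe target above concrete, here is a rough sketch of how an annualized Sharpe ratio can be computed from a series of per-period returns. It assumes daily returns, 252 trading periods per year, and a zero risk-free rate by default; those are my assumptions, not necessarily the conventions the papers used, and the function name is made up:

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Mean excess return over its sample standard deviation, scaled to a year."""
    excess = [r - risk_free_per_period for r in returns]
    if len(excess) < 2:
        raise ValueError("need at least two returns")
    sd = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


# Made-up daily returns, purely to show the call.
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003, 0.004]
print(annualized_sharpe(daily_returns))
```

The square-root scaling assumes returns are roughly independent from period to period, so a number computed on a short or autocorrelated sample should be treated with suspicion.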

I also have a bunch of Rust I wrote to collect order book data from exchanges at blazing speeds, so I’m ready to set up a stream to get that data over to an agent as fast as possible. Using order book information will demand that kind of speed and will add a lot of complexity to competing for the alpha profitably, but I’ll explore it based on the autocorrelation properties of the data.
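As a first pass at the autocorrelation question, a lag-k sample autocorrelation is simple enough to compute by hand. This is only a sketch: the function name is made up, and the alternating series is a toy input, not real order book data:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Toy series that flips sign every step: strongly negative at lag 1,
# strongly positive at lag 2.
alternating = [1.0, -1.0] * 50
print(autocorrelation(alternating, 1), autocorrelation(alternating, 2))
```

On real data I would run this over mid-price changes or order book imbalance at a range of lags, and only trust values that hold up on out-of-sample periods.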

I also found a Google course on Coursera for Machine Learning for Trading. As I read papers, I’m realizing that understanding how to put together all of the pieces myself will give me a lot more to work with. I barely understand most of the algorithms used and just kind of stir a bunch of numbers until I start to get results. I was hoping that would be enough, but I suspect we’ll need to take a detour and study to be able to pull this off.

I’ve been thinking about this for years and have made multiple attempts. Every inch of signal I pull out is motivation to keep trying to use RL to build an agent that can do better than I can. Given that I’m fairly consistent, I think it should be feasible, but I have to be careful not to be too rigid and focus only on the way I see things. There are different approaches and different features – I suspect I’ll end up focusing a lot more on the L2 data than on the OHLCV data, but I’ll see what comes next.