Technical Background for AlpacaForecast AI Prediction Matrix

June 6th, 2018 09:06


Our market forecasting models use deep learning on high-frequency tick data to recognize patterns that indicate price changes.

The idea is that, despite the low signal-to-noise ratio in the market, a correctly implemented algorithm can extract a useful signal from the noise.
Instead of manually crafting an algorithm, we use Alpaca’s technology, based on state-of-the-art machine learning techniques, to generate models capable of extracting the signal by themselves.
Using high-capacity models, we can process large amounts of raw market data and extract small but significant events that would not be detectable in aggregated data or with lower-capacity models.

What machine learning models do we use?

We use our own proprietary deep learning models, based on state-of-the-art Convolutional Neural Networks (CNN) adapted to the specifics of financial data.

Deep learning is a field of machine learning that uses neural networks composed of many layers and can tackle previously intractable problems. It has seen great success in recent years across various tasks, from automated translation and professional-level game playing to computer vision. In computer vision in particular, deep learning has enabled the development of self-driving cars, automated image captioning systems, and image classification at scale.
We draw inspiration from such computer vision technologies to detect market patterns, using stacked convolutions to build progressively more abstract patterns from tick data, with models trained end-to-end directly for forecasting.
Convolutions are well suited to the task and allowed us to reach considerably higher performance than traditional Recurrent Neural Networks (RNN) applied to time series. Interestingly, the trend of using convolutions instead of, or combined with, recurrent cells is also seen in modern Natural Language Processing research.
For even better performance, we combine multiple models to construct our final Alpaca signals.
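
As a rough illustration of how stacked convolutions build up from raw ticks, here is a minimal sketch in plain Python. It is not Alpaca’s proprietary model: the function names (`conv1d`, `stacked_features`, `receptive_field`, `ensemble`), the hand-picked kernels, the stride, and the use of simple averaging to combine models are all assumptions made for the example. In a real network the kernel weights would be learned end-to-end rather than written by hand.

```python
from typing import List, Sequence


def conv1d(series: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """Slide `kernel` over `series` (no padding) and apply a ReLU to each output."""
    width = len(kernel)
    outputs = []
    for start in range(0, len(series) - width + 1, stride):
        window = series[start:start + width]
        activation = sum(w * x for w, x in zip(kernel, window))
        outputs.append(max(0.0, activation))  # ReLU non-linearity
    return outputs


def stacked_features(ticks: Sequence[float], kernels: Sequence[Sequence[float]],
                     stride: int = 2) -> List[float]:
    """Feed the output of each convolution layer into the next one."""
    features = list(ticks)
    for kernel in kernels:
        features = conv1d(features, kernel, stride)
    return features


def receptive_field(kernel_sizes: Sequence[int], stride: int = 2) -> int:
    """Number of input ticks that influence a single output of the stack."""
    field, jump = 1, 1
    for size in kernel_sizes:
        field += (size - 1) * jump
        jump *= stride
    return field


def ensemble(predictions: Sequence[float]) -> float:
    """Combine several model outputs by plain averaging (an assumed scheme)."""
    return sum(predictions) / len(predictions)


# Hypothetical tick prices and hand-written kernels, for illustration only.
ticks = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
kernels = [[-1.0, 0.0, 1.0],   # layer 1: responds to a local upward move
           [1.0, 1.0, 1.0]]    # layer 2: sums neighbouring layer-1 responses
print(stacked_features(ticks, kernels))
print(receptive_field([len(k) for k in kernels]))
```

With two layers of width-3 kernels and a stride of 2, each output value depends on 7 consecutive ticks, which is how deeper layers come to describe longer patterns than any single kernel covers.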

Why do we use tick data?

Thanks to the high capacity of Alpaca’s models, we’re able to feed high-density data directly into the model and replace most manual feature engineering steps.

This allows us to learn feature abstractions progressively within the model, instead of using sub-optimal dimension reduction techniques manually built for each asset. For example, candlestick charts are great for reducing market information to the essentials, but they can miss low-level patterns that are useful for fine-grained predictions.
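
To make the candlestick point concrete, the sketch below builds two hypothetical tick sequences that collapse to the same open/high/low/close candle even though their paths differ. The prices and the helper names (`to_candle`, `direction_changes`) are made up for illustration.

```python
from typing import Sequence, Tuple


def to_candle(prices: Sequence[float]) -> Tuple[float, float, float, float]:
    """Aggregate a tick sequence into an (open, high, low, close) candle."""
    return (prices[0], max(prices), min(prices), prices[-1])


def direction_changes(prices: Sequence[float]) -> int:
    """Count how often the price switches between rising and falling."""
    moves = [b - a for a, b in zip(prices, prices[1:]) if b != a]
    return sum(1 for prev, cur in zip(moves, moves[1:]) if (prev > 0) != (cur > 0))


# Two hypothetical intervals with identical candles but different tick paths.
steady = [100.0, 99.5, 100.5, 101.5, 102.0, 101.0]
choppy = [100.0, 102.0, 99.5, 101.5, 100.0, 101.0]

print(to_candle(steady) == to_candle(choppy))                # same candle
print(direction_changes(steady), direction_changes(choppy))  # different paths
```

Both sequences produce the candle (100.0, 102.0, 99.5, 101.0), yet one reverses direction twice and the other four times. That difference is visible only at tick level.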
Alpaca also takes advantage of its advanced technology stack, purpose-built to handle massive amounts of financial data. One component is MarketStore, which we open-sourced and which can be found on GitHub.

Performance Evaluation

During our development cycles, we strictly separate training, validation, and test data, in line with machine learning best practices.

We also take into account specifics of time series data, such as varying trends, seasonal patterns and calendar events.
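
One common way to keep the three data sets strictly separated for time series is to split by time rather than at random, so that a model is never evaluated on data older than what it was trained on. The sketch below assumes that approach. The function name `chronological_split`, the fraction parameters, and the `(timestamp, value)` sample layout are hypothetical and do not describe Alpaca’s actual pipeline.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, float]  # (timestamp, value); layout assumed for the example


def chronological_split(
    samples: Sequence[Sample],
    train_frac: float = 0.7,
    val_frac: float = 0.15,
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split samples into train/validation/test sets ordered by time.

    The oldest samples go to training, the next block to validation,
    and the most recent block to testing, so the sets never interleave.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    ordered = sorted(samples, key=lambda sample: sample[0])
    train_end = int(len(ordered) * train_frac)
    val_end = int(len(ordered) * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Hypothetical samples arriving out of order.
samples = [(t, float(t)) for t in [5, 1, 7, 3, 0, 6, 2, 4]]
train, val, test = chronological_split(samples, train_frac=0.5, val_frac=0.25)
print([t for t, _ in train], [t for t, _ in val], [t for t, _ in test])
```

A split like this does not by itself handle seasonal patterns or calendar events. It only guarantees that validation and test data come strictly after the training data.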
To get the most reliable performance evaluation, we continuously collect live data and evaluate our models in real-time conditions, then create reports tailored to our users.

If you are interested and want to know more, as always, don’t hesitate to contact us!