Building a DRL Agent: Market Selection and Processing Data for Training

This article is a collection of Grok DeepResearch notes to use as a simple starting point for building your first DRL model trained on OHLCV data. This artifact is focused on data: both OHLCV plus features and some starting alt data. This will not be very generalizable (e.g., it probably won’t work well enough out of sample), but it gives you a starting point to explore. I’ll explain how to take the data and train a DRL model with PPO+LSTM in another post; it’s not hard. This is kind of the “feature engineering for trading 101” course.

If you need a little background on which features to choose, there are some tools like AlphaLens that attempt to measure the signal in an indicator. See also: https://en.wikipedia.org/wiki/Principal_component_analysis

Most of the indicators are very weak, so I hypothesise getting good profitability without attention will be immensely difficult unless you’re looking in the order book. That’s also why this article recommends smaller markets, because there is more signal left for you to compete on. Big players can’t play in small markets – your competition will be unsophisticated relative to trading something like Bitcoin.

Starting with OHLCV data and building a model to train with it is a really good way to get a taste for the entire machine learning workflow, just don’t expect to find profitability without significant effort and learning. If you’re not a profitable day trader yourself, I suspect it will be very hard to do much better than random entries and exits.

A Comprehensive Note on Feature Engineering for DRL Agents with LSTM in Small or Mid-Cap Crypto Trading

This note provides a detailed exploration of how to engineer features for building a Deep Reinforcement Learning (DRL) agent with Long Short-Term Memory (LSTM) networks, specifically targeting small or mid-cap cryptocurrencies using Open, High, Low, Close, Volume (OHLCV) data. The focus is on processing, normalizing, and organizing these features to create a profitable trading model, considering the unique characteristics of less competitive markets, such as higher volatility and lower liquidity. The analysis draws from recent research and practical implementations, ensuring a thorough understanding for both technical and non-technical audiences.

Background and Context

Small or mid-cap cryptocurrencies, often defined by lower market capitalization compared to large-cap assets like Bitcoin or Ethereum, operate in less competitive markets. These assets typically exhibit higher volatility, lower liquidity, and potentially more susceptibility to external factors like news and social media. Given these traits, a DRL agent with LSTM, which excels at processing sequential time series data, is well-suited for capturing temporal dependencies in price movements and trading decisions. The challenge lies in selecting and engineering features from OHLCV data to represent the market state effectively, particularly when historical data may be limited for small caps.

Feature Engineering Process

The feature engineering process involves several steps, each critical to preparing the data for the DRL agent. The order of operations is important to ensure consistency and avoid data leakage, especially given the time series nature of the data. Below, we detail each step, supported by examples and considerations for small or mid-cap cryptos.

Step 1: Data Loading and Preprocessing

The first step is to load the CSV file containing time and OHLCV data (Open, High, Low, Close, Volume) into a suitable data structure, such as a pandas DataFrame in Python. This ensures the data is in the correct format and sorted by time, which is essential for time series analysis.

Action: Read the CSV file and parse the time column as dates, setting it as the index for easy time-based operations.
Example Code:

import pandas as pd
df = pd.read_csv('data.csv', parse_dates=['time'], index_col='time')
df = df.sort_index()  # make sure rows are in chronological order

Considerations: Ensure there are no missing values or erroneous data points, as these can affect rolling calculations. For small caps, data quality might vary, so handle missing values by interpolation or dropping, depending on the extent.
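
The cleaning described above can be sketched roughly as follows. This is a minimal example, not a definitive pipeline: the `load_ohlcv` helper and the inline sample data are made up for illustration, and treating a missing volume as zero is an assumption you may not want for your data.

```python
import io
import pandas as pd

PRICE_COLS = ["open", "high", "low", "close"]

def load_ohlcv(source):
    """Load OHLCV rows, de-duplicate timestamps, sort by time and fill small gaps."""
    df = pd.read_csv(source, parse_dates=["time"], index_col="time")
    # Keep the last row seen for each timestamp, then sort chronologically
    df = df[~df.index.duplicated(keep="last")].sort_index()
    # Forward-fill prices so a gap only ever uses past values
    df[PRICE_COLS] = df[PRICE_COLS].ffill()
    # Assumption: a missing volume means no trades were recorded
    df["volume"] = df["volume"].fillna(0.0)
    # Rows before the first valid price cannot be filled, so drop them
    return df.dropna(subset=PRICE_COLS)

# Hypothetical sample: out of order, one duplicated timestamp, one missing volume
sample_csv = io.StringIO(
    "time,open,high,low,close,volume\n"
    "2024-01-02,10.5,11.0,10.0,10.8,1200\n"
    "2024-01-01,10.0,10.6,9.8,10.5,1000\n"
    "2024-01-03,,,,,\n"
    "2024-01-03,10.8,11.5,10.7,11.2,\n"
)
clean = load_ohlcv(sample_csv)
print(clean)
```

Forward-filling is used here instead of interpolation because interpolation between two known points uses the later point, which the agent would not have seen yet in live trading.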

Step 2: Feature Creation from OHLCV Data

From the raw OHLCV data, create derived features that capture market trends, volatility, and liquidity, which are particularly relevant for small or mid-cap cryptos. These include technical indicators and volatility measures, as outlined in the research.

Basic Features: The raw OHLCV data (open, high, low, close, volume) forms the foundation. These are directly usable and provide a snapshot of daily or intraday price movements and market activity.
Technical Indicators: Calculate indicators such as:
- Simple Moving Average (SMA): Average of close prices over a period (e.g., 5, 20 days).
- Exponential Moving Average (EMA): Weighted average, more responsive to recent prices (e.g., 5, 20 days).
- Relative Strength Index (RSI): Measures momentum, typically over 14 days, to identify overbought/oversold conditions.
- Moving Average Convergence Divergence (MACD): Difference between 12-day and 26-day EMAs, with a 9-day signal line.
- Volatility Measures: Standard deviation of daily returns over a window (e.g., 20 days) to capture price swings, crucial for volatile small caps.
Implementation: These indicators are calculated using rolling and exponential weighted moving average functions in pandas. For RSI, a manual calculation involves price differences and rolling means of gains and losses.

The following table lists common indicators and their relevance:

Indicator	Description	Relevance for Small/Mid-Cap Cryptos
Simple Moving Average (SMA)	Average price over a period (e.g., 5, 10 days)	Smooths noise, useful for volatile assets
Exponential Moving Average (EMA)	Weighted average, more responsive to recent prices	Captures rapid price changes in small caps
Relative Strength Index (RSI)	Measures momentum, typically over 14 days	Identifies overbought/oversold conditions
Moving Average Convergence Divergence (MACD)	Difference between short and long EMAs	Identifies trend changes, relevant for trends
Volatility (Std Dev of Returns)	Standard deviation over a period (e.g., 20 days)	Captures higher volatility in small caps

Example of a 21 period EMA on the 1m chart. You can see that when price crosses over the EMA it can indicate the trend is weakening – this might give you some ideas for different features you may want to try engineering.

Example Code for Indicators:

# SMA
df['sma_5'] = df['close'].rolling(window=5).mean()
df['sma_20'] = df['close'].rolling(window=20).mean()

# EMA
df['ema_5'] = df['close'].ewm(span=5, adjust=False).mean()
df['ema_20'] = df['close'].ewm(span=20, adjust=False).mean()

# RSI
def calculate_rsi(data, window):
    delta = data.diff()
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    avg_gain = gain.rolling(window).mean()
    avg_loss = loss.rolling(window).mean()
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
df['rsi'] = calculate_rsi(df['close'], window=14)

# MACD
df['ema_12'] = df['close'].ewm(span=12, adjust=False).mean()
df['ema_26'] = df['close'].ewm(span=26, adjust=False).mean()
df['macd'] = df['ema_12'] - df['ema_26']
df['signal'] = df['macd'].ewm(span=9, adjust=False).mean()

# Volatility
df['returns'] = df['close'].pct_change()
df['volatility'] = df['returns'].rolling(window=20).std()

You can experiment by adding different derived features like distance from closing price relative to the indicator. As an example, personally, on 1m chart I sometimes use the 21 period ema to help me see where the trend breaks, so what I’m looking for as a human is the distance relative to close, if it’s over/under (negative vs positive delta) and did it cross up or down recently.
Considerations for Small Caps: Given higher volatility, consider shorter window sizes for moving averages (e.g., 5-day instead of 20-day) to capture rapid price changes. Volume is particularly important for assessing market activity in less liquid markets.
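
The EMA-distance idea mentioned above (distance from close, over/under, and a recent cross) could be turned into features along these lines. Treat it as a sketch: the function name, the column names, and the `lookback` parameter are illustrative choices, not something taken from a library.

```python
import pandas as pd

def add_ema_distance_features(df, span=21, lookback=5):
    """Add distance-from-EMA, over/under, and recent-cross features."""
    out = df.copy()
    ema = out["close"].ewm(span=span, adjust=False).mean()
    # Signed distance of close from the EMA, relative to the EMA level
    out["ema_dist"] = (out["close"] - ema) / ema
    # 1 when close is above the EMA, 0 otherwise
    out["above_ema"] = (out["close"] > ema).astype(int)
    # +1 on the bar where price crosses up, -1 where it crosses down
    cross = out["above_ema"].diff().fillna(0)
    recent = cross.rolling(lookback, min_periods=1)
    out["crossed_up_recently"] = (recent.max() > 0).astype(int)
    out["crossed_down_recently"] = (recent.min() < 0).astype(int)
    return out

# Toy 1m series: falls for a few bars, then reverses upward
prices = pd.DataFrame(
    {"close": [10.0, 9.0, 8.0, 7.0, 8.0, 10.0, 12.0, 13.0]},
    index=pd.date_range("2024-01-01", periods=8, freq="min"),
)
feats = add_ema_distance_features(prices, span=3, lookback=2)
print(feats)
```

A relative distance is used rather than a raw price difference so the feature stays comparable across coins trading at very different price levels.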

Step 3: Normalization of Features

Normalization is crucial to ensure all features are on a similar scale, which helps the neural network converge properly. Given the time series nature, normalization must avoid using future data to simulate real-world scenarios.

Method: Use MinMax Scaling (scaling features to [0,1]) or standardization (mean 0, standard deviation 1). For this note, MinMax Scaling is used, as it is common in neural network applications.
Order of Operations: Normalize after adding derived features, as technical indicators are defined on actual price data, not normalized data. Normalizing raw data first and then calculating indicators would alter their meaning, so calculate indicators from raw data and then normalize the combined feature set.
Implementation: Split the data into training and testing sets first, fit the scaler on the training set, and transform both sets to prevent data leakage. This ensures the testing set is normalized using the same parameters as the training set, simulating real-time conditions.
Example Code for Normalization:

from sklearn.preprocessing import MinMaxScaler

features_to_normalize = ['open', 'high', 'low', 'close', 'volume', 'sma_5', 'sma_20', 'ema_5', 'ema_20', 'rsi', 'macd', 'signal', 'volatility']
df = df.dropna(subset=features_to_normalize)  # drop warm-up rows where rolling indicators are NaN
train_size = int(len(df) * 0.8)
train_df = df.iloc[:train_size].copy()  # copy so the assignments below don't write into a slice of df
test_df = df.iloc[train_size:].copy()
scaler = MinMaxScaler()
scaler.fit(train_df[features_to_normalize])  # fit on the training set only to avoid leakage
train_df[features_to_normalize] = scaler.transform(train_df[features_to_normalize])
test_df[features_to_normalize] = scaler.transform(test_df[features_to_normalize])

Considerations: For small caps, data might have outliers due to high volatility, which can affect MinMax Scaling. Consider robust scaling methods or handling outliers before normalization. Additionally, for real-time applications, rolling normalization (based on past data) might be more appropriate, though for batch processing, the above approach is standard.
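
For the rolling normalization mentioned above, one possible sketch is a rolling z-score that only looks backwards. Note that this swaps MinMax Scaling for standardization, and the `rolling_zscore` name and the window size are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

def rolling_zscore(series, window=50):
    """Standardize each value with the mean/std of the previous `window` values only."""
    past = series.shift(1)  # exclude the current bar from its own statistics
    mean = past.rolling(window).mean()
    std = past.rolling(window).std()
    # A zero std would divide by zero, so treat it as undefined
    return (series - mean) / std.replace(0.0, np.nan)

s = pd.Series(np.arange(1, 11, dtype=float))
z = rolling_zscore(s, window=3)

# Changing a later value must not change any earlier normalized value
s_changed = s.copy()
s_changed.iloc[-1] = 1000.0
z_changed = rolling_zscore(s_changed, window=3)
print(z)
```

The first `window` rows come out as NaN because there is not enough history yet, so they need to be dropped before building sequences.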

Step 4: Organizing Data for DRL Agent

In DRL, the agent needs to observe the state at each time step and decide on an action (e.g., buy, sell, hold). The state typically consists of a sequence of past observations, which is where LSTM comes in, as it can handle sequential data.

Action: Create sequences of the normalized feature vectors using a sliding window approach. For example, use the last 10 days of data as the state for day t, forming a sequence of feature vectors.
Implementation: Convert the normalized dataframe to a numpy array and create sequences for both training and testing sets.
Example Code for Sequencing:

import numpy as np

sequence_length = 10  # number of past time steps in each state
train_data = train_df[features_to_normalize].values
test_data = test_df[features_to_normalize].values

def create_sequences(data, sequence_length):
    # Each sample holds the `sequence_length` rows that precede time step i
    return np.array([data[i - sequence_length:i] for i in range(sequence_length, len(data))])

X_train = create_sequences(train_data, sequence_length)  # shape: (samples, sequence_length, features)
X_test = create_sequences(test_data, sequence_length)

Considerations: The sequence length (e.g., 10 days) should be chosen based on the trading strategy and data frequency. For small caps, shorter sequences might capture rapid price changes better. The agent’s state in DRL might also include internal states like current holdings, but for feature engineering from OHLCV, focus on market data features as shown.

Additional Considerations

Rolling Normalization Consideration: While the standard approach is to normalize the entire training set, for real-time applications, rolling normalization (normalizing based on a sliding window of past data) could be considered to handle non-stationarity, though it adds complexity. This is particularly relevant for small caps with potentially trending data.
Additional Features: Research suggests including social indicators like Google Trends score.
Controversy and Flexibility: There is debate on whether to normalize each feature separately or together, with some preferring standardization over MinMax Scaling. The choice depends on the model and data characteristics, with MinMax Scaling being common for neural networks.

Practical Implementation and Best Practices

For practical implementation, the input to the LSTM would be a sequence of feature vectors, each representing a time step (e.g., daily data). A minimal set could include close price, volume, 5-day and 20-day SMAs, RSI (14-day), and volatility (20-day standard deviation). Adjust window sizes based on data frequency and market characteristics, especially for small caps with higher volatility.

Best Practices: Handle missing values before calculations, use training set statistics for normalization, and experiment with window sizes for indicators to optimize for small or mid-cap crypto trading. Libraries like TA-Lib can simplify indicator calculations, though manual methods are shown for accessibility.

Conclusion

In summary, to engineer features for a DRL agent with LSTM using OHLCV data for small or mid-cap crypto trading, load and preprocess the data, calculate technical indicators and volatility measures, normalize all features using training set statistics, and prepare sequences for the DRL agent. The order is critical: calculate derived features from raw data first, then normalize the combined set, ensuring consistency and avoiding data leakage. This approach, with adjustments for higher volatility and lower liquidity, forms a robust foundation for profitable trading models in less competitive markets.

Key Citations

AI4Finance-Foundation/FinRL: Financial Reinforcement Learning https://github.com/AI4Finance-Foundation/FinRL
Deep Reinforcement Learning for Trading Cryptocurrencies by Harsha Andey https://medium.com/coinmonks/deep-reinforcement-learning-for-trading-cryptocurrencies-5b5502b1ece1
Recommending cryptocurrency trading points with deep reinforcement learning approach https://www.mdpi.com/2076-3417/10/4/1506
A deep reinforcement learning framework for the financial portfolio management problem https://arxiv.org/abs/1706.10059
Combining deep reinforcement learning with technical analysis and trend monitoring https://link.springer.com/article/10.1007/s00521-023-08516-x

A Comprehensive Note on Incorporating External Data for DRL Agents with LSTM in Small or Mid-Cap Crypto Trading

This note provides a detailed exploration of external data sources beyond Open, High, Low, Close, Volume (OHLCV) data that can be used for building a Deep Reinforcement Learning (DRL) agent with Long Short-Term Memory (LSTM) networks, specifically targeting small or mid-cap cryptocurrencies in less competitive markets. The focus is on identifying and incorporating such data to enhance the profitability of trading models, considering the unique characteristics of these assets, such as higher volatility and lower liquidity. The analysis draws from recent research and practical implementations, ensuring a thorough understanding for both technical and non-technical audiences.

Background and Context

Small or mid-cap cryptocurrencies, often defined by lower market capitalization compared to large-cap assets like Bitcoin or Ethereum, operate in less competitive markets. These assets typically exhibit higher volatility, lower liquidity, and potentially more susceptibility to external factors like news and social media. Given these traits, a DRL agent with LSTM, which excels at processing sequential time series data, is well-suited for capturing temporal dependencies in price movements and trading decisions. While OHLCV data forms the foundation, incorporating external data can provide additional insights, particularly for small caps where market dynamics are less established and more influenced by external factors.

External Data Sources and Rationale

External data, beyond traditional price and volume metrics, can capture market sentiment, public interest, and on-chain activity, which are particularly relevant for small or mid-cap cryptos. The following categories are identified based on research and practical applications:

Google Trends and Search Volume

Google Trends provides a score from 0 to 100 indicating the search volume for a particular term relative to its peak popularity. For a crypto, the search trend for its name or related terms can indicate public interest, which might correlate with price movements, especially for smaller projects with less established market presence.

Relevance for Small Caps: Higher search volume might signal increased attention, potentially leading to price surges or drops, which is crucial for volatile small caps.
Implementation: Use the pytrends library to fetch daily or weekly search scores for the crypto’s name, aligning with the price data’s time index.

Social Media Metrics

Social media platforms like X (formerly Twitter), Reddit, and crypto-specific forums can provide insights into community sentiment and activity. Metrics include the number of posts, comments, and sentiment analysis (positive, negative, neutral).

Examples: Number of daily tweets mentioning the crypto, average sentiment score, or Reddit post counts.
Relevance for Small Caps: Social media buzz can significantly impact small caps due to their reliance on community support and viral growth, potentially offering unique trading signals.
Implementation: Use APIs like Twitter’s to collect posts, employ natural language processing (NLP) tools like TextBlob or VADER for sentiment analysis, and aggregate daily metrics.
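
The daily aggregation step could look roughly like this. The sketch assumes you already have one row per post with a `created_at` timestamp and a `sentiment` score between -1 and 1 (for example a compound score from a tool like VADER); the column names, the `daily_sentiment_features` helper, and the 0.05 threshold are assumptions for illustration.

```python
import pandas as pd

def daily_sentiment_features(posts, pos_threshold=0.05):
    """Aggregate per-post sentiment scores into one row per calendar day."""
    posts = posts.copy()
    posts["date"] = posts["created_at"].dt.floor("D")
    posts["is_positive"] = posts["sentiment"] > pos_threshold
    daily = posts.groupby("date").agg(
        post_count=("sentiment", "count"),
        mean_sentiment=("sentiment", "mean"),
        twitter_positive=("is_positive", "sum"),
    )
    # Insert explicit rows for days without any posts
    all_days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(all_days)
    count_cols = ["post_count", "twitter_positive"]
    daily[count_cols] = daily[count_cols].fillna(0).astype(int)
    # Assumption: a day with no posts is treated as neutral sentiment
    daily["mean_sentiment"] = daily["mean_sentiment"].fillna(0.0)
    return daily

# Hypothetical scored posts; note there are none on 2024-01-02
posts = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 15:30", "2024-01-01 22:10", "2024-01-03 08:00",
    ]),
    "sentiment": [0.6, -0.4, 0.2, 0.8],
})
daily = daily_sentiment_features(posts)
print(daily)
```

For small caps, days with zero posts are common, so it helps to make them explicit rows rather than letting them silently vanish from the index.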

On-Chain Metrics

On-chain data, derived from the blockchain, includes transaction volume, number of active addresses, token supply, and distribution. These metrics reflect the crypto’s usage and network activity, which can be predictive for price movements.

Examples: Daily transaction volume, number of unique addresses with transactions, or changes in circulating supply.
Relevance for Small Caps: On-chain activity can indicate adoption and liquidity, crucial for less established cryptos with potentially sparse trading data.
Implementation: Use blockchain APIs (e.g., from CoinGecko, Nomics) to fetch daily metrics, ensuring alignment with price data.

News and Events Data

News articles, press releases, and significant events (e.g., partnerships, conferences) can influence crypto prices. Sentiment analysis of news can provide insights into market expectations.

Examples: Count of positive vs. negative news articles, binary indicators for major events.
Relevance for Small Caps: News can have a disproportionate impact on small caps, given their sensitivity to external catalysts.
Implementation: Use news APIs (e.g., NewsAPI, Alpha Vantage) to collect articles, apply NLP for sentiment, and aggregate daily scores.

Economic Indicators

While cryptos are somewhat decoupled from traditional markets, some economic indicators like exchange rates, inflation rates, or GDP might influence certain cryptos, especially stablecoins or those tied to specific regions.

Examples: USD/EUR exchange rate, inflation rates from major economies.
Relevance for Small Caps: Less direct, but for small caps with regional focus, economic factors might play a role.
Implementation: Fetch from financial data providers (e.g., FRED, World Bank), align with price data, and include as features.

Incorporating External Data into the Feature Set

To incorporate these external data into the DRL agent’s feature set, follow these steps, ensuring alignment with the LSTM’s sequential nature:

Step 1: Data Collection and Preprocessing

Action: Download each external dataset, ensuring it has a time index (e.g., daily or hourly) that aligns with the OHLCV data.
Example for Google Trends: Use pytrends to fetch search scores for the crypto’s name, downloading weekly data and interpolating to daily for alignment.
Example for Social Media: Use Twitter API to collect daily tweets, apply sentiment analysis, and aggregate counts (e.g., number of positive tweets).
Considerations: For small caps, data availability might be limited, so handle missing values by interpolation or forward-filling, depending on the metric.
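
One caveat with interpolating weekly data to daily is that interpolation uses the next week's value, which is not known yet on the day in question. A forward-fill with a one-week lag avoids that. The sketch below assumes each weekly score is stamped with the start of the week it covers; the helper name and the numbers are made up.

```python
import pandas as pd

def weekly_to_daily_no_lookahead(weekly, daily_index):
    """Map weekly scores onto daily rows using only weeks that have fully ended."""
    # Week k's score only becomes usable once week k+1 has started
    available = weekly.shift(1)
    # For each day, take the most recent weekly label at or before that day
    return available.reindex(daily_index, method="ffill")

# Hypothetical weekly Google Trends scores, stamped on Sundays
weekly_scores = pd.Series(
    [40.0, 55.0, 70.0],
    index=pd.to_datetime(["2024-01-07", "2024-01-14", "2024-01-21"]),
    name="google_trends",
)
days = pd.date_range("2024-01-07", "2024-01-27", freq="D")
daily_trends = weekly_to_daily_no_lookahead(weekly_scores, days)
print(daily_trends)
```

The first week comes out as NaN because no completed week exists yet; those rows can be dropped or filled with a neutral value.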

Step 2: Alignment with Price Data

Action: Merge external data with the OHLCV dataset on the time index, ensuring no gaps or misalignments.
Example Code for Google Trends:

# Assumes `trends_df` is a DataFrame indexed by date with a 'google_trends' column
# (for example fetched with pytrends); the variable name is illustrative.
trends_daily = trends_df['google_trends'].resample('D').ffill()  # weekly -> daily, carrying the last known value
df = df.join(trends_daily, how='left')  # align on the shared daily time index
df['google_trends'] = df['google_trends'].ffill()  # fill price rows after the last trends value

Considerations: Ensure the time zones match, especially for global cryptos.

Step 3: Normalization and Feature Engineering

Action: Normalize all features, including external data, using the training set’s statistics to prevent data leakage. Use MinMax Scaling or standardization.
Example Code for Normalization:

from sklearn import preprocessing
features_to_normalize = ['open', 'close', 'volume', 'google_trends', 'twitter_positive', 'on_chain_volume']
scaler = preprocessing.MinMaxScaler()
scaler.fit(train_df[features_to_normalize])
train_df[features_to_normalize] = scaler.transform(train_df[features_to_normalize])
test_df[features_to_normalize] = scaler.transform(test_df[features_to_normalize])

Considerations: External data might have different scales (e.g., Google Trends 0-100, tweet counts in thousands), so normalization is crucial.

Step 4: Organizing Data for DRL Agent

Action: Create sequences of the combined feature vectors using a sliding window approach, including both OHLCV-derived features and external data.
Example Code for Sequencing:

import numpy as np

sequence_length = 10
features = ['open', 'close', 'volume', 'sma_5', 'rsi', 'google_trends', 'twitter_positive']
train_data = train_df[features].values
X_train = []
for i in range(sequence_length, len(train_data)):
    X_train.append(train_data[i-sequence_length:i, :])
X_train = np.array(X_train)

Considerations: The sequence length should account for the temporal dependencies in external data, potentially requiring longer windows for slower-changing metrics like news sentiment.

Considerations for Small or Mid-Cap Cryptos

Given the unique characteristics of small or mid-cap cryptos, incorporating external data requires careful consideration:

Higher Volatility and Sensitivity: External data like social media and news might have a larger impact, offering unique trading signals not captured by price data alone.
Limited Data Availability: For smaller cryptos, social media presence or on-chain data might be sparse, requiring robust handling of missing values or alternative sources.
Potential for Noise: External data can introduce noise, especially if not directly correlated with price movements, so feature selection and validation are crucial.

Practical Implementation and Best Practices

For practical implementation, start with a minimal set of external features, such as Google Trends score and Twitter sentiment, and expand based on availability and model performance. Use libraries like pytrends, tweepy for Twitter, and blockchain APIs for on-chain data. Always validate the model’s performance with and without external features to assess their contribution, particularly for small caps where overfitting is a risk.

An unexpected insight is that for small caps, social media buzz might have a disproportionate impact due to less established market presence, potentially offering unique trading signals not seen in large-cap cryptos.

Conclusion

In summary, for building a DRL agent with LSTM for small or mid-cap crypto trading, external data beyond OHLCV includes Google Trends, social media metrics, on-chain data, news sentiment, and economic indicators. To incorporate, download and preprocess each dataset, align with price data, normalize, and include in the feature vector. Research suggests these additions can enhance model performance, but their impact varies and requires validation, especially given the potential for noise and limited data availability in less competitive markets.

A Comprehensive Note on Signals Indicating Market Conditions and Interest for Small or Mid-Cap Crypto Trading

Note that this section talks about the daily timeframe. You probably want to capture these values on a higher timeframe and attach them to each row in the lower timeframe; I think using the previous day’s value would work. You could also consider including OHLCV data for derived indices like the VIX: https://www.cboe.com/tradable_products/vix/
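
Sticking the previous day's value onto lower-timeframe rows could be sketched like this. It assumes the daily series has exactly one row per calendar day (so a one-row shift means "yesterday"); `attach_previous_day_value` and the sample values are hypothetical.

```python
import pandas as pd

def attach_previous_day_value(intraday, daily_signal, name):
    """Add a daily signal to intraday rows, using the previous day's value only."""
    # Shift by one row so each date carries yesterday's value
    prev_day = daily_signal.shift(1)
    # Midnight timestamp of the day each intraday bar belongs to
    days = intraday.index.normalize()
    out = intraday.copy()
    out[name] = prev_day.reindex(days).to_numpy()
    return out

# Hourly bars spanning midnight between 2024-01-02 and 2024-01-03
intraday = pd.DataFrame(
    {"close": [1.00, 1.02, 1.01, 1.05]},
    index=pd.date_range("2024-01-02 22:00", periods=4, freq="60min"),
)
# Hypothetical daily volatility values
daily_vol = pd.Series(
    [0.30, 0.45, 0.60],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
out = attach_previous_day_value(intraday, daily_vol, "daily_vol")
print(out)
```

Using yesterday's value rather than today's matters because today's daily bar is not complete until the day closes, so the agent could not have known it intraday.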

This note provides a detailed exploration of signals that can indicate whether market conditions are favorable and if there is significant interest in the market, particularly for small or mid-cap cryptocurrencies in less competitive markets. The focus is on identifying and implementing these signals, including proxies for implied volatility (IV) as mentioned in traditional trading contexts, to determine if a market is “in play,” meaning it’s active and worth trading. The analysis draws from recent research and practical implementations, ensuring a thorough understanding for both technical and non-technical audiences, with code examples for implementation.

Background and Context

Small or mid-cap cryptocurrencies, often defined by lower market capitalization compared to large-cap assets like Bitcoin or Ethereum, operate in less competitive markets. These assets typically exhibit higher volatility, lower liquidity, and potentially more susceptibility to external factors like news and social media. Determining whether a market is “in play,” meaning it has significant trading activity, volatility, and interest, is crucial for building a profitable Deep Reinforcement Learning (DRL) agent with Long Short-Term Memory (LSTM) networks. Implied volatility (IV) is commonly used in traditional markets, such as options trading, to filter for active markets, an approach associated with trading firms like SMB Capital.

Given the limited options trading for small or mid-cap cryptos, we need to find alternative signals and proxies for IV to assess market conditions and interest.

Signals Indicating Market Conditions and Interest

Research suggests several signals can indicate whether a market is “in play,” capturing activity, volatility, and public interest. These include:

Historical Volatility as a Proxy for Implied Volatility

Implied Volatility (IV) is a forward-looking measure derived from options prices, reflecting market expectations of future volatility. In traditional markets, firms like SMB Capital use IV to filter markets, assessing whether they’re active enough for trading strategies. However, for small or mid-cap cryptos, options trading is often limited, making IV calculation challenging.

Proxy Approach: Historical volatility, calculated from past price returns, can serve as a proxy. It measures realized volatility, assuming past behavior predicts future volatility to some extent.
Calculation: Use the standard deviation of logarithmic returns over a window (e.g., 20 days), annualized with 252 trading days (or 365 for cryptos trading 24/7).
Relevance for Small Caps: Higher historical volatility indicates more price swings, suggesting an active market, which aligns with the concept of being “in play.”

Trading Volume

Trading volume, the total amount of the crypto traded in a period, indicates market interest and liquidity.

Metrics: Daily volume, moving average volume, or volume compared to historical quantiles (e.g., 75th percentile).
Relevance for Small Caps: High volume suggests more traders are active, making the market more liquid and potentially “in play.”
Implementation: Compare current volume to historical averages to flag high activity.

Significant Price Movement

Price movement, such as daily percentage changes, can indicate market activity and volatility.

Metrics: Percentage change over a short period (e.g., 5 days), absolute changes exceeding thresholds (e.g., 5%).
Relevance for Small Caps: Significant price movements suggest volatility, attracting traders and indicating an active market.
Implementation: Calculate rolling percentage changes and set thresholds for significant movement.

External Data: Social Media and On-Chain Metrics

External data can capture public interest and network activity, particularly relevant for small caps with less established market presence.

Social Media Activity: Metrics like the number of X posts, Reddit mentions, or Google Trends scores for the crypto’s name. High activity suggests increased public interest, potentially driving price movements.
On-Chain Metrics: For blockchains, metrics like daily transaction volume, number of active addresses, or changes in circulating supply indicate network usage and adoption, suggesting market activity.
Relevance for Small Caps: These can have a disproportionate impact, offering unique trading signals not captured by price data alone.
Implementation: Fetch data via APIs (e.g., X API, blockchain explorers) and aggregate daily metrics, aligning with price data.

Other Potential Signals

Additional signals include order book depth (if available), news sentiment, and economic indicators, though their impact may vary.

Order Book Depth: A deep order book suggests more liquidity, indicating an active market.
News Sentiment: Positive news can drive interest, especially for small caps sensitive to external catalysts.
Economic Indicators: Less direct, but for region-specific cryptos, exchange rates or inflation might influence activity.

Implementing Signals in Code

To determine if a market is “in play,” we can calculate these metrics, set thresholds, and create a composite indicator. Below is a detailed code example using Python and pandas, assuming a DataFrame df with columns ‘close’, ‘volume’, and ‘time’, and optionally ‘social_activity’ for external data.

import pandas as pd
import numpy as np

# Calculate logarithmic returns
df['returns'] = np.log(df['close'] / df['close'].shift(1))

# Calculate historical volatility (20-day window, annualized with 365 days because crypto trades 24/7)
df['volatility'] = df['returns'].rolling(window=20).std() * np.sqrt(365)

# Calculate moving average of volume
df['volume_ma'] = df['volume'].rolling(window=20).mean()

# Calculate volume quantile (e.g., 75th percentile)
volume_quantile = df['volume'].quantile(0.75)
df['high_volume'] = df['volume'] > volume_quantile

# Calculate percentage change over the last 5 days
df['price_change_5d'] = (df['close'] - df['close'].shift(5)) / df['close'].shift(5) * 100

# Define significant price movement as > 5% or < -5%
df['significant_price_movement'] = abs(df['price_change_5d']) > 5

# Define high volatility as above mean + std dev
volatility_threshold = df['volatility'].mean() + df['volatility'].std()
df['high_volatility'] = df['volatility'] > volatility_threshold

# Create composite indicator for market in play
df['market_in_play'] = df['high_volatility'] & df['high_volume'] & df['significant_price_movement']

# Optionally, include external data
if 'social_activity' in df.columns:
    social_activity_threshold = df['social_activity'].quantile(0.75)
    df['high_social_activity'] = df['social_activity'] > social_activity_threshold
    df['market_in_play'] = df['market_in_play'] & df['high_social_activity']

This code creates a binary indicator market_in_play that flags when the market meets all criteria, which can be adjusted based on strategy.
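
One thing to watch: the thresholds in the example above are computed over the full history, so each flag implicitly depends on future data. If the indicator is fed to the agent as a feature, a past-only version may be safer. Below is a rough sketch with rolling thresholds; the `in_play_flags` name, the window sizes, and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def in_play_flags(df, vol_window=20, ref_window=100, periods_per_year=365):
    """Flag active bars using thresholds computed from past bars only."""
    out = pd.DataFrame(index=df.index)
    log_ret = np.log(df["close"] / df["close"].shift(1))
    vol = log_ret.rolling(vol_window).std() * np.sqrt(periods_per_year)
    # shift(1) keeps the current bar out of its own threshold
    past_vol = vol.shift(1).rolling(ref_window)
    vol_threshold = past_vol.mean() + past_vol.std()
    volume_threshold = df["volume"].shift(1).rolling(ref_window).quantile(0.75)
    # Comparisons against NaN thresholds evaluate to False during warm-up
    out["high_volatility"] = vol > vol_threshold
    out["high_volume"] = df["volume"] > volume_threshold
    out["market_in_play"] = out["high_volatility"] & out["high_volume"]
    return out

# Synthetic daily data: quiet market, then a price and volume spike on the last bar
rng = np.random.default_rng(0)
n = 120
returns = rng.normal(0.0, 0.01, n)
returns[-1] = 0.30
volume = rng.uniform(900.0, 1100.0, n)
volume[-1] = 5000.0
df_demo = pd.DataFrame(
    {"close": 100.0 * np.exp(np.cumsum(returns)), "volume": volume},
    index=pd.date_range("2024-01-01", periods=n, freq="D"),
)
flags = in_play_flags(df_demo, vol_window=10, ref_window=30)
flags_without_last = in_play_flags(df_demo.iloc[:-1], vol_window=10, ref_window=30)
print(flags.tail())
```

Because every statistic is computed from earlier bars, recomputing the flags without the final bar leaves all earlier flags unchanged, which is a quick sanity check for lookahead.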

Considerations for Small or Mid-Cap Cryptos

Given the unique characteristics of small or mid-cap cryptos, implementing these signals requires careful consideration:

Higher Volatility and Sensitivity: Historical volatility and price movement are crucial, given their higher volatility, making them more likely to be “in play” during active periods.
Limited Data Availability: For smaller cryptos, social media or on-chain data might be sparse, requiring robust handling of missing values or alternative sources.
Potential for Noise: External data can introduce noise, so feature selection and validation are essential to ensure they improve model performance.

Practical Implementation and Best Practices

For practical implementation, start with a minimal set of signals (e.g., volatility, volume, price movement) and expand based on data availability. Use libraries like pandas for calculations and validate the composite indicator’s effectiveness by backtesting trading strategies. Adjust thresholds based on historical data and market conditions, especially for small caps where overfitting is a risk.

An unexpected detail is that for small caps, social media activity might have a larger impact due to less established market presence, potentially offering unique trading signals not seen in large-cap cryptos, enhancing the ability to detect when a market is “in play.”

Conclusion

In summary, to determine if market conditions are good and if there’s interest, use signals like high historical volatility, trading volume, significant price movement, and external data such as social media activity or on-chain metrics. For small or mid-cap cryptos, historical volatility can proxy for implied volatility, given limited options trading. Implement these in code by calculating metrics, setting thresholds, and creating a composite indicator, ensuring alignment with trading strategies and market characteristics.
