Machine Learning: Support and Resistance [YinYangAlgorithms]Overview:
Support and Resistance is normally based upon Pivot Points and Highest Highs and Lowest Lows. Many times coders even incorporate Volume, RSI and other factors into the equation. However there may be a downside to doing a pure technical approach based on historical levels. We live in a time where Machine Learning is becoming more and more used; thus we have decided to create a Machine Learning Support and Resistance Projection based Indicator. Rather than using traditional Support and Resistance calculations using historical data, we have taken a rather different approach. This Indicator instead attempts to Predict and Project where Support and Resistance locations will be based on a Machine Learning Model using a form of KNN (k-Nearest Neighbors).
Since this indicator creates a Projection of where it deems Support and Resistance will be, it has the ability to move its Support and Resistance before the price even gets to it if it believes it will surpass its projections. This may create a more accurate placement of Support and Resistance as they’re not based on historical levels.
This Indicator does not Repaint.
How it works:
This Indicator makes its projections based on the source you provide (by default close) of the previous bar and submits the source, RSI and EMA to our Projection Function to get its projection of the current bar.
The Projection function essentially calculates potential movement after finding the differences between the source the MA from the current bar, previous bar and average over the span of Machine Learning Length.
Potential movement is defined as:
Average Difference + Average(Machine Learning Average, Average Last Distance)
Average Difference: (Absolute value of Current Source - Current MA) - (Absolute value of Machine Learning Average - Machine Learning MA)
Average Last Distance: Average(Current Source - Current MA, Previous Source - Previous MA)
It then predicts the next bars directional movement (bullish or bearish bar) using several factors:
Previous Source > Previous MA
Current Source - Current MA > Average Source - Average MA
Current RSI > Previous RSI
Current RSI > 30 and Previous RSI <= 30
Current RSI < 70 and Previous RSI >= 70
This helps us to predict the direction the next bar may move.
We then calculate a multiplier that we apply to our Potential Movement value to get our final result which is our Current Bars Close Projection.
Our multiplier is calculated using:
(Current RSI > 30 and Previous RSI <= 30) OR (Current RSI < 70 and Previous RSI >= 70)
Current Source - Current MA > Previous Source - Previous MA
We then create an array and fill it with the previous X projections (Machine Learning Length) and send it to another function. This function, if told to, will sort the data accordingly and then output the KNN average of the length given.
We calculate and plot various KNN lengths to create different Zones:
Strong Support: Length of 2 but sort the data Ascending (low to high)
Strong Resistance: Length of 2 but sort the data Descending (high to low)
Support: Length of Machine Length Length / 10 or Min of 2 sorted by Ascending
Resistance: Length of Machine Length Length / 10 or Min of 2 sorted by Descending
There are also 4 other plots you may be wondering what they are, there is your AVG, VWMA, Long Term Memory and Current Projection.
By default your Current Projection is disabled in settings but you can enable it if you are curious to see how the projections for each close are calculated. It is, however, not a crucial point of interest (white line).
The average is simply the average value of the Machine Learning Data (purple line).
The VWMA is a VWMA calculation applied to our Data over a length specified in settings (by default 1)(blue line). The VWMA is crucial when combined with the Avg as they can cross over and under each other. These crosses represent potential Bullish and Bearish zones.
Lastly, but certainly not least, we have the Long Term Memory (maroon line). The Long Term Memory can be displayed either as an ‘Average’, ‘Hard Line’ or ‘None’. The Long Term Average is only updated every Machine Learning Length Bar Index’s and is populated with the average of the Machine Learning Data. For Instance, if Machine Learning Length is set to 100, the Long Term Memory is only updated every 100 bars, and since its length is the same as the Machine Learning Length, that means its data is composed of 10,000 bars worth of data. The Long Term Memory may be very beneficial for determining where Support and Resistance lie over the Long Term within a Machine Learning Algorithm. When set to ‘Average’ it plots the connection lines diagonally, and although they may be more visually appealing, they’re less useful when it comes to actually seeing support and resistance as generally speaking, support and resistance lie on the horizontal. When set to ‘Hard Line’ the Long Term Memory is connected with hard lines and holds the price value until the next time it is updated. This makes it much more useful for potentially identifying Support and Resistance.
Tutorial:
Here is an overview of what the Indicator looks like, now let's start to dissect it.
In the example above we can see how all of the lines between the Major Support and Resistance zones may act as BOTH Support and Resistance depending on which side the price is currently on. In the circle on the left, we can see how it can fluctuate between the two. If you look at the circle on the right, we can see how the Average line acts as a strong support before it fails to maintain it. Generally speaking, most Support and Resistance locations may potentially fail to hold after 3 tests, as the Average did in this example.
As you can see, the Support and Resistance doesn’t wait to be tested before adjusting, which is why there are 2 lines which create their zones. The inner line is the Support/Resistance and the outer line is the Strong Support/Resistance. The Yellow Circle shows the inner line was able to calculate the moving resistance correctly and then adjusted accordingly as it was projecting the price to keep increasing. However, if you look at the White Circle, you can see that since there was first a crash, and then parabolic movement, that the inner zone could not move and predict the resistance as well as the outer zone could.
We consider the price to be ‘Overvalued’ when it is above the VWMA (blue line) and ‘Undervalued’ when it is below the VWMA. It is considered ‘fair’ price when it is within the VWMA to Average zone (between the blue and purple lines). If you look at the example above, you’ll notice where the two yellow circles are, it is not only considered ‘Overvalued’, but it then proceeds to ride the inner resistance line upwards. This is common when the market is overly bullish and vice versa when it is bearish. Please keep in mind, although it is common, it doesn’t mean a correction can’t happen.
In this example above we look at the last bull run that may have started due to the halving. This bull run was very bullish as you can see in the example above. The price was constantly sitting within the Resistance Zone and the VWMA that was very close to it was constantly acting as a Support. Naturally, due to the Algorithm used in this Indicator, as the momentum starts to slow down, the VWMA (blue line) will start to space out more and more from the Resistance Zone. This doesn’t mean the momentum is gone, it just means it may be slowing down.
Unfortunately we have to study the Bear Market with a different perspective than the Bull Market. However, there are still some similarities within the two. If you refer to the example above and the previous example, you can clearly see that the Bull Market loves to stay with the Resistance Zone and use the VWMA as a Support. However, the Bear Market does not. This is a normal occurrence, however we can see from the example above you may see a correction / horizontal movement when the Outer Support Line is touched. If you look at all 3 yellow circles, the Outer Support Line was touched, then either a small correction or horizontal consolidation occurred.
We will conclude our Tutorial here, hopefully you’ll be able to benefit from a moving Support and Resistance calculated with Machine Learning that projects its locations, rather than using traditional calculations.
Settings:
Source: This source is the base for all our calculations
Machine Learning Length: How much projection data are we storing and using to make calculations.
Smoothing Length: We need to smooth calculations such as RSI, EMA and VWMA. What length are we smoothing it with?
VWMA ML Projection Length: How far into our Machine Learning data should we average for our VWMA. Please note the 'Smoothing Length' is still applied here after getting the Projection Average.
Long Term Memory: Long term memory has the same storage length but is only updated once per Machine Learning Length. For instance, if Machine Learning Length is 100, it will save the Average of our data once every 100 bars. This means its memory is an average of 10,000 bars of Machine Learning. 'Average' connects its values diagonally whereas 'Hard Line' holds its value until it changes.
Use Average Last Distance In Potential Movement: This can help accuracy but generally also displaces the Support and Resistance by projecting it further.
Show Current Projection: Projections occur for each bar, and our Machine Learning utilizes these projections by storing and evaluating them. This toggle will display the Current Projection Line which is used to create all our Projections.
If you have any questions, comments, ideas or concerns please don't hesitate to contact us.
HAPPY TRADING!
Machinelearning
AI Momentum [YinYang]Overview:
AI Momentum is a kernel function based momentum Indicator. It uses Rational Quadratics to help smooth out the Moving Averages, this may give them a more accurate result. This Indicator has 2 main uses, first it displays ‘Zones’ that help you visualize the potential movement areas and when the price is out of bounds (Overvalued or Undervalued). Secondly it creates signals that display the momentum of the current trend.
The Zones are composed of the Highest Highs and Lowest lows turned into a Rational Quadratic over varying lengths. These create our Rational High and Low zones. There is however a second zone. The second zone is composed of the avg of the Inner High and Inner Low zones (yellow line) and the Rational Quadratic of the current Close. This helps to create a second zone that is within the High and Low bounds that may represent momentum changes within these zones. When the Rationalized Close crosses above the High and Low Zone Average it may signify a bullish momentum change and vice versa when it crosses below.
There are 3 different signals created to display momentum:
Bullish and Bearish Momentum. These signals display when there is current bullish or bearish momentum happening within the trend. When the momentum changes there will likely be a lull where there are neither Bullish or Bearish momentum signals. These signals may be useful to help visualize when the momentum has started and stopped for both the bulls and the bears. Bullish Momentum is calculated by checking if the Rational Quadratic Close > Rational Quadratic of the Highest OHLC4 smoothed over a VWMA. The Bearish Momentum is calculated by checking the opposite.
Overly Bullish and Bearish Momentum. These signals occur when the bar has Bullish or Bearish Momentum and also has an Rationalized RSI greater or less than a certain level. Bullish is >= 57 and Bearish is <= 43. There is also the option to ‘Factor Volume’ into these signals. This means, the Overly Bullish and Bearish Signals will only occur when the Rationalized Volume > VWMA Rationalized Volume as well as the previously mentioned factors above. This can be useful for removing ‘clutter’ as volume may dictate when these momentum changes will occur, but it can also remove some of the useful signals and you may miss the swing too if the volume just was low. Overly Bullish and Bearish Momentum may dictate when a momentum change will occur. Remember, they are OVERLY Bullish and Bearish, meaning there is a chance a correction may occur around these signals.
Bull and Bear Crosses. These signals occur when the Rationalized Close crosses the Gaussian Close that is 2 bars back. These signals may show when there is a strong change in momentum, but be careful as more often than not they’re predicting that the momentum may change in the opposite direction.
Tutorial:
As we can see in the example above, generally what happens is we get the regular Bullish or Bearish momentum, followed by the Rationalized Close crossing the Zone average and finally the Overly Bullish or Bearish signals. This is normally the order of operations but isn’t always how it happens as sometimes momentum changes don’t make it that far; also the Rationalized Close and Zone Average don’t follow any of the same math as the Signals which can result in differing appearances. The Bull and Bear Crosses are also quite sporadic in appearance and don’t generally follow any sort of order of operations. However, they may occur as a Predictor between Bullish and Bearish momentum, signifying the beginning of the momentum change.
The Bull and Bear crosses may be a Predictor of momentum change. They generally happen when there is no Bullish or Bearish momentum happening; and this helps to add strength to their prediction. When they occur during momentum (orange circle) there is a less likely chance that it will happen, and may instead signify the exact opposite; it may help predict a large spike in momentum in the direction of the Bullish or Bearish momentum. In the case of the orange circle, there is currently Bearish Momentum and therefore the Bull Cross may help predict a large momentum movement is about to occur in favor of the Bears.
We have disabled signals here to properly display and talk about the zones. As you can see, Rationalizing the Highest Highs and Lowest Lows over 2 different lengths creates inner and outer bounds that help to predict where parabolic movement and momentum may move to. Our Inner and Outer zones are great for seeing potential Support and Resistance locations.
The secondary zone, which can cross over and change from Green to Red is also a very important zone. Let's zoom in and talk about it specifically.
The Middle Zone Crosses may help deduce where parabolic movement and strong momentum changes may occur. Generally what may happen is when the cross occurs, you will see parabolic movement to the High / Low zones. This may be the Inner zone but can sometimes be the outer zone too. The hard part is sometimes it can be a Fakeout, like displayed with the Blue Circle. The Cross doesn’t mean it may move to the opposing side, sometimes it may just be predicting Parabolic movement in a general sense.
When we turn the Momentum Signals back on, we can see where the Fakeout occurred that it not only almost hit the Inner Low Zone but it also exhibited 2 Overly Bearish Signals. Remember, Overly bearish signals mean a momentum change in favor of the Bulls may occur soon and overly Bullish signals mean a momentum change in favor of the Bears may occur soon.
You may be wondering, well what does “may occur soon” mean and how do we tell?
The purpose of the momentum signals is not only to let you know when Momentum has occurred and when it is still prevalent. It also matters A LOT when it has STOPPED!
In this example above, we look at when the Overly Bullish and Bearish Momentum has STOPPED. As you can see, when the Overly Bullish or Bearish Momentum stopped may be a strong predictor of potential momentum change in the opposing direction.
We will conclude our Tutorial here, hopefully this Indicator has been helpful for showing you where momentum is occurring and help predict how far it may move. We have been dabbling with and are planning on releasing a Strategy based on this Indicator shortly.
Settings:
1. Momentum:
Show Signals: Sometimes it can be difficult to visualize the zones with signals enabled.
Factor Volume: Factor Volume only applies to Overly Bullish and Bearish Signals. It's when the Volume is > VWMA Volume over the Smoothing Length.
Zone Inside Length: The Zone Inside is the Inner zone of the High and Low. This is the length used to create it.
Zone Outside Length: The Zone Outside is the Outer zone of the High and Low. This is the length used to create it.
Smoothing length: Smoothing length is the length used to smooth out our Bullish and Bearish signals, along with our Overly Bullish and Overly Bearish Signals.
2. Kernel Settings:
Lookback Window: The number of bars used for the estimation. This is a sliding value that represents the most recent historical bars. Recommended range: 3-50.
Relative Weighting: Relative weighting of time frames. As this value approaches zero, the longer time frames will exert more influence on the estimation. As this value approaches infinity, the behavior of the Rational Quadratic Kernel will become identical to the Gaussian kernel. Recommended range: 0.25-25.
Start Regression at Bar: Bar index on which to start regression. The first bars of a chart are often highly volatile, and omission of these initial bars often leads to a better overall fit. Recommended range: 5-25.
If you have any questions, comments, ideas or concerns please don't hesitate to contact us.
HAPPY TRADING!
Machine Learning: Trend Pulse⚠️❗ Important Limitations: Due to the way this script is designed, it operates specifically under certain conditions:
Stocks & Forex : Only compatible with timeframes of 8 hours and above ⏰
Crypto : Only works with timeframes starting from 4 hours and higher ⏰
❗Please note that the script will not work on lower timeframes.❗
Feature Extraction : It begins by identifying a window of past price changes. Think of this as capturing the "mood" of the market over a certain period.
Distance Calculation : For each historical data point, it computes a distance to the current window. This distance measures how similar past and present market conditions are. The smaller the distance, the more similar they are.
Neighbor Selection : From these, it selects 'k' closest neighbors. The variable 'k' is a user-defined parameter indicating how many of the closest historical points to consider.
Price Estimation : It then takes the average price of these 'k' neighbors to generate a forecast for the next stock price.
Z-Score Scaling: Lastly, this forecast is normalized using the Z-score to make it more robust and comparable over time.
Inputs:
histCap (Historical Cap) : histCap limits the number of past bars the script will consider. Think of it as setting the "memory" of model—how far back in time it should look.
sampleSpeed (Sampling Rate) : sampleSpeed is like a time-saving shortcut, allowing the script to skip bars and only sample data points at certain intervals. This makes the process faster but could potentially miss some nuances in the data.
winSpan (Window Size) : This is the size of the "snapshot" of market data the script will look at each time. The window size sets how many bars the algorithm will include when it's measuring how "similar" the current market conditions are to past conditions.
All these variables help to simplify and streamline the k-NN model, making it workable within limitations. You could see them as tuning knobs, letting you balance between computational efficiency and predictive accuracy.
AI Channels (Clustering) [LuxAlgo]The AI Channels indicator is constructed based on rolling K-means clustering, a common machine learning method used for clustering analysis. These channels allow users to determine the direction of the underlying trends in the price.
We also included an option to display the indicator as a trailing stop from within the settings.
🔶 USAGE
Each channel extremity allows users to determine the current trend direction. Price breaking over the upper extremity suggesting an uptrend, and price breaking below the lower extremity suggesting a downtrend. Using a higher Window Size value will return longer-term indications.
The "Clusters" setting allows users to control how easy it is for the price to break an extremity, with higher values returning extremities further away from the price.
The "Denoise Channels" is enabled by default and allows to see less noisy extremities that are more coherent with the detected trend.
Users who wish to have more focus on a detected trend can display the indicator as a trailing stop.
🔹 Centroid Dispersion Areas
Each extremity is made of one area. The width of each area indicates how spread values within a cluster are around their centroids. A wider area would suggest that prices within a cluster are more spread out around their centroid, as such one could say that it is indicative of the volatility of a cluster.
Wider areas around a specific extremity can indicate a larger and more spread-out amount of prices within the associated cluster. In practice price entering an area has a higher chance to break an associated extremity.
🔶 DETAILS
The indicator performs K-means clustering over the most recent Window Size prices, finding a number of user-specified clusters. See here to find more information on cluster detection.
The channel extremities are returned as the centroid of the lowest, average, and highest price clusters.
K-means clustering can be computationally expensive and as such we allow users to determine the maximum number of iterations used to find the centroids as well as the number of most historical bars to perform the indicator calculation. Do note that increasing the calculation window of the indicator as well as the number of clusters will return slower results.
🔶 SETTINGS
Window Size: Amount of most recent prices to use for the calculation of the indicator.
Clusters": Amount of clusters detected for the calculation of the indicator.
Denoise Channels: When enabled, return less noisy channels extremities, disabling this setting will return the exact centroids at each time but will produce less regular extremities.
As Trailing Stop: Display the indicator as a trailing stop.
🔹 Optimization
This group of settings affects the runtime performance of the script.
Maximum Iteration Steps: Maximum number of iterations allowed for finding centroids. Excessively low values can return a better script load time but poor clustering.
Historical Bars Calculation: Calculation window of the script (in bars).
SimilarityMeasuresLibrary "SimilarityMeasures"
Similarity measures are statistical methods used to quantify the distance between different data sets
or strings. There are various types of similarity measures, including those that compare:
- data points (SSD, Euclidean, Manhattan, Minkowski, Chebyshev, Correlation, Cosine, Camberra, MAE, MSE, Lorentzian, Intersection, Penrose Shape, Meehl),
- strings (Edit(Levenshtein), Lee, Hamming, Jaro),
- probability distributions (Mahalanobis, Fidelity, Bhattacharyya, Hellinger),
- sets (Kumar Hassebrook, Jaccard, Sorensen, Chi Square).
---
These measures are used in various fields such as data analysis, machine learning, and pattern recognition. They
help to compare and analyze similarities and differences between different data sets or strings, which
can be useful for making predictions, classifications, and decisions.
---
References:
en.wikipedia.org
cran.r-project.org
numerics.mathdotnet.com
github.com
github.com
github.com
Encyclopedia of Distances, doi.org
ssd(p, q)
Sum of squared difference for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Measure of distance that calculates the squared euclidean distance.
euclidean(p, q)
Euclidean distance for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Measure of distance that calculates the straight-line (or Euclidean).
manhattan(p, q)
Manhattan distance for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Measure of absolute differences between both points.
minkowski(p, q, p_value)
Minkowsky Distance for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
p_value (float) : `float` P value, default=1.0(1: manhatan, 2: euclidean), does not support chebychev.
Returns: Measure of similarity in the normed vector space.
chebyshev(p, q)
Chebyshev distance for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Measure of maximum absolute difference.
correlation(p, q)
Correlation distance for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Measure of maximum absolute difference.
cosine(p, q)
Cosine distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The Cosine distance between vectors `p` and `q`.
---
angiogenesis.dkfz.de
camberra(p, q)
Camberra distance for N dimensions.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Weighted measure of absolute differences between both points.
mae(p, q)
Mean absolute error is a normalized version of the sum of absolute difference (manhattan).
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Mean absolute error of vectors `p` and `q`.
mse(p, q)
Mean squared error is a normalized version of the sum of squared difference.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Mean squared error of vectors `p` and `q`.
lorentzian(p, q)
Lorentzian distance between provided vectors.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Lorentzian distance of vectors `p` and `q`.
---
angiogenesis.dkfz.de
intersection(p, q)
Intersection distance between provided vectors.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Intersection distance of vectors `p` and `q`.
---
angiogenesis.dkfz.de
penrose(p, q)
Penrose Shape distance between provided vectors.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Penrose shape distance of vectors `p` and `q`.
---
angiogenesis.dkfz.de
meehl(p, q)
Meehl distance between provided vectors.
Parameters:
p (float ) : `array` Vector with first numeric distribution.
q (float ) : `array` Vector with second numeric distribution.
Returns: Meehl distance of vectors `p` and `q`.
---
angiogenesis.dkfz.de
edit(x, y)
Edit (aka Levenshtein) distance for indexed strings.
Parameters:
x (int ) : `array` Indexed array.
y (int ) : `array` Indexed array.
Returns: Number of deletions, insertions, or substitutions required to transform source string into target string.
---
generated description:
The Edit distance is a measure of similarity used to compare two strings. It is defined as the minimum number of
operations (insertions, deletions, or substitutions) required to transform one string into another. The operations
are performed on the characters of the strings, and the cost of each operation depends on the specific algorithm
used.
The Edit distance is widely used in various applications such as spell checking, text similarity, and machine
translation. It can also be used for other purposes like finding the closest match between two strings or
identifying the common prefixes or suffixes between them.
---
github.com
www.red-gate.com
planetcalc.com
lee(x, y, dsize)
Distance between two indexed strings of equal length.
Parameters:
x (int ) : `array` Indexed array.
y (int ) : `array` Indexed array.
dsize (int) : `int` Dictionary size.
Returns: Distance between two strings by accounting for dictionary size.
---
www.johndcook.com
hamming(x, y)
Distance between two indexed strings of equal length.
Parameters:
x (int ) : `array` Indexed array.
y (int ) : `array` Indexed array.
Returns: Length of different components on both sequences.
---
en.wikipedia.org
jaro(x, y)
Distance between two indexed strings.
Parameters:
x (int ) : `array` Indexed array.
y (int ) : `array` Indexed array.
Returns: Measure of two strings' similarity: the higher the value, the more similar the strings are.
The score is normalized such that `0` equates to no similarities and `1` is an exact match.
---
rosettacode.org
mahalanobis(p, q, VI)
Mahalanobis distance between two vectors with population inverse covariance matrix.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
VI (matrix) : `matrix` Inverse of the covariance matrix.
Returns: The mahalanobis distance between vectors `p` and `q`.
---
people.revoledu.com
stat.ethz.ch
docs.scipy.org
fidelity(p, q)
Fidelity distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The Bhattacharyya Coefficient between vectors `p` and `q`.
---
en.wikipedia.org
bhattacharyya(p, q)
Bhattacharyya distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The Bhattacharyya distance between vectors `p` and `q`.
---
en.wikipedia.org
hellinger(p, q)
Hellinger distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The hellinger distance between vectors `p` and `q`.
---
en.wikipedia.org
jamesmccaffrey.wordpress.com
kumar_hassebrook(p, q)
Kumar Hassebrook distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The Kumar Hassebrook distance between vectors `p` and `q`.
---
github.com
jaccard(p, q)
Jaccard distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The Jaccard distance between vectors `p` and `q`.
---
github.com
sorensen(p, q)
Sorensen distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
Returns: The Sorensen distance between vectors `p` and `q`.
---
people.revoledu.com
chi_square(p, q, eps)
Chi Square distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
eps (float)
Returns: The Chi Square distance between vectors `p` and `q`.
---
uw.pressbooks.pub
stats.stackexchange.com
www.itl.nist.gov
kulczynsky(p, q, eps)
Kulczynsky distance between provided vectors.
Parameters:
p (float ) : `array` 1D Vector.
q (float ) : `array` 1D Vector.
eps (float)
Returns: The Kulczynsky distance between vectors `p` and `q`.
---
github.com
Hybrid EMA AlgoLearner⭕️Innovative trading indicator that utilizes a k-NN-inspired algorithmic approach alongside traditional Exponential Moving Averages (EMAs) for more nuanced analysis. While the algorithm doesn't actually employ machine learning techniques, it mimics the logic of the k-Nearest Neighbors (k-NN) methodology. The script takes into account the closest 'k' distances between a short-term and long-term EMA to create a weighted short-term EMA. This combination of rule-based logic and EMA technicals offers traders a more sophisticated tool for market analysis.
⭕️Foundational EMAs: The script kicks off by generating a 50-period short-term EMA and a 200-period long-term EMA. These EMAs serve a dual purpose: they provide the basic trend-following capability familiar to most traders, akin to the classic EMA 50 and EMA 200, and set the stage for more intricate calculations to follow.
⭕️k-NN Integration: The indicator distinguishes itself by introducing k-NN (k-Nearest Neighbors) logic into the mix. This machine learning technique scans prior market data to find the closest 'neighbors' or distances between the two EMAs. The 'k' closest distances are then picked for further analysis, thus imbuing the indicator with an added layer of data-driven context.
⭕️Algorithmic Weighting: After the k closest distances are identified, they are utilized to compute a weighted EMA. Each of the k closest short-term EMA values is weighted by its associated distance. These weighted values are summed up and normalized by the sum of all chosen distances. The result is a weighted short-term EMA that packs more nuanced information than a simple EMA would.
Machine Learning Regression Trend [LuxAlgo]The Machine Learning Regression Trend tool uses random sample consensus (RANSAC) to fit and extrapolate a linear model by discarding potential outliers, resulting in a more robust fit.
🔶 USAGE
The proposed tool can be used like a regular linear regression, providing support/resistance as well as forecasting an estimated underlying trend.
Using RANSAC allows filtering out outliers from the input data of our final fit, by outliers we are referring to values deviating from the underlying trend whose influence on a fitted model is undesired. For financial prices and under the assumptions of segmented linear trends, these outliers can be caused by volatile moves and/or periodic variations within an underlying trend.
Adjusting the "Allowed Error" numerical setting will determine how sensitive the model is to outliers, with higher values returning a more sensitive model. The blue margin displayed shows the allowed error area.
The number of outliers in the calculation window (represented by red dots) can also be indicative of the amount of noise added to an underlying linear trend in the price, with more outliers suggesting more noise.
Compared to a regular linear regression which does not discriminate against any point in the calculation window, we see that the model using RANSAC is more conservative, giving more importance to detecting a higher number of inliners.
🔶 DETAILS
RANSAC is a general approach to fitting more robust models in the presence of outliers in a dataset and as such does not limit itself to a linear regression model.
This iterative approach can be summarized as follow for the case of our script:
Step 1: Obtain a subset of our dataset by randomly selecting 2 unique samples
Step 2: Fit a linear regression to our subset
Step 3: Get the error between the value within our dataset and the fitted model at time t , if the absolute error is lower than our tolerance threshold then that value is an inlier
Step 4: If the amount of detected inliers is greater than a user-set amount save the model
Repeat steps 1 to 4 until the set number of iterations is reached and use the model that maximizes the number of inliers
🔶 SETTINGS
Length: Calculation window of the linear regression.
Width: Linear regression channel width.
Source: Input data for the linear regression calculation.
🔹 RANSAC
Minimum Inliers: Minimum number of inliers required to return an appropriate model.
Allowed Error: Determine the tolerance threshold used to detect potential inliers. "Auto" will automatically determine the tolerance threshold and will allow the user to multiply it through the numerical input setting at the side. "Fixed" will use the user-set value as the tolerance threshold.
Maximum Iterations Steps: Maximum number of allowed iterations.
Support & Resistance AI (K means/median) [ThinkLogicAI]█ OVERVIEW
K-means is a clustering algorithm commonly used in machine learning to group data points into distinct clusters based on their similarities. While K-means is not typically used directly for identifying support and resistance levels in financial markets, it can serve as a tool in a broader analysis approach.
Support and resistance levels are price levels in financial markets where the price tends to react or reverse. Support is a level where the price tends to stop falling and might start to rise, while resistance is a level where the price tends to stop rising and might start to fall. Traders and analysts often look for these levels as they can provide insights into potential price movements and trading opportunities.
█ BACKGROUND
The K-means algorithm has been around since the late 1950s, making it more than six decades old. The algorithm was introduced by Stuart Lloyd in his 1957 research paper "Least squares quantization in PCM" for telecommunications applications. However, it wasn't widely known or recognized until James MacQueen's 1967 paper "Some Methods for Classification and Analysis of Multivariate Observations," where he formalized the algorithm and referred to it as the "K-means" clustering method.
So, while K-means has been around for a considerable amount of time, it continues to be a widely used and influential algorithm in the fields of machine learning, data analysis, and pattern recognition due to its simplicity and effectiveness in clustering tasks.
█ COMPARE AND CONTRAST SUPPORT AND RESISTANCE METHODS
1) K-means Approach:
Cluster Formation: After applying the K-means algorithm to historical price change data and visualizing the resulting clusters, traders can identify distinct regions on the price chart where clusters are formed. Each cluster represents a group of similar price change patterns.
Cluster Analysis: Analyze the clusters to identify areas where clusters tend to form. These areas might correspond to regions of price behavior that repeat over time and could be indicative of support and resistance levels.
Potential Support and Resistance Levels: Based on the identified areas of cluster formation, traders can consider these regions as potential support and resistance levels. A cluster forming at a specific price level could suggest that this level has been historically significant, causing similar price behavior in the past.
Cluster Standard Deviation: In addition to looking at the means (centroids) of the clusters, traders can also calculate the standard deviation of price changes within each cluster. Standard deviation is a measure of the dispersion or volatility of data points around the mean. A higher standard deviation indicates greater price volatility within a cluster.
Low Standard Deviation: If a cluster has a low standard deviation, it suggests that prices within that cluster are relatively stable and less likely to exhibit sudden and large price movements. Traders might consider placing tighter stop-loss orders for trades within these clusters.
High Standard Deviation: Conversely, if a cluster has a high standard deviation, it indicates greater price volatility within that cluster. Traders might opt for wider stop-loss orders to allow for potential price fluctuations without getting stopped out prematurely.
Cluster Density: Each data point is assigned to a cluster so a cluster that is more dense will act more like gravity and
2) Traditional Approach:
Trendlines: Draw trendlines connecting significant highs or lows on a price chart to identify potential support and resistance levels.
Chart Patterns: Identify chart patterns like double tops, double bottoms, head and shoulders, and triangles that often indicate potential reversal points.
Moving Averages: Use moving averages to identify levels where the price might find support or resistance based on the average price over a specific period.
Psychological Levels: Identify round numbers or levels that traders often pay attention to, which can act as support and resistance.
Previous Highs and Lows: Identify significant previous price highs and lows that might act as support or resistance.
The key difference lies in the approach and the foundation of these methods. Traditional methods are based on well-established principles of technical analysis and market psychology, while the K-means approach involves clustering price behavior without necessarily incorporating market sentiment or specific price patterns.
It's important to note that while the K-means approach might provide an interesting way to analyze price data, it should be used cautiously and in conjunction with other traditional methods. Financial markets are influenced by a wide range of factors beyond just price behavior, and the effectiveness of any method for identifying support and resistance levels should be thoroughly tested and validated. Additionally, developments in trading strategies and analysis techniques could have occurred since my last update.
█ K MEANS ALGORITHM
The algorithm for K means is as follows:
Initialize cluster centers
assign data to clusters based on minimum distance
calculate cluster center by taking the average or median of the clusters
repeat steps 1-3 until cluster centers stop moving
█ LIMITATIONS OF K MEANS
There are 3 main limitations of this algorithm:
Sensitive to Initializations: K-means is sensitive to the initial placement of centroids. Different initializations can lead to different cluster assignments and final results.
Assumption of Equal Sizes and Variances: K-means assumes that clusters have roughly equal sizes and spherical shapes. This may not hold true for all types of data. It can struggle with identifying clusters with uneven densities, sizes, or shapes.
Impact of Outliers: K-means is sensitive to outliers, as a single outlier can significantly affect the position of cluster centroids. Outliers can lead to the creation of spurious clusters or distortion of the true cluster structure.
█ LIMITATIONS IN APPLICATION OF K MEANS IN TRADING
Trading data often exhibits characteristics that can pose challenges when applying indicators and analysis techniques. Here's how the limitations of outliers, varying scales, and unequal variance can impact the use of indicators in trading:
Outliers are data points that significantly deviate from the rest of the dataset. In trading, outliers can represent extreme price movements caused by rare events, news, or market anomalies. Outliers can have a significant impact on trading indicators and analyses:
Indicator Distortion: Outliers can skew the calculations of indicators, leading to misleading signals. For instance, a single extreme price spike could cause indicators like moving averages or RSI (Relative Strength Index) to give false signals.
Risk Management: Outliers can lead to overly aggressive trading decisions if not properly accounted for. Ignoring outliers might result in unexpected losses or missed opportunities to adjust trading strategies.
Different Scales: Trading data often includes multiple indicators with varying units and scales. For example, prices are typically in dollars, volume in units traded, and oscillators have their own scale. Mixing indicators with different scales can complicate analysis:
Normalization: Indicators on different scales need to be normalized or standardized to ensure they contribute equally to the analysis. Failure to do so can lead to one indicator dominating the analysis due to its larger magnitude.
Comparability: Without normalization, it's challenging to directly compare the significance of indicators. Some indicators might have a larger numerical range and could overshadow others.
Unequal Variance: Unequal variance in trading data refers to the fact that some indicators might exhibit higher volatility than others. This can impact the interpretation of signals and the performance of trading strategies:
Volatility Adjustment: When combining indicators with varying volatility, it's essential to adjust for their relative volatilities. Failure to do so might lead to overemphasizing or underestimating the importance of certain indicators in the trading strategy.
Risk Assessment: Unequal variance can impact risk assessment. Indicators with higher volatility might lead to riskier trading decisions if not properly taken into account.
█ APPLICATION OF THIS INDICATOR
This indicator can be used in 2 ways:
1) Make a directional trade:
If a trader thinks price will go higher or lower and price is within a cluster zone, The trader can take a position and place a stop on the 1 sd band around the cluster. As one can see below, the trader can go long the green arrow and place a stop on the one standard deviation mark for that cluster below it at the red arrow. using this we can calculate a risk to reward ratio.
Calculating risk to reward: targeting a risk reward ratio of 2:1, the trader could clearly make that given that the next resistance area above that in the orange cluster exceeds this risk reward ratio.
2) Take a reversal Trade:
We can use cluster centers (support and resistance levels) to go in the opposite direction that price is currently moving in hopes of price forming a pivot and reversing off this level.
Similar to the directional trade, we can use the standard deviation of the cluster to place a stop just in case we are wrong.
In this example below we can see that shorting on the red arrow and placing a stop at the one standard deviation above this cluster would give us a profitable trade with minimal risk.
Using the cluster density table in the upper right informs the trader just how dense the cluster is. Higher density clusters will give a higher likelihood of a pivot forming at these levels and price being rejected and switching direction with a larger move.
█ FEATURES & SETTINGS
General Settings:
Number of clusters: The user can select from 3 to five clusters. A good rule of thumb is that if you are trading intraday, less is more (Think 3 rather than 5). For daily 4 to 5 clusters is good.
Cluster Method: To get around the outlier limitation of k means clustering, The median was added. This gives the user the ability to choose either k means or k median clustering. K means is the preferred method if the user things there are no large outliers, and if there appears to be large outliers or it is assumed there are then K medians is preferred.
Bars back To train on: This will be the amount of bars to include in the clustering. This number is important so that the user includes bars that are recent but not so far back that they are out of the scope of where price can be. For example the last 2 years we have been in a range on the sp500 so 505 days in this setting would be more relevant than say looking back 5 years ago because price would have to move far to get there.
Show SD Bands: Select this to show the 1 standard deviation bands around the support and resistance level or unselect this to just show the support and resistance level by itself.
Features:
Besides the support and resistance levels and standard deviation bands, this indicator gives a table in the upper right hand corner to show the density of each cluster (support and resistance level) and is color coded to the cluster line on the chart. Higher density clusters mean price has been there previously more than lower density clusters and could mean a higher likelihood of a reversal when price reaches these areas.
█ WORKS CITED
Victor Sim, "Using K-means Clustering to Create Support and Resistance", 2020, towardsdatascience.com
Chris Piech, "K means", stanford.edu
█ ACKNOLWEDGMENTS
@jdehorty- Thanks for the publish template. It made organizing my thoughts and work alot easier.
AI SuperTrend Clustering Oscillator [LuxAlgo]The AI SuperTrend Clustering Oscillator is an oscillator returning the most bullish/average/bearish centroids given by multiple instances of the difference between SuperTrend indicators.
This script is an extension of our previously posted SuperTrend AI indicator that makes use of k-means clustering. If you want to learn more about it see:
🔶 USAGE
The AI SuperTrend Clustering Oscillator is made of 3 distinct components, a bullish output (always the highest), a bearish output (always the lowest), and a "consensus" output always within the two others.
The general trend is given by the consensus output, with a value above 0 indicating an uptrend and under 0 indicating a downtrend. Using a higher minimum factor will weigh results toward longer-term trends, while lowering the maximum factor will weigh results toward shorter-term trends.
Strong trends are indicated when the bullish/bearish outputs are indicating an opposite sentiment. A strong bullish trend would for example be indicated when the bearish output is above 0, while a strong bearish trend would be indicated when the bullish output is below 0.
When the consensus output is indicating a specific trend direction, an opposite indication from the bullish/bearish output can highlight a potential reversal or retracement.
🔶 DETAILS
The indicator construction is based on finding three clusters from the difference between the closing price and various SuperTrend using different factors. The centroid of each cluster is then returned. This operation is done over all historical bars.
The highest cluster will be composed of the differences between the price and SuperTrends that are the highest, thus creating a more bullish group. The lowest cluster will be composed of the differences between the price and SuperTrends that are the lowest, thus creating a more bearish group.
The consensus cluster is composed of the differences between the price and SuperTrends that are not significant enough to be part of the other clusters.
🔶 SETTINGS
ATR Length: ATR period used for the calculation of the SuperTrends.
Factor Range: Determine the minimum and maximum factor values for the calculation of the SuperTrends.
Step: Increments of the factor range.
Smooth: Degree of smoothness of each output from the indicator.
🔹 Optimization
This group of settings affects the runtime performances of the script.
Maximum Iteration Steps: Maximum number of iterations allowed for finding centroids. Excessively low values can return a better script load time but poor clustering.
Historical Bars Calculation: Calculation window of the script (in bars).
SuperTrend AI (Clustering) [LuxAlgo]The SuperTrend AI indicator is a novel take on bridging the gap between the K-means clustering machine learning method & technical indicators. In this case, we apply K-Means clustering to the famous SuperTrend indicator.
🔶 USAGE
Users can interpret the SuperTrend AI trailing stop similarly to the regular SuperTrend indicator. Using higher minimum/maximum factors will return longer-term signals.
The displayed performance metrics displayed on each signal allow for a deeper interpretation of the indicator. Whereas higher values could indicate a higher potential for the market to be heading in the direction of the trend when compared to signals with lower values such as 1 or 0 potentially indicating retracements.
In the image above, we can notice more clear examples of the performance metrics on signals indicating trends, however, these performance metrics cannot perform or predict every signal reliably.
We can see in the image above that the trailing stop and its adaptive moving average can also act as support & resistance. Using higher values of the performance memory setting allows users to obtain a longer-term adaptive moving average of the returned trailing stop.
🔶 DETAILS
🔹 K-Means Clustering
When observing data points within a specific space, we can sometimes observe that some are closer to each other, forming groups, or "Clusters". At first sight, identifying those clusters and finding their associated data points can seem easy but doing so mathematically can be more challenging. This is where cluster analysis comes into play, where we seek to group data points into various clusters such that data points within one cluster are closer to each other. This is a common branch of AI/machine learning.
Various methods exist to find clusters within data, with the one used in this script being K-Means Clustering , a simple iterative unsupervised clustering method that finds a user-set amount of clusters.
A naive form of the K-Means algorithm would perform the following steps in order to find K clusters:
(1) Determine the amount (K) of clusters to detect.
(2) Initiate our K centroids (cluster centers) with random values.
(3) Loop over the data points, and determine which is the closest centroid from each data point, then associate that data point with the centroid.
(4) Update centroids by taking the average of the data points associated with a specific centroid.
Repeat steps 3 to 4 until convergence, that is until the centroids no longer change.
To explain how K-Means works graphically let's take the example of a one-dimensional dataset (which is the dimension used in our script) with two apparent clusters:
This is of course a simple scenario, as K will generally be higher, as well the amount of data points. Do note that this method can be very sensitive to the initialization of the centroids, this is why it is generally run multiple times, keeping the run returning the best centroids.
🔹 Adaptive SuperTrend Factor Using K-Means
The proposed indicator rationale is based on the following hypothesis:
Given multiple instances of an indicator using different settings, the optimal setting choice at time t is given by the best-performing instance with setting s(t) .
Performing the calculation of the indicator using the best setting at time t would return an indicator whose characteristics adapt based on its performance. However, what if the setting of the best-performing instance and second best-performing instance of the indicator have a high degree of disparity without a high difference in performance?
Even though this specific case is rare its however not uncommon to see that performance can be similar for a group of specific settings (this could be observed in a parameter optimization heatmap), then filtering out desirable settings to only use the best-performing one can seem too strict. We can as such reformulate our first hypothesis:
Given multiple instances of an indicator using different settings, an optimal setting choice at time t is given by the average of the best-performing instances with settings s(t) .
Finding this group of best-performing instances could be done using the previously described K-Means clustering method, assuming three groups of interest (K = 3) defined as worst performing, average performing, and best performing.
We first obtain an analog of performance P(t, factor) described as:
P(t, factor) = P(t-1, factor) + α * (∆C(t) × S(t-1, factor) - P(t-1, factor))
where 1 > α > 0, which is the performance memory determining the degree to which older inputs affect the current output. C(t) is the closing price, and S(t, factor) is the SuperTrend signal generating function with multiplicative factor factor .
We run this performance function for multiple factor settings and perform K-Means clustering on the multiple obtained performances to obtain the best-performing cluster. We initiate our centroids using quartiles of the obtained performances for faster centroids convergence.
The average of the factors associated with the best-performing cluster is then used to obtain the final factor setting, which is used to compute the final SuperTrend output.
Do note that we give the liberty for the user to get the final factor from the best, average, or worst cluster for experimental purposes.
🔶 SETTINGS
ATR Length: ATR period used for the calculation of the SuperTrends.
Factor Range: Determine the minimum and maximum factor values for the calculation of the SuperTrends.
Step: Increments of the factor range.
Performance Memory: Determine the degree to which older inputs affect the current output, with higher values returning longer-term performance measurements.
From Cluster: Determine which cluster is used to obtain the final factor.
🔹 Optimization
This group of settings affects the runtime performances of the script.
Maximum Iteration Steps: Maximum number of iterations allowed for finding centroids. Excessively low values can return a better script load time but poor clustering.
Historical Bars Calculation: Calculation window of the script (in bars).
Machine Learning Momentum Index (MLMI) [Zeiierman]█ Overview
The Machine Learning Momentum Index (MLMI) represents the next step in oscillator trading. By blending traditional momentum analysis with machine learning, MLMI delivers a potent and dynamic tool that aligns with the complexities of modern financial landscapes. Offering traders an adaptive way to understand and act on market momentum and trends, this oscillator provides real-time insights into market momentum and prevailing trends.
█ How It Works:
Momentum Analysis: MLMI employs a dual-layer analysis, utilizing quick and slow weighted moving averages (WMA) of the Relative Strength Index (RSI) to gauge the market's momentum and direction.
Machine Learning Integration: Through the k-Nearest Neighbors (k-NN) algorithm, MLMI intelligently examines historical data to make more accurate momentum predictions, adapting to the intricate patterns of the market.
MLMI's precise calculation involves:
Weighted Moving Averages: Calculations of quick (5-period) and slow (20-period) WMAs of the RSI to track short-term and long-term momentum.
k-Nearest Neighbors Algorithm: Distances between current parameters and previous data are measured, and the nearest neighbors are used for predictive modeling.
Trend Analysis: Recognition of prevailing trends through the relationship between quick and slow-moving averages.
█ How to use
The Machine Learning Momentum Index (MLMI) can be utilized in much the same way as traditional trend and momentum oscillators, providing key insights into market direction and strength. What sets MLMI apart is its integration of artificial intelligence, allowing it to adapt dynamically to market changes and offer a more nuanced and responsive analysis.
Identifying Trend Direction and Strength: The MLMI serves as a tool to recognize market trends, signaling whether the momentum is upward or downward. It also provides insights into the intensity of the momentum, helping traders understand both the direction and strength of prevailing market trends.
Identifying Consolidation Areas: When the MLMI Prediction line and the WMA of the MLMI Prediction line become flat/oscillate around the mid-level, it's a strong sign that the market is in a consolidation phase. This insight from the MLMI allows traders to recognize periods of market indecision.
Recognizing Overbought or Oversold Conditions: By identifying levels where the market may be overbought or oversold, MLMI offers insights into potential price corrections or reversals.
█ Settings
Prediction Data (k)
This parameter controls the number of neighbors to consider while making a prediction using the k-Nearest Neighbors (k-NN) algorithm. By modifying the value of k, you can change how sensitive the prediction is to local fluctuations in the data.
A smaller value of k will make the prediction more sensitive to local variations and can lead to a more erratic prediction line.
A larger value of k will consider more neighbors, thus making the prediction more stable but potentially less responsive to sudden changes.
Trend length
This parameter controls the length of the trend used in computing the momentum. This length refers to the number of periods over which the momentum is calculated, affecting how quickly the indicator reacts to changes in the underlying price movements.
A shorter trend length (smaller momentumWindow) will make the indicator more responsive to short-term price changes, potentially generating more signals but at the risk of more false alarms.
A longer trend length (larger momentumWindow) will make the indicator smoother and less responsive to short-term noise, but it may lag in reacting to significant price changes.
Please note that the Machine Learning Momentum Index (MLMI) might not be effective on higher timeframes, such as daily or above. This limitation arises because there may not be enough data at these timeframes to provide accurate momentum and trend analysis. To overcome this challenge and make the most of what MLMI has to offer, it's recommended to use the indicator on lower timeframes.
-----------------
Disclaimer
The information contained in my Scripts/Indicators/Ideas/Algos/Systems does not constitute financial advice or a solicitation to buy or sell any securities of any type. I will not accept liability for any loss or damage, including without limitation any loss of profit, which may arise directly or indirectly from the use of or reliance on such information.
All investments involve risk, and the past performance of a security, industry, sector, market, financial product, trading strategy, backtest, or individual's trading does not guarantee future results or returns. Investors are fully responsible for any investment decisions they make. Such decisions should be based solely on an evaluation of their financial circumstances, investment objectives, risk tolerance, and liquidity needs.
My Scripts/Indicators/Ideas/Algos/Systems are only for educational purposes!
AI Trend Navigator [K-Neighbor]█ Overview
In the evolving landscape of trading and investment, the demand for sophisticated and reliable tools is ever-growing. The AI Trend Navigator is an indicator designed to meet this demand, providing valuable insights into market trends and potential future price movements. The AI Trend Navigator indicator is designed to predict market trends using the k-Nearest Neighbors (KNN) classifier.
By intelligently analyzing recent price actions and emphasizing similar values, it helps traders to navigate complex market conditions with confidence. It provides an advanced way to analyze trends, offering potentially more accurate predictions compared to simpler trend-following methods.
█ Calculations
KNN Moving Average Calculation: The core of the algorithm is a KNN Moving Average that computes the mean of the 'k' closest values to a target within a specified window size. It does this by iterating through the window, calculating the absolute differences between the target and each value, and then finding the mean of the closest values. The target and value are selected based on user preferences (e.g., using the VWAP or Volatility as a target).
KNN Classifier Function: This function applies the k-nearest neighbor algorithm to classify the price action into positive, negative, or neutral trends. It looks at the nearest 'k' bars, calculates the Euclidean distance between them, and categorizes them based on the relative movement. It then returns the prediction based on the highest count of positive, negative, or neutral categories.
█ How to use
Traders can use this indicator to identify potential trend directions in different markets.
Spotting Trends: Traders can use the KNN Moving Average to identify the underlying trend of an asset. By focusing on the k closest values, this component of the indicator offers a clearer view of the trend direction, filtering out market noise.
Trend Confirmation: The KNN Classifier component can confirm existing trends by predicting the future price direction. By aligning predictions with current trends, traders can gain more confidence in their trading decisions.
█ Settings
PriceValue: This determines the type of price input used for distance calculation in the KNN algorithm.
hl2: Uses the average of the high and low prices.
VWAP: Uses the Volume Weighted Average Price.
VWAP: Uses the Volume Weighted Average Price.
Effect: Changing this input will modify the reference values used in the KNN classification, potentially altering the predictions.
TargetValue: This sets the target variable that the KNN classification will attempt to predict.
Price Action: Uses the moving average of the closing price.
VWAP: Uses the Volume Weighted Average Price.
Volatility: Uses the Average True Range (ATR).
Effect: Selecting different targets will affect what the KNN is trying to predict, altering the nature and intent of the predictions.
Number of Closest Values: Defines how many closest values will be considered when calculating the mean for the KNN Moving Average.
Effect: Increasing this value makes the algorithm consider more nearest neighbors, smoothing the indicator and potentially making it less reactive. Decreasing this value may make the indicator more sensitive but possibly more prone to noise.
Neighbors: This sets the number of neighbors that will be considered for the KNN Classifier part of the algorithm.
Effect: Adjusting the number of neighbors affects the sensitivity and smoothness of the KNN classifier.
Smoothing Period: Defines the smoothing period for the moving average used in the KNN classifier.
Effect: Increasing this value would make the KNN Moving Average smoother, potentially reducing noise. Decreasing it would make the indicator more reactive but possibly more prone to false signals.
█ What is K-Nearest Neighbors (K-NN) algorithm?
At its core, the K-NN algorithm recognizes patterns within market data and analyzes the relationships and similarities between data points. By considering the 'K' most similar instances (or neighbors) within a dataset, it predicts future price movements based on historical trends. The K-Nearest Neighbors (K-NN) algorithm is a type of instance-based or non-generalizing learning. While K-NN is considered a relatively simple machine-learning technique, it falls under the AI umbrella.
We can classify the K-Nearest Neighbors (K-NN) algorithm as a form of artificial intelligence (AI), and here's why:
Machine Learning Component: K-NN is a type of machine learning algorithm, and machine learning is a subset of AI. Machine learning is about building algorithms that allow computers to learn from and make predictions or decisions based on data. Since K-NN falls under this category, it is aligned with the principles of AI.
Instance-Based Learning: K-NN is an instance-based learning algorithm. This means that it makes decisions based on the entire training dataset rather than deriving a discriminative function from the dataset. It looks at the 'K' most similar instances (neighbors) when making a prediction, hence adapting to new information if the dataset changes. This adaptability is a hallmark of intelligent systems.
Pattern Recognition: The core of K-NN's functionality is recognizing patterns within data. It identifies relationships and similarities between data points, something akin to human pattern recognition, a key aspect of intelligence.
Classification and Regression: K-NN can be used for both classification and regression tasks, two fundamental problems in machine learning and AI. The indicator code is used for trend classification, a predictive task that aligns with the goals of AI.
Simplicity Doesn't Exclude AI: While K-NN is often considered a simpler algorithm compared to deep learning models, simplicity does not exclude something from being AI. Many AI systems are built on simple rules and can be combined or scaled to create complex behavior.
No Explicit Model Building: Unlike traditional statistical methods, K-NN does not build an explicit model during training. Instead, it waits until a prediction is required and then looks at the 'K' nearest neighbors from the training data to make that prediction. This lazy learning approach is another aspect of machine learning, part of the broader AI field.
-----------------
Disclaimer
The information contained in my Scripts/Indicators/Ideas/Algos/Systems does not constitute financial advice or a solicitation to buy or sell any securities of any type. I will not accept liability for any loss or damage, including without limitation any loss of profit, which may arise directly or indirectly from the use of or reliance on such information.
All investments involve risk, and the past performance of a security, industry, sector, market, financial product, trading strategy, backtest, or individual's trading does not guarantee future results or returns. Investors are fully responsible for any investment decisions they make. Such decisions should be based solely on an evaluation of their financial circumstances, investment objectives, risk tolerance, and liquidity needs.
My Scripts/Indicators/Ideas/Algos/Systems are only for educational purposes!
RibboNN Machine Learning [ChartPrime]The RibboNN ML indicator is a powerful tool designed to predict the direction of the market and display it through a ribbon-like visual representation, with colors changing based on the prediction outcome from a conditional class. The primary focus of this indicator is to assist traders in trend following trading strategies.
The RibboNN ML in action
Prediction Process:
Conditional Class: The indicator's predictive model relies on a conditional class, which combines information from both longcon (long condition) and short condition. These conditions are determined using specific rules and criteria, taking into account various market factors and indicators.
Direction Prediction: The conditional class provides the basis for predicting the direction of the market move. When the prediction value is greater than 0, it indicates an upward trend, while a value less than 0 suggests a downward trend.
Nearest Neighbor (NN): To attempt to enhance the accuracy of predictions, the RibboNN ML indicator incorporates a Nearest Neighbor algorithm. This algorithm analyzes historical data from the Ribbon ML's predictive model (RMF) and identifies patterns that closely resemble the current conditional prediction class, thereby offering more robust trend forecasts.
Ribbon Visualization:
The Ribbon ML indicator visually represents its predictions through a ribbon-like display. The ribbon changes colors based on the direction predicted by the conditional class. An upward trend is represented by a green color, while a downward trend is depicted by a red color, allowing traders to quickly identify potential market directions.
The introduction of the Nearest Neighbor algorithm provides the Ribbon ML indicator with unique and adaptive behaviors. By dynamically analyzing historical patterns and incorporating them into predictions, the indicator can adapt to changing market conditions and offer more reliable signals for trend following trading strategies.
Manipulation of the NN Settings:
Smaller Value of Neighbours Count:
When the value of "Neighbours Count" is small, the algorithm considers only a few nearest neighbors for making predictions.
A smaller value of "Neighbours Count" leads to more flexible decision boundaries, which can result in a more granular and sensitive model.
However, using a very small value might lead to overfitting, especially if the training data contains noise or outliers.
Larger Value of "Neighbours Count":
When the value of "Neighbours Count" is large, the algorithm considers a larger number of nearest neighbors for making predictions.
A larger value of "Neighbours Count" leads to smoother decision boundaries and helps capture the global patterns in the data.
However, setting a very large value might result in a loss of local patterns and make the model less sensitive to changes in the data.
Wick-to-Body Ratio Trend Forecast | Flux ChartsThe Wick-to-Body Ratio Trend Forecast Indicator aims to forecast potential movements following the last closed candle using the wick-to-body ratio. The script identifies those candles within the loopback period with a ratio matching that of the last closed candle and provides an analysis of their trends.
➡️ USAGE
Wick-to-body ratios can be used in many strategies. The most common use in stock trading is to discern bullish or bearish sentiment. This indicator extends candle ratios, revealing previous patterns that follow a candle with a similar ratio. The most basic use of this indicator is the single forecast line.
➡️ FORECASTING SYSTEM
This line displays a compilation of the averages of all the previous trends resulting from those historical candles with a matching ratio. It shows the average movements of the trends as well as the 'strength' of the trend. The 'strength' of the trend is a gradient that is blue when the trend deviates more from the average and red when it deviates less.
Chart: AMEX:SPY 30 min; Indicator Settings: Loopback 700, Previous Trends ON
The color-coded deviation is visible in this image of the indicator with the default settings (except for Forecast Lines > Previous Trends ), and the trend line grows bluer as the past patterns deviate more.
➡️ ADAPTIVE ACCEPTABLE RANGE
The algorithm looks back at every candle within the loopback period to find candles that match the last closed candle. The algorithm adaptively changes the acceptable range to which a candle can differ from the ratio of the last closed candle. The algorithm will never have more than 15 historical points used, as it will lower its sensitivity before it reaches that point.
Chart: BITSTAMP:BTCUSD 5 min; Indicator Settings: Loopback 700
Here is the BTC chart on 7/6/23 with default settings except for the loopback period at 700.
Chart: BITSTAMP:BTCUSD 5 min; Indicator Settings: Loopback 200
Here is the exact same chart with a loopback period of 200. While the first ratio for both is the same, a new ratio is revealed for the chart with a loopback of only 200 because the adaptive range is adjusted in the algorithm to find an acceptable number of reference points. Note the table in the top right however, while the algorithm adapts the acceptable range between the current ratio and historical ones to find reference points, there is a threshold at which candles will be considered too inaccurate to be considered. This prevents meaningless associations between candles due to a particularly rare ratio. This threshold can be adjusted in the settings through "Default Accuracy".
RSI-MFI Machine Learning [ Manhattan distance ]The RSI-MFI Machine Learning Indicator is a technical analysis tool that combines the Relative Strength Index (RSI) and Money Flow Index (MFI) indicators with the Manhattan distance metric.
It aims to provide insights into potential trade setups by leveraging machine learning principles and calculating distances between current and historical data points.
The indicator starts by calculating the RSI and MFI values based on the specified periods for each indicator.
The RSI measures the strength and speed of price movements, while the MFI evaluates the inflow and outflow of money in the market.
By combining these two indicators, the indicator captures both price momentum and money flow dynamics.
To apply machine learning principles , the indicator utilizes the Manhattan distance metric to quantify the similarity or dissimilarity between different data points.
The Manhattan distance is calculated by taking the absolute differences between corresponding RSI and MFI values of the current point and historical points.
Next, the indicator determines the nearest neighbors based on the calculated Manhattan distances.
The number of nearest neighbors is determined by the square root of the specified count of neighbors.
By identifying similar patterns and behaviors in the historical data, the indicator aims to uncover potential trade opportunities.
Trade signals are generated based on the calculated distances. The indicator compares each distance with the maximum distance encountered so far.
If a new maximum distance is found, it updates the value and considers the corresponding direction as a potential trade signal. The trade signals are stored in an array for further analysis.
Furthermore, the indicator considers the price action and a calculated regression line to differentiate between long and short trade signals.
Long trade signals are identified when the closing price is above the regression line, indicating a potentially bullish setup.
Short trade signals are identified when the closing price is below the regression line, indicating a potentially bearish setup.
The RSI-MFI Machine Learning Indicator visualizes the regression line on the price chart and labels the bars accordingly. It highlights the regression line with different colors based on the trade signals, making it easier for traders to identify potential entry or exit points.
Traders can use the RSI-MFI Machine Learning Indicator as a tool to analyze price movements, evaluate market conditions based on RSI and MFI, leverage machine learning concepts to find similar patterns, and make informed trading decisions.
Machine Learning : Torben's Moving Median KNN BandsWhat is Median Filtering ?
Median filtering is a non-linear digital filtering technique, often used to remove noise from an image or signal. Such noise reduction is a typical pre-processing step to improve the results of later processing (for example, edge detection on an image). Median filtering is very widely used in digital image processing because, under certain conditions, it preserves edges while removing noise (but see the discussion below), also having applications in signal processing.
The main idea of the median filter is to run through the signal entry by entry, replacing each entry with the median of neighboring entries. The pattern of neighbors is called the "window", which slides, entry by entry, over the entire signal. For one-dimensional signals, the most obvious window is just the first few preceding and following entries, whereas for two-dimensional (or higher-dimensional) data the window must include all entries within a given radius or ellipsoidal region (i.e. the median filter is not a separable filter).
The median filter works by taking the median of all the pixels in a neighborhood around the current pixel. The median is the middle value in a sorted list of numbers. This means that the median filter is not sensitive to the order of the pixels in the neighborhood, and it is not affected by outliers (very high or very low values).
The median filter is a very effective way to remove noise from images. It can remove both salt and pepper noise (random white and black pixels) and Gaussian noise (randomly distributed pixels with a Gaussian distribution). The median filter is also very good at preserving edges, which is why it is often used as a pre-processing step for edge detection.
However, the median filter can also blur images. This is because the median filter replaces each pixel with the value of the median of its neighbors. This can cause the edges of objects in the image to be smoothed out. The amount of blurring depends on the size of the window used by the median filter. A larger window will blur more than a smaller window.
The median filter is a very versatile tool that can be used for a variety of tasks in image processing. It is a good choice for removing noise and preserving edges, but it can also blur images. The best way to use the median filter is to experiment with different window sizes to find the setting that produces the desired results.
What is this Indicator ?
K-nearest neighbors (KNN) is a simple, non-parametric machine learning algorithm that can be used for both classification and regression tasks. The basic idea behind KNN is to find the K most similar data points to a new data point and then use the labels of those K data points to predict the label of the new data point.
Torben's moving median is a variation of the median filter that is used to remove noise from images. The median filter works by replacing each pixel in an image with the median of its neighbors. Torben's moving median works in a similar way, but it also averages the values of the neighbors. This helps to reduce the amount of blurring that can occur with the median filter.
KNN over Torben's moving median is a hybrid algorithm that combines the strengths of both KNN and Torben's moving median. KNN is able to learn the underlying distribution of the data, while Torben's moving median is able to remove noise from the data. This combination can lead to better performance than either algorithm on its own.
To implement KNN over Torben's moving median, we first need to choose a value for K. The value of K controls how many neighbors are used to predict the label of a new data point. A larger value of K will make the algorithm more robust to noise, but it will also make the algorithm less sensitive to local variations in the data.
Once we have chosen a value for K, we need to train the algorithm on a dataset of labeled data points. The training dataset will be used to learn the underlying distribution of the data.
Once the algorithm is trained, we can use it to predict the labels of new data points. To do this, we first need to find the K most similar data points to the new data point. We can then use the labels of those K data points to predict the label of the new data point.
KNN over Torben's moving median is a simple, yet powerful algorithm that can be used for a variety of tasks. It is particularly well-suited for tasks where the data is noisy or where the underlying distribution of the data is unknown.
Here are some of the advantages of using KNN over Torben's moving median:
KNN is able to learn the underlying distribution of the data.
KNN is robust to noise.
KNN is not sensitive to local variations in the data.
Here are some of the disadvantages of using KNN over Torben's moving median:
KNN can be computationally expensive for large datasets.
KNN can be sensitive to the choice of K.
KNN can be slow to train.
Machine Learning : Cosine Similarity & Euclidean DistanceIntroduction:
This script implements a comprehensive trading strategy that adheres to the established rules and guidelines of housing trading. It leverages advanced machine learning techniques and incorporates customised moving averages, including the Conceptive Price Moving Average (CPMA), to provide accurate signals for informed trading decisions in the housing market. Additionally, signal processing techniques such as Lorentzian, Euclidean distance, Cosine similarity, Know sure thing, Rational Quadratic, and sigmoid transformation are utilised to enhance the signal quality and improve trading accuracy.
Features:
Market Analysis: The script utilizes advanced machine learning methods such as Lorentzian, Euclidean distance, and Cosine similarity to analyse market conditions. These techniques measure the similarity and distance between data points, enabling more precise signal identification and enhancing trading decisions.
Cosine similarity:
Cosine similarity is a measure used to determine the similarity between two vectors, typically in a high-dimensional space. It calculates the cosine of the angle between the vectors, indicating the degree of similarity or dissimilarity.
In the context of trading or signal processing, cosine similarity can be employed to compare the similarity between different data points or signals. The vectors in this case represent the numerical representations of the data points or signals.
Cosine similarity ranges from -1 to 1, with 1 indicating perfect similarity, 0 indicating no similarity, and -1 indicating perfect dissimilarity. A higher cosine similarity value suggests a closer match between the vectors, implying that the signals or data points share similar characteristics.
Lorentzian Classification:
Lorentzian classification is a machine learning algorithm used for classification tasks. It is based on the Lorentzian distance metric, which measures the similarity or dissimilarity between two data points. The Lorentzian distance takes into account the shape of the data distribution and can handle outliers better than other distance metrics.
Euclidean Distance:
Euclidean distance is a distance metric widely used in mathematics and machine learning. It calculates the straight-line distance between two points in Euclidean space. In two-dimensional space, the Euclidean distance between two points (x1, y1) and (x2, y2) is calculated using the formula sqrt((x2 - x1)^2 + (y2 - y1)^2).
Dynamic Time Windows: The script incorporates a dynamic time window function that allows users to define specific time ranges for trading. It checks if the current time falls within the specified window to execute the relevant trading signals.
Custom Moving Averages: The script includes the CPMA, a powerful moving average calculation. Unlike traditional moving averages, the CPMA provides improved support and resistance levels by considering multiple price types and employing a combination of Exponential Moving Averages (EMAs) and Simple Moving Averages (SMAs). Its adaptive nature ensures responsiveness to changes in price trends.
Signal Processing Techniques: The script applies signal processing techniques such as Know sure thing, Rational Quadratic, and sigmoid transformation to enhance the quality of the generated signals. These techniques improve the accuracy and reliability of the trading signals, aiding in making well-informed trading decisions.
Trade Statistics and Metrics: The script provides comprehensive trade statistics and metrics, including total wins, losses, win rate, win-loss ratio, and early signal flips. These metrics offer valuable insights into the performance and effectiveness of the trading strategy.
Usage:
Configuring Time Windows: Users can customize the time windows by specifying the start and finish time ranges according to their trading preferences and local market conditions.
Signal Interpretation: The script generates long and short signals based on the analysis, custom moving averages, and signal processing techniques. Users should pay attention to these signals and take appropriate action, such as entering or exiting trades, depending on their trading strategies.
Trade Statistics: The script continuously tracks and updates trade statistics, providing users with a clear overview of their trading performance. These statistics help users assess the effectiveness of the strategy and make informed decisions.
Conclusion:
With its adherence to housing trading rules, advanced machine learning methods, customized moving averages like the CPMA, and signal processing techniques such as Lorentzian, Euclidean distance, Cosine similarity, Know sure thing, Rational Quadratic, and sigmoid transformation, this script offers users a powerful tool for housing market analysis and trading. By leveraging the provided signals, time windows, and trade statistics, users can enhance their trading strategies and improve their overall trading performance.
Disclaimer:
Please note that while this script incorporates established tradingview housing rules, advanced machine learning techniques, customized moving averages, and signal processing techniques, it should be used for informational purposes only. Users are advised to conduct their own analysis and exercise caution when making trading decisions. The script's performance may vary based on market conditions, user settings, and the accuracy of the machine learning methods and signal processing techniques. The trading platform and developers are not responsible for any financial losses incurred while using this script.
By publishing this script on the platform, traders can benefit from its professional presentation, clear instructions, and the utilisation of advanced machine learning techniques, customised moving averages, and signal processing techniques for enhanced trading signals and accuracy.
I extend my gratitude to TradingView, LUX ALGO, and JDEHORTY for their invaluable contributions to the trading community. Their innovative scripts, meticulous coding patterns, and insightful ideas have profoundly enriched traders' strategies, including my own.
Endpointed SSA of Price [Loxx]The Endpointed SSA of Price: A Comprehensive Tool for Market Analysis and Decision-Making
The financial markets present sophisticated challenges for traders and investors as they navigate the complexities of market behavior. To effectively interpret and capitalize on these complexities, it is crucial to employ powerful analytical tools that can reveal hidden patterns and trends. One such tool is the Endpointed SSA of Price, which combines the strengths of Caterpillar Singular Spectrum Analysis, a sophisticated time series decomposition method, with insights from the fields of economics, artificial intelligence, and machine learning.
The Endpointed SSA of Price has its roots in the interdisciplinary fusion of mathematical techniques, economic understanding, and advancements in artificial intelligence. This unique combination allows for a versatile and reliable tool that can aid traders and investors in making informed decisions based on comprehensive market analysis.
The Endpointed SSA of Price is not only valuable for experienced traders but also serves as a useful resource for those new to the financial markets. By providing a deeper understanding of market forces, this innovative indicator equips users with the knowledge and confidence to better assess risks and opportunities in their financial pursuits.
█ Exploring Caterpillar SSA: Applications in AI, Machine Learning, and Finance
Caterpillar SSA (Singular Spectrum Analysis) is a non-parametric method for time series analysis and signal processing. It is based on a combination of principles from classical time series analysis, multivariate statistics, and the theory of random processes. The method was initially developed in the early 1990s by a group of Russian mathematicians, including Golyandina, Nekrutkin, and Zhigljavsky.
Background Information:
SSA is an advanced technique for decomposing time series data into a sum of interpretable components, such as trend, seasonality, and noise. This decomposition allows for a better understanding of the underlying structure of the data and facilitates forecasting, smoothing, and anomaly detection. Caterpillar SSA is a particular implementation of SSA that has proven to be computationally efficient and effective for handling large datasets.
Uses in AI and Machine Learning:
In recent years, Caterpillar SSA has found applications in various fields of artificial intelligence (AI) and machine learning. Some of these applications include:
1. Feature extraction: Caterpillar SSA can be used to extract meaningful features from time series data, which can then serve as inputs for machine learning models. These features can help improve the performance of various models, such as regression, classification, and clustering algorithms.
2. Dimensionality reduction: Caterpillar SSA can be employed as a dimensionality reduction technique, similar to Principal Component Analysis (PCA). It helps identify the most significant components of a high-dimensional dataset, reducing the computational complexity and mitigating the "curse of dimensionality" in machine learning tasks.
3. Anomaly detection: The decomposition of a time series into interpretable components through Caterpillar SSA can help in identifying unusual patterns or outliers in the data. Machine learning models trained on these decomposed components can detect anomalies more effectively, as the noise component is separated from the signal.
4. Forecasting: Caterpillar SSA has been used in combination with machine learning techniques, such as neural networks, to improve forecasting accuracy. By decomposing a time series into its underlying components, machine learning models can better capture the trends and seasonality in the data, resulting in more accurate predictions.
Application in Financial Markets and Economics:
Caterpillar SSA has been employed in various domains within financial markets and economics. Some notable applications include:
1. Stock price analysis: Caterpillar SSA can be used to analyze and forecast stock prices by decomposing them into trend, seasonal, and noise components. This decomposition can help traders and investors better understand market dynamics, detect potential turning points, and make more informed decisions.
2. Economic indicators: Caterpillar SSA has been used to analyze and forecast economic indicators, such as GDP, inflation, and unemployment rates. By decomposing these time series, researchers can better understand the underlying factors driving economic fluctuations and develop more accurate forecasting models.
3. Portfolio optimization: By applying Caterpillar SSA to financial time series data, portfolio managers can better understand the relationships between different assets and make more informed decisions regarding asset allocation and risk management.
Application in the Indicator:
In the given indicator, Caterpillar SSA is applied to a financial time series (price data) to smooth the series and detect significant trends or turning points. The method is used to decompose the price data into a set number of components, which are then combined to generate a smoothed signal. This signal can help traders and investors identify potential entry and exit points for their trades.
The indicator applies the Caterpillar SSA method by first constructing the trajectory matrix using the price data, then computing the singular value decomposition (SVD) of the matrix, and finally reconstructing the time series using a selected number of components. The reconstructed series serves as a smoothed version of the original price data, highlighting significant trends and turning points. The indicator can be customized by adjusting the lag, number of computations, and number of components used in the reconstruction process. By fine-tuning these parameters, traders and investors can optimize the indicator to better match their specific trading style and risk tolerance.
Caterpillar SSA is versatile and can be applied to various types of financial instruments, such as stocks, bonds, commodities, and currencies. It can also be combined with other technical analysis tools or indicators to create a comprehensive trading system. For example, a trader might use Caterpillar SSA to identify the primary trend in a market and then employ additional indicators, such as moving averages or RSI, to confirm the trend and generate trading signals.
In summary, Caterpillar SSA is a powerful time series analysis technique that has found applications in AI and machine learning, as well as financial markets and economics. By decomposing a time series into interpretable components, Caterpillar SSA enables better understanding of the underlying structure of the data, facilitating forecasting, smoothing, and anomaly detection. In the context of financial trading, the technique is used to analyze price data, detect significant trends or turning points, and inform trading decisions.
█ Input Parameters
This indicator takes several inputs that affect its signal output. These inputs can be classified into three categories: Basic Settings, UI Options, and Computation Parameters.
Source: This input represents the source of price data, which is typically the closing price of an asset. The user can select other price data, such as opening price, high price, or low price. The selected price data is then utilized in the Caterpillar SSA calculation process.
Lag: The lag input determines the window size used for the time series decomposition. A higher lag value implies that the SSA algorithm will consider a longer range of historical data when extracting the underlying trend and components. This parameter is crucial, as it directly impacts the resulting smoothed series and the quality of extracted components.
Number of Computations: This input, denoted as 'ncomp,' specifies the number of eigencomponents to be considered in the reconstruction of the time series. A smaller value results in a smoother output signal, while a higher value retains more details in the series, potentially capturing short-term fluctuations.
SSA Period Normalization: This input is used to normalize the SSA period, which adjusts the significance of each eigencomponent to the overall signal. It helps in making the algorithm adaptive to different timeframes and market conditions.
Number of Bars: This input specifies the number of bars to be processed by the algorithm. It controls the range of data used for calculations and directly affects the computation time and the output signal.
Number of Bars to Render: This input sets the number of bars to be plotted on the chart. A higher value slows down the computation but provides a more comprehensive view of the indicator's performance over a longer period. This value controls how far back the indicator is rendered.
Color bars: This boolean input determines whether the bars should be colored according to the signal's direction. If set to true, the bars are colored using the defined colors, which visually indicate the trend direction.
Show signals: This boolean input controls the display of buy and sell signals on the chart. If set to true, the indicator plots shapes (triangles) to represent long and short trade signals.
Static Computation Parameters:
The indicator also includes several internal parameters that affect the Caterpillar SSA algorithm, such as Maxncomp, MaxLag, and MaxArrayLength. These parameters set the maximum allowed values for the number of computations, the lag, and the array length, ensuring that the calculations remain within reasonable limits and do not consume excessive computational resources.
█ A Note on Endpionted, Non-repainting Indicators
An endpointed indicator is one that does not recalculate or repaint its past values based on new incoming data. In other words, the indicator's previous signals remain the same even as new price data is added. This is an important feature because it ensures that the signals generated by the indicator are reliable and accurate, even after the fact.
When an indicator is non-repainting or endpointed, it means that the trader can have confidence in the signals being generated, knowing that they will not change as new data comes in. This allows traders to make informed decisions based on historical signals, without the fear of the signals being invalidated in the future.
In the case of the Endpointed SSA of Price, this non-repainting property is particularly valuable because it allows traders to identify trend changes and reversals with a high degree of accuracy, which can be used to inform trading decisions. This can be especially important in volatile markets where quick decisions need to be made.
Lorentzian Classification Strategy Based in the model of Machine learning: Lorentzian Classification by @jdehorty, you will be able to get into trending moves and get interesting entries in the market with this strategy. I also put some new features for better backtesting results!
Backtesting context: 2022-07-19 to 2023-04-14 of US500 1H by PEPPERSTONE. Commissions: 0.03% for each entry, 0.03% for each exit. Risk per trade: 2.5% of the total account
For this strategy, 3 indicators are used:
Machine learning: Lorentzian Classification by @jdehorty
One Ema of 200 periods for identifying the trend
Supertrend indicator as a filter for some exits
Atr stop loss from Gatherio
Trade conditions:
For longs:
Close price is above 200 Ema
Lorentzian Classification indicates a buying signal
This gives us our long signal. Stop loss will be determined by atr stop loss (white point), break even(blue point) by a risk/reward ratio of 1:1 and take profit of 3:1 where half position will be closed. This will be showed as buy.
The other half will be closed when the model indicates a selling signal or Supertrend indicator gives a bearish signal. This will be showed as cl buy.
For shorts:
Close price is under 200 Ema
Lorentzian Classification indicates a selling signal
This gives us our short signal. Stop loss will be determined by atr stop loss (white point), break even(blue point) by a risk/reward ratio of 1:1 and take profit of 3:1 where half position will be closed. This will be showed as sell.
The other half will be closed when the model indicates a buying signal or Supertrend indicator gives a bullish signal. This will be showed as cl sell.
Risk management
To calculate the amount of the position you will use just a small percent of your initial capital for the strategy and you will use the atr stop loss or last swing for this.
Example: You have 1000 usd and you just want to risk 2,5% of your account, there is a buy signal at price of 4,000 usd. The stop loss price from atr stop loss or last swing is 3,900. You calculate the distance in percent between 4,000 and 3,900. In this case, that distance would be of 2.50%. Then, you calculate your position by this way: (initial or current capital * risk per trade of your account) / (stop loss distance).
Using these values on the formula: (1000*2,5%)/(2,5%) = 1000usd. It means, you have to use 1000 usd for risking 2.5% of your account.
We will use this risk management for applying compound interest.
> In settings, with position amount calculator, you can enter the amount in usd of your account and the amount in percentage for risking per trade of the account. You will see this value in green color in the upper left corner that shows the amount in usd to use for risking the specific percentage of your account.
> You can also choose a fixed amount, so you will have to activate fixed amount in risk management for trades and set the fixed amount for backtesting.
Script functions
Inside of settings, you will find some utilities for display atr stop loss, break evens, positions, signals, indicators, a table of some stats from backtesting, etc.
You will find the settings for risk management at the end of the script if you want to change something or trying new values for other assets for backtesting.
If you want to change the initial capital for backtest the strategy, go to properties, and also enter the commisions of your exchange and slippage for more realistic results.
In risk managment you can find an option called "Use leverage ?", activate this if you want to backtest using leverage, which means that in case of not having enough money for risking the % determined by you of your account using your initial capital, you will use leverage for using the enough amount for risking that % of your acount in a buy position. Otherwise, the amount will be limited by your initial/current capital
I also added a function for backtesting if you had added or withdrawn money frequently:
Adding money: You can choose how often you want to add money (Monthly, yearly, daily or weekly). Then a fixed amount of money and activate or deactivate this function
Withdraw money: You can choose if you want to withdraw a fixed amount or a percentage of earnings. Then you can choose a fixed amount of money, the period of time and activate or deactivate this function. Also, the percentage of earnings if you choosed this option.
Some other assets where strategy has worked
BTCUSD 4H, 1D
ETHUSD 4H, 1D
BNBUSD 4H
SPX 1D
BANKNIFTY 4H, 15 min
Some things to consider
USE UNDER YOUR OWN RISK. PAST RESULTS DO NOT REPRESENT THE FUTURE.
DEPENDING OF % ACCOUNT RISK PER TRADE, YOU COULD REQUIRE LEVERAGE FOR OPEN SOME POSITIONS, SO PLEASE, BE CAREFULL AND USE CORRECTLY THE RISK MANAGEMENT
Do not forget to change commissions and other parameters related with back testing results!. If you have problems loading the script reduce max bars back number in general settings
Strategies for trending markets use to have more looses than wins and it takes a long time to get profits, so do not forget to be patient and consistent !
Please, visit the post from @jdehorty called Machine Learning: Lorentzian Classification for a better understanding of his script!
Any support and boosts will be well received. If you have any question, do not doubt to ask!
Machine Learning: Lorentzian Classification█ OVERVIEW
A Lorentzian Distance Classifier (LDC) is a Machine Learning classification algorithm capable of categorizing historical data from a multi-dimensional feature space. This indicator demonstrates how Lorentzian Classification can also be used to predict the direction of future price movements when used as the distance metric for a novel implementation of an Approximate Nearest Neighbors (ANN) algorithm.
█ BACKGROUND
In physics, Lorentzian space is perhaps best known for its role in describing the curvature of space-time in Einstein's theory of General Relativity (2). Interestingly, however, this abstract concept from theoretical physics also has tangible real-world applications in trading.
Recently, it was hypothesized that Lorentzian space was also well-suited for analyzing time-series data (4), (5). This hypothesis has been supported by several empirical studies that demonstrate that Lorentzian distance is more robust to outliers and noise than the more commonly used Euclidean distance (1), (3), (6). Furthermore, Lorentzian distance was also shown to outperform dozens of other highly regarded distance metrics, including Manhattan distance, Bhattacharyya similarity, and Cosine similarity (1), (3). Outside of Dynamic Time Warping based approaches, which are unfortunately too computationally intensive for PineScript at this time, the Lorentzian Distance metric consistently scores the highest mean accuracy over a wide variety of time series data sets (1).
Euclidean distance is commonly used as the default distance metric for NN-based search algorithms, but it may not always be the best choice when dealing with financial market data. This is because financial market data can be significantly impacted by proximity to major world events such as FOMC Meetings and Black Swan events. This event-based distortion of market data can be framed as similar to the gravitational warping caused by a massive object on the space-time continuum. For financial markets, the analogous continuum that experiences warping can be referred to as "price-time".
Below is a side-by-side comparison of how neighborhoods of similar historical points appear in three-dimensional Euclidean Space and Lorentzian Space:
This figure demonstrates how Lorentzian space can better accommodate the warping of price-time since the Lorentzian distance function compresses the Euclidean neighborhood in such a way that the new neighborhood distribution in Lorentzian space tends to cluster around each of the major feature axes in addition to the origin itself. This means that, even though some nearest neighbors will be the same regardless of the distance metric used, Lorentzian space will also allow for the consideration of historical points that would otherwise never be considered with a Euclidean distance metric.
Intuitively, the advantage inherent in the Lorentzian distance metric makes sense. For example, it is logical that the price action that occurs in the hours after Chairman Powell finishes delivering a speech would resemble at least some of the previous times when he finished delivering a speech. This may be true regardless of other factors, such as whether or not the market was overbought or oversold at the time or if the macro conditions were more bullish or bearish overall. These historical reference points are extremely valuable for predictive models, yet the Euclidean distance metric would miss these neighbors entirely, often in favor of irrelevant data points from the day before the event. By using Lorentzian distance as a metric, the ML model is instead able to consider the warping of price-time caused by the event and, ultimately, transcend the temporal bias imposed on it by the time series.
For more information on the implementation details of the Approximate Nearest Neighbors (ANN) algorithm used in this indicator, please refer to the detailed comments in the source code.
█ HOW TO USE
Below is an explanatory breakdown of the different parts of this indicator as it appears in the interface:
Below is an explanation of the different settings for this indicator:
General Settings:
Source - This has a default value of "hlc3" and is used to control the input data source.
Neighbors Count - This has a default value of 8, a minimum value of 1, a maximum value of 100, and a step of 1. It is used to control the number of neighbors to consider.
Max Bars Back - This has a default value of 2000.
Feature Count - This has a default value of 5, a minimum value of 2, and a maximum value of 5. It controls the number of features to use for ML predictions.
Color Compression - This has a default value of 1, a minimum value of 1, and a maximum value of 10. It is used to control the compression factor for adjusting the intensity of the color scale.
Show Exits - This has a default value of false. It controls whether to show the exit threshold on the chart.
Use Dynamic Exits - This has a default value of false. It is used to control whether to attempt to let profits ride by dynamically adjusting the exit threshold based on kernel regression.
Feature Engineering Settings:
Note: The Feature Engineering section is for fine-tuning the features used for ML predictions. The default values are optimized for the 4H to 12H timeframes for most charts, but they should also work reasonably well for other timeframes. By default, the model can support features that accept two parameters (Parameter A and Parameter B, respectively). Even though there are only 4 features provided by default, the same feature with different settings counts as two separate features. If the feature only accepts one parameter, then the second parameter will default to EMA-based smoothing with a default value of 1. These features represent the most effective combination I have encountered in my testing, but additional features may be added as additional options in the future.
Feature 1 - This has a default value of "RSI" and options are: "RSI", "WT", "CCI", "ADX".
Feature 2 - This has a default value of "WT" and options are: "RSI", "WT", "CCI", "ADX".
Feature 3 - This has a default value of "CCI" and options are: "RSI", "WT", "CCI", "ADX".
Feature 4 - This has a default value of "ADX" and options are: "RSI", "WT", "CCI", "ADX".
Feature 5 - This has a default value of "RSI" and options are: "RSI", "WT", "CCI", "ADX".
Filters Settings:
Use Volatility Filter - This has a default value of true. It is used to control whether to use the volatility filter.
Use Regime Filter - This has a default value of true. It is used to control whether to use the trend detection filter.
Use ADX Filter - This has a default value of false. It is used to control whether to use the ADX filter.
Regime Threshold - This has a default value of -0.1, a minimum value of -10, a maximum value of 10, and a step of 0.1. It is used to control the Regime Detection filter for detecting Trending/Ranging markets.
ADX Threshold - This has a default value of 20, a minimum value of 0, a maximum value of 100, and a step of 1. It is used to control the threshold for detecting Trending/Ranging markets.
Kernel Regression Settings:
Trade with Kernel - This has a default value of true. It is used to control whether to trade with the kernel.
Show Kernel Estimate - This has a default value of true. It is used to control whether to show the kernel estimate.
Lookback Window - This has a default value of 8 and a minimum value of 3. It is used to control the number of bars used for the estimation. Recommended range: 3-50
Relative Weighting - This has a default value of 8 and a step size of 0.25. It is used to control the relative weighting of time frames. Recommended range: 0.25-25
Start Regression at Bar - This has a default value of 25. It is used to control the bar index on which to start regression. Recommended range: 0-25
Display Settings:
Show Bar Colors - This has a default value of true. It is used to control whether to show the bar colors.
Show Bar Prediction Values - This has a default value of true. It controls whether to show the ML model's evaluation of each bar as an integer.
Use ATR Offset - This has a default value of false. It controls whether to use the ATR offset instead of the bar prediction offset.
Bar Prediction Offset - This has a default value of 0 and a minimum value of 0. It is used to control the offset of the bar predictions as a percentage from the bar high or close.
Backtesting Settings:
Show Backtest Results - This has a default value of true. It is used to control whether to display the win rate of the given configuration.
█ WORKS CITED
(1) R. Giusti and G. E. A. P. A. Batista, "An Empirical Comparison of Dissimilarity Measures for Time Series Classification," 2013 Brazilian Conference on Intelligent Systems, Oct. 2013, DOI: 10.1109/bracis.2013.22.
(2) Y. Kerimbekov, H. Ş. Bilge, and H. H. Uğurlu, "The use of Lorentzian distance metric in classification problems," Pattern Recognition Letters, vol. 84, 170–176, Dec. 2016, DOI: 10.1016/j.patrec.2016.09.006.
(3) A. Bagnall, A. Bostrom, J. Large, and J. Lines, "The Great Time Series Classification Bake Off: An Experimental Evaluation of Recently Proposed Algorithms." ResearchGate, Feb. 04, 2016.
(4) H. Ş. Bilge, Yerzhan Kerimbekov, and Hasan Hüseyin Uğurlu, "A new classification method by using Lorentzian distance metric," ResearchGate, Sep. 02, 2015.
(5) Y. Kerimbekov and H. Şakir Bilge, "Lorentzian Distance Classifier for Multiple Features," Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods, 2017, DOI: 10.5220/0006197004930501.
(6) V. Surya Prasath et al., "Effects of Distance Measure Choice on KNN Classifier Performance - A Review." .
█ ACKNOWLEDGEMENTS
@veryfid - For many invaluable insights, discussions, and advice that helped to shape this project.
@capissimo - For open sourcing his interesting ideas regarding various KNN implementations in PineScript, several of which helped inspire my original undertaking of this project.
@RikkiTavi - For many invaluable physics-related conversations and for his helping me develop a mechanism for visualizing various distance algorithms in 3D using JavaScript
@jlaurel - For invaluable literature recommendations that helped me to understand the underlying subject matter of this project.
@annutara - For help in beta-testing this indicator and for sharing many helpful ideas and insights early on in its development.
@jasontaylor7 - For helping to beta-test this indicator and for many helpful conversations that helped to shape my backtesting workflow
@meddymarkusvanhala - For helping to beta-test this indicator
@dlbnext - For incredibly detailed backtesting testing of this indicator and for sharing numerous ideas on how the user experience could be improved.
MLExtensionsLibrary "MLExtensions"
normalizeDeriv(src, quadraticMeanLength)
Returns the smoothed hyperbolic tangent of the input series.
Parameters:
src : The input series (i.e., the first-order derivative for price).
quadraticMeanLength : The length of the quadratic mean (RMS).
Returns: nDeriv The normalized derivative of the input series.
normalize(src, min, max)
Rescales a source value with an unbounded range to a target range.
Parameters:
src : The input series
min : The minimum value of the unbounded range
max : The maximum value of the unbounded range
Returns: The normalized series
rescale(src, oldMin, oldMax, newMin, newMax)
Rescales a source value with a bounded range to anther bounded range
Parameters:
src : The input series
oldMin : The minimum value of the range to rescale from
oldMax : The maximum value of the range to rescale from
newMin : The minimum value of the range to rescale to
newMax : The maximum value of the range to rescale to
Returns: The rescaled series
color_green(prediction)
Assigns varying shades of the color green based on the KNN classification
Parameters:
prediction : Value (int|float) of the prediction
Returns: color
color_red(prediction)
Assigns varying shades of the color red based on the KNN classification
Parameters:
prediction : Value of the prediction
Returns: color
tanh(src)
Returns the the hyperbolic tangent of the input series. The sigmoid-like hyperbolic tangent function is used to compress the input to a value between -1 and 1.
Parameters:
src : The input series (i.e., the normalized derivative).
Returns: tanh The hyperbolic tangent of the input series.
dualPoleFilter(src, lookback)
Returns the smoothed hyperbolic tangent of the input series.
Parameters:
src : The input series (i.e., the hyperbolic tangent).
lookback : The lookback window for the smoothing.
Returns: filter The smoothed hyperbolic tangent of the input series.
tanhTransform(src, smoothingFrequency, quadraticMeanLength)
Returns the tanh transform of the input series.
Parameters:
src : The input series (i.e., the result of the tanh calculation).
smoothingFrequency
quadraticMeanLength
Returns: signal The smoothed hyperbolic tangent transform of the input series.
n_rsi(src, n1, n2)
Returns the normalized RSI ideal for use in ML algorithms.
Parameters:
src : The input series (i.e., the result of the RSI calculation).
n1 : The length of the RSI.
n2 : The smoothing length of the RSI.
Returns: signal The normalized RSI.
n_cci(src, n1, n2)
Returns the normalized CCI ideal for use in ML algorithms.
Parameters:
src : The input series (i.e., the result of the CCI calculation).
n1 : The length of the CCI.
n2 : The smoothing length of the CCI.
Returns: signal The normalized CCI.
n_wt(src, n1, n2)
Returns the normalized WaveTrend Classic series ideal for use in ML algorithms.
Parameters:
src : The input series (i.e., the result of the WaveTrend Classic calculation).
n1
n2
Returns: signal The normalized WaveTrend Classic series.
n_adx(highSrc, lowSrc, closeSrc, n1)
Returns the normalized ADX ideal for use in ML algorithms.
Parameters:
highSrc : The input series for the high price.
lowSrc : The input series for the low price.
closeSrc : The input series for the close price.
n1 : The length of the ADX.
regime_filter(src, threshold, useRegimeFilter)
Parameters:
src
threshold
useRegimeFilter
filter_adx(src, length, adxThreshold, useAdxFilter)
filter_adx
Parameters:
src : The source series.
length : The length of the ADX.
adxThreshold : The ADX threshold.
useAdxFilter : Whether to use the ADX filter.
Returns: The ADX.
filter_volatility(minLength, maxLength, useVolatilityFilter)
filter_volatility
Parameters:
minLength : The minimum length of the ATR.
maxLength : The maximum length of the ATR.
useVolatilityFilter : Whether to use the volatility filter.
Returns: Boolean indicating whether or not to let the signal pass through the filter.
backtest(high, low, open, startLongTrade, endLongTrade, startShortTrade, endShortTrade, isStopLossHit, maxBarsBackIndex, thisBarIndex)
Performs a basic backtest using the specified parameters and conditions.
Parameters:
high : The input series for the high price.
low : The input series for the low price.
open : The input series for the open price.
startLongTrade : The series of conditions that indicate the start of a long trade.`
endLongTrade : The series of conditions that indicate the end of a long trade.
startShortTrade : The series of conditions that indicate the start of a short trade.
endShortTrade : The series of conditions that indicate the end of a short trade.
isStopLossHit : The stop loss hit indicator.
maxBarsBackIndex : The maximum number of bars to go back in the backtest.
thisBarIndex : The current bar index.
Returns: A tuple containing backtest values
init_table()
init_table()
Returns: tbl The backtest results.
update_table(tbl, tradeStatsHeader, totalTrades, totalWins, totalLosses, winLossRatio, winrate, stopLosses)
update_table(tbl, tradeStats)
Parameters:
tbl : The backtest results table.
tradeStatsHeader : The trade stats header.
totalTrades : The total number of trades.
totalWins : The total number of wins.
totalLosses : The total number of losses.
winLossRatio : The win loss ratio.
winrate : The winrate.
stopLosses : The total number of stop losses.
Returns: Updated backtest results table.
kNNLibrary "kNN"
Collection of experimental kNN functions. This is a work in progress, an improvement upon my original kNN script:
The script can be recreated with this library. Unlike the original script, that used multiple arrays, this has been reworked with the new Pine Script matrix features.
To make a kNN prediction, the following data should be supplied to the wrapper:
kNN : filter type. Right now either Binary or Percent . Binary works like in the original script: the system stores whether the price has increased (+1) or decreased (-1) since the previous knnStore event (called when either long or short condition is supplied). Percent works the same, but the values stored are the difference of prices in percents. That way larger differences in prices would give higher scores.
k : number k. This is how many nearest neighbors are to be selected (and summed up to get the result).
skew : kNN minimum difference. Normally, the prediction is done with a simple majority of the neighbor votes. If skew is given, then more than a simple majority is needed for a prediction. This also means that there are inputs for which no prediction would be given (if the majority votes are between -skew and +skew). Note that in Percent mode more profitable trades will have higher voting power.
depth : kNN matrix size limit. Originally, the whole available history of trades was used to make a prediction. This not only requires more computational power, but also neglects the fact that the market conditions are changing. This setting restricts the memory matrix to a finite number of past trades.
price : price series
long : long condition. True if the long conditions are met, but filters are not yet applied. For example, in my original script, trades are only made on crossings of fast and slow MAs. So, whenever it is possible to go long, this value is set true. False otherwise.
short : short condition. Same as long , but for short condition.
store : whether the inputs should be stored. Additional filters may be applied to prevent bad trades (for example, trend-based filters), so if you only need to consult kNN without storing the trade, this should be set to false.
feature1 : current value of feature 1. A feature in this case is some kind of data derived from the price. Different features may be used to analyse the price series. For example, oscillator values. Not all of them may be used for kNN prediction. As the current kNN implementation is 2-dimensional, only two features can be used.
feature2 : current value of feature 2.
The wrapper returns a tuple: [ longOK, shortOK ]. This is a pair of filters. When longOK is true, then kNN predicts a long trade may be taken. When shortOK is true, then kNN predicts a short trade may be taken. The kNN filters are returned whenever long or short conditions are met. The trade is supposed to happen when long or short conditions are met and when the kNN filter for the desired direction is true.
Exported functions :
knnStore(knn, p1, p2, src, maxrows)
Store the previous trade; buffer the current one until results are in. Results are binary: up/down
Parameters:
knn : knn matrix
p1 : feature 1 value
p2 : feature 2 value
src : current price
maxrows : limit the matrix size to this number of rows (0 of no limit)
Returns: modified knn matrix
knnStorePercent(knn, p1, p2, src, maxrows)
Store the previous trade; buffer the current one until results are in. Results are in percents
Parameters:
knn : knn matrix
p1 : feature 1 value
p2 : feature 2 value
src : current price
maxrows : limit the matrix size to this number of rows (0 of no limit)
Returns: modified knn matrix
knnGet(distance, result)
Get neighbours by getting k results with the smallest distances
Parameters:
distance : distance array
result : result array
Returns: array slice of k results
knnDistance(knn, p1, p2)
Create a distance array from the two given parameters
Parameters:
knn : knn matrix
p1 : feature 1 value
p2 : feature 2 value
Returns: distance array
knnSum(knn, p1, p2, k)
Make a prediction, finding k nearest neighbours and summing them up
Parameters:
knn : knn matrix
p1 : feature 1 value
p2 : feature 2 value
k : sum k nearest neighbors
Returns: sum of k nearest neighbors
doKNN(kNN, k, skew, depth, price, long, short, store, feature1, feature2)
execute kNN filter
Parameters:
kNN : filter type
k : number k
skew : kNN minimum difference
depth : kNN matrix size limit
price : series
long : long condition
short : short condition
store : store the supplied features (if false, only checks the results without storage)
feature1 : feature 1 value
feature2 : feature 2 value
Returns: filter output
Machine Learning: kNN (New Approach)Description:
kNN is a very robust and simple method for data classification and prediction. It is very effective if the training data is large. However, it is distinguished by difficulty at determining its main parameter, K (a number of nearest neighbors), beforehand. The computation cost is also quite high because we need to compute distance of each instance to all training samples. Nevertheless, in algorithmic trading KNN is reported to perform on a par with such techniques as SVM and Random Forest. It is also widely used in the area of data science.
The input data is just a long series of prices over time without any particular features. The value to be predicted is just the next bar's price. The way that this problem is solved for both nearest neighbor techniques and for some other types of prediction algorithms is to create training records by taking, for instance, 10 consecutive prices and using the first 9 as predictor values and the 10th as the prediction value. Doing this way, given 100 data points in your time series you could create 10 different training records. It's possible to create even more training records than 10 by creating a new record starting at every data point. For instance, you could take the first 10 data points and create a record. Then you could take the 10 consecutive data points starting at the second data point, the 10 consecutive data points starting at the third data point, etc.
By default, shown are only 10 initial data points as predictor values and the 6th as the prediction value.
Here is a step-by-step workthrough on how to compute K nearest neighbors (KNN) algorithm for quantitative data:
1. Determine parameter K = number of nearest neighbors.
2. Calculate the distance between the instance and all the training samples. As we are dealing with one-dimensional distance, we simply take absolute value from the instance to value of x (| x – v |).
3. Rank the distance and determine nearest neighbors based on the K'th minimum distance.
4. Gather the values of the nearest neighbors.
5. Use average of nearest neighbors as the prediction value of the instance.
The original logic of the algorithm was slightly modified, and as a result at approx. N=17 the resulting curve nicely approximates that of the sma(20). See the description below. Beside the sma-like MA this algorithm also gives you a hint on the direction of the next bar move.