Quantile-Based Adaptive Detection
🙏🏻 Dedicated to John Tukey. He invented the boxplot, and I finalized it.
QBAD (Quantile-Based Adaptive Detection) is ‘the’ adaptive (also optionally weighted = ready for timeseries) boxplot with more sensible fences. Instead of hardcoded multipliers for outer fences, I base em on a set of quantile-based asymmetry metrics (you can view it as an ‘algorithmic’ counterpart of central & standardized moments). So outer bands are Not hardcoded, not optimized, not cross-validated etc, simply calculated at O(n log n).
You can use it literally everywhere in any context with any continuous data, in any task that requires statistical control, novelty || outlier detection, without worrying and doubting the sense in arbitrarily chosen thresholds. Obviously, given the robust nature of quantiles, it would best fit the cases where data has problems.
The thresholds are:
Basis: the model of the data (median in our case);
Deviations: represent typical spread around basis, together form “value” in general sense;
Extensions: estimate data’s extremums via combination of quantile-based asymmetry metrics without relying on actual blunt min and max, together form “range” / ”frame”. Datapoints outside the frame/range are novelties or outliers;
Limits: based also on quantile asymmetry metrics, estimate the bounds within which values can ‘ever’ emerge given the current data generating process stays the same, together form “field”. Datapoints outside the field are very rare, happen when a significant change/structural break happens in current data-generating process, or when a corrupt datapoint emerges.
…
The first part of the post is for locals xd, the second is for the wanderers/wizards/creators:
First part:
In terms of markets, mostly u gotta worry about dem instruments that represent crypto & FX assets: either the activity, and hence the data sources, are decentralized there, or the data is fishy.
For a higher algocomplexity cost of O(n log n), unlike MBAD that is O(n), this thing (a control system in fact) works better with fishy data (contaminated with wrong values, incomplete, missing values etc). Read about the “breakdown point of an estimator” if you wanna understand it.
Even with good data, in cases when you have multiple instruments that represent the same asset, e.g. CL and BRN futures, and for some reason you wanna skip constructing a proper index of em (while you should), QBAD should be better put on each instrument individually.
Another reason to use this algo-based rather than math-based tool might be in cases when data quality is all good, but the actual causal processes that generate the data are a bit inconsistent and/or possess ‘increased’ activity in a way. So in high-volatility periods, this tool should perform better.
In terms of built-ins you got 2 weightings: by sequence and by inferred volume delta. The former should be ‘On’ all the time when you work with timeseries, unless you consciously want to turn it off for a reason. The latter, you gotta keep it ‘On’ unless you apply the tool on another dataset that ain’t got that particular additional dimension.
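To make the ‘weighted by sequence’ idea concrete, here is a minimal sketch of a weighted quantile in Python. It is not the script’s actual code: the function names and the linear recency weights (oldest point gets 1, newest gets n) are assumptions made purely for illustration.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight share reaches q."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equally long")
    pairs = sorted(zip(values, weights))  # the sort is the O(n log n) step
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def sequence_weights(n):
    """Hypothetical linear recency weights: oldest point gets 1, newest gets n."""
    return list(range(1, n + 1))


prices = [10.0, 11.0, 12.0, 13.0, 50.0]
plain_median = weighted_quantile(prices, [1] * len(prices), 0.5)
recent_median = weighted_quantile(prices, sequence_weights(len(prices)), 0.5)
print(plain_median, recent_median)
```

With equal weights the median sits in the middle of the sample; with recency weights it shifts towards the newer datapoints, which is the whole point of keeping that weighting ‘On’ for timeseries.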
Ain’t matter the way you gonna use it, moving windows, cumulative windows with or without anchors, that’s your freedom of will, but some stuff stays the same:
Basis and deviations are “value” levels. From process control perspective, if you pls, it makes sense to Not only fade or push based on these levels, but to also do nothing when things are ambiguous and/or don’t require your intervention
Extensions and limits are extreme levels. Here you either push or fade, doing nothing is not an option, these are decisive points in all the meanings
Another important thing, lately I started to see one kind of trend here on tradingview as well and in general in near quant sources, of applying averages, percentiles etc ‘on’ other stationary metrics, so called “indicators”. And I mean not for diagnostic or development reasons, for decision making xd
This is not the evil crime ofc, but hillbilly af, cuz if the metrics are stationary it means that you can model em, fit a distribution, like do smth sharper. Worst case you have Bayesian statistics armed with high density intervals and equal tail intervals, and even some others. All this stuff is not hard to do, if u ain’t doing it, it’s on you.
So what I’m saying is it makes sense to apply QBAD on returns ‘of your strategy’, on volume delta, but Not on other metrics that already do calculations over their own moving windows.
...
Second part:
Looks like some finna start to have lil suspicions, that ‘maybe’ after all math entities in reality are more like blueprints, while actual representations are physical/mechanical/algorithmic. Central & standardized moments are math entities that represent location, scale & asymmetry info, and we can use em no problem, especially when things are legit and consistent. Real world stuff tho sometimes deviates from that ideal, so we need smth more handy and real. Add to the mix the algo counterpart of means: quantiles.
Unlike the legacy quantile-based asymmetry metrics from the previous century (check quantile skewness & kurtosis), I don’t use arbitrary sets of quantiles, instead we get a binary pattern that is totally geometric & natural (check the code if interested, I made it very damn explicit). In spirit with math based central & standardized moments, each consecutive pair is wider, emphasizing tail info more and more for each higher order metric.
Unlike the classic box plot, where inner thresholds are quartiles and the rest are based on em, here the basis is median (minimises L1), I base inner thresholds on it, and we continue the pattern by basing the further set of levels on the previous set. So unlike the classic box plot, here we have coherency in construction, symmetry.
Another thing to pay attention to, tho for some reason ain’t many talk about it, it’s not conceptually right to think that “you got data and you apply std moments on it”. No, you apply it to ‘centered around smth’ data. That ‘smth’ should minimize L2 error in case of math, L1 error in case of algo, and L0 error in case of learning/MLish/optimizational/whatever-you-call-it stuff. So in the case of L0, that’s actually the ‘mode’ of KDE, but that’s for another time. Anyways, in case of L2 it’s mean, so we center data around mean, and apply std moments on residuals. That’s the precise way of framing it. If you understand this, suddenly very interesting details like 0th and 1st central moments start to make sense. In case of quantiles, we center data around the median, and do further processing on residuals, same.
0th moment (I call it init) is always 1, tho it’s interesting to extrapolate backwards the sequence for higher order moments construction, to understand how we actually end up with this zero.
1st moment (I call it bias) of residuals would be zero if you match centering and residuals analysis methods. But if for some reason you didn’t do that (e.g. centered data around midhinge or mean and applied QBAD on the centered data), you have to account for that bias.
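A tiny numeric check of the centering argument, as a sketch in plain Python (the toy data and helper names are made up for illustration): the mean wins on L2 error, the median wins on L1 error, matched residuals have zero ‘bias’, and mismatched ones don’t.

```python
from statistics import mean, median

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # skewed on purpose


def l1_error(center):
    """Sum of absolute deviations from the chosen center."""
    return sum(abs(x - center) for x in data)


def l2_error(center):
    """Sum of squared deviations from the chosen center."""
    return sum((x - center) ** 2 for x in data)


m, med = mean(data), median(data)

# Matched centering: mean residuals analysed with the mean,
# median residuals analysed with the median.
resid_mean = [x - m for x in data]
resid_med = [x - med for x in data]
matched_mean_bias = mean(resid_mean)
matched_median_bias = median(resid_med)

# Mismatched: centered around the mean, analysed with the median.
mismatched_bias = median(resid_mean)

print(l1_error(med), l1_error(m))   # median gives the smaller L1 error
print(l2_error(m), l2_error(med))   # mean gives the smaller L2 error
print(matched_mean_bias, matched_median_bias, mismatched_bias)
```

The last number is the bias you’d have to account for if you center around the mean and then run quantile-based processing on those residuals.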
Realizing stuff > understanding stuff
Learning 2981234 human invented fields < realizing the same unified principles how the Universe works
∞
Intrabar BoxPlot
The Intrabar BoxPlot publication highlights an uncommon technique by displaying statistical intrabar Lower Timeframe (LTF) values on the chart.
🔶 USAGE
🔹 Middle 50% Boxes
By showing the middle 50% intrabar values through a box, we can more easily see where the intrabar activity is mainly situated.
The middle 50% intrabar values are referred to from here on as Interquartile range (IQR).
In this example, the successive IQRs form a channel where the price eventually breaks out.
Disproportionately distributed values can give insights which can be used to find potential support/resistance areas.
IQR gaps can give valuable information as well. Potentially, the price can return to these gaps.
Seeing the IQR areas against regular candles gives an alternative image of the underlying price movements.
🔹 Highest volume Price level
The script displays the price level where the highest volume is situated, depending on the user's source setting. Setting the source at 'close' will only display intrabar close values; the same goes for high, low, ...
As seen in the above example, the volume levels can aid in finding support/resistance.
🔹 Median
The location of the median of all intrabar values is displayed as a coloured dot: green when the close price is higher than the opening price and red otherwise. The median can give valuable insights into price movements.
🔹 Outliers
Medium (white dots) and extreme (white X) outliers, in combination with the IQR box, can help identify potential areas of interest.
🔹 Volume Delta
When there is a discrepancy between the delta volume and direction of the candle, this will be displayed as follows:
Green candle: when the sum of the volume of red intrabars is higher than the sum of the volume of green intrabars, the candle will be coloured orange.
Red candle: when the sum of the volume of green intrabars is higher than the sum of the volume of red intrabars, the candle will be coloured blue.
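The colouring rule above can be sketched in a few lines of Python. This is only an illustration, not the script's code: the function name, the tuple layout of the intrabars, and the handling of unchanged bars (ignored for intrabars, labelled "neutral" for the candle) are assumptions.

```python
def candle_color(candle_open, candle_close, intrabars):
    """Pick a candle colour from intrabar volume.

    intrabars: list of (open, close, volume) tuples from the lower timeframe.
    Intrabars with close == open are ignored (an assumption of this sketch).
    """
    green_volume = sum(v for o, c, v in intrabars if c > o)
    red_volume = sum(v for o, c, v in intrabars if c < o)
    if candle_close > candle_open:
        # Green candle, but red intrabars carried more volume: orange.
        return "orange" if red_volume > green_volume else "green"
    if candle_close < candle_open:
        # Red candle, but green intrabars carried more volume: blue.
        return "blue" if green_volume > red_volume else "red"
    return "neutral"  # unchanged candle: not covered by the description


up_with_selling = candle_color(100, 105, [(100, 99, 500), (99, 105, 300)])
down_with_buying = candle_color(100, 95, [(100, 101, 700), (101, 95, 400)])
print(up_with_selling, down_with_buying)
```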
🔹 Highlight Boxplot only
Probably the easiest way to display the boxplot only is by changing the Bar's style to Bars.
🔶 DETAILS
All intrabar values (Lower TimeFrame - LTF) are sorted and evaluated. Values can be close, high, low, ... by selecting this in Settings (source).
The middle 50% of all values are displayed as a box; this contains the values between percentile 25 (p25) and percentile 75 (p75). The value of percentile rank 75 means 75% of all values are lower. The value of percentile rank 25 means 25% of all values are lower, or 75% are higher.
The difference between p75 and p25 is also known as the Interquartile range (IQR).
IQR is used to check for outliers.
Wiki: Boxplot , Interquartile range
Extreme high: maximum value, higher than p75 + IQR*3
Max outlier high: maximum value, higher than p75 + IQR*1.5 but lower than p75 + IQR*3
Max: maximum value, lower than p75 + IQR*1.5
Min: minimum value, higher than p25 - IQR*1.5
Min outlier low: minimum value, lower than p25 - IQR*1.5 but higher than p25 - IQR*3
Extreme low: minimum value, lower than p25 - IQR*3
Max and min must not be confused with the current candle's high/low.
🔹 Example: Length of chart-puppets
The following example can make it easier to digest. Forty "chart-puppets" are sorted by their length.
The p25 value is 97
The p50 value is 120
The p75 value is 149
75% of all "chart-puppets" are smaller than p75, and 25% are larger than p75.
50% of all "chart-puppets" are smaller than p50 (= median), and 50% are larger than p50.
25% of all "chart-puppets" are smaller than p25, and 75% are larger than p25.
IQR = 149 - 97 = 52
Extreme outlier limit max: p75 + IQR*3 = 149 + 52*3 = 305
Mild outlier limit max: p75 + IQR*1.5 = 149 + 52*1.5 = 227
Mild outlier limit min: p25 - IQR*1.5 = 97 - 52*1.5 = 19
Extreme outlier limit min: p25 - IQR*3 = 97 - 52*3 = -59
In this example there are no outliers to be found; all values are located between p25 - IQR*1.5 (19) and p75 + IQR*1.5 (227).
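The arithmetic of this example can be reproduced with a short Python sketch. The function names are hypothetical, and treating a value that lands exactly on a limit as not being an outlier is an assumption of this sketch.

```python
def outlier_limits(p25, p75):
    """Mild (1.5 * IQR) and extreme (3 * IQR) outlier limits."""
    iqr = p75 - p25
    return {
        "extreme_min": p25 - 3 * iqr,
        "mild_min": p25 - 1.5 * iqr,
        "mild_max": p75 + 1.5 * iqr,
        "extreme_max": p75 + 3 * iqr,
    }


def classify(value, p25, p75):
    """Label a value as regular, mild outlier or extreme outlier."""
    limits = outlier_limits(p25, p75)
    if value < limits["extreme_min"] or value > limits["extreme_max"]:
        return "extreme outlier"
    if value < limits["mild_min"] or value > limits["mild_max"]:
        return "mild outlier"
    return "regular"


puppet_limits = outlier_limits(97, 149)
print(puppet_limits)
print(classify(120, 97, 149), classify(230, 97, 149), classify(310, 97, 149))
```

A hypothetical "chart-puppet" of length 230 would be a mild outlier, and one of length 310 an extreme outlier.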
🔹 Source settings
Note that results depend on the chosen source (settings). When, for example, close is chosen as the source, only intrabar close prices are included. This means a low or high can stretch further than the min or max.
Here we can see different results with different source settings
🔹 LTF settings
When 'Auto' is enabled (Settings, LTF), the LTF will be the nearest possible x times smaller TF than the current TF. When 'Premium' is disabled, the minimum TF will always be 1 minute to ensure TradingView plans lower than Premium don't get an error.
Examples with current Daily TF (when Premium is enabled):
500 : 3 minute LTF
1500 (default): 1 minute LTF
5000: 30 seconds LTF (1 minute if Premium is disabled)
🔶 SETTINGS
Source: Set source at close, high, low,...
🔹 LTF
LTF: LTF setting
Auto + multiple: Adjusts the initial set LTF
Premium: Enable when your TradingView plan is Premium or higher
🔹 Intrabar Delta: Colors, depending on different circumstances.
Up: Price goes up, with more bullish than bearish intrabar volume.
Up-: Price goes up, with more bearish than bullish intrabar volume.
Down: Price goes down, with more bearish than bullish intrabar volume.
Down+: Price goes down, with more bullish than bearish intrabar volume.
🔹 Table
Show table: Show details at the top right corner
Show TF: Show LTF at the bottom right corner
Text color/table size
See DETAILS for more information
Rolling summary
Statistical methods based on the mean cannot be effective all the time when applied to financial data, since such data usually doesn't follow a normal distribution: it can be skewed and/or have extreme values, which can be described as outliers.
In order to deal with this problem, it is appropriate to use median-based techniques.
The most common one is called the five-number summary/box plot, which plots the median of the dataset, the 25th (Q1) & 75th (Q3) percentiles (the medians of the lower & upper parts of the original dataset divided by the original median), and whiskers calculated by taking the range between Q1 and Q3, multiplying it by 1.5, then adding it to Q3 and subtracting it from Q1. The values which are outside the whiskers are considered outliers. Default settings of the script correspond to the classic box plot.
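As a rough illustration of a rolling box plot summary, here is a minimal Python sketch. It is not the script's code: the function name and window handling are assumptions, and the quartiles come from the standard library's inclusive interpolation method, which can differ slightly from the "medians of the halves" definition above.

```python
from collections import deque
from statistics import quantiles


def rolling_summary(series, window, k=1.5):
    """Yield (q1, median, q3, lower_whisker, upper_whisker) per full window."""
    buffer = deque(maxlen=window)  # keeps only the latest `window` values
    for x in series:
        buffer.append(x)
        if len(buffer) < window:
            continue  # wait until the window is filled
        q1, med, q3 = quantiles(buffer, n=4, method="inclusive")
        spread = k * (q3 - q1)  # 1.5 * IQR by default
        yield q1, med, q3, q1 - spread, q3 + spread


closes = [1, 2, 3, 4, 5, 100]
summaries = list(rolling_summary(closes, window=5))
for row in summaries:
    print(row)
```

In the second window the value 100 lies above the upper whisker, so it would be flagged as an outlier.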
The seven-number summary can also be plotted by this script by turning on the 4 additional percentiles, and Bowley’s seven-figure summary by turning on the first 2 additional percentiles and changing their values to 10 and 90 respectively.
P.S.: The mean can also be turned on, just to check the difference.