TradingView
Trendoscope
16 Jan 2023 at 00:11

Chatterjee Correlation 

Ethereum / Tether (KuCoin)

Description

This is my first attempt at implementing a statistical method. The problem was given to me by @lejmer (who later also helped me build more efficient code for it) while we were debating whether scripts need a higher resource allocation so that they can run longer and faster. The major problem faced by those who want to implement statistics-based methods is that they either run out of processing time or have to limit the data samples. My point was that such methods need to be implemented with an algorithm that suits Pine instead of porting Python code directly. And yes, I am able to demonstrate that with this implementation of Chatterjee Correlation.

🎲 What is Chatterjee Correlation?
The Chatterjee Rank Correlation Coefficient (CCC) is a method developed by Sourav Chatterjee that can be used to study nonlinear correlation between two series.
Full documentation on the method can be found here:
arxiv.org/pdf/1909.10140.pdf

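In short, the formula we are implementing here (for data without tied values) is:

$$\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}$$

where the pairs are sorted by X and r_i is the rank of the Y value in the i-th pair.
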
The algorithm can be simplified as follows:
1. Get the ranks of X
2. Get the ranks of Y
3. Sort the ranks of Y in the order of X (let's call this SortedYIndices)
4. Calculate the sum of the absolute differences between adjacent Y ranks in SortedYIndices (let's call it SumOfAdjacentSortedIndices)
5. Finally, calculate the correlation coefficient using the simple formula
CCC = 1 - (3*SumOfAdjacentSortedIndices)/(n^2 - 1)
where n is the number of samples.
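
For checking values off-chart, the five steps above can be sketched in plain Python. This is only a reference sketch, not the published Pine code: the function name `chatterjee_xi` is made up for illustration, and it assumes there are no tied values in X or Y.

```python
def chatterjee_xi(x, y):
    """Chatterjee rank correlation for paired samples, assuming no tied values."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        return float("nan")

    # Ranks of Y (0-based; only differences between ranks matter).
    order_y = sorted(range(n), key=lambda i: y[i])
    rank_y = [0] * n
    for rank, i in enumerate(order_y):
        rank_y[i] = rank

    # Y ranks rearranged in increasing order of X (SortedYIndices).
    order_x = sorted(range(n), key=lambda i: x[i])
    sorted_y_ranks = [rank_y[i] for i in order_x]

    # Sum of absolute differences between adjacent ranks
    # (SumOfAdjacentSortedIndices).
    total = sum(abs(b - a) for a, b in zip(sorted_y_ranks, sorted_y_ranks[1:]))

    return 1 - 3 * total / (n * n - 1)


# Made-up sample: y rises with x, so every adjacent rank difference is 1.
example = chatterjee_xi([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

With five samples and a perfectly monotonic relation, the sum is 4 and the result is 1 - 12/24 = 0.5, which shows that the coefficient only approaches 1 as the sample grows.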

🎲 Looks simple? What is the catch?
A common mistake is to think in Python/Java/C etc. while coding in Pine. This makes the code less efficient when it involves arrays and loops. The simple code may look something like this.

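// x and y are the two series being compared; they are assumed to be defined earlier in the script.
var xArray = array.new<float>()
var yArray = array.new<float>()
array.push(xArray, x)
array.push(yArray, y)
n = array.size(xArray)
sortX = array.sort_indices(xArray) // sortX[i] = position of the i-th smallest x
sortY = array.sort_indices(yArray) // sortY[r] = position of the r-th smallest y
// sort_indices returns positions, not ranks, so invert sortY to get the rank of each y
yRank = array.new<int>(n, 0)
for r = 0 to n - 1
    array.set(yRank, array.get(sortY, r), r)
float correlation = na
if n > 1
    SumOfAdjacentSortedIndices = 0.0
    index = array.get(sortX, 0)
    for i = 1 to n - 1
        indexNext = array.get(sortX, i)
        SumOfAdjacentSortedIndices += math.abs(array.get(yRank, indexNext) - array.get(yRank, index))
        index := indexNext
    correlation := 1 - 3 * SumOfAdjacentSortedIndices / (math.pow(n, 2) - 1)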


But the problem here is the number of loops run. Remember that Pine executes the code on every bar. There are loops inside array.sort_indices, plus the looping we do ourselves to calculate SumOfAdjacentSortedIndices. Because of this, the chances of the program throwing a runtime error for running too long are pretty high. This greatly limits the number of samples against which we can run the study. The options to overcome this are:

  • Limit the sample size and calculate only between certain bars - this is not ideal, as smaller sets are more likely to yield false or inconsistent results.
  • Start thinking in Pine instead of Python and write code that is optimised for Pine - this is exactly what we have done in the published code.


🎲 How to think in Pine?
In order to think in Pine, you should try to eliminate loops as much as possible, especially on data that is continuously growing.

My first thought was that sorting takes a lot of time and that I needed a better way to sort a series, especially when it is a growing data set. Hence, I came up with this library, which implements Binary Insertion Sort.
https://www.tradingview.com/script/ZudhdK9O-BinaryInsertionSort/

Replacing array.sort_indices with binary insertion sort greatly reduces the number of loops run on each bar. With binary insertion sort, the array always remains sorted: each new item is inserted at its correct position in the existing sort order, so there is no need to run a separate sort. This allows us to work with bigger data sets and utilise the full 20,000 bars for calculation instead of a few hundred.
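
The idea of keeping an array sorted as values arrive can be illustrated with a minimal Python sketch. It uses Python's standard `bisect` module and a made-up helper name, `insert_sorted`; it is not the API of the BinaryInsertionSort library linked above.

```python
import bisect


def insert_sorted(sorted_values, new_value):
    """Insert new_value into an already sorted list and return its position."""
    # Binary search: about log2(n) comparisons to find the slot.
    position = bisect.bisect_left(sorted_values, new_value)
    # Later items shift right; the list is never re-sorted from scratch.
    sorted_values.insert(position, new_value)
    return position


sorted_values = []
positions = []
for value in [5.0, 1.0, 4.0, 2.0, 3.0]:  # made-up values, one per bar
    positions.append(insert_sorted(sorted_values, value))
```

Each bar costs one binary search and one insertion, while the naive version re-sorts the whole history on every bar.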

However, the last loop, where we calculate SumOfAdjacentSortedIndices, is not easily replaceable. Hence, we limit these iterations to certain bars only (even though we still use the complete sample). Plots are made only for those bars where the results need to be printed.
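
A rough Python sketch of this split between cheap per-bar work and an expensive calculation that runs only on selected bars might look as follows. The class `RunningChatterjee` and its methods are hypothetical names for illustration, the data is made up, and tied values are assumed not to occur; the published Pine code may be organised differently.

```python
import bisect


class RunningChatterjee:
    """Keeps samples sorted on every bar; computes the coefficient on demand."""

    def __init__(self):
        self._pairs = []  # (x, y) pairs kept sorted by x
        self._ys = []     # y values kept sorted

    def add(self, x, y):
        # Cheap per-bar work: two sorted insertions, no full sort, no full loop.
        bisect.insort(self._pairs, (x, y))
        bisect.insort(self._ys, y)

    def value(self):
        # Expensive part: one pass over all samples, so call it sparingly.
        n = len(self._pairs)
        if n < 2:
            return float("nan")
        # Rank of each y (no ties assumed), taken in increasing order of x.
        ranks = [bisect.bisect_left(self._ys, y) for _, y in self._pairs]
        total = sum(abs(b - a) for a, b in zip(ranks, ranks[1:]))
        return 1 - 3 * total / (n * n - 1)


xs = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]  # made-up x samples
ys = [v * v for v in xs]                        # y rises with x

tracker = RunningChatterjee()
results = {}
for bar, (x, y) in enumerate(zip(xs, ys)):
    tracker.add(x, y)            # every bar
    if bar == len(xs) - 1:       # only the bar where a value is needed
        results[bar] = tracker.value()
```

Here the full pass runs once, on the last bar, yet it still uses all eight samples collected so far.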

🎲 Implementation
The current implementation is limited to a few combinations of X and a fixed Y. But I will be converting this into a library soon, which means programmers will be able to plug in any X and Y and get the correlation.

Our X here can be
  • Average volume
  • ATR


And our Y is the distance of price from a moving average, which identifies trend.

Thus, the indicator helps to understand the correlation coefficient between volume and trend OR volatility and trend for a given ticker and timeframe. A value closer to 1 means highly correlated and a value closer to 0 means least correlated. Please note that this method will not tell us how these values are correlated. That is, we will not be able to know whether higher volume leads to higher trend or lower trend. But we can say whether volume impacts trend or not.
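
The point about direction can be checked with a small Python sketch on made-up data, assuming no tied values. The function name `chatterjee_xi` is illustrative only. A rising relation, a falling relation and a U-shaped relation all score close to 1, so the value by itself cannot say which one is present.

```python
def chatterjee_xi(x, y):
    """Chatterjee rank correlation for paired samples, assuming no tied values."""
    n = len(x)
    order_y = sorted(range(n), key=lambda i: y[i])
    rank_y = [0] * n
    for rank, i in enumerate(order_y):
        rank_y[i] = rank
    order_x = sorted(range(n), key=lambda i: x[i])
    ranks = [rank_y[i] for i in order_x]
    total = sum(abs(b - a) for a, b in zip(ranks, ranks[1:]))
    return 1 - 3 * total / (n * n - 1)


x = list(range(-50, 51))             # 101 made-up samples
rising = [float(v) for v in x]       # y goes up with x
falling = [-float(v) for v in x]     # y goes down with x
u_shape = [(v + 0.25) ** 2 for v in x]  # the 0.25 shift avoids tied values

xi_rising = chatterjee_xi(x, rising)
xi_falling = chatterjee_xi(x, falling)
xi_u_shape = chatterjee_xi(x, u_shape)
```

For the rising and falling series every adjacent rank difference is 1, so both give exactly the same value, 1 - 3/(n + 1); the U-shaped series also scores above 0.9.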

Please note that values can differ greatly between timeframes. For example, you may get a higher correlation coefficient on the 1D timeframe and a lower one on the 1m timeframe. This means the volume-to-trend correlation is higher on the 1D timeframe and lower on lower timeframes.

Comments
kurtsmock
To beat the Pinescript, you have to think like a Pinescript ;)
It's the brilliance of the Pinescript limitations in my opinion. Forces hard work and creativity. You can't just rely on the work of others. Awesome stuff *highfive*
Trendoscope
@kurtsmock, yes, absolutely beautiful and not for the quitters. That is why Pine suits those who have the grit and patience to make it work. Thanks for the comments 😊
Parimal_S
Thanks for this gift
Trendoscope
@Parimal_S, Glad you like it mate :)
definancingseason
Thanks for this. I was surprised recently to find that there are not very many pinescripts based on statistical analysis of multiple values. I want to give a huge thanks for the Binary Sort library as well; I think it may also solve the problem I am facing in writing my own rank-based correlation pinescripts.
Trendoscope
@definancingseason, Glad it is helpful. :)
madanus
@definancingseason, I agree, the binary insertion sort is just the thing for time series data. Hats off @HeWhoMustNotBeNamed
dinakarmarella
Looks different concept sir, appreciate your hard work, thank you sir
Trendoscope
KioseffTrading
AWESOME WORK!