Computing the Pearson correlation matrix on huge datasets in Python

September 6th, 2021

One of the latest tasks at GoodIP was to calculate the similarities between around 480k items, each with around 800 observations in the range of 0–50k. Reducing the dimensionality would compromise the quality of the long-tail results, which is undesirable. The following article evaluates the performance of different implementations, describes how to…
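To make the scale of the problem concrete: a full correlation matrix for around 480k items has roughly 480,000² ≈ 2.3 × 10^11 entries, which is too large to hold in memory as a single dense array on a typical machine. The following is a minimal sketch, not necessarily the approach the article settles on, of how the Pearson correlation matrix can be computed one block of rows at a time with NumPy. The function names `standardize_rows` and `correlation_blocks`, the `block_size` parameter, and the small random test data are assumptions made for illustration.

```python
import numpy as np


def standardize_rows(x):
    """Center each row and scale it to unit Euclidean norm."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # A constant row has no defined correlation; this sketch maps it to 0.
    norms[norms == 0] = 1.0
    return centered / norms


def correlation_blocks(x, block_size=1024):
    """Yield (start, stop, block), where block holds rows start..stop-1
    of the Pearson correlation matrix between the rows of x."""
    z = standardize_rows(x)
    for start in range(0, z.shape[0], block_size):
        stop = min(start + block_size, z.shape[0])
        # The dot product of two standardized rows is their Pearson correlation.
        yield start, stop, z[start:stop] @ z.T


# Small hypothetical stand-in for the real data: 200 items, 800 observations.
rng = np.random.default_rng(0)
items = rng.integers(0, 50_000, size=(200, 800)).astype(np.float64)

# Stacking the blocks is only feasible at this toy size.
full = np.vstack([block for _, _, block in correlation_blocks(items, block_size=64)])
```

At full scale each block would be processed and then discarded or written to disk (for example, keeping only the most similar items per row) rather than stacked into one array.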