Analyzing Altcoin price using functional principal component analysis
Functional data analysis (FDA) analyses data providing information about curves, surfaces or anything else
varying over a continuum. In functional data's setting, each sample is considered to be a function which
differentiates itself from traditional high-dimensional data anlaysis. In this example, one technique from
FDA - the functional principal component analysis (FPCA) is being used to analyze the closing price of
20 Altcoins over the past year of time. Spline models are used for the pre-smoothing stage of the functional
data in this example.
Functional Data Design
- Dense design with noisy repeated measurements
- The collection of random functions from some stochastic process have the common unknown covariance
function
- Pre-smoothing of individual curves are needed
- Assume error due to smoothing is negligible asymptotically
Why FPCA in functional data?
- Dimension reduction (reducing random curves to a set of functional principal component scores)
- Exploratory data analysis (characterize the dominant modes of variation of the sample around
the mean trends)
Raw Data
- The raw data is on a dense and equally spaced grid (each sample has 365 observations)
- The raw data is very noisy (as the closing price varies on a daily basis)
- The price is on a log scale so that there is not a very strong difference in amplitude
(need this for the functional PCA stage)
- Need pre-smoothing on the individual curves
Pre-smoothing of the raw data
- Penalized spline models are used for the pre-smoothing of the raw data
- Bsplines are used as the data is aperiodic
- The order of the splines is chosen to be 4 (i.e. cubic splines)
- Both the number of knots and the smoothing parameter are chosen using the GCV approach
- Chose 48 knots (i.e. 50 basis)
- Smoothing parameter is set to be 8.1
Interpretations on the FPCA
- The first FPC explains 91.6% of the variation. It contrasts early stage (slightly higher variation)
with later stage (slightly lower variation) (i.e. slightly more weights are placed on the early
prices which lead to higher variability).
- The second FPC explains 4.8% of the variation. It contrasts the early stage and later stage of
the price (which is quite typical for the second FPC).
In plain language
- Results from FPC1 tell us that Altcoins with high positive FPC scores tend to do better in both
the early and later stage (while Altcoins with high negative FPC scores tend to do poorly in both
stages)
- Results from FPC2 tell us that Altcoins with high positive FPC scores tend to do poorly in early
stage but will do better in later stage (while Altcoins with high negative FPC scores tend to do
better in early stage but poorly in later stage)
- Results actually make sense as nowadays there are over 1600 Altcoins in the cryptocurrency market,
and many of the Altcoins have big fluctuation in their prices (often after they get stabilized)
Closing Thoughts
Functional data analysis is a relatively new field in statistics which borrows many of the functional
analysis theories in the statistical analysis.
Two concepts that differentiate functional data from high-dimensional data. First is that functional data
assumes infinite dimensionality while high-dimensional data assumes finite dimensionality. Second functional
data assumes data is smooth while in high-dimensional data data can be discrete.
Last updated on Jan 1, 2019