Project author: aefdz

Project description:
Localization processes for functional data analysis. Software companion for the paper “Localization processes for functional data analysis” by Elías, A., Jiménez, R., and Yukich, J. (2020).
Language: R
Repository: git://github.com/aefdz/localFDA.git
Created: 2020-07-22T16:08:57Z
Project page: https://github.com/aefdz/localFDA

License: Other



localFDA




Overview

Software companion for the paper “Localization processes for functional
data analysis” by Elías, Antonio, Jiménez, Raúl, and Yukich, Joe (2020).
It provides code for computing localization
processes and localization distances, and for applying them to
classification and outlier detection problems.

Installation

    # install the package (requires devtools)
    devtools::install_github("aefdz/localFDA")

    ## v checking for file 'C:\Users\anton\AppData\Local\Temp\Rtmp4617Sq\remotes2e00503a197c\aefdz-localFDA-25b0d40/DESCRIPTION' (425ms)
    ## - preparing 'localFDA':
    ## v checking DESCRIPTION meta-information
    ## - checking for LF line-endings in source and make files and shell scripts
    ## - checking for empty or unneeded directories
    ## - looking to see if a 'data/datalist' file should be added
    ## - building 'localFDA_0.0.0.9000.tar.gz'

    # load the package
    library(localFDA)

Test usage

Load the example data and plot it.

    library(ggplot2)  # plotting functions used below

    X <- exampleData
    n <- ncol(X)                  # number of curves (one per column)
    p <- nrow(X)                  # number of evaluation points (one per row)
    t <- as.numeric(rownames(X))  # evaluation grid, stored in the row names

    # plot the data set: reshape to long format, one row per (curve, time) pair
    df_functions <- data.frame(ids = rep(colnames(X), each = p),
                               y = c(X),
                               x = rep(t, n)
    )
    functions_plot <- ggplot(df_functions) +
      geom_line(aes(x = x, y = y, group = ids),
                color = "black", alpha = 0.25) +
      xlab("t") + theme(legend.position = "none")
    functions_plot
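The reshaping step relies on `c(X)` flattening a matrix column by column, so repeating each column name once per row keeps every value aligned with its curve. A minimal sketch on a toy matrix (the names `X_toy`, `df_toy` and the values are hypothetical, not package data):

```r
# Toy matrix: 3 evaluation points (rows) x 2 curves (columns); values are made up
X_toy <- matrix(c(1, 2, 3,
                  10, 20, 30),
                nrow = 3,
                dimnames = list(c("0", "0.5", "1"), c("a", "b")))

p_toy <- nrow(X_toy)
n_toy <- ncol(X_toy)
grid_toy <- as.numeric(rownames(X_toy))

# c() flattens column by column, so ids and x must follow the same order
df_toy <- data.frame(ids = rep(colnames(X_toy), each = p_toy),
                     y = c(X_toy),
                     x = rep(grid_toy, n_toy))
df_toy
```

Each curve ends up as a contiguous run of `p_toy` rows, which is the layout `geom_line(aes(group = ids))` expects.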

Compute kth empirical localization processes

Empirical version of Equation (1) of the paper. For one focal curve:

    focal <- "1"
    localizarionProcesses_focal <- localizationProcesses(X, focal)$lc
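To give some intuition for what such a process might look like, here is a base-R sketch on simulated curves. It assumes that the kth localization process takes, at each time point, the value of the sample curve that is kth closest to the focal curve at that time point. This is an assumed reading; Equation (1) of the paper is the authoritative definition, and the helper `kth_pointwise_neighbour` is hypothetical, not part of localFDA.

```r
set.seed(1)  # toy data only

p_toy <- 50
n_toy <- 20
grid_toy <- seq(0, 1, length.out = p_toy)

# each column is one curve: a sine wave with random amplitude and vertical shift
X_toy <- sapply(seq_len(n_toy), function(j) {
  rnorm(1, mean = 1, sd = 0.3) * sin(2 * pi * grid_toy) + rnorm(1, sd = 0.5)
})
colnames(X_toy) <- as.character(seq_len(n_toy))

# Hypothetical helper: at each time point, return the value of the curve
# that is kth closest to the focal curve at that time point
kth_pointwise_neighbour <- function(X, focal, k) {
  others <- X[, colnames(X) != focal, drop = FALSE]
  focal_curve <- X[, focal]
  vapply(seq_len(nrow(X)), function(i) {
    ord <- order(abs(others[i, ] - focal_curve[i]))
    unname(others[i, ord[k]])
  }, numeric(1))
}

lp_1 <- kth_pointwise_neighbour(X_toy, "1", k = 1)
lp_5 <- kth_pointwise_neighbour(X_toy, "1", k = 5)
```

Under this reading the result is a piecewise curve stitched together from different sample curves, and `lp_1` is never farther from the focal curve than `lp_5` at any time point.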

Plot localization processes of order 1, 50, 100 and 200:

    library(dplyr)      # filter()
    library(patchwork)  # wrap_plots()

    # long format: one row per (order k, time) pair
    df_lc <- data.frame(k = rep(colnames(localizarionProcesses_focal), each = p),
                        y = c(localizarionProcesses_focal),
                        x = rep(t, n - 1)
    )
    lc_plots <- list()
    ks <- c(1, 50, 100, 200)
    for (i in seq_along(ks)) {
      lc_plots[[i]] <- functions_plot +
        # kth localization process in blue
        geom_line(data = filter(df_lc, k == paste0("k=", ks[i])),
                  aes(x = x, y = y, group = k),
                  color = "blue", size = 1) +
        # focal curve in dashed red
        geom_line(data = filter(df_functions, ids == focal),
                  aes(x = x, y = y, group = ids),
                  color = "red", linetype = "dashed", size = 1) +
        ggtitle(paste("k = ", ks[i]))
    }
    wrap_plots(lc_plots)

Compute kth empirical localization distances

Equation (18) of the paper. For one focal curve:

    localizationDistances_focal <- localizationDistances(X, focal)
    head(localizationDistances_focal)

    ##          k=1          k=2          k=3          k=4          k=5          k=6
    ## 0.0005082926 0.0011346495 0.0017636690 0.0023955745 0.0030095117 0.0035089220
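The increasing pattern in this output can be reproduced with a small base-R sketch. It assumes that the kth localization distance is an L1-type distance between the focal curve and its kth localization process, approximated here by a mean over an equispaced grid on [0, 1]. The exact form and normalization of Equation (18) may differ, and all names below are hypothetical.

```r
set.seed(2)  # toy data only

p_toy <- 40
n_toy <- 15
grid_toy <- seq(0, 1, length.out = p_toy)

# each column is one curve: a cosine wave with a random vertical shift
X_toy <- sapply(seq_len(n_toy), function(j) cos(2 * pi * grid_toy) + rnorm(1, sd = 0.3))
colnames(X_toy) <- as.character(seq_len(n_toy))

focal_toy <- "1"

# absolute deviation of every other curve from the focal curve, per time point
dev <- abs(X_toy[, colnames(X_toy) != focal_toy] - X_toy[, focal_toy])  # p x (n - 1)

# sort each row: column k now holds the kth smallest deviation at each time point
sorted_dev <- t(apply(dev, 1, sort))

# average over the grid as a simple approximation of the integral over [0, 1]
ld_toy <- colMeans(sorted_dev)
names(ld_toy) <- paste0("k=", seq_len(n_toy - 1))
head(ld_toy)
```

Because the kth smallest deviation is nondecreasing in k at every time point, the resulting distances are nondecreasing in k as well.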

Plot the localization distances:

    df_ld <- data.frame(k = names(localizationDistances_focal),
                        y = localizationDistances_focal,
                        x = seq_len(n - 1)
    )
    ldistances_plot <- ggplot(df_ld, aes(x = x, y = y)) +
      geom_point() +
      ggtitle("Localization distances for one focal") +
      xlab("kth") + ylab("L")
    ldistances_plot

Sample μ and σ

    localizationStatistics_full <- localizationStatistics(X, robustify = TRUE)

    # mean and sd estimates for k = 1, 100, 200, 400, 600
    localizationStatistics_full$trim_mean[c(1, 100, 200, 400, 600)]

    ##         k=1       k=100       k=200       k=400       k=600
    ## 0.001083517 0.098465426 0.184940365 0.350528860 0.526580274

    localizationStatistics_full$trim_sd[c(1, 100, 200, 400, 600)]

    ##          k=1        k=100        k=200        k=400        k=600
    ## 0.0005326429 0.0329170846 0.0490732397 0.0686018224 0.0806314699
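The field names `trim_mean` and `trim_sd` suggest trimmed estimates, although how `robustify = TRUE` trims is not stated here. As a generic illustration only, the sketch below shows how trimming limits the influence of one extreme value. The data and the trimming proportion are made up, and this is not the package's procedure.

```r
# Made-up distances for a fixed k across ten focal curves; the last one is extreme
d_k <- c(0.90, 1.00, 1.10, 1.00, 0.95, 1.05, 1.00, 0.98, 1.02, 10)

trim <- 0.1  # hypothetical trimming proportion per tail

plain_mean <- mean(d_k)
trimmed_mean <- mean(d_k, trim = trim)  # base R drops floor(n * trim) values per tail

# sd of the observations that survive the same trimming
n_drop <- floor(length(d_k) * trim)
kept <- sort(d_k)[(n_drop + 1):(length(d_k) - n_drop)]
trimmed_sd <- sd(kept)

c(plain_mean = plain_mean, trimmed_mean = trimmed_mean,
  plain_sd = sd(d_k), trimmed_sd = trimmed_sd)
```

The untrimmed mean and sd are pulled upward by the single extreme value, while the trimmed versions stay close to the bulk of the data.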

Classification

    X <- classificationData

    # random split: 90 curves for training, the rest for testing
    ids_training <- sample(colnames(X), 90)
    ids_testing <- setdiff(colnames(X), ids_training)

    trainingSample <- X[, ids_training]
    testSample <- X[, ids_testing]; colnames(testSample) <- NULL  # blind: hide the true labels

    classNames <- c("G1", "G2")

    classification_results <- localizationClassifier(trainingSample, testSample, classNames, k_opt = 3)

    # compare the true labels (encoded in the column names) with the predictions
    checking <- data.frame(real_class = ids_testing,
                           predicted_class = classification_results$test$predicted_class)
    checking

    ##    real_class predicted_class
    ## 1       12_G1              G1
    ## 2       14_G1              G1
    ## 3       21_G1              G1
    ## 4       44_G1              G1
    ## 5       54_G2              G2
    ## 6       56_G2              G2
    ## 7       72_G2              G2
    ## 8       81_G2              G2
    ## 9       94_G2              G2
    ## 10     100_G2              G2
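The comparison above can be turned into a confusion table and an accuracy figure. The sketch below copies the ten test ids and predictions from the printed output and assumes the ids follow an `<index>_<class>` pattern, which is inferred from that output rather than documented.

```r
# Test-set ids and predictions copied from the printed output
real_ids <- c("12_G1", "14_G1", "21_G1", "44_G1", "54_G2",
              "56_G2", "72_G2", "81_G2", "94_G2", "100_G2")
predicted_class <- c("G1", "G1", "G1", "G1", "G2",
                     "G2", "G2", "G2", "G2", "G2")

# Assumption: ids look like "<index>_<class>", so drop the numeric prefix
real_class <- sub("^[0-9]+_", "", real_ids)

# confusion table and share of correct predictions
confusion <- table(real = real_class, predicted = predicted_class)
accuracy <- mean(real_class == predicted_class)

confusion
accuracy
```

Since the training sample is drawn with `sample()` and no seed is set, a different run will generally produce a different test set and possibly a different accuracy.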

Outlier detection

    X <- outlierData
    outliers <- outlierLocalizationDistance(X, localrule = 0.95, whiskerrule = 1.5)
    outliers$outliers_ld_rule

    ## [1] "1_magnitude" "1_shape"     "2_magnitude" "2_shape"
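The value `whiskerrule = 1.5` resembles the multiplier of the usual boxplot fence. Whether the package applies exactly this rule, and to which statistic, is an assumption; `localrule` is not covered here. A generic sketch of an upper boxplot fence on made-up scores:

```r
# Made-up scores; the last value is clearly larger than the rest
scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)
whisker <- 1.5  # same multiplier as the whiskerrule argument

# quartiles with R's default quantile type
q1 <- quantile(scores, 0.25, names = FALSE)
q3 <- quantile(scores, 0.75, names = FALSE)

# upper fence: third quartile plus whisker times the interquartile range
upper_fence <- q3 + whisker * (q3 - q1)

# indices of the observations above the fence
flagged <- which(scores > upper_fence)
flagged
```

A larger `whisker` moves the fence outward and flags fewer observations.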

Plot the results:

    # long format, with the curves placed on an equispaced grid over [0, 1]
    df_functions <- data.frame(ids = rep(colnames(X), each = nrow(X)),
                               y = c(X),
                               x = rep(seq(from = 0, to = 1, length.out = nrow(X)), ncol(X)))
    functions_plot <- ggplot(df_functions) +
      geom_line(aes(x = x, y = y, group = ids),
                color = "black") +
      xlab("t") +
      theme(legend.position = "bottom") +
      # highlight the detected outliers in color
      geom_line(data = df_functions[df_functions$ids %in% outliers$outliers_ld_rule, ],
                aes(x = x, y = y, group = ids, color = ids), size = 1) +
      guides(color = guide_legend(title = "Detected outliers"))
    functions_plot

References

Elías, Antonio, Jiménez, Raúl and Yukich, Joe (2020). Localization
processes for functional data analysis.
https://arxiv.org/abs/2007.16059