Clustering using Self-Organizing Maps through Non-Linear Principal Components Analysis - Rainfalls in Southwestern Colombia
This GitHub repository accompanies the article “Regionalization of monthly rainfall in southwestern Colombia using neural networks” (see [1]). If you use this code, please cite the article [1].
Knowledge of rainfall regimes is a necessary prerequisite for many activities such as water resources management, risk mitigation, planning of socioeconomic activities, and other hydrologic applications. In this paper, non-linear principal component analysis (NLPCA) and self-organizing feature maps (SOM), both non-linear techniques, are applied to identify homogeneous regions of monthly rainfall in southwestern Colombia. The SOM takes as input the data from a network of 44 monthly rainfall gauge stations, each represented by five principal components obtained with NLPCA. These components reduce each station's record for the period from January 1983 to December 2016 to five values. The two-dimensional SOM indicates that all rainfall gauges fall into two clusters. A heterogeneity test showed that the two regions are acceptably homogeneous and depict the main features of the monthly rainfall variability over the study area. The two identified clusters correspond to two types of rainfall regime: bimodal in the Andean Region and unimodal in the Pacific Region. The bimodal regime predominates in the mountainous area and the unimodal regime over the coastal zone. The application of SOM provided a better understanding of the seasonality and spatial distribution of rainfall.
The advantages of NLPCA and SOM are three points:
The monthly rainfall dataset used in this study comes from 44 rain gauge stations located in different zones of southwestern Colombia (Nariño) (Fig. 1) and is available in Canchala et al. [2]. The time series analyzed cover 34 years of observations, from 1983 to 2016.
Figure 1. Geographic location of the study area and distribution of rainfall stations
The methodology proposed in this study was developed according to the flowchart presented in Fig. 2. The regionalization of monthly rainfall was performed using two non-linear techniques: NLPCA and SOM. NLPCA was used to reduce the dimensionality of the dataset, and SOM to identify regions with homogeneous rainfall.
Figure 2. Flowchart of methodology
NLPCA operates by training a feed-forward neural network to perform the identity mapping, where the network inputs are reproduced at the output layer. The network contains an internal “bottleneck” layer that forces it to generate a compact representation of the input data. This technique reduces the dimensionality and creates a feature space map similar to the actual distribution of the underlying system parameters [3]. The scheme of NLPCA is shown in Fig. 3. In this scheme, the dimensions of the weight matrix $W^{(x)}$ and the bias vector $b^{(x)}$ are $m \times l$ and $m$, respectively, where $x$ is the input column vector of length $l$, and $m$ is the number of hidden neurons in the encoding and decoding layers for $u$. The neuron $u$ is calculated from a linear combination of the hidden neurons $h^{(x)}$. A second transfer function $f_2$ maps the encoding layer to the bottleneck layer containing a single neuron.
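The encoding half of the network described above can be sketched in a few lines. This is only an illustration, not the repository's MATLAB implementation: the names `nlpca_encode`, `W`, `b`, `w` and `b_bar` are hypothetical, and the choice of `tanh` for the first transfer function and the identity for the second is an assumption.

```python
import math


def nlpca_encode(x, W, b, w, b_bar):
    """Map an input vector x (length l) to one bottleneck value u.

    W is an m-by-l weight matrix (list of m rows), b a length-m bias
    vector, w a length-m weight vector and b_bar a scalar bias.
    All names are illustrative assumptions.
    """
    if len(W) != len(b) or len(w) != len(b):
        raise ValueError("W, b and w must agree on the number of hidden neurons")
    if any(len(row) != len(x) for row in W):
        raise ValueError("every row of W must have the same length as x")
    # Encoding layer: h_k = tanh((W x + b)_k); tanh is an assumed transfer function.
    h = [math.tanh(sum(W_ki * x_i for W_ki, x_i in zip(row, x)) + b_k)
         for row, b_k in zip(W, b)]
    # Bottleneck neuron: linear combination of the hidden neurons
    # (identity as second transfer function, also an assumption).
    u = sum(w_k * h_k for w_k, h_k in zip(w, h)) + b_bar
    return u, h


# Toy usage with l = 2 inputs and m = 3 hidden neurons.
x = [1.0, 2.0]
W = [[0.5, -0.25], [0.0, 0.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
w = [1.0, 1.0, 0.0]
u, h = nlpca_encode(x, W, b, w, b_bar=0.3)
```

In the toy usage, the first two hidden neurons evaluate to zero and the third has zero weight, so `u` reduces to the bias `b_bar`.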
Figure 3. NN Model for calculating NLPCA
Figure 4. SOM two-level architecture
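To make the SOM stage more concrete, the following is a minimal sketch of the first level only: a one-dimensional map trained with the classic best-matching-unit update. It is not the repository's MATLAB code. The function names, the linear decay schedules, the linear initialization and the toy data are all assumptions chosen for illustration, and the second level (grouping the trained units into regions) is not shown.

```python
import math


def best_matching_unit(p, weights):
    """Index of the unit whose weight vector is closest to sample p."""
    dists = [sum((pi - wi) ** 2 for pi, wi in zip(p, w)) for w in weights]
    return dists.index(min(dists))


def train_som_1d(data, n_units=2, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a tiny SOM whose units lie on a line (1-by-n_units map)."""
    if n_units < 2:
        raise ValueError("n_units must be at least 2")
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    # Deterministic linear initialization between the data extremes (assumed).
    weights = [[lo[d] + (hi[d] - lo[d]) * j / (n_units - 1) for d in range(dim)]
               for j in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01      # radius shrinks but stays positive
        for p in data:
            bmu = best_matching_unit(p, weights)
            for j, w in enumerate(weights):
                # Gaussian neighborhood measured on the map, not in data space.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (p[d] - w[d])
    return weights


# Toy data: two well-separated groups standing in for two rainfall regimes.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (9.0, 10.0), (10.0, 9.0), (9.0, 9.0)]
toy = [p for pair in zip(group_a, group_b) for p in pair]  # interleaved order
som = train_som_1d(toy)
labels_a = [best_matching_unit(p, som) for p in group_a]
labels_b = [best_matching_unit(p, som) for p in group_b]
```

In the study, each sample would be the five-component NLPCA representation of one gauge station rather than a two-dimensional toy point.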
Follow these instructions to obtain results similar to those presented in [1].
First, run Requirements.m to check whether your MATLAB version can run the scripts and functions. You can verify compatibility either with this script or by going through the following checklist:
If you run the script, a message dialog appears stating whether your version is compatible (check the flag value: if it is zero, Main_Script.m will not work).
Main_Script.m is the main script, with which the results of the manuscript [1] can be reproduced. See the repository for the full script.
We build an autoencoder using a network with a [408-200-25-5] topology. Here, 408 is the number of inputs (the monthly time series of each gauge station, 34 years × 12 months); the inputs are then reduced layer by layer until only five outputs remain. The main idea is to verify that performance improves at each training stage, so that the model can be trusted.
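The link between the full topology and the three training stages shown in Figs. 5 to 7 can be made explicit with a short sketch. The helper `layerwise_stages` is a hypothetical name, and the assumption is that each stage trains one autoencoder on a pair of consecutive layer sizes, which matches the figure captions.

```python
def layerwise_stages(topology):
    """Return one (inputs, hidden, outputs) autoencoder per pair of consecutive layer sizes."""
    return [(n_in, n_hidden, n_in) for n_in, n_hidden in zip(topology, topology[1:])]


n_years, months_per_year = 34, 12
n_inputs = n_years * months_per_year   # monthly values per gauge station, 1983 to 2016
topology = [n_inputs, 200, 25, 5]
stages = layerwise_stages(topology)

for stage in stages:
    print("-".join(str(size) for size in stage))
```

Each stage reconstructs its own input, and only its encoding half is kept when the stages are stacked into the final [408-200-25-5] encoder.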
If you execute Main_Script.m step by step, the views shown in Fig. 5, Fig. 6, and Fig. 7 are displayed.
Figure 5. First stage to train an autoencoder [408-200-408]
Figure 6. Second stage to train an autoencoder [200-25-200]
Figure 7. Third stage to train an autoencoder [25-5-25]
Figure 8. Encoder [408-200-25-5]
Figure 9. Classification result based on saved information
Figure 10. Regionalization of monthly rainfall in Nariño using NLPCA and SOM
[1] T. Canchala, Y. Carvajal, W. Alfonso, W. Loaiza, and E. Caicedo. “Regionalization of monthly rainfall in southwestern Colombia using neural networks.” MethodsX. 2020.
[2] T. Canchala, Y. Carvajal, W. Alfonso, W. Loaiza, and E. Caicedo. “Estimation of missing data of monthly rainfall in southwestern Colombia using artificial neural networks.” Data in Brief, vol. 26, 104517, 2019. https://doi.org/10.1016/j.dib.2019.104517
[3] M. A. Kramer. “Non-linear principal component analysis using auto-associative neural networks.” AIChE Journal, vol. 37, pp. 233-243, 1991. https://doi.org/10.1002/aic.690370209
[4] T. Kohonen. “Self-Organizing Maps.” Information Sciences. Berlin: Springer, vol. 30, pp. XX-502, 2001. https://doi.org/10.1007/978-3-642-56927-2
[5] T. Kohonen. “Self-organized formation of topologically correct feature maps.” Biological Cybernetics, vol. 43, pp. 59-69, 1982. https://doi.org/10.1007/BF00337288
[6] K.-C. Hsu and S.-T. Li. “Clustering spatial-temporal precipitation data using wavelet transform and self-organizing map neural network.” Advances in Water Resources, vol. 33, pp. 190-200, 2010. https://doi.org/10.1016/j.advwatres.2009.11.005
[7] F. Farsadnia, M.R. Kamrood, A.M. Nia, R. Modarres, M.T. Bray, and D. Han. “Identification of homogeneous regions for regionalization of watersheds by two-level self-organizing feature maps.” Journal of Hydrology, vol. 509, pp. 387-397, 2014. https://doi.org/10.1016/j.jhydrol.2013.11.050