--- title: "Geographical Convergent Cross Mapping (GCCM)" author: "Wenbo Lv" date: "2024-12-17" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{GCCM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## 1. The principle of geographical convergent cross mapping (GCCM) Takens’ theory proves that, for a dynamic system $\phi$, if its trajectory converges to an attractor manifold $M$, which are consisted by a bounded and invariant set of states, then the mapping between $\phi$ and M can be built and time series observations of $\phi$ can be used to construct $M$. According to the generalized embedding theorem, for a compact $d$-dimensional manifold $M$ and a set of observation functions $\left$, the map $\psi_{\phi,h} = \left$ is an embedding of $M$ with $L = 2d + 1$. Here embedding means a one-to-one map resolving all singularities of the original manifold. The elements $h_i$ can be lags of observations from single time series observations, lags of observations from multiple time series, or multiple observation functions. The first two constructions are only special cases of the third one. By taking the measured values at one specific unit and its neighbors (named as spatial lags in spatial statistics) as a set of observation functions, $\psi_{\phi,h} \left(x,s\right) = \left$ is a embedding, where $s$ is the focal unit currently under investigation and $s\left(i\right)$ is its $i$-th order of spatial lags. $h_s\left(x\right)$ and $h_{s\left(i\right)}\left(x\right)$ are their observation functions respectively. (Hereinafter, we will use $\psi \left(x,s\right)$ to present $\psi_{\phi,h} \left(x,s\right)$ for short). For two spatial variables $X$ and $Y$ on the same set of spatial units, their values and spatial lags can be regarded as observation functions reading values from each spatial unit. As the spatial lags in each order contain more than one spatial units, the observation function can be set as the mean of the spatial units or other summary functions considering the spatial direction, to assure the one-to-one mapping of the original manifold $M$. The cross-mapping prediction is defined as: $$ \hat{Y}_s \mid M_x = \sum\limits_{i=1}^{L+1} \left(\omega_{si}Y_{si} \mid M_x \right) $$ where $s$ represents a spatial unit at which the value of $Y$ needs to be predicted, $\hat{Y}_s$ is the prediction result, $L$ is the number of dimensions of the embedding, $si$ is the spatial unit used in the prediction, $Y_{si}$ is the observation value at $si$ and simultaneously the first component of a state in $M_y$, noted as $\psi\left(y,s_i\right)$. In further, $\psi\left(y,s_i\right)$ is determined by its one-to-one mapping point $\psi\left(x,s_i\right)$, which is in turn one of the $L+1$ nearest neighbors of the focal state in $M_x$. $\omega_{si}$ is the corresponding weight defined as: $$ \omega_{si} \mid M_x = \frac{weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{\sum_{i=1}^{L+1}weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)} $$ where $weight \left(\ast,\ast\right)$ is the weight function between two states in the shadow manifold, defined as: $$ weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \exp \left(- \frac{dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{dis \left(\psi\left(x,s_1\right),\psi\left(x,s\right)\right)} \right) $$ where $\exp$ is the exponential function and $dis \left(\ast,\ast\right)$ represents the distance function between two states in the shadow manifold defined as: $$ dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \frac{1}{L} \left(\left|h_{si}\left(x\right)-h_{s}\left(x\right)\right| + \sum_{k=1}^{L-1}abs \left[h_{si\left(k\right)}\left(x\right),h_{s\left(k\right)}\left(x\right)\right]\right) $$ Note that the absolute value distance is used here. The skill of cross-mapping prediction is measured by the Pearson correlation coefficient between the true observations and corresponding predictions: $$ \rho = \frac{Cov\left(Y,\hat{Y}\right)}{\sqrt{Var\left(Y\right) Var\left(\hat{Y}\right)}} $$ The prediction skill $\rho$ varies by setting different sizes of libraries, which means the quantity of observations used in reconstruction of the shadow manifold. We can use the convergence of $\rho$ to infer the causal associations. For GCCM, the convergence means that $\rho$ increases with the size of libraries and is statistically significant when the library becomes largest. And the confidence interval of $\rho$ can be estimated based the $z$-statistics with the normal distribution: $$ t = \rho \sqrt{\frac{n-2}{1-\rho^2}} $$ where $n$ is the number of observations to be predicted, and $$ z = \frac{1}{2} \ln \left(\frac{1+\rho}{1-\rho}\right) $$ ## 2. Examples ### 2.1 Install the `spEDM` package ```r install.packages("spEDM", dep = TRUE) ``` Load the `spEDM` package: ``` r library(spEDM) ``` ### 2.2 An example of lattice data about county-level population density Load data and package: ``` r popd_nb = spdep::read.gal(system.file("extdata/popdensity_nb.gal", package = "spEDM")) ## Warning in spdep::read.gal(system.file("extdata/popdensity_nb.gal", package = "spEDM")): ## neighbour object has 4 sub-graphs popd_nb ## Neighbour list object: ## Number of regions: 2806 ## Number of nonzero links: 15942 ## Percentage nonzero weights: 0.2024732 ## Average number of links: 5.681397 ## 4 disjoint connected subgraphs popdensity = readr::read_csv(system.file("extdata/popdensity.csv", package = "spEDM")) ## Rows: 2806 Columns: 7 ## ── Column specification ─────────────────────────────────────────────────────────────────────── ## Delimiter: "," ## dbl (7): x, y, popDensity, DEM, Tem, Pre, slop ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. popdensity ## # A tibble: 2,806 × 7 ## x y popDensity DEM Tem Pre slop ## ## 1 117. 30.5 780. 8 17.4 1528. 0.452 ## 2 117. 30.6 395. 48 17.2 1487. 0.842 ## 3 117. 30.8 261. 49 16.0 1456. 3.56 ## 4 116. 30.1 258. 23 17.4 1555. 0.932 ## 5 116. 30.5 211. 101 16.3 1494. 3.34 ## 6 117. 31.0 386. 10 16.6 1382. 1.65 ## 7 117. 30.2 350. 23 17.5 1569. 0.346 ## 8 117. 30.7 470. 22 17.1 1493. 1.88 ## 9 117. 30.6 1226. 11 17.4 1526. 0.208 ## 10 116. 30.9 137. 598 13.9 1458. 5.92 ## # ℹ 2,796 more rows popd_sf = sf::st_as_sf(popdensity, coords = c("x","y"), crs = 4326) popd_sf ## Simple feature collection with 2806 features and 5 fields ## Geometry type: POINT ## Dimension: XY ## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346 ## Geodetic CRS: WGS 84 ## # A tibble: 2,806 × 6 ## popDensity DEM Tem Pre slop geometry ## * ## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879) ## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877) ## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548) ## 4 258. 23 17.4 1555. 0.932 (116.241 30.104) ## 5 211. 101 16.3 1494. 3.34 (116.173 30.495) ## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839) ## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412) ## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514) ## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558) ## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983) ## # ℹ 2,796 more rows ``` Run GCCM: ``` r startTime = Sys.time() pd_res = gccm(cause = "Pre", effect = "popDensity", data = popd_sf, libsizes = seq(10, 2800, by = 100), E = 3, nb = popd_nb, trendRM = TRUE) ## Computing: [== ] 4% (~28m57s remaining) Computing: [=== ] 7% (~21m50s remaining) Computing: [===== ] 11% (~17m18s remaining) Computing: [====== ] 14% (~14m18s remaining) Computing: [======== ] 18% (~12m11s remaining) Computing: [========= ] 21% (~10m30s remaining) Computing: [========== ] 25% (~8m49s remaining) Computing: [============ ] 29% (~8m12s remaining) Computing: [============= ] 32% (~7m23s remaining) Computing: [=============== ] 36% (~6m56s remaining) Computing: [================ ] 39% (~6m21s remaining) Computing: [================== ] 43% (~5m29s remaining) Computing: [=================== ] 46% (~5m34s remaining) Computing: [==================== ] 50% (~5m1s remaining) Computing: [====================== ] 54% (~4m23s remaining) Computing: [======================= ] 57% (~3m49s remaining) Computing: [========================= ] 61% (~3m32s remaining) Computing: [========================== ] 64% (~3m6s remaining) Computing: [============================ ] 68% (~2m45s remaining) Computing: [============================= ] 71% (~2m29s remaining) Computing: [============================== ] 75% (~2m12s remaining) Computing: [================================ ] 79% (~2m1s remaining) Computing: [================================= ] 82% (~1m37s remaining) Computing: [=================================== ] 86% (~1m18s remaining) Computing: [==================================== ] 89% (~1m7s remaining) Computing: [====================================== ] 93% (~45s remaining) Computing: [======================================= ] 96% (~22s remaining) Computing: [========================================] 100% (done) ## Computing: [== ] 4% (~28m19s remaining) Computing: [=== ] 7% (~19m19s remaining) Computing: [===== ] 11% (~15m3s remaining) Computing: [====== ] 14% (~13m25s remaining) Computing: [======== ] 18% (~10m48s remaining) Computing: [========= ] 21% (~9m7s remaining) Computing: [========== ] 25% (~8m25s remaining) Computing: [============ ] 29% (~7m50s remaining) Computing: [============= ] 32% (~6m44s remaining) Computing: [=============== ] 36% (~6m13s remaining) Computing: [================ ] 39% (~5m45s remaining) Computing: [================== ] 43% (~4m57s remaining) Computing: [=================== ] 46% (~5m10s remaining) Computing: [==================== ] 50% (~4m35s remaining) Computing: [====================== ] 54% (~4m6s remaining) Computing: [======================= ] 57% (~3m47s remaining) Computing: [========================= ] 61% (~3m17s remaining) Computing: [========================== ] 64% (~3m0s remaining) Computing: [============================ ] 68% (~2m42s remaining) Computing: [============================= ] 71% (~2m19s remaining) Computing: [============================== ] 75% (~2m8s remaining) Computing: [================================ ] 79% (~1m53s remaining) Computing: [================================= ] 82% (~1m31s remaining) Computing: [=================================== ] 86% (~1m13s remaining) Computing: [==================================== ] 89% (~1m4s remaining) Computing: [====================================== ] 93% (~43s remaining) Computing: [======================================= ] 96% (~21s remaining) Computing: [========================================] 100% (done) endTime = Sys.time() print(difftime(endTime,startTime, units ="mins")) ## Time difference of 20.01155 mins pd_res ## lib_sizes x_xmap_y_mean x_xmap_y_sig x_xmap_y_upper x_xmap_y_lower y_xmap_x_mean ## 1 10 0.01130395 5.494786e-01 0.04828686 -0.025709912 0.06607465 ## 2 110 0.03306905 7.987322e-02 0.06998652 -0.003938875 0.22689926 ## 3 210 0.04821180 1.064297e-02 0.08506316 0.011228724 0.28196395 ## 4 310 0.06475999 5.979971e-04 0.10151983 0.027823559 0.30785228 ## 5 410 0.07922017 2.655505e-05 0.11588358 0.042341180 0.33124862 ## 6 510 0.09428975 5.626576e-07 0.13083637 0.057487218 0.36226576 ## 7 610 0.10951603 6.021283e-09 0.14592777 0.072807968 0.39693375 ## 8 710 0.12249230 7.501999e-11 0.15877574 0.085878440 0.42302102 ## 9 810 0.13594244 4.778400e-13 0.17207994 0.099439543 0.44164444 ## 10 910 0.14767067 3.774758e-15 0.18367015 0.111275599 0.45689198 ## 11 1010 0.15863133 0.000000e+00 0.19449280 0.122346380 0.46853529 ## 12 1110 0.16852300 0.000000e+00 0.20425241 0.132345176 0.47657811 ## 13 1210 0.17808456 0.000000e+00 0.21367958 0.142017296 0.48683486 ## 14 1310 0.18761109 0.000000e+00 0.22306563 0.151660836 0.49750321 ## 15 1410 0.19697038 0.000000e+00 0.23228051 0.161141755 0.50708625 ## 16 1510 0.20507772 0.000000e+00 0.24025763 0.169359805 0.51660029 ## 17 1610 0.21264081 0.000000e+00 0.24769496 0.177030639 0.52744922 ## 18 1710 0.21969130 0.000000e+00 0.25462450 0.184185487 0.53845530 ## 19 1810 0.22618694 0.000000e+00 0.26100553 0.190780593 0.54912681 ## 20 1910 0.23207300 0.000000e+00 0.26678511 0.196759545 0.55982092 ## 21 2010 0.23780738 0.000000e+00 0.27241336 0.202586962 0.57043510 ## 22 2110 0.24300452 0.000000e+00 0.27751226 0.207870572 0.58089481 ## 23 2210 0.24773394 0.000000e+00 0.28215059 0.212680455 0.59104138 ## 24 2310 0.25208726 0.000000e+00 0.28641865 0.217109347 0.60092354 ## 25 2410 0.25628422 0.000000e+00 0.29053212 0.221380536 0.61050007 ## 26 2510 0.26039117 0.000000e+00 0.29455615 0.225561411 0.62015838 ## 27 2610 0.26436854 0.000000e+00 0.29845205 0.229611598 0.62986485 ## 28 2710 0.26826114 0.000000e+00 0.30226383 0.233576632 0.63868014 ## y_xmap_x_sig y_xmap_x_upper y_xmap_x_lower ## 1 0.0004611665 0.1028263 0.02914279 ## 2 0.0000000000 0.2617051 0.19150401 ## 3 0.0000000000 0.3156735 0.24754360 ## 4 0.0000000000 0.3409712 0.27397010 ## 5 0.0000000000 0.3637926 0.29789691 ## 6 0.0000000000 0.3939875 0.32968203 ## 7 0.0000000000 0.4276555 0.36529603 ## 8 0.0000000000 0.4529343 0.39215637 ## 9 0.0000000000 0.4709512 0.41136393 ## 10 0.0000000000 0.4856839 0.42710977 ## 11 0.0000000000 0.4969231 0.43914576 ## 12 0.0000000000 0.5046812 0.44746601 ## 13 0.0000000000 0.5145683 0.45808386 ## 14 0.0000000000 0.5248444 0.46913652 ## 15 0.0000000000 0.5340682 0.47907235 ## 16 0.0000000000 0.5432193 0.48894374 ## 17 0.0000000000 0.5536467 0.50020882 ## 18 0.0000000000 0.5642167 0.51164650 ## 19 0.0000000000 0.5744573 0.52274558 ## 20 0.0000000000 0.5847117 0.53387714 ## 21 0.0000000000 0.5948815 0.54493439 ## 22 0.0000000000 0.6048957 0.55583942 ## 23 0.0000000000 0.6146029 0.56642622 ## 24 0.0000000000 0.6240502 0.57674496 ## 25 0.0000000000 0.6331990 0.58675192 ## 26 0.0000000000 0.6424194 0.59685170 ## 27 0.0000000000 0.6516793 0.60700928 ## 28 0.0000000000 0.6600834 0.61624074 ``` Visualize the result: ``` r windowsFonts(TNR = windowsFont("Times New Roman")) fig1 = ggplot2::ggplot(data = pd_res, ggplot2::aes(x = lib_sizes)) + ggplot2::geom_line(ggplot2::aes(y = x_xmap_y_mean, color = "x xmap y"), lwd = 1.25) + ggplot2::geom_line(ggplot2::aes(y = y_xmap_x_mean, color = "y xmap x"), lwd = 1.25) + ggplot2::scale_y_continuous(breaks = seq(0, 1, by = 0.1), limits = c(-0.05, 1), expand = c(0, 0), name = expression(rho)) + ggplot2::scale_x_continuous(name = "Lib of Sizes", breaks = seq(10, 2800, by = 100), limits = c(0, 2800), expand = c(0, 0)) + ggplot2::scale_color_manual(values = c("x xmap y" = "#608dbe", "y xmap x" = "#ed795b"), labels = c("Pre xmap PopDensity", "PopDensity xmap Pre"), name = "") + ggplot2::theme_bw() + ggplot2::theme(axis.text = ggplot2::element_text(family = "TNR"), axis.text.x = ggplot2::element_text(angle = 30), axis.title = ggplot2::element_text(family = "TNR"), panel.grid = ggplot2::element_blank(), legend.position = "inside", legend.justification = c('right','top'), legend.background = ggplot2::element_rect(fill = 'transparent'), legend.text = ggplot2::element_text(family = "TNR")) fig1 ``` ![**Figure 1**. The cross-mapping prediction outputs between population density and county-level Precipitation.](../man/figures/gccm/fig1-1.png) ### 2.3 An example of grid data about farmland NPP Load data and package: ``` r npp = terra::rast(system.file("extdata/npp.tif", package = "spEDM")) npp ## class : SpatRaster ## dimensions : 404, 483, 3 (nrow, ncol, nlyr) ## resolution : 10000, 10000 (x, y) ## extent : -2625763, 2204237, 1877078, 5917078 (xmin, xmax, ymin, ymax) ## coord. ref. : CGCS2000_Albers ## source : npp.tif ## names : npp, pre, tem ## min values : 164.00, 384.3409, -47.8194 ## max values : 16606.33, 23878.3555, 263.6938 terra::plot(npp,nc = 3, mar = rep(0.1,4), oma = rep(0.1,4), axes = FALSE, legend = FALSE) ``` ![**Figure 2**. Maps of farmland NPP and climate factors.](../man/figures/gccm/fig2-1.png) *To save the computation time, we will aggregate the data by 3 times and select 3000 non-NA pixels to predict:* ``` r npp = terra::aggregate(npp, fact = 3, na.rm = TRUE) terra::global(npp,"isNA") ## isNA ## npp 14815 ## pre 14766 ## tem 14766 terra::ncell(npp) ## [1] 21735 namat = terra::as.matrix(!is.na(npp[[1]]), wide = TRUE) pred = terra::rowColFromCell(npp,which(namat)) dim(pred) ## [1] 6920 2 set.seed(42) indices = sample(nrow(pred), size = 3000, replace = FALSE) pred = pred[indices,] ``` Run GCCM: ``` r startTime = Sys.time() npp_res = gccm(cause = "pre", effect = "npp", data = npp, libsizes = seq(10,130,10), E = 3, RowCol = pred) ## Computing: [============= ] 31% (~6m44s remaining) Computing: [================ ] 38% (~6m56s remaining) Computing: [=================== ] 46% (~6m19s remaining) Computing: [====================== ] 54% (~5m11s remaining) Computing: [========================= ] 62% (~3m47s remaining) Computing: [============================ ] 69% (~2m45s remaining) Computing: [=============================== ] 77% (~1m52s remaining) Computing: [================================== ] 85% (~1m8s remaining) Computing: [===================================== ] 92% (~32s remaining) Computing: [========================================] 100% (done) ## Computing: [============= ] 31% (~6m57s remaining) Computing: [================ ] 38% (~7m4s remaining) Computing: [=================== ] 46% (~6m25s remaining) Computing: [====================== ] 54% (~5m16s remaining) Computing: [========================= ] 62% (~3m51s remaining) Computing: [============================ ] 69% (~2m45s remaining) Computing: [=============================== ] 77% (~1m54s remaining) Computing: [================================== ] 85% (~1m9s remaining) Computing: [===================================== ] 92% (~32s remaining) Computing: [========================================] 100% (done) endTime = Sys.time() print(difftime(endTime,startTime, units ="mins")) ## Time difference of 13.4349 mins npp_res ## lib_sizes x_xmap_y_mean x_xmap_y_sig x_xmap_y_upper x_xmap_y_lower y_xmap_x_mean ## 1 10 0.05366658 3.278589e-03 0.08928161 0.01791449 0.07974064 ## 2 20 0.09168397 4.898603e-07 0.12705360 0.05608148 0.14103020 ## 3 30 0.12385334 9.956924e-12 0.15893539 0.08845891 0.18137790 ## 4 40 0.15205226 0.000000e+00 0.18682219 0.11690188 0.22147635 ## 5 50 0.18402074 0.000000e+00 0.21836918 0.14921690 0.27559982 ## 6 60 0.23504915 0.000000e+00 0.26857650 0.20095299 0.26947293 ## 7 70 0.24022627 0.000000e+00 0.27366015 0.20621255 0.20894394 ## 8 80 0.20491112 0.000000e+00 0.23894542 0.17037399 0.09716466 ## 9 90 0.14005787 1.287859e-14 0.17496740 0.10479663 0.01715257 ## 10 100 0.17271242 0.000000e+00 0.20721815 0.13777749 -0.02868893 ## y_xmap_x_sig y_xmap_x_upper y_xmap_x_lower ## 1 1.227088e-05 0.115198404 0.04407993 ## 2 8.437695e-15 0.175928786 0.10577755 ## 3 0.000000e+00 0.215763902 0.14654259 ## 4 0.000000e+00 0.255239854 0.18717336 ## 5 0.000000e+00 0.308345190 0.24220210 ## 6 0.000000e+00 0.302343779 0.23596193 ## 7 0.000000e+00 0.242914082 0.17446196 ## 8 9.696014e-08 0.132490470 0.06159233 ## 9 3.476473e-01 0.052906596 -0.01864537 ## 10 1.883823e+00 0.007104863 -0.06440930 ``` Visualize the result: ``` r fig3 = ggplot2::ggplot(data = npp_res, ggplot2::aes(x = lib_sizes)) + ggplot2::geom_line(ggplot2::aes(y = x_xmap_y_mean, color = "x xmap y"), lwd = 1.25) + ggplot2::geom_line(ggplot2::aes(y = y_xmap_x_mean, color = "y xmap x"), lwd = 1.25) + ggplot2::scale_y_continuous(breaks = seq(0, 1, by = 0.1), limits = c(-0.05, 1), expand = c(0, 0), name = expression(rho)) + ggplot2::scale_x_continuous(name = "Lib of Sizes", breaks = seq(10,130,10), limits = c(9, 101), expand = c(0, 0)) + ggplot2::scale_color_manual(values = c("x xmap y" = "#608dbe", "y xmap x" = "#ed795b"), labels = c("Precipitation xmap NPP", "NPP xmap Precipitation"), name = "") + ggplot2::theme_bw() + ggplot2::theme(axis.text = ggplot2::element_text(family = "TNR"), axis.title = ggplot2::element_text(family = "TNR"), panel.grid = ggplot2::element_blank(), legend.position = "inside", legend.justification = c('right','top'), legend.background = ggplot2::element_rect(fill = 'transparent'), legend.text = ggplot2::element_text(family = "TNR")) fig3 ``` ![**Figure 3**. The cross-mapping prediction outputs between farmland NPP and precipitation.](../man/figures/gccm/fig3-1.png)