Information Consistency-Based Measures for Spatial Stratified Heterogeneity

 

1. Introduction to the sshicm package

1.1 The sshicm package can be used to address the following issues:

  • Information consistency-based measures of spatial stratified heterogeneity intensity for continuous and nominal variables.

  • Strength of spatial pattern associations based on information consistency measures.

1.2 Example data in the sshicm package

baltim data

“baltim” consists of Baltimore home sale prices and hedonics. In total, there are 221 instances in the “baltim” data. The explanatory variables are whether the home is a detached unit (DWELL), whether it has a patio (PATIO), whether it has a fireplace (FIREPL), whether it has air conditioning (AC), and whether the dwelling is in Baltimore County (CITCOU), while the target variable is the sale price of the home (PRICE).

cinc data

“cinc” is derived from the 2008 Cincinnati Crime + Socio-Demographics dataset. It includes spatial data on 457 objects located on an irregular lattice. The explanatory variables are male population (MALE), female population (FEMALE), median age (MEDIAN_AGE), average family size (AVG_FAMSIZ), and population density (DENSITY), while the target variable is the existence of theft (THEFT_D).

Figure 1. Maps of the baltim and cinc data sets. (Bai et al. 2023)

1.3 Functions in the sshicm package

Two functions for vector-type inputs of dependent and independent variables.

  • sshic() for a continuous dependent variable

  • sshin() for a nominal dependent variable

Regression-style data frame modeling function

The function sshicm() takes a formula and a data frame and returns the results for all explanatory variables in a single call. Set the type parameter to "IC" (continuous) or "IN" (nominal) to specify whether the dependent variable is continuous or nominal.

2. The principle of measuring spatial stratified heterogeneity based on information consistency

Note: All explanatory variables must be discretized in advance or inherently be discrete nominal variables.
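One way to meet this requirement for a continuous explanatory variable is equal-frequency (quantile) binning in base R. The sketch below is illustrative only: discretize_quantile() is a hypothetical helper, not a function of sshicm, and the choice of four classes and the simulated data are assumptions.

```r
# Hypothetical helper (not part of sshicm): equal-frequency discretization.
discretize_quantile <- function(x, k = 4) {
  # Quantile break points; unique() drops duplicate breaks caused by ties
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1),
                            na.rm = TRUE))
  # include.lowest = TRUE keeps the minimum value inside the first class
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

set.seed(1)
density_raw <- runif(100, min = 0, max = 50)  # made-up continuous covariate
density_cls <- discretize_quantile(density_raw, k = 4)
table(density_cls)                            # class sizes after binning
```

The resulting integer class labels can then be used as a stratification variable. Other schemes (equal-width, natural breaks, domain-specific thresholds) are equally valid; the measure depends on the stratification chosen.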

2.1 When the dependent variable is a continuous variable:

$$ I_{C}\left(d,s\right) = \sum_{s_{i} \in S}p\left(s_{i}\right)\frac{ \arctan \left(\textbf{RelE} \left( f_{d_{i}} \mid \mid f \right) \right)}{\pi / 2} $$

where $d_i$ is the random variable corresponding to the target variable in stratum $s_i$, and $f_{d_i}$ and $f$ are the density functions of $d_i$ and $d$, respectively. Additionally, $\textbf{RelE}\left(f_{d_i} \mid\mid f\right)$ is the relative entropy of $f_{d_i}$ and $f$.

$$ \textbf{RelE} \left( f_{d_{i}} \mid \mid f \right) = H \left(f_{d_{i}} , f\right) - H \left(f_{d_{i}}\right) = \sum_{i = 1}^{n} f_{d_{i}} \log \frac{1}{f} - \sum_{i = 1}^{n} f_{d_{i}} \log \frac{1}{f_{d_{i}}} = \sum_{i = 1}^{n} f_{d_{i}} \log \frac{f_{d_{i}}}{f} $$
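The two formulas above can be sketched in base R. This is a minimal illustration, assuming the densities are approximated by histograms on shared bin edges (Sturges' rule here); the estimator used inside sshicm may differ, and ic_sketch() and rel_entropy() are hypothetical names. Since relative entropy lies in $[0, \infty)$, the arctan term maps each stratum's contribution into $[0, 1)$.

```r
# Illustrative sketch only; sshicm's internal density estimator may differ.
rel_entropy <- function(p, q) {
  keep <- p > 0                        # terms with p = 0 contribute 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

ic_sketch <- function(d, s, bins = "Sturges") {
  # Shared bin edges so stratum and global histograms are comparable
  edges <- hist(d, breaks = bins, plot = FALSE)$breaks
  nb <- length(edges) - 1
  to_prob <- function(v) {
    tabulate(cut(v, edges, include.lowest = TRUE), nbins = nb) / length(v)
  }
  f <- to_prob(d)                      # global distribution of d
  sum(vapply(split(d, s), function(di) {
    w <- length(di) / length(d)        # p(s_i): share of units in stratum
    w * atan(rel_entropy(to_prob(di), f)) / (pi / 2)
  }, numeric(1)))
}

set.seed(42)
s <- rep(c("a", "b"), each = 200)
d_strat <- c(rnorm(200, 0), rnorm(200, 4))  # strata differ strongly
d_flat  <- rnorm(400)                       # strata do not differ
ic_strat <- ic_sketch(d_strat, s)
ic_flat  <- ic_sketch(d_flat, s)
```

In this simulated setup, ic_strat should be clearly larger than ic_flat, and a single stratum covering all units gives exactly 0 because each stratum distribution then equals the global one.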

2.2 When the dependent variable is a nominal variable:

$$ I_{N}\left(d,s\right) = \frac{I \left(d,s\right)}{I \left(d\right)} = \frac{I \left(d\right) - I \left(d \mid s\right)}{I \left(d\right)} = 1 - \frac{\sum_{s_i \in S}\sum_{x \in V_d} p\left(s_i,x\right) \log p\left(x \mid s_i\right)}{\sum_{x \in V_d} p\left(x\right) \log p\left(x\right)} $$

where $p(x)$ is the probability of observing $x$ in $U$, $p(s_i, x)$ is the probability of observing $s_i$ and $x$ in $U$, and $p(x \mid s_i)$ is the probability of observing $x$ given that the stratum is $s_i$.
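The formula above can be evaluated from a contingency table of strata against categories. The sketch below is a minimal base R illustration, assuming empirical frequencies as probability estimates; in_sketch() is a hypothetical name, not the sshin() implementation. The natural logarithm is used, but the ratio does not depend on the base.

```r
# Illustrative sketch only; not the sshicm implementation.
in_sketch <- function(d, s) {
  joint <- table(s, d) / length(d)     # p(s_i, x)
  p_s <- rowSums(joint)                # p(s_i)
  p_x <- colSums(joint)                # p(x)
  cond <- joint / p_s                  # p(x | s_i): row i divided by p(s_i)
  nz <- joint > 0                      # treat 0 * log(0) as 0
  h_cond <- -sum(joint[nz] * log(cond[nz]))       # I(d | s)
  h_d <- -sum(p_x[p_x > 0] * log(p_x[p_x > 0]))   # I(d)
  1 - h_cond / h_d                     # undefined (NaN) if d is constant
}

# Strata that determine d exactly versus strata unrelated to d
in_perfect <- in_sketch(d = c("a", "b", "a", "b"), s = c(1, 2, 1, 2))
in_none    <- in_sketch(d = c("a", "b", "a", "b"), s = c(1, 1, 2, 2))
```

When the strata determine the category exactly, the conditional term vanishes and the measure reaches its maximum of 1; when the category distribution is identical in every stratum, the measure is 0.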

3. Examples of the sshicm package

# Install and load the package
install.packages("sshicm", dep = TRUE)
library(sshicm)
# Continuous dependent variable (PRICE): type = "IC"
baltim = sf::read_sf(system.file("extdata/baltim.gpkg", package = "sshicm"))
sshicm(PRICE ~ ., baltim, type = "IC")
## # A tibble: 5 × 3
##   Variable     Ic    Pv
##   <chr>     <dbl> <dbl>
## 1 AC       0.223  0    
## 2 PATIO    0.162  0.643
## 3 FIREPL   0.135  0.657
## 4 DWELL    0.124  0.716
## 5 CITCOU   0.0898 0.988
# Nominal dependent variable (THEFT_D): type = "IN"
cinc = sf::read_sf(system.file("extdata/cinc.gpkg", package = "sshicm"))
sshicm(THEFT_D ~ ., cinc, type = "IN")
## # A tibble: 5 × 3
##   Variable        In      Pv
##   <chr>        <dbl>   <dbl>
## 1 DENSITY    0.776   0.0681 
## 2 MEDIAN_AGE 0.228   0.0230 
## 3 MALE       0.0367  0      
## 4 AVG_FAMSIZ 0.0205  0.00300
## 5 FEMALE     0.00584 0.0200
# NTDs example data from the gdverse package (must be installed separately)
ntds = gdverse::NTDs
sshicm(incidence ~ watershed + elevation + soiltype, data = ntds)
## # A tibble: 3 × 3
##   Variable      Ic     Pv
##   <chr>      <dbl>  <dbl>
## 1 watershed 0.284  0.0100
## 2 elevation 0.135  0.0531
## 3 soiltype  0.0825 0.133

References

Wang, J., Haining, R., Zhang, T., Xu, C., Hu, M., Yin, Q., … Chen, H. (2024). Statistical Modeling of Spatially Stratified Heterogeneous Data. Annals of the American Association of Geographers, 114(3), 499–519. https://doi.org/10.1080/24694452.2023.2289982.

Bai, H., Wang, H., Li, D., & Ge, Y. (2023). Information Consistency-Based Measures for Spatial Stratified Heterogeneity. Annals of the American Association of Geographers, 113(10), 2512–2524. https://doi.org/10.1080/24694452.2023.2223700.

Wang, J., Li, X., Christakos, G., Liao, Y., Zhang, T., Gu, X., & Zheng, X. (2010). Geographical Detectors‐Based Health Risk Assessment and its Application in the Neural Tube Defects Study of the Heshun Region, China. International Journal of Geographical Information Science, 24(1), 107–127. https://doi.org/10.1080/13658810802443457.

Wang, J. F., Zhang, T. L., & Fu, B. J. (2016). A Measure of Spatial Stratified Heterogeneity. Ecological Indicators, 67, 250–256. https://doi.org/10.1016/j.ecolind.2016.02.052.