Information-theoretical V-measure for Spatial Association

1. The principle of the information-theoretical V-measure

Let A denote the area of the domain, and consider two different regionalizations of it. To keep the discussion clear, we refer to the first as a regionalization and to the second as a partition. The regionalization R divides the domain into n regions $r_i$, $i = 1, \dots, n$. The partition Z divides the domain into m zones $z_j$, $j = 1, \dots, m$. In practice, both R and Z are integer-type vectors of equal length, with one label per element.
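As a minimal sketch of this setup, the two label vectors can be cross-tabulated in base R. The toy vectors `R` and `Z` below are hypothetical and chosen only for illustration; they are not part of any package data.

```r
# Hypothetical toy data: 8 elements, each with a region label (R) and a zone label (Z)
R <- c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L)
Z <- c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L)

# Both vectors must describe the same elements, so they must have equal length
stopifnot(length(R) == length(Z))

# Cross-tabulation: entry [i, j] counts the elements with R == i and Z == j
overlap <- table(R, Z)

A_i <- rowSums(overlap)  # number of elements in each region
A_j <- colSums(overlap)  # number of elements in each zone
A   <- length(R)         # total number of elements

print(overlap)
```

Here n = 3 regions and m = 2 zones, so the table has three rows and two columns.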

The homogeneity $h$ of the regionalization with respect to the partition is

$$ h = 1 - \sum\limits_{j=1}^m \frac{A_j}{A} \frac{S_j^R}{S^R} $$

where $S^R = - \sum\limits_{i=1}^n \frac{A_i}{A} \log\frac{A_i}{A}$, $S_j^R = - \sum\limits_{i=1}^n \frac{a_{i,j}}{A_j} \log \frac{a_{i,j}}{A_j}$, and $a_{i,j}$ is the number of elements with $R = i$ and $Z = j$. $A_i$ is the number of elements in the vector with $R = i$, and $A_j$ is the number of elements with $Z = j$. Terms with $a_{i,j} = 0$ contribute zero to the sum.
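The formula above can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation used by itmsa: the function names `homogeneity` and `entropy` are hypothetical, the natural logarithm is used (the base cancels in the ratio $S_j^R / S^R$), and the case of a single region ($S^R = 0$) is not handled.

```r
# Hypothetical helper: Shannon entropy of a probability vector,
# dropping zero entries so that 0 * log(0) is treated as 0
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical sketch of h for two equal-length label vectors R and Z
homogeneity <- function(R, Z) {
  stopifnot(length(R) == length(Z))
  a   <- table(R, Z)   # a[i, j]: elements with R == i and Z == j
  A   <- sum(a)        # total number of elements
  A_i <- rowSums(a)    # elements per region
  A_j <- colSums(a)    # elements per zone

  S_R  <- entropy(A_i / A)  # entropy of regions over the whole domain
  # Entropy of regions within each zone j (one value per column)
  S_jR <- apply(a, 2, function(col) entropy(col / sum(col)))

  # Assumes S_R > 0, i.e. at least two regions
  1 - sum((A_j / A) * (S_jR / S_R))
}

# Every zone falls inside a single region: h is 1
homogeneity(c(1, 1, 2, 2), c(1, 2, 3, 4))

# Regions are split evenly within every zone: h is 0
homogeneity(c(1, 2, 1, 2), c(1, 1, 2, 2))
```

The two calls cover the extremes: h reaches 1 when no zone straddles more than one region, and drops to 0 when the mix of regions inside each zone is the same as over the whole domain.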

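Swapping the roles of R and Z in the same formula gives the completeness c. The V-measure is then the weighted harmonic mean of h and c, where β sets the weight of c relative to h (β = 1 gives the ordinary harmonic mean):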

$$ V_{\beta} = \frac{(1+\beta)hc}{(\beta h) + c} $$
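Putting the pieces together, the whole calculation can be sketched in base R. The names `v_measure`, `one_sided`, and `entropy` are hypothetical and this is not the code behind `itm()`; the sketch assumes both vectors contain at least two distinct labels and that h and c are not both zero, since either case leads to a division by zero.

```r
# Hypothetical sketch: V-measure for two equal-length label vectors
v_measure <- function(R, Z, beta = 1) {
  stopifnot(length(R) == length(Z))

  # Shannon entropy, treating 0 * log(0) as 0
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # 1 - sum_j (A_j / A) * (S_j^x / S^x): labels x evaluated within groups y
  one_sided <- function(x, y) {
    a <- table(x, y)
    A <- sum(a)
    S_x <- entropy(rowSums(a) / A)
    S_within <- apply(a, 2, function(col) entropy(col / sum(col)))
    1 - sum((colSums(a) / A) * (S_within / S_x))
  }

  h  <- one_sided(R, Z)  # homogeneity
  cc <- one_sided(Z, R)  # completeness: same formula with R and Z swapped

  (1 + beta) * h * cc / (beta * h + cc)
}

# Identical labelings agree perfectly, so V is 1
v_measure(c(1, 1, 2, 2), c(1, 1, 2, 2))

# Each zone lies in one region (h = 1), but each region is split
# into two zones (c = 0.5), so V_1 = 2 * 1 * 0.5 / (1 + 0.5) = 2/3
v_measure(c(1, 1, 2, 2), c(1, 2, 3, 4))
```

Computing c by calling the same helper with the arguments swapped mirrors the definition in the text and avoids duplicating the entropy logic.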

2. Example

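```r
# Install the packages (only needed once)
install.packages("itmsa", dependencies = TRUE)
install.packages("gdverse", dependencies = TRUE)

library(itmsa)

# Load the NTDs example data shipped with gdverse
ntds = gdverse::NTDs

# Discretize the continuous incidence variable into 5 classes
ntds$incidence = sdsfun::discretize_vector(ntds$incidence, 5)

# V-measure (method = "vm") of incidence against each explanatory variable
itm(incidence ~ watershed + elevation + soiltype,
    data = ntds, method = "vm")
## # A tibble: 3 × 3
##   Variable     Iv    Pv
##   <chr>     <dbl> <dbl>
## 1 watershed 0.373     0
## 2 elevation 0.365     0
## 3 soiltype  0.213     0
```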