| Title: | Information-Theoretic Measures for Revealing Variable Interactions |
|---|---|
| Description: | Implements information-theoretic measures to explore variable interactions, including KSG mutual information estimation for continuous variables from Kraskov et al. (2004) <doi:10.1103/PhysRevE.69.066138>, knockoff conditional mutual information described in Zhang & Chen (2025) <doi:10.1126/sciadv.adu6464>, synergistic-unique-redundant decomposition introduced by Martinez-Sanchez et al. (2024) <doi:10.1038/s41467-024-53373-4>, allowing detection of complex and diverse relationships among variables. |
| Authors: | Wenbo Lyu [aut, cre, cph] (ORCID: <https://orcid.org/0009-0002-6003-3800>) |
| Maintainer: | Wenbo Lyu <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3 |
| Built: | 2026-05-26 14:57:26 UTC |
| Source: | https://github.com/stscl/infoxtr |
Estimate the conditional entropy of target variables given conditioning variables.
ce(data, target, conds, base = exp(1), type = c("cont", "disc"), k = 3)
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
conds |
Integer vector of column indices for the conditioning variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method:
|
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when |
A numerical value.
infoxtr::ce(matrix(1:100,ncol=2),1,2)
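For discrete data, conditional entropy can be illustrated with the chain rule H(X|Y) = H(X,Y) - H(Y). The sketch below uses only base R; `disc_entropy` and `cond_entropy` are hypothetical helpers written for this illustration and are not part of infoxtr, whose estimator may differ in detail.

```r
# Plug-in entropy of one or more discrete vectors (hypothetical helper).
disc_entropy <- function(..., base = 2) {
  counts <- as.numeric(table(...))          # joint category counts
  p <- counts[counts > 0] / sum(counts)     # empirical probabilities
  -sum(p * log(p, base = base))
}

# Chain rule: H(X | Y) = H(X, Y) - H(Y)
cond_entropy <- function(x, y, base = 2) {
  disc_entropy(x, y, base = base) - disc_entropy(y, base = base)
}

x <- c(0, 0, 1, 1, 0, 0, 1, 1)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

ce_indep <- cond_entropy(x, y)  # y says nothing about x: 1 bit remains
ce_copy  <- cond_entropy(x, x)  # conditioning on x itself: 0 bits remain
print(c(ce_indep, ce_copy))
```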
Estimate the conditional mutual information between target and interacting variables given conditioning variables.
cmi(
  data,
  target,
  interact,
  conds,
  base = exp(1),
  type = c("cont", "disc"),
  k = 3,
  normalize = FALSE
)
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
interact |
Integer vector of column indices for the interacting variables. |
conds |
Integer vector of column indices for the conditioning variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method:
|
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when |
normalize |
(optional) Logical; if |
A numerical value.
set.seed(42)
infoxtr::cmi(matrix(stats::rnorm(99,1,10),ncol=3),1,2,3)
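Conditional mutual information can reveal dependence that is invisible without conditioning. The base R sketch below uses the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) on an XOR relation; `disc_entropy` is a hypothetical helper for discrete data and does not reproduce the package's estimator.

```r
# Plug-in entropy of one or more discrete vectors (hypothetical helper).
disc_entropy <- function(..., base = 2) {
  counts <- as.numeric(table(...))
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p, base = base))
}

# All four combinations of two independent bits, and their XOR.
x <- c(0, 0, 1, 1)
y <- c(0, 1, 0, 1)
z <- as.integer(xor(x, y))

# Unconditional mutual information: I(X;Y) = H(X) + H(Y) - H(X,Y)
mi_xy <- disc_entropy(x) + disc_entropy(y) - disc_entropy(x, y)

# Conditional mutual information given Z
cmi_xy_z <- disc_entropy(x, z) + disc_entropy(y, z) -
  disc_entropy(x, y, z) - disc_entropy(z)

print(c(mi_xy, cmi_xy_z))  # 0 bits without Z, 1 bit once Z is known
```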
Discretize a numeric vector into categorical classes using several
commonly used discretization methods. Missing values (NA/NaN)
are ignored and returned as class 0.
discretize(
  x,
  n = 5,
  method = "natural",
  large = 3000,
  prop = 0.15,
  seed = 42,
  thr = 0.4,
  iter = 100,
  bps = NULL,
  right_closed = TRUE
)
x |
A vector. |
n |
(optional) Number of classes. |
method |
(optional) Discretization method. One of
|
large |
(optional) Threshold sample size for natural breaks sampling. |
prop |
(optional) Sampling proportion used when |
seed |
(optional) Random seed used for sampling in natural breaks. |
thr |
(optional) Threshold used in the head/tail breaks algorithm. |
iter |
(optional) Maximum number of iterations for head/tail breaks. |
bps |
(optional) Numeric vector of manual breakpoints used when
|
right_closed |
(optional) Logical. If |
A discretized integer vector.
If x is not numeric, it will be converted to
integer categories via as.factor().
set.seed(42)
infoxtr::discretize(stats::rnorm(99,1,10))
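As a rough illustration of the class-0 convention for missing values, the sketch below performs equal-width binning in base R. `equal_width` is a hypothetical helper; the bin edges it produces are an assumption and need not match what `discretize()` computes for any of its methods.

```r
# Equal-width binning with missing values mapped to class 0
# (hypothetical helper; assumes x has at least two distinct finite values).
equal_width <- function(x, n = 5, right_closed = TRUE) {
  breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  cls <- cut(x, breaks = breaks, labels = FALSE,
             include.lowest = TRUE, right = right_closed)
  cls[is.na(cls)] <- 0L   # NA/NaN inputs become class 0
  as.integer(cls)
}

x <- c(1, 2, 3, NA, 10)
cls <- equal_width(x, n = 3)  # breaks at 1, 4, 7, 10
print(cls)                    # 1 1 1 0 3
```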
Estimate the entropy of a vector using either category counts (for discrete data) or a k-nearest neighbor estimator (for continuous data).
entropy(vec, base = exp(1), type = c("cont", "disc"), k = 3)
vec |
A vector. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method:
|
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when |
A numerical value.
set.seed(42)
infoxtr::entropy(stats::rnorm(100), type = "cont")
infoxtr::entropy(sample(letters[1:5], 100, TRUE), base = 2, type = "disc")
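The category-count approach for discrete data can be sketched in base R as a plug-in estimate, H = -sum(p * log(p)), with p taken from observed frequencies. `plugin_entropy` is a hypothetical name used only here; whether infoxtr applies any bias correction is not stated in this manual.

```r
# Plug-in entropy from category counts (hypothetical helper).
plugin_entropy <- function(x, base = exp(1)) {
  counts <- as.numeric(table(x))   # table() drops NA by default
  p <- counts / sum(counts)        # empirical category probabilities
  -sum(p * log(p, base = base))
}

# Four equally frequent categories carry log2(4) = 2 bits.
h <- plugin_entropy(rep(c("a", "b", "c", "d"), each = 25), base = 2)
print(h)
```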
Estimate the joint entropy of selected variables.
je(data, indices, base = exp(1), type = c("cont", "disc"), k = 3)
data |
Observation data. |
indices |
Integer vector of column indices to include in joint entropy calculation. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method:
|
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when |
A numerical value.
infoxtr::je(matrix(1:100,ncol=2),1:2)
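For discrete variables, joint entropy is the entropy of the cross-tabulated categories, and it equals the sum of the marginal entropies when the variables are independent. A minimal base R sketch, with `joint_entropy` as a hypothetical helper:

```r
# Joint plug-in entropy from a contingency table (hypothetical helper).
joint_entropy <- function(..., base = 2) {
  counts <- as.numeric(table(...))        # one count per joint category
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p, base = base))
}

# A full grid makes x (2 levels) and y (3 levels) exactly independent.
g <- expand.grid(x = 0:1, y = 0:2)

h_xy <- joint_entropy(g$x, g$y)   # log2(6) bits
h_x  <- joint_entropy(g$x)        # 1 bit
h_y  <- joint_entropy(g$y)        # log2(3) bits
print(c(h_xy, h_x + h_y))
```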
Knockoff Conditional Mutual Information
kocmi(
  data,
  target,
  agent,
  conds,
  knockoff,
  null_knockoff = NULL,
  type = c("cont", "disc"),
  nboots = 10000,
  k = 3,
  threads = 1,
  seed = 42,
  base = exp(1),
  method = "equal",
  contain_null = TRUE
)
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
agent |
Integer vector of column indices for the source (agent) variables. |
conds |
Integer vector of column indices for the conditioning variables. |
knockoff |
Knockoff realizations constructed for the |
null_knockoff |
(optional) Knockoff realizations generated under the
null setting where all variables are jointly used to construct knockoffs.
Each column represents one Monte Carlo sample. If |
type |
(optional) Estimation method: |
nboots |
(optional) Number of permutations used in the sign-flipping permutation test for evaluating the significance of the mean information difference. |
k |
(optional) For |
threads |
(optional) Number of threads used. |
seed |
(optional) Random seed used for permutation test. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
method |
(optional) Discretization method. One of
|
contain_null |
(optional) Logical. If |
A named numeric vector.
kocmi only supports numeric data.
Zhang, X., Chen, L., 2025. Quantifying interventional causality by knockoff operation. Science Advances 11.
set.seed(42)
kn1 = replicate(50, stats::rnorm(100))
kn2 = replicate(50, stats::rnorm(100))
mat = replicate(3, stats::rnorm(100))
infoxtr::kocmi(mat, 1, 2, 3, kn1, kn2)
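The `nboots` argument refers to a sign-flipping permutation test on the mean information difference. The general idea can be sketched in base R as follows. `sign_flip_test` is a hypothetical helper, and the one-sided alternative and the (1 + count) / (nboots + 1) p-value convention are assumptions of this sketch, not documented behaviour of `kocmi()`.

```r
# Sign-flipping permutation test for a positive mean (hypothetical helper).
sign_flip_test <- function(d, nboots = 10000, seed = 42) {
  set.seed(seed)
  observed <- mean(d)
  # Each row holds one random sign assignment for all differences.
  signs <- matrix(sample(c(-1, 1), nboots * length(d), replace = TRUE),
                  nrow = nboots)
  null_means <- as.numeric(signs %*% d) / length(d)
  # One-sided p-value with the usual +1 correction (an assumption here).
  p_value <- (1 + sum(null_means >= observed)) / (nboots + 1)
  c(mean_diff = observed, p_value = p_value)
}

# Differences that are all positive should give a small p-value.
d <- seq(0.1, 1, length.out = 20)
res <- sign_flip_test(d, nboots = 2000)
print(res)
```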
Estimate the mutual information between target and interacting variables.
mi(
  data,
  target,
  interact,
  base = exp(1),
  type = c("cont", "disc"),
  k = 3,
  normalize = FALSE
)
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
interact |
Integer vector of column indices for the interacting variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method:
|
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when |
normalize |
(optional) Logical; if |
A numerical value.
infoxtr::mi(matrix(1:100,ncol=2),1,2)
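For two discrete variables, mutual information can be computed directly from the contingency table as the sum of p(x,y) * log(p(x,y) / (p(x) p(y))). The base R sketch below shows this; `disc_mi` is a hypothetical helper and does not cover the package's continuous estimator or its `normalize` option.

```r
# Mutual information from a two-way contingency table (hypothetical helper).
disc_mi <- function(x, y, base = 2) {
  counts <- unclass(table(x, y))
  pxy <- counts / sum(counts)              # joint probabilities
  expected <- outer(rowSums(pxy), colSums(pxy))  # product of marginals
  nz <- pxy > 0                            # skip empty cells (0 * log 0 = 0)
  sum(pxy[nz] * log(pxy[nz] / expected[nz], base = base))
}

x <- rep(0:1, each = 2)
y <- rep(0:1, times = 2)

mi_indep <- disc_mi(x, y)  # independent bits share 0 bits
mi_copy  <- disc_mi(x, x)  # a variable shares H(X) = 1 bit with itself
print(c(mi_indep, mi_copy))
```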
Synergistic-Unique-Redundant Decomposition
## S4 method for signature 'data.frame'
surd(data, target, agent, lag = 1, bin = 5, method = "equal",
     max.order = 10, threads = 1, base = 2, normalize = TRUE)

## S4 method for signature 'sf'
surd(data, target, agent, lag = 1, bin = 5, method = "equal",
     max.order = 10, threads = 1, base = 2, normalize = TRUE, nb = NULL)

## S4 method for signature 'SpatRaster'
surd(data, target, agent, lag = 1, bin = 5, method = "equal",
     max.order = 10, threads = 1, base = 2, normalize = TRUE)
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
agent |
Integer vector of column indices for the source (agent) variables. |
lag |
(optional) Lag of the agent variables. |
bin |
(optional) Number of discretization bins. |
method |
(optional) Discretization method. One of
|
max.order |
(optional) Maximum combination order. |
threads |
(optional) Number of threads used. |
base |
(optional) Logarithm base of the entropy.
Defaults to 2. |
normalize |
(optional) Logical; if |
nb |
(optional) Neighbours list. |
A list.
Character vector indicating the variable combination associated with each information component.
Character vector indicating the information type of each component.
Numeric vector giving the magnitude of each information component.
surd only supports numeric input data. Both bin and method
support variable-specific settings using R-style recycling:
length 1: applied to the target and all agent variables
length 2: first for the target, second for all agents
length > 2: first for the target, remaining values are recycled across agents
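The recycling rule above can be written out as a small base R function. `expand_setting` is a hypothetical helper that only mirrors the three cases as described; how `surd()` implements them internally is not shown in this manual.

```r
# Expand a `bin` or `method` setting to the target and each agent
# (hypothetical helper following the three documented cases).
expand_setting <- function(value, n_agents) {
  if (length(value) == 1) {
    # length 1: same setting for the target and every agent
    list(target = value, agents = rep(value, n_agents))
  } else {
    # length 2 or more: first for the target, the rest recycled over agents
    list(target = value[1], agents = rep_len(value[-1], n_agents))
  }
}

s1 <- expand_setting(5, 2)           # target 5, agents 5 5
s2 <- expand_setting(c(5, 3), 2)     # target 5, agents 3 3
s3 <- expand_setting(c(5, 3, 4), 3)  # target 5, agents 3 4 3
str(s3)
```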
Martinez-Sanchez, A., Arranz, G., Lozano-Duran, A., 2024. Decomposing causality into its synergistic, unique, and redundant components. Nature Communications 15.
columbus = sf::read_sf(system.file("case/columbus.gpkg", package="spEDM"))
infoxtr::surd(columbus, 1, 2:3)
Estimate the transfer entropy from agent variables to target variables.
te(
  data,
  target,
  agent,
  lag_p = 3,
  lag_q = 3,
  base = exp(1),
  type = c("cont", "disc"),
  k = 3,
  normalize = FALSE,
  lag_single = FALSE
)
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
agent |
Integer vector of column indices for the source (agent) variables. |
lag_p |
(optional) Lag of the target variables. |
lag_q |
(optional) Lag of the agent variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method:
|
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when |
normalize |
(optional) Logical; if |
lag_single |
(optional) Logical; if |
A numerical value.
Schreiber, T., 2000. Measuring Information Transfer. Physical Review Letters 85, 461–464.
set.seed(42)
infoxtr::te(matrix(stats::rnorm(100,1,10),ncol=2),1,2)
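Transfer entropy in the sense of Schreiber (2000) is a conditional mutual information between the target's present and the agent's past, given the target's own past. The base R sketch below uses discrete series and a single one-step lag for both histories, which is a simplification and not necessarily how `lag_p`, `lag_q`, or `lag_single` are handled by `te()`. `disc_entropy` and `transfer_entropy_lag1` are hypothetical helpers.

```r
# Plug-in entropy of one or more discrete vectors (hypothetical helper).
disc_entropy <- function(..., base = 2) {
  counts <- as.numeric(table(...))
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p, base = base))
}

# TE(x -> y) = I(y_t ; x_{t-1} | y_{t-1}) with one-step histories.
transfer_entropy_lag1 <- function(x, y, base = 2) {
  n <- length(y)
  y_now <- y[-1]; y_past <- y[-n]; x_past <- x[-n]
  disc_entropy(y_now, y_past, base = base) +
    disc_entropy(y_past, x_past, base = base) -
    disc_entropy(y_now, y_past, x_past, base = base) -
    disc_entropy(y_past, base = base)
}

set.seed(42)
x <- sample(0:1, 2000, replace = TRUE)  # random driver
y <- c(0L, x[-length(x)])               # y copies x with a one-step delay

te_xy <- transfer_entropy_lag1(x, y)  # close to 1 bit
te_yx <- transfer_entropy_lag1(y, x)  # close to 0 bits
print(c(te_xy, te_yx))
```

The asymmetry between the two directions is what distinguishes transfer entropy from mutual information, which is symmetric in its arguments.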