Title: | Spatial Data Science Complementary Features |
---|---|
Description: | Wrapping and supplementing commonly used functions in the R ecosystem related to spatial data science, while serving as a basis for other packages maintained by Wenbo Lv. |
Authors: | Wenbo Lv [aut, cre, cph] |
Maintainer: | Wenbo Lv <[email protected]> |
License: | GPL-3 |
Version: | 0.5.0 |
Built: | 2024-11-19 07:21:41 UTC |
Source: | https://github.com/stscl/sdsfun |
check for NA values in a tibble
check_tbl_na(tbl)
Argument | Description |
---|---|
tbl | A tibble. |
A logical value.
demotbl = tibble::tibble(x = c(1,2,3,NA,1), y = c(NA,NA,1:3), z = 1:5)
demotbl
check_tbl_na(demotbl)
discretization
discretize_vector( x, n, method = "natural", breakpoint = NULL, sampleprob = 0.15, seed = 123456789 )
Argument | Description |
---|---|
x | A continuous numeric vector. |
n | (optional) The number of discretized classes. |
method | (optional) The method of discretization. Default is "natural". |
breakpoint | (optional) Break points for manually splitting data. When |
sampleprob | (optional) When the data size exceeds |
seed | (optional) Random seed number. Default is 123456789. |
A discretized integer vector
xvar = c(22361, 9573, 4836, 5309, 10384, 4359, 11016, 4414, 3327, 3408, 17816, 6909,
         6936, 7990, 3758, 3569, 21965, 3605, 2181, 1892, 2459, 2934, 6399, 8578,
         8537, 4840, 12132, 3734, 4372, 9073, 7508, 5203)
discretize_vector(xvar, n = 5, method = 'natural')
transforming a categorical tibble into the corresponding dummy variable tibble
dummy_tbl(tbl)
Argument | Description |
---|---|
tbl | A tibble. |
A tibble
a = tibble::tibble(x = 1:3,y = 4:6)
dummy_tbl(a)
transforming a categorical variable into dummy variables
dummy_vec(x)
Argument | Description |
---|---|
x | An integer vector, or a vector that can be converted into an integer vector. |
A matrix.
dummy_vec(c(1,1,3,2,4,6))
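As a rough illustration of dummy (indicator) encoding, here is a minimal base-R sketch. The helper dummy_vec_sketch() is hypothetical and not part of sdsfun; it keeps one 0/1 column per distinct value, so its column layout (for example, whether a reference category is dropped) may differ from the matrix that dummy_vec() actually returns.

```r
# Hypothetical helper, not part of sdsfun: one 0/1 column per distinct value.
dummy_vec_sketch <- function(x) {
  x <- as.integer(x)                          # coerce, as the x argument allows
  categories <- sort(unique(x))               # distinct categories, ascending
  m <- outer(x, categories, FUN = "==") * 1L  # TRUE/FALSE -> 1L/0L
  colnames(m) <- categories
  m
}

dummy_vec_sketch(c(1, 1, 3, 2, 4, 6))
```

Each row contains exactly one 1, in the column of its category; the input above has five distinct values, so the sketch returns a 6 x 5 matrix.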
get variable names in a formula and data
formula_varname(formula, data)
Argument | Description |
---|---|
formula | A formula. |
data | A |
A list.
yname
Dependent variable name
xname
Independent variable names
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
formula_varname(PS_Score ~ EL_Score + OH_Score, gzma)
formula_varname(PS_Score ~ ., gzma)
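The returned names can be approximated in base R with all.vars(), which expands a "." on the right-hand side to every other column of data. The helper formula_varname_sketch() below is hypothetical. It works on a plain data.frame and, unlike an sf-aware implementation, would not exclude a geometry column when expanding ".".

```r
# Hypothetical helper, not part of sdsfun.
formula_varname_sketch <- function(formula, data) {
  vars <- all.vars(formula)   # e.g. c("y", "a") or c("y", ".")
  yname <- vars[1]            # left-hand side: dependent variable
  xname <- vars[-1]           # right-hand side: independent variables
  if ("." %in% xname) {
    # "." stands for all remaining columns of data
    xname <- setdiff(names(data), yname)
  }
  list(yname = yname, xname = xname)
}

df <- data.frame(y = 1:3, a = 4:6, b = 7:9)
formula_varname_sketch(y ~ a, df)
formula_varname_sketch(y ~ ., df)
```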
spatial fuzzy overlay
fuzzyoverlay(formula, data, method = "and")
Argument | Description |
---|---|
formula | A formula of spatial fuzzy overlay. |
data | A data.frame or tibble of discretized data. |
method | (optional) Overlay methods. When |
A numeric vector.
Independent variables in the data provided to fuzzyoverlay() must be discretized variables, and the dependent variable must be a continuous variable.
sim = tibble::tibble(y = stats::runif(7,0,10),
                     x1 = c(1,rep(2,3),rep(3,3)),
                     x2 = c(rep(1,2),rep(2,2),rep(3,3)))
fo1 = fuzzyoverlay(y~x1+x2,data = sim, method = 'and')
fo1
fo2 = fuzzyoverlay(y~x1+x2,data = sim, method = 'or')
fo2
generate subsets of a set
generate_subsets(set, empty = TRUE, self = TRUE)
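Argument | Description |
---|---|
set | A vector. |
empty | (optional) When TRUE, the empty set is included in the result. Default is TRUE. |
self | (optional) When TRUE, the set itself is included in the result. Default is TRUE. |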
A list.
generate_subsets(letters[1:3])
generate_subsets(letters[1:3],empty = FALSE)
generate_subsets(letters[1:3],self = FALSE)
generate_subsets(letters[1:3],empty = FALSE,self = FALSE)
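A set of n elements has 2^n subsets; dropping the empty set and/or the set itself removes one subset each. The base-R sketch below enumerates subsets by size with combn(). generate_subsets_sketch() is a hypothetical helper, and the order of the returned list may differ from generate_subsets().

```r
# Hypothetical helper, not part of sdsfun: enumerate subsets by size with combn().
generate_subsets_sketch <- function(set, empty = TRUE, self = TRUE) {
  n <- length(set)
  sizes <- 0:n
  if (!empty) sizes <- setdiff(sizes, 0)  # drop the empty set
  if (!self)  sizes <- setdiff(sizes, n)  # drop the full set
  out <- list()
  for (k in sizes) {
    if (k == 0) {
      out <- c(out, list(set[0]))         # zero-length vector of the same type
    } else {
      # index by position so a numeric scalar set is not expanded by combn()
      out <- c(out, combn(seq_len(n), k, FUN = function(i) set[i], simplify = FALSE))
    }
  }
  out
}

generate_subsets_sketch(letters[1:3], empty = FALSE, self = FALSE)
```

For letters[1:3] the four flag combinations yield 8, 7, 7 and 6 subsets respectively.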
only geodetector q-value
geodetector_q(y, hs)
Argument | Description |
---|---|
y | Dependent variable |
hs | Independent variable |
A numeric value
geodetector_q(y = 1:7, hs = c('x',rep('y',3),rep('z',3)))
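The geodetector q-statistic is commonly defined as $q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2}$, where $N_h$ and $\sigma_h^2$ are the size and variance of stratum h, and $N$ and $\sigma^2$ are those of the whole sample. A minimal base-R sketch of that definition follows. geodetector_q_sketch() is a hypothetical helper, not the package's C++ implementation.

```r
# Hypothetical helper, not part of sdsfun.
geodetector_q_sketch <- function(y, hs) {
  sst <- sum((y - mean(y))^2)                                    # N * sigma^2
  ssw <- sum(tapply(y, hs, function(v) sum((v - mean(v))^2)))    # sum of N_h * sigma_h^2
  1 - ssw / sst
}

geodetector_q_sketch(y = 1:7, hs = c('x', rep('y', 3), rep('z', 3)))
```

For this input the total sum of squares is 28 and the within-stratum sum is 0 + 2 + 2 = 4, so the sketch gives q = 1 - 4/28 = 6/7 (about 0.857).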
hierarchical clustering with spatial soft constraints
hclustgeo_disc(data, n, alpha = 0.5, D1 = NULL, scale = TRUE, wt = NULL, ...)
Argument | Description |
---|---|
data | An |
n | The number of hierarchical clustering classes, which can be a numeric value or vector. |
alpha | (optional) A positive value between |
D1 | (optional) A |
scale | (optional) Whether to scale the dissimilarity matrix. Default is TRUE. |
wt | (optional) Vector with the weights of the observations. By default, |
... | (optional) Other arguments passed to |
A vector of group memberships if n is a scalar; otherwise, a matrix of group memberships in which each column corresponds to the respective element of n.
This is a C++-enhanced implementation of the hclustgeo function in the ClustGeo package.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
gzma$group = hclustgeo_disc(gzma,5,alpha = 0.75)
plot(gzma["group"])
Function for constructing inverse distance weights.
inverse_distance_swm(sfj, power = 1, bandwidth = NULL)
Argument | Description |
---|---|
sfj | Vector object that can be converted to |
power | (optional) Default is 1. Set to 2 for gravity weights. |
bandwidth | (optional) When the distance is greater than bandwidth, the corresponding part of the weight matrix is set to 0. Default is NULL. |
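The inverse distance weight formula is $w_{ij} = 1/d_{ij}^{\alpha}$, where $d_{ij}$ is the distance between locations i and j and $\alpha$ is the power argument.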
An inverse distance weight matrix of class matrix.
library(sf)
pts = read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
wt = inverse_distance_swm(pts)
wt[1:5,1:5]
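The weighting rule $w_{ij} = 1/d_{ij}^{\alpha}$ with a bandwidth cut-off can be sketched on a plain coordinate matrix as follows. idw_matrix_sketch() is hypothetical. It assumes planar (Euclidean) coordinates and a zero diagonal; inverse_distance_swm() works on sf objects and may compute distances differently, for example geodesically for longitude/latitude data. Duplicate points (zero off-diagonal distance) would produce Inf and are not handled.

```r
# Hypothetical helper, not part of sdsfun: planar (Euclidean) distances only.
idw_matrix_sketch <- function(coords, power = 1, bandwidth = NULL) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances
  w <- 1 / d^power               # w_ij = 1 / d_ij^power
  diag(w) <- 0                   # no self-weight (1/0 = Inf on the diagonal)
  if (!is.null(bandwidth)) {
    w[d > bandwidth] <- 0        # cut off pairs farther apart than bandwidth
  }
  w
}

coords <- matrix(c(0, 0,  3, 4,  6, 8), ncol = 2, byrow = TRUE)
idw_matrix_sketch(coords)
idw_matrix_sketch(coords, power = 2, bandwidth = 6)
```

Points 1 and 2 are 5 units apart and points 1 and 3 are 10 apart, so with power = 1 the weights are 0.2 and 0.1. With power = 2 and bandwidth = 6, the 1 to 3 weight drops to 0.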
Function for determining the optimal spatial data discretization for individual variables based on a locally estimated scatterplot smoothing (LOESS) model.
loess_optnum(qvec, discnumvec, increase_rate = 0.05)
Argument | Description |
---|---|
qvec | A numeric vector of q statistics. |
discnumvec | A numeric vector of break numbers corresponding to qvec. |
increase_rate | (optional) The critical increase rate of the number of discretization. Default is 0.05. |
A two-element numeric vector:
discnum
optimal number of spatial data discretization
increase_rate
the critical increase rate of the number of discretization
When the calculation does not satisfy increase_rate, the discretization number corresponding to the highest q statistic is returned.
Note that sdsfun sorts discnumvec from smallest to largest and keeps qvec in one-to-one correspondence with discnumvec.
qv = c(0.26045642,0.64120405,0.43938704,0.95165535,0.46347836,
       0.25385338,0.78778726,0.95938330,0.83247885,0.09285196)
loess_optnum(qv,3:12)
Spatial autocorrelation test based on global Moran's I.
moran_test(sfj, wt = NULL, alternative = "greater", symmetrize = FALSE)
Argument | Description |
---|---|
sfj | An sf object. |
wt | (optional) Spatial weight matrix. Must be a |
alternative | (optional) Specification of alternative hypothesis as |
symmetrize | (optional) Whether to symmetrize an asymmetric spatial weight matrix wt as 1/2 * (wt + wt'). Default is FALSE. |
A list of class moran_test, with the results stored in its result tibble, which contains the following information for each variable:
MoranI
observed value of the Moran coefficient
EI
expected value of Moran's I
VarI
variance of Moran's I (under normality)
ZI
standardized Moran coefficient
PI
p-value of the test statistic
This is a C++ implementation of the MI.vec function in the spfilteR package, with embellished console output.
The return value of this function is actually a list; access the result tibble with $result.
Non-numeric attribute columns in sfj are ignored.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
moran_test(gzma)
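The quantities in the result tibble follow the textbook global Moran's I test: $I = \frac{n}{S_0}\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = x_i - \bar{x}$, $E(I) = -1/(n-1)$, and the variance under the normality assumption. The sketch below computes them for a single numeric vector and a dense weight matrix. moran_sketch() is hypothetical; it reports only the one-sided p-value for alternative = "greater", and details such as symmetrization or multi-column handling in moran_test() are not reproduced.

```r
# Hypothetical helper, not part of sdsfun: global Moran's I for one numeric
# vector x and a dense n x n spatial weight matrix w.
moran_sketch <- function(x, w) {
  n <- length(x)
  z <- x - mean(x)                                # deviations from the mean
  s0 <- sum(w)                                    # S0: sum of all weights
  moran_i <- (n / s0) * sum(w * outer(z, z)) / sum(z^2)
  ei <- -1 / (n - 1)                              # E(I) under no autocorrelation
  s1 <- 0.5 * sum((w + t(w))^2)                   # S1
  s2 <- sum((rowSums(w) + colSums(w))^2)          # S2
  # variance of I under the normality assumption
  var_i <- (n^2 * s1 - n * s2 + 3 * s0^2) / ((n^2 - 1) * s0^2) - ei^2
  z_i <- (moran_i - ei) / sqrt(var_i)             # standardized statistic
  p_i <- pnorm(z_i, lower.tail = FALSE)           # one-sided, alternative = "greater"
  c(MoranI = moran_i, EI = ei, VarI = var_i, ZI = z_i, PI = p_i)
}

# Four locations on a line, binary contiguity weights
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)
res <- moran_sketch(1:4, w)
res
```

For this toy input the sketch returns I = 1/3 and E(I) = -1/3.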
normalization
normalize_vector(x, to_left = 0, to_right = 1)
Argument | Description |
---|---|
x | A continuous numeric vector. |
to_left | (optional) Specified minimum. Default is 0. |
to_right | (optional) Specified maximum. Default is 1. |
A normalized continuous vector.
normalize_vector(c(-5,1,5,0.01,0.99))
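Normalizing to a specified minimum a and maximum b is usually the linear min-max mapping $x' = a + \frac{x - \min(x)}{\max(x) - \min(x)}(b - a)$. The sketch below assumes that mapping. normalize_sketch() is hypothetical, does not handle NA values, and returns NaN for a constant vector.

```r
# Hypothetical helper, not part of sdsfun: linear min-max rescaling.
normalize_sketch <- function(x, to_left = 0, to_right = 1) {
  rng <- range(x)   # c(min(x), max(x))
  (x - rng[1]) / (rng[2] - rng[1]) * (to_right - to_left) + to_left
}

normalize_sketch(c(-5, 1, 5, 0.01, 0.99))
```

Here the range is 10, so -5 maps to 0, 5 maps to 1, and 1 maps to 0.6.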
Extract locations of sf objects.
sf_coordinates(sfj)
Argument | Description |
---|---|
sfj | An sf object. |
A matrix.
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
sf_coordinates(pts)
Generates a distance matrix for an sf object
sf_distance_matrix(sfj)
Argument | Description |
---|---|
sfj | An sf object. |
A matrix.
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
pts_distm = sf_distance_matrix(pts)
pts_distm[1:5,1:5]
Get the geometry column name of an sf object
sf_geometry_name(sfj)
Argument | Description |
---|---|
sfj | An sf object. |
A character.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_geometry_name(gzma)
Get the geometry type of an sf object
sf_geometry_type(sfj)
Argument | Description |
---|---|
sfj | An sf object. |
A lowercase character vector
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_geometry_type(gzma)
Generates the EPSG code, as a character string, of the UTM projection under the WGS84 spatial reference that corresponds to an sf object.
sf_utm_proj_wgs84(sfj)
Argument | Description |
---|---|
sfj | An sf object. |
A character.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_utm_proj_wgs84(gzma)
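For a representative longitude/latitude, the standard 6-degree UTM zone is floor((lon + 180) / 6) + 1, and the WGS84 / UTM EPSG code is 32600 + zone in the northern hemisphere or 32700 + zone in the southern. The sketch below shows only that arithmetic. utm_epsg_sketch() is hypothetical: how sf_utm_proj_wgs84() derives a point from sfj, and the exact format of its character result, are not shown here, and the "EPSG:" prefix is an assumption. The Norway/Svalbard zone exceptions are ignored.

```r
# Hypothetical helper, not part of sdsfun: standard 6-degree UTM zones only
# (ignores the Norway/Svalbard exceptions).
utm_epsg_sketch <- function(lon, lat) {
  zone <- floor((lon + 180) / 6) + 1
  zone <- min(max(zone, 1), 60)                          # keep lon = 180 in zone 60
  code <- if (lat >= 0) 32600 + zone else 32700 + zone   # WGS84 / UTM north or south
  paste0("EPSG:", code)
}

utm_epsg_sketch(113.3, 23.1)
```

A point at 113.3 E, 23.1 N falls in zone 49 north, so the sketch returns "EPSG:32649".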
Generates a Voronoi diagram (Thiessen polygons) for an sf object
sf_voronoi_diagram(sfj)
Argument | Description |
---|---|
sfj | An sf object. |
An sf object of polygon geometry type, or an object that can be converted to one by sf::st_as_sf().
Only sf objects of (multi-)point geometry type are supported for generating a Voronoi diagram, and the returned result includes only the geometry column.
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
pts_v = sf_voronoi_diagram(pts)
library(ggplot2)
ggplot() +
  geom_sf(data = pts_v, color = 'red', fill = 'transparent') +
  geom_sf(data = pts, color = 'blue', size = 1.25) +
  theme_void()
Constructs spatial weight matrices based on contiguity via the spdep package.
spdep_contiguity_swm( sfj, queen = TRUE, k = NULL, order = 1L, cumulate = TRUE, style = "W", zero.policy = TRUE )
Argument | Description |
---|---|
sfj | An sf object. |
queen | (optional) if |
k | (optional) The number of nearest neighbours. Ignore this parameter when not using distance-based neighbours to construct spatial weight matrices. |
order | (optional) The order of the adjacency object. Default is 1. |
cumulate | (optional) Whether to accumulate adjacency objects. Default is TRUE. |
style | (optional) |
zero.policy | (optional) if |
A matrix
When k is set to a positive value, k-nearest neighbour weights are used.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = spdep_contiguity_swm(gzma, k = 6, style = 'B')
wt2 = spdep_contiguity_swm(gzma, queen = TRUE, style = 'B')
wt3 = spdep_contiguity_swm(gzma, queen = FALSE, order = 2, style = 'B')
Constructs spatial weight matrices based on distance via the spdep package.
spdep_distance_swm( sfj, kernel = NULL, k = NULL, bandwidth = NULL, power = 1, style = "W", zero.policy = TRUE )
Argument | Description |
---|---|
sfj | An sf object. |
kernel | (optional) The kernel function, can be one of |
k | (optional) The number of nearest neighbours. Default is NULL. |
bandwidth | (optional) The bandwidth. Default is NULL. |
power | (optional) Default is 1. |
style | (optional) |
zero.policy | (optional) if |
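Five different kernel weight functions:
uniform: $K(z) = \frac{1}{2}$, for $|z| < 1$
triangular: $K(z) = 1 - |z|$, for $|z| < 1$
quadratic (epanechnikov): $K(z) = \frac{3}{4}\left(1 - z^2\right)$, for $|z| < 1$
quartic: $K(z) = \frac{15}{16}\left(1 - z^2\right)^2$, for $|z| < 1$
gaussian: $K(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$
For the equations above, $z = d_{ij}/h$, where $d_{ij}$ is the distance between locations i and j and $h$ is the bandwidth.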
A matrix
When kernel is set, kernel-based distance weights are used; otherwise, inverse distance weights are used.
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
wt1 = spdep_distance_swm(pts, style = 'B')
wt2 = spdep_distance_swm(pts, kernel = 'gaussian')
wt3 = spdep_distance_swm(pts, k = 3, kernel = 'gaussian')
wt4 = spdep_distance_swm(pts, k = 3, kernel = 'gaussian', bandwidth = 10000)
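To make the kernel definitions concrete, the sketch below evaluates each of the five kernels at z = d/h for a vector of distances d. The formulas are restated in the comments. kernel_weight_sketch() and its argument names are hypothetical; how spdep_distance_swm() combines these values with k, bandwidth selection, and style is not shown and may differ.

```r
# Hypothetical helper, not part of sdsfun. z = d / h, with h the bandwidth:
#   uniform     K(z) = 1/2                 for |z| < 1
#   triangular  K(z) = 1 - |z|             for |z| < 1
#   quadratic   K(z) = 3/4 * (1 - z^2)     for |z| < 1  (Epanechnikov)
#   quartic     K(z) = 15/16 * (1 - z^2)^2 for |z| < 1
#   gaussian    K(z) = exp(-z^2 / 2) / sqrt(2 * pi)
kernel_weight_sketch <- function(d, h,
                                 kernel = c("uniform", "triangular", "quadratic",
                                            "quartic", "gaussian")) {
  kernel <- match.arg(kernel)
  z <- d / h
  inside <- abs(z) < 1   # compact kernels are zero outside |z| < 1
  switch(kernel,
    uniform    = ifelse(inside, 1 / 2, 0),
    triangular = ifelse(inside, 1 - abs(z), 0),
    quadratic  = ifelse(inside, 3 / 4 * (1 - z^2), 0),
    quartic    = ifelse(inside, 15 / 16 * (1 - z^2)^2, 0),
    gaussian   = exp(-z^2 / 2) / sqrt(2 * pi)
  )
}

d <- c(0, 0.5, 1, 2)
kernel_weight_sketch(d, h = 1, kernel = "triangular")
```

With h = 1 the triangular weights for these distances are 1, 0.5, 0 and 0; only the gaussian kernel stays positive beyond the bandwidth.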
spatial linear models selection
spdep_lmtest(formula, data, listw = NULL)
Argument | Description |
---|---|
formula | A formula for the linear regression model. |
data | An |
listw | (optional) A listw. See |
A list
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
spdep_lmtest(PS_Score ~ ., gzma)
construct neighbours list
spdep_nb(sfj, queen = TRUE, k = NULL, order = 1L, cumulate = TRUE)
Argument | Description |
---|---|
sfj | An sf object. |
queen | (optional) if |
k | (optional) The number of nearest neighbours. Ignore this parameter when not using distance-based neighbours. |
order | (optional) The order of the adjacency object. Default is 1. |
cumulate | (optional) Whether to accumulate adjacency objects. Default is TRUE. |
A neighbours list with class nb
When k is set to a positive value, k-nearest neighbours are used.
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
nb1 = spdep_nb(pts, k = 6)
nb2 = spdep_nb(pts, queen = TRUE)
nb3 = spdep_nb(pts, queen = FALSE, order = 2)
SKATER forms clusters by spatially partitioning data that has similar values for features of interest.
spdep_skater(sfj, k = 6, nb = NULL, ini = 5, ...)
Argument | Description |
---|---|
sfj | An sf object. |
k | (optional) The number of clusters. Default is 6. |
nb | (optional) A neighbours list with class nb. If the input |
ini | (optional) The initial node in the minimal spanning tree. Default is 5. |
... | (optional) Other parameters passed to spdep::skater(). |
A numeric vector of clusters.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
gzma_c = spdep_skater(gzma,8)
gzma$group = gzma_c
plot(gzma["group"])
spatial variance
spvar(x, wt, method = "cpp")
Argument | Description |
---|---|
x | A numeric vector. |
wt | The spatial weight matrix. |
method | (optional) The method for calculating spatial variance, which can be chosen as either |
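The spatial variance formula is $\Gamma = \frac{\sum_i \sum_{j \neq i} \omega_{ij} \frac{(x_i - x_j)^2}{2}}{\sum_i \sum_{j \neq i} \omega_{ij}}$, where $\omega_{ij}$ is the spatial weight between observations i and j.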
A numerical value.
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = inverse_distance_swm(gzma)
spvar(gzma$PS_Score,wt1)
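The sketch below assumes a weighted half-squared-difference form of spatial variance, $\Gamma = \sum_i\sum_{j \neq i} \omega_{ij}\frac{(x_i - x_j)^2}{2} \big/ \sum_i\sum_{j \neq i} \omega_{ij}$, and evaluates it with dense matrices. spvar_sketch() is hypothetical and is not the package's "cpp" or R method. A useful sanity check: when every pair receives the same weight, this quantity reduces to the ordinary sample variance var(x).

```r
# Hypothetical helper, not part of sdsfun. Assumes
#   Gamma = sum_{i != j} w_ij (x_i - x_j)^2 / 2  /  sum_{i != j} w_ij
spvar_sketch <- function(x, wt) {
  half_sq_diff <- outer(x, x, "-")^2 / 2   # (x_i - x_j)^2 / 2 for every pair
  diag(wt) <- 0                            # drop the j == i terms
  sum(wt * half_sq_diff) / sum(wt)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
w_all <- 1 - diag(length(x))   # every pair weighted equally
spvar_sketch(x, w_all)         # equals var(x) in this special case
```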
Spatial stratified heterogeneity test based on geographical detector q value.
ssh_test(y, hs)
Argument | Description |
---|---|
y | Variable Y, continuous numeric vector. |
hs | Spatial stratification or classification of each explanatory variable. |
A tibble
This is a C++ implementation of the factor_detector function in the gdverse package.
ssh_test(y = 1:7, hs = c('x',rep('y',3),rep('z',3)))
To calculate the Z-score using variance normalization, the formula is $z_i = \frac{x_i - \bar{x}}{\sigma}$, where $\bar{x}$ and $\sigma$ are the mean and standard deviation of x.
standardize_vector(x)
Argument | Description |
---|---|
x | A numeric vector |
A standardized numeric vector
standardize_vector(1:10)
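In base R the Z-score can be sketched in one line and cross-checked against scale(). standardize_sketch() is hypothetical; it uses sd(), which divides by n - 1, so results would differ slightly from an implementation that uses the population standard deviation.

```r
# Hypothetical helper, not part of sdsfun.
standardize_sketch <- function(x) {
  (x - mean(x)) / sd(x)   # sd() uses the n - 1 denominator
}

z <- standardize_sketch(1:10)
all.equal(z, as.vector(scale(1:10)))   # base R's scale() centres and scales the same way
```

The standardized vector has mean 0 and standard deviation 1.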
convert discrete variables in a tibble to integers
tbl_all2int(tbl)
Argument | Description |
---|---|
tbl | A tibble, data.frame or sf object. |
A converted tibble, data.frame or sf object.
demotbl = tibble::tibble(x = c(1,2,3,3,1),
                         y = letters[1:5],
                         z = c(1L,1L,2L,2L,3L),
                         m = factor(letters[1:5],levels = letters[5:1]))
tbl_all2int(demotbl)