Title: | Calculate Pairwise Distances |
---|---|
Description: | A common framework for calculating distance matrices. |
Authors: | Nello Blaser [aut, cre] |
Maintainer: | Nello Blaser <[email protected]> |
License: | GPL |
Version: | 0.0.6 |
Built: | 2024-11-11 04:14:16 UTC |
Source: | https://github.com/blasern/rdist |
Farthest point sampling returns a reordering of the metric space P = p_1, ..., p_k, such that each p_i is the farthest point from the first i-1 points.
farthest_point_sampling( mat, metric = "precomputed", k = nrow(mat), initial_point_index = 1L, return_clusters = FALSE )
mat |
Original distance matrix |
metric |
Distance metric to use (either "precomputed" or a metric from |
k |
Number of points to sample |
initial_point_index |
Index of p_1 |
return_clusters |
Should the indices of the closest farthest points be returned? |
# generate data
df <- matrix(runif(200), ncol = 2)
dist_mat <- pdist(df)
# farthest point sampling
fps <- farthest_point_sampling(dist_mat)
fps2 <- farthest_point_sampling(df, metric = "euclidean")
all.equal(fps, fps2)
# have a look at the fps distance matrix
rdist(df[fps[1:5], ])
dist_mat[fps, fps][1:5, 1:5]
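The greedy selection described above can be sketched in base R. This is only an illustration of the idea on a precomputed distance matrix: `fps_sketch` is a hypothetical helper, not the package's implementation, and it assumes a square numeric matrix without missing values.

```r
# Greedy farthest point sampling on a precomputed distance matrix.
# fps_sketch is a hypothetical name used for illustration only.
fps_sketch <- function(dist_mat, k = nrow(dist_mat), initial_point_index = 1L) {
  stopifnot(is.matrix(dist_mat), nrow(dist_mat) == ncol(dist_mat),
            k >= 1L, k <= nrow(dist_mat))
  chosen <- integer(k)
  chosen[1] <- initial_point_index
  # min_dist[j] = distance from point j to its nearest already chosen point
  min_dist <- dist_mat[initial_point_index, ]
  min_dist[initial_point_index] <- -Inf  # never pick a chosen point again
  for (i in seq_len(k)[-1]) {
    nxt <- which.max(min_dist)           # farthest from everything chosen so far
    chosen[i] <- nxt
    min_dist <- pmin(min_dist, dist_mat[nxt, ])
    min_dist[nxt] <- -Inf
  }
  chosen
}

# four points on a line at x = 0, 1, 2, 10
pts <- cbind(c(0, 1, 2, 10), c(0, 0, 0, 0))
d <- as.matrix(dist(pts))
ord <- fps_sketch(d)
first_two <- fps_sketch(d, k = 2)
```

Keeping a running vector of distances to the nearest chosen point means each step needs only one row of the distance matrix.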
Does the distance matrix come from a metric?
is_distance_matrix(mat, tolerance = .Machine$double.eps^0.5)
triangle_inequality(mat, tolerance = .Machine$double.eps^0.5)
mat |
The matrix to evaluate |
tolerance |
Differences smaller than tolerance are not reported. |
data <- matrix(rnorm(20), ncol = 2)
dm <- pdist(data)
is_distance_matrix(dm)
triangle_inequality(dm)
dm[1, 2] <- 1.1 * dm[1, 2]
is_distance_matrix(dm)
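As a rough illustration of what such a check involves, the base R sketch below tests non-negativity, a zero diagonal, symmetry, and the triangle inequality up to a tolerance. `is_metric_sketch` is a hypothetical helper; the exact set of conditions the package checks is an assumption here, and the sketch assumes no missing values.

```r
# Check the usual metric axioms on a square matrix, up to a tolerance.
# is_metric_sketch is a hypothetical name, not a package function.
is_metric_sketch <- function(mat, tolerance = sqrt(.Machine$double.eps)) {
  mat <- unname(as.matrix(mat))
  if (nrow(mat) != ncol(mat)) return(FALSE)
  basic_ok <- all(mat >= -tolerance) &&
    all(abs(diag(mat)) <= tolerance) &&
    max(abs(mat - t(mat))) <= tolerance
  if (!basic_ok) return(FALSE)
  for (j in seq_len(nrow(mat))) {
    # detour[i, k] = d(i, j) + d(j, k); a metric needs d(i, k) <= detour[i, k]
    detour <- outer(mat[, j], mat[j, ], "+")
    if (any(mat > detour + tolerance)) return(FALSE)
  }
  TRUE
}

set.seed(42)
pts <- matrix(rnorm(20), ncol = 2)
dm <- as.matrix(dist(pts))
ok_before <- is_metric_sketch(dm)

# inflating a single entry breaks symmetry
dm[1, 2] <- 1.1 * dm[1, 2]
ok_after <- is_metric_sketch(dm)

# symmetric, but d(1, 3) = 5 exceeds d(1, 2) + d(2, 3) = 2
shortcut <- matrix(c(0, 1, 5,
                     1, 0, 1,
                     5, 1, 0), nrow = 3)
ok_shortcut <- is_metric_sketch(shortcut)
```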
Returns the p-product metric of two metric spaces. Works for output of 'rdist', 'pdist' or 'cdist'.
product_metric(..., p = 2)
... |
Distance matrices or dist objects |
p |
The power of the Minkowski distance |
# generate data
df <- matrix(runif(200), ncol = 2)
# distance matrices
dist_mat <- pdist(df)
dist_1 <- pdist(df[, 1])
dist_2 <- pdist(df[, 2])
# product distance matrix
dist_prod <- product_metric(dist_1, dist_2)
# check equality
all.equal(dist_mat, dist_prod)
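The p-product of distance matrices d_1, ..., d_m is conventionally (d_1^p + ... + d_m^p)^(1/p), taken entrywise. A minimal base R sketch of that formula follows; `product_metric_sketch` is a hypothetical helper that assumes finite p and inputs of equal dimension, and it is not the package's own code.

```r
# Entrywise p-product of several distance matrices (finite p only).
# product_metric_sketch is a hypothetical name used for illustration.
product_metric_sketch <- function(..., p = 2) {
  mats <- lapply(list(...), as.matrix)   # also accepts dist objects
  Reduce(`+`, lapply(mats, function(m) m^p))^(1 / p)
}

set.seed(1)
df <- matrix(runif(20), ncol = 2)
d_full <- as.matrix(dist(df))            # Euclidean distance in the plane
d_1 <- as.matrix(dist(df[, 1]))          # distance along the first coordinate
d_2 <- as.matrix(dist(df[, 2]))          # distance along the second coordinate
d_prod <- product_metric_sketch(d_1, d_2, p = 2)
same <- isTRUE(all.equal(d_full, d_prod, check.attributes = FALSE))
```

With p = 2 the product of the two coordinate-wise distances recovers the Euclidean distance, which is the same check the example above performs.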
rdist provides a common framework to calculate distances. There are three main functions:
rdist computes the pairwise distances between observations in one matrix and returns a dist object,
pdist computes the pairwise distances between observations in one matrix and returns a matrix, and
cdist computes the distances between observations in two matrices and returns a matrix.
In particular, a function like cdist is often missing from other distance implementations. All calculations involving NA values will consistently return NA.
rdist(X, metric = "euclidean", p = 2L)
pdist(X, metric = "euclidean", p = 2)
cdist(X, Y, metric = "euclidean", p = 2)
X, Y |
A matrix |
metric |
The distance metric to use |
p |
The power of the Minkowski distance |
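To make the difference between the one-matrix and two-matrix case concrete, here is a base R sketch of a cross-distance computation with a user-supplied distance function. `cdist_sketch` is a hypothetical helper written with plain loops for readability; it is not how the package computes its results.

```r
# Distances between every row of X and every row of Y.
# cdist_sketch is a hypothetical name; metric is any function of two vectors.
cdist_sketch <- function(X, Y, metric = function(v, w) sqrt(sum((v - w)^2))) {
  stopifnot(is.matrix(X), is.matrix(Y), ncol(X) == ncol(Y))
  out <- matrix(NA_real_, nrow(X), nrow(Y))
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(nrow(Y))) {
      out[i, j] <- metric(X[i, ], Y[j, ])
    }
  }
  out
}

X <- rbind(c(0, 0), c(3, 4))
Y <- rbind(c(0, 0), c(6, 8), c(NA, 1))

# default Euclidean distance; any pair involving NA yields NA
res <- cdist_sketch(X, Y)

# a user-defined Manhattan distance
res_manhattan <- cdist_sketch(X, Y, metric = function(v, w) sum(abs(v - w)))
```

The result has one row per observation in X and one column per observation in Y, so it is generally neither square nor symmetric.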
Available distance measures are (written for two vectors v and w):
"euclidean"
"minkowski"
"manhattan"
"maximum" or "chebyshev"
"canberra"
"angular"
"correlation"
"absolute_correlation"
"hamming"
"jaccard"
Any function that defines a distance between two vectors.
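For orientation, the conventional textbook definitions of the first five measures are given below for vectors v and w, with p the power of the Minkowski distance. These are the standard forms and are assumed, not confirmed, to match the package; the remaining measures ("angular" through "jaccard") have several conventions in use and are not restated here.

```latex
\begin{align*}
\text{euclidean:} \quad & \sqrt{\sum_i (v_i - w_i)^2} \\
\text{minkowski:} \quad & \Bigl(\sum_i |v_i - w_i|^p\Bigr)^{1/p} \\
\text{manhattan:} \quad & \sum_i |v_i - w_i| \\
\text{maximum / chebyshev:} \quad & \max_i |v_i - w_i| \\
\text{canberra:} \quad & \sum_i \frac{|v_i - w_i|}{|v_i| + |w_i|}
\end{align*}
```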