| Title: | Calculate Pairwise Distances |
|---|---|
| Description: | A common framework for calculating distance matrices. |
| Authors: | Nello Blaser [aut, cre] |
| Maintainer: | Nello Blaser |
| License: | GPL |
| Version: | 0.0.6 |
| Built: | 2026-06-01 09:05:07 UTC |
| Source: | https://github.com/blasern/rdist |
Farthest point sampling returns a reordering of the metric space P = p_1, ..., p_k such that each p_i is the point farthest from the first i-1 points, i.e. the point whose distance to the nearest of p_1, ..., p_{i-1} is largest.
```r
farthest_point_sampling(
  mat,
  metric = "precomputed",
  k = nrow(mat),
  initial_point_index = 1L,
  return_clusters = FALSE
)
```
| Argument | Description |
|---|---|
| `mat` | Original distance matrix, or a data matrix when `metric` is not `"precomputed"` |
| `metric` | Distance metric to use (either `"precomputed"` or a metric from `rdist`) |
| `k` | Number of points to sample |
| `initial_point_index` | Index of p_1 |
| `return_clusters` | Should the indices of the closest farthest points be returned? |
```r
library(rdist)

# generate data
df <- matrix(runif(200), ncol = 2)
dist_mat <- pdist(df)

# farthest point sampling
fps <- farthest_point_sampling(dist_mat)
fps2 <- farthest_point_sampling(df, metric = "euclidean")
all.equal(fps, fps2)

# have a look at the fps distance matrix
rdist(df[fps[1:5], ])
dist_mat[fps, fps][1:5, 1:5]
```
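The selection rule can be sketched in base R. This is a minimal illustration under stated assumptions, not the rdist implementation: `fps_sketch()` is a hypothetical helper, it only handles a precomputed square distance matrix, and it reads "farthest from the first i-1 points" as maximising the distance to the nearest already-selected point.

```r
# Hypothetical helper, not part of rdist: farthest point sampling on a
# precomputed (square) distance matrix.
fps_sketch <- function(dmat, k = nrow(dmat), initial_point_index = 1L) {
  dmat <- as.matrix(dmat)
  stopifnot(k >= 1, k <= nrow(dmat))
  selected <- integer(k)
  selected[1] <- initial_point_index
  # distance from every point to its nearest already-selected point;
  # selected points are set to -Inf so they are never picked again
  min_dist <- dmat[initial_point_index, ]
  min_dist[initial_point_index] <- -Inf
  for (i in seq_len(k)[-1]) {
    # the next point is the one farthest from the selected set
    nxt <- which.max(min_dist)
    selected[i] <- nxt
    min_dist <- pmin(min_dist, dmat[nxt, ])
    min_dist[nxt] <- -Inf
  }
  selected
}

set.seed(1)
pts <- matrix(runif(200), ncol = 2)
d <- as.matrix(dist(pts))
ord <- fps_sketch(d)
ord[1:5]
```

Keeping a running vector of minimum distances means each step is one pass over n values, so sampling k points costs O(nk) once the distance matrix is available.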
Check whether a distance matrix comes from a metric.
```r
is_distance_matrix(mat, tolerance = .Machine$double.eps^0.5)

triangle_inequality(mat, tolerance = .Machine$double.eps^0.5)
```
| Argument | Description |
|---|---|
| `mat` | The matrix to evaluate |
| `tolerance` | Differences smaller than `tolerance` are not reported. |
```r
library(rdist)

data <- matrix(rnorm(20), ncol = 2)
dm <- pdist(data)
is_distance_matrix(dm)
triangle_inequality(dm)

# scaling a single off-diagonal entry makes the matrix asymmetric
dm[1, 2] <- 1.1 * dm[1, 2]
is_distance_matrix(dm)
```
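To make the conditions concrete, here is a base-R sketch of the axioms a metric distance matrix must satisfy: zero diagonal, non-negativity, symmetry and the triangle inequality, each checked up to `tolerance`. `check_metric_axioms()` is a hypothetical helper, and it is an assumption that `is_distance_matrix()` tests exactly these conditions; the sketch also assumes the matrix has no `NA` entries.

```r
# Hypothetical helper, not the rdist implementation: test the metric axioms
# on a square matrix, ignoring violations smaller than `tolerance`.
# Assumes the matrix contains no NA values.
check_metric_axioms <- function(mat, tolerance = .Machine$double.eps^0.5) {
  mat <- as.matrix(mat)
  n <- nrow(mat)
  if (n != ncol(mat)) return(FALSE)
  if (any(abs(diag(mat)) > tolerance)) return(FALSE)     # d(x, x) = 0
  if (any(mat < -tolerance)) return(FALSE)               # d(x, y) >= 0
  if (any(abs(mat - t(mat)) > tolerance)) return(FALSE)  # d(x, y) = d(y, x)
  # triangle inequality: d(i, j) <= d(i, k) + d(k, j) for every k
  for (k in seq_len(n)) {
    via_k <- outer(mat[, k], mat[k, ], "+")
    if (any(mat - via_k > tolerance)) return(FALSE)
  }
  TRUE
}

set.seed(2)
data <- matrix(rnorm(20), ncol = 2)
dm <- as.matrix(dist(data))
check_metric_axioms(dm)    # TRUE: Euclidean distances form a metric
dm[1, 2] <- 1.1 * dm[1, 2]
check_metric_axioms(dm)    # FALSE: the matrix is no longer symmetric
```

The triangle inequality is checked one intermediate point k at a time with `outer()`, which keeps memory at O(n^2) instead of building an n x n x n array.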
Returns the p-product metric of two metric spaces: given component distances d_1 and d_2, the product distance is (d_1^p + d_2^p)^(1/p). Works for output of `rdist`, `pdist` or `cdist`.
```r
product_metric(..., p = 2)
```
| Argument | Description |
|---|---|
| `...` | Distance matrices or `dist` objects |
| `p` | The power of the Minkowski distance |
```r
library(rdist)

# generate data
df <- matrix(runif(200), ncol = 2)

# distance matrices
dist_mat <- pdist(df)
dist_1 <- pdist(df[, 1])
dist_2 <- pdist(df[, 2])

# product distance matrix
dist_prod <- product_metric(dist_1, dist_2)

# check equality
all.equal(dist_mat, dist_prod)
```
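The p-product can be written in a few lines of base R: raise each component distance to the power p, sum, and take the p-th root. `product_metric_sketch()` is a hypothetical helper, not the rdist function, and it assumes all inputs describe the same set of points so the matrices are conformable. Checking it against base R's `dist()` shows that the coordinate-wise distances combine into the Euclidean distance for p = 2 and the Manhattan distance for p = 1.

```r
# Hypothetical helper, not the rdist implementation: combine component
# distance matrices (or dist objects) into their p-product distance.
product_metric_sketch <- function(..., p = 2) {
  mats <- lapply(list(...), as.matrix)
  # sum the p-th powers of the component distances, then take the p-th root
  total <- Reduce(`+`, lapply(mats, function(m) m^p))
  total^(1 / p)
}

set.seed(3)
xy <- matrix(runif(200), ncol = 2)
d1 <- dist(xy[, 1])
d2 <- dist(xy[, 2])
# p = 2 recovers the Euclidean distance, p = 1 the Manhattan distance
all.equal(product_metric_sketch(d1, d2, p = 2), as.matrix(dist(xy)))
all.equal(product_metric_sketch(d1, d2, p = 1),
          as.matrix(dist(xy, method = "manhattan")))
```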
rdist provides a common framework for calculating distances. There are three main functions:

- `rdist` computes the pairwise distances between observations in one matrix and returns a `dist` object,
- `pdist` computes the pairwise distances between observations in one matrix and returns a matrix, and
- `cdist` computes the distances between observations in two matrices and returns a matrix.

In particular, a `cdist`-style function is often missing from other distance packages. All calculations involving `NA` values will consistently return `NA`.
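The difference between the outputs is easiest to see in a `cdist`-style computation, which pairs every row of `X` with every row of `Y`. The sketch below is a hypothetical base-R helper (`cdist_sketch()`, Euclidean metric only), not the package's code. It returns an nrow(X)-by-nrow(Y) matrix, and base R's `NA` arithmetic makes any observation containing `NA` yield `NA` distances, in line with the behaviour described above.

```r
# Hypothetical helper, not the rdist implementation: Euclidean distances
# between every row of X and every row of Y.
cdist_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  stopifnot(ncol(X) == ncol(Y))
  out <- matrix(NA_real_, nrow(X), nrow(Y))
  for (i in seq_len(nrow(X))) {
    # subtract row i of X from every row of Y, then take row-wise norms;
    # any NA in a row propagates to the corresponding distances
    diffs <- sweep(Y, 2, X[i, ])
    out[i, ] <- sqrt(rowSums(diffs^2))
  }
  out
}

set.seed(4)
X <- matrix(rnorm(10), ncol = 2)  # 5 observations
Y <- matrix(rnorm(6), ncol = 2)   # 3 observations
D <- cdist_sketch(X, Y)
dim(D)  # 5 3
# the pdist-style matrix is the special case Y = X
all.equal(cdist_sketch(X, X), unname(as.matrix(dist(X))))
```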
```r
rdist(X, metric = "euclidean", p = 2L)

pdist(X, metric = "euclidean", p = 2)

cdist(X, Y, metric = "euclidean", p = 2)
```
| Argument | Description |
|---|---|
| `X`, `Y` | Matrices of observations, one observation per row |
| `metric` | The distance metric to use |
| `p` | The power of the Minkowski distance |
Available distance measures are (written for two vectors v and w):

- `"euclidean"`: sqrt(sum_i (v_i - w_i)^2)
- `"minkowski"`: (sum_i |v_i - w_i|^p)^(1/p)
- `"manhattan"`: sum_i |v_i - w_i|
- `"maximum"` or `"chebyshev"`: max_i |v_i - w_i|
- `"canberra"`
- `"angular"`
- `"correlation"`
- `"absolute_correlation"`
- `"hamming"`
- `"jaccard"`
- Any function that defines a distance between two vectors.
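The Minkowski-family measures in this list have standard closed forms, and the option of supplying your own function can be mimicked by a generic pairwise loop. The sketch assumes numeric vectors without `NA`. The names `euclidean()`, `minkowski()`, `manhattan()`, `chebyshev()` and `pairwise()` are hypothetical, and the results are checked against base R's `dist()` rather than against rdist.

```r
# Hypothetical base-R versions of the Minkowski-family measures for two
# numeric vectors v and w (not the rdist implementations).
euclidean <- function(v, w) sqrt(sum((v - w)^2))
minkowski <- function(v, w, p = 2) sum(abs(v - w)^p)^(1 / p)
manhattan <- function(v, w) sum(abs(v - w))
chebyshev <- function(v, w) max(abs(v - w))

# Hypothetical helper: apply any two-vector distance function to all
# pairs of rows of X.
pairwise <- function(X, fun, ...) {
  n <- nrow(X)
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) out[i, j] <- fun(X[i, ], X[j, ], ...)
  }
  out
}

euclidean(c(0, 0), c(3, 4))  # 5
set.seed(5)
X <- matrix(rnorm(12), ncol = 3)
all.equal(pairwise(X, minkowski, p = 3),
          unname(as.matrix(dist(X, method = "minkowski", p = 3))))
```

The double loop makes n^2 calls to `fun`. That is fine for illustration but far slower than a vectorised or compiled implementation.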