Skip to contents

This function generates a kNN graph based on a given numeric matrix. Four different algorithms are implemented with different speed and accuracy trade-offs:

  • "hnsw" - Hierarchical Navigable Small Worlds vector search with slower index generation (which ammortises on larger data sets) and high precision.

  • "annoy" - Approximate Nearest Neighbours Oh Yeah algorithm, faster to index but querying on large data sets can be slow.

  • "nndescent" - Rust-based implementation of the PyNNDescent algorithm, a good all-rounder that performs well on very large data sets.

  • "balltree" - Ball tree structure for exact search with a configurable budget.

  • "exhaustive" - Exact nearest neighbour search.

Usage

generate_knn_graph(
  data,
  k,
  knn_method = c("hnsw", "annoy", "nndescent", "balltree", "exhaustive"),
  nn_params = params_nn(),
  seed = 42L,
  .verbose = TRUE
)

Arguments

data

Numeric matrix. The embedding or feature matrix to compute neighbours on. Rows are observations, columns are features.

k

Integer. The number of nearest neighbours to compute.

knn_method

Character. The algorithm to use for nearest neighbour search. One of c("hnsw", "annoy", "nndescent", "balltree", "exhaustive"). Defaults to "hnsw".

nn_params

List. Output of params_nn(). Controls algorithm-specific parameters.

seed

Integer. For reproducibility. Defaults to 42L.

.verbose

Boolean. Controls verbosity.

Value

A nearest neighbours class object with 1-indexed neighbour indices and distances.