Generate hierarchical cluster data
Generates synthetic data with a two-level cluster hierarchy:
n_supergroups top-level groups each containing n_subclusts tight
subclusters. Supergroup centres are spread far apart; subcluster centres sit
tightly around their supergroup centre.
Note that the number of samples actually returned may be slightly lower than
n_samples if n_samples is not evenly divisible by n_supergroups * n_subclusts.
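The arithmetic behind this note can be sketched as follows. This is only an illustration: it assumes each subcluster receives the same whole number of points and that any remainder is dropped, which matches the description above but is not confirmed by it. The input values are made up for the example.

```r
# Assumed rule: every subcluster gets the same whole number of points,
# and the remainder of the integer division is dropped.
n_samples     <- 1000L
n_supergroups <- 3L
n_subclusts   <- 3L

n_clusters  <- n_supergroups * n_subclusts  # 9 subclusters in total
per_cluster <- n_samples %/% n_clusters     # 111 points per subcluster
n_returned  <- per_cluster * n_clusters     # 999 points overall, one short of 1000
```

Choosing n_samples as a multiple of n_supergroups * n_subclusts avoids the shortfall under this assumption.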
Usage
rs_data_hierarchical(
n_samples,
dim,
n_supergroups,
n_subclusts,
supergroup_spread,
subcluster_spread,
point_std,
seed
)
Arguments
- n_samples
Integer. Total number of points, distributed evenly across all subclusters.
- dim
Integer. Dimensionality of the ambient space.
- n_supergroups
Integer. Number of top-level groups. Defaults to 3.
- n_subclusts
Integer. Number of subclusters per supergroup. Defaults to 3.
- supergroup_spread
Numeric. Spread of supergroup centres. Defaults to 15.0.
- subcluster_spread
Numeric. Spread of subcluster centres around their supergroup centre. Defaults to 2.0.
- point_std
Numeric. Standard deviation of the within-subcluster Gaussian noise. Defaults to 0.4.
- seed
Integer. Seed for reproducibility.
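
To make the roles of the arguments concrete, the two-level scheme can be approximated in base R. This is a minimal sketch, not the package's implementation: sketch_hierarchical() and its return structure are hypothetical, and it assumes that supergroup and subcluster centres are drawn from Gaussians whose standard deviations are the respective spread arguments, and that any remainder points are dropped.

```r
# Hypothetical base-R sketch; sketch_hierarchical() is not part of the package.
sketch_hierarchical <- function(n_samples, dim, n_supergroups = 3L,
                                n_subclusts = 3L, supergroup_spread = 15.0,
                                subcluster_spread = 2.0, point_std = 0.4,
                                seed) {
  set.seed(seed)
  n_clusters  <- n_supergroups * n_subclusts
  per_cluster <- n_samples %/% n_clusters  # assumed: remainder is dropped

  # Assumed: supergroup centres are Gaussian with sd = supergroup_spread.
  super_centres <- matrix(rnorm(n_supergroups * dim, sd = supergroup_spread),
                          nrow = n_supergroups, ncol = dim)

  blocks     <- vector("list", n_clusters)
  supergroup <- integer(0)
  subcluster <- integer(0)
  k <- 0L
  for (g in seq_len(n_supergroups)) {
    for (s in seq_len(n_subclusts)) {
      k <- k + 1L
      # Assumed: subcluster centre = supergroup centre + Gaussian offset.
      centre <- super_centres[g, ] + rnorm(dim, sd = subcluster_spread)
      noise  <- matrix(rnorm(per_cluster * dim, sd = point_std),
                       nrow = per_cluster, ncol = dim)
      blocks[[k]] <- sweep(noise, 2, centre, "+")  # shift each column by the centre
      supergroup  <- c(supergroup, rep(g, per_cluster))
      subcluster  <- c(subcluster, rep(k, per_cluster))
    }
  }
  list(data = do.call(rbind, blocks),
       supergroup = supergroup,
       subcluster = subcluster)
}

res <- sketch_hierarchical(n_samples = 1000L, dim = 5L, seed = 42L)
dim(res$data)  # 999 5
```

With the default spreads, the distance between supergroup centres (scale 15) dominates the distance between subcluster centres (scale 2), which in turn dominates the point noise (0.4), so both levels of the hierarchy should be visible in the generated points.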