
Intro
Description
This is an opinionated package that makes various bioinformatics workflows in R (or ported from Python) much faster via low-level implementations in Rust. The core idea is to take different methods, implement them in a compiled, memory-managed language with minimal kernel round trips, and use R purely as an orchestration layer. The result? Very fast performance with low memory usage, making large-scale analyses feasible without any cloud compute. More methods will be added over time. The aim is to become a tidyverse equivalent for many of the downstream methods that follow WGS processing. A sister package for plotting functions is being built in parallel, see here (that one is in alpha phase).
Release notes
With the release of 0.3.0 a lot has happened. The lack of updates had a (big) reason… Some cooking has been going on… Since 0.3.0 the package contains a full release of the previously teased single-cell functionality suite, which you can use to analyse millions of cells locally. It already implements a large number of methods in this space in Rust and exposes them to R. Please check out the package website for details, particularly the single-cell sections (design choices and vignettes).
Usage
Installation
You will need Rust on your system for the package to work. An installation guide is provided here. The rextendr team has written further help on setting up Rust here. (bixverse uses rextendr to interface with Rust.)
Steps for installation:
- In the terminal, install Rust:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- In R, install rextendr (and devtools, which is used in the next step):
install.packages(c("rextendr", "devtools"))
- Finally, install bixverse:
devtools::install_github("https://github.com/GregorLueg/bixverse")
Windows support
If you are using Windows: sorry, the toolchain is just very, very painful… I really tried to make this work, and there may be some workarounds for compiling everything needed to install the package, but getting it to work in CI/CD has proven… challenging. Hence, there is no official Windows support for now. The main obstacle is h5 (HDF5) support, which is non-trivial to cross-compile together with Rust within the R build toolchain.
How to use the package
The package website can be found here. A good primer on why Rust is worth it is here: a showcase of how much faster Rust can make many basic functions. If you wish to integrate this into your package, please feel free. If you wish to use the single-cell part, it is really worth reading the explainer here first… It covers the design decisions, the choices and the trade-offs. The vignettes show you how to analyse data.
Roadmap
Single-cell and spatial transcriptomics:
- For single cell, the following will hopefully be implemented soon(ish):
- More multi-file read-in support. At the moment, reading multiple h5ad files is supported, but not yet multiple files of other formats.
- I/O for h5 files (as produced by CellRanger).
- Saving data to h5ad for easier interoperability with Python.
- Something cool with Zarr … ?
- Methods on top of the meta cells: co-expression network detection, etc.
- Expansion of the sister package with plotting helpers for single cell.
- Helpers to slice and dice the data more easily and to add new data - this will
- Implementations of Palantir and Slingshot.
- Port over NicheNet
- Add more GPU acceleration via a cubecl/WGPU backend (GPU-agnostic) where appropriate, see another sister package.
- Leverage the current infrastructure and add dedicated support and methods for spatial transcriptomics. There are some cool methods in that space that would certainly benefit from the speed a compiled, memory-managed language offers, especially when analysing many data sets.
- Add other interesting methods that I can find (and have a use-case for).
For developers
If you wish to contribute, please read the Code Style.