PRECISION.seq.augmented: An R Package for Performance Assessment of Depth Normalization Methods in MicroRNA Sequencing using Augmented Data
The PRECISION.seq.augmented R package offers a comprehensive framework for evaluating depth normalization methods in microRNA sequencing data analysis. It allows investigating normalization performance in both clustering and classification contexts and provides researchers with tools to assess how different normalization approaches affect analytical outcomes. Using AI-augmented miRNA-seq data derived from paired benchmark and test datasets, the package enables systematic comparison across controlled conditions with varying biological signal strengths and technical artifact magnitudes. PRECISION.seq.augmented implements multiple normalization techniques, clustering approaches, and classification algorithms, allowing researchers to identify optimal strategies for their specific analytical needs and reproduce findings from our publications. This package represents an essential resource for researchers seeking to maximize the reliability and reproducibility of insights derived from miRNA sequencing data.
Installation
You can install the released version of PRECISION.seq.augmented directly from GitHub using devtools
:
devtools::install_github("Omics-Data-Harmonization-EBP/PRECISION.seq.augmented")
The R package PoissonSeq
for PoissonSeq normalization was removed from CRAN, but you can install the archived version from GitHub:
devtools::install_github("cran/PoissonSeq")
For successful installation, ensure all dependencies are properly installed. This package is based on R 4.2, and the following helper functions will install all required dependencies:
## from CRAN
CRAN.packages <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
}
CRAN.packages(c("BiocManager", "caret", "e1071", "glmnet", "pamr", "mclust", "cluster", "factoextra", "som", "digest"))
## from Bioconductor
Bioconductor.packages <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
BiocManager::install(new.pkg, dependencies = TRUE)
}
Bioconductor.packages(c("Biobase", "BiocGenerics", "edgeR", "EDASeq", "RUVSeq", "DESeq2", "preprocessCore", "sva"))
## from GitHub
devtools::install_github("cran/PoissonSeq")
Main Functions
The full package documentation with detailed function parameters and examples can be found on the package documentation website.
Data Access Functions
-
load.augmented.data()
- Load pre-generated augmented miRNA-seq datasets. Data will be downloaded from GitHub or loaded from storage if it was previously downloaded -
cleanup.augmented.data()
- Remove cached augmented data
Core Object and Data Modulation
-
create.precision.cluster()
- Creates the main analysis object for clustering evaluation handling data and results -
create.precision.classification()
- Creates the main analysis object for classification evaluation handling data and results -
biological.effects()
- Apply biological effects with amplification factors -
handling.effects()
- Modulate handling artifacts in test datasets
Harmonization Methods
The package implements multiple data harmonization techniques applicable to both clustering and classification:
-
harmon.all()
- Apply all of the following harmonization methods sequentially -
harmon.TC()
- Total Count normalization (scaling by library size) -
harmon.UQ()
- Upper Quartile normalization (scaling by 75th percentile) -
harmon.med()
- Median normalization (scaling by median count) -
harmon.TMM()
- Trimmed Mean of M-values normalization (edgeR) -
harmon.DESeq()
- DESeq2 normalization (geometric mean approach) -
harmon.PoissonSeq()
- PoissonSeq normalization (robust over-dispersed Poisson model) -
harmon.QN()
- Quantile normalization -
harmon.SVA()
- Surrogate Variable Analysis batch correction -
harmon.RUVr()
- Remove Unwanted Variation (residuals) -
harmon.RUVs()
- Remove Unwanted Variation (control samples) -
harmon.RUVg()
- Remove Unwanted Variation (control genes) -
harmon.ComBat.Seq()
- ComBat-seq batch effect adjustment
Clustering Algorithms
The package implements multiple clustering approaches with various distance metrics:
-
cluster.all()
- Apply all of the following clustering methods sequentially -
cluster.hc()
- Hierarchical clustering with multiple distance metrics (Euclidean distance, Pearson correlation, and Spearman correlation) -
cluster.kmeans()
- K-means clustering with configurable starting points and iteration parameters -
cluster.pam()
- Partitioning Around Medoids with the same distance metric options as hierarchical clustering (Euclidean, Pearson, Spearman). -
cluster.som()
- Self-Organizing Maps for non-linear dimensionality reduction and clustering, particularly suited for high-dimensional data -
cluster.mnm()
- Gaussian Mixture Model clustering with automated model selection using BIC.
Classification Algorithms
The package implements multiple learning methods for sample classification to evaluate how normalization affects predictive performance across training and validation datasets:
-
classification.all()
- Apply all of the following clustering methods sequentially -
classification.knn()
- k-Nearest Neighbor classification -
classification.svm()
- Support Vector Machine classification -
classification.pam()
- Prediction Analysis for Microarrays using nearest shrunken centroids, with built-in feature selection -
classification.lasso()
- Logistic Regression with LASSO regularization for automatic feature selection and reduced overfitting -
classification.ranfor()
- Random Forest classification