Introduction to the PRECISION.seq.augmented package

Overview

The PRECISION.seq.augmented package provides a comprehensive framework for clustering analysis of sequencing data through a two-stage pipeline:

Data Harmonization: The first step harmonizes the sequencing data so that measurements are consistent and comparable across samples. The package implements several data harmonization methods, including:

Trimmed mean of M-values (TMM)
Total count harmonization (TC)
Upper quantile harmonization (UQ)
Median harmonization (med)
Quantile harmonization (QU)
DESeq harmonization (DESeq)
PoissonSeq harmonization (PoissonSeq)
Quantile normalization (QN)
Surrogate Variable Analysis (SVA)
RUV normalization (RUVg, RUVr, RUVs)
ComBat-Seq harmonization (ComBat-Seq)
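
To make the idea of harmonization concrete, here is a minimal base R sketch of total count scaling on a toy matrix. It is an illustration only, not the package's implementation: the layout (features in rows, samples in columns), the toy values, and the names `counts`, `lib_size`, `scale_factor`, and `tc_scaled` are all assumptions made for this example.

```r
# Toy count matrix: 4 features (rows) x 3 samples (columns).
# The values and the orientation are assumptions for illustration.
counts <- matrix(
  c(10, 20, 30, 40,
    20, 40, 60, 80,
     5, 10, 15, 20),
  nrow = 4,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)

# Library size = total count per sample (column)
lib_size <- colSums(counts)

# Scale each sample by its library size relative to the mean library size
scale_factor <- lib_size / mean(lib_size)
tc_scaled <- sweep(counts, 2, scale_factor, "/")

# After scaling, every sample has the same total count
colSums(tc_scaled)
```

The other methods in the list differ mainly in how the per-sample scaling factor or adjustment is estimated; this sketch only shows the simplest case.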

Clustering Analysis: After harmonization, the data can be clustered with any of the following methods:

K-means clustering (kmeans)
Hierarchical clustering (hc)
Self-organizing map (som)
Gaussian mixture model (mnm)
PAM clustering (pam.euclidean, pam.pearson, pam.spearman)
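
As a rough illustration of what two of these methods do, the sketch below runs k-means and hierarchical clustering from base R's stats package on simulated data. It does not use the package's own wrappers; the simulated data, the choice of two clusters, the complete linkage, and the names `x`, `km_labels`, and `hc_labels` are assumptions made for this example.

```r
set.seed(1)

# Toy data: 10 samples (rows) x 5 features (columns),
# drawn from two well-separated groups of 5 samples each
group_a <- matrix(rnorm(25, mean = 0), nrow = 5)
group_b <- matrix(rnorm(25, mean = 10), nrow = 5)
x <- rbind(group_a, group_b)

# K-means with 2 centers; several random starts for stability
km_labels <- kmeans(x, centers = 2, nstart = 10)$cluster

# Hierarchical clustering on Euclidean distances, cut into 2 clusters
hc_labels <- cutree(hclust(dist(x), method = "complete"), k = 2)

# Cross-tabulate the two label vectors to see how they agree
table(km_labels, hc_labels)
```

Cluster numbers are arbitrary, so two methods can agree perfectly while using different label values. This is why agreement is better judged with a permutation-invariant measure than by comparing labels directly.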

Basic Usage

Here’s a basic example of how to perform clustering analysis:

# Step 0: Prepare the 'precision' object
example_cluster <- create.precision.cluster(data = data.test, label = data.group)

# Step 1: Harmonize via all the methods
example_cluster <- harmon.all(example_cluster)

# Step 2: Clustering via all the methods
example_cluster <- cluster.all(example_cluster)

# Step 3: Measure the consistency between the clustering labels and the true labels
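# Names of the clustering and harmonization results to compare
cluster <- c("kmeans", "hc", "som", "mnm", "pam.euclidean", "pam.pearson", "pam.spearman")
harmon <- c("Raw", "TC", "UQ", "med", "TMM", "DESeq", "PoissonSeq", "QN", "RUVg", "RUVs", "RUVr")
# One row per clustering method, one column per harmonization method
ari_indexes <- data.frame(matrix(nrow = length(cluster), ncol = length(harmon)))
# True labels: 27 "MXF" samples followed by 27 "PMFH" samples
true_label <- as.factor(c(rep("MXF", 27), rep("PMFH", 27)))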
for (i in seq_along(cluster)) {
  for (j in seq_along(harmon)) {
    # Look up the estimated labels for this clustering/harmonization pair by name
    est_cluster <- example_cluster@cluster.result[[cluster[i]]][[harmon[j]]]
    ari_indexes[i, j] <- mclust::adjustedRandIndex(true_label, est_cluster)
  }
}
rownames(ari_indexes) <- cluster
colnames(ari_indexes) <- harmon
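
The loop above scores each combination with the adjusted Rand index (ARI). For readers who want to see what that number measures, here is a minimal base R sketch of the standard ARI formula. It is meant for intuition only; the function name `adjusted_rand` and the toy label vectors are assumptions for this example, and `mclust::adjustedRandIndex` remains the function used in the analysis itself.

```r
# Adjusted Rand index between two label vectors of equal length.
# Hypothetical helper for illustration; not part of the package.
adjusted_rand <- function(x, y) {
  tab <- table(x, y)                       # contingency table of the two partitions
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))            # pairs placed together in both partitions
  sum_a <- sum(choose(rowSums(tab), 2))    # pairs placed together in x
  sum_b <- sum(choose(colSums(tab), 2))    # pairs placed together in y
  expected <- sum_a * sum_b / choose(n, 2) # agreement expected by chance
  max_index <- (sum_a + sum_b) / 2
  # Note: undefined (NaN) when both partitions put all items in one cluster
  (sum_ij - expected) / (max_index - expected)
}

truth   <- c(rep("A", 4), rep("B", 4))
perfect <- c(2, 2, 2, 2, 1, 1, 1, 1)  # same partition, different label names
mixed   <- c(1, 2, 1, 2, 1, 2, 1, 2)  # splits every true group in half

adjusted_rand(truth, perfect)  # 1: identical partitions
adjusted_rand(truth, mixed)    # below 0: worse than chance agreement
```

Because the index depends only on which samples are grouped together, it is unaffected by how the clusters are numbered. Values near 1 indicate close agreement with the true labels, and values near 0 indicate agreement no better than chance.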