Introduction

scGraphVerse is a modular and extensible R package for constructing, comparing, and visualizing gene regulatory networks (GRNs) from single-cell RNA-seq data. It includes a collection of inference algorithms, post-processing utilities, and visualization functions, all designed with interoperability, reproducibility, and scalability in mind.

Single-cell data presents unique challenges: high sparsity, batch variation, and the need to distinguish shared from condition-specific regulation. scGraphVerse addresses these through multi-method inference, consensus building, simulation-based benchmarking, and community and pathway analysis. It accepts SingleCellExperiment, Seurat, or matrix-based objects, enabling flexible pipelines across diverse experimental setups.

Why use scGraphVerse?

While several GRN inference packages exist in Bioconductor (e.g. GENIE3, minet, SCENIC, netZoo), most are limited to single-method inference or lack support for multi-condition, multi-replicate, and multi-method comparisons. scGraphVerse adds value by:

  1. Supporting multiple inference algorithms, including GENIE3, GRNBoost2, ZILGM, PCzinb, and Joint Random Forests (JRF).
  2. Providing joint modeling of GRNs across multiple conditions via JRF.
  3. Enabling benchmarking simulations from known ground-truth GRNs with customizable ZINB-based count matrix generation.
  4. Offering community analysis, including consensus GRNs, similarity analysis, and STRINGdb integration.

Users can plug in their single-cell objects, infer networks and visualize regulatory structure using built-in radar plots, ROC curves, and community overlays.

How scGraphVerse works

The typical scGraphVerse workflow begins with one or more count matrices, or objects like SingleCellExperiment or Seurat. These are passed to the core function infer_networks(), which runs the selected inference methods and returns an adjacency matrix or list of matrices.

For benchmarking or teaching purposes, synthetic datasets can be generated using zinb_simdata() based on a known adjacency matrix. This supports method comparison across early, late, and joint integration strategies using earlyj() and compare_consensus().

An end-to-end workflow, from network inference to community detection and consensus visualization, is provided in the other vignettes.

Designed for both high-throughput analysis and intuitive use, scGraphVerse also supports custom workflows built from its modular functions.

Simulation study

In this simulation study, we use scGraphVerse to:

  1. Define a ground-truth regulatory network from high-confidence interactions.
  2. Simulate zero-inflated scRNA-seq count data that respects the ground truth.
  3. Infer gene regulatory networks using GENIE3.
  4. Evaluate performance with ROC curves, precision–recall scores, and community similarity.
  5. Build consensus networks and perform edge mining.

1. Defining a Ground-Truth Network from STRINGdb

We select 500 top-variable T-cell genes and fetch high-confidence edges (score ≥ 900) from STRINGdb as our ground truth.

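# Note: This section requires internet connectivity for external data downloads
# To run this section, set eval=TRUE and ensure internet connectivity

# Packages used below (all appear as attached in the session info at the end)
library(scGraphVerse)
library(TENxPBMCData)
library(scuttle) # logNormCounts()
library(SingleR)
library(celldex)
library(igraph)
library(ggraph)
library(ggplot2)

# 1. Load PBMC data
pbmc_obj <- TENxPBMCData("pbmc3k")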

sce <- logNormCounts(pbmc_obj)
symbols_tenx <- rowData(sce)$Symbol_TENx
valid <- !is.na(symbols_tenx) & symbols_tenx != ""
sce <- sce[valid, ]
rownames(sce) <- make.unique(symbols_tenx[valid])
logcounts(sce) <- as.matrix(logcounts(sce))
colnames(sce) <- paste0("cell_", seq_len(ncol(sce)))

ref <- celldex::HumanPrimaryCellAtlasData()
pred <- SingleR(test = sce, ref = ref, labels = ref$label.main)
colData(sce)$predicted_celltype <- pred$labels

# 2. Select top 500 T-cell genes
genes <- selgene(
    object = sce,
    top_n = 500,
    cell_type = "T_cells",
    cell_type_col = "predicted_celltype",
    remove_rib = TRUE,
    remove_mt = TRUE
)

# 3. Retrieve STRINGdb adjacency
str_res <- stringdb_adjacency(
    genes = genes,
    species = 9606,
    required_score = 900,
    keep_all_genes = FALSE
)

wadj_truth <- str_res$weighted
adj_truth <- str_res$binary
adj_truth <- adj_truth[order(rownames(adj_truth)), order(colnames(adj_truth))]

# 4. Visualize network
gtruth <- graph_from_adjacency_matrix(adj_truth, mode = "undirected")
ggraph(gtruth, layout = "fr") +
    geom_edge_link(color = "gray") +
    geom_node_point(color = "steelblue") +
    ggtitle(paste0(
        "Ground Truth: ",
        vcount(gtruth),
        " nodes, ",
        ecount(gtruth),
        " edges"
    )) +
    theme_minimal()
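To make the thresholding step concrete, here is a minimal base-R sketch of how a weighted score matrix can be turned into a binary ground-truth adjacency. The gene names and scores are made up, and dropping isolated genes is only our assumption about what `keep_all_genes = FALSE` does; this is not the internal code of `stringdb_adjacency()`.

```r
# Toy STRING-style combined scores (0-1000) for five hypothetical genes
genes <- c("G1", "G2", "G3", "G4", "G5")
w <- matrix(0, 5, 5, dimnames = list(genes, genes))
w["G1", "G2"] <- w["G2", "G1"] <- 950
w["G1", "G3"] <- w["G3", "G1"] <- 700
w["G3", "G4"] <- w["G4", "G3"] <- 900

# Keep only high-confidence edges (score >= 900) as a 0/1 matrix
b <- (w >= 900) * 1L
diag(b) <- 0L

# Assumption: genes left without any edge (here G5) are dropped
connected <- rowSums(b) > 0
b <- b[connected, connected, drop = FALSE]

# Sort rows and columns by gene name, as done for adj_truth above
b <- b[order(rownames(b)), order(colnames(b))]
b
```

The G1-G3 edge (score 700) falls below the cutoff, so only two undirected edges survive.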

For demonstration purposes, we’ll use the built-in example data:

# Use built-in example data for demonstration
data("adj_truth")
data("count_matrices")

# Visualize the example ground truth network
gtruth <- igraph::graph_from_adjacency_matrix(adj_truth, mode = "undirected")
ggraph(gtruth, layout = "fr") +
    geom_edge_link(color = "gray") +
    geom_node_point(color = "steelblue") +
    ggtitle(paste0(
        "Example Ground Truth: ",
        igraph::vcount(gtruth),
        " nodes, ",
        igraph::ecount(gtruth),
        " edges"
    )) +
    theme_minimal()

Ground truth network nodes and edges

2. Simulating Zero-Inflated Count Data

We simulate three batches (n=50 cells each) of count matrices that follow the ground-truth network topology with dropout.

# Simulation parameters
nodes <- nrow(adj_truth)
sims <- zinb_simdata(
    n = 50,
    p = nodes,
    B = adj_truth,
    mu_range = list(c(1, 4), c(1, 7), c(1, 10)),
    mu_noise = c(1, 3, 5),
    theta = c(1, 0.7, 0.5),
    pi = c(0.2, 0.2, 0.2),
    kmat = 3,
    depth_range = c(0.8 * nodes * 3, 1.2 * nodes * 3)
)
# Transpose to cells × genes
count_matrices <- lapply(sims, t)
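For intuition about the zero-inflated negative binomial (ZINB) counts, the following base-R sketch draws from a ZINB distribution for a single gene. The helper `rzinb()` is hypothetical and ignores the network structure that `zinb_simdata()` imposes; it only illustrates the roles of the mean (`mu`), dispersion (`theta`), and dropout probability (`pi`).

```r
# Hypothetical helper: ZINB draws for one gene (no network dependence)
set.seed(42)
rzinb <- function(n, mu, theta, pi) {
    # Negative binomial draws with mean `mu` and dispersion `theta`
    counts <- rnbinom(n, size = theta, mu = mu)
    # Structural zeros: each draw is zeroed with probability `pi`
    dropout <- rbinom(n, size = 1, prob = pi) == 1
    counts[dropout] <- 0L
    counts
}

x <- rzinb(n = 10000, mu = 4, theta = 1, pi = 0.2)

# Theoretical mean: (1 - pi) * mu = 3.2
# Theoretical zero fraction:
#   pi + (1 - pi) * (theta / (theta + mu))^theta = 0.2 + 0.8 * 0.2 = 0.36
c(mean = mean(x), zero_fraction = mean(x == 0))
```

Lower `theta` increases overdispersion, and higher `pi` adds more dropout zeros on top of those the negative binomial already produces.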

3. Inferring Networks with GENIE3

We run GENIE3 across the simulated batches to infer batch-specific regulatory edges.

networks_joint <- infer_networks(
    count_matrices_list = count_matrices,
    method = "GENIE3",
    nCores = 1
)
# Weighted adjacency
wadj_list <- generate_adjacency(networks_joint)
# Symmetrize weights
swadj_list <- symmetrize(wadj_list, weight_function = "mean")
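GENIE3 weights are directed (regulator to target), so the two entries for a gene pair generally differ. As a rough illustration of what mean-based symmetrization does, here is a base-R sketch on a toy matrix; it is our reading of `weight_function = "mean"`, not the package's implementation.

```r
# Toy directed weight matrix for three hypothetical genes
g <- c("A", "B", "C")
w <- matrix(
    c(0.0, 0.8, 0.1,
      0.2, 0.0, 0.4,
      0.3, 0.0, 0.0),
    nrow = 3, byrow = TRUE, dimnames = list(g, g)
)

# Average each pair of directed weights: (w[i, j] + w[j, i]) / 2
sym_mean <- (w + t(w)) / 2
sym_mean
```

Here the A-B weight becomes (0.8 + 0.2) / 2 = 0.5, and the result is symmetric, so it can be compared with an undirected ground truth.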

4. ROC Curve and AUC

Plot the ROC curve comparing continuous edge weights to the binary ground truth.

roc_res <- plotROC(
    swadj_list,
    adj_truth,
    plot_title = "ROC Curve: GENIE3",
    is_binary = FALSE
)
roc_res$plot

ROC curve

auc_joint <- roc_res$auc
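To show what the AUC measures here, the sketch below computes it directly from the upper triangle of a weighted matrix and a binary truth matrix, using the rank-based (Mann-Whitney) formula. The function `auc_from_weights()` and the toy data are hypothetical; `plotROC()` may compute its AUC differently.

```r
# Hypothetical helper: AUC of edge weights against a binary truth matrix
auc_from_weights <- function(weights, truth) {
    ut <- upper.tri(truth) # each undirected pair counted once
    score <- weights[ut]
    label <- truth[ut] == 1
    n_pos <- sum(label)
    n_neg <- sum(!label)
    r <- rank(score) # average ranks handle ties
    (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 4 genes, true edges 1-2 and 3-4
truth <- matrix(0, 4, 4)
truth[1, 2] <- truth[2, 1] <- 1
truth[3, 4] <- truth[4, 3] <- 1

weights <- matrix(0, 4, 4)
weights[1, 2] <- 0.9 # true edge
weights[1, 3] <- 0.2
weights[2, 3] <- 0.6
weights[1, 4] <- 0.1
weights[2, 4] <- 0.3
weights[3, 4] <- 0.5 # true edge
weights <- weights + t(weights)

auc <- auc_from_weights(weights, truth)
auc
```

Of the 8 (true edge, non-edge) pairs, 7 are ranked correctly, so the AUC is 7 / 8 = 0.875.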

4.1. Precision–Recall and Graph Visualization

Binarize the weighted networks, score them against the ground truth (TPR, FPR, precision, F1, and MCC), and visualize the resulting binary networks.

# Binary cutoff at 95th percentile
binary_listj <- cutoff_adjacency(
    count_matrices = count_matrices,
    weighted_adjm_list = swadj_list,
    n = 2,
    method = "GENIE3",
    quantile_threshold = 0.95,
    nCores = 1,
    debug = TRUE
)
#> [Method: GENIE3] Matrix 1 → Cutoff = 0.05923
#> [Method: GENIE3] Matrix 2 → Cutoff = 0.06142
#> [Method: GENIE3] Matrix 3 → Cutoff = 0.06199
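As a simplified picture of percentile-based binarization, the base-R sketch below keeps only the edges whose weight exceeds the 95th percentile of the observed weights. Note that `cutoff_adjacency()` also takes the count matrices and `n = 2`, so it may derive its cutoff differently (for example from a null distribution); the toy data and this plain thresholding are assumptions for illustration only.

```r
# Toy symmetric weight matrix for 20 hypothetical genes
set.seed(1)
p <- 20
w <- matrix(runif(p * p), p, p)
w <- (w + t(w)) / 2
diag(w) <- 0

# 95th percentile of the undirected edge weights (upper triangle)
edge_weights <- w[upper.tri(w)]
cutoff <- quantile(edge_weights, probs = 0.95, names = FALSE)

# Keep only edges strictly above the cutoff
b <- (w > cutoff) * 1L
diag(b) <- 0L
n_edges <- sum(b[upper.tri(b)])
n_edges
```

With 190 candidate pairs, a 95th-percentile cutoff retains the top 10 edges.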

# Precision scores
pscores_joint <- pscores(adj_truth, binary_listj)

result Graphs

head(pscores_joint)
#> $Statistics
#>   Predicted_Matrix TP  TN FP FN       TPR        FPR Precision        F1
#> 1         Matrix 1 17 539 25 14 0.5483871 0.04432624 0.4047619 0.4657534
#> 2         Matrix 2  7 544 20 24 0.2258065 0.03546099 0.2592593 0.2413793
#> 3         Matrix 3  4 537 27 27 0.1290323 0.04787234 0.1290323 0.1290323
#>          MCC
#> 1 0.43733694
#> 2 0.20323892
#> 3 0.08115992
#> 
#> $Radar
#> $Radar$data
#>                TPR     1-FPR Precision        F1        MCC
#> Max      1.0000000 1.0000000 1.0000000 1.0000000 1.00000000
#> Min      0.0000000 0.0000000 0.0000000 0.0000000 0.00000000
#> Matrix 1 0.5483871 0.9556738 0.4047619 0.4657534 0.43733694
#> Matrix 2 0.2258065 0.9645390 0.2592593 0.2413793 0.20323892
#> Matrix 3 0.1290323 0.9521277 0.1290323 0.1290323 0.08115992
#> 
#> $Radar$plot
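The reported statistics follow from the confusion counts by the standard definitions. As a check, this base-R sketch recomputes the row for Matrix 1 from its TP, TN, FP, and FN values; the helper `confusion_metrics()` is hypothetical and not part of the package.

```r
# Hypothetical helper: standard metrics from confusion counts
confusion_metrics <- function(TP, TN, FP, FN) {
    tpr <- TP / (TP + FN)
    fpr <- FP / (FP + TN)
    precision <- TP / (TP + FP)
    f1 <- 2 * TP / (2 * TP + FP + FN)
    mcc <- (TP * TN - FP * FN) /
        sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    c(TPR = tpr, FPR = fpr, Precision = precision, F1 = f1, MCC = mcc)
}

# Counts taken from the "Matrix 1" row of the output above
m1 <- confusion_metrics(TP = 17, TN = 539, FP = 25, FN = 14)
round(m1, 4)
```

The values agree with the table: for example, precision is 17 / 42 ≈ 0.4048 and F1 is 34 / 73 ≈ 0.4658.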

# Network plot
plotg(binary_listj)

result Graphs

5. Consensus Networks and Community Similarity

Aggregate inferred binaries by majority vote and compare community structure to ground truth.

# Consensus matrix
consensus <- create_consensus(binary_listj, method = "vote")

plotg(list(consensus))
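The idea behind a majority vote can be sketched in a few lines of base R: an edge enters the consensus if it appears in more than half of the binary networks. The helpers `make_adj()` and `majority_vote()` and the 0.5 threshold are assumptions for illustration; the exact rule used by `create_consensus(method = "vote")` may differ.

```r
# Hypothetical helper: build a symmetric 0/1 matrix from an edge list
make_adj <- function(edges, nodes = c("A", "B", "C")) {
    m <- matrix(0L, length(nodes), length(nodes),
        dimnames = list(nodes, nodes))
    for (e in edges) {
        m[e[1], e[2]] <- 1L
        m[e[2], e[1]] <- 1L
    }
    m
}

# Hypothetical helper: keep edges present in more than `threshold` of inputs
majority_vote <- function(adj_list, threshold = 0.5) {
    freq <- Reduce(`+`, adj_list) / length(adj_list)
    (freq > threshold) * 1L
}

nets <- list(
    make_adj(list(c("A", "B"), c("A", "C"), c("B", "C"))),
    make_adj(list(c("A", "B"), c("A", "C"))),
    make_adj(list(c("A", "B")))
)
cons <- majority_vote(nets)
cons
```

A-B appears in all three networks and A-C in two, so both are kept; B-C appears only once and is dropped.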

consensus Graph

# Note: This section may require internet connectivity for STRINGdb validation
# To run this section, set eval=TRUE and ensure internet connectivity

# Compare consensus to truth
evaluate_consensus <- compare_consensus(
    consensus_matrix = consensus,
    reference_matrix = adj_truth,
    false_plot = FALSE
)

# Community detection

comm_truth <- community_path(adj_truth)
#> Detecting communities...

Community detection adj_truth

#> Running pathway enrichment...
#> 'select()' returned 1:1 mapping between keys and columns
#> Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
#> Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
comm_cons <- community_path(consensus)
#> Detecting communities...

Community detection consensus

#> Running pathway enrichment...
# Similarity
sim_score <- community_similarity(comm_truth, list(comm_cons))
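One simple way to quantify agreement between two community assignments is the Rand index: the fraction of gene pairs that are either grouped together in both partitions or separated in both. The sketch below uses it as a stand-in; which measures `community_similarity()` actually reports is not shown here, and the membership vectors are made up.

```r
# Hypothetical helper: Rand index between two membership vectors
rand_index <- function(a, b) {
    stopifnot(length(a) == length(b))
    pairs <- utils::combn(length(a), 2)
    same_a <- a[pairs[1, ]] == a[pairs[2, ]]
    same_b <- b[pairs[1, ]] == b[pairs[2, ]]
    mean(same_a == same_b)
}

# Toy community memberships for six genes (same gene order in both)
truth_comm <- c(G1 = 1, G2 = 1, G3 = 1, G4 = 2, G5 = 2, G6 = 2)
cons_comm <- c(G1 = 1, G2 = 1, G3 = 2, G4 = 2, G5 = 2, G6 = 2)

ri <- rand_index(truth_comm, cons_comm)
ri
```

Moving G3 to the other community changes the status of 5 of the 15 gene pairs, giving a Rand index of 10 / 15 ≈ 0.667.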

Figures: consensus graph compared with the reference

5.1. Edge Mining

Identify true positive edges in the consensus network using edge mining.

em <- edge_mining(list(consensus), adj_truth, query_edge_types = "TP")
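Conceptually, edge types come from comparing each gene pair in the predicted and reference matrices. The following base-R sketch labels edges as TP, FP, or FN on toy data; `classify_edges()` is a hypothetical helper that assumes both matrices share the same gene order, and it omits any external annotation that `edge_mining()` may add.

```r
# Hypothetical helper: label each edge found in either network
classify_edges <- function(pred, truth) {
    idx <- which(upper.tri(truth), arr.ind = TRUE)
    in_pred <- pred[idx] == 1
    in_truth <- truth[idx] == 1
    keep <- in_pred | in_truth
    data.frame(
        gene1 = rownames(truth)[idx[keep, "row"]],
        gene2 = colnames(truth)[idx[keep, "col"]],
        type = ifelse(in_pred[keep] & in_truth[keep], "TP",
            ifelse(in_pred[keep], "FP", "FN")),
        stringsAsFactors = FALSE
    )
}

g <- c("A", "B", "C", "D")
truth <- matrix(0, 4, 4, dimnames = list(g, g))
truth["A", "B"] <- truth["B", "A"] <- 1
truth["C", "D"] <- truth["D", "C"] <- 1

pred <- matrix(0, 4, 4, dimnames = list(g, g))
pred["A", "B"] <- pred["B", "A"] <- 1
pred["A", "C"] <- pred["C", "A"] <- 1

edges <- classify_edges(pred, truth)
edges[edges$type == "TP", ]
```

In this toy case A-B is a true positive, A-C a false positive, and C-D a false negative.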
# Session information
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 20.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Rome
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] celldex_1.16.0              SingleR_2.8.0              
#>  [3] org.Hs.eg.db_3.20.0         AnnotationDbi_1.68.0       
#>  [5] scater_1.34.1               scuttle_1.16.0             
#>  [7] TENxPBMCData_1.24.0         HDF5Array_1.34.0           
#>  [9] rhdf5_2.50.2                DelayedArray_0.32.0        
#> [11] SparseArray_1.6.2           S4Arrays_1.6.0             
#> [13] abind_1.4-8                 Matrix_1.7-3               
#> [15] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0
#> [17] Biobase_2.66.0              GenomicRanges_1.58.0       
#> [19] GenomeInfoDb_1.42.3         IRanges_2.40.1             
#> [21] S4Vectors_0.44.0            BiocGenerics_0.52.0        
#> [23] MatrixGenerics_1.18.1       matrixStats_1.5.0          
#> [25] ggraph_2.2.1                ggplot2_3.5.2              
#> [27] igraph_2.1.4                scGraphVerse_0.99.0        
#> [29] BiocStyle_2.34.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] graph_1.84.1              hash_2.2.6.3             
#>   [3] ica_1.0-3                 plotly_4.11.0            
#>   [5] Formula_1.2-6             zlibbioc_1.52.0          
#>   [7] tidyselect_1.2.1          bit_4.6.0                
#>   [9] doParallel_1.0.17         lattice_0.20-44          
#>  [11] blob_1.2.4                stringr_1.5.1            
#>  [13] parallel_4.4.2            hdrcde_3.4               
#>  [15] png_0.1-8                 plotrix_3.8-4            
#>  [17] cli_3.6.5                 ggplotify_0.1.2          
#>  [19] askpass_1.2.1             goftest_1.2-3            
#>  [21] pkgdown_2.1.3             textshaping_1.0.1        
#>  [23] purrr_1.1.0               BiocNeighbors_2.0.1      
#>  [25] fdatest_2.1.1             uwot_0.2.3               
#>  [27] curl_6.4.0                deSolve_1.40             
#>  [29] mime_0.13                 evaluate_1.0.4           
#>  [31] tidytree_0.4.6            gsubfn_0.7               
#>  [33] stringi_1.8.7             pROC_1.18.5              
#>  [35] backports_1.5.0           desc_1.4.3               
#>  [37] multinet_4.2.2            XML_3.99-0.18            
#>  [39] Exact_3.3                 httpuv_1.6.16            
#>  [41] magrittr_2.0.3            clusterProfiler_4.14.6   
#>  [43] rappdirs_0.3.3            splines_4.4.2            
#>  [45] mclust_6.1.1              rainbow_3.8              
#>  [47] pcaPP_2.0-5               rentrez_1.2.4            
#>  [49] dplyr_1.1.4               networkD3_0.4.1          
#>  [51] sctransform_0.4.2         rootSolve_1.8.2.4        
#>  [53] ggbeeswarm_0.7.2          DBI_1.2.3                
#>  [55] jquerylib_0.1.4           reactome.db_1.89.0       
#>  [57] withr_3.0.2               class_7.3-19             
#>  [59] systemfonts_1.2.3         enrichplot_1.26.6        
#>  [61] lmtest_0.9-40             tidygraph_1.3.1          
#>  [63] BiocManager_1.30.26       htmlwidgets_1.6.4        
#>  [65] fs_1.6.6                  ggrepel_0.9.6            
#>  [67] labeling_0.4.3            cellranger_1.1.0         
#>  [69] lmom_3.2                  reticulate_1.43.0        
#>  [71] robin_2.1.0               zoo_1.8-14               
#>  [73] XVector_0.46.0            knitr_1.50               
#>  [75] UCSC.utils_1.2.0          foreach_1.5.2            
#>  [77] fda_6.3.0                 patchwork_1.3.1          
#>  [79] caTools_1.18.3            grid_4.4.2               
#>  [81] data.table_1.17.8         ggtree_3.14.0            
#>  [83] R.oo_1.27.1               RSpectra_0.16-2          
#>  [85] irlba_2.3.5.1             alabaster.schemas_1.6.0  
#>  [87] fastDummies_1.7.5         gridGraphics_0.5-1       
#>  [89] DescTools_0.99.60         lazyeval_0.2.2           
#>  [91] yaml_2.3.10               survival_3.2-11          
#>  [93] scattermore_1.2           BiocVersion_3.20.0       
#>  [95] crayon_1.5.3              RcppAnnoy_0.0.22         
#>  [97] RColorBrewer_1.1-3        tidyr_1.3.1              
#>  [99] progressr_0.15.1          tweenr_2.0.3             
#> [101] later_1.4.2               ggridges_0.5.6           
#> [103] fds_1.8                   codetools_0.2-18         
#> [105] Seurat_5.3.0              KEGGREST_1.46.0          
#> [107] Rtsne_0.17                shape_1.4.6.1            
#> [109] ReactomePA_1.50.0         filelock_1.0.3           
#> [111] INetTool_0.1.1            data.tree_1.1.0          
#> [113] sqldf_0.4-11              pkgconfig_2.0.3          
#> [115] spatstat.univar_3.1-4     ggpubr_0.6.1             
#> [117] aplot_0.2.8               alabaster.base_1.6.1     
#> [119] spatstat.sparse_3.1-0     ape_5.8-1                
#> [121] viridisLite_0.4.2         xtable_1.8-4             
#> [123] car_3.1-3                 plyr_1.8.9               
#> [125] httr_1.4.7                tools_4.4.2              
#> [127] globals_0.18.0            SeuratObject_5.1.0       
#> [129] beeswarm_0.4.0            broom_1.0.8              
#> [131] nlme_3.1-152              dbplyr_2.5.0             
#> [133] ExperimentHub_2.14.0      r2r_0.1.2                
#> [135] digest_0.6.37             qpdf_1.4.1               
#> [137] bookdown_0.43             farver_2.1.2             
#> [139] tzdb_0.5.0                reshape2_1.4.4           
#> [141] ks_1.15.1                 yulab.utils_0.2.0        
#> [143] viridis_0.6.5             glue_1.8.0               
#> [145] cachem_1.1.0              BiocFileCache_2.14.0     
#> [147] polyclip_1.10-7           generics_0.1.4           
#> [149] Biostrings_2.74.1         mvtnorm_1.3-3            
#> [151] proto_1.0.0               parallelly_1.45.1        
#> [153] RcppHNSW_0.6.0            ragg_1.4.0               
#> [155] ScaledMatrix_1.14.0       carData_3.0-5            
#> [157] pbapply_1.7-4             httr2_1.1.2              
#> [159] glmnet_4.1-10             spam_2.11-1              
#> [161] gson_0.1.0                STRINGdb_2.18.0          
#> [163] graphlayouts_1.2.2        gtools_3.9.5             
#> [165] readxl_1.4.5              alabaster.se_1.6.0       
#> [167] ggsignif_0.6.4            gridExtra_2.3            
#> [169] shiny_1.11.1              GenomeInfoDbData_1.2.13  
#> [171] R.utils_2.13.0            rhdf5filters_1.18.1      
#> [173] RCurl_1.98-1.17           memoise_2.0.1            
#> [175] rmarkdown_2.29            fmsb_0.7.6               
#> [177] scales_1.4.0              R.methodsS3_1.8.2        
#> [179] gld_2.6.7                 gypsum_1.2.0             
#> [181] future_1.58.0             RANN_2.6.2               
#> [183] spatstat.data_3.1-6       rstudioapi_0.17.1        
#> [185] cluster_2.1.2             perturbR_0.1.3           
#> [187] spatstat.utils_3.1-5      hms_1.1.3                
#> [189] fitdistrplus_1.2-4        cowplot_1.2.0            
#> [191] colorspace_2.1-1          rlang_1.1.6              
#> [193] GENIE3_1.28.0             sparseMatrixStats_1.18.0 
#> [195] DelayedMatrixStats_1.28.1 dotCall64_1.2            
#> [197] ggforce_0.5.0             ggtangle_0.0.7           
#> [199] xfun_0.52                 alabaster.matrix_1.6.1   
#> [201] e1071_1.7-16              iterators_1.0.14         
#> [203] randomForest_4.7-1.2      GOSemSim_2.32.0          
#> [205] tibble_3.3.0              treeio_1.30.0            
#> [207] Rhdf5lib_1.28.0           readr_2.1.5              
#> [209] bitops_1.0-9              promises_1.3.3           
#> [211] RSQLite_2.4.2             qvalue_2.38.0            
#> [213] fgsea_1.32.4              proxy_0.4-27             
#> [215] GO.db_3.20.0              compiler_4.4.2           
#> [217] alabaster.ranges_1.6.0    forcats_1.0.0            
#> [219] distributions3_0.2.2      boot_1.3-28              
#> [221] beachmat_2.22.0           graphite_1.52.0          
#> [223] listenv_0.9.1             Rcpp_1.1.0               
#> [225] AnnotationHub_3.14.0      BiocSingular_1.22.0      
#> [227] tensor_1.5.1              MASS_7.3-65              
#> [229] BiocParallel_1.40.2       spatstat.random_3.4-1    
#> [231] R6_2.6.1                  fastmap_1.2.0            
#> [233] fastmatch_1.1-6           rstatix_0.7.2            
#> [235] vipor_0.4.7               ROCR_1.0-11              
#> [237] rsvd_1.0.5                gtable_0.3.6             
#> [239] KernSmooth_2.23-20        miniUI_0.1.2             
#> [241] deldir_2.0-4              htmltools_0.5.8.1        
#> [243] bit64_4.6.0-1             spatstat.explore_3.5-2   
#> [245] lifecycle_1.0.4           sass_0.4.10              
#> [247] vctrs_0.6.5               spatstat.geom_3.5-0      
#> [249] DOSE_4.0.1                haven_2.5.5              
#> [251] ggfun_0.2.0               sp_2.2-0                 
#> [253] future.apply_1.20.0       pracma_2.4.4             
#> [255] bslib_0.9.0               pillar_1.11.0            
#> [257] gplots_3.2.0              jsonlite_2.0.0           
#> [259] expm_1.0-0                chron_2.3-62