An introduction to Spatial Transcriptomics approaches

Spatial transcriptomics is a rapidly evolving field that aims to provide a spatially resolved gene expression profile of a tissue or organ. This technology has the potential to advance our understanding of complex biological processes and help identify new biomarkers for disease diagnosis and treatment. The main goal of spatial transcriptomics is to capture the gene expression profile of individual cells (or a mini mixture of cells in a given region) in their native tissue context, allowing for the identification of cell types and their spatial distribution. This information can then be used to create detailed maps of gene expression within tissues, providing insights into cellular interactions, developmental processes, and disease progression. In this notebook, we will cover practical steps in setting up a spatial transcriptomics analysis pipeline using the Seurat package. We will cover the basic analysis to recover gene expression in different regions as well as cell type deconvolution approaches.



Spatial transcriptomics

Tutorial links

Explore the following tutorials to get started with spatial transcriptomics analysis:

  • Spatial Features STUtilit. This tutorial covers the extraction and analysis of spatial features using the STUtility package.
  • Seurat Spatial Vignette Learn. How to perform spatial transcriptomics analysis using the Seurat package.
  • Quality Control STUtility. This tutorial focuses on quality control steps for spatial transcriptomics data using the STUtility package.
  • Download the Raw Data and Code Used in the Paper

    To access the raw data and the code used in the relevant paper, visit the following link: Zenodo Dataset


    These resources will provide you with a comprehensive understanding of spatial transcriptomics and guide you through the analysis process.

    
    # Downloads the script from the specified URL and saves it as "add_cranapt_jammy.sh"
    download.file("https://github.com/eddelbuettel/r2u/raw/master/inst/scripts/add_cranapt_jammy.sh", 
                  "add_cranapt_jammy.sh")
    Sys.chmod("add_cranapt_jammy.sh", "0755")  # Grants execute permissions (rwxr-xr-x) to the script
    system("./add_cranapt_jammy.sh")  # Executes the downloaded script in the system shell
                    
    
    # Installs various development libraries for text rendering, font management, and cryptographic operations.
    system("apt install libfontconfig1-dev libharfbuzz-dev libfribidi-dev libcairo2-dev libgmp-dev libcurl3-gnutls -y")  
    
    # Installs image processing libraries (JPEG, PNG, TIFF) and the GNU Scientific Library (GSL).
    system("apt install libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libbz2-dev libgsl-dev gsl-bin -y")  
    
    system("apt update")  # Updates the package list to ensure the latest versions are available.
    
    # Installs ImageMagick development files, required for image manipulation.
    system("apt install libmagick++-dev -y")
                    
    
    # Returns details about the installed R version, including major/minor versions, release date, and system architecture.
    R.Version()
                    
    
    # Sets a higher timeout limit to avoid interruptions during package downloads.
    options(timeout=1000)
    
    # Installs the "STUtility" package from GitHub with forced upgrade
    remotes::install_github("jbergenstrahle/STUtility", upgrade=T, force = TRUE)  
    
    # Installs "SPOTlight" from GitHub without upgrading existing dependencies.
    remotes::install_github("https://github.com/MarcElosua/SPOTlight", upgrade=F)  
    
    # Checks if "BiocManager" is installed; if not, installs it
    if (!require("BiocManager", quietly = TRUE))  
        install.packages("BiocManager", quiet = T)  
    
    # Installs "ReactomePA" for pathway enrichment analysis.
    BiocManager::install("ReactomePA", update = TRUE, quiet = T)  
    
    # Installs "ggpubr", an extension of ggplot2 for easy publication-ready plots
    install.packages("ggpubr")
    
    # Installs "clustree", a package for visualizing cluster trees
    install.packages("clustree", quiet = T)
    
    # Installs the human gene annotation database from Bioconductor
    BiocManager::install("org.Hs.eg.db", update = FALSE, quiet = T)  
    
    # Installs "enrichR" for gene enrichment analysis
    install.packages("enrichR", quiet = T)
    
    # Installs "SingleCellExperiment" for single-cell RNA-seq analysis.
    BiocManager::install('SingleCellExperiment', update = FALSE, quiet = T)
    
    # Installs "hdf5r", used for reading and writing HDF5 files.
    install.packages("hdf5r", quiet = T)
    
    # Installs multiple packages: DT (interactive tables), ggcorrplot (correlation plots), and scatterpie (scatter plots with pie charts).
    install.packages(c('DT','ggcorrplot','scatterpie'), quiet = T)  
    
    # Installs a specific version (1.5-3) of the "Matrix" package directly from CRAN archives.
    install.packages("https://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.5-3.tar.gz", repos=NULL, type="source")
                    
    
    # Suppresses package loading messages to keep output clean.
    suppressPackageStartupMessages({  
    library(Seurat)  # Single-cell RNA-seq analysis framework.
    library(data.table)  # Efficient handling of large datasets.
    library(ggplot2)  # Grammar of graphics-based visualization.
    library(plotly)  # Interactive visualizations.
    library(RColorBrewer)  # Predefined color palettes.
    library(dplyr)  # Data manipulation package.
    library(STutility)  # Spatial transcriptomics toolkit.
    library(clustree)  # Visualization of cluster resolutions.
    library(ReactomePA)  # Pathway analysis.
    library(org.Hs.eg.db)  # Human gene annotation database.
    library(ggpubr)  # Publication-ready visualizations.
    library(enrichR)  # Gene enrichment analysis.
    library(stringr)  # String manipulation.
    library(patchwork)  # Combine multiple plots.
    library(SingleCellExperiment)  # Framework for single-cell data.
    library(SPOTlight)  # Deconvolution of spatial transcriptomics data.
    })
                    

    Download the data

    
    # Executes shell command and captures output.
    shell_call <- function(command, ...) {  
      result <- system(command, intern = TRUE, ...) 
      cat(paste0(result, collapse = "\n"))  # Prints the output in a readable format.
    }
                    
    
    # Prevents timeouts for large downloads.
    options(timeout=1000)
    
    # Downloads a ZIP file containing spatial transcriptomics exercises.
    download.file('https://iauchile-my.sharepoint.com/:u:/g/personal/adolfo_rh_postqyf_uchile_cl/EfjJEURo--BOiy13A9w1FwgBcFnFIUTccVr4F6G-hBcHBg?e=VTLZb7&download=1', 
                  'ST_Exercises.zip')  
    
    # Lists files in the current directory with human-readable file sizes.
    shell_call("ls -lh")
    
    # Unzips the downloaded file.
    shell_call("unzip ST_Exercises.zip")
                    

    RECOVER THE PATHOLOGIST ANNOTATIONS

    Load the data set

    
    # Loads a Seurat object.
    load(file = "ST_Exercises/Exercise1_dataset.RData")
    # Get some informations about the dataset
    # This command retrieves the current default assay for the Seurat object Her2p
    DefaultAssay(Her2p)
    # Displays the number of features (genes) and cells.
    dim(Her2p)
    # This command generates a frequency table of the ids column in the metadata slot of the Seurat object Her2p.
    table(Her2p@meta.data$ids)
                    

    Quality control (QC) plots

    Create the color palette

    
    # Create a color palette function using the colorRampPalette function from the grDevices package, and colors from the "Set1" palette from the RColorBrewer package.
    getPalette = colorRampPalette(brewer.pal(8, "Set1"))
    color = getPalette(6)
    VlnPlot(Her2p, # Generates a violin plot
           features = "nCount_RNA", # Specifies the feature to plot
           group.by = "ids", # Groups the data by the ids metadata column.
           pt.size = 0.1, # Sets the size of the points in the plot.
           cols = color) + # Specifies the colors to use for the different groups.
      stat_summary(fun.y=mean, geom="point", shape=95, size=15, color = "black") + NoLegend()
      # Adds a layer to the plot using stat_summary to calculate and display the mean of the feature.
                    

    Plot the metrics

    
    # Violin plot with mean values added.
    VlnPlot(Her2p, features = "nCount_RNA", group.by = "ids", 
            pt.size = 0.1, cols = color) +
      stat_summary(fun.y=mean, geom="point", shape=95, 
                  size=15, color = "black") + NoLegend()
    
    # Scatter plot comparing RNA counts to the number of features (genes).
    FeatureScatter(object = Her2p, feature1 = "nCount_RNA", 
    feature2 = "nFeature_RNA", group.by = "ids", cols = color)
    
    # Overlays RNA count distribution on spatial coordinates.
    FeatureOverlay(Her2p, features = c("nCount_RNA"),
                   sampleids = 1:3,
                   pt.size = 2.5,
                   add.alpha = TRUE,
                   ncol = 3, min.cutoff = 0, max.cutoff = 2000)
                    
    
    #Exercice: Save the image to a file
                    

    NOTE: Normalization, PCA, dimensionality reduction, and UMAP already applied on this object

    Clustering

    
    # Display the installed version of the Matrix package
    packageVersion("Matrix")
                    
    
    # Perform the Seurat graph-based clustering
    # Find the nearest neighbors for each cell using Non-negative Matrix Factorization (NMF)
    Her2p <- FindNeighbors(object = Her2p, 
                          dims = 1:10, # Indicates that the first 10 dimensions from the specified reduction method will be used.
                         reduction = "NMF", # Specifies that Non-negative Matrix Factorization (NMF) was used as the dimensionality reduction method.
                         verbose = FALSE) #Suppresses the detailed output
    
    # Specifies a sequence of resolutions ranging from 0 to 2, with an increment of 0.1
    Her2p <- FindClusters(Her2p, resolution = seq(0, 2, by = 0.1),
                          verbose = FALSE)
    
    # Visualize clustering results at different resolutions using Clustree
    clustree(Her2p, prefix = "SCT_snn_res."))
                    
    
    # Create a UMAP plot grouped by "ids", without cluster labels
    color = getPalette(9) # Generates a vector of 9 colors using the getPalette function defined earlier.
    DimPlot(Her2p, reduction = "umap", label = FALSE, group.by = "ids",
            pt.size = 2, cols=color, label.size=13)
                    
    
    # Plot only the slide 3 of the patient G
    # Overlay the expression of one or more features onto spatial coordinates.
    FeatureOverlay(Her2p, # The Seurat object containing your data.
                   features = c("SCT_snn_res.0.5"), # Specifies the features to overlay on the spatial coordinates. Clustering resolution 0.5
                   sampleids = 2, # Indicates that only the slide with ID 2 should be used for this plot.
                   pt.size = 3.5,
                   cols = color,
                   add.alpha = TRUE, # Add transparency to points
                   ncol = 1)
                    
    
    # Plot only the 3 slide of the patient G
    FeatureOverlay(Her2p, features = c("SCT_snn_res.0.4"),
                   sampleids = 1:3, # Use slides 1 to 3
                   pt.size = 3.5,
                   cols = color,
                   add.alpha = TRUE,
                   ncol = 3) # Arrange plots in 3 columns
                    
    
    #Exercice: Save the image to a file
                    
    
    # Plot the clusters without the HE
    ST.FeaturePlot(object = Her2p, features = "SCT_snn_res.0.4", 
                   cols = color, pt.size = 4, ncol = 3)
                    
    
    # Split the view for each cluster
    ST.FeaturePlot(object = Her2p, features = "SCT_snn_res.0.4", pt.size = 4,
                   split.labels = T, # Split visualization by cluster labels
                   indices = 2, # Select a specific set of indices
                   show.sb = FALSE, # Hide scale bars
                   ncol = 4, 
                   cols = color)
                    

    Differentially expressed genes

    
    DefaultAssay(Her2p) # Checks the active assay for the Seurat object Her2p
    Idents(Her2p) <- "SCT_snn_res.0.4" # Sets the identity class for the Seurat object Her2p based on the clustering resolution "SCT_snn_res.0.4".
    
    # Comparison of all clusters vs all clusters
    Her2p.markers <- FindAllMarkers(object = Her2p, # Identify DEGS in Her2p
                                    only.pos = TRUE, # Only positive markers genes that are upregulated in the cluster
                                    min.pct = 0.10, # Genes must be expressed in at least 10% of cells in either of the two groups being compared
                                    logfc.threshold = 0.10) # Set threshold for identifying DEGs
    
    # Filter markers with an adjusted p-value < 0.05
    Her2p.markers = Her2p.markers[which(Her2p.markers$p_val_adj<0.05),] # Filters the identified markers to retain only those with an adjusted p-value
    
    # Count the number of DEGs per cluster
    table(Her2p.markers[, "cluster"]) #How many DEGs per cluster ?
                    
    
    # Select the top 10 DEG of each cluster
    Her2p.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_log2FC) -> top10
    
    # Generate a dot plot for the selected genes
    DotPlot(Her2p, features = unique(top10$gene), 
            group.by = "SCT_snn_res.0.4", cols = c('#b8d8d8', '#e71d36'),
            dot.scale = 6, col.min = 0) + 
            theme(axis.text.x = element_text(face = "bold", color = c("black"), 
            size = 15,angle = 90))
                    
    
    #Exercice: Save the image to a file
                    
    
    # Plot key genes in the ridge plot
    RidgePlot(Her2p, assay = "SCT", 
              features = c("IFI27","IFI6"), # Genes of interest
              ncol = 2, group.by = "SCT_snn_res.0.4", 
              cols = color)
                    
    
    # UMAP and Heatmap visualization
    # Define color gradient for heatmap
    heatmap.colors <- c("lightgray", "mistyrose", "red", "darkred", "black")
    fts <- c("EPN2", "AGR3") # List of genes to visualize
    # Generate UMAP plots for selected features
    p.fts <- lapply(fts, function(ftr) {
      FeaturePlot(Her2p, features = ftr, reduction = "umap", order = TRUE, cols = heatmap.colors, pt.size = )
    })
    # Ṕlot transformed features expression on Visium coordinates
    p3 <- ST.FeaturePlot(Her2p, features = fts, ncol = 2, grid.ncol = 1, cols = heatmap.colors, pt.size = 2, show.sb = FALSE)
    # Combine all plots into one figure
    cowplot::plot_grid(cowplot::plot_grid(plotlist = p.fts, ncol = 1), p3, ncol = 2, rel_widths = c(1, 1.3))
                    
    
    # Generate a violin plot for the expression of the genes "EPN2" and "AGR3"  
    # in the Seurat object 'Her2p', grouping by the clustering resolution "SCT_snn_res.0.4".  
    VlnPlot(Her2p, features = c("EPN2", "AGR3"),   
            group.by = "SCT_snn_res.0.4", # Define grouping variable (cluster resolution 0.4)  
            pt.size = 0,  # Set point size to 0 to avoid plotting individual points  
            cols = color, # Use predefined colors for groups  
            ncol = 2) +  # Arrange plots in two columns  
    
      # Overlay a summary statistic (mean) as a horizontal bar  
      stat_summary(fun.y = mean, geom = "point", shape = 95,   
                   size = 15, color = "black") +  
    
      # Remove the legend from the plot  
      NoLegend()  
                    

    3D visualization

    It doesn't run in colab

    
    # Update image paths in the Seurat object
    Her2p@tools$Staffli@imgs <- gsub("//", "/", Her2p@tools$Staffli@imgs)
    Her2p@tools$Staffli@imgs <- gsub("../data/", "ST_Exercises/her2st-master/data/", Her2p@tools$Staffli@imgs)
                    
    
    # Generate a 3D stack of spatial transcriptomics data
    Her2p <- Create3DStack(Her2p)
    
    # Extract scatter plot data for spatial visualization
    stack_3d <- setNames(GetStaffli(Her2p)@scatter.data, c("x", "y", "z", "grid.cell"))
    
    # Visualize the spatial distribution of points across z-sections
    ggplot(stack_3d, aes(x, 2e3 - y)) +
      geom_point(size = 0.1, color = "gray") +
      facet_wrap(~z, ncol = 1) +
      theme_void()
    
    # Interpolate data for 3D visualization of gene expression
    interpolated.data <- FeaturePlot3D(Her2p, features = "IFI6", return.data = TRUE)
    
    # Plot interpolated gene expression levels
    ggplot(interpolated.data, aes(x, 2e3 - y, color = val)) +
      geom_point(size = 0.1) +
      facet_wrap(~z, ncol = 1) +
      theme_void() +
      ggtitle("IFI6") +
      scale_color_gradientn(colours = c("black", "dark blue", "cyan", "yellow", "red", "dark red"))
    
    # 3D visualization based on HE nuclei estimation
    FeaturePlot3D(Her2p, features = "IFI6", pt.size = 1.6, pts.downsample = 5e4)
    
    # Uncomment below to visualize clusters in 3D using HSV plot
    # selected.factors <- paste0("factor_", c(1:10))
    # HSVPlot3D(Her2p, features = selected.factors, pt.size = 3, add.margins = 1, mode = "spots")
                    

    Gene Ontology (GO) Terms

    
    # Select differentially expressed genes (DEGs) from cluster 6
    geneList = Her2p.markers[which(Her2p.markers$cluster == 6), "gene"]
    length(geneList)  # Print the number of selected genes
    
    # Convert Gene Symbols to Entrez IDs
    columns(org.Hs.eg.db)  # Display available conversion columns in the annotation database
    
    symbol <- mapIds(org.Hs.eg.db,
                     keys = geneList,      # List of gene symbols to convert
                     column = "ENTREZID",  # Convert to Entrez IDs
                     keytype = "SYMBOL",   # Input type: gene symbol
                     multiVals = "first")  # In case of multiple matches, take the first
    
    symbol = as.vector(symbol)  # Convert to vector format
    
    # Remove genes with missing Entrez IDs
    geneList = symbol
    geneList = geneList[which(geneList != "NA")]
    length(geneList)  # Print the number of genes after filtering
    
    # Perform pathway enrichment analysis using ReactomePA
    x <- enrichPathway(gene = geneList, 
                       organism = "human", 
                       pvalueCutoff = 0.05, 
                       readable = TRUE, 
                       pAdjustMethod = "bonferroni")
    
    head(as.data.frame(x))  # Show the first few results
    
    # Order results by adjusted p-value (ascending)
    x@result = x@result[order(x@result$p.adjust, decreasing = FALSE),]
    
    # Select the top 30 most significant pathways
    x@result = x@result[1:30,]
    
    # Order pathways by the number of genes involved (ascending)
    x@result = x@result[order(x@result$Count, decreasing = FALSE),]
    
    # Store the ordered results in a new data frame
    newbar.dt = x@result
    
    # Create a bar plot of enriched pathways
    cluster6_enrichpathway <- ggbarplot(newbar.dt, x = "Description", y = "Count",
              fill = "Count",  # Color bars based on the count of genes
              color = "white",  # Set bar border colors to white
              sort.val = "asc",  # Sort by value in ascending order
              sort.by.groups = FALSE,  # Do not sort within each group
              x.text.angle = 90,  # Rotate x-axis labels for better visibility
              xlab = "Pathways",
              ylab = "Number of genes",
              legend.title = "Counts",
              rotate = TRUE,
              ggtheme = theme_minimal()
    ) + scale_fill_continuous(low = "#bde0fe", high = color[4]) + 
      theme(text = element_text(size = 20))
                    

    Pathways analysis with EnrichR

    
    # List available databases in EnrichR
    listEnrichrSites()
    dbs <- listEnrichrDbs()
    
    # Sort the databases by name
    dbs %>% dplyr::arrange(libraryName)
    
    # Select specific pathway databases for enrichment analysis
    dbs <- c("Cancer_Cell_Line_Encyclopedia",
             "Elsevier_Pathway_Collection",
             "KEGG_2021_Human",
             "CellMarker_Augmented_2021",
             "Reactome_2016",
             "GO_Biological_Process_2018",
             "GO_Cellular_Component_2018",
             "GO_Molecular_Function_2018",
             "InterPro_Domains_2019")
    
    # EnrichR requires gene symbols (not Ensembl IDs)
    geneList = Her2p.markers[which(Her2p.markers$cluster == 6), "gene"]
    length(geneList)  # Print the number of selected genes
    
    # Perform enrichment analysis with EnrichR
    enriched.Paths <- enrichr(geneList, dbs)
    
    # Display results from selected databases
    enriched.Paths[[1]]  # Cancer_Cell_Line_Encyclopedia
    enriched.Paths[[2]]  # Elsevier_Pathway_Collection
    enriched.Paths[[3]]  # KEGG_2021_Human
    enriched.Paths[[4]]  # CellMarker_Augmented_2021
    enriched.Paths[[5]]  # Reactome_2016
    enriched.Paths[[6]]  # GO_Biological_Process_2018
    enriched.Paths[[7]]  # GO_Cellular_Component_2018
    enriched.Paths[[8]]  # GO_Molecular_Function_2018
    enriched.Paths[[9]]  # InterPro_Domains_2019
    
    # Select the pathway results from CellMarker_Augmented_2021
    Rpath = enriched.Paths[[4]]
    
    # Count the number of genes per pathway
    Gene_Count = c()
    for(x in 1:dim(Rpath)[1]){
      Gene_Count = c(Gene_Count, str_split(Rpath$Overlap[x], pattern = "/" )[[1]][1])
    }
    
    Rpath$Gene_Count = as.numeric(Gene_Count)
    
    # Order by adjusted p-value (ascending)
    Rpath = Rpath[order(Rpath$Adjusted.P.value, decreasing = FALSE),]
    
    # Filter significant pathways (adjusted p-value < 0.05)
    Rpath = Rpath[Rpath$Adjusted.P.value < 0.05,]
    
    # Select top 20 pathways
    Rpath = Rpath[1:20,]
    
    # Order pathways by gene count (descending)
    Rpath = Rpath[order(Rpath$Gene_Count, decreasing = TRUE),]
    
    # Create a bar plot of enriched pathways
    cluster6_enrichR <- ggbarplot(Rpath, x = "Term", y = "Gene_Count",
              fill = "Gene_Count",  # Color bars based on gene count
              color = "white",  # Set bar border colors to white
              sort.val = "asc",  # Sort by value in ascending order
              sort.by.groups = FALSE,  # Do not sort within each group
              x.text.angle = 90,  # Rotate x-axis labels for better visibility
              xlab = "Pathways",
              ylab = "Number of genes",
              legend.title = "Counts",
              rotate = TRUE,
              ggtheme = theme_minimal()
    ) + scale_fill_continuous(low = "#bde0fe", high = color[4]) + 
      theme(text = element_text(size = 18))
    
    # Display the plot
    cluster6_enrichR
                    

    CELL TYPES DECONVOLUTION

    Load the SC dataset

    
    # Load scRNA-seq dataset
    SC.data = readRDS("ST_Exercises/scRNASeq_pac2_processed.rds")
    
    # Set cell identities based on cell type
    Idents(SC.data) = "cellType"
    
    # UMAP visualization of cell types
    DimPlot(object = SC.data, reduction = 'umap', label = TRUE, label.size = 6,
            group.by = "cellType", pt.size = 1.5)
    
    # Identify DEGs across all clusters
    SC.markers <- FindAllMarkers(object = SC.data, only.pos = TRUE, min.pct = 0.10, logfc.threshold = 0.10) 
    # min.pct = 0.10: This argument sets the minimum percentage of cells expressing the gene to 10%.
    # logfc.threshold = 0.10: This sets the minimum log fold change threshold to 0.10 for a gene to be considered a marker.
    
    SC.markers = SC.markers[which(SC.markers$p_val_adj < 0.05 & SC.markers$avg_log2FC > 0.5),]
    # SC.markers$p_val_adj < 0.05: This condition selects markers with an adjusted p-value less than 0.05, meaning they are statistically significant.
    # SC.markers$avg_log2FC > 0.5: This condition selects markers with an average log2 fold change greater than 0.5
    
    # Count number of DEGs per cluster
    table(SC.markers$cluster)
    
    DefaultAssay(SC.data) # it returns the name of the assay currently set as the default for the Seurat object
    
    # Load 10X Visium FFPE sample
    SP = Load10X_Spatial(data.dir = "ST_Exercises/10X_Spatial_T_Breast_Block_A_Section1",
                         filename = "V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5")
                    

    QC plots and Normalization

    Normalization is a crucial preprocessing step in scRNA-seq data analysis. It involves adjusting the raw gene expression data to account for technical variations and differences in sequencing depth across cells. These comparisons should help to understand the strengths and weaknesses of each normalization method


    Method of Normalization


  • SCTransform: A more advanced method that uses negative binomial regression to remove technical variability and stabilize variance, providing a robust normalization for single-cell data
  • Log-Normalization: Scales gene expression values and applies a log transformation, which helps in stabilizing the variance and making the data more suitable for downstream analysis.

  • Removal of Technical Variability


  • SCTransform: More robust in removing technical variability, making it ideal for datasets with significant variation.
  • NormalizeData: Effective but less sophisticated than SCTransform in handling technical variability.

  • Complexity and Execution Time


  • SCTransform: More complex and time-consuming due to the regression calculations and variance-stabilizing transformation.
  • NormalizeData: Simpler and faster, making it suitable for quick analyses and large datasets.

  • Handling Covariates


  • SCTransform: Allows for the inclusion of covariates (e.g., number of genes detected per cell) for regression, improving the quality of normalized data.
  • NormalizeData: Does not typically incorporate covariates directly in the normalization process.

  • Variance Stabilization


  • SCTransform: Stabilizes variance, making the data more suitable for downstream analyses like clustering and marker identification.
  • NormalizeData: Uses log-transformation to partially stabilize variance but not as effectively as SCTransform.

  • Flexibility


  • SCTransform: Primarily focused on handling single-cell RNA-seq data with robust normalization methods.
  • NormalizeData: Flexible with different normalization methods (e.g., LogNormalize), allowing adjustments based on the specific needs of the analysis.

  • Scalability


  • SCTransform: Can handle large datasets but may be slower due to its complexity.
  • NormalizeData: Efficiently handles large datasets due to its simplicity and speed.
  • 
    # Plot number of reads per spatial spot
    plot1 <- VlnPlot(SP, features = "nCount_Spatial", pt.size = 0.1) + NoLegend() # create a violin plot
    plot2 <- SpatialFeaturePlot(SP, features = "nCount_Spatial") + theme(legend.position = "right") # Generates a spatial feature plot for the Seurat object
    wrap_plots(plot1, plot2) # This function from the patchwork package combines multiple plots (plot1 and plot2) into a single unified layout
    
    # Normalize data using SCTransform
    SP <- SCTransform(SP, assay = "Spatial", verbose = FALSE, return.only.var.genes = FALSE)
    
    # Normalize data using LogNormalize with a scale factor
    SP <- NormalizeData(object = SP, assay = "Spatial", normalization.method = "LogNormalize", 
                        scale.factor = median(SP@meta.data$nCount_Spatial))
    
    # Compare normalization methods (LogNormalize vs SCTransform)
    SP <- GroupCorrelation(SP, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
    SP <- GroupCorrelation(SP, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
    
    # Generate comparison plots
    p1 <- GroupCorrelationPlot(SP, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization")
    p2 <- GroupCorrelationPlot(SP, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization")
    
    # Display both plots side by side
    p1 + p2
                    

    Deconvolution Pipeline

    Deconvolution is a computational technique used to infer and quantify the proportions of different cell types within a mixed population of cells. Since scRNA-seq data often come from complex tissues containing multiple cell types, deconvolution helps to identify and isolate the gene expression profiles specific to each cell type within the mixture.


    Why is Deconvolution Important?


  • Identifying Cell Types: Deconvolution enables researchers to determine the presence and abundance of various cell types in a heterogeneous tissue sample.
  • Understanding Cellular Composition: It helps to elucidate the cellular composition of tissues, revealing how different cell types contribute to biological processes and disease states.
  • Improving Data Interpretation: By separating the gene expression signals of different cell types, deconvolution enhances the accuracy and interpretability of scRNA-seq data, leading to more precise biological insights.
  • 
    # Install required R packages (only run this if they are not installed)
    install.packages(c('DT', 'ggcorrplot', 'scatterpie'))
                    
    
    # Convert Seurat object to a SingleCellExperiment (SCE) object
    spe = SP  # Spatial transcriptomics dataset
    sce = as.SingleCellExperiment(SC.data)  # Convert scRNA-seq data to SCE format
    
    # Define highly variable genes (HVGs) using differentially expressed genes (DEGs)
    hvg = unique(SC.markers$gene)  # Select unique DEGs
    colLabels(sce) <- colData(sce)$cellType  # Assign cell type labels to the SCE object
    
    # Downsample cells to a maximum of 100 per cell identity
    # This avoids overrepresentation of any particular cell type
    idx <- split(seq(ncol(sce)), sce$cellType)  # Split cell indices by identity
    n_cells <- 100  # Define maximum number of cells per type
    
    cs_keep <- lapply(idx, function(i) {
      n <- length(i)
      if (n < n_cells)  # If a cell type has fewer than 100 cells, use all of them
        n_cells <- n
      sample(i, n_cells)  # Randomly sample cells
    })
    
    sce <- sce[, unlist(cs_keep)]  # Subset the dataset with the downsampled cells
                    
    
    # Perform spatial transcriptomics deconvolution using SPOTlight
    res <- SPOTlight(
      x = sce,  # scRNA-seq reference dataset
      y = spe@assays$SCT@counts,  # Spatial transcriptomics counts (SCT normalized)
      groups = sce$cellType,  # Cell type annotations from scRNA-seq
      mgs = SC.markers,  # Marker genes from scRNA-seq analysis
      hvg = hvg,  # Highly variable genes (DEGs)
      weight_id = "avg_log2FC",  # Weight based on log2 fold change
      group_id = "cluster",  # Cluster information
      gene_id = "gene"  # Gene column identifier
    )
    
    # Extract the resulting deconvolution matrix
    head(mat <- res$mat)[, seq_len(3)]  # Display first rows and first 3 columns
    
    # Check spatial transcriptomics metadata
    head(spe@meta.data)
    
    # Extract the Non-Negative Matrix Factorization (NMF) model fit
    mod <- res$NMF
                    
    
    # Display structure of the deconvolution matrix
    str(mat)
                    
    
    # Evaluate how specific each topic signature is for each cell identity
    plotTopicProfiles(
      x = mod,  # NMF model from SPOTlight
      y = sce$cellType,  # Cell type labels from scRNA-seq
      facet = FALSE,  # Show all in a single plot
      min_prop = 0.01,  # Minimum proportion to be plotted
      ncol = 1  # Number of columns in the plot
    ) +
      theme(aspect.ratio = 1)  # Maintain aspect ratio
                    
    
    # Ensure that all cells from the same cell identity share a similar topic profile
    plotTopicProfiles(
      x = mod,  # NMF model
      y = sce$cellType,  # Cell type labels
      facet = TRUE,  # Separate plots for each cell type
      min_prop = 0.01,  # Minimum proportion threshold
      ncol = 4  # Arrange plots in a grid with 4 columns
    )
                    
    
    # Extract the basis matrix from the NMF model
    # This matrix indicates which genes are most relevant for each topic
    sign <- NMF::basis(mod)
    colnames(sign) <- paste0("Topic", seq_len(ncol(sign)))  # Rename columns
    
    # Display the first few rows of marker genes for each topic
    head(sign)
    
    # Create an interactive table for exploring marker genes per topic
    DT::datatable(sign, fillContainer = TRUE, filter = "top")
    
    # Generate a spatial correlation matrix between cell types
    plotCorrelationMatrix(mat)
                    
    
    # Cell type names
    ct <- colnames(mat)
    
    # Set a threshold to filter out low proportions
    mat[mat < 0.1] <- 0
    
    # Define a color palette for cell types
    paletteMartin <- c("#000000", "#004949", "#009292", "#ff6db6", "#ffb6db", 
                       "#490092", "#006ddb", "#b66dff", "#6db6ff", "#b6dbff", 
                       "#920000", "#924900", "#db6d00", "#24ff24", "#ffff6d")
    
    # Create a gradient palette using the defined colors
    pal <- colorRampPalette(paletteMartin)(length(ct))
    names(pal) <- ct  # Assign colors to each cell type
                    
    
    # Convert the deconvolution matrix to a standard matrix format
    m2 = as.matrix(mat)
                    
    
    # Generate spatial transcriptomics plot with pie charts for cell proportions
    plotSpatialScatterpie(
      x = spe,  # Spatial transcriptomics object
      y = mat,  # Deconvolution results (cell type proportions)
      cell_types = unique(sce$cellType),  # Unique cell types
      img = FALSE,  # Do not overlay histology image
      scatterpie_alpha = 1,  # Full opacity
      pie_scale = 0.4  # Adjust pie chart size
    ) +
      scale_fill_manual(
        values = pal,  # Assign colors to cell types
        breaks = names(pal)  # Keep legend labels consistent
      )
                    
    
    # Save the spatial pie chart as a PDF file
    ggsave("Deconvolution.pdf", width = 12, height = 12)