This document demonstrates how to simulate single-cell count data with the POWSC method through the simmethods package.
Before simulating datasets, we estimate essential parameters from a real dataset so that the simulated data resemble real data. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data generated by scater::mockSCE().
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- counts(scater::mockSCE())
dim(ref_data)
# [1] 2000  200
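If you use your own data instead of the mock dataset, check that the matrix has the same layout as ref_data above: genes in rows, cells in columns, and non-negative integer counts. The helper check_ref_data() below is a hypothetical sketch written for this document and is not part of simmethods. It only handles dense matrices. The exact input requirements of POWSC_estimation() are an assumption here.

```r
# Hypothetical helper (not part of simmethods): basic sanity checks on a
# reference count matrix with genes in rows and cells in columns.
check_ref_data <- function(x) {
  # This sketch accepts dense numeric matrices only
  if (!is.matrix(x)) stop("ref_data should be a matrix (genes x cells)")
  if (!is.numeric(x)) stop("ref_data should be numeric")
  if (anyNA(x)) stop("ref_data contains NA values")
  if (any(x < 0)) stop("ref_data contains negative values")
  if (any(x != round(x))) stop("ref_data should contain integer counts")
  # Report the dimensions in the same order as dim(): genes, then cells
  c(genes = nrow(x), cells = ncol(x))
}

# Toy example: 5 genes x 4 cells of Poisson counts
set.seed(1)
toy <- matrix(rpois(20, lambda = 3), nrow = 5,
              dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4)))
check_ref_data(toy)
```

With real data, call check_ref_data(ref_data) before the estimation step.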
Use simmethods::POWSC_estimation() to run the estimation step.
estimate_result <- simmethods::POWSC_estimation(ref_data = ref_data,
                                                verbose = TRUE,
                                                seed = 10)
# Estimating parameters using POWSC
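If you plan to run several simulations from the same reference, you can cache the estimated parameters with base R's saveRDS() and readRDS() so that the estimation step does not have to be repeated. The sketch below uses a small stand-in list named toy_estimate in place of estimate_result, because the internal structure of the POWSC estimate is not described here. The file path is a temporary location chosen for illustration.

```r
# Stand-in for the object returned by simmethods::POWSC_estimation();
# in practice you would save estimate_result itself.
toy_estimate <- list(estimate_result = list(note = "placeholder parameters"))

# Save the estimate once ...
param_file <- file.path(tempdir(), "powsc_estimate.rds")
saveRDS(toy_estimate, param_file)

# ... and reload it later, e.g. in a new session, before simulating
estimate_reloaded <- readRDS(param_file)
identical(estimate_reloaded, toy_estimate)
# [1] TRUE
```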
After estimating the parameters from a real dataset, we simulate datasets based on the learned parameters under different scenarios.
The reference data contain 2000 genes and 200 cells. If we simulate with the default parameters, we obtain a dataset of the same size as the reference. In addition, the simulated dataset contains two groups of cells.
simulate_result <- simmethods::POWSC_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 200
# nGenes: 2000
# nGroups: 2
# de.prob: 0.1
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 2000  200
head(colData(SCE_result))
# DataFrame with 6 rows and 2 columns
#         cell_name       group
#       <character> <character>
# Cell1       Cell1      Group1
# Cell2       Cell2      Group1
# Cell3       Cell3      Group1
# Cell4       Cell4      Group1
# Cell5       Cell5      Group1
# Cell6       Cell6      Group1
head(rowData(SCE_result))
# DataFrame with 6 rows and 2 columns
#         gene_name     de_gene
#       <character> <character>
# Gene1       Gene1          no
# Gene2       Gene2          no
# Gene3       Gene3         yes
# Gene4       Gene4          no
# Gene5       Gene5         yes
# Gene6       Gene6          no
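You can combine the group column of colData() with the de_gene column of rowData() to inspect the simulated differences. The base R sketch below uses a small made-up matrix with the same kind of labels and computes a per-gene log2 fold change between Group2 and Group1, with a pseudocount of 1. The toy values and the pseudocount are illustrative choices, not POWSC outputs. With the real result, the inputs would be counts(SCE_result), colData(SCE_result)$group and rowData(SCE_result)$de_gene, assuming the simulated counts are stored in the counts assay.

```r
# Toy data standing in for the simulated counts and metadata:
# 4 genes x 6 cells, two groups of 3 cells each.
counts_toy <- matrix(c(
   5,  6,  4,  5,  6,  5,  # Gene1: similar in both groups
   2,  3,  2, 20, 18, 22,  # Gene2: higher in Group2
   8,  7,  9,  8,  7,  9,  # Gene3: identical group means
  30, 28, 32,  3,  4,  2   # Gene4: lower in Group2
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:6)))
group <- c("Group1", "Group1", "Group1", "Group2", "Group2", "Group2")
de_gene <- c("no", "yes", "no", "yes")

# Mean count per gene within each group
mean_g1 <- rowMeans(counts_toy[, group == "Group1", drop = FALSE])
mean_g2 <- rowMeans(counts_toy[, group == "Group2", drop = FALSE])

# log2 fold change (Group2 vs Group1) with a pseudocount of 1
log2fc <- log2((mean_g2 + 1) / (mean_g1 + 1))

# Compare the size of the change for genes flagged as DE and not DE
data.frame(de_gene, log2fc = round(log2fc, 2))
```

Genes flagged "yes" should show clearly larger absolute fold changes than those flagged "no". This is a quick visual check, not a formal DE test.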
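In POWSC, the number of cells can be set directly with nCells. For example, to simulate 500 cells, pass other_prior = list(nCells = 500).
Here, we simulate a new dataset with 500 cells. This time the result is returned as a list (return_format = "list"), so the count matrix is stored in its count_data element: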
simulate_result <- simmethods::POWSC_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nGroups: 2
# de.prob: 0.1
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500
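Because nCells is passed through other_prior, you can prepare a series of datasets with different numbers of cells by building one other_prior list per setting. The sketch below only constructs those lists. The loop that would pass each of them to simmethods::POWSC_simulation() is shown as a comment, and the cell numbers are arbitrary example values.

```r
# Cell numbers to simulate (arbitrary example values)
cell_numbers <- c(100, 200, 500, 1000)

# One other_prior list per setting, named for easy lookup
prior_list <- setNames(
  lapply(cell_numbers, function(n) list(nCells = n)),
  paste0("nCells_", cell_numbers)
)

# Each element could then be passed to the simulation, e.g.:
# sims <- lapply(prior_list, function(prior) {
#   simmethods::POWSC_simulation(
#     parameters = estimate_result[["estimate_result"]],
#     return_format = "list",
#     other_prior = prior,
#     seed = 111
#   )
# })
str(prior_list[["nCells_500"]])
```

Keeping the same seed across settings makes the runs easier to compare. Whether that suits your design is a judgment call.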
By default, POWSC simulates two cell groups. Set de.prob in other_prior to specify the proportion of differentially expressed genes (DEGs) between the two groups (0.1 by default, as printed in the logs above). Here we set de.prob = 0.2:
simulate_result <- simmethods::POWSC_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 500,
                     de.prob = 0.2),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nGroups: 2
# de.prob: 0.2
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$group)
# 
# Group1 Group2 
#    250    250
## gene information
gene_info <- simulate_result[["simulate_result"]][["row_meta"]]
### the proportion of DEGs
table(gene_info$de_gene)["yes"] / nrow(result) ## de.prob = 0.2
#    yes 
# 0.1895
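The observed proportion of DEGs (0.1895) is close to, but not exactly, the requested de.prob = 0.2. With 2000 genes, the target corresponds to 0.2 × 2000 = 400 DEGs, whereas 0.1895 × 2000 = 379 genes are flagged. For this run, de.prob therefore behaves as an approximate target. The base R helper de_summary() below is hypothetical and not part of simmethods. It reports the observed and targeted numbers of DEGs from a de_gene vector such as gene_info$de_gene. The toy vector reproduces the counts seen above.

```r
# Hypothetical helper: compare the observed number of DE genes with the
# number implied by de.prob.
de_summary <- function(de_flags, de_prob) {
  n_genes <- length(de_flags)
  observed <- sum(de_flags == "yes")
  c(n_genes = n_genes,
    target_de = de_prob * n_genes,
    observed_de = observed,
    observed_prop = observed / n_genes)
}

# Toy flags matching the counts above: 379 "yes" out of 2000 genes
toy_flags <- rep(c("yes", "no"), times = c(379, 1621))
de_summary(toy_flags, de_prob = 0.2)
```

With the simulated data, call de_summary(gene_info$de_gene, de_prob = 0.2).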