Here Lun2 method will be demonstrated clearly and hope that this document can help you.
Before simulating datasets, it is important to estimate some essential parameters from a real dataset in order to make the simulated data more real. If you do not have a single-cell transcriptomics count matrix now, you can use the data collected in simmethods package by simmethods:data command.
When you use Lun2 to estimate parameters from a real dataset, you must input a numeric vector to specify the batches or plates that each cell comes from, like other_prior = list(batch.condition = the numeric vector).
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
plates <- simmethods::group_condition
## plates can must be a numeric vector.
other_prior <- list(batch.condition = as.numeric(plates))
Using simmethods::Lun2_estimation command to execute the estimation step.
estimate_result <- simmethods::Lun2_estimation(ref_data = ref_data,
                                               other_prior = other_prior,
                                               verbose = T,
                                               seed = 10)
# Estimating parameters using Lun2
# Estimating number of groups...
# Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
# collapsing to unique 'x' values
# Computing normalisation factors...
# Warning in (function (x, sizes, min.mean = NULL, positive = FALSE, scaling =
# NULL) : encountered non-positive size factor estimates
# Estimating dispersions...
# Estimating gene means...
# Estimating plate effects...
# Estimating zero-inflated parameters...
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in value[[3L]](cond): system is computationally singular: reciprocal
# condition number = 9.2681e-48FALSE
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in sqrt(diag(vc)[np]): NaNs produced
# Warning in value[[3L]](cond): system is computationally singular: reciprocal
# condition number = 4.1192e-48FALSE
After estimating parameter from a real dataset, we will simulate a dataset based on the learned parameters with different scenarios.
The reference data contains 160 cells and 4000 genes, if we simulate datasets with default parameters and then we will obtain a new data which has the same size as the reference data. In addtion, the simulated dataset will have one batch of cells.
simulate_result <- simmethods::Lun2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# nCells: 160
# nGenes: 3973
# nPlates: 2
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 3973  160
head(colData(SCE_result))
# DataFrame with 6 rows and 2 columns
#         cell_name    plate
#       <character> <factor>
# Cell1       Cell1        1
# Cell2       Cell2        1
# Cell3       Cell3        1
# Cell4       Cell4        1
# Cell5       Cell5        1
# Cell6       Cell6        1
head(rowData(SCE_result))
# DataFrame with 6 rows and 1 column
#         gene_name
#       <character>
# Gene1       Gene1
# Gene2       Gene2
# Gene3       Gene3
# Gene4       Gene4
# Gene5       Gene5
# Gene6       Gene6
In Lun2, we can not set nCells directly and should set cell.plates instead. For example, if we want to simulate 500 cells, we can type other_prior = list(cell.plates = rep(1, 500)). For genes, we can just set nGenes.
Here, we simulate a new dataset with 500 cells and 2000 genes:
simulate_result <- simmethods::Lun2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(cell.plates = rep(1, 500),
                     nGenes = 2000),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nPlates: 1
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of lib.sizes not equal to nCells. lib.sizes will be sampled.
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of gene parameters does not equal nGenes. Gene parameters will be
# sampled.
The cell.plates parameter represents the sampling source of cells in real experiments.
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500
In Lun2, we can not set nbatches directly and should set cell.plates instead. For example, if we want to simulate 2 batches, we can type other_prior = list(cell.plates  = sample(1:2, 500, replace = TRUE)). Note that the length of cell.plates numeric vector must be equal to the cell number.
For demonstration, we will simulate three batches using the learned parameters.
simulate_result <- simmethods::Lun2_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(cell.plates = sample(1:2, 500, replace = TRUE),
                     nGenes = 2000),
  seed = 111
)
# nCells: 500
# nGenes: 2000
# nPlates: 2
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of lib.sizes not equal to nCells. lib.sizes will be sampled.
# Warning in splatter::lun2Simulate(parameters, verbose = verbose, zinb = zinb):
# Number of gene parameters does not equal nGenes. Gene parameters will be
# sampled.
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 2000  500
## cell information
cell_info <- simulate_result[["simulate_result"]][["col_meta"]]
table(cell_info$plate)
# 
#   1   2 
# 236 264