This document demonstrates how to use the CancerInSilico method through the simmethods package. We hope you find it helpful.
Before simulating datasets, it is important to estimate the essential parameters from a real dataset so that the simulated data are more realistic. If you do not have a single-cell transcriptomics count matrix at hand, you can use the data bundled with the simmethods package, available as simmethods::data.
library(simmethods)
library(SingleCellExperiment)
# Load data
ref_data <- simmethods::data
dim(ref_data)
# [1] 4000  160
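Before estimation, it can be worth confirming that the reference matrix has the expected form. The following is a minimal sketch in base R and is not part of simmethods: `check_count_matrix` and `toy_counts` are hypothetical names used only for illustration, and the checks assume genes in rows, cells in columns, and non-negative integer counts.

```r
# Hypothetical helper (not part of simmethods): basic sanity checks
# for a gene-by-cell count matrix.
check_count_matrix <- function(mat) {
  mat <- as.matrix(mat)
  stopifnot(
    is.numeric(mat),         # counts must be numeric
    !anyNA(mat),             # no missing values
    all(mat >= 0),           # counts cannot be negative
    all(mat == round(mat))   # counts should be whole numbers
  )
  invisible(TRUE)
}

# Toy matrix standing in for a real dataset: 5 genes x 3 cells.
set.seed(1)
toy_counts <- matrix(
  rpois(15, lambda = 3),
  nrow = 5,
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:3))
)

check_count_matrix(toy_counts)  # stops with an error if a check fails
dim(toy_counts)                 # genes first, cells second
```

With the objects from this document, the equivalent call would be check_count_matrix(ref_data), assuming ref_data can be coerced with as.matrix.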
Use the simmethods::CancerInSilico_estimation function to run the estimation step.
estimate_result <- simmethods::CancerInSilico_estimation(ref_data = ref_data,
                                                         verbose = TRUE,
                                                         seed = 10)
# Estimating parameters using CancerInSilico
After estimating the parameters from a real dataset, we simulate datasets from the learned parameters under different scenarios.
The reference data contains 160 cells and 4000 genes. If we simulate with the default parameters, we obtain a new dataset of the same size as the reference data.
simulate_result <- simmethods::CancerInSilico_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "SCE",
  seed = 111
)
# 
# time = 0.00
# size = 1
# time = 1.00
# size = 1
# nCells: 160
# nGenes: 4000
SCE_result <- simulate_result[["simulate_result"]]
dim(SCE_result)
# [1] 4000  160
head(colData(SCE_result))
# DataFrame with 6 rows and 1 column
#         cell_name
#       <character>
# Cell1       Cell1
# Cell2       Cell2
# Cell3       Cell3
# Cell4       Cell4
# Cell5       Cell5
# Cell6       Cell6
head(rowData(SCE_result))
# DataFrame with 6 rows and 1 column
#         gene_name
#       <character>
# Gene1       Gene1
# Gene2       Gene2
# Gene3       Gene3
# Gene4       Gene4
# Gene5       Gene5
# Gene6       Gene6
In CancerInSilico, the number of cells is set with nCells, which is passed through the other_prior argument.
Here, we simulate a new dataset with 2000 cells:
simulate_result <- simmethods::CancerInSilico_simulation(
  parameters = estimate_result[["estimate_result"]],
  return_format = "list",
  other_prior = list(nCells = 2000),
  seed = 111
)
# 
# time = 0.00
# size = 1
# time = 1.00
# size = 1
# nCells: 2000
# nGenes: 4000
result <- simulate_result[["simulate_result"]][["count_data"]]
dim(result)
# [1] 4000 2000
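For a first comparison between a simulated matrix and the reference, a few summary statistics are often enough. The sketch below uses base R only and is not part of simmethods: `summarise_counts` is a hypothetical helper, and the two toy matrices stand in for ref_data and result, which are assumed to be gene-by-cell count matrices.

```r
# Hypothetical helper (not part of simmethods): summarise a gene-by-cell
# count matrix with a few quick statistics.
summarise_counts <- function(mat) {
  mat <- as.matrix(mat)
  c(
    n_genes        = nrow(mat),
    n_cells        = ncol(mat),
    median_libsize = median(colSums(mat)),  # total counts per cell
    zero_fraction  = mean(mat == 0)         # proportion of zero entries
  )
}

# Toy stand-ins for the reference and simulated matrices.
set.seed(1)
toy_ref <- matrix(rpois(40, lambda = 2), nrow = 10)  # 10 genes x 4 cells
toy_sim <- matrix(rpois(80, lambda = 2), nrow = 10)  # 10 genes x 8 cells

# One row per dataset, one column per statistic.
rbind(
  reference = summarise_counts(toy_ref),
  simulated = summarise_counts(toy_sim)
)
```

With the objects from this document, the same comparison would use summarise_counts(ref_data) and summarise_counts(result). The number of genes should match while the number of cells reflects the nCells setting.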