easypheno.simulate.synthetic_phenotypes
Module Contents
Functions
filter_duplicates | Remove duplicate SNPs, i.e. SNPs whose genotypes are identical to those of another SNP across all samples and therefore add no information.
get_simulation | Simulate phenotypes based on (real) genotypes in an additive setting with normally or gamma distributed noise and normally distributed effect sizes of causal SNPs.
check_sim_id | Check which ids were already used for simulations.
save_sim_overview | Save overview file for all simulations; append new simulations if the file already exists.
save_simulation | Set all variables and generate one or more simulations with the same configuration.
- easypheno.simulate.synthetic_phenotypes.filter_duplicates(X, snp_ids)
Remove duplicate SNPs, i.e. SNPs whose genotypes are identical to those of another SNP across all samples and therefore add no information.
- Parameters
X (numpy.array) – genotype matrix to be filtered
snp_ids (numpy.array) – vector containing corresponding SNP ids
- Returns
filtered genotype matrix and filtered SNP ids
- Return type
(numpy.array, numpy.array)
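The filtering described above can be illustrated with a minimal sketch. It assumes that samples are rows and SNPs are columns of X, and that "duplicate" means two columns are identical element by element. The function name drop_duplicate_snps is hypothetical and is not part of easypheno; the library's actual implementation may differ.

```python
import numpy as np


def drop_duplicate_snps(X, snp_ids):
    """Hypothetical sketch: keep the first occurrence of each distinct SNP column."""
    # With axis=1, np.unique treats every column (one SNP) as a single item and
    # return_index gives the position of its first occurrence.
    _, first_idx = np.unique(X, axis=1, return_index=True)
    # np.unique sorts the columns, so sort the indices to restore the input order.
    keep = np.sort(first_idx)
    return X[:, keep], snp_ids[keep]


X = np.array([[0, 1, 0, 2],
              [1, 1, 1, 0],
              [2, 0, 2, 0]])
snp_ids = np.array(["snp_a", "snp_b", "snp_c", "snp_d"])
X_filtered, ids_filtered = drop_duplicate_snps(X, snp_ids)
```

In this toy input the third column repeats the first one, so only three SNPs remain and the id vector is filtered with the same index.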
- easypheno.simulate.synthetic_phenotypes.get_simulation(X, sample_ids, snp_ids, number_of_samples, number_causal_snps, explained_variance, maf, heritability, seed, number_background_snps, distribution, shape)
Simulate phenotypes based on (real) genotypes in an additive setting with normally or gamma distributed noise and normally distributed effect sizes of causal SNPs.
- Parameters
X (numpy.array) – genotype matrix in additive encoding
sample_ids (numpy.array) – sample ids of genotype matrix
snp_ids (numpy.array) – SNP ids of genotype matrix
number_of_samples (int) – number of samples of synthetic phenotype
number_causal_snps (int) – number of SNPs used as causal markers in simulation
explained_variance (int) – percentage value of how much of the total variance the causal SNPs should explain
maf (int) – minor allele frequency (MAF) threshold in percent used to filter the genotype matrix
heritability (int) – percentage value of how much of the variance should be explained by polygenic background
seed (int) – seed for random sampling
number_background_snps (int) – number of randomly selected SNPs to simulate the polygenic background
distribution (str) – probability distribution used to draw random noise; can be ‘normal’ or ‘gamma’
shape (float) – shape of the gamma distribution; only needed if distribution is ‘gamma’
- Returns
simulated phenotype with corresponding sample ids, SNP ids of causal SNPs, SNP ids of background SNPs, effect sizes of background, effect sizes of causal SNPs, used explained variance for each causal SNP
- Return type
(numpy.array, numpy.array, numpy.array, numpy.array, numpy.array, numpy.array, numpy.array)
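To make the additive setting more concrete, the following is a deliberately simplified sketch, not the implementation of get_simulation. It assumes a single genetic component built from randomly chosen causal SNPs with normally distributed effect sizes, and it scales the noise so that this component explains a chosen percentage of the phenotypic variance. MAF filtering and the separate polygenic background are left out. The function name and the parameter genetic_variance_pct are hypothetical.

```python
import numpy as np


def sketch_additive_phenotype(X, number_causal_snps, genetic_variance_pct,
                              seed, distribution="normal", shape=None):
    """Hypothetical, simplified additive simulation (assumes 0 < pct <= 100)."""
    rng = np.random.default_rng(seed)
    n_samples, n_snps = X.shape

    # Pick causal SNPs without replacement and draw normal effect sizes.
    causal_idx = rng.choice(n_snps, size=number_causal_snps, replace=False)
    betas = rng.normal(size=number_causal_snps)
    genetic = X[:, causal_idx] @ betas

    # Choose the noise variance so that the genetic part explains the
    # requested fraction of the total variance.
    fraction = genetic_variance_pct / 100
    noise_var = genetic.var() * (1 - fraction) / fraction

    if distribution == "normal":
        noise = rng.normal(size=n_samples)
    elif distribution == "gamma":
        # Gamma(shape, scale=1) has mean `shape` and variance `shape`,
        # so this standardizes it to mean 0 and variance 1.
        noise = (rng.gamma(shape, size=n_samples) - shape) / np.sqrt(shape)
    else:
        raise ValueError("distribution must be 'normal' or 'gamma'")

    phenotype = genetic + np.sqrt(noise_var) * noise
    return phenotype, causal_idx, betas
```

Using one seeded generator for all random draws makes the sketch reproducible: the same seed yields the same causal SNPs, effect sizes and noise.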
- easypheno.simulate.synthetic_phenotypes.check_sim_id(sim_dir)
Check which ids were already used for simulations.
- Parameters
sim_dir (pathlib.Path) – directory containing simulations to check
- Returns
last simulation number + 1
- Return type
int
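One way to determine the next free simulation number is sketched below. It assumes that existing simulations are stored as ‘Simulation_{sim_id}.csv’, where sim_id is either a single number or a range such as ‘10-15’, and that an empty directory should yield 1. The function name next_sim_id and the behaviour for an empty directory are assumptions, not taken from easypheno.

```python
import re
from pathlib import Path

# Matches 'Simulation_7.csv' as well as 'Simulation_10-15.csv'.
SIM_FILE_PATTERN = re.compile(r"^Simulation_(\d+)(?:-(\d+))?\.csv$")


def next_sim_id(sim_dir: Path) -> int:
    """Hypothetical sketch: return the highest simulation number found, plus one."""
    last = 0
    for file in sim_dir.glob("Simulation_*.csv"):
        match = SIM_FILE_PATTERN.match(file.name)
        if match:
            # For a range, the second number is the last simulation in the file.
            last = max(last, int(match.group(2) or match.group(1)))
    return last + 1
```

The glob pattern requires an underscore directly after ‘Simulation’, so the overview file ‘Simulations_Overview.csv’ is not counted.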
- easypheno.simulate.synthetic_phenotypes.save_sim_overview(save_dir, sim_names, number_of_samples, number_causal_snps, explained_variance, maf, heritability, seeds, number_background_snps, distribution, shape)
Save overview file for all simulations; append new simulations if the file already exists.
- Parameters
save_dir (pathlib.Path) – directory to save overview file to
sim_names (list) – list containing simulation name for each simulation
number_of_samples (list) – list containing number of samples for each simulation
number_causal_snps (list) – list containing number of causal SNPs for each simulation
explained_variance (list) – list containing total explained variance of causal SNPs for each simulation
maf (list) – list containing used minor allele frequency (MAF) threshold for each simulation
heritability (list) – list containing used heritability for each simulation
seeds (list) – list containing used seed for each simulation
number_background_snps (list) – list containing number of background SNPs for each simulation
distribution (list) – list containing used distribution of random noise for each simulation
shape (list) – list containing shape of gamma distribution for each simulation, or None if the normal distribution was used
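The create-or-append behaviour can be sketched with the standard library alone. The example assumes the overview is a CSV file named ‘Simulations_Overview.csv’ with one row per simulation and one column per parameter list. The function name append_sim_overview, the dictionary-based interface and the column names are hypothetical; the actual file layout written by easypheno may differ.

```python
import csv
from pathlib import Path


def append_sim_overview(save_dir: Path, columns: dict) -> Path:
    """Hypothetical sketch: write equal-length lists as rows, appending if the file exists."""
    path = save_dir / "Simulations_Overview.csv"
    # Only a newly created file needs the header row.
    write_header = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(columns.keys())
        # zip(*lists) turns the per-parameter lists into one row per simulation.
        writer.writerows(zip(*columns.values()))
    return path
```

A None entry, such as the shape for a normal distribution, is written by csv.writer as an empty field.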
- easypheno.simulate.synthetic_phenotypes.save_simulation(save_dir, genotype_matrix_name, number_of_sim, X, sample_ids, snp_ids, number_of_samples, number_causal_snps, explained_variance, maf, heritability, seed, number_background_snps, distribution, shape)
Set all variables and generate one or more simulations with the same configuration. The overview file and the simulated phenotypes are saved to the subfolder ‘genotype_matrix_name’ in save_dir as ‘Simulations_Overview.csv’ and ‘Simulation_{sim_id}.csv’. The SNP ids of the background SNPs, the effect sizes (betas) of the background SNPs, and the configuration info containing SNP ids and betas of the causal SNPs are saved to the subfolder ‘sim_configs’ within ‘genotype_matrix_name’ as ‘background_{sim_id}.csv’, ‘betas_background_{sim_id}.csv’ and ‘simulation_config_{sim_id}.csv’. If only one phenotype is simulated, the sim_id is a single number. If several phenotypes are simulated with the same configuration, the sim_id is the number of the first simulation, a ‘-’, and the number of the last simulation, e.g. ‘10-15’.
- Parameters
save_dir (str) – directory to save simulations to
genotype_matrix_name (str) – name of genotype matrix to be used for simulations, needed to create subfolder in save_dir
number_of_sim (int) – number of simulations to create with same configurations
X (numpy.array) – genotype matrix in additive encoding
sample_ids (numpy.array) – sample ids of genotype matrix
snp_ids (numpy.array) – SNP ids of genotype matrix
number_of_samples (int) – number of samples of synthetic phenotype
number_causal_snps (int) – number of SNPs used as causal markers in simulation
explained_variance (int) – percentage value of how much of the total variance the causal SNPs should explain
maf (int) – minor allele frequency (MAF) threshold in percent used to filter the genotype matrix
heritability (int) – percentage value of how much of the variance should be explained by polygenic background
seed (int) – seed for random sampling
number_background_snps (int) – number of randomly selected SNPs to simulate the polygenic background
distribution (str) – probability distribution used to draw random noise; can be ‘normal’ or ‘gamma’
shape (float) – shape of the gamma distribution; only needed if distribution is ‘gamma’
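The sim_id convention and the output layout described for save_simulation can be sketched as follows. The helper names build_sim_id and output_paths, the parameter first_sim_number and the dictionary keys are hypothetical; only the file and folder names are taken from the description above.

```python
from pathlib import Path


def build_sim_id(first_sim_number: int, number_of_sim: int) -> str:
    """Hypothetical sketch: a single number, or 'first-last' for several simulations."""
    if number_of_sim < 1:
        raise ValueError("number_of_sim must be at least 1")
    if number_of_sim == 1:
        return str(first_sim_number)
    last_sim_number = first_sim_number + number_of_sim - 1
    return f"{first_sim_number}-{last_sim_number}"


def output_paths(save_dir: str, genotype_matrix_name: str, sim_id: str) -> dict:
    """Hypothetical sketch: collect the files named in the description."""
    base = Path(save_dir) / genotype_matrix_name
    configs = base / "sim_configs"
    return {
        "overview": base / "Simulations_Overview.csv",
        "phenotypes": base / f"Simulation_{sim_id}.csv",
        "background": configs / f"background_{sim_id}.csv",
        "betas_background": configs / f"betas_background_{sim_id}.csv",
        "config": configs / f"simulation_config_{sim_id}.csv",
    }
```

For instance, six simulations starting at number 10 produce the sim_id ‘10-15’, which matches the example in the description.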