easypheno.simulate.results_analysis_synthetic_data
Module Contents
Functions
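| gather_sim_configs(sim_config_dir, save_dir) | Collect the information on all simulation configurations within the specified directory |
| gather_feature_importances(results_dir, save_dir, datasplit_maf_pattern) | Collect the information on the feature importances for all models within the specified directory and for the specified datasplit maf pattern |
| get_statistics_featimps_vs_simulation(all_sim_configs, all_feat_imps, min_perc_threshold=0.01) | Get statistics on feature importances compared to effect sizes on synthetic data, e.g. on how many background SNPs were detected |
| generate_scatterplots_featimps_vs_simulation(all_feat_imps, all_sim_configs, save_dir, datasplit_maf_pattern) | Generate scatterplots based on feature importances and effect sizes on synthetic data. One plot containing all models for the specified datasplit maf pattern as well as single plots for each model will be generated and saved. |
| featimps_vs_simulation(results_directory_genotype_level, sim_config_dir, save_dir) | Analyze feature importances versus effect sizes on synthetic data, both by retrieving statistics and generating plots |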
- easypheno.simulate.results_analysis_synthetic_data.gather_sim_configs(sim_config_dir, save_dir)
Collect the information on all simulation configurations within the specified directory
- Parameters
sim_config_dir (pathlib.Path) – directory which contains the sim config files
save_dir (pathlib.Path) – directory to save the collected sim config info
- easypheno.simulate.results_analysis_synthetic_data.gather_feature_importances(results_dir, save_dir, datasplit_maf_pattern)
Collect the information on the feature importances for all models within the specified directory and for the specified datasplit maf pattern
- Parameters
results_dir (pathlib.Path) – results directory at the level of the name of the genotype matrix
save_dir (pathlib.Path) – directory to save the collected info
datasplit_maf_pattern (str) – datasplit maf pattern to search for
- easypheno.simulate.results_analysis_synthetic_data.get_statistics_featimps_vs_simulation(all_sim_configs, all_feat_imps, min_perc_threshold=0.01)
Get statistics on feature importances compared to effect sizes on synthetic data, e.g. on how many background SNPs were detected
- Parameters
all_sim_configs (pandas.DataFrame) – simulation configs to consider
all_feat_imps (pandas.DataFrame) – feature importances to consider
min_perc_threshold (float) – threshold for minimum feature importance in relation to maximum feature importance for a specific model
- Returns
statistics for a comparison between feature importances and effect sizes in a DataFrame
- Return type
pandas.DataFrame
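The role of min_perc_threshold can be illustrated with a minimal, self-contained sketch. It does not call easypheno and is not the library's implementation. It assumes that a SNP counts as detected when its absolute feature importance is at least min_perc_threshold times the largest absolute importance of the same model. The helper detected_snps, the plain dictionaries, and the SNP names are hypothetical; the real function works on pandas DataFrames whose layout is not documented here.

```python
def detected_snps(feat_imps, min_perc_threshold=0.01):
    """Return the SNP ids whose absolute importance reaches the relative cutoff.

    Assumption: the cutoff is min_perc_threshold * max(|importance|) per model.
    """
    if not feat_imps:
        return set()
    max_imp = max(abs(value) for value in feat_imps.values())
    if max_imp == 0:
        # No feature carries any importance, so nothing counts as detected.
        return set()
    cutoff = min_perc_threshold * max_imp
    return {snp for snp, imp in feat_imps.items() if abs(imp) >= cutoff}


# Hypothetical feature importances of one model and simulated effect sizes.
feat_imps = {"snp_1": 0.80, "snp_2": 0.004, "snp_3": 0.05, "snp_4": 0.0}
simulated_effects = {"snp_1": 1.2, "snp_2": 0.1, "snp_3": 0.4}

detected = detected_snps(feat_imps, min_perc_threshold=0.01)
found = detected & set(simulated_effects)    # simulated SNPs the model picked up
missed = set(simulated_effects) - detected   # simulated SNPs below the cutoff

print(sorted(found), sorted(missed))
```

With the default of 0.01, the cutoff in this toy example is 1% of the largest importance, so very small but non-zero importances are treated as not detected.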
- easypheno.simulate.results_analysis_synthetic_data.generate_scatterplots_featimps_vs_simulation(all_feat_imps, all_sim_configs, save_dir, datasplit_maf_pattern)
Generate scatterplots based on feature importances and effect sizes on synthetic data. One plot containing all models for the specified datasplit maf pattern as well as single plots for each model will be generated and saved.
- Parameters
all_feat_imps (pandas.DataFrame) – feature importances to consider
all_sim_configs (pandas.DataFrame) – simulation configs to consider
save_dir (pathlib.Path) – directory to save the plots
datasplit_maf_pattern (str) – datasplit maf pattern to search for
- easypheno.simulate.results_analysis_synthetic_data.featimps_vs_simulation(results_directory_genotype_level, sim_config_dir, save_dir)
Analyze feature importances versus effect sizes on synthetic data, both by retrieving statistics and generating plots
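A call to this function might look like the following sketch. The parameter names come from the signature above; the directory names are hypothetical placeholders. This page does not say whether save_dir must already exist, so the sketch creates it as a precaution. The call is skipped when easypheno is not installed or the input directories are missing.

```python
import pathlib

# Hypothetical locations; replace them with your own directories.
results_dir = pathlib.Path("results") / "my_genotype_matrix"
sim_config_dir = pathlib.Path("sim_configs")
save_dir = pathlib.Path("analysis_output")

try:
    from easypheno.simulate import results_analysis_synthetic_data as rasd
except ImportError:
    rasd = None  # easypheno is not installed in this environment

if rasd is not None and results_dir.is_dir() and sim_config_dir.is_dir():
    # Assumption: creating the output directory up front is harmless.
    save_dir.mkdir(parents=True, exist_ok=True)
    rasd.featimps_vs_simulation(
        results_directory_genotype_level=results_dir,
        sim_config_dir=sim_config_dir,
        save_dir=save_dir,
    )
```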