easypheno.simulate.results_analysis_synthetic_data

Module Contents

Functions

gather_sim_configs(sim_config_dir, save_dir)

Collect the information on the simulation configurations for all sim config files within the specified directory

gather_feature_importances(results_dir, save_dir, datasplit_maf_pattern)

Collect the information on the feature importances for all models within the specified directory and for the specified datasplit maf pattern

get_statistics_featimps_vs_simulation(all_sim_configs, all_feat_imps, min_perc_threshold=0.01)

Get statistics on feature importances compared to effect sizes on synthetic data, e.g. on how many background SNPs were detected

generate_scatterplots_featimps_vs_simulation(all_feat_imps, all_sim_configs, save_dir, datasplit_maf_pattern)

Generate scatterplots based on feature importances and effect sizes on synthetic data. One plot containing all models for the specified datasplit maf pattern as well as single plots for each model will be generated and saved.

featimps_vs_simulation(results_directory_genotype_level, sim_config_dir, save_dir)

Analyze feature importances versus effect sizes on synthetic data, both by retrieving statistics and generating plots

easypheno.simulate.results_analysis_synthetic_data.gather_sim_configs(sim_config_dir, save_dir)

Collect the information on the simulation configurations for all sim config files within the specified directory

Parameters
  • sim_config_dir (pathlib.Path) – directory which contains the sim config files

  • save_dir (pathlib.Path) – directory to save the collected sim config info

easypheno.simulate.results_analysis_synthetic_data.gather_feature_importances(results_dir, save_dir, datasplit_maf_pattern)

Collect the information on the feature importances for all models within the specified directory and for the specified datasplit maf pattern

Parameters
  • results_dir (pathlib.Path) – results directory at the level of the name of the genotype matrix

  • save_dir (pathlib.Path) – directory to save the collected info

  • datasplit_maf_pattern (str) – datasplit maf pattern to search on

easypheno.simulate.results_analysis_synthetic_data.get_statistics_featimps_vs_simulation(all_sim_configs, all_feat_imps, min_perc_threshold=0.01)

Get statistics on feature importances compared to effect sizes on synthetic data, e.g. on how many background SNPs were detected

Parameters
  • all_sim_configs (pandas.DataFrame) – simulation configs to consider

  • all_feat_imps (pandas.DataFrame) – feature importances to consider

  • min_perc_threshold (float) – minimum feature importance, relative to the maximum feature importance of a specific model, used as the threshold

Returns

statistics for a comparison between feature importances and effect sizes in a DataFrame

Return type

pandas.DataFrame
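The role of min_perc_threshold can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the helper detected_features, the SNP names, and the values are hypothetical, and it assumes the threshold is applied as a fraction of the largest absolute importance of one model.

```python
from __future__ import annotations


def detected_features(feat_imps: dict[str, float],
                      min_perc_threshold: float = 0.01) -> set[str]:
    """Return features whose absolute importance is at least
    min_perc_threshold times the largest absolute importance.

    Assumption: this is one plausible reading of a relative threshold,
    not necessarily how easypheno applies it.
    """
    if not feat_imps:
        return set()
    max_imp = max(abs(value) for value in feat_imps.values())
    if max_imp == 0:
        # No feature carries any importance, so nothing counts as detected.
        return set()
    cutoff = min_perc_threshold * max_imp
    return {name for name, value in feat_imps.items() if abs(value) >= cutoff}


# Hypothetical feature importances of a single model.
feat_imps = {"snp_1": 0.80, "snp_2": 0.004, "snp_3": 0.05, "snp_4": 0.0}
# Hypothetical simulated effect sizes (SNPs that truly influence the phenotype).
simulated_effects = {"snp_1": 1.2, "snp_2": 0.1, "snp_3": 0.1}

detected = detected_features(feat_imps, min_perc_threshold=0.01)
# Number of simulated SNPs that the model's importances recover.
n_recovered = len(detected & set(simulated_effects))
```

With a relative cutoff, models whose importances live on very different scales can be compared on equal footing, which is presumably why the threshold is defined per model.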

easypheno.simulate.results_analysis_synthetic_data.generate_scatterplots_featimps_vs_simulation(all_feat_imps, all_sim_configs, save_dir, datasplit_maf_pattern)

Generate scatterplots based on feature importances and effect sizes on synthetic data. One plot containing all models for the specified datasplit maf pattern as well as single plots for each model will be generated and saved.

Parameters
  • all_feat_imps (pandas.DataFrame) – feature importances to consider

  • all_sim_configs (pandas.DataFrame) – simulation configs to consider

  • save_dir (pathlib.Path) – directory to save the plots

  • datasplit_maf_pattern (str) – datasplit maf pattern to search on

easypheno.simulate.results_analysis_synthetic_data.featimps_vs_simulation(results_directory_genotype_level, sim_config_dir, save_dir)

Analyze feature importances versus effect sizes on synthetic data, both by retrieving statistics and generating plots

Parameters
  • results_directory_genotype_level (str) – results directory at the level of the name of the genotype matrix

  • sim_config_dir (str) – directory which contains the sim config files

  • save_dir (str) – directory to save the results
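A call to this function might look like the following sketch. The wrapper run_featimps_analysis is hypothetical and only uses the module path, function name, and parameter names documented above. Creating save_dir beforehand is an assumption made for safety; the function may handle this itself.

```python
import pathlib


def run_featimps_analysis(results_directory_genotype_level: str,
                          sim_config_dir: str,
                          save_dir: str) -> None:
    """Hypothetical wrapper: prepare the output directory, then call
    the documented featimps_vs_simulation function."""
    # Imported lazily so this sketch can be loaded without easypheno installed.
    from easypheno.simulate import results_analysis_synthetic_data as rasd

    # Assumption: pre-creating save_dir is harmless even if the library
    # creates it on its own.
    pathlib.Path(save_dir).mkdir(parents=True, exist_ok=True)

    # All three arguments are documented as str.
    rasd.featimps_vs_simulation(
        results_directory_genotype_level=results_directory_genotype_level,
        sim_config_dir=sim_config_dir,
        save_dir=save_dir,
    )
```

Note that this entry point takes plain strings, whereas the helper functions above are documented with pathlib.Path arguments.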