easypheno.optim_pipeline

Module Contents

Functions

easypheno.optim_pipeline.run(data_dir, genotype_matrix, phenotype_matrix, phenotype, encoding=None, maf_percentage=0, save_dir=None, datasplit='nested-cv', n_outerfolds=5, n_innerfolds=5, test_set_size_percentage=20, val_set_size_percentage=20, models=None, n_trials=100, save_final_model=False, batch_size=32, n_epochs=100000, outerfold_number_to_run=None)

Run the whole optimization pipeline

Parameters
  • data_dir (str) – directory where the phenotype and genotype matrices are stored

  • genotype_matrix (str) – file name of the genotype matrix, including the file extension

  • phenotype_matrix (str) – file name of the phenotype matrix, including the file extension

  • phenotype (str) – name of the phenotype to predict

  • encoding (str) – encoding to use. Default is None, so the standard encoding of each model is used. Options are: ‘012’, ‘onehot’, ‘raw’

  • maf_percentage (int) – threshold for the minor allele frequency (MAF) filter as a percentage value. Default is 0, so no MAF filtering is applied

  • save_dir (str) – directory for saving the results. Default is None, so the results are saved in data_dir

  • datasplit (str) – datasplit to use. Options are: nested-cv, cv-test, train-val-test. Default is nested-cv

  • n_outerfolds (int) – number of outer folds; relevant for nested-cv

  • n_innerfolds (int) – number of inner folds; relevant for nested-cv and cv-test

  • test_set_size_percentage (int) – size of the test set as a percentage; relevant for cv-test and train-val-test

  • val_set_size_percentage (int) – size of the validation set as a percentage; relevant for train-val-test

  • models (list) – list of models that should be optimized

  • n_trials (int) – number of Optuna trials

  • save_final_model (bool) – specify if the final model should be saved

  • batch_size (int) – batch size for neural network models

  • n_epochs (int) – number of epochs for neural network models

  • outerfold_number_to_run (int) – single outer fold to run if you do not want to run all of them. Default is None
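
The parameter list above ties each split-related argument to specific datasplit values. The following minimal sketch shows one way to assemble a call accordingly. The helper names build_run_kwargs and run_pipeline, the file names, and the phenotype name are hypothetical placeholders and not part of easypheno; the defaults are copied from the signature above. Dropping arguments that do not apply is a choice made for this sketch, not documented behavior of run.

```python
# Hypothetical helper, not part of easypheno: collects keyword arguments for
# easypheno.optim_pipeline.run and keeps only the split parameters that the
# parameter list marks as relevant for the chosen datasplit.
RELEVANT_SPLIT_PARAMS = {
    "nested-cv": ("n_outerfolds", "n_innerfolds"),
    "cv-test": ("n_innerfolds", "test_set_size_percentage"),
    "train-val-test": ("test_set_size_percentage", "val_set_size_percentage"),
}

# Defaults copied from the documented signature of run().
SPLIT_DEFAULTS = {
    "n_outerfolds": 5,
    "n_innerfolds": 5,
    "test_set_size_percentage": 20,
    "val_set_size_percentage": 20,
}


def build_run_kwargs(data_dir, genotype_matrix, phenotype_matrix, phenotype,
                     datasplit="nested-cv", **overrides):
    """Return a dict of keyword arguments for optim_pipeline.run."""
    if datasplit not in RELEVANT_SPLIT_PARAMS:
        raise ValueError(f"unknown datasplit: {datasplit!r}")

    kwargs = {
        "data_dir": data_dir,
        "genotype_matrix": genotype_matrix,
        "phenotype_matrix": phenotype_matrix,
        "phenotype": phenotype,
        "datasplit": datasplit,
    }
    # Keep the split parameters that apply to this datasplit.
    for name in RELEVANT_SPLIT_PARAMS[datasplit]:
        kwargs[name] = overrides.pop(name, SPLIT_DEFAULTS[name])
    # Discard split parameters that do not apply (a choice of this sketch).
    for name in SPLIT_DEFAULTS:
        overrides.pop(name, None)
    # Pass everything else (n_trials, models, save_dir, ...) through unchanged.
    kwargs.update(overrides)
    return kwargs


def run_pipeline(kwargs):
    """Call the pipeline; imported lazily so the helper works without easypheno."""
    from easypheno import optim_pipeline
    return optim_pipeline.run(**kwargs)


# Placeholder paths and names; replace them with your own data.
example_kwargs = build_run_kwargs(
    data_dir="/path/to/data",
    genotype_matrix="genotypes.h5",
    phenotype_matrix="phenotypes.csv",
    phenotype="trait_1",
    datasplit="cv-test",
    n_outerfolds=3,   # not relevant for cv-test, so it is dropped
    n_trials=50,
)
```

With easypheno installed and real file names in place, run_pipeline(example_kwargs) would start the optimization with these settings.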