easypheno.optim_pipeline
Module Contents
Functions
run(data_dir, genotype_matrix, phenotype_matrix, phenotype, ...) – Run the whole optimization pipeline
- easypheno.optim_pipeline.run(data_dir, genotype_matrix, phenotype_matrix, phenotype, encoding=None, maf_percentage=0, save_dir=None, datasplit='nested-cv', n_outerfolds=5, n_innerfolds=5, test_set_size_percentage=20, val_set_size_percentage=20, models=None, n_trials=100, save_final_model=False, batch_size=32, n_epochs=100000, outerfold_number_to_run=None)
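Run the whole optimization pipeline: load the genotype and phenotype data, apply the chosen encoding and MAF filter, split the data according to datasplit, and run the Optuna hyperparameter optimization for each selected model.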
- Parameters
data_dir (str) – directory where the phenotype and genotype matrices are stored
genotype_matrix (str) – file name of the genotype matrix, including the file extension
phenotype_matrix (str) – file name of the phenotype matrix, including the file extension
phenotype (str) – name of the phenotype to predict
encoding (str) – genotype encoding to use. Options are '012', 'onehot' and 'raw'. Default is None, in which case each model's standard encoding is used
maf_percentage (int) – minor allele frequency (MAF) threshold, given as a percentage. Default is 0, i.e. no MAF filtering is applied
save_dir (str) – directory for saving the results. Default is None, in which case data_dir is used
datasplit (str) – data split to use. Options are 'nested-cv', 'cv-test' and 'train-val-test'. Default is 'nested-cv'
n_outerfolds (int) – number of outer folds; only relevant for nested-cv. Default is 5
n_innerfolds (int) – number of inner folds; only relevant for nested-cv and cv-test. Default is 5
test_set_size_percentage (int) – size of the test set in percent; only relevant for cv-test and train-val-test. Default is 20
val_set_size_percentage (int) – size of the validation set in percent; only relevant for train-val-test. Default is 20
models (list) – list of models that should be optimized
n_trials (int) – number of Optuna trials. Default is 100
save_final_model (bool) – whether to save the final model. Default is False
batch_size (int) – batch size for neural network models. Default is 32
n_epochs (int) – number of epochs for neural network models. Default is 100000
outerfold_number_to_run (int) – number of a single outer fold to run if not all outer folds should be run. Default is None, i.e. all outer folds are run
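To illustrate how datasplit='nested-cv', n_outerfolds, n_innerfolds and outerfold_number_to_run relate to each other, here is a minimal sketch using only the standard library. It is not easyPheno's own splitting code: the helper nested_cv_indices is hypothetical, the folds are plain shuffled partitions (no stratification), and outer folds are numbered from 0, which is an assumption about how outerfold_number_to_run is counted.

```python
import random


def nested_cv_indices(n_samples, n_outerfolds=5, n_innerfolds=5,
                      outerfold_number_to_run=None, seed=42):
    """Hypothetical nested-cv split sketch (not easyPheno's implementation).

    Yields (outerfold, test_idx, inner_splits) where inner_splits is a list of
    (train_idx, val_idx) pairs built only from the outer training samples.
    Outer folds are numbered from 0 here (an assumption).
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Partition the shuffled indices into n_outerfolds nearly equal outer folds.
    outer_folds = [indices[k::n_outerfolds] for k in range(n_outerfolds)]
    for k, test_idx in enumerate(outer_folds):
        # Optionally restrict the run to a single outer fold.
        if outerfold_number_to_run is not None and k != outerfold_number_to_run:
            continue
        test_set = set(test_idx)
        # The outer training samples are everything outside this outer test fold.
        outer_train = [i for i in indices if i not in test_set]
        # Inner folds (used for hyperparameter optimization) split only the outer training data.
        inner_folds = [outer_train[j::n_innerfolds] for j in range(n_innerfolds)]
        inner_splits = []
        for val_idx in inner_folds:
            val_set = set(val_idx)
            train_idx = [i for i in outer_train if i not in val_set]
            inner_splits.append((train_idx, val_idx))
        yield k, test_idx, inner_splits


for k, test_idx, inner in nested_cv_indices(20, n_outerfolds=5, n_innerfolds=4):
    print(k, len(test_idx), [len(val) for _, val in inner])
```

The key property of nested cross-validation shown here is that every outer test fold is held out from all inner training and validation sets, so the test score is not influenced by the hyperparameter search.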