easypheno.utils.helper_functions

Module Contents

Functions

get_list_of_implemented_models()

Create a list of all implemented models based on the files in the 'model' subdirectory of the repository.

test_likely_categorical(vector_to_test, abs_unique_threshold = 20)

Test whether a vector is most likely categorical.

get_mapping_name_to_class()

Get a mapping from model name (file name of the module in package model, without .py) to class name.

set_all_seeds(seed = 42)

Set the seeds of all libraries that provide a seeding function, for reproducibility of results.

get_subpath_for_datasplit(datasplit, datasplit_params)

Construct the subpath according to the datasplit.

save_model_overview_dict(model_overview, save_path)

Structure the results of a whole optimization run for multiple models and save them in one CSV file.

sort_models_by_encoding(models_list)

Sort models by the encoding that will be used.

get_all_subdirectories_non_recursive(path)

Get all direct (non-recursive) subdirectories of path.

get_all_files(path)

Get all files directly inside path (non-recursive).

get_all_files_with_suffix(path, suffix)

Get all files directly inside path (non-recursive) that have the given suffix.

get_datasplit_config_info_for_resultfolder(resultfolder)

Get all datasplit info for a result folder.

easypheno.utils.helper_functions.get_list_of_implemented_models()

Create a list of all implemented models based on the files in the ‘model’ subdirectory of the repository.

Return type

list
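A minimal sketch of how such a list can be built with pathlib. The model_dir parameter is hypothetical (the documented function takes no arguments and locates the 'model' subdirectory itself), and skipping files whose names start with an underscore, such as __init__.py, is an assumption about how non-model modules are excluded.

```python
import pathlib


def get_list_of_implemented_models(model_dir: pathlib.Path) -> list:
    """List model names from the .py files directly inside model_dir.

    model_dir is a hypothetical parameter used only for this sketch.
    """
    return sorted(
        f.stem  # file name without the .py extension
        for f in model_dir.iterdir()
        # assumption: __init__.py and private/base modules start with '_'
        if f.is_file() and f.suffix == '.py' and not f.name.startswith('_')
    )
```

Sorting is an addition of the sketch, since directory listing order is not guaranteed.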

easypheno.utils.helper_functions.test_likely_categorical(vector_to_test, abs_unique_threshold=20)

Test whether a vector is most likely categorical. Simple heuristic: the vector is considered categorical if its number of unique values does not exceed a specified threshold.

Parameters
  • vector_to_test (list) – vector that is tested if it is most likely categorical

  • abs_unique_threshold (int) – maximum number of unique values (an absolute count, not a ratio) for the vector to be declared categorical

Returns

True if the vector is most likely categorical, False otherwise

Return type

bool

easypheno.utils.helper_functions.get_mapping_name_to_class()

Get a mapping from model name (file name of the module in package model, without .py) to class name.

Returns

dictionary mapping model name to class name

Return type

dict
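One way to derive such a mapping without importing the model modules is to parse each file with the standard ast module. This sketch assumes a hypothetical model_dir parameter and that the first top-level class in each model file is its model class; files starting with an underscore (e.g. __init__.py or base modules) are skipped by assumption.

```python
import ast
import pathlib


def get_mapping_name_to_class(model_dir: pathlib.Path) -> dict:
    """Map each model file name (without .py) to the first class it defines.

    model_dir and the 'first top-level class' rule are assumptions of this sketch.
    """
    mapping = {}
    for py_file in sorted(model_dir.glob('*.py')):
        if py_file.name.startswith('_'):
            continue  # assumption: skip __init__.py and private/base modules
        tree = ast.parse(py_file.read_text())
        # only classes defined at module level, in source order
        class_names = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
        if class_names:
            mapping[py_file.stem] = class_names[0]
    return mapping
```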

easypheno.utils.helper_functions.set_all_seeds(seed=42)

Set the seeds of all libraries that provide a seeding function, for reproducibility of results.

Parameters

seed (int) – seed to use
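A sketch of the idea using the standard library, with NumPy seeded only if it is installed. Which libraries the real helper seeds is not listed in this reference; the deep-learning frameworks named in the comment are an assumption.

```python
import os
import random

try:
    import numpy as np  # optional: only seeded if installed
except ImportError:
    np = None


def set_all_seeds(seed: int = 42) -> None:
    """Seed the random number generators this sketch knows about."""
    # affects hash randomization only in subprocesses started afterwards
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    # assumption: the real helper may also seed deep-learning frameworks
    # (e.g. torch.manual_seed, tf.random.set_seed); omitted here
```

Note that setting PYTHONHASHSEED at runtime does not change string hashing in the already running interpreter.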

easypheno.utils.helper_functions.get_subpath_for_datasplit(datasplit, datasplit_params)

Construct the subpath according to the datasplit.

Datasplit parameters:

  • nested-cv: [n_outerfolds, n_innerfolds]

  • cv-test: [n_innerfolds, test_set_size_percentage]

  • train-val-test: [val_set_size_percentage, test_set_size_percentage]

Parameters
  • datasplit (str) – name of the datasplit: nested-cv, cv-test or train-val-test

  • datasplit_params (list) – parameters to use for the specific datasplit

Returns

string with the subpath

Return type

str
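The parameter list above can be turned into a small validating sketch. The output format '<datasplit>_<p1>-<p2>' is purely hypothetical; the actual subpath naming is not specified in this reference.

```python
# Parameter names each datasplit expects, following the list above.
DATASPLIT_PARAM_NAMES = {
    'nested-cv': ('n_outerfolds', 'n_innerfolds'),
    'cv-test': ('n_innerfolds', 'test_set_size_percentage'),
    'train-val-test': ('val_set_size_percentage', 'test_set_size_percentage'),
}


def get_subpath_for_datasplit(datasplit: str, datasplit_params: list) -> str:
    """Build a subpath string in a hypothetical '<datasplit>_<p1>-<p2>' format."""
    if datasplit not in DATASPLIT_PARAM_NAMES:
        raise ValueError(f'unknown datasplit: {datasplit}')
    expected = DATASPLIT_PARAM_NAMES[datasplit]
    if len(datasplit_params) != len(expected):
        raise ValueError(f'{datasplit} expects parameters {expected}')
    return datasplit + '_' + '-'.join(str(p) for p in datasplit_params)
```

Under this assumed format, get_subpath_for_datasplit('nested-cv', [5, 4]) returns 'nested-cv_5-4'.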

easypheno.utils.helper_functions.save_model_overview_dict(model_overview, save_path)

Structure the results of a whole optimization run for multiple models and save them in one CSV file.

Parameters
  • model_overview (dict) – dictionary with results overview

  • save_path (str) – filepath for saving the results overview file

easypheno.utils.helper_functions.sort_models_by_encoding(models_list)

Sort models by the encoding that will be used.

Parameters

models_list (list) – unsorted list of models

Returns

list of models sorted by encoding

Return type

list
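Because Python's sorted() is stable, sorting by encoding groups models that share an encoding while keeping their original relative order. In this sketch the model_encodings lookup and the encoding names in the example are hypothetical; the documented function takes only models_list and determines each model's encoding itself.

```python
def sort_models_by_encoding(models_list: list, model_encodings: dict) -> list:
    """Group models that share an encoding next to each other.

    model_encodings is a hypothetical lookup {model_name: encoding_name}.
    """
    # sorted() is stable: models with the same encoding keep their input order
    return sorted(models_list, key=lambda model: model_encodings[model])
```

For example, with encodings {'cnn': 'onehot', 'xgboost': '012', 'mlp': '012'}, the list ['cnn', 'xgboost', 'mlp'] becomes ['xgboost', 'mlp', 'cnn'].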

easypheno.utils.helper_functions.get_all_subdirectories_non_recursive(path)

Get all direct (non-recursive) subdirectories of path.

Parameters

path (pathlib.Path) – path to search

Returns

list with all direct subdirectories of path

Return type

list
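With pathlib, the non-recursive behaviour follows from Path.iterdir(), which yields only the immediate children of a directory. Sorting the result is an addition of this sketch for deterministic output.

```python
import pathlib


def get_all_subdirectories_non_recursive(path: pathlib.Path) -> list:
    """Return the direct subdirectories of path without descending into them."""
    # iterdir() yields only the immediate children of path
    return sorted(child for child in path.iterdir() if child.is_dir())
```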

easypheno.utils.helper_functions.get_all_files(path)

Get all files directly inside path (non-recursive).

Parameters

path (pathlib.Path) – path to search

Returns

list with all files directly inside path

Return type

list
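A pathlib-based sketch: Path.iterdir() lists only direct children and Path.is_file() keeps regular files, so subdirectories and their contents are excluded. Sorting is an addition for deterministic output.

```python
import pathlib


def get_all_files(path: pathlib.Path) -> list:
    """Return the regular files directly inside path."""
    # subdirectories (and anything inside them) are skipped
    return sorted(child for child in path.iterdir() if child.is_file())
```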

easypheno.utils.helper_functions.get_all_files_with_suffix(path, suffix)

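Get all files directly inside path (non-recursive) that have the given suffix.

Parameters
  • path (pathlib.Path) – path to search

  • suffix (str) – file suffix to filter by

Returns

list with all non-recursive files of path that have the given suffix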

Return type

list
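A pathlib-based sketch that filters on Path.suffix. It assumes suffix is given with its leading dot (e.g. '.csv'), which is how pathlib reports suffixes; Path.suffix holds only the last extension, so 'a.tar.gz' has suffix '.gz'.

```python
import pathlib


def get_all_files_with_suffix(path: pathlib.Path, suffix: str) -> list:
    """Return the files directly inside path whose final suffix equals suffix."""
    # Path.suffix includes the leading dot, e.g. '.csv'
    return sorted(
        child for child in path.iterdir()
        if child.is_file() and child.suffix == suffix
    )
```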

easypheno.utils.helper_functions.get_datasplit_config_info_for_resultfolder(resultfolder)

Get all datasplit info for a result folder.

Parameters

resultfolder (str) – path of the result folder to retrieve the info from

Returns

datasplit info with datasplit, n_outerfolds, n_innerfolds, val_set_size_percentage, test_set_size_percentage, maf_percentage

Return type

tuple
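A sketch that parses the datasplit configuration out of a result folder's name. The naming scheme '<datasplit>_<p1>-<p2>_maf<maf_percentage>' is a hypothetical assumption, and fields that do not apply to a datasplit are returned as None.

```python
import pathlib


def get_datasplit_config_info_for_resultfolder(resultfolder: str) -> tuple:
    """Parse a folder name of the hypothetical form '<datasplit>_<p1>-<p2>_maf<maf>'."""
    name = pathlib.PurePath(resultfolder).name
    datasplit, params, maf_part = name.split('_')  # ValueError unless exactly 3 parts
    if not maf_part.startswith('maf'):
        raise ValueError(f'unexpected folder name: {name}')
    maf_percentage = int(maf_part[len('maf'):])
    p1, p2 = (int(p) for p in params.split('-'))
    # fields that do not apply to the datasplit stay None
    n_outerfolds = n_innerfolds = None
    val_set_size_percentage = test_set_size_percentage = None
    if datasplit == 'nested-cv':
        n_outerfolds, n_innerfolds = p1, p2
    elif datasplit == 'cv-test':
        n_innerfolds, test_set_size_percentage = p1, p2
    elif datasplit == 'train-val-test':
        val_set_size_percentage, test_set_size_percentage = p1, p2
    else:
        raise ValueError(f'unknown datasplit: {datasplit}')
    return (datasplit, n_outerfolds, n_innerfolds,
            val_set_size_percentage, test_set_size_percentage, maf_percentage)
```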