easypheno.preprocess.encoding_functions

Module Contents

Functions

get_encoding(models, user_encoding)

Get a list of all required encodings.

get_list_of_encodings()

Get a list of all implemented encodings.

get_base_encoding(encoding)

Check which base encoding is needed to create required encoding.

check_encoding_of_genotype(X)

Check the encoding of the genotype matrix.

encode_genotype(X, required_encoding)

Compute the required encoding of the genotype matrix.

get_additive_encoding(X, style = '012')

Generate genotype matrix in additive encoding.

get_onehot_encoding(X)

Generate genotype matrix in onehot encoding.

easypheno.preprocess.encoding_functions.get_encoding(models, user_encoding)

Get a list of all required encodings.

Parameters
  • models – models to consider

  • user_encoding (str) – encoding specified by the user

Returns

list of encodings

Return type

list

easypheno.preprocess.encoding_functions.get_list_of_encodings()

Get a list of all implemented encodings.

Note: adapt this function if a new encoding is added.

Returns

List of all possible encodings

Return type

list

easypheno.preprocess.encoding_functions.get_base_encoding(encoding)

Check which base encoding is needed to create required encoding.

Note: adapt this function if a new encoding is added.

Parameters

encoding (str) – required encoding

Returns

base encoding

Return type

str

easypheno.preprocess.encoding_functions.check_encoding_of_genotype(X)

Check the encoding of the genotype matrix.

Note: adapt this function if a new encoding is added.

Parameters

X (numpy.array) – genotype matrix

Returns

encoding of the genotype matrix

Return type

str
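As an illustration of what such a check can look like, here is a minimal sketch that infers an encoding from the array's shape, dtype, and values. It is not easyPheno's implementation: the helper name `guess_encoding` and the returned labels `'raw'`, `'012'`, and `'onehot'` are assumptions made for this example.

```python
import numpy as np

# Letters that appear in the onehot tables of this module.
RAW_CODES = list("ACGTKMRSWY")


def guess_encoding(X) -> str:
    """Hypothetical helper: infer an encoding label from shape, dtype and values."""
    X = np.asarray(X)
    if X.ndim == 3:
        # (samples, SNPs, categories) is taken to be a onehot encoding.
        return "onehot"
    if X.dtype.kind in "US" and np.isin(X.astype(str), RAW_CODES).all():
        # String entries made up only of allele letters.
        return "raw"
    if X.dtype.kind in "iuf" and np.isin(X, (0, 1, 2)).all():
        # Numeric entries restricted to 0, 1 and 2.
        return "012"
    raise ValueError("genotype matrix is in an unrecognized encoding")


print(guess_encoding(np.array([["A", "R"], ["T", "C"]])))
print(guess_encoding(np.array([[0, 1], [2, 0]])))
```

Checking the dtype before the values keeps string and numeric matrices from being compared against each other.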

easypheno.preprocess.encoding_functions.encode_genotype(X, required_encoding)

Compute the required encoding of the genotype matrix.

Note: adapt this function if a new encoding is added.

Parameters
  • X (numpy.array) – genotype matrix

  • required_encoding (str) – encoding of genotype matrix to create

Returns

X in new encoding

Return type

numpy.array

easypheno.preprocess.encoding_functions.get_additive_encoding(X, style='012')

Generate genotype matrix in additive encoding.

For style='012':

  • 0: homozygous major allele

  • 1: heterozygous

  • 2: homozygous minor allele

For style='101':

  • 1: homozygous major allele

  • 0: heterozygous

  • -1: homozygous minor allele

Parameters
  • X (numpy.array) – genotype matrix in raw encoding, i.e. containing the alleles

  • style (str) – encoding style, ‘012’ or ‘101’; default is ‘012’

Returns

genotype matrix in additive encoding (X_012)

Return type

numpy.array
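The '012' mapping can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not easyPheno's implementation: the helper name `additive_sketch` is hypothetical, homozygous calls are assumed to be the letters A, C, G, T, every other letter is treated as a heterozygous call, and the less frequent homozygous letter at each SNP is taken to be the minor allele.

```python
import numpy as np

HOMOZYGOUS = ("A", "C", "G", "T")


def additive_sketch(X_raw) -> np.ndarray:
    """Hypothetical helper: raw allele letters -> 0/1/2 per SNP column."""
    X_raw = np.asarray(X_raw).astype(str)
    X_012 = np.zeros(X_raw.shape, dtype=np.int8)
    for j in range(X_raw.shape[1]):
        col = X_raw[:, j]
        # Count homozygous calls only; they decide which allele is major.
        counts = {a: int(np.sum(col == a)) for a in HOMOZYGOUS if np.any(col == a)}
        if len(counts) > 2:
            raise ValueError(f"more than two alleles at SNP {j}")
        if not counts:
            # No homozygous call in this column: every sample is heterozygous.
            X_012[:, j] = 1
            continue
        # On a tie, the letter that comes first in HOMOZYGOUS is used as major.
        major = max(counts, key=counts.get)
        is_hom = np.isin(col, HOMOZYGOUS)
        X_012[~is_hom, j] = 1                    # heterozygous
        X_012[is_hom & (col != major), j] = 2    # homozygous minor allele
    return X_012


X_raw = np.array([["A", "C"], ["A", "T"], ["G", "Y"], ["R", "C"]])
X_012 = additive_sketch(X_raw)
print(X_012)
```

Entries for the homozygous major allele stay at the initial value 0, so only the heterozygous and minor-allele positions need to be written.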

easypheno.preprocess.encoding_functions.get_onehot_encoding(X)

Generate genotype matrix in onehot encoding. If the genotype matrix is homozygous, create a 3D torch tensor of shape (samples, SNPs, 4), where the last dimension holds the onehot encoding of the allele:

  • A : [1,0,0,0]

  • C : [0,1,0,0]

  • G : [0,0,1,0]

  • T : [0,0,0,1]

If the genotype matrix is heterozygous, create a 3D torch tensor of shape (samples, SNPs, 10), where the last dimension holds the onehot encoding of the allele code:

  • A : [1,0,0,0,0,0,0,0,0,0]

  • C : [0,1,0,0,0,0,0,0,0,0]

  • G : [0,0,1,0,0,0,0,0,0,0]

  • K : [0,0,0,1,0,0,0,0,0,0]

  • M : [0,0,0,0,1,0,0,0,0,0]

  • R : [0,0,0,0,0,1,0,0,0,0]

  • S : [0,0,0,0,0,0,1,0,0,0]

  • T : [0,0,0,0,0,0,0,1,0,0]

  • W : [0,0,0,0,0,0,0,0,1,0]

  • Y : [0,0,0,0,0,0,0,0,0,1]

Parameters

X (numpy.array) – genotype matrix in raw encoding, i.e. containing the alleles

Returns

genotype matrix in onehot encoding (X_onehot)

Return type

numpy.array
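The two tables can be reproduced with a short NumPy sketch. It is an illustration only, not easyPheno's implementation: the helper name `onehot_sketch` is hypothetical, the 4-letter alphabet is assumed to apply whenever no heterozygous code occurs anywhere in the matrix, and the result is a NumPy array, following the documented return type.

```python
import numpy as np

# Column order follows the two onehot tables; both alphabets are sorted.
HOMOZYGOUS_CODES = np.array(["A", "C", "G", "T"])
ALL_CODES = np.array(["A", "C", "G", "K", "M", "R", "S", "T", "W", "Y"])


def onehot_sketch(X_raw) -> np.ndarray:
    """Hypothetical helper: raw letters -> array of shape (samples, SNPs, 4 or 10)."""
    X_raw = np.asarray(X_raw).astype(str)
    # Use the 4-letter alphabet only if every entry is a homozygous letter.
    only_hom = np.isin(X_raw, HOMOZYGOUS_CODES).all()
    codes = HOMOZYGOUS_CODES if only_hom else ALL_CODES
    if not np.isin(X_raw, codes).all():
        raise ValueError("genotype matrix contains an unknown allele code")
    # Because the alphabet is sorted, searchsorted returns each letter's position.
    idx = np.searchsorted(codes, X_raw)
    # Indexing the identity matrix picks one onehot row per entry.
    return np.eye(len(codes), dtype=np.int8)[idx]


X_hom = onehot_sketch(np.array([["A", "G"], ["T", "C"]]))
X_het = onehot_sketch(np.array([["A", "R"]]))
print(X_hom.shape, X_het.shape)
```

Indexing an identity matrix with the integer positions avoids an explicit loop over samples and SNPs.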