Omics Module

Core module

class troppo.omics.core.IdentifierMapping(type_name: str, id_mapping_table: DataFrame)[source]

Bases: object

get_id_table(ids: Sequence[Union[str, int]], from_id)[source]
map_ids(ids: Sequence[Union[str, int]], from_id: Union[str, int], to_id: Union[str, int])[source]
property name
class troppo.omics.core.OmicsContainer(omicstype: Optional[str] = None, condition: Optional[str] = None, data: Optional[dict] = None, nomenclature: Optional[str] = None)[source]

Bases: object

OmicsContainer class to be used for the creation of objects that store omics data and other useful information, such as its type, and the tissue condition from where this data was obtained. To successfully create an OmicsContainer object one must:

  1. Create an OmicsContainer object providing: a) its omicstype, b) the tissue/patient condition

  2. Use its .load() method providing a previously created reader object (HpaReader, ProbeReader, GenericReader)

Once created this object can be transformed in several ways:
  1. Id conversion

  2. Value conversion

  3. Filtering by id, regular expression, or value threshold

  4. Log transformation, or data normalization

The main attribute is .data, a dictionary of the form {gene_id: expression value}.

Attributes

otype: str

The type of omics data stored in the container

condition: str

The condition from where the data was obtained

data: dict

The data stored in the container

nomenclature: str

The nomenclature used for the gene ids

convertIds(new: str)[source]

Redefines the ids (keys) of the data attribute.

Parameters

new: string

designation of the new id according to HGNC

convertValues(mapping: dict)[source]

Converts the values in the exp_val field to different values based on a valid user-supplied mapping. IMPORTANT: this will not work if _values contains NAs. The mapping must be a dictionary of either:

  • old value (may it be string or numeric): new value (may it be string or numeric)

  • tuple of (lower bound, upper bound) of old value: new value (numeric, string)

Parameters

mapping: dict

a dictionary containing the mapping between the values to be converted and the desired values
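The two mapping shapes can be illustrated with a minimal standalone sketch. The helper `convert_values` below is hypothetical and works on a plain dictionary rather than on an OmicsContainer; it also assumes that range bounds are inclusive, which the description above does not state.

```python
from typing import Dict, Hashable


def convert_values(data: Dict[str, object], mapping: Dict[Hashable, object]) -> Dict[str, object]:
    """Return a copy of `data` with each value replaced according to `mapping`."""
    converted = {}
    for gene, value in data.items():
        if value in mapping:
            # Exact-value rule: old value -> new value.
            converted[gene] = mapping[value]
            continue
        if isinstance(value, (int, float)):
            # Range rule: (lower bound, upper bound) -> new value.
            # Assumption: both bounds are inclusive.
            hits = [new for key, new in mapping.items()
                    if isinstance(key, tuple) and key[0] <= value <= key[1]]
            if hits:
                converted[gene] = hits[0]
                continue
        # Values no rule covers (including NaN) are rejected.
        raise ValueError(f"no mapping rule covers {value!r} (gene {gene})")
    return converted


# Discrete levels mapped to numbers.
levels = convert_values({"g1": "High", "g2": "Low"}, {"High": 3, "Low": 1})

# Numeric values binned through (lower, upper) ranges.
binned = convert_values({"g1": 0.5, "g2": 7.0}, {(0, 1): "off", (2, 10): "on"})
```

Raising on uncovered values is a design choice of the sketch; it mirrors the warning that the conversion does not work when NAs are present.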

dropNA()[source]

Removes every entry whose exp_val is NA

filterById(regex: str) OmicsContainer[source]

Filters the data attribute to contain genes that match a regular expression or string supplied by the user

Parameters

regex: string

regular expression or string to be contained in the Gene Symbol field of the data attr.

Returns

OmicsContainer:

a new OmicsContainer object is returned once this filter is applied. Original instance remains unchanged.

filterByValue(op: str, threshold: Union[int, float, tuple, str]) OmicsContainer[source]

Filters the _values attribute to match a user-defined filter. The above and under operations use the > and < operators, respectively, while between uses <= and >=.

Parameters

op: string

one of (above, under, between, oneof)

threshold: int, float, tuple, string

numeric threshold for above and under, tuple of (lowerbound, upperbound) for between, string with the discrete levels to include for oneof

Returns

OmicsContainer:

a new OmicsContainer object is returned once this filter is applied. Original instance remains unchanged.
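A rough sketch of the four operations, written against a plain dictionary. The function `filter_by_value` is hypothetical and is not the library's implementation; the handling of oneof (a single string or a collection of levels) is an assumption.

```python
from typing import Dict


def filter_by_value(data: Dict[str, object], op: str, threshold) -> Dict[str, object]:
    """Return a new dict holding only the entries that pass the filter."""
    if op == "above":
        keep = lambda v: v > threshold          # strict comparison
    elif op == "under":
        keep = lambda v: v < threshold          # strict comparison
    elif op == "between":
        low, high = threshold
        keep = lambda v: low <= v <= high       # inclusive bounds
    elif op == "oneof":
        # Assumption: accept one level as a string, or several as a collection.
        levels = {threshold} if isinstance(threshold, str) else set(threshold)
        keep = lambda v: v in levels
    else:
        raise ValueError(f"unknown operation: {op!r}")
    # A new dict is built, so the input is left unchanged.
    return {gene: value for gene, value in data.items() if keep(value)}


expression = {"g1": 1.0, "g2": 5.0, "g3": 10.0}
above_five = filter_by_value(expression, "above", 5)
one_to_five = filter_by_value(expression, "between", (1, 5))
high_only = filter_by_value({"g1": "High", "g2": "Low"}, "oneof", "High")
```

Note that 5.0 is excluded by above but included by between, which is the difference between the strict and inclusive operators.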

get_Condition()[source]
get_Data()[source]
get_Nomenclature()[source]
get_OmicsType()[source]
get_integrated_data_map(model_reader: ~troppo.omics.readers.hpa.HpaReader, and_func=<built-in function min>, or_func=<built-in function max>)[source]

Integrates different omics data with a metabolic model loaded with the framed package. Matches the model's gene, metabolite, or reaction ids with those present in the OmicsContainer object.

Parameters

model_reader: HpaReader or ProbeReader or GenericReader

a cobamp AbstractModelObjectReader object

and_func:

the mathematical function to replace the “AND” operator present in the Gene-Protein-Rules

or_func:

the mathematical function to replace the “OR” operator present in the Gene-Protein-Rules

Returns

OmicsDataMap:

an OmicsDataMap object which contains the mapping between reactions/metabolites and its fluxes based on the supplied omics data.
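To show what replacing “AND” and “OR” with functions means, here is a small sketch that scores a single rule. The nested-tuple rule format and the name `score_rule` are assumptions made for this example only; the library parses its rules from the model and may treat missing genes differently.

```python
from typing import Callable, Dict, Union

Rule = Union[str, tuple]  # a gene id, or ("and" | "or", operand, operand, ...)


def score_rule(rule: Rule, gene_scores: Dict[str, float],
               and_func: Callable = min, or_func: Callable = max) -> float:
    """Collapse a gene rule into one score by recursive evaluation."""
    if isinstance(rule, str):
        return gene_scores[rule]            # leaf: look up the gene's value
    op, *operands = rule
    if op not in ("and", "or"):
        raise ValueError(f"unknown operator: {op!r}")
    values = [score_rule(o, gene_scores, and_func, or_func) for o in operands]
    return and_func(values) if op == "and" else or_func(values)


# (g1 AND g2) OR g3
rule = ("or", ("and", "g1", "g2"), "g3")
scores = {"g1": 2.0, "g2": 8.0, "g3": 5.0}

default_score = score_rule(rule, scores)               # max(min(2, 8), 5)
summed_score = score_rule(rule, scores, or_func=sum)   # min(2, 8) + 5
```

With the defaults, a complex (AND) is limited by its least expressed subunit, while isozymes (OR) take the best available alternative.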

load(arg: dict, **kwargs)[source]

Loads data into the OmicsContainer object. Data can be loaded from a dictionary or from a reader object.

Parameters

arg: dict or reader object

The data to be loaded into the OmicsContainer object

kwargs: dict

The keyword arguments to be passed to the reader object

print_values()[source]
set_condition(newCond: str)[source]
set_data(newData: dict)[source]
set_type(newType: str)[source]
transform(func: str)[source]

Applies func to the expression values of the data attribute. Only compatible with numerical containers.

Parameters

func: string

a function to be applied to the values of the container, either ‘norm’ or ‘logx’

Original number = x; transformed number x’ = log10(x)
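As a minimal illustration of the log10 formula, applied to a plain dictionary (the helper name `log10_transform` is hypothetical):

```python
import math
from typing import Dict


def log10_transform(data: Dict[str, float]) -> Dict[str, float]:
    """Return a new dict with x' = log10(x) for every value."""
    # math.log10 raises ValueError for zero or negative input,
    # so such values would need handling before the transform.
    return {gene: math.log10(value) for gene, value in data.items()}


transformed = log10_transform({"g1": 100.0, "g2": 1.0})
```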

class troppo.omics.core.OmicsDataMap(scores, mapType)[source]

Bases: object

Stores integrated omics data, matching a given metabolic model

Attributes

_mapType: str

The type of map stored in the object

_scores: dict

The scores stored in the object

get_scores()[source]
mapType()[source]
select(op: str, threshold: Number) set[source]

Filters the original reaction scores to those above or under a threshold. The above and under operations use the >= and <= operators, respectively.

Parameters

op: str

either “above” or “under” determining which scores shall be chosen

threshold: Number

a float or an integer; scores above or under this value are selected

Returns

set:

a set of reaction ids whose scores are above or under the threshold
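The selection can be pictured with the following sketch on a plain score dictionary. The function is a stand-in, not the method itself, and skipping reactions whose score is None is an assumption.

```python
from numbers import Number
from typing import Dict, Optional, Set


def select(scores: Dict[str, Optional[float]], op: str, threshold: Number) -> Set[str]:
    """Return the ids whose score is above (>=) or under (<=) the threshold."""
    if op not in ("above", "under"):
        raise ValueError(f"unknown operation: {op!r}")
    selected = set()
    for reaction, score in scores.items():
        if score is None:
            continue                        # assumption: unscored reactions are skipped
        if (op == "above" and score >= threshold) or (op == "under" and score <= threshold):
            selected.add(reaction)
    return selected


reaction_scores = {"R1": 0.2, "R2": 5.0, "R3": None, "R4": 9.1}
high = select(reaction_scores, "above", 5.0)
low = select(reaction_scores, "under", 5.0)
```

Because both operators are inclusive, a reaction sitting exactly on the threshold (R2 here) appears in both sets.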

set_scores(newScores: dict)[source]

Sets the scores attribute to a new dictionary

Parameters

newScores: dict

the new scores to be set

class troppo.omics.core.OmicsMeasurementSet(sample_labels: Sequence[Union[str, int]], feature_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray])[source]

Bases: TabularContainer

to_omics_container(sample_id)[source]
class troppo.omics.core.TabularContainer(row_labels: Sequence[Union[str, int]], column_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray])[source]

Bases: object

TabularContainer class to be used for the creation of objects that store tabular data and other useful information, such as its row and column labels. This class is meant to be used as a base class for other classes that store tabular data.

Parameters

row_labels: Sequence[Union[str, int]]

The row labels of the data

column_labels: Sequence[str]

The column labels of the data

values: lofl_array

The values of the data

Attributes

data: pd.DataFrame

The data stored in the container

property column_names
property data
drop(rows: Optional[Sequence] = None, columns: Optional[Sequence] = None)[source]

Drops the given rows and columns from the data attribute

Parameters

rows: Sequence

The rows to be dropped

columns: Sequence

The columns to be dropped

property row_names
transform(func: callable)[source]
class troppo.omics.core.TypedOmicsMeasurementSet(sample_labels: Sequence[Union[str, int]], feature_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray], omics_type: IdentifierMapping)[source]

Bases: OmicsMeasurementSet

convert_feature_ids(from_id, to_id)[source]
property omics_type: IdentifierMapping
to_omics_container(sample_id)[source]
troppo.omics.core.has_valid_dims(rows: Sequence, cols: Sequence, data: Union[Sequence[Sequence[Number]], ndarray])[source]

Checks if the data has the same dimensions as the rows and columns

Parameters

rows: Sequence

The rows of the data

cols: Sequence

The columns of the data

data: lofl_array

The data to be checked

Returns

bool, bool:

True if the data has the same dimensions as the rows and columns, False otherwise

Gene-level thresholding

class troppo.omics.gene_level_thresholding.GeneLevelThresholding(omics_dataframe: DataFrame, thresholding_strat: str = 'global', global_threshold_lower: Optional[int] = None, global_threshold_upper: Optional[int] = None, local_threshold: Optional[int] = None)[source]

Bases: object

This class transforms the dataframe containing the omics data and performs gene-level thresholding on it. It currently supports the Global and Local thresholding approaches described by Richelle, Joshi and Lewis (2019) (https://doi.org/10.1371/journal.pcbi.1007185). These include:

  • global: genes with a value lower than the lower global threshold (GTL) are considered inactive; genes with a value greater than GTL are considered active.

  • local t1: genes with a value lower than GTL are considered inactive; for genes with a value greater than GTL, the gene is considered inactive if its value is lower than the local threshold (LT), otherwise it is considered active.

  • local t2: genes with a value lower than GTL are considered inactive; genes with a value greater than the upper global threshold (GTU) are considered active; genes with a value between GTL and GTU are only considered active if their value is greater than the local threshold (LT).

Thresholds are selected in accordance with the distribution of the data. The threshold parameters give the position of the quantile to use in the list [0.1, 0.25, 0.5, 0.75, 0.9]; the threshold value is then the value in the dataset that corresponds to that quantile.

Parameters

omics_dataframe: pandas.DataFrame

Omics data to be thresholded.

thresholding_strat: str

Thresholding strategy to be used. Must be one of: global, local t1, local t2.

global_threshold_lower: int or None, default = None

Position of the Global Lower threshold value on the quantile list.

global_threshold_upper: int or None, default = None

Position of the Global Upper threshold value on the quantile list.

local_threshold: int or None, default = None

Position of the Local threshold value on the quantile list.
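The decision logic of the three strategies can be sketched for a single gene value. This is an illustration only: it assumes GTL is the lower and GTU the upper global threshold (matching the gtlow and gtup arguments of the static methods), it reduces the outcome to a boolean, and the handling of values that sit exactly on a threshold is a guess. The class methods return a dict per sample, so their output may look different.

```python
from typing import Optional


def is_active(value: float, strategy: str, gtl: float,
              gtu: Optional[float] = None, lt: Optional[float] = None) -> bool:
    """Classify one expression value as active (True) or inactive (False)."""
    if strategy == "global":
        # One global cut-off decides everything.
        return value > gtl
    if strategy == "local t1":
        # Below the global cut-off: inactive. Above it: the local
        # (gene-specific) threshold decides.
        return value > gtl and value >= lt
    if strategy == "local t2":
        if value < gtl:
            return False                    # clearly inactive
        if value > gtu:
            return True                     # clearly active
        return value > lt                   # in between: local threshold decides
    raise ValueError(f"unknown strategy: {strategy!r}")


calls = {
    "global": is_active(5.0, "global", gtl=3.0),
    "t1_low_local": is_active(5.0, "local t1", gtl=3.0, lt=6.0),
    "t2_between": is_active(5.0, "local t2", gtl=3.0, gtu=8.0, lt=4.0),
    "t2_above_upper": is_active(9.0, "local t2", gtl=3.0, gtu=8.0, lt=20.0),
}
```

The last case shows the purpose of the upper threshold in local t2: a gene above GTU is called active even when its local threshold is higher than its value.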

apply_thresholding_filter() DataFrame[source]

Thresholding filter for the omics data.

Returns

filtered_dataset: pandas.DataFrame

Filtered omics dataframe.

static global_thresholding(sample_series: Series, gtlow: float, maxexp: float) dict[source]

Global thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series

Omics data from a specific sample.

gtlow: float

Global threshold lower value.

maxexp: float

Maximum expression value of the dataset.

Returns

filtered_sample: dict

static local_t1_thresholding(sample_series: Series, gtlow: float, lt: Series, maxexp: float) dict[source]

Local T1 thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series

Omics data from a specific sample.

gtlow: float

Global threshold lower value.

lt: pd.Series

Local threshold value for each gene.

maxexp: float

Maximum expression value of the dataset.

Returns

filtered_sample: dict

static local_t2_thresholding(sample_series: Series, gtlow: float, gtup: float, lt: Series, maxexp: float) dict[source]

Local T2 thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series

Omics data from a specific sample.

gtlow: float

Global threshold lower value.

gtup: float

Global threshold upper value.

lt: pandas.Series

Local threshold value for each gene.

maxexp: float

Maximum expression value of the dataset.

Returns

filtered_sample: dict

threshold_strategy(sample_series) dict[source]

Thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series

Omics data from a specific sample.

Returns

filtered_sample: dict

Filtered omics data from a specific sample.

Gene ID converter

Created by Jorge Gomes on 06/06/2018 id_converter

troppo.omics.id_converter.idConverter(ids: list, old: str, new: str) dict[source]

This function converts the ids from a given omics dataset into the desired ones to better match a metabolic model. Conversion is done based on the HGNC database.

NOMENCLATURES: ["hgnc_id", "symbol", "name", "entrez_id", "ensembl_gene_id", "vega_id", "ucsc_id", "ccds_id", "uniprot_ids", "pubmed_id", "omim_id", "locus_group", "locus_type", "alias_symbol", "alias_name", "prev_symbol", "prev_name", "ena", "refseq_accession", "rna_central_ids"]

Parameters

ids: list or set

containing the ids to be converted

old: string

exact match, the nomenclature designation of the input IDs. Must be different from new and contained in NOMENCLATURES

new: string

exact match, the nomenclature designation of the output IDs. Must be different from old and contained in NOMENCLATURES

Returns

dict: dictionary with the converted ids as keys and the original ids as values
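The direction of the returned dictionary (converted id as key, original id as value) is easy to get backwards, so here is a toy sketch. The lookup table holds made-up placeholder ids, not real HGNC entries, and `convert_ids` is a hypothetical stand-in for the real function.

```python
from typing import Dict, Iterable, List

# Placeholder rows standing in for the HGNC table; the ids are invented.
TOY_TABLE: List[Dict[str, str]] = [
    {"symbol": "GENE_A", "entrez_id": "1001"},
    {"symbol": "GENE_B", "entrez_id": "1002"},
    {"symbol": "GENE_C", "entrez_id": "1003"},
]


def convert_ids(ids: Iterable[str], old: str, new: str,
                table: List[Dict[str, str]] = TOY_TABLE) -> Dict[str, str]:
    """Map each converted id (key) back to the original id (value)."""
    if old == new:
        raise ValueError("old and new nomenclatures must differ")
    wanted = set(ids)
    # Ids that are absent from the table are silently dropped in this sketch.
    return {row[new]: row[old] for row in table if row[old] in wanted}


converted = convert_ids(["GENE_A", "GENE_C", "UNKNOWN"], old="symbol", new="entrez_id")
```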

troppo.omics.id_converter.searchNomenclature(ids: list) str[source]

This function searches which gene identification nomenclature is used on the provided ids. When ids from different nomenclatures are input, the result will be the nomenclature with the most matches. Also handles cases where some ids do not match but others do.

Parameters

ids: list

List of ids (all using the same nomenclature)

Returns

string

the nomenclature designation according to HGNC complete set table.
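The "most matches wins" behaviour can be sketched as a count per nomenclature column. As before, the table contains invented placeholder ids and `search_nomenclature` is a hypothetical helper, not the library function.

```python
from typing import Dict, Iterable, List

# Placeholder rows standing in for the HGNC table; the ids are invented.
TOY_TABLE: List[Dict[str, str]] = [
    {"symbol": "GENE_A", "entrez_id": "1001"},
    {"symbol": "GENE_B", "entrez_id": "1002"},
    {"symbol": "GENE_C", "entrez_id": "1003"},
]


def search_nomenclature(ids: Iterable[str],
                        table: List[Dict[str, str]] = TOY_TABLE) -> str:
    """Return the column name whose entries match the most input ids."""
    wanted = set(ids)
    matches = {
        column: len(wanted & {row[column] for row in table})
        for column in table[0]
    }
    best = max(matches, key=matches.get)
    if matches[best] == 0:
        raise ValueError("no id matched any nomenclature")
    return best


# Two symbols, one entrez id, and one unknown id: "symbol" has the most matches.
guess = search_nomenclature(["GENE_A", "GENE_B", "1003", "UNKNOWN"])
```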

Reaction-level integration of Omics data

class troppo.omics.integration.AdjustedScoreIntegrationStrategy(protected_reactions: list)[source]

Bases: ScoreIntegrationStrategy, ReactionProtectionMixin

This class is used to integrate the scores of the different omics data.

Attributes

protected_reactions: list

The list of reactions to be protected from being removed by the integration strategy.

integrate(data_map: OmicsDataMap) dict[source]

This method is used to integrate the scores of the different omics data.

Parameters

data_map: OmicsDataMap

The data map containing the gene scores to be integrated into reaction scores.

Returns

dict: The integrated scores.

class troppo.omics.integration.ContinuousScoreIntegrationStrategy(score_apply=None)[source]

Bases: ScoreIntegrationStrategy

This class is used to integrate continuous scores.

Attributes

score_apply: function

The function to be applied to the scores.

integrate(data_map: OmicsDataMap) dict[source]

This method is used to integrate the scores of the different omics data.

Parameters

data_map: OmicsDataMap

The data map containing the gene scores to be integrated into reaction scores.

Returns

dict: The integrated scores.

class troppo.omics.integration.CustomSelectionIntegrationStrategy(group_functions: dict)[source]

Bases: ScoreIntegrationStrategy

This class is used to integrate the scores of the different omics data.

Attributes

group_functions: dict

The dictionary containing the functions to be applied to the scores.

integrate(data_map: OmicsDataMap) dict[source]

This method is used to integrate the scores of the different omics data.

Parameters

data_map: OmicsDataMap

The data map containing the gene scores to be integrated into reaction scores.

Returns

list: The integrated scores.

class troppo.omics.integration.DefaultCoreIntegrationStrategy(threshold: float, protected_reactions: list)[source]

Bases: ScoreIntegrationStrategy, ReactionProtectionMixin

This class is used to integrate the scores of the different omics data.

Attributes

threshold: float or int

The threshold to be applied to the scores.

protected_reactions: list

The list of reactions to be protected from being removed by the integration strategy.

integrate(data_map: OmicsDataMap) list[source]
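One plausible reading of how a threshold and a list of protected reactions combine into a core reaction list is sketched below. This is an assumption about the behaviour, not a description of the class: the function `core_reactions` is hypothetical, the >= comparison is borrowed from the “above” operation of OmicsDataMap.select, and the sorted output is only for readability.

```python
from typing import Dict, Iterable, List, Optional


def core_reactions(scores: Dict[str, Optional[float]], threshold: float,
                   protected_reactions: Iterable[str]) -> List[str]:
    """Reactions scoring at or above the threshold, plus the protected ones."""
    above = {
        reaction for reaction, score in scores.items()
        if score is not None and score >= threshold
    }
    # Protected reactions are kept regardless of their score.
    return sorted(above | set(protected_reactions))


reaction_scores = {"R1": 0.2, "R2": 5.0, "R3": None, "R4": 9.1}
core = core_reactions(reaction_scores, threshold=5.0, protected_reactions=["R1"])
```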
class troppo.omics.integration.ReactionProtectionMixin(protected_reactions: list)[source]

Bases: object

This class is used to protect reactions from being removed by the integration strategy.

Attributes

protected_reactions: list

The list of reactions to be protected from being removed by the integration strategy.

class troppo.omics.integration.ScoreIntegrationStrategy[source]

Bases: object

abstract static integrate(self, data_map: OmicsDataMap)[source]
class troppo.omics.integration.ThresholdSelectionIntegrationStrategy(thresholds: list)[source]

Bases: ScoreIntegrationStrategy

This class is used to integrate the scores of the different omics data.

Attributes

thresholds: list or float or int

The thresholds to be applied to the scores. If a list is provided, the integration will be performed for each threshold. If a single value is provided, the integration will be performed only once.

integrate(data_map: OmicsDataMap) list[source]