Omics Module

Readers for Omics data

Core module

class troppo.omics.core.IdentifierMapping(type_name: str, id_mapping_table: DataFrame)[source]

Bases: object

get_id_table(ids: Sequence[Union[str, int]], from_id)[source]

map_ids(ids: Sequence[Union[str, int]], from_id: Union[str, int], to_id: Union[str, int])[source]

property name

class troppo.omics.core.OmicsContainer(omicstype: Optional[str] = None, condition: Optional[str] = None, data: Optional[dict] = None, nomenclature: Optional[str] = None)[source]

Bases: object

OmicsContainer objects store omics data together with related information, such as the omics type and the tissue condition the data was obtained from. To create a usable OmicsContainer object:

Create an OmicsContainer object, providing: a) its omicstype, b) the tissue/patient condition

Call its .load() method with a previously created reader object (HpaReader, ProbeReader, GenericReader)

Once created this object can be transformed in several ways:

Id conversion
Value conversion
Filtering by id, regular expressions, or values threshold
Log transformation, or data normalization

The main attribute is .data, a dictionary of the form {gene_id: expression value}

Attributes

otype: str: The type of omics data stored in the container
condition: str: The condition from where the data was obtained
data: dict: The data stored in the container
nomenclature: str: The nomenclature used for the gene ids

convertIds(new: str)[source]

Redefines the ids (keys) of the data attribute.

Parameters

new: string: designation of the new id according to HGNC

convertValues(mapping: dict)[source]

Converts the values in the exp_val field to different values based on a valid user-supplied mapping. IMPORTANT: will not work if _values contains NAs. The mapping must be a dictionary of either:

old value (may it be string or numeric): new value (may it be string or numeric)

tuple of (lower bound, upper bound) of old value: new value (numeric, string)

Parameters

mapping: dict: a dictionary containing the mapping between the values to be converted and the desired values
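The two mapping shapes described above can be illustrated with a minimal, standalone sketch. It does not use troppo itself: the function name `convert_values` is hypothetical, and both the inclusive treatment of range bounds and the choice to leave unmapped values untouched are assumptions, not documented behaviour.

```python
from typing import Dict


def convert_values(data: Dict[str, object], mapping: dict) -> Dict[str, object]:
    """Illustrative stand-in for a value-mapping step (not troppo's code)."""
    converted = {}
    for gene_id, value in data.items():
        new_value = value  # assumption: values without a matching key are kept
        for old, new in mapping.items():
            if isinstance(old, tuple):
                lower, upper = old
                # assumption: both bounds of a (lower, upper) key are inclusive
                if isinstance(value, (int, float)) and lower <= value <= upper:
                    new_value = new
                    break
            elif value == old:
                new_value = new
                break
        converted[gene_id] = new_value
    return converted


# Discrete levels mapped to numbers
levels = {"G1": "High", "G2": "Low", "G3": "Medium"}
print(convert_values(levels, {"High": 3, "Medium": 2, "Low": 1}))

# Numeric ranges mapped to labels
counts = {"G1": 0.5, "G2": 12.0, "G3": 250.0}
print(convert_values(counts, {(0, 10): "low", (10.01, 1000): "high"}))
```
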

dropNA()[source]: Removes every entry whose exp_val is NA

filterById(regex: str) → OmicsContainer[source]

Filters the data attribute to contain genes that match a regular expression or string supplied by the user

Parameters

regex: string: regular expression or string to be contained in the Gene Symbol field of the data attr.

Returns

OmicsContainer:: a new OmicsContainer object is returned once this filter is applied. Original instance remains unchanged.

filterByValue(op: str, threshold: Union[int, float, tuple, str]) → OmicsContainer[source]

Filters the _values attribute to match a user-defined filter. The above and under operations use the > and < operators, while between uses >= and <= (both bounds inclusive).

Parameters

op: string: one of (above, under, between, oneof)
threshold: int, float, tuple, string: numeric threshold for above and under, tuple of (lowerbound, upperbound) for between, string with the discrete levels to include for oneof

Returns

OmicsContainer:: a new OmicsContainer object is returned once this filter is applied. Original instance remains unchanged.
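As a rough illustration of the operator semantics described for filterByValue, the sketch below applies the same comparisons to a plain dictionary. The names `filter_by_value` and `_matches` are hypothetical, and accepting either a single string or a collection of levels for oneof is an assumption.

```python
from typing import Dict


def _matches(value, op: str, threshold) -> bool:
    """Return True when value passes the filter (illustrative only)."""
    if op == "above":
        return value > threshold           # strict, as documented
    if op == "under":
        return value < threshold           # strict, as documented
    if op == "between":
        lower, upper = threshold
        return lower <= value <= upper     # inclusive, as documented
    if op == "oneof":
        # assumption: a single level (str) or any collection of levels
        levels = {threshold} if isinstance(threshold, str) else set(threshold)
        return value in levels
    raise ValueError(f"unknown operation: {op!r}")


def filter_by_value(data: Dict[str, object], op: str, threshold) -> Dict[str, object]:
    """Build a new dictionary; the input dictionary is left unchanged."""
    return {gene: value for gene, value in data.items() if _matches(value, op, threshold)}


expression = {"A": 1.0, "B": 5.0, "C": 10.0}
print(filter_by_value(expression, "above", 5.0))
print(filter_by_value(expression, "between", (1.0, 5.0)))
```

Returning a new dictionary mirrors the documented behaviour that the original instance remains unchanged.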

get_Condition()[source]

get_Data()[source]

get_Nomenclature()[source]

get_OmicsType()[source]

get_integrated_data_map(model_reader: ~troppo.omics.readers.hpa.HpaReader, and_func=<built-in function min>, or_func=<built-in function max>)[source]

Integrates omics data with a metabolic model loaded with the framed package. Matches the model's gene ids, metabolite ids or reaction ids against those present in the OmicsContainer object.

Parameters

model_reader: HpaReader or ProbeReader or GenericReader: a cobamp AbstractModelObjectReader object
and_func:: the mathematical function to replace the “AND” operator present in the Gene-Protein-Rules
or_func:: the mathematical function to replace the “OR” operator present in the Gene-Protein-Rules

Returns

OmicsDataMap:: an OmicsDataMap object which contains the mapping between reactions/metabolites and its fluxes based on the supplied omics data.
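To show how and_func and or_func can replace the "AND" and "OR" operators of a gene-protein rule, here is a small standalone sketch. The nested-tuple rule representation and the `score_rule` helper are hypothetical and are not how troppo or cobamp parse rules; skipping genes without a measurement is also an assumption.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# A rule is a gene id, or a tuple ("and" | "or", sub_rule, sub_rule, ...)
Rule = Union[str, Tuple]


def score_rule(
    rule: Rule,
    gene_scores: Dict[str, float],
    and_func: Callable = min,
    or_func: Callable = max,
) -> Optional[float]:
    """Collapse a gene-protein rule into a single score (illustrative only)."""
    if isinstance(rule, str):
        return gene_scores.get(rule)  # None when the gene was not measured
    operator, *operands = rule
    if operator not in ("and", "or"):
        raise ValueError(f"unknown operator: {operator!r}")
    values = [
        score
        for score in (score_rule(r, gene_scores, and_func, or_func) for r in operands)
        if score is not None  # assumption: unmeasured genes are ignored
    ]
    if not values:
        return None
    return and_func(values) if operator == "and" else or_func(values)


# (g1 AND g2) OR g3
rule = ("or", ("and", "g1", "g2"), "g3")
print(score_rule(rule, {"g1": 5.0, "g2": 2.0, "g3": 1.0}))
```

With the defaults, an enzyme complex (AND) is limited by its least expressed subunit, while isozymes (OR) are represented by the most expressed alternative.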

load(arg: dict, **kwargs)[source]

Loads data into the OmicsContainer object. Data can be loaded from a dictionary or from a reader object.

Parameters

arg: dict or reader object: The data to be loaded into the OmicsContainer object
kwargs: dict: The keyword arguments to be passed to the reader object

print_values()[source]

set_condition(newCond: str)[source]

set_data(newData: dict)[source]

set_type(newType: str)[source]

transform(func: str)[source]

Applies func to the expression values of the data attribute. Only compatible with containers that hold numerical values.

Parameters

func: string: a function to be applied to the values of the container, either ‘norm’ or ‘logx’

Log transformation: original number = x; transformed number x' = log10(x)
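The log10 formula can be checked on a plain dictionary with the sketch below. The helper `log10_transform` is hypothetical and only mirrors the formula; it says nothing about how the 'norm' option behaves.

```python
import math
from typing import Dict


def log10_transform(data: Dict[str, float]) -> Dict[str, float]:
    """Apply x' = log10(x) to every value (illustrative only)."""
    # math.log10 raises ValueError for zero or negative inputs,
    # so such values need to be handled before transforming.
    return {gene_id: math.log10(value) for gene_id, value in data.items()}


print(log10_transform({"G1": 1.0, "G2": 100.0, "G3": 1000.0}))
```
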

class troppo.omics.core.OmicsDataMap(scores, mapType)[source]

Bases: object

Stores integrated omics data, matching a given metabolic model

Attributes

_mapType: str: The type of map stored in the object
_scores: dict: The scores stored in the object

get_scores()[source]

mapType()[source]

select(op: str, threshold: Number) → set[source]

Filters the original reaction scores to those above or under a threshold. The above and under operations use the >= and <= operators, respectively.

Parameters

op: str: either “above” or “under” determining which scores shall be chosen
threshold: Number: a float or an integer; scores above or under this value are selected, depending on op

Returns

set:: a set of reaction ids whose scores are above or under the threshold
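The selection rule can be sketched on a plain score dictionary as follows. The function `select_reactions` is a hypothetical stand-in, and skipping reactions whose score is None is an assumption.

```python
from numbers import Number
from typing import Dict, Optional, Set


def select_reactions(scores: Dict[str, Optional[float]], op: str, threshold: Number) -> Set[str]:
    """Return reaction ids scoring at/above or at/below a threshold (illustrative only)."""
    if op not in ("above", "under"):
        raise ValueError(f"unknown operation: {op!r}")
    selected = set()
    for reaction_id, score in scores.items():
        if score is None:
            continue  # assumption: reactions without a score are never selected
        if (op == "above" and score >= threshold) or (op == "under" and score <= threshold):
            selected.add(reaction_id)
    return selected


reaction_scores = {"R1": 0.2, "R2": 0.9, "R3": None, "R4": 0.5}
print(select_reactions(reaction_scores, "above", 0.5))
```
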

set_scores(newScores: dict)[source]

Sets the scores attribute to a new dictionary

Parameters

newScores: dict: the new scores to be set

class troppo.omics.core.OmicsMeasurementSet(sample_labels: Sequence[Union[str, int]], feature_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray])[source]

Bases: TabularContainer

to_omics_container(sample_id)[source]

class troppo.omics.core.TabularContainer(row_labels: Sequence[Union[str, int]], column_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray])[source]

Bases: object

TabularContainer class to be used for the creation of objects that store tabular data and other useful information, such as its row and column labels. This class is meant to be used as a base class for other classes that store tabular data.

Parameters

row_labels: Sequence[Union[str, int]]: The row labels of the data
column_labels: Sequence[str]: The column labels of the data
values: lofl_array: The values of the data

Attributes

data: pd.DataFrame: The data stored in the container

property column_names

property data

drop(rows: Optional[Sequence] = None, columns: Optional[Sequence] = None)[source]

Drops the given rows and columns from the data attribute

Parameters

rows: Sequence: The rows to be dropped
columns: Sequence: The columns to be dropped

property row_names

transform(func: callable)[source]

class troppo.omics.core.TypedOmicsMeasurementSet(sample_labels: Sequence[Union[str, int]], feature_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray], omics_type: IdentifierMapping)[source]

Bases: OmicsMeasurementSet

convert_feature_ids(from_id, to_id)[source]

property omics_type: IdentifierMapping

to_omics_container(sample_id)[source]

troppo.omics.core.has_valid_dims(rows: Sequence, cols: Sequence, data: Union[Sequence[Sequence[Number]], ndarray])[source]

Checks if the data has the same dimensions as the rows and columns

Parameters

rows: Sequence: The rows of the data
cols: Sequence: The columns of the data
data: lofl_array: The data to be checked

Returns

bool, bool:: True if the data has the same dimensions as the rows and columns, False otherwise

Gene-level thresholding

class troppo.omics.gene_level_thresholding.GeneLevelThresholding(omics_dataframe: DataFrame, thresholding_strat: str = 'global', global_threshold_lower: Optional[int] = None, global_threshold_upper: Optional[int] = None, local_threshold: Optional[int] = None)[source]

Bases: object

This class transforms the dataframe containing the omics data by performing gene-level thresholding. It currently supports the Global and Local thresholding approaches described by Richelle, Joshi and Lewis (2019) (https://doi.org/10.1371/journal.pcbi.1007185). These include:

- global: genes with a value lower than the lower global threshold (GTL) are considered inactive; genes with a value greater than the GTL are considered active.
- local t1: genes with a value lower than the lower global threshold (GTL) are considered inactive; for genes with a value greater than the GTL, if the value is lower than the local threshold (LT), the gene is considered inactive, otherwise it is considered active.
- local t2: genes with a value lower than the lower global threshold (GTL) are considered inactive; genes with a value greater than the upper global threshold (GTU) are considered active; genes with a value between the GTL and the GTU are only considered active if their value is greater than the local threshold (LT).

Thresholds are selected in accordance with the distribution of the data. Each threshold option is given as a position in the quantile list, currently [0.1, 0.25, 0.5, 0.75, 0.9]; the threshold value is then the value in the dataset that corresponds to that quantile.

Parameters

omics_dataframe: pandas.DataFrame: Omics data to be thresholded.
thresholding_strat: str: Thresholding strategy to be used. Must be one of: global, local t1, local t2.
global_threshold_lower: int or None, default = None: Position of the Global Lower threshold value on the quantile list.
global_threshold_upper: int or None, default = None: Position of the Global Upper threshold value on the quantile list.
local_threshold: int or None, default = None: Position of the Local threshold value on the quantile list.
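The decision logic of the three strategies can be summarised in a short standalone sketch. It only classifies a single value as active or inactive and is not the class's implementation. It assumes that global and local t1 use the lower global threshold (consistent with the gtlow argument of the static methods), and the handling of values exactly equal to a threshold is also an assumption. The helper `is_active` is hypothetical.

```python
from typing import Optional


def is_active(
    value: float,
    strategy: str,
    gtl: float,
    gtu: Optional[float] = None,
    lt: Optional[float] = None,
) -> bool:
    """Classify one gene value as active (True) or inactive (False)."""
    if strategy == "global":
        return value >= gtl            # assumption: ties count as active
    if strategy == "local t1":
        if value < gtl:
            return False               # below the global threshold: inactive
        return value >= lt             # otherwise the gene's local threshold decides
    if strategy == "local t2":
        if value < gtl:
            return False               # below the lower global threshold: inactive
        if value > gtu:
            return True                # above the upper global threshold: active
        return value > lt              # in between: the local threshold decides
    raise ValueError(f"unknown strategy: {strategy!r}")


values = {"g1": 1.0, "g2": 5.0, "g3": 20.0}
local = {"g1": 0.5, "g2": 8.0, "g3": 30.0}  # one local threshold per gene
for strategy in ("global", "local t1", "local t2"):
    calls = {g: is_active(v, strategy, gtl=2.0, gtu=10.0, lt=local[g]) for g, v in values.items()}
    print(strategy, calls)
```

In this toy example g3 is inactive under local t1 (its value is below its local threshold) but active under local t2, because it exceeds the upper global threshold.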

apply_thresholding_filter() → DataFrame[source]

Thresholding filter for the omics data.

Returns

filtered_dataset: pandas.DataFrame: Filtered omics dataframe.

static global_thresholding(sample_series: Series, gtlow: float, maxexp: float) → dict[source]

Global thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series: Omics data from a specific sample.
gtlow: float: Global threshold lower value.
maxexp: float: Maximum expression value of the dataset.

Returns

filtered_sample: dict

static local_t1_thresholding(sample_series: Series, gtlow: float, lt: Series, maxexp: float) → dict[source]

Local T1 thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series: Omics data from a specific sample.
gtlow: float: Global threshold lower value.
lt: pandas.Series: Local threshold value for each gene.
maxexp: float: Maximum expression value of the dataset.

Returns

filtered_sample: dict

static local_t2_thresholding(sample_series: Series, gtlow: float, gtup: float, lt: Series, maxexp: float) → dict[source]

Local T2 thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series: Omics data from a specific sample.
gtlow: float: Global threshold lower value.
gtup: float: Global threshold upper value.
lt: pandas.Series: Local threshold value for each gene.
maxexp: float: Maximum expression value of the dataset.

Returns

filtered_sample: dict

threshold_strategy(sample_series) → dict[source]

Thresholding strategy for the omics data. Processes a single sample at a time.

Parameters

sample_series: pandas.Series: Omics data from a specific sample.

Returns

filtered_sample: dict

Filtered omics data from a specific sample.

Gene ID converter

Created by Jorge Gomes on 06/06/2018 id_converter

troppo.omics.id_converter.idConverter(ids: list, old: str, new: str) → dict[source]

This function converts the ids from a given omics dataset into the desired ones to better match a metabolic model. Conversion is done based on the HGNC database.

NOMENCLATURES: ["hgnc_id", "symbol", "name", "entrez_id", "ensembl_gene_id", "vega_id", "ucsc_id", "ccds_id", "uniprot_ids", "pubmed_id", "omim_id", "locus_group", "locus_type", "alias_symbol", "alias_name", "prev_symbol", "prev_name", "ena", "refseq_accession", "rna_central_ids"]

Parameters

ids: list or set: containing the ids to be converted
old: string: exact match, the nomenclature designation of the input IDs. Must be different from new and contained in NOMENCLATURES
new: string: exact match, the nomenclature designation of the output IDs. Must be different from old and contained in NOMENCLATURES

Returns

dict: dictionary with the converted ids as keys and the original ids as values
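The shape of the returned dictionary (converted ids as keys, original ids as values) can be illustrated with a tiny in-memory table. This sketch does not query the HGNC database: `TOY_HGNC_ROWS` and `convert_ids` are hypothetical names, and dropping ids that have no match is an assumption.

```python
from typing import Dict, Iterable, List

# Tiny illustrative table; a real conversion relies on the HGNC database.
TOY_HGNC_ROWS: List[Dict[str, str]] = [
    {"symbol": "TP53", "entrez_id": "7157", "ensembl_gene_id": "ENSG00000141510"},
    {"symbol": "BRCA1", "entrez_id": "672", "ensembl_gene_id": "ENSG00000012048"},
]


def convert_ids(ids: Iterable[str], old: str, new: str, table: List[Dict[str, str]]) -> Dict[str, str]:
    """Map ids from the old nomenclature to the new one (illustrative only)."""
    if old == new:
        raise ValueError("old and new nomenclatures must differ")
    lookup = {row[old]: row[new] for row in table}
    # assumption: ids without a match are silently dropped
    return {lookup[i]: i for i in ids if i in lookup}


print(convert_ids(["TP53", "BRCA1", "UNKNOWN"], "symbol", "entrez_id", TOY_HGNC_ROWS))
```
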

troppo.omics.id_converter.searchNomenclature(ids: list) → str[source]

This function searches which gene identification nomenclature is used on the provided ids. When ids from different nomenclatures are input, the result will be the nomenclature with the most matches. Also handles cases where some ids do not match but others do.

Parameters

ids: list: List of ids (all using the same nomenclature)

Returns

string: the nomenclature designation according to HGNC complete set table.
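The "most matches wins" idea described for searchNomenclature can be sketched as follows. The table and the `search_nomenclature` helper are hypothetical stand-ins for the HGNC complete set and the real function, and raising an error when nothing matches is an assumption.

```python
from typing import Dict, Iterable, List

# Tiny illustrative table; the real function relies on the HGNC complete set.
TOY_ROWS: List[Dict[str, str]] = [
    {"symbol": "TP53", "entrez_id": "7157", "ensembl_gene_id": "ENSG00000141510"},
    {"symbol": "BRCA1", "entrez_id": "672", "ensembl_gene_id": "ENSG00000012048"},
]


def search_nomenclature(ids: Iterable[str], table: List[Dict[str, str]]) -> str:
    """Return the column (nomenclature) matching the most ids (illustrative only)."""
    wanted = set(ids)
    matches = {
        column: len(wanted & {row[column] for row in table})
        for column in table[0]
    }
    best = max(matches, key=matches.get)
    if matches[best] == 0:
        raise ValueError("no id matched any known nomenclature")
    return best


# One unknown id and one Entrez id do not change the majority vote
print(search_nomenclature(["TP53", "BRCA1", "7157", "not_a_gene"], TOY_ROWS))
```
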

Reaction-level integration of Omics data

class troppo.omics.integration.AdjustedScoreIntegrationStrategy(protected_reactions: list)[source]

Bases: ScoreIntegrationStrategy, ReactionProtectionMixin

This class is used to integrate the scores of the different omics data.

Attributes

protected_reactions: list: The list of reactions to be protected from being removed by the integration strategy.

integrate(data_map: OmicsDataMap) → dict[source]

This method is used to integrate the scores of the different omics data.

Parameters

data_map: OmicsDataMap: The data map containing the gene scores to be integrated into reaction scores.

Returns

dict: The integrated scores.

class troppo.omics.integration.ContinuousScoreIntegrationStrategy(score_apply=None)[source]

Bases: ScoreIntegrationStrategy

This class is used to integrate continuous scores.

Attributes

score_apply: function: The function to be applied to the scores.

integrate(data_map: OmicsDataMap) → dict[source]

This method is used to integrate the scores of the different omics data.

Parameters

data_map: OmicsDataMap: The data map containing the gene scores to be integrated into reaction scores.

Returns

dict: The integrated scores.
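A score_apply function might, for instance, replace missing scores with a default value. The sketch below assumes that score_apply receives the whole score dictionary and returns a new one; that calling convention is an assumption, and `integrate_continuous` and `replace_missing` are hypothetical names, not part of troppo.

```python
from typing import Callable, Dict, Optional

Scores = Dict[str, Optional[float]]


def integrate_continuous(scores: Scores, score_apply: Optional[Callable[[Scores], dict]] = None) -> dict:
    """Return the scores, optionally post-processed by score_apply (illustrative only)."""
    # assumption: without a function, the scores are passed through unchanged
    return score_apply(scores) if score_apply is not None else dict(scores)


def replace_missing(scores: Scores) -> Dict[str, float]:
    """Example score_apply: treat reactions without a score as 0."""
    return {reaction: 0 if score is None else score for reaction, score in scores.items()}


print(integrate_continuous({"R1": 1.5, "R2": None}, score_apply=replace_missing))
```
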

class troppo.omics.integration.CustomSelectionIntegrationStrategy(group_functions: dict)[source]

Bases: ScoreIntegrationStrategy

This class is used to integrate the scores of the different omics data.

Attributes

group_functions: dict: The dictionary containing the functions to be applied to the scores.

integrate(data_map: OmicsDataMap) → dict[source]

This method is used to integrate the scores of the different omics data.

Parameters

data_map: OmicsDataMap: The data map containing the gene scores to be integrated into reaction scores.

Returns

dict: The integrated scores.

class troppo.omics.integration.DefaultCoreIntegrationStrategy(threshold: float, protected_reactions: list)[source]

Bases: ScoreIntegrationStrategy, ReactionProtectionMixin

This class is used to integrate the scores of the different omics data.

Attributes

threshold: float or int: The threshold to be applied to the scores.
protected_reactions: list: The list of reactions to be protected from being removed by the integration strategy.

integrate(data_map: OmicsDataMap) → list[source]
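One plausible reading of a threshold combined with protected reactions is sketched below: reactions scoring at or above the threshold are kept, and protected reactions are always kept. This is an assumption about the behaviour, not the documented implementation, and `integrate_core` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Optional


def integrate_core(
    scores: Dict[str, Optional[float]],
    threshold: float,
    protected_reactions: Iterable[str],
) -> List[str]:
    """Return the reactions to keep (illustrative only)."""
    # assumption: "above" is inclusive and unscored reactions are not selected
    above = {r for r, s in scores.items() if s is not None and s >= threshold}
    # protected reactions are kept regardless of their score
    return sorted(above | set(protected_reactions))


reaction_scores = {"R1": 0.2, "R2": 0.9, "R3": None, "R4": 0.5}
print(integrate_core(reaction_scores, threshold=0.5, protected_reactions=["R3"]))
```
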

class troppo.omics.integration.ReactionProtectionMixin(protected_reactions: list)[source]

Bases: object

This class is used to protect reactions from being removed by the integration strategy.

Attributes

protected_reactions: list: The list of reactions to be protected from being removed by the integration strategy.

class troppo.omics.integration.ScoreIntegrationStrategy[source]

Bases: object

abstract static integrate(self, data_map: OmicsDataMap)[source]

class troppo.omics.integration.ThresholdSelectionIntegrationStrategy(thresholds: list)[source]

Bases: ScoreIntegrationStrategy

This class is used to integrate the scores of the different omics data.

Attributes

thresholds: list or float or int: The thresholds to be applied to the scores. If a list is provided, the integration will be performed for each threshold. If a single value is provided, the integration will be performed only once.

integrate(data_map: OmicsDataMap) → list[source]