Omics Module
Core module
- class troppo.omics.core.IdentifierMapping(type_name: str, id_mapping_table: DataFrame)
Bases:
object
- property name
- class troppo.omics.core.OmicsContainer(omicstype: Optional[str] = None, condition: Optional[str] = None, data: Optional[dict] = None, nomenclature: Optional[str] = None)
Bases:
object
OmicsContainer class for creating objects that store omics data together with other useful information, such as the data type and the tissue condition from which the data was obtained. To create a usable OmicsContainer object:
Create an OmicsContainer object, providing: a) its omics type; b) the tissue/patient condition.
Use its .load() method with a previously created reader object (HpaReader, ProbeReader, GenericReader).
- Once created, this object can be transformed in several ways:
ID conversion
Value conversion
Filtering by ID, regular expression, or value threshold
Log transformation or data normalization
The main attribute is .data, a dictionary of the form {gene_id: expression_value}.
Attributes
- otype: str
The type of omics data stored in the container
- condition: str
The condition from where the data was obtained
- data: dict
The data stored in the container
- nomenclature: str
The nomenclature used for the gene ids
- convertIds(new: str)
Converts the IDs (keys) of the data attribute to a different nomenclature.
Parameters
- new: string
designation of the target nomenclature, as used in the HGNC database
- convertValues(mapping: dict)
Converts the values in the exp_val field to different values based on a valid user-supplied mapping. IMPORTANT: this will not work if _values contains NAs. The mapping must be a dictionary of either:
old value (string or numeric): new value (string or numeric)
(lower bound, upper bound) tuple of old values: new value (numeric or string)
Parameters
- mapping: dict
a dictionary containing the mapping between the values to be converted and the desired values
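The two mapping forms can be illustrated on a plain dictionary. This is a minimal sketch, not troppo's implementation: the function name `convert_values` is hypothetical, and both the inclusive tuple bounds and keeping unmatched values unchanged are assumptions.

```python
from numbers import Number
from typing import Dict, Tuple, Union

Value = Union[str, float]
ValueMapping = Dict[Union[Value, Tuple[float, float]], Value]


def convert_values(data: Dict[str, Value], mapping: ValueMapping) -> Dict[str, Value]:
    """Hypothetical helper mirroring the two documented mapping forms."""
    converted = {}
    for gene_id, value in data.items():
        if value in mapping:
            # Exact form: old value -> new value
            converted[gene_id] = mapping[value]
            continue
        for key, new_value in mapping.items():
            # Range form: (lower, upper) -> new value; inclusive bounds are an assumption
            if isinstance(key, tuple) and isinstance(value, Number) and key[0] <= value <= key[1]:
                converted[gene_id] = new_value
                break
        else:
            # Values matched by neither form are kept unchanged (assumption)
            converted[gene_id] = value
    return converted


levels = convert_values({"G1": "High", "G2": "Low", "G3": 7.5},
                        {"High": 1, "Low": 0, (5, 10): 0.5})
# levels → {"G1": 1, "G2": 0, "G3": 0.5}
```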
- filterById(regex: str) OmicsContainer
Filters the data attribute to keep only the genes that match a regular expression or string supplied by the user.
Parameters
- regex: string
regular expression or string to be contained in the Gene Symbol field of the data attribute.
Returns
- OmicsContainer:
a new OmicsContainer object is returned once this filter is applied. Original instance remains unchanged.
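The filtering idea can be sketched on a plain {gene_id: value} dictionary. `filter_by_id` is a hypothetical name, and using `re.search` (a match anywhere in the ID) is an assumption based on the wording "to be contained in"; like the documented method, the sketch builds a new mapping instead of modifying the input.

```python
import re
from typing import Dict


def filter_by_id(data: Dict[str, float], regex: str) -> Dict[str, float]:
    """Hypothetical sketch of filterById on a plain {gene_id: value} dict."""
    pattern = re.compile(regex)
    # re.search keeps IDs that contain a match anywhere, not only at the start
    return {gene_id: value for gene_id, value in data.items() if pattern.search(gene_id)}


expression = {"ATP1A1": 3.2, "GAPDH": 8.1, "MT-ATP6": 5.0}
atp_genes = filter_by_id(expression, "ATP")   # → {"ATP1A1": 3.2, "MT-ATP6": 5.0}
anchored = filter_by_id(expression, "^ATP")   # → {"ATP1A1": 3.2}
```

Anchoring the pattern with `^` restricts the match to the start of the ID.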
- filterByValue(op: str, threshold: Union[int, float, tuple, str]) OmicsContainer
Filters the _values attribute with a user-defined condition. The above and under operations use the > and < operators, respectively, while between uses >= and <= (bounds included).
Parameters
- op: string
one of (above, under, between, oneof)
- threshold: int, float, tuple, string
numeric threshold for above and under; tuple of (lower bound, upper bound) for between; string with the discrete levels to include for oneof
Returns
- OmicsContainer:
a new OmicsContainer object is returned once this filter is applied. Original instance remains unchanged.
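A minimal sketch of the four operations on a plain dictionary, following the comparison rules stated above. The function name is hypothetical, and accepting a collection of levels for `oneof` (besides a single string) is an assumption.

```python
from typing import Dict, Iterable, Tuple, Union

Threshold = Union[float, Tuple[float, float], str, Iterable[str]]
VALID_OPS = {"above", "under", "between", "oneof"}


def filter_by_value(data: Dict[str, Union[float, str]], op: str,
                    threshold: Threshold) -> Dict[str, Union[float, str]]:
    """Hypothetical sketch of the documented filterByValue semantics."""
    if op not in VALID_OPS:
        raise ValueError(f"op must be one of {sorted(VALID_OPS)}, got {op!r}")

    def keep(value) -> bool:
        if op == "above":
            return value > threshold            # strict comparison
        if op == "under":
            return value < threshold            # strict comparison
        if op == "between":
            lower, upper = threshold
            return lower <= value <= upper      # inclusive bounds
        # "oneof": one level given as a string, or a collection of levels (assumption)
        levels = {threshold} if isinstance(threshold, str) else set(threshold)
        return value in levels

    # A new dict is built, so the input is left unchanged
    return {gene_id: value for gene_id, value in data.items() if keep(value)}


expression = {"G1": 2.0, "G2": 5.0, "G3": 8.0}
high = filter_by_value(expression, "above", 5.0)          # → {"G3": 8.0}
mid = filter_by_value(expression, "between", (2.0, 5.0))  # → {"G1": 2.0, "G2": 5.0}
```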
- get_integrated_data_map(model_reader: ~troppo.omics.readers.hpa.HpaReader, and_func=<built-in function min>, or_func=<built-in function max>)
Integrates the omics data with a metabolic model loaded with the framed package. Matches the model's gene, metabolite, or reaction IDs with those present in the OmicsContainer object.
Parameters
- model_reader: AbstractModelObjectReader
a cobamp AbstractModelObjectReader object for the metabolic model
- and_func:
the function used to replace the “AND” operator in the gene-protein-reaction (GPR) rules (default: min)
- or_func:
the function used to replace the “OR” operator in the gene-protein-reaction (GPR) rules (default: max)
Returns
- OmicsDataMap:
an OmicsDataMap object containing the mapping between reactions/metabolites and their scores, derived from the supplied omics data.
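As a rough illustration of what and_func and or_func do, the sketch below collapses a single GPR rule into a reaction score. It is not troppo's implementation: the nested-tuple rule format, the name `evaluate_gpr`, and skipping genes that have no omics value are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple, Union

# A GPR rule: either a gene ID or a tuple ("and" | "or", sub_rule, sub_rule, ...)
Rule = Union[str, Tuple]
Combiner = Callable[[Iterable[float]], float]


def evaluate_gpr(rule: Rule, gene_scores: Dict[str, float],
                 and_func: Combiner = min, or_func: Combiner = max) -> Optional[float]:
    """Hypothetical sketch: collapse one GPR rule into a reaction score."""
    if isinstance(rule, str):
        # Leaf: genes without omics data yield None (assumption)
        return gene_scores.get(rule)
    op_name, *operands = rule
    values = [evaluate_gpr(sub, gene_scores, and_func, or_func) for sub in operands]
    values = [v for v in values if v is not None]
    if not values:
        return None
    return and_func(values) if op_name == "and" else or_func(values)


scores = {"g1": 4.0, "g2": 2.0, "g3": 3.0}
rule = ("or", ("and", "g1", "g2"), "g3")     # (g1 AND g2) OR g3
reaction_score = evaluate_gpr(rule, scores)  # max(min(4.0, 2.0), 3.0) → 3.0
```

With the defaults, an enzyme complex (AND) is limited by its least expressed subunit, while isozymes (OR) take the best-supported alternative; passing, for example, `or_func=sum` would instead add up isozyme evidence.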
- class troppo.omics.core.OmicsDataMap(scores, mapType)
Bases:
object
Stores integrated omics data matching a given metabolic model.
Attributes
- _mapType: str
The type of map stored in the object
- _scores: dict
The scores stored in the object
- select(op: str, threshold: Number) set
Filters the original reaction scores, keeping those above or under a threshold. The above and under operations use the >= and <= operators, respectively.
Parameters
- op: str
either “above” or “under”, determining which scores are selected
- threshold: Number
a float or integer; scores above or under this value (inclusive) are selected
Returns
- set:
a set of reaction ids whose scores are above or under the threshold
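The inclusive comparison can be seen on a plain score dictionary. This hypothetical stand-in is not the OmicsDataMap method itself, and skipping reactions whose score is None is an assumption.

```python
from numbers import Number
from typing import Dict, Optional, Set


def select(scores: Dict[str, Optional[Number]], op: str, threshold: Number) -> Set[str]:
    """Hypothetical stand-in for OmicsDataMap.select on a plain score dict."""
    if op == "above":
        return {rxn for rxn, s in scores.items() if s is not None and s >= threshold}
    if op == "under":
        return {rxn for rxn, s in scores.items() if s is not None and s <= threshold}
    raise ValueError(f"op must be 'above' or 'under', got {op!r}")


reaction_scores = {"R1": 0.2, "R2": 0.5, "R3": 0.9, "R4": None}
above = select(reaction_scores, "above", 0.5)  # → {"R2", "R3"}
under = select(reaction_scores, "under", 0.5)  # → {"R1", "R2"}
```

Because both operators are inclusive, a reaction scoring exactly at the threshold (R2 here) appears in both selections.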
- class troppo.omics.core.OmicsMeasurementSet(sample_labels: Sequence[Union[str, int]], feature_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray])
Bases:
TabularContainer
- class troppo.omics.core.TabularContainer(row_labels: Sequence[Union[str, int]], column_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray])
Bases:
object
TabularContainer class for creating objects that store tabular data together with other useful information, such as its row and column labels. It is meant to be used as a base class for other classes that store tabular data.
Parameters
- row_labels: Sequence[Union[str, int]]
The row labels of the data
- column_labels: Sequence[str]
The column labels of the data
- values: lofl_array (list of lists or numpy.ndarray)
The values of the data
Attributes
- data: pd.DataFrame
The data stored in the container
- property column_names
- property data
- drop(rows: Optional[Sequence] = None, columns: Optional[Sequence] = None)
Drops the given rows and columns from the data attribute
Parameters
- rows: Sequence
The rows to be dropped
- columns: Sequence
The columns to be dropped
- property row_names
- class troppo.omics.core.TypedOmicsMeasurementSet(sample_labels: Sequence[Union[str, int]], feature_labels: Sequence[str], values: Union[Sequence[Sequence[Number]], ndarray], omics_type: IdentifierMapping)
Bases:
OmicsMeasurementSet
- property omics_type: IdentifierMapping
- troppo.omics.core.has_valid_dims(rows: Sequence, cols: Sequence, data: Union[Sequence[Sequence[Number]], ndarray])
Checks whether the dimensions of the data match the given rows and columns
Parameters
- rows: Sequence
The rows of the data
- cols: Sequence
The columns of the data
- data: lofl_array (list of lists or numpy.ndarray)
The data to be checked
Returns
- bool, bool:
True if the data has the same dimensions as the rows and columns, False otherwise
Gene-level thresholding
- class troppo.omics.gene_level_thresholding.GeneLevelThresholding(omics_dataframe: DataFrame, thresholding_strat: str = 'global', global_threshold_lower: Optional[int] = None, global_threshold_upper: Optional[int] = None, local_threshold: Optional[int] = None)
Bases:
object
This class transforms the dataframe containing the omics data and performs gene-level thresholding on it. It currently supports the global and local thresholding approaches described by Richelle, Joshi and Lewis (2019) (https://doi.org/10.1371/journal.pcbi.1007185). These include:
- global: genes with a value lower than the lower global threshold (GTL) are considered inactive; genes with a value greater than the GTL are considered active.
- local t1: genes with a value lower than the GTL are considered inactive; a gene with a value greater than the GTL is considered inactive if its value is lower than the local threshold (LT), and active otherwise.
- local t2: genes with a value lower than the GTL are considered inactive; genes with a value greater than the upper global threshold (GTU) are considered active; genes with a value between the GTL and the GTU are considered active only if their value is greater than the LT.
Thresholds are selected according to the distribution of the data. The threshold parameters are positions in the quantile list [0.1, 0.25, 0.5, 0.75, 0.9]; each threshold is then the dataset value at the selected quantile.
Parameters
- omics_dataframe: pandas.DataFrame
Omics data to be thresholded.
- thresholding_strat: str
Thresholding strategy to be used. Must be one of: global, local t1, local t2.
- global_threshold_lower: int or None, default = None
Position of the lower global threshold in the quantile list.
- global_threshold_upper: int or None, default = None
Position of the upper global threshold in the quantile list.
- local_threshold: int or None, default = None
Position of the local threshold in the quantile list.
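The three rules and the quantile-based threshold selection can be sketched in plain Python. This is an illustration under stated assumptions, not troppo's code: troppo's static methods return dicts of scores (and also use maxexp), whereas this sketch returns boolean active/inactive calls. The linear-interpolation quantile, 0-based positions, computing each gene's LT from its own values across samples, and the inclusive comparisons at the thresholds are all assumptions.

```python
from typing import Dict, List, Optional, Sequence

QUANTILES = [0.1, 0.25, 0.5, 0.75, 0.9]


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def is_active(value: float, strategy: str, gtl: float,
              gtu: Optional[float] = None, lt: Optional[float] = None) -> bool:
    """Activity call for one gene in one sample."""
    if value < gtl:
        return False                          # below GTL: inactive in every strategy
    if strategy == "global":
        return True
    if strategy == "local t1":
        return value >= lt                    # above GTL: the gene's local threshold decides
    if strategy == "local t2":
        return value >= gtu or value >= lt    # above GTU: active; between GTL and GTU: LT decides
    raise ValueError(f"Unknown strategy: {strategy!r}")


def threshold_dataset(data: Dict[str, List[float]], strategy: str, gtl_pos: int,
                      gtu_pos: Optional[int] = None,
                      lt_pos: Optional[int] = None) -> Dict[str, List[bool]]:
    """data maps gene ID -> values across samples; positions index QUANTILES (0-based)."""
    all_values = [v for row in data.values() for v in row]
    gtl = quantile(all_values, QUANTILES[gtl_pos])
    gtu = quantile(all_values, QUANTILES[gtu_pos]) if gtu_pos is not None else None
    calls = {}
    for gene, row in data.items():
        # Local threshold: quantile of this gene's own values across samples (assumption)
        lt = quantile(row, QUANTILES[lt_pos]) if lt_pos is not None else None
        calls[gene] = [is_active(v, strategy, gtl, gtu, lt) for v in row]
    return calls


data = {"geneA": [10.0, 12.0], "geneB": [1.0, 2.0], "geneC": [5.0, 9.0]}
global_calls = threshold_dataset(data, "global", gtl_pos=2)
# GTL = median of all six values = 7.0
# → {"geneA": [True, True], "geneB": [False, False], "geneC": [False, True]}
```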
- apply_thresholding_filter() DataFrame
Applies the selected thresholding strategy to the omics data.
Returns
- filtered_dataset: pandas.DataFrame
Filtered omics dataframe.
- static global_thresholding(sample_series: Series, gtlow: float, maxexp: float) dict
Global thresholding strategy for the omics data. Processes one sample at a time.
Parameters
- sample_series: pandas.Series
Omics data from a specific sample.
- gtlow: float
Global threshold lower value.
- maxexp: float
Maximum expression value of the dataset.
Returns
- filtered_sample: dict
- static local_t1_thresholding(sample_series: Series, gtlow: float, lt: Series, maxexp: float) dict
Local T1 thresholding strategy for the omics data. Processes one sample at a time.
Parameters
- sample_series: pandas.Series
Omics data from a specific sample.
- gtlow: float
Global threshold lower value.
- lt: pandas.Series
Local threshold value for each gene.
- maxexp: float
Maximum expression value of the dataset.
Returns
- filtered_sample: dict
- static local_t2_thresholding(sample_series: Series, gtlow: float, gtup: float, lt: Series, maxexp: float) dict
Local T2 thresholding strategy for the omics data. Processes one sample at a time.
Parameters
- sample_series: pandas.Series
Omics data from a specific sample.
- gtlow: float
Global threshold lower value.
- gtup: float
Global threshold upper value.
- lt: pandas.Series
Local threshold value for each gene.
- maxexp: float
Maximum expression value of the dataset.
Returns
- filtered_sample: dict
Gene ID converter
id_converter module. Created by Jorge Gomes on 06/06/2018.
- troppo.omics.id_converter.idConverter(ids: list, old: str, new: str) dict
This function converts the ids from a given omics dataset into the desired ones to better match a metabolic model. Conversion is done based on the HGNC database.
NOMENCLATURES: ["hgnc_id", "symbol", "name", "entrez_id", "ensembl_gene_id", "vega_id", "ucsc_id", "ccds_id", "uniprot_ids", "pubmed_id", "omim_id", "locus_group", "locus_type", "alias_symbol", "alias_name", "prev_symbol", "prev_name", "ena", "refseq_accession", "rna_central_ids"]
Parameters
- ids: list or set
containing the ids to be converted
- old: string
exact match, the nomenclature designation of the input IDS. Must be different from new and contained in NOMENCLATURES
- new: string
exact match, the nomenclature designation of the output IDs. Must be different from old and contained in NOMENCLATURES
Returns
dict: dictionary with the converted ids as keys and the original ids as values
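Conceptually, the conversion is a lookup between two columns of the HGNC table. The sketch below uses a tiny made-up table in place of the real HGNC complete set, and the function name `id_converter` is hypothetical; only the key/value orientation of the result follows the description above.

```python
from typing import Dict, Iterable, List

# Toy stand-in for the HGNC complete set table; the IDs are made up for illustration.
TOY_HGNC_TABLE: List[Dict[str, str]] = [
    {"symbol": "GENE1", "entrez_id": "1001", "ensembl_gene_id": "ENSG_TOY_1"},
    {"symbol": "GENE2", "entrez_id": "1002", "ensembl_gene_id": "ENSG_TOY_2"},
]


def id_converter(ids: Iterable[str], old: str, new: str,
                 table: List[Dict[str, str]] = TOY_HGNC_TABLE) -> Dict[str, str]:
    """Hypothetical sketch: translate IDs between two nomenclature columns."""
    if old == new:
        raise ValueError("old and new nomenclatures must differ")
    wanted = set(ids)
    # Keys are the converted IDs and values the original IDs
    return {row[new]: row[old] for row in table
            if row.get(old) in wanted and row.get(new)}


converted = id_converter(["GENE2", "UNKNOWN"], "symbol", "entrez_id")  # → {"1002": "GENE2"}
```

IDs with no match in the table (here "UNKNOWN") are silently dropped in this sketch.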
- troppo.omics.id_converter.searchNomenclature(ids: list) str
This function searches which gene identification nomenclature is used on the provided ids. When ids from different nomenclatures are input, the result will be the nomenclature with the most matches. Also handles cases where some ids do not match but others do.
Parameters
- ids: list
List of ids (all using the same nomenclature)
Returns
- string
the nomenclature designation according to HGNC complete set table.
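The "most matches wins" rule can be sketched by counting hits per table column. The toy table and the name `search_nomenclature` are hypothetical; returning None when nothing matches is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Toy stand-in for the HGNC complete set table; the IDs are made up for illustration.
TOY_HGNC_TABLE: List[Dict[str, str]] = [
    {"symbol": "GENE1", "entrez_id": "1001", "ensembl_gene_id": "ENSG_TOY_1"},
    {"symbol": "GENE2", "entrez_id": "1002", "ensembl_gene_id": "ENSG_TOY_2"},
]


def search_nomenclature(ids: Iterable[str],
                        table: List[Dict[str, str]] = TOY_HGNC_TABLE) -> Optional[str]:
    """Hypothetical sketch: return the nomenclature column with the most matches."""
    wanted = set(ids)
    matches = Counter()
    for row in table:
        for nomenclature, value in row.items():
            if value in wanted:
                matches[nomenclature] += 1
    # Unmatched IDs are simply ignored; None means that nothing matched at all
    return matches.most_common(1)[0][0] if matches else None


best = search_nomenclature(["GENE1", "GENE2", "1001", "not-an-id"])  # → "symbol"
```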
Reaction-level integration of Omics data
- class troppo.omics.integration.AdjustedScoreIntegrationStrategy(protected_reactions: list)
Bases:
ScoreIntegrationStrategy, ReactionProtectionMixin
This class is used to integrate the scores of the different omics data.
Attributes
- protected_reactions: list
The list of reactions to be protected from being removed by the integration strategy.
- integrate(data_map: OmicsDataMap) dict
This method is used to integrate the scores of the different omics data.
Parameters
- data_map: OmicsDataMap
The data map containing the gene scores to be integrated into reaction scores.
Returns
dict: The integrated scores.
- class troppo.omics.integration.ContinuousScoreIntegrationStrategy(score_apply=None)
Bases:
ScoreIntegrationStrategy
This class is used to integrate continuous scores.
Attributes
- score_apply: function
The function to be applied to the scores.
- integrate(data_map: OmicsDataMap) dict
This method is used to integrate the scores of the different omics data.
Parameters
- data_map: OmicsDataMap
The data map containing the gene scores to be integrated into reaction scores.
Returns
dict: The integrated scores.
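A continuous strategy keeps one numeric score per reaction and optionally post-processes it. The sketch assumes that score_apply receives the whole {reaction_id: score} dictionary and that scores pass through unchanged when it is None; `integrate_continuous` and `none_to_zero` are hypothetical names.

```python
from typing import Callable, Dict, Optional

Scores = Dict[str, Optional[float]]


def integrate_continuous(reaction_scores: Scores,
                         score_apply: Optional[Callable[[Scores], Scores]] = None) -> Scores:
    """Hypothetical sketch of a continuous integration step."""
    if score_apply is None:
        return dict(reaction_scores)   # scores are passed through unchanged
    return score_apply(reaction_scores)


def none_to_zero(scores: Scores) -> Scores:
    # Example score_apply: reactions without any omics evidence get a score of 0
    return {rxn: 0 if s is None else s for rxn, s in scores.items()}


integrated = integrate_continuous({"R1": 3.5, "R2": None}, score_apply=none_to_zero)
# integrated → {"R1": 3.5, "R2": 0}
```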
- class troppo.omics.integration.CustomSelectionIntegrationStrategy(group_functions: dict)
Bases:
ScoreIntegrationStrategy
This class is used to integrate the scores of the different omics data.
Attributes
- group_functions: dict
The dictionary containing the functions to be applied to the scores.
- integrate(data_map: OmicsDataMap) dict
This method is used to integrate the scores of the different omics data.
Parameters
- data_map: OmicsDataMap
The data map containing the gene scores to be integrated into reaction scores.
Returns
list: The integrated scores.
- class troppo.omics.integration.DefaultCoreIntegrationStrategy(threshold: float, protected_reactions: list)
Bases:
ScoreIntegrationStrategy, ReactionProtectionMixin
This class is used to integrate the scores of the different omics data.
Attributes
- threshold: float or int
The threshold to be applied to the scores.
- protected_reactions: list
The list of reactions to be protected from being removed by the integration strategy.
- integrate(data_map: OmicsDataMap) list
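One plausible reading of a threshold-plus-protection core selection is sketched below: reactions scoring above the threshold form the core, and protected reactions are always added. The strict comparison, ignoring unscored reactions, and the name `default_core` are assumptions, not troppo's documented behavior.

```python
from typing import Dict, Iterable, List, Optional


def default_core(scores: Dict[str, Optional[float]], threshold: float,
                 protected_reactions: Iterable[str]) -> List[str]:
    """Hypothetical sketch: core = reactions scored above threshold + protected ones."""
    core = [rxn for rxn, s in scores.items() if s is not None and s > threshold]
    seen = set(core)
    for rxn in protected_reactions:
        # Protected reactions stay in the core whatever their score
        if rxn not in seen:
            core.append(rxn)
            seen.add(rxn)
    return core


core = default_core({"R1": 0.9, "R2": 0.1, "R3": None}, threshold=0.5,
                    protected_reactions=["R3"])
# core → ["R1", "R3"]
```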
- class troppo.omics.integration.ReactionProtectionMixin(protected_reactions: list)
Bases:
object
This class is used to protect reactions from being removed by the integration strategy.
Attributes
- protected_reactions: list
The list of reactions to be protected from being removed by the integration strategy.
- class troppo.omics.integration.ScoreIntegrationStrategy
Bases:
object
- abstract static integrate(self, data_map: OmicsDataMap)
- class troppo.omics.integration.ThresholdSelectionIntegrationStrategy(thresholds: list)
Bases:
ScoreIntegrationStrategy
This class is used to integrate the scores of the different omics data.
Attributes
- thresholds: list or float or int
The thresholds to be applied to the scores. If a list is provided, the integration will be performed for each threshold. If a single value is provided, the integration will be performed only once.
- integrate(data_map: OmicsDataMap) list
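The "one integration per threshold" behavior can be sketched as one reaction selection per threshold value. Representing each result as a set of reaction IDs, the inclusive comparison, and the name `threshold_selection` are assumptions made for illustration.

```python
from numbers import Number
from typing import Dict, List, Optional, Sequence, Set, Union


def threshold_selection(scores: Dict[str, Optional[float]],
                        thresholds: Union[Number, Sequence[Number]]) -> List[Set[str]]:
    """Hypothetical sketch: one reaction selection per threshold."""
    if isinstance(thresholds, Number):
        thresholds = [thresholds]   # a single value gives a single selection
    return [{rxn for rxn, s in scores.items() if s is not None and s >= t}
            for t in thresholds]


scores = {"R1": 0.2, "R2": 0.6, "R3": 0.9}
per_threshold = threshold_selection(scores, [0.5, 0.8])  # → [{"R2", "R3"}, {"R3"}]
single = threshold_selection(scores, 0.5)                # → [{"R2", "R3"}]
```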