Readers for Omics data
Generic Reader
Created by Jorge Gomes on 06/09/2018 source generic_reader
- class troppo.omics.readers.generic.GenericReader(path: str, idCol: int, expCol: int, header_start: int = 0, sep: str = ',')[source]
Bases: object
A generic reader for omics files that cannot be loaded by ProbeReader or HpaReader, such as RNA-seq files from the GDC. It can handle files with additional information before the file header when the user supplies header_start.
Arguments
- path: str
complete path to the file from which expression data is read.
- idCol: int or str
either the name of the identifier column or its index in the file header
- expCol: int or str
either the name of the expression values column or its index in the file header
- header_start: int
line on which the file header is located. Default = 0
- sep: str
field separator used in the omics file. Default = “,”
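To make the roles of header_start, idCol, expCol and sep concrete, here is a minimal standard-library sketch. It is not troppo's implementation: read_expression is a hypothetical helper, the sample text is made up, and the sketch assumes that header_start is a zero-based line index and that column indices are zero-based.

```python
import csv


def read_expression(text, id_col, exp_col, header_start=0, sep=","):
    """Return {identifier: expression value} from delimited text (hypothetical helper)."""
    # Assumption: header_start is the zero-based index of the header line,
    # so every line before it is preamble and is dropped.
    lines = text.splitlines()[header_start:]
    rows = csv.reader(lines, delimiter=sep)
    header = next(rows)

    def resolve(col):
        # Accept either a column name or a zero-based column index.
        return header.index(col) if isinstance(col, str) else col

    id_idx, exp_idx = resolve(id_col), resolve(exp_col)
    return {row[id_idx]: float(row[exp_idx]) for row in rows if row}


# Made-up file content: two preamble lines, then a tab-separated table.
sample = (
    "# preamble line 1\n"
    "# preamble line 2\n"
    "gene_id\tlength\tfpkm\n"
    "ENSG01\t1200\t3.5\n"
    "ENSG02\t800\t0.0\n"
)
values = read_expression(sample, id_col="gene_id", exp_col=2, header_start=2, sep="\t")
print(values)
```

The identifier column is given by name and the expression column by index, mirroring the "int or str" options described for idCol and expCol.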
- class troppo.omics.readers.generic.TabularReader(path_or_df: str, index_col: int = 0, sample_in_rows: bool = True, header_offset: int = 0, cache_df: bool = False, ignore_samples: Optional[list] = None, omics_type: str = 'transcriptomics', nomenclature: Optional[str] = None, dsapply=None, **kwargs)[source]
Bases: object
A generic reader for tabular files. It can be used to read any tabular file, but it is recommended to use specialized readers for specific file types, such as ProbeReader for microarray files, or HpaReader for HPA files.
Arguments
- path_or_df: str or pandas.DataFrame
The path to the file to be read, or a pandas DataFrame
- index_col: int, optional
The index column of the file, by default 0
- sample_in_rows: bool, optional
Whether the samples are in rows or columns, by default True
- header_offset: int, optional
The number of lines to skip before the header, by default 0
- cache_df: bool, optional
Whether to cache the DataFrame, by default False
- ignore_samples: list, optional
A list of samples to ignore, by default None
- omics_type: str, optional
The type of omics, by default ‘transcriptomics’
- nomenclature: str, optional
The nomenclature of the omics, by default None
- dsapply: function, optional
A function to apply to the DataFrame, by default None
- **kwargs: dict, optional
Additional arguments to pass to pandas.read_csv
Methods
- __iter__:
Iterates over the file, yielding a tuple of (sample, data)
- to_containers:
Converts the file to a list of OmicsContainers
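The effect of sample_in_rows, header_offset and ignore_samples on iteration can be illustrated with a small standard-library sketch. This is an assumption-laden illustration, not TabularReader itself: iter_samples is a hypothetical helper, the first row and first column are assumed to hold the labels, and all values are assumed to be numeric.

```python
import csv


def iter_samples(text, sample_in_rows=True, header_offset=0,
                 ignore_samples=None, sep=","):
    """Yield (sample, {feature: value}) pairs from delimited text (hypothetical helper)."""
    lines = text.splitlines()[header_offset:]  # skip lines before the header
    table = [row for row in csv.reader(lines, delimiter=sep) if row]
    if not sample_in_rows:
        # Samples are columns: transpose so that each row describes one sample.
        table = [list(col) for col in zip(*table)]
    header, body = table[0], table[1:]
    skip = set(ignore_samples or [])
    for row in body:
        sample = row[0]  # assumption: labels sit in the first column
        if sample in skip:
            continue
        yield sample, {name: float(v) for name, v in zip(header[1:], row[1:])}


# Made-up table with genes in rows and samples in columns.
genes_by_samples = "gene,s1,s2\ng1,1.0,2.0\ng2,3.0,4.0\n"
result = dict(iter_samples(genes_by_samples, sample_in_rows=False,
                           ignore_samples=["s2"]))
print(result)
```

Transposing up front means the rest of the loop only has to handle one orientation, which is one possible way to support both layouts.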
HPA Reader
Created by Jorge Gomes on 09/03/2018 source HPA_Reader
- class troppo.omics.readers.hpa.HpaReader(fpath: str, tissue: str, id_col: int = 0, includeNA: bool = False)[source]
Bases: object
Reads the HPA pathology.tsv file from the given fpath. Discrete expression levels are converted to numerical values, and the expression value of each gene corresponds to the level reported for the most patients.
Parameters
- fpath: str
complete path to the file from which omics data is read
- tissue: str
tissue name, exactly as it appears in the file, indicating the column from which expression values should be retrieved
- id_col: int
either 0 (= “ensembl”) or 1 (= “gene_symbol”), indicating which column is used as the gene ID
- includeNA: bool
flag indicating whether NA values should be included
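The rule "the level with the most patients" can be sketched as follows. The numeric encoding, the level names and the tie-break are assumptions made for illustration; they are not taken from troppo, and dominant_level is a hypothetical helper.

```python
# Assumed numeric encoding of discrete staining levels (not taken from troppo).
LEVEL_VALUES = {"High": 3, "Medium": 2, "Low": 1, "Not detected": 0}


def dominant_level(patient_counts):
    """Return the numeric value of the level reported for the most patients."""
    # Ties go to the higher level here; that tie-break is an arbitrary choice.
    level = max(
        patient_counts,
        key=lambda name: (patient_counts[name], LEVEL_VALUES[name]),
    )
    return LEVEL_VALUES[level]


# Made-up patient counts for one gene in one tissue.
counts = {"High": 2, "Medium": 7, "Low": 1, "Not detected": 1}
score = dominant_level(counts)
print(score)
```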
Microarray Reader
Created by Jorge Gomes on 19/03/2018 source probe_reader
- class troppo.omics.readers.microarray.ProbeReader(fPath: str, expCol: int, annotFile: str, convTarget: str, convSep: str = ',', expSep: str = ',')[source]
Bases: object
Reads expression files sourced from microarray databases such as Gene Expression Barcode or Gene Expression Omnibus. Each value is assumed to be identified by a probe ID in the first column of the file. An annotation file provided by the microarray chip vendor must be supplied for appropriate probe-to-gene ID conversion. Probes with no match in the convTarget nomenclature are ignored. Handles cases where more than one probe translates to the same gene, and where a probe translates to more than one gene.
Parameters
- fPath: str
complete path to the file from which expression data is read.
- expCol: int
index of the column where expression values are retrieved from.
- annotFile: str
complete path to the annotation file.
- convTarget: str
exact match to the column name of the nomenclature used for probe ID to gene ID conversion. Recommended: either Gene Symbol or Entrez Gene, or equivalent.
- expSep: str
field separator used in the probe intensity/expression file. Default is “,”
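The probe-to-gene conversion described for ProbeReader involves three cases: unmatched probes, several probes for one gene, and one probe for several genes. The sketch below shows one way to handle them. It is not ProbeReader's code: probes_to_genes is a hypothetical helper, the " /// " separator for multi-gene annotations is an assumption, and keeping the maximum value when several probes hit the same gene is an assumed aggregation rule.

```python
from collections import defaultdict


def probes_to_genes(probe_values, annotation, multi_sep=" /// "):
    """Map {probe: value} to {gene: value} using a {probe: gene annotation} table."""
    per_gene = defaultdict(list)
    for probe, value in probe_values.items():
        target = annotation.get(probe, "")
        if not target:
            continue  # no match in the target nomenclature: ignore the probe
        for gene in target.split(multi_sep):
            per_gene[gene].append(value)  # one probe may map to several genes
    # Several probes may map to one gene; keeping the maximum is an assumption.
    return {gene: max(values) for gene, values in per_gene.items()}


# Made-up probe intensities and annotation entries.
probe_values = {"p1": 5.0, "p2": 7.5, "p3": 1.0, "p4": 2.0}
annotation = {"p1": "TP53", "p2": "TP53", "p3": "BRCA1 /// NBR2", "p4": ""}
genes = probes_to_genes(probe_values, annotation)
print(genes)
```

Other aggregation rules, such as the mean or median of the probe values, would fit the same structure by swapping the final reduction.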