Batch integration of Omics Data ======================================= Several integration algorithms were introduced in the previous tutorials. However, the demonstrated approach was limited to a single sample. In some cases, multiple samples are available and the context-specific models are required for each. Hence, making the integration of multiple samples a necessity. `batch_run` is a function from *Cobamp* that allows multiprocessing and is fully compatible with the *Troppo* framework. Thus allowing the integration of multiple samples in a single run. This function requires four parameters: - `function`: the function that will run the reconstruction that needs to be parallelized. - `sequence`: a list with the containers for each sample. - `paramargs`: a dictionary with the parameters for the function. - `threads`: the number of parallel processes to run. Initial setup ------------- .. code-block:: python import pandas as pd import cobra import re from troppo.omics.readers.generic import TabularReader from troppo.methods_wrappers import ReconstructionWrapper from cobamp.utilities.parallel import batch_run patt = re.compile('__COBAMPGPRDOT__[0-9]{1}') replace_alt_transcripts = lambda x: patt.sub('', x) # load the model model = cobra.io.read_sbml_model('data\HumanGEM_Consistent_COVID19_HAM.xml') # Create the reconstruction wrapper model_wrapper = ReconstructionWrapper(model=model, ttg_ratio=9999, gpr_gene_parse_function=replace_alt_transcripts) # load the data omics_data = pd.read_csv(filepath_or_buffer=r'data\Desai-GTEx_ensembl.csv', index_col=0) omics_data = omics_data.loc[['Lung_Healthy','Lung_COVID19']] # creat omics container omics_container = TabularReader(path_or_df=omics_data, nomenclature='entrez_id', omics_type='transcriptomics').to_containers() .. Define the function to be parallelized -------------------------------------- This function uses the `run_from_omics` method from the `ReconstructionWrapper` class. This requires the following parameters: - `omics_data`: the omics data container for the sample. - `algorithm`: a string containing the algorithm to use for the reconstruction. - `and_or_funcs`: a tuple with the functions to use for the AND and OR operations of the GPR. - `integration_strategy`: a tuple with the integration strategy and the function to apply to the scores. - `solver`: the solver to use for the optimization. - `**kwargs`: additional parameters for the reconstruction that are specific to used algorithm. .. code-block:: python def reconstruction_function_gimme(omics_container, parameters: dict): def score_apply(reaction_map_scores): return {k:0 if v is None else v for k, v in reaction_map_scores.items()} flux_threshold, obj_frac, rec_wrapper, method = [parameters[parameter] for parameter in ['flux_threshold', 'obj_frac', 'reconstruction_wrapper', 'algorithm']] reac_ids = rec_wrapper.model_reader.r_ids metab_ids = rec_wrapper.model_reader.m_ids AND_OR_FUNCS = (min, sum) return rec_wrapper.run_from_omics(omics_data=omics_container, algorithm=method, and_or_funcs=AND_OR_FUNCS, integration_strategy=('continuous', score_apply), solver='CPLEX', obj_frac=obj_frac, objectives=[{'biomass_human': 1}], preprocess=True, flux_threshold=flux_threshold, reaction_ids=reac_ids, metabolite_ids=metab_ids) .. Considering the function above, the parameters for the reconstruction are defined in a dictionary as follows: .. code-block:: python parameters = {'flux_threshold': 0.8, 'obj_frac': 0.8, 'reconstruction_wrapper': model_wrapper, 'algorithm': 'gimme'} .. Run the batch integration ------------------------- .. code-block:: python batch_gimme_res = batch_run(reconstruction_function_gimme, omics_container, parameters, threads=2) ..