Quick start¶

This notebook provides an example on how to use cobamp to easily calculate elementary flux modes (EFMs) and minimal cut sets (MCSs) coupled with the model reading and flux analysis capabilities of the COBRApy package.

Model reading and setup¶

Download the model in SBML format from the BiGG models repository. In this example we use the central carbon metabolic model of Escherichia coli, downloaded and stored in a temporary folder/path stored as model_path.

[1]:

import urllib

model_url = "http://bigg.ucsd.edu/static/models/e_coli_core.xml"
model_path, model_content = urllib.request.urlretrieve(model_url)

Read the SBML temporary file generated from the HTTP request content and generate a COBRA Model object from it. Two functions are defined to assist in further analyses: * restore_bounds() - restores all reaction bounds to their original values * apply_kos(**kos*)* - knockout reactions contained in kos (any iterable containing reaction IDs)

[2]:

import cobra

model = cobra.io.sbml.read_sbml_model(model_path)
model_bounds = {r.id:(r.lower_bound, r.upper_bound) for r in model.reactions}

# >>> model_bounds
# ... {'ACALD': (-1000.0, 1000.0),
# ... 'ACALDt': (-1000.0, 1000.0),
# ... 'ACKr': (-1000.0, 1000.0),
# ... 'ACONTa': (-1000.0, 1000.0),
# ... 'ACONTb': (-1000.0, 1000.0), ...

def restore_bounds():
    for r in model.reactions:
        r.lower_bound, r.upper_bound = model_bounds[r.id]

def apply_kos(kos):
    for ko in kos:
        model.reactions.get_by_id(ko).knock_out()

model

[2]:

Name	e_coli_core
Memory address	0x07f6bbab80588
Number of metabolites	72
Number of reactions	95
Objective expression	1.0BIOMASS_Ecoli_core_w_GAM - 1.0BIOMASS_Ecoli_core_w_GAM_reverse_712e5
Compartments	extracellular space, cytosol

Extracting environmental conditions from this particular model, assuming the maximum capacity constant $N$ ( $N=1000$ ). This assumes the media is defined when a lower or upper bound is different than 0, $N$ or $-N$ .

This will be needed some steps further for the MCSEnumerator approach.

[3]:

env_conditions = {r.id:(r.lower_bound, r.upper_bound)
 for r in model.reactions
 if (r.lower_bound not in [-1000, 0]) or (r.upper_bound not in [0, 1000])}

env_conditions

[3]:

{'ATPM': (8.39, 1000.0), 'EX_glc__D_e': (-10.0, 1000.0)}

Enumerating minimal cut sets¶

Import the MCSEnumerator wrapper class (KShortestMCSEnumeratorWrapper) that works with COBRA models. KShortestMCSEnumeratorWrapper provides a simple interface to enumerate minimal cut sets (MCSs) using the K-shortest algorithm (commonly referred to as the MCSEnumerator approach).

MCSs are sets of reactions that disable a certain phenotype in a model. This phenotype is defined as a set of flux or yield constraints. In this example, we will enumerate some MCSs.

[4]:

from cobamp.wrappers import KShortestMCSEnumeratorWrapper

Define the region of the flux space that the MCSs are supposed to block. In this scenario, we are attempting to improve succinate production using glucose as a carbon source. The following reaction IDs are required: * Succinate exchange : ‘EX_succ_e’ * Glucose exchange: ‘EX_glc__D_e’ * Maintenance ATP: ‘ATPM’

As such, we wish to block low production phenotypes by adding a lower bound on the yield between succinate and glucose. Since the latter is only consumed, we assume its flux value is always negative. Thus, we want yields to be as low as possible, hence the lower bound definition.

This yield constraint can be loosely represented as the following:

$Y\frac{succinate}{glucose} > -0.001$

MCSs will block this phenotype and thus, we will obtain negative yields.

[5]:

yield_space = {
   #('product_id'),('substrate_id'):(lower_bound, upper_bound, deviation - set to zero for most cases)
    ('EX_succ_e','EX_glc__D_e'):(-0.001, None, 0),
}

flux_space = {
   # flux id     : (lower_bound, upper_bound)
    'EX_glc__D_e': (-10, None),
    'ATPM' : (8.39, None)
}

Having defined the flux space, we can now start the algorithm. The wrapper for MCSEnumerator requires the following parameters: * model: A model instance from COBRA, framed (or any other framework if a ModelReader subclass is implemented for it and added to the list of readers) * target_flux_space_dict: A dictionary (str -> tuple(float, float)) mapping reaction identifiers to flux bounds representing the target flux space for minimal cut sets. * target_yield_space_dict: A dictionary ((str, str) -> tuple(float, float, float)) mapping product/substrate pairs to a lower and upper bound for the yield constraint between both fluxes and a deviation value for the constraint. * algorithm_type: Can be one of two values: * KShortestMCSEnumeratorWrapper.ALGORITHM_TYPE_POPULATE: Enumerate one size at a time * KShortestMCSEnumeratorWrapper.ALGORITHM_TYPE_ITERATIVE: Enumerate one solution at a time * stop_criteria: An integer defining the maximum number of iterations. If the POPULATE routine is used, the algorithm will yield all solutions that have size stop_criteria and below. On the other hand, if ITERATIVE is used, the algorithm will yield the stop_criteria-shortest solutions. * solver: A string that specifies the solver to be used

It is worth noting that CPLEX is currently the best alternative for elementary mode enumeration due to some limitation on both the solvers and the optlang framework.

With CPLEX, cobamp uses indicator constraints, while other solvers (GLPK and GUROBI) rely on Big-M formulations. The latter requires a parameter (M), which is to be set at a very large value. However, higher values of M result in numerical instability which negatively affects the solutions yielded by the optimizer.

Indicator constraints can be used with GUROBI, although optlang does not currently support their addition. If these features are implemented in optlang, cobamp will also support them.

[6]:

ksefm = KShortestMCSEnumeratorWrapper(
    model=model,
    target_flux_space_dict=flux_space,
    target_yield_space_dict=yield_space,
    algorithm_type=KShortestMCSEnumeratorWrapper.ALGORITHM_TYPE_POPULATE,
    stop_criteria=5,
    solver='CPLEX') # or 'GUROBI', 'GLPK'

The MILP problems aren’t actually run once the KShortestMCSEnumeratorWrapper instance is created. To obtain the solutions, the get_enumerator() method must be called. This method returns a generator that can be called in several ways.

[7]:

enumerator = ksefm.get_enumerator()

Depending on time constraints, one can interactively call the next function on the enumerator so it iterates just once. This is useful when the problem is large or when exploring new constraints.

[8]:

solutions = [sol for sol in enumerator]

Solutions are always lists of dictionaries mapping fluxes with values for all active fluxes. For MCSs, one simply has to obtain the dictionary keys for all dictionaries in the list.

[9]:

solutions[0]

[9]:

[{'ENO': 1548.6591179977859},
 {'NADH16': 357.40166865316615},
 {'GAPD': 1548.5399284864625},
 {'PGK': -1548.5399284864627},
 {'PGM': -1548.6591179977856},
 {'GLCpts': 2.324195470798569},
 {'EX_glc__D_e': -2.0858164481525625},
 {'EX_succ_e': 2084.743742550656},
 {'EX_h2o_e': 1131.6895113231285},
 {'H2Ot': -1131.6895113231285},
 {'SUCCt3': 2084.743742550656},
 {'EX_h_e': 1042.3718712753266},
 {'ATPM': 1.0}]

[10]:

mcs = [set(d.keys()) for d in solutions[0]]
mcs

[10]:

[{'ENO'},
 {'NADH16'},
 {'GAPD'},
 {'PGK'},
 {'PGM'},
 {'GLCpts'},
 {'EX_glc__D_e'},
 {'EX_succ_e'},
 {'EX_h2o_e'},
 {'H2Ot'},
 {'SUCCt3'},
 {'EX_h_e'},
 {'ATPM'}]

Unlike the previous approach, we can try and extensively enumerate all MCSs. We must redefine the enumerator as it already iterated once. We can import the chain function and use it to get a list of solutions until the stop criteria is reached.

[11]:

enumerator = ksefm.get_enumerator()

[12]:

from itertools import chain
solutions = list(chain(*ksefm.get_enumerator()))

[13]:

mcs = [set(e.keys()) for e in solutions]
mcs[1:20]

[13]:

[{'NADH16'},
 {'GAPD'},
 {'PGK'},
 {'PGM'},
 {'GLCpts'},
 {'EX_glc__D_e'},
 {'EX_succ_e'},
 {'EX_h2o_e'},
 {'H2Ot'},
 {'SUCCt3'},
 {'EX_h_e'},
 {'ATPM'},
 {'NADTRHD', 'PFK'},
 {'EX_co2_e', 'TPI'},
 {'G6PDH2r', 'PGI'},
 {'CO2t', 'TPI'},
 {'PGI', 'PGL'},
 {'CYTBD', 'FBA'},
 {'PGI', 'RPE'}]

Enumerating elementary flux modes¶

We can import the KShortestEFMEnumeratorWrapper to enumerate elementary flux modes (EFMs) using the K-shortest EFM algorithm.

EFMs can be considered as the smallest functional units in a metabolic model that can achieve a certain function and are useful in finding novel metabolic functions.

[14]:

from cobamp.wrappers import KShortestEFMEnumeratorWrapper

For this example, we will enumerate EFMs in the model we have previously loaded. Similarly to the example above, we intend to produce succinate with this organism. For that purpose, we will try to enumerate the smallest EFMs that allow the conversion of glucose into succinate and biomass (to ensure growth).

To specify these metabolites, they have to be present in the model. First, we will create a metabolite to represent cell growth. We then add a reaction to represent the fictional secretion of this metabolite. Both components are added to a copy of the model to preserve the original version.

[15]:

biomass_c = cobra.Metabolite(compartment='c', id='biomass_c', name='Biomass')
biomass_production = cobra.Reaction(id='EX_biomass_e', name='Biomass production', lower_bound=0, upper_bound=1000)

model_biomass = model.copy()
model_biomass.add_metabolites([biomass_c])

The biomass metabolite is added to the biomass reaction already present in the model as a product. On the new reaction, the same metabolite is added as a reactant, effectively creating a drain on the model.

[16]:

model_biomass.reactions.BIOMASS_Ecoli_core_w_GAM.add_metabolites({biomass_c: 1})
biomass_production.add_metabolites({biomass_c: -1})

The K-shortest enumeration algorithm is then instantiated using mostly the same arguments as in the example above, with the addition of: * consumed : A list of metabolite IDs that are guaranteed to be consumed by the cell * non_consumed: A list of external metabolite IDs that cannot be consumed * produced : A list of external metabolite IDs that must be produced by the cell

These parameters are used to narrow the space of possible EFMs to avoid unwanted metabolic functions and speed up the search.

[17]:

ksefm = KShortestEFMEnumeratorWrapper(
    model=model_biomass,
    non_consumed=[],
    consumed=['glc__D_e'],
    produced=['biomass_c','succ_e'],
    algorithm_type=KShortestEFMEnumeratorWrapper.ALGORITHM_TYPE_POPULATE,
    stop_criteria=100,
    solver='CPLEX'
)

Having set a limit of 100 for the size of the EFMs, we will now attempt to enumerate them. In this example we will only obtain with the smallest. Once the algorithm reaches first iteration that yields at least one EFM, we will no longer call the enumerator instance.

[18]:

enumerator = ksefm.get_enumerator()

efm_list = []
while len(efm_list) == 0:
    efm_list += next(enumerator)

The 17 smallest EFMs include 42 reactions

[19]:

len(efm_list)

[19]:

Analysis of elementary flux modes¶

Inspecting flux coefficients¶

The EFMs can be displayed as a set of reactions with coefficients. Please note that the drains that correspond with the external metabolites defined as being consumed or produced will not show up as active reactions, even though they are part of the EFM. This occurs since the mass balance equation for these metabolites is changed from $\sum^{n}_{j=1}{S_{ij}}v = 0$ to $\sum^{n}_{j=1}{S_{ij}}v > 1$ or $\sum^{n}_{j=1}{S_{ij}}v < -1$ , assuming $v$ as the flux vector.

The values for these fluxes can be calculated by replacing $v$ with the flux values from the EFM and obtaining the result from the left hand side of the inequation

[20]:

efm_list[0]

[20]:

{'ACONTa': 509.7485477179139,
 'ACONTb': 509.7485477179139,
 'AKGDH': 491.84149377600266,
 'BIOMASS_Ecoli_core_w_GAM': 16.597510373446326,
 'CO2t': -1034.139419087282,
 'CS': 509.7485477179139,
 'CYTBD': 2521.7435684650845,
 'ENO': 1166.9892116184212,
 'EX_co2_e': 1034.139419087282,
 'EX_h2o_e': 1646.8838174276166,
 'EX_h_e': 1316.6290456433399,
 'EX_nh4_e': -90.50290456432813,
 'EX_o2_e': -1260.8717842325423,
 'EX_pi_e': -61.05726141079617,
 'FBA': 601.4605809129475,
 'GAPD': 1191.8190871370969,
 'GLCpts': 617.9701244814147,
 'GLNS': 90.50290456432813,
 'GLUSy': 86.2589211618379,
 'H2Ot': -1646.8838174276166,
 'ICDHyr': 509.7485477179139,
 'NADH16': 2521.7435684650845,
 'NADTRHD': 5576.853941909496,
 'NH4t': 90.50290456432813,
 'O2t': 1260.8717842325423,
 'PDH': 571.952697095516,
 'PFK': 601.4605809129475,
 'PGI': 614.5676348548582,
 'PGK': -1191.8190871370969,
 'PGM': -1166.9892116184212,
 'PIt2r': 61.05726141079617,
 'PPC': 539.4033195021505,
 'PYK': 1.0,
 'RPE': -11.93029045643322,
 'RPI': -11.93029045643322,
 'SUCCt3': 491.84149377600266,
 'SUCOAS': -491.84149377600266,
 'TALA': -2.969294605809549,
 'THD2': 5369.595020747642,
 'TKT1': -2.969294605809549,
 'TKT2': -8.96099585062367,
 'TPI': 601.4605809129475}

[21]:

metabolite = model_biomass.metabolites.biomass_c
cur_efm = efm_list[0]
{r:cur_efm[r.id] for r in metabolite.reactions if r.id in cur_efm}

[21]:

{<Reaction BIOMASS_Ecoli_core_w_GAM at 0x7f6b7bff8780>: 16.597510373446326}

Visualizing EFMs in metabolic maps¶

Alternatively, external packages can be used to provide a more interactive result. The escher package (https://escher.github.io/), once installed in your Python distribution, can be used to display metabolic maps. The snippet below shows how the EFMs from cobamp can be used directly as inputs for escher maps.

[22]:

import escher

escher_builder = escher.Builder(
    map_name='e_coli_core.Core metabolism', # map that matches our model
    reaction_scale=[{'type': 'min', 'color': '#0000ff', 'size': 10}, # make active fluxes more visible
               {'type': 'mean', 'color': '#551a8b', 'size': 20},
               {'type': 'max', 'color': '#ff0000', 'size': 40}],

    hide_secondary_metabolites = True,
    reaction_data = efm_list[0], # provides flux values from the first EFM

)
escher_builder.display_in_notebook(js_source='web')

[22]:

Representing multiple EFMs as trees¶

Additionally, we can also display EFMs as ordered trees where each path from the root to a leaf is a single EFM. This allows us to observe patterns among large sets of EFMs to identify common mechanisms between them

The analysis.tree module contains some functions to achieve this. Please note that this functionality is still highly experimental

[23]:

from cobamp.analysis.graph import *

The function below provides an example workflow to organize EFMs (or MCSs) as trees

Firstly, the root node is created and the tree is populated with the EFMs. This procedure is performed by iteratively adding one node representing the most common reaction within a the set of EFMs. Each of these nodes is then recursively populated using the subset of EFMs containing reactions from the nodes above it.

The tree is then compressed by merging linear sequences of nodes into a single node whose value is a list of reactions. This list is then converted to a concatenated string for easier reading.

If needed, very large nodes can be ignored, with their value set to an arbitrary value. Additionally the tree can be pruned below a supplied depth value, replacing all nodes below it by the relative frequency of each reaction in the set of EFMs represented under these new leaves.

[24]:

def generate_efm_results_tree(efm_sets, ignore_greater_than=10, pruning_level=6, merge_dupes=False):
    root = Tree('ROOT')
    # fill the tree with EFMs
    fill_tree(root, efm_sets)
    # compress sequences of nodes with only 1 inward and 1 outward edge
    compress_linear_paths(root)
    if ignore_greater_than:
        ignore_compressed_nodes_by_size(root, ignore_greater_than) # if a node has too many fluxes, ignore it

    # apply a function to all nodes in this example, the node value
    # is converted to a string if it is currently a list of strings
    # this is the case of compressed nodes
    apply_fx_to_all_node_values(root, lambda x: ' + '.join(sorted(x)) if isinstance(x, list) else x if x is not None else "None")

    # if the tree is too deep, a depth value can be specified
    # so the tree is cut past that point
    # the leaf that replaces it will specify the relative
    # frequency of the reactions in all EFMs represented by it
    if pruning_level:
        probabilistic_tree_prune(root, target_level=pruning_level, cut_leaves=False, name_separator='+')

    # compress the tree again
    compress_linear_paths(root)

    # merge duplicate nodes. this effectively replaces all
    # objects of the same value with a single object
    if merge_dupes:
        merge_duplicate_nodes(root)
    return root

The tree is then generated using this function, returning the populated tree object

[25]:

tree = generate_efm_results_tree(
    efm_sets=efm_list, # list of elementary modes (list[dict[str,float]])
    ignore_greater_than=None, # ignore nodes with more than *ignore_greater_than* elements
    pruning_level=None, # prune tree from the nodes past depth=*pruning_level*
    merge_dupes=False # do not merge duplicate nodes
)

A simple print function is supplied to quickly view the tree.

[26]:

print(pretty_print_tree(tree))

|-- ACONTa + ACONTb + AKGDH + BIOMASS_Ecoli_core_w_GAM + CO2t + CS + CYTBD + ENO + EX_co2_e + EX_h2o_e + EX_h_e + EX_nh4_e + EX_o2_e + EX_pi_e + FBA + GAPD + GLCpts + GLNS + H2Ot + ICDHyr + NADH16 + NH4t + O2t + PDH + PFK + PGI + PGK + PGM + PIt2r + PPC + PYK + ROOT + RPE + RPI + SUCCt3 + SUCOAS + TALA + TKT1 + TKT2 + TPI(None)
|       |-- ATPS4r(13)
|       |       |-- GLUSy(7)
|       |       |       |-- GLUDy(1)
|       |       |       |-- SUCCt2_2(1)
|       |       |       |-- FBP(1)
|       |       |       |-- ATPM(1)
|       |       |       |-- GLUN(1)
|       |       |       |-- THD2(1)
|       |       |       |-- PPCK(1)
|       |       |-- GLUDy(6)
|       |       |       |-- SUCCt2_2(1)
|       |       |       |-- FBP(1)
|       |       |       |-- ATPM(1)
|       |       |       |-- GLUN(1)
|       |       |       |-- THD2(1)
|       |       |       |-- PPCK(1)
|       |-- NADTRHD(4)
|       |       |-- GLUSy(2)
|       |       |       |-- SUCCt2_2(1)
|       |       |       |-- THD2(1)
|       |       |-- GLUDy(2)
|       |       |       |-- SUCCt2_2(1)
|       |       |       |-- THD2(1)

Another way to draw this would be to used the NetworkX (https://networkx.github.io/) with GraphViz (https://www.graphviz.org/) installed.

[27]:

%matplotlib inline

tree = generate_efm_results_tree(
    efm_sets=efm_list, # list of elementary modes (list[dict[str,float]])
    ignore_greater_than=10, # ignore nodes with more than *ignore_greater_than* elements
    pruning_level=10, # prune tree from the nodes past depth=*pruning_level*
    merge_dupes=True # do not merge duplicate nodes
)

old_value = tree.value # save the old value of the root node
print('Old root string:', tree.value)
tree.value = "Glycolysis + TCA" # change the root node value to a shorter string

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph() # define a new networkx Graph object
populate_nx_graph(tree, G, unique_nodes=True) # populate it with data from the Tree object

pos = nx.nx_pydot.graphviz_layout(G) # generate the graphviz layout

# create a figure and plot the nodes, edges and labels
plt.figure(figsize=(20,10))
nx.draw_networkx_nodes(G, pos, node_size=10)
nx.draw_networkx_edges(G, pos, alpha=0.5, arrowsize=20)
nx.draw_networkx_labels(G, pos, font_size=18, font_color='red')

plt.show()

Old root string: ACONTa + ACONTb + AKGDH + BIOMASS_Ecoli_core_w_GAM + CO2t + CS + CYTBD + ENO + EX_co2_e + EX_h2o_e + EX_h_e + EX_nh4_e + EX_o2_e + EX_pi_e + FBA + GAPD + GLCpts + GLNS + H2Ot + ICDHyr + NADH16 + NH4t + O2t + PDH + PFK + PGI + PGK + PGM + PIt2r + PPC + PYK + ROOT + RPE + RPI + SUCCt3 + SUCOAS + TALA + TKT1 + TKT2 + TPI

Frequency analysis of multiple sets of EFMs¶

Heatmaps can be used as graphical representations of matrices. If multiple sets of EFMs are enumerated using different conditions, they can be compared in terms of the frequency at which their active reactions are displayed. cobamp provides a simple analysis function to generate a pandas dataframe containing reaction frequencies for one or more sets of EFMs.

A dictionary mapping a group identifier (as string) and a set of reaction indices from the EFMs is required as input.

Firstly, the heatmap and frequency dataframe generating functions must be imported as shown below

[28]:

from cobamp.analysis.plotting import display_heatmap
from cobamp.analysis.frequency import get_frequency_dataframe

As an example, one can try to enumerate the 50 smallest EFMs for consuming glucose and producing a wide array of interesting compounds as well as ensuring cell growth through the production of biomass. In this test case, we will try to get at least 50 EFMs to produce: * Succinate (succ_e) * D-Lactate (lac__D_e) * Glutamine (gln__L_e) * Glutamate (glu__L_e) * Ethanol (etoh_e)

Some functions should be defined to achieve this specific purpose. First, we can generalize the EFM enumerators that were previously used to accept any product. The function get_efm_enumerator_for_product(**model*, product)* returns an EFM enumerator for the model while forcing product to be secreted.

[29]:

def get_efm_enumerator_for_product(model, product):
    ksefm = KShortestEFMEnumeratorWrapper(
        model=model,
        non_consumed=[],
        consumed=['glc__D_e'],
        produced=['biomass_c', product],
        algorithm_type=KShortestEFMEnumeratorWrapper.ALGORITHM_TYPE_POPULATE,
        stop_criteria=100,
            solver='CPLEX'
    )

    return ksefm.get_enumerator()

Another function can also be created to obtain at least 50 EFMs and convert them to sets of reactions. This function, get_production_efms(**model*, product)*, calls the function defined above to obtain the enumerator and enumerates EFM up to size 100 but stops if 50 EFMs have been added.

[30]:

def get_production_efms(model, product):
    enumerator = get_efm_enumerator_for_product(model, product)
    efm_list = []
    stopped = False

    while len(efm_list) < 50 and not stopped:
        try:
            efm_list += next(enumerator)
        except StopIteration as si:
            stopped = True
            print('Stopping criteria reached!')

    return [set(efm.keys()) for efm in efm_list]

Finally, we can define the desired products and enumerate the EFMs for each. A dictionary mapping products and EFMs is the preferred format for storing and analysing these EFMs.

[31]:

products = ['succ_e','lac__D_e','gln__L_e','glu__L_e','etoh_e']

production_efms = {product:get_production_efms(model_biomass, product) for product in products}

Stopping criteria reached!

The enumerated EFMs for each product can be summarized in terms on reaction frequencies for each product. The function get_frequency_dataframe(**efm_groups*)* from the frequency submodule accepts the dictionaries created above and generates a dataframe. Further preprocessing using other methods can enhance this analysis.

[32]:

efm_df = get_frequency_dataframe(production_efms) # obtain the dataframe
efm_df = efm_df.loc[:,~ efm_df.isna().all()].fillna(0).astype(int) # remove entirely null columns (with no EFMs)

# convert absolute to relative frequency in percentage values
efm_df = efm_df * 100 / [len(production_efms[col]) for col in efm_df.columns]
efm_df

[32]:

	etoh_e	glu__L_e	lac__D_e	succ_e
ACALD	100.000000	3.448276	1.481481	0.000000
ACALDt	0.523560	3.448276	1.481481	0.000000
ACONTa	100.000000	100.000000	100.000000	100.000000
ACONTb	100.000000	100.000000	100.000000	100.000000
ADK1	6.806283	29.655172	9.629630	6.060606
AKGDH	0.000000	0.689655	0.000000	36.363636
AKGt2r	13.612565	25.517241	28.148148	0.000000
ALCD2x	100.000000	0.000000	0.000000	0.000000
ATPM	7.329843	6.206897	4.444444	7.272727
ATPS4r	49.214660	64.827586	56.296296	63.030303
BIOMASS_Ecoli_core_w_GAM	100.000000	100.000000	100.000000	100.000000
CO2t	73.821990	100.000000	100.000000	80.000000
CS	100.000000	100.000000	100.000000	100.000000
CYTBD	57.591623	100.000000	92.592593	91.515152
D_LACt2	1.047120	11.724138	100.000000	6.060606
ENO	100.000000	100.000000	100.000000	100.000000
ETOHt2r	100.000000	0.000000	0.000000	0.000000
EX_acald_e	0.523560	3.448276	1.481481	0.000000
EX_akg_e	13.612565	25.517241	28.148148	0.000000
EX_co2_e	73.821990	100.000000	100.000000	80.000000
EX_glu__L_e	2.617801	0.000000	12.592593	0.606061
EX_h2o_e	100.000000	100.000000	100.000000	100.000000
EX_h_e	100.000000	100.000000	100.000000	100.000000
EX_lac__D_e	1.047120	11.724138	0.000000	6.060606
EX_nh4_e	100.000000	100.000000	100.000000	100.000000
EX_o2_e	57.591623	100.000000	92.592593	91.515152
EX_pi_e	100.000000	100.000000	100.000000	100.000000
EX_pyr_e	26.701571	20.689655	9.629630	23.636364
EX_succ_e	42.408377	0.689655	7.407407	0.000000
FBA	93.193717	85.517241	90.370370	97.575758
...	...	...	...	...
MALS	0.000000	0.000000	0.000000	35.151515
MDH	62.303665	40.689655	45.185185	69.090909
ME1	21.465969	15.172414	19.259259	27.272727
ME2	37.172775	26.896552	34.814815	52.121212
NADH16	100.000000	100.000000	100.000000	100.000000
NADTRHD	30.366492	35.172414	28.148148	24.242424
NH4t	100.000000	100.000000	100.000000	100.000000
O2t	57.591623	100.000000	92.592593	91.515152
PDH	100.000000	100.000000	100.000000	100.000000
PFK	93.193717	86.896552	90.370370	97.575758
PGI	97.905759	95.172414	97.037037	100.000000
PGK	100.000000	100.000000	100.000000	100.000000
PGL	34.031414	46.896552	49.629630	2.424242
PGM	100.000000	100.000000	100.000000	100.000000
PIt2r	100.000000	100.000000	100.000000	100.000000
PPC	100.000000	100.000000	100.000000	100.000000
PPCK	7.329843	6.206897	4.444444	7.272727
PPS	6.806283	29.655172	9.629630	6.060606
PYK	39.790576	22.068966	43.703704	40.606061
PYRt2	26.701571	20.689655	9.629630	23.636364
RPE	97.905759	95.862069	98.518519	100.000000
RPI	100.000000	100.000000	100.000000	100.000000
SUCCt2_2	18.848168	11.724138	2.962963	29.696970
SUCCt3	50.785340	11.724138	10.370370	100.000000
SUCOAS	0.000000	0.689655	0.000000	36.363636
TALA	82.198953	86.206897	71.851852	100.000000
THD2	51.832461	44.137931	42.222222	37.575758
TKT1	82.198953	86.206897	71.851852	100.000000
TKT2	97.905759	95.862069	98.518519	100.000000
TPI	93.193717	85.517241	90.370370	97.575758

76 rows × 4 columns

For the purpose of this tutorial, we are only interested in the differences between EFMs for different products and thus, we will only consider rows (reactions) whose frequency is variable amongst products.

In this example, we identify the 20 reactions with highest variance and subset the dataframe to be represented as a heatmap.

[33]:

variance = efm_df.T.apply(lambda x: x.var())
high_variance = variance.sort_values(ascending= False).iloc[:20]
display_heatmap(efm_df.loc[high_variance.index,:])

Constraint-based analysis of mutant phenotypes¶

With MCSs stored as dictionaries, we can easily use COBRA to analyse these knockouts. In this snippet, a dictionary is created to store various simulation values (Flux Balance/Variability Analysis) obtained after applying the reaction knockouts suggested by the MCSs. These are: * product_minimum and product_maximum - Flux range for succinate production * max_growth - Maximum growth rate * product_at_max_growth - Predicted production rate at maximum growth * glucose_consumption - Glucose consumption rate at maximum growth

[34]:

def analyse(kos):
    apply_kos(kos)
    try:
        # Flux Balance Analysis
        sol = model.optimize()

        biomass = sol.fluxes.BIOMASS_Ecoli_core_w_GAM
        product = sol.fluxes.EX_succ_e
        substrate = sol.fluxes.EX_glc__D_e

        # Flux Variability Analysis
        fva = cobra.flux_analysis.variability.flux_variability_analysis(
            model, reaction_list=['EX_succ_e']).loc['EX_succ_e',:]

        return {
        'product_maximum': fva['maximum'],
        'product_minimum': fva['minimum'],
        'max_growth': biomass,
        'product_at_max_growth':product,
        'glucose_consumption': substrate
        }

    except:
        pass
    finally:
        # restore reaction bounds to their original state
        # so the next iteration applies knockouts over them
        restore_bounds()
    return

pandas can be used to store these values as a dataframe for easier manipulation and analysis

[35]:

import pandas as pd

df = pd.DataFrame({frozenset(ko): analyse(ko) for ko in mcs}).T

cobra/util/solver.py:416 UserWarning: solver status is 'infeasible'

Solutions with null or very low growth, as well as solutions with very low production rates are filtered out. This way, only the best candidates are displayed

[36]:

null_growth = df.max_growth.isnull() # growth is None - due to infeasibility
zero_growth = df.max_growth <= 1e-10 # growth is too low (near 0)
no_product = df.product_at_max_growth <= 1e-5 # no production at maximum biomass

df.index.names = ['knockouts']

df[~ (zero_growth | null_growth | no_product)].sort_values(by='max_growth', ascending=False).iloc[1:20,:]

[36]:

	glucose_consumption	max_growth	product_at_max_growth	product_maximum	product_minimum
knockouts
(PPS, LDH_D, ACALD, CYTBD, PYRt2)	-10	0.110886	9.09864	9.09864	9.09864
(PPS, D_LACt2, PYRt2, CYTBD, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(PYRt2, PPS, EX_lac__D_e, CYTBD, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(PYRt2, PPS, EX_lac__D_e, O2t, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(PYRt2, ADK1, EX_lac__D_e, EX_o2_e, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(PYRt2, ADK1, EX_lac__D_e, O2t, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(O2t, LDH_D, ADK1, ACALD, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(LDH_D, ADK1, ACALD, EX_o2_e, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(PYRt2, LDH_D, ADK1, ACALD, EX_o2_e)	-10	0.110886	9.09864	9.09864	9.09864
(O2t, LDH_D, ADK1, ACALD, PYRt2)	-10	0.110886	9.09864	9.09864	9.09864
(EX_pyr_e, ADK1, EX_lac__D_e, EX_o2_e, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(O2t, ADK1, ACALD, D_LACt2, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(ADK1, ACALD, D_LACt2, EX_o2_e, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(EX_pyr_e, O2t, ADK1, EX_lac__D_e, ACALD)	-10	0.110886	9.09864	9.09864	9.09864
(PYRt2, ADK1, ACALD, D_LACt2, EX_o2_e)	-10	0.110886	9.09864	9.09864	9.09864
(LDH_D, ADK1, ACALD, CYTBD, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(PPS, D_LACt2, ACALD, CYTBD, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(PPS, LDH_D, ACALD, CYTBD, EX_pyr_e)	-10	0.110886	9.09864	9.09864	9.09864
(EX_pyr_e, ADK1, EX_lac__D_e, CYTBD, ACALD)	-10	0.110886	9.09864	9.09864	9.09864

[ ]: