vignettes/perturbation_sample.Rmd
perturbation_sample.Rmd
The perturbation efficiency means that the disturbance and propagation efficiency of an element in the network. In a given network not all nodes have same or similar perturbation efficiency. Changes in some nodes might propagate to whole network and for some nodes the effect might be limited to small subgraph of the network. Not only topology but also miRNA:target interaction dynamics determine perturbation efficiency.
Expression level and type of trigger element plays a crucial role. The trigger element can be an miRNA or competing target. The perturbation efficiency is affected from ratio of miRNA amount to sum of expression levels of its targets. Also, amount of competing element among whole competing elements is important since it determines distribution of miRNA. For example, if trigger is an miRNA with expression level of 1500 and if sum of expression levels of its targets is 1000000, then this miRNA will not perturb its neighborhood efficiently. So, miRNA:target ratio is important for regulation of interaction network.
In a biological system, the miRNA:target interactions does not depend solely on stoichiometry, unfortunately. miRNAs affect the targets via degradation or inhibition after the binding. The experimental studies have shown that the features of miRNA:target interactions determine the binding and degradation efficiency. For example, binding energy between miRNA and target and seed structure of miRNA determine the binding efficiency of complex. In addition, the binding region on the target affects the degradation of target.
On the other hand, interaction factors affecting binding and degradation of miRNA to its targets also have impact one efficiency of perturbation of the change. For instance, if a node has very low binding affinity to targeting miRNA, change in expression level of that node will cause weak or no perturbation.
Thus, we developed functions which can calculate perturbation efficiency of a given node or all nodes. calc_perturbation()
function calculates perturbation efficiency for given trigger (e.g. Gene17 2-fold). find_node_perturbation()
function screens the whole network and calculate perturbation efficiency of all nodes.
# install.packages("devtools")
# devtools::install_github("selcenari/ceRNAnetsim")
library(ceRNAnetsim)
data("midsamp")
midsamp
#> Genes miRNAs Gene_expression miRNA_expression seeds targeting_region
#> 1 Gene1 Mir1 10000 1000 0.43 0.30
#> 2 Gene2 Mir1 10000 1000 0.43 0.01
#> 3 Gene3 Mir1 5000 1000 0.32 0.40
#> 4 Gene4 Mir1 10000 1000 0.23 0.50
#> 5 Gene4 Mir2 10000 2000 0.35 0.90
#> 6 Gene5 Mir2 5000 2000 0.05 0.40
#> 7 Gene6 Mir2 10000 2000 0.01 0.80
#> 8 Gene4 Mir3 10000 3000 0.43 0.40
#> 9 Gene6 Mir3 10000 3000 0.35 0.90
#> 10 Gene7 Mir3 7000 3000 0.23 0.30
#> 11 Gene8 Mir3 3000 3000 0.01 0.20
#> 12 Gene6 Mir4 10000 5000 0.05 0.40
#> 13 Gene9 Mir4 6000 5000 0.32 0.80
#> 14 Gene10 Mir4 2000 5000 0.43 0.20
#> 15 Gene11 Mir4 8000 5000 0.35 0.90
#> 16 Gene12 Mir4 1500 5000 0.43 0.40
#> 17 Gene13 Mir4 500 5000 0.23 0.90
#> 18 Gene14 Mir4 7000 5000 0.43 0.80
#> 19 Gene14 Mir3 7000 3000 0.43 0.90
#> 20 Gene15 Mir3 3000 3000 0.35 0.20
#> 21 Gene16 Mir3 2000 3000 0.01 0.80
#> 22 Gene17 Mir3 6000 3000 0.23 0.40
#> 23 Gene17 Mir2 6000 2000 0.35 0.90
#> 24 Gene18 Mir2 1000 2000 0.01 0.01
#> 25 Gene19 Mir2 6500 2000 0.43 0.90
#> 26 Gene20 Mir2 4800 2000 0.35 0.80
#> Energy
#> 1 -20
#> 2 -15
#> 3 -14
#> 4 -10
#> 5 -12
#> 6 -11
#> 7 -25
#> 8 -6
#> 9 -15
#> 10 -20
#> 11 -30
#> 12 -12
#> 13 -18
#> 14 -23
#> 15 -25
#> 16 -30
#> 17 -17
#> 18 -15
#> 19 -25
#> 20 -12
#> 21 -18
#> 22 -22
#> 23 -7
#> 24 -30
#> 25 -32
#> 26 -18
calc_perturbation()
work?This function works for a given node from network. It calculates and returns two values:
In the example below, “Gene17” is up-regulated 3-fold in midsamp dataset where Energy
and seeds
columns are used for calculating affinity effect and targeting_region
columns is used for calculating degradation effect. The network will be iterated over 30 times and number of disturbed nodes (as taking into account nodes that have changed more than the value of the threshold (0.1 percentage in terms of the change)) will be counted.
midsamp %>%
priming_graph(competing_count = Gene_expression,
miRNA_count = miRNA_expression,
aff_factor = c(Energy,seeds),
deg_factor = targeting_region) %>%
calc_perturbation("Gene17", 3, cycle = 30, limit = 0.1)
#> Warning in priming_graph(., competing_count = Gene_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> # A tibble: 1 x 2
#> perturbation_efficiency perturbed_count
#> <dbl> <dbl>
#> 1 0.377 8
If you are interested in testing various fold change values of a given node, then we can use map
(actually parallelized version future_map
) to run function for set of input values.
First, let’s keep the primed version of graph in an object
primed_mid <- midsamp %>%
priming_graph(competing_count = Gene_expression,
miRNA_count = miRNA_expression,
aff_factor = c(Energy,seeds),
deg_factor = targeting_region)
#> Warning in priming_graph(., competing_count = Gene_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
Now, let’s calculate perturbation efficiency caused by 2-fold to 10-fold increase in Gene17
# for parallel processing
# future::plan(multiprocess)
seq(2,10) %>%
rlang::set_names() %>%
furrr::future_map_dfr(~ primed_mid %>% calc_perturbation("Gene17", .x, cycle = 30, limit = 0.1), .id="fold_change")
If you’re interested in screening nodes instead of fold changes then you don’t have to write a complicated map
command, there’s already a function available for that purpose.
find_node_perturbation()
work?The find_node_perturbation()
function calculates the perturbation efficiency and perturbed node count of each node in network.
In the example below, each node is increased 2-fold and tested for perturbation efficiency for 4 cycles with threshold of 0.1
midsamp %>%
priming_graph(competing_count = Gene_expression,
miRNA_count = miRNA_expression,
aff_factor = c(Energy,seeds),
deg_factor = targeting_region) %>%
find_node_perturbation(how = 3, cycle = 4, limit = 0.1)%>%
select(name, perturbation_efficiency, perturbed_count)
#> Warning in priming_graph(., competing_count = Gene_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> # A tibble: 24 x 3
#> name perturbation_efficiency perturbed_count
#> <chr> <dbl> <dbl>
#> 1 Gene1 0.0626 2
#> 2 Gene2 0.0974 3
#> 3 Gene3 0.0311 2
#> 4 Gene4 0.561 10
#> 5 Gene5 0.0366 3
#> 6 Gene6 0.505 12
#> 7 Gene7 0.251 6
#> 8 Gene8 0.00973 0
#> 9 Gene9 0.688 6
#> 10 Gene10 0.510 6
#> # ... with 14 more rows
On the other hand, some of nodes in network might not be affected from perturbation because of low expression or weak interaction factors. In this case, fast
argument can be used. Argument fast
calculate affected expression percent of the targets and perturbation calculation is not ran for these elements in network, if that percentage value is smaller than given fast
value.
midsamp %>%
priming_graph(competing_count = Gene_expression,
miRNA_count = miRNA_expression,
aff_factor = c(Energy,seeds),
deg_factor = targeting_region) %>%
find_node_perturbation(how = 3, cycle = 4, limit = 0.1, fast=5)%>%
select(name, perturbation_efficiency, perturbed_count)
#> Warning in priming_graph(., competing_count = Gene_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> Warning: `as_quosure()` requires an explicit environment as of rlang 0.3.0.
#> Please supply `env`.
#> This warning is displayed once per session.
#> Subsetting by edges
#> Warning: `cols` is now required.
#> Please use `cols = c(eff_count)`
#> # A tibble: 24 x 3
#> name perturbation_efficiency perturbed_count
#> <chr> <dbl> <dbl>
#> 1 Gene1 NA NA
#> 2 Gene2 NA NA
#> 3 Gene3 NA NA
#> 4 Gene4 NA NA
#> 5 Gene5 NA NA
#> 6 Gene6 1.26 8
#> 7 Gene7 NA NA
#> 8 Gene8 NA NA
#> 9 Gene9 3.12 8
#> 10 Gene10 2.90 8
#> # ... with 14 more rows
The results of the find_node_perturbation()
will list effectiveness or importance of nodes in the network. This function can help selecting nodes which will have effective perturbation in network.
find_node_perturbation()
runs calc_perturb
on all nodes in the network in parallel with help of the future
and furrr
packages. In this vignette, the function is demonstrated on the midsamp
data. This dataset is not comparable to actual biological miRNA:target gene datasets in size and complexity. Although find_node_perturbation()
runs in parallel it might take long time to run in real huge biological datasets.
In real biological datasets, more complex interactions whether functional or non-functional could be observed. We have improved our approach with fast
argument in find_node_perturbation()
based on selection of elements that could be affected from perturbation. In this fucntion, fast
argument specifies the percentage of the competing amount that can be affected within the initial competing amount and acts as a selection parameter. For instance, in huge_example data:
data("huge_example")
huge_example%>%
priming_graph(competing_count = competing_counts, miRNA_count = mirnaexpression_normal)%>%
find_node_perturbation(how=5, cycle=3, fast = 3)%>%
select(name, perturbation_efficiency, perturbed_count)
huge_example%>%
priming_graph(competing_count = competing_counts, miRNA_count = mirnaexpression_normal)%>%
find_node_perturbation(how=5, cycle=3, fast = 3)%>%
filter(!is.na(perturbation_efficiency), !is.na(perturbed_count))%>%
select(name, perturbation_efficiency, perturbed_count)
name | perturbation_efficiency | perturbed_count |
---|---|---|
AK2 | 7.938457e-02 | 542 |
SLC25A5 | 1.540258e-01 | 542 |
MSL3 | 3.241137e-02 | 542 |
MYCBP2 | 3.141119e-02 | 514 |
VPS41 | 2.116591e-02 | 514 |
ADIPOR2 | 8.039431e-02 | 542 |
CELSR3 | 2.220963e-03 | 542 |
SCMH1 | 4.186803e-02 | 542 |
EXTL3 | 4.015119e-02 | 542 |
NR1H4 | 1.033477e-05 | 542 |