Section 2 Materials
R and packages
At the time of writing, the most recent version of R is 3.6.2 (Dark and Stormy Night). Instructions for installing R on different platforms (Linux, OS X and Windows) can be found at https://www.r-project.org, where precompiled binaries are provided for download. Linux users without sudo privileges can instead install R from source into their home directory (that is, $HOME):
wget https://cran.wu.ac.at/src/base/R-3/R-3.6.2.tar.gz
tar xvfz R-3.6.2.tar.gz
cd R-3.6.2
./configure --prefix=$HOME/R-3.6.2
make
make check
make install
$HOME/R-3.6.2/bin/R # start R
We highly recommend using the dedicated package BiocManager to install and update packages deposited in Bioconductor and CRAN, two mutually exclusive repositories (a package cannot be deposited in both). BiocManager should first be installed in the conventional way (i.e. using the function install.packages), after which it can install other packages in a single step. Once the additional package remotes is also installed, BiocManager can also install packages hosted on GitHub, which often serves as a development repository prior to submission to Bioconductor or CRAN.
# first, install the package BiocManager
install.packages("BiocManager")
# then install packages from Bioconductor and CRAN
BiocManager::install(c("biobroom","dnet","ggrepel","gridExtra","limma","patchwork","remotes","tidyverse","osfr"), dependencies=TRUE)
# can also install packages from GitHub
BiocManager::install("hfang-bristol/XGR")
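After installation, it can be useful to confirm that every package is actually available. The following is a minimal sketch in base R, not part of the original workflow; the variable names `pkgs`, `is_installed` and `missing_pkgs` are chosen here for illustration only.

```r
# Packages named in the installation step above (XGR is the GitHub-hosted one)
pkgs <- c("biobroom", "dnet", "ggrepel", "gridExtra", "limma",
          "patchwork", "remotes", "tidyverse", "osfr", "XGR")

# TRUE where the package namespace can be loaded from the current library paths
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of packages that still need installing (character(0) if none)
missing_pkgs <- names(is_installed)[!is_installed]
print(missing_pkgs)
```

Any names printed here could then be passed back to BiocManager::install.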
Genetic interactions
We extracted human genetic interactions from BioGRID (version 3.5.179), involving 3102 genes (mapped to NCBI GeneID; the same hereinafter) and 7856 interactions between them. This dataset was preprocessed into an igraph object (using the igraph package) and saved as an RData-formatted file, ig.BioGRID_genetic.RData (deposited at https://osf.io/gskpn).
ig.BioGRID_genetic
## IGRAPH 0a7230a UN-- 3102 7856 --
## + attr: name (v/c), geneid (v/n), symbol (v/c), description (v/c),
## | nPMID (e/n)
## + edges from 0a7230a (vertex names):
## [1] A1BG --REV3L A2M --KRAS AAGAB --TP53 AAMP --KRAS
## [5] AANAT --P2RY6 AANAT --SPHK1 AANAT --SSTR5 AANAT --TOP1
## [9] AARS2 --LEO1 AARS2 --MRPS16 AARS2 --MRPS5 AARS2 --PSMB6
## [13] AATF --DONSON AATF --GFI1B AATF --MCM3AP ABCB5 --CSK
## [17] ABCB5 --KRAS ABCB7 --AURKA ABCB7 --HSCB ABCB7 --LONP1
## [21] ABCB7 --MBTPS2 ABCB7 --MED23 ABCB7 --NUBP1 ABCB7 --PITRM1
## [25] ABCB7 --TAF2 ABCE1 --KRAS ABCG2 --CSK ABCG5 --FLT3
## + ... omitted several edges
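To illustrate how an undirected igraph object of this kind can be built and queried, the sketch below constructs a toy graph from the first six edges printed above. It assumes the igraph package is installed; the object names `edges` and `g` are hypothetical, and the vertex and edge attributes of the real object (geneid, symbol, description, nPMID) are left out because their values are not shown here.

```r
library(igraph)

# First six gene pairs from the printed edge list above
edges <- data.frame(
  from = c("A1BG", "A2M", "AAGAB", "AAMP", "AANAT", "AANAT"),
  to   = c("REV3L", "KRAS", "TP53", "KRAS", "P2RY6", "SPHK1"),
  stringsAsFactors = FALSE
)

# Build an undirected, named graph; vertex names come from the edge list
g <- graph_from_data_frame(edges, directed = FALSE)

vcount(g)                 # number of genes (vertices)
ecount(g)                 # number of interactions (edges)
degree(g)["KRAS"]         # number of interaction partners of KRAS
neighbors(g, "KRAS")$name # names of those partners
```

The same query functions (vcount, ecount, degree, neighbors) apply to the full ig.BioGRID_genetic object once it has been loaded.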
Gene expression
We obtained human tissue RNA-seq datasets (gene-centric expression levels quantified as transcripts per million [TPM]) from the GTEx study (version 8). This study recruited ~1000 postmortem donors, from whom 49 tissues (each with at least 70 donors/samples) were profiled using bulk RNA-seq. To aid in selecting genes expressed in a given tissue and in showing their expression distribution within that tissue, we precalculated a descriptive summary for each gene per tissue: ymin (the minimum TPM amongst samples of the same tissue), lower (25% quantile), middle (i.e. median), upper (75% quantile) and ymax (the maximum TPM). This per-tissue gene summary was represented as a tibble object (using the tibble package) and saved as an RData file, GTEx_V8_TPM_boxplot.RData. The resulting dataset, though much reduced in size, remains informative for extracting genes expressed in a tissue (filtering by ymin >= 1) and for boxplot visualisation of expression distributions.
GTEx_V8_TPM_boxplot
## # A tibble: 1,709,316 x 9
## ENSG Symbol SMTSD SMTS ymin lower middle upper ymax
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ENSG00000… DDX11L1 Adipose - Subc… Adipose… 0 0 0 0 0.166
## 2 ENSG00000… DDX11L1 Muscle - Skele… Muscle 0 0 0 0.0150 0.116
## 3 ENSG00000… DDX11L1 Artery - Tibial Blood V… 0 0 0 0 0.130
## 4 ENSG00000… DDX11L1 Artery - Coron… Blood V… 0 0 0 0 0.0710
## 5 ENSG00000… DDX11L1 Heart - Atrial… Heart 0 0 0 0.0143 0.138
## 6 ENSG00000… DDX11L1 Adipose - Visc… Adipose… 0 0 0 0 0.127
## 7 ENSG00000… DDX11L1 Uterus Uterus 0 0 0 0.0244 0.148
## 8 ENSG00000… DDX11L1 Vagina Vagina 0 0 0 0.0202 0.118
## 9 ENSG00000… DDX11L1 Breast - Mamma… Breast 0 0 0 0.00493 0.0744
## 10 ENSG00000… DDX11L1 Skin - Not Sun… Skin 0 0 0 0 0.114
## # … with 1,709,306 more rows
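The per-tissue summary described above can be sketched as follows. This is an illustration in base R rather than the code used to produce GTEx_V8_TPM_boxplot: the input table `tpm`, its gene and tissue labels, and all TPM values are made up, and the helper `five_stats` is a hypothetical name.

```r
# Hypothetical per-sample TPM values (made-up numbers, not GTEx data)
tpm <- data.frame(
  Symbol = rep(c("GENE_A", "GENE_B"), each = 10),
  SMTSD  = rep(rep(c("Tissue 1", "Tissue 2"), each = 5), times = 2),
  TPM    = c(1, 2, 3, 4, 5,   0, 0, 1, 2, 10,
             5, 6, 7, 8, 9,   0, 0, 0, 0, 0.2)
)

# Minimum, 25% quantile, median, 75% quantile and maximum of a numeric vector
five_stats <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  setNames(q, c("ymin", "lower", "middle", "upper", "ymax"))
}

# One row per gene and tissue; aggregate() returns the statistics as a
# matrix column, which is flattened into ordinary columns here
agg <- aggregate(TPM ~ Symbol + SMTSD, data = tpm, FUN = five_stats)
boxplot_summary <- cbind(agg[c("Symbol", "SMTSD")], as.data.frame(agg$TPM))

# Genes expressed in a tissue: at least 1 TPM in every sample of that tissue
expressed <- boxplot_summary[boxplot_summary$ymin >= 1, ]
print(expressed)
```

Because ymin is the minimum across all samples of a tissue, the filter ymin >= 1 keeps only gene-tissue pairs in which every sample reaches at least 1 TPM.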