PheLiGe

Results

Primary SNP	Not selected
Secondary SNP	Not selected
Theta	No data

Filter parameters

p-value	0.05
N factor	0.9

Analysis parameters

p	1
k	30
T	0.3

MAF: 0.03

Trait Name	Population	Collection	rsID or position	Chr	BP	EA	RA	EAF	Beta	SE	Z	P-value	N	Info	R
Peripheral blood FHL3 (ILMN_1703558)	European	Westra_eQTL	rs12563037	1	38258644	C	G	0.2763	-0.5279	0.0201	-26.2043	2.375e-151	5311	0	0.9927
Peripheral blood INPP5B (ILMN_1810116)	European	Westra_eQTL	rs12563037	1	38258644	C	G	0.2763	-0.4539	0.0205	-22.1597	8.423e-109	5311	0	0.9927
blood INPP5B (ILMN_1810116)	Mixed	CAGE_eQTL	rs12728438	1	38349400	A	G	0.2881	-0.5413	0.0304	-17.8210	4.856e-71	2765	0	1.0000
blood FHL3 (ILMN_1703558)	Mixed	CAGE_eQTL	rs12728438	1	38349400	A	G	0.2881	-0.5339	0.0303	-17.6171	1.821e-69	2765	0	1.0000
Peripheral blood SF3A3 (ILMN_1705151)	European	Westra_eQTL	rs12563037	1	38258644	C	G	0.2763	-0.3412	0.0209	-16.3322	5.820e-60	5311	0	0.9927
Whole_Blood INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.5373	0.0350	-15.3434	1.103e-39	369	0	1.0000
blood SF3A3 (ILMN_1705151)	Mixed	CAGE_eQTL	rs12728438	1	38349400	A	G	0.2881	-0.3382	0.0304	-11.1258	9.397e-29	2765	0	1.0000
Artery_Tibial INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4534	0.0411	-11.0402	3.084e-24	388	0	1.0000
Nerve_Tibial INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4511	0.0408	-11.0480	5.774e-24	361	0	1.0000
Esophagus_Muscularis INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4880	0.0450	-10.8474	3.735e-23	335	0	1.0000
Adipose_Visceral_Omentum INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4227	0.0429	-9.8431	1.204e-19	313	0	1.0000
Adipose_Subcutaneous INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4834	0.0531	-9.1028	9.946e-18	385	0	1.0000
Sitting height	European	UKB_GeneAtlas_v2	rs12728438	1	38349400	A	G	0.2793	-0.0511	0.0062	-8.1698	3.095e-16	452264	0.9973	1.0000
Thyroid INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4129	0.0485	-8.5063	6.235e-16	399	0	1.0000
Standing height	European	UKB_NealeLab	rs12728438	1	38349400	A	G	0.2903	-0.0152	0.0019	-8.0086	1.164e-15	336474	0	1.0000
Lung INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4336	0.0539	-8.0492	1.708e-14	383	0	1.0000
Artery_Aorta INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.3632	0.0459	-7.9088	1.349e-13	267	0	1.0000
Esophagus_Gastroesophageal_Junction INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.5466	0.0689	-7.9327	2.395e-13	213	0	1.0000
Spleen INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.8574	0.1050	-8.1658	3.087e-13	146	0	1.0000
Cells_EBV-transformed_lymphocytes INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.7518	0.0900	-8.3560	5.321e-13	117	0	1.0000
blood UTP11L (ILMN_2130838)	Mixed	CAGE_eQTL	rs12728438	1	38349400	A	G	0.2881	-0.2123	0.0300	-7.0841	1.399e-12	2765	0	1.0000
Artery_Tibial FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.3080	0.0428	-7.1900	4.580e-12	388	0	1.0000
Esophagus_Mucosa SF3A3 (ENSG00000183431.7)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.1624	0.0229	-7.1028	9.417e-12	358	0	1.0000
Colon_Sigmoid INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.5124	0.0715	-7.1629	2.446e-11	203	0	1.0000
Sitting height	European	UKB_NealeLab	rs12728438	1	38349400	A	G	0.2903	-0.0137	0.0021	-6.6034	4.024e-11	336172	0	1.0000
Cells_Transformed_fibroblasts INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2888	0.0424	-6.8150	7.112e-11	300	0	1.0000
Esophagus_Mucosa FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2878	0.0426	-6.7526	7.877e-11	358	0	1.0000
Nerve_Tibial SF3A3 (ENSG00000183431.7)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2786	0.0419	-6.6513	1.416e-10	361	0	1.0000
Ovary INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.5109	0.0724	-7.0535	2.308e-10	122	0	1.0000
Thyroid FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2663	0.0408	-6.5313	2.448e-10	399	0	1.0000
Peripheral blood UTP11L (ILMN_2130838)	European	Westra_eQTL	rs12563037	1	38258644	C	G	0.2763	-0.1338	0.0213	-6.2742	3.515e-10	5311	0	0.9927
Artery_Aorta FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs35978921	1	38361215	T	C	0.7127	-0.3445	0.0532	6.4788	6.206e-10	267	0	0.9810
Nerve_Tibial FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2915	0.0457	-6.3825	6.769e-10	361	0	1.0000
CD8+ T lymphocytes (Peripheral blood) FHL3 (4210564)	European	CEDAR	rs12728438	1	38349400	A	G	0.2793	-0.1941	0.0306	-6.3400	9.249e-10	284	0	1.0000
Skin_Sun_Exposed_Lower_leg SF3A3 (ENSG00000183431.7)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2634	0.0427	-6.1653	1.955e-9	414	0	1.0000
Esophagus_Muscularis FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2402	0.0401	-5.9969	6.121e-9	335	0	1.0000
Self-reported: thyroid problem (not cancer)	European	UKB_GeneAtlas_v2	rs12728438	1	38349400	A	G	0.2793	-0.0030	0.0005	-5.8109	6.218e-9	452264	0.9973	1.0000
Lung FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs35978921	1	38361215	T	C	0.7127	-0.2120	0.0360	5.8953	9.597e-9	383	0	0.9810
Stomach INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.3448	0.0581	-5.9347	1.280e-8	237	0	1.0000
CD15+ granulocytes (Peripheral blood) SF3A3 (1230747)	European	CEDAR	rs12728438	1	38349400	A	G	0.2793	-0.2354	0.0401	-5.8650	1.332e-8	271	0	1.0000
blood YRDC (ILMN_2061732)	Mixed	CAGE_eQTL	rs12728438	1	38349400	A	G	0.2881	0.1688	0.0298	5.6702	1.426e-8	2765	0	1.0000
Skin_Sun_Exposed_Lower_leg FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs35465817	1	38363620	C	CA	0.2883	-0.2160	0.0373	-5.7976	1.514e-8	414	0	0.9787
Esophagus_Mucosa INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.3084	0.0532	-5.7943	1.780e-8	358	0	1.0000
Self-reported: hypothyroidism/myxoedema	European	UKB_GeneAtlas_v2	rs12728438	1	38349400	A	G	0.2793	-0.0026	0.0005	-5.6249	1.857e-8	452264	0.9973	1.0000
Skin_Not_Sun_Exposed_Suprapubic FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2795	0.0484	-5.7802	1.968e-8	335	0	1.0000
Artery_Coronary INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.3669	0.0610	-6.0143	2.191e-8	152	0	1.0000
Small_Intestine_Terminal_Ileum INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.4890	0.0819	-5.9712	3.613e-8	122	0	1.0000
Coronary artery disease The CARDIoGRAMplusC4D	European	others	rs2291297	1	38272660	A	G	0.2795	0.0304	0.0055	5.5037	3.720e-8	1157660	0	0.9878
Coronary artery disease 2022	European	others	rs2291297	1	38272660	A	G	0.2795	0.0304	0.0055	5.5037	3.720e-8	1157660	0	0.9878
Colon_Transverse INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs12728438	1	38349400	A	G	0.2793	-0.2944	0.0515	-5.7143	3.761e-8	246	0	1.0000
Artery_Tibial UTP11L (ENSG00000183520.7)	Mixed	GTEx_v7	rs28625842	1	38310786	A	G	0.2793	-0.1809	0.0322	-5.6174	4.205e-8	388	0	1.0000
Brain_Caudate_basal_ganglia INPP5B (ENSG00000204084.8)	Mixed	GTEx_v7	rs35465817	1	38363620	C	CA	0.2883	-0.3164	0.0543	-5.8284	4.672e-8	144	0	0.9787
Rheumatoid Arthritis	Mixed	jenger_gwas	rs28625842	1	38310786	A	G	0.3112	-0.1012	0.0185	-5.4609	4.709e-8	59206	0	1.0000
Cells_Transformed_fibroblasts FHL3 (ENSG00000183386.5)	Mixed	GTEx_v7	rs28625842	1	38310786	A	G	0.2793	-0.2774	0.0493	-5.6250	4.993e-8	300	0	1.0000

General details

PheLiGe is a web service that provides access to publicly available results from human genetic association studies. By serving information and tools for investigating (regional) genotype-phenotype associations across the phenome, the service aims to give a researcher insight into the biological function affected by the variation in question, to help formulate aetiological hypotheses, and to inform functional studies. The web service allows exploration of genome-wide and regional associations, finding phenotypes associated with a genetic variant, and comparison of association patterns between different traits to ascertain whether a co-association is due to pleiotropy or linkage.

You can access the database via a web interface with three tabs: Analysis, GWAS/cis-QTL Descriptors, and Associations. In the Associations tab you can search for phenotypic associations observed for an SNP of interest, directly or via a proxy variant in LD. The search results are presented as a multi-page table sorted by association p-value. On this tab you can also select two regions for subsequent colocalisation analysis. In the Analysis tab, regional patterns of association are compared using the θ metric, which lets you hypothesise whether overlapping signals are due to pleiotropy or linkage disequilibrium. In the GWAS/cis-QTL Descriptors tab, you can access association study metadata, search for specific association studies, and investigate an interactive Manhattan plot of a trait of interest.

For the convenience of a new user, we designed an interactive tour that demonstrates basic usage of PheLiGe. The tour is available via the “Start tour” button in the upper right corner.

Version	0.0.3
Number of GWAS Descriptors	8554
Number of RWAS Descriptors	1348967
Number of Associations	93651510164

Citing PheLiGe

To cite PheLiGe in scientific communications, please state the full database name and URL (e.g. PheLiGe at https://phelige.com) along with the following publication reference:

Shashkova TI, Pakhomov ED, Gorev DD, Karssen LC, Joshi PK, Aulchenko YS. PheLiGe: an interactive database of billions of human genotype-phenotype associations. Nucleic Acids Res. 2021 Jan 8;49(D1):D1347-D1350. doi: 10.1093/nar/gkaa1086. PMID: 33245779; PMCID: PMC7779071.

In case you have used extended data analysis functionality as implemented in GWAS-MAP, please also cite:

Shashkova TI, Gorev DD, Pakhomov ED, Shadrina AS, Sharapov SZ, Tsepilov YA, et al. The GWAS-MAP platform for aggregation of results of genome-wide association studies and the GWAS-MAP|homo database of 70 billion genetic associations of human traits. Vavilov J Genet Breed [Internet]. 2020 Dec 31;24(8):876–84. doi: 10.18699/VJ20.686.

Contacts

Please direct all inquiries regarding the service, including signing up, to phelige@polyknomics.com.

GWAS data

We collected summary statistics of genome-wide (GWAS) and region-wide (RWAS) association studies from open sources. For each summary statistics file, we created an annotation that contains information about the study design and its key characteristics (sample size, details of the association analysis model, study population, license and terms of use, etc.). Since the data were generated in different laboratories using different protocols, the resulting summary statistics files have different formats. To solve this problem, we developed an integration module that transforms data into a universal format. To ensure consistency of data within the database, our import procedure compares the SNP identification number, its position in the genome, and its alleles to the reference. If any of these characteristics do not match, the SNP is not imported. The present implementation uses a reference that consists of 503 genomes of Europeans from the "1000 genomes" project (1000G phase 3 version 5). Next, we harmonized the data so that the same effect and reference alleles are used in all GWASs. If a summary statistics file did not directly contain all columns required for conversion to the universal format, a GWAS could in certain cases still be imported into the database. For example, a missing allele frequency could be replaced with that from the reference, and a missing standard error could be computed from the effect size and the p-value.
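The standard-error recovery mentioned above can be sketched as follows. This is a minimal illustration rather than the PheLiGe import code: it assumes a two-sided p-value derived from a Z statistic (Beta / SE) that is standard normal under the null, and the function name `se_from_beta_p` is hypothetical.

```python
from statistics import NormalDist


def se_from_beta_p(beta: float, p_value: float) -> float:
    """Recover a standard error from an effect size and a two-sided p-value.

    Assumption (not stated in the source): Z = beta / SE follows a standard
    normal distribution under the null hypothesis.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    if beta == 0.0:
        raise ValueError("SE cannot be recovered when beta is zero")
    # |Z| is the upper-tail normal quantile for half of the two-sided p-value.
    abs_z = -NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta) / abs_z


# Beta and P-value of the UKB_GeneAtlas_v2 "Sitting height" row in the table above.
print(se_from_beta_p(-0.0511, 3.095e-16))
```

The result can be compared with the SE column of the same row as a sanity check. For extremely small p-values a log-scale computation would be more robust than this direct quantile lookup.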

Next, we perform quality control (QC) for each study. In particular, QC includes a comparison of the allele frequencies from the study with those from the reference sample, a comparison of the reported p-values with p-values computed from the reported effect size and its standard error, and an analysis of the distribution of the allele effect estimates. An SNP is marked as an AF outlier if the reported allele frequency deviates from the reference panel allele frequency by more than 0.2. It is marked as a PZ outlier if the reported and computed -log₁₀(p-value) differ by more than 2% for p-values below 10^-10, or by more than 0.5 in absolute value for p-values above 10^-10.
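The two outlier rules can be written as small predicates. This is a sketch under stated assumptions: the function names are hypothetical, the regime (relative versus absolute tolerance) is chosen from the reported p-value, and the 2% difference is measured relative to the reported -log₁₀(p-value). The source does not specify either choice.

```python
def is_af_outlier(study_eaf: float, reference_eaf: float, max_diff: float = 0.2) -> bool:
    """AF outlier: study and reference allele frequencies differ by more than 0.2."""
    return abs(study_eaf - reference_eaf) > max_diff


def is_pz_outlier(neglog10_reported: float, neglog10_computed: float) -> bool:
    """PZ outlier: reported and computed -log10(p) disagree beyond tolerance.

    Inputs are already on the -log10 scale, which avoids underflow for very
    small p-values.
    """
    diff = abs(neglog10_reported - neglog10_computed)
    if neglog10_reported > 10.0:  # reported p-value below 1e-10 (assumed regime switch)
        return diff > 0.02 * neglog10_reported  # relative 2% tolerance
    return diff > 0.5  # absolute tolerance for larger p-values


# EAF of the same SNP in two collections from the table above: not an AF outlier.
print(is_af_outlier(0.2793, 0.2903))
```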

Associations tab

The Associations tab allows searching the database of genome-wide and regional association scan (GWAS/RWAS) results for genotype-phenotype associations, either directly by a specific SNP (given as an rsID or chr:position) or via proxies of this SNP. Proxies are defined as SNPs in linkage disequilibrium (LD) with the queried SNP above the specified r² threshold. LD statistics were estimated for SNPs with MAF > 1% using haplotypes from the EUR samples of 1000 Genomes phase 1 version 3 within a 1 Mbp window. For each SNP we kept LD statistics for up to 1000 proxy SNPs with r² > 0.5.

Among all the associations that satisfy the specified filters, we display only one per GWAS/RWAS - either the one with the queried SNP, or, when it is absent, the one with an SNP having the strongest LD (largest r²) with it.
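The selection rule in the preceding paragraph can be sketched as below. The record layout (dictionaries with `study`, `rsid` and `r2` keys), the study labels, and the function name are hypothetical and serve only to illustrate the rule; this is not the service's actual query code.

```python
def pick_one_per_study(hits, queried_rsid):
    """Keep one association per study: the queried SNP if present,
    otherwise the proxy with the largest r2 to the queried SNP."""
    best = {}
    for hit in hits:
        # Tuples compare element-wise: an exact match beats any proxy,
        # and among proxies the larger r2 wins.
        rank = (hit["rsid"] == queried_rsid, hit["r2"])
        current = best.get(hit["study"])
        if current is None or rank > (current["rsid"] == queried_rsid, current["r2"]):
            best[hit["study"]] = hit
    return best


hits = [
    {"study": "study_A", "rsid": "rs12563037", "r2": 0.9927},
    {"study": "study_A", "rsid": "rs12728438", "r2": 1.0},
    {"study": "study_B", "rsid": "rs35978921", "r2": 0.9810},
    {"study": "study_B", "rsid": "rs12563037", "r2": 0.9927},
]
selected = pick_one_per_study(hits, "rs12728438")
print({study: hit["rsid"] for study, hit in selected.items()})
```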

You can press the 'Add Filter' button to add further SNP filters by MAF, number of genotyped people (N), imputation quality, and outlier status. Moreover, if you select specific traits on the GWAS/cis-QTL Descriptors tab (see the description of that tab for more details), a new button 'Show selected' will appear. Click on it to check the list of selected traits. When you then search by SNP, results are shown only for the selected traits.

Output

The results table is sorted by p-value and can be filtered by a p-value cut-off entered in the appropriate box. You can choose which columns are visible on this tab by clicking on the icon in the top right corner. You can download the results in CSV format. Click the 'Open' button in the 'SNP plot' column of this table to access the regional plot. The pop-up window shows a regional association plot, a recombination map, and a gene track. In the regional association plot each dot represents an SNP. You can filter SNPs by MAF using the slider on the right. If you move the cursor over a dot, you will see a tooltip with the SNP information (chromosome, position, alleles, p-value, and others). Clicking on the rsID in the tooltip redirects you to the NCBI SNP database, while clicking on the magnifying glass next to the rsID redirects you to the Associations tab and queries the database for this rsID.

Next, using the button to the left of a trait name, you can select a "primary" and a "secondary" trait, after which the selection is passed to the Analysis tab for colocalization analysis (see the description of the Analysis tab below).

Output column description

Trait name
SNP Plot
Population
Collection
rsID
Chr, BP
EA, RA
EAF
Beta, SE, Z, P-value
N
Info
PZ Outlier
AF Outlier
R

GWAS/cis-QTL Descriptors tab

This tab gives access to a database of GWAS/RWAS metadata collected from articles, study web-sites, or other sources of data descriptions. Use simple search to find a GWAS/RWAS by trait name, trait abbreviation, or author's name. Use advanced search to apply the 'collection' filter and/or add further filters using the 'Add filter' button.

Each study descriptor contains a field that provides possible synonyms and related ontology terms for the trait. For complex traits and diseases the trait names are matched with terms from the Experimental Factor Ontology (https://www.ebi.ac.uk/ols/ontologies/efo) as well as with ICD-10 (International Classification of Diseases, revision 10) notations and codes. We also use specific nomenclatures for some other domains. For example, for all eQTL studies transcript names were mapped to HUGO Gene Nomenclature Committee (HGNC) gene names, and these gene names are part of the "trait name". For studies of N-glycosylation levels we use the standard Oxford notation name as part of the "trait name".

You can test the features described above by searching, for example, for the EFO term "EFO:0003819", the ICD10 term "ICD10 K02", the HGNC gene name "FUT8", or the core-fucosylated galactosylated N-glycan "FA2G2".

Output

The results table can be customized by clicking on the gear icon in the top right corner, where you can select which columns are visible. The results table can be downloaded in CSV format. You can investigate the Manhattan plot of a trait of interest by clicking on the 'Open' button in the 'Plot' column, which opens a pop-up window with the plot. You can select a chromosome and then navigate through the genome using the controls on the left. The SNPs can be filtered by MAF using the slider on the right. At the highest level of resolution, each SNP is represented by a dot. This view is identical to the regional plot view described above (see the Associations tab description). To restrict an SNP search to particular traits, check the box to the left of each trait name, then go to the Associations tab and search SNPs within this pool of traits.

Output column description

Trait abbreviation, Trait name
Plot
Population
Collection
Study Year
Authors
Reference PMID
Reference DOI
Data DOI
Trait type
Tissue
Domain
N Cases
N Controls
N People
Genomic build
Association Metric
Frequency Source

Analysis tab

For analysis of colocalization of association signals from different traits, we implemented a slightly modified version of the θ metric defined by Momozawa et al. (https://doi.org/10.1038/s41467-018-04365-8). In short, this method compares the association "profiles" of two traits in a region to distinguish pleiotropy from linkage disequilibrium. The traits to compare should be chosen in the Associations tab.

It is expected that under pleiotropy (i.e. if the same causal genetic variant is responsible for the association of both traits with the region) the similarity between association patterns will be high. In contrast, two distinct variants in linkage disequilibrium (LD), unless this LD is very high, are expected to generate different patterns of association.

The θ metric is, in essence, a weighted correlation, and high similarity is reflected by its values close to 1 or -1. If an allele increasing the value of one trait also increases the value of the second trait, the sign of θ is positive. If the allele increasing the value of one trait decreases the value of the other, the sign of θ is negative. In case there is little similarity between the two association patterns, the value of θ is close to zero.

In more detail, θ is a weighted correlation based on p-values ( $x=-\log(\textit{p-value}_{\textit{primary trait}})$ , $y=-\log(\textit{p-value}_{\textit{secondary trait}})$ ) and the signs of the effects ( $\beta_{x} =\beta_{\textit{primary trait}}$ , $\beta_{y} =\beta_{\textit{secondary trait}}$ ):
$\theta = \frac{r_{ws}}{1 + e^{-k(r_w - T)}}$ ,
where $r_w=\frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} w_i \left(\frac{x_i - \overline{x_w}}{\sigma_x^w}\right)\left(\frac{y_i - \overline{y_w}}{\sigma_y^w}\right)$ and $r_{ws}=\frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} w_i \left(\frac{x_i - \overline{x_w}}{\sigma_x^w}\right)\left(\frac{y_i - \overline{y_w}}{\sigma_y^w}\right)\operatorname{sign}(\beta_{x_i} \beta_{y_i})$ , in which $\overline{x_w} = \frac{\sum_{i=1}^{n} w_i \times x_i }{\sum_{i=1}^{n} w_i}, \overline{y_w} = \frac{\sum_{i=1}^{n} w_i \times y_i }{\sum_{i=1}^{n} w_i}$ are the weighted means of x and y, $\sigma_x^w=\sqrt{\frac{\sum_{i=1}^{n} w_i \times (x_i - \overline{x_w})^2 }{\sum_{i=1}^{n} w_i}}, \sigma_y^w=\sqrt{\frac{\sum_{i=1}^{n} w_i \times (y_i - \overline{y_w})^2 }{\sum_{i=1}^{n} w_i}}$ are the weighted standard deviations of x and y, and $w_i=\left(\max\left(\frac{x_i}{x_{max}},\frac{y_i}{y_{max}}\right)\right)^p$ is the weight of the i-th element.

Parameters k, p, T are global and are reflected in the Analysis tab's top right corner.
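To make the definition concrete, here is a minimal pure-Python sketch of the θ formula as written above. It is not the service's implementation: the function name `weighted_theta` is hypothetical, the defaults p = 1, k = 30, T = 0.3 are taken from the analysis parameters shown on this page, and x_max and y_max are assumed to be the maxima over the SNPs passed in.

```python
import math


def weighted_theta(x, y, beta_x, beta_y, p=1.0, k=30.0, T=0.3):
    """Compute theta from -log(p-value) profiles x, y and effect signs.

    x, y           : -log(p-value) per SNP for the primary / secondary trait
    beta_x, beta_y : effect estimates per SNP (only their signs are used)
    """
    n = len(x)
    if n == 0 or not (len(y) == len(beta_x) == len(beta_y) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    x_max, y_max = max(x), max(y)  # assumed: maxima over the analysed SNPs
    w = [max(xi / x_max, yi / y_max) ** p for xi, yi in zip(x, y)]
    w_sum = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    sd_x = math.sqrt(sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x)) / w_sum)
    sd_y = math.sqrt(sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y)) / w_sum)
    # Weighted products of standardized values, one term per SNP.
    terms = [
        wi * ((xi - x_mean) / sd_x) * ((yi - y_mean) / sd_y) / w_sum
        for wi, xi, yi in zip(w, x, y)
    ]
    r_w = sum(terms)
    # Same sum, with each term signed by the product of the two effects.
    r_ws = sum(
        t * ((bx * by > 0) - (bx * by < 0))
        for t, bx, by in zip(terms, beta_x, beta_y)
    )
    return r_ws / (1.0 + math.exp(-k * (r_w - T)))


x = [2.0, 5.0, 9.0, 14.0, 6.0, 3.0]
y = [2.0 * v for v in x]  # identical profile shape
same = weighted_theta(x, y, [0.1] * 6, [0.2] * 6)
opposite = weighted_theta(x, y, [0.1] * 6, [-0.2] * 6)
print(same > 0.99, opposite < -0.99)
```

The logistic factor in the denominator pushes θ towards zero whenever the unsigned weighted correlation r_w falls below T, so only profiles that are similar in shape can yield |θ| close to 1. The sketch raises a division error if all x or all y values are equal, since the weighted standard deviation is then zero.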

The system calculates θ using SNPs that are located within 250 kbp of the index SNP and that, in both compared GWAS/RWAS, are common (i.e. $\textit{MAF} > 3\%$ by default), have $\textit{p-value} < 0.05$ , and satisfy $N_{SNP} > 0.9 * N_{study}$ . You can change the MAF threshold with the slider above the z-z plot and get new results immediately.
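The SNP inclusion rules can be sketched as a filter. The `SnpStat` record and the function names are hypothetical. The thresholds (250 kbp window, MAF 0.03, p-value 0.05, N factor 0.9) come from this page. Requiring every criterion in both studies is one reading of the sentence above and is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SnpStat:
    """Per-study summary statistics for one SNP (hypothetical record layout)."""
    bp: int
    maf: float
    p_value: float
    n: int


def passes(snp, index_bp, n_study, maf_min=0.03, p_max=0.05,
           n_factor=0.9, window_bp=250_000):
    """Check one study's record against the inclusion thresholds."""
    return (
        abs(snp.bp - index_bp) <= window_bp
        and snp.maf > maf_min
        and snp.p_value < p_max
        and snp.n > n_factor * n_study
    )


def keep_for_theta(primary, secondary, index_bp, n_primary, n_secondary):
    """Keep an SNP only if it passes in both studies (assumed reading)."""
    return (passes(primary, index_bp, n_primary)
            and passes(secondary, index_bp, n_secondary))


a = SnpStat(bp=38349400, maf=0.28, p_value=1e-8, n=5000)
b = SnpStat(bp=38349400, maf=0.28, p_value=0.20, n=2700)
print(keep_for_theta(a, a, 38349400, 5311, 5311))
print(keep_for_theta(a, b, 38349400, 5311, 2765))
```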

Output

Based on colocalization analysis results you can determine whether two different traits may be under the control of the same functional variant(s) in the locus (in the manuscript by Momozawa et al., the threshold $|\theta| > 0.7$ was suggested) or rather by different functional variants in linkage disequilibrium ( $|\theta| \leq 0.7$ ). The web-service provides interactive graphics for visual comparison of a region. For example, if $|\theta| > 0.7$ you should see a clear linear relation between the z-statistics of the primary and secondary GWAS/RWAS. In addition, you can check which SNPs entered the analysis and which were omitted and why. Also genes (NCBI genome build GRCh37.p13) located in the selected region are shown on the graph.
