General details
PheLiGe is a web-service that provides access to publicly available results from human genetic association
studies. By serving information and tools for investigating (regional) genotype-phenotype associations
across the phenome, this service aims to give researchers insight into the biological function affected
by the variation in question, to help formulate aetiological hypotheses, and to inform functional studies.
The web-service allows exploration of genome-wide and regional associations, finding phenotypes associated
with a genetic variant, and comparison of association patterns between different traits
to ascertain whether a co-association is due to pleiotropy or linkage.
You can access the database via a web-interface with three tabs: Analysis, GWAS/cis-QTL Descriptors, Associations.
In the Associations tab you can search for phenotypic associations observed for an SNP of interest,
directly or via a proxy variant in LD. The search results will be presented as a table with several
pages and sorted by association p-value. On this tab you can also select two regions for subsequent colocalisation analysis.
In the Analysis tab, you can compare regional patterns of association using the θ metric
and hypothesise whether the overlapping signals are due to pleiotropy or linkage disequilibrium.
In the GWAS/cis-QTL Descriptors tab, you can access association study meta-data, search for
specific association studies, and explore an interactive Manhattan plot of a trait of interest.
For the convenience of a new user, we designed an interactive tour that demonstrates
basic usage of PheLiGe. The tour is available via the “Start tour” button in the upper right corner.
Version: 0.0.3
Number of GWAS Descriptors: 8554
Number of RWAS Descriptors: 1348967
Number of Associations: 93651510164
Citing PheLiGe
To cite PheLiGe in scientific communications, please state the full database name and URL
(e.g. PheLiGe at https://phelige.com)
along with the following publication reference:
Shashkova TI, Pakhomov ED, Gorev DD, Karssen LC, Joshi PK, Aulchenko YS.
PheLiGe: an interactive database of billions of human genotype-phenotype associations.
Nucleic Acids Res. 2021 Jan 8;49(D1):D1347-D1350.
doi: 10.1093/nar/gkaa1086.
PMID: 33245779; PMCID: PMC7779071.
In case you have used extended data analysis functionality as implemented in GWAS-MAP, please also cite:
Shashkova TI, Gorev DD, Pakhomov ED, Shadrina AS, Sharapov SZ, Tsepilov YA, et al.
The GWAS-MAP platform for aggregation of results of genome-wide association studies and
the GWAS-MAP|homo database of 70 billion genetic associations of human traits.
Vavilov J Genet Breed [Internet]. 2020 Dec 31;24(8):876–84.
doi: 10.18699/VJ20.686.
Cookie policy
What are cookies
Cookies are simple text files that are stored on your computer or mobile device by a website's server.
Each cookie is unique to your web browser. It contains some anonymous information, such as
a unique identifier, the website’s domain name, and some digits and numbers.
The cookies we set
Third party cookies
We do not set any third party cookies.
Deleting or disabling cookies
If you want to delete, restrict or block the cookies that are set by our website, you can do so
through your browser settings. Please consult your browser's help page to find out how you can do it.
Contacting us
If you have any questions about this cookie policy or our use of cookies,
please contact us via phelige@polyknomics.com
GWAS data
We collected summary statistics of genome- (GWAS) and region-wide association studies (RWAS) from open sources.
For each summary statistics file, we created an annotation
that contains information about the study design and its key characteristics (sample size,
details of the association analysis model, study population, license and terms of use, etc.).
Since the data were generated in different laboratories using different protocols, the resulting summary statistics
files have different formats. To solve this problem, we developed an integration
module that transforms data into a universal format. To ensure consistency of data within the database,
our import procedure compares information
about the SNP identification number, its position in the genome, and alleles to the reference.
If any of the characteristics do not match, the SNP is not imported.
The present implementation uses the reference that consists of 503 genomes of Europeans from the
"1000 genomes" project (1000G phase 3 version 5).
Next, we harmonized the data, so that the same effect and reference alleles are used in all GWASs.
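The harmonisation step can be illustrated with a minimal Python sketch. It is not the PheLiGe import module: the record layout (keys `ea`, `ra`, `beta`, `eaf`) and the function name are assumptions made for this example, and strand flips are not handled. When the study's effect and reference alleles are swapped relative to the reference, the effect sign and allele frequency are flipped. When the alleles do not match at all, the SNP is dropped, as described above.

```python
from typing import Optional


def harmonise(record: dict, ref_ea: str, ref_ra: str) -> Optional[dict]:
    """Align one summary-statistics record to the reference allele coding.

    `record` is a hypothetical dict with keys "ea", "ra", "beta" and "eaf".
    Returns a new dict, or None when the alleles do not match the reference.
    """
    ea, ra = record["ea"].upper(), record["ra"].upper()
    if (ea, ra) == (ref_ea, ref_ra):
        # Already in the reference coding: only normalise the allele case.
        return dict(record, ea=ea, ra=ra)
    if (ea, ra) == (ref_ra, ref_ea):
        # Alleles are swapped: flip the effect sign and the allele frequency.
        return dict(record, ea=ref_ea, ra=ref_ra,
                    beta=-record["beta"], eaf=1.0 - record["eaf"])
    # Alleles do not match the reference: the SNP is not imported.
    return None
```

For example, a record reported for effect allele A with beta 0.2 becomes a record for effect allele G with beta -0.2 when the reference uses G as the effect allele.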
If a summary statistics file did not directly contain all columns that are required for conversion to the universal format,
in certain cases, a GWAS could still be imported into the database. For example, missing allele
frequency could be replaced with that from the reference; missing standard error
could be computed based on the effect size and a p-value.
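As an illustration of the last point, a standard error can be recovered from the effect size and a two-sided p-value. The sketch below assumes the p-value comes from a normal (Wald) test statistic z = beta / SE. The function name is hypothetical and this is not necessarily how the import module does it.

```python
from statistics import NormalDist


def se_from_beta_and_p(beta: float, p_value: float) -> float:
    """Recover the standard error from an effect size and a two-sided p-value.

    Assumes a normal test statistic z = beta / se.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    if beta == 0.0:
        raise ValueError("SE cannot be recovered when beta is exactly 0")
    # For a two-sided test, the lower-tail quantile at p/2 equals -|z|.
    z_abs = -NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta) / z_abs
```

Working with the lower tail (p/2) rather than 1 - p/2 avoids rounding to 1 for very small p-values.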
Next, we perform quality control (QC) for each study. In particular, QC includes a
comparison of the allele frequencies from the study with those from the
reference sample, a comparison of the reported p-values with p-values computed from
the reported effect size and its standard error, and an analysis of the distribution of
the estimated allele effects. SNPs are marked as outliers if the reported allele
frequency deviates from the reference panel allele frequency by more than 0.2 (AF outlier),
or if the reported and computed association -log10(p-value) differ
by more than 2% for p-values less than 10⁻¹⁰, or by more than
0.5 in absolute value for p-values of 10⁻¹⁰ or greater (PZ outlier).
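The two outlier rules can be sketched in Python as follows. The thresholds are taken from the text; everything else is an assumption of this example: the function names, the use of a two-sided normal test, the choice of the reported p-value to pick the rule, and the interpretation of the 2% tolerance as relative to the reported -log10(p-value).

```python
import math

AF_MAX_DIFF = 0.2   # AF outlier threshold (from the text)
PZ_REL_TOL = 0.02   # 2% tolerance, applied when the reported p < 1e-10
PZ_ABS_TOL = 0.5    # absolute tolerance on -log10(p) otherwise
SMALL_P = 1e-10


def is_af_outlier(study_af: float, ref_af: float) -> bool:
    """True if the study allele frequency deviates from the reference by > 0.2."""
    return abs(study_af - ref_af) > AF_MAX_DIFF


def neg_log10_p(beta: float, se: float) -> float:
    """-log10 of the two-sided normal p-value for z = beta / se."""
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    if p == 0.0:
        # Double precision underflows for |z| above roughly 38; a log-scale
        # survival function would be needed beyond that point.
        raise OverflowError("p-value underflows double precision")
    return -math.log10(p)


def is_pz_outlier(reported_p: float, beta: float, se: float) -> bool:
    """True if the reported and computed -log10(p) disagree beyond tolerance."""
    reported = -math.log10(reported_p)
    diff = abs(reported - neg_log10_p(beta, se))
    if reported_p < SMALL_P:
        # Assumption: the 2% is measured relative to the reported value.
        return diff > PZ_REL_TOL * reported
    return diff > PZ_ABS_TOL
```

For instance, with beta = 0.1 and SE = 0.05 (z = 2), a reported p-value of 0.001 would be flagged, because its -log10 value of 3 differs from the computed value by more than 0.5.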
Associations tab
The Associations tab allows searching for genotype-phenotype associations
directly by a specific SNP, specified by an rsID or chr:position, or by proxies of
this SNP, across the database of results of genome-wide (regional) association scans (GWAS/RWAS).
Proxies are defined as SNPs in linkage disequilibrium (LD) with the queried SNP at r² above the specified threshold.
The LD statistics were estimated for SNPs with MAF > 1% using haplotypes from the EUR samples of 1000 Genomes phase 1 version 3
within a 1 Mbp window. For each SNP we kept LD statistics for up to 1000 proxy SNPs with r² > 0.5.
Among all the associations that satisfy the specified filters, we display only one
per GWAS/RWAS: either the one with the queried SNP, or, when it is absent, the one
with the SNP having the strongest LD (largest r²) with it.
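The "one association per GWAS/RWAS" rule can be sketched as below. This is an illustration only: the dictionary keys (`study`, `rsid`, `r2`) and the function name are hypothetical, and the input is assumed to contain only associations that already passed the filters.

```python
def best_per_study(associations, query_rsid):
    """Keep one association per study: the queried SNP if present,
    otherwise the proxy with the largest r2 to the queried SNP.

    `associations` is an iterable of dicts with hypothetical keys
    "study", "rsid" and "r2" (LD between that SNP and the query).
    """
    best = {}
    for assoc in associations:
        # Tuples compare element-wise, so the queried SNP (True) always
        # beats any proxy (False); proxies are then ranked by r2.
        key = (assoc["rsid"] == query_rsid, assoc["r2"])
        current = best.get(assoc["study"])
        if current is None or key > current[0]:
            best[assoc["study"]] = (key, assoc)
    return {study: assoc for study, (_, assoc) in best.items()}
```

A study that contains the queried SNP is represented by that SNP even if a proxy has a smaller p-value; a study without it is represented by its best proxy.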
You can press the 'Add Filter' button to add SNP filters by MAF,
number of genotyped people (N), imputation quality, and outlier status. Moreover, if you
select specific traits on the GWAS/cis-QTL Descriptors tab (see the description of this tab for more details),
a new button 'Show selected' will appear. Click on it to check the list of selected traits.
When you then search by SNP, results are shown only for the selected traits.
Output
The results table is sorted by p-value and can be filtered by entering a p-value
cut-off in the appropriate box. You can choose which columns to
show on this tab by clicking on the icon in the top right corner. You can download the results in
CSV format. Click the 'Open' button in the 'SNP plot' column of
this table to access the regional plot. The pop-up window shows a
regional association plot, a recombination map, and a gene track. In the
regional association plot each dot represents an SNP. You can filter SNPs by MAF
using the slider on the right. If you move the cursor over a dot, you will
see a tooltip with the SNP information (chromosome, position, alleles, p-value, and others).
Clicking on the rsID in the tooltip redirects you to the NCBI SNP
database, while clicking on the magnifying glass next to the rsID
redirects you to the Associations tab, where the database is queried for this rsID.
Next, using the button to the left of a trait name, you can select a "primary" and a "secondary" trait,
after which the pair is passed to the Analysis tab for colocalization analysis
(see the description of the Analysis tab below).
Output column description
- Trait name
The full trait name of the scan in our database
- SNP Plot
Interactive regional plot centred on the target SNP, with a window of ±250 kbp
- Population
Population used in the study, at a broad level such as European, Asian, or Mixed
- Collection
The collection name, which groups scans from one study (e.g. UKB_Nealelab) or on one topic (e.g. CVD)
- rsID
The reference SNP ID for the GRCh37 build
- Chr, BP
The chromosome and position of the target SNP in the GRCh37 build
- EA, RA
The effect and the reference allele
- EAF
The effect allele frequency, taken from the study or from the reference; the source is given in the description of the corresponding trait
- Beta, SE, Z, P-value
Characteristics of the SNP effect
- N
The number of genotyped people for the corresponding SNP (if present in the data), or the overall number for all SNPs taken from the study description
- Info
Imputation quality of the target SNP (if present in the data)
- PZ Outlier
Marked if the SNP is a PZ outlier (it will be excluded from analysis)
- AF Outlier
Marked if the SNP is an AF outlier
- R
The r value between the proxy and the target SNP, estimated for the effect alleles based on haplotypes from 1000 Genomes
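For reference, the signed LD correlation r between two biallelic SNPs can be computed from phased haplotypes with the textbook formula r = (p_AB - p_A p_B) / sqrt(p_A (1 - p_A) p_B (1 - p_B)). The sketch below is a generic illustration, not the PheLiGe pipeline; the 0/1 encoding (1 meaning the haplotype carries the effect allele) and the function name are assumptions of this example.

```python
import math


def ld_r(hap_a, hap_b):
    """Signed LD correlation r between two SNPs from phased haplotypes.

    hap_a, hap_b: equal-length sequences of 0/1, where 1 means the
    haplotype carries the effect allele of that SNP (assumed encoding).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying both effect alleles.
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = math.sqrt(p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    if denom == 0.0:
        raise ValueError("r is undefined for a monomorphic SNP")
    return (p_ab - p_a * p_b) / denom
```

The sign of r depends on which allele is coded as 1, which is why the value is reported with respect to the effect alleles; r² is simply the square of this value.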
GWAS/cis-QTL Descriptors tab
This tab provides a database of GWAS/RWAS scan metadata collected from articles, study web-sites, or
other sources of data descriptions. You can use simple search to find a GWAS/RWAS by
trait name, trait abbreviation, or author's name. You can use advanced search to
apply the 'collection' filter and/or add further filters using the 'Add filter' button.
Each study descriptor contains a field that provides possible synonyms and related ontology terms
for the trait. For complex traits and diseases, the trait names are matched with terms
from the Experimental Factor Ontology (https://www.ebi.ac.uk/ols/ontologies/efo) as well as with ICD-10 (International Classification of Diseases, revision 10) notations and codes.
We also use specific nomenclatures for some other domains. For example, for all eQTL studies
transcript names were mapped to HUGO Gene Nomenclature Committee (HGNC) gene names,
and these gene names are part of the "trait name".
For studies of levels of N-glycosylation we use the standard Oxford notation name as part of the "trait name".
You can test the features described above by searching, for example, for the EFO term "EFO:0003819",
the ICD10 term "ICD10 K02", the HGNC gene name "FUT8",
or the core-fucosylated galactosylated N-glycan "FA2G2".
Output
The results table can be customised by clicking on the gear icon located in the top
right corner, where you can select which columns are visible. The results table
can be downloaded in CSV format. You can explore the Manhattan plot of a trait
of interest by clicking on the 'Open' button in the 'Plot' column; a pop-up
window with the plot will appear. You can select a chromosome and then
navigate through the genome using the controls on the left. The
SNPs can be filtered by MAF using the slider on the right. At the highest level of
resolution, each SNP is represented by a dot. This view is identical to the
regional plot view described above (see the Associations tab description).
To restrict a search to traits of interest, check the box to the left of each trait name,
then go to the Associations tab and search for SNPs within this pool of traits.
Output column description
- Trait abbreviation, Trait name
Short and full name of the scan in the database
- Plot
Interactive Manhattan plot
- Population
Population used in the study, at a broad level such as European, Asian, or Mixed
- Collection
The collection name, which groups scans from one study (e.g. UKB_Nealelab) or on one topic (e.g. CVD)
- Study Year
Year of the study publication
- Authors
First author or consortium/laboratory name
- Reference PMID
PubMed ID of the corresponding article
- Reference DOI
Link to the original article (if available)
- Data DOI
Link to the original summary statistics
- Trait type
The trait type: binary, quantitative, or categorical
- Tissue
Tissue of sample collection
- Domain
Domain of the trait, such as complex trait, protein, metabolite, etc.
- N Cases
Number of people that were used in the study as cases (for binary traits)
- N Controls
Number of people that were used in the study as controls (for binary traits)
- N People
Total number of people in the study
- Genomic build
Genomic build of original data in the following format: hgN/GRChN
- Association Metric
Type of the metric used to estimate SNP effect: beta (linear regression coefficient), logOR (logistic regression coefficient) etc.
- Frequency Source
Source of allele frequencies: study or current reference used during the unification step
Analysis tab
For analysis of colocalization of association signals from different traits, we implemented a slightly
modified version of the θ metric defined by Momozawa et al. (https://doi.org/10.1038/s41467-018-04365-8). In short, this method compares the "profiles" of associations of two traits in a region
to distinguish pleiotropy from linkage disequilibrium. The traits of
interest should be chosen in the Associations tab.
It is expected that under pleiotropy (i.e. if the same causal genetic variant is responsible for
the association of both traits with the region) the similarity between association patterns will be high.
In contrast, two distinct variants in linkage disequilibrium (LD), unless this LD is very high,
are expected to generate different patterns of association.
The θ metric is, in essence, a weighted correlation, and high similarity is reflected by values
close to 1 or -1. If an allele increasing the value of one trait also increases the value of the second trait,
the sign of θ is positive. If the allele increasing the value of one trait decreases the value of the other,
the sign of θ is negative. If there is little similarity between the two association patterns,
the value of θ is close to zero.
In more detail, θ is a weighted correlation based on the p-values and the signs of the effects of the two traits.
It is computed from per-SNP values x and y for the two traits, using the weighted means of x and y, the weighted variances of x and y, and the weight of the i-th element.
Parameters k, p, T are global and are shown in the top right corner of the Analysis tab.
The system calculates θ using SNPs that are located within 250 kbp of the index SNP, are common in both compared
GWAS/RWAS (according to the default MAF threshold), and meet the additional inclusion criteria. You can change the MAF threshold using the slider above the z-z plot and get new results immediately.
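The general form of a weighted correlation can be sketched in Python as follows. This is a generic weighted Pearson correlation, not the exact θ implementation: how the per-SNP values x and y are derived from p-values and effect signs, and how the weights depend on the parameters k, p, and T, is not specified in this text. The function therefore takes x, y, and the weights w as given inputs, and its name is hypothetical.

```python
import math


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    if not (len(x) == len(y) == len(w)) or len(x) == 0:
        raise ValueError("x, y and w must be non-empty and equal in length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of x and y.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted variances and covariance.
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variance is zero")
    return cov / math.sqrt(var_x * var_y)
```

Values near 1 or -1 indicate closely matching association profiles with concordant or opposite effect directions, respectively; SNPs with zero weight do not influence the result.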
Output
Based on the colocalization analysis results, you can assess whether two different traits may be under the control
of the same functional variant(s) in the locus (Momozawa et al. suggested a threshold on θ for this) or rather of different functional variants in linkage disequilibrium. The web-service provides interactive graphics for visual comparison of a region. For example, if θ is close to 1 or -1, you should see a clear linear relation between the z-statistics of the primary and secondary GWAS/RWAS.
In addition, you can check which SNPs entered the analysis and which were omitted and why. Genes
(NCBI genome build GRCh37.p13) located in the selected region are also shown on the graph.