PheLiGe is a web service that provides access to publicly available results from human genetic association studies. By serving information and tools for investigating (regional) genotype-phenotype associations across the phenome, this service aims to give a researcher insight into the biological function affected by the variation in question, to help formulate aetiologic hypotheses, and to inform functional studies. The web service allows exploration of genome-wide and regional associations, finding phenotypes associated with a genetic variant, and comparison of association patterns between different traits to ascertain whether a co-association is due to pleiotropy or linkage.
You can access the database via a web interface with three tabs: Analysis, GWAS/cis-QTL Descriptors, and Associations. In the Associations tab you can search for phenotypic associations observed for an SNP of interest, directly or via a proxy variant in LD. The search results are presented as a paginated table sorted by association p-value. On this tab you can also select two regions for subsequent colocalisation analysis. In the Analysis tab, regional patterns of association are compared using the θ metric, which lets you hypothesise on whether the overlapping signals are due to pleiotropy or linkage disequilibrium. In the GWAS/cis-QTL Descriptors tab, you can access association study metadata, search for specific association studies, and investigate an interactive Manhattan plot of a trait of interest.
For the convenience of a new user, we designed an interactive tour that demonstrates basic usage of PheLiGe. The tour is available via the “Start tour” button in the upper right corner.
Version | 0.0.3 |
---|---|
Number of GWAS Descriptors | 8554 |
Number of RWAS Descriptors | 1348967 |
Number of Associations | 93651304520 |
To cite PheLiGe in scientific communications, please state the full database name and URL (e.g. PheLiGe at https://phelige.com) along with the following publication reference:
Shashkova TI, Pakhomov ED, Gorev DD, Karssen LC, Joshi PK, Aulchenko YS. PheLiGe: an interactive database of billions of human genotype-phenotype associations. Nucleic Acids Res. 2021 Jan 8;49(D1):D1347-D1350. doi: 10.1093/nar/gkaa1086. PMID: 33245779; PMCID: PMC7779071.
In case you have used extended data analysis functionality as implemented in GWAS-MAP, please also cite:
Shashkova TI, Gorev DD, Pakhomov ED, Shadrina AS, Sharapov SZ, Tsepilov YA, et al. The GWAS-MAP platform for aggregation of results of genome-wide association studies and the GWAS-MAP|homo database of 70 billion genetic associations of human traits. Vavilov J Genet Breed [Internet]. 2020 Dec 31;24(8):876–84. doi: 10.18699/VJ20.686.
Please direct all inquiries regarding the service, including signing up, to phelige@polyknomics.com.
What are cookies
Cookies are simple text files that are stored on your computer or mobile device by a website's server. Each cookie is unique to your web browser. It will contain some anonymous information such as a unique identifier, website’s domain name, and some digits and numbers.
The cookies we set
Login related cookies
We use cookies when you are logged in so that we can remember this fact. This prevents you from having to log in every single time you visit a new page. These cookies are typically removed or cleared when you log out to ensure that you can only access restricted features and areas when logged in.
Third party cookies
We do not set any third party cookies.
Deleting or disabling cookies
If you want to delete, restrict or block the cookies that are set by our website, you can do so through your browser settings. Please consult your browser's help page to find out how you can do it.
Contacting us
If you have any questions about this cookie policy or our use of cookies, please contact us via phelige@polyknomics.com.
We collected summary statistics of genome-wide (GWAS) and region-wide association studies (RWAS) from open sources. For each summary statistics file, we created an annotation that contains information about the study design and its key characteristics (sample size, details of the association analysis model, study population, license and terms of use, etc.). Since the data were generated in different laboratories using different protocols, the resulting summary statistics files have different formats. To solve this problem, we developed an integration module that transforms the data into a universal format. To ensure consistency of data within the database, our import procedure compares the SNP identification number, its position in the genome, and its alleles to the reference. If any of these characteristics do not match, the SNP is not imported. The present implementation uses a reference that consists of 503 genomes of Europeans from the "1000 genomes" project (1000G phase 3 version 5). Next, we harmonized the data so that the same effect and reference alleles are used in all GWAS. If a summary statistics file did not directly contain all columns required for conversion to the universal format, in certain cases the GWAS could still be imported into the database. For example, a missing allele frequency could be replaced with that from the reference, and a missing standard error could be computed from the effect size and the p-value.
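The two recovery steps mentioned above (aligning effects to the reference alleles and computing a missing standard error) can be sketched as follows. This is a minimal illustration, not the PheLiGe import module: the function names are hypothetical, the p-value is assumed to be two-sided and based on a normal test statistic, and strand flips are not handled.

```python
from statistics import NormalDist


def se_from_beta_and_p(beta: float, p_value: float) -> float:
    """Recover a standard error from an effect size and a two-sided p-value.

    Assumes the p-value comes from a normal (z) test of beta / se.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    # |z| is minus the lower p/2 quantile of the standard normal distribution.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta) / z


def harmonize_effect(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Align an effect to the reference allele coding.

    Returns beta unchanged if the alleles match the reference, the negated
    beta if effect and other alleles are swapped, and None if the alleles do
    not match at all (such an SNP would not be imported).
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta
    return None
```

For instance, `harmonize_effect(0.2, "A", "G", "G", "A")` flips the sign of the effect because the study reports the opposite allele as the effect allele.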
Next, we perform quality control (QC) for each study. In particular, QC includes a comparison of the allele frequencies from the study with those from the reference sample, a comparison of the reported p-values with p-values computed from the reported effect size and its standard error, and an analysis of the distribution of the estimated allele effects. SNPs are marked as outliers if the reported allele frequency deviates from the reference panel allele frequency by more than 0.2 (AF outlier), or if the reported and computed association -log10(p-value) differ by more than 2% for p-values below 10⁻¹⁰ and by more than 0.5 in absolute value for p-values above 10⁻¹⁰ (PZ outlier).
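The two outlier rules can be written down as a short sketch. The thresholds are taken from the text; everything else is an assumption made for illustration: the function names are hypothetical, the computed p-value is a two-sided normal p-value, the 2% difference is measured relative to the reported -log10(p-value), and the reported p-value is assumed to be greater than zero.

```python
import math

AF_MAX_DIFF = 0.2    # allowed allele frequency deviation from the reference
P_BOUNDARY = 1e-10   # boundary between the relative and the absolute rule
REL_TOL = 0.02       # 2% relative difference for p-values below the boundary
ABS_TOL = 0.5        # absolute difference in -log10(p) for other p-values


def neg_log10_p_from_z(beta: float, se: float) -> float:
    """-log10 of the two-sided normal p-value computed from beta and se."""
    z = abs(beta / se)
    p = math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))
    return -math.log10(p) if p > 0.0 else math.inf


def is_af_outlier(study_af: float, ref_af: float) -> bool:
    """AF outlier: study and reference allele frequencies differ by > 0.2."""
    return abs(study_af - ref_af) > AF_MAX_DIFF


def is_pz_outlier(reported_p: float, beta: float, se: float) -> bool:
    """PZ outlier: reported and computed -log10(p) disagree too strongly."""
    reported = -math.log10(reported_p)
    diff = abs(reported - neg_log10_p_from_z(beta, se))
    if reported_p < P_BOUNDARY:
        return diff > REL_TOL * reported
    return diff > ABS_TOL
```

Using `math.erfc` rather than `1 - cdf` keeps the computed p-value accurate for the very small p-values typical of strong association signals.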
The Associations tab allows searching the database of genome-wide (regional) association scan results (GWAS/RWAS) for genotype-phenotype associations, either directly by a specific SNP, specified by an rsID or chr:position, or through proxies of this SNP. Proxies are defined as SNPs whose linkage disequilibrium (LD) with the queried SNP is at or above the specified r2 threshold. The LD statistics were estimated for SNPs with MAF > 1% using haplotypes from the EUR 1000 Genomes phase 1 version 3 samples within a 1 Mbp window. For each SNP we kept LD statistics for up to 1000 proxy SNPs with r2 > 0.5.
Among all the associations that satisfy the specified filters, we display only one per GWAS/RWAS - either the one with the queried SNP, or, when it is absent, the one with an SNP having the strongest LD (largest r2) with it.
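The selection rule can be illustrated with a small sketch. It is not the service's actual query code: the record layout (`study`, `snp`, `p` keys), the function name, and the default threshold of 0.8 are hypothetical, and proxies are assumed to qualify when their r2 with the queried SNP is at or above the threshold.

```python
def one_hit_per_study(associations, query_snp, ld_r2, r2_threshold=0.8):
    """Keep one association per study and sort the result by p-value.

    associations: iterable of dicts with 'study', 'snp' and 'p' keys.
    ld_r2: dict mapping proxy rsID -> r2 with the queried SNP.
    """
    best = {}
    for assoc in associations:
        is_query = assoc["snp"] == query_snp
        r2 = 1.0 if is_query else ld_r2.get(assoc["snp"], 0.0)
        if r2 < r2_threshold:
            continue  # not the queried SNP and not a close enough proxy
        # The queried SNP always wins; otherwise the proxy with the largest r2.
        rank = (is_query, r2)
        if assoc["study"] not in best or rank > best[assoc["study"]][0]:
            best[assoc["study"]] = (rank, assoc)
    hits = [assoc for _, assoc in best.values()]
    return sorted(hits, key=lambda a: a["p"])
```

Ranking by the tuple `(is_query, r2)` means a study that contains the queried SNP never reports a proxy instead, regardless of which of the two has the smaller p-value.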
You can press the 'Add Filter' button to choose additional SNP filters by MAF, number of genotyped people (N), imputation quality, and outlier status. Moreover, if you select specific traits on the GWAS/cis-QTL Descriptors tab (see the description of this tab for more details), a new button 'Show selected' will appear. Click on it to check the list of selected traits. When you then search by SNP, results will be shown only for the selected traits.
The table with the results is sorted by p-value and can be filtered by a p-value cut-off entered in the appropriate box. You can choose which columns are shown on this tab by clicking on the icon in the top right corner. You can download the results in CSV format. You can click on the 'Open' button in the 'SNP plot' column of this table to access the regional plot. In the pop-up window you will see a regional association plot, a recombination map, and a gene track. In the regional association plot each dot represents an SNP. You can filter SNPs by MAF using the slider on the right. If you move the cursor over a dot, you will see a tooltip with the SNP information (chromosome, position, alleles, p-value, and others). Clicking on the rsID in the tooltip redirects you to the NCBI SNP database, while clicking on the magnifying glass next to the rsID redirects you to the Associations tab and queries the database for this rsID.
Next, using the button to the left of a trait name, you can select a "primary" and a "secondary" trait, after which the selected pair is passed to the Analysis tab for colocalization analysis (see the description of the Analysis tab below).
This tab presents a database of GWAS/RWAS metadata collected from articles, study websites, or other sources of data descriptions. You can use the simple search to find a GWAS/RWAS by trait name, trait abbreviation, or author's name. You can use the advanced search to apply the 'collection' filter and/or add further filters using the 'Add filter' button.
Each study descriptor contains a field that provides possible synonyms and related ontology terms for the trait. For complex traits and diseases the trait names are matched with terms from the Experimental Factor Ontology (https://www.ebi.ac.uk/ols/ontologies/efo) as well as with ICD-10 (International Classification of Diseases, revision 10) notations and codes. We also use specific nomenclatures for some other domains. For example, for all eQTL studies transcript names were mapped to HUGO Gene Nomenclature Committee (HGNC) gene names, and these gene names are part of the "trait name". For studies of N-glycosylation levels we use the standard Oxford notation name as part of the "trait name".
You can test the features described above by searching, for example, for the EFO term "EFO:0003819", the ICD10 term "ICD10 K02", the HGNC gene name "FUT8", or the core-fucosylated galactosylated N-glycan "FA2G2".
The results table can be configured by clicking on the gear icon in the top right corner, where you can select the columns that will be visible. The results table can be downloaded in CSV format. You can investigate the Manhattan plot of a trait of interest by clicking on the 'Open' button in the 'Plot' column, after which a pop-up window with the plot will appear. You can select a chromosome and then navigate through the genome using the controls on the left. The SNPs can be filtered by MAF using the slider on the right. At the highest level of resolution, each SNP is represented by a dot. This view is identical to the regional plot view described above (see the Associations tab description). To restrict a search to particular traits, check the box to the left of each trait name, then go to the Associations tab and search SNPs in this pool of traits.
For analysis of colocalization of association signals from different traits, we implemented a slightly modified version of the θ metric defined by Momozawa et al. (https://doi.org/10.1038/s41467-018-04365-8). In short, this method compares the association "profiles" of two traits in a region to distinguish pleiotropy from linkage disequilibrium. The traits of interest should be chosen in the Associations tab.
It is expected that under pleiotropy (i.e. if the same causal genetic variant is responsible for the association of both traits with the region) the similarity between association patterns would be high. In contrast, two distinct variants in linkage disequilibrium (LD), unless this LD is very high, are expected to generate different patterns of association.
The θ metric is, in essence, a weighted correlation, and high similarity is reflected by its values close to 1 or -1. If an allele increasing the value of one trait also increases the value of the second trait, the sign of θ is positive. If the allele increasing the value of one trait decreases the value of the other, the sign of θ is negative. In case there is little similarity between the two association patterns, the value of θ is close to zero.
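In more detail, θ is a weighted correlation based on p-values ($p_x$, $p_y$) and signs of effects ($\beta_x$, $\beta_y$):
$$\theta = \frac{\sum_i w_i\,(x_i - \mu_x)(y_i - \mu_y)}{\sigma_x\,\sigma_y \sum_i w_i},$$
where $x_i = -\log_{10}(p_{x,i})\,\mathrm{sign}(\beta_{x,i})$ and $y_i = -\log_{10}(p_{y,i})\,\mathrm{sign}(\beta_{y,i})$, in which $\mu_x$, $\mu_y$ are the weighted means of the x and y values, $\sigma_x^2$, $\sigma_y^2$ are the weighted variances of x and y, and $w_i$ is the weight of the i-th element.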
Parameters k, p, T are global and are reflected in the Analysis tab's top right corner.
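As a rough illustration of the weighted-correlation idea behind θ, the sketch below computes a weighted Pearson correlation of signed -log10(p-values). It is a generic textbook calculation under stated assumptions, not the PheLiGe implementation: the function names are hypothetical, the weights are taken as an input, and the way the parameters k, p and T enter the service's version of the metric is not modelled here.

```python
import math


def signed_log_p(p_value: float, beta: float) -> float:
    """-log10(p) carrying the sign of the effect estimate."""
    return math.copysign(-math.log10(p_value), beta)


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mu_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mu_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mu_x) * (yi - mu_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mu_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mu_y) ** 2 for wi, yi in zip(w, y)) / total
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined when x or y has zero variance")
    return cov / math.sqrt(var_x * var_y)
```

With equal weights this reduces to the ordinary Pearson correlation; identical association profiles give a value of 1 and mirrored profiles give -1, which matches the interpretation of the sign of θ given above.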
The system calculates θ using SNPs that are located within 250 kbp of the index SNP, are present in both compared GWAS/RWAS, are common (above the MAF threshold set by default), and pass the remaining inclusion filters. You can change the MAF threshold using the slider above the z-z plot and get new results immediately.
Based on the colocalization analysis results, you can determine whether two different traits may be under the control of the same functional variant(s) in the locus (a threshold on θ for this was suggested in the manuscript by Momozawa et al.) or rather by different functional variants in linkage disequilibrium. The web service provides interactive graphics for visual comparison of a region. For example, when θ is close to 1 or -1, you should see a clear linear relation between the z-statistics of the primary and secondary GWAS/RWAS. In addition, you can check which SNPs entered the analysis and which were omitted and why. Genes (NCBI genome build GRCh37.p13) located in the selected region are also shown on the graph.
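The SNP selection step can be sketched as a simple filter. The 250 kbp window comes from the text; the record layout (`pos` and `maf` keys), the function name, and the `min_maf` default of 0.05 are placeholders chosen for illustration and should not be read as PheLiGe's actual defaults.

```python
def snps_for_theta(primary, secondary, index_pos, window_bp=250_000, min_maf=0.05):
    """Select SNPs shared by two studies for a regional comparison.

    primary, secondary: dicts mapping rsID -> {'pos': int, 'maf': float}.
    Returns rsIDs present in both studies, within window_bp of index_pos,
    and with MAF of at least min_maf in both studies, sorted by position.
    """
    selected = []
    for rsid in primary.keys() & secondary.keys():
        a, b = primary[rsid], secondary[rsid]
        if abs(a["pos"] - index_pos) > window_bp:
            continue  # outside the window around the index SNP
        if min(a["maf"], b["maf"]) < min_maf:
            continue  # too rare in at least one of the two studies
        selected.append(rsid)
    return sorted(selected, key=lambda r: primary[r]["pos"])
```

Raising `min_maf` in this sketch plays the role of moving the MAF slider: fewer SNPs enter the comparison, and the correlation is recomputed on the remaining ones.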