General details

PheLiGe is a web service that provides access to publicly available results from human genetic association studies. By serving information and tools for investigating (regional) genotype-phenotype associations across the phenome, the service aims to give researchers insight into the biological function affected by the variation in question, to help formulate aetiological hypotheses, and to inform functional studies. The web service allows exploration of genome-wide and regional associations, finding phenotypes associated with a genetic variant, and comparison of association patterns between different traits to ascertain whether a co-association is due to pleiotropy or linkage.

You can access the database via a web interface with three tabs: Analysis, GWAS/cis-QTL Descriptors, and Associations. In the Associations tab you can search for phenotypic associations observed for an SNP of interest, directly or via a proxy variant in LD. The search results are presented as a paginated table sorted by association p-value. On this tab you can also select two traits for subsequent colocalisation analysis. In the Analysis tab, you can compare regional patterns of association using the θ metric and hypothesise on whether the overlapping signals are due to pleiotropy or linkage disequilibrium. In the GWAS/cis-QTL Descriptors tab, you can access association study metadata, search for specific association studies, and explore an interactive Manhattan plot of a trait of interest.

For the convenience of a new user, we designed an interactive tour that demonstrates basic usage of PheLiGe. The tour is available via the “Start tour” button in the upper right corner.

Citing PheLiGe

To cite PheLiGe in scientific communications, please state the full database name and URL (e.g. PheLiGe at https://phelige.com) along with the following publication reference:

Shashkova TI, Pakhomov ED, Gorev DD, Karssen LC, Joshi PK, Aulchenko YS. PheLiGe: an interactive database of billions of human genotype-phenotype associations. Nucleic Acids Res. 2021 Jan 8;49(D1):D1347-D1350. doi: 10.1093/nar/gkaa1086. PMID: 33245779; PMCID: PMC7779071.

If you have used the extended data analysis functionality implemented in GWAS-MAP, please also cite:

Shashkova TI, Gorev DD, Pakhomov ED, Shadrina AS, Sharapov SZ, Tsepilov YA, et al. The GWAS-MAP platform for aggregation of results of genome-wide association studies and the GWAS-MAP|homo database of 70 billion genetic associations of human traits. Vavilov J Genet Breed [Internet]. 2020 Dec 31;24(8):876–84. doi: 10.18699/VJ20.686.

Contacts

Please direct all inquiries regarding the service, including signing up, to phelige@polyknomics.com.

GWAS data

We collected summary statistics of genome-wide (GWAS) and region-wide (RWAS) association studies from open sources. For each summary statistics file, we created an annotation that describes the study design and its key characteristics (sample size, details of the association analysis model, study population, license and terms of use, etc.). Since the data were generated in different laboratories using different protocols, the resulting summary statistics files have different formats. To solve this problem, we developed an integration module that transforms the data into a universal format.

To ensure consistency of data within the database, our import procedure compares the SNP identifier, its position in the genome, and its alleles to the reference. If any of these characteristics do not match, the SNP is not imported. The current implementation uses a reference consisting of 503 European genomes from the 1000 Genomes Project (1000G phase 3 version 5). Next, we harmonized the data so that the same effect and reference alleles are used in all GWASs. If a summary statistics file did not directly contain all the columns required for conversion to the universal format, the GWAS could in certain cases still be imported into the database. For example, a missing allele frequency could be replaced with that from the reference, and a missing standard error could be computed from the effect size and the p-value.
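The two fallbacks mentioned above can be illustrated with a minimal Python sketch. It assumes a two-sided p-value from a normal (z) test and treats swapped effect/reference alleles as a sign flip; the function names and the handling of mismatches are hypothetical and are not taken from the PheLiGe import module (strand flips, for instance, are not considered).

```python
from statistics import NormalDist


def se_from_beta_and_p(beta: float, p_value: float) -> float:
    """Recover a standard error from an effect size and a two-sided p-value."""
    if not 0.0 < p_value < 1.0 or beta == 0.0:
        raise ValueError("need 0 < p < 1 and a non-zero effect size")
    # inv_cdf(p / 2) is the lower-tail normal quantile, so its negation is |z|.
    abs_z = -NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta) / abs_z


def align_to_reference(ea, ra, beta, eaf, ref_ea, ref_ra):
    """Express (beta, eaf) for the reference effect allele; None on mismatch."""
    if (ea, ra) == (ref_ea, ref_ra):
        return beta, eaf
    if (ea, ra) == (ref_ra, ref_ea):
        # Alleles are swapped: flip the effect sign and the allele frequency.
        return -beta, 1.0 - eaf
    return None  # no match with the reference, so the SNP would be skipped


# Example: beta = 0.1 with p = 0.05 implies |z| of about 1.96.
example_se = se_from_beta_and_p(0.1, 0.05)
example_flip = align_to_reference("A", "G", 0.2, 0.3, "G", "A")
```

Computing |z| from the lower tail (p/2) rather than from 1 - p/2 avoids losing precision for very small p-values.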

Next, we perform quality control (QC) for each study. In particular, QC includes a comparison of the allele frequencies from the study with those from the reference sample, a comparison of the reported p-values with p-values computed from the reported effect size and its standard error, and an analysis of the distribution of the estimated allele effects. An SNP is marked as an outlier if the reported allele frequency deviates from the reference panel allele frequency by more than 0.2 (AF outlier), or if the reported and computed association -log10(p-value) differ by more than 2% for p-values less than 10^-10, or by more than 0.5 in absolute value for p-values of 10^-10 or greater (PZ outlier).
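As an illustration, the two outlier rules can be sketched in Python as follows. This is a sketch under stated assumptions, not the service's QC code: the function names are hypothetical, the computed p-value is taken from a two-sided normal test, the 2% rule is interpreted relative to the reported -log10(p-value), and the reported p-value decides which rule applies.

```python
import math


def neg_log10_p_from_z(z: float) -> float:
    """-log10 of the two-sided normal p-value for a z-statistic."""
    # erfc keeps accuracy in the tail; p underflows to 0 for |z| above ~38.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return math.inf if p == 0.0 else -math.log10(p)


def is_af_outlier(study_af: float, reference_af: float) -> bool:
    """AF outlier: frequency deviates from the reference by more than 0.2."""
    return abs(study_af - reference_af) > 0.2


def is_pz_outlier(reported_p: float, beta: float, se: float) -> bool:
    """PZ outlier: reported and computed -log10(p) disagree too much."""
    reported = -math.log10(reported_p)
    computed = neg_log10_p_from_z(beta / se)
    diff = abs(reported - computed)
    if reported_p < 1e-10:
        # Relative rule (assumed to be relative to the reported value).
        return diff > 0.02 * reported
    return diff > 0.5  # absolute rule for the remaining p-values


# z = 5 corresponds to p of about 5.7e-7, so a reported 5.7e-7 is consistent.
consistent = is_pz_outlier(5.7e-7, 0.1, 0.02)
inconsistent = is_pz_outlier(1e-3, 0.1, 0.02)
```

For very large |z| the p-value underflows in double precision, so a production implementation would need a log-space tail approximation instead of the simple `erfc` call used here.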

Associations tab

The Associations tab allows you to search the database of genome-wide and regional association scan (GWAS/RWAS) results for genotype-phenotype associations, either directly for a specific SNP, given as an rsID or chr:position, or for proxies of this SNP. Proxies are defined as SNPs in linkage disequilibrium (LD) with the query SNP, with r² above the specified threshold. LD statistics were estimated for SNPs with MAF > 1% using haplotypes from the EUR samples of 1000 Genomes phase 1 version 3 within a 1 Mbp window. For each SNP we kept LD statistics for up to 1000 proxy SNPs with r² > 0.5.

Among all the associations that satisfy the specified filters, we display only one per GWAS/RWAS - either the one with the queried SNP, or, when it is absent, the one with an SNP having the strongest LD (largest r2) with it.
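The selection rule above can be sketched in Python. The record layout (dictionaries with `study`, `snp`, `p` and `r2` keys), the default threshold of 0.8, and the function name are assumptions made for illustration only and do not describe the actual database schema or query code.

```python
def one_association_per_study(associations, query_snp, r2_threshold=0.8):
    """Keep one association per study: the query SNP, else the best proxy."""
    best = {}
    for assoc in associations:
        is_query = assoc["snp"] == query_snp
        r2 = 1.0 if is_query else assoc["r2"]
        if r2 < r2_threshold:
            continue  # not a proxy at the requested LD threshold
        # Tuples compare element-wise: the query SNP always wins,
        # otherwise the proxy with the largest r2 is preferred.
        rank = (is_query, r2)
        current = best.get(assoc["study"])
        if current is None or rank > current[0]:
            best[assoc["study"]] = (rank, assoc)
    # The results table is sorted by p-value.
    return sorted((a for _, a in best.values()), key=lambda a: a["p"])


rows = [
    {"study": "A", "snp": "rs1", "p": 1e-8, "r2": 1.0},
    {"study": "A", "snp": "rs2", "p": 1e-12, "r2": 0.95},
    {"study": "B", "snp": "rs2", "p": 1e-5, "r2": 0.95},
    {"study": "B", "snp": "rs3", "p": 1e-9, "r2": 0.85},
    {"study": "C", "snp": "rs4", "p": 1e-20, "r2": 0.6},
]
selected = one_association_per_study(rows, "rs1")
```

In this toy input, study A reports the query SNP itself, study B is represented by its strongest proxy, and study C is dropped because its only SNP falls below the LD threshold. Note that the chosen record is not necessarily the one with the smallest p-value within a study.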

You can click the 'Add Filter' button to apply additional SNP filters by MAF, number of genotyped people (N), imputation quality, and outlier status. Moreover, if you select specific traits on the GWAS/cis-QTL Descriptors tab (see the description of that tab for more details), a new 'Show selected' button will appear. Click it to check the list of selected traits. When you then search by SNP, results are shown only for the selected traits.

Output

The results table is sorted by p-value and can be filtered by a p-value cut-off entered in the corresponding box. You can choose which columns are visible on this tab by clicking on the icon in the top right corner. You can download the results in CSV format. Click the 'Open' button in the 'SNP plot' column of the table to access the regional plot. The pop-up window shows a regional association plot, a recombination map, and a gene track. In the regional association plot each dot represents an SNP. You can filter SNPs by MAF using the slider on the right. If you move the cursor over a dot, you will see a tooltip with the SNP information (chromosome, position, alleles, p-value, and others). Clicking on the rsID in the tooltip redirects you to the NCBI SNP database, while clicking on the magnifying glass next to the rsID redirects you to the Associations tab and queries the database for this rsID.

Next, using the button to the left of a trait name, you can select a "primary" and a "secondary" trait, after which the pair is passed to the Analysis tab for colocalization analysis (see the description of the Analysis tab below).

Output column description

Trait name
Full name of the trait (scan) in our database
SNP Plot
Interactive regional plot centred on the target SNP, with a window of ±250 kbp
Population
Study population at a broad level, e.g. European, Asian, Mixed
Collection
Name of the collection used to group scans, corresponding to one study (e.g. UKB_Nealelab) or one topic (e.g. CVD)
rsID
The reference SNP ID for GRCh37 build
Chr, BP
The chromosome and the position of the target SNP from GRCh37 build
EA, RA
The effect and the reference allele
EAF
The effect allele frequency, taken from the study or from the reference; the source is given in the description of the corresponding trait
Beta, SE, Z, P-value
Characteristics of the SNP effect
N
The number of genotyped people for the corresponding SNP (if present in the data); otherwise the common sample size for all SNPs, taken from the study description
Info
Imputation quality of the target SNP (if present in the data)
PZ Outlier
Marked if SNP is PZ outlier (will be excluded from analysis)
AF Outlier
Marked if SNP is AF outlier
R
r value between the proxy and the target SNP estimated for effect alleles based on the haplotypes from 1000 Genomes

GWAS/cis-QTL Descriptors tab

This tab provides a database of GWAS/RWAS metadata collected from articles, study websites, or other sources of data descriptions. You can use the simple search to find a GWAS/RWAS by trait name, trait abbreviation, or author's name. In the advanced search you can use the 'collection' filter and/or add further filters using the 'Add filter' button.

Each study descriptor contains a field that provides possible synonyms and related ontology terms for the trait. For complex traits and diseases the trait names are matched with terms from the Experimental Factor Ontology (https://www.ebi.ac.uk/ols/ontologies/efo) as well as with ICD-10 (International Classification of Diseases, 10th revision) notations and codes. We also use specific nomenclatures for some other domains. For example, for all eQTL studies transcript names were mapped to HUGO Gene Nomenclature Committee (HGNC) gene names, and these gene names are part of the "trait name". For studies of N-glycosylation levels we use the standard Oxford notation name as part of the "trait name".

You can test the features described above by searching, for example, for the EFO term "EFO:0003819", the ICD-10 term "ICD10 K02", the HGNC gene name "FUT8", or the core-fucosylated galactosylated N-glycan "FA2G2".

Output

You can customise the results table by clicking on the gear icon in the top right corner and selecting the columns that should be visible. The results table can be downloaded in CSV format. To explore the Manhattan plot of a trait of interest, click the 'Open' button in the 'Plot' column; a pop-up window with the plot will appear. You can select a chromosome and then navigate through the genome using the controls on the left. The SNPs can be filtered by MAF using the slider on the right. At the highest level of resolution, each SNP is represented by a dot. This view is identical to the regional plot view described above (see the Associations tab description). To restrict a search to particular traits, check the box to the left of each trait name, then go to the Associations tab and search for SNPs within this pool of traits.

Output column description

Trait abbreviation, Trait name
Short and full name of the scan in the database
Plot
Interactive Manhattan plot
Population
Study population at a broad level, e.g. European, Asian, Mixed
Collection
Name of the collection used to group scans, corresponding to one study (e.g. UKB_Nealelab) or one topic (e.g. CVD)
Study Year
Year of the study publication
Authors
First author or consortium/laboratory name
Reference PMID
PubMed ID of the corresponding article
Reference DOI
Link to the original article (if available)
Data DOI
Link to the original summary statistics
Trait type
The trait type, one of binary, quantitative, categorical
Tissue
Tissue of sample collection
Domain
Domain of the trait like complex trait, protein, metabolite etc.
N Cases
Number of people that were used in the study as cases (for binary traits)
N Controls
Number of people that were used in the study as controls (for binary traits)
N People
Total number of people in the study
Genomic build
Genomic build of original data in the following format: hgN/GRChN
Association Metric
Type of the metric used to estimate SNP effect: beta (linear regression coefficient), logOR (logistic regression coefficient) etc.
Frequency Source
Source of allele frequencies: study or current reference used during the unification step

Analysis tab

To analyse colocalization of association signals from different traits, we implemented a slightly modified version of the θ metric defined by Momozawa et al. (https://doi.org/10.1038/s41467-018-04365-8). In short, this method compares the "profiles" of association of two traits in a region to distinguish pleiotropy from linkage disequilibrium. The traits to compare should be chosen in the Associations tab.

It is expected that under pleiotropy (i.e. if the same causal genetic variant is responsible for the association of both traits with the region) the similarity between the association patterns will be high. In contrast, two distinct variants in linkage disequilibrium (LD), unless this LD is very high, are expected to generate different patterns of association.

The θ metric is, in essence, a weighted correlation, and high similarity is reflected by its values close to 1 or -1. If an allele increasing the value of one trait also increases the value of the second trait, the sign of θ is positive. If the allele increasing the value of one trait decreases the value of the other, the sign of θ is negative. In case there is little similarity between the two association patterns, the value of θ is close to zero.

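In more detail, θ is a weighted correlation based on the p-values ($p_x$, $p_y$) and the signs of the effects ($\beta_x$, $\beta_y$) of the two traits:

$$\theta = \frac{\sum_i w_i \, \frac{x_i - \mu_x}{\sigma_x} \cdot \frac{y_i - \mu_y}{\sigma_y}}{\sum_i w_i},$$

where $x_i$ and $y_i$ are the values for the $i$-th SNP in the two traits, derived from the p-value and the sign of the effect, $\mu_x$ and $\mu_y$ are the weighted means of the x and y values, $\sigma_x^2$ and $\sigma_y^2$ are the weighted variances of x and y, and $w_i$ is the weight of the $i$-th element.

The parameters k, p, and T are global and are shown in the top right corner of the Analysis tab.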

The system calculates θ using SNPs that are located within 250 kbp of the index SNP, that are common in both compared GWAS/RWAS (i.e. with MAF above the default threshold), and that pass the remaining inclusion filters. You can change the MAF threshold using the slider above the z-z plot; the results are updated immediately.
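To make the idea of a weighted correlation of association profiles concrete, here is a minimal Python sketch. It is not the service's implementation: the transformation of each p-value into a signed -log10(p), the use of caller-supplied weights, and all function names are assumptions for illustration, and the weighting scheme used by the service is not reproduced here.

```python
import math


def signed_strength(p_value: float, beta: float) -> float:
    """Assumed transformation: -log10(p) carrying the sign of the effect."""
    return math.copysign(-math.log10(p_value), beta)


def in_window(position: int, index_position: int, half_width: int = 250_000) -> bool:
    """True if an SNP lies within the +/-250 kbp window around the index SNP."""
    return abs(position - index_position) <= half_width


def weighted_correlation(xs, ys, ws) -> float:
    """Weighted Pearson correlation of two association profiles."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs)) / total
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(ws, ys)) / total
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("a profile with zero variance has no correlation")
    cov = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(ws, xs, ys)) / total
    return cov / math.sqrt(var_x * var_y)


# Toy profiles: the second trait mirrors the first with opposite effect signs.
p_values = [1e-2, 1e-4, 1e-8, 1e-3]
trait1 = [signed_strength(p, +0.1) for p in p_values]
trait2 = [signed_strength(p, -0.1) for p in p_values]
uniform_weights = [1.0] * len(p_values)  # placeholder weights, an assumption
theta_same = weighted_correlation(trait1, trait1, uniform_weights)
theta_opposite = weighted_correlation(trait1, trait2, uniform_weights)
```

With identical profiles the value is 1, and with mirrored effect directions it is -1, matching the interpretation of the sign of θ given above. Non-uniform weights would let the strongest signals dominate the comparison.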

Output

Based on the colocalization analysis results, you can judge whether two traits are likely to be controlled by the same functional variant(s) in the locus (high |θ|; a threshold was suggested in the manuscript by Momozawa et al.) or rather by different functional variants in linkage disequilibrium (low |θ|). The web service provides interactive graphics for visual comparison of the region. For example, if θ is close to 1 or -1, you should see a clear linear relation between the z-statistics of the primary and secondary GWAS/RWAS. In addition, you can check which SNPs entered the analysis, which were omitted, and why. Genes located in the selected region (NCBI genome build GRCh37.p13) are also shown on the graph.
