Together with the heritability enrichment observed in genes expressed in lung tissues, these results highlight the involvement of lung-related biological pathways in the development of severe COVID.

The previously reported associations for the strongest association for COVID severity at the 3p

Increasing the global representation in genetic studies enhances the ability to detect novel associations. Two of the loci that affect disease severity were only discovered by including the four studies of individuals with East Asian ancestry.

Although we cannot be certain of the mechanism of action, the FOXP4 association is an attractive biological target, as it is expressed in the proximal and distal airway epithelium 36 and has been shown to have a role in controlling epithelial cell fate during lung development.

The COVID HGI continues to pursue expansion of the datasets included in the consortium's analyses to underrepresented populations in upcoming data releases.

We plan to release ancestry-specific results in full once the sample sizes allow for a well-powered meta-analysis. Care should be taken when interpreting the results from a meta-analysis because of challenges with case and control ascertainment and collider bias (see Supplementary Note for a more detailed discussion on study limitations). Drawing a comprehensive and reproducible map of the host genetic factors associated with COVID severity and SARS-CoV-2 requires a sustained international effort to include diverse ancestries and study designs.

Future work will be required to better understand the biological and clinical value of these findings. Continued efforts to collect more samples and detailed phenotypic data should be endorsed globally, allowing for more thorough investigation of variable, heritable symptoms, particularly in light of the newly emerging strains of SARS-CoV-2, which may provoke different host responses that lead to disease.

Methods

Contributing studies

All of the participants were recruited following protocols approved by local Institutional Review Boards; this information is collected in Supplementary Table 1 for all 46 studies. All protocols followed local ethics recommendations and informed consent was obtained when required.

Information about sample numbers, sex and age for each contributing study is given in Supplementary Table 1. Each individual study that contributed data to a particular analysis met a minimum threshold of 50 cases, as defined by the phenotypic criteria, for statistical robustness. The effective sample sizes for each ancestry group shown in Fig.

Details of contributing research groups are provided in Supplementary Table 1. Additional information regarding individual studies contributing to the consortium is described in Supplementary Table 1. Any other study-specific covariates to account for known technical artefacts could be added. SAIGE automatically accounts for sample relatedness and case-control imbalances.

Quality-control and analysis approaches for individual studies are reported in Supplementary Table 1. Study-specific summary statistics were then processed for meta-analysis. Potential false positives, inflation and deflation were examined for each submitted GWAS. Allele frequency plots against gnomAD 3. Standard error values as a function of the effective sample size were used to find studies that deviated from the expected trend.

Summary statistics passing this manual quality control were included in the meta-analysis. If multiple matching variants were included, the best match was chosen according to the minimum fold change in absolute allele frequency. Meta-analysis was performed using the inverse-variance-weighted (IVW) method on variants that were present in at least two-thirds of the studies contributing to the phenotype analysis.

The method summarizes effect sizes across the multiple studies by computing the mean of the effect sizes weighted by the inverse variance in each individual study. We report the unadjusted P values for each variant. This is calculated for each variant as the weighted sum of squared differences between the effect sizes and their meta-analysis effect, the weights being the inverse variance of the effect size.
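The inverse-variance-weighted mean and the weighted sum of squared differences described above can be illustrated with a minimal sketch. It assumes a fixed-effect model and a two-sided normal P value; the function name `ivw_meta` and the toy inputs are hypothetical and are not taken from the consortium's pipeline. The heterogeneity quantity computed here has the same form as Cochran's Q.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-study effect sizes; ses: per-study standard errors.
    Returns (pooled beta, pooled SE, two-sided P value, heterogeneity Q).
    """
    weights = [1.0 / se ** 2 for se in ses]          # inverse variance weights
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Weighted sum of squared differences between study effects and pooled effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal P value
    return beta, se, p, q

# Toy example with two studies of equal precision (illustrative values only)
pooled_beta, pooled_se, pooled_p, het_q = ivw_meta([0.1, 0.3], [0.1, 0.1])
```

Under the null hypothesis of no heterogeneity, a statistic of this form is usually compared against a chi-squared distribution with (number of studies minus one) degrees of freedom.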

Two loci reached genome-wide significance but were excluded from the significant results in Supplementary Table 2 due to heterogeneity between estimates from contributing studies and missingness between studies at chr. For each of the lead variants reported in Supplementary Table 2, we aimed to find loci specific to susceptibility or severity by testing whether there was heterogeneity between the effect sizes associated with hospitalized COVID progression to severe disease and reported SARS-CoV-2 infection.

For these pairs of phenotype comparisons, we generated new meta-analysis summary statistics, including only those studies that could contribute data to both phenotypes under comparison.

Principal component projection

To project every GWAS participant into the same principal component (PC) space, we used pre-computed PC loadings and reference allele frequencies. We further normalized the projected PC scores by dividing the values by the square root of the number of variants used for projection to account for subtle differences due to missing variants.
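The projection and normalization steps can be sketched as follows. This is an illustration under stated assumptions, not the consortium's code: genotypes are assumed to be standardized as (g - 2p) / sqrt(2p(1 - p)) using the reference allele frequency p, which the text does not specify, and the function name `project_sample` is hypothetical.

```python
import math

def project_sample(genotypes, ref_af, loadings):
    """Project one sample onto pre-computed PCs.

    genotypes: alt-allele dosages (0..2), or None where the variant is missing.
    ref_af:    reference alt-allele frequency per variant.
    loadings:  per variant, a list with one loading per PC.
    """
    n_pcs = len(loadings[0])
    scores = [0.0] * n_pcs
    n_used = 0
    for g, p, load in zip(genotypes, ref_af, loadings):
        if g is None or p <= 0.0 or p >= 1.0:
            continue  # skip missing or monomorphic variants
        # Assumed standardization using the reference allele frequency
        std = (g - 2.0 * p) / math.sqrt(2.0 * p * (1.0 - p))
        for k in range(n_pcs):
            scores[k] += std * load[k]
        n_used += 1
    if n_used == 0:
        raise ValueError("no usable variants for projection")
    # Normalize by sqrt(number of variants used) so that samples with
    # different amounts of missing data remain on a comparable scale
    return [s / math.sqrt(n_used) for s in scores]

# Toy example: four variants, two PCs, one missing genotype
pcs = project_sample(
    [2, None, 0, 1],
    [0.5, 0.5, 0.5, 0.5],
    [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]],
)
```

Dividing by the square root of the number of variants actually used is what keeps scores comparable between studies whose genotype data cover different subsets of the loading variants.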

Gene prioritization

To prioritize candidate causal genes reported in full in Supplementary Table 2, we used various gene prioritization approaches using both locus-based and similarity-based methods. Because we only describe the in silico gene prioritization results without characterizing the actual functional activity in vitro or in vivo, we aimed to provide a systematic approach to nominate potential causal genes in a locus using the following criteria.

We then constructed an LD matrix as a weighted average across populations, weighted by the per-population sample sizes in each meta-analysis, which we used as an LD reference. We retrieved fine-mapped variants from the GTEx v. For each variant, the overall V2G score aggregates differentially weighted evidence of variant-gene associations from several data sources, including molecular cis-QTL data for example, cis-protein QTLs from ref.

Phenome-wide association study

To investigate the evidence of shared effects of 15 index variants for COVID and previously reported phenotypes, we performed a phenome-wide association study.
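The sample-size-weighted LD reference mentioned in the gene prioritization text above can be sketched as an element-wise weighted average of per-population LD matrices. The population labels, sample sizes and matrix values below are made up for illustration, and the function name `weighted_ld` is hypothetical; whether the averaged quantity is r or another LD measure is an assumption not settled by the text.

```python
def weighted_ld(ld_by_pop, n_by_pop):
    """Average per-population LD matrices, weighting each by its sample size.

    ld_by_pop: dict mapping population label -> square LD matrix (list of lists).
    n_by_pop:  dict mapping population label -> sample size in the meta-analysis.
    """
    pops = list(ld_by_pop)
    total_n = sum(n_by_pop[pop] for pop in pops)
    size = len(ld_by_pop[pops[0]])
    combined = [[0.0] * size for _ in range(size)]
    for pop in pops:
        weight = n_by_pop[pop] / total_n
        matrix = ld_by_pop[pop]
        for i in range(size):
            for j in range(size):
                combined[i][j] += weight * matrix[i][j]
    return combined

# Toy example with two populations and two variants (illustrative values only)
ld_ref = weighted_ld(
    {"EUR": [[1.0, 0.8], [0.8, 1.0]], "EAS": [[1.0, 0.4], [0.4, 1.0]]},
    {"EUR": 300, "EAS": 100},
)
```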

This conservative approach allowed spurious signals primarily driven by proximity rather than actual colocalization to be removed (see Methods).

Heritability

LD score regression v. As this method depends on matching the LD structure of the analysis sample to a reference panel, the summary statistics of European ancestry only were used. We additionally report SNP heritability estimates for the all-ancestries meta-analyses, calculated using European panel LD scores, in Supplementary Table 8.

Genome-wide association summary statistics

We obtained genome-wide association summary statistics for 43 complex-disease, neuropsychiatric, behavioural or biomarker phenotypes Supplementary Table Summary statistics generated from GWAS using individuals of European ancestry were preferentially selected if available.

These summary statistics were used in subsequent genetic correlation and Mendelian randomization analyses.

Genetic correlation

LD score regression 50 was also used to estimate the genetic correlations between our COVID meta-analysis phenotypes reported using samples of only European ancestry, and between these and the curated set of 38 summary statistics. Genetic correlations were estimated using the same LD score regression settings as for heritability calculations.

We used a strict r2 threshold of 0. Namely, we ensured that the effect of a variant on the exposure and outcome corresponded to the same allele, we inferred positive-strand alleles and dropped palindromes with ambiguous allele frequencies, as well as incompatible alleles. Supplementary Table 10 includes the harmonized datasets used in the analyses.
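The harmonization rules described above (same effect allele for exposure and outcome, strand inference, and removal of ambiguous palindromes and incompatible alleles) can be sketched as follows. This is an illustration only: it handles single-nucleotide variants, the ambiguity band of 0.08 around an allele frequency of 0.5 is an assumed value, and the function name and record layout are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def harmonize_outcome_beta(exp, out, ambiguity=0.08):
    """Return the outcome beta aligned to the exposure effect allele, or None to drop.

    exp, out: dicts with keys "ea" (effect allele), "oa" (other allele),
    "eaf" (effect allele frequency) and, for out, "beta".
    """
    e_ea, e_oa = exp["ea"], exp["oa"]
    o_ea, o_oa = out["ea"], out["oa"]
    if COMPLEMENT[e_ea] == e_oa:
        # Palindromic variant (A/T or C/G): strand cannot be read from alleles
        if {o_ea, o_oa} != {e_ea, e_oa}:
            return None  # incompatible alleles
        if abs(exp["eaf"] - 0.5) < ambiguity or abs(out["eaf"] - 0.5) < ambiguity:
            return None  # allele frequency too close to 0.5 to infer the strand
        same_side = (exp["eaf"] < 0.5) == (out["eaf"] < 0.5)
        return out["beta"] if same_side else -out["beta"]
    for flip in (False, True):
        # Second pass tries the outcome alleles on the opposite strand
        a1 = COMPLEMENT[o_ea] if flip else o_ea
        a2 = COMPLEMENT[o_oa] if flip else o_oa
        if (a1, a2) == (e_ea, e_oa):
            return out["beta"]
        if (a1, a2) == (e_oa, e_ea):
            return -out["beta"]  # effect and other allele are swapped
    return None  # incompatible alleles

# Toy example: the outcome GWAS reports the opposite effect allele
aligned = harmonize_outcome_beta(
    {"ea": "A", "oa": "G", "eaf": 0.3},
    {"ea": "G", "oa": "A", "eaf": 0.7, "beta": 0.2},
)
```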

In brief, the standard IVW meta-analytic framework was used to calculate the average causal effect by excluding each genetic variant used to instrument the analysis. A global statistic was calculated by summing the observed residual sum of squares, that is, the difference between the effect predicted by the IVW slope excluding the SNP, and the observed effect of the SNP on the outcome. Overall horizontal pleiotropy was subsequently analysed by comparing the observed residual sum of squares with the residual sum of squares expected under the null hypothesis of no pleiotropy.
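The leave-one-out residual sum of squares described above can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes an IVW slope fitted through the origin with weights equal to the inverse variance of the SNP-outcome effects, it uses unweighted squared residuals, and it omits the simulation of the null distribution. The function names are hypothetical.

```python
def ivw_slope(bx, by, se_y):
    """IVW slope through the origin: weighted regression of by on bx."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_y))
    return num / den

def leave_one_out_rss(bx, by, se_y):
    """Sum of squared differences between each SNP's observed outcome effect
    and the effect predicted by the IVW slope fitted without that SNP."""
    rss = 0.0
    n = len(bx)
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        slope = ivw_slope(
            [bx[i] for i in keep],
            [by[i] for i in keep],
            [se_y[i] for i in keep],
        )
        rss += (by[j] - slope * bx[j]) ** 2
    return rss

# Toy data: three SNPs on a common slope, then the same data with one outlier
clean = leave_one_out_rss([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
outlier = leave_one_out_rss([0.1, 0.2, 0.3], [0.05, 0.10, 0.45], [0.01, 0.01, 0.01])
```

In a full analysis, the observed value would be compared against values obtained from data simulated under the null hypothesis of no pleiotropy, which this sketch does not attempt.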

We also used the regression intercept in MR-Egger 57 to evaluate potential bias due to directional pleiotropic effects. The IVW approach estimates the causal effect by aggregating the single-SNP causal effects obtained using the ratio of coefficients method, that is, the ratio of the effect of the SNP on the outcome over the effect of the SNP on the exposure, in a fixed-effects meta-analysis.

The SNPs were assigned weights based on their inverse variance. The IVW method confers the greatest statistical power for estimating causal associations 59, but assumes that all variants are valid instruments and can produce biased estimates if the average pleiotropic effect differs from zero.
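The ratio-of-coefficients IVW estimate and the MR-Egger intercept can be sketched together. This is a minimal illustration under common simplifying assumptions: first-order weights that ignore uncertainty in the SNP-exposure effects, non-zero SNP-exposure effects, and SNPs oriented so that the exposure effect is positive before the Egger regression. The function names and toy values are hypothetical.

```python
import math

def ivw_mr(bx, by, se_y):
    """Fixed-effects IVW estimate from single-SNP ratio estimates by/bx."""
    ratios = [y / x for x, y in zip(bx, by)]
    # Inverse variance of each ratio, using se(ratio) ~ se_y / |bx|
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    return estimate, math.sqrt(1.0 / total_w)

def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept; returns (intercept, slope)."""
    # Orient each SNP so that its exposure effect is positive
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Toy data with a constant offset of 0.02 in the outcome effects, mimicking
# directional pleiotropy (illustrative values only)
bx = [0.1, 0.2, 0.4]
by = [0.07, 0.12, 0.22]
se_y = [0.01, 0.02, 0.01]
ivw_estimate, ivw_se = ivw_mr(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
```

In this toy case the Egger slope recovers 0.5 while the IVW estimate is pulled upwards by the offset, which is the kind of bias the intercept test is meant to flag.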

We conducted further sensitivity analyses using alternative Mendelian randomization methods that provide consistent estimates of the causal effect even when some instrumental variables are invalid, at the cost of reduced statistical power, including: (1) weighted median estimator (WME); (2) weighted mode-based estimator (WMBE); and (3) MR-Egger regression.
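As an illustration of why a median-based estimator tolerates some invalid instruments, the sketch below computes a weighted median of single-SNP ratio estimates with linear interpolation between neighbouring estimates. It is a simplified example rather than the analysis code: the weights are taken as given (for instance inverse variances of the ratios), standard errors are not computed, and the function name is hypothetical.

```python
def weighted_median(ratios, weights):
    """Weighted median of ratio estimates with linear interpolation."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    total = sum(weights)
    w = [weights[i] / total for i in order]
    # Cumulative weight up to the midpoint of each estimate's own weight
    cum = []
    running = 0.0
    for wj in w:
        running += wj
        cum.append(running - 0.5 * wj)
    if cum[0] >= 0.5:
        return r[0]
    for j in range(1, len(r)):
        if cum[j] >= 0.5:
            # Interpolate between the two estimates that bracket the 50% point
            frac = (0.5 - cum[j - 1]) / (cum[j] - cum[j - 1])
            return r[j - 1] + frac * (r[j] - r[j - 1])
    return r[-1]

# Toy example: three consistent ratio estimates and one outlying instrument
robust = weighted_median([0.5, 0.5, 0.5, 5.0], [1.0, 1.0, 1.0, 1.0])
```

A single outlying instrument barely moves the weighted median, whereas it would shift an inverse-variance-weighted mean of the same ratios.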

All statistical analyses were conducted using R v. We were able to centralize information, recruit partner studies, rapidly distribute summary statistics and present preliminary interpretations of the results to the public. This centralized resource provides a conceptual and technological framework for organizing global academic and industry groups around a shared goal.

Visitors can query study information, including study design and research questions. Registered studies are visualized on a world map and are searchable by institutional affiliation, city and country. To enhance scientific communication to the public, preliminary results are described in blog posts by the scientific communications team and shared on Twitter.

The first post was translated to 30 languages with the help of 85 volunteer translators.

Information about these vendors, user licenses, and data is available on the Commercial Datasets page. As additional commercial small satellite datasets are evaluated and acquired, those datasets will also be made available. Data that is favorably evaluated and deemed of sufficient value will be purchased by NASA for broader sustained use. Contract types best suited to provide long-term access to data will be selected on a vendor-by-vendor basis.

To facilitate standard scientific collaborations, NASA will seek end-user license agreements (EULAs) to enable broad levels of dissemination and shareability of the commercial data with U. Government agencies and partners.

On-ramp and Evaluation

With the transition from pilot to ongoing data acquisition activities, ESD has established a process for identifying vendors and evaluating data. Evaluation Airbus U.

The selected PIs will be required to submit a final report as part of the evaluation. The reported results will be summarized and reported out to ESD senior management. The summary report is not intended to be a consensus recommendation, but a document that takes into account the results of all team member evaluations. NASA will use the summary report, individual PI reports, and other information to determine the suitability of data from each vendor for future procurements.

More details on the data evaluations are available on our Smallsat Data Evaluation page. All data purchased during the evaluation phase will be preserved by NASA for long-term future use in accordance with the scientific use license.

Program Activities

The program is reaching out to the scientific community via science meetings, workshops, and conference presentations.

