Hinna0818

📊 A Scalable Phenotyping and Statistical Pipeline for UK Biobank RAP Data.

100% credibility
Found Apr 20, 2026 at 20 stars.
AI Analysis
R
AI Summary

UKBAnalytica is an R package for processing UK Biobank data to create standardized disease phenotypes, survival datasets, and perform statistical and machine learning analyses.

How It Works

1. 👩‍🔬 Discover UKBAnalytica

You find this helpful tool that makes analyzing UK Biobank health data simple and fast.

2. 🚀 Install easily

You add it to your R workspace with one quick command, no hassle.

3. 📂 Load your data

You bring in your UK Biobank file, and it reads everything smoothly.

4. Pick diseases and build dataset

You choose diseases like diabetes or heart issues, and it instantly creates ready-to-use survival data showing who got sick when.

5. Choose your analysis

📊 Run statistics: Get hazard ratios, p-values, and tables to understand risks.

🤖 Try machine learning: Train smart models to predict outcomes and see what matters most.

🎉 See clear results

You get beautiful charts, tables, and insights ready for your research paper.

AI-Generated Review

What is UKBAnalytica_v2?

UKBAnalytica_v2 is an R package that turns raw UK Biobank Research Analysis Platform (RAP) data exports into analysis-ready datasets for phenotyping and survival studies. You load your CSV and pick diseases from predefined definitions that draw on ICD-10/ICD-9 codes, self-reports, death records, or algorithms, and it produces wide-format tables with history and incident flags, survival times, and participant flowcharts. Python helpers handle RAP downloads of demographics, proteins, or metabolites, which makes a scalable biobank data pipeline straightforward to set up.
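To make the history/incident flags and survival times concrete, here is a minimal sketch in Python of how one participant-disease record could be derived from three dates. It is not the package's API: the function name `survival_record`, its arguments, and the choice to leave survival time empty for participants with prior history are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def survival_record(baseline: date, first_dx: Optional[date], censor: date) -> dict:
    """Hypothetical helper: classify one participant for one disease.

    baseline : assessment (enrolment) date
    first_dx : earliest diagnosis date from any source, or None if never diagnosed
    censor   : end of follow-up (administrative censoring or death)
    """
    # Diagnosed on or before baseline: a prevalent ("history") case.
    history = first_dx is not None and first_dx <= baseline
    # Diagnosed after baseline and within follow-up: an incident case.
    incident = first_dx is not None and baseline < first_dx <= censor

    if history:
        # Assumption: prevalent cases are not at risk, so no survival time.
        surv_years = None
    else:
        end = first_dx if incident else censor
        surv_years = round((end - baseline).days / 365.25, 2)

    return {"history": int(history), "incident": int(incident), "surv_years": surv_years}


if __name__ == "__main__":
    print(survival_record(date(2010, 1, 1), date(2012, 1, 1), date(2020, 1, 1)))
    print(survival_record(date(2010, 1, 1), None, date(2020, 1, 1)))
    print(survival_record(date(2010, 1, 1), date(2008, 5, 1), date(2020, 1, 1)))
```

A wide-format table in this sense would hold one such triple of columns per disease for every participant.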

Why is it gaining traction?

It takes the drudgery out of multi-source phenotyping (parsing messy hospital codes, aligning dates, separating prevalent from incident cases) and bundles statistics such as Table 1 summaries, propensity matching, mediation analysis, multiple-imputation (MI) pooling, and ML with SHAP plots into one flow. Developers skip weeks of custom scripts for UKB quirks and get reproducible survival datasets fast. The quick-start builds full Cox-ready tables in minutes, with sensitivity tweaks for sources or early events.
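The multi-source step can be pictured as taking the earliest diagnosis date across whichever sources are enabled, and a source-based sensitivity analysis as rerunning that with a narrower set. The sketch below, in Python, works under that assumption; the function `earliest_event` and the source labels are hypothetical and do not come from the package.

```python
from datetime import date
from typing import Dict, Iterable, Optional

# Hypothetical source labels, used only for this illustration.
ALL_SOURCES = ("icd10", "icd9", "self_report", "death")


def earliest_event(
    dates_by_source: Dict[str, Optional[date]],
    sources: Iterable[str] = ALL_SOURCES,
) -> Optional[date]:
    """Return the earliest non-missing date among the allowed sources."""
    allowed = set(sources)
    found = [d for s, d in dates_by_source.items() if s in allowed and d is not None]
    return min(found) if found else None


if __name__ == "__main__":
    record = {
        "icd10": date(2015, 3, 1),
        "self_report": date(2009, 6, 1),
        "death": None,
    }
    # Main analysis: every source counts.
    print(earliest_event(record))
    # Sensitivity analysis: ignore self-reported diagnoses.
    print(earliest_event(record, sources=("icd10", "icd9", "death")))
```

In this example, dropping the self-report source moves the first-diagnosis date from 2009 to 2015, which can change whether a participant counts as a prevalent or an incident case.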

Who should use this?

UK Biobank researchers running prospective cohort studies, especially survival analyses of cardiovascular disease (CVD), diabetes, or respiratory traits. Biostatisticians phenotyping comorbidities for adjustment, or ML practitioners who want SHAP-explained risk models on RAP exports. It suits teams that iterate on sensitivity analyses without rebuilding parsers.

Verdict

Solid for UKB RAP users despite a low star count (20) and a 1.0% credibility score. The docs shine with examples and the license is MIT, but test coverage looks thin. Try it if you work on biobank data pipelines; skip it for general statistics.
