Raw DNA Analysis: The Complete 2026 Guide
10 min read · Updated April 2026 · DecodeMyBio Editorial Team
Raw DNA analysis takes the unprocessed genotyping file from a consumer DNA test (23andMe, AncestryDNA, MyHeritage, FamilyTreeDNA) and interprets it against clinical guidelines such as those published by the Clinical Pharmacogenetics Implementation Consortium (CPIC). It shows how your body is likely to process medications, metabolize nutrients, and respond to compounds like caffeine, alcohol, and THC. It is educational, costs $19–$139 depending on the report or bundle, and requires no new test.
What “Raw DNA Data” Actually Is
When you take a 23andMe or AncestryDNA test, the lab genotypes a predefined set of positions in your genome — typically around 600,000 to 700,000 single-nucleotide polymorphisms (SNPs). The company uses a subset of those to generate the ancestry breakdown, trait predictions, and health reports you see in your account dashboard. The remainder sits unused.
Your raw data file is the full list of those SNP calls — each line shows one position in the genome (identified by rsID, chromosome, and base-pair position) and the two letters your genome has at that position. It is roughly 15–25 MB as a plain text file, usually delivered as a .zip you can download from your account settings.
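To make the file layout concrete, here is a minimal parsing sketch. It assumes the common 23andMe layout (`#` comment lines, then tab-separated `rsid, chromosome, position, genotype`) and the AncestryDNA layout (a header row, then `rsid, chromosome, position, allele1, allele2`). The function name `parse_raw_dna` is hypothetical, and exact formats can differ between chip versions. No-call markers (`--` in 23andMe files, `0` alleles in AncestryDNA files) are normalized to `None`.

```python
from typing import Dict, Iterable, Optional, Tuple

# Each entry: rsID -> (chromosome, position, genotype or None for a no-call)
Call = Tuple[str, int, Optional[str]]


def parse_raw_dna(lines: Iterable[str]) -> Dict[str, Call]:
    """Parse 23andMe-style (4 columns) or AncestryDNA-style (5 columns) rows.

    Hypothetical helper: real exports may vary by provider and chip version.
    """
    calls: Dict[str, Call] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment/header lines (23andMe style).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip the AncestryDNA column-header row.
        if fields[0].lower() == "rsid":
            continue
        if len(fields) == 4:  # 23andMe: rsid, chromosome, position, genotype
            rsid, chrom, pos, genotype = fields
            if genotype == "--":
                genotype = None
        elif len(fields) == 5:  # AncestryDNA: rsid, chromosome, position, allele1, allele2
            rsid, chrom, pos, a1, a2 = fields
            genotype = None if "0" in (a1, a2) else a1 + a2
        else:
            continue  # unrecognized row: skip it rather than guess
        calls[rsid] = (chrom, int(pos), genotype)
    return calls
```

Both layouts end up in the same `rsID -> (chromosome, position, genotype)` shape, so later steps do not need to care which company produced the file.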
What makes raw data powerful is that it contains calls for the common variants in most major pharmacogenomic genes, not just the handful your testing company chose to report on. Consumer testing companies focus on ancestry and a curated set of wellness traits because that is their product, but the biological information in your file goes well beyond what they show you.
Raw DNA analysis services, DecodeMyBio among them, parse that file, identify the clinically meaningful variants, and map them to published clinical guidelines so you can see how your genetics affect real medications, nutrition, and daily life.
Which Consumer DNA Tests Let You Download Raw Data?
Most major consumer testing services provide raw data exports. Coverage and chip technology vary — so the number of pharmacogenomic markers in your file depends on which service you used and which version of their chip was used for your sample.
| Service | Raw data download? | Typical PGx markers | File format |
|---|---|---|---|
| 23andMe (v3–v5) | Yes — free, 5-min export | 15–25 | .txt inside .zip |
| AncestryDNA (v1–v2+) | Yes — free, 5-min export | 7–15 | .txt inside .zip |
| MyHeritage DNA | Yes — free | 10–20 | .csv |
| FamilyTreeDNA (Family Finder) | Yes — free | 10–18 | .csv |
| LivingDNA / tellmeGen | Yes — free | Varies | .txt / .csv |
| Nebula Genomics (WGS) | Yes | All variants | .vcf.gz |
| Helix | No standard export | — | — |
If you already have a file from any of the first six providers above, you can run raw DNA analysis today. If you are deciding between tests to take now and plan to reuse the data for pharmacogenomics later, 23andMe v5 gives the widest PGx marker coverage for the money. For a deeper breakdown, see our AncestryDNA guide.
How to Download Your Raw DNA from 23andMe
- Sign in to your 23andMe account at 23andme.com.
- Open Settings — click your profile icon top-right and select “Settings”.
- Find Browse Raw Data at the bottom of the Settings page, under the “23andMe Data” section, then click the “Download” tab.
- Confirm and request — re-enter your password, answer the security question, and click “Submit Request”. 23andMe emails the link within about 5 minutes.
- Open the email, click the download link and save the .zip file. The file name starts with genome_ and ends in .zip. You do not need to unzip it before uploading.
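Because the export stays zipped, software can read the raw data straight out of the archive. A minimal sketch using Python's standard `zipfile` module, assuming the archive holds a single `.txt` or `.csv` data file; the helper name is hypothetical and not part of any provider's tooling:

```python
import zipfile
from typing import List


def read_raw_lines_from_zip(zip_source) -> List[str]:
    """Return the text lines of the first .txt or .csv member in a raw-data .zip.

    `zip_source` may be a file path or a binary file-like object.
    Assumes the export contains one data file (hypothetical helper).
    """
    with zipfile.ZipFile(zip_source) as archive:
        members = [n for n in archive.namelist() if n.lower().endswith((".txt", ".csv"))]
        if not members:
            raise ValueError("no .txt or .csv file found in archive")
        with archive.open(members[0]) as raw:
            # ZipFile.open yields bytes, so decode to text before splitting.
            return raw.read().decode("utf-8").splitlines()
```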
How to Download Your Raw DNA from AncestryDNA
- Sign in to ancestry.com.
- Open DNA settings — hover over “DNA” in the top navigation, click “Your DNA Results Summary”, then “Settings” in the upper right.
- Find “Download your raw DNA data” and click “Download”.
- Confirm — enter your password and agree to the download terms.
- Open the confirmation email and download the .zip file that contains your raw .txt data.
MyHeritage and FamilyTreeDNA follow a similar flow — look for “Manage DNA kits” or “Raw data” in your account settings. Most downloads are available in under a minute after you confirm.
What Raw DNA Analysis Can Tell You
Consumer genotyping arrays cover most of the common variants with clinical evidence in the main CPIC-actionable genes. Specifically, a typical 23andMe or AncestryDNA raw data file contains interpretable information about:
- Drug metabolism — how quickly or slowly your body processes roughly 150 common medications, including SSRIs, tricyclics, opioids, proton-pump inhibitors, and blood thinners. See our psychiatric medication report.
- Nutrient metabolism — MTHFR, COMT, VDR, BCMO1 variants affecting folate, vitamin D, beta-carotene and methylation. See the nutrition report.
- Pain and anesthesia response — CYP2D6 and CYP2C9 variants affecting codeine, tramadol, and NSAID response. See the pain report.
- Cannabis and THC metabolism — CYP2C9, AKT1, CNR1 variants that affect how edibles and THC-containing products hit you. See the cannabis report.
- Celiac and gluten sensitivity — HLA-DQ2 and HLA-DQ8 haplotype screening. See the celiac report.
- Common traits — caffeine metabolism, lactose tolerance, alcohol flush, bitter taste, earwax type, and more, included free with any upload.
Main Pharmacogenomic Genes Detectable in Consumer Raw Data
These are the genes with the strongest CPIC-level evidence where consumer arrays reliably capture the common phenotype-driving variants. Click any gene name for the deep-dive page with phenotype tables and medication interactions.
| Gene | Affects | Consumer array coverage |
|---|---|---|
| CYP2D6 | Codeine, tramadol, SSRIs, tricyclics, tamoxifen, atomoxetine | Common SNPs covered; copy-number changes not detected |
| CYP2C19 | Clopidogrel, omeprazole, pantoprazole, escitalopram, citalopram | Excellent — main star alleles covered |
| CYP2C9 | Warfarin, NSAIDs, siponimod, phenytoin | Good — *2, *3 variants covered |
| VKORC1 | Warfarin sensitivity | Yes |
| SLCO1B1 | Statins (simvastatin, rosuvastatin) myopathy risk | Yes |
| MTHFR | Folate methylation (not CPIC actionable for drugs) | Yes — C677T and A1298C |
What Raw DNA Analysis Cannot Tell You
Being honest about what consumer arrays cannot do is as important as knowing what they can. Four hard limitations apply to every raw DNA analysis service:
- Gene deletions and duplications — CYP2D6 in particular is known for copy-number variants (CYP2D6*5 deletion, CYP2D6*1xN duplications). Consumer SNP arrays cannot detect these. Clinical pharmacogenomic labs use targeted assays that can.
- Rare variants — if a variant is carried by fewer than ~1% of people, it is likely not on the array. This matters most for CYP2D6 (dozens of low-frequency but clinically significant alleles) and HLA-B (so polymorphic that the specific alleles relevant to abacavir, carbamazepine, and allopurinol usually need dedicated HLA typing).
- Non-genetic factors — drug response depends on age, organ function, other medications (phenoconversion), diet, and health conditions. Genetics is one input, not the whole picture.
- Diagnostic conclusions — raw DNA analysis is educational and not an FDA-cleared clinical diagnostic. Prescribing decisions should always involve a clinician. For high-stakes scenarios, a clinical PGx test is the right tool.
See our full limitations page for the complete breakdown.
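A practical consequence of these limits is that a missing position means "not tested", not "normal". A minimal sketch of that check, assuming genotypes have already been parsed into an `rsID -> genotype` dictionary with `None` for no-calls; the function name and the rsIDs in the example are illustrative:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def coverage_report(
    genotypes: Dict[str, Optional[str]], required_rsids: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Split the rsIDs a gene call needs into (called, no_call, not_on_chip)."""
    called: List[str] = []
    no_call: List[str] = []
    missing: List[str] = []
    for rsid in required_rsids:
        if rsid not in genotypes:
            missing.append(rsid)   # position absent from this chip: unknown, not "normal"
        elif genotypes[rsid] is None:
            no_call.append(rsid)   # on the chip, but the call failed
        else:
            called.append(rsid)
    return called, no_call, missing
```

A gene whose defining positions land in the second or third list is better reported as indeterminate than silently defaulted to the reference allele.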
Raw DNA Analysis vs Clinical Pharmacogenomic Testing
Both approaches produce pharmacogenomic insights. The trade-offs are cost, coverage, and turnaround. For most people exploring PGx for the first time or comparing their situation against published CPIC guidelines, raw DNA analysis is enough. For high-stakes clinical decisions, a provider-ordered clinical test is the right call.
| Dimension | Raw DNA Reuse (DecodeMyBio) | Clinical PGx (GeneSight, Genomind) |
|---|---|---|
| Cost | $19–$139 self-pay | $330–$2,000 self-pay |
| New sample needed | No — uses existing data | Yes — cheek swab |
| Turnaround | Minutes | 2–4 weeks |
| Prescriber required | No | Yes |
| Gene-deletion / duplication detection | No | Yes |
| Rare-variant detection | Limited | Yes |
| Insurance coverage | No | Often (especially Medicare Part B) |
| Regulatory status | Educational (not FDA-cleared) | CLIA-certified lab / LDT |
For the full pricing breakdown, see our pharmacogenomic testing cost guide and our GeneSight cost breakdown.
How DecodeMyBio Analyzes Your Raw Data
Once you upload, we do four things:
- Parse the file — we extract every SNP call, map rsIDs to chromosomal positions, and build a variant table specific to your file's chip version.
- Call star alleles — for pharmacogenes like CYP2D6 and CYP2C19 we infer the star-allele diplotype (e.g. *1/*2, one star allele per chromosome) from the combination of SNPs in your file, using published PGx haplotype definitions.
- Assign phenotypes — each diplotype maps to a metabolizer phenotype (poor, intermediate, normal, rapid, ultra-rapid) per CPIC allele function tables.
- Generate CPIC-aligned recommendations — each drug-gene pair in your target report is scored against the current CPIC guideline. Your PDF report states the gene, phenotype, drug, and clinical action.
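To make the star-allele and phenotype steps concrete, here is a deliberately simplified sketch for CYP2C19 using only three well-known positions: rs4244285 (*2), rs4986893 (*3), and rs12248560 (*17). It is not DecodeMyBio's production caller. Assumptions, all labeled in the code: genotypes are two-letter calls on the forward (plus) strand with `None` for no-calls; each variant allele defines its star allele on its own; heterozygous *2 and *17 calls are assumed to sit on opposite chromosomes, because arrays do not phase; and more than two variant alleles is returned as indeterminate. The phenotype rules follow CPIC's CYP2C19 categories: two no-function alleles = PM; one no-function allele, including *2/*17 = IM; *1/*17 = RM; *17/*17 = UM. Verify the allele definitions against PharmVar before any reuse.

```python
from typing import Dict, Optional, Tuple

# Assumed forward-strand variant alleles; verify against PharmVar before reuse.
CYP2C19_VARIANTS = {
    "rs4244285": ("A", "*2"),    # no function
    "rs4986893": ("A", "*3"),    # no function
    "rs12248560": ("T", "*17"),  # increased function
}
NO_FUNCTION = {"*2", "*3"}


def _star_sort_key(star: str) -> int:
    """Sort star alleles numerically, so *1 < *2 < *17."""
    return int(star.lstrip("*"))


def call_cyp2c19(genotypes: Dict[str, Optional[str]]) -> Tuple[Optional[str], str]:
    """Return (diplotype, phenotype); diplotype is None when indeterminate."""
    alleles = []
    for rsid, (alt, star) in CYP2C19_VARIANTS.items():
        gt = genotypes.get(rsid)
        if gt is None or len(gt) != 2:
            return None, "Indeterminate (position missing or no-call)"
        alleles.extend([star] * gt.count(alt))  # 0, 1 or 2 copies of this variant
    if len(alleles) > 2:
        return None, "Indeterminate (more than two variant alleles)"
    # Unphased assumption: fill the remaining chromosome(s) with reference *1.
    alleles += ["*1"] * (2 - len(alleles))
    a, b = sorted(alleles, key=_star_sort_key)
    n_null = sum(x in NO_FUNCTION for x in (a, b))
    n_inc = sum(x == "*17" for x in (a, b))
    if n_null == 2:
        phenotype = "Poor metabolizer"
    elif n_null == 1:
        phenotype = "Intermediate metabolizer"  # includes *2/*17 per CPIC
    elif n_inc == 2:
        phenotype = "Ultrarapid metabolizer"
    elif n_inc == 1:
        phenotype = "Rapid metabolizer"
    else:
        phenotype = "Normal metabolizer"
    return f"{a}/{b}", phenotype
```

Real callers consider many more positions and handle ambiguous combinations explicitly; the point here is only the shape of the pipeline: SNP calls, then star alleles, then a diplotype, then a phenotype.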
The report is reviewed for internal consistency, packaged as a PDF, and delivered to your dashboard and inbox. See our full methodology page for the technical detail and our data sources page for the references we use.
What Your Report Actually Looks Like
DecodeMyBio reports are designed to be usable at two levels — quick enough to scan in 5 minutes and detailed enough to hand to a prescriber. A typical Medication Safety Report includes:
- Risk snapshot — one-page summary of the most significant findings in your file.
- Medication checklist — 48 named medications across common therapeutic classes, each flagged with a clinical action (standard, adjust, avoid).
- Phenotype detail — star allele, metabolizer status, and clinical meaning for each analyzed gene.
- Clinician pocket summary — one-page overview formatted for your prescriber with CPIC citations.
Preview a full sample at our sample report page — no account or upload needed.
Raw DNA Analysis Pricing
| Report | Price | Covers |
|---|---|---|
| Celiac & Gluten Screening | $19 | HLA-DQ2 / DQ8 risk screen |
| Nutrition & Methylation | $39 | MTHFR, COMT, VDR, BCMO1, FUT2 |
| Medication Safety | $49 | 150+ drug-gene interactions, 14 PGx genes |
| Psychiatric Medication | $59 | SSRIs, SNRIs, tricyclics, antipsychotics |
| Essential Bundle | $99 | Medication Safety + Psychiatric + Nutrition |
| Complete Bundle | $139 | All 6 reports |
No subscription. Pay once, download forever. Compare to clinical PGx panels in the GeneSight vs Genomind guide.
Frequently Asked Questions
Is raw DNA analysis accurate?
For the variants it can see, yes — consumer array accuracy is typically 99%+ per SNP. The limitation is coverage, not precision. Consumer arrays test ~600,000–700,000 positions out of 3+ billion in the genome; within that set, calls are reliable.
Can I use raw DNA analysis to skip clinical PGx testing?
For informational and educational purposes, yes. For high-stakes prescribing decisions (starting warfarin, tamoxifen, clopidogrel with kidney disease, complex anesthesia), a provider-ordered clinical test is more appropriate because it covers rare variants and copy-number changes.
What if my raw data file is old?
Your genome does not change. 23andMe v3 (2010), v4 (2013), and v5 (2017) files are all valid. Newer chips just cover more pharmacogenomic markers, so a 2013 file may have fewer markers than a 2020 file — but what is there is still accurate.
What happens to my data after analysis?
Files are encrypted at rest and in transit. We retain raw data for 90 days to support re-analysis and customer support, then automatically delete it. We do not sell or share raw data with third parties. See our privacy policy for full details.
What if I do not have raw DNA data yet?
Take an inexpensive consumer test first — 23andMe and AncestryDNA start around $99. Combined with one DecodeMyBio report ($49), you are still well under clinical PGx pricing.
How long does analysis take?
File parsing and variant identification run in minutes. Once payment is confirmed, your PDF is available immediately in your dashboard and also sent by email.
Understanding Star Alleles and Metabolizer Phenotypes
Pharmacogenomic reports talk about your genotype using star alleles — notation like CYP2D6*1/*4 or CYP2C19*2/*17. This shorthand exists because most pharmacogenes have specific SNP combinations (haplotypes) that recur in populations. Each haplotype gets a star-number name (*1 is typically the reference, *2 onwards are the defined variants), and your two copies — one inherited from each parent — form your diplotype.
CPIC assigns each star allele a function based on in vitro and in vivo data. For CYP2D6, that function is expressed as an activity value (0 = no function, 0.25 or 0.5 = decreased, 1 = normal, more than 1 for duplicated functional alleles), and the sum of your two values is your activity score. Other genes, such as CYP2C19, are classified directly from allele function categories. The resulting phenotypes are:
- Poor metabolizer (PM) — CYP2D6 activity score 0. Little or no enzyme activity. Standard doses of drugs metabolized by this enzyme can reach toxic plasma levels; prodrugs such as codeine and tamoxifen are poorly activated.
- Intermediate metabolizer (IM) — CYP2D6 activity score 0.25–1.0. Reduced but not absent enzyme activity; dose adjustments may be warranted.
- Normal metabolizer (NM) — CYP2D6 activity score 1.25–2.25. Standard dosing applies.
- Rapid metabolizer (RM) — not a CYP2D6 category; used for CYP2C19, for example *1/*17 (one increased-function allele). Faster than normal breakdown; some drugs may be under-exposed.
- Ultra-rapid metabolizer (UM) — CYP2D6 activity score above 2.25, typically from gene duplications, or CYP2C19 *17/*17. Prodrugs may reach toxic levels of active metabolite (e.g. codeine → morphine). CYP2D6 ultra-rapid status is not detectable from consumer arrays because duplications are structural variants.
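The activity-score arithmetic is easy to express in code. A minimal sketch, assuming CPIC's CYP2D6 cut-offs (0 = PM, 0.25–1.0 = IM, 1.25–2.25 = NM, above 2.25 = UM). The allele values below are a small illustrative subset; a real implementation would load the full CPIC allele functionality table rather than hard-code it.

```python
from typing import Dict

# Illustrative subset of CPIC CYP2D6 allele activity values.
CYP2D6_ACTIVITY: Dict[str, float] = {
    "*1": 1.0, "*2": 1.0,    # normal function
    "*9": 0.5, "*41": 0.5,   # decreased function
    "*10": 0.25,             # decreased function (CPIC value 0.25)
    "*4": 0.0,               # no function
    "*5": 0.0,               # whole-gene deletion: consumer arrays cannot see it
}


def diplotype_score(allele_a: str, allele_b: str) -> float:
    """Sum the activity values of the two inherited alleles."""
    return CYP2D6_ACTIVITY[allele_a] + CYP2D6_ACTIVITY[allele_b]


def cyp2d6_phenotype(activity_score: float) -> str:
    """Map a CYP2D6 activity score to a CPIC phenotype."""
    if activity_score == 0:
        return "Poor metabolizer"
    if activity_score <= 1.0:
        return "Intermediate metabolizer"
    if activity_score <= 2.25:
        return "Normal metabolizer"
    return "Ultrarapid metabolizer"
```

For example, *1/*4 sums to 1.0 (intermediate) and *1/*10 sums to 1.25 (normal). All the values used here are exact in binary floating point, so the boundary comparisons behave as expected.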
Learn more about what poor metabolizer means and see the CYP2D6 deep-dive for a fully worked example.
The Science: How Consumer SNP Arrays Actually Work
Consumer DNA tests use a technology called SNP microarray genotyping. It is different from whole-genome sequencing in an important way: it only tests a specific, predetermined set of positions in your genome, not the full sequence. Each of those positions is a single-nucleotide polymorphism — a spot where the population has known variability (e.g. some people have an A, others have a G at a given position).
A typical 23andMe v5 chip (Illumina Global Screening Array) tests about 640,000 SNP positions. That sounds like a lot, but the human genome has over 3 billion base pairs. The chip is roughly sampling 0.02% of your genome — but it is sampling the positions researchers have identified as the most informative for ancestry, disease risk, and pharmacogenomics.
For each SNP, the chip produces a genotype call in one of three forms: homozygous reference (e.g. AA), heterozygous (AG), or homozygous alternate (GG). Positions where the signal is ambiguous are reported as no-calls (shown as -- in 23andMe files). Call accuracy is typically 99%+ per SNP for common variants. The limitations are:
- Array content is fixed — if a SNP is not on the chip, it is not in your file. Chip versions vary across years: 23andMe v3 (2010), v4 (2013), v5 (2017) progressively expanded pharmacogenomic coverage.
- Structural variants are invisible — deletions, duplications, and large rearrangements (critical for CYP2D6) do not show up. You see individual SNP calls but not the architecture of the gene.
- Rare variants are underrepresented — array content skews toward population-common variants. Very rare but clinically important alleles (especially relevant for HLA typing, CYP2D6 sub-alleles) often are not covered.
- Ethnic representation varies — chips were historically designed with European-ancestry populations. Variants more common in African, East Asian, or Indigenous populations are improving with newer chips but coverage is still uneven.
For most pharmacogenomic applications this is acceptable. The major CPIC-actionable phenotypes in CYP2C19, CYP2C9, VKORC1, SLCO1B1, and CYP2D6 (excluding copy-number) are driven by SNPs that are covered.
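Classifying each call into one of those three forms is mechanical once the reference allele is known. A minimal sketch, assuming the reference allele and the genotype are reported on the same strand and that no-calls arrive as `--` (the 23andMe convention) or `None`. The function name is hypothetical.

```python
from typing import Optional


def classify_call(genotype: Optional[str], ref: str) -> str:
    """Classify a two-letter diploid call relative to the reference allele."""
    if genotype is None or genotype == "--" or len(genotype) != 2:
        return "no-call"
    a, b = genotype[0], genotype[1]
    if a == b == ref:
        return "homozygous reference"
    if a == b:
        return "homozygous alternate"
    if ref in (a, b):
        return "heterozygous"
    # Two different non-reference alleles (rare at array positions).
    return "heterozygous, both alleles non-reference"
```

The same-strand assumption matters: if a file reports a position on the opposite strand from the reference table, a reference call can look like "homozygous alternate", which is why pipelines normalize strand before classifying.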
Phenoconversion: When Drug Interactions Override Your Genetics
Your genetic metabolizer phenotype is what your genes predict — but what happens in your body is modified by everything else you are taking. Phenoconversion is the phenomenon where a drug interaction changes your effective phenotype, often dramatically.
A classic example: fluoxetine, paroxetine, and bupropion are strong CYP2D6 inhibitors. A person whose genotype predicts CYP2D6 normal metabolism can, while taking one of these, behave functionally as a CYP2D6 poor metabolizer. This matters for any drug metabolized by CYP2D6: codeine will not convert efficiently to morphine, tamoxifen will form less endoxifen, and tricyclics can accumulate.
Raw DNA analysis reports your genetic phenotype. It does not know what other medications you are on. If you are taking a strong inhibitor or inducer of a pharmacogene, your effective phenotype may be shifted one or two categories — a shift a clinician should factor into prescribing decisions. See our limitations page for the list of common CYP inhibitors and inducers to watch for.
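One convention used in the pharmacogenomics literature to approximate phenoconversion for CYP2D6 is to multiply the genetic activity score by 0 for a strong inhibitor and by 0.5 for a moderate inhibitor, then re-derive the phenotype. The sketch below applies that convention. The multipliers, the inhibitor table (limited to the three strong inhibitors named above), and the function name are illustrative assumptions and are not part of the DecodeMyBio report, which reports genetic phenotype only.

```python
from typing import Dict, Iterable

# Assumed convention: strong inhibitor -> x0, moderate -> x0.5, none -> x1.
MULTIPLIER = {"strong": 0.0, "moderate": 0.5, "none": 1.0}

# Illustrative table: only the three strong CYP2D6 inhibitors named in the text.
CYP2D6_INHIBITOR_STRENGTH: Dict[str, str] = {
    "fluoxetine": "strong",
    "paroxetine": "strong",
    "bupropion": "strong",
}


def phenoconverted_score(genetic_score: float, current_drugs: Iterable[str]) -> float:
    """Return a CYP2D6 activity score adjusted for the strongest listed inhibitor."""
    factor = 1.0
    for drug in current_drugs:
        strength = CYP2D6_INHIBITOR_STRENGTH.get(drug.strip().lower(), "none")
        factor = min(factor, MULTIPLIER[strength])  # the strongest inhibitor wins
    return genetic_score * factor
```

A genetic normal metabolizer with a score of 2.0 who takes paroxetine drops to 0, which corresponds to the poor-metabolizer category. This is exactly the shift a clinician would weigh.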
Common Misconceptions About Raw DNA Analysis
“Consumer DNA tests are not accurate enough for medical use.”
Per-SNP accuracy on a modern 23andMe or AncestryDNA chip is typically 99%+ for common variants. The limitation is coverage (what the chip can see), not accuracy (whether calls are correct). For the variants that drive most CPIC-actionable phenotypes, consumer arrays are reliable.
“I already have 23andMe health reports — I don't need anything else.”
23andMe reports on a curated subset of variants in a limited set of conditions. The pharmacogenomic variants in your file — CYP2D6, CYP2C19, CYP2C9, VKORC1, SLCO1B1 — are largely not part of the 23andMe health section. Raw data analysis reveals what your testing company chose not to report on.
“If I have an MTHFR variant, I need special supplements.”
The supplement industry heavily markets to people with MTHFR variants, but the CDC, ACOG, and AAFP all recommend the same 400 mcg daily folic acid dose for people who could become pregnant, regardless of MTHFR status. Evidence for methylfolate superiority in the general population is weak.
“A normal metabolizer result means I can ignore PGx.”
“Normal” at one gene does not mean normal at all relevant genes. A CYP2D6 normal metabolizer who is also a CYP2C19 poor metabolizer still has significant drug response implications for omeprazole, clopidogrel, and escitalopram. Pharmacogenomics is multi-gene.
Consumer Arrays vs Whole Genome Sequencing (WGS)
If you are deciding whether to take a consumer DNA test (array-based) or invest in whole genome sequencing, here is the practical comparison for pharmacogenomic purposes:
Consumer arrays (23andMe, AncestryDNA) cost $79–$199 and test ~640,000–900,000 specific SNPs. For pharmacogenomics, they cover the common variants that drive 90–95% of CPIC-actionable phenotypes in major European-ancestry populations. Call accuracy is excellent. Limitation: no copy-number, limited rare variants.
Whole genome sequencing (Nebula, Dante Labs) costs $200–$500+ for 30× coverage and reads nearly every base in your genome. For pharmacogenomics it can capture rare variants that arrays miss, although CYP2D6 remains hard to call from standard short-read data because of its near-identical neighbor, the CYP2D7 pseudogene. It also produces a VCF file that raw-data analysis services can process. Limitations: cost, data management complexity, and analytical overkill for people who only want PGx insights.
Practical take: for pharmacogenomic insights specifically, a consumer array plus raw-data reuse is the cost-effective path. WGS is worth it if you want comprehensive health-risk analysis or plan to do ongoing research with your genome. DecodeMyBio accepts VCF files from WGS services, so you are not locked out either way.
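WGS genotypes arrive as VCF records, where the GT field stores allele indexes (0 = REF, 1 = first ALT, and so on) rather than letters. Converting a record into the two-letter form a 23andMe file uses is straightforward. The sketch below assumes an uncompressed, single-sample VCF data line containing SNPs, and it ignores phasing (`|` vs `/`). The function name is hypothetical; production pipelines would typically use a dedicated VCF library.

```python
from typing import Optional, Tuple


def vcf_line_to_call(line: str) -> Tuple[str, Optional[str]]:
    """Return (ID, genotype letters) for one single-sample VCF data line."""
    fields = line.rstrip("\n").split("\t")
    variant_id, ref, alt = fields[2], fields[3], fields[4]
    fmt_keys = fields[8].split(":")        # e.g. "GT:DP"
    sample_vals = fields[9].split(":")     # e.g. "0/1:30"
    gt = dict(zip(fmt_keys, sample_vals)).get("GT", ".")
    alleles = [ref] + alt.split(",")       # index 0 = REF, 1.. = ALT alleles
    indexes = gt.replace("|", "/").split("/")  # drop phasing information
    if any(i == "." for i in indexes):
        return variant_id, None            # missing call
    return variant_id, "".join(alleles[int(i)] for i in indexes)
```

After this conversion, a WGS genotype can feed the same downstream steps as an array call.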
Who Benefits Most (and Least) from Raw DNA Analysis
Raw DNA analysis is an informational tool, not a one-size-fits-all recommendation. Here is who tends to get the most value:
High value:
- People who have tried multiple antidepressants without success — pharmacogenomic data often explains why.
- People who experience unusually strong side effects at normal doses of common drugs (codeine, tramadol, certain statins, PPIs).
- People starting a medication with a narrow therapeutic index (warfarin, phenytoin, tacrolimus) who want to discuss baseline genetic risk with their prescriber.
- People on multiple medications who want a written record to review for potential drug-gene interactions.
- People who already have a 23andMe or AncestryDNA file and want to get more value from it.
Lower value:
- People facing an immediate high-stakes prescribing decision (e.g. complex oncology dosing) — a provider-ordered clinical PGx test that also detects copy-number and rare variants is more appropriate.
- People looking for a broad disease-risk screen — raw DNA analysis focuses on drug-gene interactions, not polygenic disease risk.
- People without existing consumer DNA data who are not willing to take a consumer test first — starting with a provider-ordered clinical PGx test through insurance may be cheaper.
Our methodology page details exactly which genes and variants we analyze, and our limitations page is honest about what we cannot tell you.
Ready to Decode Your DNA?
Upload your 23andMe, AncestryDNA, MyHeritage, or FamilyTreeDNA file in about two minutes. You'll get 9 free trait insights while we analyze your variants, and you can preview the report before you pay.