Precursor Evidence Heatmap (Top 50 Most Variable)
Cumulative Detection Curve
Sample Clustering by Detection Pattern (Jaccard Distance)
Precursor Count per Protein (per Sample)
Per-Group Replicate Statistics
Contaminant Protein Analysis
Per-Sample Contaminant Intensity
Top Contaminant Proteins
Contaminant Expression Heatmap (Top 20)
Data Explorer
Export for Claude
Abundance Profiles (Quartile Analysis)
Variable Proteins (Quartile Range >= 2)
Sample-Sample Scatter
AI-Powered Analysis Summary
How it works: Click the button below to generate a comprehensive AI-powered analysis across all comparisons in your experiment.
The AI will identify key DE proteins per comparison, cross-comparison biomarkers, and provide biological insights on high-confidence candidates.
Configure Comparison
report_log.txt or the SLURM .out file from each DIA-NN search. Only the command line and summary stats are read.
AI-Powered Comparison Analysis
Generate an AI narrative summary or export data for external analysis.
MOFA2 Factor Decomposition
Treats Run A and Run B as two views of the same samples and decomposes joint variance into shared and run-specific factors.
Note: QC Stats (with Groups) + Top 800 Expression Data are sent to AI.
De Novo Search
Standard tryptic database search + Casanovo de novo sequencing + DIAMOND BLAST. Use this for protein discovery in any species — including ancient or non-model organisms where Casanovo's novel peptides + BLAST cascade are the value-add.
- Enzyme: Trypsin/P, 7–50 AA
- Variable mods: ox(M), N-term acetyl
- Downstream: species attribution, BLAST alignments, coverage maps, deamidation tracking
Peptidomics — endogenous peptides
Nonspecific search for endogenous peptides (no enzymatic digestion). Use this for neuropeptides, milk peptides, antimicrobial peptides, or any analysis where peptides reach the mass spectrometer already cleaved by endogenous proteases.
- Enzyme: none (cleave_at = ""), 5–25 AA, 400–5000 Da
- Variable mods: ox(M), pyro-Glu (Q/E N-term), C-term amidation, N-term acetyl
- Downstream: peptide-source-protein chart, N-/C-term cleavage motifs, PTM landscape
Nonspecific search is ~10–50× slower than tryptic. Walltime auto-bumped to 8 h.
HLA / MHC — immunopeptidomics
MHC class I and II peptide identification. Nonspecific search with class-specific length and charge windows. Use this for immunopeptidome studies, neoantigen discovery, or vaccine target ID.
- Enzyme: none (cleave_at = "")
- Variable mods: ox(M), deamidation (N/Q)
- Charge range: 1–3 (z=1 dominant on TOF instruments)
- Downstream: length histogram, P2/PΩ anchor logos, source-protein analysis
Load DDA / de novo results by uploading a shared ZIP
Manuscript Summary Statistics (Table 1)
Peptide length distribution
Tryptic peptides typically span 7–25 aa.
HLA anchor residue frequencies
P2 + PΩ are the dominant anchor positions for MHC-I. Allele preferences fingerprint the donor's HLA type (e.g. A*02:01 → L at P2 + L/V at PΩ).
Cleavage flanking residues
N- and C-terminal residue percentages, which fingerprint the endogenous protease activity that produced these peptides. High C-term K/R = trypsin contamination.
total in one file but 0 in others are candidate condition-specific. Sort/filter on the column toolbar; use Copy/CSV/Excel to export.
Top Proteins by De Novo Peptide Count
Taxonomic Coverage
Identity of each peptide across the top species, grouped by source protein. Reveals patterns like conserved vs species-specific protein regions.
Show full peptide-species identity matrix
Peptide Length Distribution
Charge State Distribution
Post-Translational Modifications
Modification analysis from de novo sequences. In paleoproteomics, high N-deamidation with low Q-deamidation indicates genuine ancient protein (time-dependent asparagine degradation).
Select a near-match peptide from the table below, then click Show Alignment to visualize mismatches with per-residue confidence. Green = genuine variant (AA score > 0.95), Red = possible sequencing error (AA score < 0.70).
This view cross-references BLAST mismatches with Casanovo's per-residue amino acid confidence scores to distinguish species-specific markers from sequencing artifacts.
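The colour rule can be sketched as follows. This is a minimal illustration, assuming an ungapped alignment of equal length and one Casanovo score per query residue. The function name, the tuple layout, and the "ambiguous" label for scores between the two thresholds are assumptions, not the app's actual code.

```python
def classify_mismatches(query, subject, aa_scores, hi=0.95, lo=0.70):
    """Label each mismatching position by the de novo per-residue confidence.

    query     -- de novo peptide sequence
    subject   -- aligned BLAST subject sequence (same length, no gaps assumed)
    aa_scores -- per-residue confidence for `query`, one float per residue
    """
    calls = []
    for pos, (q, s, score) in enumerate(zip(query, subject, aa_scores), start=1):
        if q == s:
            continue  # matching residues need no call
        if score > hi:
            label = "genuine variant"            # shown green
        elif score < lo:
            label = "possible sequencing error"  # shown red
        else:
            label = "ambiguous"                  # assumed middle category
        calls.append((pos, s, q, label))
    return calls


calls = classify_mismatches(
    "PEPTIDEK", "PEPSIDER",
    [0.99, 0.99, 0.99, 0.97, 0.99, 0.99, 0.99, 0.50],
)
```

Here the confident T at position 4 is called a genuine variant, while the low-scoring K at position 8 is flagged as a possible sequencing error.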
How we put an FDR on a de novo species call
De novo peptides are BLASTed against NCBI nr to identify the species — but what is the false-discovery rate when the organism may be absent from the database? We adapt the NovoBoard decoy-spectra method (Tran et al. 2024) to the homology search. Here is the whole process, end to end.
For every real MS/MS spectrum we keep ~20% of its peaks and replace the other ~80% with peaks drawn at random from the global peak pool — keeping the precursor m/z and charge. The result looks like a real spectrum but encodes no real peptide.
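A minimal sketch of this decoy construction, assuming each spectrum is a dict with `precursor_mz`, `charge`, and a list of `(mz, intensity)` peaks, and that the global peak pool is a flat list of such peaks. These names and the data layout are illustrative assumptions, not the pipeline's real format.

```python
import random


def make_decoy(spectrum, peak_pool, keep_fraction=0.2, rng=None):
    """Build a decoy spectrum with the same peak count and precursor.

    Keeps roughly `keep_fraction` of the real peaks and fills the rest
    with peaks drawn at random from `peak_pool`.
    """
    rng = rng or random.Random()
    peaks = spectrum["peaks"]
    n_keep = round(len(peaks) * keep_fraction)
    kept = rng.sample(peaks, n_keep)  # real peaks that survive
    noise = [rng.choice(peak_pool) for _ in range(len(peaks) - n_keep)]
    return {
        "precursor_mz": spectrum["precursor_mz"],  # unchanged
        "charge": spectrum["charge"],              # unchanged
        "peaks": sorted(kept + noise),             # re-sorted by m/z
    }
```

Passing a seeded `random.Random` makes the decoy set reproducible between runs.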
The identical Casanovo model and settings run on both sets, giving a population of real de novo peptides and a matched population of decoy de novo peptides — a true 1:1 null with the same number of queries.
Both populations are BLASTed against nr. Real peptides hit far more often than decoy peptides at every Casanovo score; the cumulative decoy ÷ real hit ratio is the FDR. Worked example below (ocelot, Leopardus pardalis, 9 runs):
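The ratio itself can be sketched in a few lines. This assumes "cumulative" means counting every peptide at or above a given Casanovo score, and that the inputs are the scores of real and decoy peptides that each obtained at least one nr hit. The function name and the toy scores are illustrative only and are not the ocelot data.

```python
def decoy_fdr_curve(real_hit_scores, decoy_hit_scores):
    """Return [(score_threshold, fdr), ...] from the highest score down.

    Because the decoy set is a 1:1 match of the real set, the estimate at
    each threshold is simply decoy hits / real hits at or above it.
    """
    real_sorted = sorted(real_hit_scores, reverse=True)
    decoy_sorted = sorted(decoy_hit_scores, reverse=True)
    curve = []
    d = 0  # decoy hits counted so far
    for r, score in enumerate(real_sorted, start=1):
        # For tied scores, emit one point at the last tied peptide.
        if r < len(real_sorted) and real_sorted[r] == score:
            continue
        while d < len(decoy_sorted) and decoy_sorted[d] >= score:
            d += 1
        curve.append((score, min(1.0, d / r)))
    return curve


curve = decoy_fdr_curve([0.99, 0.95, 0.90, 0.80], [0.92, 0.50])
```

In this toy input the first decoy hit enters at a threshold of 0.90, where the estimate becomes 1 decoy per 3 real hits.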
The decoy-spectra FDR only controls chance homology. Two further errors need their own controls: (1) de novo sequence error: gate with the Casanovo confidence slider (≈21% error at conf ≥0.95, ≈12% at ≥0.99); (2) species mis-assignment from conserved peptides: use the Species (LCA) pane (lowest common ancestor across all hits, not the single best hit). Full validation: docs/DENOVO_FDR_VALIDATION.md.
Live view — your loaded dataset
Hit rate = % of unique de novo peptides in each Casanovo-score bin with ≥1 nr BLAST hit (per peptide, e-value ≤1). Loads denovo/blast_results_decoy_spectra.tsv (falls back to the legacy shuffled decoy).
Lowest-common-ancestor species attribution from the nr BLAST. Each de novo peptide is placed at the deepest taxon shared by its top hits: species/genus = diagnostic, family+ = conserved (not species-attributed), bacterial/viral = microbiome.
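The LCA placement can be sketched as a longest-common-prefix over the hit lineages. This assumes each BLAST hit has already been resolved to a root-to-leaf list of `(rank, name)` pairs; the function names and the abbreviated lineages are illustrative, and the bacterial/viral "microbiome" category is left out for brevity.

```python
DIAGNOSTIC_RANKS = {"species", "genus"}


def lowest_common_ancestor(lineages):
    """Return the shared root-to-LCA prefix of several lineages."""
    shared = []
    for column in zip(*lineages):
        if len(set(column)) != 1:
            break  # hits diverge at this rank
        shared.append(column[0])
    return shared


def classify(lca):
    """Map an LCA path to the diagnostic / conserved labels."""
    if not lca:
        return "unassigned"
    rank, _name = lca[-1]
    return "diagnostic" if rank in DIAGNOSTIC_RANKS else "conserved"


# Abbreviated, illustrative lineages for two felid hits.
ocelot = [("class", "Mammalia"), ("order", "Carnivora"), ("family", "Felidae"),
          ("genus", "Leopardus"), ("species", "Leopardus pardalis")]
cat = [("class", "Mammalia"), ("order", "Carnivora"), ("family", "Felidae"),
       ("genus", "Felis"), ("species", "Felis catus")]

lca = lowest_common_ancestor([ocelot, cat])
```

A peptide that hits both species resolves only to the family level and is therefore treated as conserved, while a peptide hitting the ocelot alone stays diagnostic.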
Per-peptide LCA
One row per de novo peptide combining all three evidence streams: Casanovo confidence, whether Sage found it in the database, and the nr BLAST species/clade (LCA). Rows scoring below the confidence slider at the top of the page are hidden: low-confidence calls are excluded by default, but every peptide is one slider-click away.
The de novo peptides assembled into proteins by the same parsimony model FragPipe (ProteinProphet) and IDPicker use — the minimal protein set explaining the peptides, with razor peptide assignment and indistinguishable-protein grouping. Click a protein to pop out a per-residue coverage map (reference vs de novo, full-screen-able) that colours amino-acid substitutions by Casanovo confidence. Honours the confidence slider above.
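The parsimony idea can be sketched with a greedy set cover. This is a simplified stand-in, not the ProteinProphet or IDPicker algorithm: proteins with identical peptide sets are merged into one group, groups are picked by how many still-unexplained peptides they cover, and each shared peptide is credited to the first group that claims it (a razor-style rule). All names here are hypothetical.

```python
from collections import defaultdict


def parsimony(protein_to_peptides):
    """Return [(protein_group, claimed_peptides), ...] covering all peptides."""
    # 1. Indistinguishable proteins share one peptide set, so group them.
    groups = defaultdict(list)
    for protein, peptides in sorted(protein_to_peptides.items()):
        groups[frozenset(peptides)].append(protein)

    # 2. Greedily pick the group explaining the most unexplained peptides.
    unexplained = set().union(*groups)
    result = []
    while unexplained:
        best = max(groups, key=lambda peps: len(peps & unexplained))
        claimed = best & unexplained  # shared peptides go to the first claimant
        result.append((groups[best], sorted(claimed)))
        unexplained -= claimed
    return result


inferred = parsimony({
    "P1": ["a", "b", "c"],
    "P2": ["c", "d"],
    "P3": ["a", "b", "c"],  # indistinguishable from P1
    "P4": ["b"],            # fully explained by P1/P3, so dropped
})
```

Greedy selection does not guarantee the true minimum set in every case, but it shows why subsumed proteins such as `P4` disappear from the report.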
Export Complete Analysis
Download everything needed to reproduce and share this analysis. Includes all data files, DIA-NN search parameters, and session state.
What's included (click to expand)
- expression_matrix.csv -- Normalized protein intensities (pipeline-aware: DPC-Quant complete, or MaxLFQ with NAs)
- DE_Results_Full.csv -- All contrasts × all proteins with logFC, P.Value, adj.P.Val (when DE was run)
- QC_Metrics.csv -- Per-sample QC metrics + group labels (when QC stats exist)
- Phospho_DE_Results.csv -- Site-level phospho DE (when phosphoproteomics was run)
- diann_pg_matrix.tsv -- DIA-NN protein-level matrix with real missing values (0 = not detected, ~200 KB)
- data_quality_summary.csv -- Per-sample protein counts, % detected, contaminant counts
- detection_matrix.csv -- Per-protein precursor detection counts per sample
- quartile_profiles.csv -- Intensity quartile assignments per sample
- variable_proteins.csv -- Proteins with inconsistent abundance across samples
- sample_metadata.csv / group_assignments.csv -- Sample groups and identifiers
- contaminant_summary.csv -- Contaminant protein statistics
- search_info.md -- Full DIA-NN search parameters and job metadata
- session.rds -- Complete session state (reload in DE-LIMP)
- methods.txt / parameters.txt -- Pipeline parameters, normalization, app version
- reproducibility_log.R -- R code log + sessionInfo() to reproduce every step
- figures/ -- 9 publication-quality SVG figures: volcano, heatmap_top20, violin_top10_up/down, pca, qc_group_distribution, normalization_density, data_completeness, sample_correlation, pvalue_distribution
- PROMPT.md -- AI analysis prompt with biological questions and figure-reference instructions (DE-aware)
- MANIFEST.txt -- Per-section export status (any skipped files explained here)
DE Results Table
Quick export of the DE results for the selected comparison. Includes gene symbols, logFC, P.Value, adj.P.Val, and per-sample expression values. One CSV file — no search parameters or session data.
Export Results CSV
CV Analysis
Coefficient of variation for significant proteins. Includes per-group CV and average CV values. One CSV file.
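For reference, a per-group CV can be sketched as below. This assumes linear-scale (not log-transformed) intensities, the sample standard deviation, and a CV expressed in percent; the function name and input layout are illustrative and may differ from what the app computes.

```python
from statistics import mean, stdev


def group_cv(intensities_by_group):
    """Return ({group: CV%}, average CV%) for one protein.

    Groups with fewer than two replicates are skipped because a
    standard deviation cannot be estimated from a single value.
    """
    cvs = {
        group: 100 * stdev(values) / mean(values)
        for group, values in intensities_by_group.items()
        if len(values) >= 2
    }
    average = mean(cvs.values()) if cvs else None
    return cvs, average


cvs, avg_cv = group_cv({"control": [100, 110, 90], "treated": [200, 200, 200]})
```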
Export CV Analysis CSV
Full DIA-NN Output
The complete DIA-NN search output (report.parquet, precursor matrices, spectral libraries, logs) is stored on the HPC cluster. These files can be large (100 MB+) and are not included in the analysis export.
- Action name - what you did (e.g., 'Run Pipeline')
- Timestamp - when you did it
- R code - how to reproduce it
Copy this entire code block to reproduce your analysis in a fresh R session.
DE-LIMP
Differential Expression — LIMPA Pipeline
Explore video tutorials, training courses, and methodology citations.