DE-LIMP (Hugging Face)

Sort Order:

Run Order | Group

(Applies to Sample Metrics)

Precursor Evidence Heatmap (Top 50 Most Variable)

Cumulative Detection Curve

Sample Clustering by Detection Pattern (Jaccard Distance)

Precursor Count per Protein (per Sample)

Faceted Overlay Metrics

By Run | By Group

No TIC data available. Extract from the Search tab before searching, or use the button below to extract from an existing output directory.

CSV

Box Plots | Density Overlay

Color by:

Metric:

Viewing Comparison:

Tip: Assign experimental groups (required). Covariate columns are optional - customize names and include in model as needed.

Export

Optional covariates

In model

Column name (click to rename)

Tick “In model” to adjust DE for that factor (e.g., batch effects). The text box renames the column — fill values for each sample in the table below.

Overlay Contaminants

CSV

Per-Group Replicate Statistics

Contaminant Protein Analysis

Per-Sample Contaminant Intensity

Top Contaminant Proteins

Contaminant Expression Heatmap (Top 20)

💾 Export Full Table

Data Explorer

Export for Claude

Abundance Profiles (Quartile Analysis)

Proteins split into quartiles by average intensity. Colors show per-sample quartile assignment. Proteins that change quartile across samples may be biologically interesting.

Exclude contaminants

Variable Proteins (Quartile Range >= 2)
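The quartile view and the "Quartile Range >= 2" filter can be illustrated with a minimal sketch. It assumes quartiles are assigned within each sample and that the quartile range is the highest minus the lowest quartile a protein reaches across samples; the app's exact rule may differ. The function names and the input layout (`{sample: {protein: intensity}}`) are hypothetical.

```python
from statistics import quantiles

def sample_quartiles(values):
    """Assign each protein in one sample to quartile 1-4 (1 = lowest intensity)."""
    q1, q2, q3 = quantiles(list(values.values()), n=4)
    return {prot: 1 + (v > q1) + (v > q2) + (v > q3) for prot, v in values.items()}

def variable_proteins(by_sample, min_range=2):
    """Return {protein: (lowest quartile, highest quartile)} for proteins whose
    quartile assignment spans at least `min_range` across samples."""
    per_sample = {s: sample_quartiles(v) for s, v in by_sample.items()}
    shared = set.intersection(*(set(q) for q in per_sample.values()))
    result = {}
    for prot in shared:
        qs = [per_sample[s][prot] for s in per_sample]
        if max(qs) - min(qs) >= min_range:
            result[prot] = (min(qs), max(qs))
    return result

# Toy data (made up): P1 and P8 swap ends of the intensity range between samples.
demo = {
    "S1": {f"P{i}": float(i) for i in range(1, 9)},
    "S2": {**{f"P{i}": float(i) for i in range(2, 8)}, "P1": 8.0, "P8": 1.0},
}
print(variable_proteins(demo))
```
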

Sample-Sample Scatter

Sample A

Sample B

Label outliers (>4-fold)

Exclude contaminants

AI-Powered Analysis Summary

How it works: Click the button below to generate a comprehensive AI-powered analysis across all comparisons in your experiment.

The AI will identify key DE proteins per comparison, cross-comparison biomarkers, and provide biological insights on high-confidence candidates.

Download as Markdown | Export for Claude

Heatmap of Selected/Top Proteins

PNG | SVG

💾 Export

Color by:

Axes:

PNG

CSV

Distribution of Coefficient of Variation (CV) for significant proteins, broken down by experimental group.
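As a rough illustration of the per-group CV, the sketch below computes CV (%) as standard deviation divided by mean for one protein. It assumes linear-scale (not log2) intensities and at least two replicates per group; `group_cv` and its inputs are hypothetical names, not the app's code.

```python
from statistics import mean, stdev

def group_cv(intensities, groups):
    """intensities: {sample: linear intensity} for one protein.
    groups: {sample: group label}.
    Returns {group: CV in percent}, skipping groups with fewer than 2 values."""
    by_group = {}
    for sample, value in intensities.items():
        by_group.setdefault(groups[sample], []).append(value)
    return {
        g: 100.0 * stdev(vals) / mean(vals)
        for g, vals in by_group.items()
        if len(vals) >= 2
    }

groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
cv = group_cv({"a1": 90.0, "a2": 100.0, "a3": 110.0, "b1": 50.0, "b2": 50.0}, groups)
print(cv)
```
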

PNG

Proteins detected in one group AND completely missing from the other have no finite logFC, so limma silently drops them from the volcano plot. They're listed here as presence/absence calls. This is most relevant under the MaxLFQ + limma pipeline (DPC-Quant fills missing values, so this list will normally be empty there).

Detected in ≥ N samples of one group:
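A presence/absence call of this kind can be sketched as follows, assuming a two-group comparison in which missing values are stored as `None` or NaN. The function names and data layout are hypothetical; `min_detected` stands in for the N threshold above.

```python
import math

def is_detected(value):
    """A value counts as detected if it is neither None nor NaN."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))

def presence_absence_calls(matrix, groups, min_detected=2):
    """matrix: {protein: {sample: intensity or None/NaN}}.
    groups: {sample: group label}, exactly two distinct labels.
    Returns {protein: group in which it is exclusively detected}."""
    names = sorted(set(groups.values()))
    if len(names) != 2:
        raise ValueError("expected exactly two groups")
    a, b = names
    calls = {}
    for protein, row in matrix.items():
        n = {a: 0, b: 0}
        for sample, value in row.items():
            if is_detected(value):
                n[groups[sample]] += 1
        # Present in >= N samples of one group and in none of the other.
        if n[a] >= min_detected and n[b] == 0:
            calls[protein] = a
        elif n[b] >= min_detected and n[a] == 0:
            calls[protein] = b
    return calls

groups = {"c1": "ctrl", "c2": "ctrl", "t1": "trt", "t2": "trt"}
matrix = {
    "P1": {"c1": 10.0, "c2": 11.0, "t1": None, "t2": float("nan")},
    "P2": {"c1": 9.0, "c2": 9.5, "t1": 8.0, "t2": 8.2},
    "P3": {"c1": None, "c2": None, "t1": 5.0, "t2": None},
}
print(presence_absence_calls(matrix, groups))
```
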

CSV

Enrichment analysis on DE results. Auto-detects organism. Results cached per ontology.

PNG

CSV

Configure Comparison

Comparison Type

DE-LIMP vs DE-LIMP

DE-LIMP vs Spectronaut

DE-LIMP vs FragPipe

Run A Source

Current session

Load from file

Load Run A (.rds)

Load Run B (.rds)

Spectronaut Export (.zip)

Or upload individual files...

Protein Quantities (.tsv)

Candidates / DE Stats (.tsv, optional)

Spectronaut Setup Guide: How to export from Spectronaut

FragPipe output type

FragPipe-Analyst DE export

combined_protein.tsv (intensities only)

FragPipe-Analyst DE results (.csv/.tsv)

combined_protein.tsv

Layers 1-3 only. Run FragPipe-Analyst for full DE comparison.

Attach DIA-NN log files (optional — fills in search parameters)

Upload report_log.txt or the SLURM .out file from each DIA-NN search. Only the command line and summary stats are read.

Run A — DIA-NN log

Run B — DIA-NN log

Protein Details

Export CSV

AI-Powered Comparison Analysis

Generate an AI narrative summary or export data for external analysis.

Export ZIP for Claude Analysis

MOFA2 Factor Decomposition

Treats Run A and Run B as two views of the same samples and decomposes joint variance into shared and run-specific factors.

Chat with Full Data (QC + Expression)

💾 Save Chat | Export for Claude

Note: QC Stats (with Groups) + Top 800 Expression Data are sent to AI.

De Novo Search

Standard tryptic database search + Casanovo de novo sequencing + DIAMOND BLAST. Use this for protein discovery in any species — including ancient or non-model organisms where Casanovo's novel peptides + BLAST cascade are the value-add.

Enzyme: Trypsin/P, 7–50 AA
Variable mods: ox(M), N-term acetyl
Downstream: species attribution, BLAST alignments, coverage maps, deamidation tracking

Peptidomics — endogenous peptides

Nonspecific search for endogenous peptides (no enzymatic digestion). Use this for neuropeptides, milk peptides, antimicrobial peptides, or any analysis where peptides reach the mass spectrometer already cleaved by endogenous proteases.

Enzyme: none (cleave_at = ""), 5–25 AA, 400–5000 Da
Variable mods: ox(M), pyro-Glu (Q/E N-term), C-term amidation, N-term acetyl
Downstream: peptide-source-protein chart, N-/C-term cleavage motifs, PTM landscape

Nonspecific search is ~10–50× slower than tryptic. Walltime auto-bumped to 8 h.

HLA / MHC — immunopeptidomics

MHC class I and II peptide identification. Nonspecific search with class-specific length and charge windows. Use this for immunopeptidome studies, neoantigen discovery, or vaccine target ID.

Class I (8–12 AA, 700–1500 Da) | Class II (13–25 AA, 1300–3000 Da)

Enzyme: none (cleave_at = "")
Variable mods: ox(M), deamidation (N/Q)
Charge range: 1–3 (z=1 dominant on TOF instruments)
Downstream: length histogram, P2/PΩ anchor logos, source-protein analysis

Load DDA / de novo results by uploading a shared ZIP

Download dataset for AI (ZIP) | Download methods + code (reproducibility)

Casanovo peptide score ≥

Casanovo peptide score = (product of per-residue scores), minus 1 if the peptide fails precursor-mass closure. A negative score therefore means the whole peptide's mass does not close, not that the matched residues are wrong (decoy-validated: score < 0 hits still match references 99% at their confident residues). The default of −1 shows all peptides; raise it only to inspect mass-complete reconstructions.
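The scoring rule as stated in this help text can be written out in a few lines. This is a sketch of the description above, not Casanovo's own code; `peptide_score` and `passes_filter` are hypothetical helper names.

```python
import math

def peptide_score(residue_scores, mass_closed):
    """Product of per-residue scores, minus 1 when precursor-mass closure fails."""
    score = math.prod(residue_scores)
    return score if mass_closed else score - 1.0

def passes_filter(score, threshold=-1.0):
    """Mirror of the slider: keep peptides with score >= threshold (default shows all)."""
    return score >= threshold

closed = peptide_score([0.9, 0.8], mass_closed=True)    # about 0.72
open_ = peptide_score([0.9, 0.8], mass_closed=False)    # about -0.28
print(passes_filter(closed), passes_filter(open_), passes_filter(open_, threshold=0.0))
```

With the default threshold both peptides are shown; raising the threshold to 0 keeps only the mass-complete one.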

Manuscript Summary Statistics (Table 1)

Download CSV

Mass spec files:

Clear (show all)

View:

Combined (aggregated across selected files)

Per-file (compare files / conditions side-by-side)

Contaminants:

Exclude Cont_ proteins (from the searched contaminant database)

Protein filter:

Skin/hair = keratin family; opt-in, off by default.

Peptide length distribution

Tryptic peptides typically span 7–25 aa.

HLA class I shows a sharp peak at 9; class II at 13–15; peptidomics is broad 5–25.

HLA anchor residue frequencies

P2 + PΩ are the dominant anchor positions for MHC-I. Allele preferences fingerprint the donor's HLA type (e.g. A*02:01 → L at P2 + L/V at PΩ).

Cleavage flanking residues

N- and C-terminal residue percentages — fingerprints the endogenous protease activity that produced these peptides. High C-term K/R = trypsin contamination.

Rows = unique peptides; columns = source files; cells = PSM count. Peptides with high total in one file but 0 in others are candidate condition-specific. Sort/filter on the column toolbar; use Copy/CSV/Excel to export.
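A peptide-by-file PSM count matrix like the one described can be built from a flat PSM list. The sketch below assumes each PSM is a `(peptide, source_file)` pair; `min_total` is a hypothetical cutoff for what counts as a "high total", and the peptide sequences are made up.

```python
from collections import Counter

def psm_count_matrix(psms):
    """psms: iterable of (peptide, source_file) pairs, one per PSM.
    Returns (files, {peptide: [PSM count per file, in `files` order]})."""
    counts = Counter(psms)
    files = sorted({f for _, f in counts})
    peptides = sorted({p for p, _ in counts})
    return files, {p: [counts[(p, f)] for f in files] for p in peptides}

def file_specific(files, matrix, min_total=3):
    """Peptides seen in exactly one file with at least `min_total` PSMs there."""
    out = {}
    for pep, row in matrix.items():
        nonzero = [i for i, c in enumerate(row) if c > 0]
        if len(nonzero) == 1 and row[nonzero[0]] >= min_total:
            out[pep] = files[nonzero[0]]
    return out

psms = [("PEPTIDEK", "a.raw")] * 4 + [("SHAREDR", "a.raw"), ("SHAREDR", "b.raw"),
                                      ("RAREK", "b.raw")]
files, matrix = psm_count_matrix(psms)
print(files, matrix, file_specific(files, matrix))
```
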

BLASTs novel peptides against UniProt SwissProt + TrEMBL (SwissProt first, then TrEMBL on misses) on HPC.

Exclude contaminant proteins

Top Proteins by De Novo Peptide Count

Taxonomic Coverage

Identity of each peptide across the top species, grouped by source protein. Reveals patterns like conserved vs species-specific protein regions.

Show full peptide-species identity matrix

All | Conserved | Near-match | Distant

Peptide Length Distribution

Charge State Distribution

Post-Translational Modifications

Modification analysis from de novo sequences. In paleoproteomics, high N-deamidation with low Q-deamidation indicates genuine ancient protein (time-dependent asparagine degradation).

Select a near-match peptide from the table below, then click Show Alignment to visualize mismatches with per-residue confidence. Green = genuine variant (AA score > 0.95), Red = possible sequencing error (AA score < 0.70).

This view cross-references BLAST mismatches with Casanovo's per-residue amino acid confidence scores to distinguish species-specific markers from sequencing artifacts.

How we put an FDR on a de novo species call

De novo peptides are BLASTed against NCBI nr to identify the species — but what is the false-discovery rate when the organism may be absent from the database? We adapt the NovoBoard decoy-spectra method (Tran et al. 2024) to the homology search. Here is the whole process, end to end.

1. Build a decoy spectrum

For every real MS/MS spectrum we keep ~20% of its peaks and replace the other ~80% with peaks drawn at random from the global peak pool — keeping the precursor m/z and charge. The result looks like a real spectrum but encodes no real peptide.
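The decoy construction in this step can be sketched as below. The spectrum layout (a dict with `precursor_mz`, `charge`, and a list of `(mz, intensity)` peaks) and the function name are assumptions for illustration; how the global peak pool is built and sampled in the actual pipeline is not specified here.

```python
import random

def make_decoy(spectrum, peak_pool, keep_fraction=0.2, rng=None):
    """Keep ~keep_fraction of the real peaks and replace the rest with peaks
    drawn at random from a global pool. Precursor m/z and charge are unchanged."""
    if rng is None:
        rng = random.Random()
    peaks = spectrum["peaks"]
    n_keep = round(len(peaks) * keep_fraction)
    kept = rng.sample(peaks, n_keep)
    drawn = [rng.choice(peak_pool) for _ in range(len(peaks) - n_keep)]
    return {
        "precursor_mz": spectrum["precursor_mz"],
        "charge": spectrum["charge"],
        "peaks": sorted(kept + drawn),  # sort by m/z like a normal peak list
    }

# Toy spectrum and pool (made-up values); the pool is disjoint from the real peaks.
spectrum = {"precursor_mz": 500.25, "charge": 2,
            "peaks": [(100.0 + i, 1.0) for i in range(10)]}
pool = [(900.0 + i, 2.0) for i in range(50)]
decoy = make_decoy(spectrum, pool, rng=random.Random(0))
print(len(decoy["peaks"]))
```
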

2. Sequence real and decoy spectra with Casanovo

The identical Casanovo model and settings run on both sets, giving a population of real de novo peptides and a matched population of decoy de novo peptides — a true 1:1 null with the same number of queries.

3. BLAST both against nr, then compete: FDR = decoy ÷ real

Both populations are BLASTed against nr. Real peptides hit far more often than decoy peptides at every Casanovo score; the cumulative decoy ÷ real hit ratio is the FDR. Worked example below (ocelot, Leopardus pardalis, 9 runs):

! A clean FDR is necessary but NOT sufficient for a species call

The decoy-spectra FDR only controls chance homology. Two further errors need their own controls: (1) de novo sequence error, gated with the Casanovo confidence slider (≈21% error at conf ≥0.95, ≈12% at ≥0.99); (2) species mis-assignment from conserved peptides, handled in the Species (LCA) pane (lowest common ancestor across all hits, not the single best hit). Full validation: docs/DENOVO_FDR_VALIDATION.md.

Live view — your loaded dataset

Hit rate = % of unique de novo peptides in each Casanovo-score bin with ≥1 nr BLAST hit (per peptide, e-value ≤1). Loads denovo/blast_results_decoy_spectra.tsv (falls back to the legacy shuffled decoy).
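The per-bin hit rate and the cumulative decoy ÷ real FDR can be sketched together on toy data. The numbers below are made up and are not the ocelot results. The sketch assumes "cumulative" means all peptides at or above a score threshold, and that real and decoy populations have the same number of queries; the function names are hypothetical.

```python
def hit_rate(peptides, lo, hi):
    """Percent of peptides with lo <= score < hi that have at least one BLAST hit.
    peptides: list of (casanovo_score, has_hit)."""
    in_bin = [hit for score, hit in peptides if lo <= score < hi]
    return 100.0 * sum(in_bin) / len(in_bin) if in_bin else None

def cumulative_fdr(real, decoy, thresholds):
    """FDR at threshold t = (decoy hits with score >= t) / (real hits with score >= t)."""
    out = {}
    for t in thresholds:
        r = sum(1 for score, hit in real if hit and score >= t)
        d = sum(1 for score, hit in decoy if hit and score >= t)
        out[t] = d / r if r else None
    return out

# Toy 1:1 populations of six queries each.
real = [(0.99, True), (0.97, True), (0.96, True), (0.96, False), (0.60, True), (0.50, False)]
decoy = [(0.96, True), (0.90, False), (0.70, False), (0.55, True), (0.50, False), (0.40, False)]
fdr = cumulative_fdr(real, decoy, [0.95, 0.5])
print(hit_rate(real, 0.95, 1.01), fdr)
```
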

Export LCA (CSV)

Lowest-common-ancestor species attribution from the nr BLAST. Each de novo peptide is placed at the deepest taxon shared by its top hits: species/genus = diagnostic, family+ = conserved (not species-attributed), bacterial/viral = microbiome.
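A lowest-common-ancestor placement can be sketched as the longest shared prefix of the hits' lineages. The lineages below are simplified, and the mapping of ranks to the three categories follows the text above; the function names and the `(rank, name)` layout are assumptions, not the app's implementation.

```python
def lca_path(lineages):
    """lineages: list of root-to-leaf lists of (rank, name) tuples, one per BLAST hit.
    Returns the shared prefix; its last element is the LCA."""
    shared = []
    for level in zip(*lineages):
        if len(set(level)) == 1:
            shared.append(level[0])
        else:
            break
    return shared

def classify(path):
    """Map an LCA path to the categories described in the text."""
    if not path:
        return "unassigned"
    if path[0][1] in {"Bacteria", "Viruses"}:
        return "microbiome"
    return "diagnostic" if path[-1][0] in {"species", "genus"} else "conserved"

felid = [("superkingdom", "Eukaryota"), ("class", "Mammalia"),
         ("order", "Carnivora"), ("family", "Felidae")]
ocelot = felid + [("genus", "Leopardus"), ("species", "Leopardus pardalis")]
cat = felid + [("genus", "Felis"), ("species", "Felis catus")]

print(lca_path([ocelot, cat])[-1], classify(lca_path([ocelot, cat])))
print(lca_path([ocelot])[-1], classify(lca_path([ocelot])))
```

A peptide whose hits span ocelot and domestic cat lands at the family level and is treated as conserved; a peptide hitting only ocelot stays at species level and is diagnostic.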

Per-peptide LCA

One row per de novo peptide combining all three evidence streams: Casanovo confidence, whether Sage found it in the database, and the nr BLAST species/clade (LCA). Rows below the confidence slider at the top of the page are hidden: low-confidence calls are excluded by default, but every peptide is one slider-click away.

Export master table (CSV)

The de novo peptides assembled into proteins by the same parsimony model FragPipe (ProteinProphet) and IDPicker use — the minimal protein set explaining the peptides, with razor peptide assignment and indistinguishable-protein grouping. Click a protein to pop out a per-residue coverage map (reference vs de novo, full-screen-able) that colours amino-acid substitutions by Casanovo confidence. Honours the confidence slider above.
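The idea of a minimal protein set with razor assignment and indistinguishable-protein grouping can be illustrated with a greedy set-cover sketch. This is a simplified approximation for illustration only; it is not the ProteinProphet or IDPicker algorithm, and the function name and input layout are hypothetical.

```python
def parsimonious_groups(protein_to_peptides):
    """protein_to_peptides: {protein: set of peptide sequences}.
    Returns a list of (protein group, peptides assigned to it) in selection order."""
    # Proteins with identical peptide sets are indistinguishable: merge them.
    groups = {}
    for protein, peptides in protein_to_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)

    uncovered = set().union(*groups)
    selected = []
    while uncovered:
        # Greedy step: take the group explaining the most still-unexplained peptides
        # (ties broken by total peptide count, then by input order).
        best = max(groups, key=lambda peps: (len(peps & uncovered), len(peps)))
        gained = best & uncovered
        # Shared peptides go to the first group that claims them (razor-style).
        selected.append((sorted(groups[best]), sorted(gained)))
        uncovered -= gained
    return selected

demo = {
    "A": {"p1", "p2", "p3"},
    "B": {"p3", "p4"},
    "C": {"p1", "p2"},      # subset of A, explained away
    "D": {"p3", "p4"},      # indistinguishable from B
}
print(parsimonious_groups(demo))
```
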

Export protein groups (CSV)

Export Complete Analysis

Download everything needed to reproduce and share this analysis. Includes all data files, DIA-NN search parameters, and session state.

What's included (click to expand)

expression_matrix.csv -- Normalized protein intensities (pipeline-aware: DPC-Quant complete, or MaxLFQ with NAs)
DE_Results_Full.csv -- All contrasts × all proteins with logFC, P.Value, adj.P.Val (when DE was run)
QC_Metrics.csv -- Per-sample QC metrics + group labels (when QC stats exist)
Phospho_DE_Results.csv -- Site-level phospho DE (when phosphoproteomics was run)
diann_pg_matrix.tsv -- DIA-NN protein-level matrix with real missing values (0 = not detected, ~200 KB)
data_quality_summary.csv -- Per-sample protein counts, % detected, contaminant counts
detection_matrix.csv -- Per-protein precursor detection counts per sample
quartile_profiles.csv -- Intensity quartile assignments per sample
variable_proteins.csv -- Proteins with inconsistent abundance across samples
sample_metadata.csv / group_assignments.csv -- Sample groups and identifiers
contaminant_summary.csv -- Contaminant protein statistics
search_info.md -- Full DIA-NN search parameters and job metadata
session.rds -- Complete session state (reload in DE-LIMP)
methods.txt / parameters.txt -- Pipeline parameters, normalization, app version
reproducibility_log.R -- R code log + sessionInfo() to reproduce every step
figures/ -- 9 publication-quality SVG figures: volcano, heatmap_top20, violin_top10_up/down, pca, qc_group_distribution, normalization_density, data_completeness, sample_correlation, pvalue_distribution
PROMPT.md -- AI analysis prompt with biological questions and figure-reference instructions (DE-aware)
MANIFEST.txt -- Per-section export status (any skipped files explained here)

Export Complete Analysis ZIP

DE Results Table

Quick export of the DE results for the selected comparison. Includes gene symbols, logFC, P.Value, adj.P.Val, and per-sample expression values. One CSV file — no search parameters or session data.

Export Results CSV

CV Analysis

Coefficient of variation for significant proteins. Includes per-group CV and average CV values. One CSV file.

Export CV Analysis CSV

Full DIA-NN Output

The complete DIA-NN search output (report.parquet, precursor matrices, spectral libraries, logs) is stored on the HPC cluster. These files can be large (100 MB+) and are not included in the analysis export.

R Code Log
Methods Summary

Action Log: This code recreates your analysis step-by-step. Each section shows:

Action name - what you did (e.g., 'Run Pipeline')
Timestamp - when you did it
R code - how to reproduce it

Copy this entire code block to reproduce your analysis in a fresh R session.

💾 Download Reproducibility Log

DE-LIMP

Differential Expression — LIMPA Pipeline

If DE-LIMP helped your work, a star on GitHub helps other proteomics labs find it. Star DE-LIMP →

Links

GitHub | Hugging Face | Documentation | Discussions

Proteomics Resources & Training

Explore video tutorials, training courses, and methodology citations.