pAnnData Overview
The pAnnData object is the central data container in scpviz, extending the AnnData structure for single-cell and bulk proteomics.
It integrates matched protein-level and peptide-level matrices, along with metadata, summaries, and a protein–peptide relational structure (RS matrix).
This page introduces the core design of pAnnData and shows how to import data from supported formats.
Key Features
- Matched
AnnDataobjects for proteins (.prot) and peptides (.pep) - Support for multiple layers (raw, normalized, imputed, etc.)
- Integrated metadata (
.metadata) and summary tables (.summary) - Tracking of filtering, normalization, and analysis history
- Compatible with all scpviz modules (plotting, enrichment, filtering, etc.)
flowchart LR
subgraph pAnnData["`pAnnData`"]
P["`.prot
(protein AnnData)`"]
Q["`.pep
(peptide AnnData)`"]
S["`.summary
(sample-level table)`"]
M["`.metadata
(dict of metadata)`"]
R["`RS matrix
(protein × peptide mapping)`"]
end
P -- "proteins ↔ peptides" --> R
Q -- "peptides ↔ proteins" --> R
P -. "linked by sample IDs" .-> S
Q -. "linked by sample IDs" .-> S
M --> P
M --> Q
M --> S
Importing Data
Data can be imported into pAnnData directly from DIA-NN, Proteome Discoverer, or other supported formats:
from scpviz import pAnnData as pAnnData
from scpviz import plotting as scplt
from scpviz import utils as scutils
# From DIA-NN report
pdata = pAnnData.import_data(source_type='diann', report_file ="report.tsv")
# From Proteome Discoverer output
pdata = pAnnData.import_data(source_type='pd', prot_file ="proteomediscoverer_prot.txt", pept_file ="proteomediscoverer_pep.txt")
See the importing tutorial for more information.
Once imported, the pAnnData object serves as the entry point for downstream workflows:
filtering, normalization, imputation, visualization, and enrichment analysis.
Workflow Pipeline
The pAnnData object enables a modular analysis pipeline for single-cell and bulk proteomics.
Each step builds on the previous one, but you can skip or repeat steps depending on your dataset and analysis goals.
graph TB
A["`Import data
(DIA-NN / PD)`"] --> B["`Parse metadata
(.obs from filenames)`"]
B --> C["`Filter proteins/peptides
(≥2 unique peptides, sample queries)`"]
C --> D["`Normalize
(global, reference feature, directLFQ)`"]
D --> E["`Impute missing values
(KNN / group-wise)`"]
E --> F["`Visualize data
(abundance, PCA/UMAP, clustermap, raincloud, volcano)`"]
F --> G["`Differential Expression
(mean, pairwise, peptide-level)`"]
G --> H["`Enrichment (STRING)
(GSEA, GO, PPI)`"]
B --> I["`Export results`"]
%% Optional side paths
B -. "QC summaries" .-> F
C -. "RS matrix checks" .-> F
G -. "ranked/unranked lists" .-> H
D .-> I
F .-> I
G .-> I
Tutorials
Each step of the pipeline is explained in detail in the tutorials:
- Importing and Exporting Data
- Filtering
- Imputation + Normalization
- Plotting
- Differential Expression
- Enrichment + Networks
For a guided introduction, start with the Quickstart.
Modules
In addition to the core pAnnData class, scpviz provides two standalone modules that support analysis and visualization:
-
scpviz.utils
A collection of helper functions for data processing, filtering, formatting, and interacting with external resources such as UniProt.
These functions are primarily used internally but can also be useful for advanced users who need finer control over their workflows. -
scpviz.plotting
A set of high-level visualization utilities designed to work seamlessly withpAnnData.
Functions include abundance plots, rank plots, raincloud plots, clustermaps, UMAP/PCA projections, and UpSet diagrams.
Each plotting function is designed to accept amatplotlib.axes.Axesobject for flexible integration into custom figure layouts.
Developer Utilities
scpviz also includes a small set of developer-focused tools that help maintain consistency and internal state:
-
TrackedDataFrame
A subclass ofpandas.DataFramethat marks its parentpAnnDataobject as “stale” whenever it is modified directly.
This ensures that summary tables and metadata stay consistent with the main data layers. -
Hidden functions
Internal helpers that are not part of the standard API.
These are documented for completeness but should generally not be used directly in analysis workflows.
Developer utilities
These tools are included for package maintainers and power users. Most end-users will not need to interact with them directly.