Metrics

Mixins for computing and updating metadata from .obs, .var, or relational data.

MetricsMixin

Computes descriptive and RS-derived metrics for proteomic data.

This mixin provides utility functions for calculating summary statistics on protein and peptide abundance data, as well as inspecting the structure of the RS (protein × peptide) relational matrix.

Features:

Computes per-sample quantification and abundance metrics for both proteins and peptides
Calculates RS-derived properties such as the number of peptides per protein and the number of unique peptides
Updates .obs, .var, and .summary with relevant metrics
Provides visualization and tabular summaries of RS matrix connectivity

Methods:

Name	Description
`_update_metrics`	Computes per-sample and RS-derived metrics for `.prot` and `.pep`.
`_update_summary_metrics`	Adds per-sample high-confidence protein counts to `.summary`.
`describe_rs`	Returns a DataFrame summarizing peptide connectivity per protein.
`plot_rs`	Generates histograms of peptide–protein and protein–peptide mapping counts.

Source code in src/scpviz/pAnnData/metrics.py

class MetricsMixin:
    """
    Computes descriptive and RS-derived metrics for proteomic data.

    This mixin provides utility functions for calculating summary statistics on
    protein and peptide abundance data, as well as inspecting the structure of
    the RS (protein × peptide) relational matrix.

    Features:

    - Computes per-sample quantification and abundance metrics for both proteins and peptides
    - Calculates RS-derived properties such as the number of peptides per protein and the number of unique peptides
    - Updates `.obs`, `.var`, and `.summary` with relevant metrics
    - Provides visualization and tabular summaries of RS matrix connectivity

    Functions:
        _update_metrics: Computes per-sample and RS-derived metrics for `.prot` and `.pep`.
        _update_summary_metrics: Adds per-sample high-confidence protein counts to `.summary`.
        describe_rs: Returns a DataFrame summarizing peptide connectivity per protein.
        plot_rs: Generates histograms of peptide–protein and protein–peptide mapping counts.
    """
    def _update_metrics(self):
        """
        Compute and update core QC and RS-based metrics for `.obs` and `.var`.

        This internal method updates:

        - `.prot.obs` and `.pep.obs` with per-sample metrics:
            • `*_quant`: Proportion of non-missing values
            • `*_count`: Number of non-missing values
            • `*_abundance_sum`: Sum of observed abundances
            • `mbr_count`, `high_count`: Count of MBR annotations (if present in layer 'X_mbr')

        - `.prot.var` with RS-derived metrics (if available):
            • `peptides_per_protein`: Total peptides mapped to each protein
            • `unique_peptides`: Number of peptides uniquely mapping to each protein

        Note:
            This function is typically called automatically after filtering, imputation,
            or importing new data. It should not be run manually under normal usage.
        """
        if self.prot is not None:
            if self.prot.is_view:
                self.prot = self.prot.copy()

            X = self.prot.X.toarray()
            self.prot.obs['protein_quant'] = np.sum(~np.isnan(X), axis=1) / X.shape[1]
            self.prot.obs['protein_count'] = np.sum(~np.isnan(X), axis=1)
            self.prot.obs['protein_abundance_sum'] = np.nansum(X, axis=1)

            if 'X_mbr' in self.prot.layers:
                self.prot.obs['mbr_count'] = (self.prot.layers['X_mbr'] == 'Peak Found').sum(axis=1)
                self.prot.obs['high_count'] = (self.prot.layers['X_mbr'] == 'High').sum(axis=1)

        if self.pep is not None:
            if self.pep.is_view:
                self.pep = self.pep.copy()
            X = self.pep.X.toarray()
            self.pep.obs['peptide_quant'] = np.sum(~np.isnan(X), axis=1) / X.shape[1]
            self.pep.obs['peptide_count'] = np.sum(~np.isnan(X), axis=1)
            self.pep.obs['peptide_abundance_sum'] = np.nansum(X, axis=1)

            if 'X_mbr' in self.pep.layers:
                self.pep.obs['mbr_count'] = (self.pep.layers['X_mbr'] == 'Peak Found').sum(axis=1)
                self.pep.obs['high_count'] = (self.pep.layers['X_mbr'] == 'High').sum(axis=1)

        # RS metrics for prot.var
        if self.rs is not None and self.prot is not None:
            rs = self.rs  # leave it sparse
            peptides_per_protein = rs.getnnz(axis=1)
            unique_mask = rs.getnnz(axis=0) == 1
            unique_counts = rs[:, unique_mask].getnnz(axis=1)
            self.prot.var['peptides_per_protein'] = peptides_per_protein
            self.prot.var['unique_peptides'] = unique_counts

    def _update_summary_metrics(self, unique_peptide_thresh=2):
        """
        Compute RS-derived per-sample summary metric for protein confidence.

        Adds the following column to `.summary`:

        - `unique_pep2_protein_count`: Number of proteins per sample with at least 
        `unique_peptide_thresh` uniquely mapping peptides (default: 2).

        This is useful for quality control and filtering based on protein-level confidence.

        Args:
            unique_peptide_thresh (int): Minimum number of uniquely mapping peptides required
                to consider a protein as confidently quantified.
        """
        if (
            self.rs is not None and
            self.prot is not None and
            hasattr(self, '_summary') and
            'unique_peptides' in self.prot.var.columns
        ):
            unique_mask = self.prot.var['unique_peptides'] >= unique_peptide_thresh
            quant_matrix = self.prot.X.toarray()
            high_conf_matrix = quant_matrix[:, unique_mask]
            high_conf_count = np.sum(~np.isnan(high_conf_matrix), axis=1)
            self._summary['unique_pep2_protein_count'] = high_conf_count

    def describe_rs(self):
        """
        Summarize the protein–peptide RS (relational) matrix.

        Returns a DataFrame with one row per protein, describing its peptide mapping coverage:

        - `peptides_per_protein`: Total number of peptides mapped to each protein.
        - `unique_peptides`: Number of uniquely mapping peptides (peptides linked to only one protein).

        Returns:
            pd.DataFrame: Summary statistics for each protein in the RS matrix.

        Note:
            If `.prot` is available, index labels are taken from `.prot.var_names`.
        """
        if self.rs is None:
            print("⚠️ No RS matrix set.")
            return None

        rs = self.rs

        # peptides per protein
        peptides_per_protein = rs.getnnz(axis=1)
        # unique peptides per protein (those mapped only to this protein)
        unique_mask = rs.getnnz(axis=0) == 1
        unique_counts = rs[:, unique_mask].getnnz(axis=1)

        summary_df = pd.DataFrame({
            "peptides_per_protein": peptides_per_protein,
            "unique_peptides": unique_counts
        }, index=self.prot.var_names if self.prot is not None else range(rs.shape[0]))

        return summary_df

describe_rs

describe_rs()

Summarize the protein–peptide RS (relational) matrix.

Returns a DataFrame with one row per protein, describing its peptide mapping coverage:

peptides_per_protein: Total number of peptides mapped to each protein.
unique_peptides: Number of uniquely mapping peptides (peptides linked to only one protein).

Returns:

Type	Description
	pd.DataFrame: Summary statistics for each protein in the RS matrix.

Note

If .prot is available, index labels are taken from .prot.var_names.

Source code in src/scpviz/pAnnData/metrics.py

def describe_rs(self):
    """
    Summarize the protein–peptide RS (relational) matrix.

    Returns a DataFrame with one row per protein, describing its peptide mapping coverage:

    - `peptides_per_protein`: Total number of peptides mapped to each protein.
    - `unique_peptides`: Number of uniquely mapping peptides (peptides linked to only one protein).

    Returns:
        pd.DataFrame: Summary statistics for each protein in the RS matrix.

    Note:
        If `.prot` is available, index labels are taken from `.prot.var_names`.
    """
    if self.rs is None:
        print("⚠️ No RS matrix set.")
        return None

    rs = self.rs

    # peptides per protein
    peptides_per_protein = rs.getnnz(axis=1)
    # unique peptides per protein (those mapped only to this protein)
    unique_mask = rs.getnnz(axis=0) == 1
    unique_counts = rs[:, unique_mask].getnnz(axis=1)

    summary_df = pd.DataFrame({
        "peptides_per_protein": peptides_per_protein,
        "unique_peptides": unique_counts
    }, index=self.prot.var_names if self.prot is not None else range(rs.shape[0]))

    return summary_df