bio-splicing-pipeline

End-to-end alternative splicing analysis from FASTQ to differential splicing results for short-read bulk RNA-seq. Aligns with STAR 2-pass cohort-style, performs junction QC (RSeQC, MaxEntScan, SpliceAI), runs rMATS-turbo and leafcutter for concordant differential analysis, optionally MAJIQ V3 for complex events / heterogeneous cohorts, isoform-switching with NMD/ORF/domain consequences (IsoformSwitchAnalyzeR v2 + DRIMSeq+DEXSeq+stageR DTU), and sashimi visualizations. Use when performing comprehensive splicing analysis from raw bulk RNA-seq data; for variant-driven splice prediction see splice-variant-prediction; for rare-disease single-patient outlier detection see outlier-splicing-detection; for full-isoform PacBio/ONT analysis see long-read-splicing.

Install

npx skills add https://github.com/GPTomics/bioSkills --skill bio-splicing-pipeline

SKILL.md

Version Compatibility

Reference examples tested with: STAR 2.7.11+, fastp 0.23+, numpy 1.26+, pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name to verify parameters
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Alternative Splicing Analysis Pipeline

"Analyze alternative splicing from my RNA-seq data" → Orchestrate STAR alignment, PSI quantification (rMATS-turbo/SUPPA2), differential splicing detection, isoform switching analysis (IsoformSwitchAnalyzeR), sashimi plot visualization, and junction QC.

Complete workflow from raw RNA-seq to differential splicing results.

Pipeline Overview

FASTQ → Read QC → STAR 2-pass → Junction QC → rMATS-turbo → Results → Visualization
                                    ↓
                            (Optional) IsoformSwitchAnalyzeR

Step 1: Read Quality Control

# fastp for adapter trimming and quality filtering
fastp \
    -i sample_R1.fastq.gz \
    -I sample_R2.fastq.gz \
    -o sample_clean_R1.fastq.gz \
    -O sample_clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --thread 8 \
    -h sample_fastp.html

Step 2: STAR 2-Pass Alignment

# First pass to detect novel junctions
STAR \
    --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_pass1_ \
    --outSAMtype BAM Unsorted \
    --outSJfilterOverhangMin 8 8 8 8 \
    --alignSJDBoverhangMin 1

# Generate new index with discovered junctions
# (Combine SJ.out.tab files from all samples)
cat *_SJ.out.tab > combined_SJ.out.tab

# Second pass with combined junctions
STAR \
    --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
    --readFilesCommand zcat \
    --sjdbFileChrStartEnd combined_SJ.out.tab \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --outSJfilterOverhangMin 8 8 8 8 \
    --alignSJDBoverhangMin 1 \
    --quantMode GeneCounts

Step 3: Junction QC Checkpoint

import subprocess

def check_junction_saturation(bam_file, bed_file, output_prefix):
    '''
    QC Checkpoint: Verify junction detection saturation.
    Plateau indicates sufficient depth for splicing analysis.
    '''
    subprocess.run([
        'junction_saturation.py',
        '-i', bam_file,
        '-r', bed_file,
        '-o', output_prefix
    ], check=True)

    # Manual check: curves should plateau
    print(f'Check {output_prefix}.junctionSaturation_plot.pdf')
    print('If curves still rising, consider deeper sequencing')

Step 4: Differential Splicing with rMATS-turbo

# Create sample list files
# condition1_bams.txt: sample1.bam,sample2.bam,sample3.bam
# condition2_bams.txt: sample4.bam,sample5.bam,sample6.bam

rmats.py \
    --b1 condition1_bams.txt \
    --b2 condition2_bams.txt \
    --gtf annotation.gtf \
    -t paired \
    --readLength 150 \
    --nthread 8 \
    --od rmats_output \
    --tmp rmats_tmp

Step 5: Filter Results

import pandas as pd

def filter_differential_splicing(rmats_dir, event_type='SE',
                                  fdr_cutoff=0.05, dpsi_cutoff=0.1, min_reads=10):
    '''
    Filter rMATS results for significant events.

    Thresholds:
    - |deltaPSI| > 0.1 (lenient) or > 0.2 (stringent)
    - FDR < 0.05
    - Junction reads >= 10
    '''
    jc_file = f'{rmats_dir}/{event_type}.MATS.JC.txt'
    df = pd.read_csv(jc_file, sep='\t')

    significant = df[
        (df['FDR'] < fdr_cutoff) &
        (df['IncLevelDifference'].abs() > dpsi_cutoff)
    ].copy()

    print(f'Significant {event_type} events: {len(significant)}')

    # Sort by significance and effect size
    significant['score'] = -significant['FDR'].apply(lambda x: max(x, 1e-300)).apply(
        lambda x: __import__('numpy').log10(x)
    ) * significant['IncLevelDifference'].abs()

    return significant.sort_values('score', ascending=False)

Step 6: Optional Isoform Switching

library(IsoformSwitchAnalyzeR)

# Import Salmon quantification if available
switchList <- importRdata(
    isoformCountMatrix = counts,
    isoformRepExpression = tpm,
    designMatrix = design,
    isoformExonAnnoation = 'annotation.gtf',
    isoformNtFasta = 'transcripts.fa'
)

# Analyze switches
switchList <- isoformSwitchTestDEXSeq(switchList, reduceToSwitchingGenes = TRUE)

Step 7: Sashimi Visualization

import subprocess

def visualize_top_events(rmats_dir, grouping_file, gtf_file, output_dir, n_top=20):
    '''Generate sashimi plots for top differential events.'''
    import pandas as pd
    from pathlib import Path

    Path(output_dir).mkdir(parents=True, exist_ok=True)

    for event_type in ['SE', 'A5SS', 'A3SS', 'MXE', 'RI']:
        jc_file = f'{rmats_dir}/{event_type}.MATS.JC.txt'
        df = pd.read_csv(jc_file, sep='\t')
        sig = df[(df['FDR'] < 0.05) & (df['IncLevelDifference'].abs() > 0.1)]

        for idx, event in sig.head(n_top).iterrows():
            chrom = event['chr']
            start = event.get('upstreamES', event.get('1stExonStart_0base', 0)) - 500
            end = event.get('downstreamEE', event.get('2ndExonEnd', 0)) + 500
            gene = event['geneSymbol']

            subprocess.run([
                'ggsashimi.py',
                '-b', grouping_file,
                '-c', f'{chrom}:{start}-{end}',
                '-o', f'{output_dir}/{event_type}_{gene}',
                '-g', gtf_file,
                '--shrink',
                '--fix-y-scale',
                '-M', '5'
            ], check=True)

Complete Pipeline Script

#!/bin/bash
set -e

# Configuration
SAMPLES="sample1 sample2 sample3 sample4 sample5 sample6"
CONDITIONS="control control control treatment treatment treatment"
GTF="annotation.gtf"
STAR_INDEX="star_index/"
THREADS=8

# Step 1: QC and trimming
for sample in $SAMPLES; do
    fastp -i ${sample}_R1.fq.gz -I ${sample}_R2.fq.gz \
          -o ${sample}_clean_R1.fq.gz -O ${sample}_clean_R2.fq.gz \
          --thread $THREADS
done

# Step 2: STAR 2-pass alignment
# ... (as above)

# Step 3: Junction QC
for sample in $SAMPLES; do
    junction_saturation.py -i ${sample}.bam -r annotation.bed -o ${sample}_junc
done

# Step 4: rMATS differential splicing
rmats.py --b1 control_bams.txt --b2 treatment_bams.txt \
         --gtf $GTF -t paired --readLength 150 --nthread $THREADS \
         --od rmats_output --tmp rmats_tmp

echo "Pipeline complete. Check rmats_output/ for results."

When NOT to Use This Pipeline (Pipeline Variants)

This pipeline targets bulk short-read RNA-seq differential splicing between two groups. For other regimes, use the dedicated skill:

Question	Use instead
"Does this DNA variant alter splicing?"	alternative-splicing/splice-variant-prediction (SpliceAI, Pangolin, MMSplice, ClinGen SVI 2023)
"What is aberrant in this single rare-disease patient?"	alternative-splicing/outlier-splicing-detection (FRASER 2.0, OUTRIDER, DROP)
"Full-isoform analysis from PacBio Iso-Seq / ONT"	alternative-splicing/long-read-splicing (FLAIR, IsoQuant, Bambu, SQANTI3, rMATS-long)
"Single-cell splicing analysis"	alternative-splicing/single-cell-splicing (chemistry-first decision; MARVEL, BRIE2 plate; long-read SC)
"Heterogeneous cohort, n>=10 vs n>=10"	This pipeline + MAJIQ V3 HET module (see alternative-splicing/differential-splicing)
"Microexon-focused (3-27 nt)"	This pipeline with VAST-TOOLS or MicroExonator; see alternative-splicing/splicing-quantification

Related Skills

alternative-splicing/splicing-quantification - PSI computation, event taxonomy, sign conventions
alternative-splicing/differential-splicing - Tool selection, MAJIQ V3, Shiba, leafcutter, reconciliation
alternative-splicing/isoform-switching - DTU + NMD/ORF/domain consequences (IsoformSwitchAnalyzeR v2, stageR)
alternative-splicing/sashimi-plots - ggsashimi, MAJIQ-VOILA, leafviz visualization
alternative-splicing/splicing-qc - STAR 2-pass cohort-style, library prep, depth thresholds
alternative-splicing/single-cell-splicing - 10X chemistry decision; plate-based and long-read SC
alternative-splicing/splice-variant-prediction - SpliceAI / Pangolin / MMSplice variant interpretation
alternative-splicing/outlier-splicing-detection - FRASER 2.0 / DROP rare-disease workflow
alternative-splicing/long-read-splicing - PacBio HiFi / ONT full-isoform analysis
read-alignment/star-alignment - STAR 2-pass cohort-style configuration
rna-quantification/alignment-free-quant - Salmon TPM for SUPPA2 and DTU pipelines

Installs0

GitHub Stars777

LanguagePython

AddedMay 25, 2026

View on GitHub