2026-01-22

RAG Data Preparation: The Foundation That Makes or Breaks Your AI System

A comprehensive guide to preparing data for RAG systems, covering document parsing, chunking strategies, contextual enrichment, and embedding optimization.

Most RAG implementation failures trace back to data preparation, not retrieval architecture. Teams spend weeks tuning retrieval parameters when the real problem is poorly parsed documents or inappropriate chunking. This guide covers the critical foundation that determines the quality ceiling of your RAG system.

Why Data Preparation Is the Most Critical RAG Step

There is a common pattern in RAG implementations: sophisticated retrieval architectures (hybrid search, reranking, corrective RAG) that still produce poor results. The root cause is almost always upstream, in the data preparation layer.

The key insight: data preparation sets a quality ceiling. If parsing and chunking preserve only 60% of the relevant information, no amount of architectural sophistication can push retrieval quality above that level. Teams report 40-60% quality improvements from fixing data preparation alone, often without touching retrieval logic.

Document Parsing: Extracting Clean Text from Messy Sources

Real-world documents are messy. PDFs store text as positioned glyphs, not logical sequences. Tables get mangled. Multi-column layouts require layout analysis. Scanned documents need OCR, which typically introduces error rates of 5-15%.

Parsing Tool Selection

Tool	Table Accuracy	Text Fidelity	Speed	Best For
Docling	97.9%	Excellent	~10s/page	Complex documents
LlamaParse	75-90%	Good	~6s/doc	Speed-critical
Unstructured	75-100%*	Good	Variable	OCR-heavy
PyMuPDF/PyPDF	60-70%	Fair	Fast	Simple PDFs

*Unstructured achieves 100% on simple tables, 75% on complex structures

Practical PDF Parsing

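from docling.document_converter import DocumentConverter
from docling_core.types.doc import SectionHeaderItem, TableItem, TextItem

converter = DocumentConverter()

# Parse PDF with layout analysis
result = converter.convert("technical-manual.pdf")

# Walk structured content in reading order
# (class names follow Docling v2; verify against your installed version)
current_section = None
for item, _level in result.document.iterate_items():
    if isinstance(item, TableItem):
        # Table extracted with structure preserved
        markdown_table = item.export_to_markdown(doc=result.document)
    elif isinstance(item, SectionHeaderItem):
        # Remember the most recent heading as section context
        current_section = item.text
    elif isinstance(item, TextItem):
        # Text with section context
        text = item.text
        section = current_section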

# Export as markdown for RAG ingestion
markdown_output = result.document.export_to_markdown()

HTML Content Extraction

from bs4 import BeautifulSoup
from readability import Document

def extract_html_content(html: str) -> dict:
    """Extract meaningful content from HTML, handling diverse structures."""

    # Use readability for main content extraction
    doc = Document(html)
    main_content = doc.summary()
    title = doc.title()

    # Parse with BeautifulSoup for structure
    soup = BeautifulSoup(main_content, 'html.parser')

    # Remove navigation, ads, footers
    for element in soup.find_all(['nav', 'footer', 'aside', 'script', 'style']):
        element.decompose()

    # Extract text with structure preservation
    text_blocks = []
    for element in soup.find_all(['h1', 'h2', 'h3', 'p', 'li', 'td']):
        text = element.get_text(strip=True)
        if text:
            text_blocks.append({
                'type': element.name,
                'text': text,
                'level': int(element.name[1]) if element.name.startswith('h') else 0
            })

    return {
        'title': title,
        'blocks': text_blocks,
        'full_text': soup.get_text(separator='\n', strip=True)
    }

Tip: Start with rule-based parsing before resorting to LLM-based parsing. Use hybrid approaches: heuristics for structure combined with Vision-Language Models only for the most challenging elements.
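
One way to act on this tip is to score each parsed page with cheap heuristics and escalate only the suspicious ones to a heavier parser. The sketch below is illustrative: the function name `needs_vlm_fallback`, the signals, and the thresholds are assumptions, not part of any parsing library, and should be tuned on your own documents.

```python
def needs_vlm_fallback(
    page_text: str,
    min_chars: int = 200,
    max_symbol_ratio: float = 0.3,
) -> bool:
    """Return True when rule-based output looks too poor to trust.

    Hypothetical heuristic; thresholds are illustrative only.
    """
    stripped = page_text.strip()

    # Very little text usually means a scanned page or a failed extraction
    if len(stripped) < min_chars:
        return True

    # A high share of non-alphanumeric, non-space characters suggests
    # garbled glyphs or a mangled table
    symbols = sum(1 for ch in stripped if not ch.isalnum() and not ch.isspace())
    if symbols / len(stripped) > max_symbol_ratio:
        return True

    # Unicode replacement characters indicate decoding problems
    return "\ufffd" in stripped
```

Pages that return True can be routed to a Vision-Language Model or to manual review, while everything else stays on the cheap rule-based path.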

Text Preprocessing: Cleaning for Embedding Quality

Embeddings encode noise along with signal. Inconsistent formatting creates spurious similarity. PII in embeddings creates security risks. Preprocessing removes these issues before they propagate through the pipeline.

import re
from typing import List
import unicodedata

class TextPreprocessor:
    def __init__(self):
        self.pii_patterns = {
            'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
            'phone': r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b',
            'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
            'credit_card': r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b',
        }

    def normalize_whitespace(self, text: str) -> str:
        """Normalize all whitespace to single spaces."""
        text = re.sub(r'\s+', ' ', text)
        return text.strip()

    def normalize_unicode(self, text: str) -> str:
        """Normalize unicode characters to consistent form."""
        return unicodedata.normalize('NFKC', text)

    def redact_pii(self, text: str) -> str:
        """Detect and redact PII patterns."""
        for pii_type, pattern in self.pii_patterns.items():
            text = re.sub(pattern, f'[REDACTED_{pii_type.upper()}]', text)
        return text

    def remove_boilerplate(self, text: str, patterns: List[str] = None) -> str:
        """Remove known boilerplate text patterns."""
        default_patterns = [
            r'Page \d+ of \d+',
            r'Copyright \d{4}.*?(?=\n|$)',
            r'All rights reserved\.?',
        ]
        patterns = patterns or default_patterns
        for pattern in patterns:
            text = re.sub(pattern, '', text, flags=re.IGNORECASE)
        return text

    def process(self, text: str, redact_pii: bool = True) -> str:
        """Run full preprocessing pipeline."""
        text = self.normalize_unicode(text)
        text = self.remove_boilerplate(text)
        text = self.normalize_whitespace(text)
        if redact_pii:
            text = self.redact_pii(text)
        return text
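
To see why Unicode normalization matters before embedding, consider text copied from a PDF that contains a ligature, a no-break space, and full-width digits. The following standalone snippet uses only the standard library; the sample string is made up for illustration.

```python
import re
import unicodedata

# "fi" ligature (U+FB01), no-break space (U+00A0), full-width digits 1, 2, 8
raw = "The \ufb01le\u00a0size is \uff11\uff12\uff18 MB"

# NFKC folds compatibility characters into their plain equivalents
normalized = unicodedata.normalize("NFKC", raw)

# Collapse any remaining runs of whitespace
cleaned = re.sub(r"\s+", " ", normalized).strip()

print(cleaned)  # The file size is 128 MB
```

Without this step, "ﬁle" and "file" are different token sequences, so two chunks with identical meaning can end up with different embeddings.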

Deduplication

Near-duplicate content wastes storage and skews retrieval results. MinHash LSH provides efficient near-duplicate detection:

from datasketch import MinHash, MinHashLSH
import hashlib
from typing import List, Set

class Deduplicator:
    def __init__(self, threshold: float = 0.8, num_perm: int = 128):
        self.threshold = threshold
        self.num_perm = num_perm
        self.lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
        self.exact_hashes: Set[str] = set()
        self.doc_id = 0

    def _compute_minhash(self, text: str) -> MinHash:
        """Compute MinHash signature for text."""
        minhash = MinHash(num_perm=self.num_perm)
        words = text.lower().split()
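        # max(1, ...) ensures texts shorter than 3 words still yield one shingle;
        # otherwise all short texts share an empty signature and match each other
        for i in range(max(1, len(words) - 2)):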
            shingle = ' '.join(words[i:i+3])
            minhash.update(shingle.encode('utf-8'))
        return minhash

    def is_duplicate(self, text: str) -> bool:
        """Check for exact or near duplicates."""
        # Exact duplicate check
        text_hash = hashlib.md5(text.encode()).hexdigest()
        if text_hash in self.exact_hashes:
            return True
        self.exact_hashes.add(text_hash)

        # Near-duplicate check
        minhash = self._compute_minhash(text)
        if self.lsh.query(minhash):
            return True

        self.lsh.insert(f"doc_{self.doc_id}", minhash)
        self.doc_id += 1
        return False

    def deduplicate(self, documents: List[str]) -> List[str]:
        """Remove duplicates from document list."""
        return [doc for doc in documents if not self.is_duplicate(doc)]
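
MinHash estimates the Jaccard similarity between shingle sets, so it helps to compute that quantity exactly on a small example when choosing a threshold. This is a dependency-free sketch; the helper names `word_shingles` and `jaccard` are illustrative and not part of datasketch.

```python
def word_shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles for a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"

similarity = jaccard(word_shingles(doc_a), word_shingles(doc_b))
print(similarity)  # 0.4
```

A single changed word in a nine-word text drops the similarity to 0.4, because that word appears in three of the seven shingles. Short texts are therefore very sensitive to small edits, which is worth keeping in mind before applying a 0.8 threshold to chunk-sized inputs rather than full documents.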

Chunking Strategies: The Art of Splitting Documents

Chunking determines how information is organized for retrieval. The core tension: chunks that are too small lose context, while chunks that are too large dilute the relevance signal.

Strategy Comparison

Recursive Character Splitting (Recommended Default)

from langchain_text_splitters import RecursiveCharacterTextSplitter

def recursive_chunking(text: str, chunk_size: int = 512, overlap: int = 50) -> list:
    """
    Split text recursively using hierarchy of separators.
    Tries to keep paragraphs together, then sentences, then words.
    """
    splitter = RecursiveCharacterTextSplitter(
        separators=[
            "\n\n",  # Paragraphs first
            "\n",  # Then line breaks
            ". ",  # Then sentences
            ", ",  # Then clauses
            " ",  # Finally words
        ],
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        length_function=len,  # chunk_size is measured in characters here, not tokens
    )
    return splitter.split_text(text)

Semantic Chunking

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

def semantic_chunking(text: str) -> list:
    """
    Split based on semantic similarity between sentences.
    Groups semantically related content together.
    """
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=95,  # Split at top 5% dissimilarity
        min_chunk_size=100
    )

    return splitter.split_text(text)

# Performance: Up to 70% accuracy improvement in retrieval (varies by content type)
# Trade-off: Requires embedding calls during chunking

Hierarchical Parent-Child Chunking

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma

def setup_hierarchical_chunking(documents: list, embeddings):
    """
    Create parent-child hierarchy for precision + context.
    Search on small child chunks, return large parent chunks.
    """
    # Parent splitter: large chunks for context
    parent_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=200
    )

    # Child splitter: small chunks for precise retrieval
    child_splitter = RecursiveCharacterTextSplitter(
        chunk_size=400,
        chunk_overlap=50
    )

    # Storage for parent documents
    docstore = InMemoryStore()

    # Vector store indexes child chunks
    vectorstore = Chroma(
        collection_name="child_chunks",
        embedding_function=embeddings
    )

    # Retriever searches children, returns parents
    retriever = ParentDocumentRetriever(
        vectorstore=vectorstore,
        docstore=docstore,
        child_splitter=child_splitter,
        parent_splitter=parent_splitter
    )

    retriever.add_documents(documents)
    return retriever

# Performance: Improved relevance on structured documents by preserving context
# Trade-off: 2-3x storage overhead
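
The parent-child mechanism itself is simple enough to sketch without any framework. In the toy version below, a substring match stands in for vector similarity search, and the function names and sizes are assumptions chosen for illustration.

```python
from typing import List, Tuple


def split_fixed(text: str, size: int) -> List[str]:
    """Split text into consecutive fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(
    text: str, parent_size: int = 100, child_size: int = 20
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Return parents plus (child_text, parent_id) pairs."""
    parents = split_fixed(text, parent_size)
    children = [
        (child, parent_id)
        for parent_id, parent in enumerate(parents)
        for child in split_fixed(parent, child_size)
    ]
    return parents, children


def retrieve_parents(
    query: str, parents: List[str], children: List[Tuple[str, int]]
) -> List[str]:
    """Match on small children, return their de-duplicated parents."""
    matched_ids: List[int] = []
    for child, parent_id in children:
        # Substring match is a stand-in for embedding similarity
        if query in child and parent_id not in matched_ids:
            matched_ids.append(parent_id)
    return [parents[i] for i in matched_ids]


text = "A" * 50 + "needle" + "B" * 44 + "C" * 100
parents, children = build_index(text)
results = retrieve_parents("needle", parents, children)
```

The match happens on a 20-character child, but the caller receives the full 100-character parent. Every parent is stored once in the docstore and again as children in the index, which is where the storage overhead comes from.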

Chunk Size Guidelines

Content Type	Recommended Size	Overlap	Rationale
Technical docs	512 tokens	50-100	Balance detail with context
Conversational	256 tokens	25-50	Shorter exchanges
Legal/contracts	1024 tokens	100-150	Preserve clause context
Code	1000 chars	100	Keep functions intact
Q&A pairs	128-256 tokens	0	Each Q&A is self-contained
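
These guidelines can be encoded as a small lookup so that each content type is routed to its own settings. The sketch below is one possible shape; `ChunkConfig`, `CHUNK_CONFIGS`, and `config_for` are hypothetical names, overlaps use the lower bound of each range, and the Q&A size uses the upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkConfig:
    size: int
    overlap: int
    unit: str  # "tokens" or "chars"


# Values mirror the guidelines table above
CHUNK_CONFIGS = {
    "technical": ChunkConfig(size=512, overlap=50, unit="tokens"),
    "conversational": ChunkConfig(size=256, overlap=25, unit="tokens"),
    "legal": ChunkConfig(size=1024, overlap=100, unit="tokens"),
    "code": ChunkConfig(size=1000, overlap=100, unit="chars"),
    "qa": ChunkConfig(size=256, overlap=0, unit="tokens"),
}


def config_for(content_type: str) -> ChunkConfig:
    """Look up chunk settings, falling back to the technical-docs default."""
    return CHUNK_CONFIGS.get(content_type, CHUNK_CONFIGS["technical"])
```

Keeping the unit explicit avoids a common mistake: passing a token-based size to a splitter whose length function counts characters.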

Contextual Chunking: Solving the Lost Context Problem

Traditional chunking destroys document context. A chunk saying “This approach reduces latency by 40%” is useless without knowing which approach. Contextual chunking addresses this.

Anthropic’s Contextual Retrieval Technique

from anthropic import Anthropic
from typing import List

def add_contextual_headers(
    document: str,
    chunks: List[str],
    model: str = "claude-3-5-haiku-latest"
) -> List[str]:
    """
    Prepend chunk-specific context using Claude.
    Reduces retrieval failures by 35% with contextual embeddings alone,
    49% when combined with BM25 hybrid search, and 67% with reranking added.
    """
    client = Anthropic()
    contextualized_chunks = []

    context_prompt = """Here is the full document:
<document>
{document}
</document>

Here is a chunk from that document:
<chunk>
{chunk}
</chunk>

Provide a short context (2-3 sentences) to situate this chunk within the document. Focus on:
1. What section/topic this chunk belongs to
2. Key entities or concepts being discussed
3. How it relates to the document's main subject

Context:"""

    for chunk in chunks:
        response = client.messages.create(
            model=model,
            max_tokens=200,
            messages=[{
                "role": "user",
                "content": context_prompt.format(document=document, chunk=chunk)
            }]
        )

        context = response.content[0].text
        contextualized_chunks.append(f"{context}\n\n{chunk}")

    return contextualized_chunks

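# Cost with prompt caching: ~$1.02 per million document tokens (Anthropic's estimate).
# Note: the loop above resends the full document per chunk and does not enable caching.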
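
Prompt caching requires the repeated document prefix to be sent as its own content block marked with `cache_control`. The helper below only builds the request arguments, so it can be inspected without an API key; the function name `build_cached_context_request` and the shortened instruction text are assumptions for illustration.

```python
def build_cached_context_request(
    document: str,
    chunk: str,
    model: str = "claude-3-5-haiku-latest",
) -> dict:
    """Build keyword arguments for client.messages.create with a cacheable prefix."""
    return {
        "model": model,
        "max_tokens": 200,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"Here is the full document:\n<document>\n{document}\n</document>",
                    # Identical across all chunks of one document, so it can be cached
                    "cache_control": {"type": "ephemeral"},
                },
                {
                    "type": "text",
                    # Only this block changes between calls
                    "text": (
                        "Here is a chunk from that document:\n"
                        f"<chunk>\n{chunk}\n</chunk>\n\n"
                        "Provide a short context (2-3 sentences) to situate "
                        "this chunk within the document.\n\nContext:"
                    ),
                },
            ],
        }],
    }
```

Usage would look like `client.messages.create(**build_cached_context_request(document, chunk))`. The API only caches prefixes above a model-specific minimum length, so very short documents gain nothing from this.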

Rule-Based Context (Zero-Cost Alternative)

def add_structural_context(chunks: List[dict]) -> List[dict]:
    """
    Add context based on document structure without LLM calls.
    Uses metadata from structure-aware chunking.
    """
    contextualized = []

    for chunk in chunks:
        metadata = chunk.get('metadata', {})
        content = chunk['content']

        context_parts = []
        if 'document_title' in metadata:
            context_parts.append(f"From: {metadata['document_title']}")
        if 'header_1' in metadata:
            context_parts.append(f"Section: {metadata['header_1']}")
        if 'header_2' in metadata:
            context_parts.append(f"Subsection: {metadata['header_2']}")

        context = " | ".join(context_parts)
        contextualized.append({
            'content': f"{context}\n\n{content}" if context else content,
            'metadata': metadata
        })

    return contextualized
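
The function above expects chunks that already carry `document_title`, `header_1`, and `header_2` metadata. A minimal way to produce that shape from markdown is sketched below; `split_markdown_by_headers` is a hypothetical helper, it handles only `#` and `##` headings, and it does not account for `#` lines inside code fences.

```python
import re
from typing import Dict, List


def split_markdown_by_headers(markdown: str, document_title: str) -> List[Dict]:
    """Split markdown on level 1-2 headings, attaching the heading path as metadata."""
    chunks: List[Dict] = []
    headers: Dict[str, str] = {}  # current header_1 / header_2
    buffer: List[str] = []

    def flush() -> None:
        content = "\n".join(buffer).strip()
        if content:
            chunks.append({
                "content": content,
                # Copy headers so later updates do not alter earlier chunks
                "metadata": {"document_title": document_title, **headers},
            })
        buffer.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,2})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            headers[f"header_{level}"] = match.group(2).strip()
            if level == 1:
                # A new top-level section resets the subsection
                headers.pop("header_2", None)
        else:
            buffer.append(line)

    flush()
    return chunks


sample = "# Setup\nInstall it.\n## Linux\nUse apt.\n# Usage\nRun it."
sample_chunks = split_markdown_by_headers(sample, "Guide")
```

Each resulting dict has the `content` and `metadata` keys that `add_structural_context` reads, so the two steps can be chained directly.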

Embedding Model Selection

Choosing the right embedding model depends on content type, chunk size, and deployment constraints. Note that MTEB scores change frequently as models are updated and new benchmarks are added, so always verify current scores before making decisions.

Model	MTEB Score	Dimensions	Cost/1M tokens	Best For
Cohere embed-v4	65.2	1024	$0.10	Multilingual, production
text-embedding-3-large	64.6	3072	$0.13	General purpose
text-embedding-3-small	62.3	1536	$0.02	Cost-sensitive
Voyage voyage-3-large	63.8	1536	$0.12	RAG-optimized
BGE-M3	63.0	1024	Self-hosted	Privacy-critical
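
Price and dimension count translate directly into ingestion cost and index size, which is often what decides between two models with similar scores. The arithmetic below assumes float32 vectors (4 bytes per value) and ignores index overhead and metadata; the corpus sizes are made-up examples.

```python
def embedding_cost_usd(total_tokens: int, price_per_million: float) -> float:
    """One-off cost of embedding a corpus at a given price per 1M tokens."""
    return total_tokens / 1_000_000 * price_per_million


def index_size_mb(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in megabytes, excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1_000_000


# Example: a 50M-token corpus split into 100,000 chunks
cost_large = embedding_cost_usd(50_000_000, 0.13)   # 6.5 USD
cost_small = embedding_cost_usd(50_000_000, 0.02)   # 1.0 USD
size_3072 = index_size_mb(100_000, 3072)            # 1228.8 MB
size_1536 = index_size_mb(100_000, 1536)            # 614.4 MB
```

At this scale the embedding bill is small either way, while halving the dimensions halves the memory footprint of the index.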

Embedding Optimization

from typing import List
import numpy as np
from openai import OpenAI

client = OpenAI()

def get_embeddings(
    texts: List[str],
    dimensions: int = 1024
) -> List[List[float]]:
    """
    Get OpenAI embeddings with dimension reduction.
    256-dim text-embedding-3-large outperforms full ada-002.
    """
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=dimensions  # Matryoshka truncation
    )
    return [item.embedding for item in response.data]


def batch_embed_with_normalization(
    texts: List[str],
    batch_size: int = 100,
    dimensions: int = 1024
) -> np.ndarray:
    """
    Embed texts in batches with L2 normalization.
    Normalization enables cosine similarity via dot product.
    """
    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        embeddings = get_embeddings(batch, dimensions)
        all_embeddings.extend(embeddings)

    embeddings_array = np.array(all_embeddings)

    # L2 normalize for cosine similarity
    norms = np.linalg.norm(embeddings_array, axis=1, keepdims=True)
    return embeddings_array / norms
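
Two properties used above can be checked by hand on tiny vectors: a truncated vector must be re-normalized, and for unit vectors the dot product equals cosine similarity. This pure-Python sketch uses made-up four-dimensional vectors. Client-side truncation like this is only meaningful for models trained to support it; with OpenAI models the `dimensions` parameter performs it server-side.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(dot(vector, vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def truncate_and_normalize(vector: List[float], dimensions: int) -> List[float]:
    """Keep the first `dimensions` values, then restore unit length."""
    return l2_normalize(vector[:dimensions])


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0, 1.0]
b = [4.0, 3.0, 1.0, 0.0]

a2 = truncate_and_normalize(a, 2)  # [0.6, 0.8]
b2 = truncate_and_normalize(b, 2)  # [0.8, 0.6]

dot_of_units = dot(a2, b2)             # 0.96
cosine_of_raw = cosine(a[:2], b[:2])   # 0.96
```

Because both results agree, a vector store configured for dot-product distance returns the same ranking as cosine distance once vectors are normalized.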

Metadata Extraction and Enrichment

Metadata enables filtering before semantic search, provides ranking signals, and supports source attribution.

from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime

@dataclass
class ChunkMetadata:
    # Content-based
    keywords: List[str]
    entities: List[str]
    content_type: str

    # Structural
    document_title: str
    section_header: Optional[str]
    chunk_index: int

    # Contextual
    source_url: Optional[str]
    ingestion_date: datetime
    language: str

    # Technical
    word_count: int
    has_code: bool
    has_table: bool
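
The technical fields are cheap to compute with the standard library alone. The sketch below fills them for one chunk; `technical_metadata` is a hypothetical helper, and the code and table detectors are rough heuristics (markdown-style pipe tables and a few Python keywords), not a general solution.

```python
import re
from datetime import datetime, timezone


def technical_metadata(text: str, chunk_index: int) -> dict:
    """Compute the structural and technical fields for a single chunk."""
    has_code = "```" in text or bool(
        re.search(r"^\s*(def|class|import)\s", text, re.MULTILINE)
    )
    # A line that starts and ends with a pipe looks like a markdown table row
    has_table = bool(re.search(r"^\|.+\|\s*$", text, re.MULTILINE))

    return {
        "chunk_index": chunk_index,
        "word_count": len(text.split()),
        "has_code": has_code,
        "has_table": has_table,
        "ingestion_date": datetime.now(timezone.utc).isoformat(),
    }


code_meta = technical_metadata("def foo():\n    return 1", chunk_index=3)
table_meta = technical_metadata("| a | b |\n| 1 | 2 |", chunk_index=0)
```

Storing `ingestion_date` as an ISO 8601 string keeps the payload JSON-serializable for most vector databases.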

Automated Extraction

import spacy
from collections import Counter
from typing import Dict, List

class MetadataExtractor:
    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")

    def extract_entities(self, text: str) -> Dict[str, List[str]]:
        """Extract named entities using spaCy."""
        doc = self.nlp(text)
        entities = {}
        for ent in doc.ents:
            if ent.label_ not in entities:
                entities[ent.label_] = []
            entities[ent.label_].append(ent.text)
        return entities

    def extract_keywords(self, text: str, top_n: int = 10) -> List[str]:
        """Extract keywords using noun chunks."""
        doc = self.nlp(text)
        noun_chunks = [chunk.text.lower() for chunk in doc.noun_chunks]
        chunk_counts = Counter(noun_chunks)
        return [word for word, _ in chunk_counts.most_common(top_n)]

    def detect_content_type(self, text: str) -> str:
        """Heuristic content type detection."""
        code_patterns = ['def ', 'function ', 'class ', 'import ', '```']
        if any(pattern in text for pattern in code_patterns):
            return 'code'

        tech_indicators = ['API', 'database', 'server', 'deployment']
        if sum(1 for ind in tech_indicators if ind.lower() in text.lower()) >= 2:
            return 'technical'

        return 'general'

Storing with Vector Database

from typing import List

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance, Filter, FieldCondition, MatchValue

def store_chunks_with_metadata(
    chunks: List[str],
    embeddings: List[List[float]],
    metadata_list: List[dict],
    collection_name: str = "documents"
):
    """Store chunks with rich metadata in Qdrant."""
    client = QdrantClient(host="localhost", port=6333)

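    # Create the collection only if it is missing; recreate_collection would
    # delete previously stored chunks on every call
    if not client.collection_exists(collection_name):
        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(
                size=len(embeddings[0]),
                distance=Distance.COSINE
            )
        )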

    points = [
        PointStruct(
            id=idx,
            vector=embedding,
            payload={"text": chunk, **metadata}
        )
        for idx, (chunk, embedding, metadata)
        in enumerate(zip(chunks, embeddings, metadata_list))
    ]

    client.upsert(collection_name=collection_name, points=points)


def search_with_filter(
    query_embedding: List[float],
    collection_name: str,
    content_type: str = None,
    top_k: int = 10
) -> List[dict]:
    """Search with optional metadata filter."""
    client = QdrantClient(host="localhost", port=6333)

    query_filter = None
    if content_type:
        query_filter = Filter(
            must=[FieldCondition(
                key="content_type",
                match=MatchValue(value=content_type)
            )]
        )

    # query_points supersedes the deprecated search() in recent qdrant-client releases
    response = client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        query_filter=query_filter,
        limit=top_k
    )

    return [{"text": hit.payload["text"], "score": hit.score} for hit in response.points]

Quality Metrics for Data Preparation

Measuring data quality enables data-driven optimization and early issue detection.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from typing import List

def evaluate_chunk_coherence(chunks: List[str], embedding_model) -> dict:
    """
    Measure semantic coherence within chunks.
    High coherence = chunk discusses single topic.
    """
    coherence_scores = []

    for chunk in chunks:
        # Drop empty fragments (e.g. after a trailing ". ") before embedding
        sentences = [s for s in chunk.split('. ') if s.strip()]
        if len(sentences) < 2:
            coherence_scores.append(1.0)
            continue

        embeddings = np.array(embedding_model.embed(sentences))
        similarities = cosine_similarity(embeddings)

        n = len(sentences)
        coherence = (similarities.sum() - n) / (n * (n - 1)) if n > 1 else 1.0
        coherence_scores.append(coherence)

    return {
        'mean_coherence': np.mean(coherence_scores),
        'min_coherence': np.min(coherence_scores),
        'low_coherence_count': sum(1 for s in coherence_scores if s < 0.5)
    }


def evaluate_boundary_quality(chunks: List[str]) -> dict:
    """Check if chunks have clean boundaries."""
    bad_starts = 0
    bad_ends = 0

    lowercase_starters = ['and', 'but', 'or', 'so', 'because', 'however']

    for chunk in chunks:
        first_word = chunk.split()[0].lower() if chunk.split() else ''
        if first_word in lowercase_starters:
            bad_starts += 1

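        # Guard against whitespace-only chunks, where rstrip() leaves an empty string
        stripped = chunk.rstrip()
        if stripped and stripped[-1] not in '.!?:':
            bad_ends += 1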

    return {
        'bad_start_ratio': bad_starts / len(chunks),
        'bad_end_ratio': bad_ends / len(chunks),
        'clean_boundary_ratio': 1 - (bad_starts + bad_ends) / (2 * len(chunks))
    }


def evaluate_retrieval_quality(
    embeddings: np.ndarray,
    test_queries: List[str],
    relevant_chunk_ids: List[List[int]],
    embedding_model
) -> dict:
    """Evaluate embedding quality using retrieval tests."""
    query_embeddings = np.array(embedding_model.embed(test_queries))
    similarities = cosine_similarity(query_embeddings, embeddings)

    hits_at_k = {1: 0, 5: 0, 10: 0}
    mrr_sum = 0

    for i, relevant_ids in enumerate(relevant_chunk_ids):
        ranked_indices = np.argsort(similarities[i])[::-1]

        for rank, idx in enumerate(ranked_indices):
            if idx in relevant_ids:
                mrr_sum += 1 / (rank + 1)
                for k in hits_at_k:
                    if rank < k:
                        hits_at_k[k] += 1
                break

    n_queries = len(test_queries)
    return {
        'mrr': mrr_sum / n_queries,
        'hit_rate@1': hits_at_k[1] / n_queries,
        'hit_rate@5': hits_at_k[5] / n_queries,
        'hit_rate@10': hits_at_k[10] / n_queries
    }
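
The metrics are easier to interpret after working through a toy case by hand. The snippet below computes MRR and hit rate@1 for three made-up queries with pre-ranked chunk IDs, using no embedding model at all.

```python
from typing import List, Set


def reciprocal_rank(ranked_ids: List[int], relevant_ids: Set[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant_ids:
            return 1 / rank
    return 0.0


# Ranked results per query (best first) and the known relevant chunks
rankings = [[7, 2, 9], [4, 1, 3], [5, 6, 8]]
relevant = [{7}, {3}, {0}]

scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant)]
mrr = sum(scores) / len(scores)
hit_rate_at_1 = sum(1 for s in scores if s == 1.0) / len(scores)
```

The first query hits at rank 1, the second at rank 3, and the third misses entirely, giving reciprocal ranks of 1, 1/3, and 0. MRR is therefore 4/9 (about 0.44) and hit rate@1 is 1/3.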

Common Pitfalls and Solutions

Pitfall 1: Skipping Parsing Validation

Assuming parsing tools work perfectly on all documents leads to missing content and mangled tables in retrieval results. Always validate parsing output on representative samples before full ingestion.

Pitfall 2: One-Size-Fits-All Chunking

Using the same chunk size for all content types results in code split mid-function and tables losing context. Match chunking strategy to content structure.

Pitfall 3: Ignoring Lost Context

Chunks that reference “it”, “this method”, “as mentioned” become meaningless in isolation. Implement contextual chunking (LLM or rule-based) to make chunks self-contained.

Pitfall 4: Choosing Models by Benchmark Alone

MTEB scores do not reflect performance on specific content. A high-benchmark model can perform poorly on domain-specific queries. Evaluate embedding models on your own test queries.

Pitfall 5: Processing in Wrong Order

Chunking before cleaning or embedding before deduplication creates noisy results. Follow the pipeline: parse -> clean -> dedupe -> chunk -> enrich -> embed.
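
The effect of stage order is easy to demonstrate with toy stages. In the sketch below, `clean`, `dedupe`, and `chunk` are deliberately simplistic stand-ins for the real steps, and `run_pipeline` is a hypothetical helper that just applies stages in sequence.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def clean(docs: List[str]) -> List[str]:
    """Strip leading and trailing whitespace."""
    return [doc.strip() for doc in docs]


def dedupe(docs: List[str]) -> List[str]:
    """Drop exact duplicates while preserving order."""
    return list(dict.fromkeys(docs))


def chunk(docs: List[str]) -> List[str]:
    """Split each document on blank lines."""
    return [part for doc in docs for part in doc.split("\n\n") if part]


def run_pipeline(docs: List[str], stages: List[Stage]) -> List[str]:
    for stage in stages:
        docs = stage(docs)
    return docs


raw_docs = ["Hello world", "  Hello world  ", "Other\n\nPart two"]

right_order = run_pipeline(raw_docs, [clean, dedupe, chunk])
wrong_order = run_pipeline(raw_docs, [dedupe, clean, chunk])
```

With cleaning first, the two "Hello world" variants collapse into one and three chunks remain. With deduplication first, the whitespace difference hides the duplicate, so it survives into the index as a fourth chunk.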

Building Your Pipeline: A Practical Approach

The order in which you tackle data preparation matters. Here’s how to think about building your pipeline.

Start with parsing validation. Before writing any pipeline code, manually inspect parsing output for 10-20 representative documents. Look for mangled tables, missing sections, and garbled text. If your parser fails on 30% of samples, no downstream optimization will save you.

Next, establish your preprocessing baseline. Run your text through normalization, PII detection, and boilerplate removal. Compare before/after samples. The goal is clean, consistent text without losing meaningful content.

Then choose your chunking strategy based on what you learned from parsing. If your documents have clear hierarchical structure (headers, sections), leverage it with structure-aware chunking. If they’re dense technical prose, recursive splitting is your friend. If you’re dealing with mixed content types, consider routing different document types to different strategies.

Add context only after chunking works well. Contextual enrichment is powerful but adds cost and complexity. Get your basic pipeline producing reasonable results first, then measure whether contextual chunking improves your specific retrieval scenarios.

Finally, close the loop with metrics. Implement coherence and boundary quality checks. Create a small test set of queries with known relevant chunks. Run retrieval evaluations weekly as you tune parameters. Without measurement, you’re guessing.

The key insight: each step depends on the previous one working correctly. Resist the urge to implement everything at once. A simple pipeline you understand beats a complex one you can’t debug.

Key Takeaways

Data preparation sets the quality ceiling: The most sophisticated RAG architecture cannot compensate for poorly prepared data.

Parsing determines everything downstream: Invest in quality parsing tools and validate output before proceeding.

Context matters more than chunk size: Contextual chunking reduces retrieval failures by 35% alone, 49% with BM25, and 67% with reranking.

Quality metrics are non-negotiable: Measure parsing accuracy, chunk coherence, and retrieval quality throughout the pipeline.

Start simple, measure, enhance: Begin with RecursiveCharacterTextSplitter and quality parsing. Add complexity only when metrics justify it.

Sources

Anthropic: Introducing Contextual Retrieval - Research on contextual chunking with performance benchmarks
LangChain Text Splitters Documentation - Official documentation for chunking strategies
Docling: Document Parsing Library - High-accuracy document parsing tool
MTEB Leaderboard - Massive Text Embedding Benchmark for model comparison
Qdrant Documentation - Vector database with metadata filtering
datasketch: MinHash LSH - Near-duplicate detection library
spaCy: Industrial NLP - Named entity recognition and text processing
OpenAI Embeddings Guide - Embedding model documentation with Matryoshka truncation
