Sweep - PairCoder Docs

When you rename a function, delete a class, or refactor a module, the old code doesn't clean itself up. Dead imports, orphaned tests, stale helpers, and unreachable exports all pass CI because nothing calls them. They accumulate until the codebase feels like an archaeological dig.

bpsai-pair sweep solves this. Given a diff (branch, commit range, or working tree), it identifies what the change made unnecessary and reports actionable findings grouped by category and confidence.

Works on Any Stack

Sweep ships with a Generic provider that works on any programming language via regex heuristics (~80% accuracy), plus a precise Python provider using the ast module (~95% accuracy). No configuration needed -- language is detected from file extensions.
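To make the extension-based detection concrete, here is a minimal sketch of how a provider could be chosen per file. The provider names come from the Language Support table; the mapping dictionary and the `select_provider` function are hypothetical and are not Sweep's actual internals.

```python
from pathlib import Path

# Hypothetical lookup table for illustration only. Extensions without a
# precise provider fall back to the generic, regex-based one.
PRECISE_PROVIDERS = {".py": "PythonProvider"}


def select_provider(path: str) -> str:
    """Return the provider name for a file based on its extension."""
    suffix = Path(path).suffix.lower()
    return PRECISE_PROVIDERS.get(suffix, "GenericProvider")


print(select_provider("src/billing.py"))  # PythonProvider
print(select_provider("web/cart.tsx"))    # GenericProvider
```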

Quick Start

# Sweep current branch against its merge base
bpsai-pair sweep

# Sweep staged changes only
bpsai-pair sweep --staged

# Sweep against a specific ref
bpsai-pair sweep --since v2.23.0

# JSON output for CI integration
bpsai-pair sweep --json

How It Works

Sweep runs three phases on every invocation:

Phase	What It Does
1. Diff Analysis	Parses the git diff and extracts changed, deleted, and renamed symbols at the function/class/variable level -- not just line-level changes
2. Reference Scan	Searches the codebase for remaining references to old symbol names using word-boundary matching. Filters definition sites, comments, and ignored directories
3. Classification	Categorizes each finding by type and confidence, with suggested actions for each
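The reference scan in phase 2 can be approximated with a word-boundary regular expression. The sketch below is an illustration under simplifying assumptions, not Sweep's implementation: `find_references` is a hypothetical helper, it only recognizes Python-style `def`/`class` definition sites, and it does not handle comments or ignored directories.

```python
import re


def find_references(symbol: str, source: str) -> list[int]:
    """Return 1-based line numbers where `symbol` appears as a whole word."""
    word = re.compile(rf"\b{re.escape(symbol)}\b")
    # A definition site is not a leftover reference, so it is skipped.
    definition = re.compile(rf"^\s*(?:def|class)\s+{re.escape(symbol)}\b")
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if definition.match(line):
            continue
        if word.search(line):
            hits.append(lineno)
    return hits


source = """\
from billing import calculate_total
def calculate_total_with_tax(x):
    return calculate_total(x) * 1.2
"""
print(find_references("calculate_total", source))  # [1, 3]
```

Word-boundary matching is what keeps `calculate_total_with_tax` on line 2 from being reported: the underscore after `calculate_total` is a word character, so `\b` does not match there.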

Finding Categories

Category	What It Means	Suggested Action
`dead_import`	A module imports a symbol that was deleted or renamed	Remove the import
`orphaned_test`	A test references a function or class that no longer exists	Delete or update the test
`stale_helper`	A helper function wraps or calls a deleted function	Remove the helper
`unreachable_export`	A symbol is exported in `__init__.py` or `__all__` but no longer exists	Remove from exports
`stale_reference`	A deleted symbol is mentioned in comments, docstrings, or config files	Update the reference

Confidence Levels

Each finding is assigned a confidence level based on how certain the classification is:

High -- Exact match in an import statement or test function name. Safe to act on.
Medium -- Contextual match (function call, variable reference). Review before acting.
Low -- Found in a comment, string literal, or config file. May be intentional.

Command Reference

Flag	Description	Default
`--since <ref>`	Diff against a specific git ref (commit, tag, or branch)	Auto-detect merge base (dev or main)
`--staged`	Only sweep staged changes	`false`
`--working`	Sweep uncommitted working tree changes	`false`
`--json`	Structured JSON output for CI, MCP, or agent consumption	`false`
`--category <cat>`	Filter findings by category (`dead_import`, `orphaned_test`, etc.)	All categories
`--confidence <level>`	Minimum confidence threshold (`high`, `medium`, `low`)	`low` (show all)
`--fix`	Auto-remove high-confidence dead imports (safe subset only)	`false`
`--deep`	Request deep analysis from Amunet via A2A (when available)	`false`

Exit Codes

0 -- No high-confidence findings (or no findings at all)
1 -- High-confidence findings exist. Use this in CI to fail builds that leave dead code behind.
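Expressed as code, the exit-code rule is a single check over the findings. This is a sketch only: the list-of-dicts shape and the `confidence` key are assumptions made for illustration, not a documented data model.

```python
def exit_code(findings: list[dict]) -> int:
    """Return 1 if any finding is high confidence, otherwise 0."""
    # Assumed shape: each finding is a dict with a "confidence" key.
    return 1 if any(f.get("confidence") == "high" for f in findings) else 0


print(exit_code([]))                            # 0
print(exit_code([{"confidence": "medium"}]))    # 0
print(exit_code([{"confidence": "high"}]))      # 1
```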

Language Support

Sweep auto-detects the language from file extensions and selects the best available provider:

Language	Provider	Accuracy	Method
Python	PythonProvider	~95%	`ast` module -- full function, class, import, and constant extraction
JavaScript, TypeScript, React	GenericProvider	~80%	Regex heuristics for function, class, import patterns
Go, Rust, Java, Ruby, C#	GenericProvider	~80%	Regex heuristics for common definition patterns
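To show why an AST-based provider can be more precise than regex heuristics, the sketch below uses Python's standard `ast` module to list top-level symbols before and after a change and report the ones that disappeared. The helper names `top_level_symbols` and `removed_symbols` are hypothetical, and this simplified version ignores imports, nested definitions, and annotated assignments.

```python
import ast


def top_level_symbols(source: str) -> set[str]:
    """Collect names of top-level functions, classes, and plain assignments."""
    names: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names


def removed_symbols(old_source: str, new_source: str) -> set[str]:
    """Symbols defined in the old version but missing from the new one."""
    return top_level_symbols(old_source) - top_level_symbols(new_source)


old = "TAX = 0.2\ndef calculate_total(items):\n    return sum(items)\n"
new = "TAX = 0.2\ndef compute_line_items(items):\n    return sum(items)\n"
print(removed_symbols(old, new))  # {'calculate_total'}
```

Because the parser understands syntax, a mention of `calculate_total` inside a string or comment never shows up as a definition, which is a common source of false positives for regex-based extraction.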

Deep Analysis with Amunet

When Amunet is registered on your A2A network, --deep sends the diff for full dependency graph analysis. Amunet traces reverse impacts through the actual import/call graph rather than grep, catching transitive dead code that local analysis misses. Results merge with local findings automatically.
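As an illustration of what "transitive dead code" means, the sketch below walks a small call graph and finds symbols whose every caller is already dead. It is not Amunet's algorithm or API: the `calls` mapping, the symbol names, and the `transitive_dead` function are all hypothetical.

```python
def transitive_dead(calls: dict[str, set[str]], deleted: set[str]) -> set[str]:
    """Return symbols that become dead because all of their callers are dead.

    `calls` maps each caller to the set of symbols it calls.
    """
    # Invert the graph: callee -> set of callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    dead = set(deleted)
    changed = True
    while changed:  # repeat until no new dead symbols are found
        changed = False
        for symbol, used_by in callers.items():
            if symbol not in dead and used_by <= dead:
                dead.add(symbol)
                changed = True
    return dead - deleted


calls = {
    "calculate_total": {"_round_cents", "format_money"},
    "_round_cents": {"_to_decimal"},
    "render_invoice": {"format_money"},
}
print(sorted(transitive_dead(calls, {"calculate_total"})))
# ['_round_cents', '_to_decimal']
```

A text search for `calculate_total` would never surface `_to_decimal`, since that helper does not mention the deleted name anywhere; only the graph reveals that nothing live reaches it. `format_money` stays alive because `render_invoice` still calls it.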

Post-Engage Integration

Sweep runs automatically after each bpsai-pair engage sprint. Findings appear in the PR body under a "Cleanup Opportunities" section. This is advisory only -- it does not block PR creation.

## Cleanup Opportunities

| Category | File | Line | Confidence | Suggestion |
|----------|------|------|------------|------------|
| dead_import | src/utils.py | 3 | high | Remove import of deleted `calculate_total` |
| orphaned_test | tests/test_calc.py | 45 | high | Test references deleted `calculate_total` |

Ignoring Files

Create a .sweepignore file in your project root to exclude paths from reference scanning. It uses the same pattern format as .gitignore:

# .sweepignore
vendor/
generated/
*.pb.go
*_generated.ts
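For intuition about how such patterns exclude paths, here is a deliberately simplified matcher built on Python's `fnmatch`. It covers only the two pattern shapes in the example above (directory names ending in `/` and basename globs) and does not implement full .gitignore semantics such as negation, anchoring, or `**`. The `load_patterns` and `is_ignored` helpers are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def load_patterns(text: str) -> list[str]:
    """Parse ignore-file text, skipping blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Check a forward-slash relative path against simplified ignore patterns."""
    p = PurePosixPath(path)
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches if any parent directory has that name.
            if pattern.rstrip("/") in p.parts[:-1]:
                return True
        elif fnmatch(p.name, pattern):
            # File pattern: glob-match against the file name only.
            return True
    return False


patterns = load_patterns("# .sweepignore\nvendor/\ngenerated/\n*.pb.go\n*_generated.ts\n")
print(is_ignored("vendor/lib/client.go", patterns))  # True
print(is_ignored("api/user.pb.go", patterns))        # True
print(is_ignored("src/billing.py", patterns))        # False
```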

CI Integration

Add sweep to your CI pipeline to catch dead code before it merges:

# GitHub Actions example
- name: Check for dead code
  run: bpsai-pair sweep --since origin/main --confidence high --json

The --json output includes structured findings that can be parsed by other tools or posted as PR comments.
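As a sketch of the "posted as PR comments" use case, the snippet below turns JSON findings into a Markdown table. The JSON schema is an assumption: this page does not document it, so the example supposes a top-level list of objects whose keys mirror the columns of the "Cleanup Opportunities" table. Check the real output of `bpsai-pair sweep --json` before relying on these field names.

```python
import json

# Assumed field names, mirroring the "Cleanup Opportunities" table columns.
COLUMNS = ("category", "file", "line", "confidence", "suggestion")


def to_markdown(findings_json: str) -> str:
    """Render a JSON list of findings as a Markdown table."""
    findings = json.loads(findings_json)
    header = "| " + " | ".join(c.capitalize() for c in COLUMNS) + " |"
    divider = "|" + "|".join("---" for _ in COLUMNS) + "|"
    rows = [
        "| " + " | ".join(str(f.get(c, "")) for c in COLUMNS) + " |"
        for f in findings
    ]
    return "\n".join([header, divider, *rows])


sample = json.dumps([
    {"category": "dead_import", "file": "src/utils.py", "line": 3,
     "confidence": "high", "suggestion": "Remove import"},
])
print(to_markdown(sample))
```

In a pipeline, the JSON string would come from the command's standard output rather than from a hard-coded sample.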

Examples

After a refactor

# You just renamed calculate_total to compute_line_items
bpsai-pair sweep

# Found 3 dead references across 3 files (2 high, 1 medium confidence)
#
#   dead_import      src/billing.py:3           high    Remove import of `calculate_total`
#   orphaned_test    tests/test_billing.py:45   high    Test `test_calculate_total` references deleted symbol
#   stale_reference  docs/api.md:12             medium  Mentions `calculate_total` in documentation

Auto-fix dead imports

# Remove high-confidence dead imports automatically
bpsai-pair sweep --fix

# Fixed 2 dead imports in 2 files

Filter by category

# Only show orphaned tests
bpsai-pair sweep --category orphaned_test