
SWE-smith: The Dataset Pipeline That Makes Software Engineering Agents Trainable at Scale

SWE-smith automatically generates about 50k task instances from 128 GitHub repositories; SWE-agent-LM-32B, an open-weight model trained on that data, reaches 40.2% pass@1 on SWE-bench Verified.

5 min read Updated 2025

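SWE-smith synthesizes realistic tasks by introducing bugs into real repositories so that existing tests fail. Each task ships with an execution environment, and the failing tests serve as validation for a fix. The resulting dataset broadens repository coverage for both training and evaluation.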

Where SWE-smith saves time
