Software Fault Prediction Using a Language-Proficient Transformer Model: An Enhanced Approach with BugsplorerPy

Authors

  • E. Adediran, Department of Computer Science, Lead City University, Ibadan, Nigeria
  • S. O. Akinola, Department of Computer Science, University of Ibadan, Nigeria

Keywords:

Semantic Tokenization, Cross-File Bug Detection, Hierarchical Transformation, Python Defect

Abstract


The pursuit of reliable software defect prediction (SDP) methodologies continues to confront fundamental
limitations in addressing the idiosyncrasies of dynamically typed languages, particularly Python, whose syntactic
flexibility and implicit dependencies challenge conventional static analysis paradigms. This work presents
BugsplorerPy, a transformer-based framework that advances the state of the art
through three contributions: (1) a syntax-aware hierarchical attention mechanism that dynamically
adapts to Python’s indentation-scoped control flow and duck-typed variable semantics, (2) an interprocedural
analysis pipeline that models cross-file defect propagation through import graphs and call-chain embeddings,
and (3) a parameter-efficient adaptation strategy that maintains the expressivity of foundation models while
optimizing for real-world IDE deployment constraints. Empirical validation on the Defectors benchmark, the
first curated dataset for Python-specific defect analysis, reveals statistically significant improvements (p < 0.01)
across all evaluation dimensions: 78.5-81.4% balanced accuracy (+3.83% over baseline), 0.862-0.882
AuROC (+4.88%), and 72.2-80.1% Recall@20%LOC (+6.23%), with particular gains in detecting
type-system violations (F1 +7.1%) and exception-handling flaws (F1 +5.8%). The model’s hybrid
architecture, which combines static program analysis with learned representations, achieves 83% precision
in identifying defect-prone file clusters, a critical capability for large-scale refactoring efforts. These findings
not only support the need for language-specific SDP adaptations but also point to a methodological
approach for balancing interpretability (through attention-based defect attribution) with the representational
power of modern transformer networks in software engineering contexts.
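
For readers unfamiliar with the evaluation measures named above, the following is a minimal sketch of how balanced accuracy and an effort-aware Recall@20%LOC are commonly computed. It is not taken from the BugsplorerPy implementation: the function names, the file-level granularity, and the rule of stopping before the line budget is exceeded are all assumptions made for illustration, and the paper's exact definitions may differ.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate (labels are 0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return 0.5 * (tp / positives + tn / negatives)


def recall_at_loc_budget(y_true, scores, loc, budget=0.20):
    """Fraction of defective items found when inspecting items in order of
    decreasing predicted risk until `budget` of all lines of code is used.

    Assumption: an item is inspected only if it fits entirely in the budget.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(loc)
    inspected_loc = 0
    found = 0
    for i in order:
        if inspected_loc + loc[i] > limit:
            break  # the next item would exceed the inspection budget
        inspected_loc += loc[i]
        found += y_true[i]
    return found / sum(y_true)


# Toy data (hypothetical): five files with defect labels, risk scores, and sizes.
labels = [1, 0, 1, 0, 0]
risk = [0.9, 0.8, 0.4, 0.3, 0.1]
sizes = [10, 10, 50, 20, 10]
preds = [1 if r >= 0.5 else 0 for r in risk]

print(balanced_accuracy(labels, preds))
print(recall_at_loc_budget(labels, risk, sizes))
```

Balanced accuracy is used in defect prediction because defective files are usually a small minority, and Recall@20%LOC reflects how many defects a reviewer would catch with a fixed inspection effort.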

Published

2025-12-18