What is Heretic?

Heretic is a toolkit for customizing, ablation-testing, and evaluating large language models (LLMs). It supports dense models and MoE/hybrid architectures and provides built-in chat, a benchmark runner, and model testing utilities.

Heretic implements multiple ablation and analysis methods, including directional ablation (Arditi et al., 2024), projected abliteration (Lai, 2025), MPOA (Lai, 2025), experimental SOMA (Piras et al., 2025), and ARA (Weidmann, 2026).
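
To make the first of these methods concrete, here is a minimal sketch of the general idea behind directional ablation: estimate a direction as the difference of mean activations between two groups of prompts, then remove each activation's component along that direction. This is an illustration in plain NumPy on toy data, not Heretic's implementation; the function names difference_of_means and ablate_direction and the array shapes are assumptions made for this example.

```python
import numpy as np


def difference_of_means(group_a, group_b):
    """Estimate a direction as mean(group_a) - mean(group_b).

    Both inputs are (n_samples, hidden_size) arrays of activations.
    """
    return group_a.mean(axis=0) - group_b.mean(axis=0)


def ablate_direction(activations, direction):
    """Remove the component of every activation vector along `direction`.

    Computes x' = x - (x . r) r for the unit vector r = direction / ||direction||.
    """
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = direction / norm
    # (activations @ unit) holds one projection coefficient per row.
    return activations - np.outer(activations @ unit, unit)


if __name__ == "__main__":
    # Toy data standing in for activations; hypothetical, not real model output.
    rng = np.random.default_rng(0)
    group_a = rng.normal(loc=1.0, size=(8, 4))
    group_b = rng.normal(loc=0.0, size=(8, 4))

    direction = difference_of_means(group_a, group_b)
    ablated = ablate_direction(group_a, direction)

    # After ablation, nothing is left along the estimated direction.
    print(np.allclose(ablated @ direction, 0.0))
```

The same projection can be applied to weight matrices instead of activations, which makes the change permanent in the model rather than something computed at inference time.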

Integrations include Hugging Face model hosting and community repositories on GitHub. Heretic installs with pip and runs from the command line, for example: heretic qwen/qwen3-4b-instruct-2507. Designed for researchers and engineering teams, Heretic enables reproducible experiments, model introspection, and configurable instruction-following behavior.

Common use cases include ablation studies, benchmarking, model evaluation, and building custom inference workflows for research and production environments.

Heretic user reviews

Based on 1 review, 100% of users recommend Heretic; the reviewer rated it highly for quality results.

Liked for

Quality results 1 of 1

All key features 1 of 1

Good integrations 1 of 1

Heretic's key features

Customization, ablation testing, and evaluation of large language models
Support for dense, MoE, and hybrid model architectures
Built-in chat interface, benchmark runner, and model testing utilities
Implements multiple ablation/analysis methods (directional ablation, projected abliteration, MPOA, SOMA, ARA)
Integrations with Hugging Face and GitHub, plus command-line usage and pip installation (Python 3.10+)

Heretic use cases

Run ablation studies on dense and MoE/hybrid LLMs to pinpoint which components drive performance, using Heretic's built-in model introspection and analysis methods, and export reproducible experiment artifacts through the Hugging Face and GitHub integrations.
Build and iterate on custom inference workflows, including MoE/hybrid deployments: prototype with the CLI and built-in chat interface, debug with model introspection, and version pipelines in GitHub for reproducible rollouts.
Benchmark and compare candidate LLMs on standardized datasets: use the benchmark runner to automate multi-model evaluations and ablation sweeps, generate shareable performance reports, and inform model selection and hyperparameter decisions.