- Chunk-Guided Q-Learning (CGQ) is an offline RL method that trains a reactive single-step policy while leveraging action chunking to stabilize value learning over long horizons.
- CGQ is stable and performant. The chunk-based critic bootstraps over a longer horizon, reducing compounding value-estimation error. The single-step policy preserves full reactivity, recovering the fine-grained trajectory stitching that chunked policies cannot.
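The longer-range bootstrap mentioned above can be sketched concretely: instead of bootstrapping after every step, a chunk critic discounts the summed reward over an h-step action chunk and bootstraps once at the chunk boundary. This is a minimal illustrative sketch, not CGQ's actual implementation; the function name and arguments are assumptions.

```python
import numpy as np

def chunk_td_target(rewards, q_bootstrap, gamma=0.99):
    """Chunk-level TD target (illustrative, not the CGQ codebase).

    rewards:     per-step rewards collected over one h-step action chunk
    q_bootstrap: critic's value estimate at the state after the chunk
    gamma:       per-step discount factor
    """
    h = len(rewards)
    discounts = gamma ** np.arange(h)           # [1, gamma, gamma^2, ...]
    # Discounted h-step return plus a single bootstrap at the chunk boundary,
    # so value errors compound once per chunk rather than once per step.
    return float(np.dot(discounts, np.asarray(rewards)) + gamma**h * q_bootstrap)
```

With a chunk of length h, the bootstrap term is discounted by gamma^h, so the target leans more on observed rewards and less on the (possibly biased) value estimate than a one-step target would.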