Temporal Preference Concepts and their Functions in a Large Language Model

Unruly Abstractions

Abstract

This work causally localizes a subgraph for temporal preference in a distilled LLM (Qwen3-4B-Instruct-2507) using gradient attribution and activation patching, with steering vectors as a suggestive control.