Research
Papers, working drafts, and ongoing projects.
Relevance Is Not Value: Pre-Retrieval Information Selection for LLM Forecasting Agents
Michael Jiao · 2026
Forecasting agents typically retrieve sources by topical relevance, but a relevant article can still be redundant, stale, or unlikely to move the final probability. We test whether a frozen LLM can predict a source's value of information (VOI) from cheap previews alone (title, domain, date, snippet) before paying the cost of opening it. On OpenForesight questions with hard topical distractors, we compare random selection, recency, lexical relevance, embedding relevance, an LLM relevance judge, and an LLM VOI selector that picks the sources expected to most improve its own forecast. We evaluate Brier score, log loss, and cost-performance curves across opening budgets to show that relevance and forecast value are not the same thing, and that pre-retrieval value estimates can cut the cost of forecasting agents.
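For readers unfamiliar with the two metrics named above, the following is a minimal sketch of Brier score and log loss for binary questions, plus a toy budgeted selection step. The function names, the clipping constant `eps`, and the idea of ranking previews by a single scalar score are illustrative assumptions, not the paper's implementation.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of the 0/1 outcomes under the forecasts.

    Probabilities are clipped to [eps, 1 - eps] so that a confident wrong
    forecast yields a large but finite penalty (eps is an assumed constant).
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def select_sources(value_scores, budget):
    """Hypothetical selector: indices of the `budget` highest-scored previews."""
    ranked = sorted(range(len(value_scores)),
                    key=lambda i: value_scores[i], reverse=True)
    return ranked[:budget]


# Toy data: two forecasts, both on the correct side of 0.5.
forecasts = [0.9, 0.2]
resolved = [1, 0]
print(brier_score(forecasts, resolved))
print(log_loss(forecasts, resolved))

# Open only the two previews with the highest estimated value.
print(select_sources([0.1, 0.9, 0.4], budget=2))
```

Lower is better for both metrics. A cost-performance curve in this setting would plot one of them against the opening budget passed to the selector.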
Cache Me if You Can: Repairing the KV State after Compression
Michael Jiao · 2026
KV-cache compression in long-context LLM inference makes eviction decisions from the current prefix, which can discard tokens that only become important in later turns. We introduce RepairKV, a runtime mechanism that uses the idle window between turns to rescore evicted KV rows against a newly available signal and promote a budgeted subset back into the active cache before decoding resumes. On Qwen2.5-7B-Instruct at 32K context, RepairKV reaches 91.0% retrieval on a four-query needle-in-a-haystack task versus 24.5% for the matched no-repair baseline at the same active-cache budget, with only 96 tokens promoted. The repair effect persists across different initial eviction policies and as relevance shifts between turns.
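The rescore-and-promote step described above can be illustrated with a small sketch. It assumes the newly available signal is a single query vector and that evicted rows are scored by a dot product against their keys; both choices, along with the name `promote_evicted`, are hypothetical stand-ins and not RepairKV's actual scoring rule.

```python
import heapq


def promote_evicted(evicted_keys, query, budget):
    """Return indices of the `budget` evicted rows that score highest
    against `query` (dot-product scoring is an assumption of this sketch)."""
    scores = [sum(k_i * q_i for k_i, q_i in zip(key, query))
              for key in evicted_keys]
    # Pick the top-`budget` row indices by score, then restore positional order
    # so promoted rows can be merged back into the active cache in sequence.
    top = heapq.nlargest(budget, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


# Toy example: four evicted rows with 2-dimensional keys.
evicted = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [0.0, 1.0]  # stand-in for the signal that arrives with the next turn
promoted = promote_evicted(evicted, query, budget=2)
print(promoted)
```

In this picture the active cache keeps its fixed budget, and only the promoted indices are moved back before decoding resumes.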