
Why your AI coach did that: Agent Log goes GA

Martijn Russchen · 7 min read

Most AI coaching tools are a black box. The plan rewrites itself overnight, the workout shows up on your calendar, and you take it on faith that the model did something sensible. When it gets it wrong (say, a sweet-spot session on a day you can barely walk down the stairs), your only recourse is a complaint into the void.

That's the wrong shape. If the AI is going to act on your behalf (picking today's session, modifying a workout because you slept badly, pushing your long ride to Saturday because Tuesday's Z2 went deeper than planned), you should be able to read the receipt. Today.

I've been running Agent Log behind a beta flag for the past few months. Today it comes out of beta and turns on for every IntervalCoach athlete: free, trial, Pro, and Max. No flag toggle, no tier gate. You'll find it in your account menu as "Agent Log".

What it actually is

Every time the system makes a decision about your training, an entry lands in your Agent Log. The collapsed row tells you what was decided. Tap it open and you see the inputs that drove the decision and the reasoning trace the agent generated.

The entry types it captures:

  • Readiness. The morning call: push, moderate, or rest. Lists every signal the readiness pipeline considered (usually 12–15 of them, including sleep, HRV, recovery, soreness, training load balance, illness/injury status, ramp rate, and fitness projection) and which ones tipped the call.
  • Workout selection. Which workout type was picked for today, why, and which alternatives the scoring system also considered. Distinguishes the morning cron's pick from a TrainNow click.
  • Load adjustment. When the system caps your TSS or intensity for a specific reason (returning from illness, deeply negative TSB, declining aerobic efficiency, etc.). Shows the multiplier and the reason in plain language.
  • Workout adaptation. When the workout on your calendar gets swapped or modified relative to what the plan said, for example Long Ride · 90 min → Endurance · 60 min, along with the trigger.
  • Phase transition. Build → Peak, Peak → Taper, Taper → Race Week, etc. Includes the why (weeks-to-goal threshold, training-load milestone, completed phase length).
  • Weekly plan. Each Sunday cron run leaves a row for every week of the upcoming plan, with the week type, TSS target, intensity sessions, sports, and the week's start date.
  • Plan regeneration. When the macro plan rebuilds from scratch (manual rebuild, settings change, periodization model change). The source field tells you which.
  • Post-workout. The AI's analysis after a completed activity: an effectiveness score (0–10), stimulus type, key insight, recovery hours, and what to adjust in the next session.
  • Wellness re-checks. When your wearable syncs after the morning call and the system re-evaluates whether that call still holds. Most of these end up suppressed (the data arrived outside the 10:00–17:00 re-check window, or nothing changed materially), and the row tells you exactly which gate caught it.
  • Adaptation resolved. When an adapted workout is staged for your approval (because you have "preview before applying" turned on) and you click approve or reject. Captures what was proposed, what you decided, and from which surface (web, iOS, or email link).
  • Coach+ actions. Every state-changing chat tool call: workout moved, race created, intensity adjusted, settings changed, coaching memory saved. So when you ask next week, "wait, who deleted my A-race?", the answer is right there in the log.

The expanded row pulls in the agent trace: wellness inputs the agent saw, the planned workout it was looking at, your TSB drift since the plan was created, the pipeline steps it ran, and the reasoning lines it produced. No prompt context, no API gore. Just the layer of "what did the system see, and what did it conclude."
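
To make the shape concrete, here is a minimal TypeScript sketch of what one entry and its expanded trace could look like. Only `weekly_plan`, `proposed_adaptation`, and `adaptation_decision` are row names that appear in this post; every other kind name, every field name, and the `renderExpanded` helper are hypothetical, not IntervalCoach's actual schema. Note what the trace leaves out: there is no field for the prompt or the raw model output.

```typescript
// Hypothetical Agent Log entry shape. `weekly_plan`, `proposed_adaptation` and
// `adaptation_decision` are row names mentioned in this post; the other kind
// names and every field name are illustrative assumptions.
type EntryKind =
  | "readiness"
  | "workout_selection"
  | "load_adjustment"
  | "workout_adaptation"
  | "phase_transition"
  | "weekly_plan"
  | "plan_regeneration"
  | "post_workout"
  | "wellness_recheck"
  | "proposed_adaptation"
  | "adaptation_decision"
  | "coach_plus_action";

// What the expanded row shows. There is deliberately no prompt or raw output.
interface AgentTrace {
  wellnessInputs: Record<string, number | string>; // e.g. { hrv: 58, sleepHours: 6.5 }
  plannedWorkout?: string;    // the planned workout the agent was looking at
  tsbDriftSincePlan?: number; // TSB now minus TSB when the plan was created
  pipelineSteps: string[];    // e.g. ["fetchContext", "readinessAssessment"]
  reasoning: string[];        // the reasoning lines the agent surfaced
}

interface AgentLogEntry {
  id: string;
  kind: EntryKind;
  decidedAt: string;  // ISO-8601 time the decision was made
  summary: string;    // collapsed-row text, e.g. "GO. No adaptation signals detected"
  trace?: AgentTrace; // only rendered when the row is expanded
}

// Render an expanded row as plain-text lines.
function renderExpanded(entry: AgentLogEntry): string[] {
  const lines = [`[${entry.kind}] ${entry.summary}`];
  const t = entry.trace;
  if (!t) return lines; // collapsed-only entry: nothing to expand
  for (const [signal, value] of Object.entries(t.wellnessInputs)) {
    lines.push(`  input: ${signal} = ${value}`);
  }
  if (t.plannedWorkout) lines.push(`  planned: ${t.plannedWorkout}`);
  if (t.tsbDriftSincePlan !== undefined) {
    const sign = t.tsbDriftSincePlan >= 0 ? "+" : "";
    lines.push(`  TSB drift since plan: ${sign}${t.tsbDriftSincePlan}`);
  }
  lines.push(`  steps: ${t.pipelineSteps.join(" → ")}`);
  for (const line of t.reasoning) lines.push(`  - ${line}`);
  return lines;
}
```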

A real week, end to end

Below is my own Agent Log for the past week, with numbers and timestamps copied verbatim from production. I'm tapering for the A+ fietsweekend (a cycling weekend).

Thursday May 7, a normal training day.

Time  | Type      | Decision
07:53 | Readiness | GO. No adaptation signals detected, training as planned (Phase: Peak, TSB +4, CTL 31)
07:53 | Workout   | Neuromusculaire Sprint
09:14 | Re-check  | Suppressed. Wellness arrived at 09:00 local, outside the 10:00–17:00 window
20:14 | Analysis  | Neuromusculaire Sprint, 7/10, anaerobic stimulus

This is the boring shape, and the boring shape is the point. Peak phase, positive TSB, training as planned. Every signal the readiness pipeline checks (sleep, recovery, ramp rate, load balance, illness flags, efficiency, projection) comes back clean, and the row lists them in plain language so I can scroll back and confirm. The morning re-check fired when Whoop synced, but the data landed at 09:00 local, an hour before the 10:00 window opens, so the gate correctly suppressed it.
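
The gate that caught Thursday's re-check can be sketched as a small pure function. This is an illustration under assumptions, not the production code: the 10:00–17:00 window comes from this post, but treating 17:00 itself as outside the window and passing "material change" in as a boolean are assumptions, and `recheckGate` is a hypothetical name.

```typescript
// Hypothetical re-check gate. The 10:00–17:00 window is from this post;
// treating 17:00 itself as outside the window is an assumption, and
// "material change" is passed in because its real definition isn't described.
type RecheckResult =
  | { outcome: "suppressed"; reason: string }
  | { outcome: "re-evaluate" };

const WINDOW_START = 10 * 60; // 10:00 local, in minutes after midnight
const WINDOW_END = 17 * 60;   // 17:00 local, exclusive

const pad = (n: number): string => n.toString().padStart(2, "0");

function recheckGate(
  arrivedLocal: { hour: number; minute: number },
  materialChange: boolean,
): RecheckResult {
  const minutes = arrivedLocal.hour * 60 + arrivedLocal.minute;
  // First gate: the wellness data must arrive inside the re-check window.
  if (minutes < WINDOW_START || minutes >= WINDOW_END) {
    const stamp = `${pad(arrivedLocal.hour)}:${pad(arrivedLocal.minute)}`;
    return {
      outcome: "suppressed",
      reason: `Wellness arrived at ${stamp} local, outside the 10:00–17:00 window`,
    };
  }
  // Second gate: something must have changed enough to revisit the morning call.
  if (!materialChange) {
    return { outcome: "suppressed", reason: "No material change since the morning call" };
  }
  return { outcome: "re-evaluate" };
}
```

Each suppressed branch returns its own reason string, which is what lets a log row say exactly which gate caught it instead of silently skipping.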

Saturday May 9, the morning the system capped my load.

Time  | Type      | Decision
06:42 | Readiness | MODIFY. TSS capped at 50%, intensity at Z3. Aerobic efficiency declining, back from sick 10 days ago (Phase: Taper, TSB +6, CTL 31)
06:42 | Load      | TSS ×0.5, gradual return plus aerobic efficiency declining
06:42 | Workout   | Recovery (swap from plan)
08:14 | Re-check  | Suppressed. Wellness arrived at 08:00 local, outside the 10:00–17:00 window
22:01 | Analysis  | Lange duurrit, 8/10, endurance stimulus

This is the pivot. Through Thursday and Friday the readiness call was GO. On Saturday morning, the aerobic-efficiency signal crossed its threshold (decoupling rising on recent rides), and the system clamped TSS to 50% with a Z3 ceiling. I was already back from being sick on Thursday, when the call was still GO, so the "back from sick 10 days ago" tag is historical context; declining aerobic efficiency is the factor doing the load-cap work this time. The post-workout row is the twist: I rode a Lange duurrit (long endurance ride) anyway, and it scored 8/10 for endurance stimulus with 0.3% decoupling. My body was coping fine under the cap, exactly as the system intended.
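
Mechanically, a load cap like Saturday's is a multiplier on TSS plus a ceiling on intensity. The sketch below assumes a hypothetical workout shape with a numeric TSS and a maximum zone number, uses an illustrative 120-TSS long ride as input, and builds the log text in the same format as the Load row above. It only scales the planned workout; on Saturday the system also swapped in a Recovery session, which this sketch does not model.

```typescript
// Hypothetical shapes: maxZone is a zone number, e.g. 3 for Z3.
interface Workout {
  name: string;
  tss: number;
  maxZone: number;
}

interface LoadCap {
  tssMultiplier: number; // e.g. 0.5
  maxZone: number;       // intensity ceiling, e.g. 3
  reasons: string[];     // plain-language reasons shown in the log
}

// Scale TSS, clamp intensity, and produce the Load row text.
function applyLoadCap(planned: Workout, cap: LoadCap): { workout: Workout; logRow: string } {
  const workout: Workout = {
    name: planned.name,
    tss: Math.round(planned.tss * cap.tssMultiplier),
    maxZone: Math.min(planned.maxZone, cap.maxZone), // a cap never raises intensity
  };
  const logRow = `TSS ×${cap.tssMultiplier}, ${cap.reasons.join(" plus ")}`;
  return { workout, logRow };
}

const saturdayCap: LoadCap = {
  tssMultiplier: 0.5,
  maxZone: 3,
  reasons: ["gradual return", "aerobic efficiency declining"],
};
const { workout, logRow } = applyLoadCap({ name: "Long Ride", tss: 120, maxZone: 4 }, saturdayCap);
console.log(workout.tss, workout.maxZone); // 60 3
console.log(logRow); // TSS ×0.5, gradual return plus aerobic efficiency declining
```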

Sunday May 10, five Coach+ tool calls in a single afternoon.

Time  | Type      | Decision
07:11 | Readiness | MODIFY. TSS 50%, Z3 cap. Aerobic efficiency declining, back from sick 11 days ago (Phase: Taper, TSB 0, CTL 32)
14:32 | Coach+    | Five state-changing tool calls in one chat session
18:32 | Analysis  | Afternoon Workout, 7/10, recovery stimulus

Sunday afternoon I had a chat with Coach+ that produced five state-changing tool calls in a row. Every one of them lands in the Agent Log tagged Coach+. Chat history shows the conversation we had; the Agent Log shows what actually changed in my training as a result. If next week I find myself asking "wait, did Coach+ really move that workout?", the row is right there with its timestamp.

Monday May 11 (today), the cycle continues.

Time  | Type      | Decision
06:39 | Readiness | MODIFY. TSS capped at 50%, intensity at Z3. Aerobic efficiency declining, back from sick 12 days ago (Phase: Taper, TSB +1, CTL 32)
06:39 | Load      | TSS ×0.5, gradual return plus aerobic efficiency declining
06:39 | Workout   | Recovery
06:40 | Re-check  | Suppressed (09:00 local)
13:08 | Analysis  | Morning Workout, 6/10, recovery stimulus

Today's row carries the same reasoning, three days in. The log shows that the decision wasn't a one-shot: the system has applied it consistently every morning since Saturday's pivot. If I question whether it's still right, the factor list answers: aerobic efficiency still declining, ramp rate sustainable, training load balanced. And set against Thursday's GO row, it's easy to see exactly when the call flipped.
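
Reading "when did the call flip, and how long has it held?" off the log is a simple scan over one readiness row per day. A minimal sketch, assuming the rows are already sorted oldest-first; `findFlips` and `currentStreak` are hypothetical helpers, and the sample week uses the calls described in this post (Friday's GO comes from the Saturday write-up, not from a table).

```typescript
// One readiness call per day, oldest first (an assumption of this sketch).
interface DailyReadiness {
  day: string;
  call: string; // e.g. "GO" or "MODIFY"
}

interface Flip {
  day: string;
  from: string;
  to: string;
}

// Every day on which the call differs from the previous logged day.
function findFlips(days: DailyReadiness[]): Flip[] {
  const flips: Flip[] = [];
  for (let i = 1; i < days.length; i++) {
    if (days[i].call !== days[i - 1].call) {
      flips.push({ day: days[i].day, from: days[i - 1].call, to: days[i].call });
    }
  }
  return flips;
}

// How many consecutive days, ending with the latest, share the latest call.
function currentStreak(days: DailyReadiness[]): number {
  if (days.length === 0) return 0;
  const latest = days[days.length - 1].call;
  let n = 0;
  for (let i = days.length - 1; i >= 0 && days[i].call === latest; i--) n++;
  return n;
}

const week: DailyReadiness[] = [
  { day: "Thu May 7", call: "GO" },
  { day: "Fri May 8", call: "GO" },
  { day: "Sat May 9", call: "MODIFY" },
  { day: "Sun May 10", call: "MODIFY" },
  { day: "Mon May 11", call: "MODIFY" },
];
console.log(findFlips(week)); // one flip: Sat May 9, GO → MODIFY
console.log(currentStreak(week)); // 3
```

The streak of 3 is the "three days in" above, and the single flip is Saturday's pivot.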

The audit shape

A few things I deliberately decided about Agent Log:

  • Every decision becomes a discrete fact. No quiet overwrites. When the morning cron picks a workout and you later regenerate via Coach+, both decisions are visible, not just the latest one.
  • Decisions are dated by when they were made, not by the date they're about. A Sunday plan generation that decides something about an upcoming Race Week doesn't appear under "Race Week"; it appears under that Sunday. (This sounds obvious, but the first version got it wrong: weekly_plan rows landed under future dates, which made the timeline read like a forecast instead of an audit.)
  • No quiet skips. If the system would have done something but a gate caught it, the gate gets its own row with the reason. The suppressed re-check rows in my examples above are the everyday case (early-morning wellness syncs that landed outside the 10:00–17:00 re-check window). Features running in shadow mode log the same way: a dry-run row records what the proposed-adaptation feature would have suggested, even though it didn't act. Both shapes do the audit-log work of recording what the system considered, not just what it acted on.
  • Approve and reject get their own entries. When you have "preview before applying" on, the system stages an adaptation as pending. That moment gets a proposed_adaptation row. The moment you click approve or reject (from web, iOS, or the email link) gets its own adaptation_decision row, with the surface tagged. So the audit reads as a state machine: proposed → resolved (approved/rejected), each transition recorded.
  • Coach+ tool calls are part of it. Every state-changing tool call that happens through chat (workout moved, race added, settings updated, coaching memory saved) leaves a row tagged Coach+. Chat history shows the conversation; the Agent Log shows the side-effects.
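
The approve/reject flow in the fourth bullet is a small state machine over an append-only list. Here is a minimal sketch: the row kinds `proposed_adaptation` and `adaptation_decision` come from this post, while the `AdaptationLedger` class, its methods, and the in-memory storage are hypothetical. Resolving an adaptation that isn't pending throws rather than overwriting, which is the "no quiet overwrites" rule applied to this one transition.

```typescript
// Hypothetical sketch of the proposed → resolved flow. Row kinds match the
// names in this post; the class, its methods and the storage are assumptions.
type Surface = "web" | "ios" | "email";
type Decision = "approved" | "rejected";
type Status = "pending" | Decision;

interface LedgerRow {
  kind: "proposed_adaptation" | "adaptation_decision";
  adaptationId: string;
  detail: string;
}

class AdaptationLedger {
  private readonly status = new Map<string, Status>();
  private readonly log: LedgerRow[] = []; // append-only: rows are never edited

  // Stage an adaptation for the athlete's approval.
  propose(id: string, change: string): void {
    if (this.status.has(id)) throw new Error(`adaptation ${id} already exists`);
    this.status.set(id, "pending");
    this.log.push({ kind: "proposed_adaptation", adaptationId: id, detail: change });
  }

  // Record the athlete's decision and the surface it came from.
  resolve(id: string, decision: Decision, surface: Surface): void {
    if (this.status.get(id) !== "pending") {
      throw new Error(`adaptation ${id} is not pending`);
    }
    this.status.set(id, decision);
    this.log.push({ kind: "adaptation_decision", adaptationId: id, detail: `${decision} via ${surface}` });
  }

  // Copy of the log for display, so callers can't mutate history.
  rows(): LedgerRow[] {
    return [...this.log];
  }
}

const ledger = new AdaptationLedger();
ledger.propose("a1", "Long Ride · 90 min → Endurance · 60 min");
ledger.resolve("a1", "approved", "ios");
console.log(ledger.rows().map((r) => `${r.kind}: ${r.detail}`));
```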

What's not in it

A few things I deliberately left out:

  • Prompt and response bodies. The reasoning trace is what the agent surfaced as its reasoning lines, not the raw prompt or the model output. Showing the full prompt would mean shipping Coach+'s system instructions and your full athlete context to the client every time you opened the page. That's a privacy-and-payload story I don't want.
  • Cron internals. The pipeline-steps row tells you which steps the agent ran (for example, fetchContext → readinessAssessment → workoutSelection → uploadWorkout). It doesn't tell you which provider serviced the LLM call, how long the cron took, or what it cost. Those live in admin tooling, not the athlete-facing log.
  • Calendar events themselves. The log captures the decision to add or move a workout, not the workout DSL. The calendar already shows you the workout.
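
Keeping prompts and cron internals out of the athlete-facing log is easiest with an allowlist projection on the server: copy only the fields the page should render, so any field added to the internal record later stays server-side by default. A sketch, assuming a hypothetical internal record; none of these field names come from IntervalCoach.

```typescript
// Hypothetical internal record, as admin tooling might see it.
interface InternalTrace {
  reasoning: string[];     // reasoning lines the agent surfaced
  pipelineSteps: string[]; // e.g. ["fetchContext", "readinessAssessment"]
  prompt: string;          // system instructions + athlete context
  modelOutput: string;     // raw LLM response
  provider: string;        // which provider serviced the call
  durationMs: number;      // how long the cron took
  costUsd: number;         // what the call cost
}

// The only fields the athlete-facing API returns.
interface PublicTrace {
  reasoning: string[];
  pipelineSteps: string[];
}

// Allowlist projection: name the fields to keep instead of the ones to drop,
// so a field added to InternalTrace later is not exposed by accident.
function toPublicTrace(t: InternalTrace): PublicTrace {
  return {
    reasoning: [...t.reasoning],
    pipelineSteps: [...t.pipelineSteps],
  };
}
```

Returning `t` itself would also type-check, because TypeScript accepts extra properties on a non-literal value, and serializing it would leak every field. Building a fresh object is what actually enforces the allowlist.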

Where to find it

Sign in, open your account menu, and click Agent Log. The page is timezone-aware (timestamps render in your local time), grouped by day, and sorted newest-first. Tap any row to expand it. The refresh button forces a re-fetch from KV; otherwise the API caches responses for 30 seconds with stale-while-revalidate, so the page is snappy when you open it from the dashboard.
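
Grouping by day in the viewer's timezone means bucketing on the local calendar date of each decision's timestamp, which is also how the "dated by when" rule plays out: a Sunday `weekly_plan` row about a future week lands under that Sunday. A minimal sketch using the standard `Intl.DateTimeFormat` API; the `Row` shape and the helper names are hypothetical.

```typescript
// Hypothetical row shape: decidedAt is the ISO-8601 time the decision was made.
interface Row {
  decidedAt: string;
  summary: string;
}

// Local calendar day (YYYY-MM-DD) of an instant in an IANA time zone.
function localDay(iso: string, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(new Date(iso));
  const part = (type: Intl.DateTimeFormatPartTypes): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${part("year")}-${part("month")}-${part("day")}`;
}

// Newest-first rows, bucketed by the local day of the decision time.
// Map keeps insertion order, so the days also come out newest-first.
function groupByLocalDay(rows: Row[], timeZone: string): Map<string, Row[]> {
  const sorted = [...rows].sort((a, b) => Date.parse(b.decidedAt) - Date.parse(a.decidedAt));
  const groups = new Map<string, Row[]>();
  for (const row of sorted) {
    const day = localDay(row.decidedAt, timeZone);
    const bucket = groups.get(day);
    if (bucket) bucket.push(row);
    else groups.set(day, [row]);
  }
  return groups;
}
```

Because the same instant can fall on different local days, a decision logged at 22:30 UTC shows up under the next day for an athlete in Amsterdam (UTC+2 in May) but under the same day for one on UTC.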

If you've been on the beta flag, nothing changes except that the toggle is gone. If you're new to the log, it's there now, for every account, regardless of tier.