Hypernym Infinite Memory · v0.63 closeout · generated 2026-06-10 07:39 UTC

The harness protected the science: no clean shared slot, no memory score.

v0.63 attempted to resume the unscored v0.62 domains, but the chat-slot idle barrier returned 40 consecutive fast 503 busy responses before the first benchmark row. That is a capacity/isolation result only. It does not weaken the v0.62 memory finding; it says institutional evals need an isolated lane or a lease/queue protocol.

Benchmark rows run: 0 / 12. No memory-quality row reached the chat model.
Idle probes: 40 / 40. Every pre-row chat-slot probe returned 503 busy.
Eval tokens spent: 0. No benchmark prompt tokens were counted in v0.63.
Health checks: 200. Preflight and final health were OK while chat was busy.

What This Proves

  • The v0.63 runner correctly refused to score recall rows when the shared single-flight chat lane was unavailable.
  • /health is not a sufficient readiness signal for a benchmark or demo; health can be OK while the chat slot is occupied.
  • The experiment avoided contaminating memory-quality results with another service's endpoint usage.

What This Does Not Prove

  • It does not prove the memory system failed.
  • It does not prove the endpoint backend is broken.
  • It does not score story, agent-loop, relationship, or psychology recall under v0.63 because no row ran.

Last Positive Memory Evidence

v0.62 remains the latest live memory-quality finding. It scored the research-update domain under 1024 and 2048 pressure with both tail-output variants.

v0.62 fact | Value
Scored rows passed strict + semantic | 4 / 4
Prompt tokens on successful rows | 247,962
Pressure bands scored | 1024 and 2048
1024 row latency | about 77-79 seconds
2048 row latency | about 195-196 seconds
Deck-safe claim | Research-update recall survived heavy pressure with exact current-fact IDs and provenance handles on scored rows.

v0.63 Capacity Trace

The idle barrier sent a tiny chat request, “Return exactly OK.”, before allowing any benchmark row. The lane never cleared during the configured wait budget.

Probe slice | Status | Elapsed | Body
Attempt 1 | 503 | 0.1566s | Backend busy: 1 request(s) already in flight (limit: 1). Try again in 10s.
Attempt 40 | 503 | 0.1552s | Backend busy: 1 request(s) already in flight (limit: 1). Try again in 10s.
Status count | 503 x 40 | configured 30s interval | No chat-slot idle success before stop.
Stop reason | partial | pre-row | idle_probe_failed_before_first_row
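
The idle barrier described in this section can be sketched roughly as below. This is a minimal illustration, not the v0.63 runner itself: the function name `wait_for_idle_slot`, the probe callable, and the result dictionary shape are all assumptions introduced here, and the sketch treats any HTTP 200 as a clean probe, which may be looser than the real check. The 40-attempt budget, the 30s interval, and the stop reason string come from the trace above.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical probe signature: sends the tiny chat request and
# returns (http_status, body_text).
Probe = Callable[[], Tuple[int, str]]


def wait_for_idle_slot(
    probe: Probe,
    max_attempts: int = 40,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Probe the chat slot until it answers 200 or the wait budget is spent."""
    attempts: List[dict] = []
    for n in range(1, max_attempts + 1):
        start = time.monotonic()
        status, body = probe()
        attempts.append({
            "attempt": n,
            "status": status,
            "elapsed_s": time.monotonic() - start,
            "body": body,
        })
        if status == 200:
            # Assumption: a 200 counts as a clean lane.
            return {"idle": True, "stop_reason": None, "attempts": attempts}
        if n < max_attempts:
            sleep(interval_s)  # no sleep after the final attempt
    return {
        "idle": False,
        "stop_reason": "idle_probe_failed_before_first_row",
        "attempts": attempts,
    }
```

A runner built this way would start benchmark rows only when `idle` is true, and would otherwise persist the attempt list as the capacity trace. Injecting `sleep` keeps the barrier testable without waiting out the real interval.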

Institutional Frame

Hypernym Infinite Memory is a memory control plane for model fleets: per-tenant memory stores, controller-curated recall, exact provenance handles, and a lower-cost path to long-memory inference for small local models.

Capability Thesis

Use cheap, local, mobile-class models with external memory/control logic so the serving cost does not scale like naive long-context attention for every user and every turn.

What We Have Seen

Under controlled rows, exact current-fact IDs and memory-key/provenance handles can survive high pressure; under shared serving, queue discipline becomes the first operational bottleneck.

Hyperscaler Question

The question is not merely “can it accept more tokens?” It is whether the memory plane reduces the cost and reliability penalty of long-memory inference across many users and tenants.

Decision Path

1. v0.62 scored memory. Research-update rows passed at 1024/2048 pressure. This is the quality evidence to carry forward.

2. v0.62 hit shared-slot busy. Later domains returned fast 503s. Those rows were not scored as memory failures.

3. v0.63 added idle barrier. The runner waited before the first resumed row and required a clean chat probe.

4. No clean lane appeared. 40/40 probes were busy. Next institutional run needs lane reservation or endpoint isolation.
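
One way to picture the lane reservation mentioned in the last step is a single-slot lease with an expiry. The sketch below is purely illustrative and in-memory: the class `SlotLease`, its method names, and the holder strings are hypothetical, and no such API is known to exist on the endpoint. It only shows how a lease could let an eval tell "held by another tenant" apart from "nobody holds it".

```python
import time
import uuid
from typing import Callable, Optional


class SlotLease:
    """In-memory single-slot lease with expiry (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._holder: Optional[str] = None
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def acquire(self, holder: str, ttl_s: float) -> Optional[str]:
        """Return a lease token, or None if another lease is still live."""
        now = self._clock()
        if self._token is not None and now < self._expires_at:
            return None
        self._holder, self._token = holder, uuid.uuid4().hex
        self._expires_at = now + ttl_s
        return self._token

    def release(self, token: str) -> bool:
        """Release the lease; only the current token is accepted."""
        if token != self._token:
            return False
        self._holder = self._token = None
        return True

    def current_holder(self) -> Optional[str]:
        """Name of the live lease holder, or None if the slot is free."""
        if self._token is not None and self._clock() < self._expires_at:
            return self._holder
        return None
```

The expiry matters for the failure case: if a holder crashes without releasing, the slot frees itself after `ttl_s` rather than staying busy indefinitely.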

CTO Optimization Findings

  • Add a chat-slot lease or reservation API so evals can distinguish “busy because another tenant is using it” from “backend did not recover.”
  • Expose a readiness endpoint for chat-slot availability, not just process health.
  • Keep pre-row and post-long-row idle barriers in every institutional harness.
  • Report memory quality only on HTTP 200 scored rows; report shared-capacity windows separately.
  • For large-model/hyperscaler scenarios, treat this as memory-plane scheduling plus provenance, not just bigger context.
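
The reporting rule in the list above (memory quality only on HTTP 200 scored rows, capacity windows separately) could look roughly like the following. The field names `status`, `strict_pass`, and `semantic_pass` and both function names are assumptions made for this sketch; the actual schema of scores.json is not shown on this page.

```python
from typing import Dict, Iterable, List, Optional


def split_report(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Separate rows that reached the model from shared-capacity rows."""
    scored: List[dict] = []
    capacity: List[dict] = []
    for row in rows:
        # Only HTTP 200 rows count as memory-quality evidence.
        (scored if row.get("status") == 200 else capacity).append(row)
    return {"memory_quality": scored, "capacity_windows": capacity}


def pass_rate(scored: List[dict]) -> Optional[float]:
    """Share of scored rows passing both checks; None when nothing was scored."""
    if not scored:
        return None
    passed = sum(
        1 for r in scored if r.get("strict_pass") and r.get("semantic_pass")
    )
    return passed / len(scored)
```

Returning `None` rather than `0.0` for an empty scored set keeps a zero-row run such as v0.63 from reading as a memory failure.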

Next Clean Run

  • Run v0.63 again on an isolated lane or after a confirmed service window.
  • Do not change the row matrix; the failed condition was access, not prompt design.
  • Expected rows: story-canon, agent-loop, relationship, and psychology domains.
  • Success criterion: at least one full domain row group completes with strict/semantic scoring and no idle-probe contamination.
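
The success condition for the next run can be expressed as a small check. This is a sketch under stated assumptions: the row fields (`domain`, `status`, `strict_pass`, `semantic_pass`, `idle_probe_ok`) and the per-domain expected row counts are hypothetical, since the page gives only the 12-row total and the four domain names. "Completes with scoring" is read here as every expected row returning 200 with both scores recorded, whether or not the row passed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def completed_domains(
    rows: Iterable[dict], expected_per_domain: Dict[str, int]
) -> List[str]:
    """Domains whose full row group was scored behind clean idle probes."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["domain"]].append(row)
    done = []
    for domain, expected in expected_per_domain.items():
        group = groups.get(domain, [])
        clean = [
            r for r in group
            if r.get("status") == 200
            and r.get("strict_pass") is not None
            and r.get("semantic_pass") is not None
            and r.get("idle_probe_ok") is True
        ]
        # Every expected row must be present and clean.
        if expected > 0 and len(clean) == expected == len(group):
            done.append(domain)
    return done


def run_succeeded(rows: Iterable[dict], expected_per_domain: Dict[str, int]) -> bool:
    """True when at least one full domain row group completed cleanly."""
    return len(completed_domains(rows, expected_per_domain)) >= 1
```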

Data Trace

Every claim on this page points to a local artifact that a CTO, auditor, or later agent can inspect directly.

Evidence | Path | Use
v0.63 live scores | research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_unscored_domain_drain_resume_live_codex_v1/scores.json | Aggregate zero-row result, stop reason, and idle summary.
v0.63 idle attempts | research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_unscored_domain_drain_resume_live_codex_v1/idle-probe-before-first-row-attempts.json | All 40 fast 503 busy probes.
v0.63 manifest | research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_unscored_domain_drain_resume_live_codex_v1/run-manifest.json | 12-row planned matrix and idle-probe configuration.
v0.63 preflight health | research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_unscored_domain_drain_resume_live_codex_v1/preflight-health.json | Health OK before failed chat-idle window.
v0.63 final health | research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_unscored_domain_drain_resume_live_codex_v1/final-health.json | Health OK after failed chat-idle window.
v0.63 snapshot | .forge/artifacts/cxdb-hypernym-infinite-mim-post-v063-snapshot-20260610T073942Z.md | Durable context handoff for future sessions.
v0.63 RL trace | .forge/artifacts/rl-traces-HYPERNYM_INFINITE_MIM_v063_20260610T073942Z.jsonl | Machine-readable research-policy update.
v0.62 live scores | research/tracks/hypernym-infinite-mim/results/v0.62-tail-contract-cross-domain-pressure/20260610T_tail_contract_cross_domain_pressure_live_codex_v1/scores.json | Last positive memory-quality evidence.
Working memory | research/tracks/hypernym-infinite-mim/WORKING_MEMORY.md | Human-readable current state and resume instructions.

Compound Research Chain

Artifact | Pointer
Current public board | https://hypernym-infinite-memory-v09.pages.dev/
Previous immutable v0.62 board | https://087eddb2.hypernym-infinite-memory-v09.pages.dev/
v0.62 local board | .forge/artifacts/hypernym-infinite-mim-v0.62-cto-board.html
v0.63 local board | .forge/artifacts/hypernym-infinite-mim-v0.63-cto-board.html
v0.63 strategy | research/tracks/hypernym-infinite-mim/v0.63-unscored-domain-drain-resume-strategy.md
Compound visualization standard | research/tracks/hypernym-infinite-mim/compound-research-visualization-standard.md