<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Damir Dev | Frontend (React, Next.js), System Design & AI/LLM]]></title><description><![CDATA[A blog about frontend engineering, React, Next.js, scalable architecture, and applied LLM systems in production.]]></description><link>https://blog.damir-karimov.com</link><generator>RSS for Node</generator><lastBuildDate>Mon, 04 May 2026 12:56:54 GMT</lastBuildDate><atom:link href="https://blog.damir-karimov.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Client-Side Caching with LLMs: A Layered Decision Architecture for Cache Strategy under Uncertainty]]></title><description><![CDATA[Client-side caching is commonly implemented as a storage optimization layer using TTLs and invalidation rules. In practice, caching behaves as a decision system under uncertainty, where correctness de]]></description><link>https://blog.damir-karimov.com/client-side-caching-llm-decision-architecture</link><guid isPermaLink="true">https://blog.damir-karimov.com/client-side-caching-llm-decision-architecture</guid><category><![CDATA[System Design]]></category><category><![CDATA[caching]]></category><category><![CDATA[llm]]></category><category><![CDATA[Frontend Development]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Damir Karimov]]></dc:creator><pubDate>Mon, 04 May 2026 10:57:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69e3f110ee84f66e94dba1f1/a03d76d7-04af-49dd-97da-2af78323cd43.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Client-side caching is commonly implemented as a storage optimization layer using TTLs and invalidation rules. In practice, caching behaves as a decision system under uncertainty, where correctness depends on data volatility, context, and user interaction patterns.</p>
<p>Static approaches break down when freshness requirements are not uniform across the same application. The result is either a stale UI (over-caching) or excessive network requests (under-caching).</p>
<h2>Problem: Caching is a Decision Problem, Not Storage</h2>
<p>Client-side caching should be modeled as a policy engine:</p>
<ul>
<li><p>data has different volatility profiles</p>
</li>
<li><p>freshness requirements depend on UI context</p>
</li>
<li><p>user interactions influence cache relevance</p>
</li>
</ul>
<p>Typical volatility categories:</p>
<ul>
<li><p>user profiles → low volatility</p>
</li>
<li><p>feeds / notifications → high volatility</p>
</li>
<li><p>search results → context-dependent volatility</p>
</li>
<li><p>partially hydrated UI → unknown volatility</p>
</li>
</ul>
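<p>One minimal way to make these profiles explicit is a typed registry that the cache layer can consult. A TypeScript sketch (the key names and categories here are illustrative, not from a specific library):</p>
<pre><code class="language-typescript">type Volatility = "low" | "high" | "contextual" | "unknown";

// Illustrative mapping of cache keys to volatility profiles.
const volatilityProfiles: Record&lt;string, Volatility&gt; = {
  user_profile: "low",
  user_feed: "high",
  notifications: "high",
  search_results: "contextual",
};

function volatilityOf(key: string): Volatility {
  // Anything not registered is treated as unknown volatility.
  return volatilityProfiles[key] ?? "unknown";
}
</code></pre>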
<p>The core issue is not caching mechanics, but missing decision logic for when and how to invalidate or reuse cached data.</p>
<h2>Baseline Approaches in Client-Side Caching</h2>
<h3>1. SWR and TTL-based caching</h3>
<p>Standard implementations (e.g., React Query, SWR) rely on:</p>
<ul>
<li><p>stale-while-revalidate</p>
</li>
<li><p>background refetching</p>
</li>
<li><p>TTL-based invalidation</p>
</li>
</ul>
<p>They perform well when:</p>
<ul>
<li><p>data freshness is predictable</p>
</li>
<li><p>update cycles are stable</p>
</li>
</ul>
<p>They fail when:</p>
<ul>
<li><p>volatility varies within the same dataset</p>
</li>
<li><p>freshness depends on UI state or user context</p>
</li>
</ul>
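<p>For reference, this is roughly what the TTL-style baseline looks like with React Query (v5-style object API; the <code>staleTime</code> values are illustrative guesses, which is exactly the problem):</p>
<pre><code class="language-typescript">import { useQuery } from "@tanstack/react-query";

// One static staleTime per query acts as a TTL: fine while volatility
// is uniform, wrong as soon as it varies within the same dataset.
function useUserProfile(userId: string) {
  return useQuery({
    queryKey: ["profile", userId],
    queryFn: () =&gt; fetch(`/api/users/${userId}`).then((r) =&gt; r.json()),
    staleTime: 5 * 60_000, // low volatility: five minutes feels safe
  });
}

function useFeed() {
  return useQuery({
    queryKey: ["feed"],
    queryFn: () =&gt; fetch("/api/feed").then((r) =&gt; r.json()),
    staleTime: 10_000, // high volatility: hand-tuned per endpoint
  });
}
</code></pre>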
<h3>2. Heuristic scoring systems</h3>
<p>A more adaptive approach introduces computed cache policies:</p>
<pre><code class="language-text">
volatilityScore = EWMA(changeFrequency)
priorityScore = userInteractionWeight * dataImportance
ttl = baseTTL / volatilityScore
</code></pre>
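<p>A TypeScript sketch of the same scoring (the smoothing factor, base TTL, and clamping are assumptions for illustration):</p>
<pre><code class="language-typescript">// Exponentially weighted moving average of the observed change frequency.
function ewma(previous: number, observed: number, alpha = 0.3): number {
  return alpha * observed + (1 - alpha) * previous;
}

const BASE_TTL_MS = 60_000; // illustrative baseline

function computeTtlMs(prevVolatility: number, changeFrequency: number): number {
  const volatilityScore = ewma(prevVolatility, changeFrequency);
  // Higher volatility shrinks the TTL; the clamp avoids division
  // blow-up when the score approaches zero.
  return BASE_TTL_MS / Math.max(volatilityScore, 0.1);
}

// priorityScore would typically drive eviction order rather than TTL.
function priorityScore(userInteractionWeight: number, dataImportance: number): number {
  return userInteractionWeight * dataImportance;
}
</code></pre>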
<p>Improvements:</p>
<ul>
<li><p>adaptive cache lifetime</p>
</li>
<li><p>frequency-aware invalidation</p>
</li>
</ul>
<p>Limitations:</p>
<ul>
<li><p>requires manual feature engineering</p>
</li>
<li><p>weak generalization across domains</p>
</li>
<li><p>depends on complete signal observability</p>
</li>
</ul>
<h3>3. Lightweight ML models</h3>
<p>Alternative approach using ML:</p>
<ul>
<li><p>logistic regression</p>
</li>
<li><p>gradient boosting (XGBoost / LightGBM)</p>
</li>
<li><p>embedding-based classifiers</p>
</li>
</ul>
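<p>For scale: inference for a logistic model is a dot product plus a sigmoid, which is why it stays cheap on the client. A sketch with placeholder weights (a real model would be trained offline):</p>
<pre><code class="language-typescript">// p(refetch needed | features) via logistic regression.
// Weights and bias are placeholders; they would come from offline training.
const WEIGHTS = { volatilityScore: 2.1, accessFrequency: 0.7, ageMs: 0.0004 };
const BIAS = -1.3;

interface Features { volatilityScore: number; accessFrequency: number; ageMs: number; }

function probabilityStale(f: Features): number {
  const z = BIAS
    + WEIGHTS.volatilityScore * f.volatilityScore
    + WEIGHTS.accessFrequency * f.accessFrequency
    + WEIGHTS.ageMs * f.ageMs;
  return 1 / (1 + Math.exp(-z)); // sigmoid
}
</code></pre>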
<p>Advantages:</p>
<ul>
<li><p>low latency inference</p>
</li>
<li><p>stable and predictable behavior</p>
</li>
<li><p>cheaper than LLM inference</p>
</li>
</ul>
<p>Limitations:</p>
<ul>
<li><p>requires a labeled target (cache optimality is hard to define)</p>
</li>
<li><p>requires retraining pipelines</p>
</li>
<li><p>sensitive to product changes and distribution shifts</p>
</li>
</ul>
<h2>Why Traditional Approaches Plateau</h2>
<p>All baseline systems assume:</p>
<ul>
<li><p>feature space is complete</p>
</li>
<li><p>system dynamics are stationary</p>
</li>
</ul>
<p>In real applications:</p>
<ul>
<li><p>user behavior is contextual</p>
</li>
<li><p>volatility depends on UI state</p>
</li>
<li><p>“freshness importance” is semantic, not numeric</p>
</li>
<li><p>features are partially observable</p>
</li>
</ul>
<p>This creates an upper bound for heuristic and lightweight-ML approaches.</p>
<h2>When LLMs Become Relevant</h2>
<p>LLMs are not a replacement for caching systems.</p>
<p>They function as a fallback policy layer in an ambiguous or under-specified decision space.</p>
<p>They are useful when:</p>
<ul>
<li><p>feature confidence is low</p>
</li>
<li><p>signals conflict</p>
</li>
<li><p>unseen patterns appear</p>
</li>
</ul>
<h2>Layered Decision Architecture</h2>
<p>The correct system design is hierarchical:</p>
<pre><code class="language-text">
IF rule matches:
    use deterministic policy
ELSE IF ML confidence high:
    use ML policy
ELSE:
    use LLM policy
</code></pre>
<p>This ensures:</p>
<ul>
<li><p>deterministic execution dominates</p>
</li>
<li><p>ML handles structured uncertainty</p>
</li>
<li><p>LLM handles ambiguous cases only</p>
</li>
</ul>
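<p>A compact sketch of this routing in TypeScript (the policy interfaces and the confidence threshold are assumptions; any rule set and model can sit behind them):</p>
<pre><code class="language-typescript">type Strategy = "HIT" | "REVALIDATE" | "BYPASS" | "SWR";

interface Decision { strategy: Strategy; ttlMs: number; confidence: number; }

interface PolicyEngine {
  rules: (ctx: object) =&gt; Decision | null;          // deterministic fast path
  mlScore: (ctx: object) =&gt; Decision;               // cheap learned model
  llmFallback: (ctx: object) =&gt; Promise&lt;Decision&gt;;  // constrained LLM call
}

const ML_CONFIDENCE_THRESHOLD = 0.8; // illustrative

async function decide(engine: PolicyEngine, ctx: object): Promise&lt;Decision&gt; {
  const ruled = engine.rules(ctx);
  if (ruled) return ruled; // deterministic execution dominates

  const scored = engine.mlScore(ctx);
  if (scored.confidence &gt;= ML_CONFIDENCE_THRESHOLD) return scored;

  return engine.llmFallback(ctx); // ambiguous cases only
}
</code></pre>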
<h2>System Architecture</h2>
<pre><code class="language-text">
UI Layer
   ↓
Context Builder
   ↓
Policy Engine
   ├── Rule Layer (fast path)
   ├── ML Scoring Model
   └── LLM Fallback Engine
   ↓
Cache Layer
   ↓
Network Layer
</code></pre>
<h2>Context Representation</h2>
<p>All decisions are based on structured signals, not raw prompts:</p>
<pre><code class="language-json">{
  "key": "user_feed",
  "lastUpdatedMs": 1200,
  "accessFrequency": "high",
  "volatilityScore": 0.82,
  "userAction": "scroll",
  "stalenessToleranceMs": 500
}
</code></pre>
<p>Key constraint:</p>
<ul>
<li><p>no free-form input</p>
</li>
<li><p>only deterministic feature structures</p>
</li>
</ul>
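<p>In TypeScript terms, that constraint is just a closed interface mirroring the JSON above (the enum members beyond the sample values are assumptions):</p>
<pre><code class="language-typescript">interface CacheContext {
  key: string;
  lastUpdatedMs: number;
  accessFrequency: "low" | "medium" | "high";
  volatilityScore: number; // 0..1
  userAction: "scroll" | "click" | "navigate" | "idle";
  stalenessToleranceMs: number;
}
</code></pre>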
<h2>Role of LLM in the System</h2>
<p>LLM output is constrained to classification:</p>
<pre><code class="language-json">{
  "strategy": "HIT | REVALIDATE | BYPASS | SWR",
  "ttlMs": 120000,
  "confidence": 0.78
}
</code></pre>
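<p>Because the model can still emit malformed output, the response must be parsed and validated before it reaches the cache layer. A minimal hand-rolled guard (a schema library would work equally well):</p>
<pre><code class="language-typescript">const STRATEGIES = ["HIT", "REVALIDATE", "BYPASS", "SWR"] as const;
type Strategy = (typeof STRATEGIES)[number];

interface LlmDecision { strategy: Strategy; ttlMs: number; confidence: number; }

// Returns null on any violation; the caller falls back to a
// deterministic default (e.g. REVALIDATE) instead of trusting the model.
function parseLlmDecision(raw: string): LlmDecision | null {
  let d: any;
  try {
    d = JSON.parse(raw);
  } catch {
    return null; // not even valid JSON
  }
  if (!STRATEGIES.includes(d.strategy)) return null;
  if (typeof d.ttlMs !== "number" || d.ttlMs &lt; 0) return null;
  if (typeof d.confidence !== "number") return null;
  return d as LlmDecision;
}
</code></pre>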
<h2>Meta-Cache Layer (Decision Caching)</h2>
<p>To reduce LLM cost and latency variance:</p>
<pre><code class="language-text">
decisionCache(contextHash) → cache strategy
</code></pre>
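<p>A sketch of the meta-cache, keyed by a hash of the structured context (hashing is simplified to sorted-key JSON, which is enough for a closed, flat context):</p>
<pre><code class="language-typescript">type Strategy = "HIT" | "REVALIDATE" | "BYPASS" | "SWR";
interface Decision { strategy: Strategy; ttlMs: number; confidence: number; }

const decisionCache = new Map&lt;string, Decision&gt;();

// Deterministic context → stable key. Sorted keys make the
// serialization order-independent.
function contextHash(ctx: Record&lt;string, unknown&gt;): string {
  return JSON.stringify(ctx, Object.keys(ctx).sort());
}

async function decideWithMetaCache(
  ctx: Record&lt;string, unknown&gt;,
  decide: (ctx: object) =&gt; Promise&lt;Decision&gt;,
): Promise&lt;Decision&gt; {
  const key = contextHash(ctx);
  const cached = decisionCache.get(key);
  if (cached) return cached; // repeated contexts skip inference entirely

  const decision = await decide(ctx);
  decisionCache.set(key, decision);
  return decision;
}
</code></pre>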
<p>Effects:</p>
<ul>
<li><p>reduces repeated LLM inference</p>
</li>
<li><p>stabilizes decision latency</p>
</li>
<li><p>amortizes cost over repeated contexts</p>
</li>
</ul>
<h2>Cost-Aware Execution Model</h2>
<p>Execution routing:</p>
<pre><code class="language-text">
IF rule applies:
    skip ML and LLM
ELSE IF ML confidence &gt; threshold:
    use ML model
ELSE:
    use LLM
</code></pre>
<p>Typical distribution:</p>
<ul>
<li><p>80–90% rule-based</p>
</li>
<li><p>10–20% ML-based</p>
</li>
<li><p>&lt;10% LLM-based</p>
</li>
</ul>
<h2>Failure Modes and Mitigation</h2>
<h3>1. LLM overuse</h3>
<p>Problem:</p>
<ul>
<li>increased cost</li>
</ul>
<p>Mitigation:</p>
<ul>
<li><p>strict confidence thresholds</p>
</li>
<li><p>deterministic routing priority</p>
</li>
</ul>
<h3>2. Latency variance</h3>
<p>Problem:</p>
<ul>
<li>inconsistent response time</li>
</ul>
<p>Mitigation:</p>
<ul>
<li><p>decision caching</p>
</li>
<li><p>asynchronous precomputation</p>
</li>
</ul>
<h3>3. Model drift</h3>
<p>Problem:</p>
<ul>
<li>degraded decision quality over time</li>
</ul>
<p>Mitigation:</p>
<ul>
<li><p>feedback loop</p>
</li>
<li><p>periodic recalibration of scoring model</p>
</li>
</ul>
<h2>Engineering Takeaways</h2>
<ul>
<li><p>caching should be modeled as a decision system</p>
</li>
<li><p>SWR and TTL cover most production cases</p>
</li>
<li><p>heuristic systems improve adaptivity but have limits</p>
</li>
<li><p>ML is optimal in structured, stable feature spaces</p>
</li>
<li><p>LLMs are only justified for ambiguity handling</p>
</li>
<li><p>production systems require layered routing</p>
</li>
</ul>
<h2>Key Takeaways</h2>
<ul>
<li><p>Client-side caching is fundamentally a policy optimization problem: decision logic, not storage optimization</p>
</li>
<li><p>Most cases are solved by rules and SWR; ML is effective in structured domains with stable signals</p>
</li>
<li><p>No single approach (rules, ML, LLM) is sufficient alone: production systems require layered routing</p>
</li>
<li><p>LLMs should be strictly bounded to fallback scenarios for uncertain states</p>
</li>
<li><p>Decision caching is critical for cost and latency control</p>
</li>
</ul>
]]></content:encoded></item></channel></rss>