Cloud AI

On-device AI inference is always free. Cloud AI extends LucidPal with Gemini 2.5 Flash for faster responses and larger context windows.

Overview

LucidPal uses an LLMOrchestrator to route every generation request between cloud and local inference at runtime:

| Route condition | Backend used |
| --- | --- |
| forceLocal = true | Always local |
| Auto + no paid subscription | Always local |
| Auto + paid subscription + online | Cloud by default |
| Cloud unreachable after 2s (first connect) | Falls back to local |
| Cloud stream dies mid-session, no recovery within 30s | Falls back to local |
| Cloud stream fails before first token | One-shot retry with local |

The orchestrator is transparent to callers — ChatViewModel and AgentViewModel call the same LLMServiceProtocol interface regardless of which backend is active.
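
In rough terms, the routing decision looks like the sketch below. Only LLMOrchestrator, LLMServiceProtocol, and forceLocal are LucidPal names; the other property and method names are illustrative assumptions.

```swift
// Hypothetical shape of the shared interface; the real LLMServiceProtocol
// may declare different members.
protocol LLMServiceProtocol {
    func generate(prompt: String) -> AsyncThrowingStream<String, Error>
}

// Simplified routing policy mirroring the table above. Names other than
// forceLocal are assumptions.
final class LLMOrchestrator {
    var forceLocal = false            // Settings → Inference → On-Device Only
    var hasPaidSubscription = false
    var isOnline = true

    private let localService: any LLMServiceProtocol
    private let cloudService: any LLMServiceProtocol

    init(local: any LLMServiceProtocol, cloud: any LLMServiceProtocol) {
        self.localService = local
        self.cloudService = cloud
    }

    // ChatViewModel and AgentViewModel only ever see an LLMServiceProtocol;
    // the backend choice is made here at request time.
    func preferredService() -> any LLMServiceProtocol {
        if forceLocal { return localService }                     // always local
        guard hasPaidSubscription, isOnline else { return localService }
        return cloudService                                       // cloud by default
    }
}
```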

Availability

| Feature | Free | Starter | Pro | Ultimate |
| --- | --- | --- | --- | --- |
| On-device inference | ✓ | ✓ | ✓ | ✓ |
| Cloud AI chat | ✗ | ✓ | ✓ | ✓ |
| Ability plan synthesis | ✗ | ✓ | ✓ | ✓ |

Monthly Limits

| Plan | Cloud Messages |
| --- | --- |
| Starter | Standard |
| Pro | Higher |
| Ultimate | Highest |

Limits reset at each billing cycle. When exhausted, on-device AI remains fully available.


How Cloud Routing Works

First Connect (2s stability timer)

On the first message of a session, LucidPal starts a 2-second stability timer after initiating the cloud connection. If the cloud stream fails to yield any token within 2 seconds, the orchestrator cancels cloud and retries with local inference.
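
A simplified, non-streaming sketch of that race follows; the real orchestrator watches for the first streamed token, while here the backends are reduced to stand-in async closures.

```swift
struct CloudTimeout: Error {}

// Race a cloud request against a 2-second deadline; if the deadline wins,
// abandon the cloud attempt and retry with local inference.
func firstConnect(prompt: String,
                  cloud: @escaping @Sendable (String) async throws -> String,
                  local: (String) async throws -> String) async throws -> String {
    do {
        return try await withThrowingTaskGroup(of: String.self, returning: String.self) { group in
            group.addTask { try await cloud(prompt) }
            group.addTask {
                try await Task.sleep(nanoseconds: 2_000_000_000)
                throw CloudTimeout()   // nothing from cloud within 2 seconds
            }
            // Whichever child finishes first wins; the deadline child can
            // only finish by throwing.
            guard let first = try await group.next() else { throw CloudTimeout() }
            group.cancelAll()
            return first
        }
    } catch is CloudTimeout {
        // Cancel cloud and fall back to on-device inference.
        return try await local(prompt)
    }
}
```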

Mid-Session Reconnect (30s stability timer)

If cloud inference becomes unreachable mid-session (network drop, server error), the orchestrator waits up to 30 seconds for the stream to recover. If it does not recover within that window, the orchestrator falls back to local inference.
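
In sketch form, the recovery window might look like the following. Whether the orchestrator polls or reacts to stream events is not specified here, and the reconnect check is hypothetical.

```swift
import Foundation

// Illustrative 30-second recovery window: reconnect() stands in for a check
// that returns true once the cloud stream is back.
func waitForCloudRecovery(deadline: TimeInterval = 30,
                          reconnect: () async throws -> Bool) async -> Bool {
    let start = Date()
    while Date().timeIntervalSince(start) < deadline {
        if (try? await reconnect()) == true {
            return true                                       // cloud recovered
        }
        try? await Task.sleep(nanoseconds: 1_000_000_000)     // brief pause between attempts
    }
    return false                                              // give up: fall back to local
}
```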

Local Model Eviction

When cloud is active and stable, LucidPal evicts the local model from memory after 30 seconds to conserve RAM. The next time local inference is needed, the model reloads automatically — expect a brief cold-start delay.
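
A rough sketch of such a deferred eviction, using illustrative names (LocalModelStore, scheduleEviction, ensureLoaded) rather than LucidPal's actual types:

```swift
// Once cloud has been stable for 30 seconds, drop the local model to free
// RAM; reload it on demand later.
actor LocalModelStore {
    private(set) var isLoaded = true
    private var evictionTask: Task<Void, Never>?

    // Called once the cloud backend is considered stable.
    func scheduleEviction(afterSeconds seconds: UInt64 = 30) {
        evictionTask?.cancel()
        evictionTask = Task { [weak self] in
            try? await Task.sleep(nanoseconds: seconds * 1_000_000_000)
            guard !Task.isCancelled else { return }
            await self?.unload()
        }
    }

    // Called when local inference is needed again; expect a cold-start
    // delay here if the model was evicted.
    func ensureLoaded() {
        evictionTask?.cancel()
        if !isLoaded {
            // ...reload model weights...
            isLoaded = true
        }
    }

    private func unload() {
        isLoaded = false   // release model memory
    }
}
```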

One-Shot Cloud Fallback

If the cloud stream fails before yielding any token (connection error at the transport layer), the orchestrator makes a single retry attempt with local inference before surfacing an error to the UI.
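
A minimal sketch of that one-shot retry, with the backends again reduced to stand-in closures:

```swift
// If the cloud call throws before producing anything, run the request once
// against the local backend; if that also fails, the error reaches the UI.
func generateWithOneShotFallback(prompt: String,
                                 cloud: (String) async throws -> String,
                                 local: (String) async throws -> String) async throws -> String {
    do {
        return try await cloud(prompt)
    } catch {
        // Single retry with on-device inference; no further fallback.
        return try await local(prompt)
    }
}
```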

Timeout Behavior

| Scenario | What happens |
| --- | --- |
| Cloud stream times out | Partial content is shown with a notice appended |
| Task cancelled by user | Partial content kept, no error shown |
| Context window full | Generation stops at last usable position |
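
As a rough illustration of the first two rows, a hypothetical presentation helper might decide what to show like this (the notice wording is an assumption):

```swift
// Keep partial text on timeout and append a notice; keep it silently when
// the user cancelled; return it unchanged on normal completion.
func finalText(partial: String, outcome: Error?) -> String {
    guard let error = outcome else { return partial }      // completed normally
    if error is CancellationError { return partial }       // user cancelled: no error shown
    // Timed out mid-stream: keep what arrived, append a notice.
    return partial + "\n\n[Response interrupted: connection timed out]"
}
```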

Preferred Source Setting

In Settings → Inference → Preferred Source:

| Setting | Behavior |
| --- | --- |
| Auto (default) | Orchestrator decides based on subscription, connectivity, and stability timers |
| On-Device Only (forceLocal = true) | Cloud is never used; forces local even with a paid subscription |

On-Device Only is useful for airplane mode, battery saving, or when you want zero network traffic.
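
One possible mapping from this setting to the orchestrator's forceLocal flag, sketched with an assumed enum and defaults key:

```swift
import Foundation

// PreferredSource and the "preferredSource" defaults key are assumptions;
// only forceLocal itself is named by the routing table above.
enum PreferredSource: String {
    case auto
    case onDeviceOnly
}

func resolveForceLocal(defaults: UserDefaults = .standard) -> Bool {
    let raw = defaults.string(forKey: "preferredSource") ?? PreferredSource.auto.rawValue
    return PreferredSource(rawValue: raw) == .onDeviceOnly
}
```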


Error States

| Error | Cause | Recovery |
| --- | --- | --- |
| dailyLimitReached | Cloud credits exhausted for this billing cycle | Use on-device, or wait for reset |
| notAuthenticated | Auth token expired or revoked | Re-authenticate in Settings → Account |
| generateFailed | llama.cpp runtime error | Retry with cloud if available |
| modelNotLoaded | Local model not in memory | Wait for reload, or switch to cloud |
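
Modeled as a Swift enum for illustration: the case names come from the table, while the enum name and recovery strings are assumptions.

```swift
// Error cases from the table above, with the recovery column expressed as a hint.
enum CloudAIError: Error {
    case dailyLimitReached
    case notAuthenticated
    case generateFailed
    case modelNotLoaded

    var recoveryHint: String {
        switch self {
        case .dailyLimitReached: return "Use on-device inference, or wait for the billing cycle to reset."
        case .notAuthenticated:  return "Re-authenticate in Settings → Account."
        case .generateFailed:    return "Retry with cloud inference if available."
        case .modelNotLoaded:    return "Wait for the model to reload, or switch to cloud."
        }
    }
}
```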

Ability Plans (Synthesis)

Synthesis is a single-turn AI call used by ability plans after tool data is gathered on-device. Unlike chat, it does not stream — it takes the gathered context and produces a final response.

Ability plans gather calendar events, notes, or other data locally, then send that context to cloud synthesis for AI-powered summarization or enrichment.
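
A rough sketch of that flow, with an assumed context type and a stand-in closure for the cloud call:

```swift
// Tool data is gathered on-device, then sent as a single, non-streaming
// cloud call that returns the final response. SynthesisContext and the
// synthesize closure are illustrative names.
struct SynthesisContext {
    var instruction: String          // what the ability plan wants produced
    var gatheredData: [String]       // calendar events, notes, etc., collected locally
}

func runSynthesis(context: SynthesisContext,
                  synthesize: (String) async throws -> String) async throws -> String {
    // Flatten the locally gathered context into one prompt and make a single
    // cloud round trip; no token streaming is involved.
    let prompt = ([context.instruction] + context.gatheredData).joined(separator: "\n")
    return try await synthesize(prompt)
}
```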


Troubleshooting

"Subscription required" error

Cloud AI requires an active paid subscription (Starter, Pro, or Ultimate).

  1. Open Settings → Subscription
  2. Verify your plan is active
  3. If expired, renew to restore access

"Monthly limit reached" error

You've used your allocated cloud AI messages for this billing cycle.

  • Use on-device inference for the rest of the cycle
  • Upgrade your plan for higher limits
  • Wait until your next billing date

Entitlement changes not reflected

If you subscribe or cancel while the app is open:

  1. Close and reopen LucidPal
  2. The app refreshes entitlements on launch