Cloud AI

On-device AI inference is always free. Cloud AI extends LucidPal with Gemini 2.5 Flash for faster responses and larger context windows.

Overview

LucidPal uses an LLMOrchestrator to route every generation request between cloud and local inference at runtime:

| Route condition | Backend used |
| --- | --- |
| forceLocal = true | Always local |
| Auto + no paid subscription | Always local |
| Auto + paid subscription + online | Cloud by default |
| Cloud unreachable after 2s (first connect) | Falls back to local |
| Cloud stream dies mid-session, no recovery within 30s | Falls back to local |
| Cloud stream fails before first token | One-shot retry with local |

The orchestrator is transparent to callers — ChatViewModel and AgentViewModel call the same LLMServiceProtocol interface regardless of which backend is active.
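
In rough terms, the routing decision looks like the sketch below. Only LLMOrchestrator, LLMServiceProtocol, and forceLocal are LucidPal names; the other property and method names are illustrative assumptions.

```swift
// Hypothetical shape of the shared interface; the real LLMServiceProtocol
// may declare different members.
protocol LLMServiceProtocol {
    func generate(prompt: String) -> AsyncThrowingStream<String, Error>
}

// Simplified routing policy mirroring the table above. Names other than
// forceLocal are assumptions.
final class LLMOrchestrator {
    var forceLocal = false            // Settings → Inference → On-Device Only
    var hasPaidSubscription = false
    var isOnline = true

    private let localService: any LLMServiceProtocol
    private let cloudService: any LLMServiceProtocol

    init(local: any LLMServiceProtocol, cloud: any LLMServiceProtocol) {
        self.localService = local
        self.cloudService = cloud
    }

    // ChatViewModel and AgentViewModel only ever see an LLMServiceProtocol;
    // the backend choice is made here at request time.
    func preferredService() -> any LLMServiceProtocol {
        if forceLocal { return localService }                     // always local
        guard hasPaidSubscription, isOnline else { return localService }
        return cloudService                                       // cloud by default
    }
}
```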

Availability

| Feature | Free | Starter | Pro | Ultimate |
| --- | --- | --- | --- | --- |
| On-device inference | ✓ | ✓ | ✓ | ✓ |
| Cloud AI chat | ✗ | ✓ | ✓ | ✓ |
| Ability plan synthesis | ✗ | ✓ | ✓ | ✓ |

Monthly Limits

| Plan | Cloud Messages |
| --- | --- |
| Starter | Standard |
| Pro | Higher |
| Ultimate | Highest |

Limits reset at each billing cycle. When exhausted, on-device AI remains fully available.


How Cloud Routing Works

First Connect (2s stability timer)

On the first message of a session, LucidPal starts a 2-second stability timer after initiating the cloud connection. If the cloud stream fails to yield any token within 2 seconds, the orchestrator cancels cloud and retries with local inference.
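
A simplified, non-streaming sketch of that race follows; the real orchestrator watches for the first streamed token, while here the backends are reduced to stand-in async closures.

```swift
struct CloudTimeout: Error {}

// Race a cloud request against a 2-second deadline; if the deadline wins,
// abandon the cloud attempt and retry with local inference.
func firstConnect(prompt: String,
                  cloud: @escaping @Sendable (String) async throws -> String,
                  local: (String) async throws -> String) async throws -> String {
    do {
        return try await withThrowingTaskGroup(of: String.self, returning: String.self) { group in
            group.addTask { try await cloud(prompt) }
            group.addTask {
                try await Task.sleep(nanoseconds: 2_000_000_000)
                throw CloudTimeout()   // nothing from cloud within 2 seconds
            }
            // Whichever child finishes first wins; the deadline child can
            // only finish by throwing.
            guard let first = try await group.next() else { throw CloudTimeout() }
            group.cancelAll()
            return first
        }
    } catch is CloudTimeout {
        // Cancel cloud and fall back to on-device inference.
        return try await local(prompt)
    }
}
```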

Mid-Session Reconnect (30s stability timer)

If cloud inference becomes unreachable mid-session (network drop, server error), the orchestrator waits up to 30 seconds for the stream to recover. If it does not recover within that window, the orchestrator falls back to local inference.
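
In sketch form, the recovery window might look like the following. Whether the orchestrator polls or reacts to stream events is not specified here, and the reconnect check is hypothetical.

```swift
import Foundation

// Illustrative 30-second recovery window: reconnect() stands in for a check
// that returns true once the cloud stream is back.
func waitForCloudRecovery(deadline: TimeInterval = 30,
                          reconnect: () async throws -> Bool) async -> Bool {
    let start = Date()
    while Date().timeIntervalSince(start) < deadline {
        if (try? await reconnect()) == true {
            return true                                       // cloud recovered
        }
        try? await Task.sleep(nanoseconds: 1_000_000_000)     // brief pause between attempts
    }
    return false                                              // give up: fall back to local
}
```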

Local Model Eviction

When cloud is active and stable, LucidPal evicts the local model from memory after 30 seconds to conserve RAM. The next time local inference is needed, the model reloads automatically — expect a brief cold-start delay.
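
A rough sketch of such a deferred eviction, using illustrative names (LocalModelStore, scheduleEviction, ensureLoaded) rather than LucidPal's actual types:

```swift
// Once cloud has been stable for 30 seconds, drop the local model to free
// RAM; reload it on demand later.
actor LocalModelStore {
    private(set) var isLoaded = true
    private var evictionTask: Task<Void, Never>?

    // Called once the cloud backend is considered stable.
    func scheduleEviction(afterSeconds seconds: UInt64 = 30) {
        evictionTask?.cancel()
        evictionTask = Task { [weak self] in
            try? await Task.sleep(nanoseconds: seconds * 1_000_000_000)
            guard !Task.isCancelled else { return }
            await self?.unload()
        }
    }

    // Called when local inference is needed again; expect a cold-start
    // delay here if the model was evicted.
    func ensureLoaded() {
        evictionTask?.cancel()
        if !isLoaded {
            // ...reload model weights...
            isLoaded = true
        }
    }

    private func unload() {
        isLoaded = false   // release model memory
    }
}
```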

One-Shot Cloud Fallback

If the cloud stream fails before yielding any token (connection error at the transport layer), the orchestrator makes a single retry attempt with local inference before surfacing an error to the UI.
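
A minimal sketch of that one-shot retry, with the backends again reduced to stand-in closures:

```swift
// If the cloud call throws before producing anything, run the request once
// against the local backend; if that also fails, the error reaches the UI.
func generateWithOneShotFallback(prompt: String,
                                 cloud: (String) async throws -> String,
                                 local: (String) async throws -> String) async throws -> String {
    do {
        return try await cloud(prompt)
    } catch {
        // Single retry with on-device inference; no further fallback.
        return try await local(prompt)
    }
}
```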

Timeout Behavior

| Scenario | What happens |
| --- | --- |
| Cloud stream times out | Partial content is shown with a notice appended |
| Task cancelled by user | Partial content kept, no error shown |
| Context window full | Generation stops at last usable position |
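
As a rough illustration of the first two rows, a hypothetical presentation helper might decide what to show like this (the notice wording is an assumption):

```swift
// Keep partial text on timeout and append a notice; keep it silently when
// the user cancelled; return it unchanged on normal completion.
func finalText(partial: String, outcome: Error?) -> String {
    guard let error = outcome else { return partial }      // completed normally
    if error is CancellationError { return partial }       // user cancelled: no error shown
    // Timed out mid-stream: keep what arrived, append a notice.
    return partial + "\n\n[Response interrupted: connection timed out]"
}
```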

Preferred Source Setting

In Settings → Inference → Preferred Source:

| Setting | Behavior |
| --- | --- |
| Auto (default) | Orchestrator decides based on subscription, connectivity, and stability timers |
| On-Device Only (forceLocal = true) | Cloud is never used; forces local even with a paid subscription |

On-Device Only is useful for airplane mode, battery saving, or when you want zero network traffic.
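
One possible mapping from this setting to the orchestrator's forceLocal flag, sketched with an assumed enum and defaults key:

```swift
import Foundation

// PreferredSource and the "preferredSource" defaults key are assumptions;
// only forceLocal itself is named by the routing table above.
enum PreferredSource: String {
    case auto
    case onDeviceOnly
}

func resolveForceLocal(defaults: UserDefaults = .standard) -> Bool {
    let raw = defaults.string(forKey: "preferredSource") ?? PreferredSource.auto.rawValue
    return PreferredSource(rawValue: raw) == .onDeviceOnly
}
```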


Error States

| Error | Cause | Recovery |
| --- | --- | --- |
| dailyLimitReached | Cloud credits exhausted for this billing cycle | Use on-device, or wait for reset |
| notAuthenticated | Auth token expired or revoked | Re-authenticate in Settings → Account |
| generateFailed | llama.cpp runtime error | Retry with cloud if available |
| modelNotLoaded | Local model not in memory | Wait for reload, or switch to cloud |
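
Modeled as a Swift enum for illustration: the case names come from the table, while the enum name and recovery strings are assumptions.

```swift
// Error cases from the table above, with the recovery column expressed as a hint.
enum CloudAIError: Error {
    case dailyLimitReached
    case notAuthenticated
    case generateFailed
    case modelNotLoaded

    var recoveryHint: String {
        switch self {
        case .dailyLimitReached: return "Use on-device inference, or wait for the billing cycle to reset."
        case .notAuthenticated:  return "Re-authenticate in Settings → Account."
        case .generateFailed:    return "Retry with cloud inference if available."
        case .modelNotLoaded:    return "Wait for the model to reload, or switch to cloud."
        }
    }
}
```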

Ability Plans (Synthesis)

Synthesis is a single-turn AI call used by ability plans after tool data is gathered on-device. Unlike chat, it does not stream — it takes the gathered context and produces a final response.

Ability plans gather calendar events, notes, or other data locally, then send that context to cloud synthesis for AI-powered summarization or enrichment.
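
A rough sketch of that flow, with an assumed context type and a stand-in closure for the cloud call:

```swift
// Tool data is gathered on-device, then sent as a single, non-streaming
// cloud call that returns the final response. SynthesisContext and the
// synthesize closure are illustrative names.
struct SynthesisContext {
    var instruction: String          // what the ability plan wants produced
    var gatheredData: [String]       // calendar events, notes, etc., collected locally
}

func runSynthesis(context: SynthesisContext,
                  synthesize: (String) async throws -> String) async throws -> String {
    // Flatten the locally gathered context into one prompt and make a single
    // cloud round trip; no token streaming is involved.
    let prompt = ([context.instruction] + context.gatheredData).joined(separator: "\n")
    return try await synthesize(prompt)
}
```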


Troubleshooting

"Subscription required" error

Cloud AI requires an active paid subscription (Starter, Pro, or Ultimate).

  1. Open Settings → Subscription
  2. Verify your plan is active
  3. If expired, renew to restore access

"Monthly limit reached" error

You've used your allocated cloud AI messages for this billing cycle.

  • Use on-device inference for the rest of the cycle
  • Upgrade your plan for higher limits
  • Wait until your next billing date

Entitlement changes not reflected

If you subscribe or cancel while the app is open:

  1. Close and reopen LucidPal
  2. The app refreshes entitlements on launch