Vision & Photo Analysis

Attach photos, screenshots, and documents to any chat message — LucidPal describes them, reads text from them, and answers questions, entirely on your device.

What Vision Analysis Does

Vision lets the AI "see" images you attach to a message. You can:

Describe a photo — "What's in this picture?"
Read text from a screenshot — parse event invitations, menus, receipts, or handwritten notes
Analyse a document image — extract key details from a scanned page or photo of a whiteboard
Answer questions about an image — "How many items are on this receipt?" or "What date is on this flyer?"

The model processes the visual content and combines it with your text question to give a single, unified reply — the same way the text model handles a calendar request.

How to Attach a Photo

Open a chat session in LucidPal.
Tap the paperclip / attachment icon next to the text input field.
Choose Photo Library to pick an existing photo, or Camera to take one now.
Select your image — a thumbnail preview appears in the input bar.
Type your question (optional — you can send with just the image).
Send the message.

The vision model processes the image, then the full response appears in the chat bubble.

note

Only one image can be attached per message. To ask about multiple images, send separate messages.

What the Model Sees

Before the image reaches the model, LucidPal's VisionImageProcessor automatically:

Resizes the image so its longest side is at most 896 px, preserving aspect ratio
Compresses it to JPEG at 0.8 quality — enough fidelity for accurate analysis, small enough to run quickly
Passes the JPEG to the vision model's CLIP encoder for embedding

A separate 224 px thumbnail is generated for the chat bubble preview — that smaller version is never sent to the model.

This means the model sees a clean, reasonably detailed version of your image — fine for reading printed text, identifying objects, and describing scenes. Very small text (e.g., 6-pt footnotes) or highly detailed charts may not be fully legible.

The Qwen3.5 Vision 4B Model

LucidPal offers two ways to get vision capability:

Setup	How it works
Integrated model (Qwen3.5 Vision 4B)	One download handles both text chat and image analysis — no second model needed
Separate vision model	A dedicated vision GGUF loaded alongside your text model

Qwen3.5 Vision 4B is the recommended choice for most users. It is an integrated model — a single 2.5 GB file that covers all text requests (calendar, notes, habits) and all vision requests. There is no need to download anything else.

When a separate text model (e.g. Qwen3.5 2B) is active and a standalone vision model is also downloaded, LucidPal automatically switches to the vision model when it detects an image attachment, then switches back for text-only messages.

note

Integrated models show an Integrated badge in Settings. Separate vision models show a Vision badge.

Vision Settings

Go to Settings → Vision to control vision behaviour.

Vision Toggle

A single Vision toggle (under the Vision section header) enables or disables photo attachment processing:

On — photo attachments trigger the vision model. The paperclip icon is active in chat.
Off — only text inference runs. Images cannot be attached. Use this if you want to conserve RAM or only use a text model.

Vision Model Selection

Under Settings → Vision Model, you can:

Select which vision model is active — tap any downloaded model to make it the active vision model
Download a new vision model — tap Download Vision Models to browse available options
Delete a model — swipe left on any downloaded model and tap Delete

If you select an integrated model here, it also becomes your active text model — there is no separate text model needed.

Limitations

Limitation	Detail
Model must be downloaded first	Vision only works when a vision or integrated model is downloaded and selected. The toggle is disabled until then.
RAM requirement	Qwen3.5 Vision 4B requires ~5 GB of available RAM — iPhone 14 Pro, 15, or 16 series recommended.
One image per message	Multiple attachments in a single message are not supported.
Image size cap	Images are auto-downscaled so their longest side is at most 896 px (aspect ratio preserved). Very large originals lose no important detail, but microscopic text may not be legible.
Image types	Works best with clear, well-lit photos. Blurry, very dark, or heavily compressed images produce less accurate results.
No video	Only still images (JPEG, PNG, HEIF) are supported.
No PDF pages	For PDF documents, use the Document Summarization feature instead.

Privacy

All image processing is 100% on-device. Your photos are:

Resized and encoded locally by VisionImageProcessor
Stored temporarily in the app's private temp directory during inference
Passed only to the local vision model — never uploaded to any server
Removed from temp storage after the response is generated

See the Privacy guide for the full data table.

What Vision Analysis Does​

How to Attach a Photo​

What the Model Sees​

The Qwen3.5 Vision 4B Model​

Vision Settings​

Vision Toggle​

Vision Model Selection​

Limitations​

Privacy​