/visualise - AI Image Generation¶
| Runtime | ~10 seconds per image |
| Reads | Source note from vault |
| Writes | PNG image to vault, embed link in source note |
| Model | Google Nano Banana Pro (Gemini Image API) |
What It Does¶
Reads any vault note and generates a professional infographic, diagram, or knowledge card from it. Claude crafts an optimised prompt, calls Google's image generation API, saves the PNG alongside the note, and embeds it in the markdown.
Why It Matters¶
Markdown is great for thinking. It's terrible for communicating complex frameworks at a glance. A three-tier model described in bullet points takes 30 seconds to parse. The same model as a visual diagram takes 3 seconds.
The bottleneck was never the image generation - it was the prompting. Writing a good prompt for an image model requires translating abstract concepts into spatial relationships, colour coding, text placement, and visual hierarchy. That translation is exactly what an AI assistant with full vault context does well.
/visualise makes Claude the prompt engineer. You point at a note, it reads the full context, crafts a detailed prompt optimised for Nano Banana Pro's strengths (accurate text rendering, logical layouts), generates the image, and puts it where it belongs.
How It Works¶
graph LR
A[Vault Note] --> B[Claude reads + understands structure]
B --> C[Crafts optimised image prompt]
C --> D[Calls Gemini Image API]
D --> E[PNG saved to vault]
E --> F[Embed added to note]
- Read the source - Claude reads the full note to understand the concept, relationships, hierarchy, and key terms
- Craft the prompt - Translates the content into a detailed image generation prompt with explicit spatial layout, text labels, colour guidance, and style
- Generate - Calls the Gemini Image API via a local wrapper script
- Save and embed - Writes the PNG to the same directory and adds an Obsidian embed link
The Key Innovation¶
Claude as prompt engineer, not the user. The gap between "I want a diagram of this framework" and the prompt that actually produces a good diagram is significant. You'd need to specify:
- Visual type (layered diagram, flow chart, matrix, timeline)
- Spatial layout (vertical stack, left-to-right, 2x2 grid)
- Every text label that must appear (Nano Banana Pro renders text accurately, but needs it spelled out)
- Colour coding and visual hierarchy
- Style and typography
Claude already understands the note's structure. It knows which elements are tiers, which are comparisons, which are sequential steps. That structural understanding translates directly into prompt engineering that a human would spend 5 minutes crafting manually.
Two Quality Tiers¶
| Tier | Model | Cost | Best For |
|---|---|---|---|
| Flash | Gemini 2.5 Flash Image | ~$0.04/image | Internal vault visuals, quick iteration |
| Pro | Nano Banana Pro (Gemini 3 Pro Image) | ~$0.13/image | External content, LinkedIn, presentations |
Pro produces noticeably better text rendering, layout precision, and visual polish. Use it for anything leaving the vault.
Example Usage¶
Visualise a specific note:
Visualise by concept name (Claude searches the vault):
Specify quality tier:
What It Generates¶
The skill handles different visual types based on the source content:
| Source Content | Visual Output |
|---|---|
| Framework with tiers/levels | Layered diagram with progression arrows |
| Process or workflow | Flow diagram with sequential steps |
| A vs B comparison | Side-by-side chart with matching rows |
| Hierarchy or org structure | Tree diagram with nesting |
| Timeline or roadmap | Horizontal timeline with milestones |
| Concept or definition | Knowledge card with key attributes |
| Multi-factor model | Matrix or grid layout |
Image Editing¶
Already have a generated image that needs tweaking? Pass --input-image to modify an existing PNG:
This uses Nano Banana Pro's image-to-image editing - faster and more consistent than regenerating from scratch.
Setup¶
- Get a Google AI Studio API key at aistudio.google.com/apikey
- Add to your shell profile:
export GOOGLE_API_KEY="your-key-here" - Install the Python SDK:
pip install google-genai - Copy the
~/bin/visualisewrapper script from the template
The free tier is generous for personal use. Billing-enabled keys unlock the Pro model and higher rate limits.
Customisation Guide¶
- Default model - Edit the script to change from
flashtoproif you always want top quality - Aspect ratios - The skill defaults to 16:9. Override with
--aspect 1:1for social media or--aspect 9:16for tall diagrams - Style preferences - Add your preferred style to the SKILL.md prompt template (dark mode, specific colour palette, etc.)
- Output location - The skill saves images alongside source notes by default. Modify the path convention in SKILL.md if you prefer a central
assets/folder