Image, Video & Music Studio — AI Creative Tools Guide | The Prompt Kitchen

How image AI actually works — no math, just intuition

Image AI tools (like Midjourney, DALL-E, and Stable Diffusion) are trained on hundreds of millions to billions of images from the internet, each paired with a text description. Over time, the model learns the statistical relationships between words and visual patterns. "Misty forest" reliably produces cool blues and diffused light because that's what millions of misty-forest images look like.

When you type a prompt, the model doesn't "draw" your description the way a human artist would. It statistically reconstructs an image that matches the pattern of your words — starting from pure noise and gradually refining it into something coherent. Think of it like tuning in a radio signal until the music becomes clear.
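
As a rough intuition only, the "start from noise, refine step by step" idea can be sketched with a toy loop. This is not how a real diffusion model is implemented: here the target is known in advance and each step simply removes a fraction of the remaining error, whereas a real model predicts what to remove from what it learned in training. The names `refine`, `target`, and `strength` are made up for illustration.

```python
import random


def refine(target, steps=20, strength=0.3, seed=0):
    """Toy analogy: start from random noise and nudge it toward a target."""
    rng = random.Random(seed)
    # Begin with pure noise: values unrelated to the target.
    image = [rng.uniform(0.0, 1.0) for _ in target]
    errors = []
    for _ in range(steps):
        # Each step removes a fraction of the remaining difference.
        image = [px + strength * (t - px) for px, t in zip(image, target)]
        errors.append(sum(abs(t - px) for px, t in zip(image, target)))
    return image, errors


# A four-"pixel" target standing in for the pattern the prompt describes.
target = [0.1, 0.9, 0.5, 0.3]
image, errors = refine(target)
print(f"error after step 1: {errors[0]:.3f}, after step 20: {errors[-1]:.6f}")
```

The printed error shrinks with every step, which is the "radio tuning" picture in numeric form.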

This is why specificity wins. Vague prompts — "a forest" — produce generic results because they could match thousands of different images. Specific prompts — "ancient redwood forest at dawn, volumetric god rays, Sony A7R V, 24mm f/2.8" — narrow the statistical target considerably.

Anatomy of an image prompt

You don't need all of these in every prompt. Think of them as ingredients — use what serves the dish.

🎯 Subject

The main thing in the image. Be specific: not "a woman" but "a 50-year-old woman with grey-streaked hair." Include relevant details about pose, expression, clothing.

🎨 Style / Medium

Photography, oil painting, watercolor, vector illustration, pencil sketch, 3D render, cinematic still. Naming a specific style or artist is powerful — but be aware that using living artists' names can be controversial.

💡 Lighting

One of the most impactful ingredients. Options: golden hour, overcast diffused, studio three-point, dramatic side-lighting, neon, candlelight, god rays, backlit rim light.

📐 Composition / Framing

Close-up portrait, wide establishing shot, bird's-eye view, worm's-eye view, rule of thirds, centered symmetry, Dutch angle. Camera angle and shot-type terms from filmmaking work here too.

🎨 Color Palette

Muted earth tones, high-contrast monochrome, pastel, saturated primary colors, warm amber and cream, cool blues and greys. Named palettes from film or art also work well.

🌫️ Mood / Atmosphere

Melancholic, joyful, tense, ethereal, gritty, peaceful, mysterious. These influence everything from lighting to color to subject expression in ways that individual terms can't.

📷 Camera Terms

Shot on [camera body], [lens]mm f/[aperture], shallow depth of field, bokeh, long exposure, macro, tilt-shift. Real camera specs give models a precise optical reference.

🚫 Negative Prompts

What to exclude. Especially useful for: people (when you want empty scenes), text and watermarks, deformed hands, blur, oversaturation, and cartoon styling (when you want realism).
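
The "use what serves the dish" idea can be sketched as a small helper that joins only the ingredients you actually supply. This is an illustrative sketch, not part of any tool: `build_image_prompt` and its argument names are hypothetical, and the negative prompt is left out on the assumption that tools accept it in different ways.

```python
def build_image_prompt(subject, style="", lighting="", composition="",
                       palette="", mood="", camera=""):
    """Join whichever ingredients are provided into one comma-separated prompt."""
    parts = [subject, style, lighting, composition, palette, mood, camera]
    # Skip ingredients left empty: not every prompt needs every ingredient.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_image_prompt(
    subject="a 50-year-old woman with grey-streaked hair",
    style="oil painting",
    lighting="golden hour",
    mood="peaceful",
)
print(prompt)
```

Keeping the ingredients as separate arguments makes it easy to change one (say, the lighting) while holding the rest fixed.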

Which image model should you use?

Midjourney

midjourney.com · Subscription required · Discord or web

The gold standard for artistic and aesthetic imagery. Midjourney produces consistently beautiful, polished images with a distinctive quality that's hard to match. It has its own aesthetic tendencies — slightly painterly, high production value — which you can control with parameters.

Key parameters:

  • --ar 16:9 — sets aspect ratio (16:9 landscape, 9:16 portrait, 1:1 square)
  • --style raw — reduces Midjourney's default painterly treatment for more photographic results
  • --chaos 0–100 — 0 is predictable, 100 is wildly varied. Start low (10–20) for professional work.
  • --no [thing] — negative prompt (e.g., --no people, text, watermarks)
  • --stylize 0–1000 — how strongly MJ applies its aesthetic. Default 100, lower for more literal prompts.
  • --seed [number] — reproduces an image or style across generations

Pros: Aesthetic quality · Artistic styles · Consistent quality floor · Community and tutorials
Cons: Paid subscription · Text in images still imperfect · Hands can be difficult
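
To keep the key parameters within their stated ranges, a small helper can assemble the suffix for you. This is a sketch under assumptions: `midjourney_suffix` is a hypothetical function that only builds text to paste after a prompt, it does not talk to Midjourney, and it covers only the flags and ranges listed in this guide.

```python
def midjourney_suffix(ar=None, raw=False, chaos=None, no=None,
                      stylize=None, seed=None):
    """Build a parameter string from the flags in this guide, checking ranges."""
    parts = []
    if ar:
        parts.append(f"--ar {ar}")
    if raw:
        parts.append("--style raw")
    if chaos is not None:
        if not 0 <= chaos <= 100:
            raise ValueError("--chaos must be between 0 and 100")
        parts.append(f"--chaos {chaos}")
    if no:
        # Excluded items are written as one comma-separated list.
        parts.append("--no " + ", ".join(no))
    if stylize is not None:
        if not 0 <= stylize <= 1000:
            raise ValueError("--stylize must be between 0 and 1000")
        parts.append(f"--stylize {stylize}")
    if seed is not None:
        parts.append(f"--seed {seed}")
    return " ".join(parts)


suffix = midjourney_suffix(ar="16:9", raw=True, chaos=15, no=["people", "text"])
print("ancient redwood forest at dawn " + suffix)
```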

DALL-E 3 (via ChatGPT)

chat.openai.com · Included with ChatGPT Plus

DALL-E 3 is excellent at following detailed, complex instructions — more faithfully than Midjourney, which can be impressionistic. It's particularly good at combining multiple concepts in one image and at architectural/interior images. Access it through ChatGPT by asking "generate an image of..."

Pros: Follows complex prompts precisely · Good text in images · Architectural interiors · Accessible via ChatGPT
Cons: Less artistic than Midjourney · Requires ChatGPT Plus · More conservative content policy

Stable Diffusion / Flux

Run locally (with interfaces like ComfyUI or Automatic1111) or via hosted services like Replicate

Open-source models you can run on your own computer (with a capable GPU) or through third-party services. The flexibility is unmatched — fine-tuned models for specific styles, ControlNet for precise composition control, custom training. Steeper learning curve, but no subscription and no content restrictions from a central provider.

Pros: Free to run locally · Huge model ecosystem · ControlNet for precision · Full customization
Cons: Steeper learning curve · Requires decent GPU · Quality varies by model

Adobe Firefly

firefly.adobe.com · Free tier & Creative Cloud plans

Trained on Adobe Stock, openly licensed, and public-domain content, which makes it a lower-risk choice for commercial work. Deeply integrated into Photoshop and Illustrator (Generative Fill is a game-changer for photo editing). Not the most creative model, but reliable and commercially clean.

Pros: Commercially safe · Photoshop integration · Generative Fill / Expand · Free tier available
Cons: Less creative range · Best with Adobe subscription

Ideogram

ideogram.ai · Free & paid tiers

The go-to model when you need readable text inside an image — logos, posters, signs, typographic art. Other models struggle badly with text; Ideogram handles it remarkably well. Also strong on realistic photography and general composition.

Pros: Best text-in-image · Logo & poster work · Strong realism · Generous free tier
Cons: Smaller community & resources

Image prompt formulas

Copy any of these, fill in the brackets, and you have a solid starting point. Add Midjourney parameters at the end if using that tool.

Photorealistic Portrait Formula

[age and gender], [distinctive feature or expression], [clothing description], [background setting], shot on [camera body] with [lens]mm f/[aperture], [lighting type], editorial photography, 8K, sharp focus

Example: 40-year-old woman with weathered hands and a calm expression, wearing a worn denim jacket, standing in a doorway of a rural farmhouse, shot on Canon EOS R5 with 85mm f/1.4, golden hour backlighting, editorial photography, 8K, sharp focus

Negative prompt: airbrushed skin, unrealistic proportions, watermark, text, deformed hands
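
"Fill in the brackets" can also be done mechanically. The sketch below treats the portrait formula as a template and swaps each bracketed placeholder for a value, failing loudly if one is forgotten. `fill_formula` is a hypothetical helper written for this guide, and the values are adapted from the example above.

```python
import re

FORMULA = (
    "[age and gender], [distinctive feature or expression], "
    "[clothing description], [background setting], "
    "shot on [camera body] with [lens]mm f/[aperture], [lighting type], "
    "editorial photography, 8K, sharp focus"
)


def fill_formula(formula, values):
    """Replace each [placeholder] with its value; fail loudly if one is missing."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value given for [{key}]")
        return values[key]
    # Match anything between square brackets and hand it to substitute().
    return re.sub(r"\[([^\]]+)\]", substitute, formula)


portrait = fill_formula(FORMULA, {
    "age and gender": "40-year-old woman",
    "distinctive feature or expression": "weathered hands and a calm expression",
    "clothing description": "wearing a worn denim jacket",
    "background setting": "standing in a doorway of a rural farmhouse",
    "camera body": "Canon EOS R5",
    "lens": "85",
    "aperture": "1.4",
    "lighting type": "golden hour backlighting",
})
print(portrait)
```

The same function works for the other formulas in this section, since they all use the same bracket convention.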

Concept Art / Illustration Formula

[subject description], [art style — e.g., digital painting / watercolor / ink illustration], [mood/atmosphere], [color palette], [compositional note], concept art, detailed, [additional style reference if helpful]

Example: Ancient stone library inside a living tree, warm candlelight filtering through roots that form the ceiling, towering bookshelves, a lone scholar reading at a central table, digital painting in the style of Studio Ghibli backgrounds, soft warm palette, wide establishing shot, concept art, highly detailed

Negative prompt: modern elements, photography, text, watermark, bad anatomy

Product Mockup Formula

[product name and description], [material and finish], [surface/context], [lighting], product photography, white/neutral background, commercial quality, sharp focus, no shadows on background

Example: Handmade ceramic coffee mug, matte terracotta and cream speckled glaze, resting on raw linen cloth, soft diffused window light from the left, shallow depth of field, product photography, white background, commercial quality, sharp focus

Negative prompt: fake-looking glaze, text, logos, people, harsh background shadows, oversaturated

Architecture / Environment Formula

[interior or exterior], [architectural style], [time of day / lighting condition], [key materials and details], [atmosphere], [shot type — establishing / intimate / aerial], architectural photography, high resolution

Example: Interior of a brutalist library reading room, poured concrete walls, dramatic skylights casting geometric shadows, rows of wooden desks, warm tungsten lamp pools, empty and quiet, wide establishing shot, architectural photography, high resolution

Negative prompt: people (unless specified), modern furniture clashing with style, text, watermark, lens distortion

Logo / Icon Formula

Use Ideogram for this — it handles text far better than other models. For purely graphic logos (no text), Midjourney also works well.

Minimalist logo for [business/brand name], [industry], [1-2 word style description — e.g., geometric / organic / bold / playful], [color palette], vector style, clean lines, white background, professional

Example: Minimalist logo for "Fern & Stone" artisan bakery, organic botanical motif using a fern leaf and a simple stone shape, warm terracotta and cream palette, vector style, clean lines, white background, professional

For Ideogram: include "with the text '[Brand Name]' in [font style — e.g., modern serif / clean sans-serif]"

Style Transfer Formula

[your subject, described specifically], in the style of [art movement / medium / specific work], [key stylistic elements to emphasize], [color characteristics], [mood]

Example: A busy city intersection at night, in the style of Edward Hopper oil paintings, strong artificial light pools against deep shadows, muted yellow and green palette, solitary figures, melancholy urban atmosphere, painterly brushwork visible

Note: Naming living artists can be controversial and some tools will decline. Named art movements (Impressionism, Art Deco, Bauhaus) are a safer alternative.

Tips & tricks

Aspect ratio cheat sheet

  • 1:1 → Instagram post, profile photo, thumbnail
  • 16:9 → YouTube, presentations, wallpaper, TV
  • 9:16 → Instagram Reels, TikTok, Stories, phone wallpaper
  • 4:3 → Traditional photography, older displays
  • 3:4 → Portrait photography, Pinterest pins
  • 21:9 → Cinematic ultrawide, website hero images
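
When a tool or export dialog asks for pixel dimensions rather than a ratio, the conversion is simple arithmetic. The sketch below assumes you know the longest edge you want; `dimensions` is a hypothetical helper, and 1920 pixels is only an example value.

```python
def dimensions(ratio, long_edge):
    """Return (width, height) in pixels for a 'W:H' ratio and a longest edge."""
    w, h = (int(n) for n in ratio.split(":"))
    # Scale so that the larger side of the ratio equals long_edge.
    scale = long_edge / max(w, h)
    return round(w * scale), round(h * scale)


for ratio in ["1:1", "16:9", "9:16", "4:3", "3:4", "21:9"]:
    print(ratio, dimensions(ratio, 1920))
```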

Keeping images consistent across a series

🌱 Use the seed parameter

In Midjourney, add --seed [number] to pin the random starting point. Images generated with the same seed and similar prompts tend to share composition and overall look, which is useful for product series, character sheets, or style consistency across a collection.

🖼️ Use an image reference (--sref in Midjourney)

Provide an existing image as a style reference with --sref [image URL]. The model will match the visual style without copying the content. Combine multiple references with multiple URLs.

📋 Keep a style string

Once you have an image you love, copy its exact prompt — including all parameters and lighting/style descriptors — and reuse it as a template. Change only the subject. This is the simplest consistency system.
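
A style string is easy to reuse if you store it once and vary only the subject. The sketch below is illustrative: `series_prompts` is a hypothetical helper, the style string is borrowed from the product example earlier in this guide, and the optional `--seed` suffix applies only if you are using Midjourney.

```python
STYLE_STRING = (
    "soft diffused window light from the left, shallow depth of field, "
    "product photography, white background, commercial quality, sharp focus"
)


def series_prompts(subjects, style_string, seed=None):
    """Pair each subject with the same style string (and, optionally, seed)."""
    suffix = f" --seed {seed}" if seed is not None else ""
    return [f"{subject}, {style_string}{suffix}" for subject in subjects]


prompts = series_prompts(
    ["handmade ceramic coffee mug", "handmade ceramic teapot"],
    STYLE_STRING,
    seed=1234,
)
for p in prompts:
    print(p)
```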

Text in images

💡 The honest answer on text

Ideogram is clearly the best for readable text. DALL-E 3 is improving and handles short words well. Midjourney is still unreliable for text, and letters often get scrambled. If text accuracy is critical (a poster, a logo with a specific name), use Ideogram and always proofread the output carefully before using it.

How video generation differs from images

Image generation produces a single still frame. Video generation has to produce dozens to hundreds of frames that are consistent with each other over time: same lighting, same character, same physics, from moment to moment. This is called temporal consistency, and it's genuinely hard.
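
To get a feel for the scale of the problem, multiply clip length by frame rate. The sketch assumes 24 frames per second, a common cinematic rate; actual tools may render at other rates, so treat the numbers as illustrative.

```python
def frame_count(seconds, fps=24):
    """Number of frames in a clip: duration multiplied by frames per second."""
    return seconds * fps


for seconds in (4, 10):
    print(f"{seconds} s at 24 fps = {frame_count(seconds)} frames")
```

Every one of those frames has to agree with its neighbors, which is why even short clips are demanding.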

This is why video AI is impressive but imperfect. Current tools handle short clips (4–10 seconds) well. Complex motion, specific characters across multiple clips, and anything longer gets shaky. The field is moving fast — what was impossible in 2023 is routine now.

The practical implication: video AI is currently best suited for ambient footage (landscapes, establishing shots, background loops), product showcases (object rotating or in use), and B-roll to complement other footage. It struggles with characters doing specific things across multiple shots.

Video model landscape

Sora (OpenAI)

sora.com · Subscription required

OpenAI's video model, launched publicly in late 2024. Strong at photorealistic scenes, handles camera movement well, and can produce clips up to 20 seconds. The "storyboard" feature lets you generate multi-scene sequences. Still struggles with complex character motion and hands.

Pros: Photorealism · Camera movement · Longer clips · Storyboard feature
Cons: Complex motion · Characters across clips · Subscription required

Runway Gen-3 / Gen-4

runwayml.com · Free & paid tiers

Runway is the most filmmaker-friendly video AI tool — it has the most controls (camera movement direction, motion brush to animate specific areas) and the largest professional community. Gen-4 added significantly better consistency between clips. Popular with professional video editors and motion designers.

Pros: Director controls · Motion brush · Professional features · Active community
Cons: Credits system can be expensive · Learning curve

Kling (Kuaishou)

klingai.com · Free & paid tiers

Kling is a Chinese-made model that surprised the industry with its quality. Particularly strong at smooth, realistic human motion: one of the best for people walking, gesturing, or doing physical tasks. Individual generations are short (around 5–10 seconds); paid tiers can extend clips to longer durations.

Pros: Human motion · Realistic physics · Longer duration · Generous free tier
Cons: Interface can be slow from outside Asia

Luma Dream Machine

lumalabs.ai · Free & paid tiers

Luma is fast, has a generous free tier, and is excellent at turning still images into short video clips with realistic camera motion. If you have a great image and want to add subtle motion to bring it to life, Luma is often the fastest path.

Pros: Image-to-video · Fast generation · Good camera motion · Free tier
Cons: Less precise control than Runway

Veo 2 (Google)

Via VideoFX / Google One AI Premium

Google's Veo 2 is their most capable video model — competitive with Sora for cinematic quality. Access is currently through Google's VideoFX platform and AI Premium subscription. Strong at understanding complex scene descriptions and producing cinematic footage.

Pros: Cinematic quality · Complex scene understanding · Google ecosystem
Cons: Limited access currently

Anatomy of a video prompt

🎬 Scene Setup

Location, time of day, environment. Be specific — "a cobblestone alley in a European city at blue hour" is far better than "a street."

📷 Camera Movement

Pan left/right, tilt up/down, dolly in/out, orbit (circular), static locked, slow push in, handheld/verité, drone pullback. Name the move explicitly.

🏃 Subject Motion

What is happening in the frame? Walking, pouring, growing, falling — be specific about direction, speed, and character if relevant.

⏱️ Duration & Pacing

Most tools generate 4–10 second clips. State whether you want slow, languid motion or dynamic, faster pacing. Adding "slow motion" noticeably slows the perceived speed of the action.

☀️ Lighting Changes

Does the light change? Clouds passing over the sun, a flickering candle, sunrise gradually brightening — dynamic lighting makes clips feel alive.

🎵 Mood / Sound Notes

Most tools don't generate audio yet, but mood language shapes the visual feel. "Tense, silent, held breath" vs. "warm, ambient, comfortable."

Video prompt formulas

Cinematic B-Roll Formula

[Subject / location], [camera movement], [lighting condition], [atmosphere], cinematic, [film stock or look reference], 4K, slow motion

Example: Empty rain-soaked city street at night, slow dolly forward down the center line, wet cobblestones reflecting neon signs, misty atmosphere, cinematic, ARRI Alexa film look, 4K, slight slow motion

Product Showcase Formula

[Product], [surface/environment], [camera movement — typically slow orbit or push in], [lighting], professional product video, clean background, no people

Example: Glass perfume bottle with gold cap, on a marble surface with scattered petals, slow 360-degree orbital camera movement, soft studio rim lighting catching the glass facets, professional product video, white background, no people, 4K

Nature / Ambient Loop Formula

[Natural scene], [subtle motion — wind in trees / waves / clouds moving], static or very slow camera, [lighting], ambient, peaceful, seamlessly loopable

Example: Dense bamboo forest, gentle wind causing slow rhythmic swaying, shafts of morning light filtering through the canopy, static camera with very slight upward tilt, ambient, peaceful, 4K, suitable as a seamlessly loopable background video

Social Media Short Formula

[Visually compelling subject], [dynamic action or transformation], [vertical 9:16 framing], [energy level], fast cut or single shot, [aesthetic reference], suitable for social media

Example: Coffee being poured into a clear glass revealing beautiful layered colors, close-up macro shot, slow motion, vertical 9:16 framing, warm and cozy aesthetic, ASMR-style sensory focus, suitable for Instagram Reels

Practical tips for video generation

🖼️ Start from an image, not just text

Image-to-video (generating video from a reference image) consistently produces better results than text-to-video alone. Generate your perfect frame with an image tool first, then animate it. Luma Dream Machine and Runway are both excellent at this.

🎨 Storyboard with images first

Plan your scene sequence using an image model before animating. This lets you establish consistent characters, settings, and lighting across clips before committing video generation credits. Think of image generation as your pre-production storyboard.

⚠️ What still doesn't work well

  • Hands and fingers: still difficult in most models.
  • Readable text: almost always garbled.
  • Specific dialogue or lip sync: requires dedicated tools (HeyGen, Synthesia).
  • Long duration: anything over 10–15 seconds loses consistency.
  • Specific named individuals: ethically and technically complicated.

🔗 Chaining clips for consistency

Runway Gen-4 lets you use the last frame of one clip as the first frame of the next. This "chain" technique is currently the most reliable way to maintain visual consistency across multiple shots. Plan your clips like you'd plan a shot list.
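
Planning a chain like a shot list can be sketched as a simple data structure in which each clip records the frame it starts from and the frame it ends on. This is a planning aid only: it does not call Runway or any other service, and `Shot`, `chain`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    prompt: str
    first_frame: Optional[str] = None  # image the clip starts from
    last_frame: Optional[str] = None   # image saved from the finished clip


def chain(prompts, opening_frame):
    """Plan a shot list where each clip starts on the previous clip's last frame."""
    shots = []
    start = opening_frame
    for i, prompt in enumerate(prompts, start=1):
        # Hypothetical file name for the frame you would export after generating.
        end = f"shot_{i:02d}_last.png"
        shots.append(Shot(prompt, first_frame=start, last_frame=end))
        start = end  # the next clip begins where this one ended
    return shots


shots = chain(
    ["slow dolly forward down a rain-soaked street",
     "camera tilts up to neon signs",
     "drone pullback over the rooftops"],
    opening_frame="storyboard_01.png",
)
for shot in shots:
    print(shot.first_frame, "->", shot.last_frame, "|", shot.prompt)
```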

How AI music actually works

AI music tools are trained on millions of songs and learn the statistical relationships between descriptive words and musical patterns. "Lo-fi hip hop" reliably produces a specific combination of tempo, chord progressions, drum patterns, and vinyl crackle because that's what that genre consistently sounds like across millions of examples.

The key difference from image AI: music has time. Modern tools like Suno and Udio don't just generate audio — they generate complete songs with structure: intros, verses, choruses, bridges, and outros. They understand how music unfolds, not just how it sounds in a single moment.

This means your prompt isn't just describing a sound — it's describing an entire listening experience. The more specific you are about genre, mood, instruments, and vibe, the better the result.

💡 The single most important tip

Generate 4–5 versions and pick the best one. AI music varies far more between generations than text does: the same prompt can produce very different results each time. Always generate multiple options before deciding. Suno's free tier gives you about 50 credits per day (roughly 10 songs), so don't be stingy with generations.

Which music AI should you use?

Suno

suno.com · Free & paid · Best starting point for most people

The most popular AI music tool and the best starting point for beginners. Suno creates complete songs with vocals, lyrics, and production in just about any genre. The free tier is generous: about 50 credits per day, or roughly 10 songs. Type a style description, click generate, and you have a real song in under 30 seconds.

Pros: ~10 free songs/day · Complete songs with vocals · Huge genre range · Custom lyrics support · Very beginner-friendly
Cons: Auto-lyrics can be generic · Can sound slightly AI on close listen

Free tier: Yes. About 50 credits (~10 songs) per day, all major features included.

Paid: ~$10–$30/month. More generations, commercial rights, no watermark.

Suno: Custom Mode vs. Simple Mode

Simple Mode: Describe the song you want in plain language. Suno writes the lyrics and handles everything else. Best for experimenting.

Custom Mode: Write your own lyrics and specify the style separately. Use [Verse], [Chorus], [Bridge], [Outro] tags to control song structure. Best when you want specific lyrics or more control over the arrangement.
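
For Custom Mode, the structure tags are plain text placed above each section of your lyrics. The sketch below formats a lyrics sheet that way; `lyrics_sheet` is a hypothetical helper, the lyric lines are placeholders, and the blank line between sections is a readability choice rather than a documented requirement.

```python
def lyrics_sheet(sections):
    """Format (tag, lines) pairs as lyrics with [Tag] markers between sections."""
    blocks = []
    for tag, lines in sections:
        # Each section starts with its tag on a line of its own.
        blocks.append("\n".join([f"[{tag}]"] + list(lines)))
    # A blank line between sections keeps the structure easy to read.
    return "\n\n".join(blocks)


sheet = lyrics_sheet([
    ("Verse", ["Morning light on the kitchen floor",
               "Coffee steam and an open door"]),
    ("Chorus", ["Stay a little longer", "Stay a little while"]),
    ("Outro", ["Stay a little while"]),
])
print(sheet)
```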

Udio

udio.com · Free & paid · Highest audio quality

Udio's audio quality is exceptional — arguably the best-sounding AI music currently available. It's particularly strong at niche genres (jazz, classical, metal, folk) and produces more nuanced results. The section editing feature lets you regenerate specific parts of a song without losing the rest. Worth using when quality matters more than speed.

Pros: Top audio quality · Excellent niche genre range · Section editing · Stems export (paid)
Cons: Fewer free credits than Suno · Slightly slower generation

Free tier: Yes — limited daily credits.

Paid: ~$10–$30/month for more credits and commercial licensing.

MusicFX (Google)

labs.google/musicfx · Completely free · Instrumentals only

Google's experimental music tool, available free via Google Labs. Excellent for short instrumental loops and ambient music — background tracks, studying music, sound design. Uniquely, it generates in near-real-time so you can hear changes as you adjust the prompt. No vocals or full song structure.

Pros: Completely free · Real-time generation · Great for ambient & loops · No account needed
Cons: No vocals or lyrics · Short clips only (~30 sec) · Experimental, may change

Free tier: Yes — completely free, no sign-in required.

Stable Audio (Stability AI)

stableaudio.com · Free & paid · Long instrumentals & SFX

Stable Audio shines at generating longer instrumental tracks — up to 3 minutes — which makes it the go-to for background music, podcast intros, or anything that needs to run longer than a clip. Also strong for sound effects and cinematic scores. If you need a long, loopable instrumental, this is your tool.

Pros: Up to 3-minute tracks · Cinematic & ambient · Sound effects / SFX · Precise timing control
Cons: Primarily instrumental · Only 20 free generations/month

Free tier: Yes — 20 generations/month.

Paid: ~$12/month for 500 generations + commercial rights.

Boomy

boomy.com · Free & paid · Simplest to use, publish to streaming

Boomy takes the simplest approach: choose a genre, adjust a few sliders, generate. Less creative control than Suno or Udio, but it has a unique feature: you can publish your creations directly to Spotify, Apple Music, and other streaming platforms and potentially earn royalties. Perfect if your goal is publishing rather than experimenting.

Pros: Simplest interface · Publish to Spotify & Apple Music · Royalty earning potential
Cons: Less creative control · More template-based output

Free tier: Yes — limited monthly creations.

Paid: ~$3–$10/month for more songs and full publishing access.

Anatomy of a music prompt

The more dimensions you describe, the more precisely the AI can match your vision. You don't need all of these — but each one narrows the target.

🎸 Genre / Style

The most important ingredient. Be specific: not "rock" but "bluesy Southern hard rock" or "jangly 60s British Invasion." Genre carries tempo, instrumentation, and production style all at once.

😌 Mood / Emotion

How should the listener feel? "Melancholic and reflective," "joyful and energetic," "tense and cinematic." Emotional language often works better than technical terms alone.

⏱️ Tempo / Energy

Slow and dreamy, mid-tempo driving, fast and intense — or a specific BPM like "120 BPM." Also useful: "builds from quiet to explosive" or "consistent and meditative."

🎹 Instruments

Name specific instruments: "acoustic guitar and upright bass," "Rhodes piano and muted trumpet," "heavy distorted guitar and pounding drums, no synths." The more specific, the better.

🎤 Vocals

Female vocalist with breathy delivery, deep gravelly male voice, harmonies, choir, rap flow — or simply "instrumental, no vocals." Suno adds vocals by default if you don't specify.

📅 Era / Production

Decade references are powerful: "1970s soul production," "80s synth pop with gated reverb drums," "modern hyperpop." These describe an entire production aesthetic in a few words.

🏗️ Song Structure

For Suno Custom Mode: use [Intro], [Verse], [Chorus], [Bridge], [Outro] tags in your lyrics. In style prompts, describe arc: "slow build from quiet to full band."

🎬 Use Case

Telling the AI what the music is for helps: "background music for a coffee shop," "podcast opening theme," "emotional scene in a drama," "workout pump-up track."

Music prompt formulas

Copy any of these, fill in the brackets, and paste into Suno or Udio. Generate 3–5 versions and pick the best.

Study / Work Background Music

lo-fi hip hop, relaxed and focused, vinyl crackle, mellow piano loops, soft jazz drums, warm bass, no vocals, instrumental, perfect for studying

— or —

ambient electronic, meditative and calm, soft synthesizer pads, gentle arpeggios, no percussion, no vocals, perfect for deep work and concentration

Custom Song Formula

[Genre] song, [mood], [vocal style], [key instruments], [era or production style], [tempo]

Example: Indie folk song, bittersweet and nostalgic, warm female vocals with harmonies, fingerpicked acoustic guitar and cello, gentle percussion, mid-2000s indie production, mid-tempo

Example: Upbeat funk soul, joyful and celebratory, soulful male vocals, brass section, rhythm guitar, tight funk drums, 1970s classic soul production, 105 BPM

Film Score / Cinematic Music

[Emotional tone] cinematic orchestral score, [instruments], no vocals, [tempo / arc], film score quality

Example: Melancholic and hopeful cinematic score, solo piano with strings gradually joining, swelling to full orchestra, bittersweet emotional arc, no vocals, slow build from intimate to epic, film score quality

Example: Tense action thriller score, fast pulsing strings, brass stabs, heavy percussion, no vocals, building intensity, Hans Zimmer style

Ambient / Mood Music

ambient [adjective], [setting or feeling], [textures / instruments], no vocals, [energy level]

Example: Ambient rainy evening, melancholic but cozy, piano notes blending with rain sounds and distant thunder, soft reverb tails, no drums, no vocals, slow and contemplative

Example: Dark ambient, tense and ominous, low drone textures, distant metallic sounds, sparse low piano notes, building unease, no vocals, thriller soundtrack feel

Social Media / Short-Form Video Music

[Trending genre], [energy], [signature elements], [vocals or no vocals], suitable for short-form video content

Example: Phonk trap, dark and intense, heavy 808 bass, aggressive hi-hat rolls, distorted Memphis vocal chops, building drop, suitable for montage and car content

Example: City pop, nostalgic and dreamy, 1980s Japanese pop production, guitar riff, bright synthesizers, smooth female vocals, retro aesthetic, suitable for aesthetic video content

Tips for better AI music

🎯 "Rock" is too broad: be specific with genre

"Rock" covers everything from The Beatles to Metallica to Radiohead. Try "grunge rock with heavy distortion and angst," "jangly 60s British Invasion guitar pop," or "radio-friendly 2000s pop-rock with anthemic chorus." The more specific, the more focused.

💭 Describe the feeling, not just the genre

"The feeling of driving alone at 2am on an empty highway" or "background music for a montage of old home videos" often produces better results than pure genre labels. The AI has learned music in context, not just in isolation.

📅 Era references carry a lot of information

"1970s soul" implies analog warmth, horn arrangements, groove-based bass, and a specific drum sound. "80s synth pop" implies gated reverb, DX7 electric piano, and bright production. Decade references communicate more than you might expect.

🎤 Always specify vocals or no vocals

Suno and Udio add vocals by default. If you want instrumental music, explicitly say "no vocals, instrumental only." If you want vocals, describe the voice: "breathy female singer," "deep soul male vocals," "harmonies," "rap verse with melodic hook."
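
If you keep a library of style prompts, a quick check can catch the ones that never state a vocal choice. The sketch below is a simple keyword test, not anything the tools provide: `with_vocal_choice` and the word list are assumptions, and defaulting to instrumental is just one possible policy.

```python
import re

# Whole-word matches only, so "trap" does not count as "rap".
VOCAL_WORDS = re.compile(
    r"\b(vocals?|singers?|voices?|harmonies|choir|rap|instrumental)\b",
    re.IGNORECASE,
)


def with_vocal_choice(style_prompt):
    """Append an explicit instrumental instruction if vocals are never mentioned."""
    if VOCAL_WORDS.search(style_prompt):
        return style_prompt  # the prompt already states a choice
    return style_prompt + ", no vocals, instrumental only"


print(with_vocal_choice("lo-fi hip hop, relaxed and focused, vinyl crackle"))
print(with_vocal_choice("indie folk song, warm female vocals with harmonies"))
```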

⚖️ A note on copyright and commercial use

Most AI music tools' free tiers restrict commercial use — check each tool's terms before using in paid projects. Paid tiers typically include the commercial rights you need. Genre descriptions ("80s funk style") are safer and usually produce better results than "in the style of [specific artist]."

💡 Quick tool picker

Complete songs with lyrics → Suno  ·  Best audio quality → Udio  ·  Free instrumental loops → MusicFX  ·  Long background track (up to 3 min) → Stable Audio  ·  Publish to Spotify → Boomy