GPT-4o Image API

Unified multimodal endpoint for image generation, editing, and understanding with consistent schemas across workflows. · Updated 2025-03-18

Overview

GPT-4o Image combines perception and generation. Use one API to create new images, edit existing assets, or extract structured metadata from uploads—all while sharing the same authentication and response schema.

Generation

curl -X POST "https://api.transendai.net/v1/images/gpt4o/generation" \
  -H "Authorization: Bearer $TRANSEND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "High-end sneaker on a marble pedestal with volumetric lighting",
    "size": "1024x768",
    "guidance": 6.5
  }'

Field             Description
prompt            Natural language description.
size              Width × height (max 2048 in either dimension).
guidance          0–10 float controlling adherence to the prompt.
reference_images  Optional array of URLs to guide style/composition.
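
The field constraints above can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: the helper name build_generation_payload is hypothetical, the WIDTHxHEIGHT size format is inferred from the curl example, and treating guidance as optional is an assumption.

```python
import json
import re

MAX_DIMENSION = 2048  # documented limit for width and height


def build_generation_payload(prompt, size="1024x768", guidance=None, reference_images=None):
    """Validate the documented fields and return a JSON-serializable dict.

    Hypothetical helper: only the field names and limits come from the
    field table; everything else is an illustrative assumption.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    # Size format "WIDTHxHEIGHT" is inferred from the curl example.
    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if match is None:
        raise ValueError("size must look like '1024x768'")
    width, height = int(match.group(1)), int(match.group(2))
    if not (1 <= width <= MAX_DIMENSION and 1 <= height <= MAX_DIMENSION):
        raise ValueError(f"each dimension must be between 1 and {MAX_DIMENSION}")

    payload = {"prompt": prompt, "size": size}
    if guidance is not None:
        if not 0 <= guidance <= 10:
            raise ValueError("guidance must be between 0 and 10")
        payload["guidance"] = float(guidance)
    if reference_images:
        payload["reference_images"] = list(reference_images)
    return payload


body = build_generation_payload(
    "High-end sneaker on a marble pedestal with volumetric lighting",
    guidance=6.5,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d body of the generation request; rejecting out-of-range values locally avoids a round trip for requests that would fail validation anyway.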

Editing

curl -X POST "https://api.transendai.net/v1/images/gpt4o/edit" \
  -H "Authorization: Bearer $TRANSEND_API_KEY" \
  -F "[email protected]" \
  -F "[email protected]" \
  -F 'payload={
    "prompt": "Swap the background to black marble and add cyan accent lighting.",
    "size": "1024x1024"
  }'

Masks are optional; if omitted, GPT-4o automatically infers editable regions.

Understanding

curl -X POST "https://api.transendai.net/v1/images/gpt4o/analyze" \
  -H "Authorization: Bearer $TRANSEND_API_KEY" \
  -F "[email protected]" \
  -F 'payload={"tasks":["caption","objects","text"]}'

Response excerpt:

{
  "analysis": {
    "caption": "Coffee shop receipt totaling $18.50",
    "objects": [
      { "label": "receipt", "confidence": 0.99 },
      { "label": "latte", "confidence": 0.81 }
    ],
    "text": [
      { "content": "Total $18.50", "bounding_box": [42, 128, 310, 156] }
    ]
  }
}
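
As an illustration of consuming this response, the sketch below parses the excerpt and pulls out high-confidence labels and text regions. The helper names and the 0.9 threshold are hypothetical choices; the bounding_box values are passed through unchanged because the coordinate order is not stated here.

```python
import json

# The response excerpt shown above, embedded verbatim for the example.
RESPONSE_EXCERPT = """
{
  "analysis": {
    "caption": "Coffee shop receipt totaling $18.50",
    "objects": [
      { "label": "receipt", "confidence": 0.99 },
      { "label": "latte", "confidence": 0.81 }
    ],
    "text": [
      { "content": "Total $18.50", "bounding_box": [42, 128, 310, 156] }
    ]
  }
}
"""


def confident_labels(analysis, threshold=0.9):
    """Return object labels whose confidence meets the threshold."""
    return [o["label"] for o in analysis.get("objects", []) if o["confidence"] >= threshold]


def text_regions(analysis):
    """Return (content, bounding_box) pairs for each detected text block."""
    return [(t["content"], tuple(t["bounding_box"])) for t in analysis.get("text", [])]


analysis = json.loads(RESPONSE_EXCERPT)["analysis"]
print(analysis["caption"])
print(confident_labels(analysis))
print(text_regions(analysis))
```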

Streaming

Set stream: true to receive partial outputs as server-sent events (SSE). Each event includes step metadata such as denoise, upscale, and final.
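
A client has to split the stream into events before it can react to each step. The sketch below follows standard SSE framing (data: lines accumulate, a blank line ends the event). The payload shape is an assumption: the wire format is not documented here, so the example supposes each event carries a JSON object with a "step" key.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines.

    Standard SSE framing: "data:" lines accumulate and a blank line
    dispatches the event. Comment lines and other fields are ignored.
    An event not terminated by a blank line is discarded.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)


# Hypothetical stream: the {"step": ...} payload shape is assumed.
SAMPLE_STREAM = [
    'data: {"step": "denoise"}\n',
    "\n",
    'data: {"step": "upscale"}\n',
    "\n",
    'data: {"step": "final"}\n',
    "\n",
]

for event in iter_sse_events(SAMPLE_STREAM):
    print(event["step"])
```

In practice the lines would come from a streaming HTTP response decoded as UTF-8 rather than from a list.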

Error Reference

Code  Meaning                        Fix
400   Unsupported task combination.  Send generation, edit, and analyze requests separately.
413   Upload too large.              Compress assets to 25 MB or provide signed URLs.
422   Mask mismatch.                 Ensure mask dimensions match the source image.

Tips

  • Combine perception with generation by running analyze first, then piping results into a follow-up generation request.
  • Use response_format: { "type": "json_schema" } to enforce structured metadata when generating descriptions or labels.
  • Monitor GPU-intensive operations via the observability dashboards.
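
The first tip, analyze then generate, might look like the following sketch. It only shows how an analysis result could be folded into a generation prompt; prompt_from_analysis, the "Featuring:" phrasing, and the 0.8 confidence cutoff are hypothetical, and the sample analysis mirrors the response excerpt from the Understanding section.

```python
def prompt_from_analysis(analysis, instruction, min_confidence=0.8):
    """Compose a generation prompt from an analyze result plus an instruction."""
    caption = analysis.get("caption", "").strip()
    if caption and caption[-1] not in ".!?":
        caption += "."

    # Keep only labels the analyzer is reasonably sure about.
    labels = [
        o["label"]
        for o in analysis.get("objects", [])
        if o.get("confidence", 0) >= min_confidence
    ]

    parts = [caption]
    if labels:
        parts.append("Featuring: " + ", ".join(labels) + ".")
    parts.append(instruction.strip())
    return " ".join(p for p in parts if p)


# Sample analysis result, matching the documented response excerpt.
analysis = {
    "caption": "Coffee shop receipt totaling $18.50",
    "objects": [
        {"label": "receipt", "confidence": 0.99},
        {"label": "latte", "confidence": 0.81},
    ],
}

generation_payload = {
    "prompt": prompt_from_analysis(analysis, "Render as a flat vector illustration."),
    "size": "1024x1024",
}
print(generation_payload["prompt"])
```

The resulting dict uses the same prompt and size fields as the generation request, so it can be serialized and sent as that request's body.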

Related Resources