GPT-4o Image API
Unified multimodal endpoint for image generation, editing, and understanding with consistent schemas across workflows. · Updated 2025-03-18
Overview
GPT-4o Image combines perception and generation. Use one API to create new images, edit existing assets, or extract structured metadata from uploads. All three workflows share the same authentication and response schema.
Generation
curl -X POST "https://api.transendai.net/v1/images/gpt4o/generation" \
-H "Authorization: Bearer $TRANSEND_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "High-end sneaker on a marble pedestal with volumetric lighting",
"size": "1024x768",
"guidance": 6.5
}'
| Field | Description |
|---|---|
| `prompt` | Natural language description. |
| `size` | Width × height (max 2048 in either dimension). |
| `guidance` | 0–10 float controlling adherence to the prompt. |
| `reference_images` | Optional array of URLs to guide style/composition. |
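The limits in the field table can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: the helper names `build_generation_payload` and `build_generation_request` are hypothetical, and the `WIDTHxHEIGHT` string format for `size` is inferred from the curl example rather than documented.

```python
import json
import urllib.request

API_URL = "https://api.transendai.net/v1/images/gpt4o/generation"
MAX_DIMENSION = 2048  # from the field table: max 2048 in either dimension


def build_generation_payload(prompt, size="1024x768", guidance=None, reference_images=None):
    """Validate fields against the documented limits and return a JSON-ready dict."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    # Assumption: size is a "WIDTHxHEIGHT" string, as in the curl example.
    width, height = (int(part) for part in size.lower().split("x"))
    if not (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION):
        raise ValueError(f"each dimension of size must be between 1 and {MAX_DIMENSION}")
    payload = {"prompt": prompt, "size": f"{width}x{height}"}
    if guidance is not None:
        if not 0 <= guidance <= 10:
            raise ValueError("guidance must be between 0 and 10")
        payload["guidance"] = float(guidance)
    if reference_images:
        payload["reference_images"] = list(reference_images)
    return payload


def build_generation_request(payload, api_key):
    """Return an unsent urllib Request that mirrors the curl example's headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Keeping construction separate from sending lets the payload be inspected or logged first.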
Editing
curl -X POST "https://api.transendai.net/v1/images/gpt4o/edit" \
-H "Authorization: Bearer $TRANSEND_API_KEY" \
-F "[email protected]" \
-F "[email protected]" \
-F 'payload={
"prompt": "Swap the background to black marble and add cyan accent lighting.",
"size": "1024x1024"
}'
Masks are optional; if omitted, GPT-4o automatically infers editable regions.
Understanding
curl -X POST "https://api.transendai.net/v1/images/gpt4o/analyze" \
-H "Authorization: Bearer $TRANSEND_API_KEY" \
-F "[email protected]" \
-F 'payload={"tasks":["caption","objects","text"]}'
Response excerpt:
{
  "analysis": {
    "caption": "Coffee shop receipt totaling $18.50",
    "objects": [
      { "label": "receipt", "confidence": 0.99 },
      { "label": "latte", "confidence": 0.81 }
    ],
    "text": [
      { "content": "Total $18.50", "bounding_box": [42, 128, 310, 156] }
    ]
  }
}
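A response of this shape can be reduced to the fields an application usually needs. The sketch below assumes the full response follows the excerpt's structure. The function name `summarize_analysis` and the `min_confidence` threshold are illustrative, not part of the API. The coordinate convention of `bounding_box` is not stated here, so the sketch leaves it alone.

```python
def summarize_analysis(response, min_confidence=0.9):
    """Keep the caption, high-confidence object labels, and detected text."""
    analysis = response.get("analysis", {})
    labels = [
        obj["label"]
        for obj in analysis.get("objects", [])
        if obj.get("confidence", 0.0) >= min_confidence
    ]
    # Join all detected text fragments in the order the API returned them.
    text = " ".join(item["content"] for item in analysis.get("text", []))
    return {"caption": analysis.get("caption"), "labels": labels, "text": text}
```

Applied to the excerpt with the default threshold, this keeps `receipt` (0.99) and drops `latte` (0.81).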
Streaming
Set `stream: true` to receive partial outputs as server-sent events (SSE). Each event includes step metadata such as `denoise`, `upscale`, and `final`.
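A streamed response can be consumed with a small parser for the standard SSE wire format (`event:` and `data:` lines, with a blank line ending each event). The sketch below is hedged on one point: the event payload schema is not shown on this page, so the JSON body with a `step` key used in `collect_steps` is an assumption made for illustration.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) tuples from an iterable of decoded SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event, if it carried data.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)


def collect_steps(lines):
    """Return the step name of each event (assumes JSON data with a 'step' key)."""
    return [json.loads(data).get("step") for _, data in iter_sse_events(lines)]
```

In practice `lines` would come from iterating over the decoded HTTP response body line by line. Because the parser is a generator, each step can be handled as soon as it arrives.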
Error Reference
| Code | Meaning | Fix |
|---|---|---|
| 400 | Unsupported task combination. | Send generation, edit, and analyze operations as separate requests. |
| 413 | Upload too large. | Compress assets to 25 MB or smaller, or provide signed URLs. |
| 422 | Mask mismatch. | Ensure mask dimensions match the source image. |
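Two of these errors can often be caught before uploading. The sketch below rests on several assumptions: both files are PNGs (dimensions are read from the PNG `IHDR` header), "25 MB" means 25 × 1024 × 1024 bytes, and the limit applies per file. The helper names are hypothetical.

```python
import struct

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: 25 MB interpreted as 25 MiB per file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at byte offsets 16 and 20.
    return struct.unpack(">II", data[16:24])


def preflight_edit_upload(image_bytes, mask_bytes=None):
    """Return a list of problems likely to cause a 413 or 422; empty means none found."""
    problems = []
    for name, blob in (("image", image_bytes), ("mask", mask_bytes)):
        if blob is not None and len(blob) > MAX_UPLOAD_BYTES:
            problems.append(f"{name} is larger than 25 MB (likely 413)")
    if mask_bytes is not None and png_dimensions(image_bytes) != png_dimensions(mask_bytes):
        problems.append("mask dimensions differ from the source image (likely 422)")
    return problems
```

An empty list only means these two checks passed. The server remains the authority on what it accepts.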
Tips
- Combine perception with generation by running `analyze` first, then piping results into a follow-up generation request.
- Use `response_format: { "type": "json_schema" }` to enforce structured metadata when generating descriptions or labels.
- Monitor GPU-intensive operations via the observability dashboards.
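The first tip, running analysis and feeding the result into generation, could be sketched as below. It assumes the analysis response has the shape shown in the Understanding excerpt. The function name `prompt_from_analysis` and the prompt wording are illustrative choices, not prescribed by the API.

```python
def prompt_from_analysis(analysis_response, instruction):
    """Build a generation payload whose prompt is grounded in a prior analysis."""
    analysis = analysis_response.get("analysis", {})
    parts = [instruction.strip()]
    caption = analysis.get("caption")
    if caption:
        parts.append(f"Source image: {caption}.")
    labels = [obj["label"] for obj in analysis.get("objects", [])]
    if labels:
        parts.append("Objects present: " + ", ".join(labels) + ".")
    # Only "prompt" is set here; size and guidance can be added by the caller.
    return {"prompt": " ".join(parts)}
```

The returned dictionary can be extended with `size` or `guidance` and posted to the generation endpoint like any other payload.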