
Integrate the API with your language

This quick guide shows you how to integrate the API into your app using a variety of popular programming languages. Choose a language you're comfortable with, copy the sample code, and start connecting to the API right away.

1. Webhook

After you call our API, we send the resulting data to your webhook URL. See more webhook information here. The code below shows how to receive and validate the data we send to your webhook:
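As a minimal sketch of a webhook receiver, the handler below verifies an HMAC-SHA256 signature over the raw request body and then dispatches on the `event_name` field shown in the sample payloads. The signature header and shared secret are assumptions for illustration; check the webhook documentation for the actual validation mechanism your account uses.

```python
import hashlib
import hmac
import json

def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    """Compare an HMAC-SHA256 hex digest of the raw body against the signature header (assumed scheme)."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def handle_webhook(raw_body: bytes, signature: str, secret: str) -> dict:
    """Validate the signature, parse the JSON body, and return the 'data' object on success events."""
    if not verify_signature(raw_body, signature, secret):
        raise ValueError("invalid webhook signature")
    event = json.loads(raw_body)
    # The sample payloads end success events with _COMPLETED or _SUCCESS.
    if event["event_name"].endswith(("_COMPLETED", "_SUCCESS")):
        return event["data"]
    raise ValueError(f"unhandled event: {event['event_name']}")
```

Wire `handle_webhook` into whatever web framework you use, passing it the raw (unparsed) request body so the digest matches what was signed.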

2. AI Image Generator

The code below includes only the required parameters. There are many other fine-tuning parameters; you can see more here.
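A minimal request sketch in Python: the field names (`model_name`, `input_text`) mirror the webhook payload shown below, but the endpoint path, base URL, and auth header are assumptions for illustration; substitute the values from your API reference.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # hypothetical base URL; use the one from your API reference

def build_image_request(prompt: str, model: str = "imagen-flash") -> dict:
    """Assemble the minimal request body; field names mirror the sample webhook payload."""
    return {"model_name": model, "input_text": prompt}

def generate_image(prompt: str, api_key: str) -> dict:
    """POST the request; the path and Bearer auth header are assumptions."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/image/generation",  # hypothetical path
        data=json.dumps(build_image_request(prompt)).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```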

The returned response data will be in JSON format:


{
  "base64_images": "data:image/png;base64,iVBORw0KGgoAA..."
}
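The `base64_images` value is a `data:` URI, so you need to strip the `data:image/png;base64,` prefix before decoding. A small helper, assuming the response shape above:

```python
import base64

def save_data_uri(data_uri: str, path: str) -> int:
    """Strip the 'data:<mime>;base64,' prefix, decode, write the bytes, and return the size written."""
    _, _, payload = data_uri.partition("base64,")
    raw = base64.b64decode(payload)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)
```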
      

The request sent to your webhook will be in the following JSON form:


{
  "event_name": "IMAGE_GENERATION_COMPLETED",
  "event_uuid": "0cea9414-8566-11f0-aeae-ce29e8fcb7da",
  "data": {
    "uuid": "00710a10-8566-11f0-aeae-ce29e8fcb7da",
    "model_name": "imagen-flash",
    "input_text": "Create a dog eat pizza",
    "used_credit": 0,
    "status": 2,
    "status_percentage": 100,
    "error_message": "",
    "media_url": "https://fa1030eacc97e2f2ca187ef328dddf17.r2.cloudflarestorage.com/geminigen-dev-upload-bucket/47/generated_result/image/00710a10-8566-11f0-aeae-ce29e8fcb7da/gen/00710a10-8566-11f0-aeae-ce29e8fcb7da_0.png?response-content-type=application%2Foctet-stream&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=9e4aaa0a83527e9fde114e51284a68ed%2F20250830%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250830T055618Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=56ba151acf556f7018bd236605c9704f471db2de47e9a0b96e183d54204af98c",
    "created_at": "2025-08-30T05:55:58",
    "updated_at": "2025-08-30T05:56:17"
  }
}
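Note that `media_url` is a presigned storage link with a limited lifetime (`X-Amz-Expires=604800`, i.e. 7 days in the sample), so download the file promptly. As a sketch, you can compute the expiry time from the standard SigV4 query parameters:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

def presigned_expiry(url: str) -> datetime:
    """Derive the expiry time of an S3/R2 presigned URL from X-Amz-Date + X-Amz-Expires."""
    qs = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(qs["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(qs["X-Amz-Expires"][0]))
```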
      

3. AI Video Generation

The code below includes only the required parameters. There are many other fine-tuning parameters; you can see more here.
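A minimal sketch of the request body and a completion check. The field names mirror the sample payloads below; the status semantics (1 while processing, 2 on completion) are inferred from those samples, not from a documented enum, so treat them as assumptions.

```python
def build_video_request(prompt: str, model: str = "veo-2") -> dict:
    """Only the required fields; names mirror the sample response below."""
    return {"model_name": model, "input_text": prompt}

# Inferred from the sample payloads: status 1 while queued/processing, 2 on completion.
def is_finished(job: dict) -> bool:
    """A job looks finished when status reaches 2 and progress hits 100%."""
    return job.get("status") == 2 and job.get("status_percentage") == 100
```

Rather than polling, prefer waiting for the `VIDEO_GENERATION_COMPLETED` webhook event shown below.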

The returned response data will be in JSON format:


{
  "id": 52726,
  "uuid": "f9918af0-84a0-11f0-9ff3-5ed40afdc636",
  "user_id": 31,
  "model_name": "veo-2",
  "input_text": "Cat is running",
  "type": "video",
  "status": 1,
  "status_desc": "",
  "status_percentage": 1,
  "error_code": "",
  "error_message": "",
  "custom_prompt": "",
  "file_size": 0,
  "file_password": "",
  "expired_at": null,
  "name": null,
  "emotion": null,
  "estimated_credit": 60000,
  "media_type": "video",
  "created_at": "2025-08-29T06:25:36",
  "updated_at": null
}
      

The request sent to your webhook will be in the following JSON form:


{
  "event_name": "VIDEO_GENERATION_COMPLETED",
  "event_uuid": "9e44cd04-84a2-11f0-b801-5ed40afdc636",
  "data": {
    "uuid": "765efd78-84a2-11f0-b801-5ed40afdc636",
    "model_name": "veo-2",
    "input_text": "Dog is running",
    "used_credit": 60000,
    "status": 2,
    "status_percentage": 100,
    "error_message": "",
    "media_url": "https://87c129bea46e5e69d2d92f9b9ef83ca8.r2.cloudflarestorage.com/geminigen-prd-upload-bucket/31/generated_result/video/765efd78-84a2-11f0-b801-5ed40afdc636/765efd78-84a2-11f0-b801-5ed40afdc636_0.mp4?response-content-type=application%2Foctet-stream&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=852af3549a421613746f7c16fc8699f5%2F20250829%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250829T063721Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=6c98a4ab32eed528649161caa2557d70350b2c5671ca2f04fc058acc4e398c8b",
    "thumbnail_url": "https://cdn.geminigen.ai/thumbnails/31/765efd78-84a2-11f0-b801-5ed40afdc636/765efd78-84a2-11f0-b801-5ed40afdc636_0_200px.jpg",
    "created_at": "2025-08-29T06:36:15",
    "updated_at": "2025-08-29T06:37:19"
  }
}
        

4. Text To Speech

The code below includes only the required parameters. There are many other fine-tuning parameters; you can see more here.
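A minimal request-body sketch: `model_name`, `input_text`, and `speed` all appear in the webhook payload below, but whether they are exactly the request parameter names is an assumption; confirm against the full parameter reference.

```python
def build_tts_request(text: str, model: str = "tts-flash", speed: float = 1.0) -> dict:
    """Required fields only; 'speed' and 'voices' also appear in the webhook payload below."""
    return {"model_name": model, "input_text": text, "speed": speed}
```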

The returned response data will be in JSON format:


{
    "success": true,
    "result": {
        "id": 52729,
        "uuid": "317f693e-84a4-11f0-b19e-5ed40afdc636",
        "user_id": 31,
        "model_name": "tts-flash",
        "input_text": "Hello world, I am Long",
        "generate_result": null,
        "input_file_path": null,
        "type": "tts-text",
        "used_credit": 0,
        "status": 1,
        "status_desc": "",
        "status_percentage": 50,
        "error_code": "",
        "error_message": "",
        "rating": "",
        "rating_content": "",
        "custom_prompt": null,
        "created_at": "2025-08-29T06:48:38",
        "updated_at": null,
        "file_size": 0,
        "file_password": "",
        "expired_at": null,
        "inference_type": "gemini_voice",
        "name": "Hello world, I am Long",
        "created_by": "API",
        "is_premium_credit": true,
        "emotion": null,
        "note": "logged-in user: 31, plan_id PP0001",
        "estimated_credit": 44,
        "ai_credit": 0,
        "media_type": "audio",
        "service_mode": null
    }
}
      

The request sent to your webhook will be in the following JSON form:


{
  "event_name": "TTS_TEXT_SUCCESS",
  "event_uuid": "4133e0a8-84a4-11f0-8e02-5ed40afdc636",
  "data": {
    "uuid": "317f693e-84a4-11f0-b19e-5ed40afdc636",
    "voices": [],
    "speed": 1,
    "model_name": "tts-flash",
    "input_text": "Hello world, I am Long",
    "estimated_credit": 44,
    "used_credit": 44,
    "status": 2,
    "status_percentage": 100,
    "error_message": "",
    "created_at": "2025-08-29T06:48:38",
    "updated_at": "2025-08-29T06:49:01",
    "media_url": "https://87c129bea46e5e69d2d92f9b9ef83ca8.r2.cloudflarestorage.com/geminigen-prd-upload-bucket/31/generated_result/audio/20250829_064844_598020/Hello_world__I_am_Long.mp3?response-content-type=application%2Foctet-stream&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=852af3549a421613746f7c16fc8699f5%2F20250829%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250829T064903Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=a4d25fe0f3507dbab8a398203ac8b4351c698d5342c02c4fa5eb07338f346ba2"
  }
}
        

5. Document To Speech

The code below includes only the required parameters. There are many other fine-tuning parameters; you can see more here.
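Document-to-speech takes a file upload rather than inline text (the sample `input_text` is the filename, `yeucau.txt`). A sketch of preparing the upload metadata; the form-field names here are assumptions for illustration, and the file itself would be sent as multipart form data:

```python
import mimetypes
from pathlib import Path

def prepare_document(path: str) -> dict:
    """Collect the upload metadata for a multipart request; field names are assumptions."""
    p = Path(path)
    return {
        "filename": p.name,
        "content_type": mimetypes.guess_type(p.name)[0] or "application/octet-stream",
        "model_name": "tts-flash",
    }
```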

The returned response data will be in JSON format:


{
  "success": true,
  "result": {
    "id": 52730,
    "uuid": "03b0f152-84a5-11f0-b744-5ed40afdc636",
    "user_id": 31,
    "model_name": "tts-flash",
    "input_text": "yeucau.txt",
    "generate_result": null,
    "input_file_path": null,
    "type": "tts-document",
    "used_credit": 0,
    "status": 1,
    "status_desc": "",
    "status_percentage": 1,
    "error_code": "",
    "error_message": "",
    "rating": "",
    "rating_content": "",
    "custom_prompt": null,
    "created_at": "2025-08-29T06:54:31",
    "updated_at": null,
    "file_size": 599,
    "file_password": null,
    "expired_at": null,
    "inference_type": "gemini_voice",
    "name": "yeucau.txt",
    "created_by": "API",
    "is_premium_credit": true,
    "emotion": null,
    "note": null,
    "estimated_credit": 0,
    "ai_credit": 0,
    "media_type": "audio",
    "service_mode": null
  }
}
      

The request sent to your webhook will be in the following JSON form:


{
  "event_name": "TTS_DOCUMENT_SUCCESS",
  "event_uuid": "926d91e2-85b9-11f0-a452-5a8eb76a5ed6",
  "data": {
    "uuid": "58e60b8e-85b9-11f0-b19e-5ed40afdc636",
    "voices": [],
    "speed": 1,
    "model_name": "tts-flash",
    "input_text": "yeucau.txt",
    "estimated_credit": 1026,
    "used_credit": 1022,
    "status": 2,
    "status_percentage": 100,
    "error_message": "",
    "created_at": "2025-08-30T15:52:35",
    "updated_at": "2025-08-30T15:54:07",
    "media_url": "https://87c129bea46e5e69d2d92f9b9ef83ca8.r2.cloudflarestorage.com/geminigen-prd-upload-bucket/31/generated_result/audio/20250830_155400_704043/yeucau.mp3?response-content-type=application%2Foctet-stream&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=852af3549a421613746f7c16fc8699f5%2F20250830%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250830T155412Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=d2cf604e6004d6a9f11788cf7f7c57e1ba49a2d289a655553b11100c525e13e9",
    "file_size": 599
  }
}
        

6. Dialogue Gen

The code below includes only the required parameters. There are many other fine-tuning parameters; you can see more here.
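The sample `input_text` below uses a `Voice N: ...` script format to mark speaker turns. A small helper that assembles that format from (speaker, line) pairs; the newline separator between turns is an assumption, since the sample shows only a single turn:

```python
def format_dialogue(turns: list[tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into the 'Voice N: ...' script the sample payload shows."""
    return "\n".join(f"Voice {n}: {line}" for n, line in turns)
```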

The returned response data will be in JSON format:


{
    "success": true,
    "result": {
        "id": 1659,
        "uuid": "faaf4ea4-858b-11f0-b20a-ce29e8fcb7da",
        "user_id": 47,
        "model_name": "tts-flash",
        "input_text": "Voice 1: Hello world, I am Long. test AI Dialogue Generator",
        "generate_result": null,
        "input_file_path": null,
        "type": "tts-multi-speaker",
        "used_credit": 0,
        "status": 1,
        "status_desc": "",
        "status_percentage": 50,
        "error_code": "",
        "error_message": "",
        "rating": "",
        "rating_content": "",
        "custom_prompt": null,
        "created_at": "2025-08-30T10:27:50",
        "updated_at": null,
        "file_size": 0,
        "file_password": "",
        "expired_at": null,
        "inference_type": "gemini_voice",
        "name": "Voice 1: Hello world, I am Long. test AI Dialogue ",
        "created_by": "API",
        "is_premium_credit": true,
        "emotion": null,
        "note": "logged-in user: 47, plan_id PP0001",
        "estimated_credit": 118,
        "ai_credit": 0,
        "media_type": "audio",
        "service_mode": null
    }
}
      

The request sent to your webhook will be in the following JSON form:


{
  "event_name": "TTS_TEXT_SUCCESS",
  "event_uuid": "9f2a12f6-864b-11f0-ab99-667310249fb6",
  "data": {
    "uuid": "9016a7de-864b-11f0-b30a-667310249fb6",
    "voices": [],
    "speed": 1,
    "model_name": "tts-flash",
    "input_text": "Voice 1: Hello world, I am Long",
    "estimated_credit": 62,
    "used_credit": 62,
    "status": 2,
    "status_percentage": 100,
    "error_message": "",
    "created_at": "2025-08-31T09:19:14",
    "updated_at": "2025-08-31T09:19:36",
    "media_url": "https://87c129bea46e5e69d2d92f9b9ef83ca8.r2.cloudflarestorage.com/geminigen-prd-upload-bucket/31/generated_result/audio/20250831_091919_007294/Voice_1__Hello_world__I_am_Long.mp3?response-content-type=application%2Foctet-stream&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=852af3549a421613746f7c16fc8699f5%2F20250831%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250831T091938Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=e8f610edeb59a3792f65c3dad5d01d6efc38d998b610aaf14d4208f11debef62"
  }
}
        

7. Text Gen

The code below includes only the required parameters. There are many other fine-tuning parameters; you can see more here.
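For text generation, the result arrives in the webhook as `response_text` rather than a media URL. A sketch of pulling the generated text out of the event payload shown below:

```python
def extract_text(event: dict) -> str:
    """Return the generated text from a TEXT_GENERATION_COMPLETED webhook event."""
    if event.get("event_name") != "TEXT_GENERATION_COMPLETED":
        raise ValueError(f"unexpected event: {event.get('event_name')}")
    return event["data"]["response_text"]
```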

The returned response data will be in JSON format:


{
    "negative_prompt": null,
    "status_percentage": 1,
    "file_password": "",
    "created_by": "API",
    "service_mode": null,
    "input_file_path": null,
    "rating": "",
    "error_code": "",
    "is_premium_credit": 1,
    "created_at": "2025-09-12T11:26:58",
    "uuid": "64c80512-8fcb-11f0-89ba-02083500aad5",
    "generate_result": null,
    "rating_content": "",
    "error_message": "",
    "is_emotion_failed": 0,
    "updated_at": null,
    "id": 1711,
    "type": "text",
    "generate_job_id": null,
    "expired_at": null,
    "emotion": null,
    "deleted_at": null,
    "user_id": 47,
    "estimated_credit": 0,
    "custom_prompt": "",
    "provider": "google",
    "media_type": "text",
    "model": "gemini-2.5-pro",
    "used_credit": 0,
    "note": null,
    "inference_type": null,
    "thumbnail_url": null,
    "model_name": "gemini-2.5-pro",
    "status": 1,
    "file_size": 0,
    "name": null,
    "template_id": null,
    "input_text": "Introducing the most powerful AI models today",
    "status_desc": "",
    "ai_credit": 0,
    "key_provider": null
}
      

The request sent to your webhook will be in the following JSON form:


{
  "event_name": "TEXT_GENERATION_COMPLETED",
  "event_uuid": "7b94f2f0-8fcb-11f0-89ba-02083500aad5",
  "data": {
    "uuid": "64c80512-8fcb-11f0-89ba-02083500aad5",
    "model_name": "gemini-2.5-pro",
    "input_text": "Introducing the most powerful AI models today",
    "used_credit": 410,
    "status": 2,
    "status_percentage": 100,
    "error_message": "",
    "response_text": "Of course. The landscape of AI is moving at a breathtaking pace, with new \"most powerful\" models being announced every few months. As of mid-2024, the field is dominated by a few key players who are pushing the boundaries of what's possible.\n\nHere's an introduction to the most powerful and influential AI models today, broken down by category.\n\n---\n\n### The Titans: The Flagship General-Purpose Models\n\nThese are the all-encompassing, multimodal models from the major AI labs that compete for the title of \"the best.\" They excel at a wide range of tasks, including reasoning, conversation, and content creation across text, images, and audio.\n\n#### 1. **OpenAI's GPT-4o (\"o\" for Omni)**\n*   **What Makes it Powerful:** GPT-4o is OpenAI's latest flagship model and represents a massive leap in usability and multimodality. Its key innovation is being a single, natively multimodal model. Instead of separate models for text, vision, and audio, GPT-4o processes everything seamlessly. This allows for incredibly fast, real-time voice and vision conversations that feel natural and human-like.\n*   **Key Strengths:**\n    *   **Speed and Efficiency:** It's significantly faster and cheaper to run than its predecessor, GPT-4 Turbo.\n    *   **Real-Time Multimodality:** Can understand and respond to a combination of text, audio, and images in real-time. You can talk to it, show it things via your camera, and it responds instantly.\n    *   **Emotional Nuance in Voice:** The voice assistant can generate speech with different emotional tones (laughing, singing, etc.), making interaction far more engaging.\n*   **Best For:** Real-time problem solving, natural voice assistance, and interactive creative collaboration.\n\n#### 2. **Google's Gemini 1.5 Pro**\n*   **What Makes it Powerful:** Google's champion is built on a \"Mixture-of-Experts\" (MoE) architecture, making it highly efficient. Its standout feature is an absolutely massive context window. 
While most models handle thousands of tokens (words/pieces of words), Gemini 1.5 Pro can process **1 million tokens**—equivalent to an entire movie, multiple long books, or a large codebase.\n*   **Key Strengths:**\n    *   **Massive Context Window:** Can analyze and reason over vast amounts of information provided in a single prompt. You can \"drop in\" a 500-page PDF and ask detailed questions about it.\n    *   **Advanced Multimodal Reasoning:** Excels at analyzing video content frame-by-frame, finding specific moments, and answering complex questions about what's happening.\n    *   **High Performance at Scale:** Maintains high accuracy even when processing enormous amounts of data.\n*   **Best For:** Deep analysis of large documents, video content analysis, and complex code review.\n\n#### 3. **Anthropic's Claude 3 Opus**\n*   **What Makes it Powerful:** Claude 3 Opus is renowned for its exceptional performance in complex reasoning, analytical tasks, and generating sophisticated, high-quality written content. It was one of the first models to consistently outperform GPT-4 on several key industry benchmarks upon its release. 
Anthropic also places a strong emphasis on AI safety and ethics.\n*   **Key Strengths:**\n    *   **Superior Reasoning and Analysis:** Often considered the \"thinking\" model, it excels at tasks requiring deep understanding, like financial analysis, interpreting scientific papers, and creative writing.\n    *   **Reduced \"Refusals\":** It's better at understanding context and is less likely to refuse to answer prompts that are safe but might border on a sensitive topic.\n    *   **Strong Vision Capabilities:** Can analyze charts, graphs, and images with high accuracy.\n*   **Best For:** Professional writing, business analysis, academic research, and tasks requiring nuanced understanding.\n\n---\n\n### The Open-Source Champions\n\nThese models are \"free\" to be downloaded, modified, and used by developers and companies, fostering a massive wave of innovation outside the big tech labs.\n\n#### 1. **Meta's Llama 3**\n*   **What Makes it Powerful:** Llama 3 is the current king of open-source models. Meta trained it on a colossal, high-quality dataset, resulting in state-of-the-art performance that competes with—and sometimes surpasses—proprietary models like GPT-3.5 and even early versions of GPT-4. It comes in several sizes (8B and 70B parameters) to fit different needs.\n*   **Key Strengths:**\n    *   **Top-Tier Performance:** The 70B model is incredibly capable at reasoning, coding, and instruction following.\n    *   **Permissive License:** Allows for commercial use, making it the go-to choice for startups and companies building their own AI products.\n*   **Best For:** Developers building custom AI applications, researchers, and companies wanting to host their own powerful models.\n\n#### 2. **Mistral AI's Mistral Large & Mixtral 8x22B**\n*   **What Makes it Powerful:** Paris-based Mistral AI has quickly become a major force. Their models are known for their efficiency and top-tier performance. 
They use a Mixture-of-Experts (MoE) architecture, which means only parts of the model are activated for any given task, making them much faster and cheaper to run than monolithic models of a similar size.\n*   **Key Strengths:**\n    *   **Efficiency:** Delivers incredible performance for its computational cost.\n    *   **Multilingual:** Strong native support for multiple languages.\n*   **Best For:** Applications requiring a balance of high performance and cost-effectiveness.\n\n---\n\n### The Specialized Powerhouses\n\nThese models are designed to be the best in the world at one specific thing.\n\n#### **Image Generation**\n*   **Midjourney v6:** Widely considered the leader for artistic quality, photorealism, and creating stunning, aesthetically pleasing images. It has a deep understanding of art styles, composition, and lighting.\n*   **DALL-E 3 (OpenAI):** Its power comes from its deep integration with ChatGPT. You can describe what you want in natural, conversational language, and it excels at following complex instructions and rendering text accurately.\n*   **Stable Diffusion 3 (Stability AI):** The most powerful *open-source* image model. It's incredibly versatile and customizable, allowing developers to fine-tune it for specific styles and applications.\n\n#### **Video Generation**\n*   **OpenAI's Sora:** While not yet publicly available, Sora has redefined what's possible in AI video. It can generate high-fidelity, coherent video clips up to a minute long from a simple text prompt, demonstrating a sophisticated understanding of the physical world.\n*   **Runway Gen-3 & Pika Labs:** The leading publicly available tools for creating high-quality, short video clips from text or images.\n\n#### **Scientific Discovery**\n*   **Google DeepMind's AlphaFold 3:** A monumental achievement. This model can predict the structure and interactions of nearly all of life's molecules (proteins, DNA, etc.) with incredible accuracy. 
Its power isn't in chatting, but in revolutionizing drug discovery and biological research.\n\n### Summary Table\n\n| Model Name       | Developer       | Key Feature                                     | Type              |\n| ---------------- | --------------- | ----------------------------------------------- | ----------------- |\n| **GPT-4o**       | OpenAI          | Real-time, seamless voice/vision interaction    | Proprietary/Closed |\n| **Gemini 1.5 Pro** | Google          | Massive 1 million token context window          | Proprietary/Closed |\n| **Claude 3 Opus**  | Anthropic       | Elite-level reasoning and analytical depth      | Proprietary/Closed |\n| **Llama 3**        | Meta            | State-of-the-art open-source performance        | Open-Source       |\n| **Midjourney v6**  | Midjourney, Inc. | Unmatched artistic and photorealistic quality   | Proprietary/Closed |\n| **Sora**           | OpenAI          | Hyper-realistic and coherent AI video generation | In Preview        |\n| **AlphaFold 3**    | Google DeepMind | Predicts the structure of all life's molecules  | Research Tool     |",
    "created_at": "2025-09-12T11:26:58",
    "updated_at": "2025-09-12T11:27:35"
  }
}