For this project, I explored how to use AI to automate the creation of high-quality advertising images. The goal was to build a tool that could take separate images of a product, a model, and a background, and then use AI to merge them into a single, photorealistic fashion advertisement.
My Development Approach
I designed a multi-step pipeline that uses Google's Gemini API twice: first to describe each input image in text, and then to generate a new image from the combined descriptions.
1. Deconstructing the Scene
My first challenge was to teach the AI to "see" and understand the individual components of a potential ad. I used the Gemini API's vision capabilities to analyze each image I provided (the product, the model, and the background) and produce a text description of it.
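Here is a minimal sketch of what this description step could look like. It is not the exact code from the repository. The instruction strings and the function names are hypothetical, and `model` is assumed to be any object with a `generate_content(parts)` method that returns an object with a `.text` attribute, which is the shape of `GenerativeModel` in the `google-generativeai` SDK.

```python
import mimetypes
from pathlib import Path

# Hypothetical per-role instructions; the write-up does not show the real ones.
ROLE_INSTRUCTIONS = {
    "product": "Describe this product: type, color, material, and notable details.",
    "model": "Describe this person: pose, expression, and styling.",
    "background": "Describe this setting: location, lighting, and mood.",
}


def describe_image(model, image_path, role):
    """Ask a vision-capable model for a text description of one image.

    `model` is assumed to expose generate_content(parts) and to return an
    object with a .text attribute.
    """
    path = Path(image_path)
    # Fall back to JPEG when the extension is not recognized.
    mime_type = mimetypes.guess_type(path.name)[0] or "image/jpeg"
    image_part = {"mime_type": mime_type, "data": path.read_bytes()}
    response = model.generate_content([ROLE_INSTRUCTIONS[role], image_part])
    return response.text.strip()


def describe_all(model, image_paths):
    """Map each role (e.g. 'product') to the description of its image."""
    return {
        role: describe_image(model, path, role)
        for role, path in image_paths.items()
    }
```

Passing the model in as an argument keeps the function easy to test with a stub, and it means the API key and model name live in one place rather than inside every helper.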
2. Crafting the Perfect Prompt
Once I had the text descriptions, the next step was to combine them into a single, effective instruction for the image generator. This was a fun prompt engineering challenge. I wrote a Python script that weaves the product, model, and background descriptions together into one cohesive narrative prompt.
This ensured the final output was not just a random combination but a thoughtfully constructed scene.
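To make the idea concrete, here is a small sketch of how three descriptions could be woven into one prompt. The template wording and the function name `build_ad_prompt` are my own illustration and are assumptions, not the prompt used in the repository.

```python
REQUIRED_ROLES = ("product", "model", "background")


def build_ad_prompt(descriptions):
    """Combine per-image descriptions into one image-generation prompt.

    `descriptions` maps 'product', 'model', and 'background' to text.
    The sentence template below is illustrative only.
    """
    missing = [
        role for role in REQUIRED_ROLES
        if not (descriptions.get(role) or "").strip()
    ]
    if missing:
        raise ValueError(f"Missing descriptions for: {', '.join(missing)}")

    # Strip trailing periods so each description slots cleanly into a sentence.
    product, model, background = (
        descriptions[role].strip().rstrip(".") for role in REQUIRED_ROLES
    )
    return (
        "Create a photorealistic fashion advertisement. "
        f"The model: {model}. "
        f"The model is wearing or presenting the product: {product}. "
        f"The scene: {background}. "
        "Match lighting, perspective, and shadows so that all elements "
        "look like a single photograph."
    )
```

Failing early on a missing description is a deliberate choice here: an empty slot in the prompt tends to produce an image that silently drops one of the three elements.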
3. Bringing the Image to Life
With the detailed prompt ready, I fed it back into the Gemini API to generate the final image. The script automatically saves the generated image and a JSON file containing all the descriptions used to create it. This makes it easy to track how different prompts affect the final result.
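The bookkeeping part of this step can be sketched as follows. The sketch assumes the generation call has already returned the image as raw bytes; the function name `save_run`, the `outputs` directory, and the file naming scheme are all hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path


def save_run(image_bytes, descriptions, prompt, out_dir="outputs"):
    """Write the generated image and a JSON record of how it was made.

    Returns the paths of the image file and the JSON file.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # A shared timestamp ties the image to its metadata file.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    image_path = out / f"ad_{stamp}.png"
    meta_path = out / f"ad_{stamp}.json"

    image_path.write_bytes(image_bytes)
    record = {
        "image": image_path.name,
        "prompt": prompt,
        "descriptions": descriptions,
    }
    meta_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return image_path, meta_path
```

Storing the full prompt next to the descriptions means two runs can be compared by diffing their JSON files.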
To keep the project clean and maintainable, I organized the code logically: a main.py script to run the process, utils.py for the core functions (like description and prompt generation), and a simple config.py to manage the API key.
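As an illustration of the config.py idea, one common pattern is to read the key from an environment variable so that it never lands in version control. The variable name `GEMINI_API_KEY` and the helper `get_api_key` are assumptions for this sketch and may differ from the repository.

```python
import os


def get_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running."
        )
    return key
```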
Reference: https://github.com/majjikishore007/ImageGeneration