GenAI - Image Generation from keywords

1. Introduction

Last Updated: 2023-10-12

Imagen Image Generation

Imagen is Google's text-to-image model that generates realistic and creative images from text descriptions. Imagen on Vertex AI lets users build next-generation AI products that transform their users' imagination into high-quality visual assets in seconds. In addition to image generation from text, it supports image editing via text prompts, image captioning, visual Q&A, and subject- and style-based image model tuning.

Prompt Generation

To create an image using Imagen, you provide a text description of the image, known as the prompt, from which the image is generated. However, generating a high-quality, photorealistic image requires prompting expertise. Good prompts can also be domain dependent if you want images for a specific business domain such as retail or manufacturing. An easier way to design a prompt is to give a set of keywords to the Text-Bison model and let it write the prompt.

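The overall approach is as follows: the user supplies keywords, Text-Bison turns them into a prompt, and Imagen generates images from that prompt.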

Gradio UI

Gradio is an open-source Python library that allows you to quickly create easy-to-use, customizable UI components for your machine learning model, any API, or even an arbitrary Python function using a few lines of code. You can integrate the Gradio GUI directly into your Jupyter notebook or share it as a link with anyone. Gradio supports a wide range of media types, including text, images, videos, and audio. It also provides a number of pre-built UI components, such as input fields, buttons, sliders, and drop-down menus.

What you'll build

In this codelab, you're going to deploy a Gradio app that will:

Generate a text prompt using keywords or phrases. The generated prompt can be manually edited too.
Generate images from the generated prompt on the UI.

What you'll learn

How to use zero-shot and few-shot prompting with the Text-Bison model programmatically to generate Imagen-specific prompts for image generation.
How to generate images from a prompt using the Imagen model via its API.
How to build, deploy, and test a Gradio application from a Vertex AI Workbench notebook.

What you'll need

Access to a GCP project, for example 'Cloud-llm-preview4'.
Permission to create a Vertex AI Workbench notebook.
The Vertex AI API enabled.
Networking requirements for Gradio: the notebook instance must be able to access public URLs.

2. Getting set up

Create the notebook

Log in to the project.
Navigate to Workbench from the left navigation menu.
Under "USER-MANAGED NOTEBOOKS", create a new notebook with the default options.
Click "OPEN JUPYTERLAB" once the instance has been provisioned.

Note: It may take a few minutes to start the notebook if the notebook is in a stopped state.

Get the code

We've put the code file here. This notebook can be imported in your environment and run as is (except for changing your project details).

3. Run the code

Install/Import the required Dependencies and Libraries

Install the Gradio library.
Import the Vertex AI APIs for Text-Bison and Image Generation.
Import all other required libraries.
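
Before running the notebook, it can help to confirm that the required packages are importable. The following is a minimal sketch and is not part of the original notebook: the helper name `missing_packages` is hypothetical, and it assumes that Gradio is importable as `gradio` and the Vertex AI SDK as `vertexai` (installed through the `google-cloud-aiplatform` package).

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Assumed module names: `gradio` for the UI, `vertexai` for the Vertex AI SDK.
required = ["gradio", "vertexai"]
missing = missing_packages(required)
if missing:
    # In a notebook cell you could then run, for example:
    #   %pip install gradio google-cloud-aiplatform
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```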

Prompt Generation Using Text-Bison

The function takes user input as a comma-separated list of keywords and/or phrases that can be combined into a sentence describing the image to generate.

For example: persona, subject, background, lighting, and other descriptions.
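
As a sketch of how such keyword fields might be combined into the single string that the prompt templates refer to as `params_list_str`, consider the helper below. The name `build_params_list` is hypothetical; the notebook's actual joining logic is not shown in this codelab and may differ.

```python
def build_params_list(*fields):
    """Join non-empty keyword fields into one comma-separated string."""
    cleaned = [field.strip() for field in fields if field and field.strip()]
    return ", ".join(cleaned)


params_list_str = build_params_list(
    "plumber", "faucet", "kitchen", "natural lighting", "high quality", "Photo"
)
print(params_list_str)
# plumber, faucet, kitchen, natural lighting, high quality, Photo
```

Skipping empty fields keeps stray commas out of the prompt when a user leaves a textbox blank.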

The function that generates the prompt is given below:

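def prompt_generation(persona, signal, theme, lighting, quality, extra_desc):
    # Excerpt: few_shot_prompt, prompt, and parameters are defined in the full notebook.
    model = TextGenerationModel.from_pretrained("text-bison")
    # Few-shot request: the prompt includes worked input/output examples.
    response_few_shot = model.predict(
        few_shot_prompt,
        **parameters
    )
    # Zero-shot request: the prompt contains only the instruction.
    response_single_shot = model.predict(
        prompt,
        **parameters
    )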



Few-Shot and Zero-shot prompt

Zero-shot prompting is a text generation technique in which the model is given an instruction but no examples of the desired output. This can be challenging, as the model has to rely on its own knowledge to generate text that is coherent and informative.

However, zero-shot prompting can also be very creative, as the model is not constrained by any pre-existing examples.

Few-shot prompting is a text generation technique in which the model is given a small number of example inputs and outputs to follow. This can be easier than zero-shot prompting, as the model has some guidance on what to generate. However, few-shot prompting can also be limiting, as the model tends to generate text that resembles the examples it was given.

Below is the sample code for the Few-Shot and Zero-shot prompts.

# Few Shot Prompt used in the code

few_shot_prompt = f"""You are an expert in writing prompts for Image Generation Models. Using the provided phrases and keywords, concatenate them and add on some realistic details to generate logical and Meaningful prompt that can be used for image generation.

input: people, gardening, house garden, colorful plants, Real, HD image, Photo.

output: A Photo of people gardening in a house garden landscape with few coloured flowering plants. Realistic FULL HD Images, Elegant and natural facial and eye features taken by professional photographer

input: plumber, faucet, kitchen, high quality, natural lighting, Photo

output: A Photo of a plumber fixing a faucet in the kitchen. High quality image with natural indoor lighting.

input: house and garden, halloween, warm lighting, high quality image, Sketch

output: A Sketch of Beautiful House and Garden with Halloween Decorations. Warm lighting, High Quality, 4K photograph taken by professional photographer from front.

input: nice living room, warm lighting,Professional Photographer from far, Photo

output: A photo of a Well designed Living Room. Warm lighting, High Quality, 4K photograph taken by Professional Photographer from far

input: {params_list_str}

output:

"""

# Zero Shot Prompt used in the code

prompt = f"""You are an expert in writing prompts for Image Generation Models. Help me write a list of meaningful prompts for Image Generation Model specifically including the words: "{params_list_str}". Remember to include these words in the prompt and make the prompt meaningful."""
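
The two templates above can also be assembled programmatically, which makes it easier to add or remove examples. The following is a minimal sketch under stated assumptions: `INSTRUCTION`, `EXAMPLES`, `build_few_shot_prompt`, and `build_zero_shot_prompt` are hypothetical names, the instruction text is shortened, and only one of the example pairs from the few-shot prompt is reused.

```python
INSTRUCTION = (
    "You are an expert in writing prompts for Image Generation Models. "
    "Using the provided phrases and keywords, write a meaningful prompt "
    "that can be used for image generation."
)

# One example pair taken from the few-shot prompt above.
EXAMPLES = [
    ("plumber, faucet, kitchen, high quality, natural lighting, Photo",
     "A Photo of a plumber fixing a faucet in the kitchen. "
     "High quality image with natural indoor lighting."),
]


def build_few_shot_prompt(params_list_str, examples=EXAMPLES):
    """Instruction, then input/output pairs, then the new input to complete."""
    parts = [INSTRUCTION, ""]
    for example_input, example_output in examples:
        parts.append(f"input: {example_input}")
        parts.append(f"output: {example_output}")
        parts.append("")
    parts.append(f"input: {params_list_str}")
    parts.append("output:")
    return "\n".join(parts)


def build_zero_shot_prompt(params_list_str):
    """Instruction only, with no worked examples."""
    return f'{INSTRUCTION} Include the words: "{params_list_str}".'


print(build_few_shot_prompt("house and garden, halloween, Sketch"))
```

The only structural difference between the two builders is the list of example pairs: the few-shot prompt ends with an open `output:` line for the model to complete, while the zero-shot prompt relies on the instruction alone.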



Image Generation Using Imagen

The function takes a user prompt and an optional negative prompt and passes them to the Imagen model (imagegeneration@002).

def image_generation_completion(input, negative_prompt):
    input_prompt = input
    model = ImageGenerationModel.from_pretrained("imagegeneration@002")
    response = model.generate_images(
        prompt=input_prompt,
        number_of_images=4,  # kept to a static value of 4
        negative_prompt=negative_prompt
    )
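
The excerpt above ends before the function returns anything. Because the Gradio callback is wired to four Image components, it has to return exactly four values. The sketch below shows one way to guarantee that, assuming (this is not confirmed by the source) that the model can sometimes return fewer images than requested. The helper name `pad_to_outputs` is hypothetical, and the items would still need to be in a form that `gr.Image` accepts, such as PIL images or file paths; that conversion is not shown.

```python
def pad_to_outputs(images, expected=4):
    """Return exactly `expected` items, truncating or padding with None."""
    images = list(images)[:expected]
    return images + [None] * (expected - len(images))


print(pad_to_outputs(["img_a", "img_b"]))
# ['img_a', 'img_b', None, None]
```

A callback could then end with something like `return pad_to_outputs(converted_images)`, where `converted_images` is a hypothetical list of displayable images.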



The following standalone code generates images from a user-supplied prompt and negative prompt. The final application instead uses the prompt generated by the Text-Bison model.

from vertexai.preview.vision_models import ImageGenerationModel

def image_generation(input, negative_prompt):
    input_prompt = input
    model = ImageGenerationModel.from_pretrained("imagegeneration@002")
    response = model.generate_images(
        prompt=input_prompt,
        number_of_images=4,  # kept to a static value of 4; the maximum is 8
        negative_prompt=negative_prompt
    )
    images = response.images
    return images

user_prompt = "Prompt: A Young Woman Showcasing and selling an undecorated Fresh Christmas Tree from A bunch of trees. Cold Lighting, High Quality and detailed Image Taken By Professional Photographer from far."
negative_prompt = "Distorted and unattractive faces"
generated_images_list = image_generation(user_prompt, negative_prompt)

# Show one of the generated images
generated_images_list[0].show()



Output -

4. Deploy the Gradio App

Gradio is used for the frontend, where users enter keywords and generate structured prompts. These prompts can be used directly or edited further by the user, and are then fed into Imagen to generate images. This application uses Blocks, which support flexible layouts and complex data flows. Blocks manages the application layout using Rows and Columns:

with gr.Blocks() as demo:
    # Prompt generation part
    with gr.Row():
        with gr.Column(scale=1):
            Persona = gr.Textbox(label="Persona", info="Customer segment such as Plumber, Electrician etc.")
        with gr.Column(scale=1):
            Signals = gr.Textbox(label="Signals", info="Main content of banner such as Faucet, Lamp etc.")
        with gr.Column(scale=1):
            Theme = gr.Textbox(label="Theme", info="Context of the banner such as Halloween, Kitchen etc.")
    with gr.Row():
        with gr.Column(scale=1):
            photo_modifiers = gr.Textbox(label="Photography Modifiers", info="Photography specific modifiers and parameters such as Lighting(Dramatic/Natural/Warm/Cold), Camera Proximity etc.")
        with gr.Column(scale=1):
            quality_modifiers = gr.Textbox(label="Image Quality Modifier", info="Quality Modifiers like high-quality, beautiful, stylized. 4K, HDR, By a professional etc")
        with gr.Column(scale=1):
            other_desc = gr.Textbox(label="Any Other Description", info="Other Descriptions for Image such as Style (Painting/Photo/Sketch), Background/Foreground Context")
    with gr.Row():
        btn = gr.Button("Submit")
    with gr.Row():
        returned_prompts = gr.Textbox(label="Result Prompts", interactive=True)
    # Each input component maps, in order, to one argument of prompt_generation.
    btn.click(fn=prompt_generation, inputs=[Persona, Signals, Theme, photo_modifiers, quality_modifiers, other_desc], outputs=returned_prompts)
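
The Blocks layout above can be reduced to a small self-contained app to see how Rows, components, and the click event fit together. This is a simplified sketch rather than the codelab's application: `join_keywords` and `build_demo` are hypothetical names, the callback only joins the textbox values instead of calling Text-Bison, and Gradio is imported inside `build_demo` so the callback can be tried on its own.

```python
def join_keywords(persona, signals, theme):
    """Toy callback: combine three textbox values into one string."""
    values = (persona, signals, theme)
    return ", ".join(v.strip() for v in values if v.strip())


def build_demo():
    """Build a small Blocks app; Gradio is imported lazily."""
    import gradio as gr  # requires `pip install gradio`

    with gr.Blocks() as demo:
        with gr.Row():
            persona = gr.Textbox(label="Persona")
            signals = gr.Textbox(label="Signals")
            theme = gr.Textbox(label="Theme")
        btn = gr.Button("Submit")
        result = gr.Textbox(label="Result", interactive=True)
        # The callback receives one argument per input component, in order.
        btn.click(fn=join_keywords,
                  inputs=[persona, signals, theme],
                  outputs=result)
    return demo


# To serve the app from a notebook (share=True requests a public link):
# build_demo().launch(share=True)
```

Keeping the callback a plain function, separate from the layout code, makes it possible to test the logic without starting the UI.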



To handle user inputs and outputs, Gradio provides multiple components such as Image, Video, Slider, Dropdown, Textbox, and Radio. These components give developers flexibility and control over how to accept inputs from users and feed them to Text-Bison, Imagen, or any other ML model.

For this project, the application is created using Blocks to add flexibility and complex data flows to the application. In addition to Blocks, multiple Gradio components are used, including:

Rows and Columns for proper layout.
Button, Textbox, Dropdown, and Slider to achieve the required functionality and ease of use.
Image component to display results.
Other helpers, such as EventData and update, to support dynamic changes to the UI.

Below is a code snippet used to generate images from input and negative prompt:

    # Image generation part (continues inside the gr.Blocks() context)
    with gr.Row():
        with gr.Column(scale=1):
            image_prompt = gr.Textbox(label="Image Generation Prompt")
    with gr.Accordion("Advanced options", open=False):  # Let's hide the advanced options!
        with gr.Row():
            negative_prompt = gr.Textbox(label="Negative prompt", info="Specify What not to Include in Image ex. Bad Quality Image")
    with gr.Row():
        with gr.Column(scale=1):
            img_btn = gr.Button("Generate Images")
    with gr.Row():
        with gr.Column():
            output_image_1 = gr.Image(label="Result Image 1", visible=False)
        with gr.Column():
            output_image_2 = gr.Image(label="Result Image 2", visible=False)
    with gr.Row():
        with gr.Column():
            output_image_3 = gr.Image(label="Result Image 3", visible=False)
        with gr.Column():
            output_image_4 = gr.Image(label="Result Image 4", visible=False)
    returned_prompts.select(populate_image_prompt, inputs=[returned_prompts], outputs=image_prompt)
    img_btn.click(fn=image_generation_completion, inputs=[image_prompt, negative_prompt], outputs=[output_image_1, output_image_2, output_image_3, output_image_4])
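
The `populate_image_prompt` callback referenced above is not shown in this codelab. The following is a hypothetical stand-in, not the notebook's implementation: it assumes the "Result Prompts" textbox may hold several prompts, one per line and possibly numbered or bulleted, and copies the first one into the image prompt field.

```python
import re

# Matches a leading bullet ("-", "*", "•") or number ("1.", "2)") plus spaces.
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def populate_image_prompt(prompts_text):
    """Return the first non-empty line, without a leading list marker."""
    for line in prompts_text.splitlines():
        if line.strip():
            return _LIST_MARKER.sub("", line).strip()
    return ""


print(populate_image_prompt("1. A photo of a tree.\n2. A sketch of a house."))
# A photo of a tree.
```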



To run and test the Gradio application, enter your keywords and click the Submit button to generate a prompt. Then select or edit the generated prompt and click the Generate Images button. Imagen generates images based on the prompt. Try different prompts to see what kinds of images Imagen can generate.

Below is the screenshot of the Prompt Generation on the Gradio App.

Below is the screenshot of the Image Generation on the Gradio App.

Some examples -

Example 1 -

[Left image] Prompt (Using simple keywords as prompts) : A couple of friends boating.

[Right image] Prompt (Using Prompts generated by Text-Bison) : A photo of 2 young men fishing on a boat surrounded by dark trees in the woods. Men are wearing a shirt and are standing on a boat. Natural Lighting, High Quality, 4K Photo photographed by Professional Photographer.

Example 2 -

[Left image] Prompt (Using simple keywords as prompts) : A Christmas tree

[Right image] Prompt (Using Prompts generated by Text-Bison) : A Christmas tree in a room with a lamp and furniture. The tree is decorated with lights and ornaments. It is placed near a window, and there is a wall visible in the background. Warm Lighting, High Quality, HDR Photo photographed by Professional Photographer taken from far.

5. Clean-up

To clean up your resources:

Stop the Gradio app.
Stop or delete the Workbench notebook.

6. Congratulations

Congratulations, you've successfully deployed a Gradio application for creating prompts and images with Google Text-Bison API and Imagen API.