1. Introduction
In this lab, you're going to build a web service to generate trivia quizzes and integrate it into a fun, working app. You'll be using a different programming language than you may have used before: English!
What you'll do...
- You'll craft a prompt that generates a trivia quiz according to a set of criteria.
- You'll build a simple web app and verify that it runs as expected in your development environment.
- You'll incrementally add logic to your web app to turn it into an API server that generates quizzes according to a set of input parameters.
- You'll see how easy it is to deploy your quiz generation service to the cloud using Google Cloud Run.
- Finally, you'll configure a real app (quizaic.com) to use your deployed quiz generator service, and you'll be able to play live quizzes based on the output.
What you'll learn...
- How to create a templated prompt for a Large Language Model (LLM).
- How to create a simple web server app in Python.
- How to add support for Google's LLM into your web app.
- How to deploy your app to the cloud so anyone can try your new creation.
- How to integrate your quiz generator into a larger app.
What you'll need...
- Chrome web browser
- A Google account
- A Cloud Project with billing enabled
This lab is aimed at developers of all levels, including beginners. Although you'll be using Python, you don't need to be familiar with Python programming to follow along, because we'll explain all the code you'll see.
2. Setup
This section covers everything you need to do to get started with this lab.
Self-paced environment setup
- Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.
- The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
- The Project ID is unique across all Google Cloud projects and immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one, or try your own and see if it's available. Whichever you choose, it can't be changed after this step and remains for the duration of the project.
- For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation; you can also inspect them with the command shown after these steps.
- Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.
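Incidentally, you can view all three of these values for your project with the gcloud CLI (available in Cloud Shell, which you'll start next); substitute your own project ID for PROJECT_ID:

gcloud projects describe PROJECT_ID

The output includes the project's name, projectId, and projectNumber fields.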
Start Cloud Shell
In this lab you're going to work in a Cloud Shell session, which is a command interpreter hosted by a virtual machine running in Google's cloud. You could just as easily run this section locally on your own computer, but using Cloud Shell gives everyone access to a reproducible experience in a consistent environment. After the lab, you're welcome to retry this section on your own computer.
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell.
If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If so, click Continue.
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.
Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Command output
           Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project
Command output
[core]
project = <PROJECT_ID>
If the project shown is not correct, you can set it with this command:
gcloud config set project <PROJECT_ID>
Command output
Updated property [core/project].
Enable some APIs
In later steps, you'll see where these services are needed (and why), but for now, run this command to give your project access to Cloud Build, Artifact Registry, Vertex AI, and Cloud Run:
gcloud services enable cloudbuild.googleapis.com \
  artifactregistry.googleapis.com \
  aiplatform.googleapis.com \
  run.googleapis.com
This should produce a successful message similar to the following:
Operation "operations/acf.cc11852d-40af-47ad-9d59-477a12847c9e" finished successfully.
3. Prompting - Programming in Natural Language
We're going to start by learning how to develop a prompt for a Large Language Model. Navigate to Google Cloud Console > Vertex AI > Vertex AI Studio (Language). You should see a page like this:
Under Generate Text, click the Text Prompt button. In the next dialog, enter a prompt that you think might be effective for generating a trivia quiz according to the following requirements:
- Topic: World History
- Number of questions: 5
- Difficulty level: intermediate
- Language: English
Click the Submit button to see the output.
As shown in the following screenshot, the right-hand panel gives you the ability to select which model you'd like to use and fine-tune some of the settings:
The following settings are available:
- Region is where your generation request should run.
- Model selects which large language model you'd like to use. For this codelab, stick with "gemini-1.0-pro-001".
- Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that expect a true or correct response, while higher temperatures can lead to more diverse or unexpected results.
- Token limit determines the maximum amount of text output from one prompt. A token is approximately four characters. The default value is 1024.
- Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature). The default top-k value is 40.
- Top-p changes how the model selects tokens for output. Tokens are selected from most probable to least until the sum of their probabilities equals the top-p value. (A toy sketch after this list illustrates how temperature, top-k, and top-p combine.)
- Max responses is the maximum number of model responses generated per prompt.
- A stop sequence is a series of characters (including spaces) that stops response generation if the model encounters it.
- Streaming responses selects whether responses are printed as they're generated or saved up and displayed when complete.
- Safety filter threshold adjusts how likely you are to see responses that could be harmful.
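To make these three decoding settings concrete, here's a toy Python sketch of a single token-selection step. It's an illustration only, with an invented pick_token helper and made-up logits, not Vertex AI's actual implementation:

import math
import random

def pick_token(logits, temperature=0.5, top_k=40, top_p=0.8):
    # Temperature rescales logits before softmax: lower values sharpen
    # the distribution toward the most likely token.
    weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])[:top_k]
    # Top-p: walk the ranked list until cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Sample from the surviving candidates, weighted by probability.
    tokens, probabilities = zip(*kept)
    return random.choices(tokens, weights=probabilities)[0]

# Toy "next token" distribution for illustration.
print(pick_token({"Paris": 2.0, "London": 1.0, "Rome": 0.5, "Tokyo": 0.1}))

Lowering temperature or top-k makes the choice more deterministic, while raising top-p widens the candidate pool.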
Once you have a prompt that generates a reasonable quiz according to the requirements above, we could parse that quiz with custom code, but it would be nicer to have the LLM generate the quiz in a structured format we can load directly into our program. The program we'll use later in this lab to call your generator expects quizzes to be expressed in JSON, a popular cross-language format for representing structured data.
Quizzes in this lab are expressed as an array of objects, where each object contains a question, an array of possible responses to that question, and a correct response. Here's the JSON encoding for quizzes in this lab:
[ { "question": "Who was the first person to walk on the moon?", "responses": [ "Neil Armstrong", "Buzz Aldrin", "Michael Collins", "Yuri Gagarin" ], "correct": "Neil Armstrong" }, { "question": "What was the name of the war that took place between the British and the French in North America from 1754 to 1763??", "responses": [ "The French and Indian War", "The Seven Years' War", "The War of the Austrian Succession", "The Great War" ], "correct": "The French and Indian War" }, ... ]
See if you can modify your prompt to output the quiz in the required JSON format. A couple of hints (one possible prompt appears after this list):
- Specify in words the precise format you're looking for (e.g. the quiz structure described above).
- Include in your prompt an example of the desired JSON format.
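For example, here's one possible prompt along these lines (yours may differ; this is just a starting point):

Generate a trivia quiz on the topic of World History, with 5 questions,
at intermediate difficulty, in English. Respond with only a JSON array of
objects, each with keys "question", "responses" (an array of 4 choices),
and "correct" (the right answer), like this:
[{"question": "...", "responses": ["...", "...", "...", "..."], "correct": "..."}]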
Once you have your prompt generating quizzes according to your desired specification, click the GET CODE button in the upper right-hand corner of the page to see Python code that can be used to programmatically submit your prompt to a Vertex AI LLM. If you're interested in using a programming language other than Python, check out https://cloud.google.com/vertex-ai/docs/samples?text=generative.
4. Build a Simple Web Server
Now that you have a working prompt, we want to integrate it into a larger app. Of course, we could embed your prompt into the larger app's source code, but we want your generator to function as a microservice that provides quiz generation for other apps. To make that happen, we'll need to create a simple web server and make it publicly available. We'll do that in the following steps.
Start by clicking the Open Editor button at the top of your Cloud Shell panel. It looks like this:
You'll then find yourself in an IDE environment similar to Visual Studio Code, in which you can create projects, edit source code, run your programs, etc.
If your screen is too cramped, you can resize the split between the console and your editor/terminal window by dragging the horizontal bar between those two regions, highlighted here:
You can switch back and forth between the Editor and the Terminal by clicking the Open Editor and Open Terminal buttons, respectively. Try switching back and forth between these two environments now.
Next, create a folder in which to store your work for this lab: click the add folder button, enter quiz-generator, and press Enter. All of the files you create in this lab, and all of the work you do in Cloud Shell, will take place in this folder.
Now create a requirements.txt file. This tells Python which libraries your app depends on. For this simple web app, you're going to use a popular Python module for building web servers called Flask, the google-cloud-aiplatform client library, and a web server framework called gunicorn. In the file navigation pane, right-click on the quiz-generator folder and select the New file menu item, like this:
When prompted for the new file's name, enter requirements.txt and press the Enter key. Make sure the new file ends up in the quiz-generator project folder.
Paste the following lines in the new file to specify that your app depends on the Python flask package, the gunicorn web server, and the google-cloud-aiplatform client library, along with the associated versions of each.
flask==3.0.0
gunicorn==21.2.0
google-cloud-aiplatform==1.47.0
You don't need to explicitly save this file because the Cloud Editor will auto-save changes for you.
Using the same technique, create another new file named main.py. This will be your app's main (and only) Python source file. Again, make sure the new file ends up in the quiz-generator folder.
Insert the following code into this file:
from flask import Flask
import os

app = Flask(__name__)  # Create a Flask object.
PORT = int(os.environ.get("PORT", 8080))  # Get PORT from the environment, defaulting to 8080.

# The app.route decorator routes any GET requests sent to the root path
# to this function, which responds with a "Hello world!" HTML document.
@app.route("/", methods=["GET"])
def say_hello():
    html = "<h1>Hello world!</h1>"
    return html

# This code ensures that your Flask app is started and listens for
# incoming connections on port 8080 whenever this file is run directly.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=PORT)
Switch back to the terminal and change into the project folder with this command:
cd quiz-generator
Run the following command to install your project dependencies:
pip3 install -r requirements.txt
After installing the dependencies, you should see output that ends like this:
Successfully installed flask-3.0.0
Now launch your app by running this command in the terminal:
flask --app main.py --debug run --port 8080
At this point, your app is running on the virtual machine dedicated to your Cloud Shell session. Cloud Shell includes a proxy mechanism that makes it possible for you to access web servers (like the one you just started) running on your virtual machine from anywhere on the global internet.
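Optionally, before trying the web preview, you can sanity-check the server from a second Cloud Shell terminal tab (the flask command keeps the first tab busy):

curl http://localhost:8080/

This should echo back the <h1>Hello world!</h1> document.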
Click the web preview button and then the Preview on Port 8080 menu item, like this:
This will open a web browser tab to your running app, which should look something like this:
5. Add a generate method with parameter parsing
Now we want to add support for fielding a new method called generate. Do this by adding an import statement for manipulating the HTTP request, and modify the main route to parse the request and print its parameters, as follows:
from flask import Flask
from flask import request  #<-CHANGED
import os

app = Flask(__name__)  # Create a Flask object.
PORT = int(os.environ.get("PORT", 8080))  # Get PORT from the environment, defaulting to 8080.

# The app.route decorator routes any GET requests sent to the root
# path to this function, which responds with a page listing the
# request's query parameters.
@app.route("/", methods=["GET"])  #<-CHANGED
def generate():  #<-CHANGED
    params = request.args.to_dict()  #<-CHANGED
    html = "<h1>Quiz Generator</h1>"  #<-CHANGED
    for param in params:  #<-CHANGED
        html += f"<br>{param}={params[param]}"  #<-CHANGED
    return html  #<-CHANGED

# This code ensures that your Flask app is started and listens for
# incoming connections on port 8080 whenever this file is run directly.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=PORT)
Now reload your existing web browser tab to see the results. This time you should see "Quiz Generator", along with a query parameter automatically added to your URL (authuser). Try adding two additional parameters by appending the string "&param1=val1&param2=val2" to the end of the URL in your browser's address bar, reload the page, and you should see something like this:
Now that we've seen how to send and parse query parameters on a URL, we'll add support for the specific parameters we'll want to send our quiz generator, which are as follows:
- topic - the desired quiz subject matter
- num_q - the number of desired questions
- diff - the desired difficulty level (easy, intermediate, hard)
- lang - the desired quiz language

Update your main.py code as follows:
from flask import Flask
from flask import request
import os

# Default quiz settings  #<-CHANGED
TOPIC = "History"  #<-CHANGED
NUM_Q = 5  #<-CHANGED
DIFF = "intermediate"  #<-CHANGED
LANG = "English"  #<-CHANGED

app = Flask(__name__)  # Create a Flask object.
PORT = int(os.environ.get("PORT", 8080))  # Get PORT from the environment, defaulting to 8080.

# This function takes a dictionary, a name, and a default value.
# If the name exists as a key in the dictionary, the corresponding
# value is returned. Otherwise, the default value is returned.
def check(args, name, default):  #<-CHANGED
    if name in args:  #<-CHANGED
        return args[name]  #<-CHANGED
    return default  #<-CHANGED

# The app.route decorator routes any GET requests sent to the root
# path to this function, which responds with a page echoing the
# parsed quiz parameters.
@app.route("/", methods=["GET"])
def generate():
    args = request.args.to_dict()  #<-CHANGED
    topic = check(args, "topic", TOPIC)  #<-CHANGED
    num_q = check(args, "num_q", NUM_Q)  #<-CHANGED
    diff = check(args, "diff", DIFF)  #<-CHANGED
    lang = check(args, "lang", LANG)  #<-CHANGED
    html = f"""
    <h1>Quiz Generator</h1><br>
    {topic=}<br>
    {num_q=}<br>
    {diff=}<br>
    {lang=}"""  #<-CHANGED
    return html

# This code ensures that your Flask app is started and listens for
# incoming connections on port 8080 whenever this file is run directly.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=PORT)
Now reload your existing web browser tab to see the results. You should see something like the following web page:
Try changing the URL to set values for various parameters. For example, try appending the suffix "?authuser=0&topic=Literature&num_q=10&diff=easy&lang=French" to the end of the URL in your address bar:
6. Add and format your prompt
Next, we'll fold these parameters into the prompt used to generate the quiz.
Copy the prompt you developed in Vertex AI Studio in the earlier step, but replace the hard-coded values for topic, number of questions, difficulty level, and language with these placeholder strings:
- {topic}
- {num_q}
- {diff}
- {lang}

Then update the PROMPT template in main.py like this:
from flask import Flask
from flask import request
import os

# Default quiz settings
TOPIC = "History"
NUM_Q = 5
DIFF = "intermediate"
LANG = "English"

PROMPT = """
Generate a quiz according to the following specifications:
- topic: {topic}
- num_q: {num_q}
- diff: {diff}
- lang: {lang}

Output should be (only) an unquoted json array of objects with keys:
"question", "responses", and "correct".
"""  #<-CHANGED

app = Flask(__name__)  # Create a Flask object.
PORT = int(os.environ.get("PORT", 8080))  # Get PORT from the environment, defaulting to 8080.

# This function takes a dictionary, a name, and a default value.
# If the name exists as a key in the dictionary, the corresponding
# value is returned. Otherwise, the default value is returned.
def check(args, name, default):
    if name in args:
        return args[name]
    return default

# The app.route decorator routes any GET requests sent to the root
# path to this function, which responds with the formatted prompt.
@app.route("/", methods=["GET"])
def generate():
    args = request.args.to_dict()
    topic = check(args, "topic", TOPIC)
    num_q = check(args, "num_q", NUM_Q)
    diff = check(args, "diff", DIFF)
    lang = check(args, "lang", LANG)
    prompt = PROMPT.format(topic=topic, num_q=num_q, diff=diff, lang=lang)  #<-CHANGED
    html = f"<h1>Prompt:</h1><br><pre>{prompt}</pre>"  #<-CHANGED
    return html

# This code ensures that your Flask app is started and listens for
# incoming connections on port 8080 whenever this file is run directly.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=PORT)
Now reload your existing web browser tab to see the results. You should see something like the following web page:
Try modifying the URL to alter those four parameters.
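For example, a suffix like this (the values are arbitrary) exercises all four:

?topic=Science&num_q=3&diff=hard&lang=Spanish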
7. Add the Vertex AI client library
Now we're ready to use the Vertex AI Python client library to generate your quiz. This will automate the interactive prompting you did in step 3 and give your generator service programmatic access to Google's LLM capabilities. Update your main.py file as follows:
Make sure to replace "YOUR_PROJECT" with your actual project ID.
from flask import Flask
from flask import request
from flask import Response  #<-CHANGED
import os

import vertexai  #<-CHANGED
from vertexai.generative_models import GenerativeModel  #<-CHANGED

# Default quiz settings
TOPIC = "History"
NUM_Q = 5
DIFF = "intermediate"
LANG = "English"

MODEL = "gemini-1.0-pro"  #<-CHANGED

PROMPT = """
Generate a quiz according to the following specifications:
- topic: {topic}
- num_q: {num_q}
- diff: {diff}
- lang: {lang}

Output should be (only) an unquoted json array of objects with keys "question", "responses", and "correct".
"""

app = Flask(__name__)  # Create a Flask object.
PORT = int(os.environ.get("PORT", 8080))  # Get PORT from the environment, defaulting to 8080.

# Initialize Vertex AI access.
vertexai.init(project="YOUR_PROJECT", location="us-central1")  #<-CHANGED
parameters = {  #<-CHANGED
    "candidate_count": 1,  #<-CHANGED
    "max_output_tokens": 1024,  #<-CHANGED
    "temperature": 0.5,  #<-CHANGED
    "top_p": 0.8,  #<-CHANGED
    "top_k": 40,  #<-CHANGED
}  #<-CHANGED
model = GenerativeModel(MODEL)  #<-CHANGED

# This function takes a dictionary, a name, and a default value.
# If the name exists as a key in the dictionary, the corresponding
# value is returned. Otherwise, the default value is returned.
def check(args, name, default):
    if name in args:
        return args[name]
    return default

# The app.route decorator routes any GET requests sent to the root
# path to this function, which responds with a generated quiz.
@app.route("/", methods=["GET"])
def generate():
    args = request.args.to_dict()
    topic = check(args, "topic", TOPIC)
    num_q = check(args, "num_q", NUM_Q)
    diff = check(args, "diff", DIFF)
    lang = check(args, "lang", LANG)
    prompt = PROMPT.format(topic=topic, num_q=num_q, diff=diff, lang=lang)
    response = model.generate_content(prompt, generation_config=parameters)  #<-CHANGED
    print(f"Response from Model: {response.text}")  #<-CHANGED
    html = f"{response.text}"  #<-CHANGED
    return Response(html, mimetype="application/json")  #<-CHANGED

# This code ensures that your Flask app is started and listens for
# incoming connections on port 8080 whenever this file is run directly.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=PORT)
Now reload your existing web browser tab to see the results. Note that this may take several seconds because now you're actually making an LLM request. You should see something like the following web page:
Try altering the URL to request a different quiz topic, number of questions, and difficulty level.
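Since the service now returns JSON (note the application/json mime type in the code above), you can also exercise it from the command line; for example (parameter values are arbitrary):

curl -s "http://localhost:8080/?topic=Science&num_q=3&diff=easy&lang=English" | python3 -m json.tool

If the response pretty-prints cleanly, your prompt is reliably producing parseable JSON.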
And with that, your microservice is finished - congratulations! In the next step, you'll learn how to deploy your service in the Cloud so that anyone can access it from anywhere.
8. To the Cloud!
Now that you've built your own quiz generator, you'll want to share this bit of awesomeness with the rest of the world, so it's time to deploy it to the Cloud. But you'd really like to do more than just share it. You'd like to make sure it:
- runs reliably - you get automatic fault tolerance in case a computer running your app crashes
- scales automatically - your app will keep up with vast levels of traffic, and automatically reduce its footprint when unused
- minimizes your costs, by not charging you for resources you're not using - you're charged only for resources consumed while responding to traffic
- is accessible via a custom domain name - you have access to a one-click solution to assign a custom domain name to your service
- offers excellent response time - cold starts are reasonably responsive, and you can fine-tune that by specifying a minimum instance configuration
- supports end-to-end encryption using standard SSL/TLS web security - when you deploy a service, you get standard web encryption, and the corresponding required certificates, for free and automatically
By deploying your app to Google Cloud Run, you get all of the above and more. The basic building block for sharing your app with Cloud Run is a container.
Containers give us the ability to create a modular box in which to run an application with all its dependencies bundled together. Because containers can be used on nearly any virtual or real server, you can deploy your application anywhere you like, from on-premises to the cloud, and even move it from one service provider to another.
To learn more about containers and how they work in Google Cloud Run, check out the Dev to Prod in Three Easy Steps with Cloud Run codelab.
Deploy Your App to Cloud Run
Cloud Run is a regional service, which means the infrastructure that runs your Cloud Run services is located in a specific region and is managed by Google to be redundantly available across all the zones within that region. For simplicity, in this lab we'll use the hardcoded region us-central1.
We're going to use a mechanism called buildpacks to automatically generate your container. Create a new file named Procfile in Cloud Editor and insert this one line of text:
web: gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
This tells the buildpack system how to run your app in the auto-generated container. Next, run the following command in the Cloud Shell Terminal (from that same quiz-generator
directory) :
gcloud run deploy quiz-generator \
  --source . \
  --region us-central1 \
  --allow-unauthenticated
This tells the gcloud command that you want it to use buildpacks to create your container image, based on the source files it finds in the current directory (the dot in --source . is shorthand for the current directory). Because the service takes care of the container image implicitly, you don't need to specify an image on this gcloud command.
Wait a few moments until the deployment is complete. On success, the gcloud command displays the new service's URL:
Building using Buildpacks and deploying container to Cloud Run service [quiz-generator] in project [YOUR_PROJECT] region [YOUR_REGION]
OK Building and deploying new service... Done.
  OK Creating Container Repository...
  OK Uploading sources...
  OK Building Container... Logs are available at
  [https://console.cloud.google.com/cloud-build/builds/0cf1383f-35db-412d-a973-557d5e2cd4a4?project=780573810218].
  OK Creating Revision...
  OK Routing traffic...
  OK Setting IAM Policy...
Done.
Service [quiz-generator] revision [quiz-generator-00001-xnr] has been deployed and is serving 100 percent of traffic.
Service URL: https://quiz-generator-co24gukjmq-uc.a.run.app
You can also retrieve your service URL with this command:
gcloud run services describe quiz-generator \
  --region us-central1 \
  --format "value(status.url)"
This should display something like:
https://quiz-generator-co24gukjmq-uc.a.run.app
This link is a dedicated URL, with TLS security, for your Cloud Run service. This link is permanent (as long as you don't disable your service) and usable anywhere on the internet. Unlike the Cloud Shell proxy mechanism mentioned earlier, it doesn't depend on a transient virtual machine.
Click the highlighted Service URL to open a web browser tab to your running app. Verify the result is the same as what you saw in your development environment. Also verify that you can adjust the generated quiz by supplying parameters at the end of the URL.
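For example, a request like this (substitute your own Service URL; the parameter values are arbitrary) returns a quiz as JSON from anywhere on the internet:

curl -s "https://quiz-generator-co24gukjmq-uc.a.run.app/?topic=Music&num_q=3&diff=easy&lang=English"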
Congratulations! Your app is now running in Google's Cloud. Without having to think about it, your app is publicly available, with TLS (HTTPS) encryption, and automatic scaling to mind-boggling levels of traffic.
9. Putting all the pieces together
In this final step, we're ready to run your quiz generator as part of the quizaic app. Visit the quizaic URL, log in to your Google account, and navigate to the Create Quiz tab. Select generator type Custom, paste your Cloud Run service URL into the URL field, fill in the other required fields, and submit the form.
In a few moments, you should have a new quiz (see "My new quiz" in the image below), with an AI-generated thumbnail image, which you can edit, play, clone, or delete via the corresponding buttons. This new quiz was created using the web service you just deployed, based on your templated prompt!
10. Cleaning Up
While Cloud Run does not charge when the service is not in use, you might still be charged for storing the built container image.
You can either delete your GCP project to avoid incurring charges, which stops billing for all the resources used within that project, or simply delete your container image using these commands:
gcloud config set artifacts/repository cloud-run-source-deploy
gcloud config set artifacts/location us-central1
gcloud artifacts docker images list   # note the image tag in the resulting list
gcloud artifacts docker images delete <IMAGE-TAG>
To delete your Cloud Run service, use this command:
gcloud run services delete quiz-generator --region us-central1 --quiet
11. You Did It!
Congratulations - you've successfully crafted an LLM prompt and deployed a Cloud Run microservice using that prompt. Now you can program in natural language and share your creations with the world!
I want to leave you with one important question:
Once you got your app working in your developer environment, how many lines of code did you have to modify to deploy it to the cloud, with all the production-grade attributes offered by Cloud Run?
The answer, of course, is zero. :)
Other codelabs to check out...
- Dev to Prod in Three Easy Steps with Cloud Run
- Text Summarizer app with Vertex AI and Svelte Kit
- Chat App with PaLM API on Cloud Run
- Cloud Function that wraps the PaLM Text Bison Models
- Data to Generative AI with Spanner and Vertex AI Imagen API
12. Call to Action
If you've enjoyed this codelab and are likely to spend more time hands-on with Google Cloud, you should join Google Cloud Innovators today!
Google Cloud Innovators is free and includes:
- Live discussions, AMAs, and roadmap sessions to learn the latest directly from Googlers
- The latest Google Cloud news right in your inbox
- Digital badge and video conference background
- 500 credits of labs and learning on Skills Boost
Click here to register!