1. Overview
The Serverless Migration Station series of codelabs (self-paced, hands-on tutorials) and related videos aim to help Google Cloud serverless developers modernize their applications by guiding them through one or more migrations, primarily moving away from legacy services. Doing so makes your apps more portable and gives you more options and flexibility, enabling you to integrate with and access a wider range of Cloud products and more easily upgrade to newer language releases. While initially focusing on the earliest Cloud users, primarily App Engine (standard environment) developers, this series is broad enough to include other serverless platforms like Cloud Functions and Cloud Run, or elsewhere if applicable.
This codelab teaches you how to migrate from App Engine Blobstore to Cloud Storage. There are also implicit migrations from:
- webapp2 web framework to Flask (covered by Module 1)
- App Engine NDB to Cloud NDB for Datastore access (covered by Module 2)
- Python 2 to 3 (the migrated app is both Python 2 & 3 compatible)
Refer to any related migration modules for more step-by-step information.
You'll learn how to
- Add use of the App Engine Blobstore API/library
- Store user uploads to the Blobstore service
- Prepare for next step to migrate to Cloud Storage
What you'll need
- A Google Cloud Platform project with an active GCP billing account
- Basic Python skills
- Working knowledge of common Linux commands
- Basic knowledge of developing and deploying App Engine apps
- A working Module 15 App Engine app: complete the Module 15 codelab (recommended) or copy the Module 15 app from the repo
Survey
How will you use this tutorial?
How would you rate your experience with Python?
How would you rate your experience with using Google Cloud services?
2. Background
This codelab starts with the sample app from Module 15 and demonstrates how to migrate from Blobstore (and NDB) to Cloud Storage (and Cloud NDB). The migration process involves replacing dependencies on App Engine's legacy bundled services, allowing you to move your apps to another Cloud serverless platform or other hosting platform if desired.
This migration requires a bit more effort compared to the other migrations in this series. Blobstore has dependencies on the original webapp framework, which is why the sample app uses the webapp2 framework instead of Flask. This tutorial features migrations to Cloud Storage, Cloud NDB, Flask, and Python 3.
The app still registers end-user "visits" and displays the ten most recent, but the previous (Module 15) codelab added new functionality to accommodate Blobstore usage: the app prompts end-users to upload an artifact (a file) corresponding to their "visit." Users can do so or select "skip" to opt out. Regardless of the user's decision, the next page renders the same output as previous incarnations of this app, displaying the most recent visits. One additional twist is that visits with corresponding artifacts feature a "view" link for displaying a visit's artifact. This codelab implements the migrations mentioned earlier while preserving the described functionality.
3. Setup/Prework
Before we get to the main part of the tutorial, let's set up our project, get the code, then deploy the baseline app so we know we started with working code.
1. Setup project
If you deployed the Module 15 app already, we recommend reusing that same project (and code). Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine is enabled.
2. Get baseline sample app
One of the prerequisites to this codelab is to have a working Module 15 sample app. If you don't have it, you can get it from the Module 15 "START" folder (link below). This codelab walks you through each step, concluding with code that resembles what's in the Module 16 "FINISH" folder.
- START: Module 15 folder (Python 2)
- FINISH: Module 16 folder (Python 2)
- Entire repo (to clone or download ZIP file)
The directory of Module 15 STARTing files should look like this:
$ ls
README.md    app.yaml    main-gcs.py    main.py    templates
The main-gcs.py file is an alternative version of main.py from Module 15 that allows selection of a Cloud Storage bucket other than the app's default, which is named after the project ID: PROJECT_ID.appspot.com. This file plays no part in this (Module 16) codelab, but similar migration techniques can be applied to it if desired.
3. (Re)Deploy baseline app
Your remaining prework steps to execute now:
- Re-familiarize yourself with the gcloud command-line tool
- Re-deploy the sample app with gcloud app deploy
- Confirm the app runs on App Engine without issue
Once you've successfully executed those steps, confirm your Module 15 app works. The initial page greets users with a form prompting for a visit artifact file to upload, along with a "skip" button to opt out:
Once users upload a file or skip, the app renders the familiar "most recent visits" page:
Visits featuring an artifact will have a "view" link to the right of the visit timestamp to display (or download) the artifact. Once you confirm the app's functionality, you're ready to migrate away from App Engine legacy services (webapp2, NDB, Blobstore) to contemporary alternatives (Flask, Cloud NDB, Cloud Storage).
4. Update configuration files
Three configuration files come into play for the updated version of our app. The required tasks are:
- Update the required built-in 3rd-party libraries in app.yaml, and leave the door open to a Python 3 migration
- Add a requirements.txt specifying all required libraries that are not built-in
- Add an appengine_config.py so the app supports both built-in and non-built-in 3rd-party libraries
app.yaml
Edit your app.yaml file by updating the libraries section. Remove jinja2, and add grpcio, setuptools, and ssl. Choose the latest available version for all three libraries. Also add the Python 3 runtime directive, commented out. When you're done, it should look like this (if you selected Python 3.9):
BEFORE:
runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: jinja2
  version: latest
AFTER:
#runtime: python39
runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: latest
- name: setuptools
  version: latest
- name: ssl
  version: latest
The changes primarily deal with the Python 2 built-in libraries available on App Engine servers (so you don't have to self-bundle them). We removed Jinja2 because it comes with Flask, which we're going to add to requirements.txt. Whenever Google Cloud client libraries, such as those for Cloud NDB and Cloud Storage, are used, grpcio and setuptools are needed. Finally, Cloud Storage itself requires the ssl library. The commented-out runtime directive at the top is for when you're ready to port this app to Python 3. We'll cover this topic at the end of this tutorial.
requirements.txt
Add a requirements.txt file specifying the Flask framework and the Cloud NDB and Cloud Storage client libraries, none of which are built-in. Create the file with this content:
flask
google-cloud-ndb
google-cloud-storage
The Python 2 App Engine runtime requires self-bundling of non-built-in 3rd-party libraries, so execute the following command to install these libraries into the lib folder:
pip install -t lib -r requirements.txt
If you have both Python 2 and 3 on your development machine, you may have to use the pip2 command to ensure getting the Python 2 versions of these libraries. Once you upgrade to Python 3, you no longer need to self-bundle.
appengine_config.py
Add an appengine_config.py file supporting built-in and non-built-in 3rd-party libraries. Create the file with this content:
import pkg_resources
from google.appengine.ext import vendor
# Set PATH to your libraries folder.
PATH = 'lib'
# Add libraries installed in the PATH folder.
vendor.add(PATH)
# Add libraries to pkg_resources working set to find the distribution.
pkg_resources.working_set.add_entry(PATH)
The steps just completed should be similar or identical to those listed in the Installing libraries for Python 2 apps section of the App Engine docs; more specifically, the contents of appengine_config.py should match what's in Step 5 there.
The work on configuration files is complete, so let's move ahead to the application.
5. Modify application files
Imports
The first set of changes for main.py involves swapping out everything being replaced. Here is what is changing:
- webapp2 is replaced by Flask
- Instead of using Jinja2 from webapp2_extras, use the Jinja2 that comes with Flask
- App Engine Blobstore and NDB are replaced by Cloud NDB and Cloud Storage
- The Blobstore handlers in webapp are replaced by a combination of the io standard library module, Flask, and werkzeug utilities
- By default, Blobstore writes to a Cloud Storage bucket named after your app's URL (PROJECT_ID.appspot.com). Because we're porting to the Cloud Storage client library, google.auth is used to get the project ID to specify the exact same bucket name. (You can change the bucket name since it isn't hardcoded any more.)
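The bucket-name derivation from that last bullet can be sketched on its own. In this sketch the project ID is a stand-in value; the deployed app gets it from google.auth.default(), which returns a (credentials, project_id) pair:

```python
# Sketch of deriving Blobstore's default bucket name from the project
# ID. 'my-sample-project' is a stubbed value for illustration only.
PROJECT_ID = 'my-sample-project'            # real app: _, PROJECT_ID = google.auth.default()
BUCKET = '%s.appspot.com' % PROJECT_ID      # Blobstore's default bucket name
print(BUCKET)   # my-sample-project.appspot.com
```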
BEFORE:
import webapp2
from webapp2_extras import jinja2
from google.appengine.ext import blobstore, ndb
from google.appengine.ext.webapp import blobstore_handlers
Implement the changes in the list above by replacing the current import section in main.py with the code snippet below.
AFTER:
import io

from flask import (Flask, abort, redirect, render_template,
        request, send_file, url_for)
from werkzeug.utils import secure_filename

import google.auth
from google.cloud import exceptions, ndb, storage
Initialization and unnecessary Jinja2 support
The next block of code to replace is the BaseHandler class specifying the use of Jinja2 from webapp2_extras. This is unnecessary because Jinja2 comes with Flask and is its default templating engine, so remove it.
On the Module 16 side, instantiate objects we didn't have in the older app. This includes initializing the Flask app and creating API clients for Cloud NDB and Cloud Storage. Finally, we put together the Cloud Storage bucket name as described above in the imports section. Here are the before and after implementing these updates:
BEFORE:
class BaseHandler(webapp2.RequestHandler):
    'Derived request handler mixing-in Jinja2 support'
    @webapp2.cached_property
    def jinja2(self):
        return jinja2.get_jinja2(app=self.app)

    def render_response(self, _template, **context):
        self.response.write(self.jinja2.render_template(_template, **context))
AFTER:
app = Flask(__name__)
ds_client = ndb.Client()
gcs_client = storage.Client()
_, PROJECT_ID = google.auth.default()
BUCKET = '%s.appspot.com' % PROJECT_ID
Update Datastore access
Cloud NDB is mostly compatible with App Engine NDB. One difference already covered is the need for an API client. Another is that Cloud NDB requires Datastore access to be controlled by the API client's Python context manager. Essentially, this means all Datastore access calls using the Cloud NDB client library can only occur within Python with blocks.

That's one change; the other is that Blobstore and its objects, e.g., BlobKeys, are not supported by Cloud Storage, so change file_blob to be an ndb.StringProperty instead. Below are the data model class and the updated store_visit() and fetch_visits() functions reflecting these changes:
BEFORE:
class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
    file_blob = ndb.BlobKeyProperty()

def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    Visit(visitor='{}: {}'.format(remote_addr, user_agent),
            file_blob=upload_key).put()

def fetch_visits(limit):
    'get most recent visits'
    return Visit.query().order(-Visit.timestamp).fetch(limit)
AFTER:
class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
    file_blob = ndb.StringProperty()

def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent),
                file_blob=upload_key).put()

def fetch_visits(limit):
    'get most recent visits'
    with ds_client.context():
        return Visit.query().order(-Visit.timestamp).fetch(limit)
Here is a pictorial representation of the changes that have been made so far:
Updating the handlers
Upload handler
Handlers in webapp2 are classes, while in Flask they're functions. Instead of an HTTP verb method, Flask uses the verb to decorate the function. Blobstore and its webapp handlers are replaced by functionality from Cloud Storage as well as Flask and its utilities:
BEFORE:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    'Upload blob (POST) handler'
    def post(self):
        uploads = self.get_uploads()
        blob_id = uploads[0].key() if uploads else None
        store_visit(self.request.remote_addr, self.request.user_agent, blob_id)
        self.redirect('/', code=307)
AFTER:
@app.route('/upload', methods=['POST'])
def upload():
    'Upload blob (POST) handler'
    fname = None
    upload = request.files.get('file', None)
    if upload:
        fname = secure_filename(upload.filename)
        blob = gcs_client.bucket(BUCKET).blob(fname)
        blob.upload_from_file(upload, content_type=upload.content_type)
    store_visit(request.remote_addr, request.user_agent, fname)
    return redirect(url_for('root'), code=307)
Some notes regarding this update:
- Rather than a blob_id, file artifacts are now identified by filename (fname) if present, and None otherwise (the user opted out of uploading a file).
- The Blobstore handlers abstracted away the upload process from their users, but Cloud Storage does not, so you can see the newly-added code that sets the file's blob object and location (bucket) as well as the call that performs the actual upload (upload_from_file()).
- webapp2 uses a routing table at the bottom of the application file, while Flask routes are found in each decorated handler.
- Both handlers wrap up their functionality by redirecting to home (/) while preserving the POST request with an HTTP 307 return code.
Download handler
Updating the download handler follows a similar pattern to the upload handler, only there's much less code to look at. Replace Blobstore and webapp functionality with the Cloud Storage and Flask equivalents:
BEFORE:
class ViewBlobHandler(blobstore_handlers.BlobstoreDownloadHandler):
    'view uploaded blob (GET) handler'
    def get(self, blob_key):
        self.send_blob(blob_key) if blobstore.get(blob_key) else self.error(404)
AFTER:
@app.route('/view/<path:fname>')
def view(fname):
    'view uploaded blob (GET) handler'
    blob = gcs_client.bucket(BUCKET).blob(fname)
    try:
        media = blob.download_as_bytes()
    except exceptions.NotFound:
        abort(404)
    return send_file(io.BytesIO(media), mimetype=blob.content_type)
Notes on this update:
- Again, Flask decorates handler functions with their route while webapp does it in a routing table at the bottom, so recognize the latter's pattern-matching syntax ('/view/([^/]+)?') vs. Flask's ('/view/<path:fname>').
- As with the upload handler, there is a little more work required on the Cloud Storage side for functionality abstracted away by the Blobstore handlers, namely identifying the file (blob) in question and explicitly downloading the binary vs. the Blobstore handler's single send_blob() method call.
- In both cases, an HTTP 404 error is returned to the user if an artifact isn't found.
Main handler
The final changes to the main application take place in the main handler. The webapp2 HTTP verb methods are replaced by a single function combining their functionality. Replace the MainHandler class with the root() function and remove the webapp2 routing table as shown below:
BEFORE:
class MainHandler(BaseHandler):
    'main application (GET/POST) handler'
    def get(self):
        self.render_response('index.html',
                upload_url=blobstore.create_upload_url('/upload'))

    def post(self):
        visits = fetch_visits(10)
        self.render_response('index.html', visits=visits)

app = webapp2.WSGIApplication([
    ('/', MainHandler),
    ('/upload', UploadHandler),
    ('/view/([^/]+)?', ViewBlobHandler),
], debug=True)
AFTER:
@app.route('/', methods=['GET', 'POST'])
def root():
    'main application (GET/POST) handler'
    context = {}
    if request.method == 'GET':
        context['upload_url'] = url_for('upload')
    else:
        context['visits'] = fetch_visits(10)
    return render_template('index.html', **context)
Rather than separate get() and post() methods, they're essentially an if-else statement in root(). Also, because root() is a single function, there's only one call to render the template for both GET and POST, something that isn't really possible in webapp2.
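The combined GET/POST flow can be isolated into a plain, testable function. Note that build_context() and its stubbed fetch argument are hypothetical stand-ins for illustration only; the real handler uses url_for('upload'), fetch_visits(), and render_template():

```python
# Sketch of root()'s combined GET/POST logic: one context dict and a
# single template render at the end. The fetch argument stands in for
# fetch_visits(); the literal '/upload' stands in for url_for('upload').
def build_context(method, fetch=lambda n: ['visit%d' % i for i in range(n)]):
    context = {}
    if method == 'GET':
        context['upload_url'] = '/upload'   # real app: url_for('upload')
    else:
        context['visits'] = fetch(10)       # real app: fetch_visits(10)
    return context      # real app then calls render_template('index.html', **context)

print(sorted(build_context('GET')))         # ['upload_url']
print(sorted(build_context('POST')))        # ['visits']
```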
Here is a pictorial representation of this second and final set of changes to main.py:
(optional) Backwards compatibility "enhancement"
So the solution created above works perfectly... but only if you're starting from scratch and don't have files created by Blobstore. Because we updated the app to identify files by filename instead of BlobKey, the completed Module 16 app as-is won't be able to view Blobstore files. In other words, this migration introduces a backwards-incompatible change. We now present an alternative version of main.py called main-migrate.py (found in the repo) which attempts to bridge this gap.
The first "extension" to support Blobstore-created files is a data model that has a BlobKeyProperty (in addition to a StringProperty for Cloud Storage-created files):
class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
    file_blob = ndb.BlobKeyProperty()  # backwards-compatibility
    file_gcs  = ndb.StringProperty()
The file_blob property will be used to identify Blobstore-created files while file_gcs is for Cloud Storage files. Now when creating new visits, explicitly store a value in file_gcs instead of file_blob, so store_visit() looks a little different:
BEFORE:
def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent),
                file_blob=upload_key).put()
AFTER:
def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent),
                file_gcs=upload_key).put()
When fetching the most recent visits, "normalize" the data before sending it to the template:
BEFORE:
@app.route('/', methods=['GET', 'POST'])
def root():
    'main application (GET/POST) handler'
    context = {}
    if request.method == 'GET':
        context['upload_url'] = url_for('upload')
    else:
        context['visits'] = fetch_visits(10)
    return render_template('index.html', **context)
AFTER:
@app.route('/', methods=['GET', 'POST'])
def root():
    'main application (GET/POST) handler'
    context = {}
    if request.method == 'GET':
        context['upload_url'] = url_for('upload')
    else:
        context['visits'] = etl_visits(fetch_visits(10))
    return render_template('index.html', **context)
Next, confirm the existence of either file_blob or file_gcs (or neither). If a file is available, pick the one that exists and use that identifier (a BlobKey for Blobstore-created files or a filename for Cloud Storage-created files). When we say "Cloud Storage-created files," we mean files created using the Cloud Storage client library. Blobstore also writes to Cloud Storage, but in this case, those would be Blobstore-created files.
Now more importantly, what is this etl_visits() function that's used to normalize or ETL (extract, transform, and load) the data for the end-user? It looks like this:
def etl_visits(visits):
    return [{
        'visitor':   v.visitor,
        'timestamp': v.timestamp,
        'file_blob': v.file_gcs if hasattr(v, 'file_gcs') \
                and v.file_gcs else v.file_blob
    } for v in visits]
It probably looks like what you expected: the code loops through all visits, and for each visit, takes the visitor and timestamp data verbatim, then checks whether file_gcs or file_blob exists, and if so, picks one of them (or None if neither exists).
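Here's a quick, self-contained demonstration of that normalization. FakeVisit is a hypothetical stand-in for the real ndb.Model entities so the sketch runs without Datastore:

```python
# Demo of etl_visits() with plain objects instead of Datastore entities.
class FakeVisit(object):
    'hypothetical stand-in for a Visit entity'
    def __init__(self, visitor, timestamp, file_blob=None, file_gcs=None):
        self.visitor = visitor
        self.timestamp = timestamp
        self.file_blob = file_blob
        self.file_gcs = file_gcs

def etl_visits(visits):
    return [{
        'visitor':   v.visitor,
        'timestamp': v.timestamp,
        'file_blob': v.file_gcs if hasattr(v, 'file_gcs')
                and v.file_gcs else v.file_blob,
    } for v in visits]

visits = [
    FakeVisit('1.2.3.4: agent', 't1', file_gcs='cat.png'),    # Cloud Storage file
    FakeVisit('5.6.7.8: agent', 't2', file_blob='aBlobKey'),  # Blobstore file
    FakeVisit('9.9.9.9: agent', 't3'),                        # no artifact
]
print([v['file_blob'] for v in etl_visits(visits)])
# ['cat.png', 'aBlobKey', None]
```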
Here's an illustration of the differences between main.py and main-migrate.py:
If you're starting from scratch without Blobstore-created files, use main.py, but if you're transitioning and want to support files created by both Blobstore and Cloud Storage, check out main-migrate.py as an example of how to handle scenarios like this and to help you plan migrations for your own apps. When doing complex migrations, special cases are likely to arise, so this example is meant to show a greater affinity for modernizing real apps with real data.
6. Summary/Cleanup
This section wraps up this codelab by deploying the app and verifying that it works as intended and that its output reflects the expected behavior. After app validation, perform any clean-up steps and consider your next steps.
Deploy and verify application
Before redeploying your app, be sure to run pip install -t lib -r requirements.txt to get those self-bundled 3rd-party libraries into the lib folder. If you want to run the backwards-compatible solution, rename main-migrate.py to main.py first. Now run gcloud app deploy, and confirm the app works identically to the Module 15 app. The form screen looks like this:
The most recent visits page looks like this:
Congratulations on completing this codelab, replacing App Engine Blobstore with Cloud Storage, App Engine NDB with Cloud NDB, and webapp2 with Flask. Your code should now match what's in the FINISH (Module 16) folder. The alternative main-migrate.py is also present in that folder.
Python 3 "migration"
The commented-out Python 3 runtime directive at the top of app.yaml is all that's needed to port this app to Python 3. The source code itself is already Python 3 compatible, so no changes are needed there. To deploy this as a Python 3 app, execute the following steps:
- Uncomment the Python 3 runtime directive at the top of app.yaml
- Delete all the other lines in app.yaml
- Delete the appengine_config.py file (unused in the Python 3 runtime)
- Delete the lib folder if it exists (unnecessary with the Python 3 runtime)
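Assuming you chose Python 3.9 earlier, applying those steps reduces the entire app.yaml to the single runtime directive:

```yaml
runtime: python39
```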
Clean up
General
If you are done for now, we recommend you disable your App Engine app to avoid incurring charges. However, if you wish to test or experiment some more, the App Engine platform has a free quota, so as long as you don't exceed that usage tier, you shouldn't be charged. That's for compute, but there may also be charges for relevant App Engine services, so check its pricing page for more information. If this migration involves other Cloud services, those are billed separately. In either case, if applicable, see the "Specific to this codelab" section below.
For full disclosure, deploying to a Google Cloud serverless compute platform like App Engine incurs minor build and storage costs. Cloud Build has its own free quota as does Cloud Storage. Storage of that image uses up some of that quota. However, you might live in a region that does not have such a free tier, so be aware of your storage usage to minimize potential costs. Specific Cloud Storage "folders" you should review include:
- console.cloud.google.com/storage/browser/LOC.artifacts.PROJECT_ID.appspot.com/containers/images
- console.cloud.google.com/storage/browser/staging.PROJECT_ID.appspot.com

The storage links above depend on your PROJECT_ID and LOCation (for example, "us" if your app is hosted in the USA).
On the other hand, if you're not going to continue with this application or other related migration codelabs and want to delete everything completely, shut down your project.
Specific to this codelab
The services listed below are unique to this codelab. Refer to each product's documentation for more information:
- The App Engine Blobstore service falls under Stored Data quotas and limits, so review that as well as the pricing page for legacy bundled services.
- Cloud Storage has a free tier for specific regions; also see its general pricing page for more information.
- The App Engine Datastore service is provided by Cloud Datastore (Cloud Firestore in Datastore mode) which also has a free tier; see its pricing page for more information.
Note that if you migrated from Module 15 to 16, you'll still have data in Blobstore, which is why we include its pricing information above.
Next steps
Beyond this tutorial, other migration modules that focus on moving away from the legacy bundled services to consider include:
- Module 2: migrate from App Engine ndb to Cloud NDB
- Modules 7-9: migrate from App Engine Task Queue push tasks to Cloud Tasks
- Modules 12-13: migrate from App Engine Memcache to Cloud Memorystore
- Modules 18-19: migrate from App Engine Task Queue (pull tasks) to Cloud Pub/Sub
App Engine is no longer the only serverless platform in Google Cloud. If you have a small App Engine app or one that has limited functionality and wish to turn it into a standalone microservice, or you want to break up a monolithic app into multiple reusable components, these are good reasons to consider moving to Cloud Functions. If containerization has become part of your application development workflow, particularly if it consists of a CI/CD (continuous integration/continuous delivery or deployment) pipeline, consider migrating to Cloud Run. These scenarios are covered by the following modules:
- Migrate from App Engine to Cloud Functions: see Module 11
- Migrate from App Engine to Cloud Run: see Module 4 to containerize your app with Docker, or Module 5 to do it without containers, Docker knowledge, or Dockerfiles
Switching to another serverless platform is optional, and we recommend considering the best options for your apps and use cases before making any changes.
Regardless of which migration module you consider next, all Serverless Migration Station content (codelabs, videos, source code [when available]) can be accessed at its open source repo. The repo's README
also provides guidance on which migrations to consider and any relevant "order" of Migration Modules.
7. Additional resources
Codelab issues/feedback
If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:
Migration resources
Links to the repo folders for Module 15 (START) and Module 16 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations which you can clone or download a ZIP file.
| Codelab | Python 2 | Python 3 |
| --- | --- | --- |
| Module 15 | | N/A |
| Module 16 (this codelab) | | (same as Python 2) |
Online resources
Below are online resources which may be relevant for this tutorial:
App Engine Blobstore and Cloud Storage
- App Engine Blobstore service
- Migrating to Cloud Storage client library
- Cloud Storage home page
- Cloud Storage documentation
App Engine platform
- App Engine documentation
- Python 2 App Engine (standard environment) runtime
- Using App Engine built-in libraries on Python 2 App Engine
- Python 3 App Engine (standard environment) runtime
- Differences between Python 2 & 3 App Engine (standard environment) runtimes
- Python 2 to 3 App Engine (standard environment) migration guide
- App Engine pricing and quotas information
- Second generation App Engine platform launch (2018)
- Comparing first & second generation platforms
- Long-term support for legacy runtimes
- Documentation migration samples repo
- Community-contributed migration samples repo
Other Cloud information
- Python on Google Cloud Platform
- Google Cloud Python client libraries
- Google Cloud "Always Free" tier
- Google Cloud SDK (gcloud command-line tool)
- All Google Cloud documentation
Python
- Django and Jinja2 templating systems
- webapp2 web framework
- webapp2 documentation
- webapp2_extras links
- webapp2_extras Jinja2 documentation
- Flask web framework
Videos
- Serverless Migration Station
- Serverless Expeditions
- Subscribe to Google Cloud Tech
- Subscribe to Google Developers
License
This work is licensed under a Creative Commons Attribution 2.0 Generic License.