Migrating from App Engine Blobstore to Cloud Storage (Module 16)

1. Overview

This series of codelabs (self-paced, hands-on tutorials) aims to help Google App Engine (standard environment) developers modernize their apps. Each tutorial guides users through a series of migrations, primarily moving away from legacy bundled services. The net effect is to make apps more portable, giving users more options and flexibility, such as updating apps to the latest App Engine runtimes or switching to Cloud Functions, Cloud Run, Google Kubernetes Engine, or elsewhere if desired.

This codelab teaches you how to migrate from App Engine Blobstore to Cloud Storage. There are also implicit migrations from webapp2 to Flask, from App Engine NDB to Cloud NDB, and from Python 2 to Python 3.

Refer to any related migration modules for more step-by-step information.

You'll learn how to

  • Replace use of the App Engine Blobstore API/library with the Cloud Storage client library
  • Store user uploads in Cloud Storage instead of the Blobstore service
  • Migrate from App Engine NDB to Cloud NDB and from webapp2 to Flask

What you'll need

Survey

How will you use this tutorial?

Read it through only / Read it and complete the exercises

How would you rate your experience with Python?

Novice Intermediate Proficient

How would you rate your experience with using Google Cloud services?

Novice Intermediate Proficient

2. Background

This codelab starts with the sample app from Module 15 and demonstrates how to migrate from Blobstore (and NDB) to Cloud Storage (and Cloud NDB). The migration process involves replacing dependencies on App Engine's legacy bundled services, which allow you to move your apps to another Cloud serverless platform or other hosting platform if desired.

This migration requires a bit more effort compared to the other migrations in this series. Blobstore has dependencies on the original webapp framework, which is why the sample app uses the webapp2 framework instead of Flask. This tutorial features migrations to Cloud Storage, Cloud NDB, Flask, and Python 3.

The app still registers end-user "visits" and displays the ten most recent, but the previous (Module 15) codelab added new functionality to accommodate Blobstore usage: the app prompts end-users to upload an artifact (a file) corresponding to their "visit." Users can do so or select "skip" to opt out. Regardless of the user's decision, the next page renders the same output as previous incarnations of this app, displaying the most recent visits. One additional twist is that visits with corresponding artifacts feature a "view" link for displaying a visit's artifact. This codelab implements the migrations mentioned earlier while preserving the described functionality.

3. Setup/Prework

Before we get to the main part of the tutorial, let's set up our project, get the code, and deploy the baseline app so we know we're starting with working code.

1. Setup project

If you deployed the Module 15 app already, we recommend reusing that same project (and code). Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine is enabled.

2. Get baseline sample app

One of the prerequisites to this codelab is to have a working Module 15 sample app. If you don't have it, you can get it from the Module 15 "START" folder (link below). This codelab walks you through each step, concluding with code that resembles what's in the Module 16 "FINISH" folder.

The directory of Module 15 STARTing files should look like this:

$ ls
README.md       app.yaml        main-gcs.py     main.py         templates

The main-gcs.py file is an alternative version of Module 15's main.py that lets you select a Cloud Storage bucket other than the default, which is named after the app's assigned URL based on the project ID: PROJECT_ID.appspot.com. This file plays no part in this (Module 16) codelab, other than that similar migration techniques can be applied to it if desired.

3. (Re)Deploy baseline app

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool
  2. Re-deploy the sample app with gcloud app deploy
  3. Confirm the app runs on App Engine without issue

Once you've successfully executed those steps, confirm your Module 15 app works. The initial page greets users with a form prompting for a visit artifact file to upload, along with a "skip" button to opt out:

[screenshot: artifact upload form]

Once users upload a file or skip, the app renders the familiar "most recent visits" page:

[screenshot: most recent visits page]

Visits featuring an artifact will have a "view" link to the right of the visit timestamp to display (or download) the artifact. Once you confirm the app's functionality, you're ready to migrate away from App Engine legacy services (webapp2, NDB, Blobstore) to contemporary alternatives (Flask, Cloud NDB, Cloud Storage).

4. Update configuration files

Three configuration files come into play for the updated version of our app. The required tasks are:

  1. Update the required built-in 3rd-party libraries in app.yaml and leave the door open to a Python 3 migration
  2. Add a requirements.txt that specifies all of the required libraries that are not built-in
  3. Add appengine_config.py so the app supports both built-in and non-built-in 3rd-party libraries

app.yaml

Edit your app.yaml file by updating the libraries section. Remove jinja2 and add grpcio, setuptools, and ssl. Choose the latest version available for all three libraries. Also add the Python 3 runtime directive, but commented out. When you're done, it should look like this (if you selected Python 3.9):

BEFORE:

runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: jinja2
  version: latest

AFTER:

#runtime: python39
runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: latest
- name: setuptools
  version: latest
- name: ssl
  version: latest

The changes primarily deal with the Python 2 built-in libraries available on App Engine servers (so you don't have to self-bundle them). We removed Jinja2 because it comes with Flask, which we're going to add to requirements.txt. Whenever Google Cloud client libraries, such as those for Cloud NDB and Cloud Storage, are used, grpcio and setuptools are needed. Finally, Cloud Storage itself requires the ssl library. The commented-out runtime directive at the top is for when you're ready to port this app to Python 3. We'll cover this topic at the end of this tutorial.

requirements.txt

Add a requirements.txt file, requiring the Flask framework, and the Cloud NDB and Cloud Storage client libraries, none of which are built-in. Create the file with this content:

flask
google-cloud-ndb
google-cloud-storage

The Python 2 App Engine runtime requires self-bundling of non-built-in 3rd-party libraries, so execute the following command to install these libraries into the lib folder:

pip install -t lib -r requirements.txt

If you have both Python 2 and 3 on your development machine, you may have to use the pip2 command to ensure you get the Python 2 versions of these libraries. Once you upgrade to Python 3, you no longer need to self-bundle.

appengine_config.py

Add an appengine_config.py file supporting built-in and non-built-in 3rd-party libraries. Create the file with this content:

import pkg_resources
from google.appengine.ext import vendor

# Set PATH to your libraries folder.
PATH = 'lib'
# Add libraries installed in the PATH folder.
vendor.add(PATH)
# Add libraries to pkg_resources working set to find the distribution.
pkg_resources.working_set.add_entry(PATH)

The steps just completed should be similar or identical to the steps listed on the Installing libraries for Python 2 apps section of the App Engine docs, and more specifically, the contents of appengine_config.py should match what's in Step 5 there.

The work on configuration files is complete, so let's move ahead to the application.

5. Modify application files

Imports

The first set of changes for main.py involves swapping out everything being replaced. Here is what is changing:

  1. webapp2 is replaced by Flask
  2. Instead of using Jinja2 from webapp2_extras, use the Jinja2 that comes with Flask
  3. App Engine Blobstore and NDB are replaced by Cloud NDB and Cloud Storage
  4. The Blobstore handlers in webapp are replaced by a combo of the io standard library module, Flask, and werkzeug utilities
  5. By default, Blobstore writes to a Cloud Storage bucket named after your app's URL (PROJECT_ID.appspot.com). Because we're porting to the Cloud Storage client library, google.auth is used to get the project ID to specify the exact same bucket name. (You can change the bucket name since it isn't hardcoded any more.)

BEFORE:

import webapp2
from webapp2_extras import jinja2
from google.appengine.ext import blobstore, ndb
from google.appengine.ext.webapp import blobstore_handlers

Implement the changes in the list above by replacing the current import section in main.py with the below code snippet.

AFTER:

import io

from flask import (Flask, abort, redirect, render_template,
        request, send_file, url_for)
from werkzeug.utils import secure_filename

import google.auth
from google.cloud import exceptions, ndb, storage

Initialization and unnecessary Jinja2 support

The next block of code to replace is the BaseHandler specifying the use of Jinja2 from webapp2_extras. This is unnecessary because Jinja2 comes with Flask and is its default templating engine, so remove it.

On the Module 16 side, we need to instantiate objects we didn't have in the older app. This includes initializing the Flask app and creating API clients for Cloud NDB and Cloud Storage. Finally, we put together the Cloud Storage bucket name as described above in the imports section. Here are the before and after for these updates:

BEFORE:

class BaseHandler(webapp2.RequestHandler):
    'Derived request handler mixing-in Jinja2 support'
    @webapp2.cached_property
    def jinja2(self):
        return jinja2.get_jinja2(app=self.app)

    def render_response(self, _template, **context):
        self.response.write(self.jinja2.render_template(_template, **context))

AFTER:

app = Flask(__name__)
ds_client = ndb.Client()
gcs_client = storage.Client()
_, PROJECT_ID = google.auth.default()
BUCKET = '%s.appspot.com' % PROJECT_ID

Update Datastore access

Cloud NDB is mostly compatible with App Engine NDB. One difference already covered is the need for an API client. Another is that Cloud NDB requires Datastore access to be controlled by the API client's Python context manager. Essentially, this means all Datastore access calls using the Cloud NDB client library can only occur within Python with blocks.

That's one change; the other is that Blobstore and its objects, e.g., BlobKeys, are not supported by Cloud Storage, so we need to change file_blob to an ndb.StringProperty instead. Below are the data model class and the updated store_visit() and fetch_visits() functions reflecting these changes:

BEFORE:

class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
    file_blob = ndb.BlobKeyProperty()

def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    Visit(visitor='{}: {}'.format(remote_addr, user_agent),
            file_blob=upload_key).put()

def fetch_visits(limit):
    'get most recent visits'
    return Visit.query().order(-Visit.timestamp).fetch(limit)

AFTER:

class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
    file_blob = ndb.StringProperty()

def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent),
                file_blob=upload_key).put()

def fetch_visits(limit):
    'get most recent visits'
    with ds_client.context():
        return Visit.query().order(-Visit.timestamp).fetch(limit)
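Cloud NDB enforces the context requirement at runtime: Datastore calls made outside a context raise an exception. The following toy stand-in (illustration only, not the real google.cloud.ndb API; ToyNdbClient and its methods are invented for this sketch) mimics how the client scopes access to its with block:

```python
from contextlib import contextmanager

class ToyNdbClient:
    'Toy stand-in for google.cloud.ndb.Client (illustration only, not the real API)'
    def __init__(self):
        self._in_context = False

    @contextmanager
    def context(self):
        # Mimic how the real client scopes Datastore access to a with block.
        self._in_context = True
        try:
            yield self
        finally:
            self._in_context = False

    def fetch_visits(self, limit):
        # The real library raises an error outside a context; we mimic that.
        if not self._in_context:
            raise RuntimeError('no NDB context is active')
        return ['visit'] * limit

client = ToyNdbClient()
with client.context():
    visits = client.fetch_visits(2)   # OK: inside the with block
```

Outside the with block, the same call fails, which is why store_visit() and fetch_visits() above wrap all of their Datastore work in ds_client.context().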

Here is a pictorial representation of the changes that have been made so far:

[diagram: configuration and data-model changes made so far]

Updating the handlers

Upload handler

Handlers in webapp2 are classes while they're functions in Flask. Instead of an HTTP verb method, Flask uses the verb to decorate the function. Blobstore and its webapp handlers are replaced by functionality from Cloud Storage as well as Flask and its utilities:

BEFORE:

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    'Upload blob (POST) handler'
    def post(self):
        uploads = self.get_uploads()
        blob_id = uploads[0].key() if uploads else None
        store_visit(self.request.remote_addr, self.request.user_agent, blob_id)
        self.redirect('/', code=307)

AFTER:

@app.route('/upload', methods=['POST'])
def upload():
    'Upload blob (POST) handler'
    fname = None
    upload = request.files.get('file', None)
    if upload:
        fname = secure_filename(upload.filename)
        blob = gcs_client.bucket(BUCKET).blob(fname)
        blob.upload_from_file(upload, content_type=upload.content_type)
    store_visit(request.remote_addr, request.user_agent, fname)
    return redirect(url_for('root'), code=307)

Some notes regarding this update:

  • Rather than a blob_id, file artifacts are now identified by filename (fname) if present, and None otherwise (user opted out of uploading a file).
  • The Blobstore handlers abstracted away the upload process from its users, but Cloud Storage does not, so you can see the newly-added code that sets the file's blob object and location (bucket) as well as the call that performs the actual upload (upload_from_file()).
  • webapp2 uses a routing table at the bottom of the application file while Flask routes are found in each decorated handler.
  • Both handlers wrap up their functionality by redirecting to home ( / ) while preserving the POST request with an HTTP 307 return code.
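The secure_filename() call matters because the uploaded filename becomes the Cloud Storage object name, so path components and other unsafe characters must be stripped first. As a rough sketch of the kind of sanitization it performs (the real werkzeug implementation differs in its details; sanitize_filename here is a hypothetical helper, not part of the app):

```python
import re

def sanitize_filename(filename):
    'Illustrative sanitizer; werkzeug.utils.secure_filename is stricter'
    # Drop any directory components, keeping only the base name.
    filename = filename.replace('\\', '/').rsplit('/', 1)[-1]
    # Replace runs of whitespace with a single underscore.
    filename = re.sub(r'\s+', '_', filename)
    # Keep only a conservative character set.
    filename = re.sub(r'[^A-Za-z0-9_.-]', '', filename)
    # Strip leading dots/dashes so the result can't look like a hidden file.
    return filename.lstrip('.-')

print(sanitize_filename('my report.pdf'))   # my_report.pdf
print(sanitize_filename('../etc/passwd'))   # passwd
```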

Download handler

Updating the download handler follows a similar pattern to the upload handler, only there's much less code to look at. Replace Blobstore and webapp functionality with the Cloud Storage and Flask equivalents:

BEFORE:

class ViewBlobHandler(blobstore_handlers.BlobstoreDownloadHandler):
    'view uploaded blob (GET) handler'
    def get(self, blob_key):
        self.send_blob(blob_key) if blobstore.get(blob_key) else self.error(404)

AFTER:

@app.route('/view/<path:fname>')
def view(fname):
    'view uploaded blob (GET) handler'
    blob = gcs_client.bucket(BUCKET).blob(fname)
    try:
        media = blob.download_as_bytes()
    except exceptions.NotFound:
        abort(404)
    return send_file(io.BytesIO(media), mimetype=blob.content_type)

Notes on this update:

  • Again, Flask decorates handler functions with their route while webapp does it in a routing table at the bottom, so compare the latter's pattern-matching syntax ('/view/([^/]+)?') with Flask's ('/view/<path:fname>').
  • As with the upload handler, there is a little more work required on the Cloud Storage side for functionality abstracted away by the Blobstore handlers, namely identifying the file (blob) in question and explicitly downloading the binary vs. Blobstore handler's single send_blob() method call.
  • In both cases, an HTTP 404 error is returned to the user if an artifact isn't found.

Main handler

The final changes to the main application take place in the main handler. The webapp2 HTTP verb methods are replaced by a single function combining their functionality. Replace the MainHandler class with the root() function and remove the webapp2 routing table as shown below:

BEFORE:

class MainHandler(BaseHandler):
    'main application (GET/POST) handler'
    def get(self):
        self.render_response('index.html',
                upload_url=blobstore.create_upload_url('/upload'))

    def post(self):
        visits = fetch_visits(10)
        self.render_response('index.html', visits=visits)

app = webapp2.WSGIApplication([
    ('/', MainHandler),
    ('/upload', UploadHandler),
    ('/view/([^/]+)?', ViewBlobHandler),
], debug=True)

AFTER:

@app.route('/', methods=['GET', 'POST'])
def root():
    'main application (GET/POST) handler'
    context = {}
    if request.method == 'GET':
        context['upload_url'] = url_for('upload')
    else:
        context['visits'] = fetch_visits(10)
    return render_template('index.html', **context)

Rather than separate get() and post() methods, both verbs are handled by an if-else statement in root(). Also, because root() is a single function, one render_template() call serves both GET and POST, something that isn't really possible in webapp2.

Here is a pictorial representation of this second and final set of changes to main.py:

[diagram: handler changes in main.py]

(optional) Backwards compatibility "enhancement"

So the solution created above works perfectly... but only if you're starting from scratch and don't have files created by Blobstore. Because we updated the app to identify files by filename instead of BlobKey, the completed Module 16 app as-is won't be able to view Blobstore files. In other words, we made a backwards-incompatible change performing this migration. We now present an alternative version of main.py called main-migrate.py (found in the repo) which attempts to bridge this gap.

The first "extension" to support Blobstore-created files is a data model that has a BlobKeyProperty (in addition to a StringProperty for Cloud Storage-created files):

class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)
    file_blob = ndb.BlobKeyProperty()  # backwards-compatibility
    file_gcs  = ndb.StringProperty()

The file_blob property will be used to identify Blobstore-created files while file_gcs is for Cloud Storage files. Now when creating new visits, we need to explicitly store a value in file_gcs instead of file_blob, so store_visit looks a little different:

BEFORE:

def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent),
                file_blob=upload_key).put()

AFTER:

def store_visit(remote_addr, user_agent, upload_key):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent),
                file_gcs=upload_key).put()

When fetching the most recent visits, we need to massage the data before sending it to the template:

BEFORE:

@app.route('/', methods=['GET', 'POST'])
def root():
    'main application (GET/POST) handler'
    context = {}
    if request.method == 'GET':
        context['upload_url'] = url_for('upload')
    else:
        context['visits'] = fetch_visits(10)
    return render_template('index.html', **context)

AFTER:

@app.route('/', methods=['GET', 'POST'])
def root():
    'main application (GET/POST) handler'
    context = {}
    if request.method == 'GET':
        context['upload_url'] = url_for('upload')
    else:
        context['visits'] = etl_visits(fetch_visits(10))
    return render_template('index.html', **context)

The reason is that we need to check whether file_blob or file_gcs exists (or neither). If there is a file available, pick the one that exists and use that identifier (BlobKey for Blobstore-created files or filename for Cloud Storage-created files). When we say "Cloud Storage-created files," we mean files created using the Cloud Storage client library. Blobstore also writes to Cloud Storage, but in that case, those would be Blobstore-created files.

Now more importantly, what is this etl_visits() function that's used to "massage," "normalize," or extract, transform, and load (ETL) our data for the end-user? It looks like this:

def etl_visits(visits):
    return [{
            'visitor': v.visitor,
            'timestamp': v.timestamp,
            'file_blob': v.file_gcs if hasattr(v, 'file_gcs') \
                    and v.file_gcs else v.file_blob
            } for v in visits]

It probably looks like what you expected: the code loops through all visits, and for each visit, takes the visitor and timestamp data verbatim, then checks whether file_gcs or file_blob exists, and if so, picks one of them (or None if neither exists).
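You can see that selection logic in action without touching Datastore by exercising etl_visits() with simple stand-in objects (the SimpleNamespace instances here are hypothetical substitutes for real Visit entities, not part of the app):

```python
from types import SimpleNamespace

def etl_visits(visits):
    'copied from main-migrate.py'
    return [{
            'visitor': v.visitor,
            'timestamp': v.timestamp,
            'file_blob': v.file_gcs if hasattr(v, 'file_gcs') \
                    and v.file_gcs else v.file_blob
            } for v in visits]

# Stand-ins for a Blobstore-era visit and a Cloud Storage visit:
old = SimpleNamespace(visitor='1.2.3.4: curl', timestamp='t1',
                      file_blob='BLOBKEY123', file_gcs=None)
new = SimpleNamespace(visitor='5.6.7.8: firefox', timestamp='t2',
                      file_blob=None, file_gcs='photo.png')

rows = etl_visits([old, new])
print(rows[0]['file_blob'])   # BLOBKEY123 (falls back to the BlobKey)
print(rows[1]['file_blob'])   # photo.png (prefers the Cloud Storage name)
```

Either way, the template receives a single file_blob field, so index.html needs no changes to render both kinds of visits.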

Here's an illustration of the differences between main.py and main-migrate.py:

[diagram: differences between main.py and main-migrate.py]

If you're starting from scratch without Blobstore-created files, use main.py, but if you're transitioning and want to support files created by both Blobstore and Cloud Storage, check out main-migrate.py as an example of how to handle scenarios like this when planning migrations for your own apps. When doing complex migrations, special cases are likely to arise, so this example is meant to more closely reflect modernizing real apps with real data.

6. Summary/Cleanup

Deploy application

Before redeploying your app, be sure to run pip install -t lib -r requirements.txt to get those self-bundled 3rd-party libraries in the lib folder. If you want to run the backwards-compatible solution, rename main-migrate.py as main.py first. Now run gcloud app deploy, and confirm the app works identically to the Module 15 app. The form screen looks like this:

[screenshot: artifact upload form]

The most recent visits page looks like this:

[screenshot: most recent visits page]

Congratulations for completing this codelab replacing App Engine Blobstore with Cloud Storage, App Engine NDB with Cloud NDB, and webapp2 with Flask. Your code should now match what's in the FINISH (Module 16) folder. The alternative main-migrate.py is also present in that folder.

Python 3 "migration"

The commented out Python 3 runtime directive at the top of app.yaml is for when you port this app to Python 3. The source code itself is already Python 3 compatible, so no changes are needed there. To deploy this as a Python 3 app, execute the following steps:

  1. Uncomment the Python 3 runtime directive at the top of app.yaml.
  2. Delete all the other lines in app.yaml.
  3. Delete the appengine_config.py file. (unused in Python 3 runtime)
  4. Delete the lib folder if it exists. (unnecessary with Python 3 runtime)
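After those steps, the entire app.yaml (assuming you selected Python 3.9) reduces to a single line:

```yaml
runtime: python39
```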

Clean up

If you are done for now, we recommend you disable your App Engine app to avoid incurring charges. However, if you want to play around with it a bit more, that is fine too. The App Engine platform has a free quota, and as long as you don't exceed that usage tier, you shouldn't be charged. That covers compute; however, each App Engine service has its own billing schedule as well:

  • The App Engine Blobstore service falls under Stored Data quotas and limits, so review that.
  • The App Engine Datastore service has a free tier as well, but will charge you if you go beyond those limits. See its pricing page for more information.

For full disclosure, deploying to a Google Cloud serverless compute platform like App Engine incurs minor build and storage costs. Cloud Build has its own free quota as does Cloud Storage. Storage of that image uses up some of that quota. However, you might live in a region that does not have such a free tier, so be aware of your storage usage to minimize potential costs.

On the other hand, if you're not going to continue with this application or other related migration codelabs and want to delete everything completely, you can shut down your project.

Next steps

Beyond this tutorial, consider these other migration modules that focus on moving away from the legacy bundled services:

  • Module 2: migrate from App Engine ndb to Cloud NDB
  • Modules 7-9: migrate from App Engine push tasks (taskqueue) to Cloud Tasks
  • Modules 12-13: migrate from App Engine Memcache to Cloud Memorystore

If instead, you're considering moving to one of App Engine's sister serverless platforms, see:

  • Module 11: migrate from App Engine to Cloud Functions
  • Migrate from App Engine to Cloud Run: see Module 4 if you want to containerize your app for Cloud Run with Docker, or Module 5 if you don't do containers or Dockerfiles

Regardless of which migration module you consider next, the repo features all the code samples, links you to all the codelabs and videos available, and also provides guidance on which migrations to consider and any relevant "order" of migrations.

7. Additional resources

Codelab issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 15 (START) and Module 16 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations, which you can clone or download as a ZIP file.

Codelab                     Python 2    Python 3
Module 15                   code        N/A
Module 16 (this codelab)    code        (same as Python 2)

Online resources

Below are online resources which may be relevant for this tutorial:

App Engine general documentation

App Engine Blobstore service and Cloud Storage

Google Cloud

Python

Videos

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.