Module 9: Migrate a Python 2 App Engine Cloud NDB+Tasks app to Python 3 Cloud Firestore+Tasks

1. Overview

This series of codelabs (self-paced, hands-on tutorials) aims to help Google App Engine (standard environment) developers modernize their apps. Each tutorial guides you through a migration, primarily moving away from legacy bundled services. The net effect is to make your apps more portable, giving you more options, such as hosting apps outside of App Engine, updating to the latest App Engine language runtimes, or switching to Cloud Functions or Cloud Run if they are a better fit for your workloads.

This codelab starts with the Module 8 sample app (Python 2, Flask, Cloud NDB, Cloud Tasks) and demonstrates one example "next generation" version of that app. It imagines a scenario where you want to refactor or completely rewrite the app in a language supported by a second generation runtime. The scenario also includes developing a mobile companion app using Firebase, although that is not part of this codelab. In that case it makes sense to move the backend database to Cloud Firestore, which is accessible from Firebase, so both your mobile and web apps can use the same data backend. Since a Cloud project allows only one type of database at this time, this refactor requires another Cloud project.

You'll learn how to

  • Port the app to Python 3
  • Switch to Cloud Firestore for common backend data storage
  • Upgrade to the latest Cloud Tasks client library version

What you'll need

2. Background

The Module 8 codelab leads developers through migrating from App Engine ndb and taskqueue to their standalone Cloud equivalents, Cloud NDB and Cloud Tasks, respectively. As mentioned in the Overview, this (Module 9) tutorial imagines rewriting that app as a brand new one: upgrading from Python 2 to 3, selecting Cloud Firestore as the new backend data storage service, and making any minor changes needed to move from the final version of the Python 2 Cloud Tasks client library to the latest available for Python 3.

This tutorial features the following steps:

  1. Setup/Prework
  2. Update configuration files
  3. Update main application

3. Setup/Prework

Before tackling any code, get your Cloud projects ready, and ensure you have a functioning Module 8 application.

Setup project

For this tutorial, you must use an existing Cloud project for the START app from Module 8 (whether yours [preferred] or ours). You will also need to create a brand new project in the Cloud Console for the new Module 9 refactoring that you'll build as part of this tutorial. Both projects must have active billing accounts (it can be the same account). App Engine and Cloud Tasks must be enabled for both projects, along with Datastore for the first and Firestore for the second. Review and be sure you understand the cost implications of these tutorials. While billing must be enabled, you should be able to complete this tutorial at little to no cost. Instructions for releasing resources to minimize or avoid charges are provided at the end.

The Module 8 app will be deployed to the first project, and the new app (refactor/rewrite) will be in the second/new project. Because Cloud projects can only have one database, and the Module 8 app uses Cloud Datastore, the second project is needed for Module 9 which uses Cloud Firestore. If developing a corresponding Firebase app, you should create the 2nd project in the Firebase console then access that project via the Cloud console to enable Cloud Tasks.

Get baseline sample app

Use the code you had at the end of completing the Module 8 codelab. If you don't have it but have familiarized yourself with it, you can grab the Module 8 code in its repo folder. Whether you use yours or ours, the Module 8 code is where we'll START. This codelab walks you through each step, and when complete, you should arrive at a working Module 9 application. The completed Module 9 repo folder is also included so you can compare yours with ours.

Your Module 8 folder should contain the following files (and possibly a lib folder):

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

(Re)Deploy Module 8 app

Deploy this app to App Engine in your first project. If this is the first time you're doing it or this project is (also) new, enable Cloud Datastore and Cloud Tasks (in the API manager) as well as App Engine (on the App Engine dashboard page). For convenience, you can also enable all 3 on the command-line if you've installed the Cloud SDK and its gcloud command:

gcloud services enable appengine.googleapis.com cloudtasks.googleapis.com datastore.googleapis.com

Switch to this project on the command-line then deploy this application with these commands, substituting your Module 8 Cloud project ID in for MOD8_PROJ_ID:

gcloud config set project MOD8_PROJ_ID
gcloud app deploy

Your app should look and function just like the same apps in both Modules 7 and 8, a web app that tracks visits, and possibly showing old visits to delete:

[Screenshot: the visitme app]

Now we're ready to refactor/rewrite this as a new Python 3 app using the latest tools Google Cloud has to offer, starting with the configuration files.

4. Update configuration files

requirements.txt

The new requirements.txt is nearly the same as the one for Module 8, with only a few necessary changes to the Cloud client libraries specifications:

  1. Replace google-cloud-ndb with google-cloud-firestore
  2. Bump up to the latest versions of all the Cloud client libraries
  3. The final version of Flask that works for Python 2 is 1.1.2. You can select the latest for Python 3 if you wish, or specify flask>=1.1.2 if you want the new app to work for Python 2 and 3.

flask==1.1.2
google-cloud-firestore==2.3.4
google-cloud-tasks==2.7.0

We recommend using the latest versions of each library; the version numbers above are the latest at the time of this writing. The code in the FINISH repo folder will likely have newer releases.

app.yaml

The second generation App Engine runtime does not support built-in 3rd-party libraries like in 2.x nor does it support copying of non-built-in libraries. The only requirement for 3rd-party packages is to list them in requirements.txt. As a result, the entire libraries section of app.yaml can be deleted.

Another update is that the Python 3 runtime requires use of web frameworks that do their own routing. As a result, all script handlers must be changed to auto. However, once every handler uses auto, listing handlers at all becomes redundant, so let's remove those too. Your new, abbreviated app.yaml should look like this:

runtime: python38

Shortening app.yaml is optional but allows for easier containerization of your app (see the Module 4 codelab and the 2.x w/Docker, 3.x w/Docker, or 3.x w/Cloud Buildpacks repos).

Delete appengine_config.py and lib

One of the welcome changes in the second generation of App Engine runtimes is that you no longer need to copy 3rd-party packages yourself. With no built-in libraries to declare (per the changes to app.yaml above), neither the appengine_config.py file nor the lib folder is needed, so delete both.

5. Update application files

There is only one application file, main.py, so all changes in this section affect just that file. Below is a "diffs" illustration of the overall changes needed to refactor the existing code into the new app. You are not expected to read the code line-by-line, as its purpose is simply to give a pictorial overview of what's required in this refactor (but feel free to open it in a new tab or download and zoom in if desired).

[Figure: diffs of main.py between Module 8 and Module 9]

Let's tackle these one section at a time, starting at the top.

Update imports and initialization

The import section in main.py for Module 8 uses Cloud NDB and Cloud Tasks; it should look as follows:

  • BEFORE:
from datetime import datetime
import json
import logging
import time
from flask import Flask, render_template, request
import google.auth
from google.cloud import ndb, tasks

app = Flask(__name__)
ds_client = ndb.Client()
ts_client = tasks.CloudTasksClient()

Logging is simplified and enhanced in the second generation runtimes like Python 3:

  • For comprehensive logging experience, use Cloud Logging
  • For simple logging, just send to stdout (or stderr) via print()
  • There's no need to use the Python logging module (so remove it)
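As a minimal sketch of the second option, the helpers below write to stdout and stderr with print(). The names log_info() and log_error() are hypothetical and not part of the sample app, which simply calls print() directly.

```python
import sys


def log_info(msg):
    'hypothetical helper: write an informational message to stdout'
    print(msg)


def log_error(msg):
    'hypothetical helper: write an error message to stderr'
    print(msg, file=sys.stderr)


if __name__ == '__main__':
    log_info('Delete entities older than Sat Jan  2 03:04:05 2021')
    log_error('something went wrong')
```

Wrapping print() like this is optional; it only makes it easier to swap in Cloud Logging later if you decide you need it.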

As such, delete the import of logging and swap google.cloud.ndb with google.cloud.firestore. Similarly, swap to a Firestore client (vs. NDB) so the top of your new app now looks like this:

  • AFTER:
from datetime import datetime
import json
import time
from flask import Flask, render_template, request
import google.auth
from google.cloud import firestore, tasks

app = Flask(__name__)
fs_client = firestore.Client()
ts_client = tasks.CloudTasksClient()

Those are the required changes for initialization. Now to dig into the main application code.

Migrate to Cloud Firestore

Both App Engine ndb and Cloud NDB require a data model (class); for this app, it's Visit. The store_visit() function works the same in all other migration modules: it registers a visit by creating a new Visit record, saving a visiting client's IP address and user agent (browser type).

  • BEFORE:
class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

Neither Cloud Datastore nor Cloud Firestore uses a data model class, so remove it. Also, neither automatically creates a timestamp when records are created, requiring the developer to do it manually. Making this change using the Firestore client library, we arrive at this store_visit() replacement for your new/updated app:

  • AFTER:
def store_visit(remote_addr, user_agent):
    'create new Visit document in Firestore'
    doc_ref = fs_client.collection('Visit')
    doc_ref.add({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })

The key function is fetch_visits(). Not only does it perform the original query for the latest Visits, but it also grabs the timestamp of the last Visit displayed and creates the push task that calls /trim (thus trim()) to mass-delete the old Visits. Here it is using Cloud NDB:

  • BEFORE:
def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    with ds_client.context():
        data = Visit.query().order(-Visit.timestamp).fetch(limit)
    oldest = time.mktime(data[-1].timestamp.timetuple())
    oldest_str = time.ctime(oldest)
    logging.info('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return (v.to_dict() for v in data), oldest_str

The primary changes:

  1. Swap out the Cloud NDB query for the Cloud Firestore equivalent; the query styles differ slightly.
  2. Firestore doesn't require use of a context manager like Cloud NDB does. You still extract each document's data with to_dict(), but here the results are collected into a list right away.
  3. Replace logging calls with print()

After those changes, fetch_visits() looks like this:

  • AFTER:
def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    visits_ref = fs_client.collection('Visit')
    visits = list(v.to_dict() for v in visits_ref.order_by('timestamp',
            direction=firestore.Query.DESCENDING).limit(limit).stream())
    oldest = time.mktime(visits[-1]['timestamp'].timetuple())
    oldest_str = time.ctime(oldest)
    print('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return visits, oldest_str
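To make the task payload more concrete, here is a small sketch of the round trip that the oldest value takes: fetch_visits() turns a timestamp into a float and JSON-encodes it, and the task handler later decodes it back into a datetime. The helper names encode_oldest() and decode_oldest() are hypothetical, json.loads() stands in for Flask's request.get_json(), and the sketch assumes a naive, local-time datetime.

```python
from datetime import datetime
import json
import time


def encode_oldest(timestamp):
    'hypothetical helper: build the task body the way fetch_visits() does'
    oldest = time.mktime(timestamp.timetuple())   # float seconds since the epoch
    return json.dumps({'oldest': oldest}).encode()


def decode_oldest(body):
    'hypothetical helper: recover the cutoff datetime the way trim() does'
    oldest = float(json.loads(body.decode()).get('oldest'))
    return datetime.fromtimestamp(oldest)


stamp = datetime(2021, 1, 2, 3, 4, 5, 678901)
body = encode_oldest(stamp)
cutoff = decode_oldest(body)
print(cutoff)   # 2021-01-02 03:04:05
```

Note that timetuple() carries no microseconds, so the cutoff that reaches the handler is truncated to whole seconds.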

This would normally be all that's necessary. Unfortunately there's one major issue.

(Possibly) create a new (push) queue

In Module 7, we added use of App Engine taskqueue to the existing Module 1 app. One key benefit of having push tasks as a legacy App Engine feature is that a "default" queue is automatically created. When that app was migrated to Cloud Tasks in Module 8, that default queue was already there, so we still didn't need to be concerned about it then. That changes here in Module 9.

Not only have we divested from App Engine taskqueue in favor of Cloud Tasks, but now we have a brand new project where that default queue doesn't exist nor will it be automatically created for us. As written, creating a task in fetch_visits() (for a non-existing queue) will fail. You will need a new function to check whether the ("default") queue exists, creating one if it doesn't exist. In our app, call it _create_queue_if(). Here is the function to add above fetch_visits():

def _create_queue_if():
    'app-internal function creating default queue if it does not exist'
    try:
        ts_client.get_queue(name=QUEUE_PATH)
    except Exception as e:
        if 'does not exist' in str(e):
            ts_client.create_queue(parent=PATH_PREFIX,
                    queue={'name': QUEUE_PATH})
    return True

The Cloud Tasks create_queue() function takes the queue's parent path, that is, the full pathname of the queue minus its trailing queues/QUEUE_NAME portion. We therefore need another variable, PATH_PREFIX, derived from QUEUE_PATH. Up at the constants declarations, add PATH_PREFIX so all the constant assignments look like this:

_, PROJECT_ID = google.auth.default()
REGION_ID = 'REGION_ID'    # replace w/your own
QUEUE_NAME = 'default'     # replace w/your own
QUEUE_PATH = ts_client.queue_path(PROJECT_ID, REGION_ID, QUEUE_NAME)
PATH_PREFIX = QUEUE_PATH.rsplit('/', 2)[0]
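The following sketch shows what the rsplit() call produces. It does not use the Cloud Tasks library: queue_path() here is a local stand-in that assumes the usual projects/.../locations/.../queues/... resource name format, and the project and region values are made up for illustration.

```python
def queue_path(project, location, queue):
    'local stand-in for ts_client.queue_path() (assumed resource name format)'
    return 'projects/{}/locations/{}/queues/{}'.format(project, location, queue)


# hypothetical project ID and region, for illustration only
QUEUE_PATH = queue_path('my-project', 'us-central1', 'default')
PATH_PREFIX = QUEUE_PATH.rsplit('/', 2)[0]   # drop the last two segments

print(QUEUE_PATH)    # projects/my-project/locations/us-central1/queues/default
print(PATH_PREFIX)   # projects/my-project/locations/us-central1
```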

Okay, now that _create_queue_if() works, modify the last line in fetch_visits() to create the queue if necessary and then create the task:

    if _create_queue_if():
        ts_client.create_task(parent=QUEUE_PATH, task=task)
    return visits, oldest_str

Both _create_queue_if() and fetch_visits() should now look like this in aggregate:

def _create_queue_if():
    'app-internal function creating default queue if it does not exist'
    try:
        ts_client.get_queue(name=QUEUE_PATH)
    except Exception as e:
        if 'does not exist' in str(e):
            ts_client.create_queue(parent=PATH_PREFIX,
                    queue={'name': QUEUE_PATH})
    return True

def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    visits_ref = fs_client.collection('Visit')
    visits = list(v.to_dict() for v in visits_ref.order_by('timestamp',
            direction=firestore.Query.DESCENDING).limit(limit).stream())
    oldest = time.mktime(visits[-1]['timestamp'].timetuple())
    oldest_str = time.ctime(oldest)
    print('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    if _create_queue_if():
        ts_client.create_task(parent=QUEUE_PATH, task=task)
    return visits, oldest_str
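Matching on the text of an exception message can be brittle. One possible variation, sketched here under assumptions, catches a specific exception type instead. To keep the sketch runnable without a Cloud project, NotFound and FakeTasksClient are hypothetical stand-ins; with the real client library the analogous exception would presumably be google.api_core.exceptions.NotFound, which you should verify against your installed version.

```python
class NotFound(Exception):
    'stand-in for the "queue not found" error raised by the real client'


class FakeTasksClient(object):
    'hypothetical in-memory stand-in for tasks.CloudTasksClient'
    def __init__(self):
        self.queues = {}
        self.created = 0

    def get_queue(self, name):
        if name not in self.queues:
            raise NotFound('Queue does not exist: %s' % name)
        return self.queues[name]

    def create_queue(self, parent, queue):
        self.queues[queue['name']] = queue
        self.created += 1
        return queue


def create_queue_if(client, queue_path, path_prefix):
    'return True once the queue exists, creating it only when the lookup fails'
    try:
        client.get_queue(name=queue_path)
    except NotFound:
        client.create_queue(parent=path_prefix, queue={'name': queue_path})
    return True


client = FakeTasksClient()
prefix = 'projects/my-project/locations/us-central1'   # made-up values
path = prefix + '/queues/default'
print(create_queue_if(client, path, prefix))   # True (queue created)
print(create_queue_if(client, path, prefix))   # True (queue already there)
print(client.created)                          # 1
```

Unlike the version in the app, any other exception propagates to the caller here rather than being silently ignored; whether that is preferable depends on how you want failures to surface.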

Other than having to add this extra code, the rest of the Cloud Tasks code is mostly intact from Module 8. The final piece of code to look at is the task handler.

Update (push) task handler

In the task handler, trim(), the Cloud NDB code queries for visits older than the oldest displayed. It uses a keys-only query to speed things up—why fetch all the data if you only need the Visit IDs? Once you have all the visit IDs, delete them all in a batch with Cloud NDB's delete_multi() function.

  • BEFORE:
@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    with ds_client.context():
        keys = Visit.query(
                Visit.timestamp < datetime.fromtimestamp(oldest)
        ).fetch(keys_only=True)
        nkeys = len(keys)
        if nkeys:
            logging.info('Deleting %d entities: %s' % (
                    nkeys, ', '.join(str(k.id()) for k in keys)))
            ndb.delete_multi(keys)
        else:
            logging.info(
                    'No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

Like fetch_visits(), the bulk of the changes involve swapping out Cloud NDB code for Cloud Firestore's with just a tweak in query styles and removing use of its context manager, and changing the logging calls to print().

  • AFTER:
@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    query = fs_client.collection('Visit')
    visits = query.where('timestamp', '<',
            datetime.fromtimestamp(oldest)).stream()
    deleted_visit_ids = []
    for visit in visits:
        visit.reference.delete()
        deleted_visit_ids.append(visit.id)
    dlist = ', '.join(str(v_id) for v_id in deleted_visit_ids)
    if dlist:
        print('Deleting %d entities: %s' % (dlist.count(',')+1, dlist))
    else:
        print('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

The subtleties that may not be apparent:

  • Cloud NDB's queries allow for a keys-only data fetch, as in keys = Visit.query().fetch(keys_only=True), whereas the Firestore code here fetches the full documents.
  • You don't delete Firestore documents by ID but through their references: visit.reference.delete()
  • NDB key IDs are retrieved with a getter method, as in str(k.id()) for k in keys, while Firestore document IDs are an attribute: visit.id

Before wrapping up, there's one significant improvement in efficiency we can make. The Module 8 trim() logs a CSV string of all the visits deleted (by ID), but the Module 9 version above is a gross memory hog if a lot of visits need deleting, specifically the deleted_visit_ids list. Module 8 used a generator expression to conserve memory (lists require all objects to be in RAM), so let's break out the visit deletion code to a utility generator called _delete_docs() and replace use of a list with an iterator in trim() that more closely resembles what we had in Module 8:

def _delete_docs(visits):
    'app-internal generator deleting old FS visit documents'
    for visit in visits:
        visit.reference.delete()
        yield visit.id

@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    query = fs_client.collection('Visit')
    visits = query.where('timestamp', '<',
            datetime.fromtimestamp(oldest)).stream()
    dlist = ', '.join(str(v_id) for v_id in _delete_docs(visits))
    if dlist:
        print('Deleting %d entities: %s' % (dlist.count(',')+1, dlist))
    else:
        print('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

There are no changes to the main application handler root().

Port to Python 3?

This sample app was designed to run on both Python 2 and 3 without any porting nor compatibility libraries required.

Upgrade to the latest Cloud Tasks client library?

The final version of the Cloud Tasks client library supporting Python 2 is 1.5.0. At the time of this writing, the latest version of the client library for Python 3 (2.7.0) is compatible with 1.5.0 for the calls this app makes, so no code updates are required and this step is done too.

No changes are needed in the HTML template file, templates/index.html, either, so this wraps up all the necessary updates. Congratulations on arriving at your new Module 9 application!

6. Summary/Cleanup

Deploy application

Deploying the new app requires a second project. If you intend to create a companion mobile app, we recommend creating this project from the Firebase console and enabling Cloud Firestore as your backend database. Then go to the Cloud console to enable Cloud Tasks and App Engine. (If not developing a mobile app, you can do everything from the Cloud console.) For convenience, you can also enable all three on the command-line if you've installed the Cloud SDK and its gcloud command:

gcloud services enable appengine.googleapis.com cloudtasks.googleapis.com firestore.googleapis.com

Now switch to the new project on the command-line then deploy this application with these commands, substituting your Module 9 Cloud project ID in for MOD9_PROJ_ID:

gcloud config set project MOD9_PROJ_ID
gcloud app deploy

As you merely rewired things under the hood, this app should operate identically to your Module 7 and 8 apps:

[Screenshot: Module 7 visitme app]

This step completes the codelab. We invite you to compare your code to what's in the Module 9 folder. Congratulations!

Optional: Clean up and/or disable app

In this tutorial, you used a pair of Cloud projects, both with billable product usage:

Project 1

  • App Engine
  • Cloud Datastore
  • Cloud Tasks

Project 2

  • App Engine
  • Cloud Firestore
  • Cloud Tasks

If you're not ready to go to the next tutorial yet, disable both apps to avoid incurring charges. When you're ready to move to the next codelab, you can re-enable either one. While your apps are disabled, they won't receive any traffic and so won't incur charges for it. However, Datastore usage (first project) or Firestore usage (second project) may still be billable if either exceeds its free quota, so delete enough data to fall under that limit. Cloud Tasks also has a free tier, although it is not listed on the free quotas page.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, shut down your Cloud projects.

Next steps

Beyond this tutorial, other migration modules to look at include containerizing your App Engine app for Cloud Run; the Module 4 and Module 5 codelabs are listed below. If you have other apps still using Cloud Datastore and are considering a migration to Cloud Firestore to take advantage of its Firebase features, see Module 6.

  • Module 4: Migrate to Cloud Run with Docker
      • Containerize your app to run on Cloud Run with Docker
      • This migration allows you to stay on Python 2.
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
      • Containerize your app to run on Cloud Run with Cloud Buildpacks
      • You do not need to know anything about Docker, containers, or Dockerfiles.
      • Requires your app to have already migrated to Python 3 (Buildpacks doesn't support Python 2)

7. Additional resources

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 8 (START) and Module 9 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations, which you can clone or download as a ZIP file.

Codelab     Python 2    Python 3
Module 8    code        (n/a)
Module 9    (n/a)       code

Online resources

Below are online resources which may be relevant for this tutorial:

App Engine

Cloud NDB

Cloud Firestore

Cloud Tasks

Other Cloud information