Module 9: Migrate a Python 2 App Engine Cloud NDB+Tasks app to Python 3 Cloud Datastore+Tasks

This series of codelabs (self-paced, hands-on tutorials) aims to help Google App Engine (Standard) developers modernize their apps by guiding them through a series of migrations. The most significant step is to move away from the original runtime's bundled services, because the next-generation runtimes are more flexible and give users a greater variety of service options. Moving to a newer-generation runtime lets you integrate with Google Cloud products more easily, use a wider range of supported services, and run current language releases.

This codelab further modernizes the sample app from Module 8 (codelab, repo), demonstrating several additional and optional migrations.

You'll learn how to

  • Migrate the sample app to Python 3
  • Migrate from Cloud NDB to Cloud Datastore (same as Module 3)
  • Understand any differences between Cloud Tasks v1 and v2

What you'll need


In the Module 8 codelab, you moved away from the App Engine bundled libraries (ndb and taskqueue) to their modern Google Cloud equivalents, Cloud NDB and Cloud Tasks. However, many developers are also migrating from Python 2 to 3, which is not a trivial jump. App Engine's second-generation runtimes support only Python 3, so you're not just porting between major language releases; you must also account for the differences between App Engine runtimes, which is also non-trivial. This codelab's purpose is to help with that, keeping in mind that the Python 3 versions of the Cloud client libraries continue to evolve while their Python 2 counterparts stay frozen.

This tutorial features these primary migration steps:

  1. Setup/Prework
  2. Update configuration files
  3. Update main application

NOTE: The Cloud NDB to Cloud Datastore migration is identical to that of Module 3 (codelab, repo), so we will not describe its migration details in depth. Refer to the Module 3 tutorial or the code in its repo folder for more details. This migration is optional, however, meaning you can stay on Cloud NDB since it is compatible with both Python 2 and 3.

Before we get going with the main part of the tutorial, let's set up our project, get the code, then deploy the baseline app so we know we started with working code.

1. Setup project

We recommend reusing the same project as the one you used for completing the Module 8 codelab. Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine (app) is enabled.

2. Get baseline sample app

One of the prerequisites for this codelab is a working Module 8 sample app. If you don't have one, we recommend completing the Module 8 tutorial (link above) before moving ahead. Otherwise, if you're already familiar with its contents, you can start by grabbing the Module 8 code below.

Whether you use yours or ours, the Module 8 code is where we'll START. This Module 9 codelab walks you through each step, and when complete, your code should resemble that at the FINISH point (including an optional port from Python 2 to 3).

The directory of Module 8 files (yours or ours) should look like this:

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

If you completed the Module 8 tutorial, you'll also have a lib folder with Flask and its dependencies.

3. (Re)Deploy Module 8 app

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool (if necessary)
  2. (Re)deploy the Module 8 code to App Engine (if necessary)

Once you've successfully executed those steps and confirmed the app is operational, we'll move ahead in this tutorial, starting with the configuration files.

requirements.txt

There are a few changes to requirements.txt from Module 8, all related to Cloud client libraries. Replace google-cloud-ndb with google-cloud-datastore and bump up the version of the google-cloud-tasks package:

Flask==1.1.2
google-cloud-datastore==2.1.0
google-cloud-tasks==2.1.0

We recommend using the latest versions of each library; the version numbers above are the latest at the time of this writing. The code in the FINISH repo folder is updated more frequently and may have newer releases.

app.yaml

The second-generation App Engine runtime does not support the built-in third-party libraries available in 2.x, nor does it support copying non-built-in libraries into your app. The only requirement for third-party packages is to list them in requirements.txt. As a result, the entire libraries section of app.yaml can be deleted.

Also update to a Python 3 runtime, i.e., 3.7 or 3.8, and change all script handlers to auto. The second-generation runtime requires web frameworks to do their own routing, so routes are no longer specified in app.yaml. Your new, abbreviated app.yaml should look like this:

runtime: python38

handlers:
- url: /.*
  script: auto

An additional improvement you can make is to get rid of the handlers: section altogether (especially since script: auto is the only accepted directive regardless of URL path) and replace it with an entrypoint: directive. If you do that, your app.yaml will be even shorter (assuming main.py starts your service):

runtime: python38
entrypoint: python main.py

Check out these pages in the docs to learn more about the entrypoint: directive for app.yaml files:

Shortening app.yaml is optional but allows for easier containerization of your app (see the Module 4 codelab and the 2.x w/Docker, 3.x w/Docker, or 3.x w/Cloud Buildpacks repos).
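With script handlers set to auto (or removed entirely), all routing lives in your web framework rather than in app.yaml. Here's a minimal Flask sketch of what that looks like; the routes mirror the sample app's, but the handler bodies are placeholders:

```python
from flask import Flask

app = Flask(__name__)

# Routes are declared in code, not in app.yaml script handlers.
@app.route('/')
def root():
    # placeholder for the sample app's main page
    return 'home'

@app.route('/trim', methods=['POST'])
def trim():
    # placeholder for the sample app's push task handler
    return ''
```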

Delete appengine_config.py and lib

One of the welcome changes in the second generation of App Engine runtimes is that users no longer need to copy third-party packages into their apps. There are no built-in libraries (per the changes to app.yaml above), no appengine_config.py file, and no lib folder.

There is only one application file, main.py, so all changes in this section affect just that file.

Update imports and initialization

The import section in main.py currently uses Cloud NDB and Cloud Tasks; it should look as follows:

  • BEFORE:
from datetime import datetime
import json
import logging
import time
from flask import Flask, render_template, request
from google.cloud import ndb, tasks

Logging is simplified and enhanced in the second generation runtimes:

  • For a comprehensive logging experience, use Cloud Logging
  • For simple logging, just send to stdout (or stderr) via print()
  • There's no need to use the Python logging module

As such, delete the import of logging and swap google.cloud.ndb with google.cloud.datastore so your import section now looks like this:

  • AFTER:
from datetime import datetime
import json
import time
from flask import Flask, render_template, request
from google.cloud import datastore, tasks

Similarly, switch to instantiating a Datastore client instead of an NDB client to talk to Datastore:

app = Flask(__name__)
ds_client = datastore.Client()
ts_client = tasks.CloudTasksClient()
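The Tasks calls later in main.py reference QUEUE_PATH, the queue's fully qualified name. The client library can build it for you with ts_client.queue_path(); its result is equivalent to the string format sketched below. The project, region, and queue values here are hypothetical placeholders:

```python
PROJECT_ID = 'my-project'    # assumption: your Cloud project ID
REGION_ID  = 'us-central1'   # assumption: your App Engine region
QUEUE_NAME = 'default'       # assumption: the push queue the sample app uses

# Equivalent to ts_client.queue_path(PROJECT_ID, REGION_ID, QUEUE_NAME)
QUEUE_PATH = 'projects/{}/locations/{}/queues/{}'.format(
        PROJECT_ID, REGION_ID, QUEUE_NAME)
```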

Migrate to Cloud Tasks (and Cloud Datastore)

  • BEFORE:
class Visit(ndb.Model):
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

As described in Module 3, Cloud Datastore has neither a data model class nor a way to automatically add a creation timestamp. Its interface has more of a JSON flavor, so delete the Visit class and replace store_visit() with the following:

  • AFTER:
def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    entity = datastore.Entity(key=ds_client.key('Visit'))
    entity.update({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })
    ds_client.put(entity)

The key function is fetch_visits(). Not only does it perform the original query for the latest Visits, it also grabs the timestamp of the last Visit displayed and creates the push task that calls /trim (and thus trim()) to mass-delete the old Visits.

  • BEFORE:
def fetch_visits(limit):
    'get most recent visits and add task to delete older visits'
    with ds_client.context():
        data = Visit.query().order(-Visit.timestamp).fetch(limit)
    oldest = time.mktime(data[-1].timestamp.timetuple())
    oldest_str = time.ctime(oldest)
    logging.info('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return (v.to_dict() for v in data), oldest_str

The primary changes:

  1. Swap out the Cloud NDB query for the Cloud Datastore equivalent; the query styles differ only slightly.
  2. Cloud Datastore doesn't require a context manager, nor does it make you extract entity data (with to_dict()) like Cloud NDB does.
  3. Replace logging calls with print().

After those changes, the code will look like this:

  • AFTER:
def fetch_visits(limit):
    'get most recent visits and add task to delete older visits'
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    data = list(query.fetch(limit=limit))
    oldest = time.mktime(data[-1]['timestamp'].timetuple())
    oldest_str = time.ctime(oldest)
    print('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return data, oldest_str
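One detail worth noting in fetch_visits(): the oldest timestamp is converted to a POSIX float so it can be JSON-serialized into the task body, and trim() later rebuilds a datetime from it. A standalone sketch of that round trip, using a hypothetical timestamp in place of data[-1]['timestamp']:

```python
from datetime import datetime
import time

last_visit = datetime(2021, 1, 2, 3, 4, 5)  # hypothetical last displayed Visit

oldest = time.mktime(last_visit.timetuple())  # POSIX float, JSON-serializable
oldest_str = time.ctime(oldest)               # human-readable, for the log line

# trim() later reconstructs the delete cutoff from the float
cutoff = datetime.fromtimestamp(oldest)
```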

Update (push) task handler

As you can see, the Cloud Tasks code stays the same. The final piece of code to look at is the task handler itself, trim().

  • BEFORE:
@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    with ds_client.context():
        keys = Visit.query(
                Visit.timestamp < datetime.fromtimestamp(oldest)
        ).fetch(keys_only=True)
        nkeys = len(keys)
        if nkeys:
            logging.info('Deleting %d entities: %s' % (
                    nkeys, ', '.join(str(k.id()) for k in keys)))
            ndb.delete_multi(keys)
        else:
            logging.info('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

Like fetch_visits(), the bulk of the changes involve swapping out Cloud NDB code for Cloud Datastore's with just a tweak in query styles, and changing the logging calls to print().

  • AFTER:
@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    query = ds_client.query(kind='Visit')
    query.add_filter('timestamp', '<', datetime.fromtimestamp(oldest))
    query.keys_only()
    keys = list(visit.key for visit in query.fetch())
    nkeys = len(keys)
    if nkeys:
        print('Deleting %d entities: %s' % (
                nkeys, ', '.join(str(k.id) for k in keys)))
        ds_client.delete_multi(keys)
    else:
        print('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200
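The glue between fetch_visits() and trim() is the task payload itself: JSON-encoded bytes created on one side and parsed (via Flask's request.get_json()) on the other. A standalone sketch of that round trip, with a hypothetical timestamp value:

```python
import json

oldest = 1609554245.0  # hypothetical POSIX timestamp

# fetch_visits() side: bytes body for the Cloud Tasks payload
body = json.dumps({'oldest': oldest}).encode()

# trim() side: the equivalent of request.get_json() in the handler
payload = json.loads(body.decode())
cutoff = float(payload.get('oldest'))
```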

The subtle updates you may have missed in the code:

  • Cloud NDB queries allow for a keys-only data fetch:
    • keys = Visit.query().fetch(keys_only=True)
  • Cloud Datastore queries always send back entities, so extracting the keys is required:
    • keys = list(visit.key for visit in query.fetch())
  • Cloud NDB key IDs have a getter method:
    • logging.info('Deleting %d entities: %s' % (nkeys, ', '.join(str(k.id()) for k in keys)))
  • Cloud Datastore key IDs are a property:
    • print('Deleting %d entities: %s' % (nkeys, ', '.join(str(k.id) for k in keys)))

There are no changes to templates/index.html in this nor in the next codelab.

Deploy application

Double-check all your config and app updates, re-deploy, and confirm the app (still) works. You should expect output identical to that of Modules 7 and 8. You just rewired things under the hood, so everything should still work as expected.

If you jumped into this tutorial without doing the Module 7 or 8 codelabs, the app itself doesn't change: it registers all visits to the main web page (/), and once you've visited the site enough times, it tells you that it has deleted all visits older than the tenth:

Module 7 visitme app

That concludes this codelab. Your code should now match what's in the Module 9 repo. Congrats for modernizing your app to Python 3 on the next-gen platform!

Optional: Clean up

What about cleaning up to avoid being billed until you're ready to move on to the next migration codelab? As existing developers, you're likely already up to speed on App Engine's pricing information.

Optional: Disable app

If you're not ready to move on to the next tutorial yet, disable your app to avoid incurring charges; when you're ready for the next codelab, you can re-enable it. While your app is disabled, it won't get any traffic and thus won't incur serving charges. However, you can also be billed for Datastore usage if it exceeds the free quota, so delete enough data to stay under that limit.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, you can shut down your project.

Next steps

Beyond this tutorial, other migration modules to look at include containerizing your App Engine app for Cloud Run; links to the Module 4 and Module 5 codelabs are provided below. If you performed the Cloud Datastore migration in this codelab and are considering a further migration to Cloud Firestore to take advantage of its Firebase features, see Module 6, link also below.

  • Module 4: Migrate to Cloud Run with Docker
    • Containerize your app to run on Cloud Run with Docker
    • This migration allows you to stay on Python 2.
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
    • Containerize your app to run on Cloud Run with Cloud Buildpacks
    • You do not need to know anything about Docker, containers, or Dockerfiles.
    • Requires your app to have already migrated to Python 3 (Buildpacks doesn't support Python 2)
  • Module 6: Migrate to Cloud Firestore
    • Migrate to Cloud Firestore to access Firebase features
    • While Cloud Firestore supports Python 2, this codelab is available only in Python 3.

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 8 (START) and Module 9 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations, which you can clone or download as a ZIP file.

Codelab     Python 2    Python 3
--------    --------    --------
Module 8    code        (n/a)
Module 9    (n/a)       code

App Engine resources

Below are additional resources regarding this specific migration: