Migrate a Python 2 App Engine Cloud NDB+Tasks app to Python 3 Cloud Datastore+Tasks (Module 9)

This series of codelab tutorials aims to help App Engine developers modernize their apps. The most significant step is to move away from original runtime bundled services because they're not supported by the next generation runtimes. This codelab helps users migrate from App Engine push tasks and its taskqueue API/library to Google Cloud Tasks. The sample app...

  • Originates from the same App Engine Cloud NDB & Cloud Tasks sample app from Module 8 (codelab, repo)
  • Applies the same Cloud NDB to Cloud Datastore migration as Module 3 (codelab, 2.x or 3.x repos)

If you haven't done either tutorial above, we invite you to do at least the Module 8 codelab before continuing, as that is the codebase we'll start from. We'll migrate it from the first-generation Python 2 App Engine runtime to the second-generation Python 3 App Engine runtime, and from Cloud NDB & Cloud Tasks v1 to Cloud Datastore & Cloud Tasks v2. (If you don't wish to complete the Cloud NDB to Datastore migration, that's not a problem as it is optional; you can stay on Cloud NDB since it is Python 3-compatible.)

You'll learn how to

  • Migrate to Python 3
  • Use Cloud Tasks (vs. App Engine taskqueue)
  • Migrate from Cloud NDB to Cloud Datastore (same as Module 3)

What you'll need

Survey

How will you use this codelab?

  • Only read through it
  • Read it and complete the exercises

In the previous codelab, you moved away from the App Engine ndb & taskqueue libraries to their modern Google Cloud equivalents, Cloud NDB & Cloud Tasks. However, many developers are also migrating from Python 2 to 3, which is not a trivial jump. Similarly, App Engine's second-generation runtimes support only Python 3, so you're not just porting between major language releases; you must also account for the differences between App Engine runtimes, also non-trivial. This codelab's purpose is to help with all of that, keeping in mind that the 3.x Google Cloud client libraries continue to evolve while their 2.x counterparts stay frozen.

This tutorial's migration features these primary steps:

  1. Setup/Prework
  2. Update configuration files
  3. Update main application

NOTE: We will also migrate from Cloud NDB to Cloud Datastore at the same time. Since this migration is identical to that of Module 3 (codelab, repo), we will not describe its migration details in-depth; refer to the Module 3 tutorial or code for more details. As mentioned above, this migration is optional. (You can stay on Cloud NDB as it is Python 3-compatible.)

There are 3 objectives in this part of the tutorial:

  1. Setup project/application
  2. Download baseline sample app
  3. (Re)Familiarize yourself w/gcloud commands

Setup project/application

We recommend reusing the same project as the one you used for completing the Module 8 codelab. Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine (app) is enabled.

Download baseline sample app

One of the prerequisites to this codelab is to have a working Module 8 sample app. If you don't have one, we recommend completing the Module 8 tutorial (link above) before moving ahead here. Otherwise, if you're already familiar with its contents, you can start by grabbing the Module 8 code below.

Whether you use yours or ours, the Module 8 code is where we'll START. This Module 9 codelab walks you through each step, and when complete, your code should resemble that at the FINISH point (including the optional port from Cloud NDB to Cloud Datastore).

The directory of Module 8 files (yours or ours) should look like this:

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

If you completed the Module 8 tutorial, you'll also have a lib folder with Flask and its dependencies.

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool (if necessary)
  2. (Re)deploy the Module 8 code to App Engine (if necessary)

Once you've successfully executed those steps and confirmed the app is operational, we'll move ahead in this tutorial, starting with the configuration files.

requirements.txt

There are a few changes to requirements.txt from Module 8, all related to Cloud client libraries. Replace google-cloud-ndb with google-cloud-datastore and add the google-cloud-tasks package.

At the time of this writing, Datastore is on 2.0.1 while Tasks is on 2.0.0. (The requirements.txt file in the repo will have the latest versions.) This differs from 2.x where google-cloud-datastore has been frozen at 1.15.3 while google-cloud-tasks is pinned at 1.5.0. Your requirements.txt will look something like this after those updates:

Flask==1.1.2
google-cloud-datastore==2.0.1
google-cloud-tasks==2.0.0

app.yaml

The second-generation App Engine runtime does not support built-in third-party libraries like 2.x does, nor does it support bundling/vendoring of non-built-in libraries. The only requirement for third-party packages is to list them in requirements.txt. As a result, the entire libraries section of app.yaml can be deleted.
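For reference, here's a sketch of the kind of built-in libraries section to delete. The exact entries depend on your Module 8 app.yaml; grpcio and setuptools (commonly required by Cloud client libraries on the 2.x runtime) are shown as assumptions:

```yaml
# DELETE this entire section from the Module 8 app.yaml
libraries:
- name: grpcio
  version: latest
- name: setuptools
  version: latest
```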

Also update to a Python 3 runtime, i.e., 3.7 or 3.8, and change all script handlers to auto. The second-generation runtime requires web frameworks to do their own routing, so routes are no longer specified in app.yaml. Your new, abbreviated app.yaml should look like this:

runtime: python38

handlers:
- url: /.*
  script: auto

An additional improvement you can make is to get rid of the handlers: section altogether (especially since script: auto is the only accepted directive regardless of URL path) and replace it with an entrypoint: directive. If you do that, your app.yaml will be even shorter (assuming main.py starts your service):

runtime: python38
entrypoint: python main.py
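With an entrypoint: of python main.py, main.py itself must start a web server listening on the port App Engine supplies via the PORT environment variable (8080 by default when running locally). A minimal sketch of what the bottom of main.py would look like, with the Flask-specific part shown as a comment:

```python
import os

# When app.yaml uses `entrypoint: python main.py`, the app must bind to the
# port App Engine provides in the PORT environment variable. With Flask,
# main.py would end with:
#
#     if __name__ == '__main__':
#         app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
#
# The same PORT lookup, runnable on its own:
port = int(os.environ.get('PORT', 8080))
```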

Check out these pages in the docs to learn more about the entrypoint: directive for app.yaml files:

Shortening app.yaml is optional but allows for easier containerization of your app (see the Module 4 codelab and the 2.x w/Docker, 3.x w/Docker, or 3.x w/Cloud Buildpacks repos).

Delete appengine_config.py and lib

One of the welcome changes in the second generation of App Engine runtimes is that bundling/vendoring of third-party packages is no longer required. With no built-in libraries (per the changes to app.yaml above), there is no need for an appengine_config.py file nor a lib folder, so delete both.

Update imports and initialization

The current app uses Cloud NDB which we'll shortly change to Cloud Datastore.

  • BEFORE:
from datetime import datetime
import json
import logging
import time
from flask import Flask, render_template, request
from google.cloud import ndb, tasks

Logging is simplified and enhanced in the second generation runtimes:

  • For a comprehensive logging experience, use Cloud Logging
  • For simple logging, just send to stdout (or stderr) via print()
  • There's no need to use the Python logging module

As such, delete the import of logging and swap google.cloud.ndb with google.cloud.datastore so your import section now looks like this:

  • AFTER:
from datetime import datetime
import json
import time
from flask import Flask, render_template, request
from google.cloud import datastore, tasks

Similarly, instantiate a Datastore client instead of an NDB client to talk to Datastore:

app = Flask(__name__)
ds_client = datastore.Client()
ts_client = tasks.CloudTasksClient()
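The create_task() calls below reference QUEUE_PATH, which isn't shown in these snippets. It's the fully qualified queue name, normally built with ts_client.queue_path(). A minimal sketch of its structure, with hypothetical project, region, and queue values:

```python
# Hypothetical values -- substitute your own project ID, region, and queue name.
PROJECT_ID = 'my-project'
REGION_ID = 'us-central1'
QUEUE_NAME = 'default'

# Equivalent to: QUEUE_PATH = ts_client.queue_path(PROJECT_ID, REGION_ID, QUEUE_NAME)
QUEUE_PATH = 'projects/%s/locations/%s/queues/%s' % (
        PROJECT_ID, REGION_ID, QUEUE_NAME)
```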

Migrate to Cloud Tasks (and Cloud Datastore)

  • BEFORE:
class Visit(ndb.Model):
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

As described in Module 3, Datastore does not have a data model class nor a way to automatically add a creation timestamp. It has more of a JSON flavor to it, so delete the Visit class and replace store_visit() with the following:

  • AFTER:
def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    entity = datastore.Entity(key=ds_client.key('Visit'))
    entity.update({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })
    ds_client.put(entity)

The key function is fetch_visits(). Not only does it perform the original query for the latest Visits, but it also grabs the timestamp of the last Visit displayed and creates the push task that calls /trim (and thus trim()) to mass-delete the old Visits.

  • BEFORE:
def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    with ds_client.context():
        data = Visit.query().order(-Visit.timestamp).fetch(limit)
    oldest = time.mktime(data[-1].timestamp.timetuple())
    oldest_str = time.ctime(oldest)
    logging.info('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return (v.to_dict() for v in data), oldest_str

The primary changes:

  1. Swap out the Cloud NDB query for the Cloud Datastore equivalent; the query styles differ only slightly.
  2. Datastore doesn't require use of a context manager nor does it make you extract its data (with to_dict()) like Cloud NDB does.
  3. Replace logging calls with print()

After those changes, the code will look like this:

  • AFTER:
def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    data = list(query.fetch(limit=limit))
    oldest = time.mktime(data[-1]['timestamp'].timetuple())
    oldest_str = time.ctime(oldest)
    print('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return data, oldest_str
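The oldest timestamp travels to the task handler as epoch seconds (a float) produced by time.mktime(). A quick stdlib sketch of that conversion and its inverse, using a hypothetical timestamp:

```python
from datetime import datetime
import time

# Hypothetical timestamp of the last Visit displayed
last_visit = datetime(2023, 5, 1, 12, 0, 0)

oldest = time.mktime(last_visit.timetuple())   # float epoch seconds for the task body
oldest_str = time.ctime(oldest)                # human-readable form for logging

# trim() reverses the conversion with datetime.fromtimestamp()
assert datetime.fromtimestamp(oldest) == last_visit
```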

Update (push) task handler

As you can see, the Cloud Tasks code stays the same. The final piece of code to look at is the task handler itself, trim().

  • BEFORE:
@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    with ds_client.context():
        keys = Visit.query(
                Visit.timestamp < datetime.fromtimestamp(oldest)
        ).fetch(keys_only=True)
        nkeys = len(keys)
        if nkeys:
            logging.info('Deleting %d entities: %s' % (
                    nkeys, ', '.join(str(k.id()) for k in keys)))
            ndb.delete_multi(keys)
        else:
            logging.info('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

Like fetch_visits(), the bulk of the changes involve swapping out Cloud NDB code for Cloud Datastore's with just a tweak in query styles, and changing the logging calls to print().

  • AFTER:
@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    query = ds_client.query(kind='Visit')
    query.add_filter('timestamp', '<', datetime.fromtimestamp(oldest))
    query.keys_only()
    keys = list(visit.key for visit in query.fetch())
    nkeys = len(keys)
    if nkeys:
        print('Deleting %d entities: %s' % (
                nkeys, ', '.join(str(k.id) for k in keys)))
        ds_client.delete_multi(keys)
    else:
        print('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

The subtle updates you may have missed in the code:

  • Cloud NDB queries allow for a keys-only data fetch:
    • keys = Visit.query().fetch(keys_only=True)
  • Cloud Datastore queries always send back entities, so extracting the keys is required:
    • keys = list(visit.key for visit in query.fetch())
  • Cloud NDB key IDs have a getter method:
    • logging.info('Deleting %d entities: %s' % (nkeys, ', '.join(str(k.id()) for k in keys)))
  • Cloud Datastore key IDs are a property:
    • print('Deleting %d entities: %s' % (nkeys, ', '.join(str(k.id) for k in keys)))
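The task body handoff between fetch_visits() and trim() can be sketched with just the stdlib: fetch_visits() JSON-encodes the epoch float into the task body, and trim() recovers the cutoff datetime from the parsed payload (the value below is hypothetical):

```python
import json
from datetime import datetime

oldest = 1682942400.0                            # hypothetical epoch seconds from fetch_visits()
body = json.dumps({'oldest': oldest}).encode()   # what goes into the task's body field

payload = json.loads(body)                       # what request.get_json() yields in trim()
cutoff = datetime.fromtimestamp(float(payload.get('oldest')))
```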

Deploy application

There are no changes to templates/index.html. Double-check all the changes, make sure your code runs locally, then redeploy and confirm the app (still) works. You should see identical output to before; you've just rewired things under the hood, so everything should work as expected.

That concludes this codelab. Your code should now match what's in the Module 9 repo. Congrats on modernizing your app to Python 3 on the next-generation platform!

Optional: Clean up

What about cleaning up to avoid being billed until you're ready to move on to the next migration codelab? As an existing App Engine developer, you're likely already up to speed on its pricing.

Optional: Disable app

If you're not ready to move on to the next tutorial yet, disable your app to avoid incurring charges; you can re-enable it when you're ready for the next codelab. While your app is disabled, it won't serve traffic and thus won't incur serving charges. However, you can still be billed for Datastore usage if it exceeds the free quota, so delete enough data to fall under that limit.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, you can shutdown your project.

Next steps

Beyond this tutorial, there are several other migration modules to consider, depending on where you want to take your app next:

  • Module 4: Migrate to Cloud Run with Docker
    • Containerize your app to run on Cloud Run with Docker
    • Allows you to stay on Python 2
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
    • Containerize your app to run on Cloud Run with Cloud Buildpacks
    • Do not need to know anything about Docker, containers, or Dockerfiles
    • Requires you to have already migrated your app to Python 3
  • Module 6: Migrate to Cloud Firestore
    • Migrate to Cloud Firestore to access Firebase features
    • While Cloud Firestore supports Python 2, this codelab is available only in Python 3.

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 8 (START) and Module 9 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations.

Codelab     Python 2    Python 3
Module 8    repo        (n/a)
Module 9    (n/a)       repo

App Engine resources

Below are additional resources regarding this specific migration: