Migrate a Python 2 App Engine Cloud NDB & Cloud Tasks app to Python 3 and Cloud Datastore (Module 9)

1. Overview

The Serverless Migration Station series of codelabs (self-paced, hands-on tutorials) and related videos aim to help Google Cloud serverless developers modernize their applications by guiding them through one or more migrations, primarily moving away from legacy services. Doing so makes your apps more portable and gives you more options and flexibility, enabling you to integrate with and access a wider range of Cloud products and more easily upgrade to newer language releases. While initially focusing on the earliest Cloud users, primarily App Engine (standard environment) developers, this series is broad enough to include other serverless platforms like Cloud Functions and Cloud Run, or elsewhere if applicable.

The purpose of this codelab is to port the Module 8 sample app to Python 3 as well as switch Datastore (Cloud Firestore in Datastore mode) access from using Cloud NDB to the native Cloud Datastore client library and upgrade to the latest version of the Cloud Tasks client library.

We added use of Task Queue for push tasks in Module 7, then migrated that usage to Cloud Tasks in Module 8. Here in Module 9, we continue on to Python 3 and Cloud Datastore. Those using Task Queues for pull tasks will migrate to Cloud Pub/Sub and should refer to Modules 18-19 instead.

You'll learn how to

  • Port the Module 8 sample app to Python 3
  • Switch Datastore access from Cloud NDB to Cloud Datastore client libraries
  • Upgrade to the latest Cloud Tasks client library version

What you'll need

  • A Google Cloud project with an active billing account and App Engine enabled
  • The gcloud command-line tool, installed and initialized
  • A working Module 8 sample app (yours or a copy of ours)


2. Background

Module 7 demonstrates how to use App Engine Task Queue push tasks in Python 2 Flask App Engine apps. In Module 8, you migrate that app from Task Queue to Cloud Tasks. Here in Module 9, you continue that journey and port that app to Python 3 as well as switch Datastore access from using Cloud NDB to the native Cloud Datastore client library.

Since Cloud NDB works for both Python 2 and 3, it suffices for App Engine users porting their apps from Python 2 to 3. An additional migration of client libraries to Cloud Datastore is completely optional, and there is only one reason to consider it: you have non-App Engine apps (and/or Python 3 App Engine apps) already using the Cloud Datastore client library and want to consolidate your codebase to accessing Datastore with just one client library. Cloud NDB was created specifically for Python 2 App Engine developers as a Python 3 migration tool, so if you don't already have code using the Cloud Datastore client library, you don't need to consider this migration.

Finally, the development of the Cloud Tasks client library continues only in Python 3, so we are "migrating" from one of the final Python 2 versions to its Python 3 contemporary. Fortunately, there are no breaking changes from Python 2, meaning that there's nothing else you need to do here.

This tutorial features the following steps:

  1. Setup/Prework
  2. Update configuration
  3. Modify application code

3. Setup/Prework

This section explains how to:

  1. Set up your Cloud project
  2. Get baseline sample app
  3. (Re)Deploy and validate baseline app

These steps ensure you're starting with working code and that it's ready for migration to Cloud services.

1. Setup project

If you completed the Module 8 codelab, reuse that same project (and code). Alternatively, create a brand new project or reuse another existing project. Ensure the project has an active billing account and an enabled App Engine app. Find your project ID and keep it handy; use it whenever you encounter the PROJECT_ID variable in this codelab.
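If you need to set up the project from the command line, a minimal sketch follows; the region shown is an assumption, so pick the one you want:

$ gcloud config set project PROJECT_ID    # select your project
$ gcloud app create --region=us-central   # enable App Engine (one-time setup)
$ gcloud app describe                     # confirm App Engine is enabled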

2. Get baseline sample app

One of the prerequisites is a working Module 8 App Engine app: complete the Module 8 codelab (recommended) or copy the Module 8 app from the repo. Whether you use yours or ours, the Module 8 code is where we'll begin ("START"). This codelab walks you through the migration, concluding with code that resembles what's in the Module 9 repo folder ("FINISH").

Regardless of which Module 8 app you use, the folder should look like the following, possibly with a lib folder as well:

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

3. (Re)Deploy and validate baseline app

Execute the following steps to deploy the Module 8 app; a consolidated command sketch follows the list:

  1. Delete the lib folder if there is one and run pip install -t lib -r requirements.txt to repopulate lib. You may need to use pip2 instead if you have both Python 2 and 3 installed on your development machine.
  2. Ensure you've installed and initialized the gcloud command-line tool and reviewed its usage.
  3. (optional) Set your Cloud project with gcloud config set project PROJECT_ID if you don't want to enter the PROJECT_ID with each gcloud command you issue.
  4. Deploy the sample app with gcloud app deploy
  5. Confirm the app runs as expected without issue. If you completed the Module 8 codelab, the app displays the top visitors along with the most recent visits (illustrated below). At the bottom is an indication of the older visits that will be deleted.
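For convenience, here are the commands from the steps above consolidated into a single sketch (substitute pip2 if both Python versions are installed):

$ rm -rf lib                               # step 1: remove any stale copies
$ pip install -t lib -r requirements.txt   # step 1: repopulate lib
$ gcloud config set project PROJECT_ID     # step 3: optional
$ gcloud app deploy                        # step 4: deploy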

[screenshot: the app displaying the most recent visits, with the older visits to be deleted indicated at the bottom]

4. Update configuration

requirements.txt

The new requirements.txt is nearly the same as the one for Module 8, with only one big change: replace google-cloud-ndb with google-cloud-datastore. Make this change so your requirements.txt file looks like this:

flask
google-cloud-datastore
google-cloud-tasks

This requirements.txt file doesn't specify any version numbers, meaning the latest versions are selected. If any incompatibilities arise, it is standard practice to use version numbers to pin working versions for an app, as shown in the sketch below.
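For example, a pinned requirements.txt might look like the following. The version numbers below are hypothetical placeholders, so check PyPI for the versions that actually work for your app:

# hypothetical pins for illustration only; verify current versions on PyPI
flask==2.0.3
google-cloud-datastore==2.15.2
google-cloud-tasks==2.13.2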

app.yaml

The second-generation App Engine runtime supports neither built-in 3rd-party libraries as in 2.x nor the copying of non-built-in libraries. The only requirement for 3rd-party packages is to list them in requirements.txt. As a result, the entire libraries section of app.yaml can be deleted.

Another update is that the Python 3 runtime requires web frameworks that do their own routing, so all script handlers must be changed to auto. Since every handler would then be auto, and this sample app serves no static files, there's no point in having any handlers, so remove the entire handlers section as well. A representative before-picture follows.
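For reference, here is a representative Module 8 (Python 2) app.yaml showing the sections being deleted. This is a sketch: your file may differ, and the libraries entries shown are assumptions based on what gRPC-based client libraries like Cloud NDB and Cloud Tasks typically require:

runtime: python27
threadsafe: yes

# delete: routing is now done by the web framework (Flask)
handlers:
- url: /.*
  script: main.app

# delete: list 3rd-party packages in requirements.txt instead
libraries:
- name: grpcio
  version: latest
- name: setuptools
  version: latest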

The only thing needed in app.yaml is to set the runtime to a supported version of Python 3, say 3.10. Make this change so the new, abbreviated app.yaml is just this single line:

runtime: python310

Delete appengine_config.py and lib

Next generation App Engine runtimes revamp 3rd-party package usage:

  • Built-in libraries are those vetted by Google and made available on App Engine servers, likely because they contain C/C++ code which developers aren't allowed to deploy to the cloud—these are no longer available in the 2nd generation runtimes.
  • Copying non-built-in libraries (sometimes called "vendoring" or "self-bundling") is no longer needed in 2nd generation runtimes. Instead, they should be listed in requirements.txt where the build system automatically installs them on your behalf at deploy time.

As a result of those changes to 3rd-party package management, neither the appengine_config.py file nor the lib folder is needed, so delete them both. Summarizing:

  1. No self-bundled or copied 3rd-party libraries; list them in requirements.txt
  2. No pip install into a lib folder, meaning no lib folder period
  3. No listing built-in 3rd-party libraries (thus no libraries section) in app.yaml; list them in requirements.txt
  4. No 3rd-party libraries to reference from your app means no appengine_config.py file

Listing all desired 3rd-party libraries in requirements.txt is the only developer requirement.

5. Update application files

There is only one application file, main.py, so all changes in this section affect just that file. Below is a "diffs" illustration of the overall changes needed to refactor the existing code into the new app. You're not expected to read the code line by line; its purpose is simply to give a pictorial overview of what this refactor requires (feel free to open it in a new tab, or download and zoom in, if desired).

[diff illustration: overall main.py changes between the Module 8 and Module 9 apps]

Update imports and initialization

The import section in main.py for Module 8 uses Cloud NDB and Cloud Tasks; it should look as follows:

BEFORE:

from datetime import datetime
import json
import logging
import time
from flask import Flask, render_template, request
import google.auth
from google.cloud import ndb, tasks

app = Flask(__name__)
ds_client = ndb.Client()
ts_client = tasks.CloudTasksClient()

Logging is simplified and enhanced in second-generation runtimes like Python 3:

  • For a comprehensive logging experience, use Cloud Logging
  • For simple logging, just send output to stdout (or stderr) via print()
  • There's no need for the Python logging module (so remove it)

As such, delete the import of logging and swap google.cloud.ndb with google.cloud.datastore. Similarly, change ds_client to point to a Datastore client instead of an NDB client. With these changes made, the top of your new app now looks like this:

AFTER:

from datetime import datetime
import json
import time
from flask import Flask, render_template, request
import google.auth
from google.cloud import datastore, tasks

app = Flask(__name__)
ds_client = datastore.Client()
ts_client = tasks.CloudTasksClient()

Migrate to Cloud Datastore

Now it's time to replace NDB client library usage with Datastore. Both App Engine NDB and Cloud NDB require a data model (class); for this app, it's Visit. The store_visit() function works the same as in the other migration modules: it registers a visit by creating a new Visit record, saving the visiting client's IP address and user agent (browser type).

BEFORE:

class Visit(ndb.Model):
    'Visit entity registers visitor IP address & timestamp'
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

Cloud Datastore, however, does not use a data model class, so delete the class. Furthermore, Cloud Datastore does not automatically timestamp newly created records, so you must do so manually with the datetime.now() call.

Without the data class, your modified store_visit() should look like this:

AFTER:

def store_visit(remote_addr, user_agent):
    'create new Visit entity in Datastore'
    entity = datastore.Entity(key=ds_client.key('Visit'))
    entity.update({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })
    ds_client.put(entity)

The key function is fetch_visits(). Not only does it perform the original query for the latest Visits, but it also grabs the timestamp of the last Visit displayed and creates the push task that calls /trim (thus trim()) to mass-delete the old Visits. Here it is using Cloud NDB:

BEFORE:

def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    with ds_client.context():
        data = Visit.query().order(-Visit.timestamp).fetch(limit)
    oldest = time.mktime(data[-1].timestamp.timetuple())
    oldest_str = time.ctime(oldest)
    logging.info('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return (v.to_dict() for v in data), oldest_str

The primary changes:

  1. Swap out the Cloud NDB query for its Cloud Datastore equivalent; the query styles differ slightly.
  2. Cloud Datastore requires neither a context manager nor extracting entity data (with to_dict()) the way Cloud NDB does.
  3. Replace logging calls with print().

After those changes, fetch_visits() looks like this:

AFTER:

def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    visits = list(query.fetch(limit=limit))
    oldest = time.mktime(visits[-1]['timestamp'].timetuple())
    oldest_str = time.ctime(oldest)
    print('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    ts_client.create_task(parent=QUEUE_PATH, task=task)
    return visits, oldest_str

This would normally be all that's necessary. Unfortunately there's one major issue.

(Possibly) Create a new (push) queue

In Module 7, we added use of App Engine taskqueue to the existing Module 1 app. One key benefit of having push tasks as a legacy App Engine feature is that a "default" queue is automatically created. When that app was migrated to Cloud Tasks in Module 8, that default queue was already there, so we still didn't need to be concerned about it then. That changes here in Module 9.

One critical aspect to consider is that the new App Engine application no longer uses App Engine services, so you can no longer assume that App Engine automatically creates a task queue in a different product (Cloud Tasks). As written, creating a task in fetch_visits() for a non-existent queue will fail. A new function is needed to check whether the ("default") queue exists and, if not, create it.

Call this function _create_queue_if(), and add it to your application just above fetch_visits() because that is where it is called. The body of the function to add:

def _create_queue_if():
    'app-internal function creating default queue if it does not exist'
    try:
        ts_client.get_queue(name=QUEUE_PATH)
    except Exception as e:
        if 'does not exist' in str(e):
            ts_client.create_queue(parent=PATH_PREFIX,
                    queue={'name': QUEUE_PATH})
    return True

The Cloud Tasks create_queue() function requires the queue's parent, that is, the full pathname of the queue minus the queue name itself. For simplicity, create another variable PATH_PREFIX representing QUEUE_PATH minus the queue name (QUEUE_PATH.rsplit('/', 2)[0]). Add its definition near the top so that the code block with all the constant assignments looks like this:

_, PROJECT_ID = google.auth.default()
REGION_ID = 'REGION_ID'    # replace w/your own
QUEUE_NAME = 'default'     # replace w/your own
QUEUE_PATH = ts_client.queue_path(PROJECT_ID, REGION_ID, QUEUE_NAME)
PATH_PREFIX = QUEUE_PATH.rsplit('/', 2)[0]
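To see what that one-liner does, here is a minimal sketch using a hypothetical project and region. queue_path() returns a fully qualified resource name, and rsplit('/', 2)[0] drops its last two components:

# hypothetical values for illustration
full = 'projects/my-project/locations/us-central1/queues/default'  # queue_path() format
prefix = full.rsplit('/', 2)[0]    # splits off 'queues' and 'default'
print(prefix)                      # projects/my-project/locations/us-central1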

Now modify the last line in fetch_visits() to use _create_queue_if(), first creating the queue if necessary, then creating the task afterwards:

    if _create_queue_if():
        ts_client.create_task(parent=QUEUE_PATH, task=task)
    return visits, oldest_str

Both _create_queue_if() and fetch_visits() should now look like this in aggregate:

def _create_queue_if():
    'app-internal function creating default queue if it does not exist'
    try:
        ts_client.get_queue(name=QUEUE_PATH)
    except Exception as e:
        if 'does not exist' in str(e):
            ts_client.create_queue(parent=PATH_PREFIX,
                    queue={'name': QUEUE_PATH})
    return True

def fetch_visits(limit):
    'get most recent visits & add task to delete older visits'
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    visits = list(query.fetch(limit=limit))
    oldest = time.mktime(visits[-1]['timestamp'].timetuple())
    oldest_str = time.ctime(oldest)
    print('Delete entities older than %s' % oldest_str)
    task = {
        'app_engine_http_request': {
            'relative_uri': '/trim',
            'body': json.dumps({'oldest': oldest}).encode(),
            'headers': {
                'Content-Type': 'application/json',
            },
        }
    }
    if _create_queue_if():
        ts_client.create_task(parent=QUEUE_PATH, task=task)
    return visits, oldest_str

Other than having to add this extra code, the rest of the Cloud Tasks code is mostly intact from Module 8. The final piece of code to look at is the task handler.

Update (push) task handler

In the task handler, trim(), the Cloud NDB code queries for visits older than the oldest displayed. It uses a keys-only query to speed things up—why fetch all the data if you only need the Visit IDs? Once you have all the visit IDs, delete them all in a batch with Cloud NDB's delete_multi() function.

BEFORE:

@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    with ds_client.context():
        keys = Visit.query(
                Visit.timestamp < datetime.fromtimestamp(oldest)
        ).fetch(keys_only=True)
        nkeys = len(keys)
        if nkeys:
            logging.info('Deleting %d entities: %s' % (
                    nkeys, ', '.join(str(k.id()) for k in keys)))
            ndb.delete_multi(keys)
        else:
            logging.info(
                    'No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

Like fetch_visits(), the bulk of the changes involve swapping out Cloud NDB code for Cloud Datastore, tweaking the query styles, removing use of its context manager, and changing the logging calls to print().

AFTER:

@app.route('/trim', methods=['POST'])
def trim():
    '(push) task queue handler to delete oldest visits'
    oldest = float(request.get_json().get('oldest'))
    query = ds_client.query(kind='Visit')
    query.add_filter('timestamp', '<', datetime.fromtimestamp(oldest))
    query.keys_only()
    keys = list(visit.key for visit in query.fetch())
    nkeys = len(keys)
    if nkeys:
        print('Deleting %d entities: %s' % (
                nkeys, ', '.join(str(k.id) for k in keys)))
        ds_client.delete_multi(keys)
    else:
        print('No entities older than: %s' % time.ctime(oldest))
    return ''   # need to return SOME string w/200

There are no changes to the main application handler root().
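For reference, here is a sketch of what root() looks like, carried over unchanged from Module 8; the display limit and template variable names are assumptions:

@app.route('/')
def root():
    'main application (GET) handler'
    store_visit(request.remote_addr, request.user_agent)
    visits, oldest = fetch_visits(10)    # assumed display limit
    return render_template('index.html', visits=visits, oldest=oldest)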

Port to Python 3

This sample app was designed to run on both Python 2 and 3. Any Python 3-specific changes were covered earlier in the relevant sections of this tutorial. No additional steps or compatibility libraries are required.

Cloud Tasks update

The final version of the Cloud Tasks client library supporting Python 2 is 1.5.0. At the time of this writing, the latest version of the client library for Python 3 is fully compatible with that version, thus no further updates are required.

HTML template update

No changes are needed in the HTML template file, templates/index.html, either, so this wraps up all the necessary changes to arrive at the Module 9 app.

6. Summary/Cleanup

Deploy and verify application

Once you've completed the code updates, mainly the port to Python 3, deploy your app with gcloud app deploy. The output should be identical to that of the Module 7 and 8 apps, except that database access has moved to the Cloud Datastore client library and the app has been upgraded to Python 3:

[screenshot: visitme app output, identical to Modules 7 and 8]

This step completes the codelab. We invite you to compare your code to what's in the Module 9 folder. Congratulations!

Clean up

General

If you are done for now, we recommend disabling your App Engine app to avoid incurring charges. However, if you wish to test or experiment some more, the App Engine platform has a free quota, and as long as you stay within that usage tier, you shouldn't be charged. That covers compute, but relevant App Engine services may incur charges as well, so check the pricing page for more information. If this migration involves other Cloud services, those are billed separately. In either case, if applicable, see the "Specific to this codelab" section below.

For full disclosure, deploying to a Google Cloud serverless compute platform like App Engine incurs minor build and storage costs. Cloud Build has its own free quota as does Cloud Storage. Storage of that image uses up some of that quota. However, you might live in a region that does not have such a free tier, so be aware of your storage usage to minimize potential costs. Specific Cloud Storage "folders" you should review include:

  • console.cloud.google.com/storage/browser/LOC.artifacts.PROJECT_ID.appspot.com/containers/images
  • console.cloud.google.com/storage/browser/staging.PROJECT_ID.appspot.com
  • The storage links above depend on your PROJECT_ID and LOCation, for example, "us" if your app is hosted in the USA.

On the other hand, if you're not going to continue with this application or other related migration codelabs and want to delete everything completely, shut down your project.
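If you prefer the command line, shutting down the project is a single (irreversible) command:

$ gcloud projects delete PROJECT_ID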

Specific to this codelab

The services listed below are unique to this codelab. Refer to each product's documentation for more information:

Next steps

This concludes our migration from App Engine Task Queue push tasks to Cloud Tasks. The optional migration from Cloud NDB to Cloud Datastore is also covered on its own (without Task Queue or Cloud Tasks) in Module 3. Other migration modules focusing on moving away from App Engine legacy bundled services include:

  • Module 2: migrate from App Engine NDB to Cloud NDB
  • Module 3: migrate from Cloud NDB to Cloud Datastore
  • Modules 12-13: migrate from App Engine Memcache to Cloud Memorystore
  • Modules 15-16: migrate from App Engine Blobstore to Cloud Storage
  • Modules 18-19: App Engine Task Queue (pull tasks) to Cloud Pub/Sub

App Engine is no longer the only serverless platform in Google Cloud. If you have a small App Engine app, or one with limited functionality, and wish to turn it into a standalone microservice, or you want to break up a monolithic app into multiple reusable components, these are good reasons to consider moving to Cloud Functions. If containerization has become part of your application development workflow, particularly if it consists of a CI/CD (continuous integration/continuous delivery or deployment) pipeline, consider migrating to Cloud Run. These scenarios are covered by the following modules:

  • Migrate from App Engine to Cloud Functions: see Module 11
  • Migrate from App Engine to Cloud Run: see Module 4 to containerize your app with Docker, or Module 5 to do it without containers, Docker knowledge, or Dockerfiles

Switching to another serverless platform is optional, and we recommend considering the best options for your apps and use cases before making any changes.

Regardless of which migration module you consider next, all Serverless Migration Station content (codelabs, videos, source code [when available]) can be accessed at its open source repo. The repo's README also provides guidance on which migrations to consider and any relevant "order" of Migration Modules.

7. Additional resources

Codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 8 (START) and Module 9 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine codelab migrations, which you can clone or download as a ZIP file.

Codelab     Python 2    Python 3
Module 8    code        (n/a)
Module 9    (n/a)       code

Online resources

Below are online resources which may be relevant for this tutorial:

App Engine

Cloud NDB

Cloud Datastore

Cloud Tasks

Other Cloud information

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.