Step 2: Migrate from App Engine NDB to Google Cloud NDB

This series of tutorials aims to help App Engine developers modernize their apps. The most significant step is to move away from original runtime bundled services because they're not supported by the next generation runtimes. This tutorial helps users migrate from the App Engine built-in ndb client library to the Cloud NDB client library.

Note: Your app may not use NDB or Datastore at all; if this is the case, you can complete this codelab merely as an exercise.

What you'll learn

  • Refamiliarize yourself with using App Engine NDB
  • Learn how to migrate to Cloud NDB
  • Prepare for next step to migrate Datastore access

What you'll need

  • A Google Cloud Platform project with an active GCP billing account
  • Basic Python skills
  • Working knowledge of basic Linux commands
  • Basic knowledge of developing & deploying App Engine apps
  • A working Step 1 App Engine app

With the web framework port out of the way, we can now focus on upgrading the use of App Engine built-in services libraries. In this tutorial, that's merely a switch from App Engine's ndb library to the Cloud NDB library.

Completing this migration opens many doors for developers. Users can then:

  1. Migrate to Python 3 & the next generation App Engine runtime (Gen2)
  2. Migrate to Cloud Datastore (client library for non-App Engine apps)
  3. Containerize their Python 2 app and migrate to Cloud Run

But, we aren't there yet. Finish this codelab before considering those next steps. This tutorial's migration features these primary steps:

  1. Setup/Prework
  2. Add Cloud NDB library
  3. Update application files

One of the prerequisites to this codelab is to have a working Step 1 sample app. If you don't have one, go complete the Step 1 tutorial before moving ahead here.

Your prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool
  2. Re-deploy your working Step 1 App Engine app

Once you've successfully executed those steps, let's start this migration by making the required changes to the configuration files.

Many original App Engine built-in services have blossomed into their own products, and Datastore is one of them. Now non-App Engine apps can use Cloud Datastore too. Switching from the App Engine ndb library to the Cloud NDB library requires changes to configuration files and a minor tweak to your application code.

Update requirements.txt

Datastore access without the built-in ndb library is achieved with the Cloud NDB library (google-cloud-ndb), so add that as another line in your requirements.txt file. That library's version at the time of this writing is specified here (but the repo's requirements.txt may be more up to date than this static tutorial).

Flask==1.1.2
google-cloud-ndb==1.7.1

Now run this command to add Cloud NDB and its dependencies to your lib folder:

$ pip install -U -t lib -r requirements.txt

The previous tutorial didn't feature the -U option which updates existing dependencies and avoids warnings for previously-installed packages. Confirm a bunch of packages were installed, the most important being google-cloud-ndb.
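
One way to do that confirmation without scrolling through pip's output is to look for the package's metadata folder in lib. The sketch below is only an illustration: the helper name vendored_version is hypothetical, and it assumes pip leaves a folder named like google_cloud_ndb-1.7.1.dist-info next to the installed package.

```python
import os
import re


def normalize(name):
    """Treat '-', '_' and '.' as equivalent and ignore case."""
    return re.sub(r'[-_.]+', '_', name).lower()


def vendored_version(lib_dir, package):
    """Return the version of `package` found in lib_dir, or None.

    Hypothetical helper; assumes metadata folders are named
    '<name>-<version>.dist-info'.
    """
    if not os.path.isdir(lib_dir):
        return None
    suffix = '.dist-info'
    for entry in os.listdir(lib_dir):
        if not entry.endswith(suffix):
            continue
        # Split 'google_cloud_ndb-1.7.1' into name and version.
        name, _, version = entry[:-len(suffix)].rpartition('-')
        if normalize(name) == normalize(package):
            return version
    return None


if __name__ == '__main__':
    # Prints the vendored version, or None if it was not found in ./lib.
    print(vendored_version('lib', 'google-cloud-ndb'))
```

If this prints None after a successful pip install, check that the -t lib option was used and that the command ran from your app's top-level folder.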

Update app.yaml

Adding Google Cloud client libraries like google-cloud-ndb has a few requirements, all revolving around the inclusion of "built-in" libraries. These are 3rd-party libraries already available on Google servers. Users don't have to list them in requirements.txt nor bundle/vendor them with pip install. The only requirements are to specify them in app.yaml and point them to the vendored third-party libraries they may work with (in lib).

To do that, add the following lines to your app.yaml to reference a pair of 3rd-party bundled packages: grpcio and setuptools in a (possibly new) libraries section:

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

Why these built-in libraries? gRPC is an open RPC framework used by all Google Cloud client libraries. The grpcio library is the Python gRPC adapter and thus required. The reasoning for including setuptools is coming up.

Update appengine_config.py

The pkg_resources tool, part of the setuptools library, is used to let App Engine access built-in 3rd-party libraries. As mentioned earlier, this is so users don't have to list them in requirements.txt nor bundle manually with pip install. To tie these pieces together, you have to update appengine_config.py to use pkg_resources, like this:

import pkg_resources
from google.appengine.ext import vendor

# Set PATH to your libraries folder.
PATH = 'lib'
# Add libraries installed in the PATH folder.
vendor.add(PATH)
# Add libraries to pkg_resources working set to find the distribution.
pkg_resources.working_set.add_entry(PATH)

With the configuration file formalities out of the way, you can now migrate from the App Engine NDB library, ndb, to Google Cloud NDB. Two changes are needed in the main application: updating the imported libraries and adding the use of context management.

Imports

Switching the package import is fairly innocuous. Make the following swap in main.py:

  • BEFORE:
from google.appengine.ext import ndb
  • AFTER:
from google.cloud import ndb

The change is subtle, but you see the explicit change away from an App Engine library to a Google Cloud library.

Datastore access

Using context managers is required by the library for all Datastore access. If you're unfamiliar with Python context managers, their purpose is to "gate" access to resources such that they must be acquired before they can be used, and are released afterwards. Based on the programming technique known as Resource Acquisition Is Initialization (or RAII), this pattern is also used for files (which must be opened before they can be accessed), spin locks (which must be obtained before entering a "critical section" of code), etc.
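
If context managers are new to you, here is a minimal sketch of the acquire/use/release pattern built with the standard library's contextlib. The names acquired and events are made up for illustration and are not part of Cloud NDB.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def acquired(resource_name):
    """Illustrative context manager: acquire on entry, release on exit."""
    events.append('acquire ' + resource_name)
    try:
        yield resource_name
    finally:
        # Runs when the with block ends, even if it raised an exception.
        events.append('release ' + resource_name)


with acquired('datastore') as res:
    events.append('use ' + res)

print(events)
# ['acquire datastore', 'use datastore', 'release datastore']
```

The with statement guarantees the release step, which is why the resource can only be used inside the block.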

Cloud NDB requires you create a client to communicate with Datastore; do that with ndb.Client(). Add ds_client = ndb.Client() in main.py right after Flask initialization:

app = Flask(__name__)
ds_client = ndb.Client()

Thanks to the Cloud NDB engineering team, the migration from App Engine NDB to Cloud NDB is straightforward. All you need to do is use Cloud NDB's Datastore client context manager. Put simply, wrap any code blocks accessing Datastore in with statements. Below are the same functions from Step 1 to write a new Entity to Datastore and read the most recently added Entities.

  • BEFORE:

Here's the original code without context management:

def store_visit(remote_addr, user_agent):
    Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

def fetch_visits(limit):
    return (v.to_dict() for v in Visit.query().order(
        -Visit.timestamp).fetch_page(limit)[0])
  • AFTER:

Now let's add that. Request the context manager with ds_client.context() and move your Datastore access code inside the with block, with these changes:

def store_visit(remote_addr, user_agent):
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

def fetch_visits(limit):
    with ds_client.context():
        return (v.to_dict() for v in Visit.query().order(
                -Visit.timestamp).fetch_page(limit)[0])
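
One Python detail worth knowing when returning from inside a with block: a generator expression builds its items lazily, so the per-item work happens after the function has returned and the context has closed. This tutorial does not say whether v.to_dict() needs an active context, so treat the following as a sketch of the language behavior only. It uses a stand-in fake_context (hypothetical, not the real ds_client.context()) to show the difference between returning a generator and returning a list.

```python
from contextlib import contextmanager

state = {'active': False}


@contextmanager
def fake_context():
    """Stand-in for ds_client.context(); only tracks whether it is open."""
    state['active'] = True
    try:
        yield
    finally:
        state['active'] = False


def lazy_fetch(rows):
    with fake_context():
        # The generator body has not run yet when this returns.
        return (state['active'] for _ in rows)


def eager_fetch(rows):
    with fake_context():
        # A list comprehension builds every item inside the block.
        return [state['active'] for _ in rows]


print(list(lazy_fetch([1, 2])))  # [False, False]: built after the context closed
print(eager_fetch([1, 2]))       # [True, True]: built while it was open
```

If per-item work in your own app ever does need the context, returning a list instead of a generator keeps that work inside the with block.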

Once you've made all the changes in this tutorial, re-deployed your app, and confirmed everything works, your code will match the Step 2 repo. Congrats on completing this part of the migration. You've just crossed the finish line: this is the last of the strongly recommended migrations as far as Datastore goes. You've also developed some muscle memory for migrating your own app to Cloud NDB.

From here, there's flexibility as to your next move. Choose any of these options:

  • Step 2 (bonus): Continue to the bonus part of this tutorial, where we begin to help you port your app to Python 3 to get you on the next generation App Engine runtime.
  • Step 4: Continue to use NDB but migrate your app to a container executing serverlessly on Cloud Run. This is one way to keep that Python 2 app running!
  • Step 3: Further modernize Datastore access from Cloud NDB to the (official) Cloud Datastore library (how users outside of App Engine access Cloud Datastore).

If you're not ready to move on, you can disable your app and check that your app execution & storage still fall within the free quotas. When you're ready for the next tutorial, you can re-enable your app. Or, if you're not performing any further migrations or wish to wipe all of this, you can shut down your project.

We recommend developers migrate to Python 3 to access the latest App Engine runtime & features. Once you've migrated away from App Engine built-in libraries, you can start your port to Python 3. If Datastore is the only built-in service used in your app, you can do that now as Cloud NDB is available for both 2.x & 3.x (whereas App Engine's ndb is only available in 2.x). It is optional however, as you can continue with any additional migrations first, such as moving away from the App Engine Memcache, Mail, Blobstore, Task Queues, Images, etc.

Overview

While porting to Python 3 is not within the scope of a Google Cloud tutorial, this part of the codelab will at least give you an idea of how the Python 3 App Engine runtime differs from the original (Python 2) runtime. Because 1) the Cloud NDB is available on 3.x, and 2) our sample app is so simple, no migration steps are required for our main.py application file. This simple app runs under 2.x or 3.x unmodified, so all of the required changes are in configuration:

  1. Simplify app.yaml to reference Python 3 and remove reference to bundled 3rd-party libraries.
  2. Delete appengine_config.py as it's no longer necessary.
  3. Delete the lib folder for the same reason.

Simplify app.yaml

BEFORE:

The only real change for this sample app is to significantly shorten app.yaml. As a reminder, here's what we had in app.yaml at the conclusion of Step 2:

runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

AFTER:

In Python 3, the threadsafe, api_version, and libraries directives are all deprecated, and handlers has changed a bit. All apps are presumed threadsafe, and api_version isn't used in Python 3. There are no longer built-in third-party packages preinstalled on App Engine services, so libraries is also deprecated. Check the documentation on changes to app.yaml for more information on these changes. As a result, delete all three from app.yaml and specify which Python 3 runtime to use (the sample uses 3.8) so it looks like this:

runtime: python38

handlers:
- url: /.*
  script: auto

The most surprising difference is that all of your handler scripts must be changed to auto. The reason is that the new runtime expects a web framework that does its own app routing, so per-script handlers are no longer necessary. In fact, the entire handlers directive is now optional and replaceable by a single entrypoint directive. More on this in the script reference documentation. For example, you could further simplify app.yaml to:

runtime: python38
entrypoint: python main.py

Learn more from the entrypoint documentation and see more examples & best practices here and here.
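
The app.yaml edits in this section can also be read as a small transformation on the parsed file. The sketch below works on a plain dict that mirrors the Python 2 app.yaml from this tutorial, as a YAML parser would load it. The function name to_python3_config is made up for illustration; in practice you'd simply edit the file by hand.

```python
import copy


def to_python3_config(py2_config, runtime='python38'):
    """Return a copy of a Python 2 app.yaml dict adjusted for Python 3."""
    config = copy.deepcopy(py2_config)  # leave the caller's dict untouched
    # These three directives are no longer used in Python 3.
    for key in ('threadsafe', 'api_version', 'libraries'):
        config.pop(key, None)
    config['runtime'] = runtime
    # Routing is done by the web framework, so every script becomes 'auto'.
    for handler in config.get('handlers', []):
        if 'script' in handler:
            handler['script'] = 'auto'
    return config


PY2_CONFIG = {
    'runtime': 'python27',
    'threadsafe': True,
    'api_version': 1,
    'handlers': [{'url': '/.*', 'script': 'main.app'}],
    'libraries': [
        {'name': 'grpcio', 'version': '1.0.0'},
        {'name': 'setuptools', 'version': '36.6.0'},
    ],
}

print(to_python3_config(PY2_CONFIG))
# {'runtime': 'python38', 'handlers': [{'url': '/.*', 'script': 'auto'}]}
```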

Delete appengine_config.py and lib

The appengine_config.py config file is all about third-party packages, whether you bundle/vendor them yourself or rely on "built-in" ones that are already available on App Engine servers. Here's a summary of the big changes with third-party libraries:

  1. No vendoring—while packages are still registered in requirements.txt, third-party libraries no longer need to be vendored... no pip install and no lib folder.
  2. No more built-in libraries—no third-party packages are preinstalled on App Engine servers... just add what you need to requirements.txt.

As a result, appengine_config.py is deprecated, so delete it. Delete the lib folder for the same reason. App Engine will acquire and install packages from requirements.txt, so you only need to ensure it lists everything you need.

Summary (BONUS)

The requirements.txt and templates/index.html files remain unchanged. This concludes the bonus part of Step 2. You can refer to the Step 2 sample Python 3 repo and compare it to the Step 2 sample Python 2 repo to see the exact "diff"s. See the earlier summary above for next steps and cleanup.

In the original App Engine runtimes, e.g., Python 2, users could have multiple handlers for their web apps, each defined in app.yaml, and those handlers could be in different application files. If you decide to port your app from Python 2 to 3, be aware that the next-generation runtime only supports web frameworks that perform their own routing, meaning all requests will be handled by a single application entrypoint (and consequently, all handlers must be set to auto). Learn more about this in the links below: