Migrate from Google Cloud NDB to Cloud Datastore (Module 3)

While Cloud NDB is a great Datastore solution for long-time App Engine developers and helps with transitioning to Python 3, it is not the only way App Engine developers can access Datastore.

When App Engine's Datastore became its own product, Google Cloud Datastore, in 2013, a new client library was created so that all users could use Datastore.

Python 3 App Engine as well as non-App Engine developers are directed to use the Cloud Datastore (not Cloud NDB) client library. Python 2 App Engine developers are encouraged to migrate from ndb to Cloud NDB and port to Python 3 from there.

If you have (App Engine or non-App Engine) apps using Cloud Datastore, there are several reasons to consider migrating your Cloud NDB apps to Cloud Datastore. Moving to Cloud Datastore:

  • Allows developers to focus on a single codebase for Datastore access
  • Avoids maintaining some code using Cloud NDB & other code using Cloud Datastore
  • Provides more consistency in the codebase and better code reusability
  • Lowers overall maintenance cost through common/shared libraries

You'll learn how to

  • Use Cloud NDB (if you're unfamiliar with it)
  • Migrate from Cloud NDB to Cloud Datastore
  • Further migrate your app to Python 3

What you'll need

  • A Google Cloud Platform project with an active GCP billing account
  • Basic Python skills
  • Working knowledge of basic Linux commands
  • Basic knowledge of developing & deploying App Engine apps
  • A working Module 2 App Engine 2.x or 3.x app.

App Engine's Datastore has been the platform's bundled NoSQL data storage solution since its original launch in 2008. Since then, as mentioned above, Datastore has grown up to become its own product, Cloud Datastore, so developers can use it outside App Engine. In 2017, the next generation of Datastore launched, rebranded as Cloud Firestore to signal its feature integration with the Firebase real-time database. For backwards-compatibility reasons, Cloud Firestore operates in "Cloud Firestore in Datastore mode" when accessed from the Cloud NDB or Cloud Datastore client libraries.

After completing this migration, you can:

  1. Migrate to Python 3 & the next-gen App Engine runtime
  2. Containerize your Python 2 (or 3) app and migrate to Cloud Run
  3. Add use of App Engine (push) task queues then migrate to Cloud Tasks
  4. Migrate to Cloud Firestore (using Firestore in native mode)

But let's migrate to Cloud Datastore first. This migration features these primary steps:

  1. Setup/Prework
  2. Replace Cloud NDB with Cloud Datastore client libraries
  3. Update application

Before we get going with the main part of the tutorial, let's set up our project, get the code, then deploy the baseline app so we know we started with working code.

1. Setup project

If you completed the Module 2 codelab, we recommend reusing that same project (and code). Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine (app) is enabled.

2. "Get" baseline sample app

One of the prerequisites is to have a working Module 2 sample app. Here are some options on where to get it from:

  1. The working sample you created after completing the Module 2 codelab
  2. Doing the Module 2 codelab now before starting this one
  3. Copy the Module 2 repo (link below)

Whether you use yours or ours, the Module 2 code is where we'll START. This Module 3 codelab walks you through each step, and when complete, it should resemble code at the FINISH point. There are Python 2 and 3 versions of this tutorial, so grab the correct code repo below.

Python 2

The directory of Python 2 Module 2 STARTing files (yours or ours) should look like this:

$ ls
README.md               appengine_config.py     requirements.txt
app.yaml                main.py                 templates

If you completed the Module 2 tutorial, you'll also have a lib folder with Flask and its dependencies. If you don't have a lib folder, create it with the pip install -t lib -r requirements.txt command so that we can deploy this baseline app in the next step. If you have both Python 2 and 3 installed, we recommend using pip2 instead of pip to avoid confusion with Python 3.

Python 3

The directory of Python 3 Module 2 STARTing files (yours or ours) should look like this:

$ ls
README.md               main.py                 templates
app.yaml                requirements.txt

Neither lib nor appengine_config.py is used for Python 3.

3. (Re)Deploy Module 2 app

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool (if necessary)
  2. (Re)deploy the Module 2 code to App Engine (if necessary)

Once you've successfully executed those steps and confirmed the app is operational, we'll move ahead in this tutorial, starting with the configuration files.

The only configuration change is a minor package swap in your requirements.txt file.

1. Update requirements.txt

Upon completing Module 2, your requirements.txt file looked like this:

  • BEFORE (Python 2 & 3):
Flask==1.1.2
google-cloud-ndb==1.7.1

Update requirements.txt by replacing the Cloud NDB library (google-cloud-ndb) with the latest version of the Cloud Datastore library (google-cloud-datastore), leaving the entry for Flask intact. Bear in mind that the final version of Cloud Datastore that's Python 2 compatible is 1.15.3:

  • AFTER (Python 2):
Flask==1.1.2
google-cloud-datastore==1.15.3
  • AFTER (Python 3):
Flask==1.1.2
google-cloud-datastore==2.1.0

Keep in mind that the repo is maintained more regularly than this tutorial, so it's possible the requirements.txt there may reflect newer versions. We recommend using the latest versions of each library, but if they don't work, you can roll back to an older release. The version numbers above were the latest when this codelab was last updated.
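If you have several apps to update, the package swap described above can be scripted. Below is a minimal sketch, not part of the codelab's repo: swap_ndb_for_datastore() and DATASTORE_PINS are hypothetical names, and the pins are simply the versions quoted in this tutorial, which may be out of date by the time you read this.

```python
# Hypothetical helper: replace the Cloud NDB entry in requirements.txt text
# with a Cloud Datastore pin. The pins below are the versions quoted in this
# codelab (assumption: newer releases may exist when you run this).
DATASTORE_PINS = {
    2: 'google-cloud-datastore==1.15.3',  # last release supporting Python 2
    3: 'google-cloud-datastore==2.1.0',
}

def swap_ndb_for_datastore(requirements_text, python_major):
    """Return the requirements text with the google-cloud-ndb line replaced."""
    new_lines = []
    for line in requirements_text.splitlines():
        if line.strip().lower().startswith('google-cloud-ndb'):
            line = DATASTORE_PINS[python_major]
        new_lines.append(line)  # every other entry (e.g. Flask) is left intact
    return '\n'.join(new_lines) + '\n'

before = 'Flask==1.1.2\ngoogle-cloud-ndb==1.7.1\n'
print(swap_ndb_for_datastore(before, 3))
```

To apply it to a real file, read requirements.txt, pass its contents through the function, and write the result back.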

2. Other configuration files

The other configuration files, app.yaml and appengine_config.py, should remain unchanged from the previous migration step:

  • app.yaml should (still) reference the 3rd-party bundled packages grpcio and setuptools.
  • appengine_config.py should (still) point pkg_resources and google.appengine.ext.vendor to the 3rd-party resources in lib.

Now let's move to the application files.

There are no changes to templates/index.html, but there are a few updates for main.py.

1. Imports

The starting code for the import section should look as follows:

  • BEFORE:
from flask import Flask, render_template, request
from google.cloud import ndb

Replace the google.cloud.ndb import with one for Cloud Datastore: google.cloud.datastore. Because the Datastore client library does not support auto-creation of a timestamp field in an Entity, also import the standard library datetime module to create one manually. By convention, standard library imports go above third-party package imports. When you're done with these changes, it should look like this:

  • AFTER:
from datetime import datetime
from flask import Flask, render_template, request
from google.cloud import datastore

2. Initialization and data model

After initializing Flask, the Module 2 sample app creates an NDB data model class and its fields:

  • BEFORE:
app = Flask(__name__)
ds_client = ndb.Client()

class Visit(ndb.Model):
    visitor   = ndb.StringProperty()
    timestamp = ndb.DateTimeProperty(auto_now_add=True)

The Cloud Datastore library does not have such a class, so delete the Visit class declaration. You still need a client to talk to Datastore, so change ndb.Client() to datastore.Client(). The Datastore library is more "flexible," allowing you to create Entities without "pre-declaring" their structure as NDB requires. After this update, this part of main.py should look like:

  • AFTER:
app = Flask(__name__)
ds_client = datastore.Client()
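One consequence of dropping the Visit class is that nothing checks the shape of an entity before it is written. If you miss that safety net, a small check can stand in for it. The sketch below is only an illustration: check_visit() and VISIT_FIELDS are hypothetical names, not part of the Datastore library, and a plain dict stands in for datastore.Entity (which behaves like a dict) so the example runs without any Google Cloud packages.

```python
from datetime import datetime

# Text types differ between 2.x and 3.x; accept both so the sketch runs on either.
try:
    TEXT_TYPES = (str, unicode)   # Python 2
except NameError:
    TEXT_TYPES = (str,)           # Python 3

# Hypothetical "schema" mirroring the fields of the deleted ndb.Model class.
VISIT_FIELDS = {
    'visitor': TEXT_TYPES,
    'timestamp': (datetime,),
}

def check_visit(record):
    """Raise ValueError if a dict-like record does not look like a Visit."""
    for name, types in VISIT_FIELDS.items():
        if name not in record:
            raise ValueError('missing field: ' + name)
        if not isinstance(record[name], types):
            raise ValueError('wrong type for field: ' + name)
    return record

check_visit({'visitor': '127.0.0.1: curl', 'timestamp': datetime.now()})
```

Whether such a check is worth having depends on your app; the sample app in this codelab does not use one.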

3. Datastore access

Migrating to Cloud Datastore requires changing how you create, store, and query Datastore entities (at the user-level). For your applications, the difficulty of this migration depends on how complex your Datastore code is. In our sample app, we attempted to make the update as straightforward as possible. Here is our starting code:

  • BEFORE:
def store_visit(remote_addr, user_agent):
    with ds_client.context():
        Visit(visitor='{}: {}'.format(remote_addr, user_agent)).put()

def fetch_visits(limit):
    with ds_client.context():
        return (v.to_dict() for v in Visit.query().order(
                -Visit.timestamp).fetch_page(limit)[0])

With Cloud Datastore, create a generic entity, using a "key" to name the kind ('Visit') that groups related entities. Populate the data record with a Python dict of key-value pairs, then write it to Datastore with the expected put(). Querying is similar but more straightforward with Datastore. Here you can see how the equivalent Datastore code differs:

  • AFTER:
def store_visit(remote_addr, user_agent):
    entity = datastore.Entity(key=ds_client.key('Visit'))
    entity.update({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })
    ds_client.put(entity)

def fetch_visits(limit):
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    return query.fetch(limit=limit)
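To make the two operations above easier to reason about (and to test without a live Datastore), here is a minimal sketch that models them on plain dicts. Both function names are hypothetical: make_visit_record() builds the same dict that store_visit() passes to entity.update(), and newest_first() is only an in-memory stand-in for what the query asks Datastore to do, namely sort by timestamp in descending order (the leading "-") and return at most limit results.

```python
from datetime import datetime

def make_visit_record(remote_addr, user_agent, now=None):
    """Build the fields for a Visit; the timestamp is set manually because
    the Datastore library has no equivalent of NDB's auto_now_add."""
    return {
        'timestamp': now if now is not None else datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    }

def newest_first(records, limit):
    """In-memory model of query.order = ['-timestamp'] with fetch(limit=limit)."""
    return sorted(records, key=lambda r: r['timestamp'], reverse=True)[:limit]

visits = [
    make_visit_record('10.0.0.1', 'curl', now=datetime(2020, 1, 1)),
    make_visit_record('10.0.0.2', 'Firefox', now=datetime(2020, 1, 3)),
    make_visit_record('10.0.0.3', 'Chrome', now=datetime(2020, 1, 2)),
]
for visit in newest_first(visits, 2):
    print(visit['visitor'])
```

If you adopt the first helper, store_visit() could call entity.update(make_visit_record(remote_addr, user_agent)) instead of building the dict inline.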

Update the function bodies for store_visit() and fetch_visits() as above, keeping their signatures identical to the previous version. There are no changes at all to the main handler root(). After completing these changes, your

Deploy application

Re-deploy your app with gcloud app deploy, and confirm the app works. Your code should now match what's in the Module 3 repo folders (see the table under "Migration resources").

Congrats on completing this Module 3 codelab. You've just crossed the finish line, since this is the last of the strongly recommended migrations in this series as far as Datastore goes.

Optional: Clean up

What about cleaning up to avoid being billed until you're ready to move onto the next migration codelab? As existing developers, you're likely already up-to-speed on App Engine's pricing information.

Optional: Disable app

If you're not ready to go to the next tutorial yet, disable your app to avoid incurring charges. When you're ready to move onto the next codelab, you can re-enable it. While your app is disabled, it won't get any traffic to incur charges. However, you can still be billed for Datastore usage if it exceeds the free quota, so delete enough data to fall under that limit.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, you can shut down your project.

Next steps

From here, feel free to explore these next migration modules:

  • Module 3 Bonus: Continue below to the bonus part of this tutorial to explore porting to Python 3 and the next generation App Engine runtime.
  • Module 7: App Engine Push Task Queues (required if you use [push] Task Queues)
    • Adds App Engine taskqueue push tasks to Module 1 app
    • Prepares users for migrating to Cloud Tasks in Module 8
  • Module 4: Migrate to Cloud Run with Docker
    • Containerize your app to run on Cloud Run with Docker
    • Allows you to stay on Python 2
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
    • Containerize your app to run on Cloud Run with Cloud Buildpacks
    • Do not need to know anything about Docker, containers, or Dockerfiles
    • Requires you to have already migrated your app to Python 3
  • Module 3:
    • Modernize Datastore access from Cloud NDB to Cloud Datastore
    • This is the library used for Python 3 App Engine apps and non-App Engine apps
  • Module 6: Migrate to Cloud Firestore
    • Migrate to Cloud Firestore to access Firebase features
    • While Cloud Firestore supports Python 2, this codelab is available only in Python 3.

To access the latest App Engine runtime and features, we recommend that you migrate to Python 3. In our sample app, Datastore was the only built-in service we used, and since we've migrated from ndb to Cloud NDB, we can now port to App Engine's Python 3 runtime.

Overview

While porting to Python 3 is not within the scope of a Google Cloud tutorial, this part of the codelab gives developers an idea of how the Python 3 App Engine runtime differs. One outstanding feature of the next-gen runtime is simplified access to third-party packages: There's no need to specify built-in packages in app.yaml nor a requirement to copy or upload non-built-in libraries; they are implicitly installed from being listed in requirements.txt.

Because our sample is so basic and Cloud Datastore is Python 2-3 compatible, no application code needs to be explicitly ported to 3.x: The app runs on 2.x & 3.x unmodified, meaning the only required changes are in configuration in this case:

  1. Simplify app.yaml to reference Python 3 and remove reference to bundled 3rd-party libraries.
  2. Delete appengine_config.py and the lib folder as they're no longer necessary.

The main.py and templates/index.html application files remain unchanged.

Update requirements.txt

The final version of the Cloud Datastore library supporting Python 2 is 1.15.3. Update requirements.txt with the latest version for Python 3 (it may be newer by now). When this tutorial was written, the latest version was 2.1.0, so edit that line to look like this (or whatever the latest version is):

google-cloud-datastore==2.1.0

Simplify app.yaml

BEFORE:

The only real change for this sample app is to significantly shorten app.yaml. As a reminder, here's what we had in app.yaml at the conclusion of Module 3:

runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
- name: setuptools
  version: 36.6.0

AFTER:

In Python 3, the threadsafe, api_version, and libraries directives are all deprecated; all apps are presumed threadsafe and api_version isn't used in Python 3. There are no longer built-in third-party packages preinstalled on App Engine services, so libraries is also deprecated. Check the documentation on changes to app.yaml for more information on these changes. As a result, you should delete all three from app.yaml and update to a supported Python 3 version (see below).
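If you want a quick way to spot the three directives named above in an existing app.yaml, a rough line scan is enough for a file as simple as the sample's. This is only a sketch: find_legacy_directives() is a hypothetical helper, and it deliberately avoids a real YAML parser, so it only looks at top-level keys that start in the first column.

```python
# Directives this section says to delete when moving to Python 3.
LEGACY_DIRECTIVES = ('threadsafe', 'api_version', 'libraries')

def find_legacy_directives(app_yaml_text):
    """Return the legacy top-level directives present, in file order."""
    found = []
    for line in app_yaml_text.splitlines():
        # Top-level keys start in column 0; skip blanks, indented lines,
        # list items ("- ...") and comments.
        if not line or line[0] in ' \t-#':
            continue
        key = line.split(':', 1)[0].strip()
        if key in LEGACY_DIRECTIVES:
            found.append(key)
    return found

python2_app_yaml = """\
runtime: python27
threadsafe: yes
api_version: 1

handlers:
- url: /.*
  script: main.app

libraries:
- name: grpcio
  version: 1.0.0
"""
print(find_legacy_directives(python2_app_yaml))
```

An empty list means none of the three directives were found at the top level.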

Use of handlers directive

In addition, the handlers directive, which directs traffic at App Engine applications, has also been deprecated. Since the next-gen runtime expects web frameworks to manage app routing, all "handler scripts" must be changed to "auto". Combining the changes from above, you arrive at this app.yaml:

runtime: python38

handlers:
- url: /.*
  script: auto

Learn more about "script: auto" from its documentation page.

Removing handlers directive

Since handlers is deprecated, you can remove the entire section too, leaving a single-line app.yaml:

runtime: python38

By default, this will launch the Gunicorn WSGI web server which is available for all applications. If you're familiar with gunicorn, this is the command executed when it's started by default with the barebones app.yaml:

gunicorn main:app --workers 2 -c /config/gunicorn.py

Use of entrypoint directive

If, however, your application requires a specific start-up command, that can be specified with an entrypoint directive:

runtime: python38
entrypoint: python main.py

This example specifically requests the Flask development server be used instead of gunicorn. Code that starts the development server must also be added to your app to launch on the 0.0.0.0 interface on port 8080 by adding this small section to the bottom of main.py:

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)
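Rather than hard-coding 8080, some apps read the port from the environment. The sketch below assumes the runtime exposes the listening port in a PORT environment variable (check the App Engine runtime documentation for your environment to confirm); get_port() is a hypothetical helper, and it falls back to 8080 when the variable is not set, such as when running locally.

```python
import os

def get_port(environ=None, default=8080):
    """Return the port to listen on, preferring the PORT environment variable.

    Assumption: the hosting runtime sets PORT; locally it is usually unset,
    so the default of 8080 from this codelab is used instead.
    """
    environ = os.environ if environ is None else environ
    return int(environ.get('PORT', default))

print(get_port({'PORT': '9000'}))  # value taken from the (fake) environment
print(get_port({}))                # falls back to the default
```

With this in place, the last line of main.py could become app.run(host='0.0.0.0', port=get_port(), debug=True).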

Learn more about entrypoint from its documentation page. More examples & best practices can be found here and here.

Delete appengine_config.py and lib

Delete the appengine_config.py file and the lib folder. Once you migrate to Python 3, App Engine acquires and installs the packages listed in requirements.txt.

The appengine_config.py config file is used to recognize third-party libraries/packages, whether you've copied them yourself or use ones already available on App Engine servers (built-in). When moving to Python 3, the big changes are, in summary:

  1. No bundling of copied third-party libraries (listed in requirements.txt)
  2. No pip install into a lib folder, meaning no lib folder, period
  3. No listing built-in third-party libraries in app.yaml
  4. No need to point your app at third-party libraries, so no appengine_config.py file

Listing all required third-party libraries in requirements.txt is all that's needed.
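The cleanup described above is just two deletions, and you can do it by hand. If you prefer to script it, for example across several apps, here is a minimal sketch. remove_py2_vendoring() is a hypothetical helper, and it assumes the file and folder names used in this codelab (appengine_config.py and lib) sit directly in the project directory.

```python
import os
import shutil

def remove_py2_vendoring(project_dir):
    """Delete appengine_config.py and the lib folder if present.

    Returns the names that were actually removed, so running it twice is
    harmless (the second call returns an empty list).
    """
    removed = []
    config = os.path.join(project_dir, 'appengine_config.py')
    if os.path.isfile(config):
        os.remove(config)
        removed.append('appengine_config.py')
    lib = os.path.join(project_dir, 'lib')
    if os.path.isdir(lib):
        shutil.rmtree(lib)  # removes the folder and everything pip put in it
        removed.append('lib')
    return removed
```

Call it with the path to your app, e.g. remove_py2_vendoring('.'), only after you have confirmed that requirements.txt lists everything the app needs.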

Deploy application

Re-deploy your app to ensure that it works. You can also confirm how close your solution is to the Module 3 sample Python 3 code. To visualize the differences with Python 2, compare the code with its Python 2 version.

Congrats on finishing the bonus step in Module 3! Visit the documentation on preparing configuration files for the Python 3 runtime. Finally, review the summary above for next steps and cleanup.

Preparing your application

When it is time to migrate your application, you will have to port your main.py and other application files to 3.x, so a best practice is to try your best to make your 2.x application as "forward-compatible" as possible.

There are plenty of online resources to help you accomplish that, but some of the key tips:

  1. Ensure all application dependencies are fully 3.x-compatible
  2. Ensure your application runs on at least 2.6 (preferably 2.7)
  3. Ensure application passes entire test suite (and minimum 80% coverage)
  4. Use compatibility libraries such as six, Future, and/or Modernize
  5. Educate yourself on key backwards-incompatible 2.x vs. 3.x differences
  6. Any I/O will likely lead to Unicode vs. byte string incompatibilities
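As an illustration of the last tip, one common pattern is to convert incoming data to text as early as possible so the rest of the code handles a single string type. The sketch below is an assumption about how you might do that, not something the sample app needs; to_text() is a hypothetical helper that behaves the same on 2.x and 3.x.

```python
def to_text(value, encoding='utf-8'):
    """Return text (unicode on 2.x, str on 3.x) for byte-string or text input.

    On 2.x, `bytes` is an alias for `str`, so ordinary 2.x strings are
    decoded too; on 3.x only real bytes objects are decoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

raw = b'caf\xc3\xa9'           # UTF-8 bytes, e.g. as read from a socket or file
assert to_text(raw) == u'caf\xe9'
assert to_text(u'plain text') == u'plain text'
```

Compatibility libraries such as six offer similar helpers, so check what your chosen library provides before writing your own.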

The sample app was designed with all this in mind, which is why it runs on 2.x and 3.x right out of the box, so we can focus on showing you what needs to be changed in order to use the next-gen platform.

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 2 (START) and Module 3 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine migrations.

Codelab         Python 2        Python 3
Module 2        repo            repo
Module 3        repo            repo

App Engine resources

Below are additional resources regarding this specific migration: