Migrate from Cloud Datastore to Cloud Firestore (Module 6)

The Google App Engine (GAE) migration modules generally teach GAE (Standard) developers how to modernize their apps by moving away from the original runtime's bundled services, which aren't supported by the next-generation runtimes. This module is an exception: Cloud Datastore is not a legacy built-in App Engine library, so this migration is more optional than others in this series.

This tutorial teaches you how to migrate an App Engine app using Cloud Datastore to Cloud Firestore. It's meant for those who feel compelled to use Firestore to take advantage of its Firebase real-time database features.

You'll learn how to

  • Redeploy your Cloud Datastore app & review its basic usage (if you haven't used it in a while)
  • Recognize differences between Datastore & Firestore
  • Migrate from Cloud Datastore to Cloud Firestore

What you'll need


The next generation of Cloud Datastore launched in 2017 with a product rebrand as Cloud Firestore, signaling its feature integration with Firebase. However, users with fairly sizable migrations, or those switching to second-generation (Gen2) runtimes and Python 3 where a new project may be desirable, have the option of switching to Cloud Firestore in Native mode to take advantage of its full capabilities. See this document on choosing between Cloud Firestore in Datastore mode and Native mode.

NOTE: Cloud Firestore is the only NoSQL datastore system available to GCP projects, so users must choose between Firestore in native mode or in Datastore mode; you can't use both Datastore and Firestore in the same project.

As mentioned in the previous step, a migration to Cloud Datastore means applications are using "Cloud Firestore in Datastore mode". The primary purpose of this tutorial is to show developers who know Cloud Datastore how it differs from Cloud Firestore, so they can gain initial familiarity with Firestore.

This migration is not one we expect users to perform, which is why it is optional and a "bonus" step. While there are obvious advantages to using Cloud Firestore natively, such as client auth, Firebase rules integration, and, of course, the Firebase real-time feature (query/document watch), the migration steps are non-trivial:

  • You must create a new project because, once data is stored, projects cannot switch from Firestore in Datastore mode to Firestore in Native mode.
  • There isn't a migration tool that can stream data from one project to another.
  • Some critical Datastore features, including namespaces and a higher write throughput (>10k/s), are not available in Firestore.
  • The export and import tools are "primitive" and "all or nothing" scenarios...
    • If your Datastore is large, it can take many hours to export and then import into Firestore.
    • During this time, your application/service won't be able to write/update data.
    • Migration activities count towards normal usage; you may want to spread it out (across daily quotas if possible) to minimize costs.
    • Because the updated service runs in a different project, you'll need a window for DNS updates to propagate.
  • Datastore & Firestore have similar but different data models so migration requires updating how the app/service works
    • Ancestor queries from Datastore are now Collection queries (the default)
    • Broad type queries from Datastore are Firestore Collection group queries
    • Indexes and handling are different, etc.
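The advice above to spread migration activity across daily quotas comes down to simple arithmetic. Below is a minimal sketch, assuming you know how many entities you need to write and pick a per-day write budget yourself. The function names are hypothetical, and the 20,000 figure in the example is only an illustrative budget (it matches Firestore's free-tier daily document-write quota as published at the time of writing; verify against current pricing before relying on it).

```python
import math


def days_needed(total_writes, writes_per_day):
    """Days required if each day stays within writes_per_day (hypothetical helper)."""
    if writes_per_day <= 0:
        raise ValueError('writes_per_day must be positive')
    return math.ceil(total_writes / writes_per_day)


def plan_daily_batches(records, writes_per_day):
    """Split records into consecutive per-day chunks of at most writes_per_day."""
    if writes_per_day <= 0:
        raise ValueError('writes_per_day must be positive')
    return [records[i:i + writes_per_day]
            for i in range(0, len(records), writes_per_day)]


# Example: 50,000 entities with an assumed budget of 20,000 writes per day.
entities = list(range(50000))
plan = plan_daily_batches(entities, 20000)
print(days_needed(len(entities), 20000))   # 3
print([len(day) for day in plan])          # [20000, 20000, 10000]
```

Keeping the plan as plain chunks lets you feed each day's chunk to whatever import or write process you use, without tying the arithmetic to a particular tool.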

That said, if you have a trivially-simple app to migrate or are here to learn about the differences between using Datastore vs. Firestore, and wish to use this tutorial as an exercise to achieve that goal, please continue.

NOTE: We present this very optional migration only in Python 3, but you can certainly extrapolate and do it in Python 2. The only difference is that, for compatibility reasons, Firestore records use Unicode strings, so a u'' prefix is needed in front of Python 2 string literals. For example, the store_visit() function will look like the code below (vs. the main.py featured in this repo, where such prefixes aren't present).

def store_visit(remote_addr, user_agent):
    doc_ref = fs_client.collection(u'Visit')
    doc_ref.add({
        u'timestamp': datetime.now(),
        u'visitor': u'{}: {}'.format(remote_addr, user_agent),
    })

This tutorial's migration features these primary steps:

  1. Setup/Prework
  2. Add Cloud Firestore library
  3. Update application files

There are 3 objectives in this part of the tutorial:

  1. Setup project/application
  2. Download baseline sample app
  3. (Re)Familiarize yourself with gcloud commands

1. Setup project

We recommend reusing the same project as the one you used for completing the Module 3 codelab. Alternatively, you can create a brand new project or reuse another existing project. Ensure the project has an active billing account and App Engine (app) is enabled.

2. "Get" baseline sample app

One of the prerequisites to this codelab is to have a working Module 3 sample app. If you don't have one, complete the Module 3 tutorial (link above) before moving ahead here. Otherwise, if you're already familiar with its contents, you can just start by grabbing the Module 3 code below.

Whether you use yours or ours, the Module 3 code is where we'll START. This Module 6 codelab walks you through each step, and when complete, it should resemble code at the FINISH point. (This tutorial is only available for Python 3.)

The directory of Module 3 files (yours or ours) should look like this:

$ ls
README.md               main.py                 templates
app.yaml                requirements.txt

3. (Re)Deploy Module 3 app

Your remaining prework steps to execute now:

  1. Re-familiarize yourself with the gcloud command-line tool (if necessary)
  2. (Re)deploy the Module 3 code to App Engine (if necessary)

Once you've successfully executed those steps and confirmed the app is operational, we'll move ahead in this tutorial, starting with the configuration files.

Python 2 requirements

  • Ensure app.yaml (still) references the 3rd-party bundled packages: grpcio and setuptools
  • Ensure appengine_config.py still uses pkg_resources and google.appengine.ext.vendor to point the app at 3rd-party resources.
  • In the next section updating requirements.txt, you must use google-cloud-firestore==1.9.0, as that is the final Python 2.x-compatible version of the Python Firestore client library
    • If your requirements.txt has an entry for google-cloud-core, leave it as-is.
    • Be sure to delete lib and reinstall with pip install -t lib -r requirements.txt

The only configuration change is a minor package swap in your requirements.txt file.

  1. Update requirements.txt to include the Cloud Firestore library (google-cloud-firestore).
  2. app.yaml and templates/index.html remain unchanged
  3. Update your application to use Cloud Firestore

1. Update requirements.txt

Replace google-cloud-datastore with google-cloud-firestore in requirements.txt:

Flask==1.1.2
google-cloud-firestore==2.0.1

We recommend using the latest versions of each library, but if they don't work, you can roll back to an older release. The version numbers above are the latest at the time of this writing.

1. Imports

Switching the package import is a minor change from datastore to firestore:

  • BEFORE:
from google.cloud import datastore
  • AFTER:
from google.cloud import firestore

2. Firestore access

After initializing Flask, create your Firestore client in the same way you did for Datastore. Make a similar change as above but for client initialization:

  • BEFORE:
app = Flask(__name__)
ds_client = datastore.Client()
  • AFTER:
app = Flask(__name__)
fs_client = firestore.Client()

By performing the migration from Cloud NDB to Cloud Datastore, you've already done the heavy lifting to get to Cloud Firestore. With Datastore, you create data records in the form of Entities made up of common Properties and identify them with Keys. Data records in Firestore are Documents, made up of key-value pairs, and grouped together into Collections. Migrating from Datastore requires you to think about these differences because they materialize both when you create data records and when you query for them. YMMV ("your mileage may vary") depending on how complex your Datastore code is.
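To make the Key versus Document/Collection contrast concrete, here is a minimal, hypothetical sketch in plain Python (no Cloud libraries needed). It assumes a Datastore key is given as its flat path of alternating kind and ID values (the tuple form exposed by the datastore library's Key.flat_path) and maps it onto Firestore's convention of alternating collection names and document IDs. The helper names are illustrative only; they are not part of either client library and do not describe how the export/import tools work.

```python
def to_document_path(flat_path, namespace=None):
    """Hypothetical helper: Datastore key flat path -> Firestore document path.

    ('Visit', 42)                      -> 'Visit/42'
    ('Guestbook', 'main', 'Visit', 42) -> 'Guestbook/main/Visit/42'
    """
    if namespace:
        # Native-mode Firestore has no namespaces, so there is nothing to map to.
        raise ValueError('namespaces are not available in Firestore native mode')
    if not flat_path or len(flat_path) % 2:
        # An incomplete key (kind without an ID) has no document ID yet.
        raise ValueError('expected complete (kind, id) pairs')
    if any('/' in str(part) for part in flat_path):
        # '/' separates path segments, so it cannot appear inside an ID.
        raise ValueError("Firestore IDs cannot contain '/'")
    # Firestore document IDs are strings, so numeric Datastore IDs are stringified.
    return '/'.join(str(part) for part in flat_path)


def parent_collection_path(flat_path):
    """Collection holding the document: drop the final document ID.

    Querying by the ancestor ('Guestbook', 'main') in Datastore roughly
    corresponds to querying the 'Guestbook/main/Visit' subcollection.
    """
    return to_document_path(flat_path).rsplit('/', 1)[0]


print(to_document_path(('Guestbook', 'main', 'Visit', 42)))        # Guestbook/main/Visit/42
print(parent_collection_path(('Guestbook', 'main', 'Visit', 42)))  # Guestbook/main/Visit
```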

For Datastore, you make queries based on Entity type along with filtering and sorting criteria. For Firestore, querying data is similar. Let's look at a quick example, assuming these query values, clients (ds_client or fs_client, respectively), and imports:

from datetime import datetime
from google.cloud import firestore

# Sort-direction constant passed to order_by()
DESCENDING = firestore.Query.DESCENDING

OCT1 = datetime(2020, 10, 1)
LIMIT = 10

For Datastore, let's query for the ten most recent Visit entities on or after October 1, 2020, in descending order:

query = ds_client.query(kind='Visit')
query.add_filter('timestamp', '>=', datetime(2020, 10, 1))
query.order = ['-timestamp']
return query.fetch(limit=LIMIT)

Doing the same for Firestore, from the Visit collection:

query = fs_client.collection('Visit')
# where() and order_by() return new queries, so reassign the result each time
query = query.where('timestamp', '>=', datetime(2020, 10, 1))
query = query.order_by('timestamp', direction=DESCENDING)
return query.limit(LIMIT).stream()
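One detail of the Firestore query API is easy to miss: where(), order_by(), and limit() each return a new query object rather than modifying the one they're called on, so their results must be reassigned or chained. The sketch below illustrates that copy-on-call behavior with a hypothetical, in-memory LocalQuery class (not part of the Firestore library) that applies the same filter, sort, and limit to plain dicts. It supports only a few comparison operators and assumes every dict has the fields being filtered or sorted on.

```python
import operator
from datetime import datetime

# Comparison operators this sketch understands (Firestore supports more).
_OPS = {'<': operator.lt, '<=': operator.le, '==': operator.eq,
        '>=': operator.ge, '>': operator.gt}

ASCENDING, DESCENDING = 'ASCENDING', 'DESCENDING'


class LocalQuery:
    """Hypothetical in-memory query that mimics Firestore's builder style."""

    def __init__(self, docs, filters=(), order=None, count=None):
        self._docs = docs          # list of dicts standing in for documents
        self._filters = filters    # tuple of (field, op_func, value)
        self._order = order        # (field, direction) or None
        self._count = count        # maximum number of results, or None

    def where(self, field, op, value):
        # Returns a NEW query; self is left unchanged.
        return LocalQuery(self._docs, self._filters + ((field, _OPS[op], value),),
                          self._order, self._count)

    def order_by(self, field, direction=ASCENDING):
        return LocalQuery(self._docs, self._filters, (field, direction), self._count)

    def limit(self, count):
        return LocalQuery(self._docs, self._filters, self._order, count)

    def stream(self):
        results = [d for d in self._docs
                   if all(op(d[f], v) for f, op, v in self._filters)]
        if self._order is not None:
            field, direction = self._order
            results.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
        if self._count is not None:
            results = results[:self._count]
        return iter(results)


visits = [{'timestamp': datetime(2020, 9, 30)},
          {'timestamp': datetime(2020, 10, 2)},
          {'timestamp': datetime(2020, 10, 5)}]

query = LocalQuery(visits)
query.where('timestamp', '>=', datetime(2020, 10, 1))   # result discarded: no effect
print(len(list(query.stream())))                        # 3

query = query.where('timestamp', '>=', datetime(2020, 10, 1))
query = query.order_by('timestamp', direction=DESCENDING)
print([v['timestamp'].day for v in query.limit(10).stream()])   # [5, 2]
```

The first where() call builds a filtered query and immediately throws it away, which is why the unfiltered query still streams all three records.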

The sample app query is simpler (no "WHERE" clause). As a review, here is the Cloud Datastore code:

  • BEFORE:
def store_visit(remote_addr, user_agent):
    entity = datastore.Entity(key=ds_client.key('Visit'))
    entity.update({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })
    ds_client.put(entity)

def fetch_visits(limit):
    query = ds_client.query(kind='Visit')
    query.order = ['-timestamp']
    return query.fetch(limit=limit)

In Firestore, creating new documents is similar to creating entities, and the queries look like the ones shown earlier.

  • AFTER:
def store_visit(remote_addr, user_agent):
    doc_ref = fs_client.collection('Visit')
    doc_ref.add({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })

def fetch_visits(limit):
    visits_ref = fs_client.collection('Visit')
    visits = (v.to_dict() for v in visits_ref.order_by('timestamp',
            direction=firestore.Query.DESCENDING).limit(limit).stream())
    return visits

The main function root() stays the same, as does the index.html template file. Double-check your changes, save, deploy, and verify.
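Before redeploying, you may want a quick local check of the migrated functions that doesn't touch a real database. The sketch below is one way to do that, assuming you're willing to substitute a unittest.mock.MagicMock for fs_client. The two functions are copied from the AFTER code; DESCENDING is a placeholder standing in for firestore.Query.DESCENDING so the example runs without google-cloud-firestore or credentials installed.

```python
from datetime import datetime
from unittest import mock

# Stand-ins so this runs without google-cloud-firestore or credentials.
fs_client = mock.MagicMock()     # replaces firestore.Client()
DESCENDING = 'DESCENDING'        # placeholder for firestore.Query.DESCENDING


def store_visit(remote_addr, user_agent):
    doc_ref = fs_client.collection('Visit')
    doc_ref.add({
        'timestamp': datetime.now(),
        'visitor': '{}: {}'.format(remote_addr, user_agent),
    })


def fetch_visits(limit):
    visits_ref = fs_client.collection('Visit')
    visits = (v.to_dict() for v in visits_ref.order_by('timestamp',
            direction=DESCENDING).limit(limit).stream())
    return visits


# store_visit() should add one document to the 'Visit' collection.
store_visit('127.0.0.1', 'curl/7.68.0')
fs_client.collection.assert_called_with('Visit')
added = fs_client.collection.return_value.add.call_args[0][0]
assert added['visitor'] == '127.0.0.1: curl/7.68.0'

# fetch_visits() should turn each snapshot into a dict via to_dict().
snapshot = mock.MagicMock()
snapshot.to_dict.return_value = {'visitor': 'x: y'}
query = fs_client.collection.return_value.order_by.return_value
query.limit.return_value.stream.return_value = [snapshot]
assert list(fetch_visits(10)) == [{'visitor': 'x: y'}]
query.limit.assert_called_with(10)
```

Because fetch_visits() returns a generator, its results can be iterated only once; that is fine if the template loops over them a single time, but worth remembering if you reuse them elsewhere.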

Deploy application

Re-deploy your app with gcloud app deploy, and confirm the app works. Your code should now match what's in the Module 6 repo (or a Python 2 version if that was your preference).

Congratulations on completing this optional Module 6 migration. This is likely one of the last, if not the final, migrations you can make as far as App Engine data storage goes. One alternative migration to consider is containerizing your app for Cloud Run if you haven't already.

Optional: Clean up

Consider cleaning up to avoid being billed until you're ready to move on to the next migration codelab. As existing developers, you're likely already up to speed on App Engine's pricing information.

Optional: Disable app

If you're not ready to go to the next tutorial yet, disable your app to avoid incurring charges. When you're ready to move on to the next codelab, you can re-enable it. While your app is disabled, it won't get any traffic to incur charges. However, you can still be billed for your Firestore usage if it exceeds the free quota, so delete enough data to fall under that limit.

On the other hand, if you're not going to continue with migrations and want to delete everything completely, you can shut down your project.

Next steps

Beyond this tutorial, there are several other migration module codelabs you can consider:

  • Module 7: App Engine Push Task Queues (required if you use [push] Task Queues)
    • Adds App Engine taskqueue push tasks to Module 1 app
    • Prepares users for migrating to Cloud Tasks in Module 8
  • Module 4: Migrate to Cloud Run with Docker
    • Containerize your app to run on Cloud Run with Docker
    • Allows you to stay on Python 2
  • Module 5: Migrate to Cloud Run with Cloud Buildpacks
    • Containerize your app to run on Cloud Run with Cloud Buildpacks
    • No need to know anything about Docker, containers, or Dockerfiles
    • Requires you to have already migrated your app to Python 3

App Engine migration module codelabs issues/feedback

If you find any issues with this codelab, please search for your issue first before filing. Links to search and create new issues:

Migration resources

Links to the repo folders for Module 3 (START) and Module 6 (FINISH) can be found in the table below. They can also be accessed from the repo for all App Engine migrations.

Codelab | Python 2 | Python 3
Module 3 | repo | repo
Module 6 | (n/a) | repo

App Engine resources

Below are additional resources regarding this specific migration: