1. Introduction
This codelab is a guide for anyone who is migrating queries from Apache Cassandra to Google Cloud Bigtable.
In this codelab, you'll:
- Use the Cloud Bigtable emulator
- Explore a timeseries use case
- Create a table and column family
- Learn the Cloud Bigtable Java equivalents of CQL Insert, Update, Select, and Delete
2. Set up
You'll be looking at a sample dataset with just a few rows to let you get an understanding of the core concepts quickly.
Cassandra
Equivalent Cassandra queries are shown at each step, so feel free to follow along on a local cluster if you'd like, or you can quickly set up a click-to-deploy Cassandra cluster and SSH into it.
If you're following along, create a keyspace and use it. The replication strategy won't be important here.
cqlsh> create keyspace mykeyspace with replication = {'class':'SimpleStrategy','replication_factor' : 2};
cqlsh> use mykeyspace;
Cloud Bigtable
You'll need a Cloud Bigtable instance for your table. You can set up a local instance for free using the emulator. You won't need to create a keyspace in Cloud Bigtable; replication is handled by your instance configuration.
Use the following command to start the emulator:
gcloud beta emulators bigtable start
Then in another shell window or tab set the emulator environment variable with this command:
$(gcloud beta emulators bigtable env-init) # Sets BIGTABLE_EMULATOR_HOST
Then create a Java project that you'll use to run the code examples, and import the Cloud Bigtable client library (google-cloud-bigtable) with Maven, Gradle, or SBT. Then, in a new Java file, create a connection to the data client:
BigtableDataSettings settings =
BigtableDataSettings.newBuilder().setProjectId(projectId).setInstanceId(instanceId).build();
try {
dataClient = BigtableDataClient.create(settings);
} catch (Exception e) {
System.out.println("Error during data client connection: \n" + e.toString());
}
3. Table creation
In both Cassandra and Cloud Bigtable tables, each row has a key associated with it. A Cassandra primary key is made up of a partition key and clustering columns, which can be separate or overlapping. A Cloud Bigtable row key is used in its entirety for splits (partitions) and for ordering. The two systems are very similar when it comes to primary/row key construction: both are essentially lexicographically sorted lists in which the key is the main means of distributing rows between nodes. In most cases you can reuse the same key for Cloud Bigtable.
In this codelab, you'll use Cloud Bigtable to store time series data about mobile phones and mobile tablets (not to be confused with Bigtable tablets). Below are the instructions for creating the table.
Cassandra
cqlsh:mykeyspace> create table mobileTimeSeries ( deviceid text, devicetype text, date date, connected_cell map<timestamp,Boolean>, os_build text, os_name text, PRIMARY KEY((devicetype, deviceid), date));
Cloud Bigtable
You can create a table and column family using the Java client, but it's easiest to use the following command with the cbt tool:
cbt createtable mobile-time-series families=stats_summary
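If you'd rather stay in Java, here is a minimal sketch that uses the table admin client instead of cbt (it assumes the same projectId and instanceId variables as the data client above):

try {
  BigtableTableAdminSettings adminSettings =
      BigtableTableAdminSettings.newBuilder()
          .setProjectId(projectId)
          .setInstanceId(instanceId)
          .build();
  BigtableTableAdminClient adminClient = BigtableTableAdminClient.create(adminSettings);
  // Create the table with a single column family named stats_summary.
  adminClient.createTable(
      CreateTableRequest.of("mobile-time-series").addFamily("stats_summary"));
  adminClient.close();
} catch (Exception e) {
  System.out.println("Error during table creation: \n" + e.toString());
}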
Cloud Bigtable is a NoSQL database, so you won't need to define a schema at table creation, but you'll want to think about the queries you're going to run and optimize the row key for them.
The most common strategy for key migration is simply to take all partition keys and clustering columns and join them to form a string representing the Cloud Bigtable row key. We generally recommend string-based keys because they make it easier to debug the key distribution with Key Visualizer. You can use a separator such as a hash '#' between the column values to help with readability.
In this example, we'll use a row key of "[DEVICE_TYPE]#[DEVICE_ID]#[YYYYMMDD]".
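For illustration, a small hypothetical helper (not part of the codelab's sample code) could build that row key from the Cassandra partition key columns and clustering column:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: join the Cassandra key columns with '#' to form the row key.
static String toRowKey(String deviceType, String deviceId, LocalDate date) {
  // BASIC_ISO_DATE formats the date as YYYYMMDD, e.g. 20190501.
  return deviceType + "#" + deviceId + "#" + date.format(DateTimeFormatter.BASIC_ISO_DATE);
}

// toRowKey("phone", "4c410523", LocalDate.of(2019, 5, 1)) returns "phone#4c410523#20190501"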
4. Inserts
Inserts are fairly similar between Cassandra and Cloud Bigtable. Here you'll insert one row and then multiple rows using a rowkey of "[DEVICE_TYPE]#[DEVICE_ID]#[YYYYMMDD]".
Cassandra
Single
cqlsh:mykeyspace> insert into mobileTimeSeries (deviceid, devicetype, date, connected_cell, os_build) values ('4c410523', 'phone',toDate(now()), {toTimeStamp(now()): true}, 'PQ2A.190405.003');
Batch
cqlsh:mykeyspace> BEGIN BATCH
insert into mobileTimeSeries (deviceid, devicetype, date, os_name, os_build) values ('a0b81f74', 'tablet', '2019-01-01', 'chromeos', '12155.0.0-rc1');
insert into mobileTimeSeries (deviceid, devicetype, date, os_name, os_build) values ('a0b81f74', 'tablet', '2019-01-02', 'chromeos', '12145.0.0-rc6');
APPLY BATCH;
Cloud Bigtable
Single
Create a mutation with the row key and data you'd like to use, then apply the mutation with the data client. You'll add a row of data for a phone with information about its cell connection and operating system.
try {
long timestamp = (long) 1556712000 * 1000; // Timestamp for May 1, 2019 12:00 UTC
String rowKey = "phone#4c410523#20190501";
ByteString one = ByteString.copyFrom(new byte[] {0, 0, 0, 0, 0, 0, 0, 1});
RowMutation rowMutation =
RowMutation.create(tableId, rowKey)
.setCell(
COLUMN_FAMILY_NAME,
ByteString.copyFrom("connected_cell".getBytes()),
timestamp,
one)
.setCell(COLUMN_FAMILY_NAME, "os_build", timestamp, "PQ2A.190405.003");
dataClient.mutateRow(rowMutation);
} catch (Exception e) {
System.out.println("Error during Write: \n" + e.toString());
}
Batch
Define multiple mutations on a BulkMutation object, then use the data client to apply all of them with one API call. You'll add a few days of data about a mobile tablet's operating system name and version.
try {
long timestamp = (long) 1556712000 * 1000; // Timestamp for May 1, 2019 12:00 UTC
BulkMutation bulkMutation =
BulkMutation.create(tableId)
.add(
"tablet#a0b81f74#20190501",
Mutation.create()
.setCell(COLUMN_FAMILY_NAME, "os_name", timestamp, "chromeos")
.setCell(COLUMN_FAMILY_NAME, "os_build", timestamp, "12155.0.0-rc1"))
.add(
"tablet#a0b81f74#20190502",
Mutation.create()
.setCell(COLUMN_FAMILY_NAME, "os_name", timestamp, "chromeos")
.setCell(COLUMN_FAMILY_NAME, "os_build", timestamp, "12155.0.0-rc6"));
dataClient.bulkMutateRows(bulkMutation);
} catch (Exception e) {
System.out.println("Error during WriteBatch: \n" + e.toString());
}
5. Updates
Here you'll add a cell that hasn't been written yet, and then write a new value to an existing cell while keeping its previous versions.
Cassandra
Adding cells
cqlsh:mykeyspace> UPDATE mobileTimeSeries SET os_name = 'android' WHERE devicetype='phone' AND deviceid = '4c410523' AND date = '2019-09-06';
Updating cells
cqlsh:mykeyspace> UPDATE mobileTimeSeries SET connected_cell = connected_cell + {toTimeStamp(now()): false} WHERE devicetype='phone' AND deviceid = '4c410523' AND date = '2019-09-06';
Cloud Bigtable
In Cloud Bigtable, you can just treat updates the same as writes.
Adding cells
This is the same as writing cells; just provide a column qualifier that you haven't written to previously. Here you'll add the operating system name to the phone's row.
try {
long timestamp = (long) 1556713800 * 1000; // Timestamp for May 1, 2019 12:30 UTC
String rowKey = "phone#4c410523#20190501";
RowMutation rowMutation =
RowMutation.create(tableId, rowKey)
.setCell(COLUMN_FAMILY_NAME, "os_name", timestamp, "android");
dataClient.mutateRow(rowMutation);
} catch (Exception e) {
System.out.println("Error during update: \n" + e.toString());
}
Updating cells
Here you'll add new data about the phone's cell connection status. You can use cell versions to store part of your time series data easily: just provide a timestamp for your write, and a new version of the cell is added. To clean up your data, you can use garbage collection to delete versions beyond a certain number or older than a certain age (see the sketch after the following example).
try {
long timestamp = (long) 1556713800 * 1000; // Timestamp for May 1, 2019 12:30 UTC
String rowKey = "phone#4c410523#20190501";
ByteString zero = ByteString.copyFrom(new byte[] {0, 0, 0, 0, 0, 0, 0, 0});
RowMutation rowMutation =
RowMutation.create(tableId, rowKey)
.setCell(
COLUMN_FAMILY_NAME,
ByteString.copyFrom("connected_cell".getBytes()),
timestamp,
zero);
dataClient.mutateRow(rowMutation);
} catch (Exception e) {
System.out.println("Error during update2: \n" + e.toString());
}
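The garbage collection policy mentioned above is set on the column family. As a hedged sketch (not part of the original codelab), you could keep only the two most recent versions of each cell in stats_summary by using the table admin client:

try {
  BigtableTableAdminSettings adminSettings =
      BigtableTableAdminSettings.newBuilder()
          .setProjectId(projectId)
          .setInstanceId(instanceId)
          .build();
  BigtableTableAdminClient adminClient = BigtableTableAdminClient.create(adminSettings);
  // Keep at most 2 versions of each cell in the stats_summary column family.
  adminClient.modifyFamilies(
      ModifyColumnFamiliesRequest.of(tableId)
          .updateFamily(COLUMN_FAMILY_NAME, GCRules.GCRULES.maxVersions(2)));
  adminClient.close();
} catch (Exception e) {
  System.out.println("Error while setting the garbage collection policy: \n" + e.toString());
}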
6. Selects
Now you'll retrieve the data you've written into the table. When migrating CQL SELECT statements, you need to take into account several aspects, such as the columns selected, filtering through WHERE clauses, limits, and aggregate functions such as GROUP BY. Here you'll just look at two simple select statements to get the basic idea, but you can look at the documentation for more information on selecting data. In Cloud Bigtable there are two types of read operations: a get retrieves one row, while a scan retrieves a range of rows.
Cassandra
Single
cqlsh:mykeyspace> SELECT * FROM mobileTimeSeries WHERE devicetype='phone' AND deviceid = '4c410523' AND date = '2019-09-04';
Multiple
cqlsh:mykeyspace> SELECT * FROM mobileTimeSeries WHERE devicetype='tablet' AND deviceid = 'a0b81f74' AND date >= '2019-09-04';
Cloud Bigtable
Single
Use a row lookup to get the data for a specific phone on the specified date, all of which is within one row. This returns every timestamped version of the values, so you should see two entries for connected_cell at different timestamps.
try {
String rowKey = "phone#4c410523#20190501";
Row row = dataClient.readRow(tableId, rowKey);
for (RowCell cell : row.getCells()) {
System.out.printf(
"Family: %s Qualifier: %s Value: %s Timestamp: %s%n",
cell.getFamily(),
cell.getQualifier().toStringUtf8(),
cell.getValue().toStringUtf8(),
cell.getTimestamp());
}
} catch (Exception e) {
System.out.println("Error during lookup: \n" + e.toString());
}
Multiple
Use a range scan to view a month of data for a specified mobile tablet, which is spread across multiple rows. You can add a filter to only get certain versions of the data or to filter on values (see the sketch after this example).
try {
Query query = Query.create(tableId).range("tablet#a0b81f74#201905", "tablet#a0b81f74#201906");
ServerStream<Row> rowStream = dataClient.readRows(query);
for (Row row : rowStream) {
System.out.println("Row Key: " + row.getKey().toStringUtf8());
for (RowCell cell : row.getCells()) {
System.out.printf(
"Family: %s Qualifier: %s Value: %s Timestamp: %s%n",
cell.getFamily(),
cell.getQualifier().toStringUtf8(),
cell.getValue().toStringUtf8(),
cell.getTimestamp());
}
}
} catch (Exception e) {
System.out.println("Error during scan: \n" + e.toString());
}
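As an example of filtering, here is a hedged sketch (not part of the original codelab) that restricts the same scan to the latest version of each cell using the client's Filters helper:

// Same range scan as above, but only the newest cell per column is returned.
Query filteredQuery =
    Query.create(tableId)
        .range("tablet#a0b81f74#201905", "tablet#a0b81f74#201906")
        .filter(Filters.FILTERS.limit().cellsPerColumn(1));
ServerStream<Row> latestRows = dataClient.readRows(filteredQuery);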
7. Deletes
Here you'll delete the data you put into your table. First you'll delete an individual row, then you'll delete multiple rows.
Cassandra's CQL allows single-row deletes as well as range removals when all of the partition key columns are specified. You can achieve the same thing in Cloud Bigtable by scanning a range and then performing row-level deletes. You get the same result, but it takes more operations, since each row delete is its own mutation.
Cassandra
Single
cqlsh:mykeyspace> DELETE from mobileTimeSeries where devicetype='phone' and deviceid = '4c410523';
Multiple
cqlsh:mykeyspace> DELETE from mobileTimeSeries where devicetype='tablet' and deviceid = 'a0b81f74';
Cloud Bigtable
Single
Here you'll delete the data for a specific phone and date. Use the row key to delete one row at a time.
try {
String rowKey = "phone#4c410523#20190501";
RowMutation mutation = RowMutation.create(tableId, rowKey).deleteRow();
dataClient.mutateRow(mutation);
} catch (Exception e) {
System.out.println("Error during Delete: \n" + e.toString());
}
Multiple
Here you'll delete all the data for a specific mobile tablet. To migrate a CQL query that deletes multiple rows, you need to perform a scan and then delete each row from the resulting set of row keys.
try {
Query query = Query.create(tableId).prefix("tablet#a0b81f7");
ServerStream<Row> rowStream = dataClient.readRows(query);
BulkMutation bulkMutation = BulkMutation.create(tableId);
for (Row row : rowStream) {
bulkMutation.add(row.getKey(), Mutation.create().deleteRow());
}
dataClient.bulkMutateRows(bulkMutation);
} catch (Exception e) {
System.out.println("Error during DeleteMultiple: \n" + e.toString());
}
8. Finishing up
Clean up
Cassandra
If you created a Cassandra cluster to follow along with this, feel free to delete it as you normally would.
Cloud Bigtable
If you created your table on an existing Cloud Bigtable instance, you can delete it with the cbt command:
cbt deletetable mobile-time-series
If you used the emulator, you can clear out all of your work by stopping it: press CTRL+C in the terminal where you started it.
Next steps
- Learn more about Cloud Bigtable in the documentation.
- Try a more in-depth Cloud Bigtable codelab.
- Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.
- Learn how to monitor time-series data with the OpenTSDB integration