Generated columns in MySQL

Sometimes a table just needs to capture JSON input and store it. An example is the greenhouse api (developers.greenhouse.io/webhooks.html). In order to track stage changes you need to listen to webhook events which arrive as json blobs. Rather than having to design a full schema up front you can simply store the payload and add generated columns later as you need them. Here's an example you can try:

DROP TABLE IF EXISTS service_webhooks;
CREATE TABLE service_webhooks (
  id int not null auto_increment primary key,
  payload json
);


INSERT INTO service_webhooks (payload) VALUES ('{"user_id": 123, "message": "how are you?", "reciever_id": 456, "metadata": {"key": "value"}}');
INSERT INTO service_webhooks (payload) VALUES ('{"user_id": 456, "message": "doing well", "reciever_id": 123, "metadata": {"key": "value"}}');
INSERT INTO service_webhooks (payload) VALUES ('{"user_id": 789, "message": "hi!", "broadcast_id": "abc", "metadata": {"key": "value"}}');

The query you'd normally run is:

SELECT id, payload->>"$.user_id", payload FROM service_webhooks;

However this is hard to remember and we use it often enough we'd like to extract it. We can use generated columns as follows:

ALTER TABLE service_webhooks ADD COLUMN user_id INT GENERATED ALWAYS AS (payload->>"$.user_id") VIRTUAL;
SELECT * FROM service_webhooks;

In this case we see:

Screen Shot 2022-04-15 at 7.29.30 PM.png

This column is virtual so it is computed at run time, however this means we can't make indexes on it or have it appear in the binary log for ETL or CDC use cases. To do that we can use generated columns that are generated:

ALTER TABLE service_webhooks ADD COLUMN user_id_2 INT GENERATED ALWAYS AS (payload->>"$.user_id") STORED;
ALTER TABLE service_webhooks ADD INDEX `user_id_2` (`user_id_2`);

Screen Shot 2022-04-15 at 7.30.37 PM.png

Screen Shot 2022-04-15 at 7.30.55 PM.png

Screen Shot 2022-04-15 at 7.31.10 PM.png

Other use cases for computed columns range from running a complex stored procedure and storing that result in a column for calculations later (a very rudimentary sort of materialization), modular cohorting (hash an integer id into a group MOD(id, 10) for instance), or doing simple math computations like adding a tax and a charge amount.

Good luck!