Webhooks Integration
What is a Webhook?
Webhooks are automated messages that websites or applications send when a certain event occurs. Depending on how the message is configured, it includes event data and is delivered to a unique Webhook URL.
How can you use Webhooks to import data to your data warehouse?
Various websites and applications allow you to subscribe to events. Webhooks then send data in the form of a JSON payload whenever one of these events, also called triggers, fires.
For example, you may want to track whenever a lead moves through your sales pipeline on Hubspot. You configure the Webhook to subscribe to this status-change event. Whenever the status changes, the event is triggered and Hubspot POSTs an HTTP request containing the event data to the pre-configured Webhook URL. Once our server receives the event data, it is stored temporarily in your Google Cloud Storage. A scheduled job runs every 30 minutes to load this data into its respective table in BigQuery. After the data is loaded into the BigQuery table, the temporary copy of the raw event data is deleted from your Google Cloud Storage.
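The receive-stage-load flow above can be sketched conceptually as follows. This is an illustrative in-memory model, not Y42 internals: the lists stand in for the temporary Cloud Storage files and the BigQuery table, and the Hubspot-style field names are assumptions.

```python
import json

# Illustrative stand-ins: staging_area plays the role of the temporary
# Google Cloud Storage files, bigquery_table the destination table.
staging_area = []
bigquery_table = []

def receive_event(payload: dict, received_at: int) -> None:
    """Simulate the Webhook URL receiving a POSTed JSON event."""
    staging_area.append({"timestamp": received_at, "json": json.dumps(payload)})

def scheduled_load() -> None:
    """Simulate the 30-minute job: move staged rows to the table, clear staging."""
    bigquery_table.extend(staging_area)
    staging_area.clear()

# A Hubspot-style status-change event arrives (field names are illustrative)...
receive_event(
    {"objectId": 123, "propertyName": "dealstage", "propertyValue": "closedwon"},
    1645803162,
)
# ...and only becomes visible in the table after the next scheduled load.
scheduled_load()
```

Note that between `receive_event` and `scheduled_load`, the data exists only in the staging area, which mirrors why freshly received events are not immediately queryable in BigQuery.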
How can you set up a Webhook to send data to your Y42 Webhook URL?
Step 1: Create a Webhook integration
Go to the Integrations workspace and click on Add. Add a new Webhooks integration and name it accordingly.
Step 2: Name the table
By default, a Webhook integration comes with one table. You can use this existing table or create additional ones: enter a new table name and click "Create New Table" to add another Webhooks table.
Step 3: Create a Webhook in your application
Every platform and application (e.g. Zendesk, Shopify, Hubspot, etc.) has its own way of configuring Webhooks, and how extensive these configurations are depends on the platform. To make sure that your integration works, please check that
- you have copied the Webhooks endpoint URL from your Y42 Webhooks table (see Step 2)
- you configure the Webhook to send data in JSON format via the POST method
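Both requirements can be sanity-checked by building the request yourself before wiring up the source application. This sketch uses only the Python standard library; the URL is a placeholder for the endpoint you copied in Step 2, and the request is constructed but deliberately not sent.

```python
import json
import urllib.request

# Placeholder -- substitute the URL copied from your Y42 Webhooks table.
webhook_url = "https://example.com/your-y42-webhook-endpoint"

# Requirement 2: the body is JSON, delivered via POST.
body = json.dumps({"event": "test", "source": "manual-check"}).encode("utf-8")
req = urllib.request.Request(
    webhook_url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# The request is only constructed here; calling urllib.request.urlopen(req)
# would perform the actual delivery to the Webhook URL.
```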
Example: Zendesk
Another characteristic of Webhooks is that they are only triggered when certain conditions are met.
Example: Zendesk
Schema
As events are fully customizable per use case, our Webhook integration generalizes the approach and stores the incoming data in these two fixed columns:
| Timestamp (INT64) | JSON (STRING) |
| --- | --- |
| 1645803162 | `{ "uid": "bltfe99999fa999b9a9" }` |
The first column is the timestamp at which the event was received. The second column is the JSON payload as a STRING. To make use of the data received by the Webhook integration, you will need to parse it using the JSON extraction node inside a model.
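For illustration, the example row above can be unpacked outside of Y42 with standard epoch-timestamp conversion and JSON parsing; the JSON extraction node performs the equivalent parsing step in-pipeline.

```python
import json
from datetime import datetime, timezone

# The two fixed columns from the example row above.
row = {"Timestamp": 1645803162, "JSON": '{ "uid": "bltfe99999fa999b9a9" }'}

# The Timestamp column is a Unix epoch (seconds) marking when the event arrived.
received_at = datetime.fromtimestamp(row["Timestamp"], tz=timezone.utc)

# The JSON column is the raw payload as a string and must be parsed before use.
payload = json.loads(row["JSON"])
```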
Schema changes
Webhooks do not propagate schema changes as columns on the Y42 integration level. Instead, we load every payload into one JSON string. Any change to the schema, whether made by the subscribed app or intended by the user, needs to be managed in the modeling layer.
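One common way to absorb such schema drift in the modeling layer is to extract fields defensively, supplying defaults for keys that older or newer payloads may lack. The field names below are illustrative, not part of any specific application's schema.

```python
import json

# Two payloads from the same Webhook: the source app later added a "status" field.
old_payload = '{"uid": "a1", "email": "old@example.com"}'
new_payload = '{"uid": "b2", "email": "new@example.com", "status": "active"}'

def extract(raw: str) -> dict:
    """Parse the JSON string column, tolerating missing or extra keys."""
    data = json.loads(raw)
    return {
        "uid": data.get("uid"),
        "email": data.get("email"),
        # Default for rows written before the field existed.
        "status": data.get("status", "unknown"),
    }

rows = [extract(old_payload), extract(new_payload)]
```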
Rate Limits
We currently do not rate-limit our webhooks in any way.
Remarks
Historical Data
Webhooks do not support syncing historical data; they can only capture data from the connection date onwards.
Updating Data
Every POST to the Webhook URL is batched and appended to its destination table on a 30-minute interval. When the Webhook receives data, the data is temporarily saved in your Cloud Storage; 30 minutes after reception, it is transferred to the dedicated BigQuery table and the temporary Cloud Storage file is deleted.
That means, for example, that if your Zendesk Webhook sends data to the Y42 Webhook URL every minute, this data is batched and loaded to the destination only every 30 minutes!
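Assuming for illustration that loads happen on fixed 30-minute boundaries (the exact scheduling is internal to Y42), the earliest time a given event can appear in BigQuery can be estimated like this:

```python
INTERVAL = 30 * 60  # 30 minutes in seconds

def next_load_after(received_epoch: int, interval: int = INTERVAL) -> int:
    """Round a reception time (Unix epoch seconds) up to the next batch boundary."""
    return ((received_epoch // interval) + 1) * interval

# Events received one minute apart land in the same 30-minute batch.
first = next_load_after(1645803162)
second = next_load_after(1645803162 + 60)
```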
Note: You cannot trigger a manual re-import. The Webhook automatically updates the BigQuery table every 30 minutes after new data is received.