How to sync data from Senseforce (Paze) to PostgreSQL with Airbyte

In this blog post we are going to discover how we can use Airbyte to synchronize Senseforce datasets to our own PostgreSQL database.

For a general introduction to what Senseforce and Airbyte are, please have a look at my previous post.

Prerequisites
Prepare a Dataset in Senseforce
Configuring the Airbyte Connector
Starting the Sync
Advantages of Airbyte for Senseforce data syncs
Summary

Prerequisites

To follow this guide, you need:

git and docker installed
a Senseforce user account which can create datasets

Prepare a Dataset in Senseforce

For a general and extensive introduction on how to use the Senseforce connector for Airbyte, see the official airbyte docs. Nevertheless, let's start from scratch here.

To use Airbyte to sync some data from Senseforce, we first need to define what data we want to extract.

Create a new Dataset Create new Dataset
Add the columns you want to sync by clicking on the data attributes in the "Add Data" section Senseforce Add Data Dialog

Important: You definitely need to add the "Timestamp", "Thing and "Id" column of the "Metadata" section. This is needed so that Airbyte can provide the Stream Slicing and Incremental Sync features.

In our example we are interested in the "Uncompressed size" and the "Inserted Events" metrics - see the above Screenshot for reference. But feel free to add any columns you like - as long as you keep Thing, Timestamp and Id in the dataset.
Give the Dataset a nice name and save. Senseforce save the dataset
Navigate to your user profile and create an API token Senseforce Create an API Token Make sure to not down this token as you will need it later in the Airbyte configuration.

NOTE: That's it. Your Senseforce installation is ready to export the defined dataset.

Configuring the Airbyte Connector

Conveniently for us, Airbyte already provides a Senseforce Source Connector - meaning we have an easy time and can use the user interface of Airbyte to configure our data extraction.

Clone the Airbyte Github Repository to your local machine by running:
```
1git clone https://github.com/airbytehq/airbyte.git
```
This will download the Airbyte source code and provides a convenient way for us to spin up an on-demand airbyte instance on our local machine.

NOTE: Alternatively, you can also host airbyte on a remote server - but that's a story for another time.

Run the following commands to run Airbyte on your local machine.

1cd airbyte
2docker compose up

Wait until the following output is shown in your terminal:

1      ___    _      __          __
2     /   |  (_)____/ /_  __  __/ /____
3    / /| | / / ___/ __ \/ / / / __/ _ \
4   / ___ |/ / /  / /_/ / /_/ / /_/  __/
5  /_/  |_/_/_/  /_.___/\__, /\__/\___/
6                      /____/
7  --------------------------------------
8   Now ready at http://localhost:8000/
9  --------------------------------------
10  Version: 0.40.23
11

With your browser, navigate to http://localhost:8000. The default username is airbyte and password is password. Complete the sign-up steps and will arrive at a page similar to the one below. Airbyte Start-Page
Click on "Create your first Connection" and select "Senseforce" from the Airbyte Dropdown menu Airbyte source selection
You will arrive at the following screen: Airbyte connector config screen

Add the information as follows:
- Source name: Display name of this source in your airbyte instance. Can be any name.
- Dataset ID: Id of your Senseforce dataset. This is the last part of your dataset-url in Senseforce Senseforce Dataset Id
- The first day (in UTC) when to read data from: As the name implies.
- Senseforce backend URL: The URL of your Senseforce backend. Easiest way to find your backend URL is to log out from your Senseforce profile. In the login Screen, you see the backend-url. It's simply the domain of the login screen. Senseforce backend url
- API Access Token: Enter the access token you created in the previous step.
Click Set up source afterwards.
The next screen will ask you to select a Destination: Airbyte Destination selection screen

There you may select where to send your data to. In our example, we want to select Postgres to store the exports in our very own database. In the next screen, enter:
- Destination name: Name of this Destination in your Airbyte instance. This can be any name.
- Host: Hostname of the database.
- Port: Port of the database.
- DB Name: Name of the database.
- Default Schema: The default schema tables are written to if the source does not specify a namespace. The usual value for this field is "public".
- User: Username to use to access the database.
- Password: Password associated with the database user.
- SSL Connection: Encrypt data using SSL. When activating SSL, select one of the connection modes in the next setting.
- SSL modes: Only relevant, if you select SSL Connection. This setting defines the SSL connection modes to connect to your Postgres database. Make sure your Postgres database is configured to connect with the selected mode.
- JDBC URL Params: Additional properties to pass to the JDBC URL string when connecting to the database formatted as 'key=value' pairs separated by the symbol '&'. (example: key1=value1&key2=value2&key3=value3).
- SSH Tunnel Method: Whether to initiate an SSH tunnel before connecting to the database, and if so, which kind of authentication to use. (This is actually a great feature, as it allows to securely SSH - tunnel to your database host. So you don't need to expose the database host).
Click "Set up Destination".
The next screen is the "Connection" screen, allowing you to configure, how to sync data between Senseforce and your PostgreSQL. You might hover over all the information-symbols to get an easy-to-understand description of what exactly we are setting. To finish the configuration, adjust the next screen as follows and click on "Set up connection". Airbyte Senseforce to Postgres connection configuration
- Connection Name: Name of the connection in your Airbyte instance. Can be any name.
- Replication frequency: Airbyte provides a powerful scheduler. If you want, you can set this to eg. 30 minutes to schedule a data sync every 30 minutes. We are interested in one-time-sync only, therefore we set it to "Manual".
- Namespace: This defines the name of our resulting file. It will be called export in our example.
- Sync mode: This defines how Airbyte syncs data from source to destination. Options are either "Incremental" or "Full Refresh". A "Full Refresh" always syncs all data. For Incremental Syncs, Airbyte stores a state variable - which in case of the Senseforce connector is the Timestamp of your Dataset. When you attempt to sync this source the next time, Airbyte will look up this state variable and continue syncing from the last successfully synced timestamp.
- Normalization & Transformation: For PostgreSQL destination, Airbyte provides a nice feature called Normalization. When running airbyte, the raw-data are simply stored as JSON blobs - so you will end up with having one single column called _airbyte_raw_data. If you enable Normalization, Airbyte attempts to unpack and flatten the JSON and provide separate columns for each JSON key. subsequently this means, that you will end up with one column per Senseforce dataset column - which is kind of what we want.
NOTE: If you do not enable Normalization, your data will still be synced - but you need to unpack the JSON in your downstream systems.

You are now read for your first data extraction. In the connection screen where you ended up after configuring your connector, click Sync now to start your first source synchronization. You might click on the "Sync Running" button to open the Logs and see whats happening. In the below example we see, that the connector is currently working and reading thousands of records. Airbyte sync logs

Wait until the sync is finished. You can find your exported data in the specified PostgreSQL database.

Advantages of Airbyte for Senseforce data syncs

Ok, we did all this connector configuration work - but what did we gain - compared to directly using the Senseforce API? Actually quite a lot:

We can use the configured connector to sync Senseforce data to any other supported Destination. Like local CSVs, Google BigQuery, Snowflake or even MQTT.
The Senseforce API is quite powerful, but also quite complex. It allows filtering and pagination. Airbyte handles all of that for us.
1. It paginates data to never fetch more than the Senseforce-supported API limits.
2. It implements backoff and retries to handle API rate limits.
3. And most importantly: Airbyte uses Stream Slices to fetch small chunks of data. So if you want to fetch data for let's say 5 years - Airbyte intelligently filters the Dataset to only include one day worth of data. This ensures to prevent any timeouts and API overloads.
We can start scheduling this connection - to automatically get the latest amount of data. We can combine this with "Incremental Syncs" to extract the most important information from Senseforce in a cost- and time-efficient manner.

Summary

We have seen, that Airbyte makes it easy to extract data even from very powerful APIs like the Senseforce APIs. It helps with features like Incremental Syncs, Retries, Backoff strategies, pagination and Stream Slices.

We also saw the easy-to-use Senseforce query builder and how convenient we can create our Datasets which we subsequently use in one of our downstream systems. By utilizing a PostgreSQL export, we enable ourselves to do advanced data analyses or custom BI dashboarding.

Senseforce, with it's very targeted machine industry low-code-tool seems perfect to collect, manage and prepare our data with very little to now IT/Data Science know how required. In combination with Airbyte, one can enable their companies Data Scientists to perform further Analyses, Machine Learning and more.

------------------

Interested in how to train your very own Large Language Model?

We prepared a well-researched guide for how to use the latest advancements in Open Source technology to fine-tune your own LLM. This has many advantages like:

Cost control
Data privacy
Excellent performance - adjusted specifically for your intended use

Get your free LLM training guide