How to sync data from Senseforce (Paze) to local CSV with Airbyte

In this blog post, we look at what Airbyte is, what Senseforce is, and how to use Airbyte to sync any dataset from Senseforce to a local CSV file.

What is Airbyte

In simple terms, Airbyte is a system made for integrating data. More specifically, it is tailored to ELT (extract, load, transform) use cases: it is extremely good at extracting data from a source and loading it into another database or system. Why are they so good?

  • They have a ton of free and ready-to-use connectors, so there is a good chance that the system you want to integrate is already covered by Airbyte.
  • They have by far the best connector-building SDK. If the system of your choice is not available as a connector, it's really easy to create a new one (which is also the reason they have so many of them).
  • An awesome, helpful and active community. You know who built a lot of the Airbyte connectors? The community. Airbyte is very much focused on its community and invests heavily in building and maintaining it. They even pay active community members.

And what are they offering?

  • As mentioned, a lot of connectors out-of-the-box and a great connector builder SDK
  • Scheduling to automatically trigger syncs
  • Logging and Monitoring of connectors
  • Incremental updates (no need to sync ALL data with every sync. Airbyte will maintain a state)
  • Pagination
  • Authentication
  • Stream Slicing (If you have a lot of data to sync, Airbyte automatically divides the API requests into chunks of data - not overloading the source APIs)
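
To make the pagination item a bit more tangible, here is a minimal sketch of the general idea in Python. It is not Airbyte's actual implementation: the fetch_page callback and its offset/limit parameters are assumptions made for illustration, and an in-memory list stands in for a real API.

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 100
) -> Iterator[dict]:
    """Yield records page by page until the source returns a short or empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A page smaller than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size


# Hypothetical stand-in for a real API: 250 records held in memory.
DATA = [{"id": i} for i in range(250)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return DATA[offset:offset + limit]


records = list(paginate(fake_fetch, page_size=100))
print(len(records))  # 250
```

The caller only sees a stream of records and never has to think about page boundaries, which is the convenience a connector framework gives you.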

All in all, Airbyte is one of the best, if not the best, solutions for integrating modern systems into a modern data stack.

How Airbyte markets themselves (source: https://airbyte.com)

What is Senseforce

NOTE: At the time of writing, Senseforce has been re-branded to Paze.Industries. As many still know them by their previous name, I'll call them Senseforce, but I will update this post once the new brand is established. Congrats on the re-branding, by the way.

Senseforce/Paze.industries is, at its core, a low-code IoT solution targeted at the machine industry. Senseforce excels in out-of-the-box feature-completeness. They provide a wide array of features needed for big-tech IoT installations: they make it relatively easy to connect a machine, gather data and create feature-rich data applications in their cloud platform, and they cover the full lifecycle of a machine when it comes to the machine's data.

Among others, they offer the following key features:

  • Low-Code query builder. It's one of the most feature-rich low-code SQL query builders out there
  • Script integrations: Besides low-code queries, one can also extend their data analytics with scripts
  • Dashboards
  • Edge Device Remote Management: They provide secure remote accessibility to any connected remote device
  • Virtual Events: A form of data transformation
  • Scheduling and Triggers: Create Events and Actions based on your data
  • User/Groups: Senseforce was made to accommodate tenants and clients of tenants, meaning they provide very detailed user and group management capabilities

And thankfully they also offer a REST API, which we are going to use to sync their data.

Senseforce low-code query builder

Prerequisites

To follow this guide, you need:

  • git and docker installed
  • a Senseforce user account which can create datasets

Prepare a Dataset in Senseforce

Like all Airbyte connectors, the Senseforce Airbyte connector comes with documentation that is a great starting point. See the Airbyte docs for the introduction. Nevertheless, let's start from scratch here.

To use Airbyte to sync some data from Senseforce, we first need to define what data we want to extract.

  1. Create a new Dataset

    Create new Dataset

  2. Add the columns you want to sync by clicking on the data attributes in the "Add Data" section

    Senseforce Add Data Dialog

    Important: You must add the "Timestamp", "Thing" and "Id" columns of the "Metadata" section. They are needed so that Airbyte can provide the Stream Slicing and Incremental Sync features.

    In our example we are interested in the "Uncompressed size" and the "Inserted Events" metrics - see the screenshot above for reference. But feel free to add any columns you like, as long as you keep Thing, Timestamp and Id in the dataset.

  3. Give the Dataset a nice name and save.

    Senseforce save the dataset

  4. Navigate to your user profile and create an API token

    Senseforce Create an API Token

    Make sure to note down this token, as you will need it later in the Airbyte configuration.

NOTE: That's it. Your Senseforce installation is ready to export the defined dataset.
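
The column requirement from step 2 is easy to forget, so here is a small Python sketch of a check you could run against a list of column names. The function name is made up for this post, and it assumes the columns are spelled exactly as in the Senseforce "Metadata" section.

```python
# Columns the post says Airbyte needs for Stream Slicing and Incremental Sync.
REQUIRED_COLUMNS = {"Timestamp", "Thing", "Id"}


def missing_required_columns(columns):
    """Return the required metadata columns that are absent, sorted by name."""
    return sorted(REQUIRED_COLUMNS - set(columns))


# Example: the dataset from this post, but with "Id" forgotten.
example = ["Timestamp", "Thing", "Uncompressed size", "Inserted Events"]
print(missing_required_columns(example))  # ['Id']
```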

Configuring the Airbyte Connector

Conveniently for us, Airbyte already provides a Senseforce Source Connector - meaning we have an easy time and can use the user interface of Airbyte to configure our data extraction.

  1. Clone the Airbyte GitHub repository to your local machine by running:

    git clone https://github.com/airbytehq/airbyte.git

    This downloads the Airbyte source code and gives us a convenient way to spin up an on-demand Airbyte instance on our local machine.

    NOTE: Alternatively, you can also host Airbyte on a remote server - but that's a story for another time.

  2. Run the following commands to run Airbyte on your local machine.

    cd airbyte
    docker compose up

    Wait until the following output is shown in your terminal:

        ___    _      __          __
       /   |  (_)____/ /_  __  __/ /____
      / /| | / / ___/ __ \/ / / / __/ _ \
     / ___ |/ / /  / /_/ / /_/ / /_/  __/
    /_/  |_/_/_/  /_.___/\__, /\__/\___/
                        /____/
    --------------------------------------
     Now ready at http://localhost:8000/
    --------------------------------------
    Version: 0.40.23

  3. With your browser, navigate to http://localhost:8000. The default username is airbyte and the default password is password. Complete the sign-up steps and you will arrive at a page similar to the one below.

    Airbyte Start-Page

  4. Click on "Create your first Connection" and select "Senseforce" from the Airbyte Dropdown menu

    Airbyte source selection

  5. You will arrive at the following screen:

    Airbyte connector config screen

    Add the information as follows:

    • Source name: Display name of this source in your Airbyte instance. Can be any name.

    • Dataset ID: ID of your Senseforce dataset. This is the last part of your dataset URL in Senseforce.

      Senseforce Dataset Id

    • The first day (in UTC) when to read data from: The start date of the sync. Airbyte reads data from this day onwards.

    • Senseforce backend URL: The URL of your Senseforce backend. The easiest way to find it is to log out of your Senseforce profile. The backend URL is simply the domain of the login screen.

      Senseforce backend url

    • API Access Token: Enter the API token you created in your Senseforce user profile earlier.

    Click Set up source afterwards.

  6. The next screen will ask you to select a Destination:

    Airbyte Destination selection screen

    Here you select where to send your data. In our example, we select Local CSV to store the exports in a local CSV file. In the next screen, enter:

    • Destination name: Name of this Destination in your Airbyte instance. This can be any name.
    • destination_path: Airbyte writes all files to a local folder mount which, by default, can be found in the folder /tmp/airbyte_local on your host machine. Inside Airbyte, this folder is mounted as /local, so destination_path needs to start with /local. A possible example is /local/export. Note: This setting defines the directory where your files are placed - not the filename itself.
  7. Click "Set up Destination".

  8. The next screen is the "Connection" screen, which lets you configure how to sync data between Senseforce and your local CSV. You can hover over the information symbols to get an easy-to-understand description of each setting. To finish the configuration, adjust the screen as follows and click on "Set up connection".

    Airbyte Senseforce to Local CSV connection configuration

    • Connection Name: Name of the connection in your Airbyte instance. Can be any name.
    • Replication frequency: Airbyte provides a powerful scheduler. If you want, you can set this to e.g. 30 minutes to schedule a data sync every 30 minutes. We are interested in a one-time sync only, therefore we set it to "Manual".
    • Namespace: This defines the name of our resulting file. It will be called export in our example.
    • Sync mode: This defines how Airbyte syncs data from source to destination. Options are either "Incremental" or "Full Refresh". A "Full Refresh" always syncs all data. For Incremental Syncs, Airbyte stores a state variable - which in case of the Senseforce connector is the Timestamp of your Dataset. When you attempt to sync this source the next time, Airbyte will look up this state variable and continue syncing from the last successfully synced timestamp.
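
To illustrate the idea behind "Incremental" sync mode, here is a minimal Python sketch. It is a simplification, not the connector's real code: the function name and the record layout are assumptions, with the "Timestamp" column playing the role of the state variable described above.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def incremental_sync(
    records: Iterable[dict], state: Optional[datetime]
) -> Tuple[List[dict], Optional[datetime]]:
    """Return the records newer than the stored state and the updated state."""
    cursor = state
    new_records = []
    for record in records:
        timestamp = record["Timestamp"]
        # Only records after the last successfully synced timestamp are new.
        if state is None or timestamp > state:
            new_records.append(record)
            if cursor is None or timestamp > cursor:
                cursor = timestamp
    return new_records, cursor


rows = [
    {"Timestamp": datetime(2023, 1, 1), "Inserted Events": 10},
    {"Timestamp": datetime(2023, 1, 2), "Inserted Events": 12},
]

# First sync: no state yet, so everything is read.
first, state = incremental_sync(rows, None)

# New data arrives; the second sync only picks up what is newer than the state.
rows.append({"Timestamp": datetime(2023, 1, 3), "Inserted Events": 7})
second, state = incremental_sync(rows, state)
print(len(first), len(second))  # 2 1
```

A "Full Refresh" corresponds to always calling the function with an empty state.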

Starting the Sync

You are now ready for your first data extraction. In the connection screen where you ended up after configuring your connector, click Sync now to start your first source synchronization. You can click on the "Sync Running" button to open the logs and see what's happening. In the example below, we see that the connector is currently working and reading thousands of records.

Airbyte sync logs

Wait until the sync is finished. You can find your exported CSV file in the /tmp/airbyte_local directory on your host machine, inside the subfolder you configured as destination_path.
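
If you want to look at the export with Python, the following sketch might help. It rests on two assumptions that you should verify against your own file: that /local maps to the default host folder /tmp/airbyte_local mentioned in the destination settings, and that the CSV stores each record as a JSON string in a column called _airbyte_data (adjust data_column if your header looks different). The sample row is made up.

```python
import csv
import io
import json
from pathlib import PurePosixPath

HOST_MOUNT = PurePosixPath("/tmp/airbyte_local")  # default host folder
CONTAINER_MOUNT = PurePosixPath("/local")  # how Airbyte sees that folder


def host_path_for(destination_path: str) -> PurePosixPath:
    """Translate a destination_path such as /local/export into the host folder."""
    relative = PurePosixPath(destination_path).relative_to(CONTAINER_MOUNT)
    return HOST_MOUNT / relative


def parse_export(csv_text: str, data_column: str = "_airbyte_data") -> list:
    """Decode the JSON payload column of an exported CSV into dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.loads(row[data_column]) for row in reader]


# Build a tiny made-up export in memory instead of reading a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["_airbyte_ab_id", "_airbyte_emitted_at", "_airbyte_data"])
writer.writerow(
    ["abc", "1670000000000", json.dumps({"Thing": "machine-1", "Inserted Events": 42})]
)

rows = parse_export(buffer.getvalue())
print(host_path_for("/local/export"))  # /tmp/airbyte_local/export
print(rows[0]["Inserted Events"])  # 42
```

For a real export, read the file's text from the folder returned by host_path_for and pass it to parse_export.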

Advantages of Airbyte for Senseforce data syncs

Ok, we did all this connector configuration work - but what did we gain - compared to directly using the Senseforce API? Actually quite a lot:

  1. We can use the configured connector to sync Senseforce data to any other supported destination, such as PostgreSQL, Snowflake or even MQTT.
  2. The Senseforce API is quite powerful, but also quite complex. It allows filtering and pagination. Airbyte handles all of that for us.
    1. It paginates data to never fetch more than the Senseforce-supported API limits.
    2. It implements backoff and retries to handle API rate limits.
    3. And most importantly: Airbyte uses Stream Slices to fetch small chunks of data. So if you want to fetch data for, let's say, 5 years, Airbyte filters the dataset so that each request only covers one day's worth of data. This prevents timeouts and API overloads.
  3. We can start scheduling this connection to automatically get the latest data. We can combine this with "Incremental Syncs" to extract the most important information from Senseforce in a cost- and time-efficient manner.
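
To give a feeling for what daily Stream Slices and backoff with retries look like, here is a rough Python sketch. It only illustrates the two ideas; it is not how Airbyte or the Senseforce connector is actually written. Both function names are made up, and a real connector would retry only on specific errors such as rate-limit responses rather than on every exception.

```python
import time
from datetime import date, timedelta
from typing import Callable, Iterator, Tuple


def daily_slices(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (slice_start, slice_end) pairs covering [start, end) one day at a time."""
    current = start
    while current < end:
        following = min(current + timedelta(days=1), end)
        yield current, following
        current = following


def with_backoff(
    call: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry call() with exponentially growing delays; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... before trying again.
            sleep(base_delay * 2 ** attempt)


slices = list(daily_slices(date(2023, 1, 1), date(2023, 1, 4)))
print(len(slices))  # 3
```

Each slice would then be fetched on its own, for example wrapped in with_backoff, so a single failing day can be retried without starting the whole export again.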

Summary

We have seen that Airbyte makes it easy to extract data even from very powerful APIs like the Senseforce API. It helps with features like Incremental Syncs, retries, backoff strategies, pagination and Stream Slices.

We also saw the easy-to-use Senseforce query builder and how conveniently we can create datasets which we subsequently use in one of our downstream systems. With a simple CSV export, we can do some data analytics on our local machine.

Senseforce, with its low-code tool targeted at the machine industry, seems perfect for collecting, managing and preparing data with very little to no IT/Data Science know-how required. In combination with Airbyte, one can enable their company's Data Scientists to perform further analyses, machine learning and more.

------------------

Need assistance?

Do you have any questions about the topic presented here? Or do you need someone to assist in implementing these areas? Do not hesitate to contact me.