Dropbase Pipelines and how they solve all your ETL problems

Learn how Dropbase Pipelines can help automate your ETL workflows, even with offline flat files! We discuss how Dropbase provides an at-the-edge ETL experience, and share a video demo of Dropbase Pipelines.

What are Data Pipelines and "at the edge ETL"?

A data pipeline is a sequence of steps that extracts data from one or more sources, optionally applies transformations, and then loads the data into a destination. Data pipelines are commonly described as ETL or ELT, depending on whether the data is transformed before or after it is loaded. For example, a company might want to store customer analytics from different device types (mobile app data, website traffic data, etc.) after combining the data geographically. They would start by connecting those data sources to Segment (extract), apply processing steps to group the data by geography (transform), and finally send it to their Snowflake data warehouse (load).
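To make the three stages concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample CSV text, the column names, and the `events_by_country` table are hypothetical, and an in-memory SQLite database stands in for a real warehouse such as Snowflake.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two device-type sources.
MOBILE_CSV = "user_id,country,events\n1,CA,10\n2,US,4\n"
WEB_CSV = "user_id,country,events\n3,CA,5\n4,US,6\n"


def extract(*csv_texts):
    """Extract: read the rows of every source into one list of dicts."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def transform(rows):
    """Transform: total the event counts per country."""
    totals = {}
    for row in rows:
        country = row["country"]
        totals[country] = totals.get(country, 0) + int(row["events"])
    return sorted(totals.items())


def load(conn, totals):
    """Load: write the aggregated rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events_by_country (country TEXT, events INTEGER)"
    )
    conn.executemany("INSERT INTO events_by_country VALUES (?, ?)", totals)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(conn, transform(extract(MOBILE_CSV, WEB_CSV)))
result = conn.execute(
    "SELECT country, events FROM events_by_country ORDER BY country"
).fetchall()
print(result)
```

Swapping the order of `transform` and `load` (loading raw rows first and aggregating inside the database) would turn the same sketch into ELT.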

Once automated, data pipelines give your business intelligence team a solid foundation for producing actionable insights. However, most ETL tools, like Stitch, only extract from online and SaaS sources. What happens if you're part of a company that spends too much time wrangling flat files (like CSVs and Excel files)? You could be spending precious hours editing large CSVs manually and then loading them to an online source, only to realize that you made a mistake and have to start all over again! Rinse and repeat every week. This is where Dropbase comes in.

In addition to providing the usual ETL features for connecting to online data sources, Dropbase accepts a wide variety of flat file formats, to which you can apply completely repeatable transformation steps and create a live, analytics-ready database. The ability to create an automated ETL pipeline that begins with offline flat files is what we call "at the edge ETL".

In Dropbase, Pipelines can be used to solve the following types of problems:

  • You get a flat file from a supplier, customer, or partner every week and apply the same cleaning steps to it each time. As long as the flat file has a consistent schema, you can create a Pipeline that applies the cleaning steps with a single click and saves you time every week.
  • You receive flat files from several partners, apply transformations to each file, and then have to combine the files. Pipelines in Dropbase, as we'll see below, can be linked together in a chain and use other Dropbase databases as inputs.

Pipelines are most useful when you regularly process incoming data that shares the same original schema. You simply specify the processing steps (like steps in a recipe) on your worksheet and deploy the worksheet as a pipeline. Let's see how to deploy a pipeline in Dropbase!
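The recipe idea can be sketched in a few lines of Python. This is not how Dropbase implements Pipelines; it is a hedged illustration in which the schema (`sku`, `name`, `price`), the two cleaning steps, and the `run_pipeline` helper are all hypothetical. The point is that once the steps are recorded, any new file with the same columns can be processed without redoing the work by hand, and a file with a different schema is rejected up front.

```python
import csv
import io

# Hypothetical schema that every incoming file is expected to share.
EXPECTED_COLUMNS = ["sku", "name", "price"]


def strip_whitespace(row):
    """Cleaning step: trim stray spaces around every value."""
    return {key: value.strip() for key, value in row.items()}


def price_to_float(row):
    """Cleaning step: turn a price such as "$2.50" into the number 2.5."""
    return {**row, "price": float(row["price"].replace("$", ""))}


# The "recipe": an ordered list of steps applied to every row.
STEPS = [strip_whitespace, price_to_float]


def run_pipeline(csv_text, steps=STEPS, expected=EXPECTED_COLUMNS):
    """Check the schema, then apply each step in order to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != expected:
        raise ValueError(f"schema mismatch: got {reader.fieldnames}")
    cleaned = []
    for row in reader:
        for step in steps:
            row = step(row)
        cleaned.append(row)
    return cleaned


# Two weekly files with the same schema go through the same recipe.
week_1 = "sku,name,price\nA1, Widget ,$2.50\n"
week_2 = "sku,name,price\nB2,Gadget , $10\n"
print(run_pipeline(week_1))
print(run_pipeline(week_2))
```

Keeping the steps as an ordered list also makes them easy to review or reorder, which is the same property that makes a worksheet's step list reusable.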

How to create a pipeline in Dropbase

  1. Create or open a worksheet and make sure you have imported some data. Pipelines (and even data exports) only work once some data has been imported.
  2. On the right sidebar, apply whatever processing steps you want.
  3. Click load to database (and don't forget to give your database a name!).
  4. On the top of the right sidebar, click "Deploy as pipeline" and name your pipeline.
  5. You can optionally click on "Add pipeline description" to add some information about your pipeline: your input data source, a brief summary of the transformations, its intended frequency, etc.
  6. In the "Export to" dropdown menu, select the database you wish to export all future incoming data to.
  7. Choose your export method: Overwrite deletes all previous data, while Append inserts new data below older data.
  8. Click "Deploy as pipeline"!
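The difference between the two export methods in step 7 can be illustrated with a small sketch. It assumes a hypothetical `orders` table in an in-memory SQLite database and a hypothetical `export` helper; it shows the general overwrite-versus-append semantics described above, not Dropbase's actual export code.

```python
import sqlite3


def export(conn, rows, method):
    """Write rows to the destination using 'overwrite' or 'append'."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, total REAL)")
    if method == "overwrite":
        # Overwrite: delete all previous data before inserting.
        conn.execute("DELETE FROM orders")
    elif method != "append":
        raise ValueError(f"unknown export method: {method}")
    # Append (and the second half of overwrite): insert the new rows.
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


def row_count(conn):
    """Return how many rows the destination table currently holds."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn = sqlite3.connect(":memory:")
export(conn, [(1, 9.99), (2, 20.0)], "append")
export(conn, [(3, 5.0)], "append")
after_append = row_count(conn)      # older rows are kept
export(conn, [(4, 1.0)], "overwrite")
after_overwrite = row_count(conn)   # only the newest rows remain
print(after_append, after_overwrite)
```

Append suits a running history (for example, weekly deliveries stacked in one table), while Overwrite suits a file that always contains the full, current picture.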

Now, when you have a new file with the same original schema, you can simply click your Pipeline and drop the file into the Upload window. Check out our docs for more details and the video below for an example.

Pipelines in action: an example