The Evolution of Data Pipelines: From ETL to ELT and Beyond!
Few areas of data engineering have changed as rapidly and profoundly as data pipeline architectures. Data pipeline architecture is essentially the design of how data flows from its source systems all the way to the consumption layer (analytics, BI, ML, etc.). This often involves the classic steps of Extract, Transform, Load (ETL) – though not always in that order. A well-designed pipeline is critical for handling the “5 V’s” of big data (volume, velocity, variety, veracity, value) while remaining efficient to maintain. Over the past decade, as technologies and business needs have evolved, data pipelines have transformed dramatically. What started as monolithic, batch ETL jobs in the Hadoop era has given way to modern, cloud-native ELT and streaming systems, and now new paradigms like zero-ETL integration and no-copy data sharing are emerging. In this article, we’ll trace this evolution from past to present, highlighting each architectural pattern and discussing how modern tools (in particular, SQLMesh for data transformations) enhance these pipelines. We’ll also share real-world context and best practices for teams looking to modernize their data stack.
ETL – The Hadoop Era of Batch Processing (2011–2017)
In the early 2010s, the rise of Hadoop ushered in the first generation of “big data” pipelines. During the Hadoop era (roughly 2011–2017), most organizations still hosted data on-premises with limited, non-scalable storage and compute. Distributed processing via Hadoop helped, but compute resources were still precious. This meant data engineers had to do a lot of upfront work to make data usable: extracting raw data, cleaning and normalizing it, and transforming it before loading it into an analytics database. This is the classic ETL pattern, where transformation happens prior to loading into the final data store.
A simplified ETL pipeline in the Hadoop era: data is extracted from the source, processed through a coded pipeline that transforms it before loading (cleaning and aggregating the data), and then stored in an on-premises database for consumption (e.g., by BI tools). This architecture emphasizes heavy pre-processing of data due to constrained storage and compute resources in on-prem environments.
Typical Hadoop-era pipelines were often hard-coded and brittle. Engineers spent considerable time modeling schemas and hand-optimizing MapReduce or SQL queries to fit within the tight resource constraints. For example, an e-commerce company might nightly extract log files or relational data, run a series of Hive/Pig jobs on a Hadoop cluster to transform and aggregate that data, then load the results into a relational data warehouse or OLAP cube for reporting. By the time the data reached the business intelligence (BI) layer, it was pre-aggregated and query-ready, but this came at the cost of complex, inflexible pipelines. If requirements changed or new data sources arrived, engineers often had to rewrite large portions of the ETL code.
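To make this concrete, here is a rough sketch of what one step of such a nightly pipeline might have looked like as a Hive query. This is purely illustrative; the table, column, and partition names are invented.

```sql
-- Hypothetical Hadoop-era Hive job: pre-aggregate raw clickstream logs into a
-- reporting table for a single day's partition (all names are illustrative)
INSERT OVERWRITE TABLE reporting.daily_product_views
PARTITION (ds = '2016-01-01')
SELECT
  product_id,
  COUNT(*) AS view_count
FROM raw_logs.page_views
WHERE ds = '2016-01-01'
GROUP BY product_id;
```

Dozens of jobs like this, chained together by scripts or a workflow scheduler, made up a typical pipeline, and changing one step often meant re-testing many others.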
This heavy upfront transformation made sense at the time: storage was expensive and compute was limited, so you transformed data to be as compact and usable as possible before storing it. However, it also meant long lead times for new data (since every change had to go through the whole pipeline) and difficulty adapting to new questions. The limitations of the ETL approach set the stage for the next evolution as technology shifted toward the cloud.
ELT – The Cloud Modern Data Stack Era (2017–Present)
Around 2017, the industry saw a major shift with the advent of cloud-native data warehouses and data lakes. In this modern data stack era, cloud platforms like Snowflake, Amazon Redshift, Google BigQuery, and Databricks allow storage and compute to scale independently (and cheaply). Suddenly, it was feasible to load raw data into a cloud warehouse first and transform it later inside the warehouse. This flipped the old paradigm into what we now call ELT (Extract, Load, Transform).
Under ELT, a typical pipeline might extract data from source systems and load it straight into a staging area of a cloud data warehouse or data lake, without extensive preprocessing. Because cloud storage is scalable and affordable, teams can dump raw data as-is. Then, using the warehouse’s power, they transform the data to a cleaned, enriched form for analysis. This reordering – loading before transforming – brings tremendous agility. Data engineers no longer have to pre-shape the data to fit a narrow destination schema; they can store everything and decide later how to join or aggregate it for various uses. As a result, new use cases blossomed. With virtually unlimited compute on demand, organizations could run more complex analytics, spin up experiments, and feed machine learning pipelines directly from the warehouse.
In practice, the ELT architecture is enabled by a host of cloud-based tools. Data ingestion is typically handled by automated pipelines (for example, tools like Fivetran or Airbyte that replicate data from sources into the warehouse). Once the raw data lands in a “staging” schema, the transformation layer takes over. This is where SQLMesh comes in. SQLMesh is an open-source framework for defining and executing data transformations inside the warehouse (or data lake) in a declarative way – meaning engineers write SQL (or Python) models that describe what data to produce, and SQLMesh handles the how and when. With SQLMesh, the transformations that would have been hard-coded in a Hadoop job can instead be managed as modular SQL models with built-in support for version control, testing, and orchestration.
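As a minimal sketch (with illustrative schema, table, and column names), a SQLMesh SQL model is a file containing a MODEL block that declares the model’s metadata, followed by the query that produces it:

```sql
-- Minimal SQLMesh model sketch; all names are illustrative
MODEL (
  name staging.stg_orders,
  kind FULL,       -- rebuild the whole table on each run
  cron '@daily'    -- run daily via SQLMesh's scheduler
);

SELECT
  order_id,
  customer_id,
  CAST(order_ts AS TIMESTAMP) AS order_ts,
  total_amount
FROM raw.orders
WHERE order_id IS NOT NULL;
```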
To illustrate, imagine a retail company adopting ELT: they use an ingestion tool to continuously load raw sales transactions and web clickstream data into Snowflake. Using SQLMesh, they define a model for cleaned sales_orders that joins raw transactions with reference data (like product info and store info) in SQL. This model might be scheduled to run hourly. SQLMesh’s engine takes care of figuring out dependencies (e.g., raw data must be loaded first), running the SQL in Snowflake, and materializing the updated sales_orders table. Downstream, another SQLMesh model might aggregate sales_orders into a daily sales summary for reporting. Because SQLMesh tracks models and their versions, the team can safely evolve the transformation logic over time – if they want to add a new column or fix a bug, SQLMesh’s version control and environment promotion features let them test the change in an isolated environment and then “promote” it to production with confidence. This approach drastically reduces the risk of breaking pipelines when making changes, since all dependencies are known and can be validated ahead of time (SQLMesh provides an impact analysis similar to a Terraform plan, which is effectively an automated lineage check for your SQL).
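A sketch of the downstream daily summary might look like the following (again with assumed names). Because the query selects from the sales_orders model directly, SQLMesh infers the dependency and orders the runs accordingly:

```sql
-- Hypothetical downstream aggregation; SQLMesh parses the query and knows it
-- depends on the analytics.sales_orders model
MODEL (
  name analytics.daily_sales_summary,
  kind FULL,
  cron '@daily'
);

SELECT
  store_id,
  CAST(order_ts AS DATE) AS order_date,
  COUNT(*) AS order_count,
  SUM(total_amount) AS total_revenue
FROM analytics.sales_orders
GROUP BY store_id, CAST(order_ts AS DATE);
```

Changes to either model are previewed and applied with sqlmesh plan, typically against a development environment first and then against production once the impact looks right.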
The modern ELT-based stack usually also includes an orchestration layer (tools like Airflow, Dagster, or Prefect) to schedule and monitor jobs, though SQLMesh can even handle scheduling natively via its built-in scheduler (using cron expressions in model definitions). The output of the ELT pipeline – now stored in the production schemas of the data warehouse/lakehouse – is ready for consumption by BI and analytics tools or data science models. The modularity of this stack is a huge advantage: each piece (ingestion, storage, transformation, BI) can be swapped or scaled independently. Many companies have standardized on an ELT-centric “modern data stack” composed of best-of-breed SaaS tools: e.g., Fivetran for ingestion, Snowflake/BigQuery for storage, SQLMesh for transformations, Airflow for orchestration, and something like Looker or Tableau for BI. These loosely coupled, cloud-based components integrate to form a flexible pipeline.
Crucially, ELT unlocked more speed and experimentation. Since raw data is available immediately in the warehouse, data analysts and scientists can query it directly or transform it in new ways without waiting for a whole new ETL job. This doesn’t mean transform logic is ad-hoc, though – frameworks like SQLMesh ensure that transformations remain governed and reproducible, by offering features such as data testing (you can write tests or assertions to validate data quality), caching (to reuse intermediate results and avoid re-computation when data hasn’t changed), and observability (logging and monitoring of pipeline runs, with integration points for data catalog or quality tools). In short, ELT ushered in an era where declarative transformations became the norm: rather than scripting every step, engineers declare their end-state tables, and systems like SQLMesh handle the heavy lifting, tracking what needs to run when.
Of course, ELT didn’t entirely replace ETL overnight – some organizations still use ETL for certain use cases or initial loads (and the term “ETL” is often used loosely to describe any data pipeline). However, ELT has become the dominant pattern for analytical data pipelines in the cloud. It addresses many of the Hadoop-era pain points by leveraging cloud scale. Yet, as data needs continued to grow, even ELT with scheduled batch transformations wasn’t enough for every scenario. This brings us to the next stage of evolution: streaming pipelines.
Streaming – Real-Time Data Pipelines in Parallel
Even as batch-oriented ELT pipelines handle the majority of analytics, the rise of real-time use cases has driven the adoption of streaming data pipelines. Sometimes, waiting an hour or a day for fresh data is too slow – businesses may need insights in near real-time (consider online fraud detection, live dashboards for app metrics, personalization on a website, etc.). To meet this need, streaming pipelines emerged alongside the traditional batch pipelines.
A streaming data pipeline architecture focuses on moving and processing data continuously, as events happen, rather than in discrete batches. Typically, streaming pipelines run in parallel to the regular batch/ELT pipeline – they are an additional layer used for specific real-time applications. For example, a company might still load data into a warehouse daily (batch ELT for broad analytics), but also have a streaming system that pushes certain event data to a real-time dashboard or triggers machine learning models within seconds.
The industry-standard technology underpinning streaming pipelines is often Apache Kafka – an event streaming platform that acts as a durable messaging system for high-volume event data. Data producers (say, a web service producing user click events) push events into Kafka topics. Downstream consumers can then subscribe and process those events in real-time. Additional frameworks like Apache Flink or Spark Structured Streaming can be used to transform or aggregate streams on the fly. The typical streaming pattern is described as stream -> collect -> process -> store -> analyze. In practice, a streaming pipeline might look like: source events -> Kafka -> a stream processing job (possibly doing some transformation, enrichment, or windowed aggregation) -> an output sink. The output might be an application (e.g., an alerting system or a live dashboard), or even feeding into a data lake or warehouse for later combination with batch data.
One common design is the Lambda architecture, where you maintain both a batch and a speed layer: the streaming pipeline provides immediate but maybe not perfectly refined data, while the batch pipeline later provides the high-quality, fully transformed data for the same events (and the two results are reconciled). Another emerging design is the Delta architecture (or continuous ETL), where micro-batches are processed in a streaming fashion to achieve low latency without a completely separate pipeline.
Streaming pipelines come with their own challenges. Because data is moving so fast, it’s difficult to validate data quality or apply complex transformations in-flight as thoroughly as you can in batch. For instance, you might not have the luxury of checking referential integrity or performing large joins in a high-throughput stream without introducing unacceptable latency. There’s also the operational complexity – ensuring exactly-once processing, handling late or out-of-order data, scaling the stream processors, etc., requires careful architecture. Some organizations find that maintaining a full streaming stack (Kafka clusters, stream processing jobs, etc.) is a significant investment that only pays off for certain use cases.
Given these trade-offs, many teams use streaming selectively. Near real-time needs might be solved with micro-batching (e.g., running an ELT job every 5 minutes) if true event-at-a-time streaming isn’t justified. In fact, some modern cloud warehouses have features to support this – for example, Snowflake’s Snowpipe or BigQuery streaming ingestion allow small batches to continuously load, blurring the line between batch and streaming. As an example, JetBlue’s data team opted to use Snowflake Tasks (which schedule frequent small loads) to achieve near-real-time updates, instead of implementing a separate Kafka pipeline. This kind of solution can simplify architecture by leveraging existing tools for incremental loading, albeit with slightly higher latency than a pure stream. On the other hand, a media company like Fox Networks, with extreme real-time demands (think of ingesting streaming data for live TV analytics), built a robust streaming and micro-batch architecture using Spark and various AWS services to guarantee real-time data with high reliability.
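As a rough illustration of the micro-batching approach (not a depiction of JetBlue’s actual implementation, and with invented warehouse and table names), a Snowflake Task can be scheduled to append fresh rows every few minutes:

```sql
-- Hypothetical Snowflake Task for near-real-time micro-batching;
-- warehouse, table, and column names are illustrative
CREATE OR REPLACE TASK refresh_clickstream_recent
  WAREHOUSE = analytics_wh
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO analytics.clickstream_recent
  SELECT *
  FROM raw.clickstream
  WHERE event_ts > (
    SELECT COALESCE(MAX(event_ts), '1970-01-01'::TIMESTAMP)
    FROM analytics.clickstream_recent
  );

-- Tasks are created suspended and must be resumed to start running
ALTER TASK refresh_clickstream_recent RESUME;
```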
From SQLMesh’s perspective, streaming data doesn’t eliminate the need for transformations – it changes when and where they happen. You might still use SQLMesh or similar frameworks to manage transformations on the data after it lands in a lake or warehouse via streams. For instance, streaming events could be continuously appended to a delta lake table, and SQLMesh could orchestrate regular jobs to merge those into curated fact tables in the warehouse every hour. There is also an emerging convergence where streaming frameworks can trigger batch model runs or where a tool like SQLMesh could be extended to handle incremental updates more gracefully. The key is that streaming pipelines address the velocity aspect of data, delivering data faster, but they usually complement rather than replace the existing ELT batch pipelines.
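Concretely, that pattern might look like the following SQLMesh incremental model (a sketch; the lake table name and columns are assumptions), which uses SQLMesh’s time-range macros so each run only processes the newly arrived interval:

```sql
-- Sketch: hourly merge of streamed events into a curated fact table
MODEL (
  name analytics.fct_page_views,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column event_ts
  ),
  cron '@hourly'
);

SELECT
  event_id,
  user_id,
  page_url,
  event_ts
FROM lake.streamed_page_views                 -- continuously appended by the streaming pipeline
WHERE event_ts BETWEEN @start_ts AND @end_ts; -- SQLMesh substitutes the interval being processed
```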
“Zero ETL” Integration
Data pipeline architectures are never static – there’s continuous innovation to reduce complexity and latency. One recent trend getting a lot of buzz is “Zero ETL”. The name is a bit of a misnomer; there is still extraction and transformation happening, but the goal of zero ETL architectures is to eliminate the need for explicit, separate ETL/ELT processes between operational databases and analytical systems. In other words, make the integration so seamless that data can be queried in one place without a hefty pipeline to move and reshape it.
How is that possible? Essentially, by tightly coupling the OLTP and OLAP layers. Cloud providers are building integrations where a transactional database automatically feeds data into an analytics store behind the scenes, or both systems share the same underlying storage. For example, Amazon Aurora (a transactional relational database) can replicate data into Redshift (a data warehouse) almost continuously under the hood, and Google BigTable can stream changes into BigQuery. Snowflake has introduced Unistore, which attempts to combine transactional and analytical capabilities in one platform. In a zero-ETL scenario, you might simply designate which tables from your production database should be available in your warehouse, and the cloud platform handles getting those changes over in near real-time. There’s no separate pipeline code for you to maintain – hence “zero ETL.”
In practice, these architectures still involve extraction (the data is replicated) and transformation (data might be stored in a slightly different form or need minor cleaning), but much of it is abstracted away by the platform. The data might remain in a data lake format and be queried directly there, or the platform might use change data capture to apply updates to warehouse tables continuously. The benefit is obvious: less pipeline maintenance and faster availability of data in the analytic system. If your OLTP and OLAP are basically the same system (or tightly integrated), you remove latency and complexity.
However, zero ETL comes with constraints. Often, it requires using a single cloud vendor’s ecosystem end-to-end. The transactional database and the data warehouse must be the specific pair that supports this native integration (e.g., Aurora->Redshift on AWS, BigTable->BigQuery on GCP, or two sides of Snowflake Unistore). This can lead to some vendor lock-in. Moreover, even if the data is auto-replicated, you likely will still need to transform and model it for optimal use – raw transactional schemas aren’t usually analytics-friendly. This is where tools like SQLMesh remain relevant: they can sit on top of a zero-ETL pipeline to provide the transformation layer (perhaps now working more with views or materialized views, since the data is already “loaded”). In a zero-ETL world, one could imagine SQLMesh defining virtual models that directly query the raw data in place (for example, querying data in a lake that’s continuously updated, without an explicit “load” step). In fact, SQLMesh’s support for various execution engines (like Spark, Trino, etc.) means it could help bridge across systems – if one part of your data is in a lake and another in a warehouse, SQLMesh can orchestrate transformations that pull from both, without manual ETL code.
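As a hypothetical sketch of that idea, a SQLMesh model of kind VIEW could expose a zero-ETL-replicated table directly, with no load step managed by the pipeline (the schema and table names are assumptions):

```sql
-- Sketch: a view model over a table kept in sync by the platform's
-- zero-ETL replication; the pipeline itself performs no loading
MODEL (
  name analytics.orders_live,
  kind VIEW
);

SELECT
  order_id,
  customer_id,
  status,
  updated_at
FROM replicated.app_orders;
```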
It’s worth noting that not everyone will go “full zero-ETL.” Many organizations will continue to maintain some pipelines for custom integrations or because they use heterogeneous systems. Interestingly, there’s a parallel trend of decoupling storage and compute at the vendor level – for instance, using open table formats (Iceberg, Hudi, Delta) in object storage (like S3) and querying them with various engines. This is almost the opposite of zero-ETL, emphasizing portability and avoiding lock-in, even if it means you do have to manage some pipelines yourself. We’re seeing a bit of a tug-of-war between all-in-one platform approaches and open multi-tool approaches.
In summary, zero ETL architectures aim to simplify pipelines by natively connecting operational and analytical data stores, reducing or eliminating the need for separate extraction and loading processes. While promising, they currently tend to exist in specific vendor silos. They don’t obviate the need for transformation logic – they just change where that logic lives (more in the source or integrated platform, less in user-managed pipelines). Data engineering teams adopting zero-ETL still need robust data modeling practices, tests, and oversight, which SQLMesh can provide in the new paradigm.
Emerging Paradigms: No-Copy Data Sharing
The last architectural pattern in our evolution journey isn’t about how data flows from source to target, but rather about eliminating data movement altogether for certain use cases. No-copy data sharing is an emerging approach where, instead of sending data to the consumer, you grant the consumer access to query the data directly where it lives. This is sometimes considered an “architecture” in itself because it changes the way data producers and consumers interact across systems.
Traditionally, if Company A wants to share data with Company B (or even between departments), they might set up an ETL pipeline to copy data from A’s database into B’s environment (or a file transfer, etc.). With no-copy data sharing, Company A can simply share a piece of its warehouse or data lake with Company B. The data never leaves A’s storage; B can query it over a secure connection as if it were in their own system, and only the results of queries are transferred. Snowflake’s Secure Data Sharing and Databricks’ Delta Sharing are prime examples of this approach.
In a data sharing architecture, the emphasis is on permissions and governance rather than pipelines. The diagram for data sharing might simply show a data provider and a data consumer connected by a dotted line (representing access), with the data itself remaining in one place. There is truly no ETL involved in the sharing step (which is why the Monte Carlo blog categorizes it separately from ETL/ELT) – the only “processing” might be setting up views or secure objects that filter or mask the shared data as needed.
No-copy data sharing architecture: the data remains at the source and is not physically copied or moved. Instead, the consuming team or platform is granted secure access to query the source data directly; only query results travel to the consumer’s BI or ML tools. This eliminates ETL pipelines for data sharing use cases, though it requires strong governance to control access.
This model is especially powerful for cross-company data sharing, data monetization scenarios, or even inter-departmental sharing in large enterprises where moving data would be too slow or duplicative. For example, a SaaS company might share usage data with a client by giving them read-only access to a particular schema in its Snowflake instance, instead of sending CSVs or building an API. The client can then run their own queries or connect BI tools to that shared data in real-time, always seeing the most up-to-date information.
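On Snowflake, the provider side of such a share boils down to a few grants rather than a pipeline. The following is a simplified sketch with placeholder database, schema, table, and account names:

```sql
-- Simplified provider-side setup for Snowflake Secure Data Sharing
-- (object names and the consumer account identifier are placeholders)
CREATE SHARE usage_share;

GRANT USAGE ON DATABASE analytics_db TO SHARE usage_share;
GRANT USAGE ON SCHEMA analytics_db.client_usage TO SHARE usage_share;
GRANT SELECT ON TABLE analytics_db.client_usage.daily_usage TO SHARE usage_share;

-- Make the share available to the consumer's Snowflake account
ALTER SHARE usage_share ADD ACCOUNTS = partner_org.partner_account;
```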
No-copy sharing doesn’t replace internal pipelines; rather, it augments them for specific cases. You wouldn’t use data sharing for all data movement (especially internal and operational needs), but it’s a great solution for providing external parties with data without maintaining separate export pipelines. It’s also complementary with the other patterns: for instance, your organization might use ELT or zero-ETL internally, and then share some of the resulting data products externally via a no-copy share.
One thing to keep in mind is that data sharing requires trust and governance. You need to ensure that the shared data is well-defined, quality-checked, and access-controlled. In terms of tooling, if you are the data provider, you’ll still use transformation tools to prepare the data before sharing (you probably don’t want to share raw, uncleaned data directly). SQLMesh can help here by producing the curated tables that are then shared. If you’re a data consumer receiving a share, you might use SQLMesh within your environment to further transform or integrate the shared data with your own internal data (again, without copying – maybe using cross-database queries or virtualization).
Both zero ETL and no-copy data sharing show how the future of data pipelines is heading toward less data movement and more seamless integration. They hint at a world where the boundaries between systems are blurrier. But in today’s reality, most organizations still employ multiple types of pipelines. As the Monte Carlo authors note, these patterns aren’t mutually exclusive – “most organizations deploy some or all of these data pipeline architectures” in combination. A single enterprise might have legacy batch ETL jobs for certain systems, an ELT modern warehouse for the core analytics, a couple of streaming feeds for critical real-time needs, and maybe a new zero-ETL project in pilot, all coexisting. The key for data engineers is to understand the strengths and trade-offs of each approach and apply the right architecture to the right problem.
Choosing the Right Architecture (and Tools) – Best Practices and Takeaways
Looking across this evolution, it’s clear that no one architecture is “best” for all situations. Each arose to solve specific problems:
ETL handles large volumes on limited infrastructure by pre-processing data to reduce load at query time – but it’s slow to adapt.
ELT leverages cloud scalability to make pipelines more flexible and faster to develop, but typically operates on batch schedules.
Streaming addresses latency, delivering fresh data quickly, though often with added complexity and potential data quality trade-offs.
Zero ETL aims to simplify pipelines by letting operational and analytical systems seamlessly sync, but currently ties you to specific platforms.
Data Sharing removes the need to copy data for sharing across boundaries, but requires trust and doesn’t by itself transform the data.
An advanced data platform might incorporate all of these in different places. For teams looking to adopt or upgrade their data stack, here are some practical takeaways and best practices to consider:
Understand your use cases and requirements (create data SLAs): Before jumping on a new architecture, determine what your business needs in terms of data freshness, volume, and quality. If your analysts are fine with daily updates, a batch ELT pipeline might suffice. If certain applications need second-by-second data, identify those and consider streaming or micro-batching for them. It’s wise to codify these expectations as SLAs – e.g. “dashboard X is updated every 15 minutes with 99% accuracy” – to guide your architecture choices. This ensures you don’t over-engineer a streaming solution where a simple batch would do, or vice versa.
Mix and match patterns as needed: It’s not all-or-nothing. Use streaming for the portions of data that truly require it, and ELT for the rest. If you invest in a zero-ETL integration, you can use it alongside existing pipelines. Many modern data stacks are a hybrid. For example, you might ingest most sources via Fivetran into a warehouse (ELT) but also set up Kafka for a couple of event streams (streaming) that feed into the same warehouse. Later, you might share some curated warehouse tables with partners via Snowflake Data Sharing (no-copy). These components can complement each other.
Leverage declarative frameworks and automation: As pipelines get more complex, it’s crucial to manage them with proper tooling. Writing one-off scripts might work for a simple ETL, but it doesn’t scale. Declarative transformation frameworks like SQLMesh help enforce best practices by design. They let you define data models with clear dependencies, which aids in data lineage and impact analysis (no more guessing what will break if you change a column). They also integrate with version control systems (treat your pipeline as code!) so that every change is tracked. Automation is key: from CI/CD pipelines that test your data changes, to orchestrators that trigger jobs reliably. A best practice is to design pipelines to be modular – break the work into smaller reusable pieces (for instance, build a reusable customer_dim table that many downstream models can use, rather than doing customer transformations separately in every model). Modular design, combined with tools that handle dependency management, means you can change one piece without rewriting everything.
Test and validate your data continuously: Data quality cannot be an afterthought. Incorporate data tests into your pipeline development. SQLMesh allows writing tests and audits for your models (e.g., “row counts should not decrease by more than 5% day-over-day” or “primary keys should be unique”); a sketch of such an audit appears below. Automated checks like these, or a data observability tool (such as Monte Carlo, which monitors for data anomalies in production), will save you from silently propagating bad data to your users. Catch issues as early as possible – ideally before you promote a change to production. Also consider data lineage tools or features that can show how data flows across your pipeline; this context is invaluable when something goes wrong or when assessing the impact of a change.
Monitor performance and cost: One benefit of the cloud is easy scaling, but that can lead to surprise costs or inefficiencies if you’re not careful. Make sure to monitor your pipeline runtimes and query costs. If a transformation is getting slower over time, investigate it – perhaps an index is needed or logic can be optimized. Many warehouses provide query logs you can analyze for slow or costly queries. Optimize your data models (for example, cluster your biggest fact table by date if most queries filter by date). And don’t load data more frequently than necessary – streaming is great, but if no one looks at that data in real-time, you might be burning resources for nothing. Regularly review your pipeline and trim unused or inefficient parts. This is part of the “operational excellence” of data engineering.
Ensure pipelines are idempotent and resilient: An oft-cited best practice is to make your pipelines idempotent, meaning running them twice produces the same result as running them once. In practical terms, this could mean designing incremental loads such that they can be re-run without duplicating data (e.g., use upsert or partition overwrite logic; a MERGE sketch appears below). SQLMesh assists with this by supporting incremental models and caching – if a job fails halfway, you can often rerun it without starting from scratch, and it will only process the missing portion. Additionally, add safeguards like checkpointing or temporary staging tables that only swap to production after success, so partial failures don’t corrupt your outputs. Use alerts (through your orchestrator or custom monitors) to notify the team of failures or data anomalies. The goal is a pipeline that is robust – it either completes correctly or cleanly stops and can be fixed/restarted without manual clean-up.
Embrace the “data product” mindset: As you modernize your data pipeline, think of the outputs of your pipeline not just as tables, but as data products with customers and SLAs. This concept, popularized by data mesh architecture, means treating datasets (like a monthly financial report or a machine learning feature table) with the same care a product team would treat a user-facing app. Document your important tables (what are the definitions, who owns them, and how often are they updated). Use a data catalog to make them discoverable. Secure them appropriately (apply access controls so only the right people can see sensitive data). By elevating key datasets to first-class “products,” you ensure that the pipeline delivers real value and that value is maintained over time. Tools in the modern data stack, from transformation frameworks to observability platforms, all help in delivering reliable data products by enforcing quality and consistency.
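To illustrate the testing practice above: in SQLMesh, an audit is a query that returns the offending rows and fails the run if any come back. Below is a sketch of a custom audit with assumed names; it would be attached to a model by listing it under audits in that model’s MODEL block.

```sql
-- Sketch of a custom SQLMesh audit; the run fails if any rows are returned
AUDIT (
  name assert_non_negative_order_total
);

SELECT *
FROM @this_model        -- macro referring to the model the audit is attached to
WHERE total_amount < 0;
```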
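And to illustrate the idempotency practice, here is a generic upsert-style MERGE in standard SQL (table and column names are assumed). Re-running the same source batch re-applies the same rows rather than duplicating them:

```sql
-- Idempotent incremental load: rerunning the same batch yields the same
-- target state (upsert semantics)
MERGE INTO analytics.sales_orders AS tgt
USING staging.sales_orders_batch AS src
  ON tgt.order_id = src.order_id
WHEN MATCHED THEN
  UPDATE SET
    status = src.status,
    total_amount = src.total_amount
WHEN NOT MATCHED THEN
  INSERT (order_id, status, total_amount)
  VALUES (src.order_id, src.status, src.total_amount);
```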
Finally, keep an eye on the future. The lines between databases, data warehouses, and data lakes continue to blur. New technologies will further reduce the friction in data movement, but they will also require us to learn new patterns. The good news is that the core principles of good data pipeline architecture remain steady: understand your data and dependencies, build in quality checks, and make the system as simple as possible (but no simpler) to meet the requirements. With those principles and the right tools, you can confidently adopt new innovations like SQLMesh, streaming platforms, or zero-ETL offerings, and integrate them into your stack without losing control.
Conclusion
Data pipeline architectures have evolved from the rigid, heavy-lift ETL of the Hadoop era to the flexible, componentized ELT pipelines of the cloud, and now toward real-time and no-copy paradigms that promise even less friction. For experienced data engineers, it’s an exciting progression – each step solving old problems but bringing new considerations. Adopting a modern stack doesn’t mean throwing out everything that came before; it means choosing the right architecture for the job and often blending patterns to cover all your bases. By leveraging frameworks like SQLMesh for transformation and workflow management, teams can ensure that no matter how data is extracted or loaded – be it batch or stream, lake or warehouse – the process of transforming, testing, and delivering it is repeatable and reliable. In the end, the goal is to get trusted data, of the right shape, at the right time, to the people and systems that need it. Achieving that is as much about architecture and culture as it is about any one tool. So learn from the past patterns, keep an eye on emerging ones, and don’t be afraid to combine them. With a solid foundation and modern tooling, you’ll be well-equipped to build data pipelines that stand the test of time (and change). Now go build some data pipelines!
Andrew Madson MSc, MBA
Head of Education & Evangelism at Tobiko
Founder, Insights x Design