Isaac's Blog

Ideating around integrations development

Hi everyone, this is my first time writing on the internet, and I wanted to share some of the ideas that I have explored over the past few months at Entrepreneur First. I hope that other entrepreneurs can find my learnings useful as a jumping-off point for their own ideas.

In this post, I'll describe my ideation around tooling that enables developers to build better integrations, and give an overview of the market landscape.

Integrations development

Improving "Integrations" development has been the burning problem for me, and was the starting point for my current entrepreneurial path.

First of all, what is an integration? I'm defining an integration as a connection between your system and an external system in order to transfer information. Let's imagine that you are building a financial operations SaaS. Your application allows customers to connect their bank accounts, pulling in transaction data on a periodic basis so that they can better understand their spend. This can be really useful for customers that have data spread across multiple bank accounts. In this case, an integration needs to be developed to connect the SaaS application to a bank data aggregator like Plaid.

Once a startup has built the product and the integrations needed to satisfy its beachhead market, scaling the business necessarily requires adding new integrations. I noticed that more than 50% of a B2B startup's engineering resources are spent developing and maintaining integrations. This feels like a distraction from working on the core product and on what differentiates your business. As Bezos put it, "Focus on the things that make your beer taste better".

Having the relevant integrations in place is often crucial to closing new customers, and there is a long tail of integrations that are required. Let's go back to the earlier example of the financial operations SaaS, which is trying to sell to a customer with global business operations. The existing Plaid integration only gives visibility into their US and European bank accounts, and provides no coverage for their bank account in Australia. The value proposition of visibility over all spend is not met unless the vendor adds an extra integration that can provide bank transaction data in the APAC region. What usually happens at this point is that the integration is promised, and there is a mad rush from engineering and product to get it delivered, pushing aside all other development of exciting new features and fixes for critical bugs.

My idea

I believe that the infrastructure and tooling to support integrations development is missing: the tooling that would help a startup reach some measure of product-market fit before shifting gears into scaling.

There are a number of patterns that all startups are repeating from scratch. There are a finite number of authentication methods to tackle (basic auth, bearer token, OAuth 2.0 etc.) and a corresponding "connection flow" for the user in the app where they enter their credentials. There's observability that you want to provide to the customer so that they understand how often their data is being refreshed. You might want to send a notification when an action is required from a customer, for example when their integration was disconnected and they need to enter a new password. I can go on and on!
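To make the "finite number of authentication methods" point concrete, here is a minimal sketch of what a shared building block could look like. The names `BasicAuth`, `BearerToken` and `auth_headers` are made up for this post and do not come from any particular library. An OAuth 2.0 access token is typically sent the same way as a bearer token, so the token exchange itself is left out.

```python
import base64
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class BasicAuth:
    """Hypothetical credential type: username and password."""
    username: str
    password: str


@dataclass
class BearerToken:
    """Hypothetical credential type: an API token or an OAuth 2.0 access token."""
    token: str


Credentials = Union[BasicAuth, BearerToken]


def auth_headers(creds: Credentials) -> Dict[str, str]:
    """Build the HTTP Authorization header for a stored credential."""
    if isinstance(creds, BasicAuth):
        # Basic auth is "username:password", base64-encoded.
        raw = f"{creds.username}:{creds.password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if isinstance(creds, BearerToken):
        return {"Authorization": f"Bearer {creds.token}"}
    raise TypeError(f"unsupported credential type: {type(creds).__name__}")
```

Every integration then calls `auth_headers(...)` instead of re-implementing header construction, and supporting a new method means adding one credential type in one place.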

The non-functional requirements of integrations are also quite different. As a B2B startup you might have hundreds to thousands of daily active users (DAU), i.e. not much scale! However, each of your customers' integrations might require fetching tens of thousands of records throughout the day. In certain domains, data integrity and completeness are very important too. The data flowing through integrations might also contain consumer PII that requires heightened security and compliance measures to store and process.

My vision was to build a platform that would tackle both those functional and non-functional requirements. It would provide building blocks and frameworks for common use-cases, and combine them with serverless hosting to make the scaling and security considerations simple. At scale, when we had more customers using the platform, with more and more integrations and use-cases, we'd train an LLM that could provide customers with pre-approved integrations, solving the problem that startups go on to face when providing coverage for the long tail of integrations.

In the big vision of this idea, we could even provide abstractions on top of the integration itself. If our customer just wants bank data, do they care whether that data is provided by Plaid, Finicity or Yodlee? We could provide dynamic routing based on price, or uptime, and better pricing by negotiating with integration providers. I wanted to build a future where integrations were truly commoditized.
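As a rough illustration of routing on price or uptime, the sketch below picks the cheapest provider that clears an uptime floor and falls back to the most reliable one if none do. Everything here is an assumption for illustration: the `Provider` fields, the `pick_provider` function, the 0.99 threshold and the provider names and numbers are invented, not real pricing or uptime data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Provider:
    name: str
    price_per_call: float  # hypothetical unit price
    uptime: float          # fraction of recent calls that succeeded, 0.0 to 1.0


def pick_provider(providers: Iterable[Provider], min_uptime: float = 0.99) -> Provider:
    """Return the cheapest provider meeting the uptime floor.

    If no provider meets the floor, fall back to the one with the best uptime.
    """
    candidates = list(providers)
    if not candidates:
        raise ValueError("no providers configured")
    healthy = [p for p in candidates if p.uptime >= min_uptime]
    if healthy:
        return min(healthy, key=lambda p: p.price_per_call)
    return max(candidates, key=lambda p: p.uptime)


# Made-up providers and numbers, purely for illustration.
providers = [
    Provider("aggregator_a", price_per_call=0.030, uptime=0.999),
    Provider("aggregator_b", price_per_call=0.020, uptime=0.995),
    Provider("aggregator_c", price_per_call=0.010, uptime=0.900),
]
chosen = pick_provider(providers)
```

With these numbers `chosen` is `aggregator_b`: `aggregator_c` is cheaper but falls below the uptime floor.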

The market

Three forms of solutions currently dominate the market: 1) Cloud ETL, 2) Integration Platform as a Service, 3) Unified API.

ETL stands for extract, transform and load. This describes the sequence of operations that were commonly used to move data from one place to another. There's been a recent trend to re-order this as extract, load, then transform, but it hasn't materially changed the tooling needs. Typically ETL is employed when moving data from the scattered operational databases that live across an enterprise to a single analytical database (aka data warehouse) for business intelligence use-cases. Apache Airflow is the most prevalent framework to manage ETL pipelines, and has become synonymous with data engineering.

There's a huge market for providing this procedure as a simpler SaaS or Cloud ETL, leading to the success of companies like Fivetran and the newer Airbyte. These players have moved far beyond database to database transfers, and each supports hundreds of different integrations as data sources. Airbyte, in particular, is open-source, and provides a simple enough SDK that you can add additional integrations without relying on the Airbyte team to get round to it. Sounds perfect, right?

ETL is a good start, but it is limited for modern SaaS use-cases in two important dimensions. First, the modern SaaS is real-time. Customers expect visibility as soon as things happen, which calls for a fundamentally different architecture from batch jobs that run a few times a day. Braze built a $500M ARR business based on this insight, and there are so many more verticals where real-time data and actions can unlock a ton of value. ETL was designed for internal data pipelines, and is fundamentally not designed for real-time streaming [^1]. Second, customers don't just expect to read from their data sources anymore, but also to write back to them.

The next approach to integrations development is the clunkily named Integration Platform as a Service (iPaaS). Workato is the king of iPaaS, wrapping up 1000+ integrations into a customizable SaaS product. A key advantage here is that an iPaaS allows non-technical users within your org to set up new integrations for customers with a low-code user interface.

The iPaaS is a closed system: if there is an integration missing, you are beholden to the vendor's development timelines. Another risk is vendor lock-in. Given that your data and flows are so tightly coupled to the iPaaS, it would be very challenging to migrate away from it in the future. Overall, the product offering might make sense for an enterprise, but does not work for a fast-moving startup. I'd be interested in seeing whether startups take up Alloy Automation, an upstart in a market full of legacy players.

The final approach is the Unified API, such as Merge. The idea here is an API that provides a unified data model over multiple API providers. For example, instead of having to integrate with 10 different ERPs (NetSuite, QuickBooks, Xero etc.) there's a single API to fetch invoices, another to fetch customers and so on. Unified APIs have had a lot of uptake with startups. They provide the real-time bi-directional sync (read or write) that was missing with ETL, and the flexibility and developer focus that are missing from iPaaS solutions. There's a lot of opportunity here to build unified APIs for brand new niches, as Terra is doing for wearables. I have the most experience as a buyer and user of unified APIs, and so did the CTOs I chatted with when validating my idea.
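A unified data model boils down to one common record type plus a mapping per provider. The sketch below shows that shape under loud assumptions: `provider_a` and `provider_b` and their payload fields are invented for illustration and are not the real NetSuite, QuickBooks, Xero or Merge schemas.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Invoice:
    """Hypothetical common model shared by every provider."""
    id: str
    customer_id: str
    amount_cents: int
    currency: str


def from_provider_a(raw: Dict[str, Any]) -> Invoice:
    # Invented payload shape: decimal string total, flat keys.
    return Invoice(
        id=str(raw["invoice_id"]),
        customer_id=str(raw["contact_id"]),
        amount_cents=int(Decimal(str(raw["total"])) * 100),
        currency=raw["currency_code"].upper(),
    )


def from_provider_b(raw: Dict[str, Any]) -> Invoice:
    # Invented payload shape: amount already in minor units, nested customer.
    return Invoice(
        id=str(raw["id"]),
        customer_id=str(raw["customer"]["id"]),
        amount_cents=int(raw["amount_minor"]),
        currency=raw["currency"].upper(),
    )


NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], Invoice]] = {
    "provider_a": from_provider_a,
    "provider_b": from_provider_b,
}


def normalize_invoice(provider: str, raw: Dict[str, Any]) -> Invoice:
    """Map a provider-specific payload onto the common Invoice model."""
    try:
        return NORMALIZERS[provider](raw)
    except KeyError as exc:
        raise ValueError(f"cannot normalize {provider!r} payload: missing {exc}") from exc


a = normalize_invoice("provider_a", {
    "invoice_id": 77, "contact_id": "c_9", "total": "120.50",
    "currency_code": "usd", "tax_treatment": "reverse-charge",
})
b = normalize_invoice("provider_b", {
    "id": "77", "customer": {"id": "c_9"}, "amount_minor": 12050, "currency": "USD",
})
```

Both payloads end up as the same `Invoice`. Note that `tax_treatment` from the first payload has no home in the common model, so the mapping silently drops it.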

Customer feedback

When I started to speak with CTOs of startups that met my ICP (B2B SaaS, 10+ engineering FTEs), I was surprised to find that I had underestimated the amount of time spent on integrations. One team told me it could take up to 90% of their team's capacity! Most CTOs had evaluated all three product types, and preferred the unified API approach.

However, I was surprised when I learned exactly how some teams were using unified APIs. They found that the unified model itself wasn't useful, because important details were abstracted away in the process of providing a common model. They described the value as being able to work with the sales engineers of the unified API provider, who were super familiar with the intricacies of the APIs and their data. Essentially, the value of the product was the consulting expertise rather than the technology itself. The CTOs pointed to problems like how tricky it is to understand an API without seeing production data, or how vendor documentation is often missing or outdated.

Another CTO was considering buying Workato because its integrations have been live the longest and have been deployed by other customers in production use-cases, so they were the most likely to be bug-free. This again pointed to the root cause of the problem with integration development, which is not missing technology but a lack of human expertise.

I lost conviction that the value of providing the infrastructure and abstracting away the boilerplate could overcome the "cold start problem". Customers wanted battle-tested integrations first and foremost.

Takeaways

I was no longer convinced that I could build a technology solution that would be 10X better than a unified API across any and every integration. I believe the key to winning this market is to conquer one niche at a time, and to have enough live customer data flowing through your integration to ensure correctness.

On the topic of owning a niche, I think another interesting angle is building the most cost-efficient ETL. A team with deep expertise in a given source like PostgreSQL, a destination like Snowflake, and cloud infrastructure should be able to build ETL pipelines that minimize compute cost, and pass the savings along to the customer.

Join me in the next post, where I will do a similar deep dive into another area. If any of the ideas here resonate with you, I'd love to hear from you, and I'm happy to have a chat. You can find me here.

[^1] Change Data Capture (CDC) is an interesting approach for real-time ETL but it's only appropriate for database to database transfers.