Schema lost in transit, only to be recreated. WT*?


The current state of the art for Data Engineers is to build pipelines that ingest structured and semi-structured data (JSON, CSV, Avro) and store it as BLOBs on S3, where the schema is lost. “Schema on Read” technologies such as Snowflake, Dremio, or Presto then process these BLOBs by re-applying the schema that was lost in order to deliver insights.
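
To make the pattern concrete, here is a minimal sketch of that pipeline in Python. The `raw-events` bucket, the object key, and the toy click event are hypothetical; pandas stands in for the schema-on-read engine that re-infers what the ingest path threw away.

```python
import json

import boto3          # ingest side: write opaque blobs to S3
import pandas as pd   # read side: stand-in for a schema-on-read engine

s3 = boto3.client("s3")

# Ingest: the event clearly has a structure, but we serialize it to a JSON
# blob and upload it as an opaque object -- the schema travels nowhere.
event = {"user_id": 42, "action": "click", "ts": "2023-01-01T00:00:00Z"}
s3.put_object(
    Bucket="raw-events",                          # hypothetical bucket
    Key="events/2023/01/01/part-0000.json",       # hypothetical key
    Body=json.dumps(event).encode("utf-8"),
)

# Read: download the blob and re-infer the schema nobody kept. A warehouse
# would do this with VARIANT columns or external tables; pandas makes the
# same point in three lines.
obj = s3.get_object(Bucket="raw-events", Key="events/2023/01/01/part-0000.json")
record = json.loads(obj["Body"].read())
df = pd.DataFrame([record])
print(df.dtypes)  # the "recreated" schema: user_id int64, action/ts object
```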

So this raises a set of questions, starting with “Why is the schema lost in transit?” The industry gives us two primary reasons:

  1. Schema can change frequently, and “Schema on Write” data platforms cannot keep up with those changes (see the sketch after this list).
  2. Performance of the ingest pipelines – transferring opaque BLOBs is more efficient than parsing and understanding the schema.
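
To see why reason 1 pushes teams toward BLOBs, here is a small sketch that uses Python's sqlite3 as a stand-in for any schema-on-write store; the table, column names, and events are made up for illustration.

```python
import json
import sqlite3

# "Schema on write": a fixed table definition (sqlite3 as a stand-in for a warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")

old_event = {"user_id": 42, "action": "click"}
new_event = {"user_id": 43, "action": "click", "device": "ios"}  # schema change: new field

conn.execute("INSERT INTO events (user_id, action) VALUES (:user_id, :action)", old_event)

try:
    # The write path has to change (ALTER TABLE plus a new INSERT) before this record fits.
    conn.execute(
        "INSERT INTO events (user_id, action, device) VALUES (:user_id, :action, :device)",
        new_event,
    )
except sqlite3.OperationalError as exc:
    print("schema-on-write rejects the new field:", exc)

# "Schema on read": the blob path just appends bytes; the new field rides along
# unvalidated and is only interpreted when someone queries it later.
blob = json.dumps(new_event).encode("utf-8")
```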

Let’s compare the two approaches:

| Characteristic | Schema on Write | Schema on Read |
| --- | --- | --- |
| Reads | Fast(er) | Slow(er) |
| Writes | Slow(er) | Fast(er) |
| Post-facto Schema Changes | Nada | I don’t care |
| Data Format | Structured and Semi-structured(?) | Structured, Semi-Structured and Unstructured |
| Validation | Upfront | Let’s do ETL to validate |
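
The read/write rows of this table can be seen in miniature below: a rough sketch, assuming local Parquet and JSON-lines files in place of S3 objects, with pyarrow playing both the write-time and read-time roles. File names and the sample records are made up.

```python
import json

import pyarrow as pa
import pyarrow.json as pajson
import pyarrow.parquet as pq

records = [{"user_id": i, "action": "click"} for i in range(1000)]

# Schema on write: pay the cost up front -- parse, type, and encode the data
# into a columnar file. Later reads are fast because the schema and column
# layout are already on disk.
pq.write_table(pa.Table.from_pylist(records), "events.parquet")
print(pq.read_table("events.parquet", columns=["user_id"]).schema)

# Schema on read: the write is a cheap append of raw text, but every read
# pays to re-parse the JSON and re-infer the schema.
with open("events.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
print(pajson.read_json("events.jsonl").schema)
```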

The truth is that Engineers need the flexibility to choose either option depending on the situation. The coming age of ML and JSON demands that flexibility to deliver both Analyst reports and real-time ML. What if a data platform could give you the best of both worlds? That data platform is bi(OS).

bi(OS) was designed not by asking what the differences are, but by asking why these differences exist, using first principles of computer science. The end result is a real-time, hyper-converged data platform that gives Data Engineers the best of both worlds. My blog next week will explain how bi(OS) does the impossible while helping Data Engineers do 10x more in days. In the meantime, I would love to hear your thoughts on why these differences exist. Please join the conversation in the 10x Data Engineer community.