Aspiring 10x DE Interviews
(We are) building the (data engineering) bridge while we are using it. If someone could step off and look at how we have built this bridge, they would realize that there are much better ways of building it from scratch.
#1 wish for Data Engineering products: that 99% of them would disappear. The number of products that have spun up to solve Data Engineering problems is insane. bi(OS) is the first product that treats (data engineering) as a middleware engineering problem and provides a proper solution.
#1 Data Engineering wish: a database with powerful OLTP and OLAP capabilities, so we can avoid running a combination of Aerospike, ClickHouse, and Postgres, with Kafka everywhere as a router and buffer.
(Data Engineering) in simple terms: make data available to analysts, Business Intelligence folks, and Data Scientists in a form they can use. Since being introduced to 10x Data Engineering, I feel things might change drastically (for Data Engineering) in the future.
Trade-offs from the trenches
CDC (change data capture) as an ingestion mechanism has survived for decades for one key reason: product/tech teams like to keep the analytics infrastructure at arm's length. This mindset creates a fragmented architecture and perpetuates high downstream costs. This talk looks at the end-to-end impact of using CDC as an ingestion mechanism for Data Engineering, all the way upstream to its QoS impact on microservices.
Peel back the real-time marketing veneer of Pub/Sub mechanisms and one finds queues at their foundation. While these mechanisms are in vogue for moving data from OLTP to OLAP systems, they create significant overhead across the Data Engineering stack. This talk posits: if the data consumers have to be sized appropriately for the queue to function as needed, why not eliminate the queue?
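The sizing argument can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model from the talk: time is split into ticks, `arrivals` is a hypothetical list of messages produced per tick, and `capacity` is a hypothetical number of messages the consumer can drain per tick. It shows that a queue only absorbs bursts. If the consumer's capacity is below the average arrival rate, the backlog grows without bound, so the consumer has to be sized to the producer anyway.

```python
def simulate_backlog(arrivals: list[int], capacity: int) -> list[int]:
    """Return the queue depth at the end of each tick.

    Toy model (assumption): each tick the producer enqueues `arrivals[i]`
    messages, then the consumer drains up to `capacity` messages.
    """
    backlog = 0
    depths = []
    for arriving in arrivals:
        backlog += arriving                    # producer enqueues this tick's messages
        backlog -= min(backlog, capacity)      # consumer drains as much as it can
        depths.append(backlog)
    return depths


# Steady load with an undersized consumer: the backlog grows every tick.
print(simulate_backlog([100, 100, 100, 100, 100], capacity=80))   # [20, 40, 60, 80, 100]

# A burst with a consumer sized above the average rate: the queue absorbs it and drains.
print(simulate_backlog([100, 300, 100, 100, 100], capacity=150))  # [0, 150, 100, 50, 0]
```

In the second case the queue is only doing useful work during the burst. That is the situation the talk questions: once the consumer is sized correctly, the queue mostly sits empty.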
While the previous talk looked at the price of Pub/Sub mechanisms, this talk looks at how different messaging systems (e.g., WhatsApp vs. Gmail) use queuing effectively. TL;DR: a queue is used as a fallback that provides temporary buffering, not as the primary path between the producer and consumer.
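The "queue as fallback" pattern can be sketched as follows. This is a hypothetical illustration, not how WhatsApp, Gmail, or bi(OS) actually implement delivery. `FallbackDelivery`, `deliver_to_consumer`, and `max_buffer` are invented names. The producer first tries to hand each message directly to the consumer. Only when that fails does it park the message in a bounded in-memory buffer, and it flushes the buffer before any new direct send so that ordering is preserved.

```python
from collections import deque
from typing import Callable


class FallbackDelivery:
    """Hypothetical sender: direct delivery first, queue only as a temporary fallback."""

    def __init__(self, deliver: Callable[[str], bool], max_buffer: int = 1000) -> None:
        self.deliver = deliver            # returns True if the consumer accepted the message
        self.buffer: deque[str] = deque() # used only while the consumer is unavailable
        self.max_buffer = max_buffer

    def flush(self) -> int:
        """Retry buffered messages in order; stop at the first failure."""
        delivered = 0
        while self.buffer and self.deliver(self.buffer[0]):
            self.buffer.popleft()
            delivered += 1
        return delivered

    def send(self, message: str) -> str:
        self.flush()  # drain older messages first so ordering is preserved
        if not self.buffer and self.deliver(message):
            return "direct"               # primary path: no queue involved
        if len(self.buffer) >= self.max_buffer:
            raise OverflowError("fallback buffer is full")
        self.buffer.append(message)
        return "buffered"                 # fallback path: temporary buffering


# Usage with a hypothetical consumer that can be toggled up or down.
consumer_up = False
received: list[str] = []


def deliver_to_consumer(message: str) -> bool:
    if consumer_up:
        received.append(message)
        return True
    return False


sender = FallbackDelivery(deliver_to_consumer)
first = sender.send("a")    # consumer is down, so "a" is buffered
consumer_up = True
second = sender.send("b")   # "a" is flushed first, then "b" goes direct
print(first, second, received)  # buffered direct ['a', 'b']
```

The design choice is that the buffer is bounded and normally empty. It absorbs a consumer outage instead of sitting permanently between producer and consumer.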
This talk goes back to the pre-Big Data days to trace the evolution of analytics. It peels back the marketing veneer of the Data Warehouse and the Lakehouse and asks: is the separation of compute and storage a lie? TL;DR: at least partially.
Conway's law states that organizations design systems that copy their communication structure. This talk applies Conway's law to Data Engineering and asks: is there a better way to organize data beyond ETL+ESB+EDW+BI?
The Lean Modern Data Stack of unicorns, enterprises and startups