Engineering

Who is M/F? Real World Data Quality

My post last week talked about schema lost in transit. Let me tell you a real story about a customer. We uploaded data from their repository into bi(OS), and the image above is a screenshot of what we saw in bi(OS) an hour later. We were not surprised, as we have seen this story at ...

Schema lost in transit only to recreate again. WT*?

The current state of the art for data engineers is to build pipelines that ingest structured and semi-structured data in JSON, CSV, or Avro and store it as BLOBs on S3 (where the schema is lost). Then “Schema on Read” technologies such as Snowflake, Dremio or Presto process these ...