-
Data Integration
- There is no one-size-fits-all tool; have to pick an approach, and accept its tradeoffs, for any given problem
- Combining tools by deriving data
- Microservices each store their own data, so there is no defined order for events across services
- Determining a total order of events (total order broadcast) is hard to do without introducing a throughput bottleneck
- Usually someone saying “99% of people don’t need X” doesn’t understand the variety of use cases/requirements out there
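The total-ordering bottleneck above can be pictured with a toy sketch (all names here are illustrative, not from any real system): if events from every service are funneled through one sequencer that assigns offsets, every consumer sees the same order, but that sequencer limits throughput and is a single point of failure.

```python
import itertools

class SingleLeaderLog:
    """Toy total order broadcast: events from all services pass through
    one sequencer that assigns monotonically increasing offsets.
    Simple, but the sequencer is the bottleneck."""

    def __init__(self):
        self._offsets = itertools.count()          # 0, 1, 2, ...
        self.log = []                              # (offset, service, event)

    def append(self, service, event):
        offset = next(self._offsets)
        self.log.append((offset, service, event))
        return offset

log = SingleLeaderLog()
log.append("orders", {"order_id": 1})
log.append("payments", {"order_id": 1, "paid": True})

# Every consumer replaying `log.log` observes the same total order:
order = [entry[1] for entry in log.log]
```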
-
Batch and Stream Processing
- Batch processing is functional-flavored: deterministic functions over immutable inputs producing derived outputs
- Systems can evolve by maintaining two derived views off the same data and migrating to one incrementally (cf. railroads gradually converging on a standard gauge)
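The two-views migration idea can be sketched as follows (event format and names are illustrative): both the old and the new view are derived from the same event log, so they stay consistent while reads are moved over incrementally.

```python
# Same source-of-truth log feeds both derived views during a migration.
events = [("put", "a", 1), ("put", "b", 2), ("put", "a", 3)]

old_view = {}   # existing schema: latest value per key
new_view = {}   # new schema: full history per key

for op, key, value in events:
    if op == "put":
        old_view[key] = value
        new_view.setdefault(key, []).append(value)

# Both views are derived from the same log, so they agree; once all
# readers use new_view, old_view can be dropped.
assert old_view["a"] == new_view["a"][-1]
```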
-
“I think that the dataflow across an entire organization starts looking like one huge database”
- Unifying reads via a single API
- Unifying writes so that synchronization across systems works reliably
- Build the equivalent of the Unix shell for unbundled databases
-
Designing Apps Around Dataflow
- App code = derivation functions that transform one dataset into another
- Separating code and state
- Spreadsheet cells “observe” the cells they depend on and update their own values automatically
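The spreadsheet-style observe-and-recompute pattern can be sketched with a minimal, hypothetical `Cell` class (not any real library's API): an input cell notifies its observers on change, and derived cells recompute automatically.

```python
class Cell:
    """Minimal spreadsheet-style dataflow cell (illustrative sketch):
    recomputes its value whenever a cell it depends on changes."""

    def __init__(self, value=None, formula=None, deps=()):
        self._value = value
        self._formula = formula
        self._deps = deps
        self._observers = []
        for dep in deps:
            dep._observers.append(self)        # register for updates
        if formula is not None:
            self._recompute()                  # compute initial value

    @property
    def value(self):
        return self._value

    def set(self, value):
        """Change an input cell; dependent cells update automatically."""
        self._value = value
        for obs in self._observers:
            obs._recompute()

    def _recompute(self):
        self._value = self._formula(*(d._value for d in self._deps))
        for obs in self._observers:            # propagate downstream
            obs._recompute()

a = Cell(1)
b = Cell(2)
total = Cell(formula=lambda x, y: x + y, deps=(a, b))  # like "=A1+B1"
a.set(10)  # total recomputes without being asked
```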
-
Observing Derived State

- Reads are events too - a read can be recorded in the log, e.g. tracked like Kafka stream offsets
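One way to picture reads-as-events, as a toy sketch (the log format is made up): if read requests are appended to the same log as writes, replaying the log up to a read's offset reconstructs exactly what that read observed.

```python
# Reads recorded as events alongside writes, in one ordered log.
log = [
    ("write", "x", 1),
    ("write", "x", 2),
    ("read", "x", None),   # a read is an event too
    ("write", "x", 3),
]

def state_at(offset):
    """Replay writes before `offset` to rebuild the state a reader saw."""
    state = {}
    for op, key, value in log[:offset]:
        if op == "write":
            state[key] = value
    return state

# The read at offset 2 saw x == 2, regardless of later writes:
seen = state_at(2).get("x")
```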
-
Aiming for Correctness
- Newer alternatives and complements to transactionality/ACID
- Enforcing constraints (e.g. uniqueness) requires consensus
- Timeliness = users observe the latest state; integrity = no data loss or contradictions
- Violations of timeliness are “eventual consistency,” whereas violations of integrity are “perpetual inconsistency”
- Avoid coordination where possible; idempotent operations make this easier because retries are safe
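The uniqueness point can be sketched without a distributed transaction (illustrative sketch, no real system's API): route every claim on the same key through one log partition, and let a single consumer decide in log order.

```python
# All claims for a username land in the same partition, so one consumer
# processes them sequentially and can enforce uniqueness by itself.
claimed = set()
decisions = []

requests = [            # (username, request_id), already in log order
    ("alice", "req1"),
    ("bob", "req2"),
    ("alice", "req3"),  # conflicting claim arrives later in the log
]

for username, request_id in requests:
    if username in claimed:
        decisions.append((request_id, "rejected"))
    else:
        claimed.add(username)
        decisions.append((request_id, "accepted"))
```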
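A minimal sketch of idempotency via end-to-end request IDs (the `deposit` handler and its storage are hypothetical): under at-least-once delivery, a retried request with the same ID is a no-op instead of a double-applied write.

```python
import uuid

processed = set()           # request IDs we have already applied
balance = {"account": 0}

def deposit(request_id, amount):
    """Idempotent handler: duplicate deliveries of the same request_id
    do not change state a second time."""
    if request_id in processed:
        return balance["account"]
    processed.add(request_id)
    balance["account"] += amount
    return balance["account"]

rid = str(uuid.uuid4())
deposit(rid, 100)
deposit(rid, 100)           # retried delivery of the same request
final = balance["account"]  # applied exactly once
```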
-
Trust but verify - add integrity checks and audits wherever possible
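A sketch of such a self-auditing check (names and event format are illustrative): periodically re-derive the view from the source events and compare checksums, so silent corruption in the serving view gets caught.

```python
import hashlib
import json

events = [("put", "a", 1), ("put", "b", 2)]

def derive(evts):
    """Rebuild the derived view from scratch out of the source events."""
    view = {}
    for op, key, value in evts:
        if op == "put":
            view[key] = value
    return view

def checksum(view):
    """Stable digest of a view for cheap comparison."""
    return hashlib.sha256(
        json.dumps(view, sort_keys=True).encode()
    ).hexdigest()

serving_view = derive(events)   # stands in for the view in production
audit_ok = checksum(serving_view) == checksum(derive(events))
```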
-
Doing the Right Thing
- Try replacing the word “data” with “surveillance” in statements about data collection and see whether they still sound acceptable