• Data Integration

    • Have to pick an approach, and accept its tradeoffs, for any given problem
    • Combining tools by deriving one dataset from another
      • e.g. search indexes derived from a change log (see the sketch after this list)
    • Microservices each store their own data, so there is no defined order for events across services
    • Total order broadcast requires funneling all events through one point, which is hard to do without creating a bottleneck
    • Usually someone saying “99% of people don’t need X” doesn’t understand the variety of use cases/requirements out there
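
    A minimal sketch of the deriving-data idea, assuming an in-memory inverted index and a simple event shape; a real system would consume a change log (e.g. CDC into Kafka) and write to a search server like Elasticsearch:

    ```python
    # Derive a search index from a change log. The event shape and the
    # in-memory index are illustrative assumptions.
    from collections import defaultdict

    inverted_index = defaultdict(set)  # term -> set of document ids

    def apply_change(event):
        """Update the derived search index from one change event."""
        doc_id, text = event["id"], event["text"]
        for term in text.lower().split():
            inverted_index[term].add(doc_id)

    # Replaying the same log rebuilds the same index, which is what
    # makes derived data cheap to recreate or re-derive differently.
    change_log = [
        {"id": 1, "text": "stream processing"},
        {"id": 2, "text": "batch processing"},
    ]
    for event in change_log:
        apply_change(event)

    print(sorted(inverted_index["processing"]))  # [1, 2]
    ```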
  • Batch and Stream Processing

    • Batch processing is functional in flavor: deterministic derivation functions over immutable inputs (sketch below)
    • Evolution can happen by maintaining two derived views of the same data and migrating to the new one incrementally (e.g. the railroad gauge standard)
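
    A minimal sketch of both points, assuming toy derivation functions: each view is a pure function of the same immutable input, so an old and a new view can coexist while readers migrate:

    ```python
    # Two derived views computed from one immutable event log; the
    # function names and event shape are illustrative.

    def view_v1(records):
        # old derivation: event count per user
        counts = {}
        for r in records:
            counts[r["user"]] = counts.get(r["user"], 0) + 1
        return counts

    def view_v2(records):
        # new derivation: count per (user, day); readers cut over gradually
        counts = {}
        for r in records:
            key = (r["user"], r["day"])
            counts[key] = counts.get(key, 0) + 1
        return counts

    events = [{"user": "a", "day": "mon"}, {"user": "a", "day": "tue"}]
    old_view, new_view = view_v1(events), view_v2(events)
    ```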
  • "I think that the dataflow across an entire organization starts looking like one huge database”

    • Unifying reads via a single api
    • Unifying writes so sync across systems works
    • Make the equivalent of the unix shell for unbundled databases (sketch below)
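
    A minimal sketch of the unix-shell analogy, using Python generators as stand-ins for stream operators connected by logs (the operator names are illustrative):

    ```python
    # Compose small dataflow operators like `grep | map`, each consuming
    # one stream and producing another.

    def source(events):
        yield from events

    def grep(stream, predicate):
        return (e for e in stream if predicate(e))

    def mapper(stream, fn):
        return (fn(e) for e in stream)

    events = [{"type": "click", "page": "/home"},
              {"type": "view", "page": "/about"}]
    pipeline = mapper(grep(source(events), lambda e: e["type"] == "click"),
                      lambda e: e["page"])
    print(list(pipeline))  # ['/home']
    ```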
  • Designing Apps Around Dataflow

    • App code = derivation functions that turn input data into derived state
    • Separating code and state
    • Spreadsheets “observe” the cells a formula depends on and update its value automatically (see the sketch after this list)
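
    A minimal sketch of spreadsheet-style dataflow, assuming a toy Cell/Derived pair; this pushes updates eagerly in-process, whereas a real system would propagate changes through logs:

    ```python
    class Cell:
        def __init__(self, value=None):
            self._value = value
            self._observers = []  # derived cells to notify on change

        @property
        def value(self):
            return self._value

        def set(self, value):
            self._value = value
            for recompute in self._observers:
                recompute()

    class Derived(Cell):
        """A cell whose value is a function of other cells."""
        def __init__(self, fn, *deps):
            super().__init__()
            self._fn, self._deps = fn, deps
            for d in deps:
                d._observers.append(self._recompute)
            self._recompute()

        def _recompute(self):
            # re-derive, then notify anything derived from this cell
            self.set(self._fn(*(d.value for d in self._deps)))

    a, b = Cell(1), Cell(2)
    total = Derived(lambda x, y: x + y, a, b)
    a.set(10)
    print(total.value)  # 12
    ```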
  • Observing Derived State

    • Reads are events too: append the read request to the log and key the response to its Kafka offset (see the sketch below)
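
    A minimal sketch of reads-as-events, assuming a single in-process list as the log; appending the read to the same log as the writes serializes it against them, and the request’s offset keys the response:

    ```python
    log = []          # unified log of writes and read requests
    store = {}        # state derived by the stream processor
    responses = {}    # read-request offset -> response

    def append(event):
        log.append(event)
        return len(log) - 1  # the event's offset

    def process(offset, event):
        if event["type"] == "write":
            store[event["key"]] = event["value"]
        elif event["type"] == "read":
            # answering at this offset reflects every write before it
            responses[offset] = store.get(event["key"])

    append({"type": "write", "key": "x", "value": 1})
    read_offset = append({"type": "read", "key": "x"})
    for off, ev in enumerate(log):
        process(off, ev)
    print(responses[read_offset])  # 1
    ```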
  • Aiming for Correctness

    • New alternatives/additions to transactionality/ACID
    • Enforcing constraints, e.g. uniqueness, requires consensus; funneling all requests for a key through one log partition provides it (see the sketch after this list)
    • Timeliness = users observe latest state, Integrity = no data loss/contradictions
      • violations of timeliness are “eventual consistency,” whereas violations of integrity are “perpetual inconsistency.”
    • Avoid coordination where possible; idempotent operations make retries safe
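
    A minimal sketch of log-based uniqueness, assuming hash routing and in-process lists as partitions: every claim on a name lands in the same partition, and a single-threaded consumer grants claims in log order, so replaying the log repeats the same decisions:

    ```python
    N_PARTITIONS = 4
    partitions = [[] for _ in range(N_PARTITIONS)]

    def request_username(name, user_id):
        # same name -> same partition, giving a total order per name
        partitions[hash(name) % N_PARTITIONS].append(
            {"name": name, "user": user_id})

    claimed = {}  # name -> winning user

    def consume(partition):
        # processed serially, so the first claim in log order wins
        for req in partition:
            if claimed.setdefault(req["name"], req["user"]) == req["user"]:
                print(req["name"], "granted to", req["user"])
            else:
                print(req["name"], "rejected for", req["user"])

    request_username("alice", 1)
    request_username("alice", 2)
    for p in partitions:
        consume(p)
    ```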
  • Trust but verify - add checks whenever possible (sketch below)
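
    A minimal sketch of such a check, assuming the derived view can be rebuilt by replaying the source log; comparing checksums catches silent corruption instead of trusting the pipeline:

    ```python
    import hashlib
    import json

    def checksum(state):
        # stable hash of a dict's contents
        payload = json.dumps(state, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def rebuild(log):
        # re-derive the view from the source of truth
        state = {}
        for event in log:
            state[event["key"]] = event["value"]
        return state

    log = [{"key": "x", "value": 1}, {"key": "y", "value": 2}]
    serving_view = {"x": 1, "y": 2}  # what the online system serves

    assert checksum(serving_view) == checksum(rebuild(log)), "integrity violation"
    ```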

  • Doing the right thing

    • Test a claim like “data-driven” by replacing the word “data” with “surveillance” and seeing how it sounds