Data is the glue that holds distributed systems together. Data has its own lifecycle, from origination through flow, transformation, and modification to eventual deletion. At a high level, these are some of the key data design considerations:
Database Type
- Relational, key-value store, column-family, document, graph, etc.
Indexing and Querying
- LSM-trees, B-trees, B+-trees (see the sketch after this list)
- R/W Access patterns
- Range queries
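A minimal, illustrative sketch of the LSM idea mentioned above (class and parameter names such as `MiniLSM` and `memtable_limit` are made up for this example, not any real engine's API): writes go into an in-memory memtable, which is flushed as an immutable, sorted segment once it grows large enough, and reads check the memtable first and then the segments from newest to oldest. Real engines add sparse indexes, compaction, and Bloom filters on top; the sorted segments are also what make range queries cheap.

```python
# Minimal LSM-style key-value store sketch (illustrative only).

class MiniLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}              # in-memory buffer of recent writes
        self.segments = []              # flushed, immutable segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # flush: sort by key and freeze as an immutable segment
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # scan segments newest-to-oldest; first hit wins
        for segment in reversed(self.segments):
            for k, v in segment:        # a real engine would binary-search / use a sparse index
                if k == key:
                    return v
        return None

store = MiniLSM()
for i in range(10):
    store.put(f"user:{i}", {"id": i})
print(store.get("user:3"))
```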
Schema, Metadata, Encoding and Evolution
- Versioning
- Forward and backward compatibility (sketch after this list)
- Format
- Avro, Thrift, Protobuf
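A small sketch of what the two compatibility directions mean in practice, independent of any particular format (Avro, Thrift, and Protobuf encode the same rules in binary); the field names and defaults below are illustrative only. New readers fill in defaults for fields old writers never produced (backward compatibility), and old readers ignore fields they do not know about (forward compatibility).

```python
# Illustrative sketch of schema evolution compatibility rules.

SCHEMA_V2_DEFAULTS = {"id": None, "name": "", "email": ""}   # v2 added "email"

def decode_with_v2_reader(record_v1):
    # Backward compatibility: new code reads old data by filling in
    # defaults for fields that did not exist yet.
    return {**SCHEMA_V2_DEFAULTS, **record_v1}

def decode_with_v1_reader(record_v2, v1_fields=("id", "name")):
    # Forward compatibility: old code reads new data by ignoring
    # fields it does not know about.
    return {k: v for k, v in record_v2.items() if k in v1_fields}

old_record = {"id": 1, "name": "Ada"}
new_record = {"id": 2, "name": "Lin", "email": "lin@example.com"}
print(decode_with_v2_reader(old_record))   # {'id': 1, 'name': 'Ada', 'email': ''}
print(decode_with_v1_reader(new_record))   # {'id': 2, 'name': 'Lin'}
```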
Transaction
- Atomicity
- Isolation level
- Lost update
- Read committed
- Repeatable read (snapshot isolation: readers never block writers, and writers never block readers)
- Preventing lost updates (the read-modify-write cycle)
- Compare-and-set, atomic write operations, conflict resolution (sketch after this list)
- Write skew and phantoms (write skew is a generalisation of the lost update problem)
- Materialising conflicts
- Serialisability
- Actual serial execution
- Two-phase locking (2PL)
- Writers don’t just block other writers; they also block readers and vice versa.
- Deadlocks
- Predicate lock, index range lock
- Serialisable snapshot isolation (SSI) - optimistic concurrency control
- all reads within a transaction are made from a consistent snapshot of the database
- Pessimistic concurrency control - 2PL, actual serial execution
- Optimistic concurrency control - MVCC
- Data durability
- WAL Logs
- Backup and Restore
- Error handling, RRD, Save points
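As referenced above, a minimal sketch of preventing lost updates by wrapping the read-modify-write cycle in a compare-and-set retry loop. The `Register` class here is a stand-in for whatever the database actually offers (an atomic `UPDATE ... WHERE value = <old>`, explicit row locks, or an atomic increment), not a real API.

```python
import threading

# Lost-update prevention sketch: a write only succeeds if the value is
# unchanged since it was read; otherwise the caller retries.

class Register:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()     # stands in for the storage engine's latch

    def read(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False              # someone else wrote in between: retry
            self._value = new
            return True

def increment(counter):
    while True:                           # read-modify-write retried until it applies cleanly
        old = counter.read()
        if counter.compare_and_set(old, old + 1):
            return

counter = Register(0)
threads = [threading.Thread(target=lambda: [increment(counter) for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.read())                     # 4000; a naive read-then-write could lose updates
```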
Replication
- Single-leader (master-slave)
- Read your writes, monotonic read, consistent prefix reads
- Multi-leader (CouchDB)
- Conflict resolution / avoidance
- Leaderless (Dynamo-style)
- Quorums, sloppy quorums, last-write-wins (LWW), concurrent writes, version vectors
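A toy sketch of the leaderless quorum idea above: with n replicas, a write must be acknowledged by w of them and a read consults r of them; w + r > n guarantees the read set overlaps the latest write set. The per-key version counter playing the role of last-write-wins is purely illustrative; real systems use version vectors to detect concurrent writes.

```python
import random

# Dynamo-style quorum sketch: n replicas, write to w, read from r,
# keep the value with the highest version among the read responses.

N, W, R = 3, 2, 2
replicas = [dict() for _ in range(N)]          # each replica: key -> (version, value)

def write(key, value, version):
    acked = 0
    for replica in random.sample(replicas, N): # visit replicas in random order
        replica[key] = (version, value)
        acked += 1
        if acked >= W:                         # stop once the write quorum is met
            return True
    return False

def read(key):
    responses = [r_[key] for r_ in random.sample(replicas, R) if key in r_]
    if not responses:
        return None
    return max(responses)[1]                   # highest version wins (LWW-style)

write("cart:42", ["book"], version=1)
write("cart:42", ["book", "pen"], version=2)
print(read("cart:42"))                         # always the latest write, since W + R > N
```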
Data partitioning
- Hotspot
- Partitioning by key range, by hash of key (consistent hashing), or by compound primary key (consistent-hashing sketch after this list)
- Partitioned secondary indexes - local (document-partitioned) index with scatter/gather queries, global (term-partitioned) index
- Rebalancing - Fixed, Dynamic
- Request routing
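A minimal consistent-hashing sketch, as referenced above; `HashRing`, the virtual-node count, and the use of MD5 are illustrative choices, not a particular library's API. Nodes (with virtual nodes to spread load and avoid hotspots) and keys are hashed onto the same ring, and a key belongs to the first node at or after its hash, so adding or removing a node only moves the keys adjacent to it.

```python
import bisect
import hashlib

# Consistent-hashing ring with virtual nodes (illustrative sketch).

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # place vnodes points per physical node on the ring
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # first ring position clockwise from the key's hash (wrapping around)
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:1001"), ring.node_for("user:1002"))
```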
Trouble with Distributed systems
- Faults and partial failures
- Unreliable networks - synchronous vs asynchronous networks
- Unreliable clocks
- Don’t rely on the accuracy of the clock
- NTP sync is not accurate
- Confidence interval in time - Spanner
- Process pause
- Knowledge, truth, and lies
- Truth is defined by majority
- Fencing tokens are required for distributed locks (sketch after this list)
- Safety and liveness
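A sketch of the fencing-token idea referenced above, with made-up class names: the lock service issues a monotonically increasing token with every grant, and the storage layer rejects any write carrying a token older than one it has already seen, so a client that was paused (GC, network) while holding an expired lock can no longer corrupt the data.

```python
# Fencing-token sketch (illustrative only).

class LockService:
    def __init__(self):
        self._token = 0

    def acquire(self) -> int:
        self._token += 1            # each grant gets a strictly larger token
        return self._token

class Storage:
    def __init__(self):
        self._highest_seen = 0
        self.value = None

    def write(self, value, token) -> bool:
        if token < self._highest_seen:
            return False            # stale lock holder: reject the write
        self._highest_seen = token
        self.value = value
        return True

locks, storage = LockService(), Storage()
t1 = locks.acquire()                # client 1 takes the lock, then pauses
t2 = locks.acquire()                # lock expires, client 2 takes it
print(storage.write("from client 2", t2))   # True
print(storage.write("from client 1", t1))   # False: fenced off
```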
Linearisability
- Serializability is an isolation property of transactions
- Linearisability is a recency guarantee
- It makes the system appear as if there is only a single copy of the data
- Single leader replication is potentially linearisable. Multi-leader and leaderless are not linearisable
- A network interruption forces a choice between linearisability and availability.
- CAP - Linearisable / Available when partitioned
- The reason for dropping linearisability is performance, not fault tolerance.
- Most databases are neither CAP-linearisable nor CAP-available
- MVCC is intentionally non-linearisable
- Single leader replication (async) is non-linearisable
- Partition makes it non-available
- ZooKeeper by default is neither CAP-consistent (CP) nor CAP-available (AP) – it’s really just “P”.
- You can optionally make it CP by calling sync if you want, and for reads (but not for writes) it's actually AP if you turn on the right option.
Error handling , Recovery
- Replay logs (sketch after this list)
- Discover partitions
- Replication catchup
- Restores
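A small sketch of recovery by log replay, as referenced above; the file name and record format are invented for this example. Every mutation is appended to a log before being applied, and after a crash the state is rebuilt by replaying the log (in practice from the last checkpoint or restored backup, not from the beginning of time).

```python
import json

# Recovery-by-replay sketch: append every mutation to a log, rebuild
# state after a "crash" by replaying the log from the start.

LOG_PATH = "wal.log"                       # illustrative path

def append(op, key, value=None):
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")

def replay():
    state = {}
    with open(LOG_PATH) as f:
        for line in f:
            record = json.loads(line)
            if record["op"] == "set":
                state[record["key"]] = record["value"]
            elif record["op"] == "del":
                state.pop(record["key"], None)
    return state

open(LOG_PATH, "w").close()                # start the demo from an empty log
append("set", "a", 1)
append("set", "b", 2)
append("del", "a")
print(replay())                            # {'b': 2} -- state rebuilt purely from the log
```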
Source of truth
- Primary DB
- Derived data
- Caches
- Materialised Views
- CQRS
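A sketch of the derived-data / CQRS idea above: writes land in a single source of truth (an append-only event list here), and the read-optimised views (a materialised view and a simple aggregate) are derived from it and can always be rebuilt by replaying the log. The event shape and view names are illustrative.

```python
# Source of truth plus derived read models (illustrative sketch).

events = []                        # source of truth: ordered, immutable event log

orders_by_customer = {}            # read model 1: materialised view for queries
order_count = 0                    # read model 2: simple aggregate

def apply_to_read_models(event):
    global order_count
    if event["type"] == "order_placed":
        orders_by_customer.setdefault(event["customer"], []).append(event["order_id"])
        order_count += 1

def record(event):
    events.append(event)           # write side: append to the log
    apply_to_read_models(event)    # keep derived views up to date

record({"type": "order_placed", "customer": "alice", "order_id": 1})
record({"type": "order_placed", "customer": "alice", "order_id": 2})
print(orders_by_customer["alice"], order_count)   # derived views, rebuildable from `events`
```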
Batch processing
- Putting the computation near the data
- Mapping and Chaining
- Sort merge joins
- GroupBy
- Handling Skew
- Map-side joins
- Broadcast hash joins - joining a large dataset with a small dataset (sketch after this list)
- Partitioned hash joins - partition both inputs the same way and join within each partition
- Map-side merge joins - if the input is partitioned and sorted appropriately
- Use cases - Search index, Key Value stores
- The output of a reduce-side join is partitioned and sorted by the join key, whereas the output of a map-side join is partitioned and sorted in the same way as the large input
- Input is immutable and no side effects
- In an environment where tasks are not so often terminated, the design decisions of MapReduce make less sense.
- Materialising intermediate state
- Makes fault tolerance easy
- Sometimes they are not needed
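A sketch of the broadcast hash join mentioned above: the small dataset is loaded into an in-memory hash table on every mapper and the large dataset is streamed past it, so no shuffle of the large input is needed. Dataset names and record shapes are made up for the example.

```python
# Broadcast (map-side) hash join sketch.

users = {                       # small side: broadcast to every mapper, kept in memory
    1: {"name": "Ada", "country": "UK"},
    2: {"name": "Lin", "country": "SG"},
}

events = [                      # large side: streamed through the mapper
    {"user_id": 1, "url": "/home"},
    {"user_id": 2, "url": "/cart"},
    {"user_id": 1, "url": "/checkout"},
]

def map_side_join(events, users):
    for event in events:
        profile = users.get(event["user_id"])
        if profile is not None:
            yield {**event, **profile}   # enrich the event with user attributes

for row in map_side_join(events, users):
    print(row)
```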
Streaming processing
- Messaging
- Brokers
- Routing: Fanout, load balancing
- Acknowledgement and redelivery
- Message ordering
- Partitioned logs - Kafka
- Unified log
- Partitioning to scale
- Replayable
- Consumer offsets
- Durability
- Immutable input and idempotent operations
- CDC - Change Data Capture
- Implemented using log based broker
- Connectors - Debezium etc
- Event sourcing
- Immutable events written to an event log (as in accounting, where each delta is recorded rather than overwriting state)
- CQRS - separate forms for read and write and allowing several different read views
- Stream processing
- Produce a new stream, drive a real-time dashboard, or write to a database
- Has operators like batch jobs, but sorting doesn't make sense on an unbounded stream
- Uses: CEP, Stream analytics, Maintaining materialised views
- Event time vs Processing time
- Stream Joins
- Stream-stream joins (e.g. click-through rate)
- Stream-Table join (enrichment)
- Similar to map side hash join
- The table can be kept up to date using CDC (sketch after this list)
- Table-Table join (materialised view maintenance)
- Time dependence of joins
- Slowly changing dimension
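A sketch of the stream-table join referenced above: a CDC changelog keeps a local copy of the table up to date, and each event from the activity stream is enriched against that local state as it arrives, much like a map-side hash join that is continuously maintained. Topic names and record shapes are illustrative.

```python
# Stream-table join (enrichment) sketch, table maintained via CDC.

user_table = {}                                   # local state, fed by the changelog stream

def on_change_event(change):                      # consume the CDC stream
    if change["op"] == "delete":
        user_table.pop(change["id"], None)
    else:                                         # insert / update
        user_table[change["id"]] = change["row"]

def on_activity_event(event):                     # consume the activity stream
    profile = user_table.get(event["user_id"], {})
    return {**event, **profile}                   # enriched output event

on_change_event({"op": "upsert", "id": 1, "row": {"name": "Ada", "plan": "pro"}})
print(on_activity_event({"user_id": 1, "url": "/dashboard"}))
on_change_event({"op": "upsert", "id": 1, "row": {"name": "Ada", "plan": "free"}})
print(on_activity_event({"user_id": 1, "url": "/billing"}))   # joins against the updated row
```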
Big Data
- Data Analytics
- ETL
- Data warehouse
Data Security
- Encryption at Rest
- Data Retention
- Data classification
- (Customer) Data isolation
- Data Integrity
- Data Availability
- Data anonymisation
- Backup and recovery
- Database Security
- Access Control