============================================================================= Lecture 1 ============================================================================= - who I am - course requirements - mid-term 25, final 25, project 25%, participation 25% - participation includes being responsible for one class period's worth of lectures ============================================================================= "Decentralized and P2P systems (Distributed systems)" "decentralized": - no single master, point-of-failure - local control/autonomy P2P: - all are peers - peers are basically the same. - file-sharing: - sometimes means data dist, control central ============================================================================= dist sys: - distance (latency, no simultaneity, agreement difficult) - "a DS is one where networked computers coordinate activity only by passing messages" - Internet, intranets, mobile devices continuum small, fast (bus-based) <-------clusters------LAN--------> big slow networks low latency high high bandwidth low secure, reliable interconnect autonomous dependent failures unreliable network coordinated resources fear and distrust independent failures decentralized administration Challanges: - private comm over public networks - who sent it (auth), integrity, privacy - building reliable systems from unreliable components - indep failures - lamport: "a distributed system is one in which the failure of a machine I've never heard of can prevent me from doing my work." - location - placement for efficient sharing, - finding it later - coordination and shared state - what should "we" do, and when? - can we agree on what we've done? ============================================================================= - reliability • recoverability Don’t lose data if a failure occurs (also durability) • availability Don’t interrupt service if a failure occurs. • scalability Grow effectively with the workload. See also: manageability. • survivability Murphy’s Law says it’s a dangerous world. Can systems protect themselves? • See also: security, adaptibility, agility, etc. ============================================================================= scalability (cost vs capacity chart) (also show "marginal cost of capacity" lines - scalable almost flat, unscalable not) ============================================================================= scalability => manageability (clients) attached to (server cloud) with "monitor arrow" to "adaptation policies" with "actuator arrow" back to cloud No humans! ============================================================================= Availability - replicate! - data - hardware - functionality - build decentralized without "single point of failure" ============================================================================= Recoverability Assume: - nodes have storage: - volatile: fast (memory) - non-volatile: slow (disks) But NVRAM getting cheaper, flash memory, etc. device failure probability: "mean time between failure" ============================================================================= consequences - concurrent processes - user work independently - non-determinism, raceconditions, sync, deadlocks, liveness - no global clock - coordination through messages - clock sync works, but only to a degree - no global state - no single process has knowledge of entire state of the system - failures - node failures - network partitions ============================================================================= Lecture 2 ============================================================================= Time in dist systems. notions: - wall clock (lots of problems) - logical time Start with events, causality, space-time diagrams. Assume: - reliable, ordered comm, - comm only through msg-passing. Then: - events of one proc linearly ordered - send happens before receive Say e '<' (happens-before) e' Also the transitive closure. If time only moves forward: - '<' has no cycles - just a partial order If (e < e') e "causually precedes" e'. if (!(e < e') && !(e' < e) ) e "causually independent" of e'", or "concurrent" ============================================================================= In dist system w/ actions modeled by events - nothing happens between events - time only advances at an event (discrete time) Need func C:E->T which assigns timestamp C(e) Reasonable requirement conforms to causality: forall e, e' elof E: e < e' -> C(e) < C(e') This is the "clock condition". Consequences: - time is monotonically increasing - time of send < time of receive Lamports protocol: 1) internal event or send event at Pi, Ci := Ci + d 2) each msg carries send time 3) at Pi rcv w/ time t, clock Ci := max(Ci,t) + d d usually 1. P1 1 2 3-r(p2,1) 6-r(p3,5) P2 1-s(p1) 2-s(p3) P3 1 3-r(p2,2) 4 5-s(p1) Implications: - events at diff procs can have same timestamp What if we need uniqueness? (tie-break of proc #) What do we lose in mapping partically ordered events to linearly ordered integers? (can't show that e can not have influenced e' if C(e) less ============================================================================= How to do this? Vector time - each proc knows something about what other procs have seen Each proc Pi has clock Ci that is vector of len n (# procs). Vector clock: - initialized as null vector - ticks prior to an event by incrementing own value: Ci[i] := Ci[i] + 1 - each msg stamped w/ vector - at receipt (after incrementing local): Ci = piecewise-max(Ci, t) (picture) So Ci[i] shows how many events have occurred in i. In fact: C(e)[i] = |{ e' | e' is an event of Pi and e' <= e}| What do these relations mean w/ vectors: u <= v iff forall i (u[i] <= v[i]) u < v iff (u <= v) && !(u == v) u || v iff !(u < v) && !(v < u) || is not transitive! Example! What do we get? Isomorphishm between causal, temporal structure. forall e,e' elof E: e < e' iff C(e) < C(e') In other words, we can now tell if two points causually related. ============================================================================= What do we do w/ vector time? finding "earliest" event - in resolving deadlocks - finding "simultaneous" events good for defining consistent ckpts. How? (include one snapshot per proc, ensure no cycles.) - observer getting notifications from different processes might want them in order. - debugging - can show that some event can not have caused another - reduce information necessary to replay - detect race conditions - if two procs interact outside msgs, and are concurrent then, it's a race - measuring "degree of parallelism" ============================================================================= Matrix clocks Allow obsolete info in replacted dbs to be discarded ============================================================================= Performance - scalar cheap - vector, matrix mostly impractical, but: - incremental storage, msg-append often helps ============================================================================= Lecture 3 ============================================================================= Think of the problem of global state detection: Need ckpts of nodes, channels. Note: # msgs seen in c ckpt must be same as c's endpoint ckpt # msgs sent must be >= # rcvd Assume: - piecewise deterministic - reliable, ordered, uni-directional communication channels - no other comm To get snapshot (atomically do): 1) take ckpt, 2) send token on all outgoing edges At receipt of token, if haven't already seen it (atomically do): 1) take ckpt 2) send token on all outgoing edges If a msg is recorded as having been sent, is it guaranteed to have been seen in dest ckpt? (NO! Dest may have made ckpt earlier). - Is this a problem? (No, pw-determ) If msg recorded as having been received, must it have been recorded sent? (YES!)