Time in dist systems. notions: - wall clock (lots of problems) - logical time Start with events, causality, space-time diagrams. Assume: - reliable, ordered comm, - comm only through msg-passing. Then: - events of one proc linearly ordered - send happens before receive Say e '<' (happens-before) e' Also the transitive closure. If time only moves forward: - '<' has no cycles - just a partial order If (e < e') e "causually precedes" e'. if (!(e < e') && !(e' < e) ) e "causually independent" of e'", or "concurrent" ============================================================================= In dist system w/ actions modeled by events - nothing happens between events - time only advances at an event (discrete time) Need func C:E->T which assigns timestamp C(e) Reasonable requirement conforms to causality: forall e, e' elof E: e < e' -> C(e) < C(e') This is the "clock condition". Consequences: - time is monotonically increasing - time of send < time of receive Lamports protocol: 1) internal event or send event at Pi, Ci := Ci + d 2) each msg carries send time 3) at Pi rcv w/ time t, clock Ci := max(Ci,t) + d d usually 1. P1 1 2 3-r(p2,1) 6-r(p3,5) P2 1-s(p1) 2-s(p3) P3 1 3-r(p2,2) 4 5-s(p1) Implications: - events at diff procs can have same timestamp What if we need uniqueness? (tie-break of proc #) What do we lose in mapping partically ordered events to linearly ordered integers? (can't show that e can not have influenced e' if C(e) less ============================================================================= How to do this? Vector time - each proc knows something about what other procs have seen Each proc Pi has clock Ci that is vector of len n (# procs). Vector clock: - initialized as null vector - ticks prior to an event by incrementing own value: Ci[i] := Ci[i] + 1 - each msg stamped w/ vector - at receipt (after incrementing local): Ci = piecewise-max(Ci, t) (picture) So Ci[i] shows how many events have occurred in i. In fact: C(e)[i] = |{ e' | e' is an event of Pi and e' <= e}| What do these relations mean w/ vectors: u <= v iff forall i (u[i] <= v[i]) u < v iff (u <= v) && !(u == v) u || v iff !(u < v) && !(v < u) || is not transitive! Example! What do we get? Isomorphishm between causal, temporal structure. forall e,e' elof E: e < e' iff C(e) < C(e') In other words, we can now tell if two points causually related. ============================================================================= What do we do w/ vector time? finding "earliest" event - in resolving deadlocks - finding "simultaneous" events good for defining consistent ckpts. How? (include one snapshot per proc, ensure no cycles.) - observer getting notifications from different processes might want them in order. - debugging - can show that some event can not have caused another - reduce information necessary to replay - detect race conditions - if two procs interact outside msgs, and are concurrent then, it's a race - measuring "degree of parallelism" ============================================================================= Matrix clocks Allow obsolete info in replacted dbs to be discarded ============================================================================= Performance - scalar cheap - vector, matrix mostly impractical, but: - incremental storage, msg-append often helps ============================================================================= Consistent snapshots: Assume: - piecewise deterministic - reliable, ordered, uni-directional communication channels - no other comm To get snapshot (atomically do): 1) take ckpt, 2) send token on all outgoing edges At receipt of token, if haven't already seen it (atomically do): 1) take ckpt 2) send token on all outgoing edges If a msg is recorded as having been sent, is it guaranteed to have been seen in dest ckpt? (NO! Dest may have made ckpt earlier). - Is this a problem? (No, pw-determ) If msg recorded as having been received, must it have been recorded sent? (YES!)