ZAB Algorithm

ZAB is an atomic broadcast algorithm, which orders and executes concurrent requests in a sequential manner.

It consists of two modes:

  • Recovery: When the service starts or after a leader failure. Ends when a leader emerges and a quorum of servers have synchronized their state with the leader
  • Broadcast: The leader is the server that executes a broadcast by initiating the broadcast protocol. Once a leader has synchronized with a quorum of followers, it begins to broadcast messages

Important related links:

BASE and ACID

BASE involves step-by-step transformation of a transactional application into one that will be far more concurrent and less rigid, it stands for “Basically Available Soft-State Services with Eventual Consistency”

  • Basically Available: Fast response even if some replicas are slow or crashed
    • BASE paper describes: In data centers, partitioning faults are very rare and are mapped to crash failures by forcing the isolated machines to reboot
  • Soft-State Service: No durable memory
    • Can’t store any permanent data
    • Restarts in a “clean” state after a crash
    • To remember data, either replicate it in memory in enough copies to never lose all in any crash, or pass it to some other service that keeps “hard state”
  • Eventual Consistency: OK to send “optimistic” answers to the external client
    • Could use cached data (without checking for staleness)
    • Could guess at what the outcome of an update will be
    • Might skip locks, hoping that no conflicts will happen
    • Later, if needed, correct any inconsistencies in an offline clean up activity

ACID is a model for correct behavior of databases with the following properties:

  • Atomicity: Even if “transactions” have multiple operations, does them to completion (commit) or rolls back so that they leave no effect (abort)
  • Consistency: A transaction that runs on a correct database leaves it in a correct (“consistent”) state.
  • Isolation: It looks as if each transaction ran all by itself. Basically says “we’ll hide any concurrency”, it runs all its operation in a sequential manner
  • Durability: Once a transaction commits, updates can’t be lost or rolled back

Related article: CAP theorem

Consensus with paxos

Paxos is an algorithm to achieve consensus in asynchronous distributed system model. More about Paxos can be found unter:

Terms in Computer Science

Primitive:
A low-level object or operation from which higher-level, more complex objects and operations can be constructed. In graphics, primitives are basic elements, such as lines, curves, and polygons, which you can combine to create more complex graphical images. In programming, primitives are the basic operations supported by the programming language. A programmer combines these primitives to create more complex operations, which are packaged as functions, procedures, and methods. This definition is referred from http://www.webopedia.com/TERM/P/primitive.html

Emerging technologies relating to big data

Big data processing framework:

Distributed Computing Platform:

Apache ZooKeeper: A highly available, scalable, distributed, configuration, consensus, group membership, leader election, naming, and coordination service, which implements a degenerated version of Paxos algorithm, called ZAB (a total ordered broadcast protocol).

HBase:

Graph processing: