Replication – Don’t forget to clone yourself!

Hi Folks,

In today’s middleware, we must allow for fluid placement of computation and data between everything from sensors to multi-domain clouds. In such environments, the problem of maintaining consistency is exacerbated by widespread replication, including of client-managed data. Replication is the key factor in managing large-scale, fault-tolerant systems.

The amount of time a service is available can be increased by maintaining copies of its data across servers. For example, if an object is stored at n servers and each server independently fails or becomes unreachable with probability p, then the availability of the object increases as we add more failure-independent servers:

availability = 1 − probability(all servers failed or unreachable) = 1 − pⁿ
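To see how quickly availability improves, here is a minimal Python sketch of the formula above, assuming the servers fail independently; the value p = 0.05 is purely illustrative.

```python
# Availability of an object replicated on n failure-independent servers,
# each failing or becoming unreachable with probability p.
def availability(p: float, n: int) -> float:
    return 1 - p ** n

# Illustrative numbers: p = 0.05, growing replica count.
for n in (1, 2, 3):
    print(f"n={n}: availability = {availability(0.05, n):.6f}")
# n=1: 0.950000, n=2: 0.997500, n=3: 0.999875
```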

Caches do not necessarily keep the whole collection of objects at the application level, so in that case server replication does better than common caching techniques at enhancing availability.

A fault-tolerant system must guarantee the correct behavior of highly available data. Replicating data and services across computers ensures that there are enough correct servers to outvote the wrong values produced by faulty ones. There are replication schemes in which data is replicated across nodes either fully or partially.
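As a rough illustration of outvoting, a client can read the value from every replica and keep the one reported by the majority. The sketch below is a minimal, generic example (the replica values are made up), not any particular protocol.

```python
from collections import Counter

def majority_read(replica_values):
    """Return the value reported by most replicas, assuming that a
    majority of them are correct."""
    value, count = Counter(replica_values).most_common(1)[0]
    if count <= len(replica_values) // 2:
        raise RuntimeError("no majority: too many faulty replicas")
    return value

# Two correct replicas outvote one replica returning a wrong value.
print(majority_read([42, 42, 17]))  # -> 42
```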

Replication schemes

In full replication, data is replicated on every participating node, which gives a fault-tolerant system. Such systems perform well for read-only transactions, which can avoid remote communication by executing locally. Consider, by contrast, a system where data is not replicated and only a single copy of each object exists: when some nodes fail, the objects they hold are lost and subsequent transactions on them can never commit. In a fully replicated system, where a group of linked computers forms a single logical computer, this trade-off does not arise under partial node failures. However, as the number of nodes grows, the number of transmitted messages grows quadratically in a metric-space network, where the communication cost between two neighboring nodes forms a metric, so broadcasting transactional primitives (read/write sets, memory differences) does not scale. Partially replicated object models come into play to address this issue.
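To make the cost concrete, here is a toy, fully replicated in-memory store (all class and method names are illustrative): reads execute locally, but each write has to reach every node, so write traffic grows with cluster size.

```python
class Node:
    """One replica: in full replication every node stores every object."""
    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value


class FullyReplicatedCluster:
    def __init__(self, nodes):
        self.nodes = nodes
        self.messages_sent = 0

    def write(self, key, value):
        # The write set is broadcast to all nodes.
        for node in self.nodes:
            node.apply(key, value)
            self.messages_sent += 1

    def read(self, node_index, key):
        # Reads are served locally, without remote communication.
        return self.nodes[node_index].store[key]


cluster = FullyReplicatedCluster([Node(f"n{i}") for i in range(5)])
cluster.write("x", 1)
print(cluster.read(3, "x"), cluster.messages_sent)  # -> 1 5
```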

One can notice the heavy, brute-force replication in fully replicated systems, where objects are replicated on every node, so each node maintains replicas of all objects. This causes communication overhead, even though it increases locality while ensuring one-copy serializability. It is quite costly, even though the full replication model does not need to forward object requests and retrieve operations to remote nodes. To avoid this expensive replication, clusters of nodes can be created, with at least one replica of each object kept in every cluster. This avoids the loss of objects, since multiple nodes in the cluster know about each object. Partially replicated systems are more scalable because replicas only need to apply updates for the data items of which they hold local copies. Under larger workloads, however, these systems do not perform as well, because they have to deal with expensive remote data accesses. Caching mechanisms were developed over time to deal with this issue while ensuring that transactional semantics are not violated. There are also active schedulers that schedule replica transactions to achieve higher performance.
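The clustering idea can be sketched in a few lines; the names below (Cluster, place_object, update) are illustrative and only meant to show that each cluster keeps at least one replica of an object and that updates touch only the nodes holding a local copy.

```python
class Cluster:
    """A group of nodes that together must hold at least one replica
    of every object."""
    def __init__(self, node_names):
        self.stores = {name: {} for name in node_names}

    def place_object(self, key, value, copies=1):
        # Keep at least `copies` replicas of the object inside this cluster.
        for name in list(self.stores)[:copies]:
            self.stores[name][key] = value

    def update(self, key, value):
        # Only nodes that hold a local copy need to apply the update.
        for store in self.stores.values():
            if key in store:
                store[key] = value


clusters = [Cluster(["a1", "a2"]), Cluster(["b1", "b2"])]
for c in clusters:
    c.place_object("x", 1)   # one replica per cluster survives a node loss
for c in clusters:
    c.update("x", 2)         # updates reach only the replica holders
```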
