HeartBeat

Show a server is available by periodically sending a message to all the other servers.

Problem

When multiple servers form a cluster, each server is responsible for storing some portion of the data, based on the partitioning and replication schemes used. Detecting server failures promptly is important so that corrective action can be taken, such as making another server responsible for handling requests for the data on the failed server.

Solution

Periodically send a request to all the other servers to indicate that the sending server is alive. Choose a request interval that is longer than the network round-trip time between the servers. Each listening server waits for the timeout interval, which is a multiple of the request interval, before treating the sender as failed. In general,
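The receiving side of this scheme can be illustrated with a minimal Python sketch. It is not taken from the pattern's reference implementation. The class name HeartbeatMonitor, its methods, and the default multiple of 3 are all assumptions made for illustration. The clock is injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical failure detector: tracks the last heartbeat seen per server."""

    def __init__(
        self,
        request_interval: float,
        timeout_multiple: int = 3,  # assumed value; the text only says "a multiple"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if request_interval <= 0 or timeout_multiple < 1:
            raise ValueError("request_interval must be > 0 and timeout_multiple >= 1")
        self.request_interval = request_interval
        # The timeout interval is a multiple of the request interval.
        self.timeout_interval = request_interval * timeout_multiple
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, server_id: str) -> None:
        """Call whenever a heartbeat request arrives from server_id."""
        self._last_seen[server_id] = self._clock()

    def failed_servers(self) -> List[str]:
        """Return servers whose last heartbeat is older than the timeout interval."""
        now = self._clock()
        return sorted(
            server_id
            for server_id, seen in self._last_seen.items()
            if now - seen > self.timeout_interval
        )


if __name__ == "__main__":
    fake_now = [0.0]  # manually advanced clock, in seconds
    monitor = HeartbeatMonitor(request_interval=1.0, clock=lambda: fake_now[0])
    monitor.record_heartbeat("server-a")
    monitor.record_heartbeat("server-b")

    fake_now[0] = 2.0
    monitor.record_heartbeat("server-b")  # server-a has gone quiet

    fake_now[0] = 3.5
    print(monitor.failed_servers())
```

In this sketch a server is only tracked once its first heartbeat has arrived, so a peer that never sends anything is not reported. A real cluster would usually seed the monitor from its membership list.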

For more details, go to Chapter 07 of the online ebook at oreilly.com.

This pattern is part of Patterns of Distributed Systems

23 November 2023