
for the user

adventures into the land of the command line

MongoDB replication

A replica set in MongoDB is a group of mongod processes that maintain the same data set. A replica set contains several data-bearing nodes and optionally one arbiter node. Of the data-bearing nodes, exactly one member is the primary node, while the other nodes are secondary nodes.

Member Types


The primary node receives all write operations and records all changes to its data sets in its operation log (oplog). The secondaries replicate the primary’s oplog and apply the operations to their own data sets asynchronously.
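To make the primary/secondary relationship concrete, here is a toy model in plain JavaScript (it runs in Node or the mongo shell). The primary appends every write to an array standing in for the oplog, and a secondary later applies whatever entries come after the last one it has applied. The entry shape, the integer `ts` counter and all function names are assumptions for illustration; real oplog entries are documents in the `local.oplog.rs` collection with many more fields.

```javascript
// Toy model of oplog replication (an illustration, not MongoDB's implementation).
// Each oplog entry records one write: a logical timestamp, a key and a value.
function makeNode() {
  return { data: {}, oplog: [], lastAppliedTs: 0 };
}

// The primary applies a write to its own data set and appends it to its oplog.
function writeToPrimary(primary, key, value) {
  const entry = { ts: primary.oplog.length + 1, key: key, value: value };
  primary.data[key] = entry.value;
  primary.oplog.push(entry);
}

// A secondary copies every entry newer than the last one it applied, replays
// it against its own data set, and records its new position.
// Returns the number of entries applied.
function syncSecondary(secondary, primary) {
  const pending = primary.oplog.filter(e => e.ts > secondary.lastAppliedTs);
  for (const entry of pending) {
    secondary.data[entry.key] = entry.value;
    secondary.oplog.push(entry);
    secondary.lastAppliedTs = entry.ts;
  }
  return pending.length;
}

// Usage: writes land on the primary first; the secondary only sees them
// after it syncs, which is the asynchronous part.
const primaryNode = makeNode();
const secondaryNode = makeNode();
writeToPrimary(primaryNode, "a", 1);
writeToPrimary(primaryNode, "b", 2);
console.log(JSON.stringify(secondaryNode.data)); // {}
syncSecondary(secondaryNode, primaryNode);
console.log(JSON.stringify(secondaryNode.data)); // {"a":1,"b":2}
```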

If the primary is unavailable, an eligible secondary will hold an election to elect itself the new primary.

You may add an extra mongod instance to a replica set as an arbiter. Arbiters do not maintain a data set; their purpose is to maintain a quorum in the replica set by responding to heartbeat and election requests from other replica set members. An arbiter will always be an arbiter, whereas a primary may step down and become a secondary, and a secondary may become the primary during an election.
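These roles show up in the `stateStr` field of each entry in `rs.status().members`. As a quick sanity check, a hypothetical helper can tally them and confirm there is a single primary. It is plain JavaScript, so it runs in the mongo shell against real output or in Node against a hand-written members array; the helper name and the shape of its result are my own.

```javascript
// Hypothetical helper: tallies member roles from the stateStr values that
// rs.status() reports ("PRIMARY", "SECONDARY", "ARBITER", ...).
function summarizeMembers(members) {
  const summary = { primaries: 0, secondaries: 0, arbiters: 0, other: 0 };
  for (const m of members) {
    switch (m.stateStr) {
      case "PRIMARY": summary.primaries += 1; break;
      case "SECONDARY": summary.secondaries += 1; break;
      case "ARBITER": summary.arbiters += 1; break;
      // e.g. STARTUP2, RECOVERING or "(not reachable/healthy)"
      default: summary.other += 1;
    }
  }
  // Normally exactly one member reports PRIMARY.
  summary.hasSinglePrimary = summary.primaries === 1;
  return summary;
}

// In the mongo shell: printjson(summarizeMembers(rs.status().members));
```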

Read preference


By default, clients read from the primary; however, clients can specify a read preference to send read operations to secondaries. The read preference modes are:

primary: Default mode. All operations read from the current replica set primary.
primaryPreferred: In most situations, operations read from the primary, but if it is unavailable, operations read from secondary members.
secondary: All operations read from the secondary members of the replica set.
secondaryPreferred: In most situations, operations read from secondary members, but if no secondary members are available, operations read from the primary.
nearest: Operations read from the member of the replica set with the least network latency, irrespective of the member’s type.
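The five modes can be sketched as a selection function over a list of members. This is a simplified model under stated assumptions: each member is `{ name, state, pingMs }` (names borrowed loosely from `rs.status()`), and whenever several secondaries qualify it just picks the one with the lowest `pingMs`. Real drivers also apply a latency window, tag sets and staleness limits and choose randomly among eligible members, so treat `selectMember` as hypothetical.

```javascript
// Hypothetical, simplified server selection for the five read preference modes.
// members: [{ name, state: "PRIMARY" | "SECONDARY", pingMs }]
// Returns the chosen member, or null if no member satisfies the mode.
function selectMember(mode, members) {
  const primary = members.find(m => m.state === "PRIMARY") || null;
  const secondaries = members.filter(m => m.state === "SECONDARY");
  // Lowest-latency member of a list, or null for an empty list.
  const lowestPing = list =>
    list.length === 0
      ? null
      : list.reduce((best, m) => (m.pingMs < best.pingMs ? m : best));

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || lowestPing(secondaries);
    case "secondary":
      return lowestPing(secondaries);
    case "secondaryPreferred":
      return lowestPing(secondaries) || primary;
    case "nearest":
      // Primary and secondaries are equal candidates; only latency matters.
      return lowestPing(primary ? [primary, ...secondaries] : secondaries);
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}
```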

Automatic Failover


Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
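The timing rule can be expressed as a small check. Assumptions: each member record carries a `lastHeartbeatRecv` time in epoch milliseconds (a field of that name appears in `rs.status()`, but there it is a date), and the 10-second limit is a parameter, since the replica set config lets you change it through `settings.heartbeatTimeoutSecs`. The function name is hypothetical.

```javascript
// Hypothetical helper mirroring the rule above: members ping each other every
// 2 seconds, and a member whose last successful heartbeat is older than the
// timeout (10 seconds by default) is treated as inaccessible.
const HEARTBEAT_TIMEOUT_MS = 10 * 1000;

// members: [{ name, lastHeartbeatRecv }], with lastHeartbeatRecv in epoch ms.
// Returns the names of members considered inaccessible at time nowMs.
function inaccessibleMembers(members, nowMs, timeoutMs = HEARTBEAT_TIMEOUT_MS) {
  return members
    .filter(m => nowMs - m.lastHeartbeatRecv > timeoutMs)
    .map(m => m.name);
}

// With rs.status() in the mongo shell, convert the dates first, e.g.:
// inaccessibleMembers(rs.status().members
//   .filter(m => m.lastHeartbeatRecv)
//   .map(m => ({ name: m.name, lastHeartbeatRecv: m.lastHeartbeatRecv.getTime() })),
//   Date.now());
```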

When the primary is marked as inaccessible, the available secondary with the highest priority will call an election. Secondary members with a priority value of 0 cannot become primary and do not seek election.

When a primary does not communicate with the other members of the set for more than 10 seconds, an eligible secondary will hold an election to elect itself the new primary. The first secondary to hold an election and receive a majority of the members’ votes becomes primary.
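The election rules in these two paragraphs can be modelled roughly as follows. This is a simplified sketch, not MongoDB’s election protocol: it assumes every reachable voting member votes for the highest-priority eligible candidate, and it ignores terms, how up to date each member’s oplog is, and the other checks the real protocol performs. `priority` and `votes` are named after replica set config fields; `reachable` and the function names are hypothetical. What it does capture is the arithmetic: a candidate needs a strict majority of the voting members, so a set that can still reach a majority elects a new primary, and one that cannot stays without a primary.

```javascript
// Simplified election model (not MongoDB's actual protocol).
// members: [{ name, state, priority, votes, reachable }], votes is 0 or 1.
// Arbiters (state "ARBITER") vote but can never become primary.

// Strict majority of the voting members configured in the set.
function votesNeeded(members) {
  const voting = members.filter(m => m.votes > 0).length;
  return Math.floor(voting / 2) + 1;
}

// Reachable secondaries with priority > 0, highest priority first.
function electionCandidates(members) {
  return members
    .filter(m => m.reachable && m.state === "SECONDARY" && m.priority > 0)
    .sort((a, b) => b.priority - a.priority);
}

// Returns the new primary's name, or null if no candidate can win.
function runElection(members) {
  const candidates = electionCandidates(members);
  if (candidates.length === 0) return null;
  // Assume every reachable voting member votes for the first candidate.
  const votes = members
    .filter(m => m.reachable && m.votes > 0)
    .reduce((sum, m) => sum + m.votes, 0);
  return votes >= votesNeeded(members) ? candidates[0].name : null;
}
```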

You can additionally configure a secondary to:

Prevent it from becoming primary in an election (a priority 0 member)
Prevent applications from reading from it (a hidden member)
Keep a running “historical” snapshot for use in recovery (a delayed member)
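Those options map to member settings in the replica set configuration: `priority: 0`, `hidden: true` (a hidden member must also have priority 0) and a replication delay. Below is a hypothetical helper that edits a config document, such as the one `rs.conf()` returns, in place. The delay field is `slaveDelay` in MongoDB releases before 5.0 and `secondaryDelaySecs` from 5.0 on, so the field name is passed in; check which one your version expects.

```javascript
// Hypothetical helper: makes the member with the given host priority 0 and
// hidden, and optionally delayed, then returns the (modified) config.
// delayField is "slaveDelay" (before MongoDB 5.0) or "secondaryDelaySecs" (5.0+).
function makeHiddenMember(cfg, host, delaySecs, delayField) {
  const member = cfg.members.find(m => m.host === host);
  if (!member) throw new Error("no member with host " + host);
  member.priority = 0;   // never becomes primary
  member.hidden = true;  // not visible to client applications
  if (delaySecs > 0) {
    member[delayField] = delaySecs; // lags behind on purpose: a historical snapshot
  }
  return cfg;
}

// In the mongo shell, connected to the primary:
// rs.reconfig(makeHiddenMember(rs.conf(), "this.is.mongo2:27017", 3600, "slaveDelay"));
```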

Initiate a replica set

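// Run once, in a mongo shell connected to one of the members.
// "_id" must match the replica set name each mongod was started with (--replSet).
conf = {
  "_id": "my_groovy_replica_set",
  "version": 1,
  "members": [
    { "_id": 0, "host": "this.is.mongo0:27017" },
    { "_id": 1, "host": "this.is.mongo1:27017" },
    { "_id": 2, "host": "this.is.mongo2:27017" }
  ]
};

rs.initiate(conf);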
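Writing the members array by hand gets repetitive. A hypothetical helper can build the same document from a list of `host:port` strings, numbering `_id` from 0 in the order given; paste it into the mongo shell and pass the result to `rs.initiate()`.

```javascript
// Hypothetical helper: builds a config document shaped like the one above,
// numbering member _id values from 0 in the order the hosts are given.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    version: 1,
    members: hosts.map((host, i) => ({ _id: i, host: host })),
  };
}

// In the mongo shell:
// rs.initiate(buildReplSetConfig("my_groovy_replica_set",
//   ["this.is.mongo0:27017", "this.is.mongo1:27017", "this.is.mongo2:27017"]));
```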


Commands


Add a member to a replica set. You must run it from the primary of the replica set.

rs.add('this.is.mongo3:27017')
OR
rs.add( { host: "this.is.mongo3:27017", priority: 0 } )

Add an arbiter to an existing replica set.

rs.addArb('this.is.mongo3:27017')
OR
rs.add('this.is.mongo3:27017', true)

Remove a member from an existing replica set.

rs.remove('this.is.mongo1:27017')

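Make a replica set member ineligible to become primary, either for a number of seconds (run rs.freeze on that member itself) or permanently (set its priority to 0 and reconfigure from the primary).

rs.freeze(seconds)
OR
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)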
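One catch with `cfg.members[2]` is that the array index is not necessarily the member’s `_id` or the host you have in mind, especially after members have been removed. A hypothetical helper that looks the member up by host avoids that; it only edits the config document, so you still pass the result to `rs.reconfig()` on the primary.

```javascript
// Hypothetical helper: sets the priority of the member with the given host.
// A priority of 0 makes the member permanently ineligible to become primary.
function setMemberPriority(cfg, host, priority) {
  const member = cfg.members.find(m => m.host === host);
  if (!member) throw new Error("no member with host " + host);
  member.priority = priority;
  return cfg;
}

// In the mongo shell, connected to the primary:
// rs.reconfig(setMemberPriority(rs.conf(), "this.is.mongo2:27017", 0));
```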

Allow read operations to run on secondary members for the current connection.

rs.slaveOk()
OR
db.getMongo().setSlaveOk()

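Force the current primary to step down and trigger an election for a new primary. The argument (60 here) is the number of seconds the stepped-down member stays ineligible to become primary again; writes to the primary are blocked while the step-down runs.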

rs.stepDown(60)

Check the current replica set configuration.

rs.conf()

Check the status of the replica set.

rs.status()
OR
use admin
db.runCommand( { replSetGetStatus : 1 } )

Check if the current member is the primary.

rs.isMaster().ismaster

Check if the current member is a secondary.

rs.isMaster().secondary

Print a formatted report of the replica set’s status from the perspective of its secondary members.

rs.printSlaveReplicationInfo()

source: m1.example.net:27017
    syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary
source: m2.example.net:27017
    syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary

Print a report of the replica set member’s oplog.

rs.printReplicationInfo()

configured oplog size:   192MB
log length start to end: 65422secs (18.17hrs)
oplog first event time:  Mon Jun 23 2014 17:47:18 GMT-0400 (EDT)
oplog last event time:   Tue Jun 24 2014 11:57:40 GMT-0400 (EDT)
now:                     Thu Jun 26 2014 14:24:39 GMT-0400 (EDT)
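The “log length start to end” figure is the oplog window: the time between the first and last events in the oplog, which roughly bounds how long a secondary can fall behind and still catch up by replaying the oplog. The arithmetic can be checked from the two event times in the report above; this snippet only does the subtraction.

```javascript
// Reproduce the "log length start to end" line from the two event times above.
const firstEvent = new Date("2014-06-23T17:47:18-04:00");
const lastEvent = new Date("2014-06-24T11:57:40-04:00");

const windowSecs = (lastEvent - firstEvent) / 1000;
const windowHrs = (windowSecs / 3600).toFixed(2);

console.log(windowSecs + "secs (" + windowHrs + "hrs)"); // 65422secs (18.17hrs)
```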


Restarting stuff


What happens to the replica set?

Take this setup for instance:

no arbiter
this.is.mongo0 - secondary
this.is.mongo1 - secondary
this.is.mongo2 - primary

If I restart this.is.mongo0 and check the replication status on the primary while it is rebooting:

this.is.mongo2:PRIMARY> rs.status()
.
.
.
{
    "_id" : 0,
    "name" : "this.is.mongo0:27017",
    "health" : 0,
    "state" : 8,
    "stateStr" : "(not reachable/healthy)",
    "uptime" : 0,
    "optime" : {
        "ts" : Timestamp(0, 0),
        "t" : NumberLong(-1)
    },
    "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
    "lastHeartbeat" : ISODate("2017-05-09T14:38:46.137Z"),
    "lastHeartbeatRecv" : ISODate("2017-05-09T14:27:12.363Z"),
    "pingMs" : NumberLong(0),
    "lastHeartbeatMessage" : "Connection refused",
    "configVersion" : -1
},
.
.
.

After 10 seconds, this will change from this:

"lastHeartbeatMessage" : "Connection refused",

To this:

"lastHeartbeatMessage" : "Couldn't get a connection within the time limit",

Or this:

"lastHeartbeatMessage" : "no response within election timeout period",

And configVersion will change from 1 or 3 to -1
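Rather than scrolling through the full rs.status() output while a member restarts, you can pull out just the fields discussed above. A small sketch: `unhealthyMembers` is a hypothetical helper that takes the document returned by `rs.status()` and lists members whose `health` is not 1, with their `stateStr` and `lastHeartbeatMessage`.

```javascript
// Hypothetical helper: summarises unreachable members from an rs.status() result.
function unhealthyMembers(status) {
  return status.members
    .filter(m => m.health !== 1)
    .map(m => ({
      name: m.name,
      state: m.stateStr,
      message: m.lastHeartbeatMessage || "",
    }));
}

// In the mongo shell: printjson(unhealthyMembers(rs.status()));
```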

At this point the members elect a new primary. The same host could be re-elected, but in this case my replica set looked like this afterwards:

this.is.mongo0 - secondary
this.is.mongo1 - primary
this.is.mongo2 - secondary

Primary election is very fast with three members; I’m not sure how long it takes with a larger replica set.