for the user

adventures into the land of the command line

mongo troubleshooting

Some errors, their possible causes and possible fixes.


HostUnreachable: short read

in mongo-server-0's log file...

2017-05-15T08:29:20.807+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to mongo-server-1:27017; HostUnreachable: short read

You might see this in the mongod.log file on an affected host in a replica set.

If you log in to the database and check the replication status, you might see something like this:

my_repl_set:SECONDARY> rs.status()
{
    "set" : "my_repl_set",
    "date" : ISODate("2017-05-15T08:35:23.746Z"),
    "myState" : 2,
    "term" : NumberLong(18),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo-server-0:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 389,
            "optime" : {
                "ts" : Timestamp(1494604007, 1),
                "t" : NumberLong(18)
            },
            "optimeDate" : ISODate("2017-05-12T15:46:47Z"),
            "configVersion" : 1,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mongo-server-1:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-05-15T08:35:21.983Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-15T08:35:20.721Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "short read",
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "mongo-server-2:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-05-15T08:35:21.984Z"),
            "lastHeartbeatRecv" : ISODate("2017-05-15T08:35:22.444Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "short read",
            "configVersion" : -1
        }
    ],
    "ok" : 1
}
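
With more members, output like this gets long, so a small helper can pull out just the members that report health 0 together with their heartbeat message. This is only a sketch in plain JavaScript: the function name unreachableMembers and the trimmed sample document are my own, and only the field names come from the rs.status() output above.

```javascript
// Return the name and last heartbeat message of every member with health 0.
// `status` is expected to have the same shape as the rs.status() output.
function unreachableMembers(status) {
  return status.members
    .filter(function (m) { return m.health === 0; })
    .map(function (m) {
      return { name: m.name, message: m.lastHeartbeatMessage || "" };
    });
}

// Trimmed-down sample using the same fields and values as the output above.
var sample = {
  set: "my_repl_set",
  members: [
    { name: "mongo-server-0:27017", health: 1, stateStr: "SECONDARY", self: true },
    { name: "mongo-server-1:27017", health: 0, stateStr: "(not reachable/healthy)",
      lastHeartbeatMessage: "short read" },
    { name: "mongo-server-2:27017", health: 0, stateStr: "(not reachable/healthy)",
      lastHeartbeatMessage: "short read" }
  ]
};

unreachableMembers(sample).forEach(function (m) {
  console.log(m.name + ": " + m.message);
});
```

The sample runs under Node. In the mongo shell itself you would pass the live document instead, for example printjson(unreachableMembers(rs.status())).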

A health of 0 with a lastHeartbeatMessage of "short read" means that the server cannot communicate with the other members in the replica set. The cause is a bit tricky, because it looks like a network issue. So generally the first thing you might try is to ping or nc the affected host from elsewhere, or vice versa.

$ ping mongo-server-0
$ nc -t -v -w 1 123.45.678.9 27017

You might even try using the mongo client on a different server to connect.

$ mongo mongo-server-0:27017/my_database -u username -p password

You might even check whether a firewall is blocking something on the affected host, and confirm that mongod is actually listening on the port.

# netstat -nat | grep :27017
tcp        0      0 0.0.0.0:27017           0.0.0.0:*               LISTEN

At this point you might even scratch your head and think… what does “short read” even mean??

Well, for me this issue came down to SSL being configured on one server and not on another.

The affected host was allowing only SSL connections, and the yet-to-be-configured servers were not communicating with SSL. So make sure you also check that. After configuring the other servers to communicate with SSL, things worked as expected.
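
To make the mismatch concrete, here is a rough sketch that takes the SSL mode of each member and lists the connections that cannot work. It assumes you have collected the net.ssl.mode value from each server's config by hand. The mode names (disabled, allowSSL, preferSSL, requireSSL) are MongoDB's, but the rules in the code are my reading of how those modes behave, and the modes map and the brokenLinks function are made up for illustration. The post does not say which exact modes my servers were using.

```javascript
// Assumption: members running preferSSL or requireSSL use TLS when they
// connect out to other members; disabled and allowSSL connect in plain text.
function dialsWithTls(mode) {
  return mode === "preferSSL" || mode === "requireSSL";
}

// Assumption: "disabled" refuses TLS connections, "requireSSL" refuses
// plain-text connections, and the two middle modes accept both.
function accepts(mode, tls) {
  return tls ? mode !== "disabled" : mode !== "requireSSL";
}

// List every directed member-to-member connection that cannot succeed.
function brokenLinks(modes) {
  var hosts = Object.keys(modes);
  var broken = [];
  hosts.forEach(function (from) {
    hosts.forEach(function (to) {
      if (from !== to && !accepts(modes[to], dialsWithTls(modes[from]))) {
        broken.push(from + " -> " + to);
      }
    });
  });
  return broken;
}

// Hypothetical setup: one host already locked down to SSL, two not configured yet.
var modes = {
  "mongo-server-0": "requireSSL",
  "mongo-server-1": "disabled",
  "mongo-server-2": "disabled"
};

console.log(brokenLinks(modes));
```

With this made-up setup, every link to or from mongo-server-0 is reported, while mongo-server-1 and mongo-server-2 can still talk to each other. That fits a picture where the SSL-only host sees both of its peers as unreachable.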


Could not find member to sync from

After setting up or recreating a replica set on a system that does not have much load, you may see this in the rs.status() output.

{
    "_id" : 0,
    "name" : "mongo-server-0:27017",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 1197,
    "optime" : {
        "ts" : Timestamp(1494844175, 1),
        "t" : NumberLong(20)
    },
    "optimeDate" : ISODate("2017-05-15T10:29:35Z"),
    "infoMessage" : "could not find member to sync from",
    "configVersion" : 1,
    "self" : true
},

But why? Well, according to this, when a secondary chooses a source to sync from, it picks a node whose oplog is newer than (not equal to) its own. So after startup, when all nodes have some data, the oplogs will all be the same.

When all the oplogs are the same, the secondary cannot yet choose a sync source. However, after a write operation happens, the primary will have an updated and therefore newer oplog.

At this point, the secondary can successfully choose a target to sync from and the message will disappear.
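
The rule can be sketched in a few lines of JavaScript. This is a simplified model of the "strictly newer oplog" behaviour described above, not the server's actual selection code, which looks at more than this. The names isNewer and chooseSyncSource, and the { secs, inc } shape standing in for Timestamp(secs, inc), are my own.

```javascript
// True when optime a is strictly newer than optime b.
// Optimes are plain objects { secs, inc }, mirroring Timestamp(secs, inc).
function isNewer(a, b) {
  return a.secs > b.secs || (a.secs === b.secs && a.inc > b.inc);
}

// Pick the first healthy member whose optime is strictly newer than ours.
// Returns null when nobody is ahead, which is the situation where
// "could not find member to sync from" shows up.
function chooseSyncSource(self, others) {
  var candidates = others.filter(function (m) {
    return m.health === 1 && isNewer(m.optime, self.optime);
  });
  return candidates.length > 0 ? candidates[0].name : null;
}

var me = { name: "mongo-server-0:27017", optime: { secs: 1494844175, inc: 1 } };

// Right after setup: every member has the same optime.
var idle = [
  { name: "mongo-server-1:27017", health: 1, optime: { secs: 1494844175, inc: 1 } },
  { name: "mongo-server-2:27017", health: 1, optime: { secs: 1494844175, inc: 1 } }
];

// After a write: one member is now ahead.
var afterWrite = [
  { name: "mongo-server-1:27017", health: 1, optime: { secs: 1494847785, inc: 1 } },
  { name: "mongo-server-2:27017", health: 1, optime: { secs: 1494844175, inc: 1 } }
];

console.log(chooseSyncSource(me, idle));       // null
console.log(chooseSyncSource(me, afterWrite)); // mongo-server-1:27017
```

The first call finds nobody ahead, so there is nothing to sync from. The second call finds a member with a newer optime, which is the point where the message goes away.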

After editing some data, the primary’s oplog was updated and lo and behold…

{
    "_id" : 0,
    "name" : "mongo-server-0:27017",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 1531,
    "optime" : {
        "ts" : Timestamp(1494847785, 1),
        "t" : NumberLong(20)
    },
    "optimeDate" : ISODate("2017-05-15T11:29:45Z"),
    "syncingTo" : "mongo-server-1:27017",
    "configVersion" : 1,
    "self" : true
},