MongoDB is a NoSQL document-oriented database that stores JSON-like documents with dynamic schemas in a binary format called BSON. MongoDB provides high availability through replica sets, where each replica set consists of two or more members that each hold a copy of the data. Each replica set member may act in the role of primary or secondary. All writes and reads go to the primary by default, but since every secondary holds a copy of the data (which might be stale, as replication is asynchronous and only eventually consistent), reads can be load balanced to the secondaries. When the primary fails, the replica set automatically holds an election to determine which secondary becomes the new primary.
MongoDB scales horizontally using sharding. The user chooses a shard key, which determines how the data in a collection is distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards, where a shard is a master with one or more slaves, i.e. a replica set. [1]
MongoDB provides consistency (if reads are performed on the primary and not the secondaries) and can be roughly placed on the CAP theorem triangle as CP, though such classifications are rather rough and should only help answer the question "what is the default behaviour of the distributed system when a partition happens" [2].
In this post I'll deploy a three-node replica set with sharding.
File: gistfile1.txt
-------------------
[mongodb-n01,2,3]$ lsb_release -d
Description: Ubuntu 14.04.4 LTS
[mongodb-n01,2,3]$ apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
[mongodb-n01,2,3]$ echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
[mongodb-n01,2,3]$ apt-get update && apt-get install -y mongodb-org && pkill mongo
[mongodb-n01,2,3]$ sudo -u mongodb /usr/bin/mongod --config /etc/mongod.conf --fork
With the server installed on all nodes, connect to one of the nodes and set up the replica set:
File: gistfile1.txt
-------------------
[mongodb-n01]$ mongo 10.183.2.172:27017
MongoDB shell version: 3.2.6
connecting to: 10.183.2.172:27017/test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
> cfg = {
... '_id':'testReplicaSet',
... 'members':[
... {'_id':0, 'host': '10.183.2.172:27017'},
... {'_id':1, 'host': '10.183.2.198:27017'},
... {'_id':2, 'host': '10.183.1.69:27017'}
... ]
... }
{
"_id" : "testReplicaSet",
"members" : [
{
"_id" : 0,
"host" : "10.183.2.172:27017"
},
{
"_id" : 1,
"host" : "10.183.2.198:27017"
},
{
"_id" : 2,
"host" : "10.183.1.69:27017"
}
]
}
> rs.initiate(cfg)
{ "ok" : 1 }
testReplicaSet:OTHER> rs.status()
{
"set" : "testReplicaSet",
"date" : ISODate("2016-05-31T15:09:30.512Z"),
"myState" : 1,
"term" : NumberLong(1),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "10.183.2.172:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 450,
"optime" : {
"ts" : Timestamp(1464707267, 2),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2016-05-31T15:07:47Z"),
"infoMessage" : "could not find member to sync from",
"electionTime" : Timestamp(1464707267, 1),
"electionDate" : ISODate("2016-05-31T15:07:47Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "10.183.2.198:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 114,
"optime" : {
"ts" : Timestamp(1464707267, 2),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2016-05-31T15:07:47Z"),
"lastHeartbeat" : ISODate("2016-05-31T15:09:29.107Z"),
"lastHeartbeatRecv" : ISODate("2016-05-31T15:09:29.356Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "10.183.2.172:27017",
"configVersion" : 1
},
{
"_id" : 2,
"name" : "10.183.1.69:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 114,
"optime" : {
"ts" : Timestamp(1464707267, 2),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2016-05-31T15:07:47Z"),
"lastHeartbeat" : ISODate("2016-05-31T15:09:29.106Z"),
"lastHeartbeatRecv" : ISODate("2016-05-31T15:09:29.356Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "10.183.2.172:27017",
"configVersion" : 1
}
],
"ok" : 1
}
testReplicaSet:PRIMARY>
testReplicaSet:SECONDARY> rs.printSlaveReplicationInfo()
source: 10.183.2.198:27017
syncedTo: Tue May 31 2016 16:39:43 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary
source: 10.183.1.69:27017
syncedTo: Tue May 31 2016 16:39:43 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary
Add the following replication config option to /etc/mongod.conf on all nodes and restart mongo:
File: gistfile1.txt
-------------------
[mongodb-n01,2,3]$ cat /etc/mongod.conf | grep repl
replication:
  replSetName: testReplicaSet
[mongodb-n01,2,3]$ pkill mongo && sudo -u mongodb /usr/bin/mongod --config /etc/mongod.conf --fork
With this we now have a simple three-node replica set. Connect to the primary and insert some data:
File: gistfile1.txt
-------------------
[mongodb-n01]$ mongo 10.183.2.172
testReplicaSet:PRIMARY> db.test_db.insert({_id:1, value:'test'})
WriteResult({ "nInserted" : 1 })
testReplicaSet:PRIMARY> db.test_db.findOne()
{ "_id" : 1, "value" : "test" }
testReplicaSet:PRIMARY>
[mongodb-n02]$ mongo 10.183.2.198
testReplicaSet:SECONDARY> db.test_db.findOne()
2016-05-31T15:15:20.486+0000 E QUERY [thread1] Error: error: { "ok" : 0, "errmsg" : "not master and slaveOk=false", "code" : 13435 } : ...
testReplicaSet:SECONDARY> rs.slaveOk(true)
testReplicaSet:SECONDARY> db.test_db.findOne()
{ "_id" : 1, "value" : "test" }
testReplicaSet:SECONDARY>
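rs.slaveOk() flags the whole shell session as willing to read from a secondary. The shell (and every driver) also supports read preference modes, which express the same intent per connection. A minimal sketch, using the standard secondaryPreferred mode on the collection created above:
File: gistfile1.txt
-------------------
testReplicaSet:SECONDARY> db.getMongo().setReadPref('secondaryPreferred')
testReplicaSet:SECONDARY> db.test_db.findOne()
Drivers accept the same modes (primary, primaryPreferred, secondary, secondaryPreferred, nearest) in their connection options.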
MongoDB also supports tagged replica sets. Tag sets allow you to target read operations at specific members of a replica set and to define custom write concerns that specify when a write operation has succeeded. Write concern allows your application to detect insertion errors or unavailable mongod instances, while read preferences consider the value of a tag when selecting a member to read from.
To enable tags:
File: gistfile1.txt
-------------------
testReplicaSet:PRIMARY> var conf = rs.conf()
testReplicaSet:PRIMARY> conf.members[0].tags = {'datacenter': 'iad3', 'rack_id': '342543'}
{ "datacenter" : "iad3", "rack_id" : "342543" }
testReplicaSet:PRIMARY> conf.members[1].tags = {'datacenter': 'iad3', 'rack_id': '342544'}
{ "datacenter" : "iad3", "rack_id" : "342544" }
testReplicaSet:PRIMARY> conf.members[2].tags = {'datacenter': 'ord1', 'rack_id': '733421'}
{ "datacenter" : "ord1", "rack_id" : "733421" }
testReplicaSet:PRIMARY> conf.settings = {
... 'getLastErrorModes' : {
... 'MultiDC':{datacenter : 2},
... 'MultiRack':{rack_id : 2}
... }
... }
{
"getLastErrorModes" : {
"MultiDC" : {
"datacenter" : 2
},
"MultiRack" : {
"rack_id" : 2
}
}
}
testReplicaSet:PRIMARY> rs.reconfig(conf)
{ "ok" : 1 }
testReplicaSet:PRIMARY>
testReplicaSet:PRIMARY> rs.config()
{
"_id" : "testReplicaSet",
"version" : 2,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "10.183.2.172:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
"datacenter" : "iad3",
"rack_id" : "342543"
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "10.183.2.198:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
"datacenter" : "iad3",
"rack_id" : "342544"
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "10.183.1.69:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
"datacenter" : "ord1",
"rack_id" : "733421"
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
"MultiRack" : {
"rack_id" : 2
},
"MultiDC" : {
"datacenter" : 2
}
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("574da8b80c314d4ba0a53190")
}
}
testReplicaSet:PRIMARY>
In the configuration above, a write issued with the MultiDC write concern is only acknowledged once the data has been replicated to servers in two distinct datacenters (i.e. at least one server in each of our two DCs), and MultiRack requires the write to propagate to at least two distinct racks.
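To actually use one of these custom write concerns, pass its name as the w value of the write concern. A minimal sketch (the document and the wtimeout value are just for illustration):
File: gistfile1.txt
-------------------
testReplicaSet:PRIMARY> db.test_db.insert(
...     {_id: 2, value: 'multi-dc'},
...     {writeConcern: {w: 'MultiDC', wtimeout: 5000}}
... )
If two distinct datacenters don't acknowledge the write within 5 seconds, the insert reports a write concern error instead of silently settling for less durability. Tags can similarly steer reads: drivers accept tag sets (e.g. {datacenter: 'iad3'}) alongside the read preference mode.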
The configuration so far does not contain any user authentication or authorization. To enable it, let's create an admin user and two regular users, one that can read and one that can read/write:
File: gistfile1.txt
-------------------
[mongodb-n01]$ mongo 10.183.2.172
testReplicaSet:PRIMARY> use admin
switched to db admin
testReplicaSet:PRIMARY> db.createUser({
... user:'admin', pwd:'supersercretpassword',
... customData:{desc:'The admin user for the admin database'},
... roles:['readWrite','dbAdmin','clusterAdmin']
... })
Successfully added user: {
"user" : "admin",
"customData" : {
"desc" : "The admin user for the admin database"
},
"roles" : [
"readWrite",
"dbAdmin",
"clusterAdmin"
]
}
testReplicaSet:PRIMARY> show dbs
admin 0.000GB
local 0.000GB
test 0.000GB
testReplicaSet:PRIMARY> show collections
system.users
system.version
testReplicaSet:PRIMARY> db.createUser({
... user:'read_user', pwd:'readuserpassword',
... customData:{desc:'The read only user for the test database'},
... roles:['read']
... })
Successfully added user: {
"user" : "read_user",
"customData" : {
"desc" : "The read only user for the test database"
},
"roles" : [
"read"
]
}
testReplicaSet:PRIMARY> db.createUser({
... user:'write_user', pwd:'writeuserpassword',
... customData:{desc:'The read/write user for the test database'},
... roles:['readWrite']
... })
Successfully added user: {
"user" : "write_user",
"customData" : {
"desc" : "The read/write user for the test database"
},
"roles" : [
"readWrite"
]
}
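One caveat: since the users above were created while use admin was active, the string-form roles ('read', 'readWrite') apply to the admin database itself, despite what the descriptions say. To scope a user to the test database, the roles can be given as documents instead. A sketch (the user name and password are just illustrative):
File: gistfile1.txt
-------------------
testReplicaSet:PRIMARY> db.createUser({
...     user: 'test_rw_user', pwd: 'changeit',
...     roles: [ { role: 'readWrite', db: 'test' } ]
... })
The user still authenticates against the admin database but only gets read/write access to test.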
To ensure only trusted servers can join the replica set, let's create a shared key file and reconfigure the replica set by adding a "security" section to the config:
File: gistfile1.txt
-------------------
[mongodb-n01,2,3]$ cat /etc/mongod.conf | grep -A4 security
security:
  keyFile: /etc/mongo.key
  clusterAuthMode: keyFile
  authorization: enabled
[mongodb-n01]$ openssl rand -hex 10 > /etc/mongo.key # copy to node 2 and 3
[mongodb-n01,2,3]$ chown mongodb /etc/mongo.key && chmod 600 /etc/mongo.key # mongod refuses key files with open permissions
[mongodb-n01,2,3]$ pkill mongo && sudo -u mongodb /usr/bin/mongod --config /etc/mongod.conf --fork
[mongodb-n01,2,3]$ mongo 10.183.2.172:27017 -u admin -p supersercretpassword --authenticationDatabase admin
MongoDB shell version: 3.2.6
connecting to: 10.183.2.172:27017/test
testReplicaSet:PRIMARY>
To set up a sharded cluster I'll use the 3 MongoDB data servers set up previously, 3 config servers, and one mongos (the router that distributes reads and writes to the shards). To set up the 3 config servers:
File: gistfile1.txt
-------------------
[mongo-config-n01,2,3]$ echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" > /etc/apt/sources.list.d/mongodb-org-3.2.list
[mongo-config-n01,2,3]$ apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
[mongo-config-n01,2,3]$ apt-get update && apt-get install -y mongodb-org
[mongo-config-n01,2,3]$ pkill mongo && sudo -u mongodb /usr/bin/mongod --config /etc/mongod.conf --fork
[mongo-config-n01,2,3]$ cat /etc/mongod.conf | egrep -A2 "replication|sharding"
replication:
  replSetName: shardReplicaSet
sharding:
  clusterRole: configsvr
[mongo-config-n01]$ mongo 10.183.3.6
MongoDB shell version: 3.2.6
connecting to: 10.183.3.6/test
> rs.initiate( {
... _id: "shardReplicaSet",
... configsvr: true,
... members: [
... { _id: 0, host: "10.183.3.6:27017" },
... { _id: 1, host: "10.183.3.10:27017" },
... { _id: 2, host: "10.183.3.46:27017" }
... ]
... } )
{ "ok" : 1 }
shardReplicaSet:PRIMARY> rs.status()
{
"set" : "shardReplicaSet",
"date" : ISODate("2016-05-31T19:27:31.272Z"),
"myState" : 1,
"term" : NumberLong(1),
"configsvr" : true,
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "10.183.3.6:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1122,
"optime" : {
"ts" : Timestamp(1464722365, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2016-05-31T19:19:25Z"),
"electionTime" : Timestamp(1464721787, 2),
"electionDate" : ISODate("2016-05-31T19:09:47Z"),
"configVersion" : 2,
"self" : true
},
{
"_id" : 1,
"name" : "10.183.3.10:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 485,
"optime" : {
"ts" : Timestamp(1464722365, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2016-05-31T19:19:25Z"),
"lastHeartbeat" : ISODate("2016-05-31T19:27:29.854Z"),
"lastHeartbeatRecv" : ISODate("2016-05-31T19:27:26.689Z"),
"pingMs" : NumberLong(0),
"configVersion" : 2
},
{
"_id" : 2,
"name" : "10.183.3.46:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 485,
"optime" : {
"ts" : Timestamp(1464722365, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2016-05-31T19:19:25Z"),
"lastHeartbeat" : ISODate("2016-05-31T19:27:30.029Z"),
"lastHeartbeatRecv" : ISODate("2016-05-31T19:27:26.702Z"),
"pingMs" : NumberLong(1),
"configVersion" : 2
}
],
"ok" : 1
}
shardReplicaSet:PRIMARY>
To configure the mongos server:
File: gistfile1.txt
-------------------
[mongos-n01]$ echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" > /etc/apt/sources.list.d/mongodb-org-3.2.list
[mongos-n01]$ apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
[mongos-n01]$ apt-get update && apt-get install -y mongodb-org && pkill mongo
[mongos-n01]$ cat /etc/mongos.conf | grep -A2 sharding
sharding:
  configDB: shardReplicaSet/10.183.3.6:27017,10.183.3.10:27017,10.183.3.46:27017
[mongos-n01]$ sudo -u mongodb /usr/bin/mongos --config /etc/mongos.conf --fork
[mongos-n01]$ cat /var/log/mongodb/mongod.log
2016-05-31T19:39:42.836+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2016-05-31T19:39:42.843+0000 I SHARDING [mongosMain] MongoS version 3.2.6 starting: pid=3243 port=27017 64-bit host=mongos-n01 (--help for usage)
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] db version v3.2.6
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] git version: 05552b562c7a0b3143a729aaa0838e558dc49b25
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] allocator: tcmalloc
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] modules: none
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] build environment:
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] distmod: ubuntu1404
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] distarch: x86_64
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] target_arch: x86_64
2016-05-31T19:39:42.843+0000 I CONTROL [mongosMain] options: { config: "/etc/mongos.conf", net: { bindIp: "10.183.2.10", port: 27017 }, processManagement: { fork: true }, sharding: { configDB: "shardReplicaSet/10.183.3.6:27017,10.183.3.10:27017,10.183.3.46:27017" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2016-05-31T19:39:42.844+0000 I SHARDING [mongosMain] Updating config server connection string to: shardReplicaSet/10.183.3.6:27017,10.183.3.10:27017,10.183.3.46:27017
2016-05-31T19:39:42.844+0000 I NETWORK [mongosMain] Starting new replica set monitor for shardReplicaSet/10.183.3.10:27017,10.183.3.46:27017,10.183.3.6:27017
2016-05-31T19:39:42.844+0000 I NETWORK [ReplicaSetMonitorWatcher] starting
2016-05-31T19:39:42.846+0000 I SHARDING [thread1] creating distributed lock ping thread for process mongos-n01:27017:1464723582:-798954996 (sleeping for 30000ms)
2016-05-31T19:39:42.852+0000 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 10.183.3.6:27017
2016-05-31T19:39:42.854+0000 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 10.183.3.6:27017
2016-05-31T19:39:42.861+0000 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 10.183.3.46:27017
2016-05-31T19:39:47.051+0000 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
2016-05-31T19:39:47.200+0000 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-05-31T19:39:47.203+0000 I SHARDING [Balancer] about to contact config servers and shards
2016-05-31T19:39:47.204+0000 I SHARDING [Balancer] config servers and shards contacted successfully
2016-05-31T19:39:47.204+0000 I SHARDING [Balancer] balancer id: mongos-n01:27017 started
2016-05-31T19:39:47.225+0000 I ASIO [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to 10.183.3.10:27017
2016-05-31T19:39:47.236+0000 I NETWORK [mongosMain] waiting for connections on port 27017
2016-05-31T19:39:47.966+0000 I SHARDING [Balancer] distributed lock 'balancer' acquired for 'doing balance round', ts : 574de88334f6a5aefbb74e5d
2016-05-31T19:39:47.971+0000 I SHARDING [Balancer] distributed lock with ts: 574de88334f6a5aefbb74e5d' unlocked.
2016-05-31T19:39:57.984+0000 I SHARDING [Balancer] distributed lock 'balancer' acquired for 'doing balance round', ts : 574de88d34f6a5aefbb74e5e
[mongos-n01]$ mongo 10.183.2.10
mongos> sh.addShard("testReplicaSet/10.183.2.172:27017")
{ "shardAdded" : "testReplicaSet", "ok" : 1 }
mongos> sh.addShard("testReplicaSet/10.183.2.198:27017")
{ "shardAdded" : "testReplicaSet", "ok" : 1 }
mongos> sh.addShard("testReplicaSet/10.183.1.69:27017")
{ "shardAdded" : "testReplicaSet", "ok" : 1 }
mongos>
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("574de87e34f6a5aefbb74e5b")
}
shards:
{ "_id" : "testReplicaSet", "host" : "testReplicaSet/10.183.1.69:27017,10.183.2.172:27017,10.183.2.198:27017" }
active mongoses:
"3.2.6" : 1
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "test", "primary" : "testReplicaSet", "partitioned" : false }
mongos>
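Note that the three sh.addShard() calls above all reference members of the same replica set, so they are idempotent and the cluster ends up with a single shard named testReplicaSet; a single seed member would have sufficed. Also note that sh.status() still reports the test database as "partitioned" : false: nothing is actually sharded until sharding is enabled on a database and a shard key is chosen for a collection. With only one shard no chunks can migrate anywhere yet, but the commands would look roughly like this (a hashed key on _id is just one possible choice):
File: gistfile1.txt
-------------------
mongos> sh.enableSharding("test")
mongos> use test
mongos> db.test_db.createIndex({ _id: "hashed" })
mongos> sh.shardCollection("test.test_db", { _id: "hashed" })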
Make sure you have the correct users created on all servers. In a sharded cluster the users created through mongos live on the config servers, so users for connecting to the shard replica set directly have to be created separately:
File: gistfile1.txt
-------------------
mongo> admin = db.getSiblingDB("admin")
mongo> admin.createUser(
{
user: "konstantin",
pwd: "changeit",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
mongo> db.getSiblingDB("admin").auth("konstantin", "changeit" )
mongo> db.getSiblingDB("admin").createUser(
{
"user" : "cluster_admin",
"pwd" : "changeit",
roles: [ { "role" : "clusterAdmin", "db" : "admin" } ]
}
)
mongo>
And finally all configs at a minimum should look something like this:
File: gistfile1.txt
-------------------
[mongodb-n01,2,3]$ cat /etc/mongod.conf
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  port: 27017
  bindIp: 10.183.2.172
security:
  keyFile: /etc/mongo.key
  clusterAuthMode: keyFile
  authorization: enabled
replication:
  replSetName: testReplicaSet
sharding:
  clusterRole: shardsvr
[mongo-config-n01,2,3]$ cat /etc/mongod.conf
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  port: 27017
  bindIp: 10.183.3.6
security:
  keyFile: /etc/mongo.key
  clusterAuthMode: keyFile
  authorization: enabled
replication:
  replSetName: shardReplicaSet
sharding:
  clusterRole: configsvr
[mongos-n01]$ cat /etc/mongos.conf
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
net:
  port: 27017
  bindIp: 10.183.2.10
security:
  keyFile: /etc/mongo.key
sharding:
  configDB: shardReplicaSet/10.183.3.6:27017,10.183.3.10:27017,10.183.3.46:27017
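As a final sanity check, reads and writes issued against the mongos should transparently hit the shard behind it. Assuming a user with readWrite on the test database has been created through the mongos (remember that mongos reads its users from the config servers, not from the shards; the user name and password here are just illustrative), a quick test could look like this:
File: gistfile1.txt
-------------------
[mongos-n01]$ mongo 10.183.2.10/test -u test_rw_user -p changeit --authenticationDatabase admin
mongos> db.test_db.insert({_id: 3, via: 'mongos'})
mongos> db.test_db.find()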
Resources:
[1]. https://en.wikipedia.org/wiki/MongoDB
[2]. https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html