Aug 29, 2019 by Thibault Debatty
https://cylab.be/blog/36/mongodb-sharding-on-a-single-server
Sharding is a method for distributing data (and load) across multiple machines. In this blog post we show how to test MongoDB sharding by running multiple MongoDB instances on a single Ubuntu 16.04 machine.
A MongoDB cluster consists of 3 components:
Shard servers store the data. They are started with
mongod --shardsvr
on the command line (hence the same binary as for a regular MongoDB server).
In MongoDB, shard servers do not perform automatic discovery or automatic data distribution. Hence the cluster requires configuration servers, which keep track of the shards and of how the data is distributed between them. They are started with
mongod --configsvr
The driver API of MongoDB (the one you import in your client application) is not able to use a sharded cluster directly. Hence the cluster also requires routers, to which the clients connect. They are started with
mongos --configdb <config server>
(hence this time it's another binary).
MongoDB uses a shard key to distribute the data between the servers. You can use the _id of each record, but any other simple or compound key is possible.
By default, MongoDB performs range-based sharding: the shard key space is split into contiguous ranges (chunks), and each chunk is assigned to a shard server.
Range-based sharding speeds up range queries. However, it requires the user to choose an appropriate shard key a priori. A poorly chosen shard key may result in serious imbalance between servers. Luckily, MongoDB is able to automatically modify shard key ranges to redistribute data between the servers.
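To make the idea of range-based sharding more concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's implementation: the chunk table, its boundaries and the helper names (chunkIndex, findShard, countPerChunk) are all hypothetical and only illustrate how a key is mapped to a chunk, and a chunk to a shard.

```javascript
// Hypothetical chunk table for a numeric shard key.
// Each chunk covers the half-open range [min, max) and lives on one shard.
var chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0000" }
];

// Return the index of the chunk whose range contains the key.
function chunkIndex(key) {
  for (var i = 0; i < chunks.length; i++) {
    if (key >= chunks[i].min && key < chunks[i].max) {
      return i;
    }
  }
  throw new Error("no chunk covers key " + key);
}

// Return the name of the shard that stores the key.
function findShard(key) {
  return chunks[chunkIndex(key)].shard;
}

// Count how many of the given keys fall into each chunk.
function countPerChunk(keys) {
  var counts = chunks.map(function () { return 0; });
  keys.forEach(function (key) { counts[chunkIndex(key)] += 1; });
  return counts;
}
```

For instance, findShard(150) returns "shard0001", while countPerChunk([1000, 1001, 1002, 1003]) returns [0, 0, 4]: keys that only ever increase all land in the last chunk, which is the kind of imbalance a poorly chosen shard key can cause.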
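For this example, we will run all MongoDB servers on a single machine. Therefore we have to give each instance its own port and its own data directory.
We will have 2 shard servers (shard01 on port 27018 and shard02 on port 27020), 1 configuration server (port 27019) and 1 router (port 27017):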
First of all, make sure MongoDB is installed and is correctly stopped:
sudo apt-get install mongodb
sudo service mongodb stop
Now you can start shard01 in a terminal:
mkdir -p /srv/mongodb/shard01
mongod --shardsvr --port 27018 --dbpath /srv/mongodb/shard01
In another terminal, start shard02:
mkdir -p /srv/mongodb/shard02
mongod --shardsvr --port 27020 --dbpath /srv/mongodb/shard02
Then the configuration server:
mkdir -p /srv/mongodb/config
mongod --configsvr --port 27019 --dbpath /srv/mongodb/config
And finally we can start the router:
mongos --port 27017 --configdb 127.0.0.1:27019
Alternatively, you can start them all using a single script:
#!/bin/bash
# Use $HOME: a quoted "~" is not expanded by bash
ROOT="$HOME/mongodb"
mkdir -p "$ROOT/log"
mkdir -p "$ROOT/shard01"
mongod --shardsvr --port 27018 --dbpath "$ROOT/shard01" --fork \
--logpath "$ROOT/log/shard01.log"
mkdir -p "$ROOT/shard02"
mongod --shardsvr --port 27020 --dbpath "$ROOT/shard02" --fork \
--logpath "$ROOT/log/shard02.log"
mkdir -p "$ROOT/config"
mongod --configsvr --port 27019 --dbpath "$ROOT/config" --fork \
--logpath "$ROOT/log/config.log"
mongos --port 27017 --configdb 127.0.0.1:27019 --fork \
--logpath "$ROOT/log/router.log"
So far, the router knows where the configuration server is located, but the configuration server has no information about the running shard servers. We have to configure this manually, by connecting to the router:
mongo localhost:27017
To add the shards:
sh.addShard("127.0.0.1:27018")
sh.addShard("127.0.0.1:27020")
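To check that both shards were registered, one option is to ask the router for the list of shards with the listShards admin command. The snippet below is a small sketch: shardHosts is a hypothetical helper, and the last block only runs inside the mongo shell (where db is defined), connected to the router.

```javascript
// Extract the "host" of every shard from a listShards reply.
// shardHosts is a hypothetical helper, not part of MongoDB.
function shardHosts(reply) {
  if (!reply || reply.ok !== 1) {
    throw new Error("listShards failed");
  }
  return reply.shards.map(function (s) { return s.host; });
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(shardHosts(db.adminCommand({ listShards: 1 })));
}
```

With the two shards added above, the printed list should contain 127.0.0.1:27018 and 127.0.0.1:27020.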
Then enable sharding for our database:
sh.enableSharding("test")
And finally enable sharding for the collection:
sh.shardCollection("test.testcollection", {_id:1})
We can now use the sharded cluster as if it were a single MongoDB server:
use test
db.testcollection.insert({name: "Me"})
db.testcollection.find()
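A single document is not much to experiment with. As a sketch, the following mongo shell snippet inserts a batch of test documents through the router. The helper makeDocs and the fields name and counter are arbitrary choices made for this illustration. The last block only runs inside the mongo shell, where db is defined.

```javascript
// Build `n` small test documents; the field names are arbitrary.
function makeDocs(n) {
  var docs = [];
  for (var i = 0; i < n; i++) {
    docs.push({ name: "user" + i, counter: i });
  }
  return docs;
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  var coll = db.getSiblingDB("test").testcollection;
  coll.insert(makeDocs(1000)); // insert() also accepts an array of documents
  print(coll.count() + " documents in test.testcollection");
}
```

The number of documents (1000) is only an example. Such a small amount of data will usually stay well below the chunk size, so it is not expected to trigger additional chunk splits by itself.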
We can always obtain additional information about the shards by running:
sh.status()
...
shards:
{ "_id" : "shard0000", "host" : "127.0.0.1:27018" }
{ "_id" : "shard0001", "host" : "127.0.0.1:27020" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.testcollection
shard key: { "_id" : 1 }
chunks:
shard0001 1
shard0000 1
{ "_id" : { "$minKey" : 1 } } -->>
{ "_id" : ObjectId("5d677cf65b26b77c1eba1c00") }
on : shard0001 Timestamp(2, 0)
{ "_id" : ObjectId("5d677cf65b26b77c1eba1c00") } -->>
{ "_id" : { "$maxKey" : 1 } }
on : shard0000 Timestamp(2, 1)
The last lines of this output also show how the range of shard keys is split into chunks, and on which shard server each chunk is stored.
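The same chunk distribution can also be computed from the cluster metadata. The sketch below assumes that the config.chunks collection has one document per chunk, with ns and shard fields, as in older MongoDB releases (newer releases identify the collection differently). countChunksByShard is a hypothetical helper, and the last block only runs inside the mongo shell, connected to the router.

```javascript
// Count chunks per shard from an array of config.chunks documents.
// countChunksByShard is a hypothetical helper, not part of MongoDB.
function countChunksByShard(chunkDocs) {
  var counts = {};
  chunkDocs.forEach(function (c) {
    counts[c.shard] = (counts[c.shard] || 0) + 1;
  });
  return counts;
}

// Only runs inside the mongo shell, where `db` is defined.
// Assumption: chunk documents carry `ns` and `shard` fields.
if (typeof db !== "undefined") {
  var chunkDocs = db.getSiblingDB("config").chunks
    .find({ ns: "test.testcollection" }).toArray();
  printjson(countChunksByShard(chunkDocs));
}
```

For the status shown above, the result would contain one chunk for shard0000 and one for shard0001.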
This blog post is licensed under CC BY-SA 4.0