‘Mongos’: your MongoDB sharding hub

Let’s get right into it. This is going to be a small tutorial on how to shard your data using MongoDB. So grab that cup of coffee, turn on some tunes, and let’s get cranking away. I’m going to assume a few things:

1. You know why sharding is good for managing large amounts of content.
2. You’re OK with not yet knowing which column(s) to shard against.
3. You have a working MongoDB + PHP setup (see reference 1 if not).

What we’re shooting for
I like to know what I’m getting myself into before I start any project, and I’m going to assume you do too. When you’re done, you’ll have the setup shown in Figure 1.1. For this example we’ll run everything on one machine, but in a production environment you should give each of the services its own server.

Figure 1.1 – MongoDB Shard Topology

The figure above contains six objects: the MongoDB config server, mongos, three MongoDB servers, and your application.

So what’s mongos? (in a nutshell)
Mongos is a service that sits between your application and the MongoDB servers. It uses the metadata on the configuration server to determine where the requested data lives (shard1, shard2, or shardN). It then fetches the data from whichever shards hold it and merges the results if more than one shard is involved. Finally, it returns the results to your application as BSON documents, the binary JSON-like format MongoDB uses on the wire.
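To make the routing idea concrete, here is a toy model in plain JavaScript, the language the mongo shell speaks. It is an illustration, not how mongos is actually implemented. The in-memory chunk table and the `shards` map are hypothetical stand-ins for the metadata on the config server and the data on each mongod. Each chunk owns a half-open range of `item_id` values. A query for one value therefore goes to a single shard, while a range query may fan out to several shards, and their partial results are merged.

```javascript
// Toy model of mongos routing. The chunk table and shard contents below are
// hypothetical stand-ins for what the config server and shards really hold.

// Each chunk owns the half-open range [min, max) of item_id values.
const chunks = [
  { min: -Infinity, max: 500, shard: "shard0000" },
  { min: 500, max: Infinity, shard: "shard0001" },
];

// Documents currently stored on each shard.
const shards = {
  shard0000: [{ item_id: 42 }, { item_id: 499 }],
  shard0001: [{ item_id: 500 }, { item_id: 900 }],
};

// Names of the shards whose chunks overlap the inclusive query range [lo, hi].
function shardsForRange(lo, hi) {
  const names = chunks
    .filter((c) => c.min <= hi && lo < c.max)
    .map((c) => c.shard);
  return [...new Set(names)];
}

// Ask only the relevant shards, then merge their partial results.
function find(lo, hi) {
  const targets = shardsForRange(lo, hi);
  const results = targets.flatMap((name) =>
    shards[name].filter((doc) => doc.item_id >= lo && doc.item_id <= hi)
  );
  return { targets, results };
}

console.log(find(42, 42).targets);   // [ 'shard0000' ]
console.log(find(400, 600).targets); // [ 'shard0000', 'shard0001' ]
```

A query on a field other than the shard key has no chunk range to consult, so it would have to be sent to every shard.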

Setting up MongoDB
I’m going to skip a full installation walkthrough here, since there is great documentation on installing MongoDB on OS X, Unix, Windows, you name it:

1. http://www.mongodb.org/display/DOCS/Quickstart+OS+X (OS X)
2. http://www.mongodb.org/display/DOCS/Quickstart+Unix (Unix)
3. http://www.mongodb.org/display/DOCS/Quickstart+Windows (Windows)


What I will say, though, is this: if you run into any issues installing MongoDB on Ubuntu, try these steps.
1. Remove the mongodb-client package.

> sudo apt-get remove mongodb-client
Listing 1.1 – Remove mongodb-client

2. Follow the instructions here: http://www.mongodb.org/display/DOCS/Ubuntu+and+Debian+packages

Creating the directories
Once you’re done setting up your MongoDB environment, we’re going to need three directories to hold the data MongoDB stores.

> sudo mkdir -p /db/data/config
> sudo mkdir -p /db/data/shard1
> sudo mkdir -p /db/data/shard2

Listing 1.2 – Create 3 directories

The first command creates the directory where the configuration server stores the metadata mongos relies on. The second and third commands create the data directories for the two MongoDB instances used as shards.

Setting up the config server
The first step is to set up our lookup system, the configuration server. Among other things, mongos uses the configuration server to determine where the content lives.

> sudo mongod --dbpath "/db/data/config" --port 10381 --configsvr
Listing 1.3 – Creating and starting MongoDB config server

The command shown in Listing 1.3 uses mongod to create and start the MongoDB configuration daemon. We assign it a port number, “10381” (any free port up to 65535 will do), and use the special flag “--configsvr”, which tells MongoDB that this instance is a configuration server.

Setting up mongos
With the configuration server ready let’s start up a mongos instance and assign it a configuration server to use.

> sudo mongos --configdb localhost:10381 --port 10382 --chunkSize 1
Listing 1.4 – Set up and start mongos

The above command starts the mongos daemon and points it at the configuration server we created. The “--configdb” flag takes the config server’s host and port. The “--port” flag tells mongos to listen on port 10382, and “--chunkSize” sets the maximum chunk size in megabytes. A small value like 1 MB means chunks split after very little data, which is handy for testing.
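What “chunkSize” controls is easier to see with a toy model. The value is in megabytes, and a chunk that grows past it gets split. The sketch below is plain JavaScript and purely illustrative. It assumes roughly 100-byte documents, which is an estimate for the test documents used later in this tutorial and not a measured value. It also splits an oversized chunk at its median shard-key value, which simplifies MongoDB's real split logic.

```javascript
// Toy model of what --chunkSize controls: a chunk that grows past the
// limit is split in two at its median shard-key value. DOC_BYTES is an
// assumed average document size, not something MongoDB reports.
const CHUNK_SIZE_MB = 1; // the value passed to mongos in Listing 1.4
const MAX_CHUNK_BYTES = CHUNK_SIZE_MB * 1024 * 1024;
const DOC_BYTES = 100;

function countChunks(keys) {
  // Each chunk owns the half-open key range [min, max) and the keys in it.
  const chunks = [{ min: -Infinity, max: Infinity, keys: [] }];
  for (const key of keys) {
    const i = chunks.findIndex((c) => c.min <= key && key < c.max);
    const chunk = chunks[i];
    chunk.keys.push(key);
    if (chunk.keys.length * DOC_BYTES <= MAX_CHUNK_BYTES) continue;

    // Too big: split at the median key value.
    const sorted = [...chunk.keys].sort((a, b) => a - b);
    const mid = sorted[Math.floor(sorted.length / 2)];
    const left = sorted.filter((k) => k < mid);
    const right = sorted.filter((k) => k >= mid);
    if (left.length === 0) continue; // too many equal keys: cannot split here
    chunks.splice(i, 1,
      { min: chunk.min, max: mid, keys: left },
      { min: mid, max: chunk.max, keys: right });
  }
  return chunks.length;
}

// Deterministic stand-in for random item_ids in the range 1..1000.
const itemIds = (n) => Array.from({ length: n }, (_, i) => (i % 1000) + 1);

console.log(countChunks(itemIds(1000)));   // 1
console.log(countChunks(itemIds(100000))); // ten or more
```

With 1,000 such documents the data stays far below 1 MB, so everything fits in one chunk. Around 100,000 documents (roughly 10 MB) need at least ten.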

Setting up the shard boxes
Now let’s set up the boxes containing our data.

> sudo mongod --port 10383 --dbpath /db/data/shard1 --shardsvr
> sudo mongod --port 10384 --dbpath /db/data/shard2 --shardsvr

Listing 1.5 – Set up and start MongoDB shard servers

The above commands create and start two MongoDB shard instances. Each command starts a mongod instance with its own port number and its own data directory (the ones we created above), and gives it the “--shardsvr” flag. Like the config server and mongos, each of these runs in the foreground, so start each command in its own terminal.
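The same hosts and ports show up in every startup command, and again when the shards are added. Generating the commands from one table can therefore keep them consistent. The sketch below is hypothetical: `topology`, `checkPort`, and `startupCommands` are illustrative names, and the ports are the example values 10381 to 10384. It also enforces the fact that TCP ports are 16-bit numbers, so nothing above 65535 can be used.

```javascript
// Hypothetical helper: build every startup command from one table so the
// ports used by the config server, mongos, and the shards stay consistent.
const HOST = "localhost";
const topology = {
  config: { port: 10381, dbpath: "/db/data/config" },
  mongos: { port: 10382, chunkSizeMB: 1 },
  shards: [
    { port: 10383, dbpath: "/db/data/shard1" },
    { port: 10384, dbpath: "/db/data/shard2" },
  ],
};

// TCP ports are 16-bit, so anything outside 1..65535 cannot be bound.
function checkPort(port) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid TCP port: ${port}`);
  }
  return port;
}

function startupCommands(t) {
  const cmds = [
    `sudo mongod --dbpath "${t.config.dbpath}" --port ${checkPort(t.config.port)} --configsvr`,
    `sudo mongos --configdb ${HOST}:${t.config.port} --port ${checkPort(t.mongos.port)} --chunkSize ${t.mongos.chunkSizeMB}`,
  ];
  for (const s of t.shards) {
    cmds.push(`sudo mongod --port ${checkPort(s.port)} --dbpath ${s.dbpath} --shardsvr`);
  }
  return cmds;
}

startupCommands(topology).forEach((c) => console.log(c));
```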

Adding the shard servers to mongos
Log onto your mongos instance and add the shard servers created above.

> mongo localhost:10382
> use admin
> db.runCommand({addshard: "localhost:10383", allowLocal: true})
> db.runCommand({addshard: "localhost:10384", allowLocal: true})

Listing 1.6 – Add shard servers to mongos

The first command in Listing 1.6 connects to the mongos instance. Keep in mind that this is the instance you will connect to when making changes or fetching data for your application. The second command switches you to the admin database so you can run administrative commands. Finally, commands 3 and 4 add the two shards. To add more shards, run the same command again with the host:port of the new shard server.
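If you have more than a couple of shards, a small loop saves retyping the addshard command. This is a sketch with a hypothetical function name, `addShards`. It takes whatever function sends a command to the admin database, so the same code can be pasted into the mongo shell or exercised elsewhere with a stub. It is written in plain ES5 so older mongo shells can run it.

```javascript
// Hypothetical helper: run the addshard command from Listing 1.6 once per
// host:port. runAdminCommand sends one command document to mongos's admin
// database and returns the server's reply.
function addShards(runAdminCommand, hosts) {
  var results = [];
  for (var i = 0; i < hosts.length; i++) {
    results.push(runAdminCommand({ addshard: hosts[i], allowLocal: true }));
  }
  return results; // one reply per shard, in the order given
}
```

In a mongo shell connected to mongos, paste the function, then pass it the admin database’s runCommand and your list of shard addresses:
> addShards(function (cmd) { return db.getSisterDB("admin").runCommand(cmd); }, ["localhost:10383", "localhost:10384"])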

OK, you’re done. Your environment is ready to accept data. You have set up a MongoDB sharded environment using mongos, two MongoDB shard instances, and a config server. Awesome, time to test.

Creating a test database
Log onto your mongos instance and create a test database, “shardhappens”.

> mongo localhost:10382
> use admin
> shardhappens = db.getSisterDB("shardhappens")
> db.runCommand({enablesharding: "shardhappens"})

Listing 1.7 – Create a database for testing

With our database created, we need to create a collection and choose a key to shard it on.
> db.runCommand({shardcollection: "shardhappens.pebbles", key: {item_id: 1}})
Listing 1.8 – Shard by a specific key

Listing 1.8 shards the data in the pebbles collection on the key (column) “item_id”. Now let’s add some data using PHP.

Add data for testing
Create a file named “insert.php” and save it where the above MongoDB environment is set up. Run the script (php insert.php) and wait for it to finish. Once it’s done, you will have data spread across the two MongoDB shard instances to play with.

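<?php
// Connect to mongos (port 10382), never to a shard directly.
// Uses the legacy PECL "mongo" driver's Mongo class (see reference 1).
try {
    $mongo = new Mongo("mongodb://localhost:10382");
    $collection = $mongo->shardhappens->pebbles;
    // 100,000 small documents (roughly 10 MB) are enough to fill and split 1 MB chunks.
    for ($i = 0; $i < 100000; $i++) {
        $data = array();
        $data['item_id'] = rand(1, 1000);
        $data['text'] = "Hi mom! I'm sharding some data here...*high five*";
        $collection->insert($data);
    }
} catch (Exception $e) {
    echo $e->getMessage();
}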
Listing 1.9 – PHP to insert data into MongoDB
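If you only want test data and would rather not go through PHP, the same loop can run in the mongo shell. This is a sketch outside the original setup. The function name `insertPebbles` is hypothetical, and it accepts any object with an `insert(doc)` method, which a mongo shell collection has. It is written in plain ES5 so older mongo shells can run it.

```javascript
// Hypothetical helper: insert `count` test documents shaped like the ones
// in Listing 1.9 into any object that has an insert(doc) method.
function insertPebbles(collection, count) {
  for (var i = 0; i < count; i++) {
    collection.insert({
      item_id: Math.floor(Math.random() * 1000) + 1, // 1..1000, like PHP rand(1, 1000)
      text: "Hi mom! I'm sharding some data here...*high five*"
    });
  }
}
```

Paste the function into a mongo shell connected to mongos, then call it. A count such as 100000 produces enough data to split 1 MB chunks:
> insertPebbles(db.getSisterDB("shardhappens").pebbles, 100000)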

Once the script has completed, log back into your mongos instance and check how many chunks each shard holds.

> mongo localhost:10382
> db.printShardingStatus()

Listing 1.10 – Checking shard status

You should see an output similar to the text below.

shardhappens.pebbles chunks:
shard0000 10
shard0001 10

Listing 1.11 – Shard status print out.

Your chunk counts will likely differ from the 10s shown here. That’s it: you just used MongoDB to shard content.
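The same per-shard chunk counts can also be pulled out programmatically. This sketch assumes the config database layout used by the MongoDB versions this tutorial targets. In that layout, each document in config.chunks records its collection in `ns` and its owning shard in `shard`. Newer releases may store chunk metadata differently. The function name `countChunksByShard` is hypothetical.

```javascript
// Hypothetical helper: tally chunks per shard for one namespace, given an
// array of config.chunks-style documents ({ ns, shard, ... }).
function countChunksByShard(chunks, ns) {
  var counts = {};
  for (var i = 0; i < chunks.length; i++) {
    if (chunks[i].ns !== ns) continue; // skip chunks of other collections
    var shard = chunks[i].shard;
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}
```

In a mongo shell connected to mongos:
> countChunksByShard(db.getSisterDB("config").chunks.find().toArray(), "shardhappens.pebbles")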

Even though we did not go into much detail on sharding theory or take a deep dive into MongoDB, you used the core pieces of a MongoDB sharded setup (a config server, mongos, and shard servers) to shard data. You also used PHP to connect to your MongoDB environment and insert content.

References
1. PHP Mongo driver installation: http://us2.php.net/manual/en/mongo.installation.php
2. MongoDB Sharding Introduction: http://www.mongodb.org/display/DOCS/Sharding+Introduction