‘Mongos’ your MongoDB sharding hub

October 6th, 2011

Let’s get right into it. This is going to be a small tutorial on how to shard your data using MongoDB. So grab that cup of coffee, turn on some tunes, and let’s get cranking away. I’m going to assume a few things:

1. You know why sharding is good for managing large amounts of content.
2. You’re ok with not knowing what column(s) to shard against.
3. You have a working MongoDB+PHP setup. (See the References if not.)

What we’re shooting for
I like to know what I’m getting myself into before I start any project and I’m going to assume you do too. Once done, you’re going to have the setup shown in Figure 1.1. Of course, for this example we’ll run everything on one machine, but in a production-ready environment you should give each of the services its own server.

Figure 1.1 – MongoDB Shard Topology

The figure above contains 6 objects: the mongo config server, ‘mongos’, 3 mongoDB servers, and your application.

So what’s Mongos (in a nutshell)
Mongos is a service which sits between your application and the mongoDBs. It communicates with the configuration server to determine where the requested data lives: shard1, shard2, or shardN. It then fetches the data from the shards, merges the results if the data lives in more than one location, and returns the documents to your application.
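
To make the lookup step a bit more concrete, here is a minimal sketch of the idea in plain JavaScript. It is not how mongos is actually implemented. The chunk table, the shard names, and the function names are all made up for illustration, and it assumes data is split into ranges of a numeric key.

```javascript
// Hypothetical chunk table, similar in spirit to what the config server tracks.
// Each chunk covers key values in [min, max) and lives on exactly one shard.
const chunks = [
  { min: -Infinity, max: 250, shard: "shard1" },
  { min: 250, max: 600, shard: "shard2" },
  { min: 600, max: Infinity, shard: "shard1" },
];

// Return the sorted names of the shards whose chunks overlap [lo, hi].
function shardsForRange(lo, hi) {
  const hit = new Set();
  for (const c of chunks) {
    if (lo < c.max && hi >= c.min) hit.add(c.shard);
  }
  return [...hit].sort();
}

// An exact-match lookup is just a range of width zero.
function shardsForKey(key) {
  return shardsForRange(key, key);
}

console.log(shardsForKey(42).join(", "));         // shard1
console.log(shardsForRange(200, 300).join(", ")); // shard1, shard2
```

A lookup on a single key touches one shard, while a range that crosses a chunk boundary has to ask more than one shard and merge the answers.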

Setting up MongoDB
I’m going to forgo providing a full how-to here, since there is great documentation on how to install mongoDB on Windows, Unix, you name it:

1. http://www.mongodb.org/display/DOCS/Quickstart+OS+X (OS X)
2. http://www.mongodb.org/display/DOCS/Quickstart+Unix (Unix)
3. http://www.mongodb.org/display/DOCS/Quickstart+Windows (Windows)
4. http://www.mongodb.org/display/DOCS/Ubuntu+and+Debian+packages (Ubuntu and Debian)

What I will say, though, is this: if you run into any issues installing mongoDB on Ubuntu, try these steps.
1. Remove the mongodb-client package

> sudo apt-get remove mongodb-client
Listing 1.1 – Remove mongodb-client

2. Follow the instructions here: http://www.mongodb.org/display/DOCS/Ubuntu+and+Debian+packages

Creating the directories
Once you’re done setting up your mongoDB environment, we’re going to need 3 directories. Each of the directories below will hold the data mongoDB needs.


> sudo mkdir -p /db/data/config
> sudo mkdir -p /db/data/shard1
> sudo mkdir -p /db/data/shard2

Listing 1.2 – Create 3 directories

The first command creates the directory in which the configuration server will place the content mongos needs. The second and third commands create the data directories for the mongoDB instances that will act as shards.

Setting up the config server
The first step is to set up our lookup system, the configuration server. Mongos uses the configuration server to determine where the content lives, among other things.

> sudo mongod --dbpath "/db/data/config" --port 100381 --configsvr
Listing 1.3 – Creating and starting mongoDB config server

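The command shown in Listing 1.3 uses mongod to create and start the mongoDB configuration daemon. We assign it a port number, “100381”, and use the special flag, ‘configsvr’. This allows mongoDB to identify this instance as a configuration server. Keep in mind that TCP port numbers cannot exceed 65535, so if mongod refuses to start with the port shown, substitute a valid unused port (for example 10381) and make the matching substitution for every port in the listings that follow.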

Setting up mongos
With the configuration server ready let’s start up a mongos instance and assign it a configuration server to use.

> sudo mongos --configdb localhost:100381 --port 100382 --chunkSize 1
Listing 1.4 – Set up and start up mongos

The above command starts the mongos daemon and points it at the configuration server we created, using the “configdb” flag with the host and port of the configuration server. The “port” flag tells mongos to listen on port 100382, and “chunkSize” sets the chunk size in megabytes, which is the amount of data a chunk can hold before it is split. A value as small as 1 is only useful for testing, since it makes chunks split quickly.

Setting up the shard boxes
Now let’s set up the boxes containing our data.

> sudo mongod --port 100383 --dbpath /db/data/shard1 --shardsvr
> sudo mongod --port 100384 --dbpath /db/data/shard2 --shardsvr

Listing 1.5 – Set-up and start mongoDB shard servers

The above commands create and start 2 mongoDB shard instances. Each command starts a mongod instance, assigns it a unique port number and a db directory to use (we created these directories above), and finally passes the “shardsvr” flag.

Adding the shard servers to mongos
Log onto your mongos instance and add the shard servers created above.

> mongo localhost:100382
> use admin
> db.runCommand({addshard: "localhost:100383", allowLocal: true})
> db.runCommand({addshard: "localhost:100384", allowLocal: true})

Listing 1.6 – Add shard servers to mongos

The first command shown in Listing 1.6 connects to the mongos instance. Keep in mind that this is the instance you will connect to when making changes or fetching data for your application. The second command places you in the admin database to execute further commands. Finally, commands 3 and 4 add the shards. If you want to add more shards, simply run the command again and change the host:port to point to the new shard server.
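
If you end up with more than a couple of shards, typing each addshard command by hand gets old quickly. Here is a small sketch in plain JavaScript that builds one command document per shard address. The helper name and the two addresses are made up for illustration, and allowLocal is carried over from Listing 1.6 on the assumption that everything still runs on one machine.

```javascript
// Hypothetical helper: build one addshard command document per shard address.
function addShardCommands(hosts) {
  return hosts.map((host) => ({ addshard: host, allowLocal: true }));
}

// Made-up addresses for two extra shard servers on the same machine.
const commands = addShardCommands(["localhost:27018", "localhost:27019"]);

// Print each document; in the mongo shell you would pass each one to
// db.runCommand() while in the admin database.
commands.forEach((cmd) => console.log(JSON.stringify(cmd)));
```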

Ok, you’re done. Your environment is ready to accept data. You have set up a mongoDB shard environment using mongos, 2 mongoDB instances, and a mongo configuration service. Awesome, time to test.

Adding data for testing
Log onto your mongos instance and create a test database, “shardhappens”.

> mongo localhost:100382
> use admin
> shardhappens = db.getSisterDB("shardhappens")
> db.runCommand({enablesharding: "shardhappens"})

Listing 1.7 – Create a database for testing

With our database created we need to create a collection as well as a key to shard against.
> db.runCommand({shardcollection: "shardhappens.pebbles", key: {item_id: 1}})
Listing 1.8 – Shard by a specific key

Listing 1.8 will shard our data within the pebbles collection using the key (column), “item_id”. Now let’s add in some data to shard using PHP.
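
If you’re curious what sharding by a key means in practice, here is a deliberately simplified model in plain JavaScript. It is only a sketch: real mongoDB splits a chunk based on the amount of data in it (the chunkSize setting), while this toy version splits on document count, and the function name and the limit of 4 documents are assumptions made for illustration.

```javascript
// Simplified model: a chunk covers keys in [min, max) and splits at its median
// key once it holds more than maxDocs keys. Real MongoDB splits on data size.
function insertKey(chunks, key, maxDocs) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);
  if (chunk.keys.length <= maxDocs) return;

  const mid = chunk.keys[Math.floor(chunk.keys.length / 2)];
  // If the median equals the smallest key, a split would leave an empty chunk.
  if (mid === chunk.keys[0]) return;

  const upper = { min: mid, max: chunk.max, keys: chunk.keys.filter((k) => k >= mid) };
  chunk.keys = chunk.keys.filter((k) => k < mid);
  chunk.max = mid;
  chunks.splice(chunks.indexOf(chunk) + 1, 0, upper);
}

// Start with a single chunk covering every possible item_id.
const chunks = [{ min: -Infinity, max: Infinity, keys: [] }];
for (let itemId = 1; itemId <= 10; itemId++) insertKey(chunks, itemId, 4);

for (const c of chunks) {
  console.log(`[${c.min}, ${c.max}) holds ${c.keys.length} docs`);
}
```

The early return when the median equals the smallest key also hints at why the choice of key matters: if too many documents share the same key value, a chunk cannot be split any further.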

Add data for testing
Create a file named “insert.php” and save it where the above mongoDB environment is set up. Run the PHP script and wait for it to end. Once it’s done, you will have data spread across 2 mongoDB instances to play with.


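<?php
try
{
    // Connect through mongos (not directly to a shard) so inserts are routed by shard key.
    $mongo = new Mongo("localhost:100382");
    $db = $mongo->selectDB('shardhappens');
    $collection = $db->pebbles;

    // Insert 1000 documents, each with a random value for the shard key.
    for ($i = 0; $i < 1000; $i++)
    {
        $data = array();
        $data['item_id'] = rand(1, 1000);
        $data['text'] = "Hi mom! I'm sharding some data here...*high five*";
        $collection->insert($data);
    }
}
catch (Exception $e)
{
    // Print connection or insert errors.
    echo $e->getMessage();
}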

Listing 1.9 – PHP to insert data into MongoDB

Once the script has completed, log back into your mongos instance and check how many chunks each shard holds.

> mongo localhost:100382
> printShardingStatus()

Listing 1.10 – Checking shard status

You should see an output similar to the text below.

shardhappens.pebbles chunks:
shard0001 10
shard0002 10

Listing 1.11 – Shard status print out.

The chunk counts (10 here) might be different for you. That’s it. You just used mongoDB to shard content.

Conclusion
Even though we did not go into much detail on shard theory or take a deep dive into mongoDB, you successfully used the mongoDB tools to shard data, and used PHP to connect to and insert content into your mongoDB environment.

References
1. http://us2.php.net/manual/en/mongo.installation.php
2. http://www.mongodb.org/display/DOCS/Sharding+Introduction

CCU, Page Views per minute, Unique Users per Hour, Oh My!

June 21st, 2011

Lately I’ve been fortunate to conduct an array of load tests using very large numbers, and I believe I’ve stumbled upon something interesting within the load/performance testing community (or rather something I might not understand yet, which is also possible): there doesn’t seem to be a standardized way to load test.

Let’s take an example. If I want to load test (not stress test) an application and I’m using CCU (concurrent users: how many users are active on the site in a single second), I might have someone inform me that I need to use page views per hour or unique users per hour instead. Let’s say that we have 100 unique users per hour on the site. If I load test using this figure for a duration of an hour to meet the expected 100 unique users, this is where I encounter the dilemma and sometimes confusion.

Did these 100 unique users arrive in the first minute of the hour? The first second of the hour? Did 10 users arrive every 10 seconds for 100 seconds, never to return again? So far, from my readings, these questions are of no concern as long as you hit the desired load within the duration of an hour. I believe this is the wrong way to look at it, and the wrong way to test. If you’re given the task of load testing an application, you must check how well your infrastructure deals with the peak CCU load. You might find out that hitting your server with 100 CCU completely destroys it (if this happens at this number, you have many problems).
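
To put some numbers behind this, here is a rough sketch in plain JavaScript. It takes the same 100 unique users per hour and works out the peak number of concurrent users for three arrival patterns. The 60 second session length and the function name are assumptions made purely for illustration.

```javascript
// Given arrival times (seconds into the hour) and a fixed session length,
// return the highest number of users active at the same moment.
function peakConcurrency(arrivals, sessionSeconds) {
  const events = [];
  for (const t of arrivals) {
    events.push([t, 1]);                   // user arrives
    events.push([t + sessionSeconds, -1]); // user leaves
  }
  // Sort by time; at the same instant, process departures before arrivals.
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  let active = 0;
  let peak = 0;
  for (const [, delta] of events) {
    active += delta;
    peak = Math.max(peak, active);
  }
  return peak;
}

const sessionSeconds = 60; // assumed: every user stays for one minute

// All 100 users arrive in the first second.
const burst = Array.from({ length: 100 }, () => 0);
// 10 users arrive every 10 seconds for 100 seconds.
const waves = Array.from({ length: 100 }, (_, i) => Math.floor(i / 10) * 10);
// One user arrives every 36 seconds, spread over the whole hour.
const spread = Array.from({ length: 100 }, (_, i) => i * 36);

console.log(peakConcurrency(burst, sessionSeconds));  // 100
console.log(peakConcurrency(waves, sessionSeconds));  // 60
console.log(peakConcurrency(spread, sessionSeconds)); // 2
```

All three runs hit 100 unique users within the hour, yet the peak load the servers see ranges from 2 to 100 concurrent users.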

I’ll discuss a bit more on benchmarking, load testing, and profiling going forward, sometimes with a ZF twist and sometimes not.

Armando Padilla

A week at ZendCon 2010

November 4th, 2010

Monday through Thursday, what a ride. Yes, I did miss some of the keynote speakers, and yes, I snagged a lunch which didn’t belong to me. Sorry, sorry…ugh :-)

Anyways, ZendCon was awesome! I’m not sure why this event does not have more participants, but I have to say that I greatly enjoyed it. Thumbs up to the organizers as well as the speakers. I’ll be attending ZendCon until I drop dead. :-)

Here are some key takeaways from the conference.
1. Someone took Matthew Weier O’Phinney’s voice. He lost his voice, yet continued to participate in the talks. Pretty hardcore if you ask me.
2. If you’re not already looking at the cloud, look up and try to spot one. No really, start reading up on it.
3. The Zend Framework 2.0 is being worked on and looks very promising in terms of performance.
4. The Zend Framework wants your feedback in terms of a Migration plan and bug hunts!
5. Zend Framework now has mobile support.
6. You MUST attend Zendcon 2011.

Reasons to attend Zendcon – Reason #1 – Networking
Does this sound familiar? “Meh, I’ll just read the PowerPoints when they are posted.” Yes, that was me, and it might be you. Though this is a good alternative if you have no way to reach the con or can’t come up with the $ to pay for it (btw, ask your company if they can help you with this; most companies do), I have to say that you’re missing out on a lot. The network of people you build while at the conference is great. You talk about lessons learned while tackling a specific problem, and if the conversation attracts enough people you’ll find yourself huddled around a table chatting it up with 5 others with the same issue.

Reason #2 – Information Information information!!!!
The second reason to attend the conference is the ability to learn what other organizations are doing. I cannot stress this enough. Being blind to, or not caring about, the direction of your competitors or an industry leader in your field is tantamount to jumping out of a plane without a parachute (FAIL).

As a developer you want to know what company X is doing, how long they have tried this approach, and where it’s taken them. For all you know they failed, and your team/company is close to implementing the same solution because it looks ‘attractive’…on paper. ZendCon provides this. I loved the open forum style talks. Each speaker shared their experiences from the field and allowed the audience to also share their experience. And yes, I was typing away all these notes for my knowledge locker just in case I ever run into a similar situation.

Reason #3 – The Measuring stick doesn’t lie
Finally, the third and last reason I enjoy going to conferences, and now enjoy going to this one: I like to measure where I’m at as an engineer. If you want to be good at what you do, it’s not simply about building apps; it’s about surrounding yourself with the people you want to be like, learning from them, listening to what they are talking about, and listening to what they failed and succeeded at. For me, the individuals sitting on stage were a good reminder of where I want to go. They encompassed the “do’ers” of the community in my book, and with Matt it was more the breadth of knowledge the guy has in his head. I kinda wanted to ask him what the next steps to becoming a good developer were and what he recommended someone in my area do to grow. But I chickened out. Next year.

All in all, ZendCon is a good conference to attend as a PHP developer, and I hope to see you there next year!

P.S. I will be posting my notes here shortly (this time for real; I know I promised slides from Google I/O).

Google I/O 2010 – Day 1

May 19th, 2010

Attended my first Google I/O this year, and truth be told I wasn’t sure what to expect. My goal was to learn a bit more about architecture, performance, and HTML5. Looking back at day 1, I came away feeling good that I actually made it to the event, given that I felt like just heading into work.

These are my notes for the first day. Below are the talks I chose to attend; I also created a quick PowerPoint deck as a portable version of the notes.

My Schedule:
1. Measure in milliseconds redux: Meet Speed Tracer [ My Notes: Powerpoint Deck Download ]
2. Beyond JavaScript: programming the web with native code
3. Architecting for performance with GWT
4. Developing With HTML5
5. GWT Linkers target HTML5 Web Workers, Chrome Extensions, and more

Stay tuned for tomorrow’s deck.