How to Build a Storage System

Challenge to Rack Labs from the business based on real customer need:

Build us a storage system that is the following:

-Quick to Market:  We needed to develop the system with tools and technologies that were well known at Rackspace and Mosso.

-Scalable: It needs to provide the same level of performance to 100,000 users as it does to 1,000 users.  The system must scale horizontally.  As our customer base and their needs grow, so should our system.

-Dynamic: The space should grow/shrink with demand.  Users shouldn’t need to do capacity planning and over-buy/under-utilize space, or worry about running out of space.  Customers should only pay for what they use.

-Low cost: It needs to be inexpensive.  We were targeting $0.15/GB.

-Developer friendly: Make it easily accessible to developers.  It should support generic web-friendly interfaces as well as more traditional, language-specific technologies.

-Reliable performance: It should perform the same whether a customer is storing 5 GB of data or 5 TB of data.

-Redundant: Make sure that data will always be available. The system should keep multiple copies of the data and be built in a manner that is redundant at both the hardware and network layers.

-Secure: Data should be protected.  Ensure that data is secure and only available to the specific customer.  All traffic should be encrypted over SSL and data needs to be stored in an internal private network largely isolated from other Rackspace networks.

Rack Labs’ interpretation of this request:

How to build a Storage System - by Rack Labs

Business Design, the one liner:

Build us a Storage System that is easy to use, highly available, very secure, fast, extremely scalable, simple to maintain, cost-effective for the customer and oh, it has to be done as inexpensively and as quickly as possible.  A developer’s dream.

Lots of problems.  Let’s knock ‘em out.

1. Problem:  Finding good coders is hard.

Solution

Well first you find the smartest developer in the company and put him in a room and tell him you want a storage system.  Give him the room he needs to experiment, create and exercise his full capabilities.  When he starts showing some results, you add the second smartest guy in the company to the team and tell him to help.  Dim the lights, turn on the AC and supply ample amounts of Ruta Maya coffee and more smart folks will join.

2.  Problem: Gotta be cheap.  No fancy gear, heck, all you get is what we got.

Solution

Rackspace is a hosting company.  We have a ton of perfectly good older servers around that are not in use anymore. First of all you take some of those “seasoned” servers and you rebuild them from the ground up with exactly what you need.  Heck we’ve got the parts. We don’t need lots of processing power. We don’t need much RAM.  We need disk space and a lot of it.

Can we buy some big hard drives? Sure.

3. Problem: Build a super redundant system using a bunch of old servers.

Solution

Use a lot of them and make multiple copies of everyone’s data and stick them in different areas of the DC for good measure.  Build the system so that if any one server dies, just plug another one in.  The system is smart enough to start making copies when it sees a failure anywhere.
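The idea above can be sketched in a few lines of Python. This is only a toy illustration, not how CloudFS is actually built: the copy count of three, the zone and server names, and the Cluster class with its store and fail methods are all assumptions made up for the example. It spreads copies across different areas of the DC and, when a server dies, re-copies whatever that server was holding.

```python
from collections import defaultdict

REPLICAS = 3  # assumed copy count; the post only says "multiple copies"


class Cluster:
    """Toy model (hypothetical): servers grouped by DC zone, objects mapped to servers."""

    def __init__(self, servers_by_zone):
        # servers_by_zone looks like {"zone-a": ["a1", "a2"], ...}; names are made up
        self.zone_of = {s: z for z, servers in servers_by_zone.items() for s in servers}
        self.placement = defaultdict(set)  # object name -> servers holding a copy

    def _load(self, server):
        # Number of objects currently stored on this server
        return sum(1 for holders in self.placement.values() if server in holders)

    def _pick_servers(self, obj, count):
        """Pick up to `count` new servers, preferring zones with no copy yet."""
        chosen = []
        for _ in range(count):
            holders = self.placement[obj] | set(chosen)
            used_zones = {self.zone_of[s] for s in holders}
            candidates = [s for s in self.zone_of if s not in holders]
            if not candidates:
                break  # not enough live servers left
            # Unused zone first, then least-loaded server, then name for determinism
            candidates.sort(
                key=lambda s: (self.zone_of[s] in used_zones, self._load(s), s)
            )
            chosen.append(candidates[0])
        return chosen

    def store(self, obj):
        self.placement[obj].update(self._pick_servers(obj, REPLICAS))

    def fail(self, server):
        """Drop a dead server and make fresh copies of everything it held."""
        del self.zone_of[server]
        for obj, holders in self.placement.items():
            if server in holders:
                holders.discard(server)
                holders.update(self._pick_servers(obj, REPLICAS - len(holders)))


cluster = Cluster({"zone-a": ["a1", "a2"], "zone-b": ["b1", "b2"], "zone-c": ["c1", "c2"]})
cluster.store("photo.jpg")
cluster.fail("b1")
print(sorted(cluster.placement["photo.jpg"]))
```

After the failure the object is back to three copies, still in three different zones. A real system would also have to detect failures, move the actual bytes and cope with partial outages, none of which is modeled here.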

4. Problem: This system is going to run on a bunch of servers.  That means provisioning and maintaining them.  People are expensive.

Solution

We built a provisioning system that is completely plug and play.  The servers are easy to build and we do that hundreds of times a day for our other customers.  CloudFS servers can be slotted anywhere in the DC and they boot off the network.  If one fails, we just stick another one in its place.  Fast and easy.  We also built a monitoring and control console that is heavily automated, and the entire system can be run by just a small crew of folks.

5. Problem: Build a publicly accessible system that is privately secure.

Solution

This was a challenge but we figured it out.  First of all we isolated the network so no one else in our DCs has access to it.  Then we layered on all the fun encryption you find in SSL.  We are confident that no one will know what anyone else is storing unless, of course, they want them to.

6. Problem:  This system has to be easy to use.  What is easy to us is not necessarily easy to everyone else.

Solution

Mosso and Rackspace.  The hard-core developers can code directly to language-specific APIs, but Mosso and Rackspace will be adding interfaces that will make it simple enough for our parents to possibly use (still trying to keep email a mystery to them though!).  Interested in being a partner in this?  The door is open.

7. Problem:  Quick to market.  How many times have we heard this before?

Solution

This one was not easy to solve.  We think “quick” is 2 years.  They think it is 1.  We hired a few more folks to help out and we stay up late at night.  The nice thing is that we get to release something solid.  Does it have all the bells and whistles we want?  Not yet, but we are working on it.  What the Beta testers will see is only the beginning.

The Result - Today Mosso is announcing a private beta for a new storage system called CloudFS. CloudFS is ready for testing. We are looking for a select group of beta testers to help us put it through its paces. We know we’ve got some more work to do to officially release this as a product, but there’s nothing like real users to help “push” us along. If you’re interested, head to our CloudFS page on Mosso.

Want to help out?  Know Python or Java?  We need you. Email us: 0XC0FFEE@racklabs.com

(Posted on behalf of the Rack Labs team)


Tech Talk 3: Practical Web Semantics

Posted by Bill Boebel, Mailtrust

Join Mailtrust again on Monday April 28th for a presentation by Manu Sporny of Digital Bazaar covering the Semantic Web.

When: Monday, April 28, 2008, 6:00 PM Eastern
Where: Mailtrust, 775 University City Blvd, Blacksburg VA
RSVP: techtalk@mailtrust.com (free pizza!!)

From the Virginia Tech campus, take the Tom’s Creek B bus to the first University City Blvd stop. The bus runs every 10 minutes.

Summary of the talk:

A practical introduction to The Semantic Web and the technologies that enable web developers and bloggers to embed meaning, such as marking up people, places, events, music and locations, into websites. Areas covered will include Resource Description Framework (RDF) basics including CURIEs and N3 notation, implementation approaches such as Microformats and RDFa, and authoring tools such as Operator and Fuzzbot. The talk will be given by Manu Sporny, who is an Invited Expert to the World Wide Web Consortium (W3C), one of the primary RDFa Task Force members working on the RDFa specification and the primary author of the hAudio Microformat specification.

If you can’t make it, don’t worry… stay tuned to this blog for a video of the talk shortly after the event.


Clouds Everywhere, Including Rackspace

Big news from Google this week. As expected, they launched an application hosting offer called AppEngine.

I am not going to dwell on the details of what it can and cannot do since that has been covered ad nauseam on the web. Either way, this move was expected, and is a great proof point of the hosting revolution ahead of us. There are millions of servers in the world providing all sorts of compute functionality. Over the next decade, we think that this computing will be done in a different model – the hosting model. In the past there were two main ways computing was done: in house (or, do-it-yourself) and outsourced (e.g. IBM comes in and does it for you). Hosting is a totally new way to do computing. It started with web-specific technologies but is invading all sorts of things in IT now. Hosting, when done right, offers more power and more reliability at a lower cost than either doing it yourself or outsourcing.

Now, what about all these clouds? I will be doing a series of posts on them. But here are a few key thoughts:

First point, EC2/S3 and AppEngine are flavors of hosting, but not the same thing. Amazon is what I call a “components cloud.” You can buy raw compute and storage. What you do with it is up to you. AppEngine is what I call a “whole stack” cloud. A series of tools is made available to you, and if you can build what you want with them, then you are in great shape: everything is taken care of. AppEngine is very prescriptive. That is the downside. If you fit, however, it is a massive gain in power and capability. It’s the classic trade. Want flexibility? Then you need to deal with complexity. Willing to compromise? Your life gets easy.

Second point, I think both types of clouds are here to stay and are important. Will some people move from one cloud to another? Sure. But, I believe most people will use different tools for different things.

Third point, Rackspace is committed to the cloud. Our strategy should become more apparent to the world in the coming months, but our Mosso offer has been in the wild for some time. We are advancing it fast and already 45,000 applications are running live on it. The Mosso offering is a “whole stack” cloud, but with a major difference from Google - we support commonly used stacks. LAMP, .Net and Ruby today. I think openness to standards and lack of so-called “cloud lock in” (i.e. you can’t move an app off AppEngine - they only support THEIR stack) will be key variables in the cloud race.

Fourth point, the hosting category also includes Managed Hosting, Dedicated Hosting, Email Hosting, Shared Hosting, as well as the relatively new “Cloud Hosting”. I don’t think the current flavors of hosting will die. The mass migration of computing to this new model should drive demand for all sorts of hosting. The idea that Google will host the Internet just seems unrealistic and counter to the history of business. I am biased in that view, but I believe that businesses and consumers alike will use many different flavors of hosting. Always.

This is an exciting time in our space. The hosting revolution is just getting going. The revolution centers on the idea that IT needs to do more, do it faster and do it cheaper. Cloud based hosting services should help make it happen. We look forward to being a part of it.


Tech Talk videos online

Posted by Bill Boebel, CTO, Mailtrust

In February and March we hosted our first two Tech Talks at our Mailtrust office in Blacksburg. We opened them up to our employees, local developers at other companies, as well as students from Virginia Tech, with a goal of bringing people together to share information about interesting technologies. The turnout was awesome and people asked great questions.

We recorded the events in case you couldn’t attend…

Tech Talk 1: MapReduce vs MySQL (speaker: Stu Hood)
video on Y!DN

Tech Talk 2: Next Generation Data Storage with CouchDB (speaker: Jan Lehnardt)
video part 1
video part 2

Thanks Stu and Jan for the great presentations! Jan will be giving his CouchDB presentation again tomorrow at RubyConf in Salt Lake City, and Stu will be giving his MapReduce presentation again Tuesday at Java SIG in Palo Alto. If you’re in the area, I highly recommend hitting these talks.

And for those in the Blacksburg VA area, Manu Sporny from Digital Bazaar will be at our office on April 28th at 6pm for Tech Talk 3. He will be giving an introduction to the Semantic Web and talking about his work with the W3C and RDFa Task Force.


Google’s Dangerous Trade


Google Sites is live, continuing the inevitable creep of Google into the hosting space. Truthfully, I don’t believe the current offers overlap with ours, but it is safe to assume they will keep advancing. We realize this and have to deal with it just like we have to deal with other behemoths like AT&T and IBM. But, for the leading simple site hosters (GoDaddy, Web.com, etc.) this move has to really hurt. A free offer from the best-known Internet brand can’t be ignored.

But, I want to make the case that Google is making a mistake. I think this move will end up being a negative for their long term results. Here is the argument:

1. They already get most of the profit from the hosting space. Hosting leads are auctioned through Google’s AdWords and search keywords. Companies bid up to the point that customer acquisition becomes just slightly profitable. Meaning, the ad eats into profits big time. And, trust me, much of the hosting acquisition comes from web sites and search where Google dominates placement. Let’s look at the math.

“Web hosting” as a keyword on Google gets searched 2 million times a month.

The top spot right now goes for about $15 per click.

Let’s say the average hoster converts 10% of those clicks to customers (a pretty aggressive assumption).

That means for every $10/month hosting account converted, Google is getting $150 ($15 per click x 10 clicks needed for a sale).

Without any marginal cost (ok, serving that search page probably costs .00001 cents or something tiny) that is almost ALL profit.

Now, let’s compare that to the hoster: let’s assume a 30% profit margin for a 30 month life (which is high in both profit and customer life). That is only $90 of profits ($10/month x 0.30 margin x 30 months) over the life of the customer.
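For anyone who wants to play with the assumptions, here is the same back-of-the-envelope math as a small Python sketch. The numbers are the ones quoted above; the function names are just made up for this example, and nothing here is a real pricing model.

```python
def google_revenue_per_sale(cost_per_click, conversion_rate):
    """Ad spend needed to win one customer: clicks per sale times price per click."""
    clicks_per_sale = 1 / conversion_rate
    return cost_per_click * clicks_per_sale


def hoster_lifetime_profit(monthly_price, profit_margin, life_months):
    """Profit the hoster keeps over the whole life of one customer."""
    return monthly_price * profit_margin * life_months


# Figures from the post: $15 per click, 10% conversion,
# $10/month account, 30% margin, 30 month customer life
google = google_revenue_per_sale(cost_per_click=15.00, conversion_rate=0.10)
hoster = hoster_lifetime_profit(monthly_price=10.00, profit_margin=0.30, life_months=30)

print(f"Google per converted account: ${google:.2f}")   # $150.00
print(f"Hoster over customer life:    ${hoster:.2f}")   # $90.00
```

Drop the conversion rate to a less generous 5% and Google's take per converted account doubles to $300, while the hoster's side of the math does not move.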

Who would you rather be? Google has the ultimate model. They get paid for leads in an auction process and can deliver those leads at super low cost. Why are they making this trade?


2. It undermines their search franchise. The more Google starts to become a funnel to their own services, the less trusted they will be as a search engine. I honestly believe that the tech world is starting to turn against Google. Why wouldn’t they? They all feel threatened by them (all the while paying them big bucks for advertising). Small groups of defectors like this create new players. Hey, that’s how Google started! The techies fell in love and it spread. Yahoo, AltaVista, Excite, all the big search 1.0 leaders were overwhelmed by the viral love of Google from the tech world. I don’t see anyone new yet, but expect someone to emerge. Personally, I am loving Mahalo (which still relies on Google for ads).

3. It is hard, messy and people intensive. Hosting is a very different business than software development and search. It will require a new set of skills, customer service operations and a lot more accountability to the business user. It just seems to be a distraction from the core mission of Google. I realize Google needs the next big growth engine, but just because servers are involved does not mean it is closely tied to their core.

4. Privacy concerns expand. As Google starts to house more of your business data, and does it for free, there are going to be increasing questions about why. Whether true or not, perceptions are going to grow that the data is being used to drive other businesses including ads. The terms of service of Apps for Your Domain are clear that the data can be used for Google’s business goals. These growing concerns will undermine the overall power of the Google brand.

That’s the argument. Does it hold water? Would love to hear your thoughts. This could be just the rambling of a threatened business guy, but we honestly don’t believe we compete with Google today. I am interested in this as a strategist and I really think this is a dangerous move for Google. I am positive it will create issues they do not expect.


Last 1.5 Days at TED

I am a week late, but wanted to give some highlights from the last part of TED. Ended up being a great conference. I will surely be back if I can ever get my act together and register in time.

High points from the last half of conference:

Obvious choice: Benjamin Zander. The conductor of the Boston Philharmonic stole the show. He at least doubled the 18 minute time restriction and got the whole crowd singing in German and eager to learn more about classical music. But, more than anything, his energy and enthusiasm were simply inspiring. Seeing him conduct in Boston has been added to my list of things I have to do.


Unpopular choice: Nassim Nicholas Taleb. The author of The Black Swan and Fooled by Randomness outlined the key aspects of his latest book, which focuses on the inevitable impact of unpredictable, earth-shattering events. What I found so interesting is how dissonant his talk was, given that he was speaking to a group eager to learn and use knowledge for change. He basically asked the group to focus on how little we know and how little we can impact complex systems. Not sure I agree with him 100%, but I think he made the TED crowd more uncomfortable than anyone, an accomplishment in its own right.

Personal favorite: Chris Abani. The novelist, whom I have never read (but will), spoke about his personal and heartbreaking experiences of growing up in Nigeria. He was truly captivating and exuded such peace for a guy who has seen such hell.



A few not-so-great moments and suggestions for the TED crew:

Session themes. I am not sure I understood how the speeches held together. I think that is okay, but in some cases it seemed to take the speakers off their game. They tried to fit what they knew into the topic. Those that just ignored the theme did better. TED seems to be about exploring our dynamic world and the forces driving it. Not sure the session themes added much to that basic idea.

Academia rules. I love to hear those that study and research all day, but I would have loved to hear more from real world practitioners. I think this is what made Ben Zander so great. He shared his experiences, not his thoughts. TED should work to balance the academy with the achievers.

Thomas Dolby band. This might be TED heresy, but I found the Dolby band interludes slightly obnoxious. Quirky and unusual yes, but a song or two would have been enough. Not sure they should be the official band of TED.

Despite a few suggestions, overall kudos to Chris Anderson and the whole TED crew. They have created something really special that will continue to make news for years to come. I counted myself lucky to be there.

