What is SREally?

Let me start by describing who we are. We are a collection of Site Reliability Engineers, a particular class of software engineer with a deep focus on all things production.

Let me summarize the inspiration for this blog by translating a typical early SRE experience into a more broadly experienced problem. I believe most developers, of any background, have either experienced something like this, or at worst can relate to it:

Nancy: Hi! I am the new engineer!
John: Hi Nancy! It’s nice to meet you. We are glad to finally have a senior engineer on the team.
Nancy: I am excited to get started, can you point me at your git repo so I can start looking over the code base?
John: We don’t use git here.
Nancy: Oh, what do you use? Mercurial? Subversion?
John: We just email a zip file of the code base when we make updates.
Nancy: … Really?

A typical SRE will often have this exact same experience when joining a company, though in areas that might surprise typical developers. The goal of this blog is to document some of the things that can be done early to help your environment when you start to scale up.

Now, before you worry that this will end up being a bunch of recommendations that push so hard for perfection that they forget that getting stuff done is the most important metric, let’s clarify: our goal is to make sure the information is there so you can make better choices. Sometimes technical debt is necessary, but it’s always best to take on less where possible, and in ideal cases we can highlight approaches that are more efficient without taking on any technical debt at all!

So, who are we and why do we feel qualified to talk on this subject? We are a collection of SREs who have worked on products like Gmail, Google Calendar, Twitter, Apcera, and Automatic, as everything from first employees to later-stage team builders. We have experience with products that range from hundreds of thousands of services all the way down to a single machine.