DevOpsDays Chicago 2019 – Jay Gordon – Ignite: What I Learned From A Dress, an On-Call Nightmare

>>Jay Gordon: Hey, how are you doing, Chicago? So, what I learned from a dress: it is an on-call nightmare, and it is something
we’re going to talk about. And I hope you enjoy the story. So first and foremost, I will wait for a minute,
there it is. That’s me, Jay. I’m an ops professional, I
work at Microsoft, and this is my one-year anniversary. And one of the coolest things that I have
learned is storytelling. It is what makes working in this DevOps field
interesting. And so I decided to start
a podcast. And I called it On-Call Nightmares, because
I wanted to collect all your stories. I wanted to hear what you have been through. And then I decided, I’m coming to DevOpsDays
Chicago, I’m telling my story. My podcast has three rules: do not incriminate
yourself, do not incriminate others, and help us learn. I think of us as being blameless and having
retrospectives. That’s what the story is about. This is my story: I worked at BuzzFeed, part
of the ops team. It was a regular day: I was building Mongo racks in the data center, on
call for alerts, and then suddenly, llamas! That’s right. So how does a bunch of llamas impact my day? Well, simply stated, there was a pair of
llamas that went on the run. You know, a real Jay-Z and Beyoncé thing,
and the internet, of course, went apeshit. That’s what the internet does: when people see
ridiculous things that interest them, they create viral moments, and alerts explode,
right? So a lot of times, you are on the floor of
the data center, and you are figuring out why the alerts are exploding. We figured
out that we had so much technical debt. So much time had been spent, and we had built a system
that couldn’t handle the amount of traffic that we had. And so what do you do about that? You stop and think: this is our technical debt,
we will make remediations. So I went home. And you do the things that we all do when we get
home, and at 8:30 PM, the pager goes off. My wife asked, “Is this because of the dress?” [Deep sigh]. I asked, “What dress?” And this is the dress. I kept the alerts from my boss, and you can see
that things were really pissed off. And for those of you that don’t know what
the dress phenomenon was, it created 670,000 active connections per second on a LAMP stack, and
that P stood for Perl. So sorry. And it all started with the most interesting
thing: Kay, a wonderful woman who had worked for Tumblr and was now at BuzzFeed, said in an
email, “Can you settle the argument for us?” What happened? Alerts exploded, because Kay
put out a viral post, because holy shit, there’s a dress and nobody knows what color it is. The internet does what it does, and I get more
alerts and pager notices: yeah, you’re screwed. [ Laughter ]. So what do we do? We start recognizing that we reached a really,
really bad point, and this is supposed to be...
This is what came up when the Perl app would go to hell. And Ben Smith, the editor of BuzzFeed, said
we can’t do anything. We tied everything together and assumed capacity. That’s the whole DevOps part of this. We assumed a capacity (and I’m not
going to name names, we keep this blameless) because people didn’t want to use
auto-scaling systems they didn’t trust. So we assumed the clothes rack we had was
enough for the dress, and obviously it wasn’t. We had huge demand from people wanting to wear
that dress. The thing I learned, and this is how I will wrap
up, is you need to be ready to restock and reassess when the season’s hot style is unexpectedly
in demand. Otherwise, it could create a nightmare. So thanks, love
you all. [ Applause ].
