Ep#77 Deploy Operational Excellence AWS WAF with Rich Boyd

July 27, 2022

About the Guest

Rich Boyd

I built my first desktop computer in 1995 and have enjoyed tinkering with them ever since. I have spent the majority of my career supporting a variety of server architectures, both on site and remotely. Day to day, I'm living the DevOps philosophy: breaking down barriers between developers and operations. I'm active in several local Meetups and enjoy presenting at them and at conferences. Culture is the cornerstone of DevOps, and I excel at enabling developers to ship code more safely and quickly.

Episode Summary

This episode is about the AWS Well-Architected Framework (AWS WAF), specifically the Operational Excellence pillar: who should be using AWS WAF, from startups to enterprises, and what the biggest challenges are when performing an AWS WAF review.

If you're interested in learning more, check out the AWS WAF website.
Are you looking to attend an AWS Summit or maybe AWS re:Invent? More information here!

Episode Show Notes & Transcript

Host: Jon

Now, anything you say now can be used on the blooper reel, but don't worry. It won't be held against you.

Guest: Rich

<laugh> Awesome. So I shouldn't talk about how much I love Oracle, right? Just...

Host: Jon

Yeah, yeah. Well, you can talk about Larry all you like. <laugh> It's...

Guest: Rich

If only I had a yacht as big as his...

Host: Jon

Yeah. Hey, to each his own, right? With that kind of success, he can do whatever he wants. All right, Rich. You ready?

Guest: Rich

Ready to go.

Host: Jon

3, 2, 1. Please join me in welcoming Rich Boyd, chief evangelist and content guru for the Operational Excellence pillar at AWS. No, wait a second, that is an awesome title. Rich, by the way, I like your title.

Guest: Rich

Thank you so much. The title's actually Operational Excellence pillar lead, but chief evangelist sounds a little cooler, don't you think?

Host: Jon

I know how that personally feels. I do. <laugh> So Rich, before we get started: this morning, all of a sudden my Chime lit up, and before I opened it I saw it was from Rich, and I thought, oh God, are we canceling again? And when I say we, I was guilty of that too. But no, in fact, we are not canceling again. We've been trying to kick this off for the last two months. Rich had to rearrange; he's doing something cool and new that we'll talk about in a second. At the same time, I had to cancel. We finally decided that we just needed to click record and see where it goes.

Guest: Rich

Totally. It's great to finally do this. We've been dancing around it, playing Chime tag for two months. I've been really looking forward to it.

Host: Jon

So Rich, before I jump into the AWS Well-Architected Framework: since you're the OE pillar lead (I almost said "AE"), we're going to talk specifically about that pillar. But before we jump into it, how about a little backstory on yourself?

Guest: Rich

Sure. So, I was born in Houston, Texas in 1970. No, no, you don't care about that. I built my first computer in the back of a RadioShack in southern Mississippi in about 1995. I love technology. I love tinkering with things; as a young child I loved Legos. Computers kind of worked really well for me, but I took three shots at getting a CS degree and it didn't pan out. So I joined the good old Army, and Uncle Sam sent me to Germany, Kosovo, and Iraq. After my time in the Army, I decided that I wanted to get into systems engineering. So I worked for a systems engineering company that did work for the DoD, spent five years living in Iraq and Afghanistan, then came back to the States and worked for them a little bit more.

And then finally, in 2013, I started going to the Austin DevOps meetup, and it opened my eyes to a new way of thinking about technology. So I made the jump to a startup and lived that startup life for about five years in Austin. Then the people from Amazon came knocking and I couldn't turn them down. So I became a technical account manager for good old Amazon Web Services, and shortly after that I became an enterprise support lead: a technical account manager who also mentors technical account managers. Then in 2020, I joined the Well-Architected team as a specialist solutions architect, helping customers improve their architectural health. Real fun job. And then in June of this year, opportunity came calling and I took the Operational Excellence pillar lead position. So that's where I'm at right now, and that's where I'm coming to you from today.

Host: Jon

Having been in the Army, I think you're a natural fit for Operational Excellence pillar lead. I can see how those two go hand in hand: we've got to do this right, it's got to be on target. In fact, that's why we postponed our first recording, because this was coming up.

Guest: Rich

Totally. If you think about it, remember that Connections show from the BBC that was on PBS when we were kids? Connections, the coolest thing ever. If you saw that as a kid, you were going to be a nerd. Perfect. So everything is connected; everything comes from something else. If you think about operations, incident management, runbooks, those kinds of things come from checklists from NASA, the Army, the Air Force, et cetera. That's what they're rooted in. Aviation is actually kind of the birthplace of the practice: creating a checklist of things to do to make sure you don't screw up. So everything's built from something else. Side note, here's a fun one for you: modern security around electronics is rooted in Army radio security from the fifties, sixties, and seventies.

So a lot of our communication security best practices come from there, and that's why they're kind of clunky sometimes. But yeah, the Army is a good example of that, and I think it's a good background to have for this, because Operational Excellence is one of the broadest of the pillars. You have to cover a lot of bases. This is the latest thing I got: the Peter Drucker management library from HBR. Peter Drucker is the father of modern management, and a lot of his concepts and ideas directly influenced the evolution of IT operations. So my sources are all over the place. I cover everything from how you structure your organization to deliver business value, to how you integrate new code changes and deliver them, to how you deal with fires and incidents. I cover a lot of bases, so I need a lot of source material to build on top of.

Host: Jon

Nice. Well, it's good to know your background and where you get a lot of your material. Let's talk about the AWS Well-Architected Framework. In 2015, Jeff Barr wrote about it. It started out with four pillars, and we're up to six pillars now. Let's talk about the framework, why it was created, and a little backstory on it.

Guest: Rich

Sure. So the Jeff Barr blog post is Obi-Wan going to meet Luke for the first time, right? That's...

Host: Jon

I love that analogy.

Guest: Rich

Right? That's episode four, but there are episodes one through three we've got to talk about first. Well-Architected comes from a checklist, believe it or not. Good callback. Around 2012, a group of SAs were challenged: they were trying to figure out whether their customers were doing the right things to be successful, and if not, how they could better help them. So they collected a group of smart things to do, good things to do, best practices if you will. They put them in a checklist in an Excel spreadsheet and started asking their customers whether they were following those best practices. Now you've got a way to have a guided discussion with the customer, find out what they're doing, and find ways to help them. That's where it started. From 2012 to 2014, it became a semi-formal practice.

It became a formal engagement, and then it became a framework, and that's where the first four pillars were established. My pillar was added in 2015; they added the fifth pillar, OE, in 2015. From there it just keeps growing and evolving. It went from being a collection of best practices, to a framework of best practices, to now a whole brand for best practices, because we have more than just the pillars and the framework. We have lenses. We have additional best practice guidance. If you haven't seen it, we just published a new whitepaper on operational readiness reviews. Ugh, it's beautiful. It's amazing. It goes super deep into how to build an operational readiness review practice at your company or organization. And here's the crazy thing about it: that whitepaper is based on a best practice that is one sentence in my pillar. That's how much depth there is to these best practices.

Host: Jon

Okay, right. You've got to send me a link to that whitepaper; I'm going to put it in the description below, plus I also want to read it. Talking about how the Well-Architected Framework came out of a bunch of SAs and a checklist: I love how, in true Amazonian style, something can be developed internally by SAs rather than pushed top-down. You're with the customers day in and day out, and you're like, I've got to follow this, here's my checklist. Oh, let me pass it to that SA. And now it becomes a framework, a methodology that everybody tries to follow to build a well-architected workload, and not only in AWS, because it works to some extent on-premises too, right?

Guest: Rich

So, speaking bluntly: we say cloud. We say it's a collection of best practices for how you build, operate, and evolve in the cloud. But many of the best practices apply to your workload no matter where it is, whether that's in another cloud or on-prem. A lot of them are not specific to cloud. Now, some are. For example, adopt a consumption model. That's hard to do if you have to pre-provision or front-load capacity in a data center: you have to guess and estimate, and then amortize over one, two, or three years, so you can't really consume on demand. But other things, for instance use version control, are a no-brainer, and you can use them wherever your workload sits. So in many ways the best practices apply universally, but some are very specific to cloud.

Host: Jon

So I have the Well-Architected Framework in front of me, right? I'm looking at this document and going, okay, what's next? Literally, what do I do with this? What's the first step in getting started? I know there's the Well-Architected Review: you sit down and go through it, and I've been through a number of those processes. But I'd like to hear it in your voice: what are the steps, not only from your perspective, but so that I can do these on my own within my company on a specific workload? And we will talk about how to define a workload in a few moments.

Guest: Rich

Totally. There are three big ideas in Well-Architected. There's the framework, which is the collection of best practices. There's the AWS Well-Architected Tool, a self-service tool available at no additional charge in the AWS console, so that you can do a review on your own. And then there's the Well-Architected Framework Review, which covers both of those things, like where the Venn diagram overlaps. A framework review can be self-service in the tool, or you can have your solutions architect or technical account manager help guide you through it. If you're just starting, just dipping your toe in, the best thing to do is fire up the AWS console, go to the Well-Architected Tool, define a workload, and run through the review, because it's going to show you, first of all, how the best practices fit together.

It shows how they fit in, because it's not just question, best practice, best practice, best practice. You start to see that there are dependency chains in the best practices. Some are super important to have. Others are aspirational or specific to where you are in your cloud journey. If you're new to cloud, focus on the important ones first. If you're cloud native or more advanced, go for the trickier, more evolved ones. Go through the review in the tool, and it's going to tell you which best practices you're following and where there's room for improvement. From there, start thinking about that; sit with it. And if you want to do more with it, take that review to your account team (your SA, your TAM, your account manager), or reach out to our whole network of partners that conduct Well-Architected Reviews and can help you improve your workload.

If you read the whitepaper and then go through the review in the tool, now you've got enough knowledge to be dangerous. After that, start thinking: okay, I've got this collection of best practices. How do I tell people in my company about it? How do I get them involved or excited about it? Think about how you can take a couple of the best practices you need to work on, get other people in your company interested, and build a little community around that inside your organization.

Host: Jon

One of the first steps you mentioned was defining your workload. Mm-hmm <affirmative>. Define workload.

Guest: Rich

<laugh>

Host: Jon

I've been through this, and let me tell you, it's a matter of interpretation, because you could include or exclude resources that touch it or are contained in it. But I'm going to let you define it.

Guest: Rich

Put me on the spot, man. Okay. All right, here's my pitch for workload, and just to prep the audience, it's going to be vague. A workload is the collection of people, processes, and technology that combine to deliver business value for a customer. So when I talk about scoping a workload, I start with the end customer. Who's the end user? What are they getting out of this thing? Is it an e-commerce platform? Is it a certain page in that e-commerce platform? Is it an API service? Whatever it is, work backward from the customer. That lets you narrow the scope, and what you should be thinking about with a workload is how to reduce the scope. If you're at a startup, you might have only one workload.

You have just one thing that you're trying to get an MVP for, or get your next A, B, or C round out of. If you're an enterprise, you probably have more than one workload, or tons of workloads. Starting from the customer, being customer obsessed and working backwards from the customer, is the shortcut to scoping that workload. But keep in mind: a workload isn't your VPC. A workload isn't your EC2. A workload isn't your Lambda or your API Gateway. A workload is those things, plus the people that interact with them and the processes those people use to interact with them. I probably should have mentioned this earlier, but the Well-Architected best practices cover people, process, and technology. They cover the whole thing. It's not just how to flip a bit or toggle a feature.

It's a lot more than that. So when you scope your workload that way, one tried-and-true thing you should definitely do, if you're serious about improving your architectural health with the Well-Architected Framework, is identify the executive sponsor. Find out who the single-threaded owner for the success of that workload is at your company, and talk to them about it. Making any kind of systemic change in any company, tech or non-tech, requires a mix of grassroots effort and executive sponsorship. If you can get both, your chances of making a lasting change are pretty high, but if it's only grassroots or only executive, your chances of success are uncertain.
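To make the "people, processes, and technology" definition above concrete, here is a minimal Python sketch of a workload scope as a data structure. The `Workload` class, its fields, and `missing_scope` are hypothetical illustrations for this page, not part of the Well-Architected Tool or any AWS API; the point is only that a scope listing technology alone is incomplete.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workload:
    """Hypothetical workload scope: people, processes, and technology."""

    name: str
    end_customer: str  # who the workload delivers business value to
    technology: list[str] = field(default_factory=list)  # e.g. VPC, EC2, Lambda
    people: list[str] = field(default_factory=list)  # teams that build and run it
    processes: list[str] = field(default_factory=list)  # how those people interact with it
    executive_sponsor: str | None = None  # single-threaded owner of its success

    def missing_scope(self) -> list[str]:
        """Return the parts of the definition that are still empty."""
        gaps = [part for part in ("technology", "people", "processes") if not getattr(self, part)]
        if self.executive_sponsor is None:
            gaps.append("executive_sponsor")
        return gaps


# A scope that lists only infrastructure is not yet a workload.
checkout = Workload(
    name="checkout",
    end_customer="online shoppers",
    technology=["VPC", "EC2", "Lambda", "API Gateway"],
)
print(checkout.missing_scope())
```

Starting from `end_customer` mirrors the advice to work backward from the customer; the remaining fields are filled in as you discover who runs the workload and how.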

Host: Jon

So I've defined my workload. And by the way, about it including people: I might have read that, but I don't think it really sank in until you told me. Keep that in mind, because we're going to talk about people in a second. So I have my workload. What does it take to do a review, and how long does it typically take? Now for a quick interruption. A huge shout-out to our friends at Veeam for sponsoring this episode. Veeam Backup for AWS can easily protect all of your Amazon EC2, RDS, and VPC data. Wait a second, they can protect my VPC data too? Yep, that's right. Simplify AWS backup and recovery while ensuring security and compliance. All right, now back to our episode. What does it take to do a review, and how long does it typically take?

Guest: Rich

That is a loaded question. If you're on your own in the console right now, which I hope you are (actually, no, open the console and then come back to the podcast), and you define a workload to do a review, it's going to take you between 30 minutes and an hour, depending on how much you want to explore. If you just want to read the questions, the best practices, and the little info panels, 30 minutes to an hour. But if you see a best practice and want to learn more, that might take you to the framework whitepaper or the pillar-specific whitepaper, and then you've got a rabbit hole. Still, 30 minutes to an hour is about the average. Now, if you want a guided Well-Architected Review from an SA, a TAM, or a partner, that's longer.

That's a discovery session, a conversation, so you should probably book between four and six hours for it. I know that sounds like a lot. Here's why it's important: the person conducting that review for you, whether your SA, your TAM, or your partner, is going to get to know you. They're going to learn about your challenges and the cool things you're doing, and through that discovery they'll get a really good idea of how to help you improve your architectural health. So the investment is really worth the reward. Going one step further: if you've contacted your AWS SA or TAM and want them to do a review, we also have offerings like a Well-Architected Immersion Day, where we sit down with your team, your engineers, and spend a whole day going deep on Well-Architected: all the pillars, all the best practices. Then you do the review, and now you know a lot about the Well-Architected Framework, so the discussion and discovery are even more detailed and in depth. So you can go self-service in 30 minutes to an hour, or you can go multi-day depth. It depends on your appetite and what you want to get out of it.

Host: Jon

Okay, so now I feel like I've done all my past reviews correctly. I worked for a partner, and we went in and had a lot of discussions. The very first pillar literally took us about two hours to go through, and everybody was like, what? Oh my God. Well, you got a lot of information, and it let you answer other pillars' questions, so the others went much quicker. And this was when there were only five pillars. I think we spent six to eight hours per workload because of the discussion. As you get into more workloads, the processes, the reliability practices, and the documentation behind the scenes are typically the same, so those go a little quicker. But the very first one, while you're getting to know your customer or partner, does take time to really dive deep.

Guest: Rich

That's the interesting part about the second, third, and fourth reviews. In the first review, specifically the first pillar, OE (not to toot my own horn here), the OE pillar takes the longest to do. It just does. It covers so much ground. Yep. But you learn so much in that first pillar that when you get to security, you already know a couple of the answers: what the customer is doing and what they need help with. The further you go through the pillars, the faster it goes. By the time you get to cost and sustainability, those discussions usually take 30 or 45 minutes. So that's one thing. Once you're done with the review and you do a second review for another workload, you start to see a lot of common themes.

Everybody's heard of Conway's law, right? Your organization, your org chart, affects your technology choices and architecture. It does; they're intrinsically tied. So a lot of the improvement areas in one workload are going to be the same for others, which is neat, because instead of fixing one workload, you can do something that fixes both. You can lift all boats. If you think strategically about it, you can maximize remediation efforts so they affect more than just that workload. The other thing I'll throw out there: I talked about people, process, and technology best practices. Well, some best practices apply to the organization, not just the workload; they float up to your larger organization. For example, if you have a centralized security team for all of your workloads, the security best practices probably need to be implemented and adopted by that centralized team and then federated back down to your workloads. So there's a lot of depth there. It looks like just a collection of best practices, and then once you start going down the rabbit hole, oh boy, it becomes a bigger thing. So dip your toe in with the tool, and then think about your second and third chess moves.

Host: Jon

I like that I can do it myself in the console. But I think I prefer somebody asking me the questions, because when I do it myself I'm like, yeah, we do that. Yeah, we do. With someone else it's more of a discussion. So let's go on the premise that I have somebody in the same room as me, giving an outside view while doing a Well-Architected Review. Who should be in that room with me?

Guest: Rich

I talked about the executive sponsor. Now, if you're going to do a guided Well-Architected Review, you can run it at your company yourself; you don't have to get your SA or TAM. If you've read the whitepaper, dug through the tool, immersed yourself in Well-Architected, and you want to run a guided review for your organization by yourself, do it. Nothing says you can't, and you should. But no matter who is guiding the Well-Architected Framework Review, a couple of things. One: before the review, try to identify who the pillar sponsors are. Think about who can give the authoritative answer to a question in ops, security, or reliability. That might be the same person.

If you're a startup, it probably is. Or it could be one person per pillar, or two or three people per pillar. Figure out which SMEs you need to get answers from, and make sure those SMEs have some kind of agency. What I mean is, it's not enough to ask a question and get an answer. Once you get the answer, now you've got something you need to improve, so the SMEs you're asking are probably going to be working with you on those improvements too. Your relationship with them starts with the review, but it continues after you've asked all those questions. So that's one. Two: be careful with sprawl. Everybody knows that if you have 20 people in the room, two people are going to talk and 18 are going to be on their laptops or their phones.

They're just going to be doing other things. So keep it small, keep it quaint. If you're asking questions for Operational Excellence, get your OE SMEs in the room, and when they're done, bring in the security group. That makes it easier for everybody. Time is precious, so don't force somebody to sit through a four- to six-hour meeting. Have them involved, at a minimum, for their pillar, and offer: hey, if you want to stick around and hear the other answers, you're more than welcome to. The other thing to keep in mind is that a Well-Architected Framework Review can feel like a postmortem in some ways. Here's why: I'm going to ask you, are you following this best practice? Or, how do you implement this best practice?

And if you say we're not, or we don't, that's a vulnerable moment. You never want to say you're not following a best practice. Nobody does; it's a human thing. So there's a certain amount of vulnerability involved in a guided framework review, and you've got to account for that. You have to create a safe space for vulnerable conversations and make sure people can be honest. What you don't want is people saying yes because they're afraid to say no, because if they say yes when it should be no, you won't be able to help them. You'd be papering over things, and you don't want that. I hate to say it, but running a Well-Architected Framework Review is almost like couples therapy. You've got to get people to be vulnerable and open, get them to believe in the best practices and in the change that's possible, and then harness that to actually make the improvement.

Host: Jon

Now, continuing on from a Well-Architected Review: I did it, I'm done. Is that it? Do I ever come back to it? Are there further reviews I should do on this workload or process? Or do I just say, yeah, I did it, check the box, and move on to another one in the future?

Guest: Rich

A Well-Architected Framework Review is like your yearly physical with your primary care practitioner. And if you're not doing a yearly physical, you need to; everybody should, that's just smart as an adult. Let me back up a little bit. When you do your very first review, there are actually three phases, and that's how we train partners and Amazonians to conduct a review: prepare, review, and improve. Prepare is where you find your stakeholders, like your executive sponsor and your pillar sponsors, and do some enablement. You say, hey, here's the whitepaper, or, hey, I'm going to summarize Well-Architected in 50 words or less for you. You prep your audience: give them links to all the resources and tell them, hey, we're going to ask some questions.

Then you do the review. When you're done, if you did it in the Well-Architected Tool, which you should, you get a list of issues: areas to improve in your workload. You can use the Well-Architected Tool to track the improvement of those issues over time. Once the review closes, the hard part starts: figuring out how to fix things. Step one in that process is to sit down with your executive sponsor and pillar sponsors and recap the best practices that are missing. Look at your list in the Well-Architected Tool and pick the top three or five; limit the scope to three to five. Then sit down with those stakeholders and say, okay, in my opinion, based on this review, here are the top three to five things that we should work on.

Collectively, let's make a plan to do that. Then start budgeting time and capacity to do it. Maybe you don't have capacity, but if your executive sponsor can pull that magic lever and stacks of dollar bills pop out, you can get somebody else to fix it for you, like a partner or ProServe. The review is just one part of a three-step process, and the improve phase can be 60 days, 90 days, 180 days, whatever. It depends on how much you take on and how complex the best practices are to implement. I tend to suggest that people time-box: select X best practices, limit yourself to Y amount of time, and dig in. And when you do that, make sure you measure your success. For example, if you're implementing a best practice, how do you measure it in terms of successful outcomes for your business and your customer?
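The "pick three to five, then time-box" advice above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Issue` fields and the risk ordering are hypothetical stand-ins for a list of high- and medium-risk issues you might copy out of a review, not the Well-Architected Tool's export format, and the risk levels assigned to the example best practices are made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical shape of one improvement item; these field names are
# illustrative assumptions, not the Well-Architected Tool's schema.
@dataclass
class Issue:
    pillar: str
    best_practice: str
    risk: str  # "HIGH" or "MEDIUM"


RISK_ORDER = {"HIGH": 0, "MEDIUM": 1}


def plan_improvements(issues, max_items=5, days=90, start=None):
    """Select at most max_items issues, high risk first, and time-box them."""
    if not 3 <= max_items <= 5:
        raise ValueError("limit the improvement scope to three to five items")
    start = start or date.today()
    # sorted() is stable, so issues with equal risk keep their original order.
    ranked = sorted(issues, key=lambda i: RISK_ORDER.get(i.risk, len(RISK_ORDER)))
    return ranked[:max_items], start + timedelta(days=days)


# Example risk levels below are invented for illustration.
issues = [
    Issue("Operational Excellence", "Evaluate external customer needs", "MEDIUM"),
    Issue("Operational Excellence", "Implement application telemetry", "HIGH"),
    Issue("Operational Excellence", "Escalation is encouraged", "MEDIUM"),
    Issue("Operational Excellence", "Diverse opinions are encouraged", "MEDIUM"),
]
selected, deadline = plan_improvements(issues, max_items=3, days=90, start=date(2022, 7, 27))
for issue in selected:
    print(issue.risk, issue.best_practice)
print("Time box ends:", deadline)
```

Capping `max_items` in code is a deliberate design choice: it makes the "limit scope" rule hard to quietly ignore when the issue list is long.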

For example, are we going to reduce mean time to detect or mean time to resolve? Are we going to increase our uptime? What are the metrics we can measure to show that we started the process here, finished it here, and here's how the graph went up or down? Then you can say: we did a thing, we were successful, and here's why. Now, as you wrap up that first workload, you've got two options. If you don't have a lot of time, resources, or people, stick to that workload and get it fixed, updated, improved. Then build a case study to tell people about it: hey, here's what we did, here's how we did it, here are the outcomes, you should try this. Or, if you've got a community of practice (my favorite new phrase), if you've built a community of practice around the idea of Well-Architected, try another workload.
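Measuring success as described above can be as simple as comparing mean time to resolve before and after the improvement work. The Python sketch below assumes you have (detected, resolved) timestamp pairs for each incident; the incident data is invented for illustration. The same function works for mean time to detect if you pass (occurred, detected) pairs instead.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_resolve(incidents):
    """Average minutes from detection to resolution over (detected, resolved) pairs."""
    return mean((resolved - detected).total_seconds() / 60 for detected, resolved in incidents)


# Hypothetical incident timestamps, before and after the improvement work.
before = [
    (datetime(2022, 5, 2, 9, 0), datetime(2022, 5, 2, 11, 0)),  # 120 minutes
    (datetime(2022, 5, 20, 14, 0), datetime(2022, 5, 20, 15, 0)),  # 60 minutes
]
after = [
    (datetime(2022, 9, 6, 10, 0), datetime(2022, 9, 6, 10, 30)),  # 30 minutes
    (datetime(2022, 9, 21, 16, 0), datetime(2022, 9, 21, 16, 50)),  # 50 minutes
]

mttr_before = mean_time_to_resolve(before)
mttr_after = mean_time_to_resolve(after)
change_pct = (mttr_after - mttr_before) / mttr_before * 100
print(f"MTTR before: {mttr_before:.0f} min, after: {mttr_after:.0f} min ({change_pct:+.1f}%)")
```

With this sample data the script prints `MTTR before: 90 min, after: 40 min (-55.6%)`, the kind of before-and-after number that can anchor a case study.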

When you're done with the review phase on that second workload, compare notes between the first and second workloads. It'll be eye-opening how many things are similar, how many of the improvement areas overlap. At that point you can start figuring out how to apply what you're doing on workload one to workload two, because in improving workload one you're learning how to do these things, what works and what doesn't. So when you apply that to workload two, it takes a little less effort and a little less time. Better yet, if the solution is automation or code, like a config pack, or configuring something in the account, that's pretty easy and straightforward to apply. So think in those terms. The people and process improvements are a little harder; the technology ones are usually the easiest to take on first.
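"Compare notes" between two workloads amounts to a set comparison of their improvement items. A minimal Python sketch, assuming you have recorded each workload's missing best practices as plain strings; the workload names and item lists are hypothetical.

```python
# Hypothetical improvement items recorded for two workloads after their reviews.
checkout_api = {
    "Implement application telemetry",
    "Escalation is encouraged",
    "Evaluate external customer needs",
}
storefront = {
    "Implement application telemetry",
    "Evaluate external customer needs",
    "Diverse opinions are encouraged",
}

# Items both workloads share are candidates for one fix that lifts all boats.
shared = sorted(checkout_api & storefront)
# Items unique to the second workload still need their own plan.
storefront_only = sorted(storefront - checkout_api)

print("Shared:", shared)
print("Storefront only:", storefront_only)
```

For more than two workloads, `set.intersection(*workloads)` gives the items every workload has in common, which is where a single organization-wide fix pays off most.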

Host: Jon

So Rich, that was actually a great segue: applying what you learn from one workload, and how similar they can be, across multiple workloads. Let's talk about your pillar, operational excellence. What is it? What is it about? Give us an example of some of the things you talk to customers and partners about.

Guest: Rich

So the operational excellence pillar is cut into four domains, right? There's organize, prepare, operate, and evolve, and that's really what my pillar covers. So in organize, we talk about how your organization, team, or company does things, how they form, and how they figure out how to get people to talk and share ideas. So for example, one of the best practices is, and I know this sounds trite, evaluate external or internal customer needs, right? Make sure you've got a process to get customer feedback, whether that's a comment on a web form or user activity, back to the people who are developing and making changes. Create that tight loop, right? That feedback loop. That's why it's important. Another one, in OPS 3 for example, is "escalation is encouraged." If there needs to be an escalation, that should be okay to do. Your staff, your team, your frontline engineers should feel okay doing that.

And it should be encouraged. You shouldn't have an organization where you try to hide fires, right? Keep the lid on them and don't let people know about them. Open and honest communication is important. One of the newer ones we've added is making sure that diverse opinions are encouraged and sought within and across teams. Diversity and inclusion is a huge deal, right? Organizations that index on diversity and inclusion are more successful organizations. They're better organizations. They're better able to iterate and ideate. Right? So those are important things. When we transition to prepare, we're talking about planning and architecting, right? One of the big ones, and it's so important, is implement application telemetry. Setting your log level to ERROR is not implementing application telemetry. Right?

Host: Jon

Darn. I'm gonna change that right now.

Guest: Rich

I know. I know. I know. Right, right, right. So implementing application telemetry means making sure that your application answers two big questions. First, what's the state of your application: healthy or unhealthy? It needs to tell me that. The second thing is, is it delivering business value? If your application can't answer A or B, you have work to do; all of your applications should be able to answer those questions at any given moment. Right. That's the philosophy of it. When we move further into prepare, we talk about CI/CD, and we talk about using runbooks and playbooks. And then operate is a very interesting one. It's sort of the heart of the pillar. It deals with the health of your workload in OPS 8 and the health of your operations in OPS 9.

So you should be applying the same kind of scientific rigor you use to know whether your workload is healthy to your operations. Right. So you should figure out, okay, how do I know if my operations activities are successful or not? How do I know how efficient they are? Because if you can measure something, you can improve it. If you can't measure it, you really can't improve it. If you launch an initiative around improving operational health without data to guide you, you may not actually address the problem. You may not truly improve things. So in OE, and in Well-Architected generally, measuring is important. If you're going to improve anything, you must be able to measure it first. That's one of the big ideas. And then we move into evolve, OPS 11, the last question. That's really about how you take everything you've learned, everything you're doing, and make it better: really having a process for continuous improvement.

You know, in my previous life before Amazon, one of the challenges of running an ops team was technical debt. The priority is always to knock out new features and knock out bugs; technical debt is always in the backlog, always there, and always forgotten, right? So if you're going to make improvements, you need to take some percentage of your capacity, say 10% per week, and devote that to making improvements. Have a process around improvement, like doing postmortems, doing kaizens. At the end of the sprint, when you do your sprint retro, find something to improve for the next sprint and do it. Something as simple as that. But that's so important; without it, it's stagnation, it's being left behind by the market. It's not delivering as much business value to your customer as you possibly can. All those things come from that. So we really want you to have a learning organization. We want you to improve everything: not just your technology, but your people and the processes in your work. So OE is a weird one, right? Security, reliability, performance, sustainability, cost: those are a little narrower in scope. OE has kind of got its arms around a lot of things, right?

Host: Jon

Rich, OE sounds like more of a sit-down, discussion-type pillar, right? Not something that can be done through automation. You have to have that discussion, whether you're doing it yourself, where you have to be open and honest, or somebody's doing it with you. And I remember you said this was the longest one to kick off, but it really sets the precedent for the rest of the pillars. I'm not saying it's more important than the rest of the pillars, but I know when I did this pillar for myself, it took a long time. So it's a discussion.

Guest: Rich

So you can detect some of it. For example, with defining workload metrics, right: if I know that you're using CloudWatch and you've got CloudWatch dashboards, that can lead me to think you're doing the best practice. So there are symptoms of the best practice, ways to kind of detect whether you're following it or not. If you've got a GitHub integration in your CodePipeline, I can know that you're using version control in some capacity, right? Or you're using CodeCommit. But other ones, like, oh, I don't know, perform knowledge management? There's no way to detect that. I mean, I could maybe scan for Markdown docs in an S3 bucket or something like that, but not really. Right? So it is much more of a guided discussion. Other pillars are more prone to automation, right? Yep. They're easier to detect and even remediate in some cases, but OE is the fuzzy one. It's the toughie.

Host: Jon

Rich, I've got two more questions before we wrap things up. Is the Well-Architected Framework for me? Let me dive a little bit into that: is it for everyone?

Guest: Rich

I firmly believe in my heart of hearts that the Well-Architected Framework is for you, no matter who you are, where you come from, or what job title you have. I'll give a good example from cost optimization. One of the key best practices is implement cloud financial management, right? Implementing cloud financial management really requires a cross-discipline effort: not just your engineers, not just your operators, but your finance folks, working together to create a practice around cloud spend and improving cloud spend. If the only person looking at the bill sits in engineering, you have a problem, right? So everybody: executives, leaders, managers, developers, operators, security, you name it. Our tent is big enough for you, and the best practices apply to you. And I'll even go one step further.

One of the phrases we use is that it doesn't require an architecture astronaut. The best practices are written in plain English, using words of five syllables or fewer, so that anybody can understand them, especially non-technologists, right? The barrier to entry is not steep from a knowledge perspective. And they apply to you. They apply to areas that may not be directly involved in tech. I can tell you that in my personal life, I help people who aren't in tech with how to form, storm, and norm teams, which is a lot of OE, right? So these best practices apply not just to your job, but in some ways outside of your job.

Host: Jon

All right, Rich, last question. What are you doing in the community? If people want to get a little more involved in the community, DevOps, DevOpsDays, how can we get involved, and what awareness do you want to bring to everybody?

Guest: Rich

So I'm an organizer for a couple of DevOpsDays-style conferences in Texas, and I'm an organizer for a meetup or two as well. DevOpsDays Austin: we just had our first conference since 2019, and we had 350 people there. It was awesome. It was great to be in real life with other people, breathing, exchanging atoms, and talking. But we're still on the road to recovery. Right. I think we've all been isolating, and that has a certain amount of trauma to it, right? So we've got to get out of our shells. What I will say is that meetups and local tech communities are not going to become vibrant and thriving without the people in those communities getting involved. What does that mean? Go to a meetup. If you know about a meetup, post about it on social. Help those volunteer organizers out by helping to promote it and helping it grow.

If you just participate, if you just show up to a meetup, you've done so much. That's really it, right? Show up for some beer and free pizza. It's awesome. If you want to do more, talk to an organizer and say, hey, I want to do more than just sit in the audience. How can I help out? That's what I did. I went to my first DevOps meetup in 2012 and started getting involved. I started drinking the Kool-Aid. I became a volunteer in 2014 and an organizer in 2015. And now, a decade later, I run my own conferences and I help run meetups. Right. So if you want to be part of a meetup or start a meetup, do that. If you want to run your own DevOpsDays, go to devopsdays.org, right? Start your own local DevOpsDays. We have tons scheduled. In 2019, before the pandemic, we had over 160 DevOpsDays events worldwide. Right. And if you've never been to DevOpsDays, you're going to love it. You should go.

Host: Jon

Nice. Nice. All right, you heard it right here. Take a look at DevOpsDays. There's plenty around, plenty happening, and it can't happen without you in the community. Rich, I've got to thank you so much for joining my podcast. This was fun.

Guest: Rich

That was a blast. We should do this more often.

Host: Jon

Dude, I'm always open and available. And by the way, this is on the recording, so I will hold you to it. <laugh>

Guest: Rich

Absolutely man. Absolutely.

Host: Jon

All right, everybody: Rich Boyd, chief evangelist and content guru for the operational excellence pillar at AWS. By the way, I definitely enjoy the title. My name's Jon Myer, I'm your host. Don't forget to hit that like, subscribe, and notify, because guess what, folks? We're outta here.