Interview: Comprehensive Cloud Monitoring – DockerCon ’18

Containers allow you to quickly deploy new applications and roll out updates, but how do you know if they're performing as expected, especially in more complex environments? Rajesh Raman, Chief Architect at SignalFx, answers these questions and demonstrates how visualizing and detecting anomalies is indispensable for DevOps in this quick interview by Swapnil Bhartiya of TFiR (the Fourth Industrial Revolution).

Join our live demo to learn how companies including Yelp, Kayak, and HubSpot leverage SignalFx metrics monitoring for instant visibility into, and alerting on, their entire environment.

Swapnil: Hi! This is Swapnil, and we are here at DockerCon in San Francisco. Today we have Rajesh with us from SignalFx. Can you tell us a bit about SignalFx and what the company does?

Rajesh: Sure! SignalFx is a metrics-based monitoring system. We think of ourselves as an operational intelligence platform for people to understand their cloud infrastructure and cloud deployments. That can mean the infrastructure metrics themselves, which might be coming from public cloud infrastructure like AWS, GCP, or Azure, but also their applications, the services they have deployed on those cloud environments, and perhaps even Lambda and serverless infrastructure. We have a standard way of getting all these metrics and helping people get an up-to-date understanding of how their environments are doing.

Swapnil: You have touched upon a lot of things already. So what is the need for monitoring in this kind of setup, where you are doing everything by just clicking a button?

Rajesh: Deployment is of course one aspect of it. Docker is really good at providing a standardized way to deploy your applications, but it's quite another thing to understand whether your applications are working the way you would expect them to. At the very least, you need to know whether your infrastructure is performing the way it should. That might be as simple as looking at CPU usage and memory usage. But if your application is more complicated, if it's performing some sort of workload or using third-party open source systems, you need to actually instrument what's interesting or important to you. You cannot really understand, or improve, things that you don't measure. So the quantities that actually represent the performance or the working of your application are things you want to pay close attention to. You get those metrics and send them to a system like SignalFx, and we'll help you visualize them, perform analytics on them, do anomaly detection, and let you know whether things are working well or not. That's the value that SignalFx brings.
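
To make the idea of anomaly detection on a metric stream concrete, here is a minimal Python sketch. It is not SignalFx's method: it assumes a simple rolling z-score rule, and the class name, window size, and threshold are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags values that sit far from the mean of a recent window.

    Assumption: a plain z-score rule over the last `window` samples.
    Production systems typically use more robust techniques.
    """

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=window)  # most recent samples only
        self.threshold = threshold          # z-score cutoff
        self.min_samples = min_samples      # warm-up before flagging

    def observe(self, value):
        """Return True if `value` is anomalous relative to earlier samples."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # A perfectly flat window: any change counts as anomalous.
                anomalous = value != mu
        # The sample joins the window either way, so a spike also
        # widens the baseline used for the values that follow it.
        self.values.append(value)
        return anomalous


detector = RollingAnomalyDetector(window=10, threshold=3.0)
latencies_ms = [100, 102, 98, 101, 99, 103, 100, 450, 101]
flags = [detector.observe(v) for v in latencies_ms]
```

In this made-up series only the 450 ms sample is flagged; the first five samples are used purely as warm-up.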

Swapnil: Do you focus only on Docker and containers, or are you talking about the cloud-native world in general?

Rajesh: We are basically focused on the entire spectrum of cloud-native infrastructure. Docker is of course one of the platforms that we support, but it can be servers, serverless platforms like Lambda, or even applications that you're running. We are basically happy to take metrics from anywhere.

Swapnil: Okay. When you talk about serverless, that kind of changes the equation, because now you're talking about functions that are triggered by certain events, so you don't have the same level of access or control that you have in other spaces. How does the whole equation of monitoring change in the serverless space versus the traditional one?

Rajesh: Sure! You can decompose the problem into two parts. One is how you grab the metrics: how do you actually instrument what you need to get the information that you need? That is what primarily changes in the serverless world, because you don't have a node where you can deploy a heavyweight agent to get these metrics from. You need some APIs and some infrastructure to help you get those metrics, and we of course provide some help for people to do that. But once the metrics actually get to us, from SignalFx's perspective we are happy to get metrics and time series from anywhere, and we are agnostic about where these metrics come from, how they are measured, and what they represent. We provide a general-purpose platform for you to then perform analytics on. So we help on both sides of it, but as a platform we are agnostic about where those metrics come from, and Lambda is just yet another form factor that we support.
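
One way to picture agentless collection is a function that records its own datapoints in-process and ships them before it returns. The Python sketch below assumes exactly that; `MetricBuffer`, the payload shape, and the `send` callable are hypothetical and do not represent the SignalFx API. In the demo, `send` simply appends to a list so the example runs without a network.

```python
import json
import time


class MetricBuffer:
    """Collects datapoints in memory and hands them to a transport callable."""

    def __init__(self, send, clock=time.time):
        self._send = send      # hypothetical transport: accepts a JSON string
        self._clock = clock    # injectable so timestamps are testable
        self._points = []

    def gauge(self, name, value, **dimensions):
        """Record one measurement with free-form dimensions (tags)."""
        self._points.append({
            "metric": name,
            "value": value,
            "dimensions": dimensions,
            "timestamp_ms": int(self._clock() * 1000),
        })

    def flush(self):
        """Send everything buffered so far; return how many points were sent."""
        if not self._points:
            return 0
        self._send(json.dumps({"datapoints": self._points}))
        count = len(self._points)
        self._points = []
        return count


sent = []  # stand-in for a real metrics endpoint
buffer = MetricBuffer(send=sent.append, clock=lambda: 1.5)


def handler(event, context=None):
    """A toy function handler that measures itself, with no agent on the host."""
    started = time.perf_counter()
    try:
        return {"items": len(event.get("items", []))}
    finally:
        elapsed_ms = (time.perf_counter() - started) * 1000
        buffer.gauge("handler.duration_ms", elapsed_ms, function="demo")
        buffer.flush()  # flush before returning: the sandbox may be frozen next


result = handler({"items": [1, 2, 3]})
```

Flushing inside `finally` is a design choice for short-lived functions: there is no background process to deliver leftovers later.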

Swapnil: What are the concerns in the serverless space?

Rajesh: I think there are some infrastructure concerns with Lambda. You want to know things like cold starts, how many cold starts are happening, and this is something that Amazon does not actually give you a lot of insight into. Then, on each Lambda invocation, you want to know the performance of a single call, such as how long it takes. That's something we can give you almost out of the box, without you having to do very much to instrument your own application. In addition, you might want to change the code in your application and instrument the things that are important to you: how long is it taking, and what are you doing in the specific action? You might be doing two or three different things, looking up a cache, putting something here or there, and you may want to instrument those as well. Then, when you look at it in aggregate across all your Lambda invocations, you not only know how many cold starts there are and the average time for each invocation, but also what happened in the invocation itself: did you throw any errors, what kind of work did you do, what kind of metrics did you gather? We gather both kinds of data and let you visualize them, and again you can do anomaly detection on this. The other thing about Lambda is that people are very sensitive about latency: how quickly you can make these measurements and how quickly you can provide monitoring intelligence on them. That's something SignalFx is very good at. We are known for real-time streaming analytics, so within a couple of seconds of something happening you see it in the dashboard and can have an anomaly detected. Lambdas can last for only a few seconds sometimes, so learning about something a few minutes after it happened doesn't provide enough value. Providing real-time intelligence is one of the key strengths of SignalFx.
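
The cold-start, duration, and error measurements described above can be sketched with a small Python decorator. This is an illustration under stated assumptions, not SignalFx's wrapper: it relies on the common technique of keeping a module-level flag, which stays set for as long as the same execution environment is reused, and the names `instrumented`, `stats`, and `_state` are hypothetical.

```python
import functools
import time

# Module-level state is created once per execution environment, so the
# first invocation that sees cold=True is treated as a cold start.
_state = {"cold": True}

# In-memory tallies; a real setup would ship these to a metrics backend.
stats = {"invocations": 0, "cold_starts": 0, "errors": 0, "durations_ms": []}


def instrumented(handler):
    """Wrap a handler to count invocations, cold starts, errors, and time."""

    @functools.wraps(handler)
    def wrapper(event, context=None):
        stats["invocations"] += 1
        if _state["cold"]:
            stats["cold_starts"] += 1
            _state["cold"] = False
        started = time.perf_counter()
        try:
            return handler(event, context)
        except Exception:
            stats["errors"] += 1
            raise  # never swallow the caller's error
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000
            stats["durations_ms"].append(elapsed_ms)

    return wrapper


@instrumented
def handler(event, context=None):
    """Toy handler that fails on request, to exercise the error counter."""
    if event.get("fail"):
        raise ValueError("simulated failure")
    return "ok"


handler({})
handler({})
try:
    handler({"fail": True})
except ValueError:
    pass
```

After these three calls the tallies show three invocations, one cold start, and one error, with a duration recorded for every call including the failed one.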

Swapnil: And since you are already working with customers, I mean, you are in the next phase, where you're talking about monitoring: what kind of adoption is there for serverless already? It's a really new buzzword.

Rajesh: Yes! We are actually seeing pretty good adoption. I think it depends more on the company and where they are on their cloud-native journey. We think of people as starting from a somewhat experimental phase, where the company as a whole is using somewhat legacy IT but might have a few labs trying to experiment and see what the next-generation architecture should look like. Then there are companies in the middle phase, which we call somewhat decentralized chaos, where different parts of the company each want to find a toolchain that works for them, so each team might be doing something different. And finally you have the teams that are thinking very strategically about what kinds of architectures they need to deploy, what kind of tooling, and what kind of monitoring. To those teams we provide what we call organized enablement. So we help companies in each of these three stages of their growth and lifecycle. Different companies are thinking in slightly different ways about how they want to deploy these architectures, but we are definitely seeing companies that have gone through this journey and are seeing the value of Lambda. Lambda is of course not applicable to every single workload, but there is a broad class of workloads where Lambdas provide a lot of value, and we are seeing pretty strong adoption for those kinds of workloads.

Swapnil: So we have jumped from one topic to another because so much is already happening, but one thing that you did mention is metrics, and I felt you wanted to talk about that. Can you elaborate?

Rajesh: Sure. There are a few different pillars to the whole monitoring and observability space. There are logs, traces are now becoming a little bit hot, metrics we believe play an important role, and then there are some structured-event type solutions. Metrics are going to play an increasing role in this entire monitoring landscape, because as these platforms and their deployment models change, the one thing that's common across all of them is the concept of having to make measurements and do real-time streaming analytics on them. These measurements may be infrastructure-type measurements if you have a more traditional deployment, but as you move to serverless they are more like workload measurements of what your application is actually doing. They can be very high-level things, like how many transactions you are processing or, if you're looking up a cache, how many cache hits you are seeing, how many cache misses, how many exceptions. These are the things that, for a developer coming from a DevOps mindset, really characterize the health of your application. It's not whether CPU is high or low, because it may be normal that it's a little bit high; maybe you're utilizing your resources very well. People need to think strategically about which quantities to measure that really characterize whether their application or infrastructure is working well or not. And you get more and more into what we call the custom metrics ecosystem, where people are measuring things that bridge operational and business metrics and really tell you whether your service is doing what it's supposed to be doing.
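
As a small illustration of the workload-level metrics mentioned above (cache hits, cache misses, exceptions), the following Python sketch counts them around a toy cache. The metric names, the `InstrumentedCache` class, and the `hit_ratio` helper are hypothetical and chosen only for this example.

```python
from collections import Counter

# Named counters for workload-level measurements.
metrics = Counter()


class InstrumentedCache:
    """A dictionary-backed cache that counts hits, misses, and load errors."""

    def __init__(self):
        self._data = {}

    def get(self, key, loader):
        """Return the cached value, calling `loader(key)` on a miss."""
        if key in self._data:
            metrics["cache.hits"] += 1
            return self._data[key]
        metrics["cache.misses"] += 1
        try:
            value = loader(key)
        except Exception:
            metrics["cache.load_errors"] += 1
            raise
        self._data[key] = value
        return value


def hit_ratio():
    """Fraction of lookups served from the cache (0.0 when there were none)."""
    total = metrics["cache.hits"] + metrics["cache.misses"]
    return metrics["cache.hits"] / total if total else 0.0


cache = InstrumentedCache()
for key in ["a", "b", "a", "a"]:
    cache.get(key, loader=str.upper)
```

Here the four lookups produce two misses and two hits, so `hit_ratio()` returns 0.5. A ratio like this says more about application health than raw CPU usage does.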

Swapnil: So we talked about technology, we talked about monitoring and all those things. Let's talk about you a bit. When you're not doing all this tech stuff, what do you do in your free time?

Rajesh: Aha. I of course have two kids, and I have a dog that keeps me busy. My hobby is playing guitar, actually.

Swapnil: Oh really?

Rajesh: Yeah.

Swapnil: Okay, that's interesting. So when did you pick up the hobby of playing it?

Rajesh: Oh, I've played it for a really long time. I should be awesome, but I don't practice as much.

Swapnil: Did you learn guitar to impress somebody or just for yourself? Because usually people learn guitar for a certain reason.

Rajesh: Yeah, I have a theory that people continue to play guitar for their own reasons, but they all start playing guitar for the same reason. I started playing guitar in high school, you can imagine.

Swapnil: Okay, and you still play, right?

Rajesh: I still play, yes!

Swapnil: So how much have you progressed? Have you progressed to where, if there is a keynote next year and they need a band, you can be on the stage?

Rajesh: Well, actually, I recently switched to classical guitar. It's just a little bit different.

Swapnil: Uh-huh.

Rajesh: But I'm really enjoying it. It's very challenging, it's much more theoretical, and you really have to pick up on your technique, so it's a very different ballgame to me. I used to play a lot more rock before. That's more fun, more gig kind of music. It's different now, and I don't have enough time to dedicate to playing in a band, but I do practice by myself.

Swapnil: So, if I'm not wrong, working in technology you also travel a lot, right?

Rajesh: Sometimes, yeah.

Swapnil: Okay, so how do you keep up with your guitar practice and the tech?

Rajesh: To be honest, I don't keep up very well.

Swapnil: Ha ha! Tech or guitar?

Rajesh: Guitar!

Swapnil: I wish there was something portable which you could just carry with you all the time.

Rajesh: Yeah! I do have some travelling-guitar type things, but it's still some amount of infrastructure that you have to lug around.

Swapnil: So, anything else beyond guitar, kids, and dogs?

Rajesh: I think that's where most of my time goes: spending time with the family, working, and, yeah, just keeping afloat, I suppose.

Swapnil: Awesome! In the next video maybe we'll try to have a demo of your guitar skills.

Rajesh: Why not?

Swapnil: Thanks for talking about serverless. Hopefully we'll catch up with you again at the next event, and it'll be great.