theCUBE Interview with SignalFx at PagerDuty Summit 2018

Our CEO Karthik Rau and CTO Arijit Mukherji join Jeff Frick from theCUBE to discuss the importance of integrating problem detection and incident management software as companies continue to leverage cloud-native technologies. Karthik and Arijit go deep on new critical requirements for problem detection. Learn why leveraging the most intelligent algorithms to detect patterns occurring in a production environment in real time is so critical for brands today.

Jeff: Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at PagerDuty Summit at the Westin St. Francis in Union Square, historic venue. It's our second time at this show. There's about 900 people here talking about kind of the future of DevOps, but going a lot further than DevOps. And we're excited to have a couple of CUBE alumni here at the conference from SignalFx. We've got Arijit Mukherji.

Arijit: Mukherji, yeah.

Jeff: Thank you. And Karthik Rau, co-founder and CEO of SignalFx. Gentlemen, welcome.

Karthik: Thank you very much.

Jeff: So what do you do at PagerDuty Summit?

Karthik: Well, we've been partners with PagerDuty for a long time now. We've known them since the very early days, and we share a common investor. But we both operate very squarely in the same space, which is companies moving towards DevOps development and deployment methodologies, leveraging cloud-native architectures. We solve a different part of the problem, around monitoring and observability, and we partner with them very closely around incident management. Once a problem is detected, we typically integrate with PagerDuty and trigger whatever incident management paths our customers are orchestrating through PagerDuty. So it's really been an integral part of our entire workflow since we started the company. So we're very close partners with them.

Jeff: Yeah, it's interesting 'cause Jen announced they have 300 integrations or 300+ integrations, whatever the number is, and from the outside looking in, it might look like a lot of those are competitive, like there's a lot of workflow and notification types of partners in that ecosystem. But in fact, it's lots of different people with lots of different slices of the pie.

Karthik: Yeah, absolutely. It's a really big problem space that everyone is trying to solve in this day and age. Some of our competitors are in that list, but you know, we partner very closely with PagerDuty. As I mentioned earlier, our focus really is around problem detection: leveraging the most intelligent algorithms and statistical models in real time to detect patterns that are occurring in a production environment and triggering an alert. Typically we're integrating with PagerDuty, and PagerDuty deals with the human elements: once something has been detected, how do you manage that incident? How do you route it to the appropriate people? One of the things that's really interesting as this world is changing towards these DevOps models is that the number of people that have to get involved is substantially greater than it was before. In the old days, you would have an alert go into a NOC, and you'd have a specialist group of people with very specific runbooks, because your software wasn't changing very often. In today's world, your software is changing sometimes on a daily basis, and it could be changing across dozens of teams, or hundreds of teams in larger organizations. And so there's a problem on the detection side, because companies like SignalFx have to do a really great job of detecting problems as they emerge across these disparate teams, across a much, much, much larger environment with much larger volumes of data. And then companies like PagerDuty really have to deal with a far more complex set of requirements around making sure the right people get notified at the right time. And so they're two very different problems, and we're very happy to partner with them, and have been for a number of years now.

Jeff: And again, the complexity around the APIs, where the app is running, there's so many levels now of new complexity compared to when it was just one app, running on one system, probably in your own data center, probably that you wrote, compared to this kind of API-centric, multi-cloud world that we live in today.

Arijit: That is exactly right, because what's happening is our application architectures are changing. We used to have these monoliths, we used to have three tiers and whatnot, and we're replacing that with microservices, loosely coupled systems, and whatnot. At the same time, the substrate on which we are running those services is also changing. Right, so instead of servers, now we have virtual machines, we have cloud instances and containers and pods and what-have-you. So in a way, we are sort of growing on both fronts in some sense, and that's why monitoring this kind of complex, more numerous environment is becoming a harder challenge. We're doing this for a good cause, because we want to move faster, we want to innovate faster, but at the same time, it's also making the established problems harder, which is what requires newer tools, which brings companies like us into the picture.

Jeff: Right, yep. And then just the sheer scale, volume, amount of data that's flowing through the pipes now on all these different applications is growing exponentially, right? We see it time and time again, so it really begs for a smarter approach.

Karthik: Absolutely, I mean on two levels, right? The number of minutes of software consumption is up exponentially, right? Since the smartphone came out in 2007, you've got billions of people connected to software now, connected all the time, so the load is up by orders of magnitude. Even if you didn't change the architectures, you would have to build out substantially more back-end systems. But now the architectures are changing as well, where every physical server is now parceled up into VMs, which are parceled up into containers. And so the number of systems is also up by orders of magnitude. And so there's no possible way for a human to respond to individual alerts happening on individual systems; you're just going to drown in noise. So the requirements of this new world really are: you have to have an analytics-based approach to monitoring, and more automation, more intelligence around detecting the patterns that really matter.

Jeff: Right. Which is such a great opportunity for artificial intelligence, right, and machine learning. And we talk about it all the time: everyone wants to talk about those kind of as a vendor-led something that you buy. Yeah, that's kind of okay, but really where the huge benefit is, is companies like you guys and PagerDuty using that technology, integrated in with what you deliver in your core, to do a much better job at this crazy, increasing scale and volume that's running through these machines.

Arijit: Yes, because the systems are becoming so complex that even if you asked a human to go and set up the perfect monitoring or perfect alerting, et cetera, it might be quite a hard challenge, right? So, as a result, automation, computer intelligence, et cetera need to be brought to bear, because again, it's a more complex system, and we need higher-order systems to deal with it.

Jeff: Right.

Arijit: You are very, very right, yes. And that's a trend we are starting to see. Within the product, we are actually focusing a lot on data science capabilities, which will lean more and more on machine learning and automation in the future. We have capabilities in the product that can look at populations and identify outliers, or look at cyclical patterns and identify outliers again. So the idea is to make it easy for users to monitor a complex system without having to get into the guts, so to speak.
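
To make the idea of looking at a population and identifying outliers concrete, here is a minimal Python sketch. It is not SignalFx's algorithm, which the interview does not describe. It assumes a simple median-absolute-deviation score, and the function name, host names, CPU values, and threshold are all hypothetical.

```python
from statistics import median

def population_outliers(values, threshold=3.5):
    """Return keys whose value sits far from the population median.

    Hypothetical helper: `values` maps a member name (e.g. a host) to its
    latest metric reading. Uses a modified z-score based on the median
    absolute deviation (MAD), which tolerates the outliers it is hunting.
    """
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # Most members are identical; anything different stands out.
        return [k for k, v in values.items() if v != center]
    # 0.6745 rescales MAD so the score is comparable to a z-score
    # when the data is roughly normal.
    return [k for k, v in values.items()
            if 0.6745 * abs(v - center) / mad > threshold]

# Assumed sample data: CPU utilisation (%) across a small web tier.
cpu_by_host = {"web-1": 41.0, "web-2": 39.5, "web-3": 40.2,
               "web-4": 42.1, "web-5": 93.0}
outliers = population_outliers(cpu_by_host)
print(outliers)
```

Comparing each member to its peers, rather than to a fixed threshold, means nobody has to hand-tune an alert per host, which is the point Arijit makes about not getting into the guts.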

Jeff: Right.

Karthik: And to do it on various sorts of data, right? I think you have an interesting use case that we've been experimenting with recently.

Arijit: That's right.

Karthik: If you want to talk about that.

Arijit: Yeah, so I actually have a talk tomorrow, it's an interesting one. It's about monitoring social signals, monitoring humans. So we have these systems, we have these metrics platforms, and they are quite generic. The tools that are available to us nowadays are quite powerful, and the set of inputs need not be limited to what the computers are telling me. Why not look at other things, why not look at business signals? In my case, I'm going to talk about monitoring what the humans are doing on Slack as a way for me to know whether there's something of interest going on in my infrastructure, in my service, that I need to be aware of, right? And you'll be shocked how surprisingly accurate it tends to be. It's just an interesting thing, and it makes one wonder what else is out there for us to look at. Why confine ourselves, right?
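
One way to picture treating human chatter as a metric is the small Python sketch below. It is an illustration under stated assumptions, not the method from Arijit's talk: it assumes per-minute message counts for a channel have already been collected by some other means (no Slack API is called here), and the function name, window size, and multiplier are hypothetical.

```python
from statistics import mean, pstdev

def chatter_spike(counts_per_minute, window=30, k=3.0):
    """Return True if the newest count is unusually high for the channel.

    Hypothetical helper: `counts_per_minute` is a list of message counts,
    oldest first. The newest value is compared against the mean of the
    preceding `window` minutes plus `k` standard deviations.
    """
    if len(counts_per_minute) < window + 1:
        return False  # not enough history to judge
    history = counts_per_minute[-(window + 1):-1]
    latest = counts_per_minute[-1]
    baseline = mean(history)
    # Floor the spread at 1.0 so a very quiet channel does not alert
    # on a single extra message.
    spread = max(pstdev(history), 1.0)
    return latest > baseline + k * spread

# Assumed sample data: a quiet half hour, then either calm or a burst.
quiet = [2, 3, 2, 4, 3] * 6
print(chatter_spike(quiet + [3]))   # normal minute
print(chatter_spike(quiet + [25]))  # sudden burst of messages
```

Because the message count is just another time series, the same statistical detection used for machine metrics can be applied to it unchanged.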

Jeff: Right. It's funny because we hear about sentiment analysis in social media all the time, but more in the context of Pepsi or a big consumer brand that's trying to figure out how people feel. But to do it inside your own company on your own internal tool, like a Slack, that's a whole different level of insight.

Karthik: You'd be surprised at the number of companies that monitor Twitter to understand whether they have an outage.

Arijit: That's right.

Karthik: Yeah, because in this day and age, users are on Twitter within seconds if something is perceived to be slow, or something is perceived to be down, they're on Twitter. So there are all sorts of other interesting signals to potentially pull from.

Jeff: Right, right. Well, and guess what, we were just at AT&T Spark yesterday, and 5G's coming, and 100x more data will be flowing through the mobiles, so the problem's not going to get any smaller any time soon.

Jeff: So what else have you guys been up to since we last spoke? Continuing to grow, making some interesting moves.

Karthik: Absolutely-

Jeff: Crossing oceans.

Karthik: We've been very, very busy. One of the big areas of investment for us has been international growth, so we've been investing quite a bit in Europe. We have just introduced an instance of our service that's based in a European data center. A lot of our European-based clients prefer to have data locality, data residency within the European Union, so that's something new that we just introduced last month. We continue to have a ton of momentum out in EMEA. They're very much on the cloud journey, embracing cloud and embracing DevOps, so it's really great to see that momentum out there.

Jeff: Right, and clearly with GDPR and those types of things, you have to have a presence for certain types of customers, certain types of data. Anything surprising in that move that you didn't expect or?

Karthik: No, I don't know, I'll let you.

Arijit: Not in that move, but it's just interesting to see how quickly some of these modern technologies are getting adopted. One of the things we talk about a lot in our trade is ephemerality, right? So how things are short-lived nowadays. You used to lease these servers that used to stay in your data center for three years, then you went to Amazon and you leased your instances, which probably lived for a few months or a few days, then they became containers, and the containers sometimes only live for a few hours. And then, if you think about serverless and whatnot, it's on a whole different level. The amount of ephemerality that's going on, especially in the more cloud-native companies, was a little bit of a surprise, in the sense that it actually poses a very interesting challenge: how do you monitor something that's changing so fast? And we had to put in a lot of engineering to make that problem more tractable for us. And it continues to be an area of investment. That, to me, was something that was a little bit of a surprise. When we started off, much of this Dockerization and orchestration was not yet in place, and so that was an interesting technical challenge as well as a surprise.

Jeff: Well, I'm curious too about instances, right? So there's the core instances that are running core businesses that don't change that much, but then it's a promotion, it's a this or that, right? It's a spin-up app and a spin-down app. Are those even going up on the same infrastructure from the first time they do it to the second time they do it? I mean, how much are you learning that you can leverage as people are doing things differently over and over again, as their objectives change, their applications change, their go-to-market around that specific application? That's changing all the time as well.

Arijit: Yeah, so I think the challenge there, at least from a technical point of view, from SignalFx's point of view, is to build something that is versatile enough to handle these different use cases. New use cases, new ways of doing things are going to continue to happen, and they're probably going to keep on accelerating. So the challenge for us, good and bad, is how do we make a platform that is generic, that can be used for anything that may come down the pike, not only just now. At the same time, how do we innovate to continue to be up to speed with the latest of what's going on in terms of infrastructure trends, software delivery trends, and whatnot. Because if we're not able to do that, then that puts us sort of behind.

Jeff: Right, right.

Arijit: So it's sort of a lot of frenetic innovation, but it's also exciting at the same time.

Jeff: Right, right, right. And just the whole concept too, where I think what's best practice quickly becomes expected baseline really, really fast. I mean, what's cutting edge and innovative now, unfortunately or fortunately, becomes the benchmark by which everything else is measured overnight. That's the thing that just amazes me: what was magical yesterday is just expected, boring behavior today. Alright, good. So as we get to the end of the year, a lot of exciting stuff. You guys said you're going to be at re:Invent, we will see you there. Anything else that you're looking forward to over the next couple months?

Karthik: Just, we're really excited about re:Invent. It's a big show for us, and we'll have some good announcements around the show. And yeah, looking forward to just continuing to do what we've been doing and deliver more value to our customers.

Jeff: Love it, just keep working hard.

Jeff: Alright. Arijit, hope your throat gets better before your big talk tomorrow.

Arijit: Yeah, that's right.

Jeff: Alright, thanks for stopping by Karthik, it was great to see you.

Karthik: Great to see you.

Jeff: I'm Jeff, you're watching theCUBE, we're at PagerDuty Summit at the Westin St. Francis in San Francisco. Thanks for watching, see you next time.