Monitoring: An Analytics Problem

Software no longer just supports your business -- it is your business. In a world of always-on, always-connected digital experiences, expectations for real-time, personalized engagement are the norm. Companies can no longer afford costly errors and slow time to resolution.

Organizations of all sizes are embracing cloud-native technologies to stay competitive. But that’s just the beginning. They’re also applying automation and cognitive tools to deal with massive data volumes that have outstripped the capabilities of traditional monitoring technology.

What’s needed now is a cloud monitoring solution built on a streaming architecture, with real-time analytics and operational intelligence to separate meaningful signals from mountains of noise, using automation to address issues before they affect customers. In real time. At any scale.

Join SignalFx CEO Karthik Rau and Chief Architect Rajesh Raman as they explore these topics and more with John Furrier of theCUBE at Google Next 2018.



Join our live demo to learn how companies including Yelp, Kayak, and HubSpot leverage SignalFx metrics monitoring for instant visibility into, and alerting across, their entire environment.


John: Live from San Francisco, it's theCUBE covering Google Cloud Next 2018, brought to you by Google Cloud and its ecosystem partners. (techy music)

John: Hello everyone, welcome back to theCUBE's live coverage here. We're in San Francisco for Google Cloud's major conference, Next 2018. I'm John Furrier, here for three days. Wall to wall coverage on day one. We've got two great guests from SignalFx, Karthik Rau, founder and CEO, and Rajesh Raman, who's the chief architect. SignalFx is a hot startup in the area. Way ahead of its time, but now as the world gets more advanced, the solution is front and center as the value proposition as cloud moves into the mainstream, devops going to a world at large scale. Not just networking, monitoring, applications, you've got service meshes booming, great topic. Karthik, great to see you, Rajesh, thanks for joining us.

Karthik: Thank you.

Rajesh: John, great to be on.

John: So, first of all let's just get it out of the way, you guys have some fresh funding in May, so just quickly give an update on the company. You guys raised--

Karthik: Yeah

John: A series...

Karthik: A series D.

John: Series D, give us, but how much?

Karthik: Yeah, so we raised $45 million from General Catalyst leading the round back in May, been building a ton of momentum as a company, close to a couple hundred people today. We're using a lot of that to expand internationally. We've got a team in Europe now, just opened up a team in Australia. So, things have been going great.

John: Congratulations, we've had chats before, always been impressed. You guys have a great stable of awesome engineers and talent in the company doing some great work, but it begs the question, I always like to get into the what ifs. What if I could have large scale application development environments with programmable infrastructure, how does that change things? So, Karthik, what's... How as that what if changes, now that is what's happening you're starting to see the cloud at scale for the common masses of enterprises, where old ways of doing things are kind of moving away. It's like horse and buggy versus having a car for the first time--

Karthik: Yeah.

John: Jobs are changing, but the value doesn't necessarily change. You still go from point A to point B, you still got an engine, people who care about fixing cars, so people just want to drive the cloud, some people want to get under the hood, whole new architecture.

Karthik: Yeah.

John: What's the what if of if I could have all these resources, what's the challenges and what do you guys solve.

Karthik: Well, I think there are a couple of challenges in this new environment. One is that the number of components is just orders of magnitude more than it used to be in a cloud environment, right? We went from having physical machines that lived for three years in a data center, divided up into VMs 10 years ago, now divided up into containers for every process. Not only that, but these containers get spun up and spun down every few minutes or every few hours, and so the number of components and the churn is just massive. So, that in and of itself requires a far more analytics-based approach to understand patterns rather than what's happening on an individual component. The second thing that's changed is the operating model's fundamentally different, because now you're building and running web services, and when you're running web services the people who build the software are the ones who technically are responsible for operating it. And so, you know, you have more updates, you've got more people involved, you've got lots of different components that all need to interact with one another, and so having a communication framework across all of these disparate teams becomes really, really, really critical. So, those are the two fundamental changes as you move to, you know, operating these modern, massively distributed applications.

John: Yes.

John: And I'll just add just some observation data that we've been seeing in theCUBE is those same folks building aren't necessarily operators, so they want to be in and out fast, right? (laughs)

John: They don't want to be running and operating all the time, they want to push some code. Melody Meckfessel here at Google ran a survey with developers and said, you know, "What makes you happy," and it was two things that bothered developers: technical debt and speed for deployments, commits, and the commit number was around minutes. If you can't get something done in minutes then they're onto something else, so the mind share attention of developers and technicos. So, this is a challenge at scale when you have technical debt, which we've seen companies come out of the woodwork, "Oh, yeah, I'm going to automate something, I'm going to throw some compute at it with the cloud with the best monitoring package on the planet and look how great it is," but all they did was just code some instrumentation and that's it.

Karthik: Mm-hmm.

John: They weren't dealing with a lot of moving parts. Now as more things come in this is a challenge that a lot of companies face. You guys kind of solved this problem...

Karthik: Yeah, absolutely, so maybe Rajesh was a part of the team at Facebook that built the Facebook monitoring system, and that's actually what gave us a lot of the vision to start SignalFx five-and-a-half years ago, so maybe--

John: Tell about the protection, the vision--

Karthik: Yeah.

John: And what you guys are doing.

Rajesh: Yeah, so CI/CD, you know, it kind of, like, underlies a lot of this vision of, like, moving fast. You mentioned that people want to, like, you know, push their code in a few minutes... The thing that makes that possible is for you to have observability into what's happening while that push happens, because it's one thing to push very fast, it's another thing to recognize that you might have pushed something bad and to be able to revert it very quickly, too. And so, you'd only need, like, you know, good observability into all the things that matter that characterize the health of your system to be able to quickly recognize patterns, to be able to quickly recognize anomalies, and to be able to maybe push forward or even roll back very quickly. So, I think, like, observability is like a very key aspect of this entire CI/CD story.

John: That's great, and that's great to know that you were over at Facebook because obviously Facebook built, at scale from the ground up, a lot of opensource. Obviously they contributed a lot to opensource, but it's interesting, as they matured and you start to see their philosophy change. It used to be move fast, break stuff.

Rajesh: Yeah.

John: To move fast, be reliable.

Rajesh: Yeah.

John: This is now the norm that's the table stakes in cloud. You have to move fast, you got to push code, but you got to maintain an operational integrity. This is, like, not like an option. This is, like, standard.

Rajesh: Absolutely.

John: How do you guys help solve that problem?

Rajesh: So, I think there are a few different aspects to it. So, the first is to, you know, people need to ensure that they have observability into their application, so this is ensuring that you have the right kind of instrumentation in place. Thankfully this is kind of becoming commoditized right now and getting metrics from your system. The second part, and the more key part, is then being able to process this data in a real time way. You know, have high resolution, very low latency, and then to be able to do real time streaming analytics on this data. In highly elastic environments when things come and go very quickly, the identity of any individual, like, component is less important than the aggregate system behavior, and so you really need the analytics capability to kind of, like, go across this data, do various kinds of aggregations, compare it against past data, do predictive analytics, that sort of thing. So, analytics becomes the very key concept of, you know, how you operate these environments.
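The idea Rajesh describes here, rolling datapoints from short-lived components up to the service level and comparing the aggregate against past data, can be sketched roughly as follows. This is only an illustration under stated assumptions, not SignalFx's implementation: the `ServiceAggregator` class, the window-by-window input format, and the three-sigma threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class ServiceAggregator:
    """Hypothetical sketch: aggregate per-container values per service and
    flag windows whose aggregate deviates sharply from recent history."""

    def __init__(self, history=30, threshold=3.0):
        # Recent per-service aggregates, bounded to the last `history` windows.
        self.history = defaultdict(lambda: deque(maxlen=history))
        self.threshold = threshold  # assumed cutoff, in standard deviations

    def ingest_window(self, datapoints):
        """datapoints: iterable of (service, container_id, value) tuples
        for one time window. Returns {service: (aggregate, is_anomalous)}."""
        by_service = defaultdict(list)
        for service, _container_id, value in datapoints:
            # Container identity is deliberately ignored: containers come
            # and go, so only the service-level aggregate is tracked.
            by_service[service].append(value)

        results = {}
        for service, values in by_service.items():
            current = mean(values)
            past = self.history[service]
            anomalous = False
            if len(past) >= 5:  # wait for a little history before judging
                mu, sigma = mean(past), pstdev(past)
                # A perfectly flat history (sigma == 0) is never flagged here.
                anomalous = sigma > 0 and abs(current - mu) > self.threshold * sigma
            past.append(current)
            results[service] = (current, anomalous)
        return results
```

Each call represents one time window. Because the history is keyed by service rather than by container, containers can be replaced between windows without losing the baseline.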

John: It sounds so easy.

Karthik: Yeah, well one thing I'll add to that, so you know, to your point a lot of big companies sometimes are scared by this. You know, "How do we," you know... "We can't move quickly and break things," and everything that they've designed is around having process and structure to check and make sure everything is clean before they push changes out, and now we're in this world where, you know, an intern or a developer can push directly to production, how do you manage that? The key thing in this modern world when you're trying to release software quickly, Rajesh hit on this earlier, you need the magic undo button.

John: Yeah.

Karthik: That is the key to this entire process. You need to design your software, you need to design your process, and you need to design your tools so that if you introduce something bad you catch it immediately and you can roll it back. So, lots of devops practices are oriented around this, right? The idea of a canary release, I'm going to roll out an update to one percent of my systems and users, test it out, observe all the metrics, make sure everything is clean before I roll it out to everyone else, and the ability to roll back quickly is also important. But in order to do all of this you need the visibility, you need the metrics, and you need to be able to do analytics on it quickly to identify the patterns as they emerge.
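The canary release Karthik outlines (roll out to a small slice, watch the metrics, then either promote or roll back) can be reduced to a small decision function. The sketch below is a minimal illustration, not a description of any particular tool: the function name `canary_decision`, the use of error rate as the health metric, and the `max_ratio`, `min_samples`, and absolute-floor values are all assumptions made for the example.

```python
from statistics import mean


def canary_decision(baseline_error_rates, canary_error_rates,
                    max_ratio=1.5, min_samples=5, floor=0.001):
    """Hypothetical sketch: compare the canary's error rate with the
    baseline fleet and return 'wait', 'rollback', or 'promote'."""
    if not baseline_error_rates or len(canary_error_rates) < min_samples:
        # Not enough evidence yet; keep the canary at its small share.
        return "wait"

    baseline = mean(baseline_error_rates)
    canary = mean(canary_error_rates)

    # Allow the canary to be somewhat worse than the baseline, with a small
    # absolute floor so a near-zero baseline does not trigger on noise.
    allowed = max(baseline * max_ratio, floor)
    if canary > allowed:
        return "rollback"  # the "magic undo button"
    return "promote"       # safe to continue the rollout


if __name__ == "__main__":
    print(canary_decision([0.01] * 10, [0.05] * 5))
```

In practice a decision like this would be evaluated repeatedly as new datapoints stream in, which is why the speed of the underlying analytics matters.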

John: That's a great point and I'd love to just double down on that and get your thoughts because some of the Google Cloud people who are operating at this scale, I put them on this whole service-centric architecture, because they're services. We're talking about services, managing sets of services, having analytics, observation space, the reverting back and the undo button, the magic button do-over, whatever you want to call it, but the interesting thing is clean. Having a clean service whether it's an API, message queue, or an event, this stuff's happening all over the place in the new services world. How do you guys help there, is that where you guys get involved? Do you see up in that layer, how far up are you guys looking at some of the instrumentation and the insights?

Karthik: Yeah, you want to take that?

Rajesh: Yeah, sure, so you know, the one thing that we really like about SignalFx and we were very keen on when we built the platform is that we are very agnostic about metrics. We're happy to accept metrics from anywhere, we'll take instrumentation--

John: (chuckles) You don't discriminate against metrics.

Rajesh: We'll take instrumentation from cloud environment, we'll take, you know, metrics from opensource systems and on-prem applications, so you know, some of these systems are already kind of built in to get metrics from. You know, we talk to the Kafkas and Cassandras of the world, for example. We can also talk to GCP and AWS and grab metrics from their system. I think the interesting question is like when people really are taking the devops philosophy of, like, so how do you instrument your own application, what questions do you want to ask from your environment that answer the critical questions that you kind of have, and so you know, that's the one, that's the next step in the hierarchy of needs is for people to ask the right kinds of questions, and you know, instrument their applications properly. But like having done that, we can go up and down the stack in terms of, like, insight, all the way from your cloud environment through opensource systems, all the way up--

John: So, you guys'll take data from anyone, just stream it in--

Rajesh: Yeah.

John: Normal mechanisms there, what's the value added, where's the secret sauce on SignalFx?

Rajesh: So, I think value, it's all about analytics. We are all about analytics, so we are able to look at patterns of the data, we can go up and down the stack and correlate across different layers of software, look at interactions across components in your microservice, for example. You know, one really interesting thing that's happening, as you might be aware, like the whole service mesh aspect of it, which lets us, gives us insight into interactions between components--

John: Yeah.

Rajesh: In a microservices architecture, so you know, we are able to get all that data and give you insight into how your whole system is working.

John: So, you guys, you can see in the microservices layer?

Rajesh: Absolutely.

Rajesh: Yeah.

John: That's powerful.

Karthik: And the key point is monitoring really has become an analytics problem, that's what we keep saying, right, because what's happening on an individual component is no longer as interesting as what's happening across the entire service, so you have to aggregate the information and look at the trend across the entire service, but the second thing that's really important is you need to be able to do it quickly, and this is where our streaming real time system really matters. And people might ask, "Why does it matter to do something real time?" Like, "Seconds versus minutes, can a human actually process something in seconds versus minutes?" Perhaps not, but everyone's moving towards automation, right?

John: Yeah.

Karthik: So, if you want to move to a system where you have a closed loop, you have automation, and guess what, all of these modern systems, all the stuff that Google's talking about here is all about automation.

John: Yeah.

Karthik: And in that world seconds versus minutes, it means a tremendous amount of difference, right, where if you can find signals that will tell you there's an emerging problem within seconds and then you can revert a bad code push or you can auto-scale a cluster or you can, you know, change your load balancing algorithms all within seconds, that is what enables you to deliver, you know, four nines, five nines type of availability.
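To make the seconds-versus-minutes point concrete: "four nines" and "five nines" mean 99.99% and 99.999% availability, which leave roughly 52.6 minutes and 5.3 minutes of downtime per year respectively. The short sketch below works through that arithmetic. The function names and the example incident durations are illustrative assumptions, not figures from the interview.

```python
SECONDS_PER_DAY = 24 * 3600


def downtime_budget_seconds(nines, period_days=365):
    """Allowed downtime over the period for an availability of N nines,
    e.g. nines=4 means 99.99% availability."""
    unavailability = 10 ** (-nines)
    return period_days * SECONDS_PER_DAY * unavailability


def incidents_within_budget(nines, seconds_per_incident, period_days=365):
    """How many incidents of a given detect-and-fix duration fit in the budget."""
    return int(downtime_budget_seconds(nines, period_days) // seconds_per_incident)


if __name__ == "__main__":
    for nines in (4, 5):
        budget = downtime_budget_seconds(nines)
        print(f"{nines} nines: {budget / 60:.1f} minutes of downtime per year")
    # Assumed incident durations: 5 minutes (human-paced) vs 30 seconds (automated).
    print(incidents_within_budget(5, 300), "five-minute incidents fit in five nines")
    print(incidents_within_budget(5, 30), "thirty-second incidents fit in five nines")
```

At five nines, a single incident that takes five minutes to detect and revert consumes nearly the whole yearly budget, while one resolved in thirty seconds consumes about a tenth of it.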

John: And the consequences of not having that is outages--

Karthik: Yeah, outages.

John: Performance.

Karthik: Performance degradations, unhappy customers. I mean the cost to a brand now of having any kind of a performance issue is enormous, right? People are on Twitter before your team knows about it. (chuckles)

John: Actually, you guys have a lot of the things you're solving, what is the core problem that you solve, what's the value proposition if you narrow it down that's high order bit for SignalFx? What's the corporate problem you solve?

Karthik: Well, we're solving the monitoring and observation problem for people operating cloud applications, so what happens is when you use SignalFx you have the confidence to move quickly, right? It gives you the safety net to be able to deploy changes on a daily basis, to have the shared context across a distributed team, so if you've got hundreds of two pizza box teams working together we give you that framework, the communication framework and the proactive intelligence to find issues as they emerge and proactively address them. And bottom line what that means is you can move as quickly as a Google or a Facebook or a Netflix even if you're a traditional Fortune 500 company that's regulated, and you know, you think you may not be able to do it but you really can.

John: You give them the turbo charge, basically, for the analytics. All right, here's a question for you, what are the core guiding principles for the company? You guys obviously have a lot going on so you've got a core tech team, I mentioned it earlier.

Karthik: Mm-hmm.

John: What are some of the guiding principles as you guys hire, build product, talk to customers, what's the key DNA of SignalFx?

Karthik: Yeah, I would say we are a very impact-driven company, so I'm, you know, very, very proud of all the people that we have on the team. We've got a lot of entrepreneurs who are focused on solving big problems, solving problems that customers may not necessarily know they need at the time, but as the market evolves we're there to solve it for them. So, we're a very customer-centric company. We have fantastic, we invest aggressively in technology, so it's not just about wrapping a pretty UI around, you know, bolt-on tech. We have real differentiated technology that solves real problems for people, and you know, I think we've in general just tried to skate to where the puck is and understand where the market's headed as a company.

John: What is some of the customer feedback that you're getting? For folks that don't know SignalFx, what are some of the things that you're hearing from customers, why are you winning, what are some of the examples, can you share some color commentary?

Karthik: Yeah, I'll give one example, a Fortune 500 company that has been very aggressively investing in cloud the past, you know, four or five years, built an entire digital team, and their entire initiative is, like a lot of people in the Fortune 500 now, is to have a direct-to-consumer type of a relationship, and one of the things that they struggled with early was how do they move quickly, support product launches that might have massive load, and have the visibility to know that they can do that and catch issues as they emerge, and they didn't have a solution that could give that visibility to them until they leveraged SignalFx. And so now, if you talk to people there they'll say that they've essentially gone from defense every time they did one of these product launches to being on offense and really understanding what it takes to successfully launch a product and they're doing way more of these, so--

John: Moving the needle on time to market.

Karthik: Moving their business forward, you know, and digital transformation just by--

John: Yeah.

Karthik: Having SignalFx as a core enabler.

John: It's the cloud version of putting out fires, so to speak, when you do product launches, right?

Karthik: Yeah.

John: I got to ask you guys a question. You guys are both industry veterans, obviously Facebook has a storied history. We know all the great things that happen on the infrastructure side. Karthik, you've been at VMware, where you've seen the movie before, where VMware made the market, changed IT for the better, we still talk about the VMwares now. Now as we see cloud taking that next transformational push, describe the wave we're on right now, because it's kind of an interesting time in tech history where the talent that's coming in is pretty amazing. The young guns coming in with opensource the way it's flourishing is pretty phenomenal. Some of the smartest computer science and/or engineering talent is really solving what was old school B2B problems that really no one really wanted to solve. I mean, it was people were buying IT. Now you're talking about building operating systems, so the computer science kind of mojo in the enterprise has upped a bit.

Karthik: Mm-hmm.

John: What's this wave about, how would you describe the wave of this time in history of the tech industry?

Karthik: Do you want to... (laughs) I'll add my take but why don't you go first.

Rajesh: I think the thing that I find striking is just like, you know, when people used to talk about big data, you know, a few years ago, and now that is like, that's just normal.

John: Yeah.

Rajesh: And like, the amount of compute and the amount of storage that people are able to, you know, bring to command at--

John: Yeah.

Rajesh: On any problem, it's just incredible, and that's just going to, I think, like continue to grow, right? That's going to be an amazing thing to watch. I think, you know, what this means... It also has interesting implications for, you know, companies like SignalFx who are trying to be in the monitoring space because the mojo used to be you had to have all this complicated software to do the instrumentation. Well, the instrumentation part is easy, but now all the value that's going to come from monitoring is in what you do with all that data, how you analyze it and look for, like, you know, so the whole AI ops and all that is going to be the key of the whole monitoring problem going forward, you know, five, 10 years from now, but we already see that analytics is such a key aspect of the whole thing, so...

Karthik: Yeah, I'm very, I think we're at the beginning, still at the beginning of a massive 30 to 40 year cycle, and this hasn't happened since the PC revolution in the 1970s, right, so the smartphone comes out 2007, massively opens up the market for software-based services to several billion people who are connected all the time now, drives a massive refresh of the backend infrastructure, drives the adoption of opensource, and so we're at this magical point now where the market for software-based services is just exploding, and every enterprise, you know, is becoming a software company, and you know, I think the volume of data that we're accumulating is just growing exponentially and what you can do with AI at this point, it's just... We're just beginning to see the benefit of it, so I think this is a really, really exciting time and I think we're just at the beginning. Most of the enterprises, and even the tech companies, are just beginning to capitalize on what is in store for us in the industry.

John: I find it to be intoxicating, fun, and just great people coming in. To your point about the beginning of a 40 year run, also the nature of software development is being modernized at an extremely accelerated pace, so as people in the enterprise start re-imagining how they do software, because if they're a software company they've never had product managers. I mean, so the notion of what is a product, how do you launch a product, is all kind of first generation problems and opportunities, so I think to me it's really the enablement... And this is really what I think people are looking for is who can take the burden off my shoulders, help me move faster, more gas, less brake.

Karthik: Mm-hmm.

John: Go faster, drive value, and then ultimately compete, because competitive advantage with technology... What does that mean to you guys, because how do you react to that because what you essentially are doing is creating instrumentation for enabling companies to create new value faster with technology and software, in some cases at a level that they've never seen before. What do you, how do you react to that?

Karthik: Well, I think that's exactly what we do, right, I mean, every company, I think most companies realized that they had to invest in software and focus on all these new opportunities at the early part of this decade. First thing they had to do was figure out who's going to build all this software, so most of them had to go hire engineers or build digital teams. They had to decide where are they going to run, the cloud wars of, you know, the early part of this decade. Do we build a private cloud, do we use a public cloud, I think both of those things have happened and people are now comfortable with those decisions. The third leg, which is squarely in the space that we're in, is how do you operationalize this new model, and I think people are working through that now. As they get through that in the next few years, with companies like SignalFx helping every company operationalize it very quickly, I think that's when the true promise of this new digital era will be realized where you'll start to see all of these fantastic applications, mobile apps, web service apps, direct-to-consumer streamlined supply chains. We're just beginning to see the benefit of that, and we'll see when that happens then the volume of data that they're collecting will increase exponentially and then the promise of machine learning and AI will take an altogether nother step.

John: You got to know how to automate it before you can automate it, basically. What's next, final question for you guys, what's going on with SignalFx, what are you guys going to conquer, what's the next major milestones for you guys, what are you looking to do?

Karthik: Yeah, well we're continuing to focus on driving value for our customers, so we're expanding our geographic presence, so we're doing a lot of international expansion at this point. We're hiring a lot of engineers, so if anyone is interested in a development job, reach out to us.

John: What kind of engineers are you looking to hire?

Karthik: Rajesh, you want to take that, sorry. (chuckles) What kind of engineers...

John: What kind of engineers you looking to hire?

Rajesh: Everything. (chuckles)

Rajesh: I mean, all kinds of engineers, especially distributed systems engineers, front end engineers, full stack engineers, like real tech, all the good engineers we can get.

John: (chuckles) Awesome.

Karthik: A lot of product development, there's a lot of interesting things happening in this space, and so we're, you know, continuing to invest very aggressively.

John: Large scale distributed systems.

Karthik: Yep.

John: You've got decentralized right around the corner, so you've got a lot of stuff happening.

Karthik: Yeah.

Rajesh: Yeah.

John: Great to have you on, thanks for coming on, Karthik.

Karthik: Great, great to be on.

John: Rajesh, thank you so much.

Rajesh: My pleasure.

John: SignalFx here in the cloud of Google here at Next, it's theCUBE, theCUBE cloud, CUBE data, we're bringing it all to you. I'm John Furrier, thanks for watching. More coverage, stay with us, we'll be back after this short break. (techy music)