In this post, John Rousseau, Operations Team Lead at Onshape, talks to us about how they’re using SignalFx to become data-driven across the company.

About Onshape

Onshape is the first and only full-cloud 3D CAD system that lets everyone on a design team work together using any web browser, phone, or tablet. Onshape was built from scratch for the way today’s engineers, designers and manufacturers really work, giving them secure and simultaneous access to a single master version of their CAD data without the hassles of software licenses or copying files.

Tell us about your team

John: The operations team here does a bit more than a traditional operations team; it is closer to a platform or SRE organization. We handle larger-scale issues like server-side scalability development, database scaling and performance, release engineering, security services, internal tools development, and platform work on AWS.

Tell us a little bit about the nuts and bolts of your application

John: To create our disruptive CAD-as-a-Service product and make it scale with the demands of a global user base creating designs on every kind of device, Onshape has been built 100% on AWS. The app runs on hundreds of instances spread across three regions. The infrastructure is divided between Dev/Test, Staging, and Production clusters, with roles representing the microservices that make up all of Onshape. Teams are organized around their services and supported by an operations organization focused on broader issues like release engineering, security services, scaling infrastructure, building platforms, and providing tooling.

What kind of challenges do you face with monitoring?

John: The main challenges we’ve had have been:

  • Providing self-service access to metric creation, visualization, and analytics so both engineering and non-engineering teams can track and analyze their own KPIs and get visibility throughout the software development lifecycle, via a consumable interface that doesn’t require learning new query languages or DSLs
  • Handling dynamic data so that charts, dashboards, or alerts don’t need to be modified or refreshed every time a service scales or customers are added or a different time range needs to be looked at
  • Having real-time interactive analysis of our data as it’s streaming, so teams can pivot on metadata like customer type or service, explore behavioral changes at every level of the app, and get notified right when they occur instead of minutes or hours later
  • Spending our limited engineering resources on building Onshape instead of cobbling together open source components and then customizing, scaling, and maintaining them as a metrics platform

Why did you choose SignalFx?

John: We originally chose a different metrics monitoring product, before SignalFx had even launched, but found it to be too expensive to roll out to all parts of the product development pipeline, unable to deal with dynamically changing data, impossible to do interactive analytics in, and too slow to catch problems before they impacted our customers.

We love that SignalFx provides an easy-to-use, self-service metrics capability, with instrumentation, visualization, and analytics that can be used not just by operations but also by the developer, product, and UX teams, without making anyone learn a new query language or DSL.

SignalFx was the only solution that we could reasonably roll out to the entire Onshape infrastructure. Host-based pricing models force us to decide whether every new host is worth an additional fee, but SignalFx’s usage-based pricing gives us the flexibility to monitor metrics from every system we care about.

"We don’t just use SignalFx to monitor our infrastructure. We use it to understand how Onshape is being used by customers to guide product development."

John Rousseau, Operations Team Lead

How do you use SignalFx?

John: We send everything from infrastructure metrics to custom application metrics into SignalFx. The majority of metrics we create are at the application level, because that’s where the real value is.

We use SignalFx’s integrations with AWS, collectd, and StatsD extensively, both for system-level metrics and for custom application metrics that are instrumented directly into Onshape code and matched with internal metadata and external metadata from AWS, such as AZs and instance types.
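For readers who haven’t instrumented an application before, here is a rough idea of what emitting a custom metric over StatsD can look like. This is a minimal sketch, not Onshape’s actual instrumentation: the metric names, the local agent address, and the helper functions are all assumptions made up for illustration. It uses only the plain StatsD line format (`name:value|type`) sent over UDP; attaching metadata such as AZ or instance type is left out because that depends on the agent in use.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str = "c") -> str:
    """Build a plain StatsD line of the form <name>:<value>|<type>.

    Common types: "c" (counter), "g" (gauge), "ms" (timer).
    """
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one StatsD line over UDP (fire-and-forget, no reply expected).

    The host and port are assumptions: 8125 is the conventional StatsD port,
    and a local agent is assumed to be listening there.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric name, chosen only for this example.
line = format_statsd("onshape.api.requests", 1, "c")
print(line)  # onshape.api.requests:1|c

if __name__ == "__main__":
    send_metric(line)
```

Because UDP is fire-and-forget, a call like this adds very little overhead to the application code path, which is one reason the approach is popular for application-level metrics.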

SignalFx dashboards are displayed on big 80″ screens throughout the office so everyone can see their own metrics and higher level metrics in one place. The operations team is in SignalFx all day, every day.



Can you give us an example of how SignalFx has made life better?

John: Here’s a very concrete example: 

Recently, we created charts to visualize API metrics and anomaly detectors to alert on spikes in requests. After catching a huge surge in requests from a single IP address on a Sunday, 10x the normal volume, we discovered that a partner was crawling and exporting our content. They were under the impression that this was OK because another part of Onshape had given them the go-ahead, but it would have caused performance problems for other partners and our customers. Without SignalFx, we wouldn’t have caught it in time.
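To make the idea of a request-spike check concrete, here is a simplified sketch of the logic: count requests per IP address in the current window and flag any address at or above some multiple of its usual volume. This is not how SignalFx detectors are implemented or configured; the function name, the baseline dictionary, the default baseline, and the sample numbers are all hypothetical, and the 10x factor simply mirrors the surge described above.

```python
from collections import Counter
from typing import Dict, Iterable, List


def find_spiking_ips(
    current_requests: Iterable[str],
    baseline_per_ip: Dict[str, float],
    factor: float = 10.0,
    default_baseline: float = 1.0,
) -> List[str]:
    """Return IPs whose request count is at least `factor` times their baseline.

    current_requests: one source IP per request seen in the current window.
    baseline_per_ip: typical request count per window for each known IP.
    default_baseline: baseline assumed for IPs with no history (an assumption).
    """
    counts = Counter(current_requests)
    flagged = []
    for ip, count in counts.items():
        baseline = baseline_per_ip.get(ip, default_baseline)
        if count >= factor * baseline:
            flagged.append(ip)
    return sorted(flagged)


# Made-up sample data using documentation-only IP ranges.
requests = ["203.0.113.7"] * 500 + ["198.51.100.2"] * 40
baseline = {"203.0.113.7": 45, "198.51.100.2": 50}
print(find_spiking_ips(requests, baseline))  # ['203.0.113.7']
```

A static multiple like this is the simplest possible rule; a production detector would typically compare against a rolling baseline so that normal daily and weekly patterns don’t trigger alerts.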

About the authors

Aneel Lakhani

Aneel is a marketer. Previously he worked on marketing at other startups, served as a Research Director at Gartner, and did stints at big companies like Cisco and IBM doing everything from sales engineering to product management and large scale systems architecture.
