Simplified Serverless Insights on Amazon CloudWatch

Oct 2, 2019

 


Our enterprise customers often ask whether Thundra has been used in production at another enterprise. That brings us to the story of how Thundra came to be. At the beginning of 2017, Opsgenie was adopting serverless through AWS Lambda while building a new, 100% serverless future. At the time, Opsgenie had internal tooling for making AWS CloudWatch logs prettier, distinguishing the logs of individual invocations, and so on. What it lacked was advanced tracing to see what was actually happening inside Lambda functions. That’s why Serkan implemented a library under com.opsgenie called Thundra, which traces Lambda applications method by method and even line by line. (Thundra is the bird who can control storms and lightning, and he thought the name was right for the Aladdin version of AWS X-Ray.) Since then, Thundra has been used in production at Opsgenie for a year to gather insights about its AWS Lambda applications. That answers the question about production usage by big companies.

The Thundra bird in the Aladdin cartoon

After we spun Thundra off from Opsgenie during its acquisition by Atlassian, we started talking with more customers of every size about their serverless monitoring pain points and possible ways to address them. We saw that larger companies have in-house tracing solutions (like we did at Opsgenie) built on top of open-source tooling such as OpenTracing or OpenCensus, but maintaining these solutions is painful as their need for combined distributed and local tracing grows. They need a solution for advanced tracing that follows the least privilege principle and can easily be plugged in and out when needed. They also want this solution to be configurable in a few minutes. For small and medium-sized companies, on the other hand, the priority is a decent solution that gives them enough insight into their serverless applications with minimal effort. The value of advanced tracing is something for the future for them; for now, they just need to organize their functions better and use tooling in the easiest way possible.

We saw that both camps are happy with the advanced tracing Thundra provides, but they were looking for a way to set Thundra up even more easily. We thought Thundra could help both with a single solution while keeping the least privilege principle in mind. For this reason, we are announcing an important update to our product today. From now on, you’ll be able to plug Thundra into your AWS account and start monitoring all of your serverless functions in just a couple of minutes. With this update, you won’t need to wrap your handler to start learning about the performance of your function. Instead, we’ll gather the insights from your Amazon CloudWatch logs and organize them in a simpler, more understandable fashion.

How does this work?

When you first land in the app, you will see an onboarding screen that asks for access to your AWS account.

By clicking on `Connect Thundra`, you will be redirected to your AWS account to set up a CloudFormation template that installs Thundra subscribers for your AWS Lambda functions. Note that you can skip this step if you would rather not give a third-party solution like ours access to your account. In that case, you can continue with the old way of instrumenting your functions one by one.

After you deploy the CloudFormation template in your account, you will start seeing your functions in Thundra in less than 3 minutes.


It is that simple. You can now see the basic performance metrics of your functions, logs for the invocations, and more. You can use these views forever to get a better understanding of your serverless stack, but we recommend instrumenting your functions for the best understanding via distributed and local traces.

Why is this important?

With this update, new users will see a new onboarding flow that lets them set Thundra up for all the functions in their account in less than 5 minutes. Thundra can then provide insights about your serverless functions even without Thundra instrumentation in the code. These insights are still integrated with Thundra alerts and all the application pages, so you can use the Thundra console to stay on top of performance issues without any instrumentation. You can also plug Thundra into multiple AWS accounts and use it as a centralized monitoring solution for all of them.

At this point, I should emphasize that plugging Thundra into your AWS accounts is not mandatory. We paid attention to the needs of our enterprise customers and made this step skippable, because they told us that even if you have all the security compliance needed to claim that you’re secure, some companies just won’t let a third-party tool access their AWS account. You can still complete the onboarding by instrumenting your functions with our libraries for Java, Node.js, Python, Golang, and .NET. Another reason we didn’t make connecting an AWS account mandatory is our long-term aim of supporting multi-cloud and non-serverless environments. In that sense, we are the only solution that asks for access to your AWS account while keeping that access optional.

To see the unique features of Thundra, such as line-by-line tracing integrated with distributed tracing, you will still need to instrument your code with Thundra libraries, but this is also easy: you just add the Thundra layer to your functions. We’ll make it even easier in the very near future, but I’ll come to that point in a minute.

As you may know, Thundra is well known for its asynchronous monitoring, in which the monitoring data is sent to the Thundra backend asynchronously through CloudWatch logs. Previously, our customers had to install a Lambda function that subscribes to the log group of their function and sends the data to the Thundra backend. From now on, you don’t need to install another Lambda function in your account. Just set the environment variable thundra_agent_lambda_report_cloudwatch_enable to true and you’re done! This way, you won’t add any overhead to your Lambda invocations. We simplified everything possible for your comfort!
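
As a rough sketch of that last step, the snippet below sets the variable through the AWS SDK for Python. It assumes a boto3 Lambda client created with `boto3.client("lambda")`; the helper names and the function name are hypothetical and not part of any Thundra tooling. Because `update_function_configuration` replaces the whole variable map, the sketch reads the existing variables first and merges the flag into them.

```python
from typing import Any, Dict

# Variable name taken from the paragraph above.
CLOUDWATCH_REPORT_FLAG = "thundra_agent_lambda_report_cloudwatch_enable"


def with_cloudwatch_reporting(variables: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the variables with CloudWatch reporting switched on."""
    merged = dict(variables)
    merged[CLOUDWATCH_REPORT_FLAG] = "true"
    return merged


def enable_cloudwatch_reporting(lambda_client: Any, function_name: str) -> Dict[str, str]:
    """Merge the flag into a function's existing environment variables.

    `lambda_client` is assumed to be a boto3 Lambda client.
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # Functions without any variables have no "Environment" key at all.
    current = config.get("Environment", {}).get("Variables", {})
    updated = with_cloudwatch_reporting(current)
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": updated},
    )
    return updated
```

Calling `enable_cloudwatch_reporting(boto3.client("lambda"), "my-function")` would apply the change to a function named `my-function`, which is only a placeholder here.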

As an existing customer, what do you need to change?

Short answer: nothing. You can continue using Thundra as before, but we recommend taking a few actions to get the most out of this update. You can plug Thundra into your AWS account if you want to see all the functions in your account, even the ones you didn’t instrument with Thundra. To do that, go to the AWS settings page and click on “Add new account”. Following the instructions, you can plug Thundra into your AWS account and, in a couple of minutes, start seeing the functions that are not instrumented with Thundra libraries on your Thundra console. Note that you won’t lose any data by doing this. One warning: if there’s another subscriber for the CloudWatch logs, you need to remove it first and let Thundra do its job.
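
One way to check for existing subscribers before connecting an account is to list the subscription filters on each function’s log group. The sketch below assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) and the default `/aws/lambda/<function-name>` log group naming. The helper name and the function names passed to it are hypothetical, and it only reports filters rather than removing them.

```python
from typing import Any, Dict, Iterable, List


def find_existing_subscribers(
    logs_client: Any, function_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each log group to the names of the subscription filters attached to it.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client. The call
    raises an error if a log group does not exist yet, which this sketch
    leaves to the caller to handle.
    """
    found: Dict[str, List[str]] = {}
    for name in function_names:
        # Default log group name for a Lambda function.
        log_group = f"/aws/lambda/{name}"
        response = logs_client.describe_subscription_filters(logGroupName=log_group)
        filters = [f["filterName"] for f in response.get("subscriptionFilters", [])]
        if filters:
            found[log_group] = filters
    return found
```

Any filter it reports can then be removed from the CloudWatch console or with the client’s `delete_subscription_filter` call.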

Before this update, we gathered the logs and metrics of your functions with our instrumentation. As you may know, we gathered some advanced metrics, such as memory usage by pool, on top of metrics that can also be retrieved from CloudWatch (overall memory usage, for example). While rolling out this update, we wanted to simplify our agents for new users, so we disabled logs and metrics in our agents by default. With this, we aimed to decrease the overhead our agents can cause while calculating metrics (although it was less than 2 milliseconds in our benchmarks). From now on, we will gather logs and metrics from CloudWatch by default. To take advantage of this improvement, update your Thundra libraries as follows:

  • For Node.js, update your library to a version later than 2.6.1, or use a Layer whose version is later than 25.
  • For Python, update your library to a version later than 2.3.7, or use a Layer whose version is later than 13.
  • For Java, update your library to a version later than 2.3.15, or use a Layer whose version is later than 29.
  • For .NET, update your library to a version later than 1.5.0.
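
The library thresholds in this list can be checked mechanically. The sketch below is a hypothetical helper, not part of the Thundra libraries: it assumes plain dotted numeric version strings and treats each threshold as an exclusive lower bound. The runtime keys are illustrative names.

```python
from typing import Dict, Tuple

# Library version thresholds copied from the list above (exclusive lower bounds).
LIBRARY_THRESHOLDS: Dict[str, str] = {
    "nodejs": "2.6.1",
    "python": "2.3.7",
    "java": "2.3.15",
    "dotnet": "1.5.0",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.3.15' into (2, 3, 15) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supports_cloudwatch_defaults(runtime: str, installed: str) -> bool:
    """True if the installed library version is strictly newer than the threshold."""
    return parse_version(installed) > parse_version(LIBRARY_THRESHOLDS[runtime])
```

Comparing parsed tuples rather than raw strings matters here: `supports_cloudwatch_defaults("java", "2.3.9")` is false, whereas a plain string comparison would rank "2.3.9" above "2.3.15".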

Even after you switch to these versions, you can re-enable the Thundra loggers and metrics by setting thundra_agent_lambda_log_disable and thundra_agent_lambda_metric_disable to false.

What’s next?

With this update, we make monitoring with Amazon CloudWatch logs extremely simple. You can now see performance insights about your serverless apps much more easily than before, with no need for instrumentation. But we do believe that instrumentation is required for observability via traces. The next aim is to make observability simpler by making instrumentation simpler. For this reason, we have already started working on something we call “Simple monitoring from Thundra Console”. With this, our users will be able to instrument Lambda functions from the console and get rid of the manual process of instrumenting functions one by one. Plus, this will unlock many capabilities that require modifications to our library configurations. For example, you’ll be able to manage your chaos experiments from the Thundra console without seeing the function code itself. The future is bright and we are thrilled about it.

If you want to chat with us about any stage of your serverless journey, you can ping us on Twitter (@thundraio) or join our Slack and let’s chat. You can sign up for Thundra or see the live demo to see Thundra in action.