Hacker News
New Relic to open-source Pixie’s eBPF observability platform (pixielabs.ai)
345 points by htroisi on Dec 10, 2020 | 63 comments


My take is that eBPF is such a powerful technology that it holds the potential to fundamentally change how networking, observability, and security are delivered. Just as virtualization and now containers/Kubernetes challenged an entire industry built around big servers, the ability to safely embed programs in the Linux kernel with eBPF challenges every incumbent. We're seeing incumbent vendors look hard at the technology and make build-vs.-buy calculations. New Relic probably weighed acquiring Pixie this early against building internally or waiting to see how things developed for both the market and Pixie. I'm guessing they made a bet that the longer they waited, the more expensive an eBPF-based company would become. It was a good fit, so they pulled the trigger, de-risking things for both themselves and the Pixie team. (Note: I have no inside info; this is all just a guess.)


I wonder if there's anything in the pipeline for storage.


I heard NVMe over Fabrics was the thing that would transform storage: network storage at near the latency and throughput of local storage, but with more flexibility and redundancy.


Neat, I'd never heard of it. It does look interesting!


For those like me who had no idea what eBPF is — https://blog.pixielabs.ai/ebpf and https://blog.pixielabs.ai/ebpf-http-tracing/



For those who want to play with eBPF on Kubernetes using open-source tools, there are Cilium & Hubble [1], a plug-and-play CNI and network observability stack.

[1]: https://docs.cilium.io/en/v1.9/intro/


These iovisor guys write some cool stuff that uses bpf, like: https://github.com/iovisor/kubectl-trace


I'm curious to hear people's thoughts on eBPF generally, it seems likely that this is where observability companies are headed. It's non-trivial to implement but monitoring from the kernel layer makes so much sense that I expect the tooling will come along quickly.


eBPF is great, but it only works on Linux, and access to the kernel layer doesn't work in serverless environments, so it's definitely a piece of the puzzle but not a silver bullet IMO.


Co-founder/CEO of Pixie here.

eBPF is indeed a part of the puzzle. It allows us to access telemetry data without any manual instrumentation when running on Linux machines.

Pixie itself is extensible and currently ingests data from many other sources as well. Joining forces with New Relic will allow us to focus on expanding the open-source project, and also to expand our capabilities by plugging into other open APIs and frameworks such as OpenTelemetry, Grafana, and Prometheus.


I caught your presentation to GoSF a couple weeks ago -- it was very impressive and I'm looking forward to the opportunity to apply lessons learned from that.

https://github.com/pixie-labs/pixie/tree/main/demos/simple-g...

p.s. The slides would be nice to have too :-)



Hi,

Why do you need kernel support if you're modifying the binaries anyway? Why not insert a function that writes your logs, and then insert a call to that function, rather than relying on kernel support via an int 3?

Curious really.


Performance, plus lower-level metrics that tell you about the health of the system/host, regardless of what software is running.
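
For concreteness, here's a minimal BCC (Python) sketch of the kernel-supported uprobe approach being discussed: the kernel patches the probe site in the running process, so the binary on disk is never modified and nothing needs to be rebuilt or redeployed. Assumes BCC is installed and the script runs as root; tracing bash's readline() is just the classic illustrative target.

    from bcc import BPF

    bpf_text = """
    #include <uapi/linux/ptrace.h>

    int on_readline_ret(struct pt_regs *ctx) {
        char line[80] = {};
        if (!PT_REGS_RC(ctx))
            return 0;
        // Copy the string returned by readline() out of the traced process.
        // (On older kernels, bpf_probe_read is used instead.)
        bpf_probe_read_user(&line, sizeof(line), (void *)PT_REGS_RC(ctx));
        bpf_trace_printk("readline: %s\\n", &line);
        return 0;
    }
    """

    b = BPF(text=bpf_text)
    # Attach to the return of readline() in bash -- /bin/bash itself is untouched.
    b.attach_uretprobe(name="/bin/bash", sym="readline", fn_name="on_readline_ret")
    b.trace_print()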


Is there any technical reason why it couldn't work with serverless?


Because serverless still needs to run on a machine, and that machine is typically at least one of 1. shared with other users, in which case giving you kernel access would be a security issue, or 2. ephemeral (firecracker VM or such) in which case eBPF is... technically possible, but not nearly as useful (you go from "this server has had X events of type Y over the last 24 hours" to "this VM had X events happen in the 590ms before it was destroyed").


I see. I thought it could be used for something simple, like a load balancer/proxy with a bit of logic in it, but I guess it's too constrained to do something useful as a server.


What's the advantage of monitoring from the kernel layer? It's not jumping out at me...


At GitLab we've built and used strace-parser to help us do quick, deep debugging, but running strace is EXPENSIVE: https://gitlab.com/gitlab-com/support/toolbox/strace-parser

We've been eagerly waiting for some customers to adopt newer kernels so we can start leveraging eBPF, because of the performance gains in these types of scenarios.

Getting down to the kernel can often help find problems with disk access or network issues.

One of our staff engineers is exploring it now for NFS stats: https://gitlab.com/wchandler/tracing-tools/-/blob/master/nfs...

In Support Engineering we often straddle the line between 'SRE-style stare at graphs and configuration as code' and 'log on to the box and look at syscalls'. We are very, very excited about eBPF.


Take a look at Pixie dynamic logging [1].

It deploys eBPF kprobes (based on bpftrace) and uprobes (based on a custom front-end language), so you instantly get rich data (args, return values, latency) that you can query with a Pandas-like scripting language and a visual dashboard.

[1] https://docs.pixielabs.ai/using-pixie/code-tracing/
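
To give a feel for the "Pandas-like scripting language", here's a rough sketch of what a PxL script looks like. Illustrative only: I'm going from memory of the docs, so the exact table and column names are assumptions.

    import px

    # Last minute of automatically captured HTTP spans (no instrumentation added).
    df = px.DataFrame(table='http_events', start_time='-1m')
    df.latency_ms = df.http_resp_latency_ns / 1.0e6
    df = df[['http_req_path', 'http_resp_status', 'latency_ms']]
    # Render the result in the Pixie UI / CLI.
    px.display(df)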


A few things:

* Non-intrusive: you can snoop on an application's behavior without changing its code.

* Deep visibility: function-level and syscall/kernel-level probes reveal more context and are more accurate in many cases.

* Low overhead: everything runs inside kernel space, with no context switching compared to other kernel-based/kernel-aided tracing (see the sketch after this list).

* Expressiveness: eBPF is fairly expressive and can do many things that are usually exclusive to higher-level programming languages.
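
As a small illustration of the low-overhead point, here's a hedged BCC (Python) sketch where the counting happens entirely in a BPF hash map inside the kernel, and userspace only reads the aggregated totals afterwards (no per-event context switch or copy). Assumes BCC is installed and the script runs as root.

    from time import sleep
    from bcc import BPF

    prog = """
    #include <uapi/linux/ptrace.h>

    BPF_HASH(counts, u32, u64);               // pid -> number of openat() calls

    int count_openat(struct pt_regs *ctx) {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        counts.increment(pid);                 // aggregation stays in kernel space
        return 0;
    }
    """

    b = BPF(text=prog)
    # get_syscall_fnname() resolves the right kernel symbol for this kernel version.
    b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="count_openat")

    sleep(10)
    for pid, count in sorted(b["counts"].items(), key=lambda kv: -kv[1].value):
        print(pid.value, count.value)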


Tracing from userspace into the kernel and back again in a single pane of glass. Have you never heard anyone hype DTrace on Solaris? This allows building similar things on production systems with little to no impact on running production applications.


DTrace has also been on macOS since 10.5, with a pretty nice GUI app as well. I've used it to trace Ruby and Python code to isolate slow API requests from the end user's perspective.


That sounds really interesting. Do you have any references to share on the macOS GUI for DTrace?


The app is called Instruments. Check it out at /Developer/Applications/Instruments

https://developer.apple.com/library/archive/documentation/An...


Hadn't thought of this as being like DTrace. Thx for that!


Anytime! eBPF is Linux’s equivalent of DTrace


Strictly speaking, one of the uses of eBPF is as a DTrace-like tracing tool, but it's also used in quite a few other places (for instance, it's also used for device cgroup policy in cgroup v2).


And filtering syscalls for seccomp, or literal packet filtering. I believe Jens Axboe and team were looking at using eBPF in some of the low-level IO subsystems.

But I was explaining the most common and obvious (for a user) use of eBPF.

They should just look at the IO Visor project to see some of the stuff that can be done with it (disclaimer: I work with one of the IO Visor maintainers).


The kernel has a "god's eye view" of everything happening on its OS. With event-based tracing in the kernel, there's no chance of missing an occurrence because your sampling rate is too low, for example. You can also correlate and enrich data that just isn't available in userspace.


It's extremely quick to do it there as well. Originally, BPF was created as a fast, low-impact in-kernel packet filter for tools like tcpdump. People saw that it was a pretty neat mechanism, so they started extending it (the 'e' in eBPF). I think they've finally come round again and just call it BPF now.


For hardware management, maybe, but so much of "observability" in general is at the application layer that I can't see BPF displacing anything more than a tiny corner of it.

On the other hand, it is already overhauling service meshes, VPNs, firewalls, network security policies, etc. The stuff fly.io is debuting now is probably going to be standard in a few years.


Strongly disagree. BPF uprobes allow extremely fine-grained tracing of userspace applications, and let you programmatically correlate it with kernel-level information.


Sure, but "fine-grained tracing" is itself such a small part of observability that even if BPF takes over that entire part of the stack, it's still nowhere near a complete observability story.

This is outside my area of expertise, so maybe I'm just missing some deeper insight - and I can see where you're coming from for traditional "throw the artifact over the wall, good luck running it" system operations, or for debugging services in production, which tracing is a key part of - but again that's just a small part of observability, and one that the rest of your dev process should be actively trying to minimize. If you have any degree of DevOps going on, many key SLIs will be much higher-level (p99 of HTTP requests, MB of storage per customer), and I don't see how eBPF addresses those better than existing instrumentation, or in some cases at all.


Sure, but it's also terribly expensive. Are you trying to give up 5-10% of your CPU time to observability?


Gladly, if it doesn't influence latency much and it is presented intelligently in terms of my high level code.


Most CPUs are not hitting 100 percent anyways so what's a 10 percent loss?


This adds to some interesting acquisition activity in this area: Splunk, for example, just acquired Flowmill (also focused on eBPF + observability).

https://techcrunch.com/2020/11/24/splunk-acquires-network-ob...

[blatantly promoting my Substack] I've been following this area for about a year; I first wrote about some of the startups using eBPF here in late 2019: https://monitoring2.substack.com/p/ebpf-a-new-bff-for-observ...


I know what eBPF is, but what is Pixie? A single sentence describing Pixie would be useful.

In general, the academic rule applies: never write an article without defining your terms first.


Engineer from Pixie here. Pixie is an APM tool for Kubernetes leveraging eBPF for automatic data collection. You can use it to monitor and debug your application performance, without code or configuration changes.


You're being too modest. Pixie is a general purpose observability and monitoring tool, with especially strong attributes around APM.


APM? /s


/s noted (but point also taken). For those unfamiliar: APM = application performance monitoring. Essentially, figuring out why your application is slow or broken.


Couldn't this just be called 'monitoring'?


New Relic calls it something different so they can charge a lot more for it.



Any plans to bring this to Cloud Foundry?


I was not familiar with this project. The architecture documentation [1] indicates it uses Pixie Cloud. Is it possible to use this without Pixie Cloud?

[1] https://docs.pixielabs.ai/about-pixie/how-pixie-works/


Co-founder/CEO of Pixie here.

In its current form, Pixie Cloud is required, since it's a freemium product.

We are planning to open-source a self-managed version of Pixie early next year, which can be used completely self-hosted.


Thanks, I will be looking forward to this announcement as the current requirement would prevent me from using it. Cheers.


How helpful is this to some/any of hybrid/multi-cloud and serverless? Analysing most of the telemetry at the edge seems like it could be helpful there. Useful for instrumenting low-latency edge applications?


Seems ambitious. Why would a public company like New Relic do this?


Co-Founder/CPO of Pixie here.

This definitely is an unprecedented and forward-looking investment by New Relic. While ambitious, it became evident in our conversations with them that they were committed to standardizing on open-source telemetry standards such as Prometheus, OpenTelemetry, Grafana, etc.

We're not yet in a position to speak for them in detail, but we believe this bet on our project reinforces their plan to open-source the telemetry layer to accelerate the adoption of observability practices by developers.

Hope that adds some color! Would be great to hear more thoughts.


How does Pixie technology compare to Sysdig? Do they do similar things?


The underlying data collection approach is similar; however, our focus is on application performance monitoring for developers, and Sysdig's focus is on container-level security & monitoring for DevOps/DevSecOps (@Sysdig folks, please correct me if I am wrong :) ).

Sysdig was a pioneer in harvesting data from the kernel. Their original solution required installing a kernel module, and they are now moving to eBPF-based approaches. The Falco project is really exciting.

Since we're a relatively new project (started two years ago), we started with eBPF and built our platform around it. As we open source, we'll share with groups like Falco and hopefully collaborate.


At a guess, to commoditize their complement.


I don't see the source code on GitHub. When do you plan to open it up? Interested in digging into your engine code.


Co-founder/CEO of Pixie here.

We are preparing to open source early next year, so stay tuned. The code will be posted to our repository, which currently hosts our community scripts: https://github.com/pixie-labs/pixie.


Pixie scripts FTW.


What does signing a definitive agreement mean?


Pixie agreed to be acquired by New Relic.


Yea, the deal is not yet closed, but assuming no regulatory hold-up (New Relic is public, so if Pixie is worth enough, they'll need to go through HSR) or similar, they agree that the acquisition will go through.



