Hacker News
New Relic to open-source Pixie’s eBPF observability platform (pixielabs.ai)
345 points by htroisi on Dec 10, 2020 | 63 comments


My take is that eBPF is such a powerful technology that it holds the potential to fundamentally change how networking, observability, and security are delivered. Just as virtualization and now containers/Kubernetes challenged an entire industry built around big servers, the ability to safely embed programs in the Linux kernel with eBPF challenges every incumbent. We're seeing incumbent vendors look hard at the technology and make build-vs.-buy calculations. New Relic probably weighed acquiring Pixie this early against building internally or waiting to see how things developed for both the market and Pixie. I'm guessing they made a bet that the longer they waited, the more expensive an eBPF-based company would become. It was a good fit, so they pulled the trigger, de-risking things for both themselves and the Pixie team. (Note: I have no inside info; this is all just a guess.)


I wonder if there's anything in the pipeline for storage.


I heard NVMe over Fabrics was the thing that would transform storage: network storage at near the latency and throughput of local storage, but with more flexibility and redundancy.


Neat, I'd never heard of it. It does look interesting!


For those like me who had no idea what eBPF is — https://blog.pixielabs.ai/ebpf and https://blog.pixielabs.ai/ebpf-http-tracing/



For those who want to play with eBPF on Kubernetes using open-source tools, there are Cilium & Hubble [1], a plug-and-play CNI and network observability stack.

[1]: https://docs.cilium.io/en/v1.9/intro/


These iovisor guys write some cool stuff that uses bpf, like: https://github.com/iovisor/kubectl-trace


I'm curious to hear people's thoughts on eBPF generally, it seems likely that this is where observability companies are headed. It's non-trivial to implement but monitoring from the kernel layer makes so much sense that I expect the tooling will come along quickly.


eBPF is great, but it only works on Linux, and access to the kernel layer doesn't work in serverless environments, so it's definitely a piece of the puzzle but not a silver bullet IMO.


Co-founder/CEO of Pixie here.

eBPF is indeed a part of the puzzle. It allows us to access telemetry data without any manual instrumentation when running on Linux machines.

Pixie itself is extensible and currently ingests data from many other sources as well. Joining forces with New Relic will allow us to focus on expanding the open-source project, and also to expand our capabilities by plugging into other open APIs and frameworks such as OpenTelemetry, Grafana, and Prometheus.


I caught your presentation to GoSF a couple weeks ago -- it was very impressive and I'm looking forward to the opportunity to apply lessons learned from that.

https://github.com/pixie-labs/pixie/tree/main/demos/simple-g...

p.s. The slides would be nice to have too :-)



Hi,

Why do you need kernel support if you're modifying the binaries anyway? Why not insert a function that writes your logs, and then insert a call to that function, rather than relying on kernel support via an int 3?

Curious really.


Performance, plus lower-level metrics that tell you about the health of the system/host, regardless of what software is running.
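
For concreteness, here's a minimal BCC (Python) sketch of the kernel-supported uprobe approach being discussed: the kernel patches the probe site in the running process, so the binary on disk is never modified and nothing needs to be rebuilt or redeployed. Assumes BCC is installed and the script runs as root; tracing bash's readline() is just the classic illustrative target.

    from bcc import BPF

    bpf_text = """
    #include <uapi/linux/ptrace.h>

    int on_readline_ret(struct pt_regs *ctx) {
        char line[80] = {};
        if (!PT_REGS_RC(ctx))
            return 0;
        // Copy the string returned by readline() out of the traced process.
        // (On older kernels, bpf_probe_read is used instead.)
        bpf_probe_read_user(&line, sizeof(line), (void *)PT_REGS_RC(ctx));
        bpf_trace_printk("readline: %s\\n", &line);
        return 0;
    }
    """

    b = BPF(text=bpf_text)
    # Attach to the return of readline() in bash -- /bin/bash itself is untouched.
    b.attach_uretprobe(name="/bin/bash", sym="readline", fn_name="on_readline_ret")
    b.trace_print()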


Is there any technical reason why it couldn't work with serverless?


Because serverless still needs to run on a machine, and that machine is typically at least one of 1. shared with other users, in which case giving you kernel access would be a security issue, or 2. ephemeral (firecracker VM or such) in which case eBPF is... technically possible, but not nearly as useful (you go from "this server has had X events of type Y over the last 24 hours" to "this VM had X events happen in the 590ms before it was destroyed").


I see. I thought it could be used for something simple, like a load balancer/proxy with a bit of logic in it, but I guess it's too constrained to do something useful as a server.


What's the advantage of monitoring from the kernel layer? It's not jumping out at me...


At GitLab we've built and used strace-parser to help us do quick, deep debugging, but running strace is EXPENSIVE: https://gitlab.com/gitlab-com/support/toolbox/strace-parser

We've been eagerly waiting for some customers to adopt newer kernels so we can start leveraging eBPF, because of the performance gains in these types of scenarios.

Getting down to the kernel can often help find problems with disk access or network issues.

One of our staff engineers is exploring it now for NFS stats: https://gitlab.com/wchandler/tracing-tools/-/blob/master/nfs...

In Support Engineering we often straddle the line between 'SRE-style stare at graphs and configuration as code' and 'log on to the box and look at syscalls'. We are very, very excited about eBPF.


Take a look at Pixie dynamic logging [1].

It deploys eBPF kprobes (based on bpftrace) and uprobes (based on a custom front-end language), so you instantly get rich data (args, return values, latency) that you can query with a Pandas-like scripting language and a visual dashboard.

[1] https://docs.pixielabs.ai/using-pixie/code-tracing/
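
To give a feel for the "Pandas-like scripting language", here's a rough sketch of what a PxL script looks like. Illustrative only: I'm going from memory of the docs, so the exact table and column names are assumptions.

    import px

    # Last minute of automatically captured HTTP spans (no instrumentation added).
    df = px.DataFrame(table='http_events', start_time='-1m')
    df.latency_ms = df.http_resp_latency_ns / 1.0e6
    df = df[['http_req_path', 'http_resp_status', 'latency_ms']]
    # Render the result in the Pixie UI / CLI.
    px.display(df)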


A few things:

* Non-intrusive: you can snoop on an application's behavior without changing its code.

* Deep visibility: function-level and syscall/kernel-level probes reveal more context and are more accurate in many cases.

* Low overhead: everything runs inside kernel space, with no context switching compared to other kernel-based/kernel-aided tracing (see the sketch after this list).

* Expressiveness: eBPF is fairly expressive and can do many things that are usually exclusive to higher-level programming languages.
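
As a small illustration of the low-overhead point, here's a hedged BCC (Python) sketch where the counting happens entirely in a BPF hash map inside the kernel, and userspace only reads the aggregated totals afterwards (no per-event context switch or copy). Assumes BCC is installed and the script runs as root.

    from time import sleep
    from bcc import BPF

    prog = """
    #include <uapi/linux/ptrace.h>

    BPF_HASH(counts, u32, u64);               // pid -> number of openat() calls

    int count_openat(struct pt_regs *ctx) {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        counts.increment(pid);                 // aggregation stays in kernel space
        return 0;
    }
    """

    b = BPF(text=prog)
    # get_syscall_fnname() resolves the right kernel symbol for this kernel version.
    b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="count_openat")

    sleep(10)
    for pid, count in sorted(b["counts"].items(), key=lambda kv: -kv[1].value):
        print(pid.value, count.value)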


Tracing from userspace into the kernel and back again in a single pane of glass. Have you never heard anyone hype DTrace on Solaris? This allows building similar things on production systems with little to no impact on running production applications.


DTrace has also been on macOS since 10.5, with a pretty nice GUI app as well. I've used it to trace Ruby and Python code to isolate slow API requests from the end user's perspective.


That sounds really interesting. Do you have any references to share on the macOS GUI for DTrace?


The app is called Instruments. Check it out at /Developer/Applications/Instruments

https://developer.apple.com/library/archive/documentation/An...


Hadn't thought of this as being like DTrace. Thx for that!


Anytime! eBPF is Linux’s equivalent of DTrace


Strictly speaking, one of the uses of eBPF is as a DTrace-like tracing tool, but it's also used in quite a few other places (for instance, it's also used for device cgroup policy in cgroup v2).


And filtering syscalls for seccomp, or literal packet filtering. I believe Jens Axboe and team were looking at using eBPF in some of the low-level IO subsystems.

But I was explaining the most common and obvious (for a user) use of eBPF.

They should just look at the IO Visor project to see some of the stuff that can be done with it (disclaimer: I work with one of the IO Visor maintainers).


The kernel has a "god's eye view" of everything happening on its OS. With event-based tracing in the kernel, there's no chance of missing an occurrence because your sampling rate is too low, for example. You can also correlate and enrich data that just isn't available in userspace.


It's extremely quick to do it there as well. Originally, BPF was created as a fast, low-impact in-kernel packet filter for tools like tcpdump. People saw that it was a pretty neat mechanism, so they started extending it (the 'e' in eBPF). I think they've finally come round again and just call it BPF now.


For hardware management, maybe, but so much of "observability" in general is at the application layer that I can't see BPF displacing anything more than a tiny corner of it.

On the other hand, it is already overhauling service meshes, VPNs, firewalls, network security policies, etc. The stuff fly.io is debuting now is probably going to be standard in a few years.


Strongly disagree. BPF uprobes allow extremely fine-grained tracing of userspace applications, and let you programmatically correlate it with kernel-level information.


Sure, but "fine-grained tracing" is itself such a small part of observability that even if BPF takes over that entire part of the stack, it's still nowhere near a complete observability story.

This is outside my area of expertise, so maybe I'm just missing some deeper insight - and I can see where you're coming from for traditional "throw the artifact over the wall, good luck running it" system operations, or for debugging services in production, which tracing is a key part of - but again that's just a small part of observability, and one that the rest of your dev process should be actively trying to minimize. If you have any degree of DevOps going on, many key SLIs will be much higher-level (p99 of HTTP requests, MB of storage per customer), and I don't see how eBPF addresses those better than existing instrumentation, or in some cases at all.


Sure, but it's also terribly expensive. Are you trying to give up 5-10% of your CPU time to observability?


Gladly, if it doesn't influence latency much and it is presented intelligently in terms of my high level code.


Most CPUs are not hitting 100 percent anyways so what's a 10 percent loss?


This adds to some interesting acquisition activity in this area: Splunk, for example, just acquired Flowmill (also focused on eBPF + observability).

https://techcrunch.com/2020/11/24/splunk-acquires-network-ob...

[blatantly promoting my Substack] I've been following this area for about a year; I first wrote about some of the startups using eBPF here in late 2019: https://monitoring2.substack.com/p/ebpf-a-new-bff-for-observ...


I know what eBPF is, but what is Pixie? A single sentence describing Pixie would be useful.

In general, the academic rule applies: never write an article without defining your terms first.


Engineer from Pixie here. Pixie is an APM tool for Kubernetes leveraging eBPF for automatic data collection. You can use it to monitor and debug your application performance, without code or configuration changes.


You're being too modest. Pixie is a general purpose observability and monitoring tool, with especially strong attributes around APM.


APM? /s


/s noted (but point also taken). For those unfamiliar: APM = application performance monitoring. Essentially, figuring out why your application is slow or broken.


Couldn't this just be called 'monitoring'?


New Relic calls it something different so they can charge a lot more for it.



Any plans to bring this to Cloud Foundry?


I was not familiar with this project. The architecture documentation [1] indicates it uses Pixie Cloud. Is it possible to use this without Pixie Cloud?

[1] https://docs.pixielabs.ai/about-pixie/how-pixie-works/


Co-founder/CEO of Pixie here.

In its current form, Pixie Cloud is required, since it's a freemium product.

We are planning to open-source a self-managed version of Pixie early next year, which can be used completely self-hosted.


Thanks, I will be looking forward to this announcement as the current requirement would prevent me from using it. Cheers.


How helpful is this to some/any of hybrid/multi-cloud and serverless? Analysing most of the telemetry at the edge seems like it could be helpful there. Useful for instrumenting low-latency edge applications?


Seems ambitious. Why would a public company like New Relic do this?


Co-Founder/CPO of Pixie here.

This definitely is an unprecedented and forward-looking investment by New Relic. While ambitious, it became evident in our conversations with them that they were committed to standardizing on open-source telemetry standards such as Prometheus, OpenTelemetry, Grafana, etc.

We're not yet in a position to speak for them in detail, but we believe this bet on our project reinforces their plan to open-source the telemetry layer to accelerate the adoption of observability practices by developers.

Hope that adds some color! Would be great to hear more thoughts.


How does Pixie technology compare to Sysdig? Do they do similar things?


The underlying data collection approach is similar; however, our focus is on application performance monitoring for developers, and Sysdig's focus is on container-level security & monitoring for DevOps/DevSecOps (@Sysdig folks, please correct me if I am wrong :) ).

Sysdig was a pioneer in harvesting data from the kernel. Their original solution required installing a kernel module, and they are now moving to eBPF-based approaches. The Falco project is really exciting.

Since we're a relatively new project (started two years ago), we started with eBPF and built our platform around it. As we open source, we'll share with groups like Falco and hopefully collaborate.


At a guess, to commoditize their complement.


I don't see the source code on GitHub. When do you plan to open it up? Interested in digging into your engine code.


Co-founder/CEO of Pixie here.

We are preparing to open source early next year, so stay tuned. The code will be posted to our repository, which currently hosts our community scripts: https://github.com/pixie-labs/pixie.


Pixie scripts FTW.


What does signing a definitive agreement mean?


Pixie agreed to be acquired by New Relic.


Yea, the deal is not yet closed, but assuming no regulatory hold-up (New Relic is public, so if Pixie is worth enough, they'll need to go through HSR) or similar, they agree that the acquisition will go through.



