If it's just serving files you don't necessarily need DPDK/XDP. For server-grade hardware there are now P2P-DMA and TLS accelerators which can offload everything to peripherals while still using normal socket APIs. You get NVMe -(PCIe)-> crypto accelerator -(PCIe)-> ethernet for the bulk of the data.
Neither CPU nor main memory see any of the network packets as long as they stay on the happy path. Only connection setup, DMA orchestration and occasional TLS renegotiation have to be handled.
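To make the "normal socket APIs" part concrete, here is a minimal sketch assuming Linux with kernel TLS available. It only shows the application side: attach the "tls" ULP to a connected TCP socket, install the transmit key, then push the file with sendfile(2) so the payload never passes through userspace. The function names enable_ktls_tx and serve_file are made up for illustration, and the key material in the tls12_crypto_info_aes_gcm_128 struct is assumed to come from a handshake done in userspace by a TLS library. Whether the record encryption then runs on the CPU or on a NIC depends on the driver and hardware; nothing here sets up P2P-DMA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/tls.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/* Hypothetical helper: attach kernel TLS to a connected TCP socket and
 * install the transmit key. `ci` is assumed to be filled in from a
 * handshake done elsewhere. Returns 0 on success, -1 with errno set. */
int enable_ktls_tx(int fd, const struct tls12_crypto_info_aes_gcm_128 *ci)
{
    assert(ci != NULL);
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    return setsockopt(fd, SOL_TLS, TLS_TX, ci, sizeof(*ci));
}

/* Hypothetical helper: send a whole file over a blocking socket with
 * sendfile(2). Returns the number of bytes sent, or -1 on failure. */
ssize_t serve_file(int sock_fd, const char *path)
{
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0)
        return -1;

    struct stat st;
    if (fstat(file_fd, &st) < 0) {
        close(file_fd);
        return -1;
    }

    off_t offset = 0;  /* sendfile advances this for us */
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0 && errno == EINTR)
            continue;  /* interrupted, retry */
        if (n <= 0)
            break;     /* error, or file shrank underneath us */
    }

    close(file_fd);
    return offset == st.st_size ? (ssize_t)offset : -1;
}
```

On a socket where enable_ktls_tx succeeded, the same serve_file call sends TLS records instead of plaintext; the application code does not change.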
I know Chelsio has crypto directly on the NIC, but are dedicated crypto accelerator cards a thing and are they ever worth it? Why leave the CPU idle when the CPU itself is a good crypto accelerator (AES-NI, ARMv8 crypto)?
AMD Ryzen has a built-in crypto "decelerator" — a FreeBSD driver was written for the crypto engine, but it's disabled by default because it made everything slower than AES-NI. (Though I guess it would be funny to use it to mine bitcoin, since it supports SHA256. AMD — Advanced Mining Devices!)
Intel has a product line called QAT (QuickAssist Technology) that does crypto acceleration, as well as compression. I don't know how performant it is. There are definitely several older crypto accelerators that were faster than CPUs of the time; I don't know if any of them (outside of QAT) is still relevant.
The AMD Zen1 Crypto Co-Processor is indeed slower than AES-NI; I think it's mostly used by stuff like SecureBoot, TPM, etc., and also used internally by the CPU to generate RDRAND/RDSEED data. It was probably never intended to be used by OS drivers and certainly not intended to be any kind of accelerator.
The part I know of that is built into the CPU is a DMA engine called I/OAT; it just does DMA and maybe basic checksum and RAID transformations. It is sometimes confused with QAT (I've personally confused the two...).
The northbridge. My understanding is that they no longer sell discrete cards to perform these tasks, and instead offload them to chips that come on the boards.
From what I recall, the Chelsio cards only support a mode of encryption suitable for storage devices, and it's not something you'd use for streaming media.
No, not true. The Chelsio cards support GCM and CBC crypto in lookaside mode (like QAT) using the ccr(4) OCF driver, inline ("NIC TLS") with out-of-tree patches, and TLS offload in TOE mode.
Now that ktls is upstream, we are looking at using the ccr crypto acceleration. We've already tested them in inline mode. TOE is not an option for us, since we do innovation in the TCP stack.
In the work we've done, the TLS crypto of bulk data is handled entirely in the CPU by the kernel via ktls. Please see the slides, specifically the data flow diagrams.
I think he's talking about the general case (and since XDP is in the discussion, probably the general case for Linux) and not trying to speak specifically to what Netflix is doing for their CDN.