
Why the shyness about saying the brand's name? If their product and support were subpar to the point of blacklisting the entire vendor, it could be useful to spread the information to A) warn others of potential problems and B) put pressure on the vendor to improve their products.


I'm with you, but somehow a postmortem for our own outage seemed like the wrong place to name-and-shame a vendor...


If I had to guess, it's Broadcom. While their merchant switching ASICs (Trident+, Trident2) have become good enough to displace most custom spun ASICs for 10 Gbps and 40 Gbps switching, their NIC hardware has long been somewhat of a disaster. Interesting to note is that Broadcom has basically sold the NIC business to QLogic: http://www.broadcom.com/press/release.php?id=s832628


Given that context... Yes, it probably would be the wrong time and place to blame other people for a (semi) unrelated issue.


OTOH, almost every vendor has had a bad product at one time, so shaming tends to just give people ammunition for their existing irrational biases.


All vendors have bad products from time to time, but they deal with it differently. I can think of one vendor who covered up bugs in their silicon with undocumented driver hacks in Windows, and stonewalled Linux kernel devs on the nature of the faults, for example.


Storytime? What happened?


Since I don't have deep pockets I shan't name the vendor, but it started out with us taking delivery of a bunch of blades where our blade supplier had changed the NIC from one vendor to another. We discovered we had repeatable problems with J2EE cluster traffic over UDP - we'd see packet loss rates go to absurd levels as we ramped up load on the cluster, leading to a situation where a node dropping out and then rejoining would cause the cluster to lock up trying to bring the node back into the fold. We could repeat the massive packet loss using UDP test tools. Rather ugly.

Coincidentally we had a visit from the CTO of a high-performance storage vendor who happen to have a bunch of kernel hackers on their staff. We mentioned our problem in passing and he explained how they'd nearly lost a major contract because a deployment had moved their customer from being storage-bound to initially untraceable data loss. Digging around by their kernel hackers showed the NIC was losing data. After a certain amount of to-ing and fro-ing the vendor moved from denying the problem to admitting that their silicon had a defect that would throw away data under load, and that their Windows drivers tried to spackle over the problem. They were relying on no-one being able to drive enough load through the card to cause a problem.

This dovetailed with our experience, and we found that installing a different manufacturer's card in the blade let us work around the problem. Our blade supplier moved to a new NIC vendor subsequently.
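
For anyone wanting to try the same kind of check, here is a minimal sketch of a UDP loss test in Python. It is not the tool we used; the function name `measure_udp_loss` and its parameters are made up for illustration. It sends sequence-numbered datagrams and counts how many distinct ones arrive. As written it runs over loopback, where any loss comes from socket buffer overflow, so to exercise a NIC you would have to split the sender and receiver across two hosts.

```python
import socket
import struct
import threading


def measure_udp_loss(count=1000, payload_size=512, host="127.0.0.1", idle_timeout=0.5):
    """Send `count` sequence-numbered datagrams; return (sent, received, loss_rate)."""
    if count <= 0:
        raise ValueError("count must be positive")

    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind((host, 0))  # port 0: let the OS pick a free port
    receiver.settimeout(idle_timeout)
    address = receiver.getsockname()
    seen = set()

    def receive():
        # Stop once every sequence number has arrived, or the link goes idle.
        while len(seen) < count:
            try:
                data, _ = receiver.recvfrom(65535)
            except socket.timeout:
                break
            if len(data) >= 4:
                (seq,) = struct.unpack("!I", data[:4])
                seen.add(seq)

    worker = threading.Thread(target=receive)
    worker.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    padding = b"\x00" * max(0, payload_size - 4)
    for seq in range(count):
        # 4-byte big-endian sequence number, padded to the requested size.
        sender.sendto(struct.pack("!I", seq) + padding, address)
    sender.close()

    worker.join()
    receiver.close()
    received = len(seen)
    return count, received, (count - received) / count


if __name__ == "__main__":
    sent, received, loss = measure_udp_loss(count=5000, payload_size=1400)
    print(f"sent={sent} received={received} loss={loss:.2%}")
```

Ramping up `count` and `payload_size` is the crude equivalent of "ramping up load"; a real test would also pace the sender at a target rate and report reordering, which this sketch ignores.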



