
This sounds like a BGA issue. They also had this on the PS3/4 and Xbox, if I'm not mistaken.

The chip has a grid of small solder balls on the bottom instead of pins sticking out. Uneven heating of the device during operation puts mechanical stress on some rows of balls, and the solder can crack. When a joint cracks, the contact disconnects from the board. Some images here:

http://www.playbackups.com/Playstation3-xbox-360-repair-repa...

When it's cooled again the contacts join. Placing it in the oven applies an even thermal load across the entire board and you basically anneal the cracks.



Yes exactly.

This happens all too often unfortunately. It's why you didn't see anything BGA packaged in the defence industry for a number of years -- they are not mechanically stable. I did have a reference for this but I can't find it now.

Also the multi-layer boards tend to bend when you repeatedly heat/cool them, resulting in the actual metal traces cracking inside.

Sometimes there's enough contact after this oven cycle for it to reconnect BGA packages and board traces semi-reliably, but like hell I'd rely on this method for long-term stability.

I did a spell post-university reworking things that pick and place machines had screwed up, and it was pretty much entirely packages like BGAs where there were arrays of solder connections. The production guys were always returning prototype devices due to mechanical problems on the boards as well, and they were coming back with socketed LGAs and soldered PGAs.


> This happens all too often unfortunately. It's why you didn't see anything BGA packaged in the defence industry for a number of years -- they are not mechanically stable. I did have a reference for this but I can't find it now.

There is also a problem with rework. BGAs are hard to get off the board without destroying the board in the process, especially on multi-layer boards. For cheap commercial boards where the automatic decision is to scrap the board when the chip fails, that's fine. For $10k+ circuit card assemblies on a low volume defense production line, scrapping the board is a last resort. This applies to production defects as well as field returns.

> Also the multi-layer boards tend to bend when you repeatedly heat/cool them, resulting in the actual metal traces cracking inside.

Another problem is delamination (separation of the board layers). Delamination allows contaminants to get in on the traces and possibly start shorting things out. It seriously degrades the reliability of the board. That was the biggest problem for us when trying to rework CCAs. We had no BGAs, but we did have a card that used a few parts with thermal pads on the bottom. It took heat from both sides of the board to get the chip off, and it was very easy to apply too much heat and delaminate the board in the process.


The rework gear is also a pain. Definitely the opposite of hackable.


For anyone who doesn't know what BGA stands for, it is Ball Grid Array.

http://en.wikipedia.org/wiki/Ball_grid_array


And from a manufacturing/engineering point of view (I'm in neither field), I guess it's really hard to solve this problem.

You have some unpredictable 1/100 or 1/1000 defect that occurs long after production and sale.

Just how do you go about isolating the cause and testing a solution? Make 5 changes, put each through a production batch of 1000 units, and then do accelerated testing? If 5 fail from one batch, and 2 from the rest, is there even enough statistical power to confirm that you've come across a solution? And you just burned through 5000 units.

Sounds like fun trying to solve this kind of problem.
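
On the statistical power question above: one rough way to look at it is a one-sided Fisher exact test on the two failure counts. The sketch below is only my reading of the scenario (5 failures in one batch of 1000, 2 failures across the other 4000 units). The function name is made up for illustration, and it uses nothing beyond Python's standard library.

```python
from math import comb


def one_sided_p(fail_a: int, n_a: int, fail_b: int, n_b: int) -> float:
    """Hypergeometric tail probability (one-sided Fisher exact test).

    Assumes every unit is equally likely to fail, and returns the chance
    that batch A (n_a units) ends up with at least fail_a of the
    fail_a + fail_b total failures purely by luck.
    """
    total_units = n_a + n_b
    total_fail = fail_a + fail_b
    ways_total = comb(total_units, n_a)  # ways to choose which units are in batch A
    most_possible = min(total_fail, n_a)
    ways_tail = sum(
        comb(total_fail, k) * comb(total_units - total_fail, n_a - k)
        for k in range(fail_a, most_possible + 1)
    )
    return ways_tail / ways_total


# Hypothetical reading of the numbers in the comment above.
print(one_sided_p(5, 1000, 2, 4000))
```

For those counts the tail probability comes out at roughly 0.005, so a split like that is unlikely to be pure chance. The catch is that if you pick the odd batch out of five after seeing the results, you need a multiple-comparison correction, and with counts this small one or two failures either way changes the answer a lot.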


There are PCB design/layout rules that deal with BGAs. I'm not saying it's a 100% guarantee, but (much like EMC/EMI design rules) there are a lot of solid pointers that remove 90% of the issues. The remaining 10% are (again, much like EMC/EMI) subject to the layout engineer's level of experience.

Currently on mobile, can't link a PDF right now, but if you Google "BGA PCB layout guidelines" you'll get a ton of documents.

Lastly: PCBs go through several optimization cycles, and for high-volume stuff some occur after release. There are always revision numbers on the silkscreen; sometimes they catch an issue like this after thousands of devices are in the wild and do an update.


To add to what jmpe says:

In production you would profile the boards. You take a board and run it through the oven with some thermocouples. You'd then set the temperatures of the pre-heat, heat, and cool-down sections. This would heat the solder to its melting point without putting too much stress on the components.

This is from memory from a long time ago using a teeny tiny little pick and place machine that did a few thousand components per hour.
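
To make the profiling step a bit more concrete, here is a minimal sketch of summarizing one thermocouple trace from a profiling run. Everything in it is an assumption for illustration: the function name, the sample trace, and the idea of looking at peak temperature and ramp rates are mine, not numbers from any real oven, paste datasheet, or component spec.

```python
from typing import Dict, List, Tuple


def summarize_profile(samples: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize a thermocouple trace given as (seconds, deg C) pairs.

    Samples must be sorted by time with strictly increasing timestamps.
    Returns the peak temperature and the steepest heating and cooling
    rates between consecutive samples, in deg C per second.
    """
    peak = max(temp for _, temp in samples)
    rates = [
        (temp1 - temp0) / (t1 - t0)
        for (t0, temp0), (t1, temp1) in zip(samples, samples[1:])
    ]
    return {
        "peak_c": peak,
        "max_heating_c_per_s": max(rates),
        "max_cooling_c_per_s": -min(rates),
    }


# Hypothetical trace: pre-heat, heat, then cool-down.
trace = [(0, 25.0), (60, 150.0), (120, 180.0), (150, 240.0), (180, 200.0), (240, 60.0)]
print(summarize_profile(trace))
```

You would compare those figures against whatever limits the solder paste and component datasheets give, then adjust the oven zone temperatures and run the board through again.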

BGAs were always always horrible to do.

"Design for production" is really very important and it's hard to find much information about it. Some simple little things can make the difference between an operator having to plonk a component on the board by hand every time just before it goes into the oven or having the machine do it. (Again, from memory).



