FPGA and Xeon combined in one socket (theregister.co.uk)
89 points by jsnell on June 19, 2014 | 72 comments


The barrier to FPGA adoption is not so much the hardware as the toolchain. The toolchains are closed, frequently Windows-only, slow, and not friendly to newbies.

(Not that shader languages or CUDA are that accessible, but you can play with shaders in a browser now. The time to "hello world" or equivalent isn't too bad.)


I argue the barrier to FPGA adoption is the actual use case, not the tools. People who need them can use the tools just fine. FPGA designers are not idiots. They do not walk around scratching their heads wondering, "how come nobody is using our products? If only they were easier to use!" A new open-source attempt at replacing the incumbent tools is begun, and then dropped, every couple of years.

The use cases for FPGAs are a much harder impediment to adoption. Many people get "FPGA boners" when they even hear the word, fancying themselves "chip designers," but practical use cases are much rarer. As evidence, notice that FPGAs predominate in the military world, where budget is less of an issue than in the commercial world.

The technical issue with FPGAs is that they are still one level abstracted from any CPU. They are only valuable for problems where some algorithm or task can be done with specific logic more quickly than on a CPU, given (1) the performance hit of reduced real estate, (2) a reduced clock speed relative to a CPU, and (3) a higher price than a CPU.

Further diminishing their value is that any function important enough to require an FPGA can more economically get absorbed into the nearest silicon. For example, consider the serialization/deserialization in audio/video codecs. That used to be done in FPGAs, but got moved onto a standard bus (SPI) and absorbed into codecs and CPUs.

Because of this rarity, experienced engineers know that when an FPGA is introduced to a problem in practical reality, it's a temporary solution (most often to meet time-to-market). This confers a degree of honor, which is why people get so emotionally aroused about FPGAs.

You can bet, though, that if whatever search function Microsoft is running on those FPGAs proves to be useful, it will soon be absorbed into a more economical form, such as an ASIC or, more likely, additional CPU instructions.

Really, an install on 1,600 servers, such as this article reports, is not that impressive, and it is certainly only a prototype rollout.


Ok, I'll take the counter argument.

FPGAs promise the designer 'arbitrary logic' and deliver 'a place others sell into.'

I disagree that FPGA experts "like" the tools they are given; they tolerate them. One of my friends worked at Xilinx for 15 years and understood this all too well. He felt the leading cause of the problem was that the tools group was a P&L center: it needed to turn a profit in order to exist. It got that profit by charging high prices for the tools and high prices for support. His argument was that 'easier' tools cut into support revenue.

When I've had high-level (E-level, but not C-level) discussions with Xilinx and Altera there has been a lot of acknowledgement about the 'difficulty of getting up to speed' on the tool chain, and many free hours of consulting are offered. From a business engagement point of view, making hard-to-use tools and then "giving away" thousands of dollars of free consulting to the customer to gain their support seems to work well. The customer feels supported, and stops wondering why, if they have consultants around for free, those consultants wouldn't just make the tools more straightforward to use and available on a wider variety of platforms.

But the biggest thing has always been intellectual property. You buy an STM32F4 and it has an Ethernet MAC on it (using Synopsys IP, as evidenced by the note in the documentation); you pay $8 for the microprocessor, work around the bugs, and get it running. If you buy an FPGA, let's say a Spartan-3E, you pay $18 for the chip, and if you want to use that Synopsys Ethernet MAC?[1] $25,000 for the HDL source to add to your project, or $10,000 if you are OK with just the EDIF output, which can be fed into a place-and-route back end. Oh, and some royalty if you ship it in a product you are selling.

The various places that have been accumulating 'open' IP, such as OpenCores (http://opencores.org/), have been really helpful for this, but it really needs a different pricing model, I suspect. A lot of HDL is where operating-system source was back at the turn of the century: locked down and expensive.

[1] I did this particular exercise in 2005 when I was designing a network attached memory device (https://www.google.com/patents/US20060218362) and was appalled at the extortionate pricing.


> From a business engagement point of view, making hard to use tools and then "giving away" thousands of dollars of free consulting to the customer to gain their support seems to work well.

To me, the entire recent history of the computer industry (well, all of it is recent, BTW) shows that if you want your technology to become mass-adopted, you need to make it easier for the little guy to get in the game. The high school kid tinkering with stuff in the parents' basement; the proverbial starving student. That's how x86 crushed RISC; that's how Linux became prominent; that's how Arduino became the most popular microcontroller platform (despite more clever things being available).

You make the learning curve nice and gentle, and you draw into your ranks all the unwashed masses out there. In time, out of those ranks the next tech leaders will emerge.


I don't disagree, and I suggested as much to the Xilinx folks (well, their EVP of marketing at the time): if they just added $0.25 to the price per chip they could fund the entire tools effort with that 'tax', and since they would be 'giving away' the tools they could re-task all of the compliance guys who were ensuring that licenses worked or didn't work into building useful features.

Their counter, of course, is that they have customers who sweat the $0.25 difference in price. Which I understand, but $10,000 in tools and $15,000 in consulting a year is a hundred thousand chips, to which they say, "oh, at that volume we would waive the tooling cost." And that got me back to your point: you already have their design win, so why give them free tools? Why not give free tools to those who have yet to commit to your architecture?

It is a very frustrating conversation to have.


What's your opinion on the new Xilinx C-based design tools, and Altera's OpenCL tools for doing compute acceleration?


I haven't used either of them. I played around with SystemC a bit when it was all the rage, but found that my issues weren't in optimizing some bit of C code with a better opcode; rather, they were in assembling a system with the peripherals I wanted in the places I wanted them.

For a long time I considered soft CPUs a bad idea (the Stretch guys kept trying to sell me on them, but since I wasn't really doing things like deep packet inspection I didn't have a good use case; even RAID algorithms on them were better handled by pretty generic DSP-type architectures). However, in playing with the Zedboard, which has a couple of Cortex-A9s attached to the Xilinx fabric, I find some interesting things there, if only as a new kind of I/O that is neither I/O-port based nor memory-map based (it expresses as memory, but it feels different from the memory mapping of old on the PDP/VAX machines and 68K systems). Could just be nostalgia though.


In the video industry, FPGA-based devices are used relatively widely: multiplexers, encoders, satellite and cable devices, etc. And they don't do "everything"; usually there is a general-purpose CPU (POWER, x86, ARM, etc.) as a controller, which also hosts the OS and all the software, plus lots of specialized FPGAs. In one device there can be one low-power dual-core CPU (a Celeron or an old PPC) and tens of top-level monster FPGAs like the Stratix V.

Reconfiguration is also very much needed: customers often experience network and other specific problems that are only fixable in hardware. So we need configurable packet processors (basically a top-level router on a chip), a configurable processing FPGA, lots of small FPGAs, etc. A bug fix for one customer is then gradually distributed to everyone. And device lifespans are long: many customers use 5 or even 10 year old hardware. The performance is enough for them, and new functionality is provided to them, partially by way of new FPGA firmware.

PS: some devices, especially cable or satellite, are sold in the thousands by several companies, so it's not unique hand-made hardware.

PPS: of course, the cost of buying a toolchain or a Windows PC doesn't really matter in such companies. Finding talented FPGA designers is way harder, as far as I understand.


And, according to the research paper (http://research.microsoft.com/pubs/212001/Catapult_ISCA_2014...), the benefits aren't especially convincing:

- 95% throughput increase at same tail latency
- 29% tail latency improvement at same throughput

I would have expected factors or orders of magnitude.


Thanks. Interesting paper.

Further, this increases total cost of ownership by 30%, so the performance improvement is just about 70%.


A big advantage that FPGAs have over ASICs, aside from time-to-market and one-time (fab) costs, is reconfigurability. Take a few seconds to stream in a new bitfile and suddenly you have a different chip. This seems like an obvious need in something like a datacenter full of FPGAs: unless you've purpose-built the thing for one or a small set of algorithms that never change, you want the ability to deploy a completely new logic design tomorrow or next week or next year.


What type of logic are you going to be changing every other week in the datacenter?

What chillingeffect is saying (and I agree) is that this logic should be running on the cpu.


Actual chip design needs FPGAs for hardware emulation prior to fabrication. Yes you'll change the logic out on a weekly, daily, or even hourly basis.


Yes and no. A small ASIC can be prototyped on an FPGA, but FPGAs are radically smaller and slower than high-end chips.


Well yes, of course, but that's a case where (as mentioned above) the FPGA is a temporary stand-in for a final chip.


Except that the MS Catapult results seem to suggest that you can get a substantial timing bump without much power overhead by using a second network of FPGAs running concurrently with the CPUs.

And in the case of Catapult, they'll be refactoring the algorithm to represent changes to their search feature matching, which is the majority of what they offloaded to FPGAs.


Yes, the Bing example seems to give real-world validation of all this.

In general it seems that hardware vs. software fit is orthogonal to concerns such as frequent reconfiguration. Some algorithms simply match hardware well (high concurrency, low-complexity control flow, able to stream through data without complex state). These are often algorithms that do not fit general-purpose CPUs well (cache hierarchy wasted on streaming; lots of control overhead; low core counts relative to FPGA-level parallelism). Some of these algorithms may be for specialized and/or frequently-changing applications such that they should not be burned into an ASIC that will live in a datacenter for 3-5 years.
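
For a concrete flavour of that fit, here is a rough Verilog sketch (module and signal names are hypothetical, not from the paper) of the kind of streaming logic meant above: a multiply-accumulate pipeline that consumes one sample per clock, keeps no cache, and has almost no control flow.

    module mac_pipe (
        input  wire               clk,
        input  wire               rst,
        input  wire signed [15:0] sample,
        input  wire signed [15:0] coeff,
        output reg  signed [39:0] acc
    );
        // Pipeline registers: a new sample enters every clock and
        // three results are in flight at any moment.
        reg signed [15:0] s_r, c_r;   // stage 1: capture inputs
        reg signed [31:0] prod;       // stage 2: multiply

        always @(posedge clk) begin
            if (rst) begin
                s_r  <= 16'sd0;
                c_r  <= 16'sd0;
                prod <= 32'sd0;
                acc  <= 40'sd0;
            end else begin
                s_r  <= sample;
                c_r  <= coeff;
                prod <= s_r * c_r;
                acc  <= acc + prod;   // stage 3: accumulate
            end
        end
    endmodule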


> any function important enough to require an FPGA can more economically get absorbed into the nearest silicon. For example, consider the serialization/deserialization in audio/video codecs. That used to be done in FPGAs, but got moved onto a standard bus (SPI) and absorbed into codecs and CPUs.

But isn't that because, back then, CPUs simply weren't fast enough for decent video coding? And they still aren't blazing fast for that purpose - compared to GPUs (see GPU-enabled video encoders).

I think the main argument against FPGAs is that it's still a chore to re-purpose them. Sure, it's not as painful as making a new chip from sand, but it's harder than applying code changes to software running in production on CPUs.

If it's a relatively simple algorithm, that changes rarely, where speed is the main bottleneck, that's a very promising scenario for an FPGA.


FPGA in the datacenter may be a red herring; a relatively small percentage of digital circuits are computers.

FPGAs are widely used in control circuits for industrial applications, for example.


It is instructive to note where FPGAs are used (and yes, they are used). For example, lower-priced oscilloscopes. Too small volume to merit an ASIC, too high performance to use a CPU, and too fast to interface directly to the central DSP.


FPGAs really only make sense when you need to move a metric assload of data around in a hurry while simultaneously running "embarrassingly parallel" logic... and even then, they are rarely the most economical approach in the long run. As a rule they are replaced by ASICs in specialized high-volume applications and by CPUs in less specialized ones.

There's nothing better for prototyping and proof-of-concept work, though. And they'll always have a place in low-volume applications that aren't cost-sensitive.


They are also replacing TTL circuits in small volume command and control applications. Circuits with dozens or hundreds of 7400 series ICs wired together in Byzantine ways.


Actually, the build chain for both Xilinx and Altera works quite well on Linux; in fact, I have heard that some consider it superior because of memory-management issues.

ModelSim is available on Linux as well, but only their more expensive SE product. They charge a premium for the Linux platform.


My experience (http://www.cl.cam.ac.uk/research/security/ctsrd/cheri.html, http://netfpga.org/10G_specs.html) is that both of the toolchains are terrible on both of the platforms.

Xilinx and Altera keep trying to convince us that they are software vendors, when in actual fact they are hardware vendors. Rarely is a company good at both.


That the toolchains run on Linux at all at least prevents one from being forced to use Windows, which is a plus, but yes, the state of FPGA tools is pretty horrible.

It reminds me of the days of proprietary vendor compilers where every platform vendor had their own subtly incompatible and/or differently buggy C or C++ compiler. In this way the FPGA development model is a decade or two behind software workflows.


I explored the Xilinx tools extensively during graduate school. They were by far the most complicated thing I've ever had the misfortune of using. I remember there being far too many tools and features that overlap in function and purpose, to the point that none of the tools were great.


I guess I was mainly trying to say that the Linux version was similar to the Windows version, it's not Windows-only as the previous comment indicated. But right, your other comments are all valid.


Which tools are Windows only? I have developed for FPGAs from all major vendors exclusively on Linux.

But closed source expensive tools are a problem.


And expensive hardware. Last time I ran the back-of-the-envelope numbers (admittedly 2-3 years ago) they weren't competitive with GPUs on price/flops and price/iops. Three caveats:

1. I used DigiKey prices from the largest bulk tier. Presumably if you're actually ordering that much you can get them right from the supplier for cheaper.

2. Latency! You can make it almost arbitrarily low on FPGAs.

3. Direct interface with the hardware.

I'd love to see a real cost/benefit analysis by someone with skin in the game because mine was pretty simplistic.


> Last time I ran the back-of-the-envelope numbers (admittedly 2-3 years ago) they weren't competitive with GPUs on price/flops and price/iops.

To me, that sounds more like the problem was not a good fit for FPGA hardware.

I'm just speculating, though.


Yes, that's exactly what I'm saying. FPGAs simply weren't competitive for applications that were limited by sheer floating-point/integer throughput. It's not that they weren't good for a single very narrow application, they weren't good for a very broad swath of compute-intensive applications. That explains the low adoption rates.


I saw Altera's OpenCL devkit[0] for the FPGA recently. That would smooth out the toolchain issue a bit.

[0]http://www.anandtech.com/show/7334/a-look-at-alteras-opencl-...


This gets even bigger if they throw their IP muscle behind it like they do with ICC. If you can get (for pay or free) fast matrix multiply, FFT, crypto, etc. cores for the FPGA, you will see even faster adoption.

If they're clever enough to make some of those IP cores available to, say, MATLAB, adoption will be faster still.

Nothing sells hardware more easily than "do no extra work but spend another couple of grand and see your application speed up significantly."


Can you elaborate on what you're trying to say?

MATLAB already has MATLAB->HDL, which works very well. We have a team that uses it exclusively for FPGA programming.


MATLAB will recognize if you've got FFTW or ATLAS or other highly tuned numerical libraries installed. And MATLAB will then use them whenever possible.

If Intel does a good enough job of providing a collection of compute kernels and the surrounding CPU libraries to make using them roughly as "easy" as CUDA, then a lot of people will pick that up.

I don't have any hard numbers but I would suspect that there are a great many more people who use MATLAB on a CPU than those who do MATLAB->HDL. So what I'm speculating about is that Intel might support those folks who use MATLAB on a CPU for more general purpose things.

Does that make more sense?


And those libraries will likely be OpenCL-based and nicely portable to Intel's Xeon Phi options.


"Intel reveals its FrankenChip ARM killer: one FPGA and one Xeon IN ONE SOCKET

Scattered reports of maniacal cackling amid driving rain and lightning at Chipzilla's lab"

Is this just a Register thing, or do all UK rags use this kind of unprofessional hyperbole? It's literally the most annoying thing in the world.


The Register is 50% satire and 50% tech news. Don't take it seriously.


This is typical UK, and for me it's what makes reading UK computer magazines like the old Computer Shopper so interesting.

There is always some kind of British humour in the articles.


This is their competition: http://www.xilinx.com/products/silicon-devices/soc/zynq-7000...

Zynq has been out and working in industry for a couple of years now.


I don't know how much competition they're giving Xilinx, but Altera is doing the same thing with the same high performance ARM core: http://www.altera.com/devices/processor/soc-fpga/overview/pr... As the fine article notes, Intel is now doing some fabbing for Altera.

I've gotten the impression that putting a general purpose CPU in the corner of an FPGA was a pretty standard thing.

One of the things that should differentiate this new effort from Intel is FPGA "direct access to the Xeon cache hierarchy and system memory," per Diane Bryant, general manager of Intel's data center group.


Direct cache access sounds cool, but I'm sort of under the impression that Altera and Intel are playing catchup with the FPGA SoC idea. I believe Xilinx was the first mover by a large margin, though I could be wrong. That doesn't mean Xilinx's Zynq will always be the best product, but it is already for sale right now, is an established product, and works well.


FPGAs with CPU cores have been around for over a decade. The difference is that in the past, the industry has mostly focused on the PowerPC core. Now that ARM has such tremendous popularity, it makes sense to focus on it.


It made sense for Xilinx to use PowerPC in the past; the chips were being fabbed by IBM.


I have always wanted to learn Verilog. However, I find it quite different from a typical programming language such as C or Java. What is the best way for someone with programming experience to learn Verilog?


The first thing to know is that Verilog is not a programming language. It is a hardware description language. This may sound picky, but it fundamentally changes the way you need to think about using the language. With Verilog/VHDL, you describe a circuit, which requires very different thinking from programming languages, where you describe a sequence of instructions.
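
A tiny, hedged illustration of that difference (hypothetical module, not from any particular tutorial): the assign and the always block below are not executed one after the other; they describe two pieces of hardware that operate at the same time.

    module blink (
        input  wire       clk,
        input  wire       rst_n,
        output reg  [7:0] count,
        output wire       msb
    );
        // Combinational logic: msb tracks count[7] continuously.
        assign msb = count[7];

        // Sequential logic: updated on every rising clock edge,
        // concurrently with the assign above.
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)
                count <= 8'd0;
            else
                count <= count + 8'd1;
        end
    endmodule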

The other thing to know is that HDLs are mostly the domain of electrical engineers and hence have suffered from a lack of any "computer science" in them. The languages and all of the tools are clunky and reminiscent of 1970s/1980s-style programming, from when CS and EE diverged. Hence, do not expect to find decent online tutorials or freeware source code available. It's all locked up and proprietary, as with all other EE tools.

The best place to start is with a textbook; this one (http://www.amazon.com/Fundamentals-Digital-Logic-Verilog-Des...) is a nice introduction to digital design with examples in Verilog.

Personally I prefer VHDL, and recommend this fantastic introduction to it (http://www.amazon.com/Circuit-Design-VHDL-Volnei-Pedroni/dp/...).

To make either of these useful, you will need a hardware platform and some tools to play with. The DE1/DE2 is a reasonably priced entry board with plenty of lights, switches and peripherals to play with, and is well matched with the textbooks above.

http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=E...


Very true. The lack of implicit sequencing is the hardest thing for people from an imperative programming background to understand. You need a different set of conceptual structures, such as the state machine and the pipeline. This is why the various C-to-Verilog tools remain niche: they can translate behaviour but not redesign for you.
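
As a sketch of what that explicit sequencing ends up looking like (names are made up), here is a minimal two-state handshake machine; the order of events lives in the state register, not in the textual order of the statements.

    module handshake (
        input  wire clk,
        input  wire rst_n,
        input  wire req,
        output reg  ack
    );
        localparam IDLE = 1'b0, BUSY = 1'b1;
        reg state;

        always @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                state <= IDLE;
                ack   <= 1'b0;
            end else begin
                case (state)
                    // Wait for a request, then acknowledge it.
                    IDLE: if (req)  begin state <= BUSY; ack <= 1'b1; end
                    // Hold the ack until the request goes away.
                    BUSY: if (!req) begin state <= IDLE; ack <= 1'b0; end
                endcase
            end
        end
    endmodule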

I also agree that it's stuck in the 70s, comparable to Fortran 77 or ALGOL. Bundling related signals and functionality together to produce something corresponding to an "object" or a "type" is basically impossible. All sorts of errors that could be caught automatically or discouraged by language design aren't. There's a lot of typing, and necessary duplication of effort. IDEs don't help much.

Heavy unit and system testing is fortunately widespread, because it's the only way to ensure you end up with something that actually works.

I have a back-burner project to design a more modern language that compiles to Verilog which would make this sort of thing much more accessible.


1. I recommend that a newbie get a DE0-Nano board. It's much cheaper than the DE1/DE2 and has fun sensors like an accelerometer on it, which can lead to pretty cool applications. I designed a quadcopter control system entirely on the DE0, using a NIOS II-based Qsys system. The academic price is only $59: https://www.terasic.com.tw/cgi-bin/page/archive.pl?No=593

2. Fun fact: the cover of the Fundamentals of Digital Logic book has chess on it because the author, Zvonko Vranesic, is not only a father figure in the FPGA/CAD industry, he is also a chess International Master. Also, he's quite good at ping pong for being 76 :(


Depends what you're doing -- the DE0-Nano is nice for integrating into larger projects, but if you want to, say, implement a little CPU and peripherals, something like the plain ol' DE0 is only slightly more expensive and has a bunch of buttons/switches/LEDs/7-seg displays for just-getting-started projects, as well as handy I/O interfaces like VGA, PS/2 mouse/keyboard, etc.


> I designed a quadcopter control system entirely on the DE0, using a NIOS II based Qsys system.

Any room left on silicon for things like computer vision (well, simple stuff, like recognizing a red ball), or is the whole thing pretty much dedicated to flying the quad?

Also, could you share your design?


Also, I wonder what everyone thinks of the LOGI boards:

http://valentfx.com/


> Personally I prefer VHDL

Could you describe why you prefer one over the other?

Perhaps related to your particular application for FPGA?

> and is well matched with the text books above.

Is that likely to be true for the lesser DE0 model too?


Verilog and VHDL are equivalent in expressive power; one is more or less interchangeable with the other. Both are supported by both major chip vendors (Xilinx and Altera), although neither vendor supports anywhere near the latest versions of either language, which is frustrating.

It may sound crazy, but I prefer VHDL because it is more verbose. The benefit of the verbosity is precision. With VHDL, you must specify exactly what you want; the syntax doesn't allow for ambiguity. With Verilog, you can let the "compiler" infer some things for you, but you need to think really hard about whether it will infer the right thing. The benefit is that you get to type fewer characters. Since I do not pay (or get paid) per character typed, it is far more important to me to type a little bit more and get exactly the design I want. When designing circuits, one thing you cannot afford to be is lazy. It is all about precision, because if you get it wrong, there is no step-through debugger to help you, and often it will still work in simulation but fail in real hardware, which means you're stuck.
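
To make the "inference" point concrete, a small sketch (hypothetical signals) of the classic Verilog pitfall: leave out a branch in combinational code and the tools quietly infer a latch, which often looks fine in simulation and then misbehaves in hardware.

    module latch_pitfall (
        input  wire sel,
        input  wire a, b,
        output reg  q_bad, q_good
    );
        // Missing else branch: q_bad has to "hold its value" when
        // sel is low, so synthesis infers a level-sensitive latch.
        // Legal, silent, and rarely what you intended.
        always @(*) begin
            if (sel)
                q_bad = a;
        end

        // Covering every case keeps the logic purely combinational.
        always @(*) begin
            if (sel)
                q_good = a;
            else
                q_good = b;
        end
    endmodule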

I can't say for certain with the latest version of the textbook, but IIRC, the version I had many years ago was specifically tied to the DE2. There were a range of exercises to go through with specific problem statements etc. IMHO, the DE1 is equivalent enough that you can change a few pin mappings and get the same result. I cannot say anything about the DE0.


There is also the Basys, if you would rather go with Xilinx:

http://www.digilentinc.com/Products/Catalog.cfm?NavPath=2,40...


The best advice I ever got when I started to play with Verilog and FPGAs in undergrad was that one should think of the circuit first, then write the Verilog to describe it. As another poster said, this isn't programming; there are no usual sequential semantics (first compute this, then assign that value) even though code samples may look that way. The tricky (slash insanely cool) thing about HDLs is that they infer a lot of things -- latches, MUXes, ALUs -- out of a high-level description. But the abstraction is leaky, so you need to understand digital logic (state machines, latches, pipelines, ...) and then work up from there.

I guess what I'm really trying to say is, study digital logic first, then imagine the circuit you want to build, then write the Verilog that infers that circuit :-)
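
For example (a hedged sketch, not from the textbooks above): picture a 4-to-1 multiplexer first, then write the description that a synthesizer will map onto exactly that mux.

    module mux4 (
        input  wire [7:0] in0, in1, in2, in3,
        input  wire [1:0] sel,
        output reg  [7:0] out
    );
        // The case statement covers every value of sel, so the tools
        // infer a plain combinational 4-to-1 mux (and no latch).
        always @(*) begin
            case (sel)
                2'd0:    out = in0;
                2'd1:    out = in1;
                2'd2:    out = in2;
                default: out = in3;
            endcase
        end
    endmodule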


This may be huge. People will use the FPGA as they use the GPU now, but FPGAs have the potential to greatly reduce the programming complexity associated with GPUs.

In the end, success will boil down to how easy the development is and how well designed the libraries are. If the framework is capable of automatically reconfiguring the hardware to offload CPU-intensive tasks, this has high potential for widespread adoption, not just in the datacenter.


Can you elaborate on how FPGA has the potential to reduce programming complexity associated with GPUs? I personally think it is harder to program with Verilog than to program using CUDA.


Well, it's probably easier to program CUDA for embarrassingly parallel tasks, or other tasks well suited to CUDA, but FPGAs might make certain tasks easier because of their flexibility.


If I recall correctly, Intel tried a few years ago to sell a system-on-chip combining an Atom CPU with an FPGA from Altera. I believe it didn't work very well, especially with regard to communication and synchronization between the two cores.


It didn't work very well, but there is a good reason: nobody wanted a slow, comparatively low-performance chip paired with a small FPGA connected via a (slow) PCI Express link. There are already hundreds of big FPGA boards with PCIe connectors that can be tied to big CPUs. It was a non-product from the get-go.


Hmm, this might have implications for digital currencies and their mining.


Unlikely. Custom-built Application-Specific Integrated Circuits (ASICs) (i.e. bitcoin mining chips, e.g. http://www.butterflylabs.com/) will always be faster than FPGAs (which are comparatively slow) and CPUs (which are fast, but general).


Except that every other month or so sees the introduction of a new proof-of-work system that won't have ASICs for years, if ever. FPGAs can easily outperform the CPU/GPU competition on any alt-coin using such a proof-of-work.


Sold! Seriously. This is what I wanted for the last two decades.


A good time to know Verilog.


Sounds like a bit of a gimmick.

FPGAs are typically used in ASIC development to emulate the ASIC being developed. I've seen boards with 20 FPGAs emulate an ASIC design at <~1/10th of the speed and >>10x the power. While FPGAs are programmable hardware, they are far less efficient than custom hardware for various reasons. Naturally, ASIC emulation is an application where FPGAs have a very large advantage over software... At volume they're also a lot more expensive, and good tools are also very expensive (virtually no mass-produced commercial product uses FPGAs). Now obviously if the FPGA is inside the Xeon you're not really paying much more for it (except you lose whatever other function could be crammed in there).

Companies like Microsoft, Facebook, Google have enough servers to make a custom block inside Intel's CPU more attractive than an FPGA in terms of price/power/performance (and they can get that from ARM vendors which is probably scaring Intel).

CPU vendors have spent the last several decades moving more and more applications that used to be in the realm of custom hardware to the realm of software. There are certainly niches of highly parallelizable operations but a lot of general purpose compute is very well served by CPUs (and a lot of it is often memory bandwidth bound, not compute bound). Some of these niches have already been semi-filled through GPUs, special instructions etc.

The FPGA on the Xeon is almost certainly not going to have access to all the same interfaces that either a GPU or the CPU has and is only going to be useful for a relatively narrow range of applications.

I think what's going on here is that as the process size goes down simply cramming more and more cores into the chip makes less and less sense, i.e. things don't scale linearly in general. So the first thing we see is cramming a GPU in there which eventually also doesn't scale (and also isn't really a server thing). Now they basically have extra space and don't really know what to put in it. Also each of the current blocks (GPU, CPU) are so complicated that trying to evolve them is very expensive.

EDIT: Just to explain a little where I'm coming from here. I worked for a startup designing an ASIC where FPGAs were used to validate the ASIC design. I also worked on commercial products that included FPGAs for custom functions where the volume was not high enough to justify an ASIC and the problem couldn't be solved by software. I worked with DSPs, CPUs, various forms of programmable logic, SoCs with lots of different HW blocks etc. over a long long time so I'm trying to share some of my observations... If you think they're absolutely wrong I'd be happy to debate them.

EDIT2: Re-reading what I wrote it may sound like I am saying I am an ASIC designer. I'm not. I'm a software developer who has dabbled in hardware design and has worked in hardware design environments (i.e. the startup I worked for was designing ASICs but I was mostly working on related software).


FPGAs are terrible at emulating ASICs, but CPUs are even worse, yet FPGAs do excel at certain problems that can be expressed as programmable logic that operates in a massively parallel manner.

What if the Intel FPGA did have access to the same resources as a GPU? This isn't inconceivable, it's in the same socket as the CPU.

This gives you the ability to implement specialized algorithms related to compression, encryption, or stream manipulation in a manner that's way more flexible than a GPU can provide, and way more parallel than a CPU can handle.


Because the CPUs are so tightly coupled to their peripherals (e.g. the L1 cache) it is extremely hard to give other blocks on the same chip the same access. I've seen this over and over in SoCs. Something like a video codec, even though it's on the same die, doesn't have a tight interface to the CPU. Even something like a SIMD unit doesn't always have the same interfaces though it's part of the CPU.

Being on the same die is better than being on a separate chip but there are some internal interfaces that rely on placement and latency on the die. E.g. it's unlikely that you can add new instructions to a CPU via FPGA or have the FPGA interact with the L1 cache. It's more likely there will be some sort of shared memory and standard peripheral interface to the FPGA (e.g. interrupts, I/O).

EDIT: So to expand on this the FPGA is expected to have a relative high latency to the CPU and a relative low bandwidth to external resources (e.g. if you compare the CPU interface to L1). It's unlikely that the FPGA will have the same cache hierarchy that a CPU core has (size and performance). So it'll be useful where change is expected, the bottleneck is compute, the task is highly parallelizable, the standard instruction set/other blocks aren't very good at, and going for a pure ASIC solution doesn't make sense (either in a separate block or onboard a customer version of the same chip) either because of price/time/volume.


> What if the Intel FPGA did have access to the same resources as a GPU? This isn't inconceivable, it's in the same socket as the CPU.

An FPGA that competes with a modern GPU would probably cost in the neighborhood of US $50,000 per chip.


In a general sense, yes, but not in very narrow problems where the GPU would stumble and flail because of architectural limitations that would prevent it from fully applying itself.


YZF, why can't we start from an optimized FPGA - i.e. small memory blocks spread all around with massive bandwidth and low latency - and find a way to give the CPU decent enough access to all that memory?

And yes, I know that the CPU will be the bottleneck, but it will be the bottleneck anyway.


I think it boils down to various constraints. If you want high bandwidth low latency you need to be physically close on the chip. Presumably an existing chip is already optimized given those constraints and adding another component in means you need to trade something else off.

The other thing that I've seen, which may or may not apply to the Intel case, is that complexity in chip design can be managed more easily by having blocks that connect to standard interfaces. I.e., if you look inside the Xeon it probably looks like a bunch of different chips that were thrown onto the same die with some standard interconnects. Most of the optimization effort goes inside those blocks, e.g. inside a single core, and it's a lot more difficult to add an FPGA closer to the core vs. just throwing it somewhere else on the chip. That is, the number of engineers at Intel who are intimately familiar with the innards of the x86 core design and are capable of making these sorts of changes is probably much, much lower than the number who are capable of throwing some external "block" onto the die and tying it into a standard bus.



