NMInet

Social Networking for the Microelectronics Industry

At the Design, Automation and Test in Europe conference held earlier this year in Dresden, Professor Mark Horowitz of Stanford University gave his take on system design in an era when “lots of markets are drying up because the cost of innovation is so high”.


Working on the basis that writing in Verilog is not productive enough, he decided to look at higher levels of abstraction and at the idea of a “chip generator”, even though this sounds disturbingly like the silicon compiler of the 1990s.


“Didn’t people already do that and wasn’t it a large flop?” he asked, rhetorically.


“I don’t believe you compile a design. You are translating designer knowledge,” Horowitz argued, adding that a generator should not produce digital hardware directly but should instead work from a parameterisable architecture.


“You can take a multiprocessor architecture and build a generator for it. Ours happens to be based around Tensilica,” said Horowitz.


The first test for the experimental environment was an H.264 encoder. “We built SIMD engines and tailored them for the application. Initially, it started off 500 times slower than real time,” he said, even though that was a bit better than a pure software implementation. Energy consumption is also poor in a software implementation because data is constantly being moved around.


Horowitz argued that, ideally, a typical 16-bit computation should take “a fraction of a picojoule” on a device built on a 90nm process, and silicon has seen two generations of improvement since then.


“But the lowest a processor will operate is on the order of 80pJ. There is a tremendous opportunity between overhead and operation.”
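To put the gap in numbers, here is a rough back-of-the-envelope sketch. The 80pJ figure is the one quoted in the talk; the per-operation energies are assumptions, since the talk only says “a fraction of a picojoule”, and the function name is purely illustrative.

```python
# Energy per operation on a processor, as quoted in the talk (order of magnitude).
PROCESSOR_ENERGY_PJ = 80.0


def overhead_share(op_energy_pj, total_pj=PROCESSOR_ENERGY_PJ):
    """Fraction of the per-operation energy not spent on the arithmetic itself."""
    return (total_pj - op_energy_pj) / total_pj


# Assumed values for "a fraction of a picojoule"; not figures from the talk.
for op_pj in (0.25, 0.5, 0.75):
    ratio = PROCESSOR_ENERGY_PJ / op_pj
    print(f"arithmetic = {op_pj:.2f} pJ: processor spends {ratio:.0f}x that, "
          f"{overhead_share(op_pj):.1%} of it overhead")
```

Under any of these assumed values, more than 99 per cent of the energy goes on fetching, decoding and moving data rather than on the computation itself.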


“You have to do operation merging, which is exactly what you do in an ASIC,” he explained. This led to the creation of specialised execution units of around 200,000 gates sitting alongside 40,000-gate processors. The units were larger but provided a massive speed boost and lower overall energy consumption, although the result was still less efficient than a custom-designed ASIC.
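Why merging helps can be seen with a simple amortisation model. This is only a sketch under stated assumptions: a fixed per-instruction overhead of 79.5pJ and 0.5pJ per arithmetic operation (chosen so that an unmerged operation costs the 80pJ quoted in the talk), with the overhead assumed not to grow as more operations are merged. Real wide units will cost somewhat more per instruction.

```python
def energy_per_op(ops_merged, overhead_pj=79.5, op_pj=0.5):
    """Average energy per arithmetic operation when `ops_merged` operations
    share one instruction's overhead.

    overhead_pj and op_pj are assumed values, not measurements from the talk.
    """
    if ops_merged < 1:
        raise ValueError("ops_merged must be at least 1")
    return (overhead_pj + ops_merged * op_pj) / ops_merged


# Show how the average cost falls as more work is packed into each instruction.
for n in (1, 4, 16, 100):
    print(f"{n:>3} merged ops: {energy_per_op(n):.2f} pJ per op")
```

In this model the average cost approaches the bare arithmetic cost as more operations are merged, which is the direction of the trade-off described above: larger execution units, but less energy per useful operation.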


However, Horowitz argued that it should be easier to build validation environments that test the final design because you are steadily refining an initial, entirely programmable architecture. By reworking only the key parts, chosen either because they are bottlenecks or because optimising them yields substantial power savings, it should be easier not only to reach a target performance and power point but also to test the result.


“Things have to be squishy,” Horowitz argued.


The question is whether you wind up with an effect similar to what has happened in software. Compilers start off unable to match hand optimisation. But, slowly, for most situations they wind up being able to perform more complex optimisations because they can try out more things than assembly writers can. Hand optimisation then goes into the parts of the code that really need the extra help, which are much smaller and more manageable.


Posted by Chris Edwards


The Low-Power Design Blog is sponsored by Mentor Graphics. The company has focused years of R&D on low-power design techniques and is glad to support a resource that highlights creative methods for reducing the power consumption of electronic systems.


Tags: Horowitz, Mark, Tensilica, design, low, multiprocessor, power

