[lowrisc-dev] Open GPU for the first CPU
lowrisc-dev at aggregator.eu
Sun Feb 8 19:44:11 GMT 2015
I have been thinking along the same (GPU) lines as you for a few weeks now. One
question is whether the RISC-V ISA is the right ISA for a GPU (I hope so)
and whether a community of developers could be established to take
care of GPU development, something along the lines of the Rocket cores.
Personally, I was thinking about a gradual rewrite of the Mesa 3D library into
hardware, accelerating parts of it until it's a full-fledged GPU-V.
Looking at opencores.org under the Video controller section, I can only find
a Wishbone graphics controller and VGA, nothing about 3D.
Wouldn't it be great if there was a scalable GPU-V generator along the
lines of Rocket? A generator that could generate a GPU from input
variables like the number of blocks, ALUs, etc. Again, a spec would be
needed for the GPU so that OpenCL and OpenGL would work on any generated
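To make the generator idea a bit more concrete, here is a minimal sketch of what the input variables might look like. Everything in it is hypothetical: the names `GpuConfig`, `blocks`, `alus_per_block` and `describe` are made up for illustration and are not taken from Rocket or any existing project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical input variables for a GPU generator (names are made up)."""
    blocks: int           # number of compute blocks
    alus_per_block: int   # ALUs inside each block

    def __post_init__(self):
        # Reject configurations that cannot describe real hardware.
        if self.blocks < 1 or self.alus_per_block < 1:
            raise ValueError("blocks and alus_per_block must be positive")

    @property
    def total_alus(self) -> int:
        return self.blocks * self.alus_per_block


def describe(cfg: GpuConfig) -> str:
    """Return a one-line summary of the configuration."""
    return (f"GPU-V: {cfg.blocks} blocks x {cfg.alus_per_block} ALUs "
            f"= {cfg.total_alus} ALUs")


# Two design points produced from the same description.
small = GpuConfig(blocks=2, alus_per_block=16)
large = GpuConfig(blocks=8, alus_per_block=64)
print(describe(small))
print(describe(large))
```

A real generator would of course emit hardware rather than strings; the point is only that one parameter set could describe both a small and a large design.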
With regard to moving data in a massive-register system: unless there
is a spec for the "GPU ISA", we can't start to tinker with it, even in
a software simulator, because there's no benchmark code available. The same
goes for benchmarking Rocket cores.
What I was personally thinking of starting with (with the Mesa lib)
would be a rasterizer accelerator glued onto a VGA GPU to perform
triangle rendering with antialiasing at high speed, offloading the CPU
in the Mesa library. Maybe tile-based. Step by step... With HBM or HMC
memory, the bandwidth to off-chip memory can be there. And I can just
imagine an SoC with multiple high-bandwidth interfaces having both a GPU
and a CPU part. Something like, e.g., the Raspberry Pi (now version 2 :) on
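As a software model of the first step, triangle rendering with antialiasing, here is a minimal sketch. It assumes an edge-function inside test with a regular grid of sub-pixel samples per pixel; that is my choice for illustration, not how Mesa does it, and the function names are hypothetical. A tile-based variant would run the same loops over one tile of pixels at a time.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells which side of edge A->B point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def coverage(tri, width, height, samples=4):
    """Return per-pixel coverage in [0, 1] using samples x samples sub-samples."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:  # degenerate triangle covers nothing
        return [[0.0] * width for _ in range(height)]
    rows = []
    for py in range(height):
        row = []
        for px in range(width):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    # Sample position inside the pixel.
                    x = px + (sx + 0.5) / samples
                    y = py + (sy + 0.5) / samples
                    w0 = edge(x1, y1, x2, y2, x, y)
                    w1 = edge(x2, y2, x0, y0, x, y)
                    w2 = edge(x0, y0, x1, y1, x, y)
                    # Inside when all three terms share the triangle's sign.
                    if w0 * area >= 0 and w1 * area >= 0 and w2 * area >= 0:
                        hits += 1
            row.append(hits / (samples * samples))
        rows.append(row)
    return rows


tri = ((0, 0), (4, 0), (0, 4))
cov = coverage(tri, 4, 4)
for row in cov:
    print(" ".join(f"{c:.2f}" for c in row))
```

Pixels fully inside the triangle get coverage 1, pixels outside get 0, and pixels on the diagonal edge get a fraction that can be used as the blend factor.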
On Tue, Jan 13, 2015 at 11:11:55PM +0100, Reinoud Zandijk wrote:
> Hi Jerome,
> On Tue, Jan 13, 2015 at 04:17:07PM -0500, Jerome Glisse wrote:
> > > What about https://github.com/VerticalResearchGroup/miaow ?
> > The 3-clause BSD license is horrible, I wonder when people will
> > it. That said, last time I checked they had only implemented a very basic
> > part. It was far from being useful or meaningful.
> I won't start a BSD license vs GPL discussion here :) Apart from that, it
> BSD licenses without the advertisement clause are fine. A BSD license with the advertisement clause is a pain to deal with; anyone who has ever had to work on software distribution can testify to that (especially if lawyers were involved). Most of the time the solution ends up being not shipping the software and replacing it with something else.
>> The GPL did not make that mistake, and almost all major BSD-licensed projects use the 2-clause license (i.e. the one without the advertisement clause). I am certainly not a BSD/GPL flamer, but I will definitely cry out loud to anybody who uses or considers using the 3-clause license or any license with an advertisement clause.
> On 8.2.2015 20:01, lowrisc-dev-request at lists.lowrisc.org wrote:
> GPUs are all about bandwidth; lining up compute units one after the other is a pointless exercise. It is all about feeding the compute units with data to crunch. The "secret" of a GPU is to have an order of magnitude more threads in flight than there are compute units (10 times more on a high-end GPU is a good approximation). The idea is that you will always have threads that are ready to perform an operation on the floating-point or integer ALU.
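The thread-count rule of thumb quoted above can be sanity-checked with a toy model. This is a minimal sketch, assuming each thread alternates between a fixed number of ALU cycles and a fixed memory stall; the function name and the 40/360 cycle figures are made up for illustration and are not measurements of any real GPU.

```python
def alu_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles one ALU is busy when it can switch between threads.

    Each thread needs the ALU for compute_cycles out of every
    (compute_cycles + stall_cycles); other threads fill the gaps until
    the ALU saturates at 100%.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, threads * compute_cycles / period)


# Assumed numbers: 40 cycles of arithmetic per 360 cycles of memory stall.
for n in (1, 2, 5, 10, 20):
    print(f"{n:2d} threads per ALU -> {alu_utilization(n, 40, 360):.0%} busy")
```

With these assumed numbers a single thread keeps the ALU busy a tenth of the time, and about ten threads per ALU are needed to saturate it, which is in line with the "10 times more" approximation.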
> Having all the execution units run the same program is a mistake. On a GPU you often have several programs in flight (in the case of graphics, some work on vertices, others on pixels ...). Also, the stack size you need to keep around to account for active/inactive threads is log2(#unit_same_program). So far both AMD and NVidia seem to have converged on 64 compute units (each unit here being a simple
> ALU capable of performing a single float or integer operation per cycle). With 64 threads you only need a stack of 6 qwords, i.e. 48 bytes.
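The 48-byte figure quoted above follows from simple arithmetic: log2(64) = 6 stack entries, each one a 64-bit (qword) active mask with one bit per thread. A small sketch, assuming that reading of the message; the function name is hypothetical.

```python
def mask_stack_bytes(threads: int) -> int:
    """Bytes needed for the active/inactive mask stack of one thread group.

    Depth is ceil(log2(threads)); each entry holds one bit per thread,
    rounded up to whole bytes.
    """
    if threads < 1:
        raise ValueError("threads must be positive")
    depth = (threads - 1).bit_length()   # exact integer ceil(log2(threads))
    entry_bytes = (threads + 7) // 8     # one mask bit per thread
    return depth * entry_bytes


print(mask_stack_bytes(64))  # 6 entries x 8 bytes
print(mask_stack_bytes(32))  # 5 entries x 4 bytes
```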
> Intel did try to do just as you said and it turned out to be one of their biggest flops (the Larrabee disaster).
> So if anyone wants to design a GPU, the main thing is first figuring out how to get the biggest bandwidth you can at all levels (memory fetches from main memory, or register file accesses). The compute units themselves or the instruction scheduler are not the most complex part; they are in fact the easy part, as all the tricks you can use to perform arithmetic operations, or things like instruction caching and decoding, are well known and well documented. But designing a register file capable of delivering 1024 bits or more per cycle is hard, as is a texturing unit
> capable of filtering a texel in a cycle and batching main memory accesses to maximize bandwidth and minimize cache misses.
> That is where most of the secret sauce is, and you will not find much in the literature about those aspects.
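To get a feel for the register file number quoted above, here is a back-of-the-envelope sketch. The lane counts, operand widths and the 1 GHz clock are my assumptions for illustration only; the quoted message gives just the "1024 bits or more per cycle" figure.

```python
def regfile_bits_per_cycle(lanes: int, operands_per_lane: int, operand_bits: int) -> int:
    """Bits the register file must deliver each cycle to keep every lane fed."""
    return lanes * operands_per_lane * operand_bits


def bandwidth_gb_per_s(bits_per_cycle: int, clock_hz: float) -> float:
    """Convert a per-cycle bit width into GB/s at the given clock."""
    return bits_per_cycle / 8 * clock_hz / 1e9


CLOCK_HZ = 1e9  # assumed 1 GHz clock

# 32 lanes reading one 32-bit operand each gives the quoted 1024 bits.
narrow = regfile_bits_per_cycle(lanes=32, operands_per_lane=1, operand_bits=32)
# 64 lanes reading three 32-bit source operands each (e.g. a multiply-add).
wide = regfile_bits_per_cycle(lanes=64, operands_per_lane=3, operand_bits=32)

for bits in (narrow, wide):
    print(f"{bits} bits/cycle -> {bandwidth_gb_per_s(bits, CLOCK_HZ):.0f} GB/s")
```

Under these assumptions even the 1024-bit case is 128 GB/s out of a single register file, which is why the "or more" matters.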
>> With regards,