It’s really great to see progress on a clean and open sourced ISA. In theory, such an open source design could be altered and fabricated by anyone. However, if I had such a need I wouldn’t have any idea where to start.
I think that it would be interesting to have insight into the process, problems and solutions that you have experienced in the fabrication process for lowRISC.
Would you be willing to publish this kind of information so that others could follow in your footsteps?
I'd like to propose a novel way of structuring the SoC. I'll first try to put
it into ASCII art :)
(see http://pastebin.com/Q9Rt8fm0 for a fixed width copy)
lowRISC SoC STRUCTURE
/----------- FIFOs (*) ----------\
Appl.CPU0 <=>|| ||<=> Minion CPU -- (
| || || |
Appl.CPU1 <=>|| ||<=> Minion CPU -- S soft
| || || |
... <=>||<==> DMA <==>||<=> Minion CPU -- H
| || (**) || |
... <=>|| ||<=> Minion CPU -- I hardware
|| || |
|| ||<=> .... -- M )
|| || |
|| ||<=> .... ---
|| || |
|| ||<=> Minion CPU - USB + Ethernet + whatever
|| || |
|| ||<=> Minion CPU (Power, control)
|| || | +-- Flash bootrom
|| || | \-- Power and freq contr.
|| || |
|| ||<=> FPGA interface if wanted
|| || |
L2 CACHE ||<=> Minion CPU (***)
|| | |
TAG CACHE | | (private FIFO)
|| | |
||<==> DMA2 <=========> GPU (***)
DRAM (DDR3, DDR4 or GDDR5)
(*) Each minion has a separate two-way FIFO communication channel to the
Application CPUs. It's the Hypervisor's task to prevent simultaneous access.
(**) DMA channel for each minion, programmable only from the Minion side, since
the Application CPUs know neither where memory is in the Minion nor whether
its space is free.
(***) Open for debate on where the GPU should be positioned later on but this
is the most logical place IMHO. The Minion can provide basic abstract settings
like mode parameters and can pass commands/code to the GPU.
Each minion has its own *private* SDRAM that holds its code and its data
buffers. The size of this is of course not yet determined.
One minion is the coordinator and controls the power and frequency and all
other internal coordination. It also boots from say an external (serial?)
FlashROM, initialises the other Minions as directed and does general startup.
Other than the coordinator-Minion, the Minions depicted here are not
necessarily separate CPUs; one Minion could serve multiple pieces.
The entire `Minion side' is basically acting as a HAL to a Hypervisor (or bare
OS) running on the `Application side' CPUs. A bare OS is of course troublesome
for virtualisation; a Hypervisor is better suited.
All communication needed between the Application CPUs themselves is done
using the standard RISC-V IPI mechanisms through the Hypervisor, so as to
not complicate things like virtualisation. Hopefully the new RISC-V system docs
will also propose this.
Communication with the HAL, i.e. the Minions, is done by requesting the
Hypervisor to send command blocks to the desired Minion over the designated
FIFO. These can either be waited on or be fire-and-forget, in which case
you'll receive an interrupt when the result is retrievable.
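To make the command-block idea a bit more concrete, here is a minimal sketch of one such two-way FIFO channel as a toy software model. Everything in it is an assumption for illustration only: the `CommandBlock` fields, the `MinionChannel` name, the FIFO depth and the "ok:" reply format are hypothetical and not part of any lowRISC design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CommandBlock:
    tag: int        # hypothetical request id chosen by the caller
    opcode: str     # hypothetical command name
    payload: bytes  # opaque arguments for the Minion


class MinionChannel:
    """Toy model of one two-way FIFO between the Hypervisor and a Minion."""

    def __init__(self, depth=4):
        self.depth = depth          # assumed FIFO depth
        self.to_minion = deque()    # command direction
        self.to_host = deque()      # result direction
        self.irq_pending = False    # stands in for the result interrupt

    def submit(self, cmd):
        """Hypervisor side: queue a command, or return False if the FIFO is full."""
        if len(self.to_minion) >= self.depth:
            return False
        self.to_minion.append(cmd)
        return True

    def minion_step(self):
        """Minion side: handle one command and signal that a result is ready."""
        if not self.to_minion:
            return
        cmd = self.to_minion.popleft()
        self.to_host.append((cmd.tag, b"ok:" + cmd.payload))
        self.irq_pending = True

    def collect(self):
        """Hypervisor side: drain results, e.g. from the interrupt handler."""
        results = list(self.to_host)
        self.to_host.clear()
        self.irq_pending = False
        return results
```

A caller that wants to wait would poll `irq_pending` after `submit`, while a fire-and-forget caller just returns and lets the interrupt path call `collect` later.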
Data transfers are initiated by the Minions using their DMA channel to
read/write data from the main memory at their convenience; the locations and
sizes are given by the caller. Having the minion continuously fill a given
circular buffer is of course also possible.
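As a rough illustration of the circular-buffer case, the sketch below models main memory as a `bytearray` and lets the Minion side write into a window whose base and size were supplied by the caller. The class and method names are hypothetical, and a real DMA engine would of course move bursts rather than single bytes.

```python
class CircularBuffer:
    """Caller-provided window in main memory that a Minion fills via DMA."""

    def __init__(self, memory, base, size):
        self.memory = memory  # bytearray standing in for main memory
        self.base = base      # location given by the caller
        self.size = size      # size given by the caller
        self.head = 0         # next write offset (Minion side)
        self.tail = 0         # next read offset (Application side)
        self.count = 0        # bytes currently stored

    def dma_write(self, data):
        """Minion side: copy as much as fits and return the bytes written."""
        n = min(len(data), self.size - self.count)
        for i in range(n):
            self.memory[self.base + (self.head + i) % self.size] = data[i]
        self.head = (self.head + n) % self.size
        self.count += n
        return n

    def read(self, n):
        """Application side: consume up to n bytes in arrival order."""
        n = min(n, self.count)
        out = bytes(
            self.memory[self.base + (self.tail + i) % self.size]
            for i in range(n)
        )
        self.tail = (self.tail + n) % self.size
        self.count -= n
        return out
```

Writes never leave the `[base, base + size)` window, which is the property that lets the caller hand out a buffer without the Minion knowing anything else about main memory.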
The `Minions' don't have tagged memory and are not tag aware, and will write
all tags as the default and/or insecure value; this ensures that no tricks
can be played with them. They also don't need to have virtual memory support
nor be coherent with anything.
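A small model of that rule, under the assumption that the default/insecure tag is encoded as 0 (the encoding is my guess, not something taken from the memo): stores from an Application CPU carry a tag, while anything arriving over the Minion DMA path has its tag forced back to the default.

```python
DEFAULT_TAG = 0  # assumed encoding of the default/insecure tag


class TaggedMemory:
    """Word-addressed memory with one tag per word (toy model)."""

    def __init__(self, words):
        self.data = [0] * words
        self.tags = [DEFAULT_TAG] * words

    def cpu_store(self, addr, value, tag):
        """Application CPU store: tag aware, may set any tag."""
        self.data[addr] = value
        self.tags[addr] = tag

    def dma_store(self, addr, value):
        """Minion DMA store: not tag aware, so the tag is always reset."""
        self.data[addr] = value
        self.tags[addr] = DEFAULT_TAG
```

With this rule a Minion can overwrite protected data, but it can never produce a word that still looks protected afterwards.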
The Application CPUs OTOH have a complete implementation of tagged memory
support, have virtual memory and can be completely OoO and beefed up as much
as wanted. They only need to be coherent with each other and with the DMA
engine, which acts like just another writing/reading CPU.
The very high speed Application CPU memory bus to the DRAM is very short and
is only connected to the CPUs, the DMA engine and the L2 cache. No need to
distribute it all over the SoC. Each Application CPU frequency can be slowed
down as much as wanted, from say 2 GHz to 2 kHz, or even full-stop or powered
down.
The HAL/Minion bus can be at a much lower speed if desired and can be scaled
independently from the Application CPU memory bus. Since there is no coherency
between individual Minion CPUs, they can be put into a full sleep until
a command or event comes by. They can also individually be powered down.
All booting, memory configuration, frequency and power control and other misc.
tasks are done by a designated Minion; no need to expose this all, with the
risk of frying the SoC(!) by an OS/Hypervisor.
I'd like to stir up some ideas on hardware security, which could be a
big advantage with verifiable hardware (x-raying, etc).
Hardware isolation is somewhat necessary to achieve software isolation.
Even with a hypervisor, DMA attacks and malicious USB devices can easily
own your entire system. The memo vaguely mentions IOMMU technology,
which would help here by keeping peripherals away from secure data
they shouldn't have access to.
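To illustrate what I mean by limiting peripherals, here is a minimal sketch of the kind of check an IOMMU performs on each DMA request. It is a toy model with hypothetical names (`Iommu`, `grant`, `check`); real IOMMUs use page tables rather than a list of windows.

```python
class Iommu:
    """Toy per-device DMA filter: a device may only touch granted windows."""

    def __init__(self):
        # device id -> list of (base, length, writable) windows
        self.windows = {}

    def grant(self, device, base, length, writable=False):
        """Trusted software opens a window for one device."""
        self.windows.setdefault(device, []).append((base, length, writable))

    def check(self, device, addr, length, write):
        """Return True only if the whole access fits a suitable window."""
        for base, size, writable in self.windows.get(device, []):
            fits = base <= addr and addr + length <= base + size
            if fits and (writable or not write):
                return True
        return False
```

A device with no granted windows is denied everything, which is the default you want for something like a freshly plugged USB controller.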
There's also the somewhat unrelated issue of trusted execution. This can
include things like only running signed code, but more importantly only
disclosing secrets when verified code is running. This can help mitigate
evil maid attacks and malicious firmware.
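One common way to get "only disclose secrets when verified code is running" is to hash each boot stage into a running measurement and release the secret only if the final value matches. The sketch below shows that idea using SHA-256 from Python's standard library. The function and class names are hypothetical, and real hardware would keep the register and the secret out of software's reach.

```python
import hashlib
import hmac


def extend(register, component):
    """Fold one boot component into the running measurement."""
    digest = hashlib.sha256(component).digest()
    return hashlib.sha256(register + digest).digest()


def measure(chain):
    """Measure an ordered list of boot components, starting from zeros."""
    register = bytes(32)
    for component in chain:
        register = extend(register, component)
    return register


class SealedSecret:
    """Secret that is only released for one expected measurement."""

    def __init__(self, secret, expected):
        self._secret = secret
        self._expected = expected

    def unseal(self, measurement):
        # Constant-time comparison of the two digests.
        if hmac.compare_digest(measurement, self._expected):
            return self._secret
        return None
```

Because each step hashes the previous register value, swapping, reordering or modifying any stage (for instance tampered firmware) changes the final measurement and the secret stays sealed.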
Just food for thought,
Note that I am a ... dilettante in this area. I've never worked at this
level, but got very interested in tagged architectures from exposure in
the very early '80s to Lisp Machines, which used 8 bit tags for things
like dynamic typing.
That said, in trying to figure out how the lowRISC tagging system per
memo 2014-01 might work and perform, I wonder:
Where will the backing store of the tag cache come from?
If not from stealing bits from 72 bit wide ECC DIMMs (which I don't
get the impression is the plan, although I wonder if ECC/parity will
be supported), how will dirty tags be written if tag cache lines
aren't 64 bits?
Related, what sort of tag cache organizations are you looking at? E.g.
how can the mooted "can be small" 8 KiB tag cache take best advantage of
its 32Ki possible entries?
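For reference, here is the arithmetic behind that figure as I understand it. The key assumption, which is mine and may not match the memo, is 2 tag bits per 64-bit word; that is the width at which 8 KiB of tag storage holds 32Ki entries.

```python
TAG_BITS_PER_WORD = 2    # assumption: 2 tag bits per 64-bit data word
WORD_BYTES = 8           # one tagged data word is 64 bits
CACHE_BYTES = 8 * 1024   # the 8 KiB tag cache under discussion


def tag_entries(cache_bytes, tag_bits):
    """Number of tags that fit in the cache."""
    return cache_bytes * 8 // tag_bits


def covered_bytes(cache_bytes, tag_bits, word_bytes):
    """Amount of data memory whose tags fit in the cache at once."""
    return tag_entries(cache_bytes, tag_bits) * word_bytes


def tags_per_line(line_bytes, tag_bits):
    """Tags held by one tag cache line of the given size."""
    return line_bytes * 8 // tag_bits


entries = tag_entries(CACHE_BYTES, TAG_BITS_PER_WORD)                 # 32768
coverage = covered_bytes(CACHE_BYTES, TAG_BITS_PER_WORD, WORD_BYTES)  # 262144
per_64bit_line = tags_per_line(8, TAG_BITS_PER_WORD)                  # 32
```

Under that assumption the cache covers 256 KiB of data, and a 64-bit tag line holds the tags for 32 words, i.e. 256 bytes of data, which is the granularity a dirty-tag writeback would have if lines were 64 bits wide.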