[lowrisc-dev] lowRISC SoC structure / communication between application cores and minions

Reinoud Zandijk reinoud at NetBSD.org
Tue Dec 30 22:59:09 GMT 2014


Hi Alex :)

On Tue, Dec 30, 2014 at 11:45:47AM +0000, Alex Bradbury wrote:
> On 23 December 2014 at 20:12, Reinoud Zandijk <reinoud at netbsd.org> wrote:
>
> I don't mean to come across as overly worried about overheads roughly
> equivalent to a syscall. I'm just very aware of recent interest in systems
> research in bypassing the kernel for network or disk traffic for lower
> overhead and latency. See e.g. Ix and Arrakis:
> <https://www.usenix.org/conference/osdi14/technical-sessions/presentation/belay>
> <https://www.usenix.org/conference/osdi14/technical-sessions/presentation/peter>.

See below :) (The two links don't work)

> This functionality is also already deployed in the market place, with e.g.
> Intel's SR-IOV
> <http://en.wikipedia.org/wiki/X86_virtualization#PCI-SIG_Single_Root_I.2FO_Virtualization_.28SR-IOV.29>.

As for the x86 virtualization: if I understand the Wikipedia entry correctly,
it's about generating a shadow PCI configuration tree, derived from the single
real tree, that contains only the devices exported to the VM client. The
guest OS can then do a standard enquiry on the PCI bus and gets a (virtual)
PCI root with all the other devices attached to it as they are configured. For
the guest OS, that's all there is, and it can configure and manipulate them as
it wants. This is done by address remapping, I/O interrupt remapping, extra
(possibly virtual) configuration space etc. This is all specially mapped and
needed for MMIO-based devices.

So IIRC, SR-IOV basically comes down to
   pci0
     |-- pci hub1
     |     |- pci audio card
     |     |- pci ide controller
     |
     |-- pci hub2
           |- pci Video controller
           |- pci Ethernet controller

Here pci0 is basically virtual, though mapped in the same space where PCI
would normally start its discovery; pci hub1 and pci hub2 are also virtual or
remapped, and the other devices are remapped to appear where they show up in
the tree. The devices might be virtual, but might also be a physical device
like an ethernet card.

As for my solution, the issue is negligible and solves itself.

On OS boot, the OS asks the Hypervisor which devices have been assigned to it,
including, say, how much memory it has been allocated (and where). This returns
a list of device IDs. These IDs are allocated by the Hypervisor and are mapped
1:1 onto either:

* pure virtual devices, say like a `memory device', power on/off
* virtual devices handled by a host OS, like a virtual disk device
* real devices like a real ethernet device, a real audio device, a real USB
  bus
* semi-real devices like a slice of a real hard disk presented as a virtual
  disk device

Commands for virtual devices, say the memory device, power on/off etc. are
handled by the Hypervisor and passed to the control Minion if needed.

Commands for real devices are directly piped to the designated Minion (with
maybe a check on valid parameters by the Hypervisor if one is more paranoid).

Commands for semi-real devices can be massaged by the Hypervisor and then
directly passed to the device.

If only, say, a few USB devices are to be passed through but not all(!), the
traffic can be filtered by the Hypervisor, or the controller might be virtual
and only direct transmissions are passed directly.

For a host OS and a guest OS alike there is no difference in the
interface; they all communicate and behave the same.
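
To make the routing above a bit more concrete, here is a minimal sketch in
Python. Everything in it is hypothetical and only for illustration: the names
(Kind, Device, Hypervisor, the Minion names) and the idea that a command is a
simple (op, block) pair are my assumptions, not a proposed interface. It only
shows the shape of the idea: the OS gets a list of device IDs, and each
command is routed according to the kind of device behind the ID.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    PURE_VIRTUAL = auto()   # e.g. memory device, power on/off
    HOST_VIRTUAL = auto()   # e.g. virtual disk handled by a host OS
    REAL = auto()           # e.g. a real ethernet or audio device
    SEMI_REAL = auto()      # e.g. a slice of a real hard disk


@dataclass
class Device:
    dev_id: int
    kind: Kind
    minion: str        # hypothetical name of the Minion serving the device
    offset: int = 0    # start of the slice, only used for SEMI_REAL


class Hypervisor:
    def __init__(self, devices):
        self.devices = {d.dev_id: d for d in devices}

    def enumerate(self):
        """What a booting OS gets back: the device IDs assigned to it."""
        return sorted(self.devices)

    def command(self, dev_id, op, block=0):
        """Return (handler, op, block) describing where the command ends up."""
        dev = self.devices[dev_id]
        if dev.kind is Kind.PURE_VIRTUAL:
            return ("hypervisor", op, block)          # handled in place
        if dev.kind is Kind.HOST_VIRTUAL:
            return ("host-os", op, block)             # served by the host OS
        if dev.kind is Kind.SEMI_REAL:
            return (dev.minion, op, block + dev.offset)  # massaged, then passed
        return (dev.minion, op, block)                # piped straight through
```

The caller never sees which branch was taken, which is the point: host and
guest OS use the same call regardless of what sits behind the ID.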

> There may be perfectly valid reasons we don't want to worry about
> those use cases right now, or possibly other approaches can make the
> overhead of trapping low enough not to worry about this. It's just
> something I think needs thinking about at this stage. Though I freely
> admit, although I try to follow work in this area, it's not my field
> of expertise.

Coming back to my first statement: what they call direct access is more
comparable to, say, NetBSD raw-device access versus normal device access. In
normal access, pieces of the hard disk are buffered by a memory-mapping
buffer manager, read in by faulting and purged on demand. This is very good
for general-purpose programs and the OS in general. Raw devices OTOH are not
buffered at all; reads and writes are passed directly to the device. To
give an example: a sequential copy from a normal (buffered) hard disk device
can be on the order of 47 Mb/sec, whereas in raw mode it can be 134 Mb/sec.

It's thus also a matter of choice and implementation of a virtual device: if
it's implemented as a file on a host OS then yes, it's slower; you get the
overhead of the buffer cache and the overhead of the FS. If it's implemented as
a semi-real hard disk device (i.e. a slice of a hard disk) it runs at 100%
speed. If it's a complete device it's 100% too.

> You might also want to use the FIFO for unprivileged code for
> frequent, lightweight messages. e.g. a virtual machine passing
> information on observed types to a minion core which is collecting
> stats or even performing the full JIT compile off the main thread.

The Hypervisor could also allow an OS, or even a separate process (very
hairy), to have access to one of the Minion FIFOs; this is just a matter of a
permission-setting register in the application processor. Still, I expect the
number of messages/second through a FIFO to be enormous, even when going
through a Hypervisor call. Even if it's, say, a 1 MHz bus moving one byte per
cycle, with 16 bytes/message and no acknowledge that's about 62,500
messages/second; and we can surely do better than 1 MHz!
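
As a quick check on that estimate, a back-of-the-envelope sketch in Python.
The assumption of one byte moved per bus cycle is mine and is not stated
anywhere above. Reading 1 MHz as 10^6 cycles/second gives 62,500
messages/second; reading it as 2^20 cycles/second gives 65,536.

```python
def messages_per_second(bus_hz, bytes_per_message, bytes_per_cycle=1):
    """Upper bound on unacknowledged messages per second over a simple bus.

    bytes_per_cycle=1 is an assumption for illustration only.
    """
    return bus_hz * bytes_per_cycle // bytes_per_message


decimal_mhz = messages_per_second(1_000_000, 16)   # 1 MHz as 10**6 Hz
binary_mhz = messages_per_second(2**20, 16)        # "1 MHz" as 2**20 Hz
```

A wider or faster bus scales the figure linearly, e.g. 4 bytes per cycle at
the same clock gives four times as many messages.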

General stats on the tag system, the L1 cache performance etc. can also be
collected in one page per application processor: provided the CPU has counters
for them, these can be added to the current values in the page once per
stat-tick. The page can then be transferred over and over again by DMA on the
stat-tick by the statistics-collector Minion. Other stats could be set up the
same way, by asking the OS to mark a page as wired and asking the stats Minion
to also collect that page on the stat-tick.

This way it's even extensible; it sounds like a better way than sending
massive amounts of data for statistical analysis.
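
A small Python model of the stats-page idea, purely as a sketch. The page
size, the 64-bit little-endian counter layout and all the names (StatsPage,
StatsMinion, stat_tick) are assumptions of mine; the DMA transfer is modelled
as a plain copy of the page.

```python
import struct

PAGE_SIZE = 4096                 # assumed page size
COUNTER_FMT = "<Q"               # assumed layout: 64-bit little-endian
COUNTER_SIZE = struct.calcsize(COUNTER_FMT)


class StatsPage:
    """One wired page of counters, owned by an application processor or OS."""

    def __init__(self, names):
        if len(names) * COUNTER_SIZE > PAGE_SIZE:
            raise ValueError("too many counters for one page")
        self.slots = {name: i * COUNTER_SIZE for i, name in enumerate(names)}
        self.page = bytearray(PAGE_SIZE)

    def add(self, name, delta):
        """Add a hardware counter delta to the running value in the page."""
        off = self.slots[name]
        (cur,) = struct.unpack_from(COUNTER_FMT, self.page, off)
        struct.pack_into(COUNTER_FMT, self.page, off, (cur + delta) % 2**64)

    def read(self, snapshot, name):
        """Decode one counter from a copied page."""
        return struct.unpack_from(COUNTER_FMT, snapshot, self.slots[name])[0]


class StatsMinion:
    """Collects every registered page once per stat-tick."""

    def __init__(self):
        self.pages = []
        self.snapshots = []

    def register(self, page):
        self.pages.append(page)

    def stat_tick(self):
        # bytes(...) stands in for the DMA copy of each registered page.
        self.snapshots.append([bytes(p.page) for p in self.pages])
```

Registering one more page is all it takes to add a new source of statistics,
which is what makes the scheme easy to extend.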

> > As for the choice between a Hypervisor and running a bare OS, i'd go for the
> > Hypervisor solution. This way multiple OSs can easily run in parallel
> > including nested OSs. Even with one OS it abstracts away specific
> > implementation details that we might want to change in later versions.
> 
> With tagged memory and a base hypervisor layer, it's starting to sound
> like an IBM mainframe-on-a-chip!

Why stop at a mainframe-on-a-chip! ;-)

> Is there an existing system you would propose to model this on? L4? Akaros?
> <http://www.klueska.com/pubs/socc11-akaros.pdf>.

We could learn lessons from them, but their situation is quite different from
our target, I think.

We could learn from them, though, by not demanding a fixed way of doing things:
for example, by allowing all non-instantaneous Hypervisor calls, like all FIFO
read/write requests, to be either blocking, asynchronous or just notify, i.e.
discard; if asynchronous, the OS then gets an interrupt when the response is
ready. If such calls always blocked, the entire OS would stop until the
Hypervisor returned; not a desirable situation.
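
The three call modes can be sketched as follows in Python. This is a toy
model under my own assumptions: the names (Mode, HypervisorCalls,
service_one) are made up, the interrupt is modelled as a callback, and I read
"notify" as "the caller does not want the response".

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    BLOCKING = 1   # caller stalls until the result is available
    ASYNC = 2      # caller continues; gets an "interrupt" with the result
    NOTIFY = 3     # caller continues; the result is discarded


class HypervisorCalls:
    def __init__(self, handler):
        self.handler = handler      # does the actual work for a request
        self.queue = deque()        # requests not yet serviced

    def call(self, request, mode, on_ready=None):
        if mode is Mode.BLOCKING:
            # The whole OS would be stopped here until this returns.
            return self.handler(request)
        # Only ASYNC callers keep their callback; NOTIFY drops it.
        self.queue.append((request, on_ready if mode is Mode.ASYNC else None))
        return None

    def service_one(self):
        """Service the oldest queued request, as the Hypervisor would later."""
        request, on_ready = self.queue.popleft()
        result = self.handler(request)
        if on_ready is not None:
            on_ready(result)        # stands in for the interrupt to the OS
```

With the asynchronous mode the OS gets control back at once and can decide
for itself what to do while the request is outstanding.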

When it is allowed to be asynchronous and, say, a FIFO read/write would
otherwise block, there are various options: the requesting OS can then
either schedule a different thread or, when none is available, accept an
OS-timeslice preemption before its allotted timeslice is over. All idling OSes
are also supposed to request preemption until there is something more exciting
to do than burning cycles.

> > This also prevents (most) DMA based attacks since no user code will ever run
> > on the Minions. Unless of course their firmware upgrade is explicitly allowed
> > by the minion master. These minion firmware could be considered Trusted and
> > are normally bound to the specific board and its hardware interfaces.
> 
> Yes, replacing minion's code should certainly be a privileged
> operation and in some deployments you'd want it to be fixed as part of
> the secure boot sequence.

Sure! Or at least demand some form of microswitch or boot loader (Hypervisor
startup shell?) setting to unlock reprogramming of a specific minion.

> > IF we go for supporting tags in the Minions, well... what to do with them? We
> > can't just save them to disc for that needs knowledge not present there. We
> > can't transfer them over a serial line unless we define a protocol for that. I
> > think thats the curse of wanting to support something that no devices cater
> > for. We're too early :)
> 
> A minion could be employed as a smart I/O device where it reads in
> something over e.g. bluetooth, builds a datastructure in its
> scratchpad and applies appropriate tags which then gets DMAed back.
> There may be other uses for tags that can be constructed for the
> minions.

I hadn't thought about that kind of application yet. I am not sure it would
be best to categorise data there, but it sure can have its merits.

Would the FlexPRET CPUs be an option for our Minions? Then we can eliminate
the entire L1 cache and use all that silicon space for our 1-cycle scratchpad
memory! It would also allow for multiple hardware threads for our direct
control, FIFO handling etc.

> > As for an intermediate solution, the DMA engine could be instructed to only
> > accept tags of a given type or give an error; say only accept data marked
> > encrypted and/or store data marked a certain type.
> 
> Possibly, though this is going against the idea that tagged memory is
> a general purpose reconfigurable mechanism by fixing behaviour.

Not fixing behaviour at all! I only suggested that the DMA engine could be
instructed to store data under a specified type or that the DMA engine may
only accept data with a specified type, just as a part of the DMA command.
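
As a sketch of what I mean, in Python: the tag check and the tag to store
under are just two optional fields of the DMA command. The names (DmaCommand,
accept_tag, store_tag, TagError) and the representation of memory as
(value, tag) pairs are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


class TagError(Exception):
    """Raised when a word's tag does not match what the command accepts."""


@dataclass
class DmaCommand:
    accept_tag: Optional[int] = None   # if set, only words with this tag pass
    store_tag: Optional[int] = None    # if set, words are stored under this tag


def dma_transfer(words, cmd):
    """Copy (value, tag) words, applying the per-command tag rules."""
    out = []
    for value, tag in words:
        if cmd.accept_tag is not None and tag != cmd.accept_tag:
            raise TagError(f"tag {tag} not accepted by this transfer")
        new_tag = cmd.store_tag if cmd.store_tag is not None else tag
        out.append((value, new_tag))
    return out
```

Since both fields travel with each command, nothing about the meaning of a
tag value is fixed in the engine itself.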

> Thanks, as always for your thoughts!

Hope this gets us a bit further along the road!

With regards,
Reinoud
