Having recently discovered Qubes-OS and Anti Evil Maid I've been reading up
on x86 trusted boot and find it shocking how many little flaws it has that
add up to it being near worthless. As such, I'd like to propose my own take
on trusted boot and other aspects of system integrity protection.
First, processors should have a dedicated minion core running an
extremely simple, well-tested firmware capable only of taking and
storing SHA-256 hashes, with the hash algorithm changeable via
firmware upgrade if/when SHA-256 is broken or superseded. This minion
core has read-only access to all system memory, including CPU caches
and the nonvolatile firmware storage of every other minion. Its own
firmware storage should be exposed read-only on a hardware pin for
external verification. This minion core would then have its own
external bus directly to a TPM or equivalent security chip.
The minion core then reports hashes of all firmware and boot binaries
(basically, anything modifiable that can't be protected by full disk
encryption) via PCR registers. The minion can also check the hashes
against its own stored values, adding security (hashes the TPM doesn't
support can be used in parallel) and convenience (non-TPM key stores
such as smart cards become usable, given more advanced firmware to
manage unsealing operations). Use of cryptographic signatures could
also be considered. The system would refuse to boot if any hash marked
as critical didn't match, and issue an obvious warning if non-critical
hashes didn't match.
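A minimal sketch of that boot policy (all names and data here are
hypothetical, chosen for illustration): the minion measures each
firmware/boot region, compares against stored golden hashes, refuses
to boot on a critical mismatch and only warns on a non-critical one:

```python
import hashlib

def measure(data: bytes) -> str:
    """SHA-256 of a firmware/boot region, as the minion core would take it."""
    return hashlib.sha256(data).hexdigest()

def check_boot(regions, expected):
    """regions: name -> current contents; expected: name -> (golden digest,
    critical flag). Returns (ok_to_boot, list of non-critical mismatches)."""
    warnings = []
    for name, (digest, critical) in expected.items():
        if measure(regions.get(name, b"")) != digest:
            if critical:
                return False, warnings   # critical mismatch: refuse to boot
            warnings.append(name)        # non-critical: boot, but warn loudly
    return True, warnings

# One critical region intact, one non-critical region tampered with:
fw = b"main firmware v1"
nic = b"NIC option ROM (tampered)"
expected = {
    "main_fw": (measure(fw), True),
    "nic_rom": (measure(b"NIC option ROM v1"), False),
}
ok, warns = check_boot({"main_fw": fw, "nic_rom": nic}, expected)
# ok is True and warns == ["nic_rom"]: the system boots with a warning.
```

The same structure covers the "refuse to boot" case: had main_fw been
the mismatching region, check_boot would return False immediately.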
To protect against DMA attacks, the IOMMU should also isolate all
DMA-capable devices by default, configuring each device with a small,
unshared buffer so that the system can't be tampered with in the
window between trusted boot and the OS enabling its own IOMMU
protection. This would mean that not every minion core needs to be
untampered with to boot securely (that's why there are non-critical
hashes above), allowing use of the still-trusted features of the
system while awaiting repair.
Finally, a means of preventing unauthorized write access in the first
place is needed. One option is a dedicated minion core with exclusive
write access to all system firmware, including a store for the
system's boot partition. Another is an equivalent of ARM's TrustZone,
with only the secure context getting write access to the firmware and
boot stores. Either way, when booted into firmware-upgrade mode
(invoked only by a hardware switch), the system would use the trust
minion core to verify the secure operating system and any
firmware/software it relies on, then let the user select firmware
images intended for different parts of the system, verify the
cryptographic signatures on them, flash them to the chip, and reseal
the TPM.
Caveat: if the user signs their firmware on the same machine, then
compromise of that machine would still compromise the signing keys and
enable the attacker to write seemingly legitimate boot firmware. A
dedicated firmware-signing system really only accomplishes two things:
first, fewer systems to fully trust, since a single, highly
(software-)secured system could be used to sign firmware for multiple
systems; and second, it prevents the attacker from writing bad
firmware immediately, as the secure boot mode can only be triggered by
hardware, giving the user more time to realize the problem and correct
it by loading new keys.
What isn't covered here is a way to load signing keys securely, as I'm not
sure how. Perhaps a dedicated hardware input allowing connection to another
The only other thing I'd add is that I'm not a security researcher, so
while I think this would make system integrity a whole lot better
without too much extra hardware, I might have missed something
important. So if only one thing comes from this, let it be to seek
advice from a security researcher who works on system integrity, such
as Joanna Rutkowska (whose work much of this is based on).
> Refusing to unseal system secrets implies you have a working dynamic root
of trust, yes? Didn't you discount that earlier?
TPM keys can be sealed against either a static (e.g. BitLocker) or a
dynamic (e.g. Qubes/Anti Evil Maid with TXT) root of trust; this
proposal is closer to a static root of trust, with some additions of
its own (such as warning about non-critical changes without refusing
to boot).
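As a toy illustration of what "sealing against a root of trust" means
(this is not any real TPM API; the "seal" here is just an HMAC keyed
by PCR-style measurements, so a secret only unseals to the same value
when the measured firmware matched):

```python
import hashlib
import hmac

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style PCR extend: new = H(old || H(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

def seal_key(pcrs, secret: bytes) -> bytes:
    """Toy 'seal': bind the secret to the PCR state by MACing it with a
    key derived from the concatenated PCR values."""
    state = hashlib.sha256(b"".join(pcrs)).digest()
    return hmac.new(state, secret, hashlib.sha256).digest()

pcr0 = extend(b"\x00" * 32, b"firmware image")
pcr0_tampered = extend(b"\x00" * 32, b"evil firmware image")

sealed = seal_key([pcr0], b"disk encryption key")
# Only the untampered measurement reproduces the sealed value:
assert seal_key([pcr0], b"disk encryption key") == sealed
assert seal_key([pcr0_tampered], b"disk encryption key") != sealed
```

A real TPM performs the unseal internally and simply refuses when the
PCR values differ; the effect on the boot chain is the same.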
The only fundamental difference between the two is that a dynamic root
of trust aims to establish a trusted environment retrospectively, by
checking the kernel directly after locking it into its own protected
memory area, rather than requiring the firmware to be intact and
trusted to load the kernel. The problem is that the firmware still has
plenty of tricks it can use to break that protection, such as changing
the apparent hardware configuration to conceal potential attack
vectors from the dynamic root of trust, so IMHO it's better to just
verify the firmware anyway.
A user who felt differently could always mark the firmware hash as
non-critical and get more or less the same thing as a dynamic root of
trust under my proposal, since the IOMMU should protect kernel memory
by default to the best of its abilities in either case.
I have an idea that I would like to contribute to lowRISC:
The problem I see with most current CPUs and the software stacks
running on them is that they have merged the program stack and the
data stack into a single stack. This results in various problems like
stack overflows, stack underflows, return-oriented programming and
various other exploits.
My idea is to separate the program stack (which would contain only the
fixed-size return addresses) from the data stack (which would contain
the data), and put them into different segments.
From my experience, most exploits overflow or underflow the data
stack, not the code stack. So by simply separating the two stacks, you
can protect the code stack from any exploit on the data stack.
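The protection can be illustrated with a toy machine model (pure
illustration, not x86 or RISC-V semantics): return addresses live on a
code stack that buffer writes physically cannot reach, so overflowing
the data stack never redirects control flow:

```python
class SplitStackMachine:
    def __init__(self):
        self.code_stack = []              # return addresses only
        self.data_stack = bytearray(16)   # caller-visible buffer space

    def call(self, return_addr: int):
        """On call, the return address goes onto the separate code stack."""
        self.code_stack.append(return_addr)

    def write_buffer(self, offset: int, payload: bytes):
        """An overflowing write can wreck the data stack, but it is
        physically unable to touch self.code_stack."""
        end = min(offset + len(payload), len(self.data_stack))
        self.data_stack[offset:end] = payload[: end - offset]

    def ret(self) -> int:
        """On return, pop from the code stack, which data writes never see."""
        return self.code_stack.pop()

m = SplitStackMachine()
m.call(0x4000)                 # push a return address onto the code stack
m.write_buffer(0, b"A" * 64)   # "overflow": 64 bytes into a 16-byte buffer
ra = m.ret()                   # return address comes back untouched: 0x4000
```

On a conventional merged stack the same 64-byte write would clobber
the saved return address, which is exactly the classic smash-the-stack
attack this split prevents.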
I tried to implement a proof of concept with Linux and Windows on x86
some time ago, but it was very hard to find free and reusable
registers for the separated stacks on x86.
What I think is necessary is:
* A CodeStack-Segment register
* A CodeStack-StackPointer register
* A DataStack-Segment register
* A DataStack-StackPointer register
* The necessary opcodes to cope with them
If you were able to add those to your lowRISC architecture, and the
compilers were adapted to use the added registers and generate code
that utilizes the separated stacks, I think that would be a huge step
towards protecting the whole platform against most buffer overflow and
return-oriented programming exploits.
While you are at it, I would also suggest implementing something like
Google's Native Client (NaCl) instruction sandboxing in hardware.
On Tue, Feb 17, 2015 at 08:54:31AM +0000, Alex Bradbury wrote:
> We went through a selection process when choosing our ISA and
> ultimately RISC-V won out.
Okay, after having reviewed RISC-V some more, I agree with your
decision to go with it. The fact that there are already at least some
chips using it is a major boost.
> We're very aware of the challenges and the timelines involved in
> producing silicon. e.g. we have a 128-core research test chip on a
> 40nm process taping out this summer.
Yes, well, as you mentioned it is somewhat unrelated.
Though you mentioned that they have limited instruction sets.
I'm personally a fan of MISC architectures bundled with FPGAs.
> I'm not sure why you think we're
> anywhere near running out of steam, on the contrary we're just getting
It's important to have a good start.
For instance, due to the lack of a GPU or even a USB core on
OpenCores, it may be best to go with a microcontroller. You could make
something that would fit an Arduino.
In addition to the RISC-V ISA, it would require a UART, EEPROM, Flash,
SDRAM and an A/D converter, so it could be a drop-in replacement,
albeit with a different ISA.
I'm here mostly to advocate for AGI-friendly chips, with a libre FPGA
and a fully open toolchain being a main point. EEPROM, Flash and SDRAM
all work in that direction.
Also, the boards are already done: there are lots of -duino boards. So
if something comparable to the ATmega328 is made, there is already a
large market of potential buyers. That way, all you have to focus on
is the one chip. It does bring constraints on voltage, size and
function, but it also means all those things don't have to be decided.
In terms of process node, I'd recommend using the largest feature size
that gives the desired functionality (a little more than the
ATmega328). A wider node will lower the cost while also increasing the
half-life of the chips.
On Tue, Feb 17, 2015 at 03:30:24AM +0100, ALadar-V wrote:
> On 13.2.2015 16:13, Jookia wrote:
> > Hey,
> > On 02/14/2015 01:56 AM, Reinoud Zandijk wrote:
> >> Not to stir a flamewar, but his GPU hardware code is released
> >> under the LGPL, which might be feasible.
> > Not wanting to stir a flame war either, but if there's only one
> > option for an open GPU in the end, perhaps it'd be better to have
> > an LGPL branch with the GPU until a permissive GPU is viable?
Personally I don't think the LGPL is stringent enough; GPL-3 is
preferred. I'd prefer it if everything were relicensed under a strong
copyleft. Permissive licenses have historically only been abused: for
instance, Apple took BSD source and gave nothing useful back. Linux
being GPL-2 allows tivoization, as in Android; it bricks, makes me
cry, and leaves me fearful of developing on it.
> > I'd hate to see an open GPU not used due to its license.
If it's too permissive, one can always relicense it as something more
restrictive later.
> > That said from a practical standpoint it'd be nice to outline what
> > performance benchmarks a GPU should achieve to be deemed usable. Desktop
> > compositing at minimum, though playable Quake would be nice.
For an initial release, it would be best to use what is ready. There
is an ASIC-ready OpenRISC core, and Ethernet; the only thing that
needs to be ported to ASIC is a UART. Then you can package that,
crowdfund it, and sell it as a web server.
> Agreed, for general use. But the fundamental question is the team of
> developers - whether they can design a GPU at all. The team is the key
> to performance.
The team is here for the moment, but if there aren't results and
revenue soon, the team will run out of steam.
> > I'm probably vastly underestimating how unfinished and low performance
> > the GPU would be, but I'm just amazed it exists at all.
> Me too.
> > Cheers,
> > Jookia.
This project seems to be suffering from scope creep and the planning
fallacy. In terms of scope creep, there is some reinventing of the
wheel: not using the chips currently available (i.e. OpenRISC), and
attempting to appeal to academics when they make their own boards.
Considering you don't have everything ready yet (i.e. all the designs,
and a prototype), you can't release in the next 6 months; without the
complete designs, you can't within the next year. There's no time.
To gain more focus I recommend cutting the goals down to "To create a
fully open SoC" and dropping the rest. "To create a fully libre SoC"
is even better: it could get RYF (Respects Your Freedom)
certification. Minimize the amount of work to do and maximize the
amount of revenue; that is what releasing early means.
Once you have the design put together (ASIC with OpenRISC, Ethernet
and possibly a UART), get quotes from manufacturers, connect with
other projects for support, then set up a crowdfunding campaign and
generate lots of hype and funds with those projects' backing. As a
non-profit you can still have a margin; I recommend 75% over cost of
production: 25% for sales/marketing, 25% for R&D/tech support, and 25%
for administration/community.
A fully libre SoC has never been done before; it would be a great
achievement, and people will pay to buy it. Once there is initial
revenue, you can think about adding VGA, a GPU, or even libre FPGAs.
Meanwhile, focus on what's already done.
Gecko3 was an SoC that was made but never marketed or sold; it may be
possible to salvage some of their plans. Note: you can't have a
proprietary programmable FPGA on a fully open SoC, though if it is
cheaper or easier you could perhaps use one-time programmables, so
that you can deliver a product to market sooner.
Anyway, I hope you pull yourselves together and get something to
market before your steam runs out.
I have been thinking along the same (GPU) lines as you for a few weeks
now. One question is whether the RISC-V ISA is the right ISA for a GPU
(I hope so), and whether a community of developers could be
established to take care of developing one. Something along the lines
of the Rocket cores: personally, I was thinking about a gradual
rewrite of the Mesa 3D library into hardware, accelerating parts of it
until it's a full-fledged GPU-V.
Looking at opencores.org under the Video controller section, I can
only find a Wishbone graphics controller and VGA, nothing about 3D.
Wouldn't it be great if there were a scalable GPU-V generator along
the lines of Rocket? A generator that could produce a GPU from input
variables like the number of blocks, ALUs, etc. Again, a spec would be
needed for the GPU so that OpenCL and OpenGL would work on any
generated design.
With regards to moving data in a massive-register system: unless there
is a spec for the "GPU ISA", we can't start to tinker with it, albeit
in a software simulator, because there's no benchmark code available.
Same as with benchmarking the Rocket cores.
What I was personally thinking of starting with (alongside the Mesa
lib) would be a rasterizer-glued accelerator for a VGA GPU, performing
triangle rendering with antialiasing at high speed and offloading the
CPU from that work in the Mesa library. Maybe tile-based. Step by
step... With HBM or HMC memory, the bandwidth to off-chip memory can
be there. And I can just imagine an SoC with multiple high-bandwidth
interfaces having both a GPU and a CPU part.
and CPU part. Something like i.e. Raspberry Pi (now version 2 :) on
On Tue, Jan 13, 2015 at 11:11:55PM +0100, Reinoud Zandijk wrote:
> Hi Jerome,
> On Tue, Jan 13, 2015 at 04:17:07PM -0500, Jerome Glisse wrote:
> > > What about https://github.com/VerticalResearchGroup/miaow ?
> > The 3-clause BSD license is horrible; I wonder when people will
> > learn. That said, last time I checked they had only implemented a
> > very basic part. It was far from being useful or meaningful.
> I won't start a BSD license vs GPL discussion here :) Apart from that, it
> BSD licenses without the advertisement clause are fine. The BSD
> license with the advertisement clause is a pain to deal with; anyone
> who has ever had to work on software distribution can testify to that
> (especially if lawyers were involved). Most of the time the solution
> ends up being not shipping the software and replacing it with
> something else.
>> The GPL did not make that mistake, and almost all major BSD-licensed
>> projects use the 2-clause license (i.e. the one without the
>> advertisement clause). I am certainly not a BSD/GPL flamer, but I
>> will definitely cry out loud at anybody who uses or considers using
>> the 3-clause license or any license with an advertisement clause.
> On 8.2.2015 20:01, lowrisc-dev-request(a)lists.lowrisc.org wrote:
> GPUs are all about bandwidth; aligning compute units one after the
> other is a pointless exercise. It is all about feeding the compute
> units with data to crunch. The "secret" of GPUs is to have an order
> of magnitude more threads in flight than there are compute units (10
> times more on a high-end GPU is a good approximation). The idea is
> that you will always have a thread ready to perform an operation on
> the floating-point or integer ALU.
> Having all the execution units run the same program is a mistake. On
> a GPU you often have several programs in flight (in the case of
> graphics, some work on vertices, others on pixels ...). Also, the
> stack size you need to keep around to account for active/inactive
> threads is log2(#units_per_program). So far both AMD and NVidia seem
> to have converged on 64 compute units (each unit here being a simple
> ALU capable of performing a single float or integer operation per
> cycle). With 64 threads you only need six qwords of stack, i.e. 48
> bytes.
> Intel did try to do just as you said and it turned out to be one of
> their biggest flops (the Larrabee disaster).
> So if anyone wants to design a GPU, the main thing is first figuring
> out how to get the biggest bandwidth you can at every level (memory
> fetches from main memory, or register file accesses). The compute
> units themselves and the instruction scheduler are not the most
> complex part; they are in fact the easy part, as all the tricks you
> can use to perform arithmetic operations, or things like instruction
> caching and decoding, are well known and well documented. But how to
> design a register file capable of delivering 1024 bits or more per
> cycle is hard, and a texturing unit capable of filtering a texel per
> cycle while batching main memory accesses to maximize bandwidth and
> minimize cache misses is the hard part.
> That is where most of the secret sauce is, and you will not find much
> in the literature about those aspects.
>> With regards,
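The oversubscription point in the quoted mail can be sketched with a
toy model (all numbers hypothetical): if each thread alternates a few
compute cycles with a long memory wait, and a ready thread can always
be swapped in for a stalled one, ALU utilization grows linearly with
thread count until it saturates:

```python
def utilization(threads: int, compute_cycles: int, mem_latency: int) -> float:
    """Fraction of cycles the ALU stays busy when each thread does
    `compute_cycles` of work per `mem_latency`-cycle memory round trip,
    and stalled threads are instantly swapped for ready ones."""
    busy_fraction = threads * compute_cycles / (compute_cycles + mem_latency)
    return min(1.0, busy_fraction)

# With 4 compute cycles per 400-cycle memory round trip, one thread keeps
# the ALU under 1% busy; ~101 threads are needed to saturate it.
assert utilization(1, 4, 400) < 0.01
assert utilization(101, 4, 400) == 1.0
```

This is the latency-hiding argument for having an order of magnitude
more threads in flight than compute units.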
[I'm crossposting to the openrisc mailing list(s) and the lowrisc-dev
list.]
As many of you may know, as part of the lowRISC project we are looking
to use a collection of simple RISC-V cores ('minions') to provide
software-implemented peripherals as well as other uses such as secure
isolated execution. Moving more of the protocol implementation to
software rather than hardware does perhaps ease the verification
challenge slightly (or at least moves it into the realm of "we can fix
it with an update"), though thorough verification suites for
communication protocols are still of great interest.
A search for 'i2c VIP' or similar shows a range of verification IP
available from a range of vendors. In the ideal case, the open
hardware community would create its own equivalent suites. I was
wondering if anyone has any useful links to such efforts, particularly
in the open source world? I had a bit of a look through opencores.org,
though none of my admittedly limited sampling had a particularly
thorough testbench - are there any particularly thoroughly tested
cores out there?
One example I did find is
I was at Alex's talk earlier today at FOSDEM and I am very interested in
contributing to lowRISC in any way that I can. I have some experience
developing for LLVM and would really like to take a look at the work that
has been put into the lowRISC backend so far. I can't find any links to a
version control repository anywhere on the website. Is there anything
publicly available that I can look at now?