RISC-V LLVM September update
by Alex Bradbury
A high quality, upstream RISC-V backend for LLVM is perhaps the most
frequently requested missing piece of the RISC-V software ecosystem. This
email provides an update on the rapid progress we've been making towards that
goal, outlines next steps and upcoming events, and tries to better explain the
approach that we're taking. As always, you can track status
[here](http://www.lowrisc.org/llvm/status/) and find the code
[here](https://github.com/lowRISC/riscv-llvm).
Note: you may prefer to read the HTML-rendered version of this post:
http://www.lowrisc.org/blog/2017/09/moving-risc-v-llvm-forwards/
## Status
I've been able to make substantial progress since the [last
update](http://lists.llvm.org/pipermail/llvm-dev/2017-August/116709.html).
### RV32
100% of the GCC torture suite passes for RV32I at -O0, -O1, -O2, -O3, and -Os
(after masking gcc-only tests). MC-layer (assembler) support for RV32IMAFD has
now been implemented, as well as code generation for RV32IM.
### RV64
This is the biggest change versus my last update. LLVM recently gained support
for parameterising backends by register size, which allows code duplication to
be massively reduced for architectures like RISC-V. As planned, I've gone
ahead and implemented RV64I MC-layer and code generation support making use of
this feature. I'm happy to report that 100% of the GCC torture suite passes
for RV64I at O1, O2, O3 and Os (and there's a single compilation failure at
O0). I'm very grateful for Krzysztof Parzyszek's (QUIC) work on variable-sized
register classes, which has made it possible to parameterise the backend on
XLEN in this way. That LLVM feature was actually motivated by requirements of
the Hexagon architecture - I think this is a great example of how we can all
benefit by contributing upstream to projects, even across different ISAs.
### Other activities
* Community members Luís Marques and David Craven have been experimenting with
D and Rust support respectively.
* Andes Technology have started working to contribute to this effort, and are
putting together patches for compressed instruction set support. This is a
really positive move and I hope others will follow their lead.
### Development stats
* The 'reference' [patch queue](https://github.com/lowRISC/riscv-llvm) consists
of 48 patches, modifying 128 files and inserting over 11500 lines to the LLVM
and Clang codebases. Close to 50% of these lines are new tests.
* Other activities:
* 8 committed patches to LLVM in areas other than lib/Target/RISCV
(cleanups, support code, documentation improvements, bug fixes) with 5 more
making their way through the review process.
* 2 GCC/binutils bugs reported
* RISC-V [psABI
doc](https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md):
Authored 7 merged commits, 13 issues filed.
I'd like to thank [everyone](https://github.com/lowRISC/riscv-llvm#credits)
who has contributed code review, feedback, or suggestions so far, as well as
our industrial sponsor.
## Approach and philosophy
As enthusiastic supporters of RISC-V, I think we all want to see a huge range
of RISC-V core implementations, making different trade-offs or targeting
different classes of applications. But we _don't_ want to see that variety in
the RISC-V ecosystem result in dozens of different vendor-specific compiler
toolchains and a fractured software ecosystem. Unfortunately most work on LLVM
for RISC-V has been invested in private/proprietary code bases or short-term
prototypes. The work described in this post has been performed out in the open
from the start, with a strong focus on code quality, testing, and on moving
development upstream as quickly as possible - i.e. a solution for the long term.
My implementation approach has been to first work towards a cleanly designed
and well tested RV32I baseline compiler. Once the fundamentals are solid, it
is _significantly_ easier to add in new features, optimisations, or indeed
customisations for different RISC-V variants. This careful approach has
enabled the rapid progress of the past few weeks. I would summarise this
approach as:
1. Ensuring (to the extent possible) that correct code is always generated for
a simple RV32I baseline
2. Expand that baseline to support more RISC-V ISA variants (RV{32,64}IMA and
later FD)
3. Push forwards on optimisations (generated code quality) and compiler
feature support. Part of this work is moving to larger scale test programs and
benchmarks in order to maintain confidence about the correctness of generated
code.
This project is currently in the process of moving from point 2) to point 3)
on the list above. As it stands, you will benefit from LLVM's many middle-end
optimisations, but the final code generation stage has seen little work
focused on the performance of generated code. With a high quality base now in
place, adding these optimisations can be done relatively easily.
If you want to see first-class support for RISC-V in LLVM, now is the time to
get involved and help make it happen.
## Roadmap and upcoming events
My aim is to have Clang and LLVM developed to serve as a competitive
alternative to GCC on RISC-V by the end of the calendar year. If the RISC-V
community works together, this is an achievable goal.
My near-term goals are:
* Expand testing for RV64. There are rather few RV64 unit tests right now
because parameterising the backend by XLEN mostly "just worked".
* Go through the implementation again (particularly recently added code) to
look for further cleanup or refactoring opportunities, then propose to merge
it upstream.
* Once I'm happy with the implementation approach for the most recently added
features, move the 5.0-based
[riscv-llvm-integration](http://github.com/lowrisc/riscv-llvm-integration)
tree forwards.
* Review patches from contributors such as Andes and help to support language
port efforts.
Focus areas after that include:
* Clang toolchain driver, MAFD codegen and ABI support
* Benchmarking vs RISC-V GCC and generated code quality improvements
* Documentation, expanded test cases, and improved compiler testing tooling
I've mapped out a number of TODO items
[here](https://github.com/lowRISC/riscv-llvm/issues).
I'm pleased to report that my proposal for a RISC-V "birds of a feather"
session at the [upcoming LLVM Dev
Meeting](http://llvm.org/devmtg/2017-10/#bof4) (Oct 18th) was accepted. You
should definitely attend this event if you are an LLVM developer working on an
out-of-tree RISC-V backends or are looking to get involved (representatives
from several companies in that position are already confirmed as attending).
We're also looking to run a longer working/hacking session the day before, at
a San Jose location. More details on both events will be circulated shortly.
## FAQ
### Do you care about performance of generated code and code size?
Like any compiler developer, of course! Starting with a solid and well tested
base is the best way of achieving those aims. We're now in a position where we
can push forwards on these fronts, which will will soon become a primary focus
of this development effort.
### I have a RISC-V LLVM fork for which works for me, why should I
care about an upstream backend?
This is ultimately a question about long-term maintenance and sharing the
support burden with others in the RISC-V ecosystem. By working together and
pooling our development efforts, we can unlock the benefits of the open source
approach. Contributing to this effort is almost definitely the right long-term
choice for your project. If you want to discuss how to contribute, please drop
me an email or come along to the upcoming birds of a feather session.
### What is the difference between this and other RISC-V LLVM efforts?
This effort is focused on high code quality, long term maintainability, and in
getting development merged in to upstream LLVM. Andes have recently released a
private development tree to the public. That tree was was based on an early
version of the lowRISC patchset, but diverged significantly in terms of
implementation approach. Andes are now working to submit patches to this
effort.
### Who are you?
I (Alex Bradbury) am a co-founder and director of lowRISC CIC, where this
development work has been taking place. I have been developing LLVM backends
for the past seven years, and am now upstream code owner for the RISC-V
backend. If you're interested in LLVM, you are hopefully already familiar with
my [LLVM Weekly](http://llvmweekly.org/) newsletter.
### Why is lowRISC interested in LLVM?
lowRISC is a not-for-profit created to push forward open source hardware by
developing a secure, open, and flexible SoC design. Enabling custom hardware
and derivative designs is about much more than shipping RTL with an open
source license - hardware is of little use without the software infrastructure
in place to support it. A high quality and easy to modify LLVM backend is
important for potential lowRISC adopters, but also benefits our own hardware
development efforts. Specifically, we will be building the software component
of our tagged memory mechanisms on top of LLVM. If our mission sounds
interesting to you, there's good news - [we're
hiring](http://www.lowrisc.org/blog/2017/09/were-hiring-work-on-making-op....
### Who do I contact to discuss further sponsoring this effort?
If your company would like to see lowRISC's work on RISC-V LLVM be sustained
or expanded through 2018, then contributing development time and/or
sponsorship is the best way to do this. Please contact asb(a)lowrisc.org to
discuss further.
5 years, 12 months
RISC-V LLVM status update
by Alex Bradbury
As you will have seen from previous postings, I've been working on upstream
LLVM support for the RISC-V instruction set architecture. The initial RFC
<http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html>
provides a good overview of my approach. Thanks to funding from a third party,
I've recently been able to return to this effort as my main focus. Now feels
like a good time to give an update on where the RISCV backend is at, and how
you can help.
Note: a version of this message has also been sent to the llvm-dev
mailing list <http://lists.llvm.org/pipermail/llvm-dev/2017-August/116709.html>.
## Current status
* A full, regularly rebased patchset can be found here
<https://github.com/lowRISC/riscv-llvm>.
* 16 of these patches have been put up for review so far. 7 have been
committed, and 8 are awaiting review.
* The vast majority of the GCC torture suite compiles and runs at O0,
targeting RV32I. 1315 out of 1352 compile and run (32 compile-time failures, 5
run-time failures).
* I intend to keep <http://www.lowrisc.org/llvm/status> updated with status,
test results etc.
## Next steps and getting involved
The plan has always been to work from the MC-layer upwards towards reliable
RV32I codegen. This then provides a stable 'core' of the backend where it's
easy for further development work to be parallelised, and for others to make
contributions. I think we're now at that point.
I would really like to avoid setting up a new 'downstream', and to use this
opportunity to pull in new people to upstream LLVM development. However
collaboration is made rather difficult for now due to the large gap between
the patches that have been reviewed and committed upstream vs the complete
patchset. If you would like to help, reviewing the remaining patches
<https://reviews.llvm.org/differential/?authors=asb&order=updated> is an
incredibly valuable way to do so. If you would like to be listed as a reviewer
for future patches, just let me know (and feel free to add yourself for
existing patches). I'll be doing some refactoring of RISCVInstrInfo.td as
discussed here
<http://lists.llvm.org/pipermail/llvm-dev/2017-August/116635.html>, but this
is a straight-forward non-functional change. Other than that, there are no
planned changes to the patches currently up for review.
My current plans look like the below, though of course will accelerate
significantly with more contributors:
* Next 2-3 weeks: resolve remaining torture suite issues, ABI compliance
testing, documentation on MC layer backend implementation
* End of Oct: Clang toolchain driver, initial MAFD MC layer + codegen +
calling conventions, initial benchmarking vs RISC-V GCC and generated code
quality improvements
* End of Dec: finalise MAFD support, improve MAFD compliance testing and
larger-scale codegen testing, further expanded backend docs, generated code
quality improvements
I've mapped out a number of TODO items here
<https://github.com/lowRISC/riscv-llvm/issues>, which I hope can help to
co-ordinate efforts. Where possible, this indicates the current preferred
approach (e.g. we plan to provide RV64 support building upon Krzysztof's
variable-sized register class work). I've submitted a RISC-V BoF session
proposal for the upcoming LLVM Dev Meeting. If accepted, this should provide a
great opportunity to further discuss and co-ordinate work between the multiple
parties who have expressed an interest.
## Credits and support
Thank you to everyone who has contributed review comments, suggestions, or code
to this patchset and related support patches: Chandler Carruth, Chih-Mao Chen,
David Chisnall, Simon Cook, David Craven, Hal Finkel, James Y Knight, David
Majnemer, Ed Maste, Dylan McKay, Krzysztof Parzyszek, Jordy Portman, Philip
Reames, Tim Northover, Eugene Zalenko, Florian Zeitz. Apologies to anybody who
I've missed.
This work has been performed through lowRISC CIC <http://www.lowrisc.org>, a
not-for-profit company. If your company would like to see my work on RISC-V
LLVM to be sustained or to accelerate, contributing sponsorship and/or
development time is the best way to do that. Please contact asb(a)lowrisc.org if
you would like to discuss sponsorship, or have questions about code
contributions that you can't discuss on-list.
6 years