[lowrisc-dev] Porting tagged memory support to current version of RISC-V Rocket Chip

Wei Song ws327 at cam.ac.uk
Tue Oct 6 09:36:16 BST 2015

Hello Monjur,

The reason for having a tag cache is to reduce DRAM traffic.
In lowRISC, tags and data are stored separately in different DRAM regions,
so a miss in L1 will cause at least two DRAM reads (one for data and one
for tags).
Without a tag cache, the number of DRAM reads therefore doubles (a 100% increase).
A tag cache is supposed to reduce the amount of tag traffic, but it does not
help with data traffic.
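To make the traffic argument concrete, here is a toy count of DRAM reads for a batch of L1 misses. It is a sketch under simplifying assumptions, not lowRISC code: every L1 miss goes to DRAM (no L2), each tag-cache miss costs exactly one extra DRAM read, and write-backs are ignored. `dram_reads` and its parameters are hypothetical names.

```python
from typing import Optional, Tuple

# Hypothetical toy model, not lowRISC/Rocket code.


def dram_reads(l1_misses: int, tag_cache_hit_rate: Optional[float]) -> Tuple[int, int]:
    """Return (data_reads, tag_reads) caused by a number of L1 misses.

    Every L1 miss reads its data line from DRAM. Passing
    tag_cache_hit_rate=None models a design without a tag cache, where
    every miss also needs a separate DRAM read for its tags.
    """
    if l1_misses < 0:
        raise ValueError("l1_misses must be non-negative")
    data_reads = l1_misses  # a tag cache never removes data traffic
    if tag_cache_hit_rate is None:
        tag_reads = l1_misses  # one extra tag read per miss
    else:
        if not 0.0 <= tag_cache_hit_rate <= 1.0:
            raise ValueError("hit rate must be in [0, 1]")
        # Only tag-cache misses go to DRAM.
        tag_reads = round(l1_misses * (1.0 - tag_cache_hit_rate))
    return data_reads, tag_reads


no_tag_cache = dram_reads(1000, None)    # (1000, 1000): reads double
with_tag_cache = dram_reads(1000, 0.9)   # (1000, 100): only tag reads shrink
```

Even a perfect tag cache (hit rate 1.0) leaves the 1000 data reads untouched in this model.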

If the tag cache always hits in your case, I also wonder where this 22%
overhead comes from.
However, a big tag cache does not guarantee hits.
Is your tag cache a kind of dummy, by which I mean that it provides fake
tags without ever needing to fill empty cache lines, even after reset?
Otherwise, the tag cache is empty at the beginning and there will be
compulsory misses after reset.
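The point about compulsory misses can be illustrated with a toy fully associative LRU tag cache that starts empty after reset. The class, the tag-line addresses, and the capacity below are hypothetical; the sketch only shows that a larger capacity does not remove the first miss to each tag line.

```python
from collections import OrderedDict

# Hypothetical toy model, not lowRISC/Rocket code.


class ToyTagCache:
    """Fully associative LRU cache of tag lines; empty after reset."""

    def __init__(self, capacity_lines: int) -> None:
        self.capacity_lines = capacity_lines
        self.lines = OrderedDict()  # tag-line address -> present
        self.hits = 0
        self.misses = 0

    def access(self, tag_line_addr: int) -> bool:
        """Look up one tag line; on a miss, fill it (one DRAM read)."""
        if tag_line_addr in self.lines:
            self.lines.move_to_end(tag_line_addr)  # mark most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.lines[tag_line_addr] = None
        if len(self.lines) > self.capacity_lines:
            self.lines.popitem(last=False)  # evict least recently used
        return False


# A cache far larger than the working set still misses once per line.
cache = ToyTagCache(capacity_lines=1_000_000)
for _ in range(2):              # touch the same 500 tag lines twice
    for addr in range(500):
        cache.access(addr)
```

After this loop, `cache.misses` is 500 (all compulsory, from the first pass) and `cache.hits` is 500, even though the cache could hold 2000 times more lines than were touched.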

Best regards,

On 05/10/2015 23:03, Monjur Alam wrote:
> Hi Wei,
> Thank you very much for your help throughout and for your valuable
> suggestions.
> So far, we have implemented RISC-V tag support for L1 (we will add L2
> later on). The architecture is more or less the same as lowRISC:
> 0. Unlike lowRISC, we perform basic operations (load, store) on data
> and tags in parallel.
> 1. Extended the data cache by 1 tag bit per double word.
> 2. Added a tag cache that resides between L1 and DRAM.
> 3. Designed a tagger module to bridge the tagCache and DDR3.
> However, we have seen performance degrade by around 22% on the
> existing benchmarks. We are planning to map the design onto a
> ZC706 FPGA and to run SPEC benchmarks on our architecture.
> 1. As the tag cache (32 MB) assures tag hits, why is there such a
> performance degradation (22%)?
> 2. Does the tag cache conceptually help on a data miss (not a tag miss)?
> A data miss fetches from DRAM, so completion of the operation depends on
> the data fetch, not only on the tag, even if the tag is fetched from the
> faster tag cache.
> 3. Do we really need a tag cache? We could fetch tags from DRAM like data.
> We would appreciate your suggestions.
> Regards,
> Monjur
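Question 2 can be framed with a toy latency model. Assuming tag and data are fetched in parallel (point 0 above), an L1 miss completes when the slower fetch returns; if they were fetched one after the other, the latencies would add. The cycle counts are made-up illustrations, not measurements, and `miss_latency` is a hypothetical helper.

```python
# Hypothetical toy model, not lowRISC/Rocket code.


def miss_latency(data_cycles: int, tag_cycles: int, parallel: bool = True) -> int:
    """Cycles until an L1 miss can complete, under a toy model.

    parallel=True: tag and data requests are issued together, so the
    miss waits for whichever returns last.
    parallel=False: the two fetches happen one after the other.
    """
    if parallel:
        return max(data_cycles, tag_cycles)
    return data_cycles + tag_cycles


DRAM = 100        # hypothetical DRAM access latency in cycles
TAG_CACHE = 10    # hypothetical tag-cache hit latency in cycles

print(miss_latency(DRAM, DRAM))                        # 100: no tag cache
print(miss_latency(DRAM, TAG_CACHE))                   # 100: tag cache hit
print(miss_latency(DRAM, DRAM, parallel=False))        # 200
print(miss_latency(DRAM, TAG_CACHE, parallel=False))   # 110
```

In the parallel case the tag cache does not shorten a data miss in this model; what it saves is the second DRAM request, i.e. traffic. The model ignores DRAM queueing, which is where extra tag requests would show up as added latency in a real memory system.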
> On Tue, Sep 22, 2015 at 4:27 AM, Wei Song <ws327 at cam.ac.uk> wrote:
>     Hello Zhe Cheng,
>     Actually, extending tags in L2 is very simple.
>     L2 is agnostic to the contents of cache lines. All you need to do
>     is extend the size of the data array.
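As a rough sizing example for extending the L2 data array: the line size and tag width below are assumptions (64-byte lines, 64-bit words, and 1 tag bit per double word as in Monjur's design), and `tagged_line_bits` is a hypothetical helper, not part of Rocket.

```python
# Hypothetical helper, not part of Rocket or lowRISC.


def tagged_line_bits(line_bytes: int = 64, word_bits: int = 64, tag_bits: int = 1) -> int:
    """Width of one cache line once every word carries its tag bits."""
    data_bits = line_bytes * 8
    if data_bits % word_bits:
        raise ValueError("line size must be a whole number of words")
    words = data_bits // word_bits
    return words * (word_bits + tag_bits)


print(tagged_line_bits())            # 520 bits instead of 512
print(tagged_line_bits(tag_bits=4))  # 544 bits with 4 tag bits per word
```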
>     TileLink is the communication fabric used internally in Rocket.
>     Both the broadcast hub and L2 use the same TileLink/MemIO
>     converter, so you do not need to write a new converter.
>     At startup, HTIF writes the program to L2. When L2 needs to write
>     back, the evicted cache line is then written to memory using the
>     TileLink/MemIO converter.
>     It seems you already have the broadcast-hub version working.
>     Best regards,
>     Wei
>     On 22/09/2015 00:54, Zhe Cheng Lee wrote:
>>     Hello Wei,
>>     Thank you for your response. I was previously using a broadcast
>>     coherence hub instead of an L2, but I have now moved to an L2
>>     after verifying that, with my modifications to the rocket chip,
>>     tag bits can be stored to and loaded from the L1 caches correctly.
>>     In this case, will the data be written from HTIF to L2 through a
>>     different converter? Is there a TileLink-to-L2 data converter?
>>     Best regards.
>>     -Zhe Cheng
>>     On Sat, Sep 19, 2015 at 9:15 AM, Wei Song <ws327 at cam.ac.uk> wrote:
>>         Hello Zhe Cheng,
>>         I just noticed another issue, which may or may not be causing
>>         the error.
>>         Since you do not want to use the tag cache, I assume you are
>>         using the original MemIOUncachedTileLinkIOConverter to convert
>>         TileLink messages to MemIO messages.
>>         I also assume you are using the broadcast coherence hub
>>         instead of an L2.
>>         In this case, the data written from HTIF are always written
>>         to memory through this MemIO/TileLink converter.
>>         You need to remove tags from messages going from TileLink to
>>         MemIO and add tags to messages going from MemIO to TileLink.
>>         The tag cache does this conversion, so I did not change the
>>         code of this MemIO/TileLink converter.
>>         But some revision is needed in your case, similar to what has
>>         been done for HTIF and the icache.
>>         The assembly seems to be from the dump file, and it looks
>>         correct to me.
>>         Comparing the trace file against the dump file would give more
>>         insight.
>>         If you think the value loaded into gp is wrong, having a look
>>         at the test case and figuring out exactly what goes wrong
>>         should help you debug.
>>         I think it is test case test_3 in
>>         riscv-tests/isa/rv64ui/ld.S.
>>         Best regards,
>>         Wei
>>         On 18/09/15 23:59, Zhe Cheng Lee wrote:
>>>         Hi Wei,
>>>         Thank you very much for your response. It is indeed
>>>         complicated to get this to really work. I found your
>>>         response helpful, though. I didn't consider HTIF before when
>>>         modifying the current rocket chip. I can see now why HTIF is
>>>         important.
>>>         By control path, do you mean the control signals associated
>>>         with the new instructions and the logic for handling them? If
>>>         so, then yes, I have changed it.
>>>         I added the tag utilities (I changed the data types in these
>>>         tag functions from Bits to UInt) and modified the
>>>         corresponding lines in htif.scala according to the changes
>>>         in this commit
>>>         <https://github.com/lowRISC/uncore/commit/cebfde6d42b7465cab79518fad91e323a1a5af41#diff-228d7a2c10baa84f6595aeec2d50174b>
>>>         to support tagged memory, but the simulations still do not
>>>         pass.
>>>         As a side note, I also applied the changes in icache.scala
>>>         that remove the tags from the line presented to the
>>>         instruction cache, but when I compared, say, the rv64ui-p-ld
>>>         test .out file simulated from the latest rocket-chip with the
>>>         .out file from my modified version, I noticed that the two
>>>         PC sequences differ a few instructions after the program
>>>         actually starts. When I revert the changes in icache.scala
>>>         (so that removeTag doesn't get called), the two PC sequences
>>>         start deviating later on instead of within the first few
>>>         instructions after the program starts. Does the L1
>>>         instruction cache not interact with HTIF?
>>>         Without removing the tags in the instruction cache, the PCs
>>>         begin to deviate after the branch instruction in:
>>>          27c:   0080b183            ld  gp,8(ra)
>>>          280:   ff010eb7            lui t4,0xff010
>>>          284:   f01e8e9b            addiw   t4,t4,-255
>>>          288:   010e9e93            slli    t4,t4,0x10
>>>          28c:   f01e8e93            addi    t4,t4,-255 # ffffffffff00ff01 <_end+0xffffffffff00eee1>
>>>          290:   010e9e93            slli    t4,t4,0x10
>>>          294:   f00e8e93            addi    t4,t4,-256
>>>          298:   00300e13            li  t3,3
>>>          29c:   37d19c63            bne gp,t4,614 <fail>
>>>         I am guessing the correct data isn't loaded into gp? How do I
>>>         check this in the output file? I thought gp is the alias for
>>>         register 31, but I don't see r31 around gp at that point.
>>>         Thanks.
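To check what gp should hold at the bne above, the constant that the lui/addiw/slli/addi sequence builds in t4 can be recomputed with RV64 semantics (lui and addiw sign-extend their 32-bit results to 64 bits). Since the test branches to fail when gp != t4, this is the value the ld must load. The Python sketch models only these four instructions, and the helper names are hypothetical. Note also that in the standard RISC-V ABI, gp is x3 and t4 is x29, not x31.

```python
# Hypothetical helpers modelling four RV64I instructions, not a simulator.
MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a 64-bit register value."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64


def lui(imm20: int) -> int:
    return sext(imm20 << 12, 32)   # RV64 lui sign-extends bit 31


def addiw(rs: int, imm: int) -> int:
    return sext(rs + imm, 32)      # 32-bit add, result sign-extended


def slli(rs: int, shamt: int) -> int:
    return (rs << shamt) & MASK64


def addi(rs: int, imm: int) -> int:
    return (rs + imm) & MASK64


# Replay 0x280-0x294 from the dump.
t4 = lui(0xff010)
t4 = addiw(t4, -255)
t4 = slli(t4, 0x10)
t4 = addi(t4, -255)
t4 = slli(t4, 0x10)
t4 = addi(t4, -256)
print(hex(t4))  # 0xff00ff00ff00ff00
```

So the ld at 0x27c must put 0xff00ff00ff00ff00 into gp; if the trace shows a different value written to x3 at that point, the loaded data (or its tag interleaving) is wrong.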
>>>         On Fri, Sep 18, 2015 at 4:36 AM, Wei Song <ws327 at cam.ac.uk> wrote:
>>>             Hello Zhe Cheng,
>>>             I think you are probably right about what is needed to
>>>             support tags on the latest rocket repo.
>>>             However, it is always complicated to make it really work.
>>>             One thing I noticed is that you probably need to apply the
>>>             changes to htif.scala as well, if you have not done so.
>>>             The tags are stored in a cache line in a layout like
>>>             [tag][word][tag][word]....
>>>             The insertTag() and removeTag() functions in HTIF make sure
>>>             tags and data end up in the right interleaved positions
>>>             inside a cache line.
>>>             The host interface (HTIF) is very important, as the test
>>>             programs (elf/hex) are written to memory/L2 through it.
>>>             I think the host interface may have written a totally
>>>             misaligned program to memory due to the lack of the
>>>             insertTag() function.
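The [tag][word][tag][word] layout, and why a missing insertTag() scrambles the program, can be modelled in a few lines of Python. This is a hypothetical sketch, not lowRISC's Scala insertTag()/removeTag(): it assumes each tag sits directly above its 64-bit word in a packed line value, with word 0 in the least significant slot, and a 4-bit tag width chosen only for illustration.

```python
# Hypothetical model of an interleaved tagged line, not lowRISC code.
WORD_BITS = 64                      # data bits per word
TAG_BITS = 4                        # assumed tag width per word
SLOT_BITS = WORD_BITS + TAG_BITS    # one [tag][word] slot
WORD_MASK = (1 << WORD_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def insert_tag(words, tags):
    """Pack words and their tags into one interleaved line value."""
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    line = 0
    for i, (word, tag) in enumerate(zip(words, tags)):
        if not (0 <= word <= WORD_MASK and 0 <= tag < (1 << TAG_BITS)):
            raise ValueError("word or tag out of range")
        # Slot i holds [tag][word]; slot 0 is the least significant.
        line |= ((tag << WORD_BITS) | word) << (i * SLOT_BITS)
    return line


def remove_tag(line, n_words):
    """Split an interleaved line value back into (words, tags)."""
    words, tags = [], []
    for i in range(n_words):
        slot = (line >> (i * SLOT_BITS)) & SLOT_MASK
        words.append(slot & WORD_MASK)
        tags.append(slot >> WORD_BITS)
    return words, tags


program = [0x1111111111111111, 0x2222222222222222]   # two 64-bit words
tagged = insert_tag(program, [0, 0])
assert remove_tag(tagged, 2) == (program, [0, 0])   # round trip is exact

# Without insert_tag the words are packed back to back, with no tag slots ...
untagged = program[0] | (program[1] << WORD_BITS)
# ... so reading them with the tagged layout shifts word 1 down by TAG_BITS.
words, tags = remove_tag(untagged, 2)
print([hex(w) for w in words])   # ['0x1111111111111111', '0x222222222222222']
```

Word 0 happens to survive, but every later word is read from the wrong bit offset, which is the kind of misalignment described above.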
>>>             You also need to revise the control path of the rocket
>>>             core, which I think you have done.
>>>             As a general debugging tip, you can compare the traces
>>>             from simulation with the dump files of the test programs.
>>>             Making sure the rocket processor is running the correct
>>>             instructions would be my first check.
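One way to follow this tip is to script the comparison. The sketch below assumes objdump -d style dump lines (as in the excerpt quoted earlier) and that the sequence of executed PCs has already been extracted from each simulation .out file, whose format is not shown in this thread. `parse_dump` and `first_divergence` are hypothetical helpers, and the PC lists are made up.

```python
import re
from typing import Dict, List, Optional

# Hypothetical debugging helpers, not part of riscv-tests or Rocket.
DUMP_LINE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]{4,8})\s+(.+)$")


def parse_dump(text: str) -> Dict[int, str]:
    """Map PC -> disassembly text for objdump -d style lines."""
    insns = {}
    for line in text.splitlines():
        m = DUMP_LINE.match(line)
        if m:
            insns[int(m.group(1), 16)] = m.group(3).strip()
    return insns


def first_divergence(expected: List[int], actual: List[int]) -> Optional[int]:
    """Index of the first step where two PC traces differ, or None."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


DUMP_TEXT = """\
 294:   f00e8e93            addi    t4,t4,-256
 298:   00300e13            li  t3,3
 29c:   37d19c63            bne gp,t4,614 <fail>
"""
insns = parse_dump(DUMP_TEXT)
reference_pcs = [0x294, 0x298, 0x29C, 0x2A0]   # made-up trace excerpts
modified_pcs = [0x294, 0x298, 0x29C, 0x614]

i = first_divergence(reference_pcs, modified_pcs)
if i is not None and i > 0:
    prev = reference_pcs[i - 1]
    print(f"traces diverge after {prev:#x}: {insns.get(prev, '?')}")
```

Reporting the instruction just before the divergence points straight at the branch (or load) whose outcome differs, which is where to start inspecting register values.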
>>>             BTW, I am working on bringing up a stand-alone lowRISC with
>>>             tag support based on the latest Rocket chip.
>>>             However, it is a slow process and I will need at least a
>>>             couple of months for it.
>>>             You will be able to run on a clean design if you can wait
>>>             that long.
>>>             Or, if you would like to help, see the "update" branch of
>>>             lowrisc-chip.git.
>>>             I am working on peripherals now. Tag support has not been
>>>             added yet, so I could use some help bringing tag support
>>>             back to the new code.
>>>             Hope this is helpful,
>>>             Wei
>>>             On 18/09/2015 00:32, Zhe Cheng Lee wrote:
>>>             > Hi, all,
>>>             >
>>>             > Has anyone successfully ported the lowRISC changes that
>>>             > support tagged memory to a more recent version of the
>>>             > rocket chip repository (e.g. developed lowRISC from a
>>>             > more recent version of the rocket chip repository)?
>>>             >
>>>             > I want to develop a design module that relies on those
>>>             > tagged memory bits and is to be integrated with the most
>>>             > recent version of the rocket chip. At this stage of my
>>>             > development process, I just want at least the L1 caches
>>>             > to support tagged memory. In other words, I'm not
>>>             > concerned about including the tag cache or supporting
>>>             > tagged memory in main memory right now. I'm having
>>>             > trouble successfully pushing the tags into the L1 caches.
>>>             > I have already added the load/store tag instruction
>>>             > decoding and encoding (I'm aware that the order of the
>>>             > control signals in the decode table has changed a bit
>>>             > since the rocket-chip version lowRISC is based on), the
>>>             > new memory access type constant MT_T, and the necessary
>>>             > config parameters.
>>>             >
>>>             > At first, I thought I just needed to include the
>>>             > highlighted modifications in lowRISC's nbdcache.scala from
>>>             > https://github.com/lowRISC/rocket/commit/51f65e2dce1bc60ef37c6da956bd8f9c8972961b#diff-de7e6f4be95f6d3b7e13d6c32e5c9783
>>>             > and in its tilelink.scala from
>>>             > https://github.com/lowRISC/uncore/commit/cebfde6d42b7465cab79518fad91e323a1a5af41#diff-228d7a2c10baa84f6595aeec2d50174b
>>>             > to the corresponding places in rocket-chip's
>>>             > nbdcache.scala, cache.scala, and tilelink.scala. Even
>>>             > without the tag utilities and tag cache, this should be
>>>             > fine just for testing existing instructions, since those
>>>             > tag bits would just be ignored in those cases, correct?
>>>             > But with those changes, the simulations fail the prebuilt
>>>             > tests and benchmarks that don't test the load/store tag
>>>             > instructions.
>>>             >
>>>             > Can anyone help with this?
>>>             >
>>>             > Thanks.
