[lowrisc-dev] Porting tagged memory support to current version of RISC-V Rocket Chip

Wei Song ws327 at cam.ac.uk
Sat Oct 10 17:11:17 BST 2015

Hello Monjur,

I have run SPEC 2006 integer cases on a Zedboard using the script from 
Speckle, although not all of them.
You can have a look at the results in 
These are the results collected from FPGA runs.

Best regards,

On 09/10/15 22:19, Monjur Alam wrote:
> Hi Wei,
> I got your point. The answer to your question is no, it does not fill the
> cache with fake tags after reset. And you are right, misses always
> happen at the beginning just after reset. Thanks for pointing this out.
> One more question, please: have you ever run SPEC CPU2006 on top of
> rocket-chip on an FPGA? I have created a Stack Overflow question
> (http://stackoverflow.com/questions/33004581/running-spec06-with-riscv-architecture).
> Speckle provides a wrapper to run it on spike, but spike has
> no connection with rocket-chip. I think running CPU2006 on top of
> rocket-chip on an FPGA will demonstrate the performance overhead of the real
> architecture.
> Your opinion, please.
> Regards,
> Monjur
> On Tue, Oct 6, 2015 at 4:36 AM, Wei Song <ws327 at cam.ac.uk 
> <mailto:ws327 at cam.ac.uk>> wrote:
>     Hello Monjur,
>     The reason for having a tag cache is to reduce the traffic to DRAM.
>     In lowRISC, tags and data are stored separately in different DRAM
>     partitions.
>     So a miss in L1 will cause at least two DRAM reads (one for data
>     and one for the tag).
>     The total DRAM traffic is therefore increased by 100%.
>     A tag cache is supposed to reduce the amount of tag traffic, but
>     it does not help with data traffic.
>     If in your case the tag cache always hits, I also wonder
>     why there is this 22% overhead.
>     However, a big tag cache does not guarantee a hit.
>     Is your tag cache a kind of dummy, by which I mean that it
>     provides fake tags without needing to fill empty cache lines, even
>     after reset?
>     Otherwise, the tag cache is empty at the beginning and there will
>     be compulsory misses after reset.
>     Best regards,
>     Wei
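The traffic argument above can be put into numbers with a minimal sketch. It assumes a deliberately simplified model: every L1 miss needs one DRAM data read plus one tag read, and a tag read that hits in the tag cache never reaches DRAM. Write-backs, latency and compulsory misses after reset are ignored, and the function names are hypothetical.

```python
def dram_accesses(l1_misses: int, tag_cache_hit_rate: float) -> float:
    """DRAM reads for a given number of L1 misses (simplified model).

    Each L1 miss costs one data read. It also costs one tag read
    unless the tag is found in the tag cache.
    """
    if not 0.0 <= tag_cache_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    data_reads = l1_misses
    tag_reads = l1_misses * (1.0 - tag_cache_hit_rate)
    return data_reads + tag_reads


def traffic_overhead(tag_cache_hit_rate: float, l1_misses: int = 1000) -> float:
    """Extra DRAM reads relative to an untagged system (data reads only)."""
    baseline = l1_misses
    return dram_accesses(l1_misses, tag_cache_hit_rate) / baseline - 1.0


if __name__ == "__main__":
    for rate in (0.0, 0.75, 1.0):
        print(f"hit rate {rate:.2f}: overhead {traffic_overhead(rate):.0%}")
```

In this model a hit rate of 0 reproduces the 100% increase, and a tag cache that always hits removes the extra DRAM reads entirely. If that is the situation, a remaining overhead would have to come from something other than tag traffic to DRAM.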
>     On 05/10/2015 23:03, Monjur Alam wrote:
>>     Hi Wei,
>>     Thank you very much for your help throughout and for your
>>     valuable suggestions.
>>     So far, we have implemented tag support for RISC-V in L1 (we will add
>>     L2 later on). The architecture is more or less the same as lowRISC:
>>     0. Unlike lowRISC, we perform the basic operations (load, store) on
>>     data and tag in parallel.
>>     1. Extended the data cache by 1 bit per double word
>>     2. Added a tag cache that resides between L1 and DRAM
>>     3. Designed a tagger module that bridges the tag cache and DDR3
>>     However, we have seen that performance is degraded by around 22%; we
>>     have tested it with existing benchmarks. We are planning to map the
>>     design onto a zc706 FPGA and to run SPEC benchmarks on our architecture.
>>     1. As the tag cache (32 MB) assures a tag hit, why is there such a
>>     performance degradation (22%)?
>>     2. Does the tag cache conceptually help on a data miss (not a tag miss)?
>>     A data miss has to fetch from DRAM, so completion of the operation
>>     depends on the data fetch, not only on the tag, even if the tag is
>>     fetched from the faster tag cache.
>>     3. Do we really need a tag cache? We could fetch tags from DRAM like data.
>>     Your suggestions, please.
>>     Regards,
>>     Monjur
>>     On Tue, Sep 22, 2015 at 4:27 AM, Wei Song <ws327 at cam.ac.uk
>>     <mailto:ws327 at cam.ac.uk>> wrote:
>>         Hello Zhe Cheng,
>>         Actually extending tags in L2 is very simple.
>>         L2 is agnostic to the content of cache lines. What you need
>>         to do is extend the size of the data array.
>>         TileLink is the communication fabric used internally in Rocket.
>>         Both the broadcasting hub and L2 use the same TileLink/MemIO
>>         converter, so you do not need a new converter.
>>         At start, HTIF writes the program to L2. When L2 needs to write
>>         back, a cache line is then written to memory using the
>>         TileLink/MemIO converter.
>>         It seems you have already got the broadcast one working.
>>         Best regards,
>>         Wei
>>         On 22/09/2015 00:54, Zhe Cheng Lee wrote:
>>>         Hello Wei,
>>>         Thank you for your response. I was previously using a
>>>         broadcast coherence hub instead of an L2, but now I have
>>>         moved to using an L2 after verifying that tag bits can be
>>>         stored to and loaded from the L1 caches fine in my
>>>         modifications to the rocket chip. In this case, will the
>>>         data be written from HTIF to L2 through a different
>>>         converter? Is there a TileLink-to-L2 data converter?
>>>         Best regards.
>>>         -Zhe Cheng
>>>         On Sat, Sep 19, 2015 at 9:15 AM, Wei Song <ws327 at cam.ac.uk
>>>         <mailto:ws327 at cam.ac.uk>> wrote:
>>>             Hello Zhe Cheng,
>>>             I just noticed another issue which may or may not be causing
>>>             the error.
>>>             Since you do not want to use the tag cache, I assume you
>>>             are using the original MemIOUncachedTileLinkIOConverter
>>>             to convert TileLink messages to MemIO messages.
>>>             I also assume you are using the broadcast coherence hub
>>>             instead of an L2.
>>>             In this case, the data written from HTIF are always
>>>             written to memory through this MemIO/TileLink converter.
>>>             You need to remove tags from messages going from TileLink to
>>>             MemIO and add tags to messages going from MemIO to TileLink.
>>>             The tag cache does this conversion, so I did not change the
>>>             code of this MemIO/TileLink converter.
>>>             But some revision is needed in your case, something like
>>>             what has been done for the HTIF and icache.
>>>             The assembly seems to be from the dump file, and it looks
>>>             correct to me.
>>>             The difference between the trace file and the dump file would
>>>             reveal more.
>>>             If you think the value loaded into gp is wrong, having
>>>             a look at the test case and trying to figure out exactly
>>>             what goes wrong may help you debug.
>>>             I think it is the test case test_3 in
>>>             riscv-tests/isa/rv64ui/ld.S.
>>>             Best regards,
>>>             Wei
>>>             On 18/09/15 23:59, Zhe Cheng Lee wrote:
>>>>             Hi Wei,
>>>>             Thank you very much for your response. It is indeed
>>>>             complicated to get this to really work. I found your
>>>>             response helpful, though. I didn't consider HTIF before
>>>>             when modifying the current rocket chip. I can see why
>>>>             HTIF is important then.
>>>>             By control path, do you mean the control signals
>>>>             associated with the new instructions and the logic to
>>>>             handle them? If so, then yes, I have changed it.
>>>>             I added the tag utilities (I changed the data types in
>>>>             these tag functions from Bits to UInt) and modified the
>>>>             corresponding lines in htif.scala according to the
>>>>             changes in this commit
>>>>             <https://github.com/lowRISC/uncore/commit/cebfde6d42b7465cab79518fad91e323a1a5af41#diff-228d7a2c10baa84f6595aeec2d50174b>
>>>>             to support tag memory, but the simulations still have
>>>>             not passed.
>>>>             As a side note, I also added the changes in icache.scala that
>>>>             remove the tags from the line presented to the
>>>>             instruction cache, but when I compared, say, the
>>>>             rv64ui-p-ld test .out file simulated on the latest
>>>>             rocket-chip with the .out file from my changes to it, I
>>>>             noticed that the two PCs differ several
>>>>             instructions after the program actually starts. When I
>>>>             revert the changes in icache.scala (as in,
>>>>             removeTag doesn't get called), the two PCs start
>>>>             deviating later on instead of within the first few
>>>>             instructions after the program starts. Do the L1 instruction
>>>>             caches not interact with HTIF?
>>>>             Without removing the tags in the instruction cache, the
>>>>             PCs begin to deviate after the branch instruction in:
>>>>              27c: 0080b183 ld  gp,8(ra)
>>>>              280: ff010eb7 lui t4,0xff010
>>>>              284: f01e8e9b addiw t4,t4,-255
>>>>              288: 010e9e93 slli t4,t4,0x10
>>>>              28c: f01e8e93 addi t4,t4,-255 # ffffffffff00ff01
>>>>             <_end+0xffffffffff00eee1>
>>>>              290: 010e9e93 slli t4,t4,0x10
>>>>              294: f00e8e93 addi t4,t4,-256
>>>>              298: 00300e13 li  t3,3
>>>>              29c: 37d19c63 bne gp,t4,614 <fail>
>>>>             I am guessing the correct data isn't loaded to gp? How
>>>>             do I check this in the output file? I thought gp is the
>>>>             alias for register 31, but I don't see r31 around gp at
>>>>             that point.
>>>>             Thanks.
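The constant that the listing above builds in t4 can be checked by hand. The following is a minimal sketch that emulates only the six instructions between the ld and the bne, with 64-bit wrap-around and sign extension. The helper names are hypothetical and this is not a general RISC-V emulator. Note that in the standard RISC-V register naming gp is x3, so that is the register to look for in a trace.

```python
MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def lui(imm20: int) -> int:
    # RV64 lui: the 32-bit result (imm20 << 12) is sign-extended to 64 bits.
    return sext(imm20 << 12, 32) & MASK64


def addi(rs: int, imm: int) -> int:
    return (rs + imm) & MASK64


def addiw(rs: int, imm: int) -> int:
    # 32-bit add, result sign-extended to 64 bits.
    return sext(rs + imm, 32) & MASK64


def slli(rs: int, shamt: int) -> int:
    return (rs << shamt) & MASK64


def expected_t4() -> int:
    """Replay the instructions at 0x280..0x294 from the dump."""
    t4 = lui(0xFF010)
    t4 = addiw(t4, -255)
    t4 = slli(t4, 16)
    t4 = addi(t4, -255)
    t4 = slli(t4, 16)
    t4 = addi(t4, -256)
    return t4


if __name__ == "__main__":
    print(hex(expected_t4()))
```

The bne at 0x29c compares gp with this value, so the test only continues past it if the ld at 0x27c returned the same 64-bit pattern.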
>>>>             On Fri, Sep 18, 2015 at 4:36 AM, Wei Song
>>>>             <ws327 at cam.ac.uk <mailto:ws327 at cam.ac.uk>> wrote:
>>>>                 Hello Zhe Cheng,
>>>>                 I think you are probably right on what is needed
>>>>                 for supporting tags on
>>>>                 the latest rocket repo.
>>>>                 However, it is always complicated to make it really
>>>>                 work.
>>>>                 One thing I noticed is that you probably need to
>>>>                 apply the changes to
>>>>                 htif.scala as well if you have not done so.
>>>>                 The tags are stored in a cache line in a way like
>>>>                 [tag][word][tag][word]....
>>>>                 The insertTag() and removeTag() in HTIF will make
>>>>                 sure tag/data end up
>>>>                 in the right interleaved position inside a cache line.
>>>>                 Host interface (HTIF) is very important as the test
>>>>                 programs (elf/hex)
>>>>                 are written to memory/L2 through it.
>>>>                 I think the host interface may have written totally
>>>>                 unaligned program to
>>>>                 memory due to the lack of insertTag() function.
>>>>                 Also you need to revise the control path of the
>>>>                 rocket core, which I
>>>>                 think you have done.
>>>>                 For general debugging tips, you can compare the
>>>>                 traces from simulation
>>>>                 with the dump files of the test programs.
>>>>                 Making sure the rocket processor is running the
>>>>                 correct instructions
>>>>                 would be my first check.
>>>>                 BTW, I am working on bringing up a stand-alone
>>>>                 lowRISC with tag
>>>>                 support based on the latest Rocket chip.
>>>>                 However, it is a slow process and I will need at
>>>>                 least a couple of
>>>>                 months on it.
>>>>                 You will be able to run on a clean design if you
>>>>                 can wait that long.
>>>>                 Or if you would like to help, see the "update"
>>>>                 branch of lowrisc-chip.git.
>>>>                 I am working on peripherals now. Tag support is not
>>>>                 added yet, so I can
>>>>                 use some help to bring back tag support to the new
>>>>                 code.
>>>>                 Hope this is helpful,
>>>>                 Wei
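The [tag][word][tag][word] layout described above, and why a writer that skips the tag insertion leaves the program misaligned, can be illustrated with a minimal sketch. It is not the Chisel code from htif.scala: the function names, the 64-bit word width and the 4-bit tag width are all assumptions chosen for illustration.

```python
WORD_BITS = 64                      # assumed word width
TAG_BITS = 4                        # assumed tag width, illustrative only
SLOT_BITS = WORD_BITS + TAG_BITS    # one [tag][word] pair
WORD_MASK = (1 << WORD_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def insert_tags(words, tags=None):
    """Pack words into a tagged line; each slot holds [tag][word]."""
    if tags is None:
        tags = [0] * len(words)     # untagged data gets zero tags
    if len(tags) != len(words):
        raise ValueError("need exactly one tag per word")
    line = 0
    for i, (word, tag) in enumerate(zip(words, tags)):
        slot = ((tag & TAG_MASK) << WORD_BITS) | (word & WORD_MASK)
        line |= slot << (i * SLOT_BITS)
    return line


def remove_tags(line, n_words):
    """Split a tagged line back into (words, tags)."""
    words, tags = [], []
    for i in range(n_words):
        slot = (line >> (i * SLOT_BITS)) & SLOT_MASK
        words.append(slot & WORD_MASK)
        tags.append(slot >> WORD_BITS)
    return words, tags


def pack_untagged(words):
    """What a writer without tag insertion would produce."""
    line = 0
    for i, word in enumerate(words):
        line |= (word & WORD_MASK) << (i * WORD_BITS)
    return line


if __name__ == "__main__":
    program = [0x0123456789ABCDEF, 0xFF00FF00FF00FF00]
    good, _ = remove_tags(insert_tags(program), len(program))
    bad, _ = remove_tags(pack_untagged(program), len(program))
    print([hex(w) for w in good])
    print([hex(w) for w in bad])
```

With these assumed widths the first word survives the untagged write, but every later word is shifted by a multiple of the tag width when read back through the tagged layout.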
>>>>                 On 18/09/2015 00:32, Zhe Cheng Lee wrote:
>>>>                 > Hi, all,
>>>>                 >
>>>>                 > Has anyone successfully ported the lowRISC changes that support tagged
>>>>                 > memory to a more recent version of the rocket chip repository (e.g.
>>>>                 > developed lowRISC from a more recent version of the rocket chip
>>>>                 > repository)?
>>>>                 >
>>>>                 > I want to develop a design module that relies on those tagged memory
>>>>                 > bits and is to be integrated with the most recent version of the
>>>>                 > rocket chip. At this stage of my development process, I just want at
>>>>                 > least the L1 caches to support tagged memory. In other words, I'm not
>>>>                 > concerned about including the tag cache or supporting tagged memory in
>>>>                 > main memory right now. I'm having trouble successfully pushing the
>>>>                 > tags into the L1 caches. I have already added the load/store tag
>>>>                 > instruction decoding and encoding (I'm aware that the order of the
>>>>                 > control signals in the decode table has changed a bit since the
>>>>                 > rocket-chip version lowRISC is based on), the new memory access type
>>>>                 > constant MT_T, and the necessary config parameters.
>>>>                 >
>>>>                 > At first, I thought I just needed to include the highlighted
>>>>                 > modifications in lowRISC's nbdcache.scala from
>>>>                 > https://github.com/lowRISC/rocket/commit/51f65e2dce1bc60ef37c6da956bd8f9c8972961b#diff-de7e6f4be95f6d3b7e13d6c32e5c9783
>>>>                 > and in its tilelink.scala from
>>>>                 > https://github.com/lowRISC/uncore/commit/cebfde6d42b7465cab79518fad91e323a1a5af41#diff-228d7a2c10baa84f6595aeec2d50174b
>>>>                 > to the corresponding places in rocket-chip's nbdcache.scala,
>>>>                 > cache.scala, and tilelink.scala. Even without the tag utilities and
>>>>                 > tag cache, this should be fine just for testing existing
>>>>                 > instructions, since those tag bits would just be ignored in those
>>>>                 > cases, correct? But with that, the simulations do not pass the
>>>>                 > prebuilt tests and benchmarks that don't test the load/store tag
>>>>                 > instructions.
>>>>                 >
>>>>                 > Can anyone help with this?
>>>>                 >
>>>>                 > Thanks.
