[lowrisc-dev] Porting tagged memory support to current version of RISC-V Rocket Chip

Monjur Alam alammonjur at gmail.com
Fri Oct 9 22:19:35 BST 2015


Hi Wei,

I got your point. The answer to your question is no: it does not fill the
cache with fake tags after reset. And you are right, compulsory misses always
happen at the beginning, just after reset. Thanks for pointing this out.

One more question, please: have you ever run SPEC CPU2006 on top of
rocket-chip on an FPGA? I have created a Stack Overflow question (
http://stackoverflow.com/questions/33004581/running-spec06-with-riscv-architecture).
Speckle provides a wrapper to run it on spike, but spike has no connection
with rocket-chip. I think running CPU2006 on top of rocket-chip on an FPGA
would demonstrate the performance overhead of the real architecture.

Your opinion please.

Regards,
Monjur

On Tue, Oct 6, 2015 at 4:36 AM, Wei Song <ws327 at cam.ac.uk> wrote:

> Hello Monjur,
>
> The reason for the tag cache is to reduce traffic to DRAM.
> In lowRISC, tags and data are stored separately in different DRAM
> partitions.
> So a miss in L1 causes at least two DRAM reads (one for the data and one
> for the tag).
> The total DRAM traffic is increased by 100%.
> A tag cache is supposed to reduce the amount of tag traffic, but it does
> not help with data traffic.
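>
> As a rough back-of-envelope sketch of that reasoning (plain Scala; the
> miss count and hit rate below are made-up numbers, and I am counting DRAM
> accesses rather than bytes):
>
>   // Without a tag cache, every L1 data miss triggers a second DRAM read
>   // for the tag; a tag cache only filters out that second read.
>   val dataMisses      = 1000000L  // hypothetical L1 data miss count
>   val tagCacheHitRate = 0.95      // hypothetical tag cache hit rate
>   val accessesWithoutTagCache = dataMisses * 2.0                // +100%
>   val accessesWithTagCache =
>     dataMisses * (1.0 + (1.0 - tagCacheHitRate))                // +5%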
>
> If in your case the tag cache always hits, I am also wondering where this
> 22% overhead comes from.
> However, a big tag cache does not guarantee hits.
> Is your tag cache a kind of dummy, by which I mean one that provides fake
> tags without the need to fill empty cache lines, even right after reset?
> Otherwise, the tag cache is empty at the beginning and there will be
> compulsory misses after reset.
>
> Best regards,
> Wei
>
>
> On 05/10/2015 23:03, Monjur Alam wrote:
>
> Hi Wei,
>
> Thank you very much for your help throughout and for your valuable
> suggestions.
>
> So far, we have implemented RISC-V tag support for L1 (we will add L2 later
> on). The architecture is more or less the same as lowRISC:
>
> 0. Unlike lowRISC, we perform the basic operations (load, store) on data and
> tag in parallel.
> 1. Extend the data cache by 1 tag bit per double word (see the sketch after
> this list).
> 2. Add a tag cache that resides between L1 and DRAM.
> 3. Design a tagger module that bridges the tag cache and DDR3.
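>
> As a rough illustration of item 1, here is the bookkeeping in plain Scala
> (the parameter names and the row width are made up for illustration, not
> taken from our actual code):
>
>   // 1 extra tag bit per 64-bit double word, stored in the same L1 data
>   // row so that data and tag can be accessed in parallel (item 0).
>   val rowBits       = 512                    // example L1 data row width
>   val dwordsPerRow  = rowBits / 64           // double words per row
>   val taggedRowBits = rowBits + dwordsPerRow // widened row: data + tags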
>
> But we have seen that performance degrades by around 22%; we have tested it
> with the existing benchmarks. We are planning to map the design onto a ZC706
> FPGA and to run the SPEC benchmarks on our architecture.
>
> 1. As the tag cache (32 MB) should ensure tag hits, why is there such a
> performance degradation (22%)?
> 2. Does the tag cache conceptually help with data misses (not tag misses)?
> A data miss still fetches from DRAM, so completion of the operation depends
> on the data fetch, not only on the tag, even when the tag comes from the
> faster tag cache.
> 3. Do we really need a tag cache at all? We could fetch tags from DRAM just
> like data.
>
> Your suggestion please.
>
> Regards,
> Monjur
>
>
> On Tue, Sep 22, 2015 at 4:27 AM, Wei Song <ws327 at cam.ac.uk> wrote:
>
>> Hello Zhe Cheng,
>>
>> Actually, extending the tags into L2 is very simple.
>> L2 is ignorant of the content of cache lines. What you need to do is
>> extend the size of the data array.
>> TileLink is the communication fabric used internally in Rocket.
>> Both the broadcasting hub and L2 use the same TileLink/MemIO converter, so
>> you do not need to write a new converter.
>> At the start, HTIF writes the program to L2. When L2 needs to write back, a
>> cache line is written to memory using the TileLink/MemIO converter.
>> It seems you have made the broadcast one work already.
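>>
>> A rough Chisel 2 sketch of what I mean by extending the data array (the
>> sizes and names below are illustrative, not the real Rocket parameters;
>> this would sit inside the L2 data array module):
>>
>>   import Chisel._
>>   val beatBits       = 128            // example TileLink data beat width
>>   val tagBitsPerBeat = beatBits / 64  // 1 tag bit per 64-bit double word
>>   val nSets = 1024; val nWays = 8; val refillCycles = 4  // example sizes
>>   // Widen each stored beat so it carries its tag bits; L2 never
>>   // interprets the extra bits, it just stores and forwards them.
>>   val l2DataArray = Mem(Bits(width = beatBits + tagBitsPerBeat),
>>                         nSets * nWays * refillCycles, seqRead = true)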
>>
>> Best regards,
>> Wei
>>
>>
>> On 22/09/2015 00:54, Zhe Cheng Lee wrote:
>>
>> Hello Wei,
>>
>> Thank you for your response. I was previously using a broadcast coherence
>> hub instead of an L2, but now I have moved to using an L2 after verifying
>> that tag bits can be stored to and loaded from the L1 caches correctly with
>> my modifications to the rocket chip. In this case, will the data be written
>> from HTIF to L2 through a different converter? Is there a TileLink-to-L2
>> data converter?
>>
>> Best regards.
>> -Zhe Cheng
>>
>> On Sat, Sep 19, 2015 at 9:15 AM, Wei Song <ws327 at cam.ac.uk> wrote:
>>
>>> Hello Zhe Cheng,
>>>
>>> I just noticed another issue which may or may not be causing the error.
>>> Since you do not want to use the tag cache, I assume you are using the
>>> original MemIOUncachedTileLinkIOConverter to convert TileLink messages to
>>> MemIO messages.
>>> I also assume you are using the broadcast coherence hub instead of an L2.
>>> In this case, the data written from HTIF are always written to memory
>>> through this MemIO/TileLink converter.
>>> You need to remove tags from messages going from TileLink to MemIO and add
>>> tags to messages going from MemIO to TileLink.
>>>
>>> The tag cache does this conversion, so I did not change the code of this
>>> MemIO/TileLink converter.
>>> But some revision is needed in your case, something like what has already
>>> been done for HTIF and the icache.
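>>>
>>> A rough Chisel sketch of the kind of wiring I mean (the signal names here
>>> are illustrative; the real converter ports will differ):
>>>
>>>   // TileLink -> MemIO (write path): strip the interleaved tag bits from
>>>   // each data beat before it goes out to memory.
>>>   io.mem.req_data.bits.data := removeTag(tl_data_beat)
>>>   // MemIO -> TileLink (refill path): re-insert (empty) tag bits so the
>>>   // returned beat matches the tagged layout the caches expect.
>>>   tl_grant_data := insertTag(io.mem.resp.bits.data)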
>>>
>>> The assembly seems to be from the dump file, and it looks correct to my
>>> eyes.
>>> The difference between the trace file and the dump file would reveal more
>>> insight.
>>> If you think the value loaded into gp is wrong, maybe having a look at the
>>> test case and trying to figure out what exactly goes wrong would help you
>>> debug.
>>> I think it is test case test_3 in riscv-tests/isa/rv64ui/ld.S.
>>>
>>> Best regards,
>>> Wei
>>>
>>>
>>> On 18/09/15 23:59, Zhe Cheng Lee wrote:
>>>
>>> Hi Wei,
>>>
>>> Thank you very much for your response. It is indeed complicated to get
>>> this to really work. I found your response helpful, though. I hadn't
>>> considered HTIF before when modifying the current rocket chip; I can see
>>> now why HTIF is important.
>>>
>>> By control path, do you mean the control signals associated with the new
>>> instructions and the logic to handle them? If so, then yes, I have
>>> changed it.
>>>
>>> I added the tag utilities (I changed the data types in these tag
>>> functions from Bits to UInt) and modified the corresponding lines in
>>> htif.scala according to the changes in this commit
>>> <https://github.com/lowRISC/uncore/commit/cebfde6d42b7465cab79518fad91e323a1a5af41#diff-228d7a2c10baa84f6595aeec2d50174b>
>>> to support tag memory, but the simulations still have not passed.
>>>
>>> As a side note, I also added the changes in icache.scala to remove the
>>> tags from the line presented to the instruction cache, but when I
>>> compared, say, the rv64ui-p-ld test's .out file simulated from the latest
>>> rocket-chip with the .out file from my modified version, I noticed that
>>> the two PCs differ after several instructions once the program actually
>>> starts. When I revert the changes in icache.scala (so that removeTag
>>> doesn't get called), the two PCs start deviating later on instead of
>>> within the first few instructions after the program starts. Does the L1
>>> instruction cache not interact with HTIF?
>>>
>>> Without removing the tags in the instruction cache, the PCs begin to
>>> deviate after the branch instruction in:
>>>
>>>  27c:   0080b183            ld  gp,8(ra)
>>>  280:   ff010eb7            lui t4,0xff010
>>>  284:   f01e8e9b            addiw   t4,t4,-255
>>>  288:   010e9e93            slli    t4,t4,0x10
>>>  28c:   f01e8e93            addi    t4,t4,-255 # ffffffffff00ff01
>>> <_end+0xffffffffff00eee1>
>>>  290:   010e9e93            slli    t4,t4,0x10
>>>  294:   f00e8e93            addi    t4,t4,-256
>>>  298:   00300e13            li  t3,3
>>>  29c:   37d19c63            bne gp,t4,614 <fail>
>>>
>>> I am guessing the correct data isn't being loaded into gp? How do I check
>>> this in the output file? I thought gp was the alias for register 31, but I
>>> don't see r31 near gp at that point.
>>>
>>> Thanks.
>>>
>>>
>>> On Fri, Sep 18, 2015 at 4:36 AM, Wei Song <ws327 at cam.ac.uk> wrote:
>>>
>>>> Hello Zhe Cheng,
>>>>
>>>> I think you are probably right about what is needed to support tags on
>>>> the latest rocket repo.
>>>> However, it is always more complicated to make it really work.
>>>>
>>>> One thing I noticed is that you probably need to apply the changes to
>>>> htif.scala as well, if you have not done so already.
>>>> The tags are stored in a cache line interleaved like
>>>> [tag][word][tag][word]....
>>>>
>>>> The insertTag() and removeTag() in HTIF will make sure tag/data end up
>>>> in the right interleaved position inside a cache line.
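>>>>
>>>> A minimal sketch of that interleaving (Chisel 2; I am assuming 64-bit
>>>> words with the 1-bit tag placed just above each word, and zero tags on
>>>> insertion, so please check against the actual tag utilities rather than
>>>> taking this layout as definitive):
>>>>
>>>>   // tagged line: [tag(1)][word(64)][tag(1)][word(64)]... (low word first)
>>>>   def insertTag(data: UInt, wordsPerLine: Int): UInt =
>>>>     Cat((0 until wordsPerLine).reverse.map { i =>
>>>>       Cat(UInt(0, width = 1), data(64 * i + 63, 64 * i)) // tag + word
>>>>     })
>>>>   def removeTag(tagged: UInt, wordsPerLine: Int): UInt =
>>>>     Cat((0 until wordsPerLine).reverse.map { i =>
>>>>       tagged(65 * i + 63, 65 * i)                        // drop the tag
>>>>     })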
>>>>
>>>> The host interface (HTIF) is very important, as the test programs
>>>> (elf/hex) are written to memory/L2 through it.
>>>> I think the host interface may have written a completely misaligned
>>>> program to memory due to the missing insertTag() function.
>>>>
>>>> You also need to revise the control path of the rocket core, which I
>>>> think you have already done.
>>>>
>>>> For general debugging tips, you can compare the traces from simulation
>>>> with the dump files of the test programs.
>>>> Making sure the rocket processor is running the correct instructions
>>>> would be my first check.
>>>>
>>>> BTW, I am working on bringing up a stand-alone lowRISC with tag support
>>>> based on the latest Rocket chip.
>>>> However, it is a slow process and I will need at least a couple of months
>>>> for it.
>>>> You will be able to run on a clean design if you can wait that long.
>>>> Or, if you would like to help, see the "update" branch of
>>>> lowrisc-chip.git.
>>>> I am working on peripherals now; tag support is not added yet, so I could
>>>> use some help bringing tag support back into the new code.
>>>>
>>>> Hope this is helpful,
>>>> Wei
>>>>
>>>>
>>>> On 18/09/2015 00:32, Zhe Cheng Lee wrote:
>>>> > Hi, all,
>>>> >
>>>> > Has anyone successfully ported the lowRISC changes for tagged memory
>>>> > support to a more recent version of the rocket-chip repository (e.g.
>>>> > developed lowRISC on top of a more recent rocket-chip)?
>>>> >
>>>> > I want to develop a design module that relies on those tagged memory
>>>> > bits and is to be integrated with the most recent version of the rocket
>>>> > chip. At this stage of my development process, I just want at least the
>>>> > L1 caches to support tagged memory. In other words, I'm not concerned
>>>> > about including the tag cache or supporting tagged memory in main
>>>> > memory right now. I'm having trouble successfully pushing the tags into
>>>> > the L1 caches. I have already added the load/store tag instruction
>>>> > decoding and encoding (I'm aware that the order of the control signals
>>>> > in the decode table has changed a bit since the rocket-chip version
>>>> > lowRISC is based on), the new memory access type constant MT_T, and the
>>>> > necessary config parameters.
>>>> >
>>>> > At first, I thought I just needed to port the highlighted modifications
>>>> > in lowRISC's nbdcache.scala from
>>>> > https://github.com/lowRISC/rocket/commit/51f65e2dce1bc60ef37c6da956bd8f9c8972961b#diff-de7e6f4be95f6d3b7e13d6c32e5c9783
>>>> > and in its tilelink.scala from
>>>> > https://github.com/lowRISC/uncore/commit/cebfde6d42b7465cab79518fad91e323a1a5af41#diff-228d7a2c10baa84f6595aeec2d50174b
>>>> > to the corresponding places in rocket-chip's nbdcache.scala,
>>>> > cache.scala, and tilelink.scala. Even without the tag utilities and tag
>>>> > cache, this should be fine just for testing the existing instructions,
>>>> > since the tag bits would simply be ignored in those cases, correct? But
>>>> > with that change, the simulations do not pass the prebuilt tests and
>>>> > benchmarks, even those that don't use the load/store tag instructions.
>>>> >
>>>> > Can anyone help with this?
>>>> >
>>>> > Thanks.
>>>>
>>>>
>>>
>>>
>>
>>
>
>

