LinuxLists.cc - Two questions about cache coherency on arm platforms

2020-03-23 12:36:04

Subject: Two questions about cache coherency on arm platforms

Hi, All,
I am not very familiar with ARM processors. I have two questions about
cache coherency. Could anyone help me?

1. How is cache coherency maintained on ARMv8 big.LITTLE systems?
As far as I know, big cores and little cores are in separate clusters on
big.LITTLE systems. And cache coherence between clusters requires that the
memory regions be marked as 'Outer Shareable', which is very expensive.
I have checked the kernel code, and it seems to require coherence only in
the 'Inner Shareable' domain. So my question is: how can Linux guarantee
cache coherence in the 'CPU migration' or 'Global Task Scheduling' models,
in which both clusters are active at the same time? For example, a thread
runs in Cluster A and modifies 'Inner Shareable' memory, then migrates
to Cluster B.

2. ARM64 cache maintenance code sync_icache_aliases() for non-aliasing icache.
In the Linux kernel on arm64, the function sync_icache_aliases() below
is used to sync the i-cache and d-cache. I understand the aliasing case. But
for the non-aliasing case, why does it just do "dc cvau" (in __flush_icache_range())
without really invalidating the icache? Will the i-cache refill from the L2 cache?

void sync_icache_aliases(void *kaddr, unsigned long len)
{
	unsigned long addr = (unsigned long)kaddr;

	if (icache_is_aliasing()) {
		__clean_dcache_area_pou(kaddr, len);
		__flush_icache_all();
	} else {
		/*
		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
		 * for user mappings.
		 */
		__flush_icache_range(addr, addr + len);
	}
}

--
Cheers,
Changbin Du

2020-03-23 13:18:23

by Mark Rutland

Subject: Re: Two questions about cache coherency on arm platforms

On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> Hi, All,
> I am not very familiar with ARM processors. I have two questions about
> cache coherency. Could anyone help me?
>
> 1. How is cache coherency maintained on ARMv8 big.LITTLE systems?
> As far as I know, big cores and little cores are in separate clusters on
> big.LITTLE systems.

This is often true, but not always the case. For example, with DSU big
and little cores can be placed within the same cluster.

> And cache coherence between clusters requires that the
> memory regions be marked as 'Outer Shareable', which is very expensive.

This is not correct.

Linux requires that all cores it uses are within the same Inner
Shareable domain, regardless of whether they are in distinct clusters.
Linux does not support systems where cores are in distinct Inner
Shareable domains.

This is the intended use of the architecture. Per ARM DDI 0487E.a page
B2-144:

| This architecture assumes that all PEs that use the same operating
| system or hypervisor are in the same Inner Shareable shareability
| domain.

... where a PE is a "Processing Element", which you can think of as a
single core.

> I have checked the kernel code, and it seems to require coherence only in
> the 'Inner Shareable' domain. So my question is: how can Linux guarantee
> cache coherence in the 'CPU migration' or 'Global Task Scheduling' models,
> in which both clusters are active at the same time? For example, a thread
> runs in Cluster A and modifies 'Inner Shareable' memory, then migrates
> to Cluster B.

As above, this works because all the relevant cores are within the same
Inner Shareable domain.
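
For readers following along, the shareability attribute discussed above is
a two-bit field in each page-table entry. The sketch below is only an
illustration, not kernel code. It assumes the classic ARMv8-A stage 1
descriptor layout, where SH[1:0] sits at bits [9:8], and the helper names
(desc_shareability, shareability_name) are made up for this example. arm64
Linux uses the Inner Shareable encoding (PTE_SHARED, 3 << 8) for its Normal
memory mappings, which is why coherence across clusters comes down to all
cores being in one Inner Shareable domain.

```c
#include <stdint.h>

/* Assumed layout: SH[1:0] at bits [9:8] of a stage 1 block/page descriptor. */
#define SH_SHIFT 8
#define SH_MASK  (UINT64_C(3) << SH_SHIFT)

enum shareability {
    SH_NON_SHAREABLE = 0,   /* 0b00 */
    SH_RESERVED      = 1,   /* 0b01 */
    SH_OUTER         = 2,   /* 0b10 */
    SH_INNER         = 3,   /* 0b11, the encoding PTE_SHARED selects */
};

/* Extract the shareability field from a raw descriptor value. */
static enum shareability desc_shareability(uint64_t desc)
{
    return (enum shareability)((desc & SH_MASK) >> SH_SHIFT);
}

/* Human-readable name for a shareability encoding. */
static const char *shareability_name(enum shareability sh)
{
    switch (sh) {
    case SH_NON_SHAREABLE: return "Non-shareable";
    case SH_RESERVED:      return "Reserved";
    case SH_OUTER:         return "Outer Shareable";
    case SH_INNER:         return "Inner Shareable";
    }
    return "unknown";
}
```

Passing a descriptor with bits [9:8] set to 0b11 to desc_shareability()
yields SH_INNER, regardless of which cluster the accessing core sits in.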

> 2. ARM64 cache maintenance code sync_icache_aliases() for non-aliasing icache.
> In the Linux kernel on arm64, the function sync_icache_aliases() below
> is used to sync the i-cache and d-cache. I understand the aliasing case. But
> for the non-aliasing case, why does it just do "dc cvau" (in __flush_icache_range())
> without really invalidating the icache?

The __flush_icache_range/__flush_cache_user_range assembly function does
both the D-cache maintenance with DC CVAU, then the I-cache maintenance
with IC IVAU, so I think you have misread it.
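
To see why both steps matter, here is a deliberately simplified toy model
in C. It is not how the hardware or the kernel is implemented: it assumes a
single word of memory standing in for the point of unification, one
write-back D-cache line, and one I-cache line that does not snoop the
D-cache. All names (toy_system, store_insn, patch_and_fetch, and so on) are
invented for this sketch, and the barriers the real sequence needs (DSB ISH,
ISB) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define OLD_INSN 0x11111111u
#define NEW_INSN 0x22222222u

struct toy_system {
    uint32_t memory;        /* contents at the point of unification */
    uint32_t dcache;        /* write-back D-cache copy */
    bool     dcache_dirty;
    uint32_t icache;        /* I-cache copy, refilled only on a miss */
    bool     icache_valid;
};

/* A store lands in the D-cache only; memory is not updated yet. */
static void store_insn(struct toy_system *s, uint32_t insn)
{
    s->dcache = insn;
    s->dcache_dirty = true;
}

/* Models DC CVAU: write a dirty D-cache line back to the PoU. */
static void dc_cvau(struct toy_system *s)
{
    if (s->dcache_dirty) {
        s->memory = s->dcache;
        s->dcache_dirty = false;
    }
}

/* Models IC IVAU: drop the I-cache line so the next fetch misses. */
static void ic_ivau(struct toy_system *s)
{
    s->icache_valid = false;
}

/* Instruction fetch: hit in the I-cache, or refill from the PoU. */
static uint32_t fetch_insn(struct toy_system *s)
{
    if (!s->icache_valid) {
        s->icache = s->memory;
        s->icache_valid = true;
    }
    return s->icache;
}

/* Warm the I-cache, patch the instruction, then fetch it again. */
static uint32_t patch_and_fetch(bool clean, bool invalidate)
{
    struct toy_system s = { .memory = OLD_INSN, .dcache = OLD_INSN };

    fetch_insn(&s);             /* I-cache now holds OLD_INSN */
    store_insn(&s, NEW_INSN);   /* new code sits dirty in the D-cache */
    if (clean)
        dc_cvau(&s);
    if (invalidate)
        ic_ivau(&s);
    return fetch_insn(&s);
}
```

In this model only patch_and_fetch(true, true) returns NEW_INSN. Cleaning
without invalidating leaves the stale I-cache line in place, and
invalidating without cleaning refills the I-cache from stale memory.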

Thanks,
Mark.

> Will i-cache refill from L2 cache?
>
> void sync_icache_aliases(void *kaddr, unsigned long len)
> {
> 	unsigned long addr = (unsigned long)kaddr;
>
> 	if (icache_is_aliasing()) {
> 		__clean_dcache_area_pou(kaddr, len);
> 		__flush_icache_all();
> 	} else {
> 		/*
> 		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> 		 * for user mappings.
> 		 */
> 		__flush_icache_range(addr, addr + len);
> 	}
> }
>
> --
> Cheers,
> Changbin Du

2020-03-23 16:16:55

by Changbin Du

Subject: Re: Two questions about cache coherency on arm platforms

Hi Mark,
Thanks for your answer. I still don't understand the first question.

On Mon, Mar 23, 2020 at 01:17:20PM +0000, Mark Rutland wrote:
> On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> > Hi, All,
> > I am not very familiar with ARM processors. I have two questions about
> > cache coherency. Could anyone help me?
> >
> > 1. How is cache coherency maintained on ARMv8 big.LITTLE systems?
> > As far as I know, big cores and little cores are in separate clusters on
> > big.LITTLE systems.
>
> This is often true, but not always the case. For example, with DSU big
> and little cores can be placed within the same cluster.
>
Yes, it is true for DynamIQ that big and LITTLE cores can be placed within the
same cluster. But I don't understand how Linux supported big.LITTLE before DynamIQ.

I read the description below in the ARM Cortex-A Series Programmer’s Guide for
ARMv8-A:
| big.LITTLE software models require transparent and efficient transfer of data between big and LITTLE clusters.
| Coherency between clusters is provided by a cache-coherent interconnect such as the ARM CoreLink CCI-400 described in Chapter 14.

So I think big cores and little cores are in different clusters in this
case. Then we are not within the same Inner Shareable domain?

> > And cache coherence between clusters requires that the
> > memory regions be marked as 'Outer Shareable', which is very expensive.
>
> This is not correct.
>
> Linux requires that all cores it uses are within the same Inner
> Shareable domain, regardless of whether they are in distinct clusters.
> Linux does not support systems where cores are in distinct Inner
> Shareable domains.
>
I see. Thanks.

> This is the intended use of the architecture. Per ARM DDI 0487E.a page
> B2-144:
>
> | This architecture assumes that all PEs that use the same operating
> | system or hypervisor are in the same Inner Shareable shareability
> | domain.
>
> ... where a PE is a "Processing Element", which you can think of as a
> single core.
>
> > I have checked the kernel code, and it seems to require coherence only in
> > the 'Inner Shareable' domain. So my question is: how can Linux guarantee
> > cache coherence in the 'CPU migration' or 'Global Task Scheduling' models,
> > in which both clusters are active at the same time? For example, a thread
> > runs in Cluster A and modifies 'Inner Shareable' memory, then migrates
> > to Cluster B.
>
> As above, this works because all the relevant cores are within the same
> Inner Shareable domain.
>
> > 2. ARM64 cache maintenance code sync_icache_aliases() for non-aliasing icache.
> > In the Linux kernel on arm64, the function sync_icache_aliases() below
> > is used to sync the i-cache and d-cache. I understand the aliasing case. But
> > for the non-aliasing case, why does it just do "dc cvau" (in __flush_icache_range())
> > without really invalidating the icache?
>
> The __flush_icache_range/__flush_cache_user_range assembly function does
> both the D-cache maintenance with DC CVAU, then the I-cache maintenance
> with IC IVAU, so I think you have misread it.
>
Yes. I missed the IC IVAU instruction defined in macro
invalidate_icache_by_line.
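
As far as I can tell, that macro walks the range one cache line at a time,
taking the line size from CTR_EL0. A rough C sketch of that walk is below.
It is an illustration only: the function names (icache_min_line_bytes,
for_each_line, record_last) are hypothetical, the CTR_EL0 value is passed
in as a plain number rather than read from the register, and the callback
stands in for the actual "ic ivau" instruction.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CTR_EL0.IminLine, bits [3:0], holds log2 of the number of 4-byte words
 * in the smallest I-cache line, so the size in bytes is 4 << IminLine.
 */
static uint64_t icache_min_line_bytes(uint64_t ctr_el0)
{
    return UINT64_C(4) << (ctr_el0 & 0xf);
}

typedef void (*line_op_t)(uint64_t line_addr, void *ctx);

/*
 * Call op once for every cache line overlapping [start, end).
 * 'line' must be a power of two. Returns the number of lines visited.
 */
static size_t for_each_line(uint64_t start, uint64_t end, uint64_t line,
                            line_op_t op, void *ctx)
{
    size_t visited = 0;

    /* Round start down to a line boundary, then step line by line. */
    for (uint64_t addr = start & ~(line - 1); addr < end; addr += line) {
        if (op)
            op(addr, ctx);  /* the real code would issue "ic ivau" here */
        visited++;
    }
    return visited;
}

/* Example callback: remember the last line address that was visited. */
static void record_last(uint64_t line_addr, void *ctx)
{
    *(uint64_t *)ctx = line_addr;
}
```

With IminLine = 4 the line size is 64 bytes, and the range 0x1010..0x1090
touches three lines, starting at 0x1000, 0x1040 and 0x1080.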

> Thanks,
> Mark.
>
> > Will i-cache refill from L2 cache?
> >
> > void sync_icache_aliases(void *kaddr, unsigned long len)
> > {
> > 	unsigned long addr = (unsigned long)kaddr;
> >
> > 	if (icache_is_aliasing()) {
> > 		__clean_dcache_area_pou(kaddr, len);
> > 		__flush_icache_all();
> > 	} else {
> > 		/*
> > 		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> > 		 * for user mappings.
> > 		 */
> > 		__flush_icache_range(addr, addr + len);
> > 	}
> > }
> >
> > --
> > Cheers,
> > Changbin Du

--
Cheers,
Changbin Du

2020-03-23 16:48:21

by Mark Rutland

Subject: Re: Two questions about cache coherency on arm platforms

On Mon, Mar 23, 2020 at 04:15:40PM +0000, Changbin Du wrote:
> Hi Mark,
> Thanks for your answer. I still don't understand the first question.
>
> On Mon, Mar 23, 2020 at 01:17:20PM +0000, Mark Rutland wrote:
> > On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> > > Hi, All,
> > > I am not very familiar with ARM processors. I have two questions about
> > > cache coherency. Could anyone help me?
> > >
> > > 1. How is cache coherency maintained on ARMv8 big.LITTLE systems?
> > > As far as I know, big cores and little cores are in separate clusters on
> > > big.LITTLE systems.
> >
> > This is often true, but not always the case. For example, with DSU big
> > and little cores can be placed within the same cluster.
>
> Yes, it is true for DynamIQ that big and LITTLE cores can be placed within the
> same cluster. But I don't understand how Linux supported big.LITTLE before DynamIQ.

Multiple clusters can be in the same Inner Shareable domain, and Linux
relies on this being the case for systems it supports. It's possible to
build a system where clusters are in distinct Inner Shareable domains,
but Linux does not support using all cores on such a system.

Even with CCI, CCN, CMN, etc, Linux requires that all cores (which it is
told about) are in the same Inner Shareable domain. That is what is
commonly built.

> I read the description below in the ARM Cortex-A Series Programmer’s Guide for
> ARMv8-A:
> | big.LITTLE software models require transparent and efficient transfer of data between big and LITTLE clusters.
> | Coherency between clusters is provided by a cache-coherent interconnect such as the ARM CoreLink CCI-400 described in Chapter 14.
>
> So I think big cores and little cores are in different clusters in this
> case. Then we are not within the same Inner Shareable domain?

Linux requires that those clusters are in the same Inner Shareable
domain, and that's what people (mostly) build today.

Thanks,
Mark.

2020-03-24 00:02:56

by Changbin Du

Subject: Re: Two questions about cache coherency on arm platforms

On Mon, Mar 23, 2020 at 04:47:24PM +0000, Mark Rutland wrote:
> On Mon, Mar 23, 2020 at 04:15:40PM +0000, Changbin Du wrote:
> > Hi Mark,
> > Thanks for your answer. I still don't understand the first question.
> >
> > On Mon, Mar 23, 2020 at 01:17:20PM +0000, Mark Rutland wrote:
> > > On Mon, Mar 23, 2020 at 08:35:26PM +0800, Changbin Du wrote:
> > > > Hi, All,
> > > > I am not very familiar with ARM processors. I have two questions about
> > > > cache coherency. Could anyone help me?
> > > >
> > > > 1. How is cache coherency maintained on ARMv8 big.LITTLE systems?
> > > > As far as I know, big cores and little cores are in separate clusters on
> > > > big.LITTLE systems.
> > >
> > > This is often true, but not always the case. For example, with DSU big
> > > and little cores can be placed within the same cluster.
> >
> > Yes, it is true for DynamIQ that big and LITTLE cores can be placed within the
> > same cluster. But I don't understand how Linux supported big.LITTLE before DynamIQ.
>
> Multiple clusters can be in the same Inner Shareable domain, and Linux
> relies on this being the case for systems it supports. It's possible to
> build a system where clusters are in distinct Inner Shareable domains,
> but Linux does not support using all cores on such a system.
>
> Even with CCI, CCN, CMN, etc, Linux requires that all cores (which it is
> told about) are in the same Inner Shareable domain. That is what is
> commonly built.
>
Thank you, I see now. I thought clusters had to be in distinct Inner
Shareable domains, so I was wrong. The manual is somewhat misleading.

> > I read the description below in the ARM Cortex-A Series Programmer’s Guide for
> > ARMv8-A:
> > | big.LITTLE software models require transparent and efficient transfer of data between big and LITTLE clusters.
> > | Coherency between clusters is provided by a cache-coherent interconnect such as the ARM CoreLink CCI-400 described in Chapter 14.
> >
> > So I think big cores and little cores are in different clusters in this
> > case. Then we are not within the same Inner Shareable domain?
>
> Linux requires that those clusters are in the same Inner Shareable
> domain, and that's what people (mostly) build today.
>
> Thanks,
> Mark.

--
Cheers,
Changbin Du