Hi Mark,
With 68234df4ea7939f98431aa81113fbdce10c4a84b
arm64: kill flush_cache_all()
The documented semantics of flush_cache_all are not possible to provide
for arm64 (short of flushing the entire physical address space by VA),
and there are currently no users; KVM uses VA maintenance exclusively,
cpu_reset is never called, and the only two users outside of arch code
cannot be built for arm64.
While cpu_soft_reset and related functions (which call flush_cache_all)
were thought to be useful for kexec, their current implementations only
serve to mask bugs. For correctness kexec will need to perform
maintenance by VA anyway to account for system caches, line migration,
and other subtleties of the cache architecture. As the extent of this
cache maintenance will be kexec-specific, it should probably live in the
kexec code.
This patch removes flush_cache_all, and related unused components,
preventing further abuse.
This patch deletes the flush_cache_all interface.
But if we use VA-based flushing for cache coherency with another master (e.g. a GPU),
we must iterate over the sg-list and flush each entry by VA.
Iterating over the sg-list in this way (sg-table to sg-list) may cost too much time
if the sg-list is long; take a look at ion_pages_sync_for_device in ion.
A driver (e.g. ION) needs this interface (flush cache all) to *improve efficiency*.
[adding LAKML]
On Mon, Mar 21, 2016 at 04:07:47PM +0800, Chen Feng wrote:
> Hi Mark,
Hi,
> With 68234df4ea7939f98431aa81113fbdce10c4a84b
> arm64: kill flush_cache_all()
> The documented semantics of flush_cache_all are not possible to provide
> for arm64 (short of flushing the entire physical address space by VA),
> and there are currently no users; KVM uses VA maintenance exclusively,
> cpu_reset is never called, and the only two users outside of arch code
> cannot be built for arm64.
>
> While cpu_soft_reset and related functions (which call flush_cache_all)
> were thought to be useful for kexec, their current implementations only
> serve to mask bugs. For correctness kexec will need to perform
> maintenance by VA anyway to account for system caches, line migration,
> and other subtleties of the cache architecture. As the extent of this
> cache maintenance will be kexec-specific, it should probably live in the
> kexec code.
>
> This patch removes flush_cache_all, and related unused components,
> preventing further abuse.
>
>
> This patch deletes the flush_cache_all interface.
As the patch states, it does so because the documented semantics are
impossible to provide, as there is no portable mechanism to "flush" all
caches in the system.
Set/Way operations cannot guarantee that data has been cleaned to the
PoC (i.e. memory), or invalidated from all levels of cache. Reasons
include:
* They may race against background behaviour of the CPU (e.g.
speculation), which may allocate/evict/migrate lines. Depending on the
cache topology, this may "hide" lines from subsequent Set/Way
operations.
* They are not broadcast, and do not affect other CPUs. Depending on the
implemented cache coherency protocols, other CPUs may be able to
acquire dirty lines, or retain lines in shared states, and hence these
may not be operated on.
* They do not affect system caches (which respect cache maintenance by
VA in ARMv8-A).
The only portable mechanism to perform cache maintenance to all relevant
caches is by VA.
> But if we use VA-based flushing for cache coherency with another
> master (e.g. a GPU), we must iterate over the sg-list and flush each
> entry by VA.
>
> Iterating over the sg-list in this way (sg-table to sg-list) may cost
> too much time if the sg-list is long; take a look at
> ion_pages_sync_for_device in ion.
>
> A driver (e.g. ION) needs this interface (flush cache all) to
> *improve efficiency*.
As above, we cannot use Set/Way operations for this, and cannot provide
a flush_cache_all interface.
I'm not sure what to suggest regarding improving efficiency.
Is walking the sglist the expensive portion, or is the problem the cost
of multiple page-size operations (each with their own barriers)?
Thanks,
Mark.
On 03/21/2016 03:08 AM, Mark Rutland wrote:
> [adding LAKML]
>
> On Mon, Mar 21, 2016 at 04:07:47PM +0800, Chen Feng wrote:
>> Hi Mark,
>
> Hi,
>
>> With 68234df4ea7939f98431aa81113fbdce10c4a84b
>> arm64: kill flush_cache_all()
>> The documented semantics of flush_cache_all are not possible to provide
>> for arm64 (short of flushing the entire physical address space by VA),
>> and there are currently no users; KVM uses VA maintenance exclusively,
>> cpu_reset is never called, and the only two users outside of arch code
>> cannot be built for arm64.
>>
>> While cpu_soft_reset and related functions (which call flush_cache_all)
>> were thought to be useful for kexec, their current implementations only
>> serve to mask bugs. For correctness kexec will need to perform
>> maintenance by VA anyway to account for system caches, line migration,
>> and other subtleties of the cache architecture. As the extent of this
>> cache maintenance will be kexec-specific, it should probably live in the
>> kexec code.
>>
>> This patch removes flush_cache_all, and related unused components,
>> preventing further abuse.
>>
>>
>> This patch deletes the flush_cache_all interface.
>
> As the patch states, it does so because the documented semantics are
> impossible to provide, as there is no portable mechanism to "flush" all
> caches in the system.
>
> Set/Way operations cannot guarantee that data has been cleaned to the
> PoC (i.e. memory), or invalidated from all levels of cache. Reasons
> include:
>
> * They may race against background behaviour of the CPU (e.g.
> speculation), which may allocate/evict/migrate lines. Depending on the
> cache topology, this may "hide" lines from subsequent Set/Way
> operations.
>
> * They are not broadcast, and do not affect other CPUs. Depending on the
> implemented cache coherency protocols, other CPUs may be able to
> acquire dirty lines, or retain lines in shared states, and hence these
> may not be operated on.
>
> * They do not affect system caches (which respect cache maintenance by
> VA in ARMv8-A).
>
> The only portable mechanism to perform cache maintenance to all relevant
> caches is by VA.
>
>> But if we use VA-based flushing for cache coherency with another
>> master (e.g. a GPU), we must iterate over the sg-list and flush each
>> entry by VA.
>>
>> Iterating over the sg-list in this way (sg-table to sg-list) may cost
>> too much time if the sg-list is long; take a look at
>> ion_pages_sync_for_device in ion.
>>
>> A driver (e.g. ION) needs this interface (flush cache all) to
>> *improve efficiency*.
>
> As above, we cannot use Set/Way operations for this, and cannot provide
> a flush_cache_all interface.
>
> I'm not sure what to suggest regarding improving efficiency.
>
> Is walking the sglist the expensive portion, or is the problem the cost
> of multiple page-size operations (each with their own barriers)?
>
Last time I looked at this, it was mostly the multiple page-size operations.
Ion buffers can be big and easily bigger than the cache as well so flushing
8MB of buffer for a 2MB cache is really a performance killer. The Set/Way
operations are an improvement on systems where they can be used.
The way Ion does cache maintenance is full of sadness and despair in general.
Until everything gets a significant rework, the best option may be
minimization of code paths where cache operations are called.
> Thanks,
> Mark.
>
Thanks,
Laura
Hi Mark & Laura
On 2016/3/21 23:58, Laura Abbott wrote:
> On 03/21/2016 03:08 AM, Mark Rutland wrote:
>> [adding LAKML]
>>
>> On Mon, Mar 21, 2016 at 04:07:47PM +0800, Chen Feng wrote:
>>> Hi Mark,
>>
>> Hi,
>>
>>> With 68234df4ea7939f98431aa81113fbdce10c4a84b
>>> arm64: kill flush_cache_all()
>>> The documented semantics of flush_cache_all are not possible to provide
>>> for arm64 (short of flushing the entire physical address space by VA),
>>> and there are currently no users; KVM uses VA maintenance exclusively,
>>> cpu_reset is never called, and the only two users outside of arch code
>>> cannot be built for arm64.
>>>
>>> While cpu_soft_reset and related functions (which call flush_cache_all)
>>> were thought to be useful for kexec, their current implementations only
>>> serve to mask bugs. For correctness kexec will need to perform
>>> maintenance by VA anyway to account for system caches, line migration,
>>> and other subtleties of the cache architecture. As the extent of this
>>> cache maintenance will be kexec-specific, it should probably live in the
>>> kexec code.
>>>
>>> This patch removes flush_cache_all, and related unused components,
>>> preventing further abuse.
>>>
>>>
>>> This patch deletes the flush_cache_all interface.
>>
>> As the patch states, it does so because the documented semantics are
>> impossible to provide, as there is no portable mechanism to "flush" all
>> caches in the system.
>>
>> Set/Way operations cannot guarantee that data has been cleaned to the
>> PoC (i.e. memory), or invalidated from all levels of cache. Reasons
>> include:
>>
>> * They may race against background behaviour of the CPU (e.g.
>> speculation), which may allocate/evict/migrate lines. Depending on the
>> cache topology, this may "hide" lines from subsequent Set/Way
>> operations.
>>
>> * They are not broadcast, and do not affect other CPUs. Depending on the
>> implemented cache coherency protocols, other CPUs may be able to
>> acquire dirty lines, or retain lines in shared states, and hence these
>> may not be operated on.
>>
>> * They do not affect system caches (which respect cache maintenance by
>> VA in ARMv8-A).
>>
>> The only portable mechanism to perform cache maintenance to all relevant
>> caches is by VA.
>>
>>> But if we use VA-based flushing for cache coherency with another
>>> master (e.g. a GPU), we must iterate over the sg-list and flush each
>>> entry by VA.
>>>
>>> Iterating over the sg-list in this way (sg-table to sg-list) may cost
>>> too much time if the sg-list is long; take a look at
>>> ion_pages_sync_for_device in ion.
>>>
>>> A driver (e.g. ION) needs this interface (flush cache all) to
>>> *improve efficiency*.
>>
>> As above, we cannot use Set/Way operations for this, and cannot provide
>> a flush_cache_all interface.
>>
>> I'm not sure what to suggest regarding improving efficiency.
>>
>> Is walking the sglist the expensive portion, or is the problem the cost
>> of multiple page-size operations (each with their own barriers)?
>>
>
> Last time I looked at this, it was mostly the multiple page-size operations.
> Ion buffers can be big and easily bigger than the cache as well so flushing
> 8MB of buffer for a 2MB cache is really a performance killer. The Set/Way
> operations are an improvement on systems where they can be used.
>
> The way Ion does cache maintenance is full of sadness and despair in general.
> Until everything gets a significant rework, the best option may be
> minimization of code paths where cache operations are called.
>
Thanks for sharing.
Consider the low-memory case, where only 4K pages are available in the system.
If the ION buffer is 8MB or bigger, the sg-list is very long, and
for_each_sg(sgl, sg, ret, i) takes too much time.
Laura,
> the best option may be
> minimization of code paths where cache operations are called.
I am confused by the above comment. The buffer is used by the camera,
and the frame size may simply be that large. Do you have any suggestions
for improving this?
Thanks very much.
>> Thanks,
>> Mark.
>>
>
> Thanks,
> Laura
>
>
On Mon, Mar 21, 2016 at 08:58:02AM -0700, Laura Abbott wrote:
> On 03/21/2016 03:08 AM, Mark Rutland wrote:
> >On Mon, Mar 21, 2016 at 04:07:47PM +0800, Chen Feng wrote:
> >>But if we use VA-based flushing for cache coherency with another
> >>master (e.g. a GPU), we must iterate over the sg-list and flush each
> >>entry by VA.
> >>
> >>Iterating over the sg-list in this way (sg-table to sg-list) may cost
> >>too much time if the sg-list is long; take a look at
> >>ion_pages_sync_for_device in ion.
> >>
> >>A driver (e.g. ION) needs this interface (flush cache all) to
> >>*improve efficiency*.
> >I'm not sure what to suggest regarding improving efficiency.
> >
> >Is walking the sglist the expensive portion, or is the problem the cost
> >of multiple page-size operations (each with their own barriers)?
>
> Last time I looked at this, it was mostly the multiple page-size operations.
We may be able to amortize some of that cost if we had non-synchronised
cache maintenance operations for each page, then followed that up with a
single final DSB SY.
There are several places in arch/arm64/mm/dma-mapping.c (practically
every use of for_each_sg) that could potentially benefit. I'm not sure
how much that's likely to gain as it will depend heavily on the
microarchitecture.
Regardless, it looks like that would require ion_pages_sync_for_device
and friends to be reworked, as it seems to only hand single pages down
to the architecture backend rather than a more complete sglist.
Thanks,
Mark.