In https://lore.kernel.org/lkml/CAHk-=wijdojzo56FzYqE5TOYw2Vws7ik3LEMGj9SPQaJJ+Z73Q@mail.gmail.com/
Linus offers the opinion that kunmap calls should imply a
flush_dcache_page(). Christoph added calls to flush_dcache_page()
in commit 8dad53a11f8d. Was this "voodoo programming", or was there
a real problem being addressed?
On Thu, Nov 4, 2021 at 8:03 AM Matthew Wilcox <[email protected]> wrote:
>
> Linus offers the opinion that kunmap calls should imply a
> flush_dcache_page(). Christoph added calls to flush_dcache_page()
> in commit 8dad53a11f8d. Was this "voodoo programming", or was there
> a real problem being addressed?
I don't think anybody actually uses/cares about flush_dcache_page() at
all, and pretty much all uses are random and voodoo.
No sane architecture uses pure virtual caches, and the insane ones
haven't been an issue for a long time either.
But if there are still systems with pure virtual caches, and they need
manual cache flushing, then I do think that kunmap is one of the
points that needs it, since that's the "I'm done accessing this data
through this virtual address" place.
End result: I really don't think anybody cares any more (and only
truly broken architectures ever did). I'd personally be perfectly
happy just saying "we might as well drop support for non-coherent
caches entirely".
But as long as we have those random odd "flush dcache manually"
things, I think kunmap() is one of the places that probably should
continue to do them.
Of course, the kunmap case is _doubly_ irrelevant, because we should
certainly hope that not only are those noncoherent pure virtual caches
a thing of the past, but highmem itself should be going away.
Why did this come up? Do you actually have some hardware or situation
that cares?
Linus
On Thu, Nov 04, 2021 at 08:30:55AM -0700, Linus Torvalds wrote:
> On Thu, Nov 4, 2021 at 8:03 AM Matthew Wilcox <[email protected]> wrote:
> > Linus offers the opinion that kunmap calls should imply a
> > flush_dcache_page(). Christoph added calls to flush_dcache_page()
> > in commit 8dad53a11f8d. Was this "voodoo programming", or was there
> > a real problem being addressed?
>
> I don't think anybody actually uses/cares about flush_dcache_page() at
> all, and pretty much all uses are random and voodoo.
We do. flush_dcache_page() is not just about virtual caches. On arm32/64
(and powerpc), even with PIPT-like caches, we use it to flag a page's
D-cache as no longer clean. Subsequently in set_pte_at(), if the mapping
is executable, we do the cache maintenance to ensure the I and D caches
are coherent with each other.
I wouldn't add this call to kmap/kunmap_local(); it would be a slight
unnecessary overhead (we had a customer complaining about kmap_atomic()
breaking write-streaming; I think the new kmap_local() solved this
problem, if used in the right context).
--
Catalin
On Thu, Nov 4, 2021 at 9:54 AM Catalin Marinas <[email protected]> wrote:
>
> We do. flush_dcache_page() is not just about virtual caches. On arm32/64
> (and powerpc), even with PIPT-like caches, we use it to flag a page's
> D-cache as no longer clean. Subsequently in set_pte_at(), if the mapping
> is executable, we do the cache maintenance to ensure the I and D caches
> are coherent with each other.
Ugh. Ok, so we have two very different use-cases for that function.
Perhaps more importantly, they have hugely different semantics. For
you, it's about pages that can be mapped executable, so it's only
relevant for mappable pages.
For the traditional broken pure virtual cache case, it's not about
user mappings at all, it's about any data structure that we might have
in highmem.
Of course, I think we got rid of most of the other uses of highmem,
and we no longer put any "normal" kernel data in highmem pages. There
used to be patches that did inodes and things like that in highmem,
and they actually depended on the "cache the virtual address so that
it's always the same" behavior.
> I wouldn't add this call to kmap/kunmap_local(); it would be a slight
> unnecessary overhead (we had a customer complaining about kmap_atomic()
> breaking write-streaming; I think the new kmap_local() solved this
> problem, if used in the right context).
kmap_local() ends up being (I think) fundamentally broken for virtual
cache coherency anyway, because two different CPU's can see two
different virtual addresses at the same time for the same page (in
ways that the old kmap interfaces could not).
So maybe the answer is "let's forget about the old virtual cache
coherence issue, and make it purely about the I$ mapping case".
At that point, kmap is irrelevant from a virtual address standpoint
and so it doesn't make much sense to flush on kunmap - but anybody
who writes to a page still needs that flush_dcache_page() thing.
Linus
+ rmk
On Thu, Nov 04, 2021 at 10:08:40AM -0700, Linus Torvalds wrote:
> On Thu, Nov 4, 2021 at 9:54 AM Catalin Marinas <[email protected]> wrote:
> > We do. flush_dcache_page() is not just about virtual caches. On arm32/64
> > (and powerpc), even with PIPT-like caches, we use it to flag a page's
> > D-cache as no longer clean. Subsequently in set_pte_at(), if the mapping
> > is executable, we do the cache maintenance to ensure the I and D caches
> > are coherent with each other.
>
> Ugh. Ok, so we have two very different use-cases for that function.
>
> Perhaps more importantly, they have hugely different semantics. For
> you, it's about pages that can be mapped executable, so it's only
> relevant for mappable pages.
>
> For the traditional broken pure virtual cache case, it's not about
> user mappings at all, it's about any data structure that we might have
> in highmem.
>
> Of course, I think we got rid of most of the other uses of highmem,
> and we no longer put any "normal" kernel data in highmem pages. There
> used to be patches that did inodes and things like that in highmem,
> and they actually depended on the "cache the virtual address so that
> it's always the same" behavior.
We can still have ptes in highmem.
> > I wouldn't add this call to kmap/kunmap_local(); it would be a slight
> > unnecessary overhead (we had a customer complaining about kmap_atomic()
> > breaking write-streaming; I think the new kmap_local() solved this
> > problem, if used in the right context).
>
> kmap_local() ends up being (I think) fundamentally broken for virtual
> cache coherency anyway, because two different CPU's can see two
> different virtual addresses at the same time for the same page (in
> ways that the old kmap interfaces could not).
Luckily I don't think we have a (working) SMP system with VIVT caches.
On UP, looking at arm, for VIVT caches it flushes the D-cache before
kunmap_local() (arch_kmap_local_pre_unmap()). So any new kmap_local()
would see the correct data even if it's in a different location.
> So maybe the answer is "let's forget about the old virtual cache
> coherence issue, and make it purely about the I$ mapping case".
We still have VIVT processors supported in the kernel and a few where
the VIPT cache is aliasing (some ARMv6 CPUs). On these,
flush_dcache_page() is still used to ensure the user aliases are
coherent with the kernel one, so it's not just about the I/D-cache
coherency.
> At that point, kmap is irrelevant from a virtual address standpoint
> and so it doesn't make much sense to flush on kunmap - but anybody
> who writes to a page still needs that flush_dcache_page() thing.
The cachetlb.rst doc states the two cases where flush_dcache_page()
should be called:
1. After writing to a page cache page (that's what we need on arm64 for
the I-cache).
2. Before reading from a page cache page and user mappings potentially
exist. I think arm32 ensures the D-cache user aliases are coherent
with the kernel one (added rmk to confirm).
Now, whether the kernel code does call flush_dcache_page() in the above
scenarios is another matter. But if we are to remove the 2nd case, for
VIVT/aliasing-VIPT hardware we'd need kmap() to perform some cache
maintenance even if the page is not in highmem.
--
Catalin
On Thu, Nov 04, 2021 at 06:04:45PM +0000, Catalin Marinas wrote:
> The cachetlb.rst doc states the two cases where flush_dcache_page()
> should be called:
>
> 1. After writing to a page cache page (that's what we need on arm64 for
> the I-cache).
>
> 2. Before reading from a page cache page and user mappings potentially
> exist. I think arm32 ensures the D-cache user aliases are coherent
> with the kernel one (added rmk to confirm).
Yes, where necessary, we flush the user aliases in flush_dcache_page().
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Thu, Nov 4, 2021 at 11:04 AM Catalin Marinas <[email protected]> wrote:
>
> Luckily I don't think we have a (working) SMP system with VIVT caches.
> On UP, looking at arm, for VIVT caches it flushes the D-cache before
> kunmap_local() (arch_kmap_local_pre_unmap()). So any new kmap_local()
> would see the correct data even if it's in a different location.
Ok, good.
Yeah, because kmap_local and SMP really would seem to be a "that can't
work with VIVT".
> We still have VIVT processors supported in the kernel and a few where
> the VIPT cache is aliasing (some ARMv6 CPUs). On these,
> flush_dcache_page() is still used to ensure the user aliases are
> coherent with the kernel one, so it's not just about the I/D-cache
> coherency.
Maybe we could try to split it up and make each function have more
well-defined rules? One of the issues with the flush_dcache thing is
that it's always been so ad-hoc and it's not been hugely clear.
For example, people seem to think it's purely about flushing writes.
But for the virtual aliasing issue and kmap, you may need to flush
purely between reads too - just to make sure that you don't have any
stale contents.
That's why kunmap needs to have some unconditional cache flush thing
for the virtual aliasing issue.
But hey, it's entirely possible that it should *not* have that
"flush_dcache_page()" thing, but something that is private to the
architecture.
So VIVT arm (and whoever else) would continue to do the cache flushing
at kunmap_local time (or kmap - I don't think it matters which one you
do, as long as you make sure there are no stale contents from the
previous use of that address).
And then we'd relegate flush_dcache_page() purely for uses where
somebody modifies the data and wants to make sure it ends up being
coherent with subsequent uses (whether kmap and VIVT or I$/D$
coherency issues)?
> The cachetlb.rst doc states the two cases where flush_dcache_page()
> should be called:
>
> 1. After writing to a page cache page (that's what we need on arm64 for
> the I-cache).
>
> 2. Before reading from a page cache page and user mappings potentially
> exist. I think arm32 ensures the D-cache user aliases are coherent
> with the kernel one (added rmk to confirm).
I think the "kernel cache coherency" matters too. The PTE contents
thing seems relevant if we use kmap for that...
So I do think that the "page cache or user mapping" is not necessarily
the only case.
But personally I consider these situations so broken at a hardware
level that I can't find it in myself to care too deeply.
Because user space with non-coherent I$/D$ should do its own cache
flushing if it does "read()" to modify an executable range - exactly
the same way it has to do it for just doing regular stores to that
range.
It really shouldn't be the kernel that cares at all.
Linus
On Thu, Nov 04, 2021 at 08:30:55AM -0700, Linus Torvalds wrote:
> Why did this come up? Do you actually have some hardware or situation
> that cares?
Oh, we're doing review of the XFS/iomap folio patches, which led to
looking at zero_user_segments(), and I realised that memzero_page()
was now functionally identical to zero_user(). And you'd been quite
specific about not having flush_dcache_page() in there, so ... I wondered
if you'd had a change of mind.
On Thu, Nov 4, 2021 at 11:39 AM Matthew Wilcox <[email protected]> wrote:
>
> On Thu, Nov 04, 2021 at 08:30:55AM -0700, Linus Torvalds wrote:
> > Why did this come up? Do you actually have some hardware or situation
> > that cares?
>
> Oh, we're doing review of the XFS/iomap folio patches, which led to
> looking at zero_user_segments(), and I realised that memzero_page()
> was now functionally identical to zero_user(). And you'd been quite
> specific about not having flush_dcache_page() in there, so ... I wondered
> if you'd had a change of mind.
Ugh. I guess it ends up being there whether I like it or not. All that
zero_user_segments() stuff is too ugly for words, though, so I think
whoever wrote it must have been on some interesting pharmaceuticals.
What the hell are the two start/end things? And most users actually
just want a single page and should never have used that thing. Nasty.
I'm not touching that with a ten-foot pole.
Linus
On Thu, Nov 04, 2021 at 11:23:56AM -0700, Linus Torvalds wrote:
> On Thu, Nov 4, 2021 at 11:04 AM Catalin Marinas <[email protected]> wrote:
> > We still have VIVT processors supported in the kernel and a few where
> > the VIPT cache is aliasing (some ARMv6 CPUs). On these,
> > flush_dcache_page() is still used to ensure the user aliases are
> > coherent with the kernel one, so it's not just about the I/D-cache
> > coherency.
>
> Maybe we could try to split it up and make each function have more
> well-defined rules? One of the issues with the flush_dcache thing is
> that it's always been so ad-hoc and it's not been hugely clear.
[...]
> So VIVT arm (and whoever else) would continue to do the cache flushing
> at kunmap_local time (or kmap - I don't think it matters which one you
> do, as long as you make sure there are no stale contents from the
> previous use of that address).
>
> And then we'd relegate flush_dcache_page() purely for uses where
> somebody modifies the data and wants to make sure it ends up being
> coherent with subsequent uses (whether kmap and VIVT or I$/D$
> coherency issues)?
For PIPT hardware (I suppose most newish architectures),
flush_dcache_page() only matters with separate I$ and D$. So we can
indeed redefine it as only meaningful when a user page has been written
by the kernel (and maybe we can give it a better name).
For VIVT, kmap/kunmap() can take care of synchronising the aliases. If
the kmap'ed page has a user mapping, kmap() would also need to flush the
aliases, not just kunmap() (currently relying on flush_dcache_page() for
the read side as well). I suspect this is going to make kmap() more
expensive for those highmem pages only used by the kernel, or for pages
not yet mapped in user space (say during a page fault on mmap'ed file).
A long time ago on arm32 we used to do a check with page_mapping() and
mapping_mapped(), but I think the latter disappeared from the kernel.
> > The cachetlb.rst doc states the two cases where flush_dcache_page()
> > should be called:
> >
> > 1. After writing to a page cache page (that's what we need on arm64 for
> > the I-cache).
> >
> > 2. Before reading from a page cache page and user mappings potentially
> > exist. I think arm32 ensures the D-cache user aliases are coherent
> > with the kernel one (added rmk to confirm).
>
> I think the "kernel cache coherency" matters too. The PTE contents
> thing seems relevant if we use kmap for that...
>
> So I do think that the "page cache or user mapping" is not necessarily
> the only case.
At least the arm32 set_pte() for VIVT caches does its own D$ flushing on
the kmap() address. So kunmap() flushing is not strictly necessary for
this specific case (I think). But there may be other cases where it
matters.
> But personally I consider these situations so broken at a hardware
> level that I can't find it in myself to care too deeply.
We've had them supported in mainline for so many years, and working
(mostly, there was the odd driver that did not call the right API). But
I'm fine with deprecating them, making them slower in favour of cleaner
semantics of kmap, flush_dcache_page etc.
> Because user space with non-coherent I$/D$ should do its own cache
> flushing if it does "read()" to modify an executable range - exactly
> the same way it has to do it for just doing regular stores to that
> range.
Yes, if the user did a read(), it should flush the caches it cares
about. I don't think we even have a flush_dcache_page() call in the
kernel in these cases, just copy_to_user(). Basically the kernel should
only flush if it wrote via its own mapping (linear, kmap).
However, with mmap(PROT_EXEC), the user expects the I$/D$ to be coherent
without explicit user cache maintenance, even with PIPT hardware. That's
where flush_dcache_page() matters.
We also had some weird bugs with a dynamic loader mapping a page
initially as PROT_READ|PROT_EXEC, doing an
mprotect(PROT_READ|PROT_WRITE) just to write some data (not text, so it
never thought explicit cache flushing by user was needed) and back to
mprotect(PROT_READ|PROT_EXEC). Because of the CoW, the new page did not
have the I$/D$ synchronised, leading to the occasional SIGILL. Again,
flush_dcache_page() after CoW is needed, though hidden in the arch code
(we do this on arm64 in copy_user_highpage()).
--
Catalin