Hi folks !
So I have a little problem on powerpc ... :-)
Due to the way I'm attempting to do my I$/D$ coherency on embedded
processors, I basically need to "filter out" _PAGE_EXEC in set_pte_at()
if the page isn't clean (PG_arch_1) and the set_pte_at() isn't caused by
an exec fault. etc...
The problem with that approach (current upstream) is that the generic
code tends not to read back the PTE, and thus still carries around a PTE
value that doesn't match what was actually written.
For example, we end up with update_mmu_cache() called with an "entry"
argument that has _PAGE_EXEC set while we really didn't write it into
the page tables. This will be problematic when we finally add preloading
directly into the TLB on those processors. There's at least one other
fishy case where huetlbfs would carry the PTE value around and later do
the wrong thing because pte_same() with the loaded one failed.
What do you suggest we do here ? Among the options at hand:
- Ugly but would probably "just work" with the last amount of changes:
we could make set_pte_at() be a macro on powerpc that modifies it's PTE
value argument :-) (I -did- warn it was ugly !)
- Another one slightly less bad that would require more work but mostly
mechanical arch header updates would be to make set_pte_at() return the
new value of the PTE, and thus change the callsites to something like:
entry = set_pte_at(mm, addr, ptep, entry)
- Any other idea ? We could use another PTE bit (_PAGE_HWEXEC), in
fact, we used to, but we are really short on PTE bits nowadays and I
freed that one up to get _PAGE_SPECIAL... _PAGE_EXEC is trivial to
"recover" from ptep_set_access_flags() on an exec fault or from the VM
prot.
Cheers,
Ben.
On Sat, 31 Oct 2009, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> So I have a little problem on powerpc ... :-)
Thanks a lot for running this by us.
>
> Due to the way I'm attempting to do my I$/D$ coherency on embedded
> processors, I basically need to "filter out" _PAGE_EXEC in set_pte_at()
> if the page isn't clean (PG_arch_1) and the set_pte_at() isn't caused by
> an exec fault. etc...
>
> The problem with that approach (current upstream) is that the generic
> code tends not to read back the PTE, and thus still carries around a PTE
> value that doesn't match what was actually written.
>
> For example, we end up with update_mmu_cache() called with an "entry"
> argument that has _PAGE_EXEC set while we really didn't write it into
> the page tables. This will be problematic when we finally add preloading
> directly into the TLB on those processors. There's at least one other
> fishy case where huetlbfs would carry the PTE value around and later do
> the wrong thing because pte_same() with the loaded one failed.
I've not looked to see if there are more such issues in arch/powerpc
itself, but those instances you mention are the only ones I managed
to find: uses of update_mmu_cache() and that hugetlb_cow() one.
The hugetlb_cow() one involves not set_pte_at() but set_huge_pte_at(),
so you'd want to change that too? And presumably set_pte_at_notify()?
It all seems a lot of tedium, when so very few places are interested
in the pte after they've set it.
>
> What do you suggest we do here ? Among the options at hand:
>
> - Ugly but would probably "just work" with the last amount of changes:
> we could make set_pte_at() be a macro on powerpc that modifies it's PTE
> value argument :-) (I -did- warn it was ugly !)
I'm not keen on that one :)
>
> - Another one slightly less bad that would require more work but mostly
> mechanical arch header updates would be to make set_pte_at() return the
> new value of the PTE, and thus change the callsites to something like:
>
> entry = set_pte_at(mm, addr, ptep, entry)
I prefer that, but it still seems more trouble than it's worth.
And though I prefer it to set_pte_at(mm, addr, ptep, &entry)
(which would anyway complicate many of the callsites), it might
unnecessarily increase the codesize for all architectures (depends
on whether gcc notices entry isn't used afterwards anyway).
>
> - Any other idea ? We could use another PTE bit (_PAGE_HWEXEC), in
> fact, we used to, but we are really short on PTE bits nowadays and I
> freed that one up to get _PAGE_SPECIAL... _PAGE_EXEC is trivial to
> "recover" from ptep_set_access_flags() on an exec fault or from the VM
> prot.
No, please don't go ransacking your PTE for a sparish bit.
You're being a very good citizen to want to bring this so forcefully
to the attention of any user of set_pte_at(); but given how few care,
and the other such functions you'd want to change too, am I being
disgracefully lazy to suggest that you simply change the occasional
update_mmu_cache(vma, address, pte);
to
/* powerpc's set_pte_at might have adjusted the pte */
update_mmu_cache(vma, address, *ptep);
? Which would make no difference to those architectures whose
update_mmu_cache() is an empty macro. And fix the mm/hugetlb.c
instance in a similar way?
Hugh
On Mon, 2009-11-02 at 13:27 +0000, Hugh Dickins wrote:
> On Sat, 31 Oct 2009, Benjamin Herrenschmidt wrote:
>
> > Hi folks !
> >
> > So I have a little problem on powerpc ... :-)
>
> Thanks a lot for running this by us.
Heh, I though you may have been bored :-)
> I've not looked to see if there are more such issues in arch/powerpc
> itself, but those instances you mention are the only ones I managed
> to find: uses of update_mmu_cache() and that hugetlb_cow() one.
Right, that's all I spotted so far
> The hugetlb_cow() one involves not set_pte_at() but set_huge_pte_at(),
> so you'd want to change that too? And presumably set_pte_at_notify()?
> It all seems a lot of tedium, when so very few places are interested
> in the pte after they've set it.
We need to change set_huge_pte_at() too. Currently, David fixed the
problem in a local tree by making hugetlb_cow() re-read the PTE .
set_pte_at_notify() would probably be similar, I'm not too familiar with
its usage scenario yet to be honest.
> > What do you suggest we do here ? Among the options at hand:
> >
> > - Ugly but would probably "just work" with the last amount of changes:
> > we could make set_pte_at() be a macro on powerpc that modifies it's PTE
> > value argument :-) (I -did- warn it was ugly !)
>
> I'm not keen on that one :)
Yeah. Me neither :-)
> > - Another one slightly less bad that would require more work but mostly
> > mechanical arch header updates would be to make set_pte_at() return the
> > new value of the PTE, and thus change the callsites to something like:
> >
> > entry = set_pte_at(mm, addr, ptep, entry)
>
> I prefer that, but it still seems more trouble than it's worth.
Right. I was hoping you might have a better idea :-)
> And though I prefer it to set_pte_at(mm, addr, ptep, &entry)
> (which would anyway complicate many of the callsites), it might
> unnecessarily increase the codesize for all architectures (depends
> on whether gcc notices entry isn't used afterwards anyway).
Macro or static inlines back to __set_pte_at(..., entry) in those archs
would probably take care of avoiding the bloat but still a lot of churn.
> >
> > - Any other idea ? We could use another PTE bit (_PAGE_HWEXEC), in
> > fact, we used to, but we are really short on PTE bits nowadays and I
> > freed that one up to get _PAGE_SPECIAL... _PAGE_EXEC is trivial to
> > "recover" from ptep_set_access_flags() on an exec fault or from the VM
> > prot.
>
> No, please don't go ransacking your PTE for a sparish bit.
Yeah, the whole exercise for me was initially to -save- PTE bits :-)
> You're being a very good citizen to want to bring this so forcefully
> to the attention of any user of set_pte_at(); but given how few care,
> and the other such functions you'd want to change too, am I being
> disgracefully lazy to suggest that you simply change the occasional
>
> update_mmu_cache(vma, address, pte);
> to
> /* powerpc's set_pte_at might have adjusted the pte */
> update_mmu_cache(vma, address, *ptep);
>
> ? Which would make no difference to those architectures whose
> update_mmu_cache() is an empty macro. And fix the mm/hugetlb.c
> instance in a similar way?
That would do fine. In fact, I've always been slightly annoyed by
set_pte_at() not taking the PTE pointer for other reasons such as on
64-K pages, we have a "hidden" part of the PTE that is at PTE address +
32K, or we may want to get to the PTE page for some reason (some arch
store things there) etc...
IE. update_mmu_cache() would be more generally useful if it took the
ptep instead of the pte. Of course, I'm sure some embedded archs are
going to cry for the added load here ...
I like your idea. I'll look into doing a patch converting it and will
post it here.
Thanks !
Cheers,
Ben.
On Tue, 3 Nov 2009, Benjamin Herrenschmidt wrote:
> On Mon, 2009-11-02 at 13:27 +0000, Hugh Dickins wrote:
>
> > You're being a very good citizen to want to bring this so forcefully
> > to the attention of any user of set_pte_at(); but given how few care,
> > and the other such functions you'd want to change too, am I being
> > disgracefully lazy to suggest that you simply change the occasional
> >
> > update_mmu_cache(vma, address, pte);
> > to
> > /* powerpc's set_pte_at might have adjusted the pte */
> > update_mmu_cache(vma, address, *ptep);
> >
> > ? Which would make no difference to those architectures whose
> > update_mmu_cache() is an empty macro. And fix the mm/hugetlb.c
> > instance in a similar way?
>
> That would do fine. In fact, I've always been slightly annoyed by
> set_pte_at() not taking the PTE pointer for other reasons such as on
> 64-K pages, we have a "hidden" part of the PTE that is at PTE address +
> 32K, or we may want to get to the PTE page for some reason (some arch
> store things there) etc...
>
> IE. update_mmu_cache() would be more generally useful if it took the
> ptep instead of the pte. Of course, I'm sure some embedded archs are
> going to cry for the added load here ...
>
> I like your idea. I'll look into doing a patch converting it and will
> post it here.
Well, I wasn't proposing
update_mmu_cache(vma, address, ptep);
but
update_mmu_cache(vma, address, *ptep);
which may not meet your future idea, but is much less churn for now
i.e. no change to any of the arch's update_mmu_cache(),
just a change to some of its callsites.
Hugh
On Mon, 2009-11-02 at 23:45 +0000, Hugh Dickins wrote:
> > IE. update_mmu_cache() would be more generally useful if it took the
> > ptep instead of the pte. Of course, I'm sure some embedded archs are
> > going to cry for the added load here ...
> >
> > I like your idea. I'll look into doing a patch converting it and
> will
> > post it here.
>
> Well, I wasn't proposing
>
> update_mmu_cache(vma, address, ptep);
> but
> update_mmu_cache(vma, address, *ptep);
>
> which may not meet your future idea, but is much less churn for now
> i.e. no change to any of the arch's update_mmu_cache(),
> just a change to some of its callsites.
I see... but if we go that way, I think we may as well do the whole
churn... I'll have a look at how bad it is.
Cheers,
Ben.
On Tue, Nov 03, 2009 at 09:19:27AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2009-11-02 at 13:27 +0000, Hugh Dickins wrote:
> > On Sat, 31 Oct 2009, Benjamin Herrenschmidt wrote:
> >
> > > Hi folks !
> > >
> > > So I have a little problem on powerpc ... :-)
> >
> > Thanks a lot for running this by us.
>
> Heh, I though you may have been bored :-)
>
> > I've not looked to see if there are more such issues in arch/powerpc
> > itself, but those instances you mention are the only ones I managed
> > to find: uses of update_mmu_cache() and that hugetlb_cow() one.
>
> Right, that's all I spotted so far
>
> > The hugetlb_cow() one involves not set_pte_at() but set_huge_pte_at(),
> > so you'd want to change that too? And presumably set_pte_at_notify()?
> > It all seems a lot of tedium, when so very few places are interested
> > in the pte after they've set it.
>
> We need to change set_huge_pte_at() too. Currently, David fixed the
> problem in a local tree by making hugetlb_cow() re-read the PTE .
Well, actually I have another cleanup patch in the queue which makes
set_huge_pte_at() equal to set_pte_at() on powerpc, and I was using
that on the tree where this problem became apparent.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson