2005-09-02 21:19:17

by Jeff Dike

[permalink] [raw]
Subject: Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables

On Wed, Aug 10, 2005 at 09:37:28PM +0200, Blaisorblade wrote:
> Also look, on the "set_pte" theme, at the attached patch.

+ WARN_ON(!pte_young(*pte) || pte_write(*pte) && !pte_dirty(*pte));

This one has been firing on me, and I decided to figure out why. The
culprit is this code in do_no_page:

if (pte_none(*page_table)) {
if (!PageReserved(new_page))
inc_mm_counter(mm, rss);

flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
set_pte_at(mm, address, page_table, entry);

The first mk_pte immediately sets the pte to the protection limits of
the VMA, regardless of the access type. So, if it's a read access on
a writeable page, we get a writeable, but not dirty pte, since the
mkdirty never happens. The exercises the warning you added.

This seems somewhat bogus to me. If we set the pte protection to its
limits, then the maybe_mkwrite is unneccesary.

If we are the process in this address space, and we have a write
access, then the maybe_mkwrite doesn't do anything because the pte is
already writeable because the VMA has to be writeable, or we would
have been faulted already.

If we are a debugger changing the process memory, then the vma may be
read-only, and maybe_mkwrite is explicitly a no-op in this case.

This doesn't seem to harm our dirty bit emulation. fix_range_common
checks the dirty and accessed bits and disables read and write
protection as appropriate.

So, it seems like the warning could be dropped, or perhaps made more
selective, like checking for is_write == 0 and VM_WRITE, but then the
test is getting complicated.

Heff


2005-09-03 05:06:34

by Hugh Dickins

[permalink] [raw]
Subject: Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables

On Fri, 2 Sep 2005, Jeff Dike wrote:
> On Wed, Aug 10, 2005 at 09:37:28PM +0200, Blaisorblade wrote:
> > Also look, on the "set_pte" theme, at the attached patch.
>
> + WARN_ON(!pte_young(*pte) || pte_write(*pte) && !pte_dirty(*pte));
>
> This one has been firing on me, and I decided to figure out why. The
> culprit is this code in do_no_page:
>
> if (pte_none(*page_table)) {
> if (!PageReserved(new_page))
> inc_mm_counter(mm, rss);
>
> flush_icache_page(vma, new_page);
> entry = mk_pte(new_page, vma->vm_page_prot);
> if (write_access)
> entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> set_pte_at(mm, address, page_table, entry);
>
> The first mk_pte immediately sets the pte to the protection limits of
> the VMA, regardless of the access type. So, if it's a read access on
> a writeable page, we get a writeable, but not dirty pte, since the
> mkdirty never happens. The exercises the warning you added.
>
> This seems somewhat bogus to me. If we set the pte protection to its
> limits, then the maybe_mkwrite is unneccesary.
>
> If we are the process in this address space, and we have a write
> access, then the maybe_mkwrite doesn't do anything because the pte is
> already writeable because the VMA has to be writeable, or we would
> have been faulted already.

Not at all. The private, COW areas. They may be writeable in the sense
that VM_WRITE is set, but pte_write permission cannot be in vm_page_prot,
or we'd never get the fault when to Copy On Write.

What is bogus, I think, are those places where we do the reverse:
like do_anonymous_page's pte_wrprotect of its mkpte of the ZERO_PAGE.

> If we are a debugger changing the process memory, then the vma may be
> read-only, and maybe_mkwrite is explicitly a no-op in this case.
>
> This doesn't seem to harm our dirty bit emulation. fix_range_common
> checks the dirty and accessed bits and disables read and write
> protection as appropriate.
>
> So, it seems like the warning could be dropped, or perhaps made more
> selective, like checking for is_write == 0 and VM_WRITE, but then the
> test is getting complicated.
>
> Heff

Jugh

2005-09-04 11:36:46

by Blaisorblade

[permalink] [raw]
Subject: Re: [uml-devel] Re: [patch 1/3] uml: share page bits handling between 2 and 3 level pagetables

On Friday 02 September 2005 22:17, Jeff Dike wrote:
> On Wed, Aug 10, 2005 at 09:37:28PM +0200, Blaisorblade wrote:
> > Also look, on the "set_pte" theme, at the attached patch.

> + WARN_ON(!pte_young(*pte) || pte_write(*pte) && !pte_dirty(*pte));

> This one has been firing on me, and I decided to figure out why. The
> culprit is this code in do_no_page:

> if (pte_none(*page_table)) {
> if (!PageReserved(new_page))
> inc_mm_counter(mm, rss);
>
> flush_icache_page(vma, new_page);
> entry = mk_pte(new_page, vma->vm_page_prot);
> if (write_access)
> entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> set_pte_at(mm, address, page_table, entry);
>
> The first mk_pte immediately sets the pte to the protection limits of
> the VMA, regardless of the access type.

> So, if it's a read access on
> a writeable page, we get a writeable, but not dirty pte, since the
> mkdirty never happens. The exercises the warning you added.
Thanks for noticing - I had really this doubt when writing some code (in the
patch, I've added a dirty PTEs on read accesses because I was unsure, and
even because of my warning).

> This seems somewhat bogus to me. If we set the pte protection to its
> limits, then the maybe_mkwrite is unneccesary.

> This doesn't seem to harm our dirty bit emulation. fix_range_common
> checks the dirty and accessed bits and disables read and write
> protection as appropriate.

> So, it seems like the warning could be dropped, or perhaps made more
> selective, like checking for is_write == 0 and VM_WRITE, but then the
> test is getting complicated.

No, just replace pte_write() with is_write, as below. They might not coincide,
but if on a write fault we return with a clean PTE, we'll loop indefinitely
(experienced while hacking on remap_f_p), so the warning above is definitely
correct.

WARN_ON(!pte_young(*pte) || is_write && !pte_dirty(*pte));
> Heff

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


___________________________________
Yahoo! Messenger: chiamate gratuite in tutto il mondo
http://it.beta.messenger.yahoo.com