On Thursday 29 May 2008 23:56, Martin Schwidefsky wrote:
> Greetings,
> with a recent performance analysis we discovered something interesting
> in regard to the physical dirty bits found on s390. The page_remove_rmap
> function stands out when areas of anonymous memory gets unmapped. The
> reason is the transfer of the dirty bit from the page storage key to the
> struct page when the last mapper of a page is removed. For anonymous
> pages that cease to exist this is superfluous. Without the storage key
> operations process termination is noticable faster, e.g. for a gcc test
> case we got a speedup of 2%.
> To get this done page_remove_rmap needs to know if the page dirty bit
> can be ignored. The page_test_dirty / page_clear_dirty call can only be
> avoided if page_remove_rmap is called from zap_pte_range or do_wp_page.
> If it is called from any other place - in particular try_to_unmap_one -
> the page dirty bit may not be ignored.
> The patch below introduces a new function to do that, in lack of a
> better name I called it page_zap_rmap. Comments ?
I don't know if it is that simple, is it?
I don't know how you are guaranteeing the given page ceases to exist.
Even checking for the last mapper of the page (which you don't appear
to do anyway) isn't enough because there could be a swapcount, in which
case you should still have to mark the page as dirty.
For example (I think, unless s390 somehow propogates the dirty page
bit some other way that I've missed), wouldn't the following break:
process p1 allocates anonymous page A
p1 dirties A
p1 forks p2, A now has a mapcount of 2
p2 VM_LOCKs A (something to prevent it being swapped)
page reclaim unmaps p1's pte, fails on p2
p2 exits, page_dirty does not get checked because of this patch
page has mapcount 0, PG_dirty is clear
Page reclaim can drop it without writing it to swap
Or am I misunderstanding something?
As far as the general idea goes, it might be possible to avoid the
check somehow, but you'd want to be pretty sure of yourself before
diverging the s390 path further from the common code base, no?
The "easy" way to do it might be just unconditionally mark the page
as dirty in this path (if the pte was writeable), so you can avoid
the page_test_dirty check and be sure of not missing the dirty bit.
On Tue, 2008-06-03 at 09:57 +1000, Nick Piggin wrote:
First of all: thanks for looking into this. Games with the dirty bit are
scary and any change needs careful consideration.
> I don't know if it is that simple, is it?
It should be analog to the fact that for the two place the page_zap_rmap
function is supposed to be used the pte dirty bit isn't checked as well.
> I don't know how you are guaranteeing the given page ceases to exist.
> Even checking for the last mapper of the page (which you don't appear
> to do anyway) isn't enough because there could be a swapcount, in which
> case you should still have to mark the page as dirty.
>
> For example (I think, unless s390 somehow propogates the dirty page
> bit some other way that I've missed), wouldn't the following break:
>
> process p1 allocates anonymous page A
> p1 dirties A
> p1 forks p2, A now has a mapcount of 2
> p2 VM_LOCKs A (something to prevent it being swapped)
> page reclaim unmaps p1's pte, fails on p2
> p2 exits, page_dirty does not get checked because of this patch
> page has mapcount 0, PG_dirty is clear
> Page reclaim can drop it without writing it to swap
Indeed, this would break. Even without the VM_LOCK there is a race of
try_to_unmap vs. process exit.
> As far as the general idea goes, it might be possible to avoid the
> check somehow, but you'd want to be pretty sure of yourself before
> diverging the s390 path further from the common code base, no?
I don't want to diverge more than necessary. But the performance gains
of the SSKE/ISKE avoidance makes it worthwhile for s390, no?
> The "easy" way to do it might be just unconditionally mark the page
> as dirty in this path (if the pte was writeable), so you can avoid
> the page_test_dirty check and be sure of not missing the dirty bit.
Hmm, but then an mprotect() can change the pte to read-ony and we'd miss
the dirty bit again. Back to the drawing board.
By the way there is another SSKE I want to get rid of: __SetPageUptodate
does a page_clear_dirty(). For all uses of __SetPageUptodate the page
will be dirty after the application did its first write. To clear the
page dirty bit only to have it set again shortly after doesn't make much
sense to me. Has there been any particular reason for the
page_clear_dirty in __SetPageUptodate ?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
On Tuesday 03 June 2008 18:06, Martin Schwidefsky wrote:
> On Tue, 2008-06-03 at 09:57 +1000, Nick Piggin wrote:
>
> First of all: thanks for looking into this. Games with the dirty bit are
> scary and any change needs careful consideration.
>
> > I don't know if it is that simple, is it?
>
> It should be analog to the fact that for the two place the page_zap_rmap
> function is supposed to be used the pte dirty bit isn't checked as well.
pte dirty bit is checked in zap_pte_range. In do_wp_page, the pte dirty
bit is not checked because it cannot have been dirtied via that mapping.
However, this may not necessarily be true in the s390 case where it might
be dirtied by _another_ mapping which has subsequently exited but not
propogated the physical dirty bit (I don't know, but I'm just wary about
it).
> > I don't know how you are guaranteeing the given page ceases to exist.
> > Even checking for the last mapper of the page (which you don't appear
> > to do anyway) isn't enough because there could be a swapcount, in which
> > case you should still have to mark the page as dirty.
> >
> > For example (I think, unless s390 somehow propogates the dirty page
> > bit some other way that I've missed), wouldn't the following break:
> >
> > process p1 allocates anonymous page A
> > p1 dirties A
> > p1 forks p2, A now has a mapcount of 2
> > p2 VM_LOCKs A (something to prevent it being swapped)
> > page reclaim unmaps p1's pte, fails on p2
> > p2 exits, page_dirty does not get checked because of this patch
> > page has mapcount 0, PG_dirty is clear
> > Page reclaim can drop it without writing it to swap
>
> Indeed, this would break. Even without the VM_LOCK there is a race of
> try_to_unmap vs. process exit.
>
> > As far as the general idea goes, it might be possible to avoid the
> > check somehow, but you'd want to be pretty sure of yourself before
> > diverging the s390 path further from the common code base, no?
>
> I don't want to diverge more than necessary. But the performance gains
> of the SSKE/ISKE avoidance makes it worthwhile for s390, no?
I guess it's worth exploring.
> > The "easy" way to do it might be just unconditionally mark the page
> > as dirty in this path (if the pte was writeable), so you can avoid
> > the page_test_dirty check and be sure of not missing the dirty bit.
>
> Hmm, but then an mprotect() can change the pte to read-ony and we'd miss
> the dirty bit again. Back to the drawing board.
Hmm, I guess you _could_ set_page_dirty in mprotect.
> By the way there is another SSKE I want to get rid of: __SetPageUptodate
> does a page_clear_dirty(). For all uses of __SetPageUptodate the page
> will be dirty after the application did its first write. To clear the
> page dirty bit only to have it set again shortly after doesn't make much
> sense to me. Has there been any particular reason for the
> page_clear_dirty in __SetPageUptodate ?
I guess it is just to match SetPageUptodate. Not all __SetPageUptodate
paths may necessarily dirty the page, FWIW.
On Tue, 2008-06-03 at 18:29 +1000, Nick Piggin wrote:
> pte dirty bit is checked in zap_pte_range. In do_wp_page, the pte dirty
> bit is not checked because it cannot have been dirtied via that mapping.
> However, this may not necessarily be true in the s390 case where it might
> be dirtied by _another_ mapping which has subsequently exited but not
> propogated the physical dirty bit (I don't know, but I'm just wary about
> it).
Yes, what is needed is a check that tells us if the page content is
still needed after the last mapper has gone. For anonymous pages the
check "!PageAnon(page) || PageSwapCache(page)" should do. The idea is
that on each path that replaces a valid pte with a swap pte the
PageSwapCache bit is set first, then page_remove_rmap is called. As far
as I can see this is true for the current code.
> > > The "easy" way to do it might be just unconditionally mark the page
> > > as dirty in this path (if the pte was writeable), so you can avoid
> > > the page_test_dirty check and be sure of not missing the dirty bit.
> >
> > Hmm, but then an mprotect() can change the pte to read-ony and we'd miss
> > the dirty bit again. Back to the drawing board.
>
> Hmm, I guess you _could_ set_page_dirty in mprotect.
Yeah, but we don't really want to do that.
> > By the way there is another SSKE I want to get rid of: __SetPageUptodate
> > does a page_clear_dirty(). For all uses of __SetPageUptodate the page
> > will be dirty after the application did its first write. To clear the
> > page dirty bit only to have it set again shortly after doesn't make much
> > sense to me. Has there been any particular reason for the
> > page_clear_dirty in __SetPageUptodate ?
>
> I guess it is just to match SetPageUptodate. Not all __SetPageUptodate
> paths may necessarily dirty the page, FWIW.
The current users of __SetPageUptodate are:
* do_wp_page
* do_anonymous_page
* do_fault for (flags & FAULT_FLAG_WRITE)
For these three cases the pte is created with pte_mkdirty. Clearing the
physical page dirty bit is pointless.
* hugetlb_cow
* hugetlb_no_page
The dirty bits are of no interest to hugetlbfs, no?
I think it is safe to remove the page_clear_dirty from __SetPageUptodate
New patch below.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
---
Subject: [PATCH] Optimize storage key operations for anon pages
From: Martin Schwidefsky <[email protected]>
For anonymous pages without a swap cache backing the check for the physical
dirty bit in page_remove_rmap is unnecessary. The instruction that are used
to check and reset the dirty bit are expensive. Removing the check noticably
speeds up process exit. In addition the clearing of the dirty bit in
__SetPageUptodate is pointless as well. With these two changes there is
no storage key operation for an anonymous page anymore if it does not hit
the swap space.
The micro benchmark which repeatedly executes an empty shell script gets
about 5% faster.
Signed-off-by: Martin Schwidefsky <[email protected]>
---
diff -urpN linux-2.6/include/linux/page-flags.h linux-2.6-patched/include/linux/page-flags.h
--- linux-2.6/include/linux/page-flags.h 2008-06-03 09:46:50.000000000 +0200
+++ linux-2.6-patched/include/linux/page-flags.h 2008-06-03 11:37:27.000000000 +0200
@@ -217,9 +217,6 @@ static inline void __SetPageUptodate(str
{
smp_wmb();
__set_bit(PG_uptodate, &(page)->flags);
-#ifdef CONFIG_S390
- page_clear_dirty(page);
-#endif
}
static inline void SetPageUptodate(struct page *page)
diff -urpN linux-2.6/mm/rmap.c linux-2.6-patched/mm/rmap.c
--- linux-2.6/mm/rmap.c 2008-06-03 11:34:55.000000000 +0200
+++ linux-2.6-patched/mm/rmap.c 2008-06-03 11:36:27.000000000 +0200
@@ -678,7 +678,8 @@ void page_remove_rmap(struct page *page,
* Leaving it set also helps swapoff to reinstate ptes
* faster for those pages still in swapcache.
*/
- if (page_test_dirty(page)) {
+ if ((!PageAnon(page) || PageSwapCache(page)) &&
+ page_test_dirty(page)) {
page_clear_dirty(page);
set_page_dirty(page);
}