2002-07-28 07:20:48

by Andrew Morton

[permalink] [raw]
Subject: [patch 2/13] remove pages from the LRU in __free_pages_ok()



There are some situations where a page's final release is performed by
put_page(). Such as in access_process_vm(). This tends to go BUG()
because the page is on the LRU.

The patch changes __free_pages_ok() to remove the page from the LRU
in this case, as in 2.4.

We need to make changes to this code again later - I have workloads in
which page_cache_release() consumes 5% of CPU resources. But this
should keep people out of trouble meanwhile.



page_alloc.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)

--- 2.5.29/mm/page_alloc.c~lru-removal Sat Jul 27 23:38:59 2002
+++ 2.5.29-akpm/mm/page_alloc.c Sat Jul 27 23:49:03 2002
@@ -89,10 +89,14 @@ static void __free_pages_ok (struct page

KERNEL_STAT_ADD(pgfree, 1<<order);

+ if (PageLRU(page)) {
+ BUG_ON(in_interrupt()); /* It could deadlock */
+ lru_cache_del(page);
+ }
+
BUG_ON(PagePrivate(page));
BUG_ON(page->mapping != NULL);
BUG_ON(PageLocked(page));
- BUG_ON(PageLRU(page));
BUG_ON(PageActive(page));
BUG_ON(PageWriteback(page));
BUG_ON(page->pte.chain != NULL);
@@ -451,11 +455,8 @@ unsigned long get_zeroed_page(unsigned i

void page_cache_release(struct page *page)
{
- if (!PageReserved(page) && put_page_testzero(page)) {
- if (PageLRU(page))
- lru_cache_del(page);
+ if (!PageReserved(page) && put_page_testzero(page))
__free_pages_ok(page, 0);
- }
}

void __free_pages(struct page *page, unsigned int order)

.


2002-07-28 23:49:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, Andrew Morton wrote:
>
> There are some situations where a page's final release is performed by
> put_page(). Such as in access_process_vm(). This tends to go BUG()
> because the page is on the LRU.

This is wrong.

If that happens, then you should just make access_process_vm() use
"page_cache_release()". That's what you basically make "__free_pages_ok()"
do, but since you still have to add the BUG_ON() check to make clear that
it is illegal to do this from an interrupt context, it's much better to
just do this check statically.

Linus

2002-07-29 00:21:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:
>
> On Sun, 28 Jul 2002, Andrew Morton wrote:
> >
> > There are some situations where a page's final release is performed by
> > put_page(). Such as in access_process_vm(). This tends to go BUG()
> > because the page is on the LRU.
>
> This is wrong.
>
> If that happens, then you should just make access_process_vm() use
> "page_cache_release()". That's what you basically make "__free_pages_ok()"
> do, but since you still have to add the BUG_ON() check to make clear that
> it is illegal to do this from an interrupt context, it's much better to
> just do this check statically.
>

OK. This means that put_page() against userspace-mapped pages
is to be avoided.

Did an audit. What on earth is drivers/scsi/sg.c:sg_rb_correct4mmap()
doing?

Also skb_release_data(), ___pskb_trim() and __pskb_pull_tail(). Can these
ever perform the final release against a page which is on the LRU? In
interrupt context?

tcp_sendmsg() is doing put_page() too. I _think_ it's OK because
the page is mapped into the calling process and because of the
way in which shrink_cache() looks at page->count. Not sure.



fs/binfmt_elf.c | 3 ++-
fs/smbfs/file.c | 8 ++++----
kernel/ptrace.c | 3 ++-
3 files changed, 8 insertions(+), 6 deletions(-)

--- 2.5.29/fs/binfmt_elf.c~put_page Sun Jul 28 17:17:23 2002
+++ 2.5.29-akpm/fs/binfmt_elf.c Sun Jul 28 17:31:33 2002
@@ -33,6 +33,7 @@
#include <linux/smp_lock.h>
#include <linux/compiler.h>
#include <linux/highmem.h>
+#include <linux/pagemap.h>

#include <asm/uaccess.h>
#include <asm/param.h>
@@ -1249,7 +1250,7 @@ static int elf_core_dump(long signr, str
flush_page_to_ram(page);
kunmap(page);
}
- put_page(page);
+ page_cache_release(page);
}
}
}
--- 2.5.29/fs/smbfs/file.c~put_page Sun Jul 28 17:19:08 2002
+++ 2.5.29-akpm/fs/smbfs/file.c Sun Jul 28 17:19:40 2002
@@ -105,9 +105,9 @@ smb_readpage(struct file *file, struct p
int error;
struct dentry *dentry = file->f_dentry;

- get_page(page);
+ page_cache_get(page);
error = smb_readpage_sync(dentry, page);
- put_page(page);
+ page_cache_release(page);
return error;
}

@@ -194,11 +194,11 @@ smb_writepage(struct page *page)
if (page->index >= end_index+1 || !offset)
return -EIO;
do_it:
- get_page(page);
+ page_cache_get(page);
err = smb_writepage_sync(inode, page, 0, offset);
SetPageUptodate(page);
unlock_page(page);
- put_page(page);
+ page_cache_release(page);
return err;
}

--- 2.5.29/kernel/ptrace.c~put_page Sun Jul 28 17:20:04 2002
+++ 2.5.29-akpm/kernel/ptrace.c Sun Jul 28 17:26:13 2002
@@ -11,6 +11,7 @@
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/highmem.h>
+#include <linux/pagemap.h>
#include <linux/smp_lock.h>

#include <asm/pgtable.h>
@@ -156,7 +157,7 @@ int access_process_vm(struct task_struct
flush_page_to_ram(page);
}
kunmap(page);
- put_page(page);
+ page_cache_release(page);
len -= bytes;
buf += bytes;
addr += bytes;

.

2002-07-29 00:32:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, Andrew Morton wrote:
>
> OK. This means that put_page() against userspace-mapped pages
> is to be avoided.

Yes. I think we've gotten most of it, and I just fixed the
access_process_vm() thing in my tree.

> Did an audit. What on earth is drivers/scsi/sg.c:sg_rb_correct4mmap()
> doing?

I don't know. The code looks fundamentally broken (why do a
"put_page_testzero()" if you don't actually test the return value?
Somebody is confused).

Linus

2002-07-29 00:47:59

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:
>
> On Sun, 28 Jul 2002, Andrew Morton wrote:
> >
> > OK. This means that put_page() against userspace-mapped pages
> > is to be avoided.
>
> Yes. I think we've gotten most of it, and I just fixed the
> access_process_vm() thing in my tree.

OK. Please do the smbfs one as well. That's an outright
race with truncate now. Because we took the lru_cache_del()
out of truncate_complete_page() in the rmap merge.

And we took the lru_cache_del() out of truncate_complete_page()
because, err, I forget. There was a situation in which the page
could still be mapped into process pagetables, and the lru_cache_del()
would have left it unswappable until process exit.

Rik, can you remember the exact scenario?

-

2002-07-29 00:56:46

by Rik van Riel

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

On Sun, 28 Jul 2002, Andrew Morton wrote:

> And we took the lru_cache_del() out of truncate_complete_page()
> because, err, I forget. There was a situation in which the page
> could still be mapped into process pagetables, and the lru_cache_del()
> would have left it unswappable until process exit.
>
> Rik, can you remember the exact scenario?

Truncate vs. the page fault path. It would be possible for
pages to be removed from the lru list by truncate and turning
into anonymous process memory.

This becomes a leak when only pages on the lru list can be
paged out.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-07-29 02:58:57

by David Miller

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

From: Andrew Morton <[email protected]>
Date: Sun, 28 Jul 2002 17:32:56 -0700

Also skb_release_data(), ___pskb_trim() and __pskb_pull_tail(). Can these
ever perform the final release against a page which is on the LRU?
In interrupt context?

These page releases run from either user or softint context.

They must never run from HW irqs, in fact there is a BUG()
check there against this.

Any page that can be found in the page cache can end up here. So
whatever that mean for "release against a page which is on the LRU"
applies here.

2002-07-29 03:47:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, David S. Miller wrote:
>
> Also skb_release_data(), ___pskb_trim() and __pskb_pull_tail(). Can these
> ever perform the final release against a page which is on the LRU?
> In interrupt context?
>
> These page releases run from either user or softint context.
>
> They must never run from HW irqs, in fact there is a BUG()
> check there against this.

>From a page cache standpoint softirq's are 100% equivalent to hardware
irq's, so that doesn't much help here.

> Any page that can be found in the page cache can end up here. So
> whatever that mean for "release against a page which is on the LRU"
> applies here.

Being in the page cache can be ok. What is _not_ ok is if this function
can ever be the last user to release such a page (ie the original page
count of the page had better be held on by something else - which usually
is the page-cacheness itself, since shrinking the page cache will only
happen for pages that are unused).

Linus

2002-07-29 03:51:04

by David Miller

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

From: Linus Torvalds <[email protected]>
Date: Sun, 28 Jul 2002 20:51:13 -0700 (PDT)

On Sun, 28 Jul 2002, David S. Miller wrote:
> They must never run from HW irqs, in fact there is a BUG()
> check there against this.

From a page cache standpoint softirq's are 100% equivalent to
hardware irq's, so that doesn't much help here.

Wait are we trying to make the final freeing of (potentially)
LRU/page-cache pages from any non-base context illegal?

If that really becomes an issue we can do something which moves
this back to user context when the result of doing it in irq
context would be problematic.

2002-07-29 04:05:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:
>
> On Sun, 28 Jul 2002, David S. Miller wrote:
> >
> > Also skb_release_data(), ___pskb_trim() and __pskb_pull_tail(). Can these
> > ever perform the final release against a page which is on the LRU?
> > In interrupt context?
> >
> > These page releases run from either user or softint context.
> >
> > They must never run from HW irqs, in fact there is a BUG()
> > check there against this.
>
> >From a page cache standpoint softirq's are 100% equivalent to hardware
> irq's, so that doesn't much help here.

So we cannot take pagemap_lru_lock from softirq context.

hmm. ia32's do_IRQ() doesn't run do_sotfirq() any more, but the
other architectures do. What's up with that?

> > Any page that can be found in the page cache can end up here. So
> > whatever that mean for "release against a page which is on the LRU"
> > applies here.
>
> Being in the page cache can be ok. What is _not_ ok is if this function
> can ever be the last user to release such a page (ie the original page
> count of the page had better be held on by something else - which usually
> is the page-cacheness itself, since shrinking the page cache will only
> happen for pages that are unused).
>

shrink_cache() explicitly removes the page from the LRU (well, it
won't even get that far if someone else has a ref).

truncate_complete_page() _used_ to explicitly remove the page from
the lru, but we took that out. And it was never reliable anyway,
because some pages were left there (invalidatepage failed).

Anyway. I have patches against 2.5.24, which work, which
turn pagemap_lru_lock into an innermost, irq-safe lock. If
we get that in place then page_cache_release() from IRQ context
is fine.

-

2002-07-29 04:13:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

"David S. Miller" wrote:
>
> From: Linus Torvalds <[email protected]>
> Date: Sun, 28 Jul 2002 20:51:13 -0700 (PDT)
>
> On Sun, 28 Jul 2002, David S. Miller wrote:
> > They must never run from HW irqs, in fact there is a BUG()
> > check there against this.
>
> From a page cache standpoint softirq's are 100% equivalent to
> hardware irq's, so that doesn't much help here.
>
> Wait are we trying to make the final freeing of (potentially)
> LRU/page-cache pages from any non-base context illegal?

It already is. The combination of circumstances is pretty
remote, and indeed may never happen. But the final put_page()
against an LRU page will go BUG() because the page is on the LRU.
And a page_cache_reelase() in IRQ context could deadlock over
pagemap_lru_lock() (it'll go BUG in -aa kernels).

> If that really becomes an issue we can do something which moves
> this back to user context when the result of doing it in irq
> context would be problematic.

I don't think it can happen in 2.4. In the truncate case,
the page is taken off the LRU by hand. If do_flushpage()
failed then the buffers still have a ref on the page, which
is undone in shrink_cache(), inside pagemap_lru_lock.

So, probably safe, but way too subtle.

-

2002-07-29 04:16:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, David S. Miller wrote:
> From a page cache standpoint softirq's are 100% equivalent to
> hardware irq's, so that doesn't much help here.
>
> Wait are we trying to make the final freeing of (potentially)
> LRU/page-cache pages from any non-base context illegal?

We're not "trying to". It's always been hugely illegal, because we don't
protect the LRU lists against interrupts etc, so anything that happens at
irq (or bh) time will mess up the LRU lists if it tries to remove a page
from them.

But the thing is, nobody should normally have a reference to such a page
anyway. The only way they happen is by something mapping a page from user
space, and saving it away, while the user space goes away and drops its
references to the page.

And the page cache won't drop its own reference to the page as long as
somebody holds on to it - with the extra special case of truncate(), which
at least used to remove it from the LRU's before it did that. I think one
of Andrews patches undid that truncate() thing, though. I suspect we'll
have to fix _that_ patch, not the other paths.

Linus

2002-07-29 04:19:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, Andrew Morton wrote:
> truncate_complete_page() _used_ to explicitly remove the page from
> the lru, but we took that out. And it was never reliable anyway,
> because some pages were left there (invalidatepage failed).

I think we should try to fix invalidatepage instead, and just always
remove it from the LRU.

(If invalidatepage fails, we can just leave the page as an anonymous page
_off_ the LRU, and let whoever holds a reference to the page eventually
drop it, whatever).

> Anyway. I have patches against 2.5.24, which work, which
> turn pagemap_lru_lock into an innermost, irq-safe lock. If
> we get that in place then page_cache_release() from IRQ context
> is fine.

I'd _really_ really prefer to go the other way. I think this brokenness is
all from that one broken patch that removed the "remove from LRU".

And from what I can tell, that broken patch has no real point to it
anyway.

Linus

2002-07-29 04:24:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, Andrew Morton wrote:
>
> I don't think it can happen in 2.4. In the truncate case,
> the page is taken off the LRU by hand. If do_flushpage()
> failed then the buffers still have a ref on the page, which
> is undone in shrink_cache(), inside pagemap_lru_lock.
>
> So, probably safe, but way too subtle.

That was by no means "subtle", it was all very much "design".

Just undo the broken patch by Rik, and we should all be home free again.

Linus

2002-07-29 04:40:54

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:
>
> On Sun, 28 Jul 2002, Andrew Morton wrote:
> >
> > I don't think it can happen in 2.4. In the truncate case,
> > the page is taken off the LRU by hand. If do_flushpage()
> > failed then the buffers still have a ref on the page, which
> > is undone in shrink_cache(), inside pagemap_lru_lock.
> >
> > So, probably safe, but way too subtle.
>
> That was by no means "subtle", it was all very much "design".

Well Rik and I missed it. Not that this requires much subtlety ;)

> Just undo the broken patch by Rik, and we should all be home free again.

Problem is that the rmap VM doesn't perform swapout via pagetables.
It performs it via the LRU.

If someone is sleeping on a pagefault against a mmapped file, and a
truncate happens meanwhile, that page comes back as an anonymous
page. It's not on the LRU any more so it has become unswappable.

Maybe we can check i_size in filemap_nopage and deliver a SIGBUS
or something.

For now we can go back to the explicit lru_cache_del() in truncate;
the little bit of unswappable memory is preferable to a deadlock
in TCP. I'll send the patch after a bit of testing.

But (squeaky wheel), longer-term I want to make pagemap_lru_lock
irq safe. This reduces contention on that lock by 30% (bringing
the total reduction for that patch series to around 98% - wiped off
the map). And it fixes this problem. And it allows us to move
pages between LRU lists at IO completion, if we want to do that.

-

2002-07-29 04:47:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, Andrew Morton wrote:
>
> Problem is that the rmap VM doesn't perform swapout via pagetables.
> It performs it via the LRU.
>
> If someone is sleeping on a pagefault against a mmapped file, and a
> truncate happens meanwhile, that page comes back as an anonymous
> page. It's not on the LRU any more so it has become unswappable.

Note that there is nothing fundamentally wrong with keeping the anonymous
page on the LRU either.

Some of th e2.4.x kernels kept _all_ anonymous pages on the LRU. That
added a lot of LRU overhead, but there could be a fairly simple
workaround: don't add anon pages to the LRU normally, but if a LRU page is
turned into an anonymous page _and_ it is still mapped, keep it on the LRU
list as an anonymous page.

Then, when it is unmapped, the last unmapper (even if it isn't necessarily
the last count) removes it from the LRU list.

This wouldn't have worked in 2.4.x - because without rmap you can't know
whether a count comes from mapping or not. But with rmap you can know, and
you can also notice the "last unmapper" thing.

Linus

2002-07-29 05:03:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:
>
> On Sun, 28 Jul 2002, Andrew Morton wrote:
> >
> > Problem is that the rmap VM doesn't perform swapout via pagetables.
> > It performs it via the LRU.
> >
> > If someone is sleeping on a pagefault against a mmapped file, and a
> > truncate happens meanwhile, that page comes back as an anonymous
> > page. It's not on the LRU any more so it has become unswappable.
>
> Note that there is nothing fundamentally wrong with keeping the anonymous
> page on the LRU either.
>
> Some of th e2.4.x kernels kept _all_ anonymous pages on the LRU. That
> added a lot of LRU overhead, but there could be a fairly simple
> workaround: don't add anon pages to the LRU normally, but if a LRU page is
> turned into an anonymous page _and_ it is still mapped, keep it on the LRU
> list as an anonymous page.

All 2.4 and 2.5 kernels currently put all anon pages on the LRU all the
time. And, yes, this creates great amounts of list scanning in
Andrea's VM. Quite expensive for some workloads.

He took anon pages on and off the LRU several times shortly after
2.4.10. They ended up "on". When I asked him if we could take them
off again earlier this month he said

"the only advantage they give us in 2.4 vm is that they let us know
better when it's time to swap_out. without them, we'd start to swapout
only when there's some significant amount of mapped cache, without
regard of the amount of the amount of anonymous ram. In short swap
performance are higher with this information. We could try to collect
this info without having to waste time on the lru but I've never tried
that."


However, with rmap things are different. When an anon page
reaches the tail of the LRU and the access info is right, it
gets added to swapcache. So anon pages on the LRU is a fundamental
part of rmap - there's no other way of getting at them.

I haven't yet checked whether the rmap design has reduced the CPU
load which I was observing in 2.5.24 from churning these
anon pages through shrink_cache(). Rik thinks it may do so.

-

2002-07-29 05:51:58

by David Miller

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

From: Andrew Morton <[email protected]>
Date: Sun, 28 Jul 2002 21:17:12 -0700

hmm. ia32's do_IRQ() doesn't run do_sotfirq() any more, but the
other architectures do. What's up with that?

Lots of architectes haven't caught up with Ingo's IRQ changes. That
is the changeset where the do_softirq() call in do_IRQ() got deleted
on x86.

Franks a lot,
David S. Miller
[email protected]

2002-07-29 05:51:04

by David Miller

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

From: Linus Torvalds <[email protected]>
Date: Sun, 28 Jul 2002 21:21:12 -0700 (PDT)

But the thing is, nobody should normally have a reference to such a
page anyway. The only way they happen is by something mapping a
page from user space, and saving it away, while the user space goes
away and drops its references to the page.

Ignoring for a moment whether you agree with the idea of zero-copying
userspace pages over sockets, I would at least like to retain the
ability to experiment with something like this.

I'm not all that opposed to Andrew's described scheme of making the
LRU lock IRQ safe. And it would allow one to experiment with
userpage socket zerocopy.

2002-07-29 06:12:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, David S. Miller wrote:
>
> But the thing is, nobody should normally have a reference to such a
> page anyway. The only way they happen is by something mapping a
> page from user space, and saving it away, while the user space goes
> away and drops its references to the page.
>
> Ignoring for a moment whether you agree with the idea of zero-copying
> userspace pages over sockets, I would at least like to retain the
> ability to experiment with something like this.

Oh, you misunderstand.. (probably because I'm unclear)

I'm not saying that getting a page from a user space mapping is bad: a lot
of places do that independently of zero-copy.

But hopefully nobody should have the problematic last reference to a LRU
page _except_ the user space itself. That should be safe for page cache
pages thanks to the truncate change.

And for anonymous pages, I really think that the right solution is to do
the same remove-from-LRU thing for the "last unmap" (which should be
trivial to notice with rmap).

I'm trying to come up with what kept us safe in 2.4.x for anonymous pages,
and I get this sinking feeling that we really aren't.

Which may imply that Andrew's irq-safe LRU list is the right thing to do
after all. At least on 2.4.x (and if you do it there, then my arguments
about it being unnecessary on 2.5.x due to "design" just totally cruble
away, since clearly Andrew was rigth and the "design argument" was total
crud.

Let me sleep on this. Does anybody have any intelligent thoughts?

Linus

2002-07-29 06:18:21

by David Miller

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

From: Linus Torvalds <[email protected]>
Date: Sun, 28 Jul 2002 23:16:24 -0700 (PDT)

[ This is Linus speaking in all the quoted material below... ]

> But the thing is, nobody should normally have a reference to such a
> page anyway. The only way they happen is by something mapping a
> page from user space, and saving it away, while the user space goes
> away and drops its references to the page.
...
But hopefully nobody should have the problematic last reference to a LRU
page _except_ the user space itself. That should be safe for page cache
pages thanks to the truncate change.

[ Now DaveM is talking :-) ]

So let's say that we have a page going out a socket. The socket's FD
has multiple references so when the user exit()'s the anonymous page
is still "in-flight" but the socket isn't closed and thus we won't
wait for the write to complete.

So when the user's reference is dropped, does that operation kill it
from the LRU or will the socket's remaining reference to that page
defer the LRU removal?

2002-07-29 06:23:22

by Paul Mackerras

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Andrew Morton writes:

> hmm. ia32's do_IRQ() doesn't run do_sotfirq() any more, but the
> other architectures do. What's up with that?

The do_softirq() call is now in irq_exit() in include/asm/hardirq.h.

Paul.

2002-07-29 06:23:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()



On Sun, 28 Jul 2002, David S. Miller wrote:
>
> So when the user's reference is dropped, does that operation kill it
> from the LRU or will the socket's remaining reference to that page
> defer the LRU removal?

That is indeed the question. Right now it will defer, which looks like a
bug. Or at least it is a bug without the interrupt-safe LRU manipulations.

I'm starting to be more convinced about Andrew's alternate patch, the
"move LRU lock innermost and make it irq-safe".

Which also would make it saner to do the LRU handling inside
__put_pages_ok() (and actually remove the BUG_ON(in_interrupt()) that
Andrew had there in the old patch).

Linus

2002-07-29 06:47:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:
>
> On Sun, 28 Jul 2002, David S. Miller wrote:
> >
> > So when the user's reference is dropped, does that operation kill it
> > from the LRU or will the socket's remaining reference to that page
> > defer the LRU removal?
>
> That is indeed the question. Right now it will defer, which looks like a
> bug. Or at least it is a bug without the interrupt-safe LRU manipulations.
>
> I'm starting to be more convinced about Andrew's alternate patch, the
> "move LRU lock innermost and make it irq-safe".
>
> Which also would make it saner to do the LRU handling inside
> __put_pages_ok() (and actually remove the BUG_ON(in_interrupt()) that
> Andrew had there in the old patch).
>

That was a big lump of patch, and I need to get all the other
MM developers to say "yes, we can live with this". Everything which
takes the LRU lock needed to be redone to get the holdtimes and
acquisition frequency down.

It's quite unfeasible for 2.4. The only 2.4 kernel which
has the in_interrupt() check is Andrea's. And, I assume,
the SuSE production kernel. So empirically, we're probably OK there.
But the RH kernel has AIO (yes?) which may change the picture.

A simple little hack which would prevent it in 2.4 would be,
in __free_pages_ok():

if (PageLRU(page)) {
if (in_interrupt()) {
SetPageFoo(page);
return;
}
lru_cache_del(page);
}

and in shrink_cache(), inside pagemap_lru_lock:

if (PageFoo(page)) {
__lru_cache_del(page);
BUG_ON(page_count(page) != 0);
page_cache_get(page);
__free_page(page);
continue;
}

This is basically Dave's "defer it to process context", with kswapd
doing the work.

-

2002-07-29 08:35:16

by Rik van Riel

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

On Sun, 28 Jul 2002, Linus Torvalds wrote:

> And for anonymous pages, I really think that the right solution is to do
> the same remove-from-LRU thing for the "last unmap" (which should be
> trivial to notice with rmap).

Unfortunately it isn't, because the whole thing occurs because
of a race between truncate and the page fault path. Whatever
check we do, it's still possible for the race to hit us.

This is because vmtruncate walks the mms one by one, so the mms
already passed by vmtruncate can still fault in the not yet
truncated pages from the file. I've stared at this thing for a
while and didn't come up with a way to prevent this, but maybe
somebody will find it ;)

> Which may imply that Andrew's irq-safe LRU list is the right thing to do
> after all.

According to Andrew it actually increases performance so it
doesn't seem as bad as it sounds.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-07-30 11:26:59

by Ed Tomlinson

[permalink] [raw]
Subject: Re: [patch 2/13] remove pages from the LRU in __free_pages_ok()

Linus Torvalds wrote:

> On Sun, 28 Jul 2002, David S. Miller wrote:
>>
>> So when the user's reference is dropped, does that operation kill it
>> from the LRU or will the socket's remaining reference to that page
>> defer the LRU removal?
>
> That is indeed the question. Right now it will defer, which looks like a
> bug. Or at least it is a bug without the interrupt-safe LRU manipulations.
>
> I'm starting to be more convinced about Andrew's alternate patch, the
> "move LRU lock innermost and make it irq-safe".
>
> Which also would make it saner to do the LRU handling inside
> __put_pages_ok() (and actually remove the BUG_ON(in_interrupt()) that
> Andrew had there in the old patch).

This would also make slablru easier. We could just set the 'release me'
flag when slabs are chained to the free list. In vmscan we make sure we
check if the page is referenced before the release logic.

Ed Tomlinson