2001-11-06 09:26:23

by Andrew Morton

Subject: ext3-0.9.15 against linux-2.4.14

Download details and documentation are at

http://www.uow.edu.au/~andrewm/linux/ext3/

Changes since ext3-0.9.13 (which was against linux-2.4.13):

- Fixed a null-pointer dereference oops which could hit on
SMP machines. This fix was applied to 2.4.12-ac6, but the
oops has never been reported against -ac kernels.

- Large amounts of developer debug code have been removed. This
code will now be maintained separately.

- There is an interaction failure between ext3 and the current
Extended Attributes and Access Control Lists patch which leads
to crashes under heavy load on SMP. This is possibly due to
a subtle API change between ext3 in 2.2 and 2.4 kernels (i.e., I
broke it). On the to-do list.

- For a long time, the ext3 patch has used a semaphore in the core
kernel to prevent concurrent pagein and truncate of the same
file. This was to prevent a race wherein the paging-in task
would wake up after the truncate and would instantiate a page
in the process's page tables which had attached buffers. This
leads to a BUG() if the swapout code tries to swap the page out.

This semaphore has been removed. The swapout code has been altered
to simply detect and ignore these pages.

This is an incredibly obscure and hard-to-hit situation. The testcase
which used to trigger it can no longer do so. So if anyone sees the
message "try_to_swap_out: page has buffers!", please shout out.

There are no plans to remove this semaphore from -ac kernels,
unless Alan wants it that way.
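
For readers who want the shape of the change, here is a minimal
user-space sketch of "detect and ignore" as described above. It is not
the kernel code: struct toy_page, enum swap_result and
toy_try_to_swap_out() are hypothetical names made up for illustration,
and the real swapout path does far more than clear a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    void *buffers;   /* non-NULL while buffers are still attached */
    int   mapped;    /* 1 while a (toy) pte still points at the page */
};

enum swap_result { SWAP_DONE, SWAP_PRESERVED };

/*
 * Illustrative only: a page that still has buffers is reported and
 * left mapped, instead of being treated as a fatal condition.
 */
static enum swap_result toy_try_to_swap_out(struct toy_page *page)
{
    assert(page != NULL);

    if (page->buffers) {
        fprintf(stderr, "%s: page has buffers!\n", __func__);
        return SWAP_PRESERVED;   /* leave the mapping alone */
    }

    page->mapped = 0;            /* normal path: unmap the page */
    return SWAP_DONE;
}
```

A caller would see SWAP_PRESERVED for a page with buffers and
SWAP_DONE, with the mapping cleared, for any other page.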

-


2001-11-06 09:35:44

by Alan

Subject: Re: ext3-0.9.15 against linux-2.4.14

> There are no plans to remove this semaphore from -ac kernels,
> unless Alan wants it that way.

That should just come out by magic as the VM and other stuff converge.

2001-11-06 18:09:36

by Steven N. Hirsch

Subject: Re: ext3-0.9.15 against linux-2.4.14

On Tue, 6 Nov 2001, Andrew Morton wrote:

> Download details and documentation are at
>
> http://www.uow.edu.au/~andrewm/linux/ext3/
>
> Changes since ext3-0.9.13 (which was against linux-2.4.13):
>
> - For a long time, the ext3 patch has used a semaphore in the core
> kernel to prevent concurrent pagein and truncate of the same
> file. This was to prevent a race wherein the paging-in task
> would wake up after the truncate and would instantiate a page
> in the process's page tables which had attached buffers. This
> leads to a BUG() if the swapout code tries to swap the page out.
>
> This semaphore has been removed. The swapout code has been altered
> to simply detect and ignore these pages.
>
> This is an incredibly obscure and hard-to-hit situation. The testcase
> which used to trigger it can no longer do so. So if anyone sees the
> message "try_to_swap_out: page has buffers!", please shout out.

Andrew,

I have been getting thousands of these when the system was under heavy
load, but didn't realize it was from the ext3 code! I'm using Linus's
2.4.14-pre7 + ext3 patch from Neil Brown's site (the latter is identified
as "ZeroNineFourteen".) Would you like me to upgrade kernel and patch?

Steve


2001-11-06 18:54:48

by Andrew Morton

Subject: Re: ext3-0.9.15 against linux-2.4.14

"Steven N. Hirsch" wrote:
>
> On Tue, 6 Nov 2001, Andrew Morton wrote:
>
> > Download details and documentation are at
> >
> > http://www.uow.edu.au/~andrewm/linux/ext3/
> >
> > Changes since ext3-0.9.13 (which was against linux-2.4.13):
> >
> > - For a long time, the ext3 patch has used a semaphore in the core
> > kernel to prevent concurrent pagein and truncate of the same
> > file. This was to prevent a race wherein the paging-in task
> > would wake up after the truncate and would instantiate a page
> > in the process's page tables which had attached buffers. This
> > leads to a BUG() if the swapout code tries to swap the page out.
> >
> > This semaphore has been removed. The swapout code has been altered
> > to simply detect and ignore these pages.
> >
> > This is an incredibly obscure and hard-to-hit situation. The testcase
> > which used to trigger it can no longer do so. So if anyone sees the
> > message "try_to_swap_out: page has buffers!", please shout out.
>
> Andrew,
>
> I have been getting thousands of these when the system was under heavy
> load, but didn't realize it was from the ext3 code! I'm using Linus's
> 2.4.14-pre7 + ext3 patch from Neil Brown's site (the latter is identified
> as "ZeroNineFourteen".) Would you like me to upgrade kernel and patch?
>

Now that's interesting. The printk is in there so I can ensure
that the codepath gets tested and is known to work.

Could you please send me details of the hardware setup, URL
for Neil's patch and a description of the workload? Whatever
I need to make it happen locally.

If the message bothers you, please just remove the printk from
vmscan.c.

2001-11-07 18:05:28

by Andrew Morton

Subject: Re: ext3-0.9.15 against linux-2.4.14

Stephen Tweedie wrote:
>
> Andrew, the code
>
>         if (page->buffers) {
>                 /*
>                  * Anonymous buffercache page left behind by
>                  * truncate.
>                  */
>                 printk(__FUNCTION__ ": page has buffers!\n");
>                 goto preserve;
>         }
>
> is going to end up preserving the pte forever and shouting to syslog
> every time the VM walks over the pte in question. I'd be just as
> happy dropping these ptes on the floor when we find them, as they are
> clearly of no use to anybody at this point.
>

Yes, perhaps we could do something smarter - I wasn't even sure it
was possible to hit any more (still waiting to hear back from
Steve Hirsch!)

The idea is that in this rare case, shrink_cache() will at
some later time revisit the page and again try to remove its
buffers, and will succeed. It's still on the LRU.
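
A rough user-space model of that sequence, as a sketch only. The names
struct lru_page, toy_release_buffers() and toy_swap_out() are
hypothetical, and the "busy" flag is an assumption standing in for
whatever made the first attempt to remove the buffers fail; the thread
does not say what that reason is.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page that stays on the (toy) LRU; NOT the kernel's struct page. */
struct lru_page {
    void *buffers;   /* non-NULL while buffers are attached */
    int   busy;      /* assumption: 1 while the buffers cannot be freed */
    int   mapped;    /* 1 while a (toy) pte still points at the page */
};

/* Stand-in for a later cache-shrinking pass retrying buffer removal. */
static int toy_release_buffers(struct lru_page *page)
{
    assert(page != NULL);
    if (page->buffers && !page->busy)
        page->buffers = NULL;
    return page->buffers == NULL;   /* 1 once the page is buffer-free */
}

/* Stand-in for the swapout pass: skips pages that still have buffers. */
static int toy_swap_out(struct lru_page *page)
{
    assert(page != NULL);
    if (page->buffers)
        return 0;                   /* preserved for now */
    page->mapped = 0;
    return 1;                       /* unmapped */
}
```

In this model the first toy_swap_out() call preserves the page, a
later toy_release_buffers() call succeeds once "busy" clears, and the
next toy_swap_out() call goes through.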

We definitely need to kill the printk(), but I really want
to get to test this code path locally.

-

2001-11-07 17:08:27

by Stephen C. Tweedie

Subject: Re: ext3-0.9.15 against linux-2.4.14

Hi,

On Tue, Nov 06, 2001 at 01:09:42PM -0500, Steven N. Hirsch wrote:

> > This is an incredibly obscure and hard-to-hit situation. The testcase
> > which used to trigger it can no longer do so. So if anyone sees the
> > message "try_to_swap_out: page has buffers!", please shout out.

> I have been getting thousands of these when the system was under heavy
> load, but didn't realize it was from the ext3 code! I'm using Linus's
> 2.4.14-pre7 + ext3 patch from Neil Brown's site (the latter is identified
> as "ZeroNineFourteen".) Would you like me to upgrade kernel and patch?

Andrew, the code

        if (page->buffers) {
                /*
                 * Anonymous buffercache page left behind by
                 * truncate.
                 */
                printk(__FUNCTION__ ": page has buffers!\n");
                goto preserve;
        }

is going to end up preserving the pte forever and shouting to syslog
every time the VM walks over the pte in question. I'd be just as
happy dropping these ptes on the floor when we find them, as they are
clearly of no use to anybody at this point.
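
To make the difference concrete, here is a small user-space sketch
comparing the two policies by counting how many messages each would
log over repeated scans. Everything in it (struct toy_pte, enum
policy, toy_scan_pte(), toy_count_messages()) is hypothetical, and it
assumes the worst case in which nothing ever strips the buffers from
the page.

```c
#include <assert.h>
#include <stddef.h>

enum policy { POLICY_PRESERVE, POLICY_DROP };

/* Toy pte; NOT a real page table entry. */
struct toy_pte {
    int present;       /* 1 while the pte still maps the page */
    int has_buffers;   /* 1 if the mapped page has attached buffers */
};

/* One scan over the pte; returns 1 if a message would be logged. */
static int toy_scan_pte(struct toy_pte *pte, enum policy policy)
{
    assert(pte != NULL);

    if (!pte->present || !pte->has_buffers)
        return 0;                /* nothing unusual to report */

    if (policy == POLICY_DROP) {
        pte->present = 0;        /* drop the pte silently */
        return 0;
    }

    return 1;                    /* preserve: pte stays, message logged */
}

/* Count messages over a number of scans of one affected pte. */
static int toy_count_messages(enum policy policy, int scans)
{
    struct toy_pte pte = { 1, 1 };
    int messages = 0;

    for (int i = 0; i < scans; i++)
        messages += toy_scan_pte(&pte, policy);
    return messages;
}
```

Under these assumptions the preserve policy logs once per scan for as
long as the pte exists, while the drop policy logs nothing and the pte
is gone after the first scan.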

Cheers,
Stephen