2000-12-13 17:54:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga



On Wed, 13 Dec 2000, Linus Torvalds wrote:
>
> Lookin gat "swapoff()", it could easily be something like
>
> - swapoff walks theough the processes, marking the pages dirty
> (correctly)
> - swapoff goes on to the next swap entry, and because it needs memory for
> this, the VM layer will swap out old entries by marking them dirty in
> the "struct page".
> - final stages of swapoff() removes the swap cache entry, never minding
> the fact that it is marked dirty again in "struct page", and clean in
> various VM page tables.
>
> Ho humm.. I don't think that is it exactly, but something along those
> lines.

Actually, having thought about it for five more minutes, I actually think
that that _is_ it.

If so, the fix looks like it could be really simple. The whole problem
arises from the fact that we remove the page from the swap cache only
_after_ we've walked the page-tables to look at it. It looks like the
fairly trivial fix is simply to remove it from the swap cache before,
getting rid of all such races in swapoff().

Mind trying out this patch?

NOTE! It's untested. It might not work. It might trigger some sanity-test
somewhere else. But it looks like it should do the right thing (the page
might be moved to _another_ swap device early, if there are multiple swap
areas, but even that should be fine - the unuse_process() stuff doesn't
care about what swapcache this actually is any more.

Does this patch make a difference (I moved the delete seven lines upwards,
and removed the test - the test looks extraneous).

Linus

----
--- v2.4.0-test12/linux/mm/swapfile.c Tue Oct 31 12:42:27 2000
+++ linux/mm/swapfile.c Wed Dec 13 09:17:51 2000
@@ -370,6 +370,7 @@
swap_free(entry);
return -ENOMEM;
}
+ delete_from_swap_cache(page);
read_lock(&tasklist_lock);
for_each_task(p)
unuse_process(p->mm, entry, page);
@@ -377,8 +378,6 @@
shm_unuse(entry, page);
/* Now get rid of the extra reference to the temporary
page we've been using. */
- if (PageSwapCache(page))
- delete_from_swap_cache(page);
page_cache_release(page);
/*
* Check for and clear any overflowed swap map counts.


2000-12-13 18:33:02

by Mike Galbraith

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Lookin gat "swapoff()", it could easily be something like
> >
> > - swapoff walks theough the processes, marking the pages dirty
> > (correctly)
> > - swapoff goes on to the next swap entry, and because it needs memory for
> > this, the VM layer will swap out old entries by marking them dirty in
> > the "struct page".
> > - final stages of swapoff() removes the swap cache entry, never minding
> > the fact that it is marked dirty again in "struct page", and clean in
> > various VM page tables.
> >
> > Ho humm.. I don't think that is it exactly, but something along those
> > lines.
>
> Actually, having thought about it for five more minutes, I actually think
> that that _is_ it.
>
> If so, the fix looks like it could be really simple. The whole problem
> arises from the fact that we remove the page from the swap cache only
> _after_ we've walked the page-tables to look at it. It looks like the
> fairly trivial fix is simply to remove it from the swap cache before,
> getting rid of all such races in swapoff().
>
> Mind trying out this patch?
>
> NOTE! It's untested. It might not work. It might trigger some sanity-test
> somewhere else. But it looks like it should do the right thing (the page
> might be moved to _another_ swap device early, if there are multiple swap
> areas, but even that should be fine - the unuse_process() stuff doesn't
> care about what swapcache this actually is any more.
>
> Does this patch make a difference (I moved the delete seven lines upwards,
> and removed the test - the test looks extraneous).

Not in my test tree. Same fault, and same trace leading up to it.
I'll run virgin source hard tomorrow to be sure. (No message means
no change)

-Mike

2000-12-13 19:59:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga



On Wed, 13 Dec 2000, Mike Galbraith wrote:
>
> Not in my test tree. Same fault, and same trace leading up to it. no

Ok.

It definitely looks like a swapoff() problem.

Have you ever seen the behaviour without running swapoff?

Also, can you re-create it without running swapon() (if it's something
like a lost dirty bit, it should be possible to trigger even without the
swapon, and I'd like to hear if that can happen - if it only happens with
swapon() and you can't trigger it with just a swapoff() it might be a
question of re-using some swap file stuff and delaying the writeout or
whatever).

Linus

2000-12-13 20:07:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga



Ehh, I think I found it.

Hint: "ptep_mkdirty()".

Oops.

I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that
this explains it.

Linus

2000-12-13 20:50:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga



On Wed, 13 Dec 2000, Linus Torvalds wrote:
>
> Hint: "ptep_mkdirty()".

In case you wonder why the bug was so insidious, what this caused was two
separate problems, both of them able to cause SIGSGV's.

One: we didn't mark the page table entry dirty like we were supposed to.

Two: by making it writable, we also made the page shared, even if it
wasn't supposed to be shared (so when the next process wrote to the page,
if the swap page was shared with somebody else, the changes would show up
even in the process that _didn't_ write to it).

And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
this. Which was why it hadn't been immediately obvious that anything was
broken.

Linus

2000-12-13 21:11:48

by Jeff V. Merkey

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga

On Wed, Dec 13, 2000 at 11:35:57AM -0800, Linus Torvalds wrote:
>
>
> Ehh, I think I found it.
>
> Hint: "ptep_mkdirty()".
>
> Oops.
>
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that
> this explains it.
>
> Linus

Good. Sounds like you guys have a handle on it now.

:-)

Jeff

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/

2000-12-13 21:31:14

by Gérard Roudier

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga



On Wed, 13 Dec 2000, Linus Torvalds wrote:

>
>
> Ehh, I think I found it.
>
> Hint: "ptep_mkdirty()".
>
> Oops.
>
> I'll bet you $5 USD (and these days, that's about a gadzillion Euros) that

Poor European G?rard as slim as 1.84 meter - 78 Kg these days.
What about old days poor European Linus versus these days American Linus
on these points ? ;-)

> this explains it.

Really ? :o)

> Linus

G?rard.

2000-12-13 23:20:52

by Rainer Mager

[permalink] [raw]
Subject: RE: Signal 11 - the continuing saga

Err, for those of us who aren't up to our elbows in the kernel code, is
there a patch for this? Presumeably this will be rolled into 2.4.0test13 but
I'd like to try it out? Also, can someone summarize the fix in English along
with the expected, improved behavior (e.g. Linux will never have a signal 11
again and will never, ever crash ;-)

Finally, as soon as there is a patch, can other people who have seen this
problem test it. My problem is so random that I'd need at least a few days
to gain some confidence this is fixed.


Thanks all.

--Rainer

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Linus Torvalds
> Sent: Thursday, December 14, 2000 5:19 AM
> To: Mike Galbraith
> Cc: Kernel Mailing List
> Subject: Re: Signal 11 - the continuing saga
>
>
> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Hint: "ptep_mkdirty()".
>
> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSGV's.
>
> One: we didn't mark the page table entry dirty like we were supposed to.
>
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
>
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.
>
> Linus

2000-12-14 04:28:38

by Mike Galbraith

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> On Wed, 13 Dec 2000, Mike Galbraith wrote:
> >
> > Not in my test tree. Same fault, and same trace leading up to it. no
>
> Ok.
>
> It definitely looks like a swapoff() problem.
>
> Have you ever seen the behaviour without running swapoff?

No.

> Also, can you re-create it without running swapon() (if it's something
> like a lost dirty bit, it should be possible to trigger even without the
> swapon, and I'd like to hear if that can happen - if it only happens with
> swapon() and you can't trigger it with just a swapoff() it might be a
> question of re-using some swap file stuff and delaying the writeout or
> whatever).

I'll try loading up swap, swapoff and then doing jobs that fit in ram.

(hmm.. what about inactive_clean list when you do swapoff.. might there
be pages sitting there that are [were] swap cache? reclaim_page=kaboom?)

-Mike

2000-12-14 07:53:48

by Mike Galbraith

[permalink] [raw]
Subject: Re: Signal 11 - the continuing saga

On Wed, 13 Dec 2000, Linus Torvalds wrote:

> On Wed, 13 Dec 2000, Linus Torvalds wrote:
> >
> > Hint: "ptep_mkdirty()".

<g> rather obvious oopsie.. once spotted.

> In case you wonder why the bug was so insidious, what this caused was two
> separate problems, both of them able to cause SIGSGV's.
>
> One: we didn't mark the page table entry dirty like we were supposed to.
>
> Two: by making it writable, we also made the page shared, even if it
> wasn't supposed to be shared (so when the next process wrote to the page,
> if the swap page was shared with somebody else, the changes would show up
> even in the process that _didn't_ write to it).
>
> And "ptep_mkdirty()" is only used by swapoff, so nothing else would show
> this. Which was why it hadn't been immediately obvious that anything was
> broken.

The terminal OOM problem is now gone and I haven't seen a SIGSEGV yet
running virgin source.

IOU 5 bogo$$

-Mike

(I still see something with IKD that _could_ be timing related troubles.
There are a couple of grubby fingerprints I need to wipe off, and some
churn/burn hours to be sure)

2000-12-17 23:58:32

by Rainer Mager

[permalink] [raw]
Subject: Signal 11 - revisited

I was wondering if anyone had any new info/suggestions for the Signal 11
problem.

I think I last reported that I had tried 2.4.0test12 w AGPGart and DRM
turned off. This seemed a bit more stable but I did have X crash with
Signall 11 after about 1.5 days.

I'd really appreciate any advice on how to diagnose this.


Thanks,

--Rainer