Hi,
While browsing through rmap.c in 2.5.31 I found what looks like a
bug introduced by the PageDirect optimizations. Look at the following
piece of code in try_to_unmap:
for (pc = page->pte.chain; pc; pc = next_pc) {
next_pc = pc->next;
switch (try_to_unmap_one(page, pc->ptep)) {
case SWAP_SUCCESS:
/* Free the pte_chain struct. */
pte_chain_free(pc, prev_pc, page);
break;
case SWAP_AGAIN:
/* Skip this pte, remembering status. */
prev_pc = pc;
ret = SWAP_AGAIN;
continue;
case SWAP_FAIL:
ret = SWAP_FAIL;
break;
case SWAP_ERROR:
ret = SWAP_ERROR;
break;
}
}
Note the strange use of continue and break which both achieve the same!
What was meant to happen (judging from rmap-13c) is that we break
out of the for-Loop once SWAP_FAIL or SWAP_ERROR is returned from
try_to_unmap_one. However, this doesn't happen and a subsequent call
to pte_chain_free will use the wrong value for prev_pc.
The impact seems to be at least leakage of pte_chain structures.
I propose the following (untested) patch:
--- rmap.c Sun Aug 11 03:41:54 2002
+++ /home/ehrhardt/rmap.c Mon Aug 12 15:49:25 2002
@@ -336,9 +336,11 @@
continue;
case SWAP_FAIL:
ret = SWAP_FAIL;
+ pc = NULL
break;
case SWAP_ERROR:
ret = SWAP_ERROR;
+ pc = NULL
break;
}
}
regards Christian Ehrhardt
--
THAT'S ALL FOLKS!
On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
> Note the strange use of continue and break which both achieve the same!
> What was meant to happen (judging from rmap-13c) is that we break
> out of the for-Loop once SWAP_FAIL or SWAP_ERROR is returned from
> try_to_unmap_one. However, this doesn't happen and a subsequent call
> to pte_chain_free will use the wrong value for prev_pc.
Excellent hunting! Thank you!
Your fix should work too, although in my opinion it's a
little bit too subtle, so I've changed it into:
case SWAP_FAIL:
ret = SWAP_FAIL;
goto give_up;
case SWAP_ERROR:
ret = SWAP_ERROR;
goto give_up;
}
}
give_up:
This is going into 2.4-rmap and 2.5 right now.
thanks,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Mon, 12 Aug 2002, Rik van Riel wrote:
> On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
>
> > Note the strange use of continue and break which both achieve the same!
> > What was meant to happen (judging from rmap-13c) is that we break
> > out of the for-Loop once SWAP_FAIL or SWAP_ERROR is returned from
> > try_to_unmap_one. However, this doesn't happen and a subsequent call
> > to pte_chain_free will use the wrong value for prev_pc.
>
> Excellent hunting! Thank you!
>
> Your fix should work too, although in my opinion it's a
> little bit too subtle, so I've changed it into:
>
> case SWAP_FAIL:
> ret = SWAP_FAIL;
> goto give_up;
> case SWAP_ERROR:
> ret = SWAP_ERROR;
> goto give_up;
> }
> }
> give_up:
Any chance this is the cause of the following?
---------------extract-----------------------
Subject: Re: [patch 1/21] random fixes
From: Adam Kropelin <[email protected]>
Date: 2002-08-12 2:54:31
FYI, just got this while un-tarring a kernel tree with
2.5.31+everything.gz:
(no nvidia ;)
Date: Sun, 11 Aug 2002 20:40:31 -0700
From: Andrew Morton <[email protected]>
That'll be this one:
BUG_ON(page->pte.chain != NULL);
we've had a few reports of this dribbling in since rmap went in. But
nothing repeatable enough for it to be hunted down.
But we do have a repeatable inconsistency happening with ntpd and
memory pressure. That may be related, but in that case it's probably
related to mlock().
So. An open bug, alas.
On Mon, 12 Aug 2002, Thomas Molina wrote:
> On Mon, 12 Aug 2002, Rik van Riel wrote:
> > On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
> >
> > > Note the strange use of continue and break which both achieve the same!
> > > What was meant to happen (judging from rmap-13c) is that we break
> > Excellent hunting! Thank you!
> Any chance this is the cause of the following?
Yes, quite possible.
> From: Adam Kropelin <[email protected]>
> Date: 2002-08-12 2:54:31
> But we do have a repeatable inconsistency happening with ntpd and
> memory pressure. That may be related, but in that case it's probably
> related to mlock().
kind regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Rik van Riel wrote:
>
> On Mon, 12 Aug 2002, Thomas Molina wrote:
> > On Mon, 12 Aug 2002, Rik van Riel wrote:
> > > On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
> > >
> > > > Note the strange use of continue and break which both achieve the same!
> > > > What was meant to happen (judging from rmap-13c) is that we break
> > > Excellent hunting! Thank you!
> > Any chance this is the cause of the following?
>
> Yes, quite possible.
>
Well Adam reported it against the patched version, in which
is appears that I accidentally fixed that bug. So we may
yet have a problem:
for (pc = start; pc; pc = next_pc) {
int i;
next_pc = pc->next;
if (next_pc)
prefetch(next_pc);
for (i = 0; i < NRPTE; i++) {
pte_t *p = pc->ptes[i];
if (!p)
continue;
if (victim_i == -1)
victim_i = i;
switch (try_to_unmap_one(page, p)) {
case SWAP_SUCCESS:
/*
* Release a slot. If we're releasing the
* first pte in the first pte_chain then
* pc->ptes[i] and start->ptes[victim_i] both
* refer to the same thing. It works out.
*/
pc->ptes[i] = start->ptes[victim_i];
start->ptes[victim_i] = NULL;
dec_page_state(nr_reverse_maps);
victim_i++;
if (victim_i == NRPTE) {
page->pte.chain = start->next;
pte_chain_free(start);
start = page->pte.chain;
victim_i = 0;
}
break;
case SWAP_AGAIN:
/* Skip this pte, remembering status. */
ret = SWAP_AGAIN;
continue;
case SWAP_FAIL:
ret = SWAP_FAIL;
goto out;
case SWAP_ERROR:
ret = SWAP_ERROR;
goto out;
}
}
}
out:
return ret;
}