2002-08-12 13:41:40

by Christian Ehrhardt

[permalink] [raw]
Subject: pte_chain leak in rmap code (2.5.31)


Hi,

While browsing through rmap.c in 2.5.31 I found what looks like a
bug introduced by the PageDirect optimizations. Look at the following
piece of code in try_to_unmap:

for (pc = page->pte.chain; pc; pc = next_pc) {
next_pc = pc->next;
switch (try_to_unmap_one(page, pc->ptep)) {
case SWAP_SUCCESS:
/* Free the pte_chain struct. */
pte_chain_free(pc, prev_pc, page);
break;
case SWAP_AGAIN:
/* Skip this pte, remembering status. */
prev_pc = pc;
ret = SWAP_AGAIN;
continue;
case SWAP_FAIL:
ret = SWAP_FAIL;
break;
case SWAP_ERROR:
ret = SWAP_ERROR;
break;
}
}

Note the strange use of continue and break which both achieve the same!
What was meant to happen (judging from rmap-13c) is that we break
out of the for-Loop once SWAP_FAIL or SWAP_ERROR is returned from
try_to_unmap_one. However, this doesn't happen and a subsequent call
to pte_chain_free will use the wrong value for prev_pc.

The impact seems to be at least leakage of pte_chain structures.

I propose the following (untested) patch:

--- rmap.c Sun Aug 11 03:41:54 2002
+++ /home/ehrhardt/rmap.c Mon Aug 12 15:49:25 2002
@@ -336,9 +336,11 @@
continue;
case SWAP_FAIL:
ret = SWAP_FAIL;
+ pc = NULL
break;
case SWAP_ERROR:
ret = SWAP_ERROR;
+ pc = NULL
break;
}
}

regards Christian Ehrhardt

--
THAT'S ALL FOLKS!


2002-08-12 14:17:40

by Rik van Riel

[permalink] [raw]
Subject: Re: pte_chain leak in rmap code (2.5.31)

On Mon, 12 Aug 2002, Christian Ehrhardt wrote:

> Note the strange use of continue and break which both achieve the same!
> What was meant to happen (judging from rmap-13c) is that we break
> out of the for-Loop once SWAP_FAIL or SWAP_ERROR is returned from
> try_to_unmap_one. However, this doesn't happen and a subsequent call
> to pte_chain_free will use the wrong value for prev_pc.

Excellent hunting! Thank you!

Your fix should work too, although in my opinion it's a
little bit too subtle, so I've changed it into:

case SWAP_FAIL:
ret = SWAP_FAIL;
goto give_up;
case SWAP_ERROR:
ret = SWAP_ERROR;
goto give_up;
}
}
give_up:

This is going into 2.4-rmap and 2.5 right now.

thanks,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-13 00:47:22

by Thomas Molina

[permalink] [raw]
Subject: Re: pte_chain leak in rmap code (2.5.31)

On Mon, 12 Aug 2002, Rik van Riel wrote:

> On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
>
> > Note the strange use of continue and break which both achieve the same!
> > What was meant to happen (judging from rmap-13c) is that we break
> > out of the for-Loop once SWAP_FAIL or SWAP_ERROR is returned from
> > try_to_unmap_one. However, this doesn't happen and a subsequent call
> > to pte_chain_free will use the wrong value for prev_pc.
>
> Excellent hunting! Thank you!
>
> Your fix should work too, although in my opinion it's a
> little bit too subtle, so I've changed it into:
>
> case SWAP_FAIL:
> ret = SWAP_FAIL;
> goto give_up;
> case SWAP_ERROR:
> ret = SWAP_ERROR;
> goto give_up;
> }
> }
> give_up:

Any chance this is the cause of the following?

---------------extract-----------------------

Subject: Re: [patch 1/21] random fixes
From: Adam Kropelin <[email protected]>
Date: 2002-08-12 2:54:31

FYI, just got this while un-tarring a kernel tree with
2.5.31+everything.gz:
(no nvidia ;)

Date: Sun, 11 Aug 2002 20:40:31 -0700
From: Andrew Morton <[email protected]>

That'll be this one:

BUG_ON(page->pte.chain != NULL);

we've had a few reports of this dribbling in since rmap went in. But
nothing repeatable enough for it to be hunted down.

But we do have a repeatable inconsistency happening with ntpd and
memory pressure. That may be related, but in that case it's probably
related to mlock().

So. An open bug, alas.



2002-08-13 01:06:10

by Rik van Riel

[permalink] [raw]
Subject: Re: pte_chain leak in rmap code (2.5.31)

On Mon, 12 Aug 2002, Thomas Molina wrote:
> On Mon, 12 Aug 2002, Rik van Riel wrote:
> > On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
> >
> > > Note the strange use of continue and break which both achieve the same!
> > > What was meant to happen (judging from rmap-13c) is that we break
> > Excellent hunting! Thank you!
> Any chance this is the cause of the following?

Yes, quite possible.

> From: Adam Kropelin <[email protected]>
> Date: 2002-08-12 2:54:31

> But we do have a repeatable inconsistency happening with ntpd and
> memory pressure. That may be related, but in that case it's probably
> related to mlock().

kind regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-08-13 02:59:29

by Andrew Morton

[permalink] [raw]
Subject: Re: pte_chain leak in rmap code (2.5.31)

Rik van Riel wrote:
>
> On Mon, 12 Aug 2002, Thomas Molina wrote:
> > On Mon, 12 Aug 2002, Rik van Riel wrote:
> > > On Mon, 12 Aug 2002, Christian Ehrhardt wrote:
> > >
> > > > Note the strange use of continue and break which both achieve the same!
> > > > What was meant to happen (judging from rmap-13c) is that we break
> > > Excellent hunting! Thank you!
> > Any chance this is the cause of the following?
>
> Yes, quite possible.
>

Well Adam reported it against the patched version, in which
is appears that I accidentally fixed that bug. So we may
yet have a problem:

for (pc = start; pc; pc = next_pc) {
int i;

next_pc = pc->next;
if (next_pc)
prefetch(next_pc);
for (i = 0; i < NRPTE; i++) {
pte_t *p = pc->ptes[i];

if (!p)
continue;
if (victim_i == -1)
victim_i = i;

switch (try_to_unmap_one(page, p)) {
case SWAP_SUCCESS:
/*
* Release a slot. If we're releasing the
* first pte in the first pte_chain then
* pc->ptes[i] and start->ptes[victim_i] both
* refer to the same thing. It works out.
*/
pc->ptes[i] = start->ptes[victim_i];
start->ptes[victim_i] = NULL;
dec_page_state(nr_reverse_maps);
victim_i++;
if (victim_i == NRPTE) {
page->pte.chain = start->next;
pte_chain_free(start);
start = page->pte.chain;
victim_i = 0;
}
break;
case SWAP_AGAIN:
/* Skip this pte, remembering status. */
ret = SWAP_AGAIN;
continue;
case SWAP_FAIL:
ret = SWAP_FAIL;
goto out;
case SWAP_ERROR:
ret = SWAP_ERROR;
goto out;
}
}
}
out:
return ret;
}