2008-02-24 14:47:56

by Jörn Engel

Subject: Page scan keeps touching kernel text pages

While tracking down some unrelated bug I noticed that shrink_page_list()
keeps testing very low page numbers (aka kernel text) until deciding
that the page lacks a mapping and cannot get freed. Looks like a waste
of cpu and cachelines to me.

Is there a better reason for this behaviour than lack of a patch?

Jörn

--
Joern's library part 11:
http://www.unicom.com/pw/reply-to-harmful.html


2008-02-25 15:06:37

by Andy Whitcroft

Subject: Re: Page scan keeps touching kernel text pages

On Sun, Feb 24, 2008 at 03:47:11PM +0100, Jörn Engel wrote:
> While tracking down some unrelated bug I noticed that shrink_page_list()
> keeps testing very low page numbers (aka kernel text) until deciding
> that the page lacks a mapping and cannot get freed. Looks like a waste
> of cpu and cachelines to me.
>
> Is there a better reason for this behaviour than lack of a patch?

shrink_page_list() would be expected to be passed pages pulled from
the active or inactive lists via isolate_lru_pages()? I would not have
expected to find the kernel text on the LRU and therefore not expect to
see it passed to shrink_page_list()?

I would expect to find pages below the kernel text as real pages, and
potentially on the LRU on some architectures. On which architecture are
you seeing this? To which zones do the pages belong?

-apw

2008-02-25 15:16:29

by Jörn Engel

Subject: Re: Page scan keeps touching kernel text pages

On Mon, 25 February 2008 15:07:24 +0000, Andy Whitcroft wrote:
> On Sun, Feb 24, 2008 at 03:47:11PM +0100, Jörn Engel wrote:
> > While tracking down some unrelated bug I noticed that shrink_page_list()
> > keeps testing very low page numbers (aka kernel text) until deciding
> > that the page lacks a mapping and cannot get freed. Looks like a waste
> > of cpu and cachelines to me.
> >
> > Is there a better reason for this behaviour than lack of a patch?
>
> shrink_page_list() would be expected to be passed pages pulled from
> the active or inactive lists via isolate_lru_pages()? I would not have
> expected to find the kernel text on the LRU and therefore not expect to
> see it passed to shrink_page_list()?

Your expectations match mine. At least someone shares my delusions. :)

> I would expect to find pages below the kernel text as real pages, and
> potentially on the LRU on some architectures. On which architecture are
> you seeing this? To which zones do the pages belong?

32bit x86 (run in qemu, shouldn't make a difference).

Not sure about the zones. Let me rerun to check that.

Jörn

--
Ninety percent of everything is crap.
-- Sturgeon's Law

2008-02-25 17:36:42

by Jörn Engel

Subject: Re: Page scan keeps touching kernel text pages

On Mon, 25 February 2008 16:15:36 +0100, Jörn Engel wrote:
> On Mon, 25 February 2008 15:07:24 +0000, Andy Whitcroft wrote:
>
> > I would expect to find pages below the kernel text as real pages, and
> > potentially on the LRU on some architectures. On which architecture are
> > you seeing this? To which zones do the pages belong?
>
> 32bit x86 (run in qemu, shouldn't make a difference).
>
> Not sure about the zones. Let me rerun to check that.

Example output:
scanning zone DMA
page 3fa 3 00000000 628
page 2bf 2 00000000 628
page 97 3 00000000 628
page 98 2 00000000 628
scanning zone DMA
page 2c0 3 00000000 628
page 2c3 2 00000000 628
page 44 3 00000000 628
page 46 2 00000000 628
scanning zone DMA
page 37 3 00000000 628
page 35 2 00000000 628
page 32 3 00000000 628
page 38 2 00000000 628

Looks like all kernel text is in zone DMA. Second column holds the page
number, third is refcount, fourth is the flags, fifth is the line, which
corresponds to this one after my debugging changes:
	if (!mapping || !remove_mapping(mapping, page))
		goto keep_locked;
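
For anyone who wants to post-process output in this format, here is a minimal userspace sketch that splits one record into the columns described above. The struct and function names are hypothetical, and it assumes the page number and flags are printed in hex while the refcount and line number are decimal; the debugging patch itself is not shown in this thread.

```c
#include <stdio.h>

/* One record of the debugging output, e.g. "page 3fa 3 00000000 628".
 * Field meanings follow the description in the mail above. */
struct scan_record {
	unsigned long pfn;	/* page number (assumed hex) */
	int refcount;		/* page refcount (assumed decimal) */
	unsigned long flags;	/* page flags (assumed hex) */
	int line;		/* source line that printed the record (assumed decimal) */
};

/* Hypothetical helper: returns 1 if text is a "page ..." record and all
 * four fields were parsed, 0 for anything else (such as the
 * "scanning zone DMA" header lines). */
static int parse_scan_record(const char *text, struct scan_record *rec)
{
	return sscanf(text, "page %lx %d %lx %d",
		      &rec->pfn, &rec->refcount, &rec->flags, &rec->line) == 4;
}
```

Feeding each output line to parse_scan_record() and skipping the lines it rejects yields a list of page numbers that can then be compared against the kernel's symbol addresses.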

Jörn

--
Joern's library part 4:
http://www.paulgraham.com/spam.html

2008-02-25 17:48:41

by Dave Hansen

Subject: Re: Page scan keeps touching kernel text pages

On Mon, 2008-02-25 at 15:07 +0000, Andy Whitcroft wrote:
> shrink_page_list() would be expected to be passed pages pulled from
> the active or inactive lists via isolate_lru_pages()? I would not have
> expected to find the kernel text on the LRU and therefore not expect to
> see it passed to shrink_page_list()?

It may have been kernel text at one time, but what about __init
functions? Don't we free that section back to the normal allocator
after init time? Those can end up on the LRU.

-- Dave

2008-02-25 18:54:11

by Jörn Engel

Subject: Re: Page scan keeps touching kernel text pages

On Mon, 25 February 2008 09:48:22 -0800, Dave Hansen wrote:
> On Mon, 2008-02-25 at 15:07 +0000, Andy Whitcroft wrote:
> > shrink_page_list() would be expected to be passed pages pulled from
> > the active or inactive lists via isolate_lru_pages()? I would not have
> > expected to find the kernel text on the LRU and therefore not expect to
> > see it passed to shrink_page_list()?
>
> It may have been kernel text at one time, but what about __init
> functions? Don't we free that section back to the normal allocator
> after init time? Those can end up on the LRU.

Pages below 0x2ba should be non-init in my test kernel:
c02ba000 T __init_begin
...
c02d5000 B __init_end

scanning zone DMA
page 3fa 3 00000000 628
page 2bf 2 00000000 628
page 97 3 00000000 628
page 98 2 00000000 628

So __init explains one page of this minimal sample, but not the other
three.

Jörn

--
Never argue with idiots - first they drag you down to their level,
then they beat you with experience.
-- unknown

2008-02-25 19:20:44

by Andy Whitcroft

Subject: Re: Page scan keeps touching kernel text pages

On Mon, Feb 25, 2008 at 07:53:20PM +0100, Jörn Engel wrote:
> On Mon, 25 February 2008 09:48:22 -0800, Dave Hansen wrote:
> > On Mon, 2008-02-25 at 15:07 +0000, Andy Whitcroft wrote:
> > > shrink_page_list() would be expected to be passed pages pulled from
> > > the active or inactive lists via isolate_lru_pages()? I would not have
> > > expected to find the kernel text on the LRU and therefore not expect to
> > > see it passed to shrink_page_list()?
> >
> > It may have been kernel text at one time, but what about __init
> > functions? Don't we free that section back to the normal allocator
> > after init time? Those can end up on the LRU.
>
> Pages below 0x2ba should be non-init in my test kernel:
> c02ba000 T __init_begin
> ...
> c02d5000 B __init_end
>
> scanning zone DMA
> page 3fa 3 00000000 628
> page 2bf 2 00000000 628
> page 97 3 00000000 628
> page 98 2 00000000 628
>
> So __init explains one page of this minimal sample, but not the other
> three.

I thought that init sections were deliberately pushed to the end of the
kernel when linked; certainly on my laptop here that seems to be so.
That would make the first two "after" the kernel. The other two appear
to be before the traditional kernel load address, which is 0x100000, so
those pages are before the kernel, not in it?

-apw

2008-02-25 19:46:46

by Dave McCracken

Subject: Re: Page scan keeps touching kernel text pages

On Monday 25 February 2008, Andy Whitcroft wrote:
> I thought that init sections were deliberately pushed to the end of the
> kernel when linked; certainly on my laptop here that seems to be so.
> That would make the first two "after" the kernel. The other two appear
> to be before the traditional kernel load address, which is 0x100000, so
> those pages are before the kernel, not in it?

I believe the memory below the kernel load address on x86 is returned to the
free memory pool at some point during boot, which would explain those
addresses.

Dave McCracken

2008-02-25 20:39:43

by Jörn Engel

Subject: Re: Page scan keeps touching kernel text pages

On Mon, 25 February 2008 13:46:32 -0600, Dave McCracken wrote:
> On Monday 25 February 2008, Andy Whitcroft wrote:
> > I thought that init sections were deliberately pushed to the end of the
> > kernel when linked; certainly on my laptop here that seems to be so.
> > That would make the first two "after" the kernel. The other two appear
> > to be before the traditional kernel load address, which is 0x100000, so
> > those pages are before the kernel, not in it?
>
> I believe the memory below the kernel load address on x86 is returned to the
> free memory pool at some point during boot, which would explain those
> addresses.

It does explain all pages. Sorry about the noise from an mm-newbie.
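
The arithmetic behind that conclusion can be sketched in a few lines of userspace C. This is only an illustration: the function and enum names are made up, and it assumes 4 KiB pages and the default 3G/1G split (kernel virtual address = physical address + 0xc0000000), neither of which is stated in this thread. The load address and the __init_begin/__init_end values are the ones quoted earlier.

```c
#include <stdio.h>

/* Assumptions, not taken from the thread: 4 KiB pages and the default
 * 32-bit x86 split where virtual = physical + 0xc0000000. */
#define PAGE_SHIFT	12
#define PAGE_OFFSET	0xc0000000UL

/* Values quoted in the thread. */
#define KERNEL_LOAD_PHYS	0x100000UL	/* traditional load address */
#define INIT_BEGIN_VIRT		0xc02ba000UL	/* __init_begin */
#define INIT_END_VIRT		0xc02d5000UL	/* __init_end */

enum pfn_region {
	PFN_BELOW_KERNEL,	/* below the load address */
	PFN_KERNEL_IMAGE,	/* load address up to __init_begin */
	PFN_INIT,		/* __init_begin up to __init_end */
	PFN_ABOVE_INIT,		/* at or above __init_end; the thread does not
				 * say where the image ends beyond this point */
};

/* Hypothetical helper: kernel virtual address to page frame number. */
static unsigned long virt_to_pfn_sketch(unsigned long vaddr)
{
	return (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;
}

/* Hypothetical helper: place a page number relative to the kernel image. */
static enum pfn_region classify_pfn(unsigned long pfn)
{
	if (pfn < (KERNEL_LOAD_PHYS >> PAGE_SHIFT))
		return PFN_BELOW_KERNEL;
	if (pfn < virt_to_pfn_sketch(INIT_BEGIN_VIRT))
		return PFN_KERNEL_IMAGE;
	if (pfn < virt_to_pfn_sketch(INIT_END_VIRT))
		return PFN_INIT;
	return PFN_ABOVE_INIT;
}
```

Under those assumptions, pages 97 and 98 fall below the load address, 2bf falls inside the init range, and 3fa lies at or above __init_end. None of the sampled pages lands between the load address and __init_begin, which is where non-init kernel text would be.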

Jörn

--
Joern's library part 14:
http://www.sandpile.org/