2010-01-10 02:09:58

by Andi Kleen

[permalink] [raw]
Subject: 2.6.33 pagemap endless read loop


An LTP run on x86-64/2.6.33-rc3 run right now results in "proc01" hanging

lr-x------ 1 root root 64 2010-01-10 02:28 7 -> /proc/2679/task/2679/pagemap

strace shows an endless loop of

read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 10

cat of that file hangs in the same way so it seems like a kernel bug.
A cat of /proc/2679/pagemap also hangs, in fact any pagemap cat seems to.

I tried to revert the latest change from Horiguchi-san:

commit 5dc37642cbce34619e4588a9f0bdad1d2f870956
Author: Naoya Horiguchi <[email protected]>
Date: Mon Dec 14 18:00:01 2009 -0800

mm hugetlb: add hugepage support to pagemap

but that didn't fix it, so it must be something else.

Haven't done further bisect or anything, should be easy to reproduce.

-Andi

--
[email protected] -- Speaking for myself only.


2010-01-10 06:31:54

by Cong Wang

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

On Sun, Jan 10, 2010 at 03:09:55AM +0100, Andi Kleen wrote:
>
>An LTP run on x86-64/2.6.33-rc3 run right now results in "proc01" hanging
>
>lr-x------ 1 root root 64 2010-01-10 02:28 7 -> /proc/2679/task/2679/pagemap
>
>strace shows an endless loop of
>
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 1024
>read(7, "\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0"..., 1024) = 10
>
>cat of that file hangs in the same way so it seems like a kernel bug.
>A cat of /proc/2679/pagemap also hangs, in fact any pagemap cat seems to.
>

Are you sure?

On my 32bit machine, it runs for a long time, but finally exits.
>From the document of pagemap, it is the mapping of all the virtual pages
of a process, on x86-32, it should be 4G/4K = 1024*1024 pages, also
that when you 'cat' it, tty layer is involved to display it too.

--
Live like a child, think like the god.

2010-01-10 12:37:10

by Andi Kleen

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

> Are you sure

I don't know if it's really endless, but it's doing a good emulation
of it if it's not.

>
> On my 32bit machine, it runs for a long time, but finally exits.

That was on 64bit.

proc01 had accumulated several minutes of CPU time when I stopped it.

> From the document of pagemap, it is the mapping of all the virtual pages
> of a process, on x86-32, it should be 4G/4K = 1024*1024 pages, also

Only the mapped ones surely?

Right now it looks like it dumps all the holes and that obviously
will never finish on 64bit which has 47bits of address space.

-Andi
--
[email protected] -- Speaking for myself only.

2010-01-12 06:00:10

by Mike Galbraith

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

On Sun, 2010-01-10 at 13:37 +0100, Andi Kleen wrote:
> > Are you sure
>
> I don't know if it's really endless, but it's doing a good emulation
> of it if it's not.
>
> >
> > On my 32bit machine, it runs for a long time, but finally exits.
>
> That was on 64bit.
>
> proc01 had accumulated several minutes of CPU time when I stopped it.
>
> > From the document of pagemap, it is the mapping of all the virtual pages
> > of a process, on x86-32, it should be 4G/4K = 1024*1024 pages, also
>
> Only the mapped ones surely?
>
> Right now it looks like it dumps all the holes and that obviously
> will never finish on 64bit which has 47bits of address space.

Takes forever if you whack .pagemap_pte_hole too, just spends it's life
in pagemap_pte_range() instead.

start_vaddr:0x0 end_vaddr:0x7ffffffff000

Hm, zillion paces ain't a walk.

-Mike

2010-01-12 15:30:23

by Cong Wang

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

On Sun, Jan 10, 2010 at 01:37:02PM +0100, Andi Kleen wrote:
>> Are you sure
>
>I don't know if it's really endless, but it's doing a good emulation
>of it if it's not.
>
>>
>> On my 32bit machine, it runs for a long time, but finally exits.
>
>That was on 64bit.
>
>proc01 had accumulated several minutes of CPU time when I stopped it.
>
>> From the document of pagemap, it is the mapping of all the virtual pages
>> of a process, on x86-32, it should be 4G/4K = 1024*1024 pages, also
>
>Only the mapped ones surely?
>
>Right now it looks like it dumps all the holes and that obviously
>will never finish on 64bit which has 47bits of address space.
>

Hmm, could you please try:

time cat /proc/self/pagemap >/dev/null

? Is it much faster than:

time cat /proc/self/pagemap

?

Thanks!
--
Live like a child, think like the god.

2010-01-12 16:28:33

by Andi Kleen

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

> Hmm, could you please try:
>
> time cat /proc/self/pagemap >/dev/null
>

I stopped it after a minute of CPU time...

Obviously reading 47bits of zeroes take a long time.

-Andi
--
[email protected] -- Speaking for myself only.

2010-01-13 02:09:07

by Cong Wang

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

On Wed, Jan 13, 2010 at 12:28 AM, Andi Kleen <[email protected]> wrote:
>> Hmm, could you please try:
>>
>>      time cat /proc/self/pagemap >/dev/null
>>
>
> I stopped it after a minute of CPU time...
>
> Obviously reading 47bits of zeroes take a long time.
>

Yeah, but this just proves it is not endless, doesn't it? :)

Thanks.

2010-01-14 08:15:08

by Fengguang Wu

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

On Thu, Jan 14, 2010 at 02:10:13PM +0800, Andrew Morton wrote:
> On Thu, 14 Jan 2010 13:53:19 +0800 Wu Fengguang <[email protected]> wrote:
>
> > On Thu, Jan 14, 2010 at 07:15:02AM +0800, Andrew Morton wrote:

> > @@ -740,6 +740,11 @@ static ssize_t pagemap_read(struct file
> > if (!ptrace_may_access(task, PTRACE_MODE_READ))
> > goto out_task;
> >
> > + /* stop dumb cp/cat */
> > + ret = -EFAULT;
> > + if (!*ppos)
> > + goto out_task;
> > +
> > ret = -EINVAL;
> > /* file position must be aligned */
> > if ((*ppos % PM_ENTRY_BYTES) || (count % PM_ENTRY_BYTES))
>
> Seems simple. But is there a usecase for reading at that offset?
> I think it's possible to map a page at 0 with MAP_FIXED. I forget..

How about this trick?
Tests show that cp/cat will exit if read nothing :)

---
pagemap: early return on unmapped areas

This helps stop dumb cp/cat early instead of loop for days:

http://bugzilla.kernel.org/show_bug.cgi?id=15041

This also serves as an optimization to normal pagemap users (eg.
page-types).

CC: Andi Kleen <[email protected]>
CC: Matt Mackall <[email protected]>
CC: <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/proc/task_mmu.c | 4 ++++
1 file changed, 4 insertions(+)

--- linux-mm.orig/fs/proc/task_mmu.c 2010-01-14 13:38:44.000000000 +0800
+++ linux-mm/fs/proc/task_mmu.c 2010-01-14 15:54:28.000000000 +0800
@@ -586,6 +586,10 @@ static int pagemap_pte_hole(unsigned lon
struct pagemapread *pm = walk->private;
unsigned long addr;
int err = 0;
+
+ if ((end - start) / PAGE_SIZE > pm->end - pm->out)
+ return PM_END_OF_BUFFER;
+
for (addr = start; addr < end; addr += PAGE_SIZE) {
err = add_to_pagemap(addr, PM_NOT_PRESENT, pm);
if (err)

2010-01-14 08:30:53

by Andi Kleen

[permalink] [raw]
Subject: Re: 2.6.33 pagemap endless read loop

> ---
> pagemap: early return on unmapped areas

Yes seems like a reasonable idea.

-Andi

2010-01-14 08:53:01

by Fengguang Wu

[permalink] [raw]
Subject: [PATCH] pagemap: early return on unmapped areas

On Thu, Jan 14, 2010 at 04:30:48PM +0800, Andi Kleen wrote:
> > ---
> > pagemap: early return on unmapped areas
>
> Yes seems like a reasonable idea.

Thanks!

I just changed the ">" to ">=", because it happen to make a difference
for cp/cat, which does 4k sized read:

4k read buffer = 4k/8 pages = 512 pages = NR_PMD_PAGES

So the change to ">=" makes cp exit immediately (4k buffer matches
exactly one PMD hole).

before
dd if=/proc/$$/pagemap of=/dev/null bs=4k
512+0 records in
512+0 records out
2097152 bytes (2.1 MB) copied, 0.0225796 s, 92.9 MB/s

after
dd if=/proc/$$/pagemap of=/dev/null bs=4k
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.8029e-05 s, 0.0 kB/s

page-types still works (only faster;) with this patch.

Thanks,
Fengguang
---
pagemap: early return on unmapped areas

This helps stop dumb cp/cat early instead of loop for days:

http://bugzilla.kernel.org/show_bug.cgi?id=15041

This also serves as an optimization to normal pagemap users (eg.
page-types).

CC: Andi Kleen <[email protected]>
CC: Matt Mackall <[email protected]>
CC: <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/proc/task_mmu.c | 4 ++++
1 file changed, 4 insertions(+)

--- linux-mm.orig/fs/proc/task_mmu.c 2010-01-14 13:38:44.000000000 +0800
+++ linux-mm/fs/proc/task_mmu.c 2010-01-14 16:38:49.000000000 +0800
@@ -586,6 +586,10 @@ static int pagemap_pte_hole(unsigned lon
struct pagemapread *pm = walk->private;
unsigned long addr;
int err = 0;
+
+ if ((end - start) / PAGE_SIZE >= pm->end - pm->out)
+ return PM_END_OF_BUFFER;
+
for (addr = start; addr < end; addr += PAGE_SIZE) {
err = add_to_pagemap(addr, PM_NOT_PRESENT, pm);
if (err)

2010-01-14 17:02:17

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH] pagemap: early return on unmapped areas

On Thu, 2010-01-14 at 16:52 +0800, Wu Fengguang wrote:
> On Thu, Jan 14, 2010 at 04:30:48PM +0800, Andi Kleen wrote:
> > > ---
> > > pagemap: early return on unmapped areas
> >
> > Yes seems like a reasonable idea.
>
> Thanks!
>
> I just changed the ">" to ">=", because it happen to make a difference
> for cp/cat, which does 4k sized read:
>
> 4k read buffer = 4k/8 pages = 512 pages = NR_PMD_PAGES
>
> So the change to ">=" makes cp exit immediately (4k buffer matches
> exactly one PMD hole).
>
> before
> dd if=/proc/$$/pagemap of=/dev/null bs=4k
> 512+0 records in
> 512+0 records out
> 2097152 bytes (2.1 MB) copied, 0.0225796 s, 92.9 MB/s
>
> after
> dd if=/proc/$$/pagemap of=/dev/null bs=4k
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 2.8029e-05 s, 0.0 kB/s

Have you guys lost your minds?

I regularly used the exact standard tool you just broke above to test
pagemap when I was developing it. There's no doubt that this patch (or
any similarly misguided ones) WILL regress existing naive or not
intended to be portable 32-bit users. And if I was some poor future
bastard trying to write a program against pagemap with your patch, odds
of getting a concussion banging my head against the wall trying to
figure out its highly non-file-like behavior would be quite high.

Why are we even still on this. There's no regression. LTP + pagemap +
64-bit has always behaved thus. Nor are large virtual files new. And
LTP's n00b assumption that it can just read all of any file in /proc has
always been wrong. LTP had the exact same problem on 64-bit /proc/kcore
(how long has that existed?) and finally fixed it in 2005.

--
http://selenic.com : development and support for Mercurial and Linux