LinuxLists.cc - [RFC BUG?] dereference PAGE

2007-05-02 03:26:18

Subject: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

Hi.

I don't know why, but when I dereference the PAGE_OFFSET address
(0xC0000000 on x86) from user space on rc7-mm2, I don't receive a
SIGSEGV signal and no core dump is produced.
BTW: on plain rc7 everything is ok.

test_code:
---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* strerror() */
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>

#define PAGE_OFFSET 0xC0000000

static void _error(const char *msg)
{
    int errcode = errno;

    fprintf(stderr, "ERROR:\n--> %s [%s]\n", msg, strerror(errcode));
    exit(EXIT_FAILURE);
}

int main(void)
{
    pid_t pid;
    struct rlimit rl;
    int status;

    printf("Trying to cause ELF core dump...\n");

    /* allow core files of up to 256 MB */
    rl.rlim_cur = rl.rlim_max = 0x10000000;

    if (setrlimit(RLIMIT_CORE, &rl) < 0)
        _error("setrlimit error!");
    if ((pid = fork()) < 0)
        _error("fork error!");
    else if (pid == 0) {
        /* trying to dereference kernel start address */
        *((long *) PAGE_OFFSET) = 0;
        _exit(EXIT_SUCCESS);
    }

    if (waitpid(pid, &status, 0) < 0)
        _error("waitpid error!");
    if (WCOREDUMP(status))
        printf("All is ok. We receive SIGSEGV and core dump has occured.\n");
    else
        /* here I just get SIGCHLD, which means the child process
           finished its work successfully... */
        printf("All is bad. We don't receive SIGSEGV and core dump hasn't occured. (WHY?!)\n");

    exit(EXIT_SUCCESS);
}
---

[asgard@midgard]$ uname -a
Linux midgard 2.6.21-rc7-mm2 #5 SMP Wed May 2 04:15:09 MSD 2007 i686 GNU/Linux

[asgard@midgard]$ ./a.out
Trying to cause ELF core dump...
All is bad. We don't receive SIGSEGV and core dump hasn't occured. (WHY?!)

---

[asgard@midgard]$ uname -a
Linux midgard 2.6.21-rc7 #5 SMP Wed May 2 02:11:50 MSD 2007 i686 GNU/Linux

[asgard@midgard]$ ./a.out
Trying to cause ELF core dump...
All is ok. We receive SIGSEGV and core dump has occured.

With best regards.
Dan Kruchinin.

2007-05-02 07:52:07

by Andrew Morton

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

On Wed, 2 May 2007 07:26:13 +0400 "Dan Kruchinin" wrote:

> Hi.
>
> I don't know why, but when I'm dereferencing PAGE_OFFSET(0xC0000000 on
> x86) address from user space on rc7-mm2 I don't receive SIGSEGV signal
> and there is no any core dump.
> btw: on poor rc-7 all is ok.
>
> test_code:
> ---
> #include <stdio.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <unistd.h>
> #include <sys/wait.h>
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <sys/types.h>
>
> #define PAGE_OFFSET 0xC0000000
>
> static void _error(const char *msg)
> {
> int errcode = errno;
>
> fprintf(stderr, "ERROR:\n--> %s [%s]\n", msg, strerror(errcode));
> exit(EXIT_FAILURE);
> }
>
> int main(void)
> {
> int pid;
> struct rlimit rl;
> int status;
>
> printf("Trying to cause ELF core dump...\n");
>
> rl.rlim_cur = rl.rlim_max = 0x10000000;
>
> if(setrlimit(RLIMIT_CORE, &rl) < 0)
> _error("setrlim error!");
> if((pid = fork()) < 0)
> _error("fork error!");
> else if(pid == 0) {
> *((long*) PAGE_OFFSET ) = 0; /* trying to dereference kernel start address */
> _exit(EXIT_SUCCESS);
> }
>
> if(waitpid(pid, &status, 0) < 0)
> _error("waitpid error!");
> if(WCOREDUMP(status))
> printf("All is ok. We receive SIGSEGV and core dump has occured.\n");
> else
> printf("All is bad. We don't receive SIGSEGV and core dump hasn't
> occured. (WHY?!)\n"); /* here I just get SIGCHLD, this means that
> child process have made it's work success... */
>
> exit(EXIT_SUCCESS);
> }
> ---
>
> [asgard@midgard]$ uname -a
> Linux midgard 2.6.21-rc7-mm2 #5 SMP Wed May 2 04:15:09 MSD 2007 i686 GNU/Linux
>
> [asgard@midgard]$ ./a.out
> Trying to cause ELF core dump...
> All is bad. We don't receive SIGSEGV and core dump hasn't occured. (WHY?!)
>
> ---
>
> [asgard@midgard]$ uname -a
> Linux midgard 2.6.21-rc7 #5 SMP Wed May 2 02:11:50 MSD 2007 i686 GNU/Linux
>
> [asgard@midgard]$ ./a.out
> Trying to cause ELF core dump...
> All is ok. We receive SIGSEGV and core dump has occured.
>

Thanks for the report. I can reproduce it.

Bisection shows that x86_64-mm-paravirt-initial-pagetable.patch caused
this.

I didn't check whether the patch actually permits us to read kernel
memory. Probably it does. Probably we'd prefer that it didn't ;)

2007-05-02 08:47:14

by Bill Irwin

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

On Wed, May 02, 2007 at 12:51:40AM -0700, Andrew Morton wrote:
> Thanks for the report. I can reproduce it.
> Bisection shows that x86_64-mm-paravirt-initial-pagetable.patch caused
> this.
> I didn't check whether the patch actually permits us to read kernel
> memory. Probably it does. Probably we'd prefer that it didn't ;)

Brown paper bag time. I don't know how it got past me.

-- wli

2007-05-02 10:06:21

by Bill Irwin

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

On Wed, May 02, 2007 at 12:51:40AM -0700, Andrew Morton wrote:
>> Thanks for the report. I can reproduce it.
>> Bisection shows that x86_64-mm-paravirt-initial-pagetable.patch caused
>> this.
>> I didn't check whether the patch actually permits us to read kernel
>> memory. Probably it does. Probably we'd prefer that it didn't ;)

On Wed, May 02, 2007 at 01:46:17AM -0700, Bill Irwin wrote:
> Brown paper bag time. I don't know how it got past me.

Brain dump before crashing for the night:

The patch refuses to clobber already-present pagetable entries of
whatever origin. There are pagetables prior to this setup covering the
address range just above PAGE_OFFSET. If this theory is correct, you
should only be able to go a few MB above PAGE_OFFSET before encountering
unreadable kernel memory. IIRC those pagetables are a statically
allocated array in assembly; altering that array to set supervisor bits
may resolve it, though it may also be freed as initmem.

-- wli

2007-05-02 16:29:42

by Jeremy Fitzhardinge

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

Bill Irwin wrote:
> Brain dump before crashing for the night:
>
> The patch refuses to clobber already-present pagetable entries of
> whatever origin. There are pagetables prior to this setup covering the
> address range just above PAGE_OFFSET. If this theory is correct, you
> should only be able to go a few MB above PAGE_OFFSET before encountering
> unreadable kernel memory. IIRC those pagetables are a statically
> allocated array in assembly; altering that array to set supervisor bits
> may resolve it, though it may also be freed as initmem.
>

I think this should be fixed now. Eric made all those writes
unconditional (to fix a problem with PSE superpages not being created).
The patch is in Andi's queue.

J

2007-05-02 16:34:16

by Bill Irwin

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

Bill Irwin wrote:
>> Brain dump before crashing for the night:
>> The patch refuses to clobber already-present pagetable entries of
>> whatever origin. There are pagetables prior to this setup covering the
>> address range just above PAGE_OFFSET. If this theory is correct, you
>> should only be able to go a few MB above PAGE_OFFSET before encountering
>> unreadable kernel memory. IIRC those pagetables are a statically
>> allocated array in assembly; altering that array to set supervisor bits
>> may resolve it, though it may also be freed as initmem.

On Wed, May 02, 2007 at 09:28:46AM -0700, Jeremy Fitzhardinge wrote:
> I think this should be fixed now. Eric made all those writes
> unconditional (to fix a problem with PSE superpages not being created).
> The patch is in Andi's queue.

It needs verification with the testcase from this thread.

-- wli

2007-05-02 16:51:29

by Andi Kleen

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

>
> It needs verification with the testcase from this thread.

Verified. Bug is fixed.

-Andi

2007-05-02 17:17:38

by Eric W. Biederman

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

Bill Irwin writes:

> Bill Irwin wrote:
>>> Brain dump before crashing for the night:
>>> The patch refuses to clobber already-present pagetable entries of
>>> whatever origin. There are pagetables prior to this setup covering the
>>> address range just above PAGE_OFFSET. If this theory is correct, you
>>> should only be able to go a few MB above PAGE_OFFSET before encountering
>>> unreadable kernel memory. IIRC those pagetables are a statically
>>> allocated array in assembly; altering that array to set supervisor bits
>>> may resolve it, though it may also be freed as initmem.
>
> On Wed, May 02, 2007 at 09:28:46AM -0700, Jeremy Fitzhardinge wrote:
>> I think this should be fixed now. Eric made all those writes
>> unconditional (to fix a problem with PSE superpages not being created).
>> The patch is in Andi's queue.
>
> It needs verification with the testcase from this thread.

Sounds reasonable.

However there is no reason to suspect it won't fix this case because
unconditional writes are what we have always done, and we have always
kept swapper_pg_dir from early boot as well.

In essence my patch I sent out to Andi was a partial revert.

It isn't slated to go in until the next round, but I also rewrote the
early page table setup in C. That allows set_fixmap to work in the
early kernel, and fixes the problem of not having enough memory mapped
to build the identity mappings, because we are then updating the page
table we already have, also in the PAE case.

Eric

2007-05-02 17:53:23

by Bill Irwin

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

On Wed, May 02, 2007 at 09:28:46AM -0700, Jeremy Fitzhardinge wrote:
>>> I think this should be fixed now. Eric made all those writes
>>> unconditional (to fix a problem with PSE superpages not being created).
>>> The patch is in Andi's queue.

Bill Irwin writes:
>> It needs verification with the testcase from this thread.

On Wed, May 02, 2007 at 11:16:27AM -0600, Eric W. Biederman wrote:
> Sounds reasonable.
> However there is no reason to suspect it won't fix this case because
> unconditional writes are what we have always done, and we have always
> kept swapper_pg_dir from early boot as well.
> In essence my patch I sent out to Andi was a partial revert.

It would not be so far out to be aware of what pagetable entries were
carried over from the initial swapper_pg_dir and explicitly clobber
them (for instance, modifying their protection bits while otherwise
retaining them).

On Wed, May 02, 2007 at 11:16:27AM -0600, Eric W. Biederman wrote:
> It isn't slated to go in until nextround but I also rewrote the early
> page table setup in C. Allowing set_fixmap to work in the early
> kernel, and fix problems of not having enough memory mapped to build
> the identity mappings, because we are then updating the page table
> we have also in the PAE case.

I'm not sure when we run into those problems, though I understand what
they are. I suppose it would be good to resolve them.

-- wli

2007-05-02 19:13:57

by Eric W. Biederman

Subject: Re: [RFC BUG?] dereference PAGE_OFFSET address (rc7-mm2)

Bill Irwin writes:

> On Wed, May 02, 2007 at 09:28:46AM -0700, Jeremy Fitzhardinge wrote:
>>>> I think this should be fixed now. Eric made all those writes
>>>> unconditional (to fix a problem with PSE superpages not being created).
>>>> The patch is in Andi's queue.
>
> Bill Irwin writes:
>>> It needs verification with the testcase from this thread.
>
> On Wed, May 02, 2007 at 11:16:27AM -0600, Eric W. Biederman wrote:
>> Sounds reasonable.
>> However there is no reason to suspect it won't fix this case because
>> unconditional writes are what we have always done, and we have always
>> kept swapper_pg_dir from early boot as well.
>> In essence my patch I sent out to Andi was a partial revert.
>
> It would not be so far out to be aware of what pagetable entries were
> carried over from the initial swapper_pg_dir and explicitly clobber
> them (for instance, modifying their protection bits while otherwise
> retaining them).

And that is actually what we do, but without paying attention, when
setting up the identity mappings.

> On Wed, May 02, 2007 at 11:16:27AM -0600, Eric W. Biederman wrote:
>> It isn't slated to go in until nextround but I also rewrote the early
>> page table setup in C. Allowing set_fixmap to work in the early
>> kernel, and fix problems of not having enough memory mapped to build
>> the identity mappings, because we are then updating the page table
>> we have also in the PAE case.
>
> I'm not sure when we run into those problems, though I understand what
> they are. I suppose it would be good to resolve them.

The nasty one is if you have a kernel sized just right. All you get
from the early page tables in the way of allocatable memory is the low
640K. If you have PSE disabled (because PAGEALLOC_DEBUG is enabled) you
only have enough pages in the mapping to set up a 256M identity mapping
using 4K pages. Ouch.

Another issue is that we have boot_ioremap, and bt_ioremap for different
stages of the boot. Enabling fixmap early solves that.

Then there is my personal hobby horse: using newfangled hardware with
memory mapped I/O registers for debugging. You need to be able to modify
the page table to use it, so you might as well set up the fixmap entries,
and then you have something that can be used as long as you care to
leave the hardware enabled.

Eric