2007-09-13 05:40:19

by Randy Dunlap

[permalink] [raw]
Subject: crashme fault

I run almost-daily kernel testing. I haven't seen 'crashme' cause a
kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,
x86_64. After the first fault, I ran 'crashme' about 10 more times
to get the second fault (usually for 10 minutes, one time for 30
minutes).

[This is gjc-crashme, not Dave's scrashme replacement:
http://people.delphiforums.com/gjc/crashme.html]

There is very little helpful info. RIP is strange, e.g.: 000000000051b446
The call stack is not printed. No kernel symbols are printed,
even though I have CONFIG_KALLSYMS{_ALL}=y.

Here are the 2 kernel message logs of the faults.
Any ideas/suggestions? I suppose I will try to log the syscall
parameters that crashme is using.

~~~~~~~~~~~~~~~~~~~~ 1. ~~~~~~~~~~~~~~~~~~~~~~~

[ 7487.208128] Unable to handle kernel paging request at 00000000ff019b53 RIP:
[ 7487.212752] [<0000000000510eea>]
[ 7487.218537] PGD 10c1a2067 PUD 0
[ 7487.221811] Oops: 0000 [1] SMP
[ 7487.224989] CPU 2
[ 7487.227024] Modules linked in: loop
[ 7487.230550] Pid: 19139, comm: crashme Not tainted 2.6.23-rc6-git2 #1
[ 7487.236896] RIP: 0033:[<0000000000510eea>] [<0000000000510eea>]
[ 7487.242925] RSP: 002b:00007fffc9a8ec10 EFLAGS: 00010e83
[ 7487.248234] RAX: 00000000ffff8c4a RBX: 00000000004014f1 RCX: 00002b20e11c8b37
[ 7487.255361] RDX: 0000000000510ee0 RSI: 0000000000000000 RDI: 000000000000000a
[ 7487.262489] RBP: 00007fffc9a8ec10 R08: 00007fffc9a8eb60 R09: 0000000000000000
[ 7487.269616] R10: 0000000000000008 R11: 0000000000000612 R12: 0000000000000000
[ 7487.276743] R13: 00007fffc9a8ee00 R14: 0000000000000000 R15: 0000000000000000
[ 7487.283871] FS: 00002b20e13676d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000
[ 7487.291952] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7487.297693] CR2: 00000000ff019b53 CR3: 000000005be6b000 CR4: 00000000000006e0
[ 7487.304821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7487.311949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7487.319076] Process crashme (pid: 19139, threadinfo ffff810106830000, task ffff810102cf5040)
[ 7487.327511]
[ 7487.329009] RIP [<0000000000510eea>]
[ 7487.332690] RSP <00007fffc9a8ec10>
[ 7487.336180] CR2: 00000000ff019b53
[ 7487.339810] Kernel panic - not syncing: Fatal exception

~~~~~~~~~~~~~~~~~~~ 2. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[16625.590976] Unable to handle kernel paging request at 00000000ff019f05 RIP:
[16625.595604] [<000000000051b446>]
[16625.601397] PGD d9af2067 PUD 0
[16625.604577] Oops: 0000 [1] SMP
[16625.607754] CPU 1
[16625.609789] Modules linked in: loop
[16625.613315] Pid: 23057, comm: crashme Not tainted 2.6.23-rc6-git2 #1
[16625.619662] RIP: 0033:[<000000000051b446>] [<000000000051b446>]
[16625.625689] RSP: 002b:00007fff7b5851f0 EFLAGS: 00010603
[16625.630998] RAX: 0000000000000000 RBX: 00002b672f63ec00 RCX: 00002b672f6d1b37
[16625.638127] RDX: 0000000000401476 RSI: 0000000000000000 RDI: 000000000000000a
[16625.645254] RBP: 00007fff7b5851f0 R08: 00007fff7b585140 R09: 0000000000000000
[16625.652381] R10: 0000000000000008 R11: 0000000000000612 R12: 0000000000000000
[16625.659508] R13: 00007fff7b5853e0 R14: 0000000000000000 R15: 0000000000000000
[16625.666637] FS: 00002b672f8706d0(0000) GS:ffff81011fc75e40(0000) knlGS:0000000000000000
[16625.674717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16625.680458] CR2: 00000000ff019f05 CR3: 00000000d942b000 CR4: 00000000000006e0
[16625.687587] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16625.694714] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[16625.701841] Process crashme (pid: 23057, threadinfo ffff8100d9bfe000, task ffff81011e83a040)
[16625.710267]
[16625.711765] RIP [<000000000051b446>]
[16625.715446] RSP <00007fff7b5851f0>
[16625.718937] CR2: 00000000ff019f05
[16625.722553] Kernel panic - not syncing: Fatal exception


---
~Randy


2007-09-15 04:28:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Wed, 12 Sep 2007, Randy Dunlap wrote:
>
> I run almost-daily kernel testing. I haven't seen 'crashme' cause a
> kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,
> x86_64. After the first fault, I ran 'crashme' about 10 more times
> to get the second fault (usually for 10 minutes, one time for 30
> minutes).

Interesting. If this is reproducible for you, can you try to narrow down
(with bit-bisect) roughly when it started.

> There is very little helpful info. RIP is strange, e.g.: 000000000051b446
> The call stack is not printed. No kernel symbols are printed,
> even though I have CONFIG_KALLSYMS{_ALL}=y.

It looks like it's a page fault as a result of a *user*space* access, and
most likely your machine would continue happily, except you have
"panic_on_oops" set, so when the oops happens, it shuts the system down.

Now, the reason I say it looks like a user space access is that you have

RIP: 0033:[<0000000000510eea>]
RSP: 002b:00007fffc9a8ec10

which are all user space segments. So the register contents clearly say
"page fault in user space".

However, what makes the kernel oops (rather than just send a SIGSEGV) is
that the page fault "error code" is zero (that's the number that is
printed out just after the "Oops" string). For a normal user space access,
you should have bit #2 set in the error code.

So the kernel thinks it's a kernel page fault, because the page fault
error code says so. But everything else seems to indicate that it's really
user mode.. It would be very interesting to hear when this started
happening.

Even if you cannot bisect it down all the way, since you say that you do
almost daily kernel testing, is this really new to 2.6.23-rc6-git2, and
2.6.23-rc6-git1 was fine?

Andi, anything comes to mind?

Linus

---
> [ 7487.208128] Unable to handle kernel paging request at 00000000ff019b53 RIP:
> [ 7487.212752] [<0000000000510eea>]
> [ 7487.218537] PGD 10c1a2067 PUD 0
> [ 7487.221811] Oops: 0000 [1] SMP
> [ 7487.224989] CPU 2
> [ 7487.227024] Modules linked in: loop
> [ 7487.230550] Pid: 19139, comm: crashme Not tainted 2.6.23-rc6-git2 #1
> [ 7487.236896] RIP: 0033:[<0000000000510eea>] [<0000000000510eea>]
> [ 7487.242925] RSP: 002b:00007fffc9a8ec10 EFLAGS: 00010e83
> [ 7487.248234] RAX: 00000000ffff8c4a RBX: 00000000004014f1 RCX: 00002b20e11c8b37
> [ 7487.255361] RDX: 0000000000510ee0 RSI: 0000000000000000 RDI: 000000000000000a
> [ 7487.262489] RBP: 00007fffc9a8ec10 R08: 00007fffc9a8eb60 R09: 0000000000000000
> [ 7487.269616] R10: 0000000000000008 R11: 0000000000000612 R12: 0000000000000000
> [ 7487.276743] R13: 00007fffc9a8ee00 R14: 0000000000000000 R15: 0000000000000000
> [ 7487.283871] FS: 00002b20e13676d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000
> [ 7487.291952] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 7487.297693] CR2: 00000000ff019b53 CR3: 000000005be6b000 CR4: 00000000000006e0
> [ 7487.304821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7487.311949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 7487.319076] Process crashme (pid: 19139, threadinfo ffff810106830000, task ffff810102cf5040)
> [ 7487.327511]
> [ 7487.329009] RIP [<0000000000510eea>]
> [ 7487.332690] RSP <00007fffc9a8ec10>
> [ 7487.336180] CR2: 00000000ff019b53
> [ 7487.339810] Kernel panic - not syncing: Fatal exception

2007-09-15 05:06:17

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote:

> On Wed, 12 Sep 2007, Randy Dunlap wrote:
> >
> > I run almost-daily kernel testing. I haven't seen 'crashme' cause a
> > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,
> > x86_64. After the first fault, I ran 'crashme' about 10 more times
> > to get the second fault (usually for 10 minutes, one time for 30
> > minutes).
>
> Interesting. If this is reproducible for you, can you try to narrow down
> (with bit-bisect) roughly when it started.

It doesn't happen quite on demand. Only 2 times in several hours
of testing of the last few days.

> > There is very little helpful info. RIP is strange, e.g.: 000000000051b446
> > The call stack is not printed. No kernel symbols are printed,
> > even though I have CONFIG_KALLSYMS{_ALL}=y.
>
> It looks like it's a page fault as a result of a *user*space* access, and
> most likely your machine would continue happily, except you have
> "panic_on_oops" set, so when the oops happens, it shuts the system down.

I see.

> Now, the reason I say it looks like a user space access is that you have
>
> RIP: 0033:[<0000000000510eea>]
> RSP: 002b:00007fffc9a8ec10
>
> which are all user space segments. So the register contents clearly say
> "page fault in user space".
>
> However, what makes the kernel oops (rather than just send a SIGSEGV) is
> that the page fault "error code" is zero (that's the number that is
> printed out just after the "Oops" string). For a normal user space access,
> you should have bit #2 set in the error code.
>
> So the kernel thinks it's a kernel page fault, because the page fault
> error code says so. But everything else seems to indicate that it's really
> user mode.. It would be very interesting to hear when this started
> happening.
>
> Even if you cannot bisect it down all the way, since you say that you do
> almost daily kernel testing, is this really new to 2.6.23-rc6-git2, and
> 2.6.23-rc6-git1 was fine?

I'll check back thru my test logs.

> Andi, anything comes to mind?
>
> Linus
>
> ---
> > [ 7487.208128] Unable to handle kernel paging request at 00000000ff019b53 RIP:
> > [ 7487.212752] [<0000000000510eea>]
> > [ 7487.218537] PGD 10c1a2067 PUD 0
> > [ 7487.221811] Oops: 0000 [1] SMP
> > [ 7487.224989] CPU 2
> > [ 7487.227024] Modules linked in: loop
> > [ 7487.230550] Pid: 19139, comm: crashme Not tainted 2.6.23-rc6-git2 #1
> > [ 7487.236896] RIP: 0033:[<0000000000510eea>] [<0000000000510eea>]
> > [ 7487.242925] RSP: 002b:00007fffc9a8ec10 EFLAGS: 00010e83
> > [ 7487.248234] RAX: 00000000ffff8c4a RBX: 00000000004014f1 RCX: 00002b20e11c8b37
> > [ 7487.255361] RDX: 0000000000510ee0 RSI: 0000000000000000 RDI: 000000000000000a
> > [ 7487.262489] RBP: 00007fffc9a8ec10 R08: 00007fffc9a8eb60 R09: 0000000000000000
> > [ 7487.269616] R10: 0000000000000008 R11: 0000000000000612 R12: 0000000000000000
> > [ 7487.276743] R13: 00007fffc9a8ee00 R14: 0000000000000000 R15: 0000000000000000
> > [ 7487.283871] FS: 00002b20e13676d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000
> > [ 7487.291952] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 7487.297693] CR2: 00000000ff019b53 CR3: 000000005be6b000 CR4: 00000000000006e0
> > [ 7487.304821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 7487.311949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 7487.319076] Process crashme (pid: 19139, threadinfo ffff810106830000, task ffff810102cf5040)
> > [ 7487.327511]
> > [ 7487.329009] RIP [<0000000000510eea>]
> > [ 7487.332690] RSP <00007fffc9a8ec10>
> > [ 7487.336180] CR2: 00000000ff019b53
> > [ 7487.339810] Kernel panic - not syncing: Fatal exception
> -

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-09-15 05:22:27

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

On Fri, 14 Sep 2007 22:05:17 -0700 Randy Dunlap wrote:

> On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote:
>
> > On Wed, 12 Sep 2007, Randy Dunlap wrote:
> > >
> > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a
> > > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,
> > > x86_64. After the first fault, I ran 'crashme' about 10 more times
> > > to get the second fault (usually for 10 minutes, one time for 30
> > > minutes).
> >
> > Interesting. If this is reproducible for you, can you try to narrow down
> > (with bit-bisect) roughly when it started.
>
> It doesn't happen quite on demand. Only 2 times in several hours
> of testing of the last few days.
>
> > > There is very little helpful info. RIP is strange, e.g.: 000000000051b446
> > > The call stack is not printed. No kernel symbols are printed,
> > > even though I have CONFIG_KALLSYMS{_ALL}=y.
> >
> > It looks like it's a page fault as a result of a *user*space* access, and
> > most likely your machine would continue happily, except you have
> > "panic_on_oops" set, so when the oops happens, it shuts the system down.
>
> I see.
>
> > Now, the reason I say it looks like a user space access is that you have
> >
> > RIP: 0033:[<0000000000510eea>]
> > RSP: 002b:00007fffc9a8ec10
> >
> > which are all user space segments. So the register contents clearly say
> > "page fault in user space".
> >
> > However, what makes the kernel oops (rather than just send a SIGSEGV) is
> > that the page fault "error code" is zero (that's the number that is
> > printed out just after the "Oops" string). For a normal user space access,
> > you should have bit #2 set in the error code.
> >
> > So the kernel thinks it's a kernel page fault, because the page fault
> > error code says so. But everything else seems to indicate that it's really
> > user mode.. It would be very interesting to hear when this started
> > happening.
> >
> > Even if you cannot bisect it down all the way, since you say that you do
> > almost daily kernel testing, is this really new to 2.6.23-rc6-git2, and
> > 2.6.23-rc6-git1 was fine?
>
> I'll check back thru my test logs.

I looked thru several hundred test logs and didn't find any other
occurrences of it. Doesn't prove much.
I'll continue to run the test most of this weekend.


> > Andi, anything comes to mind?
> >
> > Linus
> >
> > ---
> > > [ 7487.208128] Unable to handle kernel paging request at 00000000ff019b53 RIP:
> > > [ 7487.212752] [<0000000000510eea>]
> > > [ 7487.218537] PGD 10c1a2067 PUD 0
> > > [ 7487.221811] Oops: 0000 [1] SMP
> > > [ 7487.224989] CPU 2
> > > [ 7487.227024] Modules linked in: loop
> > > [ 7487.230550] Pid: 19139, comm: crashme Not tainted 2.6.23-rc6-git2 #1
> > > [ 7487.236896] RIP: 0033:[<0000000000510eea>] [<0000000000510eea>]
> > > [ 7487.242925] RSP: 002b:00007fffc9a8ec10 EFLAGS: 00010e83
> > > [ 7487.248234] RAX: 00000000ffff8c4a RBX: 00000000004014f1 RCX: 00002b20e11c8b37
> > > [ 7487.255361] RDX: 0000000000510ee0 RSI: 0000000000000000 RDI: 000000000000000a
> > > [ 7487.262489] RBP: 00007fffc9a8ec10 R08: 00007fffc9a8eb60 R09: 0000000000000000
> > > [ 7487.269616] R10: 0000000000000008 R11: 0000000000000612 R12: 0000000000000000
> > > [ 7487.276743] R13: 00007fffc9a8ee00 R14: 0000000000000000 R15: 0000000000000000
> > > [ 7487.283871] FS: 00002b20e13676d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000
> > > [ 7487.291952] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [ 7487.297693] CR2: 00000000ff019b53 CR3: 000000005be6b000 CR4: 00000000000006e0
> > > [ 7487.304821] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 7487.311949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > [ 7487.319076] Process crashme (pid: 19139, threadinfo ffff810106830000, task ffff810102cf5040)
> > > [ 7487.327511]
> > > [ 7487.329009] RIP [<0000000000510eea>]
> > > [ 7487.332690] RSP <00007fffc9a8ec10>
> > > [ 7487.336180] CR2: 00000000ff019b53
> > > [ 7487.339810] Kernel panic - not syncing: Fatal exception
> > -
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-09-15 18:34:21

by Andi Kleen

[permalink] [raw]
Subject: Re: crashme fault

> Andi, anything comes to mind?

No, unfortunately not. There weren't any changes to entry.S recently
that could corrupt the error code as far as I remember. Also cannot think of
something else.

A version where it started happening would be useful.

-Andi

2007-09-15 18:41:01

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

Andi Kleen wrote:
>> Andi, anything comes to mind?
>
> No, unfortunately not. There weren't any changes to entry.S recently
> that could corrupt the error code as far as I remember. Also cannot think of
> something else.
>
> A version where it started happening would be useful.

I'll begin testing older kernels then.


Had another on recent last night (probably not helpful):

[58169.607164] Unable to handle kernel paging request at 00000000ff018915 RIP:
[58169.611784] [<0000000000506986>]
[58169.617570] PGD d4380067 PUD 0
[58169.620757] Oops: 0000 [1] SMP
[58169.623943] CPU 2
[58169.625970] Modules linked in: loop
[58169.629496] Pid: 26735, comm: crashme Not tainted 2.6.23-rc6-git4 #1
[58169.635842] RIP: 0033:[<0000000000506986>] [<0000000000506986>]
[58169.641861] RSP: 002b:00007fff14d3f9a0 EFLAGS: 00010247
[58169.647171] RAX: 0000000000506909 RBX: 00002b8695e82c00 RCX: 00002b8695f15b37
[58169.654298] RDX: 0000000000000000 RSI: 0000000000401476 RDI: 000000000000000a
[58169.661426] RBP: 00007fff14d3f9a0 R08: 00007fff14d3f8f0 R09: 0000000000000000
[58169.668553] R10: 0000000000000008 R11: 0000000000000212 R12: 0000000000000000
[58169.675680] R13: 00007fff14d3fb90 R14: 0000000000000000 R15: 0000000000000000
[58169.682808] FS: 00002b86960b46d0(0000) GS:ffff81011fc75840(0000) knlGS:0000000000000000
[58169.690889] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58169.696630] CR2: 00000000ff018915 CR3: 00000000d8820000 CR4: 00000000000006e0
[58169.703757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58169.710885] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[58169.718014] Process crashme (pid: 26735, threadinfo ffff8100cd176000, task ffff81011fffe040)
[58169.726439]
[58169.727937] RIP [<0000000000506986>]
[58169.731609] RSP <00007fff14d3f9a0>
[58169.735100] CR2: 00000000ff018915
[58169.738710] Kernel panic - not syncing: Fatal exception
[58169.743953] Rebooting in 30 seconds..

2007-09-15 19:45:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sat, 15 Sep 2007, Randy Dunlap wrote:
>
> Had another on recent last night (probably not helpful):

At least the original "crashme" would write its random number seeds to a
logfile each time (and I made it fsync it in some versions), which meant
that once a crash happened, you could re-produce it immediately (if it was
reproducible at all, of course).

Does your crashme have something like that?

All your crashes look basically identical - I don't think there is
anything new in this one, they're all the same issue. What CPU do you have
- vendor, stepping, version etc - and has something else than the kernel
changed in your setup lately?

As mentioned, the crash does look like a user-level crash got reported as
a kernel page fault, and while a CPU bug sounds incredibly unlikely, this
does have the smell of something strange like a fault in the middle of an
"iretq" or "sysretq", where part of the CPU state has already been
restored - which would explain why rip/cs is user space - but some part of
the CPU is still in kernel mode - which would explain the incorrect page
fault error code.

Here's a really *stupid* patch (and untested too, btw) to see if it gets
easier to debug when you don't oops, just print the register state
instead.

(It might be interesting to also do something like

force_sig_specific(SIGSTOP, current);

to then be able to more easily attach to the process that had problems,
and debug it in user space to see what was going on..)

Linus
---
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 327c9f2..1b81392 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -320,6 +320,11 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,

info.si_code = SEGV_MAPERR;

+ if (!(error_code & PF_USER) && user_mode(regs)) {
+ printk("kernel mode page fault from user space? Huh?\n");
+ __show_regs(regs);
+ error_code |= PF_USER;
+ }

/*
* We fault-in kernel-space virtual memory on-demand. The

2007-09-15 19:55:50

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

Linus Torvalds wrote:
>
> On Sat, 15 Sep 2007, Randy Dunlap wrote:
>> Had another on recent last night (probably not helpful):
>
> At least the original "crashme" would write its random number seeds to a
> logfile each time (and I made it fsync it in some versions), which meant
> that once a crash happened, you could re-produce it immediately (if it was
> reproducible at all, of course).
>
> Does your crashme have something like that?

I tell it the "random" seed to use. I can also sets its debug level,
but when I did that yesterday, it never faulted, so I lowered it again,
them boom. Could be coincidence.


> All your crashes look basically identical - I don't think there is
> anything new in this one, they're all the same issue. What CPU do you have
> - vendor, stepping, version etc - and has something else than the kernel
> changed in your setup lately?

Just kernel changes. CPU is dual Pentium Xeon + HT:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Xeon(TM) CPU 3.40GHz
stepping : 4
cpu MHz : 3400.227
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 c
lflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est t
m2 cid xtpr
bogomips : 6805.96
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Xeon(TM) CPU 3.40GHz
stepping : 4
cpu MHz : 3400.227
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 c
lflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est t
m2 cid xtpr
bogomips : 6800.28
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Xeon(TM) CPU 3.40GHz
stepping : 4
cpu MHz : 3400.227
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 c
lflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est t
m2 cid xtpr
bogomips : 6800.72
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Xeon(TM) CPU 3.40GHz
stepping : 4
cpu MHz : 3400.227
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 c
lflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est t
m2 cid xtpr
bogomips : 6800.57
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:



> As mentioned, the crash does look like a user-level crash got reported as
> a kernel page fault, and while a CPU bug sounds incredibly unlikely, this
> does have the smell of something strange like a fault in the middle of an
> "iretq" or "sysretq", where part of the CPU state has already been
> restored - which would explain why rip/cs is user space - but some part of
> the CPU is still in kernel mode - which would explain the incorrect page
> fault error code.
>
> Here's a really *stupid* patch (and untested too, btw) to see if it gets
> easier to debug when you don't oops, just print the register state
> instead.

Will add this patch.

> (It might be interesting to also do something like
>
> force_sig_specific(SIGSTOP, current);
>
> to then be able to more easily attach to the process that had problems,
> and debug it in user space to see what was going on..)
>
> Linus
> ---
> diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
> index 327c9f2..1b81392 100644
> --- a/arch/x86_64/mm/fault.c
> +++ b/arch/x86_64/mm/fault.c
> @@ -320,6 +320,11 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
>
> info.si_code = SEGV_MAPERR;
>
> + if (!(error_code & PF_USER) && user_mode(regs)) {
> + printk("kernel mode page fault from user space? Huh?\n");
> + __show_regs(regs);
> + error_code |= PF_USER;
> + }
>
> /*
> * We fault-in kernel-space virtual memory on-demand. The

2007-09-15 22:16:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sat, 15 Sep 2007, Linus Torvalds wrote:
>
> Here's a really *stupid* patch (and untested too, btw) to see if it gets
> easier to debug when you don't oops, just print the register state
> instead.

Side note - while thinking about this, I'm wondering whether maybe that
"stupid" patch might not actually be the right thing to do.

The fact is, the "user access" bit in the error code is in many ways a
totally *different* issue than the "user_mode(regs)" test. We may end up
having a system page fault for some action that was initiated by user
space: eg taking a page fault on a segment load. And it may well be marked
as a "system access" in the error code, even though the instruction that
triggered it was all user mode, and even though we could just terminate
the program rather than oops.

That said, since the kernel isn't paging itself out, user mode generally
should not be able to generate those kinds of system access page faults.
Our segment tables should all be in core, and so I would still in
*practice* always expect the register state and the error code to match in
the "usermode'ness".

So regardless of whether we want to trust "user_mode(regs)" more than
"error_code & PF_USER", it would definitely be very interesting if you can
give a good "this is where it started happening".

Linus

2007-09-15 22:47:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sat, 15 Sep 2007, Linus Torvalds wrote:
>
> So regardless of whether we want to trust "user_mode(regs)" more than
> "error_code & PF_USER", it would definitely be very interesting if you can
> give a good "this is where it started happening".

Also, can you point to good crashme sources, and give the arguments you
used to run it when it crashed?

The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by
default (nor does it even compile on a modern unix), so it's not going to
do anything. I hacked it up, and it appears to work ok for me, but I'm not
at all confident that I'm even close to recreating what you are doing.

(It probably goes without saying that I've not reproduced the oops on my
Core 2 Duo. Lots of #GP and #PF errors, but nothing interesting.

Linus

2007-09-15 23:50:22

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

Linus Torvalds wrote:
>
> On Sat, 15 Sep 2007, Linus Torvalds wrote:
>> So regardless of whether we want to trust "user_mode(regs)" more than
>> "error_code & PF_USER", it would definitely be very interesting if you can
>> give a good "this is where it started happening".
>
> Also, can you point to good crashme sources, and give the arguments you
> used to run it when it crashed?

I use http://people.delphiforums.com/gjc/crashme.tgz from
http://people.delphiforums.com/gjc/crashme.html, plus one small patch:

--- gjc-crashme.orig/crashme.c
+++ gjc-crashme/crashme.c
@@ -260,7 +260,8 @@ void my_signal(sig, func)
#else
struct sigaction act;
act.sa_handler = func;
- act.sa_mask = 0;
+ //act.sa_mask = 0;
+ sigemptyset(&act.sa_mask);
#ifdef linux
act.sa_restorer = 0;
#endif /* linux */


Command: ./crashme +2000 666 1000 1:00:00 1

> The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by
> default (nor does it even compile on a modern unix), so it's not going to
> do anything. I hacked it up, and it appears to work ok for me, but I'm not
> at all confident that I'm even close to recreating what you are doing.

So I'm missing some mprotect() call also?
Oops.

> (It probably goes without saying that I've not reproduced the oops on my
> Core 2 Duo. Lots of #GP and #PF errors, but nothing interesting.
>
> Linus

2007-09-16 00:35:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sat, 15 Sep 2007, Randy Dunlap wrote:
> Command: ./crashme +2000 666 1000 1:00:00 1

Ok, that's close to what I was testing (one of the examples from the
crashme docs).

> > The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by default
> > (nor does it even compile on a modern unix), so it's not going to do
> > anything. I hacked it up, and it appears to work ok for me, but I'm not at
> > all confident that I'm even close to recreating what you are doing.
>
> So I'm missing some mprotect() call also?
> Oops.

Yeah, by default, it won't do a mprotect(), so as far as I can tell,
you're always going to hit NX on a writable data page. I'm running with
the appended patch.

That said, I didn't actually *check* that the mprotect makes any
difference, and maybe you have one of the early EM64T chips that didn't
have NX support at all. Or maybe it's dodgy, and is actually the whole
*reason* for the page fault problem..

I don't have any P4's capable of 64-bit operation (nor do I really want
any ;)

Linus

---
diff --git a/crashme.c b/crashme.c
index 5e7e663..19a2fbb 100644
--- a/crashme.c
+++ b/crashme.c
@@ -1,3 +1,5 @@
+#define pyr
+#define PAGSIZ 4096
/* crashme: Create a string of random bytes and then jump to it.
crashme [+]<nbytes>[.inc] <srand> <ntrys> [nsub] [verboseness] */

@@ -143,7 +145,6 @@ a script.
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/param.h>
-#include <sys/vmmac.h>
#define strchr index
#endif

@@ -215,7 +216,7 @@ unsigned char *bad_malloc(n)
{unsigned char *data;
data = (unsigned char *) malloc(n);
#ifdef pyr
- if (mprotect(((int)data/PAGSIZ)*PAGSIZ, (n/PAGSIZ+1)*PAGSIZ,
+ if (mprotect(((long)data/PAGSIZ)*PAGSIZ, (n/PAGSIZ+1)*PAGSIZ,
PROT_READ|PROT_WRITE|PROT_EXEC))
perror("mprotect");
#endif
@@ -259,11 +260,8 @@ void my_signal(sig, func)
signal(sig, func);
#else
struct sigaction act;
+ memset(&act, 0, sizeof(act));
act.sa_handler = func;
- act.sa_mask = 0;
-#ifdef linux
- act.sa_restorer = 0;
-#endif /* linux */
act.sa_flags = SA_NOMASK;
#ifdef SA_RESTART
act.sa_flags |= SA_RESTART;

2007-09-16 03:10:26

by Andi Kleen

[permalink] [raw]
Subject: Re: crashme fault

On Sat, Sep 15, 2007 at 03:47:19PM -0700, Linus Torvalds wrote:
>
>
> On Sat, 15 Sep 2007, Linus Torvalds wrote:
> >
> > So regardless of whether we want to trust "user_mode(regs)" more than
> > "error_code & PF_USER", it would definitely be very interesting if you can
> > give a good "this is where it started happening".
>
> Also, can you point to good crashme sources, and give the arguments you

LTP (http://ltp.sourceforge.net) should have a modern crashme.

-Andi

2007-09-16 15:53:32

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: crashme fault

On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote:
> I run almost-daily kernel testing. I haven't seen 'crashme' cause a
> kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,

Did the room temperature change in the server room? ;) Those early
EM64T P4 core based are fast but they suck a whole lot of power. It
seems a bitflip in a cpu register. You can use cpufreq to set low
frequency (they should support it, and they shouldn't use more than
100w per core that way) to lower the temp, and see if the problem goes
away. If it's a software issue it will hopefully be still reproducible
with ~2ghz instead of 3.4ghz.

2007-09-16 16:18:17

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

On Sun, 16 Sep 2007 17:53:21 +0200 Andrea Arcangeli wrote:

> On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote:
> > I run almost-daily kernel testing. I haven't seen 'crashme' cause a
> > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2,
>
> Did the room temperature change in the server room? ;) Those early

I rarely go there, but when I do, it's usually very cool.
But I have no way of tracking the room temp there.

> EM64T P4 core based are fast but they suck a whole lot of power. It
> seems a bitflip in a cpu register. You can use cpufreq to set low
> frequency (they should support it, and they shouldn't use more than
> 100w per core that way) to lower the temp, and see if the problem goes
> away. If it's a software issue it will hopefully be still reproducible
> with ~2ghz instead of 3.4ghz.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-09-16 16:40:38

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

On Sat, 15 Sep 2007 17:34:54 -0700 (PDT) Linus Torvalds wrote:

>
>
> On Sat, 15 Sep 2007, Randy Dunlap wrote:
> > Command: ./crashme +2000 666 1000 1:00:00 1
>
> Ok, that's close to what I was testing (one of the examples from the
> crashme docs).
>
> > > The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by default
> > > (nor does it even compile on a modern unix), so it's not going to do
> > > anything. I hacked it up, and it appears to work ok for me, but I'm not at
> > > all confident that I'm even close to recreating what you are doing.
> >
> > So I'm missing some mprotect() call also?
> > Oops.
>
> Yeah, by default, it won't do a mprotect(), so as far as I can tell,
> you're always going to hit NX on a writable data page. I'm running with
> the appended patch.

I'll apply this patch today, but I haven't done so yet (for the 2
bug reports below). Then I'll back up more, prior to 2.6.22 (which
got the Huh? message and kept going).

> That said, I didn't actually *check* that the mprotect makes any
> difference, and maybe you have one of the early EM64T chips that didn't
> have NX support at all. Or maybe it's dodgy, and is actually the whole
> *reason* for the page fault problem..

It's a bit old. I wouldn't be surprised if it doesn't support NX.

> I don't have any P4's capable of 64-bit operation (nor do I really want
> any ;)

ack that.


~~~~~~~~~~~~~~~~~~~~~~~ 2.6.23-rc3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[21769.161490] Unable to handle kernel paging request at 00000000ff016abe RIP:
[21769.166119] [<000000000050522f>]
[21769.171912] PGD d5572067 PUD 0
[21769.175100] Oops: 0000 [1] SMP
[21769.178278] CPU 3
[21769.180312] Modules linked in: loop
[21769.183830] Pid: 8823, comm: crashme Not tainted 2.6.23-rc3 #1
[21769.189657] RIP: 0033:[<000000000050522f>] [<000000000050522f>]
[21769.195684] RSP: 002b:00007fffe9528118 EFLAGS: 00010202
[21769.200993] RAX: 00000000c169bc00 RBX: 0000000000000000 RCX: 00002b88c172eb37
[21769.208121] RDX: 00000000005051e0 RSI: 0000000000000000 RDI: 000000000000000a
[21769.215249] RBP: 00007fffe9528120 R08: 00007fffe9528070 R09: 0000000000000000
[21769.222377] R10: 0000000000000008 R11: 0000000000000216 R12: 0000000000000000
[21769.229505] R13: 00007fffe9528310 R14: 0000000000000000 R15: 0000000000000000
[21769.236632] FS: 00002b88c18cd6d0(0000) GS:ffff81011fc751c0(0000) knlGS:0000000000000000
[21769.244711] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[21769.250453] CR2: 00000000ff016abe CR3: 00000000d792f000 CR4: 00000000000006e0
[21769.257581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21769.264709] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[21769.271837] Process crashme (pid: 8823, threadinfo ffff8100d5656000, task ffff81011ffff810)
[21769.280175]
[21769.281673] RIP [<000000000050522f>]
[21769.285355] RSP <00007fffe9528118>
[21769.288845] CR2: 00000000ff016abe
[21769.292455] Kernel panic - not syncing: Fatal exception
[21769.297699] Rebooting in 30 seconds..

~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.6.22 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 17:15:03 caor-test8 kernel: [18447.172949] kernel mode page fault from user space? Huh?
Sep 15 17:15:03 caor-test8 kernel: [18447.178268]
Sep 15 17:15:03 caor-test8 kernel: [18447.179773] Modules linked in: loop
Sep 15 17:15:03 caor-test8 kernel: [18447.183306] Pid: 30819, comm: crashme Not tainted 2.6.22 #2
Sep 15 17:15:03 caor-test8 kernel: [18447.183311] RIP: 0033:[<0000000000507163>] [<0000000000507163>]
Sep 15 17:15:03 caor-test8 kernel: [18447.183318] RSP: 002b:00007fff10da99b0 EFLAGS: 00010202
Sep 15 17:15:03 caor-test8 kernel: [18447.183320] RAX: 0000000000000000 RBX: 00002b4b99e18c00 RCX: 00002b4b99eabb37
Sep 15 17:15:03 caor-test8 kernel: [18447.183323] RDX: 0000000000507160 RSI: 0000000000401476 RDI: 000000000000000a
Sep 15 17:15:03 caor-test8 kernel: [18447.183326] RBP: 00007fff10da99b0 R08: 00007fff10da9900 R09: 0000000000000000
Sep 15 17:15:03 caor-test8 kernel: [18447.183328] R10: 0000000000000008 R11: 0000000000000212 R12: 0000000000000000
Sep 15 17:15:03 caor-test8 kernel: [18447.183330] R13: 00007fff10da9ba0 R14: 0000000000000000 R15: 0000000000000000
Sep 15 17:15:03 caor-test8 kernel: [18447.183333] FS: 00002b4b9a04a6d0(0000) GS:ffffffff80721000(0000) knlGS:0000000000000000
Sep 15 17:15:03 caor-test8 kernel: [18447.183336] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 15 17:15:03 caor-test8 kernel: [18447.183339] CR2: 00000000ff019fd5 CR3: 00000000cacb0000 CR4: 00000000000006e0

2007-09-16 17:15:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sun, 16 Sep 2007, Randy Dunlap wrote:
>
> I'll apply this patch today, but I haven't done so yet (for the 2
> bug reports below).

Actually, it's probably better that you don't change your situation
unnecessarily, in case the bug goes away.

Since you are triggering the problem even *without* the patch, whatever
crashme is doing now for you is the interesting case.

I suspect NX just isn't active for you, so the mprotect doesn't matter,
but if it *does* matter, that's interesting in itself.

> Then I'll back up more, prior to 2.6.22 (which got the Huh? message and
> kept going).

Ahh, ok, so this is not a new thing, and it happened in 2.6.22 too.

I'm really starting to suspect some early EM64T bug, and I also suspect
that it's harmless but that we should just do the trivial patch to say "if
the register state is in user mode, we don't care if the CPU says it was a
kernel access".

Linus

2007-09-16 18:13:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sun, 16 Sep 2007, Linus Torvalds wrote:
>
> I'm really starting to suspect some early EM64T bug, and I also suspect
> that it's harmless but that we should just do the trivial patch to say "if
> the register state is in user mode, we don't care if the CPU says it was a
> kernel access".

Namely something like this..

The basic idea is that it's pointless to test for "error_code & PF_USER"
to decide whether we should oops or kill the process: the *real* issue is
whether we *can* kill the process or not. And that depends not on whether
the CPU claimed it was a user access or not, but on whether the register
state we'd return to is user-mode or not!

So anything that decides whether it should send a signal or do to the
"no_context" Ooops path should use "user_mode_vm(regs)" (yeah, I realize
that the "_vm" part is unnecessary on x86-64, but it doesn't hurt either,
and all of the issues are the same on 32/64-bit) which tests the right
thing.

Now, normally the USER bit in the error code should be the exact same
thing, except for

- Some CPU bug (microcode issue, whatever) where some complex fault
situation sets the wrong error code.

- user space accesses that caused a system page fault (ie a page fault
while handling another trap - possibly due to lazy page table setup
and having the LDT or some other CPU data structure in vmalloc space)

Now, the vmalloc space accesses should be handled separately anyway, so I
really wonder if it's some subtle CPU bug (I can't reproduce any problems
on my Core 2 Duo), but the point is that I think this patch really is
conceptually a real fix regardless, even if it _shouldn't_ matter.

Comments?

Randy, this replaces the hacky patch I sent, but also shuts up about the
odd thing you're hitting, so for testing your case further this may not be
the right thing. However, it would be nice to hear whether this just makes
"crashme" work properly for you without any side effects..

Linus

----
arch/x86_64/mm/fault.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 327c9f2..75270a0 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -87,7 +87,7 @@ static noinline int is_prefetch(struct pt_regs *regs, unsigned long addr,
instr = (unsigned char __user *)convert_rip_to_linear(current, regs);
max_instr = instr + 15;

- if (user_mode(regs) && instr >= (unsigned char *)TASK_SIZE)
+ if (user_mode_vm(regs) && instr >= (unsigned char *)TASK_SIZE)
return 0;

while (scan_more && instr < max_instr) {
@@ -320,7 +320,6 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,

info.si_code = SEGV_MAPERR;

-
/*
* We fault-in kernel-space virtual memory on-demand. The
* 'reference' page table is init_mm.pgd.
@@ -404,7 +403,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
goto good_area;
if (!(vma->vm_flags & VM_GROWSDOWN))
goto bad_area;
- if (error_code & 4) {
+ if (error_code & PF_USER) {
/* Allow userspace just enough access below the stack pointer
* to let the 'enter' instruction work.
*/
@@ -558,7 +557,7 @@ out_of_memory:
goto again;
}
printk("VM: killing process %s\n", tsk->comm);
- if (error_code & 4)
+ if (user_mode_vm(regs))
do_group_exit(SIGKILL);
goto no_context;

@@ -566,7 +565,7 @@ do_sigbus:
up_read(&mm->mmap_sem);

/* Kernel mode? Handle exceptions or die */
- if (!(error_code & PF_USER))
+ if (!user_mode_vm(regs))
goto no_context;

tsk->thread.cr2 = address;

2007-09-16 18:29:16

by Andi Kleen

[permalink] [raw]
Subject: Re: crashme fault

On Sun, Sep 16, 2007 at 10:14:46AM -0700, Linus Torvalds wrote:
>
>
> On Sun, 16 Sep 2007, Randy Dunlap wrote:
> >
> > I'll apply this patch today, but I haven't done so yet (for the 2
> > bug reports below).
>
> Actually, it's probably better that you don't change your situation
> unnecessarily, in case the bug goes away.

I checked the Prescott Specification Update, but didn't seen anything
that could cause such strange faults.

I'm trying to reproduce it here.

-Andi

2007-09-17 05:08:46

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

On Sun, 16 Sep 2007 11:12:23 -0700 (PDT) Linus Torvalds wrote:

>
>
> On Sun, 16 Sep 2007, Linus Torvalds wrote:
> >
> > I'm really starting to suspect some early EM64T bug, and I also suspect
> > that it's harmless but that we should just do the trivial patch to say "if
> > the register state is in user mode, we don't care if the CPU says it was a
> > kernel access".
>
> Namely something like this..
>
> The basic idea is that it's pointless to test for "error_code & PF_USER"
> to decide whether we should oops or kill the process: the *real* issue is
> whether we *can* kill the process or not. And that depends not on whether
> the CPU claimed it was a user access or not, but on whether the register
> state we'd return to is user-mode or not!
>
> So anything that decides whether it should send a signal or do to the
> "no_context" Ooops path should use "user_mode_vm(regs)" (yeah, I realize
> that the "_vm" part is unnecessary on x86-64, but it doesn't hurt either,
> and all of the issues are the same on 32/64-bit) which tests the right
> thing.
>
> Now, normally the USER bit in the error code should be the exact same
> thing, except for
>
> - Some CPU bug (microcode issue, whatever) where some complex fault
> situation sets the wrong error code.
>
> - user space accesses that caused a system page fault (ie a page fault
> while handling another trap - possibly due to lazy page table setup
> and having the LDT or some other CPU data structure in vmalloc space)
>
> Now, the vmalloc space accesses should be handled separately anyway, so I
> really wonder if it's some subtle CPU bug (I can't reproduce any problems
> on my Core 2 Duo), but the point is that I think this patch really is
> conceptually a real fix regardless, even if it _shouldn't_ matter.
>
> Comments?
>
> Randy, this replaces the hacky patch I sent, but also shuts up about the
> odd thing you're hitting, so for testing your case further this may not be
> the right thing. However, it would be nice to hear whether this just makes
> "crashme" work properly for you without any side effects..

I'll test this overnight on 2.6.23-rc6-git2 since that was failing.

I haven't been able to reproduce the fault on 2.6.21 after several
hours of testing.

I'll also test a microcode update to see if it helps.

---
~Randy

2007-09-17 05:29:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Sun, 16 Sep 2007, Randy Dunlap wrote:
>
> I'll test this overnight on 2.6.23-rc6-git2 since that was failing.
>
> I haven't been able to reproduce the fault on 2.6.21 after several
> hours of testing.
>
> I'll also test a microcode update to see if it helps.

Before you do the microcode update, try to see if you can bisect the place
between 2.6.21->22 that seems to start it. Even if you don't get all the
way, if you are confident enough about the "no error" case to be able to
bisect it down by doing a few reboots, it will at least cut down the set
of possible commits by roughly a factor of 2^<nr-of-bisect> events, so
even "just" a series of 4-5 bisect things might give us more of a clue.

Of course, if it's somewhat random and timing-dependent, bisection can be
hard (the "2^n" thing is very efficient, but it also means that a *single*
wrong answer will totally invalidate the result, so if something isn't
entirely reproducible, bisection often fails!)

Linus

2007-09-17 14:31:17

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

Linus Torvalds wrote:
>
> On Sun, 16 Sep 2007, Randy Dunlap wrote:
>> I'll test this overnight on 2.6.23-rc6-git2 since that was failing.
>>
>> I haven't been able to reproduce the fault on 2.6.21 after several
>> hours of testing.
>>
>> I'll also test a microcode update to see if it helps.
>
> Before you do the microcode update, try to see if you can bisect the place
> between 2.6.21->22 that seems to start it. Even if you don't get all the
> way, if you are confident enough about the "no error" case to be able to
> bisect it down by doing a few reboots, it will at least cut down the set
> of possible commits by roughly a factor of 2^<nr-of-bisect> events, so
> even "just" a series of 4-5 bisect things might give us more of a clue.

OK, I haven't done the microcode update yet. I ran crashme overnight
with your newer patch and it crashed:

[14254.327676] Unable to handle kernel paging request at 00000000ff021eaf RIP:
[14254.332299] [<0000000000504225>]
[14254.338084] PGD d8542067 PUD 0
[14254.341271] Oops: 0000 [1] SMP
[14254.344449] CPU 3
[14254.346484] Modules linked in: loop
[14254.350001] Pid: 28565, comm: crashme Not tainted 2.6.23-rc6-git2 #2
[14254.356349] RIP: 0033:[<0000000000504225>] [<0000000000504225>]
[14254.362376] RSP: 002b:00007fff5afccbf8 EFLAGS: 00010656
[14254.367685] RAX: 000000005afccbf8 RBX: 00002abd4fbf5c00 RCX: 00002abd4fc88b37
[14254.374812] RDX: 0000000000504220 RSI: 0000000000000000 RDI: 000000000000000a
[14254.381939] RBP: 00007fff5afccc00 R08: 00007fff5afccb50 R09: 0000000000000000
[14254.389068] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[14254.396195] R13: 00007fff5afccdf0 R14: 0000000000000000 R15: 0000000000000000
[14254.403324] FS: 00002abd4fe276d0(0000) GS:ffff81011fc751c0(0000) knlGS:0000000000000000
[14254.411403] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[14254.417144] CR2: 00000000ff021eaf CR3: 00000000d85f6000 CR4: 00000000000006e0
[14254.424273] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14254.431400] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[14254.438528] Process crashme (pid: 28565, threadinfo ffff8100d8970000, task ffff8100d8628820)
[14254.446953]
[14254.448443] RIP [<0000000000504225>]
[14254.452124] RSP <00007fff5afccbf8>
[14254.455614] CR2: 00000000ff021eaf
[14254.459244] Kernel panic - not syncing: Fatal exception


> Of course, if it's somewhat random and timing-dependent, bisection can be
> hard (the "2^n" thing is very efficient, but it also means that a *single*
> wrong answer will totally invalidate the result, so if something isn't
> entirely reproducible, bisection often fails!)

2007-09-17 14:54:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: crashme fault



On Mon, 17 Sep 2007, Randy Dunlap wrote:
>
> OK, I haven't done the microcode update yet. I ran crashme overnight
> with your newer patch and it crashed:

Well, duh.

That's because I forgot to do the "error_code & PF_USER" =>
"user_mode_vm(regs)" thing in the most common case - the
"bad_area_nosemaphore" if-statement.

But thinking more about it, it's actually just easier and more
straightforward to just take the same approach that my original hack did,
namely to just set PF_USER if the register state implies it was in user
mode.

So ignore that patch. You're better off with my original one that also
gave debugging info, and if we decide that this really looks like a CPU
buglet, the patch to actually commit would be a simplified version of that
(appended, just FYI).

Also, I changed my mind on the microcode update - if you have a newer
microcode, go ahead and try it, because if that fixes the issue, then we
can really just close this as a "unimportant CPU bug" with the patch
below.

Linus

---
arch/x86_64/mm/fault.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 327c9f2..54816ad 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -374,6 +374,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
if (unlikely(in_atomic() || !mm))
goto bad_area_nosemaphore;

+ /*
+ * User-mode registers count as a user access even for any
+ * potential system fault or CPU buglet.
+ */
+ if (user_mode_vm(regs))
+ error_code |= PF_USER;
+
again:
/* When running in the kernel we expect faults to occur only to
* addresses in user space. All other faults represent errors in the

2007-09-17 20:05:30

by Randy Dunlap

[permalink] [raw]
Subject: Re: crashme fault

On Mon, 17 Sep 2007 07:53:50 -0700 (PDT) Linus Torvalds wrote:

> On Mon, 17 Sep 2007, Randy Dunlap wrote:
> >
> > OK, I haven't done the microcode update yet. I ran crashme overnight
> > with your newer patch and it crashed:
>
> Well, duh.
>
> That's because I forgot to do the "error_code & PF_USER" =>
> "user_mode_vm(regs)" thing in the most common case - the
> "bad_area_nosemaphore" if-statement.
>
> But thinking more about it, it's actually just easier and more
> straightforward to just take the same approach that my original hack did,
> namely to just set PF_USER if the register state implies it was in user
> mode.
>
> So ignore that patch. You're better off with my original one that also
> gave debugging info, and if we decide that this really looks like a CPU
> buglet, the patch to actually commit would be a simplified version of that
> (appended, just FYI).

[switched email address due to some unknown problems]


Reverted that patch. Loaded new microcode update file.
Ran test for 3 hours without a fault.

It all seems rather inconclusive to me, still, the new patch (below)
makes sense.

> Also, I changed my mind on the microcode update - if you have a newer
> microcode, go ahead and try it, because if that fixes the issue, then we
> can really just close this as a "unimportant CPU bug" with the patch
> below.
>
> Linus
>
> ---
> arch/x86_64/mm/fault.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
> index 327c9f2..54816ad 100644
> --- a/arch/x86_64/mm/fault.c
> +++ b/arch/x86_64/mm/fault.c
> @@ -374,6 +374,13 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
> if (unlikely(in_atomic() || !mm))
> goto bad_area_nosemaphore;
>
> + /*
> + * User-mode registers count as a user access even for any
> + * potential system fault or CPU buglet.
> + */
> + if (user_mode_vm(regs))
> + error_code |= PF_USER;
> +
> again:
> /* When running in the kernel we expect faults to occur only to
> * addresses in user space. All other faults represent errors in the
> -

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***