2015-04-06 12:10:37

by Fengguang Wu

[permalink] [raw]
Subject: [miscdevice] BUG: unable to handle kernel NULL pointer dereference at 00000028

Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing

commit 0b509d8d336eef6d622d66b3ae2a1fc3a072bf92
Author: Tom Van Braeckel <[email protected]>
AuthorDate: Tue Mar 31 16:39:21 2015 +0200
Commit: Greg Kroah-Hartman <[email protected]>
CommitDate: Fri Apr 3 16:15:30 2015 +0200

misc: pass miscdevice through file's private_data

Make the miscdevice accessible through the file's private_data.

Previously, this was done only when an open() file operation had been
registered. If no custom open() file operation was defined,
private_data was set to NULL.

This subtle quirk was confusing, to the point where kernel code
registered *empty* file open operations to have private_data point to
the misc device structure and avoid duplicating that logic.

And it could easily lead to bugs, where the addition or removal of a
custom open() file operation surprisingly changes the initial value of
a file's private_data structure.

To resolve this, we now place the miscdevice in the file's private_data
member unconditionally when open() is called.

Signed-off-by: Tom Van Braeckel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

+------------------------------------------+------------+------------+------------+
| | 16c9c8e1ae | 0b509d8d33 | linux-deve |
+------------------------------------------+------------+------------+------------+
| boot_successes | 900 | 290 | 111 |
| boot_failures | 0 | 10 | 3 |
| BUG:kernel_test_crashed | 0 | 3 | |
| BUG:unable_to_handle_kernel | 0 | 7 | 2 |
| Oops | 0 | 7 | 2 |
| EIP_is_at_release_pgd | 0 | 7 | |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 7 | 2 |
| backtrace:do_group_exit | 0 | 7 | 2 |
| backtrace:SyS_exit_group | 0 | 7 | 2 |
| Out_of_memory:Kill_process | 0 | 1 | |
| BUG:kernel_test_oops | 0 | 1 | |
| EIP_is_at_put_page | 0 | 0 | 2 |
| BUG:kernel_boot_crashed | 0 | 0 | 1 |
+------------------------------------------+------------+------------+------------+

[ 1.928994] init: Failed to create pty - disabling logging for job
[ 1.931296] init: Temporary process spawn error: No space left on device
Kernel tests: Boot OK!
[ 13.037537] BUG: unable to handle kernel NULL pointer dereference at 00000028
[ 13.038009] IP: [<c12c7d96>] release_pgd+0x9/0x5b
[ 13.038009] *pde = 00000000
[ 13.038009] Oops: 0000 [#1]
[ 13.038009] Modules linked in:
[ 13.038009] CPU: 0 PID: 12024 Comm: trinity-main Not tainted 4.0.0-rc5-00108-g0b509d8 #12
[ 13.038009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 13.038009] task: d216e480 ti: d27c6000 task.ti: d27c6000
[ 13.038009] EIP: 0060:[<c12c7d96>] EFLAGS: 00010202 CPU: 0
[ 13.038009] EIP is at release_pgd+0x9/0x5b
[ 13.038009] EAX: 00000028 EBX: c159c174 ECX: 00000000 EDX: 80000000
[ 13.038009] ESI: 00000028 EDI: c159c194 EBP: d27c7ef8 ESP: d27c7ee8
[ 13.038009] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 13.038009] CR0: 8005003b CR2: 00000028 CR3: 120cd000 CR4: 00000710
[ 13.038009] DR0: 086aa000 DR1: c0000000 DR2: 00000000 DR3: 00000000
[ 13.038009] DR6: ffff0ff0 DR7: 00000600
[ 13.038009] Stack:
[ 13.038009] 80000000 c159c174 00000004 c159c194 d27c7f0c c12c7ecd 00000000 c159b7ac
[ 13.038009] 00000000 d27c7f1c c12c8679 c159b7ac c159b800 d27c7f30 c12c9194 d2736800
[ 13.038009] 00000008 d26a68e0 d27c7f54 c10bef03 d0abb780 d26a68e0 d2736808 d26fdf10
[ 13.038009] Call Trace:
[ 13.038009] [<c12c7ecd>] release_all_pagetables+0x25/0x44
[ 13.038009] [<c12c8679>] free_guest_pagetable+0xe/0x28
[ 13.038009] [<c12c9194>] close+0x23/0x7d
[ 13.038009] [<c10bef03>] __fput+0xd2/0x170
[ 13.038009] [<c10befc7>] ____fput+0x8/0xa
[ 13.038009] [<c103d4d0>] task_work_run+0x4f/0x71
[ 13.038009] [<c102f2a0>] do_exit+0x2fa/0x720
[ 13.038009] [<c10be5fd>] ? SyS_write+0x48/0x81
[ 13.038009] [<c102f71b>] do_group_exit+0x2e/0x80
[ 13.038009] [<c102f77e>] SyS_exit_group+0x11/0x11
[ 13.038009] [<c138c422>] sysenter_do_call+0x12/0x12
[ 13.038009] Code: fe ff ff 89 c2 89 d8 eb 0e 31 c0 8d b6 00 00 00 00 89 c1 89 d8 89 ca e8 b1 c0 d5 ff 90 5b 5e 5f 5d c3 55 89 e5 57 56 89 c6 53 52 <8b> 00 8d b6 00 00 00 00 a8 01 74 40 8b 06 8d b6 00 00 00 00 25
[ 13.038009] EIP: [<c12c7d96>] release_pgd+0x9/0x5b SS:ESP 0068:d27c7ee8
[ 13.038009] CR2: 0000000000000028
[ 13.038009] ---[ end trace dd4a2cbce75581e6 ]---
[ 13.038009] Kernel panic - not syncing: Fatal exception

git bisect start b4ed2f2721dcdb3f739dc40356bb00f423ce63e4 e42391cd048809d903291d07f86ed3934ce138e9 --
git bisect good d2c37c2728fca7a10feabfcf94463114e6cc7821 # 20:15 300+ 0 Merge 'arm-soc/next/cleanup' into devel-hourly-2015040414
git bisect good a7a4927ab9751002ab29b2fae49bdb9aa0423575 # 20:22 300+ 0 Merge 'peterz-queue/perf/pt' into devel-hourly-2015040414
git bisect good 68f1574305b898837a9b705dac6c3bac14013056 # 20:29 300+ 0 Merge 'arm-perf/misc-patches' into devel-hourly-2015040414
git bisect good 43d4e931d057b03411fdb36964a63c03d8596cd7 # 20:37 300+ 0 Merge 'staging/staging-testing' into devel-hourly-2015040414
git bisect bad 89f4d34f74643560ec55b1a7519415d1fe34953c # 20:41 0- 2 Merge 'char-misc/char-misc-testing' into devel-hourly-2015040414
git bisect good d38b98a3b8c951a2d7f742609524632e078ddede # 20:57 300+ 0 Merge branch 'fix_ioremap_wc' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag into char-misc-next
git bisect good 86d39839bc6bccc9b6b89de8c9c38beb9709f559 # 21:15 300+ 0 Merge tag 'extcon-next-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next
git bisect bad 652594c7dfd9bf6392e3a727bc69d89a2562d953 # 21:19 3- 1 hv: run non-blocking message handlers in the dispatch tasklet
git bisect bad 8c02a5ba34a1fae6def8cb5a39bb582f09bca49c # 21:24 0- 1 coresight: making cpu index lookup arm64 compliant
git bisect bad 149cb911ae242242e5aae698710bf59e804a96e6 # 21:31 0- 1 spmi: pmic_arb: remove ARM build time dependency
git bisect bad 0b509d8d336eef6d622d66b3ae2a1fc3a072bf92 # 21:36 0- 1 misc: pass miscdevice through file's private_data
git bisect good 16c9c8e1ae228e89b66cbc03ec6c753ee44d39bc # 21:47 300+ 0 Revert "uio: constify of_device_id array"
# first bad commit: [0b509d8d336eef6d622d66b3ae2a1fc3a072bf92] misc: pass miscdevice through file's private_data
git bisect good 16c9c8e1ae228e89b66cbc03ec6c753ee44d39bc # 21:55 900+ 0 Revert "uio: constify of_device_id array"
# extra tests with DEBUG_INFO
git bisect good 0b509d8d336eef6d622d66b3ae2a1fc3a072bf92 # 22:11 900+ 0 misc: pass miscdevice through file's private_data
# extra tests on HEAD of linux-devel/devel-hourly-2015040414
git bisect bad b4ed2f2721dcdb3f739dc40356bb00f423ce63e4 # 22:11 0- 3 0day head guard for 'devel-hourly-2015040414'
# extra tests on tree/branch char-misc/char-misc-testing
git bisect bad 1ac4e6fee41d6534b6e54dcbed381590e242bdcb # 22:24 0- 2 DTS: ARM: OMAP3-N900: Add lis3lv02d support
# extra tests with first bad commit reverted
# extra tests on tree/branch linus/master
git bisect good 1cced5015b171415169d938fb179c44fe060dc15 # 22:40 900+ 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
# extra tests on tree/branch next/master
git bisect good b0a12fb5bc87820b12df22c64dd680a96443de00 # 22:55 900+ 900 Add linux-next specific files for 20150402


This script may reproduce the error.

----------------------------------------------------------------------------
#!/bin/bash

kernel=$1
initrd=quantal-core-i386.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
qemu-system-x86_64
-enable-kvm
-cpu kvm64
-kernel $kernel
-initrd $initrd
-m 300
-smp 2
-device e1000,netdev=net0
-netdev user,id=net0
-boot order=nc
-no-reboot
-watchdog i6300esb
-rtc base=localtime
-serial stdio
-display none
-monitor null
)

append=(
hung_task_panic=1
earlyprintk=ttyS0,115200
rd.udev.log-priority=err
systemd.log_target=journal
systemd.log_level=warning
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------

Thanks,
Fengguang


Attachments:
(No filename) (9.22 kB)
dmesg-quantal-ivb41-64:20150404213826:i386-randconfig-ib1-04041719:4.0.0-rc5-00108-g0b509d8:12 (44.28 kB)
config-4.0.0-rc5-00108-g0b509d8 (72.01 kB)
Download all attachments

2015-04-07 08:18:50

by Tom Van Braeckel

[permalink] [raw]
Subject: [PATCH] lguest: explicitly setup /dev/lguest private_data

The private_data member of the /dev/lguest device file is used to hold
the current struct lguest and needs to be set to NULL to signify that
no initialization has taken place.

We explicitly set it to NULL to be independent of whatever value the
misc subsystem initializes it to.

Signed-off-by: Tom Van Braeckel <[email protected]>
---
Backstory:
==========
The misc subsystem used to initialize a file's private_data to point to
the misc device when a driver had registered a custom open file
operation and initialized it to NULL when a custom open file operation
had *not* been provided.

This subtle quirk was confusing, to the point where kernel code
registered *empty* file open operations to have private_data point to
the misc device structure.

And it lead to bugs, where the addition or removal of a custom open
file operation surprisingly changed the initial contents of a file's
private_data structure.

The misc subsystem is currently underdoing changes to *always* set
private_data to point to the misc device instead of only doing this
when a custom open file operation has been registered.

Intel's 0day kernel testing robot discovered that the lguest driver
depended on it implicitly being initialized to NULL, as Fengguang Wu
reported. Thanks a lot for all the hard work!

drivers/lguest/lguest_user.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index c4c6113..054bf70 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -98,6 +98,17 @@ static int trap(struct lg_cpu *cpu, const unsigned long __user *input)
return 0;
}

+/*
+ * Set up the /dev/lguest file structure
+ * The file's private_data will hold the "struct lguest" after
+ * initialization and is used to check whether it is already initialized.
+ */
+static int open(struct inode *inode, struct file *file)
+{
+ file->private_data = NULL;
+ return 0;
+}
+
/*L:040
* Once our Guest is initialized, the Launcher makes it run by reading
* from /dev/lguest.
@@ -405,10 +416,11 @@ static int close(struct inode *inode, struct file *file)
*
* We begin our understanding with the Host kernel interface which the Launcher
* uses: reading and writing a character device called /dev/lguest. All the
- * work happens in the read(), write() and close() routines:
+ * work happens in the open(), read(), write() and close() routines:
*/
static const struct file_operations lguest_fops = {
.owner = THIS_MODULE,
+ .open = open,
.release = close,
.write = write,
.read = read,
--
2.1.0

2015-04-07 08:34:31

by Daniel Baluta

[permalink] [raw]
Subject: Re: [PATCH] lguest: explicitly setup /dev/lguest private_data

On Tue, Apr 7, 2015 at 11:18 AM, Tom Van Braeckel
<[email protected]> wrote:
> The private_data member of the /dev/lguest device file is used to hold
> the current struct lguest and needs to be set to NULL to signify that
> no initialization has taken place.
>
> We explicitly set it to NULL to be independent of whatever value the
> misc subsystem initializes it to.
>
> Signed-off-by: Tom Van Braeckel <[email protected]>
> ---
> Backstory:
> ==========
> The misc subsystem used to initialize a file's private_data to point to
> the misc device when a driver had registered a custom open file
> operation and initialized it to NULL when a custom open file operation
> had *not* been provided.
>
> This subtle quirk was confusing, to the point where kernel code
> registered *empty* file open operations to have private_data point to
> the misc device structure.
>
> And it lead to bugs, where the addition or removal of a custom open
> file operation surprisingly changed the initial contents of a file's
> private_data structure.
>
> The misc subsystem is currently underdoing changes to *always* set
> private_data to point to the misc device instead of only doing this
> when a custom open file operation has been registered.
>
> Intel's 0day kernel testing robot discovered that the lguest driver
> depended on it implicitly being initialized to NULL, as Fengguang Wu
> reported. Thanks a lot for all the hard work!
>
> drivers/lguest/lguest_user.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
> index c4c6113..054bf70 100644
> --- a/drivers/lguest/lguest_user.c
> +++ b/drivers/lguest/lguest_user.c
> @@ -98,6 +98,17 @@ static int trap(struct lg_cpu *cpu, const unsigned long __user *input)
> return 0;
> }
>
> +/*
> + * Set up the /dev/lguest file structure
> + * The file's private_data will hold the "struct lguest" after
> + * initialization and is used to check whether it is already initialized.
> + */
> +static int open(struct inode *inode, struct file *file)
> +{
> + file->private_data = NULL;
> + return 0;
> +}
> +
> /*L:040
> * Once our Guest is initialized, the Launcher makes it run by reading
> * from /dev/lguest.
> @@ -405,10 +416,11 @@ static int close(struct inode *inode, struct file *file)
> *
> * We begin our understanding with the Host kernel interface which the Launcher
> * uses: reading and writing a character device called /dev/lguest. All the
> - * work happens in the read(), write() and close() routines:
> + * work happens in the open(), read(), write() and close() routines:
> */
> static const struct file_operations lguest_fops = {
> .owner = THIS_MODULE,
> + .open = open,
> .release = close,
> .write = write,
> .read = read,

Hmm, isn't this already fixed?

https://lkml.org/lkml/2015/3/23/319

thanks,
Daniel.

2015-04-07 08:36:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] lguest: explicitly setup /dev/lguest private_data

On Tue, Apr 07, 2015 at 10:18:18AM +0200, Tom Van Braeckel wrote:
> The private_data member of the /dev/lguest device file is used to hold
> the current struct lguest and needs to be set to NULL to signify that
> no initialization has taken place.
>
> We explicitly set it to NULL to be independent of whatever value the
> misc subsystem initializes it to.
>
> Signed-off-by: Tom Van Braeckel <[email protected]>
> ---
> Backstory:
> ==========
> The misc subsystem used to initialize a file's private_data to point to
> the misc device when a driver had registered a custom open file
> operation and initialized it to NULL when a custom open file operation
> had *not* been provided.
>
> This subtle quirk was confusing, to the point where kernel code
> registered *empty* file open operations to have private_data point to
> the misc device structure.
>
> And it lead to bugs, where the addition or removal of a custom open
> file operation surprisingly changed the initial contents of a file's
> private_data structure.
>
> The misc subsystem is currently underdoing changes to *always* set
> private_data to point to the misc device instead of only doing this
> when a custom open file operation has been registered.
>
> Intel's 0day kernel testing robot discovered that the lguest driver
> depended on it implicitly being initialized to NULL, as Fengguang Wu
> reported. Thanks a lot for all the hard work!
>
> drivers/lguest/lguest_user.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)

I can take this through my char-misc tree, where this misc core change
was, if the lguest maintainer (i.e. Rusty) acks it.

Tom, thanks for tracking this down.

greg k-h

2015-04-07 08:55:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] lguest: explicitly setup /dev/lguest private_data

On Tue, Apr 07, 2015 at 11:34:25AM +0300, Daniel Baluta wrote:
> On Tue, Apr 7, 2015 at 11:18 AM, Tom Van Braeckel
> <[email protected]> wrote:
> > The private_data member of the /dev/lguest device file is used to hold
> > the current struct lguest and needs to be set to NULL to signify that
> > no initialization has taken place.
> >
> > We explicitly set it to NULL to be independent of whatever value the
> > misc subsystem initializes it to.
> >
> > Signed-off-by: Tom Van Braeckel <[email protected]>
> > ---
> > Backstory:
> > ==========
> > The misc subsystem used to initialize a file's private_data to point to
> > the misc device when a driver had registered a custom open file
> > operation and initialized it to NULL when a custom open file operation
> > had *not* been provided.
> >
> > This subtle quirk was confusing, to the point where kernel code
> > registered *empty* file open operations to have private_data point to
> > the misc device structure.
> >
> > And it lead to bugs, where the addition or removal of a custom open
> > file operation surprisingly changed the initial contents of a file's
> > private_data structure.
> >
> > The misc subsystem is currently underdoing changes to *always* set
> > private_data to point to the misc device instead of only doing this
> > when a custom open file operation has been registered.
> >
> > Intel's 0day kernel testing robot discovered that the lguest driver
> > depended on it implicitly being initialized to NULL, as Fengguang Wu
> > reported. Thanks a lot for all the hard work!
> >
> > drivers/lguest/lguest_user.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
> > index c4c6113..054bf70 100644
> > --- a/drivers/lguest/lguest_user.c
> > +++ b/drivers/lguest/lguest_user.c
> > @@ -98,6 +98,17 @@ static int trap(struct lg_cpu *cpu, const unsigned long __user *input)
> > return 0;
> > }
> >
> > +/*
> > + * Set up the /dev/lguest file structure
> > + * The file's private_data will hold the "struct lguest" after
> > + * initialization and is used to check whether it is already initialized.
> > + */
> > +static int open(struct inode *inode, struct file *file)
> > +{
> > + file->private_data = NULL;
> > + return 0;
> > +}
> > +
> > /*L:040
> > * Once our Guest is initialized, the Launcher makes it run by reading
> > * from /dev/lguest.
> > @@ -405,10 +416,11 @@ static int close(struct inode *inode, struct file *file)
> > *
> > * We begin our understanding with the Host kernel interface which the Launcher
> > * uses: reading and writing a character device called /dev/lguest. All the
> > - * work happens in the read(), write() and close() routines:
> > + * work happens in the open(), read(), write() and close() routines:
> > */
> > static const struct file_operations lguest_fops = {
> > .owner = THIS_MODULE,
> > + .open = open,
> > .release = close,
> > .write = write,
> > .read = read,
>
> Hmm, isn't this already fixed?
>
> https://lkml.org/lkml/2015/3/23/319

Ah, this might be a cross-tree issue then, the 0-day bot tested my tree
without this change in it, and hit the problem. So all is good when we
merge with Linus for 4.1-rc1.

But to be "safe" I could queue this up to my tree as well, any objection
to that?

thanks,

greg k-h

2015-04-11 00:32:08

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] lguest: explicitly setup /dev/lguest private_data

Greg KH <[email protected]> writes:
> On Tue, Apr 07, 2015 at 10:18:18AM +0200, Tom Van Braeckel wrote:
>> The private_data member of the /dev/lguest device file is used to hold
>> the current struct lguest and needs to be set to NULL to signify that
>> no initialization has taken place.
>>
>> We explicitly set it to NULL to be independent of whatever value the
>> misc subsystem initializes it to.
>>
>> Signed-off-by: Tom Van Braeckel <[email protected]>
>> ---
>> Backstory:
>> ==========
>> The misc subsystem used to initialize a file's private_data to point to
>> the misc device when a driver had registered a custom open file
>> operation and initialized it to NULL when a custom open file operation
>> had *not* been provided.
>>
>> This subtle quirk was confusing, to the point where kernel code
>> registered *empty* file open operations to have private_data point to
>> the misc device structure.
>>
>> And it lead to bugs, where the addition or removal of a custom open
>> file operation surprisingly changed the initial contents of a file's
>> private_data structure.
>>
>> The misc subsystem is currently underdoing changes to *always* set
>> private_data to point to the misc device instead of only doing this
>> when a custom open file operation has been registered.
>>
>> Intel's 0day kernel testing robot discovered that the lguest driver
>> depended on it implicitly being initialized to NULL, as Fengguang Wu
>> reported. Thanks a lot for all the hard work!
>>
>> drivers/lguest/lguest_user.c | 14 +++++++++++++-
>> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> I can take this through my char-misc tree, where this misc core change
> was, if the lguest maintainer (i.e. Rusty) acks it.

Acked-by: Rusty Russell <[email protected]>

Cheers,
Rusty.