Hi!
While running the syzkaller fuzzer I've got the following error report.
On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
gadgetfs: disconnected
sysfs: cannot create duplicate filename
'/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c
ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510
ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179
[<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542
[<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565
[<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30
[<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59
[< inline >] create_dir lib/kobject.c:71
[<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229
[< inline >] kobject_add_varg lib/kobject.c:366
[<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411
[<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088
[<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206
[<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260
drivers/usb/core/endpoint.c:195
[<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200
drivers/usb/core/message.c:1030
[<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0
drivers/usb/core/message.c:1937
[<ffffffff82cc9e2e>] generic_probe+0x6e/0xe0 drivers/usb/core/generic.c:172
[<ffffffff82caa7fa>] usb_probe_device+0xaa/0xe0 drivers/usb/core/driver.c:263
[< inline >] really_probe drivers/base/dd.c:375
[<ffffffff82741f64>] driver_probe_device+0x514/0x980 drivers/base/dd.c:515
[<ffffffff827427bb>] __device_attach_driver+0x22b/0x290 drivers/base/dd.c:610
[<ffffffff8273b56c>] bus_for_each_drv+0x15c/0x200 drivers/base/bus.c:463
[<ffffffff82741776>] __device_attach+0x266/0x3c0 drivers/base/dd.c:667
[<ffffffff827428ba>] device_initial_probe+0x1a/0x20 drivers/base/dd.c:714
[<ffffffff8273edb6>] bus_probe_device+0x1e6/0x290 drivers/base/bus.c:557
[<ffffffff82738416>] device_add+0xd06/0x1660 drivers/base/core.c:1136
[<ffffffff82c8066f>] usb_new_device+0x9af/0x1e80 drivers/usb/core/hub.c:2508
[< inline >] hub_port_connect drivers/usb/core/hub.c:4891
[< inline >] hub_port_connect_change drivers/usb/core/hub.c:4996
[< inline >] port_event drivers/usb/core/hub.c:5102
[<ffffffff82c84fe9>] hub_event+0x1689/0x36d0 drivers/usb/core/hub.c:5182
[<ffffffff8132743a>] process_one_work+0xb2a/0x1aa0 kernel/workqueue.c:2096
[<ffffffff813285bf>] worker_thread+0x20f/0x18a0 kernel/workqueue.c:2230
[<ffffffff8133ebf3>] kthread+0x323/0x3e0 kernel/kthread.c:209
[<ffffffff84f4812a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
On Mon, 12 Dec 2016, Andrey Konovalov wrote:
> Hi!
>
> While running the syzkaller fuzzer I've got the following error report.
>
> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
>
> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
> gadgetfs: disconnected
> sysfs: cannot create duplicate filename
> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
> Kernel panic - not syncing: panic_on_warn set ...
I suppose we could check for USB devices that claim to have two
endpoints with the same address. But is it really worthwhile? A
kernel warning isn't so bad when you're dealing with buggy device
firmware.
Alan Stern
On Mon, Dec 12, 2016 at 10:05 PM, Alan Stern <[email protected]> wrote:
> On Mon, 12 Dec 2016, Andrey Konovalov wrote:
>
>> Hi!
>>
>> While running the syzkaller fuzzer I've got the following error report.
>>
>> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
>>
>> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
>> gadgetfs: disconnected
>> sysfs: cannot create duplicate filename
>> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
>> Kernel panic - not syncing: panic_on_warn set ...
>
> I suppose we could check for USB devices that claim to have two
> endpoints with the same address. But is it really worthwhile? A
> kernel warning isn't so bad when you're dealing with buggy device
> firmware.
We need a clear distinction between what is a bug in kernel source
code and what is incorrect user-space code. Otherwise no automated
testing is possible. WARNING means bug in kernel source code. If it is
not a bug in kernel source code, then it must not produce a WARNING.
If it's a condition that we absolutely need to make user-space aware
of, then we can print a single line with an explanation to console
(but not prefixed with "WARNING:" nor "BUG:").
On Mon, 12 Dec 2016, Dmitry Vyukov wrote:
> On Mon, Dec 12, 2016 at 10:05 PM, Alan Stern <[email protected]> wrote:
> > On Mon, 12 Dec 2016, Andrey Konovalov wrote:
> >
> >> Hi!
> >>
> >> While running the syzkaller fuzzer I've got the following error report.
> >>
> >> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
> >>
> >> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
> >> gadgetfs: disconnected
> >> sysfs: cannot create duplicate filename
> >> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
> >> Kernel panic - not syncing: panic_on_warn set ...
> >
> > I suppose we could check for USB devices that claim to have two
> > endpoints with the same address. But is it really worthwhile? A
> > kernel warning isn't so bad when you're dealing with buggy device
> > firmware.
>
> We need a clear distinction between what is a bug in kernel source
> code and what is incorrect user-space code. Otherwise no automated
> testing is possible. WARNING means bug in kernel source code.
I don't necessarily agree with that. Is it documented anywhere?
> If it is
> not a bug in kernel source code, then it must not produce a WARNING.
Granting this point for the sake of argument...
> If it's a condition that we absolutely need to make user-space aware
> of, then we can print a single line with an explanation to console
> (but not prefixed with "WARNING:" nor "BUG:").
Then does the warning message in sysfs_warn_dup() need to be changed to
some other severity level?
Or is it truly a bug for the kernel to try to register two devices with
the same name? (In this case, two endpoints with the same address,
where the addresses are specified by the firmware on the USB device.)
Alan Stern
On Mon, Dec 12, 2016 at 10:16:50PM +0100, Dmitry Vyukov wrote:
> On Mon, Dec 12, 2016 at 10:05 PM, Alan Stern <[email protected]> wrote:
> > On Mon, 12 Dec 2016, Andrey Konovalov wrote:
> >
> >> Hi!
> >>
> >> While running the syzkaller fuzzer I've got the following error report.
> >>
> >> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
> >>
> >> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
> >> gadgetfs: disconnected
> >> sysfs: cannot create duplicate filename
> >> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
> >> Kernel panic - not syncing: panic_on_warn set ...
> >
> > I suppose we could check for USB devices that claim to have two
> > endpoints with the same address. But is it really worthwhile? A
> > kernel warning isn't so bad when you're dealing with buggy device
> > firmware.
>
> We need a clear distinction between what is a bug in kernel source
> code and what is incorrect user-space code. Otherwise no automated
> testing is possible. WARNING means bug in kernel source code. If it is
> not a bug in kernel source code, then it must not produce a WARNING.
> If it's a condition that we absolutely need to make user-space aware
> of, then we can print a single line with an explanation to console
> (but not prefixed with "WARNING:" nor "BUG:").
Ok, this is a "bug" in kernel code so the core is telling the higher
layer that something went really wrong in that a duplicate sysfs file
was created. So this code is working properly.
And yes, this is a totally bogus "fake hardware" being fuzzed on the
kernel. It complained that something was really wrong, which is fine,
it didn't crash or do anything else bad, correct? If so, all is good,
as this is not something you will ever see with a "real" device. And
even if you make a malicious device, all you will do is have the kernel
spit out some ugly messages, which I do not think is any form of attack
vector, correct?
thanks,
greg k-h
On Mon, 12 Dec 2016, Alan Stern wrote:
> On Mon, 12 Dec 2016, Dmitry Vyukov wrote:
>
> > On Mon, Dec 12, 2016 at 10:05 PM, Alan Stern <[email protected]> wrote:
> > > On Mon, 12 Dec 2016, Andrey Konovalov wrote:
> > >
> > >> Hi!
> > >>
> > >> While running the syzkaller fuzzer I've got the following error report.
> > >>
> > >> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
> > >>
> > >> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
> > >> gadgetfs: disconnected
> > >> sysfs: cannot create duplicate filename
> > >> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
> > >> Kernel panic - not syncing: panic_on_warn set ...
> > >
> > > I suppose we could check for USB devices that claim to have two
> > > endpoints with the same address. But is it really worthwhile? A
> > > kernel warning isn't so bad when you're dealing with buggy device
> > > firmware.
> >
> > We need a clear distinction between what is a bug in kernel source
> > code and what is incorrect user-space code. Otherwise no automated
> > testing is possible. WARNING means bug in kernel source code.
>
> I don't necessarily agree with that. Is it documented anywhere?
>
> > If it is
> > not a bug in kernel source code, then it must not produce a WARNING.
What about a memory allocation failure? The memory management part of
the kernel produces a WARNING message if an allocation fails and the
caller did not specify __GFP_NOWARN.
There is no way for a driver to guarantee that a memory allocation
request will succeed -- failure is always an option. But obviously
memory allocation failures are not bugs in the kernel.
Are you saying that mm/page_alloc.c:warn_alloc() should produce
something other than a WARNING?
Alan Stern
On Mon, Dec 12, 2016 at 11:04 PM, Alan Stern <[email protected]> wrote:
> On Mon, 12 Dec 2016, Alan Stern wrote:
>
>> On Mon, 12 Dec 2016, Dmitry Vyukov wrote:
>>
>> > On Mon, Dec 12, 2016 at 10:05 PM, Alan Stern <[email protected]> wrote:
>> > > On Mon, 12 Dec 2016, Andrey Konovalov wrote:
>> > >
>> > >> Hi!
>> > >>
>> > >> While running the syzkaller fuzzer I've got the following error report.
>> > >>
>> > >> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
>> > >>
>> > >> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
>> > >> gadgetfs: disconnected
>> > >> sysfs: cannot create duplicate filename
>> > >> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
>> > >> Kernel panic - not syncing: panic_on_warn set ...
>> > >
>> > > I suppose we could check for USB devices that claim to have two
>> > > endpoints with the same address. But is it really worthwhile? A
>> > > kernel warning isn't so bad when you're dealing with buggy device
>> > > firmware.
>> >
>> > We need a clear distinction between what is a bug in kernel source
>> > code and what is incorrect user-space code. Otherwise no automated
>> > testing is possible. WARNING means bug in kernel source code.
>>
>> I don't necessarily agree with that. Is it documented anywhere?
>>
>> > If it is
>> > not a bug in kernel source code, then it must not produce a WARNING.
>
> What about a memory allocation failure? The memory management part of
> the kernel produces a WARNING message if an allocation fails and the
> caller did not specify __GFP_NOWARN.
>
> There is no way for a driver to guarantee that a memory allocation
> request will succeed -- failure is always an option. But obviously
> memory allocation failures are not bugs in the kernel.
>
> Are you saying that mm/page_alloc.c:warn_alloc() should produce
> something other than a WARNING?
The main thing I am saying is that we absolutely need a way for a
human or a computer program to be able to determine if there is
anything wrong with kernel or not.
Page_alloc produces a WARNING iff you ask for an amount of memory that
can't possibly be allocated under any circumstances (order >=
MAX_ORDER). That's not just an allocation failure. Kernel itself
should not generally ask for such large amounts of memory. If the
allocation size is dictated by user, then kernel code should either
use __GFP_NOWARN, or impose own stricter limit dictated by context
(e.g. if it's a text command of known format, then limit can be as
small as say 128 bytes).
Re fake hardware. panic_on_warn will definitely cause problems. I
don't know if it used in any paranoid production systems or not,
though. But more generally, I don't see how it is different from
incorrect syscall arguments or nonsensical network packets received
from free internet. In ideal productions environments none of these
incorrect inputs to kernel should happen. I don't see any single
reason to not protect kernel from incorrect input in this case as
well, as we do in all other cases. In particular, it would resolve a
very real and practical issue for us -- fuzzer will not reboot machine
every minute, and we will not spend time looking at these WARNINGs,
and we will not spend your time by reporting these WARNINGs.
On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
> >> > If it is
> >> > not a bug in kernel source code, then it must not produce a WARNING.
> >
> > What about a memory allocation failure? The memory management part of
> > the kernel produces a WARNING message if an allocation fails and the
> > caller did not specify __GFP_NOWARN.
> >
> > There is no way for a driver to guarantee that a memory allocation
> > request will succeed -- failure is always an option. But obviously
> > memory allocation failures are not bugs in the kernel.
> >
> > Are you saying that mm/page_alloc.c:warn_alloc() should produce
> > something other than a WARNING?
>
>
> The main thing I am saying is that we absolutely need a way for a
> human or a computer program to be able to determine if there is
> anything wrong with kernel or not.
Okay, I agree with that. I'm not at all sure that this decision should
be based on whether a message is a WARNING.
In my experience, there is no general consensus on what conditions
should qualify as a WARNING. There aren't any hard-and-fast rules, so
people are forced to rely on their own judgment. The general
interpretation seems to be that WARNING is appropriate for notifying
the user whenever something important has gone unexpectedly wrong.
That covers a lot of ground, including things (like hardware failures)
that are not kernel bugs.
> Page_alloc produces a WARNING iff you ask for an amount of memory that
> can't possibly be allocated under any circumstances (order >=
> MAX_ORDER).
Doesn't it also produce a WARNING under other circumstances?
> That's not just an allocation failure. Kernel itself
> should not generally ask for such large amounts of memory. If the
> allocation size is dictated by user, then kernel code should either
> use __GFP_NOWARN, or impose own stricter limit dictated by context
> (e.g. if it's a text command of known format, then limit can be as
> small as say 128 bytes).
All right, yes, it makes sense to use __GFP_NOWARN when the allocation
size is dictated by the user.
But still, even when a stricter limit has been imposed, a memory
allocation request may fail. In fact, as far as I know, the kernel has
_no_ guarantee at all that any memory allocation request will succeed!
The best it offers is that sufficiently small requests may wait
indefinitely for memory to become available, which isn't much better
than failing.
> Re fake hardware. panic_on_warn will definitely cause problems. I
> don't know if it used in any paranoid production systems or not,
> though. But more generally, I don't see how it is different from
> incorrect syscall arguments or nonsensical network packets received
> from free internet. In ideal productions environments none of these
> incorrect inputs to kernel should happen. I don't see any single
> reason to not protect kernel from incorrect input in this case as
> well, as we do in all other cases.
I _do_ see a reason. Real computers don't live in ideal production
environments. Why go to extra trouble to add protection for some
particular incorrect input when the kernel is _already_ capable of
handling this input in a valid manner (even though this may include
logging a WARNING message)?
> In particular, it would resolve a
> very real and practical issue for us -- fuzzer will not reboot machine
> every minute, and we will not spend time looking at these WARNINGs,
> and we will not spend your time by reporting these WARNINGs.
Maybe you should decide that ERROR messages indicate a kernel bug,
rather than WARNING messages. Even that is questionable, but you will
get far fewer false positives.
Even better, you should publicize this decision (in the kernel
documentation, on various mailing lists, on LWN, and so forth), and
enforce it by reducing existing ERROR severity levels to WARNINGs in
places where they do not indicate a kernel bug.
Alan Stern
On Tue, Dec 13, 2016 at 4:52 PM, Alan Stern <[email protected]> wrote:
> On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
>
>> >> > If it is
>> >> > not a bug in kernel source code, then it must not produce a WARNING.
>> >
>> > What about a memory allocation failure? The memory management part of
>> > the kernel produces a WARNING message if an allocation fails and the
>> > caller did not specify __GFP_NOWARN.
>> >
>> > There is no way for a driver to guarantee that a memory allocation
>> > request will succeed -- failure is always an option. But obviously
>> > memory allocation failures are not bugs in the kernel.
>> >
>> > Are you saying that mm/page_alloc.c:warn_alloc() should produce
>> > something other than a WARNING?
>>
>>
>> The main thing I am saying is that we absolutely need a way for a
>> human or a computer program to be able to determine if there is
>> anything wrong with kernel or not.
>
> Okay, I agree with that. I'm not at all sure that this decision should
> be based on whether a message is a WARNING.
>
> In my experience, there is no general consensus on what conditions
> should qualify as a WARNING. There aren't any hard-and-fast rules, so
> people are forced to rely on their own judgment. The general
> interpretation seems to be that WARNING is appropriate for notifying
> the user whenever something important has gone unexpectedly wrong.
> That covers a lot of ground, including things (like hardware failures)
> that are not kernel bugs.
see below
>> Page_alloc produces a WARNING iff you ask for an amount of memory that
>> can't possibly be allocated under any circumstances (order >=
>> MAX_ORDER).
>
> Doesn't it also produce a WARNING under other circumstances?
No.
OOM is not a WARNING and is easily distinguishable from BUG/WARNING.
>> That's not just an allocation failure. Kernel itself
>> should not generally ask for such large amounts of memory. If the
>> allocation size is dictated by user, then kernel code should either
>> use __GFP_NOWARN, or impose own stricter limit dictated by context
>> (e.g. if it's a text command of known format, then limit can be as
>> small as say 128 bytes).
>
> All right, yes, it makes sense to use __GFP_NOWARN when the allocation
> size is dictated by the user.
>
> But still, even when a stricter limit has been imposed, a memory
> allocation request may fail. In fact, as far as I know, the kernel has
> _no_ guarantee at all that any memory allocation request will succeed!
> The best it offers is that sufficiently small requests may wait
> indefinitely for memory to become available, which isn't much better
> than failing.
Memory allocator does not print WARNINGs on allocation failures.
>> Re fake hardware. panic_on_warn will definitely cause problems. I
>> don't know if it used in any paranoid production systems or not,
>> though. But more generally, I don't see how it is different from
>> incorrect syscall arguments or nonsensical network packets received
>> from free internet. In ideal productions environments none of these
>> incorrect inputs to kernel should happen. I don't see any single
>> reason to not protect kernel from incorrect input in this case as
>> well, as we do in all other cases.
>
> I _do_ see a reason. Real computers don't live in ideal production
> environments. Why go to extra trouble to add protection for some
> particular incorrect input when the kernel is _already_ capable of
> handling this input in a valid manner (even though this may include
> logging a WARNING message)?
One reason is to be able to distinguish kernel bugs from incorrect
inputs to kernel.
also see below
>> In particular, it would resolve a
>> very real and practical issue for us -- fuzzer will not reboot machine
>> every minute, and we will not spend time looking at these WARNINGs,
>> and we will not spend your time by reporting these WARNINGs.
>
> Maybe you should decide that ERROR messages indicate a kernel bug,
> rather than WARNING messages. Even that is questionable, but you will
> get far fewer false positives.
>
> Even better, you should publicize this decision (in the kernel
> documentation, on various mailing lists, on LWN, and so forth), and
> enforce it by reducing existing ERROR severity levels to WARNINGs in
> places where they do not indicate a kernel bug.
OK, I have some numbers on my hands.
To date we've run about 1e12 random programs covering 1000+ syscalls
and have found 300+ bugs. 50+ of these bugs are found on WARNINGs
(search by "warning" here
https://github.com/google/syzkaller/wiki/Found-Bugs). Some of the bugs
found on WARNINGs are as severe as VMM exploitations. With that
coverage the _only_ case I know of now where a WARNING does not mean
bug in kernel source code is basically the one we are arguing about
now. Provided that total number of WARN/WARN_ON in kernel code is
~10000. There is currently a very-very-very-very-very clear WARNING
usage practice that suggests that WARNING is-a bug in kernel code.
If we stop treating WARNINGs as bugs we will miss dozens of real bugs.
This just doesn't make sense provided that we are talking about
basically a single precedent.
We actually even treat INFO as bugs, because there are:
INFO: possible circular locking dependency detected
INFO: rcu_preempt detected stalls
INFO: rcu_sched detected stalls
INFO: rcu_preempt self-detected stall on CPU
INFO: rcu_sched self-detected stall on CPU
INFO: suspicious RCU usage
INFO: task .* blocked for more than [0-9]+ seconds
with only a single exception of:
INFO: lockdep is turned off
For INFO the story is so clear. The name strongly suggests that it is
not a bug. So probably we should promote harmful INFO to WARNINGs. And
then stop treating INFO as bugs.
On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
> On Tue, Dec 13, 2016 at 4:52 PM, Alan Stern <[email protected]> wrote:
> > On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
> >
> >> >> > If it is
> >> >> > not a bug in kernel source code, then it must not produce a WARNING.
> >> >
> >> > What about a memory allocation failure? The memory management part of
> >> > the kernel produces a WARNING message if an allocation fails and the
> >> > caller did not specify __GFP_NOWARN.
> >> >
> >> > There is no way for a driver to guarantee that a memory allocation
> >> > request will succeed -- failure is always an option. But obviously
> >> > memory allocation failures are not bugs in the kernel.
> >> >
> >> > Are you saying that mm/page_alloc.c:warn_alloc() should produce
> >> > something other than a WARNING?
> >>
> >>
> >> The main thing I am saying is that we absolutely need a way for a
> >> human or a computer program to be able to determine if there is
> >> anything wrong with kernel or not.
> > Doesn't it also produce a WARNING under other circumstances?
>
> No.
>
> OOM is not a WARNING and is easily distinguishable from BUG/WARNING.
> Memory allocator does not print WARNINGs on allocation failures.
Do you count dev_warn the same as WARN or WARN_ON? What about dev_WARN
or pr_warn() or printk(KERN_WARNING...)? Maybe we're not talking about
the same messages.
The USB subsystem has got tons of dev_warn() and dev_err() calls.
Relatively few (if any) of them are for kernel bugs.
Alan Stern
On Tue, Dec 13, 2016 at 7:38 PM, Alan Stern <[email protected]> wrote:
> On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
>
>> On Tue, Dec 13, 2016 at 4:52 PM, Alan Stern <[email protected]> wrote:
>> > On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
>> >
>> >> >> > If it is
>> >> >> > not a bug in kernel source code, then it must not produce a WARNING.
>> >> >
>> >> > What about a memory allocation failure? The memory management part of
>> >> > the kernel produces a WARNING message if an allocation fails and the
>> >> > caller did not specify __GFP_NOWARN.
>> >> >
>> >> > There is no way for a driver to guarantee that a memory allocation
>> >> > request will succeed -- failure is always an option. But obviously
>> >> > memory allocation failures are not bugs in the kernel.
>> >> >
>> >> > Are you saying that mm/page_alloc.c:warn_alloc() should produce
>> >> > something other than a WARNING?
>> >>
>> >>
>> >> The main thing I am saying is that we absolutely need a way for a
>> >> human or a computer program to be able to determine if there is
>> >> anything wrong with kernel or not.
>> > Doesn't it also produce a WARNING under other circumstances?
>>
>> No.
>>
>> OOM is not a WARNING and is easily distinguishable from BUG/WARNING.
>
>> Memory allocator does not print WARNINGs on allocation failures.
>
> Do you count dev_warn the same as WARN or WARN_ON? What about dev_WARN
> or pr_warn() or printk(KERN_WARNING...)? Maybe we're not talking about
> the same messages.
>
> The USB subsystem has got tons of dev_warn() and dev_err() calls.
> Relatively few (if any) of them are for kernel bugs.
I grep for "WARNING:". It is not possible to understand what function
printed messages on console.
Here are my current regexps:
https://github.com/google/syzkaller/blob/master/report/report.go#L29
On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
> On Tue, Dec 13, 2016 at 7:38 PM, Alan Stern <[email protected]> wrote:
> > On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
> >
> >> On Tue, Dec 13, 2016 at 4:52 PM, Alan Stern <[email protected]> wrote:
> >> > On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
> >> >
> >> >> >> > If it is
> >> >> >> > not a bug in kernel source code, then it must not produce a WARNING.
> >> >> >
> >> >> > What about a memory allocation failure? The memory management part of
> >> >> > the kernel produces a WARNING message if an allocation fails and the
> >> >> > caller did not specify __GFP_NOWARN.
> >> >> >
> >> >> > There is no way for a driver to guarantee that a memory allocation
> >> >> > request will succeed -- failure is always an option. But obviously
> >> >> > memory allocation failures are not bugs in the kernel.
> >> >> >
> >> >> > Are you saying that mm/page_alloc.c:warn_alloc() should produce
> >> >> > something other than a WARNING?
> >> >>
> >> >>
> >> >> The main thing I am saying is that we absolutely need a way for a
> >> >> human or a computer program to be able to determine if there is
> >> >> anything wrong with kernel or not.
> >> > Doesn't it also produce a WARNING under other circumstances?
> >>
> >> No.
> >>
> >> OOM is not a WARNING and is easily distinguishable from BUG/WARNING.
> >
> >> Memory allocator does not print WARNINGs on allocation failures.
> >
> > Do you count dev_warn the same as WARN or WARN_ON? What about dev_WARN
> > or pr_warn() or printk(KERN_WARNING...)? Maybe we're not talking about
> > the same messages.
> >
> > The USB subsystem has got tons of dev_warn() and dev_err() calls.
> > Relatively few (if any) of them are for kernel bugs.
>
>
> I grep for "WARNING:". It is not possible to understand what function
> printed messages on console.
> Here are my current regexps:
> https://github.com/google/syzkaller/blob/master/report/report.go#L29
Ah, okay.
So the take-home message is that we should use WARN* or dev_WARN or
related functions only when reporting an actual kernel bug, whereas in
other circumstances we should avoid mentioning "WARNING" or "BUG" in
log output. In addition, memory allocations where the size is given by
the user (and not limited) should always use the __GFP_NOWARN flag.
I can audit the parts of the USB stack that I'm familiar with for these
things. Anything else? What about "ERROR"? Your regexps don't appear
to search for that.
Alan Stern
"On Tue, Dec 13, 2016 at 9:09 PM, Alan Stern <[email protected]> wrote:
> On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
>
>> On Tue, Dec 13, 2016 at 7:38 PM, Alan Stern <[email protected]> wrote:
>> > On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
>> >
>> >> On Tue, Dec 13, 2016 at 4:52 PM, Alan Stern <[email protected]> wrote:
>> >> > On Tue, 13 Dec 2016, Dmitry Vyukov wrote:
>> >> >
>> >> >> >> > If it is
>> >> >> >> > not a bug in kernel source code, then it must not produce a WARNING.
>> >> >> >
>> >> >> > What about a memory allocation failure? The memory management part of
>> >> >> > the kernel produces a WARNING message if an allocation fails and the
>> >> >> > caller did not specify __GFP_NOWARN.
>> >> >> >
>> >> >> > There is no way for a driver to guarantee that a memory allocation
>> >> >> > request will succeed -- failure is always an option. But obviously
>> >> >> > memory allocation failures are not bugs in the kernel.
>> >> >> >
>> >> >> > Are you saying that mm/page_alloc.c:warn_alloc() should produce
>> >> >> > something other than a WARNING?
>> >> >>
>> >> >>
>> >> >> The main thing I am saying is that we absolutely need a way for a
>> >> >> human or a computer program to be able to determine if there is
>> >> >> anything wrong with kernel or not.
>> >> > Doesn't it also produce a WARNING under other circumstances?
>> >>
>> >> No.
>> >>
>> >> OOM is not a WARNING and is easily distinguishable from BUG/WARNING.
>> >
>> >> Memory allocator does not print WARNINGs on allocation failures.
>> >
>> > Do you count dev_warn the same as WARN or WARN_ON? What about dev_WARN
>> > or pr_warn() or printk(KERN_WARNING...)? Maybe we're not talking about
>> > the same messages.
>> >
>> > The USB subsystem has got tons of dev_warn() and dev_err() calls.
>> > Relatively few (if any) of them are for kernel bugs.
>>
>>
>> I grep for "WARNING:". It is not possible to understand what function
>> printed messages on console.
>> Here are my current regexps:
>> https://github.com/google/syzkaller/blob/master/report/report.go#L29
>
> Ah, okay.
>
> So the take-home message is that we should use WARN* or dev_WARN or
> related functions only when reporting an actual kernel bug, whereas in
> other circumstances we should avoid mentioning "WARNING" or "BUG" in
> log output. In addition, memory allocations where the size is given by
> the user (and not limited) should always use the __GFP_NOWARN flag.
Yes, such simple convention that allows to "grep" for bugs would be awesome!
> I can audit the parts of the USB stack that I'm familiar with for these
> things. Anything else? What about "ERROR"? Your regexps don't appear
> to search for that.
Well, I don't know. Are there messages that are prefixed with ERROR
and indicate kernel bugs?
I can't possibly manually inspect all generated kernel output. I add
errors on case-by-case basis. BUG: and WARNING: where obvious starting
points. Then I discovered "INFO:", "Unable to handle kernel paging
request", "general protection fault:", "Kernel panic", "kernel BUG",
"Kernel BUG", "divide error:", "invalid opcode:", "unreferenced
object" and "UBSAN:".
There are chances that there are some bugs with ERROR: that I am just missing.
For example my dmesg is full of:
[2536674.524238] vmwrite error: reg 2800 value ffffffffffffffff (err -255)
[2536674.524256] Call Trace:
[2536674.524260] [<ffffffff8172a4bb>] dump_stack+0x64/0x82
[2536674.524267] [<ffffffffa1988ce2>] vmwrite_error+0x2c/0x2e [kvm_intel]
[2536674.524272] [<ffffffffa197d83f>] vmcs_writel+0x1f/0x30 [kvm_intel]
[2536674.524278] [<ffffffffa19847e0>] free_nested.part.64+0x70/0x170
[kvm_intel]
[2536674.524283] [<ffffffffa1984963>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
[2536674.524295] [<ffffffffa0eced24>] kvm_arch_vcpu_free+0x44/0x50 [kvm]
[2536674.524308] [<ffffffffa0ecf892>] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]
[2536674.524311] [<ffffffff810c9b3d>] ? synchronize_srcu+0x1d/0x20
[2536674.524323] [<ffffffffa0eb7f16>] kvm_put_kvm+0x106/0x1c0 [kvm]
[2536674.524334] [<ffffffffa0eb8031>] kvm_vm_release+0x21/0x30 [kvm]
[2536674.524337] [<ffffffff811c2f34>] __fput+0xe4/0x260
[2536674.524340] [<ffffffff811c30fe>] ____fput+0xe/0x10
[2536674.524343] [<ffffffff8108aa2c>] task_work_run+0xac/0xd0
[2536674.524346] [<ffffffff8106be10>] do_exit+0x2b0/0xa60
[2536674.524349] [<ffffffff8106c63f>] do_group_exit+0x3f/0xa0
[2536674.524352] [<ffffffff8107c740>] get_signal_to_deliver+0x1d0/0x6f0
[2536674.524356] [<ffffffff81014418>] do_signal+0x48/0xa10
[2536674.524359] [<ffffffff8161449e>] ? SYSC_sendto+0x17e/0x1c0
[2536674.524362] [<ffffffff811d49b0>] ? do_vfs_ioctl+0x2e0/0x4c0
[2536674.524365] [<ffffffff81014e49>] do_notify_resume+0x69/0xb0
[2536674.524368] [<ffffffff8173b31a>] int_signal+0x12/0x17
But I have no idea if it is a bug or not. It's kind of printed with
pr_err. But on the other hand it is not printed WARN or BUG. Who
knows?..
On Mon, 12 Dec 2016, Andrey Konovalov wrote:
> Hi!
>
> While running the syzkaller fuzzer I've got the following error report.
>
> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
>
> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
> gadgetfs: disconnected
> sysfs: cannot create duplicate filename
> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: usb_hub_wq hub_event
> ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c
> ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510
> ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
> [<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179
> [<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542
> [<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565
> [<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30
> [<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59
> [< inline >] create_dir lib/kobject.c:71
> [<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229
> [< inline >] kobject_add_varg lib/kobject.c:366
> [<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411
> [<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088
> [<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206
> [<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260
> drivers/usb/core/endpoint.c:195
> [<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200
> drivers/usb/core/message.c:1030
> [<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0
> drivers/usb/core/message.c:1937
Hi, Andrey:
Please check whether the patch below fixes this problem.
Alan Stern
Index: usb-4.x/drivers/usb/core/config.c
===================================================================
--- usb-4.x.orig/drivers/usb/core/config.c
+++ usb-4.x/drivers/usb/core/config.c
@@ -234,6 +234,16 @@ static int usb_parse_endpoint(struct dev
if (ifp->desc.bNumEndpoints >= num_ep)
goto skip_to_next_endpoint_or_interface_descriptor;
+ /* Check for duplicate endpoint addresses */
+ for (i = 0; i < ifp->desc.bNumEndpoints; ++i) {
+ if (ifp->endpoint[i].desc.bEndpointAddress ==
+ d->bEndpointAddress) {
+ dev_warn(ddev, "config %d interface %d altsetting %d has a duplicate endpoint with address 0x%X, skipping\n",
+ cfgno, inum, asnum, d->bEndpointAddress);
+ goto skip_to_next_endpoint_or_interface_descriptor;
+ }
+ }
+
endpoint = &ifp->endpoint[ifp->desc.bNumEndpoints];
++ifp->desc.bNumEndpoints;
On Fri, Dec 16, 2016 at 7:01 PM, Alan Stern <[email protected]> wrote:
> On Mon, 12 Dec 2016, Andrey Konovalov wrote:
>
>> Hi!
>>
>> While running the syzkaller fuzzer I've got the following error report.
>>
>> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2).
>>
>> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
>> gadgetfs: disconnected
>> sysfs: cannot create duplicate filename
>> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Workqueue: usb_hub_wq hub_event
>> ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c
>> ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510
>> ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000
>> Call Trace:
>> [< inline >] __dump_stack lib/dump_stack.c:15
>> [<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
>> [<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179
>> [<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542
>> [<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565
>> [<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30
>> [<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59
>> [< inline >] create_dir lib/kobject.c:71
>> [<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229
>> [< inline >] kobject_add_varg lib/kobject.c:366
>> [<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411
>> [<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088
>> [<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206
>> [<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260
>> drivers/usb/core/endpoint.c:195
>> [<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200
>> drivers/usb/core/message.c:1030
>> [<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0
>> drivers/usb/core/message.c:1937
>
> Hi, Andrey:
>
> Please check whether the patch below fixes this problem.
Hi Alan,
Been testing with your patch for the last day, haven't seen any more
reports or other issues.
Tested-by: Andrey Konovalov <[email protected]>
Thanks!
>
> Alan Stern
>
>
>
> Index: usb-4.x/drivers/usb/core/config.c
> ===================================================================
> --- usb-4.x.orig/drivers/usb/core/config.c
> +++ usb-4.x/drivers/usb/core/config.c
> @@ -234,6 +234,16 @@ static int usb_parse_endpoint(struct dev
> if (ifp->desc.bNumEndpoints >= num_ep)
> goto skip_to_next_endpoint_or_interface_descriptor;
>
> + /* Check for duplicate endpoint addresses */
> + for (i = 0; i < ifp->desc.bNumEndpoints; ++i) {
> + if (ifp->endpoint[i].desc.bEndpointAddress ==
> + d->bEndpointAddress) {
> + dev_warn(ddev, "config %d interface %d altsetting %d has a duplicate endpoint with address 0x%X, skipping\n",
> + cfgno, inum, asnum, d->bEndpointAddress);
> + goto skip_to_next_endpoint_or_interface_descriptor;
> + }
> + }
> +
> endpoint = &ifp->endpoint[ifp->desc.bNumEndpoints];
> ++ifp->desc.bNumEndpoints;
>
>