2016-10-17 14:16:30

by Oliver Neukum

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Thu, 2016-09-08 at 14:58 +0200, Wim Osterholt wrote:
> On Thu, Sep 08, 2016 at 02:20:38PM +0200, Oliver Neukum wrote:
> > >
> > > The oops tells things that I didn't all write down, but it says
> > > null pointer dereference at 00000246
> >
> > That is the important part. I am sorry, but without the oops
> > nobody can help you. Please capture it

Hi,

I got one of those devices. However, I don't get a crash.
Could you please give me instructions on how you trigger it?

Regards
Oliver



2016-10-17 15:20:56

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Mon, Oct 17, 2016 at 04:10:45PM +0200, Oliver Neukum wrote:
> Hi,
>
> I got one of those devices. However, I don't get a crash.
> Could you please give me instructions on how you trigger it?

That's not too hard, just plug it in. :-)

However you must have set cdc_acm in your kernel, or availabe as a module.
It happens on all my machines on kernels 4.8 and 4.9.
Now, all my kernel configs will differ a bit, but must have something
peculiar in common. Or you've received a totally different device.

Here's one config at http://webserver.djo.tudelft.nl/.config-4.8.1

Many options are inherited by 'make oldconfig' from version to version,
without me knowing what it all means. So maybe it's just a weird combination
of options then?


Regards, Wim.


2016-10-18 12:24:30

by Oliver Neukum

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Mon, 2016-10-17 at 17:20 +0200, Wim Osterholt wrote:
> On Mon, Oct 17, 2016 at 04:10:45PM +0200, Oliver Neukum wrote:
> > Hi,
> >
> > I got one of those devices. However, I don't get a crash.
> > Could you please give me instructions on how you trigger it?
>
> That's not too hard, just plug it in. :-)
>
> However you must have set cdc_acm in your kernel, or availabe as a module.
> It happens on all my machines on kernels 4.8 and 4.9.
> Now, all my kernel configs will differ a bit, but must have something
> peculiar in common. Or you've received a totally different device.
>
> Here's one config at http://webserver.djo.tudelft.nl/.config-4.8.1
>
> Many options are inherited by 'make oldconfig' from version to version,
> without me knowing what it all means. So maybe it's just a weird combination
> of options then?

Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: new full-speed USB device number 10 using xhci_hcd
Oct 18 14:05:07 linux-dtbq.site mtp-probe[2583]: checking bus 1, device 10: "/sys/devices/pci0000:00/0000:00:14.0/usb1/1-9"
Oct 18 14:05:07 linux-dtbq.site mtp-probe[2583]: bus: 1, device: 10 was not an MTP device
Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: New USB device found, idVendor=0572, idProduct=1340
Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: Product: USB Modem
Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: Manufacturer: Conexant
Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: SerialNumber: 12345678
Oct 18 14:05:07 linux-dtbq.site kernel: cdc_acm 1-9:1.0: ttyACM0: USB ACM device
Oct 18 14:05:07 linux-dtbq.site kernel: usbcore: registered new interface driver cdc_acm
Oct 18 14:05:07 linux-dtbq.site kernel: cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
Oct 18 14:05:11 linux-dtbq.site ModemManager[901]: <info> Creating modem with plugin 'Generic' and '1' ports
Oct 18 14:05:11 linux-dtbq.site ModemManager[901]: <info> Modem for device at '/sys/devices/pci0000:00/0000:00:14.0/usb1/1-9' successfully created
Oct 18 14:05:11 linux-dtbq.site ModemManager[901]: <warn> Modem couldn't be initialized: couldn't load current capabilities: Failed to determine modem capabilities.
Oct 18 14:05:11 linux-dtbq.site NetworkManager[945]: <warn> (ttyACM0): failed to look up interface index
Oct 18 14:05:11 linux-dtbq.site NetworkManager[945]: <info> (ttyACM0): device state change: unmanaged -> unavailable (reason 'none') [10 20 0]
Oct 18 14:05:11 linux-dtbq.site NetworkManager[945]: <info> (ttyACM0): deactivating device (reason 'none') [0]
Oct 18 14:05:11 linux-dtbq.site NetworkManager[945]: <info> (ttyACM0): modem state 'unknown'
Oct 18 14:05:11 linux-dtbq.site NetworkManager[945]: <info> (ttyACM0): new Broadband device (driver: 'cdc_acm' ifindex: 0)
Oct 18 14:05:11 linux-dtbq.site NetworkManager[945]: <info> (ttyACM0): exported as /org/freedesktop/NetworkManager/Devices/3

It definitely does not crash and is probed and your .config is not
extremely unusual.
I am afraid unless you test the last patch I sent we will not make
progress. Something odd is going on.

Regards
Oliver


2016-10-18 14:51:43

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, Oct 18, 2016 at 02:18:43PM +0200, Oliver Neukum wrote:
> Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: Manufacturer: Conexant
> Oct 18 14:05:07 linux-dtbq.site kernel: usb 1-9: SerialNumber: 12345678

With that unique serial number it must be that very device. :-)

> It definitely does not crash and is probed and your .config is not
> extremely unusual.
> I am afraid unless you test the last patch I sent we will not make
> progress. Something odd is going on.

Whell, I DID test that patch and it already crashed before it could print
anything. That's why the output I sent you looked the same.

Once again, this time on 4.9-rc1.
Applied your patch 0001-CDC-ACM-more-paranoid-debugging to cdc_acm.c .

Did
> > dmesg -c
> > echo 9 > /proc/sysrq-trigger
> > modprobe cdc_acm
> > echo "module cdc_acm +mpf" > /sys/kernel/debug/dynamic_debug/control
> >
> > [plug your device in]
> >
> > and provide the full output of dmesg after that.

Got
[ 765.409057] sysrq: SysRq : Changing Loglevel
[ 765.416465] sysrq: Loglevel set to 9
[ 778.299271] usbcore: registered new interface driver cdc_acm
[ 778.301921] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[ 833.204100] usb 6-1: new full-speed USB device number 2 using uhci_hcd
[ 833.411088] usb 6-1: New USB device found, idVendor=0572, idProduct=1340
[ 833.412127] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 833.416129] usb 6-1: Product: USB Modem
[ 833.420123] usb 6-1: Manufacturer: Conexant
[ 833.420126] usb 6-1: SerialNumber: 12345678
[ 833.473854] cdc_acm:acm_probe: cdc_acm 6-1:1.0: interfaces are valid
[ 833.473876] BUG: unable to handle kernel NULL pointer dereference at 00000249
[ 833.473882] IP: [<e08fca6e>] acm_probe+0x540/0xd00 [cdc_acm]
[ 833.473885] *pde = 00000000
[ 833.473887] Oops: 0000 [#1] SMP
[ 833.473925] Modules linked in: cdc_acm nouveau video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit cfg80211 rfkill binfmt_misc svgalib_helper(O) snd_pcm_oss snd_mixer_oss fbcon bitblit softcursor font tileblit sr9700 dm9601 snd_hda_codec_generic usbnet usb_storage snd_hda_intel mii snd_hda_codec tg3 snd_hwdep snd_hda_core ptp pps_core snd_pcm libphy gpio_ich snd_timer firmware_class lpc_ich pcspkr ppdev snd ohci_pci mfd_core ohci_hcd floppy wmi uhci_hcd soundcore parport_pc acpi_cpufreq ehci_pci parport ehci_hcd processor button
[ 833.473928] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G O 4.9.0-rc1 #1
[ 833.473930] Hardware name: Hewlett-Packard HP xw4300 Workstation/0A00h, BIOS 786D3 v01.08 03/10/2006
[ 833.473935] Workqueue: usb_hub_wq hub_event
[ 833.473937] task: df4e15c0 task.stack: df4f4000
[ 833.473939] EIP: 0060:[<e08fca6e>] EFLAGS: 00010202 CPU: 0
[ 833.473942] EIP is at acm_probe+0x540/0xd00 [cdc_acm]
[ 833.473944] EAX: 00000246 EBX: dc4b2800 ECX: e08fe594 EDX: 00000000
[ 833.473945] ESI: 00000100 EDI: 00000000 EBP: df4f5c18 ESP: df4f5b80
[ 833.473947] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 833.473949] CR0: 80050033 CR2: 00000249 CR3: 1c8c4000 CR4: 00000690
[ 833.473950] Stack:
[ 833.473956] 00003a20 00003de7 0000000f df4a9d50 00000000 00000000 00000010 00000040
[ 833.473960] 00000080 00000246 dee5f000 d9614d80 d960e070 00000001 d2aee100 d960e000
[ 833.473965] d2aee138 dee5f400 dee5f000 00000000 c82931b0 00000004 00000246 df4f5c00
[ 833.473966] Call Trace:
[ 833.473975] [<c04d43f0>] ? __mutex_unlock_slowpath+0xf4/0xfc
[ 833.473978] [<c03d071c>] ? usb_probe_interface+0x17b/0x1f6
[ 833.473980] [<c03d071c>] ? usb_probe_interface+0x17b/0x1f6
[ 833.473984] [<c036396f>] ? driver_probe_device+0x17b/0x30e
[ 833.473986] [<c036396f>] ? driver_probe_device+0x17b/0x30e
[ 833.473989] [<c03620b4>] ? bus_for_each_drv+0x59/0x68
[ 833.473991] [<c03620b4>] ? bus_for_each_drv+0x59/0x68
[ 833.473993] [<c036371d>] ? __device_attach+0x91/0x105
[ 833.473996] [<c0363c03>] ? driver_allows_async_probing+0x2f/0x2f
[ 833.473998] [<c0362d3c>] ? bus_probe_device+0x27/0x6b
[ 833.474000] [<c0362d3c>] ? bus_probe_device+0x27/0x6b
[ 833.474002] [<c03614c4>] ? device_add+0x28d/0x4c0
[ 833.474006] [<c03cf081>] ? usb_set_configuration+0x594/0x5d7
[ 833.474008] [<c03cf081>] ? usb_set_configuration+0x594/0x5d7
[ 833.474012] [<c03d6878>] ? generic_probe+0x3b/0x67
[ 833.474014] [<c03d6878>] ? generic_probe+0x3b/0x67
[ 833.474016] [<c03d0588>] ? usb_probe_device+0x49/0x62
[ 833.474017] [<c03d053f>] ? usb_suspend+0xcd/0xcd
[ 833.474020] [<c036396f>] ? driver_probe_device+0x17b/0x30e
[ 833.474022] [<c036396f>] ? driver_probe_device+0x17b/0x30e
[ 833.474024] [<c03620b4>] ? bus_for_each_drv+0x59/0x68
[ 833.474026] [<c03620b4>] ? bus_for_each_drv+0x59/0x68
[ 833.474028] [<c036371d>] ? __device_attach+0x91/0x105
[ 833.474031] [<c0363c03>] ? driver_allows_async_probing+0x2f/0x2f
[ 833.474033] [<c0362d3c>] ? bus_probe_device+0x27/0x6b
[ 833.474035] [<c0362d3c>] ? bus_probe_device+0x27/0x6b
[ 833.474037] [<c03614c4>] ? device_add+0x28d/0x4c0
[ 833.474041] [<c035c1f7>] ? add_device_randomness+0x84/0x9c
[ 833.474043] [<c03c7508>] ? usb_new_device+0x29d/0x3b5
[ 833.474045] [<c03c7508>] ? usb_new_device+0x29d/0x3b5
[ 833.474048] [<c03c8c37>] ? hub_event+0xb32/0xed8
[ 833.474050] [<c03c8c37>] ? hub_event+0xb32/0xed8
[ 833.474052] [<c03c7ff4>] ? usb_remote_wakeup+0x6f/0x7d
[ 833.474056] [<c0148b8f>] ? process_one_work+0x174/0x2bc
[ 833.474058] [<c0148b8f>] ? process_one_work+0x174/0x2bc
[ 833.474061] [<c014916e>] ? worker_thread+0x22c/0x2f7
[ 833.474063] [<c0148f42>] ? rescuer_thread+0x242/0x242
[ 833.474065] [<c014c5ea>] ? kthread+0xa5/0xaa
[ 833.474067] [<c014c545>] ? kthread_park+0x4c/0x4c
[ 833.474070] [<c04d5f83>] ? ret_from_fork+0x1b/0x28
[ 833.474096] Code: 14 89 83 b4 04 00 00 8b 45 90 89 43 04 8b 45 ac 89 43 08 8b 85 7c ff ff ff 89 83 c0 04 00 00 8b 45 a4 89 03 8b 45 c0 85 c0 74 0a <0f> b6 40 03 89 83 c8 04 00 00 f6 45 9c 04 74 07 83 a3 c8 04 00
[ 833.474100] EIP: [<e08fca6e>]
[ 833.474101] acm_probe+0x540/0xd00 [cdc_acm]
[ 833.474101] SS:ESP 0068:df4f5b80
[ 833.474102] CR2: 0000000000000249
[ 833.474105] ---[ end trace c01a346ab38875ab ]---
[ 833.474138] BUG: unable to handle kernel paging request at ffffffec
[ 833.474142] IP: [<c014cd62>] kthread_data+0xf/0x13
[ 833.474144] *pde = 0077e067 *pte = 00000000
[ 833.474145]
[ 833.474147] Oops: 0000 [#2] SMP
[ 833.474177] Modules linked in: cdc_acm nouveau video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit cfg80211 rfkill binfmt_misc svgalib_helper(O) snd_pcm_oss snd_mixer_oss fbcon bitblit softcursor font tileblit sr9700 dm9601 snd_hda_codec_generic usbnet usb_storage snd_hda_intel mii snd_hda_codec tg3 snd_hwdep snd_hda_core ptp pps_core snd_pcm libphy gpio_ich snd_timer firmware_class lpc_ich pcspkr ppdev snd ohci_pci mfd_core ohci_hcd floppy wmi uhci_hcd soundcore parport_pc acpi_cpufreq ehci_pci parport ehci_hcd processor button
[ 833.474180] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G D O 4.9.0-rc1 #1
[ 833.474181] Hardware name: Hewlett-Packard HP xw4300 Workstation/0A00h, BIOS 786D3 v01.08 03/10/2006
[ 833.474189] task: df4e15c0 task.stack: df4f4000
[ 833.474191] EIP: 0060:[<c014cd62>] EFLAGS: 00010002 CPU: 0
[ 833.474193] EIP is at kthread_data+0xf/0x13
[ 833.474195] EAX: 00000000 EBX: df4e15c0 ECX: dfb95050 EDX: df4e15c0
[ 833.474196] ESI: df4e1874 EDI: df4e15c0 EBP: df4f5f4c ESP: df4f5f48
[ 833.474198] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 833.474200] CR0: 80050033 CR2: 00000014 CR3: 1c8c4000 CR4: 00000690
[ 833.474201] Stack:
[ 833.474206] c0149297 df4f5f70 c04d289f df401d80 c01e3800 dfb94b80 00000001 df4e15c0
[ 833.474211] df4f5f98 df4e15c0 df4f5f7c c0153935 df4f5d50 df4f5fac c0139f19 01000000
[ 833.474215] df4e17ec df4e0000 00000001 00000000 df4f5f98 df4f5f98 00000009 df4f6000
[ 833.474216] Call Trace:
[ 833.474219] [<c0149297>] ? wq_worker_sleeping+0xd/0x75
[ 833.474222] [<c04d289f>] ? __schedule+0xb7/0x3d0
[ 833.474225] [<c01e3800>] ? kmem_cache_free+0x73/0xf3
[ 833.474228] [<c0153935>] ? do_task_dead+0x35/0x37
[ 833.474232] [<c0139f19>] ? do_exit+0x735/0x75b
[ 833.474234] [<c04d6f99>] ? rewind_stack_do_exit+0x11/0x13
[ 833.474261] Code: 8d 64 12 4e c0 8d 0c 95 00 00 00 00 29 cb b9 02 00 00 00 89 da 5b 5d e9 db fd ff ff 55 89 e5 3e 8d 74 26 00 8b 80 84 02 00 00 5d <8b> 40 ec c3 55 89 e5 52 3e 8d 74 26 00 b9 04 00 00 00 8b 90 84
[ 833.474264] EIP: [<c014cd62>]
[ 833.474265] kthread_data+0xf/0x13
[ 833.474265] SS:ESP 0068:df4f5f48
[ 833.474266] CR2: 00000000ffffffec
[ 833.474268] ---[ end trace c01a346ab38875ac ]---
[ 833.474269] Fixing recursive fault but reboot is needed!

> When I decode it, seems to crash in acm_alloc_minor() which does not make
> sense. It is likely that our kernels or compilers are a bit different.
> Could you please call gdb on your kernel module cdc-acm.ko
>and do:
>
> list *(acm_probe+0x4ee)

I guess you'd want this time list *(acm_probe+0x540) because 540 is now
reported at [ 833.473882] IP: [<e08fca6e>] acm_probe+0x540/0xd00 [cdc_acm].

# gdb ./cdc-acm.ko
GNU gdb (Gentoo 7.10.1 vanilla) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./cdc-acm.ko...done.
(gdb) list *(acm_probe+0x540)
0x1a92 is in acm_probe (drivers/usb/class/cdc-acm.c:1340).
1335 acm->control = control_interface;
1336 acm->data = data_interface;
1337 acm->minor = minor;
1338 acm->dev = usb_dev;
1339 if (h.usb_cdc_acm_descriptor)
1340 acm->ctrl_caps = h.usb_cdc_acm_descriptor->bmCapabilities;
1341 if (quirks & NO_CAP_LINE)
1342 acm->ctrl_caps &= ~USB_CDC_CAP_LINE;
1343 acm->ctrlsize = ctrlsize;
1344 acm->readsize = readsize;
(gdb)
quit


Curiously enough, when I do that on 0x4ee, then I seen something that
reminds me of your patches:

(gdb) list *(acm_probe+0x4ee)
0x1a40 is in acm_probe (drivers/usb/class/cdc-acm.c:1332).
1327 WARN_ON(!epctrl);
1328 ctrlsize = usb_endpoint_maxp(epctrl);
1329 WARN_ON(!epread);
1330 readsize = usb_endpoint_maxp(epread) *
1331 (quirks == SINGLE_RX_URB ? 1 : 2);
1332 acm->combined_interfaces = combined_interfaces;
1333 WARN_ON(!epwrite);
1334 acm->writesize = usb_endpoint_maxp(epwrite) * 20;
1335 acm->control = control_interface;
1336 acm->data = data_interface;
(gdb)
quit


Regards, Wim.

2016-11-05 13:34:25

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, Oct 18, 2016 at 02:18:43PM +0200, Oliver Neukum wrote:
> On Mon, 2016-10-17 at 17:20 +0200, Wim Osterholt wrote:
> > On Mon, Oct 17, 2016 at 04:10:45PM +0200, Oliver Neukum wrote:
> > > Hi,
> > >
> > > I got one of those devices. However, I don't get a crash.
> > > Could you please give me instructions on how you trigger it?
> >
> > That's not too hard, just plug it in. :-)

> It definitely does not crash and is probed and your .config is not
> extremely unusual.

Hmmm.

> ... Something odd is going on.

You didn't try it on many machines, did you?
The latest install on a 'new' laptop (Dell latitude D610) did also crash.
For if it matters, they all have Intel chipset here.
Crashes now on five machines.

An worn-out Dell laptop (Inspiron 510m) suddenly got stuck in a reboot loop
for 4.7.10 and 4.9-rc3, but a 4.8-rc4 kept running. Even more miraculously,
it didn't crash on inserting the modem.
I could even compile a 4.9-rc3 that didn't crash on that machine.
The effect was even portable!

I now have a crashing and a non-crashing 4.9-rc3 kernel.
You can find the configs here:
http://webserver.djo.tudelft.nl/.config-4.9-rc3.notOK
http://webserver.djo.tudelft.nl/.config-4.9-rc3.OK

They are very different, so it will take a lot of time to eliminate the
options one by one.
So if you have an idea of which options, or combination of options are evil,
I'd like to hear.


Regards, Wim.

2016-11-15 00:16:25

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, Oct 18, 2016 at 02:18:43PM +0200, Oliver Neukum wrote:

> It definitely does not crash and is probed and your .config is not
> extremely unusual.

Hmmm.

> ... Something odd is going on.

Whell, yes.
The only thing that appears you'll have to do is unset 'CONFIG_SMP'.

My machines didn't have the luxury of multicore processors (until recently),
so there never has been any reason to deliberately switch these options on!

In the process of searching, many options may have changed. The crash/OOPS
has now mitigated into just a WARNING with a call trace.
(Or it could be a totally different bug?)
After the call trace the device is working normally and a shutdown
completes to the end now.
That is with the config given here:
http://webserver.djo.tudelft.nl/.config-4.9-rc4.OK (CONFIG_SMP=y)
http://webserver.djo.tudelft.nl/WARNING-4.9-rc4 (call trace for C_S unset)

Tests on other machines with (slightly) different configs all seem to
confirm that the problems are gone when CONFIG_SMP is set.

Regards, Wim.


2016-11-15 11:13:06

by Oliver Neukum

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, 2016-11-15 at 01:16 +0100, Wim Osterholt wrote:

Hi,

> Whell, yes.
> The only thing that appears you'll have to do is unset 'CONFIG_SMP'.

OK. I haven't tested that, nor would I ever considered it.

> My machines didn't have the luxury of multicore processors (until recently),
> so there never has been any reason to deliberately switch these options on!
>
> In the process of searching, many options may have changed. The crash/OOPS
> has now mitigated into just a WARNING with a call trace.
> (Or it could be a totally different bug?)
> After the call trace the device is working normally and a shutdown
> completes to the end now.
> That is with the config given here:
> http://webserver.djo.tudelft.nl/.config-4.9-rc4.OK (CONFIG_SMP=y)
> http://webserver.djo.tudelft.nl/WARNING-4.9-rc4 (call trace for C_S unset)
>
> Tests on other machines with (slightly) different configs all seem to
> confirm that the problems are gone when CONFIG_SMP is set.

OK, something extremely strange is going on. And I think it is time to get
the big hammer out. I made an extremely stupid debugging patch. Could
you test with it?

Regards
Oliver


Attachments:
0001-acm-insane-debugging.patch (5.32 kB)

2016-11-15 11:26:06

by poma

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On 15.11.2016 01:16, Wim Osterholt wrote:
> On Tue, Oct 18, 2016 at 02:18:43PM +0200, Oliver Neukum wrote:
>
>> It definitely does not crash and is probed and your .config is not
>> extremely unusual.
>
> Hmmm.
>
>> ... Something odd is going on.
>
> Whell, yes.
> The only thing that appears you'll have to do is unset 'CONFIG_SMP'.
>
> My machines didn't have the luxury of multicore processors (until recently),
> so there never has been any reason to deliberately switch these options on!
>
> In the process of searching, many options may have changed. The crash/OOPS
> has now mitigated into just a WARNING with a call trace.
> (Or it could be a totally different bug?)
> After the call trace the device is working normally and a shutdown
> completes to the end now.
> That is with the config given here:
> http://webserver.djo.tudelft.nl/.config-4.9-rc4.OK (CONFIG_SMP=y)
> http://webserver.djo.tudelft.nl/WARNING-4.9-rc4 (call trace for C_S unset)
>
> Tests on other machines with (slightly) different configs all seem to
> confirm that the problems are gone when CONFIG_SMP is set.
>
> Regards, Wim.
>
>

Try retest with mainline 4.9-rc5,
CONFIG_SMP was not crucial[1].

$ grep CONFIG_SMP /boot/config-4.9.0-0.rc5.git0.1.fc26.x86_64*
/boot/config-4.9.0-0.rc5.git0.1.fc26.x86_64:CONFIG_SMP=y
/boot/config-4.9.0-0.rc5.git0.1.fc26.x86_64+debug:CONFIG_SMP=y

[1] https://www.spinics.net/lists/linux-usb/msg148852.html


2016-11-15 13:29:37

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, Nov 15, 2016 at 12:26:00PM +0100, poma wrote:
> > In the process of searching, many options may have changed. The crash/OOPS
> > has now mitigated into just a WARNING with a call trace.
> > (Or it could be a totally different bug?)
> >
> > Tests on other machines with (slightly) different configs all seem to
> > confirm that the problems are gone when CONFIG_SMP is set.
> >
>
> Try retest with mainline 4.9-rc5,
> CONFIG_SMP was not crucial[1].

I did also test 4.9-rc5 and it behaves like all the rest (since 4.8).
My problem is gone when CONFIG_SMP is set.

That doesn't mean that there are no extra bugs here, dependant on the
presence or absence of other options.

> [1] https://www.spinics.net/lists/linux-usb/msg148852.html

I experience a sliding scale.
With debug it crashes immediately. It may crash later on (even at shutdown
time). It's not even an oops but a warning. In the end it happens to just
work.


Regards, Wim.

2016-11-16 12:40:29

by Oliver Neukum

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, 2016-11-15 at 14:29 +0100, Wim Osterholt wrote:

> I experience a sliding scale.
> With debug it crashes immediately. It may crash later on (even at shutdown
> time). It's not even an oops but a warning. In the end it happens to just
> work.

This is very odd. We need to know where it crashes. Please try the
insane debug patch I posted.

Regards
Oliver


2016-11-16 15:08:03

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Wed, Nov 16, 2016 at 01:34:30PM +0100, Oliver Neukum wrote:
>
> This is very odd. We need to know where it crashes. Please try the
> insane debug patch I posted.

A bit of patience please. Yesterday I hadn't the modem at hand.

Groeten, Wim.

2016-11-17 01:57:38

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Wed, Nov 16, 2016 at 04:07:57PM +0100, Wim Osterholt wrote:
> A bit of patience please. Yesterday I hadn't the modem at hand.

Whell, I lost track of what happens where with which config file..
Confusion about the bug not appearing an too many configs with SMP set
where I'm sure the machine(s) crashed on it.

The intention was to now use just one machine, to avoid confusion.
This is not going to work, due to problems with my video card.
(Crashes when it writes text beyond framebuffer space?)
VGA mode keeps working now.. We'll see..

Okay, a virgin start then.
A new 4.8.8 downloaded. A config from 4.7.9. Make oldconfig.
Accept all defaults for the new options.
CONFIG_SMP was set. Made two kernels, with and without SMP set.
Both kernels behaved the same: an OOPS at cdc_acm loading.
Fortunately the bug is still there.
In this very config the SMP setting does not seem to matter.

Now a retry of 4.9-rc5. I take the config of 4.8.8 and accept
the default for the new options.
SMP set. No call trace appears.
For completeness I should also try with SMP unset. That is for tomorrow
then.

BTW, your latest patch was not yet applied here. At work where I had no oops
today, it gave an output count from 0 to 41, looping (30-39) through 16
buffers. More tomorrow. Need urgent sleep now.

Regards, Wim.

2016-11-17 09:14:41

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Thu, Nov 17, 2016 at 02:57:33AM +0100, Wim Osterholt wrote:
> Now a retry of 4.9-rc5. I take the config of 4.8.8 and accept
> the default for the new options.
> SMP set. No call trace appears.
> For completeness I should also try with SMP unset. That is for tomorrow
> then.

With CONFIG_SMP unset nothing goes wrong here either.
It looks like it has been fixed in 4.9-rc5, but I should also double check
several conbinations on my (slower) laptop.

Regards, Wim.

2016-11-17 17:11:42

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Thu, Nov 17, 2016 at 10:14:34AM +0100, Wim Osterholt wrote:
> > For completeness I should also try with SMP unset. That is for tomorrow
> > then.
>
> With CONFIG_SMP unset nothing goes wrong here either.
> It looks like it has been fixed in 4.9-rc5, but I should also double check
> several conbinations on my (slower) laptop.

Nov 17 15:07:49 localhost kernel: usb 4-1: new full-speed USB device number 2 using uhci_hcd
Nov 17 15:07:49 localhost kernel: usb 4-1: New USB device found, idVendor=0572, idProduct=1340
Nov 17 15:07:49 localhost kernel: usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Nov 17 15:07:49 localhost kernel: usb 4-1: Product: USB Modem
Nov 17 15:07:49 localhost kernel: usb 4-1: Manufacturer: Conexant
Nov 17 15:07:49 localhost kernel: usb 4-1: SerialNumber: 12345678
Nov 17 15:07:50 localhost mtp-probe[12956]: checking bus 4, device 2: "/sys/devices/pci0000:00/0000:00:1d.2/usb4/4-1"
Nov 17 15:07:50 localhost mtp-probe[12956]: bus: 4, device: 2 was not an MTP device
Nov 17 15:07:51 localhost kernel: Check point 1
Nov 17 15:07:51 localhost kernel: Check point 2
Nov 17 15:07:51 localhost kernel: Check point 3
Nov 17 15:07:51 localhost kernel: Check point 4
Nov 17 15:07:51 localhost kernel: Check point 5
Nov 17 15:07:51 localhost kernel: Check point 6
Nov 17 15:07:51 localhost kernel: Check point 7
Nov 17 15:07:51 localhost kernel: Check point 8
Nov 17 15:07:51 localhost kernel: Check point 9
Nov 17 15:07:51 localhost kernel: Check point 10
Nov 17 15:07:51 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 00000249
Nov 17 15:07:51 localhost kernel: IP: [<e186ece2>] acm_probe+0x559/0xe53 [cdc_acm]
Nov 17 15:07:51 localhost kernel: *pde = 00000000
Nov 17 15:07:51 localhost kernel: Oops: 0000 [#1] SMP
...


You can get a few logs concatenated at
http://webserver.djo.tudelft.nl/insane488.logs

Usually the oops follows immediately after inserting the modem, but in one case
it took a few seconds extra before the oops. I think that is the one that
printed the full range of checkpoints.

On another 3 machines running 4.9-rc5 here everything went fine on insertion.
Switching back to 4.8.8 produced the bug with outputs similar to the above.

Now back to the laptops to see if I can get confirmation of having seen the
oops at 4.9-rc5 in some way. I'm not sure now.


Regards, Wim.

2016-11-21 13:25:41

by Oliver Neukum

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Thu, 2016-11-17 at 17:11 +0100, Wim Osterholt wrote:

> Nov 17 15:07:51 localhost kernel: Check point 10
> Nov 17 15:07:51 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 00000249
> Nov 17 15:07:51 localhost kernel: IP: [<e186ece2>] acm_probe+0x559/0xe53 [cdc_acm]
> Nov 17 15:07:51 localhost kernel: *pde = 00000000
> Nov 17 15:07:51 localhost kernel: Oops: 0000 [#1] SMP

I don't understand it, bit please test the attached patch
with dynamic debugging for cdc-acm and the kernel log level
at maximum. And please repost "lsusb -v" for your device.

Regards
Oliver


Attachments:
0001-CDC-ACM-debugging-for-parsed-descriptors.patch (1.09 kB)

2016-11-21 15:58:30

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Mon, Nov 21, 2016 at 02:19:32PM +0100, Oliver Neukum wrote:
> On Thu, 2016-11-17 at 17:11 +0100, Wim Osterholt wrote:
>
> > Nov 17 15:07:51 localhost kernel: Check point 10
> > Nov 17 15:07:51 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 00000249
> > Nov 17 15:07:51 localhost kernel: IP: [<e186ece2>] acm_probe+0x559/0xe53 [cdc_acm]
> > Nov 17 15:07:51 localhost kernel: *pde = 00000000
> > Nov 17 15:07:51 localhost kernel: Oops: 0000 [#1] SMP
>
> I don't understand it, bit please test the attached patch
> with dynamic debugging for cdc-acm and the kernel log level
> at maximum. And please repost "lsusb -v" for your device.

I didn't find traces of kernel-4.9-rc5 being ran on any of my laptops, so I
can't have seen a crash on rc5. It seems rc5 and rc6 is safe now.

I assume you want this on a crashing kernel, but I already removed the
sources. (Lack of space).
4.8.10 is now compiling, that was the fastest option. If that one doesn't
crash anymore I'll dig up 4.8.8 again.

lsusb -v:

Bus 004 Device 002: ID 0572:1340 Conexant Systems (Rockwell), Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 2 Communications
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x0572 Conexant Systems (Rockwell), Inc.
idProduct 0x1340
bcdDevice 1.00
iManufacturer 1 Conexant
iProduct 2 USB Modem
iSerial 3 12345678
bNumConfigurations 2
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 73
bNumInterfaces 2
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 128
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
CDC Header:
bcdCDC 1.10
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
CDC ACM:
bmCapabilities 0x07
sends break
line coding and serial state
get/set/clear comm features
CDC Union:
bMasterInterface 0
bSlaveInterface 1
Country Selection:
iCountryCodeRelDate 4 04052004
wCountryCode 0x4803
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 96
bNumInterfaces 3
bConfigurationValue 2
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 100mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 128
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 10
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 10
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 10 CDC Data
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
CDC Header:
bcdCDC 1.10
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
CDC ACM:
bmCapabilities 0x07
sends break
line coding and serial state
get/set/clear comm features
CDC Union:
bMasterInterface 0
bSlaveInterface 1
Country Selection:
iCountryCodeRelDate 4 04052004
wCountryCode 0x4803
Device Status: 0x0000
(Bus Powered)

2016-11-21 20:23:57

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Mon, Nov 21, 2016 at 04:58:25PM +0100, Wim Osterholt wrote:
>
> I didn't find traces of kernel-4.9-rc5 being ran on any of my laptops, so I
> can't have seen a crash on rc5. It seems rc5 and rc6 is safe now.

Neither 4.8.10, nor 4.8.9 show the bug.
It must be a bug ouside cdc_acm that they have fixed. (a late propagation of
the IRQ-penalty-bug-fix maybe?)

I'm rebuilding 4.8.8 now.

Groeten, Wim.

2016-11-21 23:49:57

by poma

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On 21.11.2016 21:23, Wim Osterholt wrote:
> On Mon, Nov 21, 2016 at 04:58:25PM +0100, Wim Osterholt wrote:
>>
>> I didn't find traces of kernel-4.9-rc5 being ran on any of my laptops, so I
>> can't have seen a crash on rc5. It seems rc5 and rc6 is safe now.
>
> Neither 4.8.10, nor 4.8.9 show the bug.
> It must be a bug ouside cdc_acm that they have fixed. (a late propagation of
> the IRQ-penalty-bug-fix maybe?)
>
> I'm rebuilding 4.8.8 now.
>
> Groeten, Wim.
>


After all the patching and testing I concluded the same,
breakage came and is gone outside drivers/usb/class/
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/?id=v4.9-rc5&id2=v4.9-rc4

2016-11-22 15:38:59

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Mon, Nov 21, 2016 at 02:19:32PM +0100, Oliver Neukum wrote:

> I don't understand it, bit please test the attached patch
> with dynamic debugging for cdc-acm and the kernel log level
> at maximum.

> diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
> index 6895f9e..f03b5db 100644
> --- a/drivers/usb/class/cdc-acm.c
> +++ b/drivers/usb/class/cdc-acm.c
> @@ -1188,6 +1188,12 @@ static int acm_probe(struct usb_interface *intf,
>
> cdc_parse_cdc_header(&h, intf, buffer, buflen);
> union_header = h.usb_cdc_union_desc;
> +
> + dev_dbg(&intf->dev, "Parsed device header\n");
> + dev_dbg(&intf->dev, "Union descriptor %p\n", h.usb_cdc_union_desc);
> + dev_dbg(&intf->dev, "ACM descriptor %p\n", h.usb_cdc_acm_descriptor);
> + dev_dbg(&intf->dev, "Country descriptor %p\n", h.usb_cdc_country_functional_desc);
> +
> cmgmd = h.usb_cdc_call_mgmt_descriptor;
> if (cmgmd)
> call_intf_num = cmgmd->bDataInterface;


On kernel 4.8.8 this crashes hard and produces over a serial link:

[ 156.842106] sysrq: SysRq : Changing Loglevel
[ 156.842110] sysrq: Loglevel set to 9
[ 156.947852] usbcore: registered new interface driver cdc_acm
[ 156.947854] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[ 161.176701] usb 4-1: new full-speed USB device number 2 using uhci_hcd
[ 161.383608] usb 4-1: New USB device found, idVendor=0572, idProduct=1340
[ 161.384707] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 161.388722] usb 4-1: Product: USB Modem
[ 161.392711] usb 4-1: Manufacturer: Conexant
[ 161.392714] usb 4-1: SerialNumber: 12345678
[ 161.397703] cdc_acm:acm_probe: cdc_acm 4-1:1.0: interfaces are valid
[ 161.397731] BUG: unable to handle kernel NULL pointer dereference at 00000249
[ 161.397740] IP: [<e086ad09>] acm_probe+0x580/0xd1e [cdc_acm]
[ 161.397742] *pde = 00000000
[ 161.397745] Oops: 0000 [#1] SMP
[ 161.397786] Modules linked in: cdc_acm radeon drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit fbcon bitblit softcursor font tileblit binfmt_misc snd_pcm_oss snd_mixer_oss usb_storage usbhid ipw2200 libipw lib80211 snd_intel8x0 cfg80211 snd_ac97_codec ac97_bus uhci_hcd snd_pcm ehci_pci snd_timer snd ehci_hcd rfkill usbcore soundcore via_rhine firmware_class ppdev pcspkr parport_pc mii lpc_ich parport fan usb_common acpi_cpufreq thermal mfd_core floppy button processor
[ 161.397790] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.8.8 #2
[ 161.397792] Hardware name: MEDIONPC MS-7048/MS-7048, BIOS 6.00 PG 02/12/2004
[ 161.397805] Workqueue: usb_hub_wq hub_event [usbcore]
[ 161.397807] task: df4c9500 task.stack: df4da000
[ 161.397810] EIP: 0060:[<e086ad09>] EFLAGS: 00010202 CPU: 0
[ 161.397813] EIP is at acm_probe+0x580/0xd1e [cdc_acm]
[ 161.397815] EAX: 00000246 EBX: dc27b000 ECX: e086c934 EDX: 00000000
[ 161.397817] ESI: 00000100 EDI: 00000000 EBP: df4dbc18 ESP: df4dbb80
[ 161.397819] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 161.397821] CR0: 80050033 CR2: 00000249 CR3: 1c45f000 CR4: 00000690
[ 161.397822] Stack:
[ 161.397828] 00003640 00003662 0000000e df491d50 00000000 00000000 00000010 00000040
[ 161.397835] 00000080 00000246 dd1fc540 decf5a00 dc468c70 00000001 df583a00 df583a38
[ 161.397841] dc468c00 decf5800 decf5a00 00000000 dc452ab0 00000004 00000246 df4dbc00
[ 161.397842] Call Trace:
[ 161.397853] [<c04bce4d>] ? __mutex_unlock_slowpath+0xf4/0xfc
[ 161.397862] [<e1e2e50d>] ? usb_probe_interface+0x17b/0x1f6 [usbcore]
[ 161.397870] [<e1e2e50d>] ? usb_probe_interface+0x17b/0x1f6 [usbcore]
[ 161.397877] [<c0366fe8>] ? driver_probe_device+0x17b/0x30e
[ 161.397880] [<c0366fe8>] ? driver_probe_device+0x17b/0x30e
[ 161.397883] [<c03656e2>] ? bus_for_each_drv+0x59/0x68
[ 161.397886] [<c03656e2>] ? bus_for_each_drv+0x59/0x68
[ 161.397890] [<c0366d96>] ? __device_attach+0x91/0x105
[ 161.397893] [<c036727c>] ? driver_allows_async_probing+0x2f/0x2f
[ 161.397896] [<c036636a>] ? bus_probe_device+0x27/0x6b
[ 161.397899] [<c036636a>] ? bus_probe_device+0x27/0x6b
[ 161.397902] [<c0364af0>] ? device_add+0x289/0x4be
[ 161.397911] [<e1e2ce72>] ? usb_set_configuration+0x5a6/0x5e9 [usbcore]
[ 161.397919] [<e1e2ce72>] ? usb_set_configuration+0x5a6/0x5e9 [usbcore]
[ 161.397928] [<e1e34664>] ? generic_probe+0x3b/0x67 [usbcore]
[ 161.397937] [<e1e34664>] ? generic_probe+0x3b/0x67 [usbcore]
[ 161.397945] [<e1e2e379>] ? usb_probe_device+0x49/0x62 [usbcore]
[ 161.397953] [<e1e2e330>] ? usb_suspend+0xcd/0xcd [usbcore]
[ 161.397957] [<c0366fe8>] ? driver_probe_device+0x17b/0x30e
[ 161.397960] [<c0366fe8>] ? driver_probe_device+0x17b/0x30e
[ 161.397963] [<c03656e2>] ? bus_for_each_drv+0x59/0x68
[ 161.397966] [<c03656e2>] ? bus_for_each_drv+0x59/0x68
[ 161.397969] [<c0366d96>] ? __device_attach+0x91/0x105
[ 161.397972] [<c036727c>] ? driver_allows_async_probing+0x2f/0x2f
[ 161.397976] [<c036636a>] ? bus_probe_device+0x27/0x6b
[ 161.397979] [<c036636a>] ? bus_probe_device+0x27/0x6b
[ 161.397982] [<c0364af0>] ? device_add+0x289/0x4be
[ 161.397985] [<c035f7e9>] ? add_device_randomness+0x84/0x9c
[ 161.397993] [<e1e2521a>] ? usb_new_device+0x29d/0x3b5 [usbcore]
[ 161.398001] [<e1e2521a>] ? usb_new_device+0x29d/0x3b5 [usbcore]
[ 161.398010] [<e1e26949>] ? hub_event+0xb32/0xed8 [usbcore]
[ 161.398017] [<e1e26949>] ? hub_event+0xb32/0xed8 [usbcore]
[ 161.398026] [<e1e25d06>] ? usb_remote_wakeup+0x6f/0x7d [usbcore]
[ 161.398031] [<c01484b7>] ? process_one_work+0x174/0x2bc
[ 161.398034] [<c01484b7>] ? process_one_work+0x174/0x2bc
[ 161.398037] [<c0148a93>] ? worker_thread+0x22c/0x2f6
[ 161.398040] [<c0148867>] ? rescuer_thread+0x23f/0x23f
[ 161.398043] [<c014be62>] ? kthread+0xa4/0xa9
[ 161.398046] [<c04be662>] ? ret_from_kernel_thread+0xe/0x24
[ 161.398049] [<c014bdbe>] ? kthread_create_on_node+0x101/0x101
[ 161.398085] Code: 14 89 83 b4 04 00 00 8b 45 94 89 43 04 8b 45 ac 89 43 08 8b 85 7c ff ff ff 89 83 c0 04 00 00 8b 45 a8 89 03 8b 45 c0 85 c0 74 0a <0f> b6 40 03 89 83 c8 04 00 00 f6 45 9c 04 74 07 83 a3 c8 04 00
[ 161.398091] EIP: [<e086ad09>] acm_probe+0x580/0xd1e [cdc_acm] SS:ESP 0068:df4dbb80
[ 161.398092] CR2: 0000000000000249
[ 161.398096] ---[ end trace da016e6d3520a331 ]---
[ 161.398152] BUG: unable to handle kernel paging request at ffffffec
[ 161.398156] IP: [<c014c304>] kthread_data+0xf/0x13
[ 161.398159] *pde = 00735067 *pte = 00000000
[ 161.398161] Oops: 0000 [#2] SMP
[ 161.398197] Modules linked in: cdc_acm radeon drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit fbcon bitblit softcursor font tileblit binfmt_misc snd_pcm_oss snd_mixer_oss usb_storage usbhid ipw2200 libipw lib80211 snd_intel8x0 cfg80211 snd_ac97_codec ac97_bus uhci_hcd snd_pcm ehci_pci snd_timer snd ehci_hcd rfkill usbcore soundcore via_rhine firmware_class ppdev pcspkr parport_pc mii lpc_ich parport fan usb_common acpi_cpufreq thermal mfd_core floppy button processor
[ 161.398200] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G D 4.8.8 #2
[ 161.398202] Hardware name: MEDIONPC MS-7048/MS-7048, BIOS 6.00 PG 02/12/2004
[ 161.398217] task: df4c9500 task.stack: df4da000
[ 161.398219] EIP: 0060:[<c014c304>] EFLAGS: 00010002 CPU: 0
[ 161.398221] EIP is at kthread_data+0xf/0x13
[ 161.398223] EAX: 00000000 EBX: df4dc000 ECX: dec92374 EDX: df4c9500
[ 161.398225] ESI: df4c97b4 EDI: dfbd0960 EBP: df4dbf48 ESP: df4dbf44
[ 161.398227] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 161.398229] CR0: 80050033 CR2: 00000014 CR3: 1c45f000 CR4: 00000690
[ 161.398231] Stack:
[ 161.398237] c0148bbb df4dbf6c c04bb2a4 df401d80 c01e2b00 df4c9500 00000001 df4dc000
[ 161.398244] df4dbd50 df4dbf98 df4dbf78 c04bb669 df4c9500 df4dbfac c0139827 df4c9888
[ 161.398250] 01000000 df4c972c df4c8000 00000001 00000000 df4dbf98 df4dbf98 00000009
[ 161.398251] Call Trace:
[ 161.398254] [<c0148bbb>] ? wq_worker_sleeping+0xd/0x75
[ 161.398259] [<c04bb2a4>] ? __schedule+0xcc/0x424
[ 161.398263] [<c01e2b00>] ? __slab_free+0x266/0x270
[ 161.398266] [<c04bb669>] ? schedule+0x6d/0x7a
[ 161.398270] [<c0139827>] ? do_exit+0x74d/0x775
[ 161.398274] [<c04bf679>] ? rewind_stack_do_exit+0x11/0x13
[ 161.398277] [<c014bdbe>] ? kthread_create_on_node+0x101/0x101
[ 161.398312] Code: 8d 44 90 4c c0 8d 0c 95 00 00 00 00 29 cb b9 02 00 00 00 89 da 5b 5d e9 f5 fd ff ff 55 89 e5 3e 8d 74 26 00 8b 80 84 02 00 00 5d <8b> 40 ec c3 55 89 e5 52 3e 8d 74 26 00 b9 04 00 00 00 8b 90 84
[ 161.398316] EIP: [<c014c304>] kthread_data+0xf/0x13 SS:ESP 0068:df4dbf44
[ 161.398318] CR2: 00000000ffffffec
[ 161.398320] ---[ end trace da016e6d3520a332 ]---
[ 161.398321] Fixing recursive fault but reboot is needed!


Regards, Wim.

2016-11-22 17:51:47

by Bjørn Mork

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

Wim Osterholt <[email protected]> writes:

> On Mon, Nov 21, 2016 at 02:19:32PM +0100, Oliver Neukum wrote:
>> On Thu, 2016-11-17 at 17:11 +0100, Wim Osterholt wrote:
>>
>> > Nov 17 15:07:51 localhost kernel: Check point 10
>> > Nov 17 15:07:51 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 00000249
>> > Nov 17 15:07:51 localhost kernel: IP: [<e186ece2>] acm_probe+0x559/0xe53 [cdc_acm]
>> > Nov 17 15:07:51 localhost kernel: *pde = 00000000
>> > Nov 17 15:07:51 localhost kernel: Oops: 0000 [#1] SMP
>>
>> I don't understand it, bit please test the attached patch
>> with dynamic debugging for cdc-acm and the kernel log level
>> at maximum. And please repost "lsusb -v" for your device.
>
> I didn't find traces of kernel-4.9-rc5 being ran on any of my laptops, so I
> can't have seen a crash on rc5. It seems rc5 and rc6 is safe now.
>
> I assume you want this on a crashing kernel, but I already removed the
> sources. (Lack of space).
> 4.8.10 is now compiling, that was the fastest option. If that one doesn't
> crash anymore I'll dig up 4.8.8 again.
>
> lsusb -v:
>
> Bus 004 Device 002: ID 0572:1340 Conexant Systems (Rockwell), Inc.
> Device Descriptor:
> bLength 18
> bDescriptorType 1
> bcdUSB 1.10
> bDeviceClass 2 Communications
> bDeviceSubClass 0
> bDeviceProtocol 0
> bMaxPacketSize0 64
> idVendor 0x0572 Conexant Systems (Rockwell), Inc.
> idProduct 0x1340
> bcdDevice 1.00
> iManufacturer 1 Conexant
> iProduct 2 USB Modem
> iSerial 3 12345678
> bNumConfigurations 2
> Configuration Descriptor:
> bLength 9
> bDescriptorType 2
> wTotalLength 73
> bNumInterfaces 2
> bConfigurationValue 1
> iConfiguration 0
> bmAttributes 0x80
> (Bus Powered)
> MaxPower 100mA
> Interface Descriptor:
> bLength 9
> bDescriptorType 4
> bInterfaceNumber 0
> bAlternateSetting 0
> bNumEndpoints 1
> bInterfaceClass 2 Communications
> bInterfaceSubClass 2 Abstract (modem)
> bInterfaceProtocol 1 AT-commands (v.25ter)
> iInterface 0
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x81 EP 1 IN
> bmAttributes 3
> Transfer Type Interrupt
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 128
> Interface Descriptor:
> bLength 9
> bDescriptorType 4
> bInterfaceNumber 1
> bAlternateSetting 0
> bNumEndpoints 2
> bInterfaceClass 10 CDC Data
> bInterfaceSubClass 0 Unused
> bInterfaceProtocol 0
> iInterface 0
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x82 EP 2 IN
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 1
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x02 EP 2 OUT
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 1
> CDC Header:
> bcdCDC 1.10
> CDC Call Management:
> bmCapabilities 0x03
> call management
> use DataInterface
> bDataInterface 1
> CDC ACM:
> bmCapabilities 0x07
> sends break
> line coding and serial state
> get/set/clear comm features
> CDC Union:
> bMasterInterface 0
> bSlaveInterface 1
> Country Selection:
> iCountryCodeRelDate 4 04052004
> wCountryCode 0x4803
> Configuration Descriptor:
> bLength 9
> bDescriptorType 2
> wTotalLength 96
> bNumInterfaces 3
> bConfigurationValue 2
> iConfiguration 0
> bmAttributes 0x80
> (Bus Powered)
> MaxPower 100mA
> Interface Descriptor:
> bLength 9
> bDescriptorType 4
> bInterfaceNumber 0
> bAlternateSetting 0
> bNumEndpoints 1
> bInterfaceClass 2 Communications
> bInterfaceSubClass 2 Abstract (modem)
> bInterfaceProtocol 1 AT-commands (v.25ter)
> iInterface 0
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x81 EP 1 IN
> bmAttributes 3
> Transfer Type Interrupt
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 128
> Interface Descriptor:
> bLength 9
> bDescriptorType 4
> bInterfaceNumber 1
> bAlternateSetting 0
> bNumEndpoints 2
> bInterfaceClass 10 CDC Data
> bInterfaceSubClass 0 Unused
> bInterfaceProtocol 0
> iInterface 0
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x82 EP 2 IN
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 10
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x02 EP 2 OUT
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 10
> Interface Descriptor:
> bLength 9
> bDescriptorType 4
> bInterfaceNumber 2
> bAlternateSetting 0
> bNumEndpoints 2
> bInterfaceClass 10 CDC Data
> bInterfaceSubClass 0 Unused
> bInterfaceProtocol 0
> iInterface 0
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x83 EP 3 IN
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 1
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x03 EP 3 OUT
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0040 1x 64 bytes
> bInterval 1
> CDC Header:
> bcdCDC 1.10
> CDC Call Management:
> bmCapabilities 0x03
> call management
> use DataInterface
> bDataInterface 1
> CDC ACM:
> bmCapabilities 0x07
> sends break
> line coding and serial state
> get/set/clear comm features
> CDC Union:
> bMasterInterface 0
> bSlaveInterface 1
> Country Selection:
> iCountryCodeRelDate 4 04052004
> wCountryCode 0x4803

No excuse for crashing of course, but that's one of the sickets
descriptor sets I've seen today. Who got the bright idea to put the
communication class functional descriptors on the data class interfaces?
And what's with the second data interface? How is the host supposed to
make any use of that when both(!) the CDC Union descriptors refer to
interface 0 and 1 only? Not that we can use those union descriptors for
much anyway since we have to guess the relationship between control and
data interface before we can get to it...

So I'm not surprised that this is unexpected by the driver. We just
need to figure out how to ignore the noise and carry on.

But looking at the driver, it looks like that is exactly what it should
do. This device has the NO_UNION_NORMAL quirk so normal probing is
skipped and we will just use interfaces 0 and 1. Which is the only sane
thing to do given the above mess...

Don't understand how it could crash.



Bjørn

2016-11-22 18:09:42

by Bjørn Mork

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

Wim Osterholt <[email protected]> writes:

> On Mon, Nov 21, 2016 at 02:19:32PM +0100, Oliver Neukum wrote:
>
>> I don't understand it, bit please test the attached patch
>> with dynamic debugging for cdc-acm and the kernel log level
>> at maximum.
>
>> diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
>> index 6895f9e..f03b5db 100644
>> --- a/drivers/usb/class/cdc-acm.c
>> +++ b/drivers/usb/class/cdc-acm.c
>> @@ -1188,6 +1188,12 @@ static int acm_probe(struct usb_interface *intf,
>>
>> cdc_parse_cdc_header(&h, intf, buffer, buflen);
>> union_header = h.usb_cdc_union_desc;
>> +
>> + dev_dbg(&intf->dev, "Parsed device header\n");
>> + dev_dbg(&intf->dev, "Union descriptor %p\n", h.usb_cdc_union_desc);
>> + dev_dbg(&intf->dev, "ACM descriptor %p\n", h.usb_cdc_acm_descriptor);
>> + dev_dbg(&intf->dev, "Country descriptor %p\n", h.usb_cdc_country_functional_desc);
>> +
>> cmgmd = h.usb_cdc_call_mgmt_descriptor;
>> if (cmgmd)
>> call_intf_num = cmgmd->bDataInterface;
>
>
> On kernel 4.8.8 this crashes hard and produces over a serial link:

Huh? That device shouldn't ever enter that code path AFAICS.
Unless.... you wouldn't happen to add a dynamic entry for this device,
would you? What's the output of

cat /sys/bus/usb/drivers/cdc_acm/new_id

?

We should probably survive it, but I think the current acm_probe() is
going to barf hard on the last data interface, if probed without the
default NO_UNION_NORMAL quirk. cdc_parse_cdc_header() will happily
parse all the functional descriptors, including the union pointing to
interfaces 0 and 1. I might be missing it, but I cannot see any sanity
check verifying that the currently probed interface is actually part of
the set of interfaces pointed to by the union. There is a sanity check
for the availability of the data interface, but there is none for the
control interface (the assumption of course that that's the interface
we're probing).

I think we need a bit more sanity checking of the union. This is likely
a generic problem for any CDC driver, so it is worth considering adding
a shared function for that.

And all this fails to explain anything if you didn't add the device
dynamically, of course...



Bjørn



2016-11-23 00:48:39

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, Nov 22, 2016 at 06:50:28PM +0100, Bj?rn Mork wrote:
> > iCountryCodeRelDate 4 04052004
> > wCountryCode 0x4803
>
> No excuse for crashing of course, but that's one of the sickets
> descriptor sets I've seen today. Who got the bright idea to put the
> communication class functional descriptors on the data class interfaces?

Whell, the chinese of coarse. It's all chinese to me. But maybe they made
this time an exact copy of their example from Conexant. Not that they are
that careful usually.

>...
> Don't understand how it could crash.

The oops does normally not immediately lead to a crash. Only with debugging
on it will halt immediately and the log will tell you that a reboot will
be necessairy.

Wim.

2016-11-23 00:55:00

by wim

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Tue, Nov 22, 2016 at 07:08:30PM +0100, Bj?rn Mork wrote:
> > On kernel 4.8.8 this crashes hard and produces over a serial link:
>
> Huh? That device shouldn't ever enter that code path AFAICS.
> Unless.... you wouldn't happen to add a dynamic entry for this device,

No idea of what you mean here.

> would you? What's the output of
>
> cat /sys/bus/usb/drivers/cdc_acm/new_id

Just empty.

Wim.

2016-11-23 07:53:44

by Bjørn Mork

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5



On November 23, 2016 1:54:57 AM CET, Wim Osterholt <[email protected]> wrote:
>On Tue, Nov 22, 2016 at 07:08:30PM +0100, Bjørn Mork wrote:
>> > On kernel 4.8.8 this crashes hard and produces over a serial link:
>>
>> Huh? That device shouldn't ever enter that code path AFAICS.
>> Unless.... you wouldn't happen to add a dynamic entry for this
>device,
>
>No idea of what you mean here.
>
>> would you? What's the output of
>>
>> cat /sys/bus/usb/drivers/cdc_acm/new_id
>
>Just empty.

Shit. Back to not understanding how you could possibly enter the debugging code at all.

Bjørn

2016-11-23 15:02:43

by Alan Stern

[permalink] [raw]
Subject: Re: crash by cdc_acm driver in kernels 4.8-rc1/5

On Wed, 23 Nov 2016, [ISO-8859-1] Bj?rn Mork wrote:

> On November 23, 2016 1:54:57 AM CET, Wim Osterholt <[email protected]> wrote:
> >On Tue, Nov 22, 2016 at 07:08:30PM +0100, Bjørn Mork wrote:
> >> > On kernel 4.8.8 this crashes hard and produces over a serial link:
> >>
> >> Huh? That device shouldn't ever enter that code path AFAICS.
> >> Unless.... you wouldn't happen to add a dynamic entry for this
> >device,
> >
> >No idea of what you mean here.
> >
> >> would you? What's the output of
> >>
> >> cat /sys/bus/usb/drivers/cdc_acm/new_id
> >
> >Just empty.
>
> Shit. Back to not understanding how you could possibly enter the debugging code at all.

You're in the lucky position of having an engaged and responsive bug
reporter who is willing to apply patches and run tests. Just write a
simple diagnostic patch that will reveal exactly what pathway is being
followed.

Alan Stern