LinuxLists.cc - A reliable kernel panic (3.6.2) and system crash when visiting a particular website

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, Oct 20, 2012 at 12:06:55PM +0000, Artem S. Tashkinov wrote:
> Hello,
>
> I'm running vanilla Linux 3.6.2 x86 on top of CentOS 6.3 userspace.
>
> Every time when I enter the chat roulette website, right click anywhere and choose "Settings",
> my PC crashes (with or without NVIDIA drivers running, it happens even when I'm running Vesa).
>
> Web browser: google-chrome-stable-22.0.1229.94-161065.i386.rpm
> OS: Linux 3.6.2 vanilla x86
> CPU: Intel Core i5 2500 (non-overclocked)
> GCC: 4.7.2 vanilla
>
> The latest crash:
>
> Oct 20 07:15:22 localhost kernel: [ 224.293756] Modules linked in: pppoe pppox ppp_synctty ppp_async crc_ccitt ppp_generic slhc ipv6 nf_conntrack_ftp nf_conntrack_netbios_ns nf_conntrack_broadcast xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp xt_pkttype ipt_ULOG xt_owner xt_multiport iptable_filter ip_tables x_tables w83627ehf adt7475 hwmon_vid vboxpci(O)
> vboxnetadp(O) vboxnetflt(O) vboxdrv(O) binfmt_misc fuse hid_generic snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi uvcvideo videobuf2_core videodev
> videobuf2_vmalloc videobuf2_memops usbhid hid sr_mod cdrom coretemp aesni_intel ablk_helper cryptd aes_i586 aes_generic microcode agpgart pcspkr snd_hda_codec_realtek
> snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd snd_page_alloc i2c_i801 sg xhci_hcd fan ehci_hcd e1000e evdev [last unloaded: nvidia]
>
> Oct 20 07:15:22 localhost kernel: [ 224.293811] Pid: 2569, comm: console-kit-dae Tainted: P O 3.6.2-ic #2

Yeah, your kernel is tainted with a proprietary module (vbox*, etc). Can
you reproduce your corruptions (this is what it looks like) without that
module?
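
For reference, the letters after "Tainted:" in the oops line each stand for one taint condition: "P" means a proprietary module has been loaded and "O" means an out-of-tree module has been loaded. A minimal sketch of decoding them follows; the function name and the partial table are illustrative assumptions, not a kernel interface.

```python
# Partial table of kernel taint letters (illustrative, not exhaustive).
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module was loaded",
    "O": "out-of-tree module was loaded",
    "F": "module was force-loaded",
    "D": "kernel has already oopsed",
    "W": "kernel has issued a warning",
}


def decode_taint(line):
    """Return (letter, meaning) pairs for the flags that follow 'Tainted:'."""
    marker = "Tainted:"
    if marker not in line:
        return []
    found = []
    for token in line.split(marker, 1)[1].split():
        # Flag tokens are runs of upper-case letters; the first token that
        # is not (the kernel version string) ends the flag field.
        if not (token.isalpha() and token.isupper()):
            break
        for letter in token:
            found.append((letter, TAINT_FLAGS.get(letter, "not in this table")))
    return found


oops_line = "Pid: 2569, comm: console-kit-dae Tainted: P O 3.6.2-ic #2"
for letter, meaning in decode_taint(oops_line):
    print(letter, "-", meaning)
```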

Thanks.

--
Regards/Gruss,
Boris.

2012-10-20 17:41:52

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Oct 20, 2012, Borislav Petkov wrote:

> Yeah, your kernel is tainted with a proprietary module (vbox*, etc). Can
> you reproduce your corruptions (this is what it looks like) without that
> module?

Yes, I can reproduce this panic with zero proprietary/non-free modules loaded.

The problem is the kernel doesn't even print a kernel panic - the
system just freezes completely - cursor in a text console stops blinking.

I have no means to debug it using a serial console - what can I do?

Attachments:

.config (69.87 kB)

2012-10-20 18:04:47

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, Oct 20, 2012 at 05:41:49PM +0000, Artem S. Tashkinov wrote:
> On Oct 20, 2012, Borislav Petkov wrote:
>
> > Yeah, your kernel is tainted with a proprietary module (vbox*, etc). Can
> > you reproduce your corruptions (this is what it looks like) without that
> > module?
>
> Yes, I can reproduce this panic with zero proprietary/non-free modules loaded.
>
> The problem is the kernel doesn't even print a kernel panic - the
> system just freezes completely - cursor in a text console stops blinking.
>
> I have no means to debug it using a serial console - what can I do?

Ok, here's what you can try:

* You say this happens with google chrome. Does it happen if you use
  another browser: firefox, etc?

* Can you build a 64-bit kernel and try the same with it? The 32-bit
  userspace should work in compat mode just fine.

* Can you run memtest on your machine and check whether your DIMMs
  aren't generating ECC errors? Are your DIMMs ECC, btw?

* What about netconsole? You only need another machine on the same
  network: Documentation/networking/netconsole.txt.

* boot with "pause_on_oops=600" on the kernel command line to stop the
  machine for 600 secs after the first oops happens. Then try to make a
  photo of the screen. Make sure to disable X or to be on a text console
  so that you can see the oops.

* Try enabling a bunch of debugging options in "Kernel hacking". More
  specifically,

CONFIG_DETECT_HUNG_TASK
CONFIG_DEBUG_PREEMPT
CONFIG_DEBUG_SPINLOCK
CONFIG_DEBUG_MUTEXES
CONFIG_DEBUG_LOCK_ALLOC
CONFIG_PROVE_LOCKING
CONFIG_PROVE_RCU
CONFIG_DEBUG_ATOMIC_SLEEP
CONFIG_DEBUG_BUGVERBOSE
CONFIG_DEBUG_INFO
CONFIG_DEBUG_VM
CONFIG_DEBUG_VIRTUAL
CONFIG_DEBUG_MEMORY_INIT
CONFIG_DEBUG_LIST
CONFIG_X86_VERBOSE_BOOTUP
CONFIG_DEBUG_RODATA
...

I hope those should scream in case something goes awry.
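
One way to confirm that a rebuilt kernel really has these options switched on is to scan its .config. The following is only a rough sketch: the helper name and the sample text are made up for illustration, and it assumes the usual "CONFIG_FOO=y" / "# CONFIG_FOO is not set" layout of a .config file.

```python
import re

# The debugging options named in the list above.
WANTED = [
    "CONFIG_DETECT_HUNG_TASK", "CONFIG_DEBUG_PREEMPT", "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES", "CONFIG_DEBUG_LOCK_ALLOC", "CONFIG_PROVE_LOCKING",
    "CONFIG_PROVE_RCU", "CONFIG_DEBUG_ATOMIC_SLEEP", "CONFIG_DEBUG_BUGVERBOSE",
    "CONFIG_DEBUG_INFO", "CONFIG_DEBUG_VM", "CONFIG_DEBUG_VIRTUAL",
    "CONFIG_DEBUG_MEMORY_INIT", "CONFIG_DEBUG_LIST",
    "CONFIG_X86_VERBOSE_BOOTUP", "CONFIG_DEBUG_RODATA",
]


def missing_options(config_text, wanted=WANTED):
    """Return the wanted options that are not set to 'y' or 'm' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return [opt for opt in wanted if opt not in enabled]


# Hypothetical three-line excerpt standing in for a real .config file.
sample = (
    "CONFIG_DEBUG_INFO=y\n"
    "# CONFIG_DEBUG_VM is not set\n"
    "CONFIG_PROVE_LOCKING=y\n"
)
for opt in missing_options(sample):
    print("not enabled:", opt)
```

To use it on a real tree, read the contents of the .config file into a string and pass that instead of the sample.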

HTH.

--
Regards/Gruss,
Boris.

2012-10-20 20:32:33

by Pavel Machek

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat 2012-10-20 17:41:49, Artem S. Tashkinov wrote:
> On Oct 20, 2012, Borislav Petkov wrote:
>
> > Yeah, your kernel is tainted with a proprietary module (vbox*, etc). Can
> > you reproduce your corruptions (this is what it looks like) without that
> > module?
>
> Yes, I can reproduce this panic with zero proprietary/non-free modules loaded.
>
> The problem is the kernel doesn't even print a kernel panic - the
> system just freezes completely - cursor in a text console stops
> blinking.

bugtraq? :-).

If remote website can crash your Linux, that's quite significant news.

(Cc-ed netdev@ and security@ ... this may be important).
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2012-10-20 20:57:59

Subject: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Oct 21, 2012, Borislav Petkov wrote:

> Ok, here's what you can try:
>
> * You say this happens with google chrome. Does it happen if you use
> another browser: firefox, etc?
>
> * Can you build a 64-bit kernel and try the same with it? The 32-bit
> userspace should work in compat mode just fine.
>
> * Can you run memtest on your machine and check whether your DIMMs
> aren't generating ECC errors? Are your DIMMs ECC, btw?
> ...

I can reproduce this problem in a virtual machine, which means I have found a real
kernel or GCC bug. Alas, VirtualBox 4.2.2 hangs entirely when I run this virtual machine -
I've never seen anything like that. Windows 7 64 bit which hosts this VirtualBox cannot
even kill a VirtualBox instance.

Unfortunately, even though I boot the kernel with "console=ttyS0,115200 console=tty0",
it doesn't help - I see no panic messages on the "virtual" serial port, so it looks
like we've got a very deep freeze.

2012-10-20 22:00:59

Subject: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

Hi,

I can only reproduce this panic when my USB webcamera is plugged in - when
I click settings in Adobe Flash it sends some commands to my USB webcam using,
presumably, Video4Linux API calls which cause a kernel hard crash.

Your kernel debug features haven't helped at all; even the virtual machine
crashes in such a way that I cannot get any information from it - under
Windows 7 64 VirtualBox becomes an unkillable process.

I've no idea what's crashing - it can be the kernel itself, or some of v4l or usb
modules.

Artem

2012-10-20 22:58:54

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, Oct 20, 2012 at 10:32:28PM +0200, Pavel Machek wrote:
> On Sat 2012-10-20 17:41:49, Artem S. Tashkinov wrote:
> > On Oct 20, 2012, Borislav Petkov wrote:
> >
> > > Yeah, your kernel is tainted with a proprietary module (vbox*, etc). Can
> > > you reproduce your corruptions (this is what it looks like) without that
> > > module?
> >
> > Yes, I can reproduce this panic with zero proprietary/non-free modules loaded.
> >
> > The problem is the kernel doesn't even print a kernel panic - the
> > system just freezes completely - cursor in a text console stops
> > blinking.
>
> bugtraq? :-).
>
> If remote website can crash your Linux, that's quite significant news.
>
> (Cc-ed netdev@ and security@ ... this may be important).

I don't think that's the problem - I rather suspect the fact that he's
using virtualbox which is causing random corruptions by writing to
arbitrary locations.

Artem,

please remove virtualbox completely from your system, rebuild the kernel
and make sure the virtualbox kernel modules don't get loaded - simply
delete them so that they are completely gone; *and* *then* retest again.

Thanks.

--
Regards/Gruss,
Boris.

2012-10-20 23:15:24

Subject: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

You don't get me - I have *no* VirtualBox (or any proprietary) modules running
- but I can reproduce this problem using *the same system running under* VirtualBox
in Windows 7 64.

It's almost definitely either a USB driver bug or video4linux driver bug:

I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
https://lkml.org/lkml/2012/10/20/35
https://lkml.org/lkml/2012/10/20/148

Here are the last lines from my dmesg (with usbmon loaded):

[ 292.164833] hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0002
[ 292.168091] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 00100a 0 ACK POWER sig=se0 PEC CSC
[ 292.172063] hub 1-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
[ 292.174883] usb 1-1: USB disconnect, device number 2
[ 292.178045] usb 1-1: unregistering device
[ 292.183539] usb 1-1: unregistering interface 1-1:1.0
[ 292.197034] usb 1-1: unregistering interface 1-1:1.1
[ 292.204317] usb 1-1: unregistering interface 1-1:1.2
[ 292.234519] usb 1-1: unregistering interface 1-1:1.3
[ 292.236175] usb 1-1: usb_disable_device nuking all URBs
[ 292.364429] hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
[ 294.364279] hub 1-0:1.0: hub_suspend
[ 294.366045] usb usb1: bus auto-suspend, wakeup 1
[ 294.367375] ehci_hcd 0000:00:1f.5: suspend root hub
[ 296.501084] usb usb1: usb wakeup-resume
[ 296.508311] usb usb1: usb auto-resume
[ 296.509833] ehci_hcd 0000:00:1f.5: resume root hub
[ 296.560149] hub 1-0:1.0: hub_resume
[ 296.562240] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 001003 0 ACK POWER sig=se0 CSC CONNECT
[ 296.566141] hub 1-0:1.0: port 1: status 0501 change 0001
[ 296.670413] hub 1-0:1.0: state 7 ports 8 chg 0002 evt 0000
[ 296.673222] hub 1-0:1.0: port 1, status 0501, change 0000, 480 Mb/s
[ 297.311720] usb 1-1: new high-speed USB device number 3 using ehci_hcd
[ 300.547237] usb 1-1: skipped 1 descriptor after configuration
[ 300.549443] usb 1-1: skipped 4 descriptors after interface
[ 300.552273] usb 1-1: skipped 2 descriptors after interface
[ 300.556499] usb 1-1: skipped 1 descriptor after endpoint
[ 300.559392] usb 1-1: skipped 2 descriptors after interface
[ 300.560960] usb 1-1: skipped 1 descriptor after endpoint
[ 300.562169] usb 1-1: skipped 2 descriptors after interface
[ 300.563440] usb 1-1: skipped 1 descriptor after endpoint
[ 300.564639] usb 1-1: skipped 2 descriptors after interface
[ 300.565828] usb 1-1: skipped 2 descriptors after endpoint
[ 300.567084] usb 1-1: skipped 9 descriptors after interface
[ 300.569205] usb 1-1: skipped 1 descriptor after endpoint
[ 300.570484] usb 1-1: skipped 53 descriptors after interface
[ 300.595843] usb 1-1: default language 0x0409
[ 300.602503] usb 1-1: USB interface quirks for this device: 2
[ 300.605700] usb 1-1: udev 3, busnum 1, minor = 2
[ 300.606959] usb 1-1: New USB device found, idVendor=046d, idProduct=081d
[ 300.610298] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=1
[ 300.613742] usb 1-1: SerialNumber: 48C5D2B0
[ 300.617703] usb 1-1: usb_probe_device
[ 300.620594] usb 1-1: configuration #1 chosen from 1 choice
[ 300.639218] usb 1-1: adding 1-1:1.0 (config #1, interface 0)
[ 300.640736] snd-usb-audio 1-1:1.0: usb_probe_interface
[ 300.642307] snd-usb-audio 1-1:1.0: usb_probe_interface - got id
[ 301.050296] usb 1-1: adding 1-1:1.1 (config #1, interface 1)
[ 301.054897] usb 1-1: adding 1-1:1.2 (config #1, interface 2)
[ 301.056934] uvcvideo 1-1:1.2: usb_probe_interface
[ 301.058072] uvcvideo 1-1:1.2: usb_probe_interface - got id
[ 301.059395] uvcvideo: Found UVC 1.00 device <unnamed> (046d:081d)
[ 301.090173] input: UVC Camera (046d:081d) as /devices/pci0000:00/0000:00:1f.5/usb1/1-1/1-1:1.2/input/input7
[ 301.111289] usb 1-1: adding 1-1:1.3 (config #1, interface 3)
[ 301.131207] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
[ 301.137066] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
[ 301.156451] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
[ 301.158310] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
[ 301.160238] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
[ 301.196606] set resolution quirk: cval->res = 384
[ 371.309569] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 390.729568] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
f5ade900 2296555[ 390.730023] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
437 S Ii:1:003:7[ 390.736394] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
-115:128 16 <
f5ade900 2296566256 C Ii:1:003:7 -2:128 0
[ 391.100896] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
[ 391.103188] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
f5ade900 2296926929 S Ii:1:003:7[ 391.104889] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
-115:128 16 <
f5ade900 2296937889 C Ii:1:003:7 -2:128 0
f5272300 2310382508 S Co:1:003:0 s 01 0b 0004 0001 0000 0
f5272300 2310407888 C Co:1:003:0 0 0
f5272300 2310408051 S Co:1:003:0 s 22 01 0100 0086 0003 3 = 80bb00
f5272300 2310412456 C Co:1:003:0 0 3 >
f5272300 2310412521 S Ci:1:003:0 s a2 81 0100 0086 0003 3 <
f5272300 2310415909 C Ci:1:003:0 0 0
f5272300 2310418133 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f5272600 2310418219 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f52720c0 2310418239 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f5272a80 2310418247 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f5272480 2310418256 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f52723c0 2310418264 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f5272d80 2310418272 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
f5272b40 2310418280 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <

Hard freeze with 100% CPU usage at this point as if some driver got into an
infinite loop or something.

All debug options from https://lkml.org/lkml/2012/10/20/116 are enabled, but
serial console is empty.

Best wishes,

Artem

On Oct 21, 2012, Borislav Petkov wrote:

> I don't think that's the problem - I rather suspect the fact that he's
> using virtualbox which is causing random corruptions by writing to
> arbitrary locations.
>
>
>
> please remove virtualbox completely from your system, rebuild the kernel
> and make sure the virtualbox kernel modules don't get loaded - simply
> delete them so that they are completely gone; *and* *then* retest again.

2012-10-21 00:24:31

Subject: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, Oct 20, 2012 at 11:15:17PM +0000, Artem S. Tashkinov wrote:
> You don't get me - I have *no* VirtualBox (or any proprietary) modules
> running

Ok, good. We got that out of the way - I wanted to make sure after you
replied with two other possibilities of the system freezing.

> - but I can reproduce this problem using *the same system running
> under* VirtualBox in Windows 7 64.

That's windoze as host and linux as a guest, correct?

If so, that's virtualbox's problem, I'd say.

> It's almost definitely either a USB driver bug or video4linux driver
> bug:

And you're assuming that because the freeze happens when using your usb
webcam, correct? And not otherwise?

Maybe you can describe in more detail what exactly you're doing so that
people could try to reproduce your issue.

> I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
> https://lkml.org/lkml/2012/10/20/35
> https://lkml.org/lkml/2012/10/20/148

Yes, good idea. Maybe the folks there have some more ideas how to debug
this.

I'm leaving in the rest for reference.

What should be pointed out, though, is that you don't have any more
random corruptions causing oopses now that virtualbox is gone. The
freeze below is a whole another issue.

Thanks.

> Here are the last lines from my dmesg (with usbmon loaded):
>
> [ 292.164833] hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0002
> [ 292.168091] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 00100a 0 ACK POWER sig=se0 PEC CSC
> [ 292.172063] hub 1-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
> [ 292.174883] usb 1-1: USB disconnect, device number 2
> [ 292.178045] usb 1-1: unregistering device
> [ 292.183539] usb 1-1: unregistering interface 1-1:1.0
> [ 292.197034] usb 1-1: unregistering interface 1-1:1.1
> [ 292.204317] usb 1-1: unregistering interface 1-1:1.2
> [ 292.234519] usb 1-1: unregistering interface 1-1:1.3
> [ 292.236175] usb 1-1: usb_disable_device nuking all URBs
> [ 292.364429] hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
> [ 294.364279] hub 1-0:1.0: hub_suspend
> [ 294.366045] usb usb1: bus auto-suspend, wakeup 1
> [ 294.367375] ehci_hcd 0000:00:1f.5: suspend root hub
> [ 296.501084] usb usb1: usb wakeup-resume
> [ 296.508311] usb usb1: usb auto-resume
> [ 296.509833] ehci_hcd 0000:00:1f.5: resume root hub
> [ 296.560149] hub 1-0:1.0: hub_resume
> [ 296.562240] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 001003 0 ACK POWER sig=se0 CSC CONNECT
> [ 296.566141] hub 1-0:1.0: port 1: status 0501 change 0001
> [ 296.670413] hub 1-0:1.0: state 7 ports 8 chg 0002 evt 0000
> [ 296.673222] hub 1-0:1.0: port 1, status 0501, change 0000, 480 Mb/s
> [ 297.311720] usb 1-1: new high-speed USB device number 3 using ehci_hcd
> [ 300.547237] usb 1-1: skipped 1 descriptor after configuration
> [ 300.549443] usb 1-1: skipped 4 descriptors after interface
> [ 300.552273] usb 1-1: skipped 2 descriptors after interface
> [ 300.556499] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.559392] usb 1-1: skipped 2 descriptors after interface
> [ 300.560960] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.562169] usb 1-1: skipped 2 descriptors after interface
> [ 300.563440] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.564639] usb 1-1: skipped 2 descriptors after interface
> [ 300.565828] usb 1-1: skipped 2 descriptors after endpoint
> [ 300.567084] usb 1-1: skipped 9 descriptors after interface
> [ 300.569205] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.570484] usb 1-1: skipped 53 descriptors after interface
> [ 300.595843] usb 1-1: default language 0x0409
> [ 300.602503] usb 1-1: USB interface quirks for this device: 2
> [ 300.605700] usb 1-1: udev 3, busnum 1, minor = 2
> [ 300.606959] usb 1-1: New USB device found, idVendor=046d, idProduct=081d
> [ 300.610298] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=1
> [ 300.613742] usb 1-1: SerialNumber: 48C5D2B0
> [ 300.617703] usb 1-1: usb_probe_device
> [ 300.620594] usb 1-1: configuration #1 chosen from 1 choice
> [ 300.639218] usb 1-1: adding 1-1:1.0 (config #1, interface 0)
> [ 300.640736] snd-usb-audio 1-1:1.0: usb_probe_interface
> [ 300.642307] snd-usb-audio 1-1:1.0: usb_probe_interface - got id
> [ 301.050296] usb 1-1: adding 1-1:1.1 (config #1, interface 1)
> [ 301.054897] usb 1-1: adding 1-1:1.2 (config #1, interface 2)
> [ 301.056934] uvcvideo 1-1:1.2: usb_probe_interface
> [ 301.058072] uvcvideo 1-1:1.2: usb_probe_interface - got id
> [ 301.059395] uvcvideo: Found UVC 1.00 device <unnamed> (046d:081d)
> [ 301.090173] input: UVC Camera (046d:081d) as /devices/pci0000:00/0000:00:1f.5/usb1/1-1/1-1:1.2/input/input7
> [ 301.111289] usb 1-1: adding 1-1:1.3 (config #1, interface 3)
> [ 301.131207] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.137066] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.156451] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
> [ 301.158310] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.160238] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.196606] set resolution quirk: cval->res = 384
> [ 371.309569] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> [ 390.729568] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
> f5ade900 2296555[ 390.730023] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> 437 S Ii:1:003:7[ 390.736394] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> -115:128 16 <
> f5ade900 2296566256 C Ii:1:003:7 -2:128 0
> [ 391.100896] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
> [ 391.103188] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> f5ade900 2296926929 S Ii:1:003:7[ 391.104889] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> -115:128 16 <
> f5ade900 2296937889 C Ii:1:003:7 -2:128 0
> f5272300 2310382508 S Co:1:003:0 s 01 0b 0004 0001 0000 0
> f5272300 2310407888 C Co:1:003:0 0 0
> f5272300 2310408051 S Co:1:003:0 s 22 01 0100 0086 0003 3 = 80bb00
> f5272300 2310412456 C Co:1:003:0 0 3 >
> f5272300 2310412521 S Ci:1:003:0 s a2 81 0100 0086 0003 3 <
> f5272300 2310415909 C Ci:1:003:0 0 0
> f5272300 2310418133 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272600 2310418219 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f52720c0 2310418239 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272a80 2310418247 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272480 2310418256 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f52723c0 2310418264 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272d80 2310418272 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272b40 2310418280 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>
> Hard freeze with 100% CPU usage at this point as if some driver got into an
> infinite loop or something.
>
> All debug options from https://lkml.org/lkml/2012/10/20/116 are enabled, but
> serial console is empty.
>
> Best wishes,
>
> Artem
>
>
> On Oct 21, 2012, Borislav Petkov wrote:
>
> > I don't think that's the problem - I rather suspect the fact that he's
> > using virtualbox which is causing random corruptions by writing to
> > arbitrary locations.
> >
> >
> >
> > please remove virtualbox completely from your system, rebuild the kernel
> > and make sure the virtualbox kernel modules don't get loaded - simply
> > delete them so that they are completely gone; *and* *then* retest again.
>
>

--
Regards/Gruss,
Boris.

2012-10-21 01:57:25

Subject: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

> On Oct 21, 2012, Borislav Petkov wrote:
>
> On Sat, Oct 20, 2012 at 11:15:17PM +0000, Artem S. Tashkinov wrote:
> > You don't get me - I have *no* VirtualBox (or any proprietary) modules
> > running
>
> Ok, good. We got that out of the way - I wanted to make sure after you
> replied with two other possibilities of the system freezing.
>
> > - but I can reproduce this problem using *the same system running
> > under* VirtualBox in Windows 7 64.
>
> That's windoze as host and linux as a guest, correct?

Exactly.

> If so, that's virtualbox's problem, I'd say.

I can reproduce it on my host *alone*, as I said in the very first message - I had
never tried to run my Linux in a virtual machine before. Please, just forget about
VirtualBox - it has nothing to do with this problem.

> > It's almost definitely either a USB driver bug or video4linux driver
> > bug:
>
> And you're assuming that because the freeze happens when using your usb
> webcam, correct? And not otherwise?

Yes, like I said earlier - the system crashes/freezes only when I try to access
its settings using Adobe Flash.

> Maybe you can describe in more detail what exactly you're doing so that
> people could try to reproduce your issue.

I don't think many people have the same webcam, so that's going to be a problem.
With this webcam it can be reproduced easily - just open Flash "Settings" in
Google Chrome 22. The crash will occur immediately.

> > I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
> > https://lkml.org/lkml/2012/10/20/35
> > https://lkml.org/lkml/2012/10/20/148
>
> Yes, good idea. Maybe the folks there have some more ideas how to debug
> this.
>
> I'm leaving in the rest for reference.
>
> What should be pointed out, though, is that you don't have any more
> random corruptions causing oopses now that virtualbox is gone. The
> freeze below is a whole another issue.

The freeze happens on my *host* Linux PC. For an experiment I decided to
check if I could reproduce the freeze under a virtual machine - it turns out the
Linux kernel running under it also freezes.

Artem

2012-10-21 02:19:11

Subject: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, 20 Oct 2012, Artem S. Tashkinov wrote:

> You don't get me - I have *no* VirtualBox (or any proprietary) modules running
> - but I can reproduce this problem using *the same system running under* VirtualBox
> in Windows 7 64.
>
> It's almost definitely either a USB driver bug or video4linux driver bug:

Does the same thing happen with earlier kernel versions?

What about if you unload snd-usb-audio or ehci-hcd?

Alan Stern

2012-10-21 10:35:06

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 01:15, Artem S. Tashkinov wrote:
> You don't get me - I have *no* VirtualBox (or any proprietary) modules running
> - but I can reproduce this problem using *the same system running under* VirtualBox
> in Windows 7 64.
>
> It's almost definitely either a USB driver bug or video4linux driver bug:
>
> I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
> https://lkml.org/lkml/2012/10/20/35
> https://lkml.org/lkml/2012/10/20/148
>
> Here are the last lines from my dmesg (with usbmon loaded):
>
> [ 292.164833] hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0002
> [ 292.168091] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 00100a 0 ACK POWER sig=se0 PEC CSC
> [ 292.172063] hub 1-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
> [ 292.174883] usb 1-1: USB disconnect, device number 2
> [ 292.178045] usb 1-1: unregistering device
> [ 292.183539] usb 1-1: unregistering interface 1-1:1.0
> [ 292.197034] usb 1-1: unregistering interface 1-1:1.1
> [ 292.204317] usb 1-1: unregistering interface 1-1:1.2
> [ 292.234519] usb 1-1: unregistering interface 1-1:1.3
> [ 292.236175] usb 1-1: usb_disable_device nuking all URBs
> [ 292.364429] hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
> [ 294.364279] hub 1-0:1.0: hub_suspend
> [ 294.366045] usb usb1: bus auto-suspend, wakeup 1
> [ 294.367375] ehci_hcd 0000:00:1f.5: suspend root hub
> [ 296.501084] usb usb1: usb wakeup-resume
> [ 296.508311] usb usb1: usb auto-resume
> [ 296.509833] ehci_hcd 0000:00:1f.5: resume root hub
> [ 296.560149] hub 1-0:1.0: hub_resume
> [ 296.562240] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 001003 0 ACK POWER sig=se0 CSC CONNECT
> [ 296.566141] hub 1-0:1.0: port 1: status 0501 change 0001
> [ 296.670413] hub 1-0:1.0: state 7 ports 8 chg 0002 evt 0000
> [ 296.673222] hub 1-0:1.0: port 1, status 0501, change 0000, 480 Mb/s
> [ 297.311720] usb 1-1: new high-speed USB device number 3 using ehci_hcd
> [ 300.547237] usb 1-1: skipped 1 descriptor after configuration
> [ 300.549443] usb 1-1: skipped 4 descriptors after interface
> [ 300.552273] usb 1-1: skipped 2 descriptors after interface
> [ 300.556499] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.559392] usb 1-1: skipped 2 descriptors after interface
> [ 300.560960] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.562169] usb 1-1: skipped 2 descriptors after interface
> [ 300.563440] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.564639] usb 1-1: skipped 2 descriptors after interface
> [ 300.565828] usb 1-1: skipped 2 descriptors after endpoint
> [ 300.567084] usb 1-1: skipped 9 descriptors after interface
> [ 300.569205] usb 1-1: skipped 1 descriptor after endpoint
> [ 300.570484] usb 1-1: skipped 53 descriptors after interface
> [ 300.595843] usb 1-1: default language 0x0409
> [ 300.602503] usb 1-1: USB interface quirks for this device: 2
> [ 300.605700] usb 1-1: udev 3, busnum 1, minor = 2
> [ 300.606959] usb 1-1: New USB device found, idVendor=046d, idProduct=081d
> [ 300.610298] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=1
> [ 300.613742] usb 1-1: SerialNumber: 48C5D2B0
> [ 300.617703] usb 1-1: usb_probe_device
> [ 300.620594] usb 1-1: configuration #1 chosen from 1 choice
> [ 300.639218] usb 1-1: adding 1-1:1.0 (config #1, interface 0)
> [ 300.640736] snd-usb-audio 1-1:1.0: usb_probe_interface
> [ 300.642307] snd-usb-audio 1-1:1.0: usb_probe_interface - got id
> [ 301.050296] usb 1-1: adding 1-1:1.1 (config #1, interface 1)
> [ 301.054897] usb 1-1: adding 1-1:1.2 (config #1, interface 2)
> [ 301.056934] uvcvideo 1-1:1.2: usb_probe_interface
> [ 301.058072] uvcvideo 1-1:1.2: usb_probe_interface - got id
> [ 301.059395] uvcvideo: Found UVC 1.00 device <unnamed> (046d:081d)
> [ 301.090173] input: UVC Camera (046d:081d) as /devices/pci0000:00/0000:00:1f.5/usb1/1-1/1-1:1.2/input/input7

That seems to be a Logitech model.

> [ 301.111289] usb 1-1: adding 1-1:1.3 (config #1, interface 3)
> [ 301.131207] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.137066] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.156451] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
> [ 301.158310] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.160238] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> [ 301.196606] set resolution quirk: cval->res = 384
> [ 371.309569] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> [ 390.729568] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
> f5ade900 2296555[ 390.730023] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> 437 S Ii:1:003:7[ 390.736394] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> -115:128 16 <
> f5ade900 2296566256 C Ii:1:003:7 -2:128 0
> [ 391.100896] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
> [ 391.103188] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
> f5ade900 2296926929 S Ii:1:003:7[ 391.104889] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
> -115:128 16 <
> f5ade900 2296937889 C Ii:1:003:7 -2:128 0
> f5272300 2310382508 S Co:1:003:0 s 01 0b 0004 0001 0000 0
> f5272300 2310407888 C Co:1:003:0 0 0
> f5272300 2310408051 S Co:1:003:0 s 22 01 0100 0086 0003 3 = 80bb00
> f5272300 2310412456 C Co:1:003:0 0 3 >
> f5272300 2310412521 S Ci:1:003:0 s a2 81 0100 0086 0003 3 <
> f5272300 2310415909 C Ci:1:003:0 0 0
> f5272300 2310418133 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272600 2310418219 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f52720c0 2310418239 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272a80 2310418247 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272480 2310418256 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f52723c0 2310418264 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272d80 2310418272 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
> f5272b40 2310418280 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <

At least this last packet was an isochronous input on ep 6 which has
state -EINPROGRESS, but that isn't necessarily related.
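
To show how that reading falls out of the trace, here is a rough sketch that splits one usbmon text line into its leading fields (URB tag, timestamp in microseconds, event type, address word, status word). The function name and the small errno table are assumptions made for illustration; the errno values are the Linux ones. As far as I know, -EINPROGRESS is simply the status a URB carries while it is queued, so seeing it on an "S" (submission) line is expected.

```python
# A few Linux errno values that show up in this trace (not a full table).
LINUX_ERRNO = {2: "ENOENT", 18: "EXDEV", 115: "EINPROGRESS"}
URB_TYPES = {"C": "control", "Z": "isochronous", "I": "interrupt", "B": "bulk"}
EVENTS = {"S": "submission", "C": "callback", "E": "submission error"}


def decode_usbmon(line):
    """Decode the leading fields of one usbmon text-format line."""
    fields = line.split()
    tag, timestamp, event, address = fields[:4]
    # The address word looks like "Zi:1:003:6": type+direction, bus, device, endpoint.
    type_dir, bus, device, endpoint = address.split(":")
    status_word = fields[4]
    if status_word == "s":
        status = "setup packet follows"
    else:
        # For isochronous URBs the word is status:interval:start_frame.
        code = int(status_word.split(":")[0])
        status = "0" if code == 0 else LINUX_ERRNO.get(-code, str(code))
    return {
        "urb": tag,
        "timestamp_us": int(timestamp),
        "event": EVENTS.get(event, event),
        "type": URB_TYPES.get(type_dir[0], type_dir[0]),
        "direction": "in" if type_dir[1] == "i" else "out",
        "bus": int(bus),
        "device": int(device),
        "endpoint": int(endpoint),
        "status": status,
    }


last = "f5272b40 2310418280 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <"
for key, value in decode_usbmon(last).items():
    print(key, "=", value)
```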

> Hard freeze with 100% CPU usage at this point as if some driver got into an
> infinite loop or something.

From your first mail in this thread, I suspect that to be some sort of
memory corruption, but now you're seeing a hard freeze. Hmm.

> All debug options from https://lkml.org/lkml/2012/10/20/116 are enabled, but
> serial console is empty.

Some thoughts:

- As Alan asked, it would be interesting to separate video and audio
functions in this test, either by unloading the kernel modules one by
one or by disallowing Flash access to the devices.

- Can you reproduce this with some other webcam tool like "cheese"?

- Can you reproduce this with some other audio capture tool like
"arecord" (use "-D" to point it to the correct device, and play with
various sample rates and buffer sizes here)

- Do you have any built-in webcam or microphone? Does it work when you
use them instead?

- Does http://trust.com/service/guides/webcam/ also crash your kernel?

- if you can narrow down the issue to USB devices, please post the
output of "lsusb -v"

I tried Chrome 22 on Ubuntu with a cheap Logitech USB webcam (different
product ID than yours, though) under 3.6.0 and 3.6.2, and I can't
reproduce the issue.

Daniel

2012-10-21 11:08:57

Subject: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sun, Oct 21, 2012 at 01:57:21AM +0000, Artem S. Tashkinov wrote:
> The freeze happens on my *host* Linux PC. For an experiment I decided
> to check if I could reproduce the freeze under a virtual machine - it
> turns out the Linux kernel running under it also freezes.

I know that - but a freeze != oops - at least not necessarily. Which
means it could very well be a different issue now that vbox is gone.

Or, it could be the same issue with different incarnations: with vbox
you get the corruptions and without it, you get the freezes. I'm
assuming you do the same flash player thing in both cases?

Here's a crazy idea: can you try to reproduce it in KVM?

Thanks.

--
Regards/Gruss,
Boris.

2012-10-21 11:59:27

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 12:34, Daniel Mack wrote:
> On 21.10.2012 01:15, Artem S. Tashkinov wrote:
>> You don't get me - I have *no* VirtualBox (or any proprietary) modules running
>> - but I can reproduce this problem using *the same system running under* VirtualBox
>> in Windows 7 64.
>>
>> It's almost definitely either a USB driver bug or video4linux driver bug:
>>
>> I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
>> https://lkml.org/lkml/2012/10/20/35
>> https://lkml.org/lkml/2012/10/20/148
>>
>> Here are the last lines from my dmesg (with usbmon loaded):
>>
>> [ 292.164833] hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0002
>> [ 292.168091] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 00100a 0 ACK POWER sig=se0 PEC CSC
>> [ 292.172063] hub 1-0:1.0: port 1, status 0100, change 0003, 12 Mb/s
>> [ 292.174883] usb 1-1: USB disconnect, device number 2
>> [ 292.178045] usb 1-1: unregistering device
>> [ 292.183539] usb 1-1: unregistering interface 1-1:1.0
>> [ 292.197034] usb 1-1: unregistering interface 1-1:1.1
>> [ 292.204317] usb 1-1: unregistering interface 1-1:1.2
>> [ 292.234519] usb 1-1: unregistering interface 1-1:1.3
>> [ 292.236175] usb 1-1: usb_disable_device nuking all URBs
>> [ 292.364429] hub 1-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x100
>> [ 294.364279] hub 1-0:1.0: hub_suspend
>> [ 294.366045] usb usb1: bus auto-suspend, wakeup 1
>> [ 294.367375] ehci_hcd 0000:00:1f.5: suspend root hub
>> [ 296.501084] usb usb1: usb wakeup-resume
>> [ 296.508311] usb usb1: usb auto-resume
>> [ 296.509833] ehci_hcd 0000:00:1f.5: resume root hub
>> [ 296.560149] hub 1-0:1.0: hub_resume
>> [ 296.562240] ehci_hcd 0000:00:1f.5: GetStatus port:1 status 001003 0 ACK POWER sig=se0 CSC CONNECT
>> [ 296.566141] hub 1-0:1.0: port 1: status 0501 change 0001
>> [ 296.670413] hub 1-0:1.0: state 7 ports 8 chg 0002 evt 0000
>> [ 296.673222] hub 1-0:1.0: port 1, status 0501, change 0000, 480 Mb/s
>> [ 297.311720] usb 1-1: new high-speed USB device number 3 using ehci_hcd
>> [ 300.547237] usb 1-1: skipped 1 descriptor after configuration
>> [ 300.549443] usb 1-1: skipped 4 descriptors after interface
>> [ 300.552273] usb 1-1: skipped 2 descriptors after interface
>> [ 300.556499] usb 1-1: skipped 1 descriptor after endpoint
>> [ 300.559392] usb 1-1: skipped 2 descriptors after interface
>> [ 300.560960] usb 1-1: skipped 1 descriptor after endpoint
>> [ 300.562169] usb 1-1: skipped 2 descriptors after interface
>> [ 300.563440] usb 1-1: skipped 1 descriptor after endpoint
>> [ 300.564639] usb 1-1: skipped 2 descriptors after interface
>> [ 300.565828] usb 1-1: skipped 2 descriptors after endpoint
>> [ 300.567084] usb 1-1: skipped 9 descriptors after interface
>> [ 300.569205] usb 1-1: skipped 1 descriptor after endpoint
>> [ 300.570484] usb 1-1: skipped 53 descriptors after interface
>> [ 300.595843] usb 1-1: default language 0x0409
>> [ 300.602503] usb 1-1: USB interface quirks for this device: 2
>> [ 300.605700] usb 1-1: udev 3, busnum 1, minor = 2
>> [ 300.606959] usb 1-1: New USB device found, idVendor=046d, idProduct=081d
>> [ 300.610298] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=1
>> [ 300.613742] usb 1-1: SerialNumber: 48C5D2B0
>> [ 300.617703] usb 1-1: usb_probe_device
>> [ 300.620594] usb 1-1: configuration #1 chosen from 1 choice
>> [ 300.639218] usb 1-1: adding 1-1:1.0 (config #1, interface 0)
>> [ 300.640736] snd-usb-audio 1-1:1.0: usb_probe_interface
>> [ 300.642307] snd-usb-audio 1-1:1.0: usb_probe_interface - got id
>> [ 301.050296] usb 1-1: adding 1-1:1.1 (config #1, interface 1)
>> [ 301.054897] usb 1-1: adding 1-1:1.2 (config #1, interface 2)
>> [ 301.056934] uvcvideo 1-1:1.2: usb_probe_interface
>> [ 301.058072] uvcvideo 1-1:1.2: usb_probe_interface - got id
>> [ 301.059395] uvcvideo: Found UVC 1.00 device <unnamed> (046d:081d)
>> [ 301.090173] input: UVC Camera (046d:081d) as /devices/pci0000:00/0000:00:1f.5/usb1/1-1/1-1:1.2/input/input7
>
> That seems to be a Logitech model.
>
>> [ 301.111289] usb 1-1: adding 1-1:1.3 (config #1, interface 3)
>> [ 301.131207] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
>> [ 301.137066] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
>> [ 301.156451] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
>> [ 301.158310] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
>> [ 301.160238] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
>> [ 301.196606] set resolution quirk: cval->res = 384
>> [ 371.309569] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>> [ 390.729568] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
>> f5ade900 2296555[ 390.730023] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
>> 437 S Ii:1:003:7[ 390.736394] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
>> -115:128 16 <
>> f5ade900 2296566256 C Ii:1:003:7 -2:128 0
>> [ 391.100896] ehci_hcd 0000:00:1f.5: reused qh f48d64c0 schedule
>> [ 391.103188] usb 1-1: link qh16-0001/f48d64c0 start 2 [1/0 us]
>> f5ade900 2296926929 S Ii:1:003:7[ 391.104889] usb 1-1: unlink qh16-0001/f48d64c0 start 2 [1/0 us]
>> -115:128 16 <
>> f5ade900 2296937889 C Ii:1:003:7 -2:128 0
>> f5272300 2310382508 S Co:1:003:0 s 01 0b 0004 0001 0000 0
>> f5272300 2310407888 C Co:1:003:0 0 0
>> f5272300 2310408051 S Co:1:003:0 s 22 01 0100 0086 0003 3 = 80bb00
>> f5272300 2310412456 C Co:1:003:0 0 3 >
>> f5272300 2310412521 S Ci:1:003:0 s a2 81 0100 0086 0003 3 <
>> f5272300 2310415909 C Ci:1:003:0 0 0
>> f5272300 2310418133 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f5272600 2310418219 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f52720c0 2310418239 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f5272a80 2310418247 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f5272480 2310418256 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f52723c0 2310418264 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f5272d80 2310418272 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <
>> f5272b40 2310418280 S Zi:1:003:6 -115:8:0 1 -18:0:100 100 <

[...]

> I tried Chrome 22 on Ubuntu with a cheap Logitech USB webcam (different
> product ID than yours, though) under 3.6.0 and 3.6.2, and I can't
> reproduce the issue.

FWIW, I also tried Chrome 22 and Firefox 16 with kernel version 3.5.4
and 3.6.2 on Fedora 17 and everything worked as expected (with both an
external and the built-in webcam of a T420). Cheese and arecord also
work on all kernel versions and distributions I have tested so far.

So whatever causes your trouble, I assume it's rather specific to your
machine configuration and setup. More information is needed here.

Daniel

2012-10-21 11:59:42

Subject: Re: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Oct 21, 2012, Borislav Petkov wrote:
>
> On Sun, Oct 21, 2012 at 01:57:21AM +0000, Artem S. Tashkinov wrote:
> > The freeze happens on my *host* Linux PC. For an experiment I decided
> > to check if I could reproduce the freeze under a virtual machine - it
> > turns out the Linux kernel running under it also freezes.
>
> I know that - but a freeze != oops - at least not necessarily. Which
> means it could very well be a different issue now that vbox is gone.
>
> Or, it could be the same issue with different incarnations: with vbox
> you get the corruptions and without it, you get the freezes. I'm
> assuming you do the same flash player thing in both cases?
>
> Here's a crazy idea: can you try to reproduce it in KVM?

OK, dismiss VBox altogether - it has a very buggy USB implementation, thus
it just hangs when trying to access my webcam.

What I've found out is that my system crashes *only* when I try to enable
usb-audio (from the same webcam) - I still have no idea how to capture a
panic message, but I ran

"while :; do dmesg -c; done" in xterm, then I got like thousands of messages
and I photographed my monitor:

http://imageshack.us/a/img685/9452/panicz.jpg

list_del corruption. prev->next should be ... but was ...

I cannot show you more as I have no serial console to use :( and the kernel
doesn't have enough time to push error messages to rsyslog and fsync
/var/log/messages

2012-10-21 12:04:03

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 13:59, Artem S. Tashkinov wrote:
> On Oct 21, 2012, Borislav Petkov wrote:
>>
>> On Sun, Oct 21, 2012 at 01:57:21AM +0000, Artem S. Tashkinov wrote:
>>> The freeze happens on my *host* Linux PC. For an experiment I decided
>>> to check if I could reproduce the freeze under a virtual machine - it
>>> turns out the Linux kernel running under it also freezes.
>>
>> I know that - but a freeze != oops - at least not necessarily. Which
>> means it could very well be a different issue now that vbox is gone.
>>
>> Or, it could be the same issue with different incarnations: with vbox
>> you get the corruptions and without it, you get the freezes. I'm
>> assuming you do the same flash player thing in both cases?
>>
>> Here's a crazy idea: can you try to reproduce it in KVM?
>
> OK, dismiss VBox altogether - it has a very buggy USB implementation, thus
> it just hangs when trying to access my webcam.

Ok.

> What I've found out is that my system crashes *only* when I try to enable
> usb-audio (from the same webcam) - I still have no idea how to capture a
> panic message, but I ran
>
> "while :; do dmesg -c; done" in xterm, then I got like thousands of messages
> and I photographed my monitor:
>
> http://imageshack.us/a/img685/9452/panicz.jpg

A hint at least. How did you enable the audio record exactly? Can you
reproduce this with arecord?

What chipset are you on? Please provide both "lspci -v" and "lsusb -v"
dumps. As I said, I fail to reproduce that issue on any of my machines.

Daniel

2012-10-21 12:12:30

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 13:59, Artem S. Tashkinov wrote:
> On Oct 21, 2012, Borislav Petkov wrote:
>>
>> On Sun, Oct 21, 2012 at 01:57:21AM +0000, Artem S. Tashkinov wrote:
>>> The freeze happens on my *host* Linux PC. For an experiment I decided
>>> to check if I could reproduce the freeze under a virtual machine - it
>>> turns out the Linux kernel running under it also freezes.
>>
>> I know that - but a freeze != oops - at least not necessarily. Which
>> means it could very well be a different issue now that vbox is gone.
>>
>> Or, it could be the same issue with different incarnations: with vbox
>> you get the corruptions and without it, you get the freezes. I'm
>> assuming you do the same flash player thing in both cases?
>>
>> Here's a crazy idea: can you try to reproduce it in KVM?
>
> OK, dismiss VBox altogether - it has a very buggy USB implementation, thus
> it just hangs when trying to access my webcam.
>
> What I've found out is that my system crashes *only* when I try to enable
> usb-audio (from the same webcam)

It would also be interesting to know whether you have problems with
*only* the video capture, with some tool like "cheese". It might be
you're hitting a host controller issue here, and then isochronous input
packets on the video interface would most likely also trigger such an
effect. Actually, knowing whether that's the case would be crucial for
further debugging.

Daniel

2012-10-21 12:30:26

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Oct 21, 2012, Daniel Mack wrote:

> A hint at least. How did you enable the audio record exactly? Can you
> reproduce this with arecord?
>
> What chipset are you on? Please provide both "lspci -v" and "lsusb -v"
> dumps. As I said, I fail to reproduce that issue on any of my machines.

All other applications can read from the USB audio without problems, it's
just something in the way Adobe Flash polls my audio input which causes
a crash.

Just video capture (without audio) works just fine in Adobe Flash.

Only and only when I choose to use

USB Device 0x46d:0x81d my system crashes in Adobe Flash.

See the screenshot:

https://bugzilla.kernel.org/attachment.cgi?id=84151

My hardware information can be fetched from here:

https://bugzilla.kernel.org/show_bug.cgi?id=49181

On a second thought that can be even an ALSA crash or pretty much
anything else.

2012-10-21 14:21:51

Subject: Re: was: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

[Cc: alsa-devel]

On 21.10.2012 14:30, Artem S. Tashkinov wrote:
> On Oct 21, 2012, Daniel Mack wrote:
>
>> A hint at least. How did you enable the audio record exactly? Can you
>> reproduce this with arecord?
>>
>> What chipset are you on? Please provide both "lspci -v" and "lsusb -v"
>> dumps. As I said, I fail to reproduce that issue on any of my machines.
>
> All other applications can read from the USB audio without problems, it's
> just something in the way Adobe Flash polls my audio input which causes
> a crash.
>
> Just video capture (without audio) works just fine in Adobe Flash.

Ok, so that pretty much rules out the host controller. I just wonder why
I still don't see it here, and I haven't heard of any such problem from
anyone else.

Some more questions:

- Which version of Flash are you running?
- Does this also happen with Firefox?
- Does flash access the device directly or via PulseAudio?
- Could you please apply the attached patch and see what it spits out to
dmesg once Flash opens the device? It returns -EINVAL in the hw_params
callback to prevent the actual streaming. On my machine with Flash
11.4.31.110, I get values of 2/44800/1/32768/2048/0, which seems sane.
Or does your machine still crash before anything is written to the logs?

> Only and only when I choose to use
>
> USB Device 0x46d:0x81d my system crashes in Adobe Flash.
>
> See the screenshot:
>
> https://bugzilla.kernel.org/attachment.cgi?id=84151

When exactly does the crash happen? Right after you selected that entry
from the list? There's a little recording level meter in that dialog.
Does that show any input from the microphone?

> My hardware information can be fetched from here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=49181
>
> On a second thought that can be even an ALSA crash or pretty much
> anything else.

We'll see. Thanks for your help to sort this out!

Daniel

Attachments:

snd-usb-hwparams.diff (778.00 B)

2012-10-21 14:58:00

Subject: Re: Re: was: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

> On Oct 21, 2012, Daniel Mack wrote:
>
> [Cc: alsa-devel]
>
> On 21.10.2012 14:30, Artem S. Tashkinov wrote:
> > On Oct 21, 2012, Daniel Mack wrote:
> >
> >> A hint at least. How did you enable the audio record exactly? Can you
> >> reproduce this with arecord?
> >>
> >> What chipset are you on? Please provide both "lspci -v" and "lsusb -v"
> >> dumps. As I said, I fail to reproduce that issue on any of my machines.
> >
> > All other applications can read from the USB audio without problems, it's
> > just something in the way Adobe Flash polls my audio input which causes
> > a crash.
> >
> > Just video capture (without audio) works just fine in Adobe Flash.
>
> Ok, so that pretty much rules out the host controller. I just wonder why
> I still don't see it here, and I haven't heard of any such problem from
> anyone else.
>
> Some more questions:
>
> - Which version of Flash are you running?

Google Chrome has its own version of Adobe Flash:

Name: Shockwave Flash
Description: Shockwave Flash 11.4 r31
Version: 11.4.31.110

> - Does this also happen with Firefox?

No, Adobe Flash in Firefox is an older version (Shockwave Flash 11.1 r102), it shows
just two input devices instead of the three which the newer Flash player sees.

* HDA Intel PCH
* USB Device 0x46d:0x81d

> - Does flash access the device directly or via PulseAudio?

PA is not installed on my computer, so Flash accesses it directly via ALSA calls.

> - Could you please apply the attached patch and see what it spits out to
> dmesg once Flash opens the device? It returns -EINVAL in the hw_params
> callback to prevent the actual streaming. On my machine with Flash
> 11.4.31.110, I get values of 2/44800/1/32768/2048/0, which seems sane.
> Or does your machine still crash before anything is written to the logs?

I will try it a bit later.

> > Only and only when I choose to use
> >
> > USB Device 0x46d:0x81d my system crashes in Adobe Flash.
> >
> > See the screenshot:
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=84151
>
> When exactly does the crash happen? Right after you selected that entry
> from the list? There's a little recording level meter in that dialog.
> Does that show any input from the microphone?

Yes, right after I select it and move the mouse cursor away from this combobox
so that this selection becomes active.

> > My hardware information can be fetched from here:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=49181
> >
> > On a second thought that can be even an ALSA crash or pretty much
> > anything else.
>
> We'll see. Thanks for your help to sort this out!

Thank you for your assistance!

2012-10-21 15:22:33

Subject: Re: was: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 16:57, Artem S. Tashkinov wrote:
>> On Oct 21, 2012, Daniel Mack wrote:
>>
>> [Cc: alsa-devel]
>>
>> On 21.10.2012 14:30, Artem S. Tashkinov wrote:
>>> On Oct 21, 2012, Daniel Mack wrote:
>>>
>>>> A hint at least. How did you enable the audio record exactly? Can you
>>>> reproduce this with arecord?
>>>>
>>>> What chipset are you on? Please provide both "lspci -v" and "lsusb -v"
>>>> dumps. As I said, I fail to reproduce that issue on any of my machines.
>>>
>>> All other applications can read from the USB audio without problems, it's
>>> just something in the way Adobe Flash polls my audio input which causes
>>> a crash.
>>>
>>> Just video capture (without audio) works just fine in Adobe Flash.
>>
>> Ok, so that pretty much rules out the host controller. I just wonder why
>> I still don't see it here, and I haven't heard of any such problem from
>> anyone else.
>>
>> Some more questions:
>>
>> - Which version of Flash are you running?
>
> Google Chrome has its own version of Adobe Flash:
>
> Name: Shockwave Flash
> Description: Shockwave Flash 11.4 r31
> Version: 11.4.31.110

So that's the same that I'm using.

>> - Does this also happen with Firefox?
>
> No, Adobe Flash in Firefox is an older version (Shockwave Flash 11.1 r102), it shows
> just two input devices instead of the three which the newer Flash player sees.
>
> * HDA Intel PCH
> * USB Device 0x46d:0x81d

And that works, I assume? Does the second choice in the newer Flash
version work maybe?

>> - Does flash access the device directly or via PulseAudio?
>
> PA is not installed on my computer, so Flash accesses it directly via ALSA calls.

Ok, same here.

>> - Could you please apply the attached patch and see what it spits out to
>> dmesg once Flash opens the device? It returns -EINVAL in the hw_params
>> callback to prevent the actual streaming. On my machine with Flash
>> 11.4.31.110, I get values of 2/44800/1/32768/2048/0, which seems sane.
>> Or does your machine still crash before anything is written to the logs?
>
> I will try it a bit later.

Yes, we need to trace the call chain and see at which point the trouble
starts. What could help is tracing the google-chrome binary with strace
maybe. At least we would see the ioctl command sequence, if the log file
survives the crash.

As the usb list is still in Cc: - Artem's lspci dump shows that his
machine features XHCI controllers. Can anyone think of a relation to
this problem?

And Artem, is there any way you boot your system on an older machine
that only has EHCI ports? Thinking about it, I wonder whether the freeze
in VBox and the crashes on native hardware have the same root cause. In
that case, would it be possible to share that VBox image?

Daniel

2012-10-21 15:23:33

Subject: Re: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sun, 21 Oct 2012, Artem S. Tashkinov wrote:

> What I've found out is that my system crashes *only* when I try to enable
> usb-audio (from the same webcam) - I still have no idea how to capture a
> panic message, but I ran
>
> "while :; do dmesg -c; done" in xterm, then I got like thousands of messages
> and I photographed my monitor:
>
> http://imageshack.us/a/img685/9452/panicz.jpg
>
> list_del corruption. prev->next should be ... but was ...
>
> I cannot show you more as I have no serial console to use :( and the kernel
> doesn't have enough time to push error messages to rsyslog and fsync
> /var/log/messages

Is it possible to use netconsole? The screenshot above appears to be
the end of a long series of error messages, which isn't too useful.
The most important information is in the first error.

Alan Stern

2012-10-21 15:28:31

Subject: Re: was: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sun, 21 Oct 2012, Daniel Mack wrote:

> As the usb list is still in Cc: - Artem's lspci dump shows that his
> machine features XHCI controllers. Can anyone think of a relation to
> this problem?
>
> And Artem, is there any way you boot your system on an older machine
> that only has EHCI ports? Thinking about it, I wonder whether the freeze
> in VBox and the crashes on native hardware have the same root cause. In
> that case, would it be possible to share that VBox image?

Don't grasp at straws. All of the kernel logs Artem has posted show
ehci-hcd; none of them show xhci-hcd. Therefore the xHCI controller is
highly unlikely to be involved.

Alan Stern

2012-10-21 17:03:19

Subject: Re: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sun, Oct 21, 2012 at 11:59:36AM +0000, Artem S. Tashkinov wrote:
> http://imageshack.us/a/img685/9452/panicz.jpg
>
> list_del corruption. prev->next should be ... but was ...

Btw, this is one of the debug options I told you to enable.

> I cannot show you more as I have no serial console to use :( and the kernel
> doesn't have enough time to push error messages to rsyslog and fsync
> /var/log/messages

I already told you how to catch that oops: boot with "pause_on_oops=600"
on the kernel command line and photograph the screen when the first oops
happens. This'll show us where the problem begins.

--
Regards/Gruss,
Boris.

2012-10-21 19:49:06

Subject: Re: Re: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

>
> On Oct 21, 2012, Borislav Petkov wrote:
>
> On Sun, Oct 21, 2012 at 11:59:36AM +0000, Artem S. Tashkinov wrote:
> > http://imageshack.us/a/img685/9452/panicz.jpg
> >
> > list_del corruption. prev->next should be ... but was ...
>
> Btw, this is one of the debug options I told you to enable.
>
> > I cannot show you more as I have no serial console to use :( and the kernel
> > doesn't have enough time to push error messages to rsyslog and fsync
> > /var/log/messages
>
> I already told you how to catch that oops: boot with "pause_on_oops=600"
> on the kernel command line and photograph the screen when the first oops
> happens. This'll show us where the problem begins.

This option didn't have any effect, or maybe it's because it's such a serious crash
the kernel has no time to actually print an oops/panic message.

dmesg messages up to a crash can be seen here: https://bugzilla.kernel.org/attachment.cgi?id=84221

I dumped them using this application:

$ cat scat.c

#define _GNU_SOURCE /* must precede the includes: exposes open64(), O_LARGEFILE and O_NOFOLLOW in glibc */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define BUFFER 4096

int main(int argc, char *argv[])
{
    int fd_out;
    int64_t bytes_read;
    void *buffer;

    if (argc != 2) {
        printf("Usage is: scat destination\n");
        return 1;
    }

    buffer = malloc(BUFFER * sizeof(char));
    if (buffer == NULL) {
        printf("Error: can't allocate buffers\n");
        return 2;
    }
    memset(buffer, 0, BUFFER);

    printf("Dumping to \"%s\" ... ", argv[1]);
    fflush(NULL);

    /* No O_CREAT here, so the destination must already exist and the mode argument is ignored. */
    if ((fd_out = open64(argv[1], O_WRONLY | O_LARGEFILE | O_SYNC | O_NOFOLLOW, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)) == -1) {
        printf("Error: destination file can't be created\n");
        perror("open() ");
        return 2;
    }

    bytes_read = 1;

    /* Copy stdin to the destination until fread() returns 0 (end of input). */
    while (bytes_read) {
        bytes_read = fread(buffer, sizeof(char), BUFFER, stdin);

        if (write(fd_out, (void *) buffer, bytes_read) != bytes_read)
        {
            printf("Error: can't write data to the destination file! Possibly a target disk is full\n");
            return 3;
        }

    }

    close(fd_out);

    printf(" OK\n");
    return 0;
}

I ran it this way: while :; do dmesg -c; done | scat /dev/sda11 (yes, straight to a hdd partition to eliminate a FS cache)

Don't judge me harshly - I'm not a programmer.

2012-10-21 19:55:11

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 21:49, Artem S. Tashkinov wrote:
>>
>> On Oct 21, 2012, Borislav Petkov wrote:
>>
>> On Sun, Oct 21, 2012 at 11:59:36AM +0000, Artem S. Tashkinov wrote:
>>> http://imageshack.us/a/img685/9452/panicz.jpg
>>>
>>> list_del corruption. prev->next should be ... but was ...
>>
>> Btw, this is one of the debug options I told you to enable.
>>
>>> I cannot show you more as I have no serial console to use :( and the kernel
>>> doesn't have enough time to push error messages to rsyslog and fsync
>>> /var/log/messages
>>
>> I already told you how to catch that oops: boot with "pause_on_oops=600"
>> on the kernel command line and photograph the screen when the first oops
>> happens. This'll show us where the problem begins.
>
> This option didn't have any effect, or maybe it's because it's such a serious crash
> the kernel has no time to actually print an oops/panic message.
>
> dmesg messages up to a crash can be seen here: https://bugzilla.kernel.org/attachment.cgi?id=84221

Nice. Could you do that again with the patch applied I sent you some
hours ago?

Thanks,
Daniel

2012-10-21 20:36:53

Subject: Re: Re: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sun, Oct 21, 2012 at 07:49:01PM +0000, Artem S. Tashkinov wrote:
> I ran it this way: while :; do dmesg -c; done | scat /dev/sda11 (yes,
> straight to a hdd partition to eliminate a FS cache)

Well, I'm no fs guy but this should still go through the buffer cache. I
think the O_SYNC flag makes sure it all lands on the partition in time.
Oh well, it doesn't matter.
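
As a side note on the O_SYNC point, here is a minimal sketch of how a dump tool could open its target and make sure every byte is handed to write(). It assumes the documented open(2) behaviour that, with O_SYNC, write() returns only after the data has been transferred to the underlying device. The helper names open_sync_target and write_fully are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open an existing file or block device for synchronous writes. */
int open_sync_target(const char *path)
{
	assert(path != NULL);
	return open(path, O_WRONLY | O_SYNC);
}

/*
 * Write len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is left set by write()).
 */
int write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

The loop matters because write() may accept fewer bytes than requested; treating a short write as a fatal error, as a simple tool might, can lose the tail of a log.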

> Don't judge me harshly - I'm not a programmer.

If you wrote that and you're not a programmer, it certainly looks cool,
good job!

[ Btw, don't forget to free(buffer) at the end. ]

Also, there was a patchset recently which added a blockconsole method to
the kernel with which you can do something like that in a generic way.

Back to the issue at hand: it looks like ehci_hcd is causing some list
corruptions, maybe coming from the uvcvideo or whatever. I think the usb
people will have a better idea.

Btw, is there any particular reason you're running a 32-bit kernel?

Thanks.

--
Regards/Gruss,
Boris.

2012-10-21 20:43:18

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

> Nice. Could you do that again with the patch applied I sent you some
> hours ago?

That patch was of no help - the system has crashed and I couldn't spot relevant
messages.

I've no idea what it means.

Artem

2012-10-21 21:00:30

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 21.10.2012 22:43, Artem S. Tashkinov wrote:
>> Nice. Could you do that again with the patch applied I sent you some
>> hours ago?
>
> That patch was of no help - the system has crashed and I couldn't spot relevant
> messages.
>
> I've no idea what it means.

The sequence of driver callbacks issued on a stream start is

.open()
.hw_params()
.prepare()
.trigger()

If the ALSA part really causes this issue, the bad things happen either
in any of the driver callback functions or in the core underneath.

The patch I sent returns an error from the hw_params callback, and as
you still see the problem, that means that the crash happens before any
of the USB audio streaming really starts.
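
The reasoning above can be illustrated with a small user-space sketch. It is not ALSA code: the ops table, the stage functions and start_stream are hypothetical stand-ins that only mimic the order open, hw_params, prepare, trigger. The point is that once one stage returns an error, the later stages never run, so a crash that still occurs cannot come from them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four driver callbacks, in call order. */
enum { STAGE_OPEN, STAGE_HW_PARAMS, STAGE_PREPARE, STAGE_TRIGGER, NUM_STAGES };

typedef int (*stage_fn)(void);

struct stream_ops {
	stage_fn stage[NUM_STAGES];
};

/* A stage that succeeds, and one that refuses like the debugging patch does. */
int stage_ok(void) { return 0; }
int stage_reject(void) { return -EINVAL; }

/*
 * Run the stages in order and stop at the first failure.
 * Returns 0 or the first negative error; *completed counts finished stages.
 */
int start_stream(const struct stream_ops *ops, int *completed)
{
	int i;

	assert(ops != NULL && completed != NULL);
	*completed = 0;
	for (i = 0; i < NUM_STAGES; i++) {
		int rc = ops->stage[i]();

		if (rc < 0)
			return rc;
		(*completed)++;
	}
	return 0;
}
```

With stage_reject in the hw_params slot, start_stream returns -EINVAL after exactly one completed stage. Moving the rejection into the open slot is the same narrowing step applied one stage earlier.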

Could you try and return -EINVAL from snd_usb_capture_open() please?

If anyone has a better idea on how to debug this, please chime in.

Daniel

2012-10-22 15:18:02

Subject: Re: Re: Re: Re: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sun, 21 Oct 2012, Artem S. Tashkinov wrote:

> dmesg messages up to a crash can be seen here: https://bugzilla.kernel.org/attachment.cgi?id=84221

The first problem in the log is endpoint list corruption. Here's a
debugging patch which should provide a little more information.

Alan Stern

drivers/usb/core/hcd.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

Index: usb-3.6/drivers/usb/core/hcd.c
===================================================================
--- usb-3.6.orig/drivers/usb/core/hcd.c
+++ usb-3.6/drivers/usb/core/hcd.c
@@ -1083,6 +1083,8 @@ EXPORT_SYMBOL_GPL(usb_calc_bus_time);

 /*-------------------------------------------------------------------------*/

+static bool list_error;
+
 /**
  * usb_hcd_link_urb_to_ep - add an URB to its endpoint queue
  * @hcd: host controller to which @urb was submitted
@@ -1126,6 +1128,20 @@ int usb_hcd_link_urb_to_ep(struct usb_hc
 	 */
 	if (HCD_RH_RUNNING(hcd)) {
 		urb->unlinked = 0;
+
+		{
+			struct list_head *cur = &urb->ep->urb_list;
+			struct list_head *prev = cur->prev;
+
+			if (prev->next != cur && !list_error) {
+				list_error = true;
+				dev_err(&urb->dev->dev,
+					"ep %x list add corruption: %p %p %p\n",
+					urb->ep->desc.bEndpointAddress,
+					cur, prev, prev->next);
+			}
+		}
+
 		list_add_tail(&urb->urb_list, &urb->ep->urb_list);
 	} else {
 		rc = -ESHUTDOWN;
@@ -1193,6 +1209,26 @@ void usb_hcd_unlink_urb_from_ep(struct u
 {
 	/* clear all state linking urb to this dev (and hcd) */
 	spin_lock(&hcd_urb_list_lock);
+	{
+		struct list_head *cur = &urb->urb_list;
+		struct list_head *prev = cur->prev;
+		struct list_head *next = cur->next;
+
+		if (prev->next != cur && !list_error) {
+			list_error = true;
+			dev_err(&urb->dev->dev,
+				"ep %x list del corruption prev: %p %p %p\n",
+				urb->ep->desc.bEndpointAddress,
+				cur, prev, prev->next);
+		}
+		if (next->prev != cur && !list_error) {
+			list_error = true;
+			dev_err(&urb->dev->dev,
+				"ep %x list del corruption next: %p %p %p\n",
+				urb->ep->desc.bEndpointAddress,
+				cur, next, next->prev);
+		}
+	}
 	list_del_init(&urb->urb_list);
 	spin_unlock(&hcd_urb_list_lock);
 }
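
For readers following along, the invariant that the checks in this patch
test can be sketched in plain userspace C. This is only an illustration,
assuming a circular doubly linked list like the kernel's; the list_head
layout and helpers below are simplified stand-ins, and
list_node_consistent() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (circular, doubly linked). */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link `entry` just before `head`, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* True when both neighbours of `cur` still point back at it. */
static bool list_node_consistent(const struct list_head *cur)
{
	return cur->prev->next == cur && cur->next->prev == cur;
}
```

In this sketch, linking a node that is already on the list a second time
leaves another node whose neighbours no longer point back at it, which is
the kind of state the dev_err() calls are meant to report.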

2012-10-22 15:30:20

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 22.10.2012 17:17, Alan Stern wrote:
> On Sun, 21 Oct 2012, Artem S. Tashkinov wrote:
>
>> dmesg messages up to a crash can be seen here: https://bugzilla.kernel.org/attachment.cgi?id=84221
>
> The first problem in the log is endpoint list corruption. Here's a
> debugging patch which should provide a little more information.

Maybe add a BUG() after each of these dev_err() so we stop at the first
occurrence and also see where we're coming from?

> drivers/usb/core/hcd.c | 36 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 36 insertions(+)
>
> Index: usb-3.6/drivers/usb/core/hcd.c
> ===================================================================
> --- usb-3.6.orig/drivers/usb/core/hcd.c
> +++ usb-3.6/drivers/usb/core/hcd.c
> @@ -1083,6 +1083,8 @@ EXPORT_SYMBOL_GPL(usb_calc_bus_time);
>
> /*-------------------------------------------------------------------------*/
>
> +static bool list_error;
> +
> /**
> * usb_hcd_link_urb_to_ep - add an URB to its endpoint queue
> * @hcd: host controller to which @urb was submitted
> @@ -1126,6 +1128,20 @@ int usb_hcd_link_urb_to_ep(struct usb_hc
> */
> if (HCD_RH_RUNNING(hcd)) {
> urb->unlinked = 0;
> +
> + {
> + struct list_head *cur = &urb->ep->urb_list;
> + struct list_head *prev = cur->prev;
> +
> + if (prev->next != cur && !list_error) {
> + list_error = true;
> + dev_err(&urb->dev->dev,
> + "ep %x list add corruption: %p %p %p\n",
> + urb->ep->desc.bEndpointAddress,
> + cur, prev, prev->next);
> + }
> + }
> +
> list_add_tail(&urb->urb_list, &urb->ep->urb_list);
> } else {
> rc = -ESHUTDOWN;
> @@ -1193,6 +1209,26 @@ void usb_hcd_unlink_urb_from_ep(struct u
> {
> /* clear all state linking urb to this dev (and hcd) */
> spin_lock(&hcd_urb_list_lock);
> + {
> + struct list_head *cur = &urb->urb_list;
> + struct list_head *prev = cur->prev;
> + struct list_head *next = cur->next;
> +
> + if (prev->next != cur && !list_error) {
> + list_error = true;
> + dev_err(&urb->dev->dev,
> + "ep %x list del corruption prev: %p %p %p\n",
> + urb->ep->desc.bEndpointAddress,
> + cur, prev, prev->next);
> + }
> + if (next->prev != cur && !list_error) {
> + list_error = true;
> + dev_err(&urb->dev->dev,
> + "ep %x list del corruption next: %p %p %p\n",
> + urb->ep->desc.bEndpointAddress,
> + cur, next, next->prev);
> + }
> + }
> list_del_init(&urb->urb_list);
> spin_unlock(&hcd_urb_list_lock);
> }
>

2012-10-22 15:54:40

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Mon, 22 Oct 2012, Daniel Mack wrote:

> On 22.10.2012 17:17, Alan Stern wrote:
> > On Sun, 21 Oct 2012, Artem S. Tashkinov wrote:
> >
> >> dmesg messages up to a crash can be seen here: https://bugzilla.kernel.org/attachment.cgi?id=84221
> >
> > The first problem in the log is endpoint list corruption. Here's a
> > debugging patch which should provide a little more information.
>
> Maybe add a BUG() after each of these dev_err() so we stop at the first
> occurrence and also see where we're coming from?

A BUG() at these points would crash the machine hard. And where we
came from doesn't matter; what matters is the values in the pointers.

Alan Stern

2012-10-22 17:30:41

[permalink] [raw]

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Oct 22, 2012, Alan Stern <[email protected]> wrote:

> A BUG() at these points would crash the machine hard. And where we
> came from doesn't matter; what matters is the values in the pointers.

OK, here's what the kernel prints with your patch:

usb 6.1.4: ep 86 list del corruption prev: e5103b54 e5103a94 e51039d4

A small delay before I got thousands of list_del corruption messages would
have been nice, but I managed to catch the message anyway.

Artem

2012-10-22 18:01:09

[permalink] [raw]

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Mon, 22 Oct 2012, Artem S. Tashkinov wrote:

> OK, here's what the kernel prints with your patch:
>
> usb 6.1.4: ep 86 list del corruption prev: e5103b54 e5103a94 e51039d4
>
> A small delay before I got thousands of list_del corruption messages would
> have been nice, but I managed to catch the message anyway.

All right. Here's a new patch, which will print more information and
will provide a 10-second delay.

For this to be useful, you should capture a usbmon trace at the same
time. The relevant entries will show up in the trace shortly before
_and_ shortly after the error message appears.

Alan Stern

P.S.: It will help if you unplug as many of the other USB devices as
possible before running this test.

Index: usb-3.6/drivers/usb/core/hcd.c
===================================================================
--- usb-3.6.orig/drivers/usb/core/hcd.c
+++ usb-3.6/drivers/usb/core/hcd.c
@@ -1083,6 +1083,8 @@ EXPORT_SYMBOL_GPL(usb_calc_bus_time);

 /*-------------------------------------------------------------------------*/

+static bool list_error;
+
 /**
  * usb_hcd_link_urb_to_ep - add an URB to its endpoint queue
  * @hcd: host controller to which @urb was submitted
@@ -1193,6 +1195,25 @@ void usb_hcd_unlink_urb_from_ep(struct u
 {
 	/* clear all state linking urb to this dev (and hcd) */
 	spin_lock(&hcd_urb_list_lock);
+	{
+		struct list_head *cur = &urb->urb_list;
+		struct list_head *prev = cur->prev;
+		struct list_head *next = cur->next;
+
+		if (prev->next != cur && !list_error) {
+			list_error = true;
+			dev_err(&urb->dev->dev,
+				"ep %x list del corruption prev: %p %p %p %p %p\n",
+				urb->ep->desc.bEndpointAddress,
+				cur, prev, prev->next, next, next->prev);
+			dev_err(&urb->dev->dev,
+				"head %p urb %p urbprev %p urbnext %p\n",
+				&urb->ep->urb_list, urb,
+				list_entry(prev, struct urb, urb_list),
+				list_entry(next, struct urb, urb_list));
+			mdelay(10000);
+		}
+	}
 	list_del_init(&urb->urb_list);
 	spin_unlock(&hcd_urb_list_lock);
 }
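
The second dev_err() in this patch converts the neighbouring list nodes
back into URB pointers with list_entry(). A minimal userspace sketch of
that pointer arithmetic follows, assuming the usual offsetof-based
definition; toy_urb, toy_list_entry and urb_from_node are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct urb; only the embedded node matters here. */
struct toy_urb {
	int id;
	struct list_head urb_list;
};

/* Userspace equivalent of list_entry(): step back from member to container. */
#define toy_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the containing toy_urb from a pointer to its embedded list node. */
static struct toy_urb *urb_from_node(struct list_head *node)
{
	assert(node != NULL);
	return toy_list_entry(node, struct toy_urb, urb_list);
}
```

One caveat, offered tentatively: if a neighbour is the endpoint's list
head rather than a node embedded in an URB, the computed pointer does not
refer to a real URB, which may be why the patch prints the head address
as well.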

2012-11-03 14:11:00

by Christof Meerwald

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, 20 Oct 2012 23:15:17 +0000 (GMT), Artem S. Tashkinov wrote:
> It's almost definitely either a USB driver bug or video4linux driver bug:
>
> I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
> https://lkml.org/lkml/2012/10/20/35
> https://lkml.org/lkml/2012/10/20/148

Not sure if it's related, but I am seeing a kernel freeze with a
usb-audio headset (connected via an external USB hub) on Linux 3.5.0
(Ubuntu 12.10) - see
http://comments.gmane.org/gmane.comp.voip.twinkle/3052 and
http://pastebin.com/aHGe1S1X for a self-contained C test.

Christof

--

http://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org

2012-11-03 14:16:48

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 03.11.2012 15:10, Christof Meerwald wrote:
> On Sat, 20 Oct 2012 23:15:17 +0000 (GMT), Artem S. Tashkinov wrote:
>> It's almost definitely either a USB driver bug or video4linux driver bug:
>>
>> I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
>> https://lkml.org/lkml/2012/10/20/35
>> https://lkml.org/lkml/2012/10/20/148
>
> Not sure if it's related, but I am seeing a kernel freeze with a
> usb-audio headset (connected via an external USB hub) on Linux 3.5.0
> (Ubuntu 12.10) - see

Does Ubuntu 12.10 really ship with 3.5.0? Nothing more recent?

> http://comments.gmane.org/gmane.comp.voip.twinkle/3052 and
> http://pastebin.com/aHGe1S1X for a self-contained C test.

Some questions:

- Are you seeing the same issue with 3.6.x?
- If you can reproduce this issue, could you paste the messages in
dmesg when this happens? Do they resemble the list corruption that
was reported?
- Do you see the same problem with 3.4?
- Are you able to apply the patch Alan Stern posted in this thread earlier?

We should really sort this out, but I unfortunately lack a system or
setup that shows the bug.

Thanks,
Daniel

2012-11-03 14:39:03

by Sven-Haegar Koch

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, 3 Nov 2012, Daniel Mack wrote:

> On 03.11.2012 15:10, Christof Meerwald wrote:
> > On Sat, 20 Oct 2012 23:15:17 +0000 (GMT), Artem S. Tashkinov wrote:
> >> It's almost definitely either a USB driver bug or video4linux driver bug:
> >>
> >> I'm CC'ing linux-media and linux-usb mailing lists, the problem is described here:
> >> https://lkml.org/lkml/2012/10/20/35
> >> https://lkml.org/lkml/2012/10/20/148
> >
> > Not sure if it's related, but I am seeing a kernel freeze with a
> > usb-audio headset (connected via an external USB hub) on Linux 3.5.0
> > (Ubuntu 12.10) - see
>
> Does Ubuntu 12.10 really ship with 3.5.0? Nothing more recent?

They ship 3.5.7 plus some more fixes, but call it 3.5.0-18.29

c'ya
sven-haegar

--
Three may keep a secret, if two of them are dead.
- Ben F.

2012-11-05 19:13:48

by Christof Meerwald

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Sat, Nov 03, 2012 at 03:16:36PM +0100, Daniel Mack wrote:
> On 03.11.2012 15:10, Christof Meerwald wrote:
> > http://comments.gmane.org/gmane.comp.voip.twinkle/3052 and
> > http://pastebin.com/aHGe1S1X for a self-contained C test.
> Some questions:
>
> - Are you seeing the same issue with 3.6.x?

I haven't tried it myself, but the other poster on
http://comments.gmane.org/gmane.comp.voip.twinkle/3052 mentions 3.6.2
(and 3.6.3)

> - If you can reproduce this issue, could you paste the messages in
> dmesg when this happens? Do they resemble the list corruption that
> was reported?

I am not seeing any kernel messages at all - the system just freezes
and not even the SysRq stuff works after that.

> - Do you see the same problem with 3.4?

I upgraded from Ubuntu 12.04 (Linux 3.2) where I didn't see the
problem. However,
http://www.linuxquestions.org/questions/linux-desktop-74/twinkle-causes-linux-freeze-kernel-3-6-2-a-4175433799/
mentions 3.4.0

> - Are you able to apply the patch Alan Stern posted in this thread earlier?

Unfortunately, I am not really in a position to apply kernel patches
at the moment.

> We should really sort this out, but I unfortunately lack a system or
> setup that shows the bug.

BTW, I have been able to reproduce the problem on a completely
different machine (also running Ubuntu 12.10, but different hardware).
The important thing appears to be that the USB audio device is
connected via a USB 2.0 hub (and then using the test code posted in
http://pastebin.com/aHGe1S1X specifying the audio device as
"plughw:Set" (or whatever it's called) seems to trigger the freeze).

So I guess another question is: do you have a USB headset connected
via a USB 2.0 hub and are not seeing the problem, or is your USB headset
not connected via a USB 2.0 hub? (Of course, it would also be useful
if others could comment on whether they are seeing the problem with
that setup.)

Christof

--

http://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org

2012-11-07 17:34:45

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Mon, 5 Nov 2012, Christof Meerwald wrote:

> BTW, I have been able to reproduce the problem on a completely
> different machine (also running Ubuntu 12.10, but different hardware).
> The important thing appears to be that the USB audio device is
> connected via a USB 2.0 hub (and then using the test code posted in
> http://pastebin.com/aHGe1S1X specifying the audio device as
> "plughw:Set" (or whatever it's called) seems to trigger the freeze).

Christof: Thank you for that reference, it was a big help. After
crashing my system many times I have tracked the problem, at least in
part. The patch below should prevent your system from freezing.

Takashi: It turns out that the problem is triggered when the audio
subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls
snd_usb_endpoint_start(). Since the driver doesn't wait for the
outstanding URBs to finish, it tries to submit them again while they
are still active.

Normally the USB core would realize this and fail the submission, but a
bug in ehci-hcd prevented this from happening. (That bug is what the
patch below fixes.) The URB gets added to the active list twice,
resulting in list corruption and an oops in interrupt context, which
freezes the system.

The user program that triggers the problem basically looks like this:

snd_pcm_prepare(rec_pcm);
snd_pcm_start(rec_pcm);
snd_pcm_drop(rec_pcm);

snd_pcm_prepare(rec_pcm);
snd_pcm_start(rec_pcm);

The snd_pcm_drop call unlinks the URBs but does not wait for them to
finish. Then the second snd_pcm_start call submits the URBs before
they have finished.

What is the right solution for this problem?

Alan Stern

Index: usb-3.7/drivers/usb/host/ehci-sched.c
===================================================================
--- usb-3.7.orig/drivers/usb/host/ehci-sched.c
+++ usb-3.7/drivers/usb/host/ehci-sched.c
@@ -1632,7 +1632,7 @@ static void itd_link_urb(

 	/* don't need that schedule data any more */
 	iso_sched_free (stream, iso_sched);
-	urb->hcpriv = NULL;
+	urb->hcpriv = stream;

 	++ehci->isoc_count;
 	enable_periodic(ehci);
@@ -2031,7 +2031,7 @@ static void sitd_link_urb(

 	/* don't need that schedule data any more */
 	iso_sched_free (stream, sched);
-	urb->hcpriv = NULL;
+	urb->hcpriv = stream;

 	++ehci->isoc_count;
 	enable_periodic(ehci);
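
Why this one-line change matters can be illustrated with a toy model. It
assumes, as the description in this message implies, that a submission is
refused while an URB's hcpriv is non-NULL; the types, function names, and
the -EBUSY return value here are hypothetical and are not the kernel's
code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced URB: hcpriv is non-NULL while the HCD owns it. */
struct toy_urb {
	void *hcpriv;
	int times_queued;
};

/* Toy HCD enqueue. `keep_hcpriv` selects the behaviour after the patch. */
static void toy_hcd_link(struct toy_urb *urb, void *stream, int keep_hcpriv)
{
	urb->times_queued++;
	urb->hcpriv = keep_hcpriv ? stream : NULL;
}

/* Toy submit: refuse an URB that still looks owned by the controller. */
static int toy_submit(struct toy_urb *urb, void *stream, int keep_hcpriv)
{
	assert(urb != NULL);
	if (urb->hcpriv)
		return -EBUSY;
	toy_hcd_link(urb, stream, keep_hcpriv);
	return 0;
}
```

With keep_hcpriv set to 0 the model accepts the same URB twice, so it
ends up queued twice; with keep_hcpriv set to 1 the second submission is
rejected.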

2012-11-07 19:19:23

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

At Wed, 7 Nov 2012 12:34:43 -0500 (EST),
Alan Stern wrote:
>
> On Mon, 5 Nov 2012, Christof Meerwald wrote:
>
> > BTW, I have been able to reproduce the problem on a completely
> > different machine (also running Ubuntu 12.10, but different hardware).
> > The important thing appears to be that the USB audio device is
> > connected via a USB 2.0 hub (and then using the test code posted in
> > http://pastebin.com/aHGe1S1X specifying the audio device as
> > "plughw:Set" (or whatever it's called) seems to trigger the freeze).
>
> Christof: Thank you for that reference, it was a big help. After
> crashing my system many times I have tracked the problem, at least in
> part. The patch below should prevent your system from freezing.
>
>
> Takashi: It turns out that the problem is triggered when the audio
> subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls
> snd_usb_endpoint_start(). Since the driver doesn't wait for the
> outstanding URBs to finish, it tries to submit them again while they
> are still active.
>
> Normally the USB core would realize this and fail the submission, but a
> bug in ehci-hcd prevented this from happening. (That bug is what the
> patch below fixes.) The URB gets added to the active list twice,
> resulting in list corruption and an oops in interrupt context, which
> freezes the system.
>
> The user program that triggers the problem basically looks like this:
>
> snd_pcm_prepare(rec_pcm);
> snd_pcm_start(rec_pcm);
> snd_pcm_drop(rec_pcm);
>
> snd_pcm_prepare(rec_pcm);
> snd_pcm_start(rec_pcm);
>
> The snd_pcm_drop call unlinks the URBs but does not wait for them to
> finish. Then the second snd_pcm_start call submits the URBs before
> they have finished.
>
> What is the right solution for this problem?

How about the patch below? (It's for 3.6, and won't be applied cleanly
to 3.7, but easy to adapt.)

Takashi

---
diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
index d9de667..38830e2 100644
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -35,6 +35,7 @@

 #define EP_FLAG_ACTIVATED 0
 #define EP_FLAG_RUNNING 1
+#define EP_FLAG_STOPPING 2

 /*
  * snd_usb_endpoint is a model that abstracts everything related to an
@@ -502,10 +503,19 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
 	if (alive)
 		snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
 					alive, ep->ep_num);
+	clear_bit(EP_FLAG_STOPPING, &ep->flags);

 	return 0;
 }

+/* wait until urbs are really dropped */
+void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
+{
+	if (test_bit(EP_FLAG_STOPPING, &ep->flags))
+		wait_clear_urbs(ep);
+}
+
+
 /*
  * unlink active urbs.
  */
@@ -913,6 +923,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,

 		if (wait)
 			wait_clear_urbs(ep);
+		else
+			set_bit(EP_FLAG_STOPPING, &ep->flags);
 	}
 }

diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
index cbbbdf2..c1540a4 100644
--- a/sound/usb/endpoint.h
+++ b/sound/usb/endpoint.h
@@ -16,6 +16,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
 int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
 void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
 			   int force, int can_sleep, int wait);
+void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
 int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
 int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
 void snd_usb_endpoint_free(struct list_head *head);
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index f782ce1..aee3ab0 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -546,6 +546,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 	if (snd_BUG_ON(!subs->data_endpoint))
 		return -EIO;

+	if (subs->sync_endpoint)
+		snd_usb_endpoint_sync_stop(subs->sync_endpoint);
+	if (subs->data_endpoint)
+		snd_usb_endpoint_sync_stop(subs->data_endpoint);
+
 	/* some unit conversions in runtime */
 	subs->data_endpoint->maxframesize =
 		bytes_to_frames(runtime, subs->data_endpoint->maxpacksize);

2012-11-07 20:37:20

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Wed, 7 Nov 2012, Takashi Iwai wrote:

> > What is the right solution for this problem?
>
> How about the patch below? (It's for 3.6, and won't be applied cleanly
> to 3.7, but easy to adapt.)

I simplified your patch a little. This is for 3.7, not 3.6. I
verified that it does fix the problem raised by the test program.

If you think this is okay, I'll submit it officially.

Alan Stern

Index: usb-3.7/sound/usb/endpoint.h
===================================================================
--- usb-3.7.orig/sound/usb/endpoint.h
+++ usb-3.7/sound/usb/endpoint.h
@@ -19,6 +19,7 @@ int snd_usb_endpoint_set_params(struct s
 int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
 void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
 			   int force, int can_sleep, int wait);
+void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
 int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
 int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
 void snd_usb_endpoint_free(struct list_head *head);
Index: usb-3.7/sound/usb/pcm.c
===================================================================
--- usb-3.7.orig/sound/usb/pcm.c
+++ usb-3.7/sound/usb/pcm.c
@@ -576,6 +576,11 @@ static int snd_usb_pcm_prepare(struct sn
 		subs->need_setup_ep = false;
 	}

+	if (subs->sync_endpoint)
+		snd_usb_endpoint_sync_stop(subs->sync_endpoint);
+	if (subs->data_endpoint)
+		snd_usb_endpoint_sync_stop(subs->data_endpoint);
+
 	/* some unit conversions in runtime */
 	subs->data_endpoint->maxframesize =
 		bytes_to_frames(runtime, subs->data_endpoint->maxpacksize);
Index: usb-3.7/sound/usb/endpoint.c
===================================================================
--- usb-3.7.orig/sound/usb/endpoint.c
+++ usb-3.7/sound/usb/endpoint.c
@@ -481,7 +481,7 @@ __exit_unlock:
 /*
  * wait until all urbs are processed.
  */
-static int wait_clear_urbs(struct snd_usb_endpoint *ep)
+void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
 {
 	unsigned long end_time = jiffies + msecs_to_jiffies(1000);
 	unsigned int i;
@@ -502,8 +502,6 @@ static int wait_clear_urbs(struct snd_us
 	if (alive)
 		snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
 					alive, ep->ep_num);
-
-	return 0;
 }

 /*
@@ -556,7 +554,7 @@ static void release_urbs(struct snd_usb_

 	/* stop urbs */
 	deactivate_urbs(ep, force, 1);
-	wait_clear_urbs(ep);
+	snd_usb_endpoint_sync_stop(ep);

 	for (i = 0; i < ep->nurbs; i++)
 		release_urb_ctx(&ep->urb[i]);
@@ -833,7 +831,7 @@ int snd_usb_endpoint_start(struct snd_us
 	/* just to be sure */
 	deactivate_urbs(ep, 0, can_sleep);
 	if (can_sleep)
-		wait_clear_urbs(ep);
+		snd_usb_endpoint_sync_stop(ep);

 	ep->active_mask = 0;
 	ep->unlink_mask = 0;
@@ -917,7 +915,7 @@ void snd_usb_endpoint_stop(struct snd_us
 		ep->prepare_data_urb = NULL;

 		if (wait)
-			wait_clear_urbs(ep);
+			snd_usb_endpoint_sync_stop(ep);
 	}
 }

@@ -940,7 +938,7 @@ int snd_usb_endpoint_deactivate(struct s
 		return -EINVAL;

 	deactivate_urbs(ep, 1, 1);
-	wait_clear_urbs(ep);
+	snd_usb_endpoint_sync_stop(ep);

 	if (ep->use_count != 0)
 		return 0;

2012-11-07 20:46:28

by Christof Meerwald

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Wed, Nov 07, 2012 at 08:19:19PM +0100, Takashi Iwai wrote:
> How about the patch below? (It's for 3.6, and won't be applied cleanly
> to 3.7, but easy to adapt.)

Thanks, that patch seems to fix the problem.

Christof

--

http://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org

2012-11-07 20:59:41

[permalink] [raw]

Subject: Re: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Nov 8, 2012, Takashi Iwai wrote:

> How about the patch below? (It's for 3.6, and won't be applied cleanly
> to 3.7, but easy to adapt.)

This patch fixes my problem, thank you!

You can add me as "Tested by".

Artem

2012-11-08 00:43:12

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 07.11.2012 20:19, Takashi Iwai wrote:
> At Wed, 7 Nov 2012 12:34:43 -0500 (EST),
> Alan Stern wrote:
>>
>> On Mon, 5 Nov 2012, Christof Meerwald wrote:
>>
>>> BTW, I have been able to reproduce the problem on a completely
>>> different machine (also running Ubuntu 12.10, but different hardware).
>>> The important thing appears to be that the USB audio device is
>>> connected via a USB 2.0 hub (and then using the test code posted in
>>> http://pastebin.com/aHGe1S1X specifying the audio device as
>>> "plughw:Set" (or whatever it's called) seems to trigger the freeze).
>>
>> Christof: Thank you for that reference, it was a big help. After
>> crashing my system many times I have tracked the problem, at least in
>> part. The patch below should prevent your system from freezing.
>>
>>
>> Takashi: It turns out that the problem is triggered when the audio
>> subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls
>> snd_usb_endpoint_start(). Since the driver doesn't wait for the
>> outstanding URBs to finish, it tries to submit them again while they
>> are still active.
>>
>> Normally the USB core would realize this and fail the submission, but a
>> bug in ehci-hcd prevented this from happening. (That bug is what the
>> patch below fixes.) The URB gets added to the active list twice,
>> resulting in list corruption and an oops in interrupt context, which
>> freezes the system.
>>
>> The user program that triggers the problem basically looks like this:
>>
>> snd_pcm_prepare(rec_pcm);
>> snd_pcm_start(rec_pcm);
>> snd_pcm_drop(rec_pcm);
>>
>> snd_pcm_prepare(rec_pcm);
>> snd_pcm_start(rec_pcm);
>>
>> The snd_pcm_drop call unlinks the URBs but does not wait for them to
>> finish. Then the second snd_pcm_start call submits the URBs before
>> they have finished.

Thanks for investigating this, and thanks to everyone who tested the
provided patch so quickly. It seems we now have the right idea of where
the problem really is.

However, the proposed patch seems wrong to me (see below).

>> What is the right solution for this problem?
>
> How about the patch below? (It's for 3.6, and won't be applied cleanly
> to 3.7, but easy to adapt.)
>
>
> Takashi
>
> ---
> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> index d9de667..38830e2 100644
> --- a/sound/usb/endpoint.c
> +++ b/sound/usb/endpoint.c
> @@ -35,6 +35,7 @@
>
> #define EP_FLAG_ACTIVATED 0
> #define EP_FLAG_RUNNING 1
> +#define EP_FLAG_STOPPING 2
>
> /*
> * snd_usb_endpoint is a model that abstracts everything related to an
> @@ -502,10 +503,19 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
> if (alive)
> snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
> alive, ep->ep_num);
> + clear_bit(EP_FLAG_STOPPING, &ep->flags);
>
> return 0;
> }
>
> +/* wait until urbs are really dropped */
> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
> +{
> + if (test_bit(EP_FLAG_STOPPING, &ep->flags))
> + wait_clear_urbs(ep);
> +}
> +
> +
> /*
> * unlink active urbs.
> */
> @@ -913,6 +923,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
>
> if (wait)
> wait_clear_urbs(ep);
> + else
> + set_bit(EP_FLAG_STOPPING, &ep->flags);
> }
> }
>
> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
> index cbbbdf2..c1540a4 100644
> --- a/sound/usb/endpoint.h
> +++ b/sound/usb/endpoint.h
> @@ -16,6 +16,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
> int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
> void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
> int force, int can_sleep, int wait);
> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
> int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
> int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
> void snd_usb_endpoint_free(struct list_head *head);
> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
> index f782ce1..aee3ab0 100644
> --- a/sound/usb/pcm.c
> +++ b/sound/usb/pcm.c
> @@ -546,6 +546,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
> if (snd_BUG_ON(!subs->data_endpoint))
> return -EIO;
>
> + if (subs->sync_endpoint)
> + snd_usb_endpoint_sync_stop(subs->sync_endpoint);
> + if (subs->data_endpoint)
> + snd_usb_endpoint_sync_stop(subs->data_endpoint);

We can't simply stop both endpoints in the prepare callback. The essence
of the new reference-counting system is that we can use endpoints from
multiple contexts, and the logic inside endpoint.c takes care of when
to start up and take down the URBs. The idea here is that endpoints can
be run for many purposes, and the new implementation that was added
allows capture endpoints to run purely as a timing reference for playback.
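
A rough sketch of the reference-counting idea described here, assuming a
simple per-endpoint use counter (the patches in this thread do reference
ep->use_count, but the struct and function names below are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical endpoint shared by several users (e.g. playback and capture). */
struct toy_ep {
	int use_count;   /* how many users currently want the endpoint running */
	bool streaming;  /* whether URBs are notionally in flight */
};

/* Only the first user actually starts streaming. */
static void toy_ep_start(struct toy_ep *ep)
{
	assert(ep != NULL);
	if (++ep->use_count == 1)
		ep->streaming = true;
}

/* Only the last user actually stops streaming. */
static void toy_ep_stop(struct toy_ep *ep)
{
	assert(ep != NULL && ep->use_count > 0);
	if (--ep->use_count == 0)
		ep->streaming = false;
}
```

In this model a stop issued by one user leaves the endpoint streaming as
long as another user still holds a reference, which is the property an
unconditional stop in the prepare callback would break.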

This bug needs to be fixed in the ehci controller, or we need some other
solution in the snd-usb-audio driver. I'll do some test once I'm back
from ELC.

Daniel

2012-11-08 06:43:50

[permalink] [raw]

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

At Thu, 08 Nov 2012 01:42:59 +0100,
Daniel Mack wrote:
>
> On 07.11.2012 20:19, Takashi Iwai wrote:
> > At Wed, 7 Nov 2012 12:34:43 -0500 (EST),
> > Alan Stern wrote:
> >>
> >> On Mon, 5 Nov 2012, Christof Meerwald wrote:
> >>
> >>> BTW, I have been able to reproduce the problem on a completely
> >>> different machine (also running Ubuntu 12.10, but different hardware).
> >>> The important thing appears to be that the USB audio device is
> >>> connected via a USB 2.0 hub (and then using the test code posted in
> >>> http://pastebin.com/aHGe1S1X specifying the audio device as
> >>> "plughw:Set" (or whatever it's called) seems to trigger the freeze).
> >>
> >> Christof: Thank you for that reference, it was a big help. After
> >> crashing my system many times I have tracked the problem, at least in
> >> part. The patch below should prevent your system from freezing.
> >>
> >>
> >> Takashi: It turns out that the problem is triggered when the audio
> >> subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls
> >> snd_usb_endpoint_start(). Since the driver doesn't wait for the
> >> outstanding URBs to finish, it tries to submit them again while they
> >> are still active.
> >>
> >> Normally the USB core would realize this and fail the submission, but a
> >> bug in ehci-hcd prevented this from happening. (That bug is what the
> >> patch below fixes.) The URB gets added to the active list twice,
> >> resulting in list corruption and an oops in interrupt context, which
> >> freezes the system.
> >>
> >> The user program that triggers the problem basically looks like this:
> >>
> >> snd_pcm_prepare(rec_pcm);
> >> snd_pcm_start(rec_pcm);
> >> snd_pcm_drop(rec_pcm);
> >>
> >> snd_pcm_prepare(rec_pcm);
> >> snd_pcm_start(rec_pcm);
> >>
> >> The snd_pcm_drop call unlinks the URBs but does not wait for them to
> >> finish. Then the second snd_pcm_start call submits the URBs before
> >> they have finished.
>
>
> Thanks for investigating this, and thanks to everyone who tested the
> provided patch so quickly. It seems we now have the right idea of where
> the problem really is.
>
> However, the proposed patch seems wrong to me (see below).
>
> >> What is the right solution for this problem?
> >
> > How about the patch below? (It's for 3.6, and won't be applied cleanly
> > to 3.7, but easy to adapt.)
> >
> >
> > Takashi
> >
> > ---
> > diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
> > index d9de667..38830e2 100644
> > --- a/sound/usb/endpoint.c
> > +++ b/sound/usb/endpoint.c
> > @@ -35,6 +35,7 @@
> >
> > #define EP_FLAG_ACTIVATED 0
> > #define EP_FLAG_RUNNING 1
> > +#define EP_FLAG_STOPPING 2
> >
> > /*
> > * snd_usb_endpoint is a model that abstracts everything related to an
> > @@ -502,10 +503,19 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
> > if (alive)
> > snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
> > alive, ep->ep_num);
> > + clear_bit(EP_FLAG_STOPPING, &ep->flags);
> >
> > return 0;
> > }
> >
> > +/* wait until urbs are really dropped */
> > +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
> > +{
> > + if (test_bit(EP_FLAG_STOPPING, &ep->flags))
> > + wait_clear_urbs(ep);
> > +}
> > +
> > +
> > /*
> > * unlink active urbs.
> > */
> > @@ -913,6 +923,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
> >
> > if (wait)
> > wait_clear_urbs(ep);
> > + else
> > + set_bit(EP_FLAG_STOPPING, &ep->flags);
> > }
> > }
> >
> > diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
> > index cbbbdf2..c1540a4 100644
> > --- a/sound/usb/endpoint.h
> > +++ b/sound/usb/endpoint.h
> > @@ -16,6 +16,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
> > int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
> > void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
> > int force, int can_sleep, int wait);
> > +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
> > int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
> > int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
> > void snd_usb_endpoint_free(struct list_head *head);
> > diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
> > index f782ce1..aee3ab0 100644
> > --- a/sound/usb/pcm.c
> > +++ b/sound/usb/pcm.c
> > @@ -546,6 +546,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
> > if (snd_BUG_ON(!subs->data_endpoint))
> > return -EIO;
> >
> > + if (subs->sync_endpoint)
> > + snd_usb_endpoint_sync_stop(subs->sync_endpoint);
> > + if (subs->data_endpoint)
> > + snd_usb_endpoint_sync_stop(subs->data_endpoint);
>
> We can't simply stop both endpoints in the prepare callback.

The new function doesn't stop the stream by itself; it only syncs
if a stop was already issued beforehand. So, it's safe to call it
there.

Maybe the name was confusing. It should have been like
snd_usb_endpoint_sync_pending_stop() or such.
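
To make the intended semantics concrete, here is a minimal user-space
sketch of the flag handling. It is only a model: struct flag_ep, the
flag_* helpers and the wait counter are hypothetical stand-ins, and the
real wait_clear_urbs() polls the URBs rather than just counting calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; names echo the patch but this is not kernel code. */
enum { EP_FLAG_RUNNING = 1, EP_FLAG_STOPPING = 2 };   /* bit numbers, as in the patch */

struct flag_ep {
    unsigned long flags;   /* bit field indexed by EP_FLAG_* */
    int waits;             /* how often the wait helper ran */
};

static void flag_set(int nr, unsigned long *w)   { *w |= 1UL << nr; }
static void flag_clear(int nr, unsigned long *w) { *w &= ~(1UL << nr); }
static bool flag_test(int nr, const unsigned long *w) { return ((*w >> nr) & 1UL) != 0; }

/* Stand-in for wait_clear_urbs(): record the wait and clear the pending flag. */
static void flag_wait_clear_urbs(struct flag_ep *ep)
{
    ep->waits++;
    flag_clear(EP_FLAG_STOPPING, &ep->flags);
}

static void flag_ep_start(struct flag_ep *ep)
{
    flag_set(EP_FLAG_RUNNING, &ep->flags);
}

/* Stand-in for the stop path: wait right away, or remember that a stop is pending. */
static void flag_ep_stop(struct flag_ep *ep, bool wait)
{
    flag_clear(EP_FLAG_RUNNING, &ep->flags);
    if (wait)
        flag_wait_clear_urbs(ep);
    else
        flag_set(EP_FLAG_STOPPING, &ep->flags);
}

/* Stand-in for the new sync function: it never stops anything by itself. */
static void flag_ep_sync_pending_stop(struct flag_ep *ep)
{
    if (flag_test(EP_FLAG_STOPPING, &ep->flags))
        flag_wait_clear_urbs(ep);
}

/* Walks prepare/start/drop/prepare/start and returns the number of waits. */
int flag_ep_demo(void)
{
    struct flag_ep ep = { 0, 0 };

    flag_ep_sync_pending_stop(&ep);   /* first prepare: nothing pending */
    flag_ep_start(&ep);
    flag_ep_stop(&ep, false);         /* drop: unlink without waiting */
    flag_ep_sync_pending_stop(&ep);   /* second prepare: waits once */
    flag_ep_start(&ep);
    flag_ep_sync_pending_stop(&ep);   /* running endpoint: left alone */
    assert(flag_test(EP_FLAG_RUNNING, &ep.flags));
    return ep.waits;
}
```

In this model the only wait happens in the second prepare, and an
endpoint that is merely running is never touched by the sync call.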

Takashi

> The essence
> of the new reference-counting system is that we can use endpoints from
> multiple contexts, and the logic inside endpoint.c takes care of when
> to start up and take down the urbs. The idea here is that endpoints can
> be run for many purposes, and the new implementation that was added
> allows capture endpoints to run purely as timing reference for playback.
>
> This bug needs to be fixed in the ehci controller, or we need some other
> solution in the snd-usb-audio driver. I'll do some tests once I'm back
> from ELC.
>
>
> Daniel
>

2012-11-08 06:48:25

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

At Wed, 7 Nov 2012 15:37:17 -0500 (EST),
Alan Stern wrote:
>
> On Wed, 7 Nov 2012, Takashi Iwai wrote:
>
> > > What is the right solution for this problem?
> >
> > How about the patch below? (It's for 3.6, and won't be applied cleanly
> > to 3.7, but easy to adapt.)
>
> I simplified your patch a little.

You can't drop the check for a stopping endpoint. As Daniel pointed out,
endpoints might still be running when it's called. I already made a
similar mistake in the past, so this patch is a revised version with
the check for pending operations.
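
A rough user-space sketch of why the check matters, under my own
assumptions: struct wait_ep, the poll budget and both sync variants are
hypothetical, and the loop is only loosely patterned on the bounded wait
in wait_clear_urbs(). A running endpoint keeps its URBs in flight, so an
unconditional wait would presumably spend its whole budget, while the
checked variant returns at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an endpoint; not the kernel's struct snd_usb_endpoint. */
struct wait_ep {
    unsigned int active_urbs;  /* URBs currently in flight */
    bool running;              /* true: completed URBs get resubmitted */
    bool stopping;             /* true: a stop was issued without waiting */
};

/*
 * Bounded wait: poll until no URB is active or the budget is used up.
 * A draining endpoint retires one URB per poll; a running one stays busy.
 * Returns the number of polls spent.
 */
static unsigned int wait_ep_clear(struct wait_ep *ep, unsigned int max_polls)
{
    unsigned int polls = 0;

    while (ep->active_urbs > 0 && polls < max_polls) {
        if (!ep->running)
            ep->active_urbs--;
        polls++;
    }
    ep->stopping = false;
    return polls;
}

/* Variant without the check: always waits. */
unsigned int wait_ep_sync_unconditional(struct wait_ep *ep, unsigned int max_polls)
{
    return wait_ep_clear(ep, max_polls);
}

/* Variant with the check: waits only when a stop is pending. */
unsigned int wait_ep_sync_checked(struct wait_ep *ep, unsigned int max_polls)
{
    if (!ep->stopping)
        return 0;
    return wait_ep_clear(ep, max_polls);
}
```

With four URBs in flight and a budget of 1000 polls, the unconditional
variant burns all 1000 polls on a running endpoint, whereas the checked
one spends none there and exactly four on an endpoint that is draining
after a stop.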

> This is for 3.7, not 3.6. I
> verified that it does fix the problem raised by the test program.
>
> If you think this is okay, I'll submit it officially.

Don't worry, my patch is based on 3.7, too :) The 3.6 patch was
provided just for convenience, since the testers seemed to have 3.6 systems.

thanks,

Takashi

>
> Alan Stern
>
>
>
> Index: usb-3.7/sound/usb/endpoint.h
> ===================================================================
> --- usb-3.7.orig/sound/usb/endpoint.h
> +++ usb-3.7/sound/usb/endpoint.h
> @@ -19,6 +19,7 @@ int snd_usb_endpoint_set_params(struct s
> int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
> void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
> int force, int can_sleep, int wait);
> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
> int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
> int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
> void snd_usb_endpoint_free(struct list_head *head);
> Index: usb-3.7/sound/usb/pcm.c
> ===================================================================
> --- usb-3.7.orig/sound/usb/pcm.c
> +++ usb-3.7/sound/usb/pcm.c
> @@ -576,6 +576,11 @@ static int snd_usb_pcm_prepare(struct sn
> subs->need_setup_ep = false;
> }
>
> + if (subs->sync_endpoint)
> + snd_usb_endpoint_sync_stop(subs->sync_endpoint);
> + if (subs->data_endpoint)
> + snd_usb_endpoint_sync_stop(subs->data_endpoint);
> +
> /* some unit conversions in runtime */
> subs->data_endpoint->maxframesize =
> bytes_to_frames(runtime, subs->data_endpoint->maxpacksize);
> Index: usb-3.7/sound/usb/endpoint.c
> ===================================================================
> --- usb-3.7.orig/sound/usb/endpoint.c
> +++ usb-3.7/sound/usb/endpoint.c
> @@ -481,7 +481,7 @@ __exit_unlock:
> /*
> * wait until all urbs are processed.
> */
> -static int wait_clear_urbs(struct snd_usb_endpoint *ep)
> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
> {
> unsigned long end_time = jiffies + msecs_to_jiffies(1000);
> unsigned int i;
> @@ -502,8 +502,6 @@ static int wait_clear_urbs(struct snd_us
> if (alive)
> snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
> alive, ep->ep_num);
> -
> - return 0;
> }
>
> /*
> @@ -556,7 +554,7 @@ static void release_urbs(struct snd_usb_
>
> /* stop urbs */
> deactivate_urbs(ep, force, 1);
> - wait_clear_urbs(ep);
> + snd_usb_endpoint_sync_stop(ep);
>
> for (i = 0; i < ep->nurbs; i++)
> release_urb_ctx(&ep->urb[i]);
> @@ -833,7 +831,7 @@ int snd_usb_endpoint_start(struct snd_us
> /* just to be sure */
> deactivate_urbs(ep, 0, can_sleep);
> if (can_sleep)
> - wait_clear_urbs(ep);
> + snd_usb_endpoint_sync_stop(ep);
>
> ep->active_mask = 0;
> ep->unlink_mask = 0;
> @@ -917,7 +915,7 @@ void snd_usb_endpoint_stop(struct snd_us
> ep->prepare_data_urb = NULL;
>
> if (wait)
> - wait_clear_urbs(ep);
> + snd_usb_endpoint_sync_stop(ep);
> }
> }
>
> @@ -940,7 +938,7 @@ int snd_usb_endpoint_deactivate(struct s
> return -EINVAL;
>
> deactivate_urbs(ep, 1, 1);
> - wait_clear_urbs(ep);
> + snd_usb_endpoint_sync_stop(ep);
>
> if (ep->use_count != 0)
> return 0;
>

2012-11-08 07:31:42

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On 08.11.2012 07:43, Takashi Iwai wrote:
> At Thu, 08 Nov 2012 01:42:59 +0100,
> Daniel Mack wrote:
>>
>> On 07.11.2012 20:19, Takashi Iwai wrote:
>>> At Wed, 7 Nov 2012 12:34:43 -0500 (EST),
>>> Alan Stern wrote:
>>>>
>>>> On Mon, 5 Nov 2012, Christof Meerwald wrote:
>>>>
>>>>> BTW, I have been able to reproduce the problem on a completely
>>>>> different machine (also running Ubuntu 12.10, but different hardware).
>>>>> The important thing appears to be that the USB audio device is
>>>>> connected via a USB 2.0 hub (and then using the test code posted in
>>>>> http://pastebin.com/aHGe1S1X specifying the audio device as
>>>>> "plughw:Set" (or whatever it's called) seems to trigger the freeze).
>>>>
>>>> Christof: Thank you for that reference, it was a big help. After
>>>> crashing my system many times I have tracked the problem, at least in
>>>> part. The patch below should prevent your system from freezing.
>>>>
>>>>
>>>> Takashi: It turns out that the problem is triggered when the audio
>>>> subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls
>>>> snd_usb_endpoint_start(). Since the driver doesn't wait for the
>>>> outstanding URBs to finish, it tries to submit them again while they
>>>> are still active.
>>>>
>>>> Normally the USB core would realize this and fail the submission, but a
>>>> bug in ehci-hcd prevented this from happening. (That bug is what the
>>>> patch below fixes.) The URB gets added to the active list twice,
>>>> resulting in list corruption and an oops in interrupt context, which
>>>> freezes the system.
>>>>
>>>> The user program that triggers the problem basically looks like this:
>>>>
>>>> snd_pcm_prepare(rec_pcm);
>>>> snd_pcm_start(rec_pcm);
>>>> snd_pcm_drop(rec_pcm);
>>>>
>>>> snd_pcm_prepare(rec_pcm);
>>>> snd_pcm_start(rec_pcm);
>>>>
>>>> The snd_pcm_drop call unlinks the URBs but does not wait for them to
>>>> finish. Then the second snd_pcm_start call submits the URBs before
>>>> they have finished.
>>
>>
>> Thanks for investigating this, and to everyone who so quickly tested
>> the provided patch. It seems we now have the right idea of where the
>> problem really is.
>>
>> However, the proposed patch seems wrong to me (see below).
>>
>>>> What is the right solution for this problem?
>>>
>>> How about the patch below? (It's for 3.6, and won't be applied cleanly
>>> to 3.7, but easy to adapt.)
>>>
>>>
>>> Takashi
>>>
>>> ---
>>> diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
>>> index d9de667..38830e2 100644
>>> --- a/sound/usb/endpoint.c
>>> +++ b/sound/usb/endpoint.c
>>> @@ -35,6 +35,7 @@
>>>
>>> #define EP_FLAG_ACTIVATED 0
>>> #define EP_FLAG_RUNNING 1
>>> +#define EP_FLAG_STOPPING 2
>>>
>>> /*
>>> * snd_usb_endpoint is a model that abstracts everything related to an
>>> @@ -502,10 +503,19 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
>>> if (alive)
>>> snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
>>> alive, ep->ep_num);
>>> + clear_bit(EP_FLAG_STOPPING, &ep->flags);
>>>
>>> return 0;
>>> }
>>>
>>> +/* wait until urbs are really dropped */
>>> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep)
>>> +{
>>> + if (test_bit(EP_FLAG_STOPPING, &ep->flags))
>>> + wait_clear_urbs(ep);
>>> +}
>>> +
>>> +
>>> /*
>>> * unlink active urbs.
>>> */
>>> @@ -913,6 +923,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
>>>
>>> if (wait)
>>> wait_clear_urbs(ep);
>>> + else
>>> + set_bit(EP_FLAG_STOPPING, &ep->flags);
>>> }
>>> }
>>>
>>> diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
>>> index cbbbdf2..c1540a4 100644
>>> --- a/sound/usb/endpoint.h
>>> +++ b/sound/usb/endpoint.h
>>> @@ -16,6 +16,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
>>> int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
>>> void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
>>> int force, int can_sleep, int wait);
>>> +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep);
>>> int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
>>> int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
>>> void snd_usb_endpoint_free(struct list_head *head);
>>> diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
>>> index f782ce1..aee3ab0 100644
>>> --- a/sound/usb/pcm.c
>>> +++ b/sound/usb/pcm.c
>>> @@ -546,6 +546,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
>>> if (snd_BUG_ON(!subs->data_endpoint))
>>> return -EIO;
>>>
>>> + if (subs->sync_endpoint)
>>> + snd_usb_endpoint_sync_stop(subs->sync_endpoint);
>>> + if (subs->data_endpoint)
>>> + snd_usb_endpoint_sync_stop(subs->data_endpoint);
>>
>> We can't simply stop both endpoints in the prepare callback.
>
> The new function doesn't stop the stream by itself; it only syncs
> if a stop was already issued beforehand. So, it's safe to call it
> there.
>
> Maybe the name was confusing. It should have been like
> snd_usb_endpoint_sync_pending_stop() or such.

Ah, right. I was erroneously looking more closely at Alan's patch but then
replied to yours. Alright then - thanks for explaining :)

Daniel

2012-11-08 08:09:55

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

At Thu, 08 Nov 2012 08:31:35 +0100,
Daniel Mack wrote:
(snip)
> >> We can't simply stop both endpoints in the prepare callback.
> >
> > The new function doesn't stop the stream by itself; it only syncs
> > if a stop was already issued beforehand. So, it's safe to call it
> > there.
> >
> > Maybe the name was confusing. It should have been like
> > snd_usb_endpoint_sync_pending_stop() or such.
>
> Ah, right. I was erroneously looking more closely at Alan's patch but then
> replied to yours. Alright then - thanks for explaining :)

OK, thanks for checking.

FWIW, below is the patch I have now applied to the for-linus branch.
I renamed the function, added a comment, and moved the NULL check into
the function to simplify the callers.

Takashi

---
From: Takashi Iwai <[email protected]>
Subject: [PATCH] ALSA: usb-audio: Fix crash at re-preparing the PCM stream

There are bug reports of a crash with USB-audio devices when PCM
prepare is performed immediately after the stream is stopped via
trigger callback. It turned out that the problem is that we don't
wait until all URBs are killed.

This patch adds a new function to synchronize the pending stop
operation on an endpoint, and calls in the prepare callback for
avoiding the crash above.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=49181

Reported-and-tested-by: Artem S. Tashkinov <[email protected]>
Cc: <[email protected]> [v3.6]
Signed-off-by: Takashi Iwai <[email protected]>
---
sound/usb/endpoint.c | 13 +++++++++++++
sound/usb/endpoint.h | 1 +
sound/usb/pcm.c | 3 +++
3 files changed, 17 insertions(+)

diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c
index 7f78c6d..34de6f2 100644
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -35,6 +35,7 @@

#define EP_FLAG_ACTIVATED 0
#define EP_FLAG_RUNNING 1
+#define EP_FLAG_STOPPING 2

/*
* snd_usb_endpoint is a model that abstracts everything related to an
@@ -502,10 +503,20 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep)
if (alive)
snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n",
alive, ep->ep_num);
+ clear_bit(EP_FLAG_STOPPING, &ep->flags);

return 0;
}

+/* sync the pending stop operation;
+ * this function itself doesn't trigger the stop operation
+ */
+void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep)
+{
+ if (ep && test_bit(EP_FLAG_STOPPING, &ep->flags))
+ wait_clear_urbs(ep);
+}
+
/*
* unlink active urbs.
*/
@@ -918,6 +929,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,

if (wait)
wait_clear_urbs(ep);
+ else
+ set_bit(EP_FLAG_STOPPING, &ep->flags);
}
}

diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h
index 6376ccf..3d4c970 100644
--- a/sound/usb/endpoint.h
+++ b/sound/usb/endpoint.h
@@ -19,6 +19,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep);
void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep,
int force, int can_sleep, int wait);
+void snd_usb_endpoint_sync_pending_stop(struct snd_usb_endpoint *ep);
int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep);
int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep);
void snd_usb_endpoint_free(struct list_head *head);
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 37428f7..5c12a3f 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -568,6 +568,9 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
goto unlock;
}

+ snd_usb_endpoint_sync_pending_stop(subs->sync_endpoint);
+ snd_usb_endpoint_sync_pending_stop(subs->data_endpoint);
+
ret = set_format(subs, subs->cur_audiofmt);
if (ret < 0)
goto unlock;
--
1.8.0

2012-11-08 15:55:56

Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website

On Thu, 8 Nov 2012, Takashi Iwai wrote:

> At Thu, 08 Nov 2012 08:31:35 +0100,
> Daniel Mack wrote:
> (snip)
> > >> We can't simply stop both endpoints in the prepare callback.
> > >
> > > The new function doesn't stop the stream by itself; it only syncs
> > > if a stop was already issued beforehand. So, it's safe to call it
> > > there.
> > >
> > > Maybe the name was confusing. It should have been like
> > > snd_usb_endpoint_sync_pending_stop() or such.
> >
> > Ah, right. I was erroneously looking more closely at Alan's patch but then
> > replied to yours. Alright then - thanks for explaining :)
>
> OK, thanks for checking.
>
> FWIW, below is the patch I have now applied to the for-linus branch.
> I renamed the function, added a comment, and moved the NULL check into
> the function to simplify the callers.

Thanks for fixing this. Is your patch marked for -stable?

I have submitted a patch for ehci-hcd, so we should be all set.

Alan Stern

2012-11-08 15:58:21