Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754552Ab2KHGnu (ORCPT ); Thu, 8 Nov 2012 01:43:50 -0500 Received: from cantor2.suse.de ([195.135.220.15]:38692 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751998Ab2KHGns (ORCPT ); Thu, 8 Nov 2012 01:43:48 -0500 Date: Thu, 08 Nov 2012 07:43:46 +0100 Message-ID: From: Takashi Iwai To: Daniel Mack Cc: Alan Stern , Christof Meerwald , "Artem S. Tashkinov" , Kernel development list , USB list Subject: Re: A reliable kernel panic (3.6.2) and system crash when visiting a particular website In-Reply-To: <509B0013.6080508@gmail.com> References: <20121105191336.GA1404@edge.cmeerw.net> <509B0013.6080508@gmail.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/24.2 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5801 Lines: 159 At Thu, 08 Nov 2012 01:42:59 +0100, Daniel Mack wrote: > > On 07.11.2012 20:19, Takashi Iwai wrote: > > At Wed, 7 Nov 2012 12:34:43 -0500 (EST), > > Alan Stern wrote: > >> > >> On Mon, 5 Nov 2012, Christof Meerwald wrote: > >> > >>> BTW, I have been able to reproduce the problem on a completely > >>> different machine (also running Ubuntu 12.10, but different hardware). > >>> The important thing appears to be that the USB audio device is > >>> connected via a USB 2.0 hub (and then using the test code posted in > >>> http://pastebin.com/aHGe1S1X specifying the audio device as > >>> "plughw:Set" (or whatever it's called) seems to trigger the freeze). > >> > >> Christof: Thank you for that reference, it was a big help. After > >> crashing my system many times I have tracked the problem, at least in > >> part. The patch below should prevent your system from freezing. > >> > >> > >> Takashi: It turns out the the problem is triggered when the audio > >> subsystem calls snd_usb_endpoint_stop() with wait == 0 and then calls > >> snd_usb_endpoint_start(). Since the driver doesn't wait for the > >> outstanding URBs to finish, it tries to submit them again while they > >> are still active. > >> > >> Normally the USB core would realize this and fail the submission, but a > >> bug in ehci-hcd prevented this from happening. (That bug is what the > >> patch below fixes.) The URB gets added to the active list twice, > >> resulting in list corruption and an oops in interrupt context, which > >> freezes the system. > >> > >> The user program that triggers the problem basically looks like this: > >> > >> snd_pcm_prepare(rec_pcm); > >> snd_pcm_start(rec_pcm); > >> snd_pcm_drop(rec_pcm); > >> > >> snd_pcm_prepare(rec_pcm); > >> snd_pcm_start(rec_pcm); > >> > >> The snd_pcm_drop call unlinks the URBs but does not wait for them to > >> finish. Then the second snd_pcm_start call submits the URBs before > >> they have finished. > > > Thanks for investigating on this and to everyone who so quickyl tested > the provided patch. Seems like we got the right idea where the problem > really is. > > However, the proposed patch seems wrong to me (see below). > > >> What is the right solution for this problem? > > > > How about the patch below? (It's for 3.6, and won't be applied cleanly > > to 3.7, but easy to adapt.) > > > > > > Takashi > > > > --- > > diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c > > index d9de667..38830e2 100644 > > --- a/sound/usb/endpoint.c > > +++ b/sound/usb/endpoint.c > > @@ -35,6 +35,7 @@ > > > > #define EP_FLAG_ACTIVATED 0 > > #define EP_FLAG_RUNNING 1 > > +#define EP_FLAG_STOPPING 2 > > > > /* > > * snd_usb_endpoint is a model that abstracts everything related to an > > @@ -502,10 +503,19 @@ static int wait_clear_urbs(struct snd_usb_endpoint *ep) > > if (alive) > > snd_printk(KERN_ERR "timeout: still %d active urbs on EP #%x\n", > > alive, ep->ep_num); > > + clear_bit(EP_FLAG_STOPPING, &ep->flags); > > > > return 0; > > } > > > > +/* wait until urbs are really dropped */ > > +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep) > > +{ > > + if (test_bit(EP_FLAG_STOPPING, &ep->flags)) > > + wait_clear_urbs(ep); > > +} > > + > > + > > /* > > * unlink active urbs. > > */ > > @@ -913,6 +923,8 @@ void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep, > > > > if (wait) > > wait_clear_urbs(ep); > > + else > > + set_bit(EP_FLAG_STOPPING, &ep->flags); > > } > > } > > > > diff --git a/sound/usb/endpoint.h b/sound/usb/endpoint.h > > index cbbbdf2..c1540a4 100644 > > --- a/sound/usb/endpoint.h > > +++ b/sound/usb/endpoint.h > > @@ -16,6 +16,7 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep, > > int snd_usb_endpoint_start(struct snd_usb_endpoint *ep, int can_sleep); > > void snd_usb_endpoint_stop(struct snd_usb_endpoint *ep, > > int force, int can_sleep, int wait); > > +void snd_usb_endpoint_sync_stop(struct snd_usb_endpoint *ep); > > int snd_usb_endpoint_activate(struct snd_usb_endpoint *ep); > > int snd_usb_endpoint_deactivate(struct snd_usb_endpoint *ep); > > void snd_usb_endpoint_free(struct list_head *head); > > diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c > > index f782ce1..aee3ab0 100644 > > --- a/sound/usb/pcm.c > > +++ b/sound/usb/pcm.c > > @@ -546,6 +546,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream) > > if (snd_BUG_ON(!subs->data_endpoint)) > > return -EIO; > > > > + if (subs->sync_endpoint) > > + snd_usb_endpoint_sync_stop(subs->sync_endpoint); > > + if (subs->data_endpoint) > > + snd_usb_endpoint_sync_stop(subs->data_endpoint); > > We can't simply stop both endpoints in the prepare callback. The new function doesn't stop the stream by itself but it just syncs if the stream is being stopped beforehand. So, it's safe to call it there. Maybe the name was confusing. It should have been like snd_usb_endpoint_sync_pending_stop() or such. Takashi > The essence > of the new reference-counting system is that we can use endpoints from > multiple contexts, and the logic inside endpoint.c will care about when > to start up and take down the urbs. The idea here is that endoints can > be run for many purposes, and the new implementation that was added > allows capture endpoints to run purely as timing reference for playback. > > This bug needs to be fixed in the ehci controller, or we need some other > solution in the snd-usb-audio driver. I'll do some test once I'm back > from ELC. > > > Daniel > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/