2010-08-12 20:01:29

by Thomas Meyer

[permalink] [raw]
Subject: 2.6.36: Sound stop working

My sound card stop to work since a few commits. I tried to bisect it and ended up with this:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
c1e5c954020e123d30b4abf4038ce501861bcf9f
We cannot bisect more!

any ideas what to do now?


2010-08-12 20:10:19

by Pekka Enberg

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
> My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
>
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
> c1e5c954020e123d30b4abf4038ce501861bcf9f
> We cannot bisect more!
>
> any ideas what to do now?

Takashi, maybe this is related to Linus' problem?

2010-08-12 20:20:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 1:10 PM, Pekka Enberg <[email protected]> wrote:
> On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
>> My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
>>
>> There are only 'skip'ped commits left to test.
>> The first bad commit could be any of:
>> 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
>> c1e5c954020e123d30b4abf4038ce501861bcf9f
>> We cannot bisect more!
>>
>> any ideas what to do now?
>
> Takashi, maybe this is related to Linus' problem?

Yes, I'm 99% sure it is. I haven't bisected my problem all the way,
but I've bisected away all the sound changes (and as mentioned, doing
a 'cat' to /dev/audio works). And yes, the fanotify changes are right
in the middle of my bisect.

So I suspect it's not sound that is broken at all, but pulseaudio that
got broken by the fanotify changes.

Eric - over to you.

Linus

2010-08-12 20:26:17

by Takashi Iwai

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

At Thu, 12 Aug 2010 23:10:15 +0300,
Pekka Enberg wrote:
>
> On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
> > My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
> >
> > There are only 'skip'ped commits left to test.
> > The first bad commit could be any of:
> > 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
> > c1e5c954020e123d30b4abf4038ce501861bcf9f
> > We cannot bisect more!
> >
> > any ideas what to do now?
>
> Takashi, maybe this is related to Linus' problem?

Yeah, I guess so, since I couldn't see any obvious problem in sound/*.

PulseAudio uses inotify in udev-detection module, at least.
It's possible that inotify change may hit the sound in the end.


thanks,

Takashi

2010-08-12 21:02:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 1:20 PM, Linus Torvalds
<[email protected]> wrote:
>
> So I suspect it's not sound that is broken at all, but pulseaudio that
> got broken by the fanotify changes.

Confirmed. That broken commit 3bcf3860a4ff9bb doesn't even boot for me
(looks like a stack smash recursion), and the commit to "fix" it
(c1e5c954020e12) is really too ugly to live.

I think we need to totally undo the whole "struct file" thing, and
just admit that it was a mistake. The code really wants a "struct
path", and using a struct file screws up all the refcounting and is
just not right.

The fact that dentry_open() may have some problem needs to be fixed
-there- rather than make the callers do crazy things that they don't
want to do and can't do sanely.

Linus

2010-08-12 21:05:45

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On 08/12/2010 10:26 PM, Takashi Iwai wrote:
> At Thu, 12 Aug 2010 23:10:15 +0300,
> Pekka Enberg wrote:
>>
>> On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
>>> My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
>>>
>>> There are only 'skip'ped commits left to test.
>>> The first bad commit could be any of:
>>> 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
>>> c1e5c954020e123d30b4abf4038ce501861bcf9f
>>> We cannot bisect more!
>>>
>>> any ideas what to do now?
>>
>> Takashi, maybe this is related to Linus' problem?
>
> Yeah, I guess so, since I couldn't see any obvious problem in sound/*.
>
> PulseAudio uses inotify in udev-detection module, at least.
> It's possible that inotify change may hit the sound in the end.

Probably I got into this problem yesterday. Found out that PA fails to
open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
opens it again and gets EBUSY. aplay is OK.

(If this is the same issue.)

--
js

2010-08-12 21:14:43

by Maxim Levitsky

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, 2010-08-12 at 23:01 +0200, Jiri Slaby wrote:
> On 08/12/2010 10:26 PM, Takashi Iwai wrote:
> > At Thu, 12 Aug 2010 23:10:15 +0300,
> > Pekka Enberg wrote:
> >>
> >> On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
> >>> My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
> >>>
> >>> There are only 'skip'ped commits left to test.
> >>> The first bad commit could be any of:
> >>> 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
> >>> c1e5c954020e123d30b4abf4038ce501861bcf9f
> >>> We cannot bisect more!
> >>>
> >>> any ideas what to do now?
> >>
> >> Takashi, maybe this is related to Linus' problem?
> >
> > Yeah, I guess so, since I couldn't see any obvious problem in sound/*.
> >
> > PulseAudio uses inotify in udev-detection module, at least.
> > It's possible that inotify change may hit the sound in the end.
>
> Probably I got into this problem yesterday. Found out that PA fails to
> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
> opens it again and gets EBUSY. aplay is OK.
I confirm that here.

>
> (If this is the same issue.)
>
It must be..


Best regards,
Maxim Levitsky

2010-08-12 21:17:16

by Takashi Iwai

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

At Thu, 12 Aug 2010 23:01:04 +0200,
Jiri Slaby wrote:
>
> On 08/12/2010 10:26 PM, Takashi Iwai wrote:
> > At Thu, 12 Aug 2010 23:10:15 +0300,
> > Pekka Enberg wrote:
> >>
> >> On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
> >>> My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
> >>>
> >>> There are only 'skip'ped commits left to test.
> >>> The first bad commit could be any of:
> >>> 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
> >>> c1e5c954020e123d30b4abf4038ce501861bcf9f
> >>> We cannot bisect more!
> >>>
> >>> any ideas what to do now?
> >>
> >> Takashi, maybe this is related to Linus' problem?
> >
> > Yeah, I guess so, since I couldn't see any obvious problem in sound/*.
> >
> > PulseAudio uses inotify in udev-detection module, at least.
> > It's possible that inotify change may hit the sound in the end.
>
> Probably I got into this problem yesterday. Found out that PA fails to
> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
> opens it again and gets EBUSY. aplay is OK.

Yes, I can also confirm that it's broken on my machine in the same
way now :) PA log shows that the succeeding open failed.

PA tries to do quick open/close of the same device to figure out which
configuration is available at start-up. This implies that the
fs/notify commits touching the open/close stuff can be the culprit.


thanks,

Takashi

2010-08-12 21:19:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

Eric needs to be cc'd.. Sorry for the extra quoting.

Linus
---
On Thu, Aug 12, 2010 at 2:01 PM, Jiri Slaby <[email protected]> wrote:
> On 08/12/2010 10:26 PM, Takashi Iwai wrote:
>> At Thu, 12 Aug 2010 23:10:15 +0300,
>> Pekka Enberg wrote:
>>>
>>> On Thu, Aug 12, 2010 at 11:00 PM, Thomas Meyer <[email protected]> wrote:
>>>> My sound card stop to work since a few commits. I tried to bisect it and ended up with this:
>>>>
>>>> There are only 'skip'ped commits left to test.
>>>> The first bad commit could be any of:
>>>> 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e
>>>> c1e5c954020e123d30b4abf4038ce501861bcf9f
>>>> We cannot bisect more!
>>>>
>>>> any ideas what to do now?
>>>
>>> Takashi, maybe this is related to Linus' problem?
>>
>> Yeah, I guess so, since I couldn't see any obvious problem in sound/*.
>>
>> PulseAudio uses inotify in udev-detection module, at least.
>> It's possible that inotify change may hit the sound in the end.
>
> Probably I got into this problem yesterday. Found out that PA fails to
> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
> opens it again and gets EBUSY. aplay is OK.
>
> (If this is the same issue.)
>
> --
> js
>

2010-08-12 21:24:54

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On 08/12/2010 11:18 PM, Linus Torvalds wrote:
> On Thu, Aug 12, 2010 at 2:01 PM, Jiri Slaby <[email protected]> wrote:
>> Probably I got into this problem yesterday. Found out that PA fails to
>> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
>> opens it again and gets EBUSY. aplay is OK.

Perfectly reproducible in qemu-kvm with ac97 soundhw, i.e. intel8x0
driver. Just in case you want to debug that easily.

--
js

2010-08-12 21:42:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 2:17 PM, Takashi Iwai <[email protected]> wrote:
> At Thu, 12 Aug 2010 23:01:04 +0200,
> Jiri Slaby wrote:
>>
>> Probably I got into this problem yesterday. Found out that PA fails to
>> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
>> opens it again and gets EBUSY. aplay is OK.
>
> Yes, I can also confirm that it's broken on my machine in the same
> way now :) ?PA log shows that the succeeding open failed.
>
> PA tries to do quick open/close of the same device to figure out which
> configuration is available at start-up. ?This implies that the
> fs/notify commits touching the open/close stuff can be the culprit.

Yeah. The f_count stuff is disgusting. This revert patch makes things
work for me again. And I suspect it's the right thing to do
regardless. I reacted to that ugly __fput() hackery when I pulled the
fanotify tree, but I let it slide. I clearly should have let my taste
guide me more.

fsnotify playing games with fput/fget is almost certainly totally wrong.

Al, what was the problem that caused you to think that we want to have
a 'struct file' here? Is it just the fact that some of those fsnotify
things use that 'path' structure that is embedded in the parent? But
isn't the simplest fix for that to just _copy_ the "struct path"
rather than pass the "struct file" around?

Linus


Attachments:
diff.txt (15.40 kB)

2010-08-12 21:45:35

by Takashi Iwai

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

At Thu, 12 Aug 2010 23:24:43 +0200,
Jiri Slaby wrote:
>
> On 08/12/2010 11:18 PM, Linus Torvalds wrote:
> > On Thu, Aug 12, 2010 at 2:01 PM, Jiri Slaby <[email protected]> wrote:
> >> Probably I got into this problem yesterday. Found out that PA fails to
> >> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
> >> opens it again and gets EBUSY. aplay is OK.
>
> Perfectly reproducible in qemu-kvm with ac97 soundhw, i.e. intel8x0
> driver. Just in case you want to debug that easily.

And the below is a minimal test case to simulate the situation
PulseAudio does.


Takashi

===

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

int main()
{
int fd;

inotify_add_watch(inotify_init(), "/dev/snd", IN_CLOSE_WRITE);

fd = open("/dev/snd/pcmC0D0p", O_RDWR | O_NONBLOCK);
if (fd < 0)
perror("open1");
else
close(fd);
fd = open("/dev/snd/pcmC0D0p", O_RDWR | O_NONBLOCK);
if (fd < 0)
perror("open2");
else
close(fd);
return 0;
}

2010-08-12 21:51:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 2:41 PM, Linus Torvalds
<[email protected]> wrote:
>
> Al, what was the problem that caused you to think that we want to have
> a 'struct file' here? Is it just the fact that some of those fsnotify
> things use that 'path' structure that is embedded in the parent? But
> isn't the simplest fix for that to just _copy_ the "struct path"
> rather than pass the "struct file" around?

Btw, that's exactly what the fsnotify code does seem to do, in
fsnotify_create_event(). So as far as I can tell, it's all ok - we
only use that file->f_path pointer while we hold the file open, and
then we copy it to event->path and increment the counts properly.

Of course, maybe I'm missing some other case where we _don't_ copy the
data, and pass a pointer to a file->f_path around that could get
stale. Or maybe I'm missing some other worry entirely.

But those games with f_count are just disgusting. The path-based thing
seems to be much more straightforward.

Linus

2010-08-12 21:53:34

by Al Viro

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 02:41:51PM -0700, Linus Torvalds wrote:

> Yeah. The f_count stuff is disgusting. This revert patch makes things
> work for me again. And I suspect it's the right thing to do
> regardless. I reacted to that ugly __fput() hackery when I pulled the
> fanotify tree, but I let it slide. I clearly should have let my taste
> guide me more.
>
> fsnotify playing games with fput/fget is almost certainly totally wrong.
>
> Al, what was the problem that caused you to think that we want to have
> a 'struct file' here? Is it just the fact that some of those fsnotify
> things use that 'path' structure that is embedded in the parent? But
> isn't the simplest fix for that to just _copy_ the "struct path"
> rather than pass the "struct file" around?

I agree that what this crap is doing to f_count is blatantly wrong,
of course - no arguments here. I *do* have a reason to want struct
file, but not that way, TYVM.

Basically, dentry_open() is not even promised to work on arbitrary
dentry. Thanks to !@#!@#!@#!@# intents crap we are not promised
that the damn thing won't need setup by ->lookup/->d_revalidate.
We _are_ more or less promised that it'll work while the file is
opened (provided that /proc/*/fd/* is openable), but that's it.

It's not an API that can be made to work on an arbitrary dentries. If
caller knows what it's dealing with, it's OK, but not in general. And
no, I'm not fond of that situation, to put it mildly.

I'll see what can be done to fix that mess more or less right way...

2010-08-12 22:07:39

by Takashi Iwai

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

At Thu, 12 Aug 2010 14:41:51 -0700,
Linus Torvalds wrote:
>
> On Thu, Aug 12, 2010 at 2:17 PM, Takashi Iwai <[email protected]> wrote:
> > At Thu, 12 Aug 2010 23:01:04 +0200,
> > Jiri Slaby wrote:
> >>
> >> Probably I got into this problem yesterday. Found out that PA fails to
> >> open /dev/snd/pcmC0D0p _second_ time. It opens it, then closes, then
> >> opens it again and gets EBUSY. aplay is OK.
> >
> > Yes, I can also confirm that it's broken on my machine in the same
> > way now :)  PA log shows that the succeeding open failed.
> >
> > PA tries to do quick open/close of the same device to figure out which
> > configuration is available at start-up.  This implies that the
> > fs/notify commits touching the open/close stuff can be the culprit.
>
> Yeah. The f_count stuff is disgusting. This revert patch makes things
> work for me again.

It makes working for me, too.


thanks,

Takashi

2010-08-12 22:20:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, Aug 12, 2010 at 2:52 PM, Al Viro <[email protected]> wrote:
> On Thu, Aug 12, 2010 at 02:41:51PM -0700, Linus Torvalds wrote:
>
>> Yeah. The f_count stuff is disgusting. This revert patch makes things
>> work for me again. And I suspect it's the right thing to do
>> regardless. I reacted to that ugly __fput() hackery when I pulled the
>> fanotify tree, but I let it slide. I clearly should have let my taste
>> guide me more.
>>
>> fsnotify playing games with fput/fget is almost certainly totally wrong.
>>
>> Al, what was the problem that caused you to think that we want to have
>> a 'struct file' here? Is it just the fact that some of those fsnotify
>> things use that 'path' structure that is embedded in the parent? But
>> isn't the simplest fix for that to just _copy_ the "struct path"
>> rather than pass the "struct file" around?
>
> I agree that what this crap is doing to f_count is blatantly wrong,
> of course - no arguments here. ?I *do* have a reason to want struct
> file, but not that way, TYVM.
>
> Basically, dentry_open() is not even promised to work on arbitrary
> dentry. ?Thanks to !@#!@#!@#!@# intents crap we are not promised
> that the damn thing won't need setup by ->lookup/->d_revalidate.
> We _are_ more or less promised that it'll work while the file is
> opened (provided that /proc/*/fd/* is openable), but that's it.

Hmm. We do basically dentry_open() these days with the whole
path_init() of a dfd. Yes, we have the 'struct file', but the only
thing we take from there is the dentry.

So why not just say that 'dentry_open()' is exactly that path_init()
setup, and then look up an empty path. Allow a non-directory, and
leave it to the filesystems to then say "I can't do that at all" if
they really want to. fanotify seems to be perfectly happy with the
lookup failing.

So I'm not convinced this is really a fanotify problem, as much as a
VFS layer thing.

> It's not an API that can be made to work on an arbitrary dentries. ?If
> caller knows what it's dealing with, it's OK, but not in general. ?And
> no, I'm not fond of that situation, to put it mildly.

So I think it's probably doable for arbitrary dentries too, although
maybe we need to have a special "reopen()" function. I suspect few
filesystems will care, and if you can fail it, the ones that do care
deeply are home free too. No?

> I'll see what can be done to fix that mess more or less right way...

.. but I assume you don't wan tto keep those "struct file" games in
fanotify regardless, right? So we can fix the sound issue by the
revert I sent out, no?

Linus

2010-08-13 01:54:46

by Eric Paris

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On Thu, 2010-08-12 at 15:19 -0700, Linus Torvalds wrote:
> On Thu, Aug 12, 2010 at 2:52 PM, Al Viro <[email protected]> wrote:
> > On Thu, Aug 12, 2010 at 02:41:51PM -0700, Linus Torvalds wrote:

> > I'll see what can be done to fix that mess more or less right way...
>
> .. but I assume you don't wan tto keep those "struct file" games in
> fanotify regardless, right? So we can fix the sound issue by the
> revert I sent out, no?

Sorry I'm just coming into this now that it is 'solved.' I just got
back to my hotel from Linuxcon and haven't been checking e-mail. I
guess noone (including me) is testing sound in linux-next :(. I
wonder what kind of tom foolery it must be doing with f_count (like I
am) to have hit problems.

I certainly agree the revert patch you sent should be applied if the
worst artifact for calling dentry_open() with an 'arbitrary dentry' is
that it fails.

An easier long term fix might be a 'dentry_open_as' which I can call in
the context of the original opening process but which will be done (for
security/accounting/fd table/etc purposes) as the process which will
ultimately consume the new struct file. I don't know which would be
better/easier for the VFS, Al?

-Eric

2010-08-13 07:50:07

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.36: Sound stop working

On 08/13/2010 03:53 AM, Eric Paris wrote:
> I guess noone (including me) is testing sound in linux-next :(.

Yeah, there are some people like Valdis and me who run -next based
kernels (-mm). But it got upstream too fast to react earlier. I updated
to the latest -next few days ago, Valdis with the latest mmotm release 2
days ago. So I think these things need to sit in -next a bit longer than
they usually do. The reason is also that I need to solve 2-3 issues with
every -next update and it takes time too.

regards,
--
js