2018-10-03 12:51:30

by Jia-Ju Bai

Subject: [BUG] sound: pci: trident: a possible data race

CPU0:
snd_trident_hw_free
snd_trident_free_voice
line 3870: spin_lock_irqsave()
line 3881: voice->substream = NULL; [WRITE]
CPU1:
snd_trident_interrupt
line 3798: snd_pcm_period_elapsed(voice->substream); [READ]

For voice->substream, the WRITE operation on CPU0 is performed
while holding a spinlock, but the READ operation on CPU1 is performed
without holding that spinlock, so a data race may exist.


Best wishes,
Jia-Ju Bai


2018-10-03 15:54:59

by Takashi Iwai

Subject: Re: [BUG] sound: pci: trident: a possible data race

On Wed, 03 Oct 2018 14:50:25 +0200,
Jia-Ju Bai wrote:
>
> CPU0:
> snd_trident_hw_free
> snd_trident_free_voice
> line 3870: spin_lock_irqsave()
> line 3881: voice->substream = NULL; [WRITE]
> CPU1:
> snd_trident_interrupt
> line 3798: snd_pcm_period_elapsed(voice->substream); [READ]
>
> As for voice->substream, the WRITE operation in CPU0 is performed
> with holding a spinlock, but the READ operation in CPU1 is performed
> without holding this spinlock, so there may exist a data race.

Thanks for the report.

An actual crash must be very unlikely, almost 0%, though.
snd_trident_hw_free() is always called after the PCM stream has been
stopped via the trigger callback, i.e. at that moment no
corresponding interrupt is generated for that voice entry.

And the hardware is very old; I bet only a handful of people in the
whole world are still using it :)

If we really want to be 100% sure, we could call synchronize_irq()
before the NULL-clearing. But there is no way to test either the bug
or the fix, so it's fairly moot, IMO.


thanks,

Takashi
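
The ordering argument in the reply above can be sketched as a small userspace model. This is only an illustration, not the driver code: every model_* name is hypothetical, the `running` flag stands in for "the hardware generates interrupts for this voice", and the model is single-threaded, so it does not capture an interrupt handler already in flight on another CPU (the window that synchronize_irq() would close).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's objects; not the real ALSA structs. */
struct model_substream {
    int periods_elapsed;
};

struct model_voice {
    bool running;                      /* set by trigger start, cleared by trigger stop */
    struct model_substream *substream; /* cleared by hw_free */
};

/* Trigger start: the voice may raise interrupts from now on. */
static void model_trigger_start(struct model_voice *v)
{
    assert(v->substream != NULL);
    v->running = true;
}

/* Trigger stop: the voice no longer raises interrupts. */
static void model_trigger_stop(struct model_voice *v)
{
    v->running = false;
}

/* hw_free: per the argument above, only reached after trigger stop. */
static void model_hw_free(struct model_voice *v)
{
    assert(!v->running);
    v->substream = NULL;
}

/* Interrupt path: touches the substream only for a running voice.
 * Returns true if the substream was notified. */
static bool model_interrupt(struct model_voice *v)
{
    if (!v->running)
        return false;
    v->substream->periods_elapsed++;
    return true;
}
```

Calling model_trigger_start(), model_interrupt(), model_trigger_stop(), model_hw_free() and then model_interrupt() again, in that order, notifies the substream exactly once; the final interrupt call returns false without dereferencing the cleared pointer.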

2018-10-04 03:09:15

by Jia-Ju Bai

Subject: Re: [BUG] sound: pci: trident: a possible data race

Thanks for the reply :)


On 2018/10/3 23:54, Takashi Iwai wrote:
> On Wed, 03 Oct 2018 14:50:25 +0200,
> Jia-Ju Bai wrote:
>> CPU0:
>> snd_trident_hw_free
>> snd_trident_free_voice
>> line 3870: spin_lock_irqsave()
>> line 3881: voice->substream = NULL; [WRITE]
>> CPU1:
>> snd_trident_interrupt
>> line 3798: snd_pcm_period_elapsed(voice->substream); [READ]
>>
>> As for voice->substream, the WRITE operation in CPU0 is performed
>> with holding a spinlock, but the READ operation in CPU1 is performed
>> without holding this spinlock, so there may exist a data race.
> Thanks for the report.
>
> The actual crash must be very unlikely, almost 0%, though.
> snd_trident_hw_free() is called always after the PCM stream gets
> stopped via trigger callback, i.e. at the moment, there is no
> corresponding interrupt is generated for that voice entry.

What about the case where playback and capture run concurrently?
Namely, snd_trident_hw_free() is called for playback while the interrupt
is generated for capture.

> And the hardware is very old, I bet only a handful people still using
> in the whole world :)

I have this hardware, so I am one of that handful of people ;)


Best wishes,
Jia-Ju Bai

2018-10-04 05:25:18

by Takashi Iwai

Subject: Re: [BUG] sound: pci: trident: a possible data race

On Thu, 04 Oct 2018 05:08:45 +0200,
Jia-Ju Bai wrote:
>
> Thanks for the reply :)
>
>
> On 2018/10/3 23:54, Takashi Iwai wrote:
> > On Wed, 03 Oct 2018 14:50:25 +0200,
> > Jia-Ju Bai wrote:
> >> CPU0:
> >> snd_trident_hw_free
> >> snd_trident_free_voice
> >> line 3870: spin_lock_irqsave()
> >> line 3881: voice->substream = NULL; [WRITE]
> >> CPU1:
> >> snd_trident_interrupt
> >> line 3798: snd_pcm_period_elapsed(voice->substream); [READ]
> >>
> >> As for voice->substream, the WRITE operation in CPU0 is performed
> >> with holding a spinlock, but the READ operation in CPU1 is performed
> >> without holding this spinlock, so there may exist a data race.
> > Thanks for the report.
> >
> > The actual crash must be very unlikely, almost 0%, though.
> > snd_trident_hw_free() is called always after the PCM stream gets
> > stopped via trigger callback, i.e. at the moment, there is no
> > corresponding interrupt is generated for that voice entry.
>
> How about the case that playback and capture are performed concurrently?
> Namely, snd_trident_hw_free() is called for playback, and the
> interrupt is generated for capture.

They are different substreams, hence the interrupt for one won't pick
up the other's substream object.

> > And the hardware is very old, I bet only a handful people still using
> > in the whole world :)
>
> I have this hardware, so I am the one of these handful people ;)

Wow, that's fun. Then we can really fix the issue, if any.
Did you actually hit any relevant bug?


thanks,

Takashi
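
The point about separate substreams can be illustrated with a simplified userspace model, assuming each voice entry holds its own substream pointer and the interrupt path looks only at voices flagged in an interrupt mask. All model_* names and the voice count are hypothetical, and the NULL check is a defensive addition in the model rather than a statement about the driver.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_VOICES 4 /* hypothetical; not the real voice count */

struct model_substream {
    int periods_elapsed;
};

struct model_voice {
    struct model_substream *substream;
};

/* hw_free for one voice: clears only that voice's pointer. */
static void model_hw_free(struct model_voice *v)
{
    v->substream = NULL;
}

/* Interrupt path: visits only voices whose bit is set in irq_mask.
 * Returns the number of substreams that were notified. */
static int model_interrupt(struct model_voice voices[MODEL_NUM_VOICES],
                           unsigned int irq_mask)
{
    int notified = 0;

    for (int i = 0; i < MODEL_NUM_VOICES; i++) {
        if (!(irq_mask & (1u << i)))
            continue; /* no interrupt pending for this voice */
        if (voices[i].substream == NULL)
            continue; /* defensive check in this model */
        voices[i].substream->periods_elapsed++;
        notified++;
    }
    return notified;
}
```

With playback on voice 0 and capture on voice 1, freeing voice 0 and then handling an interrupt whose mask has only bit 1 set notifies the capture substream and never reads the cleared playback pointer.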

2018-10-04 09:19:41

by Jia-Ju Bai

Subject: Re: [BUG] sound: pci: trident: a possible data race



On 2018/10/4 13:24, Takashi Iwai wrote:
> On Thu, 04 Oct 2018 05:08:45 +0200,
> Jia-Ju Bai wrote:
>> Thanks for the reply :)
>>
>>
>> On 2018/10/3 23:54, Takashi Iwai wrote:
>>> On Wed, 03 Oct 2018 14:50:25 +0200,
>>> Jia-Ju Bai wrote:
>>>> CPU0:
>>>> snd_trident_hw_free
>>>> snd_trident_free_voice
>>>> line 3870: spin_lock_irqsave()
>>>> line 3881: voice->substream = NULL; [WRITE]
>>>> CPU1:
>>>> snd_trident_interrupt
>>>> line 3798: snd_pcm_period_elapsed(voice->substream); [READ]
>>>>
>>>> As for voice->substream, the WRITE operation in CPU0 is performed
>>>> with holding a spinlock, but the READ operation in CPU1 is performed
>>>> without holding this spinlock, so there may exist a data race.
>>> Thanks for the report.
>>>
>>> The actual crash must be very unlikely, almost 0%, though.
>>> snd_trident_hw_free() is called always after the PCM stream gets
>>> stopped via trigger callback, i.e. at the moment, there is no
>>> corresponding interrupt is generated for that voice entry.
>> How about the case that playback and capture are performed concurrently?
>> Namely, snd_trident_hw_free() is called for playback, and the
>> interrupt is generated for capture.
> They are different substreams, hence it won't pick up the substream
> object.

Actually, I ran a runtime test and found that snd_trident_hw_free()
and snd_trident_interrupt() do execute concurrently.
However, I did not check the substream object, so I guess this report
was produced in the case that I mentioned.
Anyway, my report appears to be a false positive, sorry...


Best wishes,
Jia-Ju Bai