2024-04-15 11:09:13

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] usb: cdc-wdm: close race between read and workqueue



On 15.04.24 12:53, Bjørn Mork wrote:
> Oliver Neukum <[email protected]> writes:
>> On 15.04.24 11:26, Bjørn Mork wrote:
>>> Oliver Neukum <[email protected]> writes:
>>>> On 15.04.24 08:47, Bjørn Mork wrote:
>>>>
>>>>> urb from service_outstanding_interrupt(). That's why it was added. See
>>>>> the explanation Robert wrote when introducing it:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/usb/class/cdc-wdm.c?id=c1da59dad0ebd3f9bd238f3fff82b1f7ffda7829
>>>>
>>>> Well, the explanation is correct in that we must read
>>>> data available. However, if the RESPONDING flag is set
>>>> and the URB submitted, we are already doing so.
>>> Sounds reasonable. Except that the bug proves we didn't.
>>
>> Why? I am afraid I do not get that part.
>
> I don't get how it happens either. But that's the only thing changed by
> the patch.

Now you have lost me. I agree that this is the only thing that the patch
changes, but how do you derive the consequences from that?

>> > If you are right that service_outstanding_interrupt can race
>> againts
>>> itself (and I don't doubt that), then I guess this could also happen
>>> between failure to submit the URB and clearing the flag?
>>
>> Yes, it can. In fact in this case the behavior should not change.
>> I am afraid we have a misunderstanding. It seems to me that in the
>> unchanged driver the result of service_outstanding_interrupt()
>> is undefined.
>> Please explain.
>
> Sorry, I am so lost here that I am probably only confusing things. I doresp_count
> not understand why we unlock &desc->iuspin around the usb_submit_urb
> call. And git tells me I wrote that.

Dropping iuspin there allowed you to call usb_submit_urb() with GFP_KERNEL.
clear_wdm_read_flag(), as it then existed, could not race with itself because
its only caller wdm_read() is holding a mutex.

That, however, is not very material to the question at hand. iuspin at that
time protected only resp_count. Even today the URB itself is protected by
WDM_RESPONDING. (Which is why I think that test_and_set_bit is required)

Now, if we say that service_outstanding_interrupt() is racing with itself,
we have to ask why this is helpful. Do we at least agree that the regression
Aleksander is seeing is due to the removal of a race or are we looking at a side effect?

Regards
Oliver