2022-07-21 18:58:35

by Chuck Lever

[permalink] [raw]
Subject: Re: Thread safe client destruction



> On Jul 21, 2022, at 2:41 PM, Attila Kovacs <[email protected]> wrote:
>
> Hi again,
>
>
> I found yet more potential MT flaws in clnt_dg.c and clnt_vg.c.

IIRC libtirpc is not MT-enabled because it was forked from the
Solaris libtirpc before MT-enablement was completed. I could
be remembering incorrectly.

Therefore I think we would need some unit tests along with the
improvements you propose. There's a test suite somewhere, but
someone else will need to provide details.

(and also, please inline your patch submissions rather than
attaching them, to make it easier for us to review. thanks!)


> 1. In clnt_dg.c in clnt_dg_freeres(), cu_fd_lock->active is set to TRUE, with no corresponding clearing when the operation (*xdr_res() call) is completed. This would leave other waiting operations blocked indefinitely, effectively deadlocked. For comparison, clnt_vd_freeres() in clnt_vc.c does not set the active state to TRUE. I believe the vc behavior is correct, while the dg behavior is a bug.
>
> 2. If clnt_dg_destroy() or clnt_vc_destroy() is awoken with other blocked operations pending (such as clnt_*_call(), clnt_*_control(), or clnt_*_freeres()) but no active operation currently being executed, then the client gets destroyed. Then, as the other blocked operations get subsequently awoken, they will try operate on an invalid client handle, potentially causing unpredictable behavior and stack corruption.
>
> The proposed fix is to introduce a simple mutexed counting variable into the client lock structure, which keeps track of the number of pending blocking operations on the client. This way, clnt_*_destroy() can check if there are any operations pending on a client, and keep waiting until all pending operations are completed, before the client is destroyed safely and its resources are freed.
>
> Attached is a patch with the above fixes.
>
> -- A.<clnt_mt_safe_destroy.patch>

--
Chuck Lever




2022-07-21 19:19:59

by Attila Kovacs

[permalink] [raw]
Subject: Re: Thread safe client destruction

Hi Chuck,


I'm not adding any new functionality here. I'm merely trying to fix
what's already there but broken (in my opinion). :-)

-- A.


On 7/21/22 14:45, Chuck Lever III wrote:
>
>
>> On Jul 21, 2022, at 2:41 PM, Attila Kovacs <[email protected]> wrote:
>>
>> Hi again,
>>
>>
>> I found yet more potential MT flaws in clnt_dg.c and clnt_vg.c.
>
> IIRC libtirpc is not MT-enabled because it was forked from the
> Solaris libtirpc before MT-enablement was completed. I could
> be remembering incorrectly.
>
> Therefore I think we would need some unit tests along with the
> improvements you propose. There's a test suite somewhere, but
> someone else will need to provide details.
>
> (and also, please inline your patch submissions rather than
> attaching them, to make it easier for us to review. thanks!)
>
>
>> 1. In clnt_dg.c in clnt_dg_freeres(), cu_fd_lock->active is set to TRUE, with no corresponding clearing when the operation (*xdr_res() call) is completed. This would leave other waiting operations blocked indefinitely, effectively deadlocked. For comparison, clnt_vd_freeres() in clnt_vc.c does not set the active state to TRUE. I believe the vc behavior is correct, while the dg behavior is a bug.
>>
>> 2. If clnt_dg_destroy() or clnt_vc_destroy() is awoken with other blocked operations pending (such as clnt_*_call(), clnt_*_control(), or clnt_*_freeres()) but no active operation currently being executed, then the client gets destroyed. Then, as the other blocked operations get subsequently awoken, they will try operate on an invalid client handle, potentially causing unpredictable behavior and stack corruption.
>>
>> The proposed fix is to introduce a simple mutexed counting variable into the client lock structure, which keeps track of the number of pending blocking operations on the client. This way, clnt_*_destroy() can check if there are any operations pending on a client, and keep waiting until all pending operations are completed, before the client is destroyed safely and its resources are freed.
>>
>> Attached is a patch with the above fixes.
>>
>> -- A.<clnt_mt_safe_destroy.patch>
>
> --
> Chuck Lever
>
>
>

2022-07-21 20:42:31

by Steve Dickson

[permalink] [raw]
Subject: Re: Thread safe client destruction

Hey!

On 7/21/22 3:19 PM, Attila Kovacs wrote:
> Hi Chuck,
>
>
> I'm not adding any new functionality here. I'm merely trying to fix
> what's already there but broken (in my opinion). :-)
I agree with this.... And I will be more that willing
to reformat the patches make it easier to review

I don't want nits like that to get in the way this good work,
but it would be good if you could added the
Signed-off-by: Attila Kovacs <[email protected]>

line, which is described in [1]

steved.

[1] https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
>
> -- A.
>
>
> On 7/21/22 14:45, Chuck Lever III wrote:
>>
>>
>>> On Jul 21, 2022, at 2:41 PM, Attila Kovacs
>>> <[email protected]> wrote:
>>>
>>> Hi again,
>>>
>>>
>>> I found yet more potential MT flaws in clnt_dg.c and clnt_vg.c.
>>
>> IIRC libtirpc is not MT-enabled because it was forked from the
>> Solaris libtirpc before MT-enablement was completed. I could
>> be remembering incorrectly.
>>
>> Therefore I think we would need some unit tests along with the
>> improvements you propose. There's a test suite somewhere, but
>> someone else will need to provide details.
>>
>> (and also, please inline your patch submissions rather than
>> attaching them, to make it easier for us to review. thanks!)
>>
>>
>>> 1. In clnt_dg.c in clnt_dg_freeres(), cu_fd_lock->active is set to
>>> TRUE, with no corresponding clearing when the operation (*xdr_res()
>>> call) is completed. This would leave other waiting operations blocked
>>> indefinitely, effectively deadlocked. For comparison,
>>> clnt_vd_freeres() in clnt_vc.c does not set the active state to TRUE.
>>> I believe the vc behavior is correct, while the dg behavior is a bug.
>>>
>>> 2. If clnt_dg_destroy() or clnt_vc_destroy() is awoken with other
>>> blocked operations pending (such as clnt_*_call(), clnt_*_control(),
>>> or clnt_*_freeres()) but no active operation currently being
>>> executed, then the client gets destroyed. Then, as the other blocked
>>> operations get subsequently awoken, they will try operate on an
>>> invalid client handle, potentially causing unpredictable behavior and
>>> stack corruption.
>>>
>>> The proposed fix is to introduce a simple mutexed counting variable
>>> into the client lock structure, which keeps track of the number of
>>> pending blocking operations on the client. This way, clnt_*_destroy()
>>> can check if there are any operations pending on a client, and keep
>>> waiting until all pending operations are completed, before the client
>>> is destroyed safely and its resources are freed.
>>>
>>> Attached is a patch with the above fixes.
>>>
>>> -- A.<clnt_mt_safe_destroy.patch>
>>
>> --
>> Chuck Lever
>>
>>
>>
>