2007-12-13 22:28:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 09/38] svc: Add a transport function that checks for write space

On Thu, Dec 13, 2007 at 04:20:27PM -0600, Tom Tucker wrote:
>
> On Thu, 2007-12-13 at 16:45 -0500, Chuck Lever wrote:
> > On Dec 13, 2007, at 4:33 PM, J. Bruce Fields wrote:
> > > On Wed, Dec 12, 2007 at 01:10:17PM -0500, Chuck Lever wrote:
> > >> On Dec 11, 2007, at 6:32 PM, Tom Tucker wrote:
> > >>> + required = atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg;
> > >>> + wspace = sk_stream_wspace(svsk->sk_sk);
> > >>> +
> > >>> + if (wspace < sk_stream_min_wspace(svsk->sk_sk))
> > >>> + return 0;
> > >>> + if (required * 2 > wspace)
> > >>> + return 0;
> > >>
> > >> Since "required" is an int, this test can behave differently than
> > >> the one
> > >> it replaces.
> > >
> > > If sk_reserved can approach half of 2^31, for example, then surely we
> > > have bigger problems?
> >
> >
> > What stops sk_reserved from going negative?
>
> Nothing actively _stops_ it from going negative. Indirectly it is
> prevented by the "fact" that sv_max_mesg is always greater than the
> amount returned by any read on the socket. If this is not true, then
> sk_reserved can go negative. That would occur in svc_recv when the
> difference between the amount currently reserved (worst case
> sv_max_mesg) and the amount read is subtracted from the amount reserved.
> If the amount read were greater than sv_max_mesg then the result would
> be negative (both meanings intended).
>
> I could add a BUG_ON and run some tests.

I have to admit, I'm completely and utterly lost here. Chuck, could you
explain exactly, in detail, how you think this change could cause a bug?

--b.


2007-12-14 14:00:35

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 09/38] svc: Add a transport function that checks for write space


On Dec 13, 2007, at 5:28 PM, J. Bruce Fields wrote:

> On Thu, Dec 13, 2007 at 04:20:27PM -0600, Tom Tucker wrote:
>>
>> On Thu, 2007-12-13 at 16:45 -0500, Chuck Lever wrote:
>>> On Dec 13, 2007, at 4:33 PM, J. Bruce Fields wrote:
>>>> On Wed, Dec 12, 2007 at 01:10:17PM -0500, Chuck Lever wrote:
>>>>> On Dec 11, 2007, at 6:32 PM, Tom Tucker wrote:
>>>>>> + required = atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg;
>>>>>> + wspace = sk_stream_wspace(svsk->sk_sk);
>>>>>> +
>>>>>> + if (wspace < sk_stream_min_wspace(svsk->sk_sk))
>>>>>> + return 0;
>>>>>> + if (required * 2 > wspace)
>>>>>> + return 0;
>>>>>
>>>>> Since "required" is an int, this test can behave differently than
>>>>> the one
>>>>> it replaces.
>>>>
>>>> If sk_reserved can approach half of 2^31, for example, then
>>>> surely we
>>>> have bigger problems?
>>>
>>>
>>> What stops sk_reserved from going negative?
>>
>> Nothing actively _stops_ it from going negative. Indirectly it is
>> prevented by the "fact" that sv_max_mesg is always greater than the
>> amount returned by any read on the socket. If this is not true, then
>> sk_reserved can go negative. That would occur in svc_recv when the
>> difference between the amount currently reserved (worst case
>> sv_max_mesg) and the amount read is subtracted from the amount
>> reserved.
>> If the amount read were greater than sv_max_mesg then the result
>> would
>> be negative (both meanings intended).
>>
>> I could add a BUG_ON and run some tests.
>
> I have to admit, I'm completely and utterly lost here. Chuck,
> could you
> explain exactly, in detail, how you think this change could cause a
> bug?


The original code has this:

static inline unsigned long
svc_sock_wspace(struct svc_sock *svsk)
{
int wspace;

if (svsk->sk_sock->type == SOCK_STREAM)
wspace = sk_stream_wspace(svsk->sk_sk);
else
wspace = sock_wspace(svsk->sk_sk);

return wspace;
}

...

if (((atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg)*2
> svc_sock_wspace(svsk))

Since svc_sock_wspace() returns an unsigned long, that means the sum
above is always converted to unsigned long. If sk_reserved goes
negative, then the sum becomes a large positive.

Note that in the TCP case, sk_stream_wspace() returns an int, which
is also converted to an unsigned long by svc_sock_wspace().
sk_stream_wspace() can return a negative if sk_wmem_queued happens to
become larger than sk_sndbuf.

In all the other wspace callbacks I checked, the result of
sk_stream_wspace() is compared to sk_stream_min_space() -- comparing
an int to an int -- which means if sk_stream_wspace()'s result *does*
go negative, those comparisons will catch it and do the right thing.

In the original RPC server implementation, though, sk_stream_wspace
()'s result is converted to unsigned long; negative return values
thus become large positives.

In both of these scenarios, the large positives make the comparison
behave in exactly the opposite way we might expect it to. Tom's code
fixes this by making "required" an int for the TCP case so we still
get an int-to-int comparison.

Because the original is coded the way it is, however, it's difficult
to tell whether the signedness of these variables was intended for
some clever effect, or whether it was an oversight. If this was
intentional, the new wspace check for TCP could have unintended
performance consequences.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com