2001-10-16 22:47:20

by Shirish Kalele

[permalink] [raw]
Subject: Fw: NFSD over TCP: TCP broken?

> Hi,
>
> I've been looking at running nfsd over tcp on Linux. I modified the #ifdef
> so that nfsd uses tcp. I also made writes to the socket blocking, so that
> the thread blocks till the entire reply has been accepted by TCP. (I know
> the right way is going to be to have an independent thread whose job would
> be to just pick replies off a queue and block on sending them to tcp, but
> this is what I've done temporarily.)
>
> Then I tried to copy a directory from a Solaris client to the Linux server
> using nfsv3 over tcp. This took a long time, with lots of delays where
> nothing was being transferred.
>
> Looking at the network traces, it looks like the RPC records being sent over
> TCP are inconsistent with the lengths specified in the record marker. This
> happens mainly when 3-4 requests arrive one after the other and you have 3-4
> threads replying to these requests in parallel. It looks like TCP gets
> hopelessly confused and botches up the replies being sent. I point my finger
> at TCP because tcp_sendmsg returns a valid length indicating that the entire
> reply was accepted, but the tcp sequence numbers show that the RPC record
> sent on the wire wasn't equal to the length accepted by TCP. After a while,
> the client realizes it's out of sync when it gets an invalid RPC record
> marker, and resets and reconnects. This repeats multiple times.
>
> Is TCP known to break when multiple threads try to send data down the pipe
> simultaneously? Is there a known fix for this? Where should I be focusing to
> fix the problem?
>
> I'm not on the list, so please include me in replies.
>
> Thanks,
> Shirish
>
>


2001-10-17 07:21:15

by Trond Myklebust

[permalink] [raw]
Subject: Re: Fw: NFSD over TCP: TCP broken?

>>>>> " " == Shirish Kalele <[email protected]> writes:

>> Hi,

>> Looking at the network traces, it looks like the RPC records
>> being sent over TCP are inconsistent with the lengths specified
>> in the record marker. This happens mainly when 3-4 requests
>> arrive one after the other and you have 3-4 threads replying to
>> these requests in parallel. It looks like TCP gets hopelessly
>> confused and botches up the replies being sent. I point my
>> finger at TCP because tcp_sendmsg returns a valid length
>> indicating that the entire reply was accepted, but the tcp
>> sequence numbers show that the RPC record sent on the wire
>> wasn't equal to the length accepted by TCP. After a while, the
>> client realizes it's out of sync when it gets an invalid RPC
>> record marker, and resets and reconnects. This repeats multiple
>> times.

This is normal. Nobody has fixed the RPC server code. There are plenty
of possible sources of the above problem, but my main suspect is the
fact that the TCP reply code uses blocking socket operations (will
change once we actually go in for supporting TCP), but doesn't provide
any mechanism for preventing another thread from using the socket
while the original writer is sleeping.

Fix: Set up a semaphore in the struct svc_sock somewhere, and use it
to gate write access to the socket...

Cheers,
Trond