2004-04-13 00:26:58

by Christoph Lameter

Subject: CIFS/SMBFS failing under load in 2.6.X

Whenever I put a high load on CIFS or SMBFS, requests time out and then the
benchmark or whatever I run fails. I ran the same tests successfully with
a 2.4.25 kernel. This is a connection to a Samba 3.0.2 server.

SMBFS logs the following:

Apr 12 15:59:25 testbox kernel: smb_add_request: request [ca7b7280,
mid=12891] timed out!
Apr 12 15:59:25 testbox kernel: smb_writepage_sync: failed write,
wsize=4096, result=-5
Apr 12 15:59:26 testbox kernel: smb_add_request: request [ca7b7080,
mid=13701] timed out!
Apr 12 15:59:26 testbox kernel: smb_writepage_sync: failed write,
wsize=4096, result=-5
Apr 12 16:00:08 testbox kernel: smb_add_request: request [ca7b7c80,
mid=47333] timed out!
Apr 12 16:00:08 testbox kernel: smb_writepage_sync: failed write,
wsize=2048, result=-5
Apr 12 16:00:10 testbox kernel: smb_add_request: request [ca7b7880,
mid=48900] timed out!
Apr 12 16:00:10 testbox kernel: smb_writepage_sync: failed write, wsize=1,
result=-5
Apr 12 16:00:13 testbox kernel: smb_add_request: request [ca7b7180,
mid=50657] timed out!
Apr 12 16:00:13 testbox kernel: smb_writepage_sync: failed write,
wsize=2048, result=-5
Apr 12 16:00:22 testbox kernel: smb_add_request: request [ca7b7e80,
mid=57576] timed out!
Apr 12 16:00:22 testbox kernel: smb_writepage_sync: failed write,
wsize=4096, result=-5
Apr 12 16:00:22 testbox kernel: smb_add_request: request [ca7b7d80,
mid=57900] timed out!
Apr 12 16:00:22 testbox kernel: smb_writepage_sync: failed write,
wsize=4096, result=-5
Apr 12 16:00:39 testbox kernel: smb_add_request: request [ca7b7b80,
mid=9411] timed out!
Apr 12 16:00:39 testbox kernel: smb_writepage_sync: failed write,
wsize=4096, result=-5
Apr 12 16:01:22 testbox kernel: smb_add_request: request [ca7b7980,
mid=40403] timed out!
Apr 12 16:01:22 testbox kernel: smb_writepage_sync: failed write, wsize=1,
result=-5
Apr 12 16:04:53 testbox kernel: smb_add_request: request [c9d25980,
mid=35372] timed out!
Apr 12 16:04:53 testbox kernel: smb_writepage_sync: failed write,
wsize=4096, result=-5
Apr 12 16:04:53 testbox kernel: smb_add_request: request [c9d25580,
mid=35548] timed out!
Apr 12 16:04:53 testbox kernel: smb_writepage_sync: failed write,
wsize=53, result=-5
Apr 12 16:05:32 testbox kernel: smb_add_request: request [c9d25b80,
mid=5926] timed out!
Apr 12 16:05:32 testbox kernel: smb_writepage_sync: failed write,
wsize=2048, result=-5
Apr 12 16:05:32 testbox kernel: smb_add_request: request [c9d25780,
mid=5993] timed out!
Apr 12 16:05:32 testbox kernel: smb_writepage_sync: failed write,
wsize=2048, result=-5
Apr 12 16:05:35 testbox kernel: smb_add_request: request [c9d25680,
mid=7816] timed out!
Apr 12 16:05:35 testbox kernel: smb_writepage_sync: failed write, wsize=1,
result=-5
Apr 12 16:05:38 testbox kernel: smb_add_request: request [c9d25c80,
mid=10166] timed out!
Apr 12 16:05:38 testbox kernel: smb_writepage_sync: failed write,
wsize=2048, result=-5
Apr 12 16:05:38 testbox kernel: smb_add_request: request [c9d25d80,
mid=10231] timed out!
Apr 12 16:05:38 testbox kernel: smb_writepage_sync: failed write,
wsize=2048, result=-5


CIFS logs:

Apr 12 17:02:00 testbox kernel: CIFS VFS: Send error in write = -6
Apr 12 17:02:29 testbox kernel: CIFS VFS: Send error in write = -5
Apr 12 17:02:29 testbox last message repeated 8 times
Apr 12 17:02:39 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:02:49 testbox last message repeated 10 times
Apr 12 17:02:49 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:02:59 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:02:59 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:09 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:09 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:14 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:19 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:19 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:20 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:20 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:20 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:20 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:20 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:20 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:20 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:20 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:20 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:20 testbox kernel: CIFS VFS: Error 0xfffffffb or (-5
decimal) on cifs_get_inode_info in lookup
Apr 12 17:03:35 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:45 testbox kernel: CIFS VFS: Need to reconnect after session
died to server
Apr 12 17:03:46 testbox kernel: CIFS VFS: cifs_umount failed with return
code -5

Tests were run with Linux 2.6.5.


2004-04-22 17:43:14

by Urban Widmark

Subject: Re: CIFS/SMBFS failing under load in 2.6.X

On Mon, 12 Apr 2004, Christoph Lameter wrote:

> Whenever I put a high load on CIFS or SMBFS, requests time out and then the
> benchmark or whatever I run fails. I ran the same tests successfully with
> a 2.4.25 kernel. This is a connection to a Samba 3.0.2 server.
>
> SMBFS logs the following:
>
> Apr 12 15:59:25 testbox kernel: smb_add_request: request [ca7b7280,
> mid=12891] timed out!
> Apr 12 15:59:25 testbox kernel: smb_writepage_sync: failed write,
> wsize=4096, result=-5
...

> CIFS logs:
>
> Apr 12 17:02:00 testbox kernel: CIFS VFS: Send error in write = -6
> Apr 12 17:02:29 testbox kernel: CIFS VFS: Send error in write = -5
> Apr 12 17:02:29 testbox last message repeated 8 times
> Apr 12 17:02:39 testbox kernel: CIFS VFS: Need to reconnect after session
> died to server

smbfs and cifs do not share any code, although I believe both of them
will send multiple requests in parallel. Any chance that this is the
server or network?


smbfs at least does not limit the number of requests it sends. It could be
a problem if the server has a low limit (should be the maxmux field in the
smb_conn_opt struct).

I could send a patch for this, but unless cifs does the same then that is
probably not it.
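
A limit of that kind could look roughly like the sketch below. This is a
userspace illustration only, not the smbfs code: the struct and function
names are hypothetical, and only the idea of a server-advertised maxmux
value comes from the discussion above. A sender claims a slot before each
request and gives it back when the request is answered or times out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection bookkeeping; not the real smb_conn_opt. */
struct conn_window {
    int maxmux;    /* server-advertised cap on outstanding requests */
    int in_flight; /* requests sent but not yet retired */
};

static void window_init(struct conn_window *w, int maxmux)
{
    assert(maxmux > 0);
    w->maxmux = maxmux;
    w->in_flight = 0;
}

/* Claim a slot before sending. Returns false if the caller has to queue. */
static bool window_try_send(struct conn_window *w)
{
    if (w->in_flight >= w->maxmux)
        return false;
    w->in_flight++;
    return true;
}

/* Give the slot back when a reply or a timeout retires the request. */
static void window_complete(struct conn_window *w)
{
    assert(w->in_flight > 0);
    w->in_flight--;
}
```

With maxmux set to 2, a third window_try_send() call fails until one of the
first two requests is completed. Real kernel code would also need locking
and a wait queue, which this sketch leaves out.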

/Urban

2004-04-23 00:51:56

by Christoph Lameter

Subject: Re: CIFS/SMBFS failing under load in 2.6.X

Well, the server is under very high load in this test (up to 200) and the
response times are also extremely high. Are timeouts new in 2.6.x? SMBFS
in 2.4.X does not seem to time out.

Also, are there any fixes for the 4KB size limitation? Windows allows 64K
writes and reads in one request; SMBFS allows only 4K.

On Thu, 22 Apr 2004, Urban Widmark wrote:

> On Mon, 12 Apr 2004, Christoph Lameter wrote:
>
> > Whenever I put a high load on CIFS or SMBFS, requests time out and then the
> > benchmark or whatever I run fails. I ran the same tests successfully with
> > a 2.4.25 kernel. This is a connection to a Samba 3.0.2 server.
> >
> > SMBFS logs the following:
> >
> > Apr 12 15:59:25 testbox kernel: smb_add_request: request [ca7b7280,
> > mid=12891] timed out!
> > Apr 12 15:59:25 testbox kernel: smb_writepage_sync: failed write,
> > wsize=4096, result=-5
> ...
>
> > CIFS logs:
> >
> > Apr 12 17:02:00 testbox kernel: CIFS VFS: Send error in write = -6
> > Apr 12 17:02:29 testbox kernel: CIFS VFS: Send error in write = -5
> > Apr 12 17:02:29 testbox last message repeated 8 times
> > Apr 12 17:02:39 testbox kernel: CIFS VFS: Need to reconnect after session
> > died to server
>
> smbfs and cifs do not share any code, although I believe both of them
> will send multiple requests in parallel. Any chance that this is the
> server or network?
>
>
> smbfs at least does not limit the number of requests it sends. It could be
> a problem if the server has a low limit (should be the maxmux field in the
> smb_conn_opt struct).
>
> I could send a patch for this, but unless cifs does the same then that is
> probably not it.
>
> /Urban
>
>

2004-04-26 21:21:46

by Urban Widmark

Subject: Re: CIFS/SMBFS failing under load in 2.6.X

On Thu, 22 Apr 2004, Christoph Lameter wrote:

> Well, the server is under very high load in this test (up to 200) and the
> response times are also extremely high. Are timeouts new in 2.6.x? SMBFS
> in 2.4.X does not seem to time out.

Both have a timeout, but they are different. I think that if smbfs-2.4
doesn't get any data for 30 seconds it aborts. 2.6 wants the full reply
within that time. So the 2.4 code should be happy with 1 byte every 29.9
seconds.
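
The difference between the two policies can be illustrated with a small
userspace sketch. It is not taken from either smbfs version; the function
names and the representation of a reply as a list of chunk arrival times
are assumptions made for the example. The first check resets its clock on
every chunk, the second measures the whole reply against one deadline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * arrival[i] is the time in seconds, counted from when the request was
 * sent, at which the i-th chunk of the reply arrives. Values must be
 * non-decreasing and n must be at least 1.
 */

/* Idle timeout (2.4 behaviour as described): fail only on a silent gap. */
static bool idle_timeout_ok(const double *arrival, size_t n, double timeout)
{
    double last = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (arrival[i] - last >= timeout)
            return false;
        last = arrival[i]; /* any data restarts the clock */
    }
    return true;
}

/* Total timeout (2.6 behaviour as described): the last chunk must arrive
 * before the deadline. */
static bool total_timeout_ok(const double *arrival, size_t n, double timeout)
{
    assert(n > 0);
    return arrival[n - 1] < timeout;
}
```

A reply that trickles in at 29.9, 59.8 and 89.7 seconds passes the idle
check with a 30 second timeout but fails the total check, which matches
the "1 byte every 29.9 seconds" case.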

Also 2.4 smbfs never has more than one request active. Is the load the
same?

I should check the code, but I guess that a timeout is counted as a likely
network problem. So that could be why it reconnects. Lots of reconnections
== higher load?

You can increase the timeout with the 'timeo' option. Set it to a couple
of minutes and see if that helps any.


> Also, are there any fixes for the 4KB size limitation? Windows allows 64K
> writes and reads in one request; SMBFS allows only 4K.

Yes, I'm well aware of that limitation. I started looking at readahead and
read/write coalescing for the 2.4 interface but I never finished it.

The readahead code I had didn't make it noticeably faster in most cases, so
it didn't feel that important to get it done (it did merge the requests).
There was some specific condition where it did help a bit but I don't
remember what that was. Could have been when transferring data over a
higher latency connection.

For 2.6 the readpages/writepages interface needs to be implemented,
probably quite similar to the smb_readpage/smb_writepage code. Possibly
with some changes to smb_proc_writeX/readX and the smb_request struct.
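
The request merging mentioned above can be sketched in userspace as
follows. This is only an illustration of coalescing, not a readpages or
writepages implementation: the names are hypothetical, the 4K page size
is taken from the wsize=4096 log lines, and the 64K cap is the request
size mentioned earlier in this thread. Sorted page indices are grouped
into runs of contiguous pages, and each run would become one request.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ    4096u                  /* assumed page size */
#define MAX_REQ_SZ (64u * 1024u)          /* assumed per-request cap */
#define MAX_PAGES  (MAX_REQ_SZ / PAGE_SZ) /* 16 pages per request */

/* One merged request: nr_pages contiguous pages from first_page. */
struct io_run {
    unsigned long first_page;
    unsigned int nr_pages;
};

/*
 * Group strictly increasing page indices into contiguous runs of at
 * most MAX_PAGES pages. "out" must have room for n entries. Returns
 * the number of runs written.
 */
static size_t coalesce_pages(const unsigned long *pages, size_t n,
                             struct io_run *out)
{
    size_t nr_runs = 0;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || pages[i] > pages[i - 1]);
        if (nr_runs > 0) {
            struct io_run *cur = &out[nr_runs - 1];

            /* Extend the current run if adjacent and not yet full. */
            if (pages[i] == cur->first_page + cur->nr_pages &&
                cur->nr_pages < MAX_PAGES) {
                cur->nr_pages++;
                continue;
            }
        }
        /* Otherwise start a new run at this page. */
        out[nr_runs].first_page = pages[i];
        out[nr_runs].nr_pages = 1;
        nr_runs++;
    }
    return nr_runs;
}
```

For pages 0 through 19 followed by page 30, this yields three runs (16
pages, 4 pages, 1 page) instead of 21 single-page requests.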

If you are interested in doing it I could try to give you some pointers on
how I think it can be done. If you want me to do it, sure. Just remind me
in a week or so when I haven't responded. :)

/Urban

2004-04-28 19:56:31

by Steve French (smfltc)

Subject: Re: CIFS/SMBFS failing under load in 2.6.X

> Whenever I put a high load on CIFS or SMBFS, requests time out and
> then the benchmark or whatever I run fails. I ran the same tests
> successfully with a 2.4.25 kernel

This should be OK on 2.6.6; let me know if you see any more stress
problems with the cifs vfs. The reconnection problems and the readpages
memory leak were fixed quite a while ago in the cifs vfs (but not merged
until recently), and the problems with incomplete socket ops and incorrect
signal handling causing timeouts were fixed more recently, but both sets
of fixes should be in 2.6.6 (which includes a very large cifs update).

Also note that with 2.6.6 the CIFS VFS will add support for NTv4 (although
a few of the POSIX options won't work due to lack of server support), which
may still be helpful to some.

Probably the most important performance optimization that will help some
of the popular stress scenarios, at this point at least, is implementing
cifs_writepages, and eliminating the extra memcpy in writepage. dbench
performance is ok - but since it is heavily oriented towards writes and
writepage is overly serialized, with dbench cifs (and smbfs) gets only
about 1/3 of what I would consider a reasonable goal for maximum
achievable throughput to Samba (based on the tbench estimates for
maximum network throughput).

Interestingly there are a few microbenchmarks in which implementing
readpages and writepage can actually slow things down, but in general
the addition of readpages/writepage along with oplock (smb/cifs
distributed caching) support has been a big help.

Jeremy Allison and I are starting to work through some minor CIFS
dialect enhancements to help performance even more in the CIFS -> Samba
case and perhaps getting the spec written up nicely if there is
interest.