This is on 32 bit x86 on a Dell 1950
After mounting a CIFS share we see 5-second hiccups. Typical log output
when doing a simple "ls /mnt":
Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: For smb_command 50
Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: Sending smb: total_len 118
Sep 4 16:21:43 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate (xid = 258) rc = 0
Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 263 with uid: 0
Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: parent inode = 0xf58d2e60 name is: AutoWire.bmp and dentry = 0xf5adb63c
Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: NULL inode in lookup
Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: Full path: \AutoWire.bmp inode = 0x(null)
Sep 4 16:21:43 rd-spare kernel: fs/cifs/inode.c: Getting info on \AutoWire.bmp
Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: For smb_command 50
Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: Sending smb: total_len 104
---- 5 second hiccup
Sep 4 16:21:48 rd-spare kernel: fs/cifs/connect.c: rfc1002 length 0xce
Sep 4 16:21:48 rd-spare kernel: fs/cifs/connect.c: rfc1002 length 0xc0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: inode 0xf5876518 old_time=26000 new_time=32751
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: cifs_revalidate - inode unchanged
Sep 4 16:21:48 rd-spare kernel: fs/cifs/file.c: CIFS VFS: in cifs_writepages as Xid: 264 with uid: 0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/file.c: CIFS VFS: leaving cifs_writepages (xid = 264) rc = 0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate (xid = 262) rc = 0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: in cifs_revalidate as Xid: 265 with uid: 0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: Revalidate: \Akamai Headsets.doc inode 0xf5876518 count 2 dentry: 0xf5ada8d0 d_time 26000 jiffies 32751
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate (xid = 265) rc = 0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: in cifs_revalidate as Xid: 266 with uid: 0
Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: Revalidate: \Akamai Headsets.doc inode 0xf5876518 count 2 dentry: 0xf5ada8d0 d_time 26000 jiffies 32751
This is happening intermittently on a variety of hosts.
cat /proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 1.60
Active VFS Requests: 2
Servers:
1) Name: 10.2.4.64 Domain: W2K Uses: 1 OS: Windows Server 2003 R2 3790 Service Pack 2
NOS: Windows Server 2003 R2 5.2 Capability: 0x1f3fd
SMB session status: 1 TCP status: 1
Local Users To Server: 1 SecMode: 0x3 Req On Wire: 2
Shares:
1) \\chiprodfs2\company Mounts: 1 Type: NTFS DevInfo: 0x20
Attributes: 0x700ff
PathComponentMax: 255 Status: 0x1 type: DISK
MIDs:
State: 2 com: 50 pid: 5951 tsk: f756d1b0 mid 277
State: 2 com: 50 pid: 6044 tsk: f69d4760 mid 278
cat /proc/fs/cifs/Stats
Resources in use
CIFS Session: 1
Share (unique mount targets): 1
SMB Request/Response Buffer: 5 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 2
0 session 0 share reconnects
Total vfs operations: 525 maximum at one time: 3
1) \\chiprodfs2\company
SMBs: 305 Oplock Breaks: 0
Reads: 0 Bytes: 0
Writes: 0 Bytes: 0
Flushes: 0
Locks: 0 HardLinks: 0 Symlinks: 0
Opens: 0 Closes: 0 Deletes: 0
Posix Opens: 0 Posix Mkdirs: 0
Mkdirs: 0 Rmdirs: 0
Renames: 0 T2 Renames 0
FindFirst: 2 FNext 0 FClose 0
What is this ???
On Fri, 4 Sep 2009 12:27:35 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> This is on 32 bit x86 on a Dell 1950
>
> After mounting a CIFS share we see 5-second hiccups. Typical log output
> when doing a simple "ls /mnt":
>
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: For smb_command 50
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: Sending smb: total_len 118
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate (xid = 258) rc = 0
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 263 with uid: 0
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: parent inode = 0xf58d2e60 name is: AutoWire.bmp and dentry = 0xf5adb63c
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: NULL inode in lookup
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/dir.c: Full path: \AutoWire.bmp inode = 0x(null)
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/inode.c: Getting info on \AutoWire.bmp
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: For smb_command 50
> Sep 4 16:21:43 rd-spare kernel: fs/cifs/transport.c: Sending smb: total_len 104
>
> ---- 5 second hiccup
>
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/connect.c: rfc1002 length 0xce
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/connect.c: rfc1002 length 0xc0
(adding linux-cifs-client mailing list)
It looks like it's just taking 5s for the server to respond here. Do
you happen to have a wire capture of one of these events? That may tell
us more than cifsFYI info...
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: inode 0xf5876518 old_time=26000 new_time=32751
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: cifs_revalidate - inode unchanged
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/file.c: CIFS VFS: in cifs_writepages as Xid: 264 with uid: 0
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/file.c: CIFS VFS: leaving cifs_writepages (xid = 264) rc = 0
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate (xid = 262) rc = 0
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: in cifs_revalidate as Xid: 265 with uid: 0
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: Revalidate: \Akamai Headsets.doc inode 0xf5876518 count 2 dentry: 0xf5ada8d0 d_time 26000 jiffies 32751
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: leaving cifs_revalidate (xid = 265) rc = 0
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: CIFS VFS: in cifs_revalidate as Xid: 266 with uid: 0
> Sep 4 16:21:48 rd-spare kernel: fs/cifs/inode.c: Revalidate: \Akamai Headsets.doc inode 0xf5876518 count 2 dentry: 0xf5ada8d0 d_time 26000 jiffies 32751
>
>
> This is happening intermittently on a variety of hosts.
>
> cat /proc/fs/cifs/DebugData
>
> Display Internal CIFS Data Structures for Debugging
> ---------------------------------------------------
> CIFS Version 1.60
> Active VFS Requests: 2
> Servers:
> 1) Name: 10.2.4.64 Domain: W2K Uses: 1 OS: Windows Server 2003 R2 3790 Service Pack 2
> NOS: Windows Server 2003 R2 5.2 Capability: 0x1f3fd
> SMB session status: 1 TCP status: 1
> Local Users To Server: 1 SecMode: 0x3 Req On Wire: 2
> Shares:
> 1) \\chiprodfs2\company Mounts: 1 Type: NTFS DevInfo: 0x20
> Attributes: 0x700ff
> PathComponentMax: 255 Status: 0x1 type: DISK
>
> MIDs:
> State: 2 com: 50 pid: 5951 tsk: f756d1b0 mid 277
> State: 2 com: 50 pid: 6044 tsk: f69d4760 mid 278
>
> cat /proc/fs/cifs/Stats
>
> Resources in use
> CIFS Session: 1
> Share (unique mount targets): 1
> SMB Request/Response Buffer: 5 Pool size: 5
> SMB Small Req/Resp Buffer: 1 Pool size: 30
> Operations (MIDs): 2
>
> 0 session 0 share reconnects
> Total vfs operations: 525 maximum at one time: 3
>
> 1) \\chiprodfs2\company
> SMBs: 305 Oplock Breaks: 0
> Reads: 0 Bytes: 0
> Writes: 0 Bytes: 0
> Flushes: 0
> Locks: 0 HardLinks: 0 Symlinks: 0
> Opens: 0 Closes: 0 Deletes: 0
> Posix Opens: 0 Posix Mkdirs: 0
> Mkdirs: 0 Rmdirs: 0
> Renames: 0 T2 Renames 0
> FindFirst: 2 FNext 0 FClose 0
>
>
> What is this ???
>
>
--
Jeff Layton <[email protected]>
On Sat, 5 Sep 2009, Jeff Layton wrote:
> It looks like it's just taking 5s for the server to respond here. Do
> you happen to have a wire capture of one of these events? That may tell
> us more than cifsFYI info...
I did a tcpdump and nothing stands out. Server acks the "cmd 50" and then
waits 5 seconds before sending the data.
16:23:34.336373 IP (tos 0x0, ttl 64, id 20616, offset 0, flags [DF], proto 6, length: 118) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 2801206064:2801206142(78) ack 468207120 win 190
16:23:34.336624 IP (tos 0x0, ttl 125, id 19869, offset 0, flags [DF], proto 6, length: 206) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P 1:167(166) ack 78 win 64548
16:23:34.336636 IP (tos 0x0, ttl 64, id 20617, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 78:78(0) ack 167 win 190
16:23:34.336669 IP (tos 0x0, ttl 64, id 20618, offset 0, flags [DF], proto 6, length: 128) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 78:166(88) ack 167 win 190
16:23:34.456343 IP (tos 0x0, ttl 125, id 20045, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 167:167(0) ack 166 win 64460
hiccup
16:23:39.284930 IP (tos 0x0, ttl 125, id 27544, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 167:357(190) ack 166 win 64460
16:23:39.324060 IP (tos 0x0, ttl 64, id 20619, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 357 win 190
16:23:39.324292 IP (tos 0x0, ttl 125, id 27563, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 357:1817(1460) ack 166 win 64460
16:23:39.324300 IP (tos 0x0, ttl 64, id 20620, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 1817 win 190
16:23:39.324306 IP (tos 0x0, ttl 125, id 27564, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 1817:3277(1460) ack 166 win 64460
16:23:39.324311 IP (tos 0x0, ttl 64, id 20621, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 3277 win 188
16:23:39.324315 IP (tos 0x0, ttl 125, id 27565, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 3277:4737(1460) ack 166 win 64460
16:23:39.324319 IP (tos 0x0, ttl 64, id 20622, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 4737 win 186
16:23:39.324321 IP (tos 0x0, ttl 125, id 27566, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 4737:6197(1460) ack 166 win 64460
16:23:39.324324 IP (tos 0x0, ttl 64, id 20623, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 6197 win 184
16:23:39.324329 IP (tos 0x0, ttl 125, id 27567, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 6197:7657(1460) ack 166 win 64460
16:23:39.324332 IP (tos 0x0, ttl 64, id 20624, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 7657 win 182
16:23:39.324335 IP (tos 0x0, ttl 125, id 27568, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 7657:9117(1460) ack 166 win 64460
16:23:39.324337 IP (tos 0x0, ttl 64, id 20625, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 9117 win 180
16:23:39.324354 IP (tos 0x0, ttl 125, id 27569, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 9117:10577(1460) ack 166 win 64460
16:23:39.324362 IP (tos 0x0, ttl 64, id 20626, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 10577 win 190
16:23:39.324371 IP (tos 0x0, ttl 125, id 27570, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 10577:12037(1460) ack 166 win 64460
16:23:39.324374 IP (tos 0x0, ttl 64, id 20627, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 12037 win 188
16:23:39.324377 IP (tos 0x0, ttl 125, id 27571, offset 0, flags [DF], proto 6, length: 1500) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 12037:13497(1460) ack 166 win 64460
16:23:39.324379 IP (tos 0x0, ttl 64, id 20628, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 13497 win 186
16:23:39.324383 IP (tos 0x0, ttl 125, id 27572, offset 0, flags [DF], proto 6, length: 502) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P 13497:13959(462) ack 166 win 64460
16:23:39.324385 IP (tos 0x0, ttl 64, id 20629, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 13959 win 186
16:23:39.324666 IP (tos 0x0, ttl 64, id 20630, offset 0, flags [DF], proto 6, length: 150) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 166:276(110) ack 13959 win 190
16:23:39.486091 IP (tos 0x0, ttl 125, id 28080, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 13959:13959(0) ack 276 win 64350
hiccup
16:23:44.315476 IP (tos 0x0, ttl 125, id 2862, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 13959:14149(190) ack 276 win 64350
16:23:44.355370 IP (tos 0x0, ttl 64, id 20631, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 276:276(0) ack 14149 win 190
16:23:44.355531 IP (tos 0x0, ttl 125, id 2875, offset 0, flags [DF], proto 6, length: 48) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 14149:14157(8) ack 276 win 64350
16:23:44.355540 IP (tos 0x0, ttl 64, id 20632, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 276:276(0) ack 14157 win 190
16:23:44.355765 IP (tos 0x0, ttl 64, id 20633, offset 0, flags [DF], proto 6, length: 178) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 276:414(138) ack 14157 win 190
16:23:44.517419 IP (tos 0x0, ttl 125, id 3277, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 14157:14157(0) ack 414 win 64212
hiccup
16:23:49.245126 IP (tos 0x0, ttl 125, id 10924, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 14157:14347(190) ack 414 win 64212
16:23:49.285309 IP (tos 0x0, ttl 64, id 20634, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 414:414(0) ack 14347 win 190
16:23:49.285462 IP (tos 0x0, ttl 125, id 11044, offset 0, flags [DF], proto 6, length: 76) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 14347:14383(36) ack 414 win 64212
16:23:49.285470 IP (tos 0x0, ttl 64, id 20635, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 414:414(0) ack 14383 win 190
16:23:49.285505 IP (tos 0x0, ttl 64, id 20636, offset 0, flags [DF], proto 6, length: 184) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 414:558(144) ack 14383 win 190
16:23:49.447170 IP (tos 0x0, ttl 125, id 11093, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 14383:14383(0) ack 558 win 65535
hiccup
16:23:54.275481 IP (tos 0x0, ttl 125, id 18653, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 14383:14573(190) ack 558 win 65535
16:23:54.315247 IP (tos 0x0, ttl 64, id 20637, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 558:558(0) ack 14573 win 190
16:23:54.315383 IP (tos 0x0, ttl 125, id 18719, offset 0, flags [DF], proto 6, length: 82) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 14573:14615(42) ack 558 win 65535
16:23:54.315392 IP (tos 0x0, ttl 64, id 20638, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 558:558(0) ack 14615 win 190
16:23:54.315443 IP (tos 0x0, ttl 64, id 20639, offset 0, flags [DF], proto 6, length: 158) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 558:676(118) ack 14615 win 190
16:23:54.477564 IP (tos 0x0, ttl 125, id 19062, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 14615:14615(0) ack 676 win 65417
hiccup
and so on.
16:23:59.306748 IP (tos 0x0, ttl 125, id 27953, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 14615:14805(190) ack 676 win 65417
16:23:59.346120 IP (tos 0x0, ttl 64, id 20640, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 676:676(0) ack 14805 win 190
16:23:59.346256 IP (tos 0x0, ttl 125, id 27964, offset 0, flags [DF], proto 6, length: 56) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 14805:14821(16) ack 676 win 65417
16:23:59.346264 IP (tos 0x0, ttl 64, id 20641, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 676:676(0) ack 14821 win 190
16:23:59.346293 IP (tos 0x0, ttl 64, id 20642, offset 0, flags [DF], proto 6, length: 156) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 676:792(116) ack 14821 win 190
16:23:59.507011 IP (tos 0x0, ttl 125, id 28189, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 14821:14821(0) ack 792 win 65301
16:24:04.336231 IP (tos 0x0, ttl 125, id 22093, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 14821:15011(190) ack 792 win 65301
16:24:04.375997 IP (tos 0x0, ttl 64, id 20643, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 792:792(0) ack 15011 win 190
16:24:04.376163 IP (tos 0x0, ttl 125, id 22565, offset 0, flags [DF], proto 6, length: 54) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 15011:15025(14) ack 792 win 65301
16:24:04.376170 IP (tos 0x0, ttl 64, id 20644, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 792:792(0) ack 15025 win 190
16:24:04.376223 IP (tos 0x0, ttl 64, id 20645, offset 0, flags [DF], proto 6, length: 190) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 792:942(150) ack 15025 win 190
16:24:04.538328 IP (tos 0x0, ttl 125, id 23320, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 15025:15025(0) ack 942 win 65151
16:24:09.266915 IP (tos 0x0, ttl 125, id 31959, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 15025:15215(190) ack 942 win 65151
16:24:09.306308 IP (tos 0x0, ttl 64, id 20646, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 942:942(0) ack 15215 win 190
16:24:09.306608 IP (tos 0x0, ttl 125, id 31990, offset 0, flags [DF], proto 6, length: 88) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P 15215:15263(48) ack 942 win 65151
16:24:09.306614 IP (tos 0x0, ttl 64, id 20647, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 942:942(0) ack 15263 win 190
16:24:09.306640 IP (tos 0x0, ttl 64, id 20648, offset 0, flags [DF], proto 6, length: 198) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 942:1100(158) ack 15263 win 190
16:24:09.467203 IP (tos 0x0, ttl 125, id 32472, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 15263:15263(0) ack 1100 win 64993
16:24:14.296632 IP (tos 0x0, ttl 125, id 5658, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 15263:15453(190) ack 1100 win 64993
16:24:14.336194 IP (tos 0x0, ttl 64, id 20649, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1100:1100(0) ack 15453 win 190
16:24:14.336334 IP (tos 0x0, ttl 125, id 5674, offset 0, flags [DF], proto 6, length: 96) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P 15453:15509(56) ack 1100 win 64993
16:24:14.336343 IP (tos 0x0, ttl 64, id 20650, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1100:1100(0) ack 15509 win 190
16:24:14.336371 IP (tos 0x0, ttl 64, id 20651, offset 0, flags [DF], proto 6, length: 152) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 1100:1212(112) ack 15509 win 190
16:24:14.498506 IP (tos 0x0, ttl 125, id 5736, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 15509:15509(0) ack 1212 win 64881
16:24:19.327717 IP (tos 0x0, ttl 125, id 15788, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 15509:15699(190) ack 1212 win 64881
16:24:19.367433 IP (tos 0x0, ttl 64, id 20652, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1212:1212(0) ack 15699 win 190
16:24:19.367575 IP (tos 0x0, ttl 125, id 15806, offset 0, flags [DF], proto 6, length: 50) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 15699:15709(10) ack 1212 win 64881
16:24:19.367582 IP (tos 0x0, ttl 64, id 20653, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1212:1212(0) ack 15709 win 190
16:24:19.367609 IP (tos 0x0, ttl 64, id 20654, offset 0, flags [DF], proto 6, length: 154) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 1212:1326(114) ack 15709 win 190
16:24:19.528913 IP (tos 0x0, ttl 125, id 15988, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 15709:15709(0) ack 1326 win 64767
16:24:24.256535 IP (tos 0x0, ttl 125, id 23005, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 15709:15899(190) ack 1326 win 64767
16:24:24.296246 IP (tos 0x0, ttl 64, id 20655, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1326:1326(0) ack 15899 win 190
16:24:24.296397 IP (tos 0x0, ttl 125, id 23025, offset 0, flags [DF], proto 6, length: 52) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 15899:15911(12) ack 1326 win 64767
16:24:24.296406 IP (tos 0x0, ttl 64, id 20656, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1326:1326(0) ack 15911 win 190
16:24:24.296477 IP (tos 0x0, ttl 64, id 20657, offset 0, flags [DF], proto 6, length: 144) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 1326:1430(104) ack 15911 win 190
16:24:24.457828 IP (tos 0x0, ttl 125, id 23198, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 15911:15911(0) ack 1430 win 64663
16:24:29.286968 IP (tos 0x0, ttl 125, id 4906, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 15911:16101(190) ack 1430 win 64663
16:24:29.326433 IP (tos 0x0, ttl 64, id 20658, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1430:1430(0) ack 16101 win 190
16:24:29.326569 IP (tos 0x0, ttl 125, id 4947, offset 0, flags [DF], proto 6, length: 42) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 16101:16103(2) ack 1430 win 64663
16:24:29.326576 IP (tos 0x0, ttl 64, id 20659, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1430:1430(0) ack 16103 win 190
16:24:29.326605 IP (tos 0x0, ttl 64, id 20660, offset 0, flags [DF], proto 6, length: 144) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 1430:1534(104) ack 16103 win 190
16:24:29.489040 IP (tos 0x0, ttl 125, id 5027, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 16103:16103(0) ack 1534 win 64559
16:24:34.318242 IP (tos 0x0, ttl 125, id 19964, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 16103:16293(190) ack 1534 win 64559
16:24:34.357998 IP (tos 0x0, ttl 64, id 20661, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1534:1534(0) ack 16293 win 190
16:24:34.358139 IP (tos 0x0, ttl 125, id 19965, offset 0, flags [DF], proto 6, length: 42) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P [tcp sum ok] 16293:16295(2) ack 1534 win 64559
16:24:34.358167 IP (tos 0x0, ttl 64, id 20662, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 1534:1534(0) ack 16295 win 190
16:24:34.358204 IP (tos 0x0, ttl 64, id 20663, offset 0, flags [DF], proto 6, length: 140) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 1534:1634(100) ack 16295 win 190
16:24:34.358464 IP (tos 0x0, ttl 125, id 19966, offset 0, flags [DF], proto 6, length: 228) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P 16295:16483(188) ack 1634 win 64459
16:24:34.358496 IP (tos 0x0, ttl 64, id 20664,
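For what it's worth, gaps like these can be picked out of a long text dump mechanically rather than by eye. A small sketch (my own illustration, not part of any CIFS tooling; it assumes tcpdump's default text output with a leading HH:MM:SS.usec timestamp on each line):

```python
import re

# Leading tcpdump timestamp: HH:MM:SS.usec
TS_RE = re.compile(r'^(\d{2}):(\d{2}):(\d{2})\.(\d{6})\b')

def seconds(line):
    """Parse tcpdump's leading HH:MM:SS.usec timestamp; None if absent."""
    m = TS_RE.match(line)
    if not m:
        return None
    h, mnt, s, us = map(int, m.groups())
    return h * 3600 + mnt * 60 + s + us / 1e6

def find_gaps(lines, threshold=1.0):
    """Yield (gap_in_seconds, line) for packets that arrive more than
    `threshold` seconds after the previous packet."""
    prev = None
    for line in lines:
        t = seconds(line)
        if t is None:
            continue
        if prev is not None and t - prev > threshold:
            yield (t - prev, line.rstrip())
        prev = t
```

Run it over the saved text dump, e.g. `list(find_gaps(open('dump.txt')))`; on the trace above it would flag the roughly 4.8s silence before each "hiccup" packet.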
On Wed, 9 Sep 2009 12:33:21 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Sat, 5 Sep 2009, Jeff Layton wrote:
>
> > It looks like it's just taking 5s for the server to respond here. Do
> > you happen to have a wire capture of one of these events? That may tell
> > us more than cifsFYI info...
>
> I did a tcpdump and nothing stands out. Server acks the "cmd 50" and then
> waits 5 seconds before sending the data.
>
> 16:23:34.336373 IP (tos 0x0, ttl 64, id 20616, offset 0, flags [DF], proto 6, length: 118) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 2801206064:2801206142(78) ack 468207120 win 190
> 16:23:34.336624 IP (tos 0x0, ttl 125, id 19869, offset 0, flags [DF], proto 6, length: 206) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: P 1:167(166) ack 78 win 64548
> 16:23:34.336636 IP (tos 0x0, ttl 64, id 20617, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 78:78(0) ack 167 win 190
> 16:23:34.336669 IP (tos 0x0, ttl 64, id 20618, offset 0, flags [DF], proto 6, length: 128) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: P 78:166(88) ack 167 win 190
> 16:23:34.456343 IP (tos 0x0, ttl 125, id 20045, offset 0, flags [DF], proto 6, length: 40) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . [tcp sum ok] 167:167(0) ack 166 win 64460
>
> hiccup
>
> 16:23:39.284930 IP (tos 0x0, ttl 125, id 27544, offset 0, flags [DF], proto 6, length: 230) dogmeat.jules.org.microsoft-ds > fawkes.jules.org.43355: . 167:357(190) ack 166 win 64460
> 16:23:39.324060 IP (tos 0x0, ttl 64, id 20619, offset 0, flags [DF], proto 6, length: 40) fawkes.jules.org.43355 > dogmeat.jules.org.microsoft-ds: . [tcp sum ok] 166:166(0) ack 357 win 190
A binary capture would probably be easier to infer something from --
we'd be able to open it up in wireshark and get a little more info
about what sort of call the client is doing.
My suspicion would be that the server needs to perform an oplock break
to another client before it can send the response. The only way I know
how to tell that is to sniff all SMB traffic on the server and watch
for oplock break calls to other clients when these stalls occur.
--
Jeff Layton <[email protected]>
On Wed, 9 Sep 2009, Jeff Layton wrote:
> My suspicion would be that the server needs to perform an oplock break
> to another client before it can send the response. The only way I know
> how to tell that is to sniff all SMB traffic on the server and watch
> for oplock break calls to other clients when these stalls occur.
That could be tested by switching them off, right? If I do
echo 0 >/proc/fs/cifs/OplockEnabled
and then remount the volume it should switch off oplocks?
This has no effect on the stalls.
On Wed, 9 Sep 2009 13:07:52 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Wed, 9 Sep 2009, Jeff Layton wrote:
>
> > My suspicion would be that the server needs to perform an oplock break
> > to another client before it can send the response. The only way I know
> > how to tell that is to sniff all SMB traffic on the server and watch
> > for oplock break calls to other clients when these stalls occur.
>
> That could be tested by switching them off right? If I do
>
> echo 0 >/proc/fs/cifs/OplockEnabled
>
> and then remount the volume it should switch off oplocks?
>
> This has no effect on the stalls.
>
That'll stop your client from requesting oplocks, but that won't
prevent others from doing so. If my suspicion is correct, then another
client is holding an oplock and the server needs to break it before it
can reply to yours.
Unfortunately I doubt there's much you can do from your client to
prevent that (if that is the case). There may be a way to turn off
oplocks on the server side, but that may very well be even worse for
performance.
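For reference, on SMB1-era Windows servers oplocks could reportedly be disabled server-wide via a registry value on the server. This is only a pointer for testing the theory; verify it against current Microsoft documentation before touching a production server, and note the performance caveat above:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters]
; 0 disables opportunistic locking on the server (SMB1 only);
; a reboot or server service restart is typically required.
"EnableOplocks"=dword:00000000
```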
--
Jeff Layton <[email protected]>
On Wed, 9 Sep 2009, Jeff Layton wrote:
> That'll stop your client from requesting oplocks, but that won't
> prevent others from doing so. If my suspicion is correct, then another
> client is holding an oplock and the server needs to break it before it
> can reply to yours.
>
> Unfortunately I doubt there's much you can do from your client to
> prevent that (if that is the case). There may be a way to turn off
> oplocks on the server side, but that may very well be even worse for
> performance.
Hmmm... We can look at that.
Another interesting tidbit is that I have never seen this from a 64 bit
Linux kernel. Only occurs with 32 bit kernels it seems.
On Wed, 9 Sep 2009, Jeff Layton wrote:
> Unfortunately I doubt there's much you can do from your client to
> prevent that (if that is the case). There may be a way to turn off
> oplocks on the server side, but that may very well be even worse for
> performance.
Also note that these hiccups occur when simply doing an
ls
we are not accessing or writing files.
On Wed, 9 Sep 2009 13:33:33 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Wed, 9 Sep 2009, Jeff Layton wrote:
>
> > Unfortunately I doubt there's much you can do from your client to
> > prevent that (if that is the case). There may be a way to turn off
> > oplocks on the server side, but that may very well be even worse for
> > performance.
>
> Also note that these hiccups occur when simply doing an
>
> ls
>
> we are not accessing or writing files.
>
Hmm...
The hiccups you posted in the original email happened during a
QPathInfo call (somewhat similar to a NFS GETATTR). I wouldn't think
that would cause an oplock break, but I suppose it might. The server
might decide that it needs to revoke the oplock in order to retrieve
accurate size, LastWriteTime (aka mtime), etc. It could also be a
Windows bug...
Here's an excerpt from an IRC conversation on this in #samba-technical that might give a little info:
13:42 < jlayton> would a QPathInfo call cause an oplock break?
13:42 < jlayton> (typically)?
13:47 < sdann> jlayton, no it shouldn't, as it's path based and could be done with a stat() call. Only an open() or brl()
operation should break an oplock.
13:48 < jlayton> ok, good to know -- thx
13:49 < jlayton> sdann: actually though, I'm asking about win2k3 server...
13:49 < jlayton> do you know whether it might break the oplock on a qpathinfo?
13:49 < jlayton> i.e. to get accurate size info, for instance
13:50 < sdann> well in general, only opens, writes (truncate included), and byte-range-lock ops break oplocks
13:50 < sdann> so any kind of meta-data request should not
13:51 < jlayton> hmm ok, one of the linux-kernel guys is seeing QPathInfo calls go out to win2k3 server and the server waits
5s before responding
13:51 < jlayton> my initial thought was oplock break to another client is causing the stall, but maybe it's something else
13:51 < coffeedude> sdann, SetFileInfo (allocationInfo and EndofFile) will as well.
13:51 < jlayton> I'm pretty sure this is QPathInfo call
13:52 < sdann> a quick torture test in source4/torture/raw/oplock.c would solve the issue :)
13:52 < coffeedude> jlayton, internally in Windows, the NTFS interface is handle based so I assume the server does a
NtCreateFile(), QueryInformationFile(), CloseFile().
13:52 < jlayton> ahhh maybe so
13:52 < coffeedude> jlayton, the internal opens should be done with FILE_READ_ATTRIBUTES so they don't cause a break but it
could be a Windows bug.
13:53 < jlayton> sounds plausible
13:53 < jlayton> coffeedude, sdann: thanks!
13:53 < coffeedude> jlayton, any open with nothing other than FILE_READ_ATTRIBUTES, FILE_WRITE_ATTRIBUTES or SYNCHRONIZE
should not cause an oplock break either.
13:53 < sdann> coffeedude, yeah that's certainly possible
13:53 < coffeedude> sdann, only know cause I've done it :)
I'd probably start with sniffing traffic at the server side and see if
you can correlate the stalls with traffic to other hosts (oplock breaks
in particular).
If so then maybe consider patching the server or testing with a
different flavor of windows.
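A server-side capture along these lines would show whether oplock breaks line up with the stalls. This is just a sketch; the interface name is an assumption, and on older tshark the display-filter flag is -R rather than -Y:

```shell
# capture full SMB frames on the server side (adjust the interface)
tcpdump -i eth0 -s 0 -w smb-server.pcap 'port 445 or port 139'

# afterwards, look for oplock breaks around the stall timestamps --
# the server sends them as LockingAndX (SMB command 0x24) requests
# to whichever client is holding the oplock
tshark -r smb-server.pcap -R 'smb.cmd == 0x24'
```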
--
Jeff Layton <[email protected]>
On Wed, 9 Sep 2009 13:28:24 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Wed, 9 Sep 2009, Jeff Layton wrote:
>
> > That'll stop your client from requesting oplocks, but that won't
> > prevent others from doing so. If my suspicion is correct, then another
> > client is holding an oplock and the server needs to break it before it
> > can reply to yours.
> >
> > Unfortunately I doubt there's much you can do from your client to
> > prevent that (if that is the case). There may be a way to turn off
> > oplocks on the server side, but that may very well be even worse for
> > performance.
>
> Hmmm... We can look at that.
>
> Another interesting tidbit is that I have never seen this from a 64 bit
> Linux kernel. Only occurs with 32 bit kernels it seems.
>
That sounds rather strange. Maybe we do have a bug of some sort? The
thing to do might be to get a binary capture of the 32-bit traffic
around the time of the stalls. We could then inspect the packets and
see whether we have something wrong in there.
--
Jeff Layton <[email protected]>
On Wed, 9 Sep 2009, Jeff Layton wrote:
> That sounds rather strange. Maybe we do have a bug of some sort? The
> thing to do might be to get a binary capture of the 32-bit traffic
> around the time of the stalls. We could then inspect the packets and
> see whether we have something wrong in there.
Capture attached.
On Wed, 9 Sep 2009 17:27:57 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Wed, 9 Sep 2009, Jeff Layton wrote:
>
> > That sounds rather strange. Maybe we do have a bug of some sort? The
> > thing to do might be to get a binary capture of the 32-bit traffic
> > around the time of the stalls. We could then inspect the packets and
> > see whether we have something wrong in there.
>
> Capture attached.
Well, I can see the delays in the capture, but the snaplen for the
capture is a little too small to tell much else. Can you redo the
capture with a larger snaplen (maybe -s 512 or so)?
Also, were you able to tell anything from a server-side capture? Is the
server issuing oplock breaks at those times?
Cheers,
Jeff
On Wed, 9 Sep 2009, Jeff Layton wrote:
> Well, I can see the delays in the capture, but the snaplen for the
> capture is a little too small to tell much else. Can you redo the
> capture with a larger snaplen (maybe -s 512 or so)?
-s 1000 version attached.
> Also, were you able to tell anything from a server-side capture? Is the
> server issuing oplock breaks at those times?
That's a pretty busy system. They have not gotten around to doing any
logging on that end.
On Thu, 10 Sep 2009 14:53:12 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Wed, 9 Sep 2009, Jeff Layton wrote:
>
> > Well, I can see the delays in the capture, but the snaplen for the
> > capture is a little too small to tell much else. Can you redo the
> > capture with a larger snaplen (maybe -s 512 or so)?
>
> -s 1000 version attached.
>
> > Also, were you able to tell anything from a server-side capture? Is the
> > server issuing oplock breaks at those times?
>
> That's a pretty busy system. They have not gotten around to doing any
> logging on that end.
Ok. I had a look at the capture. The stalls seem to be occurring on
FIND_FILE requests. Those are similar to READDIRPLUS requests in NFS:
they return a list of files that match a particular set of criteria,
along with their attributes.
Each time the client is making one of these calls to the server, it
requests a set of up to 150 files. The server grinds for 5s each time
and then responds.
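The stalls are easy to pick out mechanically once the request/response timestamps are exported from the capture. A rough sketch (the tuple format is an assumption, not actual tshark output; you'd export time, direction, and the SMB multiplex ID and feed them in):

```python
def find_stalls(packets, threshold=4.0):
    """packets: iterable of (timestamp, direction, multiplex_id) tuples,
    where direction is 'req' or 'rsp'. Returns (mid, delay) pairs for
    every response that arrived more than `threshold` seconds after
    its matching request -- the 5s FIND_FILE stalls stand out."""
    pending = {}   # multiplex_id -> request timestamp
    stalls = []
    for ts, direction, mid in packets:
        if direction == 'req':
            pending[mid] = ts
        elif direction == 'rsp' and mid in pending:
            delay = ts - pending.pop(mid)
            if delay >= threshold:
                stalls.append((mid, delay))
    return stalls

# a fast reply (mid 5) and a stalled one (mid 7, answered 5.5s later)
trace = [(0.0, 'req', 5), (0.1, 'rsp', 5),
         (10.0, 'req', 7), (15.5, 'rsp', 7)]
print(find_stalls(trace))  # [(7, 5.5)]
```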
The calls themselves seem to be sane AFAICT. I don't see any problems
with the parameters we're sending for the search. I also had a look
over the FIND_FIRST code and it doesn't seem to have any obvious
word-size-related problems.
I assume that the 32 and 64 bit clients you have are calling "ls" in
the same dir. If so, maybe a similar capture from a 64-bit client might
help us see the difference?
Thanks,
--
Jeff Layton <[email protected]>
On Thu, 10 Sep 2009, Jeff Layton wrote:
> I assume that the 32 and 64 bit clients you have are calling "ls" in
> the same dir. If so, maybe a similar capture from a 64-bit client might
> help us see the difference?
64 bit trace attached.
On Thu, 10 Sep 2009 15:42:28 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Thu, 10 Sep 2009, Jeff Layton wrote:
>
> > I assume that the 32 and 64 bit clients you have are calling "ls" in
> > the same dir. If so, maybe a similar capture from a 64-bit client might
> > help us see the difference?
>
> 64 bit trace attached.
A couple of differences. First, the "ls's" were done in different
directories since they had different search patterns:
32 == \*
64 == \clameter\*
...did they also mount different shares from the server?
The 64-bit capture was done in a directory with only 50 files,
whereas the other one had at least 600-700 files (capture ends before
it finished listing the files). That may make quite a bit of difference
on the server (not sure how windows works internally in this case).
The only other substantive difference I see is that the Level of
Interest that the client is requesting is different:
32 == SMB_FIND_FILE_DIRECTORY_INFO
64 == SMB_FIND_FILE_ID_FULL_DIR_INFO
That probably means that the 32 bit client has disabled
CIFS_MOUNT_SERVER_INUM for some reason. That means that it's not asking
the server for the windows equivalent of inode numbers. We typically
disable that flag automatically if a query for the inode number of a
path fails.
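If the flag has indeed been auto-disabled, the heuristic can be overridden at mount time. Roughly (share path borrowed from earlier in the thread; serverino/noserverino are the standard cifs mount options):

```shell
# force the client to ask the server for real inode numbers,
# which should make it use the ID_FULL_DIR_INFO level of interest
mount -t cifs //chiprodfs2/company /mnt -o user=clameter,serverino

# or disable them on the 64-bit client, to make it match the
# 32-bit client's behavior and see whether the stalls follow
mount -t cifs //chiprodfs2/company /mnt -o user=clameter,noserverino
```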
Since these are the same server, that may be an indicator that the
server is serving out info from two different filesystem types (maybe
FAT vs. NTFS, or maybe even a CDROM or something). If so, then that may
help explain some of the performance delta there. I'd be more
interested to see how the 64 bit client behaves when it mounts the
exact same share and does an ls in the same directory as the 32 bit
client.
Cheers,
--
Jeff Layton <[email protected]>
On Thu, 10 Sep 2009, Jeff Layton wrote:
> A couple of differences. First, the "ls's" were done in different
> directories since they had different search patterns:
Right. 32 bit cannot mount the clameter directory for strange reasons. I
have to go one level higher.
> The 64-bit capture was done in a directory with only 50 files,
> whereas the other one had at least 600-700 files (capture ends before
> it finished listing the files). That may make quite a bit of difference
> on the server (not sure how windows works internally in this case).
Right. I just remounted the 64 bit on the same directory. No delays.
> The only other substantive difference I see is that the Level of
> Interest that the client is requesting is different:
>
> 32 == SMB_FIND_FILE_DIRECTORY_INFO
> 64 == SMB_FIND_FILE_ID_FULL_DIR_INFO
>
> That probably means that the 32 bit client has disabled
> CIFS_MOUNT_SERVER_INUM for some reason. That means that it's not asking
> the server for the windows equivalent of inode numbers. We typically
> disable that flag automatically if a query for the inode number of a
> path fails.
I added the serverino option on the 32 bit system. No effect.
> Since these are the same server, that may be an indicator that the
> server is serving out info from two different filesystem types (maybe
> FAT vs. NTFS, or maybe even a CDROM or something). If so, then that may
> help explain some of the performance delta there. I'd be more
> interested to see how the 64 bit client behaves when it mounts the
> exact same share and does an ls in the same directory as the 32 bit
> client.
No, it's all on the same filesystem.
New capture attached for same directory.
One other issue that may be important: The mounting operation is very slow
on 32 bit. Could it be that the handshake does not work out?
On Thu, 10 Sep 2009 17:27:53 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> Right. 32 bit cannot mount the clameter directory for strange reasons. I
> have to go one level higher.
[...]
> One other issue that may be important: The mounting operation is very slow
> on 32 bit. Could it be that the handshake does not work out?
Ok, looks like the 64 bit client is using a different level of interest
than the 32 bit on the FIND_FIRST call. I suspect that that difference
may account for the difference in response time.
It's not completely clear to me why that would be. Maybe a windows bug
that causes a slowdown with that LOI?
In any case, I think we need to look closely at what's happening at
mount time. First, I'll need some other info:
1) output of "/sbin/mount.cifs -V" from both machines
2) mount options that you're using on both boxes
3) wire captures from mount attempts on both machines. Try to mount the
"clameter" dir on both boxes and do captures of each attempt. Maybe
this time use -s 0 with tcpdump so we get all of the traffic.
There may be crackable password hashes in the captures, so you may want
to send them to me privately and not cc the list.
Thanks,
--
Jeff Layton <[email protected]>
On Thu, 10 Sep 2009, Jeff Layton wrote:
> In any case, I think we need to look closely at what's happening at
> mount time. First, I'll need some other info:
>
> 1) output of "/sbin/mount.cifs -V" from both machines
The 32 bit machine
# /sbin/mount.cifs -V
mount.cifs version: 1.5
mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=xxx
64 bit machine
$ /sbin/mount.cifs -V
mount.cifs version: 1.12-3.4.0
mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=w2k
> 3) wire captures from mount attempts on both machines. Try to mount the
> "clameter" dir on both boxes and do captures of each attempt. Maybe
> this time use -s 0 with tcpdump so we get all of the traffic.
I cannot mount the clameter dir on the 32 bit box. Hangs. So I will mount
/company.
> There may be crackable password hashes in the captures, so you may want
> to send them to me privately and not cc the list.
Ok will follow.
64 bit one
On Mon, 14 Sep 2009 16:10:47 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Thu, 10 Sep 2009, Jeff Layton wrote:
>
> > In any case, I think we need to look closely at what's happening at
> > mount time. First, I'll need some other info:
> >
> > 1) output of "/sbin/mount.cifs -V" from both machines
>
> The 32 bit machine
>
> #/sbin/mount.cifs -V
> mount.cifs version: 1.5
> mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=xxx
>
Yow, that version of mount.cifs is really old. I wonder if it may be
passing bad mount options to the kernel? Might be interesting to strace
that. Something like:
# strace -f -s 256 -e mount mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=xxx
...it'll probably have a cleartext password in it so you might want to
doctor the options a bit before sending along if you do.
Alternately, you might just want to try a newer version of mount.cifs
and see whether that fixes this.
> 64 bit machine
>
> $ /sbin/mount.cifs -V
> mount.cifs version: 1.12-3.4.0
>
> mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=w2k
>
> > 3) wire captures from mount attempts on both machines. Try to mount the
> > "clameter" dir on both boxes and do captures of each attempt. Maybe
> > this time use -s 0 with tcpdump so we get all of the traffic.
>
> I cannot mount the clameter dir on the 32 bit box. Hangs. So I will mount
> /company.
>
Actually, the trace of a hanging mount would probably be interesting.
Does the 32-bit capture that you sent represent a mount attempt that
hung? Or was it successful?
> > There may be crackable password hashes in the captures, so you may want
> > to send them to me privately and not cc the list.
>
> Ok will follow.
>
Thanks for the info, I had a look at the captures. They both look
fairly similar. The main difference is that the 32-bit box doesn't seem
to have sent any more calls after sending a QPathInfo call to the
server for the root inode of the mount.
What's the "devname" that you're giving to the mount command for the
"clameter" dir? If there's more than 1 path component after the
hostname, then the problem may be in the old version of mount.cifs.
Some of them had broken handling for path prefixes.
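The split that old mount.cifs versions got wrong looks roughly like this. An illustrative sketch, not the actual mount.cifs code: the UNC source has to be divided into the share to connect to and a "prefixpath" that the kernel walks after connecting:

```python
def split_unc(source):
    """Split a //host/share/dir1/dir2 source into the share to mount
    and the prefix path below it. Broken prefix handling here is what
    would make a two-component path like company/clameter fail."""
    parts = source.strip('/').split('/')
    if len(parts) < 2:
        raise ValueError("need at least //host/share")
    host, share = parts[0], parts[1]
    prefixpath = '/'.join(parts[2:])   # may be empty
    return f'//{host}/{share}', prefixpath

print(split_unc('//machinename/company/clameter'))
# ('//machinename/company', 'clameter')
```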
--
Jeff Layton <[email protected]>
On Tue, 15 Sep 2009, Jeff Layton wrote:
> Yow, that version of mount.cifs is really old. I wonder if it may be
> passing bad mount options to the kernel? Might be interesting to strace
> that. Something like:
>
> # strace -f -s 256 -e mount mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=xxx
>
> ...it'll probably have a cleartext password in it so you might want to
> doctor the options a bit before sending along if you do.
>
> Alternately, you might just want to try a newer version of mount.cifs
> and see whether that fixes this.
Tried a newer version of mount.cifs without any change.
> > I cannot mount the clameter dir on the 32 bit box. Hangs. So I will mount
> > /company.
> >
>
> Actually, the trace of a hanging mount would probably be interesting.
>
> Does the 32-bit capture that you sent represent a mount attempt that
> hung? Or was it successful?
No, it was successful.
> What's the "devname" that you're giving to the mount command for the
> "clameter" dir? If there's more than 1 path component after the
> hostname, then the problem may be in the old version of mount.cifs.
> Some of them had broken handling for path prefixes.
It's //machinename/company/clameter
So, two components after the hostname.
On Wed, 16 Sep 2009 12:26:04 -0400 (EDT)
Christoph Lameter <[email protected]> wrote:
> On Tue, 15 Sep 2009, Jeff Layton wrote:
>
> > Yow, that version of mount.cifs is really old. I wonder if it may be
> > passing bad mount options to the kernel? Might be interesting to strace
> > that. Something like:
> >
> > # strace -f -s 256 -e mount mount -t cifs //chiprodfs2/company /mnt -ouser=clameter,domain=xxx
> >
> > ...it'll probably have a cleartext password in it so you might want to
> > doctor the options a bit before sending along if you do.
> >
> > Alternately, you might just want to try a newer version of mount.cifs
> > and see whether that fixes this.
>
> Tried a newer version of mount.cifs without any change.
>
Ok, good to rule that out then.
> > > I cannot mount the clameter dir on the 32 bit box. Hangs. So I will mount
> > > /company.
> > >
> >
> > Actually, the trace of a hanging mount would probably be interesting.
> >
> > Does the 32-bit capture that you sent represent a mount attempt that
> > hung? Or was it successful?
>
> No it was successful.
>
Hmm, ok. That isn't going to tell me as much as a mount that fails. For
now, I suggest that we focus on determining why these mounts hang/fail.
After that we can see whether the solution there has any bearing on why
the server is so slow to respond to this particular client.
> > What's the "devname" that you're giving to the mount command for the
> > "clameter" dir? If there's more than 1 path component after the
> > hostname, then the problem may be in the old version of mount.cifs.
> > Some of them had broken handling for path prefixes.
>
> its //machinename/company/clameter
>
> So two components.
>
Also good to know.
What we should probably do at this point is track down why the 32-bit
client has such a hard time mounting the clameter dir. Here's what
would be most helpful:
1) some debug log info of the mount attempt:
# modprobe cifs
# echo 7 > /proc/fs/cifs/cifsFYI
...then attempt the mount. After it hangs for a few seconds, ^c the
mount to kill it. Collect the output from dmesg and send it to me. That
should give me some idea of what the client is doing during this phase.
If you can simultaneously capture wire traffic during the same mount
attempt that would also be helpful.
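Put together, something like the following (interface name, share path, and domain are placeholders from earlier in the thread; adjust for your setup):

```shell
# terminal 1: capture full frames for the duration of the mount attempt
tcpdump -i eth0 -s 0 -w mount-hang.pcap host chiprodfs2

# terminal 2: enable cifs debug logging, then try the hanging mount
modprobe cifs
echo 7 > /proc/fs/cifs/cifsFYI
mount -t cifs //machinename/company/clameter /mnt -o user=clameter,domain=xxx
# ...after it hangs for a few seconds, ^C the mount, then save the log:
dmesg > cifs-mount-debug.txt
```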
Cheers,
--
Jeff Layton <[email protected]>