Doing a simple open of a file from 2 readers and there is a strange 5s
delay on the second open. We did NOT see the issue on kernel:
CentOS 4.18.0-193.6.3.el8_2.x86_64
but the issue is present in:
CentOS 4.18.0-305.7.1.el8_4.x86_64
#include <stdio.h>
int main() {
FILE * file_pointer;
char buffer[50], c;
file_pointer = fopen("/mnt/nfsv4/test/t.txt", "r");
fgets(buffer, 50, file_pointer);
file_pointer = fopen("/mnt/nfsv4/test/t.txt", "r");
fgets(buffer, 50, file_pointer);
fclose(file_pointer);
return 0;
}
Call Graph from ftrace
_nfs4_opendata_to_nfs4_state()
nfs_refresh_inode() {
nfs_refresh_inode.part.28() {
nfs_refresh_inode_locked() {
update_open_stateid()
** 5s wait **
Then the open / read completes
We do not see this issue on nfsv3 only nfsv4.
Here is the wireshark of the request and response:
The second request and response happens immediately in the wireshark.
The delay is in the kernel.
Frame 25: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
Ethernet II, Src: fa:16:3e:88:e2:24 (fa:16:3e:88:e2:24), Dst:
PureStor_5b:45:10 (24:a9:37:5b:45:10)
Internet Protocol Version 4, Src: 10.15.132.250, Dst: 10.15.128.15
Transmission Control Protocol, Src Port: 710, Dst Port: 2049, Seq:
2125, Ack: 1193, Len: 312
Remote Procedure Call, Type:Call XID:0xed175b8d
Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
[Program Version: 4]
[V4 Procedure: COMPOUND (1)]
Tag: <EMPTY>
minorversion: 1
Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
Opcode: SEQUENCE (53)
sessionid: 00c800000000000c0000000000000000
seqid: 0x00000073
slot id: 0
high slot id: 0
cache this?: Yes
Opcode: PUTFH (22)
FileHandle
Opcode: OPEN (18)
seqid: 0x00000000
share_access: OPEN4_SHARE_ACCESS_READ (1)
share_deny: OPEN4_SHARE_DENY_NONE (0)
clientid: 0x00c8000000000017
owner: <DATA>
Open Type: OPEN4_NOCREATE (0)
Claim Type: CLAIM_FH (4)
Opcode: ACCESS (3), [Check: RD MD XT XE]
Check access: 0x2d
.... ...1 = 0x001 READ: allowed?
.... .1.. = 0x004 MODIFY: allowed?
.... 1... = 0x008 EXTEND: allowed?
..1. .... = 0x020 EXECUTE: allowed?
Opcode: GETATTR (9)
Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
reqd_attr: Type (1)
reqd_attr: Change (3)
reqd_attr: Size (4)
reqd_attr: FSID (8)
reco_attr: FileId (20)
Attr mask[1]: 0x0030a23a (Mode, NumLinks, Owner,
Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata,
Time_Modify)
reco_attr: Mode (33)
reco_attr: NumLinks (35)
reco_attr: Owner (36)
reco_attr: Owner_Group (37)
reco_attr: RawDev (41)
reco_attr: Space_Used (45)
reco_attr: Time_Access (47)
reco_attr: Time_Metadata (52)
reco_attr: Time_Modify (53)
[Main Opcode: OPEN (18)]
Frame 26: 374 bytes on wire (2992 bits), 374 bytes captured (2992 bits)
Ethernet II, Src: PureStor_5b:45:10 (24:a9:37:5b:45:10), Dst:
fa:16:3e:88:e2:24 (fa:16:3e:88:e2:24)
Internet Protocol Version 4, Src: 10.15.128.15, Dst: 10.15.132.250
Transmission Control Protocol, Src Port: 2049, Dst Port: 710, Seq:
1193, Ack: 2437, Len: 308
Remote Procedure Call, Type:Reply XID:0xed175b8d
Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
[Program Version: 4]
[V4 Procedure: COMPOUND (1)]
Status: NFS4_OK (0)
Tag: <EMPTY>
Operations (count: 5)
Opcode: SEQUENCE (53)
Status: NFS4_OK (0)
sessionid: 00c800000000000c0000000000000000
seqid: 0x00000073
slot id: 0
high slot id: 127
target high slot id: 127
status flags: 0x00000000
Opcode: PUTFH (22)
Status: NFS4_OK (0)
Opcode: OPEN (18)
Status: NFS4_OK (0)
StateID
[StateID Hash: 0xaa04]
StateID seqid: 1
StateID Other: 000000000000000000000000
[StateID Other hash: 0x60de]
change_info
Atomic: Yes
changeid (before): 1626267251187000000
changeid (after): 1626267251187000000
result flags: 0x00000004, locktype posix
Delegation Type: OPEN_DELEGATE_NONE (0)
Opcode: ACCESS (3), [NOT Supported: LU DL], [Access Denied: LU
MD XT DL XE], [Allowed: RD]
Status: NFS4_OK (0)
Supported types (of requested): 0x3f
Access rights (of requested): 0x01
.... ...1 = 0x001 READ: allowed
.... ..0. = 0x002 LOOKUP: *Access Denied*
.... .0.. = 0x004 MODIFY: *Access Denied*
.... 0... = 0x008 EXTEND: *Access Denied*
...0 .... = 0x010 DELETE: *Access Denied*
..0. .... = 0x020 EXECUTE: *Access Denied*
Opcode: GETATTR (9)
Status: NFS4_OK (0)
Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
reqd_attr: Type (1)
ftype4: NF4REG (1)
reqd_attr: Change (3)
changeid: 1626267251187000000
reqd_attr: Size (4)
size: 7
reqd_attr: FSID (8)
fattr4_fsid
reco_attr: FileId (20)
fileid: 20829148276664444
Attr mask[1]: 0x0030a23a (Mode, NumLinks, Owner,
Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata,
Time_Modify)
reco_attr: Mode (33)
mode: 0644, Name: Unknown, Read permission for
owner, Write permission for owner, Read permission for group, Read
permission for others
reco_attr: NumLinks (35)
numlinks: 1
reco_attr: Owner (36)
fattr4_owner: 0
reco_attr: Owner_Group (37)
fattr4_owner_group: 0
reco_attr: RawDev (41)
specdata1: 0
specdata2: 0
reco_attr: Space_Used (45)
space_used: 7
reco_attr: Time_Access (47)
seconds: 1626267228
nseconds: 0
reco_attr: Time_Metadata (52)
seconds: 1626267251
nseconds: 187000000
reco_attr: Time_Modify (53)
seconds: 1626267251
nseconds: 186841000
[Main Opcode: OPEN (18)]
--
Matt Robertson // Data Architect | Pure Storage, Inc.
913-915-1162 | [email protected]
> On Jul 29, 2021, at 23:15, Matthew Robertson <[email protected]> wrote:
>
> Doing a simple open of a file from 2 readers and there is a strange 5s
> delay on the second open. We did NOT see the issue on kernel:
> CentOS 4.18.0-193.6.3.el8_2.x86_64
> but the issue is present in:
> CentOS 4.18.0-305.7.1.el8_4.x86_64
Your OPEN implementation is fundamentally broken. You’re handing out an illegal stateid. The client is not obligated to take into account an implementation that does not bump the seqid as required by the NFSv4 spec.
>
>
> #include <stdio.h>
>
> int main() {
>
> FILE * file_pointer;
>
> char buffer[50], c;
>
> file_pointer = fopen("/mnt/nfsv4/test/t.txt", "r");
>
> fgets(buffer, 50, file_pointer);
>
> file_pointer = fopen("/mnt/nfsv4/test/t.txt", "r");
>
> fgets(buffer, 50, file_pointer);
>
> fclose(file_pointer);
>
> return 0;
>
> }
>
>
>
> Call Graph from ftrace
>
> _nfs4_opendata_to_nfs4_state()
>
> nfs_refresh_inode() {
>
> nfs_refresh_inode.part.28() {
>
> nfs_refresh_inode_locked() {
>
> update_open_stateid()
>
> ** 5s wait **
>
> Then the open / read completes
>
>
> We do not see this issue on nfsv3 only nfsv4.
>
>
> Here is the wireshark of the request and response:
>
> The second request and response happens immediately in the wireshark.
> The delay is in the kernel.
>
>
>
> Frame 25: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
> Ethernet II, Src: fa:16:3e:88:e2:24 (fa:16:3e:88:e2:24), Dst:
> PureStor_5b:45:10 (24:a9:37:5b:45:10)
> Internet Protocol Version 4, Src: 10.15.132.250, Dst: 10.15.128.15
> Transmission Control Protocol, Src Port: 710, Dst Port: 2049, Seq:
> 2125, Ack: 1193, Len: 312
> Remote Procedure Call, Type:Call XID:0xed175b8d
> Network File System, Ops(5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
> [Program Version: 4]
> [V4 Procedure: COMPOUND (1)]
> Tag: <EMPTY>
> minorversion: 1
> Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
> Opcode: SEQUENCE (53)
> sessionid: 00c800000000000c0000000000000000
> seqid: 0x00000073
> slot id: 0
> high slot id: 0
> cache this?: Yes
> Opcode: PUTFH (22)
> FileHandle
> Opcode: OPEN (18)
> seqid: 0x00000000
> share_access: OPEN4_SHARE_ACCESS_READ (1)
> share_deny: OPEN4_SHARE_DENY_NONE (0)
> clientid: 0x00c8000000000017
> owner: <DATA>
> Open Type: OPEN4_NOCREATE (0)
> Claim Type: CLAIM_FH (4)
> Opcode: ACCESS (3), [Check: RD MD XT XE]
> Check access: 0x2d
> .... ...1 = 0x001 READ: allowed?
> .... .1.. = 0x004 MODIFY: allowed?
> .... 1... = 0x008 EXTEND: allowed?
> ..1. .... = 0x020 EXECUTE: allowed?
> Opcode: GETATTR (9)
> Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
> reqd_attr: Type (1)
> reqd_attr: Change (3)
> reqd_attr: Size (4)
> reqd_attr: FSID (8)
> reco_attr: FileId (20)
> Attr mask[1]: 0x0030a23a (Mode, NumLinks, Owner,
> Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata,
> Time_Modify)
> reco_attr: Mode (33)
> reco_attr: NumLinks (35)
> reco_attr: Owner (36)
> reco_attr: Owner_Group (37)
> reco_attr: RawDev (41)
> reco_attr: Space_Used (45)
> reco_attr: Time_Access (47)
> reco_attr: Time_Metadata (52)
> reco_attr: Time_Modify (53)
> [Main Opcode: OPEN (18)]
>
>
>
>
>
> Frame 26: 374 bytes on wire (2992 bits), 374 bytes captured (2992 bits)
> Ethernet II, Src: PureStor_5b:45:10 (24:a9:37:5b:45:10), Dst:
> fa:16:3e:88:e2:24 (fa:16:3e:88:e2:24)
> Internet Protocol Version 4, Src: 10.15.128.15, Dst: 10.15.132.250
> Transmission Control Protocol, Src Port: 2049, Dst Port: 710, Seq:
> 1193, Ack: 2437, Len: 308
> Remote Procedure Call, Type:Reply XID:0xed175b8d
> Network File System, Ops(5): SEQUENCE PUTFH OPEN ACCESS GETATTR
> [Program Version: 4]
> [V4 Procedure: COMPOUND (1)]
> Status: NFS4_OK (0)
> Tag: <EMPTY>
> Operations (count: 5)
> Opcode: SEQUENCE (53)
> Status: NFS4_OK (0)
> sessionid: 00c800000000000c0000000000000000
> seqid: 0x00000073
> slot id: 0
> high slot id: 127
> target high slot id: 127
> status flags: 0x00000000
> Opcode: PUTFH (22)
> Status: NFS4_OK (0)
> Opcode: OPEN (18)
> Status: NFS4_OK (0)
> StateID
> [StateID Hash: 0xaa04]
> StateID seqid: 1
> StateID Other: 000000000000000000000000
> [StateID Other hash: 0x60de]
> change_info
> Atomic: Yes
> changeid (before): 1626267251187000000
> changeid (after): 1626267251187000000
> result flags: 0x00000004, locktype posix
> Delegation Type: OPEN_DELEGATE_NONE (0)
> Opcode: ACCESS (3), [NOT Supported: LU DL], [Access Denied: LU
> MD XT DL XE], [Allowed: RD]
> Status: NFS4_OK (0)
> Supported types (of requested): 0x3f
> Access rights (of requested): 0x01
> .... ...1 = 0x001 READ: allowed
> .... ..0. = 0x002 LOOKUP: *Access Denied*
> .... .0.. = 0x004 MODIFY: *Access Denied*
> .... 0... = 0x008 EXTEND: *Access Denied*
> ...0 .... = 0x010 DELETE: *Access Denied*
> ..0. .... = 0x020 EXECUTE: *Access Denied*
> Opcode: GETATTR (9)
> Status: NFS4_OK (0)
> Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
> reqd_attr: Type (1)
> ftype4: NF4REG (1)
> reqd_attr: Change (3)
> changeid: 1626267251187000000
> reqd_attr: Size (4)
> size: 7
> reqd_attr: FSID (8)
> fattr4_fsid
> reco_attr: FileId (20)
> fileid: 20829148276664444
> Attr mask[1]: 0x0030a23a (Mode, NumLinks, Owner,
> Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata,
> Time_Modify)
> reco_attr: Mode (33)
> mode: 0644, Name: Unknown, Read permission for
> owner, Write permission for owner, Read permission for group, Read
> permission for others
> reco_attr: NumLinks (35)
> numlinks: 1
> reco_attr: Owner (36)
> fattr4_owner: 0
> reco_attr: Owner_Group (37)
> fattr4_owner_group: 0
> reco_attr: RawDev (41)
> specdata1: 0
> specdata2: 0
> reco_attr: Space_Used (45)
> space_used: 7
> reco_attr: Time_Access (47)
> seconds: 1626267228
> nseconds: 0
> reco_attr: Time_Metadata (52)
> seconds: 1626267251
> nseconds: 187000000
> reco_attr: Time_Modify (53)
> seconds: 1626267251
> nseconds: 186841000
> [Main Opcode: OPEN (18)]
>
>
> --
> Matt Robertson // Data Architect | Pure Storage, Inc.
> 913-915-1162 | [email protected]
_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]