Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1335185pxb; Fri, 26 Feb 2021 08:16:01 -0800 (PST) X-Google-Smtp-Source: ABdhPJwAGhXNOLjeL/1+5C99SFut0WrH66gsFuv+UHEuWxc6YJpHNrNCK6307sMVSeyCykcF/U+p X-Received: by 2002:a17:906:1457:: with SMTP id q23mr4221288ejc.43.1614356161043; Fri, 26 Feb 2021 08:16:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614356161; cv=none; d=google.com; s=arc-20160816; b=UlPxDJK2+yXyacFtZvPW/i/N6NtFSdycVHr7bkClMmGyhMhQ+tSDpzeFIoXLiLncMW TxdF1oQwuCSuaZRwM1VMlzi1gCXRoChNTUs9cJ3mAQu3OeJ6Lj6EPCGJTtOSg7MwI1Xx P5YhkAfdCCW7fOPSR0NxG3OwjNTRieVnOvsYLfyLUYDlvlhUUTaj16afXHx6VTjyw1Ia sq4vGCa5XNwIW1QwfqNvyaRvql9sGtCYFuRzu5UgK5Iz40UIwHqQMhaO1dOV/qGOS8Of 3tDtViaaOapXiISkMKUfl2rDlp53hMrZR04itcypnr7aH6pI2eSat5fWEUhyC7HJ0fS+ Hvag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=LaqaDuDsNHwRyr6UcDCdxFKxwZL56M8KDynRPp+2LAM=; b=n49Cuobe4VUlUBeR4u7UAQid7dqFE0NMKncTioaY/nt9F1EcfPJhp3vEcdhVCr23Np 4vIM0sj/J2Ge65KyRZgU51KIGpRTsWfp6RxFpB0zS4tvu1IYjj2VcNhJSmbOmC6jQ61w xSxjYzX8ys3kjg6vKx5Sl0K7t5ifF1ulYZ3vYqWYpkiXDtZDXnBpiHGMKPgoyvINhoQi bP+FsNeNdr6MoW0neTBgYN95+GGQPECk9lG3G1olC3vocvYOcD3iVrkiz6fmubSNqmeG +o0WG8C5hefzK94pLtoiWEUY6WWzew4523AsJzyf9ZZKd7ay+H5FJq9skDxmK/PACwS1 2YkA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=pVbRmK1P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id hp34si607632ejc.328.2021.02.26.08.15.36; Fri, 26 Feb 2021 08:16:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=pVbRmK1P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230019AbhBZQOt (ORCPT + 99 others); Fri, 26 Feb 2021 11:14:49 -0500 Received: from mail.kernel.org ([198.145.29.99]:42360 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229566AbhBZQOp (ORCPT ); Fri, 26 Feb 2021 11:14:45 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id D82F264EBA; Fri, 26 Feb 2021 16:13:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1614356040; bh=XIYrnMST0eMIO+QkazWoUPYEIVztH0rO3bxVtBoKcGM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=pVbRmK1P0SrkCAhMPI5Wc6XGKamcCKljWz4QVOv+w39Np5W4Mxsg/nzsvg8GCZC2E XrK+NZrNvmbu+8KTlc/FSddgXyJKyqUUW/5t4OTD0int0BCUkm6Z4lgecLVzjbi+0x 6cs3d4o73afRixsOz40/WIWsKIA2K4W0OXH0H9c0wQloPI2HK/dU0JvA87Dcjt7GZp jzM119ZW21V1FPLL14hIYeCGmRVuFH2BJWsqCJwperT0JdL/8GoqpPLYa/2UUzANF9 N5aNyOVHdGKy7K8jW1mel0HIPuiGp1LfL1tOFkIq7tMn28hzBoUiSDv0MrcVKTM8lV bnq30z8f48CSA== Date: Sat, 27 Feb 2021 01:13:55 +0900 From: Keith Busch To: Hannes Reinecke Cc: Daniel Wagner , Sagi Grimberg , Jens Axboe , Christoph Hellwig , linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] nvme-tcp: Check if request has started before processing it Message-ID: <20210226161355.GG31593@redsun51.ssa.fujisawa.hgst.com> References: <20210212181738.79274-1-dwagner@suse.de> <20210212210929.GA3851@redsun51.ssa.fujisawa.hgst.com> <73e4914e-f867-c899-954d-4b61ae2b4c33@suse.de> <20210215104020.yyithlo2hkxqvguj@beryllium.lan> <20210226123534.4oovbzk4wrnfjp64@beryllium.lan> <9e209b12-3771-cdca-2c9d-50451061bd2a@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <9e209b12-3771-cdca-2c9d-50451061bd2a@suse.de> User-Agent: Mutt/1.12.1 (2019-06-15) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 26, 2021 at 01:54:00PM +0100, Hannes Reinecke wrote: > On 2/26/21 1:35 PM, Daniel Wagner wrote: > > On Mon, Feb 15, 2021 at 01:29:45PM -0800, Sagi Grimberg wrote: > > > Well, I think we should probably figure out why that is happening first. > > > > I got my hands on a tcpdump trace. I've trimmed it to this: > > > > > > No. Time Source Destination Protocol Length Info > > 1 0.000000 10.228.194.30 10.228.38.214 NVMe 138 NVMe Write > > 2 0.000285 10.228.38.214 10.228.194.30 NVMe/TCP 90 Ready To Transfer > > 3 0.000591 10.228.194.30 10.228.38.214 NVMe 4186 NVMe Write: Data > > 4 0.000673 10.228.38.214 10.228.194.30 TCP 66 4420 → 58535 [ACK] Seq=25 Ack=4193 Win=241 Len=0 TSval=2655324576 TSecr=1497295579 > > 5 0.002140 10.228.38.214 10.228.194.30 NVMe 90 NVMe Write: Response > > 6 0.002511 10.228.194.30 10.228.38.175 NVMe 138 NVMe Write > > 7 0.002812 10.228.38.175 10.228.194.30 NVMe/TCP 90 Ready To Transfer > > 8 0.003006 10.228.194.30 10.228.38.175 NVMe 4186 NVMe Write: Data > > 9 0.003098 10.228.38.175 10.228.194.30 TCP 66 4420 → 51241 [ACK] Seq=25 Ack=4193 Win=241 Len=0 TSval=2183410196 TSecr=3601034207 > > 10 0.004420 10.228.38.175 10.228.194.30 NVMe 90 NVMe Write: Response > > 11 0.004890 10.228.38.214 10.228.194.30 NVMe/TCP 90 > > 12 0.004969 10.228.38.175 10.228.194.30 NVMe/TCP 90 > > > > > > The last few seconds contain only normal writes and suddenly the host > > receives two invalid packets. From what I see the host doesn't misbehave > > at all. I wonder if it would be possible to detect the invalid packet by > > locking at the PDU header only. If this would be possible we could > > discard it early and do not try to use the invalid command id. > > > > Here the last two packets with details: > > > > > > No. Time Source Destination Protocol Length Info > > 11 0.004890 10.228.38.214 10.228.194.30 NVMe/TCP 90 > > > > Frame 11: 90 bytes on wire (720 bits), 90 bytes captured (720 bits) > > Encapsulation type: Ethernet (1) > > Arrival Time: Feb 23, 2021 18:16:08.780574000 CET > > [Time shift for this packet: 0.000000000 seconds] > > Epoch Time: 1614100568.780574000 seconds > > [Time delta from previous captured frame: 0.000470000 seconds] > > [Time delta from previous displayed frame: 0.000470000 seconds] > > [Time since reference or first frame: 0.004890000 seconds] > > Frame Number: 11 > > Frame Length: 90 bytes (720 bits) > > Capture Length: 90 bytes (720 bits) > > [Frame is marked: False] > > [Frame is ignored: False] > > [Protocols in frame: eth:ethertype:ip:tcp:nvme-tcp] > > [Coloring Rule Name: TCP] > > [Coloring Rule String: tcp] > > Ethernet II, Src: IntelCor_41:16:c0 (b4:96:91:41:16:c0), Dst: Cisco_9f:f5:a8 (00:00:0c:9f:f5:a8) > > Destination: Cisco_9f:f5:a8 (00:00:0c:9f:f5:a8) > > Address: Cisco_9f:f5:a8 (00:00:0c:9f:f5:a8) > > .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default) > > .... ...0 .... .... .... .... = IG bit: Individual address (unicast) > > Source: IntelCor_41:16:c0 (b4:96:91:41:16:c0) > > Address: IntelCor_41:16:c0 (b4:96:91:41:16:c0) > > .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default) > > .... ...0 .... .... .... .... = IG bit: Individual address (unicast) > > Type: IPv4 (0x0800) > > Internet Protocol Version 4, Src: 10.228.38.214, Dst: 10.228.194.30 > > 0100 .... = Version: 4 > > .... 0101 = Header Length: 20 bytes (5) > > Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) > > 0000 00.. = Differentiated Services Codepoint: Default (0) > > .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) > > Total Length: 76 > > Identification: 0x0000 (0) > > Flags: 0x40, Don't fragment > > 0... .... = Reserved bit: Not set > > .1.. .... = Don't fragment: Set > > ..0. .... = More fragments: Not set > > Fragment Offset: 0 > > Time to Live: 64 > > Protocol: TCP (6) > > Header Checksum: 0x0000 [validation disabled] > > [Header checksum status: Unverified] > > Source Address: 10.228.38.214 > > Destination Address: 10.228.194.30 > > Transmission Control Protocol, Src Port: 4420, Dst Port: 46909, Seq: 1, Ack: 1, Len: 24 > > Source Port: 4420 > > Destination Port: 46909 > > [Stream index: 2] > > [TCP Segment Len: 24] > > Sequence Number: 1 (relative sequence number) > > Sequence Number (raw): 4175488220 > > [Next Sequence Number: 25 (relative sequence number)] > > Acknowledgment Number: 1 (relative ack number) > > Acknowledgment number (raw): 2966903626 > > 1000 .... = Header Length: 32 bytes (8) > > Flags: 0x018 (PSH, ACK) > > 000. .... .... = Reserved: Not set > > ...0 .... .... = Nonce: Not set > > .... 0... .... = Congestion Window Reduced (CWR): Not set > > .... .0.. .... = ECN-Echo: Not set > > .... ..0. .... = Urgent: Not set > > .... ...1 .... = Acknowledgment: Set > > .... .... 1... = Push: Set > > .... .... .0.. = Reset: Not set > > .... .... ..0. = Syn: Not set > > .... .... ...0 = Fin: Not set > > [TCP Flags: ·······AP···] > > Window: 257 > > [Calculated window size: 257] > > [Window size scaling factor: -1 (unknown)] > > Checksum: 0xfefa [unverified] > > [Checksum Status: Unverified] > > Urgent Pointer: 0 > > Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps > > TCP Option - No-Operation (NOP) > > Kind: No-Operation (1) > > TCP Option - No-Operation (NOP) > > Kind: No-Operation (1) > > TCP Option - Timestamps: TSval 4211986351, TSecr 1497291787 > > Kind: Time Stamp Option (8) > > Length: 10 > > Timestamp value: 4211986351 > > Timestamp echo reply: 1497291787 > > [SEQ/ACK analysis] > > [Bytes in flight: 24] > > [Bytes sent since last PSH flag: 24] > > [Timestamps] > > [Time since first frame in this TCP stream: 0.000000000 seconds] > > [Time since previous frame in this TCP stream: 0.000000000 seconds] > > TCP payload (24 bytes) > > [PDU Size: 24] > > NVM Express Fabrics TCP > > Pdu Type: CapsuleResponse (5) > > Pdu Specific Flags: 0x00 > > .... ...0 = PDU Header Digest: Not set > > .... ..0. = PDU Data Digest: Not set > > .... .0.. = PDU Data Last: Not set > > .... 0... = PDU Data Success: Not set > > Pdu Header Length: 24 > > Pdu Data Offset: 0 > > Packet Length: 24 > > Unknown Data: 02000400000000001c0000001f000000 > > > > 0000 00 00 0c 9f f5 a8 b4 96 91 41 16 c0 08 00 45 00 .........A....E. > > 0010 00 4c 00 00 40 00 40 06 00 00 0a e4 26 d6 0a e4 .L..@.@.....&... > > 0020 c2 1e 11 44 b7 3d f8 e0 e4 dc b0 d7 5b 4a 80 18 ...D.=......[J.. > > 0030 01 01 fe fa 00 00 01 01 08 0a fb 0d cf af 59 3e ..............Y> > > 0040 dc 0b 05 00 18 00 18 00 00 00 02 00 04 00 00 00 ................ > > 0050 00 00 1c 00 00 00 1f 00 00 00 .......... > > > > No. Time Source Destination Protocol Length Info > > 12 0.004969 10.228.38.175 10.228.194.30 NVMe/TCP 90 > > > > Frame 12: 90 bytes on wire (720 bits), 90 bytes captured (720 bits) > > Encapsulation type: Ethernet (1) > > Arrival Time: Feb 23, 2021 18:16:08.780653000 CET > > [Time shift for this packet: 0.000000000 seconds] > > Epoch Time: 1614100568.780653000 seconds > > [Time delta from previous captured frame: 0.000079000 seconds] > > [Time delta from previous displayed frame: 0.000079000 seconds] > > [Time since reference or first frame: 0.004969000 seconds] > > Frame Number: 12 > > Frame Length: 90 bytes (720 bits) > > Capture Length: 90 bytes (720 bits) > > [Frame is marked: False] > > [Frame is ignored: False] > > [Protocols in frame: eth:ethertype:ip:tcp:nvme-tcp] > > [Coloring Rule Name: TCP] > > [Coloring Rule String: tcp] > > Ethernet II, Src: IntelCor_41:16:c0 (b4:96:91:41:16:c0), Dst: Cisco_9f:f5:a8 (00:00:0c:9f:f5:a8) > > Destination: Cisco_9f:f5:a8 (00:00:0c:9f:f5:a8) > > Address: Cisco_9f:f5:a8 (00:00:0c:9f:f5:a8) > > .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default) > > .... ...0 .... .... .... .... = IG bit: Individual address (unicast) > > Source: IntelCor_41:16:c0 (b4:96:91:41:16:c0) > > Address: IntelCor_41:16:c0 (b4:96:91:41:16:c0) > > .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default) > > .... ...0 .... .... .... .... = IG bit: Individual address (unicast) > > Type: IPv4 (0x0800) > > Internet Protocol Version 4, Src: 10.228.38.175, Dst: 10.228.194.30 > > 0100 .... = Version: 4 > > .... 0101 = Header Length: 20 bytes (5) > > Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) > > 0000 00.. = Differentiated Services Codepoint: Default (0) > > .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) > > Total Length: 76 > > Identification: 0x0000 (0) > > Flags: 0x40, Don't fragment > > 0... .... = Reserved bit: Not set > > .1.. .... = Don't fragment: Set > > ..0. .... = More fragments: Not set > > Fragment Offset: 0 > > Time to Live: 64 > > Protocol: TCP (6) > > Header Checksum: 0x0000 [validation disabled] > > [Header checksum status: Unverified] > > Source Address: 10.228.38.175 > > Destination Address: 10.228.194.30 > > Transmission Control Protocol, Src Port: 4420, Dst Port: 34895, Seq: 1, Ack: 1, Len: 24 > > Source Port: 4420 > > Destination Port: 34895 > > [Stream index: 3] > > [TCP Segment Len: 24] > > Sequence Number: 1 (relative sequence number) > > Sequence Number (raw): 3092812012 > > [Next Sequence Number: 25 (relative sequence number)] > > Acknowledgment Number: 1 (relative ack number) > > Acknowledgment number (raw): 2384147181 > > 1000 .... = Header Length: 32 bytes (8) > > Flags: 0x018 (PSH, ACK) > > 000. .... .... = Reserved: Not set > > ...0 .... .... = Nonce: Not set > > .... 0... .... = Congestion Window Reduced (CWR): Not set > > .... .0.. .... = ECN-Echo: Not set > > .... ..0. .... = Urgent: Not set > > .... ...1 .... = Acknowledgment: Set > > .... .... 1... = Push: Set > > .... .... .0.. = Reset: Not set > > .... .... ..0. = Syn: Not set > > .... .... ...0 = Fin: Not set > > [TCP Flags: ·······AP···] > > Window: 257 > > [Calculated window size: 257] > > [Window size scaling factor: -1 (unknown)] > > Checksum: 0xfed3 [unverified] > > [Checksum Status: Unverified] > > Urgent Pointer: 0 > > Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps > > TCP Option - No-Operation (NOP) > > Kind: No-Operation (1) > > TCP Option - No-Operation (NOP) > > Kind: No-Operation (1) > > TCP Option - Timestamps: TSval 3874335934, TSecr 3601030412 > > Kind: Time Stamp Option (8) > > Length: 10 > > Timestamp value: 3874335934 > > Timestamp echo reply: 3601030412 > > [SEQ/ACK analysis] > > [Bytes in flight: 24] > > [Bytes sent since last PSH flag: 24] > > [Timestamps] > > [Time since first frame in this TCP stream: 0.000000000 seconds] > > [Time since previous frame in this TCP stream: 0.000000000 seconds] > > TCP payload (24 bytes) > > [PDU Size: 24] > > NVM Express Fabrics TCP > > Pdu Type: CapsuleResponse (5) > > Pdu Specific Flags: 0x00 > > .... ...0 = PDU Header Digest: Not set > > .... ..0. = PDU Data Digest: Not set > > .... .0.. = PDU Data Last: Not set > > .... 0... = PDU Data Success: Not set > > Pdu Header Length: 24 > > Pdu Data Offset: 0 > > Packet Length: 24 > > Unknown Data: 02000400000000001b0000001f000000 > > > > 0000 00 00 0c 9f f5 a8 b4 96 91 41 16 c0 08 00 45 00 .........A....E. > > 0010 00 4c 00 00 40 00 40 06 00 00 0a e4 26 af 0a e4 .L..@.@.....&... > > 0020 c2 1e 11 44 88 4f b8 58 90 ec 8e 1b 32 ed 80 18 ...D.O.X....2... > > 0030 01 01 fe d3 00 00 01 01 08 0a e6 ed ac be d6 a3 ................ > > 0040 5d 0c 05 00 18 00 18 00 00 00 02 00 04 00 00 00 ]............... > > 0050 00 00 1b 00 00 00 1f 00 00 00 .......... > > > As I suspected, we did receive an invalid frame. > Data digest would have saved us, but then it's not enabled. > > So we do need to check if the request is valid before processing it. That's just addressing a symptom. You can't fully verify the request is valid this way because the host could have started the same command ID the very moment before the code checks it, incorrectly completing an in-flight command and getting data corruption.