From: Olga Kornievskaia
Date: Fri, 10 Jan 2020 14:29:02 -0500
Subject: interrupted rpcs problem
To: linux-nfs

Hi folks,

We are having an issue with interrupted RPCs again. Here's what I see when xfstests was ctrl-c-ed.

frame 332 SETATTR call slot=0 seqid=0x000013ca (I'm assuming this is interrupted and released)
frame 333 CLOSE call slot=0 seqid=0x000013cb (the only way the slot could be free before the reply is if it was interrupted, right? Otherwise we should never have the slot used by more than one outstanding RPC)
frame 334 reply to 333 with SEQ_MISORDERED (I'm assuming the server received frame 333 before 332)
frame 336 CLOSE call slot=0 seqid=0x000013ca (??? why did we decrement it? I mean I know why it's in the current code :-/ )
frame 337 reply to 336 SEQUENCE with ERR_DELAY
frame 339 reply to 332 SETATTR which nobody is waiting for
frame 543 CLOSE call slot=0 seqid=0x000013ca (retry after waiting for err_delay)
frame 544 reply to 543 with SETATTR (out of the cache).

What this leads to is: the file is never closed on the server. Can't remove it. Unmount fails with CLID_BUSY.

I believe that's the result of commit 3453d5708b33efe76f40eca1c0ed60923094b971. We used to have code that bumped the sequence up when the slot was interrupted, but no longer do after the commit "NFSv4.1: Avoid false retries when RPC calls are interrupted".

The commit message says: "The obvious fix is to bump the sequence number pre-emptively if an RPC call is interrupted, but in order to deal with the corner cases where the interrupted call is not actually received and processed by the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or locate a correct sequence number that lies between the value we sent, and the last value that was acked by a SEQUENCE call on that slot."

If we can no longer just bump the sequence up, I don't think the correct action is to automatically bump it down (as in the example here)? The commit doesn't describe the corner case where it was necessary to bump the sequence up. I wonder if we can bring back the knowledge of the interrupted slot and make a decision based on that as well as whatever the other corner case is.

I guess what I'm getting at is: can somebody (Trond) provide the info on the corner case that patch was created for? I can see if I can fix the "common" case, which is now broken, without breaking the corner case....

Thank you
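
For anyone who wants to follow the trace without the capture, a toy model of the server side of slot 0 reproduces the same replies. This is only a sketch, not the server's actual code: ServerSlot, receive, finish and replay_trace are made-up names, the starting seqid 0x13c9 is inferred from the trace, and the server is assumed to apply the usual per-slot rule (last+1 is a new request, equal is a replay, anything else is misordered) without checking that a retry matches the original request.

```python
class ServerSlot:
    """Toy model of one NFSv4.1 session slot as seen by the server."""

    def __init__(self, last_seqid):
        self.last_seqid = last_seqid  # last sequence ID accepted on this slot
        self.pending_op = None        # op still executing, if any
        self.cached_reply = None      # reply kept for replays of last_seqid
        self.executed = []            # ops the server actually carried out

    def receive(self, seqid, op):
        """Return the immediate reply, or None if the op was accepted
        and is still executing (call finish() to complete it)."""
        if seqid == self.last_seqid + 1:
            # New request on this slot: accept it and start executing.
            self.last_seqid = seqid
            self.pending_op = op
            self.cached_reply = None
            return None
        if seqid == self.last_seqid:
            # Treated as a retry of the last request, whatever op it carries
            # (assumption: no false-retry detection in this model).
            if self.pending_op is not None:
                return "NFS4ERR_DELAY"
            return self.cached_reply
        return "NFS4ERR_SEQ_MISORDERED"

    def finish(self):
        """Complete the pending op and cache its reply."""
        op = self.pending_op
        self.pending_op = None
        self.executed.append(op)
        self.cached_reply = op + " reply"
        return self.cached_reply


def replay_trace():
    # Assumption: the slot's last accepted seqid was 0x13c9 before frame 332.
    slot = ServerSlot(last_seqid=0x13C9)
    replies = []
    # Frame 333 (CLOSE, 0x13cb) reaches the server before frame 332.
    replies.append(("334", slot.receive(0x13CB, "CLOSE")))
    # Frame 332 (SETATTR, 0x13ca) arrives and starts executing.
    slot.receive(0x13CA, "SETATTR")
    # Frame 336: client decremented the seqid and resent CLOSE.
    replies.append(("337", slot.receive(0x13CA, "CLOSE")))
    # Frame 339: SETATTR completes; nobody is waiting for the reply.
    slot.finish()
    # Frame 543: CLOSE retried with 0x13ca after the delay.
    replies.append(("544", slot.receive(0x13CA, "CLOSE")))
    return slot, replies


if __name__ == "__main__":
    slot, replies = replay_trace()
    for frame, reply in replies:
        print("frame", frame, "->", reply)
    print("executed on server:", slot.executed)
```

In this model the three CLOSE attempts get NFS4ERR_SEQ_MISORDERED, NFS4ERR_DELAY, and then the cached SETATTR reply, and CLOSE never shows up in executed, which is consistent with the file staying open on the server.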