From: "Benjamin Coddington"
To: "James Harvey", "Trond Myklebust"
Cc: "Linux NFS Mailing List"
Subject: Re: 5.3.0 Regression: rpc.nfsd v4 uninterruptible sleep for 5+ minutes w/o rpc-statd/etc
Date: Tue, 01 Oct 2019 14:21:26 -0400
Message-ID: <720574D9-90C7-4A79-8DA6-9A683CFD98CB@redhat.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On 19 Sep 2019, at 9:00, James Harvey wrote:

> For a really long time (years?) if you forced NFS v4 only, you could
> mask a lot of unnecessary services.
>
> In /etc/nfs.conf, in "[nfsd] I've been able to set "vers3=n", and then
> mask the following services:
> * gssproxy
> * nfs-blkmap
> * rpc-statd
> * rpcbind (service & socket)
>
> Upgrading from 5.2.14 to 5.3.0, nfs-server.service (rpc.nfsd) has
> exactly a 5 minute delay, and sometimes longer.

A bisect ends on:

4f8943f80883 SUNRPC: Replace direct task wakeups from softirq context

That commit changed the way we pull the error from the socket. Previously
we'd wake the task with whatever error was in sk_err from
xs_error_report(), but now we use SO_ERROR, and only after possibly
running through xs_wake_disconnect, which forces a closure that can
change sk_err.

So, I think xs_error_report sees ECONNREFUSED, but we wake tasks with
ENOTCONN, and the client machine spins us back around again to reconnect.
We do this until things time out.

I'll send a patch to revert to the previous behavior of waking tasks with
the error as it was in xs_error_report, by copying it over to the
sock_xprt struct and waking the tasks with that value.

There's another subtle change here besides that race: SO_ERROR can return
the socket's soft error, not just what's in sk_err. That can be fun
things like EINVAL if routing lookups fail.

Ben
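
The ordering problem described above can be modelled outside the kernel. What follows is a minimal user-space sketch in Python, not the kernel code and not the actual patch: `FakeSock`, `FakeXprt`, `saved_err` and the helper functions are all hypothetical names, and the model simply assumes that a forced close leaves ENOTCONN as the pending error. It only shows why reading the error after the close loses ECONNREFUSED, while latching it at report time keeps it.

```python
import errno
from dataclasses import dataclass


@dataclass
class FakeSock:
    """Toy stand-in for a kernel socket; only the pending error is modelled."""
    sk_err: int = 0


@dataclass
class FakeXprt:
    """Toy stand-in for sock_xprt; `saved_err` is a hypothetical field name."""
    sock: FakeSock
    saved_err: int = 0


def force_close(sock: FakeSock) -> None:
    # Assumption of this model: tearing the connection down replaces the
    # pending error with ENOTCONN.
    sock.sk_err = errno.ENOTCONN


def read_so_error(sock: FakeSock) -> int:
    # Like SO_ERROR: return the pending error and clear it.
    err, sock.sk_err = sock.sk_err, 0
    return err


def wake_error_late_read(xprt: FakeXprt, disconnect: bool) -> int:
    """Ordering described for 5.3.0: read the error only after a possible close."""
    if disconnect:
        force_close(xprt.sock)
    return read_so_error(xprt.sock)


def error_report(xprt: FakeXprt) -> None:
    """Proposed ordering, step 1: latch the error when it is first reported."""
    xprt.saved_err = xprt.sock.sk_err


def wake_error_latched(xprt: FakeXprt, disconnect: bool) -> int:
    """Proposed ordering, step 2: wake with the latched value, whatever the close did."""
    if disconnect:
        force_close(xprt.sock)
    return xprt.saved_err


if __name__ == "__main__":
    late = FakeXprt(FakeSock(sk_err=errno.ECONNREFUSED))
    print(errno.errorcode[wake_error_late_read(late, disconnect=True)])

    fixed = FakeXprt(FakeSock(sk_err=errno.ECONNREFUSED))
    error_report(fixed)
    print(errno.errorcode[wake_error_latched(fixed, disconnect=True)])
```

In this model the late read wakes the task with ENOTCONN, which is the value that sends the client back around to reconnect, while the latched variant still reports ECONNREFUSED.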