From: NeilBrown
To: "J. Bruce Fields"
Cc: linux-nfs@vger.kernel.org
Date: Fri, 25 Oct 2019 12:22:36 +1100
Subject: uncollected nfsd open owners
Message-ID: <87mudpfwkj.fsf@notabene.neil.brown.name>

Hi,

I have a coredump from a machine that was running as an NFS server.
nfs4_laundromat was trying to expire a client, and in particular was
cleaning up the ->cl_openowners.
As there were 6.5 million of these, it took rather longer than the
softlockup timer thought was acceptable, and hence the core dump.

Those open owners that I looked at had empty so_stateids lists, so I
would normally expect them to be on the close_lru and to be removed
fairly soon.
But they weren't (only 32 openowners on close_lru).

The only explanation I can think of for this is that maybe an OPEN
request successfully got through nfs4_process_open1(), thus creating an
open owner, but failed to get to or through nfs4_process_open2(), and
so didn't add a stateid. I *think* this can leave an openowner that is
unused but will never be cleaned up (until the client is expired, which
might be too late).

Is this possible? If so, how should we handle those openowners which
never had a stateid?

In 3.0 (which is the kernel where I saw this) I could probably just put
the openowner on the close_lru when it is created.
In more recent kernels, it seems to be assumed that openowners are only
on close_lru if they have an oo_last_closed_stid. Would we need a
separate "never used lru", or should they just be destroyed as soon as
the open fails?

Also, should we put a cond_resched() in some or all of those loops in
__destroy_client()?

Thanks for your help,
NeilBrown
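
For illustration only, the suspected leak can be modelled in a few lines
of userspace C. This is a sketch under stated assumptions, not the nfsd
code: every name below (struct owner, struct client, model_open1(),
model_open2(), model_open(), model_release_owner(),
model_count_unused()) is hypothetical, locking and the close_lru are
left out, and each failing OPEN is assumed to arrive with a new owner
name so that a fresh owner is allocated every time. It only shows how an
owner created in the first step is stranded when the second step fails,
and what the "destroy as soon as the open fails" option would change.

```c
/* Userspace model only: hypothetical names, not the kernel's nfsd code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct owner {
	struct owner *next;	/* link in the client's owner list */
	int nr_stateids;	/* stands in for the so_stateids list */
};

struct client {
	struct owner *owners;	/* stands in for ->cl_openowners */
	size_t nr_owners;
};

/* First step of an OPEN: allocate an owner and attach it to the client. */
static struct owner *model_open1(struct client *clp)
{
	struct owner *oo = calloc(1, sizeof(*oo));

	if (!oo)
		return NULL;
	oo->next = clp->owners;
	clp->owners = oo;
	clp->nr_owners++;
	return oo;
}

/* Second step: add a stateid, unless the open fails before that point. */
static bool model_open2(struct owner *oo, bool fail)
{
	if (fail)
		return false;
	oo->nr_stateids++;
	return true;
}

/* Unlink and free an owner that never received a stateid. */
static void model_release_owner(struct client *clp, struct owner *oo)
{
	struct owner **pp;

	assert(oo->nr_stateids == 0);
	for (pp = &clp->owners; *pp; pp = &(*pp)->next) {
		if (*pp == oo) {
			*pp = oo->next;
			clp->nr_owners--;
			free(oo);
			return;
		}
	}
}

/*
 * One OPEN. With destroy_on_failure == false, an owner whose second
 * step failed stays on the client list with no stateids until the
 * whole client is torn down.
 */
static bool model_open(struct client *clp, bool fail_open2,
		       bool destroy_on_failure)
{
	struct owner *oo = model_open1(clp);

	if (!oo)
		return false;
	if (model_open2(oo, fail_open2))
		return true;
	if (destroy_on_failure)
		model_release_owner(clp, oo);
	return false;
}

/* Count owners that are attached to the client but hold no stateid. */
static size_t model_count_unused(const struct client *clp)
{
	const struct owner *oo;
	size_t n = 0;

	for (oo = clp->owners; oo; oo = oo->next)
		if (oo->nr_stateids == 0)
			n++;
	return n;
}
```

Calling model_open(&clp, true, false) repeatedly on a zero-initialised
struct client leaves one unused owner on the list per call, while
passing true as the last argument leaves none. Whether the real code
can safely free the owner at that point depends on reference counting
and replay handling that this model ignores.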