Date: Sat, 26 Oct 2019 17:36:06 -0400
From: bfields@fieldses.org (J. Bruce Fields)
To: "J. Bruce Fields"
Cc: NeilBrown, linux-nfs@vger.kernel.org
Subject: Re: uncollected nfsd open owners

On Fri, Oct 25, 2019 at 11:20:47AM -0400, J. Bruce Fields wrote:
> On Fri, Oct 25, 2019 at 12:22:36PM +1100, NeilBrown wrote:
> > I have a coredump from a machine that was running as an NFS server.
> > nfs4_laundromat was trying to expire a client, and in particular was
> > cleaning up the ->cl_openowners.
> > As there were 6.5 million of these, it took rather longer than the
> > softlockup timer thought was acceptable, and hence the core dump.
> >
> > Those open owners that I looked at had empty so_stateids lists, so I
> > would normally expect them to be on the close_lru and to be removed
> > fairly soon. But they weren't (only 32 openowners on close_lru).
> >
> > The only explanation I can think of for this is that maybe an OPEN
> > request successfully got through nfsd4_process_open1(), thus creating an
> > open owner, but failed to get to or through nfsd4_process_open2(), and
> > so didn't add a stateid. I *think* this can leave an openowner that is
> > unused but will never be cleaned up (until the client is expired, which
> > might be too late).
> >
> > Is this possible? If so, how should we handle those openowners which
> > never had a stateid?
> > In 3.0 (which is the kernel where I saw this) I could probably just put
> > the openowner on the close_lru when it is created.
> > In more recent kernels, it seems to be assumed that openowners are only
> > on close_lru if they have an oo_last_closed_stid. Would we need a
> > separate "never used lru", or should they just be destroyed as soon as
> > the open fails?
>
> Hopefully we can just throw the new openowner away when the open fails.
>
> But it looks like the new openowner is visible on global data structures
> by then, so we need to be sure somebody else isn't about to use it.

But, also, if this has only been seen on 3.0, it may have been fixed
already. It sounds like kind of a familiar problem, but I didn't spot a
relevant commit on a quick look through the logs.

--b.
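A minimal user-space sketch of the "throw the new openowner away when the open fails" idea, assuming a heavily simplified model. struct owner, struct client, owner_create(), owner_get(), owner_add_stateid(), owner_put() and open_failed() are all hypothetical stand-ins, not nfsd's struct nfs4_openowner code and not an actual fix. A reference count stands in for "somebody else is about to use it": the failing OPEN unhashes the owner only if it never had a stateid and holds the only reference; otherwise it just drops its own reference. The model is single-threaded; in nfsd the check and the unhash would presumably have to happen under whatever lock protects openowner lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an nfsd open owner (not struct nfs4_openowner). */
struct owner {
	int refcount;          /* references held by in-flight OPEN requests */
	bool had_stateid;      /* set once an OPEN on this owner succeeded */
	bool hashed;           /* visible to lookups by other requests */
	struct owner *next;    /* link on the client's owner list */
};

/* Hypothetical stand-in for a client; owners plays the role of cl_openowners. */
struct client {
	struct owner *owners;
	int nr_owners;
};

/* "open1": create the owner and make it visible before the open completes. */
static struct owner *owner_create(struct client *clp)
{
	struct owner *oo = calloc(1, sizeof(*oo));

	if (!oo)
		return NULL;
	oo->refcount = 1;              /* held by the creating request */
	oo->hashed = true;
	oo->next = clp->owners;
	clp->owners = oo;
	clp->nr_owners++;
	return oo;
}

/* Another request found the owner by lookup and is about to use it. */
static void owner_get(struct owner *oo)
{
	oo->refcount++;
}

/* "open2" succeeded: the owner now has (or has had) a stateid. */
static void owner_add_stateid(struct owner *oo)
{
	oo->had_stateid = true;
}

/* Remove the owner from the client's list so no new lookup can find it. */
static void owner_unhash(struct client *clp, struct owner *oo)
{
	for (struct owner **pp = &clp->owners; *pp; pp = &(*pp)->next) {
		if (*pp == oo) {
			*pp = oo->next;
			clp->nr_owners--;
			oo->hashed = false;
			return;
		}
	}
}

/* Drop a reference; an unhashed owner is freed on its last put.  A hashed
 * owner stays on the client's list (this model does no other cleanup). */
static void owner_put(struct owner *oo)
{
	assert(oo->refcount > 0);
	if (--oo->refcount == 0 && !oo->hashed)
		free(oo);
}

/* The OPEN failed after "open1".  Throw the owner away only if it never had
 * a stateid and no other request holds a reference; otherwise just drop our
 * reference and leave the decision to the last user. */
static void open_failed(struct client *clp, struct owner *oo)
{
	if (!oo->had_stateid && oo->refcount == 1)
		owner_unhash(clp, oo);
	owner_put(oo);
}
```

Leaving the owner hashed while another request holds a reference means that request's own success or failure decides what happens to it, which is one way to avoid freeing an owner somebody else is about to use. Whether nfsd's real locking and lookup paths allow this shape is not established here.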