From: Olga Kornievskaia
Date: Tue, 31 Jan 2023 17:14:05 -0500
Subject: Re: Zombie / Orphan open files
To: "Andrew J. Romero"
Cc: Chuck Lever III, Jeff Layton, Linux NFS Mailing List

On Tue, Jan 31, 2023 at 2:55 PM Andrew J. Romero wrote:
>
> > What you are describing sounds like a bug in a system (be it client or
> > server). There is state that the client thought it closed but the
> > server still keeping that state.
>
> Hi Olga
>
> Based on my simple test script experiment,
> Here's a summary of what I believe is happening
>
> 1. An interactive user starts a process that opens a file or multiple files
>
> 2. A disruption, that prevents
> NFS-client <-> NFS-server communication,
> occurs while the file is open. This could be due to
> having the file open a long time or due to opening the file
> too close to the time of disruption.
>
> ( I believe the most common "disruption" is
> credential expiration )
>
> 3) The user's process terminates before the disruption
> is cleared. ( or stated another way , the disruption is not cleared
> until after the user process terminates )
>
> At the time the user process terminates, the process
> can not tell the server to close the server-side file state.
>
> After the process terminates, nothing will ever tell the server
> to close the files. The now zombie open files will continue to
> consume server-side resources.
>
> In environments with many users, the problem is significant
>
> My reasons for posting:
>
> - Are not to have your team help troubleshoot my specific issue
> ( that would be quite rude )
>
> they are:
>
> - Determine If my NAS vendor might be accidentally
> not doing something they should be.
> ( I now don't really think this is the case. )

It's hard to say who's at fault here without more information, such as
tracepoints or network traces.

> - Determine if this is a known behavior common to all NFS implementations
> ( Linux, ....etc ) and if so have your team determine if this is a
> problem that should be addressed in the spec and the implementations.

What you describe, the client and server holding different views of
state, is not a known common behaviour. I tried it on my Kerberos
setup: I got a 5-minute ticket and, as a user, opened a file in a
process that then went to sleep. After my user credentials expired
(after 5 minutes), I verified the expiry by running "ls" on the mounted
filesystem, which failed with a permission-denied error. Then I killed
the application that had the file open. This resulted in an NFS CLOSE
being sent to the server using the machine's GSS context (which is the
default behaviour of the Linux client regardless of whether or not the
user's credentials are valid). As far as I can tell, a Linux client can
clean up state even when the user's credentials have expired.

>
> Andy
>
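
For anyone who wants to repeat the experiment described above, here is a
minimal sketch of a process that holds a file open while it sleeps. It is
not the script used in this thread; the function name hold_open and the
example path are assumptions made for illustration. The idea is to run it
as a user with a short-lived Kerberos ticket against a file on the NFS
mount, let the ticket expire, kill the process, and then check (for
example with a network trace) whether a CLOSE reaches the server.

```python
import os
import time


def hold_open(path, seconds):
    """Open `path` read-only, keep it open for `seconds`, then close it.

    Returns the elapsed wall-clock time in seconds. On an NFSv4 mount the
    open typically establishes open state on the server; the close (or
    the kernel's cleanup when the process is killed) should release it.
    """
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        # Sleep while the descriptor stays open; kill the process during
        # this window to reproduce the "process dies with file open" case.
        time.sleep(seconds)
    finally:
        os.close(fd)
    return time.monotonic() - start
```

A possible invocation is hold_open("/mnt/nfs/testfile", 600), where
/mnt/nfs/testfile is a hypothetical path on the Kerberos-secured mount and
600 seconds is chosen to outlast a 5-minute ticket.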