Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp720869pxm; Fri, 25 Feb 2022 18:17:08 -0800 (PST) X-Google-Smtp-Source: ABdhPJwJc5xJuniWbsMGaIchJp4Rknm+G6GWrwY31FQYd2hIC/dDrvNiDfRO+J+/AdOXYuwoodIi X-Received: by 2002:a63:1826:0:b0:373:41d9:b14e with SMTP id y38-20020a631826000000b0037341d9b14emr8398781pgl.197.1645841827919; Fri, 25 Feb 2022 18:17:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1645841827; cv=none; d=google.com; s=arc-20160816; b=cgSqfMJxHtHqVGVk+mSN22BYODB69z6GcmFRFJNeiPGHPYAJdKot+xb3BEHuPmt7O4 fI6LQRtHCEKVcYNFhFipqD2CO9DZ+bVBJxMen2E7DSqoI6hS04uCe4NlirD4k596G78a r2Empl3kXtDz0ISi8BwetAHF8xIzBlpRoQHNNE56VhT/mHbnSM3gmq6rLT0Olg+Q0jEv zZIiaLNclvQ1E0PtMZhjXjHdYbebOmlC9/CmUnjjA3Tu6FujAb0eFfOitXbeXWKSXkRe g/ZhAhJuEQILzy5yDAdUS8XoHnQSdBJLmuNiGLv2H49oW14/gchZmGOxWbHouCnDy51r IzNQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=b0gHlyQ3Vv6H4NGb9D1cfNmctBTYMsrsrTJTTns+5zg=; b=eBsFiL2Vs54qsWYaf/HmBORf1J4abPL/Yb9MLNkMF5kxh/GdQj7DbpjOl772Eg5MyB OS5IfB9tio4kepAuk5o/y0nr9HEoRdyn2syPt6dqHqocmS4bq4cB7M1tXMDEK/Ble9lm KkrrHlOuIsDQO07bx3ghACP/x2iYqKswus4kjol9UkpdKMW7w+ezE4MR3GLaE1wiVm48 5J3zfkVBWyqMUwdd1FSQj1luzVmXE5XmM++ib5leB52FXZSbnlb4hMBaNqPbi3IZ98k8 IIX5qMAFjH773dS1HEhisL2eySwkhDDDOJxvo5nUwJW1lc28fTH9YWkLHukReZsg9nhW aHWw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ACLKe2oP; spf=softfail (google.com: domain of transitioning linux-nfs-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id l15-20020a170903120f00b0014f88317852si3624306plh.606.2022.02.25.18.17.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 25 Feb 2022 18:17:07 -0800 (PST) Received-SPF: softfail (google.com: domain of transitioning linux-nfs-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ACLKe2oP; spf=softfail (google.com: domain of transitioning linux-nfs-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E0FBF26297F; Fri, 25 Feb 2022 17:50:09 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241246AbiBYOo4 (ORCPT + 99 others); Fri, 25 Feb 2022 09:44:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241351AbiBYOoy (ORCPT ); Fri, 25 Feb 2022 09:44:54 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0215C1FED91 for ; Fri, 25 Feb 2022 06:44:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1645800261; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=b0gHlyQ3Vv6H4NGb9D1cfNmctBTYMsrsrTJTTns+5zg=; b=ACLKe2oP3VVWeIFIVV5lq9KXvoWGp7nMw52btjUqT2TW07m7vV+8fuqopV44IT0YSjnEuG 0UxWl8uTQAOtWPPkf2OYBTr//7GUGa3EZaVcs+c55Qi5GIEI3yS9Z93OTFnTNG6YmdjddR LyBaDeNLkCqlcgbOhWV82mdN8+LR2Oc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-241-bEaofyv7NDOuX4oDh6zzUw-1; Fri, 25 Feb 2022 09:44:19 -0500 X-MC-Unique: bEaofyv7NDOuX4oDh6zzUw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8C292800425; Fri, 25 Feb 2022 14:44:18 +0000 (UTC) Received: from [172.16.176.1] (ovpn-64-2.rdu2.redhat.com [10.10.64.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1D5FE84961; Fri, 25 Feb 2022 14:44:17 +0000 (UTC) From: "Benjamin Coddington" To: "Trond Myklebust" Cc: linux-nfs@vger.kernel.org Subject: Re: [PATCH v7 05/21] NFS: Store the change attribute in the directory page cache Date: Fri, 25 Feb 2022 09:44:16 -0500 Message-ID: <11744FC6-5EFB-427A-ADB4-D211BA1C74F4@redhat.com> In-Reply-To: References: <20220223211305.296816-1-trondmy@kernel.org> <20220223211305.296816-2-trondmy@kernel.org> <20220223211305.296816-3-trondmy@kernel.org> <20220223211305.296816-4-trondmy@kernel.org> <20220223211305.296816-5-trondmy@kernel.org> <20220223211305.296816-6-trondmy@kernel.org> <0DBE97BF-3A88-49FD-B078-012B5EDA5849@redhat.com> <1ca16f7e7be9588d15525e3afa4f7a80eb66b12b.camel@hammerspace.com> <9506801a7b7b6330ce2807721da5e03d77cf5c78.camel@hammerspace.com> <640B2705-35C6-4E9E-89D2-CC3D0E10EC3A@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On 25 Feb 2022, at 8:10, Trond Myklebust wrote: > On Fri, 2022-02-25 at 06:38 -0500, Benjamin Coddington wrote: >> On 24 Feb 2022, at 22:51, Trond Myklebust wrote: >> >>> On Fri, 2022-02-25 at 02:26 +0000, Trond Myklebust wrote: >>>> On Thu, 2022-02-24 at 09:53 -0500, Benjamin Coddington wrote: >>>>> On 23 Feb 2022, at 16:12, trondmy@kernel.org=C2=A0wrote: >>>>> >>>>>> From: Trond Myklebust >>>>>> >>>>>> Use the change attribute and the first cookie in a directory >>>>>> page >>>>>> cache >>>>>> entry to validate that the page is up to date. >>>>>> >>>>>> Suggested-by: Benjamin Coddington >>>>>> Signed-off-by: Trond Myklebust >>>>>> >>>>>> --- >>>>>> =C2=A0fs/nfs/dir.c | 68 >>>>>> ++++++++++++++++++++++++++++------------------------ >>>>>> =C2=A01 file changed, 37 insertions(+), 31 deletions(-) >>>>>> >>>>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c >>>>>> index f2258e926df2..5d9367d9b651 100644 >>>>>> --- a/fs/nfs/dir.c >>>>>> +++ b/fs/nfs/dir.c >>>>>> @@ -139,6 +139,7 @@ struct nfs_cache_array_entry { >>>>>> =C2=A0}; >>>>>> >>>>>> =C2=A0struct nfs_cache_array { >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0u64 change_attr; >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0u64 last_cookie; >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0unsigned int size;= >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0unsigned char page= _full : 1, >>>>>> @@ -175,7 +176,8 @@ static void nfs_readdir_array_init(struct >>>>>> nfs_cache_array *array) >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0memset(array, 0, s= izeof(struct nfs_cache_array)); >>>>>> =C2=A0} >>>>>> >>>>>> -static void nfs_readdir_page_init_array(struct page *page, >>>>>> u64 >>>>>> last_cookie) >>>>>> +static void nfs_readdir_page_init_array(struct page *page, >>>>>> u64 >>>>>> last_cookie, >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0u64 >>>>>> change_attr) >>>>>> =C2=A0{ >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct nfs_cache_a= rray *array; >>>>> >>>>> >>>>> There's a hunk missing here, something like: >>>>> >>>>> @@ -185,6 +185,7 @@ static void >>>>> nfs_readdir_page_init_array(struct >>>>> page >>>>> *page, u64 last_cookie, >>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 nfs_readdir_array_= init(array); >>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 array->last_cookie= =3D last_cookie; >>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 array->cookies_are= _ordered =3D 1; >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 array->change_attr =3D change= _attr; >>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 kunmap_atomic(arra= y); >>>>> =C2=A0 } >>>>> >>>>>> >>>>>> @@ -207,7 +209,7 @@ nfs_readdir_page_array_alloc(u64 >>>>>> last_cookie, >>>>>> gfp_t gfp_flags) >>>>>> =C2=A0{ >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct page *page = =3D alloc_page(gfp_flags); >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (page) >>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0nfs_readdir_page_init_array(page, >>>>>> last_cookie); >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0nfs_readdir_page_init_array(page, >>>>>> last_cookie, >>>>>> 0); >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return page; >>>>>> =C2=A0} >>>>>> >>>>>> @@ -304,19 +306,44 @@ int nfs_readdir_add_to_array(struct >>>>>> nfs_entry >>>>>> *entry, struct page *page) >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return ret; >>>>>> =C2=A0} >>>>>> >>>>>> +static bool nfs_readdir_page_cookie_match(struct page *page, >>>>>> u64 >>>>>> last_cookie, >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 >>>>>> u64 change_attr) >>>>> >>>>> How about "nfs_readdir_page_valid()"?=C2=A0 There's more going on >>>>> than a >>>>> cookie match. >>>>> >>>>> >>>>>> +{ >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct nfs_cache_array = *array =3D kmap_atomic(page); >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int ret =3D true; >>>>>> + >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (array->change_attr = !=3D change_attr) >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0ret =3D false; >>>>> >>>>> Can we skip the next test if ret =3D false? >>>> >>>> I'd expect the compiler to do that. >>>> >>>>> >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (array->size > 0 && = array->array[0].cookie !=3D >>>>>> last_cookie) >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0ret =3D false; >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0kunmap_atomic(array); >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return ret; >>>>>> +} >>>>>> + >>>>>> +static void nfs_readdir_page_unlock_and_put(struct page >>>>>> *page) >>>>>> +{ >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0unlock_page(page); >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0put_page(page); >>>>>> +} >>>>>> + >>>>>> =C2=A0static struct page *nfs_readdir_page_get_locked(struct >>>>>> address_space >>>>>> *mapping, >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0p= goff_t >>>>>> index, >>>>>> u64 >>>>>> last_cookie) >>>>>> =C2=A0{ >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct page *page;= >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0u64 change_attr; >>>>>> >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0page =3D grab_cach= e_page(mapping, index); >>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (page && !PageUptoda= te(page)) { >>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0nfs_readdir_page_init_array(page, >>>>>> last_cookie); >>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0if >>>>>> (invalidate_inode_pages2_range(mapping, index >>>>>> + >>>>>> 1, -1) < 0) >>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0n= fs_zap_mapping(mapping->host, >>>>>> mapping); >>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0SetPageUptodate(page); >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!page) >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0return NULL; >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0change_attr =3D >>>>>> inode_peek_iversion_raw(mapping->host); >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (PageUptodate(page))= { >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0if >>>>>> (nfs_readdir_page_cookie_match(page, >>>>>> last_cookie, >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 >>>>>> change_attr)) >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0r= eturn page; >>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0nfs_readdir_clear_array(page); >>>>> >>>>> >>>>> Why use i_version rather than nfs_save_change_attribute?=C2=A0 Seem= s >>>>> having a >>>>> consistent value across the pachecache and dir_verifiers would >>>>> help >>>>> debugging, and we've already have a bunch of machinery around >>>>> the >>>>> change_attribute. >>>> >>>> The directory cache_change_attribute is not reported in >>>> tracepoints >>>> because it is a directory-specific field, so it's not as useful >>>> for >>>> debugging. >>>> >>>> The inode change attribute is what we have traditionally used for >>>> determining cache consistency, and when to invalidate the cache. >>> >>> I should probably elaborate a little further on the differences >>> between >>> the inode change attribute and the cache_change_attribute. >>> >>> One of the main reasons for introducing the latter was to have >>> something that allows us to track changes to the directory, but to >>> avoid forcing unnecessary revalidations of the dcache. >>> >>> What this means is that when we create or remove a file, and the >>> pre/post-op attributes tell us that there were no third party >>> changes >>> to the directory, we update the dcache, but we do _not_ update the >>> cache_change_attribute, because we know that the rest of the >>> directory >>> contents are valid, and so we don't have to revalidate the >>> dentries. >>> However in that case, we _do_ want to update the readdir cache to >>> reflect the fact that an entry was added or deleted. While we could >>> figure out how to remove an entry (at least for the case where the >>> filesystem is case-sensitive), we do not know where the filesystem >>> added the new file, or what cookies was assigned. >>> >>> This is why the inode change attribute is more appropriate for >>> indexing >>> the page cache pages. It reflects the cases where we want to >>> revalidate >>> the readdir cache, as opposed to the dcache. >> >> Ok, thanks for explaining this. >> >> I've noticed that you haven't responded about my concerns about not >> checking >> the directory for changes with every v4 READDIR.=C2=A0 For v3, we have= >> post-op >> updates to the directory, but with v4 the directory can change and >> we'll >> end up with entries in the cache that are marked with an old >> change_attr. >> > > Then they will be rejected by nfs_readdir_page_cookie_match() if a = > user > looks up that page again after we've revalidated the change attribute > on the directory. > > ...and note that NFSv4 does returns a struct change_info4 for all > operations that change the directory, so we will update the change > attribute in all those cases. I'm not worried about changes from the same client. > If the change is made on the server, well then we will detect it > through the standard revalidation process that usually decides when to > invalidate the directory page cache. The environments I'm concerned about are setup very frequently: they = look like multiple NFS clients co-ordinating on a directory with millions of files. Some clients are adding files as they do work, other clients are then looking for those files by walking the directory entries to = validate their existence. The systems that do this have a "very bad time" if = some of them produce listings that are _dramatically_ and transiently = different from a listing they produced before. That can happen really easily with what we've got here, and it can = create a huge problem for these setups. And it won't be easily reproduceable, = and it will be hard to find. It will cost everyone involved a lot of time and effort to track down, and we can fix it easily. >> I'm pretty positive that not checking for changes to the directory >> (not >> sending GETATTR with READDIR) is going to create cases of double- >> listed >> and >> truncated-listings for dirctory listers.=C2=A0 Not handling those case= s >> means >> I'm >> going to have some very unhappy customers that complain about their >> files >> disappearing/reappearing on NFS. >> >> If you need me to prove that its an issue, I can take the time to >> write >> up >> program that shows this problem. >> > > If you label the page contents with an attribute that was retrieved > _after_ the READDIR op, then you will introduce this as a problem for > your customers. No the problem is already here, we're not introducing it. By labeling = the page contents with every call we're shifting the race window from the = client where it's a very large window to the server where the window is small. Its still possible, but *much* less likely. > The reason is that there is no atomicity between operations in a > COMPOUND. Worse, the implementation of readdir in scalable modern > systems, including Linux, does not even guarantee atomicity of the > readdir operation itself. Instead each readdir entry is filled without > holding any locks or preventing any changes to the directory or to the > object itself. I understand all this, but its not a reason to make the problem worse. > POSIX states very explicitly that if you're making changes to the > directory after the call to opendir() or rewinddir(), then the > behaviour w.r.t. whether that file appears in the readdir() call is > unspecified. See > https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html= Yes, but again - just because the problem exists doesn't give us reason = to amplify it when we can easily make a better choice for almost no cost. Here are my reasons for wanting the GETATTR added: - it makes it *much* less likely for this problem to occur, with the = minor downside of decreased caching for unstable directories. - it makes v3 and v4 readdir pagecache behavior consistent WRT = changing directories. I spent a non-trivial amount of time working on this problem, and saw = this exact issue appear. Its definitely something that's going to come back = and bite us if we don't fix it. How can I convince you? I've offered to produce a working example of = this problem. Will you review those results? If I cannot convince you, I = feel I'll have to pursue distro-specific changes for this work. Ben