Date: Wed, 11 May 2022 09:04:47 +1000
From: Dave Chinner
To: Florian Weimer
Cc: Christian Brauner, Miklos Szeredi, linux-fsdevel@vger.kernel.org,
    Theodore Ts'o, Karel Zak, Greg KH, linux-kernel@vger.kernel.org,
    Linux API, linux-man, LSM, Ian Kent, David Howells, Linus Torvalds,
    Al Viro, Christian Brauner, Amir Goldstein, James Bottomley
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API
Message-ID: <20220510230447.GC2306852@dread.disaster.area>
References: <20220509124815.vb7d2xj5idhb2wq6@wittgenstein>
    <20220510005533.GA2306852@dread.disaster.area>
    <87bkw5d098.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87bkw5d098.fsf@oldenburg.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 10, 2022 at 02:45:39PM +0200, Florian Weimer wrote:
> * Dave Chinner:
>
> > IOWs, what Linux really needs is a listxattr2() syscall that works
> > the same way that getdents/XFS_IOC_ATTRLIST_BY_HANDLE work. With the
> > list function returning value sizes and being able to iterate
> > effectively, every problem that listxattr() causes goes away.
>
> getdents has issues of its own because it's unspecified what happens if
> the list of entries is modified during iteration. Few file systems add
> another tree just to guarantee stable iteration.

The filesystem I care about (XFS) guarantees stable iteration and
stable seekdir/telldir cookies. It's not that hard to do, but it
requires the filesystem designer to understand that this is a
necessary feature before they start designing the on-disk directory
format and lookup algorithms....

> Maybe that's different for xattrs because they are supposed to be small
> and can just be snapshotted with a full copy?

It's different for xattrs because we directly control the API
specification for XFS_IOC_ATTRLIST_BY_HANDLE, not POSIX. We can
define the behaviour however we want.
Stable iteration is what listing keys needs.

The cursor is defined as 16 bytes of opaque data, enabling us to
encode exactly where in the hashed name btree index we have
traversed to:

/*
 * Kernel-internal version of the attrlist cursor.
 */
struct xfs_attrlist_cursor_kern {
	__u32	hashval;	/* hash value of next entry to add */
	__u32	blkno;		/* block containing entry (suggestion) */
	__u32	offset;		/* offset in list of equal-hashvals */
	__u16	pad1;		/* padding to match user-level */
	__u8	pad2;		/* padding to match user-level */
	__u8	initted;	/* T/F: cursor has been initialized */
};

Hence we have all the information in the cursor we need to reset the
btree traversal index to the exact entry we finished at (even in the
presence of hash collisions in the index). Hence removal of the
entry the cursor points to isn't a problem for us: we just move to
the next highest sequential hash index in the btree and start again
from there.

Of course, if this is how we define listxattr2() behaviour (or maybe
we should call it "list_keys()" to make it clear we are treating
this as a key/value store instead of xattrs) then each filesystem
can put what it needs in that cursor to guarantee it can restart key
iteration correctly if the entry the cursor points to has been
removed. We can also make the cursor larger if necessary for other
filesystems to store the information they need.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
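To make the cursor-restart behaviour concrete, here is a minimal userspace sketch in C. It is not the XFS implementation: a sorted array stands in for the hashed name btree, and every name in it (key_store, key_cursor, store_add(), store_remove(), list_keys(), demo_resume_after_removal()) is hypothetical. As a simplifying assumption, the cursor records a (hashval, seq) pair for the next entry to return, where seq is a made-up per-entry tie-breaker for hash collisions, rather than the blkno/offset fields of the real cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 16

/* Toy stand-in for the hashed name index: entries sorted by (hashval, seq). */
struct key_entry {
	uint32_t hashval;
	uint32_t seq;		/* hypothetical tie-breaker for equal hashes */
	const char *name;
};

struct key_store {
	struct key_entry ents[MAX_KEYS];
	size_t count;
	uint32_t next_seq;
};

/* Hypothetical opaque cursor, padded to 16 bytes on common ABIs. */
struct key_cursor {
	uint32_t hashval;	/* hash of the next entry to return */
	uint32_t seq;		/* tie-breaker of the next entry to return */
	uint8_t pad[7];
	uint8_t initted;	/* 0 until the first call has run */
};

/* Insert in hash order; equal hashes keep insertion order via seq. */
static void store_add(struct key_store *s, uint32_t hashval, const char *name)
{
	assert(s->count < MAX_KEYS);
	size_t i = s->count;

	while (i > 0 && s->ents[i - 1].hashval > hashval) {
		s->ents[i] = s->ents[i - 1];
		i--;
	}
	s->ents[i] = (struct key_entry){ hashval, s->next_seq++, name };
	s->count++;
}

/* Remove the entry with the given name; returns 1 if it was found. */
static int store_remove(struct key_store *s, const char *name)
{
	for (size_t i = 0; i < s->count; i++) {
		if (strcmp(s->ents[i].name, name) == 0) {
			memmove(&s->ents[i], &s->ents[i + 1],
				(s->count - i - 1) * sizeof(s->ents[0]));
			s->count--;
			return 1;
		}
	}
	return 0;
}

/*
 * Return up to max names starting at the cursor. If the entry the cursor
 * pointed at has been removed, iteration resumes at the next entry in
 * (hashval, seq) order.
 */
static size_t list_keys(const struct key_store *s, struct key_cursor *cur,
			const char **out, size_t max)
{
	size_t i = 0, n = 0;

	if (cur->initted) {
		/* Skip every entry ordered before the cursor position. */
		while (i < s->count &&
		       (s->ents[i].hashval < cur->hashval ||
			(s->ents[i].hashval == cur->hashval &&
			 s->ents[i].seq < cur->seq)))
			i++;
	}
	while (i < s->count && n < max)
		out[n++] = s->ents[i++].name;

	cur->initted = 1;
	if (i < s->count) {
		cur->hashval = s->ents[i].hashval;
		cur->seq = s->ents[i].seq;
	} else {
		/* Nothing left: park the cursor past every possible entry. */
		cur->hashval = UINT32_MAX;
		cur->seq = UINT32_MAX;
	}
	return n;
}

/* Lists keys two at a time and removes the cursor's target between calls. */
size_t demo_resume_after_removal(const char **seen, size_t max_seen)
{
	struct key_store s = {0};
	struct key_cursor cur = {0};
	const char *batch[2];
	size_t total = 0, n;

	store_add(&s, 10, "user.a");
	store_add(&s, 20, "user.b");
	store_add(&s, 20, "user.c");	/* hash collision with user.b */
	store_add(&s, 20, "user.d");	/* and another one */
	store_add(&s, 30, "user.e");

	n = list_keys(&s, &cur, batch, 2);	/* cursor now targets user.c */
	for (size_t i = 0; i < n && total < max_seen; i++)
		seen[total++] = batch[i];

	store_remove(&s, "user.c");		/* remove the cursor's target */

	while ((n = list_keys(&s, &cur, batch, 2)) > 0)
		for (size_t i = 0; i < n && total < max_seen; i++)
			seen[total++] = batch[i];
	return total;
}
```

Calling demo_resume_after_removal() with an array of at least five slots walks the store in two-entry batches. The first batch leaves the cursor pointing at "user.c", which is then removed; the next call picks up at "user.d" without repeating or skipping any surviving key, even though three entries share a hash value.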