Message-ID: <8ab7f51cf18ba62e3f5bfdf5d9933895413f4806.camel@themaw.net>
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API
From: Ian Kent
To: Miklos Szeredi, Christian Brauner
Cc: linux-fsdevel@vger.kernel.org, Dave Chinner, Theodore Ts'o, Karel Zak,
 Greg KH, linux-kernel@vger.kernel.org, Linux API, linux-man, LSM,
 David Howells, Linus Torvalds, Al Viro, Christian Brauner,
 Amir Goldstein, James Bottomley
Date: Tue, 10 May 2022 12:27:10 +0800
References: <20220509124815.vb7d2xj5idhb2wq6@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2022-05-10 at 05:49 +0200, Miklos Szeredi wrote:
> On Mon, 9 May 2022 at 14:48, Christian Brauner wrote:
>
> > One comment about this. We really need to have this interface support
> > giving us mount options like "relatime" back in numeric form (I assume
> > this will be possible.). It is royally annoying having to maintain a
> > mapping table in userspace just to do:
> >
> > relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME
> > ro       -> MS_RDONLY/MOUNT_ATTR_RDONLY
> >
> > A library shouldn't be required to use this interface. Conservative
> > low-level software that keeps its shared library dependencies minimal
> > will need to be able to use that interface without having to go to an
> > external library that transforms text-based output to binary form (Which
> > I'm very sure will need to happen if we go with a text-based interface.).
>
> Agreed.
>
> >   This pattern of requesting the size first by passing empty arguments,
> >   then allocating the buffer and then passing down that buffer to
> >   retrieve that value is really annoying to use and error prone (I do
> >   of course understand why it exists.).
> >
> >   For real xattrs it's not that bad because we can assume that these
> >   values don't change often and so the race window between
> >   getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But
> >   fwiw, the post > pre check doesn't exist for no reason; we do indeed
> >   hit that race.
>
> That code is wrong.  Changing xattr size is explicitly documented in
> the man page as a non-error condition:
>
>        If size is specified as zero, these calls return the current size of
>        the named extended attribute (and leave value unchanged).  This can be
>        used to determine the size of the buffer that should be supplied in a
>        subsequent call.  (But, bear in mind that there is a possibility that
>        the attribute value may change between the two calls, so that it is
>        still necessary to check the return status from the second call.)
>
> >
> >   In addition, it is costly having to call getxattr() twice. Again, for
> >   retrieving xattrs it often doesn't matter because it's not a super
> >   common operation but for mount and other info it might matter.
>
> You don't *have* to retrieve the size, it's perfectly valid to e.g.
> start with a fixed buffer size and double the size until the result
> fits.
>
> > * Would it be possible to support binary output with this interface?
> >   I really think users would love to have an interface where they can
> >   get a struct with binary info back.
>
> I think that's bad taste.  fsinfo(2) had the same issue.  As well as
> mount(2) which still interprets the last argument as a binary blob in
> certain cases (nfs is one I know of).
>
> >   Especially for some information at least. I'd really love to have a
> >   way to get a struct mount_info or whatever back that gives me all the
> >   details about a mount encompassed in a single struct.
>
> If we want that, then we can do a new syscall with that specific struct
> as an argument.
>
> >   Callers like systemd will have to parse text and will end up
> >   converting everything from text into binary anyway; especially for
> >   mount information. So giving them an option for this out of the box
> >   would be quite good.
>
> What exactly are the attributes that systemd requires?

It's been a while since I worked on this so my response might not be
too accurate now.

Monitoring the mount table is used primarily to identify when a mount
has started and when it has completed.

Mount table entry identification requires several fields.

But, in reality, once a direct interface is available it should be
possible to work out what is actually needed, and that will be a
rather small subset of a mountinfo table entry.

>
> >   Interfaces like statx aim to be as fast as possible because we expect
> >   them to be called quite often. Retrieving mount info is quite costly
> >   and is done quite often as well. Maybe not for all software but for a
> >   lot of low-level software. Especially when starting services in
> >   systemd a lot of mount parsing happens similar when starting
> >   containers in runtimes.
>
> Was there ever a test patch for systemd using fsinfo(2)?  I think
> not.

Mmm ... I'm hurt you didn't pay any attention to what I did on this
during the original fsinfo() discussions.

>
> Until systemd people start to reengineer the mount handling to allow
> for retrieving a single mount instead of the complete mount table we
> will never know where the performance bottleneck lies.

We didn't need the systemd people to do this, only to review and
contribute to the PR for the change and eventually merge it.

What I did on this showed that using fsinfo() alone about halved the
CPU overhead (from around 4 processes using about 80%) and once the
mount notifications were added too it went down to well under 10% per
process.

The problem here was that systemd is quite good at servicing events, so
reducing event processing overhead meant more events would then be
processed. Utilizing the mount notifications queueing was the key
to improving this and that was what I was about to work on at the
end. But everything stopped before the work was complete.

As I said above it's been a long time since I looked at the systemd
work and it definitely was a WIP so "what you see is what you get"
at https://github.com/raven-au/systemd/commits/. It looks like the
place to look to get some idea of what was being done is branch
notifications-devel or notifications-rfc-pr. Also note that this
uses the libmount fsinfo() infrastructure that was done by Karel
Zak (and a tiny bit by me) at the time.

>
> >
> > * If we decide to go forward with this interface - and I think I
> >   mentioned this in the lsfmm session - could we please at least add a
> >   new system call? It really feels wrong to retrieve mount and other
> >   information through the xattr interfaces. They aren't really xattrs.
>
> I'd argue with that statement.  These are most definitely attributes.
> As for being extended, we'd just extended the xattr interface...
>
> Naming aside... imagine that read(2) has always been used to retrieve
> disk data, would you say that reading data from proc feels wrong?
> And in hindsight, would a new syscall for the purpose make any sense?
>
> Thanks,
> Miklos
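
For reference, the grow-and-retry approach Miklos describes above
("start with a fixed buffer size and double the size until the result
fits") might look roughly like the sketch below. It is only an
illustration in Python, not code from the patch or from systemd. The
helper names read_growing and getxattr_fixed, the initial size and the
upper bound are all assumptions made up for this example. The only real
interface used is getxattr(2) as exported by the C library on Linux,
which fails with ERANGE when the supplied buffer is too small.

```python
import ctypes
import errno
import os


def read_growing(fetch, initial_size=256, max_size=1 << 20):
    """Call fetch(size) with a doubling buffer size until the value fits.

    fetch (hypothetical callback) must raise OSError with errno ERANGE
    when `size` bytes are not enough, which is how getxattr(2) reports
    a buffer that is too small.
    """
    size = initial_size
    while size <= max_size:
        try:
            return fetch(size)
        except OSError as exc:
            if exc.errno != errno.ERANGE:
                raise  # a real failure, not just "buffer too small"
            size *= 2  # value did not fit (or grew meanwhile): retry bigger
    raise OSError(errno.E2BIG, "attribute value exceeds max_size")


def getxattr_fixed(path, name, size):
    """One getxattr(2) call into a buffer of exactly `size` bytes (Linux)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.getxattr.restype = ctypes.c_ssize_t
    libc.getxattr.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                              ctypes.c_void_p, ctypes.c_size_t]
    buf = ctypes.create_string_buffer(size)
    n = libc.getxattr(os.fsencode(path), os.fsencode(name), buf, size)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:n]
```

A caller would combine the two, for example
read_growing(lambda n: getxattr_fixed("/some/file", "user.comment", n)),
where the path and attribute name are placeholders. No separate size
query is made, so the common case costs a single system call, and a
value that grows between attempts is handled by the same retry path.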