Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp4959706iob; Mon, 9 May 2022 05:48:49 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwFOQgCIVZ14h1iZm4uBO1uIBrOb5ZSqj2qCwvNsZlW91iRLIz+blbl7P9qBU6KcPChbNQh X-Received: by 2002:a63:d908:0:b0:3c1:f140:c1a4 with SMTP id r8-20020a63d908000000b003c1f140c1a4mr12951226pgg.393.1652100529475; Mon, 09 May 2022 05:48:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652100529; cv=none; d=google.com; s=arc-20160816; b=JhIX/zCEk2m8sIjX+1MSeqGAqlx7Ouz/+rniddKQxhqnOdfnwA0UYiEtUB9KMaaD63 S5bTFiY5qL63MmB5MH1cNlfXWbDcK29cs7mC3NzfnueWE1xV2FYtx5wdv7hjX6gMpdjM f2YV1lukS2goyvMx5TzhUjA3GwgaE7q8d6CVjNJvzEwgWW81/HYIEBWanpoFc09459my eqW3+vC/tWPETcegzP6Bn1LU3SoJfY2FXRfN2X07c6pk7L/geim+Y/GNaJRJY0oCa+zw f+DN94LTGMAjjxZmQuI3wsonBfB3OpkVwdlGCxmR+miJA+DxRVwFK6zIMooLnJqKoiHC qgVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=TcjEz9mfr/CKdBKbDpnF0VhaiW/EVMTt3QtLKB+ca3o=; b=nbCJ9ej0LGV9fnp4irukswp0Wi8jsM18z3b/O00iPfY+MFpPLTsApOYd0P+ScoPuol uAzRrHK39GJnn8b9yA3vhRFpND9b5WqMmerBiqtOvNutPELAI8HQuZrnLx7oHNVtUdO4 umbvmc37sXhK9Hr6JuoNq1C6Ipkd4iTBv4U+Lj1OUJd1MObwjNgDYld6FOHfaFn6uFLd 0HvFS9Bi/mPZeXk+vk0brmU0HcWv+7FzQ6zuLbMFOGJyyvb1Cf8bvMpOSqPY68otcAAJ b2sFeiP/xn5X5uQM4Trl64eHU5p7yFvgmlEesQ9PSbe91yYrp03JOwkuEh6A4J9i44r2 a23Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nLdqiKtU; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id e17-20020a170903241100b0015d0c53ae0esi6435760plo.491.2022.05.09.05.48.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 May 2022 05:48:49 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=nLdqiKtU; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7302563CA; Mon, 9 May 2022 05:48:47 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234919AbiEIMwX (ORCPT + 99 others); Mon, 9 May 2022 08:52:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234907AbiEIMwR (ORCPT ); Mon, 9 May 2022 08:52:17 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9345291CE6; Mon, 9 May 2022 05:48:23 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 60DC160F83; Mon, 9 May 2022 12:48:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DC776C385AB; Mon, 9 May 2022 12:48:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1652100502; bh=PEkal2Mg+V+a1Uc4DwNy01rEfPajZVpu3xEyPDheL7g=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=nLdqiKtUmlalSDwS1n1Pd+QPOl3ZQ5Z/Px3VzfgDfi/iTVQE75a4Mx59sqYinG3Y/ 6JQ6tPdvw+spGntcA6/qBbzc0EWVWgvTCSeRhFt1TJnwfvlS403/93TzCxeTfDI7bL Y5ZOPkievIcmDHqw93+6S/hqUlUH3A62X+d+bPRUfl+mF2CJW2bDhz2iBiCLZyWITr djQO1sd/jt9/OKnyv7F6kp5sV6MSxq3kG3Cdq0CXHpH2YpoyabNaeZddaukEfsezds fZkIgyWymXJHtqeaAT7YcpcuzYjWrtm8sVHbltjmgphJAmYJA3YXh3Yq2Rnko/B5MD lwieyoWchdbeg== Date: Mon, 9 May 2022 14:48:15 +0200 From: Christian Brauner To: Miklos Szeredi Cc: linux-fsdevel@vger.kernel.org, Dave Chinner , Theodore Ts'o , Karel Zak , Greg KH , linux-kernel@vger.kernel.org, Linux API , linux-man , LSM , Ian Kent , David Howells , Linus Torvalds , Al Viro , Christian Brauner , Amir Goldstein , James Bottomley Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API Message-ID: <20220509124815.vb7d2xj5idhb2wq6@wittgenstein> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 03, 2022 at 02:23:23PM +0200, Miklos Szeredi wrote: > This is a simplification of the getvalues(2) prototype and moving it to the > getxattr(2) interface, as suggested by Dave. > > The patch itself just adds the possibility to retrieve a single line of > /proc/$$/mountinfo (which was the basic requirement from which the fsinfo > patchset grew out of). > > But this should be able to serve Amir's per-sb iostats, as well as a host of > other cases where some statistic needs to be retrieved from some object. Note: > a filesystem object often represents other kinds of objects (such as processes > in /proc) so this is not limited to fs attributes. > > This also opens up the interface to setting attributes via setxattr(2). > > After some pondering I made the namespace so: > > : - root > bar - an attribute > foo: - a folder (can contain attributes and/or folders) > > The contents of a folder is represented by a null separated list of names. > > Examples: > > $ getfattr -etext -n ":" . > # file: . > :="mnt:\000mntns:" > > $ getfattr -etext -n ":mnt:" . > # file: . > :mnt:="info" > > $ getfattr -etext -n ":mnt:info" . > # file: . > :mnt:info="21 1 254:0 / / rw,relatime - ext4 /dev/root rw\012" Hey Miklos, One comment about this. We really need to have this interface support giving us mount options like "relatime" back in numeric form (I assume this will be possible.). It is royally annoying having to maintain a mapping table in userspace just to do: relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME ro -> MS_RDONLY/MOUNT_ATTR_RDONLY A library shouldn't be required to use this interface. Conservative low-level software that keeps its shared library dependencies minimal will need to be able to use that interface without having to go to an external library that transforms text-based output to binary form (Which I'm very sure will need to happen if we go with a text-based interface.). > > $ getfattr -etext -n ":mntns:" . > # file: . > :mntns:="21:\00022:\00024:\00025:\00023:\00026:\00027:\00028:\00029:\00030:\00031:" > > $ getfattr -etext -n ":mntns:28:" . > # file: . > :mntns:28:="info" > > Comments? I'm not a fan of text-based APIs and I'm particularly not a fan of the xattr APIs. But at this point I'm ready to compromise on a lot as long as it gets us values out of the kernel in some way. :) I had to use xattrs extensively in various low-level userspace projects and they continue to be a source of races and memory bugs. A few initial questions: * The xattr APIs often require the caller to do sm like (copying some go code quickly as I have that lying around): for _, x := range split { xattr := string(x) // Call Getxattr() twice: First, to determine the size of the // buffer we need to allocate to store the extended attributes, // second, to actually store the extended attributes in the // buffer. Also, check if the size of the extended attribute // hasn't increased between the two calls. pre, err = unix.Getxattr(path, xattr, nil) if err != nil || pre < 0 { return nil, err } dest = make([]byte, pre) post := 0 if pre > 0 { post, err = unix.Getxattr(path, xattr, dest) if err != nil || post < 0 { return nil, err } } if post > pre { return nil, fmt.Errorf("Extended attribute '%s' size increased from %d to %d during retrieval", xattr, pre, post) } xattrs[xattr] = string(dest) } This pattern of requesting the size first by passing empty arguments, then allocating the buffer and then passing down that buffer to retrieve that value is really annoying to use and error prone (I do of course understand why it exists.). For real xattrs it's not that bad because we can assume that these values don't change often and so the race window between getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But fwiw, the post > pre check doesn't exist for no reason; we do indeed hit that race. In addition, it is costly having to call getxattr() twice. Again, for retrieving xattrs it often doesn't matter because it's not a super common operation but for mount and other info it might matter. Will we have to use the same pattern for mnt and other info as well? If so, I worry that the race is way more likely than it is for real xattrs. * Would it be possible to support binary output with this interface? I really think users would love to have an interfact where they can get a struct with binary info back. I'm not advocating to make the whole interface binary but I wouldn't mind having the option to support it. Especially for some information at least. I'd really love to have a way go get a struct mount_info or whatever back that gives me all the details about a mount encompassed in a single struct. Callers like systemd will have to parse text and will end up converting everything from text into binary anyway; especially for mount information. So giving them an option for this out of the box would be quite good. Interfaces like statx aim to be as fast as possible because we exptect them to be called quite often. Retrieving mount info is quite costly and is done quite often as well. Maybe not for all software but for a lot of low-level software. Especially when starting services in systemd a lot of mount parsing happens similar when starting containers in runtimes. * If we decide to go forward with this interface - and I think I mentioned this in the lsfmm session - could we please at least add a new system call? It really feels wrong to retrieve mount and other information through the xattr interfaces. They aren't really xattrs. Imho, xattrs are a bit like a wonky version of streams already (One of the reasons I find them quite unpleasant.). Making mount and other information retrievable directly through the getxattr() interface will turn them into a full-on streams implementation imho. I'd prefer not to do that (Which is another reason I'd prefer at least a separate system call.).