Received: by 2002:a05:7208:9594:b0:7e:5202:c8b4 with SMTP id gs20csp2264966rbb; Tue, 27 Feb 2024 16:59:53 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCWG3OBLHxK4SvlUVH6OgMgW+MUjFkpZinLbgQh/jNm7lZrqsAnyCYOgzKSh8fNnV6fQKPFqZU/9wgv557V8/N6M7+1l46mEPVVxu9hQIw== X-Google-Smtp-Source: AGHT+IGNBUBBNqdmA0XmcVGTrY1ZRaU3NvatAXrcXnb0/6q6M4pCZp4ZdAyU9dMr9gSZGTRGrS8A X-Received: by 2002:a17:90b:1bc4:b0:299:3035:aede with SMTP id oa4-20020a17090b1bc400b002993035aedemr7704228pjb.44.1709081993654; Tue, 27 Feb 2024 16:59:53 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709081993; cv=pass; d=google.com; s=arc-20160816; b=JiqYadNRq/AeFIBH6XP6GxX3/SAW3KNaFjVe3rnMvDU8Zu05pJmHwzdcWV9WR57Fyb jfmqZhqyNeVc8nsyn/7vHbvnwFe5pUhX6KLCmaOL9KmknUC50cu2zdt1iT/xHWn+3/Tt iQgFIk20U3PBUNuzWKJxnOpr814la7ydxTSlL8/AFu/kt34vwNMOpVdoPEiEHJYU6fj2 UJqbal6pmlFwb4yH/UWZWP6HUv/mMCWm+2rKWvMyxNf5nna66A/jMstyIgk6Pw9YkgYQ WMsp2TA03l8kJuNuRfRihvVViQm4VYwYHcAxlXKQNXAqVau24rtfUs39NqH71ThzbvJC f8Aw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:sender:dkim-signature; bh=Y/VJ4ezn4JfkgFtFkFYYYQTXIflXruAY0C3laMtvGzA=; fh=Siej7kE5kjZBpxTLjE+6DQ6F6iq7wezWpf6FlFRPV1I=; b=KcvX/LYgPtt5oCKQWRCIDnrvaxfRlzqh5LIs6kMaA3jC9bM7Z7zsXksmE396FEZHTz 7PAaiUTbQji4YhYi+wDJAf/1iDYLXefEPdxFKPRdpMaxoWSG8a8FAg+sDPuKNxx+ayuE Xv4bZzdZp7D4Jn5GMDKoKz6yCFV7xAAWTKvQG+jaDmC0Flj9tGq6zlf/KrPPyl0yFoOD jVii9et0mbGVNWFrvKICzFFnwHHQiaVZRtHgOXXOYYXtET989+lTOwbJFqyZ80UpBpa5 T/H/3lePWsv23ycxYN916udtrFc7AP+lTMo64CjyerbKqgj9NwDNbLiMDJ5cAVI5guu3 xiRw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Ps37R3t6; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-84294-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-84294-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id nl2-20020a17090b384200b0029af3763c73si240058pjb.58.2024.02.27.16.59.53 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 27 Feb 2024 16:59:53 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-84294-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Ps37R3t6; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-84294-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-84294-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 5E11C285FA2 for ; Wed, 28 Feb 2024 00:59:53 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 09C0520EE; Wed, 28 Feb 2024 00:59:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Ps37R3t6" Received: from mail-oa1-f49.google.com (mail-oa1-f49.google.com [209.85.160.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CBEFDEBB; Wed, 28 Feb 2024 00:59:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.49 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709081983; cv=none; b=OJ1Jg+062nGMrCWkSKmkTjdnm2phmKY9a+wJF3dyEOe0lJiUqXQaSmDeBRGipK3PG600UoaRA9BF1irm658FdZeSrheyqZSPgPbgFh4KLRUnNwbjd/PIRL8bKQFoF4YZYiG3Z5ya47ZWFQx3SGtV8hOeRRtG6xqHEgQ5SX6GYl4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709081983; c=relaxed/simple; bh=6mAwiUtOknSSutvq0YNFLvWJ9KCKfA9wjlmgNIu1avE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=giA/uR0iWTXGj3gevN6CQ5xAdW1U8zWDebMZthUjffkjVppVKjwO3wpmoLnvXxx6ipIdhSEUvhCzMoh9z7LpSMw38/MGxESFUpj7mUvunZoBf0KWSVREW7Pf1BQAjWqtkzB07cPY+eDTTF7c0ohtNPzdHlPwYVUGIRS8CnYfRyo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=groves.net; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Ps37R3t6; arc=none smtp.client-ip=209.85.160.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=groves.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-oa1-f49.google.com with SMTP id 586e51a60fabf-21fa872dce3so2760214fac.2; Tue, 27 Feb 2024 16:59:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709081979; x=1709686779; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:sender:from:to:cc:subject:date:message-id :reply-to; bh=Y/VJ4ezn4JfkgFtFkFYYYQTXIflXruAY0C3laMtvGzA=; b=Ps37R3t60IfbP9rnsIhEOeC39uUcRHWCWlso5hL5yIp7CycH4z7ObGZNRnl3rgrz+V q6hSeKHWeC0V9TdPDHNGeGx+X0yXjZYx+aPNwEWJPkPPq7zk4F4aKE3lOGx6ltpWtVmG YGA1neRx7vMOd8/3XahVCV7uqecHdNKbameqR7uDA0lXb5Gh+H/17T+DcWfxlY+rV7cK SCDS8k/0EgyleXAQdkPxN3myLO51FaidBBg7yDPfoOQRNledV8LuD+pVfX29RVTdo/qO FpeXsNypti3QHaYbG8vy00kqq3zQFVZR3D+lxM/kp+4J/SPmken5u/kU6zREaw/pcny2 FZKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709081979; x=1709686779; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:sender:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y/VJ4ezn4JfkgFtFkFYYYQTXIflXruAY0C3laMtvGzA=; b=ORp14QFxRirn7q7UgyycEZ2nGijb1/EOcufLCxlsKJ68qZytC/tzRAJkEMMFuWjOJJ nnGna6vop10cF0Z1XGIDzlK1WKkQLbSnP1ShPsljx9+QC46AF8ofrEu/8R7vkon+thh2 HwfMUjml7nOewM9XP2b4vj5UVvLjF01Fte1Mgm4UpLHCF4DUb1GlTLN0bNozqM8HO8Vl 6wSvgyla0SMpl/Exe2Rd5c+4IFbSMrp+7Dtj9KDxhJ2JSYTbb0RQzAdquCq4AO8aMdnp V+KQZH1HMBc+wuE72SzuutdKa6wfb1sbg7PU4KSPgOJZdyMZ9gZwKHo7WHP8Tq8tRLK/ mMvA== X-Forwarded-Encrypted: i=1; AJvYcCWzGUfKx0wixXxaztnmPSZxNfW11PTgzovkNrIBhDb0tgzFpn3wHwksWWm8DmweDHMlv8yRN7NljRMqEEOwd5QPuy/Wv8AFeI4E7ubbJVjqPGdzzr6ecX1Ar3CYIwlvCvqZyxKMRBogq9t3dv8U8aykiuguS0mm7z+daSkjRjZus5mXcO0wT7bXEoDFAcFBykefXHZO/4Qh4Q+pEYflIpiEmA== X-Gm-Message-State: AOJu0Yx08tMlYAj/wRoKsWUo0uHyz8fx5W+JaBdQ3ldy2n/lxgRnqo8H iw9gNueBeURwHKjxRph/E2b73p4nLl2QJzGbQCdsIoF8Cc/DSeU0 X-Received: by 2002:a05:6870:55d2:b0:21e:5827:9483 with SMTP id qk18-20020a05687055d200b0021e58279483mr13395251oac.29.1709081978556; Tue, 27 Feb 2024 16:59:38 -0800 (PST) Received: from Borg-9.local (070-114-203-196.res.spectrum.com. [70.114.203.196]) by smtp.gmail.com with ESMTPSA id xm12-20020a0568709f8c00b0021fb37e33e5sm2322033oab.19.2024.02.27.16.59.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 27 Feb 2024 16:59:38 -0800 (PST) Sender: John Groves Date: Tue, 27 Feb 2024 18:59:36 -0600 From: John Groves To: Christian Brauner Cc: John Groves , Jonathan Corbet , Dan Williams , Vishal Verma , Dave Jiang , Alexander Viro , Jan Kara , Matthew Wilcox , linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev, john@jagalactic.com, Dave Chinner , Christoph Hellwig , dave.hansen@linux.intel.com, gregory.price@memverge.com Subject: Re: [RFC PATCH 11/20] famfs: Add fs_context_operations Message-ID: <6jrtl2vc4dmi5b6db6tte2ckiyjmiwezbtlwrtmm464v65wkhj@znzv2mwjfgsk> References: <20240227-mammut-tastatur-d791ca2f556b@brauner> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240227-mammut-tastatur-d791ca2f556b@brauner> On 24/02/27 02:41PM, Christian Brauner wrote: > On Fri, Feb 23, 2024 at 11:41:55AM -0600, John Groves wrote: > > This commit introduces the famfs fs_context_operations and > > famfs_get_inode() which is used by the context operations. > > > > Signed-off-by: John Groves > > --- > > fs/famfs/famfs_inode.c | 178 +++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 178 insertions(+) > > > > diff --git a/fs/famfs/famfs_inode.c b/fs/famfs/famfs_inode.c > > index 82c861998093..f98f82962d7b 100644 > > --- a/fs/famfs/famfs_inode.c > > +++ b/fs/famfs/famfs_inode.c > > +enum famfs_param { > > + Opt_mode, > > + Opt_dax, > > +}; > > + > > +const struct fs_parameter_spec famfs_fs_parameters[] = { > > + fsparam_u32oct("mode", Opt_mode), > > + fsparam_string("dax", Opt_dax), > > + {} > > +}; > > + > > +static int famfs_parse_param( > > + struct fs_context *fc, > > + struct fs_parameter *param) > > +{ > > + struct famfs_fs_info *fsi = fc->s_fs_info; > > + struct fs_parse_result result; > > + int opt; > > + > > + opt = fs_parse(fc, famfs_fs_parameters, param, &result); > > + if (opt == -ENOPARAM) { > > + opt = vfs_parse_fs_param_source(fc, param); > > + if (opt != -ENOPARAM) > > + return opt; > > I'm not sure I understand this. But in any case add, you should add > Opt_source to enum famfs_param and then add > > fsparam_string("source", Opt_source), > > to famfs_fs_parameters. Then you can add: > > famfs_parse_source(fc, param); > > You might want to consider validating your devices right away. So think > about: > > fd_fs = fsopen("famfs", ...); > ret = fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/definitely/not/valid/device", ...) // succeeds > ret = fsconfig(fd_fs, FSCONFIG_SET_FLAG, "OPTION_1", ...) // succeeds > ret = fsconfig(fd_fs, FSCONFIG_SET_FLAG, "OPTION_2", ...) // succeeds > ret = fsconfig(fd_fs, FSCONFIG_SET_FLAG, "OPTION_3", ...) // succeeds > ret = fsconfig(fd_fs, FSCONFIG_SET_FLAG, "OPTION_N", ...) // succeeds > ret = fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...) // superblock creation failed > > So what failed exactly? Yes, you can log into the fscontext and dmesg > that it's @source that's the issue but it's annoying for userspace to > setup a whole mount context only to figure out that some option was > wrong at the end of it. > > So validating > > famfs_parse_source(...) > { > if (fc->source) > return invalfc(fc, "Uhm, we already have a source.... > > lookup_bdev(fc->source, &dev) > // validate it's a device you're actually happy to use > > fc->source = param->string; > param->string = NULL; > } > > Your ->get_tree implementation that actually creates/finds the > superblock will validate fc->source again and yes, there's a race here > in so far as the path that fc->source points to could change in between > validating this in famfs_parse_source() and ->get_tree() superblock > creation. This is fixable even right now but then you couldn't reuse > common infrastrucute so I would just accept that race for now and we > should provide a nicer mechanism on the vfs layer. I wasn't aware of the new fsconfig interface. Is there documentation or a file sytsem that already uses it that I should refer to? I didn't find an obvious candidate, but it might be me. If it should be obvious from the example above, tell me and I'll try harder. My famfs code above was copied from ramfs. If you point me to documentation I might send you a ramfs fsconfig patch too :D. > > > + > > + return 0; > > + } > > + if (opt < 0) > > + return opt; > > + > > + switch (opt) { > > + case Opt_mode: > > + fsi->mount_opts.mode = result.uint_32 & S_IALLUGO; > > + break; > > + case Opt_dax: > > + if (strcmp(param->string, "always")) > > + pr_notice("%s: invalid dax mode %s\n", > > + __func__, param->string); > > + break; > > + } > > + > > + return 0; > > +} > > + > > +static DEFINE_MUTEX(famfs_context_mutex); > > +static LIST_HEAD(famfs_context_list); > > + > > +static int famfs_get_tree(struct fs_context *fc) > > +{ > > + struct famfs_fs_info *fsi_entry; > > + struct famfs_fs_info *fsi = fc->s_fs_info; > > + > > + fsi->rootdev = kstrdup(fc->source, GFP_KERNEL); > > + if (!fsi->rootdev) > > + return -ENOMEM; > > + > > + /* Fail if famfs is already mounted from the same device */ > > + mutex_lock(&famfs_context_mutex); > > + list_for_each_entry(fsi_entry, &famfs_context_list, fsi_list) { > > + if (strcmp(fsi_entry->rootdev, fc->source) == 0) { > > + mutex_unlock(&famfs_context_mutex); > > + pr_err("%s: already mounted from rootdev %s\n", __func__, fc->source); > > + return -EALREADY; > > What errno is EALREADY? Isn't that socket stuff. In any case, it seems > you want EBUSY? Thanks... That should probaby be EBUSY. But the whole famfs_context_list should probably also be removed. More below... > > But bigger picture I'm lost. And why do you keep that list based on > strings? What if I do: > > mount -t famfs /dev/pmem1234 /mnt # succeeds > > mount -t famfs /dev/pmem1234 /opt # ah, fsck me, this fails.. But wait a minute.... > > mount --bind /dev/pmem1234 /evil-masterplan > > mount -t famfs /evil-masterplan /opt # succeeds. YAY > > I believe that would trivially defeat your check. > And I suspect this is related to the get_tree issue you noticed below. This famfs code was working in 6.5 without keeping the linked list of devices, but in 6.6/6.7/6.8 it works provided you don't try to repeat a mount command that has already succeeded. I'm not sure why 6.5 protected me from that, but the later versions don't. In 6.6+ That hits a BUG_ON (have specifics on that but not handy right now). So for a while we just removed repeated mount requests from the famfs smoke tests, but eventually I implemented the list above, which - though you're right it would be easy to circumvent and therefore is not right - it did solve the problem that we were testing for. I suspect that correctly handling get_tree might solve this problem. Please assume that linked list will be removed - it was not the right solution. More below... > > + } > > + } > > + > > + list_add(&fsi->fsi_list, &famfs_context_list); > > + mutex_unlock(&famfs_context_mutex); > > + > > + return get_tree_nodev(fc, famfs_fill_super); > > So why isn't this using get_tree_bdev()? Note that a while ago I > added FSCONFIG_CMD_CREAT_EXCL which prevents silent superblock reuse. To > implement that I added fs_context->exclusive. If you unconditionally set > fc->exclusive = 1 in your famfs_init_fs_context() and use > get_tree_bdev() it will give you EBUSY if fc->source is already in use - > including other famfs instances. > > I also fail to yet understand how that function which actually opens the block > device and gets the dax device figures into this. It's a bit hard to follow > what's going on since you add all those unused functions and types so there's > never a wider context to see that stuff in. Clearly that's a bug in my code. That get_tree_nodev() is from ramfs, which was the starting point for famfs. I'm wondering if doing this correctly (get_tree_bdev() when it's pmem) would have solved my double mount problem on 6.6 onward. However, there's another wrinkle: I'm concluding (see https://lore.kernel.org/linux-fsdevel/ups6cvjw6bx5m3hotn452brbbcgemnarsasre6ep2lbe4tpjsy@ezp6oh5c72ur/) that famfs should drop block support and just work with /dev/dax. So famfs may be the first file system to be hosted on a character device? Certainly first on character dax. Given that, what variant of get_tree() should it call? Should it add get_tree_dax()? I'm not yet familiar enough with that code to have a worthy opinion on this. Please let me know what you think. Thank you for the serious review! John