From: Miklos Szeredi
Date: Thu, 2 Apr 2020 17:19:03 +0200
Subject: Re: [PATCH 13/17] watch_queue: Implement mount topology and attribute change notifications [ver #5]
To: David Howells
Cc: Linus Torvalds, Al Viro, Casey Schaufler, Stephen Smalley, nicolas.dichtel@6wind.com, Ian Kent, Christian Brauner, andres@anarazel.de, Jeff Layton, dray@redhat.com, Karel Zak, keyrings@vger.kernel.org, Linux API, linux-fsdevel@vger.kernel.org, LSM, linux-kernel@vger.kernel.org
In-Reply-To: <158454391302.2863966.1884682840541676280.stgit@warthog.procyon.org.uk>

On Wed, Mar 18, 2020 at 4:05 PM David Howells wrote:
>
> Add a mount notification facility whereby notifications about changes in
> mount topology and configuration can be received. Note that this only
> covers vfsmount topology changes and not superblock events. A separate
> facility will be added for that.
>
> Every mount is given a change counter than counts the number of topological
> rearrangements in which it is involved and the number of attribute changes
> it undergoes. This allows notification loss to be dealt with.

Isn't queue overrun signalled anyway?

If an event is lost, there's no way to know which object was affected,
so how does the counter help here?

> Later
> patches will provide a way to quickly retrieve this value, along with
> information about topology and parameters for the superblock.

So? If we receive a notification for MNT1 with change counter CTR1
and then receive the info for MNT1 with CTR2, then we know that we
either missed a notification or we raced and will receive the
notification later. This helps with not having to redo the query
when we receive the notification with CTR2, but this is just an
optimization, not really useful.

> Firstly, a watch queue needs to be created:
>
>         pipe2(fds, O_NOTIFICATION_PIPE);
>         ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);
>
> then a notification can be set up to report notifications via that queue:
>
>         struct watch_notification_filter filter = {
>                 .nr_filters = 1,
>                 .filters = {
>                         [0] = {
>                                 .type = WATCH_TYPE_MOUNT_NOTIFY,
>                                 .subtype_filter[0] = UINT_MAX,
>                         },
>                 },
>         };
>         ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
>         watch_mount(AT_FDCWD, "/", 0, fds[1], 0x02);
>
> In this case, it would let me monitor the mount topology subtree rooted at
> "/" for events. Mount notifications propagate up the tree towards the
> root, so a watch will catch all of the events happening in the subtree
> rooted at the watch.

Does it make sense to watch a single mount? A set of mounts? A
subtree with an exclusion list (subtrees, types, ???)?

Not asking for these to be implemented initially, just questioning
whether the API is flexible enough to allow these cases to be
implemented later if needed.

>
> After setting the watch, records will be placed into the queue when, for
> example, as superblock switches between read-write and read-only. Records
> are of the following format:
>
>         struct mount_notification {
>                 struct watch_notification watch;
>                 __u32   triggered_on;
>                 __u32   auxiliary_mount;

What guarantees that mount_id is going to remain a 32-bit entity?

>                 __u32   topology_changes;
>                 __u32   attr_changes;
>                 __u32   aux_topology_changes;

Being 32-bit, this introduces wraparound effects.
Is that really worth it?

>         } *n;
>
> Where:
>
>         n->watch.type will be WATCH_TYPE_MOUNT_NOTIFY.
>
>         n->watch.subtype will indicate the type of event, such as
>         NOTIFY_MOUNT_NEW_MOUNT.
>
>         n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
>         record.

Hmm, size of record limited to 112 bytes? Is this verified somewhere?
Don't see a BUILD_BUG_ON() in watch_sizeof().

>
>         n->watch.info & WATCH_INFO_ID will be the fifth argument to
>         watch_mount(), shifted.
>
>         n->watch.info & NOTIFY_MOUNT_IN_SUBTREE if true indicates that the
>         notifcation was generated in the mount subtree rooted at the watch,

notification

>         and not actually in the watch itself.
>
>         n->watch.info & NOTIFY_MOUNT_IS_RECURSIVE if true indicates that
>         the notifcation was generated by an event (eg. SETATTR) that was
>         applied recursively. The notification is only generated for the
>         object that initially triggered it.

Unused in this patchset. Please don't add things to the API which are not used.

>
>         n->watch.info & NOTIFY_MOUNT_IS_NOW_RO will be used for
>         NOTIFY_MOUNT_READONLY, being set if the superblock becomes R/O, and
>         being cleared otherwise,

Does this refer to the mount r/o flag or the superblock r/o flag? Confused.

>         and for NOTIFY_MOUNT_NEW_MOUNT, being set
>         if the new mount is a submount (e.g. an automount).

Huh? What does the r/o flag have to do with being a submount?

>
>         n->watch.info & NOTIFY_MOUNT_IS_SUBMOUNT if true indicates that the
>         NOTIFY_MOUNT_NEW_MOUNT notification is in response to a mount
>         performed by the kernel (e.g. an automount).
>
>         n->triggered_on indicates the ID of the mount to which the change
>         was accounted (e.g. the new parent of a new mount).

For move there are two parents that are affected. This doesn't look
sufficient to reflect that.

>
>         n->axiliary_mount indicates the ID of an additional mount that was
>         affected (e.g. a new mount itself) or 0.
>
>         n->topology_changes provides the value of the topology change
>         counter of the triggered-on mount at the conclusion of the
>         operarion.

operation

>
>         n->attr_changes provides the value of the attribute change counter
>         of the triggered-on mount at the conclusion of the operarion.

operation

>
>         n->aux_topology_changes provides the value of the topology change
>         counter of the auxiliary mount at the conclusion of the operation.
>
> Note that it is permissible for event records to be of variable length -
> or, at least, the length may be dependent on the subtype. Note also that
> the queue can be shared between multiple notifications of various types.

Will review code later...

Thanks,
Miklos
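
For illustration only, the CTR1/CTR2 comparison and the 32-bit wraparound
concern raised above can be sketched in userspace C. This is not code from
the patch set: ctr_after(), compare_ctr() and enum ctr_state are
hypothetical names, and the sketch assumes the change counters are plain
32-bit values that only ever increment.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch set): "a is newer than b" for
 * 32-bit counters that may wrap, using the signed-difference idiom.
 * Only meaningful while a and b are less than 2^31 increments apart.
 * The unsigned-to-signed conversion is implementation-defined in ISO C
 * but behaves as two's complement on mainstream compilers.
 */
static inline int ctr_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Possible outcomes of comparing a notified counter with a queried one. */
enum ctr_state {
	CTR_IN_SYNC,      /* notification and query agree */
	CTR_QUERY_NEWER,  /* query is ahead: a notification was missed,
			   * or it is still in flight (race) */
	CTR_NOTIFY_NEWER, /* notification is ahead: cached info is stale */
};

/* Classify the relationship between the two counter values. */
static inline enum ctr_state compare_ctr(uint32_t notified, uint32_t queried)
{
	if (notified == queried)
		return CTR_IN_SYNC;
	return ctr_after(queried, notified) ? CTR_QUERY_NEWER
					    : CTR_NOTIFY_NEWER;
}
```

With 32-bit counters this ordering is only reliable while the two values are
fewer than 2^31 changes apart; past that point a stale value compares as
newer, which is the wraparound effect in question.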