From: Beata Michalska Subject: Re: [RFC 1/4] fs: Add generic file system event notifications Date: Fri, 17 Apr 2015 15:15:34 +0200 Message-ID: <55310776.20200@samsung.com> References: <1429082147-4151-1-git-send-email-b.michalska@samsung.com> <1429082147-4151-2-git-send-email-b.michalska@samsung.com> <20150417113110.GD3116@quack.suse.cz> <553104E5.2040704@samsung.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: linux-kernel@vger.kernel.org, tytso@mit.edu, adilger.kernel@dilger.ca, hughd@google.com, lczerner@redhat.com, hch@infradead.org, linux-ext4@vger.kernel.org, linux-mm@kvack.org, kyungmin.park@samsung.com, kmpark@infradead.org, Linux Filesystem Mailing List , linux-api@vger.kernel.org To: Jan Kara Return-path: In-reply-to: <553104E5.2040704@samsung.com> Sender: owner-linux-mm@kvack.org List-Id: linux-ext4.vger.kernel.org On 04/17/2015 03:04 PM, Beata Michalska wrote: > On 04/17/2015 01:31 PM, Jan Kara wrote: >> On Wed 15-04-15 09:15:44, Beata Michalska wrote: >>> Introduce configurable generic interface for file >>> system-wide event notifications to provide file >>> systems with a common way of reporting any potential >>> issues as they emerge. >>> >>> The notifications are to be issued through generic >>> netlink interface, by a dedicated, for file system >>> events, multicast group. The file systems might as >>> well use this group to send their own custom messages. >>> >>> The events have been split into four base categories: >>> information, warnings, errors and threshold notifications, >>> with some very basic event types like running out of space >>> or file system being remounted as read-only. >>> >>> Threshold notifications have been included to allow >>> triggering an event whenever the amount of free space >>> drops below a certain level - or levels to be more precise >>> as two of them are being supported: the lower and the upper >>> range. The notifications work both ways: once the threshold >>> level has been reached, an event shall be generated whenever >>> the number of available blocks goes up again re-activating >>> the threshold. >>> >>> The interface has been exposed through a vfs. Once mounted, >>> it serves as an entry point for the set-up where one can >>> register for particular file system events. >>> >>> Signed-off-by: Beata Michalska >> Thanks for the patches! Some comments are below. >> >>> --- >>> Documentation/filesystems/events.txt | 254 +++++++++++ >>> fs/Makefile | 1 + >>> fs/events/Makefile | 6 + >>> fs/events/fs_event.c | 775 ++++++++++++++++++++++++++++++++++ >>> fs/events/fs_event.h | 27 ++ >>> fs/events/fs_event_netlink.c | 94 +++++ >>> fs/namespace.c | 1 + >>> include/linux/fs.h | 6 +- >>> include/linux/fs_event.h | 69 +++ >>> include/uapi/linux/fs_event.h | 62 +++ >>> include/uapi/linux/genetlink.h | 1 + >>> net/netlink/genetlink.c | 7 +- >>> 12 files changed, 1301 insertions(+), 2 deletions(-) >>> create mode 100644 Documentation/filesystems/events.txt >>> create mode 100644 fs/events/Makefile >>> create mode 100644 fs/events/fs_event.c >>> create mode 100644 fs/events/fs_event.h >>> create mode 100644 fs/events/fs_event_netlink.c >>> create mode 100644 include/linux/fs_event.h >>> create mode 100644 include/uapi/linux/fs_event.h >>> >>> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt >>> new file mode 100644 >>> index 0000000..c85dd88 >>> --- /dev/null >>> +++ b/Documentation/filesystems/events.txt >>> @@ -0,0 +1,254 @@ >>> + >>> + Generic file system event notification interface >>> + >>> +Document created 09 April 2015 by Beata Michalska >>> + >>> +1. The reason behind: >>> +===================== >>> + >>> +There are many corner cases when things might get messy with the filesystems. >>> +And it is not always obvious what and when went wrong. Sometimes you might >>> +get some subtle hints that there is something going on - but by the time >>> +you realise it, it might be too late as you are already out-of-space >>> +or the filesystem has been remounted as read-only (i.e.). The generic >>> +interface for the filesystem events fills the gap by providing a rather >>> +easy way of real-time notifications triggered whenever something intreseting >>> +happens, allowing filesystems to report events in a common way, as they occur. >>> + >>> +2. How does it work: >>> +==================== >>> + >>> +The interface itself has been exposed as fstrace-type Virtual File System, >>> +primarily to ease the process of setting up the configuration for the file >>> +system notifications. So for starters it needs to get mounted (obviously): >>> + >>> + mount -t fstrace none /sys/fs/events >>> + >>> +This will unveil the single fstrace filesystem entry - the 'config' file, >>> +through which the notification are being set-up. >>> + >>> +Activating notifications for particular filesystem is as straightforward >>> +as writing into the 'config' file. Note that by default all events despite >>> +the actual filesystem type are being disregarded. >> Is there a reason to have a special filesystem for this? Do you expect >> extending it by (many) more files? Why not just creating a file in sysfs or >> something like that? > > No particular reason here - just for possible future extension if needed. > I'm totally fine with having a single sysfs entry. > On the other hand .... sysfs entries are mostly single-valued or are sets of values of a single type, so not sure if we would fit in here - with the current configuration for the interface. >> >>> +Synopsis of config: >>> +------------------ >>> + >>> + MOUNT EVENT_TYPE [L1] [L2] >>> + >>> + MOUNT : the filesystem's mount point >> I'm not quite decided but is mountpoint really the right thing to pass >> via the interface? They aren't unique (filesystem can be mounted in >> multiple places) and more importantly can change over time. So won't it be >> better to pass major:minor over the interface? These are stable, unique to >> the filesystem, and userspace can easily get them by calling stat(2) on the >> desired path (or directly from /proc/self/mountinfo). That could be also >> used as an fs identifier instead of assigned ID (and thus we won't need >> those events about creation of new trace which look somewhat strange to >> me). >> > Even if a given filesystem is being mounted in many places this will come > down to single super_block - the interface will add trace for the first mount > point. This is just to ease the usage: internally a particular trace is > associated with a super_block. > >> OTOH using major:minor may have issues in container world where processes >> could watch events from filesystems inaccessible to the container if they >> guess the device number. So maybe we could use 'path' when creating new >> trace but I'd still like to use the device number internally and for all >> outgoing communication because of above mentioned problems with >> mountpoints. > > Alright then, so dropping the idea of announcing new trace (with assigned id) > and switching to using the major:minor numbers. Sounds OK to me. > >> >>> + EVENT_TYPE : type of events to be enabled: info,warn,err,thr; >>> + at least one type needs to be specified; >>> + note the comma delimiter and lack of spaces between >>> + those options >>> + L1 : the threshold limit - lower range >>> + L2 : the threshold limit - upper range >>> + case enabling threshold notifications the lower level is >>> + mandatory, whereas the upper one remains optional; >>> + note though, that as those refer to the number of available >>> + blocks, the lower level needs to be higher than the upper one >>> + >>> +Sample request could look like the follwoing: >>> + >>> + echo /sample/mount/point warn,err,thr 710000 500000 > /sys/fs/events/config >>> + >>> +Multiple request might be specified provided they are separated with semicolon. >> Is this necessary? It somewhat complicates syntax and parsing in kernel >> and I don't see a need for that. I'd prefer to keep the interface as simple >> as possible. >> > > This is not necessary but could ease the usage - i.e. through scripting: to specify > multiple traces and register them in one go. > >> Also I think that we should make it clear that each event type has >> different set of arguments. For threshold events they'll be L1 & L2, for >> other events there may be no arguments, for other events maybe something >> else... >> > > Currently only the threshold events use arguments - not sure what arguments > could be used for the remaining notifications. But any suggestions are welcomed. > >> ... >>> +static const match_table_t fs_etypes = { >>> + { FS_EVENT_INFO, "info" }, >>> + { FS_EVENT_WARN, "warn" }, >>> + { FS_EVENT_THRESH, "thr" }, >>> + { FS_EVENT_ERR, "err" }, >>> + { 0, NULL }, >>> +}; >> Why are there these generic message types? Threshold messages make good >> sense to me. But not so much the rest. If they don't have a clear meaning, >> it will be a mess. So I also agree with a message like - "filesystem has >> trouble, you should probably unmount and run fsck" - that's fine. But >> generic "info" or "warning" doesn't really carry any meaning on its own and >> thus seems pretty useless to me. To explain a bit more, AFAIU this >> shouldn't be a generic logging interface where something like severity >> makes sense but rather a relatively specific interface notifying about >> events in filesystem userspace should know about so I expect relatively low >> number of types of events, not tens or even hundreds... >> >> Honza > > Getting rid of those would simplify the configuration part, indeed. > So we would be left with 'generic' and threshold events. > I guess I've overdone this part. > > Thanks for Your comments so far. > > BR > Beata > > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org