From: Gabriel Krisman Bertazi
To: amir73il@gmail.com, tytso@mit.edu, djwong@kernel.org
Cc: david@fromorbit.com, jack@suse.com, dhowells@redhat.com,
    khazhy@google.com, linux-fsdevel@vger.kernel.org,
    linux-ext4@vger.kernel.org, Gabriel Krisman Bertazi,
    kernel@collabora.com
Subject: [PATCH RFC 00/15] File system wide monitoring
Date: Mon, 26 Apr 2021 14:41:46 -0400
Message-Id: <20210426184201.4177978-1-krisman@collabora.com>

Hi,

In an attempt to consolidate some of the feedback from the previous
proposals, I wrote a new attempt to solve the file system error
reporting problem.
Before I spend more time polishing it, I'd like to hear your feedback
on whether I'm going in the wrong direction, in particular with the
modifications to fsnotify.

This RFC follows up on my previous proposals, which attempted to
leverage watch_queue[1] and fsnotify[2] to provide a mechanism for file
systems to push error notifications to user space.  This proposal
starts by, as suggested by Darrick, limiting the scope of what I'm
trying to do to an interface for administrators to monitor the health
of a file system, instead of a generic interface for file errors.
Therefore, this doesn't solve the problem of writeback errors or the
need to watch a specific subsystem.

* Format

The feature is implemented on top of fanotify, as a new type of
fanotify mark, FAN_ERROR, which a file system monitoring tool can
register to receive notifications.  A notification is split into three
parts, and only the first is guaranteed to exist for any given error
event:

 - FS generic data: A file system agnostic structure that has a generic
   error code and identifies the filesystem.  Basically, it lets
   userspace know something happened on a monitored filesystem.

 - FS location data: Identifies where in the code the problem happened.
   (This is important for the use case of analysing frequent error
   points that we discussed earlier.)

 - FS specific data: A detailed error report in a filesystem specific
   format that details what the error is.  Ideally, a capable
   monitoring tool can use the information here for error recovery.
   For instance, xfs can put the xfs_scrub structures here, ext4 can
   send its error reports, etc.  An example of usage is in the ext4
   patch of this series.

More details on the information in each record can be found in the
documentation introduced in patch 15.

* Using fanotify

Using fanotify for this kind of thing is slightly tricky because we
want to guarantee delivery in some complicated conditions.  For
instance, the file system might want to send an error while holding
several locks.

Instead of working around file system constraints at the file system
level, this proposal tries to make the FAN_ERROR submission safe in
those contexts.  This is done with a new mode in fsnotify that
preallocates, at group creation, the memory to be used for notification
submission.

This new mode in fsnotify introduces a ring buffer to queue
notifications, which eliminates the allocation path in fsnotify.  From
what I saw, the allocation is the only thing in fsnotify that prevents
filesystems from submitting errors in constrained situations.

* Visibility

Since the use case is limited to a tool for whole file system
monitoring, errors are associated with the superblock and visible
filesystem-wide.  Userspace is assumed, and required, to have
CAP_SYS_ADMIN.

* Testing

This was tested with corrupted ext4 images in a few scenarios, which
triggered errors that were monitored with the sample tool provided in
the next-to-last patch.

* Patches

Patches 1-4 refactor fanotify a bit to prepare for the patches to come.
Patch 5 introduces the ring buffer interface to fsnotify, while patch 6
enables this support in fanotify.  Patches 7 and 8 wire up the FS_ERROR
event type, which will be used by filesystems.  In sequence, patches
9-12 implement the FAN_ERROR record types and create the new event.
Patch 13 is an ext4 example implementation supporting this feature.
Finally, patch 14 provides an example userspace tool that uses this
feature, and patch 15 documents it.

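
To illustrate the preallocation idea from the "Using fanotify" section,
here is a minimal userspace sketch.  It is not the code in
fs/notify/ring.c: the names err_ring and err_record, the record layout,
and the drop-when-full policy are all assumptions made up for this
example, and it does no locking.  It only shows that once every slot is
allocated at creation time, submission can run without touching the
allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size record; not the layout used by the series. */
struct err_record {
	int error;		/* generic error code */
	char fs_name[16];	/* identifies the file system */
};

/* Ring whose slots are all allocated up front. */
struct err_ring {
	struct err_record *slots;
	size_t capacity;
	size_t head;		/* next slot to read */
	size_t count;		/* records currently queued */
	size_t dropped;		/* submissions rejected because the ring was full */
};

/* "Group creation": the only place in this sketch that allocates. */
static bool err_ring_init(struct err_ring *r, size_t capacity)
{
	assert(r != NULL && capacity > 0);
	r->slots = calloc(capacity, sizeof(*r->slots));
	if (!r->slots)
		return false;
	r->capacity = capacity;
	r->head = 0;
	r->count = 0;
	r->dropped = 0;
	return true;
}

/* Submission: fills a preallocated slot and never allocates. */
static bool err_ring_submit(struct err_ring *r, int error, const char *fs_name)
{
	struct err_record *rec;

	if (r->count == r->capacity) {
		/* Assumed policy: drop the new record and count the loss. */
		r->dropped++;
		return false;
	}
	rec = &r->slots[(r->head + r->count) % r->capacity];
	rec->error = error;
	strncpy(rec->fs_name, fs_name, sizeof(rec->fs_name) - 1);
	rec->fs_name[sizeof(rec->fs_name) - 1] = '\0';
	r->count++;
	return true;
}

/* Reader side: copies out the oldest record, if there is one. */
static bool err_ring_read(struct err_ring *r, struct err_record *out)
{
	if (r->count == 0)
		return false;
	*out = r->slots[r->head];
	r->head = (r->head + 1) % r->capacity;
	r->count--;
	return true;
}

static void err_ring_destroy(struct err_ring *r)
{
	free(r->slots);
	r->slots = NULL;
	r->capacity = 0;
	r->head = 0;
	r->count = 0;
}
```

A caller would run err_ring_init() once, call err_ring_submit() from
the constrained context, and drain with err_ring_read() from the
reader.  A real implementation would also need to handle concurrent
submitters and variable-sized records, which this sketch leaves out.
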
I also pushed the full series to:

  https://gitlab.collabora.com/krisman/linux -b fanotify-notifications

[1] https://lwn.net/Articles/839310/
[2] https://www.spinics.net/lists/linux-fsdevel/msg187075.html

Gabriel Krisman Bertazi (15):
  fanotify: Fold event size calculation to its own function
  fanotify: Split fsid check from other fid mode checks
  fsnotify: Wire flags field on group allocation
  fsnotify: Wire up group information on event initialization
  fsnotify: Support event submission through ring buffer
  fanotify: Support submission through ring buffer
  fsnotify: Support FS_ERROR event type
  fsnotify: Introduce helpers to send error_events
  fanotify: Introduce generic error record
  fanotify: Introduce code location record
  fanotify: Introduce filesystem specific data record
  fanotify: Introduce the FAN_ERROR mark
  ext4: Send notifications on error
  samples: Add fs error monitoring example
  Documentation: Document the FAN_ERROR framework

 .../admin-guide/filesystem-monitoring.rst | 103 ++++++
 Documentation/admin-guide/index.rst       |   1 +
 fs/ext4/super.c                           |  60 +++-
 fs/notify/Makefile                        |   2 +-
 fs/notify/dnotify/dnotify.c               |   2 +-
 fs/notify/fanotify/fanotify.c             | 127 +++++--
 fs/notify/fanotify/fanotify.h             |  35 +-
 fs/notify/fanotify/fanotify_user.c        | 319 ++++++++++++++----
 fs/notify/fsnotify.c                      |   2 +-
 fs/notify/group.c                         |  25 +-
 fs/notify/inotify/inotify_fsnotify.c      |   2 +-
 fs/notify/inotify/inotify_user.c          |   4 +-
 fs/notify/notification.c                  |  10 +
 fs/notify/ring.c                          | 199 +++++++++++
 include/linux/fanotify.h                  |  12 +-
 include/linux/fsnotify.h                  |  15 +
 include/linux/fsnotify_backend.h          |  63 +++-
 include/uapi/linux/ext4-notify.h          |  17 +
 include/uapi/linux/fanotify.h             |  26 ++
 kernel/audit_fsnotify.c                   |   2 +-
 kernel/audit_tree.c                       |   2 +-
 kernel/audit_watch.c                      |   2 +-
 samples/Kconfig                           |   7 +
 samples/Makefile                          |   1 +
 samples/fanotify/Makefile                 |   3 +
 samples/fanotify/fs-monitor.c             | 135 ++++++++
 26 files changed, 1034 insertions(+), 142 deletions(-)
 create mode 100644 Documentation/admin-guide/filesystem-monitoring.rst
 create mode 100644 fs/notify/ring.c
 create mode 100644 include/uapi/linux/ext4-notify.h
 create mode 100644 samples/fanotify/Makefile
 create mode 100644 samples/fanotify/fs-monitor.c

-- 
2.31.0