Date: Sun, 16 Apr 2023 16:10:42 +0800
From: Qu Wenruo
To: "Darrick J. Wong", lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, xfs, linux-ext4, linux-btrfs
Message-ID: <9b689664-095c-f2cf-7ba5-86303df3722b@gmx.com>
Subject: Re: [LSF TOPIC] online repair of filesystems: what next?
On 2023/3/1 04:49, Darrick J. Wong wrote:
> Hello fsdevel people,
>
> Five years ago[0], we started a conversation about cross-filesystem
> userspace tooling for online fsck. I think enough time has passed for
> us to have another one, since a few things have happened since then:
>
> 1. ext4 has gained the ability to send corruption reports to a userspace
>    monitoring program via fsnotify. Thanks, Collabora!

I'm not familiar with the new fsnotify mechanism; is there an article
to start with?

I really believe we should have a generic interface to report errors.
Currently btrfs reports the extra details (the logical/physical
addresses of the corruption, the reason, the involved inodes, etc.)
only through dmesg, which is far from ideal.

> 2. XFS now tracks successful scrubs and corruptions seen during runtime
>    and during scrubs. Userspace can query this information.
>
> 3. Directory parent pointers, which enable online repair of the
>    directory tree, are nearing completion.
>
> 4. Dave and I are working on merging online repair of space metadata for
>    XFS. Online repair of directory trees is feature complete, but we
>    still have one or two unresolved questions in the parent pointer
>    code.
>
> 5. I've gotten a bit better[1] at writing systemd service descriptions
>    for scheduling and performing background online fsck.
>
> Now that fsnotify_sb_error exists as a result of (1), I think we
> should figure out how to plumb calls into the readahead and writeback
> code so that IO failures can be reported to the fsnotify monitor. I
> suspect there may be a few difficulties here since fsnotify (iirc)
> allocates memory and takes locks.
>
> As a result of (2), XFS now retains quite a bit of incore state about
> its own health. The structure that fsnotify gives to userspace is very
> generic (superblock, inode, errno, errno count). How might XFS export
> a greater amount of information via this interface? We can provide
> details at finer granularity -- for example, a specific data structure
> under an allocation group or an inode, or specific quota records.

The same applies to btrfs. Some btrfs-specific info, like the subvolume
id, is also needed to locate a corrupted inode (an inode number is not
unique across the whole fs, only within one subvolume). And the file
paths of the corrupted inode would also be very helpful for end users
to locate (and usually delete) the offending file.
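On the btrfs side, the inode-to-path half of that already exists as an
ioctl. As a rough illustration only -- the wiring from a generic error
report to this call is hypothetical, and this sketch is untested -- a
monitoring daemon that learned {subvolume, inode number} from a report
could resolve it to paths like this:

#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/*
 * Print every path (relative to the subvolume root) that links to
 * inode `ino`.  `fd` must point into the subvolume that owns the
 * inode, because the inode number alone is ambiguous across
 * subvolumes.
 */
static void print_paths(int fd, __u64 ino)
{
	char buf[64 * 1024];
	struct btrfs_data_container *data = (void *)buf;
	struct btrfs_ioctl_ino_path_args args = {
		.inum = ino,
		.size = sizeof(buf),
		.fspath = (uintptr_t)buf,
	};

	if (ioctl(fd, BTRFS_IOC_INO_PATHS, &args) < 0)
		err(1, "BTRFS_IOC_INO_PATHS");

	/* val[] holds offsets of the NUL-terminated paths, relative
	 * to val itself */
	for (__u32 i = 0; i < data->elem_cnt; i++)
		printf("%s\n", (char *)data->val + data->val[i]);
}

int main(int argc, char *argv[])
{
	if (argc != 3)
		errx(1, "usage: %s <path-in-subvolume> <ino>", argv[0]);

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		err(1, "open %s", argv[1]);

	print_paths(fd, strtoull(argv[2], NULL, 0));
	return 0;
}

The catch is visible in the comment: the fd has to point into the
right subvolume, which is exactly why a generic report format would
need to carry the subvolume id.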
> With (4) on the way, I can envision wanting a system service that would
> watch for these fsnotify events, and transform the error reports into
> targeted repair calls in the kernel.

Btrfs has two ways of repair:

- Read-time repair

  This happens automatically, for both the involved data and metadata,
  as long as the fs is mounted read-write.

- Scrub-time repair

  This repair is also automatic. The main difference is that scrub is
  triggered manually from user space; otherwise it can be considered a
  full read of the fs (both metadata and data).

But btrfs repair only ever uses the extra copies; it is not intended
to repair things like directories. (That is still the job of
btrfs-check, and the complex cross-references in btrfs are not
designed to be repaired at runtime.)

Currently both kinds of repair produce a dmesg-based report, while
scrub also has its own interface to report some very basic accounting,
such as how many sectors are corrupted and how many were repaired.

A full-featured and generic interface to report errors is definitely a
good direction to go.

> This of course would be very
> filesystem specific, but I would also like to hear from anyone pondering
> other usecases for fsnotify filesystem error monitors.

Btrfs also has internal error counters, but those are accumulated
values, which sometimes are not that helpful and can even be
confusing. With such an interface we could more or less get rid of the
internal error counters and rely on user space to keep the history.

> Once (3) lands, XFS gains the ability to translate a block device IO
> error to an inode number and file offset, and then the inode number to a
> path. In other words, your file breaks and now we can tell applications
> which file it was so they can failover or redownload it or whatever.
> Ric Wheeler mentioned this in 2018's session.

Yeah, if a user-space daemon could automatically (at least by some
policy) delete offending files, it would be a great help. We have had
several reports where corrupted files (with no extra copy to recover
from) blocked btrfs balance, and users had to locate each file from
dmesg, delete it, and retry the balance. Such an interface would
greatly improve the user experience.
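For reference, a minimal sketch of the listening side of such a
daemon, using the fanotify FAN_FS_ERROR event that grew out of the
fsnotify work mentioned above (merged in Linux 5.16). This is
untested, needs CAP_SYS_ADMIN, and only dumps the generic error
record; decoding the accompanying fid record, which identifies the
affected inode or superblock, is left out:

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/fanotify.h>

int main(int argc, char *argv[])
{
	char buf[4096];
	int fd;

	if (argc != 2)
		errx(1, "usage: %s <mountpoint>", argv[0]);

	/* FAN_REPORT_FID is mandatory for FAN_FS_ERROR */
	fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID, O_RDONLY);
	if (fd < 0)
		err(1, "fanotify_init");

	/* watch the whole filesystem backing the given mountpoint */
	if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
			  FAN_FS_ERROR, AT_FDCWD, argv[1]))
		err(1, "fanotify_mark");

	for (;;) {
		ssize_t len = read(fd, buf, sizeof(buf));

		if (len <= 0)
			err(1, "read");

		for (struct fanotify_event_metadata *ev = (void *)buf;
		     FAN_EVENT_OK(ev, len);
		     ev = FAN_EVENT_NEXT(ev, len)) {
			/*
			 * Info records follow the metadata: an error
			 * record plus a fid record naming the object
			 * (an inode, or the superblock for fs-wide
			 * errors).  Walk them by their header length.
			 */
			char *rec = (char *)(ev + 1);

			while (rec < (char *)ev + ev->event_len) {
				struct fanotify_event_info_header *hdr =
							(void *)rec;

				if (hdr->info_type ==
				    FAN_EVENT_INFO_TYPE_ERROR) {
					struct fanotify_event_info_error *e =
							(void *)rec;

					printf("fs error %d, seen %u time(s)\n",
					       e->error, e->error_count);
				}
				rec += hdr->len;
			}
		}
	}
}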
Thanks,
Qu

> The final topic from that 2018 session concerned generic wrappers for
> fsscrub. I haven't pushed hard on that topic because XFS hasn't had
> much to show for that. Now that I'm better versed in systemd services,
> I envision three ways to interact with online fsck:
>
> - A CLI program that can be run by anyone.
>
> - Background systemd services that fire up periodically.
>
> - A dbus service that programs can bind to and request a fsck.
>
> I still think there's an opportunity to standardize the naming to make
> it easier to use a variety of filesystems. I propose for the CLI:
>
> /usr/sbin/fsscrub $mnt that calls /usr/sbin/fsscrub.$FSTYP $mnt
>
> For systemd services, I propose "fsscrub@<mountpoint>". I
> suspect we want a separate background service that itself runs
> periodically and invokes the fsscrub@$mnt services. xfsprogs already
> has a xfs_scrub_all service that does that. The services are nifty
> because it's really easy to restrict privileges, implement resource
> usage controls, and use private name/mountspaces to isolate the process
> from the rest of the system.
>
> dbus is a bit trickier, since there's no precedent at all. I guess
> we'd have to define an interface for filesystem "object". Then we could
> write a service that establishes a well-known bus name and maintains
> object paths for each mounted filesystem. Each of those objects would
> export the filesystem interface, and that's how programs would call
> online fsck as a service.
>
> Ok, that's enough for a single session topic. Thoughts? :)
>
> --D
>
> [0] https://lwn.net/Articles/754504/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-optimize-by-default
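P.S. To make the systemd naming proposal concrete, here is a sketch of
what an fsscrub@ template unit might look like, loosely modeled on the
xfs_scrub@ units that xfsprogs already ships. Everything in it is
hypothetical until the naming is settled: the unit name, the
/usr/sbin/fsscrub dispatcher it calls, and the sandboxing choices are
illustrations, not shipped code.

# fsscrub@.service -- instantiate per mountpoint, e.g.
# "systemctl start fsscrub@$(systemd-escape -p /mnt/data).service"
[Unit]
Description=Online filesystem scrub of %f
ConditionPathIsMountPoint=%f

[Service]
Type=oneshot
# %f is the unescaped instance name, i.e. the mountpoint path
ExecStart=/usr/sbin/fsscrub %f
# the easy privilege/resource restriction mentioned above
NoNewPrivileges=true
PrivateNetwork=true
CPUWeight=20
IOWeight=20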