From: Miklos Szeredi
Date: Mon, 6 Mar 2023 17:14:59 +0100
Subject: Re: [RFC PATCH 0/9] fuse: API for Checkpoint/Restore
To: Alexander Mikhalitsyn
Cc: mszeredi@redhat.com, Al Viro, Amir Goldstein, Stéphane Graber, Seth Forshee, Christian Brauner, Andrei Vagin, Pavel Tikhomirov, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, criu@openvz.org
In-Reply-To: <20230220193754.470330-1-aleksandr.mikhalitsyn@canonical.com>

On Mon, 20 Feb 2023 at 20:38, Alexander Mikhalitsyn wrote:
>
> Hello everyone,
>
> It would be great to hear your comments regarding this proof-of-concept Checkpoint/Restore API for FUSE.
>
> Support of FUSE C/R is a challenging task for CRIU [1]. Last year I've given a brief talk on LPC 2022
> about how we handle files C/R in CRIU and which blockers we have for FUSE filesystems. [2]
>
> The main problem for CRIU is that we have to restore mount namespaces and memory mappings before the process tree.
> It means that when CRIU is performing mount of fuse filesystem it can't use the original FUSE daemon from the
> restorable process tree, but instead use a "fake daemon".
>
> This leads to many other technical problems:
> * "fake" daemon has to reply to FUSE_INIT request from the kernel and initialize fuse connection somehow.
>   This setup can be not consistent with the original daemon (protocol version, daemon capabilities/settings
>   like no_open, no_flush, readahead, and so on).
> * each fuse request has a unique ID. It could confuse userspace if this unique ID sequence was reset.
>
> We can workaround some issues and implement fragile and limited support of FUSE in CRIU but it doesn't make any sense, IMHO.
> Btw, I've enumerated only CRIU restore-stage problems there. The dump stage is another story...
>
> My proposal is not only about CRIU. The same interface can be useful for FUSE mounts recovery after daemon crashes.
> LXC project uses LXCFS [3] as a procfs/cgroupfs/sysfs emulation layer for containers. We are using a scheme when
> one LXCFS daemon handles all the work for all the containers and we use bindmounts to overmount particular
> files/directories in procfs/cgroupfs/sysfs. If this single daemon crashes for some reason we are in trouble,
> because we have to restart all the containers (fuse bindmounts become invalid after the crash).
> The solution is fairly easy:
> allow somehow to reinitialize the existing fuse connection and replace the daemon on the fly
> This case is a little bit simpler than CRIU cause we don't need to care about the previously opened files
> and other stuff, we are only interested in mounts.
>
> Current PoC implementation was developed and tested with this "recovery case".
> Right now I only have LXCFS patched and have nothing for CRIU. But I wanted to discuss this idea before going forward with CRIU.

Apparently all of the added mechanisms (REINIT, BM_REVAL, conn_gen) are
crash-recovery related, and not useful for C/R. Why is this being
advertised as a precursor for CRIU support?

BTW, here's an earlier attempt at partial recovery, which might be
interesting:

https://lore.kernel.org/all/CAPm50a+j8UL9g3UwpRsye5e+a=M0Hy7Tf1FdfwOrUUBWMyosNg@mail.gmail.com/

Thanks,
Miklos