From: Jia Zhu
To: dhowells@redhat.com
Cc: linux-cachefs@redhat.com, linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jia Zhu
Subject: [PATCH V4 0/5] Introduce daemon failover mechanism to recover from crashing
Date: Wed, 11 Jan 2023 13:25:10 +0800
Message-Id: <20230111052515.53941-1-zhujia.zj@bytedance.com>

Changes since v3:
1. Take xa_lock when traversing the xarray in cachefiles_daemon_poll().
2. Use a macro to simplify the code in cachefiles_ondemand_select_req().

[Background]
============
In on-demand read mode, if the user daemon closes an anonymous fd (e.g. because the daemon crashes), subsequent reads and inflight requests based on that fd return -EIO. Even if this is tolerable for some individual users, in a real cloud service production environment such I/O errors are passed on to cloud service users and disrupt their running jobs, which seriously harms service stability.

[Design]
========
The main idea of daemon failover is to reopen the objects related to inflight requests, so that the newly started daemon can process those requests as usual. To implement that, we need to support:
1. Storing inflight requests while the daemon is down.
2. Holding the handle of /dev/cachefiles (by the container snapshotter or systemd).

If the user chooses not to keep the /dev/cachefiles fd, failover is not enabled: inflight requests return an error, which is passed to the container (the same behavior as now).

[Flow Path]
===========
This patchset introduces three states for an on-demand object:
CLOSE: the object has just been allocated, or has been closed by the user daemon.
OPEN: the OPEN request related to the object has been processed correctly.
REOPENING: the object has been closed and is being driven to reopen by a read request.

1. The daemon uses a Unix domain socket (UDS) to send/receive the fd, keeping and passing on the reference to "/dev/cachefiles".
2. The user daemon crashes, restarts, and recovers the reference to the device fd.
3. The user daemon writes "restore" to the device.
   3.1 Reset the object's state from CLOSE to REOPENING.
   3.2 Initialize a work item that reinitializes the object, and add it to the workqueue (so the daemon can fetch that open request from kernel space and handle it).
4. Users of the upper filesystem won't notice that the daemon ever crashed, since the inflight I/O is restored and handled correctly.

[Test]
======
There is a test case for the scenario above. A user process reads a file via fscache on-demand reading.
At the same time, we repeatedly kill the daemon. The expected result is that the file read by the user is consistent with the original, and the user does not notice that the daemon was ever killed.
https://github.com/userzj/demand-read-cachefilesd/commits/failover-test

[GitWeb]
========
https://github.com/userzj/linux/tree/fscache-failover-v5

RFC: https://lore.kernel.org/all/20220818135204.49878-1-zhujia.zj@bytedance.com/
V1: https://lore.kernel.org/all/20221011131552.23833-1-zhujia.zj@bytedance.com/
V2: https://lore.kernel.org/all/20221014030745.25748-1-zhujia.zj@bytedance.com/
V3: https://lore.kernel.org/all/20221014080559.42108-1-zhujia.zj@bytedance.com/

Jia Zhu (5):
  cachefiles: introduce object ondemand state
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: resend an open request if the read request's object is closed
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
  cachefiles: add restore command to recover inflight ondemand read requests

 fs/cachefiles/daemon.c    |  16 +++-
 fs/cachefiles/interface.c |   6 ++
 fs/cachefiles/internal.h  |  57 +++++++++++++-
 fs/cachefiles/ondemand.c  | 160 ++++++++++++++++++++++++++++----------
 4 files changed, 192 insertions(+), 47 deletions(-)

-- 
2.20.1
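The three object states in [Flow Path] behave like a small state machine. The sketch below models them in plain C as a user-space illustration only: the enum names, the event names, and ondemand_next() are hypothetical and are not the identifiers used by these patches. It assumes that a read request finding an object in CLOSE is what moves it to REOPENING and queues a resent OPEN request (as the title of patch 3/5 suggests), and it collapses the kernel's locking and workqueue handling into a single resend_open flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three on-demand object states. */
enum ondemand_state {
	ONDEMAND_CLOSE,     /* just allocated, or closed by the user daemon */
	ONDEMAND_OPEN,      /* related OPEN request processed correctly */
	ONDEMAND_REOPENING, /* closed, and being driven to reopen by a read */
};

/* Hypothetical events that drive the transitions in this model. */
enum ondemand_event {
	EV_OPEN_DONE,    /* the daemon completed the object's OPEN request */
	EV_DAEMON_CLOSE, /* the daemon closed the anonymous fd (e.g. crashed) */
	EV_READ,         /* a read request arrives for the object */
};

/*
 * Return the next state for @s after @ev. *resend_open is set to true
 * only when the model would queue work that resends an OPEN request.
 */
static enum ondemand_state ondemand_next(enum ondemand_state s,
					  enum ondemand_event ev,
					  bool *resend_open)
{
	assert(resend_open != NULL);
	*resend_open = false;

	switch (ev) {
	case EV_OPEN_DONE:
		return ONDEMAND_OPEN;
	case EV_DAEMON_CLOSE:
		/* Also covers a crash while REOPENING: the next read retries. */
		return ONDEMAND_CLOSE;
	case EV_READ:
		if (s == ONDEMAND_CLOSE) {
			*resend_open = true;
			return ONDEMAND_REOPENING;
		}
		/* OPEN: served as usual; REOPENING: OPEN already queued. */
		return s;
	}
	return s;
}
```

In this model, keeping REOPENING distinct from CLOSE means a second read for the same object does not queue a duplicate OPEN request; once the daemon completes the OPEN request, the object returns to OPEN.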
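Step 1 of [Flow Path] relies on the standard Unix domain socket technique of passing a file descriptor as SCM_RIGHTS ancillary data. The sketch below shows only that generic mechanism: send_fd() and recv_fd() are hypothetical helper names, and how the daemon finds and connects to the process that keeps the fd (the container snapshotter or systemd, per [Design]) is left out. In a real setup the descriptor passed would be the one opened on /dev/cachefiles; the mechanism itself works for any fd.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one int (one fd). */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send @fd over the connected Unix domain socket @sock. 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 'F'; /* send at least one byte of real data with the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg_buf u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from @sock. Returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg_buf u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same open file as the sender's, so as long as some process keeps holding it, the device is not fully closed while the daemon is down, which [Design] names as the condition for failover to be enabled.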