From: Jia Zhu
To: dhowells@redhat.com, linux-cachefs@redhat.com
Cc: linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, jefflexu@linux.alibaba.com,
    hsiangkao@linux.alibaba.com, Jia Zhu
Subject: [PATCH V6 0/5] Introduce daemon failover mechanism to recover from crashing
Date: Sat, 15 Apr 2023 01:22:34 +0800
Message-Id: <20230414172239.33743-1-zhujia.zj@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Changes since v5:
In cachefiles_daemon_poll(), replace xa_for_each_marked with
xas_for_each_marked.

[Background]
============
In ondemand read mode, if the user daemon closes an anonymous fd (e.g.
because the daemon crashes), subsequent read requests and inflight
requests based on that fd return -EIO.
This may be tolerable for some individual users, but when it happens in
a real cloud service production environment, such I/O errors are passed
on to cloud service users and impact their running jobs, which hurts
cloud service stability.

[Design]
========
The main idea of daemon failover is to reopen the objects associated
with inflight requests, so that the newly started daemon can process
those requests as usual.
To implement that, we need to support:
  1. Storing inflight requests during a daemon crash.
  2. Holding the handle of /dev/cachefiles (by the container
     snapshotter/systemd).
BTW, if the user chooses not to keep the /dev/cachefiles fd, failover is
not enabled: inflight requests return an error, which is passed to the
container (same behavior as now).

[Flow Path]
===========
This patchset introduces three states for an ondemand object:
CLOSE: Object which has just been allocated or closed by the user daemon.
OPEN: Object whose related OPEN request has been processed correctly.
REOPENING: Object which has been closed and is driven to open by a read
           request.

1. The daemon uses UDS fd send/receive to keep and pass the fd reference
   of "/dev/cachefiles".
2. The user daemon crashes -> restarts and recovers the dev fd's reference.
3. The user daemon writes "restore" to the device.
   3.1 Reset the object's state from CLOSE to REOPENING.
   3.2 Init a work which reinits the object and add it to the wq
       (so the daemon can return from kernel space and handle that open
       request).
4. The user of the upper filesystem won't notice that the daemon ever
   crashed, since the inflight I/O is restored and handled correctly.

[Test]
======
There is a test case for the scenario described above.
A user process reads a file through fscache ondemand reading while, at
the same time, we repeatedly kill the daemon.
The expected result is that the file read by the user is consistent with
the original, and that the user doesn't notice the daemon has ever been
killed.

https://github.com/userzj/demand-read-cachefilesd/commits/failover-test

[GitWeb]
========
https://github.com/userzj/linux/tree/fscache-failover-v6

RFC: https://lore.kernel.org/all/20220818135204.49878-1-zhujia.zj@bytedance.com/
V1: https://lore.kernel.org/all/20221011131552.23833-1-zhujia.zj@bytedance.com/
V2: https://lore.kernel.org/all/20221014030745.25748-1-zhujia.zj@bytedance.com/
V3: https://lore.kernel.org/all/20221014080559.42108-1-zhujia.zj@bytedance.com/
V4: https://lore.kernel.org/all/20230111052515.53941-1-zhujia.zj@bytedance.com/
V5: https://lore.kernel.org/all/20230329140155.53272-1-zhujia.zj@bytedance.com/

Jia Zhu (5):
  cachefiles: introduce object ondemand state
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: resend an open request if the read request's object is
    closed
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand
    mode
  cachefiles: add restore command to recover inflight ondemand read
    requests

 fs/cachefiles/daemon.c    |  15 +++-
 fs/cachefiles/interface.c |   7 +-
 fs/cachefiles/internal.h  |  59 +++++++++++++-
 fs/cachefiles/ondemand.c  | 166 ++++++++++++++++++++++++++++----------
 4 files changed, 201 insertions(+), 46 deletions(-)

-- 
2.20.1
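
For readers unfamiliar with step 1 of the flow path, the following is a
minimal userspace sketch of passing an open fd over a Unix domain socket
with SCM_RIGHTS. It is not taken from this patchset or from the test
daemon; the helper names send_fd() and recv_fd() are hypothetical, and
the sketch assumes a connected AF_UNIX socket between the process that
holds the fd and the daemon.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one open fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
	char byte = 'F'; /* one byte of ordinary data carries the control message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	/* Attach the fd as SCM_RIGHTS ancillary data. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new fd (>= 0) on success, -1 on failure. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	/* Accept only a single-fd SCM_RIGHTS control message. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Under these assumptions, the holder process would call send_fd() with
its "/dev/cachefiles" fd when the restarted daemon connects, and the
daemon would call recv_fd() and then write "restore" to the received fd
(step 3 of the flow path).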