Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp4865380iob; Mon, 9 May 2022 03:32:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx155N6/5a9Kmhl3RMUNI5Z6SXYzYsFlPpWofZ4Dn/04eZA+xJQEMojNg2ctfzV6knP2jJW X-Received: by 2002:a05:6a00:15c2:b0:510:6d56:730 with SMTP id o2-20020a056a0015c200b005106d560730mr15493373pfu.62.1652092378042; Mon, 09 May 2022 03:32:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652092378; cv=none; d=google.com; s=arc-20160816; b=evMrukidKUHfkDjs/Ot6dVvYwXEn1kPrrpxlEv5d0lK3L6m2XdXaBkrdExSUturEH0 r2mQFYAbejgNBWQePD4NoqEl2ZcMFCzJ39BS6xU4W5+nSPdS7Fm+2/RImK+Qvr9AR32B ZAOtbz8UHjWy26mkbZlNs2eVICr+1yKgNEewCKBj7fiblnDetAgm43ne92LmqEQN1Pxw Y9EbKJ5nU654HbeLcwQX6gOu1iuQ5R8Q0CXh/BCekurrVgKkgJ5bgg78puJhPvZyzEWt WBUe3LV5rAyiNeto8mjkYd0TTZUtbEaeH8A0jFbJoHeyr65V07qBACElrYbgPXKU0DAC lr9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=TzGB+cBpPWXb29+9qT00J3NftnEX9e6qIw1foWdhBN0=; b=HLg+h7iiZPiPGKbth4rN9QlnR55eXURsRxhHbLCyAeMN//URERAJKO0P3ofQ3eMqfg 1zW+fyjhArOyisrgC/abMhdzkQV9ymMgsLxg55xaFVKj2XE8LOmYOd2sjg329zLMKZMm t2bR9HaO3DoBcQkrLB234fUYetY5s5JV9TweVXWBp2aPP4BWx4W4JJlQGBbOjlPOitBc mM+BNwNzPyth3ILqj+fEB5XesK6/MngxmIRGX/PYHD67pvyEIAr9P6t6+AZHVEBd18cX vzuzDzCIy/7XOlQlvkwGvbY7UawzWVOj4xIp5iAt5gcv22RlA9DXUhhBw9wXYuB9VXMA f3/g== ARC-Authentication-Results: i=1; mx.google.com; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id x5-20020aa79a45000000b0050dfc3c58fbsi15335601pfj.5.2022.05.09.03.32.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 May 2022 03:32:58 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A6D5B1F90F0; Mon, 9 May 2022 03:02:53 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236818AbiEIHyJ (ORCPT + 99 others); Mon, 9 May 2022 03:54:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235007AbiEIHoy (ORCPT ); Mon, 9 May 2022 03:44:54 -0400 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 50C2017073; Mon, 9 May 2022 00:40:59 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R231e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04395;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=20;SR=0;TI=SMTPD_---0VCfxsXb_1652082041; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VCfxsXb_1652082041) by smtp.aliyun-inc.com(127.0.0.1); Mon, 09 May 2022 15:40:42 +0800 From: Jeffle Xu To: dhowells@redhat.com, linux-cachefs@redhat.com, xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org Cc: torvalds@linux-foundation.org, gregkh@linuxfoundation.org, willy@infradead.org, linux-fsdevel@vger.kernel.org, joseph.qi@linux.alibaba.com, bo.liu@linux.alibaba.com, tao.peng@linux.alibaba.com, gerry@linux.alibaba.com, eguan@linux.alibaba.com, linux-kernel@vger.kernel.org, luodaowen.backend@bytedance.com, tianzichen@kuaishou.com, yinxin.x@bytedance.com, zhangjiachen.jaycee@bytedance.com, zhujia.zj@bytedance.com Subject: [PATCH v11 08/22] cachefiles: document on-demand read mode Date: Mon, 9 May 2022 15:40:14 +0800 Message-Id: <20220509074028.74954-9-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220509074028.74954-1-jefflexu@linux.alibaba.com> References: <20220509074028.74954-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE,UNPARSEABLE_RELAY autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Document new user interface introduced by on-demand read mode. Signed-off-by: Jeffle Xu --- .../filesystems/caching/cachefiles.rst | 178 ++++++++++++++++++ 1 file changed, 178 insertions(+) diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst index 8bf396b76359..fc7abf712315 100644 --- a/Documentation/filesystems/caching/cachefiles.rst +++ b/Documentation/filesystems/caching/cachefiles.rst @@ -28,6 +28,7 @@ Cache on Already Mounted Filesystem (*) Debugging. + (*) On-demand Read. Overview @@ -482,3 +483,180 @@ the control file. For example:: echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug will turn on all function entry debugging. + + +On-demand Read +============== + +When working in its original mode, CacheFiles serves as a local cache for a +remote networking fs - while in on-demand read mode, CacheFiles can boost the +scenario where on-demand read semantics are needed, e.g. container image +distribution. + +The essential difference between these two modes is seen when a cache miss +occurs: In the original mode, the netfs will fetch the data from the remote +server and then write it to the cache file; in on-demand read mode, fetching +the data and writing it into the cache is delegated to a user daemon. + +``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode. + + +Protocol Communication +---------------------- + +The on-demand read mode uses a simple protocol for communication between kernel +and user daemon. The protocol can be modeled as:: + + kernel --[request]--> user daemon --[reply]--> kernel + +CacheFiles will send requests to the user daemon when needed. The user daemon +should poll the devnode ('/dev/cachefiles') to check if there's a pending +request to be processed. A POLLIN event will be returned when there's a pending +request. + +The user daemon then reads the devnode to fetch a request to process. It should +be noted that each read only gets one request. When it has finished processing +the request, the user daemon should write the reply to the devnode. + +Each request starts with a message header of the form:: + + struct cachefiles_msg { + __u32 msg_id; + __u32 opcode; + __u32 len; + __u32 object_id; + __u8 data[]; + }; + +where: + + * ``msg_id`` is a unique ID identifying this request among all pending + requests. + + * ``opcode`` indicates the type of this request. + + * ``object_id`` is a unique ID identifying the cache file operated on. + + * ``data`` indicates the payload of this request. + + * ``len`` indicates the whole length of this request, including the + header and following type-specific payload. + + +Turning on On-demand Mode +------------------------- + +An optional parameter becomes available to the "bind" command:: + + bind [ondemand] + +When the "bind" command is given no argument, it defaults to the original mode. +When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read +mode will be enabled. + + +The OPEN Request +---------------- + +When the netfs opens a cache file for the first time, a request with the +CACHEFILES_OP_OPEN opcode, a.k.a an OPEN request will be sent to the user +daemon. The payload format is of the form:: + + struct cachefiles_open { + __u32 volume_key_size; + __u32 cookie_key_size; + __u32 fd; + __u32 flags; + __u8 data[]; + }; + +where: + + * ``data`` contains the volume_key followed directly by the cookie_key. + The volume key is a NUL-terminated string; the cookie key is binary + data. + + * ``volume_key_size`` indicates the size of the volume key in bytes. + + * ``cookie_key_size`` indicates the size of the cookie key in bytes. + + * ``fd`` indicates an anonymous fd referring to the cache file, through + which the user daemon can perform write/llseek file operations on the + cache file. + + +The user daemon can use the given (volume_key, cookie_key) pair to distinguish +the requested cache file. With the given anonymous fd, the user daemon can +fetch the data and write it to the cache file in the background, even when +kernel has not triggered a cache miss yet. + +Be noted that each cache file has a unique object_id, while it may have multiple +anonymous fds. The user daemon may duplicate anonymous fds from the initial +anonymous fd indicated by the @fd field through dup(). Thus each object_id can +be mapped to multiple anonymous fds, while the usr daemon itself needs to +maintain the mapping. + +When implementing a user daemon, please be careful of RLIMIT_NOFILE, +``/proc/sys/fs/nr_open`` and ``/proc/sys/fs/file-max``. Typically these needn't +be huge since they're related to the number of open device blobs rather than +open files of each individual filesystem. + +The user daemon should reply the OPEN request by issuing a "copen" (complete +open) command on the devnode:: + + copen , + +where: + + * ``msg_id`` must match the msg_id field of the OPEN request. + + * When >= 0, ``cache_size`` indicates the size of the cache file; + when < 0, ``cache_size`` indicates any error code encountered by the + user daemon. + + +The CLOSE Request +----------------- + +When a cookie withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be +sent to the user daemon. This tells the user daemon to close all anonymous fds +associated with the given object_id. The CLOSE request has no extra payload, +and shouldn't be replied. + + +The READ Request +---------------- + +When a cache miss is encountered in on-demand read mode, CacheFiles will send a +READ request (opcode CACHEFILES_OP_READ) to the user daemon. This tells the user +daemon to fetch the contents of the requested file range. The payload is of the +form:: + + struct cachefiles_read { + __u64 off; + __u64 len; + }; + +where: + + * ``off`` indicates the starting offset of the requested file range. + + * ``len`` indicates the length of the requested file range. + + +When it receives a READ request, the user daemon should fetch the requested data +and write it to the cache file identified by object_id. + +When it has finished processing the READ request, the user daemon should reply +by using the CACHEFILES_IOC_READ_COMPLETE ioctl on one of the anonymous fds +associated with the object_id given in the READ request. The ioctl is of the +form:: + + ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg_id); + +where: + + * ``fd`` is one of the anonymous fds associated with the object_id + given. + + * ``msg_id`` must match the msg_id field of the READ request. -- 2.27.0