Date: Fri, 26 Jan 2024 14:29:09 +0800
Subject: Re: [PATCH] fuse: increase FUSE_MAX_MAX_PAGES limit
From: Jingbo Xu
To: Miklos Szeredi
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, zhangjiachen.jaycee@bytedance.com

On 1/24/24 8:47 PM, Jingbo Xu wrote:
>
>
> On 1/24/24 8:23 PM, Miklos Szeredi wrote:
>> On Wed, 24 Jan 2024 at 08:05, Jingbo Xu wrote:
>>>
>>> From: Xu Ji
>>>
>>> Increase FUSE_MAX_MAX_PAGES limit, so that the maximum data size of a
>>> single request is increased.
>>
>> The only worry is about where this memory is getting accounted to.
>> This needs to be thought through, since we are increasing the
>> possible memory that an unprivileged user is allowed to pin.

Apart from the request size, the maximum number of background requests,
i.e. max_background (12 by default, and configurable by the fuse
daemon), also limits the amount of memory that an unprivileged user
can pin. But yes, increasing the maximum request size does increase
that amount proportionally.
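
To put rough numbers on this, here is a back-of-the-envelope sketch (not
kernel code). It assumes, purely for illustration, that the memory pinned
by background requests is bounded by max_background * pages per request *
page size, with 4096-byte pages; foreground requests are ignored, and the
function name pinned_background_bytes is made up for this example.

```python
PAGE_SIZE = 4096        # bytes per page (assumed 4 KiB pages)
MAX_BACKGROUND = 12     # default max_background mentioned above


def pinned_background_bytes(max_pages, max_background=MAX_BACKGROUND,
                            page_size=PAGE_SIZE):
    """Illustrative upper bound, assuming every background request is full-sized."""
    return max_background * max_pages * page_size


if __name__ == "__main__":
    # Compare the current limit (256 pages) with the proposed one (1024 pages).
    for max_pages in (256, 1024):
        mib = pinned_background_bytes(max_pages) / (1 << 20)
        print(f"max_pages={max_pages}: up to {mib:.0f} MiB pinned")
```

Under these assumptions the bound grows from 12 MiB to 48 MiB when the
limit goes from 256 to 1024 pages, i.e. in proportion to the request size.
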

>
>>
>>>
>>> This optimizes the write performance especially when the optimal IO size
>>> of the backend store at the fuse daemon side is greater than the original
>>> maximum request size (i.e. 1MB with 256 FUSE_MAX_MAX_PAGES and
>>> 4096 PAGE_SIZE).
>>>
>>> Note that this only increases the upper limit of the maximum request
>>> size, while the real maximum request size relies on the FUSE_INIT
>>> negotiation with the fuse daemon.
>>>
>>> Signed-off-by: Xu Ji
>>> Signed-off-by: Jingbo Xu
>>> ---
>>> I'm not sure if 1024 is adequate for FUSE_MAX_MAX_PAGES, as the
>>> Bytedance folks seem to have increased the maximum request size to 8M
>>> and saw a ~20% performance boost.
>>
>> The 20% is against the 256 pages, I guess.
>
> Yeah I guess so.
>
>
>> It would be interesting to
>> see how the number of pages per request affects performance and
>> why.
>
> To be honest, I'm not sure of the root cause of the performance boost in
> Bytedance's case.
>
> In our internal use scenario, the optimal IO size of the backend
> store at the fuse server side is, e.g. 4MB, and thus the maximum
> throughput cannot be achieved with the current 256 pages per request. IOW
> the backend store, e.g. a distributed parallel filesystem, gets optimal
> performance when the data is aligned on a 4MB boundary. I can ask my folks
> who implement the fuse server to give more background info and the
> exact performance statistics.

Here are more details about our internal use case:

We have a fuse server used in our internal cloud scenarios, while the
backend store is actually a distributed filesystem. That is, the fuse
server actually acts as the client of the remote distributed
filesystem. The fuse server forwards the fuse requests to the remote
backing store over the network, while the remote distributed filesystem
handles the IO requests, e.g. processes the data from/to the persistent
store.

Next come the details of how the remote distributed filesystem
processes the requested data with the persistent store.

[1] The remote distributed filesystem uses EC (erasure coding), e.g. in
an 8+3 mode, where each fixed-size unit of user data is split and stored
as 8 data blocks plus 3 extra parity blocks. For example, with a 512KB
block size, each 4MB of user data is split and stored as 8 (512KB) data
blocks with 3 (512KB) parity blocks.

It also uses striping to boost performance. For example, there are 8
data disks and 3 parity disks in the above 8+3 mode example, and each
stripe consists of 8 data blocks and 3 parity blocks.

[2] To avoid data corruption on power off, the remote distributed
filesystem commits an O_SYNC write right away once a write (fuse)
request is received. Because of the EC described above, when the write
fuse request is not aligned on a 4MB (the stripe size) boundary, say
it's 1MB in size, the other 3MB is read from the persistent store first,
then the 3 extra parity blocks are computed from the complete 4MB
stripe, and finally the 8 data blocks and 3 parity blocks are written
down.

Thus the write amplification is non-negligible and is the performance
bottleneck when the fuse request size is less than the stripe size.

Here are some simple performance statistics with varying request size.
With a 4MB stripe size, there's a ~3x bandwidth improvement when the
maximum request size is increased from 256KB to 3.9MB, and another ~20%
improvement when the request size is increased from 3.9MB to 4MB.

-- 
Thanks,
Jingbo
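
P.S. The read-modify-write cost described in [2] can be sketched with a
toy model. This is only an illustration under stated assumptions, not the
actual implementation of the distributed filesystem: it assumes a 4MB
stripe made of 8 data blocks of 512 KiB plus 3 parity blocks of the same
size, that a partial-stripe write reads the rest of the stripe first, and
that all 11 blocks are then rewritten. The names stripe_write_cost and
io_amplification are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

DATA_BLOCKS = 8
PARITY_BLOCKS = 3
STRIPE_SIZE = 4 * MIB                     # user data per stripe
BLOCK_SIZE = STRIPE_SIZE // DATA_BLOCKS   # 512 KiB per block (assumed)


def stripe_write_cost(request_size):
    """Return (bytes_read, bytes_written) for one write inside a single stripe.

    Assumed model: the part of the stripe not covered by the request is read
    back, then all data and parity blocks of the stripe are written.
    """
    if not 0 < request_size <= STRIPE_SIZE:
        raise ValueError("request must be non-empty and fit in one stripe")
    bytes_read = STRIPE_SIZE - request_size
    bytes_written = (DATA_BLOCKS + PARITY_BLOCKS) * BLOCK_SIZE
    return bytes_read, bytes_written


def io_amplification(request_size):
    """Backend bytes moved (read + written) per byte of user data written."""
    bytes_read, bytes_written = stripe_write_cost(request_size)
    return (bytes_read + bytes_written) / request_size


if __name__ == "__main__":
    for size in (1 * MIB, 4 * MIB):
        print(f"{size // MIB} MiB request: {io_amplification(size)}x backend IO")
```

In this model a 1MB request moves 8.5 bytes on the backend per byte of
user data, versus 1.375 for a full 4MB stripe, which is the gap a larger
maximum request size is meant to close.
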