Message-ID: <495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com>
Date: Mon, 3 Jun 2024 14:17:39 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
To: Miklos Szeredi, "linux-fsdevel@vger.kernel.org"
Cc: "linux-kernel@vger.kernel.org", lege.wang@jaguarmicro.com
From: Jingbo Xu
Subject: [HELP] FUSE writeback performance bottleneck

Hi, Miklos,

We spotted a performance bottleneck in FUSE writeback: the writeback
kworker consumes nearly 100% of a CPU, of which 40% is spent in
copy_page().

fuse_writepages_fill
  alloc tmp_page
  copy_highpage

This is because of the FUSE writeback design (see commit 3be5a52b30aa
("fuse: support writable mmap")), which allocates a new temp page for
each dirty page to be written back, copies the content of the dirty page
to the temp page, and then writes back the temp page instead. This
special design is intentional: it avoids potential deadlocks caused by a
buggy or even malicious FUSE user daemon.

There was a proposal to remove this constraint for virtiofs [1], which
is reasonable, as the users of virtiofs and the virtiofs daemon don't run
on the same OS, and the virtiofs daemon is usually offered by cloud
vendors, which are not expected to be malicious. For the normal
/dev/fuse interface, however, I don't think removing the constraint is
acceptable.

Coming back to the writeback performance bottleneck.
Another important factor is that (IIUC) only one kworker at a time is
allowed to do writeback for each filesystem instance (if cgroup
writeback is not enabled). The kworker is scheduled upon
sb->s_bdi->wb.dwork, and the workqueue infrastructure guarantees that at
most one *running* worker is allowed for one specific work
(sb->s_bdi->wb.dwork) at any time. Thus writeback is constrained to one
CPU for each filesystem instance.

I'm not sure whether offloading the page copying, and then the sending of
FUSE requests, to another worker (once a bunch of dirty pages has been
collected) is a good idea or not, e.g.

```
fuse_writepages_fill
    if fuse_writepage_need_send:
        # schedule a work

# the worker
for each dirty page in ap->pages[]:
    copy_page
fuse_writepages_send
```

Any suggestions?

This issue can be reproduced by:

1. ./libfuse/build/example/passthrough_ll -o cache=always -o writeback -o source=/run/ /mnt
   ("/run/" is a tmpfs mount)

2. fio --name=write_test --ioengine=psync --iodepth=1 --rw=write --bs=1M --direct=0 --size=1G --numjobs=2 --group_reporting --directory=/mnt
   (at least two threads are needed; fio shows ~1800MiB/s buffered write
   bandwidth)

[1] https://lore.kernel.org/all/20231228123528.705-1-lege.wang@jaguarmicro.com/

-- 
Thanks,
Jingbo
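
P.S. To make the shape of the offloading idea above concrete, here is a
rough userspace model in Python. It is only a sketch under assumptions,
not kernel code: `writeback`, `copy_and_send`, `MAX_PAGES_PER_REQ` and
the `sent` list are hypothetical names invented for illustration, a
thread pool stands in for the extra workers, `bytes(page)` stands in for
copy_page(), and appending to `sent` stands in for
fuse_writepages_send(). The collector only gathers pages into a batch;
copying and sending happen in the pool.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096
MAX_PAGES_PER_REQ = 32  # hypothetical batch limit, not the kernel's value


def copy_and_send(batch, sent):
    """Worker side: copy each dirty page to a temp page, then 'send'."""
    tmp_pages = [bytes(page) for page in batch]  # stands in for copy_page()
    sent.append(tmp_pages)  # stands in for fuse_writepages_send()
    return len(tmp_pages)


def writeback(dirty_pages, workers=4):
    """Collector side: gather pages into batches and hand them to the pool."""
    sent = []
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch = []
        for page in dirty_pages:
            batch.append(page)
            # Analogue of the fuse_writepage_need_send check.
            if len(batch) == MAX_PAGES_PER_REQ:
                futures.append(pool.submit(copy_and_send, batch, sent))
                batch = []
        if batch:  # flush the final, partially filled batch
            futures.append(pool.submit(copy_and_send, batch, sent))
    copied = sum(f.result() for f in futures)
    return copied, sent
```

Two caveats with this model: requests may complete in a different order
than they were collected, and CPython's GIL means it shows only the
structure of the pipeline, not any real speedup.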