From: Bernd Schubert
To: Jingbo Xu, Miklos Szeredi, "linux-fsdevel@vger.kernel.org"
Cc: "linux-kernel@vger.kernel.org", lege.wang@jaguarmicro.com
Subject: Re: [HELP] FUSE writeback performance bottleneck
Date: Mon, 3 Jun 2024 16:43:17 +0200
Message-ID: <67771830-977f-4fca-9d0b-0126abf120a5@fastmail.fm>
In-Reply-To: <495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com>
References: <495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/3/24 08:17, Jingbo Xu wrote:
> Hi, Miklos,
>
> We spotted a performance bottleneck for FUSE writeback in which the
> writeback kworker has consumed nearly 100% CPU, among which 40% CPU is
> used for copy_page().
>
> fuse_writepages_fill
>   alloc tmp_page
>   copy_highpage
>
> This is because of FUSE writeback design (see commit 3be5a52b30aa
> ("fuse: support writable mmap")), which newly allocates a temp page for
> each dirty page to be written back, copy content of dirty page to temp
> page, and then write back the temp page instead. This special design is
> intentional to avoid potential deadlocked due to buggy or even malicious
> fuse user daemon.

I also noticed that, and I admit that I don't understand it yet.
The commit says:

    The basic problem is that there can be no guarantee about the time in
    which the userspace filesystem will complete a write. It may be buggy
    or even malicious, and fail to complete WRITE requests. We don't want
    unrelated parts of the system to grind to a halt in such cases.

Timing - NFS/cifs/etc have the same issue? Even a local file system has no
guarantee of how fast the storage is?

Buggy - hmm yeah, then it is splice related only? But I think the splice
feature had not been introduced yet when fuse got mmap and writeback in
2008? Without splice the pages are just copied into a userspace buffer? So
what can userspace do wrong with its copy?

Failure - why can't it do what nfs_mapping_set_error() does?

I guess I am missing something, but so far I don't understand what that is.

>
> There was a proposal of removing this constraint for virtiofs [1], which
> is reasonable as users of virtiofs and virtiofs daemon don't run on the
> same OS, and virtiofs daemon is usually offered by cloud vendors that
> shall not be malicious. While for the normal /dev/fuse interface, I
> don't think removing the constraint is acceptable.
>
>
> Come back to the writeback performance bottleneck. Another important
> factor is that, (IIUC) only one kworker at the same time is allowed for
> writeback for each filesystem instance (if cgroup writeback is not
> enabled). The kworker is scheduled upon sb->s_bdi->wb.dwork, and the
> workqueue infrastructure guarantees that at most one *running* worker is
> allowed for one specific work (sb->s_bdi->wb.dwork) at any time. Thus
> the writeback is constraint to one CPU for each filesystem instance.
>
> I'm not sure if offloading the page copying and then FUSE requests
> sending to another worker (if a bunch of dirty pages have been
> collected) is a good idea or not, e.g.
>
> ```
> fuse_writepages_fill
>   if fuse_writepage_need_send:
>     # schedule a work
>
> # the worker
> for each dirty page in ap->pages[]:
>     copy_page
> fuse_writepages_send
> ```
>
> Any suggestion?
>
>
>
> This issue can be reproduced by:
>
> 1 ./libfuse/build/example/passthrough_ll -o cache=always -o writeback -o
> source=/run/ /mnt
> ("/run/" is a tmpfs mount)
>
> 2 fio --name=write_test --ioengine=psync --iodepth=1 --rw=write --bs=1M
> --direct=0 --size=1G --numjobs=2 --group_reporting --directory=/mnt
> (at least two threads are needed; fio shows ~1800MiB/s buffer write
> bandwidth)

That should quickly run out of tmpfs memory. I need to find time to improve
this a bit, but this should give you an easier test:

https://github.com/libfuse/libfuse/pull/807

>
>
> [1]
> https://lore.kernel.org/all/20231228123528.705-1-lege.wang@jaguarmicro.com/
>

Thanks,
Bernd
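
To make the offloading idea from the quoted pseudocode a bit more concrete,
here is a rough userspace model in Python. It is only a sketch of the
structure, not kernel code, and everything in it is an assumption made for
illustration: copy_and_send(), writeback() and MAX_PAGES_PER_REQ are made-up
names, bytes(page) stands in for copy_highpage(), appending to a list stands
in for fuse_writepages_send(), and a thread pool stands in for the extra
worker.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096
MAX_PAGES_PER_REQ = 32  # hypothetical batch limit, not the kernel's value


def copy_and_send(batch, sent):
    """Worker side: copy every dirty page to a temp page, then 'send' them."""
    # bytes(bytearray) makes an independent copy; stands in for copy_highpage().
    tmp_pages = [bytes(page) for page in batch]
    # Appending to a shared list stands in for fuse_writepages_send().
    sent.append(tmp_pages)
    return len(tmp_pages)


def writeback(dirty_pages, pool):
    """Writeback side: only collect pages and schedule work, never copy."""
    sent, futures, batch = [], [], []
    for page in dirty_pages:
        batch.append(page)
        # Stands in for the fuse_writepage_need_send() check.
        if len(batch) == MAX_PAGES_PER_REQ:
            futures.append(pool.submit(copy_and_send, batch, sent))
            batch = []
    if batch:  # flush the last, partially filled batch
        futures.append(pool.submit(copy_and_send, batch, sent))
    copied = sum(f.result() for f in futures)  # wait for all workers
    return copied, sent


if __name__ == "__main__":
    pages = [bytearray(PAGE_SIZE) for _ in range(70)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        copied, sent = writeback(pages, pool)
    print(copied, "pages copied in", len(sent), "requests")
```

The point of the split is that the collecting loop stays cheap and the
copying can run on other CPUs. Python threads will not show a real speedup
here, and whether the same split pays off in the kernel is exactly the open
question in the quoted message.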