Date: Thu, 18 Nov 2021 17:43:49 +0100
From: Jan Kara
To: Chengguang Xu
Cc: Jan Kara, Miklos Szeredi, Amir Goldstein, linux-fsdevel, overlayfs, linux-kernel
Subject: Re: [RFC PATCH v5 06/10] ovl: implement overlayfs' ->write_inode operation
Message-ID: <20211118164349.GB8267@quack2.suse.cz>
In-Reply-To: <17d32ecf46e.124314f8f672.8832559275193368959@mykernel.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 18-11-21 20:02:09, Chengguang Xu wrote:
> ---- On Thursday, 2021-11-18 19:23:15 Jan Kara wrote ----
> > On Thu 18-11-21 14:32:36, Chengguang Xu wrote:
> > >
> > > ---- On Wednesday, 2021-11-17 14:11:29 Chengguang Xu wrote ----
> > > > ---- On Tuesday, 2021-11-16 20:35:55 Miklos Szeredi wrote ----
> > > > > On Tue, 16 Nov 2021 at 03:20, Chengguang Xu wrote:
> > > > > >
> > > > > > ---- On Thursday, 2021-10-07 21:34:19 Miklos Szeredi wrote ----
> > > > > > > On Thu, 7 Oct 2021 at 15:10, Chengguang Xu wrote:
> > > > > > > > > However that wasn't what I was asking about. AFAICS ->write_inode()
> > > > > > > > > won't start writeback for dirty pages. Maybe I'm missing something,
> > > > > > > > > but it looks as if nothing will actually trigger writeback for
> > > > > > > > > dirty pages in the upper inode.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Actually, page writeback on the upper inode will be triggered by overlayfs' ->writepages, and
> > > > > > > > overlayfs' ->writepages will be called by the VFS writeback function (i.e. writeback_sb_inodes).
> > > > > > >
> > > > > > > Right.
> > > > > > >
> > > > > > > But wouldn't it be simpler to do this from ->write_inode()?
> > > > > > >
> > > > > > > I.e. call write_inode_now() as suggested by Jan.
> > > > > > >
> > > > > > > Also could just call mark_inode_dirty() on the overlay inode
> > > > > > > regardless of the dirty flags on the upper inode since it shouldn't
> > > > > > > matter and results in simpler logic.
> > > > > > >
> > > > > >
> > > > > > Hi Miklos,
> > > > > >
> > > > > > Sorry for the delayed response; I've been busy with another project.
> > > > > >
> > > > > > I agree with your suggestion above. Furthermore, how about just marking the overlay inode dirty
> > > > > > whenever it has an upper inode? This approach would make marking dirtiness simple enough.
> > > > >
> > > > > Are you suggesting that all non-lower overlay inodes should always be dirty?
> > > > >
> > > > > The logic would be simple, no doubt, but there's the cost to walking
> > > > > those overlay inodes which don't have a dirty upper inode, right?
> > > >
> > > > That's true.
> > > >
> > > > > Can you quantify this cost with a benchmark? Can be totally synthetic,
> > > > > e.g. lookup a million upper files without modifying them, then call
> > > > > syncfs.
> > > > >
> > > >
> > > > No problem, I'll do some performance tests.
> > > >
> > >
> > > Hi Miklos,
> > >
> > > I did some rough tests; the results are below. In practice, I don't
> > > think that 1.3s of extra syncfs time will cause a significant problem.
> > > What do you think?
> >
> > Well, burning 1.3s worth of CPU time for doing nothing seems like quite a
> > bit to me. I understand this is with 1000000 inodes, but although that is
> > quite a few, it is not unheard of.
> > If several containers were calling syncfs(2) on the machine, they could
> > easily hog the machine... That is why I was originally against keeping
> > overlay inodes always dirty and wanted their dirtiness to at least roughly
> > track the real need to do writeback.
> >
>
> Hi Jan,
>
> Actually, the user and sys times are almost the same as when executing syncfs directly on the underlying fs.
> IMO, it only extends the syncfs(2) waiting time for that particular container; it does not burn CPU.
> What am I missing?

Ah, right, I missed that only the real time changed, not the sys time. Sorry
for the confusion. But why did the real time increase so much? Are we waiting
for some IO?

								Honza

> > > Test bed: KVM VM
> > > 2.50GHz CPU, 32 cores
> > > 64GB mem
> > > VM kernel 5.15.0-rc1+ (with ovl syncfs patch V6)
> > >
> > > One million files spread across a 2-level directory hierarchy.
> > > Test steps:
> > > 1) create test files in the ovl upper dir
> > > 2) mount overlayfs
> > > 3) execute ls -lR to look up all files in the overlay merge dir
> > > 4) execute slabtop to confirm the number of overlay inodes
> > > 5) call syncfs on a file in the merge dir
> > >
> > > Tested five times; the results range from 1.310s to 1.326s.
> > >
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh
> > > syncfs success
> > >
> > > real    0m1.310s
> > > user    0m0.000s
> > > sys     0m0.001s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh
> > > syncfs success
> > >
> > > real    0m1.326s
> > > user    0m0.001s
> > > sys     0m0.000s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh
> > > syncfs success
> > >
> > > real    0m1.321s
> > > user    0m0.000s
> > > sys     0m0.001s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh
> > > syncfs success
> > >
> > > real    0m1.316s
> > > user    0m0.000s
> > > sys     0m0.001s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-merge/create-file.sh
> > > syncfs success
> > >
> > > real    0m1.314s
> > > user    0m0.001s
> > > sys     0m0.001s
> > >
> > > Directly running syncfs on a file in the ovl-upper dir:
> > > tested five times; the results range from 0.001s to 0.003s.
> > >
> > > [root@VM-144-4-centos test]# time ./syncfs a
> > > syncfs success
> > >
> > > real    0m0.002s
> > > user    0m0.001s
> > > sys     0m0.000s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh
> > > syncfs success
> > >
> > > real    0m0.003s
> > > user    0m0.001s
> > > sys     0m0.000s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh
> > > syncfs success
> > >
> > > real    0m0.001s
> > > user    0m0.000s
> > > sys     0m0.001s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh
> > > syncfs success
> > >
> > > real    0m0.001s
> > > user    0m0.000s
> > > sys     0m0.001s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh
> > > syncfs success
> > >
> > > real    0m0.001s
> > > user    0m0.000s
> > > sys     0m0.001s
> > > [root@VM-144-4-centos test]# time ./syncfs ovl-upper/create-file.sh
> > > syncfs success
> > >
> > > real    0m0.001s
> > > user    0m0.000s
> > > sys     0m0.001s
> > >
> >
> > --
> > Jan Kara
> > SUSE Labs, CR
>

--
Jan Kara
SUSE Labs, CR
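The `./syncfs` helper used in the benchmark runs is not included in the thread. A minimal sketch of what it presumably does, assuming it simply opens its argument and calls syncfs(2) on the resulting descriptor (the helper name `syncfs_path` is hypothetical, not taken from the patch set):

```c
#define _GNU_SOURCE /* glibc declares syncfs(2) in <unistd.h> only with _GNU_SOURCE */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open any file on the target filesystem and call
 * syncfs(2) on it. syncfs(2) writes back the whole filesystem that
 * contains the file, not just that one file, which is why pointing it at
 * a file in the overlay merge dir syncs the entire overlay mount.
 * Returns 0 on success and -1 on failure (after printing the error).
 */
static int syncfs_path(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	int ret = syncfs(fd);
	if (ret < 0)
		perror("syncfs");

	close(fd);
	return ret;
}
```

A `main` wrapping it would presumably just call `syncfs_path(argv[1])` and print `syncfs success` on a zero return, matching the output in the runs, e.g. `time ./syncfs ovl-merge/create-file.sh`.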
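A back-of-the-envelope reading of the numbers above, assuming the 10^6 overlay inodes from the test description are the only difference between the ovl-merge and ovl-upper runs. The first line bounds the extra wall-clock time per inode (slowest vs. fastest pairing of runs); the second bounds how much of each ovl-merge run was not charged as user or sys time to the syncfs process, using real >= 1.310 s and user + sys <= 0.002 s in every ovl-merge run:

```latex
\[
1.307\,\mu\mathrm{s} = \frac{(1.310 - 0.003)\,\mathrm{s}}{10^{6}}
\;\le\; \frac{t^{\mathrm{merge}}_{\mathrm{real}} - t^{\mathrm{upper}}_{\mathrm{real}}}{10^{6}}
\;\le\; \frac{(1.326 - 0.001)\,\mathrm{s}}{10^{6}} = 1.325\,\mu\mathrm{s}
\]
\[
t^{\mathrm{merge}}_{\mathrm{real}} - \bigl(t^{\mathrm{merge}}_{\mathrm{user}} + t^{\mathrm{merge}}_{\mathrm{sys}}\bigr)
\;\ge\; 1.310\,\mathrm{s} - 0.002\,\mathrm{s} = 1.308\,\mathrm{s}
\]
```

The user and sys figures reported by `time` only cover CPU time charged to the syncfs process itself, so this shows the process was off-CPU for almost the whole run; on its own it does not distinguish waiting for IO from waiting on work done in other kernel threads, which is the open question in the reply.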