Date: Sat, 24 Aug 2019 12:18:40 +1000
From: Dave Chinner
To: Joseph Qi
Ts'o" , Jan Kara , Joseph Qi , Andreas Dilger , Ext4 Developers List , Xiaoguang Wang , Liu Bo Subject: Re: [RFC] performance regression with "ext4: Allow parallel DIO reads" Message-ID: <20190824021840.GW7777@dread.disaster.area> References: <20190815151336.GO14313@quack2.suse.cz> <075fd06f-b0b4-4122-81c6-e49200d5bd17@linux.alibaba.com> <20190816145719.GA3041@quack2.suse.cz> <20190820160805.GB10232@mit.edu> <20190822054001.GT7777@dread.disaster.area> <20190823101623.GV7777@dread.disaster.area> <707b1a60-00f0-847e-02f9-e63d20eab47e@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <707b1a60-00f0-847e-02f9-e63d20eab47e@linux.alibaba.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=IkcTkHD0fZMA:10 a=FmdZ9Uzk2mMA:10 a=7-415B0cAAAA:8 a=mCdJKzTsXoswTGrCkX0A:9 a=QEXdDO2ut3YA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Fri, Aug 23, 2019 at 09:08:53PM +0800, Joseph Qi wrote: > > > On 19/8/23 18:16, Dave Chinner wrote: > > On Fri, Aug 23, 2019 at 03:57:02PM +0800, Joseph Qi wrote: > >> Hi Dave, > >> > >> On 19/8/22 13:40, Dave Chinner wrote: > >>> On Wed, Aug 21, 2019 at 09:04:57AM +0800, Joseph Qi wrote: > >>>> Hi Ted, > >>>> > >>>> On 19/8/21 00:08, Theodore Y. Ts'o wrote: > >>>>> On Tue, Aug 20, 2019 at 11:00:39AM +0800, Joseph Qi wrote: > >>>>>> > >>>>>> I've tested parallel dio reads with dioread_nolock, it > >>>>>> doesn't have significant performance improvement and still > >>>>>> poor compared with reverting parallel dio reads. IMO, this > >>>>>> is because with parallel dio reads, it take inode shared > >>>>>> lock at the very beginning in ext4_direct_IO_read(). > >>>>> > >>>>> Why is that a problem? It's a shared lock, so parallel > >>>>> threads should be able to issue reads without getting > >>>>> serialized? > >>>>> > >>>> The above just tells the result that even mounting with > >>>> dioread_nolock, parallel dio reads still has poor performance > >>>> than before (w/o parallel dio reads). > >>>> > >>>>> Are you using sufficiently fast storage devices that you're > >>>>> worried about cache line bouncing of the shared lock? Or do > >>>>> you have some other concern, such as some other thread > >>>>> taking an exclusive lock? > >>>>> > >>>> The test case is random read/write described in my first > >>>> mail. And > >>> > >>> Regardless of dioread_nolock, ext4_direct_IO_read() is taking > >>> inode_lock_shared() across the direct IO call. And writes in > >>> ext4 _always_ take the inode_lock() in ext4_file_write_iter(), > >>> even though it gets dropped quite early when overwrite && > >>> dioread_nolock is set. But just taking the lock exclusively > >>> in write fro a short while is enough to kill all shared > >>> locking concurrency... > >>> > >>>> from my preliminary investigation, shared lock consumes more > >>>> in such scenario. > >>> > >>> If the write lock is also shared, then there should not be a > >>> scalability issue. The shared dio locking is only half-done in > >>> ext4, so perhaps comparing your workload against XFS would be > >>> an informative exercise... > >> > >> I've done the same test workload on xfs, it behaves the same as > >> ext4 after reverting parallel dio reads and mounting with > >> dioread_lock. 
> > Ok, so the problem is not shared locking scalability ('cause
> > that's what XFS does and it scaled fine), the problem is almost
> > certainly that ext4 is using exclusive locking during
> > writes...
> 
> Agree. Maybe I've misled you in my previous mails. I meant that the
> shared lock makes things worse in the case of mixed random
> read/write, since we would always take the inode lock during write.
> And it also conflicts with dioread_nolock: before, with
> dioread_nolock, reads didn't take any inode lock at all, but now
> they always take a shared lock.

No, you didn't mislead me. IIUC, the shared locking was added to the
direct IO read path so that it can't run concurrently with operations
like hole punch that free the blocks the dio read might currently be
operating on (use after free).

i.e. the shared locking fixes an actual bug, but the performance
regression is a result of only partially converting the direct IO
path to use shared locking. Only half the job was done from a
performance perspective.

Seems to me that the two options here to fix the performance
regression are to either finish the shared locking conversion, or
remove the shared locking on read and re-open a potential data
exposure issue...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
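
[Editor's note: the following is an illustrative userspace sketch of
the locking pattern discussed in this thread, not kernel code. A
pthread_rwlock_t stands in for the inode rwsem; dio_read() and
dio_write() are hypothetical stand-ins for a direct IO read taking
inode_lock_shared() and for ext4_file_write_iter() taking inode_lock()
exclusively. It only demonstrates the point made above: even a
briefly-held exclusive write lock serializes every shared reader that
arrives around it in a mixed random read/write workload.]

/* Build with: cc -pthread sketch.c -o sketch */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for the per-inode rwsem discussed above. */
static pthread_rwlock_t inode_rwsem = PTHREAD_RWLOCK_INITIALIZER;

static void *dio_read(void *arg)
{
	long id = (long)arg;

	/* Shared lock: many readers may hold it concurrently. */
	pthread_rwlock_rdlock(&inode_rwsem);
	printf("reader %ld: issuing direct IO read\n", id);
	usleep(100 * 1000);		/* stand-in for the IO itself */
	pthread_rwlock_unlock(&inode_rwsem);
	return NULL;
}

static void *dio_write(void *arg)
{
	(void)arg;

	/*
	 * Exclusive lock: excludes all shared holders. Even though it
	 * is dropped quickly, readers queued behind it are serialized.
	 */
	pthread_rwlock_wrlock(&inode_rwsem);
	printf("writer: exclusive lock held briefly\n");
	usleep(10 * 1000);
	pthread_rwlock_unlock(&inode_rwsem);
	return NULL;
}

int main(void)
{
	pthread_t readers[4], writer;
	long i;

	for (i = 0; i < 4; i++)
		pthread_create(&readers[i], NULL, dio_read, (void *)i);
	pthread_create(&writer, NULL, dio_write, NULL);

	for (i = 0; i < 4; i++)
		pthread_join(readers[i], NULL);
	pthread_join(writer, NULL);
	return 0;
}

[Under the "finish the shared locking conversion" option described
above, the writer in the non-extending overwrite case would take the
lock shared as well, so readers and overwriting writers could run
concurrently; only operations that change the file layout would need
the exclusive lock.]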