Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp13220284ybl; Sun, 29 Dec 2019 07:04:48 -0800 (PST) X-Google-Smtp-Source: APXvYqzuPfihhriT8CN67VeIM3wYrCYfxRQXYZH4GIXeAuyYVX++bfBJdCNipMH7RWWTdzUzTcGT X-Received: by 2002:a05:6830:4d5:: with SMTP id s21mr51270551otd.294.1577631888220; Sun, 29 Dec 2019 07:04:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1577631888; cv=none; d=google.com; s=arc-20160816; b=AKmYd9YqDIENoHM50tsqYMvJkb6VsXTcAUeTCQAiuQBj4my9qagkEkH6MzJU1IaysY HL2AtueJ/x7tFgnTvHbtZ4N68qoZBZ7oUja03R/DuoLK8U3BTdup9ti6bAaSQzjWwHh0 fSBIaImaUC3kg17R+PhFBvZioN1dobxZuVqNnzpM+0zaof+s86eUpUWi2utKnehXi7rk FTaiTpQaKU2PvxRAhelYNgYwDnQPfSG+iwEw/h/BDKJwnlA/IXHdHrej5flBOF9NOyOa f517FM+sDbW8Xjfo4RtOmTTKD+xPPjuxxaSpllc8g8RUypIntUKsM1kG578fQ7HKyz/+ 3yUQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=GkTvqChWFh9+FPQBZ4kO5+45DiynyyEkFcv7lzv+fOg=; b=sKFY+/k+aftnPt+p37K5O4hUj2xPqkH3GLULA7q/KIww6faRGScZOa/avNFJTiqrV1 6+Emnx1OQHx0WYLFDjoCL+d/IB6rEOtTeTdaOby6jwwznZTDyRFxCSNwkhCS6AD/mNEl has5+WMQB9mm158I2FBxGXfnRzme1eGCcQiRES84W9Bm7eDpkbfawnQ8+VdpJrHat4fM 8sp0vtFqtrbesrxMNiMwzhWLfc5i/W4LxdYPpNgbb6zRGVb7yU//zlSy+CHSID/iUBW2 37c0ta3PfZ+p25lSqZwRGBxWcuUNmoYbycbgbQIxcstZ0wOYAiVES1A/ypOHnMnrgeuG mqaQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id c7si20755877oto.305.2019.12.29.07.04.31; Sun, 29 Dec 2019 07:04:48 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726543AbfL2PDV (ORCPT + 99 others); Sun, 29 Dec 2019 10:03:21 -0500 Received: from out30-133.freemail.mail.aliyun.com ([115.124.30.133]:58517 "EHLO out30-133.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726410AbfL2PDV (ORCPT ); Sun, 29 Dec 2019 10:03:21 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R841e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=xiaoguang.wang@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0TmBIwOh_1577631797; Received: from 30.39.178.78(mailfrom:xiaoguang.wang@linux.alibaba.com fp:SMTPD_---0TmBIwOh_1577631797) by smtp.aliyun-inc.com(127.0.0.1); Sun, 29 Dec 2019 23:03:18 +0800 Subject: Re: Discussion: is it time to remove dioread_nolock? To: "Theodore Y. Ts'o" , Ext4 Developers List Cc: joseph.qi@linux.alibaba.com, Liu Bo References: <20191226153118.GA17237@mit.edu> From: Xiaoguang Wang Message-ID: <9042a8f4-985a-fc83-c059-241c9440200c@linux.alibaba.com> Date: Sun, 29 Dec 2019 23:03:17 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1 MIME-Version: 1.0 In-Reply-To: <20191226153118.GA17237@mit.edu> Content-Type: text/plain; charset=gbk; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org hi, > With inclusion of Ritesh's inode lock scalability patches[1], the > traditional performance reasons for dioread_nolock --- namely, > removing the need to take an exclusive lock for Direct I/O read > operations --- has been removed. > > [1] https://lore.kernel.org/r/20191212055557.11151-1-riteshh@linux.ibm.com > > So... is it time to remove the code which supports dioread_nolock? > Doing so would simplify the code base, and reduce the test matrix. > This would also make it easier to restructure the write path when > allocating blocks so that the extent tree is updated after writing out > the data blocks, by clearing away the underbrush of dioread nolock > first. > > If we do this, we'd leave the dioread_nolock mount option for > backwards compatibility, but it would be a no-op and not actually do > anything. > > Any objections before I look into ripping out dioread_nolock? > > The one possible concern that I considered was for Alibaba, which was > doing something interesting with dioread_nolock plus nodelalloc. But > looking at Liu Bo's explanation[2], I believe that their workload > would be satisfied simply by using the standard ext4 mount options > (that is, the default mode has the performance benefits when doing > parallel DIO reads, and so the need for nodelalloc to mitigate the > tail latency concerns which Alibaba was seeing in their workload would > not be needed). Could Liu or someone from Alibaba confirm, perhaps > with some benchmarks using their workload? Currently we don't use dioread_nolock & nodelalloc in our internal servers, and we use dioread_nolock & delalloc widely, it works well. The initial reason we use dioread_nolock is that it'll also allocate unwritten extents for buffered write, and normally the corresponding inode won't be added to jbd2 transaction's t_inode_list, so while commiting transaction, it won't flush inodes' dirty pages, then transaction will commit quickly, otherwise in extream case, the time taking to flush dirty inodes will be very big, especially cgroup writeback is enabled. A previous discussion: https://marc.info/?l=linux-fsdevel&m=151799957104768&w=2 I think this semantics hidden behind diread_nolock is also important, so if planning to remove this mount option, we should keep same semantics. Regards, Xiaoguang Wang > > [2] https://lore.kernel.org/linux-ext4/20181121013035.ab4xp7evjyschecy@US-160370MP2.local/ > > - Ted >