Date: Sat, 18 Jan 2020 23:58:28 -0800
From: Matthew Wilcox
To: "yukuai (C)"
Cc: hch@infradead.org, darrick.wong@oracle.com, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    houtao1@huawei.com, zhengbin13@huawei.com, yi.zhang@huawei.com
Subject: Re: [RFC] iomap: fix race between readahead and direct write
Message-ID: <20200119075828.GA4147@bombadil.infradead.org>
References: <20200116063601.39201-1-yukuai3@huawei.com>
 <20200118230826.GA5583@bombadil.infradead.org>
 <20200119014213.GA16943@bombadil.infradead.org>
 <64d617cc-e7fe-6848-03bb-aab3498c9a07@huawei.com>
 <20200119061402.GA7301@bombadil.infradead.org>

On Sun, Jan 19, 2020 at 02:55:14PM +0800, yukuai (C) wrote:
> On 2020/1/19 14:14, Matthew Wilcox wrote:
> > I don't understand your reasoning here.  If another process wants to
> > access a page of the file which isn't currently in cache, it would have
> > to first read the page in from storage.  If it's under readahead, it
> > has to wait for the read to finish.  Why is the second case worse than
> > the first?  It seems better to me.
>
> Thanks for your response!  My worry is that, for example:
>
> We read page 0 and trigger readahead to read n pages (0 to n-1).
> Meanwhile, another thread reads page n-1.
>
> In the current implementation, if readahead is in the process of reading
> pages 0 to n-2, the later operation doesn't need to wait for the former
> one to finish.  However, the later operation will have to wait if we add
> all the pages to the page cache first.  That is why I said it might cause
> a performance problem.

OK, but let's put some numbers on that.  Imagine that we're using
high-performance spinning rust, so we have an access latency of 5ms
(200 IOPS), and we're accessing 20 consecutive pages which happen to have
their data contiguous on disk.  Our CPU runs at 2GHz and takes about
100,000 cycles to submit an I/O, plus 1,000 cycles to add an extra page
to the I/O.

Current implementation: Allocate 20 pages, place 19 of them in the cache,
fail to place the last one in the cache.  The later thread actually gets
to jump the queue and submit its bio first.  Its latency will be 100,000
cycles (50us) plus the 5ms access time.  But it only has 20,000 cycles
(10us) to hit this race, or it will end up behaving the same way as below.

New implementation: Allocate 20 pages, place them all in the cache, then
take 120,000 cycles to build & submit the I/O, and wait 5ms for the I/O
to complete.  But look how much more likely it is that the second thread
will hit during the window where we're waiting for the I/O to complete --
5ms is 500 times longer than 10us.

If it _does_ get the latency benefit of jumping the queue, the readahead
will create one or two I/Os.  If it hit page 18 instead of page 19, we'd
end up doing three I/Os: the first for page 18, then one for pages 0-17,
and one for page 19.  That means the disk is going to be busy for 15ms,
delaying the next I/O by up to 10ms.  It's actually beneficial in the
long term for the second thread to wait for the readahead to finish.
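To make those numbers easy to check, here's a small userspace C program
that redoes the arithmetic.  The 2GHz clock, the cycle counts, and the
5ms access time are the assumptions stated above, not measurements:

	#include <stdio.h>

	int main(void)
	{
		const double hz = 2e9;             /* assumed 2GHz clock */
		const double submit = 100000 / hz; /* cycles to submit an I/O */
		const double per_page = 1000 / hz; /* cycles per extra page */
		const double access = 5e-3;        /* 5ms access time (200 IOPS) */

		/* Old scheme: the race is only open while the 20 pages are
		 * being added to the cache/I/O (20,000 cycles). */
		double window_old = 20 * per_page;
		/* New scheme: all pages are already in the cache, so the race
		 * is open for the whole 5ms the I/O is in flight. */
		double window_new = access;

		printf("build+submit a 20-page I/O: %.0fus\n",
		       (submit + 20 * per_page) * 1e6);
		printf("old race window: %.0fus\n", window_old * 1e6);
		printf("new race window: %.0fus (%.0fx longer)\n",
		       window_new * 1e6, window_new / window_old);
		return 0;
	}

Running it prints 60us for the submission, a 10us window for the old
scheme, and a 5000us (500x longer) window for the new one.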
Oh, and the current ->readpages code has a race where if the page tagged with PageReadahead ends up not being inserted, we'll lose that bit, which means the readahead will just stop and have to restart (because it will look to the readahead code like it's not being effective). That's a far worse performance problem.
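For illustration, here's a rough sketch of how that bit can get lost.
SetPageReadahead(), add_to_page_cache_lru(), __page_cache_alloc() and
put_page() are the real kernel interfaces; the loop structure and
submit_page_read() are simplified stand-ins for the actual
__do_page_cache_readahead()/->readpages() path:

	#include <linux/mm.h>
	#include <linux/pagemap.h>
	#include <linux/page-flags.h>

	static void submit_page_read(struct page *page); /* hypothetical helper */

	static void readahead_batch(struct address_space *mapping, pgoff_t start,
				    unsigned long nr, unsigned long lookahead)
	{
		unsigned long i;

		for (i = 0; i < nr; i++) {
			struct page *page = __page_cache_alloc(GFP_KERNEL);

			if (!page)
				break;
			if (i == nr - lookahead)
				/* Mark the page whose access should trigger
				 * the next asynchronous readahead. */
				SetPageReadahead(page);
			if (add_to_page_cache_lru(page, mapping, start + i,
						  GFP_KERNEL)) {
				/* Lost a race with another thread inserting
				 * at this index.  If this was the marked
				 * page, PG_readahead vanishes with it: no
				 * cached page carries the flag, so async
				 * readahead silently stops until a cache
				 * miss restarts it from scratch. */
				put_page(page);
				continue;
			}
			submit_page_read(page);
		}
	}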