Date: Tue, 18 Feb 2020 17:14:59 +1100
From: Dave Chinner
To: Matthew Wilcox
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com,
	ocfs2-devel@oss.oracle.com, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v6 07/19] mm: Put readahead pages in cache earlier
Message-ID: <20200218061459.GM10776@dread.disaster.area>
References: <20200217184613.19668-1-willy@infradead.org>
	<20200217184613.19668-12-willy@infradead.org>
In-Reply-To: <20200217184613.19668-12-willy@infradead.org>

On Mon, Feb 17, 2020 at 10:45:52AM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)"
> 
> At allocation time, put the pages in the cache unless we're using
> ->readpages. Add the readahead_for_each() iterator for the benefit of
> the ->readpage fallback. This iterator supports huge pages, even though
> none of the filesystems to be converted do yet.

This could be better written - it took me some time to get my head
around it and the code. Something like this, perhaps:

"When populating the page cache for readahead, mappings that don't
use ->readpages need to have their pages added to the page cache
before ->readpage is called. Do this insertion earlier so that the
pages can be looked up immediately prior to ->readpage calls rather
than passing them on a linked list. This early insert functionality
is also required by the upcoming ->readahead method that will
replace ->readpages.

Optimise and simplify the readpage loop by adding a
readahead_for_each() iterator to provide the pages we need to read.
This iterator also supports huge pages, even though none of the
filesystems have been converted to use them yet."

> +static inline struct page *readahead_page(struct readahead_control *rac)
> +{
> +	struct page *page;
> +
> +	if (!rac->_nr_pages)
> +		return NULL;

Hmmmm.

> +
> +	page = xa_load(&rac->mapping->i_pages, rac->_start);
> +	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +	rac->_batch_count = hpage_nr_pages(page);

So we could have rac->_nr_pages = 2, and then we get an order-2
large page returned, and so rac->_batch_count = 4.

> +
> +	return page;
> +}
> +
> +static inline void readahead_next(struct readahead_control *rac)
> +{
> +	rac->_nr_pages -= rac->_batch_count;
> +	rac->_start += rac->_batch_count;

This results in rac->_nr_pages = -2 (or rather a huge positive
number, since it is unsigned). That means readahead_page() will not
terminate when it should, and will potentially panic if it doesn't
find the page it thinks should be there at rac->_start + 4...
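To make the failure mode concrete, here's a minimal userspace sketch
of that arithmetic (struct fake_rac, its field types and the
hard-coded 4 are stand-ins of my own for struct readahead_control
and hpage_nr_pages() on an order-2 page, assuming the counters are
unsigned as in the patch):

#include <stdio.h>

struct fake_rac {
	unsigned long _start;
	unsigned int _nr_pages;
	unsigned int _batch_count;
};

int main(void)
{
	struct fake_rac rac = { ._start = 100, ._nr_pages = 2 };

	/* readahead_page(): an order-2 page covers more than what is left */
	rac._batch_count = 4;			/* stand-in for hpage_nr_pages(page) */

	/* readahead_next() */
	rac._nr_pages -= rac._batch_count;	/* 2 - 4 wraps around */
	rac._start += rac._batch_count;

	printf("_nr_pages = %u, _start = %lu\n", rac._nr_pages, rac._start);
	return 0;
}

That prints _nr_pages = 4294967294, so the !rac->_nr_pages check in
readahead_page() never fires and the iteration runs off the end of
the readahead window.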
> +#define readahead_for_each(rac, page)				\
> +	for (; (page = readahead_page(rac)); readahead_next(rac))
> +
>  /* The number of pages in this readahead block */
>  static inline unsigned int readahead_count(struct readahead_control *rac)
>  {
> diff --git a/mm/readahead.c b/mm/readahead.c
> index bdc5759000d3..9e430daae42f 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -113,12 +113,11 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
> 
>  EXPORT_SYMBOL(read_cache_pages);
> 
> -static void read_pages(struct readahead_control *rac, struct list_head *pages,
> -		gfp_t gfp)
> +static void read_pages(struct readahead_control *rac, struct list_head *pages)
>  {
>  	const struct address_space_operations *aops = rac->mapping->a_ops;
> +	struct page *page;
>  	struct blk_plug plug;
> -	unsigned page_idx;
> 
>  	blk_start_plug(&plug);
> 
> @@ -127,19 +126,13 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
>  				readahead_count(rac));
>  		/* Clean up the remaining pages */
>  		put_pages_list(pages);
> -		goto out;
> -	}
> -
> -	for (page_idx = 0; page_idx < readahead_count(rac); page_idx++) {
> -		struct page *page = lru_to_page(pages);
> -		list_del(&page->lru);
> -		if (!add_to_page_cache_lru(page, rac->mapping, page->index,
> -				gfp))
> +	} else {
> +		readahead_for_each(rac, page) {
>  			aops->readpage(rac->file, page);
> -		put_page(page);
> +			put_page(page);
> +		}
>  	}

Nice simplification, and it gets rid of the need for rac->mapping,
but I still find the aops variable weird.

> -out:
>  	blk_finish_plug(&plug);
>  }
> 
> @@ -159,6 +152,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  	unsigned long i;
>  	loff_t isize = i_size_read(inode);
>  	gfp_t gfp_mask = readahead_gfp_mask(mapping);
> +	bool use_list = mapping->a_ops->readpages;
>  	struct readahead_control rac = {
>  		.mapping = mapping,
>  		.file = filp,

[ I do find these unstructured mixes of declarations and
initialisations dense and difficult to read.... ]

> @@ -196,8 +190,14 @@ void __do_page_cache_readahead(struct address_space *mapping,
>  		page = __page_cache_alloc(gfp_mask);
>  		if (!page)
>  			break;
> -		page->index = offset;
> -		list_add(&page->lru, &page_pool);
> +		if (use_list) {
> +			page->index = offset;
> +			list_add(&page->lru, &page_pool);
> +		} else if (add_to_page_cache_lru(page, mapping, offset,
> +					gfp_mask) < 0) {
> +			put_page(page);
> +			goto read;
> +		}

Ok, so that's why you put the read code at the end of the loop. To
turn the code into spaghetti :/

How much does this simplify down when we get rid of ->readpages and
can restructure the loop? This really seems like you're trying to
flatten two nested loops into one by the use of a goto....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com