Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp4153050pxf; Tue, 23 Mar 2021 04:12:15 -0700 (PDT) X-Google-Smtp-Source: ABdhPJylZbWkEAGU4eRA4QdjiXPsTNXwrXB2fPQld2LbHUGKQeTEZRAiITXkuiKt2EL6hCBspgjw X-Received: by 2002:a05:6402:145a:: with SMTP id d26mr4029844edx.182.1616497935171; Tue, 23 Mar 2021 04:12:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616497935; cv=none; d=google.com; s=arc-20160816; b=MVPGNROMteRlY/476wrtaInddn+21lgWYX8jMe1NeMDIz7oQiWG+nNENyX/G54gXNg awhRjrwOKdFTT9b48YSmVR99b0L/a/0/35WvGGs59jkE4hUT4xwnQ+7j0qGIy1IaSss8 IizuYSlUyc+GE7I/TXoEDnbw4NEHtzwb1ooMFAeuPwWygY/ADYq11R6CdW1q7TrkHukd 8bU4DMxyxn7BUMkJhzupbwAh/SJTNxzCa2EZ6AkutN9wOBZ2bTjq7pKSerGPuOsMehme NjUwpXWD2e0Kom2wInb7XISulOjVsJNpWhEb4NRh3luwrYB0eXGTq8g7V1FADgwgdANB 2m5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=jH64CwItD7+IDNu5T9rDTh1QqhWJhGHEtaUsuBHfOjI=; b=lLML4cHlN9CHNbh0v9GxdQ/bIheH4tA8bixnvtDeD/LFbF9U35lT/iEySt9U3RwUe/ h4ovUfixf0C8rO+YtO3Dzwh29CztV3ELm9vYf+JXyNalF0RL8+QVBMPkdJUsUL8gOeEj vwmj+vU5faTl8v7BWVZQl8lKIQZg+go14Aw6EiRXomQ/RcrKJIo1Vx7YVWWUMY2z8Bqo qOiQJuGy5wqbI15vPVi7Zoi64Pz4SJ/7LGEa4iDK6uw+wxU861Nb6yd36q4EuBq8UqHa xs5bOXBTQta6l3eRA+awFqQmgzVHEKCWtxoxDwkHJxhdhw30cGkWraucHlf45em0GTFZ A6Mg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=KtR9TPuX; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e2si12991933ejd.17.2021.03.23.04.11.43; Tue, 23 Mar 2021 04:12:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=KtR9TPuX; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230236AbhCWLJY (ORCPT + 99 others); Tue, 23 Mar 2021 07:09:24 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:54325 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230433AbhCWLJF (ORCPT ); Tue, 23 Mar 2021 07:09:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616497744; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jH64CwItD7+IDNu5T9rDTh1QqhWJhGHEtaUsuBHfOjI=; b=KtR9TPuXH2C6X3zmB0VHWNc0CGD1dHNX50z+7Tm00fB7ZAl2qZ4JahD+DfbD8BrPse85RT pgWew+WoB2+tXQKVp0oPuajpoFkbK/7o5xTx8lIDO1at2DzP1oBd8mZ+qj8VWQQ1U9Zq/z VFaAlJrzPMYIOduOL8JzwI+1+rZOtFE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-351-XYjXjMkUNaOx1Y9hR_U8xA-1; Tue, 23 Mar 2021 07:09:01 -0400 X-MC-Unique: XYjXjMkUNaOx1Y9hR_U8xA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9D7881009466; Tue, 23 Mar 2021 11:08:59 +0000 (UTC) Received: from carbon (unknown [10.36.110.5]) by smtp.corp.redhat.com (Postfix) with ESMTP id 38A1F5D9F0; Tue, 23 Mar 2021 11:08:52 +0000 (UTC) Date: Tue, 23 Mar 2021 12:08:51 +0100 From: Jesper Dangaard Brouer To: Mel Gorman Cc: Chuck Lever III , Andrew Morton , Vlastimil Babka , Christoph Hellwig , Alexander Duyck , Matthew Wilcox , LKML , Linux-Net , Linux-MM , Linux NFS Mailing List , brouer@redhat.com Subject: Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator Message-ID: <20210323120851.18d430cf@carbon> In-Reply-To: <20210322205827.GJ3697@techsingularity.net> References: <20210322091845.16437-1-mgorman@techsingularity.net> <20210322194948.GI3697@techsingularity.net> <0E0B33DE-9413-4849-8E78-06B0CDF2D503@oracle.com> <20210322205827.GJ3697@techsingularity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Mon, 22 Mar 2021 20:58:27 +0000 Mel Gorman wrote: > On Mon, Mar 22, 2021 at 08:32:54PM +0000, Chuck Lever III wrote: > > >> It is returning some confusing (to me) results. I'd like > > >> to get these resolved before posting any benchmark > > >> results. > > >> > > >> 1. When it has visited every array element, it returns the > > >> same value as was passed in @nr_pages. That's the N + 1th > > >> array element, which shouldn't be touched. Should the > > >> allocator return nr_pages - 1 in the fully successful case? > > >> Or should the documentation describe the return value as > > >> "the number of elements visited" ? > > >> > > > > > > I phrased it as "the known number of populated elements in the > > > page_array". > > > > The comment you added states: > > > > + * For lists, nr_pages is the number of pages that should be allocated. > > + * > > + * For arrays, only NULL elements are populated with pages and nr_pages > > + * is the maximum number of pages that will be stored in the array. > > + * > > + * Returns the number of pages added to the page_list or the index of the > > + * last known populated element of page_array. > > > > > > > I did not want to write it as "the number of valid elements > > > in the array" because that is not necessarily the case if an array is > > > passed in with holes in the middle. I'm open to any suggestions on how > > > the __alloc_pages_bulk description can be improved. > > > > The comments states that, for the array case, a /count/ of > > pages is passed in, and an /index/ is returned. If you want > > to return the same type for lists and arrays, it should be > > documented as a count in both cases, to match @nr_pages. > > Consumers will want to compare @nr_pages with the return > > value to see if they need to call again. > > > > Then I'll just say it's the known count of pages in the array. That > might still be less than the number of requested pages if holes are > encountered. > > > > The definition of the return value as-is makes sense for either a list > > > or an array. Returning "nr_pages - 1" suits an array because it's the > > > last valid index but it makes less sense when returning a list. > > > > > >> 2. Frequently the allocator returns a number smaller than > > >> the total number of elements. As you may recall, sunrpc > > >> will delay a bit (via a call to schedule_timeout) then call > > >> again. This is supposed to be a rare event, and the delay > > >> is substantial. But with the array-based API, a not-fully- > > >> successful allocator call seems to happen more than half > > >> the time. Is that expected? I'm calling with GFP_KERNEL, > > >> seems like the allocator should be trying harder. > > >> > > > > > > It's not expected that the array implementation would be worse *unless* > > > you are passing in arrays with holes in the middle. Otherwise, the success > > > rate should be similar. > > > > Essentially, sunrpc will always pass an array with a hole. > > Each RPC consumes the first N elements in the rq_pages array. > > Sometimes N == ARRAY_SIZE(rq_pages). AFAIK sunrpc will not > > pass in an array with more than one hole. Typically: > > > > .....PPPP > > > > My results show that, because svc_alloc_arg() ends up calling > > __alloc_pages_bulk() twice in this case, it ends up being > > twice as expensive as the list case, on average, for the same > > workload. > > > > Ok, so in this case the caller knows that holes are always at the > start. If the API returns an index that is a valid index and populated, > it can check the next index and if it is valid then the whole array > must be populated. > > Right now, the implementation checks for populated elements at the *start* > because it is required for calling prep_new_page starting at the correct > index and the API cannot make assumptions about the location of the hole. > > The patch below would check the rest of the array but note that it's > slower for the API to do this check because it has to check every element > while the sunrpc user could check one element. Let me know if a) this > hunk helps and b) is desired behaviour. > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c83d38dfe936..4bf20650e5f5 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5107,6 +5107,9 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > } else { > while (prep_index < nr_populated) > prep_new_page(page_array[prep_index++], 0, gfp, 0); > + > + while (nr_populated < nr_pages && page_array[nr_populated]) > + nr_populated++; > } > > return nr_populated; I do know that I suggested moving prep_new_page() out of the IRQ-disabled loop, but maybe was a bad idea, for several reasons. All prep_new_page does is to write into struct page, unless some debugging stuff (like kasan) is enabled. This cache-line is hot as LRU-list update just wrote into this cache-line. As the bulk size goes up, as Matthew pointed out, this cache-line might be pushed into L2-cache, and then need to be accessed again when prep_new_page() is called. Another observation is that moving prep_new_page() into loop reduced function size with 253 bytes (which affect I-cache). ./scripts/bloat-o-meter mm/page_alloc.o-prep_new_page-outside mm/page_alloc.o-prep_new_page-inside add/remove: 18/18 grow/shrink: 0/1 up/down: 144/-397 (-253) Function old new delta __alloc_pages_bulk 1965 1712 -253 Total: Before=60799, After=60546, chg -0.42% Maybe it is better to keep prep_new_page() inside the loop. This also allows list vs array variant to share the call. And it should simplify the array variant code. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer [PATCH] mm: move prep_new_page inside IRQ disabled loop From: Jesper Dangaard Brouer ./scripts/bloat-o-meter mm/page_alloc.o-prep_new_page-outside mm/page_alloc.o-prep_new_page-inside add/remove: 18/18 grow/shrink: 0/1 up/down: 144/-397 (-253) Function old new delta __alloc_pages_bulk 1965 1712 -253 Total: Before=60799, After=60546, chg -0.42% Signed-off-by: Jesper Dangaard Brouer --- mm/page_alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 88a5c1ce5b87..b4ff09b320bc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5096,11 +5096,13 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid, else page_array[nr_populated] = page; nr_populated++; + prep_new_page(page, 0, gfp, 0); } local_irq_restore(flags); /* Prep pages with IRQs enabled. */ +/* if (page_list) { list_for_each_entry(page, page_list, lru) prep_new_page(page, 0, gfp, 0); @@ -5108,7 +5110,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid, while (prep_index < nr_populated) prep_new_page(page_array[prep_index++], 0, gfp, 0); } - +*/ return nr_populated; failed_irq: