Date: Wed, 17 Jun 2020 12:03:42 +0800
From: Baoquan He
To: Jaewon Kim
Cc: Vlastimil Babka, 김재원, mgorman@techsingularity.net, minchan@kernel.org,
    mgorman@suse.de, hannes@cmpxchg.org, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, 이용택, 김철민
Subject: Re: (2) [PATCH v2] page_alloc: consider highatomic reserve in watermark fast
Message-ID: <20200617040342.GC3346@MiWiFi-R3L-srv>
References: <26a6a2c2-56e6-1a34-84ac-8ff9f74c34ce@suse.cz>
 <20200613025102.12880-1-jaewon31.kim@samsung.com>
 <20200613094228.GB3346@MiWiFi-R3L-srv>
 <20200613231736.GJ20367@MiWiFi-R3L-srv>
 <20200616073002epcms1p12a14cc85b0aebf8ba88bedee7495b34f@epcms1p1>
 <20200616141717.GN20367@MiWiFi-R3L-srv>

On 06/17/20 at 12:46am, Jaewon Kim wrote:
...
> > > >>> i.e.)
> > > >>> In the following situation, the watermark check fails (9MB - 8MB < 4MB)
> > > >>> even though there is enough free memory (9MB - 4MB > 4MB). If this
> > > >>> really matters, we need to count highatomic free accurately.
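To make the arithmetic in the quoted example concrete, here is a standalone
sketch of the two checks (the variable names and the KB units are illustrative,
not the kernel's; the accounted reservation is 8MB while only 4MB of highatomic
pages are actually free):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	long free_kb                = 9 * 1024;	/* total free in the zone     */
	long reserved_highatomic_kb = 8 * 1024;	/* accounted reservation      */
	long actual_highatomic_kb   = 4 * 1024;	/* truly free highatomic pages */
	long watermark_kb           = 4 * 1024;

	/* What the watermark check effectively does for a !ALLOC_HARDER caller:
	 * subtract the accounted reservation, not the real highatomic free. */
	bool accounted_check = (free_kb - reserved_highatomic_kb) > watermark_kb;

	/* What an accurate count of highatomic free pages would conclude. */
	bool accurate_check = (free_kb - actual_highatomic_kb) > watermark_kb;

	printf("accounted check: %s\n", accounted_check ? "pass" : "fail"); /* fail */
	printf("accurate check:  %s\n", accurate_check ? "pass" : "fail");  /* pass */
	return 0;
}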
> > > >>
> > > >> Seems to make sense. We only use zone->nr_reserved_highatomic to
> > > >> account for the reserved highatomic pageblocks, but don't track how
> > > >> much of that has actually been used for highatomic allocations. I am
> > > >> not sure, though, whether this happens often or is a very rare case,
> > > >> and whether taking it into account would have any impact.
> > > >
> > > > Unfortunately there's a problem when trying to account free pages of a
> > > > migratetype exactly: e.g. during reserve_highatomic_pageblock(), some
> > > > pages might be on the pcplist of another cpu with a different
> > > > migratetype, and once they are freed, buddy merging will merge the
> > > > different migratetypes and distort the accounting.
> >
> > Yeah, the migratetype could have been cached in page->index before we
> > finish reserve_highatomic_pageblock(). It seems we take a very
> > coarse-grained approach to the highatomic reservation and accounting.
> > When I went through the related code, I found that we call
> > reserve_highatomic_pageblock() when the condition below is met. So what
> > if the order is 1 and all other pages on the pageblock have already been
> > used? Could we possibly hit this kind of extreme case?
>
> If I understand your question correctly, yes, I think so.
> If a high-order free page was allocated in an ALLOC_HARDER context, and
> that page was the last order-1 free page, then zone->nr_reserved_highatomic
> will be increased by pageblock_nr_pages even though no free page was
> actually moved to the highatomic free list.

Exactly.

> The highatomic logic, I think, was originally designed to reserve in that
> coarse-grained way to mitigate highatomic allocation failures.
>
> >
> > From Jaewon's captured information, we can see that the available free
> > highatomic is even less than half of zone->nr_reserved_highatomic. Should
> > we at least limit the reservation to cases with a bigger order, e.g. 1/2
> > of pageblock_nr_pages? Please correct me if I am wrong or missed anything.
>
> I do not know for sure, but I think high orders lower than 1/2 of
> pageblock_nr_pages should also be considered, e.g. a system doing a huge
> number of order-3 atomic allocations in a short time.
>
> > "reserved_highatomic:32768KB and actually 14232kB free."
>
> I think this unwanted case may be caused by freed pages. Pages allocated
> for non-highatomic purposes would also be freed back onto the highatomic
> free list. But I guess there was a sudden huge burst of highatomic use
> that was then partially freed.

OK, thanks for sharing. So we can live with it, and may do some improvement
if any issue is reported in the future.

> >
> > get_page_from_freelist
> > {
> > ...
> >         if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
> >                 reserve_highatomic_pageblock(page, zone, order);
> > ...
> > }
>
> > > > Fixing this for all migratetypes would have performance overhead, so
> > > > we only do that for MIGRATE_ISOLATE, which is not that frequent (and
> > > > it took a while to eliminate all corner cases), and for CMA, which
> > > > doesn't change pageblocks dynamically.
> > >
> > > AFAIK we do not account free cma in pcp either. But yes, an accurate
> > > check could add overhead. For example, __mod_zone_freepage_state would
> > > have to account highatomic free pages the way it accounts cma free
> > > pages. And we might see some incorrect accounting issues.
> > >
> > > >
> > > > So either we live with the possible over-reclaim due to inaccurate
> > > > counting per your example above, or we instead let order-0 atomic
> > > > allocations use the highatomic reserves.
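To make the coarse-grained accounting discussed above concrete, here is a
condensed sketch of the reservation path (based on a reading of
mm/page_alloc.c around v5.7 and simplified; not a verbatim copy, details may
differ):

static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
					 unsigned int alloc_order)
{
	unsigned long max_managed, flags;
	int mt;

	/* Cap the reservation at roughly 1% of the zone. */
	max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
	if (zone->nr_reserved_highatomic >= max_managed)
		return;

	spin_lock_irqsave(&zone->lock, flags);

	mt = get_pageblock_migratetype(page);
	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt) &&
	    !is_migrate_cma(mt)) {
		/*
		 * The counter is bumped by a whole pageblock even if, as in
		 * the extreme case above, the last order-1 page on the block
		 * was just allocated: how many pages move_freepages_block()
		 * really moved is not fed back into the accounting.
		 */
		zone->nr_reserved_highatomic += pageblock_nr_pages;
		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
	}

	spin_unlock_irqrestore(&zone->lock, flags);
}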
> > >
> > > Additionally, regarding Mel's existing comment, let me remove some of
> > > it if you don't mind:
> > >
> > >  /*
> > >   * Fast check for order-0 only. If this fails then the reserves
> > > - * need to be calculated. There is a corner case where the check
> > > - * passes but only the high-order atomic reserve are free. If
> > > - * the caller is !atomic then it'll uselessly search the free
> > > - * list. That corner case is then slower but it is harmless.
> > > + * need to be calculated.
> > >  */
> > >
> > > I will prepare a v3 patch.
> > > Thank you again.
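For reference, a sketch of how the order-0 fast check could subtract the
highatomic reserve once that comment text is dropped. This is only an assumed
shape of the v3 change, not the posted patch; a caller without ALLOC_HARDER
rights treats the whole accounted reservation as unusable, which over-estimates
the reserve (per the discussion above) but avoids searching the free lists:

static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
				       unsigned long mark, int highest_zoneidx,
				       unsigned int alloc_flags)
{
	long free_pages = zone_page_state(z, NR_FREE_PAGES);
	long unusable = 0;

	/* Highatomic reserves are off limits unless the caller may dip
	 * into them; the accounted value is an over-estimate. */
	if (!(alloc_flags & ALLOC_HARDER))
		unusable += z->nr_reserved_highatomic;
#ifdef CONFIG_CMA
	/* Likewise, free CMA pages are unusable for non-CMA allocations. */
	if (!(alloc_flags & ALLOC_CMA))
		unusable += zone_page_state(z, NR_FREE_CMA_PAGES);
#endif

	/*
	 * Fast check for order-0 only. If this fails then the reserves
	 * need to be calculated.
	 */
	if (!order && (free_pages - unusable) >
			mark + z->lowmem_reserve[highest_zoneidx])
		return true;

	return __zone_watermark_ok(z, order, mark, highest_zoneidx,
				   alloc_flags, free_pages);
}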