Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp2484856ybt; Tue, 16 Jun 2020 07:19:57 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz6lzPdSXV2/+SdzL862dt/20JaFIUvmft9t7GZRzPjoZYtV9FLyiTf/L+Zm56Sc2LZXVdJ X-Received: by 2002:a05:6402:642:: with SMTP id u2mr2792432edx.230.1592317197284; Tue, 16 Jun 2020 07:19:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592317197; cv=none; d=google.com; s=arc-20160816; b=m3Dc2hlb5WJF6b9KOg+aixM4hZsnm51n0LppsS6aBt7w5AQpz4pCkIElFHGCxNyUMk njEPqMMs9lsus57JpAljeOhATcfQF+pc2lh/eD00yy2IFmp2LLjZ+qz+jIxh8r3knRXs E2J5aGBaqxhptZtpqVsZ5GgeAI1vMfirI1wu7Jis/NduUHoeD7tIKoO8lZ2KrYKPAisI zCCFHZpckEJLcrPBziOJr0o7m9DrnSBU4gPOvkXdE8ro91VtSvap6+qT6Z4MluVh3l+n s/+GojBtV+EOAPMEKzI9K7YAUxwk6ZTfTXR/1n2gtfawjaPSw29ZSt1ljY5SoXfZgqUU 0L4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=5U79kuIEnYZOM70CXAnAxUaWje17186nlnT5EaRCkcY=; b=dX3yySBC4ZKjZoDr275xS6V7SfSp7F0UrTj/304y2i5jt0c080CHDTjvncnSW5DmGd JYieyHkSkdu/zStLk32DY/Wk3tg45VrBb+6/uZsfrrAWh56LEoBVNIeBj067vm+YmdO6 oRUuxbFSkHe4LKVFWYhF4hVIrdhjkjfbSF+ddye1OsZT9sK/atbB4GSQmpiTcTFl+Z7T MeiK1sE/1M1lImEdALIJEcCgAHQYOZyY9ORL+U29WYnTCdtjW+APiqxeLdDe2EaWXp/h MUHULP6b8x0kZ2p6fhzlZo5ACCwmo9OEJ9wFfZ50y4t76fF2o5OrryHdeCJ1kLH8isqd 464Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Y07ojxM1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s28si8442027ejm.388.2020.06.16.07.19.33; Tue, 16 Jun 2020 07:19:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Y07ojxM1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728947AbgFPORd (ORCPT + 99 others); Tue, 16 Jun 2020 10:17:33 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:43563 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728716AbgFPORb (ORCPT ); Tue, 16 Jun 2020 10:17:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1592317048; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5U79kuIEnYZOM70CXAnAxUaWje17186nlnT5EaRCkcY=; b=Y07ojxM1Pp99UAOL5Q1+FmHWA9KbChD5dOemJouSw9OaX7+QTRxk02AiYT5FQfVUKZ613e KLKHcAugUo7PdUm3dVIID8LFdpyLik6DKiWn0YAlMTrsMqYxAFlQ0BsBChhb1gCuMNcdqk 8hnrPORQQBEB7uYbP0YNAKTC1kNpqeQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-477-pGmEWh4tNOuNJJ3pS5vacA-1; Tue, 16 Jun 2020 10:17:24 -0400 X-MC-Unique: pGmEWh4tNOuNJJ3pS5vacA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6EA8A8730EE; Tue, 16 Jun 2020 14:17:22 +0000 (UTC) Received: from localhost (ovpn-12-37.pek2.redhat.com [10.72.12.37]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 152D719C71; Tue, 16 Jun 2020 14:17:20 +0000 (UTC) Date: Tue, 16 Jun 2020 22:17:17 +0800 From: Baoquan He To: Vlastimil Babka , =?utf-8?B?6rmA7J6s7JuQ?= Cc: Jaewon Kim , "mgorman@techsingularity.net" , "minchan@kernel.org" , "mgorman@suse.de" , "hannes@cmpxchg.org" , Andrew Morton , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , =?utf-8?B?7J207Jqp7YOd?= , =?utf-8?B?6rmA7LKg66+8?= Subject: Re: (2) [PATCH v2] page_alloc: consider highatomic reserve in wmartermark fast Message-ID: <20200616141717.GN20367@MiWiFi-R3L-srv> References: <26a6a2c2-56e6-1a34-84ac-8ff9f74c34ce@suse.cz> <20200613025102.12880-1-jaewon31.kim@samsung.com> <20200613094228.GB3346@MiWiFi-R3L-srv> <20200613231736.GJ20367@MiWiFi-R3L-srv> <20200616073002epcms1p12a14cc85b0aebf8ba88bedee7495b34f@epcms1p1> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20200616073002epcms1p12a14cc85b0aebf8ba88bedee7495b34f@epcms1p1> User-Agent: Mutt/1.10.1 (2018-07-13) X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/16/20 at 04:30pm, 김재원 wrote: > >>> > > <4>[ 6207.637627] [3: Binder:9343_3:22875] Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB > >> > >> Checked this mem info, wondering why there's no 'reserved_highatomic' > >> printing in all these examples. > > > >Yeah, it better be printed, especially after it's included in watermark > >calculation, so we're less confused by reports of allocation failure where > >watermarks are seemingly ok. > > > > > Hello Vlastimil and Baoquan > > The log in previous mail was captured from kernel based on v4.14. > After adding the reserved_highatomic log, I finally got a new log below > Let me change description in next patch. > > There seems be reserved_highatomic:32768KB and actually 14232kB free. > > [ 4738.329298] kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null) > [ 4738.329325] kswapd0 cpuset=/ mems_allowed=0 > [ 4738.329339] CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug #1 > [ 4738.329350] Call trace: > [ 4738.329366] [<0000000000000000>] dump_backtrace+0x0/0x248 > [ 4738.329377] [<0000000000000000>] show_stack+0x18/0x20 > [ 4738.329390] [<0000000000000000>] __dump_stack+0x20/0x28 > [ 4738.329398] [<0000000000000000>] dump_stack+0x68/0x90 > [ 4738.329409] [<0000000000000000>] warn_alloc+0x104/0x198 > [ 4738.329417] [<0000000000000000>] __alloc_pages_nodemask+0xdc0/0xdf0 > [ 4738.329427] [<0000000000000000>] zs_malloc+0x148/0x3d0 > [ 4738.329438] [<0000000000000000>] zram_bvec_rw+0x410/0x798 > [ 4738.329446] [<0000000000000000>] zram_rw_page+0x88/0xdc > [ 4738.329455] [<0000000000000000>] bdev_write_page+0x70/0xbc > [ 4738.329463] [<0000000000000000>] __swap_writepage+0x58/0x37c > [ 4738.329469] [<0000000000000000>] swap_writepage+0x40/0x4c > [ 4738.329478] [<0000000000000000>] shrink_page_list+0xc30/0xf48 > [ 4738.329486] [<0000000000000000>] shrink_inactive_list+0x2b0/0x61c > [ 4738.329494] [<0000000000000000>] shrink_node_memcg+0x23c/0x618 > [ 4738.329501] [<0000000000000000>] shrink_node+0x1c8/0x304 > [ 4738.329509] [<0000000000000000>] kswapd+0x680/0x7c4 > [ 4738.329518] [<0000000000000000>] kthread+0x110/0x120 > [ 4738.329527] [<0000000000000000>] ret_from_fork+0x10/0x18 > [ 4738.329538] Mem-Info: > [ 4738.329574] active_anon:111826 inactive_anon:65557 isolated_anon:0\x0a active_file:44260 inactive_file:83422 isolated_file:0\x0a unevictable:4158 dirty:117 writeback:0 unstable:0\x0a slab_reclaimable:13943 slab_unreclaimable:43315\x0a mapped:102511 shmem:3299 pagetables:19566 bounce:0\x0a free:3510 free_pcp:553 free_cma:0 > [ 4738.329593] Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB d irty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no > [ 4738.329603] Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768k B unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB [ 4738.329607] lowmem_reserve[]: 0 0 > [ 4738.329615] Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB > > > >... > > > >>> > > /* > >>> > > * Fast check for order-0 only. If this fails then the reserves > >>> > > @@ -3598,9 +3604,12 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order, > >>> > > * the caller is !atomic then it'll uselessly search the free > >>> > > * list. That corner case is then slower but it is harmless. > >>> > > >>> > Do we need remove or adjust the code comment at this place? So Mel have > >>> > foreseen the corner case, just reclaiming to unreserve the highatomic > >>> > might be ignored. > >>> > > >>> > >>> Hello thank you for your comment. > >>> > >>> My previous mail to Vlastimil Babka seems to have html. > >>> Let me put again here because I also think the comment should be changed. > >>> I'd like to hear opinions from the open source community. > >> > >> Yeah, your replying mail to Vlastimil looks a little messy on format, I > >> didn't scroll down to check. > >> > >>> > >>> Additionally actually I think we need accurate counting of highatomic > >>> free after this patch. > >>> > >>> ---------------------------------------------------------------------------------------- > >>> > >>> As Mel also agreed with me in v1 mail thread, this highatomic reserved should > >>> be considered even in watermark fast. > >>> > >>> The comment, I think, may need to be changed. Prior to this patch, non > >>> highatomic > >>> allocation may do useless search, but it also can take ALL non highatomic free. > >>> > >>> With this patch, non highatomic allocation will NOT do useless search. Rather, > >>> it may be required direct reclamation even when there are some non > >>> high atomic free. > >>> > >>> i.e) > >>> In following situation, watermark check fails (9MB - 8MB < 4MB) though there are > >>> enough free (9MB - 4MB > 4MB). If this is really matter, we need to > >>> count highatomic > >>> free accurately. > >> > >> Seems to make sense. We only use zone->nr_reserved_highatomic to account > >> the reserved highatomic, don't track how much have been used for > >> highatomic allocation. But not sure if this will happen often, or just a > >> very rare case, whether taking that into account will impact anything. > > > >Unfortunately there's a problem when trying to account free pages of a migrate > >type exactly, as e.g. during reserve_highatomic_pageblock(), some pages might be > >in pcplist of other cpu with other migratetype, and once they are freed, the > >buddy merging will merge the different migratetypes and distort the accounting. Yeah, the migratetype could have been cached in page->index before we finish the reserve_highatomic_pageblock(). Seems we take a very coarse grained way to do the highatomic reservation and accounting. When I went through the related code, found out we call reserve_highatomic_pageblock() if below condition is met. So what if order is 1, and all other pages on the page block have been used? Do we possibly have this kind of extreme case? From Jaewon's captured information, we can see that the available free highatomic is even less than half the zone->nr_reserved_highatomic. Should we at least limit the reservation to the case with a bigger order. E.g 1/2 of pageblock_nr_pages? Please correct me if I am wrong or miss anyting. "reserved_highatomic:32768KB and actually 14232kB free." get_page_from_freelist { ... if (unlikely(order && (alloc_flags & ALLOC_HARDER))) reserve_highatomic_pageblock(page, zone, order); ... } > >Fixing this for all migratetypes would have performance overhead, so we only do > >that for MIGRATE_ISOLATE which is not that frequent (and it took a while to > >eliminate all corner cases), and CMA which doesn't change pageblocks dynamically. > > AFAIK we do not account free cma in pcp either. But yes accurate check could be > overhead. For example, __mod_zone_freepage_state should account highatomic free > as cma free. And we may see some incorrect accounting issue. > > > > >So either we live with the possible overreclaim due to inaccurate counting per > >your example above, or we instead let order-0 atomic allocations use highatomic > >reserves. > > > > Additionally regarding existing Mel's comment, let me remove some of it if you > don't mind. > > /* > * Fast check for order-0 only. If this fails then the reserves > - * need to be calculated. There is a corner case where the check > - * passes but only the high-order atomic reserve are free. If > - * the caller is !atomic then it'll uselessly search the free > - * list. That corner case is then slower but it is harmless. > + * need to be calculated. > */ > > I will prepare v3 patch. > Thank you again. >