From: Vlastimil Babka
To: Zi Yan, Steven Rostedt
Cc: Linus Torvalds, LKML, Mel Gorman, David Hildenbrand, Mike Rapoport,
 Oscar Salvador, Andrew Morton, Linux-MM
Subject: Re: [BUG] Crash on x86_32 for: mm: page_alloc: avoid merging
 non-fallbackable pageblocks with others
Date: Wed, 30 Mar 2022 23:48:07 +0200
Message-ID: <2b84aba9-7435-0073-59f0-410fddb6df7d@suse.cz>
References: <20220330154208.71aca532@gandalf.local.home>
 <20220330165337.7138810e@gandalf.local.home>
 <733F211D-9717-46A7-A0A2-40353E12F65A@nvidia.com>
In-Reply-To:
 <733F211D-9717-46A7-A0A2-40353E12F65A@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/30/22 23:43, Zi Yan wrote:
> On 30 Mar 2022, at 17:25, Zi Yan wrote:
>
>> On 30 Mar 2022, at 16:53, Steven Rostedt wrote:
>>
>>> On Wed, 30 Mar 2022 16:29:28 -0400
>>> Zi Yan wrote:
>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index bdc8f60ae462..83a90e2973b7 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1108,6 +1108,8 @@ static inline void __free_one_page(struct page *page,
>>>>
>>>>  		buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>  		buddy = page + (buddy_pfn - pfn);
>>>> +		if (!page_is_buddy(page, buddy, order))
>>>> +			goto done_merging;
>>>>  		buddy_mt = get_pageblock_migratetype(buddy);
>>>>
>>>>  		if (migratetype != buddy_mt
>>>>
>>>
>>> The above did not apply to Linus's tree, nor even the problem commit
>>> (before or after), but I found where the code is, and added it manually.
>>>
>>> It does appear to allow the machine to boot.
>>>
>> I just pulled Linus’s tree and grabbed the diff. Anyway, thanks.
>>
>> I would like to get more understanding of the issue before blindly sending
>> this as a fix.
>>
>> Merge the other thread:
>>>
>>> Not sure if this matters or not, but my kernel command line has:
>>>
>>>   crashkernel=256M
>>>
>>> Could that have caused this to break?
>>
>> Unlikely, 256MB is MAX_ORDER_NR_PAGES aligned (MAX_ORDER is 11 here).
>> __find_buddy_pfn() will not get any buddy_pfn from crashkernel memory
>> region, since that would cross MAX_ORDER_NR_PAGES boundary.
>>
>> page_is_buddy() checks page_is_guard(buddy), PageBuddy(buddy),
>> buddy_order(buddy), and page_zone_id(buddy), where page_is_guard(buddy)
>> is always false since CONFIG_DEBUG_PAGEALLOC is not set in your config.
>> So either PageBuddy(buddy) is false, buddy_order(buddy) != order,
>> or page_zone_id(buddy) is not the same as page_zone_id(page).
>>
>> Do you mind adding the following code right before my fix code above
>> and provide a complete boot log? I would like to understand what
>> went wrong. Thanks.
>>
>> pr_info("buddy_pfn: %lx, PageBuddy: %d, buddy_order: %d (vs %d), page_zone_id: %d (vs %d)\n",
>> 	buddy_pfn, PageBuddy(buddy), buddy_order(buddy), order, page_zone_id(buddy),
>> 	page_zone_id(page));
>>
>>
>
> This seems to be a bug in the original code too.
> But "if (unlikely(has_isolate_pageblock(zone)))" is too rare to trigger it.
> I do not see how having isolated pageblocks in a zone could get us away
> from checking page_is_buddy().

IIRC the assumption was that pageblock bitmaps would always exist within
MAX_ORDER blocks. But here we are still under mem_init(), where
has_isolate_pageblock() couldn't happen. And the assumption could have been
silently broken by subsequent memory init changes.

> --
> Best Regards,
> Yan, Zi
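
For reference, the alignment argument quoted above can be checked with a small userspace sketch (not kernel code). It assumes 4 KiB pages and MAX_ORDER of 11, as stated in the thread. The XOR in find_buddy_pfn() mirrors what the kernel's __find_buddy_pfn() computes; the helper names buddy_in_same_max_order_block() and size_is_max_order_aligned() are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for this sketch: 4 KiB pages, MAX_ORDER of 11 (per the thread). */
#define PAGE_SHIFT          12
#define MAX_ORDER           11
#define MAX_ORDER_NR_PAGES  (1UL << (MAX_ORDER - 1))

/* Same arithmetic as the kernel's __find_buddy_pfn(): flip bit 'order'. */
unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/*
 * Hypothetical helper: true if pfn and its order-'order' buddy fall in the
 * same MAX_ORDER_NR_PAGES-aligned block.
 */
bool buddy_in_same_max_order_block(unsigned long pfn, unsigned int order)
{
	unsigned long buddy_pfn = find_buddy_pfn(pfn, order);

	return (pfn / MAX_ORDER_NR_PAGES) == (buddy_pfn / MAX_ORDER_NR_PAGES);
}

/*
 * Hypothetical helper: true if a region of 'bytes' bytes is a whole number
 * of MAX_ORDER_NR_PAGES blocks.
 */
bool size_is_max_order_aligned(unsigned long long bytes)
{
	unsigned long long block_bytes =
		(unsigned long long)MAX_ORDER_NR_PAGES << PAGE_SHIFT;

	return bytes % block_bytes == 0;
}
```

With these values a MAX_ORDER block is 1024 pages (4 MiB), so 256M is a multiple of it, and for any order below MAX_ORDER - 1 the buddy PFN stays inside the same block. This only covers the arithmetic; it says nothing about whether the struct page found at that PFN was initialized, which is what the page_is_buddy() check guards against.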