Subject: Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
To: Mike Rapoport, Andrea Arcangeli
Cc: Andrew Morton, Stephen Rothwell, kernelci-results-staging@groups.io,
 "kernelci-results@groups.io", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Mike Rapoport, Baoquan He
References: <5fd3e5d9.1c69fb81.f9e69.5028@mx.google.com>
 <127999c4-7d56-0c36-7f88-8e1a5c934cae@collabora.com>
 <20201213082314.GA198221@linux.ibm.com>
 <0633d44a-3796-8a1b-e5dc-99fc62aa4dc7@collabora.com>
 <20210103134753.GC832698@linux.ibm.com>
 <20210105091330.GD832698@linux.ibm.com>
From: Guillaume Tucker
Message-ID: <28e59120-f8b9-7256-325a-1e4ca90887b5@collabora.com>
Date: Tue, 12 Jan 2021 10:53:45 +0000
In-Reply-To: <20210105091330.GD832698@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/01/2021 09:13, Mike Rapoport wrote:
> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>> Hello Mike,
>>
>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>> memblock.memory wasn't that bright idea :)
>>
>> Would it be possible to somehow clean up the hack then?
>>
>> The only difference between the clean solution and the hack is that
>> the hack intended to achieved the exact same, but without adding the
>> reserved regions to memblock.memory.
>
> I didn't consider adding reserved regions to memblock.memory as a clean
> solution, this was still a hack, but I didn't think that things are that
> fragile.
>
> I still think we cannot rely on memblock.reserved to detect
> memory/zone/node sizes and the boot failure reported here confirms this.
>
>> The comment on that problematic area says the reserved area cannot be
>> used for DMA because of some unexplained hw issue, and that doing so
>> prevents booting, but since the area got reserved, even with the clean
>> solution, it shouldn't have never been used for DMA?
>>
>> So I can only imagine that the physical memory region is way more
>> problematic than just for DMA. It sounds like that anything that
>> touches it, including the CPU, will hang the system, not just DMA. It
>> sounds somewhat similar to the other e820 direct mapping issue on x86?
>
> My understanding is that the boot failed because when I implicitly added
> the reserved region to memblock.memory the memory size seen by
> free_area_init() jumped from 2G to 4G because the reserved area was close
> to 4G. The very first allocation would get a chunk from slightly below of
> 4G and as there is no real memory there, the kernel would crash.
>
>> If you want to test the hack on the arm board to check if it boots you
>> can use the below commit:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
>
> My take is your solution would boot with this memory configuration, but I
> still don't think that using memblock.reserved for zone/node sizing is
> correct.

The rk3288 platform has now been failing to boot for nearly a month
on linux-next:

  https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/

Until a fix or a new version of this patch is made, would it be
possible to drop it or revert it so the platform becomes usable
again?

Or if you want, I can make a cleaned-up version of my hack to
ignore the problematic region if you still need your patch to be
on linux-next, but that would probably be less than ideal.

Thanks,
Guillaume