Subject: Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
From: Guillaume Tucker
To: Mike Rapoport, Andrea Arcangeli
Cc: Andrew Morton, Stephen Rothwell, kernelci-results-staging@groups.io, "kernelci-results@groups.io", linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mike Rapoport, Baoquan He
References: <5fd3e5d9.1c69fb81.f9e69.5028@mx.google.com> <127999c4-7d56-0c36-7f88-8e1a5c934cae@collabora.com> <20201213082314.GA198221@linux.ibm.com> <0633d44a-3796-8a1b-e5dc-99fc62aa4dc7@collabora.com> <20210103134753.GC832698@linux.ibm.com> <20210105091330.GD832698@linux.ibm.com> <28e59120-f8b9-7256-325a-1e4ca90887b5@collabora.com>
In-Reply-To: <28e59120-f8b9-7256-325a-1e4ca90887b5@collabora.com>
Date: Tue, 12 Jan 2021 11:10:28 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/01/2021 10:53, Guillaume Tucker wrote:
> On 05/01/2021 09:13, Mike Rapoport wrote:
>> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>>> Hello Mike,
>>>
>>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>>> memblock.memory wasn't that bright idea :)
>>>
>>> Would it be possible to somehow clean up the hack then?
>>>
>>> The only difference between the clean solution and the hack is that
>>> the hack intended to achieved the exact same, but without adding the
>>> reserved regions to memblock.memory.
>>
>> I didn't consider adding reserved regions to memblock.memory as a clean
>> solution, this was still a hack, but I didn't think that things are that
>> fragile.
>>
>> I still think we cannot rely on memblock.reserved to detect
>> memory/zone/node sizes and the boot failure reported here confirms this.
>>
>>> The comment on that problematic area says the reserved area cannot be
>>> used for DMA because of some unexplained hw issue, and that doing so
>>> prevents booting, but since the area got reserved, even with the clean
>>> solution, it shouldn't have never been used for DMA?
>>>
>>> So I can only imagine that the physical memory region is way more
>>> problematic than just for DMA. It sounds like that anything that
>>> touches it, including the CPU, will hang the system, not just DMA. It
>>> sounds somewhat similar to the other e820 direct mapping issue on x86?
>>
>> My understanding is that the boot failed because when I implicitly added
>> the reserved region to memblock.memory the memory size seen by
>> free_area_init() jumped from 2G to 4G because the reserved area was close
>> to 4G. The very first allocation would get a chunk from slightly below of
>> 4G and as there is no real memory there, the kernel would crash.
>>
>>> If you want to test the hack on the arm board to check if it boots you
>>> can use the below commit:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
>>
>> My take is your solution would boot with this memory configuration, but I
>> still don't think that using memblock.reserved for zone/node sizing is
>> correct.
>
> The rk3288 platform has now been failing to boot for nearly a
> month on linux-next:
>
> https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/
>
> Until a fix or a new version of this patch is made, would it be
> possible to drop it or revert it so the platform become usable
> again?
>
> Or if you want, I can make a cleaned-up version of my hack to
> ignore the problematic region if you still need your patch to be
> on linux-next, but that would probably be less than ideal.

By the way, another bisection found that this commit is also
breaking tegra124-nyan-big, but only with both CONFIG_EFI=y and
CONFIG_ARM_LPAE=y enabled:

https://kernelci.org/test/case/id/5ff6b1e26cf19f3b10c94cc5/

The plain multi_v7_defconfig is booting fine:

https://kernelci.org/test/plan/id/5ff6b0a1db91b8a2b9c94cba/

I haven't looked into this one or tried to make it boot like
rk3288, but please let me know if there's anything there that can
be done to help.

Thanks,
Guillaume