Date: Thu, 6 Oct 2016 18:11:14 +0200
From: Robert Richter
To: Ard Biesheuvel
CC: Catalin Marinas, Will Deacon, David Daney, Mark Rutland, Hanjun Guo, "linux-arm-kernel@lists.infradead.org", "linux-efi@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section
Message-ID: <20161006161114.GH22012@rric.localdomain>
References: <1475747527-32387-1-git-send-email-rrichter@cavium.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Ard, thank you for your answer and your explanation.

On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> On 6 October 2016 at 10:52, Robert Richter wrote:
> > There is a memory setup problem on ThunderX systems with certain
> > memory configurations. The symptom is
> >
> >  kernel BUG at mm/page_alloc.c:1848!
> >
> > This happens for some configs with 64k page size enabled. The bug
> > triggers for page zones with some pages in the zone not assigned to
> > this particular zone. In my case some pages that are marked as nomap
> > were not reassigned to the new zone of node 1, so those are still
> > assigned to node 0.
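The failure mode can be illustrated with a small userspace model. This is a minimal sketch under simplifying assumptions, not kernel code: `struct model_region`, `struct model_page`, `model_init_zone()` and the other `model_*` helpers are hypothetical stand-ins for memblock regions, `struct page`, and the `memmap_init_zone()` loop. Pfns rejected by the validity check are skipped, so zero-initialized pages (which read as node 0) keep their old node, while a `memblock_is_memory()`-style check links every page in the range to the new node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; these are NOT the kernel's types. */
struct model_region {
    unsigned long start_pfn;  /* first pfn in the region */
    unsigned long end_pfn;    /* one past the last pfn */
    bool nomap;               /* stands in for the MEMBLOCK_NOMAP flag */
};

struct model_page {
    int nid;                  /* node this struct page is linked to */
};

typedef bool (*model_valid_fn)(const struct model_region *regs, size_t n,
                               unsigned long pfn);

/* Any region covers pfn, nomap or not (stands in for memblock_is_memory()). */
static bool model_is_memory(const struct model_region *regs, size_t n,
                            unsigned long pfn)
{
    for (size_t i = 0; i < n; i++)
        if (pfn >= regs[i].start_pfn && pfn < regs[i].end_pfn)
            return true;
    return false;
}

/* A region without the nomap flag covers pfn
 * (stands in for memblock_is_map_memory()). */
static bool model_is_map_memory(const struct model_region *regs, size_t n,
                                unsigned long pfn)
{
    for (size_t i = 0; i < n; i++)
        if (pfn >= regs[i].start_pfn && pfn < regs[i].end_pfn)
            return !regs[i].nomap;
    return false;
}

/* Loosely models the memmap_init_zone() loop: pfns rejected by the validity
 * check are skipped and keep whatever node they were linked to before. */
static void model_init_zone(struct model_page *memmap, unsigned long start,
                            unsigned long end, int nid, model_valid_fn valid,
                            const struct model_region *regs, size_t n)
{
    for (unsigned long pfn = start; pfn < end; pfn++) {
        if (!valid(regs, n, pfn))
            continue;
        memmap[pfn].nid = nid;
    }
}

/* True if every struct page in [start, end) is linked to nid, the kind of
 * invariant that code walking a zone's pages relies on. */
static bool model_zone_consistent(const struct model_page *memmap,
                                  unsigned long start, unsigned long end,
                                  int nid)
{
    for (unsigned long pfn = start; pfn < end; pfn++)
        if (memmap[pfn].nid != nid)
            return false;
    return true;
}
```

For example, with a nomap region covering pfns 10 and 11 inside a node 1 range of pfns 8 to 15, calling `model_init_zone()` with `model_is_map_memory` leaves those two pages on node 0, so `model_zone_consistent()` returns false; with `model_is_memory` it returns true.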
> >
> > The reason for the mis-configuration is a change in pfn_valid() which
> > reports pages marked nomap as invalid:
> >
> >  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear mapping
> >
>
> These pages are owned by the firmware, which may map them with
> attributes that conflict with the attributes we use for the linear
> mapping. This means they should not be covered by the linear mapping.
>
> > This causes pages marked as nomap to no longer be reassigned to the
> > new zone in memmap_init_zone() by calling __init_single_pfn().
> >
>
> This sounds like the root cause of your issue. Could we not fix that instead?

Yes, this is proposal b) from my last mail, which would work too: I
implemented an arm64-private early_pfn_valid() function that uses
memblock_is_memory() to set up all pages of a zone. However, I think
this is the wrong way, and thus I prefer this patch instead. I see
several reasons for this:

Inconsistent use of struct page: it is initialized but never used
again.

Other archs only do a basic range check in pfn_valid(); the default
implementation just checks whether the whole section is valid. As I
understand the code, if the memory range is not aligned to the
section, then there will be pfns in the section that don't have
physical memory attached. Their struct page is then just initialized;
it is neither marked reserved nor does it have a non-zero refcount. It
is then simply not used. This is how no-map pages should be handled
too.

I think pfn_valid() is just a quick check whether the pfn's struct
page can be used. There is a good description of this in
include/linux/mmzone.h. So there can be memory holes that have a valid
pfn. If the no-map memory needs special handling, then additional
checks need to be added to the particular code (as in ioremap.c). It's
imo wrong to (mis-)use pfn_valid() for that.

Variant b) involves generic mm code to fix it for arm64, while this
patch is an arm64-only change. This makes it harder to get a fix for
it.
(Though maybe only a problem of patch logistics.)

>
> > Fix this by restoring the old behavior of pfn_valid() to use
> > memblock_is_memory().
>
> This is incorrect imo. In general, pfn_valid() means ordinary memory
> covered by the linear mapping and the struct page array. Returning
> reserved ranges that the kernel should not even touch only to please
> the NUMA code seems like an inappropriate way to deal with this issue.

As said above, it is not marked as reserved; it is treated like
non-existing memory.

This has been observed for non-NUMA kernels too and can happen for
each zone that is only partly initialized.

I think the patch addresses your concerns. I can't see where the
kernel uses memory marked as nomap in a wrong way.

Thanks,

-Robert

>
> > Also change users of pfn_valid() in arm64 code to use
> > memblock_is_map_memory() where necessary. This only affects code in
> > ioremap.c. The code in mmu.c can still use the new version of
> > pfn_valid().
> >
> > Should be marked stable v4.5..
> >
> > Signed-off-by: Robert Richter
> > ---
> >  arch/arm64/mm/init.c    | 2 +-
> >  arch/arm64/mm/ioremap.c | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
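The ioremap.c side of the patch can be sketched the same way. This is a hypothetical userspace model with made-up addresses, not the arm64 code: `model_is_memory()` and `model_is_map_memory()` stand in for `memblock_is_memory()` and `memblock_is_map_memory()`, and `model_ioremap_allowed()` stands in for a "don't ioremap RAM" style check. Once pfn_valid() again reports nomap pages as valid, a guard keyed on it would also reject firmware-owned nomap ranges; keying the guard on the linear-map test keeps those ranges mappable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memblock layout in physical addresses; NOT real kernel data. */
struct model_region {
    uint64_t base;   /* first byte of the region */
    uint64_t size;   /* length in bytes */
    bool nomap;      /* stands in for MEMBLOCK_NOMAP */
};

static const struct model_region model_memblock[] = {
    { 0x80000000ULL, 0x10000000ULL, false }, /* ordinary RAM */
    { 0x90000000ULL, 0x00100000ULL, true  }, /* firmware-owned, nomap */
    { 0x90100000ULL, 0x0ff00000ULL, false }, /* ordinary RAM */
};

/* Returns the region containing phys, or NULL if none does. */
static const struct model_region *model_find(uint64_t phys)
{
    size_t n = sizeof(model_memblock) / sizeof(model_memblock[0]);

    for (size_t i = 0; i < n; i++)
        if (phys >= model_memblock[i].base &&
            phys - model_memblock[i].base < model_memblock[i].size)
            return &model_memblock[i];
    return NULL;
}

/* Stands in for memblock_is_memory(): any region, nomap or not. */
static bool model_is_memory(uint64_t phys)
{
    return model_find(phys) != NULL;
}

/* Stands in for memblock_is_map_memory(): only linear-mapped regions. */
static bool model_is_map_memory(uint64_t phys)
{
    const struct model_region *r = model_find(phys);

    return r != NULL && !r->nomap;
}

/* Refuse a device mapping only for RAM already in the linear map, so
 * firmware-owned nomap ranges (and non-RAM addresses) stay mappable. */
static bool model_ioremap_allowed(uint64_t phys)
{
    return !model_is_map_memory(phys);
}
```

With the sample layout above, 0x90000000 lies in the nomap region: `model_is_memory()` reports it as memory, yet `model_ioremap_allowed()` permits mapping it, while 0x80000000 (linear-mapped RAM) is refused.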