Date: Wed, 12 Sep 2018 17:51:45 +0200
From: Gerald Schaefer
To: Pasha Tatashin
Cc: Michal Hocko, Mikhail Zaslonko, "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "osalvador@suse.de"
Subject: Re: [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary
In-Reply-To: <38ce1d0b-14bd-9a4a-1061-62c366cb11b5@microsoft.com>
References: <20180910123527.71209-1-zaslonko@linux.ibm.com>
 <20180910131754.GG10951@dhcp22.suse.cz>
 <20180912150356.642c1dab@thinkpad>
 <20180912133933.GI10951@dhcp22.suse.cz>
 <20180912162717.5a018bf6@thinkpad>
 <38ce1d0b-14bd-9a4a-1061-62c366cb11b5@microsoft.com>
Message-Id: <20180912175145.7dd3513c@thinkpad>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 12 Sep 2018 14:40:18 +0000
Pasha Tatashin wrote:

> On 9/12/18 10:27 AM, Gerald Schaefer wrote:
> > On Wed, 12 Sep 2018 15:39:33 +0200
> > Michal Hocko wrote:
> >
> >> On Wed 12-09-18 15:03:56, Gerald Schaefer wrote:
> >> [...]
> >>> BTW, those sysfs attributes are world-readable, so anyone can trigger
> >>> the panic by simply reading them, or just run lsmem (also available for
> >>> x86 since util-linux 2.32). OK, you need a special not-memory-block-aligned
> >>> mem= parameter and DEBUG_VM for poison check, but w/o DEBUG_VM you would
> >>> still access uninitialized struct pages. This sounds very wrong, and I
> >>> think it really should be fixed.
> >>
> >> Ohh, absolutely. Nobody is questioning that. The thing is that the
> >> code has been likely always broken. We just haven't noticed because
> >> those uninitialized parts were zeroed previously. Now that the implicit
> >> zeroing is gone it is just visible.
> >>
> >> All that I am arguing is that there are many places which assume
> >> pageblocks to be fully initialized and plugging one place that blows up
> >> at the time is just whack a mole. We need to address this much earlier.
> >> E.g. by allowing only full pageblocks when adding a memory range.
> >
> > Just to make sure we are talking about the same thing: when you say
> > "pageblocks", do you mean the MAX_ORDER_NR_PAGES / pageblock_nr_pages
> > unit of pages, or do you mean the memory (hotplug) block unit?
>
> From early discussion, it was about pageblock_nr_pages not about
> memory_block_size_bytes
> >
> > I do not see any issue here with MAX_ORDER_NR_PAGES / pageblock_nr_pages
> > pageblocks, and if there was such an issue, of course you are right that
> > this would affect many places. If there was such an issue, I would also
> > assume that we would see the new page poison warning in many other places.
> >
> > The bug that Mikhail's patch would fix only affects code that operates
> > on / iterates through memory (hotplug) blocks, and that does not happen
> > in many places, only in the two functions that his patch fixes.
>
> Just to be clear, so memory is pageblock_nr_pages aligned, yet
> memory_block are larger and panic is still triggered?
>
> I ask, because 3075M is not 128M aligned.

Correct, 3075M is pageblock aligned (at least on s390), but not memory
block aligned (256 MB on s390). And the "not memory block aligned" is the
reason for the panic, because at least the two memory hotplug functions
seem to rely on completely initialized struct pages for a memory block.
In this scenario we don't have any partly initialized pageblocks.

While thinking about this, with mem= it may actually be possible to also
create a not-pageblock-aligned scenario, e.g. with mem=2097148K. I didn't
try this and I thought that at least pageblock-alignment would always be
present, but from a quick glance at the mem= parsing it should actually
be possible to also create such a scenario. Then we really would have
partly initialized pageblocks, and maybe other problems would occur.

> >
> > When you say "address this much earlier", do you mean changing the way
> > that free_area_init_core()/memmap_init() initialize struct pages, i.e.
> > have them not use zone->spanned_pages as limit, but rather align that
> > up to the memory block (not pageblock) boundary?
> >
>
> This was my initial proposal, to fix memmap_init() and initialize struct
> pages beyond the "end", and before the "start" to cover the whole
> section. But, I think Michal suggested (and he might correct me) to
> simply ignore unaligned memory to section memory much earlier: so
> anything that does not align to sparse order is not added at all to the
> system.
>
> I think Michal's proposal would simplify and strengthen memory mapping
> overall.

Of course it would be better to fix this in one place by providing
proper alignment, but to what unit: pageblock, section, or memory block?
I was just confused by the pageblock discussion, because in the current
scenario we do not have any pageblock issues, and pageblock alignment
would also not help here.

Section alignment probably would: even though a memory block can contain
multiple sections, at least the memory hotplug valid_zones/removable
sysfs handlers seem to check for present sections first, before accessing
the struct pages.
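
To make the alignment arithmetic in this mail concrete, here is a minimal
sketch in Python (not kernel code). The 256 MiB memory block size is the
s390 value mentioned above; the 1 MiB pageblock size and all function
names are assumptions chosen for illustration only, so adjust them for
the configuration you actually run.

```python
# Sizes in bytes.
KIB = 1024
MIB = 1024 * KIB

# Assumption for illustration only: a 1 MiB pageblock.
PAGEBLOCK_SIZE = 1 * MIB
# s390 memory block size as stated in this mail.
MEMORY_BLOCK_SIZE = 256 * MIB


def is_aligned(size, unit):
    """Return True if size is a whole multiple of unit."""
    return size % unit == 0


def uncovered_tail(size, unit):
    """Bytes between size and the next unit boundary (0 if aligned)."""
    return (-size) % unit


def describe(label, size):
    """Summarize how a mem= limit relates to both alignment units."""
    return {
        "mem": label,
        "pageblock_aligned": is_aligned(size, PAGEBLOCK_SIZE),
        "memory_block_aligned": is_aligned(size, MEMORY_BLOCK_SIZE),
        "tail_in_last_memory_block": uncovered_tail(size, MEMORY_BLOCK_SIZE),
    }


if __name__ == "__main__":
    # The two mem= values discussed in this mail.
    for label, size in (("3075M", 3075 * MIB), ("2097148K", 2097148 * KIB)):
        print(describe(label, size))
```

Under these assumptions, mem=3075M is pageblock aligned but ends 3 MiB
into a memory block, so the remaining 253 MiB of that block lie beyond
the memory end. mem=2097148K ends 4 KiB short of a boundary and is not
even pageblock aligned.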