From: Dan Williams
Date: Wed, 27 Mar 2019 09:17:37 -0700
Subject: Re: [PATCH v5 00/10] mm: Sub-section memory hotplug support
To: Michal Hocko
Cc: Andrew Morton, Jérôme Glisse, Logan Gunthorpe, Toshi Kani,
    Jeff Moyer, Vlastimil Babka, stable, Linux MM, linux-nvdimm,
    Linux Kernel Mailing List
In-Reply-To: <20190327161306.GM11927@dhcp22.suse.cz>

On Wed, Mar 27, 2019 at 9:13 AM Michal Hocko wrote:
>
> On Tue 26-03-19 17:20:41, Dan Williams wrote:
> > On Tue, Mar 26, 2019 at 1:04 AM Michal Hocko wrote:
> > >
> > > On Mon 25-03-19 13:03:47, Dan Williams wrote:
> > > > On Mon, Mar 25, 2019 at 3:20 AM Michal Hocko wrote:
> > > [...]
> > > > > > User-defined memory namespaces have this problem, but 2MB is the
> > > > > > default alignment and is sufficient for most uses.
> > > > >
> > > > > What prevents users from just using a larger alignment?
> > > >
> > > > Given that we are living with 64MB granularity on mainstream
> > > > platforms for the foreseeable future, the reason users can't rely
> > > > on a larger alignment to address the issue is that the physical
> > > > alignment may change from one boot to the next.
> > >
> > > I would love to learn more about this inter-boot volatility. Could
> > > you expand on that some more? I thought that the HW configuration
> > > presented to the OS would be more or less stable unless the
> > > underlying HW changes.
> >
> > Even if the configuration is static there can be hardware failures
> > that prevent a DIMM or a PCI device from being included in the memory
> > map. When that happens the BIOS needs to re-lay out the map, and the
> > result is not guaranteed to maintain the previous alignment.
> >
> > > > No, you can't just wish that hardware / platform firmware won't do
> > > > this, because there are not enough platform resources to give
> > > > every hardware device a guaranteed alignment.
> > >
> > > A guarantee is one part, and I can see why nobody wants to give you
> > > something that strong, but how often does that happen in real life?
> >
> > I expect a "rare" event to happen every day in a data-center fleet.
> > Failure rates tend towards 100% daily occurrence at scale, and in
> > this case the kernel has everything it needs to mitigate such an
> > event.
> >
> > Setting aside the success rate of a software-alignment mitigation,
> > the reason I am charging this hill again after a 2-year hiatus is the
> > realization that this problem is more widespread than the original
> > failing scenario. Back in 2017 the problem seemed limited to custom
> > memmap= configurations and collisions between PMEM and System RAM.
> > Now it is clear that collisions can happen between PMEM regions and
> > namespaces as well, and the problem spans platforms from multiple
> > vendors. Here is the most recent collision problem, from a
> > third-party platform: https://github.com/pmem/ndctl/issues/76.
> >
> > The fix for that issue uncovered a bug in the padding implementation,
> > and a fix for that bug would result in even more hacks in the nvdimm
> > code for what is a core kernel deficiency. Code review of those
> > changes resulted in changing direction to go after the core
> > deficiency instead.
>
> This kind of information, along with real-world examples, is exactly
> what you should have added to the cover letter. The previous, very
> vague claims were not really convincing, nor were they something that
> could be considered a proper justification. Please realize that people
> who are not working with the affected HW are unlikely to have an idea
> of how serious/relevant those problems really are.
>
> People are asking for a smaller memory hotplug granularity for other
> use cases (e.g. memory ballooning into VMs) which, to be honest, are
> quite dubious and not really worth all the code rework. If we are
> talking about something that can be worked around elsewhere then that
> is preferred, because the code base is not in excellent shape, and
> putting more on top is just going to cause more headaches.
>
> I will try to find some time to review this more deeply (no promises
> though, because time is hectic and this is not a simple feature). For
> the future, please try harder to write up a proper justification and a
> high-level design description which tells a bit about all the
> important parts of the new scheme.

Fair enough. I've been steeped in this for too long and should have
taken a wider view to bring reviewers up to speed.
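
For concreteness, here is a minimal userspace sketch of the capacity
argument above. This is not kernel code: the SECTION_SIZE and
SUBSECTION_SIZE constants, the clipped_loss() helper, and the example
range are all illustrative assumptions, using the 64MB granularity and
2MB default alignment quoted in the thread.

/*
 * Illustrative sketch only -- not kernel code. Shows how much of a
 * physical range is unusable when hotplug must clip to 64MB section
 * boundaries versus 2MB sub-sections. Constants and the example range
 * are hypothetical.
 */
#include <stdio.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define SECTION_SIZE    (64 * MiB) /* hotplug granularity quoted above */
#define SUBSECTION_SIZE (2 * MiB)  /* default namespace alignment      */

/* Capacity lost when [start, start + len) is clipped to 'align' units */
static uint64_t clipped_loss(uint64_t start, uint64_t len, uint64_t align)
{
	uint64_t first = (start + align - 1) & ~(align - 1); /* round up   */
	uint64_t last = (start + len) & ~(align - 1);        /* round down */

	if (last <= first)
		return len; /* no fully aligned unit fits: everything lost */
	return len - (last - first);
}

int main(void)
{
	/*
	 * Suppose the BIOS places a 4GB PMEM region at a 2MB-aligned, but
	 * not 64MB-aligned, address after re-laying out the memory map.
	 */
	uint64_t start = 4096 * MiB + 6 * MiB;
	uint64_t len = 4096 * MiB;

	printf("lost at 64MB sections:   %llu MiB\n", (unsigned long long)
	       (clipped_loss(start, len, SECTION_SIZE) / MiB));
	printf("lost at 2MB subsections: %llu MiB\n", (unsigned long long)
	       (clipped_loss(start, len, SUBSECTION_SIZE) / MiB));
	return 0;
}

On this made-up range the section-aligned clip loses a full 64MB of
capacity while the 2MB sub-section clip loses nothing, which is the
motivation for sub-section hotplug in a nutshell.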