Subject: Re: [PATCH v4 5/5] nvdimm: Schedule device registration on node local to the device
To: Dan Williams
Cc: Linux MM, Linux Kernel Mailing List, linux-nvdimm, Pasha Tatashin,
 Michal Hocko, Dave Jiang, Ingo Molnar, Dave Hansen, Jérôme Glisse,
 Andrew Morton, Logan Gunthorpe, "Kirill A. Shutemov"
References: <20180920215824.19464.8884.stgit@localhost.localdomain>
 <20180920222951.19464.39241.stgit@localhost.localdomain>
 <0d6525c1-2e8b-0e5d-7dae-193bf697a4ec@linux.intel.com>
From: Alexander Duyck
Message-ID: <6e17294f-4847-9e7a-2396-6fffaf8a8f4a@linux.intel.com>
Date: Fri, 21 Sep 2018 07:46:46 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/20/2018 7:46 PM, Dan Williams wrote:
> On Thu, Sep 20, 2018 at 6:34 PM Alexander Duyck wrote:
>>
>>
>>
>> On 9/20/2018 5:36 PM, Dan Williams wrote:
>>> On Thu, Sep 20, 2018 at 5:26 PM Alexander Duyck wrote:
>>>>
>>>> On 9/20/2018 3:59 PM, Dan Williams wrote:
>>>>> On Thu, Sep 20, 2018 at 3:31 PM Alexander Duyck wrote:
>>>>>>
>>>>>> This patch is meant to force the device registration for nvdimm devices to
>>>>>> be closer to the actual device. This is achieved by using either the NUMA
>>>>>> node ID of the region, or of the parent. By doing this we can have
>>>>>> everything above the region based on the region, and everything below the
>>>>>> region based on the nvdimm bus.
>>>>>>
>>>>>> One additional change I made is that we hold onto a reference to the parent
>>>>>> while we are going through registration. By doing this we can guarantee we
>>>>>> can complete the registration before we have the parent device removed.
>>>>>>
>>>>>> By guaranteeing NUMA locality I see an improvement of as high as 25% for
>>>>>> per-node init of a system with 12TB of persistent memory.
>>>>>>
>>>>>> Signed-off-by: Alexander Duyck
>>>>>> ---
>>>>>>  drivers/nvdimm/bus.c | 19 +++++++++++++++++--
>>>>>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>>>>>> index 8aae6dcc839f..ca935296d55e 100644
>>>>>> --- a/drivers/nvdimm/bus.c
>>>>>> +++ b/drivers/nvdimm/bus.c
>>>>>> @@ -487,7 +487,9 @@ static void nd_async_device_register(void *d, async_cookie_t cookie)
>>>>>>  		dev_err(dev, "%s: failed\n", __func__);
>>>>>>  		put_device(dev);
>>>>>>  	}
>>>>>> +
>>>>>>  	put_device(dev);
>>>>>> +	put_device(dev->parent);
>>>>>
>>>>> Good catch. The child does not pin the parent until registration, but
>>>>> we need to make sure the parent isn't gone while we're waiting for the
>>>>> registration work to run.
>>>>>
>>>>> Let's break this reference count fix out into its own separate patch,
>>>>> because this looks to be covering a gap that may need to be
>>>>> recommended for -stable.
>>>>
>>>> Okay, I guess I can do that.
>>>>
>>>>>
>>>>>>
>>>>>>  static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>>>> @@ -504,12 +506,25 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>>>>
>>>>>>  void __nd_device_register(struct device *dev)
>>>>>>  {
>>>>>> +	int node;
>>>>>> +
>>>>>>  	if (!dev)
>>>>>>  		return;
>>>>>> +
>>>>>>  	dev->bus = &nvdimm_bus_type;
>>>>>> +	get_device(dev->parent);
>>>>>>  	get_device(dev);
>>>>>> -	async_schedule_domain(nd_async_device_register, dev,
>>>>>> -			&nd_async_domain);
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * For a region we can break away from the parent node,
>>>>>> +	 * otherwise for all other devices we just inherit the node from
>>>>>> +	 * the parent.
>>>>>> +	 */
>>>>>> +	node = is_nd_region(dev) ? to_nd_region(dev)->numa_node :
>>>>>> +				   dev_to_node(dev->parent);
>>>>>
>>>>> Devices already automatically inherit the node of their parent, so I'm
>>>>> not understanding why this is needed?
>>>>
>>>> That doesn't happen until you call device_add, which you don't call
>>>> until nd_async_device_register. All that has been called on the device
>>>> up to now is device_initialize which leaves the node at NUMA_NO_NODE.
>>>
>>> Ooh, yeah, missed that. I think I'd prefer this policy to be moved out to
>>> where we set the dev->parent before calling __nd_device_register, or
>>> at least a comment here about *why* we know region devices are special
>>> (i.e. because the nd_region_desc specified the node at region creation
>>> time).
>>>
>>
>> Are you talking about pulling the scheduling out or just adding a node
>> value to the nd_device_register call so it can be set directly from the
>> caller?
>
> I was thinking everywhere we set dev->parent before registering, also
> set the node...

That will not work unless we move the call to device_initialize to
somewhere before the point where the node is set. That is why I was
thinking it might work to put the node assignment in nd_device_register
itself, since it looks like the regions don't call __nd_device_register
directly. I guess we could get rid of nd_device_register if we wanted to
go that route.

>> If you wanted what I could do is pull the set_dev_node call from
>> nvdimm_bus_uevent and place it in nd_device_register. That should stick
>> as the node doesn't get overwritten by the parent if it is set after
>> device_initialize. If I did that along with the parent bit I was already
>> doing then all that would be left to do is just use the dev_to_node
>> call on the device itself.
>
> ...but this is even better.
>

I'm not sure it adds that much. Basically, my thought was that we just
need to set the device node after the call to device_initialize but
before the call to device_add. Spreading the device_initialize calls all
over just seems like a lot more work, and it could introduce
regressions.
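
To make the ordering constraint above concrete, here is a minimal userspace sketch in C. It is not kernel code: struct toy_device and every toy_* function are hypothetical stand-ins that only model the behavior described in this thread, namely that device_initialize leaves the node at NUMA_NO_NODE, that device_add inherits the parent's node only when none has been set, and that a region may use its own node while other devices take the parent's. The is_region and region_node parameters are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct device; only the fields the sketch needs. */
struct toy_device {
	struct toy_device *parent;
	int numa_node;
};

/* Models device_initialize(): the node is reset to NUMA_NO_NODE. */
static void toy_device_initialize(struct toy_device *dev)
{
	dev->numa_node = NUMA_NO_NODE;
}

/* Models the relevant part of device_add(): inherit only if still unset. */
static void toy_device_add(struct toy_device *dev)
{
	if (dev->parent && dev->numa_node == NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}

/*
 * Hypothetical helper for the step between initialize and add: a region
 * uses its own node, every other device takes the node of its parent.
 * Returns the node the registration work would be scheduled on.
 */
static int toy_pick_node(struct toy_device *dev, int is_region, int region_node)
{
	assert(dev != NULL);

	if (is_region)
		dev->numa_node = region_node;
	else if (dev->parent)
		dev->numa_node = dev->parent->numa_node;

	return dev->numa_node;
}
```

In this model, a node assigned before toy_device_initialize() is wiped, while a node assigned between toy_device_initialize() and toy_device_add() survives, which is the window the reply argues for.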