Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
From: Alexander Duyck
To: Dan Williams
Cc: Linux MM, Andrew Morton, Linux Kernel Mailing List, linux-nvdimm, Pasha Tatashin, Michal Hocko, Dave Jiang, Dave Hansen, Jérôme Glisse, rppt@linux.vnet.ibm.com, Logan Gunthorpe, Ingo Molnar, "Kirill A. Shutemov"
Date: Mon, 8 Oct 2018 15:07:42 -0700
References: <20180925200551.3576.18755.stgit@localhost.localdomain> <20180925202053.3576.66039.stgit@localhost.localdomain> <379e1d22-4194-6744-9e80-897b6ba126e9@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/8/2018 3:00 PM, Dan Williams wrote:
> On Mon, Oct 8, 2018 at 2:48 PM Alexander Duyck
> wrote:
>>
>> On 10/8/2018 2:01 PM, Dan Williams wrote:
>>> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck
>>> wrote:
>>>>
>>>> The ZONE_DEVICE pages were being initialized in two locations. One was with
>>>> the memory_hotplug lock held and another was outside of that lock. The
>>>> problem with this is that it was nearly doubling the memory initialization
>>>> time. Instead of doing this twice, once while holding a global lock and
>>>> once without, I am opting to defer the initialization to the one outside of
>>>> the lock. This allows us to avoid serializing the overhead for memory init
>>>> and we can instead focus on per-node init times.
>>>>
>>>> One issue I encountered is that devm_memremap_pages and
>>>> hmm_devmmem_pages_create were initializing only the pgmap field the same
>>>> way. One wasn't initializing hmm_data, and the other was initializing it to
>>>> a poison value. Since this is something that is exposed to the driver in
>>>> the case of hmm I am opting for a third option and just initializing
>>>> hmm_data to 0 since this is going to be exposed to unknown third party
>>>> drivers.
>>>>
>>>> Reviewed-by: Pavel Tatashin
>>>> Signed-off-by: Alexander Duyck
>>>> ---
>>>>
>>>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid
>>>> merge conflicts with other changes in the kernel.
>>>> v5: No change
>>>
>>> This patch appears to cause a regression in the "create.sh" unit test
>>> in the ndctl test suite.
>>
>> So all you had to do is run the create.sh script to see the issue? I
>> just want to confirm there isn't any additional information needed
>> before I try chasing this down.
>
> From the ndctl source tree run:
>
> make -j TESTS="create.sh" check
>
> ...the readme has some more setup instructions:
> https://github.com/pmem/ndctl/blob/master/README.md
>
> 0day has sometimes run this test suite automatically, but we need to
> get that more robust because setting up this environment is a bit of a
> hoop to jump through with the need to setup the nfit_test module.
>
>>> I tried to reproduce on -next with:
>>>
>>> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point
>>> where we init pgmap
>>>
>>> ...but -next does not even boot for me at that commit.
>>
>> What version of -next? There are a couple of patches probably needed
>> depending on which version you are trying to boot.
>
> Today's -next, but backed up to that above commit. I was also seeing
> CONFIG_DEBUG_LIST spamming the logs, and a crash in the crypto layer.
>
>>> Here is a warning signature that proceeds a hang with this patch
>>> applied against v4.19-rc6:
>>>
>>> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after
>>> switching to atomic
>>> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155
>>> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200
>>> CPU: 24 PID: 7346 Comm: modprobe Tainted: G OE 4.19.0-rc6+ #2458
>>> [..]
>>> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200
>>> [..]
>>> Call Trace:
>>>
>>> ? percpu_ref_reinit+0x140/0x140
>>> rcu_process_callbacks+0x273/0x880
>>> __do_softirq+0xd2/0x428
>>> irq_exit+0xf6/0x100
>>> smp_apic_timer_interrupt+0xa2/0x220
>>> apic_timer_interrupt+0xf/0x20
>>>
>>> RIP: 0010:lock_acquire+0xb8/0x1a0
>>> [..]
>>> ? __put_page+0x55/0x150
>>> ? __put_page+0x55/0x150
>>> __put_page+0x83/0x150
>>> ? __put_page+0x55/0x150
>>> devm_memremap_pages_release+0x194/0x250
>>> release_nodes+0x17c/0x2c0
>>> device_release_driver_internal+0x1a2/0x250
>>> driver_detach+0x3a/0x70
>>> bus_remove_driver+0x58/0xd0
>>> __x64_sys_delete_module+0x13f/0x200
>>> ? trace_hardirqs_off_thunk+0x1a/0x1c
>>> do_syscall_64+0x60/0x210
>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>
>> So it looks like we are tearing down memory when this is triggered. Do
>> we know if this is at the end of the test or if this is running in
>> parallel with anything?
>
> Should not be running in parallel with anything this test is
> performing a series of namespace setup and teardown events.
>
> Wait, where did the call to "percpu_ref_get()" go? I think that's the bug.

Actually I think you are probably right. Do you want to get that or
should I? Should be a quick patch since you could probably just add a
call to percpu_ref_get_many to hold a reference for each page in the
range of device pages before calling memmap_init_zone_device.

- Alex
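
The imbalance discussed above can be sketched with a rough user-space model in C. This is not kernel code: toy_ref, toy_setup, toy_teardown and toy_cycle are hypothetical names, the counter is a plain long with none of the per-CPU or atomic behavior of a real percpu_ref, and the "one put per page on release" behavior is an assumption read off the call trace (devm_memremap_pages_release -> __put_page). It only shows why skipping the per-page get leaves the count below zero after teardown.

```c
/* Toy stand-in for a percpu_ref: a plain counter, nothing per-CPU or atomic. */
struct toy_ref {
	long count;
};

/* Models percpu_ref_get_many(): take nr references in one call. */
void toy_ref_get_many(struct toy_ref *ref, unsigned long nr)
{
	ref->count += (long)nr;
}

/* Models percpu_ref_put(): drop a single reference. */
void toy_ref_put(struct toy_ref *ref)
{
	ref->count--;
}

/*
 * Hypothetical setup path. When take_refs is non-zero, one reference per
 * device page is taken before the point where page initialization
 * (memmap_init_zone_device in the thread) would run.
 */
void toy_setup(struct toy_ref *ref, unsigned long nr_pages, int take_refs)
{
	ref->count = 1; /* assumed initial reference held by the owner */
	if (take_refs)
		toy_ref_get_many(ref, nr_pages);
	/* page initialization would happen here */
}

/* Hypothetical teardown path: one put per page (assumed from the trace). */
long toy_teardown(struct toy_ref *ref, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		toy_ref_put(ref);
	return ref->count;
}

/* Runs one setup/teardown cycle and returns the counter left afterwards. */
long toy_cycle(unsigned long nr_pages, int take_refs)
{
	struct toy_ref ref;

	toy_setup(&ref, nr_pages, take_refs);
	return toy_teardown(&ref, nr_pages);
}
```

In this model toy_cycle(8, 0) returns -7, a count that has dropped below zero much like the one in the warning, while toy_cycle(8, 1) returns 1, leaving only the owner's initial reference.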