Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp4128020imm; Mon, 8 Oct 2018 15:37:55 -0700 (PDT) X-Google-Smtp-Source: ACcGV60rPbKxekQjC7XfEZ5BAzrfeD/LBprIn7eKoIMkpwBdEZf7oOSBDyvLHqLS9BOVaXRBdvwx X-Received: by 2002:a17:902:7246:: with SMTP id c6-v6mr26075164pll.304.1539038274958; Mon, 08 Oct 2018 15:37:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539038274; cv=none; d=google.com; s=arc-20160816; b=zYfBXbk0NfOsZTjAM9GoLm/rSSLCVCWLTyzv3Guvfgoyj9A2s/myUkBSOOBt5q1TKc MsREtF8wDUfluFVClCdcOA0bWi0MkQUei08quxnJbr/zGA3eIoxduHUWVhpwRtMKTZ/e apx29JQyM7otVJmr5ZsYnG0jP2w0HKFKTd5RYHBjBL2D2HAzjkeq3A3dHe15fXJyQ2Is QBsIDdXBHovFBy2jgHcI8VO7JjDGMnGEe7inZumtjG+KSZ11unVWEZGEyFOqEVKWye2m WmGbGk4auscFr0C/psTK54fYXmWzmcnoJ/AV0UWrOfKioHQMlhdG2GmKmTyNacgcdzJz eAhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=ITxk0DBnpKRlooGIxl9kdHxBh6QzyZsUcvd6V/o59JI=; b=QiJFLMBNzTldqVEmvtysdvLZlqEmEhDd2vaBtAeFn8FeECCDH1Cw2evNh6hBvkJw+E RFXN5uFUlDTJUi72X6zpo5Ryr5Hh7PKZd8b+gFNfrqivqoBfCEsH2BwLz7KiHYeQ3Qvb jYUzXLxJKx72DiR9GW1muuqFG90OIuJQBH3kVoJTh7ca3TG0Xz72XV8QGR4a6gHxclZf kUzVXH6in36O/vS9TcuCJbcWPUNIv623MmmxprUF47g30iDG9HUjlGr2Q674yZya9iBL m1/bJaZCy2/l8vkhis+bJj3ZZbIrGBeRus+3XgupmR3wJ5kkrfAr4yVCK+Pj2y0zZSwh kymw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u10-v6si19478620plq.1.2018.10.08.15.37.31; Mon, 08 Oct 2018 15:37:54 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726393AbeJIFvC (ORCPT + 99 others); Tue, 9 Oct 2018 01:51:02 -0400 Received: from mga14.intel.com ([192.55.52.115]:40559 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725783AbeJIFvB (ORCPT ); Tue, 9 Oct 2018 01:51:01 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Oct 2018 15:37:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,358,1534834800"; d="scan'208";a="239606712" Received: from ahduyck-mobl.amr.corp.intel.com (HELO [10.7.198.166]) ([10.7.198.166]) by orsmga004.jf.intel.com with ESMTP; 08 Oct 2018 15:36:58 -0700 Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap To: Dan Williams Cc: Linux MM , Andrew Morton , Linux Kernel Mailing List , linux-nvdimm , Pasha Tatashin , Michal Hocko , Dave Jiang , Dave Hansen , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , rppt@linux.vnet.ibm.com, Logan Gunthorpe , Ingo Molnar , "Kirill A. Shutemov" References: <20180925200551.3576.18755.stgit@localhost.localdomain> <20180925202053.3576.66039.stgit@localhost.localdomain> <379e1d22-4194-6744-9e80-897b6ba126e9@linux.intel.com> From: Alexander Duyck Message-ID: <95de811f-ad44-c96d-3914-5625933d5e88@linux.intel.com> Date: Mon, 8 Oct 2018 15:36:58 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/8/2018 3:00 PM, Dan Williams wrote: > On Mon, Oct 8, 2018 at 2:48 PM Alexander Duyck > wrote: >> >> On 10/8/2018 2:01 PM, Dan Williams wrote: >>> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck >>> wrote: >>>> >>>> The ZONE_DEVICE pages were being initialized in two locations. One was with >>>> the memory_hotplug lock held and another was outside of that lock. The >>>> problem with this is that it was nearly doubling the memory initialization >>>> time. Instead of doing this twice, once while holding a global lock and >>>> once without, I am opting to defer the initialization to the one outside of >>>> the lock. This allows us to avoid serializing the overhead for memory init >>>> and we can instead focus on per-node init times. >>>> >>>> One issue I encountered is that devm_memremap_pages and >>>> hmm_devmmem_pages_create were initializing only the pgmap field the same >>>> way. One wasn't initializing hmm_data, and the other was initializing it to >>>> a poison value. Since this is something that is exposed to the driver in >>>> the case of hmm I am opting for a third option and just initializing >>>> hmm_data to 0 since this is going to be exposed to unknown third party >>>> drivers. >>>> >>>> Reviewed-by: Pavel Tatashin >>>> Signed-off-by: Alexander Duyck >>>> --- >>>> >>>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid >>>> merge conflicts with other changes in the kernel. >>>> v5: No change >>> >>> This patch appears to cause a regression in the "create.sh" unit test >>> in the ndctl test suite. >> >> So all you had to do is run the create.sh script to see the issue? I >> just want to confirm there isn't any additional information needed >> before I try chasing this down. > > From the ndctl source tree run: > > make -j TESTS="create.sh" check > > ...the readme has some more setup instructions: > https://github.com/pmem/ndctl/blob/master/README.md > > 0day has sometimes run this test suite automatically, but we need to > get that more robust because setting up this environment is a bit of a > hoop to jump through with the need to setup the nfit_test module. > >>> I tried to reproduce on -next with: >>> >>> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point >>> where we init pgmap >>> >>> ...but -next does not even boot for me at that commit. >> >> What version of -next? There are a couple of patches probably needed >> depending on which version you are trying to boot. > > Today's -next, but backed up to that above commit. I was also seeing > CONFIG_DEBUG_LIST spamming the logs, and a crash in the crypto layer. > >>> Here is a warning signature that proceeds a hang with this patch >>> applied against v4.19-rc6: >>> >>> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after >>> switching to atomic >>> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155 >>> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> CPU: 24 PID: 7346 Comm: modprobe Tainted: G OE 4.19.0-rc6+ #2458 >>> [..] >>> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> [..] >>> Call Trace: >>> >>> ? percpu_ref_reinit+0x140/0x140 >>> rcu_process_callbacks+0x273/0x880 >>> __do_softirq+0xd2/0x428 >>> irq_exit+0xf6/0x100 >>> smp_apic_timer_interrupt+0xa2/0x220 >>> apic_timer_interrupt+0xf/0x20 >>> >>> RIP: 0010:lock_acquire+0xb8/0x1a0 >>> [..] >>> ? __put_page+0x55/0x150 >>> ? __put_page+0x55/0x150 >>> __put_page+0x83/0x150 >>> ? __put_page+0x55/0x150 >>> devm_memremap_pages_release+0x194/0x250 >>> release_nodes+0x17c/0x2c0 >>> device_release_driver_internal+0x1a2/0x250 >>> driver_detach+0x3a/0x70 >>> bus_remove_driver+0x58/0xd0 >>> __x64_sys_delete_module+0x13f/0x200 >>> ? trace_hardirqs_off_thunk+0x1a/0x1c >>> do_syscall_64+0x60/0x210 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> >> >> So it looks like we are tearing down memory when this is triggered. Do >> we know if this is at the end of the test or if this is running in >> parallel with anything? > > Should not be running in parallel with anything this test is > performing a series of namespace setup and teardown events. > > Wait, where did the call to "percpu_ref_get()" go? I think that's the bug. > I have a reproduction on my system now as well. I should have a patch ready to go for it in the next hour or so. Thanks. - Alex