Date: Fri, 28 Sep 2018 10:12:24 +0200
From: Oscar Salvador
To: Michal Hocko
Cc: Alexander Duyck, linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, pavel.tatashin@microsoft.com, dave.jiang@intel.com, dave.hansen@intel.com, jglisse@redhat.com, rppt@linux.vnet.ibm.com, dan.j.williams@intel.com, logang@deltatee.com, mingo@kernel.org, kirill.shutemov@linux.intel.com
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Message-ID: <20180928081224.GA25561@techadventures.net>
In-Reply-To: <20180927131329.GI6278@dhcp22.suse.cz>

On Thu, Sep 27, 2018 at 03:13:29PM +0200, Michal Hocko wrote:
> I would have to double check but is the hotplug lock really serializing
> access to the state initialized by init_currently_empty_zone? E.g.
> zone_start_pfn is a nice example of a state that is used outside of the
> lock. zone's free lists are similar. So do we really need the hotplug
> lock? And more broadly, what is the hotplug lock supposed to
> serialize in general?
> Proper documentation would surely help to answer
> these questions. There is way too much of "do not touch this code and
> just make my particular hack" mindset which made the whole memory
> hotplug a giant pile of mess. We really should start with some proper
> engineering here finally.

 * Locking rules:
 *
 * zone_start_pfn and spanned_pages are protected by span_seqlock.
 * It is a seqlock because it has to be read outside of zone->lock,
 * and it is done in the main allocator path.  But, it is written
 * quite infrequently.
 *
 * Write access to present_pages at runtime should be protected by
 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
 * present_pages should get_online_mems() to get a stable value.

IIUC, it looks like zone_start_pfn should be updated under
zone_span_writelock/zone_span_writeunlock, and since zone_start_pfn is
changed in init_currently_empty_zone, I guess that the whole function
should be within that lock.

So, a blind shot, but could we do something like the following?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e1f816821..49f87252f1b1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -764,14 +764,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);

Then, we could take move_pfn_range_to_zone out of the hotplug lock.

Although I am not sure about leaving memmap_init_zone unprotected.
For normal memory, that is not a problem, since the memblock's lock
protects us from touching the same pages at the same time in
online/offline_pages, but for HMM/devm the story is different.

I am not familiar with HMM/devm, so I am not sure if it's protected somehow.
E.g., what happens if devm_memremap_pages and devm_memremap_pages_release
are running at the same time for the same memory range (assuming that the
hotplug lock no longer protects move_pfn_range_to_zone)?

-- 
Oscar Salvador
SUSE L3
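
For reference, the span_seqlock rule quoted above can be illustrated in
userspace. This is only a rough sketch, assuming C11 atomics in place of the
kernel's seqlock_t. struct span, span_init, span_write and span_end_pfn are
made-up names for illustration, not kernel APIs, and writers are assumed to be
serialized elsewhere (the kernel's seqlock_t carries its own spinlock for
that, which is omitted here).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the span fields of struct zone. */
struct span {
	atomic_uint seq;             /* even: stable, odd: write in progress */
	atomic_ulong start_pfn;
	atomic_ulong spanned_pages;
};

static void span_init(struct span *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->start_pfn, 0);
	atomic_init(&s->spanned_pages, 0);
}

/*
 * Writer side, playing the role of the zone_span_writelock() /
 * zone_span_writeunlock() pair. Assumes a single writer at a time.
 * Default (sequentially consistent) atomics keep the sketch simple.
 */
static void span_write(struct span *s, unsigned long start, unsigned long nr)
{
	unsigned int seq = atomic_load(&s->seq);

	assert((seq & 1) == 0);          /* no write may already be in flight */
	atomic_store(&s->seq, seq + 1);  /* odd: readers will retry */
	atomic_store(&s->start_pfn, start);
	atomic_store(&s->spanned_pages, nr);
	atomic_store(&s->seq, seq + 2);  /* even again: values are stable */
}

/*
 * Reader side: takes no lock, and retries if a write was in progress
 * or completed while the two fields were being read.
 */
static unsigned long span_end_pfn(struct span *s)
{
	unsigned int seq;
	unsigned long start, nr;

	do {
		seq = atomic_load(&s->seq);
		start = atomic_load(&s->start_pfn);
		nr = atomic_load(&s->spanned_pages);
	} while ((seq & 1) || seq != atomic_load(&s->seq));

	return start + nr;
}
```

The point of the pattern is that a reader never sees a start_pfn from one
update paired with a spanned_pages from another, which is why both fields
have to be written inside the same write section.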