Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp494586imm; Fri, 17 Aug 2018 01:22:56 -0700 (PDT) X-Google-Smtp-Source: AA+uWPxIkIW2DlniWO6SKk+z1KaD8R2bgJ4w4KwJLcFSaO6hqvrNjwV58bX7SCq2yC1eEjglY67K X-Received: by 2002:a17:902:7c8e:: with SMTP id y14-v6mr33299723pll.265.1534494176738; Fri, 17 Aug 2018 01:22:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1534494176; cv=none; d=google.com; s=arc-20160816; b=kE3+Bloa76dv/zL12dUppsDeoYZC3EghehSklOKFcN06F1yH4PJ/8gt7chBzZbxMs1 Q61Z4avGlusS8m0e9yiwvJAnARuQFHsFKfxJ0CehTG9ZRwVPrySE+CgH9ROcO2Y7H3he 7lCMEXpbstRHnPlA1BnlmngZMfc85ybZ6mxKNl3CMkkWro6GvHT6iN9pZPFZS1AZHMp6 ggF3GAgnb2Q2lZsAQY/9n4jcmA9H/4V2ADQiBIQLtLlsou5IlZd3NHtNPJGbub85HJZ+ C9Fad08uGFaAD1beITnnDEEv3pyFZk7nxc4B8zeyDHR3eSoEe4YZo5bGdSrKeELNUf5d pdsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:arc-authentication-results; bh=JDXSIF84yfXcqCeDr7VZsnQVZZVz5xhGS9xGzMQu1l8=; b=z2Rblw8ZB3+UaRp+aWVMQawR/xkzkS0NHgis2qoZjjq8RQdpzbsjwPIZp2vhYOa3EI 79jwtjZ93lhGGC/h0g6LbRkv5v/nFwMGyStd3fufVZ638PLqs+9MLo2el0sk9+CqXCt8 mN8RKjZkwXDgIVYW2oInwjEkJJhpITEzhIoyO8pxvGdcrObj7uopN71Ntbk5BD2aQIXd sJiDMUS5bvw3pye1rVbYiemq9AIs9570OS3cxXRdhPvMYFp+kON2lJ0TiG14hHA17s/q 4aVTxjSycJpxnTb0yzqg+gBzzXS86q0o7FbA0JNOGsc5GBYW7DeoJslVQ6YVDuRL97SN 4LHg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b18-v6si1532169pge.666.2018.08.17.01.22.41; Fri, 17 Aug 2018 01:22:56 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726771AbeHQLXX (ORCPT + 99 others); Fri, 17 Aug 2018 07:23:23 -0400 Received: from mail-oi0-f66.google.com ([209.85.218.66]:37668 "EHLO mail-oi0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726528AbeHQLXX (ORCPT ); Fri, 17 Aug 2018 07:23:23 -0400 Received: by mail-oi0-f66.google.com with SMTP id j205-v6so12711817oib.4; Fri, 17 Aug 2018 01:20:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=JDXSIF84yfXcqCeDr7VZsnQVZZVz5xhGS9xGzMQu1l8=; b=lCTbg6vwLrXeed0oo7EEPXn1HZPBS6BLojaHuuIjlh6/qqpavb+gWjCuPo8McIY1rp EtDsUr97f0egTvGyO8WOSu5jt/q4TMWjtd/IhOQoKgdmqeEM8G40+tZAAxv/Exd3ZHhy fGRFfTVvz5SA+ipnxSHU7xnAJATR2gmIwwYMeiVanFB7DU5f3y1oAKKk0OB/oJPUU/NX ImLiie56veUUB+XH8U4Fwwu88XEzd3mv1/bWkZtxGGJWTHPLg+3Fz3x71xHTecAna/lq xAXy3mj6i6j6WUIuMu8L0FkEvsYeAe1JhtHGr9IJ6E8zabvl1YikXLWoACVarDsQ6IfP NSEQ== X-Gm-Message-State: AOUpUlHKw7+GpL9Hb5Emn5wUDl5bPMaDMbAkoAQFUPBDCAtZIDNXYrfl gYJSTj6rIv7xNl0Iw3L8sJ2gDSm8cAoLS9tbenM= X-Received: by 2002:aca:37c2:: with SMTP id e185-v6mr1564649oia.326.1534494055281; Fri, 17 Aug 2018 01:20:55 -0700 (PDT) MIME-Version: 1.0 References: <20180817075901.4608-1-david@redhat.com> <20180817075901.4608-3-david@redhat.com> In-Reply-To: <20180817075901.4608-3-david@redhat.com> From: "Rafael J. Wysocki" Date: Fri, 17 Aug 2018 10:20:43 +0200 Message-ID: Subject: Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock To: David Hildenbrand Cc: Linux Kernel Mailing List , Linux Memory Management List , linuxppc-dev , ACPI Devel Maling List , devel@linuxdriverproject.org, linux-s390@vger.kernel.org, xen-devel@lists.xenproject.org, Andrew Morton , Michal Hocko , Vlastimil Babka , Dan Williams , Pavel Tatashin , osalvador@suse.de, Vitaly Kuznetsov , Martin Schwidefsky , Heiko Carstens , kys@microsoft.com, haiyangz@microsoft.com, sthemmin@microsoft.com, Benjamin Herrenschmidt , Paul Mackerras , "Rafael J. Wysocki" , Len Brown , Greg Kroah-Hartman , David Rientjes Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand wrote: > > There seem to be some problems as result of 30467e0b3be ("mm, hotplug: > fix concurrent memory hot-add deadlock"), which tried to fix a possible > lock inversion reported and discussed in [1] due to the two locks > a) device_lock() > b) mem_hotplug_lock > > While add_memory() first takes b), followed by a) during > bus_probe_device(), onlining of memory from user space first took b), > followed by a), exposing a possible deadlock. > > In [1], and it was decided to not make use of device_hotplug_lock, but > rather to enforce a locking order. Looking at 1., this order is not always > satisfied when calling device_online() - essentially we simply don't take > one of both locks anymore - and fixing this would require us to > take the mem_hotplug_lock in core driver code (online_store()), which > sounds wrong. > > The problems I spotted related to this: > > 1. Memory block device attributes: While .state first calls > mem_hotplug_begin() and the calls device_online() - which takes > device_lock() - .online does no longer call mem_hotplug_begin(), so > effectively calls online_pages() without mem_hotplug_lock. onlining/ > offlining of pages is no longer serialised across different devices. > > 2. device_online() should be called under device_hotplug_lock, however > onlining memory during add_memory() does not take care of that. (I > didn't follow how strictly this is needed, but there seems to be a > reason because it is documented at device_online() and > device_offline()). > > In addition, I think there is also something wrong about the locking in > > 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() > (and device_online()) without locks. This was introduced after > 30467e0b3be. And skimming over the code, I assume it could need some > more care in regards to locking. > > ACPI code already holds the device_hotplug_lock, and as we are > effectively hotplugging memory block devices, requiring to hold that > lock does not sound too wrong, although not chosen in [1], as > "I don't think resolving a locking dependency is appropriate by > just serializing them with another lock." > I think this is the cleanest solution. > > Requiring add_memory()/add_memory_resource() to be called under > device_hotplug_lock fixes 2., taking the mem_hotplug_lock in > online_pages/offline_pages() fixes 1. and 3. > > Fixup all callers of add_memory/add_memory_resource to hold the lock if > not already done. > > So this is essentially a revert of 30467e0b3be, implementation of what > was suggested in [1] by Vitaly, applied to the current tree. > > [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ > 2015-February/065324.html > > This patch is partly based on a patch by Vitaly Kuznetsov. > > Signed-off-by: David Hildenbrand > --- > arch/powerpc/platforms/powernv/memtrace.c | 3 ++ > drivers/acpi/acpi_memhotplug.c | 1 + > drivers/base/memory.c | 18 +++++----- > drivers/hv/hv_balloon.c | 4 +++ > drivers/s390/char/sclp_cmd.c | 3 ++ > drivers/xen/balloon.c | 3 ++ > mm/memory_hotplug.c | 42 ++++++++++++++++++----- > 7 files changed, 55 insertions(+), 19 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c > index 51dc398ae3f7..4c2737a33020 100644 > --- a/arch/powerpc/platforms/powernv/memtrace.c > +++ b/arch/powerpc/platforms/powernv/memtrace.c > @@ -206,6 +206,8 @@ static int memtrace_online(void) > int i, ret = 0; > struct memtrace_entry *ent; > > + /* add_memory() requires device_hotplug_lock */ > + lock_device_hotplug(); > for (i = memtrace_array_nr - 1; i >= 0; i--) { > ent = &memtrace_array[i]; > > @@ -244,6 +246,7 @@ static int memtrace_online(void) > pr_info("Added trace memory back to node %d\n", ent->nid); > ent->size = ent->start = ent->nid = -1; > } > + unlock_device_hotplug(); > if (ret) > return ret; > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > index 6b0d3ef7309c..e7a4c7900967 100644 > --- a/drivers/acpi/acpi_memhotplug.c > +++ b/drivers/acpi/acpi_memhotplug.c > @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) > if (node < 0) > node = memory_add_physaddr_to_nid(info->start_addr); > > + /* we already hold the device_hotplug lock at this point */ > result = add_memory(node, info->start_addr, info->length); > > /* A very minor nit here: I would say "device_hotplug_lock is already held at this point" in the comment (I sort of don't like to say "we" in code comments as it is not particularly clear what group of people is represented by that and the lock is actually called device_hotplug_lock). Otherwise the approach is fine by me. BTW, the reason why device_hotplug_lock is acquired by the ACPI memory hotplug is because it generally needs to be synchronized with respect CPU hot-remove and similar. I believe that this may be the case in non-ACPI setups as well. Thanks, Rafael