Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp476248imm; Fri, 17 Aug 2018 01:00:32 -0700 (PDT) X-Google-Smtp-Source: AA+uWPzTpZsKblUo9znavZrgYPaWP4p49pgdBU+eR82gkuwVJ9+2Ul6bllqSx9o+8HiSvigPC+vS X-Received: by 2002:a63:cc4f:: with SMTP id q15-v6mr5717182pgi.217.1534492832404; Fri, 17 Aug 2018 01:00:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1534492832; cv=none; d=google.com; s=arc-20160816; b=n2N75tcV3OVjYclvJAfjsGlKuhOmQyIFQGg4Wn8ihxckPJZ8tncAX2dxeVloZLjrhw 3mUnE4BcJTnXNvGS3x4lJaD7ig0tkbi+Vi5TMda+EVc0sPlokDj2V16jWCfH9aQTATuJ pOTWVROZNHnJs60Z7+ZU+v5M2DL3nyYSE1BPpah0mNR0F77PAB9Um3iICmvy8QVKv6oU CtVNClnmIW4XQjw0d9Ji9Bf34r9KZudnpjrxRhs7xftNqRa45mAl4W0DIAtgr3TI2Jrz owJe+zFFluaoCpY8ECNKrcLR6wt8yIQ+ay47qTzrpYaot+sqnijAvxrNYBkN9WOK8Cu+ oSrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from :arc-authentication-results; bh=T45a9DNOY6vlD1lD0lvp+jXIv4s2SfAGDcAXrZ22sYA=; b=o/5W375NWTtlw2PQEZsOxxsfB4YTe8x2Mc4XKEdkiI6fdKJG2+NnuEjP5zvdG3bvnk PesPAV/ws0TVST5ac79lHySYoHDl7YMPq8YSfZqS85cYIkl7cvA+cPPD2I4DlH0LH9oj xq1H10mYHcAjR6D6snRLXPIdb6qjQTtY0gLgHEi1i+iSI9m1I59odK3fd0tFc2gUM3m6 QA+I0GGCdKbAgzb6iq3Ln9n80SAuFe1kBzndcUTTg6kEa0sa/v3z6+0FnSKCRwm2k3sa CcJXWA3t6eTU54bavoOIob8qNx2QR1+tc9OCXr3M31lEltaQO9/efvZpkAcf9Onar/3/ kfSQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i24-v6si1630824pgj.228.2018.08.17.01.00.16; Fri, 17 Aug 2018 01:00:32 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726546AbeHQLBd (ORCPT + 99 others); Fri, 17 Aug 2018 07:01:33 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:57190 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726422AbeHQLBd (ORCPT ); Fri, 17 Aug 2018 07:01:33 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3F2757A7E8; Fri, 17 Aug 2018 07:59:09 +0000 (UTC) Received: from t460s.redhat.com (ovpn-117-9.ams2.redhat.com [10.36.117.9]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3AE1910D18C9; Fri, 17 Aug 2018 07:59:02 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, devel@linuxdriverproject.org, linux-s390@vger.kernel.org, xen-devel@lists.xenproject.org, Andrew Morton , Michal Hocko , Vlastimil Babka , Dan Williams , Pavel Tatashin , Oscar Salvador , Vitaly Kuznetsov , Martin Schwidefsky , Heiko Carstens , David Hildenbrand , "K . Y . Srinivasan" , Haiyang Zhang , Stephen Hemminger , Benjamin Herrenschmidt , Paul Mackerras , "Rafael J . Wysocki" , Len Brown , Greg Kroah-Hartman , David Rientjes Subject: [PATCH RFC 0/2] mm: online/offline_pages called w.o. mem_hotplug_lock Date: Fri, 17 Aug 2018 09:58:59 +0200 Message-Id: <20180817075901.4608-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Fri, 17 Aug 2018 07:59:09 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Fri, 17 Aug 2018 07:59:09 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'david@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Reading through the code and studying how mem_hotplug_lock is to be used, I noticed that there are two places where we can end up calling device_online()/device_offline() - online_pages()/offline_pages() without the mem_hotplug_lock. And there are other places where we call device_online()/device_offline() without the device_hotplug_lock. While e.g. echo "online" > /sys/devices/system/memory/memory9/state is fine, e.g. echo 1 > /sys/devices/system/memory/memory9/online Will not take the mem_hotplug_lock. However the device_lock() and device_hotplug_lock. E.g. via memory_probe_store(), we can end up calling add_memory()->online_pages() without the device_hotplug_lock. So we can have concurrent callers in online_pages(). We e.g. touch in online_pages() basically unprotected zone->present_pages then. Looks like there is a longer history to that (see Patch #2 for details), and fixing it to work the way it was intended is not really possible. We would e.g. have to take the mem_hotplug_lock in device/base/core.c, which sounds wrong. Summary: We had a lock inversion on mem_hotplug_lock and device_lock(), and the approach to fix it fixed one inversion, but dropped the mem_hotplug_lock on another instance (.online). As far as I understand from the code and from b93e0f329e24 ("mm, memory_hotplug: get rid of zonelists_mutex"), mem_hotplug_lock is required because we assume that "both memory online and offline are fully serialized." and this is not the case if we only hold the device_lock(). I propose the general rules: 1. add_memory/add_memory_resource() must only be called with device_hotplug_lock. For now only done in ACPI code. 2. remove_memory() must only be called with device_hotplug_lock. This is already documented and true in ACPI code. 3. device_online()/device_offline() must only be called with device_hotplug_lock. This is already documented and true for now in core code. Other callers (related to memory hotplug) have to be fixed up. 4. mem_hotplug_lock is taken inside of add_memory/remove_memory/ online_pages/offline_pages. For now this is only true for the first two instances. To me, this looks way cleaner than what we have right now (and easier to verify). And looking at the documentation of remove_memory, using lock_device_hotplug also for add_memory() feels natural. Second patch could maybe split up. But let's first hear if this is actually a problem and if there migh be alternatives (or cleanups). Only tested with DIMM-based hotplug. David Hildenbrand (1): mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock Vitaly Kuznetsov (1): drivers/base: export lock_device_hotplug/unlock_device_hotplug arch/powerpc/platforms/powernv/memtrace.c | 3 ++ drivers/acpi/acpi_memhotplug.c | 1 + drivers/base/core.c | 2 ++ drivers/base/memory.c | 18 +++++----- drivers/hv/hv_balloon.c | 4 +++ drivers/s390/char/sclp_cmd.c | 3 ++ drivers/xen/balloon.c | 3 ++ mm/memory_hotplug.c | 42 ++++++++++++++++++----- 8 files changed, 57 insertions(+), 19 deletions(-) -- 2.17.1