From: David Hildenbrand
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org,
    xen-devel@lists.xenproject.org, devel@linuxdriverproject.org,
    David Hildenbrand, Andrew Morton, Balbir Singh,
    Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
    Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens, John Allen,
    Jonathan Corbet, Joonsoo Kim, Juergen Gross, Kate Stewart,
    "K. Y. Srinivasan", Len Brown, Martin Schwidefsky, Mathieu Malaterre,
    Michael Ellerman, Michael Neuling, Michal Hocko, Nathan Fontenot,
    Oscar Salvador, Paul Mackerras, Pavel Tatashin, Philippe Ombredanne,
    "Rafael J. Wysocki", Rashmica Gupta, Stephen Hemminger,
    Thomas Gleixner, Vlastimil Babka, YASUAKI ISHIMATSU
Subject: [PATCH RFCv2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
Date: Tue, 21 Aug 2018 12:44:12 +0200
Message-Id: <20180821104418.12710-1-david@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This is the same approach as in the first RFC, but this time without
exporting device_hotplug_lock (requested by Greg) and with some more
details and documentation regarding locking. Tested only on x86 so far.

--------------------------------------------------------------------------

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() -> online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
will not take the mem_hotplug_lock. It does, however, take the
device_lock() and the device_hotplug_lock.

E.g.
via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). online_pages() then touches
e.g. zone->present_pages basically unprotected.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in drivers/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock held.
2. remove_memory() must only be called with device_hotplug_lock held.
   This is already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock held. This is already documented and true for now
   in core code. Other callers (related to memory hotplug) have to be
   fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.


RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
  wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking.
- Add some documentation for locking that matches my knowledge.

David Hildenbrand (6):
  mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  mm/memory_hotplug: fix online/offline_pages called w.o.
    mem_hotplug_lock
  powerpc/powernv: hold device_hotplug_lock when calling device_online()
  powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages()
  memory-hotplug.txt: Add some details about locking internals

 Documentation/memory-hotplug.txt              | 39 +++++++++++-
 arch/powerpc/platforms/powernv/memtrace.c     | 14 +++--
 .../platforms/pseries/hotplug-memory.c        |  8 +--
 drivers/acpi/acpi_memhotplug.c                |  4 +-
 drivers/base/memory.c                         | 22 +++----
 drivers/xen/balloon.c                         |  3 +
 include/linux/memory_hotplug.h                |  4 +-
 mm/memory_hotplug.c                           | 59 +++++++++++++++----
 8 files changed, 115 insertions(+), 38 deletions(-)

-- 
2.17.1
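
The lock nesting proposed in rules 1-4 of this cover letter can be
illustrated with a small userspace sketch. This is not kernel code and
not what the patches contain: every name ending in _model is
hypothetical, plain pthread mutexes stand in for both kernel locks, and a
thread-local flag is used as a crude stand-in for a "lock held"
assertion. It only shows the intended shape: the outer
device_hotplug_lock is taken by a wrapper, and the inner
mem_hotplug_lock is taken inside the function that modifies the page
counter.

```c
/* Userspace model of the proposed lock nesting. NOT kernel code.
 * All *_model identifiers are hypothetical and exist only for
 * illustration. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Outer lock: stands in for device_hotplug_lock. */
static pthread_mutex_t device_hotplug_lock_model = PTHREAD_MUTEX_INITIALIZER;
/* Inner lock: stands in for mem_hotplug_lock. */
static pthread_mutex_t mem_hotplug_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag recording whether this thread holds the outer lock. */
static _Thread_local bool holds_device_hotplug_lock_model;

/* Stands in for a counter such as zone->present_pages. */
static long present_pages_model;

static void lock_device_hotplug_model(void)
{
	pthread_mutex_lock(&device_hotplug_lock_model);
	holds_device_hotplug_lock_model = true;
}

static void unlock_device_hotplug_model(void)
{
	holds_device_hotplug_lock_model = false;
	pthread_mutex_unlock(&device_hotplug_lock_model);
}

/* Caller must hold the outer lock (rules 1-3). The inner lock is taken
 * here, inside the function that touches the counter (rule 4).
 * Returns 0 on success, -1 if the caller did not hold the outer lock. */
static int add_memory_locked_model(long nr_pages)
{
	if (!holds_device_hotplug_lock_model)
		return -1;

	pthread_mutex_lock(&mem_hotplug_lock_model);
	present_pages_model += nr_pages;
	pthread_mutex_unlock(&mem_hotplug_lock_model);
	return 0;
}

/* Wrapper that takes the outer lock itself, so callers never need
 * direct access to it. */
static int add_memory_model(long nr_pages)
{
	int rc;

	lock_device_hotplug_model();
	rc = add_memory_locked_model(nr_pages);
	unlock_device_hotplug_model();
	return rc;
}
```

In this model, calling add_memory_model() always nests the locks in the
same order (outer, then inner), while calling add_memory_locked_model()
directly without the outer lock is rejected and leaves the counter
untouched.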