Date: Fri, 30 Jun 2017 15:00:22 +0200 (CEST)
From: Thomas Gleixner
To: Andrey Ryabinin
Cc: Michal Hocko, LKML, "linux-mm@kvack.org", Andrew Morton,
	Vlastimil Babka, Vladimir Davydov, Heiko Carstens
Subject: Re: [PATCH] mm/memory-hotplug: Switch locking to a percpu rwsem
In-Reply-To: <3f2395c6-bbe0-23c1-fe06-d17ffbf619c3@virtuozzo.com>

On Fri, 30 Jun 2017, Andrey Ryabinin wrote:
> On 06/30/2017 01:15 PM, Thomas Gleixner wrote:
> > On Fri, 30 Jun 2017, Michal Hocko wrote:
> >> So I like this simplification a lot! Even if we can get rid of the
> >> stop_machine eventually this patch would be an improvement. A short
> >> comment on why the per-cpu semaphore over the regular one is better
> >> would be nice.
> >
> > Yes, will add one.
> >
> > The main point is that the current locking construct is evading lockdep due
> > to the ability to support recursive locking, which I did not observe so
> > far.
>
> Like this?

Cute.....

> [  131.023034] Call Trace:
> [  131.023034]  dump_stack+0x85/0xc7
> [  131.023034]  __lock_acquire+0x1747/0x17a0
> [  131.023034]  ? lru_add_drain_all+0x3d/0x190
> [  131.023034]  ? __mutex_lock+0x218/0x940
> [  131.023034]  ? trace_hardirqs_on+0xd/0x10
> [  131.023034]  lock_acquire+0x103/0x200
> [  131.023034]  ? lock_acquire+0x103/0x200
> [  131.023034]  ? lru_add_drain_all+0x42/0x190
> [  131.023034]  cpus_read_lock+0x3d/0x80
> [  131.023034]  ? lru_add_drain_all+0x42/0x190
> [  131.023034]  lru_add_drain_all+0x42/0x190
> [  131.023034]  __offline_pages.constprop.25+0x5de/0x870
> [  131.023034]  offline_pages+0xc/0x10
> [  131.023034]  memory_subsys_offline+0x43/0x70
> [  131.023034]  device_offline+0x83/0xb0
> [  131.023034]  store_mem_state+0xdb/0xe0
> [  131.023034]  dev_attr_store+0x13/0x20
> [  131.023034]  sysfs_kf_write+0x40/0x50
> [  131.023034]  kernfs_fop_write+0x130/0x1b0
> [  131.023034]  __vfs_write+0x23/0x130
> [  131.023034]  ? rcu_read_lock_sched_held+0x6d/0x80
> [  131.023034]  ? rcu_sync_lockdep_assert+0x2a/0x50
> [  131.023034]  ? __sb_start_write+0xd4/0x1c0
> [  131.023034]  ? vfs_write+0x1a8/0x1d0
> [  131.023034]  vfs_write+0xc8/0x1d0
> [  131.023034]  SyS_write+0x44/0xa0

Why didn't that trigger here? Bah, I should have become suspicious due to
not seeing a splat ....

The patch below should cure that.
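What the splat boils down to is a recursive read acquisition of the CPU
hotplug lock: with the percpu-rwsem conversion, the offline path already
holds cpu_hotplug_lock by the time lru_add_drain_all() tries to take it
again via get_online_cpus(). A minimal sketch of the chain (illustration
only, bodies elided; assumes mem_hotplug_begin() takes the CPU hotplug
lock first, as the trace above suggests):

void mem_hotplug_begin(void)
{
	cpus_read_lock();			/* cpu_hotplug_lock, read side */
	percpu_down_write(&mem_hotplug_lock);
}

static int __ref __offline_pages(...)
{
	/* Runs with cpu_hotplug_lock already read-held ... */
	lru_add_drain_all();	/* ... and this does get_online_cpus(),
				 * i.e. a second, nested cpus_read_lock().
				 * percpu rwsems are not recursion safe,
				 * and lockdep now sees it. */
}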
Thanks,

	tglx

8<-------------------

Subject: mm: Change cpuhotplug lock order in lru_add_drain_all()
From: Thomas Gleixner
Date: Fri, 30 Jun 2017 14:25:24 +0200

Not-Yet-Signed-off-by: Thomas Gleixner
---
 include/linux/swap.h |    1 +
 mm/memory_hotplug.c  |    4 ++--
 mm/swap.c            |   11 ++++++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

Index: b/include/linux/swap.h
===================================================================
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -277,6 +277,7 @@ extern void mark_page_accessed(struct pa
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
+extern void lru_add_drain_all_cpuslocked(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
Index: b/mm/memory_hotplug.c
===================================================================
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1860,7 +1860,7 @@ static int __ref __offline_pages(unsigne
 		goto failed_removal;
 	ret = 0;
 	if (drain) {
-		lru_add_drain_all();
+		lru_add_drain_all_cpuslocked();
 		cond_resched();
 		drain_all_pages(zone);
 	}
@@ -1881,7 +1881,7 @@ static int __ref __offline_pages(unsigne
 		}
 	}
 	/* drain all zone's lru pagevec, this is asynchronous... */
-	lru_add_drain_all();
+	lru_add_drain_all_cpuslocked();
 	yield();
 	/* drain pcp pages, this is synchronous. */
 	drain_all_pages(zone);
Index: b/mm/swap.c
===================================================================
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -687,7 +687,7 @@ static void lru_add_drain_per_cpu(struct
 
 static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
 
-void lru_add_drain_all(void)
+void lru_add_drain_all_cpuslocked(void)
 {
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
@@ -701,7 +701,6 @@ void lru_add_drain_all(void)
 		return;
 
 	mutex_lock(&lock);
-	get_online_cpus();
 	cpumask_clear(&has_work);
 
 	for_each_online_cpu(cpu) {
@@ -721,10 +720,16 @@ void lru_add_drain_all(void)
 	for_each_cpu(cpu, &has_work)
 		flush_work(&per_cpu(lru_add_drain_work, cpu));
 
-	put_online_cpus();
 	mutex_unlock(&lock);
 }
 
+void lru_add_drain_all(void)
+{
+	get_online_cpus();
+	lru_add_drain_all_cpuslocked();
+	put_online_cpus();
+}
+
 /**
  * release_pages - batched put_page()
  * @pages: array of pages to release
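For completeness: the split follows the usual *_cpuslocked convention.
Regular callers keep using lru_add_drain_all(), while any path that
already runs under the CPU hotplug lock calls the new variant directly.
A hypothetical caller, just to illustrate the rule (not part of the
patch; example_offline_step() is made up):

/* Hypothetical sketch: a step which runs under mem_hotplug_begin(),
 * and therefore already under cpus_read_lock(), must use the
 * _cpuslocked variant to avoid the recursive acquisition shown in
 * the splat above.
 */
static void example_offline_step(struct zone *zone)
{
	lru_add_drain_all_cpuslocked();	/* hotplug lock held by caller */
	drain_all_pages(zone);		/* drain per-cpu pages as well */
}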