Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp1300506imm; Fri, 8 Jun 2018 13:36:36 -0700 (PDT) X-Google-Smtp-Source: ADUXVKLeLACOYfiLwCVnJUIDLT3BSYFm0C4CKsQ+aVZqKlibzogm5yywc4BVerEWTyW4Nh2VTt1L X-Received: by 2002:a17:902:206:: with SMTP id 6-v6mr7988330plc.294.1528490196048; Fri, 08 Jun 2018 13:36:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1528490196; cv=none; d=google.com; s=arc-20160816; b=0BD0WSX8Z4kT27VczQUoKDDyohfRkdRbHqgye258QrBnNuB7aYsV+ziyzuCI3d4hCZ FMMcdh7CHOzvKEliJqnmKbr6MD/v4gvKSJFMZf0Rrd5Kv4apPlyB7IyaA7SYeGW1mBrY 2mbVpDiAgRg6a/gUR+80YqZ2zgxVWYDU5BaXa2Q859fi1NJrLKDq+5Ryut/u3TfqLEFE CxVhmC6oLjpVX/2mNbtD1qejxxw22W7VsF/ajn+0S/MPAauJVUVIsLZ/map/X7sbzThI WrpIV1M1vKvgulNqiKSYSCgN2DZOaX9T3wWvd23fTgOYgqy9CsXmui+t/t7tE0DgAkYR 7w0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :arc-authentication-results; bh=2ppRF4EUqUk+t+LzOigvvm1S6WDO9mSilo5DEd81uAk=; b=xR5LMITQ+YlBXyCumlPOgfSgiXe7NqCbwTwoC2e54g3V1CBQGTu6NfqZSh+t0lYbbU loDYQtjGS3kDLg8VS15n6Mlx6rmIZKyb+tXd9S5TriFw7yn2zenpFJTMp9gY4AbOTcHF 9ztZJApQuqezFE2Hfyes2ITCtC1ICA4Skt2h9b4dST64Udi+0CGpFaeFMCyTUSTHqV6Y gjzZ4kV0u/kOW7wdU9ExXTbmoG37Rz81vLzZ+jLjXzROqK6iplcH0whWjOXcq1Z/XIl1 JWsBJDtduEkfAgqUs1QCr5eTXJPaJKWuxLTstNIjJS28loyjo2caueL2Mg3SXbd5phBl MCxw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id c14-v6si21069382pls.32.2018.06.08.13.36.21; Fri, 08 Jun 2018 13:36:36 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753103AbeFHUfz (ORCPT + 99 others); Fri, 8 Jun 2018 16:35:55 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:52188 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752754AbeFHUfy (ORCPT ); Fri, 8 Jun 2018 16:35:54 -0400 Received: from akpm3.svl.corp.google.com (unknown [104.133.9.92]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 49E3DA7B; Fri, 8 Jun 2018 20:35:53 +0000 (UTC) Date: Fri, 8 Jun 2018 13:35:52 -0700 From: Andrew Morton To: Shakeel Butt Cc: Michal Hocko , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Greg Thelen , Johannes Weiner , Vladimir Davydov , Tejun Heo , Linux MM , Cgroups , LKML Subject: Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate Message-Id: <20180608133552.d9839eb08f6b98ec4d0ad783@linux-foundation.org> In-Reply-To: <20180530001204.183758-1-shakeelb@google.com> References: <20180530001204.183758-1-shakeelb@google.com> X-Mailer: Sylpheed 3.6.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 29 May 2018 17:12:04 -0700 Shakeel Butt wrote: > The memcg kmem cache creation and deactivation (SLUB only) is > asynchronous. If a root kmem cache is destroyed whose memcg cache is in > the process of creation or deactivation, the kernel may crash. > > Example of one such crash: > general protection fault: 0000 [#1] SMP PTI > CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp > ... > Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn > RIP: 0010:has_cpu_slab > ... > Call Trace: > ? on_each_cpu_cond > __kmem_cache_shrink > kmemcg_cache_deact_after_rcu > kmemcg_deactivate_workfn > process_one_work > worker_thread > kthread > ret_from_fork+0x35/0x40 > > To fix this race, on root kmem cache destruction, mark the cache as > dying and flush the workqueue used for memcg kmem cache creation and > deactivation. We have a distinct lack of reviewers and testers on this one. Please. Vladimir, v3 replaced the refcounting with workqueue flushing, as you suggested... From: Shakeel Butt Subject: mm: fix race between kmem_cache destroy, create and deactivate The memcg kmem cache creation and deactivation (SLUB only) is asynchronous. If a root kmem cache is destroyed whose memcg cache is in the process of creation or deactivation, the kernel may crash. Example of one such crash: general protection fault: 0000 [#1] SMP PTI CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp ... Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn RIP: 0010:has_cpu_slab ... Call Trace: ? on_each_cpu_cond __kmem_cache_shrink kmemcg_cache_deact_after_rcu kmemcg_deactivate_workfn process_one_work worker_thread kthread ret_from_fork+0x35/0x40 To fix this race, on root kmem cache destruction, mark the cache as dying and flush the workqueue used for memcg kmem cache creation and deactivation. [shakeelb@google.com: add more documentation, rename fields for readability] Link: http://lkml.kernel.org/r/20180522201336.196994-1-shakeelb@google.com [akpm@linux-foundation.org: fix build, per Shakeel] [shakeelb@google.com: v3. Instead of refcount, flush the workqueue] Link: http://lkml.kernel.org/r/20180530001204.183758-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180521174116.171846-1-shakeelb@google.com Signed-off-by: Shakeel Butt Cc: Michal Hocko Cc: Greg Thelen Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Tejun Heo Signed-off-by: Andrew Morton --- include/linux/slab.h | 1 + mm/slab_common.c | 21 ++++++++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-) diff -puN include/linux/slab_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate include/linux/slab_def.h diff -puN include/linux/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate include/linux/slab.h --- a/include/linux/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate +++ a/include/linux/slab.h @@ -600,6 +600,7 @@ struct memcg_cache_params { struct memcg_cache_array __rcu *memcg_caches; struct list_head __root_caches_node; struct list_head children; + bool dying; }; struct { struct mem_cgroup *memcg; diff -puN include/linux/slub_def.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate include/linux/slub_def.h diff -puN mm/memcontrol.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate mm/memcontrol.c diff -puN mm/slab.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate mm/slab.c diff -puN mm/slab_common.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate mm/slab_common.c --- a/mm/slab_common.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate +++ a/mm/slab_common.c @@ -136,6 +136,7 @@ void slab_init_memcg_params(struct kmem_ s->memcg_params.root_cache = NULL; RCU_INIT_POINTER(s->memcg_params.memcg_caches, NULL); INIT_LIST_HEAD(&s->memcg_params.children); + s->memcg_params.dying = false; } static int init_memcg_params(struct kmem_cache *s, @@ -608,7 +609,7 @@ void memcg_create_kmem_cache(struct mem_ * The memory cgroup could have been offlined while the cache * creation work was pending. */ - if (memcg->kmem_state != KMEM_ONLINE) + if (memcg->kmem_state != KMEM_ONLINE || root_cache->memcg_params.dying) goto out_unlock; idx = memcg_cache_id(memcg); @@ -712,6 +713,9 @@ void slab_deactivate_memcg_cache_rcu_sch WARN_ON_ONCE(s->memcg_params.deact_fn)) return; + if (s->memcg_params.root_cache->memcg_params.dying) + return; + /* pin memcg so that @s doesn't get destroyed in the middle */ css_get(&s->memcg_params.memcg->css); @@ -823,11 +827,24 @@ static int shutdown_memcg_caches(struct return -EBUSY; return 0; } + +static void flush_memcg_workqueue(struct kmem_cache *s) +{ + mutex_lock(&slab_mutex); + s->memcg_params.dying = true; + mutex_unlock(&slab_mutex); + + flush_workqueue(memcg_kmem_cache_wq); +} #else static inline int shutdown_memcg_caches(struct kmem_cache *s) { return 0; } + +static inline void flush_memcg_workqueue(struct kmem_cache *s) +{ +} #endif /* CONFIG_MEMCG && !CONFIG_SLOB */ void slab_kmem_cache_release(struct kmem_cache *s) @@ -845,6 +862,8 @@ void kmem_cache_destroy(struct kmem_cach if (unlikely(!s)) return; + flush_memcg_workqueue(s); + get_online_cpus(); get_online_mems(); diff -puN mm/slab.h~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate mm/slab.h diff -puN mm/slub.c~mm-fix-race-between-kmem_cache-destroy-create-and-deactivate mm/slub.c _