Date: Mon, 21 May 2018 11:42:27 -0700
From: Andrew Morton
To: Shakeel Butt
Cc: Michal Hocko, Greg Thelen, Christoph Lameter, Pekka Enberg,
    David Rientjes, Joonsoo Kim, Johannes Weiner, Vladimir Davydov,
    Tejun Heo, Linux MM, cgroups@vger.kernel.org, LKML
Subject: Re: [PATCH] mm: fix race between kmem_cache destroy, create and deactivate
Message-Id: <20180521114227.233983ac7038a9f4bf5b7066@linux-foundation.org>
In-Reply-To: <20180521174116.171846-1-shakeelb@google.com>
References: <20180521174116.171846-1-shakeelb@google.com>

On Mon, 21 May 2018 10:41:16 -0700 Shakeel Butt wrote:

> The memcg kmem cache creation and deactivation (SLUB only) is
> asynchronous.
> If a root kmem cache is destroyed whose memcg cache is in
> the process of creation or deactivation, the kernel may crash.
>
> Example of one such crash:
> 	general protection fault: 0000 [#1] SMP PTI
> 	CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
> 	...
> 	Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
> 	RIP: 0010:has_cpu_slab
> 	...
> 	Call Trace:
> 	? on_each_cpu_cond
> 	__kmem_cache_shrink
> 	kmemcg_cache_deact_after_rcu
> 	kmemcg_deactivate_workfn
> 	process_one_work
> 	worker_thread
> 	kthread
> 	ret_from_fork+0x35/0x40
>
> This issue is due to the lack of reference counting for the root
> kmem_caches. There exist a refcount in kmem_cache but it is actually a
> count of aliases i.e. number of kmem_caches merged together.
>
> This patch make alias count explicit and adds reference counting to the
> root kmem_caches. The reference of a root kmem cache is elevated on
> merge and while its memcg kmem_cache is in the process of creation or
> deactivation.
>

The patch seems depressingly complex.  And a bit underdocumented...

> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -674,6 +674,8 @@ struct memcg_cache_params {
>  };
>
>  int memcg_update_all_caches(int num_memcgs);
> +bool kmem_cache_tryget(struct kmem_cache *s);
> +void kmem_cache_put(struct kmem_cache *s);
>
>  /**
>   * kmalloc_array - allocate memory for an array.
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index d9228e4d0320..4bb22c89a740 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -41,7 +41,8 @@ struct kmem_cache {
>  /* 4) cache creation/removal */
>  	const char *name;
>  	struct list_head list;
> -	int refcount;
> +	refcount_t refcount;
> +	int alias_count;

The semantic meaning of these two?  What locking protects alias_count?

>  	int object_size;
>  	int align;
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 3773e26c08c1..532d4b6f83ed 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -97,7 +97,8 @@ struct kmem_cache {
>  	struct kmem_cache_order_objects max;
>  	struct kmem_cache_order_objects min;
>  	gfp_t allocflags;	/* gfp flags to use on each alloc */
> -	int refcount;		/* Refcount for slab cache destroy */
> +	refcount_t refcount;	/* Refcount for slab cache destroy */
> +	int alias_count;	/* Number of root kmem caches merged */

"merged" what with what in what manner?

>  	void (*ctor)(void *);
>  	unsigned int inuse;		/* Offset to metadata */
>  	unsigned int align;		/* Alignment */
>
> ...
>
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -25,7 +25,8 @@ struct kmem_cache {
>  	unsigned int useroffset;/* Usercopy region offset */
>  	unsigned int usersize;	/* Usercopy region size */
>  	const char *name;	/* Slab name for sysfs */
> -	int refcount;		/* Use counter */
> +	refcount_t refcount;	/* Use counter */
> +	int alias_count;

Semantic meaning/usage of alias_count?  Locking for it?

>  	void (*ctor)(void *);	/* Called on object slot creation */
>  	struct list_head list;	/* List of all slab caches on the system */
>  };
>
> ...
>
> +bool kmem_cache_tryget(struct kmem_cache *s)
> +{
> +	if (is_root_cache(s))
> +		return refcount_inc_not_zero(&s->refcount);
> +	return false;
> +}
> +
> +void kmem_cache_put(struct kmem_cache *s)
> +{
> +	if (is_root_cache(s) &&
> +	    refcount_dec_and_test(&s->refcount))
> +		__kmem_cache_destroy(s, true);
> +}
> +
> +void kmem_cache_put_locked(struct kmem_cache *s)
> +{
> +	if (is_root_cache(s) &&
> +	    refcount_dec_and_test(&s->refcount))
> +		__kmem_cache_destroy(s, false);
> +}

Some covering documentation for the above would be useful.  Why do they
exist, why do they only operate on the root cache?  etc.

>
> ...
>
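
For readers following the thread, here is a minimal userspace sketch of the
tryget/put pattern that the quoted helpers appear to implement.  It is not
the kernel code and makes several assumptions: struct cache, cache_init(),
cache_tryget() and cache_put() are hypothetical names, C11 atomics stand in
for refcount_t, a flag stands in for __kmem_cache_destroy(), and the
is_root_cache() check is left out.  The alias_count field only mirrors the
patch description ("number of kmem_caches merged together"); how it is
locked is exactly the open question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kmem_cache; not the kernel's layout. */
struct cache {
	atomic_int refcount;	/* lifetime references; 0 means teardown began */
	int alias_count;	/* merged users; locking is assumed, not shown */
	bool destroyed;		/* set once by the final put */
};

static void cache_init(struct cache *c)
{
	atomic_init(&c->refcount, 1);	/* creator holds the initial reference */
	c->alias_count = 1;
	c->destroyed = false;
}

/* Take a reference unless the count has already dropped to zero. */
static bool cache_tryget(struct cache *c)
{
	int old = atomic_load(&c->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one performs the teardown. */
static void cache_put(struct cache *c)
{
	int old = atomic_fetch_sub(&c->refcount, 1);

	assert(old > 0);
	if (old == 1)
		c->destroyed = true;	/* stands in for __kmem_cache_destroy() */
}
```

In this sketch an asynchronous worker would call cache_tryget() before
touching the cache and cache_put() when done, so a concurrent destroy only
tears the cache down after the worker has dropped its reference.  Once the
count has reached zero, cache_tryget() fails and the worker must back off.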