References: <20180521174116.171846-1-shakeelb@google.com> <20180521114227.233983ac7038a9f4bf5b7066@linux-foundation.org>
In-Reply-To: <20180521114227.233983ac7038a9f4bf5b7066@linux-foundation.org>
From: Shakeel Butt
Date: Mon, 21 May 2018 13:12:20 -0700
Subject: Re: [PATCH] mm: fix race between kmem_cache destroy, create and deactivate
To: Andrew Morton
Cc: Michal Hocko, Greg Thelen, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Johannes Weiner, Vladimir Davydov, Tejun Heo, Linux MM, Cgroups, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 21, 2018 at 11:42 AM Andrew Morton wrote:
> On Mon, 21 May 2018 10:41:16 -0700 Shakeel Butt wrote:
>
> > The memcg kmem cache creation and deactivation (SLUB only) is
> > asynchronous. If a root kmem cache is destroyed whose memcg cache is in
> > the process of creation or deactivation, the kernel may crash.
> >
> > Example of one such crash:
> >	general protection fault: 0000 [#1] SMP PTI
> >	CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
> >	...
> >	Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
> >	RIP: 0010:has_cpu_slab
> >	...
> >	Call Trace:
> >	? on_each_cpu_cond
> >	__kmem_cache_shrink
> >	kmemcg_cache_deact_after_rcu
> >	kmemcg_deactivate_workfn
> >	process_one_work
> >	worker_thread
> >	kthread
> >	ret_from_fork+0x35/0x40
> >
> > This issue is due to the lack of reference counting for the root
> > kmem_caches. There exists a refcount in kmem_cache, but it is actually
> > a count of aliases, i.e. the number of kmem_caches merged together.
> >
> > This patch makes the alias count explicit and adds reference counting
> > to the root kmem_caches. The reference of a root kmem cache is elevated
> > on merge and while its memcg kmem_cache is in the process of creation
> > or deactivation.
> >
>
> The patch seems depressingly complex.
>
> And a bit underdocumented...

I will add more documentation to the code.

> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -674,6 +674,8 @@ struct memcg_cache_params {
> >  };
> >
> >  int memcg_update_all_caches(int num_memcgs);
> > +bool kmem_cache_tryget(struct kmem_cache *s);
> > +void kmem_cache_put(struct kmem_cache *s);
> >
> >  /**
> >   * kmalloc_array - allocate memory for an array.
> > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > index d9228e4d0320..4bb22c89a740 100644
> > --- a/include/linux/slab_def.h
> > +++ b/include/linux/slab_def.h
> > @@ -41,7 +41,8 @@ struct kmem_cache {
> >  /* 4) cache creation/removal */
> >  	const char *name;
> >  	struct list_head list;
> > -	int refcount;
> > +	refcount_t refcount;
> > +	int alias_count;
>
> The semantic meaning of these two?  What locking protects alias_count?

SLAB and SLUB allow reusing existing root kmem caches. The alias_count
of a kmem cache tells how many creation requests share that kmem cache
(maybe shared_count or reused_count would be better names). For example,
if there are 5 root kmem cache creation requests and SLAB/SLUB decides
to reuse the first kmem cache created for the next 4 requests, then this
count will be 5 and all 5 will point to the same kmem_cache object.
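To make the counting concrete, here is a minimal userspace sketch of the
alias_count behaviour described above. It is not the kernel code:
struct toy_cache and toy_cache_create() are hypothetical names, and
matching on object size alone is an assumption standing in for the real
SLAB/SLUB mergeability checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kmem_cache; not kernel code. */
struct toy_cache {
	const char *name;
	size_t object_size;
	int alias_count;	/* number of create requests sharing this cache */
};

#define TOY_MAX_CACHES 8
static struct toy_cache toy_caches[TOY_MAX_CACHES];
static int toy_nr_caches;

/*
 * Hypothetical create helper. Reuses an existing cache with the same
 * object size (an assumed, much simplified merge rule) and bumps its
 * alias_count; otherwise sets up a new cache with alias_count == 1.
 */
static struct toy_cache *toy_cache_create(const char *name, size_t size)
{
	struct toy_cache *s;
	int i;

	for (i = 0; i < toy_nr_caches; i++) {
		if (toy_caches[i].object_size == size) {
			toy_caches[i].alias_count++;
			return &toy_caches[i];
		}
	}

	assert(toy_nr_caches < TOY_MAX_CACHES);
	s = &toy_caches[toy_nr_caches++];
	s->name = name;
	s->object_size = size;
	s->alias_count = 1;
	return s;
}
```

With this sketch, five create requests for the same object size all
return the same pointer and leave alias_count at 5, while a request for
a different size gets its own cache.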
Before this patch, alias_count (previously named refcount) was modified
only under slab_mutex but could be read outside it. It conflated several
things: shared count, reference count and unmergeable flag (if
negative). This patch decouples the reference counting from this field,
and there is no need to protect alias_count with locks.

> >  	int object_size;
> >  	int align;
> >
> > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > index 3773e26c08c1..532d4b6f83ed 100644
> > --- a/include/linux/slub_def.h
> > +++ b/include/linux/slub_def.h
> > @@ -97,7 +97,8 @@ struct kmem_cache {
> >  	struct kmem_cache_order_objects max;
> >  	struct kmem_cache_order_objects min;
> >  	gfp_t allocflags;	/* gfp flags to use on each alloc */
> > -	int refcount;		/* Refcount for slab cache destroy */
> > +	refcount_t refcount;	/* Refcount for slab cache destroy */
> > +	int alias_count;	/* Number of root kmem caches merged */
>
> "merged" what with what in what manner?

"Shared" or "reused" might be better words here.

> >  	void (*ctor)(void *);
> >  	unsigned int inuse;		/* Offset to metadata */
> >  	unsigned int align;		/* Alignment */
> >
> > ...
> >
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -25,7 +25,8 @@ struct kmem_cache {
> >  	unsigned int useroffset;/* Usercopy region offset */
> >  	unsigned int usersize;	/* Usercopy region size */
> >  	const char *name;	/* Slab name for sysfs */
> > -	int refcount;		/* Use counter */
> > +	refcount_t refcount;	/* Use counter */
> > +	int alias_count;
>
> Semantic meaning/usage of alias_count?  Locking for it?

Will add in the next version.

> >  	void (*ctor)(void *);	/* Called on object slot creation */
> >  	struct list_head list;	/* List of all slab caches on the system */
> >  };
> >
> > ...
> >
> > +bool kmem_cache_tryget(struct kmem_cache *s)
> > +{
> > +	if (is_root_cache(s))
> > +		return refcount_inc_not_zero(&s->refcount);
> > +	return false;
> > +}
> > +
> > +void kmem_cache_put(struct kmem_cache *s)
> > +{
> > +	if (is_root_cache(s) &&
> > +	    refcount_dec_and_test(&s->refcount))
> > +		__kmem_cache_destroy(s, true);
> > +}
> > +
> > +void kmem_cache_put_locked(struct kmem_cache *s)
> > +{
> > +	if (is_root_cache(s) &&
> > +	    refcount_dec_and_test(&s->refcount))
> > +		__kmem_cache_destroy(s, false);
> > +}
>
> Some covering documentation for the above would be useful.  Why do they
> exist, why do they only operate on the root cache?  etc.

Ack.
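As a rough illustration of the tryget/put pattern quoted above, here is
a userspace sketch using C11 atomics in place of the kernel's
refcount_t. Everything in it is an assumption for illustration:
struct ref_cache, ref_cache_init(), ref_cache_tryget() and
ref_cache_put() are hypothetical names, the is_root_cache() check is
left out, the initial count of 1 for the creator is assumed, and the
"destroyed" flag merely stands in for __kmem_cache_destroy().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a root kmem_cache; not kernel code. */
struct ref_cache {
	atomic_int refcount;	/* lifetime references; 0 means going away */
	bool destroyed;		/* stands in for the real teardown */
};

static void ref_cache_init(struct ref_cache *s)
{
	atomic_init(&s->refcount, 1);	/* assumed: creator holds one reference */
	s->destroyed = false;
}

/* Take a reference unless the count has already dropped to zero. */
static bool ref_cache_tryget(struct ref_cache *s)
{
	int old = atomic_load(&s->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one tears the cache down. */
static void ref_cache_put(struct ref_cache *s)
{
	int old = atomic_fetch_sub(&s->refcount, 1);

	assert(old > 0);	/* catch unbalanced puts */
	if (old == 1)
		s->destroyed = true;
}
```

The point of the pattern is that an asynchronous worker which got a
reference via tryget keeps the cache alive even if the destroy path
drops the creator's reference first, and a tryget that arrives after the
last put fails instead of touching a dying cache.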