From: Shakeel Butt
Date: Mon, 26 Mar 2018 10:16:37 -0700
Subject: Re: [PATCH] mm, slab: eagerly delete inactive offlined SLABs
To: Vladimir Davydov, Tejun Heo, Johannes Weiner
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Andrew Morton, Greg Thelen, Linux MM, LKML
In-Reply-To: <20180324131131.blg3eqsfjc6issp2@esperanza>
References: <20180321224301.142879-1-shakeelb@google.com>
    <20180324131131.blg3eqsfjc6issp2@esperanza>

+Tejun, Johannes

Hi Vladimir,

On Sat, Mar 24, 2018 at 6:11 AM, Vladimir Davydov wrote:
> Hello Shakeel,
>
> The patch makes sense to me, but I have a concern about synchronization
> of cache destruction vs concurrent kmem_cache_free. Please, see my
> comments inline.
>
> On Wed, Mar 21, 2018 at 03:43:01PM -0700, Shakeel Butt wrote:
>> With kmem cgroup support, high memcg churn can leave behind a lot of
>> empty kmem_caches. Usually such kmem_caches will be destroyed when the
>> corresponding memcg gets released, but the memcg release can be
>> arbitrarily delayed. These empty kmem_caches waste cache_reaper's time.
>> So, the reaper should destroy such empty offlined kmem_caches.
>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 66f2db98f026..9c174a799ffb 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -4004,6 +4004,16 @@ static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n,
>>  	slabs_destroy(cachep, &list);
>>  }
>>
>> +static bool is_slab_active(struct kmem_cache *cachep)
>> +{
>> +	int node;
>> +	struct kmem_cache_node *n;
>> +
>> +	for_each_kmem_cache_node(cachep, node, n)
>> +		if (READ_ONCE(n->total_slabs) - n->free_slabs)
>
> Why READ_ONCE total_slabs, but not free_slabs?
>
> Anyway, AFAIU there's no guarantee that this CPU sees the two fields
> updated in the same order as they were actually updated on another CPU.
> For example, suppose total_slabs is 2 and free_slabs is 1, and another
> CPU is freeing a slab page concurrently from kmem_cache_free, i.e.
> subtracting 1 from both total_slabs and free_slabs. Then this CPU might
> see a transient state, when total_slabs is already updated (set to 1),
> but free_slabs is not (still equals 1), and decide that it's safe to
> destroy this slab cache while in fact it isn't.
>
> Such a race will probably not result in any serious problems, because
> shutdown_cache() checks that the cache is empty and does nothing if it
> isn't, but still it looks suspicious and at least deserves a comment.
> To eliminate the race, we should check total_slabs vs free_slabs with
> kmem_cache_node->list_lock held. Alternatively, I think we could just
> check if total_slabs is 0 - sooner or later cache_reap() will release
> all empty slabs anyway.
>

Checking total_slabs == 0 seems much simpler; I will test that.

>> +			return true;
>> +	return false;
>> +}
>
>> @@ -4061,6 +4071,10 @@ static void cache_reap(struct work_struct *w)
>>  				5 * searchp->num - 1) / (5 * searchp->num));
>>  		STATS_ADD_REAPED(searchp, freed);
>>  	}
>> +
>> +	/* Eagerly delete inactive kmem_cache of an offlined memcg. */
>> +	if (!is_memcg_online(searchp) && !is_slab_active(searchp))
>
> I don't think we need to define is_memcg_online in generic code.
> I would merge is_memcg_online and is_slab_active, and call the
> resulting function cache_is_active.
>

Ack (a rough sketch of the merged helper is at the end of this mail).

>> +		shutdown_cache(searchp);
>>  next:
>>  		cond_resched();
>>  	}

Currently I am holding off this patch, because Greg Thelen has pointed
out (offline) a race condition it would introduce between
memcg_kmem_get_cache() and the cache reaper. The memcg of the cache
returned by memcg_kmem_get_cache() can be offlined while an allocation
is still in progress on that cache (the allocation can take a long time
due to reclaim or memory pressure). The reaper would see that the memcg
of this cache is offlined and, if s->total_slabs happens to be 0 at that
moment, delete the cache while the parallel allocation is still going
on.

I was thinking of adding an API to force a memcg to stay online (or
rather to delay the call to css_offline), something like
css_tryget_stay_online()/css_put_online(), and using it in
memcg_kmem_get_cache() and memcg_kmem_put_cache(). However, Tejun has
advised against going that route, more specifically against tying the
on/offlining of a css to accounting artifacts. A rough sketch of the
intended usage follows.
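This is a purely hypothetical sketch: css_tryget_stay_online(),
css_put_online() and cache_from_memcg() are made-up names, and the real
memcg_kmem_get_cache() does more than shown here. It only illustrates
where the pin/unpin would have to sit.

/*
 * Hypothetical sketch only -- css_tryget_stay_online() and
 * css_put_online() do not exist. The idea was to pin the memcg's css
 * online (i.e. delay css_offline) for the duration of an allocation,
 * so the reaper could never see an offlined memcg whose cache still
 * has an allocation in flight.
 */
struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
{
	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(current->mm);

	/* If the css is already going offline, fall back to the root cache. */
	if (!css_tryget_stay_online(&memcg->css))
		return cachep;

	return cache_from_memcg(cachep, memcg);	/* per-memcg cache lookup */
}

void memcg_kmem_put_cache(struct kmem_cache *cachep)
{
	/* Drop the pin; the css may now proceed to css_offline. */
	if (!is_root_cache(cachep))
		css_put_online(&cachep->memcg_params.memcg->css);
}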
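And coming back to your cache_is_active() suggestion above, here is
roughly the merged helper I intend to test, with the total_slabs-only
check. The memcg-online test from the original patch is shown still as
is_memcg_online() for brevity; per your comment, it would be folded in
here rather than defined as a generic helper.

static bool cache_is_active(struct kmem_cache *cachep)
{
	int node;
	struct kmem_cache_node *n;

	/* Caches of online (or root) memcgs must not be destroyed. */
	if (is_memcg_online(cachep))
		return true;

	/*
	 * The memcg is offline: treat the cache as active only while
	 * some node still holds slabs. Checking total_slabs alone
	 * avoids the total_slabs/free_slabs ordering race; cache_reap()
	 * will drain the remaining free slabs sooner or later anyway.
	 */
	for_each_kmem_cache_node(cachep, node, n)
		if (READ_ONCE(n->total_slabs))
			return true;
	return false;
}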
I am still exploring more solutions.

thanks,
Shakeel