Received: by 2002:a05:6a10:a852:0:0:0:0 with SMTP id d18csp686459pxy; Wed, 5 May 2021 11:13:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxqeiYDIHJ4o/+GTJj6QHAVzYh0JgUD0/W7KD/oVHTjrwHcu8tyJ1Rkee/lofQxAkXw+LsF X-Received: by 2002:aa7:de8b:: with SMTP id j11mr316543edv.363.1620238402352; Wed, 05 May 2021 11:13:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620238402; cv=none; d=google.com; s=arc-20160816; b=a+BHm7yfb2kY3tPWTbg+jQQIIgksErH5XfsW2/A4XnzlQVhzO4KQRZkDSovPE0cJR9 11VtjYkfTQycgJX0ZU4UcdemKs9YtC6GC/gbxDWMpo7AndAqQXXHZn/0Khu0UFSYVL/W vbvZgZme9RQJrqQnFjf21U1+pLH/uD4Vf8quuHJYQZykH9KloZl52GhKs7YMVvBmL9vG 1YZnNdZhfdSo7poJQOS/7MAjRN2PyhIUmsJl4DMwmSA+CkjA0p0bbTFqwEklZWzYTqmW 42g6H7vZA5JBDuF5mYTIVTzmfELxqpaE9Yd2x3zCh3ZkeDJSybfc/bNZYdDwoDZuqkju A3xQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:references:cc :to:subject:from:dkim-signature; bh=nw/NXWJf7YefMV4xFftmS9WnDS1NhKlaE8hzMyBIij0=; b=TexheSnj9xALOLbXG3fuuYcgf8fJfWiyDnnDJcdrUBsJGQkVuUJ8pBmuIWXXER4pHk vIukPHLsJm4M2xiW0J6WzGTNCir6k082IIedNBktgtUNWehBDcEzfqvZKZ3ntBF0/85g CaRpQK0frSmLfYrADKnTNfwGbLzMdRVN+p9r68jtGxYzgf2ZXyK5tLPFnGykFohwkxzA 1FFfjFaqwrdRliidhNkAifOcC7aAkmPp85yUTrc8CkP90g5UuwW4SOyBwm8VSma15U44 2jyCpZ8EvgGZhkUA5jH2qKApNViU1ckIaadDdyr4yGBfYVpjYAm09yw9NFHyhBBxuM+s sQYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Ja9IWhvq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id i19si82933ejx.231.2021.05.05.11.12.56; Wed, 05 May 2021 11:13:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Ja9IWhvq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234165AbhEESMy (ORCPT + 99 others); Wed, 5 May 2021 14:12:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:44357 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233946AbhEESMy (ORCPT ); Wed, 5 May 2021 14:12:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1620238316; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nw/NXWJf7YefMV4xFftmS9WnDS1NhKlaE8hzMyBIij0=; b=Ja9IWhvqR+R7g0HFM0Rot7+v4B9/Vnr+Q3gJqLhjodJ6G8kIhen2LvT4GL0QI2hdSp8Rrr fTmQrz1DXlygp4CVYbJp5hPze+BqWhQXsAvahBVgX3FGTexniw3YMBN5q6/dxsS9n9N+lN 1FahD9MjAUzeAmYwtyahpSYNZVvP8tg= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-160-W0O7LhssOgOuhnUW3CZB3g-1; Wed, 05 May 2021 14:11:55 -0400 X-MC-Unique: W0O7LhssOgOuhnUW3CZB3g-1 Received: by mail-qv1-f71.google.com with SMTP id d11-20020a0cdb0b0000b02901c0da4391d5so2281445qvk.12 for ; Wed, 05 May 2021 11:11:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=nw/NXWJf7YefMV4xFftmS9WnDS1NhKlaE8hzMyBIij0=; b=su2XyEKxSJL41aM4xliAJ0q9kaGCvAWAdSdFD9/+rwE784uZZ3x+pSE+8eqPDf8loO Y9MWK8leCYp2WPB1OwYryfb4d3ahyF7Isi3eiUSiUEAyy7lYWi8crhUrHj+E42iQ/cx/ pwqpTcLV+cTvcne63NPCuUJCH89YAxK0yQXT3sGDH8xmL/x3l55vQhfBW5HZLOTlXKHY KnlO9pkWTCLr21nAXdkJmIHPJ9qCiHKvK1GcULh8YxkEzerh+BAUyUU1m4Ntynehy83X i7HgSr6gkqvneteZhiTkLArZWFR8pSzrJJ/rfgRic52+Zc9SgInxAvlqKXJum8pFmVXv y5+Q== X-Gm-Message-State: AOAM530BoonFvHdJRTgYwIz0mVNgaBYfmIdFVB5/mGqzaeEtjjAL+Tyu bQCetA5hdvASNxfZIny99h8COHOW0qySyPPIEh5ps6wb6whu+kbx4bBzm4s6mLoCa3yHm4oehnG kKokdxw97sIXfnOf9H/lj1HvM X-Received: by 2002:a37:de14:: with SMTP id h20mr53708qkj.34.1620238314917; Wed, 05 May 2021 11:11:54 -0700 (PDT) X-Received: by 2002:a37:de14:: with SMTP id h20mr53688qkj.34.1620238314589; Wed, 05 May 2021 11:11:54 -0700 (PDT) Received: from llong.remote.csb ([2601:191:8500:76c0::cdbc]) by smtp.gmail.com with ESMTPSA id z4sm5539059qtq.34.2021.05.05.11.11.53 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 05 May 2021 11:11:54 -0700 (PDT) From: Waiman Long X-Google-Original-From: Waiman Long Subject: Re: [PATCH v3 2/2] mm: memcg/slab: Create a new set of kmalloc-cg- caches To: Roman Gushchin Cc: Johannes Weiner , Michal Hocko , Vladimir Davydov , Andrew Morton , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Vlastimil Babka , Shakeel Butt , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org References: <20210505154613.17214-1-longman@redhat.com> <20210505154613.17214-3-longman@redhat.com> Message-ID: Date: Wed, 5 May 2021 14:11:52 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/5/21 1:30 PM, Roman Gushchin wrote: > On Wed, May 05, 2021 at 11:46:13AM -0400, Waiman Long wrote: >> There are currently two problems in the way the objcg pointer array >> (memcg_data) in the page structure is being allocated and freed. >> >> On its allocation, it is possible that the allocated objcg pointer >> array comes from the same slab that requires memory accounting. If this >> happens, the slab will never become empty again as there is at least >> one object left (the obj_cgroup array) in the slab. >> >> When it is freed, the objcg pointer array object may be the last one >> in its slab and hence causes kfree() to be called again. With the >> right workload, the slab cache may be set up in a way that allows the >> recursive kfree() calling loop to nest deep enough to cause a kernel >> stack overflow and panic the system. >> >> One way to solve this problem is to split the kmalloc- caches >> (KMALLOC_NORMAL) into two separate sets - a new set of kmalloc- >> (KMALLOC_NORMAL) caches for non-accounted objects only and a new set of >> kmalloc-cg- (KMALLOC_CGROUP) caches for accounted objects only. All >> the other caches can still allow a mix of accounted and non-accounted >> objects. > I agree that it's likely the best approach here. Thanks for discovering > and fixing the problem! > >> With this change, all the objcg pointer array objects will come from >> KMALLOC_NORMAL caches which won't have their objcg pointer arrays. So >> both the recursive kfree() problem and non-freeable slab problem are >> gone. Since both the KMALLOC_NORMAL and KMALLOC_CGROUP caches no longer >> have mixed accounted and unaccounted objects, this will slightly reduce >> the number of objcg pointer arrays that need to be allocated and save >> a bit of memory. > Unfortunately the positive effect of this change will be likely > reversed by a lower utilization due to a larger number of caches. That is also true, will mention that. > > Btw, I wonder if we also need a change in the slab caches merging procedure? > KMALLOC_NORMAL caches should not be merged with caches which can potentially > include accounted objects. Thank for catching this omission. I will take a look and modify the merging procedure in a new patch. Accounting is usually specified at kmem_cache_create() time. Though, I did find one instance of setting ACCOUNT flag in kmem_cache_alloc(), I will ignore this case and merge accounted, but unreclaimable caches to KMALLOC_CGROUP. > >> The new KMALLOC_CGROUP is added between KMALLOC_NORMAL and >> KMALLOC_RECLAIM so that the first for loop in create_kmalloc_caches() >> will include the newly added caches without change. >> >> Suggested-by: Vlastimil Babka >> Signed-off-by: Waiman Long >> --- >> include/linux/slab.h | 42 ++++++++++++++++++++++++++++++++++-------- >> mm/slab_common.c | 23 +++++++++++++++-------- >> 2 files changed, 49 insertions(+), 16 deletions(-) >> >> diff --git a/include/linux/slab.h b/include/linux/slab.h >> index 0c97d788762c..f2d9ebc34f5c 100644 >> --- a/include/linux/slab.h >> +++ b/include/linux/slab.h >> @@ -305,9 +305,16 @@ static inline void __check_heap_object(const void *ptr, unsigned long n, >> /* >> * Whenever changing this, take care of that kmalloc_type() and >> * create_kmalloc_caches() still work as intended. >> + * >> + * KMALLOC_NORMAL is for non-accounted objects only whereas KMALLOC_CGROUP >> + * is for accounted objects only. All the other kmem caches can have both >> + * accounted and non-accounted objects. >> */ >> enum kmalloc_cache_type { >> KMALLOC_NORMAL = 0, >> +#ifdef CONFIG_MEMCG_KMEM >> + KMALLOC_CGROUP, >> +#endif >> KMALLOC_RECLAIM, >> #ifdef CONFIG_ZONE_DMA >> KMALLOC_DMA, >> @@ -315,28 +322,47 @@ enum kmalloc_cache_type { >> NR_KMALLOC_TYPES >> }; >> >> +#ifndef CONFIG_MEMCG_KMEM >> +#define KMALLOC_CGROUP KMALLOC_NORMAL >> +#endif >> +#ifndef CONFIG_ZONE_DMA >> +#define KMALLOC_DMA KMALLOC_NORMAL >> +#endif >> + >> #ifndef CONFIG_SLOB >> extern struct kmem_cache * >> kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; >> >> +/* >> + * Define gfp bits that should not be set for KMALLOC_NORMAL. >> + */ >> +#define KMALLOC_NOT_NORMAL_BITS \ >> + (__GFP_RECLAIMABLE | \ >> + (IS_ENABLED(CONFIG_ZONE_DMA) ? __GFP_DMA : 0) | \ >> + (IS_ENABLED(CONFIG_MEMCG_KMEM) ? __GFP_ACCOUNT : 0)) >> + >> static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) >> { >> -#ifdef CONFIG_ZONE_DMA >> /* >> * The most common case is KMALLOC_NORMAL, so test for it >> * with a single branch for both flags. >> */ >> - if (likely((flags & (__GFP_DMA | __GFP_RECLAIMABLE)) == 0)) >> + if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0)) >> return KMALLOC_NORMAL; > Likely KMALLOC_CGROUP is also very popular, so maybe we want to change the > optimization here a bit. I doubt this optimization is really noticeable and whether KMALLOC_CGROUP is really popular will depend on the workloads. I am not planning to spend additional time to micro-optimize this part of the code. Cheers, Longman