From: Shakeel Butt
Date: Tue, 16 Jun 2020 18:46:56 -0700
Subject: Re: [PATCH v6 00/19] The new cgroup slab memory controller
To: Roman Gushchin
Cc: Andrew Morton, Christoph Lameter, Johannes Weiner, Michal Hocko,
    Linux MM, Vlastimil Babka, Kernel Team, LKML
In-Reply-To: <20200608230654.828134-1-guro@fb.com>
References: <20200608230654.828134-1-guro@fb.com>

On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin wrote:
>
> This is v6 of the slab cgroup controller rework.
>
> The patchset moves the accounting from the page level to the object
> level. It allows slab pages to be shared between memory cgroups.
> This leads to a significant win in slab utilization (up to 45%)
> and a corresponding drop in the total kernel memory footprint.

Is this based on just SLUB, or does this have a similar impact on SLAB
as well?

> The reduced number of unmovable slab pages should also have a positive
> effect on memory fragmentation.

That would be awesome. We have seen fragmentation get very bad under
system (or node) level memory pressure. Is that the same for you?

>
> The patchset makes the slab accounting code simpler: there is no more
> need for the complicated dynamic creation and destruction of per-cgroup
> slab caches; all memory cgroups use a global set of shared slab caches.
> The lifetime of slab caches is no longer tied to the lifetime of
> memory cgroups.
>
> The more precise accounting does require more CPU, but in practice
> the difference seems to be negligible. We've been using the new slab
> controller in Facebook production for several months with different
> workloads and haven't seen any noticeable regressions. What we've seen
> were memory savings on the order of 1 GB per host (it varied heavily
> depending on the actual workload, size of RAM, number of CPUs, memory
> pressure, etc.).
>
> The third version of the patchset added yet another step towards
> the simplification of the code: sharing of slab caches between
> accounted and non-accounted allocations. It comes with significant
> upsides (most noticeably, a complete elimination of dynamic slab
> cache creation) but not without some regression risk, so this change
> sits on top of the patchset and is not completely merged in. So in
> the unlikely event of a noticeable performance regression it can be
> reverted separately.

Have you performed any [perf] testing on SLAB with this patchset?
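Also, to check my understanding of the per-object scheme: each slab
page carries a vector of obj_cgroup pointers, one slot per object, and
an object is mapped to its slot by its index within the page. A minimal
sketch of that indexing (the struct and helper names below are mine,
for illustration, not the patchset's actual code):

    /* Forward declaration; the real type lives in memcontrol. */
    struct obj_cgroup;

    /* Illustrative stand-in for the per-slab-page metadata. */
    struct slab_page_meta {
            void *base;                      /* address of the first object */
            unsigned int obj_size;           /* size of one object in bytes */
            struct obj_cgroup **obj_cgroups; /* one pointer per object */
    };

    /* Map an object pointer to its index within the slab page. */
    static inline unsigned int obj_to_index(const struct slab_page_meta *s,
                                            const void *obj)
    {
            return ((const char *)obj - (const char *)s->base) / s->obj_size;
    }

    /* Record which cgroup owns this particular object. */
    static inline void set_obj_cgroup(struct slab_page_meta *s, void *obj,
                                      struct obj_cgroup *objcg)
    {
            s->obj_cgroups[obj_to_index(s, obj)] = objcg;
    }

If that's right, objects belonging to different cgroups can indeed
coexist on the same page, which is where the utilization win comes from.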
> v6:
> 1) rebased on top of the mm tree
> 2) removed a redundant check from cache_from_obj(), suggested by Vlastimil
>
> v5:
> 1) fixed a build error, spotted by Vlastimil
> 2) added a comment about memcg->nr_charged_bytes, asked by Johannes
> 3) added missed acks and reviews
>
> v4:
> 1) rebased on top of the mm tree, some fixes here and there
> 2) merged obj_to_index() with slab_index(), suggested by Vlastimil
> 3) changed objects_per_slab() to a better objects_per_slab_page(),
>    suggested by Vlastimil
> 4) other minor fixes and changes
>
> v3:
> 1) added a patch that switches to a global single set of kmem_caches
> 2) kmem API clean up dropped, because it has already been merged
> 3) byte-sized slab vmstat API over page-sized global counters and
>    byte-sized memcg/lruvec counters
> 4) obj_cgroup refcounting simplifications and other minor fixes
> 5) other minor changes
>
> v2:
> 1) implemented the re-layering and renaming suggested by Johannes,
>    added his patch to the set. Thanks!
> 2) fixed the issue discovered by Bharata B Rao. Thanks!
> 3) added the kmem API clean up part
> 4) added the slab/memcg follow-up clean up part
> 5) fixed a couple of issues discovered by internal testing on the FB fleet
> 6) added kselftests
> 7) included metadata in the charge calculation
> 8) refreshed commit logs, regrouped patches, rebased onto the mm tree, etc.
>
> v1:
> 1) fixed a bug in zoneinfo_show_print()
> 2) added some comments to the subpage charging API, a minor fix
> 3) separated the memory.kmem.slabinfo deprecation into a separate patch,
>    provided a drgn-based replacement
> 4) rebased on top of the current mm tree
>
> RFC:
> https://lwn.net/Articles/798605/
>
>
> Johannes Weiner (1):
>   mm: memcontrol: decouple reference counting from page accounting
>
> Roman Gushchin (18):
>   mm: memcg: factor out memcg- and lruvec-level changes out of
>     __mod_lruvec_state()
>   mm: memcg: prepare for byte-sized vmstat items
>   mm: memcg: convert vmstat slab counters to bytes
>   mm: slub: implement SLUB version of obj_to_index()
>   mm: memcg/slab: obj_cgroup API
>   mm: memcg/slab: allocate obj_cgroups for non-root slab pages
>   mm: memcg/slab: save obj_cgroup for non-root slab objects
>   mm: memcg/slab: charge individual slab objects instead of pages
>   mm: memcg/slab: deprecate memory.kmem.slabinfo
>   mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h
>   mm: memcg/slab: use a single set of kmem_caches for all accounted
>     allocations
>   mm: memcg/slab: simplify memcg cache creation
>   mm: memcg/slab: remove memcg_kmem_get_cache()
>   mm: memcg/slab: deprecate slab_root_caches
>   mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo()
>   mm: memcg/slab: use a single set of kmem_caches for all allocations
>   kselftests: cgroup: add kernel memory accounting tests
>   tools/cgroup: add memcg_slabinfo.py tool
>
>  drivers/base/node.c                        |   6 +-
>  fs/proc/meminfo.c                          |   4 +-
>  include/linux/memcontrol.h                 |  85 ++-
>  include/linux/mm_types.h                   |   5 +-
>  include/linux/mmzone.h                     |  24 +-
>  include/linux/slab.h                       |   5 -
>  include/linux/slab_def.h                   |   9 +-
>  include/linux/slub_def.h                   |  31 +-
>  include/linux/vmstat.h                     |  14 +-
>  kernel/power/snapshot.c                    |   2 +-
>  mm/memcontrol.c                            | 608 +++++++++++--------
>  mm/oom_kill.c                              |   2 +-
>  mm/page_alloc.c                            |   8 +-
>  mm/slab.c                                  |  70 +--
>  mm/slab.h                                  | 372 +++++-------
>  mm/slab_common.c                           | 643 +--------------------
>  mm/slob.c                                  |  12 +-
>  mm/slub.c                                  | 229 +-------
>  mm/vmscan.c                                |   3 +-
>  mm/vmstat.c                                |  30 +-
>  mm/workingset.c                            |   6 +-
>  tools/cgroup/memcg_slabinfo.py             | 226 ++++++++
>  tools/testing/selftests/cgroup/.gitignore  |   1 +
>  tools/testing/selftests/cgroup/Makefile    |   2 +
>  tools/testing/selftests/cgroup/test_kmem.c | 382 ++++++++++++
>  25 files changed, 1374 insertions(+), 1405 deletions(-)
>  create mode 100755 tools/cgroup/memcg_slabinfo.py
>  create mode 100644 tools/testing/selftests/cgroup/test_kmem.c
>
> --
> 2.25.4
>
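One more question on the charging path, prompted by the
memcg->nr_charged_bytes comment mentioned in v5: am I right that
per-object charges are batched, so the page counter is still only
charged in whole pages, with the leftover bytes kept as a per-cgroup
remainder? Roughly like the following sketch (all names and helpers
here are illustrative, not the patchset's code):

    #include <stddef.h>

    #define SKETCH_PAGE_SIZE 4096UL

    struct objcg_sketch {
            /* bytes charged to the page counter but not yet
             * handed out to any object */
            unsigned long nr_charged_bytes;
    };

    /* Stand-in for charging whole pages to the memcg page counter. */
    static int charge_pages(unsigned long nr_pages)
    {
            /* something page_counter_try_charge()-like goes here */
            return 0;
    }

    /* Charge 'size' bytes for one object, batching page-sized chunks. */
    static int charge_object(struct objcg_sketch *objcg, size_t size)
    {
            if (objcg->nr_charged_bytes >= size) {
                    /* The remainder covers it; no page counter traffic. */
                    objcg->nr_charged_bytes -= size;
                    return 0;
            }

            /* Round the shortfall up to whole pages and charge those. */
            size_t need = size - objcg->nr_charged_bytes;
            unsigned long nr_pages =
                    (need + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

            if (charge_pages(nr_pages))
                    return -1;

            /* Whatever the new pages don't cover becomes the remainder. */
            objcg->nr_charged_bytes = nr_pages * SKETCH_PAGE_SIZE - need;
            return 0;
    }

If so, that would explain why the extra precision is nearly free in
the common case.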