From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org, kernel-team@fb.com
Cc: linux-kernel@vger.kernel.org, Tejun Heo, Rik van Riel,
    Johannes Weiner, Michal Hocko, Roman Gushchin
Subject: [PATCH 0/5] mm: reduce the memory footprint of dying memory cgroups
Date: Thu, 7 Mar 2019 15:00:28 -0800
Message-Id: <20190307230033.31975-1-guro@fb.com>

A cgroup can remain in the dying state for a long time, being pinned
in memory by any kernel object. It can be pinned by a page shared
with another cgroup (e.g. mlocked by a process in the other cgroup),
by a vfs cache object, etc.

Mostly because of percpu data, the size of the memcg structure in
kernel memory is quite large. Depending on the machine size and the
kernel config, it can easily reach hundreds of kilobytes per cgroup.

Depending on the memory pressure and the reclaim approach (which is a
separate topic), several hundred (if not a few thousand) dying
cgroups looks like a typical number. On a moderately sized machine
the overall memory footprint of dying cgroups is thus measured in
hundreds of megabytes.

So if we can't completely get rid of dying cgroups, let's make them
smaller. This patchset aims to reduce the size of a dying memory
cgroup by releasing its percpu data prematurely during cgroup removal
and switching to atomic counterparts instead. Currently it covers the
per-memcg vmstats_percpu and the per-memcg per-node lruvec_stat_cpu.
The same approach can be further applied to other percpu data.
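To illustrate the idea, here is a minimal single-threaded userspace
model. This is not the patchset's actual code: all names and the
NR_CPUS value are illustrative, and the real kernel implementation
must also synchronize the spill against concurrent updaters.

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 32

/* A counter that is percpu while the cgroup is online, atomic after. */
struct counter {
	long *percpu;        /* NR_CPUS slots while online, NULL once released */
	atomic_long atomic;  /* fallback used after the percpu data is gone */
};

static void counter_add(struct counter *c, int cpu, long delta)
{
	if (c->percpu)
		c->percpu[cpu] += delta;             /* fast, cache-local path */
	else
		atomic_fetch_add(&c->atomic, delta); /* slower, but tiny */
}

static long counter_read(struct counter *c)
{
	long sum = atomic_load(&c->atomic);

	if (c->percpu)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			sum += c->percpu[cpu];
	return sum;
}

/*
 * Called on cgroup offlining: spill the percpu sums into the atomic
 * counterpart and free the percpu data, shrinking the dying cgroup.
 * (The real kernel code must synchronize this against concurrent
 * updaters; this single-threaded model skips that.)
 */
static void counter_release_percpu(struct counter *c)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->percpu[cpu];
	free(c->percpu);
	c->percpu = NULL;
	atomic_fetch_add(&c->atomic, sum);
}

int main(void)
{
	struct counter c = { .percpu = calloc(NR_CPUS, sizeof(long)) };

	if (!c.percpu)
		return 1;
	counter_add(&c, 0, 5);              /* online: percpu updates */
	counter_add(&c, 1, 7);
	counter_release_percpu(&c);         /* cgroup dies: spill and free */
	counter_add(&c, 2, 1);              /* dying: atomic updates */
	printf("total = %ld\n", counter_read(&c)); /* prints 13 */
	return 0;
}

The trade-off is that a live cgroup keeps the fast, cache-local
percpu counters, while a dying one only pays for a single atomic
value per counter.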
Results on my test machine (32 CPUs, single node):

                        With the patchset:    Originally:

nr_dying_descendants 0
  Slab:                 66640 kB              67644 kB
  Percpu:                6912 kB               6912 kB

nr_dying_descendants 1000
  Slab:                 85912 kB              84704 kB
  Percpu:               26880 kB              64128 kB

So one dying cgroup went from ~75 kB to ~39 kB: by the deltas above,
1000 dying cgroups originally cost (64128 - 6912) + (84704 - 67644) =
74276 kB of Percpu and Slab, versus (26880 - 6912) + (85912 - 66640)
= 39240 kB with the patchset, i.e. almost half the size. The
difference will be even bigger on a larger machine (especially with
NUMA).

To test the patchset, I used the following script:

CG=/sys/fs/cgroup/percpu_test/

mkdir ${CG}
echo "+memory" > ${CG}/cgroup.subtree_control

cat ${CG}/cgroup.stat | grep nr_dying_descendants
cat /proc/meminfo | grep -e Percpu -e Slab

for i in `seq 1 1000`; do
    mkdir ${CG}/${i}
    echo $$ > ${CG}/${i}/cgroup.procs
    dd if=/dev/urandom of=/tmp/test-${i} count=1 2> /dev/null
    echo $$ > /sys/fs/cgroup/cgroup.procs
    rmdir ${CG}/${i}
done

cat /sys/fs/cgroup/cgroup.stat | grep nr_dying_descendants
cat /proc/meminfo | grep -e Percpu -e Slab

rmdir ${CG}

Roman Gushchin (5):
  mm: prepare to premature release of memcg->vmstats_percpu
  mm: prepare to premature release of per-node lruvec_stat_cpu
  mm: release memcg percpu data prematurely
  mm: release per-node memcg percpu data prematurely
  mm: spill memcg percpu stats and events before releasing

 include/linux/memcontrol.h |  66 ++++++++++----
 mm/memcontrol.c            | 173 +++++++++++++++++++++++++++++++++----
 2 files changed, 204 insertions(+), 35 deletions(-)

-- 
2.20.1