From: Roman Gushchin
To: Dennis Zhou
CC: Tejun Heo, Christoph Lameter, Andrew Morton, Roman Gushchin
Subject: [PATCH v2 0/5] percpu: partial chunk depopulation
Date: Wed, 7 Apr 2021 11:26:13 -0700
Message-ID: <20210407182618.2728388-1-guro@fb.com>

In our production experience the percpu memory allocator sometimes
struggles to return memory to the system. A typical example is the
creation of several thousand memory cgroups (each of which uses several
chunks of percpu data for vmstats, vmevents, ref counters, etc.).
Deleting and completely releasing these cgroups doesn't always shrink
the percpu memory, so sometimes several GB of memory are wasted.

The underlying problem is fragmentation: to release a chunk, all percpu
allocations in it must be released first. The percpu allocator tends to
top up existing chunks to improve utilization, which means new small-ish
allocations (e.g. percpu ref counters) are placed into almost-full older
chunks, effectively pinning them in memory.

This patchset addresses the problem by implementing partial depopulation
of percpu chunks: chunks with many empty pages are asynchronously
depopulated and the pages are returned to the system.
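To make the idea more concrete, below is a minimal userspace model of the
depopulation scan. It is not the kernel code from these patches (and it is
synchronous rather than asynchronous): chunks track how many of their
populated pages are empty, and a scan over a candidate list sidelines
mostly-empty chunks and returns those pages. The structure fields, the 50%
threshold and the list handling are illustrative assumptions only.

--
/* build: cc -std=c99 -o depopulate_model depopulate_model.c */
#include <stdio.h>

/* Toy stand-in for a percpu chunk; fields are illustrative, not the kernel's. */
struct chunk {
	int nr_pages;		/* pages backing this chunk */
	int nr_empty_pop_pages;	/* populated pages with no live allocations */
	int isolated;		/* sidelined from the allocation path */
	struct chunk *next;
};

/* Assumed heuristic: depopulate when at least half of the pages are empty. */
static int should_depopulate(const struct chunk *c)
{
	return c->nr_empty_pop_pages * 2 >= c->nr_pages;
}

/* Walk a candidate list, sideline matching chunks, "return" their pages. */
static int scan_and_depopulate(struct chunk *list)
{
	int released = 0;

	for (struct chunk *c = list; c; c = c->next) {
		if (!should_depopulate(c))
			continue;
		c->isolated = 1;
		released += c->nr_empty_pop_pages;
		c->nr_empty_pop_pages = 0;	/* pages handed back to the system */
	}
	return released;
}

int main(void)
{
	struct chunk busy = { .nr_pages = 8, .nr_empty_pop_pages = 1 };
	struct chunk sparse = { .nr_pages = 8, .nr_empty_pop_pages = 6, .next = &busy };

	printf("released %d page(s)\n", scan_and_depopulate(&sparse));
	return 0;
}
--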
To illustrate the problem, the following script can be used:

--
#!/bin/bash

cd /sys/fs/cgroup

mkdir percpu_test
echo "+memory" > percpu_test/cgroup.subtree_control

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
    mkdir percpu_test/cg_"${i}"
    for j in `seq 1 10`; do
        mkdir percpu_test/cg_"${i}"_"${j}"
    done
done

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
    for j in `seq 1 10`; do
        rmdir percpu_test/cg_"${i}"_"${j}"
    done
done

sleep 10

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
    rmdir percpu_test/cg_"${i}"
done

rmdir percpu_test
--

The script creates 11000 memory cgroups and removes 10 out of every 11.
It prints the size of the percpu memory initially, after creating all
the cgroups, and after deleting most of them.

Results:

vanilla:

  ./percpu_test.sh
  Percpu: 7488 kB
  Percpu: 481152 kB
  Percpu: 481152 kB

with this patchset applied:

  ./percpu_test.sh
  Percpu: 7488 kB
  Percpu: 481408 kB
  Percpu: 135552 kB

So the total size of the percpu memory was reduced by more than 3.5 times.

v2:
 - depopulated chunks are sidelined
 - depopulation happens in the reverse order
 - depopulate list made per-chunk type
 - better results due to better heuristics

v1:
 - depopulation heuristics changed and optimized
 - chunks are put into a separate list, depopulation scans this list
 - chunk->isolated is introduced, chunk->depopulate is dropped
 - rearranged patches a bit
 - fixed a panic discovered by the kernel test robot
 - made pcpu_nr_empty_pop_pages per chunk type
 - minor fixes

rfc:
 https://lwn.net/Articles/850508/

Roman Gushchin (5):
  percpu: fix a comment about the chunks ordering
  percpu: split __pcpu_balance_workfn()
  percpu: make pcpu_nr_empty_pop_pages per chunk type
  percpu: generalize pcpu_balance_populated()
  percpu: implement partial chunk depopulation

 mm/percpu-internal.h |   4 +-
 mm/percpu-stats.c    |   9 +-
 mm/percpu.c          | 282 ++++++++++++++++++++++++++++++++++++-------
 3 files changed, 246 insertions(+), 49 deletions(-)

-- 
2.30.2