Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1254394ybl; Wed, 29 Jan 2020 18:44:21 -0800 (PST) X-Google-Smtp-Source: APXvYqwr+WypHV3AtjdIPz3qy9qyil87cH1mpR/4FpBl/XIQT7Cdrrxv3Hx9nWU6W2XxtriAhn4O X-Received: by 2002:aca:5ad5:: with SMTP id o204mr1456918oib.2.1580352261415; Wed, 29 Jan 2020 18:44:21 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1580352261; cv=pass; d=google.com; s=arc-20160816; b=SdBZ8R8cxJsnu/K33v2q9yLPO9dkaXn7V9L2sq8ZckAgMS1X0MgER80ZMP5m9uYrJ5 fNsSt/exyDaaiUmA10WXReN/jF2mBd4Q77nqRCi1gk8VrqGMbcrbAoDGfiNg0MifbQdz ouhIpdNPMfVOdtbvpRX4bs0GJY/fnRGfBL5GxBALvN46TwdaHudbVwKe9NDvxQubkiQb wS7iGdKZ9v0Z7gHsXF05v/zTmtaX8jVfuJo/iZz5BblVZWZmHLLs21yL6wZXbB3JhyO6 diBdZjibJHNtxNH7zCuGYuA4/HWds6dzbij4fTIkbN6ZnaHZkDY1LiKtBkXxyjiNf2Ta 5MJg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-id:content-language:accept-language:in-reply-to:references :message-id:date:thread-index:thread-topic:subject:cc:to:from :dkim-signature:dkim-signature; bh=TDktIcI5o9ZCItUIhhVsmJxLHnhN2ITyMd6+sSXbToA=; b=MqEcQuC0DOuAioAb41GlNbbn789rsIr/XZvrh5h/DWhxcXjEjpqBucF39/3ILXa4RI xKaX+ZcIjxP1Lznf+fhl875bQyPFCVAHRXoe80aLPyZveSJYKDVluc1bv43tsiNlMhum oDRifUv83jfyzOc22lzfCAP3uyjH3oZRz7f/UcvsdYp6eN+8y53CUXa2hH87HBWUkv3T h9SD4O+Q47P3gdRzpNGTRUvQaXuLBs/LeF99eQvcTk/TOsTm3CvTGZIefw4Pz6lPpLOF r7l1Ip9mGTjzsOy0Gvke7BC03QhciGoc17CKsrpqk42zplf/IE8+dKBUF9Yacl5P69Hw bNsQ== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@fb.com header.s=facebook header.b=prEKWpAM; dkim=pass header.i=@fb.onmicrosoft.com header.s=selector2-fb-onmicrosoft-com header.b=LQMuE0Ov; arc=pass (i=1 spf=pass spfdomain=fb.com dkim=pass dkdomain=fb.com dmarc=pass fromdomain=fb.com); spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=fb.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r22si2050738oic.93.2020.01.29.18.44.03; Wed, 29 Jan 2020 18:44:21 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@fb.com header.s=facebook header.b=prEKWpAM; dkim=pass header.i=@fb.onmicrosoft.com header.s=selector2-fb-onmicrosoft-com header.b=LQMuE0Ov; arc=pass (i=1 spf=pass spfdomain=fb.com dkim=pass dkdomain=fb.com dmarc=pass fromdomain=fb.com); spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=fb.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726818AbgA3Clz (ORCPT + 99 others); Wed, 29 Jan 2020 21:41:55 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:35954 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726719AbgA3Clz (ORCPT ); Wed, 29 Jan 2020 21:41:55 -0500 Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 00U2au16013744; Wed, 29 Jan 2020 18:41:44 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : references : in-reply-to : content-type : content-id : content-transfer-encoding : mime-version; s=facebook; bh=TDktIcI5o9ZCItUIhhVsmJxLHnhN2ITyMd6+sSXbToA=; b=prEKWpAMdmYZHyoMmGn3xcKdAAH0AftLIfhb5eSDiQvhVmFZHL44e11oSw41/EPXahje 30KtTa381+Svu37nRPkDspc4LoTyvz/IqVby0z1Ug9KA6ctkiPrP0bCRCb49GMmlnnZL YMqWupMfISZ/hGSdXioO+P0HICycMab0jiw= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 2xucv9aw19-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Wed, 29 Jan 2020 18:41:44 -0800 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (100.104.98.9) by o365-in.thefacebook.com (100.104.94.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1779.2; Wed, 29 Jan 2020 18:41:43 -0800 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=RgUGIFejpJOfZx2sO73oAUuo/hT8Uo5Mfu0qPTV0c9IXHKPL8OFPJn8WuYcF/frNXIz9CFMjxGzJYb26JN+VQ3HqzJgw3Rv5RFpYtqHYzbe61S7FFqba1S9MQSmT8lxdMWOfg2ccKhsPSKyxWBhvEMSDBhZYkbBt4SmLmaJ3qyKGhUpRlD24i/YpZMoUx4JzVcQXgZEImIijbdzB7aTUTs0EOL+u0sclpsSNdsAATBOB2eXkccPEK2LduconbD/3bY2i3QgQLRzuZbSeNUKd8nvmCjRZTogipWXnVFz0A9UPnMH8b3mozIHHKyO6U7KdfjMk+X48OT2Cyw2qzNlBIA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=TDktIcI5o9ZCItUIhhVsmJxLHnhN2ITyMd6+sSXbToA=; b=SCKjRMko9Fk82bczehT28uVvJQSnbHhq/+b37wMk0WYvkqULxAyXrkg6HqeNRCqlKDUwUQYuUchG4PeICuc+a/pClfT1IgJzSiczog9SlcXxGndPvovueE8K4HwNmrcN4xWh4T3uxSpNxkZaMzRaUerN75Wcnsiok/OuHimHheO2Q9yh93Jdw7O06CIXbgNU5v06cYwS/kqsmxO0IV8DdiMyVkpzidu4GdpTmnrNKGs6lpVP8kFqJ14JAQau4DZvihRk4pmVf51P2n6huYBuGvOEbquBeDTE0y4vBURgtCMRkbsmH9NPmbrXp6MYxy46+X6/fGRxBkxtZ9LvMZO7qw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=fb.com; dmarc=pass action=none header.from=fb.com; dkim=pass header.d=fb.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.onmicrosoft.com; s=selector2-fb-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=TDktIcI5o9ZCItUIhhVsmJxLHnhN2ITyMd6+sSXbToA=; b=LQMuE0Ov5an7jyn5XL/p8KEm4nVxlGwopLoFRnm9wcQ0K7nBEUc19ReZ58fDafis+sku8aPpplWwJhBnNhBZO1Ssffeb1ld/WRinsTpimuknToiaL3PB/WAuvJ5HBBx2R78SKwkK8iM92S1/b/cMEf4vqE8cCjqhgEgXhusyUSQ= Received: from BYAPR15MB2631.namprd15.prod.outlook.com (20.179.155.147) by BYAPR15MB2854.namprd15.prod.outlook.com (20.178.206.152) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2665.22; Thu, 30 Jan 2020 02:41:41 +0000 Received: from BYAPR15MB2631.namprd15.prod.outlook.com ([fe80::ccb6:a331:77d8:d308]) by BYAPR15MB2631.namprd15.prod.outlook.com ([fe80::ccb6:a331:77d8:d308%7]) with mapi id 15.20.2665.027; Thu, 30 Jan 2020 02:41:41 +0000 Received: from xps.DHCP.thefacebook.com (2620:10d:c090:180::ade2) by MWHPR22CA0036.namprd22.prod.outlook.com (2603:10b6:300:69::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2665.22 via Frontend Transport; Thu, 30 Jan 2020 02:41:39 +0000 From: Roman Gushchin To: Bharata B Rao CC: "linux-mm@kvack.org" , Andrew Morton , Michal Hocko , "Johannes Weiner" , Shakeel Butt , "Vladimir Davydov" , "linux-kernel@vger.kernel.org" , Kernel Team , Yafang Shao Subject: Re: [PATCH v2 00/28] The new cgroup slab memory controller Thread-Topic: [PATCH v2 00/28] The new cgroup slab memory controller Thread-Index: AQHV1Thx8lEkKPj+0UO2O+yf+Epkr6gCeVAAgAAJ0oA= Date: Thu, 30 Jan 2020 02:41:41 +0000 Message-ID: <20200130024135.GA14994@xps.DHCP.thefacebook.com> References: <20200127173453.2089565-1-guro@fb.com> <20200130020626.GA21973@in.ibm.com> In-Reply-To: <20200130020626.GA21973@in.ibm.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: MWHPR22CA0036.namprd22.prod.outlook.com (2603:10b6:300:69::22) To BYAPR15MB2631.namprd15.prod.outlook.com (2603:10b6:a03:150::19) x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [2620:10d:c090:180::ade2] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: fc0a3bfd-d647-4ab8-c8b3-08d7a52defb4 x-ms-traffictypediagnostic: BYAPR15MB2854: x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-fb-source: Internal x-ms-oob-tlc-oobclassifiers: OLM:3968; x-forefront-prvs: 02981BE340 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(396003)(376002)(136003)(346002)(39860400002)(366004)(199004)(189003)(1076003)(54906003)(71200400001)(52116002)(478600001)(7696005)(6666004)(316002)(6506007)(5660300002)(8936002)(186003)(55016002)(33656002)(16526019)(9686003)(4326008)(2906002)(81156014)(8676002)(66556008)(66476007)(66946007)(81166006)(6916009)(66446008)(86362001)(64756008);DIR:OUT;SFP:1102;SCL:1;SRVR:BYAPR15MB2854;H:BYAPR15MB2631.namprd15.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;A:1;MX:1; received-spf: None (protection.outlook.com: fb.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: XWXEa6aVOxNVnnG2u4KHw324a3WIU3OuBwcUdgL2zKzegF647aUEYrjOeyp+557vESRr8/59jWcN8/3GU+vNTeb2F0afD62Hi+neHK5z5wasDG0Y89avB5MS4/kw9Y1rWQZBveWuTrNN8AJoK9pHa3J24KXtIdhgcAqryU7Zt3uus3729ECpgD5fSD99/IOVs1n3o/S93pZMblL01jMeuyX2Zpmbc5eG8I8B+XDQpjgriULQGqOlO1EO9kw+3f8S4RY9auYkHI61rv2aehVt8cNO9Y5HTQIsQILxdKjDI6FRzF++y5wmSupwL+/blJK2rwUTk+9xILzt3VGQduq9jVt3nlTVJCCpFZCc5zL8leJ32BFbCIFW59RXFMtnb3nyAIMwEMkST5i3JuBHRKDtIgdoFgmqhLNmEJFvKhVEudZ+KIe5l+DVdn0SUt8GgvY2 x-ms-exchange-antispam-messagedata: TVngLFPb78gNxXV4WKFytFx0TvSPZfdh4oQjg4khlbZhAUq5swN2SQpZuKwOjDXJJv0HpAzXXe13JIWOJFhuZQjvx4O7RJ81UXqA+80RrCehWO0xpXNmUAPoWESqrh6iMHVTSbyZ3nIW48HOQvY9iH16KQJDIZ2fKRZ8JjhS7uM= Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: fc0a3bfd-d647-4ab8-c8b3-08d7a52defb4 X-MS-Exchange-CrossTenant-originalarrivaltime: 30 Jan 2020 02:41:41.5665 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: hwcWntSBxYuS5ifI0pnKFKxiT/EQ+oovErVhPXcBe7bCSDpaYeI9dehbgmTg7gc+ X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR15MB2854 X-OriginatorOrg: fb.com X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.572 definitions=2020-01-29_08:2020-01-28,2020-01-29 signatures=0 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 mlxscore=0 adultscore=0 priorityscore=1501 phishscore=0 impostorscore=0 spamscore=0 mlxlogscore=999 clxscore=1015 bulkscore=0 malwarescore=0 suspectscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-1911200001 definitions=main-2001300014 X-FB-Internal: deliver Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 30, 2020 at 07:36:26AM +0530, Bharata B Rao wrote: > On Mon, Jan 27, 2020 at 09:34:25AM -0800, Roman Gushchin wrote: > > The existing cgroup slab memory controller is based on the idea of > > replicating slab allocator internals for each memory cgroup. > > This approach promises a low memory overhead (one pointer per page), > > and isn't adding too much code on hot allocation and release paths. > > But is has a very serious flaw: it leads to a low slab utilization. > >=20 > > Using a drgn* script I've got an estimation of slab utilization on > > a number of machines running different production workloads. In most > > cases it was between 45% and 65%, and the best number I've seen was > > around 85%. Turning kmem accounting off brings it to high 90s. Also > > it brings back 30-50% of slab memory. It means that the real price > > of the existing slab memory controller is way bigger than a pointer > > per page. > >=20 > > The real reason why the existing design leads to a low slab utilization > > is simple: slab pages are used exclusively by one memory cgroup. > > If there are only few allocations of certain size made by a cgroup, > > or if some active objects (e.g. dentries) are left after the cgroup is > > deleted, or the cgroup contains a single-threaded application which is > > barely allocating any kernel objects, but does it every time on a new C= PU: > > in all these cases the resulting slab utilization is very low. > > If kmem accounting is off, the kernel is able to use free space > > on slab pages for other allocations. > >=20 > > Arguably it wasn't an issue back to days when the kmem controller was > > introduced and was an opt-in feature, which had to be turned on > > individually for each memory cgroup. But now it's turned on by default > > on both cgroup v1 and v2. And modern systemd-based systems tend to > > create a large number of cgroups. > >=20 > > This patchset provides a new implementation of the slab memory controll= er, > > which aims to reach a much better slab utilization by sharing slab page= s > > between multiple memory cgroups. Below is the short description of the = new > > design (more details in commit messages). > >=20 > > Accounting is performed per-object instead of per-page. Slab-related > > vmstat counters are converted to bytes. Charging is performed on page-b= asis, > > with rounding up and remembering leftovers. > >=20 > > Memcg ownership data is stored in a per-slab-page vector: for each slab= page > > a vector of corresponding size is allocated. To keep slab memory repare= nting > > working, instead of saving a pointer to the memory cgroup directly an > > intermediate object is used. It's simply a pointer to a memcg (which ca= n be > > easily changed to the parent) with a built-in reference counter. This s= cheme > > allows to reparent all allocated objects without walking them over and > > changing memcg pointer to the parent. > >=20 > > Instead of creating an individual set of kmem_caches for each memory cg= roup, > > two global sets are used: the root set for non-accounted and root-cgrou= p > > allocations and the second set for all other allocations. This allows t= o > > simplify the lifetime management of individual kmem_caches: they are > > destroyed with root counterparts. It allows to remove a good amount of = code > > and make things generally simpler. > >=20 > > The patchset* has been tested on a number of different workloads in our > > production. In all cases it saved significant amount of memory, measure= d > > from high hundreds of MBs to single GBs per host. On average, the size > > of slab memory has been reduced by 35-45%. >=20 > Here are some numbers from multiple runs of sysbench and kernel compilati= on > with this patchset on a 10 core POWER8 host: >=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > Peak usage of memory.kmem.usage_in_bytes, memory.usage_in_bytes and > meminfo:Slab for Sysbench oltp_read_write with mysqld running as part > of a mem cgroup (Sampling every 5s) > -------------------------------------------------------------------------= - > 5.5.0-rc7-mm1 +slab patch %reduction > -------------------------------------------------------------------------= - > memory.kmem.usage_in_bytes 15859712 4456448 72 > memory.usage_in_bytes 337510400 335806464 .5 > Slab: (kB) 814336 607296 25 >=20 > memory.kmem.usage_in_bytes 16187392 4653056 71 > memory.usage_in_bytes 318832640 300154880 5 > Slab: (kB) 789888 559744 29 > -------------------------------------------------------------------------= - >=20 >=20 > Peak usage of memory.kmem.usage_in_bytes, memory.usage_in_bytes and > meminfo:Slab for kernel compilation (make -s -j64) Compilation was > done from bash that is in a memory cgroup. (Sampling every 5s) > -------------------------------------------------------------------------= - > 5.5.0-rc7-mm1 +slab patch %reduction > -------------------------------------------------------------------------= - > memory.kmem.usage_in_bytes 338493440 231931904 31 > memory.usage_in_bytes 7368015872 6275923968 15 > Slab: (kB) 1139072 785408 31 >=20 > memory.kmem.usage_in_bytes 341835776 236453888 30 > memory.usage_in_bytes 6540427264 6072893440 7 > Slab: (kB) 1074304 761280 29 >=20 > memory.kmem.usage_in_bytes 340525056 233570304 31 > memory.usage_in_bytes 6406209536 6177357824 3 > Slab: (kB) 1244288 739712 40 > -------------------------------------------------------------------------= - >=20 > Slab consumption right after boot > -------------------------------------------------------------------------= - > 5.5.0-rc7-mm1 +slab patch %reduction > -------------------------------------------------------------------------= - > Slab: (kB) 821888 583424 29 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >=20 > Summary: >=20 > With sysbench and kernel compilation, memory.kmem.usage_in_bytes shows > around 70% and 30% reduction consistently. >=20 > Didn't see consistent reduction of memory.usage_in_bytes with sysbench an= d > kernel compilation. >=20 > Slab usage (from /proc/meminfo) shows consistent 30% reduction and the > same is seen right after boot too. That's just perfect! memory.usage_in_bytes was most likely the same because the freed space was taken by pagecache. Thank you very much for testing! Roman