Subject: Re: [PATCH v2] netprio_cgroup: Fix unlimited memory leak of v2 cgroups
From: Zefan Li
To: John Fastabend, Tejun Heo
CC: Jakub Kicinski, David Miller, yangyingliang, Kefeng Wang,
    Linux Kernel Network Developers
Date: Fri, 22 May 2020 11:26:40 +0800

On 2020/5/22 5:14, John Fastabend wrote:
> Jakub Kicinski wrote:
>> On Fri, 8 May 2020 22:58:29 -0700 Jakub Kicinski wrote:
>>> On Sat, 9 May 2020 11:32:10 +0800 Zefan Li wrote:
>>>> If systemd is configured to use hybrid mode which enables the use of
>>>> both cgroup v1 and v2, systemd will create new cgroup on both the default
>>>> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
>>>> task to the two cgroups. If the task does some network thing then the v2
>>>> cgroup can never be freed after the session exited.
>>>>
>>>> One of our machines ran into OOM due to this memory leak.
>>>>
>>>> In the scenario described above when sk_alloc() is called cgroup_sk_alloc()
>>>> thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data
>>>> and increments the cgroup refcnt, but then sock_update_netprioidx() thought
>>>> it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the
>>>> cgroup refcnt will never be freed.
>>>>
>>>> Currently we do the mode switch when someone writes to the ifpriomap cgroup
>>>> control file. The easiest fix is to also do the switch when a task is attached
>>>> to a new cgroup.
>>>>
>>>> Fixes: bd1060a1d671("sock, cgroup: add sock->sk_cgroup")
>>>
>>> ^ space missing here
>>>
>>>> Reported-by: Yang Yingliang
>>>> Tested-by: Yang Yingliang
>>>> Signed-off-by: Zefan Li
>>
>> Fixed up the commit message and applied, thank you.
>
> Hi Zefan, Tejun,
>
> This is causing a regression where previously cgroupv2 bpf sockops programs
> could be attached and would run even if netprio_cgroup was enabled, as long
> as the netprio cgroup had not been configured. After this, the bpf sockops
> programs can still be attached, but only programs attached to the root cgroup
> will be run. For example, I hit this when I ran bpf selftests on a box that
> also happened to have netprio cgroup enabled; tests started failing after
> bumping the kernel to rc5.
>
> I'm a bit on the fence here about whether it needs to be reverted. For my
> case it's just a test box and easy enough to work around. Also, all the
> production cases I have already have to be aware of this to avoid the
> configured error. So it may be fine, but worth noting at least. Added Alexei
> to see if he has any opinion and/or uses net_prio+cgroupv2. I only looked it
> over briefly but didn't see any simple rc6-worthy fixes that would fix the
> issue above and also keep the original behavior.
>

Me neither. If we really want to keep the original behavior, we probably need
to do something similar to what the netclassid cgroup does: iterate all the
tasks in the cgroup to update their netprioidx when the netprio cgroup is
configured, and also stop updating netprioidx when a task is attached to a
new cgroup. Roughly like the sketch below.
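To make that concrete, here is an untested sketch, modeled on write_classid()
in net/core/netclassid_cgroup.c. The names update_netprio_sock() and
update_netprioidx_tasks() are made up for illustration (netprio's attach path
already uses a similar iterate_fd() callback), and error handling beyond the
basics is elided:

	/* Set the new prioidx on one socket; v carries the css id. */
	static int update_netprio_sock(const void *v, struct file *file,
				       unsigned int n)
	{
		int err;
		struct socket *sock = sock_from_file(file, &err);

		if (sock) {
			spin_lock(&cgroup_sk_update_lock);
			sock_cgroup_set_prioidx(&sock->sk->sk_cgrp_data,
						(unsigned long)v);
			spin_unlock(&cgroup_sk_update_lock);
		}
		return 0;
	}

	/* Refresh the prioidx on every socket of every task in the css. */
	static void update_netprioidx_tasks(struct cgroup_subsys_state *css)
	{
		struct css_task_iter it;
		struct task_struct *p;

		css_task_iter_start(css, 0, &it);
		while ((p = css_task_iter_next(&it))) {
			task_lock(p);
			iterate_fd(p->files, 0, update_netprio_sock,
				   (void *)(unsigned long)css->id);
			task_unlock(p);
		}
		css_task_iter_end(&it);
	}

write_priomap() would then call update_netprioidx_tasks() when a priority is
first configured, and the attach hook would stop touching sk_cgrp_data for
unconfigured cgroups.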
> And then, while reviewing, I also wonder whether we have the same issue
> described here in netclassid_cgroup.c with cgrp_attach()? It would be best
> to keep netcls and netprio in sync in this regard imo. At least netcls calls
> cgroup_sk_alloc_disable() in the write_classid() hook, so I suspect it makes
> sense to also add that to the attach hook?
>

Fortunately we don't have this issue in the netclassid cgroup :) because
task_cls_classid() remains 0 as long as the netclassid cgroup is not
configured. See the snippet below.
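For reference, sock_update_classid() in net/core/sock.c looks roughly like
this in 5.7 (slightly simplified, with an explanatory comment added):

	void sock_update_classid(struct sock_cgroup_data *skcd)
	{
		u32 classid = task_cls_classid(current);

		/*
		 * While the netclassid cgroup is unconfigured,
		 * task_cls_classid() returns 0 and sock_cgroup_classid()
		 * also reports 0 for a sk_cgrp_data that still holds the
		 * v2 cgroup pointer, so nothing is overwritten and the
		 * reference taken by cgroup_sk_alloc() stays balanced.
		 */
		if (classid != sock_cgroup_classid(skcd))
			sock_cgroup_set_classid(skcd, classid);
	}

netprio has no equivalent early-out: task_netprioidx() returns the css id,
which differs from the default whenever the task sits in a non-root netprio
cgroup, configured or not, so sock_update_netprioidx() clobbers the stored
cgroup pointer and leaks its reference.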