From: Kairui Song
Date: Mon, 20 Nov 2023 19:17:37 +0800
Subject: Re: [PATCH 23/24] swap: fix multiple swap leak when after cgroup migrate
To: "Huang, Ying"
Cc: linux-mm@kvack.org, Andrew Morton, David Hildenbrand, Hugh Dickins,
    Johannes Weiner, Matthew Wilcox, Michal Hocko,
    linux-kernel@vger.kernel.org, Tejun Heo
References: <20231119194740.94101-1-ryncsn@gmail.com>
    <20231119194740.94101-24-ryncsn@gmail.com>
    <87msv8c1xy.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87msv8c1xy.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Huang, Ying wrote on Mon, Nov 20, 2023 at 15:37:
>
> Kairui Song writes:
>
> > From: Kairui Song
> >
> > When a process which previously swapped some memory was moved to
> > another cgroup, and the cgroup it previous in is dead, then swapped in
> > pages will be leaked into rootcg. Previous commits fixed the bug for
> > no readahead path, this commit fix the same issue for readahead path.
> >
> > This can be easily reproduced by:
> > - Setup a SSD or HDD swap.
> > - Create memory cgroup A, B and C.
> > - Spawn process P1 in cgroup A and make it swap out some pages.
> > - Move process P1 to memory cgroup B.
> > - Destroy cgroup A.
> > - Do a swapoff in cgroup C
> > - Swapped in pages is accounted into cgroup C.
> >
> > This patch will fix it make the swapped in pages accounted in cgroup B.
>
> According to "Memory Ownership" section of
> Documentation/admin-guide/cgroup-v2.rst,
>
> "
> A memory area is charged to the cgroup which instantiated it and stays
> charged to the cgroup until the area is released. Migrating a process
> to a different cgroup doesn't move the memory usages that it
> instantiated while in the previous cgroup to the new cgroup.
> "
>
> Because we don't move the charge when we move a task from one cgroup to
> another. It's controversial which cgroup should be charged to.
> According to the above document, it's acceptable to charge to the cgroup
> C (cgroup where swapoff happens).

Hi Ying, thank you very much for the info!

It is controversial indeed, but the original behavior is rather
counter-intuitive.

Imagine there is a cgroup P1 with two child cgroups, C1 and C2. If a
process swaps out some memory in C1, then moves to C2, and C1 dies, then
on swapoff the charge is moved out of P1...

Swapoff often happens in an unlimited cgroup or in a cgroup for a
management agent. If P1 has a memory limit, it can breach the limit
easily: we will see a process that never left P1 having a much higher
RSS than the limit of P1/C1/C2. And if there is a limit on the
management agent's cgroup, the agent will hit OOM instead of the OOM
happening in P1.

Simply moving a process between child cgroups of the same parent cgroup
won't cause such an issue; things get weird when swapoff is involved.

Or maybe we should try to be compatible, and introduce a sysctl or
cmdline option for this?
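
To make the P1/C1/C2 scenario concrete, here is a toy accounting model. It is only a sketch, not kernel code: the `Cgroup` class, the `swapoff()` and `scenario()` helpers, the "agent" cgroup name and all page counts and limits are made up for illustration, and the model only records charges without enforcing limits or triggering reclaim/OOM.

```python
# Toy model of hierarchical memory charging. Hypothetical names and numbers;
# this does not mirror the kernel's memcg implementation.

class Cgroup:
    def __init__(self, name, parent=None, limit=None):
        self.name = name
        self.parent = parent
        self.limit = limit   # None means unlimited
        self.usage = 0       # pages charged here and in descendants

    def charge(self, pages):
        # A charge is counted in the cgroup itself and in every ancestor.
        node = self
        while node is not None:
            node.usage += pages
            node = node.parent

    def over_limit(self):
        return self.limit is not None and self.usage > self.limit


def swapoff(swapped_pages, swapoff_cgroup, task_cgroup, charge_to_task):
    """Swap everything in and charge it to one of the two candidate cgroups."""
    target = task_cgroup if charge_to_task else swapoff_cgroup
    target.charge(swapped_pages)
    return target


def scenario(charge_to_task):
    root = Cgroup("root")
    p1 = Cgroup("P1", parent=root, limit=100)
    c2 = Cgroup("C2", parent=p1)
    agent = Cgroup("agent", parent=root, limit=50)

    # The process swapped out 80 pages while in C1. C1 is gone by now, and
    # the process lives in C2 with 60 resident pages.
    c2.charge(60)

    # The management agent runs swapoff from its own cgroup.
    swapoff(80, agent, c2, charge_to_task)
    return p1.usage, agent.usage, p1.over_limit(), agent.over_limit()


if __name__ == "__main__":
    print("charge swapoff cgroup:", scenario(charge_to_task=False))
    print("charge task's cgroup: ", scenario(charge_to_task=True))
```

With the charge going to the swapoff cgroup, P1 still shows 60 pages against its limit of 100 although the process now holds 140 pages in memory, and the agent cgroup is the one pushed over its limit. With the charge going to the task's current cgroup, the 140 pages show up in P1, where the limit was meant to apply.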