From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com, yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/5] zswap: make shrinking memcg-aware
Date: Tue, 31 Oct 2023 18:26:14 -0700
Message-Id: <20231101012614.186996-1-nphamcs@gmail.com>
In-Reply-To: <20231024203302.1920362-3-nphamcs@gmail.com>
References: <20231024203302.1920362-3-nphamcs@gmail.com>

Cc-ing Johannes, Roman, Shakeel, and Muchun, since you all know much more about the memory controller and list_lru reparenting logic than I do.

There seems to be a race between memcg offlining and zswap's cgroup-aware LRU implementation:

    CPU0                                CPU1
    zswap_lru_add()                     mem_cgroup_css_offline()
      get_mem_cgroup_from_objcg()
                                          memcg_offline_kmem()
                                            memcg_reparent_objcgs()
                                            memcg_reparent_list_lrus()
                                              memcg_reparent_list_lru()
                                                memcg_reparent_list_lru_node()
      list_lru_add()
                                                memcg_list_lru_free()

Essentially: on CPU0, zswap gets the memcg from the entry's objcg (before the objcgs are reparented). It then performs list_lru_add() after the list_lru reparenting step (memcg_reparent_list_lru_node()). If the list_lru of the memcg being offlined has not been freed yet (i.e. before the memcg_list_lru_free() call), the list_lru_add() call succeeds, but the list is freed soon after. As a result, the new zswap entry will not be subject to any future reclaim attempt. IOW, this list_lru_add() call is effectively swallowed. Worse, there might be a crash when we later invalidate the zswap_entry, since invalidation performs a list_lru removal.

Within get_mem_cgroup_from_objcg(), none of the following seems sufficient to prevent this race:

1. Performing the objcg-to-memcg lookup inside an rcu_read_lock() section.
2. Checking whether the memcg has been freed yet with css_tryget() (what this patch series currently does).
3. Checking whether the memcg is still online with css_tryget_online().

The memcg can still be offlined down the line, after the check has passed.

I've discussed this privately with Johannes, and it seems like the cleanest solution here is to move the reparenting logic down to the release stage. That way, when get_mem_cgroup_from_objcg() returns, zswap_lru_add() is given a memcg that is reparenting-safe (until we drop the obtained reference).