Subject: Re: [PATCH v4 2/5] zswap: make shrinking memcg-aware
From: Muchun Song
Date: Wed, 1 Nov 2023 11:06:26 +0800
To: Nhat Pham
Cc: Andrew Morton, Johannes Weiner, cerasuolodomenico@gmail.com,
    Yosry Ahmed, sjenning@redhat.com, ddstreet@ieee.org,
    vitaly.wool@konsulko.com, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Chris Li, Linux-MM, kernel-team@meta.com, LKML
In-Reply-To: <20231101012614.186996-1-nphamcs@gmail.com>
References: <20231024203302.1920362-3-nphamcs@gmail.com>
    <20231101012614.186996-1-nphamcs@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Nov 1, 2023, at 09:26, Nhat Pham wrote:
>
> cc-ing Johannes, Roman, Shakeel, Muchun since you all know much more
> about memory controller + list_lru reparenting logic than me.
>
> There seems to be a race between memcg offlining and zswap’s
> cgroup-aware LRU implementation:
>
> CPU0                              CPU1
> zswap_lru_add()                   mem_cgroup_css_offline()
>     get_mem_cgroup_from_objcg()
>                                       memcg_offline_kmem()
>                                           memcg_reparent_objcgs()
>                                           memcg_reparent_list_lrus()
>                                               memcg_reparent_list_lru()
>                                                   memcg_reparent_list_lru_node()
>     list_lru_add()
>                                               memcg_list_lru_free()
>
>
> Essentially: on CPU0, zswap gets the memcg from the entry's objcg
> (before the objcgs are reparented). Then it performs list_lru_add()
> after the list_lru entries reparenting (memcg_reparent_list_lru_node())
> step. If the list_lru of the memcg being offlined has not been freed
> (i.e before the memcg_list_lru_free() call), then the list_lru_add()
> call would succeed - but the list will be freed soon after. The new

No worries. list_lru_add() will add the object to the lru list of
the parent of the memcg being offlined, because the ->kmemcg_id of the
memcg being offlined will be changed to its parent's ->kmemcg_id before
memcg_reparent_list_lru().

Muchun,
Thanks

> zswap entry as a result will not be subjected to future reclaim
> attempt. IOW, this list_lru_add() call is effectively swallowed. And
> worse, there might be a crash when we invalidate the zswap_entry in the
> future (which will perform a list_lru removal).
>
> Within get_mem_cgroup_from_objcg(), none of the following seem
> sufficient to prevent this race:
>
>    1. Perform the objcg-to-memcg lookup inside a rcu_read_lock()
>    section.
>    2. Checking if the memcg is freed yet (with css_tryget()) (what
>    we're currently doing in this patch series).
>    3. Checking if the memcg is still online (with css_tryget_online())
>    The memcg can still be offlined down the line.
>
>
> I've discussed this privately with Johannes, and it seems like the
> cleanest solution here is to move the reparenting logic down to release
> stage.
> That way, when get_mem_cgroup_from_objcg() returns,
> zswap_lru_add() is given an memcg that is reparenting-safe (until we
> drop the obtained reference).
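
To make the ordering argument in the reply above concrete, here is a
minimal single-threaded Python model. It is only an illustrative sketch,
not kernel code: the names Memcg, ListLru, offline() and run() are
hypothetical, locking and per-node lists are ignored, and the
"redirect_first=False" ordering is an assumed counterfactual used purely
for contrast. It shows that when the offlined memcg's kmemcg_id is
switched to the parent's id before the lists are reparented, an add that
races in between reparenting and freeing lands on the parent's list.

```python
class Memcg:
    """Hypothetical stand-in for struct mem_cgroup."""

    def __init__(self, kmemcg_id, parent=None):
        self.kmemcg_id = kmemcg_id
        self.parent = parent


class ListLru:
    """Hypothetical stand-in for a memcg-aware list_lru: one list per id."""

    def __init__(self, memcgs):
        self.lists = {m.kmemcg_id: [] for m in memcgs}

    def add(self, memcg, item):
        # In this model the target list is selected by the memcg's
        # *current* kmemcg_id at the time of the call.
        self.lists[memcg.kmemcg_id].append(item)


def offline(lru, memcg, racing_add, redirect_first=True):
    """Model the offline steps; racing_add runs between reparent and free."""
    src_id = memcg.kmemcg_id
    parent = memcg.parent
    if redirect_first:
        # Step 1: point the dying memcg at its parent's id.
        memcg.kmemcg_id = parent.kmemcg_id
    # Step 2: move existing entries to the parent's list.
    lru.lists[parent.kmemcg_id].extend(lru.lists[src_id])
    lru.lists[src_id].clear()
    # The racing list_lru_add() from the other CPU happens here.
    racing_add()
    # Step 3: free the dying memcg's list; anything still on it is lost.
    return lru.lists.pop(src_id)


def run(redirect_first):
    parent = Memcg(kmemcg_id=1)
    child = Memcg(kmemcg_id=2, parent=parent)
    lru = ListLru([parent, child])
    lru.add(child, "old-entry")
    lost = offline(lru, child,
                   racing_add=lambda: lru.add(child, "new-entry"),
                   redirect_first=redirect_first)
    return lru.lists[parent.kmemcg_id], lost


if __name__ == "__main__":
    print("redirect first:", run(True))
    print("no redirect   :", run(False))
```

In this model, run(True) leaves both entries on the parent's list and
frees an empty list, while run(False) frees a list that still holds the
racing entry, which is the failure mode described in the quoted report.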