Date: Wed, 30 Mar 2022 19:14:40 -0700
From: Andrew Morton
To: Waiman Long
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song, Roman Gushchin
Subject: Re: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
Message-Id: <20220330191440.1cc1b2de2b849d1ba93d2ba7@linux-foundation.org>
In-Reply-To: <20220330172646.2687555-1-longman@redhat.com>
References: <20220330172646.2687555-1-longman@redhat.com>

On Wed, 30 Mar 2022 13:26:46 -0400 Waiman Long wrote:

> Muchun Song found out there could be a race between list_lru_add()
> and memcg_reparent_list_lru_node() causing the latter function to miss
> reparenting of a lru entry, as shown below:
>
>     CPU0:                                CPU1:
>     list_lru_add()
>         spin_lock(&nlru->lock)
>         l = list_lru_from_kmem(memcg)
>                                          memcg_reparent_objcgs(memcg)
>                                          memcg_reparent_list_lrus(memcg)
>                                            memcg_reparent_list_lru()
>                                              memcg_reparent_list_lru_node()
>                                                if (!READ_ONCE(nlru->nr_items))
>                                                    // Miss reparenting
>                                                    return
>         // Assume 0->1
>         l->nr_items++
>         // Assume 0->1
>         nlru->nr_items++
>
> Though it is not likely that a list_lru_node that has 0 items suddenly
> has a newly added lru entry at the end of its life, the race is still
> theoretically possible.
>
> With the lock/unlock pair used within the percpu_ref_kill(), which is
> the last function call of memcg_reparent_objcgs(), any read issued
> in memcg_reparent_list_lru_node() will not be reordered before the
> reparenting of objcgs.
>
> Add a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check
> to ensure that either the reading of nr_items is valid or the racing
> list_lru_add() will see the reparented objcg.
>
> ...
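As an aside, the window in that diagram can be modelled outside the
kernel. The stand-alone user-space sketch below is only a rough model
with made-up names (nlru_lock, nr_items and missed stand in for the
real fields); it shows the shape of the problem rather than reproducing
it, and whether the message prints depends entirely on scheduling:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t nlru_lock = PTHREAD_MUTEX_INITIALIZER;
static long nr_items;	/* stands in for nlru->nr_items */
static int missed;	/* set when the "reparent" side bailed out early */

/* Loose stand-in for list_lru_add(): the lru was already resolved from
 * the old memcg, so the new entry lands on a list that the other
 * thread may be about to reparent. */
static void *add_side(void *arg)
{
	pthread_mutex_lock(&nlru_lock);
	nr_items++;			/* l->nr_items++ / nlru->nr_items++ */
	pthread_mutex_unlock(&nlru_lock);
	return NULL;
}

/* Loose stand-in for memcg_reparent_list_lru_node() with the plain
 * "skip if empty" fast path and no ordering against the add side. */
static void *reparent_side(void *arg)
{
	if (nr_items == 0) {		/* !READ_ONCE(nlru->nr_items) */
		missed = 1;		/* nothing to move, possibly wrongly */
		return NULL;
	}
	/* ... would splice the child's list into the parent here ... */
	return NULL;
}

int main(void)
{
	pthread_t a, r;

	pthread_create(&a, NULL, add_side, NULL);
	pthread_create(&r, NULL, reparent_side, NULL);
	pthread_join(a, NULL);
	pthread_join(r, NULL);

	if (missed && nr_items)
		printf("entry added but reparenting skipped the node\n");
	return 0;
}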
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
>  	struct list_lru_one *src, *dst;
>  
>  	/*
> -	 * If there is no lru entry in this nlru, we can skip it immediately.
> +	 * With the lock/unlock pair used within the percpu_ref_kill()
> +	 * which is the last function call of memcg_reparent_objcgs(), any
> +	 * read issued here will not be reordered before the reparenting
> +	 * of objcgs.
> +	 *
> +	 * Assuming a racing list_lru_add():
> +	 * list_lru_add()
> +	 *                                  <- memcg_reparent_list_lru_node()
> +	 *   spin_lock(&nlru->lock)
> +	 *   l = list_lru_from_kmem(memcg)
> +	 *   nlru->nr_items++
> +	 *   spin_unlock(&nlru->lock)
> +	 *                                  <- memcg_reparent_list_lru_node()
> +	 *
> +	 * The !spin_is_locked(&nlru->lock) check being true means it is
> +	 * either before the spin_lock() or after the spin_unlock(). In the
> +	 * former case, list_lru_add() will see the reparented objcg and so
> +	 * won't touch the lru to be reparented. In the latter case, it will
> +	 * see the updated nr_items. So we can use the optimization that if
> +	 * there is no lru entry in this nlru, skip it immediately.
> +	 */
> -	if (!READ_ONCE(nlru->nr_items))
> -		return;
> +	if (!spin_is_locked(&nlru->lock)) {

ick.

> +		/* nr_items read must be ordered after nlru->lock */
> +		smp_rmb();
> +		if (!READ_ONCE(nlru->nr_items))
> +			return;
> +	}

include/linux/spinlock_up.h has

#define arch_spin_is_locked(lock)	((void)(lock), 0)

so this `if' will always be true on CONFIG_SMP=n. Will the kernel
still work?

At the very least let's have changelogging and commenting explaining
that we've actually thought about this.

Preferably, can we fix this hole properly and avoid this hack?

There is a reason for this:

hp2:/usr/src/25> grep spin_is_locked mm/*.c
hp2:/usr/src/25>
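To make the CONFIG_SMP=n point concrete: with that stub the new test
collapses into an unconditional fast path. The fragment below is a
stand-alone sketch of that collapse, with made-up names (fake_spinlock,
plus a simplified spin_is_locked() that goes straight to
arch_spin_is_locked(); the real one is wrapped via raw_spin_is_locked()).
smp_rmb() is left out since it amounts to a compiler barrier on !SMP
builds.

#include <stdio.h>

struct fake_spinlock { int unused; };

/* the UP stub quoted above: always reports "not locked" */
#define arch_spin_is_locked(lock)	((void)(lock), 0)
/* simplified stand-in for the spin_is_locked() wrapper */
#define spin_is_locked(lock)		arch_spin_is_locked(lock)

int main(void)
{
	struct fake_spinlock nlru_lock = { 0 };
	long nr_items = 0;

	/* the check added by the patch, as a CONFIG_SMP=n build sees it */
	if (!spin_is_locked(&nlru_lock)) {
		/* smp_rmb() would be a plain compiler barrier here */
		if (nr_items == 0)
			printf("fast path taken unconditionally on UP\n");
	}
	return 0;
}

In other words, on such a build the function is effectively back to the
bare !READ_ONCE(nlru->nr_items) test.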