From: Waiman Long
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song, Roman Gushchin, Waiman Long
Subject: [PATCH v2] mm/list_lru: Fix possible race in memcg_reparent_list_lru_node()
Date: Wed, 30 Mar 2022 13:26:46 -0400
Message-Id: <20220330172646.2687555-1-longman@redhat.com>

Muchun Song found that there could be a race between list_lru_add()
and memcg_reparent_list_lru_node() that causes the latter function to
miss reparenting an lru entry, as shown below:

  CPU0:                           CPU1:
  list_lru_add()
    spin_lock(&nlru->lock)
    l = list_lru_from_kmem(memcg)
                                  memcg_reparent_objcgs(memcg)
                                  memcg_reparent_list_lrus(memcg)
                                    memcg_reparent_list_lru()
                                      memcg_reparent_list_lru_node()
                                        if (!READ_ONCE(nlru->nr_items))
                                          // Miss reparenting
                                          return
    // Assume 0->1
    l->nr_items++
    // Assume 0->1
    nlru->nr_items++

Although it is unlikely that a list_lru_node with 0 items suddenly gets
a newly added lru entry at the end of its life, the race is still
theoretically possible.

Because of the lock/unlock pair used within percpu_ref_kill(), which is
the last function called by memcg_reparent_objcgs(), any read issued in
memcg_reparent_list_lru_node() will not be reordered before the
reparenting of objcgs.

Add a !spin_is_locked()/smp_rmb()/!READ_ONCE(nlru->nr_items) check to
ensure that either the read of nr_items is valid or the racing
list_lru_add() will see the reparented objcg. In addition, skip the
reparenting of an empty list_lru_one once nlru->lock has been taken.

Fixes: 405cc51fc104 ("mm/list_lru: optimize memcg_reparent_list_lru_node()")
Reported-by: Muchun Song
Signed-off-by: Waiman Long
---
 mm/list_lru.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index c669d87001a6..08ff54ffabd6 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -395,10 +395,33 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	struct list_lru_one *src, *dst;
 
 	/*
-	 * If there is no lru entry in this nlru, we can skip it immediately.
+	 * With the lock/unlock pair used within percpu_ref_kill(), which
+	 * is the last function called by memcg_reparent_objcgs(), any
+	 * read issued here will not be reordered before the reparenting
+	 * of objcgs.
+	 *
+	 * Assuming a racing list_lru_add():
+	 * list_lru_add()
+	 *				<- memcg_reparent_list_lru_node()
+	 *   spin_lock(&nlru->lock)
+	 *   l = list_lru_from_kmem(memcg)
+	 *   nlru->nr_items++
+	 *   spin_unlock(&nlru->lock)
+	 *				<- memcg_reparent_list_lru_node()
+	 *
+	 * If the !spin_is_locked(&nlru->lock) check is true, we are
+	 * either before the spin_lock() or after the spin_unlock(). In the
+	 * former case, list_lru_add() will see the reparented objcg and so
+	 * won't touch the lru to be reparented. In the latter case, we will
+	 * see the updated nr_items. So we can use the optimization that if
+	 * there is no lru entry in this nlru, skip it immediately.
 	 */
-	if (!READ_ONCE(nlru->nr_items))
-		return;
+	if (!spin_is_locked(&nlru->lock)) {
+		/* nr_items read must be ordered after nlru->lock */
+		smp_rmb();
+		if (!READ_ONCE(nlru->nr_items))
+			return;
+	}
 
 	/*
 	 * Since list_lru_{add,del} may be called under an IRQ-safe lock,
@@ -407,7 +430,7 @@ static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
 	spin_lock_irq(&nlru->lock);
 
 	src = list_lru_from_memcg_idx(lru, nid, src_idx);
-	if (!src)
+	if (!src || !src->nr_items)
 		goto out;
 
 	dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
-- 
2.27.0
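For experimenting with the fast-path logic outside the kernel, here is a minimal user-space C11 sketch of the check's structure. It assumes a single lock word and a single counter can stand in for nlru->lock and nlru->nr_items. All names (struct model_nlru, old_can_skip(), new_can_skip(), model_reparent()) are hypothetical. The acquire fence is only a rough analogue of smp_rmb(), objcg reparenting is not modeled, and the sketch does not prove that the ordering argument holds under the Linux-kernel memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for struct list_lru_node: a lock word
 * plus an item counter. None of these names are kernel APIs.
 */
struct model_nlru {
	atomic_int locked;	/* 1 while the modelled spinlock is held */
	atomic_long nr_items;	/* number of lru entries on this node */
};

static void model_init(struct model_nlru *n)
{
	atomic_init(&n->locked, 0);
	atomic_init(&n->nr_items, 0);
}

/* Test-and-set spinlock; acquire on success pairs with model_unlock(). */
static void model_lock(struct model_nlru *n)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak_explicit(&n->locked, &expected, 1,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;	/* a failed CAS overwrote it; retry from 0 */
}

static void model_unlock(struct model_nlru *n)
{
	assert(atomic_load_explicit(&n->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&n->locked, 0, memory_order_release);
}

/* Pre-patch fast path: trust a zero nr_items unconditionally. */
static bool old_can_skip(struct model_nlru *n)
{
	return atomic_load_explicit(&n->nr_items, memory_order_relaxed) == 0;
}

/*
 * v2-style fast path: trust a zero nr_items only if the lock was seen
 * free. The acquire fence roughly stands in for smp_rmb(), ordering the
 * lock-word load before the nr_items load.
 */
static bool new_can_skip(struct model_nlru *n)
{
	if (!atomic_load_explicit(&n->locked, memory_order_relaxed)) {
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&n->nr_items, memory_order_relaxed) == 0)
			return true;
	}
	return false;	/* fall through to the locked slow path */
}

/* Slow path: recheck under the lock, loosely like "!src || !src->nr_items". */
static bool model_reparent(struct model_nlru *n)
{
	bool moved = false;

	if (new_can_skip(n))
		return false;
	model_lock(n);
	if (atomic_load_explicit(&n->nr_items, memory_order_relaxed) != 0) {
		/* Placeholder for splicing the entries onto the parent list. */
		atomic_store_explicit(&n->nr_items, 0, memory_order_relaxed);
		moved = true;
	}
	model_unlock(n);
	return moved;
}
```

Driven single-threaded, for example by calling model_lock() to play CPU0 in the middle of list_lru_add() and then calling old_can_skip() and new_can_skip() to play CPU1, the pre-patch check returns true (skipping a node that is about to gain an entry) while the v2-style check returns false and falls through to the locked slow path.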