From: Shakeel Butt
Date: Wed, 22 Apr 2020 09:51:20 -0700
Subject: Re: [PATCH 02/18] mm: memcontrol: fix theoretical race in charge moving
To: Johannes Weiner
Cc: Joonsoo Kim, Alex Shi, Hugh Dickins, Michal Hocko, "Kirill A. Shutemov",
    Roman Gushchin, Linux MM, Cgroups, LKML, Kernel Team
List-ID: linux-kernel@vger.kernel.org

On Mon, Apr 20, 2020 at 3:11 PM Johannes Weiner wrote:
>
> The move_lock is a per-memcg lock, but the VM accounting code that
> needs to acquire it comes from the page and follows page->mem_cgroup
> under RCU protection. That means that the page becomes unlocked not
> when we drop the move_lock, but when we update page->mem_cgroup. And
> that assignment doesn't imply any memory ordering. If that pointer
> write gets reordered against the reads of the page state -
> page_mapped, PageDirty etc. - the state may change while we rely on
> it being stable, and we can end up corrupting the counters.
>
> Place an SMP memory barrier to make sure we're done with all page
> state by the time the new page->mem_cgroup becomes visible.
>
> Also replace the open-coded move_lock with a lock_page_memcg() to make
> it more obvious what we're serializing against.
>
> Signed-off-by: Johannes Weiner
> ---
>  mm/memcontrol.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5beea03dd58a..41f5ed79272e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5372,7 +5372,6 @@ static int mem_cgroup_move_account(struct page *page,
>  {
>          struct lruvec *from_vec, *to_vec;
>          struct pglist_data *pgdat;
> -        unsigned long flags;
>          unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
>          int ret;
>          bool anon;
> @@ -5399,18 +5398,13 @@ static int mem_cgroup_move_account(struct page *page,
>          from_vec = mem_cgroup_lruvec(from, pgdat);
>          to_vec = mem_cgroup_lruvec(to, pgdat);
>
> -        spin_lock_irqsave(&from->move_lock, flags);
> +        lock_page_memcg(page);
>
>          if (!anon && page_mapped(page)) {
>                  __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
>                  __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
>          }
>
> -        /*
> -         * move_lock grabbed above and caller set from->moving_account, so
> -         * mod_memcg_page_state will serialize updates to PageDirty.
> -         * So mapping should be stable for dirty pages.
> -         */
>          if (!anon && PageDirty(page)) {
>                  struct address_space *mapping = page_mapping(page);
>
> @@ -5426,15 +5420,23 @@ static int mem_cgroup_move_account(struct page *page,
>          }
>
>          /*
> +         * All state has been migrated, let's switch to the new memcg.
> +         *
>           * It is safe to change page->mem_cgroup here because the page
> -         * is referenced, charged, and isolated - we can't race with
> -         * uncharging, charging, migration, or LRU putback.
> +         * is referenced, charged, isolated, and locked: we can't race
> +         * with (un)charging, migration, LRU putback, or anything else
> +         * that would rely on a stable page->mem_cgroup.
> +         *
> +         * Note that lock_page_memcg is a memcg lock, not a page lock,
> +         * to save space. As soon as we switch page->mem_cgroup to a
> +         * new memcg that isn't locked, the above state can change
> +         * concurrently again. Make sure we're truly done with it.
> +         */
> +        smp_mb();

You said "theoretical race" in the subject, but the comment above convinced me
that the smp_mb() is required. So why is the race still theoretical?

>
> -        /* caller should have done css_get */
> -        page->mem_cgroup = to;
> +        page->mem_cgroup = to;  /* caller should have done css_get */
>
> -        spin_unlock_irqrestore(&from->move_lock, flags);
> +        __unlock_page_memcg(from);
>
>          ret = 0;
>
> --
> 2.26.0
>