Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp97883ybc; Fri, 22 Nov 2019 17:00:17 -0800 (PST) X-Google-Smtp-Source: APXvYqzLbiagpjiavBBI3fIR/Gpgp7qkWCp+2A2YW8ucfDalvYVCjsvG1+Bs60seOQJOT7ISw7Jm X-Received: by 2002:a17:906:3484:: with SMTP id g4mr25339592ejb.293.1574470817095; Fri, 22 Nov 2019 17:00:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1574470817; cv=none; d=google.com; s=arc-20160816; b=r6+W5J+pGHt7J150rT40/FKDnAPKZiIW+s0rZhUPHj2xuYyeYpkP2dR7hVlQjeRjEg 0lLWW3Zj2rkg7+146W3CQPksPpvSmu8kdvnGuEFLJfYJanxYrH42gc4BYX0sYdoN2T/8 Gq7/Ns2wf11Pg/wmu9YFOms2Mt24FSdTVBa4j0hpEcJrct3sR4p8yLBZVXXUAX8YJCaW yC43Ksp99KN8alRtzoGcF4Pw+1lfNzjhx7q6iN3LAw+tUs8IQP1cuAVVQhDFgZ9YHNAr IIJqbHWApTY+G1T36eGYX5rbyREB/oAkoWDRehSqP5mVf6t8mn5LKcu6WkPXMGdHCSOi igUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=IuWeFnrYeJCjuhCM2V1dA58lGjRpOtXacus/IBoPj+E=; b=Aql8ESqkWNiUaPTF4Z4aNzhiSTkys1FJbIQa1wnkgxw7f7xv1Az76+5g7ZqSKR2Hrb jf4ZDLOUtbbqJBE1WdmoOILOYt1hyYnlPissi0PxToYSTG1ToONHM6WaYvICc85mYz+l ZP8MJFbXGp48ndTY8UrPlikNt9De+4SljgUDM1D8Qx/XENf/oCOiH0Pma8yuEnKvy7V5 irjPekF+sqqdUMR/BsOF8uKSYj5XdFidBv9d0SxaDXXIZbLLFrxqOimJuwjNy+fR3c6K 1em98kmC44htDadn3r56xOleCBMpRlfY358wrhZ+UDAJKMQBxExwstzm4NysKlPsamlm U2Wg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=SWT9RPbb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 35si8501154edj.185.2019.11.22.16.59.50; Fri, 22 Nov 2019 17:00:17 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=SWT9RPbb; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726705AbfKWA6t (ORCPT + 99 others); Fri, 22 Nov 2019 19:58:49 -0500 Received: from mail-pg1-f196.google.com ([209.85.215.196]:36702 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726089AbfKWA6t (ORCPT ); Fri, 22 Nov 2019 19:58:49 -0500 Received: by mail-pg1-f196.google.com with SMTP id k13so4189090pgh.3 for ; Fri, 22 Nov 2019 16:58:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=IuWeFnrYeJCjuhCM2V1dA58lGjRpOtXacus/IBoPj+E=; b=SWT9RPbbH0wqbnY590Ng1zNYJSEYMwqyk2hjYVStgfe3cn8J9xhspUE/IhAhMyiyYF ITsmkvVswijzNTydtgsNqLfWO5AlPmHZfgbeP/ck3vqmG8WD1AthVOBjQq/7wIxhNj6N IhmPbMJDedgw7loa0CC2hR6GkM41ieeXfMen6GAXTZZY2Y0aEjcK/Bj1BejspvEmOXyA zo/hn1dh/4KxioncezIii0LXRSfDqEwYu6VZbBVGOhhR7OLAyWwJhZLPdB9oGfMaULZa gsxPSmoSKUVsQOHxRh4sawj5xROFzr/xaOzGB7fddbIKF3FVjM+LXYaqj5zXQ27i30Lz iu+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=IuWeFnrYeJCjuhCM2V1dA58lGjRpOtXacus/IBoPj+E=; b=rcD9RmMUI5jFqzyKFp2WhWLn9q6QVWuPiisfdDh//poploc5bng8iI3CsQBTm6kEFr m4FGGjytSZQ9b/UCfTWBqizFzIs9WnsPvtfBuX4u2Nb17t5W+NWNfxeS2TWqMSREFhhC AivQkcH3pjDEo2YFzAgmqTzM0YmBDSCU2H2agmQujQkM9TxBS6JAWALeEpaJAvvpko1x iqHYGPuqNkqtly9OwrRs3hh5xcS7PfAz0yLdBm3A2iMaNCwi8qKKz/7vEG1HdizdhU4n jQXbpEnC6/NuLjxHvkvW+jw3JdP1HRyyVDLdA/ZsmRfJ5oZbAbijj3UtRSGGW0A29bSk d3kw== X-Gm-Message-State: APjAAAX4+YbcA8YA6AhXLvhMm80dPHp8OcXvcnsVh33lZR+GluML0bbc ZA6lMhlBHpSjdu+PIiSohO82+w== X-Received: by 2002:a63:fb04:: with SMTP id o4mr1513860pgh.122.1574470727152; Fri, 22 Nov 2019 16:58:47 -0800 (PST) Received: from [100.112.92.218] ([104.133.9.106]) by smtp.gmail.com with ESMTPSA id w69sm9117470pfc.164.2019.11.22.16.58.45 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 22 Nov 2019 16:58:46 -0800 (PST) Date: Fri, 22 Nov 2019 16:58:34 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Johannes Weiner cc: Alex Shi , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com, Michal Hocko , Vladimir Davydov , Roman Gushchin , Chris Down , Thomas Gleixner , Vlastimil Babka , Qian Cai , Andrey Ryabinin , "Kirill A. Shutemov" , Jerome Glisse , Andrea Arcangeli , David Rientjes , "Aneesh Kumar K.V" , swkhack , "Potyra, Stefan" , Mike Rapoport , Stephen Rothwell , Colin Ian King , Jason Gunthorpe , Mauro Carvalho Chehab , Peng Fan , Nikolay Borisov , Ira Weiny , Kirill Tkhai , Yafang Shao Subject: Re: [PATCH v4 3/9] mm/lru: replace pgdat lru_lock with lruvec lock In-Reply-To: <20191122161652.GA489821@cmpxchg.org> Message-ID: References: <1574166203-151975-1-git-send-email-alex.shi@linux.alibaba.com> <1574166203-151975-4-git-send-email-alex.shi@linux.alibaba.com> <20191119160456.GD382712@cmpxchg.org> <20191121220613.GB487872@cmpxchg.org> <20191122161652.GA489821@cmpxchg.org> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 22 Nov 2019, Johannes Weiner wrote: > > But that leaves me with one more worry: compaction. We locked out > charge moving now, so between that and knowing that the page is alive, > we have page->mem_cgroup stable. But compaction doesn't know whether > the page is alive - it comes from a pfn and finds out using PageLRU. > > In the current code, pgdat->lru_lock remains the same before and after > the page is charged to a cgroup, so once compaction has that locked > and it observes PageLRU, it can go ahead and isolate the page. > > But lruvec->lru_lock changes during charging, and then compaction may > hold the wrong lock during isolation: > > compaction: generic_file_buffered_read: > > page_cache_alloc() > > !PageBuddy() > > lock_page_lruvec(page) > lruvec = mem_cgroup_page_lruvec() > spin_lock(&lruvec->lru_lock) > if lruvec != mem_cgroup_page_lruvec() > goto again > > add_to_page_cache_lru() > mem_cgroup_commit_charge() > page->mem_cgroup = foo > lru_cache_add() > __pagevec_lru_add() > SetPageLRU() > > if PageLRU(page): > __isolate_lru_page() > > I don't see what prevents the lruvec from changing under compaction, > neither in your patches nor in Hugh's. Maybe I'm missing something? Speaking for my patches only: I'm humbled, I think you have caught me, I cannot find any argument against the race you suggest here. The race with mem_cgroup_move_account(), which Konstantin pointed out in 2012's https://lore.kernel.org/lkml/4F433418.3010401@openvz.org/ but I later misunderstood, and came to think I needed no patch against, until this week coming to perceive the same race in isolate_lru_page(): that one is easily and satisfactorily fixed by holding lruvec lock in mem_cgroup_move_account() - embarrassing, but not too serious. Your race here (again, lruvec lock taken then PageLRU observed, but page->mem_cgroup changed in between) really questions my whole scheme: I am not going to propose a solution now, I'll have to go back and recheck my assumptions all over. Certainly isolate_migratepage_block() has a harder job than any other, but I need to re-review it all. Maybe we got it right back in the days of PageCgroupUsed, and then I paid too little attention when rebasing to your welcome simplifications. I don't think any of us want to bring back PageCgroupUsed! And maybe we could get it right by always holding lruvec lock in commit_charge(), lrucare or not; but that's a much hotter path, and not a change I'd expect anyone to embrace. I'll go away and re-examine it all; probably start by verifying that your race actually happens in practice, though we never observed it. Heavy-hearted thanks, Hannes! Hugh