From: Alexander Duyck
Date: Tue, 11 Aug 2020 07:47:17 -0700
Subject: Re: [PATCH v17 14/21] mm/compaction: do page isolation first in compaction
To: Alex Shi
Cc: Andrew Morton, Mel Gorman, Tejun Heo, Hugh Dickins,
 Konstantin Khlebnikov, Daniel Jordan, Yang Shi, Matthew Wilcox,
 Johannes Weiner, kbuild test robot, linux-mm, LKML,
 cgroups@vger.kernel.org, Shakeel Butt, Joonsoo Kim, Wei Yang,
 "Kirill A. Shutemov", Rong Chen
References: <1595681998-19193-1-git-send-email-alex.shi@linux.alibaba.com>
 <1595681998-19193-15-git-send-email-alex.shi@linux.alibaba.com>
 <241ca157-104f-4f0d-7d5b-de394443788d@linux.alibaba.com>
 <8dbd004e-8eba-f1ec-a5eb-5dc551978936@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 11, 2020 at 1:23 AM Alex Shi wrote:
>
>
>
> On 2020/8/10 10:41 PM, Alexander Duyck wrote:
> > On Mon, Aug 10, 2020 at 6:10 AM Alex Shi wrote:
> >>
> >>
> >>
> >> On 2020/8/7 10:51 PM, Alexander Duyck wrote:
> >>> I wonder if this entire section shouldn't be restructured. This is the
> >>> only spot I can see where we are resetting the LRU flag instead of
> >>> pulling the page from the LRU list with the lock held. Looking over
> >>> the code it seems like something like that should be possible. I am
> >>> not sure the LRU lock is really protecting us in either the
> >>> PageCompound check nor the skip bits. It seems like holding a
> >>> reference on the page should prevent it from switching between
> >>> compound or not, and the skip bits are per pageblock with the LRU bits
> >>> being per node/memcg which I would think implies that we could have
> >>> multiple LRU locks that could apply to a single skip bit.
> >>
> >> Hi Alexander,
> >>
> >> I don't find problem yet on compound or skip bit usage. Would you clarify the
> >> issue do you concerned?
> >>
> >> Thanks!
> >
> > The point I was getting at is that the LRU lock is being used to
> > protect these and with your changes I don't think that makes sense
> > anymore.
> >
> > The skip bits are per-pageblock bits. With your change the LRU lock is
> > now per memcg first and then per node. As such I do not believe it
> > really provides any sort of exclusive access to the skip bits.
> > I still have to look into this more, but it seems like you need a lock
> > per either section or zone that can be used to protect those bits and
> > deal with this sooner rather than waiting until you have found an LRU
> > page. The one part that is confusing though is that the definition of
> > the skip bits seems to call out that they are a hint since they are not
> > protected by a lock, but that is exactly what has been happening here.
> >
>
> The skip bits are safe here, since even it race with other skip action,
> It will still skip out. The skip action is try not to compaction too much,
> not a exclusive action needs avoid race.

That would be the case if they didn't have the impact that they
currently do on the compaction process. What I am getting at is that a
race was introduced when you placed this test between the clearing of
the LRU flag and the actual pulling of the page from the LRU list. If
you tested the skip bits before clearing the LRU flag I would be okay
with the code. However, because the test triggers an abort after the
LRU flag is cleared, you are creating a situation where multiple
processes will be stomping all over each other: each thread can
essentially take a page via the LRU flag, but only one thread will
process a page, and it could skip over all the other pages that
preemptively had their LRU flag cleared.

If you take a look at test_and_set_skip(), the function only acts on
the pageblock-aligned PFN for a given range. With the changes you have
in place now, that would mean that only one thread would ever actually
call this function anyway, since the first PFN would take the LRU flag
so no other thread could follow through and test or set the bit as
well. The expectation before was that all threads would encounter this
test and either proceed after setting the bit for the first PFN or
abort after testing the first PFN.
With your changes only the first thread actually runs this test, and
then it and the others will likely encounter multiple failures as they
are all clearing LRU bits simultaneously and tripping each other up.
That is why the skip bit must have a test and set done before you even
get to the point of clearing the LRU flag.

> > The point I was getting at with the PageCompound check is that instead
> > of needing the LRU lock you should be able to look at PageCompound as
> > soon as you call get_page_unless_zero() and preempt the need to set
> > the LRU bit again. Instead of trying to rely on the LRU lock to
> > guarantee that the page hasn't been merged you could just rely on the
> > fact that you are holding a reference to it so it isn't going to
> > switch between being compound or order 0 since it cannot be freed. It
> > spoils the idea I originally had of combining the logic for
> > get_page_unless_zero and TestClearPageLRU into a single function, but
> > the advantage is you aren't clearing the LRU flag unless you are
> > actually going to pull the page from the LRU list.
>
> Sorry, I still can not follow you here. Compound code part is unchanged
> and follow the original logical. So would you like to pose a new code to
> see if its works?

No, there are significant changes, as you reordered all of the
operations. Prior to your change the LRU bit was checked, but not
cleared, before testing for PageCompound. Now you are clearing it
before you test whether it is a compound page. So if compaction is
running we will see the pages in the LRU stay put, but the LRU bit
flickering off and on if a compound page is encountered with the
wrong or NULL lruvec. What I was suggesting is that the PageCompound
test probably doesn't need to be concerned with the lock after your
changes. You could test it after you call get_page_unless_zero() and
before you call __isolate_lru_page_prepare().
Instead of relying on the LRU lock to protect us from the page
switching between compound and not, we would be relying on the fact
that we are holding a reference to the page, so it cannot be freed and
therefore cannot transition between compound and order 0.
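
To make the ordering concrete, here is a minimal user-space sketch of
the sequence described above. It is not kernel code and not a proposed
patch: struct model_page, struct model_pageblock, model_isolate() and
the result codes are hypothetical stand-ins, C11 atomics replace the
real page flags, and the first_pfn_in_block argument stands in for the
pageblock-aligned PFN check. It only illustrates the order of the
checks: take the reference, test for compound, test and set the skip
hint, and only then clear the LRU flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real struct page. */
struct model_page {
	atomic_int  refcount;	/* models the page reference count */
	atomic_bool lru;	/* models the PageLRU flag */
	bool        compound;	/* models PageCompound */
};

struct model_pageblock {
	atomic_bool skip;	/* models the per-pageblock skip hint */
};

enum isolate_result {
	ISOLATED, SKIP_NO_REF, SKIP_COMPOUND, SKIP_BLOCK, SKIP_NOT_LRU
};

static void model_page_init(struct model_page *p, int refs,
			    bool on_lru, bool compound)
{
	atomic_init(&p->refcount, refs);
	atomic_init(&p->lru, on_lru);
	p->compound = compound;
}

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
static bool model_get_unless_zero(struct model_page *p)
{
	int c = atomic_load(&p->refcount);

	while (c != 0) {
		/* On failure c is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&p->refcount, &c, c + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

static enum isolate_result model_isolate(struct model_pageblock *blk,
					 struct model_page *p,
					 bool first_pfn_in_block)
{
	/* 1. Pin the page so it cannot be freed underneath us. */
	if (!model_get_unless_zero(p))
		return SKIP_NO_REF;

	/* 2. Compound test under the reference; LRU flag untouched. */
	if (p->compound) {
		model_put(p);
		return SKIP_COMPOUND;
	}

	/* 3. Test and set the skip hint before touching the LRU flag,
	 *    so every scanner of this block runs the test. */
	if (first_pfn_in_block && atomic_exchange(&blk->skip, true)) {
		model_put(p);
		return SKIP_BLOCK;
	}

	/* 4. Only now claim the page by clearing the LRU flag. */
	if (!atomic_exchange(&p->lru, false)) {
		model_put(p);
		return SKIP_NOT_LRU;
	}
	return ISOLATED;
}
```

In this model a second scanner that hits an already-marked block, or a
compound page, backs out with the LRU flag still set, so nothing needs
to be set back afterwards.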