Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp6575199ybv; Wed, 12 Feb 2020 15:12:48 -0800 (PST) X-Google-Smtp-Source: APXvYqxhJkKtVyxcKYmjwH0XYDb25Xp3cZygaYW70ZzVEyDFJheR06kIFZ9B8hUGyibjNCYpnpL8 X-Received: by 2002:aca:5258:: with SMTP id g85mr1091244oib.80.1581549168208; Wed, 12 Feb 2020 15:12:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581549168; cv=none; d=google.com; s=arc-20160816; b=mnLVFB52NfpN1+NHB4Y9EJDg54+Zn4dF42BGACPumq4TQrtN5jPvTx1nfctGOTs6pV py+pUCug+9kKG329Ekbqf+R7TN/c6gIaMDo5pDqiMV3sSBS7xofTFl8EmVrzTKnhkKRt SH0XyUrkenFh53VYbgy7hzLnvyK/CYQFnuJF98lfocAPoysex3bqs6LGQ+wYza1fZ9uz 31uUJyqpjOVr1JhquM8sZMFTHe1GpAX69dEKIzSUtBvdLpqp7UnxtmeehmBCU4Y/XNnX tvXtsWdztud1VAXoVjrclBI5YGJoGzn4BKW+fvUHM6HtXPsA5w+Ojb1ytdWky2yJpHk3 Cjcg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Tra7DkW4Ug7XOH+KIBkrjPoBiq7GSiJb0g4w8xR1qLU=; b=dWmicyLnL0ACA4fXJRL8p+aN2COzYI5HA5VXFoudVhbGj7Zf997k+Gplf4HkAj9oIF 9Qsk8qnjFSDvjFbUjxlpHMZhBv3KWCPmNnblgCZQS78l7Pro/ipCWLOcd23tEgmFUW2W bKozC4iVWIf4hDPLZI44JmV/jJCkP3A6pALVR3FHDoNtSDpyQWyggJxMGPmUutwfgAhz vbgWW6t0+c8logn7R1ka50pjxJK/xiuHS5nvto/T8ieDxrWRfO9E0uDlDidKgJee2JpV s/mbBaLkjah+QfKY0OCAt1tzd3Gqsk+cZ+zc8AS6BQUKcTZLqbl+XeHaSwxmToNtKypE TsoQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=Dp1Nk18Z; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r12si145130otq.156.2020.02.12.15.12.35; Wed, 12 Feb 2020 15:12:48 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=Dp1Nk18Z; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729213AbgBLXMQ (ORCPT + 99 others); Wed, 12 Feb 2020 18:12:16 -0500 Received: from mail-pl1-f196.google.com ([209.85.214.196]:37846 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729194AbgBLXMP (ORCPT ); Wed, 12 Feb 2020 18:12:15 -0500 Received: by mail-pl1-f196.google.com with SMTP id c23so1534738plz.4 for ; Wed, 12 Feb 2020 15:12:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=Tra7DkW4Ug7XOH+KIBkrjPoBiq7GSiJb0g4w8xR1qLU=; b=Dp1Nk18ZQOEKIuvucaP3mFLGhGNU+WNbsa6ok4EsqiH/zPfBTLJgNlur3OAyp2ZECq hNIYEHA7KfCg2SUnDHNLHt4GirxR8vHBz/M/k0KqkC7T9LF0SVz5cYeo+wlHwFTCf72N bNLaMvX/zYa1OC6eY3MpZnRMJPIa9MakGAMbOL9achr1odxxY4QegR7YJnjuZm4yw4In WyMujejgPqtlCsuHzuN1wUq8KGMcC2zMY8dj1/81hwZIF2e+4zn7d3jy/P0BsICq/ped hKEZXxQ8x5vs9Lu8oKhtpU5umlvvkyNZxLWGTDVWGdhpHQ+fQT636HoXr2Bukwc3R96D 3JjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=Tra7DkW4Ug7XOH+KIBkrjPoBiq7GSiJb0g4w8xR1qLU=; b=I5AmFXaQUQ2B/5PeArNwE9MBCZVyXi/fGurZUH4/EoCh6y0ABL/RiXZfo/eV5UQd/9 W8GwYFGSJcQK46t+bewvD/K+lRfWt2u2hoXntl25kcWwnCJMmCY7qlSJ7AOuXIVvB7ac W3n1EsTTjDoykpWGe96lg+gc6MtZEn/RT2MIcR7RPCAy+vewn7rmfNu17IvNQv6qyeUX C4/OJarHIF0g+TVtwtfvBlsQa4fC1gMb9/KoByZu6+5akZSnKS5/Vwyfwnlt0yf8zhHb U9ea9+RPKzy73nM/esVr8CfgScCzzo2o4heOVXasJUiuGJA5m4WaFSYiJ14MRmg+y6Ms xqTw== X-Gm-Message-State: APjAAAUXtnGZ2ROtGivCxNBCZDh6TaFnpgUZsWMQ8sTj4ysQuTltpQhy 5jvJ2ybx76H/n/s5ibEotsk= X-Received: by 2002:a17:90a:8547:: with SMTP id a7mr1763279pjw.0.1581549133392; Wed, 12 Feb 2020 15:12:13 -0800 (PST) Received: from google.com ([2620:15c:211:1:3e01:2939:5992:52da]) by smtp.gmail.com with ESMTPSA id z3sm283795pfz.155.2020.02.12.15.12.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Feb 2020 15:12:12 -0800 (PST) Date: Wed, 12 Feb 2020 15:12:10 -0800 From: Minchan Kim To: Andrew Morton Cc: Matthew Wilcox , Jan Kara , linux-mm , Josef Bacik , Johannes Weiner , LKML Subject: Re: [PATCH] mm: fix long time stall from mm_populate Message-ID: <20200212231210.GA233109@google.com> References: <20200211042536.GB242563@google.com> <20200211122323.GS8731@bombadil.infradead.org> <20200211163404.GC242563@google.com> <20200211172803.GA7778@bombadil.infradead.org> <20200211175731.GA185752@google.com> <20200212101804.GD25573@quack2.suse.cz> <20200212174015.GB93795@google.com> <20200212182851.GG7778@bombadil.infradead.org> <20200212195322.GA83146@google.com> <20200212142435.0b7e938fe8112fa35fcbcc71@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200212142435.0b7e938fe8112fa35fcbcc71@linux-foundation.org> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 12, 2020 at 02:24:35PM -0800, Andrew Morton wrote: > On Wed, 12 Feb 2020 11:53:22 -0800 Minchan Kim wrote: > > > > That's definitely wrong. It'll clear PageReclaim and then pretend it did > > > nothing wrong. > > > > > > return !PageWriteback(page) || > > > test_and_clear_bit(PG_reclaim, &page->flags); > > > > > > > Much better, Thanks for the review, Matthew! > > If there is no objection, I will send two patches to Andrew. > > One is PageReadahead strict, the other is limit retry from mm_populate. > > With much more detailed changelogs, please! > > This all seems rather screwy. if a page is under writeback then it is > uptodate and we should be able to fault it in immediately. Hi Andrew, This description in cover-letter will work? If so, I will add each part below in each patch. Subject: [PATCH 0/3] fixing mm_populate long stall I got several reports major page fault takes several seconds sometime. When I review drop mmap_sem in page fault hanlder, I found several bugs. CPU 1 CPU 2 mm_populate for () .. ret = populate_vma_page_range __get_user_pages faultin_page handle_mm_fault filemap_fault do_async_mmap_readahead shrink_page_list pageout SetPageReclaim(=SetPageReadahead) writepage SetPageWriteback if (PageReadahead(page)) maybe_unlock_mmap_for_io up_read(mmap_sem) page_cache_async_readahead() if (PageWriteback(page)) return; here, since ret from populate_vma_page_range is zero, the loop continue to run with same address with previous iteration. It will repeat the loop until the page's writeout is done(ie, PG_writeback or PG_reclaim clear). We could fix the above specific case via adding PageWriteback. IOW, ret = populate_vma_page_range ... ... filemap_fault do_async_mmap_readahead if (!PageWriteback(page) && PageReadahead(page)) maybe_unlock_mmap_for_io up_read(mmap_sem) page_cache_async_readahead() if (PageWriteback(page)) return; That's a thing [3/3] is fixing here. Even though it could fix the problem effectively, it has still livelock problem theoretically because the page of faulty address could be reclaimed and then allocated/become readahead marker on other CPUs during faulty process is retrying in mm_populate's loop. [2/3] is fixing the such livelock via limiting retry count. There is another hole for the livelock or hang of the process as well as ageWriteback - ra_pages. mm_populate for () .. ret = populate_vma_page_range __get_user_pages faultin_page handle_mm_fault filemap_fault do_async_mmap_readahead if (PageReadahead(page)) maybe_unlock_mmap_for_io up_read(mmap_sem) page_cache_async_readahead() if (!ra->ra_pages) return; It will repeat the loop until ra->ra_pages become non-zero. [1/3] is fixing the problem. Jan Kara (1): mm: Don't bother dropping mmap_sem for zero size readahead Minchan Kim (2): mm: fix long time stall from mm_populate mm: make PageReadahead more strict include/linux/page-flags.h | 28 ++++++++++++++++++++++++++-- mm/filemap.c | 2 +- mm/gup.c | 9 +++++++-- mm/readahead.c | 6 ------ 4 files changed, 34 insertions(+), 11 deletions(-) -- 2.25.0.225.g125e21ebc7-goog