Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp6309840ybv; Wed, 12 Feb 2020 09:44:52 -0800 (PST) X-Google-Smtp-Source: APXvYqzVMivMz0VcSV9/9Pn+0DtafNwbWBHx0IOQI+DTxkuOs4J0F+xhVJ5GaSAhXkUgknlf6p80 X-Received: by 2002:aca:514e:: with SMTP id f75mr129119oib.103.1581529491745; Wed, 12 Feb 2020 09:44:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581529491; cv=none; d=google.com; s=arc-20160816; b=j+uu0/psirdVIjjW9BODa5NwnQvaR7VSxUgaqSJ8Umnecsc59CyCSrPumiI34xi2Dp T3jKhJ8Atr9TprKN92srvHG9pDNhIHi6xg/esCPcp4eSpAHgfYer7ni7i9DcgmEZtY7c lh+tSvOEpveCME/ntZuJq0ZOdgewFMioCUNOzThm2v/AKSm/x8jblRklgMv6VIz9zQPs vk01i9Bk2wMhxB3WAqFs0PxwdNhLExfJON/oeCcjs67FVVI5ntEygPU5BXuKdmJ2Pj6t vg8Tl4/lyQ60tFxBTEM7dfLZhtqrMHqd+373j6ZEkwra3yvFgpnj/ai3PyvQNqaxU5a5 USxw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=rNICHkLbJTVgEdPQbhs6w63GjR1LcJl8un8NvPek2fI=; b=DKPOv0uQQyrhqyjWvbwJYDDZjJZcc27pvANWQ2JrbMQyAGluCf9M/9xHF88+lQnJ3B iO3IuEruPU7BuI5raOiVIRcOmG1yjIg+sLO5Qe4tGsQurmTIk1nnJORtuYG/7qp888Kh FSxgayMn2LntQBxrVtGuju4wlZWsC/U64mh3QnFTqDgJLJr2uWgMJuj55GQ50wQs92dd 4KT+o8wi6s77HKW63wbn/6y4+xXXP7c01pEHuAVOCDnUEqTlBjJGRARi1l+mKEOKhFAT sq54SizhSnt7C67tQ2l9fTvuRy9ZzJ0liujkmZRyC8SrB3Kl8qRYfAmWaniYZLjfMede FBOg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=OVshkgAz; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y24si3491058oih.24.2020.02.12.09.44.39; Wed, 12 Feb 2020 09:44:51 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=OVshkgAz; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728595AbgBLRnc (ORCPT + 99 others); Wed, 12 Feb 2020 12:43:32 -0500 Received: from mail-pj1-f68.google.com ([209.85.216.68]:56014 "EHLO mail-pj1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727372AbgBLRnc (ORCPT ); Wed, 12 Feb 2020 12:43:32 -0500 Received: by mail-pj1-f68.google.com with SMTP id d5so1172066pjz.5 for ; Wed, 12 Feb 2020 09:43:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=rNICHkLbJTVgEdPQbhs6w63GjR1LcJl8un8NvPek2fI=; b=OVshkgAzjxdXM1zg6t1n401th/FuOWTu2lrRPuKw7zBNq000hv/19b+VfT4Ijn60l8 x0BmHMooDj9t44s0XHjLm+lLLkME57AsQZd2Iv2XLekvOwivDnToEZ240V45ZBcaM6cw VVdfiy7EhgUQdisQ5dVIaac+GvpEHaelLz7YbCNlA6z7En89ZN0NYH5TkDngk3CTtRln nt7ak0675ltINM0z5A61TSI/GsDpKvu9xCwepvQMW1WUWXXHsNaljgJYvllmXDnfHaGK k07oNGnNfP6XVI/LcJ/H3Va9Qde+YsWoyiAC4xxDsVuDDxPfA/wEcxblpNFJ2ff0CYTG 59QQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=rNICHkLbJTVgEdPQbhs6w63GjR1LcJl8un8NvPek2fI=; b=cVekmEC0kyg4ZGUSjvZkWa0XfuWHHzxSpiFMU8FTDqHmkpkq5WUaAVTXb5u29yDXBO YnOWR6KoP9VbaOpMoDtb4m4NPUQOABywqDrww/qt7327tZcjeCSZZTOXUgVoeR8xrE6j McPJZWEKsxMDCb0MM4fecGcLtFUXuhBx6oXa3x6Pq6GwioTsiEZLAlvi3ezMcRi/zfW7 t3NNDa5vf7XL5reZmq6LQ27EA+rQg5gHpzFANPcBkocW20oCtbAt4iwvZ9ahF430VPhU JZU2KesRZXhcwOEqAkSu0UNSUyAedq8bk0ujm+H+88xRaxCO+2IAJ8mwiudtHoVuF2CZ xDrA== X-Gm-Message-State: APjAAAUWn19fIZWqzvGFHSCPLmkckVMhyUOBYGGR0qY3aUuXfTj6M9nJ 58kn93HtRBjBDUfUdiSVhrc= X-Received: by 2002:a17:902:7484:: with SMTP id h4mr9112516pll.206.1581529410937; Wed, 12 Feb 2020 09:43:30 -0800 (PST) Received: from google.com ([2620:15c:211:1:3e01:2939:5992:52da]) by smtp.gmail.com with ESMTPSA id l18sm1594393pfe.96.2020.02.12.09.43.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 12 Feb 2020 09:43:29 -0800 (PST) Date: Wed, 12 Feb 2020 09:43:28 -0800 From: Minchan Kim To: Jan Kara Cc: Andrew Morton , linux-mm , Josef Bacik , Johannes Weiner , LKML Subject: Re: [PATCH] mm: fix long time stall from mm_populate Message-ID: <20200212174328.GC93795@google.com> References: <20200211001958.170261-1-minchan@kernel.org> <20200212102206.GE25573@quack2.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200212102206.GE25573@quack2.suse.cz> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 12, 2020 at 11:22:06AM +0100, Jan Kara wrote: > On Mon 10-02-20 16:19:58, Minchan Kim wrote: > > Basically, fault handler releases mmap_sem before requesting readahead > > and then it is supposed to retry lookup the page from page cache with > > FAULT_FLAG_TRIED so that it avoids the live lock of infinite retry. > > > > However, what happens if the fault handler find a page from page > > cache and the page has readahead marker but are waiting under > > writeback? Plus one more condition, it happens under mm_populate > > which repeats faulting unless it encounters error. So let's assemble > > conditions below. > > > > __mm_populate > > for (...) > > get_user_pages(faluty_address) > > handle_mm_fault > > filemap_fault > > find a page form page(PG_uptodate|PG_readahead|PG_writeback) > > it will return VM_FAULT_RETRY > > continue with faulty_address > > > > IOW, it will repeat fault retry logic until the page will be written > > back in the long run. It makes big spike latency of several seconds. > > > > This patch solves the issue by turning off fault retry logic in second > > trial. > > > > Signed-off-by: Minchan Kim > > --- > > It was orignated from code review once I have seen several user reports > > but didn't confirm yet it's the root cause. > > Yes, I think the immediate problem is actually elsewhere but I agree that > __mm_populate() should follow the general protocol of retrying only once > so your change should make it more robust. The patch looks good to me, you > can add: > > Reviewed-by: Jan Kara Thanks for the review!