Date: Wed, 6 Nov 2019 13:03:15 +0100
From: Jan Kara
To: Johannes Weiner
Cc: Vlastimil Babka, snazy@snazy.de, Michal Hocko, Josef Bacik, Jan Kara,
	"Kirill A. Shutemov", Randy Dunlap, linux-kernel@vger.kernel.org,
	Linux MM, Andrew Morton, "Potyra, Stefan"
Subject: Re: mlockall(MCL_CURRENT) blocking infinitely
Message-ID: <20191106120315.GF16085@quack2.suse.cz>
References: <20191025121104.GH17610@dhcp22.suse.cz>
	<20191025132700.GJ17610@dhcp22.suse.cz>
	<707b72c6dac76c534dcce60830fa300c44f53404.camel@gmx.de>
	<20191025135749.GK17610@dhcp22.suse.cz>
	<20191025140029.GL17610@dhcp22.suse.cz>
	<20191105182211.GA33242@cmpxchg.org>
In-Reply-To: <20191105182211.GA33242@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 05-11-19 13:22:11, Johannes Weiner wrote:
> On Tue, Nov 05, 2019 at 04:28:21PM +0100, Vlastimil Babka wrote:
> > On 11/5/19 2:23 PM, Robert Stupp wrote:
> > > "git bisect" led to a result.
> > >
> > > The offending merge commit is f91f2ee54a21404fbc633550e99d69d14c2478f2
> > > "Merge branch 'akpm' (rest of patches from Andrew)".
> > >
> > > The first bad commit in the merged series of commits is
> > > https://github.com/torvalds/linux/commit/6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11
> > > . a75d4c33377277b6034dd1e2663bce444f952c14, the commit before 6b4c9f44,
> > > is good.
> >
> > Ah, great you could bisect this. CCing people from the commit
> > 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
>
> Judging from Robert's stack captures, the task is not hung but
> busy-looping in __mm_populate(). AFAICS, the only way this can occur
> is if populate_vma_page_range() returns 0 and we don't advance the
> iteration position (if it returned an error, we wouldn't reset nend
> and move on to the next vma as ignore_errors is 1 for mlockall.)
>
> populate_vma_page_range() returns 0 when the first page is not found
> and faultin_page() returns -EBUSY (if it were processing pages, or if
> the error from faultin_page() would be a different one, we would
> return the number of pages processed or -error).
>
> faultin_page() returns -EBUSY when VM_FAULT_RETRY is set, i.e. we
> dropped the mmap_sem in order to initiate IO and require a retry. That
> is consistent with the bisect result (new VM_FAULT_RETRY conditions).
>
> At this point, regular page fault would retry with FAULT_FLAG_TRIED to
> indicate that the mmap_sem cannot be dropped a second time. But this
> mlock path doesn't set that flag and we can loop repeatedly. That is
> something we probably need to fix with a FOLL_TRIED somewhere.

It seems we could use __get_user_pages_locked() for that in
populate_vma_page_range() if we were guaranteed that mm stays alive. This is
guaranteed for current->mm cases but there seem to be some callers of
populate_vma_page_range() where mm could indeed go away once we drop
mmap_sem. These luckily pass NULL for the 'nonblocking' parameter though, so
all call sites seem to be fine, but it would be fragile...

> What I don't quite understand yet is why the fault path doesn't make
> progress eventually. We must drop the mmap_sem without changing the
> state in any way. How can we keep looping on the same page?

That may be a slight suboptimality with Josef's patches. If the page is
marked as PageReadahead, we always drop mmap_sem if we can and start
readahead in do_async_mmap_readahead() without checking whether that makes
sense or not. OTOH page_cache_async_readahead() then clears PageReadahead,
so the only way I can see we could loop like this is when
file->ra->ra_pages is 0. Not sure if that's what's happening though. We'd
need to find which of the paths in filemap_fault() calls
maybe_unlock_mmap_for_io() to tell more.

								Honza
-- 
Jan Kara
SUSE Labs, CR
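
For illustration only, here is a small user-space model of the loop discussed
above. It is not kernel code: every model_* name and the pass cap are
hypothetical, and the real __mm_populate() / populate_vma_page_range() logic
is considerably more involved. The sketch assumes a fault source that always
asks for a retry on the first attempt, and shows that a populate loop which
gets 0 back and does not advance will spin, while remembering "already tried
once" (in the spirit of FAULT_FLAG_TRIED / FOLL_TRIED) lets it make progress.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EBUSY 16	/* stand-in for EBUSY; value is irrelevant here */

/*
 * Hypothetical stand-in for faultin_page(): -MODEL_EBUSY models the
 * VM_FAULT_RETRY case. Once 'tried' is set, the retry is not allowed
 * again and the fault completes.
 */
static int model_faultin_page(bool tried, bool always_retry)
{
	if (always_retry && !tried)
		return -MODEL_EBUSY;
	return 0;
}

/*
 * Hypothetical stand-in for populate_vma_page_range(): returns the number
 * of pages populated, which is 0 if the very first page asks for a retry.
 */
static long model_populate_range(long start, long end, bool tried,
				 bool always_retry)
{
	long done = 0;

	for (long addr = start; addr < end; addr++) {
		if (model_faultin_page(tried, always_retry) == -MODEL_EBUSY)
			return done;
		done++;
	}
	return done;
}

/*
 * Model of the populate loop. Returns the number of passes needed, or -1
 * if the loop is still not finished after max_passes (i.e. it would spin).
 * With use_tried == false the position never advances on a 0 return.
 */
static long model_mm_populate(long start, long end, bool use_tried,
			      bool always_retry, long max_passes)
{
	long pos = start, passes = 0;
	bool tried = false;

	assert(start <= end && max_passes > 0);

	while (pos < end) {
		if (++passes > max_passes)
			return -1;

		long ret = model_populate_range(pos, end, tried, always_retry);
		if (ret == 0) {
			/* No progress: retry the same position. */
			if (use_tried)
				tried = true;
			continue;
		}
		pos += ret;
		tried = false;
	}
	return passes;
}
```

In this model, model_mm_populate(0, 4, false, true, 1000) hits the pass cap
and returns -1, whereas the same call with use_tried set to true finishes in
two passes.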