Date: Thu, 22 Nov 2018 10:05:47 +0100
From: Michal Hocko
To: Hugh Dickins
Cc: linux-mm@kvack.org, Andrew Morton, Oscar Salvador, Pavel Tatashin, David Hildenbrand, LKML, "Kirill A. Shutemov"
Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page
Message-ID: <20181122090547.GD18011@dhcp22.suse.cz>

On Wed 21-11-18 18:27:11, Hugh Dickins wrote:
> On Wed, 21 Nov 2018, Michal Hocko wrote:
> > On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > > On Tue, 20 Nov 2018, Michal Hocko wrote:
> > > >
> > > > From: Michal Hocko
> > > >
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct, it
> > > > can also influence page migration, which bails out when it sees an
> > > > elevated reference count. The faultaround code would bail on seeing
> > > > a locked page anyway, so we can proactively check the PageLocked bit
> > > > before page_cache_get_speculative and prevent pointless reference
> > > > count churn.
> > > >
> > > > Cc: "Kirill A. Shutemov"
> > > > Suggested-by: Jan Kara
> > > > Signed-off-by: Michal Hocko
> > >
> > > Acked-by: Hugh Dickins
> >
> > Thanks!
> >
> > > though I think this patch is more useful to avoid atomic ops,
> > > and unnecessary dirtying of the cacheline, than to avoid the very
> > > transient elevation of refcount, which will not affect page migration
> > > very much.
> >
> > Are you sure it would really be transient? In other words, is it possible
> > that the fault around can block migration repeatedly under a refault-heavy
> > workload? I just couldn't convince myself, to be honest.
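A small user-space model may help picture the ordering the changelog proposes. It is only a sketch under stated assumptions:

- `struct toy_page`, `toy_get_speculative()`, `toy_trylock()` and `toy_fault_around()` are hypothetical stand-ins invented for illustration. They are not the kernel's `struct page`, `page_cache_get_speculative()` or `filemap_map_pages()`.
- C11 atomics stand in for the kernel's page flag and refcount helpers.

What it shows is that testing the lock bit first means a locked page never sees the speculative increment/decrement pair.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page (NOT the kernel layout). */
struct toy_page {
	atomic_int refcount;  /* plays the role of page_count() */
	atomic_bool locked;   /* plays the role of PG_locked */
	bool mapped;          /* set when we "install a PTE" for the page */
};

/* Take a reference only if the page is still live (refcount > 0), in the
 * spirit of page_cache_get_speculative(); hypothetical helper. */
static bool toy_get_speculative(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static bool toy_trylock(struct toy_page *p)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&p->locked, &expected, true);
}

static void toy_unlock(struct toy_page *p)
{
	atomic_store(&p->locked, false);
}

/* Map every page in [0, n) that we can lock; returns the number mapped.
 * *refs_taken counts speculative reference increments. */
static size_t toy_fault_around(struct toy_page *pages, size_t n,
			       size_t *refs_taken)
{
	size_t mapped = 0;

	*refs_taken = 0;
	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		/* Check for a locked page first: it would be skipped anyway,
		 * so do not touch its refcount at all. */
		if (atomic_load(&p->locked))
			continue;
		if (!toy_get_speculative(p))
			continue;
		(*refs_taken)++;
		if (!toy_trylock(p)) {
			/* Locked after our check: drop the speculative reference. */
			atomic_fetch_sub(&p->refcount, 1);
			continue;
		}
		p->mapped = true;  /* the reference is kept for the new mapping */
		mapped++;
		toy_unlock(p);
	}
	return mapped;
}
```

Consider three pages where the middle one is locked, for instance by a migration in progress. `toy_fault_around()` maps the other two and leaves the locked page's reference count untouched. This is also where Hugh's point about saved atomic operations and cacheline dirtying shows up: the skipped page costs only a read of its lock flag.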
>
> I don't deny that it is possible: I expect that, using fork() (which does
> not copy the ptes in a shared file vma), you can construct a test case
> where each child faults one or another page near a page of no interest,
> and that page of no interest is a target of migration perpetually
> frustrated by filemap_map_pages()'s briefly raised refcount.

The other issue I am debugging, which very likely has the same
underlying cause in the end, has shown this pattern:

[ 883.930477] rac1 kernel: page:ffffea2084bf5cc0 count:1889 mapcount:1887 mapping:ffff8833c82c9ad8 index:0x6b
[ 883.930485] rac1 kernel: ext4_da_aops [ext4]
[ 883.930497] rac1 kernel: name:"libc-2.22.so"
[ 883.931241] rac1 kernel: do_migrate_range done ret=23

After we disabled faultaround, the failure moved to a different page,
but libc has not shown up again. This might be a matter of (bad) luck
and timing, but we thought it not too unlikely that faultaround would
strike on such a shared page.

> But I suggest that's a third-order effect: well worth fixing because
> it's easily and uncontroversially dealt with, as you have; but not of
> great importance.
>
> The first-order effect is migration conspiring to defeat itself: that's
> what my put_and_wait_on_page_locked() patch, in other thread, is about.

Yes. That is obviously a much more effective fix.

> The second order effect is when a page that is really wanted is waited
> on - the target of a fault, for which page refcount is raised maybe
> long before it finally gets into the page table (whereupon it becomes
> visible to try_to_unmap(), and its mapcount matches refcount so that
> migration can fully account for the page). One class of that can be
> well dealt with by using put_and_wait_on_page_locked_killable() in
> lock_page_or_retry(), but I was keeping that as a future instalment.
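Hugh's point about migration "fully accounting" for a page can be modeled as a reference-count balance check. The sketch below is a simplification under stated assumptions:

- `struct toy_page_counts`, `toy_expected_refs()`, `toy_unexplained_refs()` and `toy_can_migrate()` are hypothetical.
- The references treated as expected are one per mapping, one for the page cache, one for private data such as buffer heads, and one held by the migrating caller. This only approximates what the kernel's migration code checks.

Any reference outside that set, such as one taken by a fault whose PTE is not yet installed, makes the check fail.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a page's counters (not struct page). */
struct toy_page_counts {
	int refcount;
	int mapcount;
	bool in_page_cache;
	bool has_private;  /* e.g. buffer heads attached */
};

/* References migration can account for in this model: one per mapping
 * (dropped by unmapping), one for the page cache, one for private data
 * and one held by the migrating caller itself. */
static int toy_expected_refs(const struct toy_page_counts *c)
{
	return c->mapcount + (c->in_page_cache ? 1 : 0) +
	       (c->has_private ? 1 : 0) + 1;
}

/* References nobody in the model can explain, e.g. a fault that has
 * raised the refcount but not yet installed its PTE. */
static int toy_unexplained_refs(const struct toy_page_counts *c)
{
	return c->refcount - toy_expected_refs(c);
}

/* Migration in this model proceeds only if every reference is accounted for. */
static bool toy_can_migrate(const struct toy_page_counts *c)
{
	return toy_unexplained_refs(c) == 0;
}
```

Take a page in this model that is mapped three times, is in the page cache, has no private data and carries the caller's own reference. It has a refcount of 5 and can migrate. One more reference from an in-flight fault leaves one reference unexplained, and the attempt fails until that reference is dropped or turned into a mapping.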
>
> But I shouldn't denigrate the transient case by referring so lightly
> to migrate_pages()' 10 attempts: each of those failed attempts can
> be very expensive, unmapping and TLB flushing (including IPIs) and
> remapping. It may well be that 2 or 3 would be a more cost-effective
> number of attempts, at least when the page is mapped.

If you want an update to the comment in this function or to the
changelog, I am open to it, of course. Right now I have

+	 * Check for a locked page first, as a speculative
+	 * reference may adversely influence page migration.

as suggested by William.

-- 
Michal Hocko
SUSE Labs
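The cost trade-off in the last quoted paragraph can be sketched as a bounded retry loop. This is hypothetical in two ways:

- `toy_migrate_with_retries()`, `struct toy_retry_stats`, `toy_always_pinned()` and `toy_pinned_until()` are illustrative names.
- Each failed attempt is merely counted rather than actually unmapping, flushing TLBs and remapping.

It shows why the attempt limit matters when a page stays pinned: every failure is paid for in full. In the perpetually pinned case, a limit of 10 wastes 10 such cycles where a limit of 3 wastes 3. A pin released after the first attempt succeeds under either limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* One hypothetical migration attempt; returns true on success. In the real
 * kernel a failed attempt may already have unmapped the page, flushed TLBs
 * (with IPIs) and remapped it; here it is only counted. */
typedef bool (*toy_attempt_fn)(void *ctx, int attempt);

struct toy_retry_stats {
	int attempts;       /* attempts made, including a successful one */
	int wasted_cycles;  /* failed attempts: unmap/flush/remap for nothing */
	bool migrated;
};

static struct toy_retry_stats
toy_migrate_with_retries(toy_attempt_fn fn, void *ctx, int max_attempts)
{
	struct toy_retry_stats s = { 0, 0, false };

	for (int i = 0; i < max_attempts; i++) {
		s.attempts++;
		if (fn(ctx, i)) {
			s.migrated = true;
			break;
		}
		s.wasted_cycles++;
	}
	return s;
}

/* A page pinned by an extra reference on every attempt. */
static bool toy_always_pinned(void *ctx, int attempt)
{
	(void)ctx;
	(void)attempt;
	return false;
}

/* A transient pin: the extra reference is gone from attempt *ctx onward. */
static bool toy_pinned_until(void *ctx, int attempt)
{
	return attempt >= *(const int *)ctx;
}
```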