Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp446979imu; Tue, 11 Dec 2018 01:42:02 -0800 (PST) X-Google-Smtp-Source: AFSGD/XBL9aps+lIfiggnqwUynCv4axouUN+W9+rzwjh2gcHU+pGJxIHqahLPfssyXvBtoHUAqLl X-Received: by 2002:a17:902:6bc7:: with SMTP id m7mr15702075plt.106.1544521322235; Tue, 11 Dec 2018 01:42:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544521322; cv=none; d=google.com; s=arc-20160816; b=s1OXhf2ZAwfEEDlElaoAdO3ax9A8wPSDmlKHpwPNqqRGoL57yUHEl+RUHslupsNrPu f6bQ2j4LtR3Ne/5iuiP6rok5HJQZdjYVyQgAMO4tKkV13vKVieu+VOXMDaIpYjqNgbma vCY4t4GuS21wDgmLVV2Mzcb1UGDf0i+/uTvFxlMtowLb2m3GBN7JDTIU+BPJA7FrDKrB lpMjaIrOUhYiYqKG4xTqx2l9mIfTZ4oRf+9VJDCY+J1qERdC+8P9QfbC3d8FDDhRMCVe BMvE/y8ZYv5eQx9/6k+Z6KlYB+jGSCW5IYkYXfda8FEUjcLv1UJsJzMmCYhL7cjn2VWU bvoQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=Ce2cFHVYqx3M4t2nYHF0hNdL7lv8PEEva34lx3v2nNw=; b=kZE3dKMt/16FA9AezofpE+feePQXEvOOWT55pcoJ5NmVelNbZCz/C/EKqtHhNv3v1A o2hi/WRsh8OWT9A8tpX3lKKZWkCD8/g46GWEVJ0eWCP+/NoYs8FCX0nJDSDOD4nFOTlo FDw9RRSauMgH5QWjtBdGXDBDHO7qU1Gn5gE+U9CPHsSPvnVvTRRHkKQPTZvBiSaMqPlk qht39woC8njhtd2rf+w3JN4LET0dLkz13SQ8x6zqMDbs3kCtIVhiDQzzwpVMGDeWFQmT NhskEzZRPBN6tgQzUj+W25cKQQG8UM1FD1DEL0kXBBQPSJ/yjOkNWT29eE75kRCWojDn 3ExQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x64si12596440pfb.120.2018.12.11.01.41.46; Tue, 11 Dec 2018 01:42:02 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726253AbeLKJkh (ORCPT + 99 others); Tue, 11 Dec 2018 04:40:37 -0500 Received: from mx2.suse.de ([195.135.220.15]:44304 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726085AbeLKJkh (ORCPT ); Tue, 11 Dec 2018 04:40:37 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 45934AE03; Tue, 11 Dec 2018 09:40:35 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 33C751E13F8; Tue, 11 Dec 2018 10:40:34 +0100 (CET) Date: Tue, 11 Dec 2018 10:40:34 +0100 From: Jan Kara To: Josef Bacik Cc: Jan Kara , kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com Subject: Re: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations Message-ID: <20181211094034.GD17539@quack2.suse.cz> References: <20181130195812.19536-1-josef@toxicpanda.com> <20181130195812.19536-4-josef@toxicpanda.com> <20181207110138.GE13008@quack2.suse.cz> <20181210184438.va7mdwjgwndgri4s@macbook-pro-91.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181210184438.va7mdwjgwndgri4s@macbook-pro-91.dhcp.thefacebook.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 10-12-18 13:44:39, Josef Bacik wrote: > On Fri, Dec 07, 2018 at 12:01:38PM +0100, Jan Kara wrote: > > On Fri 30-11-18 14:58:11, Josef Bacik wrote: > > > @@ -2433,9 +2458,32 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) > > > return vmf_error(-ENOMEM); > > > } > > > > > > - if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { > > > - put_page(page); > > > - return ret | VM_FAULT_RETRY; > > > + /* > > > + * We are open-coding lock_page_or_retry here because we want to do the > > > + * readpage if necessary while the mmap_sem is dropped. If there > > > + * happens to be a lock on the page but it wasn't being faulted in we'd > > > + * come back around without ALLOW_RETRY set and then have to do the IO > > > + * under the mmap_sem, which would be a bummer. > > > + */ > > > > Hum, lock_page_or_retry() has two callers and you've just killed one. I > > think it would be better to modify the function to suit both callers rather > > than opencoding? Maybe something like lock_page_maybe_drop_mmap() which > > would unconditionally acquire the lock and return whether it has dropped > > mmap sem or not? Callers can then decide what to do. > > > > I tried this, but it ends up being convoluted, since swap doesn't have a file to > pin we have to add extra cases for that, and then change the return value to > indicate wether we locked the page _and_ dropped the mmap sem, or just locked > the page, etc. It didn't seem the extra complication, so I just broke the open > coding out into its own helper. OK. Thanks for looking into this! > > BTW I'm not sure this complication is really worth it. The "drop mmap_sem > > for IO" is never going to be 100% thing if nothing else because only one > > retry is allowed in do_user_addr_fault(). So the second time we get to > > filemap_fault(), we will not have FAULT_FLAG_ALLOW_RETRY set and thus do > > blocking locking. So I think your code needs to catch common cases you > > observe in practice but not those super-rare corner cases... > > I had counters in all of these paths because I was sure some things weren't > getting hit at all, but it turns out each of these cases gets hit with > surprisingly high regularity. Cool! Could you share these counters? I'd be curious and they'd be actually nice as a motivation in the changelog of this patch to show the benefit. > The lock_page_or_retry() case in particular gets hit a lot with > multi-threaded applications that got paged out because of heavy memory > pressure. By no means is it as high as just the normal readpage or > readahead cases, but it's not 0, so I'd rather have the extra helper here > to make sure we're never getting screwed. Do you mean the case where we the page is locked in filemap_fault() (so that lock_page_or_retry() bails after waiting) and when the page becomes unlocked it is not uptodate? Because that is the reason why you opencode lock_page_or_retry(), right? I'm not aware of any normal code path that would create page in page cache and not try to fill it with data before unlocking it so that's why I'm really trying to make sure we understand each other. Honza -- Jan Kara SUSE Labs, CR