Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp8814840imu; Tue, 4 Dec 2018 14:51:30 -0800 (PST) X-Google-Smtp-Source: AFSGD/XoGdKpk02Lk0xkNgVOXlKYRiYNuJ6tKtLMRQByLqTu+dns9DCk9VutFtmTQXP25Z+nPQPc X-Received: by 2002:a17:902:7686:: with SMTP id m6mr22412673pll.179.1543963890616; Tue, 04 Dec 2018 14:51:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543963890; cv=none; d=google.com; s=arc-20160816; b=NuI/qUAFBrCvILlAi/fFtnnZydN0RYw4AMwPBlEa2zF8NmaVfk4mNfH7yXP8ayvQAC 0t93K8wbUtaSjV3zEK0v8wDsl+DDBYPcPggL3ldgvQxtLBog+gU56QdkKWTL6LYt7gCU kF3pILZT6MReTJth+8Z86w/HqwFfwHj7tH3+YASxKv3eYifzdym9cyNCfadtXrMRtbaU 6wSXwzPq6D9sSQe+cybdGClXvMAq/nyZqmQFFBJ+oy0IS/QJCC4CCWsIuXF+czdfh5hC DEiuPgk97d02+qvdqf774ZhCOdU+QVCykAVKmxP0MePOZ0f/S6lZZRfPkENoWiBErNbs vj0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=sMRFePMUHuYy6uVpPgJos/yrWKGg+L/nPaJ7bmDKR2w=; b=bTDT0t88Gg/qENt3ZsrIeptNSGTwTLHGWMi71qbSCNsvUd+OskIakHaA/yqGCMGIRf gMcYZlHYP76zlPJ07FXyoT8ese/E2OGr3gYSXFGck24BCYBUNrOnFm8yKq6bX5r0gvaf SqcJ9R5zspfM3cWiBpUZyE0wd0pskb2OGIWK9k8AK6z7IG5yy/gAnEX/rOs0IfOjqKfu rYF4YJ0SKKtmqE2SftQquYKlHtqBKn5dx4m/5QwAXRgc635eJLybSiM42jZdnwAad4DY PJZ4kfTYUtKzvzm/7syUOdlzbLrb7Ii2Nrbe9EcjtMcGN//+WlZy4DB7cxCntbn09b5V vcWg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u5si16420451pgi.146.2018.12.04.14.51.15; Tue, 04 Dec 2018 14:51:30 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726662AbeLDWuX (ORCPT + 99 others); Tue, 4 Dec 2018 17:50:23 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:47336 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725886AbeLDWuX (ORCPT ); Tue, 4 Dec 2018 17:50:23 -0500 Received: from akpm3.svl.corp.google.com (unknown [104.133.8.65]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id A2D87A47; Tue, 4 Dec 2018 22:50:21 +0000 (UTC) Date: Tue, 4 Dec 2018 14:50:17 -0800 From: Andrew Morton To: Josef Bacik Cc: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, david@fromorbit.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, riel@redhat.com, jack@suse.cz Subject: Re: [PATCH 3/4] filemap: drop the mmap_sem for all blocking operations Message-Id: <20181204145017.62d952c2a209975aa5888acf@linux-foundation.org> In-Reply-To: <20181130195812.19536-4-josef@toxicpanda.com> References: <20181130195812.19536-1-josef@toxicpanda.com> <20181130195812.19536-4-josef@toxicpanda.com> X-Mailer: Sylpheed 3.6.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 30 Nov 2018 14:58:11 -0500 Josef Bacik wrote: > Currently we only drop the mmap_sem if there is contention on the page > lock. The idea is that we issue readahead and then go to lock the page > while it is under IO and we want to not hold the mmap_sem during the IO. > > The problem with this is the assumption that the readahead does > anything. In the case that the box is under extreme memory or IO > pressure we may end up not reading anything at all for readahead, which > means we will end up reading in the page under the mmap_sem. Please describe here why this is considered to be a problem. Application stalling, I assume? Some description of in-the-field observations would be appropriate. ie, how serious is the problem whcih is being addressed. > Instead rework filemap fault path to drop the mmap sem at any point that > we may do IO or block for an extended period of time. This includes > while issuing readahead, locking the page, or needing to call ->readpage > because readahead did not occur. Then once we have a fully uptodate > page we can return with VM_FAULT_RETRY and come back again to find our > nicely in-cache page that was gotten outside of the mmap_sem. > > ... > > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2304,28 +2304,44 @@ EXPORT_SYMBOL(generic_file_read_iter); > > #ifdef CONFIG_MMU > #define MMAP_LOTSAMISS (100) > +static struct file *maybe_unlock_mmap_for_io(struct file *fpin, > + struct vm_area_struct *vma, > + int flags) > +{ > + if (fpin) > + return fpin; > + if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == > + FAULT_FLAG_ALLOW_RETRY) { > + fpin = get_file(vma->vm_file); > + up_read(&vma->vm_mm->mmap_sem); > + } > + return fpin; > +} A code comment would be nice. What it does and, especially, why it does it. > - if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { > - put_page(page); > - return ret | VM_FAULT_RETRY; > + /* > + * We are open-coding lock_page_or_retry here because we want to do the > + * readpage if necessary while the mmap_sem is dropped. If there > + * happens to be a lock on the page but it wasn't being faulted in we'd > + * come back around without ALLOW_RETRY set and then have to do the IO > + * under the mmap_sem, which would be a bummer. Expanding on "a bummer" would help here ;) > + */ > + if (!trylock_page(page)) { > + fpin = maybe_unlock_mmap_for_io(fpin, vmf->vma, vmf->flags); > + if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) > + goto out_retry; > + if (vmf->flags & FAULT_FLAG_KILLABLE) { > + if (__lock_page_killable(page)) { > + /* > + * If we don't have the right flags for > + * maybe_unlock_mmap_for_io to do it's thing we "its" > + * still need to drop the sem and return > + * VM_FAULT_RETRY so the upper layer checks the > + * signal and takes the appropriate action. > + */ > + if (!fpin) > + up_read(&vmf->vma->vm_mm->mmap_sem); > + goto out_retry; > + } > + } else > + __lock_page(page); > } > > /* Did it get truncated? */