From: Vivek Goyal
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, virtio-fs@redhat.com
Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, dan.j.williams@intel.com, Jan Kara, Vishal L Verma, "Weiny, Ira"
Subject: [PATCH v3 02/18] dax: Create a range version of dax_layout_busy_page()
Date: Wed, 19 Aug 2020 18:19:40 -0400
Message-Id: <20200819221956.845195-3-vgoyal@redhat.com>
In-Reply-To: <20200819221956.845195-1-vgoyal@redhat.com>
References: <20200819221956.845195-1-vgoyal@redhat.com>

The virtiofs device has a range of memory which is mapped into file
inodes using dax. This memory is mapped in qemu on the host and maps
different sections of the real file on the host. The size of this memory
is limited (determined by the administrator) and, depending on
filesystem size, we will soon reach a situation where all the memory is
in use and we need to reclaim some.

As part of the reclaim process, we need to make sure that there are no
active references to pages (taken by get_user_pages()) in the memory
range we are trying to reclaim. I am planning to use
dax_layout_busy_page() for this. But in its current form it is per inode
and scans through all the pages of the inode.

We want to reclaim only a portion of memory (say, a 2MB page). So we
only want to make sure that this 2MB range of pages does not have any
references (and we don't want to unmap all the pages of the inode).
Hence, create a range version of this function,
dax_layout_busy_page_range(), which takes the range that needs to be
unmapped.

Cc: Dan Williams
Cc: linux-nvdimm@lists.01.org
Cc: Jan Kara
Cc: Vishal L Verma
Cc: "Weiny, Ira"
Signed-off-by: Vivek Goyal
---
 fs/dax.c            | 29 +++++++++++++++++++++++------
 include/linux/dax.h |  6 ++++++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 95341af1a966..ddd705251d9f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -559,7 +559,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
 }

 /**
- * dax_layout_busy_page - find first pinned page in @mapping
+ * dax_layout_busy_page_range - find first pinned page in @mapping
  * @mapping: address space to scan for a page with ref count > 1
  *
  * DAX requires ZONE_DEVICE mapped pages. These pages are never
@@ -572,13 +572,19 @@ static void *grab_mapping_entry(struct xa_state *xas,
  * establishment of new mappings in this address_space. I.e. it expects
  * to be able to run unmap_mapping_range() and subsequently not race
  * mapping_mapped() becoming true.
+ *
+ * Partial pages are included. If 'end' is LLONG_MAX, pages in the range
+ * from 'start' to end of the file are included.
  */
-struct page *dax_layout_busy_page(struct address_space *mapping)
+struct page *dax_layout_busy_page_range(struct address_space *mapping,
+					loff_t start, loff_t end)
 {
-	XA_STATE(xas, &mapping->i_pages, 0);
 	void *entry;
 	unsigned int scanned = 0;
 	struct page *page = NULL;
+	pgoff_t start_idx = start >> PAGE_SHIFT;
+	pgoff_t end_idx;
+	XA_STATE(xas, &mapping->i_pages, start_idx);

 	/*
 	 * In the 'limited' case get_user_pages() for dax is disabled.
@@ -589,6 +595,11 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 	if (!dax_mapping(mapping) || !mapping_mapped(mapping))
 		return NULL;

+	/* If end == LLONG_MAX, include all pages from start to end of file */
+	if (end == LLONG_MAX)
+		end_idx = ULONG_MAX;
+	else
+		end_idx = end >> PAGE_SHIFT;
 	/*
 	 * If we race get_user_pages_fast() here either we'll see the
 	 * elevated page count in the iteration and wait, or
@@ -596,15 +607,15 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 	 * against is no longer mapped in the page tables and bail to the
 	 * get_user_pages() slow path. The slow path is protected by
 	 * pte_lock() and pmd_lock(). New references are not taken without
-	 * holding those locks, and unmap_mapping_range() will not zero the
+	 * holding those locks, and unmap_mapping_pages() will not zero the
 	 * pte or pmd without holding the respective lock, so we are
 	 * guaranteed to either see new references or prevent new
 	 * references from being established.
 	 */
-	unmap_mapping_range(mapping, 0, 0, 0);
+	unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0);

 	xas_lock_irq(&xas);
-	xas_for_each(&xas, entry, ULONG_MAX) {
+	xas_for_each(&xas, entry, end_idx) {
 		if (WARN_ON_ONCE(!xa_is_value(entry)))
 			continue;
 		if (unlikely(dax_is_locked(entry)))
@@ -625,6 +636,12 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
 	xas_unlock_irq(&xas);
 	return page;
 }
+EXPORT_SYMBOL_GPL(dax_layout_busy_page_range);
+
+struct page *dax_layout_busy_page(struct address_space *mapping)
+{
+	return dax_layout_busy_page_range(mapping, 0, LLONG_MAX);
+}
 EXPORT_SYMBOL_GPL(dax_layout_busy_page);

 static int __dax_invalidate_entry(struct address_space *mapping,
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 6904d4e0b2e0..9016929db4c6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -141,6 +141,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 		struct dax_device *dax_dev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
+struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
 dax_entry_t dax_lock_page(struct page *page);
 void dax_unlock_page(struct page *page, dax_entry_t cookie);
 #else
@@ -171,6 +172,11 @@ static inline struct page *dax_layout_busy_page(struct address_space *mapping)
 	return NULL;
 }

+static inline struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end)
+{
+	return NULL;
+}
+
 static inline int dax_writeback_mapping_range(struct address_space *mapping,
 		struct dax_device *dax_dev, struct writeback_control *wbc)
 {
-- 
2.25.4
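
For anyone checking the index arithmetic in the patch above, here is a
minimal user-space sketch of how the byte range is turned into page
indexes and a page count. It is not kernel code. The sketch_* names and
the 4 KiB page size are assumptions made for illustration, and 'end' is
treated as an inclusive byte offset, which is how I read the patch.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical user-space stand-ins for the kernel's pgoff_t and loff_t. */
typedef unsigned long sketch_pgoff_t;
typedef long long sketch_loff_t;

struct sketch_range {
	sketch_pgoff_t start_idx;	/* first page index in the range */
	sketch_pgoff_t end_idx;		/* last page index in the range (inclusive) */
	sketch_pgoff_t nr_pages;	/* end_idx - start_idx + 1, may wrap to 0 */
};

/*
 * Mirror the arithmetic from the patch: shift byte offsets down to page
 * indexes, and map end == LLONG_MAX to ULONG_MAX ("until end of file").
 */
static struct sketch_range sketch_byte_range_to_pages(sketch_loff_t start,
						      sketch_loff_t end)
{
	struct sketch_range r;

	/* Precondition of this sketch only; the patch does not check it. */
	assert(start >= 0 && end >= start);

	r.start_idx = (sketch_pgoff_t)(start >> SKETCH_PAGE_SHIFT);
	if (end == LLONG_MAX)
		r.end_idx = ULONG_MAX;
	else
		r.end_idx = (sketch_pgoff_t)(end >> SKETCH_PAGE_SHIFT);

	/* Unsigned arithmetic: wraps to 0 when the range is 0..ULONG_MAX. */
	r.nr_pages = r.end_idx - r.start_idx + 1;
	return r;
}
```

With these assumptions, a 2 MiB range starting at 4 MiB (start = 4 MiB,
end = 6 MiB - 1) covers page indexes 1024 to 1535, that is 512 pages.
For the whole-file case (start = 0, end = LLONG_MAX) the count wraps to
0, which, as far as I understand unmap_mapping_pages(), it treats as
"unmap to end of file".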