From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
To: linux-kernel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH v14 127/138] mm: Use multi-index entries in the page cache
Date: Thu, 15 Jul 2021 04:36:53 +0100
Message-Id: <20210715033704.692967-128-willy@infradead.org>
In-Reply-To: <20210715033704.692967-1-willy@infradead.org>
References: <20210715033704.692967-1-willy@infradead.org>

We currently store order-N THPs as 2^N consecutive entries.  While this
consumes rather more memory than necessary, it also turns out to be buggy.
A writeback operation which starts in the middle of a dirty THP will not
notice as the dirty bit is only set on the head index.  With multi-index
entries, the dirty bit will be found no matter where in the THP the
iteration starts.  This does end up simplifying the page cache slightly,
although not as much as I had hoped.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 10 -------
 mm/filemap.c            | 63 +++++++++++++++++++++++++----------------
 mm/huge_memory.c        | 20 ++++++++++---
 mm/khugepaged.c         | 12 +++++++-
 mm/migrate.c            |  8 ------
 mm/shmem.c              | 11 ++-----
 6 files changed, 68 insertions(+), 56 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index bf8e978a48f2..25b1bf3b1cdb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1078,16 +1078,6 @@ static inline unsigned int __readahead_batch(struct readahead_control *rac,
 		VM_BUG_ON_PAGE(PageTail(page), page);
 		array[i++] = page;
 		rac->_batch_count += thp_nr_pages(page);
-
-		/*
-		 * The page cache isn't using multi-index entries yet,
-		 * so the xas cursor needs to be manually moved to the
-		 * next index.  This can be removed once the page cache
-		 * is converted.
-		 */
-		if (PageHead(page))
-			xas_set(&xas, rac->_index + rac->_batch_count);
-
 		if (i == array_sz)
 			break;
 	}
diff --git a/mm/filemap.c b/mm/filemap.c
index 20434d7bdad8..97d17e8c76aa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -134,7 +134,6 @@ static void page_cache_delete(struct address_space *mapping,
 	}
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(nr != 1 && shadow, folio);
 
 	xas_store(&xas, shadow);
 	xas_init_marks(&xas);
@@ -276,8 +275,7 @@ void filemap_remove_folio(struct folio *folio)
  * from the mapping. The function expects @pvec to be sorted by page index
  * and is optimised for it to be dense.
  * It tolerates holes in @pvec (mapping entries at those indices are not
- * modified). The function expects only THP head pages to be present in the
- * @pvec.
+ * modified). The function expects only folios to be present in the @pvec.
  *
  * The function expects the i_pages lock to be held.
  */
@@ -312,20 +310,12 @@ static void page_cache_delete_batch(struct address_space *mapping,
 
 		WARN_ON_ONCE(!folio_test_locked(folio));
 
-		if (folio->index == xas.xa_index)
-			folio->mapping = NULL;
-		/* Leave page->index set: truncation lookup relies on it */
+		folio->mapping = NULL;
+		/* Leave folio->index set: truncation lookup relies on it */
 
-		/*
-		 * Move to the next page in the vector if this is a regular
-		 * page or the index is of the last sub-page of this compound
-		 * page.
-		 */
-		if (folio->index + folio_nr_pages(folio) - 1 ==
-				xas.xa_index)
-			i++;
+		i++;
 		xas_store(&xas, NULL);
-		total_pages++;
+		total_pages += folio_nr_pages(folio);
 	}
 	mapping->nrpages -= total_pages;
 }
@@ -2027,24 +2017,27 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
 		indices[pvec->nr] = xas.xa_index;
 		if (!pagevec_add(pvec, &folio->page))
 			break;
-		goto next;
+		continue;
 unlock:
 		folio_unlock(folio);
 put:
 		folio_put(folio);
-next:
-		if (!xa_is_value(folio) && folio_multi(folio)) {
-			xas_set(&xas, folio->index + folio_nr_pages(folio));
-			/* Did we wrap on 32-bit? */
-			if (!xas.xa_index)
-				break;
-		}
 	}
 	rcu_read_unlock();
 
 	return pagevec_count(pvec);
 }
 
+static inline
+bool folio_more_pages(struct folio *folio, pgoff_t index, pgoff_t max)
+{
+	if (folio_single(folio) || folio_test_hugetlb(folio))
+		return false;
+	if (index >= max)
+		return false;
+	return index < folio->index + folio_nr_pages(folio) - 1;
+}
+
 /**
  * find_get_pages_range - gang pagecache lookup
  * @mapping:	The address_space to search
@@ -2083,11 +2076,17 @@ unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 		if (xa_is_value(folio))
 			continue;
 
+again:
 		pages[ret] = folio_file_page(folio, xas.xa_index);
 		if (++ret == nr_pages) {
 			*start = xas.xa_index + 1;
 			goto out;
 		}
+		if (folio_more_pages(folio, xas.xa_index, end)) {
+			xas.xa_index++;
+			folio_ref_inc(folio);
+			goto again;
+		}
 	}
 
 	/*
@@ -2145,9 +2144,15 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 		if (unlikely(folio != xas_reload(&xas)))
 			goto put_page;
 
-		pages[ret] = &folio->page;
+again:
+		pages[ret] = folio_file_page(folio, xas.xa_index);
 		if (++ret == nr_pages)
 			break;
+		if (folio_more_pages(folio, xas.xa_index, ULONG_MAX)) {
+			xas.xa_index++;
+			folio_ref_inc(folio);
+			goto again;
+		}
 		continue;
 put_page:
 		folio_put(folio);
@@ -3169,6 +3174,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	do {
+again:
 		page = folio_file_page(folio, xas.xa_index);
 		if (PageHWPoison(page))
 			goto unlock;
@@ -3190,9 +3196,18 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		do_set_pte(vmf, page, addr);
 		/* no need to invalidate: a not-present page won't be cached */
 		update_mmu_cache(vma, addr, vmf->pte);
+		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
+			xas.xa_index++;
+			folio_ref_inc(folio);
+			goto again;
+		}
 		folio_unlock(folio);
 		continue;
 unlock:
+		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
+			xas.xa_index++;
+			goto again;
+		}
 		folio_unlock(folio);
 		folio_put(folio);
 	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 763bf687ca92..7ea0052172a8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2638,6 +2638,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
+	XA_STATE(xas, &head->mapping->i_pages, head->index);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int extra_pins, ret;
@@ -2700,18 +2701,27 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 
 	unmap_page(head);
 
+	if (mapping) {
+		xas_split_alloc(&xas, head, compound_order(head),
+				mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);
+		if (xas_error(&xas)) {
+			ret = xas_error(&xas);
+			goto out_unlock;
+		}
+	}
+
 	/* block interrupt reentry in xa_lock and spinlock */
 	local_irq_disable();
 	if (mapping) {
-		XA_STATE(xas, &mapping->i_pages, page_index(head));
-
 		/*
 		 * Check if the head page is present in page cache.
 		 * We assume all tail are present too, if head is there.
 		 */
-		xa_lock(&mapping->i_pages);
+		xas_lock(&xas);
+		xas_reset(&xas);
 		if (xas_load(&xas) != head)
 			goto fail;
+		xas_split(&xas, head, thp_order(head));
 	}
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
@@ -2739,7 +2749,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:
 		if (mapping)
-			xa_unlock(&mapping->i_pages);
+			xas_unlock(&xas);
 		local_irq_enable();
 		remap_page(head, thp_nr_pages(head));
 		ret = -EBUSY;
@@ -2753,6 +2763,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (mapping)
 		i_mmap_unlock_read(mapping);
 out:
+	/* Free any memory we didn't use */
+	xas_nomem(&xas, 0);
 	count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
 	return ret;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6b9c98ddcd09..949b583f22c0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1664,7 +1664,10 @@ static void collapse_file(struct mm_struct *mm,
 	}
 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
 
-	/* This will be less messy when we use multi-index entries */
+	/*
+	 * Ensure we have slots for all the pages in the range.  This is
+	 * almost certainly a no-op because most of the pages must be present
+	 */
 	do {
 		xas_lock_irq(&xas);
 		xas_create_range(&xas);
@@ -1884,6 +1887,9 @@ static void collapse_file(struct mm_struct *mm,
 			__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
 	}
 
+	/* Join all the small entries into a single multi-index entry */
+	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+	xas_store(&xas, new_page);
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked:
@@ -2005,6 +2011,10 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			continue;
 		}
 
+		/*
+		 * XXX: khugepaged should compact smaller compound pages
+		 * into a PMD sized page
+		 */
 		if (PageTransCompound(page)) {
 			result = SCAN_PAGE_COMPOUND;
 			break;
diff --git a/mm/migrate.c b/mm/migrate.c
index 36cdae0a1235..029b592a0066 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -439,14 +439,6 @@ int folio_migrate_mapping(struct address_space *mapping,
 	}
 
 	xas_store(&xas, newfolio);
-	if (nr > 1) {
-		int i;
-
-		for (i = 1; i < nr; i++) {
-			xas_next(&xas);
-			xas_store(&xas, newfolio);
-		}
-	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing
diff --git a/mm/shmem.c b/mm/shmem.c
index 337680a01f2a..bdfa60416d68 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -670,7 +670,6 @@ static int shmem_add_to_page_cache(struct page *page,
 				   struct mm_struct *charge_mm)
 {
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page));
-	unsigned long i = 0;
 	unsigned long nr = compound_nr(page);
 	int error;
 
@@ -700,17 +699,11 @@ static int shmem_add_to_page_cache(struct page *page,
 		void *entry;
 		xas_lock_irq(&xas);
 		entry = xas_find_conflict(&xas);
-		if (entry != expected)
+		if (entry != expected) {
 			xas_set_err(&xas, -EEXIST);
-		xas_create_range(&xas);
-		if (xas_error(&xas))
 			goto unlock;
-next:
-		xas_store(&xas, page);
-		if (++i < nr) {
-			xas_next(&xas);
-			goto next;
-		}
+		}
+		xas_store(&xas, page);
 		if (PageTransHuge(page)) {
 			count_vm_event(THP_FILE_ALLOC);
 			__mod_lruvec_page_state(page, NR_SHMEM_THPS, nr);
-- 
2.30.2
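
The hunks above rely on the XArray's multi-index support: a single entry, stored with an order, answers lookups at every index it covers.  As a purely illustrative aside (not part of the patch), the sketch below shows that storage pattern in isolation, mirroring what shmem_add_to_page_cache() and collapse_file() do after this change.  The demo_store_multi_index() helper and the order-2 choice are invented for the example; the usual page-cache locking rules, refcounting, statistics accounting and the xas_nomem() retry loop are deliberately left out.

/*
 * Illustrative sketch only: store one order-2 compound page as a single
 * multi-index entry covering four consecutive indices.  demo_* names are
 * made up; real callers also handle allocation failure via xas_nomem()
 * and take a reference on the page before publishing it.
 */
#include <linux/pagemap.h>
#include <linux/xarray.h>

static void demo_store_multi_index(struct address_space *mapping,
				   struct page *page, pgoff_t index)
{
	/* One xa_state spanning indices [index, index + 3] (order 2). */
	XA_STATE_ORDER(xas, &mapping->i_pages, index, 2);

	xas_lock_irq(&xas);
	xas_store(&xas, page);	/* one entry now occupies all four indices */
	xas_unlock_irq(&xas);

	/*
	 * A lookup at any index in the range returns the same entry, so a
	 * mark such as PAGECACHE_TAG_DIRTY set on it is seen no matter
	 * where an iteration starts, which is the failure mode described
	 * in the changelog for per-index entries.
	 */
	WARN_ON(xa_load(&mapping->i_pages, index + 3) != page);
}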