2021-03-05 04:19:44

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 00/25] Page folios

Our type system does not currently distinguish between tail pages and
head or single pages. This is a problem because we call compound_head()
multiple times (and the compiler cannot optimise it out), bloating the
kernel. It also makes programming hard as it is often unclear whether
a function operates on an individual page, or an entire compound page.

This patch series introduces the struct folio, which is a type that
represents an entire compound page. This initial set reduces the kernel
size by approximately 6kB, although its real purpose is adding
infrastructure to enable further use of the folio.

The big correctness proof that exists in this patch series is that we
never lock or wait for writeback on a tail page. This is important as
we would miss wakeups due to being on the wrong page waitqueue if we
ever did.

I analysed the text size reduction using a config based on Oracle UEK
with all modules changed to built-in. That's obviously not a kernel
which makes sense to run, but it serves to compare the effects on (many
common) filesystems & drivers, not just the core.

add/remove: 33510/33499 grow/shrink: 1831/1898 up/down: 888762/-894719 (-5957)

For a Debian config, just comparing vmlinux.o gives a reduction of 3828
bytes of text and 72 bytes of data:

text data bss dec hex filename
16125879 4421122 1846344 22393345 155b201 linus/vmlinux.o
16122051 4421050 1846344 22389445 155a2c5 folio/vmlinux.o

For nfs (a module I happened to notice was particularly affected), it's
a reduction of 588 bytes of text and 16 bytes of data:
257142 59228 408 316778 4d56a linus/fs/nfs/nfs.o
256554 59212 408 316174 4d30e folio/fs/nfs/nfs.o

Current tree at:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio

(contains another ~70 patches on top of this batch)

v4:
- Rebase on current Linus tree (including swap fix)
- Analyse each patch in terms of its effects on kernel text size.
A few were modified to improve their effect. In particular, where
pushing calls to page_folio() into the callers resulted in unacceptable
size increases, the wrapper was placed in mm/folio-compat.c. This lets
us see all the places which are good targets for conversion to folios.
- Some of the patches were reordered, split or merged in order to make
more logical sense.
- Use nth_page() for folio_next() if we're using SPARSEMEM and not
VMEMMAP (Zi Yan)
- Increment and decrement page stats in units of pages instead of units
of folios (Zi Yan)
v3:
- Rebase on next-20210127. Two major sources of conflict, the
generic_file_buffered_read refactoring (in akpm tree) and the
fscache work (in dhowells tree).
v2:
- Pare patch series back to just infrastructure and the page waiting
parts.

Matthew Wilcox (Oracle) (25):
mm: Introduce struct folio
mm: Add folio_pgdat and folio_zone
mm/vmstat: Add functions to account folio statistics
mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
mm: Add put_folio
mm: Add get_folio
mm: Create FolioFlags
mm: Handle per-folio private data
mm: Add folio_index, folio_page and folio_contains
mm/util: Add folio_mapping and folio_file_mapping
mm/memcg: Add folio wrappers for various memcontrol functions
mm/filemap: Add unlock_folio
mm/filemap: Add lock_folio
mm/filemap: Add lock_folio_killable
mm/filemap: Convert lock_page_async to lock_folio_async
mm/filemap: Convert end_page_writeback to end_folio_writeback
mm/filemap: Add wait_on_folio_locked & wait_on_folio_locked_killable
mm/page-writeback: Add wait_on_folio_writeback
mm/page-writeback: Add wait_for_stable_folio
mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit
mm/filemap: Add __lock_folio_or_retry
mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit
mm/page-writeback: Convert test_clear_page_writeback to take a folio
mm/filemap: Convert page wait queues to be folios
cachefiles: Switch to wait_page_key

fs/afs/write.c | 21 ++--
fs/cachefiles/rdwr.c | 13 +--
fs/io_uring.c | 2 +-
include/linux/fscache.h | 5 +
include/linux/memcontrol.h | 30 +++++
include/linux/mm.h | 88 ++++++++++-----
include/linux/mm_types.h | 33 ++++++
include/linux/mmdebug.h | 20 ++++
include/linux/page-flags.h | 106 ++++++++++++++----
include/linux/pagemap.h | 197 +++++++++++++++++++++++----------
include/linux/vmstat.h | 83 ++++++++++++++
mm/Makefile | 2 +-
mm/filemap.c | 218 ++++++++++++++++++-------------------
mm/folio-compat.c | 37 +++++++
mm/memory.c | 8 +-
mm/page-writeback.c | 58 +++++-----
mm/swapfile.c | 6 +-
mm/util.c | 20 ++--
18 files changed, 666 insertions(+), 281 deletions(-)
create mode 100644 mm/folio-compat.c

--
2.30.0


2021-03-05 04:19:48

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 01/25] mm: Introduce struct folio

A struct folio refers to an entire (possibly compound) page. A function
which takes a struct folio argument declares that it will operate on the
entire compound page, not just PAGE_SIZE bytes. In return, the caller
guarantees that the pointer it is passing does not point to a tail page.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm.h | 30 ++++++++++++++++++++++++++++++
include/linux/mm_types.h | 17 +++++++++++++++++
2 files changed, 47 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e64e3eac80..a46e5a4385b0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -927,6 +927,11 @@ static inline unsigned int compound_order(struct page *page)
return page[1].compound_order;
}

+static inline unsigned int folio_order(struct folio *folio)
+{
+ return compound_order(&folio->page);
+}
+
static inline bool hpage_pincount_available(struct page *page)
{
/*
@@ -1518,6 +1523,30 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
#endif
}

+static inline unsigned long folio_nr_pages(struct folio *folio)
+{
+ return compound_nr(&folio->page);
+}
+
+static inline struct folio *next_folio(struct folio *folio)
+{
+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+ return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
+#else
+ return folio + folio_nr_pages(folio);
+#endif
+}
+
+static inline unsigned int folio_shift(struct folio *folio)
+{
+ return PAGE_SHIFT + folio_order(folio);
+}
+
+static inline size_t folio_size(struct folio *folio)
+{
+ return PAGE_SIZE << folio_order(folio);
+}
+
/*
* Some inline functions in vmstat.h depend on page_zone()
*/
@@ -1623,6 +1652,7 @@ extern void pagefault_out_of_memory(void);

#define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK)
#define offset_in_thp(page, p) ((unsigned long)(p) & (thp_size(page) - 1))
+#define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 1))

/*
* Flags passed to show_mem() and show_free_areas() to suppress output in
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0974ad501a47..a311cb48526f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -223,6 +223,23 @@ struct page {
#endif
} _struct_page_alignment;

+/*
+ * A struct folio is either a base (order-0) page or the head page of
+ * a compound page.
+ */
+struct folio {
+ struct page page;
+};
+
+static inline struct folio *page_folio(struct page *page)
+{
+ unsigned long head = READ_ONCE(page->compound_head);
+
+ if (unlikely(head & 1))
+ return (struct folio *)(head - 1);
+ return (struct folio *)page;
+}
+
static inline atomic_t *compound_mapcount_ptr(struct page *page)
{
return &page[1].compound_mapcount;
--
2.30.0

2021-03-05 04:19:51

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 02/25] mm: Add folio_pgdat and folio_zone

These are just convenience wrappers for callers with folios; pgdat and
zone can be reached from tail pages as well as head pages.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
---
include/linux/mm.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a46e5a4385b0..ecfe202aa4ec 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1488,6 +1488,16 @@ static inline pg_data_t *page_pgdat(const struct page *page)
return NODE_DATA(page_to_nid(page));
}

+static inline struct zone *folio_zone(const struct folio *folio)
+{
+ return page_zone(&folio->page);
+}
+
+static inline pg_data_t *folio_pgdat(const struct folio *folio)
+{
+ return page_pgdat(&folio->page);
+}
+
#ifdef SECTION_IN_PAGE_FLAGS
static inline void set_page_section(struct page *page, unsigned long section)
{
--
2.30.0

2021-03-05 04:19:59

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 03/25] mm/vmstat: Add functions to account folio statistics

Allow page counters to be more readily modified by callers which have
a folio. Name these wrappers with 'stat' instead of 'state' as requested
by Linus here:
https://lore.kernel.org/linux-mm/[email protected]om/

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/vmstat.h | 83 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 506d625163a1..dff5342df07b 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -402,6 +402,54 @@ static inline void drain_zonestat(struct zone *zone,
struct per_cpu_pageset *pset) { }
#endif /* CONFIG_SMP */

+static inline
+void __inc_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
+{
+ __mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
+}
+
+static inline
+void __dec_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
+{
+ __mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
+}
+
+static inline
+void inc_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
+{
+ mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
+}
+
+static inline
+void dec_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
+{
+ mod_zone_page_state(folio_zone(folio), item, -folio_nr_pages(folio));
+}
+
+static inline
+void __inc_node_folio_stat(struct folio *folio, enum node_stat_item item)
+{
+ __mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
+}
+
+static inline
+void __dec_node_folio_stat(struct folio *folio, enum node_stat_item item)
+{
+ __mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
+}
+
+static inline
+void inc_node_folio_stat(struct folio *folio, enum node_stat_item item)
+{
+ mod_node_page_state(folio_pgdat(folio), item, folio_nr_pages(folio));
+}
+
+static inline
+void dec_node_folio_stat(struct folio *folio, enum node_stat_item item)
+{
+ mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
+}
+
static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
int migratetype)
{
@@ -536,6 +584,24 @@ static inline void __dec_lruvec_page_state(struct page *page,
__mod_lruvec_page_state(page, idx, -1);
}

+static inline void __mod_lruvec_folio_stat(struct folio *folio,
+ enum node_stat_item idx, int val)
+{
+ __mod_lruvec_page_state(&folio->page, idx, val);
+}
+
+static inline void __inc_lruvec_folio_stat(struct folio *folio,
+ enum node_stat_item idx)
+{
+ __mod_lruvec_folio_stat(folio, idx, folio_nr_pages(folio));
+}
+
+static inline void __dec_lruvec_folio_stat(struct folio *folio,
+ enum node_stat_item idx)
+{
+ __mod_lruvec_folio_stat(folio, idx, -folio_nr_pages(folio));
+}
+
static inline void inc_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx)
{
@@ -560,4 +626,21 @@ static inline void dec_lruvec_page_state(struct page *page,
mod_lruvec_page_state(page, idx, -1);
}

+static inline void mod_lruvec_folio_stat(struct folio *folio,
+ enum node_stat_item idx, int val)
+{
+ mod_lruvec_page_state(&folio->page, idx, val);
+}
+
+static inline void inc_lruvec_folio_stat(struct folio *folio,
+ enum node_stat_item idx)
+{
+ mod_lruvec_folio_stat(folio, idx, folio_nr_pages(folio));
+}
+
+static inline void dec_lruvec_folio_stat(struct folio *folio,
+ enum node_stat_item idx)
+{
+ mod_lruvec_folio_stat(folio, idx, -folio_nr_pages(folio));
+}
#endif /* _LINUX_VMSTAT_H */
--
2.30.0

2021-03-05 04:20:14

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 04/25] mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO

These are the folio equivalents of VM_BUG_ON_PAGE and VM_WARN_ON_ONCE_PAGE.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
---
include/linux/mmdebug.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 5d0767cb424a..77d24e1dcaec 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -23,6 +23,13 @@ void dump_mm(const struct mm_struct *mm);
BUG(); \
} \
} while (0)
+#define VM_BUG_ON_FOLIO(cond, folio) \
+ do { \
+ if (unlikely(cond)) { \
+ dump_page(&folio->page, "VM_BUG_ON_FOLIO(" __stringify(cond)")");\
+ BUG(); \
+ } \
+ } while (0)
#define VM_BUG_ON_VMA(cond, vma) \
do { \
if (unlikely(cond)) { \
@@ -48,6 +55,17 @@ void dump_mm(const struct mm_struct *mm);
} \
unlikely(__ret_warn_once); \
})
+#define VM_WARN_ON_ONCE_FOLIO(cond, folio) ({ \
+ static bool __section(".data.once") __warned; \
+ int __ret_warn_once = !!(cond); \
+ \
+ if (unlikely(__ret_warn_once && !__warned)) { \
+ dump_page(&folio->page, "VM_WARN_ON_ONCE_FOLIO(" __stringify(cond)")");\
+ __warned = true; \
+ WARN_ON(1); \
+ } \
+ unlikely(__ret_warn_once); \
+})

#define VM_WARN_ON(cond) (void)WARN_ON(cond)
#define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
@@ -56,11 +74,13 @@ void dump_mm(const struct mm_struct *mm);
#else
#define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
#define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
+#define VM_BUG_ON_FOLIO(cond, folio) VM_BUG_ON(cond)
#define VM_BUG_ON_VMA(cond, vma) VM_BUG_ON(cond)
#define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
#define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_FOLIO(cond, folio) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
#endif
--
2.30.0

2021-03-05 04:21:01

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 06/25] mm: Add get_folio

If we know we have a folio, we can call get_folio() instead of get_page()
and save the overhead of calling compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
---
include/linux/mm.h | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 30fd431b1b05..4595955805f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1177,18 +1177,19 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
}

/* 127: arbitrary random number, small enough to assemble well */
-#define page_ref_zero_or_close_to_overflow(page) \
- ((unsigned int) page_ref_count(page) + 127u <= 127u)
+#define folio_ref_zero_or_close_to_overflow(folio) \
+ ((unsigned int) page_ref_count(&folio->page) + 127u <= 127u)
+
+static inline void get_folio(struct folio *folio)
+{
+ /* Getting a page requires an already elevated page->_refcount. */
+ VM_BUG_ON_FOLIO(folio_ref_zero_or_close_to_overflow(folio), folio);
+ page_ref_inc(&folio->page);
+}

static inline void get_page(struct page *page)
{
- page = compound_head(page);
- /*
- * Getting a normal page or the head of a compound page
- * requires to already have an elevated page->_refcount.
- */
- VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
- page_ref_inc(page);
+ get_folio(page_folio(page));
}

bool __must_check try_grab_page(struct page *page, unsigned int flags);
--
2.30.0

2021-03-05 04:21:21

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 07/25] mm: Create FolioFlags

These new functions are the folio analogues of the PageFlags functions.
If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio is not a tail
page at every invocation. Note that this will also catch the PagePoisoned
case as a poisoned page has every bit set, which would include PageTail.

This saves 1740 bytes of text with the distro-derived config that
I'm testing due to removing a double call to compound_head() in
PageSwapCache().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/fscache.h | 5 ++
include/linux/page-flags.h | 104 ++++++++++++++++++++++++++++++-------
2 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index a1c928fe98e7..c274006f4037 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -38,6 +38,11 @@
#define ClearPageFsCache(page) ClearPagePrivate2((page))
#define TestSetPageFsCache(page) TestSetPagePrivate2((page))
#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
+#define FolioFsCache(folio) FolioPrivate2((folio))
+#define SetFolioFsCache(folio) SetFolioPrivate2((folio))
+#define ClearFolioFsCache(folio) ClearFolioPrivate2((folio))
+#define TestSetFolioFsCache(folio) TestSetFolioPrivate2((folio))
+#define TestClearFolioFsCache(folio) TestClearFolioPrivate2((folio))

/* pattern used to fill dead space in an index entry */
#define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..90381858d901 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -212,6 +212,12 @@ static inline void page_init_poison(struct page *page, size_t size)
}
#endif

+static unsigned long *folio_flags(struct folio *folio)
+{
+ VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
+ return &folio->page.flags;
+}
+
/*
* Page flags policies wrt compound pages
*
@@ -260,30 +266,44 @@ static inline void page_init_poison(struct page *page, size_t size)
* Macros to create function definitions for page flags
*/
#define TESTPAGEFLAG(uname, lname, policy) \
+static __always_inline int Folio##uname(struct folio *folio) \
+ { return test_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline int Page##uname(struct page *page) \
{ return test_bit(PG_##lname, &policy(page, 0)->flags); }

#define SETPAGEFLAG(uname, lname, policy) \
+static __always_inline void SetFolio##uname(struct folio *folio) \
+ { set_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline void SetPage##uname(struct page *page) \
{ set_bit(PG_##lname, &policy(page, 1)->flags); }

#define CLEARPAGEFLAG(uname, lname, policy) \
+static __always_inline void ClearFolio##uname(struct folio *folio) \
+ { clear_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline void ClearPage##uname(struct page *page) \
{ clear_bit(PG_##lname, &policy(page, 1)->flags); }

#define __SETPAGEFLAG(uname, lname, policy) \
+static __always_inline void __SetFolio##uname(struct folio *folio) \
+ { __set_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline void __SetPage##uname(struct page *page) \
{ __set_bit(PG_##lname, &policy(page, 1)->flags); }

#define __CLEARPAGEFLAG(uname, lname, policy) \
+static __always_inline void __ClearFolio##uname(struct folio *folio) \
+ { __clear_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline void __ClearPage##uname(struct page *page) \
{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }

#define TESTSETFLAG(uname, lname, policy) \
+static __always_inline int TestSetFolio##uname(struct folio *folio) \
+ { return test_and_set_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline int TestSetPage##uname(struct page *page) \
{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }

#define TESTCLEARFLAG(uname, lname, policy) \
+static __always_inline int TestClearFolio##uname(struct folio *folio) \
+ { return test_and_clear_bit(PG_##lname, folio_flags(folio)); } \
static __always_inline int TestClearPage##uname(struct page *page) \
{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }

@@ -302,21 +322,27 @@ static __always_inline int TestClearPage##uname(struct page *page) \
TESTCLEARFLAG(uname, lname, policy)

#define TESTPAGEFLAG_FALSE(uname) \
+static inline int Folio##uname(const struct folio *folio) { return 0; } \
static inline int Page##uname(const struct page *page) { return 0; }

#define SETPAGEFLAG_NOOP(uname) \
+static inline void SetFolio##uname(struct folio *folio) { } \
static inline void SetPage##uname(struct page *page) { }

#define CLEARPAGEFLAG_NOOP(uname) \
+static inline void ClearFolio##uname(struct folio *folio) { } \
static inline void ClearPage##uname(struct page *page) { }

#define __CLEARPAGEFLAG_NOOP(uname) \
+static inline void __ClearFolio##uname(struct folio *folio) { } \
static inline void __ClearPage##uname(struct page *page) { }

#define TESTSETFLAG_FALSE(uname) \
+static inline int TestSetFolio##uname(struct folio *folio) { return 0; } \
static inline int TestSetPage##uname(struct page *page) { return 0; }

#define TESTCLEARFLAG_FALSE(uname) \
+static inline int TestClearFolio##uname(struct folio *folio) { return 0; } \
static inline int TestClearPage##uname(struct page *page) { return 0; }

#define PAGEFLAG_FALSE(uname) TESTPAGEFLAG_FALSE(uname) \
@@ -393,14 +419,18 @@ PAGEFLAG_FALSE(HighMem)
#endif

#ifdef CONFIG_SWAP
-static __always_inline int PageSwapCache(struct page *page)
+static __always_inline bool FolioSwapCache(struct folio *folio)
{
-#ifdef CONFIG_THP_SWAP
- page = compound_head(page);
-#endif
- return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
+ return FolioSwapBacked(folio) &&
+ test_bit(PG_swapcache, folio_flags(folio));

}
+
+static __always_inline bool PageSwapCache(struct page *page)
+{
+ return FolioSwapCache(page_folio(page));
+}
+
SETPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
CLEARPAGEFLAG(SwapCache, swapcache, PF_NO_TAIL)
#else
@@ -478,10 +508,14 @@ static __always_inline int PageMappingFlags(struct page *page)
return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
}

-static __always_inline int PageAnon(struct page *page)
+static __always_inline bool FolioAnon(struct folio *folio)
{
- page = compound_head(page);
- return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
+ return ((unsigned long)folio->page.mapping & PAGE_MAPPING_ANON) != 0;
+}
+
+static __always_inline bool PageAnon(struct page *page)
+{
+ return FolioAnon(page_folio(page));
}

static __always_inline int __PageMovable(struct page *page)
@@ -509,18 +543,16 @@ TESTPAGEFLAG_FALSE(Ksm)

u64 stable_page_flags(struct page *page);

-static inline int PageUptodate(struct page *page)
+static inline int FolioUptodate(struct folio *folio)
{
- int ret;
- page = compound_head(page);
- ret = test_bit(PG_uptodate, &(page)->flags);
+ int ret = test_bit(PG_uptodate, folio_flags(folio));
/*
* Must ensure that the data we read out of the page is loaded
* _after_ we've loaded page->flags to check for PageUptodate.
* We can skip the barrier if the page is not uptodate, because
* we wouldn't be reading anything from it.
*
- * See SetPageUptodate() for the other side of the story.
+ * See SetFolioUptodate() for the other side of the story.
*/
if (ret)
smp_rmb();
@@ -528,23 +560,36 @@ static inline int PageUptodate(struct page *page)
return ret;
}

-static __always_inline void __SetPageUptodate(struct page *page)
+static inline int PageUptodate(struct page *page)
+{
+ return FolioUptodate(page_folio(page));
+}
+
+static __always_inline void __SetFolioUptodate(struct folio *folio)
{
- VM_BUG_ON_PAGE(PageTail(page), page);
smp_wmb();
- __set_bit(PG_uptodate, &page->flags);
+ __set_bit(PG_uptodate, folio_flags(folio));
}

-static __always_inline void SetPageUptodate(struct page *page)
+static __always_inline void SetFolioUptodate(struct folio *folio)
{
- VM_BUG_ON_PAGE(PageTail(page), page);
/*
* Memory barrier must be issued before setting the PG_uptodate bit,
* so that all previous stores issued in order to bring the page
* uptodate are actually visible before PageUptodate becomes true.
*/
smp_wmb();
- set_bit(PG_uptodate, &page->flags);
+ set_bit(PG_uptodate, folio_flags(folio));
+}
+
+static __always_inline void __SetPageUptodate(struct page *page)
+{
+ __SetFolioUptodate((struct folio *)page);
+}
+
+static __always_inline void SetPageUptodate(struct page *page)
+{
+ SetFolioUptodate((struct folio *)page);
}

CLEARPAGEFLAG(Uptodate, uptodate, PF_NO_TAIL)
@@ -569,6 +614,17 @@ static inline void set_page_writeback_keepwrite(struct page *page)

__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)

+/* Whether there are one or multiple pages in a folio */
+static inline bool FolioSingle(struct folio *folio)
+{
+ return !FolioHead(folio);
+}
+
+static inline bool FolioMulti(struct folio *folio)
+{
+ return FolioHead(folio);
+}
+
static __always_inline void set_compound_head(struct page *page, struct page *head)
{
WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
@@ -592,12 +648,15 @@ static inline void ClearPageCompound(struct page *page)
#ifdef CONFIG_HUGETLB_PAGE
int PageHuge(struct page *page);
int PageHeadHuge(struct page *page);
+static inline bool FolioHuge(struct folio *folio)
+{
+ return PageHeadHuge(&folio->page);
+}
#else
TESTPAGEFLAG_FALSE(Huge)
TESTPAGEFLAG_FALSE(HeadHuge)
#endif

-
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* PageHuge() only returns true for hugetlbfs pages, but not for
@@ -844,6 +903,11 @@ static inline int page_has_private(struct page *page)
return !!(page->flags & PAGE_FLAGS_PRIVATE);
}

+static inline bool folio_has_private(struct folio *folio)
+{
+ return page_has_private(&folio->page);
+}
+
#undef PF_ANY
#undef PF_HEAD
#undef PF_ONLY_HEAD
--
2.30.0

2021-03-05 04:21:26

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 05/25] mm: Add put_folio

If we know we have a folio, we can call put_folio() instead of put_page()
and save the overhead of calling compound_head(). Also skips the
devmap checks.

This commit looks like it should be a no-op, but actually saves 1714 bytes
of text with the distro-derived config that I'm testing. Some functions
grow a little while others shrink. I presume the compiler is making
different inlining decisions.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
---
include/linux/mm.h | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecfe202aa4ec..30fd431b1b05 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1205,9 +1205,15 @@ static inline __must_check bool try_get_page(struct page *page)
return true;
}

+static inline void put_folio(struct folio *folio)
+{
+ if (put_page_testzero(&folio->page))
+ __put_page(&folio->page);
+}
+
static inline void put_page(struct page *page)
{
- page = compound_head(page);
+ struct folio *folio = page_folio(page);

/*
* For devmap managed pages we need to catch refcount transition from
@@ -1215,13 +1221,12 @@ static inline void put_page(struct page *page)
* need to inform the device driver through callback. See
* include/linux/memremap.h and HMM for details.
*/
- if (page_is_devmap_managed(page)) {
- put_devmap_managed_page(page);
+ if (page_is_devmap_managed(&folio->page)) {
+ put_devmap_managed_page(&folio->page);
return;
}

- if (put_page_testzero(page))
- __put_page(page);
+ put_folio(folio);
}

/*
--
2.30.0

2021-03-05 04:22:35

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 08/25] mm: Handle per-folio private data

Add folio_private() and set_folio_private() which mirror page_private()
and set_page_private() -- ie folio private data is the same as page
private data.

Turn attach_page_private() into attach_folio_private() and reimplement
attach_page_private() as a wrapper. No filesystem which uses page private
data currently supports compound pages, so we're free to define the rules.
attach_page_private() may only be called on a head page; if you want
to add private data to a tail page, you can call set_page_private()
directly (and shouldn't increment the page refcount! That should be
done when adding private data to the head page / folio).

This saves 886 bytes of text with the distro-derived config that I'm
testing due to removing the calls to compound_head() in get_page()
& put_page().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm_types.h | 16 ++++++++++++++
include/linux/pagemap.h | 48 ++++++++++++++++++++++++----------------
2 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a311cb48526f..b650423fcba6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -258,6 +258,12 @@ static inline atomic_t *compound_pincount_ptr(struct page *page)
#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK)
#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE)

+/*
+ * page_private can be used on tail pages. However, PagePrivate is only
+ * checked by the VM on the head page. So page_private on the tail pages
+ * should be used for data that's ancillary to the head page (eg attaching
+ * buffer heads to tail pages after attaching buffer heads to the head page)
+ */
#define page_private(page) ((page)->private)

static inline void set_page_private(struct page *page, unsigned long private)
@@ -265,6 +271,16 @@ static inline void set_page_private(struct page *page, unsigned long private)
page->private = private;
}

+static inline unsigned long folio_private(struct folio *folio)
+{
+ return folio->page.private;
+}
+
+static inline void set_folio_private(struct folio *folio, unsigned long v)
+{
+ folio->page.private = v;
+}
+
struct page_frag_cache {
void * va;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 20225b067583..f07c03da83f6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -245,42 +245,52 @@ static inline int page_cache_add_speculative(struct page *page, int count)
}

/**
- * attach_page_private - Attach private data to a page.
- * @page: Page to attach data to.
- * @data: Data to attach to page.
+ * attach_folio_private - Attach private data to a folio.
+ * @folio: Folio to attach data to.
+ * @data: Data to attach to folio.
*
- * Attaching private data to a page increments the page's reference count.
- * The data must be detached before the page will be freed.
+ * Attaching private data to a folio increments the page's reference count.
+ * The data must be detached before the folio will be freed.
*/
-static inline void attach_page_private(struct page *page, void *data)
+static inline void attach_folio_private(struct folio *folio, void *data)
{
- get_page(page);
- set_page_private(page, (unsigned long)data);
- SetPagePrivate(page);
+ get_folio(folio);
+ set_folio_private(folio, (unsigned long)data);
+ SetFolioPrivate(folio);
}

/**
- * detach_page_private - Detach private data from a page.
- * @page: Page to detach data from.
+ * detach_folio_private - Detach private data from a folio.
+ * @folio: Folio to detach data from.
*
- * Removes the data that was previously attached to the page and decrements
+ * Removes the data that was previously attached to the folio and decrements
* the refcount on the page.
*
- * Return: Data that was attached to the page.
+ * Return: Data that was attached to the folio.
*/
-static inline void *detach_page_private(struct page *page)
+static inline void *detach_folio_private(struct folio *folio)
{
- void *data = (void *)page_private(page);
+ void *data = (void *)folio_private(folio);

- if (!PagePrivate(page))
+ if (!FolioPrivate(folio))
return NULL;
- ClearPagePrivate(page);
- set_page_private(page, 0);
- put_page(page);
+ ClearFolioPrivate(folio);
+ set_folio_private(folio, 0);
+ put_folio(folio);

return data;
}

+static inline void attach_page_private(struct page *page, void *data)
+{
+ attach_folio_private((struct folio *)page, data);
+}
+
+static inline void *detach_page_private(struct page *page)
+{
+ return detach_folio_private((struct folio *)page);
+}
+
#ifdef CONFIG_NUMA
extern struct page *__page_cache_alloc(gfp_t gfp);
#else
--
2.30.0

2021-03-05 04:22:55

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 09/25] mm: Add folio_index, folio_page and folio_contains

folio_index() is the equivalent of page_index() for folios. folio_page()
finds the page in a folio for a page cache index. folio_contains()
tells you whether a folio contains a particular page cache index.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f07c03da83f6..5094b50f7680 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -447,6 +447,29 @@ static inline bool thp_contains(struct page *head, pgoff_t index)
return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
}

+static inline pgoff_t folio_index(struct folio *folio)
+{
+ if (unlikely(FolioSwapCache(folio)))
+ return __page_file_index(&folio->page);
+ return folio->page.index;
+}
+
+static inline struct page *folio_page(struct folio *folio, pgoff_t index)
+{
+ index -= folio_index(folio);
+ VM_BUG_ON_FOLIO(index >= folio_nr_pages(folio), folio);
+ return &folio->page + index;
+}
+
+/* Does this folio contain this index? */
+static inline bool folio_contains(struct folio *folio, pgoff_t index)
+{
+ /* HugeTLBfs indexes the page cache in units of hpage_size */
+ if (PageHuge(&folio->page))
+ return folio->page.index == index;
+ return index - folio_index(folio) < folio_nr_pages(folio);
+}
+
/*
* Given the page we found in the page cache, return the page corresponding
* to this index in the file
--
2.30.0

2021-03-05 04:23:26

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 10/25] mm/util: Add folio_mapping and folio_file_mapping

These are the folio equivalent of page_mapping() and page_file_mapping().
Add an out-of-line page_mapping() wrapper around folio_mapping()
in order to prevent the page_folio() call from bloating every caller
of page_mapping(). Adjust page_file_mapping() and page_mapping_file()
to use folios internally.

This ends up saving 102 bytes of text overall. folio_mapping() is
45 bytes shorter than page_mapping() was, but the compiler chooses to
inline folio_mapping() into page_mapping_file(), for a net increase of
69 bytes in the core. This is made up for by a few bytes less in dozens
of nfs functions (which call page_file_mapping()). The small amount of
difference appears to be a slight change in gcc's register allocation
decisions, which allow:

48 8b 56 08 mov 0x8(%rsi),%rdx
48 8d 42 ff lea -0x1(%rdx),%rax
83 e2 01 and $0x1,%edx
48 0f 44 c6 cmove %rsi,%rax

to become:

48 8b 46 08 mov 0x8(%rsi),%rax
48 8d 78 ff lea -0x1(%rax),%rdi
a8 01 test $0x1,%al
48 0f 44 fe cmove %rsi,%rdi

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/mm.h | 14 --------------
include/linux/pagemap.h | 17 +++++++++++++++++
mm/Makefile | 2 +-
mm/folio-compat.c | 13 +++++++++++++
mm/swapfile.c | 6 +++---
mm/util.c | 20 ++++++++++----------
6 files changed, 44 insertions(+), 28 deletions(-)
create mode 100644 mm/folio-compat.c

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4595955805f8..9199b59ee8da 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1603,19 +1603,6 @@ void page_address_init(void);

extern void *page_rmapping(struct page *page);
extern struct anon_vma *page_anon_vma(struct page *page);
-extern struct address_space *page_mapping(struct page *page);
-
-extern struct address_space *__page_file_mapping(struct page *);
-
-static inline
-struct address_space *page_file_mapping(struct page *page)
-{
- if (unlikely(PageSwapCache(page)))
- return __page_file_mapping(page);
-
- return page->mapping;
-}
-
extern pgoff_t __page_file_index(struct page *page);

/*
@@ -1630,7 +1617,6 @@ static inline pgoff_t page_index(struct page *page)
}

bool page_mapped(struct page *page);
-struct address_space *page_mapping(struct page *page);
struct address_space *page_mapping_file(struct page *page);

/*
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5094b50f7680..5a2c0764d7c0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -157,6 +157,23 @@ static inline void filemap_nr_thps_dec(struct address_space *mapping)

void release_pages(struct page **pages, int nr);

+struct address_space *page_mapping(struct page *);
+struct address_space *folio_mapping(struct folio *);
+struct address_space *__folio_file_mapping(struct folio *);
+
+static inline struct address_space *folio_file_mapping(struct folio *folio)
+{
+ if (unlikely(FolioSwapCache(folio)))
+ return __folio_file_mapping(folio);
+
+ return folio->page.mapping;
+}
+
+static inline struct address_space *page_file_mapping(struct page *page)
+{
+ return folio_file_mapping(page_folio(page));
+}
+
/*
* speculatively take a reference to a page.
* If the page is free (_refcount == 0), then _refcount is untouched, and 0
diff --git a/mm/Makefile b/mm/Makefile
index 72227b24a616..ceb0089efd29 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -46,7 +46,7 @@ mmu-$(CONFIG_MMU) += process_vm_access.o
endif

obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
- maccess.o page-writeback.o \
+ maccess.o page-writeback.o folio-compat.o \
readahead.o swap.o truncate.o vmscan.o shmem.o \
util.o mmzone.o vmstat.o backing-dev.o \
mm_init.o percpu.o slab_common.o \
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
new file mode 100644
index 000000000000..5e107aa30a62
--- /dev/null
+++ b/mm/folio-compat.c
@@ -0,0 +1,13 @@
+/*
+ * Compatibility functions which bloat the callers too much to make inline.
+ * All of the callers of these functions should be converted to use folios
+ * eventually.
+ */
+
+#include <linux/pagemap.h>
+
+struct address_space *page_mapping(struct page *page)
+{
+ return folio_mapping(page_folio(page));
+}
+EXPORT_SYMBOL(page_mapping);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 084a5b9a18e5..b80af7537d9e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3535,11 +3535,11 @@ struct swap_info_struct *page_swap_info(struct page *page)
/*
* out-of-line __page_file_ methods to avoid include hell.
*/
-struct address_space *__page_file_mapping(struct page *page)
+struct address_space *__folio_file_mapping(struct folio *folio)
{
- return page_swap_info(page)->swap_file->f_mapping;
+ return page_swap_info(&folio->page)->swap_file->f_mapping;
}
-EXPORT_SYMBOL_GPL(__page_file_mapping);
+EXPORT_SYMBOL_GPL(__folio_file_mapping);

pgoff_t __page_file_index(struct page *page)
{
diff --git a/mm/util.c b/mm/util.c
index 54870226cea6..9ab72cfa4aa1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -686,39 +686,39 @@ struct anon_vma *page_anon_vma(struct page *page)
return __page_rmapping(page);
}

-struct address_space *page_mapping(struct page *page)
+struct address_space *folio_mapping(struct folio *folio)
{
struct address_space *mapping;

- page = compound_head(page);
-
/* This happens if someone calls flush_dcache_page on slab page */
- if (unlikely(PageSlab(page)))
+ if (unlikely(FolioSlab(folio)))
return NULL;

- if (unlikely(PageSwapCache(page))) {
+ if (unlikely(FolioSwapCache(folio))) {
swp_entry_t entry;

- entry.val = page_private(page);
+ entry.val = folio_private(folio);
return swap_address_space(entry);
}

- mapping = page->mapping;
+ mapping = folio->page.mapping;
if ((unsigned long)mapping & PAGE_MAPPING_ANON)
return NULL;

return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
}
-EXPORT_SYMBOL(page_mapping);
+EXPORT_SYMBOL(folio_mapping);

/*
* For file cache pages, return the address_space, otherwise return NULL
*/
struct address_space *page_mapping_file(struct page *page)
{
- if (unlikely(PageSwapCache(page)))
+ struct folio *folio = page_folio(page);
+
+ if (unlikely(FolioSwapCache(folio)))
return NULL;
- return page_mapping(page);
+ return folio_mapping(folio);
}

/* Slow path of page_mapcount() for compound pages */
--
2.30.0

2021-03-05 04:23:55

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 11/25] mm/memcg: Add folio wrappers for various memcontrol functions

Add new functions folio_memcg(), lock_folio_memcg(),
unlock_folio_memcg() and mem_cgroup_folio_lruvec(). These will
all be used by other functions later.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/memcontrol.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..564ce61319d4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -383,6 +383,11 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

+static inline struct mem_cgroup *folio_memcg(struct folio *folio)
+{
+ return page_memcg(&folio->page);
+}
+
/*
* page_memcg_rcu - locklessly get the memory cgroup associated with a page
* @page: a pointer to the page struct
@@ -871,6 +876,16 @@ struct mem_cgroup *lock_page_memcg(struct page *page);
void __unlock_page_memcg(struct mem_cgroup *memcg);
void unlock_page_memcg(struct page *page);

+static inline struct mem_cgroup *lock_folio_memcg(struct folio *folio)
+{
+ return lock_page_memcg(&folio->page);
+}
+
+static inline void unlock_folio_memcg(struct folio *folio)
+{
+ unlock_page_memcg(&folio->page);
+}
+
/*
* idx can be of type enum memcg_stat_item or node_stat_item.
* Keep in sync with memcg_exact_page_state().
@@ -1291,6 +1306,11 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
}

+static inline struct mem_cgroup *lock_folio_memcg(struct folio *folio)
+{
+ return NULL;
+}
+
static inline struct mem_cgroup *lock_page_memcg(struct page *page)
{
return NULL;
@@ -1300,6 +1320,10 @@ static inline void __unlock_page_memcg(struct mem_cgroup *memcg)
{
}

+static inline void unlock_folio_memcg(struct folio *folio)
+{
+}
+
static inline void unlock_page_memcg(struct page *page)
{
}
@@ -1431,6 +1455,12 @@ static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
}
#endif /* CONFIG_MEMCG */

+static inline struct lruvec *mem_cgroup_folio_lruvec(struct folio *folio,
+ struct pglist_data *pgdat)
+{
+ return mem_cgroup_page_lruvec(&folio->page, pgdat);
+}
+
static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
{
__mod_lruvec_kmem_state(p, idx, 1);
--
2.30.0

2021-03-05 04:23:59

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 12/25] mm/filemap: Add unlock_folio

Convert unlock_page() to call unlock_folio(). By using a folio we
avoid a call to compound_head(). This shortens the function from 39
bytes to 25 and removes 4 instructions on x86-64. Because we still
have unlock_page(), it's a net increase of 24 bytes of text.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 3 ++-
mm/filemap.c | 27 ++++++++++-----------------
mm/folio-compat.c | 6 ++++++
3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5a2c0764d7c0..a34cf531c100 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -640,7 +640,8 @@ extern int __lock_page_killable(struct page *page);
extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
-extern void unlock_page(struct page *page);
+void unlock_page(struct page *page);
+void unlock_folio(struct folio *folio);

/*
* Return true if the page was successfully locked
diff --git a/mm/filemap.c b/mm/filemap.c
index 3d1635d3be3e..9960ef1b2758 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1408,29 +1408,22 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
#endif

/**
- * unlock_page - unlock a locked page
- * @page: the page
+ * unlock_folio - Unlock a locked folio.
+ * @folio: The folio.
*
- * Unlocks the page and wakes up sleepers in wait_on_page_locked().
- * Also wakes sleepers in wait_on_page_writeback() because the wakeup
- * mechanism between PageLocked pages and PageWriteback pages is shared.
- * But that's OK - sleepers in wait_on_page_writeback() just go back to sleep.
+ * Unlocks the folio and wakes up any thread sleeping on the page lock.
*
- * Note that this depends on PG_waiters being the sign bit in the byte
- * that contains PG_locked - thus the BUILD_BUG_ON(). That allows us to
- * clear the PG_locked bit and test PG_waiters at the same time fairly
- * portably (architectures that do LL/SC can test any bit, while x86 can
- * test the sign bit).
+ * Context: May be called from interrupt or process context. May not be
+ * called from NMI context.
*/
-void unlock_page(struct page *page)
+void unlock_folio(struct folio *folio)
{
BUILD_BUG_ON(PG_waiters != 7);
- page = compound_head(page);
- VM_BUG_ON_PAGE(!PageLocked(page), page);
- if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
- wake_up_page_bit(page, PG_locked);
+ VM_BUG_ON_FOLIO(!FolioLocked(folio), folio);
+ if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio)))
+ wake_up_page_bit(&folio->page, PG_locked);
}
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(unlock_folio);

/**
* end_page_writeback - end writeback against a page
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 5e107aa30a62..02798abf19a1 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -11,3 +11,9 @@ struct address_space *page_mapping(struct page *page)
return folio_mapping(page_folio(page));
}
EXPORT_SYMBOL(page_mapping);
+
+void unlock_page(struct page *page)
+{
+ return unlock_folio(page_folio(page));
+}
+EXPORT_SYMBOL(unlock_page);
--
2.30.0

2021-03-05 04:24:27

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 14/25] mm/filemap: Add lock_folio_killable

This is like lock_page_killable() but for use by callers who
know they have a folio. Convert __lock_page_killable() to be
__lock_folio_killable(). This saves one call to compound_head() per
contended call to lock_page_killable().

__lock_folio_killable() is 20 bytes smaller than __lock_page_killable()
was. lock_page_maybe_drop_mmap() shrinks by 68 bytes and
__lock_page_or_retry() shrinks by 66 bytes. That's a total of 154 bytes
of text saved.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 15 ++++++++++-----
mm/filemap.c | 17 +++++++++--------
2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 034e41256340..0fa1a0338e54 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -636,7 +636,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
}

void __lock_folio(struct folio *folio);
-extern int __lock_page_killable(struct page *page);
+int __lock_folio_killable(struct folio *folio);
extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
@@ -676,6 +676,14 @@ static inline void lock_page(struct page *page)
__lock_folio(folio);
}

+static inline int lock_folio_killable(struct folio *folio)
+{
+ might_sleep();
+ if (!trylock_folio(folio))
+ return __lock_folio_killable(folio);
+ return 0;
+}
+
/*
* lock_page_killable is like lock_page but can be interrupted by fatal
* signals. It returns 0 if it locked the page and -EINTR if it was
@@ -683,10 +691,7 @@ static inline void lock_page(struct page *page)
*/
static inline int lock_page_killable(struct page *page)
{
- might_sleep();
- if (!trylock_page(page))
- return __lock_page_killable(page);
- return 0;
+ return lock_folio_killable(page_folio(page));
}

/*
diff --git a/mm/filemap.c b/mm/filemap.c
index 3e3e3c666b94..5acadffed25d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1499,14 +1499,13 @@ void __lock_folio(struct folio *folio)
}
EXPORT_SYMBOL(__lock_folio);

-int __lock_page_killable(struct page *__page)
+int __lock_folio_killable(struct folio *folio)
{
- struct page *page = compound_head(__page);
- wait_queue_head_t *q = page_waitqueue(page);
- return wait_on_page_bit_common(q, page, PG_locked, TASK_KILLABLE,
+ wait_queue_head_t *q = page_waitqueue(&folio->page);
+ return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
EXCLUSIVE);
}
-EXPORT_SYMBOL_GPL(__lock_page_killable);
+EXPORT_SYMBOL_GPL(__lock_folio_killable);

int __lock_page_async(struct page *page, struct wait_page_queue *wait)
{
@@ -1548,6 +1547,8 @@ int __lock_page_async(struct page *page, struct wait_page_queue *wait)
int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags)
{
+ struct folio *folio = page_folio(page);
+
if (fault_flag_allow_retry_first(flags)) {
/*
* CAUTION! In this case, mmap_lock is not released
@@ -1566,13 +1567,13 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
if (flags & FAULT_FLAG_KILLABLE) {
int ret;

- ret = __lock_page_killable(page);
+ ret = __lock_folio_killable(folio);
if (ret) {
mmap_read_unlock(mm);
return 0;
}
} else {
- __lock_folio(page_folio(page));
+ __lock_folio(folio);
}

return 1;
@@ -2734,7 +2735,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,

*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
if (vmf->flags & FAULT_FLAG_KILLABLE) {
- if (__lock_page_killable(&folio->page)) {
+ if (__lock_folio_killable(folio)) {
/*
* We didn't have the right flags to drop the mmap_lock,
* but all fault_handlers only check for fatal signals
--
2.30.0

2021-03-05 04:25:11

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 15/25] mm/filemap: Convert lock_page_async to lock_folio_async

There aren't any actual callers of lock_page_async(), but convert
filemap_update_page() to call __lock_folio_async().

__lock_folio_async() is 21 bytes smaller than __lock_page_async(),
but the real savings come from using a folio in filemap_update_page(),
shrinking it from 514 bytes to 403 bytes, saving 111 bytes. The text
shrinks by 132 bytes in total.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/io_uring.c | 2 +-
include/linux/pagemap.h | 14 +++++++-------
mm/filemap.c | 31 ++++++++++++++++---------------
3 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4a088581b0f2..55687707b5fb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3155,7 +3155,7 @@ static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
}

/*
- * This is our waitqueue callback handler, registered through lock_page_async()
+ * This is our waitqueue callback handler, registered through lock_folio_async()
* when we initially tried to do the IO with the iocb armed our waitqueue.
* This gets called when the page is unlocked, and we generally expect that to
* happen when the page IO is completed and the page is now uptodate. This will
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0fa1a0338e54..9dbd9cf7d541 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -637,7 +637,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,

void __lock_folio(struct folio *folio);
int __lock_folio_killable(struct folio *folio);
-extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
+int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
void unlock_page(struct page *page);
@@ -695,18 +695,18 @@ static inline int lock_page_killable(struct page *page)
}

/*
- * lock_page_async - Lock the page, unless this would block. If the page
- * is already locked, then queue a callback when the page becomes unlocked.
+ * lock_folio_async - Lock the folio, unless this would block. If the folio
+ * is already locked, then queue a callback when the folio becomes unlocked.
* This callback can then retry the operation.
*
- * Returns 0 if the page is locked successfully, or -EIOCBQUEUED if the page
+ * Returns 0 if the folio is locked successfully, or -EIOCBQUEUED if the folio
* was already locked and the callback defined in 'wait' was queued.
*/
-static inline int lock_page_async(struct page *page,
+static inline int lock_folio_async(struct folio *folio,
struct wait_page_queue *wait)
{
- if (!trylock_page(page))
- return __lock_page_async(page, wait);
+ if (!trylock_folio(folio))
+ return __lock_folio_async(folio, wait);
return 0;
}

diff --git a/mm/filemap.c b/mm/filemap.c
index 5acadffed25d..b99b068bc058 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1507,18 +1507,18 @@ int __lock_folio_killable(struct folio *folio)
}
EXPORT_SYMBOL_GPL(__lock_folio_killable);

-int __lock_page_async(struct page *page, struct wait_page_queue *wait)
+int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait)
{
- struct wait_queue_head *q = page_waitqueue(page);
+ struct wait_queue_head *q = page_waitqueue(&folio->page);
int ret = 0;

- wait->page = page;
+ wait->page = &folio->page;
wait->bit_nr = PG_locked;

spin_lock_irq(&q->lock);
__add_wait_queue_entry_tail(q, &wait->wait);
- SetPageWaiters(page);
- ret = !trylock_page(page);
+ SetFolioWaiters(folio);
+ ret = !trylock_folio(folio);
/*
* If we were successful now, we know we're still on the
* waitqueue as we're still under the lock. This means it's
@@ -2265,41 +2265,42 @@ static int filemap_update_page(struct kiocb *iocb,
struct address_space *mapping, struct iov_iter *iter,
struct page *page)
{
+ struct folio *folio = page_folio(page);
int error;

- if (!trylock_page(page)) {
+ if (!trylock_folio(folio)) {
if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
return -EAGAIN;
if (!(iocb->ki_flags & IOCB_WAITQ)) {
- put_and_wait_on_page_locked(page, TASK_KILLABLE);
+ put_and_wait_on_page_locked(&folio->page, TASK_KILLABLE);
return AOP_TRUNCATED_PAGE;
}
- error = __lock_page_async(page, iocb->ki_waitq);
+ error = __lock_folio_async(folio, iocb->ki_waitq);
if (error)
return error;
}

- if (!page->mapping)
+ if (!folio->page.mapping)
goto truncated;

error = 0;
- if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, page))
+ if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, &folio->page))
goto unlock;

error = -EAGAIN;
if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT | IOCB_WAITQ))
goto unlock;

- error = filemap_read_page(iocb->ki_filp, mapping, page);
+ error = filemap_read_page(iocb->ki_filp, mapping, &folio->page);
if (error == AOP_TRUNCATED_PAGE)
- put_page(page);
+ put_folio(folio);
return error;
truncated:
- unlock_page(page);
- put_page(page);
+ unlock_folio(folio);
+ put_folio(folio);
return AOP_TRUNCATED_PAGE;
unlock:
- unlock_page(page);
+ unlock_folio(folio);
return error;
}

--
2.30.0

2021-03-05 04:25:25

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 13/25] mm/filemap: Add lock_folio

This is like lock_page() but for use by callers who know they have a folio.
Convert __lock_page() to be __lock_folio(). This saves one call to
compound_head() per contended call to lock_page().

Saves 362 bytes of text; mostly from improved register allocation and
inlining decisions. __lock_folio is 59 bytes while __lock_page was 79.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 24 +++++++++++++++++++-----
mm/filemap.c | 29 +++++++++++++++--------------
2 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a34cf531c100..034e41256340 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -635,7 +635,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
return true;
}

-extern void __lock_page(struct page *page);
+void __lock_folio(struct folio *folio);
extern int __lock_page_killable(struct page *page);
extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
@@ -643,13 +643,24 @@ extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
void unlock_page(struct page *page);
void unlock_folio(struct folio *folio);

+static inline bool trylock_folio(struct folio *folio)
+{
+ return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio)));
+}
+
/*
* Return true if the page was successfully locked
*/
static inline int trylock_page(struct page *page)
{
- page = compound_head(page);
- return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
+ return trylock_folio(page_folio(page));
+}
+
+static inline void lock_folio(struct folio *folio)
+{
+ might_sleep();
+ if (!trylock_folio(folio))
+ __lock_folio(folio);
}

/*
@@ -657,9 +668,12 @@ static inline int trylock_page(struct page *page)
*/
static inline void lock_page(struct page *page)
{
+ struct folio *folio;
might_sleep();
- if (!trylock_page(page))
- __lock_page(page);
+
+ folio = page_folio(page);
+ if (!trylock_folio(folio))
+ __lock_folio(folio);
}

/*
diff --git a/mm/filemap.c b/mm/filemap.c
index 9960ef1b2758..3e3e3c666b94 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1160,7 +1160,7 @@ static void wake_up_page(struct page *page, int bit)
*/
enum behavior {
EXCLUSIVE, /* Hold ref to page and take the bit when woken, like
- * __lock_page() waiting on then setting PG_locked.
+ * __lock_folio() waiting on then setting PG_locked.
*/
SHARED, /* Hold ref to page and check the bit when woken, like
* wait_on_page_writeback() waiting on PG_writeback.
@@ -1488,17 +1488,16 @@ void page_endio(struct page *page, bool is_write, int err)
EXPORT_SYMBOL_GPL(page_endio);

/**
- * __lock_page - get a lock on the page, assuming we need to sleep to get it
- * @__page: the page to lock
+ * __lock_folio - Get a lock on the folio, assuming we need to sleep to get it.
+ * @folio: The folio to lock
*/
-void __lock_page(struct page *__page)
+void __lock_folio(struct folio *folio)
{
- struct page *page = compound_head(__page);
- wait_queue_head_t *q = page_waitqueue(page);
- wait_on_page_bit_common(q, page, PG_locked, TASK_UNINTERRUPTIBLE,
+ wait_queue_head_t *q = page_waitqueue(&folio->page);
+ wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
EXCLUSIVE);
}
-EXPORT_SYMBOL(__lock_page);
+EXPORT_SYMBOL(__lock_folio);

int __lock_page_killable(struct page *__page)
{
@@ -1573,10 +1572,10 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
return 0;
}
} else {
- __lock_page(page);
+ __lock_folio(page_folio(page));
}
- return 1;

+ return 1;
}

/**
@@ -2720,7 +2719,9 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
struct file **fpin)
{
- if (trylock_page(page))
+ struct folio *folio = page_folio(page);
+
+ if (trylock_folio(folio))
return 1;

/*
@@ -2733,7 +2734,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,

*fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
if (vmf->flags & FAULT_FLAG_KILLABLE) {
- if (__lock_page_killable(page)) {
+ if (__lock_page_killable(&folio->page)) {
/*
* We didn't have the right flags to drop the mmap_lock,
* but all fault_handlers only check for fatal signals
@@ -2745,11 +2746,11 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page,
return 0;
}
} else
- __lock_page(page);
+ __lock_folio(folio);
+
return 1;
}

-
/*
* Synchronous readahead happens when we don't even find a page in the page
* cache at all. We don't want to perform IO under the mmap sem, so if we have
--
2.30.0

2021-03-05 04:26:09

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 16/25] mm/filemap: Convert end_page_writeback to end_folio_writeback

Add a wrapper function for users that are not yet converted to folios.

end_folio_writeback() is less than half the size of end_page_writeback()
at just 105 bytes compared to 213 bytes, due to removing all the
compound_head() calls. The 30 byte wrapper function makes this a net
saving of 70 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 3 ++-
mm/filemap.c | 30 +++++++++++++++---------------
mm/folio-compat.c | 6 ++++++
3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9dbd9cf7d541..e7fe8b276a2d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -753,7 +753,8 @@ static inline int wait_on_page_locked_killable(struct page *page)

int put_and_wait_on_page_locked(struct page *page, int state);
void wait_on_page_writeback(struct page *page);
-extern void end_page_writeback(struct page *page);
+void end_page_writeback(struct page *page);
+void end_folio_writeback(struct folio *folio);
void wait_for_stable_page(struct page *page);

void page_endio(struct page *page, bool is_write, int err);
diff --git a/mm/filemap.c b/mm/filemap.c
index b99b068bc058..e25a5ebc914c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1148,11 +1148,11 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
spin_unlock_irqrestore(&q->lock, flags);
}

-static void wake_up_page(struct page *page, int bit)
+static void wake_up_folio(struct folio *folio, int bit)
{
- if (!PageWaiters(page))
+ if (!FolioWaiters(folio))
return;
- wake_up_page_bit(page, bit);
+ wake_up_page_bit(&folio->page, bit);
}

/*
@@ -1426,10 +1426,10 @@ void unlock_folio(struct folio *folio)
EXPORT_SYMBOL(unlock_folio);

/**
- * end_page_writeback - end writeback against a page
- * @page: the page
+ * end_folio_writeback - End writeback against a page.
+ * @folio: The page.
*/
-void end_page_writeback(struct page *page)
+void end_folio_writeback(struct folio *folio)
{
/*
* TestClearPageReclaim could be used here but it is an atomic
@@ -1438,26 +1438,26 @@ void end_page_writeback(struct page *page)
* justify taking an atomic operation penalty at the end of
* ever page writeback.
*/
- if (PageReclaim(page)) {
- ClearPageReclaim(page);
- rotate_reclaimable_page(page);
+ if (FolioReclaim(folio)) {
+ ClearFolioReclaim(folio);
+ rotate_reclaimable_page(&folio->page);
}

/*
* Writeback does not hold a page reference of its own, relying
* on truncation to wait for the clearing of PG_writeback.
* But here we must make sure that the page is not freed and
- * reused before the wake_up_page().
+ * reused before the wake_up_folio().
*/
- get_page(page);
- if (!test_clear_page_writeback(page))
+ get_folio(folio);
+ if (!test_clear_page_writeback(&folio->page))
BUG();

smp_mb__after_atomic();
- wake_up_page(page, PG_writeback);
- put_page(page);
+ wake_up_folio(folio, PG_writeback);
+ put_folio(folio);
}
-EXPORT_SYMBOL(end_page_writeback);
+EXPORT_SYMBOL(end_folio_writeback);

/*
* After completing I/O on a page, call this routine to update the page
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 02798abf19a1..d1a1dfe52589 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -17,3 +17,9 @@ void unlock_page(struct page *page)
return unlock_folio(page_folio(page));
}
EXPORT_SYMBOL(unlock_page);
+
+void end_page_writeback(struct page *page)
+{
+ return end_folio_writeback(page_folio(page));
+}
+EXPORT_SYMBOL(end_page_writeback);
--
2.30.0

2021-03-05 04:26:16

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 18/25] mm/page-writeback: Add wait_on_folio_writeback

This eliminates a call to compound_head() by turning PageWriteback()
into FolioWriteback(), which saves 8 bytes. That is more than offset
by adding the wait_on_page_writeback compatibility wrapper for a net
increase in text of 30 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 1 +
mm/folio-compat.c | 6 ++++++
mm/page-writeback.c | 21 ++++++++++++++-------
3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c53743f24550..7d797847633c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -763,6 +763,7 @@ static inline int wait_on_page_locked_killable(struct page *page)

int put_and_wait_on_page_locked(struct page *page, int state);
void wait_on_page_writeback(struct page *page);
+void wait_on_folio_writeback(struct folio *folio);
void end_page_writeback(struct page *page);
void end_folio_writeback(struct folio *folio);
void wait_for_stable_page(struct page *page);
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index d1a1dfe52589..6aadecc39fba 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -23,3 +23,9 @@ void end_page_writeback(struct page *page)
return end_folio_writeback(page_folio(page));
}
EXPORT_SYMBOL(end_page_writeback);
+
+void wait_on_page_writeback(struct page *page)
+{
+ return wait_on_folio_writeback(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(wait_on_page_writeback);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eb34d204d4ee..968579452ea4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2821,17 +2821,24 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
}
EXPORT_SYMBOL(__test_set_page_writeback);

-/*
- * Wait for a page to complete writeback
+/**
+ * wait_on_folio_writeback - Wait for a folio to complete writeback.
+ * @folio: The folio to wait for.
+ *
+ * If the folio is currently being written back to storage, waits for the
+ * I/O to complete.
+ *
+ * Context: Sleeps; must be called in process context and with no spinlocks
+ * held.
*/
-void wait_on_page_writeback(struct page *page)
+void wait_on_folio_writeback(struct folio *folio)
{
- while (PageWriteback(page)) {
- trace_wait_on_page_writeback(page, page_mapping(page));
- wait_on_page_bit(page, PG_writeback);
+ while (FolioWriteback(folio)) {
+ trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
+ wait_on_page_bit(&folio->page, PG_writeback);
}
}
-EXPORT_SYMBOL_GPL(wait_on_page_writeback);
+EXPORT_SYMBOL_GPL(wait_on_folio_writeback);

/**
* wait_for_stable_page() - wait for writeback to finish, if necessary.
--
2.30.0

2021-03-05 04:26:38

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 19/25] mm/page-writeback: Add wait_for_stable_folio

Move wait_for_stable_page() into the folio compatibility file.
wait_for_stable_folio() avoids a call to compound_head() and is 14 bytes
smaller than wait_for_stable_page() was. The net text size grows by 24
bytes as a result of this patch.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 1 +
mm/folio-compat.c | 6 ++++++
mm/page-writeback.c | 17 ++++++++---------
3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7d797847633c..e7ca8def2f0d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -767,6 +767,7 @@ void wait_on_folio_writeback(struct folio *folio);
void end_page_writeback(struct page *page);
void end_folio_writeback(struct folio *folio);
void wait_for_stable_page(struct page *page);
+void wait_for_stable_folio(struct folio *folio);

void page_endio(struct page *page, bool is_write, int err);

diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 6aadecc39fba..335594fe414e 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -29,3 +29,9 @@ void wait_on_page_writeback(struct page *page)
return wait_on_folio_writeback(page_folio(page));
}
EXPORT_SYMBOL_GPL(wait_on_page_writeback);
+
+void wait_for_stable_page(struct page *page)
+{
+ return wait_for_stable_folio(page_folio(page));
+}
+EXPORT_SYMBOL_GPL(wait_for_stable_page);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 968579452ea4..eef36edc9e0c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2841,17 +2841,16 @@ void wait_on_folio_writeback(struct folio *folio)
EXPORT_SYMBOL_GPL(wait_on_folio_writeback);

/**
- * wait_for_stable_page() - wait for writeback to finish, if necessary.
- * @page: The page to wait on.
+ * wait_for_stable_folio() - wait for writeback to finish, if necessary.
+ * @folio: The folio to wait on.
*
- * This function determines if the given page is related to a backing device
- * that requires page contents to be held stable during writeback. If so, then
+ * This function determines if the given folio is related to a backing device
+ * that requires folio contents to be held stable during writeback. If so, then
* it will wait for any pending writeback to complete.
*/
-void wait_for_stable_page(struct page *page)
+void wait_for_stable_folio(struct folio *folio)
{
- page = thp_head(page);
- if (page->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
- wait_on_page_writeback(page);
+ if (folio->page.mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
+ wait_on_folio_writeback(folio);
}
-EXPORT_SYMBOL_GPL(wait_for_stable_page);
+EXPORT_SYMBOL_GPL(wait_for_stable_folio);
--
2.30.0

2021-03-05 04:26:50

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 17/25] mm/filemap: Add wait_on_folio_locked & wait_on_folio_locked_killable

Turn wait_on_page_locked() and wait_on_page_locked_killable() into
wrappers. This eliminates a call to compound_head() from each call-site,
reducing text size by 207 bytes for me.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 26 ++++++++++++++++++--------
mm/filemap.c | 4 ++--
2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e7fe8b276a2d..c53743f24550 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -732,23 +732,33 @@ extern void wait_on_page_bit(struct page *page, int bit_nr);
extern int wait_on_page_bit_killable(struct page *page, int bit_nr);

/*
- * Wait for a page to be unlocked.
+ * Wait for a folio to be unlocked.
*
- * This must be called with the caller "holding" the page,
- * ie with increased "page->count" so that the page won't
+ * This must be called with the caller "holding" the folio,
+ * ie with increased "page->count" so that the folio won't
* go away during the wait..
*/
+static inline void wait_on_folio_locked(struct folio *folio)
+{
+ if (FolioLocked(folio))
+ wait_on_page_bit(&folio->page, PG_locked);
+}
+
+static inline int wait_on_folio_locked_killable(struct folio *folio)
+{
+ if (!FolioLocked(folio))
+ return 0;
+ return wait_on_page_bit_killable(&folio->page, PG_locked);
+}
+
static inline void wait_on_page_locked(struct page *page)
{
- if (PageLocked(page))
- wait_on_page_bit(compound_head(page), PG_locked);
+ wait_on_folio_locked(page_folio(page));
}

static inline int wait_on_page_locked_killable(struct page *page)
{
- if (!PageLocked(page))
- return 0;
- return wait_on_page_bit_killable(compound_head(page), PG_locked);
+ return wait_on_folio_locked_killable(page_folio(page));
}

int put_and_wait_on_page_locked(struct page *page, int state);
diff --git a/mm/filemap.c b/mm/filemap.c
index e25a5ebc914c..50263fa62574 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1559,9 +1559,9 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,

mmap_read_unlock(mm);
if (flags & FAULT_FLAG_KILLABLE)
- wait_on_page_locked_killable(page);
+ wait_on_folio_locked_killable(folio);
else
- wait_on_page_locked(page);
+ wait_on_folio_locked(folio);
return 0;
}
if (flags & FAULT_FLAG_KILLABLE) {
--
2.30.0

2021-03-05 04:27:41

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 22/25] mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit

All callers have a folio, so use it directly.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/filemap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 76c97cb9cbbe..e91fa14c86c7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1094,14 +1094,14 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
return (flags & WQ_FLAG_EXCLUSIVE) != 0;
}

-static void wake_up_page_bit(struct page *page, int bit_nr)
+static void wake_up_folio_bit(struct folio *folio, int bit_nr)
{
- wait_queue_head_t *q = page_waitqueue(page);
+ wait_queue_head_t *q = page_waitqueue(&folio->page);
struct wait_page_key key;
unsigned long flags;
wait_queue_entry_t bookmark;

- key.page = page;
+ key.page = &folio->page;
key.bit_nr = bit_nr;
key.page_match = 0;

@@ -1136,7 +1136,7 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
* page waiters.
*/
if (!waitqueue_active(q) || !key.page_match) {
- ClearPageWaiters(page);
+ ClearFolioWaiters(folio);
/*
* It's possible to miss clearing Waiters here, when we woke
* our page waiters, but the hashed waitqueue has waiters for
@@ -1152,7 +1152,7 @@ static void wake_up_folio(struct folio *folio, int bit)
{
if (!FolioWaiters(folio))
return;
- wake_up_page_bit(&folio->page, bit);
+ wake_up_folio_bit(folio, bit);
}

/*
@@ -1417,7 +1417,7 @@ void unlock_folio(struct folio *folio)
BUILD_BUG_ON(PG_waiters != 7);
VM_BUG_ON_FOLIO(!FolioLocked(folio), folio);
if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio)))
- wake_up_page_bit(&folio->page, PG_locked);
+ wake_up_folio_bit(folio, PG_locked);
}
EXPORT_SYMBOL(unlock_folio);

--
2.30.0

2021-03-05 04:27:47

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 21/25] mm/filemap: Add __lock_folio_or_retry

Convert __lock_page_or_retry() to __lock_folio_or_retry(). This actually
saves 4 bytes in the only caller of lock_page_or_retry() (due to better
register allocation) and saves the 20 byte cost of calling page_folio()
in __lock_folio_or_retry() for a total saving of 24 bytes.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 9 ++++++---
mm/filemap.c | 10 ++++------
mm/memory.c | 8 ++++----
3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8737eb86602e..6ee4bc843f98 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -638,7 +638,7 @@ static inline bool wake_page_match(struct wait_page_queue *wait_page,
void __lock_folio(struct folio *folio);
int __lock_folio_killable(struct folio *folio);
int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait);
-extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
+int __lock_folio_or_retry(struct folio *folio, struct mm_struct *mm,
unsigned int flags);
void unlock_page(struct page *page);
void unlock_folio(struct folio *folio);
@@ -715,13 +715,16 @@ static inline int lock_folio_async(struct folio *folio,
* caller indicated that it can handle a retry.
*
* Return value and mmap_lock implications depend on flags; see
- * __lock_page_or_retry().
+ * __lock_folio_or_retry().
*/
static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags)
{
+ struct folio *folio;
might_sleep();
- return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+ folio = page_folio(page);
+ return trylock_folio(folio) || __lock_folio_or_retry(folio, mm, flags);
}

/*
diff --git a/mm/filemap.c b/mm/filemap.c
index ec5a743b4e0f..76c97cb9cbbe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1529,20 +1529,18 @@ int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait)

/*
* Return values:
- * 1 - page is locked; mmap_lock is still held.
- * 0 - page is not locked.
+ * 1 - folio is locked; mmap_lock is still held.
+ * 0 - folio is not locked.
* mmap_lock has been released (mmap_read_unlock(), unless flags had both
* FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
* which case mmap_lock is still held.
*
* If neither ALLOW_RETRY nor KILLABLE are set, will always return 1
- * with the page locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock unperturbed.
*/
-int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
+int __lock_folio_or_retry(struct folio *folio, struct mm_struct *mm,
unsigned int flags)
{
- struct folio *folio = page_folio(page);
-
if (fault_flag_allow_retry_first(flags)) {
/*
* CAUTION! In this case, mmap_lock is not released
diff --git a/mm/memory.c b/mm/memory.c
index c8e357627318..51a3590418f0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4022,7 +4022,7 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults).
* The mmap_lock may have been released depending on flags and our
- * return value. See filemap_fault() and __lock_page_or_retry().
+ * return value. See filemap_fault() and __lock_folio_or_retry().
* If mmap_lock is released, vma may become invalid (for example
* by other thread calling munmap()).
*/
@@ -4256,7 +4256,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
* concurrent faults).
*
* The mmap_lock may have been released depending on flags and our return value.
- * See filemap_fault() and __lock_page_or_retry().
+ * See filemap_fault() and __lock_folio_or_retry().
*/
static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
{
@@ -4360,7 +4360,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
* By the time we get here, we already hold the mm semaphore
*
* The mmap_lock may have been released depending on flags and our
- * return value. See filemap_fault() and __lock_page_or_retry().
+ * return value. See filemap_fault() and __lock_folio_or_retry().
*/
static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
unsigned long address, unsigned int flags)
@@ -4516,7 +4516,7 @@ static inline void mm_account_fault(struct pt_regs *regs,
* By the time we get here, we already hold the mm semaphore
*
* The mmap_lock may have been released depending on flags and our
- * return value. See filemap_fault() and __lock_page_or_retry().
+ * return value. See filemap_fault() and __lock_folio_or_retry().
*/
vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
unsigned int flags, struct pt_regs *regs)
--
2.30.0

2021-03-05 04:27:54

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 20/25] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit

We must always wait on the folio, otherwise we won't be woken up.

This commit shrinks the kernel by 695 bytes, mostly due to moving
the page waitqueue lookup into wait_on_folio_bit_common().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/afs/write.c | 21 ++++++++--------
include/linux/pagemap.h | 10 ++++----
mm/filemap.c | 56 ++++++++++++++++++-----------------------
mm/page-writeback.c | 2 +-
4 files changed, 42 insertions(+), 47 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index c9195fc67fd8..0c313117200a 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -834,13 +834,14 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
*/
vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
{
+ struct folio *folio = page_folio(vmf->page);
struct file *file = vmf->vma->vm_file;
struct inode *inode = file_inode(file);
struct afs_vnode *vnode = AFS_FS_I(inode);
unsigned long priv;

_enter("{{%llx:%llu}},{%lx}",
- vnode->fid.vid, vnode->fid.vnode, vmf->page->index);
+ vnode->fid.vid, vnode->fid.vnode, folio->page.index);

sb_start_pagefault(inode->i_sb);

@@ -851,27 +852,27 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
fscache_wait_on_page_write(vnode->cache, vmf->page);
#endif

- if (PageWriteback(vmf->page) &&
- wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
+ if (FolioWriteback(folio) &&
+ wait_on_folio_bit_killable(folio, PG_writeback) < 0)
return VM_FAULT_RETRY;

- if (lock_page_killable(vmf->page) < 0)
+ if (lock_folio_killable(folio) < 0)
return VM_FAULT_RETRY;

/* We mustn't change page->private until writeback is complete as that
* details the portion of the page we need to write back and we might
* need to redirty the page if there's a problem.
*/
- wait_on_page_writeback(vmf->page);
+ wait_on_folio_writeback(folio);

- priv = afs_page_dirty(0, PAGE_SIZE);
+ priv = afs_page_dirty(0, folio_size(folio));
priv = afs_page_dirty_mmapped(priv);
trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"),
- vmf->page->index, priv);
- if (PagePrivate(vmf->page))
- set_page_private(vmf->page, priv);
+ folio->page.index, priv);
+ if (FolioPrivate(folio))
+ set_folio_private(folio, priv);
else
- attach_page_private(vmf->page, (void *)priv);
+ attach_folio_private(folio, (void *)priv);
file_update_time(file);

sb_end_pagefault(inode->i_sb);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e7ca8def2f0d..8737eb86602e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -725,11 +725,11 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
}

/*
- * This is exported only for wait_on_page_locked/wait_on_page_writeback, etc.,
+ * This is exported only for wait_on_folio_locked/wait_on_folio_writeback, etc.,
* and should not be used directly.
*/
-extern void wait_on_page_bit(struct page *page, int bit_nr);
-extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
+extern void wait_on_folio_bit(struct folio *folio, int bit_nr);
+extern int wait_on_folio_bit_killable(struct folio *folio, int bit_nr);

/*
* Wait for a folio to be unlocked.
@@ -741,14 +741,14 @@ extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
static inline void wait_on_folio_locked(struct folio *folio)
{
if (FolioLocked(folio))
- wait_on_page_bit(&folio->page, PG_locked);
+ wait_on_folio_bit(folio, PG_locked);
}

static inline int wait_on_folio_locked_killable(struct folio *folio)
{
if (!FolioLocked(folio))
return 0;
- return wait_on_page_bit_killable(&folio->page, PG_locked);
+ return wait_on_folio_bit_killable(folio, PG_locked);
}

static inline void wait_on_page_locked(struct page *page)
diff --git a/mm/filemap.c b/mm/filemap.c
index 50263fa62574..ec5a743b4e0f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1075,7 +1075,7 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
*
* So update the flags atomically, and wake up the waiter
* afterwards to avoid any races. This store-release pairs
- * with the load-acquire in wait_on_page_bit_common().
+ * with the load-acquire in wait_on_folio_bit_common().
*/
smp_store_release(&wait->flags, flags | WQ_FLAG_WOKEN);
wake_up_state(wait->private, mode);
@@ -1156,7 +1156,7 @@ static void wake_up_folio(struct folio *folio, int bit)
}

/*
- * A choice of three behaviors for wait_on_page_bit_common():
+ * A choice of three behaviors for wait_on_folio_bit_common():
*/
enum behavior {
EXCLUSIVE, /* Hold ref to page and take the bit when woken, like
@@ -1190,9 +1190,10 @@ static inline bool trylock_page_bit_common(struct page *page, int bit_nr,
/* How many times do we accept lock stealing from under a waiter? */
int sysctl_page_lock_unfairness = 5;

-static inline int wait_on_page_bit_common(wait_queue_head_t *q,
- struct page *page, int bit_nr, int state, enum behavior behavior)
+static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,
+ int state, enum behavior behavior)
{
+ wait_queue_head_t *q = page_waitqueue(&folio->page);
int unfairness = sysctl_page_lock_unfairness;
struct wait_page_queue wait_page;
wait_queue_entry_t *wait = &wait_page.wait;
@@ -1201,8 +1202,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
unsigned long pflags;

if (bit_nr == PG_locked &&
- !PageUptodate(page) && PageWorkingset(page)) {
- if (!PageSwapBacked(page)) {
+ !FolioUptodate(folio) && FolioWorkingset(folio)) {
+ if (!FolioSwapBacked(folio)) {
delayacct_thrashing_start();
delayacct = true;
}
@@ -1212,7 +1213,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,

init_wait(wait);
wait->func = wake_page_function;
- wait_page.page = page;
+ wait_page.page = &folio->page;
wait_page.bit_nr = bit_nr;

repeat:
@@ -1227,7 +1228,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
* Do one last check whether we can get the
* page bit synchronously.
*
- * Do the SetPageWaiters() marking before that
+ * Do the SetFolioWaiters() marking before that
* to let any waker we _just_ missed know they
* need to wake us up (otherwise they'll never
* even go to the slow case that looks at the
@@ -1238,8 +1239,8 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
* lock to avoid races.
*/
spin_lock_irq(&q->lock);
- SetPageWaiters(page);
- if (!trylock_page_bit_common(page, bit_nr, wait))
+ SetFolioWaiters(folio);
+ if (!trylock_page_bit_common(&folio->page, bit_nr, wait))
__add_wait_queue_entry_tail(q, wait);
spin_unlock_irq(&q->lock);

@@ -1249,10 +1250,10 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
* see whether the page bit testing has already
* been done by the wake function.
*
- * We can drop our reference to the page.
+ * We can drop our reference to the folio.
*/
if (behavior == DROP)
- put_page(page);
+ put_folio(folio);

/*
* Note that until the "finish_wait()", or until
@@ -1289,7 +1290,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
*
* And if that fails, we'll have to retry this all.
*/
- if (unlikely(test_and_set_bit(bit_nr, &page->flags)))
+ if (unlikely(test_and_set_bit(bit_nr, folio_flags(folio))))
goto repeat;

wait->flags |= WQ_FLAG_DONE;
@@ -1298,7 +1299,7 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,

/*
* If a signal happened, this 'finish_wait()' may remove the last
- * waiter from the wait-queues, but the PageWaiters bit will remain
+ * waiter from the wait-queues, but the FolioWaiters bit will remain
* set. That's ok. The next wakeup will take care of it, and trying
* to do it here would be difficult and prone to races.
*/
@@ -1329,19 +1330,17 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
}

-void wait_on_page_bit(struct page *page, int bit_nr)
+void wait_on_folio_bit(struct folio *folio, int bit_nr)
{
- wait_queue_head_t *q = page_waitqueue(page);
- wait_on_page_bit_common(q, page, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
+ wait_on_folio_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
}
-EXPORT_SYMBOL(wait_on_page_bit);
+EXPORT_SYMBOL(wait_on_folio_bit);

-int wait_on_page_bit_killable(struct page *page, int bit_nr)
+int wait_on_folio_bit_killable(struct folio *folio, int bit_nr)
{
- wait_queue_head_t *q = page_waitqueue(page);
- return wait_on_page_bit_common(q, page, bit_nr, TASK_KILLABLE, SHARED);
+ return wait_on_folio_bit_common(folio, bit_nr, TASK_KILLABLE, SHARED);
}
-EXPORT_SYMBOL(wait_on_page_bit_killable);
+EXPORT_SYMBOL(wait_on_folio_bit_killable);

/**
* put_and_wait_on_page_locked - Drop a reference and wait for it to be unlocked
@@ -1358,11 +1357,8 @@ EXPORT_SYMBOL(wait_on_page_bit_killable);
*/
int put_and_wait_on_page_locked(struct page *page, int state)
{
- wait_queue_head_t *q;
-
- page = compound_head(page);
- q = page_waitqueue(page);
- return wait_on_page_bit_common(q, page, PG_locked, state, DROP);
+ return wait_on_folio_bit_common(page_folio(page), PG_locked, state,
+ DROP);
}

/**
@@ -1493,16 +1489,14 @@ EXPORT_SYMBOL_GPL(page_endio);
*/
void __lock_folio(struct folio *folio)
{
- wait_queue_head_t *q = page_waitqueue(&folio->page);
- wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_UNINTERRUPTIBLE,
+ wait_on_folio_bit_common(folio, PG_locked, TASK_UNINTERRUPTIBLE,
EXCLUSIVE);
}
EXPORT_SYMBOL(__lock_folio);

int __lock_folio_killable(struct folio *folio)
{
- wait_queue_head_t *q = page_waitqueue(&folio->page);
- return wait_on_page_bit_common(q, &folio->page, PG_locked, TASK_KILLABLE,
+ return wait_on_folio_bit_common(folio, PG_locked, TASK_KILLABLE,
EXCLUSIVE);
}
EXPORT_SYMBOL_GPL(__lock_folio_killable);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eef36edc9e0c..6c1b4737c383 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2835,7 +2835,7 @@ void wait_on_folio_writeback(struct folio *folio)
{
while (FolioWriteback(folio)) {
trace_wait_on_page_writeback(&folio->page, folio_mapping(folio));
- wait_on_page_bit(&folio->page, PG_writeback);
+ wait_on_folio_bit(folio, PG_writeback);
}
}
EXPORT_SYMBOL_GPL(wait_on_folio_writeback);
--
2.30.0

2021-03-05 04:28:20

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 23/25] mm/page-writeback: Convert test_clear_page_writeback to take a folio

The one caller of test_clear_page_writeback() already has a folio,
so rename it to test_clear_folio_writeback() to make it clear that it
operates on the entire folio. This removes a few calls to compound_head()
but actually grows the function by 49 bytes because it now accounts for
the number of pages in the folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/page-flags.h | 2 +-
mm/filemap.c | 2 +-
mm/page-writeback.c | 20 ++++++++++----------
3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 90381858d901..01aa4a71bf14 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -594,7 +594,7 @@ static __always_inline void SetPageUptodate(struct page *page)

CLEARPAGEFLAG(Uptodate, uptodate, PF_NO_TAIL)

-int test_clear_page_writeback(struct page *page);
+int test_clear_folio_writeback(struct folio *folio);
int __test_set_page_writeback(struct page *page, bool keep_write);

#define test_set_page_writeback(page) \
diff --git a/mm/filemap.c b/mm/filemap.c
index e91fa14c86c7..57f46ff2f230 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1446,7 +1446,7 @@ void end_folio_writeback(struct folio *folio)
* reused before the wake_up_folio().
*/
get_folio(folio);
- if (!test_clear_page_writeback(&folio->page))
+ if (!test_clear_folio_writeback(folio))
BUG();

smp_mb__after_atomic();
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 6c1b4737c383..fa3411ea4cd3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -589,7 +589,7 @@ static void wb_domain_writeout_inc(struct wb_domain *dom,

/*
* Increment @wb's writeout completion count and the global writeout
- * completion count. Called from test_clear_page_writeback().
+ * completion count. Called from test_clear_folio_writeback().
*/
static inline void __wb_writeout_inc(struct bdi_writeback *wb)
{
@@ -2719,24 +2719,24 @@ int clear_page_dirty_for_io(struct page *page)
}
EXPORT_SYMBOL(clear_page_dirty_for_io);

-int test_clear_page_writeback(struct page *page)
+int test_clear_folio_writeback(struct folio *folio)
{
- struct address_space *mapping = page_mapping(page);
+ struct address_space *mapping = folio_mapping(folio);
struct mem_cgroup *memcg;
struct lruvec *lruvec;
int ret;

- memcg = lock_page_memcg(page);
- lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+ memcg = lock_folio_memcg(folio);
+ lruvec = mem_cgroup_folio_lruvec(folio, folio_pgdat(folio));
if (mapping && mapping_use_writeback_tags(mapping)) {
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
unsigned long flags;

xa_lock_irqsave(&mapping->i_pages, flags);
- ret = TestClearPageWriteback(page);
+ ret = TestClearFolioWriteback(folio);
if (ret) {
- __xa_clear_mark(&mapping->i_pages, page_index(page),
+ __xa_clear_mark(&mapping->i_pages, folio_index(folio),
PAGECACHE_TAG_WRITEBACK);
if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT) {
struct bdi_writeback *wb = inode_to_wb(inode);
@@ -2752,12 +2752,12 @@ int test_clear_page_writeback(struct page *page)

xa_unlock_irqrestore(&mapping->i_pages, flags);
} else {
- ret = TestClearPageWriteback(page);
+ ret = TestClearFolioWriteback(folio);
}
if (ret) {
dec_lruvec_state(lruvec, NR_WRITEBACK);
- dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
- inc_node_page_state(page, NR_WRITTEN);
+ dec_zone_folio_stat(folio, NR_ZONE_WRITE_PENDING);
+ inc_node_folio_stat(folio, NR_WRITTEN);
}
__unlock_page_memcg(memcg);
return ret;
--
2.30.0

2021-03-05 04:29:12

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 25/25] cachefiles: Switch to wait_page_key

Cachefiles was relying on wait_page_key and wait_bit_key being the
same layout, which is fragile. Now that wait_page_key is exposed in
the pagemap.h header, we can remove that fragility. Also switch it
to use the folio directly instead of the page.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/cachefiles/rdwr.c | 13 ++++++-------
include/linux/pagemap.h | 1 -
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e027c718ca01..b1dbc484a9c7 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -24,22 +24,21 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
container_of(wait, struct cachefiles_one_read, monitor);
struct cachefiles_object *object;
struct fscache_retrieval *op = monitor->op;
- struct wait_bit_key *key = _key;
- struct page *page = wait->private;
+ struct wait_page_key *key = _key;
+ struct folio *folio = wait->private;

ASSERT(key);

_enter("{%lu},%u,%d,{%p,%u}",
monitor->netfs_page->index, mode, sync,
- key->flags, key->bit_nr);
+ key->folio, key->bit_nr);

- if (key->flags != &page->flags ||
- key->bit_nr != PG_locked)
+ if (key->folio != folio || key->bit_nr != PG_locked)
return 0;

- _debug("--- monitor %p %lx ---", page, page->flags);
+ _debug("--- monitor %p %lx ---", folio, folio->page.flags);

- if (!PageUptodate(page) && !PageError(page)) {
+ if (!FolioUptodate(folio) && !FolioError(folio)) {
/* unlocked, not uptodate and not erronous? */
_debug("page probably truncated");
}
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2236f726bf01..699a6a0bcc54 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -609,7 +609,6 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
return pgoff;
}

-/* This has the same layout as wait_bit_key - see fs/cachefiles/rdwr.c */
struct wait_page_key {
struct folio *folio;
int bit_nr;
--
2.30.0

2021-03-05 04:29:43

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH v4 24/25] mm/filemap: Convert page wait queues to be folios

Reinforce that if we're waiting for a bit in a struct page, that's
actually in the head page by changing the type from page to folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 6 +++---
mm/filemap.c | 30 ++++++++++++++++--------------
2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6ee4bc843f98..2236f726bf01 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -611,13 +611,13 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,

/* This has the same layout as wait_bit_key - see fs/cachefiles/rdwr.c */
struct wait_page_key {
- struct page *page;
+ struct folio *folio;
int bit_nr;
int page_match;
};

struct wait_page_queue {
- struct page *page;
+ struct folio *folio;
int bit_nr;
wait_queue_entry_t wait;
};
@@ -625,7 +625,7 @@ struct wait_page_queue {
static inline bool wake_page_match(struct wait_page_queue *wait_page,
struct wait_page_key *key)
{
- if (wait_page->page != key->page)
+ if (wait_page->folio != key->folio)
return false;
key->page_match = 1;

diff --git a/mm/filemap.c b/mm/filemap.c
index 57f46ff2f230..1cdd565c69a6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -992,11 +992,11 @@ EXPORT_SYMBOL(__page_cache_alloc);
*/
#define PAGE_WAIT_TABLE_BITS 8
#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)
-static wait_queue_head_t page_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;
+static wait_queue_head_t folio_wait_table[PAGE_WAIT_TABLE_SIZE] __cacheline_aligned;

-static wait_queue_head_t *page_waitqueue(struct page *page)
+static wait_queue_head_t *folio_waitqueue(struct folio *folio)
{
- return &page_wait_table[hash_ptr(page, PAGE_WAIT_TABLE_BITS)];
+ return &folio_wait_table[hash_ptr(folio, PAGE_WAIT_TABLE_BITS)];
}

void __init pagecache_init(void)
@@ -1004,7 +1004,7 @@ void __init pagecache_init(void)
int i;

for (i = 0; i < PAGE_WAIT_TABLE_SIZE; i++)
- init_waitqueue_head(&page_wait_table[i]);
+ init_waitqueue_head(&folio_wait_table[i]);

page_writeback_init();
}
@@ -1059,10 +1059,11 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
*/
flags = wait->flags;
if (flags & WQ_FLAG_EXCLUSIVE) {
- if (test_bit(key->bit_nr, &key->page->flags))
+ if (test_bit(key->bit_nr, &key->folio->page.flags))
return -1;
if (flags & WQ_FLAG_CUSTOM) {
- if (test_and_set_bit(key->bit_nr, &key->page->flags))
+ if (test_and_set_bit(key->bit_nr,
+ &key->folio->page.flags))
return -1;
flags |= WQ_FLAG_DONE;
}
@@ -1096,12 +1097,12 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,

static void wake_up_folio_bit(struct folio *folio, int bit_nr)
{
- wait_queue_head_t *q = page_waitqueue(&folio->page);
+ wait_queue_head_t *q = folio_waitqueue(folio);
struct wait_page_key key;
unsigned long flags;
wait_queue_entry_t bookmark;

- key.page = &folio->page;
+ key.folio = folio;
key.bit_nr = bit_nr;
key.page_match = 0;

@@ -1193,7 +1194,7 @@ int sysctl_page_lock_unfairness = 5;
static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,
int state, enum behavior behavior)
{
- wait_queue_head_t *q = page_waitqueue(&folio->page);
+ wait_queue_head_t *q = folio_waitqueue(folio);
int unfairness = sysctl_page_lock_unfairness;
struct wait_page_queue wait_page;
wait_queue_entry_t *wait = &wait_page.wait;
@@ -1213,7 +1214,7 @@ static inline int wait_on_folio_bit_common(struct folio *folio, int bit_nr,

init_wait(wait);
wait->func = wake_page_function;
- wait_page.page = &folio->page;
+ wait_page.folio = folio;
wait_page.bit_nr = bit_nr;

repeat:
@@ -1370,12 +1371,13 @@ int put_and_wait_on_page_locked(struct page *page, int state)
*/
void add_page_wait_queue(struct page *page, wait_queue_entry_t *waiter)
{
- wait_queue_head_t *q = page_waitqueue(page);
+ struct folio *folio = page_folio(page);
+ wait_queue_head_t *q = folio_waitqueue(folio);
unsigned long flags;

spin_lock_irqsave(&q->lock, flags);
__add_wait_queue_entry_tail(q, waiter);
- SetPageWaiters(page);
+ SetFolioWaiters(folio);
spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL_GPL(add_page_wait_queue);
@@ -1503,10 +1505,10 @@ EXPORT_SYMBOL_GPL(__lock_folio_killable);

int __lock_folio_async(struct folio *folio, struct wait_page_queue *wait)
{
- struct wait_queue_head *q = page_waitqueue(&folio->page);
+ struct wait_queue_head *q = folio_waitqueue(folio);
int ret = 0;

- wait->page = &folio->page;
+ wait->folio = folio;
wait->bit_nr = PG_locked;

spin_lock_irq(&q->lock);
--
2.30.0

2021-03-13 20:38:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Fri, 5 Mar 2021 04:18:36 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:

> Our type system does not currently distinguish between tail pages and
> head or single pages. This is a problem because we call compound_head()
> multiple times (and the compiler cannot optimise it out), bloating the
> kernel. It also makes programming hard as it is often unclear whether
> a function operates on an individual page, or an entire compound page.
>
> This patch series introduces the struct folio, which is a type that
> represents an entire compound page. This initial set reduces the kernel
> size by approximately 6kB, although its real purpose is adding
> infrastructure to enable further use of the folio.

Geeze it's a lot of noise. More things to remember and we'll forever
have a mismash of `page' and `folio' and code everywhere converting
from one to the other. Ongoing addition of folio
accessors/manipulators to overlay the existing page
accessors/manipulators, etc.

It's unclear to me that it's all really worth it. What feedback have
you seen from others?

2021-03-13 20:39:03

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Fri, 5 Mar 2021 04:18:37 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:

> A struct folio refers to an entire (possibly compound) page. A function
> which takes a struct folio argument declares that it will operate on the
> entire compound page, not just PAGE_SIZE bytes. In return, the caller
> guarantees that the pointer it is passing does not point to a tail page.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> include/linux/mm.h | 30 ++++++++++++++++++++++++++++++
> include/linux/mm_types.h | 17 +++++++++++++++++

Perhaps a new folio.h would be neater.

> @@ -1518,6 +1523,30 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
> #endif
> }
>
> +static inline unsigned long folio_nr_pages(struct folio *folio)
> +{
> + return compound_nr(&folio->page);
> +}
> +
> +static inline struct folio *next_folio(struct folio *folio)
> +{
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> + return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
> +#else
> + return folio + folio_nr_pages(folio);
> +#endif
> +}

It's a shame this isn't called folio_something(), like the rest of the API.

Unclear what this does. Some comments would help.

> +static inline unsigned int folio_shift(struct folio *folio)
> +{
> + return PAGE_SHIFT + folio_order(folio);
> +}
> +
> +static inline size_t folio_size(struct folio *folio)
> +{
> + return PAGE_SIZE << folio_order(folio);
> +}

Why size_t? That's pretty rare in this space and I'd have expected
unsigned long.


> @@ -1623,6 +1652,7 @@ extern void pagefault_out_of_memory(void);
>
> #define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK)
> #define offset_in_thp(page, p) ((unsigned long)(p) & (thp_size(page) - 1))
> +#define offset_in_folio(folio, p) ((unsigned long)(p) & (folio_size(folio) - 1))
>
> /*
> * Flags passed to show_mem() and show_free_areas() to suppress output in
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 0974ad501a47..a311cb48526f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -223,6 +223,23 @@ struct page {
> #endif
> } _struct_page_alignment;
>
> +/*
> + * A struct folio is either a base (order-0) page or the head page of
> + * a compound page.
> + */
> +struct folio {
> + struct page page;
> +};
> +
> +static inline struct folio *page_folio(struct page *page)
> +{
> + unsigned long head = READ_ONCE(page->compound_head);
> +
> + if (unlikely(head & 1))
> + return (struct folio *)(head - 1);
> + return (struct folio *)page;
> +}

What purpose does the READ_ONCE() serve?


2021-03-13 20:41:02

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 09/25] mm: Add folio_index, folio_page and folio_contains

On Fri, 5 Mar 2021 04:18:45 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:

> folio_index() is the equivalent of page_index() for folios. folio_page()
> finds the page in a folio for a page cache index. folio_contains()
> tells you whether a folio contains a particular page cache index.
>

copy-paste changelog into each function's covering comment?

> ---
> include/linux/pagemap.h | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f07c03da83f6..5094b50f7680 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -447,6 +447,29 @@ static inline bool thp_contains(struct page *head, pgoff_t index)
> return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
> }
>
> +static inline pgoff_t folio_index(struct folio *folio)
> +{
> + if (unlikely(FolioSwapCache(folio)))
> + return __page_file_index(&folio->page);
> + return folio->page.index;
> +}
> +
> +static inline struct page *folio_page(struct folio *folio, pgoff_t index)
> +{
> + index -= folio_index(folio);
> + VM_BUG_ON_FOLIO(index >= folio_nr_pages(folio), folio);
> + return &folio->page + index;
> +}

One would expect folio_page() to be the reverse of page_folio(), only
it isn't anything like that.

> +/* Does this folio contain this index? */
> +static inline bool folio_contains(struct folio *folio, pgoff_t index)
> +{
> + /* HugeTLBfs indexes the page cache in units of hpage_size */
> + if (PageHuge(&folio->page))
> + return folio->page.index == index;
> + return index - folio_index(folio) < folio_nr_pages(folio);
> +}
> +
> /*
> * Given the page we found in the page cache, return the page corresponding
> * to this index in the file

2021-03-13 20:41:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 03/25] mm/vmstat: Add functions to account folio statistics

On Fri, 5 Mar 2021 04:18:39 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:

> Allow page counters to be more readily modified by callers which have
> a folio. Name these wrappers with 'stat' instead of 'state' as requested
> by Linus here:
> https://lore.kernel.org/linux-mm/[email protected]om/
>
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -402,6 +402,54 @@ static inline void drain_zonestat(struct zone *zone,
> struct per_cpu_pageset *pset) { }
> #endif /* CONFIG_SMP */
>
> +static inline
> +void __inc_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
> +{
> + __mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
> +}

The naming is unfortunate. We expect

inc: add one to
dec: subtract one from
mod: modify by signed quantity

So these are inconsistent. Perhaps use "add" and "sub" instead. At
least to alert people to the fact that these are different.

And, again, it's nice to see the subsystem's name leading the
identifiers, so "zone_folio_stat_add()".


2021-03-14 02:42:28

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Sat, Mar 13, 2021 at 12:36:58PM -0800, Andrew Morton wrote:
> On Fri, 5 Mar 2021 04:18:36 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
> > Our type system does not currently distinguish between tail pages and
> > head or single pages. This is a problem because we call compound_head()
> > multiple times (and the compiler cannot optimise it out), bloating the
> > kernel. It also makes programming hard as it is often unclear whether
> > a function operates on an individual page, or an entire compound page.
> >
> > This patch series introduces the struct folio, which is a type that
> > represents an entire compound page. This initial set reduces the kernel
> > size by approximately 6kB, although its real purpose is adding
> > infrastructure to enable further use of the folio.
>
> Geeze it's a lot of noise. More things to remember and we'll forever
> have a mismash of `page' and `folio' and code everywhere converting
> from one to the other. Ongoing addition of folio
> accessors/manipulators to overlay the existing page
> accessors/manipulators, etc.
>
> It's unclear to me that it's all really worth it. What feedback have
> you seen from others?

Mmm. The thing is, the alternative is ongoing bugs. And inefficiencies.
Today, we have code everywhere converting from tail pages to head pages
-- we just don't notice it because it's all wrapped up in macros. I
have over 10kB in text size reductions in my tree (yes, it's a monster
series of patches), almost all from removing those conversions. And
it's far from done.

And these conversions are all in hot paths, like handling page faults
and read(). For example:

filemap_fault 1980 1289 -691

it's two-thirds the size it was! Surely that's not all in the hot path,
but still it's going to have some kind of effect.

As well, we have code today that _looks_ right but is buggy. Take a
look at vfs_dedupe_file_range_compare(). There's nothing wrong with
it at first glance, until you realise that vfs_dedupe_get_page() might
return a tail page, and you can't look at page->mapping for a tail page.
Nor page->index, so vfs_lock_two_pages() is also broken.

As far as feedback, I really want more. Particularly from filesystem
people. I don't think a lot of them realise yet that I'm going to change
15 of the 22 address_space_ops to work with folios instead of pages.
Individual filesystems can keep working with pages, of course, until
they enable the "use multipage folios" bit.

2021-03-14 03:12:27

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Sat, 13 Mar 2021, Andrew Morton wrote:
> On Fri, 5 Mar 2021 04:18:36 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
> > Our type system does not currently distinguish between tail pages and
> > head or single pages. This is a problem because we call compound_head()
> > multiple times (and the compiler cannot optimise it out), bloating the
> > kernel. It also makes programming hard as it is often unclear whether
> > a function operates on an individual page, or an entire compound page.
> >
> > This patch series introduces the struct folio, which is a type that
> > represents an entire compound page. This initial set reduces the kernel
> > size by approximately 6kB, although its real purpose is adding
> > infrastructure to enable further use of the folio.
>
> Geeze it's a lot of noise. More things to remember and we'll forever
> have a mismash of `page' and `folio' and code everywhere converting
> from one to the other. Ongoing addition of folio
> accessors/manipulators to overlay the existing page
> accessors/manipulators, etc.
>
> It's unclear to me that it's all really worth it. What feedback have
> you seen from others?

My own feeling and feedback have been much like yours.

I don't get very excited by type safety at this level; and although
I protested back when all those compound_head()s got tucked into the
*PageFlag() functions, the text size increase was not very much, and
I never noticed any adverse performance reports.

To me, it's distraction, churn and friction, ongoing for years; but
that's just me, and I'm resigned to the possibility that it will go in.
Matthew is not alone in wanting to pursue it: let others speak.

Hugh

2021-03-14 04:15:05

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 09/25] mm: Add folio_index, folio_page and folio_contains

On Sat, Mar 13, 2021 at 12:37:16PM -0800, Andrew Morton wrote:
> On Fri, 5 Mar 2021 04:18:45 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
> > folio_index() is the equivalent of page_index() for folios. folio_page()
> > finds the page in a folio for a page cache index. folio_contains()
> > tells you whether a folio contains a particular page cache index.
> >
>
> copy-paste changelog into each function's covering comment?

Certainly.

> > +static inline struct page *folio_page(struct folio *folio, pgoff_t index)
> > +{
> > + index -= folio_index(folio);
> > + VM_BUG_ON_FOLIO(index >= folio_nr_pages(folio), folio);
> > + return &folio->page + index;
> > +}
>
> One would expect folio_page() to be the reverse of page_folio(), only
> it isn't anything like that.

It is ... but only for files. So maybe folio_file_page()?

2021-03-14 04:18:15

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Sat, Mar 13, 2021 at 12:37:02PM -0800, Andrew Morton wrote:
> On Fri, 5 Mar 2021 04:18:37 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
> > A struct folio refers to an entire (possibly compound) page. A function
> > which takes a struct folio argument declares that it will operate on the
> > entire compound page, not just PAGE_SIZE bytes. In return, the caller
> > guarantees that the pointer it is passing does not point to a tail page.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> > include/linux/mm.h | 30 ++++++++++++++++++++++++++++++
> > include/linux/mm_types.h | 17 +++++++++++++++++
>
> Perhaps a new folio.h would be neater.

I understand that urge (I'm no fan of the size of mm.h ...), but it ends
up pretty interwoven with mm.h. For example:

static inline struct zone *folio_zone(const struct folio *folio)
{
return page_zone(&folio->page);
}

so we need both struct folio defined here and we need page_zone().
page_zone() is defined in mm.h, so we'd have folio.h including mm.h.
But then put_page() calls put_folio(), so we need folio.h included
in mm.h.

The patch series I have does a lot of movement of page cache functionality
from mm.h to filemap.h, so you may end up with a smaller mm.h at the
end of it. Right now, it's 10 lines longer than it was.

> > +static inline struct folio *next_folio(struct folio *folio)
> > +{
> > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> > + return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio));
> > +#else
> > + return folio + folio_nr_pages(folio);
> > +#endif
> > +}
>
> It's a shame this isn't called folio_something(), like the rest of the API.
>
> Unclear what this does. Some comments would help.

folio_next() it can be. I'll add some commentary.

> > +static inline unsigned int folio_shift(struct folio *folio)
> > +{
> > + return PAGE_SHIFT + folio_order(folio);
> > +}
> > +
> > +static inline size_t folio_size(struct folio *folio)
> > +{
> > + return PAGE_SIZE << folio_order(folio);
> > +}
>
> Why size_t? That's pretty rare in this space and I'd have expected
> unsigned long.

I like to use size_t for things which are the number of bytes represented
by an in-memory object. As opposed to all the other things which we
use unsigned long for. Maybe that's more common on the filesystem side
of the house.

> > +static inline struct folio *page_folio(struct page *page)
> > +{
> > + unsigned long head = READ_ONCE(page->compound_head);
> > +
> > + if (unlikely(head & 1))
> > + return (struct folio *)(head - 1);
> > + return (struct folio *)page;
> > +}
>
> What purpose does the READ_ONCE() serve?

Same purpose as it does in compound_head():

static inline struct page *compound_head(struct page *page)
{
unsigned long head = READ_ONCE(page->compound_head);

if (unlikely(head & 1))
return (struct page *) (head - 1);
return page;
}

I think Kirill would say that it's to defend against splitting if we
don't have a refcount on the page yet. So if we do something like walk
the page tables, find a PTE, translate it to a struct page, then try to
get a reference on the head page, we don't end up with an incoherent
answer from 'compound_head()' if it's split in the middle of the call
and the page->compound_head field gets assigned some other value.

It might return the wrong page, so get_user_pages() still has to check
the page is right after it's got the reference, but at least this way
it's guaranteed to return something that was right at one time.

There might be more to it, but that's my understanding of why the code
is currently written this way.

2021-03-14 04:21:21

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 03/25] mm/vmstat: Add functions to account folio statistics

On Sat, Mar 13, 2021 at 12:37:07PM -0800, Andrew Morton wrote:
> On Fri, 5 Mar 2021 04:18:39 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
> > Allow page counters to be more readily modified by callers which have
> > a folio. Name these wrappers with 'stat' instead of 'state' as requested
> > by Linus here:
> > https://lore.kernel.org/linux-mm/[email protected]om/
> >
> > --- a/include/linux/vmstat.h
> > +++ b/include/linux/vmstat.h
> > @@ -402,6 +402,54 @@ static inline void drain_zonestat(struct zone *zone,
> > struct per_cpu_pageset *pset) { }
> > #endif /* CONFIG_SMP */
> >
> > +static inline
> > +void __inc_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
> > +{
> > + __mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
> > +}
>
> The naming is unfortunate. We expect
>
> inc: add one to
> dec: subtract one from
> mod: modify by signed quantity
>
> So these are inconsistent. Perhaps use "add" and "sub" instead. At
> least to alert people to the fact that these are different.
>
> And, again, it's nice to see the subsystem's name leading the
> identifiers, so "zone_folio_stat_add()".

I thought this was a 'zone stat', so __zone_stat_add_folio()?
I'm not necessarily signing up to change the existing
__inc_node_page_state(), but I might. If so, __node_stat_add_page()?

2021-03-14 04:58:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 03/25] mm/vmstat: Add functions to account folio statistics

On Sun, 14 Mar 2021 04:11:55 +0000 Matthew Wilcox <[email protected]> wrote:

> On Sat, Mar 13, 2021 at 12:37:07PM -0800, Andrew Morton wrote:
> > On Fri, 5 Mar 2021 04:18:39 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
> >
> > > Allow page counters to be more readily modified by callers which have
> > > a folio. Name these wrappers with 'stat' instead of 'state' as requested
> > > by Linus here:
> > > https://lore.kernel.org/linux-mm/[email protected]om/
> > >
> > > --- a/include/linux/vmstat.h
> > > +++ b/include/linux/vmstat.h
> > > @@ -402,6 +402,54 @@ static inline void drain_zonestat(struct zone *zone,
> > > struct per_cpu_pageset *pset) { }
> > > #endif /* CONFIG_SMP */
> > >
> > > +static inline
> > > +void __inc_zone_folio_stat(struct folio *folio, enum zone_stat_item item)
> > > +{
> > > + __mod_zone_page_state(folio_zone(folio), item, folio_nr_pages(folio));
> > > +}
> >
> > The naming is unfortunate. We expect
> >
> > inc: add one to
> > dec: subtract one from
> > mod: modify by signed quantity
> >
> > So these are inconsistent. Perhaps use "add" and "sub" instead. At
> > least to alert people to the fact that these are different.
> >
> > And, again, it's nice to see the subsystem's name leading the
> > identifiers, so "zone_folio_stat_add()".
>
> I thought this was a 'zone stat', so __zone_stat_add_folio()?
> I'm not necessarily signing up to change the existing
> __inc_node_page_state(), but I might. If so, __node_stat_add_page()?

That works. It's the "inc means +1" and "dec means -1" whiplash that
struck me the most.

2021-03-15 02:30:36

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 07/25] mm: Create FolioFlags

On Fri, Mar 05, 2021 at 04:18:43AM +0000, Matthew Wilcox (Oracle) wrote:
> These new functions are the folio analogues of the PageFlags functions.
> If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio is not a tail
> page at every invocation. Note that this will also catch the PagePoisoned
> case as a poisoned page has every bit set, which would include PageTail.
>
> This saves 1740 bytes of text with the distro-derived config that
> I'm testing due to removing a double call to compound_head() in
> PageSwapCache().

This patch is buggy due to using the wrong page->flags for FolioDoubleMapped.
I'm not totally in love with this fix, but it does work without changing
every PAGEFLAG definition.

(also, I needed FolioTransHuge())

commit fe8ca904171345d113f06f381c255a3c4b20074e
Author: Matthew Wilcox (Oracle) <[email protected]>
Date: Sun Mar 14 17:34:48 2021 -0400

fix FolioFlags

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 01aa4a71bf14..b7fd4c3733ca 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -212,10 +212,13 @@ static inline void page_init_poison(struct page *page, size_t size)
}
#endif

-static unsigned long *folio_flags(struct folio *folio)
+static unsigned long *folio_flags(struct folio *folio, unsigned n)
{
- VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
- return &folio->page.flags;
+ struct page *page = &folio->page;
+
+ VM_BUG_ON_PGFLAGS(PageTail(page), page);
+ VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
+ return &page[n].flags;
}

/*
@@ -262,48 +265,56 @@ static unsigned long *folio_flags(struct folio *folio)
VM_BUG_ON_PGFLAGS(!PageHead(page), page); \
PF_POISONED_CHECK(&page[1]); })

+/* Which page is the flag stored in */
+#define FOLIO_PF_ANY 0
+#define FOLIO_PF_HEAD 0
+#define FOLIO_PF_ONLY_HEAD 0
+#define FOLIO_PF_NO_TAIL 0
+#define FOLIO_PF_NO_COMPOUND 0
+#define FOLIO_PF_SECOND 1
+
/*
* Macros to create function definitions for page flags
*/
#define TESTPAGEFLAG(uname, lname, policy) \
static __always_inline int Folio##uname(struct folio *folio) \
- { return test_bit(PG_##lname, folio_flags(folio)); } \
+ { return test_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline int Page##uname(struct page *page) \
{ return test_bit(PG_##lname, &policy(page, 0)->flags); }

#define SETPAGEFLAG(uname, lname, policy) \
static __always_inline void SetFolio##uname(struct folio *folio) \
- { set_bit(PG_##lname, folio_flags(folio)); } \
+ { set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline void SetPage##uname(struct page *page) \
{ set_bit(PG_##lname, &policy(page, 1)->flags); }

#define CLEARPAGEFLAG(uname, lname, policy) \
static __always_inline void ClearFolio##uname(struct folio *folio) \
- { clear_bit(PG_##lname, folio_flags(folio)); } \
+ { clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline void ClearPage##uname(struct page *page) \
{ clear_bit(PG_##lname, &policy(page, 1)->flags); }

#define __SETPAGEFLAG(uname, lname, policy) \
static __always_inline void __SetFolio##uname(struct folio *folio) \
- { __set_bit(PG_##lname, folio_flags(folio)); } \
+ { __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline void __SetPage##uname(struct page *page) \
{ __set_bit(PG_##lname, &policy(page, 1)->flags); }

#define __CLEARPAGEFLAG(uname, lname, policy) \
static __always_inline void __ClearFolio##uname(struct folio *folio) \
- { __clear_bit(PG_##lname, folio_flags(folio)); } \
+ { __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline void __ClearPage##uname(struct page *page) \
{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }

#define TESTSETFLAG(uname, lname, policy) \
static __always_inline int TestSetFolio##uname(struct folio *folio) \
- { return test_and_set_bit(PG_##lname, folio_flags(folio)); } \
+ { return test_and_set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline int TestSetPage##uname(struct page *page) \
{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }

#define TESTCLEARFLAG(uname, lname, policy) \
static __always_inline int TestClearFolio##uname(struct folio *folio) \
- { return test_and_clear_bit(PG_##lname, folio_flags(folio)); } \
+ { return test_and_clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
static __always_inline int TestClearPage##uname(struct page *page) \
{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }

@@ -422,7 +433,7 @@ PAGEFLAG_FALSE(HighMem)
static __always_inline bool FolioSwapCache(struct folio *folio)
{
return FolioSwapBacked(folio) &&
- test_bit(PG_swapcache, folio_flags(folio));
+ test_bit(PG_swapcache, folio_flags(folio, 0));

}

@@ -545,7 +556,7 @@ u64 stable_page_flags(struct page *page);

static inline int FolioUptodate(struct folio *folio)
{
- int ret = test_bit(PG_uptodate, folio_flags(folio));
+ int ret = test_bit(PG_uptodate, folio_flags(folio, 0));
/*
* Must ensure that the data we read out of the page is loaded
* _after_ we've loaded page->flags to check for PageUptodate.
@@ -568,7 +579,7 @@ static inline int PageUptodate(struct page *page)
static __always_inline void __SetFolioUptodate(struct folio *folio)
{
smp_wmb();
- __set_bit(PG_uptodate, folio_flags(folio));
+ __set_bit(PG_uptodate, folio_flags(folio, 0));
}

static __always_inline void SetFolioUptodate(struct folio *folio)
@@ -579,7 +590,7 @@ static __always_inline void SetFolioUptodate(struct folio *folio)
* uptodate are actually visible before PageUptodate becomes true.
*/
smp_wmb();
- set_bit(PG_uptodate, folio_flags(folio));
+ set_bit(PG_uptodate, folio_flags(folio, 0));
}

static __always_inline void __SetPageUptodate(struct page *page)
@@ -672,6 +683,11 @@ static inline int PageTransHuge(struct page *page)
return PageHead(page);
}

+static inline bool FolioTransHuge(struct folio *folio)
+{
+ return FolioHead(folio);
+}
+
/*
* PageTransCompound returns true for both transparent huge pages
* and hugetlbfs pages, so it should only be called when it's known

2021-03-15 11:56:28

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Sat, Mar 13, 2021 at 07:09:01PM -0800, Hugh Dickins wrote:
> On Sat, 13 Mar 2021, Andrew Morton wrote:
> > On Fri, 5 Mar 2021 04:18:36 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
> >
> > > Our type system does not currently distinguish between tail pages and
> > > head or single pages. This is a problem because we call compound_head()
> > > multiple times (and the compiler cannot optimise it out), bloating the
> > > kernel. It also makes programming hard as it is often unclear whether
> > > a function operates on an individual page, or an entire compound page.
> > >
> > > This patch series introduces the struct folio, which is a type that
> > > represents an entire compound page. This initial set reduces the kernel
> > > size by approximately 6kB, although its real purpose is adding
> > > infrastructure to enable further use of the folio.
> >
> > Geeze it's a lot of noise. More things to remember and we'll forever
> > have a mismash of `page' and `folio' and code everywhere converting
> > from one to the other. Ongoing addition of folio
> > accessors/manipulators to overlay the existing page
> > accessors/manipulators, etc.
> >
> > It's unclear to me that it's all really worth it. What feedback have
> > you seen from others?
>
> My own feeling and feedback have been much like yours.
>
> I don't get very excited by type safety at this level; and although
> I protested back when all those compound_head()s got tucked into the
> *PageFlag() functions, the text size increase was not very much, and
> I never noticed any adverse performance reports.
>
> To me, it's distraction, churn and friction, ongoing for years; but
> that's just me, and I'm resigned to the possibility that it will go in.
> Matthew is not alone in wanting to pursue it: let others speak.

I'm with Matthew on this. I would really want to drop the number of places
where we call compoud_head(). I hope we can get rid of the page flag
policy hack I made.

--
Kirill A. Shutemov

2021-03-15 12:40:30

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Mon 15-03-21 14:55:01, Kirill A. Shutemov wrote:
> On Sat, Mar 13, 2021 at 07:09:01PM -0800, Hugh Dickins wrote:
> > On Sat, 13 Mar 2021, Andrew Morton wrote:
> > > On Fri, 5 Mar 2021 04:18:36 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
> > >
> > > > Our type system does not currently distinguish between tail pages and
> > > > head or single pages. This is a problem because we call compound_head()
> > > > multiple times (and the compiler cannot optimise it out), bloating the
> > > > kernel. It also makes programming hard as it is often unclear whether
> > > > a function operates on an individual page, or an entire compound page.
> > > >
> > > > This patch series introduces the struct folio, which is a type that
> > > > represents an entire compound page. This initial set reduces the kernel
> > > > size by approximately 6kB, although its real purpose is adding
> > > > infrastructure to enable further use of the folio.
> > >
> > > Geeze it's a lot of noise. More things to remember and we'll forever
> > > have a mismash of `page' and `folio' and code everywhere converting
> > > from one to the other. Ongoing addition of folio
> > > accessors/manipulators to overlay the existing page
> > > accessors/manipulators, etc.
> > >
> > > It's unclear to me that it's all really worth it. What feedback have
> > > you seen from others?
> >
> > My own feeling and feedback have been much like yours.
> >
> > I don't get very excited by type safety at this level; and although
> > I protested back when all those compound_head()s got tucked into the
> > *PageFlag() functions, the text size increase was not very much, and
> > I never noticed any adverse performance reports.
> >
> > To me, it's distraction, churn and friction, ongoing for years; but
> > that's just me, and I'm resigned to the possibility that it will go in.
> > Matthew is not alone in wanting to pursue it: let others speak.
>
> I'm with Matthew on this. I would really want to drop the number of places
> where we call compoud_head(). I hope we can get rid of the page flag
> policy hack I made.

I tend to agree here as well. The level compoud_head has spread out
silently is just too large. There are people coming up with all sorts of
optimizations to workaround that, and they are quite right that this is
somehing worth doing, but last attempts I have seen were very focused on
specific page flags handling which is imho worse wrt maintainability
than a higher level and type safe abstraction. I find it quite nice that
this doesn't really have to be a flag day conversion but it can be done
incrementally.

I didn't get review the series yet and I cannot really promise anything
but from what I understand the conversion should be pretty
straightforward, albeit noisy.

One thing that was really strange to me when seeing the concept for the
first time was the choice of naming (no I do not want to start any
bikeshedding) because it hasn't really resonated with the udnerlying
concept. Maybe just me as a non native speaker... page_head would have
been so much more straightforward but not something I really care about.
--
Michal Hocko
SUSE Labs

2021-03-15 13:47:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Mon, Mar 15, 2021 at 02:55:01PM +0300, Kirill A. Shutemov wrote:
> I'm with Matthew on this. I would really want to drop the number of places
> where we call compoud_head(). I hope we can get rid of the page flag
> policy hack I made.

I can't see that far ahead too clearly, but I do think that at some
point we'll actually distinguish between folio flags and page flags.
For example, we won't have a FolioHWPoison, because we won't keep a folio
together if one page in it has become defective. Nor will we have a
PageUptodate because we'll only care about whether a folio is uptodate.
And at that point, we won't want page flag policies.

2021-03-15 22:01:22

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

Michal Hocko <[email protected]> writes:
> bikeshedding) because it hasn't really resonated with the udnerlying
> concept. Maybe just me as a non native speaker... page_head would have
> been so much more straightforward but not something I really care
> about.

Yes. page_head explains exactly what it is.

But Folio? Huh?

-Andi

2021-03-16 02:31:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Mon, Mar 15, 2021 at 01:38:04PM +0100, Michal Hocko wrote:
> I tend to agree here as well. The level compoud_head has spread out
> silently is just too large. There are people coming up with all sorts of
> optimizations to workaround that, and they are quite right that this is
> somehing worth doing, but last attempts I have seen were very focused on
> specific page flags handling which is imho worse wrt maintainability
> than a higher level and type safe abstraction. I find it quite nice that
> this doesn't really have to be a flag day conversion but it can be done
> incrementally.
>
> I didn't get review the series yet and I cannot really promise anything
> but from what I understand the conversion should be pretty
> straightforward, albeit noisy.
>
> One thing that was really strange to me when seeing the concept for the
> first time was the choice of naming (no I do not want to start any
> bikeshedding) because it hasn't really resonated with the udnerlying
> concept. Maybe just me as a non native speaker... page_head would have
> been so much more straightforward but not something I really care about.

That pretty much summarizes my opinion as well. I'll need to find some
time to review the series as well.

2021-03-16 02:36:30

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Mon, Mar 15, 2021 at 07:09:04PM +0000, Christoph Hellwig wrote:
> On Mon, Mar 15, 2021 at 01:38:04PM +0100, Michal Hocko wrote:
> > I tend to agree here as well. The level compoud_head has spread out
> > silently is just too large. There are people coming up with all sorts of
> > optimizations to workaround that, and they are quite right that this is
> > somehing worth doing, but last attempts I have seen were very focused on
> > specific page flags handling which is imho worse wrt maintainability
> > than a higher level and type safe abstraction. I find it quite nice that
> > this doesn't really have to be a flag day conversion but it can be done
> > incrementally.
> >
> > I didn't get review the series yet and I cannot really promise anything
> > but from what I understand the conversion should be pretty
> > straightforward, albeit noisy.
> >
> > One thing that was really strange to me when seeing the concept for the
> > first time was the choice of naming (no I do not want to start any
> > bikeshedding) because it hasn't really resonated with the udnerlying
> > concept. Maybe just me as a non native speaker... page_head would have
> > been so much more straightforward but not something I really care about.
>
> That pretty much summarizes my opinion as well. I'll need to find some
> time to review the series as well.

If it's easier for you, I'm trying to keep
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio
up to date. Not all of those 111 patches are suitable for upstreaming,
but it might give you a better idea of where I'm going than if I only
posted the first 70-80 of them. Stopping at
"mm/memory: Use a folio in copy_pte_range()" nets us almost 10kb of
text reduction for the UEK-derived config, about 3.3kb on an allnoconfig
(which is a little over 0.1% on a 2.4MB kernel).

The reason I didn't go with 'head' is that traditionally 'head' implies
that there are tail pages. It would be weird to ask 'if (HeadHead(head))'
That's currently spelled 'if (FolioMulti(folio))'. But it can be changed
if there's a really better alternative. It'll make me more grumpy if
somebody comes up with a really good alternative in six months.

I would agree that the conversion is both straightforward and noisy.
There are some minor things that crop up, like noticing that we get
the accounting wrong for writeback of compound pages. That's not
entirely unexpected since no filesystem supports both compound pages
and writeback today.

2021-03-16 05:50:56

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Mon, Mar 15, 2021 at 07:40:14PM +0000, Matthew Wilcox wrote:
> I would agree that the conversion is both straightforward and noisy.
> There are some minor things that crop up, like noticing that we get
> the accounting wrong for writeback of compound pages. That's not
> entirely unexpected since no filesystem supports both compound pages
> and writeback today.

And this is where the abstraction that the "folio" introduces is
required - filesystem people don't want to have to deal with all the
complexity and subtlety of compound pages when large page support is
added to the page cache. As Willy says, this will be a never-ending
source of data corruption bugs....

Hence from the filesystem side of things, I think this abstraction
is absolutely necessary. Especially because handling buffered IO in
4kB pages really, really sucks from a performance persepctive and
the sooner we have native high-order page support in the page cache
the better. These days buffered IO often can't even max out a cheap
NVMe SSD because of the CPU cost of per-page state management in the
page cache. So the sooner we have large page support to mitigate the
worst of the overhead for streaming buffered IO, the better.

FWIW, like others, I'm not a fan of "folio" as a name, but I can live
with it because it so jarringly different to "pages" that we're not
going to confuse the two of them. But if we want a better name, my
suggestion would be for a "struct cage" as in Compound pAGE....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2021-03-17 17:25:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 03/25] mm/vmstat: Add functions to account folio statistics

> +static inline
> +void __inc_zone_folio_stat(struct folio *folio, enum zone_stat_item item)

This prototype style is weird and doesn't follow either of the preffered
styles..

> static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
> int migratetype)
> {
> @@ -536,6 +584,24 @@ static inline void __dec_lruvec_page_state(struct page *page,
> __mod_lruvec_page_state(page, idx, -1);
> }
>
> +static inline void __mod_lruvec_folio_stat(struct folio *folio,
> + enum node_stat_item idx, int val)

.. like the ones that exist and are added by you just below.

2021-03-17 17:30:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 10/25] mm/util: Add folio_mapping and folio_file_mapping

> +struct address_space *page_mapping(struct page *);
> +struct address_space *folio_mapping(struct folio *);
> +struct address_space *__folio_file_mapping(struct folio *);
> +
> +static inline struct address_space *folio_file_mapping(struct folio *folio)
> +{
> + if (unlikely(FolioSwapCache(folio)))
> + return __folio_file_mapping(folio);

I think __folio_file_mapping is badly misnamed as it only deals with
swapcache folios. Maybe that should be reflected in the name?

Also for all these funtions documentation would be very helpful, even if
the existing struct page based helpers don't have that either.

2021-03-17 17:33:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 15/25] mm/filemap: Convert lock_page_async to lock_folio_async

On Fri, Mar 05, 2021 at 04:18:51AM +0000, Matthew Wilcox (Oracle) wrote:
> There aren't any actual callers of lock_page_async(), but convert
> filemap_update_page() to call __lock_folio_async().

So please just kill lock_page_async first and mark __lock_page_async
static. Then only update __lock_page_async to work on a folio.

2021-03-17 17:43:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 20/25] mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit

> + if (FolioWriteback(folio) &&
> + wait_on_folio_bit_killable(folio, PG_writeback) < 0)
> return VM_FAULT_RETRY;

This really screams for a proper wait_on_page_writeback_killable helper
rather than hardcoding the PG_* bit in a random file system. It also
seems to have missed the conversion to a while loop in
wait_on_page_writeback.

Also this patch seems to be different in style to other by not for now
always using page wrappers in the file systems. Not that I really care
either way, but it seems inconsistent with the rest.

> /*
> - * This is exported only for wait_on_page_locked/wait_on_page_writeback, etc.,
> + * This is exported only for wait_on_folio_locked/wait_on_folio_writeback, etc.,
> * and should not be used directly.
> */
> -extern void wait_on_page_bit(struct page *page, int bit_nr);
> -extern int wait_on_page_bit_killable(struct page *page, int bit_nr);
> +extern void wait_on_folio_bit(struct folio *folio, int bit_nr);
> +extern int wait_on_folio_bit_killable(struct folio *folio, int bit_nr);

Well, the above code obviously ignored this comment :( Maybe an
__ prefix is a bit more of hint?

2021-03-17 17:50:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 25/25] cachefiles: Switch to wait_page_key

On Fri, Mar 05, 2021 at 04:19:01AM +0000, Matthew Wilcox (Oracle) wrote:
> Cachefiles was relying on wait_page_key and wait_bit_key being the
> same layout, which is fragile. Now that wait_page_key is exposed in
> the pagemap.h header, we can remove that fragility. Also switch it
> to use the folio directly instead of the page.

Yikes. That fix itself is something that should go into mainline ASAP as
it fixes a massive landmine instead of mixing it up with the folio
conversion.

2021-03-17 21:27:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
> +/*
> + * A struct folio is either a base (order-0) page or the head page of
> + * a compound page.
> + */

Hmm. While that comment seems to be true I'm not sure it is the
essence. Maybe it should be more framed in terms of

"A folio represents a contigously allocated chunk of memory.."

and then extend it with the categories of state and operations performed
on the folio while those get added. The above statement can still
remain as a low-level explanation, maybe moved to the page member
instead of the type itself.

2021-03-17 21:29:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 08/25] mm: Handle per-folio private data

> +static inline void attach_page_private(struct page *page, void *data)
> +{
> + attach_folio_private((struct folio *)page, data);
> +}
> +
> +static inline void *detach_page_private(struct page *page)
> +{
> + return detach_folio_private((struct folio *)page);
> +}

I hate these open code casts. Can't we have a single central
page_to_folio helper, which could also grow a debug check (maybe
under a new config option) to check that it really is called on a
head page?

2021-03-17 21:29:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 09/25] mm: Add folio_index, folio_page and folio_contains

On Sat, Mar 13, 2021 at 12:37:16PM -0800, Andrew Morton wrote:
> On Fri, 5 Mar 2021 04:18:45 +0000 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
> > folio_index() is the equivalent of page_index() for folios. folio_page()
> > finds the page in a folio for a page cache index. folio_contains()
> > tells you whether a folio contains a particular page cache index.
> >
>
> copy-paste changelog into each function's covering comment?

Yes, I think documenting all these helpers would be very useful. The
lack of good commens on some of the page related existing helpers has
driven me insane sometimes.

2021-03-17 21:35:54

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Mon, Mar 15, 2021 at 07:40:14PM +0000, Matthew Wilcox wrote:
> The reason I didn't go with 'head' is that traditionally 'head' implies
> that there are tail pages. It would be weird to ask 'if (HeadHead(head))'
> That's currently spelled 'if (FolioMulti(folio))'. But it can be changed
> if there's a really better alternative. It'll make me more grumpy if
> somebody comes up with a really good alternative in six months.

The folio name keeps growing on me. Still not perfect, but I do like
that it is:

a) short

and

b) very different from page

2021-03-18 18:00:17

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 08/25] mm: Handle per-folio private data

On Wed, Mar 17, 2021 at 06:20:32PM +0100, Christoph Hellwig wrote:
> > +static inline void attach_page_private(struct page *page, void *data)
> > +{
> > + attach_folio_private((struct folio *)page, data);
> > +}
> > +
> > +static inline void *detach_page_private(struct page *page)
> > +{
> > + return detach_folio_private((struct folio *)page);
> > +}
>
> I hate these open code casts. Can't we have a single central
> page_to_folio helper, which could also grow a debug check (maybe
> under a new config option) to check that it really is called on a
> head page?

Some of that is already there. We have page_folio() which is the
page_to_folio() helper you're asking for. And folio_flags() (which is
called *all the time*) contains
VM_BUG_ON_PGFLAGS(PageTail(page), page);
Someone passing around a tail pointer cast to a folio is not going to
get very far, assuming CONFIG_DEBUG_VM_PGFLAGS is enabled (most distros
don't, but I do when I'm testing anything THPish).

These helpers aren't going to live for very long ... I expect to have
all filesystems which use attach/detach page private converted to folios
pretty soon. Certainly before any of them _use_ multi-page folios.

Anyway, the simple thing to do is just to use page_folio() here and eat
the cost of calling compound_head() on something we're certain is an
order-0 page. It only defers the win of removing the compound_head()
call; it doesn't preclude it. And it means we're not setting a bad
example here (there really shouldn't be any casts from pages to folios,
except in the folio allocator, which uses the page allocator and then
casts what _must be_ a non-tail page to a folio).

2021-03-18 23:58:56

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
> A struct folio refers to an entire (possibly compound) page. A function
> which takes a struct folio argument declares that it will operate on the
> entire compound page, not just PAGE_SIZE bytes. In return, the caller
> guarantees that the pointer it is passing does not point to a tail page.
>

Is this a part of a larger use case or general cleanup/refactor where
the split between page and folio simplify programming?

Balbir Singh.


2021-03-19 01:28:03

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Fri, Mar 19, 2021 at 10:56:45AM +1100, Balbir Singh wrote:
> On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
> > A struct folio refers to an entire (possibly compound) page. A function
> > which takes a struct folio argument declares that it will operate on the
> > entire compound page, not just PAGE_SIZE bytes. In return, the caller
> > guarantees that the pointer it is passing does not point to a tail page.
> >
>
> Is this a part of a larger use case or general cleanup/refactor where
> the split between page and folio simplify programming?

The goal here is to manage memory in larger chunks. Pages are now too
small for just about every workload. Even compiling the kernel sees a 7%
performance improvement just by doing readahead using relatively small
THPs (16k-256k). You can see that work here:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/master

I think Kirill, Hugh and others have done a fantastic job stretching
the page struct to work in shmem, but we really need a different type
to avoid people writing code that _looks_ right but is actually buggy.
So I'm starting again, this time with the folio metaphor.

2021-03-19 04:03:55

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 00/25] Page folios

On Fri, Mar 05, 2021 at 04:18:36AM +0000, Matthew Wilcox (Oracle) wrote:
> Our type system does not currently distinguish between tail pages and
> head or single pages. This is a problem because we call compound_head()
> multiple times (and the compiler cannot optimise it out), bloating the
> kernel. It also makes programming hard as it is often unclear whether
> a function operates on an individual page, or an entire compound page.

I've pushed a new version out here:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio

I think it takes into account everyone's comments so far. It even compiles.

2021-03-20 02:11:48

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Fri, Mar 19, 2021 at 01:25:27AM +0000, Matthew Wilcox wrote:
> On Fri, Mar 19, 2021 at 10:56:45AM +1100, Balbir Singh wrote:
> > On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
> > > A struct folio refers to an entire (possibly compound) page. A function
> > > which takes a struct folio argument declares that it will operate on the
> > > entire compound page, not just PAGE_SIZE bytes. In return, the caller
> > > guarantees that the pointer it is passing does not point to a tail page.
> > >
> >
> > Is this a part of a larger use case or general cleanup/refactor where
> > the split between page and folio simplify programming?
>
> The goal here is to manage memory in larger chunks. Pages are now too
> small for just about every workload. Even compiling the kernel sees a 7%
> performance improvement just by doing readahead using relatively small
> THPs (16k-256k). You can see that work here:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/master
>
> I think Kirill, Hugh and others have done a fantastic job stretching
> the page struct to work in shmem, but we really need a different type
> to avoid people writing code that _looks_ right but is actually buggy.
> So I'm starting again, this time with the folio metaphor.

Thanks, makes sense, I'll take a look.

Balbir Singh.

2021-03-22 02:54:13

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

Excerpts from Matthew Wilcox's message of March 19, 2021 11:25 am:
> On Fri, Mar 19, 2021 at 10:56:45AM +1100, Balbir Singh wrote:
>> On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
>> > A struct folio refers to an entire (possibly compound) page. A function
>> > which takes a struct folio argument declares that it will operate on the
>> > entire compound page, not just PAGE_SIZE bytes. In return, the caller
>> > guarantees that the pointer it is passing does not point to a tail page.
>> >
>>
>> Is this a part of a larger use case or general cleanup/refactor where
>> the split between page and folio simplify programming?
>
> The goal here is to manage memory in larger chunks. Pages are now too
> small for just about every workload. Even compiling the kernel sees a 7%
> performance improvement just by doing readahead using relatively small
> THPs (16k-256k). You can see that work here:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/master

The 7% improvement comes from cache cold kbuild by improving IO
patterns?

Just wondering what kind of readahead is enabled by this that can't
be done with base page size.

Thanks,
Nick

2021-03-22 03:56:00

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 01/25] mm: Introduce struct folio

On Mon, Mar 22, 2021 at 12:52:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Matthew Wilcox's message of March 19, 2021 11:25 am:
> > On Fri, Mar 19, 2021 at 10:56:45AM +1100, Balbir Singh wrote:
> >> On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
> >> > A struct folio refers to an entire (possibly compound) page. A function
> >> > which takes a struct folio argument declares that it will operate on the
> >> > entire compound page, not just PAGE_SIZE bytes. In return, the caller
> >> > guarantees that the pointer it is passing does not point to a tail page.
> >> >
> >>
> >> Is this a part of a larger use case or general cleanup/refactor where
> >> the split between page and folio simplify programming?
> >
> > The goal here is to manage memory in larger chunks. Pages are now too
> > small for just about every workload. Even compiling the kernel sees a 7%
> > performance improvement just by doing readahead using relatively small
> > THPs (16k-256k). You can see that work here:
> > https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/master
>
> The 7% improvement comes from cache cold kbuild by improving IO
> patterns?
>
> Just wondering what kind of readahead is enabled by this that can't
> be done with base page size.

I see my explanation earlier was confusing. What I meant to say
was that the only way in that patch set to create larger pages was
at readahead time. Writes were incapable of creating larger pages.
Once pages were in the page cache, they got managed at that granularity
unless they got split by a truncate/holepunch/io-error/...

I don't have good perf runs of kernbench to say exactly where we got the
benefit. My assumption is that because we're managing an entire, say,
256kB page as a single unit on the LRU list, we benefit from lower LRU
lock contention. There's also the benefit of batching, eg, allocating
a single 256kB page from the page allocator may well be more effective
than allocating 64 4kB pages.