2022-03-30 05:56:28

by NeilBrown

Subject: [PATCH 00/10] MM changes to improve swap-over-NFS support

Assorted improvements for swap-via-filesystem.

This is a resend of these patches, rebased on current HEAD.
The only substantial change is that swap_dirty_folio has replaced
swap_set_page_dirty.

Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It
has previously worked for NFS but that broke a few releases back.
This series changes to use a new ->swap_rw rather than ->readpage and
->direct_IO. It also makes other improvements.

There is a companion series already in linux-next which fixes various
issues with NFS. Once both series land, a final patch is needed which
changes NFS over to use ->swap_rw.

Thanks,
NeilBrown


---

NeilBrown (10):
      MM: create new mm/swap.h header file.
      MM: drop swap_dirty_folio
      MM: move responsibility for setting SWP_FS_OPS to ->swap_activate
      MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space
      MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space
      MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw
      DOC: update documentation for swap_activate and swap_rw
      MM: submit multipage reads for SWP_FS_OPS swap-space
      MM: submit multipage write for SWP_FS_OPS swap-space
      VFS: Add FMODE_CAN_ODIRECT file flag


 Documentation/filesystems/locking.rst |  18 +-
 Documentation/filesystems/vfs.rst     |  17 +-
 drivers/block/loop.c                  |   4 +-
 fs/cifs/file.c                        |   7 +-
 fs/fcntl.c                            |   9 +-
 fs/nfs/file.c                         |  20 ++-
 fs/open.c                             |   9 +-
 fs/overlayfs/file.c                   |  13 +-
 include/linux/fs.h                    |   4 +
 include/linux/swap.h                  |   7 +-
 include/linux/writeback.h             |   7 +
 mm/madvise.c                          |   8 +-
 mm/memory.c                           |   2 +-
 mm/page_io.c                          | 247 +++++++++++++++++++-------
 mm/swap.h                             |  30 +++-
 mm/swap_state.c                       |  22 ++-
 mm/swapfile.c                         |  13 +-
 mm/vmscan.c                           |  38 ++--
 18 files changed, 347 insertions(+), 128 deletions(-)

--
Signature


2022-03-30 06:51:07

by NeilBrown

Subject: [PATCH 03/10] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate

If a filesystem wishes to handle all swap IO itself (via ->direct_IO and
->readpage), rather than just providing device addresses for
submit_bio(), SWP_FS_OPS must be set.
Currently the protocol for setting this is to have ->swap_activate
return zero. In that case SWP_FS_OPS is set, and add_swap_extent()
is called for the entire file.

This is a little clumsy as different return values for ->swap_activate
have quite different meanings, and it makes it hard to search for which
filesystems require SWP_FS_OPS to be set.

So remove the special meaning of a zero return, and require the
filesystem to set SWP_FS_OPS if it so desires, and to always call
add_swap_extent() as required.

Currently only NFS and CIFS return zero from ->swap_activate().

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
fs/cifs/file.c | 3 ++-
fs/nfs/file.c | 13 +++++++++++--
include/linux/swap.h | 6 ++++++
mm/swapfile.c | 10 +++-------
4 files changed, 22 insertions(+), 10 deletions(-)
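
Illustrative aside, not part of the patch: the protocol change described
in the commit message can be modelled in userspace C. This is only a
sketch under stated assumptions. The struct layout, the flag bit values
and the mock add_swap_extent() are simplified stand-ins rather than the
kernel definitions, and myfs_swap_activate() and
setup_swap_extents_model() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag bits; the real values live in include/linux/swap.h. */
#define SWP_ACTIVATED 0x1u
#define SWP_FS_OPS    0x2u

/* Heavily reduced stand-in for the kernel's swap_info_struct. */
struct swap_info_struct {
	unsigned int flags;
	unsigned long max;	/* size of the swap area in pages */
	unsigned long pages;	/* usable pages */
	int nr_extents;
};

/* Mock: record one extent and report how many extents were added. */
static int add_swap_extent(struct swap_info_struct *sis,
			   unsigned long start_page,
			   unsigned long nr_pages,
			   unsigned long long start_block)
{
	(void)start_page;
	(void)start_block;
	if (nr_pages == 0)
		return -EINVAL;
	sis->nr_extents++;
	return 1;
}

/*
 * Hypothetical filesystem hook following the new protocol: it adds the
 * extent itself and sets SWP_FS_OPS itself, instead of returning zero
 * and relying on the caller to do both.
 */
static int myfs_swap_activate(struct swap_info_struct *sis,
			      unsigned long long *span)
{
	int ret = add_swap_extent(sis, 0, sis->max, 0);

	if (ret < 0)
		return ret;
	*span = sis->pages;
	sis->flags |= SWP_FS_OPS;
	return ret;
}

/* Caller side: a zero return no longer has any special meaning. */
static int setup_swap_extents_model(struct swap_info_struct *sis,
				    unsigned long long *span)
{
	int ret = myfs_swap_activate(sis, span);

	if (ret < 0)
		return ret;
	sis->flags |= SWP_ACTIVATED;
	return ret;
}
```

In this model the caller only distinguishes failure (negative) from
success, so whether SWP_FS_OPS ends up set depends purely on what the
filesystem hook did.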

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 60f43bff7ccb..050f463580f3 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4927,7 +4927,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
* from reading or writing the file
*/

- return 0;
+ sis->flags |= SWP_FS_OPS;
+ return add_swap_extent(sis, 0, sis->max, 0);
}

static void cifs_swap_deactivate(struct file *file)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 2df2a5392737..66136dca0ad5 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -488,6 +488,7 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
{
unsigned long blocks;
long long isize;
+ int ret;
struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
struct inode *inode = file->f_mapping->host;

@@ -500,9 +501,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
return -EINVAL;
}

+ ret = rpc_clnt_swap_activate(clnt);
+ if (ret)
+ return ret;
+ ret = add_swap_extent(sis, 0, sis->max, 0);
+ if (ret < 0) {
+ rpc_clnt_swap_deactivate(clnt);
+ return ret;
+ }
*span = sis->pages;
-
- return rpc_clnt_swap_activate(clnt);
+ sis->flags |= SWP_FS_OPS;
+ return ret;
}

static void nfs_swap_deactivate(struct file *file)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6bc9e21262de..e18b7edccc1d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -570,6 +570,12 @@ static inline swp_entry_t get_swap_page(struct page *page)
return entry;
}

+static inline int add_swap_extent(struct swap_info_struct *sis,
+ unsigned long start_page,
+ unsigned long nr_pages, sector_t start_block)
+{
+ return -EINVAL;
+}
#endif /* CONFIG_SWAP */

#ifdef CONFIG_THP_SWAP
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2650927a009b..8710c9c29862 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2244,13 +2244,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)

if (mapping->a_ops->swap_activate) {
ret = mapping->a_ops->swap_activate(sis, swap_file, span);
- if (ret >= 0)
- sis->flags |= SWP_ACTIVATED;
- if (!ret) {
- sis->flags |= SWP_FS_OPS;
- ret = add_swap_extent(sis, 0, sis->max, 0);
- *span = sis->pages;
- }
+ if (ret < 0)
+ return ret;
+ sis->flags |= SWP_ACTIVATED;
return ret;
}



2022-03-30 08:54:23

by NeilBrown

Subject: [PATCH 04/10] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space

If swap-out is using filesystem operations (SWP_FS_OPS), then it is not
safe to enter the FS for reclaim.
So only down-grade the requirement for swap pages to __GFP_IO after
checking that SWP_FS_OPS is not being used.

This makes the calculation of "may_enter_fs" slightly more complex, so
move it into a separate function. With that done, there is little value
in maintaining the bool variable any more. So replace the
may_enter_fs variable with a may_enter_fs() function. This removes any
risk of the variable becoming out-of-date.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
mm/swap.h | 8 ++++++++
mm/vmscan.c | 29 ++++++++++++++++++++---------
2 files changed, 28 insertions(+), 9 deletions(-)
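
Illustrative aside, not part of the patch: the decision that
may_enter_fs() makes can be modelled in userspace C as a small truth
function. This is a sketch only. The flag bit values, struct page_model
and may_enter_fs_model() are hypothetical stand-ins for __GFP_IO,
__GFP_FS, the page's swap-cache state and the swap device flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in bits; not the kernel's values. */
#define GFP_IO_BIT 0x1u		/* models __GFP_IO */
#define GFP_FS_BIT 0x2u		/* models __GFP_FS */
#define SWP_FS_OPS 0x4u		/* swap IO goes through the filesystem */

/* Reduced model of the two page properties the check looks at. */
struct page_model {
	bool in_swap_cache;
	unsigned int swap_flags;	/* flags of the backing swap area */
};

static bool may_enter_fs_model(const struct page_model *page,
			       unsigned int gfp_mask)
{
	/* Full filesystem access allowed: always fine. */
	if (gfp_mask & GFP_FS_BIT)
		return true;
	/* Only swap-cache pages can be relaxed to "IO allowed". */
	if (!page->in_swap_cache || !(gfp_mask & GFP_IO_BIT))
		return false;
	/* The relaxation is unsafe when swap IO itself enters the FS. */
	return !(page->swap_flags & SWP_FS_OPS);
}
```

The only case whose answer differs from the old boolean expression is a
swap-cache page on an SWP_FS_OPS swap area with IO allowed but FS
access disallowed, which now yields false.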

diff --git a/mm/swap.h b/mm/swap.h
index f8265bf0ce00..e19f185df5e2 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -50,6 +50,10 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
struct vm_fault *vmf);

+static inline unsigned int page_swap_flags(struct page *page)
+{
+ return page_swap_info(page)->flags;
+}
#else /* CONFIG_SWAP */
static inline int swap_readpage(struct page *page, bool do_poll)
{
@@ -129,5 +133,9 @@ static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
{
}

+static inline unsigned int page_swap_flags(struct page *page)
+{
+ return 0;
+}
#endif /* CONFIG_SWAP */
#endif /* _MM_SWAP_H */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 60378d36ec77..9150754bf2b8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1502,6 +1502,22 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
return nr_succeeded;
}

+static bool may_enter_fs(struct page *page, gfp_t gfp_mask)
+{
+ if (gfp_mask & __GFP_FS)
+ return true;
+ if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO))
+ return false;
+ /*
+ * We can "enter_fs" for swap-cache with only __GFP_IO
+ * providing this isn't SWP_FS_OPS.
+ * ->flags can be updated non-atomicially (scan_swap_map_slots),
+ * but that will never affect SWP_FS_OPS, so the data_race
+ * is safe.
+ */
+ return !data_race(page_swap_flags(page) & SWP_FS_OPS);
+}
+
/*
* shrink_page_list() returns the number of reclaimed pages
*/
@@ -1528,7 +1544,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
struct page *page;
struct folio *folio;
enum page_references references = PAGEREF_RECLAIM;
- bool dirty, writeback, may_enter_fs;
+ bool dirty, writeback;
unsigned int nr_pages;

cond_resched();
@@ -1553,9 +1569,6 @@ static unsigned int shrink_page_list(struct list_head *page_list,
if (!sc->may_unmap && page_mapped(page))
goto keep_locked;

- may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
- (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
-
/*
* The number of dirty pages determines if a node is marked
* reclaim_congested. kswapd will stall and start writing
@@ -1598,7 +1611,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
* not to fs). In this case mark the page for immediate
* reclaim and continue scanning.
*
- * Require may_enter_fs because we would wait on fs, which
+ * Require may_enter_fs() because we would wait on fs, which
* may not have submitted IO yet. And the loop driver might
* enter reclaim, and deadlock if it waits on a page for
* which it is needed to do the write (loop masks off
@@ -1630,7 +1643,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,

/* Case 2 above */
} else if (writeback_throttling_sane(sc) ||
- !PageReclaim(page) || !may_enter_fs) {
+ !PageReclaim(page) || !may_enter_fs(page, sc->gfp_mask)) {
/*
* This is slightly racy - end_page_writeback()
* might have just cleared PageReclaim, then
@@ -1720,8 +1733,6 @@ static unsigned int shrink_page_list(struct list_head *page_list,
goto activate_locked_split;
}

- may_enter_fs = true;
-
/* Adding to swap updated mapping */
mapping = page_mapping(page);
}
@@ -1792,7 +1803,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,

if (references == PAGEREF_RECLAIM_CLEAN)
goto keep_locked;
- if (!may_enter_fs)
+ if (!may_enter_fs(page, sc->gfp_mask))
goto keep_locked;
if (!sc->may_writepage)
goto keep_locked;


2022-03-30 11:55:27

by NeilBrown

Subject: [PATCH 07/10] DOC: update documentation for swap_activate and swap_rw

The documentation for ->swap_activate() has been out-of-date for a long
time. This patch updates it to match recent changes, and adds
documentation for the associated ->swap_rw().

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
Documentation/filesystems/locking.rst | 18 ++++++++++++------
Documentation/filesystems/vfs.rst | 17 ++++++++++++-----
2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 2998cec9af4b..009d855c9be5 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -260,8 +260,9 @@ prototypes::
int (*launder_folio)(struct folio *);
bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
int (*error_remove_page)(struct address_space *, struct page *);
- int (*swap_activate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span)
int (*swap_deactivate)(struct file *);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);

locking rules:
All except dirty_folio and freepage may block
@@ -290,6 +291,7 @@ is_partially_uptodate: yes
error_remove_page: yes
swap_activate: no
swap_deactivate: no
+swap_rw: yes, unlocks
====================== ======================== ========= ===============

->write_begin(), ->write_end() and ->readpage() may be called from
@@ -392,15 +394,19 @@ cleaned, or an error value if not. Note that in order to prevent the folio
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.

-->swap_activate will be called with a non-zero argument on
-files backing (non block device backed) swapfiles. A return value
-of zero indicates success, in which case this file can be used for
-backing swapspace. The swapspace operations will be proxied to the
-address space operations.
+->swap_activate() will be called to prepare the given file for swap. It
+should perform any validation and preparation necessary to ensure that
+writes can be performed with minimal memory allocation. It should call
+add_swap_extent(), or the helper iomap_swapfile_activate(), and return
+the number of extents added. If IO should be submitted through
+->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted
+directly to the block device ``sis->bdev``.

->swap_deactivate() will be called in the sys_swapoff()
path after ->swap_activate() returned success.

+->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate().
+
file_lock_operations
====================

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4f14edf93941..9d3480e089f6 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -751,8 +751,9 @@ cache in your filesystem. The following members are defined:
size_t count);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
- int (*swap_activate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span)
int (*swap_deactivate)(struct file *);
+ int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
};

``writepage``
@@ -963,15 +964,21 @@ cache in your filesystem. The following members are defined:
unless you have them locked or reference counts increased.

``swap_activate``
- Called when swapon is used on a file to allocate space if
- necessary and pin the block lookup information in memory. A
- return value of zero indicates success, in which case this file
- can be used to back swapspace.
+
+ Called to prepare the given file for swap. It should perform
+ any validation and preparation necessary to ensure that writes
+ can be performed with minimal memory allocation. It should call
+ add_swap_extent(), or the helper iomap_swapfile_activate(), and
+ return the number of extents added. If IO should be submitted
+ through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
+ be submitted directly to the block device ``sis->bdev``.

``swap_deactivate``
Called during swapoff on files where swap_activate was
successful.

+``swap_rw``
+ Called to read or write swap pages when SWP_FS_OPS is set.

The File Object
===============


2022-03-30 11:58:28

by NeilBrown

Subject: [PATCH 01/10] MM: create new mm/swap.h header file.

Many functions declared in include/linux/swap.h are only used within mm/.

Create a new "mm/swap.h" and move some of these declarations there.
Remove the redundant 'extern' from the function declarations.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
include/linux/swap.h | 121 ---------------------------------------------
mm/huge_memory.c | 1
mm/madvise.c | 1
mm/memcontrol.c | 1
mm/memory.c | 1
mm/mincore.c | 1
mm/page_alloc.c | 1
mm/page_io.c | 1
mm/shmem.c | 1
mm/swap.h | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++
mm/swap_state.c | 1
mm/swapfile.c | 1
mm/util.c | 1
mm/vmscan.c | 1
mm/zswap.c | 2 +
15 files changed, 147 insertions(+), 121 deletions(-)
create mode 100644 mm/swap.h

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 27093b477c5f..11390dde5a6c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -420,62 +420,19 @@ extern void kswapd_stop(int nid);

#ifdef CONFIG_SWAP

-#include <linux/blk_types.h> /* for bio_end_io_t */
-
-/* linux/mm/page_io.c */
-extern int swap_readpage(struct page *page, bool do_poll);
-extern int swap_writepage(struct page *page, struct writeback_control *wbc);
-extern void end_swap_bio_write(struct bio *bio);
-extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
- bio_end_io_t end_write_func);
bool swap_dirty_folio(struct address_space *mapping, struct folio *folio);
-
int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
unsigned long nr_pages, sector_t start_block);
int generic_swapfile_activate(struct swap_info_struct *, struct file *,
sector_t *);

-/* linux/mm/swap_state.c */
-/* One swap address space for each 64M swap space */
-#define SWAP_ADDRESS_SPACE_SHIFT 14
-#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT)
-extern struct address_space *swapper_spaces[];
-#define swap_address_space(entry) \
- (&swapper_spaces[swp_type(entry)][swp_offset(entry) \
- >> SWAP_ADDRESS_SPACE_SHIFT])
static inline unsigned long total_swapcache_pages(void)
{
return global_node_page_state(NR_SWAPCACHE);
}

-extern void show_swap_cache_info(void);
-extern int add_to_swap(struct page *page);
-extern void *get_shadow_from_swap_cache(swp_entry_t entry);
-extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
- gfp_t gfp, void **shadowp);
-extern void __delete_from_swap_cache(struct page *page,
- swp_entry_t entry, void *shadow);
-extern void delete_from_swap_cache(struct page *);
-extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
- unsigned long end);
-extern void free_swap_cache(struct page *);
extern void free_page_and_swap_cache(struct page *);
extern void free_pages_and_swap_cache(struct page **, int);
-extern struct page *lookup_swap_cache(swp_entry_t entry,
- struct vm_area_struct *vma,
- unsigned long addr);
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
-extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
- struct vm_area_struct *vma, unsigned long addr,
- bool do_poll);
-extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
- struct vm_area_struct *vma, unsigned long addr,
- bool *new_page_allocated);
-extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
- struct vm_fault *vmf);
-extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
- struct vm_fault *vmf);
-
/* linux/mm/swapfile.c */
extern atomic_long_t nr_swap_pages;
extern long total_swap_pages;
@@ -528,12 +485,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
}

#else /* CONFIG_SWAP */
-
-static inline int swap_readpage(struct page *page, bool do_poll)
-{
- return 0;
-}
-
static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
{
return NULL;
@@ -548,11 +499,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
{
}

-static inline struct address_space *swap_address_space(swp_entry_t entry)
-{
- return NULL;
-}
-
#define get_nr_swap_pages() 0L
#define total_swap_pages 0L
#define total_swapcache_pages() 0UL
@@ -567,14 +513,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
#define free_pages_and_swap_cache(pages, nr) \
release_pages((pages), (nr));

-static inline void free_swap_cache(struct page *page)
-{
-}
-
-static inline void show_swap_cache_info(void)
-{
-}
-
/* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
#define free_swap_and_cache(e) is_pfn_swap_entry(e)

@@ -600,65 +538,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp)
{
}

-static inline struct page *swap_cluster_readahead(swp_entry_t entry,
- gfp_t gfp_mask, struct vm_fault *vmf)
-{
- return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
- struct vm_fault *vmf)
-{
- return NULL;
-}
-
-static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
-{
- return 0;
-}
-
-static inline struct page *lookup_swap_cache(swp_entry_t swp,
- struct vm_area_struct *vma,
- unsigned long addr)
-{
- return NULL;
-}
-
-static inline
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
-{
- return find_get_page(mapping, index);
-}
-
-static inline int add_to_swap(struct page *page)
-{
- return 0;
-}
-
-static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
-{
- return NULL;
-}
-
-static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
- gfp_t gfp_mask, void **shadowp)
-{
- return -1;
-}
-
-static inline void __delete_from_swap_cache(struct page *page,
- swp_entry_t entry, void *shadow)
-{
-}
-
-static inline void delete_from_swap_cache(struct page *page)
-{
-}
-
-static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
- unsigned long end)
-{
-}

static inline int page_swapcount(struct page *page)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2fe38212e07c..2b433920726d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -39,6 +39,7 @@
#include <asm/tlb.h>
#include <asm/pgalloc.h>
#include "internal.h"
+#include "swap.h"

#define CREATE_TRACE_POINTS
#include <trace/events/thp.h>
diff --git a/mm/madvise.c b/mm/madvise.c
index b41858ee937b..4f48e48432e8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -35,6 +35,7 @@
#include <asm/tlb.h>

#include "internal.h"
+#include "swap.h"

struct madvise_walk_private {
struct mmu_gather *tlb;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 725f76723220..4f4cb6a464fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -67,6 +67,7 @@
#include <net/sock.h>
#include <net/ip.h>
#include "slab.h"
+#include "swap.h"

#include <linux/uaccess.h>

diff --git a/mm/memory.c b/mm/memory.c
index be44d0b36b18..92ea8ac374a4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -86,6 +86,7 @@

#include "pgalloc-track.h"
#include "internal.h"
+#include "swap.h"

#if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
#warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..f4f627325e12 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -20,6 +20,7 @@
#include <linux/pgtable.h>

#include <linux/uaccess.h>
+#include "swap.h"

static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
unsigned long end, struct mm_walk *walk)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdc8f60ae462..82bfcd23d0eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -81,6 +81,7 @@
#include "internal.h"
#include "shuffle.h"
#include "page_reporting.h"
+#include "swap.h"

/* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */
typedef int __bitwise fpi_t;
diff --git a/mm/page_io.c b/mm/page_io.c
index b417f000b49e..d01ab9d5410a 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -26,6 +26,7 @@
#include <linux/uio.h>
#include <linux/sched/task.h>
#include <linux/delayacct.h>
+#include "swap.h"

void end_swap_bio_write(struct bio *bio)
{
diff --git a/mm/shmem.c b/mm/shmem.c
index 529c9ad3e926..31db146f15ec 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -38,6 +38,7 @@
#include <linux/hugetlb.h>
#include <linux/fs_parser.h>
#include <linux/swapfile.h>
+#include "swap.h"

static struct vfsmount *shm_mnt;

diff --git a/mm/swap.h b/mm/swap.h
new file mode 100644
index 000000000000..f8265bf0ce00
--- /dev/null
+++ b/mm/swap.h
@@ -0,0 +1,133 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _MM_SWAP_H
+#define _MM_SWAP_H
+
+#ifdef CONFIG_SWAP
+#include <linux/blk_types.h> /* for bio_end_io_t */
+
+/* linux/mm/page_io.c */
+int swap_readpage(struct page *page, bool do_poll);
+int swap_writepage(struct page *page, struct writeback_control *wbc);
+void end_swap_bio_write(struct bio *bio);
+int __swap_writepage(struct page *page, struct writeback_control *wbc,
+ bio_end_io_t end_write_func);
+
+/* linux/mm/swap_state.c */
+/* One swap address space for each 64M swap space */
+#define SWAP_ADDRESS_SPACE_SHIFT 14
+#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT)
+extern struct address_space *swapper_spaces[];
+#define swap_address_space(entry) \
+ (&swapper_spaces[swp_type(entry)][swp_offset(entry) \
+ >> SWAP_ADDRESS_SPACE_SHIFT])
+
+void show_swap_cache_info(void);
+int add_to_swap(struct page *page);
+void *get_shadow_from_swap_cache(swp_entry_t entry);
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+ gfp_t gfp, void **shadowp);
+void __delete_from_swap_cache(struct page *page,
+ swp_entry_t entry, void *shadow);
+void delete_from_swap_cache(struct page *page);
+void clear_shadow_from_swap_cache(int type, unsigned long begin,
+ unsigned long end);
+void free_swap_cache(struct page *page);
+struct page *lookup_swap_cache(swp_entry_t entry,
+ struct vm_area_struct *vma,
+ unsigned long addr);
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
+
+struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_area_struct *vma,
+ unsigned long addr,
+ bool do_poll);
+struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
+ struct vm_area_struct *vma,
+ unsigned long addr,
+ bool *new_page_allocated);
+struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
+ struct vm_fault *vmf);
+struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
+ struct vm_fault *vmf);
+
+#else /* CONFIG_SWAP */
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+ return 0;
+}
+
+static inline struct address_space *swap_address_space(swp_entry_t entry)
+{
+ return NULL;
+}
+
+static inline void free_swap_cache(struct page *page)
+{
+}
+
+static inline void show_swap_cache_info(void)
+{
+}
+
+static inline struct page *swap_cluster_readahead(swp_entry_t entry,
+ gfp_t gfp_mask, struct vm_fault *vmf)
+{
+ return NULL;
+}
+
+static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+ struct vm_fault *vmf)
+{
+ return NULL;
+}
+
+static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
+{
+ return 0;
+}
+
+static inline struct page *lookup_swap_cache(swp_entry_t swp,
+ struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return NULL;
+}
+
+static inline
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
+{
+ return find_get_page(mapping, index);
+}
+
+static inline int add_to_swap(struct page *page)
+{
+ return 0;
+}
+
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+ return NULL;
+}
+
+static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
+ gfp_t gfp_mask, void **shadowp)
+{
+ return -1;
+}
+
+static inline void __delete_from_swap_cache(struct page *page,
+ swp_entry_t entry, void *shadow)
+{
+}
+
+static inline void delete_from_swap_cache(struct page *page)
+{
+}
+
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+ unsigned long end)
+{
+}
+
+#endif /* CONFIG_SWAP */
+#endif /* _MM_SWAP_H */
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 013856004825..5437dd317cf3 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -23,6 +23,7 @@
#include <linux/huge_mm.h>
#include <linux/shmem_fs.h>
#include "internal.h"
+#include "swap.h"

/*
* swapper_space is a fiction, retained to simplify the path through
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63c61f8b2611..2650927a009b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -44,6 +44,7 @@
#include <asm/tlbflush.h>
#include <linux/swapops.h>
#include <linux/swap_cgroup.h>
+#include "swap.h"

static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
unsigned char);
diff --git a/mm/util.c b/mm/util.c
index 54e5e761a9a9..e8f59c0ef90f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -27,6 +27,7 @@
#include <linux/uaccess.h>

#include "internal.h"
+#include "swap.h"

/**
* kfree_const - conditionally free memory
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1678802e03e7..60378d36ec77 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -59,6 +59,7 @@
#include <linux/sched/sysctl.h>

#include "internal.h"
+#include "swap.h"

#define CREATE_TRACE_POINTS
#include <trace/events/vmscan.h>
diff --git a/mm/zswap.c b/mm/zswap.c
index 3efd8cae315e..2c5db4cbedea 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -36,6 +36,8 @@
#include <linux/pagemap.h>
#include <linux/workqueue.h>

+#include "swap.h"
+
/*********************************
* statistics
**********************************/


2022-03-30 12:17:58

by NeilBrown

Subject: [PATCH 08/10] MM: submit multipage reads for SWP_FS_OPS swap-space

swap_readpage() is given one page at a time, but may be called
repeatedly in succession.
For block-device swap-space, the blk_plug functionality allows the
multiple pages to be combined together at lower layers.
That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is
only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS
are single page reads.

With this patch we pass in a pointer-to-pointer where swap_readpage() can
store state between calls - much like the effect of blk_plug. After
calling swap_readpage() some number of times, the state will be passed
to swap_read_unplug() which can submit the combined request.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
mm/madvise.c | 8 +++-
mm/memory.c | 2 +
mm/page_io.c | 104 ++++++++++++++++++++++++++++++++++++-------------------
mm/swap.h | 17 +++++++--
mm/swap_state.c | 20 +++++++----
5 files changed, 104 insertions(+), 47 deletions(-)
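
Illustrative aside, not part of the patch: the plugging idea described
in the commit message can be modelled in userspace C. This is a sketch
under stated assumptions. struct read_batch, batch_add(),
batch_submit(), BATCH_MAX and PAGE_SIZE_MODEL are hypothetical names,
and flushing when the batch is full is a choice made for this model
rather than a statement about the patch.

```c
#include <assert.h>

#define PAGE_SIZE_MODEL 4096L	/* stand-in for PAGE_SIZE */
#define BATCH_MAX 32		/* stand-in for the bvec array size */

/* State carried between calls, playing the role of the plug. */
struct read_batch {
	int file_id;		/* which swap file the batch targets */
	long pos;		/* byte offset of the first queued page */
	int pages;		/* pages queued so far */
	int submissions;	/* combined requests issued (for the demo) */
	int submitted_pages;	/* pages covered by those requests */
};

/* Model of "unplug": issue whatever is queued as one request. */
static void batch_submit(struct read_batch *b)
{
	if (b->pages == 0)
		return;
	b->submissions++;
	b->submitted_pages += b->pages;
	b->pages = 0;
}

/*
 * Queue one page. The batch is flushed first if the new page targets a
 * different file, is not contiguous with the queued range, or the
 * batch is already full.
 */
static void batch_add(struct read_batch *b, int file_id, long pos)
{
	if (b->pages &&
	    (b->file_id != file_id ||
	     b->pos + b->pages * PAGE_SIZE_MODEL != pos ||
	     b->pages == BATCH_MAX))
		batch_submit(b);
	if (b->pages == 0) {
		b->file_id = file_id;
		b->pos = pos;
	}
	b->pages++;
}
```

A caller would invoke batch_add() once per page and batch_submit() once
after its loop, which mirrors how the callers in this patch pass &splug
on each read and call swap_read_unplug() afterwards.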

diff --git a/mm/madvise.c b/mm/madvise.c
index 4f48e48432e8..297de11f73d6 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -198,6 +198,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
pte_t *orig_pte;
struct vm_area_struct *vma = walk->private;
unsigned long index;
+ struct swap_iocb *splug = NULL;

if (pmd_none_or_trans_huge_or_clear_bad(pmd))
return 0;
@@ -219,10 +220,11 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
continue;

page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
- vma, index, false);
+ vma, index, false, &splug);
if (page)
put_page(page);
}
+ swap_read_unplug(splug);

return 0;
}
@@ -238,6 +240,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1);
struct page *page;
+ struct swap_iocb *splug = NULL;

rcu_read_lock();
xas_for_each(&xas, page, end_index) {
@@ -250,13 +253,14 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,

swap = radix_to_swp_entry(page);
page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
- NULL, 0, false);
+ NULL, 0, false, &splug);
if (page)
put_page(page);

rcu_read_lock();
}
rcu_read_unlock();
+ swap_read_unplug(splug);

lru_add_drain(); /* Push any new pages onto the LRU now */
}
diff --git a/mm/memory.c b/mm/memory.c
index 92ea8ac374a4..8de0ad307cb2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3586,7 +3586,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)

/* To provide entry to swap_readpage() */
set_page_private(page, entry.val);
- swap_readpage(page, true);
+ swap_readpage(page, true, NULL);
set_page_private(page, 0);
}
} else {
diff --git a/mm/page_io.c b/mm/page_io.c
index a01cc273bb00..8735707ea349 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -286,7 +286,8 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page)

struct swap_iocb {
struct kiocb iocb;
- struct bio_vec bvec;
+ struct bio_vec bvec[SWAP_CLUSTER_MAX];
+ int pages;
};
static mempool_t *sio_pool;

@@ -306,7 +307,7 @@ int sio_pool_init(void)
static void sio_write_complete(struct kiocb *iocb, long ret)
{
struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
- struct page *page = sio->bvec.bv_page;
+ struct page *page = sio->bvec[0].bv_page;

if (ret != PAGE_SIZE) {
/*
@@ -344,10 +345,10 @@ static int swap_writepage_fs(struct page *page, struct writeback_control *wbc)
init_sync_kiocb(&sio->iocb, swap_file);
sio->iocb.ki_complete = sio_write_complete;
sio->iocb.ki_pos = page_file_offset(page);
- sio->bvec.bv_page = page;
- sio->bvec.bv_len = PAGE_SIZE;
- sio->bvec.bv_offset = 0;
- iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE);
+ sio->bvec[0].bv_page = page;
+ sio->bvec[0].bv_len = PAGE_SIZE;
+ sio->bvec[0].bv_offset = 0;
+ iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE);
ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
if (ret != -EIOCBQUEUED)
sio_write_complete(&sio->iocb, ret);
@@ -395,46 +396,66 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
static void sio_read_complete(struct kiocb *iocb, long ret)
{
struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
- struct page *page = sio->bvec.bv_page;
+ int p;

- if (ret != 0 && ret != PAGE_SIZE) {
- SetPageError(page);
- ClearPageUptodate(page);
- pr_alert_ratelimited("Read-error on swap-device\n");
+ if (ret == PAGE_SIZE * sio->pages) {
+ for (p = 0; p < sio->pages; p++) {
+ struct page *page = sio->bvec[p].bv_page;
+
+ SetPageUptodate(page);
+ unlock_page(page);
+ }
+ count_vm_events(PSWPIN, sio->pages);
} else {
- SetPageUptodate(page);
- count_vm_event(PSWPIN);
+ for (p = 0; p < sio->pages; p++) {
+ struct page *page = sio->bvec[p].bv_page;
+
+ SetPageError(page);
+ ClearPageUptodate(page);
+ unlock_page(page);
+ }
+ pr_alert_ratelimited("Read-error on swap-device\n");
}
- unlock_page(page);
mempool_free(sio, sio_pool);
}

-static int swap_readpage_fs(struct page *page)
+static void swap_readpage_fs(struct page *page,
+ struct swap_iocb **plug)
{
struct swap_info_struct *sis = page_swap_info(page);
- struct file *swap_file = sis->swap_file;
- struct address_space *mapping = swap_file->f_mapping;
- struct iov_iter from;
- struct swap_iocb *sio;
+ struct swap_iocb *sio = NULL;
loff_t pos = page_file_offset(page);
- int ret;
-
- sio = mempool_alloc(sio_pool, GFP_KERNEL);
- init_sync_kiocb(&sio->iocb, swap_file);
- sio->iocb.ki_pos = pos;
- sio->iocb.ki_complete = sio_read_complete;
- sio->bvec.bv_page = page;
- sio->bvec.bv_len = PAGE_SIZE;
- sio->bvec.bv_offset = 0;

- iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE);
- ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
- if (ret != -EIOCBQUEUED)
- sio_read_complete(&sio->iocb, ret);
- return ret;
+ if (plug)
+ sio = *plug;
+ if (sio) {
+ if (sio->iocb.ki_filp != sis->swap_file ||
+ sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) {
+ swap_read_unplug(sio);
+ sio = NULL;
+ }
+ }
+ if (!sio) {
+ sio = mempool_alloc(sio_pool, GFP_KERNEL);
+ init_sync_kiocb(&sio->iocb, sis->swap_file);
+ sio->iocb.ki_pos = pos;
+ sio->iocb.ki_complete = sio_read_complete;
+ sio->pages = 0;
+ }
+ sio->bvec[sio->pages].bv_page = page;
+ sio->bvec[sio->pages].bv_len = PAGE_SIZE;
+ sio->bvec[sio->pages].bv_offset = 0;
+ sio->pages += 1;
+ if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) {
+ swap_read_unplug(sio);
+ sio = NULL;
+ }
+ if (plug)
+ *plug = sio;
}

-int swap_readpage(struct page *page, bool synchronous)
+int swap_readpage(struct page *page, bool synchronous,
+ struct swap_iocb **plug)
{
struct bio *bio;
int ret = 0;
@@ -462,7 +483,7 @@ int swap_readpage(struct page *page, bool synchronous)
}

if (data_race(sis->flags & SWP_FS_OPS)) {
- ret = swap_readpage_fs(page);
+ swap_readpage_fs(page, plug);
goto out;
}

@@ -513,3 +534,16 @@ int swap_readpage(struct page *page, bool synchronous)
delayacct_swapin_end();
return ret;
}
+
+void __swap_read_unplug(struct swap_iocb *sio)
+{
+ struct iov_iter from;
+ struct address_space *mapping = sio->iocb.ki_filp->f_mapping;
+ int ret;
+
+ iov_iter_bvec(&from, READ, sio->bvec, sio->pages,
+ PAGE_SIZE * sio->pages);
+ ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+ if (ret != -EIOCBQUEUED)
+ sio_read_complete(&sio->iocb, ret);
+}
diff --git a/mm/swap.h b/mm/swap.h
index eafac80b18d9..0389ab147837 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -7,7 +7,15 @@

/* linux/mm/page_io.c */
int sio_pool_init(void);
-int swap_readpage(struct page *page, bool do_poll);
+struct swap_iocb;
+int swap_readpage(struct page *page, bool do_poll,
+ struct swap_iocb **plug);
+void __swap_read_unplug(struct swap_iocb *plug);
+static inline void swap_read_unplug(struct swap_iocb *plug)
+{
+ if (unlikely(plug))
+ __swap_read_unplug(plug);
+}
int swap_writepage(struct page *page, struct writeback_control *wbc);
void end_swap_bio_write(struct bio *bio);
int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -41,7 +49,8 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma,
unsigned long addr,
- bool do_poll);
+ bool do_poll,
+ struct swap_iocb **plug);
struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma,
unsigned long addr,
@@ -56,7 +65,9 @@ static inline unsigned int page_swap_flags(struct page *page)
return page_swap_info(page)->flags;
}
#else /* CONFIG_SWAP */
-static inline int swap_readpage(struct page *page, bool do_poll)
+struct swap_iocb;
+static inline int swap_readpage(struct page *page, bool do_poll,
+ struct swap_iocb **plug)
{
return 0;
}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index f3ab01801629..d41746a572a2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -520,14 +520,16 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
* the swap entry is no longer in use.
*/
struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
- struct vm_area_struct *vma, unsigned long addr, bool do_poll)
+ struct vm_area_struct *vma,
+ unsigned long addr, bool do_poll,
+ struct swap_iocb **plug)
{
bool page_was_allocated;
struct page *retpage = __read_swap_cache_async(entry, gfp_mask,
vma, addr, &page_was_allocated);

if (page_was_allocated)
- swap_readpage(retpage, do_poll);
+ swap_readpage(retpage, do_poll, plug);

return retpage;
}
@@ -621,6 +623,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
unsigned long mask;
struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
+ struct swap_iocb *splug = NULL;
bool do_poll = true, page_allocated;
struct vm_area_struct *vma = vmf->vma;
unsigned long addr = vmf->address;
@@ -647,7 +650,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
if (!page)
continue;
if (page_allocated) {
- swap_readpage(page, false);
+ swap_readpage(page, false, &splug);
if (offset != entry_offset) {
SetPageReadahead(page);
count_vm_event(SWAP_RA);
@@ -656,10 +659,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
put_page(page);
}
blk_finish_plug(&plug);
+ swap_read_unplug(splug);

lru_add_drain(); /* Push any new pages onto the LRU now */
skip:
- return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
+ /* The page was likely read above, so no need for plugging here */
+ return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, NULL);
}

int init_swap_address_space(unsigned int type, unsigned long nr_pages)
@@ -790,6 +795,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
struct vm_fault *vmf)
{
struct blk_plug plug;
+ struct swap_iocb *splug = NULL;
struct vm_area_struct *vma = vmf->vma;
struct page *page;
pte_t *pte, pentry;
@@ -820,7 +826,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
if (!page)
continue;
if (page_allocated) {
- swap_readpage(page, false);
+ swap_readpage(page, false, &splug);
if (i != ra_info.offset) {
SetPageReadahead(page);
count_vm_event(SWAP_RA);
@@ -829,10 +835,12 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
put_page(page);
}
blk_finish_plug(&plug);
+ swap_read_unplug(splug);
lru_add_drain();
skip:
+ /* The page was likely read above, so no need for plugging here */
return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
- ra_info.win == 1);
+ ra_info.win == 1, NULL);
}

/**
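
For readers following the patch above: the read path now collects pages that
are adjacent in the swap file into one swap_iocb and hands them to ->swap_rw
in a single call. The batching rule can be sketched in user space as below.
This is a model only, not kernel code: struct batch, batch_add(),
batch_submit(), batch_finish(), PAGE_SZ and BATCH_MAX are hypothetical names,
and the two counters exist purely to make the behaviour observable.

```c
#include <stddef.h>

#define PAGE_SZ   4096L	/* assumed page size for this model */
#define BATCH_MAX 32	/* stands in for SWAP_CLUSTER_MAX (value assumed) */

/* Hypothetical user-space stand-in for struct swap_iocb. */
struct batch {
	int  file;		/* which swap file the batch targets */
	long pos;		/* file offset of the first queued page */
	int  pages;		/* number of pages queued so far */
	long offs[BATCH_MAX];	/* offset of each queued page */
};

/* Counters so that the effect of batching can be observed. */
int submissions;
int pages_submitted;

/* Models __swap_read_unplug(): one I/O request for the whole batch. */
void batch_submit(struct batch *b)
{
	submissions++;
	pages_submitted += b->pages;
	b->pages = 0;
}

/* Models swap_readpage_fs(): queue one page, flushing when required. */
void batch_add(struct batch *b, int file, long pos)
{
	/* A page that does not directly extend the batch forces a flush. */
	if (b->pages > 0 &&
	    (b->file != file || b->pos + b->pages * PAGE_SZ != pos))
		batch_submit(b);
	if (b->pages == 0) {
		b->file = file;
		b->pos = pos;
	}
	b->offs[b->pages++] = pos;
	/* A full batch is submitted straight away. */
	if (b->pages == BATCH_MAX)
		batch_submit(b);
}

/* Models swap_read_unplug(): flush whatever is still queued, if anything. */
void batch_finish(struct batch *b)
{
	if (b->pages > 0)
		batch_submit(b);
}
```

Queuing offsets 0, 4096 and 8192 of one file and then offset 40960 produces
one submission of three pages when the gap is noticed, and a second
submission of one page when batch_finish() is called.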


2022-03-31 03:30:15

by NeilBrown

Subject: Re: [PATCH 00/10] MM changes to improve swap-over-NFS support

On Wed, 30 Mar 2022, David Howells wrote:
> Do you have a branch with your patches on?

http://git.neil.brown.name/?p=linux.git;a=shortlog;h=refs/heads/swap-nfs

git://neil.brown.name/linux branch swap-nfs

Also on https://github.com/neilbrown/linux.git same branch

(it seems 1GB is no longer enough to run a git server for the kernel
effectively)

This contains
- recent HEAD from Linus, which includes the NFS work
- the patches I sent to akpm
- the patch to switch NFS over to using the new swap_rw
- a SUNRPC patch to fix an easy crash. The bug has always been there,
but recent changes to how kmalloc is called make it much easier to
trigger.

NeilBrown

2022-03-31 09:01:41

by David Howells

Subject: Re: [PATCH 00/10] MM changes to improve swap-over-NFS support

NeilBrown <[email protected]> wrote:

> Assorted improvements for swap-via-filesystem.
>
> This is a resend of these patches, rebased on current HEAD.
> The only substantial changes is that swap_dirty_folio has replaced
> swap_set_page_dirty.
>
> Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It
> has previously worked for NFS but that broke a few releases back.
> This series changes to use a new ->swap_rw rather than ->readpage and
> ->direct_IO. It also makes other improvements.
>
> There is a companion series already in linux-next which fixes various
> issues with NFS. Once both series land, a final patch is needed which
> changes NFS over to use ->swap_rw.

This seems to work. I tested it by running enough copies of the attached
program in parallel to exhaust ordinary RAM.

Tested-by: David Howells <[email protected]>
---
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	unsigned int pid = getpid(), iterations = 0;
	size_t i, j, size = 1024 * 1024 * 1024;
	char *p;
	bool mismatch;

	p = malloc(size);
	if (!p) {
		perror("malloc");
		exit(1);
	}

	/* Fill the buffer with a pseudo-random sequence seeded by the PID. */
	srand(pid);
	for (i = 0; i < size; i += 4)
		*(unsigned int *)(p + i) = rand();

	do {
		/* Bump the first word of every 4096-byte page, 16 passes. */
		for (j = 0; j < 16; j++) {
			for (i = 0; i < size; i += 4096)
				*(unsigned int *)(p + i) += 1;
			iterations++;
		}

		/* Replay the sequence and check that every word survived. */
		mismatch = false;
		srand(pid);
		for (i = 0; i < size; i += 4) {
			unsigned int r = rand();
			unsigned int v = *(unsigned int *)(p + i);

			/* Undo the increments applied to page-leading words. */
			if (i % 4096 == 0)
				v -= iterations;

			if (v != r) {
				fprintf(stderr, "mismatch %zx: %x != %x (diff %x)\n",
					i, v, r, v - r);
				mismatch = true;
			}
		}
	} while (!mismatch);

	exit(1);
}
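
Running "sufficient copies in parallel" can be done from a shell, or with a
small launcher such as the sketch below. It is not part of the original
mail: run_copies() is a hypothetical helper that starts n copies of a
command with fork() and execvp() and waits for all of them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start n copies of the command in argv and wait for all of them.
 * Returns the number of copies that did not exit with status 0,
 * or -1 if fork() failed before all n copies were started.
 */
int run_copies(int n, char *const argv[])
{
	int failed = 0, started = 0, status;

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			execvp(argv[0], argv);
			_exit(127);	/* only reached if exec failed */
		}
		started++;
	}
	/* Reap every child that was started, even after a fork failure. */
	for (int i = 0; i < started; i++) {
		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed++;
	}
	return started == n ? failed : -1;
}
```

Because the test program above only leaves its loop on a mismatch and then
exits with status 1, a launcher like this returns only once every copy has
reported a mismatch or been killed; until then it simply keeps the load
running.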

2022-04-21 23:05:24

by Geert Uytterhoeven

Subject: Re: [PATCH 00/10] MM changes to improve swap-over-NFS support

Hi Neil,

On Thu, Mar 31, 2022 at 4:54 AM NeilBrown <[email protected]> wrote:
> On Wed, 30 Mar 2022, David Howells wrote:
> > Do you have a branch with your patches on?
>
> http://git.neil.brown.name/?p=linux.git;a=shortlog;h=refs/heads/swap-nfs
>
> git://neil.brown.name/linux branch swap-nfs
>
> Also on https://github.com/neilbrown/linux.git same branch
>
> (it seems 1GB is no longer enough to run a git server for the kernel
> effectively)
>
> This contains
> - recent HEAD from Linus, which includes the NFS work
> - the patches I sent to akpm
> - the patch to switch NFS over to using the new swap_rw
> - a SUNRPC patch to fix an easy crash. The bug has always been there,
> but recent changes to how kmalloc is called make it much easier to
> trigger.

Thanks for your series!

I gave this a try on Renesas RSK+RZA1 (RZ/A1H with 32 MiB of RAM)
and RZA2MEVB (RZ/A2M with 64 MiB of RAM) with a Debian nfsroot.
Seems to work, so
Tested-by: Geert Uytterhoeven <[email protected]>

However, I still managed to trigger memory allocation failures,
even on the RZ/A2, which I don't remember seeing last time I tried.

root@rza2mevb:~# free
total used free shared buff/cache available
Mem: 57428 12400 20024 1212 25004 40028
Swap: 0 0 0
root@rza2mevb:~# swapon /swap
Adding 1048572k swap on /swap. Priority:-2 extents:1 across:1048572k
root@rza2mevb:~# apt update
Ign:1 http://ftp.be.debian.org/debian stretch InRelease
Get:2 http://security.debian.org stretch/updates InRelease [53.0 kB]
Hit:3 http://ftp.be.debian.org/debian stretch Release
Get:5 http://security.debian.org stretch/updates/main armhf Packages [738 kB]
Get:6 http://security.debian.org stretch/updates/main Translation-en [356 kB]
Fetched 1,147 kB in 12s (89.5 kB/s)
apt: page allocation failure: order:0,
mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
CPU: 0 PID: 455 Comm: apt Not tainted
5.18.0-rc3-rza2mevb-00734-g98e2a6b7a591 #186
Hardware name: Generic R7S9210 (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from warn_alloc+0xa0/0x150
warn_alloc from __alloc_pages+0x3a0/0x8c0
__alloc_pages from ____cache_alloc+0x194/0x734
____cache_alloc from kmem_cache_alloc+0x60/0xd0
kmem_cache_alloc from nfs_writehdr_alloc+0x28/0x70
nfs_writehdr_alloc from nfs_pgio_header_alloc+0x10/0x28
nfs_pgio_header_alloc from nfs_generic_pg_pgios+0x14/0xa8
nfs_generic_pg_pgios from nfs_pageio_doio+0x2c/0x4c
nfs_pageio_doio from __nfs_pageio_add_request+0x34c/0x3c8
__nfs_pageio_add_request from nfs_pageio_add_request_mirror+0x18/0x44
nfs_pageio_add_request_mirror from nfs_pageio_add_request+0x1b8/0x1c8
nfs_pageio_add_request from nfs_direct_write_schedule_iovec+0x208/0x28c
nfs_direct_write_schedule_iovec from nfs_file_direct_write+0x128/0x21c
nfs_file_direct_write from nfs_swap_rw+0x24/0x28
nfs_swap_rw from swap_write_unplug+0x54/0x94
swap_write_unplug from __swap_writepage+0x10c/0x20c
__swap_writepage from shrink_page_list+0x86c/0xabc
shrink_page_list from shrink_inactive_list+0xfc/0x2b0
shrink_inactive_list from shrink_node+0x598/0x80c
shrink_node from try_to_free_pages+0x2bc/0x3e8
try_to_free_pages from __alloc_pages+0x55c/0x8c0
__alloc_pages from __filemap_get_folio+0x1b4/0x260
__filemap_get_folio from pagecache_get_page+0x10/0x68
pagecache_get_page from nfs_write_begin+0x30/0x148
nfs_write_begin from generic_perform_write+0xa4/0x1b8
generic_perform_write from nfs_file_write+0xf0/0x2a4
nfs_file_write from vfs_write+0x140/0x19c
vfs_write from ksys_write+0x74/0xc8
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xc4c1dfa8 to 0xc4c1dff0)
dfa0: b6ec4025 00000000 00000004 b1d0e000 019f12ac befee52c
dfc0: b6ec4025 00000000 019f12ac 00000004 019f12ac b1d0e000 befee52c befee7ac
dfe0: 00000000 befee4d4 b6ec0b43 b6cb1cf6
Mem-Info:
active_anon:1772 inactive_anon:7471 isolated_anon:64
active_file:679 inactive_file:392 isolated_file:0
unevictable:0 dirty:0 writeback:2891
slab_reclaimable:417 slab_unreclaimable:2863
mapped:32 shmem:52 pagetables:107 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:6 free_cma:0
Node 0 active_anon:7088kB inactive_anon:29884kB active_file:2716kB
inactive_file:1568kB unevictable:0kB isolated(anon):256kB
isolated(file):0kB mapped:128kB dirty:0kB writeback:11564kB
shmem:208kB writeback_tmp:0kB kernel_stack:408kB pagetables:428kB
all_unreclaimable? no
Normal free:0kB boost:4096kB min:5044kB low:5280kB high:5516kB
reserved_highatomic:0KB active_anon:7088kB inactive_anon:29884kB
active_file:2716kB inactive_file:1568kB unevictable:0kB
writepending:10296kB present:65536kB managed:57428kB mlocked:0kB
bounce:0kB free_pcp:24kB local_pcp:24kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
7385 total pagecache pages
6262 pages in swap cache
Swap cache stats: add 6787, delete 525, find 58/74
Free swap = 1021476kB
Total swap = 1048572kB
16384 pages RAM
0 pages HighMem/MovableOnly
2027 pages reserved
Write error -12 on dio swapfile (27660288)
Write error -12 on dio swapfile (29679616)
Write error -12 on dio swapfile (8572928)
Write error 0 on dio swapfile (8441856)
Write error 0 on dio swapfile (8704000)
Write error -12 on dio swapfile (8966144)
Write error -12 on dio swapfile (9097216)
Write error 0 on dio swapfile (8835072)
Write error 0 on dio swapfile (9228288)
Write error 0 on dio swapfile (9359360)
sio_write_complete: 2731 callbacks suppressed
Write error 0 on dio swapfile (34705408)
Write error 0 on dio swapfile (23470080)
Write error 0 on dio swapfile (23601152)
Write error 0 on dio swapfile (23732224)
Write error 0 on dio swapfile (4202496)
Write error 0 on dio swapfile (4304896)
Write error 0 on dio swapfile (4435968)
Write error 0 on dio swapfile (4567040)
Write error 0 on dio swapfile (4698112)
Write error 0 on dio swapfile (4829184)
warn_alloc: 125849 callbacks suppressed
kworker/u2:7: page allocation failure: order:0,
mode:0x60c40(GFP_NOFS|__GFP_COMP|__GFP_MEMALLOC), nodemask=(null)
CPU: 0 PID: 457 Comm: kworker/u2:7 Not tainted
5.18.0-rc3-rza2mevb-00734-g98e2a6b7a591 #186
Hardware name: Generic R7S9210 (Flattened Device Tree)
Workqueue: rpciod rpc_async_schedule
unwind_backtrace from show_stack+0x10/0x14
show_stack from warn_alloc+0xa0/0x150
warn_alloc from __alloc_pages+0x3a0/0x8c0
__alloc_pages from ____cache_alloc+0x194/0x734
____cache_alloc from __kmalloc_track_caller+0x74/0xf0
__kmalloc_track_caller from kmalloc_reserve.constprop.0+0x4c/0x60
kmalloc_reserve.constprop.0 from __alloc_skb+0x88/0x154
__alloc_skb from tcp_stream_alloc_skb+0x68/0x13c
tcp_stream_alloc_skb from tcp_sendmsg_locked+0x4b8/0xabc
tcp_sendmsg_locked from tcp_sendmsg+0x24/0x38
tcp_sendmsg from sock_sendmsg_nosec+0x14/0x24
sock_sendmsg_nosec from xprt_sock_sendmsg+0x1d8/0x244
xprt_sock_sendmsg from xs_tcp_send_request+0x11c/0x20c
xs_tcp_send_request from xprt_transmit+0x84/0x234
xprt_transmit from call_transmit+0x6c/0x7c
call_transmit from __rpc_execute+0xe4/0x2f0
__rpc_execute from rpc_async_schedule+0x18/0x24
rpc_async_schedule from process_one_work+0x170/0x210
process_one_work from worker_thread+0x204/0x2a4
worker_thread from kthread+0xb0/0xbc
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc4e0dfb0 to 0xc4e0dff8)
dfa0: 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
Mem-Info:
active_anon:2703 inactive_anon:6291 isolated_anon:209
active_file:541 inactive_file:530 isolated_file:0
unevictable:0 dirty:0 writeback:3781
slab_reclaimable:391 slab_unreclaimable:2993
mapped:0 shmem:0 pagetables:107 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:26 free_cma:0
Node 0 active_anon:10812kB inactive_anon:25164kB active_file:2164kB
inactive_file:2120kB unevictable:0kB isolated(anon):836kB
isolated(file):0kB mapped:0kB dirty:0kB writeback:15124kB shmem:0kB
writeback_tmp:0kB kernel_stack:408kB pagetables:428kB
all_unreclaimable? yes
Normal free:0kB boost:0kB min:948kB low:1184kB high:1420kB
reserved_highatomic:0KB active_anon:10812kB inactive_anon:25164kB
active_file:2164kB inactive_file:2120kB unevictable:0kB
writepending:13284kB present:65536kB managed:57428kB mlocked:0kB
bounce:0kB free_pcp:104kB local_pcp:104kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
10274 total pagecache pages
9203 pages in swap cache
Swap cache stats: add 9834, delete 631, find 61/77
Free swap = 1009180kB
Total swap = 1048572kB
16384 pages RAM
0 pages HighMem/MovableOnly
2027 pages reserved
sio_write_complete: 29066 callbacks suppressed
...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2022-04-26 07:37:04

by NeilBrown

Subject: Re: [PATCH 00/10] MM changes to improve swap-over-NFS support

On Wed, 20 Apr 2022, Geert Uytterhoeven wrote:
> Hi Neil,
>
> On Thu, Mar 31, 2022 at 4:54 AM NeilBrown <[email protected]> wrote:
> > On Wed, 30 Mar 2022, David Howells wrote:
> > > Do you have a branch with your patches on?
> >
> > http://git.neil.brown.name/?p=linux.git;a=shortlog;h=refs/heads/swap-nfs
> >
> > git://neil.brown.name/linux branch swap-nfs
> >
> > Also on https://github.com/neilbrown/linux.git same branch
> >
> > (it seems 1GB is no longer enough to run a git server for the kernel
> > effectively)
> >
> > This contains
> > - recent HEAD from Linus, which includes the NFS work
> > - the patches I sent to akpm
> > - the patch to switch NFS over to using the new swap_rw
> > - a SUNRPC patch to fix an easy crash. The bug has always been there,
> > but recent changes to how kmalloc is called make it much easier to
> > trigger.
>
> Thanks for your series!
>
> I gave this a try on Renesas RSK+RZA1 (RZ/A1H with 32 MiB of RAM)
> and RZA2MEVB (RZ/A2M with 64 MiB of RAM) with a Debian nfsroot.
> Seems to work, so
> Tested-by: Geert Uytterhoeven <[email protected]>

Thanks for testing!!!!

>
> However, I still managed to trigger memory allocation failures,
> even on the RZ/A2, which I don't remember seeing last time I tried.
>
> root@rza2mevb:~# free
> total used free shared buff/cache available
> Mem: 57428 12400 20024 1212 25004 40028
> Swap: 0 0 0
> root@rza2mevb:~# swapon /swap
> Adding 1048572k swap on /swap. Priority:-2 extents:1 across:1048572k
> root@rza2mevb:~# apt update
> Ign:1 http://ftp.be.debian.org/debian stretch InRelease
> Get:2 http://security.debian.org stretch/updates InRelease [53.0 kB]
> Hit:3 http://ftp.be.debian.org/debian stretch Release
> Get:5 http://security.debian.org stretch/updates/main armhf Packages [738 kB]
> Get:6 http://security.debian.org stretch/updates/main Translation-en [356 kB]
> Fetched 1,147 kB in 12s (89.5 kB/s)
> apt: page allocation failure: order:0,
> mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
> CPU: 0 PID: 455 Comm: apt Not tainted
> 5.18.0-rc3-rza2mevb-00734-g98e2a6b7a591 #186
> Hardware name: Generic R7S9210 (Flattened Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from warn_alloc+0xa0/0x150
> warn_alloc from __alloc_pages+0x3a0/0x8c0
> __alloc_pages from ____cache_alloc+0x194/0x734
> ____cache_alloc from kmem_cache_alloc+0x60/0xd0
> kmem_cache_alloc from nfs_writehdr_alloc+0x28/0x70
> nfs_writehdr_alloc from nfs_pgio_header_alloc+0x10/0x28

This is due to a recent change in NFS code which I don't think actually
makes sense.
Commit 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck in mempool_alloc()")

I need to find an alternate approach which addresses Trond's concerns
but also works. I'm just now back from leave and will try to look at
this over the next week or two.

Thanks,
NeilBrown
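
As background to the commit title quoted above: a mempool keeps a small
preallocated reserve so that an allocation on the writeout path can still
be satisfied when the general allocator has nothing left. The sketch below
shows only that general idea in user space. It is neither the kernel's
mempool API nor the NFS code; struct pool, pool_init(), pool_alloc(),
pool_free(), always_fail() and RESERVE are hypothetical names.

```c
#include <stdlib.h>

#define RESERVE 4	/* hypothetical number of preallocated objects */

/* Hypothetical pool: a fixed reserve behind the normal allocator. */
struct pool {
	void  *spare[RESERVE];
	int    nspare;
	size_t objsize;
};

/* Fill the reserve up front, while memory is still plentiful. */
int pool_init(struct pool *p, size_t objsize)
{
	p->objsize = objsize;
	for (p->nspare = 0; p->nspare < RESERVE; p->nspare++) {
		p->spare[p->nspare] = malloc(objsize);
		if (!p->spare[p->nspare])
			return -1;
	}
	return 0;
}

/*
 * Try the normal allocator first and fall back to the reserve.
 * The allocator is passed in so that failure can be simulated.
 */
void *pool_alloc(struct pool *p, void *(*try_alloc)(size_t))
{
	void *obj = try_alloc(p->objsize);

	if (!obj && p->nspare > 0)
		obj = p->spare[--p->nspare];
	return obj;	/* NULL only when the reserve is empty as well */
}

/* Freed objects refill the reserve before going back to the heap. */
void pool_free(struct pool *p, void *obj)
{
	if (p->nspare < RESERVE)
		p->spare[p->nspare++] = obj;
	else
		free(obj);
}

/* An allocator that always fails, to model memory exhaustion. */
void *always_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

In this model an exhausted reserve returns NULL. The kernel's
mempool_alloc() can instead sleep until an element is returned when the
caller is allowed to wait, which is the "getting stuck" the commit title
refers to.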


