2023-03-09 23:06:26

by Luis Chamberlain

Subject: [PATCH v2 6/6] shmem: add support to ignore swap

While experimenting with shmem, having the option to avoid swap becomes
a useful mechanism. One of the *raves* about brd over shmem is that you
can avoid swap, but that is not a good reason to use brd if we can use
shmem instead. brd has its own good reasons to exist, but the fact that
"tmpfs" does not let you avoid swap is not a great reason to avoid tmpfs
if we can easily add support for it.

I don't add support for reconfiguring incompatible options on remount,
but we can add support for that if we really want to.

To avoid swap we use mapping_set_unevictable() upon inode creation,
and put a WARN_ON_ONCE() stop-gap in shmem_writepage() for reclaim.

Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
---
Documentation/filesystems/tmpfs.rst | 9 ++++++---
Documentation/mm/unevictable-lru.rst | 2 ++
include/linux/shmem_fs.h | 1 +
mm/shmem.c | 28 +++++++++++++++++++++++++++-
4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 1ec9a9f8196b..f18f46be5c0c 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -13,7 +13,8 @@ everything stored therein is lost.

tmpfs puts everything into the kernel internal caches and grows and
shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space, and supports THP.
+unneeded pages out to swap space, if swap was enabled for the tmpfs
+mount. tmpfs also supports THP.

tmpfs extends ramfs with a few userspace configurable options listed and
explained further below, some of which can be reconfigured dynamically on the
@@ -33,8 +34,8 @@ configured in size at initialization and you cannot dynamically resize them.
Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
block layer at all.

-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+Since tmpfs lives completely in the page cache and optionally on swap,
+all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
free(1). Notice that these counters also include shared memory
(shmem, see ipcs(1)). The most reliable way to get the count is
using df(1) and du(1).
@@ -83,6 +84,8 @@ nr_inodes The maximum number of inodes for this instance. The default
is half of the number of your physical RAM pages, or (on a
machine with highmem) the number of lowmem RAM pages,
whichever is the lower.
+noswap Disables swap. Remounts must respect the original settings.
+ By default swap is enabled.
========= ============================================================

These parameters accept a suffix k, m or g for kilo, mega and giga and
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 92ac5dca420c..d5ac8511eb67 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:

* Those owned by ramfs.

+ * Those owned by tmpfs with the noswap mount option.
+
* Those mapped into SHM_LOCK'd shared memory regions.

* Those mapped into VM_LOCKED [mlock()ed] VMAs.
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 103d1000a5a2..50bf82b36995 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -45,6 +45,7 @@ struct shmem_sb_info {
kuid_t uid; /* Mount uid for root directory */
kgid_t gid; /* Mount gid for root directory */
bool full_inums; /* If i_ino should be uint or ino_t */
+ bool noswap; /* ignores VM reclaim / swap requests */
ino_t next_ino; /* The next per-sb inode number to use */
ino_t __percpu *ino_batch; /* The next per-cpu inode number to use */
struct mempolicy *mpol; /* default memory policy for mappings */
diff --git a/mm/shmem.c b/mm/shmem.c
index dfd995da77b4..2e122c72b375 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -119,10 +119,12 @@ struct shmem_options {
bool full_inums;
int huge;
int seen;
+ bool noswap;
#define SHMEM_SEEN_BLOCKS 1
#define SHMEM_SEEN_INODES 2
#define SHMEM_SEEN_HUGE 4
#define SHMEM_SEEN_INUMS 8
+#define SHMEM_SEEN_NOSWAP 16
};

#ifdef CONFIG_TMPFS
@@ -1337,6 +1339,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
struct address_space *mapping = folio->mapping;
struct inode *inode = mapping->host;
struct shmem_inode_info *info = SHMEM_I(inode);
+ struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
swp_entry_t swap;
pgoff_t index;

@@ -1350,7 +1353,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
if (WARN_ON_ONCE(!wbc->for_reclaim))
goto redirty;

- if (WARN_ON_ONCE(info->flags & VM_LOCKED))
+ if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap))
goto redirty;

if (!total_swap_pages)
@@ -2487,6 +2490,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
shmem_set_inode_flags(inode, info->fsflags);
INIT_LIST_HEAD(&info->shrinklist);
INIT_LIST_HEAD(&info->swaplist);
+ if (sbinfo->noswap)
+ mapping_set_unevictable(inode->i_mapping);
simple_xattrs_init(&info->xattrs);
cache_no_acl(inode);
mapping_set_large_folios(inode->i_mapping);
@@ -3574,6 +3579,7 @@ enum shmem_param {
Opt_uid,
Opt_inode32,
Opt_inode64,
+ Opt_noswap,
};

static const struct constant_table shmem_param_enums_huge[] = {
@@ -3595,6 +3601,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
fsparam_u32 ("uid", Opt_uid),
fsparam_flag ("inode32", Opt_inode32),
fsparam_flag ("inode64", Opt_inode64),
+ fsparam_flag ("noswap", Opt_noswap),
{}
};

@@ -3678,6 +3685,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
ctx->full_inums = true;
ctx->seen |= SHMEM_SEEN_INUMS;
break;
+ case Opt_noswap:
+ ctx->noswap = true;
+ ctx->seen |= SHMEM_SEEN_NOSWAP;
+ break;
}
return 0;

@@ -3776,6 +3787,14 @@ static int shmem_reconfigure(struct fs_context *fc)
err = "Current inum too high to switch to 32-bit inums";
goto out;
}
+ if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) {
+ err = "Cannot disable swap on remount";
+ goto out;
+ }
+ if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) {
+ err = "Cannot enable swap on remount if it was disabled on first mount";
+ goto out;
+ }

if (ctx->seen & SHMEM_SEEN_HUGE)
sbinfo->huge = ctx->huge;
@@ -3796,6 +3815,10 @@ static int shmem_reconfigure(struct fs_context *fc)
sbinfo->mpol = ctx->mpol; /* transfers initial ref */
ctx->mpol = NULL;
}
+
+ if (ctx->noswap)
+ sbinfo->noswap = true;
+
raw_spin_unlock(&sbinfo->stat_lock);
mpol_put(mpol);
return 0;
@@ -3850,6 +3873,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge));
#endif
shmem_show_mpol(seq, sbinfo->mpol);
+ if (sbinfo->noswap)
+ seq_printf(seq, ",noswap");
return 0;
}

@@ -3893,6 +3918,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
ctx->inodes = shmem_default_max_inodes();
if (!(ctx->seen & SHMEM_SEEN_INUMS))
ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
+ sbinfo->noswap = ctx->noswap;
} else {
sb->s_flags |= SB_NOUSER;
}
--
2.39.1
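
The remount rules in the shmem_reconfigure() hunk above can be easier to follow as a truth table. What follows is a minimal userspace sketch, not kernel code: struct sb_model, struct ctx_model, parse_noswap() and remount_noswap_check() are hypothetical stand-ins for shmem_sb_info, shmem_options, the Opt_noswap case and the two new checks. It assumes, as in the patch, that a remount starts from a zeroed options context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; only the fields the checks read. */
struct sb_model  { bool noswap; };            /* models shmem_sb_info */
struct ctx_model { bool noswap; int seen; };  /* models shmem_options */

#define SEEN_NOSWAP 16  /* same value as SHMEM_SEEN_NOSWAP in the patch */

/* Models the Opt_noswap case: record that "noswap" was passed. */
static void parse_noswap(struct ctx_model *ctx)
{
	assert(ctx != NULL);
	ctx->noswap = true;
	ctx->seen |= SEEN_NOSWAP;
}

/*
 * Models the two checks added to shmem_reconfigure().
 * Returns NULL if the remount may proceed, else the error string.
 */
static const char *remount_noswap_check(const struct sb_model *sb,
					const struct ctx_model *ctx)
{
	assert(sb != NULL && ctx != NULL);

	/* Mounted with swap, remount asks for noswap: rejected. */
	if ((ctx->seen & SEEN_NOSWAP) && ctx->noswap && !sb->noswap)
		return "Cannot disable swap on remount";

	/* Mounted with noswap, remount omits noswap: rejected. */
	if (!(ctx->seen & SEEN_NOSWAP) && !ctx->noswap && sb->noswap)
		return "Cannot enable swap on remount if it was disabled on first mount";

	return NULL;
}
```

In this model only two of the four combinations succeed: a swap-enabled mount remounted without "noswap", and a noswap mount remounted with "noswap". In other words, the remount options have to repeat the original swap setting.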



2023-04-18 05:58:26

by Hugh Dickins

Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap

On Thu, 9 Mar 2023, Luis Chamberlain wrote:

> In doing experimentations with shmem having the option to avoid swap
> becomes a useful mechanism. One of the *raves* about brd over shmem is
> you can avoid swap, but that's not really a good reason to use brd if
> we can instead use shmem. Using brd has its own good reasons to exist,
> but just because "tmpfs" doesn't let you do that is not a great reason
> to avoid it if we can easily add support for it.
>
> I don't add support for reconfiguring incompatible options, but if
> we really wanted to we can add support for that.
>
> To avoid swap we use mapping_set_unevictable() upon inode creation,
> and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.

I have one big question here, which betrays my ignorance:
I hope that you or Christian can reassure me on this.

tmpfs has fs_flags FS_USERNS_MOUNT. I know nothing about namespaces,
nothing; but from overhearings, wonder if an ordinary user in a namespace
might be able to mount their own tmpfs with "noswap", and thereby evade
all accounting of the locked memory.

That would be an absolute no-no for this patch; but I assume that even
if so, it can be easily remedied by inserting an appropriate (unknown
to me!) privilege check where the "noswap" option is validated.

I did idly wonder what happens with "noswap" when CONFIG_SWAP is not
enabled, or no swap is enabled; but I think it would be a waste of time
and code to worry over doing anything different from whatever behaviour
falls out trivially.

You'll be sending a manpage update to Alejandro in due course, I think.

Thanks,
Hugh


2023-04-18 07:42:29

by Christian Brauner

Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap

On Mon, Apr 17, 2023 at 10:50:59PM -0700, Hugh Dickins wrote:
> On Thu, 9 Mar 2023, Luis Chamberlain wrote:
>
> > In doing experimentations with shmem having the option to avoid swap
> > becomes a useful mechanism. One of the *raves* about brd over shmem is
> > you can avoid swap, but that's not really a good reason to use brd if
> > we can instead use shmem. Using brd has its own good reasons to exist,
> > but just because "tmpfs" doesn't let you do that is not a great reason
> > to avoid it if we can easily add support for it.
> >
> > I don't add support for reconfiguring incompatible options, but if
> > we really wanted to we can add support for that.
> >
> > To avoid swap we use mapping_set_unevictable() upon inode creation,
> > and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.
>
> I have one big question here, which betrays my ignorance:
> I hope that you or Christian can reassure me on this.
>
> tmpfs has fs_flags FS_USERNS_MOUNT. I know nothing about namespaces,
> nothing; but from overhearings, wonder if an ordinary user in a namespace
> might be able to mount their own tmpfs with "noswap", and thereby evade
> all accounting of the locked memory.
>
> That would be an absolute no-no for this patch; but I assume that even
> if so, it can be easily remedied by inserting an appropriate (unknown
> to me!) privilege check where the "noswap" option is validated.

Oh, good catch. Thanks! So you would just need something like:

diff --git a/mm/shmem.c b/mm/shmem.c
index 787e83791eb5..21ce9b26bb4d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3571,6 +3571,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
ctx->seen |= SHMEM_SEEN_INUMS;
break;
case Opt_noswap:
+ if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN)) {
+ return invalfc(fc,
+ "Turning off swap in unprivileged tmpfs mounts unsupported");
+ }
ctx->noswap = true;
ctx->seen |= SHMEM_SEEN_NOSWAP;
break;

fc->user_ns is the userns that the tmpfs mount will be mounted in, i.e.,
fc->user_ns will become sb->s_user_ns if FS_USERNS_MOUNT is raised. So with the
check above, the tmpfs instance must ultimately belong to the initial userns
and the caller must have CAP_SYS_ADMIN in the initial userns (according to
capabilities(7), CAP_SYS_ADMIN guards swapon and swapoff).
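
To make the two conditions in that check concrete, here is a small userspace sketch. It is only a model under stated assumptions: struct user_ns_model, init_ns_model, struct fc_model and noswap_allowed() are hypothetical names, and the result of capable(CAP_SYS_ADMIN) is reduced to a plain boolean rather than a real credential lookup.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel objects; not the real definitions. */
struct user_ns_model { int level; };        /* models struct user_namespace */
static struct user_ns_model init_ns_model;  /* models init_user_ns */

struct fc_model {
	const struct user_ns_model *user_ns;  /* models fc->user_ns */
	bool caller_has_sys_admin;            /* models capable(CAP_SYS_ADMIN) */
};

/*
 * Returns true when "noswap" would be accepted under the proposed check:
 * the mount must belong to the initial userns AND the caller must hold
 * CAP_SYS_ADMIN there. Either condition failing rejects the option.
 */
static bool noswap_allowed(const struct fc_model *fc)
{
	if (fc->user_ns != &init_ns_model || !fc->caller_has_sys_admin)
		return false;  /* the kernel code returns invalfc(...) here */
	return true;
}
```

Under this model a tmpfs created for a non-initial user namespace is refused "noswap" even if the caller is otherwise privileged, which is the case Hugh was worried about.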

2023-04-18 21:25:25

by Luis Chamberlain

Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap

On Mon, Apr 17, 2023 at 10:50:59PM -0700, Hugh Dickins wrote:
> You'll be sending a manpage update to Alejandro in due course, I think.

Sure thing! Just need a git tree. I can send the updates as we reach
a consensus on where to store / share huge page shmem updates.

Luis

2023-04-18 21:46:12

by Randy Dunlap

Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap



On 4/18/23 14:22, Luis Chamberlain wrote:
> On Mon, Apr 17, 2023 at 10:50:59PM -0700, Hugh Dickins wrote:
>> You'll be sending a manpage update to Alejandro in due course, I think.
>
> Sure thing! Just need a git tree. I can send the updates as we reach
> a consensus on where to store / share huge page shmem updates.
>
> Luis

From the latest man-page announcement:


man-pages-6.04 - manual pages for GNU/Linux

The release tarball is already available at <kernel.org>.

Tarball download:
<https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/>
Git repository:
<https://git.kernel.org/cgit/docs/man-pages/man-pages.git/>

--
~Randy

2023-04-18 22:14:57

by Luis Chamberlain

Subject: Re: [PATCH v2 6/6] shmem: add support to ignore swap

On Tue, Apr 18, 2023 at 09:38:10AM +0200, Christian Brauner wrote:
> On Mon, Apr 17, 2023 at 10:50:59PM -0700, Hugh Dickins wrote:
> > On Thu, 9 Mar 2023, Luis Chamberlain wrote:
> >
> > > In doing experimentations with shmem having the option to avoid swap
> > > becomes a useful mechanism. One of the *raves* about brd over shmem is
> > > you can avoid swap, but that's not really a good reason to use brd if
> > > we can instead use shmem. Using brd has its own good reasons to exist,
> > > but just because "tmpfs" doesn't let you do that is not a great reason
> > > to avoid it if we can easily add support for it.
> > >
> > > I don't add support for reconfiguring incompatible options, but if
> > > we really wanted to we can add support for that.
> > >
> > > To avoid swap we use mapping_set_unevictable() upon inode creation,
> > > and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.
> >
> > I have one big question here, which betrays my ignorance:
> > I hope that you or Christian can reassure me on this.
> >
> > tmpfs has fs_flags FS_USERNS_MOUNT. I know nothing about namespaces,
> > nothing; but from overhearings, wonder if an ordinary user in a namespace
> > might be able to mount their own tmpfs with "noswap", and thereby evade
> > all accounting of the locked memory.
> >
> > That would be an absolute no-no for this patch; but I assume that even
> > if so, it can be easily remedied by inserting an appropriate (unknown
> > to me!) privilege check where the "noswap" option is validated.
>
> Oh, good catch. Thanks! So you would just need sm like:
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 787e83791eb5..21ce9b26bb4d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3571,6 +3571,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
> ctx->seen |= SHMEM_SEEN_INUMS;
> break;
> case Opt_noswap:
> + if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN)) {
> + return invalfc(fc,
> + "Turning off swap in unprivileged tmpfs mounts unsupported");
> + }
> ctx->noswap = true;
> ctx->seen |= SHMEM_SEEN_NOSWAP;
> break;
>
> The fc->user_ns is the userns that the tmpfs mount will be mounted in, i.e.,
> fc->user_ns will become sb->s_user_ns if FS_USERNS_MOUNT is raised. So with the
> check above we require that the tmpfs instance must ultimately belong to the
> initial userns and that the caller has CAP_SYS_ADMIN in the initial userns
> (CAP_SYS_ADMIN guards swapon and swapoff) according to capabilities(7).

Christian, mind sending this as a fix?

Luis

2023-04-20 09:09:41

by Christian Brauner

Subject: [PATCH] shmem: restrict noswap option to initial user namespace

Prevent tmpfs instances mounted in an unprivileged namespace from
evading accounting of locked memory by using the "noswap" mount option.

Cc: Luis Chamberlain <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
---
mm/shmem.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 787e83791eb5..21ce9b26bb4d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3571,6 +3571,10 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
ctx->seen |= SHMEM_SEEN_INUMS;
break;
case Opt_noswap:
+ if ((fc->user_ns != &init_user_ns) || !capable(CAP_SYS_ADMIN)) {
+ return invalfc(fc,
+ "Turning off swap in unprivileged tmpfs mounts unsupported");
+ }
ctx->noswap = true;
ctx->seen |= SHMEM_SEEN_NOSWAP;
break;
--
2.34.1

2023-04-20 19:26:02

by Luis Chamberlain

Subject: Re: [PATCH] shmem: restrict noswap option to initial user namespace

On Thu, Apr 20, 2023 at 10:57:43AM +0200, Christian Brauner wrote:
> Prevent tmpfs instances mounted in an unprivileged namespaces from
> evading accounting of locked memory by using the "noswap" mount option.
>
> Cc: Luis Chamberlain <[email protected]>
> Reported-by: Hugh Dickins <[email protected]>
> Link: https://lore.kernel.org/lkml/[email protected]
> Signed-off-by: Christian Brauner <[email protected]>

Reviewed-by: Luis Chamberlain <[email protected]>

Luis