Currently it's not possible to enable hibernation without also enabling
generic swap for a given swap area. These two use cases are not the
same. For example there may be users who want to enable hibernation,
but whose drives don't have the write endurance for generic swap
activities. Swap and hibernate also have different security/integrity
requirements, prompting folks to possibly set up something like block-level
integrity for swap and image-level integrity for hibernate. Keeping swap
and hibernate separate in these cases becomes not just a matter of
preference, but correctness.
Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
generic swapping to it. This region can still be wired up for use in
suspend-to-disk activities, but will never have regular pages swapped to
it. This flag will be passed in by utilities like swapon(8), usage would
probably look something like: swapon -o noswap /dev/sda2.
Swap regions with SWAP_FLAG_NOSWAP set will not appear in /proc/meminfo
under SwapTotal and SwapFree, since they are not usable as general swap.
Signed-off-by: Evan Green <[email protected]>
---
Changes in v3:
- Updated commit message with additional explanation [Andrew]
Changes in v2:
- NOSWAP regions should not contribute to Swap stats in /proc/meminfo.
[David]
- Adjusted comment of SWAP_FLAG_NOSWAP [Pavel]
- Note: Opted not to take Pavel's tag since enough has changed in this
revision to warrant another look.
- Call swap_entry_free() in swap_free to avoid NOSWAP leaks back into
the general pool via swap_slots_cache [me].
include/linux/swap.h | 4 +++-
mm/swapfile.c | 52 +++++++++++++++++++++++++++++++-------------
2 files changed, 40 insertions(+), 16 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index cdf0957a88a49a..5e1d80be84bb02 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -28,10 +28,11 @@ struct pagevec;
#define SWAP_FLAG_DISCARD 0x10000 /* enable discard for swap */
#define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */
#define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */
+#define SWAP_FLAG_NOSWAP 0x80000 /* use only for hibernate, not swap */
#define SWAP_FLAGS_VALID (SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \
SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \
- SWAP_FLAG_DISCARD_PAGES)
+ SWAP_FLAG_DISCARD_PAGES | SWAP_FLAG_NOSWAP)
#define SWAP_BATCH 64
static inline int current_is_kswapd(void)
@@ -182,6 +183,7 @@ enum {
SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
+ SWP_NOSWAP = (1 << 13), /* use only for suspend, not swap */
/* add others here before... */
SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */
};
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e3dcaeecc50f54..ca0e5e5d1f3074 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -697,7 +697,8 @@ static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
if (si->inuse_pages == si->pages) {
si->lowest_bit = si->max;
si->highest_bit = 0;
- del_from_avail_list(si);
+ if (!(si->flags & SWP_NOSWAP))
+ del_from_avail_list(si);
}
}
@@ -726,10 +727,12 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
bool was_full = !si->highest_bit;
WRITE_ONCE(si->highest_bit, end);
- if (was_full && (si->flags & SWP_WRITEOK))
+ if (was_full &&
+ ((si->flags & (SWP_WRITEOK | SWP_NOSWAP)) == SWP_WRITEOK))
add_to_avail_list(si);
}
- atomic_long_add(nr_entries, &nr_swap_pages);
+ if (!(si->flags & SWP_NOSWAP))
+ atomic_long_add(nr_entries, &nr_swap_pages);
si->inuse_pages -= nr_entries;
if (si->flags & SWP_BLKDEV)
swap_slot_free_notify =
@@ -1078,6 +1081,9 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
WARN(!(si->flags & SWP_WRITEOK),
"swap_info %d in list but !SWP_WRITEOK\n",
si->type);
+ WARN((si->flags & SWP_NOSWAP),
+ "swap_info %d in list but SWP_NOSWAP\n",
+ si->type);
__del_from_avail_list(si);
spin_unlock(&si->lock);
goto nextsi;
@@ -1338,8 +1344,12 @@ void swap_free(swp_entry_t entry)
struct swap_info_struct *p;
p = _swap_info_get(entry);
- if (p)
- __swap_entry_free(p, entry);
+ if (p) {
+ if (p->flags & SWP_NOSWAP)
+ swap_entry_free(p, entry);
+ else
+ __swap_entry_free(p, entry);
+ }
}
/*
@@ -1783,8 +1793,10 @@ swp_entry_t get_swap_page_of_type(int type)
/* This is called for allocating swap entry, not cache */
spin_lock(&si->lock);
- if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry))
- atomic_long_dec(&nr_swap_pages);
+ if ((si->flags & SWP_WRITEOK) && scan_swap_map_slots(si, 1, 1, &entry)) {
+ if (!(si->flags & SWP_NOSWAP))
+ atomic_long_dec(&nr_swap_pages);
+ }
spin_unlock(&si->lock);
fail:
return entry;
@@ -2454,8 +2466,6 @@ static void setup_swap_info(struct swap_info_struct *p, int prio,
static void _enable_swap_info(struct swap_info_struct *p)
{
p->flags |= SWP_WRITEOK;
- atomic_long_add(p->pages, &nr_swap_pages);
- total_swap_pages += p->pages;
assert_spin_locked(&swap_lock);
/*
@@ -2469,7 +2479,11 @@ static void _enable_swap_info(struct swap_info_struct *p)
* swap_info_struct.
*/
plist_add(&p->list, &swap_active_head);
- add_to_avail_list(p);
+ if (!(p->flags & SWP_NOSWAP)) {
+ atomic_long_add(p->pages, &nr_swap_pages);
+ total_swap_pages += p->pages;
+ add_to_avail_list(p);
+ }
}
static void enable_swap_info(struct swap_info_struct *p, int prio,
@@ -2564,7 +2578,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
spin_unlock(&swap_lock);
goto out_dput;
}
- del_from_avail_list(p);
+ if (!(p->flags & SWP_NOSWAP))
+ del_from_avail_list(p);
+
spin_lock(&p->lock);
if (p->prio < 0) {
struct swap_info_struct *si = p;
@@ -2581,8 +2597,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
least_priority++;
}
plist_del(&p->list, &swap_active_head);
- atomic_long_sub(p->pages, &nr_swap_pages);
- total_swap_pages -= p->pages;
+ if (!(p->flags & SWP_NOSWAP)) {
+ atomic_long_sub(p->pages, &nr_swap_pages);
+ total_swap_pages -= p->pages;
+ }
p->flags &= ~SWP_WRITEOK;
spin_unlock(&p->lock);
spin_unlock(&swap_lock);
@@ -3335,16 +3353,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (swap_flags & SWAP_FLAG_PREFER)
prio =
(swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
+
+ if (swap_flags & SWAP_FLAG_NOSWAP)
+ p->flags |= SWP_NOSWAP;
enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
- pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n",
+ pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n",
p->pages<<(PAGE_SHIFT-10), name->name, p->prio,
nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
(p->flags & SWP_SOLIDSTATE) ? "SS" : "",
(p->flags & SWP_DISCARDABLE) ? "D" : "",
(p->flags & SWP_AREA_DISCARD) ? "s" : "",
(p->flags & SWP_PAGE_DISCARD) ? "c" : "",
- (frontswap_map) ? "FS" : "");
+ (frontswap_map) ? "FS" : "",
+ (p->flags & SWP_NOSWAP) ? "N" : "");
mutex_unlock(&swapon_mutex);
atomic_inc(&proc_poll_event);
--
2.31.0
On Wed, 21 Jul 2021 14:40:28 -0700 Evan Green <[email protected]> wrote:
> Currently it's not possible to enable hibernation without also enabling
> generic swap for a given swap area. These two use cases are not the
> same. For example there may be users who want to enable hibernation,
> but whose drives don't have the write endurance for generic swap
> activities. Swap and hibernate also have different security/integrity
> requirements, prompting folks to possibly set up something like block-level
> integrity for swap and image-level integrity for hibernate. Keeping swap
> and hibernate separate in these cases becomes not just a matter of
> preference, but correctness.
>
> Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> generic swapping to it. This region can still be wired up for use in
> suspend-to-disk activities, but will never have regular pages swapped to
> it. This flag will be passed in by utilities like swapon(8), usage would
> probably look something like: swapon -o noswap /dev/sda2.
Will patches to swapon and its manpage be prepared?
> Swap regions with SWAP_FLAG_NOSWAP set will not appear in /proc/meminfo
> under SwapTotal and SwapFree, since they are not usable as general swap.
>
On 21.07.21 23:40, Evan Green wrote:
> Currently it's not possible to enable hibernation without also enabling
> generic swap for a given swap area. These two use cases are not the
> same. For example there may be users who want to enable hibernation,
> but whose drives don't have the write endurance for generic swap
> activities. Swap and hibernate also have different security/integrity
> requirements, prompting folks to possibly set up something like block-level
> integrity for swap and image-level integrity for hibernate. Keeping swap
> and hibernate separate in these cases becomes not just a matter of
> preference, but correctness.
>
> Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> generic swapping to it. This region can still be wired up for use in
> suspend-to-disk activities, but will never have regular pages swapped to
> it. This flag will be passed in by utilities like swapon(8), usage would
> probably look something like: swapon -o noswap /dev/sda2.
Just a minor comment, I'd call it rather SWAP_FLAG_HIBERNATE_ONLY and
SWAP_FLAG_HIBERNATE_ONLY -- that calls the child by its name.
I think some other flags might not apply with that new flag set, right?
For example, does SWAP_FLAG_DISCARD_ONCE or SWP_AREA_DISCARD still have
any meaning with the new flag being set?
We should most probably disallow enabling any flag that doesn't make any
sense in combination.
Apart from that, I'd love to see a comment in here why the workaround
suggested by Michal isn't feasible -- essentially a summary of what we
discussed.
I had a quick glimpse and nothing jumed at me, no mm/swapfile.c expert,
though :)
--
Thanks,
David / dhildenb
On Wed, Jul 21, 2021 at 3:09 PM Andrew Morton <[email protected]> wrote:
>
> On Wed, 21 Jul 2021 14:40:28 -0700 Evan Green <[email protected]> wrote:
>
> > Currently it's not possible to enable hibernation without also enabling
> > generic swap for a given swap area. These two use cases are not the
> > same. For example there may be users who want to enable hibernation,
> > but whose drives don't have the write endurance for generic swap
> > activities. Swap and hibernate also have different security/integrity
> > requirements, prompting folks to possibly set up something like block-level
> > integrity for swap and image-level integrity for hibernate. Keeping swap
> > and hibernate separate in these cases becomes not just a matter of
> > preference, but correctness.
> >
> > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> > generic swapping to it. This region can still be wired up for use in
> > suspend-to-disk activities, but will never have regular pages swapped to
> > it. This flag will be passed in by utilities like swapon(8), usage would
> > probably look something like: swapon -o noswap /dev/sda2.
>
> Will patches to swapon and its manpage be prepared?
Yes, I'll work on that today. Thanks Andrew!
-Evan
On Thu, Jul 22, 2021 at 12:12 AM David Hildenbrand <[email protected]> wrote:
>
> On 21.07.21 23:40, Evan Green wrote:
> > Currently it's not possible to enable hibernation without also enabling
> > generic swap for a given swap area. These two use cases are not the
> > same. For example there may be users who want to enable hibernation,
> > but whose drives don't have the write endurance for generic swap
> > activities. Swap and hibernate also have different security/integrity
> > requirements, prompting folks to possibly set up something like block-level
> > integrity for swap and image-level integrity for hibernate. Keeping swap
> > and hibernate separate in these cases becomes not just a matter of
> > preference, but correctness.
> >
> > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> > generic swapping to it. This region can still be wired up for use in
> > suspend-to-disk activities, but will never have regular pages swapped to
> > it. This flag will be passed in by utilities like swapon(8), usage would
> > probably look something like: swapon -o noswap /dev/sda2.
>
> Just a minor comment, I'd call it rather SWAP_FLAG_HIBERNATE_ONLY and
> SWAP_FLAG_HIBERNATE_ONLY -- that calls the child by its name.
I went back and forth on this too. It seemed pretty close to toss-up
to me. I went with NOSWAP ultimately because it seemed more closely
tied to what the flag was actually doing, rather than building in my
one expected use case into the name. In some world years from now
where either hibernate has diverged, been deleted, or maybe some new
usage has been invented for swap space, the NOSWAP name felt like it
had a better chance of holding up. The argument is weak though, as
these features are pretty well cast in stone, and the likelihood of
any of those outcomes seems low. I can change it if you feel strongly,
but would probably keep it as-is otherwise.
>
> I think some other flags might not apply with that new flag set, right?
> For example, does SWAP_FLAG_DISCARD_ONCE or SWP_AREA_DISCARD still have
> any meaning with the new flag being set?
>
> We should most probably disallow enabling any flag that doesn't make any
> sense in combination.
Good point, I can send a followup patch for that. From my reading
SWAP_FLAG_DISCARD and SWAP_FLAG_DISCARD_ONCE are still valid, since
the discard can be run at swapon() time. SWAP_FLAG_PREFER (specifying
the priority) doesn't make sense, and SWAP_FLAG_DISCARD_PAGES never
kicks in because it's called at the cluster level. Hm, that sort of
seems like a bug that freed hibernate swap doesn't get discarded. I
can disallow it now as unsupported, but might send a patch to fix it
later.
>
> Apart from that, I'd love to see a comment in here why the workaround
> suggested by Michal isn't feasible -- essentially a summary of what we
> discussed.
Ah sorry, I had tried to clarify that in the commit text, but didn't
explicitly address the workaround. To summarize, the workaround keeps
generic swap out of your hibernate region... until hibernate time. But
once hibernate starts, a lot of swapping tends to happen when the
hiber-image is allocated. At this point the hibernate region is
eligible for general swap even with the workaround. The reasons I gave
for wanting to exclusively steer swap and hibernate are SSD write
wearing, different integrity solutions for swap vs hibernate, and our
own security changes that no-op out the swapon/swapoff syscalls after
init.
>
> I had a quick glimpse and nothing jumed at me, no mm/swapfile.c expert,
> though :)
Thanks David!
-Evan
>
>
>
> --
> Thanks,
>
> David / dhildenb
>
On 22.07.21 20:00, Evan Green wrote:
> On Thu, Jul 22, 2021 at 12:12 AM David Hildenbrand <[email protected]> wrote:
>>
>> On 21.07.21 23:40, Evan Green wrote:
>>> Currently it's not possible to enable hibernation without also enabling
>>> generic swap for a given swap area. These two use cases are not the
>>> same. For example there may be users who want to enable hibernation,
>>> but whose drives don't have the write endurance for generic swap
>>> activities. Swap and hibernate also have different security/integrity
>>> requirements, prompting folks to possibly set up something like block-level
>>> integrity for swap and image-level integrity for hibernate. Keeping swap
>>> and hibernate separate in these cases becomes not just a matter of
>>> preference, but correctness.
>>>
>>> Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
>>> generic swapping to it. This region can still be wired up for use in
>>> suspend-to-disk activities, but will never have regular pages swapped to
>>> it. This flag will be passed in by utilities like swapon(8), usage would
>>> probably look something like: swapon -o noswap /dev/sda2.
>>
>> Just a minor comment, I'd call it rather SWAP_FLAG_HIBERNATE_ONLY and
>> SWAP_FLAG_HIBERNATE_ONLY -- that calls the child by its name.
>
> I went back and forth on this too. It seemed pretty close to toss-up
> to me. I went with NOSWAP ultimately because it seemed more closely
> tied to what the flag was actually doing, rather than building in my
> one expected use case into the name. In some world years from now
> where either hibernate has diverged, been deleted, or maybe some new
> usage has been invented for swap space, the NOSWAP name felt like it
> had a better chance of holding up. The argument is weak though, as
> these features are pretty well cast in stone, and the likelihood of
> any of those outcomes seems low. I can change it if you feel strongly,
> but would probably keep it as-is otherwise.
Just imagine technology Z popping up and using also the swap
infrastructure. What would be the semantics of NOSWAP? With
HIBERNATE_ONLY it's clear -- enable that device only for hibernation,
nothing else.
But you raise a good point: if hibernation isn't even possible in a
configuration (e.g., not configured into the kernel), we should simply
reject that flag. So if hibernation would vanish at some point
completely from the system, it would all be handled accordingly.
That would result in quite a consistent definition of
SWAP_FLAG_HIBERNATE_ONLY IMHO.
Makes sense?
>
>>
>> I think some other flags might not apply with that new flag set, right?
>> For example, does SWAP_FLAG_DISCARD_ONCE or SWP_AREA_DISCARD still have
>> any meaning with the new flag being set?
>>
>> We should most probably disallow enabling any flag that doesn't make any
>> sense in combination.
>
> Good point, I can send a followup patch for that. From my reading
I'd actually enjoy if we'd have that logic in the introducing patch.
> SWAP_FLAG_DISCARD and SWAP_FLAG_DISCARD_ONCE are still valid, since
> the discard can be run at swapon() time. SWAP_FLAG_PREFER (specifying
> the priority) doesn't make sense, and SWAP_FLAG_DISCARD_PAGES never
> kicks in because it's called at the cluster level. Hm, that sort of
> seems like a bug that freed hibernate swap doesn't get discarded. I
> can disallow it now as unsupported, but might send a patch to fix it
> later.
Might be worth fixing, indeed.
>
>>
>> Apart from that, I'd love to see a comment in here why the workaround
>> suggested by Michal isn't feasible -- essentially a summary of what we
>> discussed.
>
> Ah sorry, I had tried to clarify that in the commit text, but didn't
> explicitly address the workaround. To summarize, the workaround keeps
> generic swap out of your hibernate region... until hibernate time. But
> once hibernate starts, a lot of swapping tends to happen when the
> hiber-image is allocated. At this point the hibernate region is
> eligible for general swap even with the workaround. The reasons I gave
> for wanting to exclusively steer swap and hibernate are SSD write
> wearing, different integrity solutions for swap vs hibernate, and our
> own security changes that no-op out the swapon/swapoff syscalls after
> init.
>
That would be nice to have in the patch description :)
--
Thanks,
David / dhildenb
On Thu, Jul 22, 2021 at 11:58 PM David Hildenbrand <[email protected]> wrote:
>
> On 22.07.21 20:00, Evan Green wrote:
> > On Thu, Jul 22, 2021 at 12:12 AM David Hildenbrand <[email protected]> wrote:
> >>
> >> On 21.07.21 23:40, Evan Green wrote:
> >>> Currently it's not possible to enable hibernation without also enabling
> >>> generic swap for a given swap area. These two use cases are not the
> >>> same. For example there may be users who want to enable hibernation,
> >>> but whose drives don't have the write endurance for generic swap
> >>> activities. Swap and hibernate also have different security/integrity
> >>> requirements, prompting folks to possibly set up something like block-level
> >>> integrity for swap and image-level integrity for hibernate. Keeping swap
> >>> and hibernate separate in these cases becomes not just a matter of
> >>> preference, but correctness.
> >>>
> >>> Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> >>> generic swapping to it. This region can still be wired up for use in
> >>> suspend-to-disk activities, but will never have regular pages swapped to
> >>> it. This flag will be passed in by utilities like swapon(8), usage would
> >>> probably look something like: swapon -o noswap /dev/sda2.
> >>
> >> Just a minor comment, I'd call it rather SWAP_FLAG_HIBERNATE_ONLY and
> >> SWAP_FLAG_HIBERNATE_ONLY -- that calls the child by its name.
> >
> > I went back and forth on this too. It seemed pretty close to toss-up
> > to me. I went with NOSWAP ultimately because it seemed more closely
> > tied to what the flag was actually doing, rather than building in my
> > one expected use case into the name. In some world years from now
> > where either hibernate has diverged, been deleted, or maybe some new
> > usage has been invented for swap space, the NOSWAP name felt like it
> > had a better chance of holding up. The argument is weak though, as
> > these features are pretty well cast in stone, and the likelihood of
> > any of those outcomes seems low. I can change it if you feel strongly,
> > but would probably keep it as-is otherwise.
>
> Just imagine technology Z popping up and using also the swap
> infrastructure. What would be the semantics of NOSWAP? With
> HIBERNATE_ONLY it's clear -- enable that device only for hibernation,
> nothing else.
>
> But you raise a good point: if hibernation isn't even possible in a
> configuration (e.g., not configured into the kernel), we should simply
> reject that flag. So if hibernation would vanish at some point
> completely from the system, it would all be handled accordingly.
>
> That would result in quite a consistent definition of
> SWAP_FLAG_HIBERNATE_ONLY IMHO.
>
> Makes sense?
Ok, I'll change the name, and reject the flag if hibernation is not enabled.
>
> >
> >>
> >> I think some other flags might not apply with that new flag set, right?
> >> For example, does SWAP_FLAG_DISCARD_ONCE or SWP_AREA_DISCARD still have
> >> any meaning with the new flag being set?
> >>
> >> We should most probably disallow enabling any flag that doesn't make any
> >> sense in combination.
> >
> > Good point, I can send a followup patch for that. From my reading
>
> I'd actually enjoy if we'd have that logic in the introducing patch.
Ok.
>
> > SWAP_FLAG_DISCARD and SWAP_FLAG_DISCARD_ONCE are still valid, since
> > the discard can be run at swapon() time. SWAP_FLAG_PREFER (specifying
> > the priority) doesn't make sense, and SWAP_FLAG_DISCARD_PAGES never
> > kicks in because it's called at the cluster level. Hm, that sort of
> > seems like a bug that freed hibernate swap doesn't get discarded. I
> > can disallow it now as unsupported, but might send a patch to fix it
> > later.
>
> Might be worth fixing, indeed.
>
> >
> >>
> >> Apart from that, I'd love to see a comment in here why the workaround
> >> suggested by Michal isn't feasible -- essentially a summary of what we
> >> discussed.
> >
> > Ah sorry, I had tried to clarify that in the commit text, but didn't
> > explicitly address the workaround. To summarize, the workaround keeps
> > generic swap out of your hibernate region... until hibernate time. But
> > once hibernate starts, a lot of swapping tends to happen when the
> > hiber-image is allocated. At this point the hibernate region is
> > eligible for general swap even with the workaround. The reasons I gave
> > for wanting to exclusively steer swap and hibernate are SSD write
> > wearing, different integrity solutions for swap vs hibernate, and our
> > own security changes that no-op out the swapon/swapoff syscalls after
> > init.
> >
>
> That would be nice to have in the patch description :)
Sure, I'll add that as well and send out a v4 shortly.
-Evan