Currently it's not possible to enable hibernation without also enabling
generic swap for a given swap area. These two use cases are not the
same. For example there may be users who want to enable hibernation,
but whose drives don't have the write endurance for generic swap
activities.
Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
generic swapping to it. This region can still be wired up for use in
suspend-to-disk activities, but will never have regular pages swapped to
it.
Signed-off-by: Evan Green <[email protected]>
---
include/linux/swap.h | 4 +++-
mm/swapfile.c | 24 ++++++++++++++++++------
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6f5a43251593c8..a9fc37e29c17d6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -28,10 +28,11 @@ struct pagevec;
#define SWAP_FLAG_DISCARD 0x10000 /* enable discard for swap */
#define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */
#define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */
+#define SWAP_FLAG_NOSWAP 0x80000 /* use only for suspend, not swap */
#define SWAP_FLAGS_VALID (SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \
SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \
- SWAP_FLAG_DISCARD_PAGES)
+ SWAP_FLAG_DISCARD_PAGES | SWAP_FLAG_NOSWAP)
#define SWAP_BATCH 64
static inline int current_is_kswapd(void)
@@ -182,6 +183,7 @@ enum {
SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */
SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */
SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */
+ SWP_NOSWAP = (1 << 13), /* use only for suspend, not swap */
/* add others here before... */
SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */
};
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1e07d1c776f2ae..164937f958c319 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -697,7 +697,8 @@ static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset,
if (si->inuse_pages == si->pages) {
si->lowest_bit = si->max;
si->highest_bit = 0;
- del_from_avail_list(si);
+ if (!(si->flags & SWP_NOSWAP))
+ del_from_avail_list(si);
}
}
@@ -726,7 +727,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
bool was_full = !si->highest_bit;
WRITE_ONCE(si->highest_bit, end);
- if (was_full && (si->flags & SWP_WRITEOK))
+ if (was_full &&
+ ((si->flags & (SWP_WRITEOK | SWP_NOSWAP)) == SWP_WRITEOK))
add_to_avail_list(si);
}
atomic_long_add(nr_entries, &nr_swap_pages);
@@ -1078,6 +1080,9 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
WARN(!(si->flags & SWP_WRITEOK),
"swap_info %d in list but !SWP_WRITEOK\n",
si->type);
+ WARN((si->flags & SWP_NOSWAP),
+ "swap_info %d in list but SWP_NOSWAP\n",
+ si->type);
__del_from_avail_list(si);
spin_unlock(&si->lock);
goto nextsi;
@@ -2469,7 +2474,8 @@ static void _enable_swap_info(struct swap_info_struct *p)
* swap_info_struct.
*/
plist_add(&p->list, &swap_active_head);
- add_to_avail_list(p);
+ if (!(p->flags & SWP_NOSWAP))
+ add_to_avail_list(p);
}
static void enable_swap_info(struct swap_info_struct *p, int prio,
@@ -2564,7 +2570,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
spin_unlock(&swap_lock);
goto out_dput;
}
- del_from_avail_list(p);
+ if (!(p->flags & SWP_NOSWAP))
+ del_from_avail_list(p);
+
spin_lock(&p->lock);
if (p->prio < 0) {
struct swap_info_struct *si = p;
@@ -3329,16 +3337,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (swap_flags & SWAP_FLAG_PREFER)
prio =
(swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
+
+ if (swap_flags & SWAP_FLAG_NOSWAP)
+ p->flags |= SWP_NOSWAP;
enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
- pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n",
+ pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n",
p->pages<<(PAGE_SHIFT-10), name->name, p->prio,
nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
(p->flags & SWP_SOLIDSTATE) ? "SS" : "",
(p->flags & SWP_DISCARDABLE) ? "D" : "",
(p->flags & SWP_AREA_DISCARD) ? "s" : "",
(p->flags & SWP_PAGE_DISCARD) ? "c" : "",
- (frontswap_map) ? "FS" : "");
+ (frontswap_map) ? "FS" : "",
+ (p->flags & SWP_NOSWAP) ? "N" : "");
mutex_unlock(&swapon_mutex);
atomic_inc(&proc_poll_event);
--
2.31.0
Hi!
> Currently it's not possible to enable hibernation without also enabling
> generic swap for a given swap area. These two use cases are not the
> same. For example there may be users who want to enable hibernation,
> but whose drives don't have the write endurance for generic swap
> activities.
>
> Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> generic swapping to it. This region can still be wired up for use in
> suspend-to-disk activities, but will never have regular pages swapped to
> it.
>
> Signed-off-by: Evan Green <[email protected]>
Makes sense to me.
Reviewed-by: Pavel Machek <[email protected]>
> #define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */
> #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */
> +#define SWAP_FLAG_NOSWAP 0x80000 /* use only for suspend, not swap */
I'd say "only for hibernation". And actually maybe code would be more clear if logic was reverted.
Aha, and you may want to check... does the hibernation still work for you without the swap?
Because we need half memory free to create swap image and swap is really quite useful for that.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Thu, Jul 1, 2021 at 1:02 PM Pavel Machek <[email protected]> wrote:
>
> Hi!
>
> > Currently it's not possible to enable hibernation without also enabling
> > generic swap for a given swap area. These two use cases are not the
> > same. For example there may be users who want to enable hibernation,
> > but whose drives don't have the write endurance for generic swap
> > activities.
> >
> > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> > generic swapping to it. This region can still be wired up for use in
> > suspend-to-disk activities, but will never have regular pages swapped to
> > it.
> >
> > Signed-off-by: Evan Green <[email protected]>
>
> Makes sense to me.
>
> Reviewed-by: Pavel Machek <[email protected]>
Thanks!
>
> > #define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */
> > #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */
> > +#define SWAP_FLAG_NOSWAP 0x80000 /* use only for suspend, not swap */
>
> I'd say "only for hibernation". And actually maybe code would be more clear if logic was reverted.
Sure about the rewording. Yes, I also thought about flipping the
polarity. This made more sense to me as an outlier condition, despite
the slight awkwardness of a negative flag. And the usermode flag has
to be written this way, so I might as well carry it through. I think
I'll keep it unless anyone feels strongly.
>
> Aha, and you may want to check... does the hibernation still work for you without the swap?
>
> Because we need half memory free to create swap image and swap is really quite useful for that.
Yes, hibernation still works. You're right that without another swap
space set up, it starts to fail when half of memory is used up. This
flag gives me the control to exclusively steer swap towards one device
and hibernate towards another.
-Evan
On 30.06.21 19:07, Evan Green wrote:
> Currently it's not possible to enable hibernation without also enabling
> generic swap for a given swap area. These two use cases are not the
> same. For example there may be users who want to enable hibernation,
> but whose drives don't have the write endurance for generic swap
> activities.
>
> Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> generic swapping to it. This region can still be wired up for use in
> suspend-to-disk activities, but will never have regular pages swapped to
> it.
Just to confirm: things like /proc/meminfo won't show this "swap that's
not actually swap" as free/total swap, correct? Maybe it's worth
spelling the expected system behavior out here.
--
Thanks,
David / dhildenb
On Mon, Jul 5, 2021 at 12:44 AM David Hildenbrand <[email protected]> wrote:
>
> On 30.06.21 19:07, Evan Green wrote:
> > Currently it's not possible to enable hibernation without also enabling
> > generic swap for a given swap area. These two use cases are not the
> > same. For example there may be users who want to enable hibernation,
> > but whose drives don't have the write endurance for generic swap
> > activities.
> >
> > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> > generic swapping to it. This region can still be wired up for use in
> > suspend-to-disk activities, but will never have regular pages swapped to
> > it.
>
> Just to confirm: things like /proc/meminfo won't show this "swap that's
> not actually swap" as free/total swap, correct? Maybe it's worth
> spelling the expected system behavior out here.
Currently these noswap regions do still count in /proc/meminfo. I
suppose as you say it makes more sense for it not to count. I should
be able to carefully put some conditionals around the nr_swap_pages
and total_swap_pages accounting to fix that. I'll also document this
in the commit text as suggested.
When looking at that, I realized something else. Hibernate uses
swap_free() to release its image pages, which calls free_swap_slot().
That may land the page in a swap_slots_cache, causing it to possibly
leak back into general usage. I'm thinking I should just call
swap_entry_free() directly if NOSWAP is set. I gave that a quick test
and so far it looks good.
Other random musings I had while staring that this code:
It surprised me that there's swap_entry_free() and
__swap_entry_free(), but the one without underscores is the lower
level one (ie __swap_entry_free() winds through the cache,
swap_entry_free() just does it). I'm not really sure if renaming those
is worth the churn or not: leaning towards no.
It's also interesting that scan_swap_map_slots() chooses whether or
not to attempt reclaim based on vm_swap_full(). vm_swap_full() returns
true if swap globally is 50% full or not. But hibernate is restricted
to a single swap device. So you could find yourself in a situation
where the hibernate device was full-but-reclaimable, and other areas
aren't very full. This might cause hibernations to fail because we
never attempted to reclaim swap. Maybe this never comes up in practice
because people don't use multiple swap devices. Or maybe we naturally
tend to spread the swap load evenly such that looking at the global
counts is roughly equivalent to looking at a single one. Shower
thoughts. I'll keep this in mind when I'm doing my own testing to see
if it ever comes up.
Thanks for the review, I'll plan to post a v2 in the next couple days.
-Evan