2012-10-27 21:21:03

by Cesar Eduardo Barros

[permalink] [raw]
Subject: [PATCH 0/2] mm: do not call frontswap_init() during swapoff

The call to frontswap_init() was added in a place where it is called not
only from sys_swapon, but also from sys_swapoff. This pair of patches
fixes that.

The first patch moves the acquisition of swap_lock from enable_swap_info
to two separate helpers, one for sys_swapon and one for sys_swapoff. As
a bonus, it also makes the code for sys_swapoff less subtle.

The second patch moves the call to frontswap_init() from the common code
to the helper used only by sys_swapon.

Compile-tested only, but should be safe.

Cesar Eduardo Barros (2):
mm: refactor reinsert of swap_info in sys_swapoff
mm: do not call frontswap_init() during swapoff

mm/swapfile.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)

--
1.7.11.7


2012-10-27 21:21:04

by Cesar Eduardo Barros

[permalink] [raw]
Subject: [PATCH 2/2] mm: do not call frontswap_init() during swapoff

The call to frontswap_init() was added within enable_swap_info(), which
was called not only during sys_swapon, but also to reinsert the
swap_info into the swap_list in case of failure of try_to_unuse() within
sys_swapoff. This means that frontswap_init() might be called more than
once for the same swap area.

While as far as I could see no frontswap implementation has any problem
with it (and in fact, all the ones I found ignore the parameter passed
to frontswap_init), this could change in the future.

To prevent future problems, move the call to frontswap_init() to outside
the code shared between sys_swapon and sys_swapoff.

Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Dan Magenheimer <[email protected]>
Signed-off-by: Cesar Eduardo Barros <[email protected]>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 886db96..088daf4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1471,7 +1471,6 @@ static void _enable_swap_info(struct swap_info_struct *p, int prio,
swap_list.head = swap_list.next = p->type;
else
swap_info[prev]->next = p->type;
- frontswap_init(p->type);
}

static void enable_swap_info(struct swap_info_struct *p, int prio,
@@ -1480,6 +1479,7 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
{
spin_lock(&swap_lock);
_enable_swap_info(p, prio, swap_map, frontswap_map);
+ frontswap_init(p->type);
spin_unlock(&swap_lock);
}

--
1.7.11.7

2012-10-27 21:21:06

by Cesar Eduardo Barros

[permalink] [raw]
Subject: [PATCH 1/2] mm: refactor reinsert of swap_info in sys_swapoff

The block within sys_swapoff which re-inserts the swap_info into the
swap_list in case of failure of try_to_unuse() reads a few values outside
the swap_lock. While this is safe at that point, it is subtle code.

Simplify the code by moving the reading of these values to a separate
function, refactoring it a bit so they are read from within the
swap_lock. This is easier to understand, and matches better the way it
worked before I unified the insertion of the swap_info from both
sys_swapon and sys_swapoff.

This change should make no functional difference. The only real change
is moving the read of two or three structure fields to within the lock
(frontswap_map_get() is nothing more than a read of p->frontswap_map).

Signed-off-by: Cesar Eduardo Barros <[email protected]>
---
mm/swapfile.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 71cd288..886db96 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1443,13 +1443,12 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
return generic_swapfile_activate(sis, swap_file, span);
}

-static void enable_swap_info(struct swap_info_struct *p, int prio,
+static void _enable_swap_info(struct swap_info_struct *p, int prio,
unsigned char *swap_map,
unsigned long *frontswap_map)
{
int i, prev;

- spin_lock(&swap_lock);
if (prio >= 0)
p->prio = prio;
else
@@ -1473,6 +1472,21 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
else
swap_info[prev]->next = p->type;
frontswap_init(p->type);
+}
+
+static void enable_swap_info(struct swap_info_struct *p, int prio,
+ unsigned char *swap_map,
+ unsigned long *frontswap_map)
+{
+ spin_lock(&swap_lock);
+ _enable_swap_info(p, prio, swap_map, frontswap_map);
+ spin_unlock(&swap_lock);
+}
+
+static void reinsert_swap_info(struct swap_info_struct *p)
+{
+ spin_lock(&swap_lock);
+ _enable_swap_info(p, p->prio, p->swap_map, frontswap_map_get(p));
spin_unlock(&swap_lock);
}

@@ -1549,14 +1563,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj);

if (err) {
- /*
- * reading p->prio and p->swap_map outside the lock is
- * safe here because only sys_swapon and sys_swapoff
- * change them, and there can be no other sys_swapon or
- * sys_swapoff for this swap_info_struct at this point.
- */
/* re-insert swap space back into swap_list */
- enable_swap_info(p, p->prio, p->swap_map, frontswap_map_get(p));
+ reinsert_swap_info(p);
goto out_dput;
}

--
1.7.11.7

2012-10-30 21:04:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: refactor reinsert of swap_info in sys_swapoff

On Sat, 27 Oct 2012 19:20:46 -0200
Cesar Eduardo Barros <[email protected]> wrote:

> The block within sys_swapoff which re-inserts the swap_info into the
> swap_list in case of failure of try_to_unuse() reads a few values outside
> the swap_lock. While this is safe at that point, it is subtle code.
>
> Simplify the code by moving the reading of these values to a separate
> function, refactoring it a bit so they are read from within the
> swap_lock. This is easier to understand, and matches better the way it
> worked before I unified the insertion of the swap_info from both
> sys_swapon and sys_swapoff.
>
> This change should make no functional difference. The only real change
> is moving the read of two or three structure fields to within the lock
> (frontswap_map_get() is nothing more than a read of p->frontswap_map).

Your patch doesn't change this, but... it is very unusual for any
subsystem's ->init method to be called under a spinlock. Because it is
highly likely that such a method will wish to do things such as memory
allocation.

It is rare and unlikely for an ->init() method to *need* such external
locking, because all the objects it is dealing with cannot be looked up
by other threads because nothing has been registered anywhere yet.

So either frontswap is doing something wrong here or there's some
subtlety which escapes me. If the former then we should try to get
that ->init call to happen outside swap_lock.

And if we can do that, perhaps we can fix the regrettable GFP_ATOMIC
in zcache_new_pool().

2012-10-30 22:48:53

by Cesar Eduardo Barros

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: refactor reinsert of swap_info in sys_swapoff

Em 30-10-2012 19:04, Andrew Morton escreveu:
> Your patch doesn't change this, but... it is very unusual for any
> subsystem's ->init method to be called under a spinlock. Because it is
> highly likely that such a method will wish to do things such as memory
> allocation.
>
> It is rare and unlikely for an ->init() method to *need* such external
> locking, because all the objects it is dealing with cannot be looked up
> by other threads because nothing has been registered anywhere yet.

The frontswap_init() method is being passed the swap_info_struct's
->type, which it uses to get back the swap_info_struct itself, which it
then uses to check if the frontswap_map allocation succeeded. As noted
by the commit message for commit 38b5faf (mm: frontswap: core swap
subsystem hooks and headers), that allocation can fail, which will do
nothing more than not enable frontswap for that swap area.

The same parameter is then passed down to the ->init() method, which
proceeds to sumarily ignore it on the three in-tree implementations I
looked at.

Yeah, it looks like a violation of YAGNI to me, and doing things in a
more roundabount way than is justified. Why pass the ->type and then get
the pointer from it instead of just passing the pointer in the first
place? Or better yet, why not pass the frontswap_map pointer? Even
better, why not a boolean saying whether it worked? Even better, *why
not just put the conditional inside enable_swap_info* and pass no
parameter at all?

> So either frontswap is doing something wrong here or there's some
> subtlety which escapes me. If the former then we should try to get
> that ->init call to happen outside swap_lock.

I believe the swap_lock is protecting the poolid. It is possible that
other things in the frontswap code are being called before the first
swapon with a successful frontswap_map allocation (which is when the
poolid would get allocated).

A quick look suggests the poolid only gets set on an initcall or in the
->init() method; perhaps a local mutex (to prevent double allocation)
and an atomic update of the poolid would be enough to move it outside
the lock (and also outside the swapon_mutex).

But that would work only if no out-of-tree frontswap module needs it
within the swap_lock.

> And if we can do that, perhaps we can fix the regrettable GFP_ATOMIC
> in zcache_new_pool().


--
Cesar Eduardo Barros
[email protected]
[email protected]

2012-10-30 23:13:09

by Cesar Eduardo Barros

[permalink] [raw]
Subject: [PATCH RFC] mm: simplify frontswap_init()

The function frontswap_init() uses the passed parameter only to check
for the presence of the frontswap_map. It is also passed down to
frontswap_ops.init(), but all implementations of it in the kernel ignore
the parameter.

Do the check for frontswap_map in the caller instead and remove the
parameter from frontswap_init() and frontswap_ops.init().

Also, __frontswap_init() was exported, but its only caller (via an
inline function) is mm/swapfile.c, which cannot be built as a module.
Remove the unnecessary export.

Signed-off-by: Cesar Eduardo Barros <[email protected]>
---

Not even compile tested, just a quick patch to show what I was thinking
of, but feel free to apply if you think it is good.

I might write another patch to move it outside the lock later, but I
would have to read the frontswap code more carefully first.

drivers/staging/ramster/zcache-main.c | 2 +-
drivers/staging/zcache/zcache-main.c | 2 +-
drivers/xen/tmem.c | 2 +-
include/linux/frontswap.h | 8 ++++----
mm/frontswap.c | 10 ++--------
mm/swapfile.c | 3 ++-
6 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
index a09dd5c..b3f01c9 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -1610,7 +1610,7 @@ static void zcache_frontswap_flush_area(unsigned type)
}
}

-static void zcache_frontswap_init(unsigned ignored)
+static void zcache_frontswap_init(void)
{
/* a single tmem poolid is used for all frontswap "types" (swapfiles) */
if (zcache_frontswap_poolid < 0)
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index 52b43b7..cb67635 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -1903,7 +1903,7 @@ static void zcache_frontswap_flush_area(unsigned type)
}
}

-static void zcache_frontswap_init(unsigned ignored)
+static void zcache_frontswap_init(void)
{
/* a single tmem poolid is used for all frontswap "types" (swapfiles) */
if (zcache_frontswap_poolid < 0)
diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
index 144564e..7156ff0 100644
--- a/drivers/xen/tmem.c
+++ b/drivers/xen/tmem.c
@@ -343,7 +343,7 @@ static void tmem_frontswap_flush_area(unsigned type)
(void)xen_tmem_flush_object(pool, oswiz(type, ind));
}

-static void tmem_frontswap_init(unsigned ignored)
+static void tmem_frontswap_init(void)
{
struct tmem_pool_uuid private = TMEM_POOL_PRIVATE_UUID;

diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
index 3044254..6374c80 100644
--- a/include/linux/frontswap.h
+++ b/include/linux/frontswap.h
@@ -6,7 +6,7 @@
#include <linux/bitops.h>

struct frontswap_ops {
- void (*init)(unsigned);
+ void (*init)(void);
int (*store)(unsigned, pgoff_t, struct page *);
int (*load)(unsigned, pgoff_t, struct page *);
void (*invalidate_page)(unsigned, pgoff_t);
@@ -22,7 +22,7 @@ extern void frontswap_writethrough(bool);
#define FRONTSWAP_HAS_EXCLUSIVE_GETS
extern void frontswap_tmem_exclusive_gets(bool);

-extern void __frontswap_init(unsigned type);
+extern void __frontswap_init(void);
extern int __frontswap_store(struct page *page);
extern int __frontswap_load(struct page *page);
extern void __frontswap_invalidate_page(unsigned, pgoff_t);
@@ -120,10 +120,10 @@ static inline void frontswap_invalidate_area(unsigned type)
__frontswap_invalidate_area(type);
}

-static inline void frontswap_init(unsigned type)
+static inline void frontswap_init(void)
{
if (frontswap_enabled)
- __frontswap_init(type);
+ __frontswap_init();
}

#endif /* _LINUX_FRONTSWAP_H */
diff --git a/mm/frontswap.c b/mm/frontswap.c
index 2890e67..d13661b 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -115,16 +115,10 @@ EXPORT_SYMBOL(frontswap_tmem_exclusive_gets);
/*
* Called when a swap device is swapon'd.
*/
-void __frontswap_init(unsigned type)
+void __frontswap_init(void)
{
- struct swap_info_struct *sis = swap_info[type];
-
- BUG_ON(sis == NULL);
- if (sis->frontswap_map == NULL)
- return;
- frontswap_ops.init(type);
+ frontswap_ops.init();
}
-EXPORT_SYMBOL(__frontswap_init);

static inline void __frontswap_clear(struct swap_info_struct *sis, pgoff_t offset)
{
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 088daf4..28c26bd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1479,7 +1479,8 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
{
spin_lock(&swap_lock);
_enable_swap_info(p, prio, swap_map, frontswap_map);
- frontswap_init(p->type);
+ if (frontswap_map)
+ frontswap_init();
spin_unlock(&swap_lock);
}

--
1.7.11.7

2012-10-31 13:56:42

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH RFC] mm: simplify frontswap_init()

On Tue, Oct 30, 2012 at 09:12:53PM -0200, Cesar Eduardo Barros wrote:
> The function frontswap_init() uses the passed parameter only to check
> for the presence of the frontswap_map. It is also passed down to
> frontswap_ops.init(), but all implementations of it in the kernel ignore
> the parameter.
>
> Do the check for frontswap_map in the caller instead and remove the
> parameter from frontswap_init() and frontswap_ops.init().
>
> Also, __frontswap_init() was exported, but its only caller (via an
> inline function) is mm/swapfile.c, which cannot be built as a module.
> Remove the unnecessary export.
>
> Signed-off-by: Cesar Eduardo Barros <[email protected]>
> ---
>
> Not even compile tested, just a quick patch to show what I was thinking
> of, but feel free to apply if you think it is good.

That looks good.
>
> I might write another patch to move it outside the lock later, but I
> would have to read the frontswap code more carefully first.
>
> drivers/staging/ramster/zcache-main.c | 2 +-
> drivers/staging/zcache/zcache-main.c | 2 +-
> drivers/xen/tmem.c | 2 +-
> include/linux/frontswap.h | 8 ++++----
> mm/frontswap.c | 10 ++--------
> mm/swapfile.c | 3 ++-
> 6 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
> index a09dd5c..b3f01c9 100644
> --- a/drivers/staging/ramster/zcache-main.c
> +++ b/drivers/staging/ramster/zcache-main.c
> @@ -1610,7 +1610,7 @@ static void zcache_frontswap_flush_area(unsigned type)
> }
> }
>
> -static void zcache_frontswap_init(unsigned ignored)
> +static void zcache_frontswap_init(void)
> {
> /* a single tmem poolid is used for all frontswap "types" (swapfiles) */
> if (zcache_frontswap_poolid < 0)
> diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
> index 52b43b7..cb67635 100644
> --- a/drivers/staging/zcache/zcache-main.c
> +++ b/drivers/staging/zcache/zcache-main.c
> @@ -1903,7 +1903,7 @@ static void zcache_frontswap_flush_area(unsigned type)
> }
> }
>
> -static void zcache_frontswap_init(unsigned ignored)
> +static void zcache_frontswap_init(void)
> {
> /* a single tmem poolid is used for all frontswap "types" (swapfiles) */
> if (zcache_frontswap_poolid < 0)
> diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
> index 144564e..7156ff0 100644
> --- a/drivers/xen/tmem.c
> +++ b/drivers/xen/tmem.c
> @@ -343,7 +343,7 @@ static void tmem_frontswap_flush_area(unsigned type)
> (void)xen_tmem_flush_object(pool, oswiz(type, ind));
> }
>
> -static void tmem_frontswap_init(unsigned ignored)
> +static void tmem_frontswap_init(void)
> {
> struct tmem_pool_uuid private = TMEM_POOL_PRIVATE_UUID;
>
> diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
> index 3044254..6374c80 100644
> --- a/include/linux/frontswap.h
> +++ b/include/linux/frontswap.h
> @@ -6,7 +6,7 @@
> #include <linux/bitops.h>
>
> struct frontswap_ops {
> - void (*init)(unsigned);
> + void (*init)(void);
> int (*store)(unsigned, pgoff_t, struct page *);
> int (*load)(unsigned, pgoff_t, struct page *);
> void (*invalidate_page)(unsigned, pgoff_t);
> @@ -22,7 +22,7 @@ extern void frontswap_writethrough(bool);
> #define FRONTSWAP_HAS_EXCLUSIVE_GETS
> extern void frontswap_tmem_exclusive_gets(bool);
>
> -extern void __frontswap_init(unsigned type);
> +extern void __frontswap_init(void);
> extern int __frontswap_store(struct page *page);
> extern int __frontswap_load(struct page *page);
> extern void __frontswap_invalidate_page(unsigned, pgoff_t);
> @@ -120,10 +120,10 @@ static inline void frontswap_invalidate_area(unsigned type)
> __frontswap_invalidate_area(type);
> }
>
> -static inline void frontswap_init(unsigned type)
> +static inline void frontswap_init(void)
> {
> if (frontswap_enabled)
> - __frontswap_init(type);
> + __frontswap_init();
> }
>
> #endif /* _LINUX_FRONTSWAP_H */
> diff --git a/mm/frontswap.c b/mm/frontswap.c
> index 2890e67..d13661b 100644
> --- a/mm/frontswap.c
> +++ b/mm/frontswap.c
> @@ -115,16 +115,10 @@ EXPORT_SYMBOL(frontswap_tmem_exclusive_gets);
> /*
> * Called when a swap device is swapon'd.
> */
> -void __frontswap_init(unsigned type)
> +void __frontswap_init(void)
> {
> - struct swap_info_struct *sis = swap_info[type];
> -
> - BUG_ON(sis == NULL);
> - if (sis->frontswap_map == NULL)
> - return;
> - frontswap_ops.init(type);
> + frontswap_ops.init();
> }
> -EXPORT_SYMBOL(__frontswap_init);
>
> static inline void __frontswap_clear(struct swap_info_struct *sis, pgoff_t offset)
> {
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 088daf4..28c26bd 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1479,7 +1479,8 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
> {
> spin_lock(&swap_lock);
> _enable_swap_info(p, prio, swap_map, frontswap_map);
> - frontswap_init(p->type);
> + if (frontswap_map)
> + frontswap_init();
> spin_unlock(&swap_lock);
> }
>
> --
> 1.7.11.7

2012-10-31 13:59:09

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 1/2] mm: refactor reinsert of swap_info in sys_swapoff

On Tue, Oct 30, 2012 at 02:04:17PM -0700, Andrew Morton wrote:
> On Sat, 27 Oct 2012 19:20:46 -0200
> Cesar Eduardo Barros <[email protected]> wrote:
>
> > The block within sys_swapoff which re-inserts the swap_info into the
> > swap_list in case of failure of try_to_unuse() reads a few values outside
> > the swap_lock. While this is safe at that point, it is subtle code.
> >
> > Simplify the code by moving the reading of these values to a separate
> > function, refactoring it a bit so they are read from within the
> > swap_lock. This is easier to understand, and matches better the way it
> > worked before I unified the insertion of the swap_info from both
> > sys_swapon and sys_swapoff.
> >
> > This change should make no functional difference. The only real change
> > is moving the read of two or three structure fields to within the lock
> > (frontswap_map_get() is nothing more than a read of p->frontswap_map).
>
> Your patch doesn't change this, but... it is very unusual for any
> subsystem's ->init method to be called under a spinlock. Because it is
> highly likely that such a method will wish to do things such as memory
> allocation.
>
> It is rare and unlikely for an ->init() method to *need* such external
> locking, because all the objects it is dealing with cannot be looked up
> by other threads because nothing has been registered anywhere yet.

I don't believe it actually needs that locking. Dan, do you recall
the details of this?
>
> So either frontswap is doing something wrong here or there's some
> subtlety which escapes me. If the former then we should try to get
> that ->init call to happen outside swap_lock.

Agreed.
>
> And if we can do that, perhaps we can fix the regrettable GFP_ATOMIC
> in zcache_new_pool().

Ouch. Yes.


FYI, thanks for pulling those two patches - they looked good to me
but I hadn't had a chance to test them so did not want to comment on them
until that happen. Dan beat me to it and he did test them.

2012-10-31 14:45:00

by Dan Magenheimer

[permalink] [raw]
Subject: RE: [PATCH 0/2] mm: do not call frontswap_init() during swapoff

> From: Cesar Eduardo Barros [mailto:[email protected]]
> Sent: Saturday, October 27, 2012 3:21 PM
> To: [email protected]
> Cc: [email protected]; Konrad Rzeszutek Wilk; Dan Magenheimer; Andrew Morton; Mel Gorman;
> Rik van Riel; KAMEZAWA Hiroyuki; Johannes Weiner; Cesar Eduardo Barros
> Subject: [PATCH 0/2] mm: do not call frontswap_init() during swapoff
>
> The call to frontswap_init() was added in a place where it is called not
> only from sys_swapon, but also from sys_swapoff. This pair of patches
> fixes that.
>
> The first patch moves the acquisition of swap_lock from enable_swap_info
> to two separate helpers, one for sys_swapon and one for sys_swapoff. As
> a bonus, it also makes the code for sys_swapoff less subtle.
>
> The second patch moves the call to frontswap_init() from the common code
> to the helper used only by sys_swapon.
>
> Compile-tested only, but should be safe.
>
> Cesar Eduardo Barros (2):
> mm: refactor reinsert of swap_info in sys_swapoff
> mm: do not call frontswap_init() during swapoff
>
> mm/swapfile.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)

Belated but, I'm told, better late than never.

Minimally tested to ensure that frontswap continues
to work properly with some disk swap activity, not
exhaustively tested for swap in general.

Acked-by: Dan Magenheimer <[email protected]>