2013-04-15 13:03:35

by Oskar Andero

[permalink] [raw]
Subject: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

From: Snild Dolkow <[email protected]>

Running multiple instances of LMK is not useful since it will try to
kill the same process.

This patch adds a spinlock to prevent multiple instances of the LMK
running at the same time. Uses spin_trylock and return on failure to
avoid blocking.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: Brian Swetland <[email protected]>
Reviewed-by: Radovan Lekanovic <[email protected]>
Signed-off-by: Snild Dolkow <[email protected]>
Signed-off-by: Oskar Andero <[email protected]>
---
drivers/staging/android/lowmemorykiller.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 3b91b0f..0b19353 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -38,6 +38,7 @@
#include <linux/rcupdate.h>
#include <linux/profile.h>
#include <linux/notifier.h>
+#include <linux/spinlock.h>

static uint32_t lowmem_debug_level = 2;
static short lowmem_adj[6] = {
@@ -57,6 +58,8 @@ static int lowmem_minfree_size = 4;

static unsigned long lowmem_deathpending_timeout;

+#define LMK_BUSY (-1)
+
#define lowmem_print(level, x...) \
do { \
if (lowmem_debug_level >= (level)) \
@@ -65,6 +68,7 @@ static unsigned long lowmem_deathpending_timeout;

static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
{
+ static DEFINE_SPINLOCK(lowmem_lock);
struct task_struct *tsk;
struct task_struct *selected = NULL;
int rem = 0;
@@ -104,6 +108,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
}
selected_oom_score_adj = min_score_adj;

+ if (spin_trylock(&lowmem_lock) == 0)
+ return LMK_BUSY;
+
rcu_read_lock();
for_each_process(tsk) {
struct task_struct *p;
@@ -120,6 +127,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
time_before_eq(jiffies, lowmem_deathpending_timeout)) {
task_unlock(p);
rcu_read_unlock();
+ spin_unlock(&lowmem_lock);
return 0;
}
oom_score_adj = p->signal->oom_score_adj;
@@ -156,6 +164,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
sc->nr_to_scan, sc->gfp_mask, rem);
rcu_read_unlock();
+ spin_unlock(&lowmem_lock);
return rem;
}

--
1.8.1.5


2013-04-15 13:18:23

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On Mon, Apr 15, 2013 at 03:03:29PM +0200, Oskar Andero wrote:
> From: Snild Dolkow <[email protected]>
>
> Running multiple instances of LMK is not useful since it will try to
> kill the same process.
>
> This patch adds a spinlock to prevent multiple instances of the LMK
> running at the same time. Uses spin_trylock and return on failure to
> avoid blocking.
>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Brian Swetland <[email protected]>
> Reviewed-by: Radovan Lekanovic <[email protected]>
> Signed-off-by: Snild Dolkow <[email protected]>
> Signed-off-by: Oskar Andero <[email protected]>
> ---
> drivers/staging/android/lowmemorykiller.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> index 3b91b0f..0b19353 100644
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ b/drivers/staging/android/lowmemorykiller.c
> @@ -38,6 +38,7 @@
> #include <linux/rcupdate.h>
> #include <linux/profile.h>
> #include <linux/notifier.h>
> +#include <linux/spinlock.h>
>
> static uint32_t lowmem_debug_level = 2;
> static short lowmem_adj[6] = {
> @@ -57,6 +58,8 @@ static int lowmem_minfree_size = 4;
>
> static unsigned long lowmem_deathpending_timeout;
>
> +#define LMK_BUSY (-1)

Where is lowmem_shrink called from? I only see shrink called from
the bcache sysfs handler __bch_cache_set(). The return value isn't
checked there.

Up to now this function has only returns positive numbers.

There isn't a place which check LMK_BUSY so maybe it's best to just
return zero?

regards,
dan carpenter

2013-04-15 13:38:13

by Dolkow, Snild

[permalink] [raw]
Subject: RE: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

>Where is lowmem_shrink called from? I only see shrink called from the
>bcache sysfs handler __bch_cache_set(). The return value isn't checked
>there.
>
>Up to now this function has only returns positive numbers.
>
>There isn't a place which check LMK_BUSY so maybe it's best to just
>return zero?

Hey Dan,

lowmem_shrink is assigned to a shrinker struct (include/linux/shrinker.h) and called in do_shrinker_shrink() in mm/vmscan.c. That, in turn, is called and checked in a few places in vmscan.c.

>From the comments in shrinker.h:
"It should return the number of objects which remain in the cache. If it returns -1, it means it cannot do any scanning at this time (eg. there is a risk of deadlock). The callback must not return -1 if nr_to_scan is zero."

//Snild

2013-04-15 14:14:05

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On Mon, Apr 15, 2013 at 03:38:08PM +0200, Dolkow, Snild wrote:
> >Where is lowmem_shrink called from? I only see shrink called from the
> >bcache sysfs handler __bch_cache_set(). The return value isn't checked
> >there.
> >
> >Up to now this function has only returns positive numbers.
> >
> >There isn't a place which check LMK_BUSY so maybe it's best to just
> >return zero?
>
> Hey Dan,
>
> lowmem_shrink is assigned to a shrinker struct
> (include/linux/shrinker.h) and called in do_shrinker_shrink() in
> mm/vmscan.c. That, in turn, is called and checked in a few places
> in vmscan.c.
>
> >From the comments in shrinker.h:
> "It should return the number of objects which remain in the
> cache. If it returns -1, it means it cannot do any scanning at
> this time (eg. there is a risk of deadlock). The callback must not
> return -1 if nr_to_scan is zero."

Ah. Good. -1 is the right return.

But really should be a #define in shrinker.h instead of in
drivers/staging/android/.

regards,
dan carpenter

2013-04-15 15:04:01

by Oskar Andero

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On 16:13 Mon 15 Apr , Dan Carpenter wrote:
> On Mon, Apr 15, 2013 at 03:38:08PM +0200, Dolkow, Snild wrote:
> > >Where is lowmem_shrink called from? I only see shrink called from the
> > >bcache sysfs handler __bch_cache_set(). The return value isn't checked
> > >there.
> > >
> > >Up to now this function has only returns positive numbers.
> > >
> > >There isn't a place which check LMK_BUSY so maybe it's best to just
> > >return zero?
> >
> > Hey Dan,
> >
> > lowmem_shrink is assigned to a shrinker struct
> > (include/linux/shrinker.h) and called in do_shrinker_shrink() in
> > mm/vmscan.c. That, in turn, is called and checked in a few places
> > in vmscan.c.
> >
> > >From the comments in shrinker.h:
> > "It should return the number of objects which remain in the
> > cache. If it returns -1, it means it cannot do any scanning at
> > this time (eg. there is a risk of deadlock). The callback must not
> > return -1 if nr_to_scan is zero."
>
> Ah. Good. -1 is the right return.
>
> But really should be a #define in shrinker.h instead of in
> drivers/staging/android/.

IMO one should use the errno.h values - e.g. EBUSY might be a good
value in this case. Does anyone know why the shrinker wants -1? Is there
a reason?

-Oskar

2013-04-15 18:28:11

by Dolkow, Snild

[permalink] [raw]
Subject: RE: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

>> > >From the comments in shrinker.h:
>> > "It should return the number of objects which remain in the cache.
>> > If it returns -1, it means it cannot do any scanning at this time
>> > (eg. there is a risk of deadlock). The callback must not return -1
>> > if nr_to_scan is zero."
>>
>
>IMO one should use the errno.h values - e.g. EBUSY might be a good value
>in this case. Does anyone know why the shrinker wants -1? Is there a
>reason?

The positive numbers are used to return information on the remaining cache size (again, see the comment I pasted above). We could use -EBUSY, but we'd have to change vmscan.c, which checks specifically for -1. I can't see a technical reason why -EBUSY couldn't have been chosen instead, but there's also no real reason to change it now.

//Snild

2013-04-15 19:49:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On Mon, Apr 15, 2013 at 08:28:07PM +0200, Dolkow, Snild wrote:
> >> > >From the comments in shrinker.h:
> >> > "It should return the number of objects which remain in the cache.
> >> > If it returns -1, it means it cannot do any scanning at this time
> >> > (eg. there is a risk of deadlock). The callback must not return -1
> >> > if nr_to_scan is zero."
> >>
> >
> >IMO one should use the errno.h values - e.g. EBUSY might be a good value
> >in this case. Does anyone know why the shrinker wants -1? Is there a
> >reason?
>
> The positive numbers are used to return information on the remaining
> cache size (again, see the comment I pasted above). We could use
> -EBUSY, but we'd have to change vmscan.c, which checks specifically
> for -1. I can't see a technical reason why -EBUSY couldn't have been
> chosen instead, but there's also no real reason to change it now.

If it's not the correct thing to do, sure we can change it, just send a
patch. It makes way more sense than some random -1 return value to me.

Care to send a series of patches fixing this up properly?

thanks,

greg k-h

2013-04-15 23:11:23

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On Mon, 15 Apr 2013, Greg Kroah-Hartman wrote:

> > The positive numbers are used to return information on the remaining
> > cache size (again, see the comment I pasted above). We could use
> > -EBUSY, but we'd have to change vmscan.c, which checks specifically
> > for -1. I can't see a technical reason why -EBUSY couldn't have been
> > chosen instead, but there's also no real reason to change it now.
>
> If it's not the correct thing to do, sure we can change it, just send a
> patch. It makes way more sense than some random -1 return value to me.
>
> Care to send a series of patches fixing this up properly?
>

The comment in shrinker.h is misleading, not the source code.
do_shrinker_shrink() will fail for anything negative and 0. The patch
being discussed could easily use -1 or 0 hardcoded into the return value,
forget the definition of LMK_BUSY.

Also, please consider using an atomic chmpxchg instead of a spinlock: if
you're only ever doing spin_trylock() then you don't need a spinlock.

2013-04-16 06:19:29

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On Mon, Apr 15, 2013 at 04:11:18PM -0700, David Rientjes wrote:
> On Mon, 15 Apr 2013, Greg Kroah-Hartman wrote:
>
> > > The positive numbers are used to return information on the remaining
> > > cache size (again, see the comment I pasted above). We could use
> > > -EBUSY, but we'd have to change vmscan.c, which checks specifically
> > > for -1. I can't see a technical reason why -EBUSY couldn't have been
> > > chosen instead, but there's also no real reason to change it now.
> >
> > If it's not the correct thing to do, sure we can change it, just send a
> > patch. It makes way more sense than some random -1 return value to me.
> >
> > Care to send a series of patches fixing this up properly?
> >
>
> The comment in shrinker.h is misleading, not the source code.
> do_shrinker_shrink() will fail for anything negative and 0.

The comment is correct. The only acceptable negative return is -1.
Look at the second time do_shrinker_shrink() is called from
shrink_slab().

283 while (total_scan >= batch_size) {
284 int nr_before;
285
286 nr_before = do_shrinker_shrink(shrinker, shrink, 0);
287 shrink_ret = do_shrinker_shrink(shrinker, shrink,
288 batch_size);
289 if (shrink_ret == -1)
290 break;
291 if (shrink_ret < nr_before)
292 ret += nr_before - shrink_ret;
293 count_vm_events(SLABS_SCANNED, batch_size);

regards,
dan carpenter

2013-04-16 10:59:55

by Oskar Andero

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On 08:19 Tue 16 Apr , Dan Carpenter wrote:
> On Mon, Apr 15, 2013 at 04:11:18PM -0700, David Rientjes wrote:
> > On Mon, 15 Apr 2013, Greg Kroah-Hartman wrote:
> >
> > > > The positive numbers are used to return information on the remaining
> > > > cache size (again, see the comment I pasted above). We could use
> > > > -EBUSY, but we'd have to change vmscan.c, which checks specifically
> > > > for -1. I can't see a technical reason why -EBUSY couldn't have been
> > > > chosen instead, but there's also no real reason to change it now.
> > >
> > > If it's not the correct thing to do, sure we can change it, just send a
> > > patch. It makes way more sense than some random -1 return value to me.
> > >
> > > Care to send a series of patches fixing this up properly?
> > >
> >
> > The comment in shrinker.h is misleading, not the source code.
> > do_shrinker_shrink() will fail for anything negative and 0.
>
> The comment is correct. The only acceptable negative return is -1.
> Look at the second time do_shrinker_shrink() is called from
> shrink_slab().
>
> 283 while (total_scan >= batch_size) {
> 284 int nr_before;
> 285
> 286 nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> 287 shrink_ret = do_shrinker_shrink(shrinker, shrink,
> 288 batch_size);
> 289 if (shrink_ret == -1)
> 290 break;
> 291 if (shrink_ret < nr_before)
> 292 ret += nr_before - shrink_ret;
> 293 count_vm_events(SLABS_SCANNED, batch_size);

Yes, the comment is correct with what is implemented in the code, but
that doesn't mean the code is right. IMHO, relaying on magical numbers is highly
questionable coding style.

If there are no objections I will prepare a patch-set and let's discuss it from there.

-Oskar

2013-04-16 20:00:26

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On Tue, 16 Apr 2013, Oskar Andero wrote:

> > > The comment in shrinker.h is misleading, not the source code.
> > > do_shrinker_shrink() will fail for anything negative and 0.
> >
> > The comment is correct. The only acceptable negative return is -1.
> > Look at the second time do_shrinker_shrink() is called from
> > shrink_slab().
> >
> > 283 while (total_scan >= batch_size) {
> > 284 int nr_before;
> > 285
> > 286 nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> > 287 shrink_ret = do_shrinker_shrink(shrinker, shrink,
> > 288 batch_size);
> > 289 if (shrink_ret == -1)
> > 290 break;
> > 291 if (shrink_ret < nr_before)
> > 292 ret += nr_before - shrink_ret;
> > 293 count_vm_events(SLABS_SCANNED, batch_size);
>
> Yes, the comment is correct with what is implemented in the code, but
> that doesn't mean the code is right. IMHO, relaying on magical numbers is highly
> questionable coding style.
>

No, it's not. This is controlled higher in shrink_slab() by this:

max_pass = do_shrinker_shrink(shrinker, shrink, 0);
if (max_pass <= 0)
continue;

and your patch is implemented incorrectly, i.e. it does not return
LMK_BUSY if the spinlock is contended which needlessly recalls the
shrinker later.

You have a couple of options:

- return -1 when the spinlock is contended immediately when
!sc->nr_to_scan (although it should really be a cmpxchg since a
spinlock isn't needed), or

- protect the for_each_process() loop in lowmem_shrink() with an
actual spinlock that will detect any previously killed process
since it will have the TIF_MEMDIE bit set.

2013-04-23 22:05:47

by Oskar Andero

[permalink] [raw]
Subject: Re: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

On 22:00 Tue 16 Apr , David Rientjes wrote:
> On Tue, 16 Apr 2013, Oskar Andero wrote:
>
> > > > The comment in shrinker.h is misleading, not the source code.
> > > > do_shrinker_shrink() will fail for anything negative and 0.
> > >
> > > The comment is correct. The only acceptable negative return is -1.
> > > Look at the second time do_shrinker_shrink() is called from
> > > shrink_slab().
> > >
> > > 283 while (total_scan >= batch_size) {
> > > 284 int nr_before;
> > > 285
> > > 286 nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> > > 287 shrink_ret = do_shrinker_shrink(shrinker, shrink,
> > > 288 batch_size);
> > > 289 if (shrink_ret == -1)
> > > 290 break;
> > > 291 if (shrink_ret < nr_before)
> > > 292 ret += nr_before - shrink_ret;
> > > 293 count_vm_events(SLABS_SCANNED, batch_size);
> >
> > Yes, the comment is correct with what is implemented in the code, but
> > that doesn't mean the code is right. IMHO, relaying on magical numbers is highly
> > questionable coding style.
> >
>
> No, it's not. This is controlled higher in shrink_slab() by this:
>
> max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> if (max_pass <= 0)
> continue;
>

Sure, that looks ok, but that doesn't change the fact that line 289
above has a magical number and I guess that explains the comment:
> > > 289 if (shrink_ret == -1)
> > > 290 break;

Just to be clear - this is not about lowmemkiller, but rather a generic
clean-up of shrinkers that is needed IMO.

> and your patch is implemented incorrectly, i.e. it does not return
> LMK_BUSY if the spinlock is contended which needlessly recalls the
> shrinker later.
>
> You have a couple of options:
>
> - return -1 when the spinlock is contended immediately when
> !sc->nr_to_scan (although it should really be a cmpxchg since a
> spinlock isn't needed), or

I leave it to Snild to comment on the patch, but could you elaborate on why
you think cmpxchg is a better alternative than a spin_trylock? I just had a
brief look at the implementation for ARM and it looks like cmpxchg means
two unconditional memory barriers, whereas spin_trylock has one
conditional memory barrier. See arch/arm/include/asm/spinlock.h:
if (tmp == 0) {
smp_mb();
return 1;
} else {
return 0;
}

...and arch/arm/include/asm/cmpxchg.h:
smp_mb();
ret = __cmpxchg(ptr, old, new, size);
smp_mb();

AFAIK a memory barrier is pretty costly.

-Oskar

2013-04-24 08:33:53

by Dolkow, Snild

[permalink] [raw]
Subject: RE: [PATCH] lowmemorykiller: prevent multiple instances of low memory killer

> No, it's not. This is controlled higher in shrink_slab() by this:
>
> max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> if (max_pass <= 0)
> continue;
>

Yes, but the later calls will still not handle other negative values as failures, and there is a chance that more than one thread will get past that first check.

286 nr_before = do_shrinker_shrink(shrinker, shrink, 0);
287 shrink_ret = do_shrinker_shrink(shrinker, shrink,
288 batch_size);
289 if (shrink_ret == -1)
290 break;
291 if (shrink_ret < nr_before)
292 ret += nr_before - shrink_ret;

If, for example, nr_before happens to be -2 and shrink_ret happens to be -1000 here, we're going to erroneously increase ret by 998.

> and your patch is implemented incorrectly, i.e. it does not return
> LMK_BUSY if the spinlock is contended which needlessly recalls the
> shrinker later.

It's worth noting that the LMK has a fastpath for the nr_to_scan=0 case, like the shrinker.h comment recommends. And nr_to_scan=0 is used to query the cache size, so it seems like a good idea to return successfully whenever we can.

> You have a couple of options:
>
> - return -1 when the spinlock is contended immediately when
> !sc->nr_to_scan (although it should really be a cmpxchg since a
> spinlock isn't needed), or

This comes with the risk of nr_before being -1, and shrink_ret being positive. In that case, we will have sent a kill signal, but we're not increasing ret. Not a catastrophe, AFAICT, but not fantastic either.

> - protect the for_each_process() loop in lowmem_shrink() with an
> actual spinlock that will detect any previously killed process
> since it will have the TIF_MEMDIE bit set.

We expect that killing one process will be enough, so spinning seems like a waste of time. If one process wasn't enough, the LMK will trigger again soon.

//Snild