2008-12-01 01:58:11

by Johannes Weiner

Subject: [patch v2] vmscan: protect zone rotation stats by lru lock

The zone's rotation statistics must not be accessed without the
corresponding LRU lock held. Fix an unprotected write in
shrink_active_list().

Acked-by: Rik van Riel <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Applies to your tree, Linus, and should probably go into .28.

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1243,32 +1243,32 @@ static void shrink_active_list(unsigned
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
 		    page_referenced(page, 0, sc->mem_cgroup))
 			pgmoved++;

 		list_add(&page->lru, &l_inactive);
 	}

+	spin_lock_irq(&zone->lru_lock);
 	/*
 	 * Count referenced pages from currently used mappings as
 	 * rotated, even though they are moved to the inactive list.
 	 * This helps balance scan pressure between file and anonymous
 	 * pages in get_scan_ratio.
 	 */
 	zone->recent_rotated[!!file] += pgmoved;

 	/*
 	 * Move the pages to the [file or anon] inactive list.
 	 */
 	pagevec_init(&pvec, 1);

 	pgmoved = 0;
 	lru = LRU_BASE + file * LRU_FILE;
-	spin_lock_irq(&zone->lru_lock);
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
 		VM_BUG_ON(!PageActive(page));
 		ClearPageActive(page);


2008-12-01 21:42:16

by Andrew Morton

Subject: Re: [patch v2] vmscan: protect zone rotation stats by lru lock

On Mon, 01 Dec 2008 03:00:35 +0100
Johannes Weiner <[email protected]> wrote:

> The zone's rotation statistics must not be accessed without the
> corresponding LRU lock held. Fix an unprotected write in
> shrink_active_list().
>

I don't think it really matters. It's quite common in that code to do
unlocked, racy updates to statistics such as this, because on those
rare occasions where a race does happen, there's a small glitch in the
reclaim logic which nobody will notice anyway.

Of course, this does need to be done with some care, to ensure the
glitch _will_ be small. If such a race would cause the scanner to go
off and reclaim 2^32 pages, well, that's not so good.
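
To be concrete, here is a sketch of the benign case, using the counter
from this very patch (illustration only, not new code):

        /*
         * Unlocked read-modify-write: two CPUs can both load the old
         * value, and one addition is lost.  The counter then drifts by
         * at most one batch worth of pages, a small, harmless glitch.
         */
        zone->recent_rotated[!!file] += pgmoved;

That kind of bounded drift is the acceptable outcome; a race that
corrupts the value wholesale would not be.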

>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1243,32 +1243,32 @@ static void shrink_active_list(unsigned
>  		/* page_referenced clears PageReferenced */
>  		if (page_mapping_inuse(page) &&
>  		    page_referenced(page, 0, sc->mem_cgroup))
>  			pgmoved++;
>
>  		list_add(&page->lru, &l_inactive);
>  	}
>
> +	spin_lock_irq(&zone->lru_lock);
>  	/*
>  	 * Count referenced pages from currently used mappings as
>  	 * rotated, even though they are moved to the inactive list.
>  	 * This helps balance scan pressure between file and anonymous
>  	 * pages in get_scan_ratio.
>  	 */
>  	zone->recent_rotated[!!file] += pgmoved;
>
>  	/*
>  	 * Move the pages to the [file or anon] inactive list.
>  	 */
>  	pagevec_init(&pvec, 1);
>
>  	pgmoved = 0;
>  	lru = LRU_BASE + file * LRU_FILE;
> -	spin_lock_irq(&zone->lru_lock);

We've also moved a pile of other things inside the locked region,
needlessly extending the lock hold times.

>  	while (!list_empty(&l_inactive)) {
>  		page = lru_to_page(&l_inactive);
>  		prefetchw_prev_lru_page(page, &l_inactive, flags);
>  		VM_BUG_ON(PageLRU(page));
>  		SetPageLRU(page);
>  		VM_BUG_ON(!PageActive(page));
>  		ClearPageActive(page);
>

You'll note that the code which _uses_ these values does so without
holding the lock. So get_scan_ratio() sees incoherent values of
recent_scanned[0] and recent_scanned[1]. As is common in this code,
that is OK and deliberate.

It's also racy here:

	if (unlikely(zone->recent_scanned[0] > anon / 4)) {
		spin_lock_irq(&zone->lru_lock);
		zone->recent_scanned[0] /= 2;
		zone->recent_rotated[0] /= 2;
		spin_unlock_irq(&zone->lru_lock);
	}

failing to recheck the comparison after taking the lock.
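
Something like the following would plug that hole, for what it's worth
(a sketch only, untested):

        if (unlikely(zone->recent_scanned[0] > anon / 4)) {
                spin_lock_irq(&zone->lru_lock);
                /* Recheck: someone may have halved these already */
                if (zone->recent_scanned[0] > anon / 4) {
                        zone->recent_scanned[0] /= 2;
                        zone->recent_rotated[0] /= 2;
                }
                spin_unlock_irq(&zone->lru_lock);
        }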

2008-12-01 21:47:25

by Rik van Riel

Subject: Re: [patch v2] vmscan: protect zone rotation stats by lru lock

Andrew Morton wrote:
> On Mon, 01 Dec 2008 03:00:35 +0100
> Johannes Weiner <[email protected]> wrote:
>
>> The zone's rotation statistics must not be accessed without the
>> corresponding LRU lock held. Fix an unprotected write in
>> shrink_active_list().
>>
>
> I don't think it really matters. It's quite common in that code to do
> unlocked, racy updates to statistics such as this, because on those
> rare occasions where a race does happen, there's a small glitch in the
> reclaim logic which nobody will notice anyway.
>
> Of course, this does need to be done with some care, to ensure the
> glitch _will_ be small.

Processing at most SWAP_CLUSTER_MAX pages at once probably
ensures that glitches will be small most of the time.

The only way this could be a big problem is if we end up
racing with the divide-by-two logic in get_scan_ratio,
leaving the rotated pages a factor two higher than they
should be.

Putting all the writes to the stats under the LRU lock
should ensure that never happens.
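
For illustration, the interleaving to worry about would look like this
(pseudo-trace of the read-modify-write, not real code):

        shrink_active_list()               get_scan_ratio()

        tmp = zone->recent_rotated[0];
                                           spin_lock_irq(&zone->lru_lock);
                                           zone->recent_rotated[0] /= 2;
                                           spin_unlock_irq(&zone->lru_lock);
        zone->recent_rotated[0] = tmp + pgmoved;

The writer computed its sum from the stale, un-halved value, so the
halving is lost and the counter stays a factor of two too high.  With
the += done under the LRU lock, this cannot happen.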

--
All rights reversed.

2008-12-01 22:10:51

by Lee Schermerhorn

Subject: Re: [patch v2] vmscan: protect zone rotation stats by lru lock

On Mon, 2008-12-01 at 16:46 -0500, Rik van Riel wrote:
> Andrew Morton wrote:
> > On Mon, 01 Dec 2008 03:00:35 +0100
> > Johannes Weiner <[email protected]> wrote:
> >
> >> The zone's rotation statistics must not be accessed without the
> >> corresponding LRU lock held. Fix an unprotected write in
> >> shrink_active_list().
> >>
> >
> > I don't think it really matters. It's quite common in that code to do
> > unlocked, racy updates to statistics such as this, because on those
> > rare occasions where a race does happen, there's a small glitch in the
> > reclaim logic which nobody will notice anyway.
> >
> > Of course, this does need to be done with some care, to ensure the
> > glitch _will_ be small.
>
> Processing at most SWAP_CLUSTER_MAX pages at once probably
> ensures that glitches will be small most of the time.
>
> The only way this could be a big problem is if we end up
> racing with the divide-by-two logic in get_scan_ratio,
> leaving the rotated pages a factor two higher than they
> should be.
>
> Putting all the writes to the stats under the LRU lock
> should ensure that never happens.

And he's not actually adding a lock, just moving the existing one up to
include the stats update. The intervening pagevec, pgmoved and lru
initializations don't need to be under the lock, but that's probably not
a big deal?

Lee

2008-12-02 12:35:52

by Johannes Weiner

Subject: Re: [patch v2] vmscan: protect zone rotation stats by lru lock

On Mon, Dec 01, 2008 at 05:09:45PM -0500, Lee Schermerhorn wrote:
> On Mon, 2008-12-01 at 16:46 -0500, Rik van Riel wrote:
> > Andrew Morton wrote:
> > > On Mon, 01 Dec 2008 03:00:35 +0100
> > > Johannes Weiner <[email protected]> wrote:
> > >
> > >> The zone's rotation statistics must not be accessed without the
> > >> corresponding LRU lock held. Fix an unprotected write in
> > >> shrink_active_list().
> > >>
> > >
> > > I don't think it really matters. It's quite common in that code to do
> > > unlocked, racy updates to statistics such as this, because on those
> > > rare occasions where a race does happen, there's a small glitch in the
> > > reclaim logic which nobody will notice anyway.
> > >
> > > Of course, this does need to be done with some care, to ensure the
> > > glitch _will_ be small.
> >
> > Processing at most SWAP_CLUSTER_MAX pages at once probably
> > ensures that glitches will be small most of the time.
> >
> > The only way this could be a big problem is if we end up
> > racing with the divide-by-two logic in get_scan_ratio,
> > leaving the rotated pages a factor two higher than they
> > should be.
> >
> > Putting all the writes to the stats under the LRU lock
> > should ensure that never happens.
>
> And he's not actually adding a lock, just moving the existing one up to
> include the stats update. The intervening pagevec, pgmoved and lru
> initializations don't need to be under the lock, but that's probably not
> a big deal?

I did it like this to keep the diff as simple as possible and to not
change existing code flow.

Here is an alternate version that moves the safe stuff out of the
locked region.

tbh, I think it's worse.

Hannes

---

The zone's rotation statistics must not be modified without the
corresponding LRU lock held. Fix an unprotected write in
shrink_active_list().

---
mm/vmscan.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1249,21 +1249,21 @@ static void shrink_active_list(unsigned
 	}

 	/*
+	 * Move the pages to the [file or anon] inactive list.
+	 */
+
+	pagevec_init(&pvec, 1);
+	lru = LRU_BASE + file * LRU_FILE;
+
+	spin_lock_irq(&zone->lru_lock);
+	/*
 	 * Count referenced pages from currently used mappings as
 	 * rotated, even though they are moved to the inactive list.
 	 * This helps balance scan pressure between file and anonymous
 	 * pages in get_scan_ratio.
 	 */
 	zone->recent_rotated[!!file] += pgmoved;
-
-	/*
-	 * Move the pages to the [file or anon] inactive list.
-	 */
-	pagevec_init(&pvec, 1);
-
 	pgmoved = 0;
-	lru = LRU_BASE + file * LRU_FILE;
-	spin_lock_irq(&zone->lru_lock);
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);

2008-12-02 18:17:51

by Lee Schermerhorn

Subject: Re: [patch v2] vmscan: protect zone rotation stats by lru lock

On Tue, 2008-12-02 at 13:34 +0100, Johannes Weiner wrote:
> On Mon, Dec 01, 2008 at 05:09:45PM -0500, Lee Schermerhorn wrote:
> > On Mon, 2008-12-01 at 16:46 -0500, Rik van Riel wrote:
> > > Andrew Morton wrote:
> > > > On Mon, 01 Dec 2008 03:00:35 +0100
> > > > Johannes Weiner <[email protected]> wrote:
> > > >
> > > >> The zone's rotation statistics must not be accessed without the
> > > >> corresponding LRU lock held. Fix an unprotected write in
> > > >> shrink_active_list().
> > > >>
> > > >
> > > > I don't think it really matters. It's quite common in that code to do
> > > > unlocked, racy updates to statistics such as this, because on those
> > > > rare occasions where a race does happen, there's a small glitch in the
> > > > reclaim logic which nobody will notice anyway.
> > > >
> > > > Of course, this does need to be done with some care, to ensure the
> > > > glitch _will_ be small.
> > >
> > > Processing at most SWAP_CLUSTER_MAX pages at once probably
> > > ensures that glitches will be small most of the time.
> > >
> > > The only way this could be a big problem is if we end up
> > > racing with the divide-by-two logic in get_scan_ratio,
> > > leaving the rotated pages a factor two higher than they
> > > should be.
> > >
> > > Putting all the writes to the stats under the LRU lock
> > > should ensure that never happens.
> >
> > And he's not actually adding a lock, just moving the existing one up to
> > include the stats update. The intervening pagevec, pgmoved and lru
> > initializations don't need to be under the lock, but that's probably not
> > a big deal?
>
> I did it like this to keep the diff as simple as possible and to not
> change existing code flow.
>
> Here is an alternate version that moves the safe stuff out of the
> locked region.
>
> tbh, I think it's worse.

As I said, I didn't think it was a big deal. I'm fine with the prior
version.

Lee
>
> Hannes
>
> ---
>
> The zone's rotation statistics must not be modified without the
> corresponding LRU lock held. Fix an unprotected write in
> shrink_active_list().
>
> ---
> mm/vmscan.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1249,21 +1249,21 @@ static void shrink_active_list(unsigned
>  	}
>
>  	/*
> +	 * Move the pages to the [file or anon] inactive list.
> +	 */
> +
> +	pagevec_init(&pvec, 1);
> +	lru = LRU_BASE + file * LRU_FILE;
> +
> +	spin_lock_irq(&zone->lru_lock);
> +	/*
>  	 * Count referenced pages from currently used mappings as
>  	 * rotated, even though they are moved to the inactive list.
>  	 * This helps balance scan pressure between file and anonymous
>  	 * pages in get_scan_ratio.
>  	 */
>  	zone->recent_rotated[!!file] += pgmoved;
> -
> -	/*
> -	 * Move the pages to the [file or anon] inactive list.
> -	 */
> -	pagevec_init(&pvec, 1);
> -
>  	pgmoved = 0;
> -	lru = LRU_BASE + file * LRU_FILE;
> -	spin_lock_irq(&zone->lru_lock);
>  	while (!list_empty(&l_inactive)) {
>  		page = lru_to_page(&l_inactive);
>  		prefetchw_prev_lru_page(page, &l_inactive, flags);