2011-04-14 21:17:31

by Johannes Weiner

Subject: [patch] mm/vmalloc: remove block allocation bitmap

Space in a vmap block that was once allocated is considered dirty and
not made available for allocation again before the whole block is
recycled.

The result is that free space within a vmap block is always contiguous
and the allocation bitmap can be replaced by remembering the offset of
free space in the block.

The fragmented block purging was never invoked from vb_alloc() either,
as it skips blocks that do not have enough free space for the
allocation in the first place. According to the above, it is
impossible for a block to have enough free space and still fail the
allocation. Thus, this dead code is removed. Partially consumed
blocks will be reclaimed anyway when an attempt is made to allocate a
new vmap block altogether and no free space is found.

Signed-off-by: Johannes Weiner <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
mm/vmalloc.c | 39 +++++++++------------------------------
1 files changed, 9 insertions(+), 30 deletions(-)
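
A note for reviewers: with the bitmap gone, the per-block allocator is
effectively a bump allocator.  A minimal sketch of the resulting fast
path (locking and free-list maintenance elided; counts are in pages,
as in the diff below):

	if (vb->free < 1UL << order)	/* not enough space left */
		continue;		/* try the next block */
	/* free space is contiguous and always starts at free_offset */
	addr = vb->va->va_start + (vb->free_offset << PAGE_SHIFT);
	vb->free -= 1UL << order;
	vb->free_offset += 1UL << order;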

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d8666b..5393248 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -682,7 +682,7 @@ struct vmap_block {
struct vmap_area *va;
struct vmap_block_queue *vbq;
unsigned long free, dirty;
- DECLARE_BITMAP(alloc_map, VMAP_BBMAP_BITS);
+ unsigned long free_offset;
DECLARE_BITMAP(dirty_map, VMAP_BBMAP_BITS);
struct list_head free_list;
struct rcu_head rcu_head;
@@ -748,7 +748,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
vb->va = va;
vb->free = VMAP_BBMAP_BITS;
vb->dirty = 0;
- bitmap_zero(vb->alloc_map, VMAP_BBMAP_BITS);
+ vb->free_offset = 0;
bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS);
INIT_LIST_HEAD(&vb->free_list);

@@ -808,7 +808,7 @@ static void purge_fragmented_blocks(int cpu)
if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
vb->free = 0; /* prevent further allocs after releasing lock */
vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
- bitmap_fill(vb->alloc_map, VMAP_BBMAP_BITS);
+ vb->free_offset = VMAP_BBMAP_BITS;
bitmap_fill(vb->dirty_map, VMAP_BBMAP_BITS);
spin_lock(&vbq->lock);
list_del_rcu(&vb->free_list);
@@ -826,11 +826,6 @@ static void purge_fragmented_blocks(int cpu)
}
}

-static void purge_fragmented_blocks_thiscpu(void)
-{
- purge_fragmented_blocks(smp_processor_id());
-}
-
static void purge_fragmented_blocks_allcpus(void)
{
int cpu;
@@ -845,7 +840,6 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
struct vmap_block *vb;
unsigned long addr = 0;
unsigned int order;
- int purge = 0;

BUG_ON(size & ~PAGE_MASK);
BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);
@@ -855,41 +849,26 @@ again:
rcu_read_lock();
vbq = &get_cpu_var(vmap_block_queue);
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
- int i;
-
spin_lock(&vb->lock);
- if (vb->free < 1UL << order)
- goto next;
-
- i = bitmap_find_free_region(vb->alloc_map,
- VMAP_BBMAP_BITS, order);
-
- if (i < 0) {
- if (vb->free + vb->dirty == VMAP_BBMAP_BITS) {
- /* fragmented and no outstanding allocations */
- BUG_ON(vb->dirty != VMAP_BBMAP_BITS);
- purge = 1;
- }
- goto next;
+ if (vb->free < 1UL << order) {
+ spin_unlock(&vb->lock);
+ continue;
}
- addr = vb->va->va_start + (i << PAGE_SHIFT);
+ addr = vb->va->va_start + (vb->free_offset << PAGE_SHIFT);
BUG_ON(addr_to_vb_idx(addr) !=
addr_to_vb_idx(vb->va->va_start));
vb->free -= 1UL << order;
+ vb->free_offset += 1UL << order;
if (vb->free == 0) {
+ BUG_ON(vb->free_offset != VMAP_BBMAP_BITS);
spin_lock(&vbq->lock);
list_del_rcu(&vb->free_list);
spin_unlock(&vbq->lock);
}
spin_unlock(&vb->lock);
break;
-next:
- spin_unlock(&vb->lock);
}

- if (purge)
- purge_fragmented_blocks_thiscpu();
-
put_cpu_var(vmap_block_queue);
rcu_read_unlock();

--
1.7.4


2011-04-19 09:31:48

by Mel Gorman

Subject: Re: [patch] mm/vmalloc: remove block allocation bitmap

On Thu, Apr 14, 2011 at 05:16:56PM -0400, Johannes Weiner wrote:
> Space in a vmap block that was once allocated is considered dirty and
> not made available for allocation again before the whole block is
> recycled.
>
> The result is that free space within a vmap block is always contiguous
> and the allocation bitmap can be replaced by remembering the offset of
> free space in the block.
>
> The fragmented block purging was never invoked from vb_alloc() either,
> as it skips blocks that do not have enough free space for the
> allocation in the first place. According to the above, it is
> impossible for a block to have enough free space and still fail the
> allocation. Thus, this dead code is removed. Partially consumed
> blocks will be reclaimed anyway when an attempt is made to allocate a
> new vmap block altogether and no free space is found.
>
> Signed-off-by: Johannes Weiner <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Hugh Dickins <[email protected]>

I didn't see a problem with the patch per-se but I wonder if your patch
is the intended behaviour. It looks like the intention was that dirty
blocks could be flushed from the TLB and made available for allocations
leading to the possibility of fragmented vmap blocks.

It's this check that is skipping over blocks without taking dirty into
account.

spin_lock(&vb->lock);
if (vb->free < 1UL << order)
goto next;

It was introduced by [02b709d: mm: purge fragmented percpu vmap blocks]
but is there any possibility that this is what should be fixed instead?
Do we know what the consequences of blocks only getting flushed when
they have been fully allocated are?

> <SNIP>

--
Mel Gorman
SUSE Labs

2011-04-19 23:39:34

by Johannes Weiner

Subject: Re: [patch] mm/vmalloc: remove block allocation bitmap

On Tue, Apr 19, 2011 at 10:31:18AM +0100, Mel Gorman wrote:
> On Thu, Apr 14, 2011 at 05:16:56PM -0400, Johannes Weiner wrote:
> > Space in a vmap block that was once allocated is considered dirty and
> > not made available for allocation again before the whole block is
> > recycled.
> >
> > The result is that free space within a vmap block is always contiguous
> > and the allocation bitmap can be replaced by remembering the offset of
> > free space in the block.
> >
> > The fragmented block purging was never invoked from vb_alloc() either,
> > as it skips blocks that do not have enough free space for the
> > allocation in the first place. According to the above, it is
> > impossible for a block to have enough free space and still fail the
> > allocation. Thus, this dead code is removed. Partially consumed
> > blocks will be reclaimed anyway when an attempt is made to allocate a
> > new vmap block altogether and no free space is found.
> >
> > Signed-off-by: Johannes Weiner <[email protected]>
> > Cc: Nick Piggin <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Hugh Dickins <[email protected]>
>
> I didn't see a problem with the patch per-se but I wonder if your patch
> is the intended behaviour. It looks like the intention was that dirty
> blocks could be flushed from the TLB and made available for allocations
> leading to the possibility of fragmented vmap blocks.
>
> It's this check that is skipping over blocks without taking dirty into
> account.
>
> spin_lock(&vb->lock);
> if (vb->free < 1UL << order)
> goto next;
>
> It was introduced by [02b709d: mm: purge fragmented percpu vmap blocks]
> but is there any possibility that this is what should be fixed instead?

I would like to emphasize that the quoted check only made it clear
that the allocation bitmap is superfluous. There is no partial
recycling of a block with live allocations, not even before this
commit.

> Do we know what the consequences of blocks only getting flushed when
> they have been fully allocated are?

Note that it can get recycled earlier if there is no outstanding
allocation on it, even if only a small amount of it is dirty (the
purge_fragmented_blocks code does this).
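
(For reference, the check in purge_fragmented_blocks() boils down to
roughly this, with the actual reclaim elided:

	/* no outstanding allocations, and not already fully dirty */
	if (vb->free + vb->dirty == VMAP_BBMAP_BITS &&
	    vb->dirty != VMAP_BBMAP_BITS)
		/* pull the block off the free list and recycle it */

so a block with no live allocations is eligible even if only a small
part of it was ever used.)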

A single outstanding allocation prevents the block from being
recycled, blocking the reuse of the dirty area.

Theoretically, we could end up with all possible vmap blocks being
pinned by single allocations with most of their area being dirty and
not reusable. But I believe this is unlikely to happen.

Would you be okay with printing out block usage statistics on
allocation errors for the time being, so we can identify this case if
problems show up?
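
Something along these lines is what I have in mind (just a rough
sketch, not part of this patch): walk the per-cpu free lists and dump
the counters.  Fully consumed blocks are off these lists, so it would
not catch everything, but it should show the pinned-but-mostly-dirty
case:

	static void vb_dump_usage(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			struct vmap_block_queue *vbq;
			struct vmap_block *vb;

			vbq = &per_cpu(vmap_block_queue, cpu);
			rcu_read_lock();
			list_for_each_entry_rcu(vb, &vbq->free, free_list)
				printk(KERN_DEBUG "vmap block %p: free=%lu dirty=%lu\n",
				       vb->va, vb->free, vb->dirty);
			rcu_read_unlock();
		}
	}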

And consider this patch an optimization/simplification of a status quo
that does not appear problematic? We can still revert it and
implement live block recycling when it turns out to be necessary.

Hannes

2011-04-20 09:47:17

by Mel Gorman

Subject: Re: [patch] mm/vmalloc: remove block allocation bitmap

On Wed, Apr 20, 2011 at 01:39:05AM +0200, Johannes Weiner wrote:
> On Tue, Apr 19, 2011 at 10:31:18AM +0100, Mel Gorman wrote:
> > On Thu, Apr 14, 2011 at 05:16:56PM -0400, Johannes Weiner wrote:
> > > Space in a vmap block that was once allocated is considered dirty and
> > > not made available for allocation again before the whole block is
> > > recycled.
> > >
> > > The result is that free space within a vmap block is always contiguous
> > > and the allocation bitmap can be replaced by remembering the offset of
> > > free space in the block.
> > >
> > > The fragmented block purging was never invoked from vb_alloc() either,
> > > as it skips blocks that do not have enough free space for the
> > > allocation in the first place. According to the above, it is
> > > impossible for a block to have enough free space and still fail the
> > > allocation. Thus, this dead code is removed. Partially consumed
> > > blocks will be reclaimed anyway when an attempt is made to allocate a
> > > new vmap block altogether and no free space is found.
> > >
> > > Signed-off-by: Johannes Weiner <[email protected]>
> > > Cc: Nick Piggin <[email protected]>
> > > Cc: Mel Gorman <[email protected]>
> > > Cc: Hugh Dickins <[email protected]>
> >
> > I didn't see a problem with the patch per-se but I wonder if your patch
> > is the intended behaviour. It looks like the intention was that dirty
> > blocks could be flushed from the TLB and made available for allocations
> > leading to the possibility of fragmented vmap blocks.
> >
> > It's this check that is skipping over blocks without taking dirty into
> > account.
> >
> > spin_lock(&vb->lock);
> > if (vb->free < 1UL << order)
> > goto next;
> >
> > It was introduced by [02b709d: mm: purge fragmented percpu vmap blocks]
> > but is there any possibility that this is what should be fixed instead?
>
> I would like to emphasize that the quoted check only made it clear
> that the allocation bitmap is superfluous. There is no partial
> recycling of a block with live allocations, not even before this
> commit.
>

You're right in that the allocation bitmap does look superfluous. I was
wondering if it was meant to be doing something useful.

> > Do we know what the consequences of blocks only getting flushed when
> > they have been fully allocated are?
>
> Note that it can get recycled earlier if there is no outstanding
> allocation on it, even if only a small amount of it is dirty (the
> purge_fragmented_blocks code does this).
>

Yep.

> A single outstanding allocation prevents the block from being
> recycled, blocking the reuse of the dirty area.
>

Yes, although your patch doesn't appear to make the current situation
better or worse. It's tricky to know exactly when a full flush will
take place and what the consequences are. For example, look at
vb_alloc(): if all the blocks have a single allocation preventing
recycling, we call new_vmap_block(), which in itself is not too bad,
but it may mean we are using more memory than necessary in the name of
avoiding flushes. This is avoided if a lot of freeing is going on at
the same time, but it's unpredictable.
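
The tail of vb_alloc() is roughly this (paraphrasing, not an exact
quote):

	/* no block on this cpu had room, allocate a fresh one */
	if (!addr) {
		vb = new_vmap_block(gfp_mask);
		if (IS_ERR(vb))
			return vb;
		goto again;
	}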

> Theoretically, we could end up with all possible vmap blocks being
> pinned by single allocations with most of their area being dirty and
> not reusable. But I believe this is unlikely to happen.
>
> Would you be okay with printing out block usage statistics on
> allocation errors for the time being, so we can identify this case if
> problems show up?
>

It'd be interesting but for the purposes of this patch I think it
would be more useful to see the results of some benchmark that is vmap
intensive. Something directory intensive running on XFS should do the
job just to confirm no regression, right? A profile might indicate
how often we end up scanning the full list, finding it dirty and
calling new_vmap_block but even if something odd showed up there,
it would be a new patch.

> And consider this patch an optimization/simplification of a status quo
> that does not appear problematic? We can still revert it and
> implement live block recycling when it turns out to be necessary.
>

I see no problem with your patch so;

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2011-04-20 16:40:46

by Christoph Hellwig

Subject: Re: [patch] mm/vmalloc: remove block allocation bitmap

On Wed, Apr 20, 2011 at 10:46:47AM +0100, Mel Gorman wrote:
> It'd be interesting but for the purposes of this patch I think it
> would be more useful to see the results of some benchmark that is vmap
> intensive. Something directory intensive running on XFS should do the
> job just to confirm no regression, right? A profile might indicate
> how often we end up scanning the full list, finding it dirty and
> calling new_vmap_block but even if something odd showed up there,
> it would be a new patch.

Note that the default mkfs.xfs options will not trigger any vmap
calls at runtime. You'll need a filesystem with a large directory
block size to trigger heavy vmap usage.

2011-04-23 02:08:52

by Johannes Weiner

Subject: Re: [patch] mm/vmalloc: remove block allocation bitmap

On Wed, Apr 20, 2011 at 10:46:47AM +0100, Mel Gorman wrote:
> On Wed, Apr 20, 2011 at 01:39:05AM +0200, Johannes Weiner wrote:
> > On Tue, Apr 19, 2011 at 10:31:18AM +0100, Mel Gorman wrote:
> > > On Thu, Apr 14, 2011 at 05:16:56PM -0400, Johannes Weiner wrote:
> > > > Space in a vmap block that was once allocated is considered dirty and
> > > > not made available for allocation again before the whole block is
> > > > recycled.
> > > >
> > > > The result is that free space within a vmap block is always contiguous
> > > > and the allocation bitmap can be replaced by remembering the offset of
> > > > free space in the block.
> > > >
> > > > The fragmented block purging was never invoked from vb_alloc() either,
> > > > as it skips blocks that do not have enough free space for the
> > > > allocation in the first place. According to the above, it is
> > > > impossible for a block to have enough free space and still fail the
> > > > allocation. Thus, this dead code is removed. Partially consumed
> > > > blocks will be reclaimed anyway when an attempt is made to allocate a
> > > > new vmap block altogether and no free space is found.
> > > >
> > > > Signed-off-by: Johannes Weiner <[email protected]>
> > > > Cc: Nick Piggin <[email protected]>
> > > > Cc: Mel Gorman <[email protected]>
> > > > Cc: Hugh Dickins <[email protected]>
> > >
> > > I didn't see a problem with the patch per-se but I wonder if your patch
> > > is the intended behaviour. It looks like the intention was that dirty
> > > blocks could be flushed from the TLB and made available for allocations
> > > leading to the possibility of fragmented vmap blocks.
> > >
> > > It's this check that is skipping over blocks without taking dirty into
> > > account.
> > >
> > > spin_lock(&vb->lock);
> > > if (vb->free < 1UL << order)
> > > goto next;
> > >
> > > It was introduced by [02b709d: mm: purge fragmented percpu vmap blocks]
> > > but is there any possibility that this is what should be fixed instead?
> >
> > I would like to emphasize that the quoted check only made it clear
> > that the allocation bitmap is superfluous. There is no partial
> > recycling of a block with live allocations, not even before this
> > commit.
>
> You're right in that the allocation bitmap does look superfluous. I was
> wondering if it was meant to be doing something useful.

Ok, just wanted to make sure we are on the same page.

> > Theoretically, we could end up with all possible vmap blocks being
> > pinned by single allocations with most of their area being dirty and
> > not reusable. But I believe this is unlikely to happen.
> >
> > Would you be okay with printing out block usage statistics on
> > allocation errors for the time being, so we can identify this case if
> > problems show up?
>
> It'd be interesting but for the purposes of this patch I think it
> would be more useful to see the results of some benchmark that is vmap
> intensive. Something directory intensive running on XFS should do the
> job just to confirm no regression, right? A profile might indicate
> how often we end up scanning the full list, finding it dirty and
> calling new_vmap_block but even if something odd showed up there,
> it would be a new patch.

Ok, I am still a bit confused. You say 'regression' in the overall
flow of vmap block allocation, but my patch does not change that.
Which kernel versions do you want to have compared?

Profiling the algorithm in general sounds like a good idea and I shall
give it a go.

> > And consider this patch an optimization/simplification of a status quo
> > that does not appear problematic? We can still revert it and
> > implement live block recycling when it turns out to be necessary.
>
> I see no problem with your patch so;
>
> Acked-by: Mel Gorman <[email protected]>

Thanks!

Hannes

2011-04-25 11:08:59

by Mel Gorman

Subject: Re: [patch] mm/vmalloc: remove block allocation bitmap

On Sat, Apr 23, 2011 at 04:08:35AM +0200, Johannes Weiner wrote:
> On Wed, Apr 20, 2011 at 10:46:47AM +0100, Mel Gorman wrote:
> > On Wed, Apr 20, 2011 at 01:39:05AM +0200, Johannes Weiner wrote:
> > > On Tue, Apr 19, 2011 at 10:31:18AM +0100, Mel Gorman wrote:
> > > > On Thu, Apr 14, 2011 at 05:16:56PM -0400, Johannes Weiner wrote:
> > > > > Space in a vmap block that was once allocated is considered dirty and
> > > > > not made available for allocation again before the whole block is
> > > > > recycled.
> > > > >
> > > > > The result is that free space within a vmap block is always contiguous
> > > > > and the allocation bitmap can be replaced by remembering the offset of
> > > > > free space in the block.
> > > > >
> > > > > The fragmented block purging was never invoked from vb_alloc() either,
> > > > > as it skips blocks that do not have enough free space for the
> > > > > allocation in the first place. According to the above, it is
> > > > > impossible for a block to have enough free space and still fail the
> > > > > allocation. Thus, this dead code is removed. Partially consumed
> > > > > blocks will be reclaimed anyway when an attempt is made to allocate a
> > > > > new vmap block altogether and no free space is found.
> > > > >
> > > > > Signed-off-by: Johannes Weiner <[email protected]>
> > > > > Cc: Nick Piggin <[email protected]>
> > > > > Cc: Mel Gorman <[email protected]>
> > > > > Cc: Hugh Dickins <[email protected]>
> > > >
> > > > I didn't see a problem with the patch per-se but I wonder if your patch
> > > > is the intended behaviour. It looks like the intention was that dirty
> > > > blocks could be flushed from the TLB and made available for allocations
> > > > leading to the possibility of fragmented vmap blocks.
> > > >
> > > > It's this check that is skipping over blocks without taking dirty into
> > > > account.
> > > >
> > > > spin_lock(&vb->lock);
> > > > if (vb->free < 1UL << order)
> > > > goto next;
> > > >
> > > > It was introduced by [02b709d: mm: purge fragmented percpu vmap blocks]
> > > > but is there any possibility that this is what should be fixed instead?
> > >
> > > I would like to emphasize that the quoted check only made it clear
> > > that the allocation bitmap is superfluous. There is no partial
> > > recycling of a block with live allocations, not even before this
> > > commit.
> >
> > You're right in that the allocation bitmap does look superfluous. I was
> > wondering if it was meant to be doing something useful.
>
> Ok, just wanted to make sure we are on the same page.
>
> > > Theoretically, we could end up with all possible vmap blocks being
> > > pinned by single allocations with most of their area being dirty and
> > > not reusable. But I believe this is unlikely to happen.
> > >
> > > Would you be okay with printing out block usage statistics on
> > > allocation errors for the time being, so we can identify this case if
> > > problems show up?
> >
> > It'd be interesting but for the purposes of this patch I think it
> > would be more useful to see the results of some benchmark that is vmap
> > intensive. Something directory intensive running on XFS should do the
> > job just to confirm no regression, right? A profile might indicate
> > how often we end up scanning the full list, finding it dirty and
> > calling new_vmap_block but even if something odd showed up there,
> > it would be a new patch.
>
> Ok, I am still a bit confused. You say 'regression' in the overall
> flow of vmap block allocation, but my patch does not change that.
> Which kernel versions do you want to have compared?
>

I should have been clearer. I was checking to see if we spend much
time scanning to find a clean block or allocating new blocks, and
whether that overhead could be reduced by flushing the TLB sooner
(obviously the savings would have to exceed the cost of the TLB flush
itself). If flushing sooner would help, then we want the alloc bitmap
to hang around.

> Profiling the algorithm in general sounds like a good idea and I shall
> give it a go.
>
> > > And consider this patch an optimization/simplification of a status quo
> > > that does not appear problematic? We can still revert it and
> > > implement live block recycling when it turns out to be necessary.
> >
> > I see no problem with your patch so;
> >
> > Acked-by: Mel Gorman <[email protected]>
>
> Thanks!
>

--
Mel Gorman
SUSE Labs