2007-12-21 11:09:55

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: Fix the soft lockup with multi block allocator.

With the multi block allocator when we don't have prealloc space we discard
the existing preallocaltion data and try to rebuild the buddy cache. While
discarding the loop through the group specific prealloc list. If we find any
particular prealloc space being used we mark the space busy. If we are not
able to find enough free space and if we have any prealloc space busy we loop
back again. With non preempted kernel this tight loop resulted in watchdog
timer triggering soft lockup warning.


Whe we are allocation the block we search the prealloc list and mark the
prealloc space used via incrementing pa_count value. One after succesffuly
allocating the block we need to update the block bitmap and this could
actually involved a disk io if the bitmap need to read from the disk. This
actually cause prealloc space to be marked used for quiet a long time. This
inturn results in the discard logic going on tight loop resulting in watchdog
timer triggering soft lockup warning.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/mballoc.c | 12 +++---------
1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 844765c..cbc8143 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3729,7 +3729,7 @@ static int ext4_mb_discard_group_preallocations(struct super_block *sb,
struct list_head list;
struct ext4_buddy e4b;
int err;
- int busy;
+ int busy = 0;
int free = 0;

mb_debug("discard preallocation for group %lu\n", group);
@@ -3754,20 +3754,12 @@ static int ext4_mb_discard_group_preallocations(struct super_block *sb,
INIT_LIST_HEAD(&list);

repeat:
- busy = 0;
ext4_lock_group(sb, group);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {
spin_lock(&pa->pa_lock);
if (atomic_read(&pa->pa_count)) {
spin_unlock(&pa->pa_lock);
- /* FIXME!!
- * It is quiet natural to have the pa being
- * used on other cpus when we are trying free
- * space
- printk(KERN_ERR "uh! busy PA\n");
- dump_stack();
- */
busy = 1;
continue;
}
@@ -3790,7 +3782,9 @@ repeat:

/* if we still need more blocks and some PAs were used, try again */
if (free < needed && busy) {
+ busy = 0;
ext4_unlock_group(sb, group);
+ schedule_timeout(HZ);
goto repeat;
}

--
1.5.4.rc0-dirty


2007-12-21 11:36:51

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

the patch looks OK

Aneesh Kumar K.V wrote:
> With the multi block allocator when we don't have prealloc space we discard
> the existing preallocaltion data and try to rebuild the buddy cache. While
> discarding the loop through the group specific prealloc list. If we find any
> particular prealloc space being used we mark the space busy. If we are not
> able to find enough free space and if we have any prealloc space busy we loop
> back again. With non preempted kernel this tight loop resulted in watchdog
> timer triggering soft lockup warning.
>
>
> Whe we are allocation the block we search the prealloc list and mark the
> prealloc space used via incrementing pa_count value. One after succesffuly
> allocating the block we need to update the block bitmap and this could
> actually involved a disk io if the bitmap need to read from the disk. This
> actually cause prealloc space to be marked used for quiet a long time. This
> inturn results in the discard logic going on tight loop resulting in watchdog
> timer triggering soft lockup warning.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/mballoc.c | 12 +++---------
> 1 files changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 844765c..cbc8143 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3729,7 +3729,7 @@ static int ext4_mb_discard_group_preallocations(struct super_block *sb,
> struct list_head list;
> struct ext4_buddy e4b;
> int err;
> - int busy;
> + int busy = 0;
> int free = 0;
>
> mb_debug("discard preallocation for group %lu\n", group);
> @@ -3754,20 +3754,12 @@ static int ext4_mb_discard_group_preallocations(struct super_block *sb,
> INIT_LIST_HEAD(&list);
>
> repeat:
> - busy = 0;
> ext4_lock_group(sb, group);
> list_for_each_entry_safe(pa, tmp,
> &grp->bb_prealloc_list, pa_group_list) {
> spin_lock(&pa->pa_lock);
> if (atomic_read(&pa->pa_count)) {
> spin_unlock(&pa->pa_lock);
> - /* FIXME!!
> - * It is quiet natural to have the pa being
> - * used on other cpus when we are trying free
> - * space
> - printk(KERN_ERR "uh! busy PA\n");
> - dump_stack();
> - */
> busy = 1;
> continue;
> }
> @@ -3790,7 +3782,9 @@ repeat:
>
> /* if we still need more blocks and some PAs were used, try again */
> if (free < needed && busy) {
> + busy = 0;
> ext4_unlock_group(sb, group);
> + schedule_timeout(HZ);
> goto repeat;
> }
>

2007-12-21 19:10:22

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

On Dec 21, 2007 16:39 +0530, Aneesh Kumar K.V wrote:
> @@ -3790,7 +3782,9 @@ repeat:
>
> /* if we still need more blocks and some PAs were used, try again */
> if (free < needed && busy) {
> + busy = 0;
> ext4_unlock_group(sb, group);
> + schedule_timeout(HZ);
> goto repeat;
> }

Is there nothing we could actually wait on instead of just sleeping for
1 second?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2007-12-24 18:19:31

by Alex Tomas

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

Andreas Dilger wrote:
> On Dec 21, 2007 16:39 +0530, Aneesh Kumar K.V wrote:
>> @@ -3790,7 +3782,9 @@ repeat:
>>
>> /* if we still need more blocks and some PAs were used, try again */
>> if (free < needed && busy) {
>> + busy = 0;
>> ext4_unlock_group(sb, group);
>> + schedule_timeout(HZ);
>> goto repeat;
>> }
>
> Is there nothing we could actually wait on instead of just sleeping for
> 1 second?

actually it was done for simplicity - in my tests busy PA happened quite rare.
I have no objection to improve this with special wait queue.

thanks, Alex

2007-12-24 18:45:34

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

On Dec 24, 2007 21:18 +0300, Alex Tomas wrote:
> Andreas Dilger wrote:
>> On Dec 21, 2007 16:39 +0530, Aneesh Kumar K.V wrote:
>>> @@ -3790,7 +3782,9 @@ repeat:
>>> /* if we still need more blocks and some PAs were used, try again */
>>> if (free < needed && busy) {
>>> + busy = 0;
>>> ext4_unlock_group(sb, group);
>>> + schedule_timeout(HZ);
>>> goto repeat;
>>> }
>>
>> Is there nothing we could actually wait on instead of just sleeping for
>> 1 second?
>
> actually it was done for simplicity - in my tests busy PA happened quite rare.
> I have no objection to improve this with special wait queue.

If it is a very rare case, then I have no objection. I just wanted to
avoid some sort of "Nagle" case where suddenly a workload is taking 1s
instead of 1ms to complete each IO.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-01-09 12:10:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

> With the multi block allocator when we don't have prealloc space we discard
> the existing preallocaltion data and try to rebuild the buddy cache. While
> discarding the loop through the group specific prealloc list. If we find any
> particular prealloc space being used we mark the space busy. If we are not
> able to find enough free space and if we have any prealloc space busy we loop
> back again. With non preempted kernel this tight loop resulted in watchdog
> timer triggering soft lockup warning.
>
>
> Whe we are allocation the block we search the prealloc list and mark the
> prealloc space used via incrementing pa_count value. One after succesffuly
> allocating the block we need to update the block bitmap and this could
> actually involved a disk io if the bitmap need to read from the disk. This
> actually cause prealloc space to be marked used for quiet a long time. This
> inturn results in the discard logic going on tight loop resulting in watchdog
> timer triggering soft lockup warning.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/mballoc.c | 12 +++---------
> 1 files changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 844765c..cbc8143 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3729,7 +3729,7 @@ static int ext4_mb_discard_group_preallocations(struct super_block *sb,
> struct list_head list;
> struct ext4_buddy e4b;
> int err;
> - int busy;
> + int busy = 0;
> int free = 0;
>
> mb_debug("discard preallocation for group %lu\n", group);
> @@ -3754,20 +3754,12 @@ static int ext4_mb_discard_group_preallocations(struct super_block *sb,
> INIT_LIST_HEAD(&list);
>
> repeat:
> - busy = 0;
> ext4_lock_group(sb, group);
> list_for_each_entry_safe(pa, tmp,
> &grp->bb_prealloc_list, pa_group_list) {
> spin_lock(&pa->pa_lock);
> if (atomic_read(&pa->pa_count)) {
> spin_unlock(&pa->pa_lock);
> - /* FIXME!!
> - * It is quiet natural to have the pa being
> - * used on other cpus when we are trying free
> - * space
> - printk(KERN_ERR "uh! busy PA\n");
> - dump_stack();
> - */
> busy = 1;
> continue;
> }
> @@ -3790,7 +3782,9 @@ repeat:
>
> /* if we still need more blocks and some PAs were used, try again */
> if (free < needed && busy) {
> + busy = 0;
> ext4_unlock_group(sb, group);
> + schedule_timeout(HZ);
> goto repeat;
> }
Hmm, wouldn't just schedule() be enough here? That would give a good
chance to other processes to proceed and we would avoid this artificial
wait of 1s which is quite ugly IMO.

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2008-01-09 18:24:35

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

On Wed, Jan 09, 2008 at 01:10:41PM +0100, Jan Kara wrote:
> > With the multi block allocator when we don't have prealloc space we discard
> > @@ -3790,7 +3782,9 @@ repeat:
> >
> > /* if we still need more blocks and some PAs were used, try again */
> > if (free < needed && busy) {
> > + busy = 0;
> > ext4_unlock_group(sb, group);
> > + schedule_timeout(HZ);
> > goto repeat;
> > }
> Hmm, wouldn't just schedule() be enough here? That would give a good
> chance to other processes to proceed and we would avoid this artificial
> wait of 1s which is quite ugly IMO.
>
> Honza

But then who will wake up the task ?. I have the below comment added to
the patch in the patch queue.

/*
* We see this quiet rare. But if a particular workload is
* effected by this we may need to add a waitqueue
*/


-aneesh

2008-01-09 18:44:32

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

On Wed 09-01-08 23:54:28, Aneesh Kumar K.V wrote:
> On Wed, Jan 09, 2008 at 01:10:41PM +0100, Jan Kara wrote:
> > > With the multi block allocator when we don't have prealloc space we discard
> > > @@ -3790,7 +3782,9 @@ repeat:
> > >
> > > /* if we still need more blocks and some PAs were used, try again */
> > > if (free < needed && busy) {
> > > + busy = 0;
> > > ext4_unlock_group(sb, group);
> > > + schedule_timeout(HZ);
> > > goto repeat;
> > > }
> > Hmm, wouldn't just schedule() be enough here? That would give a good
> > chance to other processes to proceed and we would avoid this artificial
> > wait of 1s which is quite ugly IMO.
> >
>
> But then who will wake up the task ?. I have the below comment added to
> the patch in the patch queue.
As far as I know, you don't have to wake-up the task explicitely.
Scheduler will simply schedule the task sometime in future (it is a similar
situation as if the task got preempted in the kernel).

> /*
> * We see this quiet rare. But if a particular workload is
> * effected by this we may need to add a waitqueue
> */
Yes, adding that comment is good in any case :).

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-01-09 19:11:17

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

On Wed, Jan 09, 2008 at 07:44:30PM +0100, Jan Kara wrote:
> On Wed 09-01-08 23:54:28, Aneesh Kumar K.V wrote:
> > On Wed, Jan 09, 2008 at 01:10:41PM +0100, Jan Kara wrote:
> > > > With the multi block allocator when we don't have prealloc space we discard
> > > > @@ -3790,7 +3782,9 @@ repeat:
> > > >
> > > > /* if we still need more blocks and some PAs were used, try again */
> > > > if (free < needed && busy) {
> > > > + busy = 0;
> > > > ext4_unlock_group(sb, group);
> > > > + schedule_timeout(HZ);
> > > > goto repeat;
> > > > }
> > > Hmm, wouldn't just schedule() be enough here? That would give a good
> > > chance to other processes to proceed and we would avoid this artificial
> > > wait of 1s which is quite ugly IMO.
> > >
> >
> > But then who will wake up the task ?. I have the below comment added to
> > the patch in the patch queue.
> As far as I know, you don't have to wake-up the task explicitely.
> Scheduler will simply schedule the task sometime in future (it is a similar
> situation as if the task got preempted in the kernel).

I missed the current->state = TASK_UNINTERRUPTIBLE; in that code piece.
So yes without setting the task state yes it would put it back to the run
queue.Infact schedule_timeout() without changing the task state also
behaves similarly. Now that that we know that it if fine just to have a
schedule() there since schedule_timeout() was just behaving as
schedule(). I guess we should make the change you suggested. In that
case we can remove the comment which says we need to add the wait queue.

Mingming,

Do you want me to send a patch or can you make the modification ?

-aneesh

2008-01-09 22:02:21

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

On Thu, 2008-01-10 at 00:41 +0530, Aneesh Kumar K.V wrote:
> On Wed, Jan 09, 2008 at 07:44:30PM +0100, Jan Kara wrote:
> > On Wed 09-01-08 23:54:28, Aneesh Kumar K.V wrote:
> > > On Wed, Jan 09, 2008 at 01:10:41PM +0100, Jan Kara wrote:
> > > > > With the multi block allocator when we don't have prealloc space we discard
> > > > > @@ -3790,7 +3782,9 @@ repeat:
> > > > >
> > > > > /* if we still need more blocks and some PAs were used, try again */
> > > > > if (free < needed && busy) {
> > > > > + busy = 0;
> > > > > ext4_unlock_group(sb, group);
> > > > > + schedule_timeout(HZ);
> > > > > goto repeat;
> > > > > }
> > > > Hmm, wouldn't just schedule() be enough here? That would give a good
> > > > chance to other processes to proceed and we would avoid this artificial
> > > > wait of 1s which is quite ugly IMO.
> > > >
> > >
> > > But then who will wake up the task ?. I have the below comment added to
> > > the patch in the patch queue.
> > As far as I know, you don't have to wake-up the task explicitely.
> > Scheduler will simply schedule the task sometime in future (it is a similar
> > situation as if the task got preempted in the kernel).
>
> I missed the current->state = TASK_UNINTERRUPTIBLE; in that code piece.
> So yes without setting the task state yes it would put it back to the run
> queue.Infact schedule_timeout() without changing the task state also
> behaves similarly. Now that that we know that it if fine just to have a
> schedule() there since schedule_timeout() was just behaving as
> schedule(). I guess we should make the change you suggested. In that
> case we can remove the comment which says we need to add the wait queue.
>
> Mingming,
>
> Do you want me to send a patch or can you make the modification ?
>

I could make the changes and update the mballoc patch in the queue.

Mingming
> -aneesh
>