2014-07-31 10:16:45

by Ilya Dryomov

Subject: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.

This commit can lead to deadlocks by way of what at a high level
looks like a missing wakeup on mutex_unlock() when
CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
their kernels. In particular, it causes reproducible deadlocks in
libceph/rbd code under higher than moderate loads, with the evidence
pointing to the bowels of mutex_lock().

kernel/locking/mutex.c, __mutex_lock_common():
476 osq_unlock(&lock->osq);
477 slowpath:
478 /*
479 * If we fell out of the spin path because of need_resched(),
480 * reschedule now, before we try-lock the mutex. This avoids getting
481 * scheduled out right after we obtained the mutex.
482 */
483 if (need_resched())
484 schedule_preempt_disabled(); <-- never returns
485 #endif
486 spin_lock_mutex(&lock->wait_lock, flags);

We started bumping into deadlocks in QA the day our branch was rebased
onto 3.15 (the release this commit went into), but as part of the
debugging effort I enabled all locking debug options, which also
disabled CONFIG_MUTEX_SPIN_ON_OWNER and made the problem disappear,
which is why it hasn't been looked into until now. Reverting makes the
problem go away, confirmed by our users.

Cc: Peter Zijlstra <[email protected]>
Cc: [email protected] # 3.15
Signed-off-by: Ilya Dryomov <[email protected]>
---
kernel/locking/mutex.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index acca2c1a3c5e..746ff280a2fc 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -475,13 +475,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
}
osq_unlock(&lock->osq);
slowpath:
- /*
- * If we fell out of the spin path because of need_resched(),
- * reschedule now, before we try-lock the mutex. This avoids getting
- * scheduled out right after we obtained the mutex.
- */
- if (need_resched())
- schedule_preempt_disabled();
#endif
spin_lock_mutex(&lock->wait_lock, flags);

--
1.7.10.4


2014-07-31 11:58:13

by Peter Zijlstra

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>
> This commit can lead to deadlocks by way of what at a high level
> looks like a missing wakeup on mutex_unlock() when
> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
> their kernels. In particular, it causes reproducible deadlocks in
> libceph/rbd code under higher than moderate loads, with the evidence
> pointing to the bowels of mutex_lock().
>
> kernel/locking/mutex.c, __mutex_lock_common():
> 476 osq_unlock(&lock->osq);
> 477 slowpath:
> 478 /*
> 479 * If we fell out of the spin path because of need_resched(),
> 480 * reschedule now, before we try-lock the mutex. This avoids getting
> 481 * scheduled out right after we obtained the mutex.
> 482 */
> 483 if (need_resched())
> 484 schedule_preempt_disabled(); <-- never returns
> 485 #endif
> 486 spin_lock_mutex(&lock->wait_lock, flags);
>
> We started bumping into deadlocks in QA the day our branch was rebased
> onto 3.15 (the release this commit went into), but as part of the
> debugging effort I enabled all locking debug options, which also
> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made the problem disappear,
> which is why it hasn't been looked into until now. Reverting makes the
> problem go away, confirmed by our users.

This doesn't make sense and you fail to explain how this can possibly
deadlock.



2014-07-31 12:37:31

by Ilya Dryomov

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 3:57 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Jul 31, 2014 at 02:16:37PM +0400, Ilya Dryomov wrote:
>> This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.
>>
>> This commit can lead to deadlocks by way of what at a high level
>> looks like a missing wakeup on mutex_unlock() when
>> CONFIG_MUTEX_SPIN_ON_OWNER is set, which is how most distributions ship
>> their kernels. In particular, it causes reproducible deadlocks in
>> libceph/rbd code under higher than moderate loads, with the evidence
>> pointing to the bowels of mutex_lock().
>>
>> kernel/locking/mutex.c, __mutex_lock_common():
>> 476 osq_unlock(&lock->osq);
>> 477 slowpath:
>> 478 /*
>> 479 * If we fell out of the spin path because of need_resched(),
>> 480 * reschedule now, before we try-lock the mutex. This avoids getting
>> 481 * scheduled out right after we obtained the mutex.
>> 482 */
>> 483 if (need_resched())
>> 484 schedule_preempt_disabled(); <-- never returns
>> 485 #endif
>> 486 spin_lock_mutex(&lock->wait_lock, flags);
>>
>> We started bumping into deadlocks in QA the day our branch was rebased
>> onto 3.15 (the release this commit went into), but as part of the
>> debugging effort I enabled all locking debug options, which also
>> disabled CONFIG_MUTEX_SPIN_ON_OWNER and made the problem disappear,
>> which is why it hasn't been looked into until now. Reverting makes the
>> problem go away, confirmed by our users.
>
> This doesn't make sense and you fail to explain how this can possibly
> deadlock.

This didn't make sense to me at first too, and I'll be happy to be
proven wrong, but we can reproduce this with rbd very reliably under
higher than usual load, and the revert makes it go away. What we are
seeing in the rbd scenario is the following.

Suppose foo needs mutexes A and B, bar needs mutex B. foo acquires
A and then wants to acquire B, but B is held by bar. foo spins
a little and ends up calling schedule_preempt_disabled() on line 484
above, but that call never returns, even though a hundred usecs later
bar releases B. foo ends up stuck in mutex_lock() indefinitely, but
still holds A and everybody else who needs A gets behind A. Given that
this A happens to be a central libceph mutex all rbd activity halts.
Deadlock may not be the best term for this, but never returning from
mutex_lock(&B) even though B has been unlocked is *a* problem.
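
To make the shape of this concrete, here is a minimal sketch of the
scenario (foo, bar, A and B are hypothetical stand-ins, not the actual
libceph code; the real mutexes are named later in the thread):

static DEFINE_MUTEX(A);
static DEFINE_MUTEX(B);

static void foo(void)
{
	mutex_lock(&A);
	mutex_lock(&B);		/* B is held by bar(): spin a little, hit
				 * need_resched(), call
				 * schedule_preempt_disabled() on line 484
				 * above -- and never come back, even after
				 * bar() drops B */
	mutex_unlock(&B);
	mutex_unlock(&A);	/* never reached; everybody needing A piles up */
}

static void bar(void)
{
	mutex_lock(&B);
	/* ~100 usecs of work */
	mutex_unlock(&B);	/* the wakeup foo() apparently never gets */
}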

This obviously doesn't happen every time schedule_preempt_disabled() on
line 484 is called, so there must be some sort of race here. I'll send
along the actual rbd stack traces shortly.

Thanks,

Ilya

2014-07-31 13:13:45

by Peter Zijlstra

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:

> This didn't make sense to me at first too, and I'll be happy to be
> proven wrong, but we can reproduce this with rbd very reliably under
> higher than usual load, and the revert makes it go away. What we are
> seeing in the rbd scenario is the following.

This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
there.

> Suppose foo needs mutexes A and B, bar needs mutex B. foo acquires
> A and then wants to acquire B, but B is held by bar. foo spins
> a little and ends up calling schedule_preempt_disabled() on line 484
> above, but that call never returns, even though a hundred usecs later
> bar releases B. foo ends up stuck in mutex_lock() indefinitely, but
> still holds A and everybody else who needs A gets behind A. Given that
> this A happens to be a central libceph mutex all rbd activity halts.
> Deadlock may not be the best term for this, but never returning from
> mutex_lock(&B) even though B has been unlocked is *a* problem.
>
> This obviously doesn't happen every time schedule_preempt_disabled() on
> line 484 is called, so there must be some sort of race here. I'll send
> along the actual rbd stack traces shortly.

Smells like maybe current->state != TASK_RUNNING, does the below
trigger?

If so, you've wrecked something in whatever...

---
kernel/locking/mutex.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ae712b25e492..3d726fdaa764 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
* reschedule now, before we try-lock the mutex. This avoids getting
* scheduled out right after we obtained the mutex.
*/
- if (need_resched())
+ if (need_resched()) {
+ if (WARN_ON_ONCE(current->state != TASK_RUNNING))
+ __set_current_state(TASK_RUNNING);
+
schedule_preempt_disabled();
+ }
#endif
spin_lock_mutex(&lock->wait_lock, flags);



2014-07-31 13:25:24

by Ilya Dryomov

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 5:13 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:
>
>> This didn't make sense to me at first too, and I'll be happy to be
>> proven wrong, but we can reproduce this with rbd very reliably under
>> higher than usual load, and the revert makes it go away. What we are
>> seeing in the rbd scenario is the following.
>
> This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
> there.

This is in net/ceph, include/linux/ceph.

Mutex A - struct ceph_osd_client::request_mutex, taken in alloc_msg(),
handle_timeout(), handle_osds_timeout(), ceph_osdc_start_request().

Mutex B - struct ceph_connection::mutex, taken in ceph_con_send().

dmesg with a sample dump of blocked tasks attached.

Basically everybody except kjournald:4398 is waiting for request_mutex,
which kjournald acquired in ceph_osdc_start_request(). kjournald
itself, however, sits waiting for ceph_connection::mutex, even though
that mutex has already been released.
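
For reference, a simplified, hedged sketch of that nesting (the real
code lives in net/ceph/osd_client.c and net/ceph/messenger.c; the
function name below is made up):

static void osdc_nesting_sketch(struct ceph_osd_client *osdc,
				struct ceph_connection *con)
{
	mutex_lock(&osdc->request_mutex);	/* mutex A - everyone queues here */
	/* __ceph_osdc_start_request() -> __send_queued() -> ceph_con_send() */
	mutex_lock(&con->mutex);		/* mutex B - kjournald stuck here */
	mutex_unlock(&con->mutex);
	mutex_unlock(&osdc->request_mutex);
}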

>> Suppose foo needs mutexes A and B, bar needs mutex B. foo acquires
>> A and then wants to acquire B, but B is held by bar. foo spins
>> a little and ends up calling schedule_preempt_disabled() on line 484
>> above, but that call never returns, even though a hundred usecs later
>> bar releases B. foo ends up stuck in mutex_lock() indefinitely, but
>> still holds A and everybody else who needs A gets behind A. Given that
>> this A happens to be a central libceph mutex all rbd activity halts.
>> Deadlock may not be the best term for this, but never returning from
>> mutex_lock(&B) even though B has been unlocked is *a* problem.
>>
>> This obviously doesn't happen every time schedule_preempt_disabled() on
>> line 484 is called, so there must be some sort of race here. I'll send
>> along the actual rbd stack traces shortly.
>
> Smells like maybe current->state != TASK_RUNNING, does the below
> trigger?
>
> If so, you've wrecked something in whatever...

Trying it now.

Thanks,

Ilya


Attachments:
dmesg-11 (78.28 kB)

2014-07-31 13:44:23

by Ingo Molnar

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"


* Peter Zijlstra <[email protected]> wrote:

> On Thu, Jul 31, 2014 at 04:37:29PM +0400, Ilya Dryomov wrote:
>
> > This didn't make sense to me at first too, and I'll be happy to be
> > proven wrong, but we can reproduce this with rbd very reliably under
> > higher than usual load, and the revert makes it go away. What we are
> > seeing in the rbd scenario is the following.
>
> This is drivers/block/rbd.c ? I can find but a single mutex_lock() in
> there.
>
> > Suppose foo needs mutexes A and B, bar needs mutex B. foo acquires
> > A and then wants to acquire B, but B is held by bar. foo spins
> > a little and ends up calling schedule_preempt_disabled() on line 484
> > above, but that call never returns, even though a hundred usecs later
> > bar releases B. foo ends up stuck in mutex_lock() indefinitely, but
> > still holds A and everybody else who needs A gets behind A. Given that
> > this A happens to be a central libceph mutex all rbd activity halts.
> > Deadlock may not be the best term for this, but never returning from
> > mutex_lock(&B) even though B has been unlocked is *a* problem.
> >
> > This obviously doesn't happen every time schedule_preempt_disabled() on
> > line 484 is called, so there must be some sort of race here. I'll send
> > along the actual rbd stack traces shortly.
>
> Smells like maybe current->state != TASK_RUNNING, does the below
> trigger?
>
> If so, you've wrecked something in whatever...
>
> ---
> kernel/locking/mutex.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index ae712b25e492..3d726fdaa764 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> * reschedule now, before we try-lock the mutex. This avoids getting
> * scheduled out right after we obtained the mutex.
> */
> - if (need_resched())
> + if (need_resched()) {
> + if (WARN_ON_ONCE(current->state != TASK_RUNNING))
> + __set_current_state(TASK_RUNNING);
> +
> schedule_preempt_disabled();
> + }

Might make sense to add that debug check under mutex debugging or so,
with a sensible kernel message printed.

Thanks,

Ingo

2014-07-31 13:56:31

by Peter Zijlstra

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 03:44:11PM +0200, Ingo Molnar wrote:
>
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index ae712b25e492..3d726fdaa764 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -473,8 +473,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> > * reschedule now, before we try-lock the mutex. This avoids getting
> > * scheduled out right after we obtained the mutex.
> > */
> > - if (need_resched())
> > + if (need_resched()) {
> > + if (WARN_ON_ONCE(current->state != TASK_RUNNING))
> > + __set_current_state(TASK_RUNNING);
> > +
> > schedule_preempt_disabled();
> > + }
>
> Might make sense to add that debug check under mutex debugging or so,
> with a sensible kernel message printed.

Something like so? I suppose we should do a similar one for rwsem,
semaphores and possibly wait_event*() too.

---
Subject: locking/mutex: Add debug check for task state

Calling blocking locks with current->state != TASK_RUNNING is a bug.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/locking/mutex.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ae712b25e492..d5daf8c38899 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -375,6 +375,17 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
unsigned long flags;
int ret;

+#ifdef CONFIG_DEBUG_MUTEXES
+ /*
+ * Blocking primitives (including this one) will set (and therefore
+ * destroy) current->state, since we will exit with TASK_RUNNING
+ * make sure we enter with it, otherwise we will destroy state.
+ */
+ if (WARN_ONCE(current->state != TASK_RUNNING,
+ "do not call blocking locks when !TASK_RUNNING\n"))
+ __set_current_state(TASK_RUNNING);
+#endif
+
preempt_disable();
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
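
As an aside, if the same check were extended to rwsems, semaphores and
wait_event*() as suggested, one possible shape is a small shared helper
along these lines (purely a hypothetical sketch, not part of the patch
above):

static inline void debug_check_blocking_state(void)
{
	/*
	 * Same idea as the mutex check above: a task about to block must
	 * enter with TASK_RUNNING, otherwise the primitive will clobber
	 * whatever state the caller had set.
	 */
	if (WARN_ONCE(current->state != TASK_RUNNING,
		      "do not call blocking ops when !TASK_RUNNING\n"))
		__set_current_state(TASK_RUNNING);
}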



2014-07-31 14:31:05

by Mike Galbraith

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:

> Smells like maybe current->state != TASK_RUNNING

Bingo

[ 1200.851004] kjournald D 0000000000000002 0 4398 2 0x00000000
[ 1200.858283] ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
[ 1200.865914] ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
[ 1200.873590] ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
[ 1200.881256] Call Trace:
[ 1200.883724] [<ffffffff816981c9>] schedule+0x29/0x70
[ 1200.888798] [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
[ 1200.895239] [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
[ 1200.901673] [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
[ 1200.908198] [<ffffffff8169a013>] mutex_lock+0x23/0x37
[ 1200.913430] [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
[ 1200.919912] [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
[ 1200.926444] [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
[ 1200.933855] [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
[ 1200.941126] [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
[ 1200.948622] [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
[ 1200.956040] [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
[ 1200.962845] [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
[ 1200.969108] [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
[ 1200.974764] [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
[ 1200.980424] [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
[ 1200.986635] [<ffffffff81698288>] io_schedule+0x78/0xd0
[ 1200.991954] [<ffffffff8133b864>] get_request+0x414/0x800
[ 1200.997440] [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
[ 1201.004013] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.009782] [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
[ 1201.015353] [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
[ 1201.021605] [<ffffffff8133c385>] submit_bio+0x75/0x140
[ 1201.026921] [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
[ 1201.032390] [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
[ 1201.038662] [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
[ 1201.045683] [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
[ 1201.052043] [<ffffffff81293e01>] kjournald+0xe1/0x260
[ 1201.057324] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
[ 1201.063072] [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
[ 1201.068855] [<ffffffff81078829>] kthread+0xc9/0xe0
[ 1201.073819] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
[ 1201.080084] [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
[ 1201.085573] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0


2014-07-31 14:37:43

by Ilya Dryomov

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 6:30 PM, Mike Galbraith
<[email protected]> wrote:
> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
>
>> Smells like maybe current->state != TASK_RUNNING

It just triggered for me too; it took longer than usual. Sorry for the
churn, Peter, this was really confusing. On to finding the real bug...

Thanks,

Ilya

2014-07-31 14:40:01

by Peter Zijlstra

Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"

On Thu, Jul 31, 2014 at 04:30:52PM +0200, Mike Galbraith wrote:
> On Thu, 2014-07-31 at 15:13 +0200, Peter Zijlstra wrote:
>
> > Smells like maybe current->state != TASK_RUNNING
>
> Bingo
>
> [ 1200.851004] kjournald D 0000000000000002 0 4398 2 0x00000000
> [ 1200.858283] ffff8803537bb788 0000000000000046 ffff8803537bb7a8 0000000000000000
> [ 1200.865914] ffff880423324b60 0000000000012f80 ffff8803537bbfd8 0000000000012f80
> [ 1200.873590] ffff88042937cb60 ffff880423324b60 ffff880428ceb240 ffff8804231e59b8
> [ 1200.881256] Call Trace:
> [ 1200.883724] [<ffffffff816981c9>] schedule+0x29/0x70
> [ 1200.888798] [<ffffffff8169849e>] schedule_preempt_disabled+0xe/0x10
> [ 1200.895239] [<ffffffff81699fe5>] __mutex_lock_slowpath+0x1b5/0x1c0
> [ 1200.901673] [<ffffffffa0479826>] ? ceph_str_hash+0x26/0x80 [libceph]
> [ 1200.908198] [<ffffffff8169a013>] mutex_lock+0x23/0x37
> [ 1200.913430] [<ffffffffa046751d>] ceph_con_send+0x4d/0x130 [libceph]
> [ 1200.919912] [<ffffffffa046c540>] __send_queued+0x120/0x150 [libceph]
> [ 1200.926444] [<ffffffffa046ec5b>] __ceph_osdc_start_request+0x5b/0xd0 [libceph]
> [ 1200.933855] [<ffffffffa046ed21>] ceph_osdc_start_request+0x51/0x80 [libceph]
> [ 1200.941126] [<ffffffffa042bf60>] rbd_obj_request_submit.isra.25+0x10/0x20 [rbd]
> [ 1200.948622] [<ffffffffa042e8ee>] rbd_img_obj_request_submit+0x1ce/0x460 [rbd]
> [ 1200.956040] [<ffffffffa042ebcc>] rbd_img_request_submit+0x4c/0x60 [rbd]
> [ 1200.962845] [<ffffffffa042f2a8>] rbd_request_fn+0x238/0x290 [rbd]
> [ 1200.969108] [<ffffffff8133a397>] __blk_run_queue+0x37/0x50
> [ 1200.974764] [<ffffffff8133affd>] queue_unplugged+0x3d/0xc0
> [ 1200.980424] [<ffffffff8133fddb>] blk_flush_plug_list+0x1db/0x210
> [ 1200.986635] [<ffffffff81698288>] io_schedule+0x78/0xd0
> [ 1200.991954] [<ffffffff8133b864>] get_request+0x414/0x800
> [ 1200.997440] [<ffffffff8133f477>] ? bio_attempt_back_merge+0x37/0x100
> [ 1201.004013] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
> [ 1201.009782] [<ffffffff8133ff2c>] blk_queue_bio+0xcc/0x360
> [ 1201.015353] [<ffffffff8133c2d0>] generic_make_request+0xc0/0x100
> [ 1201.021605] [<ffffffff8133c385>] submit_bio+0x75/0x140
> [ 1201.026921] [<ffffffff811de4e6>] _submit_bh+0x136/0x1f0
> [ 1201.032390] [<ffffffff81290081>] journal_do_submit_data+0x41/0x50
> [ 1201.038662] [<ffffffff81291380>] journal_commit_transaction+0x1150/0x1350
> [ 1201.045683] [<ffffffff81063aff>] ? try_to_del_timer_sync+0x4f/0x70
> [ 1201.052043] [<ffffffff81293e01>] kjournald+0xe1/0x260
> [ 1201.057324] [<ffffffff8109b9e0>] ? __wake_up_sync+0x20/0x20
> [ 1201.063072] [<ffffffff81293d20>] ? commit_timeout+0x10/0x10
> [ 1201.068855] [<ffffffff81078829>] kthread+0xc9/0xe0
> [ 1201.073819] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0
> [ 1201.080084] [<ffffffff8169bb6c>] ret_from_fork+0x7c/0xb0
> [ 1201.085573] [<ffffffff81078760>] ? flush_kthread_worker+0xb0/0xb0

Ohh, that's properly broken indeed.

You can't just call blocking primitives on the way to schedule(), that's
fail.

Also, if I look at blk_flush_plug_list(), it calls queue_unplugged()
with IRQs disabled, so _who_ is enabling them again and calling blocking
stuff?

/me stares more..

rbd_request_fn() does.. *argh*

Someone needs to go fix, this cannot work right.
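
A plausible reconstruction of the broken sequence, pieced together from
the trace above (a simplified sketch with a made-up function name, not a
literal copy of the block layer or rbd code):

static void broken_pattern_sketch(struct mutex *con_mutex)
{
	/*
	 * get_request(): no free request available, so the task prepares
	 * to sleep on the request_list waitqueue ...
	 */
	set_current_state(TASK_UNINTERRUPTIBLE);

	/*
	 * ... but io_schedule() first flushes the plugged I/O, which ends
	 * up in rbd_request_fn() -> ceph_osdc_start_request() ->
	 * ceph_con_send(), i.e. a blocking lock taken on the way to
	 * schedule():
	 */
	mutex_lock(con_mutex);
	/*
	 * If the optimistic spin falls out on need_resched() here, the
	 * extra reschedule point calls schedule_preempt_disabled() while
	 * the task is still TASK_UNINTERRUPTIBLE and not yet on the mutex
	 * wait list, so the eventual mutex_unlock() on the other side has
	 * nobody to wake - which matches the kjournald hang above.
	 */
	mutex_unlock(con_mutex);
}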

