2006-01-31 06:39:46

by Dave Jones

Subject: more cfq spinlock badness

Not seen this break for a while, but I just hit it again in 2.6.16-rc1-git4.

Dave

BUG: spinlock bad magic on CPU#0, pdflush/1128
lock: ffff81003a219000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0

Call Trace: <ffffffff80206edc>{spin_bug+177} <ffffffff80207045>{_raw_spin_lock+25}
<ffffffff801fea4a>{cfq_exit_single_io_context+85} <ffffffff801ff9a6>{cfq_exit_io_context+24}
<ffffffff801f79b0>{exit_io_context+137} <ffffffff80135fbc>{do_exit+182}
<ffffffff8010ba49>{child_rip+15} <ffffffff80146087>{keventd_create_kthread+0}
<ffffffff8014629c>{kthread+0} <ffffffff8010ba3a>{child_rip+0}
Kernel panic - not syncing: bad locking



2006-01-31 09:07:32

by Jens Axboe

Subject: Re: more cfq spinlock badness

On Tue, Jan 31 2006, Dave Jones wrote:
> Not seen this break for a while, but I just hit it again in 2.6.16-rc1-git4.
>
> Dave
>
> BUG: spinlock bad magic on CPU#0, pdflush/1128
> lock: ffff81003a219000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
>
> Call Trace: <ffffffff80206edc>{spin_bug+177} <ffffffff80207045>{_raw_spin_lock+25}
> <ffffffff801fea4a>{cfq_exit_single_io_context+85} <ffffffff801ff9a6>{cfq_exit_io_context+24}
> <ffffffff801f79b0>{exit_io_context+137} <ffffffff80135fbc>{do_exit+182}
> <ffffffff8010ba49>{child_rip+15} <ffffffff80146087>{keventd_create_kthread+0}
> <ffffffff8014629c>{kthread+0} <ffffffff8010ba3a>{child_rip+0}
> Kernel panic - not syncing: bad locking

Again, which devices have you used? Did it happen at shutdown, or at
some other point? Did the ub bug get fixed, if you are using that? In
the past, the bug above has always been explained by a driver destroying
a structure that embeds the queue lock before the queue is dead.
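
To make that concrete, here's a minimal userspace sketch of that
lifetime pattern (mock names throughout, not the real block-layer API;
zeroing the magic stands in for the driver's memory being freed and
recycled):

#include <stdio.h>
#include <stdlib.h>

#define SPINLOCK_MAGIC	0xdead4eadU	/* the 2.6-era debug-spinlock magic */

struct mock_spinlock {
	unsigned int magic;
};

struct mock_driver {
	struct mock_spinlock lock;	/* queue lock embedded in driver data */
};

struct mock_queue {
	struct mock_spinlock *queue_lock;	/* block layer keeps only a pointer */
};

static void mock_spin_lock(struct mock_spinlock *lock)
{
	/* models the CONFIG_DEBUG_SPINLOCK check that fired above */
	if (lock->magic != SPINLOCK_MAGIC)
		fprintf(stderr, "BUG: spinlock bad magic, .magic: %08x\n",
			lock->magic);
}

int main(void)
{
	struct mock_driver *drv = malloc(sizeof(*drv));
	struct mock_queue q;

	drv->lock.magic = SPINLOCK_MAGIC;
	q.queue_lock = &drv->lock;	/* queue borrows the driver's lock */

	/*
	 * Driver teardown tears down its structure while the queue is
	 * still alive; the lock's memory is gone but the queue still
	 * points at it.
	 */
	drv->lock.magic = 0;

	/* an exiting task (cfq_exit_single_io_context) grabs the stale lock */
	mock_spin_lock(q.queue_lock);

	free(drv);
	return 0;
}

This prints ".magic: 00000000", matching the report above.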

--
Jens Axboe

2006-01-31 17:36:09

by Dave Jones

Subject: Re: more cfq spinlock badness

On Tue, Jan 31, 2006 at 10:09:45AM +0100, Jens Axboe wrote:
> On Tue, Jan 31 2006, Dave Jones wrote:
> > Not seen this break for a while, but I just hit it again in 2.6.16-rc1-git4.
> >
> > Dave
> >
> > BUG: spinlock bad magic on CPU#0, pdflush/1128
> > lock: ffff81003a219000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> >
> > Call Trace: <ffffffff80206edc>{spin_bug+177} <ffffffff80207045>{_raw_spin_lock+25}
> > <ffffffff801fea4a>{cfq_exit_single_io_context+85} <ffffffff801ff9a6>{cfq_exit_io_context+24}
> > <ffffffff801f79b0>{exit_io_context+137} <ffffffff80135fbc>{do_exit+182}
> > <ffffffff8010ba49>{child_rip+15} <ffffffff80146087>{keventd_create_kthread+0}
> > <ffffffff8014629c>{kthread+0} <ffffffff8010ba3a>{child_rip+0}
> > Kernel panic - not syncing: bad locking
>
> Again, which devices have you used?

nothing special (i.e., no USB bits, just the onboard ata_piix SATA)

> Did it happen at shutdown, or at some other point?

whilst starting up a bunch of GNOME panel applets.

> Did the ub bug get fixed

yes

> if you are using that? In the past, the bug above has always been
> explained by a driver destroying a structure that embeds the queue lock
> before the queue is dead.

as there were no ub devices plugged in at the time, I think
Pete is off the hook for this one.

Dave

2006-02-01 11:00:11

by Jens Axboe

Subject: Re: more cfq spinlock badness

On Tue, Jan 31 2006, Dave Jones wrote:
> On Tue, Jan 31, 2006 at 10:09:45AM +0100, Jens Axboe wrote:
> > On Tue, Jan 31 2006, Dave Jones wrote:
> > > Not seen this break for a while, but I just hit it again in 2.6.16-rc1-git4.
> > >
> > > Dave
> > >
> > > BUG: spinlock bad magic on CPU#0, pdflush/1128
> > > lock: ffff81003a219000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> > >
> > > Call Trace: <ffffffff80206edc>{spin_bug+177} <ffffffff80207045>{_raw_spin_lock+25}
> > > <ffffffff801fea4a>{cfq_exit_single_io_context+85} <ffffffff801ff9a6>{cfq_exit_io_context+24}
> > > <ffffffff801f79b0>{exit_io_context+137} <ffffffff80135fbc>{do_exit+182}
> > > <ffffffff8010ba49>{child_rip+15} <ffffffff80146087>{keventd_create_kthread+0}
> > > <ffffffff8014629c>{kthread+0} <ffffffff8010ba3a>{child_rip+0}
> > > Kernel panic - not syncing: bad locking
> >
> > Again, which devices have you used?
>
> nothing special (i.e., no USB bits, just the onboard ata_piix SATA)
>
> > Did it happen at shutdown, or at some other point?
>
> whilst starting up a bunch of GNOME panel applets.
>
> > Did the ub bug get fixed
>
> yes
>
> > if you are using that? In the past, the bug above has always been
> > explained by a driver destroying a structure that embeds the queue lock
> > before the queue is dead.
>
> as there were no ub devices plugged in at the time, I think
> Pete is off the hook for this one.

The ub fix hasn't been merged yet. Just trying to be absolutely certain
that you didn't have that loaded.

--
Jens Axboe

2006-02-01 16:15:56

by Dave Jones

Subject: Re: more cfq spinlock badness

On Wed, Feb 01, 2006 at 12:02:28PM +0100, Jens Axboe wrote:

> > > if you are using that? In the past, the bug above has always been
> > > explained by a driver destroying a structure that embeds the queue lock
> > > before the queue is dead.
> >
> > as there were no ub devices plugged in at the time, I think
> > Pete is off the hook for this one.
>
> The ub fix hasn't been merged yet.

Really? I thought Pete had already synced up a whole bunch of ub bits, including that one.

> Just trying to be absolutely certain
> that you didn't have that loaded.

99.9% sure. (It was a few days ago, and I'm already forgetting the details.)

Dave

2006-02-03 18:44:50

by Pete Zaitcev

Subject: Re: more cfq spinlock badness

On Wed, 1 Feb 2006 11:15:41 -0500, Dave Jones <[email protected]> wrote:

> > > > Not seen this break for a while, but I just hit it again in 2.6.16-rc1-git4.

> > The ub fix hasn't been merged yet.
>
> Really? I thought Pete had already synced up a whole bunch of ub bits, including that one.

I did, but the patch sat in Greg's and AKPM's trees for a while, so the
first release that contains the CFQ+ub fix is 2.6.16-rc2.

But in any case, if no USB devices were connected, ub cannot be in
the picture. That is true even if they were connected and then
disconnected: a queue has to be in use before this can happen.
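
To illustrate why, here is a minimal userspace sketch (mock names, not
the real 2.6.16 structures) of the exit path in the trace above: the
per-task io_context only accumulates an entry per queue the task
actually issued I/O to, and exit walks exactly those entries.

#include <stdio.h>

#define MAX_QUEUES 4

struct mock_queue {
	const char *name;
};

/* per-task io_context: one entry per queue this task issued I/O to */
struct mock_io_context {
	struct mock_queue *used[MAX_QUEUES];
	int nr;
};

static void mock_exit_io_context(struct mock_io_context *ioc)
{
	int i;

	/* only queues recorded here would have their lock taken at exit */
	for (i = 0; i < ioc->nr; i++)
		printf("exit: would take queue lock of %s\n",
		       ioc->used[i]->name);
}

int main(void)
{
	struct mock_queue sata = { "ata_piix" };
	struct mock_io_context ioc = { { &sata }, 1 };

	/*
	 * An idle ub queue never shows up in ioc->used, so its
	 * (possibly stale) lock is never touched on this path.
	 */
	mock_exit_io_context(&ioc);
	return 0;
}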

-- Pete