2010-06-01 06:41:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449


* Peter Zijlstra <[email protected]> wrote:

> On Mon, 2010-05-24 at 05:03 +0200, Piotr Hosowicz wrote:
> > [ 720.313607] BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449
> > [ 720.313612] caller is native_sched_clock+0x3c/0x68
> > [ 720.313616] Pid: 5449, comm: icedove-bin Tainted: P 2.6.34-20100524-0407 #1
> > [ 720.313618] Call Trace:
> > [ 720.313624] [<ffffffff811a533b>] debug_smp_processor_id+0xc7/0xe0
> > [ 720.313629] [<ffffffff81009b87>] native_sched_clock+0x3c/0x68
> > [ 720.313634] [<ffffffff81009a4d>] sched_clock+0x9/0xd
> > [ 720.313637] [<ffffffff811823ec>] blk_rq_init+0x92/0x9d
> > [ 720.313641] [<ffffffff81184227>] get_request+0x1bf/0x2c7
> > [ 720.313646] [<ffffffff8118435c>] get_request_wait+0x2d/0x19d
>
> This comes from wreckage in the blk tree..
>
> ---
> commit 9195291e5f05e01d67f9a09c756b8aca8f009089
> Author: Divyesh Shah <[email protected]>
> Date: Thu Apr 1 15:01:41 2010 -0700
>
> blkio: Increment the blkio cgroup stats for real now

Jens, this regression is still in .35-rc1 and triggers in about 25% of all
-tip boot tests.

The above commit is using sched_clock() in an unsafe way - please fix it or
revert it.

The local_clock() changes PeterZ is working on are still WIP, it's not sure
we'll have it before .36.

Thanks,

Ingo


2010-06-01 06:47:29

by Jens Axboe

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449

On Tue, Jun 01 2010, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Mon, 2010-05-24 at 05:03 +0200, Piotr Hosowicz wrote:
> > > [ 720.313607] BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449
> > > [ 720.313612] caller is native_sched_clock+0x3c/0x68
> > > [ 720.313616] Pid: 5449, comm: icedove-bin Tainted: P 2.6.34-20100524-0407 #1
> > > [ 720.313618] Call Trace:
> > > [ 720.313624] [<ffffffff811a533b>] debug_smp_processor_id+0xc7/0xe0
> > > [ 720.313629] [<ffffffff81009b87>] native_sched_clock+0x3c/0x68
> > > [ 720.313634] [<ffffffff81009a4d>] sched_clock+0x9/0xd
> > > [ 720.313637] [<ffffffff811823ec>] blk_rq_init+0x92/0x9d
> > > [ 720.313641] [<ffffffff81184227>] get_request+0x1bf/0x2c7
> > > [ 720.313646] [<ffffffff8118435c>] get_request_wait+0x2d/0x19d
> >
> > This comes from wreckage in the blk tree..
> >
> > ---
> > commit 9195291e5f05e01d67f9a09c756b8aca8f009089
> > Author: Divyesh Shah <[email protected]>
> > Date: Thu Apr 1 15:01:41 2010 -0700
> >
> > blkio: Increment the blkio cgroup stats for real now
>
> Jens, this regression is still in .35-rc1 and triggers in about 25% of all
> -tip boot tests.
>
> The above commit is using sched_clock() in an unsafe way - please fix it or
> revert it.
>
> The local_clock() changes PeterZ is working on are still WIP, it's not sure
> we'll have it before .36.

OK, I guess we'll have to solve this differently for .35 - I'll cook up
something simple, if need be revert the change.

--
Jens Axboe

2010-06-01 06:55:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449


* Jens Axboe <[email protected]> wrote:

> On Tue, Jun 01 2010, Ingo Molnar wrote:
> >
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Mon, 2010-05-24 at 05:03 +0200, Piotr Hosowicz wrote:
> > > > [ 720.313607] BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449
> > > > [ 720.313612] caller is native_sched_clock+0x3c/0x68
> > > > [ 720.313616] Pid: 5449, comm: icedove-bin Tainted: P 2.6.34-20100524-0407 #1
> > > > [ 720.313618] Call Trace:
> > > > [ 720.313624] [<ffffffff811a533b>] debug_smp_processor_id+0xc7/0xe0
> > > > [ 720.313629] [<ffffffff81009b87>] native_sched_clock+0x3c/0x68
> > > > [ 720.313634] [<ffffffff81009a4d>] sched_clock+0x9/0xd
> > > > [ 720.313637] [<ffffffff811823ec>] blk_rq_init+0x92/0x9d
> > > > [ 720.313641] [<ffffffff81184227>] get_request+0x1bf/0x2c7
> > > > [ 720.313646] [<ffffffff8118435c>] get_request_wait+0x2d/0x19d
> > >
> > > This comes from wreckage in the blk tree..
> > >
> > > ---
> > > commit 9195291e5f05e01d67f9a09c756b8aca8f009089
> > > Author: Divyesh Shah <[email protected]>
> > > Date: Thu Apr 1 15:01:41 2010 -0700
> > >
> > > blkio: Increment the blkio cgroup stats for real now
> >
> > Jens, this regression is still in .35-rc1 and triggers in about 25% of all
> > -tip boot tests.
> >
> > The above commit is using sched_clock() in an unsafe way - please fix it or
> > revert it.
> >
> > The local_clock() changes PeterZ is working on are still WIP, it's not sure
> > we'll have it before .36.
>
> OK, I guess we'll have to solve this differently for .35 - I'll cook up
> something simple, if need be revert the change.

I suspect you can put get_cpu/put_cpu around it and use cpu_clock(). The
cross-CPU effects will still be there and there might be weird stats.

Ingo

2010-06-01 07:53:47

by Jens Axboe

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449

On Tue, Jun 01 2010, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Tue, Jun 01 2010, Ingo Molnar wrote:
> > >
> > > * Peter Zijlstra <[email protected]> wrote:
> > >
> > > > On Mon, 2010-05-24 at 05:03 +0200, Piotr Hosowicz wrote:
> > > > > [ 720.313607] BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449
> > > > > [ 720.313612] caller is native_sched_clock+0x3c/0x68
> > > > > [ 720.313616] Pid: 5449, comm: icedove-bin Tainted: P 2.6.34-20100524-0407 #1
> > > > > [ 720.313618] Call Trace:
> > > > > [ 720.313624] [<ffffffff811a533b>] debug_smp_processor_id+0xc7/0xe0
> > > > > [ 720.313629] [<ffffffff81009b87>] native_sched_clock+0x3c/0x68
> > > > > [ 720.313634] [<ffffffff81009a4d>] sched_clock+0x9/0xd
> > > > > [ 720.313637] [<ffffffff811823ec>] blk_rq_init+0x92/0x9d
> > > > > [ 720.313641] [<ffffffff81184227>] get_request+0x1bf/0x2c7
> > > > > [ 720.313646] [<ffffffff8118435c>] get_request_wait+0x2d/0x19d
> > > >
> > > > This comes from wreckage in the blk tree..
> > > >
> > > > ---
> > > > commit 9195291e5f05e01d67f9a09c756b8aca8f009089
> > > > Author: Divyesh Shah <[email protected]>
> > > > Date: Thu Apr 1 15:01:41 2010 -0700
> > > >
> > > > blkio: Increment the blkio cgroup stats for real now
> > >
> > > Jens, this regression is still in .35-rc1 and triggers in about 25% of all
> > > -tip boot tests.
> > >
> > > The above commit is using sched_clock() in an unsafe way - please fix it or
> > > revert it.
> > >
> > > The local_clock() changes PeterZ is working on are still WIP, it's not sure
> > > we'll have it before .36.
> >
> > OK, I guess we'll have to solve this differently for .35 - I'll cook up
> > something simple, if need be revert the change.
>
> I suspect you can put get_cpu/put_cpu around it and use cpu_clock(). The
> cross-CPU effects will still be there and there might be weird stats.

It'll shut it up at least, which is the primary concern at this point.

--
Jens Axboe

2010-06-02 11:17:10

by Ingo Molnar

[permalink] [raw]
Subject: blkiocg_update_io_add_stats(): INFO: trying to register non-static key


btw., there's another warning triggered by the new blk-cgroups stats code:

sda:INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Pid: 81, comm: async/2 Not tainted 2.6.35-rc1-tip-01073-gd2f7698-dirty #6765
Call Trace:
[<ffffffff81065d6d>] register_lock_class+0x15f/0x365
[<ffffffff8105c426>] ? sched_clock_local+0x1d/0x83
[<ffffffff8105c557>] ? sched_clock_cpu+0xcb/0xd9
[<ffffffff81067184>] __lock_acquire+0x97/0x481
[<ffffffff810088d4>] ? native_sched_clock+0x37/0x6d
[<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
[<ffffffff81067644>] lock_acquire+0xd6/0xfd
[<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
[<ffffffff81850859>] _raw_spin_lock_irqsave+0x47/0x82
[<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
[<ffffffff81321b74>] blkiocg_update_io_add_stats+0x32/0x12e
[<ffffffff81325048>] cfq_insert_request+0xdd/0x47c
[<ffffffff813135b9>] elv_insert+0x12a/0x1f4
[<ffffffff8131371c>] __elv_add_request+0x99/0xa1
[<ffffffff8131817a>] __make_request+0x3e2/0x45f
[<ffffffff8131736e>] generic_make_request+0x2a0/0x316
[<ffffffff8185405e>] ? sub_preempt_count+0xf/0xb7
[<ffffffff8131749c>] submit_bio+0xb8/0xd6
[<ffffffff81117e1b>] submit_bh+0xf5/0x117
[<ffffffff8111b36e>] block_read_full_page+0x1f2/0x214
[<ffffffff8111ce86>] ? blkdev_get_block+0x0/0x66
[<ffffffff8185405e>] ? sub_preempt_count+0xf/0xb7
[<ffffffff8185405e>] ? sub_preempt_count+0xf/0xb7
[<ffffffff810c6ddd>] ? __lru_cache_add+0x79/0x97
[<ffffffff8111de4d>] blkdev_readpage+0x18/0x1a
[<ffffffff810becb0>] do_read_cache_page+0x95/0x128
[<ffffffff8111de35>] ? blkdev_readpage+0x0/0x1a
[<ffffffff810bed87>] read_cache_page_async+0x1c/0x1e
[<ffffffff810bed97>] read_cache_page+0xe/0x18
[<ffffffff8114698f>] read_dev_sector+0x33/0xbe
[<ffffffff8114797d>] adfspart_check_EESOX+0x59/0x1e7
[<ffffffff8185405e>] ? sub_preempt_count+0xf/0xb7
[<ffffffff8185405e>] ? sub_preempt_count+0xf/0xb7
[<ffffffff8185405e>] ? sub_preempt_count+0xf/0xb7
[<ffffffff8103dede>] ? vprintk+0x395/0x3f0
[<ffffffff811472f0>] ? kzalloc.clone.1+0x13/0x15
[<ffffffff8184dfe5>] ? printk+0x69/0x6b
[<ffffffff8114771d>] rescan_partitions+0x186/0x38d
[<ffffffff81513cfb>] ? sd_open+0xf1/0x1bd
[<ffffffff8111e512>] __blkdev_get+0x291/0x372
[<ffffffff8132a4ba>] ? kobject_put+0x48/0x4c
[<ffffffff8111e603>] blkdev_get+0x10/0x12
[<ffffffff81146bcd>] register_disk+0xb9/0x11d
[<ffffffff8131f3c4>] add_disk+0xb0/0x10c
[<ffffffff81515d4c>] sd_probe_async+0x12a/0x200
[<ffffffff8105f4a4>] ? ktime_get+0x65/0xbe
[<ffffffff8105d742>] async_thread+0x10e/0x21c
[<ffffffff81039390>] ? default_wake_function+0x0/0x14
[<ffffffff8105d634>] ? async_thread+0x0/0x21c
[<ffffffff81056c01>] kthread+0x9b/0xa3
[<ffffffff810030f4>] kernel_thread_helper+0x4/0x10
[<ffffffff81056b66>] ? kthread+0x0/0xa3
[<ffffffff810030f0>] ? kernel_thread_helper+0x0/0x10
sda1 sda2 sda3 < sda5 sda6 sda7 sda8 sda9 sda10 >
device: 'sda1': device_add
PM: Adding info for No Bus:sda1

seems to be a lock taken by:

[<ffffffff81321b74>] blkiocg_update_io_add_stats+0x32/0x12e

Ingo

2010-06-02 13:05:01

by Jens Axboe

[permalink] [raw]
Subject: Re: blkiocg_update_io_add_stats(): INFO: trying to register non-static key

On Wed, Jun 02 2010, Ingo Molnar wrote:
>
> btw., there's another warning triggered by the new blk-cgroups stats code:
>
> sda:INFO: trying to register non-static key.
> the code is fine but needs lockdep annotation.
> turning off the locking correctness validator.
> Pid: 81, comm: async/2 Not tainted 2.6.35-rc1-tip-01073-gd2f7698-dirty #6765
> Call Trace:
> [<ffffffff81065d6d>] register_lock_class+0x15f/0x365
> [<ffffffff8105c426>] ? sched_clock_local+0x1d/0x83
> [<ffffffff8105c557>] ? sched_clock_cpu+0xcb/0xd9
> [<ffffffff81067184>] __lock_acquire+0x97/0x481
> [<ffffffff810088d4>] ? native_sched_clock+0x37/0x6d
> [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
> [<ffffffff81067644>] lock_acquire+0xd6/0xfd
> [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
> [<ffffffff81850859>] _raw_spin_lock_irqsave+0x47/0x82
> [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
> [<ffffffff81321b74>] blkiocg_update_io_add_stats+0x32/0x12e

So that's blkg->stats_lock - help me out, what is lockdep complaining
about? The lock is initialized, what kind of lockdep annotation magic do
we need to sprinkle on it?

--
Jens Axboe

2010-06-11 01:33:36

by Divyesh Shah

[permalink] [raw]
Subject: Re: blkiocg_update_io_add_stats(): INFO: trying to register non-static key

On Wed, Jun 2, 2010 at 6:04 AM, Jens Axboe <[email protected]> wrote:
> On Wed, Jun 02 2010, Ingo Molnar wrote:
>>
>> btw., there's another warning triggered by the new blk-cgroups stats code:
>>
>> sda:INFO: trying to register non-static key.
>> the code is fine but needs lockdep annotation.
>> turning off the locking correctness validator.
>> Pid: 81, comm: async/2 Not tainted 2.6.35-rc1-tip-01073-gd2f7698-dirty #6765
>> Call Trace:
>> ?[<ffffffff81065d6d>] register_lock_class+0x15f/0x365
>> ?[<ffffffff8105c426>] ? sched_clock_local+0x1d/0x83
>> ?[<ffffffff8105c557>] ? sched_clock_cpu+0xcb/0xd9
>> ?[<ffffffff81067184>] __lock_acquire+0x97/0x481
>> ?[<ffffffff810088d4>] ? native_sched_clock+0x37/0x6d
>> ?[<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
>> ?[<ffffffff81067644>] lock_acquire+0xd6/0xfd
>> ?[<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
>> ?[<ffffffff81850859>] _raw_spin_lock_irqsave+0x47/0x82
>> ?[<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
>> ?[<ffffffff81321b74>] blkiocg_update_io_add_stats+0x32/0x12e
>
> So that's blkg->stats_lock - help me out, what is lockdep complaining
> about? The lock is initialized, what kind of lockdep annotation magic do
> we need to sprinkle on it?

Jens,
Were you able to figure out what annotation magic is needed here.
I remember we had fixed another similar warning that Gui reported, but
that was because the root cfqg didn't have the stats_lock initialized
as expected. I'm not sure what needs to be done here either.

-Divyesh

>
> --
> Jens Axboe
>
>

2010-06-11 07:15:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: blkiocg_update_io_add_stats(): INFO: trying to register non-static key

On Wed, 2010-06-02 at 15:04 +0200, Jens Axboe wrote:
> On Wed, Jun 02 2010, Ingo Molnar wrote:
> >
> > btw., there's another warning triggered by the new blk-cgroups stats code:
> >
> > sda:INFO: trying to register non-static key.
> > the code is fine but needs lockdep annotation.
> > turning off the locking correctness validator.
> > Pid: 81, comm: async/2 Not tainted 2.6.35-rc1-tip-01073-gd2f7698-dirty #6765
> > Call Trace:
> > [<ffffffff81065d6d>] register_lock_class+0x15f/0x365
> > [<ffffffff8105c426>] ? sched_clock_local+0x1d/0x83
> > [<ffffffff8105c557>] ? sched_clock_cpu+0xcb/0xd9
> > [<ffffffff81067184>] __lock_acquire+0x97/0x481
> > [<ffffffff810088d4>] ? native_sched_clock+0x37/0x6d
> > [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
> > [<ffffffff81067644>] lock_acquire+0xd6/0xfd
> > [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
> > [<ffffffff81850859>] _raw_spin_lock_irqsave+0x47/0x82
> > [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
> > [<ffffffff81321b74>] blkiocg_update_io_add_stats+0x32/0x12e
>
> So that's blkg->stats_lock - help me out, what is lockdep complaining
> about? The lock is initialized, what kind of lockdep annotation magic do
> we need to sprinkle on it?

It basically says its lock instance isn't properly initialised. Usually
spin_lock_init() will set lock->dep_map->key, for static locks,
initialised with __SPIN_LOCK_UNLOCKED() ->key doesn't get set and we use
the lock's address as key.

Now lockdep requires the key to be in static storage, so if you try to
used __SPIN_LOCK_UNLOCKED() on dynamically allocated locks (the most
common form is using static forms like DEFINE_foo() on stack variables),
things go bang.

That said, the block_cgroup.c code seems to use spin_lock_init() so it
_should_ all work out. Use before init/after free perhaps?

2010-06-11 08:34:53

by Jens Axboe

[permalink] [raw]
Subject: Re: blkiocg_update_io_add_stats(): INFO: trying to register non-static key

On 2010-06-11 09:15, Peter Zijlstra wrote:
> On Wed, 2010-06-02 at 15:04 +0200, Jens Axboe wrote:
>> On Wed, Jun 02 2010, Ingo Molnar wrote:
>>>
>>> btw., there's another warning triggered by the new blk-cgroups stats code:
>>>
>>> sda:INFO: trying to register non-static key.
>>> the code is fine but needs lockdep annotation.
>>> turning off the locking correctness validator.
>>> Pid: 81, comm: async/2 Not tainted 2.6.35-rc1-tip-01073-gd2f7698-dirty #6765
>>> Call Trace:
>>> [<ffffffff81065d6d>] register_lock_class+0x15f/0x365
>>> [<ffffffff8105c426>] ? sched_clock_local+0x1d/0x83
>>> [<ffffffff8105c557>] ? sched_clock_cpu+0xcb/0xd9
>>> [<ffffffff81067184>] __lock_acquire+0x97/0x481
>>> [<ffffffff810088d4>] ? native_sched_clock+0x37/0x6d
>>> [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
>>> [<ffffffff81067644>] lock_acquire+0xd6/0xfd
>>> [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
>>> [<ffffffff81850859>] _raw_spin_lock_irqsave+0x47/0x82
>>> [<ffffffff81321b74>] ? blkiocg_update_io_add_stats+0x32/0x12e
>>> [<ffffffff81321b74>] blkiocg_update_io_add_stats+0x32/0x12e
>>
>> So that's blkg->stats_lock - help me out, what is lockdep complaining
>> about? The lock is initialized, what kind of lockdep annotation magic do
>> we need to sprinkle on it?
>
> It basically says its lock instance isn't properly initialised. Usually
> spin_lock_init() will set lock->dep_map->key, for static locks,
> initialised with __SPIN_LOCK_UNLOCKED() ->key doesn't get set and we use
> the lock's address as key.
>
> Now lockdep requires the key to be in static storage, so if you try to
> used __SPIN_LOCK_UNLOCKED() on dynamically allocated locks (the most
> common form is using static forms like DEFINE_foo() on stack variables),
> things go bang.
>
> That said, the block_cgroup.c code seems to use spin_lock_init() so it
> _should_ all work out. Use before init/after free perhaps?

Exactly, I did double check that. But yes, could be a bug where it's
used before being initialized, though it seems to do that when the
struct is allocated. So perhaps user-after-free indeed, but it happened
at boot.

--
Jens Axboe

2010-06-12 01:55:16

by Divyesh Shah

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449

On Tue, Jun 1, 2010 at 12:53 AM, Jens Axboe <[email protected]> wrote:
> On Tue, Jun 01 2010, Ingo Molnar wrote:
>>
>> * Jens Axboe <[email protected]> wrote:
>>
>> > On Tue, Jun 01 2010, Ingo Molnar wrote:
>> > >
>> > > * Peter Zijlstra <[email protected]> wrote:
>> > >
>> > > > On Mon, 2010-05-24 at 05:03 +0200, Piotr Hosowicz wrote:
>> > > > > [ ?720.313607] BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449
>> > > > > [ ?720.313612] caller is native_sched_clock+0x3c/0x68
>> > > > > [ ?720.313616] Pid: 5449, comm: icedove-bin Tainted: P ? ? ? ? ? ?2.6.34-20100524-0407 #1
>> > > > > [ ?720.313618] Call Trace:
>> > > > > [ ?720.313624] ?[<ffffffff811a533b>] debug_smp_processor_id+0xc7/0xe0
>> > > > > [ ?720.313629] ?[<ffffffff81009b87>] native_sched_clock+0x3c/0x68
>> > > > > [ ?720.313634] ?[<ffffffff81009a4d>] sched_clock+0x9/0xd
>> > > > > [ ?720.313637] ?[<ffffffff811823ec>] blk_rq_init+0x92/0x9d
>> > > > > [ ?720.313641] ?[<ffffffff81184227>] get_request+0x1bf/0x2c7
>> > > > > [ ?720.313646] ?[<ffffffff8118435c>] get_request_wait+0x2d/0x19d
>> > > >
>> > > > This comes from wreckage in the blk tree..
>> > > >
>> > > > ---
>> > > > commit 9195291e5f05e01d67f9a09c756b8aca8f009089
>> > > > Author: Divyesh Shah <[email protected]>
>> > > > Date: ? Thu Apr 1 15:01:41 2010 -0700
>> > > >
>> > > > ? ? blkio: Increment the blkio cgroup stats for real now
>> > >
>> > > Jens, this regression is still in .35-rc1 and triggers in about 25% of all
>> > > -tip boot tests.
>> > >
>> > > The above commit is using sched_clock() in an unsafe way - please fix it or
>> > > revert it.
>> > >
>> > > The local_clock() changes PeterZ is working on are still WIP, it's not sure
>> > > we'll have it before .36.
>> >
>> > OK, I guess we'll have to solve this differently for .35 - I'll cook up
>> > something simple, if need be revert the change.
>>
>> I suspect you can put get_cpu/put_cpu around it and use cpu_clock(). The
>> cross-CPU effects will still be there and there might be weird stats.
>
> It'll shut it up at least, which is the primary concern at this point.

Jens,
Thanks for the temporary fix (adding preempt_enable/disable()
calls). I was able to repro the issue and and have a patch that
replaces use of sched_clock() in block layer w/
ktime_to_ns(ktime_get()). We will lose resolution when running w/o
highres timers but should be better than unbounded drift between cpus.
I'll send the patch next.

-Divyesh

>
> --
> Jens Axboe
>
>

2010-06-12 09:42:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: icedove-bin/5449

On Fri, 2010-06-11 at 18:54 -0700, Divyesh Shah wrote:
> I was able to repro the issue and and have a patch that
> replaces use of sched_clock() in block layer w/
> ktime_to_ns(ktime_get()).

You know ktime_get() is terrifyingly expensive, right?