2010-06-08 22:12:58

by Rafael J. Wysocki

Subject: 2.6.35-rc2-git2: Reported regressions from 2.6.34

[NOTES:
* This by no means is a complete list, but we only put e-mail reports that
are at least 1 week old into the Bugzilla.
* Quite a few of the already reported regressions may be related to the bug
fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
bug in scrup, vt.c"), so reporters please retest with this commit applied.]

This message contains a list of some regressions from 2.6.34,
for which there are no fixes in the mainline known to the tracking team.
If any of them have been fixed already, please let us know.

If you know of any other unresolved regressions from 2.6.34, please let us
know as well and we'll add them to the list. Also, please let us know
if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply
to this message with CCs to the people involved in reporting and handling
the issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2010-06-09       15       13          10


Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16163
Subject : [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
Submitter : David John <[email protected]>
Date : 2010-06-02 12:52 (7 days old)
Message-ID : <[email protected]>
References : http://marc.info/?l=linux-kernel&m=127548313828613&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16161
Subject : [2.6.35-rc1 regression] sysfs: cannot create duplicate filename ... XVR-600 related?
Submitter : Mikael Pettersson <[email protected]>
Date : 2010-06-01 19:57 (8 days old)
Message-ID : <[email protected]>
References : http://marc.info/?l=linux-kernel&m=127542227511925&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16160
Subject : 2.6.35 Radeon KMS power management regression?
Submitter : Nigel Cunningham <[email protected]>
Date : 2010-06-01 6:23 (8 days old)
Message-ID : <[email protected]>
References : http://marc.info/?l=linux-kernel&m=127537343722290&w=2


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16145
Subject : Unable to boot after "ACPI: Don't let acpi_pad needlessly mark TSC unstable"
Submitter : Tom Gundersen <[email protected]>
Date : 2010-06-07 13:11 (2 days old)
Handled-By : Venkatesh Pallipadi <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
Submitter : Jan Kreuzer <[email protected]>
Date : 2010-06-05 06:15 (4 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16122
Subject : 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
Submitter : Larry Finger <[email protected]>
Date : 2010-06-04 13:18 (5 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16120
Subject : Oops: 0000 [#1] SMP, unable to handle kernel NULL pointer dereference at (null)
Submitter : Alex Zhavnerchik <[email protected]>
Date : 2010-06-04 09:25 (5 days old)
Handled-By : Eric Dumazet <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16104
Subject : Radeon KMS does not start after merge of the new PM-Code
Submitter : Jan Kreuzer <[email protected]>
Date : 2010-06-02 07:47 (7 days old)


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16090
Subject : sysfs: cannot create duplicate filename
Submitter : Tobias <[email protected]>
Date : 2010-06-01 15:59 (8 days old)
Handled-By : Jesse Barnes <[email protected]>


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16037
Subject : NULL Pointer dereference in __ir_input_register/budget_ci_attach
Submitter : Sean Finney <[email protected]>
Date : 2010-05-23 19:52 (17 days old)


Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16131
Subject : kernel BUG at fs/btrfs/extent-tree.c:4363 (btrfs_free_tree_block)
Submitter : Chow Loong Jin <[email protected]>
Date : 2010-06-05 18:53 (4 days old)
Handled-By : Yan Zheng <[email protected]>
Patch : https://patchwork.kernel.org/patch/103235/


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16127
Subject : Boot freeze on HP Compaq nx6325 (RS482) with Radeon KMS
Submitter : Jure Repinc <[email protected]>
Date : 2010-06-04 21:14 (5 days old)
Handled-By : Dave Airlie <[email protected]>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=26677


Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16092
Subject : Caught 64-bit read from uninitialized memory in memtype_rb_augment_cb
Submitter : Christian Casteyde <[email protected]>
Date : 2010-06-01 18:08 (8 days old)
Handled-By : Venki <[email protected]>
Patch : https://bugzilla.kernel.org/show_bug.cgi?id=16092#c2


For details, please visit the bug entries and follow the links given in
references.
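
The entries above follow a regular "Key : value" layout under underlined
section titles, so the per-section counts can be tallied mechanically. The
sketch below is one way to do that, assuming the layout holds for the whole
report; `parse_report` and the abbreviated `SAMPLE` text are illustrative
names and data, not part of any tracking tool.

```python
import re
from collections import Counter

# Abbreviated copy of the report layout: a section title, an underline,
# then "Key : value" records separated by blank lines.
SAMPLE = """\
Unresolved regressions
----------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16163
Subject : [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
Date : 2010-06-02 12:52 (7 days old)

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
Date : 2010-06-05 06:15 (4 days old)

Regressions with patches
------------------------

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16131
Subject : kernel BUG at fs/btrfs/extent-tree.c:4363 (btrfs_free_tree_block)
Patch : https://patchwork.kernel.org/patch/103235/
"""

FIELD = re.compile(r"^([A-Za-z-]+)\s*:\s+(.*)$")


def parse_report(text):
    """Return a list of dicts, one per Bug-Entry, tagged with its section."""
    entries, section, prev = [], None, ""
    for line in text.splitlines():
        stripped = line.strip()
        if len(stripped) >= 3 and set(stripped) == {"-"}:
            # An underline: the previous line was a section title.
            section = prev.strip()
        else:
            m = FIELD.match(line)
            if m:
                key, value = m.groups()
                if key == "Bug-Entry":
                    entries.append({"Section": section})
                if entries:  # ignore header-like lines before the first entry
                    entries[-1][key] = value
        prev = line
    return entries


entries = parse_report(SAMPLE)
counts = Counter(e["Section"] for e in entries)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Run over the full report, the same tally gives 10 entries under "Unresolved
regressions" and 3 under "Regressions with patches". That agrees with the
Unresolved and Pending (10 + 3 = 13) columns of the statistics table if
Pending is read as the sum of the two sections, which is an assumption here.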

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.34,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=16055

Please let the tracking team know if there are any Bugzilla entries that
should be added to the list in there.

Thanks!



2010-06-09 14:25:42

by Linus Torvalds

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34



On Wed, 9 Jun 2010, Rafael J. Wysocki wrote:
> >
> > That has a "reverting the commit fixes it", and a confirmation from Nick
> > Bowler.
> >
> > Eric, Carl: should I just revert that commit? Or do you have a fix?
>
> This one is reported to have been fixed already. Closed now.

Heh. That "fixed already" is actually the revert. Carl acked it.

> This should be fixed by commit f8ed8b4c5d30b5214f185997131b06e35f6f7113, so
> closing now.

Good, that was in yesterday's drm pull.

Linus

2010-06-11 09:07:39

by Jens Axboe

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On 2010-06-11 10:55, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
>> On 2010-06-11 10:32, Ingo Molnar wrote:
>>>
>>> * Jens Axboe <[email protected]> wrote:
>>>
>>>> On 2010-06-09 03:53, Linus Torvalds wrote:
>>>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
>>>>>> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
>>>>>> Submitter : Jan Kreuzer <[email protected]>
>>>>>> Date : 2010-06-05 06:15 (4 days old)
>>>>>
>>>>> This seems to have been introduced by
>>>>>
>>>>> commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
>>>>> Author: Ingo Molnar <[email protected]>
>>>>> Date: Sat Nov 8 17:05:38 2008 +0100
>>>>>
>>>>> sched: optimize sched_clock() a bit
>>>>>
>>>>> sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
>>>>> variant of __cycles_2_ns().
>>>>>
>>>>> Most of the time sched_clock() is called with irqs disabled already.
>>>>> The few places that call it with irqs enabled need to be updated.
>>>>>
>>>>> Signed-off-by: Ingo Molnar <[email protected]>
>>>>>
>>>>> and this seems to be one of those calling cases that need to be updated.
>>>>>
>>>>> Ingo? The call trace is:
>>>>>
>>>>> BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
>>>>> caller is native_sched_clock+0x3c/0x68
>>>>> Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
>>>>> Call Trace:
>>>>> [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
>>>>> [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
>>>>> [<ffffffff8101043d>] sched_clock+0x9/0xd
>>>>> [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
>>>>> [<ffffffff81214d71>] get_request+0x1c4/0x2d0
>>>>> [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
>>>>> [<ffffffff81215537>] __make_request+0x338/0x45b
>>>>> [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
>>>>> [<ffffffff81214909>] submit_bio+0xd2/0xef
>>>>> [<ffffffff811413cb>] submit_bh+0xf4/0x116
>>>>> [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
>>>>> [<ffffffff81144875>] block_write_full_page+0x15/0x17
>>>>> [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
>>>>> [<ffffffff810e1f91>] __writepage+0x1a/0x39
>>>>> [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
>>>>> [<ffffffff810e3406>] generic_writepages+0x27/0x29
>>>>> [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
>>>>> [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
>>>>> [<ffffffff811d0cba>] kjournald2+0x147/0x37a
>>>>>
>>>>> (from the bugzilla thing)
>>>>
>>>> This should be fixed by commit 28f4197e which was merged on friday.
>>>
>>> Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some
>>> configs I get bad spinlock warnings during bootup:
>>>
>>> [ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
>>> [ 28.972003] calling b44_init+0x0/0x55 @ 1
>>> [ 28.976009] bus: 'pci': add driver b44
>>> [ 28.976374] sda:
>>> [ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
>>> [ 28.980000] lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
>>> [ 28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
>>> [ 28.980000] Call Trace:
>>> [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
>>> [ 28.980000] [<4134b7b7>] spin_bug+0x7c/0x87
>>> [ 28.980000] [<4134b853>] do_raw_spin_lock+0x1e/0x123
>>> [ 28.980000] [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
>>> [ 28.980000] [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
>>> [ 28.980000] [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
>>> [ 28.980000] [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
>>> [ 28.980000] [<41337bc7>] cfq_insert_request+0x8c/0x425
>>> [ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
>>> [ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
>>> [ 28.980000] [<41329225>] elv_insert+0x107/0x1a0
>>> [ 28.980000] [<41329354>] __elv_add_request+0x96/0x9d
>>> [ 28.980000] [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
>>> [ 28.980000] [<4132dd64>] __make_request+0x335/0x376
>>> [ 28.980000] [<4132c726>] generic_make_request+0x336/0x39d
>>> [ 28.980000] [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
>>> [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
>>> [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
>>> [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
>>> [ 28.980000] [<41089347>] ? mempool_alloc+0x57/0xe2
>>> [ 28.980000] [<4132c804>] submit_bio+0x77/0x8f
>>> [ 28.980000] [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
>>> [ 28.980000] [<410ceb90>] submit_bh+0xc3/0xe2
>>> [ 28.980000] [<410d1474>] block_read_full_page+0x249/0x259
>>> [ 28.980000] [<410d31fb>] ? blkdev_get_block+0x0/0xc6
>>> [ 28.980000] [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
>>> [ 28.980000] [<410d3d92>] blkdev_readpage+0xf/0x11
>>> [ 28.980000] [<41088823>] do_read_cache_page+0x7d/0x11a
>>> [ 28.980000] [<410d3d83>] ? blkdev_readpage+0x0/0x11
>>> [ 28.980000] [<410888f4>] read_cache_page_async+0x16/0x1b
>>> [ 28.980000] [<41088904>] read_cache_page+0xb/0x12
>>> [ 28.980000] [<410e80e1>] read_dev_sector+0x2a/0x63
>>> [ 28.980000] [<410e92e8>] adfspart_check_ICS+0x2e/0x166
>>> [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
>>> [ 28.980000] [<410e8d23>] rescan_partitions+0x196/0x3e4
>>> [ 28.980000] [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
>>> [ 28.980000] [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
>>> [ 28.980000] [<410d4277>] __blkdev_get+0x1e7/0x292
>>> [ 28.980000] [<4133a201>] ? kobject_put+0x14/0x16
>>> [ 28.980000] [<410d432c>] blkdev_get+0xa/0xc
>>> [ 28.980000] [<410e81fb>] register_disk+0x94/0xe5
>>> [ 28.980000] [<413326c6>] ? blk_register_region+0x1b/0x20
>>> [ 28.980000] [<41332815>] add_disk+0x57/0x95
>>> [ 28.980000] [<41331fc6>] ? exact_match+0x0/0x8
>>> [ 28.980000] [<4133233f>] ? exact_lock+0x0/0x11
>>> [ 28.980000] [<41643848>] sd_probe_async+0x108/0x1be
>>> [ 28.980000] [<41048865>] async_thread+0xf5/0x1e6
>>> [ 28.980000] [<4102cbcb>] ? default_wake_function+0x0/0xd
>>> [ 28.980000] [<41048770>] ? async_thread+0x0/0x1e6
>>> [ 28.980000] [<410433df>] kthread+0x5f/0x64
>>> [ 28.980000] [<41043380>] ? kthread+0x0/0x64
>>> [ 28.980000] [<41002cc6>] kernel_thread_helper+0x6/0x10
>>> [ 29.264071] async/1 used greatest stack depth: 2336 bytes left
>>> [ 29.267020] bus: 'ssb': add driver b44
>>> [ 29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
>>> [ 29.267076] calling init_nic+0x0/0x16 @ 1
>>>
>>> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
>>> attached. Reproducible on that sha1 and with that config.
>>
>> I think I see it, the internal CFQ blkg groups are not properly
>> initialized... Will send a patch shortly.
>
> Cool - can test it with a short turnaround, the bug is easy to reproduce.

Thanks, I need to work out what the best way to solve it is. The problem
is that if you have BLK_CGROUP set but don't enable the CFQ cgroup
stuff, then you end up calling the real update functions even though CFQ
has not initialized the blkg groups they operate on.
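
A toy model of that failure mode, as a hedged illustration rather than the
actual block-layer code: a debug spinlock carries a magic value that is only
set by its init routine, so a lock living in zero-filled memory reports
".magic: 00000000" the first time it is taken. `Group`, `cgroup_enabled` and
`update_io_add_stats` are hypothetical names for this sketch; the magic
constant is the one the kernel's spinlock debugging uses. The kernel prints a
warning and carries on, whereas the sketch raises so the problem is visible.

```python
SPINLOCK_MAGIC = 0xDEAD4EAD  # value used by the kernel's spinlock debugging


class DebugSpinlock:
    """Stand-in for a lock embedded in zero-filled memory."""

    def __init__(self):
        self.magic = 0       # never initialised: all fields are zero
        self.locked = False


def spin_lock_init(lock):
    """Mark the lock as properly set up."""
    lock.magic = SPINLOCK_MAGIC
    lock.locked = False


def debug_spin_lock(lock):
    """Take the lock, complaining if it was never initialised."""
    if lock.magic != SPINLOCK_MAGIC:
        raise RuntimeError(f"spinlock bad magic, .magic: {lock.magic:08x}")
    lock.locked = True


class Group:
    """Hypothetical stats group; `cgroup_enabled` models the CFQ option."""

    def __init__(self, cgroup_enabled):
        self.stats_lock = DebugSpinlock()
        if cgroup_enabled:
            spin_lock_init(self.stats_lock)  # only this path sets the lock up


def update_io_add_stats(group):
    """The 'real' update function: it always takes the stats lock."""
    debug_spin_lock(group.stats_lock)
    group.stats_lock.locked = False


update_io_add_stats(Group(cgroup_enabled=True))       # fine
try:
    update_io_add_stats(Group(cgroup_enabled=False))  # lock never set up
except RuntimeError as err:
    print(err)
```

Either direction would remove the mismatch in this model: initialise the lock
unconditionally, or make the update a no-op when the option is off. Which of
the two the eventual patch takes is not stated in this message.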

--
Jens Axboe


2010-06-09 08:54:05

by Rafael J. Wysocki

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Wednesday 09 June 2010, Jens Axboe wrote:
> On 2010-06-09 03:53, Linus Torvalds wrote:
> >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >> Submitter : Jan Kreuzer <[email protected]>
> >> Date : 2010-06-05 06:15 (4 days old)
> >
> > This seems to have been introduced by
> >
> > commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> > Author: Ingo Molnar <[email protected]>
> > Date: Sat Nov 8 17:05:38 2008 +0100
> >
> > sched: optimize sched_clock() a bit
> >
> > sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> > variant of __cycles_2_ns().
> >
> > Most of the time sched_clock() is called with irqs disabled already.
> > The few places that call it with irqs enabled need to be updated.
> >
> > Signed-off-by: Ingo Molnar <[email protected]>
> >
> > and this seems to be one of those calling cases that need to be updated.
> >
> > Ingo? The call trace is:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> > caller is native_sched_clock+0x3c/0x68
> > Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> > Call Trace:
> > [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> > [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> > [<ffffffff8101043d>] sched_clock+0x9/0xd
> > [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> > [<ffffffff81214d71>] get_request+0x1c4/0x2d0
> > [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> > [<ffffffff81215537>] __make_request+0x338/0x45b
> > [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> > [<ffffffff81214909>] submit_bio+0xd2/0xef
> > [<ffffffff811413cb>] submit_bh+0xf4/0x116
> > [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> > [<ffffffff81144875>] block_write_full_page+0x15/0x17
> > [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> > [<ffffffff810e1f91>] __writepage+0x1a/0x39
> > [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> > [<ffffffff810e3406>] generic_writepages+0x27/0x29
> > [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> > [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> > [<ffffffff811d0cba>] kjournald2+0x147/0x37a
> >
> > (from the bugzilla thing)
>
> This should be fixed by commit 28f4197e which was merged on friday.

Thanks, closed.

Rafael

2010-06-11 08:34:28

by Ingo Molnar

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34


* Jens Axboe <[email protected]> wrote:

> On 2010-06-09 03:53, Linus Torvalds wrote:
> >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >> Submitter : Jan Kreuzer <[email protected]>
> >> Date : 2010-06-05 06:15 (4 days old)
> >
> > This seems to have been introduced by
> >
> > commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> > Author: Ingo Molnar <[email protected]>
> > Date: Sat Nov 8 17:05:38 2008 +0100
> >
> > sched: optimize sched_clock() a bit
> >
> > sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> > variant of __cycles_2_ns().
> >
> > Most of the time sched_clock() is called with irqs disabled already.
> > The few places that call it with irqs enabled need to be updated.
> >
> > Signed-off-by: Ingo Molnar <[email protected]>
> >
> > and this seems to be one of those calling cases that need to be updated.
> >
> > Ingo? The call trace is:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> > caller is native_sched_clock+0x3c/0x68
> > Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> > Call Trace:
> > [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> > [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> > [<ffffffff8101043d>] sched_clock+0x9/0xd
> > [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> > [<ffffffff81214d71>] get_request+0x1c4/0x2d0
> > [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> > [<ffffffff81215537>] __make_request+0x338/0x45b
> > [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> > [<ffffffff81214909>] submit_bio+0xd2/0xef
> > [<ffffffff811413cb>] submit_bh+0xf4/0x116
> > [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> > [<ffffffff81144875>] block_write_full_page+0x15/0x17
> > [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> > [<ffffffff810e1f91>] __writepage+0x1a/0x39
> > [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> > [<ffffffff810e3406>] generic_writepages+0x27/0x29
> > [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> > [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> > [<ffffffff811d0cba>] kjournald2+0x147/0x37a
> >
> > (from the bugzilla thing)
>
> This should be fixed by commit 28f4197e which was merged on friday.

Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some
configs I get bad spinlock warnings during bootup:

[ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
[ 28.972003] calling b44_init+0x0/0x55 @ 1
[ 28.976009] bus: 'pci': add driver b44
[ 28.976374] sda:
[ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
[ 28.980000] lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
[ 28.980000] Call Trace:
[ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
[ 28.980000] [<4134b7b7>] spin_bug+0x7c/0x87
[ 28.980000] [<4134b853>] do_raw_spin_lock+0x1e/0x123
[ 28.980000] [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
[ 28.980000] [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
[ 28.980000] [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
[ 28.980000] [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
[ 28.980000] [<41337bc7>] cfq_insert_request+0x8c/0x425
[ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
[ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
[ 28.980000] [<41329225>] elv_insert+0x107/0x1a0
[ 28.980000] [<41329354>] __elv_add_request+0x96/0x9d
[ 28.980000] [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
[ 28.980000] [<4132dd64>] __make_request+0x335/0x376
[ 28.980000] [<4132c726>] generic_make_request+0x336/0x39d
[ 28.980000] [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
[ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
[ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
[ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
[ 28.980000] [<41089347>] ? mempool_alloc+0x57/0xe2
[ 28.980000] [<4132c804>] submit_bio+0x77/0x8f
[ 28.980000] [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
[ 28.980000] [<410ceb90>] submit_bh+0xc3/0xe2
[ 28.980000] [<410d1474>] block_read_full_page+0x249/0x259
[ 28.980000] [<410d31fb>] ? blkdev_get_block+0x0/0xc6
[ 28.980000] [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
[ 28.980000] [<410d3d92>] blkdev_readpage+0xf/0x11
[ 28.980000] [<41088823>] do_read_cache_page+0x7d/0x11a
[ 28.980000] [<410d3d83>] ? blkdev_readpage+0x0/0x11
[ 28.980000] [<410888f4>] read_cache_page_async+0x16/0x1b
[ 28.980000] [<41088904>] read_cache_page+0xb/0x12
[ 28.980000] [<410e80e1>] read_dev_sector+0x2a/0x63
[ 28.980000] [<410e92e8>] adfspart_check_ICS+0x2e/0x166
[ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
[ 28.980000] [<410e8d23>] rescan_partitions+0x196/0x3e4
[ 28.980000] [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
[ 28.980000] [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
[ 28.980000] [<410d4277>] __blkdev_get+0x1e7/0x292
[ 28.980000] [<4133a201>] ? kobject_put+0x14/0x16
[ 28.980000] [<410d432c>] blkdev_get+0xa/0xc
[ 28.980000] [<410e81fb>] register_disk+0x94/0xe5
[ 28.980000] [<413326c6>] ? blk_register_region+0x1b/0x20
[ 28.980000] [<41332815>] add_disk+0x57/0x95
[ 28.980000] [<41331fc6>] ? exact_match+0x0/0x8
[ 28.980000] [<4133233f>] ? exact_lock+0x0/0x11
[ 28.980000] [<41643848>] sd_probe_async+0x108/0x1be
[ 28.980000] [<41048865>] async_thread+0xf5/0x1e6
[ 28.980000] [<4102cbcb>] ? default_wake_function+0x0/0xd
[ 28.980000] [<41048770>] ? async_thread+0x0/0x1e6
[ 28.980000] [<410433df>] kthread+0x5f/0x64
[ 28.980000] [<41043380>] ? kthread+0x0/0x64
[ 28.980000] [<41002cc6>] kernel_thread_helper+0x6/0x10
[ 29.264071] async/1 used greatest stack depth: 2336 bytes left
[ 29.267020] bus: 'ssb': add driver b44
[ 29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
[ 29.267076] calling init_nic+0x0/0x16 @ 1

Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
attached. Reproducible on that sha1 and with that config.
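
For long traces like the one above, a small script can pick out the frame
that matters. The sketch below assumes the "[<addr>] func+0xoff/0xsize" frame
layout seen in this log and treats frames marked with "?" as unreliable
(addresses found on the stack that the unwinder could not confirm as part of
the call chain). `reliable_frames` and `first_caller_outside` are
illustrative helpers, not existing tools.

```python
import re

# One stack frame: "[<41ba6d55>] ? printk+0x20/0x24"
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

TRACE = """\
[ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
[ 28.980000] [<4134b7b7>] spin_bug+0x7c/0x87
[ 28.980000] [<4134b853>] do_raw_spin_lock+0x1e/0x123
[ 28.980000] [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
[ 28.980000] [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
[ 28.980000] [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
[ 28.980000] [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
[ 28.980000] [<41337bc7>] cfq_insert_request+0x8c/0x425
"""


def reliable_frames(log):
    """Return function names of frames not marked with '?', innermost first."""
    return [m["func"] for m in FRAME.finditer(log) if not m["guess"]]


def first_caller_outside(log, prefixes=("spin_", "do_raw_spin_", "_raw_spin_")):
    """First reliable frame that is not part of the locking machinery."""
    for name in reliable_frames(log):
        if not name.startswith(prefixes):
            return name
    return None


print(first_caller_outside(TRACE))
```

On the excerpt used here the helper lands on blkiocg_update_io_add_stats,
the function named in the paragraph above.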

Thanks,

Ingo


Attachments:
config (71.82 kB)
boot.log (535.12 kB)

2010-06-09 09:02:41

by Sedat Dilek

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

The patch from [1] is still missing.

"cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
Dmitry Monakhov

Tested-by: Sedat Dilek <[email protected]>
Tested-by: Maciej Rutecki <[email protected]>

I have already reported this issue on LKML [2] and cpufreq ML [3].

- Sedat -

[1] http://www.spinics.net/lists/cpufreq/msg01631.html
[2] http://lkml.org/lkml/2010/5/31/77
[3] http://www.spinics.net/lists/cpufreq/msg01637.html


2010-06-09 01:54:38

by Linus Torvalds

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34



[ Added lots of cc's to direct specific people to look at the regressions
that may or may not be theirs... ]

On Wed, 9 Jun 2010, Rafael J. Wysocki wrote:
>
> * Quite a few of the already reported regressions may be related to the bug
> fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
> bug in scrup, vt.c"), so reporters please retest with this commit applied.]

From a quick look, most of them look unrelated to that unfortunate bug.
It's hard to tell for sure, of course (memory corruption can do pretty
much anything), but I'm not seeing the 07200720.. pattern at least.
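
Checking an oops dump for that pattern is easy to script. The sketch below
assumes the pattern comes from 16-bit text-console blank cells (0x0720: a
space, 0x20, with attribute 0x07) written over unrelated memory, so that a
32-bit dump shows 07200720; that reading, the helper names and the sample
strings are all assumptions made for illustration.

```python
import re

BLANK_CELL = 0x0720  # assumed: space (0x20) with default attribute (0x07)


def words_from_dump(text):
    """Pull bare 8-hex-digit words out of a register or stack dump."""
    return [int(h, 16) for h in re.findall(r"\b[0-9a-fA-F]{8}\b", text)]


def blank_cell_run(words, min_run=2):
    """True if `words` has `min_run` consecutive values equal to 0x07200720."""
    pattern = (BLANK_CELL << 16) | BLANK_CELL
    run = 0
    for w in words:
        run = run + 1 if w == pattern else 0
        if run >= min_run:
            return True
    return False


suspicious = "Stack: 07200720 07200720 c0123456"
print(blank_cell_run(words_from_dump(suspicious)))
```

A single matching word is not treated as a hit, since one coincidental value
is much weaker evidence than a run of them; `min_run` can be raised to be
stricter.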

And some of them do seem to be bisected to likely culprits and/or have
patches that are claimed to have fixed them.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16163
> Subject : [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> Submitter : David John <[email protected]>
> Date : 2010-06-02 12:52 (7 days old)
> Message-ID : <[email protected]>
> References : http://marc.info/?l=linux-kernel&m=127548313828613&w=2

That has a "reverting the commit fixes it", and a confirmation from Nick
Bowler.

Eric, Carl: should I just revert that commit? Or do you have a fix?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16145
> Subject : Unable to boot after "ACPI: Don't let acpi_pad needlessly mark TSC unstable"
> Submitter : Tom Gundersen <[email protected]>
> Date : 2010-06-07 13:11 (2 days old)
> Handled-By : Venkatesh Pallipadi <[email protected]>

Hmm. This does seem to be a properly bisected commit, but at the same time
it looks from the bugzilla like it's just pure luck on that machine that
the acpi_pad driver happened to mark TSC unstable - so while the commit
bisected is the real one, it's not the "deeper reason" for the problem.

Venki, any updates?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> Submitter : Jan Kreuzer <[email protected]>
> Date : 2010-06-05 06:15 (4 days old)

This seems to have been introduced by

commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
Author: Ingo Molnar <[email protected]>
Date: Sat Nov 8 17:05:38 2008 +0100

sched: optimize sched_clock() a bit

sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
variant of __cycles_2_ns().

Most of the time sched_clock() is called with irqs disabled already.
The few places that call it with irqs enabled need to be updated.

Signed-off-by: Ingo Molnar <[email protected]>

and this seems to be one of those calling cases that need to be updated.

Ingo? The call trace is:

BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
caller is native_sched_clock+0x3c/0x68
Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
Call Trace:
[<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
[<ffffffff8101059d>] native_sched_clock+0x3c/0x68
[<ffffffff8101043d>] sched_clock+0x9/0xd
[<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
[<ffffffff81214d71>] get_request+0x1c4/0x2d0
[<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
[<ffffffff81215537>] __make_request+0x338/0x45b
[<ffffffff812147c2>] generic_make_request+0x2bb/0x330
[<ffffffff81214909>] submit_bio+0xd2/0xef
[<ffffffff811413cb>] submit_bh+0xf4/0x116
[<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
[<ffffffff81144875>] block_write_full_page+0x15/0x17
[<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
[<ffffffff810e1f91>] __writepage+0x1a/0x39
[<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
[<ffffffff810e3406>] generic_writepages+0x27/0x29
[<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
[<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
[<ffffffff811d0cba>] kjournald2+0x147/0x37a

(from the bugzilla thing)
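
For readers unfamiliar with this class of warning, here is a minimal user-space sketch of the idea behind the check, under a simplified model. The names `model_ctx` and `cpu_id_read_is_safe` are made up for illustration and are not kernel APIs. The assumption is that a "which CPU am I on?" read is only meaningful while the task cannot migrate, that is, while preemption or local interrupts are disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the execution context; not a kernel structure. */
struct model_ctx {
    unsigned int preempt_count; /* non-zero: preemption is disabled */
    bool irqs_disabled;         /* true: local interrupts are off */
};

/*
 * Returns true when reading the current CPU number is stable, i.e. the
 * task cannot be moved to another CPU between the read and its use.
 */
static bool cpu_id_read_is_safe(const struct model_ctx *ctx)
{
    assert(ctx != NULL);
    return ctx->preempt_count != 0 || ctx->irqs_disabled;
}
```

In this model the trace above corresponds to a context with a zero preempt count and interrupts enabled, for which the function returns false.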

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16122
> Subject : 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
> Submitter : Larry Finger <[email protected]>
> Date : 2010-06-04 13:18 (5 days old)

Jens?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16120
> Subject : Oops: 0000 [#1] SMP, unable to handle kernel NULL pointer dereference at (null)
> Submitter : Alex Zhavnerchik <[email protected]>
> Date : 2010-06-04 09:25 (5 days old)
> Handled-By : Eric Dumazet <[email protected]>

This one seems to have a patch in bugzilla.

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16104
> Subject : Radeon KMS does not start after merge of the new PM-Code
> Submitter : Jan Kreuzer <[email protected]>
> Date : 2010-06-02 07:47 (7 days old)

This one also has a patch in Bugzilla. I think Airlie is just waiting for
his queue to calm down, and has already removed the dependency on the
temperature code. DaveA?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16161
> Subject : [2.6.35-rc1 regression] sysfs: cannot create duplicate filename ... XVR-600 related?
> Submitter : Mikael Pettersson <[email protected]>
> Date : 2010-06-01 19:57 (8 days old)
> Message-ID : <[email protected]>
> References : http://marc.info/?l=linux-kernel&m=127542227511925&w=2
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16090
> Subject : sysfs: cannot create duplicate filename
> Submitter : Tobias <[email protected]>
> Date : 2010-06-01 15:59 (8 days old)
> Handled-By : Jesse Barnes <[email protected]>

These two look related. Both are related to that "slot" thing in sysfs
PCI, but only one of them is marked as Jesse's. Jesse?

> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16037
> Subject : NULL Pointer dereference in __ir_input_register/budget_ci_attach
> Submitter : Sean Finney <[email protected]>
> Date : 2010-05-23 19:52 (17 days old)

Perhaps related to commit 13c24497086418010bf4f76378bcae241d7f59cd?

David Härdeman, Mauro Carvalho Chehab added to cc.

Linus

2010-06-09 09:05:10

by Rafael J. Wysocki

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Wednesday 09 June 2010, Linus Torvalds wrote:
>
> [ Added lots of cc's to direct specific people to look at the regressions
> that may or may not be theirs... ]
>
> On Wed, 9 Jun 2010, Rafael J. Wysocki wrote:
> >
> > * Quite a few of the already reported regressions may be related to the bug
> > fixed by 386f40c86d6c8d5b717ef20620af1a750d0dacb4 (Revert "tty: fix a little
> > bug in scrup, vt.c"), so reporters please retest with this commit applied.]
>
> From a quick look, most of them look unrelated to that unfortunate bug.
> It's hard to tell for sure, of course (memory corruption can do pretty
> much anything), but I'm not seeing the 07200720.. pattern at least.
>
> And some of them do seem to be bisected to likely culprits and/or have
> patches that are claimed to have fixed them.
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16163
> > Subject : [2.6.35-rc1 Regression] i915: Commit cfecde causes VGA to stay off
> > Submitter : David John <[email protected]>
> > Date : 2010-06-02 12:52 (7 days old)
> > Message-ID : <[email protected]>
> > References : http://marc.info/?l=linux-kernel&m=127548313828613&w=2
>
> That has a "reverting the commit fixes it", and a confirmation from Nick
> Bowler.
>
> Eric, Carl: should I just revert that commit? Or do you have a fix?

This one is reported to have been fixed already. Closed now.

...
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16104
> > Subject : Radeon KMS does not start after merge of the new PM-Code
> > Submitter : Jan Kreuzer <[email protected]>
> > Date : 2010-06-02 07:47 (7 days old)
>
> This one also has a patch in Bugzilla. I think Airlie is just waiting for
> his queue to calm down, and has already removed the dependency on the
> temperature code. DaveA?

This should be fixed by commit f8ed8b4c5d30b5214f185997131b06e35f6f7113, so
closing now.

Rafael

2010-06-09 06:36:20

by Eric Dumazet

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Tuesday, 08 June 2010 at 18:53 -0700, Linus Torvalds wrote:

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16120
> > Subject : Oops: 0000 [#1] SMP, unable to handle kernel NULL pointer dereference at (null)
> > Submitter : Alex Zhavnerchik <[email protected]>
> > Date : 2010-06-04 09:25 (5 days old)
> > Handled-By : Eric Dumazet <[email protected]>
>
> This one seems to have a patch in bugzilla.

Yep, commit 035320d54758e21227987e3aae0d46e7a04f4ddc in David's net-2.6
tree; it'll be included in his next pull request.

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=035320d54758e21227987e3aae0d46e7a04f4ddc

Thanks



2010-06-16 20:43:21

by Andrew Morton

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Wed, 9 Jun 2010 11:22:35 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wednesday 09 June 2010, Sedat Dilek wrote:
> > The patch from [1] is still missing.
> >
> > "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
> > Dmitry Monakhov
> >
> > Tested-by: Sedat Dilek <[email protected]>
> > Tested-by: Maciej Rutecki <[email protected]>
> >
> > I have already reported this issue on LKML [2] and cpufreq ML [3].
> >
> > - Sedat -
> >
> > [1] http://www.spinics.net/lists/cpufreq/msg01631.html
> > [2] http://lkml.org/lkml/2010/5/31/77
> > [3] http://www.spinics.net/lists/cpufreq/msg01637.html
>
> Thanks, added.

I just merged a different patch which should address this:


From: Sergey Senozhatsky <[email protected]>

Fix

BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
caller is nr_iowait_cpu+0xe/0x1e
Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
Call Trace:
[<c1184c55>] debug_smp_processor_id+0xa5/0xbc
[<c10282a5>] nr_iowait_cpu+0xe/0x1e
[<c104ab7c>] update_ts_time_stats+0x32/0x6c
[<c104ac73>] get_cpu_idle_time_us+0x36/0x58
[<c124229b>] get_cpu_idle_time+0x12/0x74
[<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
[<c1240437>] __cpufreq_governor+0x51/0x85
[<c1241190>] __cpufreq_set_policy+0x10c/0x13d
[<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
[<c1241b1e>] ? handle_update+0x0/0xd
[<c1241a18>] cpufreq_add_dev+0x34b/0x35a
[<c103c973>] ? schedule_delayed_work_on+0x11/0x13
[<c12c14db>] cpufreq_cpu_callback+0x59/0x63
[<c1042f39>] notifier_call_chain+0x26/0x48
[<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
[<c102efb9>] __cpu_notify+0x15/0x29
[<c102efda>] cpu_notify+0xd/0xf
[<c12bfb30>] _cpu_up+0xaf/0xd2
[<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
[<c1055eef>] hibernation_snapshot+0x104/0x1a2
[<c1058b49>] snapshot_ioctl+0x24b/0x53e
[<c1028ad1>] ? sub_preempt_count+0x7c/0x89
[<c10ab91d>] vfs_ioctl+0x2e/0x8c
[<c10588fe>] ? snapshot_ioctl+0x0/0x53e
[<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
[<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
[<c11e9dc3>] ? tty_write+0x0/0x1d0
[<c10a12d6>] ? vfs_write+0xa2/0xda
[<c10ac333>] sys_ioctl+0x41/0x62
[<c10027d3>] sysenter_do_call+0x12/0x2d

The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".

This patch introduces nr_iowait_cpu(int cpu) and changes its callers accordingly.

akpm: addresses about 30,000,000 different bug reports.

Signed-off-by: Sergey Senozhatsky <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

drivers/cpuidle/governors/menu.c | 10 ++++++++--
include/linux/sched.h | 2 +-
kernel/sched.c | 4 ++--
kernel/time/tick-sched.c | 4 +++-
4 files changed, 14 insertions(+), 6 deletions(-)

diff -puN drivers/cpuidle/governors/menu.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu drivers/cpuidle/governors/menu.c
--- a/drivers/cpuidle/governors/menu.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/drivers/cpuidle/governors/menu.c
@@ -137,15 +137,18 @@ static inline int which_bucket(unsigned
{
int bucket = 0;

+ int cpu = get_cpu();
/*
* We keep two groups of stats; one with no
* IO pending, one without.
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(cpu))
bucket = BUCKETS/2;

+ put_cpu();
+
if (duration < 10)
return bucket;
if (duration < 100)
@@ -169,13 +172,16 @@ static inline int which_bucket(unsigned
static inline int performance_multiplier(void)
{
int mult = 1;
+ int cpu = get_cpu();

/* for higher loadavg, we are more reluctant */

mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(cpu);
+
+ put_cpu();

return mult;
}
diff -puN include/linux/sched.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu include/linux/sched.h
--- a/include/linux/sched.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff -puN kernel/sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu kernel/sched.c
--- a/kernel/sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff -puN kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
+++ a/kernel/time/tick-sched.c
@@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *
ktime_t delta;

if (ts->idle_active) {
+ int cpu = get_cpu();
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ put_cpu();
ts->idle_entrytime = now;
}

_
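
The interface change in this patch can be illustrated with a small user-space model, assuming a fixed array of per-CPU counters. `MODEL_NR_CPUS`, `model_rq`, `model_runqueues` and `model_nr_iowait_cpu` are hypothetical names for the sketch, not the kernel's definitions. The point is that the callee no longer asks which CPU it is running on; the caller names the CPU whose counter it wants.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4 /* assumed CPU count for the sketch */

/* Hypothetical stand-in for a per-CPU runqueue; only the field we need. */
struct model_rq {
    unsigned long nr_iowait;
};

/* Zero-initialized because it has static storage duration. */
static struct model_rq model_runqueues[MODEL_NR_CPUS];

/* Look up the iowait count of an explicitly named CPU. */
static unsigned long model_nr_iowait_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_runqueues[cpu].nr_iowait;
}
```

In the patch itself, the callers bracket the call with get_cpu()/put_cpu(), so preemption stays disabled while the CPU number is in use.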


2010-06-09 07:53:23

by Jens Axboe

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On 2010-06-09 03:53, Linus Torvalds wrote:
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
>> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
>> Submitter : Jan Kreuzer <[email protected]>
>> Date : 2010-06-05 06:15 (4 days old)
>
> This seems to have been introduced by
>
> commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> Author: Ingo Molnar <[email protected]>
> Date: Sat Nov 8 17:05:38 2008 +0100
>
> sched: optimize sched_clock() a bit
>
> sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> variant of __cycles_2_ns().
>
> Most of the time sched_clock() is called with irqs disabled already.
> The few places that call it with irqs enabled need to be updated.
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> and this seems to be one of those calling cases that need to be updated.
>
> Ingo? The call trace is:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> caller is native_sched_clock+0x3c/0x68
> Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> Call Trace:
> [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> [<ffffffff8101043d>] sched_clock+0x9/0xd
> [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> [<ffffffff81214d71>] get_request+0x1c4/0x2d0
> [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> [<ffffffff81215537>] __make_request+0x338/0x45b
> [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> [<ffffffff81214909>] submit_bio+0xd2/0xef
> [<ffffffff811413cb>] submit_bh+0xf4/0x116
> [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> [<ffffffff81144875>] block_write_full_page+0x15/0x17
> [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> [<ffffffff810e1f91>] __writepage+0x1a/0x39
> [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> [<ffffffff810e3406>] generic_writepages+0x27/0x29
> [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> [<ffffffff811d0cba>] kjournald2+0x147/0x37a
>
> (from the bugzilla thing)

This should be fixed by commit 28f4197e which was merged on Friday.

>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16122
>> Subject : 2.6.35-rc1: WARNING at fs/fs-writeback.c:1142 __mark_inode_dirty+0x103/0x170
>> Submitter : Larry Finger <[email protected]>
>> Date : 2010-06-04 13:18 (5 days old)
>
> Jens?

Looking into this one.

--
Jens Axboe


2010-06-11 08:55:50

by Ingo Molnar

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34


* Jens Axboe <[email protected]> wrote:

> On 2010-06-11 10:32, Ingo Molnar wrote:
> >
> > * Jens Axboe <[email protected]> wrote:
> >
> >> On 2010-06-09 03:53, Linus Torvalds wrote:
> >>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >>>> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >>>> Submitter : Jan Kreuzer <[email protected]>
> >>>> Date : 2010-06-05 06:15 (4 days old)
> >>>
> >>> This seems to have been introduced by
> >>>
> >>> commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> >>> Author: Ingo Molnar <[email protected]>
> >>> Date: Sat Nov 8 17:05:38 2008 +0100
> >>>
> >>> sched: optimize sched_clock() a bit
> >>>
> >>> sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> >>> variant of __cycles_2_ns().
> >>>
> >>> Most of the time sched_clock() is called with irqs disabled already.
> >>> The few places that call it with irqs enabled need to be updated.
> >>>
> >>> Signed-off-by: Ingo Molnar <[email protected]>
> >>>
> >>> and this seems to be one of those calling cases that need to be updated..
> >>>
> >>> Ingo? The call trace is:
> >>>
> >>> BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> >>> caller is native_sched_clock+0x3c/0x68
> >>> Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> >>> Call Trace:
> >>> [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> >>> [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> >>> [<ffffffff8101043d>] sched_clock+0x9/0xd
> >>> [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> >>> [<ffffffff81214d71>] get_request+0x1c4/0x2d0
> >>> [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> >>> [<ffffffff81215537>] __make_request+0x338/0x45b
> >>> [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> >>> [<ffffffff81214909>] submit_bio+0xd2/0xef
> >>> [<ffffffff811413cb>] submit_bh+0xf4/0x116
> >>> [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> >>> [<ffffffff81144875>] block_write_full_page+0x15/0x17
> >>> [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> >>> [<ffffffff810e1f91>] __writepage+0x1a/0x39
> >>> [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> >>> [<ffffffff810e3406>] generic_writepages+0x27/0x29
> >>> [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> >>> [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> >>> [<ffffffff811d0cba>] kjournald2+0x147/0x37a
> >>>
> >>> (from the bugzilla thing)
> >>
> >> This should be fixed by commit 28f4197e which was merged on Friday.
> >
> > Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some
> > configs i get bad spinlock warnings during bootup:
> >
> > [ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
> > [ 28.972003] calling b44_init+0x0/0x55 @ 1
> > [ 28.976009] bus: 'pci': add driver b44
> > [ 28.976374] sda:
> > [ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
> > [ 28.980000] lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> > [ 28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
> > [ 28.980000] Call Trace:
> > [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
> > [ 28.980000] [<4134b7b7>] spin_bug+0x7c/0x87
> > [ 28.980000] [<4134b853>] do_raw_spin_lock+0x1e/0x123
> > [ 28.980000] [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
> > [ 28.980000] [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
> > [ 28.980000] [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
> > [ 28.980000] [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
> > [ 28.980000] [<41337bc7>] cfq_insert_request+0x8c/0x425
> > [ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> > [ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> > [ 28.980000] [<41329225>] elv_insert+0x107/0x1a0
> > [ 28.980000] [<41329354>] __elv_add_request+0x96/0x9d
> > [ 28.980000] [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
> > [ 28.980000] [<4132dd64>] __make_request+0x335/0x376
> > [ 28.980000] [<4132c726>] generic_make_request+0x336/0x39d
> > [ 28.980000] [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
> > [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
> > [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
> > [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
> > [ 28.980000] [<41089347>] ? mempool_alloc+0x57/0xe2
> > [ 28.980000] [<4132c804>] submit_bio+0x77/0x8f
> > [ 28.980000] [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
> > [ 28.980000] [<410ceb90>] submit_bh+0xc3/0xe2
> > [ 28.980000] [<410d1474>] block_read_full_page+0x249/0x259
> > [ 28.980000] [<410d31fb>] ? blkdev_get_block+0x0/0xc6
> > [ 28.980000] [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
> > [ 28.980000] [<410d3d92>] blkdev_readpage+0xf/0x11
> > [ 28.980000] [<41088823>] do_read_cache_page+0x7d/0x11a
> > [ 28.980000] [<410d3d83>] ? blkdev_readpage+0x0/0x11
> > [ 28.980000] [<410888f4>] read_cache_page_async+0x16/0x1b
> > [ 28.980000] [<41088904>] read_cache_page+0xb/0x12
> > [ 28.980000] [<410e80e1>] read_dev_sector+0x2a/0x63
> > [ 28.980000] [<410e92e8>] adfspart_check_ICS+0x2e/0x166
> > [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
> > [ 28.980000] [<410e8d23>] rescan_partitions+0x196/0x3e4
> > [ 28.980000] [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
> > [ 28.980000] [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
> > [ 28.980000] [<410d4277>] __blkdev_get+0x1e7/0x292
> > [ 28.980000] [<4133a201>] ? kobject_put+0x14/0x16
> > [ 28.980000] [<410d432c>] blkdev_get+0xa/0xc
> > [ 28.980000] [<410e81fb>] register_disk+0x94/0xe5
> > [ 28.980000] [<413326c6>] ? blk_register_region+0x1b/0x20
> > [ 28.980000] [<41332815>] add_disk+0x57/0x95
> > [ 28.980000] [<41331fc6>] ? exact_match+0x0/0x8
> > [ 28.980000] [<4133233f>] ? exact_lock+0x0/0x11
> > [ 28.980000] [<41643848>] sd_probe_async+0x108/0x1be
> > [ 28.980000] [<41048865>] async_thread+0xf5/0x1e6
> > [ 28.980000] [<4102cbcb>] ? default_wake_function+0x0/0xd
> > [ 28.980000] [<41048770>] ? async_thread+0x0/0x1e6
> > [ 28.980000] [<410433df>] kthread+0x5f/0x64
> > [ 28.980000] [<41043380>] ? kthread+0x0/0x64
> > [ 28.980000] [<41002cc6>] kernel_thread_helper+0x6/0x10
> > [ 29.264071] async/1 used greatest stack depth: 2336 bytes left
> > [ 29.267020] bus: 'ssb': add driver b44
> > [ 29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
> > [ 29.267076] calling init_nic+0x0/0x16 @ 1
> >
> > Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
> > attached. Reproducible on that sha1 and with that config.
>
> I think I see it, the internal CFQ blkg groups are not properly
> initialized... Will send a patch shortly.

Cool - can test it with a short turnaround, the bug is easy to reproduce.
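
As background on the ".magic: 00000000" line in the trace above, here is a minimal user-space sketch of how a debug marker can expose a lock that was never initialized. All names (`model_lock`, `model_lock_init`, `model_lock_magic_ok`, `MODEL_LOCK_MAGIC`) are hypothetical, and the marker value is just a constant picked for the sketch; this is a model of the technique, not the kernel's spinlock debugging code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LOCK_MAGIC 0xdead4eadu /* marker written by the init helper */

/* Hypothetical lock with a debug marker; not the kernel's spinlock_t. */
struct model_lock {
    unsigned int magic;
    int locked;
};

/* Proper initialization stamps the marker into the lock. */
static void model_lock_init(struct model_lock *l)
{
    assert(l != NULL);
    l->magic = MODEL_LOCK_MAGIC;
    l->locked = 0;
}

/* A lock that never went through model_lock_init() fails this check. */
static bool model_lock_magic_ok(const struct model_lock *l)
{
    assert(l != NULL);
    return l->magic == MODEL_LOCK_MAGIC;
}
```

A zero-filled structure, such as one embedded in memory that was allocated and cleared but never passed to the init helper, has a marker of 0 and is rejected by the check.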

Thanks,

Ingo

2010-06-16 21:35:32

by Andrew Morton

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Wed, 16 Jun 2010 23:00:37 +0200
Sedat Dilek <[email protected]> wrote:

> On Wed, Jun 16, 2010 at 10:42 PM, Andrew Morton
> <[email protected]> wrote:
> > On Wed, 9 Jun 2010 11:22:35 +0200
> > "Rafael J. Wysocki" <[email protected]> wrote:
> >
> >> On Wednesday 09 June 2010, Sedat Dilek wrote:
> >> > The patch from [1] is still missing.
> >> >
> >> > "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
> >> > Dmitry Monakhov
> >> >
> >> > Tested-by: Sedat Dilek <[email protected]>
> >> > Tested-by: Maciej Rutecki <[email protected]>
> >> >
> >> > I have already reported this issue on LKML [2] and cpufreq ML [3].
> >> >
> >> > - Sedat -
> >> >
> >> > [1] http://www.spinics.net/lists/cpufreq/msg01631.html
> >> > [2] http://lkml.org/lkml/2010/5/31/77
> >> > [3] http://www.spinics.net/lists/cpufreq/msg01637.html
> >>
> >> Thanks, added.
> >
> > I just merged a different patch which should address this:
>
> How does cpufreq-related stuff find its way into mainline?
> Is there a GIT repository/branch on <git.kernel.org> where you can pull from?
>

(top-posting repaired. Please don't)

Usually via the cpufreq git tree, mailing list and maintainer, as
described in ./MAINTAINERS.

But for a patch like this one, I'll just scoot it into mainline unless
Dave happens to grab it before I do that.


2010-06-09 05:35:05

by Ingo Molnar

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34


* Linus Torvalds <[email protected]> wrote:

> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> > Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> > Submitter : Jan Kreuzer <[email protected]>
> > Date : 2010-06-05 06:15 (4 days old)
>
> This seems to have been introduced by
>
> commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> Author: Ingo Molnar <[email protected]>
> Date: Sat Nov 8 17:05:38 2008 +0100
>
> sched: optimize sched_clock() a bit
>
> sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> variant of __cycles_2_ns().
>
> Most of the time sched_clock() is called with irqs disabled already.
> The few places that call it with irqs enabled need to be updated.
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> and this seems to be one of those calling cases that need to be updated.

That's a commit from 2008.

> Ingo? The call trace is:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> caller is native_sched_clock+0x3c/0x68
> Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> Call Trace:
> [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> [<ffffffff8101043d>] sched_clock+0x9/0xd
> [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> [<ffffffff81214d71>] get_request+0x1c4/0x2d0
> [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> [<ffffffff81215537>] __make_request+0x338/0x45b
> [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> [<ffffffff81214909>] submit_bio+0xd2/0xef
> [<ffffffff811413cb>] submit_bh+0xf4/0x116
> [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> [<ffffffff81144875>] block_write_full_page+0x15/0x17
> [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> [<ffffffff810e1f91>] __writepage+0x1a/0x39
> [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> [<ffffffff810e3406>] generic_writepages+0x27/0x29
> [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> [<ffffffff811d0cba>] kjournald2+0x147/0x37a
>
> (from the bugzilla thing)

The warning was introduced by this fresh commit (and a followup commit) merged
in the .35 merge window:

| commit 9195291e5f05e01d67f9a09c756b8aca8f009089
| Author: Divyesh Shah <[email protected]>
| Date: Thu Apr 1 15:01:41 2010 -0700
|
| blkio: Increment the blkio cgroup stats for real now

IIRC Jens posted a fix for the regression. Jens, what's the status of that?

The code there started using a raw sched_clock() call for block statistics
purposes, which was a poorly thought out (and buggy) approach:

- it takes timestamps on different CPUs and then compares them, but doesn't
consider that sched_clock() is not comparable between CPUs without extra
care

- it doesn't consider the possibility of the sched_clock() result going
backwards on certain platforms (such as x86)

- it doesn't consider preemptibility

(There's work ongoing to add a clock variant that can be used for such
purposes, but that's .36 fodder.)
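
To make the "going backwards" hazard concrete, here is a minimal user-space sketch, assuming timestamps are plain unsigned 64-bit nanosecond values. The helper name `model_elapsed_ns` is hypothetical and this is not what the block layer does; it only shows why an unguarded subtraction of two raw timestamps is risky when the second one can be smaller than the first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed time between two raw timestamps, clamped at zero so that a
 * clock that stepped backwards (or a pair of timestamps taken on CPUs
 * whose clocks disagree) does not yield a huge wrapped-around value.
 */
static uint64_t model_elapsed_ns(uint64_t start_ns, uint64_t end_ns)
{
    uint64_t delta;

    if (end_ns < start_ns)
        return 0;

    delta = end_ns - start_ns;
    assert(delta <= end_ns); /* the subtraction did not wrap */
    return delta;
}
```

Clamping only hides the symptom in statistics; it does not make timestamps from different CPUs comparable, which is the first point in the list above.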

Thanks,

Ingo

2010-06-09 08:58:59

by Rafael J. Wysocki

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Wednesday 09 June 2010, Mauro Carvalho Chehab wrote:
> On 08-06-2010 22:53, Linus Torvalds wrote:
>
> >
> >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16037
> >> Subject : NULL Pointer dereference in __ir_input_register/budget_ci_attach
> >> Submitter : Sean Finney <[email protected]>
> >> Date : 2010-05-23 19:52 (17 days old)
> >
> > Perhaps related to commit 13c24497086418010bf4f76378bcae241d7f59cd?
> >
> > David Härdeman, Mauro Carvalho Chehab added to cc.
>
> This patch probably solves the issue:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=84b14f181a36eea6591779156ef356f8d198bbfd
>
> The patch was already applied upstream. I've already asked the reporter to test it via BZ.

Confirmed fixed, so closing.

Thanks,
Rafael

2010-06-09 09:21:18

by Rafael J. Wysocki

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On Wednesday 09 June 2010, Sedat Dilek wrote:
> The patch from [1] is still missing.
>
> "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
> Dmitry Monakhov
>
> Tested-by: Sedat Dilek <[email protected]>
> Tested-by: Maciej Rutecki <[email protected]>
>
> I have already reported this issue on LKML [2] and cpufreq ML [3].
>
> - Sedat -
>
> [1] http://www.spinics.net/lists/cpufreq/msg01631.html
> [2] http://lkml.org/lkml/2010/5/31/77
> [3] http://www.spinics.net/lists/cpufreq/msg01637.html

Thanks, added.

Rafael

2010-06-09 02:27:25

by Mauro Carvalho Chehab

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On 08-06-2010 22:53, Linus Torvalds wrote:

>
>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16037
>> Subject : NULL Pointer dereference in __ir_input_register/budget_ci_attach
>> Submitter : Sean Finney <[email protected]>
>> Date : 2010-05-23 19:52 (17 days old)
>
> Perhaps related to commit 13c24497086418010bf4f76378bcae241d7f59cd?
>
> David Härdeman, Mauro Carvalho Chehab added to cc.

This patch probably solves the issue:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=84b14f181a36eea6591779156ef356f8d198bbfd

The patch was already applied upstream. I've already asked the reporter to test it via BZ.

Cheers,
Mauro

2010-06-09 09:39:26

by Jens Axboe

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On 2010-06-09 11:32, Ingo Molnar wrote:
>> This should be fixed by commit 28f4197e which was merged on friday.
>
> The scheduler commit adding local_clock() (for .36) is:
>
> c676329: sched_clock: Add local_clock() API and improve documentation
>
> So once that is upstream the block IO statistics code can use that.

Thanks, I'll have to make a note of that.

--
Jens Axboe


2010-06-09 09:33:25

by Ingo Molnar

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34


* Jens Axboe <[email protected]> wrote:

> On 2010-06-09 03:53, Linus Torvalds wrote:
> >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
> >> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
> >> Submitter : Jan Kreuzer <[email protected]>
> >> Date : 2010-06-05 06:15 (4 days old)
> >
> > This seems to have been introduced by
> >
> > commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
> > Author: Ingo Molnar <[email protected]>
> > Date: Sat Nov 8 17:05:38 2008 +0100
> >
> > sched: optimize sched_clock() a bit
> >
> > sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
> > variant of __cycles_2_ns().
> >
> > Most of the time sched_clock() is called with irqs disabled already.
> > The few places that call it with irqs enabled need to be updated.
> >
> > Signed-off-by: Ingo Molnar <[email protected]>
> >
> > and this seems to be one of those calling cases that need to be updated.
> >
> > Ingo? The call trace is:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
> > caller is native_sched_clock+0x3c/0x68
> > Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
> > Call Trace:
> > [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
> > [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
> > [<ffffffff8101043d>] sched_clock+0x9/0xd
> > [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
> > [<ffffffff81214d71>] get_request+0x1c4/0x2d0
> > [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
> > [<ffffffff81215537>] __make_request+0x338/0x45b
> > [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
> > [<ffffffff81214909>] submit_bio+0xd2/0xef
> > [<ffffffff811413cb>] submit_bh+0xf4/0x116
> > [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
> > [<ffffffff81144875>] block_write_full_page+0x15/0x17
> > [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
> > [<ffffffff810e1f91>] __writepage+0x1a/0x39
> > [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
> > [<ffffffff810e3406>] generic_writepages+0x27/0x29
> > [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
> > [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
> > [<ffffffff811d0cba>] kjournald2+0x147/0x37a
> >
> > (from the bugzilla thing)
>
> This should be fixed by commit 28f4197e which was merged on Friday.

The scheduler commit adding local_clock() (for .36) is:

c676329: sched_clock: Add local_clock() API and improve documentation

So once that is upstream the block IO statistics code can use that.

Thanks,

Ingo

2010-06-11 08:40:02

by Jens Axboe

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

On 2010-06-11 10:32, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
>> On 2010-06-09 03:53, Linus Torvalds wrote:
>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129
>>>> Subject : BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2
>>>> Submitter : Jan Kreuzer <[email protected]>
>>>> Date : 2010-06-05 06:15 (4 days old)
>>>
>>> This seems to have been introduced by
>>>
>>> commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
>>> Author: Ingo Molnar <[email protected]>
>>> Date: Sat Nov 8 17:05:38 2008 +0100
>>>
>>> sched: optimize sched_clock() a bit
>>>
>>> sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
>>> variant of __cycles_2_ns().
>>>
>>> Most of the time sched_clock() is called with irqs disabled already.
>>> The few places that call it with irqs enabled need to be updated.
>>>
>>> Signed-off-by: Ingo Molnar <[email protected]>
>>>
>>> and this seems to be one of those calling cases that need to be updated..
>>>
>>> Ingo? The call trace is:
>>>
>>> BUG: using smp_processor_id() in preemptible [00000000] code: jbd2/sda2-8/337
>>> caller is native_sched_clock+0x3c/0x68
>>> Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
>>> Call Trace:
>>> [<ffffffff812362c5>] debug_smp_processor_id+0xc9/0xe4
>>> [<ffffffff8101059d>] native_sched_clock+0x3c/0x68
>>> [<ffffffff8101043d>] sched_clock+0x9/0xd
>>> [<ffffffff81212d7a>] blk_rq_init+0x97/0xa3
>>> [<ffffffff81214d71>] get_request+0x1c4/0x2d0
>>> [<ffffffff81214ea6>] get_request_wait+0x29/0x1a6
>>> [<ffffffff81215537>] __make_request+0x338/0x45b
>>> [<ffffffff812147c2>] generic_make_request+0x2bb/0x330
>>> [<ffffffff81214909>] submit_bio+0xd2/0xef
>>> [<ffffffff811413cb>] submit_bh+0xf4/0x116
>>> [<ffffffff81144853>] block_write_full_page_endio+0x89/0x96
>>> [<ffffffff81144875>] block_write_full_page+0x15/0x17
>>> [<ffffffff8119b00a>] ext4_writepage+0x356/0x36b
>>> [<ffffffff810e1f91>] __writepage+0x1a/0x39
>>> [<ffffffff810e32a6>] write_cache_pages+0x20d/0x346
>>> [<ffffffff810e3406>] generic_writepages+0x27/0x29
>>> [<ffffffff811ca279>] journal_submit_data_buffers+0x110/0x17d
>>> [<ffffffff811ca986>] jbd2_journal_commit_transaction+0x4cb/0x156d
>>> [<ffffffff811d0cba>] kjournald2+0x147/0x37a
>>>
>>> (from the bugzilla thing)
>>
>> This should be fixed by commit 28f4197e which was merged on Friday.
>
> Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some
> configs I get bad spinlock warnings during bootup:
>
> [ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs
> [ 28.972003] calling b44_init+0x0/0x55 @ 1
> [ 28.976009] bus: 'pci': add driver b44
> [ 28.976374] sda:
> [ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
> [ 28.980000] lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> [ 28.980000] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
> [ 28.980000] Call Trace:
> [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
> [ 28.980000] [<4134b7b7>] spin_bug+0x7c/0x87
> [ 28.980000] [<4134b853>] do_raw_spin_lock+0x1e/0x123
> [ 28.980000] [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
> [ 28.980000] [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
> [ 28.980000] [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
> [ 28.980000] [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
> [ 28.980000] [<41337bc7>] cfq_insert_request+0x8c/0x425
> [ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> [ 28.980000] [<41ba9271>] ? _raw_spin_unlock_irqrestore+0x17/0x23
> [ 28.980000] [<41329225>] elv_insert+0x107/0x1a0
> [ 28.980000] [<41329354>] __elv_add_request+0x96/0x9d
> [ 28.980000] [<4132bb8c>] ? drive_stat_acct+0x9d/0xc6
> [ 28.980000] [<4132dd64>] __make_request+0x335/0x376
> [ 28.980000] [<4132c726>] generic_make_request+0x336/0x39d
> [ 28.980000] [<410ad422>] ? kmem_cache_alloc+0xa1/0x105
> [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
> [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
> [ 28.980000] [<41089285>] ? mempool_alloc_slab+0xe/0x10
> [ 28.980000] [<41089347>] ? mempool_alloc+0x57/0xe2
> [ 28.980000] [<4132c804>] submit_bio+0x77/0x8f
> [ 28.980000] [<410d2cbc>] ? bio_alloc_bioset+0x37/0x94
> [ 28.980000] [<410ceb90>] submit_bh+0xc3/0xe2
> [ 28.980000] [<410d1474>] block_read_full_page+0x249/0x259
> [ 28.980000] [<410d31fb>] ? blkdev_get_block+0x0/0xc6
> [ 28.980000] [<41087bfa>] ? add_to_page_cache_locked+0x94/0xb5
> [ 28.980000] [<410d3d92>] blkdev_readpage+0xf/0x11
> [ 28.980000] [<41088823>] do_read_cache_page+0x7d/0x11a
> [ 28.980000] [<410d3d83>] ? blkdev_readpage+0x0/0x11
> [ 28.980000] [<410888f4>] read_cache_page_async+0x16/0x1b
> [ 28.980000] [<41088904>] read_cache_page+0xb/0x12
> [ 28.980000] [<410e80e1>] read_dev_sector+0x2a/0x63
> [ 28.980000] [<410e92e8>] adfspart_check_ICS+0x2e/0x166
> [ 28.980000] [<41ba6d55>] ? printk+0x20/0x24
> [ 28.980000] [<410e8d23>] rescan_partitions+0x196/0x3e4
> [ 28.980000] [<41ba7dc7>] ? __mutex_unlock_slowpath+0x98/0x9f
> [ 28.980000] [<410e92ba>] ? adfspart_check_ICS+0x0/0x166
> [ 28.980000] [<410d4277>] __blkdev_get+0x1e7/0x292
> [ 28.980000] [<4133a201>] ? kobject_put+0x14/0x16
> [ 28.980000] [<410d432c>] blkdev_get+0xa/0xc
> [ 28.980000] [<410e81fb>] register_disk+0x94/0xe5
> [ 28.980000] [<413326c6>] ? blk_register_region+0x1b/0x20
> [ 28.980000] [<41332815>] add_disk+0x57/0x95
> [ 28.980000] [<41331fc6>] ? exact_match+0x0/0x8
> [ 28.980000] [<4133233f>] ? exact_lock+0x0/0x11
> [ 28.980000] [<41643848>] sd_probe_async+0x108/0x1be
> [ 28.980000] [<41048865>] async_thread+0xf5/0x1e6
> [ 28.980000] [<4102cbcb>] ? default_wake_function+0x0/0xd
> [ 28.980000] [<41048770>] ? async_thread+0x0/0x1e6
> [ 28.980000] [<410433df>] kthread+0x5f/0x64
> [ 28.980000] [<41043380>] ? kthread+0x0/0x64
> [ 28.980000] [<41002cc6>] kernel_thread_helper+0x6/0x10
> [ 29.264071] async/1 used greatest stack depth: 2336 bytes left
> [ 29.267020] bus: 'ssb': add driver b44
> [ 29.267072] initcall b44_init+0x0/0x55 returned 0 after 281250 usecs
> [ 29.267076] calling init_nic+0x0/0x16 @ 1
>
> Caused by the same blkiocg_update_io_add_stats() function. Bootlog and config
> attached. Reproducible on that sha1 and with that config.

I think I see it, the internal CFQ blkg groups are not properly
initialized... Will send a patch shortly.

--
Jens Axboe

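The ".magic: 00000000" in the trace above is what a debug-enabled spinlock looks like when it lives in zeroed memory and was never run through an init call. Below is a minimal userspace sketch of that check, not kernel code. The magic constant matches the kernel's SPINLOCK_MAGIC, but the struct layouts, the stats_lock field name, and every model_* identifier are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's SPINLOCK_MAGIC debug constant. */
#define MODEL_SPINLOCK_MAGIC 0xdead4ead

/* Hypothetical, heavily simplified stand-in for a debug spinlock. */
struct model_spinlock {
	unsigned int magic;
	int owner_cpu;
};

/* Hypothetical stand-in for a per-group stats structure with its lock. */
struct model_blkio_group {
	struct model_spinlock stats_lock;
	unsigned long queued;
};

/* Models the init step that a zeroed, never-initialized group is missing. */
static void model_spin_lock_init(struct model_spinlock *lock)
{
	assert(lock != NULL);
	lock->magic = MODEL_SPINLOCK_MAGIC;
	lock->owner_cpu = -1;
}

/* Returns false in the case where the kernel would print "bad magic". */
static bool model_spin_lock_ok(const struct model_spinlock *lock)
{
	return lock->magic == MODEL_SPINLOCK_MAGIC;
}
```

In this model a group declared as "struct model_blkio_group g = {0};" fails model_spin_lock_ok() until model_spin_lock_init(&g.stats_lock) has been called, which is consistent with the diagnosis that the internal groups are not properly initialized.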

2010-06-16 21:00:40

by Sedat Dilek

Subject: Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

How does cpufreq-related stuff find its way into mainline?
Is there a Git repository/branch on <git.kernel.org> that you can pull from?

- Sedat -

On Wed, Jun 16, 2010 at 10:42 PM, Andrew Morton
<[email protected]> wrote:
> On Wed, 9 Jun 2010 11:22:35 +0200
> "Rafael J. Wysocki" <[email protected]> wrote:
>
>> On Wednesday 09 June 2010, Sedat Dilek wrote:
>> > The patch from [1] is still missing.
>> >
>> >    "cpufreq-call-nr_iowait_cpu-with-disabled-preemption.patch" from
>> > Dmitry Monakhov
>> >
>> > Tested-by: Sedat Dilek <[email protected]>
>> > Tested-by: Maciej Rutecki <[email protected]>
>> >
>> > I have already reported this issue on LKML [2] and cpufreq ML [3].
>> >
>> > - Sedat -
>> >
>> > [1] http://www.spinics.net/lists/cpufreq/msg01631.html
>> > [2] http://lkml.org/lkml/2010/5/31/77
>> > [3] http://www.spinics.net/lists/cpufreq/msg01637.html
>>
>> Thanks, added.
>
> I just merged a different patch which should address this:
>
>
> From: Sergey Senozhatsky <[email protected]>
>
> Fix
>
>  BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
>  caller is nr_iowait_cpu+0xe/0x1e
>  Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
>  Call Trace:
>  [<c1184c55>] debug_smp_processor_id+0xa5/0xbc
>  [<c10282a5>] nr_iowait_cpu+0xe/0x1e
>  [<c104ab7c>] update_ts_time_stats+0x32/0x6c
>  [<c104ac73>] get_cpu_idle_time_us+0x36/0x58
>  [<c124229b>] get_cpu_idle_time+0x12/0x74
>  [<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
>  [<c1240437>] __cpufreq_governor+0x51/0x85
>  [<c1241190>] __cpufreq_set_policy+0x10c/0x13d
>  [<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
>  [<c1241b1e>] ? handle_update+0x0/0xd
>  [<c1241a18>] cpufreq_add_dev+0x34b/0x35a
>  [<c103c973>] ? schedule_delayed_work_on+0x11/0x13
>  [<c12c14db>] cpufreq_cpu_callback+0x59/0x63
>  [<c1042f39>] notifier_call_chain+0x26/0x48
>  [<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
>  [<c102efb9>] __cpu_notify+0x15/0x29
>  [<c102efda>] cpu_notify+0xd/0xf
>  [<c12bfb30>] _cpu_up+0xaf/0xd2
>  [<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
>  [<c1055eef>] hibernation_snapshot+0x104/0x1a2
>  [<c1058b49>] snapshot_ioctl+0x24b/0x53e
>  [<c1028ad1>] ? sub_preempt_count+0x7c/0x89
>  [<c10ab91d>] vfs_ioctl+0x2e/0x8c
>  [<c10588fe>] ? snapshot_ioctl+0x0/0x53e
>  [<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
>  [<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
>  [<c11e9dc3>] ? tty_write+0x0/0x1d0
>  [<c10a12d6>] ? vfs_write+0xa2/0xda
>  [<c10ac333>] sys_ioctl+0x41/0x62
>  [<c10027d3>] sysenter_do_call+0x12/0x2d
>
> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu.  However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and updates its callers.
>
> akpm: addresses about 30,000,000 different bug reports.
>
> Signed-off-by: Sergey Senozhatsky <[email protected]>
> Cc: Arjan van de Ven <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Pavel Machek <[email protected]>
> Cc: Jiri Slaby <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
>  drivers/cpuidle/governors/menu.c |   10 ++++++++--
>  include/linux/sched.h            |    2 +-
>  kernel/sched.c                   |    4 ++--
>  kernel/time/tick-sched.c         |    4 +++-
>  4 files changed, 14 insertions(+), 6 deletions(-)
>
> diff -puN drivers/cpuidle/governors/menu.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu drivers/cpuidle/governors/menu.c
> --- a/drivers/cpuidle/governors/menu.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
> +++ a/drivers/cpuidle/governors/menu.c
> @@ -137,15 +137,18 @@ static inline int which_bucket(unsigned
>  {
>        int bucket = 0;
>
> +       int cpu = get_cpu();
>        /*
>         * We keep two groups of stats; one with no
>         * IO pending, one without.
>         * This allows us to calculate
>         * E(duration)|iowait
>         */
> -       if (nr_iowait_cpu())
> +       if (nr_iowait_cpu(cpu))
>                bucket = BUCKETS/2;
>
> +       put_cpu();
> +
>        if (duration < 10)
>                return bucket;
>        if (duration < 100)
> @@ -169,13 +172,16 @@ static inline int which_bucket(unsigned
>  static inline int performance_multiplier(void)
>  {
>        int mult = 1;
> +       int cpu = get_cpu();
>
>        /* for higher loadavg, we are more reluctant */
>
>        mult += 2 * get_loadavg();
>
>        /* for IO wait tasks (per cpu!) we add 5x each */
> -       mult += 10 * nr_iowait_cpu();
> +       mult += 10 * nr_iowait_cpu(cpu);
> +
> +       put_cpu();
>
>        return mult;
>  }
> diff -puN include/linux/sched.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu include/linux/sched.h
> --- a/include/linux/sched.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
> +++ a/include/linux/sched.h
> @@ -139,7 +139,7 @@ extern int nr_processes(void);
>  extern unsigned long nr_running(void);
>  extern unsigned long nr_uninterruptible(void);
>  extern unsigned long nr_iowait(void);
> -extern unsigned long nr_iowait_cpu(void);
> +extern unsigned long nr_iowait_cpu(int cpu);
>  extern unsigned long this_cpu_load(void);
>
>
> diff -puN kernel/sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu kernel/sched.c
> --- a/kernel/sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
> +++ a/kernel/sched.c
> @@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
>        return sum;
>  }
>
> -unsigned long nr_iowait_cpu(void)
> +unsigned long nr_iowait_cpu(int cpu)
>  {
> -       struct rq *this = this_rq();
> +       struct rq *this = cpu_rq(cpu);
>        return atomic_read(&this->nr_iowait);
>  }
>
> diff -puN kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu kernel/time/tick-sched.c
> --- a/kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu
> +++ a/kernel/time/tick-sched.c
> @@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *
>        ktime_t delta;
>
>        if (ts->idle_active) {
> +               int cpu = get_cpu();
>                delta = ktime_sub(now, ts->idle_entrytime);
>                ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> -               if (nr_iowait_cpu() > 0)
> +               if (nr_iowait_cpu(cpu) > 0)
>                        ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> +               put_cpu();
>                ts->idle_entrytime = now;
>        }
>
> _
>
>
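
The interface change in the patch above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the per-CPU runqueues are replaced by a plain array, and every model_* name is hypothetical, not the kernel's this_rq()/cpu_rq() machinery. The point it illustrates is that once the accessor takes an explicit cpu argument, it no longer has to ask "which CPU am I on?" itself, so it does not call smp_processor_id() from preemptible context.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace model only; the kernel keeps one struct rq per CPU. */
#define MODEL_NR_CPUS 4

struct model_rq {
	atomic_ulong nr_iowait;	/* tasks on this CPU waiting for I/O */
};

/* Static storage, so every counter starts out as zero. */
static struct model_rq model_runqueues[MODEL_NR_CPUS];

/* Hypothetical stand-in for cpu_rq(cpu). */
static struct model_rq *model_cpu_rq(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return &model_runqueues[cpu];
}

/* Mirrors the new signature: the caller decides which CPU is sampled. */
static unsigned long model_nr_iowait_cpu(int cpu)
{
	return atomic_load(&model_cpu_rq(cpu)->nr_iowait);
}

/* Hypothetical helper, used only to populate the model. */
static void model_io_wait_begin(int cpu)
{
	atomic_fetch_add(&model_cpu_rq(cpu)->nr_iowait, 1);
}
```

In the real patch the callers that still want "the current CPU" bracket the call with get_cpu()/put_cpu(), which keeps the task on that CPU for the duration of the read.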