Hey everybody, there's finally a new -rc out there..
I've been back online for a week, and at least judging by the kinds of
patches and pull requests I've been getting, I have to say that I
think that having been strict for -rc3 ended up working out pretty
well. The diffstat of rc3..rc4 looks quite reasonable, despite the
longer window between rc's. And while there were certainly some things
that needed fixing, I'm hoping that we'll have a timely 2.6.35 release
despite my vacation (my longest time away from the kernel in many
years, I do believe - I followed email on a cellphone, but did not a
single kernel compile, and had a great time under water).
So go out and test -rc4. It fixes a number of regressions, a couple of
them harking back to from before 2.6.34. Networking, cfq, i915 and
radeo. And filesystem writeback performance issues, etc. It's all
good.
Linus
* Linus Torvalds <[email protected]> wrote:
> [...] And filesystem writeback performance issues, etc. It's all good.
> Christoph Hellwig (11):
> writeback: fix writeback completion notifications
> writeback: queue work on stack in writeback_inodes_sb
> writeback: enforce s_umount locking in writeback_inodes_sb
> writeback: fix writeback_inodes_wb from writeback_inodes_sb
> writeback: simplify wakeup_flusher_threads
> writeback: simplify and split bdi_start_writeback
> writeback: add missing requeue_io in writeback_inodes_wb
> writeback: fix pin_sb_for_writeback
Hm, those writeback patches are not looking overly good here - the kernel
crashes quite a lot during bootup:
BUG: unable to handle kernel paging request at 000000000528aa7e
IP: [<ffffffff8112d514>] wb_clear_pending+0x3b/0x66
PGD 0
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/eth0/broadcast
CPU 1
Modules linked in:
Pid: 1637, comm: flush-8:0 Not tainted 2.6.35-rc4-tip-01075-g0c1880a-dirty #15306 A8N-E/System Product Name
RIP: 0010:[<ffffffff8112d514>] [<ffffffff8112d514>] wb_clear_pending+0x3b/0x66
RSP: 0018:ffff88003b899d90 EFLAGS: 00010286
RAX: 000000000528aa7e RBX: ffff880038e3d5f8 RCX: ffff88003b952000
RDX: ffff88003dd8c468 RSI: ffff88003dd8c420 RDI: ffff88003dd8c408
RBP: ffff88003b899da0 R08: ffffffff825bf870 R09: 0000000000000000
R10: ffff88003b899cf0 R11: 0000000000000002 R12: ffff88003dd8c408
R13: 0000000000000000 R14: ffff880038e3d5f8 R15: ffff88003dd8c468
FS: 00007f2d706b36f0(0000) GS:ffff880003200000(0000) knlGS:00000000f77bb6c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000528aa7e CR3: 0000000001d1e000 CR4: 00000000000006a0
DR0: ffffffff82ceebbc DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Process flush-8:0 (pid: 1637, threadinfo ffff88003b898000, task ffff88003b952000)
Stack:
ffff88003dd8c3a8 0000000000000000 ffff88003b899e30 ffffffff8112d5d7
<0> ffffffff8112d5de ffffffff823ae2a0 0000000000000000 ffff88003b899de0
<0> 00000000000009c4 ffff88003b899fd8 000000000000000a 0000000000000000
Call Trace:
[<ffffffff8112d5d7>] wb_do_writeback+0x98/0x1d7
[<ffffffff8112d5de>] ? wb_do_writeback+0x9f/0x1d7
[<ffffffff8112d755>] bdi_writeback_task+0x3f/0xfc
[<ffffffff810809a7>] ? bit_waitqueue+0x17/0xa9
[<ffffffff810ee2d0>] ? bdi_start_fn+0x0/0xda
[<ffffffff810ee346>] bdi_start_fn+0x76/0xda
[<ffffffff810ee2d0>] ? bdi_start_fn+0x0/0xda
[<ffffffff810805e9>] kthread+0x7f/0x87
[<ffffffff8102ac04>] kernel_thread_helper+0x4/0x10
[<ffffffff8105af79>] ? finish_task_switch+0x70/0xdc
[<ffffffff818674c5>] ? _raw_spin_unlock_irq+0x4f/0x6c
[<ffffffff81867994>] ? restore_args+0x0/0x30
[<ffffffff8108056a>] ? kthread+0x0/0x87
[<ffffffff8102ac00>] ? kernel_thread_helper+0x0/0x10
Code: ff 4e 28 0f 94 c0 84 c0 74 47 4c 8b 67 10 49 81 c4 10 02 00 00 4c 89 e7 e8 d5 92 73 00 48 8b 43 08 48 8b 13 4c 89 e7 48 89 42 08 <48> 89 10 48 b8 00 02 20 00 00 00 ad de 48 89 43 08 e8 4d a0 73
RIP [<ffffffff8112d514>] wb_clear_pending+0x3b/0x66
It appears to be a race of some sorts and is not reproducible consistently: it
occurs with an about 2-10% likelyhood in my test runs and i've seen about ~5
similar crashes so far. They appeared after you pulled those writeback
patches. I had a long and successful testrun of 1000+ kernels with no crashes,
shortly before those wb patches went upstream.
Given the small linear chance for the crash [i'd have to do a series of 100
reboots per bisection point to be reasonably sure about goodness/badness] i'd
rather not try to bisect it - but can try fix patches or can try specific
reverts as well.
Config and full crashlog attached. (Note, the log is from
tip-2.6.35-rc4-tip-01075-g0c1880a but the crash clearly came in from upstream,
the patterns are consistent and nothing remotely related happened in -tip in
that timeframe.)
Thanks,
Ingo
On Mon, Jul 5, 2010 at 11:44 AM, Linus Torvalds
<[email protected]> wrote:
> Hey everybody, there's finally a new -rc out there..
>
> I've been back online for a week, and at least judging by the kinds of
> patches and pull requests I've been getting, I have to say that I
> think that having been strict for -rc3 ended up working out pretty
> well. The diffstat of rc3..rc4 looks quite reasonable, despite the
> longer window between rc's. And while there were certainly some things
> that needed fixing, I'm hoping that we'll have a timely 2.6.35 release
> despite my vacation (my longest time away from the kernel in many
> years, I do believe - I followed email on a cellphone, but did not a
> single kernel compile, and had a great time under water).
>
> So go out and test -rc4. It fixes a number of regressions, a couple of
> them harking back to from before 2.6.34. Networking, cfq, i915 and
> radeo. And filesystem writeback performance issues, etc. It's all
> good.
I'm facing some drm issue with -rc4, my x-windows is messed up and can
not be used. I got floods of following messages, kernel config is
attached. I have the same issue on -mmotm-rc3.
[ 34.532132] BUG: scheduling while atomic: plymouthd/177/0x00000002
[ 34.532133] INFO: lockdep is turned off.
[ 34.532135] Modules linked in: radeon(+) ttm drm_kms_helper drm
i2c_algo_bit i2c_core
[ 34.532139] Pid: 177, comm: plymouthd Tainted: G D I 2.6.35-rc4+ #39
[ 34.532141] Call Trace:
[ 34.532144] [<ffffffff8104361a>] __schedule_bug+0x77/0x7c
[ 34.532146] [<ffffffff814f9592>] schedule+0xd3/0x695
[ 34.532149] [<ffffffff81301fc9>] ? tty_release+0x301/0x602
[ 34.532152] [<ffffffff814fa64c>] __mutex_lock_common+0x29d/0x461
[ 34.532154] [<ffffffff81301fc9>] ? tty_release+0x301/0x602
[ 34.532156] [<ffffffff814fbebc>] ? _raw_spin_unlock+0x4a/0x57
[ 34.532159] [<ffffffff814fa8c5>] mutex_lock_nested+0x39/0x3e
[ 34.532161] [<ffffffff81301fc9>] tty_release+0x301/0x602
[ 34.532164] [<ffffffff8110a06b>] ? __cache_free+0x3a/0x1bc
[ 34.532167] [<ffffffff8111a636>] fput+0x135/0x1e2
[ 34.532170] [<ffffffff811177e5>] filp_close+0x68/0x72
[ 34.532172] [<ffffffff81052047>] put_files_struct+0xbd/0x171
[ 34.532175] [<ffffffff81052146>] exit_files+0x4b/0x53
[ 34.532177] [<ffffffff81053bb2>] do_exit+0x294/0x756
[ 34.532180] [<ffffffff81050bd6>] ? kmsg_dump+0x155/0x16f
[ 34.532182] [<ffffffff814fd6ec>] oops_end+0xbe/0xc6
[ 34.532185] [<ffffffff81033022>] no_context+0x1fc/0x20b
[ 34.532188] [<ffffffff810331c3>] __bad_area_nosemaphore+0x192/0x1b5
[ 34.532196] [<ffffffffa0023e39>] ? drm_fb_release+0x2f/0x7e [drm]
[ 34.532199] [<ffffffff810331f9>] bad_area_nosemaphore+0x13/0x15
[ 34.532201] [<ffffffff814ff441>] do_page_fault+0x15d/0x283
[ 34.532209] [<ffffffffa0023e39>] ? drm_fb_release+0x2f/0x7e [drm]
[ 34.532211] [<ffffffff814fca25>] page_fault+0x25/0x30
[ 34.532219] [<ffffffffa0023e39>] ? drm_fb_release+0x2f/0x7e [drm]
[ 34.532222] [<ffffffff8127be8a>] ? __list_add+0x3f/0x81
[ 34.532225] [<ffffffff814fa4e0>] __mutex_lock_common+0x131/0x461
[ 34.532233] [<ffffffffa0023e39>] ? drm_fb_release+0x2f/0x7e [drm]
[ 34.532239] [<ffffffffa001b104>] ? drm_release+0x2bd/0x63e [drm]
[ 34.532242] [<ffffffff814fa8c5>] mutex_lock_nested+0x39/0x3e
[ 34.532250] [<ffffffffa0023e39>] drm_fb_release+0x2f/0x7e [drm]
[ 34.532257] [<ffffffffa001b1da>] drm_release+0x393/0x63e [drm]
[ 34.532259] [<ffffffff8111a636>] fput+0x135/0x1e2
[ 34.532262] [<ffffffff811177e5>] filp_close+0x68/0x72
[ 34.532265] [<ffffffff811178cb>] sys_close+0xdc/0x116
[ 34.532268] [<ffffffff81009cc2>] system_call_fastpath+0x16/0x1b
>
> Linus
>
We had a thread about it a few days ago. The following commit from
Jens' 2.6.36 tree needs to go into 2.6.35 to sort the issues with the
fragile wakeup mechanism which only got worse by the patches out:
http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=79338d2a78ab78efdc1698f1309766a039addf9d
* Christoph Hellwig <[email protected]> wrote:
> We had a thread about it a few days ago. The following commit from
> Jens' 2.6.36 tree needs to go into 2.6.35 to sort the issues with the
> fragile wakeup mechanism which only got worse by the patches out:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=79338d2a78ab78efdc1698f1309766a039addf9d
I've attached it below - in case someone wants to apply it before it hits
upstream.
Scary big patch btw. - is there no smaller fix?
Ingo
------------------------>
>From 79338d2a78ab78efdc1698f1309766a039addf9d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Sat, 19 Jun 2010 23:07:56 +0200
Subject: [PATCH] writeback: simplify the write back thread queue
First remove items from work_list as soon as we start working on them. This
means we don't have to track any pending or visited state and can get
rid of all the RCU magic freeing the work items - we can simply free
them once the operation has finished. Second use a real completion for
tracking synchronous requests - if the caller sets the completion pointer
we complete it, otherwise use it as a boolean indicator that we can free
the work item directly. Third unify struct wb_writeback_args and struct
bdi_work into a single data structure, wb_writeback_work. Previous we
set all parameters into a struct wb_writeback_args, copied it into
struct bdi_work, copied it again on the stack to use it there. Instead
of just allocate one structure dynamically or on the stack and use it
all the way through the stack.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
fs/fs-writeback.c | 253 ++++++++++++-------------------------------
include/linux/backing-dev.h | 2 -
mm/backing-dev.c | 14 +--
3 files changed, 72 insertions(+), 197 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5455009..d5be169 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -38,43 +38,18 @@ int nr_pdflush_threads;
/*
* Passed into wb_writeback(), essentially a subset of writeback_control
*/
-struct wb_writeback_args {
+struct wb_writeback_work {
long nr_pages;
struct super_block *sb;
enum writeback_sync_modes sync_mode;
unsigned int for_kupdate:1;
unsigned int range_cyclic:1;
unsigned int for_background:1;
-};
-/*
- * Work items for the bdi_writeback threads
- */
-struct bdi_work {
struct list_head list; /* pending work list */
- struct rcu_head rcu_head; /* for RCU free/clear of work */
-
- unsigned long seen; /* threads that have seen this work */
- atomic_t pending; /* number of threads still to do work */
-
- struct wb_writeback_args args; /* writeback arguments */
-
- unsigned long state; /* flag bits, see WS_* */
-};
-
-enum {
- WS_INPROGRESS = 0,
- WS_ONSTACK,
+ struct completion *done; /* set if the caller waits */
};
-static inline void bdi_work_init(struct bdi_work *work,
- struct wb_writeback_args *args)
-{
- INIT_RCU_HEAD(&work->rcu_head);
- work->args = *args;
- __set_bit(WS_INPROGRESS, &work->state);
-}
-
/**
* writeback_in_progress - determine whether there is writeback in progress
* @bdi: the device's backing_dev_info structure.
@@ -87,49 +62,11 @@ int writeback_in_progress(struct backing_dev_info *bdi)
return !list_empty(&bdi->work_list);
}
-static void bdi_work_free(struct rcu_head *head)
-{
- struct bdi_work *work = container_of(head, struct bdi_work, rcu_head);
-
- clear_bit(WS_INPROGRESS, &work->state);
- smp_mb__after_clear_bit();
- wake_up_bit(&work->state, WS_INPROGRESS);
-
- if (!test_bit(WS_ONSTACK, &work->state))
- kfree(work);
-}
-
-static void wb_clear_pending(struct bdi_writeback *wb, struct bdi_work *work)
-{
- /*
- * The caller has retrieved the work arguments from this work,
- * drop our reference. If this is the last ref, delete and free it
- */
- if (atomic_dec_and_test(&work->pending)) {
- struct backing_dev_info *bdi = wb->bdi;
-
- spin_lock(&bdi->wb_lock);
- list_del_rcu(&work->list);
- spin_unlock(&bdi->wb_lock);
-
- call_rcu(&work->rcu_head, bdi_work_free);
- }
-}
-
-static void bdi_queue_work(struct backing_dev_info *bdi, struct bdi_work *work)
+static void bdi_queue_work(struct backing_dev_info *bdi,
+ struct wb_writeback_work *work)
{
- work->seen = bdi->wb_mask;
- BUG_ON(!work->seen);
- atomic_set(&work->pending, bdi->wb_cnt);
- BUG_ON(!bdi->wb_cnt);
-
- /*
- * list_add_tail_rcu() contains the necessary barriers to
- * make sure the above stores are seen before the item is
- * noticed on the list
- */
spin_lock(&bdi->wb_lock);
- list_add_tail_rcu(&work->list, &bdi->work_list);
+ list_add_tail(&work->list, &bdi->work_list);
spin_unlock(&bdi->wb_lock);
/*
@@ -146,55 +83,29 @@ static void bdi_queue_work(struct backing_dev_info *bdi, struct bdi_work *work)
}
}
-/*
- * Used for on-stack allocated work items. The caller needs to wait until
- * the wb threads have acked the work before it's safe to continue.
- */
-static void bdi_wait_on_work_done(struct bdi_work *work)
-{
- wait_on_bit(&work->state, WS_INPROGRESS, bdi_sched_wait,
- TASK_UNINTERRUPTIBLE);
-}
-
-static void bdi_alloc_queue_work(struct backing_dev_info *bdi,
- struct wb_writeback_args *args)
+static void
+__bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
+ bool range_cyclic, bool for_background)
{
- struct bdi_work *work;
+ struct wb_writeback_work *work;
/*
* This is WB_SYNC_NONE writeback, so if allocation fails just
* wakeup the thread for old dirty data writeback
*/
- work = kmalloc(sizeof(*work), GFP_ATOMIC);
- if (work) {
- bdi_work_init(work, args);
- bdi_queue_work(bdi, work);
- } else {
- struct bdi_writeback *wb = &bdi->wb;
-
- if (wb->task)
- wake_up_process(wb->task);
+ work = kzalloc(sizeof(*work), GFP_ATOMIC);
+ if (!work) {
+ if (bdi->wb.task)
+ wake_up_process(bdi->wb.task);
+ return;
}
-}
-/**
- * bdi_queue_work_onstack - start and wait for writeback
- * @sb: write inodes from this super_block
- *
- * Description:
- * This function initiates writeback and waits for the operation to
- * complete. Callers must hold the sb s_umount semaphore for
- * reading, to avoid having the super disappear before we are done.
- */
-static void bdi_queue_work_onstack(struct wb_writeback_args *args)
-{
- struct bdi_work work;
+ work->sync_mode = WB_SYNC_NONE;
+ work->nr_pages = nr_pages;
+ work->range_cyclic = range_cyclic;
+ work->for_background = for_background;
- bdi_work_init(&work, args);
- __set_bit(WS_ONSTACK, &work.state);
-
- bdi_queue_work(args->sb->s_bdi, &work);
- bdi_wait_on_work_done(&work);
+ bdi_queue_work(bdi, work);
}
/**
@@ -210,13 +121,7 @@ static void bdi_queue_work_onstack(struct wb_writeback_args *args)
*/
void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
{
- struct wb_writeback_args args = {
- .sync_mode = WB_SYNC_NONE,
- .nr_pages = nr_pages,
- .range_cyclic = 1,
- };
-
- bdi_alloc_queue_work(bdi, &args);
+ __bdi_start_writeback(bdi, nr_pages, true, false);
}
/**
@@ -230,13 +135,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
*/
void bdi_start_background_writeback(struct backing_dev_info *bdi)
{
- struct wb_writeback_args args = {
- .sync_mode = WB_SYNC_NONE,
- .nr_pages = LONG_MAX,
- .for_background = 1,
- .range_cyclic = 1,
- };
- bdi_alloc_queue_work(bdi, &args);
+ __bdi_start_writeback(bdi, LONG_MAX, true, true);
}
/*
@@ -703,14 +602,14 @@ static inline bool over_bground_thresh(void)
* all dirty pages if they are all attached to "old" mappings.
*/
static long wb_writeback(struct bdi_writeback *wb,
- struct wb_writeback_args *args)
+ struct wb_writeback_work *work)
{
struct writeback_control wbc = {
- .sync_mode = args->sync_mode,
+ .sync_mode = work->sync_mode,
.older_than_this = NULL,
- .for_kupdate = args->for_kupdate,
- .for_background = args->for_background,
- .range_cyclic = args->range_cyclic,
+ .for_kupdate = work->for_kupdate,
+ .for_background = work->for_background,
+ .range_cyclic = work->range_cyclic,
};
unsigned long oldest_jif;
long wrote = 0;
@@ -730,24 +629,24 @@ static long wb_writeback(struct bdi_writeback *wb,
/*
* Stop writeback when nr_pages has been consumed
*/
- if (args->nr_pages <= 0)
+ if (work->nr_pages <= 0)
break;
/*
* For background writeout, stop when we are below the
* background dirty threshold
*/
- if (args->for_background && !over_bground_thresh())
+ if (work->for_background && !over_bground_thresh())
break;
wbc.more_io = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
wbc.pages_skipped = 0;
- if (args->sb)
- __writeback_inodes_sb(args->sb, wb, &wbc);
+ if (work->sb)
+ __writeback_inodes_sb(work->sb, wb, &wbc);
else
writeback_inodes_wb(wb, &wbc);
- args->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+ work->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
/*
@@ -783,31 +682,21 @@ static long wb_writeback(struct bdi_writeback *wb,
}
/*
- * Return the next bdi_work struct that hasn't been processed by this
- * wb thread yet. ->seen is initially set for each thread that exists
- * for this device, when a thread first notices a piece of work it
- * clears its bit. Depending on writeback type, the thread will notify
- * completion on either receiving the work (WB_SYNC_NONE) or after
- * it is done (WB_SYNC_ALL).
+ * Return the next wb_writeback_work struct that hasn't been processed yet.
*/
-static struct bdi_work *get_next_work_item(struct backing_dev_info *bdi,
- struct bdi_writeback *wb)
+static struct wb_writeback_work *
+get_next_work_item(struct backing_dev_info *bdi, struct bdi_writeback *wb)
{
- struct bdi_work *work, *ret = NULL;
-
- rcu_read_lock();
-
- list_for_each_entry_rcu(work, &bdi->work_list, list) {
- if (!test_bit(wb->nr, &work->seen))
- continue;
- clear_bit(wb->nr, &work->seen);
+ struct wb_writeback_work *work = NULL;
- ret = work;
- break;
+ spin_lock(&bdi->wb_lock);
+ if (!list_empty(&bdi->work_list)) {
+ work = list_entry(bdi->work_list.next,
+ struct wb_writeback_work, list);
+ list_del_init(&work->list);
}
-
- rcu_read_unlock();
- return ret;
+ spin_unlock(&bdi->wb_lock);
+ return work;
}
static long wb_check_old_data_flush(struct bdi_writeback *wb)
@@ -832,14 +721,14 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
(inodes_stat.nr_inodes - inodes_stat.nr_unused);
if (nr_pages) {
- struct wb_writeback_args args = {
+ struct wb_writeback_work work = {
.nr_pages = nr_pages,
.sync_mode = WB_SYNC_NONE,
.for_kupdate = 1,
.range_cyclic = 1,
};
- return wb_writeback(wb, &args);
+ return wb_writeback(wb, &work);
}
return 0;
@@ -851,33 +740,27 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
{
struct backing_dev_info *bdi = wb->bdi;
- struct bdi_work *work;
+ struct wb_writeback_work *work;
long wrote = 0;
while ((work = get_next_work_item(bdi, wb)) != NULL) {
- struct wb_writeback_args args = work->args;
-
/*
* Override sync mode, in case we must wait for completion
+ * because this thread is exiting now.
*/
if (force_wait)
- work->args.sync_mode = args.sync_mode = WB_SYNC_ALL;
+ work->sync_mode = WB_SYNC_ALL;
- /*
- * If this isn't a data integrity operation, just notify
- * that we have seen this work and we are now starting it.
- */
- if (!test_bit(WS_ONSTACK, &work->state))
- wb_clear_pending(wb, work);
-
- wrote += wb_writeback(wb, &args);
+ wrote += wb_writeback(wb, work);
/*
- * This is a data integrity writeback, so only do the
- * notification when we have completed the work.
+ * Notify the caller of completion if this is a synchronous
+ * work item, otherwise just free it.
*/
- if (test_bit(WS_ONSTACK, &work->state))
- wb_clear_pending(wb, work);
+ if (work->done)
+ complete(work->done);
+ else
+ kfree(work);
}
/*
@@ -940,14 +823,9 @@ int bdi_writeback_task(struct bdi_writeback *wb)
void wakeup_flusher_threads(long nr_pages)
{
struct backing_dev_info *bdi;
- struct wb_writeback_args args = {
- .sync_mode = WB_SYNC_NONE,
- };
- if (nr_pages) {
- args.nr_pages = nr_pages;
- } else {
- args.nr_pages = global_page_state(NR_FILE_DIRTY) +
+ if (!nr_pages) {
+ nr_pages = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS);
}
@@ -955,7 +833,7 @@ void wakeup_flusher_threads(long nr_pages)
list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
if (!bdi_has_dirty_io(bdi))
continue;
- bdi_alloc_queue_work(bdi, &args);
+ __bdi_start_writeback(bdi, nr_pages, false, false);
}
rcu_read_unlock();
}
@@ -1164,17 +1042,20 @@ void writeback_inodes_sb(struct super_block *sb)
{
unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
- struct wb_writeback_args args = {
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct wb_writeback_work work = {
.sb = sb,
.sync_mode = WB_SYNC_NONE,
+ .done = &done,
};
WARN_ON(!rwsem_is_locked(&sb->s_umount));
- args.nr_pages = nr_dirty + nr_unstable +
+ work.nr_pages = nr_dirty + nr_unstable +
(inodes_stat.nr_inodes - inodes_stat.nr_unused);
- bdi_queue_work_onstack(&args);
+ bdi_queue_work(sb->s_bdi, &work);
+ wait_for_completion(&done);
}
EXPORT_SYMBOL(writeback_inodes_sb);
@@ -1206,16 +1087,20 @@ EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
*/
void sync_inodes_sb(struct super_block *sb)
{
- struct wb_writeback_args args = {
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct wb_writeback_work work = {
.sb = sb,
.sync_mode = WB_SYNC_ALL,
.nr_pages = LONG_MAX,
.range_cyclic = 0,
+ .done = &done,
};
WARN_ON(!rwsem_is_locked(&sb->s_umount));
- bdi_queue_work_onstack(&args);
+ bdi_queue_work(sb->s_bdi, &work);
+ wait_for_completion(&done);
+
wait_sb_inodes(sb);
}
EXPORT_SYMBOL(sync_inodes_sb);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 9ae2889..e9aec0d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -82,8 +82,6 @@ struct backing_dev_info {
struct bdi_writeback wb; /* default writeback info for this bdi */
spinlock_t wb_lock; /* protects update side of wb_list */
struct list_head wb_list; /* the flusher threads hanging off this bdi */
- unsigned long wb_mask; /* bitmask of registered tasks */
- unsigned int wb_cnt; /* number of registered tasks */
struct list_head work_list;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6e0b09a..123bcef 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -104,15 +104,13 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
"b_more_io: %8lu\n"
"bdi_list: %8u\n"
"state: %8lx\n"
- "wb_mask: %8lx\n"
- "wb_list: %8u\n"
- "wb_cnt: %8u\n",
+ "wb_list: %8u\n",
(unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
(unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
K(bdi_thresh), K(dirty_thresh),
K(background_thresh), nr_wb, nr_dirty, nr_io, nr_more_io,
- !list_empty(&bdi->bdi_list), bdi->state, bdi->wb_mask,
- !list_empty(&bdi->wb_list), bdi->wb_cnt);
+ !list_empty(&bdi->bdi_list), bdi->state,
+ !list_empty(&bdi->wb_list));
#undef K
return 0;
@@ -674,12 +672,6 @@ int bdi_init(struct backing_dev_info *bdi)
bdi_wb_init(&bdi->wb, bdi);
- /*
- * Just one thread support for now, hard code mask and count
- */
- bdi->wb_mask = 1;
- bdi->wb_cnt = 1;
-
for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
err = percpu_counter_init(&bdi->bdi_stat[i], 0);
if (err)
* Ingo Molnar <[email protected]> wrote:
> * Christoph Hellwig <[email protected]> wrote:
>
> > We had a thread about it a few days ago. The following commit from
> > Jens' 2.6.36 tree needs to go into 2.6.35 to sort the issues with the
> > fragile wakeup mechanism which only got worse by the patches out:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=79338d2a78ab78efdc1698f1309766a039addf9d
>
> I've attached it below - in case someone wants to apply it before it hits
> upstream.
>
> Scary big patch btw. - is there no smaller fix?
Nor does it apply to v2.6.35-rc4:
Applying: writeback: simplify the write back thread queue
error: patch failed: fs/fs-writeback.c:703
error: fs/fs-writeback.c: patch does not apply
Patch failed at 0001 writeback: simplify the write back thread queue
I guess it has other dependencies as well. If you want me to test the fix then
please send a version that applies.
Thanks,
Ingo
On Mon, Jul 05, 2010 at 07:14:20PM +0200, Ingo Molnar wrote:
> Nor does it apply to v2.6.35-rc4:
It's indeed missing the two previous patches to the writeback code that
I thought Jens sent to Linus aswell [1]. The race was initially found using
a distro kernel with the patches backported, and in the meantime we've
done a lot of testing with this patch (and the two previous that also
were backported). I'd prefer to get this full stack that's been in
-next for a while and got large machine testing, but if Jens prefers it
I can aim for a smaller variant. Jens, what version do you prefer?
[1] thee patches are:
"writeback: remove writeback_inodes_wbc" a73dd720f081cd160e3843d531eff2195036e442
"writeback: split writeback_inodes_wb" 9f98c0faa41ae334d894762f556195c8a7fb352f
On 05/07/10 20.20, Christoph Hellwig wrote:
> On Mon, Jul 05, 2010 at 07:14:20PM +0200, Ingo Molnar wrote:
>> Nor does it apply to v2.6.35-rc4:
>
> It's indeed missing the two previous patches to the writeback code that
> I thought Jens sent to Linus aswell [1]. The race was initially found using
> a distro kernel with the patches backported, and in the meantime we've
> done a lot of testing with this patch (and the two previous that also
> were backported). I'd prefer to get this full stack that's been in
> -next for a while and got large machine testing, but if Jens prefers it
> I can aim for a smaller variant. Jens, what version do you prefer?
The oops itself looks like a recurrence of the missing RCU grace or
too early stack wakeup, which should be a 1-2 liner once it's found.
So I think such a patch would be greatly preferable to doing this
much churn this late in the cycle.
Christoph, do you have time to look into that? We can always punt to
the larger version in a few days if unsuccessful, which gets rid of
the problem by simply deleting the troublesome and complex
clear/wakeup logic.
--
Jens Axboe
On Mon, Jul 05, 2010 at 09:24:39PM +0200, Jens Axboe wrote:
> The oops itself looks like a recurrence of the missing RCU grace or
> too early stack wakeup, which should be a 1-2 liner once it's found.
See the previous thread. There's at least two issues:
- wb_do_writeback checks work->state after it's been freed when we do
the second test_bit for WS_ONSTACK
- bdi_work_free accesses work->state after waking up the caller doing
bdi_wait_on_work_done, which might have re-used the stack space
allocated for the work item.
The fix for that is to get rid of the fragile work->state stuff and the
bit wakeups by just using a completion and using that as indicator
for the stack wait. That's the main change the above patch does. In
addition it also merges the two structures used for the writeback
requests. Onl doing the completion and earlier list removal would
be something like the untested patch below:
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c 2010-07-05 11:41:51.459003854 -0700
+++ linux-2.6/fs/fs-writeback.c 2010-07-05 11:48:05.542046598 -0700
@@ -52,29 +52,10 @@ struct wb_writeback_args {
*/
struct bdi_work {
struct list_head list; /* pending work list */
- struct rcu_head rcu_head; /* for RCU free/clear of work */
-
- unsigned long seen; /* threads that have seen this work */
- atomic_t pending; /* number of threads still to do work */
-
struct wb_writeback_args args; /* writeback arguments */
-
- unsigned long state; /* flag bits, see WS_* */
+ struct completion *done; /* set if the caller waits */
};
-enum {
- WS_INPROGRESS = 0,
- WS_ONSTACK,
-};
-
-static inline void bdi_work_init(struct bdi_work *work,
- struct wb_writeback_args *args)
-{
- INIT_RCU_HEAD(&work->rcu_head);
- work->args = *args;
- __set_bit(WS_INPROGRESS, &work->state);
-}
-
/**
* writeback_in_progress - determine whether there is writeback in progress
* @bdi: the device's backing_dev_info structure.
@@ -87,49 +68,10 @@ int writeback_in_progress(struct backing
return !list_empty(&bdi->work_list);
}
-static void bdi_work_free(struct rcu_head *head)
-{
- struct bdi_work *work = container_of(head, struct bdi_work, rcu_head);
-
- clear_bit(WS_INPROGRESS, &work->state);
- smp_mb__after_clear_bit();
- wake_up_bit(&work->state, WS_INPROGRESS);
-
- if (!test_bit(WS_ONSTACK, &work->state))
- kfree(work);
-}
-
-static void wb_clear_pending(struct bdi_writeback *wb, struct bdi_work *work)
-{
- /*
- * The caller has retrieved the work arguments from this work,
- * drop our reference. If this is the last ref, delete and free it
- */
- if (atomic_dec_and_test(&work->pending)) {
- struct backing_dev_info *bdi = wb->bdi;
-
- spin_lock(&bdi->wb_lock);
- list_del_rcu(&work->list);
- spin_unlock(&bdi->wb_lock);
-
- call_rcu(&work->rcu_head, bdi_work_free);
- }
-}
-
static void bdi_queue_work(struct backing_dev_info *bdi, struct bdi_work *work)
{
- work->seen = bdi->wb_mask;
- BUG_ON(!work->seen);
- atomic_set(&work->pending, bdi->wb_cnt);
- BUG_ON(!bdi->wb_cnt);
-
- /*
- * list_add_tail_rcu() contains the necessary barriers to
- * make sure the above stores are seen before the item is
- * noticed on the list
- */
spin_lock(&bdi->wb_lock);
- list_add_tail_rcu(&work->list, &bdi->work_list);
+ list_add_tail(&work->list, &bdi->work_list);
spin_unlock(&bdi->wb_lock);
/*
@@ -146,16 +88,6 @@ static void bdi_queue_work(struct backin
}
}
-/*
- * Used for on-stack allocated work items. The caller needs to wait until
- * the wb threads have acked the work before it's safe to continue.
- */
-static void bdi_wait_on_work_done(struct bdi_work *work)
-{
- wait_on_bit(&work->state, WS_INPROGRESS, bdi_sched_wait,
- TASK_UNINTERRUPTIBLE);
-}
-
static void bdi_alloc_queue_work(struct backing_dev_info *bdi,
struct wb_writeback_args *args)
{
@@ -167,7 +99,7 @@ static void bdi_alloc_queue_work(struct
*/
work = kmalloc(sizeof(*work), GFP_ATOMIC);
if (work) {
- bdi_work_init(work, args);
+ work->args = *args;
bdi_queue_work(bdi, work);
} else {
struct bdi_writeback *wb = &bdi->wb;
@@ -188,13 +120,14 @@ static void bdi_alloc_queue_work(struct
*/
static void bdi_queue_work_onstack(struct wb_writeback_args *args)
{
+ DECLARE_COMPLETION_ONSTACK(done);
struct bdi_work work;
- bdi_work_init(&work, args);
- __set_bit(WS_ONSTACK, &work.state);
+ work.args = *args;
+ work.done = &done;
bdi_queue_work(args->sb->s_bdi, &work);
- bdi_wait_on_work_done(&work);
+ wait_for_completion(&done);
}
/**
@@ -791,21 +724,17 @@ static long wb_writeback(struct bdi_writ
static struct bdi_work *get_next_work_item(struct backing_dev_info *bdi,
struct bdi_writeback *wb)
{
- struct bdi_work *work, *ret = NULL;
-
- rcu_read_lock();
+ struct bdi_work *work = NULL;
- list_for_each_entry_rcu(work, &bdi->work_list, list) {
- if (!test_bit(wb->nr, &work->seen))
- continue;
- clear_bit(wb->nr, &work->seen);
-
- ret = work;
- break;
+ spin_lock(&bdi->wb_lock);
+ if (!list_empty(&bdi->work_list)) {
+ work = list_entry(bdi->work_list.next,
+ struct bdi_work, list);
+ list_del_init(&work->list);
}
+ spin_unlock(&bdi->wb_lock);
- rcu_read_unlock();
- return ret;
+ return work;
}
static long wb_check_old_data_flush(struct bdi_writeback *wb)
@@ -861,21 +790,12 @@ long wb_do_writeback(struct bdi_writebac
if (force_wait)
work->args.sync_mode = args.sync_mode = WB_SYNC_ALL;
- /*
- * If this isn't a data integrity operation, just notify
- * that we have seen this work and we are now starting it.
- */
- if (!test_bit(WS_ONSTACK, &work->state))
- wb_clear_pending(wb, work);
-
wrote += wb_writeback(wb, &args);
- /*
- * This is a data integrity writeback, so only do the
- * notification when we have completed the work.
- */
- if (test_bit(WS_ONSTACK, &work->state))
- wb_clear_pending(wb, work);
+ if (work->done)
+ complete(work->done);
+ else
+ kfree(work);
}
/*
Index: linux-2.6/include/linux/backing-dev.h
===================================================================
--- linux-2.6.orig/include/linux/backing-dev.h 2010-07-05 11:45:45.610003714 -0700
+++ linux-2.6/include/linux/backing-dev.h 2010-07-05 11:45:48.802084661 -0700
@@ -82,8 +82,6 @@ struct backing_dev_info {
struct bdi_writeback wb; /* default writeback info for this bdi */
spinlock_t wb_lock; /* protects update side of wb_list */
struct list_head wb_list; /* the flusher threads hanging off this bdi */
- unsigned long wb_mask; /* bitmask of registered tasks */
- unsigned int wb_cnt; /* number of registered tasks */
struct list_head work_list;
Index: linux-2.6/mm/backing-dev.c
===================================================================
--- linux-2.6.orig/mm/backing-dev.c 2010-07-05 11:45:52.458023339 -0700
+++ linux-2.6/mm/backing-dev.c 2010-07-05 11:46:22.145302078 -0700
@@ -104,15 +104,13 @@ static int bdi_debug_stats_show(struct s
"b_more_io: %8lu\n"
"bdi_list: %8u\n"
"state: %8lx\n"
- "wb_mask: %8lx\n"
- "wb_list: %8u\n"
- "wb_cnt: %8u\n",
+ "wb_list: %8u\n",
(unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
(unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
K(bdi_thresh), K(dirty_thresh),
K(background_thresh), nr_wb, nr_dirty, nr_io, nr_more_io,
- !list_empty(&bdi->bdi_list), bdi->state, bdi->wb_mask,
- !list_empty(&bdi->wb_list), bdi->wb_cnt);
+ !list_empty(&bdi->bdi_list), bdi->state,
+ !list_empty(&bdi->wb_list));
#undef K
return 0;
@@ -675,12 +673,6 @@ int bdi_init(struct backing_dev_info *bd
bdi_wb_init(&bdi->wb, bdi);
- /*
- * Just one thread support for now, hard code mask and count
- */
- bdi->wb_mask = 1;
- bdi->wb_cnt = 1;
-
for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
err = percpu_counter_init(&bdi->bdi_stat[i], 0);
if (err)
On 05/07/10 21.32, Christoph Hellwig wrote:
> On Mon, Jul 05, 2010 at 09:24:39PM +0200, Jens Axboe wrote:
>> The oops itself looks like a recurrence of the missing RCU grace or
>> too early stack wakeup, which should be a 1-2 liner once it's found.
>
> See the previous thread. There's at least two issues:
>
> - wb_do_writeback checks work->state after it's been freed when we do
> the second test_bit for WS_ONSTACK
> - bdi_work_free accesses work->state after waking up the caller doing
> bdi_wait_on_work_done, which might have re-used the stack space
> allocated for the work item.
>
> The fix for that is to get rid of the fragile work->state stuff and the
> bit wakeups by just using a completion and using that as indicator
> for the stack wait. That's the main change the above patch does. In
> addition it also merges the two structures used for the writeback
> requests. Onl doing the completion and earlier list removal would
> be something like the untested patch below:
If those two late ON_STACK checks is the only issue left there,
why not just apply the below for 2.6.35?
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0609607..15ce6ab 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -90,12 +90,13 @@ int writeback_in_progress(struct backing_dev_info *bdi)
static void bdi_work_free(struct rcu_head *head)
{
struct bdi_work *work = container_of(head, struct bdi_work, rcu_head);
+ int on_stack = test_bit(WS_ONSTACK, &work->state);
clear_bit(WS_INPROGRESS, &work->state);
smp_mb__after_clear_bit();
wake_up_bit(&work->state, WS_INPROGRESS);
- if (!test_bit(WS_ONSTACK, &work->state))
+ if (!on_stack)
kfree(work);
}
@@ -854,6 +855,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
while ((work = get_next_work_item(bdi, wb)) != NULL) {
struct wb_writeback_args args = work->args;
+ int on_stack = test_bit(WS_ONSTACK, &work->state);
/*
* Override sync mode, in case we must wait for completion
@@ -865,7 +867,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
* If this isn't a data integrity operation, just notify
* that we have seen this work and we are now starting it.
*/
- if (!test_bit(WS_ONSTACK, &work->state))
+ if (!on_stack)
wb_clear_pending(wb, work);
wrote += wb_writeback(wb, &args);
@@ -874,7 +876,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
* This is a data integrity writeback, so only do the
* notification when we have completed the work.
*/
- if (test_bit(WS_ONSTACK, &work->state))
+ if (on_stack)
wb_clear_pending(wb, work);
}
--
Jens Axboe
Jens Axboe wrote:
> On 05/07/10 20.20, Christoph Hellwig wrote:
>> On Mon, Jul 05, 2010 at 07:14:20PM +0200, Ingo Molnar wrote:
>>> Nor does it apply to v2.6.35-rc4:
>> It's indeed missing the two previous patches to the writeback code that
>> I thought Jens sent to Linus aswell [1]. The race was initially found using
>> a distro kernel with the patches backported, and in the meantime we've
>> done a lot of testing with this patch (and the two previous that also
>> were backported). I'd prefer to get this full stack that's been in
>> -next for a while and got large machine testing, but if Jens prefers it
>> I can aim for a smaller variant. Jens, what version do you prefer?
>
> The oops itself looks like a recurrence of the missing RCU grace or
> too early stack wakeup, which should be a 1-2 liner once it's found.
> So I think such a patch would be greatly preferable to doing this
> much churn this late in the cycle.
>
Absent a small fix, and given that the big fix has a lot more testing than any
new patch might, in this case the quickie might be undesirable. Particularly
since posters here seem sure that code will be replaced in the next version
anyway, and lightly tested patch to obsolete code is actually less conservative.
I can't reproduce the bug, so take that as an opinion based on Christoph's
comment on the use of a tested full change.
> Christoph, do you have time to look into that? We can always punt to
> the larger version in a few days if unsuccessful, which gets rid of
> the problem by simply deleting the troublesome and complex
> clear/wakeup logic.
>
--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
On Monday, July 05, 2010, Linus Torvalds wrote:
> Hey everybody, there's finally a new -rc out there..
>
> I've been back online for a week, and at least judging by the kinds of
> patches and pull requests I've been getting, I have to say that I
> think that having been strict for -rc3 ended up working out pretty
> well. The diffstat of rc3..rc4 looks quite reasonable, despite the
> longer window between rc's. And while there were certainly some things
> that needed fixing, I'm hoping that we'll have a timely 2.6.35 release
> despite my vacation (my longest time away from the kernel in many
> years, I do believe - I followed email on a cellphone, but did not a
> single kernel compile, and had a great time under water).
>
> So go out and test -rc4. It fixes a number of regressions, a couple of
> them harking back to from before 2.6.34. Networking, cfq, i915 and
> radeo. And filesystem writeback performance issues, etc. It's all
> good.
Hmm. Am I dong anything particularly wrong, or is CONFIG_LOCALVERSION ignored
now, as it seems?
Rafael
On Mon, Jul 05, 2010 at 11:25:56PM +0200, Rafael J. Wysocki wrote:
> Hmm. Am I dong anything particularly wrong, or is CONFIG_LOCALVERSION ignored
> now, as it seems?
(also reported by Stephen earlier that day)
On Mon, Jul 05, 2010 at 10:39:41AM +1000, Stephen Rothwell wrote:
> After merging the libata tree, today's linux-next build produced this
> warning (but it should be a failure ...):
>
> /bin/sh: scripts/setlocalversion: No such file or directory
>
> Probably introduced by commit 0a564b2645c8766a669c55bde1f1ef5b0518caec
> ("kbuild: Propagate LOCALVERSION= down to scripts/setlocalversion").
>
> That commit really isn't ready for inclusion in Linus' tree ...
Oops, I'm an idiot. The below patch should fix localversion again:
From: Michal Marek <[email protected]>
Subject: [PATCH] kbuild: Fix path to scripts/setlocalversion
Commit 0a564b2 broke LOCALVERSION for O=... builds. Ouch.
Reported-by: Stephen Rothwell <[email protected]>
Reported-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Michal Marek <[email protected]>
diff --git a/Makefile b/Makefile
index 12ab175..df16c4f 100644
--- a/Makefile
+++ b/Makefile
@@ -886,7 +886,7 @@ $(vmlinux-dirs): prepare scripts
# Store (new) KERNELRELASE string in include/config/kernel.release
include/config/kernel.release: include/config/auto.conf FORCE
$(Q)rm -f $@
- $(Q)echo "$(KERNELVERSION)$$($(CONFIG_SHELL) scripts/setlocalversion $(srctree))" > $@
+ $(Q)echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion $(srctree))" > $@
# Things we need to do before we recursively start building the kernel
On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <[email protected]> wrote:
>
> Absent a small fix, and given that the big fix has a lot more testing than
> any new patch might, in this case the quickie might be undesirable.
> Particularly since posters here seem sure that code will be replaced in the
> next version anyway, and lightly tested patch to obsolete code is actually
> less conservative.
I have to agree. Especially as the "big patch" just removes the
fragile code that caused the problem in the first place. So in this
case I do suspect that the bigger patch ends up being the safer one.
But I obviously don't actually see the problem, so it would be good to
get confirmation that Christoph's patch actually fixes things first.
Ingo, does the one in this thread apply for you?
Linus
On 2010-07-06 01:34, Linus Torvalds wrote:
> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <[email protected]> wrote:
>>
>> Absent a small fix, and given that the big fix has a lot more testing than
>> any new patch might, in this case the quickie might be undesirable.
>> Particularly since posters here seem sure that code will be replaced in the
>> next version anyway, and lightly tested patch to obsolete code is actually
>> less conservative.
>
> I have to agree. Especially as the "big patch" just removes the
> fragile code that caused the problem in the first place. So in this
> case I do suspect that the bigger patch ends up being the safer one.
OK, it's up to you.
> But I obviously don't actually see the problem, so it would be good to
> get confirmation that Christoph's patch actually fixes things first.
> Ingo, does the one in this thread apply for you?
I'll update for-linus, Ingo can try that.
--
Jens Axboe
* Linus Torvalds <[email protected]> wrote:
> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <[email protected]> wrote:
> >
> > Absent a small fix, and given that the big fix has a lot more testing than
> > any new patch might, in this case the quickie might be undesirable.
> > Particularly since posters here seem sure that code will be replaced in the
> > next version anyway, and lightly tested patch to obsolete code is actually
> > less conservative.
>
> I have to agree. Especially as the "big patch" just removes the fragile code
> that caused the problem in the first place. So in this case I do suspect
> that the bigger patch ends up being the safer one.
Yeah, i agree - especially since the smaller patch is still pretty large (not
a oneliner), plus it does not appear that the precise failure mode is fully
understood either.
> But I obviously don't actually see the problem, so it would be good to get
> confirmation that Christoph's patch actually fixes things first. Ingo, does
> the one in this thread apply for you?
Yes, the three larger patches survived overnight testing with 300+ iterations
and i did some other tests as well, which passed too. These are the patches i
applied:
a73dd720 writeback: remove writeback_inodes_wbc
9f98c0fa writeback: split writeback_inodes_wb
79338d2a writeback: simplify the write back thread queue
Thanks,
Ingo
* Jens Axboe <[email protected]> wrote:
> > But I obviously don't actually see the problem, so it would be good to get
> > confirmation that Christoph's patch actually fixes things first. Ingo,
> > does the one in this thread apply for you?
>
> I'll update for-linus, Ingo can try that.
I have cherry-picked the 3 patches explicitly yesterday, and overnight testing
showed that the bug is fixed. Please send them to Linus.
Thanks,
Ingo
On 2010-07-06 08:47, Ingo Molnar wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
>> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <[email protected]> wrote:
>>>
>>> Absent a small fix, and given that the big fix has a lot more testing than
>>> any new patch might, in this case the quickie might be undesirable.
>>> Particularly since posters here seem sure that code will be replaced in the
>>> next version anyway, and lightly tested patch to obsolete code is actually
>>> less conservative.
>>
>> I have to agree. Especially as the "big patch" just removes the fragile code
>> that caused the problem in the first place. So in this case I do suspect
>> that the bigger patch ends up being the safer one.
>
> Yeah, i agree - especially since the smaller patch is still pretty large (not
> a oneliner), plus it does not appear that the precise failure mode is fully
> understood either.
http://lkml.org/lkml/2010/7/5/232
It's about as straight forward as it can be :-)
It definitely fixes _a_ bug, but whether it's only that bug is not certain.
As long as Linus is fine with the larger fix, then I have no issues going
in that direction.
>> But I obviously don't actually see the problem, so it would be good to get
>> confirmation that Christoph's patch actually fixes things first. Ingo, does
>> the one in this thread apply for you?
>
> Yes, the three larger patches survived overnight testing with 300+ iterations
> and i did some other tests as well, which passed too. These are the patches i
> applied:
>
> a73dd720 writeback: remove writeback_inodes_wbc
> 9f98c0fa writeback: split writeback_inodes_wb
> 79338d2a writeback: simplify the write back thread queue
Great, I'll upstream these bits today. Thanks Ingo.
--
Jens Axboe
* Jens Axboe <[email protected]> wrote:
> On 2010-07-06 08:47, Ingo Molnar wrote:
> >
> > * Linus Torvalds <[email protected]> wrote:
> >
> >> On Mon, Jul 5, 2010 at 2:19 PM, Bill Davidsen <[email protected]> wrote:
> >>>
> >>> Absent a small fix, and given that the big fix has a lot more testing than
> >>> any new patch might, in this case the quickie might be undesirable.
> >>> Particularly since posters here seem sure that code will be replaced in the
> >>> next version anyway, and lightly tested patch to obsolete code is actually
> >>> less conservative.
> >>
> >> I have to agree. Especially as the "big patch" just removes the fragile code
> >> that caused the problem in the first place. So in this case I do suspect
> >> that the bigger patch ends up being the safer one.
> >
> > Yeah, i agree - especially since the smaller patch is still pretty large (not
> > a oneliner), plus it does not appear that the precise failure mode is fully
> > understood either.
>
> http://lkml.org/lkml/2010/7/5/232
>
> It's about as straight forward as it can be :-)
Yeah, i meant Christoph's first patch. Yours looks simple enough but from the
discussion it wasnt clear (to me) yet whether it would fix the bug.
Thanks,
Ingo
On Monday, July 05, 2010, Michal Marek wrote:
> On Mon, Jul 05, 2010 at 11:25:56PM +0200, Rafael J. Wysocki wrote:
> > Hmm. Am I dong anything particularly wrong, or is CONFIG_LOCALVERSION ignored
> > now, as it seems?
>
> (also reported by Stephen earlier that day)
>
> On Mon, Jul 05, 2010 at 10:39:41AM +1000, Stephen Rothwell wrote:
> > After merging the libata tree, today's linux-next build produced this
> > warning (but it should be a failure ...):
> >
> > /bin/sh: scripts/setlocalversion: No such file or directory
> >
> > Probably introduced by commit 0a564b2645c8766a669c55bde1f1ef5b0518caec
> > ("kbuild: Propagate LOCALVERSION= down to scripts/setlocalversion").
> >
> > That commit really isn't ready for inclusion in Linus' tree ...
>
>
> Oops, I'm an idiot. The below patch should fix localversion again:
Yes, it helps here.
Thanks,
Rafael
> From: Michal Marek <[email protected]>
> Subject: [PATCH] kbuild: Fix path to scripts/setlocalversion
>
> Commit 0a564b2 broke LOCALVERSION for O=... builds. Ouch.
>
> Reported-by: Stephen Rothwell <[email protected]>
> Reported-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Michal Marek <[email protected]>
>
> diff --git a/Makefile b/Makefile
> index 12ab175..df16c4f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -886,7 +886,7 @@ $(vmlinux-dirs): prepare scripts
> # Store (new) KERNELRELASE string in include/config/kernel.release
> include/config/kernel.release: include/config/auto.conf FORCE
> $(Q)rm -f $@
> - $(Q)echo "$(KERNELVERSION)$$($(CONFIG_SHELL) scripts/setlocalversion $(srctree))" > $@
> + $(Q)echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion $(srctree))" > $@
>
>
> # Things we need to do before we recursively start building the kernel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
Hi Michal,
On Tue, 6 Jul 2010 20:29:34 +0200 "Rafael J. Wysocki" <[email protected]> wrote:
>
> On Monday, July 05, 2010, Michal Marek wrote:
> > On Mon, Jul 05, 2010 at 11:25:56PM +0200, Rafael J. Wysocki wrote:
> > > Hmm. Am I dong anything particularly wrong, or is CONFIG_LOCALVERSION ignored
> > > now, as it seems?
> >
> > (also reported by Stephen earlier that day)
> >
> > On Mon, Jul 05, 2010 at 10:39:41AM +1000, Stephen Rothwell wrote:
> > > After merging the libata tree, today's linux-next build produced this
> > > warning (but it should be a failure ...):
> > >
> > > /bin/sh: scripts/setlocalversion: No such file or directory
> > >
> > > Probably introduced by commit 0a564b2645c8766a669c55bde1f1ef5b0518caec
> > > ("kbuild: Propagate LOCALVERSION= down to scripts/setlocalversion").
> > >
> > > That commit really isn't ready for inclusion in Linus' tree ...
> >
> >
> > Oops, I'm an idiot. The below patch should fix localversion again:
>
> Yes, it helps here.
I also applied it to linux-next yesterday (in my fixes tree) and it
solves the problem.
--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/
On Tue, Jul 06, 2010 at 08:47:33AM +0200, Ingo Molnar wrote:
> Yeah, i agree - especially since the smaller patch is still pretty large (not
> a oneliner), plus it does not appear that the precise failure mode is fully
> understood either.
The two failure modes we saw are very precicely understood:
- access to ->state after it's been freed
- access to ->state after the caller which has the bdi_work struct on
stack has been woken up.
But I seriously fear that there are even more dragons hidden in this
maze of bit wakeups and rcu, so replacing it with a proper completion
seems by far safer to me.
I have found one system, where 2.6.35 does not work (as tested with rc3
and rc4)
That Intel system has no problems in 2.6.33.x nor 2.6.34.0.
The problem seems to be in AGP - I can boot if I specify "agp=off" - but
of course only in text mode...
There seems to be a hard lock-up, so the only way to show the crash is
by picture 8-)
Since I do not build kernel on that machine, I did not do any bisect
tests, however if someone is interested in digging deeper, I can try...
Preferably a patch to try out ;-)
This bug seems to be different then
https://bugzilla.kernel.org/show_bug.cgi?id=16179
Should it be blamed on BIOS (the conflict indicated just before the crash)?
Thanks, Woody
On Fri, 9 Jul 2010 10:21:56 -0400
Woody Suwalski <[email protected]> wrote:
> I have found one system, where 2.6.35 does not work (as tested with rc3
> and rc4)
> That Intel system has no problems in 2.6.33.x nor 2.6.34.0.
>
> The problem seems to be in AGP - I can boot if I specify "agp=off" - but
> of course only in text mode...
> There seems to be a hard lock-up, so the only way to show the crash is
> by picture 8-)
>
> Since I do not build kernel on that machine, I did not do any bisect
> tests, however if someone is interested in digging deeper, I can try...
> Preferably a patch to try out ;-)
>
> This bug seems to be different then
> https://bugzilla.kernel.org/show_bug.cgi?id=16179
>
> Should it be blamed on BIOS (the conflict indicated just before the crash)?
Well even if the BIOS is doing something bad, if we handled it in
earlier kernels we should handle it today. So this sounds like a
regression.
A bisect should help if it was working before, can you do that?
--
Jesse Barnes, Intel Open Source Technology Center
Jesse Barnes wrote:
> On Fri, 9 Jul 2010 10:21:56 -0400
> Woody Suwalski<[email protected]> wrote:
>
>
>> I have found one system, where 2.6.35 does not work (as tested with rc3
>> and rc4)
>> That Intel system has no problems in 2.6.33.x nor 2.6.34.0.
>>
>> The problem seems to be in AGP - I can boot if I specify "agp=off" - but
>> of course only in text mode...
>> There seems to be a hard lock-up, so the only way to show the crash is
>> by picture 8-)
>>
>> Since I do not build kernel on that machine, I did not do any bisect
>> tests, however if someone is interested in digging deeper, I can try...
>> Preferably a patch to try out ;-)
>>
>> This bug seems to be different then
>> https://bugzilla.kernel.org/show_bug.cgi?id=16179
>>
>> Should it be blamed on BIOS (the conflict indicated just before the crash)?
>>
> Well even if the BIOS is doing something bad, if we handled it in
> earlier kernels we should handle it today. So this sounds like a
> regression.
>
> A bisect should help if it was working before, can you do that?
>
>
OK, I have never really done the brute-force bissecting, but there is
always first time...
Will try to do it over the weekend...
Unless U will be visiting Ottawa and want to do it yourself ;-)
Woody
On Fri, 9 Jul 2010 12:27:43 -0400
Woody Suwalski <[email protected]> wrote:
> Jesse Barnes wrote:
> > On Fri, 9 Jul 2010 10:21:56 -0400
> > Woody Suwalski<[email protected]> wrote:
> >
> >
> >> I have found one system, where 2.6.35 does not work (as tested with rc3
> >> and rc4)
> >> That Intel system has no problems in 2.6.33.x nor 2.6.34.0.
> >>
> >> The problem seems to be in AGP - I can boot if I specify "agp=off" - but
> >> of course only in text mode...
> >> There seems to be a hard lock-up, so the only way to show the crash is
> >> by picture 8-)
> >>
> >> Since I do not build kernel on that machine, I did not do any bisect
> >> tests, however if someone is interested in digging deeper, I can try...
> >> Preferably a patch to try out ;-)
> >>
> >> This bug seems to be different then
> >> https://bugzilla.kernel.org/show_bug.cgi?id=16179
> >>
> >> Should it be blamed on BIOS (the conflict indicated just before the crash)?
> >>
> > Well even if the BIOS is doing something bad, if we handled it in
> > earlier kernels we should handle it today. So this sounds like a
> > regression.
> >
> > A bisect should help if it was working before, can you do that?
> >
> >
> OK, I have never really done the brute-force bissecting, but there is
> always first time...
> Will try to do it over the weekend...
Great, thanks.
> Unless U will be visiting Ottawa and want to do it yourself ;-)
Nope, no immediate plans. :)
--
Jesse Barnes, Intel Open Source Technology Center
On Mon, Jul 5, 2010 at 3:22 AM, Xiaotian Feng <[email protected]> wrote:
>
> I'm facing some drm issue with -rc4, my x-windows is messed up and can
> not be used. I got floods of following messages, kernel config is
> attached. I have the same issue on -mmotm-rc3.
It looks like the first oops that causes this problem is missing. It
looks to me that the "scheduling while atomic" issue happens because
you had an oops in drm_fb_release->mutex_lock_nested->oops, and then
when that kills the process, we get that secondary "scheduling while
atomic" thing.
But I'd really like to see the first oops itself. It _looks_ like it
should be a page fault with this backtrace:
> [ ? 34.532219] ?[<ffffffffa0023e39>] ? drm_fb_release+0x2f/0x7e [drm]
> [ ? 34.532222] ?[<ffffffff8127be8a>] ? __list_add+0x3f/0x81
> [ ? 34.532225] ?[<ffffffff814fa4e0>] __mutex_lock_common+0x131/0x461
> [ ? 34.532233] ?[<ffffffffa0023e39>] ? drm_fb_release+0x2f/0x7e [drm]
> [ ? 34.532239] ?[<ffffffffa001b104>] ? drm_release+0x2bd/0x63e [drm]
> [ ? 34.532242] ?[<ffffffff814fa8c5>] mutex_lock_nested+0x39/0x3e
> [ ? 34.532250] ?[<ffffffffa0023e39>] drm_fb_release+0x2f/0x7e [drm]
> [ ? 34.532257] ?[<ffffffffa001b1da>] drm_release+0x393/0x63e [drm]
> [ ? 34.532259] ?[<ffffffff8111a636>] fput+0x135/0x1e2
> [ ? 34.532262] ?[<ffffffff811177e5>] filp_close+0x68/0x72
> [ ? 34.532265] ?[<ffffffff811178cb>] sys_close+0xdc/0x116
> [ ? 34.532268] ?[<ffffffff81009cc2>] system_call_fastpath+0x16/0x1b
but I'd like to see the actual register state etc from the oops too.
And if it all scrolls away so quickly that you can't see it, then I
suspect we need some help in the form of a bisection or something. Was
2.6.35-rc3 fine for you? If so, it should bisect pretty well - there's
only a few hundred commits, so you should be able to do it in eight or
nine recompiles.
Linus
On piątek, 9 lipca 2010 o 16:21:56 Woody Suwalski wrote:
> I have found one system, where 2.6.35 does not work (as tested with rc3
> and rc4)
> That Intel system has no problems in 2.6.33.x nor 2.6.34.0.
>
> The problem seems to be in AGP - I can boot if I specify "agp=off" - but
> of course only in text mode...
> There seems to be a hard lock-up, so the only way to show the crash is
> by picture 8-)
>
> Since I do not build kernel on that machine, I did not do any bisect
> tests, however if someone is interested in digging deeper, I can try...
> Preferably a patch to try out ;-)
>
> This bug seems to be different then
> https://bugzilla.kernel.org/show_bug.cgi?id=16179
>
> Should it be blamed on BIOS (the conflict indicated just before the crash)?
>
> Thanks, Woody
I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16369
for your bug report, please add your address to the CC list in there, thanks!
--
Maciej Rutecki
http://www.maciek.unixy.pl
On Jul. 06, 2010, 0:25 +0300, "Rafael J. Wysocki" <[email protected]> wrote:
> On Monday, July 05, 2010, Linus Torvalds wrote:
>> Hey everybody, there's finally a new -rc out there..
>>
>> I've been back online for a week, and at least judging by the kinds of
>> patches and pull requests I've been getting, I have to say that I
>> think that having been strict for -rc3 ended up working out pretty
>> well. The diffstat of rc3..rc4 looks quite reasonable, despite the
>> longer window between rc's. And while there were certainly some things
>> that needed fixing, I'm hoping that we'll have a timely 2.6.35 release
>> despite my vacation (my longest time away from the kernel in many
>> years, I do believe - I followed email on a cellphone, but did not a
>> single kernel compile, and had a great time under water).
>>
>> So go out and test -rc4. It fixes a number of regressions, a couple of
>> them harking back to from before 2.6.34. Networking, cfq, i915 and
>> radeo. And filesystem writeback performance issues, etc. It's all
>> good.
>
> Hmm. Am I dong anything particularly wrong, or is CONFIG_LOCALVERSION ignored
> now, as it seems?
Not working for me either in 2.6.35-rc4.
Note that I have a localversion-xxxx file in my root directory
which is also ignored, regardless of CONFIG_LOCALVERSION
and CONFIG_LOCALVERSION_AUTO.
Benny
>
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
2010/7/12 Benny Halevy <[email protected]>:
>
> Not working for me either in 2.6.35-rc4.
I assume you're building with "O="? If so, have you checked -git (or
any of the nightly snapshots)?
Becuse it _should_ have been fixed about a week ago in commit
7263e715c91f ("kbuild: Fix path to scripts/setlocalversion"). But
that's after -rc4.
(I'll do an -rc5 probably today, but it would be good to have it
confirmed that the bug is gone)
Linus
On Jul. 12, 2010, 20:20 +0300, Linus Torvalds <[email protected]> wrote:
> 2010/7/12 Benny Halevy <[email protected]>:
>>
>> Not working for me either in 2.6.35-rc4.
>
> I assume you're building with "O="? If so, have you checked -git (or
> any of the nightly snapshots)?
Correct.
>
> Becuse it _should_ have been fixed about a week ago in commit
> 7263e715c91f ("kbuild: Fix path to scripts/setlocalversion"). But
> that's after -rc4.
>
> (I'll do an -rc5 probably today, but it would be good to have it
> confirmed that the bug is gone)
Confirmed. This patch fixes my problem too.
Thanks!
Benny
>
> Linus