2015-12-07 11:40:51

by Jan Stancek

Subject: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)

Hi,

The "ADSP018" test from LTP[1] reliably triggers the BUG_ON below for me on 4.4.0-rc4.
I'll start a bisect. If someone already sees a suspect/culprit that could narrow
it down, please let me know.

# ./aiodio_sparse -i 4 -a 8k -w 16384k -s 65536k -n 2
aiodio_sparse 0 TINFO : Dirtying free blocks
aiodio_sparse 0 TINFO : Starting I/O tests
aiodio_sparse 0 TINFO : Killing childrens(s)

[ 637.250251] ------------[ cut here ]------------
[ 637.255404] kernel BUG at mm/filemap.c:238!
[ 637.260069] invalid opcode: 0000 [#1] SMP
[ 637.264655] Modules linked in: loop x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ipmi_devintf ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif lpc_ich sg pcspkr shpchp i2c_i801 mfd_core ipmi_si winbond_cir parport_pc rc_core parport ipmi_msghandler video nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod sd_mod cdrom mgag200 drm_kms_helper igb syscopyarea sysfillrect sysimgblt ptp fb_sys_fops pps_core ttm dca i2c_algo_bit drm ahci libahci crc32c_intel libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
[ 637.328054] CPU: 6 PID: 22523 Comm: aiodio_sparse Not tainted 4.4.0-rc4 #1
[ 637.335723] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.01.0002.041520151123 04/15/2015
[ 637.347173] task: ffff880437fab200 ti: ffff8804379f4000 task.ti: ffff8804379f4000
[ 637.355522] RIP: 0010:[<ffffffff811cd141>] [<ffffffff811cd141>] delete_from_page_cache+0x81/0x90
[ 637.365433] RSP: 0018:ffff8804379f7978 EFLAGS: 00010246
[ 637.371358] RAX: 002fffff80020028 RBX: ffffea000fe71c40 RCX: 0000000000000000
[ 637.379319] RDX: ffff88043e410220 RSI: 0000000000000000 RDI: ffffea000fe71c40
[ 637.387280] RBP: ffff8804379f79a0 R08: 0000000000000000 R09: 0000000000000001
[ 637.395241] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880430b543b8
[ 637.403202] R13: ffff8804379f79f0 R14: 0000000000000964 R15: 0000000000000000
[ 637.411161] FS: 00007fd344bab740(0000) GS:ffff88043e400000(0000) knlGS:0000000000000000
[ 637.420188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 637.426598] CR2: 00007ffc755f8aef CR3: 0000000001ad6000 CR4: 00000000003406e0
[ 637.434560] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 637.442518] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 637.450479] Stack:
[ 637.452718] ffffea000fe71c40 ffff880430b543b8 ffff8804379f79f0 0000000000000964
[ 637.461009] 0000000000000000 ffff8804379f79c0 ffffffff811dee66 ffffffffffffffff
[ 637.469299] ffff8804379f7a60 ffff8804379f7b10 ffffffff811df2cb 0000000000000000
[ 637.477590] Call Trace:
[ 637.480316] [<ffffffff811dee66>] truncate_inode_page+0x56/0x90
[ 637.486922] [<ffffffff811df2cb>] truncate_inode_pages_range+0x3eb/0x760
[ 637.494399] [<ffffffff811df6ac>] truncate_inode_pages_final+0x4c/0x60
[ 637.501695] [<ffffffffa02b2537>] xfs_fs_evict_inode+0x77/0x1b0 [xfs]
[ 637.508881] [<ffffffff8127c22f>] evict+0xaf/0x180
[ 637.514223] [<ffffffff8127cc6f>] iput+0x1af/0x290
[ 637.519566] [<ffffffff812763fc>] __dentry_kill+0x17c/0x1e0
[ 637.525782] [<ffffffff812776ad>] dput+0x25d/0x310
[ 637.531126] [<ffffffff81277470>] ? dput+0x20/0x310
[ 637.536566] [<ffffffff8125f904>] __fput+0x1a4/0x240
[ 637.542102] [<ffffffff8125f9de>] ____fput+0xe/0x10
[ 637.547542] [<ffffffff810b91a7>] task_work_run+0x77/0xa0
[ 637.553565] [<ffffffff81098fdf>] do_exit+0x33f/0xc60
[ 637.559199] [<ffffffff8109998c>] do_group_exit+0x4c/0xc0
[ 637.565221] [<ffffffff810a7a11>] get_signal+0x331/0x8f0
[ 637.571147] [<ffffffff8101d3c7>] do_signal+0x37/0x680
[ 637.576878] [<ffffffff81113ab3>] ? rcu_read_lock_sched_held+0x93/0xa0
[ 637.584160] [<ffffffff8123303e>] ? kfree+0x1ae/0x270
[ 637.589794] [<ffffffff8108f2e4>] ? exit_to_usermode_loop+0x33/0xac
[ 637.596785] [<ffffffff8108f30f>] exit_to_usermode_loop+0x5e/0xac
[ 637.603584] [<ffffffff81003d0b>] syscall_return_slowpath+0xbb/0x130
[ 637.610673] [<ffffffff81761ada>] int_ret_from_sys_call+0x25/0x9f
[ 637.617469] Code: e8 65 3d 59 00 4c 89 ef e8 1d 72 07 00 4d 85 f6 74 06 48 89 df 41 ff d6 48 89 df e8 1a fb 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[ 637.639172] RIP [<ffffffff811cd141>] delete_from_page_cache+0x81/0x90
[ 637.646464] RSP <ffff8804379f7978>

Regards,
Jan

[1] https://github.com/linux-test-project/ltp


2015-12-07 15:19:04

by Jan Stancek

Subject: Re: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)

On 12/07/2015 12:40 PM, Jan Stancek wrote:
> Hi,
>
> The "ADSP018" test from LTP[1] reliably triggers the BUG_ON below for me on 4.4.0-rc4.
> I'll start a bisect. If someone already sees a suspect/culprit that could narrow
> it down, please let me know.
>
> # ./aiodio_sparse -i 4 -a 8k -w 16384k -s 65536k -n 2
> aiodio_sparse 0 TINFO : Dirtying free blocks
> aiodio_sparse 0 TINFO : Starting I/O tests
> aiodio_sparse 0 TINFO : Killing childrens(s)


So, according to the bisect, the first bad commit is:

commit 68985633bccb6066bf1803e316fbc6c1f5b796d6
Author: Peter Zijlstra <[email protected]>
Date: Tue Dec 1 14:04:04 2015 +0100

sched/wait: Fix signal handling in bit wait helpers

Vladimir reported getting RCU stall warnings and bisected it back to
commit:

743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")

That commit inadvertently reversed the calls to schedule() and signal_pending(),
thereby not handling the case where the signal receives while we sleep.

Reported-by: Vladimir Murzin <[email protected]>
Tested-by: Vladimir Murzin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

which, it seems to me, is only exposing a problem elsewhere.
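
One possible reading of how this commit could surface as the BUG_ON above,
sketched as a toy user-space model in C. It rests on assumptions the thread
does not state at this point: that mm/filemap.c:238 is the
BUG_ON(!PageLocked(page)) check in delete_from_page_cache(), and that
lock_page() sleeps in an uninterruptible mode and discards the return value
of the wait. Every toy_* name below is hypothetical and only stands in for
the kernel function named in its comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins; these are not the kernel's real structures. */
enum toy_owner { TOY_NOBODY, TOY_OTHER, TOY_US };
struct toy_task { bool signal_pending; };
struct toy_page { enum toy_owner owner; };

/*
 * Stands in for bit_wait_io() as of the bisected commit: after sleeping it
 * reports -EINTR for any pending signal, whatever mode the caller waits in.
 */
static int toy_bit_wait_io(const struct toy_task *task)
{
	/* io_schedule() would sleep here */
	return task->signal_pending ? -EINTR : 0;
}

/* Stands in for __wait_on_bit_lock(): 0 means the lock is now ours. */
static int toy_wait_on_bit_lock(struct toy_page *page,
				const struct toy_task *task)
{
	while (page->owner != TOY_NOBODY) {
		int ret = toy_bit_wait_io(task);

		if (ret)
			return ret;		/* wait aborted, lock not taken */
		page->owner = TOY_NOBODY;	/* other holder unlocked meanwhile */
	}
	page->owner = TOY_US;			/* we take the lock */
	return 0;
}

/* Stands in for lock_page(): returns void, so an aborted wait goes unnoticed. */
static void toy_lock_page(struct toy_page *page, const struct toy_task *task)
{
	(void)toy_wait_on_bit_lock(page, task);
}

/*
 * Runs the truncate path in miniature and reports whether the page is
 * locked at the point where delete_from_page_cache() would check it.
 */
bool toy_page_locked_at_delete(bool signal_pending)
{
	struct toy_task task = { .signal_pending = signal_pending };
	struct toy_page page = { .owner = TOY_OTHER };	/* held by someone else */

	toy_lock_page(&page, &task);
	if (page.owner == TOY_OTHER)
		page.owner = TOY_NOBODY;	/* the other holder eventually unlocks */
	return page.owner != TOY_NOBODY;	/* what PageLocked() would report */
}

/* Self-check documenting the two outcomes of the model. */
void toy_self_check(void)
{
	assert(toy_page_locked_at_delete(false));	/* no signal: page is locked */
	assert(!toy_page_locked_at_delete(true));	/* signal: page is not locked */
}
```

In this model, a task with no pending signal reaches the delete-time check
holding the page lock. A task with a pending signal (for example one being
killed, as the test does to its children) abandons the wait and reaches the
same check with the page unlocked.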


2015-12-07 15:45:07

by Peter Zijlstra

Subject: Re: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)

On Mon, Dec 07, 2015 at 04:18:30PM +0100, Jan Stancek wrote:
> So, according to the bisect, the first bad commit is:
>
> commit 68985633bccb6066bf1803e316fbc6c1f5b796d6
> Author: Peter Zijlstra <[email protected]>
> Date: Tue Dec 1 14:04:04 2015 +0100
>
> sched/wait: Fix signal handling in bit wait helpers
>
> which, it seems to me, is only exposing a problem elsewhere.
>

Nope, I think I messed that up, just not sure how to fix it proper then.
Let me have a ponder.

2015-12-07 16:07:20

by Peter Zijlstra

Subject: Re: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)

On Mon, Dec 07, 2015 at 04:44:59PM +0100, Peter Zijlstra wrote:
>
> Nope, I think I messed that up, just not sure how to fix it proper then.
> Let me have a ponder.

Blergh I hate signals :/

The below compiles, does it work?

---
fs/cifs/inode.c | 6 +++---
fs/nfs/inode.c | 6 +++---
fs/nfs/internal.h | 2 +-
fs/nfs/pagelist.c | 2 +-
fs/nfs/pnfs.c | 4 ++--
include/linux/wait.h | 10 +++++-----
kernel/sched/wait.c | 20 ++++++++++----------
net/sunrpc/sched.c | 6 +++---
8 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 6b66dd5..a329f5b 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1831,11 +1831,11 @@ cifs_invalidate_mapping(struct inode *inode)
* @word: long word containing the bit lock
*/
static int
-cifs_wait_bit_killable(struct wait_bit_key *key)
+cifs_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
freezable_schedule_unsafe();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
return 0;
}

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 31b0a52..c7e8b87 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -75,11 +75,11 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
* nfs_wait_bit_killable - helper for functions that are sleeping on bit locks
* @word: long word containing the bit lock
*/
-int nfs_wait_bit_killable(struct wait_bit_key *key)
+int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
freezable_schedule_unsafe();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
return 0;
}
EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 56cfde2..9dea85f 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -379,7 +379,7 @@ extern int nfs_drop_inode(struct inode *);
extern void nfs_clear_inode(struct inode *);
extern void nfs_evict_inode(struct inode *);
void nfs_zap_acl_cache(struct inode *inode);
-extern int nfs_wait_bit_killable(struct wait_bit_key *key);
+extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);

/* super.c */
extern const struct super_operations nfs_sops;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index fe3ddd2..452a011 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -129,7 +129,7 @@ __nfs_iocounter_wait(struct nfs_io_counter *c)
set_bit(NFS_IO_INPROGRESS, &c->flags);
if (atomic_read(&c->io_count) == 0)
break;
- ret = nfs_wait_bit_killable(&q.key);
+ ret = nfs_wait_bit_killable(&q.key, TASK_KILLABLE);
} while (atomic_read(&c->io_count) != 0 && !ret);
finish_wait(wq, &q.wait);
return ret;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5a8ae21..bec0384 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1466,11 +1466,11 @@ static bool pnfs_within_mdsthreshold(struct nfs_open_context *ctx,
}

/* stop waiting if someone clears NFS_LAYOUT_RETRY_LAYOUTGET bit. */
-static int pnfs_layoutget_retry_bit_wait(struct wait_bit_key *key)
+static int pnfs_layoutget_retry_bit_wait(struct wait_bit_key *key, int mode)
{
if (!test_bit(NFS_LAYOUT_RETRY_LAYOUTGET, key->flags))
return 1;
- return nfs_wait_bit_killable(key);
+ return nfs_wait_bit_killable(key, mode);
}

static bool pnfs_prepare_to_retry_layoutget(struct pnfs_layout_hdr *lo)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1e1bf9f..513b36f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -145,7 +145,7 @@ __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
list_del(&old->task_list);
}

-typedef int wait_bit_action_f(struct wait_bit_key *);
+typedef int wait_bit_action_f(struct wait_bit_key *, int mode);
void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key);
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
@@ -960,10 +960,10 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
} while (0)


-extern int bit_wait(struct wait_bit_key *);
-extern int bit_wait_io(struct wait_bit_key *);
-extern int bit_wait_timeout(struct wait_bit_key *);
-extern int bit_wait_io_timeout(struct wait_bit_key *);
+extern int bit_wait(struct wait_bit_key *, int);
+extern int bit_wait_io(struct wait_bit_key *, int);
+extern int bit_wait_timeout(struct wait_bit_key *, int);
+extern int bit_wait_io_timeout(struct wait_bit_key *, int);

/**
* wait_on_bit - wait for a bit to be cleared
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index f10bd87..f15d6b6 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -392,7 +392,7 @@ __wait_on_bit(wait_queue_head_t *wq, struct wait_bit_queue *q,
do {
prepare_to_wait(wq, &q->wait, mode);
if (test_bit(q->key.bit_nr, q->key.flags))
- ret = (*action)(&q->key);
+ ret = (*action)(&q->key, mode);
} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
finish_wait(wq, &q->wait);
return ret;
@@ -431,7 +431,7 @@ __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
prepare_to_wait_exclusive(wq, &q->wait, mode);
if (!test_bit(q->key.bit_nr, q->key.flags))
continue;
- ret = action(&q->key);
+ ret = action(&q->key, mode);
if (!ret)
continue;
abort_exclusive_wait(wq, &q->wait, mode, &q->key);
@@ -581,43 +581,43 @@ void wake_up_atomic_t(atomic_t *p)
}
EXPORT_SYMBOL(wake_up_atomic_t);

-__sched int bit_wait(struct wait_bit_key *word)
+__sched int bit_wait(struct wait_bit_key *word, int mode)
{
schedule();
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait);

-__sched int bit_wait_io(struct wait_bit_key *word)
+__sched int bit_wait_io(struct wait_bit_key *word, int mode)
{
io_schedule();
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
EXPORT_SYMBOL(bit_wait_io);

-__sched int bit_wait_timeout(struct wait_bit_key *word)
+__sched int bit_wait_timeout(struct wait_bit_key *word, int mode)
{
unsigned long now = READ_ONCE(jiffies);
if (time_after_eq(now, word->timeout))
return -EAGAIN;
schedule_timeout(word->timeout - now);
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
EXPORT_SYMBOL_GPL(bit_wait_timeout);

-__sched int bit_wait_io_timeout(struct wait_bit_key *word)
+__sched int bit_wait_io_timeout(struct wait_bit_key *word, int mode)
{
unsigned long now = READ_ONCE(jiffies);
if (time_after_eq(now, word->timeout))
return -EAGAIN;
io_schedule_timeout(word->timeout - now);
- if (signal_pending(current))
+ if (signal_pending_state(mode, current))
return -EINTR;
return 0;
}
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index f14f24e..73ad57a 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -250,11 +250,11 @@ void rpc_destroy_wait_queue(struct rpc_wait_queue *queue)
}
EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);

-static int rpc_wait_bit_killable(struct wait_bit_key *key)
+static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (fatal_signal_pending(current))
- return -ERESTARTSYS;
freezable_schedule_unsafe();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
return 0;
}
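
The core of the patch above is that the wait mode now reaches the action
helpers, so the signal check can depend on how the caller chose to sleep.
A minimal user-space sketch of that decision follows. The toy_* names and
the numeric values of the state bits are assumptions for illustration, and
toy_signal_pending_state() is modeled on the kernel's signal_pending_state()
helper rather than taken from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative state bits; the values are assumptions, not quoted from the patch. */
#define TOY_TASK_INTERRUPTIBLE		0x01
#define TOY_TASK_UNINTERRUPTIBLE	0x02
#define TOY_TASK_WAKEKILL		0x80
#define TOY_TASK_KILLABLE	(TOY_TASK_WAKEKILL | TOY_TASK_UNINTERRUPTIBLE)

/* Toy stand-in for the signal state of "current". */
struct toy_task {
	bool signal_pending;		/* any signal is pending */
	bool fatal_signal_pending;	/* a fatal signal (e.g. SIGKILL) is pending */
};

/* Models signal_pending_state(): should a sleep in this mode be aborted? */
int toy_signal_pending_state(int mode, const struct toy_task *task)
{
	/* Purely uninterruptible sleeps never react to signals. */
	if (!(mode & (TOY_TASK_INTERRUPTIBLE | TOY_TASK_WAKEKILL)))
		return 0;
	if (!task->signal_pending)
		return 0;
	/* Interruptible: any signal counts. Killable: only fatal ones. */
	return (mode & TOY_TASK_INTERRUPTIBLE) || task->fatal_signal_pending;
}

/* Shape of a post-patch action helper such as bit_wait(key, mode). */
int toy_bit_wait(int mode, const struct toy_task *task)
{
	/* schedule() would sleep here */
	if (toy_signal_pending_state(mode, task))
		return -EINTR;
	return 0;
}

/* Self-check documenting the outcome for each wait mode. */
void toy_bit_wait_self_check(void)
{
	struct toy_task plain = { .signal_pending = true };
	struct toy_task fatal = { .signal_pending = true,
				  .fatal_signal_pending = true };

	assert(toy_bit_wait(TOY_TASK_UNINTERRUPTIBLE, &fatal) == 0);
	assert(toy_bit_wait(TOY_TASK_INTERRUPTIBLE, &plain) == -EINTR);
	assert(toy_bit_wait(TOY_TASK_KILLABLE, &plain) == 0);
	assert(toy_bit_wait(TOY_TASK_KILLABLE, &fatal) == -EINTR);
}
```

Under these assumptions an uninterruptible waiter keeps waiting no matter
which signal is pending, while interruptible and killable waiters still bail
out for the signals they are meant to honour.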

2015-12-07 17:32:54

by Jan Stancek

Subject: Re: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)





----- Original Message -----
> From: "Peter Zijlstra" <[email protected]>
> To: "Jan Stancek" <[email protected]>
> Cc: [email protected], [email protected], "Oleg Nesterov" <[email protected]>
> Sent: Monday, 7 December, 2015 5:07:15 PM
> Subject: Re: kernel BUG at mm/filemap.c:238! (4.4.0-rc4)
>
> Blergh I hate signals :/
>
> The below compiles, does it work?

Yes, it does. I applied your patch on 4.4-rc4 and I can't
reproduce it any longer.
