2012-10-08 18:20:25

by Peter Moody

[permalink] [raw]
Subject: Oops with ext(3|4) and audit and Xen

Hey folks,

I'm trying to track down a BUG() that seems to strike a particular
system configuration (unfortunately, an increasingly common
configuration), but does so with 100% reliability.

The system in question is a Xen instance (6 vcpus, 32G memory) running
3.2 on essentially stock ubuntu (10.04) system.

if I run the attached program with the crash dir set to any ext3 or
ext4 file system with any audit rules installed, I get an oops on the
second time through the while loop:

kernel BUG at fs/buffer.c:1267!
invalid opcode: 0000 [#1] SMP
CPU 1
Pid: 4146, comm: a.out Not tainted 3.2.5-will-break-2-ganetixenu #4
RIP: e030:[<ffffffff81696a6c>] [<ffffffff81696a6c>] check_irqs_on.part
.10+0x17/0x19
RSP: e02b:ffff8807c7339bf8 EFLAGS: 00010096
RAX: 000000000000001e RBX: ffff8807970840b0 RCX: 00000000000000e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: ffff8807c7339bf8 R08: 0000000000000000 R09: 0000000000000018
R10: 0000000000006a5d R11: 0000000000000001 R12: 0000000000000400
R13: ffff8807dee05040 R14: ffff8807c7339dc0 R15: 0000000000000124
FS: 00007fe7cde057c0(0000) GS:ffff8807fff44000(0063) knlGS:00000000000
00000
CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f76dc4b0 CR3: 00000007a769a000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process a.out (pid: 4146, threadinfo ffff8807c7338000, task ffff8807ab3
496b0)
Stack:
ffff8807c7339c68 ffffffff81161dc9 ffff8807c7339c90 ffff8807b6b909f0
ffff8807ab23901a ffff8807c7339d60 ffff880700000000 ffff8807c7339d30
ffff8807c7339d60 ffff8807c7339e78 ffff8807970840b0 0000000000000400
Call Trace:
[<ffffffff81161dc9>] __find_get_block+0x1f9/0x200
[<ffffffff81164c8f>] __getblk+0x1f/0x280
[<ffffffff811f35bb>] __ext4_get_inode_loc+0x10b/0x410
[<ffffffff81124935>] ? kmem_cache_alloc+0xa5/0x150
[<ffffffff811f9857>] ? ext4_evict_inode+0x177/0x450
[<ffffffff811f4cc7>] ext4_get_inode_loc+0x17/0x20
[<ffffffff811f75a8>] ext4_reserve_inode_write+0x28/0xa0
[<ffffffff811f9815>] ? ext4_evict_inode+0x135/0x450
[<ffffffff811f7673>] ext4_mark_inode_dirty+0x53/0x200
[<ffffffff811f9857>] ext4_evict_inode+0x177/0x450
[<ffffffff8114bfb1>] evict+0xa1/0x1a0
[<ffffffff8114cc61>] iput+0x101/0x210
[<ffffffff81148040>] d_kill+0xf0/0x130
[<ffffffff81148bd2>] dput+0xd2/0x1b0
[<ffffffff8113eb85>] path_put+0x15/0x30
[<ffffffff81693e39>] audit_free_names+0x96/0xb5
[<ffffffff810ac629>] audit_syscall_exit+0x139/0x1e0
[<ffffffff816a076a>] sysexit_audit+0x21/0x5f
Code: 5c 48 89 df e8 b6 20 ab ff 5b 41 5c 5d c3 55 48 89 e5 0f 0b 55 be
08 00 00 00 48 c7 c7 c4 fe a0 81 31 c0 48 89 e5 e8 91 cb ff ff <0f>
0b 55 48 89 e5 0f
0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP [<ffffffff81696a6c>] check_irqs_on.part.10+0x17/0x19
RSP <ffff8807c7339bf8>

line 1267 of fs/buffer.c is

static inline void check_irqs_on(void)
{
#ifdef irqs_disabled
BUG_ON(irqs_disabled());
#endif
}

If I run the same code on the same system with the same audit rule(s)
on an ext2 filesystem, I get no such oops.

So it seems like something either in the ext3/ext4 or Xen codepath is
disabling interrupts. I'm getting an updated test Xen instance to test
on, but I while I'm waiting on that, I wanted to see if anyone one
here might have an idea of the ext3/4 codepath. whether something
there is doing the interrupt disabling or if there might be some other
race condition going on. I haven't had a chance to test with the large
"ext4 updates for v3.7" tytso recently posted, but I'll be doing that
later today in case something there fixes this.

So, does any one have any thoughts and/or pointers which might help me
get to the bottom of this?

Cheers,
peter

--
Peter Moody Google 1.650.253.7306
Security Engineer pgp:0xC3410038


Attachments:
crasher.c (3.77 kB)

2012-10-08 19:08:04

by Eric Sandeen

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On 10/8/12 1:19 PM, Peter Moody wrote:
> Hey folks,
>
> I'm trying to track down a BUG() that seems to strike a particular
> system configuration (unfortunately, an increasingly common
> configuration), but does so with 100% reliability.
>
> The system in question is a Xen instance (6 vcpus, 32G memory) running
> 3.2 on essentially stock ubuntu (10.04) system.
>
> if I run the attached program with the crash dir set to any ext3 or
> ext4 file system with any audit rules installed, I get an oops on the
> second time through the while loop:
>
> kernel BUG at fs/buffer.c:1267!
> invalid opcode: 0000 [#1] SMP
> CPU 1
> Pid: 4146, comm: a.out Not tainted 3.2.5-will-break-2-ganetixenu #4
> RIP: e030:[<ffffffff81696a6c>] [<ffffffff81696a6c>] check_irqs_on.part
> .10+0x17/0x19
> RSP: e02b:ffff8807c7339bf8 EFLAGS: 00010096
> RAX: 000000000000001e RBX: ffff8807970840b0 RCX: 00000000000000e7
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: ffff8807c7339bf8 R08: 0000000000000000 R09: 0000000000000018
> R10: 0000000000006a5d R11: 0000000000000001 R12: 0000000000000400
> R13: ffff8807dee05040 R14: ffff8807c7339dc0 R15: 0000000000000124
> FS: 00007fe7cde057c0(0000) GS:ffff8807fff44000(0063) knlGS:00000000000
> 00000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 00000000f76dc4b0 CR3: 00000007a769a000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process a.out (pid: 4146, threadinfo ffff8807c7338000, task ffff8807ab3
> 496b0)
> Stack:
> ffff8807c7339c68 ffffffff81161dc9 ffff8807c7339c90 ffff8807b6b909f0
> ffff8807ab23901a ffff8807c7339d60 ffff880700000000 ffff8807c7339d30
> ffff8807c7339d60 ffff8807c7339e78 ffff8807970840b0 0000000000000400
> Call Trace:
> [<ffffffff81161dc9>] __find_get_block+0x1f9/0x200
> [<ffffffff81164c8f>] __getblk+0x1f/0x280
> [<ffffffff811f35bb>] __ext4_get_inode_loc+0x10b/0x410
> [<ffffffff81124935>] ? kmem_cache_alloc+0xa5/0x150
> [<ffffffff811f9857>] ? ext4_evict_inode+0x177/0x450
> [<ffffffff811f4cc7>] ext4_get_inode_loc+0x17/0x20
> [<ffffffff811f75a8>] ext4_reserve_inode_write+0x28/0xa0
> [<ffffffff811f9815>] ? ext4_evict_inode+0x135/0x450
> [<ffffffff811f7673>] ext4_mark_inode_dirty+0x53/0x200
> [<ffffffff811f9857>] ext4_evict_inode+0x177/0x450
> [<ffffffff8114bfb1>] evict+0xa1/0x1a0
> [<ffffffff8114cc61>] iput+0x101/0x210
> [<ffffffff81148040>] d_kill+0xf0/0x130
> [<ffffffff81148bd2>] dput+0xd2/0x1b0
> [<ffffffff8113eb85>] path_put+0x15/0x30
> [<ffffffff81693e39>] audit_free_names+0x96/0xb5
> [<ffffffff810ac629>] audit_syscall_exit+0x139/0x1e0
> [<ffffffff816a076a>] sysexit_audit+0x21/0x5f
> Code: 5c 48 89 df e8 b6 20 ab ff 5b 41 5c 5d c3 55 48 89 e5 0f 0b 55 be
> 08 00 00 00 48 c7 c7 c4 fe a0 81 31 c0 48 89 e5 e8 91 cb ff ff <0f>
> 0b 55 48 89 e5 0f
> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
> RIP [<ffffffff81696a6c>] check_irqs_on.part.10+0x17/0x19
> RSP <ffff8807c7339bf8>
>
> line 1267 of fs/buffer.c is
>
> static inline void check_irqs_on(void)
> {
> #ifdef irqs_disabled
> BUG_ON(irqs_disabled());
> #endif
> }
>
> If I run the same code on the same system with the same audit rule(s)
> on an ext2 filesystem, I get no such oops.
>
> So it seems like something either in the ext3/ext4 or Xen codepath is
> disabling interrupts. I'm getting an updated test Xen instance to test
> on, but I while I'm waiting on that, I wanted to see if anyone one
> here might have an idea of the ext3/4 codepath. whether something
> there is doing the interrupt disabling or if there might be some other
> race condition going on. I haven't had a chance to test with the large
> "ext4 updates for v3.7" tytso recently posted, but I'll be doing that
> later today in case something there fixes this.
>
> So, does any one have any thoughts and/or pointers which might help me
> get to the bottom of this?

I had suggested this on the other list, but will put it here too, though it
might be a long shot. If threadinfo gets corrupted, the irqs_enabled()
test might give the wrong answer.

Peter also mentioned that he had tried putting WARN_ON(irqs_disabled()) at
various places along the stack above and never got it to trip; until after
the BUG_ON() had fired; this makes me think corruption might be a possibility
after all.

I suggested building with CONFIG_DEBUG_STACK_USAGE and watching
for stack messages, and building with CONFIG_STACK_TRACER on, and then:

# mount -t debugfs none /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# <run your test w/o the panic, get the BUG_ON>
# cat /sys/kernel/debug/tracing/stack_trace

to try to rule that out.

-Eric

> Cheers,
> peter
>


2012-10-08 21:04:02

by Peter Moody

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On Mon, Oct 8, 2012 at 12:08 PM, Eric Sandeen <[email protected]> wrote:
> On 10/8/12 1:19 PM, Peter Moody wrote:
>> Hey folks,
>>
>> I'm trying to track down a BUG() that seems to strike a particular
>> system configuration (unfortunately, an increasingly common
>> configuration), but does so with 100% reliability.
>>
>> The system in question is a Xen instance (6 vcpus, 32G memory) running
>> 3.2 on essentially stock ubuntu (10.04) system.
>>
>> if I run the attached program with the crash dir set to any ext3 or
>> ext4 file system with any audit rules installed, I get an oops on the
>> second time through the while loop:
>>
>> kernel BUG at fs/buffer.c:1267!
>> invalid opcode: 0000 [#1] SMP
>> CPU 1
>> Pid: 4146, comm: a.out Not tainted 3.2.5-will-break-2-ganetixenu #4
>> RIP: e030:[<ffffffff81696a6c>] [<ffffffff81696a6c>] check_irqs_on.part
>> .10+0x17/0x19
>> RSP: e02b:ffff8807c7339bf8 EFLAGS: 00010096
>> RAX: 000000000000001e RBX: ffff8807970840b0 RCX: 00000000000000e7
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
>> RBP: ffff8807c7339bf8 R08: 0000000000000000 R09: 0000000000000018
>> R10: 0000000000006a5d R11: 0000000000000001 R12: 0000000000000400
>> R13: ffff8807dee05040 R14: ffff8807c7339dc0 R15: 0000000000000124
>> FS: 00007fe7cde057c0(0000) GS:ffff8807fff44000(0063) knlGS:00000000000
>> 00000
>> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
>> CR2: 00000000f76dc4b0 CR3: 00000007a769a000 CR4: 0000000000002660
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process a.out (pid: 4146, threadinfo ffff8807c7338000, task ffff8807ab3
>> 496b0)
>> Stack:
>> ffff8807c7339c68 ffffffff81161dc9 ffff8807c7339c90 ffff8807b6b909f0
>> ffff8807ab23901a ffff8807c7339d60 ffff880700000000 ffff8807c7339d30
>> ffff8807c7339d60 ffff8807c7339e78 ffff8807970840b0 0000000000000400
>> Call Trace:
>> [<ffffffff81161dc9>] __find_get_block+0x1f9/0x200
>> [<ffffffff81164c8f>] __getblk+0x1f/0x280
>> [<ffffffff811f35bb>] __ext4_get_inode_loc+0x10b/0x410
>> [<ffffffff81124935>] ? kmem_cache_alloc+0xa5/0x150
>> [<ffffffff811f9857>] ? ext4_evict_inode+0x177/0x450
>> [<ffffffff811f4cc7>] ext4_get_inode_loc+0x17/0x20
>> [<ffffffff811f75a8>] ext4_reserve_inode_write+0x28/0xa0
>> [<ffffffff811f9815>] ? ext4_evict_inode+0x135/0x450
>> [<ffffffff811f7673>] ext4_mark_inode_dirty+0x53/0x200
>> [<ffffffff811f9857>] ext4_evict_inode+0x177/0x450
>> [<ffffffff8114bfb1>] evict+0xa1/0x1a0
>> [<ffffffff8114cc61>] iput+0x101/0x210
>> [<ffffffff81148040>] d_kill+0xf0/0x130
>> [<ffffffff81148bd2>] dput+0xd2/0x1b0
>> [<ffffffff8113eb85>] path_put+0x15/0x30
>> [<ffffffff81693e39>] audit_free_names+0x96/0xb5
>> [<ffffffff810ac629>] audit_syscall_exit+0x139/0x1e0
>> [<ffffffff816a076a>] sysexit_audit+0x21/0x5f
>> Code: 5c 48 89 df e8 b6 20 ab ff 5b 41 5c 5d c3 55 48 89 e5 0f 0b 55 be
>> 08 00 00 00 48 c7 c7 c4 fe a0 81 31 c0 48 89 e5 e8 91 cb ff ff <0f>
>> 0b 55 48 89 e5 0f
>> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
>> RIP [<ffffffff81696a6c>] check_irqs_on.part.10+0x17/0x19
>> RSP <ffff8807c7339bf8>
>>
>> line 1267 of fs/buffer.c is
>>
>> static inline void check_irqs_on(void)
>> {
>> #ifdef irqs_disabled
>> BUG_ON(irqs_disabled());
>> #endif
>> }
>>
>> If I run the same code on the same system with the same audit rule(s)
>> on an ext2 filesystem, I get no such oops.
>>
>> So it seems like something either in the ext3/ext4 or Xen codepath is
>> disabling interrupts. I'm getting an updated test Xen instance to test
>> on, but I while I'm waiting on that, I wanted to see if anyone one
>> here might have an idea of the ext3/4 codepath. whether something
>> there is doing the interrupt disabling or if there might be some other
>> race condition going on. I haven't had a chance to test with the large
>> "ext4 updates for v3.7" tytso recently posted, but I'll be doing that
>> later today in case something there fixes this.
>>
>> So, does any one have any thoughts and/or pointers which might help me
>> get to the bottom of this?
>
> I had suggested this on the other list, but will put it here too, though it
> might be a long shot. If threadinfo gets corrupted, the irqs_enabled()
> test might give the wrong answer.
>
> Peter also mentioned that he had tried putting WARN_ON(irqs_disabled()) at
> various places along the stack above and never got it to trip; until after
> the BUG_ON() had fired; this makes me think corruption might be a possibility
> after all.
>
> I suggested building with CONFIG_DEBUG_STACK_USAGE and watching
> for stack messages, and building with CONFIG_STACK_TRACER on, and then:
>
> # mount -t debugfs none /sys/kernel/debug
> # echo 1 > /proc/sys/kernel/stack_tracer_enabled
> # <run your test w/o the panic, get the BUG_ON>
> # cat /sys/kernel/debug/tracing/stack_trace
>
> to try to rule that out.

This is what I get. I'm not sure how to interpret this output:

Depth Size Location (31 entries)
----- ---- --------
0) 3792 272 load_balance+0xaf/0x890
1) 3520 144 __schedule+0x7ee/0x870
2) 3376 16 schedule+0x3f/0x60
3) 3360 32 io_schedule+0x8f/0xd0
4) 3328 16 sleep_on_buffer+0xe/0x20
5) 3312 80 __wait_on_bit+0x5f/0x90
6) 3232 112 out_of_line_wait_on_bit+0x7c/0x90
7) 3120 16 __wait_on_buffer+0x2e/0x30
8) 3104 32 __bread+0x71/0xb0
9) 3072 96 ext3_get_branch+0x8b/0x150
10) 2976 384 ext3_get_blocks_handle+0xd2/0xe00
11) 2592 80 ext3_get_block+0xc4/0x120
12) 2512 208 do_mpage_readpage+0x3a2/0x5a0
13) 2304 272 mpage_readpages+0xee/0x140
14) 2032 16 ext3_readpages+0x1d/0x20
15) 2016 192 __do_page_cache_readahead+0x1af/0x250
16) 1824 16 ra_submit+0x21/0x30
17) 1808 112 filemap_fault+0x261/0x4a0
18) 1696 160 __do_fault+0x6f/0x500
19) 1536 208 handle_pte_fault+0xf7/0xac0
20) 1328 80 handle_mm_fault+0x164/0x260
21) 1248 272 do_page_fault+0x144/0x460
22) 976 208 page_fault+0x25/0x30
23) 768 16 clear_user+0x36/0x40
24) 752 16 padzero+0x2d/0x40
25) 736 304 load_elf_binary+0xa7f/0x1a10
26) 432 80 search_binary_handler+0xc8/0x1b0
27) 352 112 do_execve_common.isra.25+0x2b1/0x350
28) 240 16 do_execve+0x1b/0x20
29) 224 48 sys_execve+0x47/0x70
30) 176 176 stub_execve+0x6c/0xc0

and for good measure, here's the BUG() notice.

kernel BUG at fs/buffer.c:1265!
invalid opcode: 0000 [#1] SMP
CPU 1
Pid: 3810, comm: a.out Not tainted 3.2.5-at20-ganetixenu #7
RIP: e030:[<ffffffff816bd776>] [<ffffffff816bd776>]
check_irqs_on.part.9+0x9/0xb
RSP: e02b:ffff8807aca81bf8 EFLAGS: 00010096
RAX: ffff8807c7400000 RBX: ffff8807acf1c420 RCX: 0000000000000123
RDX: 0000000000000400 RSI: 0000000000000124 RDI: ffff8807c9dec000
RBP: ffff8807aca81bf8 R08: ffff8807c7401000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000400
R13: ffff8807c9dec000 R14: ffff8807aca81dc0 R15: 0000000000000124
FS: 00007f528f9df700(0000) GS:ffff8807fff44000(0063) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f76c14b0 CR3: 00000007a96d9000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process a.out (pid: 3810, threadinfo ffff8807aca80000, task ffff8807acb42da0)
Stack:
ffff8807aca81c68 ffffffff81170ed9 ffff8807acf1c420 ffff8807acf1c420
ffff8807c7402000 ffff8807aca81dc0 ffff8807aca81cc8 ffffffff816c5316
ffff8807c7401000 0000000000000123 ffff8807acf1c420 0000000000000400
Call Trace:
[<ffffffff81170ed9>] __find_get_block+0x209/0x210
[<ffffffff816c5316>] ? ftrace_call+0x5/0x2b
[<ffffffff81173e64>] __getblk+0x24/0x280
[<ffffffff816c5316>] ? ftrace_call+0x5/0x2b
[<ffffffff81204b63>] __ext4_get_inode_loc+0x113/0x420
[<ffffffff8120af4f>] ? ext4_evict_inode+0x17f/0x450
[<ffffffff812062ac>] ext4_get_inode_loc+0x1c/0x20
[<ffffffff81208c5d>] ext4_reserve_inode_write+0x2d/0xa0
[<ffffffff81208d28>] ext4_mark_inode_dirty+0x58/0x210
[<ffffffff8120af4f>] ext4_evict_inode+0x17f/0x450
[<ffffffff8115a826>] evict+0xa6/0x1b0
[<ffffffff8115b521>] iput+0x101/0x210
[<ffffffff81156768>] d_kill+0xf8/0x130
[<ffffffff816baa34>] ? audit_free_names+0x6a/0xba
[<ffffffff8115732a>] dput+0xda/0x1b0
[<ffffffff8114d15a>] path_put+0x1a/0x30
[<ffffffff816baa65>] audit_free_names+0x9b/0xba
[<ffffffff810b09be>] audit_syscall_exit+0x13e/0x1e0
[<ffffffff816c7aea>] sysexit_audit+0x21/0x5f
Code: ff ff 48 85 db 74 0b fe 43 5c 48 89 df e8 f3 9a a9 ff 5b 41 5c
5d c3 55 48 89 e5 e8 65 7b 00
00 0f 0b 55 48 89 e5 e8 5a 7b 00 00 <0f> 0b 55 48 89 e5 e8 4f 7b 00 00
0f 0b 55 48 89 e5 e8 44 7b 00
RIP [<ffffffff816bd776>] check_irqs_on.part.9+0x9/0xb
RSP <ffff8807aca81bf8>

> -Eric
>
>> Cheers,
>> peter
>>
>



--
Peter Moody Google 1.650.253.7306
Security Engineer pgp:0xC3410038

2012-10-08 21:15:02

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On Mon, Oct 08, 2012 at 11:19:53AM -0700, Peter Moody wrote:
>
> I'm trying to track down a BUG() that seems to strike a particular
> system configuration (unfortunately, an increasingly common
> configuration), but does so with 100% reliability.

Peter, this looks like exactly the same issue that you asked me about
back in July, isn't it? Check your corp e-mail archives for the
subject line: "advice/pointers on debugging a potential kernel race".

As I recall, the problem was a missing path_put() in audit_putname()
in kernel/auditsc.c?

- Ted

2012-10-08 21:19:37

by Peter Moody

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On Mon, Oct 8, 2012 at 2:15 PM, Theodore Ts'o <[email protected]> wrote:
> On Mon, Oct 08, 2012 at 11:19:53AM -0700, Peter Moody wrote:
>>
>> I'm trying to track down a BUG() that seems to strike a particular
>> system configuration (unfortunately, an increasingly common
>> configuration), but does so with 100% reliability.
>
> Peter, this looks like exactly the same issue that you asked me about
> back in July, isn't it? Check your corp e-mail archives for the
> subject line: "advice/pointers on debugging a potential kernel race".
>
> As I recall, the problem was a missing path_put() in audit_putname()
> in kernel/auditsc.c?
>
> - Ted

Yep, this is that issue and I thought that was the fix but it turns
out it wasn't. The dentry was already being pinned in:

if (!context->pwd.dentry)
get_fs_pwd(current->fs, &context->pwd);

so my path_get/path_put patch just needlessly grabbed a second reference.

--
Peter Moody Google 1.650.253.7306
Security Engineer pgp:0xC3410038

2012-10-08 21:39:11

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On Mon, Oct 08, 2012 at 02:08:02PM -0500, Eric Sandeen wrote:
> I had suggested this on the other list, but will put it here too, though it
> might be a long shot. If threadinfo gets corrupted, the irqs_enabled()
> test might give the wrong answer.
>
> Peter also mentioned that he had tried putting WARN_ON(irqs_disabled()) at
> various places along the stack above and never got it to trip; until after
> the BUG_ON() had fired; this makes me think corruption might be a possibility
> after all.

Well, there is absolutely no place where we disable interrupts in
ext3. In ext4 we do have some spinlock_irqsave/irqresture() calls,
but they are tightly bracketed --- and since you can reproduce this
with ext3, I think that pretty much exonerates ext4.

Hmm.... one possibility might be that it's some XEN-specific paravirt
call that happens to be called by ext3/ext4 and which is leaving
interrupts disabled on its return due to a missing irqrestore() call?

Can you reproduce the problem if you disable XEN and run this on a
native system? What if you run a kernel w/o auditing but under Xen?
Maybe that will allow you to figure out what the critical variable
might be?

I'll note that if ext3 or ext4 was playing with interrupts and leaving
them disabled, we'd have a huge number of people complaining. So the
question is whether it's something unique to audit, or unique to Xen,
or perhaps the combination of the two....

- Ted

2012-10-08 21:40:52

by Eric Sandeen

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On 10/8/12 4:39 PM, Theodore Ts'o wrote:
> On Mon, Oct 08, 2012 at 02:08:02PM -0500, Eric Sandeen wrote:
>> I had suggested this on the other list, but will put it here too, though it
>> might be a long shot. If threadinfo gets corrupted, the irqs_enabled()
>> test might give the wrong answer.
>>
>> Peter also mentioned that he had tried putting WARN_ON(irqs_disabled()) at
>> various places along the stack above and never got it to trip; until after
>> the BUG_ON() had fired; this makes me think corruption might be a possibility
>> after all.
>
> Well, there is absolutely no place where we disable interrupts in
> ext3. In ext4 we do have some spinlock_irqsave/irqresture() calls,
> but they are tightly bracketed --- and since you can reproduce this
> with ext3, I think that pretty much exonerates ext4.
>
> Hmm.... one possibility might be that it's some XEN-specific paravirt
> call that happens to be called by ext3/ext4 and which is leaving
> interrupts disabled on its return due to a missing irqrestore() call?
>
> Can you reproduce the problem if you disable XEN and run this on a
> native system? What if you run a kernel w/o auditing but under Xen?
> Maybe that will allow you to figure out what the critical variable
> might be?
>
> I'll note that if ext3 or ext4 was playing with interrupts and leaving
> them disabled, we'd have a huge number of people complaining. So the
> question is whether it's something unique to audit, or unique to Xen,
> or perhaps the combination of the two....

and unique to running a 32-bit binary as well, right?

> - Ted



2012-10-08 21:41:54

by Eric Sandeen

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On 10/8/12 4:03 PM, Peter Moody wrote:
> On Mon, Oct 8, 2012 at 12:08 PM, Eric Sandeen <[email protected]> wrote:
>> On 10/8/12 1:19 PM, Peter Moody wrote:
>>> Hey folks,
>>>
>>> I'm trying to track down a BUG() that seems to strike a particular
>>> system configuration (unfortunately, an increasingly common
>>> configuration), but does so with 100% reliability.
>>>
>>> The system in question is a Xen instance (6 vcpus, 32G memory) running
>>> 3.2 on essentially stock ubuntu (10.04) system.
>>>
>>> if I run the attached program with the crash dir set to any ext3 or
>>> ext4 file system with any audit rules installed, I get an oops on the
>>> second time through the while loop:
>>>
>>> kernel BUG at fs/buffer.c:1267!
>>> invalid opcode: 0000 [#1] SMP
>>> CPU 1
>>> Pid: 4146, comm: a.out Not tainted 3.2.5-will-break-2-ganetixenu #4
>>> RIP: e030:[<ffffffff81696a6c>] [<ffffffff81696a6c>] check_irqs_on.part
>>> .10+0x17/0x19
>>> RSP: e02b:ffff8807c7339bf8 EFLAGS: 00010096
>>> RAX: 000000000000001e RBX: ffff8807970840b0 RCX: 00000000000000e7
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
>>> RBP: ffff8807c7339bf8 R08: 0000000000000000 R09: 0000000000000018
>>> R10: 0000000000006a5d R11: 0000000000000001 R12: 0000000000000400
>>> R13: ffff8807dee05040 R14: ffff8807c7339dc0 R15: 0000000000000124
>>> FS: 00007fe7cde057c0(0000) GS:ffff8807fff44000(0063) knlGS:00000000000
>>> 00000
>>> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
>>> CR2: 00000000f76dc4b0 CR3: 00000007a769a000 CR4: 0000000000002660
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process a.out (pid: 4146, threadinfo ffff8807c7338000, task ffff8807ab3
>>> 496b0)
>>> Stack:
>>> ffff8807c7339c68 ffffffff81161dc9 ffff8807c7339c90 ffff8807b6b909f0
>>> ffff8807ab23901a ffff8807c7339d60 ffff880700000000 ffff8807c7339d30
>>> ffff8807c7339d60 ffff8807c7339e78 ffff8807970840b0 0000000000000400
>>> Call Trace:
>>> [<ffffffff81161dc9>] __find_get_block+0x1f9/0x200
>>> [<ffffffff81164c8f>] __getblk+0x1f/0x280
>>> [<ffffffff811f35bb>] __ext4_get_inode_loc+0x10b/0x410
>>> [<ffffffff81124935>] ? kmem_cache_alloc+0xa5/0x150
>>> [<ffffffff811f9857>] ? ext4_evict_inode+0x177/0x450
>>> [<ffffffff811f4cc7>] ext4_get_inode_loc+0x17/0x20
>>> [<ffffffff811f75a8>] ext4_reserve_inode_write+0x28/0xa0
>>> [<ffffffff811f9815>] ? ext4_evict_inode+0x135/0x450
>>> [<ffffffff811f7673>] ext4_mark_inode_dirty+0x53/0x200
>>> [<ffffffff811f9857>] ext4_evict_inode+0x177/0x450
>>> [<ffffffff8114bfb1>] evict+0xa1/0x1a0
>>> [<ffffffff8114cc61>] iput+0x101/0x210
>>> [<ffffffff81148040>] d_kill+0xf0/0x130
>>> [<ffffffff81148bd2>] dput+0xd2/0x1b0
>>> [<ffffffff8113eb85>] path_put+0x15/0x30
>>> [<ffffffff81693e39>] audit_free_names+0x96/0xb5
>>> [<ffffffff810ac629>] audit_syscall_exit+0x139/0x1e0
>>> [<ffffffff816a076a>] sysexit_audit+0x21/0x5f
>>> Code: 5c 48 89 df e8 b6 20 ab ff 5b 41 5c 5d c3 55 48 89 e5 0f 0b 55 be
>>> 08 00 00 00 48 c7 c7 c4 fe a0 81 31 c0 48 89 e5 e8 91 cb ff ff <0f>
>>> 0b 55 48 89 e5 0f
>>> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
>>> RIP [<ffffffff81696a6c>] check_irqs_on.part.10+0x17/0x19
>>> RSP <ffff8807c7339bf8>
>>>
>>> line 1267 of fs/buffer.c is
>>>
>>> static inline void check_irqs_on(void)
>>> {
>>> #ifdef irqs_disabled
>>> BUG_ON(irqs_disabled());
>>> #endif
>>> }
>>>
>>> If I run the same code on the same system with the same audit rule(s)
>>> on an ext2 filesystem, I get no such oops.
>>>
>>> So it seems like something either in the ext3/ext4 or Xen codepath is
>>> disabling interrupts. I'm getting an updated test Xen instance to test
>>> on, but I while I'm waiting on that, I wanted to see if anyone one
>>> here might have an idea of the ext3/4 codepath. whether something
>>> there is doing the interrupt disabling or if there might be some other
>>> race condition going on. I haven't had a chance to test with the large
>>> "ext4 updates for v3.7" tytso recently posted, but I'll be doing that
>>> later today in case something there fixes this.
>>>
>>> So, does any one have any thoughts and/or pointers which might help me
>>> get to the bottom of this?
>>
>> I had suggested this on the other list, but will put it here too, though it
>> might be a long shot. If threadinfo gets corrupted, the irqs_enabled()
>> test might give the wrong answer.
>>
>> Peter also mentioned that he had tried putting WARN_ON(irqs_disabled()) at
>> various places along the stack above and never got it to trip; until after
>> the BUG_ON() had fired; this makes me think corruption might be a possibility
>> after all.
>>
>> I suggested building with CONFIG_DEBUG_STACK_USAGE and watching
>> for stack messages, and building with CONFIG_STACK_TRACER on, and then:
>>
>> # mount -t debugfs none /sys/kernel/debug
>> # echo 1 > /proc/sys/kernel/stack_tracer_enabled
>> # <run your test w/o the panic, get the BUG_ON>
>> # cat /sys/kernel/debug/tracing/stack_trace
>>
>> to try to rule that out.
>
> This is what I get. I'm not sure how to interpret this output:
>
> Depth Size Location (31 entries)
> ----- ---- --------
> 0) 3792 272 load_balance+0xaf/0x890

so it only got 3792 bytes into the (8k) stack, not bad at all I guess.
More than one way to corrupt a threadinfo but I guess this wasn't it.

-Eric

> 1) 3520 144 __schedule+0x7ee/0x870
> 2) 3376 16 schedule+0x3f/0x60
> 3) 3360 32 io_schedule+0x8f/0xd0
> 4) 3328 16 sleep_on_buffer+0xe/0x20
> 5) 3312 80 __wait_on_bit+0x5f/0x90
> 6) 3232 112 out_of_line_wait_on_bit+0x7c/0x90
> 7) 3120 16 __wait_on_buffer+0x2e/0x30
> 8) 3104 32 __bread+0x71/0xb0
> 9) 3072 96 ext3_get_branch+0x8b/0x150
> 10) 2976 384 ext3_get_blocks_handle+0xd2/0xe00
> 11) 2592 80 ext3_get_block+0xc4/0x120
> 12) 2512 208 do_mpage_readpage+0x3a2/0x5a0
> 13) 2304 272 mpage_readpages+0xee/0x140
> 14) 2032 16 ext3_readpages+0x1d/0x20
> 15) 2016 192 __do_page_cache_readahead+0x1af/0x250
> 16) 1824 16 ra_submit+0x21/0x30
> 17) 1808 112 filemap_fault+0x261/0x4a0
> 18) 1696 160 __do_fault+0x6f/0x500
> 19) 1536 208 handle_pte_fault+0xf7/0xac0
> 20) 1328 80 handle_mm_fault+0x164/0x260
> 21) 1248 272 do_page_fault+0x144/0x460
> 22) 976 208 page_fault+0x25/0x30
> 23) 768 16 clear_user+0x36/0x40
> 24) 752 16 padzero+0x2d/0x40
> 25) 736 304 load_elf_binary+0xa7f/0x1a10
> 26) 432 80 search_binary_handler+0xc8/0x1b0
> 27) 352 112 do_execve_common.isra.25+0x2b1/0x350
> 28) 240 16 do_execve+0x1b/0x20
> 29) 224 48 sys_execve+0x47/0x70
> 30) 176 176 stub_execve+0x6c/0xc0
>
> and for good measure, here's the BUG() notice.
>
> kernel BUG at fs/buffer.c:1265!
> invalid opcode: 0000 [#1] SMP
> CPU 1
> Pid: 3810, comm: a.out Not tainted 3.2.5-at20-ganetixenu #7
> RIP: e030:[<ffffffff816bd776>] [<ffffffff816bd776>]
> check_irqs_on.part.9+0x9/0xb
> RSP: e02b:ffff8807aca81bf8 EFLAGS: 00010096
> RAX: ffff8807c7400000 RBX: ffff8807acf1c420 RCX: 0000000000000123
> RDX: 0000000000000400 RSI: 0000000000000124 RDI: ffff8807c9dec000
> RBP: ffff8807aca81bf8 R08: ffff8807c7401000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000400
> R13: ffff8807c9dec000 R14: ffff8807aca81dc0 R15: 0000000000000124
> FS: 00007f528f9df700(0000) GS:ffff8807fff44000(0063) knlGS:0000000000000000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 00000000f76c14b0 CR3: 00000007a96d9000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process a.out (pid: 3810, threadinfo ffff8807aca80000, task ffff8807acb42da0)
> Stack:
> ffff8807aca81c68 ffffffff81170ed9 ffff8807acf1c420 ffff8807acf1c420
> ffff8807c7402000 ffff8807aca81dc0 ffff8807aca81cc8 ffffffff816c5316
> ffff8807c7401000 0000000000000123 ffff8807acf1c420 0000000000000400
> Call Trace:
> [<ffffffff81170ed9>] __find_get_block+0x209/0x210
> [<ffffffff816c5316>] ? ftrace_call+0x5/0x2b
> [<ffffffff81173e64>] __getblk+0x24/0x280
> [<ffffffff816c5316>] ? ftrace_call+0x5/0x2b
> [<ffffffff81204b63>] __ext4_get_inode_loc+0x113/0x420
> [<ffffffff8120af4f>] ? ext4_evict_inode+0x17f/0x450
> [<ffffffff812062ac>] ext4_get_inode_loc+0x1c/0x20
> [<ffffffff81208c5d>] ext4_reserve_inode_write+0x2d/0xa0
> [<ffffffff81208d28>] ext4_mark_inode_dirty+0x58/0x210
> [<ffffffff8120af4f>] ext4_evict_inode+0x17f/0x450
> [<ffffffff8115a826>] evict+0xa6/0x1b0
> [<ffffffff8115b521>] iput+0x101/0x210
> [<ffffffff81156768>] d_kill+0xf8/0x130
> [<ffffffff816baa34>] ? audit_free_names+0x6a/0xba
> [<ffffffff8115732a>] dput+0xda/0x1b0
> [<ffffffff8114d15a>] path_put+0x1a/0x30
> [<ffffffff816baa65>] audit_free_names+0x9b/0xba
> [<ffffffff810b09be>] audit_syscall_exit+0x13e/0x1e0
> [<ffffffff816c7aea>] sysexit_audit+0x21/0x5f
> Code: ff ff 48 85 db 74 0b fe 43 5c 48 89 df e8 f3 9a a9 ff 5b 41 5c
> 5d c3 55 48 89 e5 e8 65 7b 00
> 00 0f 0b 55 48 89 e5 e8 5a 7b 00 00 <0f> 0b 55 48 89 e5 e8 4f 7b 00 00
> 0f 0b 55 48 89 e5 e8 44 7b 00
> RIP [<ffffffff816bd776>] check_irqs_on.part.9+0x9/0xb
> RSP <ffff8807aca81bf8>
>
>> -Eric
>>
>>> Cheers,
>>> peter
>>>
>>
>
>
>


2012-10-08 21:46:11

by Peter Moody

[permalink] [raw]
Subject: Re: Oops with ext(3|4) and audit and Xen

On Mon, Oct 8, 2012 at 2:40 PM, Eric Sandeen <[email protected]> wrote:
> On 10/8/12 4:39 PM, Theodore Ts'o wrote:
>> On Mon, Oct 08, 2012 at 02:08:02PM -0500, Eric Sandeen wrote:
>>> I had suggested this on the other list, but will put it here too, though it
>>> might be a long shot. If threadinfo gets corrupted, the irqs_enabled()
>>> test might give the wrong answer.
>>>
>>> Peter also mentioned that he had tried putting WARN_ON(irqs_disabled()) at
>>> various places along the stack above and never got it to trip; until after
>>> the BUG_ON() had fired; this makes me think corruption might be a possibility
>>> after all.
>>
>> Well, there is absolutely no place where we disable interrupts in
>> ext3. In ext4 we do have some spinlock_irqsave/irqresture() calls,
>> but they are tightly bracketed --- and since you can reproduce this
>> with ext3, I think that pretty much exonerates ext4.
>>
>> Hmm.... one possibility might be that it's some XEN-specific paravirt
>> call that happens to be called by ext3/ext4 and which is leaving
>> interrupts disabled on its return due to a missing irqrestore() call?
>>
>> Can you reproduce the problem if you disable XEN and run this on a
>> native system?

Nope, I can't reproduce with this setup (and I've tried a *ton*)

>> What if you run a kernel w/o auditing but under Xen?

Nope, this doesn't trigger it either.

>> Maybe that will allow you to figure out what the critical variable
>> might be?

Yeah, I'm working with the Xen folks to get a test cluster built that
I can test this out on.

>> I'll note that if ext3 or ext4 was playing with interrupts and leaving
>> them disabled, we'd have a huge number of people complaining. So the
>> question is whether it's something unique to audit, or unique to Xen,
>> or perhaps the combination of the two....

Yeah, I figured if this was something in ext3/4, I would not be the
first person asking about it. I mostly brought it here this morning
because ext2 seemed immune.

> and unique to running a 32-bit binary as well, right?

Yes, this does seem to be required for triggering this.

Cheers,
peter

>> - Ted
>
>



--
Peter Moody Google 1.650.253.7306
Security Engineer pgp:0xC3410038