2009-09-04 17:08:51

by Mimi Zohar

Subject: [PATCH] IMA: update ima_counts_put

- As ima_counts_put() may be called after the inode has been freed,
verify that the inode is not NULL, before dereferencing it.

- Maintain the IMA file counters in may_open() properly, decrementing
any counter increments on subsequent errors.

Reported-by: Ciprian Docan <[email protected]>
Reported-by: J.R. Okajima <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>
---
fs/namei.c | 22 +++++++++++++++-------
security/integrity/ima/ima_main.c | 6 +++++-
2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ee01308..fcfc553 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1544,28 +1544,31 @@ int may_open(struct path *path, int acc_mode, int flag)
* An append-only file must be opened in append mode for writing.
*/
if (IS_APPEND(inode)) {
+ error = -EPERM;
if ((flag & FMODE_WRITE) && !(flag & O_APPEND))
- return -EPERM;
+ goto err_out;
if (flag & O_TRUNC)
- return -EPERM;
+ goto err_out;
}

/* O_NOATIME can only be set by the owner or superuser */
if (flag & O_NOATIME)
- if (!is_owner_or_cap(inode))
- return -EPERM;
+ if (!is_owner_or_cap(inode)) {
+ error = -EPERM;
+ goto err_out;
+ }

/*
* Ensure there are no outstanding leases on the file.
*/
error = break_lease(inode, flag);
if (error)
- return error;
+ goto err_out;

if (flag & O_TRUNC) {
error = get_write_access(inode);
if (error)
- return error;
+ goto err_out;

/*
* Refuse to truncate files with mandatory locks held on them.
@@ -1583,12 +1586,17 @@ int may_open(struct path *path, int acc_mode, int flag)
}
put_write_access(inode);
if (error)
- return error;
+ goto err_out;
} else
if (flag & FMODE_WRITE)
vfs_dq_init(inode);

return 0;
+err_out:
+ ima_counts_put(path, acc_mode ?
+ acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC) :
+ ACC_MODE(flag) & (MAY_READ | MAY_WRITE));
+ return error;
}

/*
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 101c512..f0c9634 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -249,7 +249,11 @@ void ima_counts_put(struct path *path, int mask)
struct inode *inode = path->dentry->d_inode;
struct ima_iint_cache *iint;

- if (!ima_initialized || !S_ISREG(inode->i_mode))
+ /* The inode may already have been freed, freeing the iint
+ * with it. Verify the inode is not NULL before dereferencing
+ * it.
+ */
+ if (!ima_initialized || !inode || !S_ISREG(inode->i_mode))
return;
iint = ima_iint_find_insert_get(inode);
if (!iint)
--
1.6.0.6

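The two changes in the patch above (a NULL guard in the put helper, and a single error exit that releases the count) can be illustrated outside the kernel. The following is only a user-space sketch: `fake_inode`, `fake_path`, `counts_get`, `counts_put` and `may_open_sketch` are hypothetical names invented for illustration, and the assumption that the count is taken at the top of the same function is a simplification, not a statement about the real `may_open()`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct fake_inode {
    int is_regular;   /* stands in for S_ISREG(inode->i_mode) */
    int append_only;  /* stands in for IS_APPEND(inode) */
    int open_count;   /* stands in for the IMA per-inode counters */
};

struct fake_path {
    struct fake_inode *inode; /* may be NULL once the inode is gone */
};

/* Assumption: the count is taken before the permission checks run. */
void counts_get(struct fake_path *path)
{
    if (path->inode && path->inode->is_regular)
        path->inode->open_count++;
}

/* Drop one count; tolerate a NULL inode instead of dereferencing it. */
void counts_put(struct fake_path *path)
{
    struct fake_inode *inode = path->inode;

    if (!inode || !inode->is_regular)
        return;
    inode->open_count--;
}

/* Every failure leaves through err_out, so the count is always released. */
int may_open_sketch(struct fake_path *path, int want_write, int append_flag)
{
    int error;

    counts_get(path);

    if (!path->inode) {
        error = -ENOENT;
        goto err_out;
    }
    if (path->inode->append_only && want_write && !append_flag) {
        error = -EPERM;
        goto err_out;
    }
    return 0;

err_out:
    counts_put(path);
    return error;
}
```

With an append-only inode, a write open without the append flag fails with -EPERM and leaves `open_count` where it started, while a successful open keeps the count for the eventual close to release.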

2009-09-06 22:05:52

by Eric Paris

Subject: Re: [PATCH] IMA: update ima_counts_put

On Fri, Sep 4, 2009 at 1:08 PM, Mimi Zohar <[email protected]> wrote:
> - As ima_counts_put() may be called after the inode has been freed,
> verify that the inode is not NULL, before dereferencing it.
>
> - Maintain the IMA file counters in may_open() properly, decrementing
> any counter increments on subsequent errors.

This is a 2.6.31 regression introduced in 94e5d714f604d4cb4cb13

James, can you push it along?

Acked-by: Eric Paris <[email protected]>

2009-09-07 02:17:36

by James Morris

Subject: [GIT] IMA regression fix

Please pull.

The following changes since commit e07cccf4046978df10f2e13fe2b99b2f9b3a65db:
Linus Torvalds (1):
Linux 2.6.31-rc9

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 for-linus

Mimi Zohar (1):
IMA: update ima_counts_put

fs/namei.c | 22 +++++++++++++++-------
security/integrity/ima/ima_main.c | 6 +++++-
2 files changed, 20 insertions(+), 8 deletions(-)


--
James Morris
<[email protected]>

2009-09-12 07:25:42

by Ingo Molnar

Subject: [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux"


James - I did not see a security pull request email from you in my
lkml folder, so I created this new thread. -tip testing found the
easy crash below. It reverts cleanly, so I went that easy route.

At a really quick 10-second glance, the crash happens because we
destroy the slab cache twice, if the sysctl is toggled twice?

Ingo

----------------->
From cb52c156f8eedbcd963e0178787c8e2a933a656b Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 12 Sep 2009 09:17:42 +0200
Subject: [PATCH] Revert "selinux: clean up avc node cache when disabling selinux"

This reverts commit 89c86576ecde504da1eeb4f4882b2189ac2f9c4a.

Causes this crash:

[ 21.280240] async_continuing @ 1 after 0 usec
[ 21.289992] Freeing unused kernel memory: 616k freed
[ 21.289992] Write protecting the kernel read-only data: 10216k
[ 21.586068] SELinux: Disabled at runtime.
[ 21.590018] =============================================================================
[ 21.598233] BUG avc_node: Objects remaining on kmem_cache_close()
[ 21.600000] -----------------------------------------------------------------------------
[ 21.600000]
[ 21.600000] INFO: Slab 0xffffea00015de088 objects=30 used=6 fp=0xffff88003f9d3330 flags=0x100000000000082
[ 21.600000] Pid: 1, comm: init Not tainted 2.6.31-00127-g2490138-dirty #12971
[ 21.600000] Call Trace:
[ 21.600000] [<ffffffff811179f7>] slab_err+0xb0/0xd2
[ 21.600000] [<ffffffff81085ba7>] ? __lock_acquire+0x982/0x9e6
[ 21.600000] [<ffffffff816b8090>] ? _spin_unlock+0x3a/0x55
[ 21.600000] [<ffffffff811176b2>] ? add_partial+0x2e/0x94
[ 21.600000] [<ffffffff8111d254>] ? kmem_cache_destroy+0xcb/0x223
[ 21.600000] [<ffffffff81118f3a>] list_slab_objects+0xbc/0x18e
[ 21.600000] [<ffffffff816b8358>] ? _spin_lock_irqsave+0x4e/0x6e
[ 21.600000] [<ffffffff8111d2af>] kmem_cache_destroy+0x126/0x223
[ 21.600000] [<ffffffff816b43b2>] ? printk+0x50/0x66
[ 21.600000] [<ffffffff812324a5>] avc_disable+0x2d/0x43
[ 21.600000] [<ffffffff8123bd37>] selinux_disable+0x53/0xb5
[ 21.600000] [<ffffffff8123c55c>] sel_write_disable+0xa2/0x118
[ 21.600000] [<ffffffff81127291>] vfs_write+0xc6/0x17a
[ 21.600000] [<ffffffff81127445>] sys_write+0x5b/0x98
[ 21.600000] [<ffffffff8100bf6b>] system_call_fastpath+0x16/0x1b
[ 21.600000] INFO: Object 0xffff88003f9d3000 @offset=0
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=2167 cpu=0 pid=0
[ 21.600000] INFO: Object 0xffff88003f9d3088 @offset=136
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=2167 cpu=0 pid=0
[ 21.600000] INFO: Object 0xffff88003f9d3110 @offset=272
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=2158 cpu=0 pid=0
[ 21.600000] INFO: Object 0xffff88003f9d3198 @offset=408
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=1797 cpu=0 pid=1
[ 21.600000] INFO: Object 0xffff88003f9d3220 @offset=544
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=1798 cpu=0 pid=1
[ 21.600000] INFO: Object 0xffff88003f9d32a8 @offset=680
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=1115 cpu=0 pid=1
[ 21.600000] =============================================================================
[ 21.600000] BUG avc_node: Objects remaining on kmem_cache_close()
[ 21.600000] -----------------------------------------------------------------------------
[ 21.600000]
[ 21.600000] INFO: Slab 0xffffea000158b7d8 objects=30 used=4 fp=0xffff88003ead1220 flags=0x100000000000082
[ 21.600000] Pid: 1, comm: init Not tainted 2.6.31-00127-g2490138-dirty #12971
[ 21.600000] Call Trace:
[ 21.600000] [<ffffffff811179f7>] slab_err+0xb0/0xd2
[ 21.600000] [<ffffffff816b43b2>] ? printk+0x50/0x66
[ 21.600000] [<ffffffff812326b7>] ? avc_alloc_node+0x36/0x1c0
[ 21.600000] [<ffffffff81118f3a>] list_slab_objects+0xbc/0x18e
[ 21.600000] [<ffffffff816b8358>] ? _spin_lock_irqsave+0x4e/0x6e
[ 21.600000] [<ffffffff8111d2af>] kmem_cache_destroy+0x126/0x223
[ 21.600000] [<ffffffff816b43b2>] ? printk+0x50/0x66
[ 21.600000] [<ffffffff812324a5>] avc_disable+0x2d/0x43
[ 21.600000] [<ffffffff8123bd37>] selinux_disable+0x53/0xb5
[ 21.600000] [<ffffffff8123c55c>] sel_write_disable+0xa2/0x118
[ 21.600000] [<ffffffff81127291>] vfs_write+0xc6/0x17a
[ 21.600000] [<ffffffff81127445>] sys_write+0x5b/0x98
[ 21.600000] [<ffffffff8100bf6b>] system_call_fastpath+0x16/0x1b
[ 21.600000] INFO: Object 0xffff88003ead1000 @offset=0
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=2113 cpu=1 pid=13
[ 21.600000] INFO: Object 0xffff88003ead1088 @offset=136
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=70 cpu=1 pid=1
[ 21.600000] INFO: Object 0xffff88003ead1110 @offset=272
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=58 cpu=1 pid=1
[ 21.600000] INFO: Object 0xffff88003ead1198 @offset=408
[ 21.600000] INFO: Allocated in avc_alloc_node+0x36/0x1c0 age=55 cpu=1 pid=1
[ 21.950006] SLUB avc_node: kmem_cache_destroy called for cache that still has objects.
[ 21.960003] Pid: 1, comm: init Not tainted 2.6.31-00127-g2490138-dirty #12971
[ 21.970002] Call Trace:
[ 21.972460] [<ffffffff8111d347>] kmem_cache_destroy+0x1be/0x223
[ 21.978497] [<ffffffff816b43b2>] ? printk+0x50/0x66
[ 21.980004] [<ffffffff812324a5>] avc_disable+0x2d/0x43
[ 21.985241] [<ffffffff8123bd37>] selinux_disable+0x53/0xb5
[ 21.990004] [<ffffffff8123c55c>] sel_write_disable+0xa2/0x118
[ 22.000004] [<ffffffff81127291>] vfs_write+0xc6/0x17a
[ 22.005185] [<ffffffff81127445>] sys_write+0x5b/0x98
[ 22.010013] [<ffffffff8100bf6b>] system_call_fastpath+0x16/0x1b
[ 22.025687] khelper used greatest stack depth: 4104 bytes left
[ 22.030152] SELinux: Unregistering netfilter hooks
[ 22.170024] type=1404 audit(1252760072.170:2): selinux=0 auid=4294967295 ses=4294967295
INIT: version 2.86 booting
[ 22.280812] CRED: Invalid credentials
[ 22.284469] CRED: At kernel/cred.c:295
[ 22.288212] CRED: Specified credentials: ffff88003d467500
[ 22.290007] CRED: ->magic=43736564, put_addr=(null)
[ 22.294874] CRED: ->usage=1, subscr=0
[ 22.300003] CRED: ->*uid = { 0,0,0,0 }
[ 22.303749] CRED: ->*gid = { 0,0,0,0 }
[ 22.307490] CRED: ->security is (null)
[ 22.310011] ------------[ cut here ]------------
[ 22.314624] kernel BUG at kernel/cred.c:823!
[ 22.318893] invalid opcode: 0000 [#1] SMP
[ 22.320000] last sysfs file:
[ 22.320000] CPU 1
[ 22.320000] Modules linked in:
[ 22.320000] Pid: 1, comm: init Not tainted 2.6.31-00127-g2490138-dirty #12971 System Product Name
[ 22.320000] RIP: 0010:[<ffffffff8107911e>] [<ffffffff8107911e>] __invalid_creds+0x60/0x64
[ 22.320000] RSP: 0018:ffff88003ea4be88 EFLAGS: 00010292
[ 22.320000] RAX: 0000000000000000 RBX: 0000000000000127 RCX: 0000000000000000
[ 22.320000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88003ea4bd78
[ 22.320000] RBP: ffff88003ea4beb8 R08: 00000000bb1f063d R09: 0000000000000000
[ 22.320000] R10: 00000000bb1f063d R11: 0000000000018600 R12: ffffffff818e1647
[ 22.320000] R13: ffff88003d467500 R14: 0000000000000004 R15: 00000000020f88f8
[ 22.320000] FS: 00007f03df0ff780(0000) GS:ffff88000248f000(0000) knlGS:0000000000000000
[ 22.320000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 22.320000] CR2: 000000311090e004 CR3: 000000003d599000 CR4: 00000000000006a0
[ 22.320000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 22.320000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 22.320000] Process init (pid: 1, threadinfo ffff88003ea4a000, task ffff88003ea50000)
[ 22.320000] Stack:
[ 22.320000] 00000000bb1f063d 00000000bb1f063d 00000000bb1f063d ffff88003d467500
[ 22.320000] <0> ffff88003ea50000 00000000ffffff9c ffff88003ea4bef8 ffffffff81079a7c
[ 22.320000] <0> ffffffff8106445a ffff88003d618000 00000000bb1f063d 00000000bb1f063d
[ 22.320000] Call Trace:
[ 22.320000] [<ffffffff81079a7c>] prepare_creds+0x107/0x133
[ 22.320000] [<ffffffff8106445a>] ? sigprocmask+0x46/0xfb
[ 22.320000] [<ffffffff81125512>] sys_faccessat+0x46/0x1d4
[ 22.320000] [<ffffffff811256cb>] sys_access+0x2b/0x41
[ 22.320000] [<ffffffff8100bf6b>] system_call_fastpath+0x16/0x1b
[ 22.320000] Code: 89 da 4c 89 e6 48 c7 c7 fd 15 8e 81 31 c0 e8 5c b2 63 00 48 c7 c6 73 16 8e 81 4c 89 ef 65 48 8b 14 25 00 b0 00 00 e8 d6 fc ff ff <0f> 0b eb fe 55 48 89 e5 41 54 53 48 83 ec 10 0f 1f 44 00 00 65
[ 22.320000] RIP [<ffffffff8107911e>] __invalid_creds+0x60/0x64
[ 22.320000] RSP <ffff88003ea4be88>
[ 22.520003] ---[ end trace f1d1365aeb345558 ]---
[ 22.524612] Kernel panic - not syncing: Fatal exception
[ 22.529826] Pid: 1, comm: init Tainted: G D 2.6.31-00127-g2490138-dirty #12971
[ 22.530001] Call Trace:
[ 22.540008] [<ffffffff816b42b2>] panic+0x89/0x139
[ 22.544790] [<ffffffff816b9686>] oops_end+0xb9/0xe0
[ 22.550003] [<ffffffff816b9746>] ? oops_begin+0x99/0xb7
[ 22.555311] [<ffffffff8100fd81>] die+0x6d/0x8c
[ 22.559839] [<ffffffff816b8ff8>] do_trap+0x11f/0x142
[ 22.560004] [<ffffffff81077d7d>] ? notify_die+0x3d/0x53
[ 22.570004] [<ffffffff8100db30>] do_invalid_op+0xab/0xcb
[ 22.575397] [<ffffffff8107911e>] ? __invalid_creds+0x60/0x64
[ 22.580004] [<ffffffff8100cd95>] invalid_op+0x15/0x20
[ 22.585138] [<ffffffff8107911e>] ? __invalid_creds+0x60/0x64
[ 22.590004] [<ffffffff8107911e>] ? __invalid_creds+0x60/0x64
[ 22.595744] [<ffffffff81079a7c>] prepare_creds+0x107/0x133
[ 22.600004] [<ffffffff8106445a>] ? sigprocmask+0x46/0xfb
[ 22.605397] [<ffffffff81125512>] sys_faccessat+0x46/0x1d4
[ 22.610004] [<ffffffff811256cb>] sys_access+0x2b/0x41
[ 22.615137] [<ffffffff8100bf6b>] system_call_fastpath+0x16/0x1b
[ 22.620006] Rebooting in 1 seconds..Press any key to enter the menu

Signed-off-by: Ingo Molnar <[email protected]>
---
security/selinux/avc.c | 6 ------
security/selinux/hooks.c | 3 ---
security/selinux/include/avc.h | 3 ---
3 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index e3d1901..d07cd64 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -855,9 +855,3 @@ u32 avc_policy_seqno(void)
{
return avc_cache.latest_notif;
}
-
-void avc_disable(void)
-{
- if (avc_node_cachep)
- kmem_cache_destroy(avc_node_cachep);
-}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 417f7c9..d7afdb1 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -5830,9 +5830,6 @@ int selinux_disable(void)
selinux_disabled = 1;
selinux_enabled = 0;

- /* Try to destroy the avc node cache */
- avc_disable();
-
/* Reset security_ops to the secondary module, dummy or capability. */
security_ops = secondary_ops;

diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index e94e82f..e57f2ba 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -92,9 +92,6 @@ int avc_add_callback(int (*callback)(u32 event, u32 ssid, u32 tsid,
int avc_get_hash_stats(char *page);
extern unsigned int avc_cache_threshold;

-/* Attempt to free avc node cache */
-void avc_disable(void);
-
#ifdef CONFIG_SECURITY_SELINUX_AVC_STATS
DECLARE_PER_CPU(struct avc_cache_stats, avc_cache_stats);
#endif
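
The log in this revert reports "Objects remaining on kmem_cache_close()" and "kmem_cache_destroy called for cache that still has objects", which suggests avc nodes were still allocated when the cache was torn down. A minimal user-space sketch of the accounting involved follows. `obj_cache`, `cache_alloc`, `cache_free` and `cache_destroy` are hypothetical names, and the refusal codes are an illustrative choice, not how SLUB or the reverted commit behaves.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space cache; it only mimics the accounting, not SLUB. */
struct obj_cache {
    size_t obj_size;
    size_t live;      /* objects handed out and not yet freed */
    int destroyed;    /* set once destroy has succeeded */
};

/* Hand out one object and count it as live. */
void *cache_alloc(struct obj_cache *c)
{
    void *p;

    if (c->destroyed)
        return NULL;
    p = malloc(c->obj_size);
    if (p)
        c->live++;
    return p;
}

/* Return one object and drop the live count. */
void cache_free(struct obj_cache *c, void *p)
{
    if (!p)
        return;
    free(p);
    c->live--;
}

/* Refuse to tear down while objects are outstanding or after a prior destroy. */
int cache_destroy(struct obj_cache *c)
{
    if (c->destroyed)
        return -EINVAL;
    if (c->live != 0)
        return -EBUSY;
    c->destroyed = 1;
    return 0;
}
```

In this sketch a caller has to free every outstanding object before `cache_destroy()` succeeds, and a second destroy is rejected rather than repeated. Both situations are ones the messages in this thread touch on.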

2009-09-12 07:59:11

by Ingo Molnar

Subject: [origin tree boot crash #2] kernel BUG at kernel/cred.c:855!


Below is another boot crash. This one occurs with the revert in place,
so it is sourced elsewhere. I already bisected the other one, making
this one much more difficult to bisect ...

[ 0.022999] Security Framework initialized
[ 0.023999] SELinux: Disabled at boot.
[ 0.024999] Mount-cache hash table entries: 512
[ 0.028999] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.029999] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.030999] CPU: Physical Processor ID: 0
[ 0.031999] CPU: Processor Core ID: 0
[ 0.032999] Checking 'hlt' instruction... OK.
[ 0.038999] CRED: Invalid process credentials
[ 0.039999] CRED: At kernel/cred.c:267
[ 0.040999] CRED: Real credentials: c19ab770 [init][real][eff]
[ 0.041999] CRED: ->magic=43736564, put_addr=(null)
[ 0.042999] CRED: ->usage=4, subscr=2
[ 0.043999] CRED: ->*uid = { 0,0,0,0 }
[ 0.044999] CRED: ->*gid = { 0,0,0,0 }
[ 0.045999] CRED: ->security is (null)
[ 0.046999] CRED: Effective creds == Real creds
[ 0.047999] ------------[ cut here ]------------
[ 0.047999] kernel BUG at kernel/cred.c:855!
[ 0.047999] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 0.047999] last sysfs file:
[ 0.047999] Modules linked in:
[ 0.047999]
[ 0.047999] Pid: 0, comm: swapper Not tainted (2.6.31-tip-02294-g6f4c721-dirty #12983) System Product Name
[ 0.047999] EIP: 0060:[<c1064f9c>] EFLAGS: 00010282 CPU: 0
[ 0.047999] EIP is at __validate_process_creds+0xd6/0xfe
[ 0.047999] EAX: c18642ba EBX: c19ab770 ECX: c106d02f EDX: c16cde5d
[ 0.047999] ESI: c19a5960 EDI: 0000010b EBP: c199fea4 ESP: c199fe94
[ 0.047999] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 0.047999] Process swapper (pid: 0, ti=c199f000 task=c19a5960 task.ti=c199f000)
[ 0.047999] Stack:
[ 0.047999] c18642e2 f7868000 c19a5960 00000000 c199feb8 c10653b1 f7868000 00800b00
[ 0.047999] <0> 00000000 c199fed4 c1065771 f7868000 00000000 f7868000 00000000 00000000
[ 0.047999] <0> c199ff04 c104c8d8 f7868000 c199ff84 00000000 00800b00 00000001 00000000
[ 0.047999] Call Trace:
[ 0.047999] [<c10653b1>] ? prepare_creds+0x1e/0xb1
[ 0.047999] [<c1065771>] ? copy_creds+0x85/0x1cc
[ 0.047999] [<c104c8d8>] ? copy_process+0x18b/0xc75
[ 0.047999] [<c104d4d5>] ? do_fork+0x113/0x28d
[ 0.047999] [<c106faef>] ? __lock_release+0x15e/0x164
[ 0.047999] [<c16ccd91>] ? __mutex_unlock_slowpath+0xf8/0x107
[ 0.047999] [<c1017b13>] ? kernel_thread+0x80/0x88
[ 0.047999] [<c1a15331>] ? kernel_init+0x0/0xa6
[ 0.047999] [<c1a15331>] ? kernel_init+0x0/0xa6
[ 0.047999] [<c1019ab0>] ? kernel_thread_helper+0x0/0x10
[ 0.047999] [<c169c241>] ? rest_init+0x19/0x5f
[ 0.047999] [<c1a158c7>] ? start_kernel+0x310/0x315
[ 0.047999] [<c1a15098>] ? __init_begin+0x98/0x9d
[ 0.047999] Code: ff 8b 86 dc 02 00 00 83 c4 10 3b 86 d8 02 00 00 74 0e 89 f1 ba b0 42 86 c1 e8 55 fe ff ff eb 0b 68 ba 42 86 c1 e8 d9 6f 66 00 58 <0f> 0b eb fe 81 7b 0c 64 65 73 43 75 9f e9 58 ff ff ff 81 79 0c
[ 0.047999] EIP: [<c1064f9c>] __validate_process_creds+0xd6/0xfe SS:ESP 0068:c199fe94

I'll try a blind (and manual) revert of:

ee18d64: KEYS: Add a keyctl to install a process's session keyring on its parent [try #6

Ingo
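
The CRED dump above prints a magic value, a usage count and a NULL security pointer, on a boot where SELinux is reported as disabled. One possible reading is that a debug check requiring a non-NULL security pointer fires even though no security module is active. That reading is an assumption and is not confirmed anywhere in this thread. A user-space sketch of a check of that shape follows. `cred_sketch`, `creds_look_invalid` and the `lsm_enabled` flag are hypothetical, and only the magic value is taken from the log.

```c
#include <stddef.h>

#define SKETCH_MAGIC 0x43736564u /* value printed as ->magic in the dump */

/* Hypothetical reduced credential record; not the kernel's struct cred. */
struct cred_sketch {
    unsigned int magic;
    int usage;
    void *security;   /* per-module blob; NULL when none was attached */
};

/*
 * Return 1 if the record looks invalid, 0 otherwise.
 * lsm_enabled is an assumed flag: the security pointer is only
 * required when a security module is actually active.
 */
int creds_look_invalid(const struct cred_sketch *c, int lsm_enabled)
{
    if (!c)
        return 1;
    if (c->magic != SKETCH_MAGIC)
        return 1;
    if (c->usage < 1)
        return 1;
    if (lsm_enabled && c->security == NULL)
        return 1;
    return 0;
}
```

With the values from the dump (magic intact, usage 4, security NULL), this sketch accepts the record when `lsm_enabled` is 0 and rejects it when the flag is 1. That is the distinction the assumption above hinges on.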

2009-09-12 08:20:31

by Ingo Molnar

Subject: Re: [origin tree boot crash #2] kernel BUG at kernel/cred.c:855!


* Ingo Molnar <[email protected]> wrote:

> I'll try a blind (and manual) revert of:
>
> ee18d64: KEYS: Add a keyctl to install a process's session keyring
> on its parent [try #6

That didn't do the trick, nor did this:

1a51e09: Revert "KEYS: Add a keyctl to install a process's session keyring on its parent

These were the only two changes to cred.c.

Ingo

-------------->
From 1a51e095bae9e89170296e5a27ac19a666e84b3a Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 12 Sep 2009 09:56:14 +0200
Subject: [PATCH] Revert "KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]"

This reverts commit ee18d64c1f632043a02e6f5ba5e045bb26a5465f.

Conflicts:

include/linux/security.h
---
Documentation/keys.txt | 20 --------
arch/alpha/kernel/signal.c | 2 -
arch/arm/kernel/signal.c | 2 -
arch/avr32/kernel/signal.c | 2 -
arch/cris/kernel/ptrace.c | 2 -
arch/frv/kernel/signal.c | 2 -
arch/h8300/kernel/signal.c | 2 -
arch/ia64/kernel/process.c | 2 -
arch/m32r/kernel/signal.c | 2 -
arch/mips/kernel/signal.c | 2 -
arch/mn10300/kernel/signal.c | 2 -
arch/parisc/kernel/signal.c | 2 -
arch/s390/kernel/signal.c | 2 -
arch/sh/kernel/signal_32.c | 2 -
arch/sh/kernel/signal_64.c | 2 -
arch/sparc/kernel/signal_32.c | 2 -
arch/sparc/kernel/signal_64.c | 3 -
arch/x86/kernel/signal.c | 2 -
include/linux/cred.h | 1 -
include/linux/key.h | 3 -
include/linux/keyctl.h | 1 -
include/linux/sched.h | 1 -
include/linux/security.h | 43 -----------------
kernel/cred.c | 43 -----------------
security/capability.c | 19 --------
security/keys/compat.c | 3 -
security/keys/gc.c | 1 -
security/keys/internal.h | 1 -
security/keys/keyctl.c | 102 -----------------------------------------
security/keys/process_keys.c | 49 --------------------
security/security.c | 17 -------
security/selinux/hooks.c | 28 -----------
security/smack/smack_lsm.c | 30 ------------
security/tomoyo/tomoyo.c | 17 -------
34 files changed, 0 insertions(+), 414 deletions(-)

diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index e4dbbdb..203487e 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -757,26 +757,6 @@ The keyctl syscall functions are:
successful.


- (*) Install the calling process's session keyring on its parent.
-
- long keyctl(KEYCTL_SESSION_TO_PARENT);
-
- This functions attempts to install the calling process's session keyring
- on to the calling process's parent, replacing the parent's current session
- keyring.
-
- The calling process must have the same ownership as its parent, the
- keyring must have the same ownership as the calling process, the calling
- process must have LINK permission on the keyring and the active LSM module
- mustn't deny permission, otherwise error EPERM will be returned.
-
- Error ENOMEM will be returned if there was insufficient memory to complete
- the operation, otherwise 0 will be returned to indicate success.
-
- The keyring will be replaced next time the parent process leaves the
- kernel and resumes executing userspace.
-
-
===============
KERNEL SERVICES
===============
diff --git a/arch/alpha/kernel/signal.c b/arch/alpha/kernel/signal.c
index 0932dbb..a58f857 100644
--- a/arch/alpha/kernel/signal.c
+++ b/arch/alpha/kernel/signal.c
@@ -688,7 +688,5 @@ do_notify_resume(struct pt_regs *regs, struct switch_stack *sw,
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index b76fe06..cab2c53 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -712,7 +712,5 @@ do_notify_resume(struct pt_regs *regs, unsigned int thread_flags, int syscall)
if (thread_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/avr32/kernel/signal.c b/arch/avr32/kernel/signal.c
index 64f886f..0d512c5 100644
--- a/arch/avr32/kernel/signal.c
+++ b/arch/avr32/kernel/signal.c
@@ -327,7 +327,5 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, struct thread_info *ti)
if (ti->flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/cris/kernel/ptrace.c b/arch/cris/kernel/ptrace.c
index 48b0f39..5c969ab 100644
--- a/arch/cris/kernel/ptrace.c
+++ b/arch/cris/kernel/ptrace.c
@@ -41,7 +41,5 @@ void do_notify_resume(int canrestart, struct pt_regs *regs,
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/frv/kernel/signal.c b/arch/frv/kernel/signal.c
index 6b0a2b6..4a7a62c 100644
--- a/arch/frv/kernel/signal.c
+++ b/arch/frv/kernel/signal.c
@@ -572,8 +572,6 @@ asmlinkage void do_notify_resume(__u32 thread_info_flags)
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(__frame);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}

} /* end do_notify_resume() */
diff --git a/arch/h8300/kernel/signal.c b/arch/h8300/kernel/signal.c
index af842c3..14c46e8 100644
--- a/arch/h8300/kernel/signal.c
+++ b/arch/h8300/kernel/signal.c
@@ -557,7 +557,5 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, u32 thread_info_flags)
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 135d849..04da55e 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -192,8 +192,6 @@ do_notify_resume_user(sigset_t *unused, struct sigscratch *scr, long in_syscall)
if (test_thread_flag(TIF_NOTIFY_RESUME)) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(&scr->pt);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}

/* copy user rbs to kernel rbs */
diff --git a/arch/m32r/kernel/signal.c b/arch/m32r/kernel/signal.c
index 144b0f1..91fea76 100644
--- a/arch/m32r/kernel/signal.c
+++ b/arch/m32r/kernel/signal.c
@@ -412,8 +412,6 @@ void do_notify_resume(struct pt_regs *regs, sigset_t *oldset,
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}

clear_thread_flag(TIF_IRET);
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index 6254041..6149b04 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -705,7 +705,5 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, void *unused,
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/mn10300/kernel/signal.c b/arch/mn10300/kernel/signal.c
index a21f43b..feb2f2e 100644
--- a/arch/mn10300/kernel/signal.c
+++ b/arch/mn10300/kernel/signal.c
@@ -568,7 +568,5 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, u32 thread_info_flags)
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(__frame);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/parisc/kernel/signal.c b/arch/parisc/kernel/signal.c
index 8eb3c63..0408aac 100644
--- a/arch/parisc/kernel/signal.c
+++ b/arch/parisc/kernel/signal.c
@@ -650,7 +650,5 @@ void do_notify_resume(struct pt_regs *regs, long in_syscall)
if (test_thread_flag(TIF_NOTIFY_RESUME)) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index 6b4fef8..062bd64 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -536,6 +536,4 @@ void do_notify_resume(struct pt_regs *regs)
{
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
diff --git a/arch/sh/kernel/signal_32.c b/arch/sh/kernel/signal_32.c
index 04a2188..b5afbec 100644
--- a/arch/sh/kernel/signal_32.c
+++ b/arch/sh/kernel/signal_32.c
@@ -640,7 +640,5 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, unsigned int save_r0,
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/sh/kernel/signal_64.c b/arch/sh/kernel/signal_64.c
index 9e5c9b1..0663a0e 100644
--- a/arch/sh/kernel/signal_64.c
+++ b/arch/sh/kernel/signal_64.c
@@ -772,7 +772,5 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, unsigned long thread_info
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
diff --git a/arch/sparc/kernel/signal_32.c b/arch/sparc/kernel/signal_32.c
index 7ce1a10..181d069 100644
--- a/arch/sparc/kernel/signal_32.c
+++ b/arch/sparc/kernel/signal_32.c
@@ -590,8 +590,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0,
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}

diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 647afbd..ec82d76 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -613,8 +613,5 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}
}
-
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 81e5823..4c57875 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -869,8 +869,6 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
tracehook_notify_resume(regs);
- if (current->replacement_session_keyring)
- key_replace_session_keyring();
}

#ifdef CONFIG_X86_32
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 24520a5..85439ab 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -152,7 +152,6 @@ struct cred {
extern void __put_cred(struct cred *);
extern void exit_creds(struct task_struct *);
extern int copy_creds(struct task_struct *, unsigned long);
-extern struct cred *cred_alloc_blank(void);
extern struct cred *prepare_creds(void);
extern struct cred *prepare_exec_creds(void);
extern struct cred *prepare_usermodehelper_creds(void);
diff --git a/include/linux/key.h b/include/linux/key.h
index cd50dfa..33e0165 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -278,8 +278,6 @@ static inline key_serial_t key_serial(struct key *key)
extern ctl_table key_sysctls[];
#endif

-extern void key_replace_session_keyring(void);
-
/*
* the userspace interface
*/
@@ -302,7 +300,6 @@ extern void key_init(void);
#define key_fsuid_changed(t) do { } while(0)
#define key_fsgid_changed(t) do { } while(0)
#define key_init() do { } while(0)
-#define key_replace_session_keyring() do { } while(0)

#endif /* CONFIG_KEYS */
#endif /* __KERNEL__ */
diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h
index bd383f1..c0688eb 100644
--- a/include/linux/keyctl.h
+++ b/include/linux/keyctl.h
@@ -52,6 +52,5 @@
#define KEYCTL_SET_TIMEOUT 15 /* set key timeout */
#define KEYCTL_ASSUME_AUTHORITY 16 /* assume request_key() authorisation */
#define KEYCTL_GET_SECURITY 17 /* get key security label */
-#define KEYCTL_SESSION_TO_PARENT 18 /* apply session keyring to parent process */

#endif /* _LINUX_KEYCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f3d74bd..039ccbd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1303,7 +1303,6 @@ struct task_struct {
struct mutex cred_guard_mutex; /* guard against foreign influences on
* credential calculations
* (notably. ptrace) */
- struct cred *replacement_session_keyring; /* for KEYCTL_SESSION_TO_PARENT */

char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
diff --git a/include/linux/security.h b/include/linux/security.h
index d050b66..0e75a10 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -653,11 +653,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* manual page for definitions of the @clone_flags.
* @clone_flags contains the flags indicating what should be shared.
* Return 0 if permission is granted.
- * @cred_alloc_blank:
- * @cred points to the credentials.
- * @gfp indicates the atomicity of any memory allocations.
- * Only allocate sufficient memory and attach to @cred such that
- * cred_transfer() will not get ENOMEM.
* @cred_free:
* @cred points to the credentials.
* Deallocate and clear the cred->security field in a set of credentials.
@@ -670,10 +665,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* @new points to the new credentials.
* @old points to the original credentials.
* Install a new set of credentials.
- * @cred_transfer:
- * @new points to the new credentials.
- * @old points to the original credentials.
- * Transfer data from original creds to new creds
* @kernel_act_as:
* Set the credentials for a kernel service to act as (subjective context).
* @new points to the credentials to be modified.
@@ -1112,13 +1103,6 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* Return the length of the string (including terminating NUL) or -ve if
* an error.
* May also return 0 (and a NULL buffer pointer) if there is no label.
- * @key_session_to_parent:
- * Forcibly assign the session keyring from a process to its parent
- * process.
- * @cred: Pointer to process's credentials
- * @parent_cred: Pointer to parent process's credentials
- * @keyring: Proposed new session keyring
- * Return 0 if permission is granted, -ve error otherwise.
*
* Security hooks affecting all System V IPC operations.
*
@@ -1549,12 +1533,10 @@ struct security_operations {
int (*dentry_open) (struct file *file, const struct cred *cred);

int (*task_create) (unsigned long clone_flags);
- int (*cred_alloc_blank) (struct cred *cred, gfp_t gfp);
void (*cred_free) (struct cred *cred);
int (*cred_prepare)(struct cred *new, const struct cred *old,
gfp_t gfp);
void (*cred_commit)(struct cred *new, const struct cred *old);
- void (*cred_transfer)(struct cred *new, const struct cred *old);
int (*kernel_act_as)(struct cred *new, u32 secid);
int (*kernel_create_files_as)(struct cred *new, struct inode *inode);
int (*kernel_module_request)(void);
@@ -1696,9 +1678,6 @@ struct security_operations {
const struct cred *cred,
key_perm_t perm);
int (*key_getsecurity)(struct key *key, char **_buffer);
- int (*key_session_to_parent)(const struct cred *cred,
- const struct cred *parent_cred,
- struct key *key);
#endif /* CONFIG_KEYS */

#ifdef CONFIG_AUDIT
@@ -1815,11 +1794,9 @@ int security_file_send_sigiotask(struct task_struct *tsk,
int security_file_receive(struct file *file);
int security_dentry_open(struct file *file, const struct cred *cred);
int security_task_create(unsigned long clone_flags);
-int security_cred_alloc_blank(struct cred *cred, gfp_t gfp);
void security_cred_free(struct cred *cred);
int security_prepare_creds(struct cred *new, const struct cred *old, gfp_t gfp);
void security_commit_creds(struct cred *new, const struct cred *old);
-void security_transfer_creds(struct cred *new, const struct cred *old);
int security_kernel_act_as(struct cred *new, u32 secid);
int security_kernel_create_files_as(struct cred *new, struct inode *inode);
int security_kernel_module_request(void);
@@ -2351,11 +2328,6 @@ static inline int security_task_create(unsigned long clone_flags)
return 0;
}

-static inline int security_cred_alloc_blank(struct cred *cred, gfp_t gfp)
-{
- return 0;
-}
-
static inline void security_cred_free(struct cred *cred)
{ }

@@ -2371,11 +2343,6 @@ static inline void security_commit_creds(struct cred *new,
{
}

-static inline void security_transfer_creds(struct cred *new,
- const struct cred *old)
-{
-}
-
static inline int security_kernel_act_as(struct cred *cred, u32 secid)
{
return 0;
@@ -3011,9 +2978,6 @@ void security_key_free(struct key *key);
int security_key_permission(key_ref_t key_ref,
const struct cred *cred, key_perm_t perm);
int security_key_getsecurity(struct key *key, char **_buffer);
-int security_key_session_to_parent(const struct cred *cred,
- const struct cred *parent_cred,
- struct key *key);

#else

@@ -3041,13 +3005,6 @@ static inline int security_key_getsecurity(struct key *key, char **_buffer)
return 0;
}

-static inline int security_key_session_to_parent(const struct cred *cred,
- const struct cred *parent_cred,
- struct key *key)
-{
- return 0;
-}
-
#endif
#endif /* CONFIG_KEYS */

diff --git a/kernel/cred.c b/kernel/cred.c
index 006fcab..24dd2f5 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -199,49 +199,6 @@ void exit_creds(struct task_struct *tsk)
validate_creds(cred);
alter_cred_subscribers(cred, -1);
put_cred(cred);
-
- cred = (struct cred *) tsk->replacement_session_keyring;
- if (cred) {
- tsk->replacement_session_keyring = NULL;
- validate_creds(cred);
- put_cred(cred);
- }
-}
-
-/*
- * Allocate blank credentials, such that the credentials can be filled in at a
- * later date without risk of ENOMEM.
- */
-struct cred *cred_alloc_blank(void)
-{
- struct cred *new;
-
- new = kmem_cache_zalloc(cred_jar, GFP_KERNEL);
- if (!new)
- return NULL;
-
-#ifdef CONFIG_KEYS
- new->tgcred = kzalloc(sizeof(*new->tgcred), GFP_KERNEL);
- if (!new->tgcred) {
- kfree(new);
- return NULL;
- }
- atomic_set(&new->tgcred->usage, 1);
-#endif
-
- atomic_set(&new->usage, 1);
-
- if (security_cred_alloc_blank(new, GFP_KERNEL) < 0)
- goto error;
-
-#ifdef CONFIG_DEBUG_CREDENTIALS
- new->magic = CRED_MAGIC;
-#endif
- return new;
-
-error:
- abort_creds(new);
- return NULL;
}

/**
diff --git a/security/capability.c b/security/capability.c
index 13781e9..790d5c9 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -374,11 +374,6 @@ static int cap_task_create(unsigned long clone_flags)
return 0;
}

-static int cap_cred_alloc_blank(struct cred *cred, gfp_t gfp)
-{
- return 0;
-}
-
static void cap_cred_free(struct cred *cred)
{
}
@@ -392,10 +387,6 @@ static void cap_cred_commit(struct cred *new, const struct cred *old)
{
}

-static void cap_cred_transfer(struct cred *new, const struct cred *old)
-{
-}
-
static int cap_kernel_act_as(struct cred *new, u32 secid)
{
return 0;
@@ -863,13 +854,6 @@ static int cap_key_getsecurity(struct key *key, char **_buffer)
return 0;
}

-static int cap_key_session_to_parent(const struct cred *cred,
- const struct cred *parent_cred,
- struct key *key)
-{
- return 0;
-}
-
#endif /* CONFIG_KEYS */

#ifdef CONFIG_AUDIT
@@ -995,11 +979,9 @@ void security_fixup_ops(struct security_operations *ops)
set_to_cap_if_null(ops, file_receive);
set_to_cap_if_null(ops, dentry_open);
set_to_cap_if_null(ops, task_create);
- set_to_cap_if_null(ops, cred_alloc_blank);
set_to_cap_if_null(ops, cred_free);
set_to_cap_if_null(ops, cred_prepare);
set_to_cap_if_null(ops, cred_commit);
- set_to_cap_if_null(ops, cred_transfer);
set_to_cap_if_null(ops, kernel_act_as);
set_to_cap_if_null(ops, kernel_create_files_as);
set_to_cap_if_null(ops, kernel_module_request);
@@ -1102,7 +1084,6 @@ void security_fixup_ops(struct security_operations *ops)
set_to_cap_if_null(ops, key_free);
set_to_cap_if_null(ops, key_permission);
set_to_cap_if_null(ops, key_getsecurity);
- set_to_cap_if_null(ops, key_session_to_parent);
#endif /* CONFIG_KEYS */
#ifdef CONFIG_AUDIT
set_to_cap_if_null(ops, audit_rule_init);
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 792c0a6..c766c68 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -82,9 +82,6 @@ asmlinkage long compat_sys_keyctl(u32 option,
case KEYCTL_GET_SECURITY:
return keyctl_get_security(arg2, compat_ptr(arg3), arg4);

- case KEYCTL_SESSION_TO_PARENT:
- return keyctl_session_to_parent();
-
default:
return -EOPNOTSUPP;
}
diff --git a/security/keys/gc.c b/security/keys/gc.c
index 1e616ae..44adc32 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -65,7 +65,6 @@ static void key_gc_timer_func(unsigned long data)
* - return true if we altered the keyring
*/
static bool key_gc_keyring(struct key *keyring, time_t limit)
- __releases(key_serial_lock)
{
struct keyring_list *klist;
struct key *key;
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 24ba030..fb83051 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -201,7 +201,6 @@ extern long keyctl_set_timeout(key_serial_t, unsigned);
extern long keyctl_assume_authority(key_serial_t);
extern long keyctl_get_security(key_serial_t keyid, char __user *buffer,
size_t buflen);
-extern long keyctl_session_to_parent(void);

/*
* debugging key validation
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 74c9685..736d780 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1228,105 +1228,6 @@ long keyctl_get_security(key_serial_t keyid,
return ret;
}

-/*
- * attempt to install the calling process's session keyring on the process's
- * parent process
- * - the keyring must exist and must grant us LINK permission
- * - implements keyctl(KEYCTL_SESSION_TO_PARENT)
- */
-long keyctl_session_to_parent(void)
-{
- struct task_struct *me, *parent;
- const struct cred *mycred, *pcred;
- struct cred *cred, *oldcred;
- key_ref_t keyring_r;
- int ret;
-
- keyring_r = lookup_user_key(KEY_SPEC_SESSION_KEYRING, 0, KEY_LINK);
- if (IS_ERR(keyring_r))
- return PTR_ERR(keyring_r);
-
- /* our parent is going to need a new cred struct, a new tgcred struct
- * and new security data, so we allocate them here to prevent ENOMEM in
- * our parent */
- ret = -ENOMEM;
- cred = cred_alloc_blank();
- if (!cred)
- goto error_keyring;
-
- cred->tgcred->session_keyring = key_ref_to_ptr(keyring_r);
- keyring_r = NULL;
-
- me = current;
- write_lock_irq(&tasklist_lock);
-
- parent = me->real_parent;
- ret = -EPERM;
-
- /* the parent mustn't be init and mustn't be a kernel thread */
- if (parent->pid <= 1 || !parent->mm)
- goto not_permitted;
-
- /* the parent must be single threaded */
- if (atomic_read(&parent->signal->count) != 1)
- goto not_permitted;
-
- /* the parent and the child must have different session keyrings or
- * there's no point */
- mycred = current_cred();
- pcred = __task_cred(parent);
- if (mycred == pcred ||
- mycred->tgcred->session_keyring == pcred->tgcred->session_keyring)
- goto already_same;
-
- /* the parent must have the same effective ownership and mustn't be
- * SUID/SGID */
- if (pcred-> uid != mycred->euid ||
- pcred->euid != mycred->euid ||
- pcred->suid != mycred->euid ||
- pcred-> gid != mycred->egid ||
- pcred->egid != mycred->egid ||
- pcred->sgid != mycred->egid)
- goto not_permitted;
-
- /* the keyrings must have the same UID */
- if (pcred ->tgcred->session_keyring->uid != mycred->euid ||
- mycred->tgcred->session_keyring->uid != mycred->euid)
- goto not_permitted;
-
- /* the LSM must permit the replacement of the parent's keyring with the
- * keyring from this process */
- ret = security_key_session_to_parent(mycred, pcred,
- key_ref_to_ptr(keyring_r));
- if (ret < 0)
- goto not_permitted;
-
- /* if there's an already pending keyring replacement, then we replace
- * that */
- oldcred = parent->replacement_session_keyring;
-
- /* the replacement session keyring is applied just prior to userspace
- * restarting */
- parent->replacement_session_keyring = cred;
- cred = NULL;
- set_ti_thread_flag(task_thread_info(parent), TIF_NOTIFY_RESUME);
-
- write_unlock_irq(&tasklist_lock);
- if (oldcred)
- put_cred(oldcred);
- return 0;
-
-already_same:
- ret = 0;
-not_permitted:
- put_cred(cred);
- return ret;
-
-error_keyring:
- key_ref_put(keyring_r);
- return ret;
-}
-
/*****************************************************************************/
/*
* the key control system call
@@ -1412,9 +1313,6 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3,
(char __user *) arg3,
(size_t) arg4);

- case KEYCTL_SESSION_TO_PARENT:
- return keyctl_session_to_parent();
-
default:
return -EOPNOTSUPP;
}
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 5c23afb..4739cfb 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -17,7 +17,6 @@
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/mutex.h>
-#include <linux/security.h>
#include <linux/user_namespace.h>
#include <asm/uaccess.h>
#include "internal.h"
@@ -769,51 +768,3 @@ error:
abort_creds(new);
return ret;
}
-
-/*
- * Replace a process's session keyring when that process resumes userspace on
- * behalf of one of its children
- */
-void key_replace_session_keyring(void)
-{
- const struct cred *old;
- struct cred *new;
-
- if (!current->replacement_session_keyring)
- return;
-
- write_lock_irq(&tasklist_lock);
- new = current->replacement_session_keyring;
- current->replacement_session_keyring = NULL;
- write_unlock_irq(&tasklist_lock);
-
- if (!new)
- return;
-
- old = current_cred();
- new-> uid = old-> uid;
- new-> euid = old-> euid;
- new-> suid = old-> suid;
- new->fsuid = old->fsuid;
- new-> gid = old-> gid;
- new-> egid = old-> egid;
- new-> sgid = old-> sgid;
- new->fsgid = old->fsgid;
- new->user = get_uid(old->user);
- new->group_info = get_group_info(old->group_info);
-
- new->securebits = old->securebits;
- new->cap_inheritable = old->cap_inheritable;
- new->cap_permitted = old->cap_permitted;
- new->cap_effective = old->cap_effective;
- new->cap_bset = old->cap_bset;
-
- new->jit_keyring = old->jit_keyring;
- new->thread_keyring = key_get(old->thread_keyring);
- new->tgcred->tgid = old->tgcred->tgid;
- new->tgcred->process_keyring = key_get(old->tgcred->process_keyring);
-
- security_transfer_creds(new, old);
-
- commit_creds(new);
-}
diff --git a/security/security.c b/security/security.c
index c4c6732..3a89c9a 100644
--- a/security/security.c
+++ b/security/security.c
@@ -684,11 +684,6 @@ int security_task_create(unsigned long clone_flags)
return security_ops->task_create(clone_flags);
}

-int security_cred_alloc_blank(struct cred *cred, gfp_t gfp)
-{
- return security_ops->cred_alloc_blank(cred, gfp);
-}
-
void security_cred_free(struct cred *cred)
{
security_ops->cred_free(cred);
@@ -704,11 +699,6 @@ void security_commit_creds(struct cred *new, const struct cred *old)
security_ops->cred_commit(new, old);
}

-void security_transfer_creds(struct cred *new, const struct cred *old)
-{
- security_ops->cred_transfer(new, old);
-}
-
int security_kernel_act_as(struct cred *new, u32 secid)
{
return security_ops->kernel_act_as(new, secid);
@@ -1269,13 +1259,6 @@ int security_key_getsecurity(struct key *key, char **_buffer)
return security_ops->key_getsecurity(key, _buffer);
}

-int security_key_session_to_parent(const struct cred *cred,
- const struct cred *parent_cred,
- struct key *key)
-{
- return security_ops->key_session_to_parent(cred, parent_cred, key);
-}
-
#endif /* CONFIG_KEYS */

#ifdef CONFIG_AUDIT
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d7afdb1..ec04cc2 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3238,21 +3238,6 @@ static int selinux_task_create(unsigned long clone_flags)
}

/*
- * allocate the SELinux part of blank credentials
- */
-static int selinux_cred_alloc_blank(struct cred *cred, gfp_t gfp)
-{
- struct task_security_struct *tsec;
-
- tsec = kzalloc(sizeof(struct task_security_struct), gfp);
- if (!tsec)
- return -ENOMEM;
-
- cred->security = tsec;
- return 0;
-}
-
-/*
* detach and free the LSM part of a set of credentials
*/
static void selinux_cred_free(struct cred *cred)
@@ -3284,17 +3269,6 @@ static int selinux_cred_prepare(struct cred *new, const struct cred *old,
}

/*
- * transfer the SELinux data to a blank set of creds
- */
-static void selinux_cred_transfer(struct cred *new, const struct cred *old)
-{
- const struct task_security_struct *old_tsec = old->security;
- struct task_security_struct *tsec = new->security;
-
- *tsec = *old_tsec;
-}
-
-/*
* set the security data for a kernel service
* - all the creation contexts are set to unlabelled
*/
@@ -5526,10 +5500,8 @@ static struct security_operations selinux_ops = {
.dentry_open = selinux_dentry_open,

.task_create = selinux_task_create,
- .cred_alloc_blank = selinux_cred_alloc_blank,
.cred_free = selinux_cred_free,
.cred_prepare = selinux_cred_prepare,
- .cred_transfer = selinux_cred_transfer,
.kernel_act_as = selinux_kernel_act_as,
.kernel_create_files_as = selinux_kernel_create_files_as,
.kernel_module_request = selinux_kernel_module_request,
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index acae7ef..aba5c9a 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1080,22 +1080,6 @@ static int smack_file_receive(struct file *file)
*/

/**
- * smack_cred_alloc_blank - "allocate" blank task-level security credentials
- * @new: the new credentials
- * @gfp: the atomicity of any memory allocations
- *
- * Prepare a blank set of credentials for modification. This must allocate all
- * the memory the LSM module might require such that cred_transfer() can
- * complete without error.
- */
-static int smack_cred_alloc_blank(struct cred *cred, gfp_t gfp)
-{
- cred->security = NULL;
- return 0;
-}
-
-
-/**
* smack_cred_free - "free" task-level security credentials
* @cred: the credentials in question
*
@@ -1133,18 +1117,6 @@ static void smack_cred_commit(struct cred *new, const struct cred *old)
}

/**
- * smack_cred_transfer - Transfer the old credentials to the new credentials
- * @new: the new credentials
- * @old: the original credentials
- *
- * Fill in a set of blank credentials from another set of credentials.
- */
-static void smack_cred_transfer(struct cred *new, const struct cred *old)
-{
- new->security = old->security;
-}
-
-/**
* smack_kernel_act_as - Set the subjective context in a set of credentials
* @new: points to the set of credentials to be modified.
* @secid: specifies the security ID to be set
@@ -3123,11 +3095,9 @@ struct security_operations smack_ops = {
.file_send_sigiotask = smack_file_send_sigiotask,
.file_receive = smack_file_receive,

- .cred_alloc_blank = smack_cred_alloc_blank,
.cred_free = smack_cred_free,
.cred_prepare = smack_cred_prepare,
.cred_commit = smack_cred_commit,
- .cred_transfer = smack_cred_transfer,
.kernel_act_as = smack_kernel_act_as,
.kernel_create_files_as = smack_kernel_create_files_as,
.task_setpgid = smack_task_setpgid,
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 9548a09..35a13e7 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -14,12 +14,6 @@
#include "tomoyo.h"
#include "realpath.h"

-static int tomoyo_cred_alloc_blank(struct cred *new, gfp_t gfp)
-{
- new->security = NULL;
- return 0;
-}
-
static int tomoyo_cred_prepare(struct cred *new, const struct cred *old,
gfp_t gfp)
{
@@ -31,15 +25,6 @@ static int tomoyo_cred_prepare(struct cred *new, const struct cred *old,
return 0;
}

-static void tomoyo_cred_transfer(struct cred *new, const struct cred *old)
-{
- /*
- * Since "struct tomoyo_domain_info *" is a sharable pointer,
- * we don't need to duplicate.
- */
- new->security = old->security;
-}
-
static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
{
int rc;
@@ -277,9 +262,7 @@ static int tomoyo_dentry_open(struct file *f, const struct cred *cred)
*/
static struct security_operations tomoyo_security_ops = {
.name = "tomoyo",
- .cred_alloc_blank = tomoyo_cred_alloc_blank,
.cred_prepare = tomoyo_cred_prepare,
- .cred_transfer = tomoyo_cred_transfer,
.bprm_set_creds = tomoyo_bprm_set_creds,
.bprm_check_security = tomoyo_bprm_check_security,
#ifdef CONFIG_SYSCTL

From 14a0881feaf6004fe1060584a4c814c92f26a545 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 12 Sep 2009 10:16:42 +0200
Subject: [PATCH] Revert "CRED: Add some configurable debugging [try #6]"

This reverts commit e0e817392b9acf2c98d3be80c233dddb1b52003d.
---
fs/nfsd/auth.c | 4 -
fs/nfsd/nfssvc.c | 2 -
fs/nfsd/vfs.c | 3 -
fs/open.c | 2 -
include/linux/cred.h | 65 +------------
kernel/cred.c | 250 +--------------------------------------------
kernel/exit.c | 4 -
kernel/fork.c | 6 +-
kernel/kmod.c | 1 -
lib/Kconfig.debug | 15 ---
security/selinux/hooks.c | 6 +-
11 files changed, 12 insertions(+), 346 deletions(-)

diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
index 36fcabb..5573508 100644
--- a/fs/nfsd/auth.c
+++ b/fs/nfsd/auth.c
@@ -34,8 +34,6 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
int flags = nfsexp_flags(rqstp, exp);
int ret;

- validate_process_creds();
-
/* discard any old override before preparing the new set */
revert_creds(get_cred(current->real_cred));
new = prepare_creds();
@@ -88,10 +86,8 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
else
new->cap_effective = cap_raise_nfsd_set(new->cap_effective,
new->cap_permitted);
- validate_process_creds();
put_cred(override_creds(new));
put_cred(new);
- validate_process_creds();
return 0;

oom:
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 24d58ad..492c79b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -496,9 +496,7 @@ nfsd(void *vrqstp)
/* Lock the export hash tables for reading. */
exp_readlock();

- validate_process_creds();
svc_process(rqstp);
- validate_process_creds();

/* Unlock export hash tables */
exp_readunlock();
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8fa09bf..23341c1 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -684,8 +684,6 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
__be32 err;
int host_err;

- validate_process_creds();
-
/*
* If we get here, then the client has already done an "open",
* and (hopefully) checked permission - so allow OWNER_OVERRIDE
@@ -742,7 +740,6 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
out_nfserr:
err = nfserrno(host_err);
out:
- validate_process_creds();
return err;
}

diff --git a/fs/open.c b/fs/open.c
index 31191bf..40d1fa2 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -959,8 +959,6 @@ struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags,
int error;
struct file *f;

- validate_creds(cred);
-
/*
* We must always pass in a valid mount pointer. Historically
* callers got away with not passing it, but we must enforce this at
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 85439ab..b3c76e8 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -114,13 +114,6 @@ struct thread_group_cred {
*/
struct cred {
atomic_t usage;
-#ifdef CONFIG_DEBUG_CREDENTIALS
- atomic_t subscribers; /* number of processes subscribed */
- void *put_addr;
- unsigned magic;
-#define CRED_MAGIC 0x43736564
-#define CRED_MAGIC_DEAD 0x44656144
-#endif
uid_t uid; /* real UID of the task */
gid_t gid; /* real GID of the task */
uid_t suid; /* saved UID of the task */
@@ -150,7 +143,6 @@ struct cred {
};

extern void __put_cred(struct cred *);
-extern void exit_creds(struct task_struct *);
extern int copy_creds(struct task_struct *, unsigned long);
extern struct cred *prepare_creds(void);
extern struct cred *prepare_exec_creds(void);
@@ -166,60 +158,6 @@ extern int set_security_override_from_ctx(struct cred *, const char *);
extern int set_create_files_as(struct cred *, struct inode *);
extern void __init cred_init(void);

-/*
- * check for validity of credentials
- */
-#ifdef CONFIG_DEBUG_CREDENTIALS
-extern void __invalid_creds(const struct cred *, const char *, unsigned);
-extern void __validate_process_creds(struct task_struct *,
- const char *, unsigned);
-
-static inline bool creds_are_invalid(const struct cred *cred)
-{
- if (cred->magic != CRED_MAGIC)
- return true;
- if (atomic_read(&cred->usage) < atomic_read(&cred->subscribers))
- return true;
-#ifdef CONFIG_SECURITY_SELINUX
- if ((unsigned long) cred->security < PAGE_SIZE)
- return true;
- if ((*(u32*)cred->security & 0xffffff00) ==
- (POISON_FREE << 24 | POISON_FREE << 16 | POISON_FREE << 8))
- return true;
-#endif
- return false;
-}
-
-static inline void __validate_creds(const struct cred *cred,
- const char *file, unsigned line)
-{
- if (unlikely(creds_are_invalid(cred)))
- __invalid_creds(cred, file, line);
-}
-
-#define validate_creds(cred) \
-do { \
- __validate_creds((cred), __FILE__, __LINE__); \
-} while(0)
-
-#define validate_process_creds() \
-do { \
- __validate_process_creds(current, __FILE__, __LINE__); \
-} while(0)
-
-extern void validate_creds_for_do_exit(struct task_struct *);
-#else
-static inline void validate_creds(const struct cred *cred)
-{
-}
-static inline void validate_creds_for_do_exit(struct task_struct *tsk)
-{
-}
-static inline void validate_process_creds(void)
-{
-}
-#endif
-
/**
* get_new_cred - Get a reference on a new set of credentials
* @cred: The new credentials to reference
@@ -249,7 +187,6 @@ static inline struct cred *get_new_cred(struct cred *cred)
static inline const struct cred *get_cred(const struct cred *cred)
{
struct cred *nonconst_cred = (struct cred *) cred;
- validate_creds(cred);
return get_new_cred(nonconst_cred);
}

@@ -268,7 +205,7 @@ static inline void put_cred(const struct cred *_cred)
{
struct cred *cred = (struct cred *) _cred;

- validate_creds(cred);
+ BUG_ON(atomic_read(&(cred)->usage) <= 0);
if (atomic_dec_and_test(&(cred)->usage))
__put_cred(cred);
}
diff --git a/kernel/cred.c b/kernel/cred.c
index 24dd2f5..1bb4d7e 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -18,18 +18,6 @@
#include <linux/cn_proc.h>
#include "cred-internals.h"

-#if 0
-#define kdebug(FMT, ...) \
- printk("[%-5.5s%5u] "FMT"\n", current->comm, current->pid ,##__VA_ARGS__)
-#else
-static inline __attribute__((format(printf, 1, 2)))
-void no_printk(const char *fmt, ...)
-{
-}
-#define kdebug(FMT, ...) \
- no_printk("[%-5.5s%5u] "FMT"\n", current->comm, current->pid ,##__VA_ARGS__)
-#endif
-
static struct kmem_cache *cred_jar;

/*
@@ -48,10 +36,6 @@ static struct thread_group_cred init_tgcred = {
*/
struct cred init_cred = {
.usage = ATOMIC_INIT(4),
-#ifdef CONFIG_DEBUG_CREDENTIALS
- .subscribers = ATOMIC_INIT(2),
- .magic = CRED_MAGIC,
-#endif
.securebits = SECUREBITS_DEFAULT,
.cap_inheritable = CAP_INIT_INH_SET,
.cap_permitted = CAP_FULL_SET,
@@ -64,31 +48,6 @@ struct cred init_cred = {
#endif
};

-static inline void set_cred_subscribers(struct cred *cred, int n)
-{
-#ifdef CONFIG_DEBUG_CREDENTIALS
- atomic_set(&cred->subscribers, n);
-#endif
-}
-
-static inline int read_cred_subscribers(const struct cred *cred)
-{
-#ifdef CONFIG_DEBUG_CREDENTIALS
- return atomic_read(&cred->subscribers);
-#else
- return 0;
-#endif
-}
-
-static inline void alter_cred_subscribers(const struct cred *_cred, int n)
-{
-#ifdef CONFIG_DEBUG_CREDENTIALS
- struct cred *cred = (struct cred *) _cred;
-
- atomic_add(n, &cred->subscribers);
-#endif
-}
-
/*
* Dispose of the shared task group credentials
*/
@@ -126,22 +85,9 @@ static void put_cred_rcu(struct rcu_head *rcu)
{
struct cred *cred = container_of(rcu, struct cred, rcu);

- kdebug("put_cred_rcu(%p)", cred);
-
-#ifdef CONFIG_DEBUG_CREDENTIALS
- if (cred->magic != CRED_MAGIC_DEAD ||
- atomic_read(&cred->usage) != 0 ||
- read_cred_subscribers(cred) != 0)
- panic("CRED: put_cred_rcu() sees %p with"
- " mag %x, put %p, usage %d, subscr %d\n",
- cred, cred->magic, cred->put_addr,
- atomic_read(&cred->usage),
- read_cred_subscribers(cred));
-#else
if (atomic_read(&cred->usage) != 0)
panic("CRED: put_cred_rcu() sees %p with usage %d\n",
cred, atomic_read(&cred->usage));
-#endif

security_cred_free(cred);
key_put(cred->thread_keyring);
@@ -160,47 +106,12 @@ static void put_cred_rcu(struct rcu_head *rcu)
*/
void __put_cred(struct cred *cred)
{
- kdebug("__put_cred(%p{%d,%d})", cred,
- atomic_read(&cred->usage),
- read_cred_subscribers(cred));
-
BUG_ON(atomic_read(&cred->usage) != 0);
-#ifdef CONFIG_DEBUG_CREDENTIALS
- BUG_ON(read_cred_subscribers(cred) != 0);
- cred->magic = CRED_MAGIC_DEAD;
- cred->put_addr = __builtin_return_address(0);
-#endif
- BUG_ON(cred == current->cred);
- BUG_ON(cred == current->real_cred);

call_rcu(&cred->rcu, put_cred_rcu);
}
EXPORT_SYMBOL(__put_cred);

-/*
- * Clean up a task's credentials when it exits
- */
-void exit_creds(struct task_struct *tsk)
-{
- struct cred *cred;
-
- kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
- atomic_read(&tsk->cred->usage),
- read_cred_subscribers(tsk->cred));
-
- cred = (struct cred *) tsk->real_cred;
- tsk->real_cred = NULL;
- validate_creds(cred);
- alter_cred_subscribers(cred, -1);
- put_cred(cred);
-
- cred = (struct cred *) tsk->cred;
- tsk->cred = NULL;
- validate_creds(cred);
- alter_cred_subscribers(cred, -1);
- put_cred(cred);
-}
-
/**
* prepare_creds - Prepare a new set of credentials for modification
*
@@ -221,19 +132,16 @@ struct cred *prepare_creds(void)
const struct cred *old;
struct cred *new;

- validate_process_creds();
+ BUG_ON(atomic_read(&task->real_cred->usage) < 1);

new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
if (!new)
return NULL;

- kdebug("prepare_creds() alloc %p", new);
-
old = task->cred;
memcpy(new, old, sizeof(struct cred));

atomic_set(&new->usage, 1);
- set_cred_subscribers(new, 0);
get_group_info(new->group_info);
get_uid(new->user);

@@ -249,7 +157,6 @@ struct cred *prepare_creds(void)

if (security_prepare_creds(new, old, GFP_KERNEL) < 0)
goto error;
- validate_creds(new);
return new;

error:
@@ -322,12 +229,9 @@ struct cred *prepare_usermodehelper_creds(void)
if (!new)
return NULL;

- kdebug("prepare_usermodehelper_creds() alloc %p", new);
-
memcpy(new, &init_cred, sizeof(struct cred));

atomic_set(&new->usage, 1);
- set_cred_subscribers(new, 0);
get_group_info(new->group_info);
get_uid(new->user);

@@ -346,7 +250,6 @@ struct cred *prepare_usermodehelper_creds(void)
#endif
if (security_prepare_creds(new, &init_cred, GFP_ATOMIC) < 0)
goto error;
- validate_creds(new);

BUG_ON(atomic_read(&new->usage) != 1);
return new;
@@ -383,10 +286,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
) {
p->real_cred = get_cred(p->cred);
get_cred(p->cred);
- alter_cred_subscribers(p->cred, 2);
- kdebug("share_creds(%p{%d,%d})",
- p->cred, atomic_read(&p->cred->usage),
- read_cred_subscribers(p->cred));
atomic_inc(&p->cred->user->processes);
return 0;
}
@@ -432,8 +331,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)

atomic_inc(&new->user->processes);
p->cred = p->real_cred = get_cred(new);
- alter_cred_subscribers(new, 2);
- validate_creds(new);
return 0;

error_put:
@@ -458,20 +355,13 @@ error_put:
int commit_creds(struct cred *new)
{
struct task_struct *task = current;
- const struct cred *old = task->real_cred;
-
- kdebug("commit_creds(%p{%d,%d})", new,
- atomic_read(&new->usage),
- read_cred_subscribers(new));
+ const struct cred *old;

- BUG_ON(task->cred != old);
-#ifdef CONFIG_DEBUG_CREDENTIALS
- BUG_ON(read_cred_subscribers(old) < 2);
- validate_creds(old);
- validate_creds(new);
-#endif
+ BUG_ON(task->cred != task->real_cred);
+ BUG_ON(atomic_read(&task->real_cred->usage) < 2);
BUG_ON(atomic_read(&new->usage) < 1);

+ old = task->real_cred;
security_commit_creds(new, old);

get_cred(new); /* we will require a ref for the subj creds too */
@@ -500,14 +390,12 @@ int commit_creds(struct cred *new)
* cheaply with the new uid cache, so if it matters
* we should be checking for it. -DaveM
*/
- alter_cred_subscribers(new, 2);
if (new->user != old->user)
atomic_inc(&new->user->processes);
rcu_assign_pointer(task->real_cred, new);
rcu_assign_pointer(task->cred, new);
if (new->user != old->user)
atomic_dec(&old->user->processes);
- alter_cred_subscribers(old, -2);

sched_switch_user(task);

@@ -540,13 +428,6 @@ EXPORT_SYMBOL(commit_creds);
*/
void abort_creds(struct cred *new)
{
- kdebug("abort_creds(%p{%d,%d})", new,
- atomic_read(&new->usage),
- read_cred_subscribers(new));
-
-#ifdef CONFIG_DEBUG_CREDENTIALS
- BUG_ON(read_cred_subscribers(new) != 0);
-#endif
BUG_ON(atomic_read(&new->usage) < 1);
put_cred(new);
}
@@ -563,20 +444,7 @@ const struct cred *override_creds(const struct cred *new)
{
const struct cred *old = current->cred;

- kdebug("override_creds(%p{%d,%d})", new,
- atomic_read(&new->usage),
- read_cred_subscribers(new));
-
- validate_creds(old);
- validate_creds(new);
- get_cred(new);
- alter_cred_subscribers(new, 1);
- rcu_assign_pointer(current->cred, new);
- alter_cred_subscribers(old, -1);
-
- kdebug("override_creds() = %p{%d,%d}", old,
- atomic_read(&old->usage),
- read_cred_subscribers(old));
+ rcu_assign_pointer(current->cred, get_cred(new));
return old;
}
EXPORT_SYMBOL(override_creds);
@@ -592,15 +460,7 @@ void revert_creds(const struct cred *old)
{
const struct cred *override = current->cred;

- kdebug("revert_creds(%p{%d,%d})", old,
- atomic_read(&old->usage),
- read_cred_subscribers(old));
-
- validate_creds(old);
- validate_creds(override);
- alter_cred_subscribers(old, 1);
rcu_assign_pointer(current->cred, old);
- alter_cred_subscribers(override, -1);
put_cred(override);
}
EXPORT_SYMBOL(revert_creds);
@@ -642,15 +502,11 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
if (!new)
return NULL;

- kdebug("prepare_kernel_cred() alloc %p", new);
-
if (daemon)
old = get_task_cred(daemon);
else
old = get_cred(&init_cred);

- validate_creds(old);
-
*new = *old;
get_uid(new->user);
get_group_info(new->group_info);
@@ -670,9 +526,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
goto error;

atomic_set(&new->usage, 1);
- set_cred_subscribers(new, 0);
put_cred(old);
- validate_creds(new);
return new;

error:
@@ -735,95 +589,3 @@ int set_create_files_as(struct cred *new, struct inode *inode)
return security_kernel_create_files_as(new, inode);
}
EXPORT_SYMBOL(set_create_files_as);
-
-#ifdef CONFIG_DEBUG_CREDENTIALS
-
-/*
- * dump invalid credentials
- */
-static void dump_invalid_creds(const struct cred *cred, const char *label,
- const struct task_struct *tsk)
-{
- printk(KERN_ERR "CRED: %s credentials: %p %s%s%s\n",
- label, cred,
- cred == &init_cred ? "[init]" : "",
- cred == tsk->real_cred ? "[real]" : "",
- cred == tsk->cred ? "[eff]" : "");
- printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
- cred->magic, cred->put_addr);
- printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
- atomic_read(&cred->usage),
- read_cred_subscribers(cred));
- printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
- cred->uid, cred->euid, cred->suid, cred->fsuid);
- printk(KERN_ERR "CRED: ->*gid = { %d,%d,%d,%d }\n",
- cred->gid, cred->egid, cred->sgid, cred->fsgid);
-#ifdef CONFIG_SECURITY
- printk(KERN_ERR "CRED: ->security is %p\n", cred->security);
- if ((unsigned long) cred->security >= PAGE_SIZE &&
- (((unsigned long) cred->security & 0xffffff00) !=
- (POISON_FREE << 24 | POISON_FREE << 16 | POISON_FREE << 8)))
- printk(KERN_ERR "CRED: ->security {%x, %x}\n",
- ((u32*)cred->security)[0],
- ((u32*)cred->security)[1]);
-#endif
-}
-
-/*
- * report use of invalid credentials
- */
-void __invalid_creds(const struct cred *cred, const char *file, unsigned line)
-{
- printk(KERN_ERR "CRED: Invalid credentials\n");
- printk(KERN_ERR "CRED: At %s:%u\n", file, line);
- dump_invalid_creds(cred, "Specified", current);
- BUG();
-}
-EXPORT_SYMBOL(__invalid_creds);
-
-/*
- * check the credentials on a process
- */
-void __validate_process_creds(struct task_struct *tsk,
- const char *file, unsigned line)
-{
- if (tsk->cred == tsk->real_cred) {
- if (unlikely(read_cred_subscribers(tsk->cred) < 2 ||
- creds_are_invalid(tsk->cred)))
- goto invalid_creds;
- } else {
- if (unlikely(read_cred_subscribers(tsk->real_cred) < 1 ||
- read_cred_subscribers(tsk->cred) < 1 ||
- creds_are_invalid(tsk->real_cred) ||
- creds_are_invalid(tsk->cred)))
- goto invalid_creds;
- }
- return;
-
-invalid_creds:
- printk(KERN_ERR "CRED: Invalid process credentials\n");
- printk(KERN_ERR "CRED: At %s:%u\n", file, line);
-
- dump_invalid_creds(tsk->real_cred, "Real", tsk);
- if (tsk->cred != tsk->real_cred)
- dump_invalid_creds(tsk->cred, "Effective", tsk);
- else
- printk(KERN_ERR "CRED: Effective creds == Real creds\n");
- BUG();
-}
-EXPORT_SYMBOL(__validate_process_creds);
-
-/*
- * check creds for do_exit()
- */
-void validate_creds_for_do_exit(struct task_struct *tsk)
-{
- kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
- tsk->real_cred, tsk->cred,
- atomic_read(&tsk->cred->usage),
- read_cred_subscribers(tsk->cred));
-
- __validate_process_creds(tsk, __FILE__, __LINE__);
-}
-
-#endif /* CONFIG_DEBUG_CREDENTIALS */
diff --git a/kernel/exit.c b/kernel/exit.c
index ae5d866..263f95e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -901,8 +901,6 @@ NORET_TYPE void do_exit(long code)

tracehook_report_exit(&code);

- validate_creds_for_do_exit(tsk);
-
/*
* We're taking recursive faults here in do_exit. Safest is to just
* leave this task alone and wait for reboot.
@@ -1011,8 +1009,6 @@ NORET_TYPE void do_exit(long code)
if (tsk->splice_pipe)
__free_pipe_info(tsk->splice_pipe);

- validate_creds_for_do_exit(tsk);
-
preempt_disable();
exit_rcu();
/* causes final put_task_struct in finish_task_switch(). */
diff --git a/kernel/fork.c b/kernel/fork.c
index bfee931..637520c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -152,7 +152,8 @@ void __put_task_struct(struct task_struct *tsk)
WARN_ON(atomic_read(&tsk->usage));
WARN_ON(tsk == current);

- exit_creds(tsk);
+ put_cred(tsk->real_cred);
+ put_cred(tsk->cred);
delayacct_tsk_free(tsk);

if (!profile_handoff_task(tsk))
@@ -1293,7 +1294,8 @@ bad_fork_cleanup_put_domain:
module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:
atomic_dec(&p->cred->user->processes);
- exit_creds(p);
+ put_cred(p->real_cred);
+ put_cred(p->cred);
bad_fork_free:
free_task(p);
fork_out:
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 9fcb53a..94abc21 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -470,7 +470,6 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info,
int retval = 0;

BUG_ON(atomic_read(&sub_info->cred->usage) != 1);
- validate_creds(sub_info->cred);

helper_lock();
if (sub_info->path[0] == '\0')
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e08ffa1..63f0906 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -652,21 +652,6 @@ config DEBUG_NOTIFIERS
This is a relatively cheap check but if you care about maximum
performance, say N.

-config DEBUG_CREDENTIALS
- bool "Debug credential management"
- depends on DEBUG_KERNEL
- help
- Enable this to turn on some debug checking for credential
- management. The additional code keeps track of the number of
- pointers from task_structs to any given cred struct, and checks to
- see that this number never exceeds the usage count of the cred
- struct.
-
- Furthermore, if SELinux is enabled, this also checks that the
- security pointer in the cred struct is never seen to be invalid.
-
- If unsure, say N.
-
#
# Select this config option from the architecture Kconfig, if it
# it is preferred to always offer frame pointers as a config
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ec04cc2..772c1fa 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1535,8 +1535,6 @@ static int inode_has_perm(const struct cred *cred,
struct common_audit_data ad;
u32 sid;

- validate_creds(cred);
-
if (unlikely(IS_PRIVATE(inode)))
return 0;

@@ -3243,9 +3241,7 @@ static int selinux_task_create(unsigned long clone_flags)
static void selinux_cred_free(struct cred *cred)
{
struct task_security_struct *tsec = cred->security;
-
- BUG_ON((unsigned long) cred->security < PAGE_SIZE);
- cred->security = (void *) 0x7UL;
+ cred->security = NULL;
kfree(tsec);
}

2009-09-12 08:41:17

by Ingo Molnar

Subject: [PATCH] out-of-tree: Whack warning off in kernel/cred.c ...


* Ingo Molnar <[email protected]> wrote:

> > I'll try a blind (and manual) revert of:
> >
> > ee18d64: KEYS: Add a keyctl to install a process's session keyring
> > on its parent [try #6
>
> that didnt do the trick, nor did this:
>
> 1a51e09: Revert "KEYS: Add a keyctl to install a process's session keyring on its parent
>
> These were the only two changes to cred.c.

Whacking off the BUG()s via the hack below gave me a booting system.

( Btw., WARN_ONCE() / WARN_ON_ONCE() constructs are in fashion these
days not BUG() - they are real time savers ;-) )

Ingo

---------->
From 2723334da705b2bb162bb6c7dabbbb4806278758 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 12 Sep 2009 10:21:33 +0200
Subject: [PATCH] out-of-tree: Whack warning off in kernel/cred.c ...

Prevent a crash with selinux=0.

NOT-Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/cred.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 006fcab..782e362 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -817,10 +817,12 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
*/
void __invalid_creds(const struct cred *cred, const char *file, unsigned line)
{
+#if 0
printk(KERN_ERR "CRED: Invalid credentials\n");
printk(KERN_ERR "CRED: At %s:%u\n", file, line);
dump_invalid_creds(cred, "Specified", current);
BUG();
+#endif
}
EXPORT_SYMBOL(__invalid_creds);

@@ -844,6 +846,7 @@ void __validate_process_creds(struct task_struct *tsk,
return;

invalid_creds:
+#if 0
printk(KERN_ERR "CRED: Invalid process credentials\n");
printk(KERN_ERR "CRED: At %s:%u\n", file, line);

@@ -853,6 +856,8 @@ invalid_creds:
else
printk(KERN_ERR "CRED: Effective creds == Real creds\n");
BUG();
+#endif
+ ;
}
EXPORT_SYMBOL(__validate_process_creds);
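
The remark above about preferring WARN_ONCE() / WARN_ON_ONCE() over BUG() can be illustrated with a small userspace sketch. This is not the kernel macro: toy_warn_on_once(), toy_validate() and toy_warnings_printed are hypothetical names made up for this example, and the only point is the behaviour (report once, keep running, let the caller take an error path).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed; exposed so callers can inspect it. */
static int toy_warnings_printed;

/*
 * Hypothetical userspace stand-in for WARN_ON_ONCE(): report the first
 * time cond is true, stay quiet afterwards, and always return cond so
 * the caller can still take an error path. Unlike BUG(), it never halts.
 */
static bool toy_warn_on_once(bool cond, bool *warned, const char *what)
{
    assert(warned != NULL);
    if (cond && !*warned) {
        *warned = true;
        toy_warnings_printed++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Example caller: reject a bad usage count but keep the system running. */
static int toy_validate(int usage)
{
    static bool warned;

    if (toy_warn_on_once(usage < 1, &warned, "cred usage count below 1"))
        return -1;
    return 0;
}
```

Calling toy_validate(0) repeatedly returns -1 every time but prints the warning only on the first call, which is what makes this style usable on a box that still has to finish booting.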

2009-09-12 09:47:41

by Eric Paris

Subject: Re: [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux"

On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
> James - i did not see a security pull request email from you in my
> lkml folder so i created this new thread. -tip testing found the
> easy crash below. It reverts cleanly so i went that easy route.
>
> At a really quick 10-seconds glance the crash happens because we
> destroy the slab cache twice, if the sysctl is toggled twice?

No, it's only being free'd once (and can only be freed once since
the /selinuxfs file disappears when it happens). It's being freed while
there are still entries in it.

This actually points out to me that SELinux was leaking memory when
disabled at run time (not when disabled from the kernel command line)
and that's the real problem.

I'll take a look at it tonight. James, if you haven't asked Linus to pull,
can you hold off until I get this long-standing memory leak fixed? If
Linus already took the change we should revert and do them both again.
(This patch is right, just obviously incomplete)

-Eric
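
The failure mode described above (a cache torn down while objects are still allocated from it) can be modelled with a toy userspace cache. This is only a sketch under stated assumptions: struct toy_cache and the toy_cache_*() helpers are hypothetical and are not the kernel slab API; they just make the ordering requirement explicit by refusing to destroy a cache that still has live entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy cache: tracks how many objects are still outstanding. */
struct toy_cache {
    size_t obj_size;
    int live;
    bool destroyed;
};

static void *toy_cache_alloc(struct toy_cache *c)
{
    void *p;

    assert(!c->destroyed);
    p = malloc(c->obj_size);
    if (p)
        c->live++;
    return p;
}

static void toy_cache_free(struct toy_cache *c, void *p)
{
    assert(c->live > 0);
    free(p);
    c->live--;
}

/* Refuse to tear the cache down while entries are still allocated. */
static int toy_cache_destroy(struct toy_cache *c)
{
    if (c->live > 0)
        return -EBUSY;
    c->destroyed = true;
    return 0;
}
```

In this model the caller has to free (flush) every outstanding entry before toy_cache_destroy() succeeds, which mirrors the "free the entries first, then destroy the cache" ordering discussed in this thread.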

2009-09-12 09:59:39

by Eric Paris

Subject: Re: [origin tree boot crash #2] kernel BUG at kernel/cred.c:855!

On Sat, 2009-09-12 at 09:58 +0200, Ingo Molnar wrote:
> below is another boot crash.

> [ 0.022999] Security Framework initialized
> [ 0.023999] SELinux: Disabled at boot.
> [ 0.024999] Mount-cache hash table entries: 512
> [ 0.028999] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [ 0.029999] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.030999] CPU: Physical Processor ID: 0
> [ 0.031999] CPU: Processor Core ID: 0
> [ 0.032999] Checking 'hlt' instruction... OK.
> [ 0.038999] CRED: Invalid process credentials
> [ 0.039999] CRED: At kernel/cred.c:267
> [ 0.040999] CRED: Real credentials: c19ab770 [init][real][eff]
> [ 0.041999] CRED: ->magic=43736564, put_addr=(null)
> [ 0.042999] CRED: ->usage=4, subscr=2
> [ 0.043999] CRED: ->*uid = { 0,0,0,0 }
> [ 0.044999] CRED: ->*gid = { 0,0,0,0 }
> [ 0.045999] CRED: ->security is (null)
> [ 0.046999] CRED: Effective creds == Real creds
> [ 0.047999] ------------[ cut here ]------------
> [ 0.047999] kernel BUG at kernel/cred.c:855!
> [ 0.047999] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 0.047999] last sysfs file:
> [ 0.047999] Modules linked in:
> [ 0.047999]
> [ 0.047999] Pid: 0, comm: swapper Not tainted (2.6.31-tip-02294-g6f4c721-dirty #12983) System Product Name
> [ 0.047999] EIP: 0060:[<c1064f9c>] EFLAGS: 00010282 CPU: 0
> [ 0.047999] EIP is at __validate_process_creds+0xd6/0xfe
> [ 0.047999] EAX: c18642ba EBX: c19ab770 ECX: c106d02f EDX: c16cde5d
> [ 0.047999] ESI: c19a5960 EDI: 0000010b EBP: c199fea4 ESP: c199fe94
> [ 0.047999] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 0.047999] Process swapper (pid: 0, ti=c199f000 task=c19a5960 task.ti=c199f000)
> [ 0.047999] Stack:
> [ 0.047999] c18642e2 f7868000 c19a5960 00000000 c199feb8 c10653b1 f7868000 00800b00
> [ 0.047999] <0> 00000000 c199fed4 c1065771 f7868000 00000000 f7868000 00000000 00000000
> [ 0.047999] <0> c199ff04 c104c8d8 f7868000 c199ff84 00000000 00800b00 00000001 00000000
> [ 0.047999] Call Trace:
> [ 0.047999] [<c10653b1>] ? prepare_creds+0x1e/0xb1
> [ 0.047999] [<c1065771>] ? copy_creds+0x85/0x1cc
> [ 0.047999] [<c104c8d8>] ? copy_process+0x18b/0xc75
> [ 0.047999] [<c104d4d5>] ? do_fork+0x113/0x28d
> [ 0.047999] [<c106faef>] ? __lock_release+0x15e/0x164
> [ 0.047999] [<c16ccd91>] ? __mutex_unlock_slowpath+0xf8/0x107
> [ 0.047999] [<c1017b13>] ? kernel_thread+0x80/0x88
> [ 0.047999] [<c1a15331>] ? kernel_init+0x0/0xa6
> [ 0.047999] [<c1a15331>] ? kernel_init+0x0/0xa6
> [ 0.047999] [<c1019ab0>] ? kernel_thread_helper+0x0/0x10
> [ 0.047999] [<c169c241>] ? rest_init+0x19/0x5f
> [ 0.047999] [<c1a158c7>] ? start_kernel+0x310/0x315
> [ 0.047999] [<c1a15098>] ? __init_begin+0x98/0x9d
> [ 0.047999] Code: ff 8b 86 dc 02 00 00 83 c4 10 3b 86 d8 02 00 00 74 0e 89 f1 ba b0 42 86 c1 e8 55 fe ff ff eb 0b 68 ba 42 86 c1 e8 d9 6f 66 00 58 <0f> 0b eb fe 81 7b 0c 64 65 73 43 75 9f e9 58 ff ff ff 81 79 0c
> [ 0.047999] EIP: [<c1064f9c>] __validate_process_creds+0xd6/0xfe SS:ESP 0068:c199fe94

[adding the creds guy even though it isn't in MAINTAINERS]

This had to come from e0e817392b9acf2c98d3be80c233dddb1b52003d
CRED: Add some configurable debugging [try #6]

static inline bool creds_are_invalid(const struct cred *cred)
{
[snip]
#ifdef CONFIG_SECURITY_SELINUX
if ((unsigned long) cred->security < PAGE_SIZE)
return true;
if ((*(u32*)cred->security & 0xffffff00) ==
(POISON_FREE << 24 | POISON_FREE << 16 | POISON_FREE << 8))
return true;
#endif

cred->security can be NULL when CONFIG_SECURITY_SELINUX is set but
SELinux is disabled at run time (which obviously you are doing). I
think the checks are generally good, but they also need to test
selinux_enabled before looking at cred->security.

If I don't hear anything today I'll patch it tonight along with the
other bug....

-Eric
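
A minimal userspace sketch of the gating suggested above, assuming the SELinux-specific pointer checks are simply skipped when SELinux is disabled. struct toy_cred, toy_creds_are_invalid() and the TOY_* constants are hypothetical stand-ins (the page size is assumed to be 4096 and 0x6b is the slab free-poison byte); the part of creds_are_invalid() that was snipped in the quote is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   4096UL  /* assumed page size for this sketch */
#define TOY_POISON_FREE 0x6bu   /* fill byte used for freed slab objects */
#define TOY_POISON_WORD \
    ((TOY_POISON_FREE << 24) | (TOY_POISON_FREE << 16) | (TOY_POISON_FREE << 8))

/* Hypothetical cut-down credential: only the field the check looks at. */
struct toy_cred {
    void *security;
};

/*
 * Return true if the credential looks invalid. The pointer checks only
 * make sense while SELinux is enabled; with SELinux disabled a NULL
 * ->security is legitimate and must not be reported.
 */
static bool toy_creds_are_invalid(const struct toy_cred *cred,
                                  bool selinux_enabled)
{
    uint32_t first_word;

    assert(cred != NULL);
    if (!selinux_enabled)
        return false;

    /* NULL or a near-NULL pointer cannot be a real security blob. */
    if ((uintptr_t)cred->security < TOY_PAGE_SIZE)
        return true;

    /* A blob that starts with free-poison bytes has already been freed. */
    memcpy(&first_word, cred->security, sizeof(first_word));
    return (first_word & 0xffffff00u) == TOY_POISON_WORD;
}
```

With selinux_enabled false, the NULL ->security seen in the boot log above would no longer be reported as invalid, while the checks stay active for an enabled SELinux.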

2009-09-12 10:44:38

by Ingo Molnar

Subject: Re: [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux"


* Eric Paris <[email protected]> wrote:

> On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
> > James - i did not see a security pull request email from you in my
> > lkml folder so i created this new thread. -tip testing found the
> > easy crash below. It reverts cleanly so i went that easy route.
> >
> > At a really quick 10-seconds glance the crash happens because we
> > destroy the slab cache twice, if the sysctl is toggled twice?
>
> No, it's only being free'd once (and can only be freed once since
> the /selinuxfs file disappears when it happens). It's being freed
> while there are still entries in it.
>
> This actually points out to me that SELinux was leaking memory
> when disabled at run time (not when disabled from the kernel
> command line) and that's the real problem.
>
> I'll take a look at it tonight. James, if you haven't asked Linus to
> pull, can you hold off until I get this long-standing memory leak
> fixed? If Linus already took the change we should revert and do
> them both again. (This patch is right, just obviously incomplete)

FYI, the changes went all upstream yesterday.

Ingo

2009-09-12 13:58:54

by Ingo Molnar

Subject: [origin tree boot hang] lockup in key_schedule_gc()


here's a new crash related to security changes - a boot lockup on a
testbox:

Pid: 5, comm: events/0 Tainted: G W (2.6.31-tip-02301-g1c11bd7-dirty #13102) System Product Name
EIP: 0060:[<c104ad77>] EFLAGS: 00000046 CPU: 0
EIP is at trace_hardirqs_off_caller+0x30/0x9a
EAX: 00000002 EBX: f70431c0 ECX: c18c8e58 EDX: c10138ce
ESI: c10138ce EDI: 00000002 EBP: f7051ddc ESP: f7051dd4
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
CR0: 8005003b CR2: b745e530 CR3: 3618f000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: c210fa6c
DR6: ffff0ff0 DR7: 00000400
Call Trace:
[<c104adec>] trace_hardirqs_off+0xb/0xd
[<c10138ce>] default_send_IPI_mask_logical+0xd2/0xe3
[<c1013759>] default_send_IPI_all+0x27/0x67
[<c1013a97>] arch_trigger_all_cpu_backtrace+0x30/0x54
[<c1060c46>] __rcu_pending+0x49/0x113
[<c1060d2d>] rcu_check_callbacks+0x1d/0x9c
[<c103390c>] update_process_times+0x29/0x3e
[<c104713c>] tick_periodic+0x6a/0x6c
[<c1047152>] tick_handle_periodic+0x14/0x6a
[<c1013463>] smp_apic_timer_interrupt+0x63/0x73
[<c100302f>] apic_timer_interrupt+0x2f/0x40
[<c15fdaaa>] ? _spin_unlock_irqrestore+0x3d/0x41
[<c103a685>] __queue_work+0x2b/0x30
[<c103a6d1>] queue_work_on+0x2c/0x36
[<c103a7b2>] queue_work+0x13/0x15
[<c103a7c8>] schedule_work+0x14/0x16
[<c125178a>] key_schedule_gc+0x28/0x4e
[<c1251917>] key_garbage_collector+0x167/0x180
[<c103a004>] run_workqueue+0xfb/0x1c4
[<c1039fe5>] ? run_workqueue+0xdc/0x1c4
[<c12517b0>] ? key_garbage_collector+0x0/0x180
[<c103a146>] worker_thread+0x79/0x85
[<c103d3e3>] ? autoremove_wake_function+0x0/0x38
[<c103a0cd>] ? worker_thread+0x0/0x85
[<c103d1d2>] kthread+0x65/0x6a
[<c103d16d>] ? kthread+0x0/0x6a
[<c1003267>] kernel_thread_helper+0x7/0x10
Pid: 5, comm: events/0 Tainted: G W 2.6.31-tip-02301-g1c11bd7-dirty #13102

config and bootlog attached.

Ingo


Attachments:
(No filename) (1.81 kB)
config (72.62 kB)
hang.log (181.39 kB)

2009-09-12 20:27:50

by Eric Paris

Subject: Re: [origin tree boot hang] lockup in key_schedule_gc()

On Sat, 2009-09-12 at 15:58 +0200, Ingo Molnar wrote:
> here's a new crash related to security changes - a boot lockup on a
> testbox:
>
> Pid: 5, comm: events/0 Tainted: G W (2.6.31-tip-02301-g1c11bd7-dirty #13102) System Product Name
> EIP: 0060:[<c104ad77>] EFLAGS: 00000046 CPU: 0
> EIP is at trace_hardirqs_off_caller+0x30/0x9a
> EAX: 00000002 EBX: f70431c0 ECX: c18c8e58 EDX: c10138ce
> ESI: c10138ce EDI: 00000002 EBP: f7051ddc ESP: f7051dd4
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> CR0: 8005003b CR2: b745e530 CR3: 3618f000 CR4: 000006d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: c210fa6c
> DR6: ffff0ff0 DR7: 00000400
> Call Trace:
> [<c104adec>] trace_hardirqs_off+0xb/0xd
> [<c10138ce>] default_send_IPI_mask_logical+0xd2/0xe3
> [<c1013759>] default_send_IPI_all+0x27/0x67
> [<c1013a97>] arch_trigger_all_cpu_backtrace+0x30/0x54
> [<c1060c46>] __rcu_pending+0x49/0x113
> [<c1060d2d>] rcu_check_callbacks+0x1d/0x9c
> [<c103390c>] update_process_times+0x29/0x3e
> [<c104713c>] tick_periodic+0x6a/0x6c
> [<c1047152>] tick_handle_periodic+0x14/0x6a
> [<c1013463>] smp_apic_timer_interrupt+0x63/0x73
> [<c100302f>] apic_timer_interrupt+0x2f/0x40
> [<c15fdaaa>] ? _spin_unlock_irqrestore+0x3d/0x41
> [<c103a685>] __queue_work+0x2b/0x30
> [<c103a6d1>] queue_work_on+0x2c/0x36
> [<c103a7b2>] queue_work+0x13/0x15
> [<c103a7c8>] schedule_work+0x14/0x16
> [<c125178a>] key_schedule_gc+0x28/0x4e
> [<c1251917>] key_garbage_collector+0x167/0x180
> [<c103a004>] run_workqueue+0xfb/0x1c4
> [<c1039fe5>] ? run_workqueue+0xdc/0x1c4
> [<c12517b0>] ? key_garbage_collector+0x0/0x180
> [<c103a146>] worker_thread+0x79/0x85
> [<c103d3e3>] ? autoremove_wake_function+0x0/0x38
> [<c103a0cd>] ? worker_thread+0x0/0x85
> [<c103d1d2>] kthread+0x65/0x6a
> [<c103d16d>] ? kthread+0x0/0x6a
> [<c1003267>] kernel_thread_helper+0x7/0x10
> Pid: 5, comm: events/0 Tainted: G W 2.6.31-tip-02301-g1c11bd7-dirty #13102
>
> config and bootlog attached.

Adding dhowells, the keys maintainer, this one certainly isn't obvious
to me off hand.

-Eric

2009-09-13 02:30:04

by Eric Paris

Subject: Re: [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux"

On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
> James - i did not see a security pull request email from you in my
> lkml folder so i created this new thread. -tip testing found the
> easy crash below. It reverts cleanly so i went that easy route.
>
> At a really quick 10-seconds glance the crash happens because we
> destroy the slab cache twice, if the sysctl is toggled twice?

Something a lot worse than SELinux here. I added this exact code and
got this warning. Something is wrong in the world of
kmem_cache_destroy.....

static struct kmem_cache *tmp_cachep;
tmp_cachep = kmem_cache_create("tmp_cache", sizeof(struct avc_node), 0, SLAB_PANIC, NULL);
if (tmp_cachep)
kmem_cache_destroy(tmp_cachep);

[ 0.006076] ------------[ cut here ]------------
[ 0.007019] WARNING: at lib/kobject.c:595 kobject_put+0x6e/0x80()
[ 0.008011] Hardware name:
[ 0.009006] kobject: '<NULL>' (ffff88001f8da128): is not initialized, yet kobject_put() is being called.
[ 0.010005] Modules linked in:
[ 0.011284] Pid: 0, comm: swapper Not tainted 2.6.31-next-20090911 #17
[ 0.012011] Call Trace:
[ 0.013008] [<ffffffff8129460e>] ? kobject_put+0x6e/0x80
[ 0.014009] [<ffffffff81070b71>] warn_slowpath_common+0x91/0xd0
[ 0.015006] [<ffffffff81070c66>] warn_slowpath_fmt+0x76/0xa0
[ 0.016016] [<ffffffff811dd8c3>] ? sysfs_remove_dir+0x43/0xf0
[ 0.017007] [<ffffffff810b3a5d>] ? trace_hardirqs_on_caller+0x14d/0x1e0
[ 0.018007] [<ffffffff8129460e>] kobject_put+0x6e/0x80
[ 0.019005] [<ffffffff8129607e>] ? kobject_uevent+0x1e/0x40
[ 0.020016] [<ffffffff81159933>] kmem_cache_destroy+0x213/0x250
[ 0.021008] [<ffffffff812a3d37>] ? __spin_lock_init+0x47/0x90
[ 0.022012] [<ffffffff819e6860>] ? early_idt_handler+0x0/0x71
[ 0.023008] [<ffffffff81a18da3>] avc_init+0xd3/0x120
[ 0.024010] [<ffffffff81a1903e>] selinux_init+0xfe/0x210
[ 0.025006] [<ffffffff819e6860>] ? early_idt_handler+0x0/0x71
[ 0.026004] [<ffffffff81a18bb2>] security_init+0x52/0x80
[ 0.027005] [<ffffffff81a18a16>] ? key_init+0xc6/0xf0
[ 0.028009] [<ffffffff819e764a>] start_kernel+0x35a/0x490
[ 0.029005] [<ffffffff819e69d4>] x86_64_start_reservations+0x94/0xf0
[ 0.030004] [<ffffffff819e6b38>] x86_64_start_kernel+0x108/0x150
[ 0.031015] ---[ end trace a7919e7f17c0a725 ]---

2009-09-13 23:03:06

by Eric Paris

Subject: Re: [origin tree boot crash] Revert "selinux: clean up avc node cache when disabling selinux"

On Sat, Sep 12, 2009 at 10:28 PM, Eric Paris <[email protected]> wrote:
> On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:

> Something a lot worse than SELinux here. I added this exact code and
> got this warning. Something is wrong in the world of
> kmem_cache_destroy.....
>
> static struct kmem_cache *tmp_cachep;
> tmp_cachep = kmem_cache_create("tmp_cache", sizeof(struct avc_node), 0, SLAB_PANIC, NULL);
> if (tmp_cachep)
>         kmem_cache_destroy(tmp_cachep);
>
> [ 0.006076] ------------[ cut here ]------------
> [ 0.007019] WARNING: at lib/kobject.c:595 kobject_put+0x6e/0x80()
> [ 0.008011] Hardware name:
> [ 0.009006] kobject: '<NULL>' (ffff88001f8da128): is not initialized, yet kobject_put() is being called.
> [ 0.010005] Modules linked in:
> [ 0.011284] Pid: 0, comm: swapper Not tainted 2.6.31-next-20090911 #17
> [ 0.012011] Call Trace:

Just for those playing along at home, I sent a series of 3 patches to
fix the invalid creds test and to clear the avc_node_cachep before
freeing it.
http://marc.info/?l=linux-kernel&m=125281056403544&w=2
http://marc.info/?l=linux-kernel&m=125281056403547&w=2
http://marc.info/?l=linux-kernel&m=125281056403550&w=2

I also bisected at least one problem with kmem_cache_destroy and
posted the bisect results here:
http://marc.info/?l=linux-mm&m=125286686917465&w=2
2a38a002fbee06556489091c30b04746222167e4

2009-09-14 06:16:44

by Ingo Molnar

Subject: Re: [origin tree boot hang] lockup in key_schedule_gc()


* Eric Paris <[email protected]> wrote:

> On Sat, 2009-09-12 at 15:58 +0200, Ingo Molnar wrote:
> > here's a new crash related to security changes - a boot lockup on a
> > testbox:
> >
> > Pid: 5, comm: events/0 Tainted: G W (2.6.31-tip-02301-g1c11bd7-dirty #13102) System Product Name
> > EIP: 0060:[<c104ad77>] EFLAGS: 00000046 CPU: 0
> > EIP is at trace_hardirqs_off_caller+0x30/0x9a
> > EAX: 00000002 EBX: f70431c0 ECX: c18c8e58 EDX: c10138ce
> > ESI: c10138ce EDI: 00000002 EBP: f7051ddc ESP: f7051dd4
> > DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > CR0: 8005003b CR2: b745e530 CR3: 3618f000 CR4: 000006d0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: c210fa6c
> > DR6: ffff0ff0 DR7: 00000400
> > Call Trace:
> > [<c104adec>] trace_hardirqs_off+0xb/0xd
> > [<c10138ce>] default_send_IPI_mask_logical+0xd2/0xe3
> > [<c1013759>] default_send_IPI_all+0x27/0x67
> > [<c1013a97>] arch_trigger_all_cpu_backtrace+0x30/0x54
> > [<c1060c46>] __rcu_pending+0x49/0x113
> > [<c1060d2d>] rcu_check_callbacks+0x1d/0x9c
> > [<c103390c>] update_process_times+0x29/0x3e
> > [<c104713c>] tick_periodic+0x6a/0x6c
> > [<c1047152>] tick_handle_periodic+0x14/0x6a
> > [<c1013463>] smp_apic_timer_interrupt+0x63/0x73
> > [<c100302f>] apic_timer_interrupt+0x2f/0x40
> > [<c15fdaaa>] ? _spin_unlock_irqrestore+0x3d/0x41
> > [<c103a685>] __queue_work+0x2b/0x30
> > [<c103a6d1>] queue_work_on+0x2c/0x36
> > [<c103a7b2>] queue_work+0x13/0x15
> > [<c103a7c8>] schedule_work+0x14/0x16
> > [<c125178a>] key_schedule_gc+0x28/0x4e
> > [<c1251917>] key_garbage_collector+0x167/0x180
> > [<c103a004>] run_workqueue+0xfb/0x1c4
> > [<c1039fe5>] ? run_workqueue+0xdc/0x1c4
> > [<c12517b0>] ? key_garbage_collector+0x0/0x180
> > [<c103a146>] worker_thread+0x79/0x85
> > [<c103d3e3>] ? autoremove_wake_function+0x0/0x38
> > [<c103a0cd>] ? worker_thread+0x0/0x85
> > [<c103d1d2>] kthread+0x65/0x6a
> > [<c103d16d>] ? kthread+0x0/0x6a
> > [<c1003267>] kernel_thread_helper+0x7/0x10
> > Pid: 5, comm: events/0 Tainted: G W 2.6.31-tip-02301-g1c11bd7-dirty #13102
> >
> > config and bootlog attached.
>
> Adding dhowells, the keys maintainer, this one certainly isn't
> obvious to me off hand.

this bug also manifests itself in a plain 64-bit x86 defconfig
bootup on a system, events/1 goes looping burning 100% CPU time a
few minutes into the bootup:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8 root 20 0 0 0 0 R 100 0.0 4:39.22 events/1
3005 mingo 20 0 14728 1068 736 R 2 0.1 0:00.01 top
1 root 20 0 10308 732 616 S 0 0.1 0:01.59 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd

i've enabled the function graph tracer and events/1 does an ever
repeating loop of key_garbage_collector(), the same thing i reported
in the lockup:

1) | key_garbage_collector() {
1) 0.310 us | current_kernel_time();
1) 0.349 us | _spin_lock();
1) | key_schedule_gc() {
1) 0.327 us | current_kernel_time();
1) | schedule_work() {
1) | queue_work() {
1) | queue_work_on() {
1) | __queue_work() {
1) | _spin_lock_irqsave() {
1) | insert_work() {
1) | __wake_up() {
1) 0.317 us | _spin_lock_irqsave();
1) 0.345 us | __wake_up_common();
1) 0.298 us | _spin_unlock_irqrestore();
1) 2.242 us | }
1) 2.865 us | }
1) 0.351 us | _spin_unlock_irqrestore();
1) 4.811 us | }
1) 5.434 us | }
1) 5.946 us | }
1) 6.601 us | }
1) 7.964 us | }
1) + 10.169 us | }
1) 0.358 us | _spin_lock_irq();

find below a few more excerpts from the trace.

Ingo

1) | key_garbage_collector() {
1) 0.310 us | current_kernel_time();
1) 0.349 us | _spin_lock();
1) | key_schedule_gc() {
1) 0.327 us | current_kernel_time();
1) | schedule_work() {
1) | queue_work() {
1) | queue_work_on() {
1) | __queue_work() {
1) | _spin_lock_irqsave() {
1) | insert_work() {
1) | __wake_up() {
1) 0.317 us | _spin_lock_irqsave();
1) 0.345 us | __wake_up_common();
1) 0.298 us | _spin_unlock_irqrestore();
1) 2.242 us | }
1) 2.865 us | }
1) 0.351 us | _spin_unlock_irqrestore();
1) 4.811 us | }
1) 5.434 us | }
1) 5.946 us | }
1) 6.601 us | }
1) 7.964 us | }
1) + 10.169 us | }
1) 0.358 us | _spin_lock_irq();
1) | key_garbage_collector() {
1) 0.319 us | current_kernel_time();
1) 0.317 us | _spin_lock();
1) | key_schedule_gc() {
1) 0.286 us | current_kernel_time();
1) | schedule_work() {
1) | queue_work() {
1) | queue_work_on() {
1) | __queue_work() {
1) 0.310 us | _spin_lock_irqsave();
1) | insert_work() {
1) | __wake_up() {
1) 0.335 us | _spin_lock_irqsave();
1) 0.293 us | __wake_up_common();
1) 0.341 us | _spin_unlock_irqrestore();
1) 2.206 us | }
1) 2.873 us | }
1) 0.311 us | _spin_unlock_irqrestore();
1) 4.765 us | }
1) 5.380 us | }
1) 6.063 us | }
1) 6.691 us | }
1) 8.023 us | }
1) + 10.250 us | }
1) 0.336 us | _spin_lock_irq();
1) | key_garbage_collector() {
1) 0.285 us | current_kernel_time();
1) | _spin_lock() {
1) | key_schedule_gc() {
1) 0.323 us | current_kernel_time();
1) | schedule_work() {
1) | queue_work() {
1) | queue_work_on() {
1) | __queue_work() {
1) 0.287 us | _spin_lock_irqsave();
1) | insert_work() {
1) | __wake_up() {
1) 0.326 us | _spin_lock_irqsave();
1) 0.297 us | __wake_up_common();
1) 0.228 us | _spin_unlock_irqrestore();
1) 2.197 us | }
1) 3.173 us | }
1) 0.386 us | _spin_unlock_irqrestore();
1) 5.049 us | }
1) 5.659 us | }
1) 6.191 us | }
1) 6.723 us | }
1) 7.946 us | }
1) + 10.031 us | }
1) 0.301 us | _spin_lock_irq();
1) | key_garbage_collector() {
1) 0.252 us | current_kernel_time();
1) 0.234 us | _spin_lock();
1) | key_schedule_gc() {
1) 0.262 us | current_kernel_time();
1) | schedule_work() {
1) | queue_work() {
1) | queue_work_on() {
1) | __queue_work() {
1) 0.235 us | _spin_lock_irqsave();
1) | insert_work() {
1) | __wake_up() {
1) 0.267 us | _spin_lock_irqsave();
1) 0.231 us | __wake_up_common();
1) 0.266 us | _spin_unlock_irqrestore();
1) 1.785 us | }
1) 2.320 us | }
1) 0.228 us | _spin_unlock_irqrestore();
1) 3.900 us | }
1) 4.433 us | }
1) 4.980 us | }
1) 5.505 us | }
1) 6.571 us | }
1) 8.376 us | }

2009-09-14 07:17:37

by Ingo Molnar

Subject: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514


* Eric Paris <[email protected]> wrote:

> On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
> > James - i did not see a security pull request email from you in my
> > lkml folder so i created this new thread. -tip testing found the
> > easy crash below. It reverts cleanly so i went that easy route.
> >
> > At a really quick 10-seconds glance the crash happens because we
> > destroy the slab cache twice, if the sysctl is toggled twice?
>
> Something a lot worse than SELinux here. I added this exact code and
> got this warning. Something is wrong in the world of
> kmem_cache_destroy.....

-tip testing just triggered another type of SLAB problem (this time
not apparently related to the security subsystem):

BUG kmalloc-64: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00

Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
Object 0xf498f6b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf498f6c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk?
Redzone 0xf498f6d0: bb bb bb bb ????
Padding 0xf498f6f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 3514, comm: sync Not tainted 2.6.31-tip-02343-gb432421-dirty #14071
Call Trace:
[<c10e5b29>] print_trailer+0xf9/0x170
[<c10e5c95>] check_bytes_and_report+0xf5/0x120
[<c10e7129>] check_object+0x1e9/0x230
[<c10e848d>] alloc_debug_processing+0xfd/0x1d0
[<c111394b>] ? bdi_alloc_work+0x2b/0x100
[<c10e8687>] __slab_alloc+0x127/0x330
[<c111394b>] ? bdi_alloc_work+0x2b/0x100
[<c111394b>] ? bdi_alloc_work+0x2b/0x100
[<c10e8a43>] kmem_cache_alloc+0x1b3/0x1d0
[<c111394b>] ? bdi_alloc_work+0x2b/0x100
[<c111394b>] ? bdi_alloc_work+0x2b/0x100
[<c11148a4>] ? bdi_writeback_all+0x34/0x190
[<c111394b>] bdi_alloc_work+0x2b/0x100
[<c194c6b2>] ? _spin_lock+0x72/0x90
[<c11148e2>] bdi_writeback_all+0x72/0x190
[<c107b3db>] ? mark_held_locks+0x6b/0xb0
[<c194ab75>] ? __mutex_unlock_slowpath+0xf5/0x160
[<c107b76c>] ? trace_hardirqs_on_caller+0x15c/0x1c0
[<c1114a48>] sync_inodes_sb+0x48/0x70
[<c1118b0b>] __sync_filesystem+0x7b/0x90
[<c1118c13>] sync_filesystems+0xf3/0x140
[<c1118cd7>] sys_sync+0x27/0x60
[<c100344b>] sysenter_do_call+0x12/0x36
FIX kmalloc-64: Restoring 0xf498f6a0-0xf498f6a7=0x6b

Now, this might be an extended arm of the security related slab
troubles. (which seem to have been cured by the revert i posted
though.)

This bug is not bisectable at all - it happened after 1000+
successful random bootups. The mainline base for
2.6.31-tip-02343-gb432421 is upstream commit 86d7101.

Full bootlog and config attached.

Ingo


Attachments:
(No filename) (3.15 kB)
boot.log (188.49 kB)
config-Mon_Sep_14_08_54_02_CEST_2009.bad (65.02 kB)

2009-09-14 07:57:04

by Pekka Enberg

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

* Eric Paris <[email protected]> wrote:
>> On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
>> > James - i did not see a security pull request email from you in my
>> > lkml folder so i created this new thread. -tip testing found the
>> > easy crash below. It reverts cleanly so i went that easy route.
>> >
>> > At a really quick 10-seconds glance the crash happens because we
>> > destroy the slab cache twice, if the sysctl is toggled twice?
>>
>> Something a lot worse than SELinux here. I added this exact code and
>> got this warning. Something is wrong in the world of
>> kmem_cache_destroy.....

Btw, the kmem_cache_destroy() bug Eric found is not in Linus's tree yet.

On Mon, Sep 14, 2009 at 10:16 AM, Ingo Molnar <[email protected]> wrote:
> -tip testing just triggered another type of SLAB problem (this time
> not apparently related to the security subsystem):
>
> BUG kmalloc-64: Poison overwritten
> -----------------------------------------------------------------------------
>
> INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
>
> Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk

This would be use-after-free in kmalloc-64 cache. Given the trace and
the fact that bdi_alloc_work() got introduced recently, it seems more
likely that fs/fs-writeback.c is to blame here. Jens, does the warning
ring a bell to you?

> Object 0xf498f6b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf498f6c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk?
> Redzone 0xf498f6d0: bb bb bb bb ????
> Padding 0xf498f6f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> Pid: 3514, comm: sync Not tainted 2.6.31-tip-02343-gb432421-dirty #14071
> Call Trace:
> [<c10e5b29>] print_trailer+0xf9/0x170
> [<c10e5c95>] check_bytes_and_report+0xf5/0x120
> [<c10e7129>] check_object+0x1e9/0x230
> [<c10e848d>] alloc_debug_processing+0xfd/0x1d0
> [<c111394b>] ? bdi_alloc_work+0x2b/0x100
> [<c10e8687>] __slab_alloc+0x127/0x330
> [<c111394b>] ? bdi_alloc_work+0x2b/0x100
> [<c111394b>] ? bdi_alloc_work+0x2b/0x100
> [<c10e8a43>] kmem_cache_alloc+0x1b3/0x1d0
> [<c111394b>] ? bdi_alloc_work+0x2b/0x100
> [<c111394b>] ? bdi_alloc_work+0x2b/0x100
> [<c11148a4>] ? bdi_writeback_all+0x34/0x190
> [<c111394b>] bdi_alloc_work+0x2b/0x100
> [<c194c6b2>] ? _spin_lock+0x72/0x90
> [<c11148e2>] bdi_writeback_all+0x72/0x190
> [<c107b3db>] ? mark_held_locks+0x6b/0xb0
> [<c194ab75>] ? __mutex_unlock_slowpath+0xf5/0x160
> [<c107b76c>] ? trace_hardirqs_on_caller+0x15c/0x1c0
> [<c1114a48>] sync_inodes_sb+0x48/0x70
> [<c1118b0b>] __sync_filesystem+0x7b/0x90
> [<c1118c13>] sync_filesystems+0xf3/0x140
> [<c1118cd7>] sys_sync+0x27/0x60
> [<c100344b>] sysenter_do_call+0x12/0x36
> FIX kmalloc-64: Restoring 0xf498f6a0-0xf498f6a7=0x6b
>
> Now, this might be an extended arm of the security related slab
> troubles. (which seem to have been cured by the revert i posted
> though.)
>
> This bug is not bisectable at all - it happened after 1000+
> successful random bootups. The mainline base for
> 2.6.31-tip-02343-gb432421 is upstream commit 86d7101.
>
> Full bootlog and config attached.
>
> Ingo
>
>

2009-09-14 09:20:43

by Jens Axboe

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Mon, Sep 14 2009, Pekka Enberg wrote:
> * Eric Paris <[email protected]> wrote:
> >> On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
> >> > James - i did not see a security pull request email from you in my
> >> > lkml folder so i created this new thread. -tip testing found the
> >> > easy crash below. It reverts cleanly so i went that easy route.
> >> >
> >> > At a really quick 10-seconds glance the crash happens because we
> >> > destroy the slab cache twice, if the sysctl is toggled twice?
> >>
> >> Something a lot worse than SELinux here. I added this exact code and
> >> got this warning. Something is wrong in the world of
> >> kmem_cache_destroy.....
>
> Btw, the kmem_cache_destroy() bug Eric found is not in Linus's tree yet.
>
> On Mon, Sep 14, 2009 at 10:16 AM, Ingo Molnar <[email protected]> wrote:
> > -tip testing just triggered another type of SLAB problem (this time
> > not apparently related to the security subsystem):
> >
> > BUG kmalloc-64: Poison overwritten
> > -----------------------------------------------------------------------------
> >
> > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> >
> > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
>
> This would be use-after-free in kmalloc-64 cache. Given the trace and
> the fact that bdi_alloc_work() got introduced recently, it seems more
> likely that fs/fs-writeback.c is to blame here. Jens, does the warning
> ring a bell to you?

No bells, the code seems right to me. I'll prod at it a bit more. I
haven't seen anything like this during testing.

--
Jens Axboe

2009-09-14 09:23:19

by Pekka Enberg

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Mon, 2009-09-14 at 11:20 +0200, Jens Axboe wrote:
> On Mon, Sep 14 2009, Pekka Enberg wrote:
> > * Eric Paris <[email protected]> wrote:
> > >> On Sat, 2009-09-12 at 09:24 +0200, Ingo Molnar wrote:
> > >> > James - i did not see a security pull request email from you in my
> > >> > lkml folder so i created this new thread. -tip testing found the
> > >> > easy crash below. It reverts cleanly so i went that easy route.
> > >> >
> > >> > At a really quick 10-seconds glance the crash happens because we
> > >> > destroy the slab cache twice, if the sysctl is toggled twice?
> > >>
> > >> Something a lot worse than SELinux here. I added this exact code and
> > >> got this warning. Something is wrong in the world of
> > >> kmem_cache_destroy.....
> >
> > Btw, the kmem_cache_destroy() bug Eric found is not in Linus's tree yet.
> >
> > On Mon, Sep 14, 2009 at 10:16 AM, Ingo Molnar <[email protected]> wrote:
> > > -tip testing just triggered another type of SLAB problem (this time
> > > not apparently related to the security subsystem):
> > >
> > > BUG kmalloc-64: Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > >
> > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> >
> > This would be use-after-free in kmalloc-64 cache. Given the trace and
> > the fact that bdi_alloc_work() got introduced recently, it seems more
> > likely that fs/fs-writeback.c is to blame here. Jens, does the warning
> > ring a bell to you?
>
> No bells, the code seems right to me. I'll prod at it a bit more. I
> haven't seen anything like this during testing.

OK, it's possible that someone else is holding on to the kmalloc-64
memory block too but that won't show up in the traces.

Pekka

2009-09-14 14:38:50

by David Howells

Subject: Re: [origin tree boot hang] lockup in key_schedule_gc()

Eric Paris <[email protected]> wrote:

> Adding dhowells, the keys maintainer, this one certainly isn't obvious
> to me off hand.

I've screwed up the gc mechanism. I'm working out a patch for it.

David

2009-09-14 14:41:29

by Linus Torvalds

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514



On Mon, 14 Sep 2009, Ingo Molnar wrote:
>
> BUG kmalloc-64: Poison overwritten
> -----------------------------------------------------------------------------
>
> INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
>
> Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk

That's 8 bytes of 0xf498f390 and 0xc1113c60. Doesn't look like much, but
they're both valid kernel pointers, and the 0xf498f390 one is actually
into the same page as the corruption, so it's a pointer to the same slab
type (or at least same size). Which is a good hint in itself: we're
looking at a list or something.

And it's at offset 16 in the structure.

That's almost certainly a "struct bdi_work", and the use-after-free thing
is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
the same page) is 'next', and the second thing is a pointer to kernel text
(and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').

So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
Both of them are new or hugely changed since 2.6.31.

Linus

2009-09-14 16:29:03

by Paul E. McKenney

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 14 Sep 2009, Ingo Molnar wrote:
> >
> > BUG kmalloc-64: Poison overwritten
> > -----------------------------------------------------------------------------
> >
> > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> >
> > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
>
> That's 8 bytes of 0xf498f390 and 0xc1113c60. Doesn't look like much, but
> they're both valid kernel pointers, and the 0xf498f390 one is actually
> into the same page as the corruption, so it's a pointer to the same slab
> type (or at least same size). Which is a good hint in itself: we're
> looking at a list or something.
>
> And it's at offset 16 in the structure.
>
> That's almost certainly a "struct bdi_work", and the use-after-free thing
> is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> the same page) is 'next', and the second thing is a pointer to kernel text
> (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
>
> So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> Both of them are new or hugely changed since 2.6.31.

If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
CONFIG_TREE_RCU that it actually had used, I would suggest applying the
patchset I submitted yesterday (Sept 13).

http://thread.gmane.org/gmane.linux.kernel/888803

Will take a look, regardless.

Thanx, Paul

2009-09-14 17:10:40

by Jens Axboe

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Mon, Sep 14 2009, Paul E. McKenney wrote:
> On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
> >
> >
> > On Mon, 14 Sep 2009, Ingo Molnar wrote:
> > >
> > > BUG kmalloc-64: Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > >
> > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> >
> > That's 8 bytes of 0xf498f390 and 0xc1113c60. Doesn't look like much, but
> > they're both valid kernel pointers, and the 0xf498f390 one is actually
> > into the same page as the corruption, so it's a pointer to the same slab
> > type (or at least same size). Which is a good hint in itself: we're
> > looking at a list or something.
> >
> > And it's at offset 16 in the structure.
> >
> > That's almost certainly a "struct bdi_work", and the use-after-free thing
> > is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> > the same page) is 'next', and the second thing is a pointer to kernel text
> > (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
> >
> > So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> > Both of them are new or hugely changed since 2.6.31.
>
> If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
> CONFIG_TREE_RCU that it actually had used, I would suggest applying the
> patchset I submitted yesterday (Sept 13).
>
> http://thread.gmane.org/gmane.linux.kernel/888803

Ingo, did it? I'll dive into this tonight, Linus' analysis and just a
general feel does point in the direction of the bdi work.

> Will take a look, regardless.

Thanks!

--
Jens Axboe

2009-09-15 06:57:41

by Ingo Molnar

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514


* Jens Axboe <[email protected]> wrote:

> On Mon, Sep 14 2009, Paul E. McKenney wrote:
> > On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Mon, 14 Sep 2009, Ingo Molnar wrote:
> > > >
> > > > BUG kmalloc-64: Poison overwritten
> > > > -----------------------------------------------------------------------------
> > > >
> > > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > > >
> > > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> > >
> > > That's 8 bytes of 0xf498f390 and 0xc1113c60. Doesn't look like much, but
> > > they're both valid kernel pointers, and the 0xf498f390 one is actually
> > > into the same page as the corruption, so it's a pointer to the same slab
> > > type (or at least same size). Which is a good hint in itself: we're
> > > looking at a list or something.
> > >
> > > And it's at offset 16 in the structure.
> > >
> > > That's almost certainly a "struct bdi_work", and the use-after-free thing
> > > is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> > > the same page) is 'next', and the second thing is a pointer to kernel text
> > > (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
> > >
> > > So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> > > Both of them are new or hugely changed since 2.6.31.
> >
> > If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
> > CONFIG_TREE_RCU that it actually had used, I would suggest applying
> > the patchset I submitted yesterday (Sept 13).
> >
> > http://thread.gmane.org/gmane.linux.kernel/888803
>
> Ingo, did it? [...]

The config i attached to the bugreport has:

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_EXACT=y
CONFIG_TREE_RCU_TRACE=y

So TREE_PREEMPT_RCU & the synchronize_rcu() bug Paul fixed is out.

> [...] I'll dive into this tonight, Linus' analysis and just a general
> feel does point in the direction of the bdi work.

Hard to tell whether it's BDI, RCU or something else - sadly this is the
only incident i've managed to log so far. (We'd be all much happier if
boxes crashed left and right! ;)

-tip's been carrying the RCU changes for a long(er) time which would
reduce the chance of this being RCU related. [ It's still possible
though: if it's a bug with a probability of hitting this box on these
workloads with a chance of 1:20,000 or worse. ]

Plus it triggered shortly after i updated -tip to latest -git which had
the BDI bits - which would indicate the BDI stuff - or just about
anything else in -git for that matter - or something older in -tip.
Every day without having hit this crash once more broadens the range of
plausible possibilities.

In any case, i'll refrain from trying to fit a line on a single point of
measurement ;-)

Ingo

2009-09-15 07:00:58

by Jens Axboe

Subject: Re: [origin tree SLAB corruption] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Tue, Sep 15 2009, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
> > On Mon, Sep 14 2009, Paul E. McKenney wrote:
> > > On Mon, Sep 14, 2009 at 07:40:27AM -0700, Linus Torvalds wrote:
> > > >
> > > >
> > > > On Mon, 14 Sep 2009, Ingo Molnar wrote:
> > > > >
> > > > > BUG kmalloc-64: Poison overwritten
> > > > > -----------------------------------------------------------------------------
> > > > >
> > > > > INFO: 0xf498f6a0-0xf498f6a7. First byte 0x90 instead of 0x6b
> > > > > INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514
> > > > > INFO: Freed in bdi_work_free+0x45/0x60 age=9 cpu=1 pid=3509
> > > > > INFO: Slab 0xc3257d84 objects=36 used=11 fp=0xf498f690 flags=0x400000c3
> > > > > INFO: Object 0xf498f690 @offset=1680 fp=0xf498fe00
> > > > >
> > > > > Bytes b4 0xf498f680: ab 0d 00 00 9c 27 ff ff 5a 5a 5a 5a 5a 5a 5a 5a ?....'??ZZZZZZZZ
> > > > > Object 0xf498f690: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > > > Object 0xf498f6a0: 90 f3 98 f4 60 3c 11 c1 6b 6b 6b 6b 6b 6b 6b 6b .?.?`<.?kkkkkkkk
> > > >
> > > > That's 8 bytes of 0xf498f390 and 0xc1113c60. Doesn't look like much, but
> > > > they're both valid kernel pointers, and the 0xf498f390 one is actually
> > > > into the same page as the corruption, so it's a pointer to the same slab
> > > > type (or at least same size). Which is a good hint in itself: we're
> > > > looking at a list or something.
> > > >
> > > > And it's at offset 16 in the structure.
> > > >
> > > > That's almost certainly a "struct bdi_work", and the use-after-free thing
> > > > is the "struct rcu_head rcu_head" part of it. That first thing (pointer to
> > > > the same page) is 'next', and the second thing is a pointer to kernel text
> > > > (and I can pretty much guarantee that 0xc1113c60 is 'bdi_work_free').
> > > >
> > > > So this is either a fs/fs-writeback.c bug, or it's a problem with RCU.
> > > > Both of them are new or hugely changed since 2.6.31.
> > >
> > > If this run had used CONFIG_TREE_PREEMPT_RCU rather than the
> > > CONFIG_TREE_RCU that it actually had used, I would suggest applying
> > > the patchset I submitted yesterday (Sept 13).
> > >
> > > http://thread.gmane.org/gmane.linux.kernel/888803
> >
> > Ingo, did it? [...]
>
> The config i attached to the bugreport has:
>
> #
> # RCU Subsystem
> #
> CONFIG_TREE_RCU=y
> # CONFIG_TREE_PREEMPT_RCU is not set
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=64
> CONFIG_RCU_FANOUT_EXACT=y
> CONFIG_TREE_RCU_TRACE=y
>
> So TREE_PREEMPT_RCU & the synchronize_rcu() bug Paul fixed is out.

Yeah, I noticed later on. synchronize_rcu() is only used on exit as
well, so if it happened during boot it would have to be a call_rcu()
problem.

> > [...] I'll dive into this tonight, Linus' analysis and just a general
> > feel does point in the direction of the bdi work.
>
> Hard to tell whether it's BDI, RCU or something else - sadly this is the
> only incident i've managed to log so far. (We'd be all much happier if
> boxes crashed left and right! ;)

Indeed, that's much easier to test and fix!

> -tip's been carrying the RCU changes for a long(er) time which would
> reduce the chance of this being RCU related. [ It's still possible
> though: if it's a bug with a probability of hitting this box on these
> workloads with a chance of 1:20,000 or worse. ]
>
> Plus it triggered shortly after i updated -tip to latest -git which had
> the BDI bits - which would indicate the BDI stuff - or just about
> anything else in -git for that matter - or something older in -tip.
> Every day without having hit this crash once more broadens the range of
> plausible possibilities.

I haven't found anything here yet, but I'll keep playing. My RCU config
is the same as yours.

> In any case, i'll refrain from trying to fit a line on a single point of
> measurement ;-)

;-)

--
Jens Axboe

2009-09-15 07:13:04

by Ingo Molnar

Subject: [origin tree SLAB corruption #2] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514


* Ingo Molnar <[email protected]> wrote:

> Hard to tell whether it's BDI, RCU or something else - sadly this is
> the only incident i've managed to log so far. (We'd be all much
> happier if boxes crashed left and right! ;)
>
> -tip's been carrying the RCU changes for a long(er) time which would
> reduce the chance of this being RCU related. [ It's still possible
> though: if it's a bug with a probability of hitting this box on these
> workloads with a chance of 1:20,000 or worse. ]
>
> Plus it triggered shortly after i updated -tip to latest -git which
> had the BDI bits - which would indicate the BDI stuff - or just about
> anything else in -git for that matter - or something older in -tip.
> Every day without having hit this crash once more broadens the range
> of plausible possibilities.
>
> In any case, i'll refrain from trying to fit a line on a single point
> of measurement ;-)

Ha! I should have checked all logs of today before writing that, not
just that box's logs.

Another testbox triggered the SLAB corruption yesternight:

[ 13.598011] Freeing unused kernel memory: 2820k freed
[ 13.602011] Write protecting the kernel read-only data: 13528k
[ 13.649011] Not activating Mandatory Access Control now since /sbin/tomoyo-init doesn't exist.
[ 14.391012] =============================================================================
[ 14.391012] BUG kmalloc-96: Poison overwritten
[ 14.391012] -----------------------------------------------------------------------------
[ 14.391012]
[ 14.391012] INFO: 0xffff88003da4a950-0xffff88003da4a988. First byte 0x0 instead of 0x6b
[ 14.391012] INFO: Allocated in bdi_alloc_work+0x20/0x83 age=6 cpu=0 pid=3191
[ 14.391012] INFO: Freed in bdi_work_free+0x1b/0x2f age=4 cpu=0 pid=3193
[ 14.391012] INFO: Slab 0xffffea000190ae10 objects=24 used=13 fp=0xffff88003da4a930 flags=0x200000000000c3
[ 14.391012] INFO: Object 0xffff88003da4a930 @offset=2352 fp=0xffff88003da4a888
[ 14.391012]
[ 14.391012] Bytes b4 0xffff88003da4a920: 52 a4 fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a R???....ZZZZZZZZ
[ 14.391012] Object 0xffff88003da4a930: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.391012] Object 0xffff88003da4a940: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.391012] Object 0xffff88003da4a950: 00 0c 47 3d 00 88 ff ff be 22 11 81 ff ff ff ff ..G=..???"..????
[ 14.391012] Object 0xffff88003da4a960: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.391012] Object 0xffff88003da4a970: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.391012] Object 0xffff88003da4a980: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b a5 kkkkkkkkjkkkkkk?
[ 14.391012] Redzone 0xffff88003da4a990: bb bb bb bb bb bb bb bb ????????
[ 14.391012] Padding 0xffff88003da4a9d0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[ 14.391012] Pid: 3193, comm: mount Not tainted 2.6.31-tip-02377-g78907f0-dirty #88876
[ 14.391012] Call Trace:
[ 14.391012] [<ffffffff810e6bef>] print_trailer+0x140/0x149
[ 14.391012] [<ffffffff810e70db>] check_bytes_and_report+0xb7/0xf7
[ 14.391012] [<ffffffff810e71ec>] check_object+0xd1/0x1b4
[ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
[ 14.391012] [<ffffffff810e7bee>] alloc_debug_processing+0x7b/0xf7
[ 14.391012] [<ffffffff810e9842>] __slab_alloc+0x23e/0x282
[ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
[ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
[ 14.391012] [<ffffffff810e9a8f>] kmem_cache_alloc+0xa1/0x13f
[ 14.391012] [<ffffffff81112078>] bdi_alloc_work+0x20/0x83
[ 14.391012] [<ffffffff81112ae2>] bdi_writeback_all+0x66/0x133
[ 14.391012] [<ffffffff810788b8>] ? mark_held_locks+0x4d/0x6b
[ 14.391012] [<ffffffff81872268>] ? __mutex_unlock_slowpath+0x12d/0x163
[ 14.391012] [<ffffffff81078b45>] ? trace_hardirqs_on_caller+0x11c/0x140
[ 14.391012] [<ffffffff815c35e5>] ? usbfs_fill_super+0x0/0xa8
[ 14.391012] [<ffffffff81112c87>] writeback_inodes_sb+0x75/0x83
[ 14.391012] [<ffffffff8111652d>] __sync_filesystem+0x30/0x6b
[ 14.391012] [<ffffffff81116705>] sync_filesystem+0x3a/0x51
[ 14.391012] [<ffffffff810f4b23>] do_remount_sb+0x5b/0x11f
[ 14.391012] [<ffffffff810f59a3>] get_sb_single+0x92/0xad
[ 14.391012] [<ffffffff815c2f01>] usb_get_sb+0x1b/0x1d
[ 14.391012] [<ffffffff810f5786>] vfs_kern_mount+0x9e/0x122
[ 14.391012] [<ffffffff810f5871>] do_kern_mount+0x4c/0xec
[ 14.391012] [<ffffffff8110dd15>] do_mount+0x1e9/0x236
[ 14.391012] [<ffffffff8110dde6>] sys_mount+0x84/0xc6
[ 14.391012] [<ffffffff8187333a>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 14.391012] [<ffffffff8100b642>] system_call_fastpath+0x16/0x1b
[ 14.391012] FIX kmalloc-96: Restoring 0xffff88003da4a950-0xffff88003da4a988=0x6b
[ 14.391012]
[ 14.391012] FIX kmalloc-96: Marking all objects used
[ 20.597016] usb usb2: uevent
[ 20.606016] usb 2-0:1.0: uevent
[ 20.615016] usb usb3: uevent
[ 20.622016] usb 3-0:1.0: uevent
[ 20.631016] usb usb4: uevent

Different hardware, different config, but still in bdi_alloc_work().

- Which excludes cosmic rays and freak hardware from the list of
possibilities.

- I'd also say RCU is out too as this incident was 500 iterations after
the BDI merge - preceded by a streak of 3000+ successful iterations on
that same box with all of -tip (including the RCU changes).

- Random memory corruption is probably out as well - the chance of
hitting a BDI data structure twice accidentally is low.

- It's also two completely different versions of distros - the
user-space of the two testboxes affected is 2 years apart or so.

- It's a single CPU box - SMP races are out as well.

This points towards this being a BDI bug with ~80% confidence
statistically - or [with a lower probability] a SLAB bug. (both failing
configs had CONFIG_SLUB=y. But SLUB is the best in detecting corrupted
data structures so that alone does not tell much.)

Also, this particular config seems to reproduce the problem a bit more
reliably - a second bootup gave a third corruption report, attached
below.

Full bootlog and config attached as well.

Ingo

[ 13.677011] Freeing unused kernel memory: 2820k freed
[ 13.681011] Write protecting the kernel read-only data: 13528k
[ 13.728011] Not activating Mandatory Access Control now since /sbin/tomoyo-init doesn't exist.
[ 14.399012] =============================================================================
[ 14.399012] BUG kmalloc-96: Poison overwritten
[ 14.399012] -----------------------------------------------------------------------------
[ 14.399012]
[ 14.399012] INFO: 0xffff88003da5c950-0xffff88003da5c988. First byte 0x0 instead of 0x6b
[ 14.399012] INFO: Allocated in bdi_alloc_work+0x20/0x83 age=7 cpu=0 pid=3191
[ 14.399012] INFO: Freed in bdi_work_free+0x1b/0x2f age=4 cpu=0 pid=3193
[ 14.399012] INFO: Slab 0xffffea000190b560 objects=24 used=13 fp=0xffff88003da5c930 flags=0x200000000000c3
[ 14.399012] INFO: Object 0xffff88003da5c930 @offset=2352 fp=0xffff88003da5c888
[ 14.399012]
[ 14.399012] Bytes b4 0xffff88003da5c920: 5a a4 fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a Z???....ZZZZZZZZ
[ 14.399012] Object 0xffff88003da5c930: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.399012] Object 0xffff88003da5c940: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.399012] Object 0xffff88003da5c950: 00 83 49 3d 00 88 ff ff 92 23 11 81 ff ff ff ff ..I=..??.#..????
[ 14.399012] Object 0xffff88003da5c960: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.399012] Object 0xffff88003da5c970: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 14.399012] Object 0xffff88003da5c980: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b a5 kkkkkkkkjkkkkkk?
[ 14.399012] Redzone 0xffff88003da5c990: bb bb bb bb bb bb bb bb ????????
[ 14.399012] Padding 0xffff88003da5c9d0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[ 14.399012] Pid: 3193, comm: mount Not tainted 2.6.31-tip-02387-gc84a410-dirty #88877
[ 14.399012] Call Trace:
[ 14.399012] [<ffffffff810e6cc3>] print_trailer+0x140/0x149
[ 14.399012] [<ffffffff810e71af>] check_bytes_and_report+0xb7/0xf7
[ 14.399012] [<ffffffff810e72c0>] check_object+0xd1/0x1b4
[ 14.399012] [<ffffffff8111214c>] ? bdi_alloc_work+0x20/0x83
[ 14.399012] [<ffffffff810e7cc2>] alloc_debug_processing+0x7b/0xf7
[ 14.399012] [<ffffffff810e9916>] __slab_alloc+0x23e/0x282
[ 14.399012] [<ffffffff8111214c>] ? bdi_alloc_work+0x20/0x83
[ 14.399012] [<ffffffff8111214c>] ? bdi_alloc_work+0x20/0x83
[ 14.399012] [<ffffffff810e9b63>] kmem_cache_alloc+0xa1/0x13f
[ 14.399012] [<ffffffff8111214c>] bdi_alloc_work+0x20/0x83
[ 14.399012] [<ffffffff81112bb6>] bdi_writeback_all+0x66/0x133
[ 14.399012] [<ffffffff8107894c>] ? mark_held_locks+0x4d/0x6b
[ 14.399012] [<ffffffff81872358>] ? __mutex_unlock_slowpath+0x12d/0x163
[ 14.399012] [<ffffffff81078bd9>] ? trace_hardirqs_on_caller+0x11c/0x140
[ 14.399012] [<ffffffff815c36c9>] ? usbfs_fill_super+0x0/0xa8
[ 14.399012] [<ffffffff81112d5b>] writeback_inodes_sb+0x75/0x83
[ 14.399012] [<ffffffff81116601>] __sync_filesystem+0x30/0x6b
[ 14.399012] [<ffffffff811167d9>] sync_filesystem+0x3a/0x51
[ 14.399012] [<ffffffff810f4bf7>] do_remount_sb+0x5b/0x11f
[ 14.399012] [<ffffffff810f5a77>] get_sb_single+0x92/0xad
[ 14.399012] [<ffffffff815c2fe5>] usb_get_sb+0x1b/0x1d
[ 14.399012] [<ffffffff810f585a>] vfs_kern_mount+0x9e/0x122
[ 14.399012] [<ffffffff810f5945>] do_kern_mount+0x4c/0xec
[ 14.399012] [<ffffffff8110dde9>] do_mount+0x1e9/0x236
[ 14.399012] [<ffffffff8110deba>] sys_mount+0x84/0xc6
[ 14.399012] [<ffffffff8187342a>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 14.399012] [<ffffffff8100b642>] system_call_fastpath+0x16/0x1b
[ 14.399012] FIX kmalloc-96: Restoring 0xffff88003da5c950-0xffff88003da5c988=0x6b
[ 14.399012]
[ 14.399012] FIX kmalloc-96: Marking all objects used
[ 20.640016] usb usb2: uevent
[ 20.651016] usb 2-0:1.0: uevent
[ 20.658016] usb usb3: uevent
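
For readers decoding the report above: the "INFO: ...950-...988. First byte 0x0 instead of 0x6b" line can be reproduced with a small user-space sketch. This is not the kernel's SLUB code. It assumes only what the dump shows: a freed 96-byte object is filled with 0x6b, its last byte is 0xa5, and the checker reports the first and last damaged byte. The names scan_poison and format_report are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define POISON_FREE 0x6b /* fill byte of a freed object, as seen in the dump */
#define POISON_END  0xa5 /* marker in the last byte of the object */
#define OBJ_SIZE    96   /* kmalloc-96 */

/* Hypothetical helper: returns 1 if the poison pattern is damaged and
 * stores the offsets of the first and last damaged byte. */
int scan_poison(const uint8_t *obj, size_t size, size_t *first, size_t *last)
{
    int bad = 0;

    for (size_t i = 0; i < size; i++) {
        uint8_t expect = (i == size - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expect) {
            if (!bad)
                *first = i;
            *last = i;
            bad = 1;
        }
    }
    return bad;
}

/* Rebuild the damaged object from the report and format the INFO line. */
int format_report(char *out, size_t outlen)
{
    const uint64_t base = 0xffff88003da5c930ULL; /* object address from the log */
    static const uint8_t stray[16] = {           /* row 0x...950 of the dump */
        0x00, 0x83, 0x49, 0x3d, 0x00, 0x88, 0xff, 0xff,
        0x92, 0x23, 0x11, 0x81, 0xff, 0xff, 0xff, 0xff,
    };
    uint8_t obj[OBJ_SIZE];
    size_t first = 0, last = 0;

    memset(obj, POISON_FREE, OBJ_SIZE - 1);
    obj[OBJ_SIZE - 1] = POISON_END;
    memcpy(obj + 0x20, stray, sizeof(stray)); /* 16 bytes written after free */
    obj[0x58] = 0x6a;                         /* the lone 0x6a at 0x...988 */

    if (!scan_poison(obj, OBJ_SIZE, &first, &last))
        return 0;

    snprintf(out, outlen,
             "INFO: 0x%" PRIx64 "-0x%" PRIx64 ". First byte 0x%x instead of 0x%x",
             base + first, base + last,
             (unsigned)obj[first], (unsigned)POISON_FREE);
    return 1;
}
```

Calling format_report() with a buffer of 128 bytes or so yields the same address range as the log. The reported range is wide (0x950 to 0x988) only because it spans two separate damaged spots: the 16 bytes at offset 0x20 and the single 0x6a byte at offset 0x58.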


Attachments:
(No filename) (10.10 kB)
config-Tue_Sep_15_05_02_52_CEST_2009.bad (69.88 kB)
crash.log (298.90 kB)

2009-09-15 07:24:36

by Jens Axboe

Subject: Re: [origin tree SLAB corruption #2] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Tue, Sep 15 2009, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > Hard to tell whether it's BDI, RCU or something else - sadly this is
> > the only incident i've managed to log so far. (We'd be all much
> > happier if boxes crashed left and right! ;)
> >
> > -tip's been carrying the RCU changes for a long(er) time which would
> > reduce the chance of this being RCU related. [ It's still possible
> > though: if it's a bug with a probability of hitting this box on these
> > workloads with a chance of 1:20,000 or worse. ]
> >
> > Plus it triggered shortly after i updated -tip to latest -git which
> > had the BDI bits - which would indicate the BDI stuff - or just about
> > anything else in -git for that matter - or something older in -tip.
> > Every day without having hit this crash once more broadens the range
> > of plausible possibilities.
> >
> > In any case, i'll refrain from trying to fit a line on a single point
> > of measurement ;-)
>
> Ha! I should have checked all logs of today before writing that, not
> just that box's logs.
>
> Another testbox triggered the SLAB corruption yesternight:
>
> [ 13.598011] Freeing unused kernel memory: 2820k freed
> [ 13.602011] Write protecting the kernel read-only data: 13528k
> [ 13.649011] Not activating Mandatory Access Control now since /sbin/tomoyo-init doesn't exist.
> [ 14.391012] =============================================================================
> [ 14.391012] BUG kmalloc-96: Poison overwritten
> [ 14.391012] -----------------------------------------------------------------------------
> [ 14.391012]
> [ 14.391012] INFO: 0xffff88003da4a950-0xffff88003da4a988. First byte 0x0 instead of 0x6b
> [ 14.391012] INFO: Allocated in bdi_alloc_work+0x20/0x83 age=6 cpu=0 pid=3191
> [ 14.391012] INFO: Freed in bdi_work_free+0x1b/0x2f age=4 cpu=0 pid=3193
> [ 14.391012] INFO: Slab 0xffffea000190ae10 objects=24 used=13 fp=0xffff88003da4a930 flags=0x200000000000c3
> [ 14.391012] INFO: Object 0xffff88003da4a930 @offset=2352 fp=0xffff88003da4a888
> [ 14.391012]
> [ 14.391012] Bytes b4 0xffff88003da4a920: 52 a4 fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a R???....ZZZZZZZZ
> [ 14.391012] Object 0xffff88003da4a930: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a940: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a950: 00 0c 47 3d 00 88 ff ff be 22 11 81 ff ff ff ff ..G=..???"..????
> [ 14.391012] Object 0xffff88003da4a960: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a970: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 14.391012] Object 0xffff88003da4a980: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b a5 kkkkkkkkjkkkkkk?
> [ 14.391012] Redzone 0xffff88003da4a990: bb bb bb bb bb bb bb bb ????????
> [ 14.391012] Padding 0xffff88003da4a9d0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> [ 14.391012] Pid: 3193, comm: mount Not tainted 2.6.31-tip-02377-g78907f0-dirty #88876
> [ 14.391012] Call Trace:
> [ 14.391012] [<ffffffff810e6bef>] print_trailer+0x140/0x149
> [ 14.391012] [<ffffffff810e70db>] check_bytes_and_report+0xb7/0xf7
> [ 14.391012] [<ffffffff810e71ec>] check_object+0xd1/0x1b4
> [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff810e7bee>] alloc_debug_processing+0x7b/0xf7
> [ 14.391012] [<ffffffff810e9842>] __slab_alloc+0x23e/0x282
> [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff810e9a8f>] kmem_cache_alloc+0xa1/0x13f
> [ 14.391012] [<ffffffff81112078>] bdi_alloc_work+0x20/0x83
> [ 14.391012] [<ffffffff81112ae2>] bdi_writeback_all+0x66/0x133
> [ 14.391012] [<ffffffff810788b8>] ? mark_held_locks+0x4d/0x6b
> [ 14.391012] [<ffffffff81872268>] ? __mutex_unlock_slowpath+0x12d/0x163
> [ 14.391012] [<ffffffff81078b45>] ? trace_hardirqs_on_caller+0x11c/0x140
> [ 14.391012] [<ffffffff815c35e5>] ? usbfs_fill_super+0x0/0xa8
> [ 14.391012] [<ffffffff81112c87>] writeback_inodes_sb+0x75/0x83
> [ 14.391012] [<ffffffff8111652d>] __sync_filesystem+0x30/0x6b
> [ 14.391012] [<ffffffff81116705>] sync_filesystem+0x3a/0x51
> [ 14.391012] [<ffffffff810f4b23>] do_remount_sb+0x5b/0x11f
> [ 14.391012] [<ffffffff810f59a3>] get_sb_single+0x92/0xad
> [ 14.391012] [<ffffffff815c2f01>] usb_get_sb+0x1b/0x1d
> [ 14.391012] [<ffffffff810f5786>] vfs_kern_mount+0x9e/0x122
> [ 14.391012] [<ffffffff810f5871>] do_kern_mount+0x4c/0xec
> [ 14.391012] [<ffffffff8110dd15>] do_mount+0x1e9/0x236
> [ 14.391012] [<ffffffff8110dde6>] sys_mount+0x84/0xc6
> [ 14.391012] [<ffffffff8187333a>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 14.391012] [<ffffffff8100b642>] system_call_fastpath+0x16/0x1b
> [ 14.391012] FIX kmalloc-96: Restoring 0xffff88003da4a950-0xffff88003da4a988=0x6b
> [ 14.391012]
> [ 14.391012] FIX kmalloc-96: Marking all objects used
> [ 20.597016] usb usb2: uevent
> [ 20.606016] usb 2-0:1.0: uevent
> [ 20.615016] usb usb3: uevent
> [ 20.622016] usb 3-0:1.0: uevent
> [ 20.631016] usb usb4: uevent
>
> Different hardware, different config, but still in bdi_alloc_work().
>
> - Which excludes cosmic rays and freak hardware from the list of
> possibilities.
>
> - I'd also say RCU is out too as this incident was 500 iterations after
> the BDI merge - preceded by a streak of 3000+ successful iterations on
> that same box with all of -tip (including the RCU changes).
>
> - Random memory corruption is probably out as well - the chance of
> hitting a BDI data structure twice accidentally is low.
>
> - It's also two completely different versions of distros - the
> user-space of the two testboxes affected is 2 years apart or so.
>
> - It's a single CPU box - SMP races are out as well.
>
> This points towards this being a BDI bug with about ~80% confidence
> statistically - or [with a lower probability] a SLAB bug. (both failing
> configs had CONFIG_SLUB=y. But SLUB is the best in detecting corrupted
> data structures so that alone does not tell much.)

Hmmm, at least this reproduces more consistently. Can I talk you into
trying to pull in:

git://git.kernel.dk/linux-2.6-block.git writeback

and see if it reproduces there? That path has been cleaned up
considerably there.

--
Jens Axboe
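
A side note on the two kmalloc-96 dumps in this thread: the 16 overwritten bytes at offset 0x20 can be read as two little-endian 64-bit words. The sketch below decodes them and compares the second word against the start of bdi_alloc_work in each build, taken as the logged return address minus its +0x20 offset. It is only arithmetic on values copied from the logs; the helper names le64 and text_offset are hypothetical, and what the words actually are is an assumption, not something the reports state.

```c
#include <stdint.h>

/* Assemble a little-endian 64-bit word from 8 dump bytes (x86-64 order). */
uint64_t le64(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Bytes at offset 0x20 of the freed object, copied from the two reports. */
const uint8_t report_a[16] = {          /* kernel #88877 */
    0x00, 0x83, 0x49, 0x3d, 0x00, 0x88, 0xff, 0xff,
    0x92, 0x23, 0x11, 0x81, 0xff, 0xff, 0xff, 0xff,
};
const uint8_t report_b[16] = {          /* kernel #88876 */
    0x00, 0x0c, 0x47, 0x3d, 0x00, 0x88, 0xff, 0xff,
    0xbe, 0x22, 0x11, 0x81, 0xff, 0xff, 0xff, 0xff,
};

/* Start of bdi_alloc_work in each build: logged address minus +0x20. */
const uint64_t bdi_alloc_work_a = 0xffffffff8111214cULL - 0x20;
const uint64_t bdi_alloc_work_b = 0xffffffff81112078ULL - 0x20;

/* Distance of the second stray word from the start of bdi_alloc_work. */
uint64_t text_offset(const uint8_t dump[16], uint64_t func_start)
{
    return le64(dump + 8) - func_start;
}
```

Under these assumptions the first word decodes to 0xffff88003d498300 and 0xffff88003d470c00 respectively, which look like kernel data pointers in the same range as the objects themselves. The second word lies 0x266 bytes past the start of bdi_alloc_work in both builds, which suggests (but does not prove) that the same code address was stored into the freed object each time.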

2009-09-15 07:44:44

by Ingo Molnar

Subject: Re: [origin tree SLAB corruption #2] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514


* Jens Axboe <[email protected]> wrote:

> On Tue, Sep 15 2009, Ingo Molnar wrote:
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > Hard to tell whether it's BDI, RCU or something else - sadly this is
> > > the only incident i've managed to log so far. (We'd be all much
> > > happier if boxes crashed left and right! ;)
> > >
> > > -tip's been carrying the RCU changes for a long(er) time which would
> > > reduce the chance of this being RCU related. [ It's still possible
> > > though: if it's a bug with a probability of hitting this box on these
> > > workloads with a chance of 1:20,000 or worse. ]
> > >
> > > Plus it triggered shortly after i updated -tip to latest -git which
> > > had the BDI bits - which would indicate the BDI stuff - or just about
> > > anything else in -git for that matter - or something older in -tip.
> > > Every day without having hit this crash once more broadens the range
> > > of plausible possibilities.
> > >
> > > In any case, i'll refrain from trying to fit a line on a single point
> > > of measurement ;-)
> >
> > Ha! I should have checked all logs of today before writing that, not
> > just that box's logs.
> >
> > Another testbox triggered the SLAB corruption yesternight:
> >
> > [ 13.598011] Freeing unused kernel memory: 2820k freed
> > [ 13.602011] Write protecting the kernel read-only data: 13528k
> > [ 13.649011] Not activating Mandatory Access Control now since /sbin/tomoyo-init doesn't exist.
> > [ 14.391012] =============================================================================
> > [ 14.391012] BUG kmalloc-96: Poison overwritten
> > [ 14.391012] -----------------------------------------------------------------------------
> > [ 14.391012]
> > [ 14.391012] INFO: 0xffff88003da4a950-0xffff88003da4a988. First byte 0x0 instead of 0x6b
> > [ 14.391012] INFO: Allocated in bdi_alloc_work+0x20/0x83 age=6 cpu=0 pid=3191
> > [ 14.391012] INFO: Freed in bdi_work_free+0x1b/0x2f age=4 cpu=0 pid=3193
> > [ 14.391012] INFO: Slab 0xffffea000190ae10 objects=24 used=13 fp=0xffff88003da4a930 flags=0x200000000000c3
> > [ 14.391012] INFO: Object 0xffff88003da4a930 @offset=2352 fp=0xffff88003da4a888
> > [ 14.391012]
> > [ 14.391012] Bytes b4 0xffff88003da4a920: 52 a4 fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a R???....ZZZZZZZZ
> > [ 14.391012] Object 0xffff88003da4a930: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 14.391012] Object 0xffff88003da4a940: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 14.391012] Object 0xffff88003da4a950: 00 0c 47 3d 00 88 ff ff be 22 11 81 ff ff ff ff ..G=..???"..????
> > [ 14.391012] Object 0xffff88003da4a960: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 14.391012] Object 0xffff88003da4a970: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > [ 14.391012] Object 0xffff88003da4a980: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b a5 kkkkkkkkjkkkkkk?
> > [ 14.391012] Redzone 0xffff88003da4a990: bb bb bb bb bb bb bb bb ????????
> > [ 14.391012] Padding 0xffff88003da4a9d0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
> > [ 14.391012] Pid: 3193, comm: mount Not tainted 2.6.31-tip-02377-g78907f0-dirty #88876
> > [ 14.391012] Call Trace:
> > [ 14.391012] [<ffffffff810e6bef>] print_trailer+0x140/0x149
> > [ 14.391012] [<ffffffff810e70db>] check_bytes_and_report+0xb7/0xf7
> > [ 14.391012] [<ffffffff810e71ec>] check_object+0xd1/0x1b4
> > [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> > [ 14.391012] [<ffffffff810e7bee>] alloc_debug_processing+0x7b/0xf7
> > [ 14.391012] [<ffffffff810e9842>] __slab_alloc+0x23e/0x282
> > [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> > [ 14.391012] [<ffffffff81112078>] ? bdi_alloc_work+0x20/0x83
> > [ 14.391012] [<ffffffff810e9a8f>] kmem_cache_alloc+0xa1/0x13f
> > [ 14.391012] [<ffffffff81112078>] bdi_alloc_work+0x20/0x83
> > [ 14.391012] [<ffffffff81112ae2>] bdi_writeback_all+0x66/0x133
> > [ 14.391012] [<ffffffff810788b8>] ? mark_held_locks+0x4d/0x6b
> > [ 14.391012] [<ffffffff81872268>] ? __mutex_unlock_slowpath+0x12d/0x163
> > [ 14.391012] [<ffffffff81078b45>] ? trace_hardirqs_on_caller+0x11c/0x140
> > [ 14.391012] [<ffffffff815c35e5>] ? usbfs_fill_super+0x0/0xa8
> > [ 14.391012] [<ffffffff81112c87>] writeback_inodes_sb+0x75/0x83
> > [ 14.391012] [<ffffffff8111652d>] __sync_filesystem+0x30/0x6b
> > [ 14.391012] [<ffffffff81116705>] sync_filesystem+0x3a/0x51
> > [ 14.391012] [<ffffffff810f4b23>] do_remount_sb+0x5b/0x11f
> > [ 14.391012] [<ffffffff810f59a3>] get_sb_single+0x92/0xad
> > [ 14.391012] [<ffffffff815c2f01>] usb_get_sb+0x1b/0x1d
> > [ 14.391012] [<ffffffff810f5786>] vfs_kern_mount+0x9e/0x122
> > [ 14.391012] [<ffffffff810f5871>] do_kern_mount+0x4c/0xec
> > [ 14.391012] [<ffffffff8110dd15>] do_mount+0x1e9/0x236
> > [ 14.391012] [<ffffffff8110dde6>] sys_mount+0x84/0xc6
> > [ 14.391012] [<ffffffff8187333a>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [ 14.391012] [<ffffffff8100b642>] system_call_fastpath+0x16/0x1b
> > [ 14.391012] FIX kmalloc-96: Restoring 0xffff88003da4a950-0xffff88003da4a988=0x6b
> > [ 14.391012]
> > [ 14.391012] FIX kmalloc-96: Marking all objects used
> > [ 20.597016] usb usb2: uevent
> > [ 20.606016] usb 2-0:1.0: uevent
> > [ 20.615016] usb usb3: uevent
> > [ 20.622016] usb 3-0:1.0: uevent
> > [ 20.631016] usb usb4: uevent
> >
> > Different hardware, different config, but still in bdi_alloc_work().
> >
> > - Which excludes cosmic rays and freak hardware from the list of
> > possibilities.
> >
> > - I'd also say RCU is out too as this incident was 500 iterations after
> > the BDI merge - preceded by a streak of 3000+ successful iterations on
> > that same box with all of -tip (including the RCU changes).
> >
> > - Random memory corruption is probably out as well - the chance of
> > hitting a BDI data structure twice accidentally is low.
> >
> > - It's also two completely different versions of distros - the
> > user-space of the two testboxes affected is 2 years apart or so.
> >
> > - It's a single CPU box - SMP races are out as well.
> >
> > This points towards this being a BDI bug with about ~80% confidence
> > statistically - or [with a lower probability] a SLAB bug. (both failing
> > configs had CONFIG_SLUB=y. But SLUB is the best in detecting corrupted
> > data structures so that alone does not tell much.)
>
> Hmmm, at least this reproduces more consistently. Can I talk you into
> trying to pull in:
>
> git://git.kernel.dk/linux-2.6-block.git writeback
>
> and see if it reproduces there? That path has been cleaned up
> considerably there.

I gave it a test-pull - and the bug does not trigger anymore.

Note, that may not mean much: your tree is based on a fresh upstream
tree so it pulled a lot of new stuff into -tip that i have yet to
test/validate. It also changed the kernel image size/layout considerably
and this bug seems to be a very narrow, hard-to-hit race of sorts.

Ingo

2009-09-15 07:48:37

by Ingo Molnar

Subject: Re: [origin tree SLAB corruption #2] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514


* Ingo Molnar <[email protected]> wrote:

> > trying to pull in:
> >
> > git://git.kernel.dk/linux-2.6-block.git writeback
> >
> > and see if it reproduces there? That path has been cleaned up
> > considerably there.
>
> I gave it a test-pull - and the bug does not trigger anymore.
>
> Note, that may not mean much: your tree is based on a fresh upstream
> tree so it pulled a lot of new stuff into -tip that i have yet to
> test/validate. It also changed the kernel image size/layout
> considerably and this bug seems to be a very narrow, hard-to-hit race
> of sorts.

Btw., is there anything in your cleanups that could explain this bug?
Some list handling bug? A double free? Uninitialized memory? Race
between block IRQs and process context?

Ingo

2009-09-15 07:51:05

by Jens Axboe

Subject: Re: [origin tree SLAB corruption #2] BUG kmalloc-64: Poison overwritten, INFO: Allocated in bdi_alloc_work+0x2b/0x100 age=175 cpu=1 pid=3514

On Tue, Sep 15 2009, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > trying to pull in:
> > >
> > > git://git.kernel.dk/linux-2.6-block.git writeback
> > >
> > > and see if it reproduces there? That path has been cleaned up
> > > considerably there.
> >
> > I gave it a test-pull - and the bug does not trigger anymore.
> >
> > Note, that may not mean much: your tree is based on a fresh upstream
> > tree so it pulled a lot of new stuff into -tip that i have yet to
> > test/validate. It also changed the kernel image size/layout
> > > considerably and this bug seems to be a very narrow, hard-to-hit race
> > > of sorts.
>
> Btw., is there anything in your cleanups that could explain this bug?
> Some list handling bug? A double free? Uninitialized memory? Race
> between block IRQs and process context?

Not that I noticed, it was just a thought since that path has been
cleaned up and I have been looking at the newer tree only. I'll take a
good look at the current -git situation.

--
Jens Axboe