2013-09-16 17:30:57

by Dave Jones

Subject: lockup during selinux policy load.

On a slow machine (with debugging enabled), during a yum update I get
the soft lockup detector kicking in when it gets to reloading the selinux policy.
It looks like this..


BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045]
irq event stamp: 2368864
hardirqs last enabled at (2368863): [<ffffffff8152ab5d>] __slab_alloc.constprop.78+0x4c0/0x4d7
hardirqs last disabled at (2368864): [<ffffffff8153c82a>] apic_timer_interrupt+0x6a/0x80
softirqs last enabled at (2368554): [<ffffffff810480b9>] __do_softirq+0x169/0x200
softirqs last disabled at (2368539): [<ffffffff8104837d>] irq_exit+0x11d/0x140
CPU: 2 PID: 19045 Comm: load_policy Not tainted 3.11.0+ #16
Hardware name: /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
task: ffff88005ab38000 ti: ffff880019620000 task.ti: ffff880019620000
RIP: 0010:[<ffffffff81274273>] [<ffffffff81274273>] strcmp+0x23/0x40
RSP: 0018:ffff880019621818 EFLAGS: 00000246
RAX: 0000000000000063 RBX: ffff880018090ca8 RCX: 0000000000006070
RDX: ffff88000781d8f0 RSI: ffff88000781d8f1 RDI: ffff880079caed21
RBP: ffff880019621818 R08: ffff88006345a290 R09: ffff880018091680
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000292
R13: 0000000180160016 R14: ffff88007b804488 R15: ffffffff81221430
FS: 00007f73e1212800(0000) GS:ffff88007e600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f73e0857aa0 CR3: 0000000076fea000 CR4: 00000000000007e0
Stack:
ffff880019621828 ffffffff81221ddf ffff880019621850 ffffffff81221c27
ffff8800180ac6d8 ffff880019621b88 ffffffff82759620 ffff8800196218a0
ffffffff8122e96c 0000000181221e05 ffff880018091680 0000000000000080
Call Trace:
[<ffffffff81221ddf>] symcmp+0xf/0x20
[<ffffffff81221c27>] hashtab_search+0x47/0x80
[<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0
[<ffffffff812294e8>] convert_context+0x378/0x460
[<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240
[<ffffffff812221b5>] sidtab_map+0x45/0x80
[<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580
[<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
[<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
[<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
[<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
[<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
[<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0
[<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
[<ffffffff8121e947>] sel_write_load+0xa7/0x770
[<ffffffff81139633>] ? vfs_write+0x1c3/0x200
[<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0
[<ffffffff8113952b>] vfs_write+0xbb/0x200
[<ffffffff811581c7>] ? fget_light+0x397/0x4b0
[<ffffffff81139c27>] SyS_write+0x47/0xa0
[<ffffffff8153bde4>] tracesys+0xdd/0xe2
Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 eb 0e 66 2e 0f 1f 84 00 00 00 00 00 84 c0 74 1c 48 83 c7 01 0f b6 47 ff 48 83 c6 01 3a 46 ff <74> eb 19 c0 83 c8 01 5d c3 0f 1f 40 00 31 c0 5d c3 66 66 66 2e


23s in the kernel is an eternity. Short of rearchitecting how policy loads are done,
perhaps we could do something like this? (untested, and 10000 is arbitrarily chosen; may need to be adjusted)

thoughts?

Dave

diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
index 933e735..69a0587 100644
--- a/security/selinux/ss/hashtab.c
+++ b/security/selinux/ss/hashtab.c
@@ -75,14 +75,21 @@ void *hashtab_search(struct hashtab *h, const void *key)
{
u32 hvalue;
struct hashtab_node *cur;
+ int count = 0;

if (!h)
return NULL;

hvalue = h->hash_value(h, key);
cur = h->htable[hvalue];
- while (cur && h->keycmp(h, key, cur->key) > 0)
+ while (cur && h->keycmp(h, key, cur->key) > 0) {
cur = cur->next;
+ count++;
+ if (count == 10000) {
+ touch_softlockup_watchdog();
+ count = 0;
+ }
+ }

if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
return NULL;


2013-09-16 17:59:27

by Stephen Smalley

Subject: Re: lockup during selinux policy load.

On 09/16/2013 01:30 PM, Dave Jones wrote:
> On a slow machine (with debugging enabled), during a yum update I get
> the soft lockup detector kicking in when it gets to reloading the selinux policy.
> It looks like this..
>
>
> BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045]
> irq event stamp: 2368864
> hardirqs last enabled at (2368863): [<ffffffff8152ab5d>] __slab_alloc.constprop.78+0x4c0/0x4d7
> hardirqs last disabled at (2368864): [<ffffffff8153c82a>] apic_timer_interrupt+0x6a/0x80
> softirqs last enabled at (2368554): [<ffffffff810480b9>] __do_softirq+0x169/0x200
> softirqs last disabled at (2368539): [<ffffffff8104837d>] irq_exit+0x11d/0x140
> CPU: 2 PID: 19045 Comm: load_policy Not tainted 3.11.0+ #16
> Hardware name: /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
> task: ffff88005ab38000 ti: ffff880019620000 task.ti: ffff880019620000
> RIP: 0010:[<ffffffff81274273>] [<ffffffff81274273>] strcmp+0x23/0x40
> RSP: 0018:ffff880019621818 EFLAGS: 00000246
> RAX: 0000000000000063 RBX: ffff880018090ca8 RCX: 0000000000006070
> RDX: ffff88000781d8f0 RSI: ffff88000781d8f1 RDI: ffff880079caed21
> RBP: ffff880019621818 R08: ffff88006345a290 R09: ffff880018091680
> R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000292
> R13: 0000000180160016 R14: ffff88007b804488 R15: ffffffff81221430
> FS: 00007f73e1212800(0000) GS:ffff88007e600000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f73e0857aa0 CR3: 0000000076fea000 CR4: 00000000000007e0
> Stack:
> ffff880019621828 ffffffff81221ddf ffff880019621850 ffffffff81221c27
> ffff8800180ac6d8 ffff880019621b88 ffffffff82759620 ffff8800196218a0
> ffffffff8122e96c 0000000181221e05 ffff880018091680 0000000000000080
> Call Trace:
> [<ffffffff81221ddf>] symcmp+0xf/0x20
> [<ffffffff81221c27>] hashtab_search+0x47/0x80
> [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0
> [<ffffffff812294e8>] convert_context+0x378/0x460
> [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240
> [<ffffffff812221b5>] sidtab_map+0x45/0x80
> [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
> [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
> [<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0
> [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
> [<ffffffff8121e947>] sel_write_load+0xa7/0x770
> [<ffffffff81139633>] ? vfs_write+0x1c3/0x200
> [<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0
> [<ffffffff8113952b>] vfs_write+0xbb/0x200
> [<ffffffff811581c7>] ? fget_light+0x397/0x4b0
> [<ffffffff81139c27>] SyS_write+0x47/0xa0
> [<ffffffff8153bde4>] tracesys+0xdd/0xe2
> Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 eb 0e 66 2e 0f 1f 84 00 00 00 00 00 84 c0 74 1c 48 83 c7 01 0f b6 47 ff 48 83 c6 01 3a 46 ff <74> eb 19 c0 83 c8 01 5d c3 0f 1f 40 00 31 c0 5d c3 66 66 66 2e
>
>
> 23s in the kernel is an eternity. Short of rearchitecting how policy loads are done,
> perhaps we could do something like this? (untested, and 10000 is arbitrarily chosen; may need to be adjusted)
>
> thoughts?

Maybe put a cond_resched() within the ebitmap_for_each_positive_bit()
loop in mls_convert_context()?

>
> Dave
>
> diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> index 933e735..69a0587 100644
> --- a/security/selinux/ss/hashtab.c
> +++ b/security/selinux/ss/hashtab.c
> @@ -75,14 +75,21 @@ void *hashtab_search(struct hashtab *h, const void *key)
> {
> u32 hvalue;
> struct hashtab_node *cur;
> + int count = 0;
>
> if (!h)
> return NULL;
>
> hvalue = h->hash_value(h, key);
> cur = h->htable[hvalue];
> - while (cur && h->keycmp(h, key, cur->key) > 0)
> + while (cur && h->keycmp(h, key, cur->key) > 0) {
> cur = cur->next;
> + count++;
> + if (count == 10000) {
> + touch_softlockup_watchdog();
> + count = 0;
> + }
> + }
>
> if (cur == NULL || (h->keycmp(h, key, cur->key) != 0))
> return NULL;
>
>

2013-09-16 18:40:45

by Dave Jones

Subject: [PATCH] conditionally reschedule while loading selinux policy.

On a slow machine (with debugging enabled), upgrading selinux policy may take
a considerable amount of time. Long enough that the softlockup detector
gets triggered.

The backtrace looks like this..

> BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045]
> Call Trace:
> [<ffffffff81221ddf>] symcmp+0xf/0x20
> [<ffffffff81221c27>] hashtab_search+0x47/0x80
> [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0
> [<ffffffff812294e8>] convert_context+0x378/0x460
> [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240
> [<ffffffff812221b5>] sidtab_map+0x45/0x80
> [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
> [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
> [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
> [<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0
> [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
> [<ffffffff8121e947>] sel_write_load+0xa7/0x770
> [<ffffffff81139633>] ? vfs_write+0x1c3/0x200
> [<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0
> [<ffffffff8113952b>] vfs_write+0xbb/0x200
> [<ffffffff811581c7>] ? fget_light+0x397/0x4b0
> [<ffffffff81139c27>] SyS_write+0x47/0xa0
> [<ffffffff8153bde4>] tracesys+0xdd/0xe2

Stephen Smalley suggested:

> Maybe put a cond_resched() within the ebitmap_for_each_positive_bit()
> loop in mls_convert_context()?

That seems to do the trick. Tested by downgrading and re-upgrading selinux-policy-targeted.

Signed-off-by: Dave Jones <[email protected]>

diff --git a/security/selinux/ss/mls.c b/security/selinux/ss/mls.c
index 40de8d3..9ef8e51 100644
--- a/security/selinux/ss/mls.c
+++ b/security/selinux/ss/mls.c
@@ -500,6 +500,8 @@ int mls_convert_context(struct policydb *oldp,
rc = ebitmap_set_bit(&bitmap, catdatum->value - 1, 1);
if (rc)
return rc;
+
+ cond_resched();
}
ebitmap_destroy(&c->range.level[l].cat);
c->range.level[l].cat = bitmap;

2013-09-16 18:51:55

by Stephen Smalley

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On 09/16/2013 02:40 PM, Dave Jones wrote:
> On a slow machine (with debugging enabled), upgrading selinux policy may take
> a considerable amount of time. Long enough that the softlockup detector
> gets triggered.
>
> The backtrace looks like this..
>
> > BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045]
> > Call Trace:
> > [<ffffffff81221ddf>] symcmp+0xf/0x20
> > [<ffffffff81221c27>] hashtab_search+0x47/0x80
> > [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0
> > [<ffffffff812294e8>] convert_context+0x378/0x460
> > [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240
> > [<ffffffff812221b5>] sidtab_map+0x45/0x80
> > [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580
> > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
> > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
> > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80
> > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50
> > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100
> > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
> > [<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> > [<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0
> > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe
> > [<ffffffff8121e947>] sel_write_load+0xa7/0x770
> > [<ffffffff81139633>] ? vfs_write+0x1c3/0x200
> > [<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0
> > [<ffffffff8113952b>] vfs_write+0xbb/0x200
> > [<ffffffff811581c7>] ? fget_light+0x397/0x4b0
> > [<ffffffff81139c27>] SyS_write+0x47/0xa0
> > [<ffffffff8153bde4>] tracesys+0xdd/0xe2
>
> Stephen Smalley suggested:
>
> > Maybe put a cond_resched() within the ebitmap_for_each_positive_bit()
> > loop in mls_convert_context()?
>
> That seems to do the trick. Tested by downgrading and re-upgrading selinux-policy-targeted.
>
> Signed-off-by: Dave Jones <[email protected]>

Acked-by: Stephen Smalley <[email protected]>

>
> diff --git a/security/selinux/ss/mls.c b/security/selinux/ss/mls.c
> index 40de8d3..9ef8e51 100644
> --- a/security/selinux/ss/mls.c
> +++ b/security/selinux/ss/mls.c
> @@ -500,6 +500,8 @@ int mls_convert_context(struct policydb *oldp,
> rc = ebitmap_set_bit(&bitmap, catdatum->value - 1, 1);
> if (rc)
> return rc;
> +
> + cond_resched();
> }
> ebitmap_destroy(&c->range.level[l].cat);
> c->range.level[l].cat = bitmap;
>
>

2013-09-30 17:25:03

by Dave Jones

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On Mon, Sep 16, 2013 at 02:40:30PM -0400, Dave Jones wrote:
> On a slow machine (with debugging enabled), upgrading selinux policy may take
> a considerable amount of time. Long enough that the softlockup detector
> gets triggered.
>
> The backtrace looks like this..
>
> > BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045]
> > Call Trace:
> > [<ffffffff81221ddf>] symcmp+0xf/0x20
> > [<ffffffff81221c27>] hashtab_search+0x47/0x80
> > [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0
> > [<ffffffff812294e8>] convert_context+0x378/0x460
> > [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240
> > [<ffffffff812221b5>] sidtab_map+0x45/0x80
> > [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580

With that patch applied, the problem seems to have moved elsewhere..

BUG: soft lockup - CPU#3 stuck for 22s! [load_policy:8119]
irq event stamp: 1590886
hardirqs last enabled at (1590885): [<ffffffff8152c3dd>] __slab_alloc.constprop.78+0x4c0/0x4d7
hardirqs last disabled at (1590886): [<ffffffff8153e06a>] apic_timer_interrupt+0x6a/0x80
softirqs last enabled at (1590336): [<ffffffff810480d9>] __do_softirq+0x169/0x200
softirqs last disabled at (1590331): [<ffffffff8104839d>] irq_exit+0x11d/0x140
RIP: 0010:[<ffffffff81223182>] [<ffffffff81223182>] hashtab_insert+0x62/0x110

Call Trace:
[<ffffffff812283b5>] policydb_read+0xc25/0x1200
[<ffffffff810b1639>] ? is_module_text_address+0x19/0x40
[<ffffffff8122cefe>] security_load_policy+0x10e/0x580
[<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff810786ad>] ? sched_clock_local+0x1d/0x80
[<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff810786ad>] ? sched_clock_local+0x1d/0x80
[<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff8103098a>] ? __change_page_attr_set_clr+0x82a/0xa50
[<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff8153669c>] ? retint_restore_args+0xe/0xe
[<ffffffff810a196d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[<ffffffff8127b07e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810d2958>] ? rcu_irq_exit+0x68/0xb0
[<ffffffff8153669c>] ? retint_restore_args+0xe/0xe
[<ffffffff8121ff97>] sel_write_load+0xa7/0x770
[<ffffffff811394c3>] ? vfs_write+0x1c3/0x200
[<ffffffff812124de>] ? security_file_permission+0x1e/0xa0
[<ffffffff811393bb>] vfs_write+0xbb/0x200
[<ffffffff81158337>] ? fget_light+0x397/0x4b0
[<ffffffff81139ab7>] SyS_write+0x47/0xa0
[<ffffffff8153d634>] tracesys+0xdd/0xe2

We're holding a bunch of locks here, so we can't just cond_resched(). Thoughts?

Dave

2013-09-30 17:38:21

by Stephen Smalley

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On 09/30/2013 01:24 PM, Dave Jones wrote:
> On Mon, Sep 16, 2013 at 02:40:30PM -0400, Dave Jones wrote:
> > On a slow machine (with debugging enabled), upgrading selinux policy may take
> > a considerable amount of time. Long enough that the softlockup detector
> > gets triggered.
> >
> > The backtrace looks like this..
> >
> > > BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045]
> > > Call Trace:
> > > [<ffffffff81221ddf>] symcmp+0xf/0x20
> > > [<ffffffff81221c27>] hashtab_search+0x47/0x80
> > > [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0
> > > [<ffffffff812294e8>] convert_context+0x378/0x460
> > > [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240
> > > [<ffffffff812221b5>] sidtab_map+0x45/0x80
> > > [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580
>
> With that patch applied, the problem seems to have moved elsewhere..
>
> BUG: soft lockup - CPU#3 stuck for 22s! [load_policy:8119]
> irq event stamp: 1590886
> hardirqs last enabled at (1590885): [<ffffffff8152c3dd>] __slab_alloc.constprop.78+0x4c0/0x4d7
> hardirqs last disabled at (1590886): [<ffffffff8153e06a>] apic_timer_interrupt+0x6a/0x80
> softirqs last enabled at (1590336): [<ffffffff810480d9>] __do_softirq+0x169/0x200
> softirqs last disabled at (1590331): [<ffffffff8104839d>] irq_exit+0x11d/0x140
> RIP: 0010:[<ffffffff81223182>] [<ffffffff81223182>] hashtab_insert+0x62/0x110
>
> Call Trace:
> [<ffffffff812283b5>] policydb_read+0xc25/0x1200
> [<ffffffff810b1639>] ? is_module_text_address+0x19/0x40
> [<ffffffff8122cefe>] security_load_policy+0x10e/0x580
> [<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff810786ad>] ? sched_clock_local+0x1d/0x80
> [<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff810786ad>] ? sched_clock_local+0x1d/0x80
> [<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff8103098a>] ? __change_page_attr_set_clr+0x82a/0xa50
> [<ffffffff81078878>] ? sched_clock_cpu+0xa8/0x100
> [<ffffffff8153669c>] ? retint_restore_args+0xe/0xe
> [<ffffffff810a196d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [<ffffffff8127b07e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff810d2958>] ? rcu_irq_exit+0x68/0xb0
> [<ffffffff8153669c>] ? retint_restore_args+0xe/0xe
> [<ffffffff8121ff97>] sel_write_load+0xa7/0x770
> [<ffffffff811394c3>] ? vfs_write+0x1c3/0x200
> [<ffffffff812124de>] ? security_file_permission+0x1e/0xa0
> [<ffffffff811393bb>] vfs_write+0xbb/0x200
> [<ffffffff81158337>] ? fget_light+0x397/0x4b0
> [<ffffffff81139ab7>] SyS_write+0x47/0xa0
> [<ffffffff8153d634>] tracesys+0xdd/0xe2
>
> We're holding a bunch of locks here, so we can't just cond_resched(). Thoughts?

Sorry, what locks are we holding there? You ought to be able to do a
cond_resched() anywhere during policydb_read() AFAIK; it is loading the
policy into a new structure that isn't being accessed by anything else
yet and the policy_rwlock is only held by security_load_policy after
calling policydb_read and only to switch it into place as the active
policydb.

2013-09-30 21:13:59

by Dave Jones

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On Mon, Sep 30, 2013 at 01:37:53PM -0400, Stephen Smalley wrote:

> > With that patch applied, the problem seems to have moved elsewhere..
> >
> > BUG: soft lockup - CPU#3 stuck for 22s! [load_policy:8119]
> > irq event stamp: 1590886
> > hardirqs last enabled at (1590885): [<ffffffff8152c3dd>] __slab_alloc.constprop.78+0x4c0/0x4d7
> > hardirqs last disabled at (1590886): [<ffffffff8153e06a>] apic_timer_interrupt+0x6a/0x80
> > softirqs last enabled at (1590336): [<ffffffff810480d9>] __do_softirq+0x169/0x200
> > softirqs last disabled at (1590331): [<ffffffff8104839d>] irq_exit+0x11d/0x140
> > RIP: 0010:[<ffffffff81223182>] [<ffffffff81223182>] hashtab_insert+0x62/0x110
> >
> > Call Trace:
> > [<ffffffff812283b5>] policydb_read+0xc25/0x1200
> ...
> > We're holding a bunch of locks here, so we can't just cond_resched. Thoughts ?
>
> Sorry, what locks are we holding there? You ought to be able to do a
> cond_resched() anywhere during policydb_read() AFAIK; it is loading the
> policy into a new structure that isn't being accessed by anything else
> yet and the policy_rwlock is only held by security_load_policy after
> calling policydb_read and only to switch it into place as the active
> policydb.

Hmm, I thought I had tried this already, and got a lot of spew, but it turns out
for some reason I had previously patched hashtab_search instead.

I'll try running with this for a while..

Dave


diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
index 933e735..2cc4961 100644
--- a/security/selinux/ss/hashtab.c
+++ b/security/selinux/ss/hashtab.c
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/errno.h>
+#include <linux/sched.h>
#include "hashtab.h"

struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *key),
@@ -40,6 +41,8 @@ int hashtab_insert(struct hashtab *h, void *key, void *datum)
u32 hvalue;
struct hashtab_node *prev, *cur, *newnode;

+ cond_resched();
+
if (!h || h->nel == HASHTAB_MAX_NODES)
return -EINVAL;

2013-10-03 20:36:17

by Paul Moore

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On Monday, September 30, 2013 05:13:42 PM Dave Jones wrote:
> On Mon, Sep 30, 2013 at 01:37:53PM -0400, Stephen Smalley wrote:
> > > With that patch applied, the problem seems to have moved elsewhere..
> > >
> > > BUG: soft lockup - CPU#3 stuck for 22s! [load_policy:8119]
> > >
> > > irq event stamp: 1590886
> > > hardirqs last enabled at (1590885): [<ffffffff8152c3dd>] __slab_alloc.constprop.78+0x4c0/0x4d7
> > > hardirqs last disabled at (1590886): [<ffffffff8153e06a>] apic_timer_interrupt+0x6a/0x80
> > > softirqs last enabled at (1590336): [<ffffffff810480d9>] __do_softirq+0x169/0x200
> > > softirqs last disabled at (1590331): [<ffffffff8104839d>] irq_exit+0x11d/0x140
> > > RIP: 0010:[<ffffffff81223182>] [<ffffffff81223182>] hashtab_insert+0x62/0x110
> >
> > > Call Trace:
> > > [<ffffffff812283b5>] policydb_read+0xc25/0x1200
> >
> > ...
> >
> > > We're holding a bunch of locks here, so we can't just cond_resched.
> > > Thoughts ?
> >
> > Sorry, what locks are we holding there? You ought to be able to do a
> > cond_resched() anywhere during policydb_read() AFAIK; it is loading the
> > policy into a new structure that isn't being accessed by anything else
> > yet and the policy_rwlock is only held by security_load_policy after
> > calling policydb_read and only to switch it into place as the active
> > policydb.
>
> Hmm, I thought I had tried this already, and got a lot of spew, but it turns
> out for some reason I had previously patched hashtab_search instead.
>
> I'll try running with this for a while..

Hi Dave,

Just checking to see if this patch solved your problem ... ?

> diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> index 933e735..2cc4961 100644
> --- a/security/selinux/ss/hashtab.c
> +++ b/security/selinux/ss/hashtab.c
> @@ -6,6 +6,7 @@
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/errno.h>
> +#include <linux/sched.h>
> #include "hashtab.h"
>
> struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *key),
> @@ -40,6 +41,8 @@ int hashtab_insert(struct hashtab *h, void *key, void *datum)
> u32 hvalue;
> struct hashtab_node *prev, *cur, *newnode;
>
> + cond_resched();
> +
> if (!h || h->nel == HASHTAB_MAX_NODES)
> return -EINVAL;

--
paul moore
http://www.paul-moore.com

2013-10-03 20:40:46

by Dave Jones

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On Thu, Oct 03, 2013 at 04:36:10PM -0400, Paul Moore wrote:
> On Monday, September 30, 2013 05:13:42 PM Dave Jones wrote:
> > On Mon, Sep 30, 2013 at 01:37:53PM -0400, Stephen Smalley wrote:
> > > > With that patch applied, the problem seems to have moved elsewhere..
> > >
> > > Sorry, what locks are we holding there? You ought to be able to do a
> > > cond_resched() anywhere during policydb_read() AFAIK; it is loading the
> > > policy into a new structure that isn't being accessed by anything else
> > > yet and the policy_rwlock is only held by security_load_policy after
> > > calling policydb_read and only to switch it into place as the active
> > > policydb.
> >
> > Hmm, I thought I had tried this already, and got a lot of spew, but it turns
> > out for some reason I had previously patched hashtab_search instead.
> >
> > I'll try running with this for a while..
>
> Hi Dave,
>
> Just checking to see if this patch solved your problem ... ?
>
> > diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> > index 933e735..2cc4961 100644
> > --- a/security/selinux/ss/hashtab.c
> > +++ b/security/selinux/ss/hashtab.c
> > @@ -6,6 +6,7 @@
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > #include <linux/errno.h>
> > +#include <linux/sched.h>
> > #include "hashtab.h"
> >
> > struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *key),
> > @@ -40,6 +41,8 @@ int hashtab_insert(struct hashtab *h, void *key, void *datum)
> > u32 hvalue;
> > struct hashtab_node *prev, *cur, *newnode;
> >
> > + cond_resched();
> > +
> > if (!h || h->nel == HASHTAB_MAX_NODES)
> > return -EINVAL;

I couldn't get a backtrace when I downgraded and re-upgraded my policy,
but that said, I didn't get one when I tested that way with my first patch either.
It wasn't until I got a newer policy that I saw the second spew.
I was planning on sitting on this until the next policy update just
to confirm.

Dave

2013-10-03 20:49:33

by Paul Moore

Subject: Re: [PATCH] conditionally reschedule while loading selinux policy.

On Thursday, October 03, 2013 04:40:31 PM Dave Jones wrote:
> I couldn't get a backtrace when I downgraded, and reupgraded my policy,
> but that said, I didn't when I tested that way on my first patch either.
> It wasn't until I got a newer policy that I saw the 2nd spew.
> I was planning on sitting on this until the next policy update just
> to confirm.

Okay, no problem. Let me know how it goes.

Thanks,
-Paul

--
paul moore
http://www.paul-moore.com