2021-02-23 21:46:06

by Tyler Hicks

[permalink] [raw]
Subject: [BUG] Race between policy reload sidtab conversion and live conversion

I'm seeing a race during policy load while the "regular" sidtab
conversion is happening and a live conversion starts to take place in
sidtab_context_to_sid().

We have an initial policy that's loaded by systemd ~0.6s into boot and
then another policy gets loaded ~2-3s into boot. That second policy load
is what hits the race condition situation because the sidtab is only
partially populated and there's a decent amount of filesystem operations
happening, at the same time, which are triggering live conversions.

[ 3.091910] Unable to handle kernel paging request at virtual address 001303e1aa140408
[ 3.100083] Mem abort info:
[ 3.102963] ESR = 0x96000004
[ 3.102965] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.102967] SET = 0, FnV = 0
[ 3.102968] EA = 0, S1PTW = 0
[ 3.102969] Data abort info:
[ 3.102970] ISV = 0, ISS = 0x00000004
[ 3.102971] CM = 0, WnR = 0
[ 3.102973] [001303e1aa140408] address between user and kernel address ranges
[ 3.102977] Internal error: Oops: 96000004 [#1] SMP
[ 3.102981] Modules linked in:
[ 3.111250] bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
[ 3.111259] CPU: 0 PID: 529 Comm: (tservice) Tainted: G O 5.10.17.1 #1
[ 3.119881] Hardware name: redacted (DT)
[ 3.119884] pstate: 20400085 (nzCv daIf +PAN -UAO -TCO BTYPE=--)
[ 3.119898] pc : sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202)
[ 3.119902] lr : sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312)
[ 3.126105] sp : ffff800011ceb810
[ 3.126106] x29: ffff800011ceb810 x28: 0000000000000000
[ 3.126108] x27: 0000000000000005 x26: ffffda109f3f2000
[ 3.126110] x25: 00000000ffffffff x24: 0000000000000000
[ 3.126113] x23: 0000000000000001
[ 3.133124] x22: 0000000000000005
[ 3.133125] x21: aa1303e1aa140408 x20: 0000000000000001
[ 3.133127] x19: 00000000000000cc x18: 0000000000000003
[ 3.133128] x17: 000000000000003e x16: 000000000000003f
[ 3.145519] x15: 0000000000000039 x14: 000000000000002e
[ 3.145521] x13: 0000000058294db1 x12: 00000000158294db
[ 3.145523] x11: 000000007f0b3af2 x10: 0000000000004e00
[ 3.145525] x9 : 00000000000000cd x8 : 0000000000000005
[ 3.281289] x7 : feff735e62647764 x6 : 00000000000080cc
[ 3.286769] x5 : 0000000000000005 x4 : ffff3f47c5b20000
[ 3.292249] x3 : ffff800011ceb900 x2 : 0000000000000001
[ 3.297729] x1 : 00000000000000cc x0 : aa1303e1aa1403e0
[ 3.303210] Call trace:
[ 3.305733] sidtab_do_lookup (/usr/src/kernel/security/selinux/ss/sidtab.c:202)
[ 3.309867] sidtab_context_to_sid (/usr/src/kernel/security/selinux/ss/sidtab.c:312)
[ 3.314451] security_context_to_sid_core (/usr/src/kernel/security/selinux/ss/services.c:1557)
[ 3.319661] security_context_to_sid_default (/usr/src/kernel/security/selinux/ss/services.c:1616)
[ 3.324961] inode_doinit_use_xattr (/usr/src/kernel/security/selinux/hooks.c:1366)
[ 3.329634] inode_doinit_with_dentry (/usr/src/kernel/security/selinux/hooks.c:1457)
[ 3.334486] selinux_d_instantiate (/usr/src/kernel/security/selinux/hooks.c:6278)
[ 3.338889] security_d_instantiate (/usr/src/kernel/security/security.c:2004)
[ 3.343385] d_splice_alias (/usr/src/kernel/fs/dcache.c:3030)
[ 3.347251] squashfs_lookup (/usr/src/kernel/fs/squashfs/namei.c:220)
[ 3.385561] el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
[ 3.389517] el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
[ 3.392939] Code: 51002718 340001d7 1ad82768 8b284c15 (f94002a0)
All code
========
0: 18 27 sbb %ah,(%rdi)
2: 00 51 d7 add %dl,-0x29(%rcx)
5: 01 00 add %eax,(%rax)
7: 34 68 xor $0x68,%al
9: 27 (bad)
a: d8 1a fcomps (%rdx)
c:* 15 4c 28 8b a0 adc $0xa08b284c,%eax <-- trapping instruction
11: 02 40 f9 add -0x7(%rax),%al

Code starting with the faulting instruction
===========================================
0: a0 .byte 0xa0
1: 02 40 f9 add -0x7(%rax),%al
[ 3.399230] ---[ end trace cc1840b3ff2c7506 ]---

The corresponding source from sidtab.c:

179 static struct sidtab_entry *sidtab_do_lookup(struct sidtab *s, u32 index,
180 int alloc)
181 {
...
193 /* lookup inside the subtree */
194 entry = &s->roots[level];
195 while (level != 0) {
196 capacity_shift -= SIDTAB_INNER_SHIFT;
197 --level;
198
199 entry = &entry->ptr_inner->entries[leaf_index >> capacity_shift];
200 leaf_index &= ((u32)1 << capacity_shift) - 1;
201
202 if (!entry->ptr_inner) {
203 if (alloc)
204 entry->ptr_inner = kzalloc(SIDTAB_NODE_ALLOC_SIZE,
205 GFP_ATOMIC);
206 if (!entry->ptr_inner)
207 return NULL;
208 }
209 }
210 if (!entry->ptr_leaf) {
211 if (alloc)
212 entry->ptr_leaf = kzalloc(SIDTAB_NODE_ALLOC_SIZE,
213 GFP_ATOMIC);
214 if (!entry->ptr_leaf)
215 return NULL;
216 }
217 return &entry->ptr_leaf->entries[index % SIDTAB_LEAF_ENTRIES];
218 }

...

263 int sidtab_context_to_sid(struct sidtab *s, struct context *context,
264 u32 *sid)
265 {
...
305 /*
306 * if we are building a new sidtab, we need to convert the context
307 * and insert it there as well
308 */
309 if (convert) {
310 rc = -ENOMEM;
311 dst_convert = sidtab_do_lookup(convert->target, count, 1);
312 if (!dst_convert) {
313 context_destroy(&dst->context);
314 goto out_unlock;
315 }
...

What I'm having trouble understanding is how the above call to
sidtab_do_lookup(), on the target sidtab that's undergoing a conversion
in sidtab_convert(), can be expected to work. sidtab_convert_tree() is
allocating and initializing ptr_inner sidtab nodes at the same time
sidtab_do_lookup() is trying to use them with no locking being performed
on the target sidtab.

Ondrej specifically mentions, in commit ee1a84fdfeed ("selinux: overhaul
sidtab to fix bug and improve performance"), that there's no need to
freeze the sidtab during policy reloads so I know that there's thought
given to these code paths running in parallel.

Can someone more knowledgeable on how the sidtab locking is expected to
work suggest a fix for this crash?

Tyler


2021-02-23 21:54:51

by Tyler Hicks

[permalink] [raw]
Subject: Re: [BUG] Race between policy reload sidtab conversion and live conversion

On 2021-02-23 15:43:48, Tyler Hicks wrote:
> I'm seeing a race during policy load while the "regular" sidtab
> conversion is happening and a live conversion starts to take place in
> sidtab_context_to_sid().
>
> We have an initial policy that's loaded by systemd ~0.6s into boot and
> then another policy gets loaded ~2-3s into boot. That second policy load
> is what hits the race condition situation because the sidtab is only
> partially populated and there's a decent amount of filesystem operations
> happening, at the same time, which are triggering live conversions.

Here are the relevant messages written to the ring buffer showing the initial
policy load, the second policy load, the sidtab conversion starting, and then
the stack trace:

[ 0.112159] SELinux: Initializing.
...
[ 0.624107] audit: type=1404 audit(0.492:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=selinux res=1
[ 0.664063] SELinux: Permission perfmon in class capability2 not defined in policy.
[ 0.664068] SELinux: Permission bpf in class capability2 not defined in policy.
[ 0.664070] SELinux: Permission checkpoint_restore in class capability2 not defined in policy.
[ 0.664077] SELinux: Permission perfmon in class cap2_userns not defined in policy.
[ 0.664079] SELinux: Permission bpf in class cap2_userns not defined in policy.
[ 0.664080] SELinux: Permission checkpoint_restore in class cap2_userns not defined in policy.
[ 0.664114] SELinux: Class perf_event not defined in policy.
[ 0.664115] SELinux: Class lockdown not defined in policy.
[ 0.664117] SELinux: the above unknown classes and permissions will be allowed
[ 0.667863] SELinux: policy capability network_peer_controls=1
[ 0.667866] SELinux: policy capability open_perms=1
[ 0.667867] SELinux: policy capability extended_socket_class=1
[ 0.667868] SELinux: policy capability always_check_network=0
[ 0.667870] SELinux: policy capability cgroup_seclabel=1
[ 0.667871] SELinux: policy capability nnp_nosuid_transition=1
[ 0.667873] SELinux: policy capability genfs_seclabel_symlinks=0
[ 0.712152] audit: type=1403 audit(0.584:3): auid=4294967295 ses=4294967295 lsm=selinux res=1
[ 0.714430] systemd[1]: Successfully loaded SELinux policy in 91.004ms.
...
[ 3.042735] SELinux: Permission perfmon in class capability2 not defined in policy.
[ 3.042741] SELinux: Permission bpf in class capability2 not defined in policy.
[ 3.042743] SELinux: Permission checkpoint_restore in class capability2 not defined in policy.
[ 3.042750] SELinux: Permission perfmon in class cap2_userns not defined in policy.
[ 3.042751] SELinux: Permission bpf in class cap2_userns not defined in policy.
[ 3.042753] SELinux: Permission checkpoint_restore in class cap2_userns not defined in policy.
[ 3.042787] SELinux: Class perf_event not defined in policy.
[ 3.042789] SELinux: Class lockdown not defined in policy.
[ 3.042791] SELinux: the above unknown classes and permissions will be allowed
[ 3.042801] SELinux: Converting 208 SID table entries...
[ 3.044951] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000040000028
[ 3.050919] SELinux: policy capability network_peer_controls=1
[ 3.055990] Mem abort info:
[ 3.055992] ESR = 0x96000004
[ 3.055994] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.055995] SET = 0, FnV = 0
[ 3.055996] EA = 0, S1PTW = 0
[ 3.055997] Data abort info:
[ 3.055998] ISV = 0, ISS = 0x00000004
[ 3.055999] CM = 0, WnR = 0
[ 3.056002] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000940ea000
[ 3.056003] [0000000040000028] pgd=0000000000000000, p4d=0000000000000000
[ 3.058894] SELinux: policy capability open_perms=1
[ 3.062037]
[ 3.062041] Internal error: Oops: 96000004 [#1] SMP
[ 3.062044] Modules linked in: bnxt_en pcie_iproc_platform
[ 3.067530] SELinux: policy capability extended_socket_class=1
[ 3.070674] pcie_iproc diagbe(O)
[ 3.070680] CPU: 1 PID: 512 Comm: (tservice) Tainted: G O 5.10.17.1 #1
[ 3.070683] Hardware name: redacted (DT)
[ 3.073924] SELinux: policy capability always_check_network=0
[ 3.076890] pstate: 20400085 (nzCv daIf +PAN -UAO -TCO BTYPE=--)
[ 3.076904] pc : sidtab_do_lookup+0x128/0x1b4
[ 3.076908] lr : sidtab_context_to_sid+0x210/0x378
[ 3.080864] SELinux: policy capability cgroup_seclabel=1
[ 3.083920] sp : ffff800011bfbaf0
[ 3.083922] x29: ffff800011bfbaf0
[ 3.090570] SELinux: policy capability nnp_nosuid_transition=1
[ 3.097564] x28: 0000000000000000
[ 3.097566] x27: 0000000000000005 x26: ffffd1b6451f2000
[ 3.097568] x25: 00000000ffffffff x24: 0000000000000000
[ 3.102693] SELinux: policy capability genfs_seclabel_symlinks=0
[ 3.104225] x23: 0000000000000001 x22: 0000000000000005
[ 3.104227] x21: 0000000040000028 x20: 0000000000000001
[ 3.104230] x19: 00000000000000d0 x18: 5b00000000000000
[ 3.217640] x17: 0000000077493ecb x16: 00000000c2b2ae35
[ 3.223120] x15: 00000000639285d3 x14: 0000000000000010
[ 3.228600] x13: 00000000128b8525 x12: 000000005128b852
[ 3.234079] x11: 0000000049dd48c1 x10: 0000000000004e00
[ 3.239559] x9 : 00000000000000d1 x8 : 0000000000000005
[ 3.245039] x7 : 0000000000000001 x6 : 0000000000000000
[ 3.250518] x5 : ffff800011bfbc48 x4 : 0000000000000010
[ 3.255998] x3 : 0000000000000002 x2 : 0000000000000001
[ 3.261477] x1 : 00000000000000d0 x0 : 0000000040000000
[ 3.266957] Call trace:
[ 3.269480] sidtab_do_lookup+0x128/0x1b4
[ 3.273615] sidtab_context_to_sid+0x210/0x378
[ 3.278198] security_compute_sid+0x3f4/0x60c
[ 3.282692] security_transition_sid+0x30/0x38
[ 3.287277] selinux_bprm_creds_for_exec+0x11c/0x2e8
[ 3.292398] security_bprm_creds_for_exec+0x30/0x4c
[ 3.297432] bprm_execve+0x100/0x1d4
[ 3.301118] do_execveat_common+0x1e8/0x228
[ 3.305432] __arm64_sys_execve+0x3c/0x4c
[ 3.309569] do_el0_svc+0xec/0x154
[ 3.313079] el0_svc+0xc/0x14
[ 3.316139] el0_sync_handler+0x7c/0xd8
[ 3.320095] el0_sync+0x144/0x180
[ 3.323516] Code: 51002718 340001d7 1ad82768 8b284c15 (f94002a0)
[ 3.329808] ---[ end trace 69ebb6381d57e49b ]---

Tyler

2021-02-24 00:47:14

by Tyler Hicks

[permalink] [raw]
Subject: Re: [BUG] Race between policy reload sidtab conversion and live conversion

On 2021-02-23 15:50:56, Tyler Hicks wrote:
> On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > I'm seeing a race during policy load while the "regular" sidtab
> > conversion is happening and a live conversion starts to take place in
> > sidtab_context_to_sid().
> >
> > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > then another policy gets loaded ~2-3s into boot. That second policy load
> > is what hits the race condition situation because the sidtab is only
> > partially populated and there's a decent amount of filesystem operations
> > happening, at the same time, which are triggering live conversions.

Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
change here:

https://lore.kernel.org/selinux/[email protected]/

I'll put these changes through a validation run (the only place that I
can seem to reproduce this crash) and see how it looks.

Tyler

2021-02-24 01:10:19

by Paul Moore

[permalink] [raw]
Subject: Re: [BUG] Race between policy reload sidtab conversion and live conversion

On Tue, Feb 23, 2021 at 5:36 PM Tyler Hicks <[email protected]> wrote:
> On 2021-02-23 15:50:56, Tyler Hicks wrote:
> > On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > > I'm seeing a race during policy load while the "regular" sidtab
> > > conversion is happening and a live conversion starts to take place in
> > > sidtab_context_to_sid().
> > >
> > > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > > then another policy gets loaded ~2-3s into boot. That second policy load
> > > is what hits the race condition situation because the sidtab is only
> > > partially populated and there's a decent amount of filesystem operations
> > > happening, at the same time, which are triggering live conversions.
>
> Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
> change here:
>
> https://lore.kernel.org/selinux/[email protected]/
>
> I'll put these changes through a validation run (the only place that I
> can seem to reproduce this crash) and see how it looks.

Thanks, please let us know what you find out.

--
paul moore
http://www.paul-moore.com

2021-02-24 13:07:42

by Ondrej Mosnacek

[permalink] [raw]
Subject: Re: [BUG] Race between policy reload sidtab conversion and live conversion

On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks
<[email protected]> wrote:
> On 2021-02-23 15:50:56, Tyler Hicks wrote:
> > On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > > I'm seeing a race during policy load while the "regular" sidtab
> > > conversion is happening and a live conversion starts to take place in
> > > sidtab_context_to_sid().
> > >
> > > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > > then another policy gets loaded ~2-3s into boot. That second policy load
> > > is what hits the race condition situation because the sidtab is only
> > > partially populated and there's a decent amount of filesystem operations
> > > happening, at the same time, which are triggering live conversions.
>
> Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
> change here:
>
> https://lore.kernel.org/selinux/[email protected]/
>
> I'll put these changes through a validation run (the only place that I
> can seem to reproduce this crash) and see how it looks.

Hm... I think there is actually another race condition introduced by
the switch from rwlock to RCU [1]... Judging from the call trace you
may be hitting that.

Basically, before the switch the sidtab swapover worked like this:
1. Start live conversion of new entries.
2. Convert existing entries.
[Still only the old sidtab is visible to readers here.]
3. Swap sidtab under write lock.
4. Now only the new sidtab is visible to readers, so the old one can
be destroyed.

After the switch to RCU, we now have:
1. Start live conversion of new entries.
2. Convert existing entries.
3. RCU-assign the new policy pointer to selinux_state.
[!!! Now actually both old and new sidtab may be referenced by
readers, since there is no synchronization barrier previously provided
by the write lock.]
4. Wait for synchronize_rcu() to return.
5. Now only the new sidtab is visible to readers, so the old one can
be destroyed.

So the race can happen between 3. and 5., if one thread already sees
the new sidtab and adds a new entry there, and a second thread still
has the reference to the old sidtab and also tires to add a new entry;
live-converting to the new sidtab, which it doesn't expect to change
by itself. Unfortunately I failed to realize this when reviewing the
patch :/

I think the only two options to fix it are A) switching back to
read-write lock (the easy and safe way; undoing the performance
benefits of [1]), or B) implementing a safe two-way live conversion of
new sidtab entries, so that both tables are kept in sync while they
are both available (more complicated and with possible tricky
implications of different interpretations of contexts by the two
policies).

[1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU")

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.

2021-02-24 15:18:10

by Tyler Hicks

[permalink] [raw]
Subject: Re: [BUG] Race between policy reload sidtab conversion and live conversion

On 2021-02-24 10:33:46, Ondrej Mosnacek wrote:
> On Tue, Feb 23, 2021 at 11:37 PM Tyler Hicks
> <[email protected]> wrote:
> > On 2021-02-23 15:50:56, Tyler Hicks wrote:
> > > On 2021-02-23 15:43:48, Tyler Hicks wrote:
> > > > I'm seeing a race during policy load while the "regular" sidtab
> > > > conversion is happening and a live conversion starts to take place in
> > > > sidtab_context_to_sid().
> > > >
> > > > We have an initial policy that's loaded by systemd ~0.6s into boot and
> > > > then another policy gets loaded ~2-3s into boot. That second policy load
> > > > is what hits the race condition situation because the sidtab is only
> > > > partially populated and there's a decent amount of filesystem operations
> > > > happening, at the same time, which are triggering live conversions.
> >
> > Hmm, perhaps this is the same problem that's fixed by Ondrej's proposed
> > change here:
> >
> > https://lore.kernel.org/selinux/[email protected]/
> >
> > I'll put these changes through a validation run (the only place that I
> > can seem to reproduce this crash) and see how it looks.
>
> Hm... I think there is actually another race condition introduced by
> the switch from rwlock to RCU [1]... Judging from the call trace you
> may be hitting that.

I believe your patches above fixed the race I was seeing. I was able to
make it through a full validation run without any crashes. Without those
patches applied, I would see several crashes resulting from this race
over the course of a validation run.

I'll continue to test with your changes and let you know if I end up
running into the other race you spotted.

Tyler

>
> Basically, before the switch the sidtab swapover worked like this:
> 1. Start live conversion of new entries.
> 2. Convert existing entries.
> [Still only the old sidtab is visible to readers here.]
> 3. Swap sidtab under write lock.
> 4. Now only the new sidtab is visible to readers, so the old one can
> be destroyed.
>
> After the switch to RCU, we now have:
> 1. Start live conversion of new entries.
> 2. Convert existing entries.
> 3. RCU-assign the new policy pointer to selinux_state.
> [!!! Now actually both old and new sidtab may be referenced by
> readers, since there is no synchronization barrier previously provided
> by the write lock.]
> 4. Wait for synchronize_rcu() to return.
> 5. Now only the new sidtab is visible to readers, so the old one can
> be destroyed.
>
> So the race can happen between 3. and 5., if one thread already sees
> the new sidtab and adds a new entry there, and a second thread still
> has the reference to the old sidtab and also tires to add a new entry;
> live-converting to the new sidtab, which it doesn't expect to change
> by itself. Unfortunately I failed to realize this when reviewing the
> patch :/
>
> I think the only two options to fix it are A) switching back to
> read-write lock (the easy and safe way; undoing the performance
> benefits of [1]), or B) implementing a safe two-way live conversion of
> new sidtab entries, so that both tables are kept in sync while they
> are both available (more complicated and with possible tricky
> implications of different interpretations of contexts by the two
> policies).
>
> [1] 1b8b31a2e612 ("selinux: convert policy read-write lock to RCU")
>
> --
> Ondrej Mosnacek
> Software Engineer, Linux Security - SELinux kernel
> Red Hat, Inc.
>