2023-04-06 17:39:41

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Semantics of blktrace with lockdown (integrity) enabled kernel.

Hey Jens, Paul, James, Nathan,

We are trying to use blktrace with a kernel that has lockdown enabled and find that it cannot run.

Specifically the issue is that we are trying to do is pretty simple:

strace -f blktrace -d /dev/sda -w 60
 
[pid 148882] <... mprotect resumed>)    = 0
[pid 148881] openat(AT_FDCWD, "/sys/kernel/debug/block/sda/trace0", O_RDONLY|O_NONBLOCK <unfinished ...>
[pid 148882] sched_setaffinity(0, 8, [1]) = 0
[pid 148881] <... openat resumed>)      = -1 EPERM (Operation not permitted)

which fails. The analysis from Eric (CCed) is that

All debugfs entries do not exist until blktrace is run.  It is opening
/sys/kernel/debug/block/sda/trace0 which isn’t there normally. While running the utility,
to place something in it, it must have the write permission set.  When exiting out of
blktrace, the entry is gone, both on a machine running with secure boot enabled
and one with it disabled.  Which also indicates the write permission was set,
otherwise the entry would still be there.

The fix is simple enough (see attachment) but we are not sure about the semantics of what
lockdown has in mind.

Looking at the include/linux/security.h the LOCKDOWN_TRACEFS exists which would
imply that it is expected that operations with tracefs *should* work with lockdown (integrity mode).

But at the same point, debugfs writable attributes are a nono with lockdown.

So what is the right way forward?

Thank you.


Attachments:
(No filename) (1.45 kB)
0001-debugfs-whitelisted-relay-file-for-lockdown.patch (2.24 kB)
Download all attachments

2023-04-06 18:44:13

by Paul Moore

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Thu, Apr 6, 2023 at 1:38 PM Konrad Rzeszutek Wilk
<[email protected]> wrote:
>
> Hey Jens, Paul, James, Nathan,
>
> We are trying to use blktrace with a kernel that has lockdown enabled and find that it cannot run.
>
> Specifically the issue is that we are trying to do is pretty simple:
>
> strace -f blktrace -d /dev/sda -w 60
>
> [pid 148882] <... mprotect resumed>) = 0
> [pid 148881] openat(AT_FDCWD, "/sys/kernel/debug/block/sda/trace0", O_RDONLY|O_NONBLOCK <unfinished ...>
> [pid 148882] sched_setaffinity(0, 8, [1]) = 0
> [pid 148881] <... openat resumed>) = -1 EPERM (Operation not permitted)
>
> which fails. The analysis from Eric (CCed) is that
>
> All debugfs entries do not exist until blktrace is run. It is opening
> /sys/kernel/debug/block/sda/trace0 which isn’t there normally. While running the utility,
> to place something in it, it must have the write permission set. When exiting out of
> blktrace, the entry is gone, both on a machine running with secure boot enabled
> and one with it disabled. Which also indicates the write permission was set,
> otherwise the entry would still be there.
>
> The fix is simple enough (see attachment) but we are not sure about the semantics of what
> lockdown has in mind.
>
> Looking at the include/linux/security.h the LOCKDOWN_TRACEFS exists which would
> imply that it is expected that operations with tracefs *should* work with lockdown (integrity mode).
>
> But at the same point, debugfs writable attributes are a nono with lockdown.
>
> So what is the right way forward?

What did you use as a basis for your changes? I'm looking at the
patch you sent and it appears to be making a change to a
debugfs_lockdown_whitelisted() function defined in
fs/debugfs/internal.h which does not exist in Linus' tree. If I
search through all of the archives on lore.kernel.org the only hit I
get is your email, so it seems doubtful it is in a subsystem tree
which hasn't made its way to Linus yet.

Before we go any further, can you please verify that your issue is
reproducible on a supported, upstream tree (preferably Linus')?

--
paul-moore.com

2023-04-06 19:31:22

by Junxiao Bi

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On 4/6/23 11:39 AM, Paul Moore wrote:

> On Thu, Apr 6, 2023 at 1:38 PM Konrad Rzeszutek Wilk
> <[email protected]> wrote:
>> Hey Jens, Paul, James, Nathan,
>>
>> We are trying to use blktrace with a kernel that has lockdown enabled and find that it cannot run.
>>
>> Specifically the issue is that we are trying to do is pretty simple:
>>
>> strace -f blktrace -d /dev/sda -w 60
>>
>> [pid 148882] <... mprotect resumed>) = 0
>> [pid 148881] openat(AT_FDCWD, "/sys/kernel/debug/block/sda/trace0", O_RDONLY|O_NONBLOCK <unfinished ...>
>> [pid 148882] sched_setaffinity(0, 8, [1]) = 0
>> [pid 148881] <... openat resumed>) = -1 EPERM (Operation not permitted)
>>
>> which fails. The analysis from Eric (CCed) is that
>>
>> All debugfs entries do not exist until blktrace is run. It is opening
>> /sys/kernel/debug/block/sda/trace0 which isn’t there normally. While running the utility,
>> to place something in it, it must have the write permission set. When exiting out of
>> blktrace, the entry is gone, both on a machine running with secure boot enabled
>> and one with it disabled. Which also indicates the write permission was set,
>> otherwise the entry would still be there.
>>
>> The fix is simple enough (see attachment) but we are not sure about the semantics of what
>> lockdown has in mind.
>>
>> Looking at the include/linux/security.h the LOCKDOWN_TRACEFS exists which would
>> imply that it is expected that operations with tracefs *should* work with lockdown (integrity mode).
>>
>> But at the same point, debugfs writable attributes are a nono with lockdown.
>>
>> So what is the right way forward?
> What did you use as a basis for your changes? I'm looking at the
> patch you sent and it appears to be making a change to a
> debugfs_lockdown_whitelisted() function defined in
> fs/debugfs/internal.h which does not exist in Linus' tree. If I
> search through all of the archives on lore.kernel.org the only hit I
> get is your email, so it seems doubtful it is in a subsystem tree
> which hasn't made its way to Linus yet.
>
> Before we go any further, can you please verify that your issue is
> reproducible on a supported, upstream tree (preferably Linus')?

The patch attached is applied to oracle kernel which is just used to
demo the idea of a possible fix.

Upstream will have the same issue because blktrace uses relay files from
debugfs to transfer trace information from kernel to userspace. Those
relay files are having permission 0400 which are good, but they support
mmap (struct file_operations relay_file_operations), which are against
the rule of lock down. Is there any security concern to whitelist these
files in lockdown mode? Any idea how to fix this for upstream?

static int debugfs_locked_down(struct inode *inode,
                   struct file *filp,
                   const struct file_operations *real_fops)
{
    if ((inode->i_mode & 07777 & ~0444) == 0 &&
        !(filp->f_mode & FMODE_WRITE) &&
        !real_fops->unlocked_ioctl &&
        !real_fops->compat_ioctl &&
        !real_fops->mmap)
        return 0;

    if (security_locked_down(LOCKDOWN_DEBUGFS))
        return -EPERM;

    return 0;
}

Thanks,

Junxiao.

>

2023-04-06 19:39:15

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
> On Thu, Apr 6, 2023 at 1:38 PM Konrad Rzeszutek Wilk
> <[email protected]> wrote:
> >
> > Hey Jens, Paul, James, Nathan,
> >
> > We are trying to use blktrace with a kernel that has lockdown enabled and find that it cannot run.
> >
> > Specifically the issue is that we are trying to do is pretty simple:
> >
> > strace -f blktrace -d /dev/sda -w 60
> >
> > [pid 148882] <... mprotect resumed>) = 0
> > [pid 148881] openat(AT_FDCWD, "/sys/kernel/debug/block/sda/trace0", O_RDONLY|O_NONBLOCK <unfinished ...>
> > [pid 148882] sched_setaffinity(0, 8, [1]) = 0
> > [pid 148881] <... openat resumed>) = -1 EPERM (Operation not permitted)
> >
> > which fails. The analysis from Eric (CCed) is that
> >
> > All debugfs entries do not exist until blktrace is run. It is opening
> > /sys/kernel/debug/block/sda/trace0 which isn’t there normally. While running the utility,
> > to place something in it, it must have the write permission set. When exiting out of
> > blktrace, the entry is gone, both on a machine running with secure boot enabled
> > and one with it disabled. Which also indicates the write permission was set,
> > otherwise the entry would still be there.
> >
> > The fix is simple enough (see attachment) but we are not sure about the semantics of what
> > lockdown has in mind.
> >
> > Looking at the include/linux/security.h the LOCKDOWN_TRACEFS exists which would
> > imply that it is expected that operations with tracefs *should* work with lockdown (integrity mode).
> >
> > But at the same point, debugfs writable attributes are a nono with lockdown.
> >
> > So what is the right way forward?
>
> What did you use as a basis for your changes? I'm looking at the
> patch you sent and it appears to be making a change to a
> debugfs_lockdown_whitelisted() function defined in
> fs/debugfs/internal.h which does not exist in Linus' tree. If I
> search through all of the archives on lore.kernel.org the only hit I
> get is your email, so it seems doubtful it is in a subsystem tree
> which hasn't made its way to Linus yet.

My apologies. We had to add some extra code for flipping IBRS on/off at
some point and that is why see this 'whitelisted' one. A more upstream
appropiate patch not be based on this.
>
> Before we go any further, can you please verify that your issue is
> reproducible on a supported, upstream tree (preferably Linus')?

Yes. Very much so.
Thank you.

2023-04-06 21:34:04

by Paul Moore

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Thu, Apr 6, 2023 at 3:30 PM Junxiao Bi <[email protected]> wrote:
> On 4/6/23 11:39 AM, Paul Moore wrote:
> > On Thu, Apr 6, 2023 at 1:38 PM Konrad Rzeszutek Wilk
> > <[email protected]> wrote:
> >> Hey Jens, Paul, James, Nathan,
> >>
> >> We are trying to use blktrace with a kernel that has lockdown enabled and find that it cannot run.
> >>
> >> Specifically the issue is that we are trying to do is pretty simple:
> >>
> >> strace -f blktrace -d /dev/sda -w 60
> >>
> >> [pid 148882] <... mprotect resumed>) = 0
> >> [pid 148881] openat(AT_FDCWD, "/sys/kernel/debug/block/sda/trace0", O_RDONLY|O_NONBLOCK <unfinished ...>
> >> [pid 148882] sched_setaffinity(0, 8, [1]) = 0
> >> [pid 148881] <... openat resumed>) = -1 EPERM (Operation not permitted)
> >>
> >> which fails. The analysis from Eric (CCed) is that
> >>
> >> All debugfs entries do not exist until blktrace is run. It is opening
> >> /sys/kernel/debug/block/sda/trace0 which isn’t there normally. While running the utility,
> >> to place something in it, it must have the write permission set. When exiting out of
> >> blktrace, the entry is gone, both on a machine running with secure boot enabled
> >> and one with it disabled. Which also indicates the write permission was set,
> >> otherwise the entry would still be there.
> >>
> >> The fix is simple enough (see attachment) but we are not sure about the semantics of what
> >> lockdown has in mind.
> >>
> >> Looking at the include/linux/security.h the LOCKDOWN_TRACEFS exists which would
> >> imply that it is expected that operations with tracefs *should* work with lockdown (integrity mode).
> >>
> >> But at the same point, debugfs writable attributes are a nono with lockdown.
> >>
> >> So what is the right way forward?
> > What did you use as a basis for your changes? I'm looking at the
> > patch you sent and it appears to be making a change to a
> > debugfs_lockdown_whitelisted() function defined in
> > fs/debugfs/internal.h which does not exist in Linus' tree. If I
> > search through all of the archives on lore.kernel.org the only hit I
> > get is your email, so it seems doubtful it is in a subsystem tree
> > which hasn't made its way to Linus yet.
> >
> > Before we go any further, can you please verify that your issue is
> > reproducible on a supported, upstream tree (preferably Linus')?
>
> The patch attached is applied to oracle kernel which is just used to
> demo the idea of a possible fix.
>
> Upstream will have the same issue because blktrace uses relay files from
> debugfs to transfer trace information from kernel to userspace ...

For future reference, the problem with sending patches that aren't
based on an upstream tree is that it both wastes reviewer time trying
to figure out where the basis of the patch, and it makes one question
if the issue is present in an upstream kernel or if there is some
out-of-tree patch in the unknown kernel which is the root cause.

Maybe you've tested everything and know it is a problem with the
upstream code, but when we see a patch that doesn't match up with
anything, how are we supposed to know?

--
paul-moore.com

2023-04-06 21:56:14

by Paul Moore

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
<[email protected]> wrote:
> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:

...

> > Before we go any further, can you please verify that your issue is
> > reproducible on a supported, upstream tree (preferably Linus')?
>
> Yes. Very much so.

Okay, in that case I suspect the issue is due to the somewhat limited
granularity in the lockdown LSM. While there are a number of
different lockdown "levels", the reality is that the admin has to
choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
digging to deep into the code path that you would be hitting, we can
see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
that would allow DEBUGFS is NONE.

Without knowing too much about blktrace beyond the manpage, it looks
like it has the ability to trace/snoop on the block device operations
so I don't think this is something we would want to allow in a
"locked" system.

Sorry.

--
paul-moore.com

2023-04-10 19:32:17

by Junxiao Bi

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On 4/6/23 2:43 PM, Paul Moore wrote:

> On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
> <[email protected]> wrote:
>> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
> ...
>
>>> Before we go any further, can you please verify that your issue is
>>> reproducible on a supported, upstream tree (preferably Linus')?
>> Yes. Very much so.
> Okay, in that case I suspect the issue is due to the somewhat limited
> granularity in the lockdown LSM. While there are a number of
> different lockdown "levels", the reality is that the admin has to
> choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
> digging to deep into the code path that you would be hitting, we can
> see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
> INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
> setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
> that would allow DEBUGFS is NONE.
>
> Without knowing too much about blktrace beyond the manpage, it looks
> like it has the ability to trace/snoop on the block device operations
> so I don't think this is something we would want to allow in a
> "locked" system.

blktrace depends on tracepoint in block layer to trace io events of
block devices,

through the test with mainline, those tracepoints were not blocked by
lockdown.

If snoop block devices operations is a security concern in lock down, these

tracepoints should be disabled?

[root@jubi-ol8 tracecmd]# uname -a
Linux jubi-ol8 6.3.0-rc6.master.20230410.ol8.x86_64 #1 SMP
PREEMPT_DYNAMIC Mon Apr 10 03:33:56 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@jubi-ol8 tracecmd]# cat /sys/kernel/security/lockdown
none [integrity] confidentiality
[root@jubi-ol8 tracecmd]# trace-cmd record -e block:block_rq_issue -e
block:block_rq_complete
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x9fa000
    4096 bytes in size
CPU1 data recorded at offset=0x9fb000
    4096 bytes in size
CPU2 data recorded at offset=0x9fc000
    53248 bytes in size
CPU3 data recorded at offset=0xa09000
    12288 bytes in size

Thanks,

Junxiao.

>
> Sorry.
>

2023-04-10 20:30:29

by Paul Moore

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Mon, Apr 10, 2023 at 3:20 PM Junxiao Bi <[email protected]> wrote:
> On 4/6/23 2:43 PM, Paul Moore wrote:
> > On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
> > <[email protected]> wrote:
> >> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
> > ...
> >
> >>> Before we go any further, can you please verify that your issue is
> >>> reproducible on a supported, upstream tree (preferably Linus')?
> >> Yes. Very much so.
> > Okay, in that case I suspect the issue is due to the somewhat limited
> > granularity in the lockdown LSM. While there are a number of
> > different lockdown "levels", the reality is that the admin has to
> > choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
> > digging to deep into the code path that you would be hitting, we can
> > see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
> > INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
> > setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
> > that would allow DEBUGFS is NONE.
> >
> > Without knowing too much about blktrace beyond the manpage, it looks
> > like it has the ability to trace/snoop on the block device operations
> > so I don't think this is something we would want to allow in a
> > "locked" system.
>
> blktrace depends on tracepoint in block layer to trace io events of
> block devices,
>
> through the test with mainline, those tracepoints were not blocked by
> lockdown.
>
> If snoop block devices operations is a security concern in lock down, these
>
> tracepoints should be disabled?

Possibly, however, as I said earlier I'm not very familiar with
blktrace and the associated tracepoints. If it is possible to snoop
on kernel/user data using blktrace then it probably should be
protected by a lockdown control point.

Is this something you could verify and potentially submit a patch to resolve?

--
paul-moore.com

2023-04-10 21:45:55

by Junxiao Bi

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On 4/10/23 1:22 PM, Paul Moore wrote:

> On Mon, Apr 10, 2023 at 3:20 PM Junxiao Bi <[email protected]> wrote:
>> On 4/6/23 2:43 PM, Paul Moore wrote:
>>> On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
>>> <[email protected]> wrote:
>>>> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
>>> ...
>>>
>>>>> Before we go any further, can you please verify that your issue is
>>>>> reproducible on a supported, upstream tree (preferably Linus')?
>>>> Yes. Very much so.
>>> Okay, in that case I suspect the issue is due to the somewhat limited
>>> granularity in the lockdown LSM. While there are a number of
>>> different lockdown "levels", the reality is that the admin has to
>>> choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
>>> digging to deep into the code path that you would be hitting, we can
>>> see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
>>> INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
>>> setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
>>> that would allow DEBUGFS is NONE.
>>>
>>> Without knowing too much about blktrace beyond the manpage, it looks
>>> like it has the ability to trace/snoop on the block device operations
>>> so I don't think this is something we would want to allow in a
>>> "locked" system.
>> blktrace depends on tracepoint in block layer to trace io events of
>> block devices,
>>
>> through the test with mainline, those tracepoints were not blocked by
>> lockdown.
>>
>> If snoop block devices operations is a security concern in lock down, these
>>
>> tracepoints should be disabled?
> Possibly, however, as I said earlier I'm not very familiar with
> blktrace and the associated tracepoints. If it is possible to snoop
> on kernel/user data using blktrace then it probably should be
> protected by a lockdown control point.
>
> Is this something you could verify and potentially submit a patch to resolve?

blktrace can not snoop kernel/user data. The information it got from
kernel is kind of "io metadata", basically include which process from
which cpu, at what time, triggered what kind of IO events(issue, queue,
complete etc.), to which disk, from which sector offset and how long.
blktrace has no way to know what's inside that io. I am kind of think
this is safe for lockdown mode.

The following is sample of blktrace output.

252,0    1        1     0.000000000  2570  Q   W 45779288 + 48 [nstat]
252,0    1        1     0.000000000  2570  Q   W 45779288 + 48 [nstat]
252,0    1        1     0.000000000  2570  Q   W 45779288 + 48 [nstat]
252,0    1        1     0.000000000  2570  Q   W 45779288 + 48 [nstat]
252,0    3        1     0.001038626  2572  C   W 45779288 + 48 [0]
252,0    3        1     0.001038626  2572  C   W 45779288 + 48 [0]
252,0    3        1     0.001038626  2572  C   W 45779288 + 48 [0]
252,0    3        1     0.001038626  2572  C   W 45779288 + 48 [0]
252,0    1        2     1.764157693  1384  Q  WS 24577112 + 8 [auditd]
252,0    1        2     1.764157693  1384  Q  WS 24577112 + 8 [auditd]
252,0    1        2     1.764157693  1384  Q  WS 24577112 + 8 [auditd]
252,0    1        2     1.764157693  1384  Q  WS 24577112 + 8 [auditd]
252,0    1        3     1.764216845  1384  Q  WS 24577120 + 16 [auditd]
252,0    1        3     1.764216845  1384  Q  WS 24577120 + 16 [auditd]
252,0    1        3     1.764216845  1384  Q  WS 24577120 + 16 [auditd]
252,0    1        3     1.764216845  1384  Q  WS 24577120 + 16 [auditd]


Thanks,

Junxiao.

>

2023-04-10 21:52:33

by Paul Moore

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Mon, Apr 10, 2023 at 5:28 PM Junxiao Bi <[email protected]> wrote:
> On 4/10/23 1:22 PM, Paul Moore wrote:
> > On Mon, Apr 10, 2023 at 3:20 PM Junxiao Bi <[email protected]> wrote:
> >> On 4/6/23 2:43 PM, Paul Moore wrote:
> >>> On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
> >>> <[email protected]> wrote:
> >>>> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
> >>> ...
> >>>
> >>>>> Before we go any further, can you please verify that your issue is
> >>>>> reproducible on a supported, upstream tree (preferably Linus')?
> >>>> Yes. Very much so.
> >>> Okay, in that case I suspect the issue is due to the somewhat limited
> >>> granularity in the lockdown LSM. While there are a number of
> >>> different lockdown "levels", the reality is that the admin has to
> >>> choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
> >>> digging to deep into the code path that you would be hitting, we can
> >>> see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
> >>> INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
> >>> setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
> >>> that would allow DEBUGFS is NONE.
> >>>
> >>> Without knowing too much about blktrace beyond the manpage, it looks
> >>> like it has the ability to trace/snoop on the block device operations
> >>> so I don't think this is something we would want to allow in a
> >>> "locked" system.
> >> blktrace depends on tracepoint in block layer to trace io events of
> >> block devices,
> >>
> >> through the test with mainline, those tracepoints were not blocked by
> >> lockdown.
> >>
> >> If snoop block devices operations is a security concern in lock down, these
> >>
> >> tracepoints should be disabled?
> > Possibly, however, as I said earlier I'm not very familiar with
> > blktrace and the associated tracepoints. If it is possible to snoop
> > on kernel/user data using blktrace then it probably should be
> > protected by a lockdown control point.
> >
> > Is this something you could verify and potentially submit a patch to resolve?
>
> blktrace can not snoop kernel/user data. The information it got from
> kernel is kind of "io metadata", basically include which process from
> which cpu, at what time, triggered what kind of IO events(issue, queue,
> complete etc.), to which disk, from which sector offset and how long.
> blktrace has no way to know what's inside that io. I am kind of think
> this is safe for lockdown mode.

Well, you could always submit a patch* and we would review it like any
other; that's usually a much better approach.

* Yes, there was a patch submitted, but it was against a distro kernel
that diverged significantly from the upstream kernel in the relevant
areas.

--
paul-moore.com

2023-04-10 22:04:02

by Junxiao Bi

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.


On 4/10/23 2:44 PM, Paul Moore wrote:
> On Mon, Apr 10, 2023 at 5:28 PM Junxiao Bi <[email protected]> wrote:
>> On 4/10/23 1:22 PM, Paul Moore wrote:
>>> On Mon, Apr 10, 2023 at 3:20 PM Junxiao Bi <[email protected]> wrote:
>>>> On 4/6/23 2:43 PM, Paul Moore wrote:
>>>>> On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
>>>>> <[email protected]> wrote:
>>>>>> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
>>>>> ...
>>>>>
>>>>>>> Before we go any further, can you please verify that your issue is
>>>>>>> reproducible on a supported, upstream tree (preferably Linus')?
>>>>>> Yes. Very much so.
>>>>> Okay, in that case I suspect the issue is due to the somewhat limited
>>>>> granularity in the lockdown LSM. While there are a number of
>>>>> different lockdown "levels", the reality is that the admin has to
>>>>> choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
>>>>> digging to deep into the code path that you would be hitting, we can
>>>>> see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
>>>>> INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
>>>>> setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
>>>>> that would allow DEBUGFS is NONE.
>>>>>
>>>>> Without knowing too much about blktrace beyond the manpage, it looks
>>>>> like it has the ability to trace/snoop on the block device operations
>>>>> so I don't think this is something we would want to allow in a
>>>>> "locked" system.
>>>> blktrace depends on tracepoint in block layer to trace io events of
>>>> block devices,
>>>>
>>>> through the test with mainline, those tracepoints were not blocked by
>>>> lockdown.
>>>>
>>>> If snoop block devices operations is a security concern in lock down, these
>>>>
>>>> tracepoints should be disabled?
>>> Possibly, however, as I said earlier I'm not very familiar with
>>> blktrace and the associated tracepoints. If it is possible to snoop
>>> on kernel/user data using blktrace then it probably should be
>>> protected by a lockdown control point.
>>>
>>> Is this something you could verify and potentially submit a patch to resolve?
>> blktrace can not snoop kernel/user data. The information it got from
>> kernel is kind of "io metadata", basically include which process from
>> which cpu, at what time, triggered what kind of IO events(issue, queue,
>> complete etc.), to which disk, from which sector offset and how long.
>> blktrace has no way to know what's inside that io. I am kind of think
>> this is safe for lockdown mode.
> Well, you could always submit a patch* and we would review it like any
> other; that's usually a much better approach.
>
> * Yes, there was a patch submitted, but it was against a distro kernel
> that diverged significantly from the upstream kernel in the relevant
> areas.

Sure, i will submit a new one.

Before that, may i ask this question? It may affect the approach of the
patch.

Lockdown blocked files with mmap operation even that files are
read-only, may i know what's the security concern there?

static int debugfs_locked_down(struct inode *inode,
                   struct file *filp,
                   const struct file_operations *real_fops)
{
    if ((inode->i_mode & 07777 & ~0444) == 0 &&
        !(filp->f_mode & FMODE_WRITE) &&
        !real_fops->unlocked_ioctl &&
        !real_fops->compat_ioctl &&
        !real_fops->mmap)
        return 0;

    if (security_locked_down(LOCKDOWN_DEBUGFS))
        return -EPERM;

    return 0;
}

Thanks,

Junxiao.

>

2023-04-10 22:08:15

by Paul Moore

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On Mon, Apr 10, 2023 at 5:52 PM Junxiao Bi <[email protected]> wrote:
> On 4/10/23 2:44 PM, Paul Moore wrote:
> > On Mon, Apr 10, 2023 at 5:28 PM Junxiao Bi <[email protected]> wrote:
> >> On 4/10/23 1:22 PM, Paul Moore wrote:
> >>> On Mon, Apr 10, 2023 at 3:20 PM Junxiao Bi <[email protected]> wrote:
> >>>> On 4/6/23 2:43 PM, Paul Moore wrote:
> >>>>> On Thu, Apr 6, 2023 at 3:33 PM Konrad Rzeszutek Wilk
> >>>>> <[email protected]> wrote:
> >>>>>> On Thu, Apr 06, 2023 at 02:39:57PM -0400, Paul Moore wrote:
> >>>>> ...
> >>>>>
> >>>>>>> Before we go any further, can you please verify that your issue is
> >>>>>>> reproducible on a supported, upstream tree (preferably Linus')?
> >>>>>> Yes. Very much so.
> >>>>> Okay, in that case I suspect the issue is due to the somewhat limited
> >>>>> granularity in the lockdown LSM. While there are a number of
> >>>>> different lockdown "levels", the reality is that the admin has to
> >>>>> choose from either NONE, INTEGRITY, or CONFIDENTIALITY. Without
> >>>>> digging to deep into the code path that you would be hitting, we can
> >>>>> see that TRACEFS is blocked by the CONFIDENTIALITY (and therefore
> >>>>> INTEGRITY too) setting and DEBUGFS is blocked by the INTEGRITY
> >>>>> setting. With DEBUGFS blocked by INTEGRITY, the only lockdown option
> >>>>> that would allow DEBUGFS is NONE.
> >>>>>
> >>>>> Without knowing too much about blktrace beyond the manpage, it looks
> >>>>> like it has the ability to trace/snoop on the block device operations
> >>>>> so I don't think this is something we would want to allow in a
> >>>>> "locked" system.
> >>>> blktrace depends on tracepoint in block layer to trace io events of
> >>>> block devices,
> >>>>
> >>>> through the test with mainline, those tracepoints were not blocked by
> >>>> lockdown.
> >>>>
> >>>> If snoop block devices operations is a security concern in lock down, these
> >>>>
> >>>> tracepoints should be disabled?
> >>> Possibly, however, as I said earlier I'm not very familiar with
> >>> blktrace and the associated tracepoints. If it is possible to snoop
> >>> on kernel/user data using blktrace then it probably should be
> >>> protected by a lockdown control point.
> >>>
> >>> Is this something you could verify and potentially submit a patch to resolve?
> >> blktrace can not snoop kernel/user data. The information it got from
> >> kernel is kind of "io metadata", basically include which process from
> >> which cpu, at what time, triggered what kind of IO events(issue, queue,
> >> complete etc.), to which disk, from which sector offset and how long.
> >> blktrace has no way to know what's inside that io. I am kind of think
> >> this is safe for lockdown mode.
> > Well, you could always submit a patch* and we would review it like any
> > other; that's usually a much better approach.
> >
> > * Yes, there was a patch submitted, but it was against a distro kernel
> > that diverged significantly from the upstream kernel in the relevant
> > areas.
>
> Sure, i will submit a new one.
>
> Before that, may i ask this question? It may affect the approach of the
> patch.
>
> Lockdown blocked files with mmap operation even that files are
> read-only, may i know what's the security concern there?
>
> static int debugfs_locked_down(struct inode *inode,
> struct file *filp,
> const struct file_operations *real_fops)
> {
> if ((inode->i_mode & 07777 & ~0444) == 0 &&
> !(filp->f_mode & FMODE_WRITE) &&
> !real_fops->unlocked_ioctl &&
> !real_fops->compat_ioctl &&
> !real_fops->mmap)
> return 0;
>
> if (security_locked_down(LOCKDOWN_DEBUGFS))
> return -EPERM;
>
> return 0;
> }

I think the comment block at the top of that function describes it well:

/*
* Only permit access to world-readable files when the kernel is locked down.
* We also need to exclude any file that has ways to write or alter it as root
* can bypass the permissions check.
*/

--
paul-moore.com

2023-04-10 22:38:56

by Junxiao Bi

[permalink] [raw]
Subject: Re: Semantics of blktrace with lockdown (integrity) enabled kernel.

On 4/10/23 3:00 PM, Paul Moore wrote:

>>> Well, you could always submit a patch* and we would review it like any
>>> other; that's usually a much better approach.
>>>
>>> * Yes, there was a patch submitted, but it was against a distro kernel
>>> that diverged significantly from the upstream kernel in the relevant
>>> areas.
>> Sure, i will submit a new one.
>>
>> Before that, may i ask this question? It may affect the approach of the
>> patch.
>>
>> Lockdown blocked files with mmap operation even that files are
>> read-only, may i know what's the security concern there?
>>
>> static int debugfs_locked_down(struct inode *inode,
>> struct file *filp,
>> const struct file_operations *real_fops)
>> {
>> if ((inode->i_mode & 07777 & ~0444) == 0 &&
>> !(filp->f_mode & FMODE_WRITE) &&
>> !real_fops->unlocked_ioctl &&
>> !real_fops->compat_ioctl &&
>> !real_fops->mmap)
>> return 0;
>>
>> if (security_locked_down(LOCKDOWN_DEBUGFS))
>> return -EPERM;
>>
>> return 0;
>> }
> I think the comment block at the top of that function describes it well:
>
> /*
> * Only permit access to world-readable files when the kernel is locked down.
> * We also need to exclude any file that has ways to write or alter it as root
> * can bypass the permissions check.
> */

I may have some misunderstanding of  commit 5496197f9b08("debugfs:
Restrict debugfs when the kernel is locked down"),  it mentioned chmod
is disabled for debugfs file, so i thought permission of debugfs file
can not be changed. Actually I just tested that, the permission can be
changed! I am not sure whether this is an issue or not. Anyway i
understand the security concern with mmap, thanks a lot.

Thanks,

Junxiao.

>
> --
> paul-moore.com