From: Joe Konno <[email protected]>
It was pointed out that normal, unprivileged users reading certain EFI
variables (through efivarfs) can generate SMIs. Given these nodes are created
with 0644 permissions, normal users could generate a lot of SMIs. By
restricting permissions a bit (patch 1), we can make it harder for normal users
to generate spurious SMIs.
A normal user could generate lots of SMIs by reading the efivarfs in a trivial
loop:
```
while true; do
cat /sys/firmware/efi/evivars/* > /dev/null
done
```
Patch 1 in this series limits read and write permissions on efivarfs to the
owner/superuser. Group and world cannot access.
Patch 2 is for consistency and hygiene. If we restrict permissions for either
efivarfs or efi/vars, the other interface should get the same treatment.
Joe Konno (2):
fs/efivarfs: restrict inode permissions
efi: restrict top-level attribute permissions
drivers/firmware/efi/efi.c | 10 ++++++----
fs/efivarfs/super.c | 4 ++--
2 files changed, 8 insertions(+), 6 deletions(-)
--
2.14.1
From: Joe Konno <[email protected]>
Restrict four top-level (/sys/firmware/efi) attributes to match systab.
This is for consistency's sake, as well as hygiene.
Cc: Ard Biesheuvel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Joe Konno <[email protected]>
---
drivers/firmware/efi/efi.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index cd42f66a7c85..9a1f6c85c8c7 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -167,11 +167,13 @@ static ssize_t fw_platform_size_show(struct kobject *kobj,
return sprintf(buf, "%d\n", efi_enabled(EFI_64BIT) ? 64 : 32);
}
-static struct kobj_attribute efi_attr_fw_vendor = __ATTR_RO(fw_vendor);
-static struct kobj_attribute efi_attr_runtime = __ATTR_RO(runtime);
-static struct kobj_attribute efi_attr_config_table = __ATTR_RO(config_table);
+static struct kobj_attribute efi_attr_fw_vendor =
+ __ATTR_RO_MODE(fw_vendor, 0400);
+static struct kobj_attribute efi_attr_runtime = __ATTR_RO_MODE(runtime, 0400);
+static struct kobj_attribute efi_attr_config_table =
+ __ATTR_RO_MODE(config_table, 0400);
static struct kobj_attribute efi_attr_fw_platform_size =
- __ATTR_RO(fw_platform_size);
+ __ATTR_RO_MODE(fw_platform_size, 0400);
static struct attribute *efi_subsys_attrs[] = {
&efi_attr_systab.attr,
--
2.14.1
From: Joe Konno <[email protected]>
Efivarfs nodes are created with group and world readable permissions.
Reading certain EFI variables trigger SMIs. So, this is a potential DoS
surface.
Make permissions more restrictive-- only the owner may read or write to
created inodes.
Suggested-by: Andi Kleen <[email protected]>
Reported-by: Tony Luck <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Matthew Garrett <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Joe Konno <[email protected]>
---
fs/efivarfs/super.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 5b68e4294faa..ca98c4e31eb7 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -145,7 +145,7 @@ static int efivarfs_callback(efi_char16_t *name16, efi_guid_t vendor,
name[len + EFI_VARIABLE_GUID_LEN+1] = '\0';
- inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0644, 0,
+ inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0600, 0,
is_removable);
if (!inode)
goto fail_name;
@@ -207,7 +207,7 @@ static int efivarfs_fill_super(struct super_block *sb, void *data, int silent)
sb->s_d_op = &efivarfs_d_ops;
sb->s_time_gran = 1;
- inode = efivarfs_get_inode(sb, NULL, S_IFDIR | 0755, 0, true);
+ inode = efivarfs_get_inode(sb, NULL, S_IFDIR | 0700, 0, true);
if (!inode)
return -ENOMEM;
inode->i_op = &efivarfs_dir_inode_operations;
--
2.14.1
On 15 February 2018 at 18:22, Joe Konno <[email protected]> wrote:
> From: Joe Konno <[email protected]>
>
> It was pointed out that normal, unprivileged users reading certain EFI
> variables (through efivarfs) can generate SMIs. Given these nodes are created
> with 0644 permissions, normal users could generate a lot of SMIs. By
> restricting permissions a bit (patch 1), we can make it harder for normal users
> to generate spurious SMIs.
>
> A normal user could generate lots of SMIs by reading the efivarfs in a trivial
> loop:
>
> ```
> while true; do
> cat /sys/firmware/efi/evivars/* > /dev/null
> done
> ```
>
> Patch 1 in this series limits read and write permissions on efivarfs to the
> owner/superuser. Group and world cannot access.
>
> Patch 2 is for consistency and hygiene. If we restrict permissions for either
> efivarfs or efi/vars, the other interface should get the same treatment.
>
I am inclined to apply this as a fix, but I will give the x86 guys a
chance to respond as well.
> Joe Konno (2):
> fs/efivarfs: restrict inode permissions
> efi: restrict top-level attribute permissions
>
> drivers/firmware/efi/efi.c | 10 ++++++----
> fs/efivarfs/super.c | 4 ++--
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> --
> 2.14.1
>
On Fri, Feb 16, 2018 at 10:41:45AM +0000, Ard Biesheuvel wrote:
> On 15 February 2018 at 18:22, Joe Konno <[email protected]> wrote:
> > From: Joe Konno <[email protected]>
> >
> > It was pointed out that normal, unprivileged users reading certain EFI
> > variables (through efivarfs) can generate SMIs. Given these nodes are created
> > with 0644 permissions, normal users could generate a lot of SMIs. By
> > restricting permissions a bit (patch 1), we can make it harder for normal users
> > to generate spurious SMIs.
> >
> > A normal user could generate lots of SMIs by reading the efivarfs in a trivial
> > loop:
> >
> > ```
> > while true; do
> > cat /sys/firmware/efi/evivars/* > /dev/null
> > done
> > ```
> >
> > Patch 1 in this series limits read and write permissions on efivarfs to the
> > owner/superuser. Group and world cannot access.
> >
> > Patch 2 is for consistency and hygiene. If we restrict permissions for either
> > efivarfs or efi/vars, the other interface should get the same treatment.
> >
>
> I am inclined to apply this as a fix, but I will give the x86 guys a
> chance to respond as well.
That stinking pile EFI never ceases to amaze me...
Just one question: by narrowing permissions this way, aren't you
breaking some userspace which reads those?
And if you do, then that's a no-no.
Which then would mean that you'd have to come up with some caching
scheme to protect the firmware from itself.
Or we could simply admit that EFI is a piece of crap, kill it and
start anew, this time much more conservatively and not add a whole OS
underneath the actual OS.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 16 February 2018 at 10:55, Borislav Petkov <[email protected]> wrote:
> On Fri, Feb 16, 2018 at 10:41:45AM +0000, Ard Biesheuvel wrote:
>> On 15 February 2018 at 18:22, Joe Konno <[email protected]> wrote:
>> > From: Joe Konno <[email protected]>
>> >
>> > It was pointed out that normal, unprivileged users reading certain EFI
>> > variables (through efivarfs) can generate SMIs. Given these nodes are created
>> > with 0644 permissions, normal users could generate a lot of SMIs. By
>> > restricting permissions a bit (patch 1), we can make it harder for normal users
>> > to generate spurious SMIs.
>> >
>> > A normal user could generate lots of SMIs by reading the efivarfs in a trivial
>> > loop:
>> >
>> > ```
>> > while true; do
>> > cat /sys/firmware/efi/evivars/* > /dev/null
>> > done
>> > ```
>> >
>> > Patch 1 in this series limits read and write permissions on efivarfs to the
>> > owner/superuser. Group and world cannot access.
>> >
>> > Patch 2 is for consistency and hygiene. If we restrict permissions for either
>> > efivarfs or efi/vars, the other interface should get the same treatment.
>> >
>>
>> I am inclined to apply this as a fix, but I will give the x86 guys a
>> chance to respond as well.
>
> That stinking pile EFI never ceases to amaze me...
>
> Just one question: by narrowing permissions this way, aren't you
> breaking some userspace which reads those?
>
> And if you do, then that's a no-no.
>
> Which then would mean that you'd have to come up with some caching
> scheme to protect the firmware from itself.
>
> Or we could simply admit that EFI is a piece of crap, kill it and
> start anew, this time much more conservatively and not add a whole OS
> underneath the actual OS.
>
By your own reasoning above, that's a no-no as well.
But thanks for your input. Anyone else got something constructive to contribute?
On Fri, Feb 16, 2018 at 10:58:47AM +0000, Ard Biesheuvel wrote:
> By your own reasoning above, that's a no-no as well.
I'm sure we can come up with some emulation - the same way we did the
BIOS emulation.
> But thanks for your input. Anyone else got something constructive to contribute?
The not-breaking userspace is constructive contribution. The last
paragraph is my usual rant.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 16 February 2018 at 11:08, Borislav Petkov <[email protected]> wrote:
> On Fri, Feb 16, 2018 at 10:58:47AM +0000, Ard Biesheuvel wrote:
>> By your own reasoning above, that's a no-no as well.
>
> I'm sure we can come up with some emulation - the same way we did the
> BIOS emulation.
>
>> But thanks for your input. Anyone else got something constructive to contribute?
>
> The not-breaking userspace is constructive contribution. The last
> paragraph is my usual rant.
>
Fair enough. And I am not disagreeing with you either.
So question to Joe: is it well defined which variables may exhibit
this behavior? Given that UEFI variables are GUID scoped, would
whitelisting certain GUIDs (the ones userland currently relies on to
be readable my non-privileged users) and making everything else
user-only solve this problem as well?
On Fri, Feb 16, 2018 at 11:18:12AM +0000, Ard Biesheuvel wrote:
> On 16 February 2018 at 11:08, Borislav Petkov <[email protected]> wrote:
> > On Fri, Feb 16, 2018 at 10:58:47AM +0000, Ard Biesheuvel wrote:
> >> By your own reasoning above, that's a no-no as well.
> >
> > I'm sure we can come up with some emulation - the same way we did the
> > BIOS emulation.
> >
> >> But thanks for your input. Anyone else got something constructive to contribute?
> >
> > The not-breaking userspace is constructive contribution. The last
> > paragraph is my usual rant.
> >
>
> Fair enough. And I am not disagreeing with you either.
>
> So question to Joe: is it well defined which variables may exhibit
> this behavior?
For brevity's sake, "not yet." Folks-- e.g. firmware writers and
equipment makers-- may use SMIs more (or less) than others. So, how many
SMIs generated by the reference shell script can, and will, vary
platform to platform. I and my colleagues are digging into this.
> Given that UEFI variables are GUID scoped, would whitelisting certain
> GUIDs (the ones userland currently relies on to be readable my
> non-privileged users) and making everything else user-only solve this
> problem as well?
Perhaps. I don't want us chasing white rabbits just yet, but the
whitelist is but one approach under consideration. We may see some other
patches or RFCs about caching and/or shadowing variable values in
efivarfs to reduce the number of direct EFI reads, with the goal of
reducing how many SMIs are generated.
Any obvious EFI variables that userspace tools have come to depend on--
those which normal, unprivileged users need to read-- are helpful inputs
to this discussion.
The discussion is double-ended: asking the "which variables almost
always cause SMIs?" at the front and "which variables do normal,
unprivileged tools need for standard operation?" at the back.
Cheers, thanks for the feedback and consideration
On Fri, Feb 16, 2018 at 10:48:32AM -0800, Joe Konno wrote:
> On Fri, Feb 16, 2018 at 11:18:12AM +0000, Ard Biesheuvel wrote:
> > On 16 February 2018 at 11:08, Borislav Petkov <[email protected]> wrote:
> > > On Fri, Feb 16, 2018 at 10:58:47AM +0000, Ard Biesheuvel wrote:
> > >> By your own reasoning above, that's a no-no as well.
> > >
> > > I'm sure we can come up with some emulation - the same way we did the
> > > BIOS emulation.
> > >
> > >> But thanks for your input. Anyone else got something constructive to contribute?
> > >
> > > The not-breaking userspace is constructive contribution. The last
> > > paragraph is my usual rant.
> > >
> >
> > Fair enough. And I am not disagreeing with you either.
> >
> > So question to Joe: is it well defined which variables may exhibit
> > this behavior?
>
> For brevity's sake, "not yet." Folks-- e.g. firmware writers and
> equipment makers-- may use SMIs more (or less) than others. So, how many
> SMIs generated by the reference shell script can, and will, vary
> platform to platform. I and my colleagues are digging into this.
As a first guess: anything generated during boot is probably not an
SMI. Everything else is probably an SMI. In fact, I would expect that
for most systems, the entire list of things that *don't* generate an SMI
(aside from a few IBV specific variables) is all the variables in Table
10 of the UEFI spec that don't have the NV flag.
> > Given that UEFI variables are GUID scoped, would whitelisting certain
> > GUIDs (the ones userland currently relies on to be readable my
> > non-privileged users) and making everything else user-only solve this
> > problem as well?
>
> Perhaps. I don't want us chasing white rabbits just yet, but the
> whitelist is but one approach under consideration. We may see some other
> patches or RFCs about caching and/or shadowing variable values in
> efivarfs to reduce the number of direct EFI reads, with the goal of
> reducing how many SMIs are generated.
>
> Any obvious EFI variables that userspace tools have come to depend on--
> those which normal, unprivileged users need to read-- are helpful inputs
> to this discussion.
So, our big userland consumers are efibootmgr, fwupdate, and
"systemctl reboot --firmware-setup". efibootmgr and fwupdate can do the
"show the current status" half of their job as a user right now, but
they rely on root for the other half anyway. I don't think we normally
use libfwup as non-root even for reading status. So basically, the use
case from userland that this will effect looks like:
efibootmgr -v
*scratch head*
sudo su -
efibootmgr -b 0000 -B
efibootmgr -b 0000 -c -L "fixed entry" ...
exit
I don't feel really bad about people having to move the third line up to
the top of that. It's also not a use case you can have very much of: it
means you manually booted without any valid boot entries. fallback.efi
would have created a valid boot entry if you didn't do it manually.
"systemctl reboot --firmware-setup" effectively runs as root in all
cases.
The only thing we really must ensure to preserve the other workflows
is making sure creat() and open() with 0644 doesn't return an error (it
obviously won't be preserved across a reboot), because that would break
the existing tools. But I don't see anything in this patchset which
will cause that.
tl;dr: I think changing everything to 0600 is probably completely fine,
and whitelisting is probably pointless.
--
Peter
On Fri, Feb 16, 2018 at 10:48:32AM -0800, Joe Konno wrote:
> We may see some other patches or RFCs about caching and/or shadowing
> variable values in efivarfs to reduce the number of direct EFI reads,
> with the goal of reducing how many SMIs are generated.
So if you do the caching scheme, the question about narrowing
permissions becomes moot...
> Any obvious EFI variables that userspace tools have come to depend on--
> those which normal, unprivileged users need to read-- are helpful inputs
> to this discussion.
... which solves the aspect of not breaking userspace nicely.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
> tl;dr: I think changing everything to 0600 is probably completely fine,
> and whitelisting is probably pointless.
But do you speak for all users? It will just take one person complaining
that efibootmgr no longer shows them what it used to show to bring down
the wrath of Linus on our (specifically Joe's) head for breaking user space.
I've got someone about to start looking at making efivarfs read and save
the values during mount, so all the read-only options can continue to work
without making EFI calls.
This will cost some memory (say 20-30 variables at up to 1K each).
-Tony
On 16 February 2018 at 19:22, Peter Jones <[email protected]> wrote:
> On Fri, Feb 16, 2018 at 10:48:32AM -0800, Joe Konno wrote:
>> On Fri, Feb 16, 2018 at 11:18:12AM +0000, Ard Biesheuvel wrote:
>> > On 16 February 2018 at 11:08, Borislav Petkov <[email protected]> wrote:
>> > > On Fri, Feb 16, 2018 at 10:58:47AM +0000, Ard Biesheuvel wrote:
>> > >> By your own reasoning above, that's a no-no as well.
>> > >
>> > > I'm sure we can come up with some emulation - the same way we did the
>> > > BIOS emulation.
>> > >
>> > >> But thanks for your input. Anyone else got something constructive to contribute?
>> > >
>> > > The not-breaking userspace is constructive contribution. The last
>> > > paragraph is my usual rant.
>> > >
>> >
>> > Fair enough. And I am not disagreeing with you either.
>> >
>> > So question to Joe: is it well defined which variables may exhibit
>> > this behavior?
>>
>> For brevity's sake, "not yet." Folks-- e.g. firmware writers and
>> equipment makers-- may use SMIs more (or less) than others. So, how many
>> SMIs generated by the reference shell script can, and will, vary
>> platform to platform. I and my colleagues are digging into this.
>
> As a first guess: anything generated during boot is probably not an
> SMI. Everything else is probably an SMI. In fact, I would expect that
> for most systems, the entire list of things that *don't* generate an SMI
> (aside from a few IBV specific variables) is all the variables in Table
> 10 of the UEFI spec that don't have the NV flag.
>
>> > Given that UEFI variables are GUID scoped, would whitelisting certain
>> > GUIDs (the ones userland currently relies on to be readable my
>> > non-privileged users) and making everything else user-only solve this
>> > problem as well?
>>
>> Perhaps. I don't want us chasing white rabbits just yet, but the
>> whitelist is but one approach under consideration. We may see some other
>> patches or RFCs about caching and/or shadowing variable values in
>> efivarfs to reduce the number of direct EFI reads, with the goal of
>> reducing how many SMIs are generated.
>>
>> Any obvious EFI variables that userspace tools have come to depend on--
>> those which normal, unprivileged users need to read-- are helpful inputs
>> to this discussion.
>
> So, our big userland consumers are efibootmgr, fwupdate, and
> "systemctl reboot --firmware-setup". efibootmgr and fwupdate can do the
> "show the current status" half of their job as a user right now, but
> they rely on root for the other half anyway. I don't think we normally
> use libfwup as non-root even for reading status. So basically, the use
> case from userland that this will effect looks like:
>
> efibootmgr -v
> *scratch head*
> sudo su -
> efibootmgr -b 0000 -B
> efibootmgr -b 0000 -c -L "fixed entry" ...
> exit
>
> I don't feel really bad about people having to move the third line up to
> the top of that. It's also not a use case you can have very much of: it
> means you manually booted without any valid boot entries. fallback.efi
> would have created a valid boot entry if you didn't do it manually.
>
> "systemctl reboot --firmware-setup" effectively runs as root in all
> cases.
>
> The only thing we really must ensure to preserve the other workflows
> is making sure creat() and open() with 0644 doesn't return an error (it
> obviously won't be preserved across a reboot), because that would break
> the existing tools. But I don't see anything in this patchset which
> will cause that.
>
> tl;dr: I think changing everything to 0600 is probably completely fine,
> and whitelisting is probably pointless.
This is why I was leaning towards applying these patches: not breaking
userland is an important rule, but it does not imply every aspect of
behavior observable by userland is set in stone. In other words, I
agree with Peter that making this change does not *break* userland in
a way anyone is likely to care deeply about.
Adding a caching layer to efivarfs for this purpose is really overkill
IMHO. Also, we need this only on x86, so I'd like to see it proposed
in a way that allows it to be compiled out for architectures that have
no need for it.
On Fri, Feb 16, 2018 at 11:31 AM Ard Biesheuvel <[email protected]>
wrote:
> This is why I was leaning towards applying these patches: not breaking
> userland is an important rule, but it does not imply every aspect of
> behavior observable by userland is set in stone. In other words, I
> agree with Peter that making this change does not *break* userland in
> a way anyone is likely to care deeply about.
In some modes tpmtotp will run as non-root and expect to be able to read an
EFI variable.
On Fri, Feb 16, 2018 at 07:32:17PM +0000, Luck, Tony wrote:
> > tl;dr: I think changing everything to 0600 is probably completely fine,
> > and whitelisting is probably pointless.
>
> But do you speak for all users?
No, I just write their tools :)
> It will just take one person complaining that efibootmgr no longer
> shows them what it used to show to bring down the wrath of Linus on
> our (specifically Joe's) head for breaking user space.
The userland use case is gazing idly at the values without intending to
do anything about them. And most of this is firmware config and
firmware/OS interaction. Honestly it should never have been user
readable to begin with.
But also, we had a bug for quite some time where efibootmgr created
everything as 0600, and as a result non-root users couldn't do e.g.
"efibootmgr -v" and see the paths of new entries until a reboot
occurred. Nobody ever reported it in bugzilla.redhat.com or efibootmgr
or efivar's github issues pages. One person noticed it while commenting
about another issue, but didn't see it as related to his actual issue or
being a bug so much as "weird" that listing worked as non-root before
changing something but not after.
Another user asked about getting permission denied while setting the
boot order on askubuntu here:
https://askubuntu.com/questions/688317/getting-permission-denied-errors-from-efibootmgr
The response was exactly that you have to run it as root. I think it's
telling that nobody said anything about reading vs writing.
> I've got someone about to start looking at making efivarfs read and save
> the values during mount, so all the read-only options can continue to work
> without making EFI calls.
>
> This will cost some memory (say 20-30 variables at up to 1K each).
71 variables on my laptop, and the 1K restriction went away a *loooong*
time ago. It was fully excised from the userland tools in 2013. On my
laptop, 4 of those 71 variables are >5000 bytes. The total storage of
all of the data in the variables is 38kB.
I still think changing it to 0600 and calling this done is the right
thing to do.
--
Peter
On Fri, 2018-02-16 at 10:41 +0000, Ard Biesheuvel wrote:
> On 15 February 2018 at 18:22, Joe Konno <[email protected]>
> wrote:
> >
> > From: Joe Konno <[email protected]>
> >
> > It was pointed out that normal, unprivileged users reading certain
> > EFI
> > variables (through efivarfs) can generate SMIs. Given these nodes
> > are created
> > with 0644 permissions, normal users could generate a lot of SMIs.
> > By
> > restricting permissions a bit (patch 1), we can make it harder for
> > normal users
> > to generate spurious SMIs.
> >
> > A normal user could generate lots of SMIs by reading the efivarfs
> > in a trivial
> > loop:
> >
> > ```
> > while true; do
> > cat /sys/firmware/efi/evivars/* > /dev/null
> > done
> > ```
> >
> > Patch 1 in this series limits read and write permissions on
> > efivarfs to the
> > owner/superuser. Group and world cannot access.
> >
> > Patch 2 is for consistency and hygiene. If we restrict permissions
> > for either
> > efivarfs or efi/vars, the other interface should get the same
> > treatment.
> >
>
> I am inclined to apply this as a fix, but I will give the x86 guys a
> chance to respond as well.
It would break my current efi certificate tools because right at the
moment you can read the EFI secure boot variables as an unprivileged
user.
That said, I'm not sure how many non-root users run the toolkit to
extract their EFI certificates or check on the secure boot status of
the system, but I suspect it might be non-zero: I can see the tinfoil
hat people wanting at least to check the secure boot status when they
log in.
James
> That said, I'm not sure how many non-root users run the toolkit to
> extract their EFI certificates or check on the secure boot status of
> the system, but I suspect it might be non-zero: I can see the tinfoil
> hat people wanting at least to check the secure boot status when they
> log in.
Another fix option might be to rate limit EFI calls for non-root users (on X86
since only we have the SMI problem). That would:
1) Avoid using memory to cache all the variables
2) Catch any other places where non-root users can call EFI
-Tony
On Fri, Feb 16, 2018 at 9:09 PM, Luck, Tony <[email protected]> wrote:
>> That said, I'm not sure how many non-root users run the toolkit to
>> extract their EFI certificates or check on the secure boot status of
>> the system, but I suspect it might be non-zero: I can see the tinfoil
>> hat people wanting at least to check the secure boot status when they
>> log in.
>
> Another fix option might be to rate limit EFI calls for non-root users (on X86
> since only we have the SMI problem). That would:
>
> 1) Avoid using memory to cache all the variables
> 2) Catch any other places where non-root users can call EFI
I'm going to go out on a limb and suggest that the fact that
unprivileged users can read efi variables at all is a mistake
regardless of SMI issues.
Also, chmod() just shouldn't work on efi variables, and the mode
passed to creat() should be ignored. After all, there's no backing
store for the mode.
On Fri, Feb 16, 2018 at 1:45 PM Andy Lutomirski <[email protected]> wrote:
> I'm going to go out on a limb and suggest that the fact that
> unprivileged users can read efi variables at all is a mistake
> regardless of SMI issues.
Why? They should never contain sensitive material.
> Also, chmod() just shouldn't work on efi variables, and the mode
> passed to creat() should be ignored. After all, there's no backing
> store for the mode.
If the default is 600 then it makes sense to allow a privileged service to
selectively make certain variables world readable at runtime.
> If the default is 600 then it makes sense to allow a privileged service to
> selectively make certain variables world readable at runtime.
As soon as you make one variable world readable you are vulnerable to
a local user launching a DoS attack by reading that variable over and over
generating a flood of SMIs.
-Tony
On Fri, Feb 16, 2018 at 2:02 PM Luck, Tony <[email protected]> wrote:
> > If the default is 600 then it makes sense to allow a privileged service
to
> > selectively make certain variables world readable at runtime.
> As soon as you make one variable world readable you are vulnerable to
> a local user launching a DoS attack by reading that variable over and over
> generating a flood of SMIs.
I'm not terribly worried about untrusted users on my laptop, but I would
prefer to run as little code as root as possible.
On Fri, Feb 16, 2018 at 09:09:30PM +0000, Luck, Tony wrote:
> > That said, I'm not sure how many non-root users run the toolkit to
> > extract their EFI certificates or check on the secure boot status of
> > the system, but I suspect it might be non-zero: I can see the tinfoil
> > hat people wanting at least to check the secure boot status when they
> > log in.
>
> Another fix option might be to rate limit EFI calls for non-root users (on X86
> since only we have the SMI problem). That would:
>
> 1) Avoid using memory to cache all the variables
> 2) Catch any other places where non-root users can call EFI
I could get behind that as well. Currently the things I maintain do
approximately this many normal accesses with invocations you can do as a
user:
"efibootmgr -v" - six files we always try to read, plus one per Boot####
entry.
"fwupdate --info" - one file it always tries to read, one file for each
ESRT entry.
"dbxtool -l" - one file it always reads.
"mokutil --sb-state" - reads the same file twice. I don't maintain
this, but I'll send a patch to Gary to make it
only read it once. AFAICS all of the other
invocations you can currently do as a user
/legitimately/ read two files, though.
Some systems seem to *love* making a pile of Boot#### entries; I think
the most I've seen is something like 16. So on that machine, one
"efibootmgr -v" invocation is ~22 efivars files read. I've never seen a
machine that advertised more than 2 ESRT entries, but maybe we'll get
there some day.
--
Peter
On 16 February 2018 at 22:05, Peter Jones <[email protected]> wrote:
> On Fri, Feb 16, 2018 at 09:09:30PM +0000, Luck, Tony wrote:
>> > That said, I'm not sure how many non-root users run the toolkit to
>> > extract their EFI certificates or check on the secure boot status of
>> > the system, but I suspect it might be non-zero: I can see the tinfoil
>> > hat people wanting at least to check the secure boot status when they
>> > log in.
>>
>> Another fix option might be to rate limit EFI calls for non-root users (on X86
>> since only we have the SMI problem). That would:
>>
>> 1) Avoid using memory to cache all the variables
>> 2) Catch any other places where non-root users can call EFI
>
> I could get behind that as well. Currently the things I maintain do
> approximately this many normal accesses with invocations you can do as a
> user:
>
> "efibootmgr -v" - six files we always try to read, plus one per Boot####
> entry.
> "fwupdate --info" - one file it always tries to read, one file for each
> ESRT entry.
> "dbxtool -l" - one file it always reads.
> "mokutil --sb-state" - reads the same file twice. I don't maintain
> this, but I'll send a patch to Gary to make it
> only read it once. AFAICS all of the other
> invocations you can currently do as a user
> /legitimately/ read two files, though.
>
> Some systems seem to *love* making a pile of Boot#### entries; I think
> the most I've seen is something like 16. So on that machine, one
> "efibootmgr -v" invocation is ~22 efivars files read. I've never seen a
> machine that advertised more than 2 ESRT entries, but maybe we'll get
> there some day.
>
Would rate limiting (but not only for non-root) help mitigate Spectre
v1 issues in UEFI runtime services code as well? I have been looking
into unmapping the entire kernel while such calls are in progress,
because firmware is likely to remain vulnerable long after the OSes
have been fixed, and we may be able to kill two birds with one stone
here (and not break userland in the process)
> Would rate limiting (but not only for non-root) help mitigate Spectre
> v1 issues in UEFI runtime services code as well? I have been looking
> into unmapping the entire kernel while such calls are in progress,
> because firmware is likely to remain vulnerable long after the OSes
> have been fixed, and we may be able to kill two birds with one stone
> here (and not break userland in the process)
Yes a global rate limit would seem like a good compromise.
-Andi
On Fri, Feb 16, 2018 at 10:03 PM, Matthew Garrett <[email protected]> wrote:
> On Fri, Feb 16, 2018 at 2:02 PM Luck, Tony <[email protected]> wrote:
>
>> > If the default is 600 then it makes sense to allow a privileged service
> to
>> > selectively make certain variables world readable at runtime.
>
>> As soon as you make one variable world readable you are vulnerable to
>> a local user launching a DoS attack by reading that variable over and over
>> generating a flood of SMIs.
>
> I'm not terribly worried about untrusted users on my laptop, but I would
> prefer to run as little code as root as possible.
I think that, for the most part, systemwide configuration should not
be accessible to non-root. Unprivileged users, in general, have no
legitimate reason to know that my default boot is Boot0000* Fedora
HD(1,GPT,ee...,0x800,0x64000)/File(\EFI\fedora\shim.efi). Even more
so if I'm network booting.
Alternatively, we could call this a distro issue. Distros could
easily change the permissions on /sys/firmware/efi/efivars to disallow
unprivileged access.
On 02/15/2018 10:22 AM, Joe Konno wrote:
> From: Joe Konno <[email protected]>
>
> Efivarfs nodes are created with group and world readable permissions.
> Reading certain EFI variables trigger SMIs. So, this is a potential DoS
> surface.
>
> Make permissions more restrictive-- only the owner may read or write to
> created inodes.
>
> Suggested-by: Andi Kleen <[email protected]>
> Reported-by: Tony Luck <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Matthew Garrett <[email protected]>
> Cc: Jeremy Kerr <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Joe Konno <[email protected]>
The discussion in this thread has gone on too long, so:
Acked-by: Andy Lutomirski <[email protected]>
And yes, this patch will break a couple of minor usecases, but IMO those
usecases deserve to break.
> ---
> fs/efivarfs/super.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
> index 5b68e4294faa..ca98c4e31eb7 100644
> --- a/fs/efivarfs/super.c
> +++ b/fs/efivarfs/super.c
> @@ -145,7 +145,7 @@ static int efivarfs_callback(efi_char16_t *name16, efi_guid_t vendor,
>
> name[len + EFI_VARIABLE_GUID_LEN+1] = '\0';
>
> - inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0644, 0,
> + inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0600, 0,
> is_removable);
> if (!inode)
> goto fail_name;
> @@ -207,7 +207,7 @@ static int efivarfs_fill_super(struct super_block *sb, void *data, int silent)
> sb->s_d_op = &efivarfs_d_ops;
> sb->s_time_gran = 1;
>
> - inode = efivarfs_get_inode(sb, NULL, S_IFDIR | 0755, 0, true);
> + inode = efivarfs_get_inode(sb, NULL, S_IFDIR | 0700, 0, true);
> if (!inode)
> return -ENOMEM;
> inode->i_op = &efivarfs_dir_inode_operations;
>
On Tue, Feb 20, 2018 at 11:18:57AM -0800, Andy Lutomirski wrote:
> On 02/15/2018 10:22 AM, Joe Konno wrote:
> > From: Joe Konno <[email protected]>
> >
> > Efivarfs nodes are created with group and world readable permissions.
> > Reading certain EFI variables trigger SMIs. So, this is a potential DoS
> > surface.
> >
> > Make permissions more restrictive-- only the owner may read or write to
> > created inodes.
...
> The discussion in this thread has gone on too long, so:
>
> Acked-by: Andy Lutomirski <[email protected]>
>
> And yes, this patch will break a couple of minor usecases, but IMO those
> usecases deserve to break.
...
> > - inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0644, 0,
> > + inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0600, 0,
> > is_removable);
Linus,
Does this rate an exception to the "don't break userspace" for a security issue?
What breaks:
User can't run efibootmgr(8) to see things like BootOrder. Also
"fwupdate", "dbxtool", "mokutil", and "tpmtotp" have some modes
where ordinary users need read access to some EFI variables.
We looked at some other options.
1) When mounting efivarfs have it read all the variables and
cache the values. Then user can read without making an EFI
call because we just copyout the cached copy.
Rejected as there can be a lot of variables (70 on Peter Jones
system) and EFI dropped the 1KB per variable limit. So this
pins a bunch of memory for a few obscure use cases.
2) Rate limit EFI calls for non-root
This solution still has some cheer-leaders. Obviously a bit
more code than just changing the permissions. But would also
preemptively fix any other places where an ordinary user can
trigger an EFI call.
-Tony
On Tue, Feb 20, 2018 at 1:18 PM Luck, Tony <[email protected]> wrote:
> Does this rate an exception to the "don't break userspace" for a security
issue?
To be clear, when you say "security" is this in reference to it being a
denial of service, or are you worried about other interactions that may
cause wider security issues?
On Tue, Feb 20, 2018 at 09:22:29PM +0000, Matthew Garrett wrote:
> On Tue, Feb 20, 2018 at 1:18 PM Luck, Tony <[email protected]> wrote:
>
> > Does this rate an exception to the "don't break userspace" for a security
> issue?
>
> To be clear, when you say "security" is this in reference to it being a
> denial of service, or are you worried about other interactions that may
> cause wider security issues?
The immediate problem is the denial of service attack. I have
a nagging worry that allowing a user to cause an SMI at a precise
time might also be a problem. But I don't know how that could be
leveraged in some other attack.
Making the efivar files 0600 would stop the user from causing the
SMIs. The rate limit solution could include a random delay to make
it tricky to use any attack that relies on an SMI during some specific
code sequence.
-Tony
On Tue, Feb 20, 2018 at 1:32 PM Luck, Tony <[email protected]> wrote:
> The immediate problem is the denial of service attack. I have
> a nagging worry that allowing a user to cause an SMI at a precise
> time might also be a problem. But I don't know how that could be
> leveraged in some other attack.
The thing that worries me here is that if it's possible for root to
potentially attack the kernel then just changing the permissions is still
allowing an escalation of privilege. The other approaches would also block
this.
On Tue, Feb 20, 2018 at 1:18 PM, Luck, Tony <[email protected]> wrote:
> ...
>> > - inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0644, 0,
>> > + inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0600, 0,
>> > is_removable);
>
> Linus,
>
> Does this rate an exception to the "don't break userspace" for a security issue?
I have a hard time judging, since I don't know what the breakage is.
_Theoretical_ breakage doesn't matter.
But yes, if it's actually part of some regular use flow, then it
matters, and we should not do this change. It's not a real security
issue, afaik.
I do think that we might need to just extend on the whitelist for efi
variables we _already_ have for other reasons. I'm talking about that
whole variable_validate[] array.
Which variables are actually used by user space tools? It sounds like
it may be exactly those boot order things that we already have flags
and special behavior for.
And no, I don't believe in the "SMI can trigger a DoS" argument. Those
garbage efivars that *do* trigger SMI's should me marked and shamed,
and maybe _they_ need to be added to the list of things that you
should look out for. Root or not.
And just on general principlies, I don't want to see weasel-wordy
commit messages like
"Reading certain EFI variables trigger SMIs"
I understand *writing* them causing SMI's due to some flash protection
scheme. What's the reading thing? And why aren't we calling that
garbage out?
So even if we do need to limit reading, I want out comments (in core
or commits) to actually explain *Why*., instead of this kind of
non-specific fear-mongering that nobody can really say yay or nay
about.
Linus
On Tue, Feb 20, 2018 at 7:18 PM, Andy Lutomirski <[email protected]> wrote:
> On 02/15/2018 10:22 AM, Joe Konno wrote:
>>
>> From: Joe Konno <[email protected]>
>>
>> Efivarfs nodes are created with group and world readable permissions.
>> Reading certain EFI variables trigger SMIs. So, this is a potential DoS
>> surface.
>>
>> Make permissions more restrictive-- only the owner may read or write to
>> created inodes.
>>
>> Suggested-by: Andi Kleen <[email protected]>
>> Reported-by: Tony Luck <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Matthew Garrett <[email protected]>
>> Cc: Jeremy Kerr <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Joe Konno <[email protected]>
>
>
> The discussion in this thread has gone on too long, so:
>
> Acked-by: Andy Lutomirski <[email protected]>
>
> And yes, this patch will break a couple of minor usecases, but IMO those
> usecases deserve to break.
Alternatively, a patch like this (untested but straightforward) might
be a little more effective and easier to undo in userspace for anyone
who may be adversely affected:
diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 5b68e4294faa..88c7163c0ac1 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -207,7 +207,7 @@ static int efivarfs_fill_super(struct super_block
*sb, void *data, int silent)
sb->s_d_op = &efivarfs_d_ops;
sb->s_time_gran = 1;
- inode = efivarfs_get_inode(sb, NULL, S_IFDIR | 0755, 0, true);
+ inode = efivarfs_get_inode(sb, NULL, S_IFDIR | 0700, 0, true);
if (!inode)
return -ENOMEM;
inode->i_op = &efivarfs_dir_inode_operations;
If you prefer that, I'd be happy to re-send it for real.
--Andy
On Tue, Feb 20, 2018 at 02:01:51PM -0800, Linus Torvalds wrote:
> And just on general principlies, I don't want to see weasel-wordy
> commit messages like
>
> "Reading certain EFI variables trigger SMIs"
>
> I understand *writing* them causing SMI's due to some flash protection
> scheme. What's the reading thing? And why aren't we calling that
> garbage out?
Too much weasel there. Should say:
EFI[1] stinks. Reading any file in /sys/firmware/efi/efivars/ generates
4 (yes FOUR!) SMIs.
# rdmsr 0x34
14e2
# cat /sys/firmware/efi/efivars/ConIn-8be4df61-93ca-11d2-aa0d-00e098032b8c > /dev/null
# rdmsr 0x34
14e6
-Tony
[1] I didn't dig through the Linux code to check whether we manage to
get those four SMIs from a single EFI call, or whether we make multiple
EFI calls to open/read/close one file. It is possible that we stink a
bit too if we are doing more EFI calls than required.
On Tue, Feb 20, 2018 at 3:30 PM Luck, Tony <[email protected]> wrote:
> [1] I didn't dig through the Linux code to check whether we manage to
> get those four SMIs from a single EFI call, or whether we make multiple
> EFI calls to open/read/close one file. It is possible that we stink a
> bit too if we are doing more EFI calls than required.
read() will make two calls - one to obtain the size of the variable, the
other to read it. It looks like cat will also trigger an fstat(), so we're
probably also making a call for that. There's presumably some optimisation
that could be made there if we trust the firmware not to change the size
behind our back.
> read() will make two calls - one to obtain the size of the variable, the
> other to read it. It looks like cat will also trigger an fstat(), so we're
> probably also making a call for that. There's presumably some optimisation
> that could be made there if we trust the firmware not to change the size
> behind our back.
Hmmm. "ls -l" reports the size of each of the files without causing any SMIs,
so Linux has that cached someplace and "read" is being silly making a call
to get something that we already know. Unless we think the size may be
changing asynchronously and are OK with reporting a stale value for "stat"
but want the actual value for "read".
-Tony
On Tue, Feb 20, 2018 at 3:30 PM, Luck, Tony <[email protected]> wrote:
>
> Too much weasel there. Should say:
>
> EFI[1] stinks. Reading any file in /sys/firmware/efi/efivars/ generates
> 4 (yes FOUR!) SMIs.
Is that actualkly the normal implementation?
Also, if these are just synchronous SMI's, then don't we just end up
correctly assigning the CPU load to the reader, and it doesn't
actually matter? Where's the DoS?
Linus
On Tue, Feb 20, 2018 at 02:01:51PM -0800, Linus Torvalds wrote:
> Which variables are actually used by user space tools? It sounds like
> it may be exactly those boot order things that we already have flags
> and special behavior for.
Not that many are really part of a non-root workflow. The biggest
consumer that there's a real /user/ workflow for is Matthew's tpmtotp,
which lets you pick your own when you set it up as root and merely read
from it as a user in perpetuity. Most workflows are of the same form as
"I run efibootmgr as a user to list things and then use sudo when I
actually need to make some changes."
> And no, I don't believe in the "SMI can trigger a DoS" argument. Those
> garbage efivars that *do* trigger SMI's should me marked and shamed,
> and maybe _they_ need to be added to the list of things that you
> should look out for. Root or not.
It's all of them except the very few that are generated during bootup.
It's worth noting, though, that EFI does not actually require this
implementation behavior in any way. This is all about Intel's
choice to use SMI to protect the variable store.
> And just on general principlies, I don't want to see weasel-wordy
> commit messages like
>
> "Reading certain EFI variables trigger SMIs"
>
> I understand *writing* them causing SMI's due to some flash protection
> scheme. What's the reading thing? And why aren't we calling that
> garbage out?
Assuming most Intel CPUs work more or less the same as Galileo in this
particular regard, what's going on is the SPI part is connected to the
North Cluster, which presents an MMIO interface to program it, and SMM
is configured so that touching those addresses triggers an SMI. This
way they can protect the the "authenticated" variables by checking
signatures inside SMM.
So basically it's not "Reading certain variables trigger SMIs", it's
"reading any variable that's not completely fake causes SMI". The
result is any user can not merely trigger an SMI, but really more like
they can hold it asserted.
> So even if we do need to limit reading, I want out comments (in core
> or commits) to actually explain *Why*., instead of this kind of
> non-specific fear-mongering that nobody can really say yay or nay
> about.
Yeah, makes sense.
There are other options, but they're not great either. For example, On
most systems the total data is <100kB, so we could make a default-on
config option to just cache it all during boot by default. Like the
options suggested so far, it has downsides, but it's not the end of the
world for most systems.
--
Peter
>> EFI[1] stinks. Reading any file in /sys/firmware/efi/efivars/ generates
>> 4 (yes FOUR!) SMIs.
> Is that actualkly the normal implementation?
I don't know if there are other implementations. This is what I see on my
lab system.
> Also, if these are just synchronous SMI's, then don't we just end up
> correctly assigning the CPU load to the reader, and it doesn't
> actually matter? Where's the DoS?
SMI are broadcast to all logical CPUs. The bigger the system the
more the pain from an SMI as code tries to rendezvous them all
(4 socket * 24 core * 2 HyperThreads = 192 CPUs on my system).
Looping reading files from efivarfs doesn't stop the system, but it
does slow to a crawl while it is happening. Not a full blown DoS,
but fairly unpleasant.
-Tony
On Tue, Feb 20, 2018 at 5:05 PM, Luck, Tony <[email protected]> wrote:
>>> EFI[1] stinks. Reading any file in /sys/firmware/efi/efivars/ generates
>>> 4 (yes FOUR!) SMIs.
>
>> Is that actualkly the normal implementation?
>
> I don't know if there are other implementations. This is what I see on my
> lab system.
Ok. I'm not a huge fan of EFI. Over-designed to the hilt. Happily at
least the loadable drivers are a thing of the past.
Do we have a list of things normal users care about? Because one thing
that would solve it is caching of the values. We don't want to do that
in general, but maybe we could just do it for the subset that we think
are "user accessible".
Although maybe just that "rate limit" thing would be simplest.
I don't want to break existing users, although it's not entirely clear
to me if there are any real use cases that matter to users. If tpmtotp
is the main case, maybe that can be changed to work around it and just
cache a value or something?
So I could imagine just applying Joe's / Andy's patch to see if
anybody even notices. But if somebody does, we'd have to go to the
alternatives anyway.
Linus
On 21 February 2018 at 02:16, Linus Torvalds
<[email protected]> wrote:
> On Tue, Feb 20, 2018 at 5:05 PM, Luck, Tony <[email protected]> wrote:
>>>> EFI[1] stinks. Reading any file in /sys/firmware/efi/efivars/ generates
>>>> 4 (yes FOUR!) SMIs.
>>
>>> Is that actualkly the normal implementation?
>>
>> I don't know if there are other implementations. This is what I see on my
>> lab system.
>
> Ok. I'm not a huge fan of EFI. Over-designed to the hilt. Happily at
> least the loadable drivers are a thing of the past.
>
I don't think there is any disagreement on that aspect of the discussion.
> Do we have a list of things normal users care about? Because one thing
> that would solve it is caching of the values. We don't want to do that
> in general, but maybe we could just do it for the subset that we think
> are "user accessible".
>
> Although maybe just that "rate limit" thing would be simplest.
>
The thing I like about rate limiting (for everyone including root) is
that it does not break anybody's workflow (unless EFI variable access
occurs on a hot path, in which case you're either a) asking for it, or
b) the guy trying to DoS us), and that it can make exploitation of any
potential Spectre v1 vulnerabilities impractical at the same time. At
present, we don't know if this is a concern, but we cannot rule it
out, and what we do know is that those shiny new Spectre-proof kernels
that we will have any day now will be running on systems with UEFI
firmware that may contain vulnerable code paths and may never get
updated. Perhaps I am being overly paranoid here, but if we end up
adding rate limiting, let's just apply it to root as well.
> I don't want to break existing users, although it's not entirely clear
> to me if there are any real use cases that matter to users. If tpmtotp
> is the main case, maybe that can be changed to work around it and just
> cache a value or something?
>
> So I could imagine just applying Joe's / Andy's patch to see if
> anybody even notices. But if somebody does, we'd have to go to the
> alternatives anyway.
>
> Linus
On Wed, Feb 21, 2018 at 1:03 AM, Ard Biesheuvel
<[email protected]> wrote:
>
> The thing I like about rate limiting (for everyone including root) is
> that it does not break anybody's workflow (unless EFI variable access
> occurs on a hot path, in which case you're either a) asking for it, or
> b) the guy trying to DoS us), and that it can make exploitation of any
> potential Spectre v1 vulnerabilities impractical at the same time.
I do agree that ratelimiting would seem to be the simple solution.
We _would_ presumably need to make it per-user, so that one user
cannot just DoS another user.
But it should be fairly easy to just add a 'struct ratelimit_state' to
'struct user_struct', and then you can easily just use
'&file->f_cred->user->ratelimit'
and you're done. Make sure the initial root user has it unlimited, and
limit it to something reasonable for all other user allocations.
So I think a rate-limiting thing wouldn't be overly complicated. We
could make the rate-limiting be some completely generic thing, not
tying it to efivars itself, but just saying "this is for random
"occasional" things where we are ok with a user doing a hundred
operations per second, but if somebody tries to do millions, they get
shut down".
Realistically, even root is fine with those, but letting root in the
initial namespace be entirely unlimited is obviously a pretty
reasonable thing to do.
So it might be a few tens of lines of code or something, including the
initialization of that new user struct entry.
I think the real issue is testing and just doing it.
Anybody?
Linus
> But it should be fairly easy to just add a 'struct ratelimit_state' to
> 'struct user_struct', and then you can easily just use
>
> '&file->f_cred->user->ratelimit'
>
> and you're done. Make sure the initial root user has it unlimited, and
> limit it to something reasonable for all other user allocations.
How about uid name spaces? Someone untrusted in a container could
create a lot of uids and switch between them.
A global rate limit seems better. While in theory it allows DoS
it's probably not worse than a lot of others we have with
other resources, and it's relatively harmless.
-Andi
On Wed, Feb 21, 2018 at 10:21:04AM -0800, Andi Kleen wrote:
> > But it should be fairly easy to just add a 'struct ratelimit_state' to
> > 'struct user_struct', and then you can easily just use
> >
> > '&file->f_cred->user->ratelimit'
> >
> > and you're done. Make sure the initial root user has it unlimited, and
> > limit it to something reasonable for all other user allocations.
>
> How about uid name spaces? Someone untrusted in a container could
> create a lot of uids and switch between them.
>
> A global rate limit seems better. While in theory it allows DoS
> it's probably not worse than a lot of others we have with
> other resources, and it's relatively harmless.
The EFI calls are all about checking system configuration. A thing
that only a handful of users do on a very occasional basis. I don't
see much harm if my "efibootmgr -v" call is slowed down a bit (or even
a lot) because you are using a bunch of the available ratelimit reading
the efivars.
Per-user seems over engineered to solve this problem.
-Tony
On Wed, Feb 21, 2018 at 11:47 AM, Luck, Tony <[email protected]> wrote:
>
> The EFI calls are all about checking system configuration. A thing
> that only a handful of users do on a very occasional basis. I don't
> see much harm if my "efibootmgr -v" call is slowed down a bit (or even
> a lot) because you are using a bunch of the available ratelimit reading
> the efivars.
>
It's not about slowing down.
It's about "user Xyz is messing with the system and reading efi vars
all the time" resulting in "user 'torvalds' is installing a kernel,
and actually wants to read efi vars, but can't".
if you don't make it per-user, you're just replacing one DoS attack
with another one!
Linus
On Wed, Feb 21, 2018 at 10:21 AM, Andi Kleen <[email protected]> wrote:
>
> How about uid name spaces? Someone untrusted in a container could
> create a lot of uids and switch between them.
Anybody who does that deserves whatever the hell they get.
You can already blow out a lot of other resources that way. If you can
create users indiscriminately enough, you can bypass most other
resource limits too.
If you think containers protect against security issues from untrusted
users, I have a bridge to sell you.
Linus
> It's not about slowing down.
>
> It's about "user Xyz is messing with the system and reading efi vars
> all the time" resulting in "user 'torvalds' is installing a kernel,
> and actually wants to read efi vars, but can't".
>
> if you don't make it per-user, you're just replacing one DoS attack
> with another one!
How are you envisioning this rate-limiting to be implemented? Are
you going to fail an EFI call if the rate is too high? I'm thinking that
we just add a delay to each call so that we can't exceed the limit.
That means your kernel install will complete, just slower than it
would without the delays.
I think I want a small random delay anyway to prevent users from
causing an SMI at the precise moment of their choosing.
-Tony
On Wed, Feb 21, 2018 at 11:58 AM, Luck, Tony <[email protected]> wrote:
>
> How are you envisioning this rate-limiting to be implemented? Are
> you going to fail an EFI call if the rate is too high? I'm thinking that
> we just add a delay to each call so that we can't exceed the limit.
Delaying sounds ok, I guess.
But the "obvious" implementation may be simple:
static void efi_ratelimit(void)
{
static DEFINE_RATELIMIT_STATE(ratelimit, HZ, 100);
if (!__ratelimit(&ratelimit))
msleep(10);
}
}
but the above is actually completely broken.
Why? If you have multiple processes, they can each only do a hundred
per second, but globally they can do millions per second by just
having a few thousand threads. They all sleep, but..
So how do you restrict it *globally*?
If you put this all inside a lock like a mutex, you can generate
basically arbitrary delays, and you're back to the DoS schenario. A
fair lock will allow thousands of waiters to line up and make the
delay be
But maybe I'm missing some really obvious way. You *can* make the
msleep be a spinning wait instead, and rely on the scheduler, I guess.
Or maybe I'm just stupid and am overlooking the obvious case.
Again, making the ratelimiting be per-user makes all of these issues
go away. Then one user cannot delay another one.
Linus
Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).
On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.
Linus suggested per-user rate limit to solve this.
So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.
In efivarfs if the limit is exceeded when reading, we sleep for 10ms.
Signed-off-by: Tony Luck <[email protected]>
---
fs/efivarfs/file.c | 4 ++++
include/linux/sched/user.h | 4 ++++
kernel/user.c | 3 +++
3 files changed, 11 insertions(+)
diff --git a/fs/efivarfs/file.c b/fs/efivarfs/file.c
index 5f22e74bbade..7bcf5b041028 100644
--- a/fs/efivarfs/file.c
+++ b/fs/efivarfs/file.c
@@ -8,6 +8,7 @@
*/
#include <linux/efi.h>
+#include <linux/delay.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/mount.h>
@@ -74,6 +75,9 @@ static ssize_t efivarfs_file_read(struct file *file, char __user *userbuf,
ssize_t size = 0;
int err;
+ if (!__ratelimit(&file->f_cred->user->ratelimit))
+ usleep_range(10000, 10000);
+
err = efivar_entry_size(var, &datasize);
/*
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 0dcf4e480ef7..96fe289c4c6e 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -4,6 +4,7 @@
#include <linux/uidgid.h>
#include <linux/atomic.h>
+#include <linux/ratelimit.h>
struct key;
@@ -41,6 +42,9 @@ struct user_struct {
defined(CONFIG_NET)
atomic_long_t locked_vm;
#endif
+
+ /* Miscellaneous per-user rate limit */
+ struct ratelimit_state ratelimit;
};
extern int uids_sysfs_init(void);
diff --git a/kernel/user.c b/kernel/user.c
index 9a20acce460d..36288d840675 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -101,6 +101,7 @@ struct user_struct root_user = {
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
.uid = GLOBAL_ROOT_UID,
+ .ratelimit = RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
};
/*
@@ -191,6 +192,8 @@ struct user_struct *alloc_uid(kuid_t uid)
new->uid = uid;
atomic_set(&new->__count, 1);
+ ratelimit_state_init(&new->ratelimit, HZ, 100);
+ ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
/*
* Before adding this, check whether we raced
--
2.14.1
On Wed, Feb 21, 2018 at 5:45 PM, Luck, Tony <[email protected]> wrote:
>
> Linus suggested per-user rate limit to solve this.
Note that you also need to serialize per user, because otherwise..
> + if (!__ratelimit(&file->f_cred->user->ratelimit))
> + usleep_range(10000, 10000);
..this doesn't really ratelimit anything, because you can just start a
thousand threads, and they all end up being rate-limited, but they all
just sleep for 10ms each, so you can get a hundred thousand accesses
per second anyway.
To fix that, you can either:
- just make it return -EAGAIN instead of sleeping (which probably
just works fine and doesn't break anything and is simple)
- add a per-user mutex, and do the usleep inside of it, so that
anybody who tries to do a thousand threads will just be serialized by
the mutex.
Note that the mutex needs to be per-user, because otherwise it will be
a DoS for the other users.
Of course, to avoid *another* DoS, the mutex should probably be
interruptible, and return -EAGAIN, so that you don't have a thousand
thread waiting for the mutex and have something that is effectively
unkillable for ten seconds.
Can it be hard and annoying to avoid DoS by rate limiting? Why, yes.
Yes it can.
Linus
> - just make it return -EAGAIN instead of sleeping (which probably
> just works fine and doesn't break anything and is simple)
It is very simple. But it does break things :-(. If I read one of these files
using "dd bs=1", that used to read the whole file (while generating
lots of SMI). With the -EAGAIN it just reads 100 bytes and says:
dd: error reading 'DefSetup-e8a99903-302c-4851-a6be-ab2731873b2f': Resource temporarily unavailable
100+0 records in
100+0 records out
100 bytes copied, 0.153943 s, 0.6 kB/s
> - add a per-user mutex, and do the usleep inside of it, so that
> anybody who tries to do a thousand threads will just be serialized by
> the mutex.
>
> Note that the mutex needs to be per-user, because otherwise it will be
> a DoS for the other users.
I can try that tomorrow (adding the per-user mutex to struct user_struct
right next to the ratelimit I added.
-Tony
"Luck, Tony" <[email protected]> writes:
>> - add a per-user mutex, and do the usleep inside of it, so that
>> anybody who tries to do a thousand threads will just be serialized by
>> the mutex.
>>
>> Note that the mutex needs to be per-user, because otherwise it will be
>> a DoS for the other users.
>
> I can try that tomorrow (adding the per-user mutex to struct user_struct
> right next to the ratelimit I added.
Another possibility is to cache the files in the page cache. That will
reduce re-reads of the same data to the maximum extent.
If efi has a chance of changing variables behind our back we might want
something that would have a timeout on how long the data is cached,
and we probably want to make the caching policy write-trough not
write-back.
I just suggest this as it seems like a much more tried and true solution.
Eric
On Thu, Feb 22, 2018 at 06:11:14AM +0000, Luck, Tony wrote:
>> On Feb 21, 2018, at 21:52, Linus Torvalds wrote:
>>
>> Does the error return actually break real users? Not "I can do did
>> things and it acts differently" things, but actual users...
>
> Probably not. Peter Jones said that efibootmgr might access up to 20
> files. Assuming it is sanely reading in big chunks, it won’t hit the
> rate limit that I set at 100.
Typically each read looks like:
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 3
read(3, "\7\0\0\0", 4) = 4
read(3, "\1\0\0\0b\0t\0e\0s\0t\0\0\0\4\1*\0\1\0\0\0\0\10\0\0\0\0\0\0"..., 4096) = 114
read(3, "", 3982) = 0
close(3) = 0
For each multiple of 4k, you'll see one more read() call. (It's two
reads because libefivar's efi_get_variable() returns the attributes
separately from the data, which goes in an allocation the caller is
responsible for freeing, so doing it as one read means it would
introduce an extra copy.) Looking at that code path, if it *does* get
tripped up by EAGAIN, it should handle it fine, though maybe I should
add a short randomized delay (or just sched_yield()) in that case.
I don't think the 3rd read there to detect EOF hits the efivarfs code,
so that's 2 reads per variable until you go over 4k, which most never
do.
That pattern is true of everything that uses libefivar to do its EFI
variable manipulation. On my moderately typical laptop with 2 boot
variables set, it looks something like:
trillian:~$ strace -o efibootmgr.strace efibootmgr -v
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0000
Boot0000* Fedora HD(1,GPT,2cf5261b-7b98-48c0-ae54-463dbd23e65b,0x800,0x64000)/File(\EFI\fedora\shimx64.efi)
Boot0001* test HD(1,GPT,2cf5261b-7b98-48c0-ae54-463dbd23e65b,0x800,0x64000)/File(\EFI\fedora\shimx64.efi)
trillian:~$ grep '^\(open\|read\)' efibootmgr.strace | grep -A100 sys/firmware | grep -c ^read
15
Which, if I'm write about VFS eating that last read, means 10 calls.
Many machines have some default boot variables; my desktop at home has 5
completely useless variables the firmware sets. So there it winds up
being 27 calls to read(2), and thus 18 calls to count towards our limit.
Your limit at 100 looks sufficiently large to me.
--
Peter
On Wed, 21 Feb 2018 09:03:00 +0000
> The thing I like about rate limiting (for everyone including root) is
> that it does not break anybody's workflow (unless EFI variable access
> occurs on a hot path, in which case you're either a) asking for it, or
> b) the guy trying to DoS us), and that it can make exploitation of any
> potential Spectre v1 vulnerabilities impractical at the same time. At
b) doesn't make spectre v1 much harder alas. What matters there is that
you turn on the right CPU protections before calling into EFI and turn
them off afterwards. EFI firmware internally isn't built with retpoline
anyway.
Alan
On 24 February 2018 at 20:06, Alan Cox <[email protected]> wrote:
> On Wed, 21 Feb 2018 09:03:00 +0000
>> The thing I like about rate limiting (for everyone including root) is
>> that it does not break anybody's workflow (unless EFI variable access
>> occurs on a hot path, in which case you're either a) asking for it, or
>> b) the guy trying to DoS us), and that it can make exploitation of any
>> potential Spectre v1 vulnerabilities impractical at the same time. At
>
> b) doesn't make spectre v1 much harder alas. What matters there is that
> you turn on the right CPU protections before calling into EFI and turn
> them off afterwards. EFI firmware internally isn't built with retpoline
> anyway.
>
Well, that is exactly my concern. The parsing code in the
authenticated variable routines in UEFI may well contain sequences
that are exploitable, due to UEFI's heavy reliance on protocols
involving function pointers, and the fact that a variable update is
structured data (with bounds that may need checking). Also, this code
is highly likely to remain unpatched on many systems, and it usually
appears in the same place in memory regardless of KASLR. However, only
root can call SetVariable(), so perhaps the risk is limited after all?
I had a stab (for arm64) at unmapping the kernel while UEFI calls are
in progress, which is a fairly big hammer, but effective, given that
UEFI itself does not keep any secrets in the first place, and so if
the kernel isn't mapped, there isn't anything to steal to begin with.
Note that on arm64, Spectre v2 should not be a concern, due to the
branch predictor maintenance that takes place when entering the
kernel. But Spectre v1 in UEFI runtime service code is currently
unmitigated, and I wonder what the risk level really is.