Some setuid binaries will allow reading of files which have read
permission by the real user id. This is problematic with files which
use %pK because the file access permission is checked at open() time,
but the kptr_restrict setting is checked at read() time. If a setuid
binary opens a %pK file as an unprivileged user, and then elevates
permissions before reading the file, then kernel pointer values may be
leaked.
This happens for example with the setuid pppd application on Ubuntu 12.04:
$ head -1 /proc/kallsyms
00000000 T startup_32
$ pppd file /proc/kallsyms
pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
This will only leak the pointer value from the first line, but other
setuid binaries may leak more information.
Fix this by adding a check that in addition to the current process
having CAP_SYSLOG, that effective user and group ids are equal to the
real ids. If a setuid binary reads the contents of a file which uses
%pK then the pointer values will be printed as NULL if the real user
is unprivileged.
Update the sysctl documentation to reflect the changes, and also
correct the documentation to state the kptr_restrict=0 is the default.
Signed-off-by: Ryan Mallon <[email protected]>
---
Changes since v2:
* Fixed typo in comment: 'proccess' -> 'process'
* Use a switch statement for the kptr_restrict values
* Updated the sysctl documentation
Changes since v1:
* Fix the description to say 'vsprintf' instead of 'printk'.
* Clarify the open() vs read() time checks in the patch description
and code comment.
* Remove comment about 'badly written' setuid binaries. This occurs
with setuids binaries which correctly handle opening files.
* Added extra people to the Cc list.
---
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9d4c1d1..d57766e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -290,13 +290,14 @@ Default value is "/sbin/hotplug".
kptr_restrict:
This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces. When
-kptr_restrict is set to (0), there are no restrictions. When
-kptr_restrict is set to (1), the default, kernel pointers
-printed using the %pK format specifier will be replaced with 0's
-unless the user has CAP_SYSLOG. When kptr_restrict is set to
-(2), kernel pointers printed using %pK will be replaced with 0's
-regardless of privileges.
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to (0), the default, there are no restrictions.
+When kptr_restrict is set to (1), kernel pointers printed using the %pK
+format specifier will be replaced with 0's unless the user has CAP_SYSLOG
+and effective user and group ids are equal to the real ids.
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
==============================================================
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 26559bd..d76555c 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -27,6 +27,7 @@
#include <linux/uaccess.h>
#include <linux/ioport.h>
#include <linux/dcache.h>
+#include <linux/cred.h>
#include <net/addrconf.h>
#include <asm/page.h> /* for PAGE_SIZE */
@@ -1312,11 +1313,37 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
spec.field_width = default_width;
return string(buf, end, "pK-error", spec);
}
- if (!((kptr_restrict == 0) ||
- (kptr_restrict == 1 &&
- has_capability_noaudit(current, CAP_SYSLOG))))
+
+ switch (kptr_restrict) {
+ case 0:
+ /* Always print %pK values */
+ break;
+ case 1: {
+ /*
+ * Only print the real pointer value if the current
+ * process has CAP_SYSLOG and is running with the
+ * same credentials it started with. This is because
+ * access to files is checked at open() time, but %pK
+ * checks permission at read() time. We don't want to
+ * leak pointer values if a binary opens a file using
+ * %pK and then elevates privileges before reading it.
+ */
+ const struct cred *cred = current_cred();
+
+ if (!has_capability_noaudit(current, CAP_SYSLOG) ||
+ !uid_eq(cred->euid, cred->uid) ||
+ !gid_eq(cred->egid, cred->gid))
+ ptr = NULL;
+ break;
+ }
+ case 2:
+ default:
+ /* Always print 0's for %pK */
ptr = NULL;
+ break;
+ }
break;
+
case 'N':
switch (fmt[1]) {
case 'F':
On Thu, 2013-10-10 at 08:52 +1100, Ryan Mallon wrote:
> Some setuid binaries will allow reading of files which have read
> permission by the real user id. This is problematic with files which
> use %pK because the file access permission is checked at open() time,
> but the kptr_restrict setting is checked at read() time. If a setuid
> binary opens a %pK file as an unprivileged user, and then elevates
> permissions before reading the file, then kernel pointer values may be
> leaked.
Please review the patch I sent you a little more.
> Fix this by adding a check that in addition to the current process
> having CAP_SYSLOG, that effective user and group ids are equal to the
> real ids. If a setuid binary reads the contents of a file which uses
> %pK then the pointer values will be printed as NULL if the real user
> is unprivileged.
[]
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> @@ -1312,11 +1313,37 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> spec.field_width = default_width;
> return string(buf, end, "pK-error", spec);
> }
Move the interrupt tests and pK-error printk
into case 1:
It's the only case where CAP_SYSLOG needs to be
tested so it doesn't need to be above the switch.
> - if (!((kptr_restrict == 0) ||
> - (kptr_restrict == 1 &&
> - has_capability_noaudit(current, CAP_SYSLOG))))
> +
> + switch (kptr_restrict) {
> + case 0:
> + /* Always print %pK values */
> + break;
> + case 1: {
> + /*
> + * Only print the real pointer value if the current
> + * process has CAP_SYSLOG and is running with the
> + * same credentials it started with. This is because
> + * access to files is checked at open() time, but %pK
> + * checks permission at read() time. We don't want to
> + * leak pointer values if a binary opens a file using
> + * %pK and then elevates privileges before reading it.
> + */
> + const struct cred *cred = current_cred();
> +
> + if (!has_capability_noaudit(current, CAP_SYSLOG) ||
> + !uid_eq(cred->euid, cred->uid) ||
> + !gid_eq(cred->egid, cred->gid))
> + ptr = NULL;
> + break;
> + }
> + case 2:
> + default:
> + /* Always print 0's for %pK */
> ptr = NULL;
> + break;
> + }
> break;
> +
> case 'N':
> switch (fmt[1]) {
> case 'F':
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On 10/10/13 09:00, Joe Perches wrote:
> On Thu, 2013-10-10 at 08:52 +1100, Ryan Mallon wrote:
>> Some setuid binaries will allow reading of files which have read
>> permission by the real user id. This is problematic with files which
>> use %pK because the file access permission is checked at open() time,
>> but the kptr_restrict setting is checked at read() time. If a setuid
>> binary opens a %pK file as an unprivileged user, and then elevates
>> permissions before reading the file, then kernel pointer values may be
>> leaked.
>
> Please review the patch I sent you a little more.
>
>> Fix this by adding a check that in addition to the current process
>> having CAP_SYSLOG, that effective user and group ids are equal to the
>> real ids. If a setuid binary reads the contents of a file which uses
>> %pK then the pointer values will be printed as NULL if the real user
>> is unprivileged.
>
> []
>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> []
>> @@ -1312,11 +1313,37 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
>> spec.field_width = default_width;
>> return string(buf, end, "pK-error", spec);
>> }
>
> Move the interrupt tests and pK-error printk
> into case 1:
>
> It's the only case where CAP_SYSLOG needs to be
> tested so it doesn't need to be above the switch.
Like I said, I think it is useful to do the pK-error check anyway. It is
checking for internal kernel bugs, since if 'pK-error' ever gets
printed, then some kernel code is doing the wrong thing. Therefore, I
think it is useful to print it always (I would argue it even makes sense
when kptr_restrict=0). I decided to just leave that part of the code alone.
~Ryan
On Thu, 2013-10-10 at 09:04 +1100, Ryan Mallon wrote:
> On 10/10/13 09:00, Joe Perches wrote:
[]
> > Move the interrupt tests and pK-error printk
> > into case 1:
> >
> > It's the only case where CAP_SYSLOG needs to be
> > tested so it doesn't need to be above the switch.
>
> Like I said, I think it is useful to do the pK-error check anyway. It is
> checking for internal kernel bugs, since if 'pK-error' ever gets
> printed, then some kernel code is doing the wrong thing.
I think you don't quite understand how kptr_restrict works.
If it's 0, then the ptr value is always emitted naturally.
if it's 2, then the ptr value is always emitted as 0.
> Therefore, I
> think it is useful to print it always (I would argue it even makes sense
> when kptr_restrict=0).
How? Maybe it's me that doesn't quite understand.
On 10/10/13 09:14, Joe Perches wrote:
> On Thu, 2013-10-10 at 09:04 +1100, Ryan Mallon wrote:
>> On 10/10/13 09:00, Joe Perches wrote:
> []
>>> Move the interrupt tests and pK-error printk
>>> into case 1:
>>>
>>> It's the only case where CAP_SYSLOG needs to be
>>> tested so it doesn't need to be above the switch.
>>
>> Like I said, I think it is useful to do the pK-error check anyway. It is
>> checking for internal kernel bugs, since if 'pK-error' ever gets
>> printed, then some kernel code is doing the wrong thing.
>
> I think you don't quite understand how kptr_restrict works.
>
> If it's 0, then the ptr value is always emitted naturally.
> if it's 2, then the ptr value is always emitted as 0.
I understand this.
>
>> Therefore, I
>> think it is useful to print it always (I would argue it even makes sense
>> when kptr_restrict=0).
>
> How? Maybe it's me that doesn't quite understand.
This check:
if (kptr_restrict && (in_irq() || in_serving_softirq() ||
in_nmi())) {
Is making sure that you don't have kernel code doing something like this:
irqreturn_t some_irq_handler(int irq, void *data)
{
struct seq_file *seq = to_seq(data);
seq_printf(seq, "value = %pK\n");
return IRQ_HANDLED;
}
Because that obviously won't work when kptr_restrict=1 (because the
CAP_SYSLOG check is meaningless). However, the code is broken regardless
of the kptr_restrict value. Since the default value of kptr_restrict is
0, this kind of bug can go over-looked because the seq file will print
the pointer value correctly when kptr_restrict=0, and it will correctly
print 0's when kptr_restrict=2, but it will print 'pK-error' when
kptr_restrict=1. Doing the check in all cases makes it more likely that
bugs like this get found. In fact, doing something like:
if (WARN_ON(in_irq() || in_serving_softirq() || in_nmi())) {
Might be better, since that will print a stack-trace showing where the
offending vsprintf is.
~Ryan
On Thu, 2013-10-10 at 09:25 +1100, Ryan Mallon wrote:
> if (kptr_restrict && (in_irq() || in_serving_softirq() ||
> in_nmi())) {
>
> Is making sure that you don't have kernel code doing something like this:
>
> irqreturn_t some_irq_handler(int irq, void *data)
> {
> struct seq_file *seq = to_seq(data);
>
> seq_printf(seq, "value = %pK\n");
> return IRQ_HANDLED;
> }
>
> Because that obviously won't work when kptr_restrict=1 (because the
> CAP_SYSLOG check is meaningless). However, the code is broken regardless
> of the kptr_restrict value.
The only brokenness I see here is that the code doesn't pass
a pointer along with %pK
seq_printf(seq, "value of seq: %pK\n", seq);
> Since the default value of kptr_restrict is
> 0, this kind of bug can go over-looked because the seq file will print
> the pointer value correctly when kptr_restrict=0, and it will correctly
> print 0's when kptr_restrict=2, but it will print 'pK-error' when
> kptr_restrict=1. Doing the check in all cases makes it more likely that
> bugs like this get found. In fact, doing something like:
>
> if (WARN_ON(in_irq() || in_serving_softirq() || in_nmi())) {
>
> Might be better, since that will print a stack-trace showing where the
> offending vsprintf is.
WARN_ON would be potentially _very_ noisy.
Maybe a long period (once a day?) ratelimited dump_stack();
On 10/10/13 09:33, Joe Perches wrote:
> On Thu, 2013-10-10 at 09:25 +1100, Ryan Mallon wrote:
>
>> if (kptr_restrict && (in_irq() || in_serving_softirq() ||
>> in_nmi())) {
>>
>> Is making sure that you don't have kernel code doing something like this:
>>
>> irqreturn_t some_irq_handler(int irq, void *data)
>> {
>> struct seq_file *seq = to_seq(data);
>>
>> seq_printf(seq, "value = %pK\n");
>> return IRQ_HANDLED;
>> }
>>
>> Because that obviously won't work when kptr_restrict=1 (because the
>> CAP_SYSLOG check is meaningless). However, the code is broken regardless
>> of the kptr_restrict value.
>
> The only brokenness I see here is that the code doesn't pass
> a pointer along with %pK
>
> seq_printf(seq, "value of seq: %pK\n", seq);
>
>> Since the default value of kptr_restrict is
>> 0, this kind of bug can go over-looked because the seq file will print
>> the pointer value correctly when kptr_restrict=0, and it will correctly
>> print 0's when kptr_restrict=2, but it will print 'pK-error' when
>> kptr_restrict=1. Doing the check in all cases makes it more likely that
>> bugs like this get found. In fact, doing something like:
>>
>> if (WARN_ON(in_irq() || in_serving_softirq() || in_nmi())) {
>>
>> Might be better, since that will print a stack-trace showing where the
>> offending vsprintf is.
>
> WARN_ON would be potentially _very_ noisy.
> Maybe a long period (once a day?) ratelimited dump_stack();
If it was noisy, it would indicate a bunch of broken kernel code which
needs fixing :-). Anyway, this is really a separate issue to what I am
trying to fix, which is why I left the original code intact. If you want
to change it, post a follow-up patch.
~Ryan
Some setuid binaries will allow reading of files which have read
permission by the real user id. This is problematic with files which
use %pK because the file access permission is checked at open() time,
but the kptr_restrict setting is checked at read() time. If a setuid
binary opens a %pK file as an unprivileged user, and then elevates
permissions before reading the file, then kernel pointer values may be
leaked.
This happens for example with the setuid pppd application on Ubuntu
12.04:
$ head -1 /proc/kallsyms
00000000 T startup_32
$ pppd file /proc/kallsyms
pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
This will only leak the pointer value from the first line, but other
setuid binaries may leak more information.
Fix this by adding a check that in addition to the current process
having CAP_SYSLOG, that effective user and group ids are equal to the
real ids. If a setuid binary reads the contents of a file which uses
%pK then the pointer values will be printed as NULL if the real user
is unprivileged.
Update the sysctl documentation to reflect the changes, and also
correct the documentation to state the kptr_restrict=0 is the default.
Original-patch-by: Ryan Mallon <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
---
On Thu, 2013-10-10 at 09:42 +1100, Ryan Mallon wrote:
> If it was noisy, it would indicate a bunch of broken kernel code which
> needs fixing :-).
Or maybe a single kernel source line but
you'd still have a filled up log file.
Changes in V3a:
Do the in_irq tests only when kptr_restrict is 1.
Document the %pK mechanism in vsnprintf
Add missing documentation for %pV and %pNF too
Documentation/sysctl/kernel.txt | 17 ++++++++--------
lib/vsprintf.c | 43 ++++++++++++++++++++++++++++-------------
2 files changed, 39 insertions(+), 21 deletions(-)
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9d4c1d1..c17d5ca 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -289,14 +289,15 @@ Default value is "/sbin/hotplug".
kptr_restrict:
-This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces. When
-kptr_restrict is set to (0), there are no restrictions. When
-kptr_restrict is set to (1), the default, kernel pointers
-printed using the %pK format specifier will be replaced with 0's
-unless the user has CAP_SYSLOG. When kptr_restrict is set to
-(2), kernel pointers printed using %pK will be replaced with 0's
-regardless of privileges.
+This toggle indicates whether restrictions are placed on exposing kernel
+addresses via /proc and other interfaces.
+
+When kptr_restrict is set to (0), the default, there are no restrictions.
+When kptr_restrict is set to (1), kernel pointers printed using the %pK
+format specifier will be replaced with 0's unless the user has CAP_SYSLOG
+and effective user and group ids are equal to the real ids.
+When kptr_restrict is set to (2), kernel pointers printed using %pK will
+be replaced with 0's regardless of privileges.
==============================================================
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 26559bd..3efcf29 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -27,6 +27,7 @@
#include <linux/uaccess.h>
#include <linux/ioport.h>
#include <linux/dcache.h>
+#include <linux/cred.h>
#include <net/addrconf.h>
#include <asm/page.h> /* for PAGE_SIZE */
@@ -1301,21 +1302,34 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
va_end(va);
return buf;
}
- case 'K':
- /*
- * %pK cannot be used in IRQ context because its test
- * for CAP_SYSLOG would be meaningless.
- */
- if (kptr_restrict && (in_irq() || in_serving_softirq() ||
- in_nmi())) {
- if (spec.field_width == -1)
- spec.field_width = default_width;
- return string(buf, end, "pK-error", spec);
+ case 'K': /* see: Documentation/sysctl/kernel.txt */
+ switch (kptr_restrict) {
+ case 0: /* None (default) */
+ break;
+ case 1: { /* Restricted */
+ const struct cred *cred;
+
+ if (in_irq() || in_serving_softirq() || in_nmi()) {
+ /*
+ * This cannot be used in IRQ context because
+ * the test for CAP_SYSLOG would be meaningless
+ */
+ if (spec.field_width == -1)
+ spec.field_width = default_width;
+ return string(buf, end, "pK-error", spec);
+ }
+ cred = current_cred();
+ if (!has_capability_noaudit(current, CAP_SYSLOG) ||
+ !uid_eq(cred->euid, cred->uid) ||
+ !gid_eq(cred->egid, cred->gid))
+ ptr = NULL;
+ break;
}
- if (!((kptr_restrict == 0) ||
- (kptr_restrict == 1 &&
- has_capability_noaudit(current, CAP_SYSLOG))))
+ case 2: /* Never - Always emit 0 */
+ default:
ptr = NULL;
+ break;
+ }
break;
case 'N':
switch (fmt[1]) {
@@ -1574,6 +1588,9 @@ qualifier:
* %piS depending on sa_family of 'struct sockaddr *' print IPv4/IPv6 address
* %pU[bBlL] print a UUID/GUID in big or little endian using lower or upper
* case.
+ * %pV recurse and output a struct va_format (const char *fmt, va_list *)
+ * %pK output a kernel address or 0 depending on sysctl kptr_restrict
+ * %pNF output a netdev_features_t
* %*ph[CDN] a variable-length hex string with a separator (supports up to 64
* bytes of the input)
* %n is ignored
On 10/10/13 10:09, Joe Perches wrote:
> Changes in V3a:
>
> Do the in_irq tests only when kptr_restrict is 1.
> Document the %pK mechanism in vsnprintf
> Add missing documentation for %pV and %pNF too
I really did mean post a follow-up/separate patch, not a different
version of mine. The missing documentation for %pV and %pNF fix is fine,
but does not belong in this patch. The kptr_restrict pK-error is a
separate issue, my patch deliberately did not touch it because it is
unrelated. If you want to change it, please do so in a seperate patch.
You also removed my comment explaining why the uid/gid check is
necessary :-/.
~Ryan
> Documentation/sysctl/kernel.txt | 17 ++++++++--------
> lib/vsprintf.c | 43 ++++++++++++++++++++++++++++-------------
> 2 files changed, 39 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 9d4c1d1..c17d5ca 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -289,14 +289,15 @@ Default value is "/sbin/hotplug".
>
> kptr_restrict:
>
> -This toggle indicates whether restrictions are placed on
> -exposing kernel addresses via /proc and other interfaces. When
> -kptr_restrict is set to (0), there are no restrictions. When
> -kptr_restrict is set to (1), the default, kernel pointers
> -printed using the %pK format specifier will be replaced with 0's
> -unless the user has CAP_SYSLOG. When kptr_restrict is set to
> -(2), kernel pointers printed using %pK will be replaced with 0's
> -regardless of privileges.
> +This toggle indicates whether restrictions are placed on exposing kernel
> +addresses via /proc and other interfaces.
> +
> +When kptr_restrict is set to (0), the default, there are no restrictions.
> +When kptr_restrict is set to (1), kernel pointers printed using the %pK
> +format specifier will be replaced with 0's unless the user has CAP_SYSLOG
> +and effective user and group ids are equal to the real ids.
> +When kptr_restrict is set to (2), kernel pointers printed using %pK will
> +be replaced with 0's regardless of privileges.
>
> ==============================================================
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 26559bd..3efcf29 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -27,6 +27,7 @@
> #include <linux/uaccess.h>
> #include <linux/ioport.h>
> #include <linux/dcache.h>
> +#include <linux/cred.h>
> #include <net/addrconf.h>
>
> #include <asm/page.h> /* for PAGE_SIZE */
> @@ -1301,21 +1302,34 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> va_end(va);
> return buf;
> }
> - case 'K':
> - /*
> - * %pK cannot be used in IRQ context because its test
> - * for CAP_SYSLOG would be meaningless.
> - */
> - if (kptr_restrict && (in_irq() || in_serving_softirq() ||
> - in_nmi())) {
> - if (spec.field_width == -1)
> - spec.field_width = default_width;
> - return string(buf, end, "pK-error", spec);
> + case 'K': /* see: Documentation/sysctl/kernel.txt */
> + switch (kptr_restrict) {
> + case 0: /* None (default) */
> + break;
> + case 1: { /* Restricted */
> + const struct cred *cred;
> +
> + if (in_irq() || in_serving_softirq() || in_nmi()) {
> + /*
> + * This cannot be used in IRQ context because
> + * the test for CAP_SYSLOG would be meaningless
> + */
> + if (spec.field_width == -1)
> + spec.field_width = default_width;
> + return string(buf, end, "pK-error", spec);
> + }
> + cred = current_cred();
> + if (!has_capability_noaudit(current, CAP_SYSLOG) ||
> + !uid_eq(cred->euid, cred->uid) ||
> + !gid_eq(cred->egid, cred->gid))
> + ptr = NULL;
> + break;
> }
> - if (!((kptr_restrict == 0) ||
> - (kptr_restrict == 1 &&
> - has_capability_noaudit(current, CAP_SYSLOG))))
> + case 2: /* Never - Always emit 0 */
> + default:
> ptr = NULL;
> + break;
> + }
> break;
> case 'N':
> switch (fmt[1]) {
> @@ -1574,6 +1588,9 @@ qualifier:
> * %piS depending on sa_family of 'struct sockaddr *' print IPv4/IPv6 address
> * %pU[bBlL] print a UUID/GUID in big or little endian using lower or upper
> * case.
> + * %pV recurse and output a struct va_format (const char *fmt, va_list *)
> + * %pK output a kernel address or 0 depending on sysctl kptr_restrict
> + * %pNF output a netdev_features_t
> * %*ph[CDN] a variable-length hex string with a separator (supports up to 64
> * bytes of the input)
> * %n is ignored
>
>
On Thu, 2013-10-10 at 10:18 +1100, Ryan Mallon wrote:
> On 10/10/13 10:09, Joe Perches wrote:
> > Do the in_irq tests only when kptr_restrict is 1.
> > Document the %pK mechanism in vsnprintf
> > Add missing documentation for %pV and %pNF too
>
> I really did mean post a follow-up/separate patch, not a different
> version of mine.
I think that's not the right thing to do.
There's no good reason to have multiple commits
for the same content.
And almost all of that is actually stuff I wrote.
Your content there is the cred checks.
> > + cred = current_cred();
> > + if (!has_capability_noaudit(current, CAP_SYSLOG) ||
> > + !uid_eq(cred->euid, cred->uid) ||
> > + !gid_eq(cred->egid, cred->gid))
Joe Perches <[email protected]> writes:
> Some setuid binaries will allow reading of files which have read
> permission by the real user id. This is problematic with files which
> use %pK because the file access permission is checked at open() time,
> but the kptr_restrict setting is checked at read() time. If a setuid
> binary opens a %pK file as an unprivileged user, and then elevates
> permissions before reading the file, then kernel pointer values may be
> leaked.
>
> This happens for example with the setuid pppd application on Ubuntu
> 12.04:
>
> $ head -1 /proc/kallsyms
> 00000000 T startup_32
>
> $ pppd file /proc/kallsyms
> pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
>
> This will only leak the pointer value from the first line, but other
> setuid binaries may leak more information.
>
> Fix this by adding a check that in addition to the current process
> having CAP_SYSLOG, that effective user and group ids are equal to the
> real ids. If a setuid binary reads the contents of a file which uses
> %pK then the pointer values will be printed as NULL if the real user
> is unprivileged.
>
> Update the sysctl documentation to reflect the changes, and also
> correct the documentation to state the kptr_restrict=0 is the default.
Sigh. This is all wrong. The only correct thing to test is
file->f_cred. Aka the capabilities of the program that opened the
file.
Which means that the interface to %pK in the case of kptr_restrict is
broken as it has no way to be passed the information it needs to make
a sensible decision.
So if you all are going to make a great big fuss and clutter up my inbox
can you please figure out how to implement kptr_restrict in a non-buggy
way?
Thank you.
Eric
> Original-patch-by: Ryan Mallon <[email protected]>
> Signed-off-by: Joe Perches <[email protected]>
> ---
> On Thu, 2013-10-10 at 09:42 +1100, Ryan Mallon wrote:
>> If it was noisy, it would indicate a bunch of broken kernel code which
>> needs fixing :-).
>
> Or maybe a single kernel source line but
> you'd still have a filled up log file.
>
> Changes in V3a:
>
> Do the in_irq tests only when kptr_restrict is 1.
> Document the %pK mechanism in vsnprintf
> Add missing documentation for %pV and %pNF too
>
> Documentation/sysctl/kernel.txt | 17 ++++++++--------
> lib/vsprintf.c | 43 ++++++++++++++++++++++++++++-------------
> 2 files changed, 39 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 9d4c1d1..c17d5ca 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -289,14 +289,15 @@ Default value is "/sbin/hotplug".
>
> kptr_restrict:
>
> -This toggle indicates whether restrictions are placed on
> -exposing kernel addresses via /proc and other interfaces. When
> -kptr_restrict is set to (0), there are no restrictions. When
> -kptr_restrict is set to (1), the default, kernel pointers
> -printed using the %pK format specifier will be replaced with 0's
> -unless the user has CAP_SYSLOG. When kptr_restrict is set to
> -(2), kernel pointers printed using %pK will be replaced with 0's
> -regardless of privileges.
> +This toggle indicates whether restrictions are placed on exposing kernel
> +addresses via /proc and other interfaces.
> +
> +When kptr_restrict is set to (0), the default, there are no restrictions.
> +When kptr_restrict is set to (1), kernel pointers printed using the %pK
> +format specifier will be replaced with 0's unless the user has CAP_SYSLOG
> +and effective user and group ids are equal to the real ids.
> +When kptr_restrict is set to (2), kernel pointers printed using %pK will
> +be replaced with 0's regardless of privileges.
>
> ==============================================================
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 26559bd..3efcf29 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -27,6 +27,7 @@
> #include <linux/uaccess.h>
> #include <linux/ioport.h>
> #include <linux/dcache.h>
> +#include <linux/cred.h>
> #include <net/addrconf.h>
>
> #include <asm/page.h> /* for PAGE_SIZE */
> @@ -1301,21 +1302,34 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
> va_end(va);
> return buf;
> }
> - case 'K':
> - /*
> - * %pK cannot be used in IRQ context because its test
> - * for CAP_SYSLOG would be meaningless.
> - */
> - if (kptr_restrict && (in_irq() || in_serving_softirq() ||
> - in_nmi())) {
> - if (spec.field_width == -1)
> - spec.field_width = default_width;
> - return string(buf, end, "pK-error", spec);
> + case 'K': /* see: Documentation/sysctl/kernel.txt */
> + switch (kptr_restrict) {
> + case 0: /* None (default) */
> + break;
> + case 1: { /* Restricted */
> + const struct cred *cred;
> +
> + if (in_irq() || in_serving_softirq() || in_nmi()) {
> + /*
> + * This cannot be used in IRQ context because
> + * the test for CAP_SYSLOG would be meaningless
> + */
> + if (spec.field_width == -1)
> + spec.field_width = default_width;
> + return string(buf, end, "pK-error", spec);
> + }
> + cred = current_cred();
> + if (!has_capability_noaudit(current, CAP_SYSLOG) ||
> + !uid_eq(cred->euid, cred->uid) ||
> + !gid_eq(cred->egid, cred->gid))
> + ptr = NULL;
> + break;
> }
> - if (!((kptr_restrict == 0) ||
> - (kptr_restrict == 1 &&
> - has_capability_noaudit(current, CAP_SYSLOG))))
> + case 2: /* Never - Always emit 0 */
> + default:
> ptr = NULL;
> + break;
> + }
> break;
> case 'N':
> switch (fmt[1]) {
> @@ -1574,6 +1588,9 @@ qualifier:
> * %piS depending on sa_family of 'struct sockaddr *' print IPv4/IPv6 address
> * %pU[bBlL] print a UUID/GUID in big or little endian using lower or upper
> * case.
> + * %pV recurse and output a struct va_format (const char *fmt, va_list *)
> + * %pK output a kernel address or 0 depending on sysctl kptr_restrict
> + * %pNF output a netdev_features_t
> * %*ph[CDN] a variable-length hex string with a separator (supports up to 64
> * bytes of the input)
> * %n is ignored
On 11/10/13 13:20, Eric W. Biederman wrote:
> Joe Perches <[email protected]> writes:
>
>> Some setuid binaries will allow reading of files which have read
>> permission by the real user id. This is problematic with files which
>> use %pK because the file access permission is checked at open() time,
>> but the kptr_restrict setting is checked at read() time. If a setuid
>> binary opens a %pK file as an unprivileged user, and then elevates
>> permissions before reading the file, then kernel pointer values may be
>> leaked.
>>
>> This happens for example with the setuid pppd application on Ubuntu
>> 12.04:
>>
>> $ head -1 /proc/kallsyms
>> 00000000 T startup_32
>>
>> $ pppd file /proc/kallsyms
>> pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
>>
>> This will only leak the pointer value from the first line, but other
>> setuid binaries may leak more information.
>>
>> Fix this by adding a check that in addition to the current process
>> having CAP_SYSLOG, that effective user and group ids are equal to the
>> real ids. If a setuid binary reads the contents of a file which uses
>> %pK then the pointer values will be printed as NULL if the real user
>> is unprivileged.
>>
>> Update the sysctl documentation to reflect the changes, and also
>> correct the documentation to state the kptr_restrict=0 is the default.
>
> Sigh. This is all wrong. The only correct thing to test is
> file->f_cred. Aka the capabilities of the program that opened the
> file.
>
> Which means that the interface to %pK in the case of kptr_restrict is
> broken as it has no way to be passed the information it needs to make
> a sensible decision.
Would it make sense to add a struct file * to struct seq_file and set
that in seq_open? Then the capability check can be done against seq->file.
~Ryan
Ryan Mallon <[email protected]> writes:
> On 11/10/13 13:20, Eric W. Biederman wrote:
>> Joe Perches <[email protected]> writes:
>>
>>> Some setuid binaries will allow reading of files which have read
>>> permission by the real user id. This is problematic with files which
>>> use %pK because the file access permission is checked at open() time,
>>> but the kptr_restrict setting is checked at read() time. If a setuid
>>> binary opens a %pK file as an unprivileged user, and then elevates
>>> permissions before reading the file, then kernel pointer values may be
>>> leaked.
>>>
>>> This happens for example with the setuid pppd application on Ubuntu
>>> 12.04:
>>>
>>> $ head -1 /proc/kallsyms
>>> 00000000 T startup_32
>>>
>>> $ pppd file /proc/kallsyms
>>> pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
>>>
>>> This will only leak the pointer value from the first line, but other
>>> setuid binaries may leak more information.
>>>
>>> Fix this by adding a check that in addition to the current process
>>> having CAP_SYSLOG, that effective user and group ids are equal to the
>>> real ids. If a setuid binary reads the contents of a file which uses
>>> %pK then the pointer values will be printed as NULL if the real user
>>> is unprivileged.
>>>
>>> Update the sysctl documentation to reflect the changes, and also
>>> correct the documentation to state the kptr_restrict=0 is the default.
>>
>> Sigh. This is all wrong. The only correct thing to test is
>> file->f_cred. Aka the capabilities of the program that opened the
>> file.
>>
>> Which means that the interface to %pK in the case of kptr_restrict is
>> broken as it has no way to be passed the information it needs to make
>> a sensible decision.
>
> Would it make sense to add a struct file * to struct seq_file and set
> that in seq_open? Then the capability check can be done against
> seq->file.
It would make most sense to do the capability check at open time,
and cache the result. Doing it generically so that seq_printf could
still use %pK doesn't sound wrong, but it does sound convoluted.
Eric
[email protected] (Eric W. Biederman) wrote:
> Sigh. This is all wrong. The only correct thing to test is
> file->f_cred. Aka the capabilities of the program that opened the
> file.
>
> Which means that the interface to %pK in the case of kptr_restrict is
> broken as it has no way to be passed the information it needs to make
> a sensible decision.
I looked at the code, and pretty painful. Certainly it's possible to
include a reference to the file (I was thinking of just the credentials,
actually) in the seq_file. But getting that to the vsprintf.c code
(specifically, the pointer() function) is a PITA.
I'm willing to accept the currently proposed kludge as a "good enough"
approximation, as long as we're all agreed that using the credentials
at open() time would be The Right Thing, and hopefully someone will find
the round tuitts to implement that in future.
But in the meantime, "the perfect is the enemey of the good" is worth
remembering.
(An alternate implementation I've been thinking about would be to do
away with %pK, and instead have a "secret_ptr(p, seq->cred)" helper that
returned p or 0 depending on the credential.)
On 11/10/13 15:42, George Spelvin wrote:
> [email protected] (Eric W. Biederman) wrote:
>> Sigh. This is all wrong. The only correct thing to test is
>> file->f_cred. Aka the capabilities of the program that opened the
>> file.
>>
>> Which means that the interface to %pK in the case of kptr_restrict is
>> broken as it has no way to be passed the information it needs to make
>> a sensible decision.
>
> I looked at the code, and pretty painful. Certainly it's possible to
> include a reference to the file (I was thinking of just the credentials,
> actually) in the seq_file. But getting that to the vsprintf.c code
> (specifically, the pointer() function) is a PITA.
>
> I'm willing to accept the currently proposed kludge as a "good enough"
> approximation, as long as we're all agreed that using the credentials
> at open() time would be The Right Thing, and hopefully someone will find
> the round tuitts to implement that in future.
>
> But in the meantime, "the perfect is the enemey of the good" is worth
> remembering.
>
> (An alternate implementation I've been thinking about would be to do
> away with %pK, and instead have a "secret_ptr(p, seq->cred)" helper that
> returned p or 0 depending on the credential.)
Yeah, that is probably the best solution. I'll try to put together a
patch series doing this. It will obviously be more involved though, so I
think it is still worth merging the original patch in the interm.
~Ryan
On Fri, 2013-10-11 at 16:19 +1100, Ryan Mallon wrote:
> Yeah, that is probably the best solution. I'll try to put together a
> patch series doing this. It will obviously be more involved though, so I
> think it is still worth merging the original patch in the interm.
I just submitted a patch neatening and shrinking
the original code without the cred changes.
Maybe build on that?
On 11/10/13 15:42, George Spelvin wrote:
> [email protected] (Eric W. Biederman) wrote:
>> Sigh. This is all wrong. The only correct thing to test is
>> file->f_cred. Aka the capabilities of the program that opened the
>> file.
>>
>> Which means that the interface to %pK in the case of kptr_restrict is
>> broken as it has no way to be passed the information it needs to make
>> a sensible decision.
>
> I looked at the code, and pretty painful. Certainly it's possible to
> include a reference to the file (I was thinking of just the credentials,
> actually) in the seq_file. But getting that to the vsprintf.c code
> (specifically, the pointer() function) is a PITA.
>
> I'm willing to accept the currently proposed kludge as a "good enough"
> approximation, as long as we're all agreed that using the credentials
> at open() time would be The Right Thing, and hopefully someone will find
> the round tuitts to implement that in future.
>
> But in the meantime, "the perfect is the enemey of the good" is worth
> remembering.
>
> (An alternate implementation I've been thinking about would be to do
> away with %pK, and instead have a "secret_ptr(p, seq->cred)" helper that
> returned p or 0 depending on the credential.)
I've been looking at this approach. The majority of %pK uses are in
seq_files, so George's suggestion will work fine there.
There are a handful of instances in printk() statements. I don't think
%pK can ever make sense from printk(), because it is basically impossible
to do any sane permission check. If a setuid binary does some action
with elevated which causes printk() to be called then the user might see
leaked kernel pointers. The correct way to prevent this is to use
dmesg_restrict and not allow normal users to see the syslog.
The only remaining problem is kernel/module.c:module_sect_show() which
is used to write the sysfs files in /sys/module/<modname>/sections/.
Those files are actually are really good target for leaking %pK values
via setuid binaries. The problem is that the module_sect_show() function
isn't passed information about who opened the sysfs file. I don't think
this information is available in general for sysfs files either. Also,
I can't actually see how module_sect_show() gets called?
I'm a bit stuck on how to solve this. Any ideas?
~Ryan
Ryan Mallon <[email protected]> writes:
> The only remaining problem is kernel/module.c:module_sect_show() which
> is used to write the sysfs files in /sys/module/<modname>/sections/.
> Those files are actually are really good target for leaking %pK values
> via setuid binaries. The problem is that the module_sect_show() function
> isn't passed information about who opened the sysfs file. I don't think
> this information is available in general for sysfs files either. Also,
> I can't actually see how module_sect_show() gets called?
>
> I'm a bit stuck on how to solve this. Any ideas?
I haven't yet had a chance to review the patches but there are patches
to make sysfs files seq files in Greg's driver core tree.
Eric
On 12/10/13 09:37, Eric W. Biederman wrote:
> Ryan Mallon <[email protected]> writes:
>
>> The only remaining problem is kernel/module.c:module_sect_show() which
>> is used to write the sysfs files in /sys/module/<modname>/sections/.
>> Those files are actually are really good target for leaking %pK values
>> via setuid binaries. The problem is that the module_sect_show() function
>> isn't passed information about who opened the sysfs file. I don't think
>> this information is available in general for sysfs files either. Also,
>> I can't actually see how module_sect_show() gets called?
>>
>> I'm a bit stuck on how to solve this. Any ideas?
>
> I haven't yet had a chance to review the patches but there are patches
> to make sysfs files seq files in Greg's driver core tree.
Hmm, I had a look at the driver-core tree, and although sysfs files
internally use the seq_file interface, the sysfs show/store handlers do
not get passed the struct seq_file, so doesn't appear possible to do the
checks there.
We could add a sysfs_ops->seq_show, but that feels clunky to have two
different interfaces for handling sysfs files. Converting the whole
tree to pass struct seq_file to the sysfs handlers would be painful :-/.
I assume the reason the /sys/module/<modname>/sections/* cannot be made
0400 is that some user-space tools are expecting those files to be
readable by unprivileged users?
~Ryan
On Fri, Oct 11, 2013 at 02:19:14PM +1100, Ryan Mallon wrote:
> On 11/10/13 13:20, Eric W. Biederman wrote:
> > Joe Perches <[email protected]> writes:
> >
> >> Some setuid binaries will allow reading of files which have read
> >> permission by the real user id. This is problematic with files which
> >> use %pK because the file access permission is checked at open() time,
> >> but the kptr_restrict setting is checked at read() time. If a setuid
> >> binary opens a %pK file as an unprivileged user, and then elevates
> >> permissions before reading the file, then kernel pointer values may be
> >> leaked.
> >>
> >> This happens for example with the setuid pppd application on Ubuntu
> >> 12.04:
> >>
> >> $ head -1 /proc/kallsyms
> >> 00000000 T startup_32
> >>
> >> $ pppd file /proc/kallsyms
> >> pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
> >>
> >> This will only leak the pointer value from the first line, but other
> >> setuid binaries may leak more information.
> >>
> >> Fix this by adding a check that in addition to the current process
> >> having CAP_SYSLOG, that effective user and group ids are equal to the
> >> real ids. If a setuid binary reads the contents of a file which uses
> >> %pK then the pointer values will be printed as NULL if the real user
> >> is unprivileged.
> >>
> >> Update the sysctl documentation to reflect the changes, and also
> >> correct the documentation to state the kptr_restrict=0 is the default.
> >
> > Sigh. This is all wrong. The only correct thing to test is
> > file->f_cred. Aka the capabilities of the program that opened the
> > file.
> >
> > Which means that the interface to %pK in the case of kptr_restrict is
> > broken as it has no way to be passed the information it needs to make
> > a sensible decision.
>
> Would it make sense to add a struct file * to struct seq_file and set
> that in seq_open? Then the capability check can be done against seq->file.
For the "add a struct file * to struct seq_file" and set it during
seq_open(), It was proposed by Linus, but Al Viro didn't like it:
https://lkml.org/lkml/2013/9/25/765
I'm not sure if this will work for you: you can make seq_file->private
cache some data, by calling single_open()... at ->open(), later check it
during read()...
As noted by Eric, I'll also go for the capability check at ->open(), if it
does not break some userspace. BTW the CAP_SYSLOG check should do the job
Checks during read() are not sufficient, since the design allows passing
file descriptors and dup() stdin/stdout of suid-execve.
IMO: unprivileged code should not get that file descriptor, so ->open()
should fail.
If this will break userspace then allow open() and cache result for read()
Can you emulate the behaviour of kptr_restrict=1 ? If so:
1) perform check during open() and cache data
2) during read() check kptr_restrict==1
check the cached value and if opener had CAP_SYSLOG if so:
print something like this: 00000000 T startup_32
All this without modifying vsprintf, I mean just do the checks outside
vsprintf() inside your ->read()
Just my 2 cents
Please cc:me, Thanks
--
Djalal Harouni
http://opendz.org
On Mon, Oct 14, 2013 at 11:17:06AM +0100, Djalal Harouni wrote:
> On Fri, Oct 11, 2013 at 02:19:14PM +1100, Ryan Mallon wrote:
> > On 11/10/13 13:20, Eric W. Biederman wrote:
> > > Joe Perches <[email protected]> writes:
> > >
> > >> Some setuid binaries will allow reading of files which have read
> > >> permission by the real user id. This is problematic with files which
> > >> use %pK because the file access permission is checked at open() time,
> > >> but the kptr_restrict setting is checked at read() time. If a setuid
> > >> binary opens a %pK file as an unprivileged user, and then elevates
> > >> permissions before reading the file, then kernel pointer values may be
> > >> leaked.
> > >>
> > >> This happens for example with the setuid pppd application on Ubuntu
> > >> 12.04:
> > >>
> > >> $ head -1 /proc/kallsyms
> > >> 00000000 T startup_32
> > >>
> > >> $ pppd file /proc/kallsyms
> > >> pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
> > >>
> > >> This will only leak the pointer value from the first line, but other
> > >> setuid binaries may leak more information.
> > >>
> > >> Fix this by adding a check that in addition to the current process
> > >> having CAP_SYSLOG, that effective user and group ids are equal to the
> > >> real ids. If a setuid binary reads the contents of a file which uses
> > >> %pK then the pointer values will be printed as NULL if the real user
> > >> is unprivileged.
> > >>
> > >> Update the sysctl documentation to reflect the changes, and also
> > >> correct the documentation to state the kptr_restrict=0 is the default.
> > >
> > > Sigh. This is all wrong. The only correct thing to test is
> > > file->f_cred. Aka the capabilities of the program that opened the
> > > file.
> > >
> > > Which means that the interface to %pK in the case of kptr_restrict is
> > > broken as it has no way to be passed the information it needs to make
> > > a sensible decision.
> >
> > Would it make sense to add a struct file * to struct seq_file and set
> > that in seq_open? Then the capability check can be done against seq->file.
> For the "add a struct file * to struct seq_file" and set it during
> seq_open(), It was proposed by Linus, but Al Viro didn't like it:
> https://lkml.org/lkml/2013/9/25/765
>
> I'm not sure if this will work for you: you can make seq_file->private
> cache some data, by calling single_open()... at ->open(), later check it
> during read()...
>
>
> As noted by Eric, I'll also go for the capability check at ->open(), if it
> does not break some userspace. BTW the CAP_SYSLOG check should do the job
>
> Checks during read() are not sufficient, since the design allows passing
> file descriptors and dup() stdin/stdout of suid-execve.
>
>
> IMO: unprivileged code should not get that file descriptor, so ->open()
> should fail.
> If this will break userspace then allow open() and cache result for read()
>
>
> Can you emulate the behaviour of kptr_restrict=1 ? If so:
> 1) perform check during open() and cache data
> 2) during read() check kptr_restrict==1
> check the cached value and if opener had CAP_SYSLOG if so:
> print something like this: 00000000 T startup_32
Sorry, I mean if the opener didn't have CAP_SYSLOG
--
Djalal Harouni
http://opendz.org
On 14/10/13 21:17, Djalal Harouni wrote:
> On Fri, Oct 11, 2013 at 02:19:14PM +1100, Ryan Mallon wrote:
>> On 11/10/13 13:20, Eric W. Biederman wrote:
>>> Joe Perches <[email protected]> writes:
>>>
>>>> Some setuid binaries will allow reading of files which have read
>>>> permission by the real user id. This is problematic with files which
>>>> use %pK because the file access permission is checked at open() time,
>>>> but the kptr_restrict setting is checked at read() time. If a setuid
>>>> binary opens a %pK file as an unprivileged user, and then elevates
>>>> permissions before reading the file, then kernel pointer values may be
>>>> leaked.
>>>>
>>>> This happens for example with the setuid pppd application on Ubuntu
>>>> 12.04:
>>>>
>>>> $ head -1 /proc/kallsyms
>>>> 00000000 T startup_32
>>>>
>>>> $ pppd file /proc/kallsyms
>>>> pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
>>>>
>>>> This will only leak the pointer value from the first line, but other
>>>> setuid binaries may leak more information.
>>>>
>>>> Fix this by adding a check that in addition to the current process
>>>> having CAP_SYSLOG, that effective user and group ids are equal to the
>>>> real ids. If a setuid binary reads the contents of a file which uses
>>>> %pK then the pointer values will be printed as NULL if the real user
>>>> is unprivileged.
>>>>
>>>> Update the sysctl documentation to reflect the changes, and also
>>>> correct the documentation to state the kptr_restrict=0 is the default.
>>>
>>> Sigh. This is all wrong. The only correct thing to test is
>>> file->f_cred. Aka the capabilities of the program that opened the
>>> file.
>>>
>>> Which means that the interface to %pK in the case of kptr_restrict is
>>> broken as it has no way to be passed the information it needs to make
>>> a sensible decision.
>>
>> Would it make sense to add a struct file * to struct seq_file and set
>> that in seq_open? Then the capability check can be done against seq->file.
> For the "add a struct file * to struct seq_file" and set it during
> seq_open(), It was proposed by Linus, but Al Viro didn't like it:
> https://lkml.org/lkml/2013/9/25/765
>
> I'm not sure if this will work for you: you can make seq_file->private
> cache some data, by calling single_open()... at ->open(), later check it
> during read()...
>
>
> As noted by Eric, I'll also go for the capability check at ->open(), if it
> does not break some userspace. BTW the CAP_SYSLOG check should do the job
Yes, it has already been agreed on that open() time is the correct place to
do the check, and that either the check value should be cached in the
struct seq_file or the struct cred/file should be kept so that the check
can be done later. I think caching the result is actually better since it
removes the whole in_irq() problem also.
The problem I have at the moment is handling the case for
/sys/module<modname>/sections/*, which are seq_files in Greg's
driver-core tree, but do not actually pass the struct seq_file to the show
function, which makes it impossible to check what the open() time
permissions were.
> Checks during read() are not sufficient, since the design allows passing
> file descriptors and dup() stdin/stdout of suid-execve.
>
>
> IMO: unprivileged code should not get that file descriptor, so ->open()
> should fail.
> If this will break userspace then allow open() and cache result for read()
>
>
> Can you emulate the behaviour of kptr_restrict=1 ? If so:
> 1) perform check during open() and cache data
> 2) during read() check kptr_restrict==1
> check the cached value and if opener had CAP_SYSLOG if so:
> print something like this: 00000000 T startup_32
>
> All this without modifying vsprintf, I mean just do the checks outside
> vsprintf() inside your ->read()
Again, this has already been agreed on. As suggested by George I have
a function called seq_kernel_pointer(), which returns either the real
pointer or NULL based on the cached check at seq_open(), so you do
prints like:
seq_printf(seq, "secret value = %p\n", seq_kernel_pointer(seq, ptr));
Once I figure out how to resolve the module sections case, I will post
a patch series.
~Ryan