2010-12-22 17:00:52

by Dan Rosenberg

[permalink] [raw]
Subject: [PATCH v5] kptr_restrict for hiding kernel pointers

Add the %pK printk format specifier and
the /proc/sys/kernel/kptr_restrict sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have
CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are
printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK
are printed as 0's regardless of privileges. Replacing with 0's was
chosen over the default "(null)", which cannot be parsed by userland %p,
which expects "(nil)".


v5 sets kptr_restrict to a default value of 1, and properly handles the
case where it's incorrectly used in IRQ context.

v4 incorporates Eric Paris' suggestion of using
has_capability_noaudit(), since failing this capability check is not a
policy violation but rather a code path choice and shouldn't generate
potentially excessive log noise. Adjusted IRQ comment for clarity.

v3 adds the "2" setting, cleans up documentation, removes the CONFIG,
and incorporates changes and suggestions from Andrew Morton.

v2 improves checking for inappropriate context, on suggestion by Peter
Zijlstra. Thanks to Thomas Graf for suggesting use of a centralized
format specifier.


Signed-off-by: Dan Rosenberg <[email protected]>
Cc: James Morris <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Thomas Graf <[email protected]>
Cc: Eugene Teo <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Eric Paris <[email protected]>
---
Documentation/sysctl/kernel.txt | 14 ++++++++++++++
include/linux/kernel.h | 2 ++
kernel/sysctl.c | 9 +++++++++
lib/vsprintf.c | 24 ++++++++++++++++++++++++
4 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..8ace8c4 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
- hotplug
- java-appletviewer [ binfmt_java, obsolete ]
- java-interpreter [ binfmt_java, obsolete ]
+- kptr_restrict
- kstack_depth_to_print [ X86 only ]
- l2cr [ PPC only ]
- modprobe ==> Documentation/debugging-modules.txt
@@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If

==============================================================

+kptr_restrict:
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces. When
+kptr_restrict is set to (0), there are no restrictions. When
+kptr_restrict is set to (1), the default, kernel pointers
+printed using the %pK format specifier will be replaced with 0's
+unless the user has CAP_SYSLOG. When kptr_restrict is set to
+(2), kernel pointers printed using %pK will be replaced with 0's
+regardless of privileges.
+
+==============================================================
+
kstack_depth_to_print: (X86 only)

Controls the number of words to print when dumping the raw
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b6de9a6..b4f4863 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,6 +201,8 @@ extern int sscanf(const char *, const char *, ...)
extern int vsscanf(const char *, const char *, va_list)
__attribute__ ((format (scanf, 2, 0)));

+extern int kptr_restrict; /* for sysctl */
+
extern int get_option(char **str, int *pint);
extern char *get_options(const char *str, int nints, int *ints);
extern unsigned long long memparse(const char *ptr, char **retptr);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..236fa91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -713,6 +713,15 @@ static struct ctl_table kern_table[] = {
},
#endif
{
+ .procname = "kptr_restrict",
+ .data = &kptr_restrict,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &two,
+ },
+ {
.procname = "ngroups_max",
.data = &ngroups_max,
.maxlen = sizeof (int),
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..97543b8 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
return string(buf, end, uuid, spec);
}

+int kptr_restrict = 1;
+
/*
* Show a '%p' thing. A kernel extension is that the '%p' is followed
* by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
* Implements a "recursive vsnprintf".
* Do not use this feature without some mechanism to verify the
* correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
*
* Note: The difference between 'S' and 'F' is that on ia64 and ppc64
* function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,27 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
return buf + vsnprintf(buf, end - buf,
((struct va_format *)ptr)->fmt,
*(((struct va_format *)ptr)->va));
+ case 'K':
+ /*
+ * %pK cannot be used in IRQ context because its test
+ * for CAP_SYSLOG would be meaningless.
+ */
+ if (in_irq() || in_serving_softirq() || in_nmi())
+ WARN_ONCE(1, "%%pK used in interrupt context.\n");
+
+ else if (!kptr_restrict)
+ break; /* %pK does not obscure pointers */
+
+ else if ((kptr_restrict != 2) &&
+ has_capability_noaudit(current, CAP_SYSLOG))
+ break; /* privileged apps expose pointers,
+ unless kptr_restrict is 2 */
+
+ if (spec.field_width == -1) {
+ spec.field_width = 2 * sizeof(void *);
+ spec.flags |= ZEROPAD;
+ }
+ return number(buf, end, 0, spec);
}
spec.flags |= SMALL;
if (spec.field_width == -1) {






2010-12-22 17:19:57

by Dan Rosenberg

[permalink] [raw]
Subject: Re: [PATCH v5] kptr_restrict for hiding kernel pointers

On Wed, 2010-12-22 at 12:18 -0500, Dan Rosenberg wrote:
> On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> > * Dan Rosenberg <[email protected]> wrote:
> >
> > > + case 'K':
> > > + /*
> > > + * %pK cannot be used in IRQ context because its test
> > > + * for CAP_SYSLOG would be meaningless.
> > > + */
> > > + if (in_irq() || in_serving_softirq() || in_nmi())
> > > + WARN_ONCE(1, "%%pK used in interrupt context.\n");
> >
> > Hm, that bit looks possibly broken - some useful warning in irq context could print
> > a pointer into the syslog and this would generate a second warning? That probably
> > would crash as it recurses back into the printk code?
> >
>
> The double "%%" acts as an escape and simply prints "%" rather than
> treating it as a format specifier.

I apologize, I misunderstood your point at first glance. I'll consider
this as a potential problem.

-Dan

2010-12-22 17:35:37

by Dan Rosenberg

[permalink] [raw]
Subject: Re: [PATCH v5] kptr_restrict for hiding kernel pointers

On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> * Dan Rosenberg <[email protected]> wrote:
>
> > + case 'K':
> > + /*
> > + * %pK cannot be used in IRQ context because its test
> > + * for CAP_SYSLOG would be meaningless.
> > + */
> > + if (in_irq() || in_serving_softirq() || in_nmi())
> > + WARN_ONCE(1, "%%pK used in interrupt context.\n");
>
> Hm, that bit looks possibly broken - some useful warning in irq context could print
> a pointer into the syslog and this would generate a second warning? That probably
> would crash as it recurses back into the printk code?
>

I don't see a reason to ever use %pK to print to the syslog, since
reading it is now optionally protected with dmesg_restrict, and
stripping pointers from the syslog will cripple any post-mortem
debugging for everyone. I understand the desire to prevent things from
breaking even if it's used incorrectly, but I'm not really convinced
that this would break anything even in this scenario. The WARN_ONCE
will prevent any unbounded recursion. I'm just not clear on how this
could cause a crash.

> Instead a warning could be inserted into the generated output instead, for example
> 'pK-error' (carefully staying within pointer length limits).
>

If it's used in IRQ context and its output needs to be read by a
userspace utility using %p to parse, this will break it.

-Dan

2010-12-22 17:13:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v5] kptr_restrict for hiding kernel pointers


* Dan Rosenberg <[email protected]> wrote:

> + case 'K':
> + /*
> + * %pK cannot be used in IRQ context because its test
> + * for CAP_SYSLOG would be meaningless.
> + */
> + if (in_irq() || in_serving_softirq() || in_nmi())
> + WARN_ONCE(1, "%%pK used in interrupt context.\n");

Hm, that bit looks possibly broken - some useful warning in irq context could print
a pointer into the syslog and this would generate a second warning? That probably
would crash as it recurses back into the printk code?

Instead a warning could be inserted into the generated output instead, for example
'pK-error' (carefully staying within pointer length limits).

Also, it would be nice to see a couple of actual %pK usage sites submitted as well -
instead of this pure infrastructure patch.

Ingo

2010-12-22 17:18:04

by Dan Rosenberg

[permalink] [raw]
Subject: Re: [PATCH v5] kptr_restrict for hiding kernel pointers

On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> * Dan Rosenberg <[email protected]> wrote:
>
> > + case 'K':
> > + /*
> > + * %pK cannot be used in IRQ context because its test
> > + * for CAP_SYSLOG would be meaningless.
> > + */
> > + if (in_irq() || in_serving_softirq() || in_nmi())
> > + WARN_ONCE(1, "%%pK used in interrupt context.\n");
>
> Hm, that bit looks possibly broken - some useful warning in irq context could print
> a pointer into the syslog and this would generate a second warning? That probably
> would crash as it recurses back into the printk code?
>

The double "%%" acts as an escape and simply prints "%" rather than
treating it as a format specifier.

> Instead a warning could be inserted into the generated output instead, for example
> 'pK-error' (carefully staying within pointer length limits).
>
> Also, it would be nice to see a couple of actual %pK usage sites submitted as well -
> instead of this pure infrastructure patch.
>

I did this separately so that any arguments about individual usage
didn't sink the whole ship. Don't worry, you'll get your usage sites
very soon. :)

-Dan

2010-12-22 21:26:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v5] kptr_restrict for hiding kernel pointers


* Dan Rosenberg <[email protected]> wrote:

> On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> > * Dan Rosenberg <[email protected]> wrote:
> >
> > > + case 'K':
> > > + /*
> > > + * %pK cannot be used in IRQ context because its test
> > > + * for CAP_SYSLOG would be meaningless.
> > > + */
> > > + if (in_irq() || in_serving_softirq() || in_nmi())
> > > + WARN_ONCE(1, "%%pK used in interrupt context.\n");
> >
> > Hm, that bit looks possibly broken - some useful warning in irq context could print
> > a pointer into the syslog and this would generate a second warning? That probably
> > would crash as it recurses back into the printk code?
> >
>
> I don't see a reason to ever use %pK to print to the syslog, since
> reading it is now optionally protected with dmesg_restrict, and
> stripping pointers from the syslog will cripple any post-mortem
> debugging for everyone. I understand the desire to prevent things from
> breaking even if it's used incorrectly, but I'm not really convinced
> that this would break anything even in this scenario. The WARN_ONCE
> will prevent any unbounded recursion. I'm just not clear on how this
> could cause a crash.

It's a simple QOI issue. We simply do not add kernel facilities that can produce a
stack overflow, memory corruption and triple fault if a rare debug statement
triggers in an IRQ context by accident:

printk(KERN_WARN "driver bar: bug foo in function %pK\n");

> > Instead a warning could be inserted into the generated output instead, for
> > example 'pK-error' (carefully staying within pointer length limits).
>
> If it's used in IRQ context and its output needs to be read by a
> userspace utility using %p to parse, this will break it.

Didnt you just say that it should not be used from IRQ context? There wont be any
user-space tool to read it - it's a simple robustness change: the warning as you
implemented it can crash the system. I suggested an implementation that would emit
the warning in a more robust way.

Thanks,

Ingo

2010-12-22 21:44:34

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH v5] kptr_restrict for hiding kernel pointers

On Wed, 22 Dec 2010 12:17:59 EST, Dan Rosenberg said:
> On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> > * Dan Rosenberg <[email protected]> wrote:
> >
> > > + case 'K':
> > > + /*
> > > + * %pK cannot be used in IRQ context because its test
> > > + * for CAP_SYSLOG would be meaningless.
> > > + */
> > > + if (in_irq() || in_serving_softirq() || in_nmi())
> > > + WARN_ONCE(1, "%%pK used in interrupt context.\n");
> >
> > Hm, that bit looks possibly broken - some useful warning in irq context could print
> > a pointer into the syslog and this would generate a second warning? That probably
> > would crash as it recurses back into the printk code?

> The double "%%" acts as an escape and simply prints "%" rather than
> treating it as a format specifier.

I think Ingo was more worried about the fact that we're doing a WARN_ONCE which
will generate a call to printk() - while we're in the middle of a printk() already.

So if we hit a 'printk(KERN_INFO "Some blather with a %pK pointer in it",ptr) in irq
context, what we'll get (if we're lucky is:

Some blather with a <50-60 lines of WARN_ONCE output> pointer in it.

If we're unlucky? Well...


Attachments:
(No filename) (227.00 B)