Over the weekend my server was acting funny. The display wasn't working
well, and I assumed that a driver was going bad. I went to look at the
kernel dmesg, but the buffer only had the following over and over:
[226062.401405] systemd-logind[3511]: Removed session 4168.
[226063.381051] systemd-logind[3511]: Removed session 4169.
[226232.279412] systemd-logind[3511]: New session 4172 of user rostedt.
[226295.639223] systemd-logind[3511]: Removed session 4172.
[227867.920584] systemd-logind[3511]: New session 4204 of user rostedt.
[227869.016023] systemd-logind[3511]: New session 4205 of user rostedt.
[227927.094215] systemd-logind[3511]: Removed session 4204.
[227927.905655] systemd-logind[3511]: Removed session 4205.
[229740.942811] systemd-logind[3511]: New session 4237 of user rostedt.
[229741.505884] systemd-logind[3511]: New session 4238 of user rostedt.
[229799.710123] systemd-logind[3511]: Removed session 4237.
[229800.668171] systemd-logind[3511]: Removed session 4238.
[229835.378869] systemd-logind[3511]: New session 4240 of user rostedt.
[229898.433560] systemd-logind[3511]: Removed session 4240.
[231429.405715] systemd-logind[3511]: New session 4272 of user rostedt.
[231429.964865] systemd-logind[3511]: New session 4273 of user rostedt.
[231487.908190] systemd-logind[3511]: Removed session 4272.
[231488.861240] systemd-logind[3511]: Removed session 4273.
[233280.032816] systemd-logind[3511]: New session 4306 of user rostedt.
[233280.505022] systemd-logind[3511]: New session 4307 of user rostedt.
[233338.761804] systemd-logind[3511]: Removed session 4306.
[233339.749970] systemd-logind[3511]: Removed session 4307.
[233438.696027] systemd-logind[3511]: New session 4309 of user rostedt.
[233499.959512] systemd-logind[3511]: Removed session 4309.
The kernel buffer was completely overridden by useless spewing from
user space. I know that people consider this a "feature" but to me it's
quite annoying that I constantly have to fight to get kernel messages.
I personally believe that only the kernel should have the right to
write into the kernel log buffers, as user space can easily blow away
any useful kernel information with useless logging.
I simply propose a way to let us kernel developers keep user space from
interfering, by adding a new kernel command line parameter that will
disable writing to /dev/kmsg. Any attempt to open the file in write
mode will return a -EPERM error.
This should have no affect on distros that want to keep the feature of
writing to /dev/kmsg, as it requires a kernel command line to disable.
Signed-off-by: Steven Rostedt <[email protected]>
---
Changes since v1:
Rename the kernel parameter disable_devkmsg_write to
printk.disable_kmsg_write, to be more inline with the other printk
kmsg parameters. (Suggested by Borislav Petkov).
Have the internal parameter be set to "read_mostly" as it's only set
on boot up. (Suggested by Uwe Kleine-König).
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 0b3de80ec8f6..517ebe4bb391 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3111,6 +3111,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
default: disabled
+ printk.disable_kmsg_write
+ Disable writing to /dev/kmsg. This prevents user space
+ tools from writing into the kernel printk buffers.
+ When set, opening /dev/kmsg for write mode will return
+ -EPERM.
+
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bfbf284e4218..e86cede893a9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -85,6 +85,15 @@ static struct lockdep_map console_lock_dep_map = {
};
#endif
+static bool __read_mostly devkmsg_disabled;
+static int __init disable_devkmsg(char *str)
+{
+ devkmsg_disabled = true;
+ return 0;
+}
+__setup("printk.disable_kmsg_write", disable_devkmsg);
+
+
/*
* Number of registered extended console drivers.
*
@@ -799,6 +808,10 @@ static int devkmsg_open(struct inode *inode, struct file *file)
struct devkmsg_user *user;
int err;
+ /* When devkmsg_disabled is set, fail all write access */
+ if (devkmsg_disabled && (file->f_flags & O_ACCMODE))
+ return -EPERM;
+
/* write-only does not need any file context */
if ((file->f_flags & O_ACCMODE) == O_WRONLY)
return 0;
On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> +static bool __read_mostly devkmsg_disabled;
> +static int __init disable_devkmsg(char *str)
> +{
> + devkmsg_disabled = true;
> + return 0;
> +}
> +__setup("printk.disable_kmsg_write", disable_devkmsg);
Again, please default enable and use an easier name to toggle this.
Userspace flooding this with junk is really insane.
On Mon, Apr 25, 2016 at 09:06:51PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> > +static bool __read_mostly devkmsg_disabled;
> > +static int __init disable_devkmsg(char *str)
> > +{
> > + devkmsg_disabled = true;
> > + return 0;
> > +}
> > +__setup("printk.disable_kmsg_write", disable_devkmsg);
>
> Again, please default enable and use an easier name to toggle this.
> Userspace flooding this with junk is really insane.
If you default enable, you break working systems today that want to log
stuff through this interface, sorry.
thanks,
greg k-h
On Mon, Apr 25, 2016 at 12:18:15PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Apr 25, 2016 at 09:06:51PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> > > +static bool __read_mostly devkmsg_disabled;
> > > +static int __init disable_devkmsg(char *str)
> > > +{
> > > + devkmsg_disabled = true;
> > > + return 0;
> > > +}
> > > +__setup("printk.disable_kmsg_write", disable_devkmsg);
> >
> > Again, please default enable and use an easier name to toggle this.
> > Userspace flooding this with junk is really insane.
>
> If you default enable, you break working systems today that want to log
> stuff through this interface, sorry.
Oh, you mean those 'working' systems that livelocked my box because I
had console to serial enabled which could not keep up with the endless
flood of shite?
This systemd exposure has seriously eroded your sanity.
On Mon, Apr 25, 2016 at 12:06 PM, Peter Zijlstra <[email protected]> wrote:
>
> Again, please default enable and use an easier name to toggle this.
> Userspace flooding this with junk is really insane.
I think it should be a tristate with "yes/no/ratelimit", and let's
default to ratelimit.
And I also suspect that we would be better off not returning an error
(which could make user space decide to break, either intentionally or
just because some people think that "error handling is important"
means that you should abort on all errors you don't recognize), but
just silently drop the write. IOW, the "no" would just be a rather
extreme form of rate-limiting, while "yes" would just be the other
extreme.
Linus
On Mon, 25 Apr 2016 12:18:15 -0700
Greg Kroah-Hartman <[email protected]> wrote:
> On Mon, Apr 25, 2016 at 09:06:51PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> > > +static bool __read_mostly devkmsg_disabled;
> > > +static int __init disable_devkmsg(char *str)
> > > +{
> > > + devkmsg_disabled = true;
> > > + return 0;
> > > +}
> > > +__setup("printk.disable_kmsg_write", disable_devkmsg);
> >
> > Again, please default enable and use an easier name to toggle this.
> > Userspace flooding this with junk is really insane.
Peter,
I would love to default enable it, but as Greg said...
>
> If you default enable, you break working systems today that want to log
> stuff through this interface, sorry.
I'm not trying to be confrontational here, as these conversations
usually become. I just want a way to keep my systems from having user
space blowing away logs I care about. This is my way to compromise,
letting distros continue to do things the way they always do, and a way
for us to not be disrupted by it.
I could also add a config that default enables it. Then you just need
to make sure that config is set when you build your kernels. That
doesn't help me as I also want this in kernels that I don't build.
Peter, yes it's a rather long nasty name to remember, but it's
documented in kernel-parameters.txt and once you add it as your default
command line, you should be able to quickly forget about it. That's my
plan.
There's too many parameters already that namespace pollution pretty
much dictates this. :-/
-- Steve
On Mon, Apr 25, 2016 at 12:28 PM, Linus Torvalds
<[email protected]> wrote:
>
> I think it should be a tristate with "yes/no/ratelimit", and let's
> default to ratelimit.
Oh, and in general we should *not* use "negative" settings. It's
really annoying to have to enable things with a double negative.
So even if you want to disable something, you should see it as an
"enable" that is on by default, and you can then disable things by
turning that setting off.
That just makes things way more logical.
(Which is not to say that we haven't made this mistake in the past, but ...)
Linus
On Mon, 25 Apr 2016 12:28:30 -0700
Linus Torvalds <[email protected]> wrote:
> On Mon, Apr 25, 2016 at 12:06 PM, Peter Zijlstra <[email protected]> wrote:
> >
> > Again, please default enable and use an easier name to toggle this.
> > Userspace flooding this with junk is really insane.
>
> I think it should be a tristate with "yes/no/ratelimit", and let's
> default to ratelimit.
Hmm, as this is currently decided on opening, it would be an
interesting task, and more complex. But I could look into it.
>
> And I also suspect that we would be better off not returning an error
> (which could make user space decide to break, either intentionally or
> just because some people think that "error handling is important"
> means that you should abort on all errors you don't recognize), but
> just silently drop the write. IOW, the "no" would just be a rather
> extreme form of rate-limiting, while "yes" would just be the other
> extreme.
>
Actually, systemd currently does the right thing when this errors on
opening. It finds another way to do logging. At least on my boxes that
I tested (Debian testing and Fedora 18). If it doesn't error out,
systemd may decided to continue to use this as its means for general
purpose logging, and now I lose out on the logging of userspace, as
everything that systemd does is silently dropped.
-- Steve
On Mon, 25 Apr 2016 12:28:30 -0700
Linus Torvalds <[email protected]> wrote:
> On Mon, Apr 25, 2016 at 12:06 PM, Peter Zijlstra <[email protected]> wrote:
>
> I think it should be a tristate with "yes/no/ratelimit", and let's
> default to ratelimit.
>
Also note that ratelimit wouldn't have solved my issue either. It
wasn't that systemd was rapidly spewing into the kernel buffers. It was
doing it gradually, for every connection that I made to the box (lots
of automated connections were also being logged). I believe something
happened on Friday but due to all the unnecessary logging over the
weekend, I lost it.
To make matters worse, I lost my /var/log/syslog data from then.
logrotate was a bit more destructive than I expected.
-- Steve
On Mon, 25 Apr 2016 21:24:35 +0200 Peter Zijlstra <[email protected]> wrote:
> On Mon, Apr 25, 2016 at 12:18:15PM -0700, Greg Kroah-Hartman wrote:
> > On Mon, Apr 25, 2016 at 09:06:51PM +0200, Peter Zijlstra wrote:
> > > On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> > > > +static bool __read_mostly devkmsg_disabled;
> > > > +static int __init disable_devkmsg(char *str)
> > > > +{
> > > > + devkmsg_disabled = true;
> > > > + return 0;
> > > > +}
> > > > +__setup("printk.disable_kmsg_write", disable_devkmsg);
> > >
> > > Again, please default enable and use an easier name to toggle this.
> > > Userspace flooding this with junk is really insane.
> >
> > If you default enable, you break working systems today that want to log
> > stuff through this interface, sorry.
>
> Oh, you mean those 'working' systems that livelocked my box because I
> had console to serial enabled which could not keep up with the endless
> flood of shite?
>
> This systemd exposure has seriously eroded your sanity.
I'm trying to remember why we added /dev/kmsg in the first place - it
seems a quite stupid idea. Why the heck should userspace be able to
diddle with the kernel->userspace messaging stream? Sigh.
This:
commit e1b19d6e06a98874018c66990465dcc69b69c4bf
Author: Andrew Morton <[email protected]>
Date: Tue Aug 13 06:12:54 2002 -0700
[PATCH] printk from userspace
The patch allows userspace to issue printk's, via sys_syslog().
The main use of this is within hpa's klibc - initial userspace needs a
way of logging information and this API allows that information to be
captured into the printk ringbuffer. It ends up in /var/log/messages.
Messages are truncated at 1024 characters by printk's vsprintf().
Requires CAP_SYS_ADMIN.
This later got reverted and was replaced with /dev/kmsg and /proc/kmsg.
I guess that's a legitimate enough reason but in retrospect we should
have found a way of just shutting it off after booting has completed.
Steven Rostedt <[email protected]> writes:
> Over the weekend my server was acting funny. The display wasn't working
> well, and I assumed that a driver was going bad. I went to look at the
> kernel dmesg, but the buffer only had the following over and over:
You may also like
https://lkml.org/lkml/2016/4/9/70
Otherwise if you use Chrome the logs totally get spammed every time
you open a new tab. This was the number one log spam cause on my
system.
Unfortunately the audit maintainers totally fail to recognize this
as a fatal flaw :-(
-Andi
--
[email protected] -- Speaking for myself only
On Mon, Apr 25, 2016 at 09:24:35PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 25, 2016 at 12:18:15PM -0700, Greg Kroah-Hartman wrote:
> > On Mon, Apr 25, 2016 at 09:06:51PM +0200, Peter Zijlstra wrote:
> > > On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> > > > +static bool __read_mostly devkmsg_disabled;
> > > > +static int __init disable_devkmsg(char *str)
> > > > +{
> > > > + devkmsg_disabled = true;
> > > > + return 0;
> > > > +}
> > > > +__setup("printk.disable_kmsg_write", disable_devkmsg);
> > >
> > > Again, please default enable and use an easier name to toggle this.
> > > Userspace flooding this with junk is really insane.
> >
> > If you default enable, you break working systems today that want to log
> > stuff through this interface, sorry.
>
> Oh, you mean those 'working' systems that livelocked my box because I
> had console to serial enabled which could not keep up with the endless
> flood of shite?
>
> This systemd exposure has seriously eroded your sanity.
See Andrew's message, this was added because of klibc :)
On Mon, 25 Apr 2016 12:54:37 -0700
Greg Kroah-Hartman <[email protected]> wrote:
> > This systemd exposure has seriously eroded your sanity.
>
> See Andrew's message, this was added because of klibc :)
For the purpose to pass messages from early init through to final boot.
Systemd continued this for the same purpose. Which I agree is totally
legit. But where systemd fails, is that it continues to use this
interface far beyond the need.
I agree with Andrew, we should have had a way to close the pipe after
the system was up and running.
-- Steve
On Mon, Apr 25, 2016 at 1:00 PM, Steven Rostedt <[email protected]> wrote:
>
> For the purpose to pass messages from early init through to final boot.
> Systemd continued this for the same purpose. Which I agree is totally
> legit. But where systemd fails, is that it continues to use this
> interface far beyond the need.
>
> I agree with Andrew, we should have had a way to close the pipe after
> the system was up and running.
I really don't think this should be about open/close, but about rate
limiting writes.
I think we'd be much better off allowing writes during early boot
(when the real root filesystem hasn't been started yet, so logging to
disk doesn't work - at that point the kernel log really is a better
alternative), and then just start throttling it later (possibly very
aggressively).
But the other issue is that once you actually have logging working, I
don't see why you don't just look at the system logs. Yeah, it's not
/var/log/messages any more, but it's not *that* hard to do. Just use
"journalctl -k" instead of dmesg, and you won't be missing data.
This is why I harp on rate limiting, and I think your patch is silly:
it solves the wrong problem (the one that isn't a real problem), and
it does it with a sledgehammer when a flyswatter would be more
appropriate.
Linus
On Mon, 25 Apr 2016 13:23:36 -0700
Linus Torvalds <[email protected]> wrote:
> But the other issue is that once you actually have logging working, I
> don't see why you don't just look at the system logs. Yeah, it's not
> /var/log/messages any more, but it's not *that* hard to do. Just use
> "journalctl -k" instead of dmesg, and you won't be missing data.
There's a bug somewhere with mine:
# journalctl -k
No journal files were found.
-- No entries --
And dmesg is filled with that session crap.
>
> This is why I harp on rate limiting, and I think your patch is silly:
> it solves the wrong problem (the one that isn't a real problem), and
> it does it with a sledgehammer when a flyswatter would be more
> appropriate.
>
Unfortunately, it appears that my logging is broken. I don't have
anything useful in /var/log/ with respect to kernel messages.
Thus when userspace is broken, you are basically shit-out-of-luck.
-- Steve
On Mon, Apr 25, 2016 at 1:34 PM, Steven Rostedt <[email protected]> wrote:
>
> There's a bug somewhere with mine:
>
> # journalctl -k
> No journal files were found.
> -- No entries --
Is that with your patch? If you don't allow kmsg open, maybe systemd
doesn't end up logging for reads either?
That said, the worst part of systemd by far is the insane binary
logging crap, and I would not be surprised if you have scrogged logs.
Quite frankly, everything else in systemd at least has an _excuse_ for
it. The binary logging is a pile of unadulterated shit.
Linus
On Mon, 25 Apr 2016 13:45:25 -0700
Linus Torvalds <[email protected]> wrote:
> On Mon, Apr 25, 2016 at 1:34 PM, Steven Rostedt <[email protected]> wrote:
> >
> > There's a bug somewhere with mine:
> >
> > # journalctl -k
> > No journal files were found.
> > -- No entries --
>
> Is that with your patch? If you don't allow kmsg open, maybe systemd
> doesn't end up logging for reads either?
Nope. This is on Debian testing, with sysvinit and a systemd-shim to
allow other parts to work. Perhaps that's where the breakage is.
-- Steve
* Steven Rostedt <[email protected]> wrote:
> On Mon, 25 Apr 2016 12:28:30 -0700
> Linus Torvalds <[email protected]> wrote:
>
> > On Mon, Apr 25, 2016 at 12:06 PM, Peter Zijlstra <[email protected]> wrote:
> >
> > I think it should be a tristate with "yes/no/ratelimit", and let's
> > default to ratelimit.
> >
>
> Also note that ratelimit wouldn't have solved my issue either. It
> wasn't that systemd was rapidly spewing into the kernel buffers. It was
> doing it gradually, for every connection that I made to the box (lots
> of automated connections were also being logged). I believe something
> happened on Friday but due to all the unnecessary logging over the
> weekend, I lost it.
>
> To make matters worse, I lost my /var/log/syslog data from then.
> logrotate was a bit more destructive than I expected.
I have similar problems with systemd logging. For test kernels I always use the
'debug' option on the boot command line:
BOOT_IMAGE=/boot/bzImage root=/dev/sda2 ro rd.md=0 rd.lvm=0 rd.dm=0
earlyprintk=serial,ttyS0,115200 console=ttyS0,115200 console=tty
SYSFONT=latarcyrheb-sun16 KEYTABLE=us rd.luks=0 LANG=en_US.UTF-8 debug
^^^^^
... to increase the default loglevel to the max, but in this case systemd decides
that it wants to be very chatty, and note how it even mentions this in the comment
(from the latest systemd source code):
static int parse_proc_cmdline_item(const char *key, const char *value) {
/*
* The systemd.log_xyz= settings are parsed by all tools, and
* so is "debug".
*
* However, "quiet" is only parsed by PID 1, and only turns of
* status output to /dev/console, but does not alter the log
* level.
*/
if (streq(key, "debug") && !value)
log_set_max_level(LOG_DEBUG);
else if (streq(key, "systemd.log_target") && value) {
Why systemd does this fully knowing that other software (such as the kernel)
already uses 'debug', while systemd already has the 'system.*' parameter namespace
is left up to the reader...
Thanks,
Ingo
Ingo Molnar <[email protected]> writes:
> static int parse_proc_cmdline_item(const char *key, const char *value) {
>
> /*
> * The systemd.log_xyz= settings are parsed by all tools, and
> * so is "debug".
> *
> * However, "quiet" is only parsed by PID 1, and only turns of
> * status output to /dev/console, but does not alter the log
> * level.
> */
>
> if (streq(key, "debug") && !value)
> log_set_max_level(LOG_DEBUG);
>
> else if (streq(key, "systemd.log_target") && value) {
>
> Why systemd does this fully knowing that other software (such as the kernel)
> already uses 'debug', while systemd already has the 'system.*' parameter namespace
> is left up to the reader...
A couple of links to save that reader from making futile comments:
https://lkml.org/lkml/2014/4/2/415
https://bugs.freedesktop.org/show_bug.cgi?id=76935
Bjørn
On Mon, Apr 25, 2016 at 01:23:36PM -0700, Linus Torvalds wrote:
> I really don't think this should be about open/close, but about rate
> limiting writes.
FWIW, I've taken your ratelimiting change into SLES at the time the
whole "fun" happened and we haven't had any complaints with it so far:
http://kernel.opensuse.org/cgit/kernel-source/commit/?id=bcda8bd37b684cfa63813c255594f32f5b268758
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Tue, Apr 26, 2016 at 12:43:49PM +0200, Bj?rn Mork wrote:
> Ingo Molnar <[email protected]> writes:
>
> > static int parse_proc_cmdline_item(const char *key, const char *value) {
> >
> > /*
> > * The systemd.log_xyz= settings are parsed by all tools, and
> > * so is "debug".
> > *
> > * However, "quiet" is only parsed by PID 1, and only turns of
> > * status output to /dev/console, but does not alter the log
> > * level.
> > */
> >
> > if (streq(key, "debug") && !value)
> > log_set_max_level(LOG_DEBUG);
> >
> > else if (streq(key, "systemd.log_target") && value) {
> >
> > Why systemd does this fully knowing that other software (such as the kernel)
> > already uses 'debug', while systemd already has the 'system.*' parameter namespace
> > is left up to the reader...
>
> A couple of links to save that reader from making futile comments:
> https://lkml.org/lkml/2014/4/2/415
> https://bugs.freedesktop.org/show_bug.cgi?id=76935
Thanks for providing the links, I did remember all that.
Its one of the many reasons why I recommend against running
systemd(isease).
On Mon, Apr 25, 2016 at 12:54:37PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Apr 25, 2016 at 09:24:35PM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 25, 2016 at 12:18:15PM -0700, Greg Kroah-Hartman wrote:
> > > On Mon, Apr 25, 2016 at 09:06:51PM +0200, Peter Zijlstra wrote:
> > > > On Mon, Apr 25, 2016 at 02:56:06PM -0400, Steven Rostedt wrote:
> > > > > +static bool __read_mostly devkmsg_disabled;
> > > > > +static int __init disable_devkmsg(char *str)
> > > > > +{
> > > > > + devkmsg_disabled = true;
> > > > > + return 0;
> > > > > +}
> > > > > +__setup("printk.disable_kmsg_write", disable_devkmsg);
> > > >
> > > > Again, please default enable and use an easier name to toggle this.
> > > > Userspace flooding this with junk is really insane.
> > >
> > > If you default enable, you break working systems today that want to log
> > > stuff through this interface, sorry.
> >
> > Oh, you mean those 'working' systems that livelocked my box because I
> > had console to serial enabled which could not keep up with the endless
> > flood of shite?
> >
> > This systemd exposure has seriously eroded your sanity.
>
> See Andrew's message, this was added because of klibc :)
The initial feature, yes. But systemd ran with it and made it a problem
(as per usual).