Sorry to drag up an old thread again, but Larry Kessler and I have been
having some conversations about this subject over the last week, and I
said I'd post the conclusions we came to. There's a reference to the more
general plans for Event Logging at: http://evlog.sourceforge.net/
1. Making any significant changes to the way we call existing printk subsystem
is going to be extremely unpopular - the sheer bulk of changes needed to
make this kind of update, the mindshift for future developers, and the number
of patches that aren't yet in the mainline kernel that we'd break makes this
unfeasible.
2. Making any significant changes to the way we log existing messages into
/var/log/messages et al via syslog is going to be extremely unpopular - it
will break sysadmin's setups and log parsing scripts.
3. People want to be able to get better debug information, with more detail
than is currently possible with printk, hence the Event logging project.
It's unrealistic to expect wholesale conversion to the new event logging
calls, though some drivers and kernel areas could be converted.
Given these constraints, it seems like it may be best to leave the printk
subsystem more or less "as is", add the evlog subsystem as a seperate
entity, whilst adding hooks to printk to capture messages into the evlog
subsystem as well. This could be reasonably easily implemented by
renaming the existing printk call to printk_raw, and creating a new printk
macro along the lines of:
# ifdef CONFIG_EVLOG
#define printk printk_evlog
# else
#define printk printk_raw
# endif.
where printk_evlog calls printk_raw, then logs an "enhanced" version of
the printk message to the *event logging* subsystem (not /var/log/messages),
including process PID (0 or -1 if in_interrupt() ), file, line number, function,
cpu number, accurate time stamp, etc, etc.
Some people were worried about buffer usage, I'd suggest these enhanced
messages go into the evlog buffer, rather than the existing printk buffer,
which should avoid overflow problems.
Some printk's are not "line atomic" (5845 cases) - ie they do something like this:
"for (i=0; i<10; i++) { printk ("%d ", i); } printk("\n");" - this causes two problems:
1. There seems to me to be a race within the current SMP code (with or without
the event logging stuff). It seems that segments of the line could get interspersed
with segments of another line (or a whole other line) generated by another cpu ...
is this correct?
2. If we have event logging enabled we don't want to log a heavyweight message
for each little piece of the printk - we really want to have one per "event".
Unfortunately, I can't see how to designate an "event" under the existing infrastructure
of printk, but we can get close by breaking things up per line of output.
The solution would be similar for both problems mentioned above: we need to
buffer per line until we have a complete line, then flush the buffer out with the
normal calls. The question is then where to store the buffer, or rather how many
buffers to create. As we potentially have a pre-emptible kernel, I think we need
to make a per task buffer (potentially just on the task's kernel stack?). The only
case I can think of where this might go wrong is during an interrupt handler ...
as long as we cannot get preempted during an interrupt handler, I think we're OK
here too.
There are apparently about 20 cases where we do something like this:
printk("generic options"
# ifdef AUTOPROBE_IRQ
"AUTOPROBE_IRQ"
# else
"AUTOSENSE"
# endif
);
I'd propose we just unwrap these to:
# ifdef AUTOPROBE_IRQ
printk("generic options" "AUTOPROBE_IRQ");
# else
printk("generic options" "AUTOSENSE");
# endif
which seems simple enough.
Hopefully this all makes sense, I know much of this has been stated before,
but it seems useful to pull together the current set of thoughts in one summary.
M.
"Martin J. Bligh" wrote:
>
>> ...
> 1. There seems to me to be a race within the current SMP code (with or without
> the event logging stuff). It seems that segments of the line could get interspersed
> with segments of another line (or a whole other line) generated by another cpu ...
> is this correct?
>
I'm not aware of that being the case. The output string
is formed into a static buffer and then copied into the
printk ring buffer all under spinlock_irqsave(logbuf_lock).
If there is something wrong then it would be occurring
at the other end - where data is taken out of the ring
and is sent to the console device(s). The locking there
is OK, I think?
-
> I'm not aware of that being the case. The output string
> is formed into a static buffer and then copied into the
> printk ring buffer all under spinlock_irqsave(logbuf_lock).
>
> If there is something wrong then it would be occurring
> at the other end - where data is taken out of the ring
> and is sent to the console device(s). The locking there
> is OK, I think?
I think we're talking about slightly different things. I'd agree that
one call to printk is atomic, and won't get interspersed with other
things, but if we output a line via multiple calls to printk, then I
think we have a problem. Say CPU 0 executes this bit of code:
for (i=0; i<10; i++) { printk ("%d ", i); } printk("\n");
and CPU 1 does "printk("hello\n");" then instead of getting either
0 1 2 3 4 5 6 7 8 9
hello
or
hello
0 1 2 3 4 5 6 7 8 9
either of which would be fine, we may get
0 1 2 3 hello
4 5 6 7 8 9
which I don't think is fine - obviously the example is somewhat
trite, but with the real instances of things that build output for one
line through multiple calls to printk, you can get unreadable garbage,
if I read the code correctly ?
M.
"Martin J. Bligh" wrote:
>
> ...
> for (i=0; i<10; i++) { printk ("%d ", i); } printk("\n");
>
> and CPU 1 does "printk("hello\n");" then instead of getting either
>
> 0 1 2 3 4 5 6 7 8 9
> hello
>
> or
>
> hello
> 0 1 2 3 4 5 6 7 8 9
>
> either of which would be fine, we may get
>
> 0 1 2 3 hello
> 4 5 6 7 8 9
>
> which I don't think is fine - obviously the example is somewhat
> trite, but with the real instances of things that build output for one
> line through multiple calls to printk, you can get unreadable garbage,
> if I read the code correctly ?
Ah. Yes, that will definitely happen. We only have atomicity
at the level of a single printk call.
It would be feasible to introduce additional locking so that
multiple printks can be made atomic. This should be resisted
though - printk needs to be really robust, and needs to have
a good chance of working even when the machine is having hysterics.
It's already rather complex.
For the rare cases which you cite we can use a local staging
buffer and sprintf, or just live with it, I suspect.
-
> Ah. Yes, that will definitely happen. We only have atomicity
> at the level of a single printk call.
>
> It would be feasible to introduce additional locking so that
> multiple printks can be made atomic. This should be resisted
> though - printk needs to be really robust, and needs to have
> a good chance of working even when the machine is having hysterics.
> It's already rather complex.
>
> For the rare cases which you cite we can use a local staging
> buffer and sprintf, or just live with it, I suspect.
Right - what I'm proposing would be a generic equivalent of the
local staging buffer and sprintf - basically just a little wrapper
that does this for you, keeping a per task buffer somewhere.
The reason I want to do it like this, rather than what you suggest,
is that there are over 5000 of these "rare cases" of a printk without
a newline, according to the IBM RAS group's code search ;-) I don't
fancy changing that for 5000 instances (obviously some of those are
grouped together, but the count is definitely non-trivial). I'd
attach the report they sent me, but it's 657K long ;-)
M.
On 8 April 2002 21:18, Martin J. Bligh wrote:
[snip]
> where printk_evlog calls printk_raw, then logs an "enhanced" version of
> the printk message to the *event logging* subsystem (not
> /var/log/messages), including process PID (0 or -1 if in_interrupt() ),
> file, line number, function, cpu number, accurate time stamp, etc, etc.
[snip]
As I understand, Linus accepts new features only if they are improving kernel
in some vital area significantly (for example, Ingo's new scheduler).
If he has a feeling that existing subsystem is adequate, he is unlikely to
take replacements and intrusive enhancements (example: kbuild 2.5).
Something along the lines "it is does not broke _enough_, don't rewrite it".
(that's only IMHO, Linus didn't say it AFAIK)
You'll need to show that "enhanced" printk/evlog is significant improvement
and is worth the trouble. That won't be easy.
That said, I wish you luck.
--
vda
> People want to be able to get better debug information, with more detail
> than is currently possible with printk, hence the Event logging project.
I would like to emphasize that point; logging and tracing is of prime
importance in several areas:
- Telecom and Security. Logging and auditing is a requirement in Telecom
applications. It is for instance one of the important features of the
Distributed Security Infrastructure proposed by Ericsson Research and to
be presented in a couple of months at the Ottawa Linux Symposium.
The OSDL "Carrier Grade Linux" is certainly no different.
- Real time systems. A low overhead trace is often the only way to debug
these systems. Lineo, Monte-Vista, and FSM labs all have logging solutions,
most based on the Linux Trace Toolkit (http://www.opersys.com/LTT). It would be
relatively easy to merge the best features of these two systems, present
on the 2.5 status list, (high volume low overhead of LTT and event
templates and filtering of EVLOG).
- Kernel/device drivers debugging.
- Detailed performance analysis. Using the Linux Trace Toolkit, it is possible
to extract very detailed information like the time spent by a process
executing, executing on behalf of client x, waiting for file y
(read/page fault), waiting for process z...
The current printk simply does not cut it for these applications. There are
over 80000 printk statements in the kernel, using many different conventions.
A few tens of driver specific tracing facilities (SCSI, ftape, wireless...)
were implemented each with its own mechanism to compile/not the debugging
statements, trigger massive debugging output at runtime...
The EVLOG proposition strikes me as a good balance between solid features,
simplicity, and ease of integration/transition with the current printk.
Here are some of the advantages of more structured logging:
- More consistent activation mechanisms for logging points
troughout the kernel at configuration/compilation time and at runtime.
- Structured data events for which it is easier to apply filtering, querying,
analysis and detection tools.
- More compact format, when desired, where data and text descriptions are
separated. This facilitates high volume applications (lower logging
overhead, smaller files) and also enables customization/i18n of the static
text descriptions.
- In its current configuration, klogd rapidly looses events under high volumes.
The Linux Trace Toolkit with its zero-copy, kernel-daemon shared memory,
does much better under heavy debugging/tracing output.
On Mon, 8 Apr 2002, Martin J. Bligh wrote:
> I think we're talking about slightly different things. I'd agree that
> one call to printk is atomic, and won't get interspersed with other
> things, but if we output a line via multiple calls to printk, then I
> think we have a problem. Say CPU 0 executes this bit of code:
>
> for (i=0; i<10; i++) { printk ("%d ", i); } printk("\n");
>
> and CPU 1 does "printk("hello\n");" then instead of getting either
>
> 0 1 2 3 4 5 6 7 8 9
> hello
>
> or
>
> hello
> 0 1 2 3 4 5 6 7 8 9
>
> either of which would be fine, we may get
>
> 0 1 2 3 hello
> 4 5 6 7 8 9
>
> which I don't think is fine - obviously the example is somewhat
> trite, but with the real instances of things that build output for one
> line through multiple calls to printk, you can get unreadable garbage,
> if I read the code correctly ?
You read the code correctly, and it follows the intent just fine. When
doing the printk I don't believe you want to introduce any delay in the
output, since often the system is in deep shit by the time the printk
starts.
If you want buffering you can add it on a case-by-case basis, but in
general I don't believe you do want a delay, because the output might be
lost on a dying system. Prink works like output to stderr, character
buffered. I would think a change to anything this fundimental would be a
Linus decision, but I think it's correct as is.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
> As I understand, Linus accepts new features only if they are improving
> kernel in some vital area significantly (for example, Ingo's new
> scheduler).
I think that's more true of 2.4 than 2.5, but a change should indeed
make some improvement to be accepted. What seems to be more vital
is that the cost:benefit ratio is advantageous ... much of the
discussions that Larry and I were having were oriented to keeping
the cost very low indeed ... if you didn't turn on event logging,
the cost would be pretty much 0 (just those macro unwraps).
> You'll need to show that "enhanced" printk/evlog is significant
> improvement and is worth the trouble. That won't be easy.
Having worked in Customer service for a long time on high end
systems, I know that having accurate error logging is critical
to fast problem diagnosis. Look at the other extreme which people
like Microsoft drift towards ... I hate it when you get error
messages like "Error. Something is wrong somewhere".
If the open source methodology of "many eyes fix bugs" is to work
as well as possible, we need to help those eyes to see as clearly
as possible.
M.
> If you want buffering you can add it on a case-by-case basis, but in
> general I don't believe you do want a delay, because the output might be
> lost on a dying system. Prink works like output to stderr, character
> buffered. I would think a change to anything this fundimental would be a
> Linus decision, but I think it's correct as is.
OK, now try to read the panic output when two cpus panic at once ;-)
Been there, sworn vehemently at that ...
There's no point in logging messages if you can't read them afterwards.
I think 99.99% of such cases would not involve printk printing half
a buffered line, then dying, though I admit it's technically possible.
Of course, we could just do this buffering for the event logging half
of the subsystem if people really object. Personally, I think it's a
win to fix printk whilst we're at it, but a half-fix is always an
option.
M.
On Tue, 09 Apr 2002 07:42:32 -0700, "Martin J. Bligh"
<[email protected]> wrote:
>> As I understand, Linus accepts new features only if they are improving
>> kernel in some vital area significantly (for example, Ingo's new
>> scheduler).
>
>I think that's more true of 2.4 than 2.5, but a change should indeed
>make some improvement to be accepted. What seems to be more vital
>is that the cost:benefit ratio is advantageous ... much of the
>discussions that Larry and I were having were oriented to keeping
>the cost very low indeed ... if you didn't turn on event logging,
>the cost would be pretty much 0 (just those macro unwraps).
>
What I have observed is that Linus tends to accept some major changes
that have achieved a certain consensus. ReiserFS is a good example. It
was in wide use and was being shipped by SUSE... not universal but
there was significant usage. Other good ideas with no significant
usage languish unless it is something interesting to Linus himself.
john
On Tue, 2002-04-09 at 07:21, Michel Dagenais wrote:
>
> The current printk simply does not cut it for these applications. There are
> over 80000 printk statements in the kernel, using many different conventions.
It would be easier, to fix the printk's, than to put evlogging into any
particular piece of the kernel.
> A few tens of driver specific tracing facilities (SCSI, ftape, wireless...)
> were implemented each with its own mechanism to compile/not the debugging
> statements, trigger massive debugging output at runtime...
>
this could also be fixed as easily as adding evloging.
> The EVLOG proposition strikes me as a good balance between solid features,
> simplicity, and ease of integration/transition with the current printk.
>
Evlog side-by-side with printk adds significat bloat.
> Here are some of the advantages of more structured logging:
>
> - More consistent activation mechanisms for logging points
> troughout the kernel at configuration/compilation time and at runtime.
I'm not sure what you are saying here, but if I understand you, this is
a problem with the printk calls written by the author. There is nothing
in evlog that will make this significatly easier for evlog as opposed to
evlog, and now you have two almost identical mechanisims to maintain.
>
> - Structured data events for which it is easier to apply filtering, querying,
> analysis and detection tools.
this is a post processing problem. To speak to Martins issue of
interspersed messages, it would be easy enough, to make sure that each
call to printk starts with a new line, and prepends id information to
sort the messages. What I hear you asking for, is to make it more of
the kernels responsibilty easing the problem of analysing the out put,
as opposed to making that the responsibilty of user space
postprocessing. One thing to keep in mind, 99% of logged messages will
never be reviewed.
>
> - More compact format, when desired, where data and text descriptions are
> separated. This facilitates high volume applications (lower logging
> overhead, smaller files) and also enables customization/i18n of the static
> text descriptions.
But poorly implemented, a new format will in pratice increase the
volume, and with the increased complexity of the logging also slowing
down, logging will be slower, and more messages will be lost. This has
been seen in pratice.
>
> - In its current configuration, klogd rapidly looses events under high volumes.
> The Linux Trace Toolkit with its zero-copy, kernel-daemon shared memory,
> does much better under heavy debugging/tracing output.
I would prefer to see effort expended on fixing printk/klogd...off the
top of my head:
- make printk a macro that prepends file/function/line to the message.
- fix printk calls: messages with consistent format, calls in the right
places, with the "correct" information.
- postprocessing tools for analysing the logs.
I would say that this is probably less work than implementing evlog,
much less work to maintain, and provide generally better performance.
>> The current printk simply does not cut it for these applications. There are
>> over 80000 printk statements in the kernel, using many different conventions.
>
> It would be easier, to fix the printk's, than to put evlogging into any
> particular piece of the kernel.
You want to fix 80000 printks? Be my guest ... I await your patch eagerly.
If you mean changing printk with a wrapper to help clean up the existing
mess in an automated fashion, that's exactly what's being proposed.
> Evlog side-by-side with printk adds significat bloat.
To what? A kernel with event logging switch on? Sure.
But if you don't want it, don't turn it on. If it's a config option, I don't
see why anyone would care.
> What I hear you asking for, is to make it more of
> the kernels responsibilty easing the problem of analysing the out put,
> as opposed to making that the responsibilty of user space
> postprocessing.
Indeed. That's because the kernel has more context, and can trivially
log the information it has, rather then reverse engineering it from user space.
Why munge all the messages to post them through a tiny little formatting
hole, and then try to unmunge them all again on the other side with a
bunch of hokey scripts?
> One thing to keep in mind, 99% of logged messages will
> never be reviewed.
If we had a more structured log format, it'd be a damned sight easier to
write automated tools to parse through them, and actually do something
useful with that 99%. Been there, done that.
> But poorly implemented, a new format will in pratice increase the
> volume, and with the increased complexity of the logging also slowing
> down, logging will be slower, and more messages will be lost. This has
> been seen in pratice.
But correctly implemented, it will help. That's why this is being debated in
public to make sure the design is correct.
> I would prefer to see effort expended on fixing printk/klogd...off the
> top of my head:
>
> - make printk a macro that prepends file/function/line to the message.
> - fix printk calls: messages with consistent format, calls in the right
> places, with the "correct" information.
> - postprocessing tools for analysing the logs.
This is actually very close to what is being proposed. The main reason the
we came to the conclusion that end result should be dumped into an evlog
file instead of dmesg and /var/log/messages is that changing the format
of /var/log/messages breaks the existing log parsing tools that people have.
If people wanted to take the improvments made for event logging, and use
those to change the dmesg output format, that would be a fairly simple efffort
for them to do as an additional patch on top of event logging.
M.
On Tue, 2002-04-09 at 14:16, Martin J. Bligh wrote:
> >> The current printk simply does not cut it for these applications. There are
> >> over 80000 printk statements in the kernel, using many different conventions.
> >
> > It would be easier, to fix the printk's, than to put evlogging into any
^ required
> > particular piece of the kernel.
>
> You want to fix 80000 printks? Be my guest ... I await your patch eagerly.
> If you mean changing printk with a wrapper to help clean up the existing
> mess in an automated fashion, that's exactly what's being proposed.
how nice of you to say so. As to automating, I don;t believe it can
work. You will be adding volume to the log, making the log processing
more complicated, a replay of EES.
>
> > Evlog side-by-side with printk adds significat bloat.
>
> To what? A kernel with event logging switch on? Sure.
> But if you don't want it, don't turn it on. If it's a config option, I don't
> see why anyone would care.
The poor sod who has to maintain this stuff.
>
> > What I hear you asking for, is to make it more of
> > the kernels responsibilty easing the problem of analysing the out put,
> > as opposed to making that the responsibilty of user space
> > postprocessing.
>
> Indeed. That's because the kernel has more context, and can trivially
> log the information it has, rather then reverse engineering it from user space.
> Why munge all the messages to post them through a tiny little formatting
> hole, and then try to unmunge them all again on the other side with a
> bunch of hokey scripts?
Information that only the kernel has is not what I'm talking about, it
is all the stuff to make it easy to parse and collate, it has to be
possible, not necessarly easy.
>
> > One thing to keep in mind, 99% of logged messages will
> > never be reviewed.
>
> If we had a more structured log format, it'd be a damned sight easier to
> write automated tools to parse through them, and actually do something
> useful with that 99%. Been there, done that.
What? generate reports that will be munged for statistics that will be
quoted in endless management meetings.
>
> > But poorly implemented, a new format will in pratice increase the
> > volume, and with the increased complexity of the logging also slowing
> > down, logging will be slower, and more messages will be lost. This has
> > been seen in pratice.
>
> But correctly implemented, it will help. That's why this is being debated in
> public to make sure the design is correct.
But you could get 90% of the advantage for much less effort by fixing
the problems with printk/klogd without implementing yet another
subsystem.
>
> > I would prefer to see effort expended on fixing printk/klogd...off the
> > top of my head:
> >
> > - make printk a macro that prepends file/function/line to the message.
> > - fix printk calls: messages with consistent format, calls in the right
> > places, with the "correct" information.
> > - postprocessing tools for analysing the logs.
>
> This is actually very close to what is being proposed. The main reason the
> we came to the conclusion that end result should be dumped into an evlog
> file instead of dmesg and /var/log/messages is that changing the format
> of /var/log/messages breaks the existing log parsing tools that people have.
And this could be avoided using the "improved printk" by not compiling
the new features.
>> You want to fix 80000 printks? Be my guest ... I await your patch eagerly.
>> If you mean changing printk with a wrapper to help clean up the existing
>> mess in an automated fashion, that's exactly what's being proposed.
>
> how nice of you to say so.
You're welcome ;-)
> As to automating, I don;t believe it can work. You will be adding volume
> to the log, making the log processing more complicated, a replay of EES.
OK, I think we both agree there's a problem with the current subsystem.
Larry and I discussed the problem and think we have a low impact way
of improving the current situation, giving the customers that require a
robust, detailed logging system what they want (by turning on a config
option) whilst having pretty much 0 impact to the rest of the community.
In my mind, this should make everone happy. You obviously have some
issue with this - as far as I can tell, that seems to be tied to your concepts
of EES (for those that don't know, EES was Dynix/PTX's "error event subsystem"
or some such name). This is not EES, nor ever will be - EES's major design flaw
was that it pervasively invaded everything throughout the kernel, requiring
every equivalent of printk to be changed to a complex log event. It also took
away the existing subsystem equivalent to what ends up in /var/log/messages,
breaking logging parsing scripts. Nobody (that I know of) is proposing doing any
such thing to Linux. On the other hand customers liked the new more powerful
ability to search the error scripts, and automatically generate events from that,
such as email, paging, etc. FWIW, I was the Sequent customer service rep for EES.
We are trying to get for Linux the benefits of EES without the associated pain.
In more specific reply to what you said, we'd be adding volume to the evlog
*if* you turned evlog on, making log processing more complex *if* you turned
evlog on, and this is not EES. This gives people who want evlog that functionality,
and people who don't want it no change. If people object to buffering the normal
printk per line, we don't have to do that. That was a side observation, not a critical
part of this - in fact if it's not fixed, it just provides one more reason for people to
use evlog.
>> > Evlog side-by-side with printk adds significat bloat.
>>
>> To what? A kernel with event logging switch on? Sure.
>> But if you don't want it, don't turn it on. If it's a config option, I don't
>> see why anyone would care.
>
> The poor sod who has to maintain this stuff.
Those people are the exact same people who are proposing it. IBM, and the
large group of customers who want this functionality are unlikely to dissappear
overnight, I feel. They will maintain this subsystem - we're not asking you
personally to do it.
>> Indeed. That's because the kernel has more context, and can trivially
>> log the information it has, rather then reverse engineering it from user space.
>> Why munge all the messages to post them through a tiny little formatting
>> hole, and then try to unmunge them all again on the other side with a
>> bunch of hokey scripts?
>
> Information that only the kernel has is not what I'm talking about, it
> is all the stuff to make it easy to parse and collate, it has to be
> possible, not necessarly easy.
As I've stated before, I don't think it's a good idea to change the current
/var/log/messages format unilaterally - we'd never get agreement from
everyone to do it, and I wouldn't want to even if we did - it'd make an
unreadable mess of a simple text file. I prefer to make things easy, not
just possible. Are you disagreeing with the information that is being
proposed to log, or just that you want all that stuff dumped into dmesg
and /var/log/messages (via the limitations of syslog)?
>> If we had a more structured log format, it'd be a damned sight easier to
>> write automated tools to parse through them, and actually do something
>> useful with that 99%. Been there, done that.
>
> What? generate reports that will be munged for statistics that will be
> quoted in endless management meetings.
No, generating useful tools for real customers on critical systems that
alert them when certain errors occur.
> But you could get 90% of the advantage for much less effort by fixing
> the problems with printk/klogd without implementing yet another
> subsystem.
I disagree - I think you'll make an unwieldy ugly mess of the existing
subsystem, and piss everybody off. The evlog stuff is self-contained.
>> This is actually very close to what is being proposed. The main reason the
>> we came to the conclusion that end result should be dumped into an evlog
>> file instead of dmesg and /var/log/messages is that changing the format
>> of /var/log/messages breaks the existing log parsing tools that people have.
>
> And this could be avoided using the "improved printk" by not compiling
> the new features.
see above. Try getting an existing section of a /var/log/messages file, converting
it to include all the information we're trying to log, and writing tools to parse it.
Then publish the resultant mess back here to see if people like what you've
done to the existing format they're used to.
M.
On Tue, 2002-04-09 at 15:28, Brian Beattie wrote:
> >
> > > I would prefer to see effort expended on fixing printk/klogd...off the
> > > top of my head:
> > >
> > > - make printk a macro that prepends file/function/line to the message.
I take this one back, file/function/line does not add any useful
information to the vast majority of the printk messages, but adds a
great deal of bloat to the logs.
I think establishing some guide lines for log message content and
format, and updating some of the more critical parts of the kernel would
do a lot more to provide the kind of logging that is being called for.
It may well be that this needs to be done in a way that can be
controlled by a compile time option so the old format can be had by
those who prefer it, and the new by those who feel the need.
The basic problem, in my experience, is not the logging mechanisims, but
that the correct messages are not produced under the correct
conditions. This is a problem that can only be fixed by adding the
appropriate calls to whatever logging mechanism is available.
That is to say that, not all conditions that are critical are logged in
a way that provides the information to determine, after the fact, what
happened. A new logging mechanism will not fix that.
On Mon, Apr 08, 2002 at 07:14:44PM -0700, Martin J. Bligh wrote:
> > Ah. Yes, that will definitely happen. We only have atomicity
> > at the level of a single printk call.
> >
> > It would be feasible to introduce additional locking so that
> > multiple printks can be made atomic. This should be resisted
> > though - printk needs to be really robust, and needs to have
> > a good chance of working even when the machine is having hysterics.
> > It's already rather complex.
> >
> > For the rare cases which you cite we can use a local staging
> > buffer and sprintf, or just live with it, I suspect.
>
> Right - what I'm proposing would be a generic equivalent of the
> local staging buffer and sprintf - basically just a little wrapper
> that does this for you, keeping a per task buffer somewhere.
That still doesn't solve the race with the interrupt handlers, you'd
need a buffer for each irq handler and one the softirq too to make
printk buffered and coeherent coherent across newlines (doable but even
more tricky and in turn less robust and less self contained).
> The reason I want to do it like this, rather than what you suggest,
> is that there are over 5000 of these "rare cases" of a printk without
> a newline, according to the IBM RAS group's code search ;-) I don't
> fancy changing that for 5000 instances (obviously some of those are
> grouped together, but the count is definitely non-trivial). I'd
> attach the report they sent me, but it's 657K long ;-)
Pavel omits the newline intentionally during debugging, to avoid losing
80% of the persistent stoarge in the framebuffer, but ok we could
implement a printk_flush that flushes the buffer afterwards no matte
what. Some other code may omit it by mistake, leading to the other cpus
blackholed and data lost after the buffer on the other cpus overflowed
so at least we should put a timer that spawns an huge warning if a cpu
doesn't flush the buffer in a rasonable amount of time so we can catch
those places.
Given the overcomplexity and the fact the logs should be parsed by
humans and the low probability for the race, I'm not sure if the above
is worthwhile.
If what you need is a high bandwith logging system, where with an high
frequency of posted events the races could become more likely to
trigger, printk is not the way to go for high bandwith anyways.
Andrea
>> Right - what I'm proposing would be a generic equivalent of the
>> local staging buffer and sprintf - basically just a little wrapper
>> that does this for you, keeping a per task buffer somewhere.
>
> That still doesn't solve the race with the interrupt handlers, you'd
> need a buffer for each irq handler and one the softirq too to make
> printk buffered and coeherent coherent across newlines (doable but even
> more tricky and in turn less robust and less self contained).
I was envisaging a larger buffer where the current location pointer
simply taken by the interrupt handler, and the remaining section of
that buffer was used for the "inner printk". Which is really just
like a stack, so it makes more sense to just allocate this off the
stack really .... I think this would work? We might need to flush
on a certain size limit (128 chars, maybe?) to stop any risk of
stack overflow.
> Some other code may omit it by mistake, leading to the other cpus
> blackholed and data lost after the buffer on the other cpus overflowed
> so at least we should put a timer that spawns an huge warning if a cpu
> doesn't flush the buffer in a rasonable amount of time so we can catch
> those places.
It seems that 99.9% of these cases are just assembling a line of output
a few characters at a time. There might be a few odd miscreants around,
but not enough to worry about - I think it's overkill to do the runtime
timer check, but we could always run like this to test it, or try to
work some sort of automated code inspection (though that sounds hard to
do 100% reliably).
M.
On 9 April 2002 18:49, Brian Beattie wrote:
> I would prefer to see effort expended on fixing printk/klogd...off the
> top of my head:
>
> - make printk a macro that prepends file/function/line to the message.
> - fix printk calls: messages with consistent format, calls in the right
> places, with the "correct" information.
> - postprocessing tools for analysing the logs.
>
> I would say that this is probably less work than implementing evlog,
> much less work to maintain, and provide generally better performance.
Sounds ok for me.
It will be difficult to push it into mainline kernel.
I tried to fix loglevels in some printks. Patches were _trivial_
but nevertheless they weren't taken.
--
vda
Hi,
I am currently working in the Linux/390 (mainframe) team. Our
customers usually have really large installations with dozens
of Linux images. In those scenarios it is very important to
have a good logging mechanism to do service and support system
operators in their daily work.
I tried out posix event logging (kernel and userspace) on Linux/390.
In my eyes the main benefits of posix event logging compared to the
current printk/syslog solution are:
- Additional event data is automatically logged: PID,PGRP,UID,GID,etc.
- Appropriate user space API to get this additional event data
- Better event classification: It is possible to define multiple
facilities and event types. E.g. write out an event with
facility=scsi and event type = 4711 (hardware malfunction)
- Event registration:
E.g. it is possible to register callbacks to all events with
severity=critical, facility=scsi, data contains "xyz" etc.
If a matching log entry is written the callback is triggered.
--> Automatic processing of events is much easier than with
the existing printk/syslog mechanism
In my last project (before Linux/390) we tried to add Unix
support to an already existing automation software on OS/390.
This software basically is an automated operator, where you
can define rules what should be done, when message xyz is written
to the event log. On OS/390 there exist APIs for event
registration and getting additional event data.
With the existing syslogd solution it was really hard or
nearly impossible to get all the required information.
If we had something like posix event in those days,
logging things would have been much easier.
Because there probably exist a lot of linux installations
which rely on syslogd and perhaps do automatic parsing of
var/log/messages in order to find out that e.g. driver x
has problems and the operator should be informed etc.,
I think it would be important to have both options:
feed printk messages into posix event logging (this does
the current patch as far as I know) AND feed events
which are written with the new posix event APIs into the
traditional syslogd logging. Perhaps this could be
configurable via kernel parameters. So nobody would be
forced to use the new functions, the printk API remains
in the kernel, operators have the choice between syslogd
and posix event logging (at least for some transitional
period)
Best Regards
Michael
------------------------------------------------------------------------
Linux for E-Server Development
Phone: +49-7031-16-2360, Bld 71032-03-U09
Email: [email protected]
On Wed, Apr 10, 2002 at 07:07:46PM -0500, Mike Sullivan wrote:
> >> Right - what I'm proposing would be a generic equivalent of the
> >> local staging buffer and sprintf - basically just a little wrapper
> >> that does this for you, keeping a per task buffer somewhere.
> >
> > That still doesn't solve the race with the interrupt handlers, you'd
> > need a buffer for each irq handler and one the softirq too to make
> > printk buffered and coeherent coherent across newlines (doable but even
> > more tricky and in turn less robust and less self contained).
>
> I was envisaging a larger buffer where the current location pointer
> simply taken by the interrupt handler, and the remaining section of
> that buffer was used for the "inner printk". Which is really just
> like a stack, so it makes more sense to just allocate this off the
> stack really .... I think this would work? We might need to flush
> on a certain size limit (128 chars, maybe?) to stop any risk of
> stack overflow.
>
> > Some other code may omit it by mistake, leading to the other cpus
> > blackholed and data lost after the buffer on the other cpus overflowed
> > so at least we should put a timer that spawns an huge warning if a cpu
> > doesn't flush the buffer in a rasonable amount of time so we can catch
> > those places.
>
> It seems that 99.9% of these cases are just assembling a line of output
> a few characters at a time. There might be a few odd miscreants around,
> but not enough to worry about - I think it's overkill to do the runtime
> timer check, but we could always run like this to test it, or try to
> work some sort of automated code inspection (though that sounds hard to
> do 100% reliably).
As an alternative to building a buffer and flushing, maybe it would be better ifin the case of evlog records of this type could be marked with a continuation flag. During post processing there should be enough information on the invocation location to the record to allow the recombination and viewing of the full record.
In general I think the event notification features offered by evlog offer a good platform upon which to build some very useful sysadmin tools.
>
> M.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
> Thanks.
> Mike Sullivan
> IBM Linux Technology Center
> (512)838-0539 or t/l 678-0539
Michael Holzheu wrote...
> I think it would be important to have both options:
> feed printk messages into posix event logging (this does
> the current patch as far as I know)
The current patch forks the message to evlog inside the printk
function. This thread is proposing that the printk function be
wrapped inside a macro so you could easily capture
file/funcname/lineno of the calling function along with the
original printk message
(plus the other stuff stored in the evlog record header).
> AND feed events
> which are written with the new posix event APIs into the
> traditional syslogd logging.
This would be done in user-space, not in the kernel. This is on
our enhancements list for event logging.
Hello,
Michel Dagenais <[email protected]> writes
>> People want to be able to get better debug information, with more detail
>> than is currently possible with printk, hence the Event logging project.
>
>I would like to emphasize that point; logging and tracing is of prime
>importance in several areas:
>
>- Telecom and Security. Logging and auditing is a requirement in Telecom
> applications. It is for instance one of the important features of the
> Distributed Security Infrastructure proposed by Ericsson Research and to
> be presented in a couple of months at the Ottawa Linux Symposium.
> The OSDL "Carrier Grade Linux" is certainly no different.
>
I am currently working in the Telecom area (signalling), on systems that
usually process several tenths of thousands of telephone calls per
second, during years. So I am the kind of guy interrested in having
enhancements to the Linux logging sub-systems, to:
1. Make my life easier when something goes wrong on my user's sites &
I am supposed to gather the information from the logs.
2. Make my user's life easier, to detect that something is going
wrong on my machines (among hunderds of other ones), so that he
can (try to) join me.
3. Make my developer's life easier, not forcing me to code yet
another log parser dedicated to one system & use open standard
instead.
4. Not make my Linux desktop system an intensive disk & CPU consume r
for logging-only purpose (this is more a personal view than a
professional constraint... :-)
I have been following the work done by evlog team since a few monthes
now, so I can say now - as an evlog user - that it provides support for
every requirements listed above, the following ways:
* Evlog can be configured to flag log messages as part of a
functional area (facilty) so that they can be logged in separated
log files, regardless of the source of the log: kernel or
userland. Take the case of a telecom protocol stack running partly
in kernel space (device driver and/or protocol module) and partly
in userland (application). Evlog can be configured a way so that
logs coming from the telecom functional area are consolidated at
the same place. If something goes wrong in kernel space, the logs
of the whole telecom sub-system are placed together showing
(mostly) the appropriated order of failure. This considerably
helps troubleshooting a failure (my trouble-shooting life).
* From the telecom network supervision/management perspective (my
user's life), the ability of evlog to have registered call-backs
(in user-land) on specified (configured) events permits to:
o Take automatic corrective actions if possible
o Raise network alarms, so that the operators can log on the
system & trouble-shoot it (or at least apply a corrective
procedure & call a help-desk)
* Evlog is a proposed standard to POSIX. I do hope that it will be
accepted (as well as in the LSB, but this is probably another
story...), so that the telecom stack & network supervision work
can be re-used from one system to another.
* Evlog can be totally optimized away at kernel configuration time
(like any other CONFIG_XXX component, after all), or have no
noticeable no cost when compiled-in & powered-off. (my day-to-day
Linux desktop user life).
For the reasons listed above, I strongly support integrating evlog in
the Linux kernel.
FiX
--
Francois-Xavier "FiX" KOWALSKI /_ __ Tel: +33 (0)4 76 14 63 27
Telecom Infrastructure Division / //_/ Fax: +33 (0)4 76 14 43 23
SigTech eXpert / HP Telnet: 779-6327
i n v e n t http://www.hp.com/go/opencall
Francois-Xavier Kowalski wrote:
> I am currently working in the Telecom area (signalling), on systems that
> usually process several tenths of thousands of telephone calls per
> second, during years. So I am the kind of guy interrested in having
> enhancements to the Linux logging sub-systems
You did not say what Linux distribution(s) you "favor", but MontaVista
has announced that Event Logging, Linux Trace Toolkit, and some other
features you probably want will be included in their Carrier Grade
Edition,
Version 2.1.
See http://www.mvista.com/dswp/MVLCGE.pdf for details.
Francois-Xavier Kowalski wrote:
> I am currently working in the Telecom area (signalling), on systems that
> usually process several tenths of thousands of telephone calls per
> second, during years. So I am the kind of guy interrested in having
> enhancements to the Linux logging sub-systems
You did not say what Linux distribution(s) you "favor", but MontaVista
has announced that Event Logging, Linux Trace Toolkit, and some other
features you probably want will be included in their Carrier Grade
Edition, Version 2.1.
See http://www.mvista.com/dswp/MVLCGE.pdf for details.
On Tue, Apr 09, 2002 at 10:28:57PM -0700, Martin J. Bligh wrote:
> >> Right - what I'm proposing would be a generic equivalent of the
> >> local staging buffer and sprintf - basically just a little wrapper
> >> that does this for you, keeping a per task buffer somewhere.
> >
> > That still doesn't solve the race with the interrupt handlers, you'd
> > need a buffer for each irq handler and one the softirq too to make
> > printk buffered and coeherent coherent across newlines (doable but even
> > more tricky and in turn less robust and less self contained).
>
> I was envisaging a larger buffer where the current location pointer
> simply taken by the interrupt handler, and the remaining section of
> that buffer was used for the "inner printk". Which is really just
> like a stack, so it makes more sense to just allocate this off the
> stack really .... I think this would work? We might need to flush
> on a certain size limit (128 chars, maybe?) to stop any risk of
> stack overflow.
that's equivalent with the big difference it will waste many nr_task
times more ram, you'd need to reserve some ram in each task-buffer for
the nr_irqs*(bufsize+1) nested printk. So for every task in the system
you wouldn't have only the overhead of bufsize but bufsize*(nr_irqs+2).
It doesn't sound a good idea to me. remember all irqs can be nested. The
kernel stack is just unsafe enough against nested irq (not anymore on
x86-64 luckily), printk should have a guaranteed bufsize instead. It
would be pointless to make printk robust against \n and overflowing
trivially with nested irq, then you'd be unreliable again.
> > Some other code may omit it by mistake, leading to the other cpus
> > blackholed and data lost after the buffer on the other cpus overflowed
> > so at least we should put a timer that spawns an huge warning if a cpu
> > doesn't flush the buffer in a rasonable amount of time so we can catch
> > those places.
>
> It seems that 99.9% of these cases are just assembling a line of output
> a few characters at a time. There might be a few odd miscreants around,
> but not enough to worry about - I think it's overkill to do the runtime
> timer check, but we could always run like this to test it, or try to
> work some sort of automated code inspection (though that sounds hard to
> do 100% reliably).
Doing the check at runtime would be better, it will catch binary-only
module bugs and drivers out of the kernel bugs too. It should be
possible to disable the debugging feature with an #ifdef though.
But again, the above is all assuming we will change it, but I'm not
suggesting to change printk, I think atomicity printk-against-printk is
enough because of the non-segfaulting-human-parsing and because of the
low rate and in turn near zero probability for races (so basically zero
wasted time while reading the logs).
An high bandwith logging channel were such race starts to matter could
be built with PF_NETLINK or something like that, without using printk,
you need to store the plenty of data into a file and read it later, all
the console stuff is overhead for such a load, printk isn't high
bandwith in production, also think all the serial consoles out there.
Andrea
> It would be easier, to fix the printk's, than to put evlogging into any
> particular piece of the kernel.
Fine, let's call evlog "an enhanced printk" and discuss the specific technical
details of the proposition.
> Evlog side-by-side with printk adds significat bloat.
Whenever you change/enhance such things you may need coexistance for a while.
However this is configurable, and to a certain extent temporary, bloat.
> > - Structured data events for which it is easier to apply filtering, querying,
> > analysis and detection tools.
>
> this is a post processing problem.
...
> What I hear you asking for, is to make it more of
> the kernels responsibilty easing the problem of analysing the out put,
> as opposed to making that the responsibilty of user space
> postprocessing.
Actually this is pushing the formatting out of the kernel, is more efficient,
and it leaves more flexibility to the logging daemon!
Mark Hahn wrote:
>
> user-space ones, fine. you ex-mainframers have yet to demonstrate
> any sane argument for why printk is bad. for instance, it would be
> utterly trivial to add high-res timestamps. trivial to make klogd
> a little more efficient. no reason you can't plug your own
> analysis code into it, rather than the normal syslog approach.
> logging pid/gid/etc is nonsense for many, even most printk's.
I am not saying that printk is bad.
I use it myself, too, for what it is for.
During the bring up, test, debug phase it is useful.
Can you make sure with printk-s that no error log is lost, can
you tell when a log has actually reached a permanent store device ?
Can you force the people to use the same printk format ?
( I am still not good at artificial intelligence and
I am too lazy to modify my analysis code every time when somebody
modifies a printk. :-) )
Can you pass lots of data through a printk ?
Can you make sure that printks are not intermixed ?
(You need to revise all the printk-s: one event - a single printk.
Otherwise the mutex does not protect what you print.)
I was glad to find this error log feature that meets our requirements.
It provides us services which reduce our development cost and provides
us functionality at "usual industrial level" (see e.g. POSIX).
Regards,
Zoltan Menyhart
P.S.: Please CC your answers to [email protected]
> Can you make sure with printk-s that no error log is lost, can
> you tell when a log has actually reached a permanent store device ?
I see no reason to believe that evlog would behave any differently
from printk in this regard.
> Can you pass lots of data through a printk ?
yes, of course you can, and it's just as stupid as passing a lot of
data through evlog.
> Can you make sure that printks are not intermixed ?
show why this is a serious problem.
> I was glad to find this error log feature that meets our requirements.
> It provides us services which reduce our development cost and provides
> us functionality at "usual industrial level" (see e.g. POSIX).
frankly, evlog is a solution in search of a problem. I see no reason
printk can't do TSC timestamping, more robust and/or efficient buffering,
auto-classification in klogd, realtime filtering/notification in userspace,
even delaying of formatting, and logging of binary data.
regards, mark hahn.
>> Can you make sure that printks are not intermixed ?
>
> show why this is a serious problem.
I already have - see earlier in this thread. Interspersed output
on SMP machines gives unreadable garbage.
>> I was glad to find this error log feature that meets our requirements.
>> It provides us services which reduce our development cost and provides
>> us functionality at "usual industrial level" (see e.g. POSIX).
>
> frankly, evlog is a solution in search of a problem. I see no reason
> printk can't do TSC timestamping, more robust and/or efficient buffering,
> auto-classification in klogd, realtime filtering/notification in
> userspace, even delaying of formatting, and logging of binary data.
Of course you could. You could just take the existing mechanism
that's been written for event logging and call it printk, for one.
But the point is to *avoid* dramatic changes to the existing subsystem
to avoid the massive pain of migration (80,000 existing calls), and
an unknown number of sysadmins with scripts to parse stuff out that
you'd drive completely mad (as we did with PTX). In the first post I
did to this thread is a the reasoning layed out for why not to change
printk.
M.
"Martin J. Bligh" wrote:
> > frankly, evlog is a solution in search of a problem. I see no reason
> > printk can't do TSC timestamping, more robust and/or efficient buffering,
> > auto-classification in klogd, realtime filtering/notification in
> > userspace, even delaying of formatting, and logging of binary data.
>
> Of course you could. You could just take the existing mechanism
> that's been written for event logging and call it printk, for one.
True,
I've been following this debate for some time and it seems to me that
there's been a lot of arguments for or against an "enhanced printk".
As Michel Dagenais pointed out, we can give it the name we would like,
it is the technical merits of the evlog proposal which should be looked
at carefully.
Since everyone seems to agree that printk needs to be changed and since
the evlog folks have already worked extensively on this issue, it would
seem that their work should be weighed in and, at the very least, tested
out by the folks who insist on an "enhanced printk".
Cheers,
Karim
===================================================
Karim Yaghmour
[email protected]
Embedded and Real-Time Linux Expert
===================================================