Check out http://www.kernelnewbies.org/status/ for the latest kernel
status update.
Many link changes this week to reflect new URLs for several projects.
I also removed some URLs that were not very useful or that pointed to
patches through the BitKeeper web interface, which kept changing (I think
the BitMover team is working on a fix - I'll put them back when that's
done).
Finally, on the cleanup side, I am planning on removing the following
entries that have been submitted a while back and don't look like
they are going to be accepted for inclusion anytime soon:
- Better event logging for enterprise systems
- Linux booting ELF images
- First pass at LinuxBIOS support
- Build option for Linux Trace Toolkit (LTT)
- Scalable CPU bitmasks
Also on the planned deletion list:
- Add thrashing control
- Generic parameter/command line interface
As usual, feedback welcome!
Enjoy,
-- Guillaume
----------------------------------
Linux Kernel 2.5 Status - July 10th, 2002
(Latest kernel release is 2.5.25)
Features:
Merged
o in 2.5.1+ Rewrite of the block IO (bio) layer (Jens Axboe)
o in 2.5.2 Initial support for USB 2.0 (David Brownell, Greg
Kroah-Hartman, etc.)
o in 2.5.2 Per-process namespaces, late-boot cleanups (Al Viro, Manfred
Spraul)
o in 2.5.2+ New scheduler for improved scalability (Ingo Molnar)
o in 2.5.2+ New kernel device structure (kdev_t) (Linus Torvalds, etc.)
o in 2.5.3 IDE layer update (Andre Hedrick)
o in 2.5.3 Support reiserfs external journal (Reiserfs team)
o in 2.5.3 Generic ACL (Access Control List) support (Nathan Scott)
o in 2.5.3 PnP BIOS driver (Alan Cox, Thomas
Hood, Dave Jones, etc.)
o in 2.5.3+ New driver model & unified device tree (Patrick Mochel)
o in 2.5.4 Add preempt kernel option (Robert Love,
MontaVista team)
o in 2.5.4 Support for Next Generation POSIX Threading (NGPT team)
o in 2.5.4+ Porting all input devices over to input API (Vojtech Pavlik,
James Simmons)
o in 2.5.5 Add ALSA (Advanced Linux Sound Architecture) (ALSA team)
o in 2.5.5 Pagetables in highmem support (Ingo Molnar, Arjan
van de Ven)
o in 2.5.5 New architecture: AMD 64-bit (x86-64) (Andi Kleen, x86-64
Linux team)
o in 2.5.5 New architecture: PowerPC 64-bit (ppc64) (Anton Blanchard,
ppc64 team)
o in 2.5.5+ IDE subsystem rewrite (Martin Dalecki)
o in 2.5.6 Add JFS (Journaling FileSystem from IBM) (JFS team)
o in 2.5.6 per_cpu infrastructure (Rusty Russell)
o in 2.5.6 HDLC (High-level Data Link Control) update (Krzysztof Halasa)
o in 2.5.6 smbfs Unicode and large file support (Urban Widmark)
o in 2.5.7 New driver API for Wireless Extensions (Jean Tourrilhes)
o in 2.5.7 Video for Linux (V4L) redesign (Gerd Knorr)
o in 2.5.7 Futexes (Fast Lightweight Userspace Semaphores) (Rusty Russell, etc.)
o in 2.5.7+ NAPI network interrupt mitigation (Jamal Hadi Salim,
Robert Olsson, Alexey Kuznetsov)
o in 2.5.7+ ACPI (Advanced Configuration & Power Interface) (Andy Grover, ACPI
team)
o in 2.5.8 Syscall interface for CPU task affinity (Robert Love)
o in 2.5.8 Radix-tree pagecache (Momchil Velikov,
Christoph Hellwig)
o in 2.5.8+ Delayed disk block allocation (Andrew Morton)
o in 2.5.9 Smarter IRQ balancing (Ingo Molnar)
o in 2.5.11 Replace old NTFS driver with NTFS TNG driver (Anton Altaparmakov)
o in 2.5.11 Fast walk dcache (Hanna Linder)
o in 2.5.11+ Rewrite of the framebuffer layer (James Simmons)
o in 2.5.12+ Rewrite of the buffer layer (Andrew Morton)
o in 2.5.14 Support for IDE TCQ (Tagged Command Queueing) (Jens Axboe)
o in 2.5.14 Bluetooth support (no longer experimental!) (Maxim Krasnyansky,
Bluetooth team)
o in 2.5.17 New quota system supporting plugins (Jan Kara)
o in 2.5.17+ Move ISDN4Linux to CAPI based interface (Kai Germaschewski,
ISDN4Linux team)
o in 2.5.18 Software suspend (to disk & RAM) (Pavel Machek)
o in 2.5.23 More complete IEEE 802.2 stack (Arnaldo, Jay
Schullist, from Procom donated code)
o in 2.5.23+ Hotplug CPU support (Rusty Russell)
o in -dj Rewrite of the console layer (James Simmons)
o in -dj New MTRR (Memory Type Range Register) driver (Patrick Mochel)
o in -dj Add support for CPU clock/voltage scaling (Erik Mouw, Dave
Jones, Russell King, Arjan van de Ven)
o in -ac Strict address space accounting (Alan Cox)
o in -ac PCMCIA Zoom video support (Alan Cox)
o in -ac More complete NetBEUI stack (Arnaldo Carvalho de
Melo, from Procom donated code)
o in -ac Improved i2o (Intelligent Input/Output) layer (Alan Cox)
o Ready Better event logging for enterprise systems (Larry Kessler, evlog
team)
o Ready Linux booting ELF images (Eric Biederman)
o Ready First pass at LinuxBIOS support (Eric Biederman)
o Ready Build option for Linux Trace Toolkit (LTT) (Karim Yaghmour)
o Ready New kernel build system (kbuild 2.5) (Keith Owens)
o Ready Read-Copy Update Mutual Exclusion (Dipankar Sarma,
Rusty Russell, Andrea Arcangeli, LSE Team)
o Ready USB gadget support (Stuart Lynne, Greg
Kroah-Hartman)
o Ready Scalable CPU bitmasks (Russ Weight)
o Ready Add hardware sensors drivers (lm_sensors team)
o Beta Serial driver restructure (Russell King)
o Beta New IO scheduler (Jens Axboe)
o Beta Add XFS (A journaling filesystem from SGI) (XFS team)
o Beta New VM with reverse mappings (Rik van Riel)
o Beta Fix long-held locks for low scheduling latency (Andrew Morton,
Robert Love, etc.)
o Beta Add Linux Security Module (LSM) (LSM team)
o Beta Per-mountpoint read-only, union-mounts, unionfs (Al Viro)
o Beta EVMS (Enterprise Volume Management System) (EVMS team)
o Beta LVM (Logical Volume Manager) v2.0 (LVM team)
o Beta Dynamic Probes (Suparna
Bhattacharya, dprobes team)
o Beta Page table sharing (Daniel Phillips)
o Beta ext2/ext3 online resize support (Andreas Dilger)
o Beta Add User-Mode Linux (UML) (Jeff Dike)
o Beta UDF Write support for CD-R/RW (packet writing) (Jens Axboe, Peter
Osterlund)
o Beta Asynchronous IO (aio) support (Ben LaHaise)
o Beta Direct pagecache <-> BIO disk I/O (Andrew Morton)
o Alpha Better support of high-end NUMA machines (NUMA team)
o Alpha Full compliance with IPv6 (Alexey Kuznetsov,
Jun Murai, Yoshifuji Hideaki, USAGI team)
o Alpha UMSDOS (Unix under MS-DOS) Rewrite (Al Viro)
o Alpha Scalable Statistics Counter (Ravikiran Thirumalai)
o Alpha Linux Kernel Crash Dumps (Matt Robinson, LKCD
team)
o Alpha Add support for NFS v4 (NFS v4 team)
o Alpha ext2/ext3 large directory support: HTree index (Daniel Phillips,
Christopher Li, Ted Ts'o)
o Alpha Remove use of the BKL (Big Kernel Lock) (Alan Cox, Robert
Love, Neil Brown, Dave Hansen, etc.)
o Alpha Zerocopy NFS (Hirokazu Takahashi)
o Alpha Change all drivers to new driver model (All maintainers)
o Alpha Remove the 2TB block device limit (Peter Chubb)
o Alpha SCTP (Stream Control Transmission Protocol) (lksctp team)
o Started Overhaul PCMCIA support (David Woodhouse,
David Hinds)
o Started Reiserfs v4 (Reiserfs team)
o Started Serial ATA support (Andre Hedrick)
o Started InfiniBand support (InfiniBand team)
* Started Fix device naming issues (Patrick Mochel, Greg
Kroah-Hartman)
o Draft #2 New lightweight library (klibc) (Greg Kroah-Hartman)
o Draft #3 Replace initrd by initramfs (H. Peter Anvin, Al
Viro)
o Planning Add thrashing control (Rik van Riel)
o Planning Remove all hardwired drivers from kernel (Alan Cox, etc.)
o Planning Generic parameter/command line interface (Keith Owens)
o Planning New mount API (Al Viro)
Cleanups:
Merged
o in 2.5.3 Break Configure.help into multiple files (Linus Torvalds)
o in 2.5.3 Untangle sched.h & fs.h include dependencies (Dave Jones, Roman
Zippel)
o in 2.5.4 Per network protocol slabcache & sock.h (Arnaldo Carvalho de
Melo)
o in 2.5.4 Per filesystem slabcache & fs.h (Daniel Phillips,
Jeff Garzik, Al Viro)
o in 2.5.6 Killing kdev_t for block devices (Al Viro)
o in 2.5.18+ ->getattr() ->setattr() ->permission() changes (Al Viro)
o in 2.5.21 Split up x86 setup.c into manageable pieces (Patrick Mochel)
o in 2.5.23+ Major MD tool (RAID 5) cleanup (Neil Brown)
o Ready Switch to ->get_super() for file_system_type (Al Viro)
o Beta file.h and INIT_TASK (Benjamin LaHaise)
o Beta Proper UFS fixes, ext2 and locking cleanups (Al Viro)
o Beta Lifting limitations on mount(2) (Al Viro)
o Beta Remove dcache_lock (Maneesh Soni, IBM
team)
o Started Reorder x86 initialization (Dave Jones, Randy
Dunlap)
Have some free time and want to help? Check out the Kernel Janitor
TO DO list for a list of source code cleanups you can work on.
A great place to start learning more about kernel internals!
On Wed, 10 Jul 2002, Guillaume Boissiere wrote:
> Also on the planned deletion list:
> - Add thrashing control
I've had the mechanism working for well over a year now, but
still don't have a proper policy for load control implemented.
> o Beta New VM with reverse mappings (Rik van Riel)
This is (in limited form) ready for merging and has been
stability tested by Andrew Morton. A patch should go to
Linus soon...
kind regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Wed, 10 Jul 2002, Guillaume Boissiere wrote:
>...
> Finally, on the cleanup side, I am planning on removing the following
> entries that have been submitted a while back and don't look like
> they are going to be accepted for inclusion anytime soon:
> - Better event logging for enterprise systems
> - Linux booting ELF images
> - First pass at LinuxBIOS support
> - Build option for Linux Trace Toolkit (LTT)
> - Scalable CPU bitmasks
>...
Are there any reasons why these don't make it into 2.5?
cu
Adrian
--
You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox
Hi,
On Wed, 10 Jul 2002, Adrian Bunk wrote:
> Are there any reasons why these don't make it into 2.5?
>
> > - Better event logging for enterprise systems
Linus was scared we could break old syslog parsers.
> > - Linux booting ELF images
Reason unknown. I expect some statement here.
> > - First pass at LinuxBIOS support
Possibly no one had one? I have some boxes running prettily on LinuxBIOS,
can't imagine why it isn't merged into 2.5. But I can't imagine what's
speaking against KBuild-2.5 either. I can speak it, I love it.
> > - Build option for Linux Trace Toolkit (LTT)
Nobody seemed to be interested in this toolkit. The (s|l)trace toolkit and
kdb seemed to be sufficient for most developers. (I don't whine here
either.)
> > - Scalable CPU bitmasks
Seems they got lost.
Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------
Hi,
On Wed, 10 Jul 2002, Thunder from the hill wrote:
> > > - Build option for Linux Trace Toolkit (LTT)
>
> Nobody seemed to be interested in this toolkit. The (s|l)trace toolkit and
> kdb seemed to be sufficient for most developers. (I don't whine here
> either.)
???
Have you actually tried it? I can only encourage everyone to try it and
play a bit with it. It's a great debug tool and it can't be replaced by
any of the mentioned tools. It's very useful in situations where printk
doesn't work anymore.
bye, Roman
> > > - Better event logging for enterprise systems
> Linus was scared we could break old syslog parsers.
This is still in discussion. The real issue is not breaking syslog parsers
(in fact it's a way to make sure they don't break in future) but to get
- Accurate reporting
- Error classifications
- Error manuals
- Translations
etc
done in a way that doesn't make the kernel ugly. That's non-trivial. I need
to schedule a discussion with some IBM folks about part of this.
Alan
Thunder from the hill wrote:
> > > - Build option for Linux Trace Toolkit (LTT)
>
> Nobody seemed to be interested in this toolkit. The (s|l)trace toolkit and
> kdb seemed to be sufficient for most developers. (I don't whine here
> either.)
It's somewhat unfair to compare LTT to s/ltrace or kdb because they
don't serve the same purposes. The other thread on "Enhanced profiling"
should have made this very clear by now.
I've spoken to many key kernel developers about this and they all saw
its inclusion as being positive, but they also all said the same thing:
it's really Linus' decision.
In light of the recent discussions, it would be really nice to get a
definitive statement about LTT's inclusion in 2.5.
Cheers,
Karim
===================================================
Karim Yaghmour
[email protected]
Embedded and Real-Time Linux Expert
===================================================
Hi,
On Wed, 10 Jul 2002, Roman Zippel wrote:
> Have you actually tried it? I can only encourage everyone to try it and
> play a bit with it. It's a great debug tool and it can't be replaced by
> any of the mentioned tools. It's very useful in situations where printk
> doesn't work anymore.
It might not have been noticed publicly, since no one showed interest in
it...
Regards,
Thunder
> Also on the planned deletion list:
> - Generic parameter/command line interface
This isn't dead at all. Rusty and I talked about this at the kernel
summit, and there was no loud verbal dissension (that I remember).
Unfortunately, progress has been temporarily hampered by a few things,
like the fact that the people that this type of stuff actually turns on
are few in number, and distracted by several other projects...
-pat
On Wed, Jul 10, 2002 at 12:54:49PM -0400, Karim Yaghmour wrote:
> In light of the recent discussions, it would be really nice to get a
> definitive statement about LTT's inclusion in 2.5.
It has been pointed out to you at least once that it would stand a much
better chance if you were to follow the kernel coding style, for one ...
And things like :
+#ifndef CONFIG_SMP /* On an SMP machine NMIs are used to implement a watchdog and will hang
+ the machine if traced. */
+ TRACE_TRAP_ENTRY(2, regs->eip);
+#endif
+
aren't very encouraging.
just my 2p
john
--
"I know I believe in nothing but it is my nothing"
- Manic Street Preachers
On Wed, Jul 10, 2002 at 01:11:08AM -0400, Guillaume Boissiere wrote:
> o in -ac More complete NetBEUI stack (Arnaldo Carvalho de Melo,
> from Procom donated code)
Not really, still not submitted.
- Arnaldo
While I understand the possible value, are printk translations
really important enough to justify?
Do we really need to have the equivalent of:
printk(tr("Context string %s: %d"),tr("some string"),value);
translate/lookups? Why? If so, is this facility supposed to be
run-time or compile-time?
Unfortunately, I missed the RAS BOF at OLS, so I don't know
what was discussed. Some of these were audio recorded.
Anyone know of the audio repository location? Can't find any of
the 2001 or 2002 sessions on the symposium website.
-----Original Message-----
From: Alan Cox [mailto:[email protected]]
The real issue is not breaking syslog parsers but to get
- Translations
etc
done in a way that doesn't make the kernel ugly. That's non-trivial. I need
to schedule a discussion with some IBM folks about part of this.
On Tue, 2002-07-09 at 22:11, Guillaume Boissiere wrote:
> As usual, feedback welcome!
As of 2.5.25, we have HZ=1000 (on x86) and a scalable user-space
exported clock_t that remains at 100 HZ to keep user-space compatible.
This is attributed to the Commander in Chief, Linus Torvalds.
Robert Love
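The scaling described in the message above can be pictured with a small user-space C sketch: an internal tick count kept at 1000 Hz is divided down so that user space keeps seeing 100 Hz units. The names `INTERNAL_HZ`, `USER_HZ_SKETCH` and `ticks_to_user_clock` are invented for this illustration and are not the kernel's own identifiers; the sketch also assumes the internal rate is an exact multiple of the user-visible one.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
#define INTERNAL_HZ    1000u /* internal timer tick rate on x86 as of 2.5.25 */
#define USER_HZ_SKETCH 100u  /* rate that user space continues to see */

/* Convert internal ticks to user-visible clock_t-style units.
 * Assumes INTERNAL_HZ is an exact multiple of USER_HZ_SKETCH. */
static uint64_t ticks_to_user_clock(uint64_t internal_ticks)
{
    return internal_ticks / (INTERNAL_HZ / USER_HZ_SKETCH);
}
```

With these values, one second of internal ticks (1000) is reported as 100 user-visible units, so programs that assume a 100 Hz clock_t keep working.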
Hi,
On Wed, 10 Jul 2002, Perches, Joe wrote:
> Do we really need to have the equivalent of:
> printk(tr("Context string %s: %d"),tr("some string"),value);
> translate/lookups? Why? If so, is this facility supposed to be
> run-time or compile-time?
Ah, I see. At some point I have some piece of English text that I want to send
to some Japanese people, and when I do C-X in my pine, it gets translated
to Japanese automatically by the kernel network layer...
Regards,
Thunder
> > > - Scalable CPU bitmasks
>
> Seems they got lost.
>
> Regards,
> Thunder
This patch was most recently submitted against 2.5.20. The patch
introduces two new files, and modifies a single makefile. I believe
it still applies cleanly to the latest version.
I will resubmit the patch soon for the latest kernel version.
- Russ
PS. Please copy me on any replies, as I am not subscribed to the list.
> While I understand the possible value, are printk translations
> really important enough to justify?
Is it worth changing the kernel just for translations - probably not. If
it comes out as a convenient side effect - yes.
Hi Alan,
On Wed, Jul 10, 2002 at 07:38:23PM +0100, Alan Cox wrote:
> > While I understand the possible value, are printk translations
> > really important enough to justify?
>
> Is it worth changing the kernel just for translations - probably not. If
> it comes out as a convenient side effect - yes.
I'd still say no. Kernel messages are not meant for end users but for
kernel developers. The latter all speak English, but chances are that most
of them won't understand Japanese kernel messages. So they would need to
be translated back before being posted here.
If you want translated kernel messages, use message IDs, that can be parsed
and translated in userspace, if somebody really needs it.
Regards,
--
Kurt Garloff <[email protected]> Eindhoven, NL
GPG key: See mail header, key servers Linux kernel development
SuSE Linux AG, Nuernberg, DE SCSI, Security
On Wed, Jul 10, 2002 at 10:54:21AM -0700, you [Robert Love] wrote:
>
> As of 2.5.25, we have HZ=1000 (on x86) and a scalable user-space
> exported clock_t that remains at 100 HZ to keep user-space compatible.
> This is attributed to the Commander in Chief, Linus Torvalds.
But jiffies now wrap at 49.7 days, right? If so, did Tim Schmielau's jiffies
wrap patches go in as well? ISTR they went in -dj.
Didn't Red Hat change HZ to 1000 (or 1024) in Limbo as well? How did they
handle that?
-- v --
[email protected]
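The 49.7-day figure in the message above follows from simple arithmetic, assuming jiffies is a 32-bit unsigned counter incremented HZ times per second. The helper names in this C sketch are made up for the illustration:

```c
#include <stdint.h>

/* Seconds until an unsigned counter of `bits` width, incremented
 * `hz` times per second, wraps around to zero. */
static double wrap_seconds(unsigned bits, unsigned hz)
{
    /* Build 2^bits in floating point so that bits == 64 does not overflow. */
    double ticks = 1.0;
    for (unsigned i = 0; i < bits; i++)
        ticks *= 2.0;
    return ticks / hz;
}

/* The same interval expressed in days (86400 seconds per day). */
static double wrap_days(unsigned bits, unsigned hz)
{
    return wrap_seconds(bits, hz) / 86400.0;
}
```

Under that assumption, `wrap_days(32, 1000)` comes to roughly 49.7 days, against roughly 497 days for `wrap_days(32, 100)`, which is why the move to HZ=1000 makes the wrap so much more visible.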
Kurt Garloff wrote:
> If you want translated kernel messages, use message IDs, that can be parsed
> and translated in userspace,
Agreed.
What's been discussed with Rusty Russell (and I believe he has
discussed this with Alan) is not modifying the printks, but providing
logging macros that keep the format string separate from the vararg list
(but written to a log file as a single event record).
Then, a user-space utility would read the event record from the log
and do one of the following:
1) printf-style formatting with the original format string, just like
printk
2a) Use a unique reference code (a hash, generated in the kernel, of
original format string with sourcefile name and function name, for
example) to look-up the non-english format string (similar to the
catgets approach).
or
2b) Use the format string to look-up its non-english equivalent in
a message catalog (similar to the gettext approach).
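Options 1 and 2a above can be sketched in user-space C as follows. Everything here is an assumption made for illustration: the record layout, the names `event_record`, `catalog_entry`, `event_code` and `lookup_format`, and the use of FNV-1a as the hash (the discussion does not say which hash would be used).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record: the format string is kept apart from its
 * arguments, as the logging macros under discussion would do. */
struct event_record {
    const char *file; /* source file that logged the event */
    const char *func; /* function that logged the event */
    const char *fmt;  /* original, untranslated format string */
};

/* One catalog entry mapping a reference code to a translated format. */
struct catalog_entry {
    uint32_t code;
    const char *translated_fmt;
};

/* Fold a string into a running 32-bit FNV-1a hash (a stand-in hash). */
static uint32_t fnv1a_update(uint32_t h, const char *s)
{
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Option 2a: derive the reference code from format string, file and function. */
static uint32_t event_code(const struct event_record *ev)
{
    uint32_t h = 2166136261u; /* FNV offset basis */
    h = fnv1a_update(h, ev->fmt);
    h = fnv1a_update(h, ev->file);
    h = fnv1a_update(h, ev->func);
    return h;
}

/* Return the translated format if the catalog has one; otherwise fall
 * back to the original format string (option 1). */
static const char *lookup_format(const struct event_record *ev,
                                 const struct catalog_entry *cat, size_t n)
{
    uint32_t code = event_code(ev);
    for (size_t i = 0; i < n; i++)
        if (cat[i].code == code)
            return cat[i].translated_fmt;
    return ev->fmt;
}
```

The fallback to the original format string means a missing or empty catalog degrades to plain printk-style output rather than losing the message.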
Rusty's proposal has many other benefits, which I will leave for him
to describe at the appropriate time, but translation in user-space of
kernel messages into multiple languages is one of them.
In fact, with event logging (not currently part of the base) you can
"fork" printk() messages both to the current ring buffer (formatted), and
to a separate buffer where the format string and varargs list could
be kept separate, as described above.
Existing parsing scripts, sys admins, etc. expect /var/log/messages, etc.
to have pure, unmodified printk messages, so you would not want to touch
the original printk messages. However, storing the unformatted event data
separately in its own log file would allow the processing options described
above. Also, if the variable event data is stored separately from the
format string in the event record, parsing of the data by a user-space
utility is cleaner and more efficient.
> if somebody really needs it.
Agreed, again. If you don't want/need translation, there must be a way to
completely disable it and the extra overhead that makes it possible.
John Levon wrote:
> On Wed, Jul 10, 2002 at 12:54:49PM -0400, Karim Yaghmour wrote:
>
> > In light of the recent discussions, it would be really nice to get a
> > definitive statement about LTT's inclusion in 2.5.
>
> It has been pointed out to you at least once that it would stand a much
> better chance if you were to follow the kernel coding style, for one ...
And if you had actually cared to follow that thread with Roman Zippel to
its logical conclusion, you would have noticed that patches including an
update to match the coding style have been made available:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102106555627527&w=2
> And things like :
>
> +#ifndef CONFIG_SMP /* On an SMP machine NMIs are used to implement a watchdog and will hang
> + the machine if traced. */
> + TRACE_TRAP_ENTRY(2, regs->eip);
> +#endif
> +
>
> aren't very encouraging.
Care to comment on why? All the above says is that this trace point
shouldn't be active on an SMP box. That's one of the very rare cases
(if not the only case) where such a build-condition is added. And if this
really is too much for the kernel crowd's stomach then it is easily remedied.
I would have thought you had something a little bit more substantial
to stand against LTT's inclusion.
Karim
On Wed, 2002-07-10 at 12:18, Ville Herva wrote:
> On Wed, Jul 10, 2002 at 10:54:21AM -0700, you [Robert Love] wrote:
> >
> > As of 2.5.25, we have HZ=1000 (on x86) and a scalable user-space
> > exported clock_t that remains at 100 HZ to keep user-space compatible.
> > This is attributed to the Commander in Chief, Linus Torvalds.
>
> But jiffies now wrap at 49.7 days, right? If so, did Tim Schmielau's jiffies
> wrap patches go in as well? ISTR they went in -dj.
George Anzinger's 64-bit jiffies are in 2.5.
Tim's code to better utilize them is in 2.5 I _think_.
> Didn't Red Hat change HZ to 1000 (or 1024) in Limbo as well? How did they
> handle that?
Yes, RedHat's current devel kernel is using HZ=1000. I am not sure how
they handled it. What we have in 2.5 now is correct.
Robert Love
On Wednesday 10 July 2002 18:31, Thunder from the hill wrote:
> On Wed, 10 Jul 2002, Adrian Bunk wrote:
> > Are there any reasons why these don't make it into 2.5?
> > > - Build option for Linux Trace Toolkit (LTT)
>
> Nobody seemed to be interested in this toolkit. The (s|l)trace toolkit and
> kdb seemed to be sufficient for most developers.
Please don't presume to speak for me on that.
--
Daniel
On Wed, 2002-07-10 at 13:20, Cort Dougan wrote:
> Why was the rate incremented to maintain interactive performance? Wasn't
> that the whole idea of the pre-empt work? Does the burden of pre-empt
> actually require this?
I did not say it was increased to improve interactivity response - and
it certainly has little or nothing to do with kernel preemption being
merged.
I suspect a big benefit would be poll/select performance. I think this
is why RedHat increased HZ in their kernels.
You would have to ask Linus exactly what his intentions were.
> It seems that the added inefficiency of these extra interrupts is going to
> drag performance down.
Robert Love
Why was the rate incremented to maintain interactive performance? Wasn't
that the whole idea of the pre-empt work? Does the burden of pre-empt
actually require this?
It seems that the added inefficiency of these extra interrupts is going to
drag performance down.
} George Anzinger's 64-bit jiffies are in 2.5.
}
} Tim's code to better utilize them is in 2.5 I _think_.
}
} > Didn't Red Hat change HZ to 1000 (or 1024) in Limbo as well? How did they
} > handle that?
}
} Yes, RedHat's current devel kernel is using HZ=1000. I am not sure how
} they handled it. What we have in 2.5 now is correct.
}
} Robert Love
On Wed, 10 Jul 2002, Ville Herva wrote:
>
> Didn't Red Hat change HZ to 1000 (or 1024) in Limbo as well? How did they
> handle that?
>
Yes, they did, but the implementation is slightly different. I would expect
it will change to Linus' implementation in the next Limbo kernel update.
Justin
> Why was the rate incremented to maintain interactive performance? Wasn't
> that the whole idea of the pre-empt work? Does the burden of pre-empt
> actually require this?
Bizarrely in many cases it increases throughput
> It seems that the added inefficiency of these extra interrupts is going to
> drag performance down.
Sometimes - Beowulf folks already sometimes hack the clock down to 20Hz or
less. This is best approached on sane hardware by extending the S/390 stuff
for no regular ticks.
On Wed, Jul 10, 2002 at 01:25:38PM -0700, you [Robert Love] wrote:
> On Wed, 2002-07-10 at 13:20, Cort Dougan wrote:
>
> > Why was the rate incremented to maintain interactive performance? Wasn't
> > that the whole idea of the pre-empt work? Does the burden of pre-empt
> > actually require this?
>
> I did not say it was increased to improve interactivity response - and
> it certainly has little or nothing to do with kernel preemption being
> merged.
>
> I suspect a big benefit would be poll/select performance. I think this
> is why RedHat increased HZ in their kernels.
Red Hat Limbo ChangeLog says:
"The kernel used in this release supports the following list of improvements
and new features. The kernel is based on the 2.4.19-pre10-ac2 release for
this beta."
"HZ=1000 on i686 and Athlon means that the system clock ticks 10 times as
fast as on other x86 platforms (i386 and i586); HZ=100 has been the Linux
default on x86 platforms for the entire history of the Linux kernel. This
change provides better interactive response, lower latency response from
some programs, and better response from the scheduler. We have adjusted the
/proc filesystem to report numbers as if using the default HZ=100, but it is
possible that issues could arise -- please test and report bugs, as always."
So they aim for interactive response. OTOH, I think they include neither
pre-empt nor any low-latency work. I might be wrong. But the network console and
crash dump functionality they include (by Ingo Molnar, I reckon) seems
sweet.
-- v --
[email protected]
On Wed, 2002-07-10 at 14:07, Alan Cox wrote:
> > Why was the rate incremented to maintain interactive performance? Wasn't
> > that the whole idea of the pre-empt work? Does the burden of pre-empt
> > actually require this?
>
> Bizarrely in many cases it increases throughput
I can attest to this. We see the same thing with the preemptible kernel
(throughput increases on certain workloads).
My guess would be the better process response applies the same to
throughput: sooner to wake up, sooner to run, sooner to be done.
Robert Love
> Kurt Garloff wrote:
> > If you want translated kernel messages, use message IDs, that can be parsed
> > and translated in userspace,
>
> Agreed.
> What's been discussed with Rusty Russell (and I believe he has
> discussed this with Alan) is not modifying the printks, but providing
> logging macros that keep the format string separate from the vararg list
> (but written to a log file as a single event record).
You need a bit more than that. You need a consistent way to report
an IRQ number, a device name, a PCI object etc. That does mean tidying up
printk but not in a bad way. One can imagine either
"%s", irq_name(irq)
or
"%I", irq
type solutions
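A minimal user-space sketch of the first variant, in which every caller gets the same textual form for an IRQ number. The signature with a caller-supplied buffer and the "IRQ <n>" layout are assumptions for this example, not an existing kernel interface; the "%I" variant is not shown because custom conversion specifiers are not part of standard C.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper in the spirit of the "%s", irq_name(irq) idea.
 * Writes a consistent name for `irq` into `buf` and returns `buf`,
 * so it can be used directly as a "%s" argument. */
static const char *irq_name(unsigned int irq, char *buf, size_t len)
{
    snprintf(buf, len, "IRQ %u", irq);
    return buf;
}
```

A caller would then write something like `printf("%s: spurious interrupt\n", irq_name(11, name, sizeof name));`, and a log parser only ever has to recognize one spelling.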
> I'd still say no. Kernel messages are not meant for end users but for
> kernel developers. The latter all speak English, but chances are that most
Then why do they appear on the screen 8)
There are cases (eg someone selling Linux entirely into the PRC) where there
are a billion plus potential users for whom English is not just an odd
language but has a totally bizarre character set.
I can't conceive of anyone in the EU/US etc wanting to do such translation
but if you get it for free then it means we got the design right and someone
may one day find a good use for it.
(I missed part of this thread. I hope I correctly deduced that you
guys are talking about the improved disk throughput when increasing the
HZ clock rate ... )
Alan Cox wrote:
> > Why was the rate incremented to maintain interactive performance? Wasn't
> > that the whole idea of the pre-empt work? Does the burden of pre-empt
> > actually require this?
>
> Bizarrely in many cases it increases throughput
IMHO, this is a hint that there is something not quite right with the
scheduler.
This effect has been reported here a couple of times.
If increasing the timer rate improves disk throughput that means that
the disk-reading process is not scheduled immediately following the
disk interrupt, but is somehow left waiting until the next timer
tick....
It should be scheduled "immediately" even if there is another
cpu-eating process: the scheduling heuristics should help there...
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.
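To put numbers on the hypothesis in the message above: if a woken process really were left waiting until the next timer tick, the worst-case delay would be one tick period. The C sketch below only does that arithmetic; the function name is invented for illustration and nothing here measures actual scheduler behaviour.

```c
/* Worst-case wait, in microseconds, if a wakeup were only acted on at
 * the next timer tick. `hz` is the tick rate and must be non-zero. */
static unsigned long max_tick_wait_us(unsigned long hz)
{
    return 1000000UL / hz;
}
```

Under that assumption the bound drops from 10000 microseconds at HZ=100 to 1000 microseconds at HZ=1000, which would be consistent with throughput improving as the tick rate goes up.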
> Nobody seemed to be interested in this toolkit. The (s|l)trace toolkit and
> kdb seemed to be sufficient for most developers. (I don't whine here
> either.)
That's ridiculous - there's quite a lot of interest in having a
comprehensive system trace capability. The OLS RAS BoF confirmed that. This
sort of capability is essential for system serviceability. Talk to any
Service Engineer who has to deal with real-world non-recreatable problems,
which are nevertheless of enormous impact to a customer.
I would accept that LTT doesn't yet have all the desired features, but
those are on the way and this demonstrates a definite interest in LTT.
Also, LTT is currently, or is about to be, distributed in MontaVista's
carrier grade offering, Lineo and TurboLinux's enterprise server offering.
There's also a strong possibility that it will end up in United Linux 1.0
- no interest? I don't think so.
Richard J Moore RAS Project Lead - IBM Linux Technology Centre
>> Are there any reasons why these don't make it into 2.5?
>>
>> > - Better event logging for enterprise systems
>
> Linus was scared we could break old syslog parsers.
This is a surprising view given that what we currently have is broken.
Logging serves two purposes:
- problem determination - via a human interface
- system's management - via automation
It's the latter we need to be able to do reliably and can't because
currently:
- message uniqueness is not guaranteed
- message content is not complete for automation purposes
- some of the most serious error messages have the least useful content
- many messages are issued using multiple printks and on an MP system can
  have their text interleaved
- there's no national language support
- embedded systems are not well catered for
- message recognition and parsing is haphazard
EVL is not seeking to do a wholesale replacement of printk, but it does
provide the necessary infrastructure to achieve automation. Instrumentation
and re-instrumentation is an independent activity. It can be done in
incremental steps. But until we have a useful log management system service
we can't even begin to address the needs of system automation and systems'
management.
Again the OLS RAS BoF discussions were very focused on this issue and
supportive of it. Instrumentation was the key subject of discussion - how
to do it with no administrative overhead, how to do it in a way that
developers would find acceptable, how to satisfy the needs for NLS and
embedded systems.
Richard J Moore IBM Linux Technology Centre
> You need a bit more than that. You need a consistent way to report
> an IRQ number, a device name, a PCI object etc. That does mean tidying up
> printk but not in a bad way. One can imagine either
>
> "%s", irq_name(irq)
>
> or
> "%I", irq
>
> type solutions
This would be nice. I've been meaning to clean up the sparc64 irq
printing macros so I can use them on ppc64 (we have sparse irqs and
map them dynamically into the irq_desc array).
Same problem once we get PCI domains. So this cleanup will help more
than just event logging.
Anton
On Thu, Jul 11, 2002 at 09:34:37PM +0100, Richard J Moore wrote:
>
> message uniqueness is not guaranteed
> message content is not complete for automation purposes
> some of the most serious error messages have the least useful content
> many messages are issued using multiple printks and on an MP system can
> have their text interleaved
> there's no national language support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^[1]
> embedded systems are not well catered for
> message recognition and parsing is haphazard
[1] Should stay like this. Or at least only the original message
should go to the log file.
Just an example:
Huch? Sytemkernseite konnte nicht an Adresse 1098bf00 eingeblendet werden.
current->tss.cr3 = 0062d000, %cr3 = 0062d000
*pde = 00000000
Huch: 0002
Prozessor: 0
EIP: 0010:[oops:_oops+16/3868]
EFLAGS: 00010212
eax: 315e97cc ebx: 003a6f80
ecx: 001be77b edx: 00237c0c
esi: 00000000 edi: bffffdb3
ebp: 00589f90 esp: 00589f8c
ds: 0018 es: 0018 fs: 002b
gs: 002b ss: 0018
Prozeß oops_test (pid: 3374,
Prozeßnummer: 21, Kellerseite=00589000)
Keller: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
This is only German, which is very similar to English and whose character
set is only a slight superset of the character set used in the UK/US.
Now consider the same message in Chinese (20.000 possible
characters), Hebrew (right to left and 'strange' characters),
Arabic (right to left, strange rules for composing connections
between the characters, which even Micro$oft was unable to get
right) and many more fun.
So the driver writer has to get his debugging messages translated
somehow and then he has to ask the translation services again to
translate the bug report from the user.
Sorry, this sounds like someone is looking for a market niche
for getting more money instead of getting real problems solved.
Communication requires a common language. This is English for
problems regarding Linux kernel development and maintenance.
Anything else would slow down both, because of added complexity.
Regards
Ingo Oeser, who thinks i18ned user space messages are enough
pain in the ass.
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth