2002-10-30 15:07:38

by Guillaume Boissiere

Subject: [STATUS 2.5] October 30, 2002

Many new big items merged in the last few days:
IPsec, CryptoAPI, LVM2 (device-mapper), Digital Video Broadcasting layer, etc.
And still a long list of pending items marked as "Ready".
Oh, and Halloween is tomorrow.... :-)

http://www.kernelnewbies.org/status/ for all the details.
Enjoy!

-- Guillaume



-------------------------------------------------------
Linux Kernel 2.5 Status - October 30th, 2002
(Latest kernel release is 2.5.44)

Items in bold have changed since last week.
Items in grey are post Halloween (feature freeze).

Features:

Merged
o in 2.5.1+ Rewrite of the block IO (bio) layer (Jens Axboe)
o in 2.5.2 Initial support for USB 2.0 (David Brownell, Greg Kroah-Hartman, etc.)
o in 2.5.2 Per-process namespaces, late-boot cleanups (Al Viro, Manfred Spraul)
o in 2.5.2+ New scheduler for improved scalability (Ingo Molnar)
o in 2.5.2+ New kernel device structure (kdev_t) (Linus Torvalds, etc.)
o in 2.5.3 IDE layer update (Andre Hedrick)
o in 2.5.3 Support reiserfs external journal (Reiserfs team)
o in 2.5.3 Generic ACL (Access Control List) support (Nathan Scott)
o in 2.5.3 PnP BIOS driver (Alan Cox, Thomas Hood, Dave Jones, etc.)
o in 2.5.3+ New driver model & unified device tree (Patrick Mochel)
o in 2.5.4 Add preempt kernel option (Robert Love, MontaVista team)
o in 2.5.4 Support for Next Generation POSIX Threading (NGPT team)
o in 2.5.5 Add ALSA (Advanced Linux Sound Architecture) (ALSA team)
o in 2.5.5 Pagetables in highmem support (Ingo Molnar, Arjan van de Ven)
o in 2.5.5 New architecture: AMD 64-bit (x86-64) (Andi Kleen, x86-64 Linux team)
o in 2.5.5 New architecture: PowerPC 64-bit (ppc64) (Anton Blanchard, ppc64 team)
o in 2.5.6 Add JFS (Journaling FileSystem from IBM) (JFS team)
o in 2.5.6 per_cpu infrastructure (Rusty Russell)
o in 2.5.6 HDLC (High-level Data Link Control) update (Krzysztof Halasa)
o in 2.5.6 smbfs Unicode and large file support (Urban Widmark)
o in 2.5.7 New driver API for Wireless Extensions (Jean Tourrilhes)
o in 2.5.7 Video for Linux (V4L) redesign (Gerd Knorr)
o in 2.5.7 Futexes (Fast Lightweight Userspace Semaphores) (Rusty Russell, etc.)
o in 2.5.7+ NAPI network interrupt mitigation (Jamal Hadi Salim, Robert Olsson, Alexey
Kuznetsov)
o in 2.5.7+ ACPI (Advanced Configuration & Power Interface) (Andy Grover, ACPI team)
o in 2.5.8 Syscall interface for CPU task affinity (Robert Love)
o in 2.5.8 Radix-tree pagecache (Momchil Velikov, Christoph Hellwig)
o in 2.5.9 Smarter IRQ balancing (Ingo Molnar)
o in 2.5.11 Replace old NTFS driver with NTFS TNG driver (Anton Altaparmakov)
o in 2.5.11 Fast walk dcache (Hanna Linder)
o in 2.5.11+ Rewrite of the framebuffer layer (James Simmons)
o in 2.5.12+ Rewrite of the buffer layer (Andrew Morton)
o in 2.5.14 Support for IDE TCQ (Tagged Command Queueing) (Jens Axboe)
o in 2.5.14 Bluetooth support (no longer experimental!) (Maxim Krasnyansky, Bluetooth team)
o in 2.5.17 New quota system supporting plugins (Jan Kara)
o in 2.5.17+ Move ISDN4Linux to CAPI based interface (Kai Germaschewski, ISDN4Linux team)
o in 2.5.18 Software suspend (to disk & RAM) (Pavel Machek)
o in 2.5.23 More complete IEEE 802.2 stack (Arnaldo Carvalho de Melo, Jay Schulist, from
Procom donated code)
o in 2.5.23+ Hotplug CPU support (Rusty Russell)
o in 2.5.25 Faster internal kernel clock frequency (Linus Torvalds)
o in 2.5.26 Direct pagecache <-> BIO disk I/O (Andrew Morton)
o in 2.5.27+ New VM with reverse mappings (Rik van Riel)
o in 2.5.28+ Serial driver restructure (Russell King)
o in 2.5.28 Remove the "Big IRQ lock" (Ingo Molnar)
o in 2.5.29+ Thread-Local Storage (TLS) support (Ingo Molnar)
o in 2.5.29+ Add Linux Security Module (LSM) (LSM team)
o in 2.5.29+ Strict address space accounting (Alan Cox)
o in 2.5.31+ Disk description cleanups (Al Viro)
o in 2.5.31 Support insane number of processes (Linus Torvalds)
o in 2.5.32 New MTRR (Memory Type Range Register) driver (Patrick Mochel)
o in 2.5.32+ Porting all input devices over to input API (Vojtech Pavlik, James Simmons)
o in 2.5.32+ Asynchronous IO (aio) support (Ben LaHaise)
o in 2.5.32+ Improved POSIX threading support (Ingo Molnar)
o in 2.5.33 SCTP (Stream Control Transmission Protocol) (lksctp team)
o in 2.5.33 TCP segmentation offload (Alexey Kuznetsov)
o in 2.5.34 discontigmem support (ia32) (Pat Gaughen, Martin Bligh, Jack Steiner, Tony Luck)
o in 2.5.34 POSIX threading support for signals (Ingo Molnar)
o in 2.5.35 Add User-Mode Linux (UML) (Jeff Dike)
o in 2.5.35 Serial ATA support (Andre Hedrick)
o in 2.5.36 Add XFS (A journaling filesystem from SGI) (XFS team)
o in 2.5.37 Remove the global tasklist (Ingo Molnar, William Lee Irwin)
o in 2.5.39 New IO scheduler (Jens Axboe)
o in 2.5.40 Add support for CPU clock/voltage scaling (Dominik Brodowski, Erik Mouw, Dave
Jones, Russell King, Arjan van de Ven)
o in 2.5.40 NUMA topology support (Matt Dobson)
o in 2.5.40 Parallelizing page replacement (Andrew Morton, Momchil Velikov, Dave Hansen,
William Lee Irwin)
o in 2.5.42 Improved i2o (Intelligent Input/Output) layer (Alan Cox)
o in 2.5.42 Remove the 2TB block device limit (Peter Chubb)
o in 2.5.42 Add new CIFS (Common Internet File System) (Steve French)
o in 2.5.42 ext2/ext3 large directory support: HTree index (Daniel Phillips, Christopher Li,
Andrew Morton, Ted Ts'o)
o in 2.5.43 Add support for NFS v4 (NFS v4 team, Trond Myklebust, Neil Brown)
o in 2.5.43 Read-Copy Update (RCU) Mutual Exclusion (Dipankar Sarma, Rusty Russell, Andrea
Arcangeli, LSE Team)
o in 2.5.43 Add OProfile, a low-overhead profiler (John Levon)
o in 2.5.43 Andrew File System (AFS) support (David Howells)
o in 2.5.44 x86 BIOS Enhanced Disk Device (EDD) polling (Matt Domsch)
o in 2.5.44 Plug'N Play Layer Rewrite (Adam Belay)
o in 2.5.45 Device mapper for Logical Volume Manager (LVM2) (Alasdair Kergon, Patrick
Caulfield, Joe Thornber)
o in 2.5.45 Digital Video Broadcasting (DVB) layer (LinuxTV team)
o in 2.5.45 IPsec support (Alexey Kuznetsov, Dave Miller, USAGI team)
o in 2.5.45 CryptoAPI (James Morris)


o in -mm Page table sharing (Daniel Phillips, Dave McCracken)
o in -mm Extended Attributes and ACLs for ext2/ext3 (Ted Ts'o)
o in -mm Per-cpu hot & cold page lists (Andrew Morton, Martin Bligh)
o in -ac MMU-less processor support (ucLinux) (Greg Ungerer)


o Ready Build option for Linux Trace Toolkit (LTT) (Karim Yaghmour)
o Ready Kernel Probes (kprobes) (Vamsi Krishna, kprobes team)
o Ready High resolution timers (George Anzinger, etc.)
o Ready EVMS (Enterprise Volume Management System) (EVMS team)
o Ready Linux Kernel Crash Dumps (Matt Robinson, LKCD team)
o Ready Rewrite of the console layer (James Simmons)
o Ready Zerocopy NFS (Hirokazu Takahashi)
o Ready Kexec, syscall to load kernel from kernel (Eric Biederman)
o Ready New Linux configuration system (Roman Zippel)
o Ready In-kernel module loader (Rusty Russell)
o Ready Unified boot/parameter support (Rusty Russell)
o Ready Support insane number of groups (Tim Hockin)
o Ready Better I/O performance with epoll (Davide Libenzi)
o Ready NUMA aware scheduler extensions (Erich Focht, Michael Hohnbaum)
o Ready Replace initrd by initramfs (H. Peter Anvin, Al Viro)
o Ready SCSI and FibreChannel Hotswap Support (Steven Dake)


o Beta Worldclass support for IPv6 (Alexey Kuznetsov, Dave Miller, Jun Murai, Yoshifuji
Hideaki, USAGI team)
o Beta Reiserfs v4 (Reiserfs team)
o Beta SCSI multipath IO (with NUMA support) (Patrick Mansfield, Mike Anderson)


o Alpha Basic NUMA API (Matt Dobson)
o Alpha Remove waitqueue heads from kernel structures (William Lee Irwin)
o Alpha NUMA aware slab allocator (Manfred Spraul, Martin Bligh)


o Started 32bit dev_t (?)


o Post-freeze Change all drivers to new driver model (All maintainers)
o Post-freeze Fix device naming issues (Patrick Mochel, Greg Kroah-Hartman)
o Post-freeze Better event logging for enterprise systems (Larry Kessler, evlog team)
o Post-freeze Page table reclamation (William Lee Irwin, Rik Van Riel)
o Post-freeze UMSDOS (Unix under MS-DOS) Rewrite (Al Viro)
o Post-freeze USB gadget support (Stuart Lynne, Greg Kroah-Hartman)
o Post-freeze Overhaul PCMCIA support (David Woodhouse, David Hinds)
o Post-freeze InfiniBand support (InfiniBand team)
o Post-freeze Per-mountpoint read-only, union-mounts, unionfs (Al Viro)
o Post-freeze More complete NetBEUI stack (Arnaldo Carvalho de Melo, from Procom donated
code)
o Post-freeze New mount API (Al Viro)
o Post-freeze Add thrashing control (Rik van Riel)
o Post-freeze Remove all hardwired drivers from kernel (Alan Cox, etc.)
o Post-freeze Improved AppleTalk stack (Arnaldo Carvalho de Melo)
o Post-freeze ext2/ext3 online resize support (Andreas Dilger)
o Post-freeze New lightweight library (klibc) (H. Peter Anvin)
o Post-freeze UDF Write support for CD-R/RW (packet writing) (Jens Axboe, Peter Osterlund)
o Post-freeze Scalable Statistics Counter (Ravikiran Thirumalai)
o Post-freeze Add hardware sensors drivers (lm_sensors team)



Cleanups:

Merged
o in 2.5.3 Break Configure.help into multiple files (Linus Torvalds)
o in 2.5.3 Untangle sched.h & fs.h include dependencies (Dave Jones, Roman Zippel)
o in 2.5.4 Per network protocol slabcache & sock.h (Arnaldo Carvalho de Melo)
o in 2.5.4 Per filesystem slabcache & fs.h (Daniel Phillips, Jeff Garzik, Al Viro)
o in 2.5.6 Killing kdev_t for block devices (Al Viro)
o in 2.5.18+ ->getattr() ->setattr() ->permission() changes (Al Viro)
o in 2.5.21 Split up x86 setup.c into manageable pieces (Patrick Mochel)
o in 2.5.23+ Major MD tool (RAID 5) cleanup (Neil Brown)
o in 2.5.30 Remove khttpd (Christoph Hellwig)
o in 2.5.31 Rework datalink protocols to not use cli/sti (Arnaldo Carvalho de Melo)
o in 2.5.31 Remove incomplete SPX network stack (Arnaldo Carvalho de Melo)
o in 2.5.43 Remove kiobufs (Andrew Morton)


o in -mm Avoid dcache_lock while path walking (Maneesh Soni, IBM team)


o Ready Switch to ->get_super() for file_system_type (Al Viro)


o Beta file.h and INIT_TASK (Benjamin LaHaise)
o Beta Proper UFS fixes, ext2 and locking cleanups (Al Viro)
o Beta Lifting limitations on mount(2) (Al Viro)


o Started Reorder x86 initialization (Dave Jones, Randy Dunlap)



Have some free time and want to help? Check out the Kernel Janitor
TO DO list for a list of source code cleanups you can work on.
A great place to start learning more about kernel internals!


2002-10-30 15:50:00

by YOSHIFUJI Hideaki

Subject: Re: [STATUS 2.5] October 30, 2002

In article <3DBFB0D2.21734.21E3A6B@localhost> (at Wed, 30 Oct 2002 10:13:38 -0500), "Guillaume Boissiere" <[email protected]> says:

> o in 2.5.45 IPsec support (Alexey Kuznetsov, Dave Miller, USAGI team)

What is the status of IPsec for IPv6?


> o Beta Worldclass support for IPv6 (Alexey Kuznetsov, Dave Miller, Jun Murai, Yoshifuji
> Hideaki, USAGI team)

We're almost done.

One thing that I'll contribute before the feature freeze is:
- Privacy Extensions for IPv6 addrconf

The remaining things which we DO want to see in 2.6 are:
- check if "rmmod ipv6" is ok
- IPv6 source address selection; which will be mandated by the
node requirement.
- IPsec for IPv6
- make IPv6 non-experimental :-)
- several enhancements on specification conformity
(neighbour discovery etc.)

Thanks.

--
Hideaki YOSHIFUJI @ USAGI Project <[email protected]>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

2002-10-30 16:11:06

by Dave Jones

Subject: Re: [STATUS 2.5] October 30, 2002

> o in 2.5.35 Serial ATA support (Andre Hedrick)

Erm, really ?

> o Post-freeze Add hardware sensors drivers (lm_sensors team)

Something else I took a look at in the last few days was the ECC
drivers. These are also zero impact, and could go in after the freeze
(assuming the authors want them merged). They could do with a small
amount of cleanup, but otherwise look ok.

> o Started Reorder x86 initialization (Dave Jones, Randy Dunlap)

I've jiggled a bunch of this (Randy didn't have time to play here)
around as much as it's probably going to be for 2.6. It's in -dj,
has been sent for -ac, and will likely go to Linus post-freeze,
as it's all cleanups and one-liners.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk

2002-10-30 17:12:58

by Randy.Dunlap

Subject: Re: [STATUS 2.5] October 30, 2002

On Wed, 30 Oct 2002, Dave Jones wrote:

| > o Started Reorder x86 initialization (Dave Jones, Randy Dunlap)
|
| I've jiggled a bunch of this (Randy didn't have time to play here)
| around as much as it's probably going to be for 2.6. It's in -dj,
| has been sent for -ac, and will likely go to Linus post-freeze,
| as it's all cleanups and one-liners.

Right. Please remove my name from that item.

--
~Randy

2002-10-30 22:40:00

by David Miller

Subject: Re: [STATUS 2.5] October 30, 2002

   From: YOSHIFUJI Hideaki / 吉藤英明 <[email protected]>
   Date: Thu, 31 Oct 2002 00:55:35 +0900 (JST)

   > o in 2.5.45 IPsec support (Alexey Kuznetsov, Dave Miller, USAGI team)

   How is the status of IPsec for IPv6?

It will be done after ipv4 side is fully functional.

   - IPv6 source address selection; which will be mandated by the
   node requirement.

We told you several times how this USAGI patch is not currently in an
acceptable form and needs to be reimplemented via the routing code.

   - IPsec for IPv6

Alexey and I will implement this, it is basically reading RFCs and
typing at the keyboard, no more.

   - several enhancements on specification conformity
   (neighbour discovery etc.)

Where are these patches? I've applied everything you've submitted.

2002-10-31 02:42:35

by YOSHIFUJI Hideaki

Subject: Re: [STATUS 2.5] October 30, 2002

In article <[email protected]> (at Wed, 30 Oct 2002 14:36:15 -0800 (PST)), "David S. Miller" <[email protected]> says:

> - IPv6 source address selection; which will be mandated by the
> node requirement.
>
> We told you several times how this USAGI patch is not currently in an
> acceptable form and needs to be reimplemented via the routing code.

Yes, but I think
- integrate our code to your tree
then
- reimplement (re-design)
is a better way to go forward.

This is because the code, which works well in O(n) as the current one
does, will tell you our needs and intentions better than our
babble when you re-design it; I believe we will achieve a better
design this way.


> - several enhancements on specification conformity
> (neighbour discovery etc.)
>
> Where are these patches? I've applied everything you've submitted.

Yes, thanks.

I need to check the results of the current code and
look at the diff byte by byte before preparing
patches for the current tree.

--
Hideaki YOSHIFUJI @ USAGI Project <[email protected]>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

2002-10-31 02:48:29

by David Miller

Subject: Re: [STATUS 2.5] October 30, 2002

   From: YOSHIFUJI Hideaki / 吉藤英明 <[email protected]>
   Date: Thu, 31 Oct 2002 11:48:32 +0900 (JST)

   In article <[email protected]> (at Wed, 30 Oct 2002 14:36:15 -0800 (PST)), "David S. Miller" <[email protected]> says:

   > We told you several times how this USAGI patch is not currently in an
   > acceptable form and needs to be reimplemented via the routing code.

   Yes, but I think
   - integrate our code to your tree
   then
   - reimplement (re-design)
   is better way to go forward.

Absolutely not, we do not put improperly architected code into the
tree first then clean it up later.

Especially because this source address selection code interferes with
many IPSEC issues. Source address selection belongs at routing
tables, and there is no arguing about this. If you put it somewhere
else it gets in the way and causes many problems.

Please implement source address selection properly, then resubmit.
Thank you.

   I need to check the result of current code and
   to look at diff by byte-to-byte before preparing
   patches for current tree.

Ok.

2002-10-31 03:01:56

by Alexey Kuznetsov

Subject: Re: [STATUS 2.5] October 30, 2002

Hello!

> Please implement source address selection properly, then resubmit.

Actually, I would propose... not to worry about this for a while.
The issue might happen to dissolve after cleaning the space
around ip6_route_output().

Alexey

2002-10-31 03:10:04

by YOSHIFUJI Hideaki

Subject: Re: [STATUS 2.5] October 30, 2002

In article <[email protected]> (at Wed, 30 Oct 2002 18:44:43 -0800 (PST)), "David S. Miller" <[email protected]> says:

> > We told you several times how this USAGI patch is not currently in an
> > acceptable form and needs to be reimplemented via the routing code.
>
> Yes, but I think
> - integrate our code to your tree
> then
> - reimplement (re-design)
> is better way to go forward.
>
> Absolutely not, we do not put improperly architected code into the
> tree first then clean it up later.

That patch does NOT change the current architecture, so the above is unfair.

It would be OK to say "we do not put code into an improperly
architected part of the tree and then clean it up later."

--
Hideaki YOSHIFUJI @ USAGI Project <[email protected]>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

2002-10-31 03:17:36

by David Miller

Subject: Re: [STATUS 2.5] October 30, 2002

   From: YOSHIFUJI Hideaki / 吉藤英明 <[email protected]>
   Date: Thu, 31 Oct 2002 12:16:09 +0900 (JST)

   In article <[email protected]> (at Wed, 30 Oct 2002 18:44:43 -0800 (PST)), "David S. Miller" <[email protected]> says:

   > Absolutely not, we do not put improperly architected code into the
   > tree first then clean it up later.

   That patch do NOT change current architecture so above is unfair.

OK, I correct myself: this patch adds more dependencies on a badly
architected area, making it _harder_ for us to clean it up.

2002-10-31 06:18:08

by Eric W. Biederman

Subject: Re: [STATUS 2.5] October 30, 2002

Dave Jones <[email protected]> writes:

> Something else I took a look at in the last few days was the ECC
> drivers. These are also zero impact, and could go in after the freeze
> (assuming the authors want them merged). They could do with a small
> amount of cleanup, but otherwise look ok.

Assuming they work. No offense to the guys who got the ball rolling, but
the architecture is lousy, and every driver I have messed with does not
work correctly, and I wind up reimplementing it before I can use it.

I actually like the idea of ECC drivers, and routinely make certain
there is a working ECC driver on the systems I ship. It is so much
easier to catch memory errors with good ECC error reporting. But
unless I have slept soundly through a fundamental change, the
linux-ecc project currently does not ship quality drivers. The
infrastructure is bad, and the code is not quite correct.

If you want I can dig up the drivers I am currently using and send
them to you.

I even have a working memory scrub routine.

Eric

2002-10-31 10:30:34

by Alan

Subject: Re: [STATUS 2.5] October 30, 2002

On Thu, 2002-10-31 at 06:22, Eric W. Biederman wrote:
> I actually like the idea of ECC drivers, and routinely make certain
> there is a working ECC driver on the systems I ship. It is so much
> easier to catch memory errors with good ECC error reporting. But
> unless I have slept soundly through a fundamental change, the
> linux-ecc project currently does not ship quality drivers. The
> infrastructure is bad, and the code is not quite correct.
>
> If you want I can dig up the drivers I am currently using and send
> them to you.

That would be really cool

2002-10-31 14:35:04

by Dave Jones

Subject: Re: [STATUS 2.5] October 30, 2002

On Wed, Oct 30, 2002 at 11:22:12PM -0700, Eric W. Biederman wrote:
> I actually like the idea of ECC drivers, and routinely make certain
> there is a working ECC driver on the systems I ship. It is so much
> easier to catch memory errors with good ECC error reporting. But
> unless I have slept soundly through a fundamental change, the
> linux-ecc project currently does not ship quality drivers. The
> infrastructure is bad, and the code is not quite correct.
>
> If you want I can dig up the drivers I am currently using and send
> them to you.

Go wild..

Dave

--
| Dave Jones. http://www.codemonkey.org.uk

2002-10-31 16:28:29

by Randy.Dunlap

Subject: Re: [STATUS 2.5] October 30, 2002

On 31 Oct 2002, Alan Cox wrote:

| On Thu, 2002-10-31 at 06:22, Eric W. Biederman wrote:
| > I actually like the idea of ECC drivers, and routinely make certain
| > there is a working ECC driver on the systems I ship. It is so much
| > easier to catch memory errors with good ECC error reporting. But
| > unless I have slept soundly through a fundamental change, the
| > linux-ecc project currently does not ship quality drivers. The
| > infrastructure is bad, and the code is not quite correct.
| >
| > If you want I can dig up the drivers I am currently using and send
| > them to you.
|
| That would be really cool

Ditto.

--
~Randy

2002-10-31 22:04:55

by Pavel Machek

Subject: Re: [STATUS 2.5] October 30, 2002

Hi!

> If you want I can dig up the drivers I am currently using and send
> them to you.
>
> I even have a working memory scrub routine.

What is "memory scrubbing" good for?
Pavel
--
When do you have heart between your knees?

2002-11-01 14:01:49

by Eric W. Biederman

Subject: Re: [STATUS 2.5] October 30, 2002

Pavel Machek <[email protected]> writes:

> Hi!
>
> > If you want I can dig up the drivers I am currently using and send
> > them to you.
> >
> > I even have a working memory scrub routine.
>
> What is "memory scrubbing" good for?

When you have a correctable ECC error on a page you need to rewrite the
memory to remove the error. This prevents the correctable error from becoming
an uncorrectable error if another bit goes bad. Also, if you have a
working software memory scrub routine, you can be certain that multiple
errors from the same address are actually distinct, as opposed to
multiple reports of the same error.

Eric
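
The difference between a correctable and an uncorrectable error can be
illustrated with a toy SECDED (single-error-correct, double-error-detect)
code. The sketch below is not taken from any driver or memory controller
mentioned in this thread; it assumes an extended Hamming(8,4) code over a
4-bit value, and the names ecc_encode and ecc_check are hypothetical. Real
ECC memory typically protects much wider words in hardware.

```c
#include <stdint.h>

enum ecc_status { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* XOR of the positions (1..7) of all set bits; zero for a valid codeword. */
static unsigned ecc_syndrome(uint8_t cw)
{
    unsigned s = 0;
    for (unsigned pos = 1; pos < 8; pos++)
        if (cw & (1u << pos))
            s ^= pos;
    return s;
}

/* Parity (0 or 1) taken over all eight bits of the codeword. */
static unsigned ecc_parity(uint8_t cw)
{
    unsigned p = 0;
    for (unsigned v = cw; v; v >>= 1)
        p ^= v & 1u;
    return p;
}

/* Encode the low 4 bits of nibble into an 8-bit SECDED codeword. */
uint8_t ecc_encode(uint8_t nibble)
{
    static const unsigned data_pos[4] = { 3, 5, 6, 7 };
    unsigned cw = 0;

    for (unsigned i = 0; i < 4; i++)
        if (nibble & (1u << i))
            cw |= 1u << data_pos[i];

    /* Set the check bits at positions 1, 2 and 4 so the syndrome is zero. */
    unsigned s = ecc_syndrome((uint8_t)cw);
    for (unsigned b = 0; b < 3; b++)
        if (s & (1u << b))
            cw |= 1u << (1u << b);

    /* Bit 0 is the overall parity bit: make the total parity even. */
    cw |= ecc_parity((uint8_t)cw);
    return (uint8_t)cw;
}

/* Check a codeword in place, repairing it if exactly one bit flipped. */
enum ecc_status ecc_check(uint8_t *cw)
{
    unsigned s = ecc_syndrome(*cw);
    unsigned p = ecc_parity(*cw);

    if (s == 0 && p == 0)
        return ECC_OK;
    if (p == 1) {
        /* Odd overall parity: one flipped bit, at position s
         * (s == 0 means the overall parity bit itself). */
        *cw ^= (uint8_t)(1u << s);
        return ECC_CORRECTED;
    }
    /* Even parity with a nonzero syndrome: two bits flipped. */
    return ECC_UNCORRECTABLE;
}
```

In this model, scrubbing corresponds to writing the repaired codeword back
after ECC_CORRECTED, so that a later, second bit flip lands on a clean word
and is again correctable instead of producing ECC_UNCORRECTABLE.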

2002-11-01 16:23:06

by Alan

Subject: Re: [STATUS 2.5] October 30, 2002

On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> When you have a correctable ECC error on a page you need to rewrite the
> memory to remove the error. This prevents the correctable error from becoming
> an uncorrectable error if another bit goes bad. Also if you have a
> working software memory scrub routine you can be certain multiple
> errors from the same address are actually distinct. As opposed to
> multiple reports of the same error.

Note that this area has some extremely "interesting" properties. For one,
you have to be very careful what operation you use to scrub, and it's
platform specific. On x86, for example, you want to do something like
"lock addl $0, mem". A simple read/write isn't safe, because if the memory
area is a DMA target, your read-then-write has just corrupted data and made
the problem worse, not better!
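
A minimal user-space sketch of the operation described above, assuming GCC
or Clang inline assembly on x86. The function names scrub_word and
scrub_range are hypothetical, and the non-x86 fallback is only a
placeholder: whether an atomic add of zero forces a corrected write-back
elsewhere is platform specific, as noted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite one 32-bit word with a single locked read-modify-write, so a
 * concurrent writer cannot slip in between the read and the write. */
static inline void scrub_word(volatile uint32_t *word)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0, %0"
                         : "+m" (*word)
                         :
                         : "memory", "cc");
#else
    /* Assumption: placeholder for other architectures, not a verified
     * scrub primitive. */
    (void)__atomic_fetch_add(word, 0u, __ATOMIC_SEQ_CST);
#endif
}

/* Scrub nwords consecutive 32-bit words starting at start. */
void scrub_range(volatile uint32_t *start, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        scrub_word(&start[i]);
}
```

Adding zero leaves the stored value unchanged; the point of the locked
instruction is that the read and the write happen as one atomic operation
rather than as a separate load and store.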

2002-11-01 16:53:24

by Richard B. Johnson

Subject: Re: [STATUS 2.5] October 30, 2002

On 1 Nov 2002, Alan Cox wrote:

> On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> > When you have a correctable ECC error on a page you need to rewrite the
> > memory to remove the error. This prevents the correctable error from becoming
> > an uncorrectable error if another bit goes bad. Also if you have a
> > working software memory scrub routine you can be certain multiple
> > errors from the same address are actually distinct. As opposed to
> > multiple reports of the same error.
>
> Note that this area has some extremely "interesting" properties. For one
> you have to be very careful what operation you use to scrub and its
> platform specific. On x86 for example you want to do something like lock
> addl $0, mem. A simple read/write isnt safe because if the memory area
> is a DMA target your read then write just corrupted data and made the
> problem worse not better!
>

The correctable ECC is supposed to be just that (correctable). It's
supposed to be entirely transparent to the CPU/Software. An additional
read of the affected error produces the same correction so the CPU
will never even know. The x86 CPU/Software is only notified on an
uncorrectable error. I don't know of any SDRAM controller that
generates an interrupt upon a correctable error. Some store "logging"
information internally, very difficult to get at on a running system.

Given that, "scrubbing" RAM seems to be somewhat useless on a
running system. The next write to the affected area will fix the
ECC bits; that's what is supposed to clear up the condition.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-01 18:11:24

by Ed Vance

Subject: RE: [STATUS 2.5] October 30, 2002

On Fri, November 01, 2002 at 9:00 AM, Richard B. Johnson wrote:
> [...]
> The correctable ECC is supposed to be just that (correctable). It's
> supposed to be entirely transparent to the CPU/Software. An additional
> read of the affected error produces the same correction so the CPU
> will never even know. The x86 CPU/Software is only notified on an
> uncorrectable error. I don't know of any SDRAM controller that
> generates an interrupt upon a correctable error. Some store "logging"
> information internally, very difficult to get at on a running system.
>
> Given that, "scrubbing" RAM seems to be somewhat useless on a
> running system. The next write to the affected area will fix the
> ECC bits, that's what is supposed to clear up the condition.
>

Scrubbing has nothing whatever to do with the reporting of correctable errors
to the CPU, even if the CPU is what does the scrubbing.

Scrubbing does not happen on the basis of chance detection of correctable
errors from normal activity, because that would sometimes be too late.
Remember, the hardware only finds out about an error when the word is
accessed. There is no detection of the bit cell getting its charge altered,
and the errors are cumulative between corrections.

Scrubbing is intended to lower the probability that any given memory word
will be hit by a second error causing event (such as an alpha particle
emitted from a ceramic case) without having been accessed and corrected. The
scrub just continuously rolls through all of physical memory (at low
priority) again and again doing whatever level of access is necessary to
cause correction. This limits the maximum time between correction of any
memory word. Some memory systems automatically correct and rewrite
(atomically) on a read of a word with a single bit error. Some mainframe
memory systems do the whole ECC scrub/correction operation in hardware,
simultaneously in each bank.

The primary benefit of logging is to catch deteriorating memory cells during
periodic maintenance that either do not correct at all (single stuck bit,
single hits become uncorrectable) or that repeatedly fail over time, perhaps
due to charge leaks from long term diffusion of contaminants.

Cheers,
Ed
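
The continuous low-priority sweep described above can be sketched as a
"patrol" that touches a bounded number of words per call and wraps around
at the end of the region. This is an illustration under stated assumptions,
not code from any scrubber mentioned in the thread: struct patrol,
patrol_touch and patrol_step are hypothetical names, the region is an
ordinary array standing in for physical memory, and the atomic add of zero
stands in for whatever access a given platform needs to trigger correction.

```c
#include <stddef.h>
#include <stdint.h>

struct patrol {
    volatile uint32_t *base;  /* start of the region to sweep */
    size_t nwords;            /* region size in 32-bit words */
    size_t next;              /* index of the next word to touch */
    unsigned long passes;     /* completed full sweeps */
};

/* Assumption: stand-in for the platform-specific scrub access. */
static void patrol_touch(volatile uint32_t *word)
{
    (void)__atomic_fetch_add(word, 0u, __ATOMIC_SEQ_CST);
}

/* Touch at most budget words, then return so other work can run.
 * Calling this repeatedly bounds the time between two visits to a word. */
void patrol_step(struct patrol *p, size_t budget)
{
    if (p->nwords == 0)
        return;

    while (budget--) {
        patrol_touch(&p->base[p->next]);
        if (++p->next == p->nwords) {
            p->next = 0;
            p->passes++;
        }
    }
}
```

With a fixed budget per call and a fixed call interval, the worst-case time
between corrections of any word is roughly (nwords / budget) intervals,
which is the quantity the sweep is meant to limit.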

2002-11-01 18:43:45

by Malcolm Beattie

Subject: Re: [STATUS 2.5] October 30, 2002

Ed Vance writes:
> Some mainframe
> memory systems do the whole ECC scrub/correction operation in hardware,
> simultaneously in each bank.

For those interested in the gory details of how the z900 mainframe
does memory scrubbing, see the section on "Memory" in
"RAS design for the IBM eServer z900" by L. C. Alves et al
in the z900 issue of IBM Journal of Research and Development.
HTML version at
http://www.research.ibm.com/journal/rd/464/alves.html
PDF version at
http://www.research.ibm.com/journal/rd/464/alves.pdf
Web page for whole issue at
http://www.research.ibm.com/journal/rd46-45.html

--Malcolm

2002-11-02 12:15:38

by Eric W. Biederman

Subject: Re: [STATUS 2.5] October 30, 2002

"Richard B. Johnson" <[email protected]> writes:

> On 1 Nov 2002, Alan Cox wrote:
>
> > On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> > > When you have a correctable ECC error on a page you need to rewrite the
> > > memory to remove the error. This prevents the correctable error from becoming
> > > an uncorrectable error if another bit goes bad. Also if you have a
> > > working software memory scrub routine you can be certain multiple
> > > errors from the same address are actually distinct. As opposed to
> > > multiple reports of the same error.
> >
> > Note that this area has some extremely "interesting" properties. For one
> > you have to be very careful what operation you use to scrub and its
> > platform specific. On x86 for example you want to do something like lock
> > addl $0, mem. A simple read/write isnt safe because if the memory area
> > is a DMA target your read then write just corrupted data and made the
> > problem worse not better!

Yep: "lock addl $0, mem" with the appropriate kmaps, so it will work on any
system I use. It isn't rocket science, but since it uses kmap_atomic, that
function at least should probably get into the kernel.

> The correctable ECC is supposed to be just that (correctable). It's
> supposed to be entirely transparent to the CPU/Software. An additional
> read of the affected error produces the same correction so the CPU
> will never even know. The x86 CPU/Software is only notified on an
> uncorrectable error. I don't know of any SDRAM controller that
> generates an interrupt upon a correctable error. Some store "logging"
> information internally, very difficult to get at on a running system.

Polling the memory controller periodically isn't hard, and you can usually
get an interrupt as well, though I have not explored the whole interrupt
territory. Finding out when you have a corrected error is extremely useful,
as it gives a warning that your memory is going bad. Just as with a disk,
getting a bunch of errors means it is time to replace it, but you still
have a little time left.

> Given that, "scrubbing" RAM seems to be somewhat useless on a
> running system. The next write to the affected area will fix the
> ECC bits, that't what is supposed to clear up the condition.

If it is your kernel text space that is getting the error there will
be no next write.

Beyond that, if you are trying to see whether the multiple correctable errors
you have are a single error or an actual problem, software scrubbing helps,
because then you know the second report came because the problem recurred,
making it likely you have a bad bit in your DIMM.

Eric
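
The bookkeeping implied above, where each reported address is scrubbed and a
repeat report from the same address is then treated as a recurrence, might
look like the following sketch. Everything here is hypothetical: struct
ce_log, ce_report and the fixed-size table are illustration only, and how
the error address is actually read from a memory controller is outside the
scope of the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CE_SLOTS 16

/* Small fixed-size log of addresses that reported correctable errors. */
struct ce_log {
    uint64_t addr[CE_SLOTS];
    unsigned count[CE_SLOTS];
    size_t used;
};

/* Record a correctable error at addr. The caller is assumed to scrub the
 * address after every report, so a repeat from the same address means the
 * error came back after being repaired. Returns true in that case, which
 * suggests a bad bit rather than a one-off upset. */
bool ce_report(struct ce_log *log, uint64_t addr)
{
    for (size_t i = 0; i < log->used; i++)
        if (log->addr[i] == addr)
            return ++log->count[i] > 1;

    if (log->used < CE_SLOTS) {
        log->addr[log->used] = addr;
        log->count[log->used] = 1;
        log->used++;
    }
    return false;  /* first report for this address (or table full) */
}
```

Without the scrub between reports, the second report from an address could
simply be the same uncorrected bit being read again, which is the ambiguity
the message above points out.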

2002-11-04 14:30:44

by Richard B. Johnson

Subject: Re: [STATUS 2.5] October 30, 2002

On 2 Nov 2002, Eric W. Biederman wrote:

> "Richard B. Johnson" <[email protected]> writes:
>
> > On 1 Nov 2002, Alan Cox wrote:
> >
> > > On Fri, 2002-11-01 at 14:05, Eric W. Biederman wrote:
> > > > When you have a correctable ECC error on a page you need to rewrite the
> > > > memory to remove the error. This prevents the correctable error from becoming
> > > > an uncorrectable error if another bit goes bad. Also if you have a
> > > > working software memory scrub routine you can be certain multiple
> > > > errors from the same address are actually distinct. As opposed to
> > > > multiple reports of the same error.
> > >
> > > Note that this area has some extremely "interesting" properties. For one
> > > you have to be very careful what operation you use to scrub and its
> > > platform specific. On x86 for example you want to do something like lock
> > > addl $0, mem. A simple read/write isnt safe because if the memory area
> > > is a DMA target your read then write just corrupted data and made the
> > > problem worse not better!
>
> yep lock addl $0, mem with the appropriate kmaps so it will work on any system
> I use. It isn't rocket science but since it is using kmap_atomic that function
> at least should probably get in the kernel.
>
> > The correctable ECC is supposed to be just that (correctable). It's
> > supposed to be entirely transparent to the CPU/Software. An additional
> > read of the affected error produces the same correction so the CPU
> > will never even know. The x86 CPU/Software is only notified on an
> > uncorrectable error. I don't know of any SDRAM controller that
> > generates an interrupt upon a correctable error. Some store "logging"
> > information internally, very difficult to get at on a running system.
>
> Polling the memory controller periodically isn't hard, and you can usually
> get an interrupt as well. Though I have not explored the whole interrupt
> territory. Finding out when you have a corrected error is extremely useful
> as it gives a warning that your memory is going bad. Just like with a disk
> getting a bunch of errors means it is time to be replaced, but you still
> have a little time left.
>
> > Given that, "scrubbing" RAM seems to be somewhat useless on a
> > running system. The next write to the affected area will fix the
> > ECC bits, that's what is supposed to clear up the condition.
>
> If it is your kernel text space that is getting the error there will
> be no next write.
>
> Beyond that, if you are trying to see whether the multiple correctable errors
> you have are a single error or an actual problem, software scrubbing helps,
> because then you know the second report came from the problem recurring,
> which makes it likely you have a bad bit in your DIMM.
>
> Eric
>

The initial premise is fundamentally flawed. That being
that the first error you get will be a single-bit error.

Memory is not a bunch of randomly spaced bits that get
coalesced into bytes/shorts/longs when accessed. Instead,
all the bits in a word are in the same general area. This
means that a nuclear event will alter several. In fact,
a nuclear event will likely put an "electronic hole" in
a physical area of memory. This area may cross several
memory "block" boundaries. These blocks are not usually
related to physical pages at all. These blocks have
different bit-densities depending upon the type and
manufacturer. Typical bit-densities are 16, 64, 128,
and 256 megabits. They are organized into banks so
they can be addressed in rows and columns, minimizing
the hardware. The result of a nuclear event may look
like this:

Base
        _______________________________________
0x1000  | Bank 1 | Bank 2  | Bank 3  | Bank 4 |
0x8000  |        |         |  /      |        |
0x10000 |--------|---------/---------|--------|
0x18000 | Bank 5 | Bank /  | Bank 7  | Bank 8 |
0x20000 |        |    /    |         |        |
0x0000  -------------/-------------------------
                    /
Particle trail---->

In this case, the event altered bits in bank 6 and
bank 3. It may have altered bits in 2 and 7 also.
The hits altered bits at many memory addresses as
the diagram shows. The bits that got altered are
in the hundreds of thousands (of bits). If you read
these areas, without disabling ECC, you will get
an NMI. If you read these areas with modern ECC
hardware, the read, just like a write, will correct
the ECC bits. Therefore, you have "fixed" corrupt
memory data. This is not good.

Isolating a bad bit in RAM caused by bad RAM is not
done by memory "scrubbing", it is done by having the
NMI handler disable access to the bad RAM. In an ix86
machine, that task is very difficult because the handler,
unlike a page-fault handler, has no direct knowledge of
the page being accessed when the NMI occurred. One could
"inspect" the code leading up to the fault, and guess what
memory access occurred but that access is quite likely
in the .text segment which means the code isn't even
correct to inspect. This stuff is possible to do, and
now that gigabytes of RAM are commonplace, it would
probably be a welcome addition to the kernel because the
probability of a single-bit error in ba-zillions of bits
is quite high.

Any "memory scrubbing" routines are worthless and simply
eat CPU cycles. Further, because of the well-established
principle of locality-of-action, you can have multiple
pages of trashed data in RAM, owned by all those sleeping
processes, that won't be accessed until the next boot.
If you want a reliable system, it's better to let sleeping
dogs lie and not access that RAM. You certainly don't want
to "scrub" it. That's like picking a scab. It will bleed.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Bush : The Fourth Reich of America


2002-11-04 15:55:10

by Eric W. Biederman

Subject: Re: [STATUS 2.5] October 30, 2002

"Richard B. Johnson" <[email protected]> writes:

> The initial premise is fundamentally flawed. That being
> that the first error you get will be a single-bit error.

I did not say a single-bit error, I said a correctable error, which can
be recovered from even if a single chip on a pair of DIMMs goes bad.

What I have seen in practice is that during manufacturing it is pretty
random whether the first error from bad memory will be correctable
or uncorrectable. Once the memory is running error free, it is
quite likely the first error will be a correctable error, especially
when it is the RAM that is going bad.
>
> Isolating a bad bit in RAM caused by bad RAM is not
> done by memory "scrubbing", it is done by having the
> NMI handler disable access to the bad RAM.

Scrubbing is for making certain the correction is written back to the RAM.
Many chipsets will correct the data going to the processor but leave
it corrupted in RAM. That allows errors to accumulate, and makes it
hard to tell whether multiple reports come from the same error or from
different errors.
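
For illustration only (this is not code posted in the thread): a minimal
C sketch of such a write-back scrub, built around the "lock addl $0, mem"
idiom quoted earlier. The names scrub_word and scrub_range are
hypothetical, and the sketch assumes the memory is already mapped; in the
kernel the page would first need the kmap handling mentioned earlier,
which is left out here. The non-x86 fallback is an assumption as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite one 32-bit word in place without changing its value.
 * (Hypothetical helper name, not an existing kernel function.) */
void scrub_word(volatile uint32_t *p)
{
#if defined(__i386__) || defined(__x86_64__)
    /* Locked read-modify-write: adds zero, so the data is unchanged,
     * but the word is written back atomically. */
    __asm__ __volatile__("lock; addl $0, %0" : "+m"(*p) : : "cc", "memory");
#else
    /* Assumption: an atomic add of zero stands in on other targets.
     * A compiler may lower this to a plain load, so treat this branch
     * as a placeholder rather than a real scrub. */
    (void)__atomic_fetch_add(p, 0u, __ATOMIC_SEQ_CST);
#endif
}

/* Scrub every 32-bit word of an already-mapped region. */
void scrub_range(void *base, size_t len)
{
    volatile uint32_t *p = base;
    size_t n = len / sizeof(uint32_t);
    size_t i;

    for (i = 0; i < n; i++)
        scrub_word(&p[i]);
}
```

The atomic read-modify-write is what makes this safe against a
concurrent DMA write to the same word, which a separate read followed
by a write would not be. Run from user space, this only demonstrates the
access pattern: the data goes through the cache and reaches DRAM when
the line is eventually written back.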

> In an ix86
> machine, that task is very difficult because the handler,
> unlike a page-fault handler, has no direct knowledge of
> the page being accessed when the NMI occurred.

We are obviously working with quite different hardware. Intel
chipsets routinely report ECC errors at page-level granularity.

> One could
> "inspect" the code leading up to the fault, and guess what
> memory access occurred but that access is quite likely
> in the .text segment which means the code isn't even
> correct to inspect.

I have never seen an NMI error trigger a CPU exception that was
synchronous with the code, though that may be possible with
an Athlon, which does the ECC correction in the CPU. In general the
errors come in asynchronously at some point after the error occurred,
so even killing the task that is using the bad RAM is unreliable.
If the error is not correctable, on a server I panic the machine.

> This stuff is possible to do, and
> now that gigabytes of RAM are commonplace, it would
> probably be a welcome addition to the kernel because the
> probability of a single-bit error in ba-zillions of bits
> is quite high.
>
> Any "memory scrubbing" routines are worthless and simply
> eat CPU cycles.

Functional memory in practice does not have ECC errors, so the
ECC code does not run. I only run the scrub routine on memory
that has reported a correctable error. And I think
1200 machines with 4GB each, running processor-intensive tasks, is a
reasonable sample to base this conclusion on.
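
As a rough illustration of the bookkeeping this policy implies (a sketch,
not code from this thread): scrub only pages that have reported a
correctable error, treat a second report from an already-scrubbed page as
a sign of bad RAM, and panic on anything uncorrectable. All names here
(ecc_log, ecc_report, the action values) are hypothetical, and the sketch
assumes the chipset reports the page frame of each error.

```c
#include <stddef.h>
#include <stdint.h>

#define ECC_MAX_TRACKED 64

/* One page that has reported at least one correctable error. */
struct ecc_page {
    uint64_t pfn;   /* page frame number reported by the chipset */
    unsigned hits;  /* correctable errors seen on this page */
};

struct ecc_log {
    struct ecc_page pages[ECC_MAX_TRACKED];
    size_t used;
};

enum ecc_action {
    ECC_SCRUB,              /* first report: write the correction back */
    ECC_SCRUB_AND_SUSPECT,  /* repeat after a scrub: likely a bad bit */
    ECC_PANIC               /* uncorrectable: stop the machine */
};

/* Decide what to do with one reported error. */
enum ecc_action ecc_report(struct ecc_log *tracker, uint64_t pfn,
                           int correctable)
{
    size_t i;

    if (!correctable)
        return ECC_PANIC;

    for (i = 0; i < tracker->used; i++) {
        if (tracker->pages[i].pfn == pfn) {
            /* The page was scrubbed after its first report, so this
             * is a new error and not a stale report of the old one. */
            tracker->pages[i].hits++;
            return ECC_SCRUB_AND_SUSPECT;
        }
    }

    /* New page: remember it if there is room, and scrub it either way. */
    if (tracker->used < ECC_MAX_TRACKED) {
        tracker->pages[tracker->used].pfn = pfn;
        tracker->pages[tracker->used].hits = 1;
        tracker->used++;
    }
    return ECC_SCRUB;
}
```

The repeat-report branch only means something if the scrub really wrote
the corrected data back; without that, a second report could be the same
uncorrected bit being read again.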

> Further, because of the well-established
> principle of locality-of-action, you can have multiple
> pages of trashed data in RAM, owned by all those sleeping
> processes, that won't be accessed until the next boot.
> If you want a reliable system, it's better to let sleeping
> dogs lie and not access that RAM. You certainly don't want
> to "scrub" it. That's like picking a scab. It will bleed.

I do not randomly scrub memory, though for hardware that does
not do that itself I am not opposed to the idea of a daemon that does.
The biggest problem with doing that on the CPU is that you are likely
to trash your cache.

One of the bigger challenges to work through is that the BIOS frequently
leaves a few ECC errors after setting up RAM, so a CPU scrubber might
trigger those. Replacing the BIOS is a good way to be certain that doesn't
happen :)

Eric