2003-05-12 22:45:59

by Andrew Morton

Subject: 2.6 must-fix list, v2


There have been surprisingly few additions. The original and updated lists
are at

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/must-fix/

Nothing has been deleted. This means either that nobody is doing anything
or people forgot to tell me.


The changes are:


--- /tmp/must-fix-1.txt Mon May 12 15:49:48 2003
+++ /tmp/must-fix-2.txt Mon May 12 15:50:01 2003
@@ -5,11 +5,12 @@
drivers/char/
-------------

-- TTY locking is broken (see FIXME in do_tty_hangup())
+- TTY locking is broken.
+
+ - see FIXME in do_tty_hangup(). This causes ppp BUGs in local_bh_enable()
+
+ - Other problems: aviro, dipankar, Alan have details.

- "One bug that was found is that the dropping of lock_kernel from do_exit
- caused races in the exit tty cleanup. There was a patch for that, but I'm
- not sure it was merged."


drivers/block/
@@ -64,6 +65,41 @@

- Lots of drivers don't compile, others do but don't work.

+drivers/scsi/
+-------------
+
+- hch: large parts of the locking are hosed or nonexistent
+
+ - shost->my_devices isn't locked down at all
+
+ - the host list is locked but not refcounted, mess can happen when the
+ spinlock is dropped
+
+ - there are lots of members of struct Scsi_Host/scsi_device/scsi_cmnd
+ with very unclear locking, many of them probably want to become
+ atomic_t's or bitmaps (for the 1bit bitfields).
+
+ - there's lots of volatile abuse in the scsi code that needs to be
+ thought about.
+
+ - there's some global variables incremented without any locks
+
+
+ (Mike Anderson, Patrick Mansfield, Badari Pulavarty)
+
+ - large parts of the locking are hosed or non existent
+
+ -- shost->my_devices isn't locked at all
+
+ -- host list locked but not refcounted
+
+ -- lots of members of struct scsi_host/scsi_device/ scsi_cmd with
+ very unclear locking
+
+ -- lots of volatile abuse in scsi code
+
+ -- global variables incremented without locks.
+
fs/
---

@@ -90,6 +126,11 @@
whole lot of bogus packets start appearing. They look severely corrupted,
(they even crashed ethereal once 8-)

+- hch: devfs: there's a fundamental lookup vs devfsd race that's only
+ fixable by introducing a lookup vs devfs deadlock. I can't see how this is
+ fixable without getting rid of the current devfsd design. Mandrake seems
+ to have a workaround for this so this is at least not triggered so easily,
+ but that's not what I'd consider a fix..

kernel/
-------
@@ -286,6 +327,8 @@

- Integrate userspace irq balancing daemon.

+- kexec. Seems to work, is in -mm.
+
mm/
---

@@ -365,7 +408,7 @@
- Arch-independent code for performing state transitions, that calls
platform-specific methods along the way.

-- A better suspend-to-disk mechanism that swsusp.
+- A better suspend-to-disk mechanism than swsusp.

There are various other details to be worked out, which are the real fun
part. And of course, driver support, but that is something that can happen
@@ -390,6 +433,11 @@

- Pat's swsusp rework?

+- Pat: There are already CPU device structures; MTRRs should be a
+ dynamically registered interface of CPUs, which implies there needs
+ to be some other glue to know that there are MTRRs that need to be
+ saved/restored.
+
arch/i386/
----------



2003-05-12 22:47:25

by Andrew Morton

Subject: Re: 2.6 must-fix list, v2

Andrew Morton wrote:
>
>
> There have been surprisingly few additions.

And here is the full list:


Must-fix bugs
=============

drivers/char/
-------------

- TTY locking is broken.

- see FIXME in do_tty_hangup(). This causes ppp BUGs in local_bh_enable()

- Other problems: aviro, dipankar, Alan have details.



drivers/block/
--------------

- RAID0 dies on strangely aligned BIOs

- Need to hoist BIO-split code out of device mapper, use that.

(neilb)

1/ RAID5 should work fine. It accepts any sort of bio and always
submits a 1-page bio to the underlying device, and if my
understanding is correct, every device must be able to handle a
single page bio, no matter what the alignment (which is why raid0
has a problem - it doesn't).

2/ RAID1 works pretty well. The only improvement needed is to define
a merge_bvec_fn function which passes the question down to lower
layers. This should be easy except for the small fact that it is
impossible :-) There is no enforced pairing between calls to
merge_bvec_fn and submit_bh, so it is possible that a hot spare
with different restrictions could get swapped in between the one
and the other and could confuse things. I suspect that can be
worked around somehow though...

Someone sent me a patch that is sorely needed - it allows you
to simply call blk_queue_stack() (or something like that), and it will
get your stacked limits set appropriately.

3/ I just realised that raid0 is easier than I had previously
thought. We don't need the completely functional bio splitting
that dm has. We only need to be able to split a bio that has just
one page as the use of merge_bvec_fn will ensure that we never get
a larger bio that we cannot handle. And splitting a bio with only
one page is a lot easier. I now have code in my tree that
implements this quite cleanly and will probably post a patch
during the week.

- ideraid hasn't been ported to 2.5 at all yet.

- CD burning. There are still a few quirks to solve wrt SG_IO and ide-cd.

Jens: The basic hang has been solved (double fault in ide-cd), there still
seems to be some cases that don't work too well. Don't really have a
handle on those :/

- IDE tcq. Either kill it or fix it. Not a "big todo", as such.
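
A rough illustration of neilb's point 3/ above: splitting a single-page request at a RAID0 chunk boundary is only a little arithmetic. This is a sketch, not the md code. The function name is hypothetical, and the 512-byte sectors, 4096-byte page and 64-sector chunk are assumptions chosen for the example.

```python
def split_at_chunk_boundaries(start_sector, nr_sectors, chunk_sectors):
    """Split a request so that no piece crosses a RAID0 chunk boundary.

    Returns a list of (start_sector, nr_sectors) tuples.
    """
    pieces = []
    sector = start_sector
    remaining = nr_sectors
    while remaining > 0:
        # Sectors left before the next chunk boundary.
        room = chunk_sectors - (sector % chunk_sectors)
        length = min(room, remaining)
        pieces.append((sector, length))
        sector += length
        remaining -= length
    return pieces


# A one-page request (8 sectors, assumed 512 bytes each) that starts
# 2 sectors before a 64-sector chunk boundary.
crossing = split_at_chunk_boundaries(62, 8, 64)
# The same request when it is already chunk aligned.
aligned = split_at_chunk_boundaries(64, 8, 64)
print(crossing, aligned)
```

As long as the chunk is at least one page long, a one-page request can cross at most one boundary, so the result never has more than two pieces. That is why this case is so much simpler than general bio splitting.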

drivers/video/
--------------

- Lots of drivers don't compile, others do but don't work.

drivers/scsi/
-------------

- hch: large parts of the locking are hosed or nonexistent

- shost->my_devices isn't locked down at all

- the host list is locked but not refcounted, mess can happen when the
spinlock is dropped

- there are lots of members of struct Scsi_Host/scsi_device/scsi_cmnd
with very unclear locking, many of them probably want to become
atomic_t's or bitmaps (for the 1bit bitfields).

- there's lots of volatile abuse in the scsi code that needs to be
thought about.

- there's some global variables incremented without any locks


(Mike Anderson, Patrick Mansfield, Badari Pulavarty)

- large parts of the locking are hosed or non existent

-- shost->my_devices isn't locked at all

-- host list locked but not refcounted

-- lots of members of struct scsi_host/scsi_device/ scsi_cmd with
very unclear locking

-- lots of volatile abuse in scsi code

-- global variables incremented without locks.

fs/
---

- NFS client gets an OOM deadlock.

- Some fixes exist in -mm. Seem to mostly work.

- NFS client runs very slowly consuming 100% CPU under heavy writeout.

- Unsubtle fix exists in -mm. (Looks like it's fixed anyway).

- ext3 data=journal mode is bust.

- ext3/htree doesn't play right with NFS server. 90% fixed in -mm.

- AIO/direct-IO writes can race with truncate and wreck filesystems.

- Easy fix is to only allow the feature for S_ISBLK files.

- davej: NFS seems to have a really bad time for some people. (Including
myself on one testbox). The common factor seems to be a high spec client
torturing an underpowered NFS server with lots of IO. (fsx/fsstress etc
show this up). Lots of "NFS server cheating" messages get dumped, and a
whole lot of bogus packets start appearing. They look severely corrupted,
(they even crashed ethereal once 8-)

- hch: devfs: there's a fundamental lookup vs devfsd race that's only
fixable by introducing a lookup vs devfs deadlock. I can't see how this is
fixable without getting rid of the current devfsd design. Mandrake seems
to have a workaround for this so this is at least not triggered so easily,
but that's not what I'd consider a fix..

kernel/
-------

- O(1) scheduler starvation, poor behaviour seems unresolved.

Jens: "I've been running 2.5.67-mm3 on my workstation for two days, and
it still doesn't feel as good as 2.4. It's not a disaster like some
revisions ago, but it still has occasional CPU "stalls" where it feels
like a process waits for half a second or so for CPU time. That is very
noticeable."

Also see Mike Galbraith's work.

- Alan: 32bit uid support is *still* broken for process accounting.

(Test case?)

mm/
---

- Overcommit accounting gets wrong answers

- underestimates reclaimable slab, gives bogus failures when
dcache&icache are large.

- gets confused by reclaimable-but-not-freed truncated ext3 pages.
Lame fix exists in -mm.

- Proper user level no overcommit also requires a root margin adding

modules
-------

(Rusty)

- The .modinfo patch needs to go in. It's trivial, but it's the major
missing functionality vs. 2.4. Keeps bouncing off Linus.

- __module_get(): "I know I have a refcount already and I don't care
if they're doing rmmod --wait, gimme.". Keeps bouncing off Linus.

- Per-cpu support inside modules (have patch, in testing).

- driver class code is getting redone. I have this now working, and will
send it out in a few days.

net/
----

(davem)

- UDP apps can in theory deadlock, because the ip_append_data path can end
up sleeping while the socket lock is held.

It is OK to sleep with the socket lock held, normally. But in this case
the sleep happens while waiting for socket memory/space to become
available; if another context needs to take the socket lock to free up the
space, we could hang.

I sent a rough patch on how to fix this to Alexey, and he is analyzing
the situation. I expect a final fix from him next week or so.

- Semantics for IPSEC during operations such as TCP connect suck currently.

When we first try to connect to a destination, we may need to ask the
IPSEC key management daemon to resolve the IPSEC routes for us. For the
purposes of what the kernel needs to do, you can think of it like ARP. We
can't send the packet out properly until we resolve the path.

What happens now for IPSEC is basically this:

O_NONBLOCK: returns -EAGAIN over and over until route is resolved

!O_NONBLOCK: Sleeps until route is resolved

These semantics are total crap. The solution, which Alexey is working
on, is to allow incomplete routes to exist. These "incomplete" routes
merely put the packet onto a "resolution queue", and once the key manager
does its thing we finish the output of the packet. This is precisely how
ARP works.

I don't know when Alexey will be done with this.

- There are those mysterious TCP hangs of established state sockets.
Someone has to get a good log in order for us to effectively debug this.
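
To make the "resolution queue" idea from the IPSEC item above concrete, here is a small model of an incomplete route. It is only a sketch under assumptions: the class and method names are hypothetical and nothing here mirrors the actual kernel data structures.

```python
from collections import deque


class IncompleteRoute:
    """Hypothetical model of a route still waiting on the key manager."""

    def __init__(self):
        self.resolved = False
        self.pending = deque()   # the "resolution queue"
        self.sent = []           # stands in for the real output path

    def output(self, packet):
        # Neither sleep nor return EAGAIN: either send or queue.
        if self.resolved:
            self.sent.append(packet)
        else:
            self.pending.append(packet)

    def resolve(self):
        # The key manager has answered: flush queued packets in order.
        self.resolved = True
        while self.pending:
            self.sent.append(self.pending.popleft())


route = IncompleteRoute()
route.output("SYN")      # queued, the caller carries on
route.resolve()          # resolution completes, "SYN" goes out
route.output("ACK")      # sent directly from now on
print(route.sent)
```

The caller of output() behaves the same way whether or not the route is resolved yet, which is the property the ARP comparison is getting at.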



net/*/netfilter/
----------------

(Rusty)

- Handle non-linear skbs everywhere. This is going in via Dave now.

- Rework conntrack hashing.

- Module relationship bogosity fix (trivial, have patch).


global
------

- Lots of 2.4 fixes including some security are not in 2.5

- There are about 60 or 70 security related checks that need doing
(copy_user etc) from Stanford tools

- A couple of hundred real looking bugzilla bugs




Not-ready features and speedups
===============================


drivers/block/
--------------

- Framework for selecting IO schedulers. This is the main one really.
Once this is in place we can drop in new schedulers any old time, no risk.

- Dynamic disk request allocation. Patch exists.

- Runtime-selectable disk scheduler framework.

- Anticipatory scheduler. Working OK now, still has problems with seeky
OLTP-style loads.

- CFQ scheduler. Seems to work but Jens planning significant rework.

- The feral.com qlogic driver: needs work.


fs/
---

- reiserfs_file_write() speedup. There are concerns that some applications
do the wrong thing with large stat.st_blksize.

- ext3 lock_kernel() removal: that part works OK and is mergeable. But
we'll also need to make lock_journal() a spinlock, and that's deep surgery.

- 32bit quota needs a lot more testing but may work now

- Integrate Chris Mason's 2.4 reiserfs ordered data and data journaling
patches. They make reiserfs a lot safer.

- (Trond:) Yes: I'm still working on an atomic "open()", i.e. one
where we short-circuit the usual VFS path_walk() + lookup() +
permission() + create() + .... bullsh*t...

I have several reasons for wanting to do this (all of
them related to NFS of course, but much of the reasoning applies
to *all* networked file systems).

1) The above sequence is simply not atomic on *any* networked
filesystem.

2) It introduces a sh*tload of completely unnecessary RPC calls (why
do a 'permission' RPC call when the server is in *any* case going to
tell you whether or not this operation is allowed. Why do a
'lookup()' when the 'create()' call can be made to tell you whether or
not a file already exists).

3) It is incompatible with some operations: the current create()
doesn't pass an 'EXCLUSIVE' flag down to the filesystems.

4) (NFS specific?) open() has very different cache consistency
requirements when compared to most other VFS operations.

I'd very much like for something like Peter Braam's 'lookup with
intent' or (better yet) for a proper dentry->open() to be integrated with
path_walk()/open_namei(). I'm still working on the latter (Peter has
already completed the lookup with intent stuff).
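
A user-space illustration of why points 1) and 3) matter: an exclusive create is only meaningful if "does it exist?" and "create it" are one request. The sketch below uses the standard O_CREAT | O_EXCL flags on a local temporary directory. The helper name is hypothetical, and it shows only the semantics an application expects, not how any filesystem implements them.

```python
import os
import tempfile


def create_exclusive(path):
    """Create path only if it does not already exist, as one operation.

    With O_CREAT | O_EXCL the existence check and the creation are a
    single request instead of a lookup followed by a separate create.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "lockfile")
    create_exclusive(target)          # first caller wins
    try:
        create_exclusive(target)      # second caller must be refused
        second_succeeded = True
    except FileExistsError:
        second_succeeded = False

print(second_succeeded)
```

If the check and the create were separate steps, two clients could both see "no such file" and both believe they had created it.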


kernel/
-------

(Rusty)

- Zippel's Reference count simplification. Tricky code, but cuts about 120
lines from module.c. Patch exists, needs stressing.

- /proc/kallsyms. What most people really wanted from /proc/ksyms. Patch
exists.

- Fix module-failed-init races by starting module "disabled". Patch
exists, requires some subsystems (ie. add_partition) to explicitly say
"make module live now". Without patch we are no worse off than 2.4 etc.

- Integrate userspace irq balancing daemon.

- kexec. Seems to work, is in -mm.

mm/
---

- objrmap: concerns over page reclaim performance at high sharing levels,
and interoperation with nonlinear mappings is hairy.

- Readd and make /proc/sys/vm/freepages writable again so that boxes can be
tuned for heavy interrupt load.

net/
----

(davem)

- Real serious use of IPSEC is hampered by lack of MPLS support. MPLS is a
switching technology that works by switching based upon fixed length labels
prepended to packets. Many people use this and IPSEC to implement VPNs
over public networks, it is also used for things like traffic engineering.

A good reference site is:

http://www.mplsrc.com/

Anyways, an existing (crappy) implementation exists. I've almost
completed a rewrite, I should have something in the tree next week.

- Sometimes we generate IP fragments when it truly isn't necessary.

The way IP fragmentation is specified, every fragment except the last must
carry a multiple of 8 bytes of data. So suppose the device has an MTU that
is not 0 modulo 8; ethernet even falls into this class: 1500 == (8 * 187) + 4

Our IP fragmenting engine can fragment on packets that are sized within
the last modulo 8 bytes of the MTU. This happens in obscure cases, but it
does happen.

I've proposed a fix to Alexey, whereby very late in the output path we
check the packet, if we fragmented but the data length would fit into the
MTU we unfragment the packet.

This is low priority, because technically it creates suboptimal behavior
rather than mis-operation.

- IPV4 output engine changes for IPSEC need to be moved over to IPV6.

IPV6 ipsec works but gravely suboptimally in some cases. It is also for
this reason that the zerocopy UDP stuff isn't functional on the ipv6 side.

The USAGI project (http://www.linux-ipv6.org) is working with Alexey on this
work.
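
The unnecessary-fragmentation item above comes down to simple arithmetic. The sketch below shows one way it could arise: a fragmenter that rounds the per-fragment capacity down to a multiple of 8 before asking whether the packet fits whole. Both function names are hypothetical, the 24-byte header (20 bytes plus 4 bytes of options) is an assumption picked to make the numbers show the effect, and this is not the kernel's fragmentation code.

```python
def naive_fragment_sizes(payload_len, mtu, header_len):
    """Split a payload into fragment data sizes.

    The per-fragment capacity is rounded down to a multiple of 8 first,
    so every fragment except the last is a multiple of 8 bytes.
    """
    max_data = (mtu - header_len) // 8 * 8
    sizes = []
    remaining = payload_len
    while remaining > 0:
        size = min(max_data, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def fits_unfragmented(payload_len, mtu, header_len):
    """True if header plus payload fit in one packet of size mtu."""
    return header_len + payload_len <= mtu


# 1500 - 24 = 1476 bytes of capacity, which rounds down to 1472.
# A 1474-byte payload sits in the gap between the two.
sizes = naive_fragment_sizes(1474, 1500, 24)
fits = fits_unfragmented(1474, 1500, 24)
print(sizes, fits)
```

The payload is split into two fragments even though the whole packet is 1498 bytes and fits the MTU. A late check of the kind proposed to Alexey would catch exactly this case.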

net/*/netfilter/
----------------

- Lots of misc. cleanups, which are happening slowly.

- davem: Netfilter needs to stop linearizing packets as much as possible.

Zerocopy output packets are basically undone by netfilter because all of
it assumed it was working with linear socket buffers.

Rusty is fixing this piece by piece. He is nearly done with this work.

power management
----------------

(Pat) There is some preliminary work at bk://ldm.bkbits.net/linux-2.5-power,
though I'm currently in the process of reworking it.

It includes:

- New device power management core code, both for individual devices,
and for global state transitions.

- A generic user interface for triggering system power state transitions.

- Arch-independent code for performing state transitions, that calls
platform-specific methods along the way.

- A better suspend-to-disk mechanism than swsusp.

There are various other details to be worked out, which are the real fun
part. And of course, driver support, but that is something that can happen
at any time.

(Alan)

- PCI locking

- Frame buffer restore codepaths (that requires some deep PCI magic)

- XFree86 hooks

- AGP restoration

- DRI restoration

- IDE suspend/resume without races (Ben is looking at this a little)

- How to deal with devices that babble (some stuff we have to global IRQ
off to save, and global IRQ on -after- we recover with APM)

- Pat's swsusp rework?

- Pat: There are already CPU device structures; MTRRs should be a
dynamically registered interface of CPUs, which implies there needs
to be some other glue to know that there are MTRRs that need to be
saved/restored.

arch/i386/
----------

- Andi: i386 sub architectures for common boxes (in particular bigsmp and
summit) need to be runtime probed options, not compile time. Vendors
cannot ship a separate kernel rpm for each of these cases. (patch is in -mm, works
OK).

- Also PC9800 merge needs finishing to the point we want for 2.6 (not all).

- ES7000 wants merging (now we are all happy with it). That shouldn't be a
big problem.

global
------

- 64-bit dev_t. Seems almost ready, but it's not really known how much
work is still to do. Patches exist in -mm but with the recent rise of the
neo-viro I'm not sure where things are at.

- We need a kernel side API for reporting error events to userspace (could
be async to 2.6 itself)

(Prototype core based on netlink exists)

- Kai: Introduce a sane, easy and standard way to build external modules

- Kai: Allow separate src/objdir





drivers
=======

- Alan: PCI random reordering from 2.4 to 2.5 isn't understood yet (might be
fixed now?)

- Alan: We have multiple drivers walking the pci device lists and also
using things like pci_find_device in unsafe ways with no refcounting. I
think we have to make pci_find_device etc refcount somewhere and add
pci_device_put as was done with networking.

- Lots of network drivers don't even build

- Alan: PCI hotplug is unsafe (locking is totally screwed)

- Ditto cardbus

- Alan: Cardbus/PCMCIA requires all of Russell's stuff to be merged to do
multiheader right and so on
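
A sketch of the refcounting Alan asks for in the pci_find_device item above: a lookup that hands out a counted reference, paired with a put. This is a toy model under assumptions. The class and method names are hypothetical, and a Python lock stands in for whatever protects the real device list.

```python
import threading


class Device:
    def __init__(self, name):
        self.name = name
        self.refcount = 1          # reference held by the device list


class DeviceList:
    """Hypothetical device list whose lookups return counted references."""

    def __init__(self):
        self._lock = threading.Lock()
        self._devices = {}
        self.freed = []            # devices whose last reference went away

    def add(self, dev):
        with self._lock:
            self._devices[dev.name] = dev

    def find_get(self, name):
        # Take the reference while the list lock is still held, so the
        # device cannot disappear between lookup and use.
        with self._lock:
            dev = self._devices.get(name)
            if dev is not None:
                dev.refcount += 1
            return dev

    def put(self, dev):
        with self._lock:
            dev.refcount -= 1
            if dev.refcount == 0:
                self.freed.append(dev.name)

    def remove(self, name):
        # Hot-unplug: unlink, then drop the list's own reference.
        with self._lock:
            dev = self._devices.pop(name)
        self.put(dev)


devices = DeviceList()
devices.add(Device("eth0"))
dev = devices.find_get("eth0")        # driver holds a reference
devices.remove("eth0")                # device unplugged while in use
still_alive = devices.freed == []
devices.put(dev)                      # last reference dropped
print(still_alive, devices.freed)
```

The device outlives its removal from the list for as long as a caller still holds a reference. An uncounted lookup cannot give that guarantee.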

drivers/acpi/
-------------

- davej: ACPI has a number of failures right now. There are a number of
entries in bugzilla which could all be the same bug. It manifests as a
"network card doesn't receive packets"; booting with 'acpi=off noapic' fixes
it.

- davej: There's also another nasty 'doesn't boot' bug which quite a few
people (myself included) are seeing on some boxes (especially laptops).

drivers/block/
--------------

- Alan: Partition handling is hosed for DM users. (I have some partly
debugged patches in the -ac tree, but Andries objects to them and I think
his user knows magic options hack is unacceptable too. Mostly this is
figuring out the right answer)

- Floppy is almost unusably buggy still

drivers/char/
-------------

- Alan: Multiple serious bugs in the DRI drivers (most now with patches
thankfully). "The badness I know about is almost entirely IRQ mishandling.
DRI failing to mask PCI irqs on exit paths."

- Various suspect things in AGP.

drivers/ide/
------------

(Alan)

- IDE requires bio walking

- IDE PIO has occasional unexplained PIO disk eating reports

- IDE has multiple zillions of races/hangs in 2.5 still

- IDE eats disks with HPT372N on 2.5.x

- IDE scsi needs rewriting

- IDE needs significant reworking to handle Simplex right

- IDE hotplug handling for 2.5 is completely broken still

drivers/isdn/
-------------

(Kai, rmk)

- isdn_tty locking is completely broken (cli() and friends)

- fix lots of remaining bugs in the isdn link layer / hisax protocol layer
/ hisax subdrivers, so that at least 99% of the users have a usable ISDN
subsystem

- fix other drivers

- lots more cleanups, adaptation to recent APIs etc

- fixup tty-based ISDN drivers which provide TIOCM* ioctls (see my recent
3-set patch for serial stuff)

Alternatively, we could re-introduce the fallback to driver ioctl parsing
for these if not enough drivers get updated.

- fixup the usb-serial core and drivers to provide support for this
patch.

drivers/net/
------------

- davej: Either Wireless network drivers or PCMCIA broke at some point. A
configuration that worked fine under 2.4 doesn't receive any packets. Need
to look into this more to make sure I don't have any misconfiguration that
just 'happened to work' under 2.4


drivers/scsi/
-------------

- Half of SCSI doesn't compile

arch/i386/
----------

- 2.5.x won't boot on some 440GX

- 2.5.x doesn't handle VIA APIC right yet - don't know why

- ACPI needs the relax patches merging to work on lots of laptops

- ECC driver questions are not yet sorted (DaveJ is working on this)

arch/x86_64/
------------

(Andi)

- time handling is broken. Need to move up 2.4 time.c code.

- memory corruption with IOMMU pci_free_consistent - often causes crashes
at shutdown. This is rather mysterious, the code is basically identical to
2.4 which works fine. Can only be seen on systems with >4GB of memory or
with iommu=force

- Another report of a crash at shutdown on Simics with no iommu when all
memory was used. Could be related to the one above.

- change_page_attr corrupts memory/crashes. Breaks some AGP users.

- NMI watchdog seems to tick too fast

- some fixes from 2.4 still need to be merged

- not very well tested. probably more bugs lurking.


2003-05-13 00:08:14

by Greg KH

Subject: Re: 2.6 must-fix list, v2

On Mon, May 12, 2003 at 03:54:17PM -0700, Andrew Morton wrote:
>
> There have been surprisingly few additions. The original and updated lists
> are at
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/must-fix/
>
> Nothing has been deleted. This means either that nobody is doing anything
> or people forgot to tell me.

People forget to tell you :)

Here's a small patch knocking two things off the list that are now in
Linus's tree.

thanks,

greg k-h


--- must-fix-2.txt.original 2003-05-12 17:18:23.782129948 -0700
+++ must-fix-2.txt 2003-05-12 17:19:36.906235476 -0700
@@ -175,9 +175,6 @@

- Per-cpu support inside modules (have patch, in testing).

-- driver class code is getting redone. I have this now working, and will
- send it out in a few days.
-
net/
----

@@ -561,9 +558,6 @@
Alternatively, we could re-introduce the fallback to driver ioctl parsing
for these if not enough drivers get updated.

-- fixup the usb-serial core and drivers to provide support for this
- patch.
-
drivers/net/
------------




2003-05-13 01:47:35

by Chuck Ebbert

Subject: Re: 2.6 must-fix list, v2

Andrew Morton wrote:

> - Alan: PCI random reordering from 2.4 to 2.5 isn't understood yet (might be
> fixed now?)

This is fixed for ide and anything else that relies on PCI bus order.
Network cards rely on link order and nobody has volunteered to fix it
(the latest proposal I saw is not even 2.4-compatible.)


2003-05-13 03:53:13

by Al Viro

Subject: Re: 2.6 must-fix list, v2

On Mon, May 12, 2003 at 03:55:11PM -0700, Andrew Morton wrote:
>
> drivers/char/
> -------------
>
> - TTY locking is broken.

No shit. Locking, refcounting, serial drivers, yada, yada. Currently it's
the worst widely-used subsystem in the tree - both 2.4 and 2.5 (and 2.2 is
not much better). I've got some cleanups, but that will have to go slowly
and carefully - otherwise we'll destroy the last remnants of 2.0 race
prevention logics in there and that's the only thing that makes the current
code kinda-sorta work most of the time.

> - see FIXME in do_tty_hangup(). This causes ppp BUGs in local_bh_enable()
>
> - Other problems: aviro, dipankar, Alan have details.

BTW, somebody will have to document the tty driver and ldisc API.

2003-05-13 04:46:23

by Greg KH

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 05:05:57AM +0100, Al Viro wrote:
>
> BTW, somebody will have to document the tty driver and ldisc API.

I've been working on documenting the current tty API and hope to do the
ldisc API too. I guess I should put it up somewhere for people to poke
at...

Or do you mean document your changes in the API? I've been trying to
keep up to date (got the recent ioctl changes that went in a few
versions ago) and would be glad to try to keep it current.

thanks,

greg k-h

2003-05-13 05:47:48

by Andi Kleen

Subject: Re: 2.6 must-fix list, v2

Andrew Morton writes:

I fixed a lot of bugs since then. It's not all merged yet and I introduced
a few new ones in the process but:

>
> arch/x86_64/
> ------------
>
> (Andi)
>
> - memory corruption with IOMMU pci_free_consistent - often causes crashes
> at shutdown. This is rather mysterious, the code is basically identical to
> 2.4 which works fine. Can only be seen on systems with >4GB of memory or
> with iommu=force

This is fixed.


> - change_page_attr corrupts memory/crashes. Breaks some AGP users.

This is also fixed.


> - some fixes from 2.4 still need to be merged

This is basically done, except the timing code.

Current new bug list:

- 32bit vsyscalls seem to be broken
- 32bit elf coredumps are broken

Required/Wanted features:

- need to coredump 64bit vsyscall code with dwarf2
- move 64bit signal trampolines into vsyscall code and add dwarf2 for it.
- describe kernel assembly with dwarf2 annotations for kgdb (currently waiting
on some binutils changes for this)

-Andi

2003-05-13 11:23:36

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Andrew Morton writes:

> - NFS client gets an OOM deadlock.
> - Some fixes exist in -mm. Seem to mostly work.
> - NFS client runs very slowly consuming 100% CPU under heavy
> writeout.
> - Unsubtle fix exists in -mm. (Looks like it's fixed anyway).

<snip>

> - davej: NFS seems to have a really bad time for some people. (Including
> myself on one testbox). The common factor seems to be a high
> spec client torturing an underpowered NFS server with lots of
> IO. (fsx/fsstress etc show this up). Lots of "NFS server
> cheating" messages get dumped, and a whole lot of bogus
> packets start appearing. They look severely corrupted, (they
> even crashed ethereal once 8-)

Could people please test these items out again using the latest
Bitkeeper release? I believe I've addressed all these issues with the
patches that have gone to Linus in the last 2-3 weeks.

Cheers,
Trond

2003-05-13 11:36:42

by Shaheed

Subject: Re: 2.6 must-fix list, v2



Can I propose an addition?

Not-ready features and speedups
===============================

kernel/fork.c
-------------

- Add ability to restrict the default CPU affinity mask so that
sys_setaffinity() can be used to implement exclusive access to a CPU. Patch
exists on LKML.
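
A sketch of the idea in the proposal above, treating affinity masks as plain sets of CPU numbers. The function names and the four-CPU layout are assumptions for illustration. This models the policy only and does not call any real affinity interface.

```python
def default_affinity(online_cpus, reserved_cpus):
    """Mask given to new tasks: every online CPU except the reserved ones."""
    return set(online_cpus) - set(reserved_cpus)


def pin_exclusive(task_masks, task, cpu):
    """Explicitly move one task onto a reserved CPU."""
    task_masks[task] = {cpu}


online = {0, 1, 2, 3}
reserved = {3}

# Ordinary tasks inherit the restricted default and never land on CPU 3.
masks = {name: default_affinity(online, reserved) for name in ("shell", "cron")}

# Only a task that is pinned on purpose gets the reserved CPU.
pin_exclusive(masks, "realtime", 3)
print(masks)
```

Without a restricted default, every other task would still be allowed onto CPU 3, so pinning one task there would not make its access exclusive.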


2003-05-13 13:45:10

by Dave Jones

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 01:36:14PM +0200, Trond Myklebust wrote:
> > - davej: NFS seems to have a really bad time for some people. (Including
> > myself on one testbox). The common factor seems to be a high
> > spec client torturing an underpowered NFS server with lots of
> > IO. (fsx/fsstress etc show this up). Lots of "NFS server
> > cheating" messages get dumped, and a whole lot of bogus
> > packets start appearing. They look severely corrupted, (they
> > even crashed ethereal once 8-)
>
> Could people please test these items out again using the latest
> Bitkeeper release? I believe I've addressed all these issues with the
> patches that have gone to Linus in the last 2-3 weeks.

I can still kill an NFS server in under a minute with fsx.

Dave

2003-05-13 14:42:53

by Alan

Subject: Re: 2.6 must-fix list, v2

On Mon, 2003-05-12 at 23:55, Andrew Morton wrote:
>
> - IDE tcq. Either kill it or fix it. Not a "big todo", as such.
>

There are lots of other IDE bugs that won't go away until the taskfile
stuff is included, the locking bugs that allow any user to hang the IDE
layer in 2.5, and some other updates are forward ported. Bart is making
amazing progress.

> - Lots of drivers don't compile, others do but don't work.

True of char, scsi and others. I'd like to propose that we don't treat
drivers that need porting as a 2.6test showstopper. In fact it's going to
be a lot easier to fix them once someone can port them knowing the core
code is frozen.

drivers/pci

Some cardbus crashes the system
Hotplug locking is hosed

drivers/pcmcia
Most drivers crash the system on eject randomly with timer bugs. I think
after RML's stuff is in most of the pcmcia/cardbus ones go except the
locking disaster

> - davej: NFS seems to have a really bad time for some people. (Including
> myself on one testbox). The common factor seems to be a high spec client
> torturing an underpowered NFS server with lots of IO. (fsx/fsstress etc
> show this up). Lots of "NFS server cheating" messages get dumped, and a
> whole lot of bogus packets start appearing. They look severely corrupted,
> (they even crashed ethereal once 8-)

Lots of recent 2.4 NFS fixes in the client especially want forward
porting

> - Alan: 32bit uid support is *still* broken for process accounting.
>
> (Test case?)

Create a 32bit uid, turn accounting on. Shock horror it doesn't work
because the field is 16bit. We need an acct structure flag day for 2.6
IMHO
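
The failure in the test case above can be seen with a few lines of arithmetic. This sketch models only the truncation: the helper name is hypothetical and the real accounting record layout is not reproduced here.

```python
import struct


def store_uid_16bit(uid):
    """Store a uid in a 16-bit field and read it back.

    The mask models a wider value being squeezed into a 16-bit record
    field; the actual accounting record is not modeled.
    """
    packed = struct.pack("<H", uid & 0xFFFF)
    return struct.unpack("<H", packed)[0]


small = store_uid_16bit(1000)      # fits in 16 bits, survives intact
large = store_uid_16bit(100000)    # needs more than 16 bits
print(small, large)
```

The large uid comes back as 34464 (100000 - 65536), which is a perfectly valid but different uid, so the record is silently attributed to the wrong user.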


> - Proper user level no overcommit also requires a root margin adding

I've played with this a bit, it turns out to be surprisingly trivial

> Not-ready features and speedups
> ===============================
>

> - 32bit quota needs a lot more testing but may work now

Seems to work in 2.4-ac at the moment 8)

> - ES7000 wants merging (now we are all happy with it). That shouldn't be a
> big problem.

In 2.5.6x-ac - just worked.

> drivers
> =======
>
> - Alan: PCI random reordering from 2.4 to 2.5 isn't understood yet (might be
> fixed now?)

FIXED

> - Lots of network drivers don't even build

Mostly fixed

> drivers/acpi/
> -------------
>
> - davej: ACPI has a number of failures right now. There are a number of
> entries in bugzilla which could all be the same bug. It manifests as a
> "network card doesn't receive packets"; booting with 'acpi=off noapic' fixes
> it.

VIA APIC stuff is one bit of this, there are also some other reports
that were caused by ACPI sometimes not setting level vs. edge trigger

> drivers/block/
> --------------
>
> - Alan: Partition handling is hosed for DM users. (I have some partly
> debugged patches in the -ac tree, but Andries objects to them and I think
> his user knows magic options hack is unacceptable too. Mostly this is
> figuring out the right answer)

"FIXED" - We've established the device mapper can do the translation.
It's a chunk of work for vendors but it's doable

> - Floppy is almost unusably buggy still

Seems to be working for me now as of 2.5.69-ac1


> drivers/char/
> -------------
>
> - Alan: Multiple serious bugs in the DRI drivers (most now with patches
> thankfully). "The badness I know about is almost entirely IRQ mishandling.
> DRI failing to mask PCI irqs on exit paths."

Linus has been updating DRI stuff since then so may be sorted

> drivers/ide/
> ------------
>
> (Alan)
>
> - IDE requires bio walking

Bartlomiej has IDE multisector working

> - IDE PIO has occasional unexplained PIO disk eating reports

Seems ok in 2.5.69-ac

> - IDE has multiple zillions of races/hangs in 2.5 still

The wonder Bart is currently redoing ide setup which is the big mess
left and has tackled taskfile block I/O already.

> - IDE eats disks with HPT372N on 2.5.x

Fixed for 2.4.x now - can be ported easily. This has folded into
"forward port IDE driver fixes from 2.4"

> - IDE needs significant reworking to handle Simplex right

Some old draft patches from Torben exist for the beginnings of
this.

> arch/i386/
> ----------
>
> - 2.5.x won't boot on some 440GX

Problem understood now, feasible fix in 2.4/2.4-ac. (440GX has two IRQ
routers, we use the $PIR table with the PIIX, but the 440GX doesn't use
the PIIX for its IRQ routing). Fall back to BIOS for 440GX works and
Intel concurs.

> - 2.5.x doesn't handle VIA APIC right yet - don't know why

1. We must write the PCI_INTERRUPT_LINE, 2. We have quirk handlers that
seem to trash it.

> - ACPI needs the relax patches merging to work on lots of laptops

Working in 2.4.21-ac, Toshiba cheap laptops now run a treat. Forward
port looks like a patch command


Other items:

PC9800 is not fully merged - most of this I think is 2.7 stuff but a few
bits might be 2.6 candidate

SH3/SH3-64 need resynching, as do some other ports. No impact on
mainstream platforms hopefully


2003-05-13 14:48:01

by Jeff Garzik

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 02:57:08PM +0100, Alan Cox wrote:
> On Llu, 2003-05-12 at 23:55, Andrew Morton wrote:
> >
> > - IDE tcq. Either kill it or fix it. Not a "big todo", as such.
> >
>
> There are lots of other IDE bugs that won't go away until the taskfile
> stuff is included,

Let me ask the dumb question then. :) I've been following the various
IDE threads and see a lot of "X won't happen until taskfile IO is in"

What remains to do, until taskfile IO can go in?

Thanks,

Jeff



2003-05-13 15:04:12

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> Dave Jones <[email protected]> writes:

> I can still kill an NFS server in under a minute with fsx.

I'm more interested in hearing how the client fixes are coping.
i.e. is the client recovering properly if/when you restart the server
after such a crash?

Cheers,
Trond

2003-05-13 15:00:26

by Jens Axboe

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13 2003, Jeff Garzik wrote:
> On Tue, May 13, 2003 at 02:57:08PM +0100, Alan Cox wrote:
> > On Llu, 2003-05-12 at 23:55, Andrew Morton wrote:
> > >
> > > - IDE tcq. Either kill it or fix it. Not a "big todo", as such.
> > >
> >
> > There are lots of other IDE bugs that won't go away until the taskfile
> > stuff is included,
>
> Let me ask the dumb question then. :) I've been following the various
> IDE threads and see a lot of "X won't happen until taskfile IO is in"
>
> What remains to do, until taskfile IO can go in?

Main issue is the bio walking stuff, which is just awaiting a Linus
commit. At least from the block layer side.

--
Jens Axboe

2003-05-13 15:09:38

by Dave Jones

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 05:16:39PM +0200, Trond Myklebust wrote:
> >>>>> Dave Jones <[email protected]> writes:
>
> > I can still kill an NFS server in under a minute with fsx.
>
> I'm more interested in hearing how the client fixes are coping.
> i.e. is the client recovering properly if/when you restart the server
> after such a crash?

After a crash, I can carry on using that export just fine.
unexporting and reexporting also works fine.
Perhaps 'kill' was an over-strong word to use above, let's
replace it with 'make it break causing possible fs corruption'.

Dave

2003-05-13 15:26:25

by Christoph Hellwig

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 02:57:08PM +0100, Alan Cox wrote:
> > - ACPI needs the relax patches merging to work on lots of laptops
>
> Working in 2.4.21-ac, Toshiba cheap laptops now run a treat. Forward
> port looks like a patch command
>
>
> Other items:
>
> PC9800 is not fully merged - most of this I think is 2.7 stuff but a few
> bits might be 2.6 candidate
>
> SH3/SH3-64 need resynching, as do some other ports. No impact on
> mainstream platforms hopefully

That brings up another issue: what ports do regularly work with 2.5
mainline? I've been working with David to get all those core changes ia64
needs (and there's still a lot) sorted out so maybe 2.6 will work out of
the box. I guess some other arches (parisc, mips?) will need similar
work.

2003-05-13 15:19:55

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Dave Jones <[email protected]> writes:

> unexporting and reexporting also works fine. Perhaps 'kill'
> was an over-strong word to use above, let's replace it with
> 'make it break causing possible fs corruption'.

That is a server bug. There are no rules for congestion control
etc. in the NFS or SunRPC protocols, so the server is supposed to be
able to cope with whatever the client manages to throw at it.

I presume, though, that you are not seeing the 2.4.x NFS server die in
this way when you blast it with a 2.5 client?

Cheers,
Trond

2003-05-13 15:34:51

by Dave Jones

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 05:32:29PM +0200, Trond Myklebust wrote:

> That is a server bug. There are no rules for congestion control
> etc. in the NFS or SunRPC protocols, so the server is supposed to be
> able to cope with whatever the client manages to throw at it.
>
> I presume, though, that you are not seeing the 2.4.x NFS server die in
> this way when you blast it with a 2.5 client?

I had thought that the 2.4 server survived this. I just did a test
with a 2.4.21pre7 kernel and found the same behaviour, so this isn't
a regression, just something that's not very nice.

Dave

2003-05-13 15:34:33

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Alan Cox <[email protected]> writes:

> Lots of recent 2.4 NFS fixes in the client especially want
> forward porting

Which ones are you thinking about in particular? I've developed most
of those fixes in parallel on 2.4.x and 2.5.x, and have tried to push
them to Linus asap. I therefore believe 2.5.x should be pretty much up
to date.
I believe Chuck has compiled a list of discrepancies, but IIRC there
are no showstoppers there.

The most serious non-NFSv4 problem I believe is the fact that IODirect
for NFS needs to be completed. I need to bug Chuck about that.

I also need to look over the VFS file locking changes to see if
anything has broken lockd.

any more?

Cheers,
Trond

2003-05-13 15:38:03

by Jeff Garzik

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 04:38:54PM +0100, Christoph Hellwig wrote:
> On Tue, May 13, 2003 at 02:57:08PM +0100, Alan Cox wrote:
> > > - ACPI needs the relax patches merging to work on lots of laptops
> >
> > Working in 2.4.21-ac, Toshiba cheap laptops now run a treat. Forward
> > port looks like a patch command
> >
> >
> > Other items:
> >
> > PC9800 is not fully merged - most of this I think is 2.7 stuff but a few
> > bits might be 2.6 candidate
> >
> > SH3/SH3-64 need resynching, as do some other ports. No impact on
> > mainstream platforms hopefully
>
> That brings up another issue: what ports do regularly work with 2.5
> mainline? I've been working with David to get all those core changes ia64
> needs (and there's still a lot) sorted out so maybe 2.6 will work out of
> the box. I guess some other arches (parisc, mips?) will need similar
> work.

mips definitely needs work. I don't know that there exists a working
2.5 mips port.

I told Ralf I would work on getting it booting on my Indy, and have been
slowly working through that. There is also some mips work in the
linux-mips cvs tree.

That's definitely a "todo"

Jeff



2003-05-13 15:53:34

by Alan

Subject: Re: 2.6 must-fix list, v2

On Maw, 2003-05-13 at 16:47, Trond Myklebust wrote:
> The most serious non-NFSv4 problem I believe is the fact that IODirect
> for NFS needs to be completed. I need to bug Chuck about that.
>
> I also need to look over the VFS file locking changes to see if
> anything has broken lockd.
>
> any more?

Are all of Steve's fixes for the NFS client from 2.4 propagated into 2.5
now then?

2003-05-13 15:46:30

by Daniel Jacobowitz

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 01:36:14PM +0200, Trond Myklebust wrote:
> >>>>> " " == Andrew Morton <[email protected]> writes:
>
> > - NFS client gets an OOM deadlock.
> > - Some fixes exist in -mm. Seem to mostly work.
> > - NFS client runs very slowly consuming 100% CPU under heavy
> > writeout.
> > - Unsubtle fix exists in -mm. (Looks like it's fixed anyway).
>
> <snip>
>
> > - davej: NFS seems to have a really bad time for some people. (Including
> > myself on one testbox). The common factor seems to be a high
> > spec client torturing an underpowered NFS server with lots of
> > IO. (fsx/fsstress etc show this up). Lots of "NFS server
> > cheating" messages get dumped, and a whole lot of bogus
> > packets start appearing. They look severely corrupted, (they
> > even crashed ethereal once 8-)
>
> Could people please test these items out again using the latest
> Bitkeeper release? I believe I've addressed all these issues with the
> patches that have gone to Linus in the last 2-3 weeks.

Well, using BK as of Friday last week I'm still having a complete
disaster of NFS support. Copying a 13MB file within an NFS-mounted
directory usually yields an I/O error, creating that same file does too
(it's a final link, so I don't know offhand if reading the objects or
writing the binary is falling over). Server is rather old now,
in-kernel NFSd from 2.4.19-pre10-ac2, but it works just fine on 2.4
clients.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2003-05-13 15:51:34

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Dave Jones <[email protected]> writes:

> I had thought that the 2.4 server survived this. I just did a
> test with a 2.4.21pre7 kernel and found the same behaviour, so
> this isn't a regression, just something that's not very nice.

Then I'm confused as to what you are saying. Are we talking about a
full NFS server crash or just a temporary 'server not responding'
situation? Does NFS over TCP fix it, for instance?

Cheers,
Trond

2003-05-13 16:00:20

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Daniel Jacobowitz <[email protected]> writes:

> Well, using BK as of Friday last week I'm still having a
> complete disaster of NFS support.

Please try a more recent snapshot. The OOM situation was only fixed
with the patches that Linus pulled for patch-2.5.69-bk7
(i.e. yesterday's snapshot).

Oh. Please also turn off any 'soft' mount option that you may
have. Like it or not, those *will* cause EIO errors.
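A quick way to see whether a client has any soft-mounted NFS filesystems is to scan /proc/mounts. The sketch below assumes the usual whitespace-separated layout (device, mount point, filesystem type, comma-separated options); the function name is made up for illustration.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems mounted with the 'soft' option."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Only look at NFS client mounts; 'soft' must be a whole option word.
        if fstype in ("nfs", "nfs4") and "soft" in options.split(","):
            found.append(mount_point)
    return found
```

Feeding it the contents of /proc/mounts on the client lists the mounts worth remounting with 'hard' before retesting.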

Cheers,
Trond

2003-05-13 15:58:10

by Dave Jones

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 06:02:31PM +0200, Trond Myklebust wrote:
> >>>>> " " == Dave Jones <[email protected]> writes:
>
> > I had thought that the 2.4 server survived this. I just did a
> > test with a 2.4.21pre7 kernel and found the same behaviour, so
> > this isn't a regression, just something that's not very nice.
>
> Then I'm confused as to what you are saying. Are we talking about a
> full NFS server crash or just a temporary 'server not responding'
> situation?

ok, I've now established that kernel version is irrelevant.
the server keeps on ticking, no problems.
the client (which runs fsx on an nfs mount) fails.

Two failure modes.
1, fsx dies with bus error. This is infrequent (seen it once)
The most common failure mode is..
2, fsx fails. Different failure each time. Usually takes a minute
to trigger.

(16:30:08:root@tetrachloride:mesh)# ~/fsx voon
truncating to largest ever: 0x13e76
truncating to largest ever: 0x2e52c
truncating to largest ever: 0x3c2c2
truncating to largest ever: 0x3f15f
truncating to largest ever: 0x3fcb9
truncating to largest ever: 0x3fe96
truncating to largest ever: 0x3ff9d
Size error: expected 0x2126e stat 0x21546 seek 0x21546
LOG DUMP (7665 total operations):
7666(242 mod 256): READ 0x1b20b thru 0x215b7 (0x63ad bytes)
7667(243 mod 256): MAPWRITE 0x247ef thru 0x26e9f (0x26b1 bytes)
7668(244 mod 256): WRITE 0x26fc thru 0x11d2f (0xf634 bytes)
7669(245 mod 256): TRUNCATE DOWN from 0x3b18c to 0x1cbfc
7670(246 mod 256): MAPREAD 0x7f92 thru 0xd860 (0x58cf byte
....
etc...

> Does NFS over TCP fix it, for instance?

Untested, I can give that a try.

Dave

2003-05-13 16:09:10

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Alan Cox <[email protected]> writes:

> Are all of Steve's fixes for the NFS client from 2.4 propagated
> into 2.5 now then ?

Which ones? Are you talking about the mmap() problem that he reported?
We're still looking for a solution to that. I'm not convinced that his
fix is appropriate as it appears to me just to be playing with the
timing of the symptoms.

Unless he's hoarding something, then most of the other 2.4 fixes
should be stuff he got off Chuck and me, so those are already in...

Cheers,
Trond

2003-05-13 16:14:59

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Dave Jones <[email protected]> writes:

> (16:30:08:root@tetrachloride:mesh)# ~/fsx voon truncating to
> largest ever: 0x13e76 truncating to largest ever: 0x2e52c
> truncating to largest ever: 0x3c2c2 truncating to largest ever:
> 0x3f15f truncating to largest ever: 0x3fcb9 truncating to
> largest ever: 0x3fe96 truncating to largest ever: 0x3ff9d Size
> error: expected 0x2126e stat 0x21546 seek 0x21546 LOG DUMP
> (7665 total operations): 7666(242 mod 256): READ 0x1b20b thru
> 0x215b7 (0x63ad bytes) 7667(243 mod 256): MAPWRITE 0x247ef thru
> 0x26e9f (0x26b1 bytes) 7668(244 mod 256): WRITE 0x26fc thru
> 0x11d2f (0xf634 bytes) 7669(245 mod 256): TRUNCATE DOWN from
> 0x3b18c to 0x1cbfc 7670(246 mod 256): MAPREAD 0x7f92 thru
> 0xd860 (0x58cf byte .... etc...

Ah... mmap()ed writes + truncate()...

OK. There's currently a known problem here which appears both in 2.4.x
and 2.5.x: we appear to be incapable of flushing out all the dirty
pages prior to truncating the file. The usual
filemap_fdatasync()/filemap_fdatawait() appears to be subject to races
with VM swapping.

Could we have some help from the VM experts on this one?
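For anyone wanting to poke at this without a full fsx run, the operation pattern (dirty pages through a shared mapping, then truncate down, then compare the sizes reported by stat and by seeking to the end) can be sketched in user space. This is only a rough illustration, not what fsx itself does; the helper names are hypothetical and the sizes are borrowed from the TRUNCATE DOWN line in the log above.

```python
import mmap
import os
import tempfile


def mapwrite_then_truncate(path, map_size, new_size):
    """Write through a shared mapping, truncate down, then report file sizes.

    Returns (stat_size, seek_size); on a correct filesystem both equal new_size.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, map_size)            # extend so the range can be mapped
        with mmap.mmap(fd, map_size) as m:    # shared read/write mapping
            m[:] = b"x" * map_size            # dirty every page via the mapping
            m.flush()                         # ask for the dirty pages to be synced
        os.ftruncate(fd, new_size)            # TRUNCATE DOWN
        stat_size = os.fstat(fd).st_size
        seek_size = os.lseek(fd, 0, os.SEEK_END)
        return stat_size, seek_size
    finally:
        os.close(fd)


def demo():
    """Run the pattern once in a scratch directory (point it at an NFS mount to test NFS)."""
    with tempfile.TemporaryDirectory() as d:
        return mapwrite_then_truncate(os.path.join(d, "voon"), 0x3b18c, 0x1cbfc)
```

A "Size error" in fsx terms corresponds to either returned value differing from new_size. A single pass like this is unlikely to hit the race on its own; it only shows which calls are involved.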

Cheers,
Trond

2003-05-13 16:32:52

by Dave McCracken

Subject: Re: 2.6 must-fix list, v2


--On Tuesday, May 13, 2003 18:27:35 +0200 Trond Myklebust
<[email protected]> wrote:

> Ah... mmap()ed writes + truncate()...
>
> OK. There's currently a known problem here which appears both in 2.4.x
> and 2.5.x: we appear to be incapable of flushing out all the dirty
> pages prior to truncating the file. The usual
> filemap_fdatasync()/filemap_fdatawait() appears to be subject to races
> with VM swapping.
>
> Could we have some help from the VM experts on this one?

I'm in the process of quantifying a big race condition in vmtruncate().
The scenario for the race is this:

* Task 1 truncates the file, which resets the size and calls vmtruncate().

* Task 1 in vmtruncate() walks all vmas for the file and unmaps pages from
the truncated file region.

* Task 2 then extends the file and faults pages back in.

* Task 1 (still in vmtruncate()) removes pages including the newly remapped
pages from the page cache using the original truncated size.

We now have mapped and dirty pages that do not belong to any page cache and
will not be written back to the file. All subsequent data written via the
mapped pages will be lost.

I don't have a solution for it yet. I've just gotten as far as identifying
the race.
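The interleaving above can be walked through with a toy user-space model. This is not kernel code: the classes and names are invented for illustration, and the model assumes only what the scenario states, namely that a fault finds or creates the page in the page cache and that writeback walks the page cache only.

```python
class Page:
    def __init__(self, index):
        self.index = index
        self.dirty = False


class ToyFile:
    """Toy stand-in for an inode: size in pages, a page cache, one task's mappings."""

    def __init__(self, size_pages):
        self.size = size_pages
        self.page_cache = {i: Page(i) for i in range(size_pages)}
        self.mapped = dict(self.page_cache)   # every page starts out mapped

    def fault_in(self, index):
        """Page fault: find or create the page in the page cache, then map it."""
        assert index < self.size, "fault beyond end of file"
        page = self.page_cache.setdefault(index, Page(index))
        self.mapped[index] = page
        return page

    def writeback(self):
        """Writeback only walks the page cache; return the indexes it would write."""
        return sorted(i for i, p in self.page_cache.items() if p.dirty)


def run_race():
    f = ToyFile(size_pages=8)
    new_size = 4

    # Task 1: truncate sets the new size, then vmtruncate() unmaps the tail.
    f.size = new_size
    for i in [i for i in f.mapped if i >= new_size]:
        del f.mapped[i]

    # Task 2: extends the file again and dirties page 6 through its mapping.
    f.size = 8
    f.fault_in(6).dirty = True

    # Task 1, still in vmtruncate(): drop cached pages past the truncated size.
    for i in [i for i in f.page_cache if i >= new_size]:
        del f.page_cache[i]

    # Pages that are mapped and dirty but no longer in the page cache.
    orphans = sorted(i for i, p in f.mapped.items()
                     if p.dirty and f.page_cache.get(i) is not p)
    return orphans, f.writeback()
```

In this model run_race() leaves page 6 mapped and dirty while writeback finds nothing to write, which is the lost-data outcome described above.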

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2003-05-13 16:41:10

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Dave McCracken <[email protected]> writes:

> I'm in the process of quantifying a big race condition in
> vmtruncate(). The scenario for the race is this:

> * Task 1 truncates the file, which resets the size and calls
> vmtruncate().

> * Task 1 in vmtruncate() walks all vmas for the file and unmaps
> pages from
> the truncated file region.

> * Task 2 then extends the file and faults pages back in.

> * Task 1 (still in vmtruncate()) removes pages including the
> newly remapped
> pages from the page cache using the original truncated size.

The scenario I'm thinking about is different and can be triggered on a
single process.

The dirty pages are failing to be written out because
they've been swapped out. We then try to do an RPC call to the server
to get it to truncate the file on its side.
Meanwhile one or more of the swapped-out pages are faulted in, and
an attempt is made to write them out. -> race...

Note that we do the vmtruncate() *after* the RPC call (since we cannot
predict whether or not the server will agree to our request); however,
actually moving the vmtruncate() to before the RPC call does not
appear to fix the problem.
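To make the ordering problem concrete, here is a toy model of the server side. It assumes only that the server applies requests in arrival order and that a write past end-of-file extends the file; the function name is hypothetical, the truncate sizes are borrowed from the earlier fsx log, and the write offset is made up.

```python
def server_size(initial_size, ops):
    """Apply ('write', offset, length) and ('truncate', size) ops in arrival order."""
    size = initial_size
    for op in ops:
        if op[0] == "write":
            _, offset, length = op
            size = max(size, offset + length)   # a write past EOF extends the file
        elif op[0] == "truncate":
            size = op[1]
        else:
            raise ValueError("unknown op: %r" % (op,))
    return size


# Page write lands before the truncate: the file ends at the truncated size.
in_order = server_size(0x3b18c, [("write", 0x20000, 0x1000), ("truncate", 0x1cbfc)])

# Same page write lands after the truncate: the file grows back past it.
late_write = server_size(0x3b18c, [("truncate", 0x1cbfc), ("write", 0x20000, 0x1000)])
```

With the late write the server ends up at 0x21000 while the client believes the file is 0x1cbfc bytes long, the kind of mismatch fsx reports as a size error.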

Cheers,
Trond

2003-05-13 16:48:48

by Sam Ravnborg

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 11:50:47AM -0400, Jeff Garzik wrote:
> mips definitely needs work. I don't know that there exists a working
> 2.5 mips port.
>
> I told Ralf I would work on getting it booting on my Indy, and have been
> slowly working through that. There is also some mips work in the
> linux-mips cvs tree.

If I want to update mips Makefiles to new style - what should be used
as baseline?

Linus-BK or a mips cvs somewhere?

Sam

2003-05-13 17:00:01

by James Bottomley

Subject: Re: 2.6 must-fix list, v2

For SCSI, as far as the basics go we still have

Need to convert to DMA-mapping:

am53c974 dpt_i2o initio pci2220i

Don't compile currently:

inia100 cpqfc pci2000 dc390t

Need converting to the new eh:

wd33c93 based: a2091 a3000 gvp11 mvme147 sgiwd93
53c7xx based: amiga7xxx bvme6000 mvme16x
initio am53c974 pci2000 pci2220i qla1280 sym53c8xx dc390t

I think the sym53c8xx could probably be pulled out of the tree because
the sym_2 replaces it. I'm also looking at converting the qla1280.

It also might be possible to shift the 53c7xx based drivers over to
53c700 which does the new EH stuff, but I don't have the hardware to
check such a shift.

For the non-compiling stuff, I've probably missed a few that just aren't
compilable on my platforms, so any updates would be welcome. Also, are
some of our non-compiling or unconverted drivers obsolete?

James


2003-05-13 17:25:12

by Dave Jones

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 06:02:31PM +0200, Trond Myklebust wrote:
> Then I'm confused as to what you are saying. Are we talking about a
> full NFS server crash or just a temporary 'server not responding'
> situation? Does NFS over TCP fix it, for instance?

Just to keep you busy..
I had thought NFS over TCP fixed it. It ran for a lot longer
(around 50 minutes), and then did the following..
Looks like a different bug to my untrained eye.

(17:51:02:root@tetrachloride:mesh)# ~/fsx voon
truncating to largest ever: 0x13e76
truncating to largest ever: 0x2e52c
truncating to largest ever: 0x3c2c2
truncating to largest ever: 0x3f15f
truncating to largest ever: 0x3fcb9
truncating to largest ever: 0x3fe96
truncating to largest ever: 0x3ff9d
truncating to largest ever: 0x3ffff
skipping zero size read
skipping zero size write
Size error: expected 0x30501 stat 0x1f486 seek 0x1f486
LOG DUMP (90892 total operations):
90893(13 mod 256): MAPREAD 0x227bf thru 0x2e4a4 (0xbce6 bytes)
90894(14 mod 256): WRITE 0x20dbf thru 0x28e7b (0x80bd bytes)
90895(15 mod 256): READ 0x1d762 thru 0x241ff (0x6a9e bytes)
90896(16 mod 256): WRITE 0x26621 thru 0x33855 (0xd235 bytes)
90897(17 mod 256): READ 0x28df3 thru 0x33603 (0xa811 bytes)
90898(18 mod 256): READ 0xe303 thru 0x13c31 (0x592f bytes)
90899(19 mod 256): WRITE 0x3b680 thru 0x3ffff (0x4980 bytes) HOLE
90900(20 mod 256): MAPWRITE 0x33ba5 thru 0x34740 (0xb9c bytes)
90901(21 mod 256): READ 0x15ed6 thru 0x2546a (0xf595 bytes)
90902(22 mod 256): MAPWRITE 0x188e8 thru 0x260ec (0xd805 bytes)
90903(23 mod 256): WRITE 0x29f99 thru 0x2b02a (0x1092 bytes)
90904(24 mod 256): TRUNCATE DOWN from 0x40000 to 0x279fb

Complete (60KB) fsx log is at http://www.codemonkey.org.uk/cruft/voon.fsxlog

Dave

2003-05-13 17:39:29

by Miquel van Smoorenburg

Subject: Re: 2.6 must-fix list, v2

In article <[email protected]>,
Trond Myklebust <[email protected]> wrote:
>[NFS]
>any more?

NFSv3 O_EXCL support in the client ?

Mike.

2003-05-13 17:47:10

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Dave Jones <[email protected]> writes:

> On Tue, May 13, 2003 at 06:02:31PM +0200, Trond Myklebust
> wrote:
>> Then I'm confused as to what you are saying. Are we talking
>> about a full NFS server crash or just a temporary 'server not
>> responding' situation? Does NFS over TCP fix it, for instance?

> Just to keep you busy.. I had thought NFS over TCP fixed
> it. It ran for a lot longer (around 50 minutes), and then did
> the following.. Looks like a different bug to my untrained
> eye.

Nah. Looks like the same thing: mmapped writes followed by truncate.
TCP is likely to change the timings a bit (reliable transport means
that the race between out-of-order write and truncate is smaller), but
it is still there.

Cheers,
Trond

2003-05-13 17:59:45

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Miquel van Smoorenburg <[email protected]> writes:

> Trond Myklebust <[email protected]> wrote:
>> [NFS] any more?

> NFSv3 O_EXCL support in the client ?

I'm working on it. Consider it bundled in the long rant that Andrew
quoted concerning an atomic "open()". By allowing filesystems to
replace most of what is currently contained in open_namei() (and doing
it atomically instead of relying on local semaphores for atomicity)
it will be possible to implement O_EXCL (and to do it efficiently)...

Cheers,
Trond

2003-05-13 17:57:30

by Mike Anderson

Subject: Re: 2.6 must-fix list, v2

James Bottomley [[email protected]] wrote:
> For SCSI, as far as the basics go we still have
>
> Need to convert to DMA-mapping:
>
> am53c974 dpt_i2o initio pci2220i
>
> Don't compile currently:
>
> inia100 cpqfc pci2000 dc390t
>
> Need converting to the new eh:
>
> wd33c93 based: a2091 a3000 gvp11 mvme147 sgiwd93
> 53c7xx based: amiga7xxx bvme6000 mvme16x
> initio am53c974 pci2000 pci2220i qla1280 sym53c8xx dc390t
>
> I think the sym53c8xx could probably be pulled out of the tree because
> the sym_2 replaces it. I'm also looking at converting the qla1280.
>

I would vote for sym_2 replacement. I have bug 647 that fails on
sym53c8xx but works on sym_2. The bug still has a rmmod problem which
may be cleaned up with the cleanups in scsi-misc.

> It also might be possible to shift the 53c7xx based drivers over to
> 53c700 which does the new EH stuff, but I don't have the hardware to
> check such a shift.
>
> For the non-compiling stuff, I've probably missed a few that just aren't
> compilable on my platforms, so any updates would be welcome. Also, are
> some of our non-compiling or unconverted drivers obsolete?


I have the following not covered by your list above

qlogicisp (isp1020) - Convert to new eh and other issues. Could be
covered by feral driver, but status unclear of inclusion of feral.
Bug 140.

iph5526.c - Compile failure. Bug 201.

ini9100u.c - DMA-mapping conversion. Bug 213.

tmscsim.c - Compile failure. Bug 219.

AM53C974.c - Compile failure Bug 220.


An issue in fixing some of these is that the lack of hardware,
documentation, or a maintainer makes verification beyond compile testing
difficult (and just compile testing could lead to hidden data issues).

-andmike
--
Michael Anderson
[email protected]

2003-05-13 17:59:46

by Daniel Jacobowitz

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 06:11:42PM +0200, Trond Myklebust wrote:
> >>>>> " " == Daniel Jacobowitz <[email protected]> writes:
>
> > Well, using BK as of Friday last week I'm still having a
> > complete disaster of NFS support.
>
> Please try a more recent snapshot. The OOM situation was only fixed
> with the patches that Linus pulled for patch-2.5.69-bk7
> (i.e. yesterday's snapshot).
>
> Oh. Please also turn off any 'soft' mount option that you may
> have. Like it or not, those *will* cause EIO errors.

Thanks for the quick and accurate response. I switched to this
morning's BK, and now NFS-root is working like a charm. I used to get
both EIO and EPERM errors under load; now everything appears to work
OK.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2003-05-13 18:08:38

by James Bottomley

Subject: Re: 2.6 must-fix list, v2

On Tue, 2003-05-13 at 13:11, Mike Anderson wrote:
> qlogicisp (isp1020) - Convert to new eh and other issues. Could be
> covered by feral driver, but status unclear of inclusion of feral.
> Bug 140.

Missed, thanks

> iph5526.c - Compile failure. Bug 201.

missed, thanks.

> ini9100u.c - DMA-mapping conversion. Bug 213.

that is the initio driver, covered above

> tmscsim.c - Compile failure. Bug 219.

That is the dc390t driver, covered above. I think the new dc395x driver
may finesse this problem, though.

> AM53C974.c - Compile failure Bug 220.

That's the am53c974, covered above (just couldn't be bothered to
capitalise).

James


2003-05-13 18:59:39

by Mike Anderson

Subject: Re: 2.6 must-fix list, v2

James Bottomley [[email protected]] wrote:
> > tmscsim.c - Compile failure. Bug 219.
>
> That is the dc390t driver, covered above. I think the new dc395x driver
> may finesse this problem, though.
>

I will check with the maintainers of dc395x (oliver, aliakc, lenehan) and
dc390 (garloff) on this, unless you already have.


> > AM53C974.c - Compile failure Bug 220.
>
> That's the am53c974, covered above (just couldn't be bothered to
> capitalise).
>

I guess I did not bother to read the lower case either :-).

-andmike
--
Michael Anderson
[email protected]

2003-05-13 19:39:05

by Geert Uytterhoeven

Subject: Re: 2.6 must-fix list, v2

On Tue, 13 May 2003, Christoph Hellwig wrote:
> On Tue, May 13, 2003 at 02:57:08PM +0100, Alan Cox wrote:
> > SH3/SH3-64 need resynching, as do some other ports. No impact on
> > mainstream platforms hopefully
>
> That brings up another issue: what ports do regularly work with 2.5
> mainline? I've been working with David to get all those core changes ia64
> needs (and there's still a lot) sorted out so maybe 2.6 will work out of
> the box. I guess some other arches (parisc, mips?) will need similar
> work.

Just FYI... For the m68k port, I have ca. 150 KiB of patches in Linus' INBOX
(if they're still there, mainly irqreturn_t stuff), and about 100 KiB of
postponed stuff I'm not gonna send (i.e. things that are not ready for
submission yet, or that are too controversial).

Amiga (non-SCSI) and Q40/Q60 should work fairly well in 2.5.x, except that
early userspace (launching of /sbin/init) got broken in 2.5.67 or 2.5.68.

For comparison, 2.4.x has no stuff in Marcelo's INBOX, and about the same 100
KiB of postponed stuff. Not counting Michael Müller's new TekXpress port, which
is not even in Linux/m68k CVS (http://linux-m68k-cvs.apia.dhs.org/) yet.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2003-05-13 19:49:03

by Andrew Morton

Subject: Re: 2.6 must-fix list, v2

"Shaheed R. Haque" <[email protected]> wrote:
>
> - Add ability to restrict the default CPU affinity mask so that
> sys_setaffinity() can be used to implement exclusive access to a CPU.

Why is this useful?

2003-05-13 19:56:00

by Geert Uytterhoeven

Subject: Re: 2.6 must-fix list, v2

On Tue, 13 May 2003, Sam Ravnborg wrote:
> On Tue, May 13, 2003 at 11:50:47AM -0400, Jeff Garzik wrote:
> > mips definitely needs work. I don't know that there exists a working
> > 2.5 mips port.
> >
> > I told Ralf I would work on getting it booting on my Indy, and have been
> > slowly working through that. There is also some mips work in the
> > linux-mips cvs tree.
>
> If I want to update mips Makefiles to new style - what should be used
> as baseline?
>
> Linus-BK or a mips cvs somewhere?

There's still almost daily activity in the Linux/MIPS CVS tree, but compared to
mainline, it's a bit outdated (the main trunk is at 2.5.47, the 2.4 branch at
2.4.21-pre4).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


2003-05-13 19:55:16

by Chuck Ebbert

Subject: Re: 2.6 must-fix list, v2

Alan Cox wrote:

> > - 2.5.x won't boot on some 440GX
>
> Problem understood now, feasible fix in 2.4/2.4-ac. (440GX has two IRQ
> routers, we use the $PIR table with the PIIX, but the 440GX doesn't use
> the PIIX for its IRQ routing). Fall back to BIOS for 440GX works and
> Intel concurs.


With 2.5.69, 2.4.20 and 2.4.21-rc2-ac1 on Dell Workstation 610
(440GX) I see:

PCI: Using configuration type 1
...
PCI: Using IRQ router PIIX [8086/7110] at 00:07.0


lspci says it's an 82443GX. Why does this one work when others are
broken?

2003-05-13 20:00:47

by Sam Ravnborg

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 10:07:43PM +0200, Geert Uytterhoeven wrote:
> There's still almost daily activity in the Linux/MIPS CVS tree, but compared to
> mainline, it's a bit outdated (the main trunk is at 2.5.47, the 2.4 branch at
> 2.4.21-pre4).

It must have been before that I checked mainline then - I just remembered
it looked terribly outdated.

I will take a look soon to get Makefiles in shape.

Sam

2003-05-13 20:03:32

by Chris Friesen

Subject: Re: 2.6 must-fix list, v2

Trond Myklebust wrote:

> Oh. Please also turn off any 'soft' mount option that you may
> have. Like it or not, those *will* cause EIO errors.

Is hard,intr okay from this perspective?

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2003-05-13 20:05:48

by Andrew Morton

Subject: Re: 2.6 must-fix list, v2

Christoph Hellwig <[email protected]> wrote:
>
> That brings up another issue: what ports do regularly work with 2.5
> mainline?

I test ppc64 regularly. In fact -mm is probably the best place to go to
get a working ppc64 tree at present.

But I do not view non-ia32 support as being a 2.6.0 requirement. I'd be OK
with 2.6.0 working _only_ on ia32. Other architectures will catch up when
they can. The only core requirement is that 2.6.0 not contain gross
x86isms which make other ports impossible.

That's a rather lame position, and sure, one would wish otherwise. Feel
free to disagree ;)

2003-05-13 20:29:15

by Trond Myklebust

Subject: Re: 2.6 must-fix list, v2

>>>>> " " == Chris Friesen <[email protected]> writes:

> Is hard,intr okay from this perspective?

Yes.

Cheers,
Trond

2003-05-13 22:15:55

by Dave Jones

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 01:17:54PM -0700, Andrew Morton wrote:

> But I do not view non-ia32 support as being a 2.6.0 requirement. I'd be OK
> with 2.6.0 working _only_ on ia32. Other architectures will catch up when
> they can. The only core requirement is that 2.6.0 not contain gross
> x86isms which make other ports impossible.

I kinda sorta agree. Holding up 2.6.0 for other ports to catch up
could end up with us waiting, and in the meantime, Linus merging other
stuff which could break non-x86 etc..

Once we're into 2.6.x though, would it be unfeasible to hold off on
final point releases until arch maintainers have sent in a 'make things
work for this release' diff? I.e., make rc's "strict bugfixes only, and
arch updates".

Though, for some archs (sparc32 springs to mind), we may end up waiting
quite a while, so perhaps just settle on a handful of 'to be kept
up-to-date' archs ?

Dave

2003-05-13 22:36:42

by Shaheed

Subject: Re: 2.6 must-fix list, v2


Quoting Andrew Morton <[email protected]>:

> > - Add ability to restrict the default CPU affinity mask so that
> > sys_setaffinity() can be used to implement exclusive access to a CPU.
>
> Why is this useful?

I forgot to add that the result is the rough equivalent of Digital UNIX's psets
and Irix's sysmp, for my purposes at least.


2003-05-13 22:33:46

by Shaheed

Subject: Re: 2.6 must-fix list, v2


Quoting Andrew Morton <[email protected]>:

> "Shaheed R. Haque" <[email protected]> wrote:
> >
> > - Add ability to restrict the default CPU affinity mask so that
> > sys_setaffinity() can be used to implement exclusive access to a CPU.
>
> Why is this useful?

Because it allows one to dedicate a CPU to a process. For example, let's say you
have a quad processor, and want to run joe-random stuff on CPU 0, but a
specialised program on CPUs 1, 2, 3 that does not want to compete with
joe-random stuff.

With sys_setaffinity(), one can set the affinity of the special program to
0xe...but the default affinity for all the joe-random stuff is still 0xf (from
cpu_online_map)! Since it's impractical to modify every single joe-random
executable to set its affinity to 0x1, a way is needed to set the default. The
logical place is in init(), a.k.a. kernel/fork.c.
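
As a rough illustration of the mask arithmetic in this example, here is a
minimal Python sketch. The helper names (mask_to_cpus, cpus_to_mask) are made
up for the illustration and are not kernel or libc interfaces; the only thing
taken from the text is that bit N of the mask stands for CPU N.

```python
def mask_to_cpus(mask):
    """Return the sorted list of CPU numbers whose bits are set in mask."""
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


def cpus_to_mask(cpus):
    """Build an affinity bitmask from an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


ONLINE = 0xF                 # quad processor: CPUs 0-3 online
SPECIAL = 0xE                # CPUs 1, 2, 3 for the specialised program
DEFAULT = ONLINE & ~SPECIAL  # what everything else would need as its default

print(mask_to_cpus(SPECIAL))  # [1, 2, 3]
print(hex(DEFAULT))           # 0x1
```

The point of the request is the last line: the 0x1 default has to come from
somewhere other than each joe-random executable.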

I hope that makes sense.

Thanks, Shaheed


2003-05-13 22:43:00

by Andrew Morton

Subject: Re: 2.6 must-fix list, v2

Trond Myklebust <[email protected]> wrote:
>
> The dirty pages are failing to be written out because
> they've been swapped out. We then try to do an RPC call to the server
> to get it to truncate the file on its side.
> Meanwhile one or more of the swapped out pages are faulted in, and
> attempted written out. -> race...

These are file-backed pages: they don't get swapped out. The VM will write
them out with ->writepage(), and will reclaim them when they are clean.

A filemap_fdatawait() will do the right thing with these pages: it'll wait
on them.

There is a weird race in there wrt ongoing pagefaults in the truncated
region, but they require two processes - one faulting, the other
truncating. fsx-linux doesn't do that.

I'd need to see some more details on the code flow, including pointers to
the relevant code in the NFS client to understand this one please.

2003-05-13 22:43:00

by William Lee Irwin III

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 01:17:54PM -0700, Andrew Morton wrote:
>> But I do not view non-ia32 support as being a 2.6.0 requirement. I'd be OK
>> with 2.6.0 working _only_ on ia32. Other architectures will catch up when
>> they can. The only core requirement is that 2.6.0 not contain gross
>> x86isms which make other ports impossible.

On Tue, May 13, 2003 at 11:25:32PM +0100, Dave Jones wrote:
> I kinda sorta agree. Holding up 2.6.0 for other ports to catch up
> could end up with us waiting, and in the meantime, Linus merging other
> stuff which could break non-x86 etc..
> Once we're into 2.6.x though, would it be unfeasible to hold off on
> final point releases until arch maintainers have sent in a 'make things
> work for this release' diff ? Ie, make rc's "strict bugfixes only, and
> arch updates"
> Though, for some archs (sparc32 springs to mind), we may end up waiting
> quite a while, so perhaps just settle on a handful of 'to be kept
> up-to-date' archs ?

MIPS seems to be taking a while too.


-- wli

2003-05-13 23:33:09

by Russell King

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 11:25:32PM +0100, Dave Jones wrote:
> Though, for some archs (sparc32 springs to mind), we may end up waiting
> quite a while, so perhaps just settle on a handful of 'to be kept
> up-to-date' archs ?

As far as the ARM patch is concerned, last time I checked, there's still
a fair amount outstanding. Recently, I haven't been able to put the
usual amount of time into Linux, but there is a partial merge pending
as of tonight (with 57K of about a 1.8MB overall ARM patch.)

I don't think I'm going to get through all this and get it sanely
merged for 2.6, which means ARM will probably end up spending yet
another stable kernel release outside the mainstream kernel. That is
coupled with what will probably be a raft of new ARM machine types
during 2.6, each with its own random oddball drivers.

Stuff outstanding (this is based upon 2.5.68 + knowledge of what's
changed and pending merging, so might not be completely accurate):

Core arch-independent stuff:
- modules / /proc/kcore / vmalloc
This needs sorting and testing to ensure that stuff like gdb vmlinux
/proc/kcore works as expected. I believe this is the only show stopper
preventing any ARM platform being built in Linus' kernel.

- update acorn partition parsing code - making all acorn schemes
appear in check.c so we don't have to duplicate the scanning of
multiple types, and adding support for eesox partitions.

- lib/inflate.c must not use static variables (causes these to be
referenced via GOTOFF relocations in PIC decompressor. We have
a PIC decompressor to avoid having to hard code a per platform
zImage link address into the makefiles.)


Drivers:
- several OSS drivers for SA11xx-based hardware in need of ALSA-ification
and L3 bus support code for these.

- UCB1[23]00 drivers, currently sitting in drivers/misc in the ARM tree.
(touchscreen, audio, gpio, type device.)

- EPXA (ARM platform) PLD hotswap drivers (drivers/pld)

- linux/sound/drivers/mpu401/mpu401.c and linux/sound/drivers/virmidi.c
complained about 'errno' at some time in the past, need to confirm
whether this is still a problem.

- need to complete ALSA-ification of the WaveArtist driver for both
NetWinder and other stuff (there's some fairly fundamental differences
in the way the mixer needs to be handled for the NetWinder.)

- unconverted keyboard/mouse drivers (there's a deadline of 2.6.0
currently on these remaining in my/Linus' tree.)

- SA11xx USB client/gadget code
(David B has been doing some work on this, and keeps trying to prod me,
but unfortunately I haven't had the time to look at his work, sorry
David.)

- I think we need a generic RTC driver (which is backed by real RTCs).
Integrator-based stuff has a 32-bit 1Hz counter RTC with alarm, as
has the SA11xx, and probably PXA. There's another implementation
for the RiscPC and ARM26 stuff. I'd rather not see 4 implementations
of the RTC userspace API, but one common implementation so that stuff
gets done in a consistent way.

We postponed this at the beginning of 2.4 until 2.5 happened. We're
now at 2.5, and I'm about to add at least one more (the Integrator
implementation.) This isn't sane imo.

- missing raw keyboard translation tables for all ARM machines. Haven't
even looked into this at all. This could be messy since there isn't
an ARM architecture standard. I'm presently hoping that it won't be
an issue. If it is, I guess we'll see drivers/char/keyboard.c
explode.

- network drivers. ARM people like to add tonnes of #ifdefs into these
to customise them to their hardware platform (eg, chip access methods,
addresses, etc.) I cope with this by not integrating them into my
tree. The result is that many ARM platforms can't be built from even
my tree without extra patches. This isn't sane, and has bred a
culture of network drivers not being submitted. I don't see this
changing for 2.6 though.


Net:
- Refuse IrDA initialisation if sizeof(structures) is incorrect
(I'm not sure if we still need this; I think gcc 2.95.3 on ARM shows
this problem though.)
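
The general pattern behind that item (check structure sizes at start-up and
refuse to continue on a mismatch) can be sketched as follows. This is only an
illustration in Python using the standard struct module; the 3-byte header
layout and the check_layout name are hypothetical and are not taken from the
IrDA code.

```python
import struct


def check_layout(name, fmt, expected_size):
    """Return the size of fmt, or raise if it differs from expected_size."""
    actual = struct.calcsize(fmt)
    if actual != expected_size:
        raise RuntimeError(
            f"{name}: size {actual}, expected {expected_size}; refusing to initialise"
        )
    return actual


# Hypothetical wire header: one control byte followed by a 16-bit length.
# "<" selects the standard layout with no padding, so the size must be 3.
check_layout("frame_header", "<BH", 3)
```

With native alignment (format prefix "@") the same two fields may be padded,
which is the kind of compiler-dependent difference such a check is meant to
catch.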

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-05-14 01:53:15

by Perez-Gonzalez, Inaky

Subject: RE: 2.6 must-fix list, v2

> From: Shaheed R. Haque [mailto:[email protected]]
>
> Quoting Andrew Morton <[email protected]>:
>
> > "Shaheed R. Haque" <[email protected]> wrote:
> > >
> > > - Add ability to restrict the default CPU affinity mask so that
> > > sys_setaffinity() can be used to implement exclusive access to a CPU.
> >
> > Why is this useful?
>
> Because it allows one to dedicate a CPU to a process. For example, let's say you
> have a quad processor, and want to run joe-random stuff on CPU 0, but a
> specialised program on CPUs 1, 2, 3 that does not want to compete with
> joe-random stuff.

Real time applications can also benefit from this; if I can
get all the random stuff out of the way so that I know the
important, timing-sensitive thingie in CPU1 will always
get it, bonus points! ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

2003-05-14 02:19:48

by Pete Zaitcev

Subject: Re: 2.6 must-fix list, v2

> On Tue, May 13, 2003 at 01:17:54PM -0700, Andrew Morton wrote:
> > But I do not view non-ia32 support as being a 2.6.0 requirement. I'd be OK
> > with 2.6.0 working _only_ on ia32. Other architectures will catch up when
> > they can. The only core requirement is that 2.6.0 not contain gross
> > x86isms which make other ports impossible.

> Once we're into 2.6.x though, would it be unfeasible to hold off on
> final point releases until arch maintainers have sent in a 'make things
> work for this release' diff ? Ie, make rc's "strict bugfixes only, and
> arch updates" [...]

> Though, for some archs (sparc32 springs to mind), we may end up waiting
> quite a while, so perhaps just settle on a handful of 'to be kept
> up-to-date' archs ?

Why does the sparc(32) spring to mind, in particular?
It is likely to be in better shape than sh or mips.
I'm injured (not that anyone cares, but just for the record).

I agree with Andrew on the whole though. More, it's not about
being first tier architecture, or a second tier architecture.
It's about being up to date. I know at least one first tier
architecture which is fond of taking removed features and
reimplementing them under arch/foo, inventing wheels (sometimes
square) and generally being not up to date.

-- Pete

2003-05-14 02:32:33

by Steven Cole

Subject: Re: 2.6 must-fix list, v2

On Tue, 2003-05-13 at 16:46, Shaheed R. Haque wrote:
> Quoting Andrew Morton <[email protected]>:
>
> > "Shaheed R. Haque" <[email protected]> wrote:
> > >
> > > - Add ability to restrict the default CPU affinity mask so that
> > > sys_setaffinity() can be used to implement exclusive access to a CPU.
> >
> > Why is this useful?
>
> Because it allows one to dedicate a CPU to a process. For example, let's say you
> have a quad processor, and want to run joe-random stuff on CPU 0, but a
> specialised program on CPUs 1, 2, 3 that does not want to compete with
> joe-random stuff.
>
> With sys_setaffinity(), one can set the affinity of the special program to
> 0xe...but the default affinity for all the joe-random stuff is still 0xf (from
> cpu_online_map)! Since it's impractical to modify every single joe-random
> executable to set its affinity to 0x1, a way is needed to set the default. The
> logical place is in init(), a.k.a. kernel/fork.c.
>
> I hope that makes sense.
>
> Thanks, Shaheed
>

Is this related or not to processor shielding used by RedHawk Linux?
Here is a link to their page:

http://www.ccur.com/realtime/sys_rdhwklnx.html

I saw a presentation by these guys over a year ago. I'm not sure what
they're up to now.

Steven

2003-05-14 02:40:37

by Zwane Mwaikambo

Subject: RE: 2.6 must-fix list, v2

On Tue, 13 May 2003, Perez-Gonzalez, Inaky wrote:

> Real time applications can also benefit from this; if I can
> get all the random stuff out of the way so that I know the
> important, timing-sensitive thingie in CPU1 will always
> get it, bonus points! ...

Not really, during load your reserved cpu will now have to wait longer
for shared resources instead of helping make progress, bringing down the
performance of all your applications including the 'realtime' one.

Zwane
--
function.linuxpower.ca

2003-05-14 07:58:15

by Geert Uytterhoeven

Subject: Re: 2.6 must-fix list, v2

On Wed, 14 May 2003, Russell King wrote:
> - I think we need a generic RTC driver (which is backed by real RTCs).
> Integrator-based stuff has a 32-bit 1Hz counter RTC with alarm, as
> has the SA11xx, and probably PXA. There's another implementation
> for the RiscPC and ARM26 stuff. I'd rather not see 4 implementations
> of the RTC userspace API, but one common implementation so that stuff
> gets done in a consistent way.
>
> We postponed this at the beginning of 2.4 until 2.5 happened. We're
> now at 2.5, and I'm about to add at least one more (the Integrator
> implementation.) This isn't sane imo.

What about adding the periodic counter and alarm support to
drivers/char/genrtc.c? Genrtc is used on m68k, PA-RISC, PPC, MIPS (private
tree), and even on ia32.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2003-05-14 10:49:51

by Felipe Alfaro Solana

Subject: Re: 2.6 must-fix list, v2

On Wed, 2003-05-14 at 00:49, Shaheed R. Haque wrote:
> Quoting Andrew Morton <[email protected]>:
>
> > > - Add ability to restrict the default CPU affinity mask so that
> > > sys_setaffinity() can be used to implement exclusive access to a CPU.
> >
> > Why is this useful?
>
> I forgot to add that the result is the rough equivalent of Digital UNIX's psets
> and Irix's sysmp, for my purposes at least.

And psets and fencing in Solaris too...

2003-05-14 11:36:59

by Shaheed

Subject: Re: 2.6 must-fix list, v2


Quoting Steven Cole <[email protected]>:

> Is this related or not to processor shielding used by RedHawk Linux?
> Here is a link to their page:
>
> http://www.ccur.com/realtime/sys_rdhwklnx.html
>
> I saw a presentation by these guys over a year ago. I'm not sure what
> they're up to now.

Yes, if I correctly read the description of this feature, it seems to be the
same thing.




2003-05-14 11:52:41

by Shaheed

Subject: RE: 2.6 must-fix list, v2


Quoting Zwane Mwaikambo <[email protected]>:

> Not really, during load your reserved cpu will now have to wait longer
> for shared resources instead of helping make progress, bringing down the
> performance of all your applications including the 'realtime' one.

Of course you are right in the general case. But as long as one has correctly
sized the load so that it does NOT exceed the dedicated CPUs, and one has
reserved the right resources, then it can help the stability of the soft-
realtime applications that are of interest to me.

So, in my case, using dedicated raw-ish disks, pinned memory, dedicated CPUs
and the understanding that the kernel has absolute priority over userspace (and
so is never a worse bottleneck than usual), it all works. Also, my application
maximises the probability of success by explicitly managing the resources which
*are* shared between the time-sensitive code and the relevant "joe-random" code.

Can I be certain that there is no shared lock or anything else in the whole
kernel? No, but I'm prepared to make that probabilistic tradeoff (backed via
extensive testing) rather than have to go to hard-realtime.

Thanks, Shaheed


2003-05-14 12:58:34

by Steven Cole

Subject: Re: 2.6 must-fix list, v2

On Wed, 2003-05-14 at 05:49, Shaheed R. Haque wrote:
> Quoting Steven Cole <[email protected]>:
>
> > Is this related or not to processor shielding used by RedHawk Linux?
> > Here is a link to their page:
> >
> > http://www.ccur.com/realtime/sys_rdhwklnx.html
> >
> > I saw a presentation by these guys over a year ago. I'm not sure what
> > they're up to now.
>
> Yes, if I correctly read the description of this feature, it seems to be the
> same thing.
>
Thanks, that is what I suspected.

There seemed to be quite a bit of interest in this from the other
customers, although our facility doesn't presently need this
functionality. In the spirit of the "squeaky wheel", I'll squeak softly
for them.

From the above web page, thus quoth the RedHawk:

"In tightly-coupled symmetric multiprocessing systems such as
Concurrent's iHawk real-time systems, RedHawk Linux allows individual
CPUs to be shielded from interrupt processing, daemons, bottom halves,
and other Linux tasks. Processor shielding provides a highly
deterministic execution environment where interrupt response is
guaranteed. RedHawk implements shielding via the industry-accepted
shield(1) command."

Steven

2003-05-14 15:44:49

by Robert Love

Subject: Re: 2.6 must-fix list, v2

On Wed, 2003-05-14 at 07:02, Felipe Alfaro Solana wrote:

> > > > - Add ability to restrict the default CPU affinity mask so that
> > > > sys_setaffinity() can be used to implement exclusive access to a CPU.
> > >
> > > Why is this useful?
> >
> > I forgot to add that the result is the rough equivalent of Digital UNIX's psets
> > and Irix's sysmp, for my purposes at least.
>
> And psets and fencing in Solaris too...

You can get exclusive access with mangling the system call, simply by
having init bind itself to the non-exclusive processors on boot.

Try it. Every task will then end up on only the non-exclusive
processors. Seems a very simple change to me, and one that can be done
in user-space.

You do not even have to modify init, if you do not want. Grab
http://tech9.net/rml/schedutils and put a taskset call in your rc.d

Robert Love

2003-05-14 15:49:10

by Robert Love

Subject: Re: 2.6 must-fix list, v2

On Wed, 2003-05-14 at 11:59, Robert Love wrote:

> You can get exclusive access with mangling the system call, simply by
> having init bind itself to the non-exclusive processors on boot.

Whoops. s/with/without/

Sorry,

Robert Love

2003-05-14 16:09:14

by Tom Rini

Subject: Re: 2.6 must-fix list, v2

On Tue, May 13, 2003 at 01:17:54PM -0700, Andrew Morton wrote:
> Christoph Hellwig <[email protected]> wrote:
> >
> > That brings up another issue: what ports do regularly work with 2.5
> > mainline?
>
> I test ppc64 regularly. In fact -mm is probably the best place to go to
> get a working ppc64 tree at present.
>
> But I do not view non-ia32 support as being a 2.6.0 requirement. I'd be OK
> with 2.6.0 working _only_ on ia32. Other architectures will catch up when
> they can. The only core requirement is that 2.6.0 not contain gross
> x86isms which make other ports impossible.

How about some holding point shortly before to ping arch maintainers?
I'm sure a number of arches will be at 'current bk works, but Linus
keeps dropping my emails' stage.

--
Tom Rini
http://gate.crashing.org/~trini/

2003-05-14 18:09:13

by Perez-Gonzalez, Inaky

Subject: RE: 2.6 must-fix list, v2

-----Original Message-----
> From: Shaheed R. Haque [mailto:[email protected]]
>
> Can I be certain that there is no shared lock or anything else in the whole
> kernel? No, but I'm prepared to make that probabilistic tradeoff (backed via
> extensive testing) rather than have to go to hard-realtime.

And of course, when you pin one of those down, you try to "fix it",
so as to improve the time response of the kernel ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

2003-05-14 20:51:56

by Shaheed

Subject: Re: 2.6 must-fix list, v2

On Wednesday 14 May 2003 4:59 pm, Robert Love wrote:

> You can get exclusive access with mangling the system call, simply by
> having init bind itself to the non-exclusive processors on boot.
>
> Try it. Every task will then end up on only the non-exclusive
> processors. Seems a very simple change to me, and one that can be done
> in user-space.
>
> You do not even have to modify init, if you do not want. Grab
> http://tech9.net/rml/schedutils and put a taskset call in your rc.d

Ah. I think I misread your previous note to me on this...that's why my patch
modifies init itself (it does not muck with the syscall in any way). I'll try
this as soon as I have my 2.5 multiprocessor back. BTW: what are the plans
for getting schedutils (and specifically taskset) into a normal 2.6-based
distribution? Can I be reasonably sure that this will happen?

2003-05-14 21:00:06

by Robert Love

Subject: Re: 2.6 must-fix list, v2

On Wed, 2003-05-14 at 17:01, shaheed wrote:

> Ah. I think I misread your previous note to me on this...that's why my patch
> modifies init itself (it does not muck with the syscall in any way). I'll try
> this as soon as I have my 2.5 multiprocessor back. BTW: what are the plans
> for getting schedutils (and specifically taskset) into a normal 2.6-based
> distribution? Can I be reasonably sure that this will happen?

I think it is in Debian (unstable at least).

More important to me is getting it into Red Hat and SuSE. I have heard
encouraging words from Matt Wilson at Red Hat about schedutils possibly
going into Rawhide soon. It would not hurt to let Red Hat/SuSE/whoever
know that schedutils is something their customers want.

Both Red Hat and SuSE's kernels have the CPU affinity system calls
merged, so you do not need to wait until 2.6 is out to use them.

Robert Love

2003-05-15 09:07:38

by Shaheed

Subject: Re: 2.6 must-fix list, v2


Quoting Robert Love <[email protected]>:

> More important to me is getting it into Red Hat and SuSE. I have heard
> encouraging words from Matt Wilson at Red Hat about schedutils possibly
> going into Rawhide soon. It would not hurt to let Red Hat/SuSE/whoever
> know that schedutils is something their customers want.
>
> Both Red Hat and SuSE's kernels have the CPU affinity system calls
> merged, so you do not need to wait until 2.6 is out to use them.

These are the distros I am interested in too. I knew it was in RH AS/ES, but
are you saying it is in RH9.0? That would be good news.

On the technical point, I tried out taskset in rc.sysinit, and as you said, it
works just fine. On reflection, I feel that editing rc.sysinit is not the right
answer given the confidence/competence level of our customers' typical
sysadmins: but I can see that a carefully crafted rc5.d/S00aaaaa script could
set the affinity of the executing shell, and its parent(s) up to init, to fix all
subsequent rcN.d children in the desired manner.

I do suspect that other commercial users will also baulk at editing rc.sysinit,
and so have to brew the same rcN.d solution. Now, the rcN.d script hackery
would be greatly simplified if taskset had a mode of "set the affinity of the
identified process, and all its parent processes up to init". Would you accept a
patch to taskset along those lines?
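
To make the "process and all its parents up to init" idea concrete, here is a
minimal Python sketch of walking the parent chain through /proc. It is not
taskset code; parse_ppid, read_ppid and ancestors are names invented for this
illustration, and the lookup function is passed in so the walk can be tried
against a fake process table.

```python
def parse_ppid(stat_line):
    """Extract the parent PID from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' before reading the other fields.
    fields = stat_line.rpartition(")")[2].split()
    return int(fields[1])  # after the name come: state, ppid, ...


def read_ppid(pid):
    """Look up the parent of pid on a live Linux system."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_ppid(stat_file.read())


def ancestors(pid, get_ppid=read_ppid):
    """Return [pid, parent, grandparent, ...], ending at PID 1 if reachable."""
    chain = [pid]
    while chain[-1] > 1:
        parent = get_ppid(chain[-1])
        if parent <= 0:  # no visible parent, e.g. inside a PID namespace
            break
        chain.append(parent)
    return chain


# Fake table standing in for /proc: an rcN.d script, rc.sysinit's shell, init.
fake = {4242: 310, 310: 1}
print(ancestors(4242, fake.get))  # [4242, 310, 1]
```

Each PID in the returned list would then have its affinity set in turn. Note
that the chain is only a snapshot: processes can fork or exit between reading
it and applying the mask.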

I think that would be a very acceptable, easy to deploy, solution.

Thoughts? Shaheed


2003-05-15 15:17:35

by Robert Love

Subject: Re: 2.6 must-fix list, v2

On Thu, 2003-05-15 at 05:19, Shaheed R. Haque wrote:

> These are the distros I am interested in too. I knew it was in RH AS/ES, but
> are you saying it is in RH9.0? That would be good news.

No, I am saying with luck it will be in the next RH release.

> On the technical point, I tried out taskset in rc.sysinit, and as you said, it
> works just fine.

Good :)

> On reflection, I feel that editing rc.sysinit is not the right
> answer given the confidence/competence level of our customers' typical
> sysadmins: but I can see that a carefully crafted rc5.d/S00aaaaa script could
> set the affinity of the executing shell, and its parent(s) up to init, to fix all
> subsequent rcN.d children in the desired manner.
>
> I do suspect that other commercial users will also baulk at editing rc.sysinit,
> and so have to brew the same rcN.d solution. Now, the rcN.d script hackery
> would be greatly simplified if taskset had a mode of "set the affinity of the
> identified process, and all its parent processes up to init". Would you accept a
> patch to taskset along those lines?

It is racy to do this, so it's something that should remain a hack and
not part of taskset, I think.

If you do it in rc.d, you don't need to set all the parents. rc.d is the
first thing run, so if you do it at the top of the script, nothing else
is running. Just put:

taskset <mask> 1
taskset <mask> $$

at the top of rc.d.
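
The two taskset lines work because a process created with fork() inherits its
parent's CPU affinity. A minimal Python sketch of that behaviour follows,
assuming a Linux system where os.sched_setaffinity is available; the function
name is invented for the illustration, and it restores the original affinity
afterwards so that it is safe to try.

```python
import os


def child_affinity_after_restricting(cpus):
    """Bind this process to cpus, fork, and return the child's affinity set."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)  # roughly what "taskset <mask> $$" does
    try:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the affinity it started with, then exit at once.
            os.close(read_fd)
            report = ",".join(str(c) for c in sorted(os.sched_getaffinity(0)))
            os.write(write_fd, report.encode())
            os._exit(0)
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            report = pipe.read()
        os.waitpid(pid, 0)
        return {int(c) for c in report.split(",")}
    finally:
        os.sched_setaffinity(0, original)  # undo the restriction


if hasattr(os, "sched_setaffinity"):
    first_cpu = {min(os.sched_getaffinity(0))}
    print(child_affinity_after_restricting(first_cpu) == first_cpu)
```

Only children forked after the call pick up the new mask, which is why the
commands belong at the very top of the script.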

Another consideration is modifying init (and hopefully having said
changes merged back). Init could call sched_setaffinity() when it is
first created, based on a setting in /etc/inittab or a command line
parameter passed during boot.

My reservation is against doing it in the kernel. I do not particularly
care _how_ it's done in user-space.

Robert Love

2003-05-15 19:57:09

by Shaheed

Subject: Re: 2.6 must-fix list, v2

On Thursday 15 May 2003 4:32 pm, Robert Love wrote:

> It is racy to do this, so it's something that should remain a hack and
> not part of taskset, I think.

Hmm. I guess you are thinking of daemons?

> If you do it in rc.d, you don't need to set all the parents. rc.d is the
> first thing run, so if you do it at the top of the script, nothing else
> is running. Just put:
>
> taskset <mask> 1
> taskset <mask> $$
>
> at the top of rc.d.

Perhaps we are talking at cross purposes. As I understand it the calling chain
is:

1. kernel bootstrap
2. /sbin/init
3. bash to run /etc/rc.sysinit
4. bash to run individual /etc/rcN.d/whatever

I feel wary of doing it in 3 as you seem to suggest because I am pretty sure
this will intimidate my customers. I am happy to do it in 4 - I can avoid the
races by only doing it for the distros I care about.

That leaves options 1 and 2 for a community-wide solution. I guess I haven't
quite understood the reluctance to do it in 1 given that:

- we know who owns 1 (!!)

- AFAICS, it isn't conceptual bloat because the utility of the implementation
of sys_setaffinity() in 1 is greatly limited by not including this feature.

- it's hardly physical bloat because the number of bytes required to implement
this is absolutely in the noise, and virtually all __init()ed away.

> Another consideration is modifying init (and hopefully having said
> changes merged back). Init could call sched_setaffinity() when it is
> first created, based on a setting in /etc/inittab or a command line
> parameter passed during boot.

I have no idea with whom to pursue this path, and as I say, I feel that
solving this once for each distro is crazy IMHO.

> My reservation is against doing it in the kernel. I do not particularly
> care _how_ it's done in user-space.

I'm sorry to appear foolish, but as explained above, I genuinely don't
understand why this does not belong in the kernel. I would be grateful for
elaboration. If I really am being thick, then just ignore me and I'll just
solve this for myself using route 4.

In any case, thanks for all the patience and kind suggestions so far - it is
appreciated.

Regards, Shaheed

2003-05-15 20:10:11

by Robert Love

Subject: Re: 2.6 must-fix list, v2

On Thu, 2003-05-15 at 16:07, shaheed wrote:

> I'm sorry to appear foolish, but as explained above, I genuinely don't
> understand why this does not belong in the kernel. I would be grateful for
> elaboration. If I really am being thick, then just ignore me and I'll just
> solve this for myself using route 4.

Oh, one other problem with doing it in the kernel via INIT_TASK:

You end up affining any kernel threads, which you absolutely do not want
to do _implicitly_. Maybe explicitly, but certainly not implicitly as a
blind consequence.

Doing it via init is really the way to go.

Regards,

Robert Love

2003-05-15 20:05:29

by Robert Love

Subject: Re: 2.6 must-fix list, v2

On Thu, 2003-05-15 at 16:07, shaheed wrote:

> I have no idea with whom to pursue this path, and as I say, I feel that
> solving this once for each distro is crazy IMHO.

It does not have to be done for each distribution. Modify the SysVinit
package directly to support this feature. All init needs to do is bind
itself to the allowed processors prior to its first fork(). This can be
done as part of the core init package, and thus all distributions
automatically reap the benefits.

If init is not modified, then it can be done in rc.d or wherever by
hand.

> I'm sorry to appear foolish, but as explained above, I genuinely don't
> understand why this does not belong in the kernel. I would be grateful for
> elaboration. If I really am being thick, then just ignore me and I'll just
> solve this for myself using route 4.

Things which can be done in user-space should be done in user-space.
There is absolutely zero reason to do this in the kernel. init can do
it.

Submit a patch to the init maintainer to have it bind itself on boot to
a given command line value. Maybe I will do this if I find the time...
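
What "bind itself on boot to a given command line value" might look like can
be sketched in Python as below. Both the init_affinity= parameter name and the
parse_affinity_param helper are hypothetical, invented for this illustration;
no such option is known to exist in SysVinit or the kernel.

```python
def parse_affinity_param(cmdline, key="init_affinity"):
    """Return the set of CPUs named by key=<hex mask>, or None if absent."""
    for word in cmdline.split():
        name, sep, value = word.partition("=")
        if sep and name == key:
            mask = int(value, 16)  # accepts "e" as well as "0xe"
            return {cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)}
    return None


# Example boot command line carrying the hypothetical parameter.
cmdline = "ro root=/dev/hda1 init_affinity=0x1"
print(parse_affinity_param(cmdline))  # {0}
```

An init written this way would read the boot command line (for example from
/proc/cmdline), and if the parameter is present and non-empty, bind itself
with a single affinity call before its first fork().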

Robert Love

2003-05-15 21:20:13

by Shaheed

Subject: Re: 2.6 must-fix list, v2

On Thursday 15 May 2003 9:24 pm, Robert Love wrote:

> Oh, one other problem with doing it in the kernel via INIT_TASK:
>
> You end up affining any kernel threads, which you absolutely do not want
> to do _implicitly_. Maybe explicitly, but certainly not implicitly as a
> blind consequence.
>
> Doing it via init is really the way to go.

OK, that does make sense.

> Submit a patch to the init maintainer to have it bind itself on boot to
> a given command line value. Maybe I will do this if I find the time...

I will have a go myself too...

[srhaque@chiswick srhaque]$ rpm -q --whatprovides /sbin/init
SysVinit-2.84-2mdk

and rpmfind.net points to ftp://ftp.cistron.nl/pub/people/miquels/software.
I'll drop you a line if I make progress.

Thanks, Shaheed

2003-05-17 20:57:18

by Pavel Machek

Subject: Re: 2.6 must-fix list, v2

Hi!

> > - ACPI needs the relax patches merging to work on lots of laptops
>
> Working in 2.4.21-ac, Toshiba cheap laptops now run a treat. Forward
> port looks like a patch command

Well, this looks easy to patch, but it will be
hard to convince Andy to take it...

Perhaps such workarounds could be surrounded
by CONFIG_BROKEN_HW (default to yes).
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...

2003-05-17 20:58:21

by Pavel Machek

Subject: Re: 2.6 must-fix list, v2

Hi!

> That brings up another issue: what ports do regularly work with 2.5
> mainline? I've been working with David to get all those core changes ia64
> needs (and there's still a lot) sorted out so maybe 2.6 will work out of
> the box. I guess some other arches (parisc, mips?) will need similar
> work.

x86-64 does usually work out of the box or after
a few lines of fixes.
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...