The most fundamental part of this has already been discussed a lot and
posted to the kernel list, including a lot of fixes (all hopefully
integrated). That's obviously the removal of the global irq lock. In the
short term (famous last words) that breaks a number of SMP configurations,
but fixing them should not be horribly hard.
A lot of other stuff here too - the regular USB updates, fbdev updates,
m68k and ppc64 updates, IDE fix, and a sync-up with Al. Serial lawyer all
shook up (the irq lock kind of forced that one, but it's certainly been
pending long enough..)
Go wild,
Linus
---
Summary of changes from v2.5.27 to v2.5.28
============================================
<[email protected]>:
o IDE-101
<[email protected]>:
o drivers/usb/misc/tiglusb.c v1.04
<[email protected]>:
o Added help for the Toshiba and Permedia3 framebuffer devices. Small
fixes for the ATI 128 card and the logo drawing code in fbcon.c.
Proper handling of data for pci handling
o Added VBI support to VESA. ATY 128 compiles now :-)
o Removed all old fbgen code. Small cleanups
o Synced up to m68k changes
<[email protected]>:
o consolidate task->mm code + fix
<[email protected]>:
o USB: rtl8150 updated
Alexander Viro <[email protected]>:
o make hfs use regular semaphores
o Use wipe_partitions() where appropriate
o partition parsing cleanup
o block device size cleanups
o partition handling locking cleanups
o blk_ioctl() not exported anymore
o paride cleanup and fixes
o SCSI ->bios_param() switched to struct block_device *
o removal of dead prototypes
o jffs kdev_t cleanups
o fix for nfs_unlink and vfs_unlink
o Fix dcache deadlock introduced by previous fix
Andrew Morton <[email protected]>:
o disable highpte in rmap kernels
o page-writeback.c compile warning fix
Anton Blanchard <[email protected]>:
o ppc64: enable eeh on non-LPAR
o ppc64: copy_user_page and clear_user_page now take a page *
o ppc64: updates for mmu gather code
o ppc64: Use non context synchronising mtmsrd. Cleanup init.c
o ppc64: 64 and 32 bit signal cleanups from Stephen Rothwell
o ppc64: Implement copy_siginfo_to_user32 from Stephen Rothwell
o ppc64: Remove POWER4 special case for cache_decay_ticks
o ppc64: Initial DISCONTIGMEM and NUMA support
o ppc64: Only use irq balancing on openpic for the moment
o ppc64: update ppc64 tlb batch code
o ppc64: Add fls
o ppc64: POWER4 lazy icache flushing
o ppc64: add missing gcc barrier in softirq code
o ppc64: export timebase frequency in /proc/cpuinfo
o pSeries HVC console: Fix hang up race - from Dave Engebretsen
o pSeries HVC console: Add SYSRQ and handle errors better from Dave
Engebretsen
o pSeries firmware flash support from Todd Inglett
o ppc64: iSeries updates
o ppc64: increase IRQ_KMALLOC_ENTRIES
o ppc64: ptrace cleanup from Stephen Rothwell
o ppc64: ptrace32 fix when tracing 64bit tasks from Will Schmidt
o ppc64: config.h resync and remove some stale code
o ppc64: turn off STRICT_MM_TYPECHECKS
o pSeries HVC console: fix hvc_hangup definition
o ppc64: remove __openfirmware and __chrp
o ppc64: include/asm/md.h is not used any more
o ppc64: remove some stale code
o ppc64: updates for 2.5.21
o ppc64: Makefile updates
o ppc64: non linear cpu support
o ppc64: define new scheduler hooks
o ppc64: update for recent 32/64bit binutils
o ppc64: Fix warnings
o ppc64: Fix for 32 bit ELF timeval handling, from sparc64
o ppc64: define SMP_CACHE_BYTES and cleanup HZ handling
o ppc64: use symbolic names for fault types
o ppc64: _switch_to -> __switch_to
o ppc64: defconfig update
o ppc64: exception should be 0x480 for instruction SLB miss - jimix
o ppc64: misc cleanups
o ppc64: __clear_user should return number of bytes not copied
o ppc64: iSeries update - from 2.4
o ppc64: add comment and missing include
o ppc64: Add northstar CPU
o ppc64: Add winnipeg support
o ppc64: UP fix for irq affinity
o ppc64: add POWER4+ (GQ) support
o ppc64: add rmap.h
o ppc64: workaround for gcc 3.1, otherwise we busy loop in
pte_chain_lock()
o ppc64: fix test_bit and remove workaround in cpu_relax
o ppc64: big IRQ lock removal
o ppc64: Fix for spurious interrupts in LPAR without ISA
o ppc64: merge some 2.4 fixes
o ppc64: missed during last merge
o ppc64: Designated initializers from Rusty
o ppc64: add Config.help
o ppc64: Optimise for 630 by default
o ppc64: put paca in r13 and fix non zero boot cpu
o flags must be unsigned long
o Make tlb_remove_tlb_entry take ptep
o Fix token ring compile
Christopher Hoover <[email protected]>:
o for ohci on SA-1111
o set_device_description oops fixage mk2
Dave Kleikamp <[email protected]>:
o Clean up Documentation/filesystems/jfs.txt
o JFS: Use cond_resched()
o JFS: cosmetic syncup with 2.4 code
o JFS: Replace deprecated initializer syntax with C99 style
David Brownell <[email protected]>:
o usb_set_interface() doc
o hid_ff_init could not find initializer
Geert Uytterhoeven <[email protected]>:
o M68k update (1-49)
Greg Kroah-Hartman <[email protected]>:
o PCI Hotplug: fix i_nlink for root inode in pcihpfs
o PCI Hotplug: fix the dbg() macro to work properly on older versions
of gcc
o USB pl2303: new device support added
o USB: rio500.c bugfix
o USB: usb-serial.h cleanups
o USB: changed the interface name to be a bit more unique
Hugh Dickins <[email protected]>:
o shm_destroy lock hang
o shmem_link duplicated test
o shmem_file_write double kunmap
o shmem_getpage_locked missing unlock
Ingo Molnar <[email protected]>:
o "big IRQ lock" removal, IRQ cleanups
o "big IRQ lock" removal docs
o Re: [patch] cli()/sti() cleanup
o irqlock patch 2.5.27-H6
o scheduler fixes
James Simmons <[email protected]>:
o Removal of nonexistent iplan16 support. Compile fix for aty128fb
driver. Proper handling of PCI private data for fbdev drivers
o Removed old FB_COMPAT_XPMAC stuff. Ported over the Riva framebuffer
driver over to the new api. Updated the Voodoo 1 driver
o Added Help info for Permedia 3 and Toshiba TX3912 graphics card
support
o Updated Voodoo 1 documentation
o Finished the NVIDIA driver port to the new api. Killed a strtok in
sstfb
o Added VBI support to VESA
o Supports more NVIDIA cards
o NVIDIA fixes to handle HSYNC and VSYNC flags. Set the registers to
read the image data as big endian. Handle the different smem_len
for different kinds of cards
o Cleanups for the 3Dfx driver
o Final touches to the new mac framebuffer driver. Started the port
of the ATI 128 driver to the new api. A few small optimizations and
a bug fix for SUN 12x22 fonts with the new accel code
o M68K updates for their framebuffer devices
o More changes to port over the ATI 128 driver to new api.
Optimizations for fbgen and small bug fix for gen_update_var
o Ported over ATI 128 Rage driver to new api. A few config mistakes
were fixed
o Ported SA1100 framebuffer over to new fbdev api
o Fixed bug for large logos. Also had to make a patch to handle X
server reversing the image order programming versus how the riva
fbdev driver does it
o Removed fbcon-vga.c and the old fbgen code. Ported over the SgiVW,
OpenFirmware fbdev driver and started the Mach64 fbdev driver to
the new api. A few simple typos as well
o Port step some changes at authors request
o Reversed some more changes
o Permedia 2 support on PowerPC platform
o Updates to SIS framebuffer driver
o Porting Mach 64 drive over to new api
Jan Harkes <[email protected]>:
o uhci-hcd suspend fix
Jens Axboe <[email protected]>:
o add __blk_stop_queue() as locked variant of blk_stop_queue() and
make cpqarray and cciss use these
[email protected] <[email protected]>:
o SCSI tape driver fixes for 2.5.27
Linus Torvalds <[email protected]>:
o Fix incoherent LDT at mmap exit
o Update ensoniq sound driver to new irq serialization
o Remove extraneous dget/dput pair in vfs_unlink() that confused the
NFS client code wrt the exclusiveness of a dentry getting removed.
o Remove unused variable
o Fix up irqlock removal patch, avoid compiler warnings
o Fixups for previous changesets, avoid warnings etc
Neil Brown <[email protected]>:
o type safe(r) list_entry replacement: container_of
o MD - Fix two bugs that would cause sync_sbs to Oops
o MD - Convert struct initialised in md to "the new way"
o MD - Remove get_spare declaration and associated warning
o NFSD - new struct initialisers for nfsd
Richard Gooch <[email protected]>:
o Switched to ISO C structure field initialisers
Rik van Riel <[email protected]>:
o urgent rmap bugfix
Robert Love <[email protected]>:
o Re: "big IRQ lock" removal docs
Russell King <[email protected]>:
o Serial driver stuff
o [SERIAL] Rename files to remove serial_ prefix
o [SERIAL] Fix up various filenames, etc, from Ingo's merge of serial
o [SERIAL] Fix another SMP deadlock with modem status signal changes
The original fix sent to Ingo for stop_tx didn't take account that
the start_tx and stop_tx methods can be called from the device
specific code under the port spinlock. Consequently, we move the
spinlock to the callers of these methods. Documentation updated to
reflect the change.
o [SERIAL] Fix deadlock in __uart_start introduced in previous cset
Thanks to Zwane Mwaikambo for finding this.
o [SERIAL] Fix sa1100 serial driver stop function parameters
Rusty Russell <[email protected]>:
o AGP designated initializer update
o drivers/hotplug designated initializers
Trond Myklebust <[email protected]>:
o 2.5.27 fix potential spinlocking race
Error building 2.5.28:
ld -m elf_i386 -T arch/i386/vmlinux.lds -e stext
arch/i386/kernel/head.o arch/i386/kernel/init_task.o init/init.o
--start-group arch/i386/kernel/kernel.o arch/i386/mm/mm.o
kernel/kernel.o mm/mm.o fs/fs.o ipc/ipc.o security/built-in.o
/usr/src/linux-2.5.28/arch/i386/lib/lib.a lib/lib.a
/usr/src/linux-2.5.28/arch/i386/lib/lib.a drivers/built-in.o
sound/sound.o arch/i386/pci/pci.o net/network.o --end-group -o vmlinux
drivers/built-in.o: In function `put_cmd640_reg_pci1':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:223: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:224: undefined reference to
`cli'
drivers/built-in.o: In function `get_cmd640_reg_pci1':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:235: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:236: undefined reference to
`cli'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:239: undefined reference to
`restore_flags'
drivers/built-in.o: In function `put_cmd640_reg_pci2':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:249: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:250: undefined reference to
`cli'
drivers/built-in.o: In function `get_cmd640_reg_pci2':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:262: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:263: undefined reference to
`cli'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:267: undefined reference to
`restore_flags'
drivers/built-in.o: In function `put_cmd640_reg_vlb':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:277: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:278: undefined reference to
`cli'
drivers/built-in.o: In function `get_cmd640_reg_vlb':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:289: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:290: undefined reference to
`cli'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:293: undefined reference to
`restore_flags'
drivers/built-in.o: In function `put_cmd640_reg_pci1':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:227: undefined reference to
`restore_flags'
drivers/built-in.o: In function `put_cmd640_reg_pci2':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:254: undefined reference to
`restore_flags'
drivers/built-in.o: In function `put_cmd640_reg_vlb':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:281: undefined reference to
`restore_flags'
drivers/built-in.o: In function `secondary_port_responding':
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:370: undefined reference to
`save_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:371: undefined reference to
`cli'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:379: undefined reference to
`restore_flags'
/usr/src/linux-2.5.28/drivers/ide/cmd640.c:383: undefined reference to
`restore_flags'
make: *** [vmlinux] Error 1
On Wed, 2002-07-24 at 16:46, Paul Larson wrote:
> Error building 2.5.28:
Forgot to mention this is an SMP box. Without CONFIG_SMP it works fine.
-Paul Larson
On Wed, 2002-07-24 at 14:57, Paul Larson wrote:
> On Wed, 2002-07-24 at 16:46, Paul Larson wrote:
> > Error building 2.5.28:
>
> Forgot to mention this is an SMP box. Without CONFIG_SMP it works fine.
This is known... there is a handful of drivers still using the old
global IRQ methods that need a spring cleaning.
On UP, we "cheat" and define the global methods to the local ones and
everything is happy. On SMP you are out of luck until the code is
fixed.
Robert Love
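The UP "cheat" described above can be pictured as a set of macro aliases.
What follows is only a userspace model, not the kernel header: irq_enabled
is a hypothetical stand-in for the CPU interrupt flag, and the exact macro
bodies in the real tree may differ. With CONFIG_SMP undefined the old global
names forward to the local ones; with it defined this model provides nothing
for them, which matches the undefined references in the build log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the interrupt-enable flag of the one CPU. */
static bool irq_enabled = true;

/* Local (per-CPU) primitives, modelled on the names used in this thread. */
#define local_save_flags(x)   ((x) = irq_enabled ? 1UL : 0UL)
#define local_irq_disable()   (irq_enabled = false)
#define local_irq_enable()    (irq_enabled = true)
#define local_irq_restore(x)  (irq_enabled = ((x) != 0))

#ifndef CONFIG_SMP
/* UP: the old "global" methods are just aliases for the local ones. */
#define save_flags(x)     local_save_flags(x)
#define cli()             local_irq_disable()
#define sti()             local_irq_enable()
#define restore_flags(x)  local_irq_restore(x)
#endif
/* SMP: no definitions here, so old-style callers no longer build. */

/* Old-style driver code, shaped like the cmd640 register helpers. */
static int old_style_critical_section(void)
{
    unsigned long flags;
    int irqs_were_off_inside;

    save_flags(flags);
    cli();
    assert(!irq_enabled);                 /* device access would go here */
    irqs_were_off_inside = !irq_enabled;
    restore_flags(flags);
    return irqs_were_off_inside;
}
```

Building the same file with -DCONFIG_SMP removes the aliases, and the
helper then fails to compile or link much like the driver in the report.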
On Wed, 2002-07-24 at 16:46, Paul Larson wrote:
>> Error building 2.5.28:
On Wed, Jul 24, 2002 at 04:57:35PM -0500, Paul Larson wrote:
> Forgot to mention this is an SMP box. Without CONFIG_SMP it works fine.
> -Paul Larson
Which drivers?
Cheers,
Bill
On Wed, 2002-07-24 at 15:14, William Lee Irwin III wrote:
> On Wed, 2002-07-24 at 16:46, Paul Larson wrote:
> >> Error building 2.5.28:
>
> On Wed, Jul 24, 2002 at 04:57:35PM -0500, Paul Larson wrote:
> > Forgot to mention this is an SMP box. Without CONFIG_SMP it works fine.
> > -Paul Larson
>
> Which drivers?
He reported the cmd640 driver... but we really do not need 100 "it does
not compile reports" on lkml. Just grep your tree for the old global
IRQ methods... also, Ingo wrote a nice document in Documentation/ ..
Robert Love
On Wed, 2002-07-24 at 17:14, William Lee Irwin III wrote:
> On Wed, 2002-07-24 at 16:46, Paul Larson wrote:
> >> Error building 2.5.28:
>
> On Wed, Jul 24, 2002 at 04:57:35PM -0500, Paul Larson wrote:
> > Forgot to mention this is an SMP box. Without CONFIG_SMP it works fine.
> > -Paul Larson
>
> Which drivers?
CMD640 for certain, but according to rml there are some others that need
fixing as well.
-Paul Larson
On 24 Jul 2002, Paul Larson wrote:
>
> Error building 2.5.28:
Yes, a number of drivers won't compile on SMP because they want the global
lock (that is gone). This includes the cmd640 IDE driver, and the parallel
port driver (and a largish number of others too, but I think those two are
the really common ones).
It should all work the same way it always did on UP, and hopefully the
straggling drivers will be converted to use local spinlocks soon (and in
some cases the global irq lock might not even be needed at all)
Linus
On Wed, 2002-07-24 at 23:13, Linus Torvalds wrote:
> <[email protected]>:
> o IDE-101
What on earth is this??? I'm really surprised you accept this as a
changelog entry especially when considering that there's no further
information about the latest IDE changes on the mailinglist anymore...
--
Servus,
Daniel
On Wed, Jul 24, 2002 at 02:13:49PM -0700, Linus Torvalds wrote:
> Russell King <[email protected]>:
> o Serial driver stuff
I'd just like to clear up some confusion here.
Under this cset is a change that adds '-g' to the top-level make flags.
This cset, although it has my name and comments against it is actually
a patch that added the serial drivers and extra bits that Ingo sent to
Linus (and Linus kindly credited it back to me, and took the comments
from one of my patch submissions for the serial drivers.)
The addition of '-g' is not, and has never been in my serial patches;
please do not direct comments about the '-g' to me. Any problems
relating to the serial changes should be forwarded to me.
Thanks.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Wed, 2002-07-24 at 17:22, Robert Love wrote:
> He reported the cmd640 driver... but we really do not need 100 "it does
> not compile reports" on lkml. Just grep your tree for the old global
It was not my intention to provide something unimportant to lkml, but
the same config worked on the same machine in linux-2.5.27 and then was
not working on 2.5.28. Since it had only recently broken, it's more
likely that whatever broke it will still be fresh on somebody's mind and
I was hoping that would make it easier for someone to fix.
On 25 Jul 2002, Daniel Egger wrote:
>
> What on earth is this??? I'm really surprised you accept this as a
> changelog entry especially when considering that there's no further
> information about the latest IDE changes on the mailinglist anymore...
Actually, that patch _was_ on the mailing list, with lots of discussion.
And since none of the discussion was civil, it didn't get a changelog. But
you can search it out yourself.
Linus
In article <[email protected]>,
Russell King <[email protected]> wrote:
>
>I'd just like to clear up some confusion here.
>
>Under this cset is a change that adds '-g' to the top-level make flags.
Oops, you're right. I'll undo that, and sorry for the attribution mess.
Linus
> > <[email protected]>:
> > o IDE-101
>
> What on earth is this??? I'm really surprised you accept this as a
> changelog entry especially when considering that there's no further
> information about the latest IDE changes on the mailinglist anymore...
You need to look at the full changelog to see the full entry: see, for
example: http://lwn.net/Articles/5577/. Or, to save the wear on your web
browser:
<[email protected]>
[PATCH] IDE-101
Here is a quick fix. I would like to synchronize with the irq handler
changes as well. Because right now I know that preemption is killing
the disk subsystem when moving data between disks using different
request queues... In esp. It gets me in to do_request() with a queue
in unplugged state. (Not everything is my fault, after all :-).
This does still leave open the question of why these patches no longer go
to linux-kernel, though...
jon
Jonathan Corbet
Executive editor, LWN.net
[email protected]
On Wed, Jul 24, 2002 at 02:13:49PM -0700, Linus Torvalds wrote:
> ... Serial lawyer all shook up (the irq lock kind of forced that one,
^^^^^^^
> but it's certainly been pending long enough..)
Shake `em harder, let's see what falls out. 8-)
Dave.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
On Thu, 25 Jul 2002, Dave Jones wrote:
> On Wed, Jul 24, 2002 at 02:13:49PM -0700, Linus Torvalds wrote:
>
> > ... Serial lawyer all shook up (the irq lock kind of forced that one,
> ^^^^^^^
> > but it's certainly been pending long enough..)
>
> Shake `em harder, lets see what falls out. 8-)
Yeah yeah, rub it in.
Linus "dyslexic" Torvalds
Paul Larson wrote:
> On Wed, 2002-07-24 at 17:14, William Lee Irwin III wrote:
>
>>On Wed, 2002-07-24 at 16:46, Paul Larson wrote:
>>
>>>>Error building 2.5.28:
>>>
>>On Wed, Jul 24, 2002 at 04:57:35PM -0500, Paul Larson wrote:
>>
>>>Forgot to mention this is an SMP box. Without CONFIG_SMP it works fine.
>>>-Paul Larson
>>
>>Which drivers?
>
> CMD640 for certain, but according to rml there are some others that need
> fixing as well.
But, on my UP:
if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.5.28; fi
depmod: *** Unresolved symbols in /lib/modules/2.5.28/kernel/net/irda/irda.o
depmod: cli
depmod: restore_flags
depmod: save_flags
(and no, CONFIG_SMP is not set :)
Ciao,
--alessandro
"my actions make me beautiful / and dignify the flesh"
(R.E.M., "Falls to Climb")
Russell King wrote:
> [snip]
> Any problems relating to the serial changes should be forwarded to me.
I had to export these 2 funcs.
--- linux/drivers/serial/core.c~ Wed Jul 24 19:47:04 2002
+++ linux/drivers/serial/core.c Wed Jul 24 19:47:06 2002
@@ -2469,6 +2469,8 @@
EXPORT_SYMBOL(uart_unregister_driver);
EXPORT_SYMBOL(uart_register_port);
EXPORT_SYMBOL(uart_unregister_port);
+EXPORT_SYMBOL(uart_add_one_port);
+EXPORT_SYMBOL(uart_remove_one_port);
MODULE_DESCRIPTION("Serial driver core");
MODULE_LICENSE("GPL");
--
Skip
On Thu, 2002-07-25 at 00:52, Linus Torvalds wrote:
> Actually, that patch _was_ on the mailing list, with lots of discussion.
So IDE-101 equals to the small snippet of code pasted somewhere in the
evil flamewar?
> And since none of the discussion was civil, it didn't get a changelog. But
> you can search it out yourself.
Especially since the IDE code at the moment is not really something I
would trust unconditionally, a bit more commentary would be adequate IMHO,
even if it's just a: "Fixed race introduced in IDE-97, see flamewar"...
Perhaps not everyone wanting to use 2.5.x for some development is
eager to disassemble a patch to see whether it might be usable or
trash the partition even more badly (given that one has the knowledge
to judge for her-/himself).
--
Servus,
Daniel
Alessandro Suardi wrote :
> But, on my UP:
>
> if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.5.28; fi
> depmod: *** Unresolved symbols in /lib/modules/2.5.28/kernel/net/irda/irda.o
> depmod: cli
> depmod: restore_flags
> depmod: save_flags
>
> (and no, CONFIG_SMP is not set :)
>
>
> Ciao,
>
> --alessandro
IrDA is not going to get fixed soon. Over the time I've been
fixing the IrDA stack, I've slowly fixed some of the most dangerous
locking problems, but fixing the remaining code will involve some
serious re-work and is unfortunately not just about sprinkling a few
spinlocks here and there.
That's life...
Jean
Here's a small patch to get the i810_audio.c driver working again.
thanks,
greg k-h
diff -Nru a/sound/oss/i810_audio.c b/sound/oss/i810_audio.c
--- a/sound/oss/i810_audio.c Wed Jul 24 17:35:31 2002
+++ b/sound/oss/i810_audio.c Wed Jul 24 17:35:32 2002
@@ -1733,7 +1733,7 @@
}
spin_unlock_irqrestore(&state->card->lock, flags);
- synchronize_irq();
+ synchronize_irq(state->card->irq);
dmabuf->ready = 0;
dmabuf->swptr = dmabuf->hwptr = 0;
dmabuf->count = dmabuf->total_bytes = 0;
@@ -2814,15 +2814,15 @@
}
dmabuf->count = dmabuf->dmasize;
outb(31,card->iobase+dmabuf->write_channel->port+OFF_LVI);
- save_flags(flags);
- cli();
+ local_irq_save(flags);
+ local_irq_disable();
start_dac(state);
offset = i810_get_dma_addr(state, 0);
mdelay(50);
new_offset = i810_get_dma_addr(state, 0);
stop_dac(state);
outb(2,card->iobase+dmabuf->write_channel->port+OFF_CR);
- restore_flags(flags);
+ local_irq_restore(flags);
i = new_offset - offset;
#ifdef DEBUG
printk("i810_audio: %d bytes in 50 milliseconds\n", i);
On 25 Jul 2002, Daniel Egger wrote:
> On Thu, 2002-07-25 at 00:52, Linus Torvalds wrote:
>
> > Actually, that patch _was_ on the mailing list, with lots of discussion.
>
> So IDE-101 equals to the small snippet of code pasted somewhere in the
> evil flamewar?
Have you _looked_ at the full changelog? Apparently not.
The snippet was posted as part of the IDE-2.5.27 thread. Go look for it
yourself. In addition, I asked Martin to send it to me separately, to
verify that he hadn't had other issues too. The full changelog has that
email.
> > And since none of the discussion was civil, it didn't get a changelog. But
> > you can search it out yourself.
>
> Especially since the IDE code at the moment is not really something I
> would trust uncoditionally a bit more comentary would be adequate IMHO,
> even if it's just a: "Fixed race introduced in IDE-97, see flamewar"...
Most of the IDE stuff is FUD and misinformation. I've run every single
2.5.x kernel on an IDE system ("penguin.transmeta.com" has everything on
IDE), and the main reported 2.5.27 corruption was actually from my BK tree
apparently due to the IRQ handling changes.
> Perhaps not everyone wanting to use 2.5.x for some development is
> eager to disassemble a patch to see whether it might be usable or
> trash the partition even more badly (given that one has the knowledge
> to judge for her-/himself).
The thing I dislike is how people who apparently haven't even read the
discussions, and didn't bother to look up the full changelog feel that
they are perfectly fine to spread FUD and misinformation about the IDE
layer.
Do we have issues there? Yes. But there are actually _more_ problems with
people dissing the work than with the code itself.
Linus
On Wed, 24 Jul 2002, Greg KH wrote:
>
> Here's a small patch to get the i810_audio.c driver working again.
Hmm..
> @@ -2814,15 +2814,15 @@
> }
> dmabuf->count = dmabuf->dmasize;
> outb(31,card->iobase+dmabuf->write_channel->port+OFF_LVI);
> - save_flags(flags);
> - cli();
> + local_irq_save(flags);
> + local_irq_disable();
First off, "local_irq_save()" does both the save and the disable (the same
way "spin_lock_irqsave()" does); it's "local_save_flags()" that is
equivalent to the old plain save_flags. So this should just be
local_irq_save(flags);
However, I also wonder if the code doesn't want any SMP locking? Is it ok
to just make it a non-spinlock local interrupt disable, and if so, why?
Linus
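The distinction drawn above between saving the flags and saving plus
disabling can be shown with a small userspace model. This is a sketch under
stated assumptions: irq_enabled and the model_* functions are hypothetical
stand-ins for the real per-CPU primitives, which are macros operating on
hardware state rather than C functions on a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable flag. */
static bool irq_enabled = true;

/* Save the current state only; interrupts stay as they were.
 * This is the counterpart of the old plain save_flags(). */
static unsigned long model_local_save_flags(void)
{
    return irq_enabled ? 1UL : 0UL;
}

static void model_local_irq_disable(void)
{
    irq_enabled = false;
}

/* Save and disable in one step, so no separate disable call is needed. */
static unsigned long model_local_irq_save(void)
{
    unsigned long flags = model_local_save_flags();
    model_local_irq_disable();
    return flags;
}

/* Put back whatever state was saved, enabled or not. */
static void model_local_irq_restore(unsigned long flags)
{
    irq_enabled = (flags != 0);
}

/* The corrected sequence: one call on the way in, one on the way out. */
static bool timed_section(void)
{
    unsigned long flags = model_local_irq_save();
    bool off_inside = !irq_enabled;   /* the timed device access sits here */

    assert(off_inside);
    model_local_irq_restore(flags);
    return off_inside;
}
```

Because the restore puts back the saved state rather than unconditionally
enabling, the section also behaves correctly when entered with interrupts
already off.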
Hi!
On Wed, 24 Jul 2002, Linus Torvalds wrote:
> On 25 Jul 2002, Daniel Egger wrote:
> > On Thu, 2002-07-25 at 00:52, Linus Torvalds wrote:
> >
> > > Actually, that patch _was_ on the mailing list, with lots of discussion.
> >
> > So IDE-101 equals to the small snippet of code pasted somewhere in the
> > evil flamewar?
>
> Have you _looked_ at the full changelog? Apparently not.
>
> The snippet was posted as part of the IDE-2.5.27 thread. Go look for it
> yourself. In addition, I asked Martin to send it to me separately, to
> verify that he hadn't had other issues too. The full changelog has that
> email.
Yes, but the code actually is only this fix.
> > > And since none of the discussion was civil, it didn't get a changelog. But
> > > you can search it out yourself.
> >
> > Especially since the IDE code at the moment is not really something I
> > would trust uncoditionally a bit more comentary would be adequate IMHO,
> > even if it's just a: "Fixed race introduced in IDE-97, see flamewar"...
>
> Most of the IDE stuff is FUD and misinformation. I've run every single
> 2.5.x kernel on an IDE system ("penguin.transmeta.com" has everything on
> IDE), and the main reported 2.5.27 corruption was actually from my BK tree
> apparently due to the IRQ handling changes.
I wish it was misinformation.
> > Perhaps not everyone wanting to use 2.5.x for some development is
> > eager to disassemble a patch to see whether it might be usable or
> > trash the partition even more badly (given that one has the knowledge
> > to judge for her-/himself).
>
> The thing I dislike is how people who apparently haven't even read the
> discussions, and didn't bother to look up the full changelog feel that
> they are perfectly fine to spread FUD and misinformation about the IDE
> layer.
>
> Do we have issues there? Yes. But there are actually _more_ problems with
> people dissing the work than with the code itself.
No, I worked on this code really hard for some time.
Just got tired listening to flaky ideas and syncing against
indentation/whitespace changes.
The biggest problem with IDE is not code or dissing people,
but _the way_ of changes...
And the thing I dislike is how people who apparently haven't read
carefully every ide-clean patch, and didn't bother to look up the full
track of changes (2.4 -> 2.5) feel that they are perfectly fine to make
such statements ;-).
> Linus
Linus, please don't add me to .killmail for this mail yet :-).
It is really _the last_ mail on this subject.
Yep, I did too much of this dissing lately...
...and it was really not needed.
Regards
--
Bartlomiej
On Wed, Jul 24 2002, Jonathan Corbet wrote:
> > > <[email protected]>:
> > > o IDE-101
> >
> > What on earth is this??? I'm really surprised you accept this as a
> > changelog entry especially when considering that there's no further
> > information about the latest IDE changes on the mailinglist anymore...
>
> You need to look at the full changelog to see the full entry: see, for
> example: http://lwn.net/Articles/5577/. Or, to save the wear on your web
> browser:
>
> <[email protected]>
> [PATCH] IDE-101
>
> Here is a quick fix. I would like to synchronize with the irq handler
> changes as well. Becouse right now I know that preemption is killing
> the disk subsystem when moving data between disks using different
> request queues... In esp. It get's me in to do_request() with a queue
> in unplugged state. (Not everything is my fault, after all :-).
^^^^^^^^^
must be a typo, it would be a bug to enter do_request() with the queue
in a _plugged_ state, not vice versa.
--
Jens Axboe
On Wed, Jul 24, 2002 at 06:17:17PM -0700, Linus Torvalds wrote:
> > @@ -2814,15 +2814,15 @@
> > }
> > dmabuf->count = dmabuf->dmasize;
> > outb(31,card->iobase+dmabuf->write_channel->port+OFF_LVI);
> > - save_flags(flags);
> > - cli();
> > + local_irq_save(flags);
> > + local_irq_disable();
>
> First off, "local_irq_save()" does both the save and the disable (the same
> way "spin_lock_irqsave()" does), it's the "local_save_flags(") that is
> equivalent to the old plain save_flags. So this should just be
>
> local_irq_save(flags);
Ah, sorry, I didn't get that from cli-sti-removal.txt. Actually it
looks like cli-sti-removal.txt is a bit wrong, as there is no
local_irq_save_off() function. I'll send a patch for that next.
> However, I also wonder if the code doesn't want any SMP locking? Is it ok
> to just make it a non-spinlock local interrupt disable, and if so, why?
This is _only_ a guess, as I do not know this hardware or driver at all,
but this function is only called at device init time (from the PCI probe
function), so I don't think anything else is going on in the driver at
the same time to warrant SMP locking. It also looks like this is done
to determine the "clocking" of the device, so interrupts need to be off
for the CPU doing the determination.
But again, this is only a guess from glancing at the code for a very
short time, and I should probably start using the ALSA version of this
driver anyway :)
Here's a patch with the extra local_irq_disable() call removed.
thanks,
greg k-h
diff -Nru a/sound/oss/i810_audio.c b/sound/oss/i810_audio.c
--- a/sound/oss/i810_audio.c Wed Jul 24 23:08:21 2002
+++ b/sound/oss/i810_audio.c Wed Jul 24 23:08:21 2002
@@ -1733,7 +1733,7 @@
}
spin_unlock_irqrestore(&state->card->lock, flags);
- synchronize_irq();
+ synchronize_irq(state->card->irq);
dmabuf->ready = 0;
dmabuf->swptr = dmabuf->hwptr = 0;
dmabuf->count = dmabuf->total_bytes = 0;
@@ -2814,15 +2814,14 @@
}
dmabuf->count = dmabuf->dmasize;
outb(31,card->iobase+dmabuf->write_channel->port+OFF_LVI);
- save_flags(flags);
- cli();
+ local_irq_save(flags);
start_dac(state);
offset = i810_get_dma_addr(state, 0);
mdelay(50);
new_offset = i810_get_dma_addr(state, 0);
stop_dac(state);
outb(2,card->iobase+dmabuf->write_channel->port+OFF_CR);
- restore_flags(flags);
+ local_irq_restore(flags);
i = new_offset - offset;
#ifdef DEBUG
printk("i810_audio: %d bytes in 50 milliseconds\n", i);
In article <[email protected]>,
Jean Tourrilhes <[email protected]> wrote:
>
> IrDA is not going to get fixed soon. Over the time I've been
>fixing the IrDA stack, I've slowly fixed some of most dangerous
>locking problems, but fixing the remaining code will involve some
>serious re-work and is unfortunately not just about sprinking a few
>spinlocks there and there.
Actually, the way to emulate cli/sti behaviour is not to "sprinkle"
spinlocks, you can generally do it with _one_ spinlock per subsystem.
So the straightforward way to port away from cli/sti is to add one
spinlock which takes their place for that subsystem, and then get that
lock on entry to subsystem interrupts and timer events, and in all
places where there used to be a cli/sti.
It gets a bit more complicated partly because you could nest cli/sti,
and you can't nest spinlocks, but on the whole none of it is "rocket
science".
Of course, doing it _right_ (rather than try to just translate the
semantics of cli/sti fairly directly) can be a lot more work. But even a
straight translation improves on what used to be, since different
subsystems will now be independent, and since it is easier later on to
split the one lock up on an as-needed basis.
Linus
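The "one spinlock per subsystem" translation described above might look
roughly like the following. It is a userspace sketch, not kernel code: the
subsys_* names and pending_events are hypothetical, the lock is a C11
atomic_flag rather than a kernel spinlock_t, and interrupt disabling is left
out. In the kernel the process-context side would presumably use
spin_lock_irqsave() so that an interrupt on the same CPU cannot spin on a
lock its own CPU already holds.

```c
#include <assert.h>
#include <stdatomic.h>

/* One lock for the whole (hypothetical) subsystem, replacing cli()/sti(). */
static atomic_flag subsys_lock = ATOMIC_FLAG_INIT;
static int subsys_lock_depth;   /* debugging aid: must never exceed 1 */
static int pending_events;      /* shared state formerly guarded by cli() */

static void subsys_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&subsys_lock,
                                             memory_order_acquire))
        ;  /* spin until the current holder releases the lock */
    subsys_lock_depth++;
    assert(subsys_lock_depth == 1);  /* unlike cli(), this does not nest */
}

static void subsys_lock_release(void)
{
    subsys_lock_depth--;
    atomic_flag_clear_explicit(&subsys_lock, memory_order_release);
}

/* Interrupt or timer entry point: takes the lock on entry. */
static void subsys_timer_event(void)
{
    subsys_lock_acquire();
    pending_events++;
    subsys_lock_release();
}

/* Process-context path: the spot where the old code had cli() ... sti(). */
static int subsys_consume_event(void)
{
    int got = 0;

    subsys_lock_acquire();
    if (pending_events > 0) {
        pending_events--;
        got = 1;
    }
    subsys_lock_release();
    return got;
}
```

A nested acquire would spin forever here, which is the "you can't nest
spinlocks" complication; callers that used to nest cli() need restructuring
so the lock is taken once at the outermost level.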
From: [email protected] (Linus Torvalds)
Date: Thu, 25 Jul 2002 06:02:04 +0000 (UTC)
It gets a bit more complicated partly because you could nest cli/sti,
and you can't nest spinlocks, but on the whole none of it is "rocket
science".
Actually the "rocket science" part is that these "cli()" users in the
unmaintained net stacks also want cli() to shut up input packet
processing as well as TIMER_BH.
This means they assume that cli() means "nobody can even look at the
existence of any of the timers". Ie. they do this to ensure they
can simply del_timer and there is no possibility someone sits inside
of the actual handler.
Of course del_timer_sync can be used to deal with that specific
case. But this specific example is just the tip of the iceberg.
I really think it is unwise to even imply that this kind of cli/sti
fixup can be done in some mindless manner, it really can't :-)
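To make the del_timer vs. del_timer_sync distinction concrete, here is a
small single-threaded model. It is only a sketch: struct model_timer,
other_cpu_step() and the model_* functions are invented for illustration,
and the "other CPU" is simulated by stepping a counter. What it shows is
that merely dequeuing a timer says nothing about a handler that is already
running elsewhere, whereas the _sync variant waits for that handler to
finish before the caller goes on to tear down its data.

```c
#include <stdbool.h>

/* Hypothetical model of one timer plus the data its handler touches. */
struct model_timer {
    bool pending;         /* queued, handler not started yet */
    int  handler_steps;   /* > 0 while the handler still runs on the "other CPU" */
    bool data_freed;      /* owner has torn down the handler's data */
    bool use_after_free;  /* handler touched the data after it was freed */
};

/* One step of the simulated other CPU executing the timer handler. */
static void other_cpu_step(struct model_timer *t)
{
    if (t->handler_steps > 0) {
        if (t->data_freed)
            t->use_after_free = true;
        t->handler_steps--;
    }
}

/* Modelled on del_timer(): dequeue only, never waits for a running handler. */
static bool model_del_timer(struct model_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Modelled on del_timer_sync(): dequeue, then wait until the handler is done. */
static bool model_del_timer_sync(struct model_timer *t)
{
    bool was_pending = model_del_timer(t);

    while (t->handler_steps > 0)
        other_cpu_step(t);  /* stand-in for waiting on the other CPU */
    return was_pending;
}

/* Tear down the timer's data; returns true if the handler saw freed data. */
static bool model_teardown(struct model_timer *t, bool sync)
{
    if (sync)
        model_del_timer_sync(t);
    else
        model_del_timer(t);
    t->data_freed = true;

    /* The other CPU keeps running regardless of what we did. */
    while (t->handler_steps > 0)
        other_cpu_step(t);
    return t->use_after_free;
}
```

Old code under cli() got the "nobody is inside the handler" guarantee for
free, which is why a mechanical translation can miss it.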
On Wed, Jul 24, 2002 at 06:17:17PM -0700, Linus Torvalds wrote:
> On Wed, 24 Jul 2002, Greg KH wrote:
> >
> > Here's a small patch to get the i810_audio.c driver working again.
>
> Hmm..
>
> > @@ -2814,15 +2814,15 @@
> > }
> > dmabuf->count = dmabuf->dmasize;
> > outb(31,card->iobase+dmabuf->write_channel->port+OFF_LVI);
> > - save_flags(flags);
> > - cli();
> > + local_irq_save(flags);
> > + local_irq_disable();
>
> First off, "local_irq_save()" does both the save and the disable (the same
> way "spin_lock_irqsave()" does), it's the "local_save_flags()" that is
> equivalent to the old plain save_flags. So this should just be
>
> local_irq_save(flags);
>
> However, I also wonder if the code doesn't want any SMP locking? Is it ok
> to just make it a non-spinlock local interrupt disable, and if so, why?
Because this is all part of the module init code, and our purpose here is
not to lock out other users of the card. That is already accomplished by
the fact that card->initializing != 0 at this point: any other attempt to
access the card must first go through the open routine to get a file
handle to the device, so it blocks in i810_open() or i810_mixer_open()
waiting for card->initializing to become 0. The interrupt disable is
merely intended to stop all interrupts that might skew our timing via
udelay() on the local CPU. It's actually pretty important that we keep
our variance from a real 50ms delay as small as possible, since the more
variance we allow in this loop, the more likely it is that our sound
card will play sounds either a bit too fast or too slow.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
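To put a number on why the variance matters, here is a rough arithmetic
sketch. It is not the driver's actual calculation: the 4 bytes per frame
(16-bit stereo) and both helper names are assumptions made for
illustration. The idea is that the code counts bytes moved during what it
believes is a 50ms window, so any extra time spent in an interrupt shows
up directly as a wrong clock estimate.

```c
/* Assumed for illustration: 16-bit stereo playback, i.e. 4 bytes per frame. */
#define BYTES_PER_FRAME 4

/*
 * Sample rate implied by `bytes` transferred during a window that the
 * measuring code believes lasted `assumed_ms` milliseconds.
 */
static unsigned long measured_rate_hz(unsigned long bytes,
                                      unsigned long assumed_ms)
{
    return bytes * 1000UL / assumed_ms / BYTES_PER_FRAME;
}

/* Bytes a codec running at `true_rate_hz` really moves in `actual_ms`. */
static unsigned long bytes_moved(unsigned long true_rate_hz,
                                 unsigned long actual_ms)
{
    return true_rate_hz * BYTES_PER_FRAME * actual_ms / 1000UL;
}
```

With these assumptions, a 48000 Hz codec measured over an exact 50ms
window comes out as 48000 Hz, but if an interrupt stretches the window to
51ms while the code still divides by 50ms, the estimate becomes 48960 Hz,
a 2% error.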
On Wed, Jul 24, 2002 at 11:01:06PM -0700, Greg KH wrote:
> On Wed, Jul 24, 2002 at 06:17:17PM -0700, Linus Torvalds wrote:
> > > @@ -2814,15 +2814,15 @@
> > > }
> > > dmabuf->count = dmabuf->dmasize;
> > > outb(31,card->iobase+dmabuf->write_channel->port+OFF_LVI);
> > > - save_flags(flags);
> > > - cli();
> > > + local_irq_save(flags);
> > > + local_irq_disable();
> >
> > First off, "local_irq_save()" does both the save and the disable (the same
> > way "spin_lock_irqsave()" does), it's the "local_save_flags()" that is
> > equivalent to the old plain save_flags. So this should just be
> >
> > local_irq_save(flags);
>
> Ah, sorry, I didn't get that from cli-sti-removal.txt. Actually it
> looks like cli-sti-removal.txt is a bit wrong, as there is no
> local_irq_save_off() function. I'll send a patch for that next.
Here's that patch.
thanks,
greg k-h
diff -Nru a/Documentation/cli-sti-removal.txt b/Documentation/cli-sti-removal.txt
--- a/Documentation/cli-sti-removal.txt Wed Jul 24 23:25:38 2002
+++ b/Documentation/cli-sti-removal.txt Wed Jul 24 23:25:38 2002
@@ -94,10 +94,10 @@
released.
drivers that want to disable local interrupts (interrupts on the
-current CPU), can use the following five macros:
+current CPU), can use the following four macros:
local_irq_disable(), local_irq_enable(), local_irq_save(flags),
- local_irq_save_off(flags), local_irq_restore(flags)
+ local_irq_restore(flags)
but beware, their meaning and semantics are much simpler, far from
that of the old cli(), sti(), save_flags(flags) and restore_flags(flags)
@@ -107,11 +107,7 @@
local_irq_enable() => turn local IRQs on
- local_irq_save(flags) => save the current IRQ state into flags. The
- state can be on or off. (on some
- architectures there's even more bits in it.)
-
- local_irq_save_off(flags) => save the current IRQ state into flags and
+ local_irq_save(flags) => save the current IRQ state into flags and
disable interrupts.
local_irq_restore(flags) => restore the IRQ state from flags.
Hi,
On Wed, 24 Jul 2002, Greg KH wrote:
> > Ah, sorry, I didn't get that from cli-sti-removal.txt. Actually it
> > looks like cli-sti-removal.txt is a bit wrong, as there is no
> > local_irq_save_off() function. I'll send a patch for that next.
In my understanding things look rather like this:
--- linus-2.5/Documentation/cli-sti-removal.txt 2002-07-24 23:10:23.000000000 -0600
+++ thunder-2.5/Documentation/cli-sti-removal.txt 2002-07-25 01:12:45.000000000 -0600
@@ -96,8 +96,8 @@
drivers that want to disable local interrupts (interrupts on the
current CPU), can use the following five macros:
- local_irq_disable(), local_irq_enable(), local_irq_save(flags),
- local_irq_save_off(flags), local_irq_restore(flags)
+ local_irq_disable(), local_irq_enable(), local_save_flags(flags),
+ local_irq_save(flags), local_irq_restore(flags)
but beware, their meaning and semantics are much simpler, far from
that of the old cli(), sti(), save_flags(flags) and restore_flags(flags)
@@ -107,11 +107,11 @@
local_irq_enable() => turn local IRQs on
- local_irq_save(flags) => save the current IRQ state into flags. The
+ local_save_flags(flags) => save the current IRQ state into flags. The
state can be on or off. (on some
architectures there's even more bits in it.)
- local_irq_save_off(flags) => save the current IRQ state into flags and
+ local_irq_save(flags) => save the current IRQ state into flags and
disable interrupts.
local_irq_restore(flags) => restore the IRQ state from flags.
Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------
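For reference, the semantics of the five macros as corrected above can be
modelled in userspace like this. It is only a sketch: the model_* names
and the single model_irqs_on variable are invented stand-ins for one CPU's
interrupt-enable state, and the real macros take `flags` by name rather
than by pointer.

```c
/* Hypothetical model of one CPU's interrupt-enable bit: 1 = on, 0 = off. */
static int model_irqs_on = 1;

/* local_irq_disable(): turn local IRQs off. */
static void model_local_irq_disable(void)
{
    model_irqs_on = 0;
}

/* local_irq_enable(): turn local IRQs on. */
static void model_local_irq_enable(void)
{
    model_irqs_on = 1;
}

/* local_save_flags(flags): save the current state only (old save_flags). */
static void model_local_save_flags(unsigned long *flags)
{
    *flags = (unsigned long)model_irqs_on;
}

/* local_irq_save(flags): save the current state AND disable interrupts. */
static void model_local_irq_save(unsigned long *flags)
{
    *flags = (unsigned long)model_irqs_on;
    model_irqs_on = 0;
}

/* local_irq_restore(flags): put back whatever state was saved. */
static void model_local_irq_restore(unsigned long flags)
{
    model_irqs_on = (int)flags;
}

/*
 * Old pattern:  save_flags(flags); cli(); ... restore_flags(flags);
 * New pattern:  local_irq_save(flags);    ... local_irq_restore(flags);
 * Returns the interrupt state observed inside the critical section.
 */
static int model_critical_section(void)
{
    unsigned long flags;
    int seen;

    model_local_irq_save(&flags);   /* no separate disable call needed */
    seen = model_irqs_on;
    model_local_irq_restore(flags);
    return seen;
}
```

Because restore puts back the saved state instead of unconditionally
enabling, the section behaves the same whether it is entered with
interrupts on or already off.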
Jens Axboe wrote:
>> <[email protected]>
>> [PATCH] IDE-101
>>
>> Here is a quick fix. I would like to synchronize with the irq handler
>> changes as well, because right now I know that preemption is killing
>> the disk subsystem when moving data between disks using different
>> request queues. In particular, it gets me into do_request() with a queue
>> in unplugged state. (Not everything is my fault, after all :-).
>
> ^^^^^^^^^
>
> must be a typo, it would be a bug to enter do_request() with the queue
> in a _plugged_ state, not vice versa.
Yes you are right. Anyway the described problem is indeed observable.
On Thu, 2002-07-25 at 03:08, Linus Torvalds wrote:
> > So IDE-101 equals to the small snippet of code pasted somewhere in the
> > evil flamewar?
> Have you _looked_ at the full changelog? Apparently not.
I was merely requesting a bit more verbose information in your regular
changelog, the whole thing is quite exhaustive but this entry didn't
really fit and contained no useful information at all.
I will definitely consider reading the "full changelog" although I
cannot remember having read anything about such a thing before this
thread.
> The snippet was posted as part of the IDE-2.5.27 thread. Go look for it
> yourself.
Exactly what I said, no?
> Most of the IDE stuff is FUD and misinformation. I've run every single
> 2.5.x kernel on an IDE system ("penguin.transmeta.com" has everything on
> IDE), and the main reported 2.5.27 corruption was actually from my BK tree
> apparently due to the IRQ handling changes.
This is very encouraging information that had been entirely missing from
the threads: a success story from a person actually trusting and
using this thing.
> The thing I dislike is how people who apparently haven't even read the
> discussions, and didn't bother to look up the full changelog feel that
> they are perfectly fine to spread FUD and misinformation about the IDE
> layer.
I for one did read the discussion(s) but it's really hard to map IDE-101
to some tiny patch in a huge tree of mails.
> Do we have issues there? Yes. But there are actually _more_ problems with
> people dissing the work than with the code itself.
I appreciate Martin's work, and even more your word that it's pretty
stable.
Keep up the good work and let us end this thread for good.
--
Servus,
Daniel
On Wed, 24 Jul 2002, David S. Miller wrote:
> I really think it is unwise to even imply that this kind of cli/sti
> fixup can be done in some mindless manner, it really can't :-)
i think the networking code is a special case - nothing else relies on the
interaction of timers and IRQ contexts in such a deep way. (which it does
for performance reasons.) I'd say 99% of all cli()/sti() users are in the
'introduce a per-driver or per-subsystem lock' league Linus mentioned.
Ingo
On Thu, 25 Jul 2002, Thunder from the hill wrote:
> > > Ah, sorry, I didn't get that from cli-sti-removal.txt. Actually it
> > > looks like cli-sti-removal.txt is a bit wrong, as there is no
> > > local_irq_save_off() function. I'll send a patch for that next.
>
> In my understanding things look rather like this:
indeed - the document did not fully survive some of the cleanups.
> + local_irq_disable(), local_irq_enable(), local_save_flags(flags),
> + local_irq_save(flags), local_irq_restore(flags)
yes.
Ingo
Ingo Molnar wrote:
> On Wed, 24 Jul 2002, David S. Miller wrote:
>
>
>>I really think it is unwise to even imply that this kind of cli/sti
>>fixup can be done in some mindless manner, it really can't :-)
>
>
> i think the networking code is a special case - nothing else relies on the
> interaction of timers and IRQ contexts in such a deep way. (which it does
> for performance reasons.) I'd say 99% of all cli()/sti() users are in the
> 'introduce a per-driver or per-subsystem lock' league Linus mentioned.
Careful... The ATA host controller patches showed that mindless fixing
would just hide the fact that, let me guess, 50% of the cli()/sti() uses
are remnants from the days when we didn't even have spinlocks, or
are simply there because someone felt he needed some "kind of safety"
and wanted to make something "bullet proof". And it's easier to spot
this kind of usage with cli() than with "carefully" added spinlocking,
because with spinlocks there is always a chance that they
interact with some code you don't see when looking at a particular place
of usage.
On Thu, 25 Jul 2002, Marcin Dalecki wrote:
> Careful... The ATA host controller patches showed that mindless fixing
> would just hide the fact that, [...]
the main.c change was, frankly, trivially broken. The function that called
the unregister function already held the ide_lock. We have a debugging
mechanism to detect such bugs as they happen (the NMI watchdog), so it's
relatively straightforward (of course not easy) to extend the use of
ide_lock.
the more subtle cases are when the code somehow relies on cli()
excluding multiple IRQ contexts and BH contexts for example.
Ingo
Dave Jones <[email protected]> writes:
> On Wed, Jul 24, 2002 at 02:13:49PM -0700, Linus Torvalds wrote:
>
> > ... Serial lawyer all shook up (the irq lock kind of forced that one,
> ^^^^^^^
> > but it's certainly been pending long enough..)
>
> Shake `em harder, lets see what falls out. 8-)
>
Run the serial port too fast and it won't crash... it'll sue you,
american style :)
> Dave.
>
--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy
From: Ingo Molnar <[email protected]>
Date: Thu, 25 Jul 2002 11:28:15 +0200 (CEST)
i think the networking code is a special case - nothing else relies on the
interaction of timers and IRQ contexts in such a deep way. (which it does
for performance reasons.) I'd say 99% of all cli()/sti() users are in the
'introduce a per-driver or per-subsystem lock' league Linus mentioned.
I'm sure the serial drivers used to. Look at how they were using
SERIAL_BH for example.
RMK's stuff fixes that so wrt. the current state of affairs you're
probably right.
Linus wrote :
> In article <[email protected]>,
> Jean Tourrilhes <[email protected]> wrote:
> >
> > IrDA is not going to get fixed soon. Over the time I've been
> >fixing the IrDA stack, I've slowly fixed some of the most dangerous
> >locking problems, but fixing the remaining code will involve some
> >serious re-work and is unfortunately not just about sprinkling a few
> >spinlocks here and there.
>
> Actually, the way to emulate cli/sti behaviour is not to "sprinkle"
> spinlocks, you can generally do it with _one_ spinlock per subsystem.
Unfortunately, it won't work for IrDA. The reason is that you
tend to have paths like this:
IrLAP -> IrLMP -> IrTTP -> IrNET/IrCOMM/IrLAN/IrSock -> IrTTP -> IrLMP -> IrLAP
And I can't have one global spinlock for the IrDA stack,
because the higher layers (IrNET/IrCOMM/IrLAN/IrSOCK) are totally
independent and have their own locking (for example, I can guarantee
you that IrNET is already safe).
I'm also especially nervous about holding a spinlock with irqs
off while calling into the higher-layer protocols, such as the various
socket functions (IrSOCK), the PPP mux (IrNET), or the TTY layer (IrCOMM).
My feeling is that doing the _one_ spinlock properly would be
as much work as fixing the root problem (the hashbins).
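The re-entrant path above is what makes a single stack-wide lock awkward,
and a tiny model shows why. This is a sketch only: the model_* functions
and the flag-based "lock" are invented for illustration and are not the
IrDA code. The receive path takes the one lock, calls up into a higher
layer, and the higher layer immediately sends a reply back down, which
needs the same lock again.

```c
#include <stdbool.h>

/* Hypothetical model of one global, non-recursive lock for the whole stack. */
static bool model_stack_lock_held;
static bool model_would_deadlock;  /* set when the lock is wanted while held */

static bool model_stack_trylock(void)
{
    if (model_stack_lock_held)
        return false;
    model_stack_lock_held = true;
    return true;
}

static void model_stack_unlock(void)
{
    model_stack_lock_held = false;
}

/* Bottom of the stack, transmit side (IrTTP -> IrLMP -> IrLAP collapsed). */
static void model_irlap_xmit(void)
{
    if (!model_stack_trylock()) {
        /* A real spinlock would spin here forever on the same CPU. */
        model_would_deadlock = true;
        return;
    }
    /* ... queue the frame ... */
    model_stack_unlock();
}

/* Higher layer (IrNET/IrCOMM/...): replies as soon as data arrives. */
static void model_upper_layer_receive(void)
{
    model_irlap_xmit();
}

/* Bottom of the stack, receive side. */
static void model_irlap_receive(bool drop_lock_for_upcall)
{
    if (!model_stack_trylock()) {
        model_would_deadlock = true;
        return;
    }
    /* ... look up the connection under the lock ... */
    if (drop_lock_for_upcall) {
        model_stack_unlock();
        model_upper_layer_receive();  /* called without the stack lock */
        if (!model_stack_trylock()) { /* retake to finish receive-side work */
            model_would_deadlock = true;
            return;
        }
    } else {
        model_upper_layer_receive();  /* called with the stack lock held */
    }
    model_stack_unlock();
}
```

Dropping the lock around the upcall avoids the self-deadlock in this
model, but in real code anything looked up before the upcall may have
changed by the time the lock is retaken, so it has to be revalidated.
That is one way to see why the "one lock" route may not end up simpler
than fixing the underlying data structures.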
> So the straightforward way to port away from cli/sti is to add one
> spinlock which takes their place for that subsystem, and then get that
> lock on entry to subsystem interrupts and timer events, and in all
> places where there used to be a cli/sti.
Been there, done that. I've been the one doing most of the
original SMP work in most Wireless LAN drivers (and the HP100 driver),
and I'm still the one doing most of the testing.
So, I feel qualified to comment about the IrDA situation.
> It gets a bit more complicated partly because you could nest cli/sti,
> and you can't nest spinlocks, but on the whole none of it is "rocket
> science".
No, here the problem is that the whole locking design is
broken. I know perfectly well that the hashbin locking is totally unsafe
and it's a miracle that it works at all. And I'm not sure I will
ever get something 100% safe.
> Of course, doing it _right_ (rather than try to just translate the
> semantics of cli/sti fairly directly) can be a lot more work. But even a
> straight translation improves on what used to be, since different
> subsystems will now be independent, and since it is easier later on to
> split the one lock up on an as-needed basis.
>
> Linus
Unfortunately, with the current IrDA code, I don't have much
choice but to do it somewhat right.
Now, it's a matter of priorities. The other IrDA developers
have been bitten by the "nothing is wrong" IDE in 2.5.X, so the logical
course of action is to shift to 2.4.X until I find time to get back to
this issue and catch up.
But having IrDA not functional in 2.5.X for a few months is
certainly not that painful compared to other problems in 2.5.X.
And as I was saying to Ingo, the good news is that I will
probably also no longer need "deliver_to_old_one()" in the
networking code.
Have fun...
Jean
- Rename ata-timings.h to timings.h. Same arguments as for agp.
- Always include hdparm.h just before ide.h. Include them last where
used. This is preparing to split out the IDE register declarations
out of this file, since many other files in the kernel include it,
which don't have anything to do with IDE.
- Don't use the "IDE special" data type "byte". Just use the u8 data
type for consistency with the rest of the kernel where applicable.
On Thu, 25 Jul 2002, Doug Ledford wrote:
> it is merely intended to stop all interrupts that might skew our timing
> via udelay() on the local CPU (it's actually pretty important that we
> keep our variance from a real 50ms delay as small as possible, since the
> more variance we allow in this loop the more likely it will be that our
> sound card will play sounds either a bit too fast or too slow).
how about a disable_irq_all() and enable_irq_all() call, which would
disable every single interrupt source in the system? Sure it's a bit
heavyweight (it disables the timer interrupt too), but if some driver
**really** needs complete silence in the IRQ system then it might be
useful. It would roughly be equivalent to cli() and sti(), from the
hardirq disabling point of view. [it would not disable bottom halves.]
Ingo
On Sat, 2002-07-27 at 10:10, Ingo Molnar wrote:
> how about a disable_irq_all() and enable_irq_all() call, which would
> disable every single interrupt source in the system? Sure it's a bit
> heavyweight (it disables the timer interrupt too), but if some driver
> **really** needs complete silence in the IRQ system then it might be
> useful. It would roughly be equivalent to cli() and sti(), from the
> hardirq disabling point of view. [it would not disable bottom halves.]
For the precision needed I think a local irq disable and the lock the
driver needs itself are sufficient, and the lock _irqsave will handle
the IRQ bits
On Wed, Jul 24, 2002 at 06:08:48PM -0700, Linus Torvalds wrote:
> Most of the IDE stuff is FUD and misinformation. I've run every single
> 2.5.x kernel on an IDE system ("penguin.transmeta.com" has everything on
> IDE), and the main reported 2.5.27 corruption was actually from my BK tree
> apparently due to the IRQ handling changes.
Linus, Linus, how can you say something so naive?
I need not tell you that one user without problems does not imply
that nobody will have problems.
A few people reported lost filesystems. Many more reported mild
filesystem damage. And now you also report mild filesystem damage.
FUD? Fear? Yes, the fear is justified for whoever does not have backups.
Uncertainty? Yes, when the filesystem is damaged again, it is not quite
clear what causes it. Doubt? Yes, many people doubt whether they can
afford to run 2.5.recent.
This evening I ran vanilla 2.5.29 and was rewarded with mild filesystem damage.
91 files in /lost+found. Nothing. A few kernel versions ago it was three
orders of magnitude worse.
IDE? 2.4.17 and 2.5.27+Jens are stable for me in ordinary use.
IRQ? Quite possible.
My third candidate is USB. Systems without USB are clearly more stable.
Andries
On Sun, 2002-07-28 at 00:57, Andries Brouwer wrote:
> This evening I ran vanilla 2.5.29 and was rewarded with mild filesystem damage.
> 91 files in /lost+found. Nothing. A few kernel versions ago it was three
> orders of magnitude worse.
>
> IDE? 2.4.17 and 2.5.27+Jens are stable for me in ordinary use.
> IRQ? Quite possible.
> My third candidate is USB. Systems without USB are clearly more stable.
USB may have problems, but on my test sets with 2.5, of those that booted,
the SCSI ones are pretty stable; the IDE ones eat disks or hang (mostly
hang). USB was loaded on some of the IDE boxes; the SCSI test boxes don't
have USB.
I've not tried the forward port of the stable IDE code with the test
loads. My SMP 2.5.27 test set on 2.5.27-ac1 (all the bits of which are
in 2.5.29) with symbios scsi on a dual PPro has been running for 6 days.
On Sun, Jul 28, 2002 at 01:57:26AM +0200, Andries Brouwer wrote:
> My third candidate is USB. Systems without USB are clearly more stable.
Hm, then that would imply that all of my systems are unstable :)
Seriously, I don't know of any outstanding 2.5 USB issues that cause
oopses right now, or affect stability. Any problems that people are
having, they sure are not telling me, or the other USB developers
about...
thanks,
greg k-h
On Sun, 28 Jul 2002, Andries Brouwer wrote:
> On Wed, Jul 24, 2002 at 06:08:48PM -0700, Linus Torvalds wrote:
>
> > Most of the IDE stuff is FUD and misinformation. I've run every single
> > 2.5.x kernel on an IDE system ("penguin.transmeta.com" has everything on
> > IDE), and the main reported 2.5.27 corruption was actually from my BK tree
> > apparently due to the IRQ handling changes.
>
> Linus, Linus, how can you say something so naive?
> I need not tell you that one user without problems does not imply
> that nobody will have problems.
That's not what I'm saying. I'm saying that there _are_ problems with IDE,
but that the real problem with IDE is that some people aren't even willing
to help despite the fact that we do have a maintainer that actually can
work with people.
I realize that so many people are probably used to the fact that IDE
maintainers do not take patches from the outside that people have kind of
given up on even working on IDE, but it doesn't help to have people only
be negative (and btw, I'm definitely not talking about you - you've been
exceedingly _positive_ in that you're still willing to test and report on
problems. I'm talking about people who don't even bother to do
bug-reports, but only trash-talk the maintenance).
> A few people reported lost filesystems. Many more reported mild
> filesystem damage. And now you also report mild filesystem damage.
No, I've not reported lost filesystems. I'm reporting that _others_
reported filesystem damage that was _not_ related to the IDE patches at
all, yet were instantly blamed on the IDE patches.
And THAT is part of the problem. I don't know why, but the IDE subsystem
brings out the worst in people.
This is, btw, one reason why I hate mid layers. People blame them for
everything, and fixing it for one setup breaks it for another.
> My third candidate is USB. Systems without USB are clearly more stable.
Hmm.. I doubt that's your problem, but you might just want to pester
Martin about your particular IDE setup and see if some light eventually
goes off somewhere.
I have this memory that you're using PIO mode? Please do make full details
available, reminding people which exact setups are broken..
Linus
On Sat, 27 Jul 2002, Linus Torvalds wrote:
>
> I'm talking about people who don't even bother to do
> bug-reports, but only trash-talk the maintenance.
On that note, let me mention the machines I personally am using IDE, and
apparently do not see problems: a dual PII with "Intel Corp. 82371AB PIIX4
IDE", and a P4 with "SiS 5513 IDE (rev 208)".
Both setups in DMA mode, both setups have one disk per channel (first
channel is disk, second channel is CD-ROM).
So what are the patterns for "working" vs "broken"?
Linus
On Sat, Jul 27, 2002 at 09:40:40PM -0700, Linus Torvalds wrote:
>
>
> On Sat, 27 Jul 2002, Linus Torvalds wrote:
> >
> > I'm talking about people who don't even bother to do
> > bug-reports, but only trash-talk the maintenance.
>
> On that note, let me mention the machines I personally am using IDE, and
> apparently do not see problems: a dual PII with "Intel Corp. 82371AB PIIX4
> IDE", and a P4 with "SiS 5513 IDE (rev 208)".
>
> Both setups in DMA mode, both setups have one disk per channel (first
> channel is disk, second channel is CD-ROM).
>
> So what are the patterns for "working" vs "broken"?
In the probably-not-useful department because I haven't tested on 2.5,
my experience over quite some time has been that you find a lot more
problems when you are actively beating on both channels. There is some
chipset, I suspect you know which but Andre certainly does, that is just
basically busted when you use both channels. I've had so many problems
with this that for any data I care about I plug in a 3ware controller
and use that instead.
I have a diskscrubber program which runs the bits through a series of
changes, it's pretty trivial to write but I can post mine if you like,
it works for banging on the disk.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
On Sat, Jul 27, 2002 at 01:35:42PM +0100, Alan Cox wrote:
> On Sat, 2002-07-27 at 10:10, Ingo Molnar wrote:
> > how about a disable_irq_all() and enable_irq_all() call, which would
> > disable every single interrupt source in the system? Sure it's a bit
> > heavyweight (it disables the timer interrupt too), but if some driver
> > **really** needs complete silence in the IRQ system then it might be
> > useful. It would roughly be equivalent to cli() and sti(), from the
> > hardirq disabling point of view. [it would not disable bottom halves.]
>
> For the precision needed I think a local irq disable and the lock the
> driver needs itself are sufficient, and the lock _irqsave will handle
> the IRQ bits
Well, to my belief the irq_all() stuff is overkill as Alan points out.
However, Alan also implies that during the init stage we should be holding
the card lock and using that to disable interrupts. I disagree with that
since we may already have other entry points looking at our card and we
don't want the card lock held until after it has been initted and is ready
for real use. So, I would leave it just like the patch to fix up the sti
usage left it.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
On Sat, 27 Jul 2002, Linus Torvalds wrote:
> On Sat, 27 Jul 2002, Linus Torvalds wrote:
> >
> > I'm talking about people who don't even bother to do
> > bug-reports, but only trash-talk the maintenance.
>
> On that note, let me mention the machines I personally am using IDE, and
> apparently do not see problems: a dual PII with "Intel Corp. 82371AB PIIX4
> IDE", and a P4 with "SiS 5513 IDE (rev 208)".
>
> Both setups in DMA mode, both setups have one disk per channel (first
> channel is disk, second channel is CD-ROM).
>
> So what are the patterns for "working" vs "broken"?
>
> Linus
Your systems are too standard to see problems :-).
Unusual combinations or more quirky chipsets -> real problems.
Plus there are PIO problems (esp. multisector), but some of them
(not all) are in 2.4 IDE forward port also.
Regards
--
Bartlomiej
On Sat, Jul 27, 2002 at 07:47:01PM -0700, Linus Torvalds wrote:
> > My third candidate is USB. Systems without USB are clearly more stable.
>
> Hmm.. I doubt that's your problem, but you might just want to pester
> Martin about your particular IDE setup and see if some light eventually
> goes off somewhere.
>
> I have this memory that you're using PIO mode? Please do make full details
> available, reminding people which exact setups are broken..
The machine I usually try new kernels on is a 400 MHz Intel Pentium II.
% dmesg | grep hd
Kernel command line: auto BOOT_IMAGE=2.5.27axboe ro root=346 rootfstype=reiserfs hdc=ide-scsi
ide_setup: hdc=ide-scsi
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
ide2: BM-DMA at 0x9c00-0x9c07, BIOS settings: hde:pio, hdf:pio
ide3: BM-DMA at 0xa800-0xa807, BIOS settings: hdg:pio, hdh:pio
hda: Maxtor 91728D8, ATA DISK drive
hdb: Maxtor 91728D8, ATA DISK drive
hdc: Hewlett-Packard CD-Writer Plus 8200, ATAPI CD/DVD-ROM drive
hdd: CD-ROM 40X/AKU, ATAPI CD/DVD-ROM drive
hde: Maxtor 93652U8, ATA DISK drive
hdf: Maxtor 96147H6, ATA DISK drive
hda: host protected area => 1
hda: 33750864 sectors (17280 MB) w/512KiB Cache, CHS=2100/255/63
hdb: host protected area => 1
hdb: 33750864 sectors (17280 MB) w/512KiB Cache, CHS=2100/255/63
hde: host protected area => 1
hde: 71346240 sectors (36529 MB) w/2048KiB Cache, CHS=70780/16/63
hdf: host protected area => 1
hdf: 120064896 sectors (61473 MB) w/2048KiB Cache, CHS=119112/16/63
hdd: ATAPI 48X CD-ROM drive, 128kB Cache
hda: hda1 < hda5 hda6 hda7 > hda4
hda4: <unixware: hda8 hda9 hda10 hda11 hda12 hda13 hda14 >
hdb: hdb1 hdb2 hdb3 < hdb5 hdb6 hdb7 >
hde: hde1 hde2 hde3 < hde5 > hde4
hde2: <bsd: hde6 hde7 hde8 hde9 >
hdf: hdf1 hdf2 hdf3
...
Here hde and hdf live on a HPT366 card.
% dmesg | grep HPT
HPT366: IDE controller on PCI bus 00 dev 48
HPT366: detected chipset, but driver not compiled in!
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
HPT366: IDE controller on PCI bus 00 dev 49
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
[This is from dmesg on a 2.5.27+2.4ide.]
hdc is a CD writer (on ide-scsi)
hdd is a CDROM
No hdparm is used - the IDE is left as the kernel sets it.
I have seen (at least) two kinds of problems:
kernel hang and filesystem corruption.
The hang was always on hde. The corruption was mostly on hdb.
1) Hangs are caused by this HPT366 card. Early 2.5 kernels would
not boot because they would hang as soon as hde was touched.
The same happens for example with the SuSE 8.0 install kernel.
Other kernels would boot but would hang when there was significant
activity on hde or hdf.
2) A different type of problem would be that the superblock
of the root filesystem (on hdb) was zeroed. I have seen this
at least three times - no damage at all, except for a wiped
superblock. Easily repaired with e2fsck -b N.
A less pleasant version of this is a wiped block different
from the superblock, or a block in which all data has been
shifted by a few bytes. On such an occasion e2fsck went
totally berserk and after believing this one block decided
that most of my filesystem was broken, and "repaired" it
out of existence. (That was a filesystem different from
the root filesystem.)
Also yesterday the damage was to a single block, this time
to a reiserfs root filesystem. Lots of messages
is_leaf: free space seems wrong: level=1, nr_items=29, free_space=64 rdkey
vs-5150: search_by_key: invalid format found in block 8274. Fsck?
vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [328 634 0x0 SD]
all involving block 8274.
Turns out that one needs to boot from other media in order to repair
a reiserfs root filesystem - it does not suffice to mount it read-only.
So, maybe ext2 is more convenient than reiserfs on root.
It is not impossible that this corruption-type problem is created at
reboot time.
Andries
On Sun, Jul 28, 2002 at 05:56:26PM +0200, Andries Brouwer wrote:
> On Sat, Jul 27, 2002 at 07:47:39PM -0700, Greg KH wrote:
>
> > On Sun, Jul 28, 2002 at 01:57:26AM +0200, Andries Brouwer wrote:
> > > My third candidate is USB. Systems without USB are clearly more stable.
> >
> > Hm, then that would imply that all of my systems are unstable :)
> >
> > Seriously, I don't know of any outstanding 2.5 USB issues that cause
> > oopses right now, or affect stability. Any problems that people are
> > having, they sure are not telling me, or the other USB developers
> > about...
>
> I reported an oops at shutdown and provided the trivial fix.
> It is in the standard kernel since 2.5.26, I think.
That patch should be in the latest kernel, thanks. Let me know if you
are still having that problem in .29
> But there are still other oopses at shutdown for 2.5.27.
>
> For 2.5.29 I reported
> "> I booted 2.5.29 earlier this evening and was greeted by
> > kernel BUG at transport.c: 351 and
> > kernel BUG at scsiglue.c: 150.
> > (And the usb-storage module now hangs initializing; rmmod fails,
> > reboot is necessary.)"
>
> Further improvement of usb-storage is possible.
Oh yeah, I'm not saying that this is not true at all :)
Matt added a BUG_ON() that seems to be hitting a lot of people, but I
guess that was his intention.
thanks,
greg k-h
On Sun, Jul 28, 2002 at 11:53:10AM -0700, Greg KH wrote:
> > I reported an oops at shutdown and provided the trivial fix.
> > It is in the standard kernel since 2.5.26, I think.
>
> That patch should be in the latest kernel, thanks.
Yes, since 2.5.26.
> Let me know if you are still having that problem in .29
No, I fixed that problem. But, as I told you:
> > But there are still other oopses at shutdown for 2.5.27.
> >
> > For 2.5.29 I reported
> > "> I booted 2.5.29 earlier this evening and was greeted by
> > > kernel BUG at transport.c: 351 and
> > > kernel BUG at scsiglue.c: 150.
> > > (And the usb-storage module now hangs initializing; rmmod fails,
> > > reboot is necessary.)"
[Maybe I forgot to tell you; I am a mathematician; tend to be
fairly precise; thus, the above precisely describes the state
of my knowledge yesterday evening: in the category "USB-induced
oopses at reboot" one bug was fixed in 2.5.26; there are further
such bugs still present in 2.5.27; concerning 2.5.29, it does not
get far enough to decide: the usb-storage module hangs initializing.]
Andries
(Concerning this BUG_ON in transport.c: it should be commented out
for the moment. First of all, nothing is wrong, I think, and secondly,
we know already with certainty that it will happen, so nothing is learnt.
Matt simultaneously came with some SCSI patches. More or less reasonable,
although discussion was possible. But these were not applied in 2.5.29.
If and when these or similar patches have been applied to the SCSI code
one may consider enabling this BUG_ON again.)
(I have not yet looked at the BUG at scsiglue.c: 150.)
On Sun, Jul 28, 2002 at 01:27:32PM +0200, Oliver Neukum wrote:
> Am Sonntag, 28. Juli 2002 01:57 schrieb Andries Brouwer:
> > IDE? 2.4.17 and 2.5.27+Jens are stable for me in ordinary use.
> > IRQ? Quite possible.
> > My third candidate is USB. Systems without USB are clearly more stable.
> could you be a bit more specific?
> Are you referring to a USB mass storage device, or USB in general?
>
> Also which devices do you have connected to USB?
> Which HCD and which chipset? (VIA is known to be problematic)
"USB mass storage" in general.
hcd-pci.c: uhci-hcd @ 00:07.2, Intel Corp. 82371AB PIIX4 USB
I don't think you need to search in this kind of direction.
The usb-storage code is just not very solid. It works more or less,
but it is really easy to provoke an oops.
Since there has been so much mail this evening, let me provoke an oops
just to show. Steps:
1. compile vanilla 2.5.29 with all built-in (also usb), except for
usb-storage.
2a. boot it, do nothing, reboot - all is fine
2b. boot it, insmod usb-storage, rmmod usb-storage, reboot - all is fine
2c. boot it, connect four CF/SM card readers to a hub. Mount usbdevfs.
Look at them with usbview. Now insmod usb-storage. This generates some
kernel messages about the probing, then silence. Wait for two minutes.
Nothing. Still no prompt showing the completion of the insmod.
Remove the four Smart Media card readers from the hub. No reaction.
Ctrl-Alt-Del initiates a reboot, but the reboot hangs.
Wait for a while. Nothing. Touch the (non-USB) keyboard. Oops.
This was a funny oops, rather different from those I usually see.
The stack trace was:
put_queue < handle_scancode < handle_kbd_event < update_wall_time <
timer_bh < keyboard_interrupt < handle_IRQ_event < do_IRQ <
default_idle < default_idle < common_interrupt < default_idle <
default_idle < default_idle < cpu_idle < rest_init.
Andries
[that was all for today, I am afraid - have no time to do Linux work today]
Greg KH wrote:
> On Sun, Jul 28, 2002 at 01:57:26AM +0200, Andries Brouwer wrote:
>
>>My third candidate is USB. Systems without USB are clearly more stable.
>
>
> Hm, then that would imply that all of my systems are unstable :)
>
> Seriously, I don't know of any outstanding 2.5 USB issues that cause
> oopses right now, or affect stability. Any problems that people are
> having, they sure are not telling me, or the other USB developers
> about...
>
> thanks,
Please please learn how to use __FUNCTION__ properly. I see the same
crap over and over again in security. OK?
Please tell me a way how to dual boot a system with the new host
controller names between 2.4 and 2.5. Putting redundant alias lines in
/etc/modules.conf didn't work.
On Mon, Jul 29, 2002 at 12:16:08PM +0200, Marcin Dalecki wrote:
> Greg KH wrote:
> >On Sun, Jul 28, 2002 at 01:57:26AM +0200, Andries Brouwer wrote:
> >
> >>My third candidate is USB. Systems without USB are clearly more stable.
> >
> >
> >Hm, then that would imply that all of my systems are unstable :)
> >
> >Seriously, I don't know of any outstanding 2.5 USB issues that cause
> >oopses right now, or affect stability. Any problems that people are
> >having, they sure are not telling me, or the other USB developers
> >about...
> >
> >thanks,
>
> Please please learn how to use __FUNCTION__ properly. I see the same
> crap over and over again in security. OK?
{sigh} This has _nothing_ to do with the stability of the code :)
I'd be glad to fix this, if someone sends me a patch. I recently
accepted just such a patch for my 2.4 USB tree, as it is something that
eventually needs to get done.
And if you do send me such a patch, please test it out on older compiler
versions. I just finally got the pci hotplug code fixed up after you
sent in a patch "fixing" this issue.
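[Aside for readers following the __FUNCTION__ complaint: the usual
portability problem is pasting __FUNCTION__ directly next to a string
literal, which older gcc releases accepted and newer ones complain about,
because __FUNCTION__ behaves like a variable rather than a literal. Below
is a minimal userspace sketch of the safer pattern, passing it as a "%s"
argument. DBG_FMT and probe_example are hypothetical names made up for
illustration; they are not taken from the USB or PCI hotplug code, and the
C99-style variadic macro is an assumption that may not suit every older
compiler.]

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical debug helper (illustration only, not kernel code).
 *
 * Fragile form, relies on string-literal concatenation:
 *     printk(KERN_DEBUG __FUNCTION__ ": probing\n");
 *
 * Safer form: pass __FUNCTION__ as an ordinary "%s" argument.
 * This macro needs at least one argument after fmt.
 */
#define DBG_FMT(buf, len, fmt, ...) \
	snprintf((buf), (len), "%s: " fmt, __FUNCTION__, __VA_ARGS__)

/* Hypothetical caller: writes "<function name>: probing port N" into buf. */
static int probe_example(char *buf, size_t len, int port)
{
	return DBG_FMT(buf, len, "probing port %d", port);
}
```

[The message is built into a buffer here only so the result can be
inspected; a driver would hand the same format string and arguments to its
usual logging call.]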
> Please tell me a way how to dual boot a system with the new host
> controller names between 2.4 and 2.5. Putting redundant alias lines in
> /etc/modules.conf didn't work.
Works for me :)
My modules.conf has the following two lines in it:
alias usb-controller usb-uhci
alias usb-controller uhci-hcd
Works just wonderfully for 2.2, 2.4, and 2.5.
thanks,
greg k-h
On Sat, Jul 27, 2002 at 07:47:39PM -0700, Greg KH wrote:
> On Sun, Jul 28, 2002 at 01:57:26AM +0200, Andries Brouwer wrote:
> > My third candidate is USB. Systems without USB are clearly more stable.
>
> Hm, then that would imply that all of my systems are unstable :)
>
> Seriously, I don't know of any outstanding 2.5 USB issues that cause
> oopses right now, or affect stability. Any problems that people are
> having, they sure are not telling me, or the other USB developers
> about...
I reported an oops at shutdown and provided the trivial fix.
It is in the standard kernel since 2.5.26, I think.
But there are still other oopses at shutdown for 2.5.27.
For 2.5.29 I reported
"> I booted 2.5.29 earlier this evening and was greeted by
> kernel BUG at transport.c: 351 and
> kernel BUG at scsiglue.c: 150.
> (And the usb-storage module now hangs initializing; rmmod fails,
> reboot is necessary.)"
Further improvement of usb-storage is possible.
Andries