2006-02-27 05:27:36

by Linus Torvalds

Subject: Linux v2.6.16-rc5


The tar-ball is being uploaded right now, and everything else should
already be pushed out. Mirroring might take a while, of course.

There's not much to say about this: people have been pretty good, and it's
just a random collection of fixes in various random areas. The shortlog is
actually pretty short, and it really describes the updates better than
anything else.

Have I missed anything? Holler. And please keep reminding about any
regressions since 2.6.15.

Linus

---

Adrian Bunk:
[AGPGART] help text updates
drivers/net/tlan.c: #ifdef CONFIG_PCI the PCI specific code

Al Viro:
GFP_KERNEL allocations in atomic (auditsc)
don't mangle INQUIRY if cmddt or evpd bits are set
fix handling of st_nlink on procfs root
m68k: restore disable_irq_nosync()
missing ntohs() in ip6_tunnel
m68k: pm_power_off() breakage
iomap_copy fallout (m68k)
sd: fix memory corruption with broken mode page headers

Alan Curry:
powerpc: fix altivec_unavailable_exception Oopses

Alessandro Zummo:
[ARM] 3342/1: NSLU2: Protect power button init routine with machine_is_nslu2()
[ARM] 3343/1: NAS100d: Fix incorrect I2C pin assignment
[ARM] 3344/1: NSLU2: beeper support

Alexey Dobriyan:
mm/mempolicy.c: fix 'if ();' typo
drivers/fc4/fc.c: memset correct length

Alexey Korolev:
cfi_cmdset_0001: fix range for cache invalidation

Andi Kleen:
x86_64: Don't set CONFIG_DEBUG_INFO in defconfig
Fix units in mbind check
x86_64: Only do the clustered systems have unsynchronized TSC assumption on IBM systems
x86-64/i386: Use common X86_PM_TIMER option and make it EMBEDDED
x86_64: Disable ACPI blacklist by year for now on x86-64
x86_64: Fix the additional_cpus=.. option
x86_64: Move the SMP time selection earlier
x86_64: Better ATI timer fix
x86_64: Fix ioctl compat code for /dev/rtc

Andreas Deresch:
i386: Handle non existing APICs without panicking

Andrew Morton:
ramfs: update dir mtime and ctime

Andrew Victor:
[ARM] 3325/2: GPIO function to control multi-drive (open collector) capability
[ARM] 3348/1: Disable GPIO interrupts

Anton Altaparmakov:
NTFS: Fix a potential overflow by casting (index + 1) to s64 before doing a
NTFS: - Cope with attribute list attribute having invalid flags.
NTFS: Implement support for sector sizes above 512 bytes (up to the maximum
NTFS: Do more detailed reporting of why we cannot mount read-write by

Anton Blanchard:
powerpc: Fix runlatch performance issues
powerpc64: remove broken/bitrotted HMT support

Antonino A. Daplas:
Fix pseudo_palette setup in asiliantfb_setcolreg()

Atsushi Nemoto:
[MIPS] Fixes for uaccess.h with gcc >= 4.0.1
[MIPS] jiffies_to_compat_timeval fix

Benjamin Herrenschmidt:
powermac: Fix loss of ethernet PHY on sleep

Björn Steinbrink:
kjournald keeps reference to namespace

Brian Magnuson:
fix build on x86_64 with !CONFIG_HOTPLUG_CPU

Carl-Daniel Hailfinger:
radeonfb: resume support for Samsung P35 laptops

Catalin Marinas:
[ARM] 3340/1: Fix the PCI setup for direct master access to SDRAM

Chris McDermott:
x86_64: Fix NMI watchdog on x460

Christoph Hellwig:
[SCSI] esp: fix eh locking

Christoph Lameter:
Terminate process that fails on a constrained allocation
page migration: Fix MPOL_INTERLEAVE behavior for migration via mbind()
vmscan: fix zone_reclaim

Daniel Yeisley:
i386: need to pass virtual address to smp_read_mpc()

Dave Airlie:
drm: fixup i915 interrupt on X server exit
drm: radeon add r300 TX_CNTL and verify bitblt packets
drm: fix brace placement

Dave Jones:
[AGPGART] Improve the error message shown when we detect a ServerWorks CNB20HE
[AGPGART] Add some informational printk to nforce GART failure path.
x86-64: react to new topology.c location

David S. Miller:
[SPARC64]: Implement futex_atomic_op_inuser().
[SPARC64]: Make cpu_present_map available earlier.

Eric Van Hensbergen:
v9fs: update documentation and fix debug flag

Francois Romieu:
r8169: fix broken ring index handling in suspend/resume
r8169: enable wake on lan

Frank Pavlic:
s390: V=V qdio fixes

Freddy Spierenburg:
au1100fb: replaced io_remap_page_range() with io_remap_pfn_range()

Greg Kroah-Hartman:
Revert mount/umount uevent removal

Haren Myneni:
powerpc: Trivial fix to set the proper timeout value for kdump

Heiko Carstens:
cpu hotplug documentation fix
s390: revert dasd eer module

Herbert Xu:
padlock: Fix typo that broke 256-bit keys
[XFRM]: Eliminate refcounting confusion by creating __xfrm_state_put().
[IPSEC]: Use TOS when doing tunnel lookups

Hirokazu Takata:
m32r: __cmpxchg_u32 fix
m32r: update sys_tas() routine
m32r: enable asm code optimization
m32r: fix and update for gcc-4.0

Hugh Dickins:
tmpfs: fix mount mpol nodelist parsing
tmpfs: recommend remount for mpol

Hugo Santos:
[IPV6] ip6_tunnel: release cached dst on change of tunnel params

Jamal Hadi Salim:
[NET] ethernet: Fix first packet goes out with MAC 00:00:00:00:00:00

James Bottomley:
voyager: fix boot panic by adding topology export
voyager: fix the cpu_possible_map to make voyager boot again
x86: fix broken SMP boot sequence
fix voyager after topology.c move

Jan Beulich:
x86_64: fix USER_PTRS_PER_PGD

Jean Tourrilhes:
[IRDA]: irda-usb bug fixes

Jon Mason:
x86_64: no_iommu removal in pci-gart.c

Juergen Kreileder:
Fix snd-usb-audio in 32-bit compat environment

Jun'ichi Nomura:
dm: missing bdput/thaw_bdev at removal
dm: free minor after unlink gendisk

Kaj-Michael Lang:
gbefb: IP32 gbefb depth change fix

Kelly Daly:
powerpc: disable OProfile for iSeries

Kumar Gala:
powerpc: Enable coherency for all pages on 83xx to fix PCI data corruption
powerpc: Fix mem= cmdline handling on arch/powerpc for !MULTIPLATFORM

Kurt Garloff:
OOM kill: children accounting

Linus Torvalds:
Make Kprobes depend on modules
Linux v2.6.16-rc5

Luke Yang:
Fix undefined symbols for nommu architecture

Marc Zyngier:
Fix Specialix SI probing

Martin Michlmayr:
[MIPS] Add support for TIF_RESTORE_SIGMASK for signal32
[MIPS] Make do_signal32 return void.
[MIPS] Fix compiler warnings in arch/mips/sibyte/bcm1480/irq.c
gbefb: Set default of FB_GBE_MEM to 4 MB

Michael Ellerman:
powerpc: Don't start secondary CPUs in a UP && KEXEC kernel
powerpc: Make UP -> SMP kexec work again
powerpc: Fix bug in spinup of renumbered secondary threads
powerpc: Initialise hvlpevent_queue.lock correctly
powerpc: Only calculate htab_size in one place for kexec

Michal Janusz Miroslaw:
[SERIAL] Trivial comment fix: include/linux/serial_reg.h

Michal Ostrowski:
Fix race condition in hvc console.

Mårten Wikström:
[ARM] 3347/1: Bugfix for ixp4xx_set_irq_type()

Olaf Hering:
powerpc: remove duplicate exports
ppc: fix adb breakage in xmon

Olof Johansson:
powerpc: Fix OOPS in lparcfg on G5
powerpc: Update {g5,pseries,ppc64}_defconfig

Paolo 'Blaisorblade' Giarrusso:
uml: correct error messages in COW driver
uml: fix usage of kernel_errno in place of errno
uml: fix ((unused)) attribute
uml: os_connect_socket error path fixup
uml: better error reporting for read_output
uml: tidying COW code

Patrick McHardy:
[XFRM]: Fix policy double put
[NETFILTER]: Fix NAT PMTUD problems
[NETFILTER]: Fix outgoing redirects to loopback
[NETFILTER]: Fix bridge netfilter related in xfrm_lookup

Paul Mackerras:
powerpc: Keep xtime and gettimeofday in sync

Pavel Machek:
suspend-to-ram: allow video options to be set at runtime

Pekka Enberg:
NTFS: We have struct kmem_cache now so use it instead of the typedef.

Peter Oberparleiter:
s390: dasd reference counting

Peter Osterlund:
pktcdvd: Correctly set rq->cmd_len in pkt_generic_packet()
pktcdvd: Rename functions and make their return values sane
pktcdvd: Remove useless printk statements
pktcdvd: Fix the logic in the pkt_writable_track function
pktcdvd: Only return -EROFS when appropriate

Prasanna S Panchamukhi:
Kprobes causes NX protection fault on i686 SMP

R Sharada:
powerpc64: fix spinlock recursion in native_hpte_clear

Ralf Baechle:
H8/300: CONFIG_CONFIG_ doesn't fly.
[MIPS] Make integer overflow exceptions in kernel mode fatal.
[MIPS] Reformat _sys32_rt_sigsuspend with tabs instead of space for consistency.
[MIPS] N32: Fix N32 rt_sigtimedwait and rt_sigsuspend breakage.
[MIPS] N32: Make sure pointer is good before passing it to sys_waitid().
[MIPS] Sibyte: #if CONFIG_* doesn't fly.
[MIPS] Sibyte: Config option names shouldn't be prefixed with CONFIG_
[MIPS] Follow Uli's latest *at syscall changes.
[MIPS] Yosemite: Fix build damage by dc8f6029cd51af1b148846a32e68d69013a5cc0f.
[MIPS] Disable CONFIG_ISCSI_TCP; it triggers a gcc 3.4 endless loop.

Rene Herman:
snd-cs4236 typo fix

Richard Lucassen:
[NET]: Increase default IFB device count.

Rojhalat Ibrahim:
[MIPS] Add topology_init.

Russell King:
[MMC] Fix mmc_cmd_type() mask
[ARM] Add panic-on-oops support
[ARM] Update mach-types
[ARM] CONFIG_CPU_MPCORE -> CONFIG_CPU_32v6K
[SERIAL] Add comment about early_serial_setup()

Samuel Thibault:
vgacon: no vertical resizing on EGA

Segher Boessenkool:
powerpc: Fix some MPIC + HT APIC buglets
powerpc: Don't re-assign PCI resources on Maple

Simon Vogl:
cfi: init wait queue in chip struct

Stefan Richter:
sbp2: fix another deadlock after disconnection
sbp2: variable status FIFO address (fix login timeout)
sbp2: update 36byte inquiry workaround (fix compatibility regression)

Stephen Hemminger:
sky2: yukon-ec-u chipset initialization
sky2: limit coalescing values to ring size
sky2: poke coalescing timer to fix hang
sky2: force early transmit status
sky2: use device iomem to access PCI config
sky2: close race on IRQ mask update.
skge: NAPI/irq race fix
skge: genesis phy initialization
skge: protect interrupt mask

Stephen Rothwell:
Fix compile for CONFIG_SYSVIPC=n or CONFIG_SYSCTL=n

Stephen Street:
spi: Fix modular master driver remove and device suspend/remove

Steve French:
CIFS: CIFSSMBRead was returning an invalid pointer in buf on socket error

Suresh Bhogavilli:
[IPV4]: Fix garbage collection of multipath route entries

Suresh Siddha:
x86_64: Check for bad elf entry address.

Takashi Iwai:
alsa: fix bogus snd_device_free() in opl3-oss.c

Tejun Heo:
libata: fix WARN_ON() condition in *_fill_sg()
libata: fix qc->n_elem == 0 case handling in ata_qc_next_sg
libata: make ata_sg_setup_one() trim zero length sg

Uli Luckas:
[ARM] 3345/1: Fix interday RTC alarms

Ulrich Drepper:
flags parameter for linkat

YOSHIFUJI Hideaki:
[NET]: NETFILTER: remove duplicated lines and fix order in skb_clone().
[IPV6]: Do not ignore IPV6_MTU socket option.

Zachary Amsden:
Fix topology.c location

Zhu Yi:
ipw2200: Suppress warning message


2006-02-27 05:51:12

by Jeff Garzik

Subject: Re: Linux v2.6.16-rc5

Linus Torvalds wrote:
> The tar-ball is being uploaded right now, and everything else should
> already be pushed out. Mirroring might take a while, of course.
>
> There's not much to say about this: people have been pretty good, and it's
> just a random collection of fixes in various random areas. The shortlog is
> actually pretty short, and it really describes the updates better than
> anything else.
>
> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

Yep, you missed the data corruption fix (libata) and oops fix (netdev)
that I sent at 5pm EST today...

And we may have to turn off FUA (barriers) before 2.6.16 goes out.

Jeff



2006-02-27 06:13:57

by Adrian Bunk

Subject: 2.6.16-rc5: known regressions

This email lists some known regressions in 2.6.16-rc5 compared to 2.6.15.

If you find your name in the Cc header, you are either the submitter of
one of the bugs, the maintainer of an affected subsystem or driver, the
author of a patch that was declared guilty of a breakage, or someone I
consider possibly involved in one or more of these issues in some other way.

Due to the huge number of recipients, this email has a Reply-To set.
Please add the appropriate people to the Cc when replying regarding one
of these issues.


Subject : usb_submit_urb(ctrl) failed on 2.6.16-rc4-git10 kernel
References : http://bugzilla.kernel.org/show_bug.cgi?id=6134
Submitter : Ryan Phillips
Status : unknown


Subject : Oops in Kernel 2.6.16-rc4 on Modprobe of saa7134.ko
References : http://lkml.org/lkml/2006/2/20/122
Submitter : Brian Marete
Status : unknown


Subject : saa7146: no devices created in /dev/dvb
References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=181063
http://lkml.org/lkml/2006/2/18/204
Submitter : Tom Seeley
Dave Jones
Handled-By : Jiri Slaby
Status : unknown


Subject : S3 sleep hangs the second time - 600X
References : http://bugzilla.kernel.org/show_bug.cgi?id=5989
Submitter : Sanjoy Mahajan
Handled-By : Luming Yu
Status : is being debugged,
we might want to change the default back for 2.6.16:
http://lkml.org/lkml/2006/2/25/101


Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
References : http://lkml.org/lkml/2006/2/20/159
Submitter : Mark Lord
Handled-By : Randy Dunlap
Status : one of Randy's patches seems to fix it


Subject : total ps2 keyboard lockup from boot
References : http://bugzilla.kernel.org/show_bug.cgi?id=6130
Submitter : Duncan
Handled-By : Dmitry Torokhov
Vojtech Pavlik
Status : discussion and debugging in the bug logs


Subject : psmouse starts losing sync in 2.6.16-rc2
References : http://lkml.org/lkml/2006/2/5/50
Submitter : Meelis Roos
Handled-By : Dmitry Torokhov
Status : Dmitry: Working on various manifestations of this one.
At worst we will have to disable resync by default
before 2.6.16 final is out and continue in 2.6.17 cycle.



cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-27 06:19:57

by Randy Dunlap

Subject: Re: Linux v2.6.16-rc5

On Mon, 27 Feb 2006 00:51:07 -0500 Jeff Garzik wrote:

> Linus Torvalds wrote:
> > The tar-ball is being uploaded right now, and everything else should
> > already be pushed out. Mirroring might take a while, of course.
> >
> > There's not much to say about this: people have been pretty good, and it's
> > just a random collection of fixes in various random areas. The shortlog is
> > actually pretty short, and it really describes the updates better than
> > anything else.
> >
> > Have I missed anything? Holler. And please keep reminding about any
> > regressions since 2.6.15.
>
> Yep, you missed the data corruption fix (libata) and oops fix (netdev)
> that I sent at 5pm EST today...
>
> And we may have to turn off FUA (barriers) before 2.6.16 goes out.

Jeff, were you planning to make atapi_enabled=1 be the default
for 2.6.16?

---
~Randy

2006-02-27 06:27:16

by Ryan Phillips

Subject: Re: 2.6.16-rc5: known regressions

Adrian Bunk wrote:
> This email lists some known regressions in 2.6.16-rc5 compared to 2.6.15.
>
> If you find your name in the Cc header, you are either the submitter of
> one of the bugs, the maintainer of an affected subsystem or driver, the
> author of a patch that was declared guilty of a breakage, or someone I
> consider possibly involved in one or more of these issues in some other way.
>
> Due to the huge number of recipients, this email has a Reply-To set.
> Please add the appropriate people to the Cc when replying regarding one
> of these issues.
>
>
> Subject : usb_submit_urb(ctrl) failed on 2.6.16-rc4-git10 kernel
> References : http://bugzilla.kernel.org/show_bug.cgi?id=6134
> Submitter : Ryan Phillips
> Status : unknown
>
>
*snipped
> Subject : total ps2 keyboard lockup from boot
> References : http://bugzilla.kernel.org/show_bug.cgi?id=6130
> Submitter : Duncan
> Handled-By : Dmitry Torokhov
> Vojtech Pavlik
> Status : discussion and debugging in the bug logs
>
>
*snipped


It appears that Duncan's "total ps2 keyboard lockup from boot" is the
same, or similar problem as mine.
2.6.15.1 kernel is working for me though.

-Ryan

2006-02-27 06:39:15

by Vojtech Pavlik

Subject: Re: 2.6.16-rc5: known regressions

On Sun, Feb 26, 2006 at 10:26:41PM -0800, Ryan Phillips wrote:
> Adrian Bunk wrote:
> > This email lists some known regressions in 2.6.16-rc5 compared to 2.6.15.
> >
> > If you find your name in the Cc header, you are either the submitter of
> > one of the bugs, the maintainer of an affected subsystem or driver, the
> > author of a patch that was declared guilty of a breakage, or someone I
> > consider possibly involved in one or more of these issues in some other way.
> >
> > Due to the huge number of recipients, this email has a Reply-To set.
> > Please add the appropriate people to the Cc when replying regarding one
> > of these issues.
> >
> >
> > Subject : usb_submit_urb(ctrl) failed on 2.6.16-rc4-git10 kernel
> > References : http://bugzilla.kernel.org/show_bug.cgi?id=6134
> > Submitter : Ryan Phillips
> > Status : unknown
> >
> >
> *snipped
> > Subject : total ps2 keyboard lockup from boot
> > References : http://bugzilla.kernel.org/show_bug.cgi?id=6130
> > Submitter : Duncan
> > Handled-By : Dmitry Torokhov
> > Vojtech Pavlik
> > Status : discussion and debugging in the bug logs
> >
> >
> *snipped
>
> It appears that Duncan's "total ps2 keyboard lockup from boot" is the
> same, or similar problem as mine.
> 2.6.15.1 kernel is working for me though.

Except one of the keyboards is USB and the other PS/2. Both are
Microsoft wireless, though.

--
Vojtech Pavlik
Director SuSE Labs

2006-02-27 06:52:57

by Jeff Garzik

Subject: Re: Linux v2.6.16-rc5

Randy.Dunlap wrote:
> On Mon, 27 Feb 2006 00:51:07 -0500 Jeff Garzik wrote:
>
>
>>Linus Torvalds wrote:
>>
>>>The tar-ball is being uploaded right now, and everything else should
>>>already be pushed out. Mirroring might take a while, of course.
>>>
>>>There's not much to say about this: people have been pretty good, and it's
>>>just a random collection of fixes in various random areas. The shortlog is
>>>actually pretty short, and it really describes the updates better than
>>>anything else.
>>>
>>>Have I missed anything? Holler. And please keep reminding about any
>>>regressions since 2.6.15.
>>
>>Yep, you missed the data corruption fix (libata) and oops fix (netdev)
>>that I sent at 5pm EST today...
>>
>>And we may have to turn off FUA (barriers) before 2.6.16 goes out.
>
>
> Jeff, were you planning to make atapi_enabled=1 be the default
> for 2.6.16?

It's far too late for that now.

Jeff



2006-02-27 06:54:35

by Jeff Garzik

Subject: Re: 2.6.16-rc5: known regressions

Adrian Bunk wrote:
> Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> References : http://lkml.org/lkml/2006/2/20/159
> Submitter : Mark Lord
> Handled-By : Randy Dunlap
> Status : one of Randy's patches seems to fix it


This is not a regression, libata suspend/resume has always been crappy.
It's under active development (by Randy, among others) to fix this.

Jeff


2006-02-27 07:08:31

by Adrian Bunk

Subject: Re: 2.6.16-rc5: known regressions

On Mon, Feb 27, 2006 at 01:54:17AM -0500, Jeff Garzik wrote:
> Adrian Bunk wrote:
> >Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> >References : http://lkml.org/lkml/2006/2/20/159
> >Submitter : Mark Lord
> >Handled-By : Randy Dunlap
> >Status : one of Randy's patches seems to fix it
>
>
> This is not a regression, libata suspend/resume has always been crappy.
> It's under active development (by Randy, among others) to fix this.

It might have always been crappy, but it is a regression, since
according to the submitter it works with 2.6.15.

> Jeff

cu
Adrian

2006-02-27 07:29:16

by Dave Jones

Subject: Re: Linux v2.6.16-rc5

On Sun, Feb 26, 2006 at 09:27:28PM -0800, Linus Torvalds wrote:

> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

We seem to have a nasty bio slab leak.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=183017
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182970

Two separate reports, both using raid1, sata_via and firewire.
Curiously, they're both on x86-64 too.

Will keep an eye open for other reports of this as they come in.

(The kernels they mention in those reports are fairly recent.
2.6.15-1.1977_FC5 is actually based on 2.6.16rc4-git6)

Dave


2006-02-27 07:42:39

by Dave Jones

Subject: Re: Linux v2.6.16-rc5

On Sun, Feb 26, 2006 at 09:27:28PM -0800, Linus Torvalds wrote:
>
> The tar-ball is being uploaded right now, and everything else should
> already be pushed out. Mirroring might take a while, of course.
>
> There's not much to say about this: people have been pretty good, and it's
> just a random collection of fixes in various random areas. The shortlog is
> actually pretty short, and it really describes the updates better than
> anything else.
>
> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

Those brave Fedora-rawhide testers have also hit an assortment of slab
related errors recently, manifesting in various ways including our old
friend the negative page_mapcount.

(From a 2.6.16rc4-git6 based kernel ...)
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182593

Unable to handle kernel paging request at virtual address 6363665f
printing eip:
00800000
*pde = 6b6b6b6b
Oops: 0000 [#1]
SMP
last sysfs file: /block/hda/hda1/size
Modules linked in: ppdev autofs4 sunrpc ip_conntrack_netbios_ns ipt_REJECT
xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables
video button battery ac ipv6 lp parport_pc parport floppy nvram sg sd_mod
usb_storage scsi_mod ehci_hcd uhci_hcd natsemi snd_intel8x0 snd_ac97_codec
snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd hw_random i2c_i801 soundcore
i2c_core snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd
CPU: 1
EIP: 0060:[<00800000>] Not tainted VLI
EFLAGS: 00210292 (2.6.15-1.1975_FC5smp #1)
EIP is at 0x800000
eax: 00000000 ebx: f7db2200 ecx: f0a6df5c edx: 00000010
esi: f7db2030 edi: f14e5004 ebp: 00000000 esp: f0a6dfa8
ds: 007b es: 007b ss: 0068
Process gnome-settings- (pid: 2565, threadinfo=f0a6d000 task=f1e23290)
Stack: <0>00000001 00000003 bff1f7f4 08966978 c0103d39 00000003 0000541b bff1f7f4
bff1f7f4 08966978 bff1f7a8 ffffffda c010007b 0000007b 00000036 001a5402
00000073 00200216 bff1f784 0000007b 00000000 00000000
Call Trace:
[<c0103d39>] syscall_call+0x7/0xb <0>Code: Bad EIP value.


kernel: slab error in cache_free_debugcheck(): cache `size-32': double free, or memory outside object was overwritten
kernel: [<c015fbc4>] cache_free_debugcheck+0xce/0x1b9 [<c0160b79>] free_block+0x141/0x17d
kernel: [<c0161619>] kmem_cache_free+0x2a/0x5c [<c0160b79>] free_block+0x141/0x17d
kernel: [<c0160c24>] drain_array_locked+0x6f/0x90 [<c0160cb9>] cache_reap+0x74/0x29c
kernel: [<c0131108>] run_workqueue+0x7f/0xba [<c0160c45>] cache_reap+0x0/0x29c
kernel: [<c01318f5>] worker_thread+0x0/0x117 [<c01319db>] worker_thread+0xe6/0x117
kernel: [<c011d6ab>] default_wake_function+0x0/0xc [<c0134149>] kthread+0x9d/0xc9
kernel: [<c01340ac>] kthread+0x0/0xc9 [<c0102005>]kernel_thread_helper+0x5/0xb
kernel: ee9979ac: redzone 1: 0x160fc2a5, redzone 2: 0x170fc2a5.
kernel: Eeek! page_mapcount(page) went negative! (-1)
kernel: page->flags = c001006c
kernel: page->count = 1
kernel: page->mapping = f68f1c00
kernel: ------------[ cut here ]------------
kernel: kernel BUG at mm/rmap.c:555!



There's a bunch more reports of slab related problems, but they're mostly against
kernels from a few weeks back, I'm trying to weed through them and get them
retested on current builds.

Dave


2006-02-27 08:12:42

by Paul Rolland

Subject: Re: Linux v2.6.16-rc5

Also missing: the fix I sent for the e1000 MII interface.

Paul

Paul Rolland, rol(at)as2917.net
ex-AS2917 Network administrator and Peering Coordinator

--

Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur
"Some people dream of success... while others wake up and work hard at it"

"I worry about my child and the Internet all the time, even though she's too
young to have logged on yet. Here's what I worry about. I worry that 10 or 15
years from now, she will come to me and say 'Daddy, where were you when they
took freedom of the press away from the Internet?'"
--Mike Godwin, Electronic Frontier Foundation



> -----Original Message-----
> From: Jeff Garzik
> Sent: Monday, February 27, 2006 6:51 AM
> To: Linus Torvalds
> Cc: Linux Kernel Mailing List
> Subject: Re: Linux v2.6.16-rc5
>
> Linus Torvalds wrote:
> > The tar-ball is being uploaded right now, and everything else should
> > already be pushed out. Mirroring might take a while, of course.
> >
> > There's not much to say about this: people have been pretty good, and it's
> > just a random collection of fixes in various random areas. The shortlog is
> > actually pretty short, and it really describes the updates better than
> > anything else.
> >
> > Have I missed anything? Holler. And please keep reminding about any
> > regressions since 2.6.15.
>
> Yep, you missed the data corruption fix (libata) and oops fix (netdev)
> that I sent at 5pm EST today...
>
> And we may have to turn off FUA (barriers) before 2.6.16 goes out.
>
> Jeff

2006-02-27 09:06:09

by Luming Yu

Subject: RE: 2.6.16-rc5: known regressions

>Subject : S3 sleep hangs the second time - 600X
>References : http://bugzilla.kernel.org/show_bug.cgi?id=5989
>Submitter : Sanjoy Mahajan
>Handled-By : Luming Yu
>Status : is being debugged,
> we might want to change the default back for 2.6.16:
> http://lkml.org/lkml/2006/2/25/101
>

According to the bug report, the BIOS DSDT has been modified.
I don't know how these changes affect suspend/resume, but it is
clear this is NOT the right approach to fixing the problem. Hence,
I need a test report with the unmodified DSDT on the TP 600X, BIOS 1.11.

--Luming

2006-02-27 09:14:52

by Duncan

Subject: Re: 2.6.16-rc5: known regressions (ps2 mouse/keyboard issues)

On Sunday 26 February 2006 23:39, Vojtech Pavlik wrote:
> On Sun, Feb 26, 2006 at 10:26:41PM -0800, Ryan Phillips wrote:
> > Adrian Bunk wrote:
> > > This email lists some known regressions in 2.6.16-rc5 compared to
> > > 2.6.15.
[snip]
> > > Subject : usb_submit_urb(ctrl) failed on 2.6.16-rc4-git10 kernel
> > > References : http://bugzilla.kernel.org/show_bug.cgi?id=6134
> > > Submitter : Ryan Phillips
> > > Status : unknown
[snip]
> > > Subject : total ps2 keyboard lockup from boot
> > > References : http://bugzilla.kernel.org/show_bug.cgi?id=6130
> > > Submitter : Duncan
> > > Handled-By : Dmitry Torokhov
> > > Vojtech Pavlik
> > > Status : discussion and debugging in the bug logs
> >
> > It appears that Duncan's "total ps2 keyboard lockup from boot" is the
> > same, or similar problem as mine.
> > 2.6.15.1 kernel is working for me though.
>
> Except one of the keyboards is USB and the other PS/2. Both are
> Microsoft wireless, though.

As Ryan observes in his bug, the keyboard and mouse were both plugged into the
ps2 ports. Same keyboard, both amd64, both with both the mouse and keyboard
plugged into the ps2 ports, both with a dead keyboard before rc1. As he says,
evidence suggests it is indeed the same bug. Also, we're both on Gentoo, but
that could simply be due to the fact that Gentoo folks are probably more
likely to be running rc or even git kernels than most, due to the type of
distribution it is and GregKH's recently adding mainline git snapshots to the
package tree to encourage quicker testing.

One discrepancy so far: Ryan mentions git10, implying it failed for him, while
git10 works here but git11 fails. My guess is that the USB error he
originally keyed in on, which turned out to be happening with working kernels
too, sent him down the wrong path, and git10 will end up working for him.
Either that or the root issue is something that changed in git10, and then
again in git11, killing his one day and mine the next, due to the mobo
differences or something.

--
Duncan - Plain text mail please, HTML mail filtered as spam
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."
Benjamin Franklin

2006-02-27 09:28:49

by Nick Piggin

Subject: Re: Linux v2.6.16-rc5

Dave Jones wrote:

> Those brave Fedora-rawhide testers have also hit an assortment of slab
> related errors recently, manifesting in various ways including our old
> friend the negative page_mapcount.
>
> (From a 2.6.16rc4-git6 based kernel ...)
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182593
>

From that report, the page flags indicate that the page is regular
old pagecache. I'd be very surprised if there is a mapcount bug here.

It looks like something is scribbling in memory, which I would
suspect first.

However if you do see new mm/rmap bugs, do keep posting them so we
can see if there is a pattern.

--
SUSE Labs, Novell Inc.

2006-02-27 11:31:54

by Jens Axboe

Subject: Re: Linux v2.6.16-rc5

On Mon, Feb 27 2006, Dave Jones wrote:
> On Sun, Feb 26, 2006 at 09:27:28PM -0800, Linus Torvalds wrote:
>
> > Have I missed anything? Holler. And please keep reminding about any
> > regressions since 2.6.15.
>
> We seem to have a nasty bio slab leak.
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=183017
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182970
>
> Two separate reports, both using raid1, sata_via and firewire.
> Curiously, they're both on x86-64 too.
>
> Will keep an eye open for other reports of this as they come in.
>
> (The kernels they mention in those reports are fairly recent.
> 2.6.15-1.1977_FC5 is actually based on 2.6.16rc4-git6)

This smells very much like a raid1 bio leak, I thought Neil had
diagnosed and fixed that already though - Neil?

--
Jens Axboe

2006-02-27 13:37:12

by Mark Lord

Subject: Re: 2.6.16-rc5: known regressions

Adrian Bunk wrote:
>
> Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> References : http://lkml.org/lkml/2006/2/20/159
> Submitter : Mark Lord
> Handled-By : Randy Dunlap
> Status : one of Randy's patches seems to fix it

I'm not certain about this. It may also have been broken in 2.6.15,
but it (resume) did work fine with 2.6.14. I've been using Randy's
patches with both 2.6.15 (since -rc?), and 2.6.16-rc.

Cheers

2006-02-27 18:11:36

by Francois Romieu

Subject: Re: Linux v2.6.16-rc5

Jeff Garzik :
[...]
> Yep, you missed the data corruption fix (libata) and oops fix (netdev)
> that I sent at 5pm EST today...

Expect a fix for a via-velocity bug when mtu > 1500 and a fix for
suspend/resume with the 8139cp driver later today.

--
Ueimor

2006-02-27 18:38:45

by Jeff Garzik

Subject: Re: Linux v2.6.16-rc5

Francois Romieu wrote:
> Jeff Garzik <[email protected]> :
> [...]
>
>>Yep, you missed the data corruption fix (libata) and oops fix (netdev)
>>that I sent at 5pm EST today...
>
>
> Expect a fix for a via-velocity bug when mtu > 1500 and a fix for
> suspend/resume with the 8139cp driver later today.

Cool, I'll send those with the e1000 fix that needs to go.

Jeff



2006-02-27 19:51:15

by Rene Herman

Subject: Re: Linux v2.6.16-rc5

Linus Torvalds wrote:

> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

This one isn't in: http://lkml.org/lkml/2006/2/21/7

Andrew did pick it up -- pnp-bus-type-fix.patch, named as being in the
2.6.16 queue in his 2.6.16-rc4-mm2 announce:

http://lkml.org/lkml/2006/2/24/66

so it's probably okay. The other two patches from that same thread
already made it into -rc5 though, so thought I'd ping anyway. It does
really want to make 2.6.16. Many ISA-PnP drivers are quite severely
broken without (it's also a regression against 2.6.15).

Rene.

2006-02-27 22:28:04

by Francois Romieu

Subject: Pull request for 'for-jeff' branch

Please pull from branch 'for-jeff' to get the changes below:

git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git

Shortlog
--------
$ git rev-list --pretty master..HEAD | git shortlog

Francois Romieu:
via-velocity: fix memory corruption when changing the mtu
8139cp: fix broken suspend/resume


Patch
-----
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index f822cd3..dd41049 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -1118,13 +1118,18 @@ err_out:
return -ENOMEM;
}

+static void cp_init_rings_index (struct cp_private *cp)
+{
+ cp->rx_tail = 0;
+ cp->tx_head = cp->tx_tail = 0;
+}
+
static int cp_init_rings (struct cp_private *cp)
{
memset(cp->tx_ring, 0, sizeof(struct cp_desc) * CP_TX_RING_SIZE);
cp->tx_ring[CP_TX_RING_SIZE - 1].opts1 = cpu_to_le32(RingEnd);

- cp->rx_tail = 0;
- cp->tx_head = cp->tx_tail = 0;
+ cp_init_rings_index(cp);

return cp_refill_rx (cp);
}
@@ -1886,30 +1891,30 @@ static int cp_suspend (struct pci_dev *p

spin_unlock_irqrestore (&cp->lock, flags);

- if (cp->pdev && cp->wol_enabled) {
- pci_save_state (cp->pdev);
- cp_set_d3_state (cp);
- }
+ pci_save_state(pdev);
+ pci_enable_wake(pdev, pci_choose_state(pdev, state), cp->wol_enabled);
+ pci_set_power_state(pdev, pci_choose_state(pdev, state));

return 0;
}

static int cp_resume (struct pci_dev *pdev)
{
- struct net_device *dev;
- struct cp_private *cp;
+ struct net_device *dev = pci_get_drvdata (pdev);
+ struct cp_private *cp = netdev_priv(dev);
unsigned long flags;

- dev = pci_get_drvdata (pdev);
- cp = netdev_priv(dev);
+ if (!netif_running(dev))
+ return 0;

netif_device_attach (dev);
-
- if (cp->pdev && cp->wol_enabled) {
- pci_set_power_state (cp->pdev, PCI_D0);
- pci_restore_state (cp->pdev);
- }
-
+
+ pci_set_power_state(pdev, PCI_D0);
+ pci_restore_state(pdev);
+ pci_enable_wake(pdev, PCI_D0, 0);
+
+ /* FIXME: sh*t may happen if the Rx ring buffer is depleted */
+ cp_init_rings_index (cp);
cp_init_hw (cp);
netif_start_queue (dev);

diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index c2d5907..ed1f837 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -1106,6 +1106,9 @@ static void velocity_free_rd_ring(struct

for (i = 0; i < vptr->options.numrx; i++) {
struct velocity_rd_info *rd_info = &(vptr->rd_info[i]);
+ struct rx_desc *rd = vptr->rd_ring + i;
+
+ memset(rd, 0, sizeof(*rd));

if (!rd_info->skb)
continue;
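
Both patches come down to keeping the driver's view of a descriptor ring consistent with the hardware's. A hedged, userspace-only sketch of the two ideas follows; the `toy_` types are hypothetical stand-ins, not the real `cp_private` or `velocity_rd_info` structures. It resets the software ring indices when the chip is reinitialised (as `cp_init_rings_index()` does before `cp_init_hw()` on resume) and clears every Rx descriptor while the ring is torn down (as the `velocity_free_rd_ring()` change does).

```c
#include <string.h>

#define TOY_RING_SIZE 8

/* Hypothetical descriptor: 'owned_by_nic' stands in for an OWN bit,
   'buf' for the DMA buffer address the NIC would write into. */
struct toy_desc {
	unsigned int owned_by_nic;
	void *buf;
};

struct toy_ring {
	struct toy_desc desc[TOY_RING_SIZE];
	unsigned int rx_tail;		/* next Rx descriptor the driver inspects */
	unsigned int tx_head, tx_tail;	/* next Tx slot to fill / to reclaim */
};

/* Same idea as cp_init_rings_index(): once the chip has been
   reinitialised it restarts at descriptor 0, so the driver's
   indices must restart there too. */
static void toy_init_rings_index(struct toy_ring *r)
{
	r->rx_tail = 0;
	r->tx_head = r->tx_tail = 0;
}

/* Same idea as the velocity_free_rd_ring() change: wipe every
   descriptor while tearing the ring down, so no entry is left marked
   as owned by the NIC or pointing at a buffer about to be freed. */
static void toy_free_rx_ring(struct toy_ring *r)
{
	unsigned int i;

	for (i = 0; i < TOY_RING_SIZE; i++) {
		memset(&r->desc[i], 0, sizeof(r->desc[i]));
		/* the real driver frees the attached skb at this point */
	}
}
```

In the 8139cp case the index reset matters because `cp_init_hw()` reprograms the ring base addresses, so stale `rx_tail`/`tx_head` values would no longer match where the chip starts.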

2006-02-27 22:43:45

by NeilBrown

Subject: Re: Linux v2.6.16-rc5

On Monday February 27, [email protected] wrote:
> On Mon, Feb 27 2006, Dave Jones wrote:
> > On Sun, Feb 26, 2006 at 09:27:28PM -0800, Linus Torvalds wrote:
> >
> > > Have I missed anything? Holler. And please keep reminding about any
> > > regressions since 2.6.15.
> >
> > We seem to have a nasty bio slab leak.
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=183017
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=182970
> >
> > Two separate reports, both using raid1, sata_via and firewire
> > Curiously, they're both on x86-64 too.
> >
> > Will keep an eye open for other reports of this as they come in.
> >
> > (The kernels they mention in those reports are fairly recent.
> > 2.6.15-1.1977_FC5 is actually based on 2.6.16rc4-git6)
>
> This smells very much like a raid1 bio leak, I thought Neil had
> diagnosed and fixed that already though - Neil?

It certainly does smell like a raid1 bio leak, and we have had those
before, but I've looked over the relevant code several times and cannot
find one. And my test machine doesn't show a leak.

There are some different code paths depending on whether the
underlying devices support BIO_RW_BARRIER or not, so my testing isn't
conclusive - I think my devices do support BIO_RW_BARRIER so it could
just happen where BIO_RW_BARRIER isn't supported .... but the code
still looks good.

There are new code paths to handle auto-correcting read errors, and
they probably haven't been exercised as much as I would like (some,
but not lots and lots) so maybe there is an issue there, but nobody is
reporting disk errors along with the bio leak, and given the size of
the leak, it would need to be lots of errors.

I think we need to narrow down where the problem was introduced. The
current:

2.6.14.7 works,
2.6.16-rc4 doesn't

is too broad.

NeilBrown

2006-02-27 22:52:35

by Andrew Morton

Subject: Re: Linux v2.6.16-rc5

Rene Herman <[email protected]> wrote:
>
> Linus Torvalds wrote:
>
> > Have I missed anything? Holler. And please keep reminding about any
> > regressions since 2.6.15.
>
> This one isn't in: http://lkml.org/lkml/2006/2/21/7
>
> Andrew did pick it up -- pnp-bus-type-fix.patch, named as being in the
> 2.6.16 queue in his 2.6.16-rc4-mm2 announce:
>
> http://lkml.org/lkml/2006/2/24/66
>
> so it's probably okay. The other two patches from that same thread
> already made it into -rc5 though, so thought I'd ping anyway. It does
> really want to make 2.6.16. Many ISA-PnP drivers are quite severely
> broken without (it's also a regression against 2.6.15).
>

Problem is, that patch was just a "here, try this" thing which Adam slung
onto the mailing list - I have no idea whether it was complete or final or
whether he wants it in 2.6.16 or what. No indication of what problem it's
fixing, nor how, nor what risk there is of breaking something else. It's
just a lonely little diff at present.

2006-02-27 23:31:17

by Rene Herman

Subject: Re: Linux v2.6.16-rc5

Andrew Morton wrote:

> Rene Herman <[email protected]> wrote:

>> This one isn't in: http://lkml.org/lkml/2006/2/21/7
>>
>> Andrew did pick it up -- pnp-bus-type-fix.patch, named as being in the
>> 2.6.16 queue in his 2.6.16-rc4-mm2 announce:
>>
>> http://lkml.org/lkml/2006/2/24/66
>>
>> so it's probably okay. The other two patches from that same thread
>> already made it into -rc5 though, so thought I'd ping anyway. It does
>> really want to make 2.6.16. Many ISA-PnP drivers are quite severely
>> broken without (it's also a regression against 2.6.15).
>>
>
> Problem is, that patch was just a "here, try this" thing which Adam slung
> onto the mailing list - I have no idea whether it was complete or final or
> whether he wants it in 2.6.16 or what.

Adam? But something will need to go in. At the moment an entire bus
subsystem appears to be broken.

> No indication of what problem it's fixing, nor how, nor what risk
> there is of breaking something else. It's just a lonely little diff
> at present.

The problem it fixed for me was that the CS4236 ALSA driver's private
PnP remove method was not being called at modprobe -r, which meant that
the card wasn't being freed at all, resulting in memory leaks, the
inability to reload the driver, and oopses, during modprobe -r and reboot.

All ALSA ISA card drivers, not just CS4236, use the same interface to
PnP (the pnp_card_driver struct) meaning they would all appear to be
broken in that exact same way as well. Or rather, _any_ ISA-PnP driver
using that pnp_card_driver interface (there's also drivers using the
pnp_driver interface -- those appear to be okay). CS4236 isn't doing
anything special...

The problem seems to be caused by the "bustype" driver model changes in
2.6.16, the same ones that made the sensors drivers complain about
private methods versus bustype methods which was fixed in -rc2. Adam
said that in fact not so much the teardown was broken, but the setup,
and the patch replaces a subsystem probe method with a bustype method.
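
The precedence rule behind those "bustype" changes can be modelled in a few lines. This is a simplified userspace sketch, not the driver-core source: the `toy_` structures and demo hooks are hypothetical, and it assumes the 2.6.16 rule that a `bus_type`'s probe/remove is called in preference to the driver's own method. It illustrates why drivers relying on private methods needed converting; it does not attempt to reproduce the exact PnP card-link failure.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct bus_type,
   struct device_driver and struct device. */
struct toy_device;

struct toy_bus_type {
	int (*probe)(struct toy_device *dev);
	void (*remove)(struct toy_device *dev);
};

struct toy_driver {
	int (*probe)(struct toy_device *dev);
	void (*remove)(struct toy_device *dev);
};

struct toy_device {
	const struct toy_bus_type *bus;
	const struct toy_driver *drv;
};

/* Bind: a bus-level probe wins; the driver's own probe is a fallback. */
static int toy_really_probe(struct toy_device *dev)
{
	if (dev->bus->probe)
		return dev->bus->probe(dev);
	if (dev->drv->probe)
		return dev->drv->probe(dev);
	return 0;
}

/* Unbind: same precedence for remove. */
static void toy_release_driver(struct toy_device *dev)
{
	if (dev->bus->remove)
		dev->bus->remove(dev);
	else if (dev->drv->remove)
		dev->drv->remove(dev);
}

/* Demo hooks that only count how often they are called. */
static int bus_probe_calls, drv_probe_calls;
static int bus_remove_calls, drv_remove_calls;

static int demo_bus_probe(struct toy_device *dev) { (void)dev; bus_probe_calls++; return 0; }
static int demo_drv_probe(struct toy_device *dev) { (void)dev; drv_probe_calls++; return 0; }
static void demo_bus_remove(struct toy_device *dev) { (void)dev; bus_remove_calls++; }
static void demo_drv_remove(struct toy_device *dev) { (void)dev; drv_remove_calls++; }
```

With both a bus hook and a driver hook present, only the bus hook runs in this model; clearing the bus hooks makes dispatch fall back to the driver's methods.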

As to the risk of it breaking anything else... I doubt it. Given that
the old method did not work _at all_ it seems this is simply the way to
do this.

Adam of course is the PnP expert though...

Rene.

2006-02-28 01:04:17

by Rene Herman

Subject: Re: Linux v2.6.16-rc5

Rene Herman wrote:

> All ALSA ISA card drivers, not just CS4236, use the same interface to
> PnP (the pnp_card_driver struct) meaning they would all appear to be
> broken in that exact same way as well. Or rather, _any_ ISA-PnP driver
> using that pnp_card_driver interface (there's also drivers using the
> pnp_driver interface -- those appear to be okay). CS4236 isn't doing
> anything special...

If it helps any, I can at least confirm that it's nothing ALSA or CS4236
specific. This is a minimal, skeleton, pnp_card driver:

=== foo.c

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pnp.h>

MODULE_LICENSE("GPL");

static struct pnp_card_device_id foo_pnp_card_device_id_table[] = {
	{ .id = "CSCa836", .devs = { { "CSCa800" } } },
	/* --- */
	{ .id = "" }
};

MODULE_DEVICE_TABLE(pnp_card, foo_pnp_card_device_id_table);

static int foo_pnp_probe(struct pnp_card_link *pcard,
			 const struct pnp_card_device_id *pid)
{
	struct pnp_dev *pdev;

	printk(KERN_INFO "%s\n", __FUNCTION__);

	pdev = pnp_request_card_device(pcard, pid->devs[0].id, NULL);
	if (!pdev || pnp_activate_dev(pdev) < 0)
		return -ENODEV;

	/* allocate, enable. */

	return 0;
}

static void foo_pnp_remove(struct pnp_card_link *pcard)
{
	printk(KERN_INFO "%s\n", __FUNCTION__);

	/* disable, deallocate. */
}

static struct pnp_card_driver foo_pnp_card_driver = {
	.name		= "foo",
	.id_table	= foo_pnp_card_device_id_table,
	.flags		= PNP_DRIVER_RES_DISABLE,
	.probe		= foo_pnp_probe,
	.remove		= foo_pnp_remove
};

static int __init foo_init(void)
{
	return pnp_register_card_driver(&foo_pnp_card_driver);
}

static void __exit foo_exit(void)
{
	pnp_unregister_card_driver(&foo_pnp_card_driver);
}

module_init(foo_init);
module_exit(foo_exit);

===

compile with

=== Makefile

ifneq ($(KERNELRELEASE),)

obj-m := foo.o

else

default:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(shell pwd)

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(shell pwd) clean

endif

===

This of course needs ISA-PnP support in the kernel, and actually loading
it requires replacing the PnP IDs with IDs actually present (these are
from my CS4236 soundcard).

With 2.6.15.4 and with 2.6.16-rc with Adam's fix applied, an "insmod
foo.ko && rmmod foo" shows the following in dmesg (this needs the PnP
debug messages selectable in menuconfig):

pnp: the driver 'foo' has been registered
foo_pnp_probe
pnp: match found with the PnP device '01:01.00' and the driver 'foo'
pnp: Device 01:01.00 activated.
foo_pnp_remove
pnp: Device 01:01.00 disabled.
pnp: the driver 'foo' has been unregistered

which is as it should be. On 2.6.16-rc without Adam's fix, both the
"pnp: match found with" and the "foo_pnp_remove" lines are missing:

pnp: the driver 'foo' has been registered
foo_pnp_probe
pnp: Device 01:01.00 activated.
pnp: Device 01:01.00 disabled.
pnp: the driver 'foo' has been unregistered

Of course, with this skeleton driver that's not much of a problem, but
in real drivers it certainly is; in pnp_remove you'd deactivate and
deallocate anything that was allocated and activated in/through the
pnp_probe method -- all things associated with this instance of the
card, normally.

I can also confirm that a driver using the "pnp_driver" interface isn't
affected by the bug. Same skeleton-type driver:

=== bar.c

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pnp.h>

MODULE_LICENSE("GPL");

static struct pnp_device_id bar_pnp_device_id_table[] = {
	{ .id = "CSCa800" },
	/* --- */
	{ .id = "" }
};

MODULE_DEVICE_TABLE(pnp, bar_pnp_device_id_table);

static int bar_pnp_probe(struct pnp_dev *pdev,
			 const struct pnp_device_id *pid)
{
	printk(KERN_INFO "%s\n", __FUNCTION__);

	if (pnp_activate_dev(pdev) < 0)
		return -ENODEV;

	/* allocate, enable. */

	return 0;
}

static void bar_pnp_remove(struct pnp_dev *pdev)
{
	printk(KERN_INFO "%s\n", __FUNCTION__);

	/* disable, deallocate. */
}

static struct pnp_driver bar_pnp_driver = {
	.name		= "bar",
	.id_table	= bar_pnp_device_id_table,
	.flags		= PNP_DRIVER_RES_DISABLE,
	.probe		= bar_pnp_probe,
	.remove		= bar_pnp_remove
};

static int __init bar_init(void)
{
	return pnp_register_driver(&bar_pnp_driver);
}

static void __exit bar_exit(void)
{
	pnp_unregister_driver(&bar_pnp_driver);
}

module_init(bar_init);
module_exit(bar_exit);

===

2.6.15.4, 2.6.16-rc with or without Adam's fix:

pnp: the driver 'bar' has been registered
pnp: match found with the PnP device '01:01.00' and the driver 'bar'
bar_pnp_probe
pnp: Device 01:01.00 activated.
bar_pnp_remove
pnp: Device 01:01.00 disabled.
pnp: the driver 'bar' has been unregistered

So that's all fine. As said though, all ALSA drivers for one are using
the card_driver interface, and are therefore all broken currently.

Rene.

2006-02-28 01:13:42

by Andrew Morton

Subject: Re: Linux v2.6.16-rc5

Rene Herman <[email protected]> wrote:
>
> On 2.6.16-rc without Adam's fix, both the
> "pnp: match found with" and the "foo_pnp_remove" lines are missing:

Useful, thanks. Hopefully we'll hear from Adam in the next day or two
(he's intermittent lately). If not, I guess we'll need to jam the patch in
anyway.

In which case we might as well jam it in now, so we get more testing.

2006-02-28 09:38:55

by Peter Hagervall

Subject: Re: Linux v2.6.16-rc5 - regression

In -rc5 the printk timing numbers do not reset to [ 0.000000] upon
boot. This worked in -rc4 and so I started bisecting and git came up
with:

commit 9827b781f20828e5ceb911b879f268f78fe90815
Author: Kurt Garloff <[email protected]>
Date: Mon Feb 20 18:27:51 2006 -0800

[PATCH] OOM kill: children accounting

I can't see why that would break the timing information, but I'll just
assume that git was right, and tell you guys.

My system is:
Linux sap 2.6.16-rc4 #1 PREEMPT Mon Feb 20 13:34:18 CET 2006 i686
Intel(R) Pentium(R) 4 CPU 2.00GHz GenuineIntel GNU/Linux

Let me know if more information is needed.

Peter Hagervall

--
Peter Hagervall......................email: [email protected]
Department of Computing Science........tel: +46(0)90 786 7018
University of Umeå, SE-901 87 Umeå.....fax: +46(0)90 786 6126

2006-02-28 09:44:34

by Jens Axboe

Subject: Re: 2.6.16-rc5: known regressions

On Mon, Feb 27 2006, Adrian Bunk wrote:
> On Mon, Feb 27, 2006 at 01:54:17AM -0500, Jeff Garzik wrote:
> > Adrian Bunk wrote:
> > >Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> > >References : http://lkml.org/lkml/2006/2/20/159
> > >Submitter : Mark Lord <[email protected]>
> > >Handled-By : Randy Dunlap <[email protected]>
> > >Status : one of Randy's patches seems to fix it
> >
> >
> > This is not a regression, libata suspend/resume has always been crappy.
> > It's under active development (by Randy, among others) to fix this.
>
> It might have always been crappy, but it is a regression since
> according to the submitter it is working with 2.6.15.

It might have worked under lucky circumstances with an idle disk and a
goat sacrifice, so I agree with Jeff that this is definitely not a
regression. To my knowledge, Mark always used my libata suspend patch on
earlier kernels so it's not even an apples-apples comparison.

So please scratch that entry.

--
Jens Axboe

2006-02-28 10:04:53

by Andrew Morton

Subject: Re: Linux v2.6.16-rc5 - regression

Peter Hagervall <[email protected]> wrote:
>
> In -rc5 the printk timing numbers do not reset to [ 0.000000] upon
> boot.

What numbers are you getting now?

> This worked in -rc4 and so I started bisecting and git came up
> with:
>
> commit 9827b781f20828e5ceb911b879f268f78fe90815
> Author: Kurt Garloff <[email protected]>
> Date: Mon Feb 20 18:27:51 2006 -0800
>
> [PATCH] OOM kill: children accounting
>
> I can't see why that would break the timing information, but I'll just
> assume that git was right, and tell you guys.

Well yes, it'll be something else - perhaps some TSC change or something.
We'd need to know what architecture you're using...

Anyway, these numbers aren't supposed to measure anything absolute like
uptime - they're purely for relative timing. It would be nice to get them
increasing monotonically from zero, but we wouldn't bust a gut to achieve
that - it's just a debugging thing.

2006-02-28 11:41:42

by Peter Hagervall

Subject: Re: Linux v2.6.16-rc5 - regression

On Tue, Feb 28, 2006 at 02:03:36AM -0800, Andrew Morton wrote:
> Peter Hagervall <[email protected]> wrote:
> >
> > In -rc5 the printk timing numbers do not reset to [ 0.000000] upon
> > boot.
>
> What numbers are you getting now?
>

[4294667.296000] and upwards.

> > This worked in -rc4 and so I started bisecting and git came up
> > with:
> >
> > commit 9827b781f20828e5ceb911b879f268f78fe90815
> > Author: Kurt Garloff <[email protected]>
> > Date: Mon Feb 20 18:27:51 2006 -0800
> >
> > [PATCH] OOM kill: children accounting
> >
> > I can't see why that would break the timing information, but I'll just
> > assume that git was right, and tell you guys.
>
> Well yes, it'll be something else - perhaps some TSC change or something.
> We'd need to know what architecture you're using...

sap ~ $ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping : 4
cpu MHz : 1994.176
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3992.49

>
> Anyway, these numbers aren't supposed to measure anything absolute like
> uptime - they're purely for relative timing. It would be nice to get them
> increasing monotonically from zero, but we wouldn't bust a gut to achieve
> that - it's just a debugging thing.

Yeah, it's not a showstopper or anything, just thought I'd pipe up.

Peter Hagervall

2006-02-28 11:49:27

by Peter Hagervall

Subject: Re: Linux v2.6.16-rc5 - regression

> On Tue, Feb 28, 2006 at 02:03:36AM -0800, Andrew Morton wrote:
> >
> > Well yes, it'll be something else - perhaps some TSC change or something.

Looking closer I see that CONFIG_X86_PM_TIMER defaults to y in
2.6.16-rc5, whereas I have had it unset in earlier kernels.
This changed silently when I ran 'make oldconfig', and is most likely
the source of this "problem".
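
The reported value fits that diagnosis. A hedged worked example, assuming HZ=1000 and a jiffies-based fallback clock (as i386 `sched_clock()` appears to use when the TSC is not selected): jiffies starts at `INITIAL_JIFFIES`, 300 seconds before the 32-bit wrap, and one jiffy is 1 ms, so the first timestamp is (2^32 - 300 * 1000) ms = 4294667296 ms = 4294667.296 s. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Mirrors the INITIAL_JIFFIES definition: jiffies starts 300 seconds
   before the 32-bit counter wraps, so wrap bugs surface early. */
static uint64_t initial_jiffies(unsigned int hz)
{
	return (uint64_t)(uint32_t)(-300 * (int)hz);
}

/* A jiffies-based clock in nanoseconds that does not subtract the
   initial offset (assumed fallback behaviour, see the lead-in). */
static uint64_t jiffies_clock_ns(uint64_t jiffies, unsigned int hz)
{
	return jiffies * (1000000000u / hz);
}

/* Split nanoseconds into the "[seconds.microseconds]" printk form. */
static void printk_stamp(uint64_t ns, uint64_t *sec, uint64_t *usec)
{
	*sec = ns / 1000000000u;
	*usec = (ns % 1000000000u) / 1000u;
}
```

With a different HZ the starting value differs, so the exact number depends on the configured tick rate.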

Peter Hagervall

2006-02-28 12:43:17

by Christoph Hellwig

Subject: Re: Linux v2.6.16-rc5

On Sun, Feb 26, 2006 at 09:27:28PM -0800, Linus Torvalds wrote:
>
> The tar-ball is being uploaded right now, and everything else should
> already be pushed out. Mirroring might take a while, of course.
>
> There's not much to say about this: people have been pretty good, and it's
> just a random collection of fixes in various random areas. The shortlog is
> actually pretty short, and it really describes the updates better than
> anything else.
>
> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

We still have a regression from 2.6.15 in the megaraid_sas driver.

We started sending down all requests as scatter/gather lists after 2.6.15,
and the (broken) way megaraid_sas tried to hide the physical disks ceased
to work. Now the driver exposes all physical disks, which confuses installers
to no end and could trick people into writing to them, which would badly
corrupt the controller's internal state.

To fix this properly the scsi midlayer needs to handle the ->slave_configure
return value. The patch for that is pretty trivial, but could in theory
cause problems if an existing driver returns something bogus from
->slave_configure. Both the core patch and the actual megaraid_sas fix
are in James' scsi-rc-fixes tree, so if you pull that once more we should
be done with this.
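
The midlayer change described above amounts to "if the low-level driver's ->slave_configure() fails, don't expose the device". A simplified userspace model follows; the `toy_` names are hypothetical, and both the -ENXIO return value and the channel cutoff are assumptions for illustration, not taken from the megaraid_sas patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct scsi_device. */
struct toy_sdev {
	unsigned int channel;
	bool is_disk;
};

/* Assumption for illustration: physical disks appear on channels
   below this cutoff, logical drives on higher channels. */
#define TOY_PD_CHANNELS 2u

/* Driver side: refuse (hide) physical disks, accept everything else. */
static int toy_slave_configure(struct toy_sdev *sdev)
{
	if (sdev->channel < TOY_PD_CHANNELS && sdev->is_disk)
		return -ENXIO;
	return 0;
}

/* Midlayer side: only expose the device if the hook accepted it.
   A midlayer that ignores the return value exposes it regardless. */
static bool toy_scan_add_device(struct toy_sdev *sdev,
				int (*slave_configure)(struct toy_sdev *))
{
	if (slave_configure && slave_configure(sdev) != 0)
		return false;
	return true;
}
```

Passing a NULL hook models a driver that does not implement ->slave_configure at all, which keeps the old behaviour for such drivers.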

2006-02-28 14:56:44

by Pavel Machek

Subject: Re: 2.6.16-rc5: known regressions

On Mon 27-02-06 07:13:54, Adrian Bunk wrote:
>
> Subject : S3 sleep hangs the second time - 600X
> References : http://bugzilla.kernel.org/show_bug.cgi?id=5989
> Submitter : Sanjoy Mahajan <[email protected]>
> Handled-By : Luming Yu <[email protected]>
> Status : is being debugged,
> we might want to change the default back for 2.6.16:
> http://lkml.org/lkml/2006/2/25/101

Luming's call, but ec_intr apparently fixed some machines, too.

> Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> References : http://lkml.org/lkml/2006/2/20/159
> Submitter : Mark Lord <[email protected]>
> Handled-By : Randy Dunlap <[email protected]>
> Status : one of Randy's patches seems to fix it

Is this really regression?

--
Thanks, Sharp!

2006-03-01 00:16:09

by Randy Dunlap

Subject: Re: 2.6.16-rc5: known regressions

On Tue, 28 Feb 2006 10:40:53 +0100 Jens Axboe wrote:

> On Mon, Feb 27 2006, Adrian Bunk wrote:
> > On Mon, Feb 27, 2006 at 01:54:17AM -0500, Jeff Garzik wrote:
> > > Adrian Bunk wrote:
> > > >Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> > > >References : http://lkml.org/lkml/2006/2/20/159
> > > >Submitter : Mark Lord <[email protected]>
> > > >Handled-By : Randy Dunlap <[email protected]>
> > > >Status : one of Randy's patches seems to fix it
> > >
> > >
> > > This is not a regression, libata suspend/resume has always been crappy.
> > > It's under active development (by Randy, among others) to fix this.
> >
> > It might have always been crappy, but it is a regression since
> > according to the submitter it is working with 2.6.15.
>
> It might have worked under lucky circumstances with an idle disk and a
> goat sacrifice, so I agree with Jeff that this is definitely not a
> regression. To my knowledge, Mark always used my libata suspend patch on
> earlier kernels so it's not even an apples-apples comparison.
>
> So please scratch that entry.

I'll third that request/comment.

---
~Randy

2006-03-02 14:00:54

by Mauro Carvalho Chehab

Subject: Re: [v4l-dvb-maintainer] 2.6.16-rc5: known regressions


> Subject : Oops in Kernel 2.6.16-rc4 on Modprobe of saa7134.ko
> References : http://lkml.org/lkml/2006/2/20/122
> Submitter : Brian Marete <[email protected]>
> Status : unknown

This is not a regression, since the user is not configuring saa7134 with
the right card.

Cheers,
Mauro.

2006-03-03 02:59:30

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions

>> Subject : S3 sleep hangs the second time - 600X
>> References : http://bugzilla.kernel.org/show_bug.cgi?id=5989

From: "Yu, Luming" <[email protected]>
> According to bug report, the BIOS DSDT is modified. I don't know
> how these changes affect the results of suspend/resume. But, it is
> clear this is NOT right approach to fix problem. Hence, I need the
> testing report with un-modified DSDT on TP 600X, bios 1.11.

I'll try it, although I don't think I'll get any data on the problem.
The unmodified DSDT (bios 1.11) lacks an S3 sleep object, so I had to
modify the DSDT even to get S3 to sleep at all. See
<http://bugzilla.kernel.org/show_bug.cgi?id=3534> for that discussion.
In additional comment #4 there (2004-10-14), you said:

The root cause of [the missing S3 object] failure is that linux is
using element in

const char *acpi_gbl_sleep_state_names[ACPI_S_STATE_COUNT] =
{
	"\\_S0_",
	"\\_S1_",
	"\\_S2_",
	"\\_S3_",
	"\\_S4_",
	"\\_S5_"
};

to call acpi_get_sleep_type_data, but your box defines _S3 under the
device PNP0A03. So evaluating \_S3 will fail.

The workaround in DSDT is to change _S3 to \_S3_ .
We can fix it in acpi driver soon.

It looks unchanged in a recent acpi driver
(drivers/acpi/utilities/utglobal.c, line 170, 2.6.16-rc2), so I
suspect S3 won't happen with the vanilla DSDT.

(Sorry, I was away for 10 days and also just saw your info requests in
the bugme #5989.)

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.

2006-03-03 04:49:16

by Luming Yu

Subject: RE: 2.6.16-rc5: known regressions


>
>>> Subject : S3 sleep hangs the second time - 600X
>>> References : http://bugzilla.kernel.org/show_bug.cgi?id=5989
>
>From: "Yu, Luming" <[email protected]>
>> According to bug report, the BIOS DSDT is modified. I don't know
>> how these changes affect the results of suspend/resume. But, it is
>> clear this is NOT right approach to fix problem. Hence, I need the
>> testing report with un-modified DSDT on TP 600X, bios 1.11.
>
>I'll try it, although I don't think I'll get any data on the problem.
>The unmodified DSDT (bios 1.11) lacks an S3 sleep object, so I had to
>modify the DSDT even to get S3 to sleep at all. See
><http://bugzilla.kernel.org/show_bug.cgi?id=3534> for that discussion.
>In additional comment #4 there (2004-10-14), you said:
>
> The root cause of [the missing S3 object] failure is that linux is
> using element in
>
> const char *acpi_gbl_sleep_state_names[ACPI_S_STATE_COUNT] =
> {
> 	"\\_S0_",
> 	"\\_S1_",
> 	"\\_S2_",
> 	"\\_S3_",
> 	"\\_S4_",
> 	"\\_S5_"
> };
>
> to call acpi_get_sleep_type_data, but your box defines _S3 under the
> device PNP0A03. So evaluating \_S3 will fail.
>
> The workaround in DSDT is to change _S3 to \_S3_ .
> We can fix it in acpi driver soon.

Hmm, this conclusion seems to be wrong; I said it too early at that
time. The real problem is this: if your box supports S3, the _S3 method
should return from the Else branch, which returns the package
{0x01,0x01,0x00,0x00}.

If you still use
http://bugzilla.kernel.org/show_bug.cgi?id=3534#c10 to
override your DSDT, which bypasses the test and blindly assumes the BIOS or
platform supports S3, then I suggest you retest and post dmesg with the
UN-modified BIOS.

Thanks,
Luming


Method (_S3, 0, NotSerialized)
{
    If (BXPT)
    {
        Return (Package (0x04)
        {
            0x06,
            0x06,
            0x00,
            0x00
        })
    }
    Else
    {
        Return (Package (0x04)
        {
            0x01,
            0x01,
            0x00,
            0x00
        })
    }
}
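
For context on what those two packages mean: the first two entries of an `_Sx` package are the SLP_TYPa/SLP_TYPb values the OS writes into the PM1a/PM1b control registers, where the ACPI specification places SLP_TYP in bits 10-12 and SLP_EN in bit 13. A minimal sketch of that register arithmetic; the helper name is hypothetical, and it folds into one value what real code may write in two steps.

```c
#include <stdint.h>

/* PM1 control register fields, per the ACPI specification. */
#define PM1_SLP_TYP_SHIFT 10
#define PM1_SLP_TYP_MASK  (0x7u << PM1_SLP_TYP_SHIFT)	/* bits 10-12 */
#define PM1_SLP_EN        (1u << 13)

/* Hypothetical helper: merge a sleep type taken from an _Sx package
   into the current PM1 control value and set SLP_EN. */
static uint16_t pm1_sleep_value(uint16_t pm1_cnt, uint8_t slp_typ)
{
	uint32_t v = pm1_cnt;

	v &= ~PM1_SLP_TYP_MASK;
	v |= ((uint32_t)slp_typ & 0x7u) << PM1_SLP_TYP_SHIFT;
	v |= PM1_SLP_EN;
	return (uint16_t)v;
}
```

So the Else branch's {0x01, 0x01, ...} selects sleep type 1, while the BXPT branch would select type 6; which type value actually puts the machine into S3 is platform-specific.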

2006-03-03 05:58:55

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions [600X S3 sleep]

>>>> Subject : S3 sleep hangs the second time - 600X
>>>> References : http://bugzilla.kernel.org/show_bug.cgi?id=5989

>>From: "Yu, Luming" <[email protected]>
>>> According to bug report, the BIOS DSDT is modified. I don't know
>>> how these changes affect the results of suspend/resume. But, it is
>>> clear this is NOT right approach to fix problem. Hence, I need the
>>> testing report with un-modified DSDT on TP 600X, bios 1.11.

>>I'll try it, although I don't think I'll get any data on the problem.
>>The unmodified DSDT (bios 1.11) lacks an S3 sleep object, so I had to
>>modify the DSDT even to get S3 to sleep at all. See
>><http://bugzilla.kernel.org/show_bug.cgi?id=3534> for that discussion.

I just tried the first failing commit
(02b28a33aae93a3b53068e0858d62f8bcaef60a3):

Author: Len Brown <[email protected]>
Date: Mon Dec 5 16:46:36 2005 -0500

[ACPI] Enable Embedded Controller (EC) interrupt mode by default

"ec_intr=0" reverts to polling
"ec_burst=" no longer exists.

Signed-off-by: Len Brown <[email protected]>
Acked-by: Luming Yu <[email protected]>

but with the vanilla (BIOS 1.11) DSDT. And not only did S3 sleep
happen, but the thermal+processor bug didn't show up: i.e. it did two
(actually, many) S3 sleep-wake cycles. So the problem is due to
something in the modified DSDT, which is either a problem itself or it
exposes another problem.

[Picture of me hanging head in shame for putting people to the trouble.]

My only excuse is that long ago (2.6.11), the machine wouldn't S3
sleep at all with the vanilla DSDT, and I patched it as recommended by
the ACPI experts. However, the current ACPI interpreter seems to
handle the 600X's mangy DSDT.

Now I agree 100% with Len Brown that it's best to have ACPI handle
marginal DSDT's, not hack the DSDT.

-Sanjoy

2006-03-03 15:59:47

by Mark Rosenstand

Subject: Re: Linux v2.6.16-rc5

On Monday 27 February 2006 06:27, Linus Torvalds wrote:
> The tar-ball is being uploaded right now, and everything else should
> already be pushed out. Mirroring might take a while, of course.

It would be nice if the -rc announcements could make it to
linux-kernel-announce as well, like -mm.

2006-03-03 16:53:08

by Matthew Garrett

Subject: Re: 2.6.16-rc5: known regressions

On Fri, Mar 03, 2006 at 02:59:22AM +0000, Sanjoy Mahajan wrote:

> I'll try it, although I don't think I'll get any data on the problem.
> The unmodified DSDT (bios 1.11) lacks an S3 sleep object, so I had to
> modify the DSDT even to get S3 to sleep at all. See
> <http://bugzilla.kernel.org/show_bug.cgi?id=3534> for that discussion.

I think it's arguably a bit extreme to describe "My setup is so
unsupported that I had to modify my firmware to enable sleep and then
override the kernel's sanity checks and it's stopped working with
2.6.16" as a regression.

--
Matthew Garrett | [email protected]

2006-03-03 21:04:22

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions

>> I'll try it, although I don't think I'll get any data on the problem.
>> The unmodified DSDT (bios 1.11) lacks an S3 sleep object, so I had to
>> modify the DSDT even to get S3 to sleep at all. See
>> <http://bugzilla.kernel.org/show_bug.cgi?id=3534> for that discussion.

> I think it's arguably a bit extreme to describe "My setup is so
> unsupported that I had to modify my firmware to enable sleep and then
> override the kernel's sanity checks and it's stopped working with
> 2.6.16" as a regression.

I agree, and that was the point of 'picture of me hanging head in
shame', so there's no need to rub it in.

Anyway, the TP600X w/ vanilla DSDT *was* unsupported (circa 2.6.11),
but now the ACPI interpreter can interpret the vanilla DSDT and go
into S3 sleep (before, it would complain about a missing S3 sleep
object because the DSDT used a funny syntax). There were other
problems in the vanilla DSDT (e.g. probably using fn-F7 to switch to
an external display doesn't work) but I'll investigate them one at a
time.

-Sanjoy

2006-03-03 23:01:51

by Adrian Bunk

Subject: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

The m68k defconfig no longer compiles in 2.6.16-rc:

<-- snip -->

...
CC fs/file_table.o
fs/file_table.c: In function `fget':
fs/file_table.c:170: warning: implicit declaration of function `cmpxchg'
...
LD .tmp_vmlinux1
fs/built-in.o(.text+0x275a): In function `fget':
: undefined reference to `cmpxchg'
fs/built-in.o(.text+0x27da): In function `fget_light':
: undefined reference to `cmpxchg'
make: *** [.tmp_vmlinux1] Error 1

<-- snip -->


It seems the problem is that in the CONFIG_RMW_INSNS=n case, there's no
cmpxchg #define in include/asm-m68k/system.h required for the
atomic_add_unless #define in include/asm-m68k/atomic.h.


cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-03-03 23:26:04

by Linus Torvalds

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken



On Sat, 4 Mar 2006, Adrian Bunk wrote:
>
> It seems the problem is that in the CONFIG_RMW_INSNS=n case, there's no
> cmpxchg #define in include/asm-m68k/system.h required for the
> atomic_add_unless #define in include/asm-m68k/atomic.h.

Hmm. It seems like it never has been there.. Do you know what brought this
on? Was it Nick's RCU changes from "rcuref_dec_and_test()" to
"atomic_dec_and_test()" and friends?

Judging by your error messages, I _think_ it's the "atomic_inc_not_zero()"
that gets expanded to a cmpxchg() that simply doesn't exist on m68k and
never has.

I guess we've never depended on cmpxchg before. Or am I missing something?

Linus

2006-03-03 23:43:26

by Adrian Bunk

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

On Fri, Mar 03, 2006 at 03:22:42PM -0800, Linus Torvalds wrote:
>
>
> On Sat, 4 Mar 2006, Adrian Bunk wrote:
> >
> > It seems the problem is that in the CONFIG_RMW_INSNS=n case, there's no
> > cmpxchg #define in include/asm-m68k/system.h required for the
> > atomic_add_unless #define in include/asm-m68k/atomic.h.
>
> Hmm. It seems like it never has been there.. Do you know what brought this
> on? Was it Nick's RCU changes from "rcuref_dec_and_test()" to
> "atomic_dec_and_test()" and friends?

It was Nick's commit 8426e1f6af0fd7f44d040af7263750c5a52f3cc3 that added
atomic_inc_not_zero(), and Nick's patch that changed fs/file_table.c
from rcuref_dec_and_test() to atomic_dec_and_test() exposed this
problem.

> Judging by your error messages, I _think_ it's the "atomic_inc_not_zero()"
> that gets expanded to a cmpxchg() that simply doesn't exist on m68k and
> never has.

Exactly, that's what I wanted to say in my report.

> I guess we've never depended on cmpxchg before. Or am I missing something?

It seems this is the case.

And as far as I can see, m68k is the only architecture where cmpxchg
isn't always available.

> Linus

cu
Adrian

2006-03-04 00:00:16

by Andrew Morton

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

Linus Torvalds <[email protected]> wrote:
>
>
>
> On Sat, 4 Mar 2006, Adrian Bunk wrote:
> >
> > It seems the problem is that in the CONFIG_RMW_INSNS=n case, there's no
> > cmpxchg #define in include/asm-m68k/system.h required for the
> > atomic_add_unless #define in include/asm-m68k/atomic.h.
>
> Hmm. It seems like it never has been there.. Do you know what brought this
> on? Was it Nick's RCU changes from "rcuref_dec_and_test()" to
> "atomic_dec_and_test()" and friends?
>
> Judging by your error messages, I _think_ it's the "atomic_inc_not_zero()"
> that gets expanded to a cmpxchg() that simply doesn't exist on m68k and
> never has.
>
> I guess we've never depended on cmpxchg before. Or am I missing something?
>

Yes, we now require cmpxchg of all architectures.

It's pretty simple to fix - just use local_irq_save(). We can steal the code
from include/asm-m68knommu/system.h.

2006-03-04 13:18:22

by Adrian Bunk

Subject: Re: 2.6.16-rc5: known regressions

On Tue, Feb 28, 2006 at 04:17:25PM -0800, Randy.Dunlap wrote:
> On Tue, 28 Feb 2006 10:40:53 +0100 Jens Axboe wrote:
>
> > On Mon, Feb 27 2006, Adrian Bunk wrote:
> > > On Mon, Feb 27, 2006 at 01:54:17AM -0500, Jeff Garzik wrote:
> > > > Adrian Bunk wrote:
> > > > >Subject : 2.6.16-rc[34]: resume-from-RAM unreliable (SATA)
> > > > >References : http://lkml.org/lkml/2006/2/20/159
> > > > >Submitter : Mark Lord <[email protected]>
> > > > >Handled-By : Randy Dunlap <[email protected]>
> > > > >Status : one of Randy's patches seems to fix it
> > > >
> > > >
> > > > This is not a regression, libata suspend/resume has always been crappy.
> > > > It's under active development (by Randy, among others) to fix this.
> > >
> > > It might have always been crappy, but it is a regression since
> > > according to the submitter it is working with 2.6.15.
> >
> > It might have worked under lucky circumstances with an idle disk and a
> > goat sacrifice, so I agree with Jeff that this is definitely not a
> > regression. To my knowledge, Mark always used my libata suspend patch on
> > earlier kernels so it's not even an apples-apples comparison.
> >
> > So please scratch that entry.
>
> I'll third that request/comment.

OK, done.

> ~Randy

cu
Adrian

2006-03-04 13:27:49

by Adrian Bunk

Subject: Re: [v4l-dvb-maintainer] 2.6.16-rc5: known regressions

On Thu, Mar 02, 2006 at 11:00:11AM -0300, Mauro Carvalho Chehab wrote:
>
> > Subject : Oops in Kernel 2.6.16-rc4 on Modprobe of saa7134.ko
> > References : http://lkml.org/lkml/2006/2/20/122
> > Submitter : Brian Marete <[email protected]>
> > Status : unknown
>
> This is not a regression, since the user is not configuring saa7134 with
> the right card.

Thanks for this information.

The Oops is still a problem that should IMHO be fixed, but I removed
this issue from my regressions list.

> Cheers,
> Mauro.

cu
Adrian

2006-03-04 13:40:18

by Mauro Carvalho Chehab

Subject: Re: [v4l-dvb-maintainer] 2.6.16-rc5: known regressions

On Sat, 2006-03-04 at 14:27 +0100, Adrian Bunk wrote:

> The Oops is still a problem that should IMHO be fixed
For sure. We are working on it.

Cheers,
Mauro.

2006-03-04 14:05:26

by Roman Zippel

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

Hi,

On Fri, 3 Mar 2006, Andrew Morton wrote:

> Yes, we now require cmpxchg of all architectures.

Actually I'd prefer if we used atomic_cmpxchg() instead.
The cmpxchg() emulation was never added for a good reason - to keep code
from assuming it can be used for userspace synchronisation. Using an
atomic_t here would probably get at least some attention.

bye, Roman

2006-03-04 14:12:53

by Nick Piggin

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

Roman Zippel wrote:
> Hi,
>
> On Fri, 3 Mar 2006, Andrew Morton wrote:
>
>
>>Yes, we now require cmpxchg of all architectures.
>
>
> Actually I'd prefer if we used atomic_cmpxchg() instead.
> The cmpxchg() emulation was never added for a good reason - to keep code
> from assuming it can be used for userspace synchronisation. Using an
> atomic_t here would probably get at least some attention.
>

Yes, I guess that's what Andrew meant. The reason we can require
atomic_cmpxchg of all architectures is because it is only guaranteed
to work on atomic_t.

Glad to hear it won't be a problem for you though.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-03-04 20:33:44

by Andrew Morton

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

Nick Piggin <[email protected]> wrote:
>
> Roman Zippel wrote:
> > Hi,
> >
> > On Fri, 3 Mar 2006, Andrew Morton wrote:
> >
> >
> >>Yes, we now require cmpxchg of all architectures.
> >
> >
> > Actually I'd prefer if we used atomic_cmpxchg() instead.
> > The cmpxchg() emulation was never added for a good reason - to keep code
> > from assuming it can be used for userspace synchronisation. Using an
> > atomic_t here would probably get at least some attention.
> >
>
> Yes, I guess that's what Andrew meant. The reason we can require
> atomic_cmpxchg of all architectures is because it is only guaranteed
> to work on atomic_t.
>
> Glad to hear it won't be a problem for you though.
>

Could someone with an m68k compiler please send the patch?

2006-03-05 14:09:35

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Sun, Feb 26, Linus Torvalds wrote:

> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

I see random memory corruption on an early G3 ibook.
Testcase is an openSuSE 10.1 installation. 2.6.15 works ok modulo 2 bugs
to get it booted at all, and the usual udev breakage.

plain 2.6.16-rc5-git7 locks up after a few packages, no ping.
Our SuSE kernel does not lockup, but ext2 shows access beyond end of
device after > 200 packages, or the rpmdb gets corrupt, or both. With reiserfs
it gets past 100 packages, then reiserfs complains about fs corruption.
plain -rc2 shows the same reiserfs corruption.
plain -rc1 dies after a few packages, it jumps to 0x0 in softirq.

I'm trying to compile the git snapshots now, which is a real challenge.

2006-03-05 18:59:25

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Sun, Mar 05, Olaf Hering wrote:

> On Sun, Feb 26, Linus Torvalds wrote:
>
> > Have I missed anything? Holler. And please keep reminding about any
> > regressions since 2.6.15.
>
> I see random memory corruption on an early G3 ibook.
> Testcase is an openSuSE 10.1 installation. 2.6.15 works ok modulo 2 bugs
> to get it booted at all, and the usual udev breakage.
>
> plain 2.6.16-rc5-git7 locks up after a few packages, no ping.
> Our SuSE kernel does not lockup, but ext2 shows access beyond end of
> device after > 200 packages, or the rpmdb gets corrupt, or both. With reiserfs
> it gets past 100 packages, then reiserfs complains about fs corruption.
> plain -rc2 shows the same reiserfs corruption.
> plain -rc1 dies after a few packages, it jumps to 0x0 in softirq.

-git5 works, -git7 showed reiserfs corruption. -git6 died, jumped from
__do_softirq to 0x0, will try once again.

git5->6 has the mutex changes, but also lots of powerpc changes. Let's
see if I can narrow it down further.

The iBook has 160 MB; the installation is done via modular NFS
(ro,v3,rsize=32768,wsize=32768,hard,nolock,proto=tcp,addr=1.1.1.3).
I haven't seen this on a B&W G3 with 256 MB, nor on other ppc32 systems.

2006-03-05 20:03:09

by Linus Torvalds

Subject: Re: Linux v2.6.16-rc5



On Sun, 5 Mar 2006, Olaf Hering wrote:

> On Sun, Mar 05, Olaf Hering wrote:
> >
> > plain 2.6.16-rc5-git7 locks up after a few packages, no ping.
> > Our SuSE kernel does not lockup, but ext2 shows access beyond end of
> > device after > 200 packages, or the rpmdb gets corrupt, or both. With reiserfs
> > it gets past 100 packages, then reiserfs complains about fs corruption.
> > plain -rc2 shows the same reiserfs corruption.
> > plain -rc1 dies after a few packages, it jumps to 0x0 in softirq.
>
> -git5 works, -git7 showed reiserfs corruption. -git6 died, jumped from
> __do_softirq to 0x0, will try once again.

Since there are several git users in the ppc camp, one thing that always
helps is that when you test a -git snapshot, you also say what the "git
ID" was.

I'm assuming that when you say "-git5 works", you mean 2.6.15-git5.

In this case:
2.6.15-git5: 5367f2d67c7d0bf1faae90e6e7b4e2ac3c9b5e0f
2.6.15-git6: 977127174a7dff52d17faeeb4c4949a54221881f
2.6.15-git7: 05f6ece6f37f987e9de643f6f76e8fb5d5b9e014

> git5->6 has the mutex changes, but also lots of powerpc changes. Let's
> see if I can narrow it down further.

If you can try out git, the best way to proceed is

git bisect start
git bisect good 5367f2d67c7d0bf1faae90e6e7b4e2ac3c9b5e0f
git bisect bad 977127174a7dff52d17faeeb4c4949a54221881f

which should help narrow it down pretty efficiently (I'm marking -git6 as
bad, on the logic that the bug being chased is "corruption _or_ jumping to
address 0". It's much harder if you want to chase down just one bug, when
the other bug might stand in your way).

And yes, that range contains not just powerpc updates, but also PCI layer,
mutex changes, crypto and V4L/DVB. Doing just three or four bisection
trials would help narrow it down a lot (now it's 448 commits; doing three
bisections should narrow it down to fewer than 60 commits, and likely to one
subsystem, while another bisection or two would get us down to a few
tens of commits).

"git bisect" really is very powerful and easy to use.

Linus

2006-03-05 20:42:35

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Sun, Mar 05, Linus Torvalds wrote:

> "git bisect" really is very powerful and easy to use.

Indeed. The one "between" git5 and git6
(93b47684f60cf25e8cefe19a21d94aa0257fdf36) fails as well. There are no
mutex changes left, so I suspect a ppc bug.

With this changeset, my first attempt ran into this deadlock; the second
attempt led to the reiserfs corruption.
See attached 93b47684f60cf25e8cefe19a21d94aa0257fdf36.log

I'm now at 03929c76f3e5af919fb762e9882a9c286d361e7d, which fails as
well. dmesg shows this:

Adding 295332k swap on /dev/hda10. Priority:-1 extents:1 across:295332k
ReiserFS: warning: is_leaf: wrong item type for item *3.5*[-1 -1 0xffffffff DIRECT], item_len 65535, item_location 65535, free_space(entry_count) 65535
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 4652 0x0 SD]
ReiserFS: warning: is_leaf: wrong item type for item *3.5*[-1 -1 0xffffffff DIRECT], item_len 65535, item_location 65535, free_space(entry_count) 65535
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 4652 0x0 SD]
ReiserFS: warning: is_leaf: wrong item type for item *3.5*[-1 -1 0xffffffff DIRECT], item_len 65535, item_location 65535, free_space(entry_count) 65535
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 4652 0x0 SD]


There are only powerpc-related changes left.
It's a bit time-consuming to get to the point where it fails;
let's see how far I get this evening.


Attachments:
(No filename) (1.81 kB)
93b47684f60cf25e8cefe19a21d94aa0257fdf36.log (23.57 kB)

2006-03-05 21:50:53

by Paul Mackerras

Subject: Re: Linux v2.6.16-rc5

Olaf Hering writes:

> I'm now at 03929c76f3e5af919fb762e9882a9c286d361e7d, which fails as
> well. dmesg shows this:

The range from git5 to there includes David Woodhouse's syscall
entry/exit revamp (401d1f029bebb7153ca704997772113dc36d9527) and the
follow-ons which fix it for 32-bit:

9687c587596b54a77f08620595f5686ea35eed97
623703f620453c798b6fa3eb79ad8ea27bfd302a

There are also commits from Ben H that change the way we parse
addresses from the OF device tree. If you can bisect a bit further
that would be good, although you may strike problems between the 401d
and 6237 commits I mentioned above.

It would be interesting to take 401d and then apply 9687 and 6237
directly on top of it and try that, and if it fails, then try
1cd8e506209223ed10da805d99be55e268f4023c (the parent of 401d).

Paul.

Subject: Re: Linux v2.6.16-rc5

[email protected] (Linus Torvalds) writes:
> Have I missed anything? Holler. And please keep reminding about any
> regressions since 2.6.15.

As reported yesterday [1], the generic irq framework for alpha introduced
in commit 0595bf3bca9d9932a05b06dd438f40f01d27cd33 kills my box under
fairly heavy disk usage. I have an md RAID 0 array striped across 3 SCSI
disks, and any kind of relatively intensive I/O (like md5sum or sha1sum
against ISO files) kills the box immediately: either it panics in
kernel/exit.c:do_exit (one of the first three "unlikely" checks) or in
arch/alpha/mm/fault.c:do_page_fault with "Unable to handle paging request at
some address"...

Reverting it makes the box stable again (as it was under vanilla 2.6.15).

Here's the commit detail:

0595bf3bca9d9932a05b06dd438f40f01d27cd33 is first bad commit
diff-tree 0595bf3bca9d9932a05b06dd438f40f01d27cd33 (from eee45269b0f5979c70bc151c6c2f4e5f4f5ababe)
Author: Ivan Kokshaysky <[email protected]>
Date: Fri Jan 6 00:12:22 2006 -0800

[PATCH] Alpha: convert to generic irq framework (alpha part)

Kconfig tweaks and tons of deletions.

Signed-off-by: Ivan Kokshaysky <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Richard Henderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

:040000 040000 ac127f16325bb65941bd38208325ab7821877f52 15d7d4d17a7c8cfb8fe53c29ded31ff9cf287534 M arch
:040000 040000 287f73cdf371b2b33cc48f1d876005aab29ff3de 29263093ae33ceccd6346b987870367bc8329f0a M include


[1] Problem on Alpha with "convert to generic irq framework"
Message-Id: <20060304111219.GA10532@localhost>
http://lkml.org/lkml/2006/3/4/31

--
Mathieu Chouquet-Stringer

2006-03-05 22:22:06

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Mon, Mar 06, Paul Mackerras wrote:

> Olaf Hering writes:
>
> > I'm now at 03929c76f3e5af919fb762e9882a9c286d361e7d, which fails as
> > well. dmesg shows this:
>
> The range from git5 to there includes David Woodhouse's syscall
> entry/exit revamp (401d1f029bebb7153ca704997772113dc36d9527) and the
> follow-ons which fix it for 32-bit:
>
> 9687c587596b54a77f08620595f5686ea35eed97
> 623703f620453c798b6fa3eb79ad8ea27bfd302a
>
> There are also commits from Ben H that change the way we parse
> addresses from the OF device tree. If you can bisect a bit further
> that would be good, although you may strike problems between the 401d
> and 6237 commits I mentioned above.

I will check this tomorrow.

quick update:

d4e4b3520c4df46cf1d15a56379a6fa57e267b7d, locks up, tried two times


404849bbd2bfd62e05b36f4753f6e1af6050a824 + 3 buildfixes:

31df1678d7732b94178a6e457ed6666e4431212f
8dacaedf04467e32c50148751a96150e73323cdc
d2dd482bc17c3bc240045f80a7c4b4d5cea5e29c


This one has the syscall changes, but not the two fixes you mentioned.
It gets far, but at the point where it locks up with the d4eb, it
crashes in run_timer_softirq, branched to 0x1f4. Maybe it's the result of
the missing fixes. Will continue tomorrow.

2006-03-05 22:44:40

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Sun, Mar 05, Olaf Hering wrote:

> 404849bbd2bfd62e05b36f4753f6e1af6050a824 + 3 buildfixes:
>
> 31df1678d7732b94178a6e457ed6666e4431212f
> 8dacaedf04467e32c50148751a96150e73323cdc
> d2dd482bc17c3bc240045f80a7c4b4d5cea5e29c
>
>
> This one has the syscall changes, but not the two fixes you mentioned.
> It gets far, but at the point where it locks up with the d4eb, it
> crashes in run_timer_softirq, branched to 0x1f4. Maybe it's the result of
> the missing fixes. Will continue tomorrow.

Another try with that version, now I see the corruption before the
package where it locked up before (glibc-locale, rather large).
Will back out the syscall change and try again with 404849bbd2bfd62e05b36f4753f6e1af6050a824.

ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234
ReiserFS: warning: vs-500: unknown uniqueness 1634738234, item_len 25965, item_location 25972, free_space(entry_count) 24946
ReiserFS: hda11: warning: vs-5150: search_by_key: invalid format found in block 0. Fsck?
ReiserFS: hda11: warning: zam-7001: io error in reiserfs_find_entry


2006-03-06 02:16:24

by Linus Torvalds

Subject: Re: Linux v2.6.16-rc5


Ivan, rth,
any ideas?

Linus

On Sun, 5 Mar 2006, Mathieu Chouquet-Stringer wrote:
>
> [email protected] (Linus Torvalds) writes:
> > Have I missed anything? Holler. And please keep reminding about any
> > regressions since 2.6.15.
>
> As reported yesterday [1], the generic irq framework for alpha introduced
> in commit 0595bf3bca9d9932a05b06dd438f40f01d27cd33 kills my box under
> fairly heavy disk usage. I have an md RAID 0 array striped across 3 SCSI
> disks, and any kind of relatively intensive I/O (like md5sum or sha1sum
> against ISO files) kills the box immediately: either it panics in
> kernel/exit.c:do_exit (one of the first three "unlikely" checks) or in
> arch/alpha/mm/fault.c:do_page_fault with "Unable to handle paging request at
> some address"...
>
> Reverting it makes the box stable again (as it was under vanilla 2.6.15).
>
> Here's the commit detail:
>
> 0595bf3bca9d9932a05b06dd438f40f01d27cd33 is first bad commit
> diff-tree 0595bf3bca9d9932a05b06dd438f40f01d27cd33 (from eee45269b0f5979c70bc151c6c2f4e5f4f5ababe)
> Author: Ivan Kokshaysky <[email protected]>
> Date: Fri Jan 6 00:12:22 2006 -0800
>
> [PATCH] Alpha: convert to generic irq framework (alpha part)
>
> Kconfig tweaks and tons of deletions.
>
> Signed-off-by: Ivan Kokshaysky <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Richard Henderson <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> :040000 040000 ac127f16325bb65941bd38208325ab7821877f52 15d7d4d17a7c8cfb8fe53c29ded31ff9cf287534 M arch
> :040000 040000 287f73cdf371b2b33cc48f1d876005aab29ff3de 29263093ae33ceccd6346b987870367bc8329f0a M include
>
>
> [1] Problem on Alpha with "convert to generic irq framework"
> Message-Id: <20060304111219.GA10532@localhost>
> http://lkml.org/lkml/2006/3/4/31
>
> --
> Mathieu Chouquet-Stringer
>

2006-03-06 04:52:16

by Suzanne Wood

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

> From: Adrian Bunk Fri Mar 03 2006 - 18:40:57 EST
>
> On Fri, Mar 03, 2006 at 03:22:42PM -0800, Linus Torvalds wrote:
>>
>> On Sat, 4 Mar 2006, Adrian Bunk wrote:
>> >
>> > It seems the problem is that in the CONFIG_RMW_INSNS=n
>> > case, there's no
>> > cmpxchg #define in include/asm-m68k/system.h required for the
>> > atomic_add_unless #define in include/asm-m68k/atomic.h.
>>
>> Hmm. It seems like it never has been there.. Do you know what brought this
>> on? Was it Nick's RCU changes from "rcuref_dec_and_test()" to
>> "atomic_dec_and_test()" and friends?
>
> It was Nick's commit 8426e1f6af0fd7f44d040af7263750c5a52f3cc3 that added
> atomic_inc_not_zero(), and Nick's patch that changed fs/file_table.c
> from rcuref_dec_and_test() to atomic_dec_and_test() exposed this
> problem.

Do kernel coders value the marking of the rcu read-side critical
section for consistency? In fs/file_table.c, fcheck_files()
is called by fget_light() without rcu_read_lock() in one case,
but with the apparently necessary rcu_read_lock() in place
otherwise. The struct file pointer that fcheck_files() returns
is rcu_dereference(fdt->fd[fd]) or NULL. Does the _commented_ guarantee
of the current task holding the refcnt assure there's no need to
check for NULL or to mark the rcu readside section around the first
call to fcheck_files()?

This is the code sample:
/*
 * Lightweight file lookup - no refcnt increment if fd table isn't shared.
 * You can use this only if it is guranteed that the current task already
 * holds a refcnt to that file. That check has to be done at fget() only
 * and a flag is returned to be passed to the corresponding fput_light().
 * There must not be a cloning between an fget_light/fput_light pair.
 */
struct file fastcall *fget_light(unsigned int fd, int *fput_needed)
{
	struct file *file;
	struct files_struct *files = current->files;

	*fput_needed = 0;
	if (likely((atomic_read(&files->count) == 1))) {
		file = fcheck_files(files, fd);
	} else {
		rcu_read_lock();
		file = fcheck_files(files, fd);
		if (file) {
			if (atomic_inc_not_zero(&file->f_count))
				*fput_needed = 1;
			else
				/* Didn't get the reference, someone's freed */
				file = NULL;
		}
		rcu_read_unlock();
	}

	return file;
}

The attached patch would superficially address the rcu
discrepancy, but there is a more fundamental question about the
desired extent of the rcu read-side critical section, given that
fget_light() returns a pointer to a file struct that was
fetched with rcu_dereference() in fcheck_files(). In this
case, does it make sense to move the rcu_read_lock() into
fcheck_files(), or to extend the critical section into the
calling function?

Up the call tree, we note that fcheck() uses fcheck_files(),
but the only call to fcheck() nested in rcu_read_lock() is
in the disparaged irixioctl.c.

Are the other calls to fcheck() made under circumstances that justify
eliding rcu_read_lock(), such as holding spin_lock(&files->file_lock)
in fs/fcntl.c, being in the context of applying locks in fs/locks.c,
or being called from assembly code in arch/sparc/kernel/sunos_ioctl.c
and solaris/socksys.c?
If there is reason to pursue the insertion of the
rcu_read_lock/unlock() pairs in these circumstances, any
suggestions would be appreciated in order to dispel the question
altogether or to try to submit a more extensive patch.

Thank you.
Suzanne

-
file_table.c | 2 ++
1 files changed, 2 insertions(+)
---------------------------------

--- /testbed2/linux-2.6.16-rc5/fs/file_table.c 2006-02-26 21:09:35.000000000 -0800
+++ /testbed1/linux-2.6.16-rc5/fs/file_table.c 2006-03-05 14:36:46.000000000 -0800
@@ -194,7 +194,9 @@ struct file fastcall *fget_light(unsigne
 
 	*fput_needed = 0;
 	if (likely((atomic_read(&files->count) == 1))) {
+		rcu_read_lock();
 		file = fcheck_files(files, fd);
+		rcu_read_unlock();
 	} else {
 		rcu_read_lock();
 		file = fcheck_files(files, fd);

2006-03-06 07:48:15

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Sun, Mar 05, Olaf Hering wrote:

> On Sun, Mar 05, Olaf Hering wrote:
>
> > 404849bbd2bfd62e05b36f4753f6e1af6050a824 + 3 buildfixes:
> >
> > 31df1678d7732b94178a6e457ed6666e4431212f
> > 8dacaedf04467e32c50148751a96150e73323cdc
> > d2dd482bc17c3bc240045f80a7c4b4d5cea5e29c
> >
> >
> > This one has the syscall changes, but not the two fixes you mentioned.
> > It gets far, but at the point where it locks up with the d4eb, it
> > crashes in run_timer_softirq, branched to 0x1f4. Maybe it's the result of
> > the missing fixes. Will continue tomorrow.
>
> Another try with that version, now I see the corruption before the
> package where it locked up before (glibc-locale, rather large).
> Will backout the syscall change and try again with 404849bbd2bfd62e05b36f4753f6e1af6050a824.

It's not the syscall change, at least. Looking further.

2006-03-06 15:36:36

by Dipankar Sarma

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

On Sun, Mar 05, 2006 at 08:44:15PM -0800, Suzanne Wood wrote:
> > From: Adrian Bunk Fri Mar 03 2006 - 18:40:57 EST
> >
> Do kernel coders value the marking of the rcu read-side critical
> section for consistency? In fs/file_table.c, fcheck_files()

Generally speaking, yes.

> is called by fget_light() without rcu_read_lock() in one case,
> but with the apparently necessary rcu_read_lock() in place
> otherwise. The struct file pointer that fcheck_files() returns
> is rcu_dereference(fdt->fd[fd]) or NULL. Does the _commented_guarantee
> of the current task holding the refcnt assure there's no need to
> check for NULL or to mark the rcu readside section around the first
> call to fcheck_files()?
>
> This is the code sample:
> /*
> * Lightweight file lookup - no refcnt increment if fd table isn't shared.
> * You can use this only if it is guranteed that the current task already
> * holds a refcnt to that file. That check has to be done at fget() only
> * and a flag is returned to be passed to the corresponding fput_light().
> * There must not be a cloning between an fget_light/fput_light pair.
> */
> struct file fastcall *fget_light(unsigned int fd, int *fput_needed)
> {
> struct file *file;
> struct files_struct *files = current->files;
>
> *fput_needed = 0;
> if (likely((atomic_read(&files->count) == 1))) {
> file = fcheck_files(files, fd);
> } else {

This means that the fd table is not shared between threads. So,
there can't be any race and no need to protect using
rcu_read_lock()/rcu_read_unlock().

>
> The attached patch would superficially address the rcu
> discrepancy, but another underlying question is about the
> desired extent of the rcu read-side critical section in that
> fget_light() returns the pointer to the file struct that was
> flagged for rcu protection by rcu_dereference() in
> fcheck_files(). In this application, does it make sense to
> push the rcu_read_lock() into fcheck_files() or add it there
> or to extend it to the calling function?

I think a comment there explaining why the rcu_read_lock/unlock
pair is absent should be sufficient. While they are NOPs
for non-PREEMPT kernels, they do have a cost otherwise.
Avoiding them if we can is a good idea, IMO.
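
Below is a userspace sketch of the pattern under discussion, with the kind of comment suggested above. Everything in it is a hypothetical stand-in. fake_rcu_read_lock() and fake_rcu_read_unlock() only count nesting; they are not the kernel primitives. struct fd_table, struct file_obj and lookup_light() are illustrative analogues of files_struct, struct file and fget_light(), not kernel API.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_FDS 16

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(): they
 * only track nesting depth so the control flow can be exercised. */
static int rcu_depth;
static int rcu_sections_entered;
static void fake_rcu_read_lock(void)   { rcu_depth++; rcu_sections_entered++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

struct file_obj {
    atomic_int refs;                 /* plays the role of f_count */
};

struct fd_table {
    atomic_int users;                /* plays the role of files->count */
    struct file_obj *fd[NR_FDS];
};

void table_init(struct fd_table *t, int users)
{
    atomic_init(&t->users, users);
    for (int i = 0; i < NR_FDS; i++)
        t->fd[i] = NULL;
}

static struct file_obj *lookup(struct fd_table *t, unsigned int fd)
{
    return fd < NR_FDS ? t->fd[fd] : NULL;
}

struct file_obj *lookup_light(struct fd_table *t, unsigned int fd,
                              int *put_needed)
{
    struct file_obj *f;

    *put_needed = 0;
    if (atomic_load(&t->users) == 1) {
        /* Table not shared: no other task can close or replace this
         * fd while we look at it, so no RCU read-side section is
         * needed. Skipping it avoids its cost on preemptible kernels. */
        f = lookup(t, fd);
    } else {
        /* Shared table: another thread may close the fd, so look it
         * up and pin it inside a read-side section. (The kernel only
         * takes the reference if the count is still non-zero; this
         * sketch simply increments it.) */
        fake_rcu_read_lock();
        f = lookup(t, fd);
        if (f) {
            atomic_fetch_add(&f->refs, 1);
            *put_needed = 1;
        }
        fake_rcu_read_unlock();
    }
    return f;
}
```

In this sketch, only the shared-table path pays for the read-side section and the extra reference. The unshared path relies on the single-user guarantee, and the comment documents that reliance.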

> Up the call tree, we note that fcheck() uses fcheck_files(),
> but the only call to fcheck() nested in rcu_read_lock() is
> in the disparaged irixioctl.c.
>
> Are the other calls to fcheck() under circumstances that give
> reason for the rcu_read_lock elision, like
> spin_lock(&files->file_lock) in fs/fcntl.c, or being in the
> context of applying locks in fs/locks.c, or calls from assembly
> code in arch/sparc/kernel/sunos_ioctl.c & solaris/socksys.c.
> If there is reason to pursue the insertion of the
> rcu_read_lock/unlock() pairs in these circumstances, any
> suggestions would be appreciated in order to dispel the question
> altogether or to try to submit a more extensive patch.

It depends on whether the fdtable is shared or not and, if
shared, whether we are already holding ->file_lock or
not. The key is that if the lookup is lock-free and the fdtable
is shared, the rcu_read_lock()/unlock() pair must be
there; otherwise it is a bug.
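
The rule above can be written as a small decision function, for example as the core of a checker rule. The name rcu_read_lock_required() is hypothetical. It encodes only the two conditions stated in this reply: whether the fd table is shared, and whether the table's spinlock is already held. It says nothing about other locks.

```c
#include <stdbool.h>

/* Hypothetical encoding of the rule for an fd-table lookup: an RCU
 * read-side section is mandatory only for a lock-free lookup on a
 * shared table. */
bool rcu_read_lock_required(bool table_shared, bool holds_table_lock)
{
    if (!table_shared)
        return false;   /* single user: nothing can race */
    if (holds_table_lock)
        return false;   /* updaters are excluded by the spinlock */
    return true;        /* lock-free lookup on a shared table */
}
```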

Thanks
Dipankar

2006-03-06 16:21:40

by Suzanne Wood

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

Thank you very much.

> From [email protected] Mon Mar 6 07:36:50 2006

> On Sun, Mar 05, 2006 at 08:44:15PM -0800, Suzanne Wood wrote:
> > > From: Adrian Bunk Fri Mar 03 2006 - 18:40:57 EST
> > >
> > Do kernel coders value the marking of the rcu read-side critical
> > section for consistency? In fs/file_table.c, fcheck_files()

> Generally speaking, yes.

> > is called by fget_light() without rcu_read_lock() in one case,
> > but with the apparently necessary rcu_read_lock() in place
> > otherwise. The struct file pointer that fcheck_files() returns
> > is rcu_dereference(fdt->fd[fd]) or NULL. Does the _commented_guarantee
> > of the current task holding the refcnt assure there's no need to
> > check for NULL or to mark the rcu readside section around the first
> > call to fcheck_files()?
> >
> > This is the code sample:
> > /*
> > * Lightweight file lookup - no refcnt increment if fd table isn't shared.
> > * You can use this only if it is guranteed that the current task already
> > * holds a refcnt to that file. That check has to be done at fget() only
> > * and a flag is returned to be passed to the corresponding fput_light().
> > * There must not be a cloning between an fget_light/fput_light pair.
> > */
> > struct file fastcall *fget_light(unsigned int fd, int *fput_needed)
> > {
> > struct file *file;
> > struct files_struct *files = current->files;
> >
> > *fput_needed = 0;
> > if (likely((atomic_read(&files->count) == 1))) {
> > file = fcheck_files(files, fd);
> > } else {

> This means that the fd table is not shared between threads. So,
> there can't be any race and no need to protect using
> rcu_read_lock()/rcu_read_unlock().

Then why call fcheck_files() with the rcu_dereference() which would flag
an automated check for the need to mark a read-side critical section?
Would it make sense to introduce the function that doesn't? The goal of
keeping the kernel small is balanced with clarity. The inconsistency of
how fcheck_files() is used within a single function (fget_light()) was
my opening question.

> > The attached patch would superficially address the rcu
> > discrepancy, but another underlying question is about the
> > desired extent of the rcu read-side critical section in that
> > fget_light() returns the pointer to the file struct that was
> > flagged for rcu protection by rcu_dereference() in
> > fcheck_files(). In this application, does it make sense to
> > push the rcu_read_lock() into fcheck_files() or add it there
> > or to extend it to the calling function?

> I think a comment there explaining why the rcu_read_lock/unlock
> pair is absent should be sufficient. While they are NOPs
> for non-PREEMPT kernels, they do have a cost otherwise.
> Avoiding them if we can is a good idea, IMO.

> > Up the call tree, we note that fcheck() uses fcheck_files(),
> > but the only call to fcheck() nested in rcu_read_lock() is
> > in the disparaged irixioctl.c.
> >
> > Are the other calls to fcheck() under circumstances that give
> > reason for the rcu_read_lock elision, like
> > spin_lock(&files->file_lock) in fs/fcntl.c, or being in the
> > context of applying locks in fs/locks.c, or calls from assembly
> > code in arch/sparc/kernel/sunos_ioctl.c & solaris/socksys.c.
> > If there is reason to pursue the insertion of the
> > rcu_read_lock/unlock() pairs in these circumstances, any
> > suggestions would be appreciated in order to dispel the question
> > altogether or to try to submit a more extensive patch.

> It depends on whether the fdtable is shared or not and, if
> shared, whether we are already holding ->file_lock or
> not. The key is that if the lookup is lock-free and the fdtable
> is shared, the rcu_read_lock()/unlock() pair must be
> there; otherwise it is a bug.

> Thanks
> Dipankar

Many thanks again.
Suzanne

2006-03-06 16:48:21

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Mon, Mar 06, Paul Mackeras wrote:

> There are also commits from Ben H that change the way we parse
> addresses from the OF device tree. If you can bisect a bit further
> that would be good, although you may strike problems between the 401d
> and 6237 commits I mentioned above.

What I have right now is this, which got me in a non-compiling state.
I will pick the udbg stuff and apply the relevant changes to -git5.

==> .git/HEAD <==
463ce0e103f419f51b1769111e73fe8bb305d0ec

==> .git/refs/bisect/bad <==
51d3082fe6e55aecfa17113dbe98077c749f724c

==> .git/refs/bisect/good-5367f2d67c7d0bf1faae90e6e7b4e2ac3c9b5e0f <==
5367f2d67c7d0bf1faae90e6e7b4e2ac3c9b5e0f

==> .git/refs/bisect/good-d1405b869850982f05c7ec0d3f137ca27588192f <==
d1405b869850982f05c7ec0d3f137ca27588192f

==> .git/BISECT_LOG <==
git-bisect start
# good: [5367f2d67c7d0bf1faae90e6e7b4e2ac3c9b5e0f] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
git-bisect good 5367f2d67c7d0bf1faae90e6e7b4e2ac3c9b5e0f
# bad: [977127174a7dff52d17faeeb4c4949a54221881f] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
git-bisect bad 977127174a7dff52d17faeeb4c4949a54221881f
# bad: [93b47684f60cf25e8cefe19a21d94aa0257fdf36] drivers/*rest*: Replace pci_module_init() with pci_register_driver()
git-bisect bad 93b47684f60cf25e8cefe19a21d94aa0257fdf36
# bad: [03929c76f3e5af919fb762e9882a9c286d361e7d] ppc32: cpm_uart: fix xchar sending
git-bisect bad 03929c76f3e5af919fb762e9882a9c286d361e7d
# bad: [d4e4b3520c4df46cf1d15a56379a6fa57e267b7d] powerpc: fix for "Update OF address parsers"
git-bisect bad d4e4b3520c4df46cf1d15a56379a6fa57e267b7d
# bad: [404849bbd2bfd62e05b36f4753f6e1af6050a824] powerpc: Remove some unneeded fields from the paca
git-bisect bad 404849bbd2bfd62e05b36f4753f6e1af6050a824
# good: [d1405b869850982f05c7ec0d3f137ca27588192f] powerpc: Add OF address parsing code (#2)
git-bisect good d1405b869850982f05c7ec0d3f137ca27588192f
# bad: [e199500c6280aadf98c185db99fd24ab61ebe0c7] powerpc: partly merge iseries do_IRQ
git-bisect bad e199500c6280aadf98c185db99fd24ab61ebe0c7
# bad: [2c5bd01f8f5d7c655d9d1aa60b696d980947e3be] powerpc: convert macio_asic to use prom_parse
git-bisect bad 2c5bd01f8f5d7c655d9d1aa60b696d980947e3be
# bad: [51d3082fe6e55aecfa17113dbe98077c749f724c] powerpc: Unify udbg (#2)
git-bisect bad 51d3082fe6e55aecfa17113dbe98077c749f724c


2006-03-06 16:49:33

by Dipankar Sarma

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

On Mon, Mar 06, 2006 at 08:13:41AM -0800, Suzanne Wood wrote:
> Thank you very much.
> > > struct file fastcall *fget_light(unsigned int fd, int *fput_needed)
> > > {
> > > struct file *file;
> > > struct files_struct *files = current->files;
> > >
> > > *fput_needed = 0;
> > > if (likely((atomic_read(&files->count) == 1))) {
> > > file = fcheck_files(files, fd);
> > > } else {
>
> > This means that the fd table is not shared between threads. So,
> > there can't be any race and no need to protect using
> > rcu_read_lock()/rcu_read_unlock().
>
> Then why call fcheck_files() with the rcu_dereference() which would flag
> an automated check for the need to mark a read-side critical section?
> Would it make sense to introduce the function that doesn't? The goal of
> keeping the kernel small is balanced with clarity. The inconsistency of
> how fcheck_files() is used within a single function (fget_light()) was
> my opening question.

Because rcu_dereference() hurts only alpha and we don't care about
alpha :-)

Just kidding!

Good point about automated checkers. However, this isn't an
uncommon thing in multi-threaded programs - can't the checker
rules be written to take into account sharing and non-sharing of
the object in question ?

Thanks
Dipankar

2006-03-06 22:20:09

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Mon, Mar 06, Olaf Hering wrote:

> On Mon, Mar 06, Paul Mackeras wrote:
>
> > There are also commits from Ben H that change the way we parse
> > addresses from the OF device tree. If you can bisect a bit further
> > that would be good, although you may strike problems between the 401d
> > and 6237 commits I mentioned above.
>
> What I have right now is this, which got me in a non-compiling state.
> I will pick the udbg stuff and apply the relevant changes to -git5.

Here is the working vs. non-working thing.
So it's the udbg patch. The .config I used is attached; run yes '' | make oldconfig
to get it in shape.

--- ../series.conf.good.463ce0e103f419f51b1769111e73fe8bb305d0ec 2006-03-06 22:47:03.040936341 +0100
+++ series.conf 2006-03-06 22:48:50.611294236 +0100
@@ -25,21 +25,28 @@
patches.suse/suse-ppc-legacy-io.patch
patches.arch/0022-powerpc-incorrect-rmo_top-handling-in-prom_init.txt

+ # buildfix
+ patches.suse/9100b205fdc70b300894954ebebbf2709c5ed525.patch
+
#kexec32
patches.suse/3d1229d6ae92ed1994f4411b8493327ef8f4b76f.patch
- # udbg2
+ # address parsing code2
patches.suse/d1405b869850982f05c7ec0d3f137ca27588192f.patch
#serial port2
patches.suse/463ce0e103f419f51b1769111e73fe8bb305d0ec.patch
- # buildfix
- patches.suse/9100b205fdc70b300894954ebebbf2709c5ed525.patch
+
+ #Subject: [PATCH] powerpc: Unify udbg (#2)
+ patches.suse/51d3082fe6e55aecfa17113dbe98077c749f724c.patch
+ #Subject: [PATCH] powerpc: serial port discovery: cope with broken firmware
+ patches.suse/31df1678d7732b94178a6e457ed6666e4431212f.patch
+ #Subject: [PATCH] powerpc: More serial probe fixes (#2)
+ patches.suse/8dacaedf04467e32c50148751a96150e73323cdc.patch
+
#- patches.suse/ac448afbcdcc218fd8d177960466ecc4a523722f.patch
#- patches.suse/36874579dbf4cafa31486d4207c6807efbbf1378.patch
#- patches.suse/575e321606c5673efabf28c0fa075e198980c44e.patch
# patches.suse/3b212db9217d02e623eaa12f41c9b5f8c6a99535.patch

-# patches.suse/31df1678d7732b94178a6e457ed6666e4431212f.patch
-# patches.suse/8dacaedf04467e32c50148751a96150e73323cdc.patch
#- patches.suse/d2dd482bc17c3bc240045f80a7c4b4d5cea5e29c.patch

#+rev patches.suse/bcb05504edf0e27a648aa1059cbb71e8746758a1.patch




2006-03-06 23:02:37

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Mon, Mar 06, Olaf Hering wrote:

> On Mon, Mar 06, Olaf Hering wrote:
>
> > On Mon, Mar 06, Paul Mackeras wrote:
> >
> > > There are also commits from Ben H that change the way we parse
> > > addresses from the OF device tree. If you can bisect a bit further
> > > that would be good, although you may strike problems between the 401d
> > > and 6237 commits I mentioned above.
> >
> > What I have right now is this, which got me in a non-compiling state.
> > I will pick the udbg stuff and apply the relevant changes to -git5.

I tried with CONFIG_BOOTX_TEXT disabled. Same result. This is the list
of patches I used on top of 2.6.15:

patches.kernel.org/patch-2.6.15-git5
patches.suse/get_cramfs_inode-revert.patch
patches.suse/suse-ppc-legacy-io.patch
patches.arch/0022-powerpc-incorrect-rmo_top-handling-in-prom_init.txt
patches.suse/9100b205fdc70b300894954ebebbf2709c5ed525.patch
patches.suse/3d1229d6ae92ed1994f4411b8493327ef8f4b76f.patch
patches.suse/d1405b869850982f05c7ec0d3f137ca27588192f.patch
patches.suse/463ce0e103f419f51b1769111e73fe8bb305d0ec.patch

patches.suse/51d3082fe6e55aecfa17113dbe98077c749f724c.patch
patches.suse/31df1678d7732b94178a6e457ed6666e4431212f.patch
patches.suse/8dacaedf04467e32c50148751a96150e73323cdc.patch
patches.suse/52020d2bda9fe447bb50674a2e39e4064b6a10b5.patch

2006-03-07 01:15:18

by Suzanne Wood

Subject: Re: 2.6.16-rc regression: m68k CONFIG_RMW_INSNS=n compile broken

Hello and thank you again.

> From [email protected] Mon Mar 6 08:49:55 2006

> On Mon, Mar 06, 2006 at 08:13:41AM -0800, Suzanne Wood wrote:
> > Thank you very much.
> > > > struct file fastcall *fget_light(unsigned int fd, int *fput_needed)
> > > > {
> > > > struct file *file;
> > > > struct files_struct *files = current->files;
> > > >
> > > > *fput_needed = 0;
> > > > if (likely((atomic_read(&files->count) == 1))) {
> > > > file = fcheck_files(files, fd);
> > > > } else {
> >
> > > This means that the fd table is not shared between threads. So,
> > > there can't be any race and no need to protect using
> > > rcu_read_lock()/rcu_read_unlock().
> >
> > Then why call fcheck_files() with the rcu_dereference() which would flag
> > an automated check for the need to mark a read-side critical section?
> > Would it make sense to introduce the function that doesn't? The goal of
> > keeping the kernel small is balanced with clarity. The inconsistency of
> > how fcheck_files() is used within a single function (fget_light()) was
> > my opening question.

> Because rcu_dereference() hurts only alpha and we don't care about
> alpha :-)

> Just kidding!

> Good point about automated checkers. However, this isn't an
> uncommon thing in multi-threaded programs - can't the checker
> rules be written to take into account sharing and non-sharing of
> the object in question ?

Henzinger, et al., UC Berkeley, describe race checking on nesC, a
language for networked embedded systems that uses the atomic
keyword to delimit sections. (The rcu_read_lock() would be
similar, in that on CONFIG_PREEMPT kernels it disables preemption.)
When flow-based analysis returned false positives, the programmer
could annotate the code with "norace", and in practice all shared
accesses were put in atomic sections even if there were no actual
race conditions. To limit the number of atomic sections, the
UCB group modeled multiple threads, hardware-triggered interrupts,
and interleaved tasks, checked for safe access, and manually
corrected the unsafe code.

In fget_light(), the rcu_dereference() is apparently never
intended in the "if true" branch of the conditional, where
likely(atomic_read(&files->count) == 1), so only one file
descriptor is open for the current task at that instant. (A
child process could share that descriptor, but an unrelated
process would have its own file descriptor.)

But fget_light() does return the file pointer, which _some_ of
the time does require RCU protection. So a hypothetical checker
would flag it as unsafe unless an rcu_read_lock() is in place
in a caller at some level; checking could then proceed to other
locking.

The core premises have been that (1) a path through the code
that contains rcu_dereference() or rcu_assign_pointer() is
matched to its assign/deref counterpart on the same struct
type, and (2) the rcu_dereference() is nested in a read-side
critical section, delimited by rcu_read_lock() and
rcu_read_unlock(), which bounds how long the struct at that
address must remain valid.
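
The following is a minimal sketch of premise (2) as a path checker. It walks the events along one code path and reports the first rcu_dereference() that falls outside a read-side critical section. The event encoding and check_rcu_path() are hypothetical, and premise (1), matching assign/deref pairs by struct type, is not modeled. Run on the unshared branch of fget_light(), a checker like this would report exactly the false positive discussed in this thread. That is where a nesC-style "norace" annotation or a sharing-aware rule would come in.

```c
#include <stddef.h>

/* Hypothetical events a path-based checker might extract from one
 * path through the code. */
enum rcu_event { EV_READ_LOCK, EV_READ_UNLOCK, EV_DEREFERENCE };

/* Returns the index of the first rcu_dereference() not nested in a
 * read-side section, or of the first unlock with no matching lock.
 * A section still open at the end of the path is reported as n.
 * Returns -1 if the path looks clean. */
int check_rcu_path(const enum rcu_event *ev, size_t n)
{
    int depth = 0;   /* rcu_read_lock() calls may nest */

    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_READ_LOCK:
            depth++;
            break;
        case EV_READ_UNLOCK:
            if (depth == 0)
                return (int)i;     /* unlock without lock */
            depth--;
            break;
        case EV_DEREFERENCE:
            if (depth == 0)
                return (int)i;     /* dereference outside a section */
            break;
        }
    }
    return depth == 0 ? -1 : (int)n;
}
```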

The pointer to the file struct that fcheck_files() returns is
rcu_dereference(fdt->fd[fd]) and open.c has fd_install() call
rcu_assign_pointer(fdt->fd[fd], file). In file_table.c,
file_free() calls call_rcu(&f->f_u.fu_rcuhead, file_free_rcu),
where the fu_rcuhead is a field of the file struct and
file_free_rcu() calls kmem_cache_free().
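
The lifecycle described above can be modeled in userspace with C11 atomics: publish with rcu_assign_pointer(), read with rcu_dereference(), and free through call_rcu(). This is a sketch under stated assumptions, not the kernel's implementation:

- Release/acquire ordering stands in for rcu_assign_pointer()/rcu_dereference().
- A single updater is assumed; in the kernel, updates to the fd array are serialized by a spinlock.
- grace_period_elapsed() is a hypothetical hook. Real RCU would wait there for all pre-existing readers before running callbacks.

The names install_fd(), lookup_fd() and close_fd() are illustrative.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NR_FDS 8

struct file_obj {
    int id;
    struct file_obj *next_deferred;   /* stands in for f_u.fu_rcuhead */
};

static _Atomic(struct file_obj *) fd_slots[NR_FDS];

/* rcu_assign_pointer() analogue: release ordering makes the fully
 * initialised object visible before the pointer to it. */
void install_fd(unsigned int fd, struct file_obj *f)
{
    if (fd >= NR_FDS)
        return;
    atomic_store_explicit(&fd_slots[fd], f, memory_order_release);
}

/* rcu_dereference() analogue: acquire pairs with the release above
 * (the kernel relies on dependency ordering instead). */
struct file_obj *lookup_fd(unsigned int fd)
{
    return fd < NR_FDS
        ? atomic_load_explicit(&fd_slots[fd], memory_order_acquire)
        : NULL;
}

/* call_rcu() analogue: the object is queued rather than freed,
 * because readers may still hold the old pointer. Single updater. */
static struct file_obj *deferred_head;
static int deferred_count;

void close_fd(unsigned int fd)
{
    if (fd >= NR_FDS)
        return;
    struct file_obj *f = atomic_exchange(&fd_slots[fd], NULL);
    if (f) {
        f->next_deferred = deferred_head;
        deferred_head = f;
        deferred_count++;
    }
}

/* Hypothetical hook: runs the deferred frees, as file_free_rcu()
 * would after a grace period. */
void grace_period_elapsed(void)
{
    while (deferred_head) {
        struct file_obj *f = deferred_head;
        deferred_head = f->next_deferred;
        free(f);
        deferred_count--;
    }
}
```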

Thank you very much for your insights into the reasoning.
Suzanne

2006-03-08 11:24:30

by Adrian Bunk

Subject: [2.6 patch] m68k: fix cmpxchg compile errors if CONFIG_RMW_INSNS=n

On Sat, Mar 04, 2006 at 12:28:48PM -0800, Andrew Morton wrote:
> Nick Piggin <[email protected]> wrote:
> >
> > Roman Zippel wrote:
> > > Hi,
> > >
> > > On Fri, 3 Mar 2006, Andrew Morton wrote:
> > >
> > >
> > >>Yes, we now require cmpxchg of all architectures.
> > >
> > >
> > > Actually I'd prefer if we used atomic_cmpxchg() instead.
> > > The cmpxchg() emulation was never added for a good reason - to keep code
> > > from assuming it can be used for userspace synchronisation. Using an
> > > atomic_t here would probably get at least some attention.
> > >
> >
> > Yes, I guess that's what Andrew meant. The reason we can require
> > atomic_cmpxchg of all architectures is because it is only guaranteed
> > to work on atomic_t.
> >
> > Glad to hear it won't be a problem for you though.
> >
>
> Could someone with an m68k compiler please send the patch?

It's below.

cu
Adrian


<-- snip -->


This patch provides a cmpxchg() if CONFIG_RMW_INSNS=n (code stolen from
m68knommu).


Signed-off-by: Adrian Bunk <[email protected]>

---

include/asm-m68k/system.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)

--- linux-2.6.16-rc5-mm3-m68k/include/asm-m68k/system.h.old 2006-03-08 12:10:48.000000000 +0100
+++ linux-2.6.16-rc5-mm3-m68k/include/asm-m68k/system.h 2006-03-08 12:17:47.000000000 +0100
@@ -192,6 +192,21 @@
#define cmpxchg(ptr,o,n)\
((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),\
(unsigned long)(n),sizeof(*(ptr))))
+
+#else
+
+static inline unsigned long cmpxchg(volatile int *p, int old, int new)
+{
+ unsigned long flags;
+ int prev;
+
+ local_irq_save(flags);
+ if ((prev = *p) == old)
+ *p = new;
+ local_irq_restore(flags);
+ return(prev);
+}
+
#endif

#define arch_align_stack(x) (x)
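
Here is a userspace model of what the emulated cmpxchg() guarantees and how callers typically use it. The interrupt-masking macros fake_local_irq_save() and fake_local_irq_restore() are hypothetical stand-ins. cmpxchg_emul() and add_return() are illustrative names. cmpxchg_emul() returns int, whereas the patch declares unsigned long. Masking interrupts only makes the sequence atomic with respect to other code on the same CPU. That fits Roman's concern that such an emulation must not be taken as usable for userspace synchronisation.

```c
/* Hypothetical stand-ins: on a uniprocessor m68k without CAS, masking
 * interrupts is what makes the read-compare-write sequence atomic. */
static int fake_irqs_off;
#define fake_local_irq_save(flags) \
    do { (flags) = fake_irqs_off; fake_irqs_off = 1; } while (0)
#define fake_local_irq_restore(flags) \
    do { fake_irqs_off = (int)(flags); } while (0)

/* Same shape as the patch: store 'new' only if *p still equals 'old',
 * and always return the value that was found there. */
static inline int cmpxchg_emul(volatile int *p, int old, int new)
{
    unsigned long flags;
    int prev;

    fake_local_irq_save(flags);
    prev = *p;
    if (prev == old)
        *p = new;
    fake_local_irq_restore(flags);
    return prev;
}

/* Typical caller: retry until nobody changed the value between the
 * read and the compare-and-exchange. */
int add_return(volatile int *p, int delta)
{
    int old, seen;

    for (;;) {
        old = *p;
        seen = cmpxchg_emul(p, old, old + delta);
        if (seen == old)
            return old + delta;
    }
}
```

A returned value that differs from `old` signals that the exchange did not happen, which is why the caller loops.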

2006-03-10 05:28:12

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

[Re: bugme #5989, head no longer hanging in shame]

From: "Yu, Luming" <[email protected]>
> I suggest you to retest, and post dmesg with UN-modified BIOS.

I'm now running/testing an unmodified DSDT with 2.6.16-rc5. For a while
I had no S3 hangs, but I just noticed them again. The error is the same
as with the modified DSDT (with slightly different offsets):

exregion-0185 [36] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
exregion-0185 [36] ex_system_memory_space: system_memory 1 (32 width) Address=0000000023FDFFC0
exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2

repeated endlessly.

I think the problem resurfaced once I decided to let my sleep.sh script
leave the thermal driver loaded before going into S3 (suspecting that
the bug might come back if I did that).

So I suspect that my modified DSDT didn't cause the S3 problems; it
merely exposed one, even in the minimal configuration discussed in the
#5989 report.

Which makes me wonder about another bug that disappeared when I switched
to the vanilla DSDT: While printing (via gs+hpijs to an HP photosmart
2710 via the wireless card), the system makes double-beeps as if it were
having the AC adapter plugged and unplugged. These noises happen when
printing via the wireless card or via USB (to a different HP inkjet),
but not when printing via the parallel port to a Lexmark laserprinter
(using just gs). Since I didn't do anything to the battery code in the
DSDT, I now wonder whether changing the DSDT merely exposed the issue
but didn't create it.

[From an earlier msg:]
> I think the truth is, for 5989, we need to fix thermal and processor
> driver issue.

I agree, although I think the processor driver is not the culprit. My
earlier testing (with the modified DSDT) worked fine with the
processor module loaded, but hung with processor + thermal loaded.

-Sanjoy

`A society of sheep must in time beget a government of wolves.'
- Bertrand de Jouvenel

2006-03-11 21:59:42

by Olaf Hering

Subject: Re: Linux v2.6.16-rc5

On Tue, Mar 07, Olaf Hering wrote:

> On Mon, Mar 06, Olaf Hering wrote:
>
> > On Mon, Mar 06, Olaf Hering wrote:
> >
> > > On Mon, Mar 06, Paul Mackeras wrote:
> > >
> > > > There are also commits from Ben H that change the way we parse
> > > > addresses from the OF device tree. If you can bisect a bit further
> > > > that would be good, although you may strike problems between the 401d
> > > > and 6237 commits I mentioned above.
> > >
> > > What I have right now is this, which got me in a non-compiling state.
> > > I will pick the udbg stuff and apply the relevant changes to -git5.
>
> I tried with CONFIG_BOOTX_TEXT disabled. same result. This is the list
> of patches I used on top of 2.6.15:
>
> patches.kernel.org/patch-2.6.15-git5
> patches.suse/get_cramfs_inode-revert.patch
> patches.suse/suse-ppc-legacy-io.patch
> patches.arch/0022-powerpc-incorrect-rmo_top-handling-in-prom_init.txt
> patches.suse/9100b205fdc70b300894954ebebbf2709c5ed525.patch
> patches.suse/3d1229d6ae92ed1994f4411b8493327ef8f4b76f.patch
> patches.suse/d1405b869850982f05c7ec0d3f137ca27588192f.patch
> patches.suse/463ce0e103f419f51b1769111e73fe8bb305d0ec.patch
>
> patches.suse/51d3082fe6e55aecfa17113dbe98077c749f724c.patch
> patches.suse/31df1678d7732b94178a6e457ed6666e4431212f.patch
> patches.suse/8dacaedf04467e32c50148751a96150e73323cdc.patch
> patches.suse/52020d2bda9fe447bb50674a2e39e4064b6a10b5.patch

51d3082fe6e55aecfa17113dbe98077c749f724c enabled powersave_nap
unconditionally, but early G3 CPUs cannot handle it.
I sent a patch in another thread.

2006-05-19 13:42:16

by Thomas Renninger

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

On Fri, 2006-03-10 at 00:26 -0500, Sanjoy Mahajan wrote:
> [Re: bugme #5989, head no longer hanging in shame]
>
> From: "Yu, Luming" <[email protected]>
> > I suggest you to retest, and post dmesg with UN-modified BIOS.
>
> I'm now running/testing an unmodified DSDT with 2.6.16-rc5. For a while
> I had no S3 hangs, but I just noticed them again. The error is the same
> as with the modified DSDT (with slightly different offsets):
>
> exregion-0185 [36] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
> exregion-0185 [36] ex_system_memory_space: system_memory 1 (32 width) Address=0000000023FDFFC0
> exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
>
> repeated endlessly.

This sounds like the problem Daniel had on his Samsung P35 recently.
He could fix it by getting rid of some asus_unhide_smbus stuff or, the
other way around, by adding asus_unhide_smbus quirks in the S3 resume code.

This thread was recently posted on lkml:
Re: [patch] smbus unhiding kills thermal management

Here are some more details, for me that sounds related...:
https://bugzilla.novell.com/show_bug.cgi?id=173420

Thomas

2006-05-21 00:13:01

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

> This sounds like the problem Daniel had on his Samsung P35 recently.
> He could fix it by getting rid of some asus_unhide_smbus stuff or, the
> other way around, by adding asus_unhide_smbus quirks in the S3 resume code.
>
> This thread was recently posted on lkml:
> Re: [patch] smbus unhiding kills thermal management
>

That seems likely, thanks for the pointer: Besides the ACPI sleep
hangs, this machine (TP 600X) has fan troubles upon S3 resume. The
problems don't do harm (the damn fan keeps turning on when it
shouldn't), but that's probably chance. Various patches that I tested
for S3 resume hangs reversed this fan behavior, making the fan refuse
to turn on when it should have. The same problem happened after
resume from swsusp (bugzilla #5000).

> https://bugzilla.novell.com/show_bug.cgi?id=173420

From Comment #30 at the above url: "The Linux ACPI code seems to
actively prevent the fan from running and that worries me."

I saw that as well, and found the following recipe would work around
the problem:

1. Set the trip point to, say, 70 C -- well above the actual
temperature.

2. Then set the trip to anything reasonable that's under the current
temperature (27 C always works). Now the fan turns on, and behaves
fine from then.

My explanation is that, before step 1, the fan is off but the OS
thinks it's on. So the dialogue goes something like:

Hardware (from EC or BIOS?): Ack, I'm overheating, turn on the fan now!
OS: There, there, take it easy. I've checked bit fields in my
memory, and the fan is on. So I don't have to do anything.
Hardware: Ack, ...
OS: There, there, ...
[Hence the 100% kacpid CPU usage]

Based on this explanation, I added a resume method to the fan driver.
It would turn on the fan and mark it as on. So then the internal OS
state matched the actual state. The fix didn't work for at least one
reason: ACPI drivers didn't have suspend/resume methods (though now
there are test patches to add those methods).

Another fix, probably worth doing anyway, is to turn on the fan if the
BIOS asks for it, whether or not the OS thinks it's on. The chance of
the two pieces of information getting out of synch, and the hardware
damage it can cause, is enough to make it worthwhile. The reverse
case can try to optimize (if BIOS asks to shut off the fan, shut it
off only if OS thinks it's on). That creates no danger: just extra
fan noise if the fan is on but the OS thinks it's off.
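
The resume fix and the asymmetric policy proposed above can be sketched as follows. struct fan, fan_resume() and fan_request() are hypothetical; they are not the ACPI fan driver's interface. In the model, hw_on is the real fan and os_thinks_on is the driver's cached state, which can go stale across S3 resume.

```c
#include <stdbool.h>

/* Hypothetical model of a fan whose cached state can go stale. */
struct fan {
    bool hw_on;          /* what the hardware is actually doing */
    bool os_thinks_on;   /* the driver's cached belief */
};

static void fan_set(struct fan *f, bool on)
{
    f->hw_on = on;       /* stands in for the real ACPI on/off call */
    f->os_thinks_on = on;
}

/* Resume hook: force a known state so cache and hardware agree. */
void fan_resume(struct fan *f)
{
    fan_set(f, true);
}

/* Obey "on" requests unconditionally; skip an "off" request only
 * when the cache already says off (worst case: extra fan noise). */
void fan_request(struct fan *f, bool want_on)
{
    if (want_on)
        fan_set(f, true);
    else if (f->os_thinks_on)
        fan_set(f, false);
}
```

With this policy, the stale "OS thinks it's on" case from the dialogue above can no longer keep the fan off. The opposite mismatch only costs extra fan noise.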

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.

2006-05-21 00:42:10

by Carl-Daniel Hailfinger

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

Sanjoy Mahajan wrote:
> That seems likely, thanks for the pointer: Besides the ACPI sleep
> hangs, this machine (TP 600X) has fan troubles upon S3 resume. The
> problems don't do harm (the damn fan keeps turning on when it
> shouldn't), but that's probably chance. Various patches that I tested
> for S3 resume hangs reversed this fan behavior, making the fan refuse
> to turn on when it should have. The same problem happened after
> resume from swsusp (bugzilla #5000).

Please try kernel 2.6.16.17 (just released). It has the SMBus fix which
may fix resume and fan behaviour.


Regards,
Carl-Daniel
--
http://www.hailfinger.org/

2006-05-21 01:30:12

by Joshua Hudson

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

On 5/20/06, Carl-Daniel Hailfinger <[email protected]> wrote:
> Sanjoy Mahajan wrote:
> > That seems likely, thanks for the pointer: Besides the ACPI sleep
> > hangs, this machine (TP 600X) has fan troubles upon S3 resume. The
> > problems don't do harm (the damn fan keeps turning on when it
> > shouldn't), but that's probably chance. Various patches that I tested
> > for S3 resume hangs reversed this fan behavior, making the fan refuse
> > to turn on when it should have. The same problem happened after
> > resume from swsusp (bugzilla #5000).
>
> Please try kernel 2.6.16.17 (just released). It has the SMBus fix which
> may fix resume and fan behaviour.

Am I the only person who read that as 2.6.17 the first time around?

2006-05-21 03:53:44

by Lee Revell

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

On Sat, 2006-05-20 at 18:30 -0700, Joshua Hudson wrote:
> On 5/20/06, Carl-Daniel Hailfinger <[email protected]> wrote:
> > Please try kernel 2.6.16.17 (just released). It has the SMBus fix which
> > may fix resume and fan behaviour.
>
> Am I the only person who read that as 2.6.17 the first time around?

I think it's evidence that the -stable process is working brilliantly.
We have 17 point releases worth of bug fixes that would not have been
available under the previous model.

Lee

2006-05-22 09:56:31

by Pavel Machek

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

Hi!

> > https://bugzilla.novell.com/show_bug.cgi?id=173420
>
> From Comment #30 at the above url: "The Linux ACPI code seems to
> actively prevent the fan from running and that worries me."
>
> I saw that as well, and found the following recipe would work around
> the problem:
>
> 1. Set the trip point to, say, 70 C -- well above the actual
> temperature.
>
> 2. Then set the trip to anything reasonable that's under the current
> temperature (27 C always works). Now the fan turns on, and behaves
> fine from then.
>
> My explanation is that, before step 1, the fan is off but the OS
> thinks it's on. So the dialogue goes something like:
>
> Hardware (from EC or BIOS?): Ack, I'm overheating, turn on the fan now!
> OS: There, there, take it easy. I've checked bit fields in my
> memory, and the fan is on. So I don't have to do anything.
> Hardware: Ack, ...
> OS: There, there, ...
> [Hence the 100% kacpid CPU usage]
>
> Based on this explanation, I added a resume method to the fan driver.
> It would turn on the fan and mark it as on. So then the internal OS
> state matched the actual state. The fix didn't work for at least one
> reason: ACPI drivers didn't have suspend/resume methods (though now
> there are test patches to add those methods).

Can you redo your patches with those methods?

> Another fix, probably worth doing anyway, is to turn on the fan if the
> BIOS asks for it, whether or not the OS thinks it's on. The chance of
> the two pieces of information getting out of synch, and the hardware
> damage it can cause, is enough to make it worthwhile. The reverse

There should be 0% chance of hardware damage. Fan failure means overheating,
which means an emergency power cutoff. I even tested it several times with
paper stuck in the fan blades. It mostly works.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html