2006-02-13 01:19:57

by Linus Torvalds

Subject: Linux 2.6.16-rc3


Ok,
it is out there (or in the process of getting mirrored out), so go wild.

And remember - if you get the tar-balls, you can get the patch file as a
bonus FOR THE SAME PRICE! Have we got an unbeatable deal for you or what?

Of course, if you get the git repo, you'll get both anyway. And a nifty
full log, since I'm just appending the short version here as a teaser.

Most of it is your run-of-the-mill fixes, with some driver updates making
it all a bit larger (DVB in particular, but there are a few odd ones
elsewhere too).

The most user-visible one (eventually) is the unshare() system call, which
glibc wanted. Along with some fixes for fstatat() (use the proper 64-bit
interfaces, not the "newer old" one).
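
For anyone who has not looked at unshare() yet, here is a rough sketch of what the call does, assuming a Linux box whose libc exports an unshare() wrapper. It is written in Python via ctypes purely for brevity (the real interface is the C one); the helper name demo_unshare_fs and the choice of CLONE_FS as the flag to demonstrate are illustrative assumptions, not anything taken from the patches. Threads in a process normally share one current directory; after unshare(CLONE_FS) the calling thread gets a private copy, so its chdir() no longer moves everybody else.

```python
import ctypes
import errno
import os
import tempfile
import threading

CLONE_FS = 0x00000200  # flag value as defined in <linux/sched.h>

# Handle on the C library already loaded into this process.
# Assumption: it exports unshare() (Linux only).
_libc = ctypes.CDLL(None, use_errno=True)


def demo_unshare_fs(target_dir):
    """Unshare CLONE_FS in a worker thread, then chdir() there.

    Returns (unshared, worker_cwd, main_cwd_afterwards).
    """
    result = {}

    def worker():
        if _libc.unshare(CLONE_FS) != 0:
            # Can fail, e.g. EPERM under a restrictive sandbox or
            # ENOSYS on a kernel without the syscall. In that case
            # leave the shared working directory untouched.
            result["error"] = errno.errorcode.get(ctypes.get_errno(), "?")
            return
        os.chdir(target_dir)        # now affects only this thread
        result["cwd"] = os.getcwd()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return "cwd" in result, result.get("cwd"), os.getcwd()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_unshare_fs(tmp))
```

If the call succeeds, the worker reports the temporary directory as its cwd while the main thread stays where it started. If it fails, the sketch records the errno name and changes nothing.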

MIPS updates to round it out, although other architectures got their
regularly scheduled updates too (powerpc, arm, ia64, s390..)

The shortlog gives more details.

Linus

----

Adrian Bunk:
VIDEO_CX88_ALSA must select SND_PCM
V4L/DVB (3428): drivers/media/dvb/ possible cleanups
kernel/kprobes.c: fix a warning #ifndef ARCH_SUPPORTS_KRETPROBES
don't allow users to set CONFIG_BROKEN=y
drivers/serial/jsm/: cleanups
drivers/ide/ide-io.c: make __ide_end_request() static
IDE: always enable CONFIG_PDC202XX_FORCE
OCFS2: __init / __exit problem
fs/ocfs2/dlm/dlmrecovery.c must #include <linux/delay.h>
Let CDROM_PKTCDVD_WCACHE depend on EXPERIMENTAL
i386: HIGHMEM64G must depend on X86_CMPXCHG64
drivers/base/: proper prototypes
V4L/DVB (3318e): DVB: remove the at76c651/tda80xx frontends
drivers/video/Kconfig: remove unused BUS_I2C option

Akinobu Mita:
fix generic_fls64()

Al Viro:
cris: asm-offsets related build failure
remove bogus asm/bug.h includes.
bogus asm/delay.h includes
drive_info removal outside of arch/i386
missing includes in drivers/net/mv643xx_eth.c
fix breakage in ocp.c
restore power-off on sparc32
ppc: last_task_.... is defined only on non-SMP
drivers/scsi/mac53c94.c __iomem annotations
fallout from ptrace consolidation patch: cris/arch-v10
missing include in ser_a2232
fix __user annotations in fs/select.c
ipv4 NULL noise removal
timer.c NULL noise removal
kernel/sys.c NULL noise removal
dvb NULL noise removal
drivers/char/watchdog/sbc_epx_c3.c __user annotations
fix __user annotations in drivers/base/memory.c
drivers/edac/i82875p_edac.c __user annotations
cmm NULL noise removal, __user annotations
scsi_transport_iscsi gfp_t annotations
sg gfp_t annotations
eeh_driver NULL noise removal
bogus extern in low_i2c.c
amd64 time.c __iomem annotations
__user annotations of video_spu_palette
net/ipv6/mcast.c NULL noise removal
arch/x86_64/pci/mmconfig.c NULL noise removal
nfsroot port= parameter fix [backport of 2.4 fix]
umount_tree() decrements mount count on wrong dentry
arm: fix dependencies for MTD_XIP
mips: namespace pollution - mem_... -> __mem_... in io.h
s390x compat __user annotations
powermac pci iomem annotations
drivers/media/video __user annotations and fixes
powerpc signal __user annotations
sn3 iomem annotations and fixes
compat_ioctl __user annotations
s390 misc __user annotations
fix iomem annotations in dart_iommu
__user annotations in powerpc thread_info
synclink_gt is PCI-only
s390 __get_user() bogus warnings removal
type-safe min() in prism54
mark HISAX_AMD7930 as broken
m32r_sio iomem annotations
sh: lvalues abuse in arch/sh/boards/renesas/rts7751r2d/io.c

Alan Cox:
SBC EPX does not check/claim I/O ports it uses (2nd Edition)
rio cleanups
Fix some ucLinux breakage from the tty updates
ide: set latency when resetting it821x out of firmware mode

Alexey Dobriyan:
dscc4: fix dscc4_init_dummy_skb check
include/asm-*/bitops.h: fix more "~0UL >> size" typos
ixj: fix writing silence check
ipmi: mem_{in,out}[bwl] => intf_mem_{in,out}[bwl]
dscc4: fix dscc4_init_dummy_skb check

Alexey Kuznetsov:
[NETLINK]: Fix a severe bug
[NETLINK]: illegal use of pid in rtnetlink

Ananth N Mavinakayanahalli:
Kprobes: Fix deadlock in function-return probes

Andi Kleen:
x86_64: Update defconfig
x86_64: Disallow kprobes on NMI handlers
x86_64: Define pmtmr_ioport to 0 when PM_TIMER is not available
x86_64: Allow to run main time keeping from the local APIC interrupt
x86_64: Automatically enable apicmaintimer on ATI boards
x86_64: Fix swiotlb dma_alloc_coherent fallback
x86_64: Undo the earlier changes to remove unrolled copy/memset functions
x86_64: Remove CONFIG_INIT_DEBUG
x86_64: Remove rogue default y in EDAC Kconfig
x86_64: Clear more state when ignoring empty node in SRAT parsing
x86_64: Do more checking in the SRAT header code
x86_64: Fix zero mcfg entry workaround on x86-64
x86_64: Don't allow kprobes on __switch_to
x86_64: Calibrate APIC timer using PM timer
i386/x86-64: Don't ack the APIC for bad interrupts when the APIC is not enabled
x86_64: Let impossible CPUs point to reference per cpu data
Fix bad apic fix on i386
x86-64: Add sys_unshare
x86_64: GART DMA merging fix

Andreas Gruenbacher:
Fix two ext[23] uninitialized warnings
Fix building external modules on ppc32

Andreas Mohr:
ide Kconfig fixes

Andreas Schwab:
ufs: fix char vs. __s8 clash in ufs

Andrew Morton:
sx.c warning fixes
parport_serial: printk warning fix
quota_v2: printk warning fixes
sx.c printk warning fixes
uninline __sigqueue_free()
ip2main.c warning fixes
reiserfs_get_acl() build fix
jbd: fix transaction batching
uli526x warning fix
module: strlen_user() race fix
x86: don't initialise cpu_possible_map to all ones
select: fix returned timeval
tipar fixes
fbdev: video_setup() warning fix

Andrey Panin:
[SERIAL] SIIG 8-port serial boards support

Andy Gospodarek:
r8169: fix forced-mode link settings

Antonino A. Daplas:
nvidiafb: Add support for Geforce4 MX 4000

Arjan van de Ven:
ocfs2: Semaphore to mutex conversion.

Arnaud Giersch:
parport: add parallel port support for SGI O2
parport: fix documentation
parport: remove dead address in MAINTAINERS

Ashok Raj:
x86_64: data/functions wrongly marked as __init with cpu hotplug.
x86_64: Dont record local apic ids when they are disabled in MADT

Atsushi Nemoto:
[SERIAL] initialize spinlock for port failed to setup console
[MIPS] Sparse: Fix some compiler/sparse warnings in ptrace32.c
[MIPS] Build blast_cache routines from template
[MIPS] Sparse: Add _MIPS_SZINT and _MIPS_ISA to CHECKFLAGS to fix sparse warnings.
[MIPS] Remove wrong __user tags.
[MIPS] ieee754[sd]p_neg workaround
[MIPS] Sparse: Add some __user tags to signal functions.
[MIPS] Fix minor sparse warnings
[MIPS] Fix dump_tlb.c warning and cleanup.
[MIPS] TX49 MFC0 bug workaround
[MIPS] Sparse: Add __user tags to syscall.c
[MIPS] Add 'const' to readb and friends

Becky Bruce:
documentation/powerpc: add bus-frequency property to SOC node
powerpc: Add FSL USB node to documentation

Ben Dooks:
[ARM] 3303/1: S3C24XX - add clock enable usage counting
[ARM] 3306/1: S3C24XX - update defconfig
[ARM] 3299/1: S3C24XX - fix irq range on adc device
[ARM] 3326/1: H1940 - Control latches

Benjamin Herrenschmidt:
Fix uevent buffer overflow in input layer
powerpc: Fix sound driver use of i2c
powerpc: Thermal control for dual core G5s

Bjorn Helgaas:
[IA64] avoid broken SAL_CACHE_FLUSH implementations
ia64: drop arch-specific IDE MAX_HWIFS definition

Carsten Otte:
ext2: print xip mount option in ext2_show_options

Catalin Marinas:
[ARM] 3290/1: Fix the FIFO size detection
[ARM] 3313/1: Use OSC4 instead of OSC1 for CLCD

Chen, Kenneth W:
[IA64] remove staled comments in asm/system.h
x86_64: Fix memory policy build without CONFIG_HUGETLBFS
[IA64] add syscall entry for *at()

Chris McDermott:
x86-64: Fix HPET timer on x460

Chris Pascoe:
V4L/DVB (3308): Use parallel transport for FusionHDTV Dual Digital USB

Christoph Lameter:
hugetlb: add comment explaining reasons for Bus Errors
hugetlbpage: return VM_FAULT_OOM on oom
Updates for page migration
zone reclaim: do not check references to a page during zone reclaim
vmscan: remove duplicate increment of reclaim_in_progress
vmscan: skip reclaim_mapped determination if we do not swap

Chuck Ebbert:
sched: only print migration_cost once per boot
i386 cpu hotplug: don't access freed memory
i386: print kernel version in register dumps
kobject: don't oops on null kobject.name

Cornelia Huck:
s390: fix to_channelpath macro
s390: fix locking in __chp_add() and s390_subchannel_remove_chpid()

Daniel Jacobowitz:
[MIPS] Support /proc/kcore for MIPS

Dave C Boutcher:
powerpc: return correct rtas status from ibm,suspend-me
powerpc: prod all processors after ibm,suspend-me
powerpc: remove useless call to touch_softlockup_watchdog

Dave Jones:
Fix build failure in recent pm_prepare_* changes.
EDAC config cleanup
missing license tag in intermodule
V4L/DVB (3318c): fix saa7146 kobject register failure
More informative message on umount failure
Fix s390 build failure.

Davi Arnaut:
Fix keyctl usage of strnlen_user()

David Binderman:
[IRDA]: out of range array access

David Brownell:
SPI: spi_butterfly, restore lost deltas

David Chinner:
[XFS] Account for the page we just wrote when we detect congestion during

David Gibson:
powerpc: Cleanup, consolidating icache dirtying logic
Hugepages need clear_user_highpage() not clear_highpage()

David S. Miller:
[TG3]: Update driver version and release date.
[SPARC64]: Add .gitignore file for sparc64 boot images.
[SPARC64]: Update defconfig.
[SPARC]: Wire up sys_unshare().
[SPARC64]: Update defconfig.

dean gaudet:
fcntl F_SETFL and read-only IS_APPEND files

Domen Puncer:
drivers/isdn/sc/ioctl.c: copy_from_user() size fix

Eric Dumazet:
percpu data: only iterate over possible CPUs

Eric Paris:
s390: remove one set of brackets in __constant_test_bit()

Eric Sesterhenn:
i2c: Use module_param in i2c-algo-sibyte

Eric Sesterhenn / snakebyte:
BUG_ON() Conversion in fs/ocfs2/
BUG_ON() Conversion in fs/configfs/

Eric W. Biederman:
edac_mc: Remove include of version.h

Evgeniy Dushistov:
ufs: fix oops with `ufs1' type
ufs: fix hang during `rm'

Felix Oxley:
fs/jffs/intrep.c: 255 is unsigned char

Fernando Luis Vazquez Cao:
Compilation of kexec/kdump broken

Francois Romieu:
r8169: prevent excessive busy-waiting
8139too: fix a TX timeout watchdog thread against NAPI softirq race

Geoff Levand:
powerpc: Fix spufs initialization sequence.

George Anzinger:
Normalize timespec for negative values in ns_to_timespec

Grant Grundler:
[PARISC] Remove unnecessary extern declarations from asm/pci.h

Greg Kroah-Hartman:
USB: Fix GPL markings on usb core functions.
kobject_add() must have a valid name in order to succeed.
DRM: fix up classdev interface for drm core
IB: fix up major/minor sysfs interface for IB core

Greg Ungerer:
m68knommu: compile fixes for mcfserial.c
m68knommu: need pm_power_off in m68knommu
m68knommu: hardirq.h needs definition of NR_IRQS
m68knommu: use tty_schedule_flip() in 68360serial.c
m68knommu: use tty_schedule_flip() in 68328serial.c

Hans Verkuil:
V4L/DVB (3403): Add probe check for the tda9840.
V4L/DVB (3300): Add standard for South Korean NTSC-M using A2 audio.

Haren Myneni:
kexec: fix in free initrd when overlapped with crashkernel region

Heiko Carstens:
s390: compile fix: missing defines in asm-s390/io.h
s390: fix compat syscall wrapper
[SPARC64]: Fix sys_newfstatat syscall table entry for 64-bit.
remove bogus comment from init/main.c
s390: update default configuration
s390: earlier initialization of cpu_possible_map
s390: update maintainers file
s390: fix non smp build of kexec
s390: add support for unshare system call
s390: add #ifdef __KERNEL__ to asm-s390/setup.h
s390: fstatat64 support

Helge Deller:
[PARISC] Use kzalloc and other janitor-style cleanups
[PARISC] Drop unused do_check_pgt_cache()
[PARISC] Clean up compiler warning in pci.c
[PARISC] Add CONFIG_DEBUG_RODATA to protect read-only data
[PARISC] Use DEBUG_KERNEL to catch used-after-free __init data

Herbert Poetzl:
quota: remove unused sync_dquots_dev()
quota: fix error code for ext2_new_inode()

Herbert Xu:
[IPV6]: Don't hold extra ref count in ipv6_ifa_notify
[IPV4] multipath_wrandom: Fix softirq-unsafe spin lock usage
[IPV6]: Fix illegal dst locking in softirq context.
[ICMP]: Fix extra dst release when ip_options_echo fails
[PPP]: Fixed hardware RX checksum handling

Hidetoshi Seto:
[IA64] mca_drv: Add minstate validation

Holger Eitzenberger:
[NETFILTER]: ULOG/nfnetlink_log: Use better default value for 'nlbufsiz'

Horms:
[IPV4]: Document icmp_errors_use_inbound_ifaddr sysctl
[IPV4]: Remove suprious use of goto out: in icmp_reply

Hugh Dickins:
x86: fix stack trace facility level

Ian Campbell:
[WATCHDOG] sa1100_wdt.c sparse clean (2)

Ian Pickworth:
V4L/DVB (3416): Recognise Hauppauge card #34519

Ingo Molnar:
sem2mutex: drivers/macintosh/windfarm_core.c
solve false-positive soft lockup messages during IDE init
Fix spinlock debugging delays to not time out too early
SLOB=y && SMP=y fix
x86: print out early faults via early_printk()

Ivan Kokshaysky:
alpha: set cpu_possible_map much earlier

J. Bruce Fields:
[OCFS2] Documentation Fix
knfsd: fix nfs4_open lock leak

Jack Steiner:
[IA64-SGI] Update TLB flushing code for SN platform

Jake Moilanen:
powerpc: IOMMU SG paranoia

James Bottomley:
[PARISC] Fix floating point invalid exception trap handler

Jan Beulich:
x86_64: small fix for CFI annotations
x86_64: minor odering correction to dump_pagetable()
prevent recursive panic from softlockup watchdog

Jan Glauber:
s390: timer interface visibility

JANAK DESAI:
unshare system call -v5: Documentation file
unshare system call -v5: system call handler function
unshare system call -v5: unshare filesystem info
unshare system call -v5: unshare namespace
unshare system call -v5: unshare vm
unshare system call -v5: unshare files
unshare system call -v5: system call registration for i386
powerpc: unshare system call registration

Janak Desai:
[IA64] unshare system call registration for ia64

Jason Gaston:
piix: add Intel ICH8M device IDs
i2c-i801: I2C patch for Intel ICH8

Jay Vosburgh:
bonding: allow bond to use TSO if slaves support it

Jayachandran C:
UDF: Fix issues reported by Coverity in namei.c
IPMI: fix issues reported by Coverity in ipmi_msghandler.c

Jean Delvare:
ide-disk: Restore missing space in log message
I2C: Resurrect i2c_smbus_write_i2c_block_data.
hwmon: Fix negative temperature readings in lm77 driver
hwmon: Inline w83792d register access functions
hwmon: Add f71805f documentation
hwmon: New f71805f driver
hwmon: Fix reboot on it87 driver load

Jeff Dike:
uml: add debug switch for skas mode
uml: close TUN/TAP file descriptors
uml: balance list_add and list_del in the network driver
uml: block SIGWINCH in ptrace tester child
uml: initialize process FP registers properly
uml: remove a dead file

Jeff Garzik:
[libata sata_sil] implement 'slow_down' module parameter
[libata sata_mv] do not enable PCI MSI by default

Jeff Mahoney:
ocfs2/dlm: fix compilation on ia64
reiserfs: disable automatic enabling of reiserfs inode attributes

Jeff Moyer:
fix O_DIRECT read of last block in a sparse file

Jens Axboe:
fix ordering on requeued request drainage
cciss: softirq handler needs to save interrupt flags
blk: Fix SG_IO ioctl failure retry looping

Jes Sorensen:
drivers/sn/ must be entered for CONFIG_SGI_IOC3
[IA64-SGI] sn2 housekeeping
[IA64-SGI] include/asm-ia64/sn/intr.h more sn2 housekeeping
[IA64] prevent sn2 specific code to be run in generic kernels

Jesper Juhl:
Don't check pointer for NULL before passing it to kfree [arch/powerpc/kernel/rtas_flash.c]
wrong firmware location in IPW2100 Kconfig entry
netfilter: fix build error due to missing has_bridge_parent macro

Jesse Allen:
orinoco: support smc2532w

Jesse Brandeburg:
e100: remove init_hw call to fix panic

Jiri Slaby:
V4L/DVB (3439a): media video stradis memory fix

Joel Becker:
o Remove confusing Kconfig text for CONFIGFS_FS.
configfs: Clean up MAINTAINERS entry
configfs: Add permission and ownership to configfs objects.

John Blackwood:
arch/x86_64/kernel/traps.c PTRACE_SINGLESTEP oops

John Heffner:
[TCP]: rcvbuf lock when tcp_moderate_rcvbuf enabled

Jon Mason:
x86_64: IOMMU printk cleanup

Jordan Crouse:
[SERIAL] Fix compile error in 8250_au1x00.c
[MMC] Remove extra character in AU1XXX MMC Kconfig entry

KAMBAROV, ZAUR:
coverity: udf/balloc.c null deref fix

KAMEZAWA Hiroyuki:
shmdt cannot detach not-alined shm segment cleanly.

Karsten Keil:
i4l: warning fixes

Keith Owens:
[IA64-SGI] Recursive flags do not work for selective builds
Tell kallsyms_lookup_name() to ignore type U entries

Kevin VanMaren:
x86_64: When allocation of merged SG lists fails in the IOMMU don't merge

Kirill Korotaev:
[NETFILTER]: Fix possible overflow in netfilters do_replace()

Kristian Slavov:
[IPV6]: Address autoconfiguration does not work after device down/up cycle

Kumar Gala:
gianfar: Fix sparse warnings
[SERIAL] 8250 serial console update uart_8250_port ier
powerpc: Add CONFIG_DEFAULT_UIMAGE for embedded boards

Kurt Hackel:
ocfs2/dlm: fixes

Kyle McMartin:
[PARISC] Use F_EXTEND() for COMMAND_GLOBAL
[PARISC] atomic64 support
[PARISC] Move pm_power_off export to process.c
[PARISC] Remove obsolete _hlt cruft
[PARISC] Add chassis_power_off routine
[PARISC] Clean up printk in superio.c
[PARISC] Arch-specific compat signals
[PARISC] Simplify DISCONTIGMEM in Kconfig
[PARISC] New syscalls (inotify, *at, pselect6/ppoll, migrate_pages)
[IA64] Remove stale comment from ia64/Kconfig
sys_hpux: fix strlen_user() race

Latchesar Ionkov:
v9fs: symlink support fixes
v9fs: v9fs_put_str fix
v9fs: fix corner cases when flushing request

Lennert Buytenhek:
sis900: remove cfgpmcsr I/O space register define
[ARM] 3300/1: make ixdp2x01 co-exist with other ixp2000 machine types
[ARM] 3301/1: remove unnecessary clock default from ixdp2801 defconfig
[ARM] 3302/1: make pci=firmware the default for ixp2000

Linas Vepstas:
Clean up Documentation/driver-model/overview.txt
Documentation: Updated PCI Error Recovery

Linus Torvalds:
Revert "x86_64: Fix the node cpumask of a cpu going down"
mm/slab.c (non-NUMA): Fix compile warning and clean up code
ppc: fix up trivial Kconfig config selection
Revert "kconfig: detect if -lintl is needed when linking conf,mconf"
Linux v2.6.16-rc3

Loren M. Lang:
RocketPoint 1520 [hpt366] fails clock stabilization

Lucas Correia Villa Real:
[ARM] 3284/1: S3C2400 - adds support to GPIO
[ARM] 3286/2: S3C2400 - adds to the table of supported CPUs
[ARM] 3283/1: S3C2400 - defines the number of serial ports
[ARM] 3314/1: S3C2400 - adds s3c2400.h

Luiz Fernando Capitulino:
bonding: Sparse warnings fix

Manu Abraham:
V4L/DVB (3294): Fix [Bug 5895] to correct snd_87x autodetect

Marcelo Tosatti:
make "struct d_cookie" depend on CONFIG_PROFILING
powerpc/8xx: last two 8MB D-TLB entries are incorrectly set

Marcin Rudowski:
V4L/DVB (3266): Fix NICAM buzz on analog sound

Marco Manenti:
V4L/DVB (3297): Add IR support to KWorld DVB-T (cx22702-based)

Marcus Sundberg:
[NETFILTER]: ctnetlink: Fix subsystem used for expectation events

Mark Fasheh:
[OCFS2] Make ip_io_sem a mutex
ocfs2: fix compile warnings
ocfs2: don't wait on recovery when locking journal

Mark Mason:
[MIPS] BCM1125 PCI fixes
[MIPS] BCM1480: Cleanup debug code left behind in the PCI driver.
[MIPS] SB1: Add oprofile support.

Mark Maule:
[IA64-SGI] fix smp_affinity redirection when using CONFIG_PCI_MSI
[IA64-SGI] disable msi for all altix pci devices

Markus Lidel:
I2O: don't disable PCI device if it is enabled before probing
I2O: fix and workaround for Motorola/Freescale controller
Fix i2o_scsi oops on abort

Markus Rechberger:
V4L/DVB (3429): Missing break statement on tuner-core
V4L/DVB (3434): changed comment in tuner-core.c
V4L/DVB (3281): Added signal detection support to tvp5150
V4L/DVB (3306): Fixed i2c return value, conversion mdelay to msleep
V4L/DVB (3325): Disabled debug on by default in tvp5150

Martin Michlmayr:
Fix compilation errors in maps/dc21285.c
[ARM] 3304/1: Add help descriptions to ARCH config items that don't have one
[ARM] 3305/1: Minor typographical and spelling fixes in Konfig

Matt Waddel:
Add wording to m68k .S files to help clarify license info

Matthew Wilcox:
[PARISC] Make flush_tlb_all_local take a void *
[PARISC] Update b180_defconfig
[PARISC] Remove {,un}lock_kernel from perf ioctl

Mauro Carvalho Chehab:
V4L/DVB (3406): Added credits for em28xx-video.c
V4L/DVB (3405): Fixes tvp5150a/am1 detection.
V4L/DVB (3453a): Alters MAINTAINERS file to point to newer v4l-dvb email
V4L/DVB (3318a): Makes Some symbols static.

Michael Chan:
[TG3]: Flush tg3_reset_task()

Michael Ellerman:
powerpc: Don't allocate zero bytes in finish_device_tree()
powerpc: Make sure we don't create empty lmb regions
powerpc: Refuse to boot a kdump kernel via OF
powerpc: Fix !SMP build of rtas.c
powerpc: Don't overwrite flat device tree with kdump kernel
powerpc: Don't use toc in decrementer_iSeries_masked

Michael Krufky:
V4L/DVB (3392): Add PCI ID for DigitalNow DVB-T Dual, rebranded DViCO FusionHDTV DVB-T Dual.
V4L/DVB (3413): Kill nxt2002 in favor of the nxt200x module
V4L/DVB (3414): rename dvb_pll_tbmv30111in to dvb_pll_samsung_tbmv
V4L/DVB (3417): make VP-3054 Secondary I2C Bus Support a Kconfig option.
V4L/DVB (3431): fixed spelling error, exectuted --> executed.
V4L/DVB (3442): Allow tristate build for cx88-vp3054-i2c
V4L/DVB (3299): Kconfig: DVB_USB_CXUSB depends on DVB_LGDT330X and DVB_MT352
V4L/DVB (3310): Use MT352 parallel transport function for all Bluebird FusionHDTV DVB-T boxes.

Michael Neuling:
powerpc: hypervisor check in pseries_kexec_cpu_down

Michael Richardson:
ide: cast arguments to pr_debug() properly

Michal Ostrowski:
Fix RocketPort driver

Mike Isely:
V4L/DVB (3418): Cause tda9887 to use I2C_DRIVERID_TDA9887

Miklos Szeredi:
fuse: fix request_end() vs fuse_reset_request() race

Nathan Lynch:
powerpc: avoid timer interrupt replay effect when onlining cpu

Nathan Scott:
[XFS] Fix missing inode atime update from the utime syscall.

NeilBrown:
md: Handle overflow of mdu_array_info_t->size better
md: Assorted little md fixes
md: Make sure rdev->size gets set for version-1 superblocks

Nick Piggin:
mm: compound release fix
sched: remove smpnice

Nicolas Pitre:
[ARM] 3293/1: don't invalidate the whole I-cache with xscale_coherent_user_range
[ARM] 3294/1: don't invalidate individual BTB entries on ARMv6
[ARM] 3307/1: old ABI compat: mark it experimental
[ARM] 3308/1: old ABI compat: struct sockaddr_un
[ARM] 3309/1: disable the pre-ARMv5 NPTL kernel helper in the non MMU case
[ARM] 3310/1: add a comment about the possible __kuser_cmpxchg transient false
[ARM] 3311/1: clean up include/asm-arm/mutex.h

OGAWA Hirofumi:
fat: Replace an own implementation with ll_rw_block(SWRITE,)
Trivial optimization of ll_rw_block()
fat: Fix truncate() write ordering

Olaf Hering:
powerpc: remove pointer/integer confusion in generic_calibrate_decr
powerpc: restore clock speed in /proc/cpuinfo
powerpc: remove pointer/integer confusion in of_find_node_by_name
powerpc: add refcounting to setup_peg2 and of_get_pci_address
powerpc: fix compile warning in udbg_init_maple_realmode

Oleg Nesterov:
sys_signal: initialize ->sa_mask
do_sigaction: cleanup ->sa_mask manipulation

Oliver Endriss:
V4L/DVB (3307): Support for Galaxis DVB-S rev1.3

Pablo Neira Ayuso:
[TEXTSEARCH]: Fix broken good shift array calculation in Boyer-Moore
[NETFILTER]: ctnetlink: add MODULE_ALIAS for expectation subsystem

Paolo 'Blaisorblade' Giarrusso:
Kbuild menu - hide empty NETDEVICES menu when NET is disabled

Patrick Boettcher:
V4L/DVB (3312): FIX: Multiple usage of VP7045-based devices
V4L/DVB (3313): FIX: Check if FW was downloaded or not + new firmware file

Patrick McHardy:
[NETFILTER]: Fix undersized skb allocation in ipt_ULOG/ebt_ulog/nfnetlink_log
[NETFILTER]: nfnetlink_queue: fix packet marking over netlink
[NETFILTER]: Fix missing src port initialization in tftp expectation mask
[NETFILTER]: Check policy length in policy match strict mode
[NETFILTER]: Fix ip6t_policy address matching
[NETFILTER]: Prepare {ipt,ip6t}_policy match for x_tables unification
[NETFILTER]: Fix check whether dst_entry needs to be released after NAT

Paul E. McKenney:
Fix comment to synchronize_sched()

Paul Fulghum:
new tty buffering locking fix
tty buffering stall fix

Paul Mackerras:
powerpc/64: Fix bug in setting floating-point exception mode
ppc: Use the system call table from arch/powerpc/kernel/systbl.S

Pavel Machek:
Fix Userspace interface breakage in power/state
swsusp: kill unneeded/unbalanced bio_get

Peter Horton:
[MIPS] Fix Cobalt PCI cache line sizes

Peter Missel:
V4L/DVB (3409): Mark Typhoon cards as Lifeview OEM's

Peter Oberparleiter:
s390: fix sclp memory corruption in tty pages list

Peter Osterlund:
pktcdvd: remove version string
pktcdvd: Don't waste kernel memory

Peter Williams:
lib: Fix bug in int_sqrt() for 64 bit longs

Phillip Susi:
pktcdvd: Fix overflow for discs with large packets
pktcdvd: Allow larger packets

Prarit Bhargava:
[IA64-SGI] Hotplug driver related fix in the SN ia64 code.
[IA64-SGI] Small cleanup for misuse of list_for_each to list_for_each_safe.

Rafael J. Wysocki:
Fix build failure in recent pm_prepare_* changes.

Ralf Baechle:
[MIPS] Remove stray .set mips3 resulting in 64-bit instruction in 32-bit kernels.
[MIPS] Fix C version of ssnop to use the right opcode.
[MIPS] Get rid of unnecessary prototypes. Fixes and optimizations for HZ > 100.
[MIPS] RTLX compile fixes.
[MIPS] Revert "mips: add pm_power_off"
[MIPS] Check function pointers are non-zero before calling.
[MIPS] Rename _machine_power_off to pm_power_off so the kernel builds again.
[MIPS] Rename include/asm-mips/cobalt to include/asm-mips/mach-cobalt.
[MIPS] CPU definitions for Cobalt.
[MIPS] Nevada support for SGI O2.
[MIPS] Bullet proof uaccess.h against 4.0.1 miss-compilation.
[MIPS] Get rid of CONFIG_SB1_PASS_1_WORKAROUNDS #ifdef crapola.
[MIPS] Sibyte: Make all setup functions __init.
[MIPS] Reformat to 80 columns.
[MIPS] Remove commented out code to add -mmad for Nevada.
[MIPS] local_irq_restore wasn't safe to be used in other macros mode.
[MIPS] Cleanup fls implementation.
[MIPS] IP22: Fix serial console detection
[MIPS] Shrink Qemu configuration to the bare minimum that is need and tested.
[MIPS] Remove buggy inline version of memscan.
[MIPS] MIPS R2 optimized endianess swapping.
[MIPS] Oprofile: Support for 34K UP kernels.
[MIPS] Fix linker script to work for non-4K page size.
[MIPS] Clear ST0_RE on bootup.
[MIPS] Add support for TIF_RESTORE_SIGMASK.
[MIPS] Make do_signal return void.
[MIPS] Wire up new syscalls.
[SERIAL] ip22zilog: Whitespace cleanup.

Randy Dunlap:
V4L/DVB (3433): Fix printk type warning
parport: fix printk format warning
cpuset: fix sparse warning
edac: use C99 initializers (sparse warnings)

Ravikiran G Thirumalai:
x86_64: Fix the node cpumask of a cpu going down
NUMA slab locking fixes: move color_next to l3
NUMA slab locking fixes: irq disabling from cahep->spinlock to l3 lock
NUMA slab locking fixes: fix cpu down and up locking
x86_64: Fix the node cpumask of a cpu going down
slab: Avoid deadlock at kmem_cache_create/kmem_cache_destroy

Richard Purdie:
[ARM] 3291/1: PXA27x: Correct get_clk_frequency_khz turbo flag handling
[ARM] 3292/1: Fix memory corruption in asm-arm/checksum.h: ip_fast_csum()
stop CompactFlash devices being marked as removable

Robb, Sam:
kconfig: detect if -lintl is needed when linking conf,mconf

Robert Love:
inotify: fix one-shot support

Robin Holt:
[IA64-SGI] Fix XPC code which sleeps with spin_lock_irqsave().

Rudolf Marek:
i2c: Rename i2c-sis96x documentation file

Russ Anderson:
[IA64-SGI] Shub2 BTE address fix

Russ Dill:
[ARM] 3295/1: Fix oprofile init return value

Russell King:
[MMC] Add MMC command type flags
[SERIAL] 8250: limit range of runtime ports
[ARM] Remove ARCH_CAMELOT from at91 defconfigs
[SERIAL] uart_port iotype member should use UPIO_*
[SERIAL] uart_port flags member should use UPF_*
[SERIAL] Remove unnecessary serial.h include
Fix compiler warning in driver core for CONFIG_HOTPLUG=N
drivers/base/bus.c warning fixes
[ARM] Experimental config options should have (EXPERIMENTAL)
[SERIAL] Remove incorrect code from ioc4 serial driver

Sam Ravnborg:
kconfig: fix /dev/null breakage
kbuild: fix build with O=..

Samir Bellabes:
[NETFILTER]: nf_conntrack: fix incorrect memset() size in FTP helper

Samuel Ortiz:
[IRDA]: Set proper IrLAP device address length

[email protected]:
disable per cpu intr in /proc/stat

Sergei Shtylylov:
[MIPS] TX49x7: Fix timer register #define's
[MIPS] Au1xx0: really set KSEG0 to uncached on reboot
[MIPS] Au1200: Make KGDB compile
[MIPS] TX49x7: Fix reporting of the CPU name and PCI clock

Shaohua Li:
x86_64: timer resume
x86_64: mark two routines as __cpuinit

Stefan Weinhuber:
s390: dasd extended error reporting module

Steffen Klassert:
3c59x: collision statistic fix

Stephen Hemminger:
[NET] snap: needs hardware checksum fix
[NET]: Add CONFIG_NETDEBUG to suppress bad packet messages.
sky2: power management fix
sky2: pci config space checking
sky2: ethtool rx_coalesce settings fix
sky2: set mac address fix
sky2: clear irq race
sky2: add irq to entropy pool
sky2: support msi interrupt (revised)
sky2: version 0.15 update
[BRIDGE]: fix for RCU and deadlock on device removal
[BRIDGE]: netfilter handle RCU during removal
[BRIDGE]: fix error handling for add interface to bridge

Stephen Smalley:
SELinux: fix size-128 slab leak
MAINTAINERS/CREDITS: Update SELinux contact info
selinux: require SECURITY_NETWORK
selinux: require AUDIT

Steve Langasek:
__cmpxchg() must really always be inlined on alpha

Suzuki:
Fix do_path_lookup() to add the check for error in link_path_walk()

Takashi Iwai:
Fix "value computed is not used" compile warnings with gcc-4.1

Tejun Heo:
block: request_queue->ordcolor must not be flipped on SOFTBARRIER
block: implement elv_insert and use it (fix ordcolor flipping bug)

Thibaut VARENE:
[PARISC] pdc_stable version 0.22
ide: restore support for AEC6280M cards in aec62xx.c

Tobias Klauser:
umem: check pci_set_dma_mask return value correctly
i2c: Use ARRAY_SIZE macro

Tong Li:
OProfile: fixed x86_64 incorrect kernel call graphs

Tony Lindgren:
[ARM] 3279/1: OMAP: 1/3 Fix low-level io init
[ARM] 3280/1: OMAP: 2/3 Fix low-level io init for omap1 boards
[ARM] 3278/1: OMAP: 3/3 Fix low-level io init for omap2 boards

Tony Luck:
[IA64] Fix CONFIG_PRINTK_TIME
[IA64] sys32_signal() forgets to initialize ->sa_mask

Trond Myklebust:
VFS: Ensure LOOKUP_CONTINUE flag is preserved by link_path_walk()

Ulrich Drepper:
namei.c: unlock missing in error case
fstatat64 support

V. Ananda Krishnan:
jsm: update for tty buffering revamp

Venkatesh Pallipadi:
x86_64: Only switch to IPI broadcast timer on Intel when C3 is supported

Vincent Hanquez:
debugfs: hard link count wrong
debugfs: trivial comment fix

Vitaly Bordug:
[SERIAL] PPC32 CPM_UART: update to utilize the new TTY flip API

Vitaly Fertman:
someone broke reiserfs V3 mount options, this fixes it

Vlad Yasevich:
[SCTP]: Fix 'fast retransmit' to send a TSN only once.

Wim Van Sebroeck:
[WATCHDOG] pcwd.c add comments + tabs
[WATCHDOG] pcwd.c card_found-- fix.
[WATCHDOG] pcwd.c private data struct patch
[WATCHDOG] pcwd.c Control Status #2 patch
[WATCHDOG] pcwd.c move get_support to pcwd_check_temperature_support
[WATCHDOG] pcwd.c show card info patch
[WATCHDOG] pcwd.c - update module version info

Yasuyuki Kozakai:
[NETFILTER]: nf_conntrack: check address family when finding protocol module
[NETFILTER]: iptables: fix typos in ipt_connbytes.h

Yoichi Yuasa:
[SERIAL] 8250_pci: add new PCI serial card support

Zach Brown:
list.h: don't evaluate macro args multiple times
x86_64: align per-cpu section to configured cache bytes

Zhang, Yanmin:
Export cpu topology in sysfs

Zou Nan hai:
[IA64] Fix a possible buffer overflow in efi.c
[IA64] Fix wrong use of memparse in efi.c


2006-02-13 03:08:01

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3


We still have some serious bugs, several of which are in 2.6.15 as well:

- The scsi_cmd leak, which I don't think is fixed.

- The some-x86_64-boxes-use-GFP_DMA-from-bio-layer bug, which causes
oom-killings.

- The skbuff_head_cache leak, which has been around since at least
2.6.11. Another box-killer, but it seems very hard to hit.
([email protected], "the dreaded oom-killer (reproducable in 2.6.11 -
2.6.16-rc1) :(")

- http://bugzilla.kernel.org/show_bug.cgi?id=6060: an apparent ACPI
regression.

- Nathan's "sysfs-related oops during module unload", which Greg seems to
have under control.

- http://bugzilla.kernel.org/show_bug.cgi?id=6049 - another acpi
regression. We have the actual offending commit here.

- A couple of random tty-related oopses reported by Jesper Juhl. We
don't know why these happened - they appear to not be related to the tty
buffering changes.

- http://bugzilla.kernel.org/show_bug.cgi?id=6038, another box-killing
acpi regression.

- Various reports similar to
http://bugzilla.kernel.org/show_bug.cgi?id=6011, seemingly related to USB
PCI quirk handling.

- "Ben Castricum" <[email protected]> reports that ppp has started
exhibiting mysterious failures (again).

- Nasty warnings from scsi about kobject-layer things being called from
irq context. James has a push-it-to-process-context patch which sadly
assumes kmalloc() is immortal, but no other fix seems to have offered
itself.

- In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy Mahajan has
another regression, but he's off collecting more info.

- Helge Hafting reports a usb printer regression - I don't know if that's
still live?

- "Carlo E. Prelz" <[email protected]> has another USB/ehci regression
("ATI RS480-based motherboard: stuck while booting with kernel >= 2.6.15
rc1").

- Gerrit Bruchhuser <[email protected]> seems to have an aic7xxx
regression ("AHA-7850 doesn't detect scanner anymore") but he doesn't say
which kernel got it right.

- http://bugzilla.kernel.org/show_bug.cgi?id=5914 - a sata bug (which is
quite unremarkable :(), but this one is reported to eat filesystems.

- Patrizio Bassi <[email protected]> has an alsa suspend
regression ("alsa suspend/resume continues to fail for ens1370")

- Bjorn Nilsson <[email protected]> has an sk99lin regression ("3COM
3C940, does not work anymore after upgrade to 2.6.15")

- Andrey Borzenkov <[email protected]> has an acpi-cpufreq regression
("cannot unload acpi-cpufreq")

- "P. Christeas" <[email protected]> had an autofs regression ("Regression
in Autofs, 2.6.15-git"), which might be fixed now?

- ghrt <[email protected]> reports an alsa regression ("PROBLEM: SB
Live! 5.1 (emu10k1, rev. 0a) doesn't work with 2.6.15")

- jinhong hu <[email protected]> reports what appears to be a qlogic
regression ("kernel 2.6.15 scsi problem")

- Benjamin LaHaise <[email protected]> had an NFS problem ("NFS processes
gettting stuck in D with currrent git").



These are clear regressions, reported in the last month by people who are
willing to test patches. They're almost all in subsystems which have
active and professional maintainers.

2006-02-13 03:23:18

by Trond Myklebust

Subject: Re: Linux 2.6.16-rc3

On Sun, 2006-02-12 at 19:05 -0800, Andrew Morton wrote:
> We still have some serious bugs, several of which are in 2.6.15 as well:

> - Benjamin LaHaise <[email protected]> had an NFS problem ("NFS processes
> gettting stuck in D with currrent git").

...but which was apparently not repeatable:

As of this afternoon's tree
(6150c32589d1976ca8a5c987df951088c05a7542) after the more
recent set of nfs patches, it seems to be behaving itself. Will
keep sysrq enabled to see if it hits again, though.

I've had no news from Ben since then...

Cheers,
Trond

2006-02-13 03:28:47

by Sanjoy Mahajan

Subject: Re: Linux 2.6.16-rc3

> In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy Mahajan has
> another regression, but he's off collecting more info.

I'm nearly done with bisecting (spent a day on a wild bisect goose
chase due to being careless) and I'm 95% sure the problem is
introduced by:

commit b8e4d89357fc434618a59c1047cac72641191805
Author: Bob Moore <[email protected]>
Date: Fri Jan 27 16:43:00 2006 -0500

[ACPI] ACPICA 20060127

But I will know for sure shortly.

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.

2006-02-13 03:36:36

by Jeff Garzik

Subject: Re: Linux 2.6.16-rc3

Andrew Morton wrote:
> - http://bugzilla.kernel.org/show_bug.cgi?id=5914 - a sata bug (which is
> quite unremarkable :(), but this one is reported to eat filesystems.

Issue closed, as the bug notes...

Jeff




2006-02-13 04:40:42

by James Bottomley

Subject: Re: Linux 2.6.16-rc3

On Sun, 2006-02-12 at 19:05 -0800, Andrew Morton wrote:
> - The scsi_cmd leak, which I don't think is fixed.

Erm, you mean the leak caused by flush barriers? That was verified as
fixed (albeit accidentally) in 2.6.16-rc1.

James


2006-02-13 05:20:41

by Sanjoy Mahajan

Subject: S3 sleep regression bisected (was Re: Linux 2.6.16-rc3)

> In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy Mahajan has
> another regression, but he's off collecting more info.

Now collected. The problematic commit is:

bad: [292dd876ee765c478b27c93cc51e93a558ed58bf] Pull release into acpica branch

The longer story follows (and I'll file it with the bugzilla). The
bug is that S3 sleep hangs on the second sleep of my TP 600X,
producing an endless loop in the dmesgs (across a serial console):

exregion-0182 [30] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
exregion-0182 [30] ex_system_memory_space: system_memory 1 (32 width) Address=0000000023FDFFC0
exregion-0287 [30] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
exregion-0182 [29] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0

'git bisect' narrowed the problematic commit to:

commit 292dd876ee765c478b27c93cc51e93a558ed58bf
Merge: d4ec6c7cc9a15a7a529719bc3b84f46812f9842e
9fdb62af92c741addbea15545f214a6e89460865
Author: Len Brown <[email protected]>
Date: Fri Jan 27 17:18:29 2006 -0500

Pull release into acpica branch

This commit had a slightly different bug from all the other ones I
marked as bad. This one hung on the first S3 sleep, with the same
endless loop pattern as the other bad ones (but they hang on the
second S3 sleep).

Below is the 'git bisect log', which for fun you can put in some file
and feed to 'git bisect replay somefile' :

$ git bisect log
git-bisect start
# good: [f3bcf72eb85aba88a7bd0a6116dd0b5418590dbe] Linux v2.6.16-rc1
git-bisect good f3bcf72eb85aba88a7bd0a6116dd0b5418590dbe
# bad: [e4f9aae0d74cb7d2fd5f0eb315cf9de1118fe260] Linux v2.6.16-rc2
git-bisect bad e4f9aae0d74cb7d2fd5f0eb315cf9de1118fe260
# good: [a6df590dd8b7644c8e298e3b13442bcd6ceeb739] Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
git-bisect good a6df590dd8b7644c8e298e3b13442bcd6ceeb739
# good: [e0e851cf30f1a9bd2e2a7624e9810378d6a2b072] reiserfs: reiserfs hang and performance fix for data=journal mode
git-bisect good e0e851cf30f1a9bd2e2a7624e9810378d6a2b072
# bad: [59ed2f59e4ea6a32f9591e378da7935f713a7000] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git-bisect bad 59ed2f59e4ea6a32f9591e378da7935f713a7000
# good: [9fdb62af92c741addbea15545f214a6e89460865] [ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi branches into release
git-bisect good 9fdb62af92c741addbea15545f214a6e89460865
# good: [61ee9cd5f2e76859222c1d64394ae633f9080163] PowerPC/PCI Hotplug build break
git-bisect good 61ee9cd5f2e76859222c1d64394ae633f9080163
# good: [e6da74e1f20ea7822e52a9e4fbd3d25bd907e471] Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git-bisect good e6da74e1f20ea7822e52a9e4fbd3d25bd907e471
# bad: [b8e4d89357fc434618a59c1047cac72641191805] [ACPI] ACPICA 20060127
git-bisect bad b8e4d89357fc434618a59c1047cac72641191805
# bad: [292dd876ee765c478b27c93cc51e93a558ed58bf] Pull release into acpica branch
git-bisect bad 292dd876ee765c478b27c93cc51e93a558ed58bf
# good: [d4ec6c7cc9a15a7a529719bc3b84f46812f9842e] [ACPI] remove "Resource isn't an IRQ" warning
git-bisect good d4ec6c7cc9a15a7a529719bc3b84f46812f9842e
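
As a rough sketch of working with a saved log like the one above: the snippet below writes a trimmed excerpt of that log to a scratch file and tallies the verdicts. The scratch file and the choice of excerpt are illustrative assumptions, not part of the original message; only the commit IDs are taken from the log.

```shell
#!/bin/sh
# Sketch: save an excerpt of the bisect log and summarise it.
set -eu

log=$(mktemp)              # hypothetical scratch file for the saved log
trap 'rm -f "$log"' EXIT

# A few lines copied from the log above (comment lines omitted).
cat > "$log" <<'EOF'
git-bisect start
git-bisect good f3bcf72eb85aba88a7bd0a6116dd0b5418590dbe
git-bisect bad e4f9aae0d74cb7d2fd5f0eb315cf9de1118fe260
git-bisect bad b8e4d89357fc434618a59c1047cac72641191805
git-bisect bad 292dd876ee765c478b27c93cc51e93a558ed58bf
git-bisect good d4ec6c7cc9a15a7a529719bc3b84f46812f9842e
EOF

# Count the verdicts and pick out the most recently marked bad commit.
good_count=$(grep -c '^git-bisect good ' "$log")
bad_count=$(grep -c '^git-bisect bad ' "$log")
last_bad=$(awk '$1 == "git-bisect" && $2 == "bad" { rev = $3 } END { print rev }' "$log")

echo "good: $good_count, bad: $bad_count"
echo "last bad: $last_bad"

# Inside a kernel tree that contains these commits, the saved log could then
# be replayed with:
#   git bisect replay "$log"
```

The replay step is left as a comment because it only makes sense inside a repository that actually contains the listed commits.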

2006-02-13 07:01:51

by Brown, Len

Subject: RE: Linux 2.6.16-rc3


>- http://bugzilla.kernel.org/show_bug.cgi?id=6049 - another acpi
> regression. We have the actual offending commit here.

per my note in the bug report, I believe that this failure
is not related to the "offending commit", and thus that commit
should not be reverted. I believe that this failure is because
the system is booted with "pci=noacpi" in IOAPIC mode --
an unsupportable configuration -- and will endeavor to confirm...

thanks,
-Len

2006-02-13 07:05:09

by Arjan van de Ven

Subject: Re: Linux 2.6.16-rc3

On Sun, 2006-02-12 at 19:05 -0800, Andrew Morton wrote:
> We still have some serious bugs, several of which are in 2.6.15 as well:
>
> - The scsi_cmd leak, which I don't think is fixed.

didn't this get nailed down to a 2.6.15-specific queueing bug, fixed in
2.6.16-rc ?



2006-02-13 07:10:26

by Brown, Len

Subject: RE: Linux 2.6.16-rc3


>- In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy
>Mahajan has another regression, but he's off collecting more info.

We're talking here about a system from 1999 where Windows 98
refuses to run in ACPI mode and instead runs in APM mode.
So I don't consider a regression on this box as "serious" --
I consider that it works in ACPI mode at all as "miraculous":-)

However, I do think the issue merits investigation in the event
that it has an effect on systems newer than 6 years old.

thanks,
-Len

2006-02-13 07:16:21

by David Miller

Subject: Re: Linux 2.6.16-rc3

From: "Brown, Len" <[email protected]>
Date: Mon, 13 Feb 2006 02:07:50 -0500

> >- In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy
> >Mahajan has another regression, but he's off collecting more info.
>
> We're talking here about a system from 1999 where Windows 98
> refuses to run in ACPI mode and instead runs in APM mode.

If it worked before a change which was installed, it's a regression
regardless of whether another OS tries to use ACPI on that system or
not. I don't understand how one can use that fact to label this as
not a regression from Linux's perspective.

2006-02-13 07:43:50

by Sanjoy Mahajan

Subject: Re: Linux 2.6.16-rc3

> systems newer than 6 years old.

According to the sticker on the bottom, this model was made in
04/2000, so the 6 years is right.

> We're talking here about a system from 1999 where Windows 98 refuses
> to run in ACPI mode and instead runs in APM mode.

I haven't tried Windows 98 on this machine, but Windows 98SE would run
in ACPI mode if it weren't for a cheap hack by IBM. The latest BIOS
(1.11), which I'm using, claims to be from 1999. However, that date
is almost surely wrong. The readme/changelog with the BIOS update
diskette is dated Sept 20, 2001 and contains this note about the 1.01
update:

- (Fix) If Windows 98 Second Edition is installed as APM mode and
an updated BIOS is installed with a BIOS date 12/02/99 or
later, Windows 98SE will change the mode from APM to ACPI
whenever a New hardware profile is created. So this BIOS
set the date to 11-30-99.

Probably IBM marked all the BIOS dates as 11-30-99 in order to work
around this W98SE misfeature. My guess is that BIOS 1.11 is really
from Sept 2001, or 4.5 years ago. Old, but not octogenarian!

> I consider that it works in ACPI mode at all as "miraculous":-)

Amen to that. I was very pleased when the combination of newer ACPI
releases plus my modifying the DSDT made S3 work.

> I do think the issue merits investigation ...

Although I have little idea of what sections of code to modify,
especially since the commit in question merges two well travelled
branches, I'm happy to test patches.

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.

2006-02-13 08:05:35

by Brown, Len

Subject: RE: Linux 2.6.16-rc3

>> >- In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy
>> >Mahajan has another regression, but he's off collecting more info.
>>
>> We're talking here about a system from 1999 where Windows 98
>> refuses to run in ACPI mode and instead runs in APM mode.
>
>If it worked before a change which was installed, it's a regression
>regardless of whether another OS tries to use ACPI on that system or
>not. I don't understand how one can use that fact to label this as
>not a regression from Linux's perspective.

I don't think anybody claimed this isn't a regression for the 600X.
Sanjoy has done a wonderful job documenting that.

My point is that on the grand scale of bugs serious enough
to have an effect on the course of 2.6.16, this one doesn't qualify
unless the same issue is seen on other systems.

-Len

2006-02-13 08:10:40

by Jens Axboe

Subject: Re: Linux 2.6.16-rc3

On Sun, Feb 12 2006, Andrew Morton wrote:
>
> We still have some serious bugs, several of which are in 2.6.15 as well:
>
> - The scsi_cmd leak, which I don't think is fixed.

It is fixed in 2.6.16-rcX.

> - The some-x86_64-boxes-use-GFP_DMA-from-bio-layer bug, which causes
> oom-killings.

Still pending.

--
Jens Axboe

2006-02-13 08:16:14

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

"Brown, Len" <[email protected]> wrote:
>
> My point is that on the grand scale of bugs serious enough
> to have an effect on the course of 2.6.16, this one doesn't qualify
> unless the same issue is seen on other systems.

I think we can assume that it will be seen there. 2.6.16 is going into
distros and will have more exposure than 2.6.15, let alone 2.6.16-rcX.

2006-02-13 08:42:31

by Sanjoy Mahajan

Subject: Re: Linux 2.6.16-rc3

> I think we can assume that it will be seen there. 2.6.16 is going into
> distros and will have more exposure than 2.6.15, let alone
> 2.6.16-rcX.

A related point is that S3 sleep/wake problems are very difficult to
debug. The bug is often not reproducible (I've had a few of those).
Or it happens early in the wakeup, when the serial console hasn't been
restored to a working state (at least on some machines, see bugzilla
#4270). Or the system has bugs that prevent it from going to sleep,
which also prevents any wakeup problems from being investigated.

Or they happen to regular users, who give up and say 'my laptop cannot
go to sleep in Linux, oh well'. Besides being inconvenient, it gives
Linux a bad name, especially when people nearby have iBooks and
PowerBooks running MacOS that sleep and wake in 2-3 seconds, including
restoring networking and wireless.

So there's value in chasing any S3 bugs that can be reproduced,
especially those affecting sleeping.

The TP 600X is indeed old, and perhaps the bug is caused by an
otherwise fine change uncovering a 600X hardware or firmware bug
(perhaps the point that comment #8 at bugzilla 5989 is getting at).
However, one of the beauties of Linux, and a nightmare for developers,
is that Linux works on all sorts of hardware. I don't know whether
this bug should affect the 2.6.16 schedule. But I think it's worth
solving eventually, if only to the point where it's clear that it's unique
to this model.

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.

2006-02-13 08:47:03

by Jan Dittmer

Subject: Re: Linux 2.6.16-rc3

Linus Torvalds wrote:
> The most user-visible one (eventually) is the unshare() system call, which
> glibc wanted. Along with some fixes for fstatat() (use the proper 64-bit
> interfaces, not the "newer old" one).

This breaks compilation on 3 archs compared to -rc2:

- mips: broke
AR arch/mips/lib-32/lib.a
GEN .version
CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/mips/kernel/built-in.o(.text+0x9820): In function `einval':
/usr/src/ctest/rc/kernel/arch/mips/kernel/scall32-o32.S: undefined reference to `sys_newfstatat'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [cdbuilddir] Error 2


Details: http://l4x.org/k/?d=10888

- sparc: broke
SYSMAP System.map
SYSMAP .tmp_System.map
HOSTCC arch/sparc/boot/btfixupprep
BTFIX arch/sparc/boot/btfix.S
AS arch/sparc/boot/btfix.o
LD arch/sparc/boot/image
arch/sparc/kernel/built-in.o(.data+0x794): In function `sys_call_table':
: undefined reference to `sys_newfstatat'
make[2]: *** [arch/sparc/boot/image] Error 1
make[1]: *** [image] Error 2
make: *** [cdbuilddir] Error 2


Details: http://l4x.org/k/?d=10897

- sparc64: broke
AR arch/sparc64/lib/lib.a
GEN .version
CHK include/linux/compile.h
UPD include/linux/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/sparc64/kernel/head.o(.text+0xe78): In function `sys_call_table':
/usr/src/ctest/rc/kernel/arch/sparc64/kernel/head.S: undefined reference to `sys_newfstatat'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [cdbuilddir] Error 2


Details: http://l4x.org/k/?d=10898

Jan
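
The three failures above appear to share a single missing symbol. As a quick sketch of confirming that from saved build output, the snippet below extracts the distinct symbol names from the "undefined reference" lines quoted in the report. The scratch file is a hypothetical stand-in for a real build log.

```shell
#!/bin/sh
# Sketch: list the distinct symbols named in "undefined reference" linker errors.
set -eu

buildlog=$(mktemp)         # hypothetical scratch file, not part of the report
trap 'rm -f "$buildlog"' EXIT

# The three linker error lines quoted in the report above.
cat > "$buildlog" <<'EOF'
/usr/src/ctest/rc/kernel/arch/mips/kernel/scall32-o32.S: undefined reference to `sys_newfstatat'
: undefined reference to `sys_newfstatat'
/usr/src/ctest/rc/kernel/arch/sparc64/kernel/head.S: undefined reference to `sys_newfstatat'
EOF

# Skip the opening quote character, capture the identifier, drop duplicates.
symbols=$(sed -n 's/.*undefined reference to .\([A-Za-z0-9_]*\).*/\1/p' "$buildlog" | sort -u)

echo "$symbols"
```

A single line of output suggests one root cause across all three architectures rather than three unrelated breakages.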

2006-02-13 08:57:59

by Arjan van de Ven

Subject: Re: Linux 2.6.16-rc3

On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> "Brown, Len" <[email protected]> wrote:
> >
> > My point is that on the grand scale of bugs serious enough
> > to have an effect on the course of 2.6.16, this one doesn't qualify
> > unless the same issue is seen on other systems.
>
> I think we can assume that it will be seen there. 2.6.16 is going into
> distros and will have more exposure than 2.6.15,

2.6.15 went into distros as well, such as Fedora Core 4 ;)


2006-02-13 09:07:44

by Yoichi Yuasa

Subject: Re: Linux 2.6.16-rc3

Hi Jan

On Mon, 13 Feb 2006 09:46:56 +0100
Jan Dittmer <[email protected]> wrote:

> Linus Torvalds wrote:
> > The most user-visible one (eventually) is the unshare() system call, which
> > glibc wanted. Along with some fixes for fstatat() (use the proper 64-bit
> > interfaces, not the "newer old" one).
>
> This breaks compilation on 3 archs compared to -rc2:
>
> - mips: broke
> AR arch/mips/lib-32/lib.a
> GEN .version
> CHK include/linux/compile.h
> UPD include/linux/compile.h
> CC init/version.o
> LD init/built-in.o
> LD .tmp_vmlinux1
> arch/mips/kernel/built-in.o(.text+0x9820): In function `einval':
> /usr/src/ctest/rc/kernel/arch/mips/kernel/scall32-o32.S: undefined reference to `sys_newfstatat'
> make[1]: *** [.tmp_vmlinux1] Error 1
> make: *** [cdbuilddir] Error 2
>
>
> Details: http://l4x.org/k/?d=10888

MIPS 32bit machines need fstatat64 support.

Yoichi

Signed-off-by: Yoichi Yuasa <[email protected]>

diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index d7c4a38..d83e033 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -623,7 +623,7 @@ einval: li v0, -EINVAL
sys sys_mknodat 4 /* 4290 */
sys sys_fchownat 5
sys sys_futimesat 3
- sys sys_newfstatat 4
+ sys sys_fstatat64 4
sys sys_unlinkat 3
sys sys_renameat 4 /* 4295 */
sys sys_linkat 4

2006-02-13 09:22:35

by Patrizio Bassi

Subject: Re: Linux 2.6.16-rc3

Andrew Morton wrote:
> We still have some serious bugs, several of which are in 2.6.15 as well:
>
> - The scsi_cmd leak, which I don't think is fixed.
>
> - The some-x86_64-boxes-use-GFP_DMA-from-bio-layer bug, which causes
> oom-killings.
>
> - The skbuff_head_cache leak, which has been around since at least
> 2.6.11. Another box-killer, but it seems very hard to hit.
> ([email protected], "the dreaded oom-killer (reproducable in 2.6.11 -
> 2.6.16-rc1) :(")
>
> - http://bugzilla.kernel.org/show_bug.cgi?id=6060: an apparent ACPI
> regression.
>
> - Nathan's "sysfs-related oops during module unload", which Greg seems to
> have under control.
>
> - http://bugzilla.kernel.org/show_bug.cgi?id=6049 - another acpi
> regression. We have the actual offending commit here.
>
> - A couple of random tty-related oopses reported by Jesper Juhl. We
> don't know why these happened - they appear to not be related to the tty
> buffering changes.
>
> - http://bugzilla.kernel.org/show_bug.cgi?id=6038, another box-killing
> acpi regression.
>
> - Various reports similar to
> http://bugzilla.kernel.org/show_bug.cgi?id=6011, seemingly related to USB
> PCI quirk handling.
>
> - "Ben Castricum" <[email protected]> reports that ppp has started
> exhibiting mysterious failures (again).
>
> - Nasty warnings from scsi about kobject-layer things being called from
> irq context. James has a push-it-to-process-context patch which sadly
> assumes kmalloc() is immortal, but no other fix seems to have offered
> itself.
>
> - In http://bugzilla.kernel.org/show_bug.cgi?id=5989, Sanjoy Mahajan has
> another regression, but he's off collecting more info.
>
> - Helge Hafting reports a usb printer regression - I don't know if that's
> still live?
>
> - "Carlo E. Prelz" <[email protected]> has another USB/ehci regression
> ("ATI RS480-based motherboard: stuck while booting with kernel >= 2.6.15
> rc1").
>
> - Gerrit Bruchhuser <[email protected]> seems to have an aic7xxx
> regression ("AHA-7850 doesn't detect scanner anymore") but he doesn't say
> which kernel got it right.
>
> - http://bugzilla.kernel.org/show_bug.cgi?id=5914 - a sata bug (which is
> quite unremarkable :(), but this one is reported to eat filesystems.
>
> - Patrizio Bassi <[email protected]> has an alsa suspend
> regression ("alsa suspend/resume continues to fail for ens1370")
>
> - Bjorn Nilsson <[email protected]> has an sk99lin regression ("3COM
> 3C940, does not work anymore after upgrade to 2.6.15")
>
> - Andrey Borzenkov <[email protected]> has an acpi-cpufreq regression
> ("cannot unload acpi-cpufreq")
>
> - "P. Christeas" <[email protected]> had an autofs regression ("Regression
> in Autofs, 2.6.15-git"), which might be fixed now?
>
> - ghrt <[email protected]> reports an alsa regression ("PROBLEM: SB
> Live! 5.1 (emu10k1, rev. 0a) doesn't work with 2.6.15")
>
> - jinhong hu <[email protected]> reports what appears to be a qlogic
> regression ("kernel 2.6.15 scsi problem")
>
> - Benjamin LaHaise <[email protected]> had an NFS problem ("NFS processes
> gettting stuck in D with currrent git").
>
>
>
> These are clear regressions, reported in the last month by people who are
> willing to test patches. They're almost all in subsystems which have
> active and professional maintainers.
>
>
Sad to say, but my ALSA ens1370 regression caused by the suspend problem
is still there.
The only fix at the moment is a reboot. Ready to test patches :)

PS.

i have a bug similar to:
http://bugzilla.kernel.org/show_bug.cgi?id=6038 (marked as blocking by
Andrew)
on my laptop.

but my DMA expiry problem only happens during suspend.
I have a SiS 630 chipset with a 2.5" Hitachi drive.
It has been there for ages; as far as I can tell, always. I never got a 2.6 kernel where this worked.
However, in my humble opinion it's not blocking on my system, just annoying.


Patrizio

2006-02-13 10:07:45

by Andi Kleen

Subject: Re: Linux 2.6.16-rc3 - x86_64 specific outstanding bugs.

Andrew Morton <[email protected]> writes:

[Note to people who find this using google in some archive.
This list is for an old kernel and totally obsolete. Don't email me
asking about it. Use a newer kernel.]

> We still have some serious bugs, several of which are in 2.6.15 as well:

From the x86-64 side for 2.6.16:

- Still oopses with mbind in some setups that I need to fix.
Problem at least understood now. Should be fixed. Related to
GFP_DMA32

- mbind can currently cause a local dos by starting the OOM killer
early. Christoph Lameter has patches that should be added.

- Doesn't boot on Opterons with nodes in the middle unpopulated.
I can reproduce on SimNow, but still need to fix (unfortunately
it completely changes when I add any debugging code)

- kg_crashme causes do_exit to be called with interrupts disabled.
Need to fix that.

- logical cpu hot replug on multiprocessor opterons hangs the machine.
Prime suspect is the powernow-k8 driver.

- Nested kprobes are broken and cause a stack overflow on the int3
stack. Impossible to fix fully, but should support nesting for
a few levels at least. The systemtap testsuite triggers this apparently.

- The ATI timer fix using the local APIC timer breaks on many AMD
laptops that don't run the APIC timer in C2. Latest plan is to switch
to the RTC interrupt on these machines, but that still needs to be implemented.

Also some other, not well researched timer issues on the usual suspects
(mainly nforce3 laptops, which seem to have all kinds of broken timing),
but some of that might be related to the completely different timer
code in -mm*, which seems to miss all the latest fixes.

-Andi

2006-02-13 10:40:26

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

Avuton Olrich <[email protected]> wrote:
>
> On 2/12/06, Linus Torvalds <[email protected]> wrote:
> >
> > Ok,
> > it is out there (or in the process of getting mirrored out), so go wild.
>
> Hello, I appear to be getting the same issue I had back here:
> http://lkml.org/lkml/2006/2/1/121
>
> A fix appeared a few messages later:
> http://lkml.org/lkml/2006/2/1/129
> http://lkml.org/lkml/2006/2/1/131
>
> I'm obviously unsure if these were correct fixes or what but they did
> fix it, and they were or are in -mm afaik.
>
> I am not _sure_ this is the same bug, but the panic messages rings a
> bell (sorry, I already deleted the old pictures as it appeared to be
> taken care of in -mm).
>
> In the case that it's not the same issue here's a new snapshot of the
> kernel panic:
> http://68.111.224.150:8080/~sbh/P1010029.JPG
> ..and the config:
> http://68.111.224.150:8080/~sbh/micromachine.config
>
> If an essential part of the panic is missing please let me know and I
> will try to remember how I got it shrunk down better last time.
>

That looks like a different cpufreq bug. Unfortunately the critical first
few lines have scrolled away. Please boot with `vga=extended' so we get to
see them.

2006-02-13 10:55:09

by Con Kolivas

Subject: Re: Linux 2.6.16-rc3

On Monday 13 February 2006 21:39, Andrew Morton wrote:
> That looks like a different cpufreq bug. Unfortunately the critical first
> few lines have scrolled away. Please boot with `vga=extended' so we get to
> see them.

Just as a suggestion, why don't we print oopsen out in the opposite direction
so the critical information is in the last few lines and the stacktrace in
reverse, or have that as a bootparam option oops=reverse .

Cheers,
Con

2006-02-13 10:57:00

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

Avuton Olrich <[email protected]> wrote:
>
> I should have realized that would happen, hopefully here's a better
> one. Please let me know anything I can do to help.
>
> http://68.111.224.150:8080/~sbh/P1010031.JPG

Thanks. Yes, it does look like the same bug.

2006-02-13 12:02:41

by Takashi Iwai

Subject: Re: Linux 2.6.16-rc3

At Sun, 12 Feb 2006 19:05:20 -0800,
Andrew Morton wrote:
>
> - Patrizio Bassi <[email protected]> has an alsa suspend
> regression ("alsa suspend/resume continues to fail for ens1370")

It's not a "regression". PM didn't work with ens1370 at all in the
earlier version.

About the problem there, I have no idea yet what's wrong.
Suspend-to-disk works fine if the driver is built as a module, but not
when it is built into the kernel.


> - ghrt <[email protected]> reports an alsa regression ("PROBLEM: SB
> Live! 5.1 (emu10k1, rev. 0a) doesn't work with 2.6.15")

I couldn't find this bugzilla entry...


Takashi

2006-02-13 12:38:08

by Patrizio Bassi

Subject: Re: Linux 2.6.16-rc3

Takashi Iwai wrote:
> At Sun, 12 Feb 2006 19:05:20 -0800,
> Andrew Morton wrote:
>
>> - Patrizio Bassi <[email protected]> has an alsa suspend
>> regression ("alsa suspend/resume continues to fail for ens1370")
>>
>
> It's not a "regression". PM didn't work with ens1370 at all in the
> earlier version.
>
> About the problem there, I have no idea now what's wrong. The
> suspend-to-disk works fine if the driver is built as module but not as
> built-in kernel.
>
>
>
I wrote "regression" because before (I don't know exactly when, around
2.6.14 time),
after suspend I had to restart my distro's mixer values service or I
couldn't hear anything.
It was annoying, but it worked.

Now I get 0x660 errors if I try to restore the values, and the device seems
dead; it just floods the syslog.
Sound apps slow down (around 100% CPU usage) and no sound is produced.

The only fix is a reboot.


>> - ghrt <[email protected]> reports an alsa regression ("PROBLEM: SB
>> Live! 5.1 (emu10k1, rev. 0a) doesn't work with 2.6.15")
>>
>
> I couldn't find this bugzilla entry...
>
>
> Takashi
>
>

2006-02-13 13:09:11

by Rafael J. Wysocki

Subject: Re: Linux 2.6.16-rc3

On Monday 13 February 2006 13:02, Takashi Iwai wrote:
> At Sun, 12 Feb 2006 19:05:20 -0800,
> Andrew Morton wrote:
> >
> > - Patrizio Bassi <[email protected]> has an alsa suspend
> > regression ("alsa suspend/resume continues to fail for ens1370")
>
> It's not a "regression". PM didn't work with ens1370 at all in the
> earlier version.
>
> About the problem there, I have no idea now what's wrong. The
> suspend-to-disk works fine if the driver is built as module but not as
> built-in kernel.

That may be related to the fact that modular drivers are not present in
memory during resume (just a thought).

Greetings,
Rafael

2006-02-13 13:14:03

by Takashi Iwai

Subject: Re: Linux 2.6.16-rc3

At Mon, 13 Feb 2006 13:37:55 +0100,
Patrizio Bassi wrote:
>
> Takashi Iwai wrote:
> > At Sun, 12 Feb 2006 19:05:20 -0800,
> > Andrew Morton wrote:
> >
> >> - Patrizio Bassi <[email protected]> has an alsa suspend
> >> regression ("alsa suspend/resume continues to fail for ens1370")
> >>
> >
> > It's not a "regression". PM didn't work with ens1370 at all in the
> > earlier version.
> >
> > About the problem there, I have no idea now what's wrong. The
> > suspend-to-disk works fine if the driver is built as module but not as
> > built-in kernel.
> >
> >
> >
> i wrote "regression" because before (ehm...exactly don't know...about
> 2.6.14 time)
> after suspend i had to restart my distro's mixer values service or i
> couldn't hear anything.
> and...ok..it was boring but worked.

You abused the function which wasn't officially supported :)


Takashi

2006-02-13 13:31:35

by Patrizio Bassi

Subject: Re: Linux 2.6.16-rc3

Takashi Iwai wrote:
> At Mon, 13 Feb 2006 13:37:55 +0100,
> Patrizio Bassi wrote:
>
>> Takashi Iwai wrote:
>>
>>> At Sun, 12 Feb 2006 19:05:20 -0800,
>>> Andrew Morton wrote:
>>>
>>>
>>>> - Patrizio Bassi <[email protected]> has an alsa suspend
>>>> regression ("alsa suspend/resume continues to fail for ens1370")
>>>>
>>>>
>>> It's not a "regression". PM didn't work with ens1370 at all in the
>>> earlier version.
>>>
>>> About the problem there, I have no idea now what's wrong. The
>>> suspend-to-disk works fine if the driver is built as module but not as
>>> built-in kernel.
>>>
>>>
>>>
>>>
>> i wrote "regression" because before (ehm...exactly don't know...about
>> 2.6.14 time)
>> after suspend i had to restart my distro's mixer values service or i
>> couldn't hear anything.
>> and...ok..it was boring but worked.
>>
>
> You abused the function which wasn't officially supported :)
>
>
> Takashi
>
>
Nice, I'm an abuser! :)

OK, seriously: that's bad. Before, it was not implemented, so fine...
but now it fails with errors (and makes apps misbehave), which
is worse.

Patrizio

2006-02-13 13:51:23

by Takashi Iwai

Subject: Re: Linux 2.6.16-rc3

At Mon, 13 Feb 2006 14:09:51 +0100,
Rafael J. Wysocki wrote:
>
> On Monday 13 February 2006 13:02, Takashi Iwai wrote:
> > At Sun, 12 Feb 2006 19:05:20 -0800,
> > Andrew Morton wrote:
> > >
> > > - Patrizio Bassi <[email protected]> has an alsa suspend
> > > regression ("alsa suspend/resume continues to fail for ens1370")
> >
> > It's not a "regression". PM didn't work with ens1370 at all in the
> > earlier version.
> >
> > About the problem there, I have no idea now what's wrong. The
> > suspend-to-disk works fine if the driver is built as module but not as
> > built-in kernel.
>
> That may be related to the fact that modular drivers are not present in
> memory during resume (just a thought).

I think the modular drivers are in memory, but the order of
re-initialization is different.


Takashi

2006-02-13 14:15:32

by Takashi Iwai

Subject: Re: Linux 2.6.16-rc3

At Mon, 13 Feb 2006 14:31:23 +0100,
Patrizio Bassi wrote:
>
> Takashi Iwai wrote:
> > At Mon, 13 Feb 2006 13:37:55 +0100,
> > Patrizio Bassi wrote:
> >
> >> Takashi Iwai wrote:
> >>
> >>> At Sun, 12 Feb 2006 19:05:20 -0800,
> >>> Andrew Morton wrote:
> >>>
> >>>
> >>>> - Patrizio Bassi <[email protected]> has an alsa suspend
> >>>> regression ("alsa suspend/resume continues to fail for ens1370")
> >>>>
> >>>>
> >>> It's not a "regression". PM didn't work with ens1370 at all in the
> >>> earlier version.
> >>>
> >>> About the problem there, I have no idea now what's wrong. The
> >>> suspend-to-disk works fine if the driver is built as module but not as
> >>> built-in kernel.
> >>>
> >>>
> >>>
> >>>
> >> i wrote "regression" because before (ehm...exactly don't know...about
> >> 2.6.14 time)
> >> after suspend i had to restart my distro's mixer values service or i
> >> couldn't hear anything.
> >> and...ok..it was boring but worked.
> >>
> >
> > You abused the function which wasn't officially supported :)
> >
> >
> > Takashi
> >
> >
> nice i'm an abuser! :)
>
> ok, seriously..that's bad, because before it was not implemented, so ok...
> but now it fails with errors (and make apps not working properly) which
> is worse.

My rough guess is the initialization order: the resume was called too
early.

What about putting a sleep between snd_ensoniq_chip_init() and
snd_ak4531_resume()? Or adding more delay in snd_ak4531_resume()?


Takashi

2006-02-13 14:34:45

by Patrizio Bassi

Subject: Re: Linux 2.6.16-rc3

Takashi Iwai wrote:
> At Mon, 13 Feb 2006 14:31:23 +0100,
> Patrizio Bassi wrote:
>
>> Takashi Iwai wrote:
>>
>>> At Mon, 13 Feb 2006 13:37:55 +0100,
>>> Patrizio Bassi wrote:
>>>
>>>
>>>> Takashi Iwai wrote:
>>>>
>>>>
>>>>> At Sun, 12 Feb 2006 19:05:20 -0800,
>>>>> Andrew Morton wrote:
>>>>>
>>>>>
>>>>>
>>>>>> - Patrizio Bassi <[email protected]> has an alsa suspend
>>>>>> regression ("alsa suspend/resume continues to fail for ens1370")
>>>>>>
>>>>>>
>>>>>>
>>>>> It's not a "regression". PM didn't work with ens1370 at all in the
>>>>> earlier version.
>>>>>
>>>>> About the problem there, I have no idea now what's wrong. The
>>>>> suspend-to-disk works fine if the driver is built as module but not as
>>>>> built-in kernel.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> i wrote "regression" because before (ehm...exactly don't know...about
>>>> 2.6.14 time)
>>>> after suspend i had to restart my distro's mixer values service or i
>>>> couldn't hear anything.
>>>> and...ok..it was boring but worked.
>>>>
>>>>
>>> You abused the function which wasn't officially supported :)
>>>
>>>
>>> Takashi
>>>
>>>
>>>
>> nice i'm an abuser! :)
>>
>> ok, seriously..that's bad, because before it was not implemented, so ok...
>> but now it fails with errors (and make apps not working properly) which
>> is worse.
>>
>
> My rough guess is the initialization order, the resume was called too
> early.
>
> What about to put sleep between snd_ensoniq_chip_init() and
> snd_ak4531_resume()? Or put more delay in snd_ak4531_resume()?
>
>
> Takashi
>
>
I'm almost sure the problem is not there (or at least not only there).
In fact, I get 0x660 errors (or rather, a long flood of them) while
suspending too.

There may be a bug or problem during suspend, and these problems
affect the normal resume.

However, a sleep is not always reliable... so if you think it's an init
problem, you could add a boolean flag recording whether init has
completed, and poll for TRUE in the resume function.

But, as I wrote before, I'm not sure the problem is **only** there.

Patrizio
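
A minimal sketch of the flag-and-poll idea suggested above, assuming a
hypothetical chip_state structure and a caller-supplied delay hook; none of
these names come from the ens1370 driver. The number of polls is bounded so
that resume reports a failure rather than spinning forever if init never
completes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state; not the real ens1370 structure. */
struct chip_state {
    volatile bool init_done;    /* set to true by the init path when finished */
};

/*
 * Poll init_done up to max_polls times, calling delay() between checks.
 * delay may be NULL; in a driver it would wrap a real sleep.
 * Returns 0 once init is complete, -1 if it never completed.
 */
static int wait_for_init(const struct chip_state *chip, int max_polls,
                         void (*delay)(void))
{
    for (int i = 0; i < max_polls; i++) {
        if (chip->init_done)
            return 0;
        if (delay)
            delay();
    }
    return chip->init_done ? 0 : -1;
}

/* Hypothetical resume path: only touch the codec after init has completed. */
static int resume_sketch(const struct chip_state *chip, void (*delay)(void))
{
    if (wait_for_init(chip, 10, delay) != 0)
        return -1;              /* report the failure instead of proceeding */
    /* ... restoring codec registers would go here ... */
    return 0;
}
```

Whether a timeout of this kind is acceptable depends on where the flood of
errors during suspend actually comes from, which the thread has not yet
established.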

2006-02-13 14:39:34

by Takashi Iwai

Subject: Re: Linux 2.6.16-rc3

At Mon, 13 Feb 2006 15:34:28 +0100,
Patrizio Bassi wrote:
>
> Takashi Iwai wrote:
> > At Mon, 13 Feb 2006 14:31:23 +0100,
> > Patrizio Bassi wrote:
> >
> >> Takashi Iwai wrote:
> >>
> >>> At Mon, 13 Feb 2006 13:37:55 +0100,
> >>> Patrizio Bassi wrote:
> >>>
> >>>
> >>>> Takashi Iwai wrote:
> >>>>
> >>>>
> >>>>> At Sun, 12 Feb 2006 19:05:20 -0800,
> >>>>> Andrew Morton wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> - Patrizio Bassi <[email protected]> has an alsa suspend
> >>>>>> regression ("alsa suspend/resume continues to fail for ens1370")
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> It's not a "regression". PM didn't work with ens1370 at all in the
> >>>>> earlier version.
> >>>>>
> >>>>> About the problem there, I have no idea now what's wrong. The
> >>>>> suspend-to-disk works fine if the driver is built as module but not as
> >>>>> built-in kernel.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> i wrote "regression" because before (ehm...exactly don't know...about
> >>>> 2.6.14 time)
> >>>> after suspend i had to restart my distro's mixer values service or i
> >>>> couldn't hear anything.
> >>>> and...ok..it was boring but worked.
> >>>>
> >>>>
> >>> You abused the function which wasn't officially supported :)
> >>>
> >>>
> >>> Takashi
> >>>
> >>>
> >>>
> >> nice i'm an abuser! :)
> >>
> >> ok, seriously..that's bad, because before it was not implemented, so ok...
> >> but now it fails with errors (and make apps not working properly) which
> >> is worse.
> >>
> >
> > My rough guess is the initialization order, the resume was called too
> > early.
> >
> > What about to put sleep between snd_ensoniq_chip_init() and
> > snd_ak4531_resume()? Or put more delay in snd_ak4531_resume()?
> >
> >
> > Takashi
> >
> >
> i'm almost sure the problem is not there (or, at least not only)
> infact i get 0x660 errors (or better a long flood...) while suspending too.

IIRC, suspend calls the resume callback once to bring the devices back up.
So, basically you are seeing the same problem here.


Takashi

2006-02-13 15:14:01

by Benjamin LaHaise

Subject: Re: Linux 2.6.16-rc3

On Sun, Feb 12, 2006 at 10:22:55PM -0500, Trond Myklebust wrote:
> > - Benjamin LaHaise <[email protected]> had an NFS problem ("NFS processes
> > gettting stuck in D with currrent git").
>
> ...but which was apparently not repeatable:
>
> As of this afternoon's tree
> (6150c32589d1976ca8a5c987df951088c05a7542) after the more
> recent set of nfs patches, it seems to be behaving itself. Will
> keep sysrq enabled to see if it hits again, though.
>
> I've had no news from Ben since then...

Confirmed: I've had no problems with NFS since that update, and my test
box uses NFS regularly.

-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <[email protected]>.

2006-02-13 15:48:00

by Linus Torvalds

Subject: Re: Linux 2.6.16-rc3



On Mon, 13 Feb 2006, Jan Dittmer wrote:
>
> This breaks compilation on 3 archs compared to -rc2:
>
> - mips: broke
> - sparc: broke
> - sparc64: broke

Should be ok in -git now, pls verify.

Linus

2006-02-13 16:10:50

by Adrian Bunk

Subject: Re: Linux 2.6.16-rc3

On Sun, Feb 12, 2006 at 07:05:20PM -0800, Andrew Morton wrote:
>...
> - Various reports similar to
> http://bugzilla.kernel.org/show_bug.cgi?id=6011, seemingly related to USB
> PCI quirk handling.
>...

This bug contains a patch.

What is the status of this patch?

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-13 17:09:48

by Adrian Bunk

Subject: 2.6.16-rc3: more regressions


I have some regressions [1] on my list that weren't mentioned by Andrew:


Subject : Xorg freezes 2.6.16-rc1
References : http://lkml.org/lkml/2006/1/26/97
Submitter : Mauro Tassinari <[email protected]>
Status : unknown

Subject : 2.6.16rc1-git4 slab corruption
References : http://lkml.org/lkml/2006/1/31/164
Submitter : Dave Jones <[email protected]>
Status : unknown

Subject : psmouse starts losing sync in 2.6.16-rc2
References : http://lkml.org/lkml/2006/2/5/50
Submitter : Meelis Roos <[email protected]>
Status : unknown

Subject : OCFS2 Filesystem inconsistency across nodes
References : http://lkml.org/lkml/2006/2/10/14
Submitter : Claudio Martins <[email protected]>
Handled-By : Mark Fasheh <[email protected]>
Status : unknown



cu
Adrian

[1] "regression" defined as "bug was not present in 2.6.15"
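
The "Key : Value" layout of the entries above is regular enough to be split
mechanically. The following is only a sketch of one way to do that, assuming
the separator is always the first " : " on the line; split_field is a
hypothetical helper, not part of any existing tracking tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a line such as "Status : unknown" at the first " : ".
 * Copies the two halves into key and value (both NUL-terminated).
 * Returns 0 on success, -1 if there is no separator or a buffer is too small.
 */
static int split_field(const char *line, char *key, size_t key_len,
                       char *value, size_t value_len)
{
    const char *sep = strstr(line, " : ");
    if (sep == NULL)
        return -1;

    size_t klen = (size_t)(sep - line);
    const char *val = sep + 3;          /* skip over " : " */
    size_t vlen = strlen(val);

    if (klen >= key_len || vlen >= value_len)
        return -1;

    memcpy(key, line, klen);
    key[klen] = '\0';
    memcpy(value, val, vlen + 1);       /* include the terminator */
    return 0;
}
```

URLs in the References field contain "://" rather than " : ", so they are
not split by this rule.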

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed


2006-02-13 17:26:05

by Dave Jones

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 06:09:45PM +0100, Adrian Bunk wrote:

> Subject : 2.6.16rc1-git4 slab corruption
> References : http://lkml.org/lkml/2006/1/31/164
> Submitter : Dave Jones <[email protected]>
> Status : unknown

Haven't seen any instances of this for a few days now with latest -git's.

Dave

2006-02-13 17:27:35

by Andi Kleen

Subject: Re: Linux 2.6.16-rc3

Con Kolivas <[email protected]> writes:

> On Monday 13 February 2006 21:39, Andrew Morton wrote:
> > That looks like a different cpufreq bug. Unfortunately the critical first
> > few lines have scrolled away. Please boot with `vga=extended' so we get to
> > see them.
>
> Just as a suggestion, why don't we print oopsen out in the opposite direction
> so the critical information is in the last few lines and the stacktrace in
> reverse, or have that as a bootparam option oops=reverse .

x86-64 has a one line "executive summary" with the RIP etc.
at the end to solve that problem. Might be a good idea for i386 too.

-Andi
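
To illustrate the two ideas in this exchange (printing the trace in reverse,
and ending with a one-line summary), here is a small user-space sketch. It
assumes frames[0] is the faulting frame; the function name and the output
layout are made up for illustration and are not the kernel's oops format.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a trace so that the most important lines come last:
 * outermost frame first, faulting frame last, then a one-line summary.
 * frames[0] is assumed to be the faulting (innermost) frame.
 * Returns the number of characters written, excluding the terminator.
 */
static size_t format_oops_reversed(char *out, size_t out_len,
                                   const char *summary,
                                   const char *const *frames, size_t nframes)
{
    size_t used = 0;

    if (out_len == 0)
        return 0;
    out[0] = '\0';

    /* Walk from the outermost frame down to the innermost one. */
    for (size_t i = nframes; i-- > 0; ) {
        int n = snprintf(out + used, out_len - used, "  %s\n", frames[i]);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0';   /* drop the partial line and stop */
            return used;
        }
        used += (size_t)n;
    }

    /* The summary goes last so it is what remains on screen. */
    int n = snprintf(out + used, out_len - used, "%s\n", summary);
    if (n < 0 || (size_t)n >= out_len - used) {
        out[used] = '\0';
        return used;
    }
    return used + (size_t)n;
}
```

With this ordering, whatever scrolls off the top of the console is the least
specific part of the report.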

2006-02-13 17:30:55

by Dmitry Torokhov

Subject: Re: 2.6.16-rc3: more regressions

On 2/13/06, Adrian Bunk <[email protected]> wrote:
>
> Subject : psmouse starts losing sync in 2.6.16-rc2
> References : http://lkml.org/lkml/2006/2/5/50
> Submitter : Meelis Roos <[email protected]>
> Status : unknown

Working on various manifestations of this one. At worst we will have
to disable resync by default before 2.6.16 final is out and continue
in 2.6.17 cycle.

--
Dmitry

2006-02-13 17:38:53

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Adrian Bunk wrote:
>
> I have some regressions [1] on my list that weren't mentioned by Andrew:
>
> Subject : Xorg freezes 2.6.16-rc1
> References : http://lkml.org/lkml/2006/1/26/97
> Submitter : Mauro Tassinari <[email protected]>
> Status : unknown

For this one, it would be interesting to see more info about the working
setup. Notably

- what modules are loaded by the time X is running
- any differences in 'dmesg' output from 2.6.15 to 16-rc1 (PCI allocation
issues should show up there)

I don't see any real differences in the radeonfb driver, for example
(there's some trivial cleanup, including things like speeling fixums, but
nothing that looks remotely likely).

Of course, in a perfect world, we'd have serial or network console
output.. X crashes are nastier than most, if only because the console is
mostly gone.

Linus

2006-02-13 17:48:23

by Dave Jones

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 09:35:30AM -0800, Linus Torvalds wrote:

> > Subject : Xorg freezes 2.6.16-rc1
> > References : http://lkml.org/lkml/2006/1/26/97
> > Submitter : Mauro Tassinari <[email protected]>
> > Status : unknown
>
> For this one, it would be interesting to see more info about the working
> setup. Notably
>
> - what modules are loaded by the time X is running
> - any differences in 'dmesg' output from 2.6.15 to 16-rc1 (PCI allocation
> issues should show up there)
>
> Of course, in a perfect world, we'd have serial or network console
> output.. X crashes are nastier than most, if only because the console is
> mostly gone.

I think this is the Radeon DRM bug I hit. (rc1 had a big drm update iirc)

Unless I patch with this..

--- linux-2.6.15.noarch/drivers/char/drm/drm_pciids.h~ 2006-02-09 19:26:06.000000000 -0500
+++ linux-2.6.15.noarch/drivers/char/drm/drm_pciids.h 2006-02-09 19:26:56.000000000 -0500
@@ -85,7 +85,6 @@
{0x1002, 0x5969, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV100}, \
{0x1002, 0x596A, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280}, \
{0x1002, 0x596B, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280}, \
- {0x1002, 0x5b60, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV350}, \
{0x1002, 0x5c61, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280|CHIP_IS_MOBILITY}, \
{0x1002, 0x5c62, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280}, \
{0x1002, 0x5c63, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280|CHIP_IS_MOBILITY}, \

I get a machine check exception, triple fault, or NMI watchdog lockup.

Dave
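
Why removing a single table line keeps the driver away from the card can be
shown with a simplified model of ID-table matching. This is a sketch only:
struct id_entry, the FAMILY_* values and find_entry are hypothetical
stand-ins, not the kernel's struct pci_device_id or its matching code, and
the subsystem IDs used when calling it are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu      /* wildcard, playing the role of PCI_ANY_ID */

/* Hypothetical chip families carried as driver-private data. */
enum { FAMILY_RV100 = 1, FAMILY_RV280, FAMILY_RV350 };

/* Simplified table entry; not the kernel's struct pci_device_id. */
struct id_entry {
    uint32_t vendor, device, subvendor, subdevice;
    unsigned long chip_flags;
};

/* The table as it looked with the 0x5b60 entry present. */
static const struct id_entry table_with_5b60[] = {
    {0x1002, 0x596B, ANY_ID, ANY_ID, FAMILY_RV280},
    {0x1002, 0x5b60, ANY_ID, ANY_ID, FAMILY_RV350},
    {0x1002, 0x5c61, ANY_ID, ANY_ID, FAMILY_RV280},
};

/* The same table after the patch: the 0x5b60 line is gone. */
static const struct id_entry table_without_5b60[] = {
    {0x1002, 0x596B, ANY_ID, ANY_ID, FAMILY_RV280},
    {0x1002, 0x5c61, ANY_ID, ANY_ID, FAMILY_RV280},
};

static int field_matches(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

/* Return the first entry matching the device, or NULL if none claims it. */
static const struct id_entry *find_entry(const struct id_entry *table, size_t n,
                                         uint32_t vendor, uint32_t device,
                                         uint32_t subvendor, uint32_t subdevice)
{
    for (size_t i = 0; i < n; i++) {
        if (field_matches(table[i].vendor, vendor) &&
            field_matches(table[i].device, device) &&
            field_matches(table[i].subvendor, subvendor) &&
            field_matches(table[i].subdevice, subdevice))
            return &table[i];
    }
    return NULL;
}
```

With the second table, a lookup for vendor 0x1002, device 0x5b60 returns
NULL, so in this model the driver never binds to the card and never gets the
chance to program it as if it were an RV350.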

2006-02-13 18:08:38

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Dave Jones wrote:
>
> I think this is the Radeon DRM bug I hit. (rc1 had a big drm update iirc)

Ahh..

> Unless I patch with this..
>
> --- linux-2.6.15.noarch/drivers/char/drm/drm_pciids.h~ 2006-02-09 19:26:06.000000000 -0500
> +++ linux-2.6.15.noarch/drivers/char/drm/drm_pciids.h 2006-02-09 19:26:56.000000000 -0500
> @@ -85,7 +85,6 @@
> {0x1002, 0x5969, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV100}, \
> {0x1002, 0x596A, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280}, \
> {0x1002, 0x596B, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280}, \
> - {0x1002, 0x5b60, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV350}, \
> {0x1002, 0x5c61, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV280|CHIP_IS_MOBILITY}, \

DaveA, I'll apply this for now. Comments?

Linus

2006-02-13 18:20:56

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Linus Torvalds wrote:
>
> DaveA, I'll apply this for now. Comments?

Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
suppresses the ID entirely, but the string seems to match the one that
Dave Jones reports) may be unrelated.

DaveJ (or Mauro): since you can test this, can you test having that ID
there but _without_ the other changes to drm in -rc1?

Ie was it the addition of that particular ID, or are the other radeon
driver changes (which haven't had as much testing) perhaps the culprit?

I realize that without the ID, that card would never have been tested
anyway, but the point being that plain 2.6.15 with _just_ that ID added
has at least gotten more testing on other (similar) chips. So before I
revert that particular ID, it would be nice to know that it was broken
even with the previous radeon driver state.

Linus

2006-02-13 18:28:44

by Dave Jones

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 10:16:59AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 13 Feb 2006, Linus Torvalds wrote:
> >
> > DaveA, I'll apply this for now. Comments?
>
> Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
> suppresses the ID entirely, but the string seems to match the one that
> Dave Jones reports) may be unrelated.
>
> DaveJ (or Mauro): since you can test this, can you test having that ID
> there but _without_ the other changes to drm in -rc1?
>
> Ie was it the addition of that particular ID, or are the other radeon
> driver changes (which haven't had as much testing) perhaps the culprit?
>
> I realize that without the ID, that card would never have been tested
> anyway, but the point being that plain 2.6.15 with _just_ that ID added
> has at least gotten more testing on other (similar) chips. So before I
> revert that particular ID, it would be nice to know that it was broken
> even with the previous radeon driver state.

r300 is unlike the other chips though.
Adding that ID on its own doesn't make any sense, as the rest of the
radeon driver won't have a clue how to drive the new hardware.

Dave

2006-02-13 18:34:47

by Adrian Bunk

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 10:16:59AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 13 Feb 2006, Linus Torvalds wrote:
> >
> > DaveA, I'll apply this for now. Comments?
>
> Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
> suppresses the ID entirely, but the string seems to match the one that
> Dave Jones reports) may be unrelated.

Dave's patch removes the entry for the card with the 0x5b60.

According to his bug report, Mauro has a Radeon X300SE that should
have the 0x5b70 according to pci.ids from pciutils and that doesn't seem
to be claimed by the DRM driver (and the dmesg from the bug report
confirms that the radeon DRM driver didn't claim to be responsible for
this card).

> DaveJ (or Mauro): since you can test this, can you test having that ID
> there but _without_ the other changes to drm in -rc1?
>
> Ie was it the addition of that particular ID, or are the other radeon
> driver changes (which haven't had as much testing) perhaps the culprit?
>
> I realize that without the ID, that card would never have been tested
> anyway, but the point being that plain 2.6.15 with _just_ that ID added
> has at least gotten more testing on other (similar) chips. So before I
> revert that particular ID, it would be nice to know that it was broken
> even with the previous radeon driver state.

The ID removed by Dave's patch is the only ID listed for an RV370 chip
(the other RV370s aren't listed in the radeon DRM driver).

I suspect Dave and Mauro are having unrelated problems.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-13 18:37:10

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Dave Jones wrote:
>
> r300 is unlike the other chips though.
> Adding that ID on its own doesn't make any sense, as the rest of the
> radeon driver won't have a clue how to drive the new hardware.

There were RV350 entries in the drm_pciids file before 2.6.15 as far as I
can tell..

Linus

2006-02-13 18:37:55

by Mark Fasheh

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 06:09:45PM +0100, Adrian Bunk wrote:
> Subject : OCFS2 Filesystem inconsistency across nodes
> References : http://lkml.org/lkml/2006/2/10/14
> Submitter : Claudio Martins <[email protected]>
> Handled-By : Mark Fasheh <[email protected]>
> Status : unknown
Definitely a regression. I think some patch that was merged shortly after
OCFS2 is causing this. I'm trying to narrow it down to a single patch right
now...
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[email protected]

2006-02-13 18:42:57

by Adrian Bunk

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 10:33:24AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 13 Feb 2006, Dave Jones wrote:
> >
> > r300 is unlike the other chips though.
> > Adding that ID on its own doesn't make any sense, as the rest of the
> > radeon driver won't have a clue how to drive the new hardware.
>
> There were RV350 entries in the drm_pciids file before 2.6.15 as far as I
> can tell..

The entry Dave's patch removes is the only RV350 entry for an RV370.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-13 18:43:39

by Dave Jones

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 07:34:45PM +0100, Adrian Bunk wrote:
> On Mon, Feb 13, 2006 at 10:16:59AM -0800, Linus Torvalds wrote:
> >
> >
> > On Mon, 13 Feb 2006, Linus Torvalds wrote:
> > >
> > > DaveA, I'll apply this for now. Comments?
> >
> > Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
> > suppresses the ID entirely, but the string seems to match the one that
> > Dave Jones reports) may be unrelated.
>
> Dave's patch removes the entry for the card with the 0x5b60.
> According to his bug report, Mauro has a Radeon X300SE that should
> have the 0x5b70 according to pci.ids from pciutils and that doesn't seem
> to be claimed by the DRM driver (and the dmesg from the bug report
> confirms that the radeon DRM driver didn't claim to be responsible for
> this card).

The X300SE (mine at least) is a dual head card, with a 0x5b60 _and_ a 0x5b70

Dave

2006-02-13 18:47:58

by Adrian Bunk

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 01:42:09PM -0500, Dave Jones wrote:
> On Mon, Feb 13, 2006 at 07:34:45PM +0100, Adrian Bunk wrote:
> > On Mon, Feb 13, 2006 at 10:16:59AM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Mon, 13 Feb 2006, Linus Torvalds wrote:
> > > >
> > > > DaveA, I'll apply this for now. Comments?
> > >
> > > Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
> > > suppresses the ID entirely, but the string seems to match the one that
> > > Dave Jones reports) may be unrelated.
> >
> > Dave's patch removes the entry for the card with the 0x5b60.
> > According to his bug report, Mauro has a Radeon X300SE that should
> > have the 0x5b70 according to pci.ids from pciutils and that doesn't seem
> > to be claimed by the DRM driver (and the dmesg from the bug report
> > confirms that the radeon DRM driver didn't claim to be responsible for
> > this card).
>
> The X300SE (mine at least) is a dual head card, with a 0x5b60 _and_ a 0x5b70

OK, this might explain it.
Then your patch could indeed fix Mauro's problem.

> Dave

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-13 18:51:54

by Alex Deucher

Subject: Re: 2.6.16-rc3: more regressions

On 2/13/06, Dave Jones <[email protected]> wrote:
> On Mon, Feb 13, 2006 at 07:34:45PM +0100, Adrian Bunk wrote:
> > On Mon, Feb 13, 2006 at 10:16:59AM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Mon, 13 Feb 2006, Linus Torvalds wrote:
> > > >
> > > > DaveA, I'll apply this for now. Comments?
> > >
> > > Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
> > > suppresses the ID entirely, but the string seems to match the one that
> > > Dave Jones reports) may be unrelated.
> >
> > Dave's patch removes the entry for the card with the 0x5b60.
> > According to his bug report, Mauro has a Radeon X300SE that should
> > have the 0x5b70 according to pci.ids from pciutils and that doesn't seem
> > to be claimed by the DRM driver (and the dmesg from the bug report
> > confirms that the radeon DRM driver didn't claim to be responsible for
> > this card).
>
> The X300SE (mine at least) is a dual head card, with a 0x5b60 _and_ a 0x5b70
>
> Dave
>

The secondary ID is just a placeholder for the Windows driver so that
dual-head will work on Windows 2000. Neither the DRM nor the Xorg DDX
uses the secondary ID.

Alex

2006-02-13 18:54:47

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Adrian Bunk wrote:
>
> Dave's patch removes the entry for the card with the 0x5b60.
>
> According to his bug report, Mauro has a Radeon X300SE that should
> have the 0x5b70 according to pci.ids from pciutils

No. Look closer:

04:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)] (prog-if 00 [VGA])
Subsystem: ASUSTeK Computer Inc. Extreme AX300SE-X

That's the 5b60 chip too (yeah, the lspci doesn't show numbers when it has
an ascii string, but in this case the ascii string happily has at least
that part of the number in it: ".. RV370 5B60 ..", where the 5B60 is just
the PCI ID number).

So it _is_ the same chip.

I just worry whether (a) the other added PCI ID's are any good for that
core and (b) whether the bug was really introduced with some of the other
changes. I admit that (b) is pretty unlikely, but it would be good to
test.

> and that doesn't seem to be claimed by the DRM driver (and the dmesg
> from the bug report confirms that the radeon DRM driver didn't claim to
> be responsible for this card).

Sadly, that module loading is done by X. So the bootup dmesg stuff
wouldn't have had the message.

It might be interesting to see if the hang can be reproduced without
starting X at all, by just doing a "modprobe radeon" or something.
Unlikely, though - while loading the drm modules does _some_ things to the
card, it's usually only when X actually starts sending commands to them
that things really go downhill..

Linus
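
Since lspci shows the name instead of the number here, one way to get the
numeric ID without guessing from the description string is to read it from
sysfs, where each PCI device directory has "vendor" and "device" files
holding a value such as "0x5b60". A minimal sketch follows; the helper names
are made up, and the example path in the comment assumes PCI domain 0000,
which the lspci output above does not show.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a sysfs-style ID string such as "0x5b60\n".
 * Returns the 16-bit ID, or -1 if the text is not a valid hex ID.
 */
static long parse_pci_id(const char *text)
{
    char *end;
    unsigned long v = strtoul(text, &end, 16);  /* base 16 accepts "0x" */

    if (end == text || v > 0xffff)
        return -1;
    while (*end == '\n' || *end == ' ')
        end++;
    return *end == '\0' ? (long)v : -1;
}

/*
 * Read one ID file, for example
 * /sys/bus/pci/devices/0000:04:00.0/device (domain 0000 is an assumption).
 * Returns the ID, or -1 if the file cannot be read or parsed.
 */
static long read_pci_id(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_pci_id(buf);
}
```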

2006-02-13 19:09:11

by Adrian Bunk

Subject: Re: 2.6.16-rc3: more regressions

On Mon, Feb 13, 2006 at 10:50:52AM -0800, Linus Torvalds wrote:
>...
> I just worry whether (a) the other added PCI ID's are any good for that
> core and (b) whether the bug was really introduced with some of the other
> changes. I admit that (b) is pretty unlikely, but it would be good to
> test.
>...

The one thing I have not yet been proven wrong for is that this PCI id
is the only one we have in this driver for an RV370.

Perhaps someone will be able to prove this wrong, too, ;-) but otherwise
it would be a good explanation why exactly this one is causing problems.

> Linus

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-13 19:20:39

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Adrian Bunk wrote:
>
> The one thing I have not yet been proven wrong for is that this PCI id
> is the only one we have in this driver for an RV370.

It definitely is an RV370, you're right in that. I'm too lazy to actually
see if the other entries that claim to be RV350's really are RV350's.

So I decided to just remove it. Even if there is some other bug that could
make it work again, we can always just re-add it at that time. In the
meantime, this should fix both DaveJ's and Mauro's problems, and is clearly
no worse than 2.6.15 (which also didn't recognize the card), so...

Linus

2006-02-13 19:24:32

by Linus Torvalds

Subject: Re: 2.6.16-rc3: more regressions



On Mon, 13 Feb 2006, Linus Torvalds wrote:
>
> So I decided to just remove it. Even if there is some other bug that could
> make it work again, we can always just re-add it at that time. In the
> meantime, this should fix both DaveJ's and Mauro's problems, and is clearly
> no worse than 2.6.15 (which also didn't recognize the card), so...

Btw, on a totally unrelated tangent, I just wanted to say how much I
appreciate people looking for regressions like this, and trying to track
them. Andrew does it, but this is absolutely something that should be
possible to get more people to do, and it would be a huge boon for kernel
development if we had a more aggressive regression tracking system.

Right now it all very easily gets lost in the noise - either on the
mailing list or even on bugzilla (where following up on regressions and
trying to get details and prodding people to perhaps try to narrow things
down a bit more often ends up falling on the floor, because it's a big
job).

Linus

2006-02-13 19:27:24

by Daniel Drake

Subject: Re: Linux 2.6.16-rc3

Adrian Bunk wrote:
> On Sun, Feb 12, 2006 at 07:05:20PM -0800, Andrew Morton wrote:
>> ...
>> - Various reports similar to
>> http://bugzilla.kernel.org/show_bug.cgi?id=6011, seemingly related to USB
>> PCI quirk handling.
>> ...
>
> This bug contains a patch.
>
> What is the status of this patch?

The patch is in Greg's tree so should see its way to Linus soon.
However, it's not the complete fix for the general issue.

Gentoo have had two reports of it. One is fixed by David's patch, the
other is not (http://bugs.gentoo.org/122277 - will be re-filed on kernel
bugzilla once I have investigated more).

Daniel

2006-02-13 19:32:43

by Rafael J. Wysocki

Subject: Re: Linux 2.6.16-rc3

On Monday 13 February 2006 14:51, Takashi Iwai wrote:
> At Mon, 13 Feb 2006 14:09:51 +0100,
> Rafael J. Wysocki wrote:
> >
> > On Monday 13 February 2006 13:02, Takashi Iwai wrote:
> > > At Sun, 12 Feb 2006 19:05:20 -0800,
> > > Andrew Morton wrote:
> > > >
> > > > - Patrizio Bassi <[email protected]> has an alsa suspend
> > > > regression ("alsa suspend/resume continues to fail for ens1370")
> > >
> > > It's not a "regression". PM didn't work with ens1370 at all in the
> > > earlier version.
> > >
> > > About the problem there, I have no idea now what's wrong. The
> > > suspend-to-disk works fine if the driver is built as module but not as
> > > built-in kernel.
> >
> > That may be related to the fact that modular drivers are not present in
> > memory during resume (just a thought).
>
> I think the modular drivers are on memory but the order of
> re-initialization is different.

No, they are not (this is the part I'm sure of). software_resume() is called
before any modules have a chance to be loaded unless you boot with
noresume, load them from an initrd and start the resume manually.

Greetings,
Rafael

2006-02-13 20:30:55

by Paul Fulghum

Subject: Re: Linux 2.6.16-rc3

On Sun, 2006-02-12 at 19:05 -0800, Andrew Morton wrote:

> - A couple of random tty-related oopses reported by Jesper Juhl. We
> don't know why these happened - they appear to not be related to the tty
> buffering changes.

I just posted a patch for this under
[PATCH] tty reference count fix

This is not related to the tty buffering changes.

--
Paul Fulghum
Microgate Systems, Ltd

2006-02-13 20:38:24

by Greg KH

Subject: Re: Linux 2.6.16-rc3

On Sun, Feb 12, 2006 at 07:05:20PM -0800, Andrew Morton wrote:
>
> We still have some serious bugs, several of which are in 2.6.15 as well:
>
> - Nathan's "sysfs-related oops during module unload", which Greg seems to
> have under control.

Yes, this isn't a "regression" but has been there for a while. It can
also only be triggered by a root user, so its severity is much lower.

> - Various reports similar to
> http://bugzilla.kernel.org/show_bug.cgi?id=6011, seemingly related to USB
> PCI quirk handling.

I have a patch for this, which will be going to Linus later tonight.

> - Nasty warnings from scsi about kobject-layer things being called from
> irq context. James has a push-it-to-process-context patch which sadly
> assumes kmalloc() is immortal, but no other fix seems to have offered
> itself.

This has been the case for a long time. I don't really think there is a
rush to get this fixed, but I really like James's proposed patch. It's
up to him if he feels it is ready for 2.6.16 or not.

> - Helge Hafting reports a usb printer regression - I don't know if that's
> still live?

He never reported back, and one other person who reported this figured
out that it was bad RAM in the system. Replacing that fixed the issue.
I've also printed a lot of stuff here (tax time...) and had no problems.

> - "Carlo E. Prelz" <[email protected]> has another USB/ehci regression
> ("ATI RS480-based motherboard: stuck while booting with kernel >= 2.6.15
> rc1").

Fixed by the previously mentioned EHCI quirk/handoff patch that will be
going to Linus.

So, that's it for the USB stuff, thankfully.

thanks,

greg k-h

2006-02-13 22:11:08

by Jan Dittmer

Subject: Re: Linux 2.6.16-rc3

Linus Torvalds wrote:
>
> On Mon, 13 Feb 2006, Jan Dittmer wrote:
>
>>This breaks compilation on 3 archs compared to -rc2:
>>
>>- mips: broke
>>- sparc: broke
>>- sparc64: broke
>
>
> Should be ok in -git now, pls verify.
>

- mips: fixed
Details: http://l4x.org/k/?d=10915

- sparc: fixed
Details: http://l4x.org/k/?d=10932

- sparc64: fixed
Details: http://l4x.org/k/?d=10933

Good work,

Jan

2006-02-13 23:27:43

by Jesse Allen

Subject: Re: 2.6.16-rc3: more regressions

On 2/13/06, Linus Torvalds <[email protected]> wrote:
>
>
> On Mon, 13 Feb 2006, Adrian Bunk wrote:
> >
> > The one thing I have not yet been proven wrong for is that this PCI id
> > is the only one we have in this driver for an RV370.
>
> It definitely is an RV370, you're right in that. I'm too lazy to actually
> see if the other entries that claim to be RV350's really are RV350's.
>

Well a while back, I hacked in the pci id for my Xpress 200M (5955),
which is basically an RV370 with no dedicated vram. I did the same
thing and claimed an RV350, which is the closest model. This allowed
the radeon module to load. When I startx'ed and DRI was allowed to
load on it, it locked up. So I never sent in the patch. I believe
the person who sent this one in originally seemed to indicate that it
worked, and I believed it if he had an X300 and my problem was having
the IGP version. But now having this reported, I'm pretty sure it is
the same problem. RV370 doesn't seem to work as an RV350.

Jesse

2006-02-13 23:36:04

by Felix Kühling

Subject: Re: 2.6.16-rc3: more regressions

Am Montag, den 13.02.2006, 16:27 -0700 schrieb Jesse Allen:
> On 2/13/06, Linus Torvalds <[email protected]> wrote:
> >
> >
> > On Mon, 13 Feb 2006, Adrian Bunk wrote:
> > >
> > > The one thing I have not yet been proven wrong for is that this PCI id
> > > is the only one we have in this driver for an RV370.
> >
> > It definitely is an RV370, you're right in that. I'm too lazy to actually
> > see if the other entries that claim to be RV350's really are RV350's.
> >
>
> Well a while back, I hacked in the pci id for my Xpress 200M (5955),
> which is basically an RV370 with no dedicated vram. I did the same
> thing and claimed an RV350, which is the closest model. This allowed
> the radeon module to load. When I startx'ed and DRI was allowed to
> load on it, it locked up. So I never sent in the patch. I believe
> the person who sent this one in originally seemed to indicate that it
> worked, and I believed it if he had an X300 and my problem was having
> the IGP version. But now having this reported, I'm pretty sure it is
> the same problem. RV370 doesn't seem to work as an RV350.

The Xpress200 chips have a completely different GART implementation.
Thus the driver can't even send commands to the command processor and
that's why X locked up on you when DRI was enabled. This has nothing to
do with the Xpress200 being (almost) an RV370 or not.

Regards,
Felix

--
| Felix Kühling <[email protected]> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |

2006-02-14 00:08:20

by Dave Airlie

Subject: Re: 2.6.16-rc3: more regressions

>
> DaveA, I'll apply this for now. Comments?
>

(sorry - I've been finding my way back home from Xdevconf, just landed)...

I asked DaveJ I believe in one thread to disable Load "dri" in his
xorg.conf and report back,

The problems are in the X.org driver, not the DRM driver; however, adding
the PCI ID to the DRM driver will cause the X.org driver to enable
buggy features.

I cannot fix this from the DRM side: either we enable DRM for these
cards at some point or we don't. Ideally the X.org driver wouldn't
enable DRI by default for r300-class cards.

There may be some other issues however that Ben is currently looking
into with the memory manager setup,

I've tested the r300 on the 5460 rv350, and I've heard the rv370
doesn't have any great differences. The Xpress 200 is a mess and I'm
not accepting any patches for it until someone who has one and knows
what they are doing gets it working.

Dave.


2006-02-14 00:30:41

by Dave Jones

Subject: Re: 2.6.16-rc3: more regressions

On Tue, Feb 14, 2006 at 11:08:16AM +1100, Dave Airlie wrote:
> >
> > DaveA, I'll apply this for now. Comments?
> >
>
> (sorry - I've been finding my way back home from Xdevconf, just landed)...
>
> I asked DaveJ I believe in one thread to disable Load "dri" in his
> xorg.conf and report back,

Ah sorry about that. You were just about to go to LCA when that came
up, so I figured I'd wait until you had time to look at it again :)

There's a log at http://people.redhat.com/davej/Xorg.0.log
That doesn't have drm disabled, but it is being run on a kernel
with the pci id commented out. I'm a bit reluctant to reboot
the workstation to try a non-drm enabled X at the moment, until
some stuff finishes. Let me know if that log is insufficient or not.

Dave

2006-02-14 00:33:37

by Dave Airlie

Subject: Re: 2.6.16-rc3: more regressions

> > (sorry - I've been finding my way back home from Xdevconf, just landed)...
> >
> > I asked DaveJ I believe in one thread to disable Load "dri" in his
> > xorg.conf and report back,
>
> Ah sorry about that. You were just about to go to LCA when that came
> up, so I figured I'd wait until you had time to look at it again :)
>
> There's a log at http://people.redhat.com/davej/Xorg.0.log
> That doesn't have drm disabled, but it is being run on a kernel
> with the pci id commented out. I'm a bit reluctant to reboot
> the workstation to try a non-drm enabled X at the moment, until
> some stuff finishes. Let me know if that log is insufficient or not.
>

Well, the thing is, not having the PCI id is the same thing as
commenting out Load "dri" in the xorg.conf, really: X never tries to
initialise the DRM layer on the card. I know this is a bug in the
current 7.0 radeon driver and I'm hoping Ben's fixes can fix this.
However, we will end up never being able to turn on DRM support for
r300 cards until X.org 7.1 is widespread, or until we add some kind of
second-stage enable that a new X server can work with, but I'm not
really sure that this is possible.

The fix is to not load DRI on old X.org r300 installs...

Dave.

2006-02-14 02:45:49

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

Andrew Morton <[email protected]> wrote:
>
> Avuton Olrich <[email protected]> wrote:
> >
> > I should have realized that would happen, hopefully here's a better
> > one. Please let me know anything I can do to help.
> >
> > http://68.111.224.150:8080/~sbh/P1010031.JPG
>
> Thanks. Yes, it does look like the same bug.

argh. The fix for this oops is still languishing in David's tree.


--- a/arch/i386/kernel/timers/timer_tsc.c
+++ b/arch/i386/kernel/timers/timer_tsc.c
@@ -272,6 +272,10 @@ time_cpufreq_notifier(struct notifier_bl
 	if (val != CPUFREQ_RESUMECHANGE)
 		write_seqlock_irq(&xtime_lock);
 	if (!ref_freq) {
+		if (!freq->old){
+			ref_freq = freq->new;
+			goto end;
+		}
 		ref_freq = freq->old;
 		loops_per_jiffy_ref = cpu_data[freq->cpu].loops_per_jiffy;
 #ifndef CONFIG_SMP
@@ -297,6 +301,7 @@ time_cpufreq_notifier(struct notifier_bl
 #endif
 	}

+end:
 	if (val != CPUFREQ_RESUMECHANGE)
 		write_sequnlock_irq(&xtime_lock);
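
For readers who want to see the shape of the guard in isolation, here is a minimal user-space sketch in C. It is not the kernel code: `struct freq_change`, `struct tsc_ref`, `scale()` and `on_freq_change()` are hypothetical names made up for this illustration. It also makes one assumption the patch does not: when the old frequency is zero, the sketch records the current loops-per-jiffy value as the reference, so that later calls have something to scale from.

```c
#include <assert.h>

/* Hypothetical stand-in for the old/new pair a cpufreq notifier receives. */
struct freq_change {
	unsigned int old_khz;	/* 0 if the driver never initialised it */
	unsigned int new_khz;
};

/* Hypothetical reference state kept across notifications. */
struct tsc_ref {
	unsigned int ref_khz;	/* 0 until the first usable notification */
	unsigned long lpj_ref;	/* loops-per-jiffy measured at ref_khz */
};

/* Scale value by mult/div; the caller must never pass div == 0. */
static unsigned long scale(unsigned long value, unsigned int div,
			   unsigned int mult)
{
	assert(div != 0);
	return (unsigned long)((unsigned long long)value * mult / div);
}

/* Return the loops-per-jiffy value to use after a frequency change. */
static unsigned long on_freq_change(struct tsc_ref *ref,
				    const struct freq_change *fc,
				    unsigned long lpj)
{
	if (!ref->ref_khz) {
		if (!fc->old_khz) {
			/* No trustworthy old frequency: adopt the new one as
			 * the reference and skip scaling for this round.
			 * Recording lpj here is an assumption of this sketch. */
			ref->ref_khz = fc->new_khz;
			ref->lpj_ref = lpj;
			return lpj;
		}
		ref->ref_khz = fc->old_khz;
		ref->lpj_ref = lpj;
	}
	return scale(ref->lpj_ref, ref->ref_khz, fc->new_khz);
}
```

The point of the early return is simply that `scale()` is never reached with a zero divisor, which is the failure the oops report points at.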


2006-02-14 02:54:54

by Dave Jones

Subject: Re: Linux 2.6.16-rc3

On Mon, Feb 13, 2006 at 06:44:42PM -0800, Andrew Morton wrote:
> Andrew Morton <[email protected]> wrote:
> >
> > Avuton Olrich <[email protected]> wrote:
> > >
> > > I should have realized that would happen, hopefully here's a better
> > > one. Please let me know anything I can do to help.
> > >
> > > http://68.111.224.150:8080/~sbh/P1010031.JPG
> >
> > Thanks. Yes, it does look like the same bug.
>
> argh. The fix for this oops is still languishing in David's tree.

I was waiting for it to turn up in an -mm release first to be
sure everything was ok.

If you're ok with it going as is, Linus, please pull
from master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq.git/
to get the changesets below.

Dave


commit 7d5e350fab47f1273bc8b52d5f133ed6e4baeb7f
Author: Dave Jones <[email protected]>
Date: Thu Feb 2 17:03:42 2006 -0500

[CPUFREQ] Whitespace/CodingStyle cleanups

Signed-off-by: Dave Jones <[email protected]>

commit a85f7bd310dbc9010309bfe70b6b02432a11ef59
Author: Thomas Renninger <[email protected]>
Date: Wed Feb 1 11:36:04 2006 +0100

[CPUFREQ] Check whether driver init did not initialize current freq

Check whether driver init did not initialize current freq

Signed-off-by: Thomas Renninger <[email protected]>
Signed-off-by: Dave Jones <[email protected]>

commit 9d2725bb815d915fc6c8531097d9e71b579a8763
Author: Thomas Renninger <[email protected]>
Date: Wed Feb 1 11:38:37 2006 +0100

[CPUFREQ] Check for not initialized freq on cpufreq changes

Test for old_freq equals 0 to insure not to divide by 0:
______________________________________________

Check for not initialized freq on cpufreq changes

Signed-off-by: Thomas Renninger <[email protected]>
Signed-off-by: Dave Jones <[email protected]>

commit e4472cb3706ceea42797ae1dc79d624026986694
Author: Dave Jones <[email protected]>
Date: Tue Jan 31 15:53:55 2006 -0800

[CPUFREQ] cpufreq_notify_transition cleanup.

Introduce caching of cpufreq_cpu_data[freqs->cpu], which allows us to
make the function a lot more readable, and as a nice side-effect, it
now fits in < 80 column displays again.

Signed-off-by: Dave Jones <[email protected]>

2006-02-14 03:10:23

by Michal Jaegermann

Subject: Re: Linux 2.6.16-rc3

On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> >
> > I think we can assume that it will be seen there. 2.6.16 is going into
> > distros and will have more exposure than 2.6.15,
>
> 2.6.15 went into distros as well, such as Fedora Core 4 ;)

And promptly broke laptop suspension. See, for example:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998

Michal

2006-02-14 03:22:18

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

Dave Jones <[email protected]> wrote:
>
> > argh. The fix for this oops is still languishing in David's tree.
>
> I was waiting for it to turn up in an -mm release first to be
> sure everything was ok.

If we're at -rc<late> and we have a fix in hand and we know we will want
that fix in 2.6.x, I don't think there's a lot of point in hanging around -
slam it in asap, give it the most exposure possible.

2006-02-14 03:31:11

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

Michal Jaegermann <[email protected]> wrote:
>
> On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> > On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> > >
> > > I think we can assume that it will be seen there. 2.6.16 is going into
> > > distros and will have more exposure than 2.6.15,
> >
> > 2.6.15 went into distros as well, such as Fedora Core 4 ;)
>
> And promptly broke laptop suspension. See, for example:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998
>

That's suspend-to-disk, yes?

Dave, would you have the 2.6.15-1.1830_FC4 -> 2.6.15-1.1831_FC4 details
handy? There surely can't be much difference?

There seem to be several ACPI problems there. Do we have a reliable means
of feeding such reports up into the (for example) acpi developers?

<I have this vaguely unsettled feeling that distros must get more bug
reports than the upstream developers, yet we hear so little about it>

2006-02-14 03:31:04

by Lee Revell

Subject: Re: Linux 2.6.16-rc3

On Mon, 2006-02-13 at 20:08 -0700, Michal Jaegermann wrote:
> On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> > On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> > >
> > > I think we can assume that it will be seen there. 2.6.16 is going into
> > > distros and will have more exposure than 2.6.15,
> >
> > 2.6.15 went into distros as well, such as Fedora Core 4 ;)
>
> And promptly broke laptop suspension. See, for example:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998

It broke suspension on YOUR laptop - the bug report does not give a make
and model. 2.6.15 would not have shipped if it broke suspend on the
developers' laptops.

Lee

2006-02-14 04:50:47

by Dave Jones

Subject: Re: Linux 2.6.16-rc3

On Mon, Feb 13, 2006 at 07:28:38PM -0800, Andrew Morton wrote:
> Michal Jaegermann <[email protected]> wrote:
> >
> > On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> > > On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> > > >
> > > > I think we can assume that it will be seen there. 2.6.16 is going into
> > > > distros and will have more exposure than 2.6.15,
> > >
> > > 2.6.15 went into distros as well, such as Fedora Core 4 ;)
> >
> > And promptly broke laptop suspension. See, for example:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998
> >
>
> That's suspend-to-disk, yes?
>
> Dave, would you have the 2.6.15-1.1830_FC4 -> 2.6.15-1.1831_FC4 details
> handy? There surely can't be much difference?

Tiny changes.
- The icmp remote DoS fix.
- Dropped a patch that broke booting with 'quiet' bootparam
- the 'dm_crypt: zero key before freeing it' change

> There seem to be several ACPI problems there. Do we have a reliable means
> of feeding such reports up into the (for example) acpi developers?
>
> <I have this vaguely unsettled feeling that distros must get more bug
> reports than the upstream developers, yet we hear so little about it>

I'd love more hours in the day to push more of them upstream, as
I bet would other vendors' kernel maintainers.

Should anyone want to drink from the firehose that is 'redhat kernel bugzilla',
let me know, and I'll see if I can't get a fedora-kernel-bugs mailing
list or the like set up.

Some subsystem maintainers (ACPI for example) really help out here,
and add [email protected] to all the Fedora ACPI bugs.
(I believe that list actually gets bug reports from other distro bugzillas too)

(There are also a few 'meta-bugs' -- enter FCMETA_ACPI as a bug id
and you get a link to a dependency tree showing all the ACPI bugs
reported. There's a bunch of those for various subsystems, which
makes it a little easier to track, though again, it's time-consuming
just sorting through stuff). Off the top of my head, there's one each
for USB, SCSI, ACPI, ALSA, SATA (all with the FCMETA_ prefix).
Some of them are a bit sparse due to lack of time & effort so far.

Dave

2006-02-14 05:30:03

by Arjan van de Ven

Subject: Re: Linux 2.6.16-rc3


> <I have this vaguely unsettled feeling that distros must get more bug
> reports than the upstream developers, yet we hear so little about it>

the number of quality reports (eg enough information to do anything)
isn't that high; Dave is pretty good in sending the good ones on, it
often takes time though to get all the basic info..


2006-02-14 05:31:11

by Arjan van de Ven

Subject: Re: Linux 2.6.16-rc3

On Mon, 2006-02-13 at 20:08 -0700, Michal Jaegermann wrote:
> On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> > On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> > >
> > > I think we can assume that it will be seen there. 2.6.16 is going into
> > > distros and will have more exposure than 2.6.15,
> >
> > 2.6.15 went into distros as well, such as Fedora Core 4 ;)
>
> And promptly broke laptop suspension. See, for example:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998


fedora core 4 never really supported suspend in the first place tho..


2006-02-14 06:23:47

by Michal Jaegermann

Subject: Re: Linux 2.6.16-rc3

On Mon, Feb 13, 2006 at 07:28:38PM -0800, Andrew Morton wrote:
> Michal Jaegermann <[email protected]> wrote:
> >
> > On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> > > On Mon, 2006-02-13 at 00:12 -0800, Andrew Morton wrote:
> > > >
> > > > I think we can assume that it will be seen there. 2.6.16 is going into
> > > > distros and will have more exposure than 2.6.15,
> > >
> > > 2.6.15 went into distros as well, such as Fedora Core 4 ;)
> >
> > And promptly broke laptop suspension. See, for example:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998
> >
>
> That's suspend-to-disk, yes?

No. That is an S3 suspend to RAM. On the box in question it has
generally worked for quite a long while, provided
'acpi_sleep=s3_bios' is passed to the kernel; otherwise video cannot
be restored. It is an Acer Travelmate 230 with i845G video.

I have not tried suspend-to-disk on that laptop so far (and at the
moment the damn thing is just plain broken).

Michal

2006-02-14 06:25:52

by Brown, Len

Subject: RE: Linux 2.6.16-rc3

>i have a bug similar to:
>http://bugzilla.kernel.org/show_bug.cgi?id=6038 (marked as blocking by
>Andrew) on my laptop.

6038 is in NEEDINFO.
If no submitter participates, it may never get fixed.

-Len

2006-02-14 06:57:09

by Michal Jaegermann

Subject: Re: Linux 2.6.16-rc3

On Mon, Feb 13, 2006 at 10:30:56PM -0500, Lee Revell wrote:
> On Mon, 2006-02-13 at 20:08 -0700, Michal Jaegermann wrote:
> > On Mon, Feb 13, 2006 at 09:57:48AM +0100, Arjan van de Ven wrote:
> > >
> > > 2.6.15 went into distros as well, such as Fedora Core 4 ;)
> >
> > And promptly broke laptop suspension. See, for example:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180998
>
> It broke suspension on YOUR laptop - the bug report does not give a make
> and model.

Yes, indeed. It is Acer Travelmate 230 and it is using
'acpi_sleep=s3_bios'. The bug report noted though that the
following showed up:

Yenta O2: res at 0x94/0xD4: 00/ca
Yenta O2: enabling read prefetch/write burst
ACPI-0265: *** Error: No installed handler for fixed event [00000002]

which is not something I had seen before, and indeed it is absent on
another laptop with the same kernel. It was also not there on the 230
with earlier kernels.

BTW - this other laptop mentioned above, which happens to be an Acer
Travelmate 740, does suspend/resume with the 2.6.15 kernel, and no
'acpi_sleep=s3_bios' is needed, but shortly after such a cycle both
an external mouse and the touchpad go crazy and the mouse pointer
refuses to move in X from the left screen edge. Not very useful, and
so far I have not found a way to reset the rodents. There are no
problems of that sort before I try to suspend. Always something
"interesting". It is actually possible that in this case this is a
problem with the "ATI Radeon Mobility M6" video driver, which gets
upset by suspend (or some other pieces driving the display), but I do
not really know.

Michal

2006-02-14 15:19:08

by Gerald Britton

Subject: [PATCH] x86: fix oprofile kernel callgraph regression

Fix x86 oprofile regression introduced by:
commit c34d1b4d165c67b966bca4aba026443d7ff161eb
[PATCH] mm: kill check_user_page_readable

That commit reorganized the tests for userspace stack walking, moving all
of them into dump_backtrace(). However, dump_backtrace() was used for
both userspace and kernel stack walking. The result is typically no
recorded callgraph information for kernel samples.

Revive the original function as dump_kernel_backtrace() and rename the
other to dump_user_backtrace() to avoid future confusion.

Signed-off-by: Gerald Britton <[email protected]>
---
backtrace.c | 19 ++++++++++++++++---
1 files changed, 16 insertions(+), 3 deletions(-)
--- a/arch/i386/oprofile/backtrace.c 2006-02-13 19:27:40.000000000 -0500
+++ b/arch/i386/oprofile/backtrace.c 2006-02-13 19:30:32.000000000 -0500
@@ -20,7 +20,20 @@ struct frame_head {
 } __attribute__((packed));

 static struct frame_head *
-dump_backtrace(struct frame_head * head)
+dump_kernel_backtrace(struct frame_head * head)
+{
+	oprofile_add_trace(head->ret);
+
+	/* frame pointers should strictly progress back up the stack
+	 * (towards higher addresses) */
+	if (head >= head->ebp)
+		return NULL;
+
+	return head->ebp;
+}
+
+static struct frame_head *
+dump_user_backtrace(struct frame_head * head)
 {
 	struct frame_head bufhead[2];

@@ -105,10 +118,10 @@ x86_backtrace(struct pt_regs * const reg

 	if (!user_mode_vm(regs)) {
 		while (depth-- && valid_kernel_stack(head, regs))
-			head = dump_backtrace(head);
+			head = dump_kernel_backtrace(head);
 		return;
 	}

 	while (depth-- && head)
-		head = dump_backtrace(head);
+		head = dump_user_backtrace(head);
 }
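
The kernel-side walk restored by this patch can be tried outside the kernel with a small sketch. The code below is only an illustration under stated assumptions: `step_frame()` and `walk_frames()` are hypothetical names, the trace is written into a caller-supplied array instead of going through oprofile_add_trace(), and the valid_kernel_stack() bounds check is left out, so it should only be run on a hand-built chain of frames.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout idea as the patch: saved frame pointer, then return address. */
struct frame_head {
	struct frame_head *ebp;	/* caller's frame */
	unsigned long ret;	/* return address into the caller */
};

/* Record one frame and step to its caller; NULL ends the walk. */
static struct frame_head *step_frame(struct frame_head *head,
				     unsigned long *trace, size_t *n)
{
	trace[(*n)++] = head->ret;

	/* Frame pointers must strictly progress towards higher addresses;
	 * anything else means a corrupt or finished chain. */
	if (head >= head->ebp)
		return NULL;

	return head->ebp;
}

/* Walk at most depth frames; trace must have room for depth entries. */
static size_t walk_frames(struct frame_head *head, unsigned int depth,
			  unsigned long *trace)
{
	size_t n = 0;

	assert(trace != NULL);
	while (depth-- && head)
		head = step_frame(head, trace, &n);
	return n;
}
```

A chain whose last frame points back to a lower address stops the walk after that frame is recorded, which mirrors the "strictly progress" comment in the patch.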

2006-02-14 15:57:39

by Hugh Dickins

Subject: Re: [PATCH] x86: fix oprofile kernel callgraph regression

On Tue, 14 Feb 2006, Gerald Britton wrote:

> Fix x86 oprofile regression introduced by:
> commit c34d1b4d165c67b966bca4aba026443d7ff161eb
> [PATCH] mm: kill check_user_page_readable
>
> That commit reorganized tests for the userspace stack walking moving all
> those tests into dump_backtrace(), however, dump_backtrace() was used for
> both userspace and kernel stack walking. The result is typically no
> recorded callgraph information for kernel samples.
>
> Revive the original function as dump_kernel_backtrace() and rename the
> other to dump_user_backtrace() to avoid future confusion.
>
> Signed-off-by: Gerald Britton <[email protected]>

Apology-from: Hugh Dickins <[email protected]>

2006-02-14 16:35:17

by James Bottomley

Subject: Re: Linux 2.6.16-rc3

On Mon, 2006-02-13 at 12:38 -0800, Greg KH wrote:
> > - Nasty warnings from scsi about kobject-layer things being called from
> > irq context. James has a push-it-to-process-context patch which sadly
> > assumes kmalloc() is immortal, but no other fix seems to have offered
> > itself.
>
> This has been the case for a long time. I don't really think there is a
> rush to get this fixed, but I really like James's proposed patch. It's
> up to him if he feels it is ready for 2.6.16 or not.

Well, I can't solve the problem that it requires memory allocation from
IRQ context to operate. Based on that, it's an unsafe interface. I'm
going to put it inside SCSI for 2.6.16, since it's better than what we
have now, but I don't think we can export it globally.

James


2006-02-14 21:17:58

by Sanjoy Mahajan

Subject: Re: Linux 2.6.16-rc3

> I don't think anybody claimed this isn't a regression for the 600X.

I narrowed it further. The short story is that this commit (diff below
sig) makes the second S3 sleep go into the endless loop, if the loaded
modules are exactly thermal, processor, intel_agp, and agpgart:

53f11d4ff8797bcceaf014e62bd39f16ce84baec is first bad commit
diff-tree 53f11d4ff8797bcceaf014e62bd39f16ce84baec (from 02b28a33aae93a3b53068e0858d62f8bcaef60a3)
Author: Len Brown <[email protected]>
Date: Mon Dec 5 16:46:36 2005 -0500

[ACPI] Enable Embedded Controller (EC) interrupt mode by default

"ec_intr=0" reverts to polling
"ec_burst=" no longer exists.

Signed-off-by: Len Brown <[email protected]>
Acked-by: Luming Yu <[email protected]>

:040000 040000 9eec66712c68ebe372b2fb2c8d78bdc99df942ab e7e62cd09983730aee468edd4ba1cce50786b7e5 M Documentation
:040000 040000 6e7db46918f6124f64a11f6757560078a8a27519 aa8abb1023024902300cb2e7a5bf74acd8c579e8 M drivers

If I boot with ec_intr=0, the second sleep works fine.

Here is the full story. First I tried a system with the minimal set of
modules to boot and run X (S3 sleep-wake wrecks the VGA consoles, but X
restores fine with 'chvt 1; sleep 0.5 ; chvt 7' on wakeup). So I
stopped every service and unloaded all modules possible, which left only
intel_agp and agpgart. Then, for each of the usual loaded modules
(except for sound modules, which often have sleep-wake problems anyway),
I tried:

1. Load the module
2. S3 sleep-wake-sleep-wake
3. Unload the module

to see whether the second sleep went into the infinite loop (visible
across the console). The one culprit I found was 'thermal' (which
brings in 'processor'). Other modules didn't trigger the problem.

Then I recompiled using a minimal config, with only networking (for X to
work) and the acpi modules, and maybe a few others that I couldn't
avoid, and retested to make sure 'thermal' still triggered the problem,
which it did. I used this config, or one just like it for 2.6.15, as a
base for all other kernels in the bisection search, using the nearest
ancestor's .config and then
yes '' | make oldconfig
to make the new .config

Eventually bisection converged to the commit above, and then I retested
that kernel with 'ec_intr=0'.
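
The search described above can be pictured with a small sketch. This is not how git bisect is implemented (git works on a commit graph, not an array); it is a simplified model that assumes a linear history numbered 0..N, one known-good and one known-bad endpoint, and a test that never flips back to good once it has gone bad. `first_bad()` and `example_is_bad()` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if the given commit index shows the bug. */
typedef int (*is_bad_fn)(int commit);

/* Hypothetical test: pretend the bug was introduced at commit 37. */
static int example_is_bad(int commit)
{
	return commit >= 37;
}

/* Given a good commit and a later bad commit, return the first bad one.
 * If tests is non-NULL it is incremented once per kernel "built and
 * tested", i.e. once per call to is_bad. */
static int first_bad(int good, int bad, is_bad_fn is_bad, int *tests)
{
	assert(good < bad);
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (tests != NULL)
			(*tests)++;
		if (is_bad(mid))
			bad = mid;	/* bug already present: look earlier */
		else
			good = mid;	/* still fine: look later */
	}
	return bad;
}
```

Each step halves the remaining range, so about log2(N) test kernels are needed. The model also shows why the result is only meaningful if every tested kernel gives a consistent good or bad answer.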

Is this a problem with the TP 600X hardware, in which case I'll just use
ec_intr=0 forever, or is there more software debugging (DSDT or related
to the diff)? I can turn on gobs of ACPI debugging and send the useful
parts of the log file.

In the bisection, many kernels worked fine, meaning that the second S3
cycle returned and I could compile the next bisection kernel. In doing
that I noticed another problem, that the fan would not turn on with some
of these good kernels even though the system was hot enough (plenty of
chance for it to turn on since the next compile heated up the CPU). For
example, acpi -t showed

Thermal 1: ok, 46.0 degrees C
Thermal 2: active[0], 42.0 degrees C
Thermal 3: ok, 31.0 degrees C
Thermal 4: ok, 34.0 degrees C

but the fan was off.

So whenever I had a good kernel (meaning that S3 sleep-wake-sleep-wake
returned), I checked whether the fan would turn on, to collect data for
a separate bisection search. However, the data seems inconsistent. If
I feed the data to a fresh git bisect, it complains about one commit
being marked both good and bad. So I'll ask on the git list about that
issue.

-Sanjoy

`A society of sheep must in time beget a government of wolves.'
- Bertrand de Jouvenel


diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 5dffcfe..2ad64ef 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -452,6 +452,11 @@ running once the system is up.

eata= [HW,SCSI]

+ ec_intr= [HW,ACPI] ACPI Embedded Controller interrupt mode
+ Format: <int>
+ 0: polling mode
+ non-0: interrupt mode (default)
+
eda= [HW,PS2]

edb= [HW,PS2]
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index bb3963b..d4366ad 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -73,7 +73,7 @@ static struct acpi_driver acpi_ec_driver
.class = ACPI_EC_CLASS,
.ids = ACPI_EC_HID,
.ops = {
- .add = acpi_ec_poll_add,
+ .add = acpi_ec_intr_add,
.remove = acpi_ec_remove,
.start = acpi_ec_start,
.stop = acpi_ec_stop,
@@ -147,7 +147,7 @@ static union acpi_ec *ec_ecdt;

/* External interfaces use first EC only, so remember */
static struct acpi_device *first_ec;
-static int acpi_ec_poll_mode = EC_POLL;
+static int acpi_ec_poll_mode = EC_INTR;

/* --------------------------------------------------------------------------
Transaction Management
@@ -1594,4 +1594,4 @@ static int __init acpi_ec_set_intr_mode(
return 0;
}

-__setup("ec_burst=", acpi_ec_set_intr_mode);
+__setup("ec_intr=", acpi_ec_set_intr_mode);
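
To make the documented semantics of the new parameter concrete (0 selects polling, any non-zero value selects interrupt mode, and interrupt mode is the default when the parameter is absent), here is a user-space sketch. It does not reproduce acpi_ec_set_intr_mode() or the kernel's __setup() machinery; `enum ec_mode` and `ec_mode_from_cmdline()` are hypothetical names, and the parsing of a space-separated command line is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mode names for this sketch only. */
enum ec_mode { EC_MODE_POLL, EC_MODE_INTR };

/* Scan a space-separated command line for "ec_intr=<int>".
 * Absent parameter: interrupt mode (the default after this commit).
 * ec_intr=0: polling mode. Any non-zero value: interrupt mode. */
static enum ec_mode ec_mode_from_cmdline(const char *cmdline)
{
	static const char key[] = "ec_intr=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;
	enum ec_mode mode = EC_MODE_INTR;

	assert(cmdline != NULL);
	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a word. */
		if (p == cmdline || p[-1] == ' ')
			mode = strtol(p + keylen, NULL, 0) ?
				EC_MODE_INTR : EC_MODE_POLL;
		p += keylen;
	}
	return mode;
}
```

With this model, booting with ec_intr=0 as described above corresponds to the polling branch, which was the behaviour before the commit.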

2006-02-14 23:56:05

by Gerhard Mack

Subject: Re: 2.6.16-rc3: more regressions

On Mon, 13 Feb 2006, Adrian Bunk wrote:

> Date: Mon, 13 Feb 2006 19:34:45 +0100
> From: Adrian Bunk <[email protected]>
> To: Linus Torvalds <[email protected]>
> Cc: Dave Jones <[email protected]>, Andrew Morton <[email protected]>,
> Linux Kernel Mailing List <[email protected]>,
> Mauro Tassinari <[email protected]>, [email protected],
> [email protected]
> Subject: Re: 2.6.16-rc3: more regressions
>
> On Mon, Feb 13, 2006 at 10:16:59AM -0800, Linus Torvalds wrote:
> >
> >
> > On Mon, 13 Feb 2006, Linus Torvalds wrote:
> > >
> > > DaveA, I'll apply this for now. Comments?
> >
> > Btw, the fact that Mauro has the same exact PCI ID (well, lspci stupidly
> > suppresses the ID entirely, but the string seems to match the one that
> > Dave Jones reports) may be unrelated.
>
> Dave's patch removes the entry for the card with the 0x5b60.
>
> According to his bug report, Mauro has a Radeon X300SE that should
> have the 0x5b70 according to pci.ids from pciutils and that doesn't seem
> to be claimed by the DRM driver (and the dmesg from the bug report
> confirms that the radeon DRM driver didn't claim to be responsible for
> this card).
>
> > DaveJ (or Mauro): since you can test this, can you test having that ID
> > there but _without_ the other changes to drm in -rc1?
> >
> > Ie was it the addition of that particular ID, or are the other radeon
> > driver changes (which haven't had as much testing) perhaps the culprit?
> >
> > I realize that without the ID, that card would never have been tested
> > anyway, but the point being that plain 2.6.15 with _just_ that ID added
> > has at least gotten more testing on other (similar) chips. So before I
> > revert that particular ID, it would be nice to know that it was broken
> > even with the previous radeon driver state.
>
> The ID removed by Dave's patch is the only ID listed for an RV370 chip
> (the other RV370's aren't listed in the radeon DRM driver).
>
> I suspect Dave and Mauro are having unrelated problems.
>
> > Linus
>
> cu
> Adrian

The X300 has two pci ids:
0000:05:00.0 0300: 1002:5b60
0000:05:00.1 0380: 1002:5b70

0000:05:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60
[Radeon X300 (PCIE)]
0000:05:00.1 Display controller: ATI Technologies Inc RV370 [Radeon
X300SE]
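
Since the card shows up as two PCI functions with different device IDs, whether a driver binds depends on which of those IDs appears in its ID table. The sketch below only illustrates that lookup: `struct pci_id`, `example_ids` and `table_claims()` are hypothetical, and the table is not the radeon DRM driver's real list. It contains just the 1002:5b60 entry that the thread says was listed, and omits 1002:5b70, which the thread says is not claimed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified vendor/device pair. */
struct pci_id {
	unsigned short vendor;
	unsigned short device;
};

/* Illustrative table, terminated by an all-zero entry. */
static const struct pci_id example_ids[] = {
	{ 0x1002, 0x5b60 },	/* VGA function of the X300 discussed here */
	{ 0, 0 }
};

/* Return 1 if the table lists this vendor:device pair, else 0. */
static int table_claims(const struct pci_id *table,
			unsigned short vendor, unsigned short device)
{
	assert(table != NULL);
	for (; table->vendor != 0; table++)
		if (table->vendor == vendor && table->device == device)
			return 1;
	return 0;
}
```

With this table, the 05:00.0 function (1002:5b60) matches and the 05:00.1 function (1002:5b70) does not.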

Gerhard


--
Gerhard Mack

[email protected]

<>< As a computer I find your faith in technology amusing.

2006-02-15 06:53:04

by Andrew Morton

Subject: Re: Linux 2.6.16-rc3

Thierry Vignaud <[email protected]> wrote:
>
> Takashi Iwai <[email protected]> writes:
>
> > It's not a "regression". PM didn't work with ens1370 at all in the
> > earlier version.
>
> btw, PM support in snd-intel8x0 is broken (at least regarding
> suspending) in 2.6.16-rc2-mm1 on a nforce2 chipset

Can you identify when this breakage occurred?

2006-02-15 10:31:39

by [email protected]

Subject: Re: 2.6.16-rc3: more regressions

>> The ID removed by Dave's patch is the only ID listed for an RV370 chip
>> (the other RV370's aren't listed in the radeon DRM driver).
>>
>> I suspect Dave and Mauro are having unrelated problems.
>>
>> > Linus
>>
>> cu
>> Adrian
>
>The X300 has two pci ids:
>0000:05:00.0 0300: 1002:5b60
>0000:05:00.1 0380: 1002:5b70
>
>0000:05:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60
>[Radeon X300 (PCIE)]
>0000:05:00.1 Display controller: ATI Technologies Inc RV370 [Radeon
>X300SE]
>
> Gerhard

This particular PCI Express card, sold as "ASUS EXTREME AX 300
SE-X/TD/128 MB DDR TV OUT DVI", has the following:

04:00.0 Class 0300: 1002:5b60
04:00.1 Class 0380: 1002:5b70

04:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)] (prog-if 00 [VGA])
        Subsystem: ASUSTeK Computer Inc. Extreme AX300SE-X
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, cache line size 04
        Interrupt: pin A routed to IRQ 0
        Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]
        Region 1: I/O ports at e000 [size=256]
        Region 2: Memory at d7fe0000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at d7fc0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] #10 [0001]
        Capabilities: [80] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000 Data: 0000

04:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
        Subsystem: ASUSTeK Computer Inc.: Unknown device 002b
        Flags: bus master, fast devsel, latency 0
        Memory at d7ff0000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] #10 [0001]

It does indeed work with DRI disabled in xorg.conf, as explained by Dave
Airlie in a previous post.

The only modules loaded when starting Xorg are:

Module                  Size  Used by
appletalk              27696  2
ax25                   45784  2
ipx                    21164  2
radeon                 94112  0

This is of course on 2.6.15, since 2.6.16 hangs completely and no other info
can be gathered.

Mauro

2006-02-15 14:19:21

by Mauro Tassinari

[permalink] [raw]
Subject: Re: 2.6.16-rc3: more regressions

>Module Size Used by
>appletalk 27696 2
>ax25 45784 2
>ipx 21164 2
>radeon 94112 0
>
>this of course on 2.6.15, since 2.6.16 completely hangs, and no other info
>can be gathered.


BTW, after modprobe-ing radeon.ko (with no Xorg started), 2.6.16-rc1 stays up,
and this is the dmesg output:

... snip ...

ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized radeon 1.21.0 20051229 on minor 0

... snip ...

then, as soon as Xorg is started, the system hangs.

Mauro

2006-02-15 16:07:34

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Tue, 14 Feb 2006, James Bottomley wrote:

> On Mon, 2006-02-13 at 12:38 -0800, Greg KH wrote:
> > > - Nasty warnings from scsi about kobject-layer things being called from
> > >   irq context. James has a push-it-to-process-context patch which sadly
> > >   assumes kmalloc() is immortal, but no other fix seems to have offered
> > >   itself.
> >
> > This has been the case for a long time. I don't really think there is a
> > rush to get this fixed, but I really like James's proposed patch. It's
> > up to him if he feels it is ready for 2.6.16 or not.
>
> Well, I can't solve the problem that it requires memory allocation from
> IRQ context to operate. Based on that, it's an unsafe interface. I'm
> going to put it inside SCSI for 2.6.16, since it's better than what we
> have now, but I don't think we can export it globally.

Could we perhaps make this safer and more general?

For instance, add to struct device a "pending puts" counter and a list
header (both protected by a global spinlock), and have a kernel thread
periodically check the list, doing put_device wherever needed. How does
that sound?
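
A minimal userspace sketch of the "pending puts" idea described above, for illustration only. It is not kernel code: `struct model_device`, `model_put_deferred()` and `model_reaper_run()` are hypothetical names, the spinlock is left out because the model is single-threaded, and the kernel thread is reduced to a function that is called explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; not the kernel type. */
struct model_device {
	int refcount;               /* stands in for the kobject refcount */
	unsigned int pending_puts;  /* puts owed from atomic context */
	int on_list;                /* already linked on the pending list? */
	int released;               /* set once the last reference is dropped */
	struct model_device *next;  /* link on the pending list */
};

/* Global pending list; the proposal would protect it with a spinlock. */
static struct model_device *pending_head;

/* Callable where sleeping is not allowed: only records that a put is owed. */
static void model_put_deferred(struct model_device *dev)
{
	dev->pending_puts++;
	if (!dev->on_list) {        /* an object is linked at most once */
		dev->next = pending_head;
		pending_head = dev;
		dev->on_list = 1;
	}
}

/* The real put; in this model only the reaper calls it. */
static void model_put(struct model_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = 1;
}

/* Body of the periodic thread: drain the list and perform the owed puts. */
static void model_reaper_run(void)
{
	while (pending_head) {
		struct model_device *dev = pending_head;
		unsigned int owed = dev->pending_puts;

		/* Unlink and clear the counter before the puts, because the
		 * last put may release the object. */
		pending_head = dev->next;
		dev->on_list = 0;
		dev->pending_puts = 0;
		while (owed--)
			model_put(dev);
	}
}
```

The counter is what lets two deferred puts on the same object share one list entry; every deferred put pays the cost of a trip through the list, whether or not it turns out to be the last one.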

Alan Stern

2006-02-15 16:22:41

by Jesse Allen

[permalink] [raw]
Subject: Re: 2.6.16-rc3: more regressions

On 2/13/06, Felix Kühling <[email protected]> wrote:
> On Monday, 13.02.2006, at 16:27 -0700, Jesse Allen wrote:
> >
> > Well a while back, I hacked in the pci id for my Xpress 200M (5955),
> > which is basically an RV370 with no dedicated vram. I did the same
> > thing and claimed an RV350, which is the closest model. This allowed
> > the radeon module to load. When I startx'ed and DRI was allowed to
> > load on it, it locked up. So I never sent in the patch. I believe
> > the person who sent this one in originally seemed to indicate that it
> > worked, and I believed it if he had an X300 and my problem was having
> > the IGP version. But now having this reported, I'm pretty sure it is
> > the same problem. RV370 doesn't seem to work as an RV350.
>
> The Xpress200 chips have a completely different GART implementation.
> Thus the driver can't even send commands to the command processor and
> that's why X locked up on you when DRI was enabled. This has nothing to
> do with the Xpress200 being (almost) an RV370 or not.
>


Well, I did not know about the GART problem. So does this mean that
RV370s and XPRESS chips will each be listed separately in the driver in the
future? They certainly don't function as an RV350, so of course they
aren't quite compatible.

Jesse

2006-02-15 16:27:18

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, Feb 15, 2006 at 11:07:32AM -0500, Alan Stern wrote:
> On Tue, 14 Feb 2006, James Bottomley wrote:
>
> > On Mon, 2006-02-13 at 12:38 -0800, Greg KH wrote:
> > > > - Nasty warnings from scsi about kobject-layer things being called from
> > > >   irq context. James has a push-it-to-process-context patch which sadly
> > > >   assumes kmalloc() is immortal, but no other fix seems to have offered
> > > >   itself.
> > >
> > > This has been the case for a long time. I don't really think there is a
> > > rush to get this fixed, but I really like James's proposed patch. It's
> > > up to him if he feels it is ready for 2.6.16 or not.
> >
> > Well, I can't solve the problem that it requires memory allocation from
> > IRQ context to operate. Based on that, it's an unsafe interface. I'm
> > going to put it inside SCSI for 2.6.16, since it's better than what we
> > have now, but I don't think we can export it globally.
>
> Could we perhaps make this safer and more general?
>
> For instance, add to struct device a "pending puts" counter and a list
> header (both protected by a global spinlock), and have a kernel thread
> periodically check the list, doing put_device wherever needed. How does
> that sound?

Sounds like a garbage collector :)

Nah, I don't think it's a good idea. James's patch should work just
fine.

thanks,

greg k-h

2006-02-15 16:35:16

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, 2006-02-15 at 08:27 -0800, Greg KH wrote:
>
> Nah, I don't think it's a good idea. James's patch should work just
> fine.

another option is to have a "kill list" which you put the thing on, and
then wake up a thread. only 2 pointers in the object ;(


2006-02-15 17:06:11

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, Feb 15, 2006 at 05:35:08PM +0100, Arjan van de Ven wrote:
> On Wed, 2006-02-15 at 08:27 -0800, Greg KH wrote:
> >
> > Nah, I don't think it's a good idea. James's patch should work just
> > fine.
>
> another option is to have a "kill list" which you put the thing on, and
> then wake up a thread. only 2 pointers in the object ;(

Hm, that's almost what James's patch is trying to do. Care to mock up a
patch that shows this? It might be a simpler solution.

thanks,

greg k-h

2006-02-15 20:01:33

by Andrew Morton

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

Thierry Vignaud <[email protected]> wrote:
>
> Andrew Morton <[email protected]> writes:
>
> > > > It's not a "regression". PM didn't work with ens1370 at all in
> > > > the earlier version.
> > >
> > > btw, PM support in snd-intel8x0 is broken (at least regarding
> > > suspending) in 2.6.16-rc2-mm1 on a nforce2 chipset
> >
> > Can you identify when this breakage occurred?
>
> i'll try to compile a few older kernels (and/or just older
> alsa-kernel) if you want but i'm not sure it's a regression (i'll
> check if it has ever worked before).

OK, thanks.

> i've tried unloading/reloading sound modules after resuming (maybe
> it would work if unloaded before suspending but of course full PM
> support would be nicer).
>
> not sure if it can help but while resuming, the snd-intel8x0 printed
> quite a lot of warnings (due to preempting[1] i guess?) such as:
> BUG: scheduling while atomic: zsh/0x00000001/2196
> <c028b93f> schedule+0x43/0x54e <c028c6bf> schedule_timeout+0x7a/0x95
> <c011c755> process_timeout+0x0/0x5 <d4938e56> snd_intel8x0_chip_init+0x110/0x39e [snd_intel8x0]
> <d4939142> intel8x0_resume+0x5e/0x1ba [snd_intel8x0] <c01b6dee> pci_device_resume+0x16/0x43
> <c02025d9> resume_device+0x7d/0x96 <c02026a7> dpm_resume+0x58/0x80
> <c02026dc> device_resume+0xd/0x16 <c012db1f> pm_suspend_disk+0xbf/0xc8
> <c012cb95> enter_state+0x50/0x16f <c012cd37> state_store+0x83/0x8f
> <c012ccb4> state_store+0x0/0x8f <c0173492> subsys_attr_store+0x1e/0x22
> <c0173a1b> sysfs_write_file+0x92/0xb9 <c0173989> sysfs_write_file+0x0/0xb9
> <c01491ca> vfs_write+0x83/0x122 <c01499df> sys_write+0x3c/0x63
> <c0102973> sysenter_past_esp+0x54/0x75
>
> dmesg after resuming (only look at the beginning; the end is only ehci
> garbage because ehci has been buggy for months (rejecting mass media after
> writing a few MB)):

That's odd. I don't see what could have elevated preempt_count() on that
path. What does `grep PREEMPT .config' say?

2006-02-15 21:35:16

by James Bottomley

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, 2006-02-15 at 11:07 -0500, Alan Stern wrote:
> Could we perhaps make this safer and more general?
>
> For instance, add to struct device a "pending puts" counter and a list
> header (both protected by a global spinlock), and have a kernel thread
> periodically check the list, doing put_device wherever needed. How does
> that sound?

That's what I've been discussing with Jens elsewhere on this list.
However, I think what you're proposing is overly complex. All we really
need is for a way of flagging a kobject (or kref) so the final put will
be in user context. Then we can use storage within the kobject or
device (or something else) for the purpose.

James


2006-02-15 21:52:48

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, 15 Feb 2006, Greg KH wrote:

> On Wed, Feb 15, 2006 at 05:35:08PM +0100, Arjan van de Ven wrote:
> > On Wed, 2006-02-15 at 08:27 -0800, Greg KH wrote:
> > >
> > > Nah, I don't think it's a good idea. James's patch should work just
> > > fine.
> >
> > another option is to have a "kill list" which you put the thing on, and
> > then wake up a thread. only 2 pointers in the object ;(
>
> Hm, that's almost what James's patch is trying to do. Care to mock up a
> patch that shows this? It might be a simpler solution.

It won't work. You might have to do 2 put_device calls on the same
structure. That's why I suggested the "pending puts" counter; something
can't go on a list more than once.

Alan Stern

2006-02-15 22:00:10

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, Feb 15, 2006 at 04:52:43PM -0500, Alan Stern wrote:
> On Wed, 15 Feb 2006, Greg KH wrote:
>
> > On Wed, Feb 15, 2006 at 05:35:08PM +0100, Arjan van de Ven wrote:
> > > On Wed, 2006-02-15 at 08:27 -0800, Greg KH wrote:
> > > >
> > > > Nah, I don't think it's a good idea. James's patch should work just
> > > > fine.
> > >
> > > another option is to have a "kill list" which you put the thing on, and
> > > then wake up a thread. only 2 pointers in the object ;(
> >
> > Hm, that's almost what James's patch is trying to do. Care to mock up a
> > patch that shows this? It might be a simpler solution.
>
> It won't work. You might have to do 2 put_device calls on the same
> structure. That's why I suggested the "pending puts" counter; something
> can't go on a list more than once.

It would only go on the list if the "put" was the last one. Otherwise
it would not make any sense to put it on any list.

thanks,

greg k-h

2006-02-15 22:24:41

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, 15 Feb 2006, James Bottomley wrote:

> On Wed, 2006-02-15 at 11:07 -0500, Alan Stern wrote:
> > Could we perhaps make this safer and more general?
> >
> > For instance, add to struct device a "pending puts" counter and a list
> > header (both protected by a global spinlock), and have a kernel thread
> > periodically check the list, doing put_device wherever needed. How does
> > that sound?
>
> That's what I've been discussing with Jens elsewhere on this list.
> However, I think what you're proposing is overly complex. All we really
> need is for a way of flagging a kobject (or kref) so the final put will
> be in user context. Then we can use storage within the kobject or
> device (or something else) for the purpose.

That's more or less what I was suggesting. The problems are: How do you
know which put is the last? (Answer: you don't. So every put has to be
done in process context.) And how do you flag the data structure and tell
some process thread to do the put? (Answer: by putting the object on a
list, as in my suggestion.)

Alan Stern

2006-02-15 22:25:39

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, 15 Feb 2006, Greg KH wrote:

> On Wed, Feb 15, 2006 at 04:52:43PM -0500, Alan Stern wrote:
> > On Wed, 15 Feb 2006, Greg KH wrote:
> >
> > > On Wed, Feb 15, 2006 at 05:35:08PM +0100, Arjan van de Ven wrote:
> > > > On Wed, 2006-02-15 at 08:27 -0800, Greg KH wrote:
> > > > >
> > > > > Nah, I don't think it's a good idea. James's patch should work just
> > > > > fine.
> > > >
> > > > another option is to have a "kill list" which you put the thing on, and
> > > > then wake up a thread. only 2 pointers in the object ;(
> > >
> > > Hm, that's almost what James's patch is trying to do. Care to mock up a
> > > patch that shows this? It might be a simpler solution.
> >
> > It won't work. You might have to do 2 put_device calls on the same
> > structure. That's why I suggested the "pending puts" counter; something
> > can't go on a list more than once.
>
> It would only go on the list if the "put" was the last one. Otherwise
> it would not make any sense to put it on any list.

There's no way to know whether or not any particular "put" is the last
one. So you have to assume they all are.

Alan Stern

2006-02-15 22:37:25

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, Feb 15, 2006 at 05:25:37PM -0500, Alan Stern wrote:
> On Wed, 15 Feb 2006, Greg KH wrote:
>
> > On Wed, Feb 15, 2006 at 04:52:43PM -0500, Alan Stern wrote:
> > > On Wed, 15 Feb 2006, Greg KH wrote:
> > >
> > > > On Wed, Feb 15, 2006 at 05:35:08PM +0100, Arjan van de Ven wrote:
> > > > > On Wed, 2006-02-15 at 08:27 -0800, Greg KH wrote:
> > > > > >
> > > > > > Nah, I don't think it's a good idea. James's patch should work just
> > > > > > fine.
> > > > >
> > > > > another option is to have a "kill list" which you put the thing on, and
> > > > > then wake up a thread. only 2 pointers in the object ;(
> > > >
> > > > Hm, that's almost what James's patch is trying to do. Care to mock up a
> > > > patch that shows this? It might be a simpler solution.
> > >
> > > It won't work. You might have to do 2 put_device calls on the same
> > > structure. That's why I suggested the "pending puts" counter; something
> > > can't go on a list more than once.
> >
> > It would only go on the list if the "put" was the last one. Otherwise
> > it would not make any sense to put it on any list.
>
> There's no way to know whether or not any particular "put" is the last
> one. So you have to assume they all are.

The underlying kobject can "know" that the put was the last one, and
handle it differently if needed. Yes, it would not use a kref anymore,
but that might be needed here.

thanks,

greg k-h

2006-02-15 22:52:04

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Wed, 15 Feb 2006, Greg KH wrote:

> > > It would only go on the list if the "put" was the last one. Otherwise
> > > it would not make any sense to put it on any list.
> >
> > There's no way to know whether or not any particular "put" is the last
> > one. So you have to assume they all are.
>
> The underlying kobject can "know" that the put was the last one, and
> handle it differently if needed. Yes, it would not use a kref anymore,
> but that might be needed here.

You would need a kref variant, something which would include room for the
list header and a pointer to the release routine. Okay, that does involve
less time overhead than what I proposed (although the same amount of space
overhead).
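
A minimal userspace sketch of the kref variant discussed above, for illustration only. All names (`struct deferred_ref`, `deferred_ref_put()`, `kill_list_drain()`) are hypothetical, locking and atomics are omitted because the model is single-threaded, and the worker thread is reduced to an explicit drain call. The point it shows is that only the final put queues the object, using a list link and a release pointer stored inside the object itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kref variant: refcount plus room for deferral. */
struct deferred_ref {
	int refcount;
	struct deferred_ref *next;                  /* embedded list link */
	void (*release)(struct deferred_ref *ref);  /* stored release routine */
};

/* Objects whose last reference is gone but which are not yet released. */
static struct deferred_ref *kill_list;

static void deferred_ref_init(struct deferred_ref *ref,
			      void (*release)(struct deferred_ref *ref))
{
	ref->refcount = 1;
	ref->next = NULL;
	ref->release = release;
}

static void deferred_ref_get(struct deferred_ref *ref)
{
	ref->refcount++;
}

/*
 * Drop one reference. No allocation and no sleeping, so the model is
 * usable from atomic context. Returns 1 if this was the final put and
 * the object was queued for release, otherwise 0.
 */
static int deferred_ref_put(struct deferred_ref *ref)
{
	assert(ref->refcount > 0);
	if (--ref->refcount != 0)
		return 0;
	ref->next = kill_list;      /* reached at most once per object */
	kill_list = ref;
	return 1;
}

/* What the woken thread would do in process context. */
static void kill_list_drain(void)
{
	while (kill_list) {
		struct deferred_ref *ref = kill_list;

		kill_list = ref->next;
		ref->release(ref);
	}
}

/* Example release routine that only counts its invocations. */
static int released_count;

static void count_release(struct deferred_ref *ref)
{
	(void)ref;
	released_count++;
}
```

Because the refcount can reach zero only once, the object can never be queued twice, which is why no per-object counter is needed here.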

Alan Stern

2006-02-16 16:15:40

by James Bottomley

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Tue, 2006-02-14 at 10:34 -0600, James Bottomley wrote:
> Well, I can't solve the problem that it requires memory allocation from
> IRQ context to operate. Based on that, it's an unsafe interface. I'm
> going to put it inside SCSI for 2.6.16, since it's better than what we
> have now, but I don't think we can export it globally.

OK, this is what I'm proposing as the device model fix. What it does is
thread context-checking APIs throughout the device subsystem. SCSI can
then use it simply via put_device_process_context(). Since we have to
supply the kref_work, I'd plan to do that as an additional element in
struct scsi_device.

This, by itself, won't solve the SCSI target problem, but I plan to fix
that via a device model addition which would have target alloc waiting
around for any deleted targets to disappear.

Since this is planned for post 2.6.16, we have plenty of time to argue
about it.

James

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 6b355bd..4ae42de 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -338,10 +338,10 @@ struct device * get_device(struct device
* put_device - decrement reference count.
* @dev: device in question.
*/
-void put_device(struct device * dev)
+void put_device_process_context(struct device * dev, struct kref_work *work)
{
if (dev)
- kobject_put(&dev->kobj);
+ kobject_put_process_context(&dev->kobj, work);
}


@@ -445,7 +445,7 @@ EXPORT_SYMBOL_GPL(device_register);
EXPORT_SYMBOL_GPL(device_del);
EXPORT_SYMBOL_GPL(device_unregister);
EXPORT_SYMBOL_GPL(get_device);
-EXPORT_SYMBOL_GPL(put_device);
+EXPORT_SYMBOL_GPL(put_device_process_context);

EXPORT_SYMBOL_GPL(device_create_file);
EXPORT_SYMBOL_GPL(device_remove_file);
diff --git a/include/linux/device.h b/include/linux/device.h
index 58df18d..ac9d457 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -396,7 +396,13 @@ extern int (*platform_notify_remove)(str
*
*/
extern struct device * get_device(struct device * dev);
-extern void put_device(struct device * dev);
+extern void put_device_process_context(struct device * dev,
+ struct kref_work *work);
+static inline void put_device(struct device *dev)
+{
+ put_device_process_context(dev, NULL);
+}
+


/* drivers/base/power.c */
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 2a8d8da..d079fea 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -76,7 +76,12 @@ extern int kobject_register(struct kobje
extern void kobject_unregister(struct kobject *);

extern struct kobject * kobject_get(struct kobject *);
-extern void kobject_put(struct kobject *);
+extern void kobject_put_process_context(struct kobject *, struct kref_work *);
+
+static inline void kobject_put(struct kobject *kobj)
+{
+ kobject_put_process_context(kobj, NULL);
+}

extern char * kobject_get_path(struct kobject *, gfp_t);

diff --git a/include/linux/kref.h b/include/linux/kref.h
index 6fee353..16b15db 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -18,15 +18,29 @@
#ifdef __KERNEL__

#include <linux/types.h>
+#include <linux/workqueue.h>
#include <asm/atomic.h>

struct kref {
atomic_t refcount;
};

+struct kref_work {
+ struct work_struct work;
+ struct kref *kref;
+ void (*release)(struct kref *kref);
+};
+
void kref_init(struct kref *kref);
void kref_get(struct kref *kref);
-int kref_put(struct kref *kref, void (*release) (struct kref *kref));
+int kref_put_process_context(struct kref *kref,
+ void (*release) (struct kref *kref),
+ struct kref_work *work);
+static inline int kref_put(struct kref *kref,
+ void (*release) (struct kref *kref))
+{
+ return kref_put_process_context(kref, release, NULL);
+}

#endif /* __KERNEL__ */
#endif /* _KREF_H_ */
diff --git a/lib/kobject.c b/lib/kobject.c
index efe67fa..6b80c54 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -372,10 +372,10 @@ static void kobject_release(struct kref
*
* Decrement the refcount, and if 0, call kobject_cleanup().
*/
-void kobject_put(struct kobject * kobj)
+void kobject_put_process_context(struct kobject * kobj, struct kref_work *work)
{
if (kobj)
- kref_put(&kobj->kref, kobject_release);
+ kref_put_process_context(&kobj->kref, kobject_release, work);
}


@@ -537,7 +537,7 @@ EXPORT_SYMBOL(kobject_init);
EXPORT_SYMBOL(kobject_register);
EXPORT_SYMBOL(kobject_unregister);
EXPORT_SYMBOL(kobject_get);
-EXPORT_SYMBOL(kobject_put);
+EXPORT_SYMBOL(kobject_put_process_context);
EXPORT_SYMBOL(kobject_add);
EXPORT_SYMBOL(kobject_del);

diff --git a/lib/kref.c b/lib/kref.c
index 0d07cc3..66231cf 100644
--- a/lib/kref.c
+++ b/lib/kref.c
@@ -13,6 +13,7 @@

#include <linux/kref.h>
#include <linux/module.h>
+#include <linux/hardirq.h>

/**
* kref_init - initialize object.
@@ -33,27 +34,47 @@ void kref_get(struct kref *kref)
atomic_inc(&kref->refcount);
}

+static void kref_release_process_context(void *data)
+{
+ struct kref_work *work = data;
+
+ work->release(work->kref);
+}
+
/**
- * kref_put - decrement refcount for object.
+ * kref_put_process_context - decrement refcount for object and put in process context
* @kref: object.
* @release: pointer to the function that will clean up the object when the
* last reference to the object is released.
* This pointer is required, and it is not acceptable to pass kfree
* in as this function.
+ * @work: pointer to a kref_work used to take the release through user
+ * context (may be null)
*
- * Decrement the refcount, and if 0, call release().
+ * Decrement the refcount, and if 0, call release(). If work is not null
+ * execute release via schedule_work if not in process context.
* Return 1 if the object was removed, otherwise return 0. Beware, if this
* function returns 0, you still can not count on the kref from remaining in
* memory. Only use the return value if you want to see if the kref is now
* gone, not present.
*/
-int kref_put(struct kref *kref, void (*release)(struct kref *kref))
+int kref_put_process_context(struct kref *kref,
+ void (*release)(struct kref *kref),
+ struct kref_work *work)
{
WARN_ON(release == NULL);
WARN_ON(release == (void (*)(struct kref *))kfree);

if (atomic_dec_and_test(&kref->refcount)) {
- release(kref);
+ if (!work || !in_interrupt())
+ release(kref);
+ else {
+ INIT_WORK(&work->work, kref_release_process_context,
+ work);
+ schedule_work(&work->work);
+ }
+
+
return 1;
}
return 0;
@@ -61,4 +82,4 @@ int kref_put(struct kref *kref, void (*r

EXPORT_SYMBOL(kref_init);
EXPORT_SYMBOL(kref_get);
-EXPORT_SYMBOL(kref_put);
+EXPORT_SYMBOL(kref_put_process_context);
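
A minimal userspace model of the control flow in the lib/kref.c hunk, for illustration only. The names are hypothetical, `model_in_interrupt` stands in for in_interrupt(), and a plain singly linked list stands in for the workqueue. One assumption is made explicit: the model stores the kref and the release routine in the work item before queueing it, since the deferred handler reads both fields and the hunk as posted does not show them being assigned.

```c
#include <assert.h>
#include <stddef.h>

struct model_kref {
	int refcount;
};

/* Caller-supplied storage for a deferred release, mirroring kref_work. */
struct model_kref_work {
	struct model_kref_work *next;               /* stands in for work_struct */
	struct model_kref *kref;
	void (*release)(struct model_kref *kref);
};

static int model_in_interrupt;                  /* stands in for in_interrupt() */
static struct model_kref_work *model_queue;     /* stands in for the workqueue */

/* Returns 1 if the refcount reached zero (release run or queued), else 0. */
static int model_kref_put(struct model_kref *kref,
			  void (*release)(struct model_kref *kref),
			  struct model_kref_work *work)
{
	if (--kref->refcount != 0)
		return 0;

	if (!work || !model_in_interrupt) {
		release(kref);                  /* safe to release directly */
	} else {
		/* Assumption: record what to release before queueing. */
		work->kref = kref;
		work->release = release;
		work->next = model_queue;
		model_queue = work;
	}
	return 1;
}

/* What the worker thread would do later, in process context. */
static void model_run_queue(void)
{
	while (model_queue) {
		struct model_kref_work *work = model_queue;

		model_queue = work->next;
		work->release(work->kref);
	}
}

/* Example release routine that only counts its invocations. */
static int model_release_calls;

static void model_count_release(struct model_kref *kref)
{
	(void)kref;
	model_release_calls++;
}
```

As in the patch, the return value is 1 as soon as the count hits zero, even when the release itself has only been queued.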


2006-02-16 17:12:53

by Russell King

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Wed, Feb 15, 2006 at 08:56:00PM -0500, James Bottomley wrote:
> On Tue, 2006-02-14 at 10:34 -0600, James Bottomley wrote:
> > Well, I can't solve the problem that it requires memory allocation from
> > IRQ context to operate. Based on that, it's an unsafe interface. I'm
> > going to put it inside SCSI for 2.6.16, since it's better than what we
> > have now, but I don't think we can export it globally.
>
> OK, this is what I'm proposing as the device model fix. What it does is
> thread context checking APIs throughout the device subsystem. SCSI can
> then use it simply via device_put_process_context(). Since we have to
> supply the kref_work; I'd plan to do that as an additional element in
> struct scsi_device.
>
> This, by itself, won't solve the SCSI target problem, but I plan to fix
> that via a device model addition which would have target alloc waiting
> around for any deleted targets to disappear.
>
> Since this is planned for post 2.6.16, we have plenty of time to argue
> about it.

This is probably an idiotic question, but if there's something in the
scsi release handler that can't be called in non-process context, why can't
scsi queue up the release processing via the work API itself, rather
than having to have this additional code and complexity for everyone?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-02-16 17:38:27

by Stefan Richter

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

Russell King wrote:
> On Wed, Feb 15, 2006 at 08:56:00PM -0500, James Bottomley wrote:
[...]
>>OK, this is what I'm proposing as the device model fix. What it does is
>>thread context checking APIs throughout the device subsystem. SCSI can
>>then use it simply via device_put_process_context().
[...]
>>Since this is planned for post 2.6.16, we have plenty of time to argue
>>about it.
>
> This is probably an idiotic question, but if there's something in the
> scsi release handler can't be called in non-process context, why can't
> scsi queue up the release processing via the work API itself, rather
> than having to have this additional code and complexity for everyone?

Moreover, why are SCSI release handlers called in non-process context in
the first place? IMO the fix should be to make sure that SCSI release
handlers are always called from process context --- by the respective
layers which manage physical devices, i.e. one or more layers beneath
SCSI core.
--
Stefan Richter
-=====-=-==- --=- =----
http://arcgraph.de/sr/

2006-02-16 17:58:12

by James Bottomley

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, 2006-02-16 at 17:12 +0000, Russell King wrote:
> This is probably an idiotic question, but if there's something in the
> scsi release handler can't be called in non-process context, why can't
> scsi queue up the release processing via the work API itself, rather
> than having to have this additional code and complexity for everyone?

It's because, in order to get a guaranteed single allocation for the
workqueue to execute in user context, I need to know when the release
will be called. The only way to do that is to add the
execute-in-process-context step directly to kref_put.

James


2006-02-16 18:10:20

by Russell King

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, Feb 16, 2006 at 09:57:32AM -0800, James Bottomley wrote:
> On Thu, 2006-02-16 at 17:12 +0000, Russell King wrote:
> > This is probably an idiotic question, but if there's something in the
> > scsi release handler can't be called in non-process context, why can't
> > scsi queue up the release processing via the work API itself, rather
> > than having to have this additional code and complexity for everyone?
>
> It's because, in order to get a guaranteed single allocation for the
> workqueue to execute in user context, I need to know when the release
> will be called. The only way to do that is to add the execute in
> process context directly to kref_put.

Is there something in the driver model which would prevent something
like this?

static void scsi_release_process(void *p)
{
	struct my_work *work = p;
	struct device *dev = work->dev;

	/* destroy dev */

	kfree(work);
}

static void scsi_release(struct device *dev)
{
	struct my_work *work;

	work = kmalloc(sizeof(struct my_work), GFP_ATOMIC);
	if (work) {
		INIT_WORK(&work->work, scsi_release_process, work);
		work->dev = dev;
		schedule_work(&work->work);
	} else {
		printk(KERN_ERR ...);
	}
}

where scsi_release() is the function called by the device model on the
last put of a scsi device.

I guess this is more or less what you're trying to do invasively via the
driver model.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-02-16 18:14:59

by James Bottomley

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, 2006-02-16 at 18:09 +0000, Russell King wrote:
> where scsi_release() is the function called by the device model on the
> last put of a scsi device.
>
> I guess is more or less what you're trying to do invasively via the
> driver model.

Yes ... except I think more than just SCSI has the problem (and we
actually have it in more than one release function) so it seems like a
good candidate for a general abstraction.

James


2006-02-16 18:18:32

by Russell King

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, Feb 16, 2006 at 10:14:31AM -0800, James Bottomley wrote:
> On Thu, 2006-02-16 at 18:09 +0000, Russell King wrote:
> > where scsi_release() is the function called by the device model on the
> > last put of a scsi device.
> >
> > I guess is more or less what you're trying to do invasively via the
> > driver model.
>
> Yes ... except I think more than just SCSI has the problem (and we
> actually have it in more than one release function) so it seems like a
> good candidate for a general abstraction.

Maybe implementing it as a helper function would be the best and
simplest solution?

static void scsi_release(struct device *dev)
{
	schedule_release_process(dev, scsi_release_process);
}

where schedule_release_process() contains more or less what I posted
in the previous mailing.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-02-16 19:10:04

by James Bottomley

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, 2006-02-16 at 18:18 +0000, Russell King wrote:
> Maybe implementing it as a helper function would be the best and
> simplest solution?
>
> static void scsi_release(struct device *dev)
> {
> schedule_release_process(dev, scsi_release_process);
> }
>
> where schedule_release_process() contains more or less what I posted
> in the previous mailing.

That's almost exactly the execute_in_process_context() API that began
this discussion (and which Andi NAK'd). However, it could possibly be
resurrected with the proviso that the caller has to feed in the
workqueue memory. How would people feel about that?

James


2006-02-16 20:04:22

by Jens Axboe

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, Feb 16 2006, James Bottomley wrote:
> On Thu, 2006-02-16 at 18:18 +0000, Russell King wrote:
> > Maybe implementing it as a helper function would be the best and
> > simplest solution?
> >
> > static void scsi_release(struct device *dev)
> > {
> > schedule_release_process(dev, scsi_release_process);
> > }
> >
> > where schedule_release_process() contains more or less what I posted
> > in the previous mailing.
>
> That's almost exactly the execute_in_process_context() API that began
> this discussion (and which Andi NAK'd). However, it could possibly be
> resurrected with the proviso that the caller has to feed in the
> workqueue memory. How would people feel about that?

That's what I suggested in the first place as well. I still think it's a
good idea, fwiw :)

--
Jens Axboe

2006-02-16 22:00:17

by Dave Airlie

[permalink] [raw]
Subject: Re: 2.6.16-rc3: more regressions

>
> Well, I did not know about the GART problem. So this means that
> RV370s and XPRESS will be listed both separately in the driver in the
> future? They certainly don't function as an RV350 and of course they
> aren't quite compatible then.

The RV350 and RV370 are more or less programmatically the same; I'm not
sure there is any difference. The XPRESS chipsets, although based on the
RV370, have a whole different memory controller architecture by virtue
of being shared memory.

Dave.

2006-02-18 00:44:12

by James Bottomley

[permalink] [raw]
Subject: Re: Linux 2.6.16-rc3

On Thu, 2006-02-16 at 21:01 +0100, Jens Axboe wrote:
> That's what I suggested in the first place as well. I still think it's a
> good idea, fwiw :)

OK smarty pants ... some of us are a bit slower on the uptake. How
about this then. It won't solve the target problem, but it will solve
the device put.

James

[PATCH] add execute_in_process_context() API

We have several points in the SCSI stack (primarily for our device
functions) where we need to guarantee process context, but (given the
place where the last reference was released) we cannot guarantee this.

This API gets around the issue by executing the function directly if
the caller has process context, but scheduling a workqueue to execute
in process context if the caller doesn't have it.

Signed-off-by: James Bottomley <[email protected]>

Index: BUILD-2.6/include/linux/workqueue.h
===================================================================
--- BUILD-2.6.orig/include/linux/workqueue.h 2006-02-17 13:02:00.000000000 -0600
+++ BUILD-2.6/include/linux/workqueue.h 2006-02-17 17:57:52.000000000 -0600
@@ -20,6 +20,12 @@
struct timer_list timer;
};

+struct execute_work {
+ struct work_struct work;
+ void (*fn)(void *);
+ void *data;
+};
+
#define __WORK_INITIALIZER(n, f, d) { \
.entry = { &(n).entry, &(n).entry }, \
.func = (f), \
@@ -74,6 +80,8 @@
void cancel_rearming_delayed_work(struct work_struct *work);
void cancel_rearming_delayed_workqueue(struct workqueue_struct *,
struct work_struct *);
+int execute_in_process_context(void (*fn)(void *), void *,
+ struct execute_work *);

/*
* Kill off a pending schedule_delayed_work(). Note that the work callback
Index: BUILD-2.6/kernel/workqueue.c
===================================================================
--- BUILD-2.6.orig/kernel/workqueue.c 2006-02-17 13:02:01.000000000 -0600
+++ BUILD-2.6/kernel/workqueue.c 2006-02-17 18:00:15.000000000 -0600
@@ -27,6 +27,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/kthread.h>
+#include <linux/hardirq.h>

/*
* The per-CPU workqueue (if single thread, we always use the first
@@ -476,6 +477,45 @@
}
EXPORT_SYMBOL(cancel_rearming_delayed_work);

+static void execute_in_process_context_work(void *data)
+{
+ void (*fn)(void *data);
+ struct execute_work *ew = data;
+
+ fn = ew->fn;
+ data = ew->data;
+
+ fn(data);
+}
+
+/**
+ * execute_in_process_context - reliably execute the routine with user context
+ * @fn: the function to execute
+ * @data: data to pass to the function
+ *
+ * Executes the function immediately if process context is available,
+ * otherwise schedules the function for delayed execution.
+ *
+ * Returns: 0 - function was executed
+ * 1 - function was scheduled for execution
+ */
+int execute_in_process_context(void (*fn)(void *data), void *data,
+ struct execute_work *ew)
+{
+ if (!in_interrupt()) {
+ fn(data);
+ return 0;
+ }
+
+ INIT_WORK(&ew->work, execute_in_process_context_work, ew);
+ ew->fn = fn;
+ ew->data = data;
+ schedule_work(&ew->work);
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(execute_in_process_context);
+
int keventd_up(void)
{
return keventd_wq != NULL;
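
For readers who want to poke at the run-now-or-defer rule outside the
kernel, here is a minimal userspace sketch of this first version of the
patch. Everything that stands in for kernel machinery is an assumption
made for illustration: fake_in_interrupt replaces in_interrupt(),
init_work() and schedule_work() are toy versions backed by a simple
linked list, and run_pending_work() plays the part of the keventd worker
thread. bump() is a hypothetical callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel facilities (not the real definitions). */
struct work_struct {
	void (*func)(void *);
	void *data;
	struct work_struct *next;
};

struct execute_work {
	struct work_struct work;
	void (*fn)(void *);
	void *data;
};

static bool fake_in_interrupt;            /* models in_interrupt() */
static struct work_struct *pending_head;  /* models the shared work queue */
static struct work_struct **pending_tail = &pending_head;

static void init_work(struct work_struct *w, void (*func)(void *), void *data)
{
	w->func = func;
	w->data = data;
	w->next = NULL;
}

static void schedule_work(struct work_struct *w)
{
	*pending_tail = w;
	pending_tail = &w->next;
}

/* Models the worker thread draining the queue in process context. */
static int run_pending_work(void)
{
	int ran = 0;
	bool saved = fake_in_interrupt;

	fake_in_interrupt = false;
	while (pending_head) {
		struct work_struct *w = pending_head;

		/* Unlink before calling, so the callback may requeue. */
		pending_head = w->next;
		if (!pending_head)
			pending_tail = &pending_head;
		w->func(w->data);
		ran++;
	}
	fake_in_interrupt = saved;
	return ran;
}

/* The thunk from the patch: unpack fn/data and call through. */
static void execute_in_process_context_work(void *data)
{
	struct execute_work *ew = data;

	ew->fn(ew->data);
}

static int execute_in_process_context(void (*fn)(void *), void *data,
				      struct execute_work *ew)
{
	assert(fn != NULL && ew != NULL);

	if (!fake_in_interrupt) {
		fn(data);
		return 0;	/* executed immediately */
	}

	init_work(&ew->work, execute_in_process_context_work, ew);
	ew->fn = fn;
	ew->data = data;
	schedule_work(&ew->work);
	return 1;		/* deferred to the worker */
}

/* Hypothetical callback: counts how many times it ran. */
static void bump(void *data)
{
	++*(int *)data;
}
```

With fake_in_interrupt false, execute_in_process_context(bump, &hits, &ew)
runs bump() at once and returns 0. With it true, the call returns 1 and
bump() runs only when run_pending_work() drains the list.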


2006-02-18 01:02:41

by Greg KH

Subject: Re: Linux 2.6.16-rc3

On Fri, Feb 17, 2006 at 04:42:43PM -0800, James Bottomley wrote:
> On Thu, 2006-02-16 at 21:01 +0100, Jens Axboe wrote:
> > That's what I suggested in the first place as well. I still think it's a
> > good idea, fwiw :)
>
> OK smarty pants ... some of us are a bit slower on the uptake. How
> about this then. It won't solve the target problem, but it will solve
> the device put.
>
> James
>
> [PATCH] add execute_in_process_context() API

I like it, nice job.

Acked-by: Greg Kroah-Hartman <[email protected]>

thanks,

greg k-h

2006-02-18 02:12:22

by Roland Dreier

Subject: Re: Linux 2.6.16-rc3

> +/**
> + * execute_in_process_context - reliably execute the routine with user context
> + * @fn: the function to execute
> + * @data: data to pass to the function
> + *
> + * Executes the function immediately if process context is available,
> + * otherwise schedules the function for delayed execution.
> + *
> + * Returns: 0 - function was executed
> + * 1 - function was scheduled for execution
> + */
> +int execute_in_process_context(void (*fn)(void *data), void *data,
> + struct execute_work *ew)
> +{
> + if (!in_interrupt()) {
> + fn(data);
> + return 0;
> + }

Is testing in_interrupt() really sufficient to make this work? I seem
to remember that (at least) with CONFIG_PREEMPT disabled, there are
contexts where it is not safe to sleep but where both in_interrupt()
and in_atomic() still return 0.

- R.
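
Roland's concern can be illustrated with a toy model of preempt_count.
This is a sketch under stated assumptions, not kernel source: the bit
layout, struct cpu_model and the model_* helpers are all hypothetical.
The one kernel fact it relies on is that preempt_disable() compiles to
nothing when CONFIG_PREEMPT is off, so taking a spinlock there leaves
the count unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit layout; not copied from any particular kernel version. */
#define MODEL_PREEMPT_OFFSET 0x00000001u
#define MODEL_HARDIRQ_OFFSET 0x00010000u
#define MODEL_PREEMPT_MASK   0x000000ffu
#define MODEL_IRQ_MASK       0x0fffff00u  /* softirq + hardirq bits */

struct cpu_model {
	bool config_preempt;        /* models CONFIG_PREEMPT=y or =n */
	unsigned int preempt_count;
};

static void model_spin_lock(struct cpu_model *cpu)
{
	/* Only a preemptible kernel records the lock in the count. */
	if (cpu->config_preempt)
		cpu->preempt_count += MODEL_PREEMPT_OFFSET;
}

static void model_spin_unlock(struct cpu_model *cpu)
{
	if (cpu->config_preempt) {
		assert((cpu->preempt_count & MODEL_PREEMPT_MASK) != 0);
		cpu->preempt_count -= MODEL_PREEMPT_OFFSET;
	}
}

static void model_irq_enter(struct cpu_model *cpu)
{
	cpu->preempt_count += MODEL_HARDIRQ_OFFSET;
}

static void model_irq_exit(struct cpu_model *cpu)
{
	assert((cpu->preempt_count & MODEL_IRQ_MASK) != 0);
	cpu->preempt_count -= MODEL_HARDIRQ_OFFSET;
}

/* True only while servicing a (soft or hard) interrupt. */
static bool model_in_interrupt(const struct cpu_model *cpu)
{
	return (cpu->preempt_count & MODEL_IRQ_MASK) != 0;
}

/* True whenever anything at all is recorded in the count. */
static bool model_in_atomic(const struct cpu_model *cpu)
{
	return cpu->preempt_count != 0;
}
```

In this model a non-preemptible CPU that holds a spinlock reports false
from both model_in_interrupt() and model_in_atomic(), although sleeping
there would not be safe. That is the gap the question is about: neither
test alone proves that the caller may sleep.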

2006-02-18 10:03:54

by Sergey Vlasov

Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Fri, Feb 17, 2006 at 04:42:43PM -0800, James Bottomley wrote:
> +static void execute_in_process_context_work(void *data)
> +{
> + void (*fn)(void *data);
> + struct execute_work *ew = data;
> +
> + fn = ew->fn;
> + data = ew->data;
> +
> + fn(data);
> +}

After removing kfree(), which was here in the initial implementation,
this function became a thunk which does nothing useful - we can just
stick fn and data directly into work_struct.

> +
> +/**
> + * execute_in_process_context - reliably execute the routine with user context
> + * @fn: the function to execute
> + * @data: data to pass to the function
> + *
> + * Executes the function immediately if process context is available,
> + * otherwise schedules the function for delayed execution.
> + *
> + * Returns: 0 - function was executed
> + * 1 - function was scheduled for execution
> + */
> +int execute_in_process_context(void (*fn)(void *data), void *data,
> + struct execute_work *ew)
> +{
> + if (!in_interrupt()) {
> + fn(data);
> + return 0;
> + }
> +
> + INIT_WORK(&ew->work, execute_in_process_context_work, ew);
> + ew->fn = fn;
> + ew->data = data;
> + schedule_work(&ew->work);
> +
> + return 1;
> +}

Then this becomes:

int execute_in_process_context(void (*fn)(void *data), void *data,
struct work_struct *work)
{
if (!in_interrupt()) {
fn(data);
return 0;
}

INIT_WORK(work, fn, data);
schedule_work(work);
return 1;
}

(and struct execute_work is no longer needed).


2006-02-18 12:55:43

by Pavel Machek

Subject: Re: Linux 2.6.16-rc3

Hi!

> That's suspend-to-disk, yes?
>
> Dave, would you have the 2.6.15-1.1830_FC4 -> 2.6.15-1.1831_FC4 details
> handy? There surely can't be much difference?
>
> There seem to be several ACPI problems there. Do we have a reliable means
> of feeding such reports up into the (for example) acpi developers?
>
> <I have this vaguely unsettled feeling that distros must get more bug
> reports than the upstream developers, yet we hear so little about it>

It's about 1:1 upstream:suse bugs for me... Unfortunately
suse bugs are often for old kernels...
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2006-02-18 20:16:32

by Alan Stern

Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Fri, 17 Feb 2006, James Bottomley wrote:

> +/**
> + * execute_in_process_context - reliably execute the routine with user context
> + * @fn: the function to execute
> + * @data: data to pass to the function
> + *
> + * Executes the function immediately if process context is available,
> + * otherwise schedules the function for delayed execution.
> + *
> + * Returns: 0 - function was executed
> + * 1 - function was scheduled for execution
> + */
> +int execute_in_process_context(void (*fn)(void *data), void *data,
> + struct execute_work *ew)
> +{
> + if (!in_interrupt()) {
> + fn(data);
> + return 0;
> + }

The test should be in_atomic(), not in_interrupt().

Alan Stern

2006-02-18 21:06:49

by Helge Hafting

Subject: Re: Linux 2.6.16-rc3

Andrew Morton wrote:

>We still have some serious bugs, several of which are in 2.6.15 as well:
>
[...]

>- Helge Hafting reports a usb printer regression - I don't know if that's
> still live?
>
>
I tried printing four pages of graphics with 2.6.16-rc3-mm1
and it worked fine. When I had the problem I couldn't even print
three, I had to print the 10 pages I needed one by one.

So I believe the situation has improved.

Helge Hafting

2006-02-19 13:52:16

by James Bottomley

Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Sat, 2006-02-18 at 15:16 -0500, Alan Stern wrote:
> The test should be in_atomic(), not in_interrupt().

There's a long prior discussion of why it has to be in_interrupt().

James


2006-02-19 14:31:27

by James Bottomley

Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Sat, 2006-02-18 at 13:03 +0300, Sergey Vlasov wrote:
> After removing kfree(), which was here in the initial implementation,
> this function became a thunk which does nothing useful - we can just
> stick fn and data directly into work_struct.

Yes, thanks ... although I think there's still value to wrappering
work_struct (a bit like kref wrappers atomic_t).

James


2006-02-22 16:51:47

by Ben Castricum

Subject: Re: Linux 2.6.16-rc3


> We still have some serious bugs, several of which are in 2.6.15 as well:
> [...]
> - "Ben Castricum" <[email protected]> reports that ppp has started
> exhibiting mysterious failures (again).

I found the time to do some more testing, this time with the latest
2.6.16-rc4 from git and still pppd version 2.4.1.

The problem was very easy to reproduce; within 10 minutes it appeared again.
I could narrow the problem down to not being able to _send_ traffic.
ifconfig -a and tcpdump showed traffic coming in, but not going out. I tried
stracing the relevant programs (I use pptp to connect to the DSL modem):

root@gateway:~# ps -ef | grep pp
root 870 1 0 16:47 ? 00:00:00 /usr/sbin/upnpd ppp0 eth1
root 329 1 0 16:45 ? 00:00:00 /usr/sbin/pptp pptp: call manager for 10.0.0.138
root 323 322 0 16:45 ? 00:00:14 /usr/sbin/pptp pptp: GRE-to-PPP gateway on /dev/ptmx
root 322 1 0 16:45 ? 00:00:00 /usr/sbin/pppd call adsl
root 3156 2941 0 17:15 pts/2 00:00:00 grep pp

root@gateway:~# strace -p 322
select(4, [0 3], NULL, [0 3], NULL <unfinished ...>

(nothing happening there)

root@gateway:~# strace -p 323
write(4, " \201\210\v\0\0\0\0\0\2\320X", 12) = 12
select(5, [0 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "E\0\0`\36P\0\0@/G\225\n\0\0\212\n\0\0\0010\1\210\v\0@\0"..., 8260) = 96
write(0, "\377\3\0!E\0\0< \211@\0(\6\27\336\310\256\260\215\325T"..., 64) = 64
select(5, [0 4], NULL, NULL, {0, 500000}) = 1 (in [4], left {0, 488000})
read(4, "E\0\0T\36Q\0\0@/G\240\n\0\0\212\n\0\0\0010\1\210\v\000"..., 8260) = 84
write(0, "\377\3\0!E\0\0000_E@\0w\6\320rRH\340\256\325T\313\304\20"..., 52) = 52
select(5, [0 4], NULL, NULL, {0, 0}) = 0 (Timeout)
write(4, " \201\210\v\0\0\0\0\0\2\320Z", 12) = 12
select(5, [0 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "E\0\0T\36R\0\0@/G\237\n\0\0\212\n\0\0\0010\1\210\v\000"..., 8260) = 84
write(0, "\377\3\0!E\0\0000_F@\0w\6\320qRH\340\256\325T\313\304\20"..., 52) = 52
select(5, [0 4], NULL, NULL, {0, 500000}) = 1 (in [4], left {0, 472000})
read(4, "E\0\0T\36S\0\0@/G\236\n\0\0\212\n\0\0\0010\1\210\v\000"..., 8260) = 84
write(0, "\377\3\0!E\0\0000\21\223@\0q\6\205\\P\313\200\364\325T"..., 52) = 52
select(5, [0 4], NULL, NULL, {0, 0}) = 0 (Timeout)
write(4, " \201\210\v\0\0\0\0\0\2\320\\", 12) = 12
select(5, [0 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "E\0\0T\36T\0\0@/G\235\n\0\0\212\n\0\0\0010\1\210\v\000"..., 8260) = 84
write(0, "\377\3\0!E\0\0000C\2@\0h\6\17\233\311\307UJ\325T\313\304"..., 52) = 52
select(5, [0 4], NULL, NULL, {0, 500000}) = 1 (in [4], left {0, 496000})
read(4, "E\0\0`\36U\0\0@/G\220\n\0\0\212\n\0\0\0010\1\210\v\0@\0"..., 8260) = 96
write(0, "\377\3\0!E\0\0<0e@\0003\6\4T\303\200\256i\325T\313\304"..., 64) = 64
select(5, [0 4], NULL, NULL, {0, 0}) = 0 (Timeout)
write(4, " \201\210\v\0\0\0\0\0\2\320^", 12) = 12
select(5, [0 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "E\0\0d\36V\0\0@/G\213\n\0\0\212\n\0\0\0010\1\210\v\0D\0"..., 8260) = 100
write(0, "\377\3\0!E\0\0@:S\0\0003\6\203Y\311\32_\330\325T\313\304"..., 68) = 68
select(5, [0 4], NULL, NULL, {0, 500000}) = 1 (in [4], left {0, 484000})
read(4, "E\0\0`\36W\0\0@/G\216\n\0\0\212\n\0\0\0010\1\210\v\0@\0"..., 8260) = 96
write(0, "\377\3\0!E\0\0<\275\240@\0%\6\24\355\310\320\31E\325T\313"..., 64) = 64
select(5, [0 4], NULL, NULL, {0, 0}) = 0 (Timeout)
write(4, " \201\210\v\0\0\0\0\0\2\320`", 12) = 12
select(5, [0 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "E\0\0d\36X\0\0@/G\211\n\0\0\212\n\0\0\0010\1\210\v\0D\0"..., 8260) = 100
write(0, "\377\3\0!E\0\0@a\264\0\0003\6\250\345\311\374\22\t\325"..., 68) = 68
select(5, [0 4], NULL, NULL, {0, 500000}) = 1 (in [4], left {0, 496000})
read(4, "E\0\0d\36Y\0\0@/G\210\n\0\0\212\n\0\0\0010\1\210\v\0D\0"..., 8260) = 100
write(0, "\377\3\0!E\0\0@j\217\0\0003\6\272\317\310\350\370W\325"..., 68) = 68
select(5, [0 4], NULL, NULL, {0, 0}) = 0 (Timeout)
write(4, " \201\210\v\0\0\0\0\0\2\320b", 12) = 12
select(5, [0 4], NULL, NULL, NULL) = 1 (in [4])
read(4, "E\0\0T\36Z\0\0@/G\227\n\0\0\212\n\0\0\0010\1\210\v\000"..., 8260) = 84
write(0, "\377\3\0!E\0\0000\26\217@\0p\6\375o\310\251\215\6\325T"..., 52) = 52
select(5, [0 4], NULL, NULL, {0, 500000}) = 1 (in [4], left {0, 440000})
read(4, "E\0\0T\36[\0\0@/G\226\n\0\0\212\n\0\0\0010\1\210\v\000"..., 8260) = 84
write(0, "\377\3\0!E\0\0000\3672@\0o\6[-\310\264O\232\325T\313\304"..., 52) = 52
select(5, [0 4], NULL, NULL, {0, 0}) = 0 (Timeout)
write(4, " \201\210\v\0\0\0\0\0\2\320d", 12) = 12
select(5, [0 4], NULL, NULL, NULL <unfinished ...>

(and so on...)

root@gateway:~# strace -p 329
select(7, [3 4 6], [], NULL, NULL <unfinished ...>

(nothing happening here either)

Checking the iptables setup (I flushed the rules before I started
troubleshooting):
root@gateway:~# iptables-save
# Generated by iptables-save v1.2.6a on Wed Feb 22 17:10:44 2006
*filter
:INPUT ACCEPT [6204:440626]
:FORWARD ACCEPT [8:284]
:OUTPUT ACCEPT [18978:2854072]
COMMIT
# Completed on Wed Feb 22 17:10:44 2006
# Generated by iptables-save v1.2.6a on Wed Feb 22 17:10:44 2006
*nat
:PREROUTING ACCEPT [44153:2316103]
:POSTROUTING ACCEPT [805:50699]
:OUTPUT ACCEPT [2308:153409]
COMMIT
# Completed on Wed Feb 22 17:10:44 2006

root@gateway:~# iptables-save
# Generated by iptables-save v1.2.6a on Wed Feb 22 17:10:48 2006
*filter
:INPUT ACCEPT [6721:478137]
:FORWARD ACCEPT [8:284]
:OUTPUT ACCEPT [20528:3083677]
COMMIT
# Completed on Wed Feb 22 17:10:48 2006
# Generated by iptables-save v1.2.6a on Wed Feb 22 17:10:48 2006
*nat
:PREROUTING ACCEPT [44241:2320941]
:POSTROUTING ACCEPT [806:50777]
:OUTPUT ACCEPT [2309:153487]
COMMIT
# Completed on Wed Feb 22 17:10:48 2006


Does this help anything?

Kind regards,
Ben

2006-02-23 18:45:12

by James Bottomley

Subject: Re: [linux-usb-devel] Re: Linux 2.6.16-rc3

On Sun, 2006-02-19 at 08:30 -0600, James Bottomley wrote:
> Yes, thanks ... although I think there's still value to wrappering
> work_struct (a bit like kref wrappers atomic_t).

OK, so how about this?

James

[PATCH] add execute_in_process_context() API

We have several points in the SCSI stack (primarily for our device
functions) where we need to guarantee process context, but (given the
place where the last reference was released) we cannot guarantee this.

This API gets around the issue by executing the function directly if
the caller has process context, but scheduling a workqueue to execute
in process context if the caller doesn't have it.

Signed-off-by: James Bottomley <[email protected]>

Index: BUILD-2.6/include/linux/workqueue.h
===================================================================
--- BUILD-2.6.orig/include/linux/workqueue.h 2006-02-20 08:57:43.000000000 -0600
+++ BUILD-2.6/include/linux/workqueue.h 2006-02-20 08:58:34.000000000 -0600
@@ -20,6 +20,10 @@
struct timer_list timer;
};

+struct execute_work {
+ struct work_struct work;
+};
+
#define __WORK_INITIALIZER(n, f, d) { \
.entry = { &(n).entry, &(n).entry }, \
.func = (f), \
@@ -74,6 +78,8 @@
void cancel_rearming_delayed_work(struct work_struct *work);
void cancel_rearming_delayed_workqueue(struct workqueue_struct *,
struct work_struct *);
+int execute_in_process_context(void (*fn)(void *), void *,
+ struct execute_work *);

/*
* Kill off a pending schedule_delayed_work(). Note that the work callback
Index: BUILD-2.6/kernel/workqueue.c
===================================================================
--- BUILD-2.6.orig/kernel/workqueue.c 2006-02-20 08:57:43.000000000 -0600
+++ BUILD-2.6/kernel/workqueue.c 2006-02-20 08:59:18.000000000 -0600
@@ -27,6 +27,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/kthread.h>
+#include <linux/hardirq.h>

/*
* The per-CPU workqueue (if single thread, we always use the first
@@ -476,6 +477,34 @@
}
EXPORT_SYMBOL(cancel_rearming_delayed_work);

+/**
+ * execute_in_process_context - reliably execute the routine with user context
+ * @fn: the function to execute
+ * @data: data to pass to the function
+ * @ew: guaranteed storage for the execute work structure (must
+ * be available when the work executes)
+ *
+ * Executes the function immediately if process context is available,
+ * otherwise schedules the function for delayed execution.
+ *
+ * Returns: 0 - function was executed
+ * 1 - function was scheduled for execution
+ */
+int execute_in_process_context(void (*fn)(void *data), void *data,
+ struct execute_work *ew)
+{
+ if (!in_interrupt()) {
+ fn(data);
+ return 0;
+ }
+
+ INIT_WORK(&ew->work, fn, data);
+ schedule_work(&ew->work);
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(execute_in_process_context);
+
int keventd_up(void)
{
return keventd_wq != NULL;
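
The @ew requirement above (storage that is still valid when the work
runs) is easiest to meet by embedding the struct execute_work in the
object being released. A minimal userspace sketch of that pattern
follows, assuming a hypothetical refcounted struct widget.
fake_in_interrupt, the one-slot pending queue and run_pending_work() are
stand-ins for in_interrupt(), the shared workqueue and its worker
thread; they are not real kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of this is the kernel's code. */
struct work_struct {
	void (*func)(void *);
	void *data;
};

struct execute_work {
	struct work_struct work;
};

static bool fake_in_interrupt;       /* models in_interrupt() */
static struct work_struct *pending;  /* one-slot model of the workqueue */

static int execute_in_process_context(void (*fn)(void *), void *data,
				      struct execute_work *ew)
{
	if (!fake_in_interrupt) {
		fn(data);
		return 0;
	}

	ew->work.func = fn;          /* models INIT_WORK(&ew->work, fn, data) */
	ew->work.data = data;
	assert(pending == NULL);     /* this toy queue holds one item only */
	pending = &ew->work;         /* models schedule_work(&ew->work) */
	return 1;
}

/* Models the worker thread running the queued item in process context. */
static void run_pending_work(void)
{
	struct work_struct *w = pending;

	pending = NULL;
	if (w)
		w->func(w->data);    /* w may be freed by the callback */
}

/* Hypothetical refcounted object that carries its own execute_work. */
struct widget {
	int refcount;
	struct execute_work ew;
};

static int widgets_released;

/* Release callback; the kernel analogue is the part that may sleep. */
static void widget_release(void *data)
{
	free(data);
	widgets_released++;
}

static struct widget *widget_new(void)
{
	struct widget *w = calloc(1, sizeof(*w));

	if (w)
		w->refcount = 1;
	return w;
}

/* Drop a reference; the final put releases now or defers the release. */
static void widget_put(struct widget *w)
{
	if (--w->refcount == 0)
		execute_in_process_context(widget_release, w, &w->ew);
}
```

Because the execute_work lives inside the widget, it stays valid until
widget_release() itself frees the object, whether the release runs
immediately or later from run_pending_work().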