2007-01-25 02:58:16

by Linus Torvalds

Subject: Linux 2.6.20-rc6


It's been more than a week since -rc5, but I blame everybody (including
me) being away for Linux.conf.au and then me waiting for a few days
afterwards to let everybody sync up.

So there it is, -rc6, hopefully the last -rc of the series.

I'd like everybody to take a really good look at any regressions that
Adrian has been pointing out, and that very much includes the people who
reported them too, so that we can confirm whether they are still active
and relevant.

As to -rc6 itself: the bulk of it is the MTD updates (including a few new
drivers) and the POWER update (with the bulk of _that_, in terms of patch
size, being defconfig updates ;)

But there are various random fixes in infiniband, DVB, network drivers,
scsi, usb, some filesystems (cifs, jffs2, nfs, ntfs, ocfs2) as well as
core networking too.

Oh, and KVM, of course.

And stuff I probably have already forgotten.

ShortLog appended.

Linus

---
Adrian Bunk (7):
[MTD] SSFDC must depend on BLOCK
[MTD] [NAND] rtc_from4.c: use lib/bitrev.c
[MTD] make drivers/mtd/cmdlinepart.c:mtdpart_setup() static
[SCSI] qla2xxx: make qla2x00_reg_remote_port() static
more ftape removal
[IRDA] vlsi_ir.{h,c}: remove kernel 2.4 code
[NET]: Process include/linux/if_{addr,link}.h with unifdef

Adrian Friedli (1):
HID: GEYSER4_ISO needs quirk

Adrian Hunter (2):
[MTD] OneNAND: Implement read-while-load
[MTD] OneNAND: Handle DDP chip boundary during read-while-load

Akinobu Mita (2):
[JFFS2] Use rb_first() and rb_last() cleanup
[SCSI] iscsi: fix crypto_alloc_hash() error check

Al Viro (5):
funsoft: ktermios fix
horizon.c: missing __devinit
s2io bogus memset
fix prototype of csum_ipv6_magic() (ia64)
s2io bogus memset

Alan Cox (1):
[MTD] MAPS: esb2rom: use hotplug safe interfaces

Alexey Dobriyan (2):
[MTD] JEDEC probe: fix comment typo (devic)
[MIPS] There is no __GNUC_MAJOR__

Amit Choudhary (1):
[JFFS2] Fix error-path leak in summary scan

Amit S. Kale (2):
NetXen: Firmware check modifications
NetXen: Use pci_register_driver() instead of pci_module_init() in init_module

Andres Salomon (1):
USB: asix: Detect internal PHY and enable/use accordingly

Andrew Hendry (1):
[X.25]: Add missing sock_put in x25_receive_data

Andrew Morton (4):
[MTD] Tidy bitrev usage in rtc_from4.c
fix "kvm: add vm exit profiling"
blockdev direct_io: fix signedness bug
SubmitChecklist update

Andrew Vasquez (9):
[SCSI] qla2xxx: Don't log trace-control async-events.
[SCSI] qla2xxx: Correct endianess issue while interrogating MS status.
[SCSI] qla2xxx: Use proper prep_ms_iocb() function during GFPN_ID.
[SCSI] qla2xxx: Detect GPSC capabilities within a fabric.
[SCSI] qla2xxx: Correct IOCB queueing mechanism for ISP54XX HBAs.
[SCSI] qla2xxx: Correct reset handling logic.
[SCSI] qla2xxx: Perform a fw-dump when an ISP23xx RISC-paused state is detected.
[SCSI] qla2xxx: Use generic isp_ops.fw_dump() function.
[SCSI] qla2xxx: Update version number to 8.01.07-k4.

Andrew Victor (2):
[MTD] NAND: AT91 NAND driver
[MTD] NAND: Support for 16-bit bus-width on AT91.

Anssi Hannula (1):
HID: put usb_interface instead of usb_device into hid->dev to fix udevinfo breakage

Anton Altaparmakov (2):
NTFS: 2.1.28 - Fix deadlock reported by Sergey Vlasov due to ntfs_put_inode().
NTFS: Forgot to bump version number in makefile to 2.1.28...

Arne Redlich (1):
[SCSI] iscsi: fix 2.6.19 data digest calculation bug

Artem Bityutskiy (10):
[MTD] core: trivial comments fix
[MTD] NAND: nandsim: support subpage write
[MTD] increase MAX_MTD_DEVICES
[MTD] add get_mtd_device_nm() function
[MTD] add get and put methods
[MTD] return error code from get_mtd_device()
[MTD] nandsim: bugfix in page addressing
[JFFS2] add cond_resched() when garbage collecting deletion dirent
[JFFS2] Reschedule in loops
[MTD] OneNAND: release CPU in cycles

Atsushi Nemoto (1):
[MIPS] Fix wrong checksum calculation on 64-bit MIPS

Avi Kivity (4):
KVM: make sure there is a vcpu context loaded when destroying the mmu
KVM: fix race between mmio reads and injected interrupts
KVM: x86 emulator: fix bit string instructions
KVM: fix bogus pagefault on writable pages

Benjamin Herrenschmidt (2):
[POWERPC] Remove bogus sanity check in pci -> OF node code
[POWERPC] Fix cell's mmio nvram to properly parse device tree

Brian Haley (1):
[SCTP]: Fix compiler warning.

Brian King (2):
libata: Fixup n_elem initialization
libata: Initialize qc->pad_len

Brice Goglin (3):
myri10ge: make wc_fifo usage load-time tunable
myri10ge: check that we can get an irq
myri10ge: update driver version to 1.2.0

Burman Yan (1):
[MTD] replace kmalloc+memset with kzalloc

Carlos Eduardo Aguiar (1):
omap: Update MMC response types

Chen, Kenneth W (1):
fix blk_direct_IO bio preparation

Chris Lalancette (1):
8139cp: Don't blindly enable interrupts

Christoph Lameter (1):
mbind: restrict nodes to the currently allowed cpuset

Dale Farnsworth (1):
mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs

Daniel Gollub (1):
USB: rndis_host: fix crash while probing a Nokia S60 mobile

Daniel Ritz (1):
usbtouchscreen: make ITM screens report BTN_TOUCH as zero when not touched

Dave Olsen (1):
[MTD] [MAPS] Support for BIOS flash chips on the nvidia ck804 southbridge

David Anders (1):
[MTD] NOR: leave Intel chips in read-array mode on suspend

David Woodhouse (22):
[MTD NAND] Initial import of CAFÉ NAND driver.
[MTD NAND] OLPC CAFÉ driver update
[MTD] NAND: Combined oob buffer so it's contiguous with data
[MTD] NAND: Correct setting of chip->oob_poi OOB buffer
[MTD] NAND: Add hardware ECC correction support to CAFÉ NAND driver
[MTD] NAND: CAFÉ NAND driver cleanup, fix ECC on reading empty flash
[MTD] NAND: Disable ECC checking on CAFÉ since it's broken for now
[MTD] NAND: Café ECC -- remove spurious BUG_ON() in err_pos()
[MTD] NAND: Reset Café controller before initialising.
[MTD] CAFÉ NAND: Add 'slowtiming' parameter, default usedma and checkecc on
[MTD] NAND: Add ECC debugging for CAFÉ
[MTD] NAND: Remove empty block ECC workaround
[MTD] NAND: Fix timing calculation in CAFÉ debugging message
[MTD] NAND: Use register #defines throughout CAFÉ driver, not numbers
[MTD] NAND: Add register debugging spew option to CAFÉ driver
[MTD] NAND: Fix ECC settings in CAFÉ controller driver.
[MTD] [NAND] Update CAFÉ driver interrupt handler prototype
[MTD] Use EXPORT_SYMBOL_GPL() for exported symbols.
[MTD] Remove trailing whitespace
[MTD] Fix SSFDC build for variable blocksize.
[MTD] Fix ssfdc blksize typo
[JFFS2] debug.h: include <linux/sched.h> for current->pid

Eric Moore (4):
[SCSI] fusion: fibre channel: return DID_ERROR for MPI_IOCSTATUS_SCSI_IOC_TERMINATED
[SCSI] fusion: power pc and miscellaneous bug fixs
[SCSI] fusion: MODULE_VERSION support
[SCSI] fusion: bump version

FUJITA Tomonori (1):
[SCSI] iscsi: simplify IPv6 and IPv4 address printing

Francois Romieu (1):
sis190: failure to set the MAC address from EEPROM

Gerd Hoffmann (1):
V4L/DVB (5069): Fix bttv and friends on 64bit machines with lots of memory

Gong Jun (3):
hwmon/w83793: Remove the description of AMDSI and update the voltage formula
hwmon/w83793: Ignore disabled temperature channels
hwmon/w83793: Hide invalid VID readings

Grant Grundler (1):
PCI: rework Documentation/pci.txt

Grant Likely (2):
V4L/DVB (5024): Fix quickcam communicator driver for big endian architectures
[POWERPC] Make it blatantly clear; mpc5200 device tree is not yet stable

Greg Kroah-Hartman (1):
USB: disable USB_MULTITHREAD_PROBE

Guy Streeter (1):
correct sys_shmget allocation check

Haavard Skinnemoen (1):
[MTD] bugfix: DataFlash is not bit writable

Herbert Xu (3):
vmx: Fix register constraint in launch code
[IPSEC] flow: Fix potential memory leak
[IPSEC]: Policy list disorder

Hoang-Nam Nguyen (2):
IB/ehca: Fix improper use of yield() with spinlock held
IB/ehca: Fix mismatched spin_unlock in irq handler

Horms (2):
Kdump documentation update: kexec-tools update
Kdump documentation update: ia64 portion

Ingo Molnar (2):
paravirt: mark the paravirt_ops export internal
notifiers: fix blocking_notifier_call_chain() scalability

Ishai Rabinovitz (1):
IB/srp: Check match_strdup() return

James Bottomley (4):
[SCSI] scsi_transport_spi: fix sense buffer size error
[SCSI] seagate: remove BROKEN tag
[SCSI] scsi_scan: fix report lun problems with CDROM or RBC devices
x86: fix PDA variables to work during boot

Jamie Lenehan (1):
rtc-sh: act on rtc_wkalrm.enabled when setting an alarm

Jarek Poplawski (1):
[TCP]: rare bad TCP checksum with 2.6.19

Jean Delvare (2):
hwmon: Fix the VRD 11 decoding
PCI: Unhide the SMBus on the Asus P4P800-X

Jeff Chua (1):
acpi: remove "video device notify" message

Jeff Garzik (2):
[JFFS2] kill warning RE debug-only variables
Note that JFFS (v1) is to be deleted, in feature-removal-schedule.txt

Jeremy Roberson (1):
hid-core.c: Adds GTCO CalComp Interwrite IPanel PIDs to blacklist

Jes Sorensen (1):
[SCSI] qla1280: set residual correctly

Jiri Kosina (3):
HID: update MAINTAINERS entry for USB-HID
HID: compilation fix when DEBUG_DATA is defined
HID: hid/hid-input.c doesn't need to include linux/usb/input.h

Josh Boyer (1):
[MTD] add MTD_BLKDEVS Kconfig option

Karsten Wiese (1):
[ALSA] Repair snd-usb-usx2y over OHCI

Komuro (1):
modify 3c589_cs to be SMP safe

Kumar Gala (1):
PHY: Export phy ethtool helpers

Kyungmin Park (9):
MTD: OneNAND: interrupt based wait support
[MTD] OneNAND: lock support
[MTD] OneNAND: Single bit error detection
[MTD] OneNAND: fix oob handling in recent oob patch
[JFFS2] use the ref_offset macro
[MTD] OneNAND: fix onenand_wait bug
[MTD] OneNAND: add subpage write support
[MTD] OneNAND: fix onenand_wait bug in read ecc error
[MTD] OneNAND: return ecc error code only when 2-bit ecc occurs

Larry Finger (1):
bcm43xx: Fix failure to deliver PCI-E interrupts

Lew Glendenning (1):
[MTD] MAPS: Support for BIOS flash chips on Intel ESB2 southbridge

Li Yang (1):
[POWERPC] Fix OF node refcnt underflow in 836x and 832x platform code

Linas Vepstas (2):
[POWERPC] Fix broken DMA on non-LPAR pSeries
elevator: move clearing of unplug flag earlier

Linus Torvalds (4):
Revert "[PATCH] Fix up mmap_kmem"
Clear spurious irq stat information when adding irq handler
Change Linus' email address too
Linux 2.6.20-rc6

Luca Pedrielli (1):
sata_via: add PCI ID 0x5337

Manuel Osdoba (1):
USB: unusual_devs.h entry for nokia 6233

Marcel Holtmann (2):
[Bluetooth] Missing endian swapping for L2CAP socket list
[Bluetooth] Restrict well known PSM to privileged users

Mariusz Kozlowski (2):
[MTD] [NAND] Compile fix in rfc_from4.c
[SCSI] scsi: lpfc error path fix

Mark Fasheh (4):
ocfs2: Don't print errors when following symlinks
ocfs2: Directory c/mtime update fixes
ocfs2: cleanup ocfs2_iget() errors
ocfs2: Add backup superblock info to ocfs2_fs.h

Mark Gross (1):
tlclk: bug fix + misc fixes

Martin Samuelsson (1):
V4L/DVB (5029): Ks0127 status flags

Masahide NAKAMURA (1):
[IP] TUNNEL: Fix to be built with user application.

Masayuki Nakagawa (1):
[TCP]: skb is unexpectedly freed.

Matthew Wilcox (1):
[SCSI] Add missing completion to scsi_complete_async_scans()

Mauro Carvalho Chehab (2):
V4L/DVB (5020): Fix: disable interrupts while at KM_BOUNCE_READ
V4L/DVB (5023): Fix compilation on ppc32 architecture

Meelis Roos (1):
[SCSI] iscsi: newline in printk

Michael Krufky (1):
V4L/DVB (5071): Tveeprom: autodetect LG TAPC G701D as tuner type 37

Mikael Pettersson (1):
[NETFILTER]: fix xt_state compile failure

Mike Christie (1):
[SCSI] libiscsi: fix senselen calculation

Noriaki TAKAMIYA (1):
[IPV6]: Fixed the size of the netlink message notified by inet6_rt_notify().

Oleg Nesterov (1):
V4L/DVB (5123): Buf_qbuf: fix: videobuf_queue->stream corruption and lockup

Oliver Neukum (1):
USB: make usbhid ignore Imation Disc Stakka

Olof Johansson (1):
sata_mv HighPoint 2310 support (88SX7042)

Patrick McHardy (2):
[NETFILTER]: ctnetlink: fix leak in ctnetlink_create_conntrack error path
[NETFILTER]: Fix iptables ABI breakage on (at least) CRIS

Paul Mackerras (1):
[POWERPC] Update defconfigs

Pete Zaitcev (1):
USB: unusual_devs.h for 0x046b:ff40

Petr Stetiar (1):
USB: Fix for typo in ohci-ep93xx.c

Philip Langdale (1):
mmc: Correct definition of R6

Qi Yong (1):
[JFFS2] Fix jffs2_follow_link() typo

Ralf Baechle (8):
[MTD] Nuke IVR leftovers
[MIPS] SMTC: Fix cp0 hazard.
[MIPS] Delete duplicate call to load_irq_save.
[MIPS] SMTC: Instant IPI replay.
[MIPS] Fix APM build
[MIPS] SMTC: Fix TLB sizing bug for TLB of 64 >= entries
[MIPS] SMTC: Fix module build by exporting symbol
[MIPS] VPE loader: Initialize lists before they're actually being used ...

Randy Dunlap (4):
[MTD] Fix printk format warning in physmap. (resources again)
[MTD] ESB2ROM uses PCI
[SCSI] advansys: wrap PCI table inside ifdef CONFIG_PCI
PCI: fix pci-driver kernel-doc

Ricard Wanderlöf (2):
[MTD] mtdchar: Fix MEMGETOOBSEL and ECCGETLAYOUT ioctls
[MTD] NAND: Fix nand_default_mark_blockbad() when flash-based BBT disabled

Richard Purdie (1):
[MTD] Allow variable block sizes in mtd_blkdevs

Robert Hancock (2):
V4L/DVB (5021): Cx88xx: Fix lockup on suspend
sata_nv: don't rely on NV_INT_DEV indication with ADMA

Robert Jennings (1):
[POWERPC] atomic_dec_if_positive sign extension fix

Robert P. J. Day (1):
libata doc: "error : unterminated entity reference exceptions"

Rod Whitby (1):
[MTD] Support combined RedBoot FIS directory and configuration area

Rudolf Marek (1):
hwmon/w83793: Fix the fan input detection

Russell King (1):
HID: fix some ARM builds due to HID brokenness - make USB_HID depend on INPUT

Ryan Jackson (2):
[MTD] MAPS: Add parameter to amd76xrom to override rom window size
[MTD] CHIPS: Support for SST 49LF040B flash chip

Salyzyn, Mark (1):
[SCSI] aacraid: Product List Update

Samuel Ortiz (2):
[IrDA]: irda-usb TX path optimization (was Re: IrDA spams logfiles - since 2.6.19)
[IrDA]: Removed incorrect IRDA_ASSERT()

Simon Budig (2):
HID: proper LED-mapping for SpaceNavigator
HID: add missing RX, RZ and RY enum values to hid-debug output

Stefan Roese (1):
[MTD] [NAND] Fix endianess bug in ndfc.c

Stephen Hemminger (1):
email change for [email protected]

Steve French (4):
[CIFS] Update CIFS version number
[CIFS] Remove 2 unneeded kzalloc casts
[CIFS] cifs sprintf fix
[CIFS] Fix oops when Windows server sent bad domain name null terminator

Sumant Patro (1):
[SCSI] megaraid_sas: Update module author

Tejun Heo (5):
[SCSI] sr: fix error code check in sr_block_ioctl()
libata: initialize qc->dma_dir to DMA_NONE
libata: fix handling of port actions in per-dev action mask
ahci: make ULi M5288 ignore interface fatal error bit
ahci: don't enter slumber on power down

Thiemo Seufer (1):
[MIPS] Fix reported amount of freed memory - it's in kB not bytes

Thierry MERLE (1):
V4L/DVB (5019): Fix the frame->grabstate update in read() entry point.

Thomas Gleixner (1):
[MTD] NAND: add subpage write support

Thomas Klein (7):
ehea: Fixed wrong dereferencation
ehea: Fixing firmware queue config issue
ehea: Modified initial autoneg state determination
ehea: New method to determine number of available ports
ehea: Improved logging of permission issues
ehea: Added logging off associated errors
ehea: Fixed possible nullpointer access

Timo Lindhorst (2):
[MTD] [NAND] fix ifdef option in nand_ecc.c
[MTD] NAND: use SmartMedia ECC byte order for ndfc

Timur Tabi (2):
Update ucc_geth.c for new workqueue structure
Fix phy_read/write redefinition errors in ucc_geth_phy.c

Trond Myklebust (2):
NFS: Fix Oops in rpc_call_sync()
NFS: Fix races in nfs_revalidate_mapping()

Venkat Yekkirala (1):
[SELINUX]: increment flow cache genid

Venkatesh Pallipadi (1):
Revert nmi_known_cpu() check during boot option parsing

Vijay Kumar (3):
[MTD] NAND: nandsim page-wise allocation (1/2)
[MTD] NAND: nandsim page-wise allocation (2/2)
[MTD] NAND: nandsim coding style fix

Vitaly Wool (2):
[MTD] [NAND] remove len/ooblen confusion.
[MTD] of_device-based physmap driver

Vlad Yasevich (4):
[SCTP]: Set correct error cause value for missing parameters
[SCTP]: Verify some mandatory parameters.
[SCTP]: Correctly handle unexpected INIT-ACK chunk.
[SCTP]: Fix SACK sequence during shutdown

Vladimir Saveliev (1):
resierfs: avoid tail packing if an inode was ever mmapped

YOSHIFUJI Hideaki (1):
[IPV6] MCAST: Fix joining all-node multicast group on device initialization.

Yan Burman (1):
[JFFS2] replace kmalloc+memset with kzalloc

Yoichi Yuasa (4):
[MTD] MAPS: Remove ITE 8172G and Globespan IVR MTD support
[MTD] fix map probe name for cstm_mips_ixx
[MIPS] Vr41xx: Fix after GENERIC_HARDIRQS_NO__DO_IRQ change
[MIPS] vr41xx: need one more nop with mtc0_tlbw_hazard()

Yoshinori Sato (1):
[MTD] redboot partition combined fis / config problem

adam radford (1):
[SCSI] 3ware 8000 serialize reset code

[email protected] (1):
USB: add vendor/device id for Option GT Max 3.6 cards

hermann pitton (1):
V4L/DVB (5033): MSI TV@nywhere Plus fixes


2007-01-25 10:09:55

by Sunil Naidu

Subject: Re: Linux 2.6.20-rc6

On 1/25/07, Linus Torvalds <[email protected]> wrote:
>
> It's been more than a week since -rc5, but I blame everybody (including
> me) being away for Linux.conf.au and then me waiting for a few days
> afterwards to let everybody sync up.
>
> So there it is, -rc6, hopefully the last -rc of the series.
>

It booted cleanly, and I have really enjoyed this.

I have one question that is still open (it seems to have been ignored or
missed by you guys).

migration_cost=33 for 2.6.20-rc5
migration_cost=159 for 2.6.20-rc6

What does this mean?


> Linus
>
~Akula2

2007-01-25 11:20:07

by Eyal Lebedinsky

Subject: Re: Linux 2.6.20-rc6 - build failure

i386
Practically all modules selected.

Building modules, stage 2.
MODPOST 1931 modules
WARNING: drivers/atm/fore_200e.o - Section mismatch: reference to .init.text: from .text between 'fore200e_initialize' (at offset 0x25af) and 'fore200e_monitor_putc'
WARNING: drivers/atm/lanai.o - Section mismatch: reference to .init.text: from .text between 'sram_test_pass' (at offset 0x1a8) and 'sram_test_and_clear'
WARNING: drivers/atm/zatm.o - Section mismatch: reference to .init.text: from .text after 'zatm_init_one' (at offset 0x1f25)
WARNING: drivers/atm/zatm.o - Section mismatch: reference to .init.text: from .text after 'zatm_init_one' (at offset 0x1f32)
WARNING: drivers/net/rrunner.o - Section mismatch: reference to .init.text:rr_init from .text between 'rr_init_one' (at offset 0x1d0) and 'rr_remove_one'
WARNING: drivers/net/sis900.o - Section mismatch: reference to .init.text:sis900_mii_probe from .text between 'sis900_probe' (at offset 0x47b) and 'sis900_default_phy'
WARNING: drivers/net/sunhme.o - Section mismatch: reference to .init.text: from .text between 'happy_meal_pci_probe' (at offset 0x2add) and 'happy_meal_pci_remove'
WARNING: drivers/net/tokenring/3c359.o - Section mismatch: reference to .init.text:xl_init from .text between 'xl_probe' (at offset 0x1da) and 'xl_hw_reset'
WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to .init.text: from .text between 'de_init_one' (at offset 0x2151) and 'de_remove_one'
WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to .init.text: from .text between 'de_init_one' (at offset 0x2158) and 'de_remove_one'
WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to .init.text: from .text between 'de_init_one' (at offset 0x2220) and 'de_remove_one'
WARNING: "__udivdi3" [fs/ocfs2/ocfs2.ko] undefined!
make[1]: *** [__modpost] Error 1
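
For context on the last warning: `__udivdi3` is the libgcc helper that gcc emits for 64-bit unsigned division on 32-bit targets, and the kernel is not linked against libgcc, so a plain `/` on a 64-bit value in kernel code can leave this symbol undefined on i386. Below is a minimal userspace sketch of one common way around it, assuming the divisor is a power of two (as a block size normally is). The function names are hypothetical and are not taken from the ocfs2 source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a byte offset to a block number.
 * Assumes the block size is (1 << blocksize_bits), so the division
 * can be written as a right shift, which needs no libgcc helper.
 */
static inline uint64_t demo_offset_to_blkno(uint64_t offset,
                                            unsigned int blocksize_bits)
{
	return offset >> blocksize_bits;
}

/*
 * The same computation written with '/'. On a 32-bit target gcc may
 * emit a call to __udivdi3 for this, which the kernel does not provide.
 */
static inline uint64_t demo_offset_to_blkno_div(uint64_t offset,
                                                uint64_t blocksize)
{
	return offset / blocksize;
}
```

Both helpers give the same result when `blocksize == 1 << blocksize_bits`. For divisors that are not a power of two, kernel code usually goes through `do_div()` from `<asm/div64.h>` rather than the `/` operator.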

--
Eyal Lebedinsky ([email protected]) <http://samba.org/eyal/>
attach .zip as .dat

2007-01-25 17:51:03

by Arkadiusz Patyk

Subject: Re: Linux 2.6.20-rc6

Linus Torvalds ([email protected]) wrote:
>
> It's been more than a week since -rc5, but I blame everybody (including
> me) being away for Linux.conf.au and then me waiting for a few days
> afterwards to let everybody sync up.
>
> So there it is, -rc6, hopefully the last -rc of the series.

Hi,

What about alsa 1.0.14rc2 ?

Cheers,
Areq

2007-01-25 21:06:08

by Michal Piotrowski

Subject: Re: Linux 2.6.20-rc6

Hi,

Linus Torvalds wrote:
> It's been more than a week since -rc5, but I blame everybody (including
> me) being away for Linux.conf.au and then me waiting for a few days
> afterwards to let everybody sync up.
>
> So there it is, -rc6, hopefully the last -rc of the series.
>
> I'd like everybody to take a really good look at any regressions that
> Adrian has been pointing out, and that very much includes the people who
> reported them too, so that we can confirm whether they are still active
> and relevant.
>

It doesn't build for me.

make O=/dir
[..]
security/built-in.o: In function `security_set_bools':
(.text+0x12471): undefined reference to `flow_cache_genid'
security/built-in.o: In function `security_load_policy':
(.text+0x128b3): undefined reference to `flow_cache_genid'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [_all] Error 2

334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e is first bad commit
commit 334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e
Author: Venkat Yekkirala <[email protected]>
Date: Mon Jan 15 16:38:45 2007 -0800

[SELINUX]: increment flow cache genid

Currently, old flow cache entries remain valid even after
a reload of SELinux policy.

This patch increments the flow cache generation id
on policy (re)loads so that flow cache entries are
revalidated as needed.

Thanks to Herbert Xu for pointing this out. See:
http://marc.theaimsgroup.com/?l=linux-netdev&m=116841378704536&w=2

There's also a general issue as well as a solution proposed
by David Miller for when flow_cache_genid wraps. I might be
submitting a separate patch for that later.

I request that this be applied to 2.6.20 since it's
a security relevant fix.

Signed-off-by: Venkat Yekkirala <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

:040000 040000 aad4041b6ef7f6c0503fb9b66bfe3ce4db3405db 7b3c344f46ac21d524f9eb1c15b9e64b3459f2b7 M security
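
The generation-id idea described in the commit message above can be sketched in userspace as follows. This is only an illustration under stated assumptions: every name is hypothetical (`demo_genid` merely stands in for `flow_cache_genid`), and it does not handle the counter wrapping, which the commit message itself notes as a separate open issue.

```c
#include <stdbool.h>

/* Hypothetical global generation counter, standing in for flow_cache_genid. */
static unsigned int demo_genid;

struct demo_cache_entry {
	unsigned int genid;	/* generation in which 'value' was computed */
	int value;		/* cached result */
	bool filled;		/* false until the entry is first populated */
};

/* An entry is usable only if it was filled in the current generation. */
static bool demo_entry_valid(const struct demo_cache_entry *e)
{
	return e->filled && e->genid == demo_genid;
}

/* Store a freshly computed value and stamp it with the current generation. */
static void demo_entry_fill(struct demo_cache_entry *e, int value)
{
	e->value = value;
	e->genid = demo_genid;
	e->filled = true;
}

/* Called on a policy (re)load: one increment invalidates every entry. */
static void demo_policy_reload(void)
{
	demo_genid++;
}
```

The point of the design is that a reload does not have to walk the cache. Stale entries are simply detected and recomputed the next time they are looked up.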

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

2007-01-25 21:12:39

by David Miller

Subject: Re: Linux 2.6.20-rc6

From: Michal Piotrowski <[email protected]>
Date: Thu, 25 Jan 2007 22:05:48 +0100

> It doesn't build for me.
>
> make O=/dir
> [..]
> security/built-in.o: In function `security_set_bools':
> (.text+0x12471): undefined reference to `flow_cache_genid'
> security/built-in.o: In function `security_load_policy':
> (.text+0x128b3): undefined reference to `flow_cache_genid'
> make[1]: *** [.tmp_vmlinux1] Error 1
> make: *** [_all] Error 2

Venkat, I think we should fix this by embedding the flow_cache_genid
bumps into selinux_netlbl_cache_invalidate() and
selnl_notify_policyload(), or something like that.

That way we don't have to pepper CONFIG_XFRM ifdefs all over
services.c

Any better ideas? I'm actually surprised this code gets built
at all when CONFIG_XFRM is disabled to be honest. :-)
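
The approach suggested above, keeping the counter bump inside a helper so that callers need no ifdefs, might look roughly like this userspace sketch. It is not the patch that was eventually sent: `DEMO_CONFIG_XFRM` is a hypothetical stand-in for `CONFIG_XFRM`, and all function names are invented for illustration.

```c
/* DEMO_CONFIG_XFRM is a hypothetical stand-in for CONFIG_XFRM. */
#ifdef DEMO_CONFIG_XFRM
static unsigned int demo_flow_cache_genid;

/* Real bump when the feature is built in. */
static inline void demo_flow_cache_bump(void)
{
	demo_flow_cache_genid++;
}
#else
/* Empty stub when the feature is configured out. */
static inline void demo_flow_cache_bump(void)
{
}
#endif

static int demo_last_seqno;

/* Hypothetical notification helper: the bump lives here, in one place. */
static void demo_notify_policyload(int seqno)
{
	demo_last_seqno = seqno;
	demo_flow_cache_bump();
}

/* The caller contains no #ifdef and never names the counter directly. */
static int demo_load_policy(int seqno)
{
	demo_notify_policyload(seqno);
	return 0;
}
```

Because the only reference to the counter sits behind the conditional, the file links whether or not the symbol exists, which is exactly what the failing build above was missing.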

2007-01-26 02:22:40

by Eyal Lebedinsky

Subject: Re: Linux 2.6.20-rc6 - build failure

I should have added that this is on Debian stable:
$ gcc --version
gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)

Eyal Lebedinsky wrote:
> i386
> Practically all modules selected.
>
> Building modules, stage 2.
> MODPOST 1931 modules
> WARNING: drivers/atm/fore_200e.o - Section mismatch: reference to .init.text: from .text between 'fore200e_initialize' (at offset 0x25af) and 'fore200e_monitor_putc'
> WARNING: drivers/atm/lanai.o - Section mismatch: reference to .init.text: from .text between 'sram_test_pass' (at offset 0x1a8) and 'sram_test_and_clear'
> WARNING: drivers/atm/zatm.o - Section mismatch: reference to .init.text: from .text after 'zatm_init_one' (at offset 0x1f25)
> WARNING: drivers/atm/zatm.o - Section mismatch: reference to .init.text: from .text after 'zatm_init_one' (at offset 0x1f32)
> WARNING: drivers/net/rrunner.o - Section mismatch: reference to .init.text:rr_init from .text between 'rr_init_one' (at offset 0x1d0) and 'rr_remove_one'
> WARNING: drivers/net/sis900.o - Section mismatch: reference to .init.text:sis900_mii_probe from .text between 'sis900_probe' (at offset 0x47b) and 'sis900_default_phy'
> WARNING: drivers/net/sunhme.o - Section mismatch: reference to .init.text: from .text between 'happy_meal_pci_probe' (at offset 0x2add) and 'happy_meal_pci_remove'
> WARNING: drivers/net/tokenring/3c359.o - Section mismatch: reference to .init.text:xl_init from .text between 'xl_probe' (at offset 0x1da) and 'xl_hw_reset'
> WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to .init.text: from .text between 'de_init_one' (at offset 0x2151) and 'de_remove_one'
> WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to .init.text: from .text between 'de_init_one' (at offset 0x2158) and 'de_remove_one'
> WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to .init.text: from .text between 'de_init_one' (at offset 0x2220) and 'de_remove_one'
> WARNING: "__udivdi3" [fs/ocfs2/ocfs2.ko] undefined!
> make[1]: *** [__modpost] Error 1

--
Eyal Lebedinsky ([email protected]) <http://samba.org/eyal/>
attach .zip as .dat

2007-01-26 17:00:11

by Venkat Yekkirala

Subject: RE: Linux 2.6.20-rc6

> Venkat, I think we should fix this by embedding the flow_cache_genid
> bumps into selinux_netlbl_cache_invalidate() and
> selnl_notify_policyload(), or something like that.
>

Sure. Sending a patch under separate cover in a minute.

> That way we don't have to pepper CONFIG_XFRM ifdefs all over
> services.c
>
> Any better ideas? I'm actually surprised this code gets built
> at all when CONFIG_XFRM is disabled to be honest. :-)

Me too :)

My apologies...

2007-01-26 18:10:32

by Adrian Bunk

Subject: 2.6.20-rc6: known unfixed regressions (part 1)

This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider possibly involved
in some other way with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject : NULL pointer dereference at as_move_to_dispatch()
References : http://lkml.org/lkml/2007/1/22/141
Submitter : Andrew Vasquez <[email protected]>
Status : unknown


Subject : raid1: copying a big file triggers OOM killer
References : http://lkml.org/lkml/2007/1/20/69
Submitter : Justin Piszcz <[email protected]>
Handled-By : Jens Axboe <[email protected]>
Status : problem is being debugged


Subject : ext3 with data=journal hangs when running fsx-linux since -rc2
References : http://bugzilla.kernel.org/show_bug.cgi?id=7844
Submitter : Randy Dunlap <[email protected]>
Handled-By : Linus Torvalds <[email protected]>
             Randy Dunlap <[email protected]>
             Andrew Morton <[email protected]>
Status : problem is being debugged


Subject : reboot instead of powerdown (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2006/12/25/40
             http://bugzilla.kernel.org/show_bug.cgi?id=7828
Submitter : Berthold Cogel <[email protected]>
            François Valenduc <[email protected]>
Handled-By : Alan Stern <[email protected]>
Status : problem is being debugged


Subject : usb somehow broken (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2007/1/11/146
Submitter : Prakash Punnoor <[email protected]>
Handled-By : Oliver Neukum <[email protected]>
             Alan Stern <[email protected]>
Status : problem is being debugged


Subject : fix geode_configure()
References : http://lkml.org/lkml/2007/1/9/216
Submitter : Lennart Sorensen <[email protected]>
Caused-By : takada <[email protected]>
            commit e4f0ae0ea63caceff37a13f281a72652b7ea71ba
Handled-By : takada <[email protected]>
             Lennart Sorensen <[email protected]>
Status : patches are being discussed



2007-01-26 18:11:29

by Adrian Bunk

Subject: 2.6.20-rc6: known unfixed regressions (part 2)

This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider possibly involved
in some other way with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject : problems with CD burning
References : http://www.spinics.net/lists/linux-ide/msg06545.html
Submitter : Uwe Bugla <[email protected]>
Status : unknown


Subject : pktcdvd fails with pata_amd
References : http://bugzilla.kernel.org/show_bug.cgi?id=7810
             http://lkml.org/lkml/2007/1/25/128
Submitter : Gerhard Dirschl <[email protected]>
Caused-By : Christoph Hellwig <[email protected]>
            commit 406c9b605cbc45151c03ac9a3f95e9acf050808c
            commit 3b00315799d78f76531b71435fbc2643cd71ae4c
Status : problem is being debugged


Subject : SELinux compile error with CONFIG_XFRM=n
References : http://lkml.org/lkml/2007/1/25/233
Submitter : Michal Piotrowski <[email protected]>
Caused-By : Venkat Yekkirala <[email protected]>
            commit 334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e
Handled-By : Venkat Yekkirala <[email protected]>
             David Miller <[email protected]>
Status : problem is being discussed


Subject : powerpc64: performance monitor exception
References : http://ozlabs.org/pipermail/linuxppc-dev/2007-January/030045.html
Submitter : Livio Soares <[email protected]>
Caused-By : Paul Mackerras <[email protected]>
            commit d04c56f73c30a5e593202ecfcf25ed43d42363a2
Status : problem is being discussed


Subject : BUG: at fs/inotify.c:172 set_dentry_child_flags()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7785
Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
Handled-By : Nick Piggin <[email protected]>
Status : problem is being debugged


Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
References : http://lkml.org/lkml/2007/1/7/117
             http://lkml.org/lkml/2007/1/10/202
Submitter : Malte Schröder <[email protected]>
Handled-By : Vladimir V. Saveliev <[email protected]>
             Nick Piggin <[email protected]>
Patch : http://lkml.org/lkml/2007/1/10/202
Status : problem is being discussed


Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (XFS)
References : http://lkml.org/lkml/2007/1/5/308
             http://lkml.org/lkml/2007/1/23/190
             http://lkml.org/lkml/2007/1/23/192
Submitter : Sami Farin <[email protected]>
Handled-By : David Chinner <[email protected]>
Status : patches are being discussed

2007-01-26 18:16:34

by Malte Schröder

Subject: Re: 2.6.20-rc6: known unfixed regressions (part 2)

On Friday 26 January 2007 19:11, Adrian Bunk wrote:
> Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
> References : http://lkml.org/lkml/2007/1/7/117
>              http://lkml.org/lkml/2007/1/10/202
> Submitter : Malte Schröder <[email protected]>
> Handled-By : Vladimir V. Saveliev <[email protected]>
>              Nick Piggin <[email protected]>
> Patch : http://lkml.org/lkml/2007/1/10/202
> Status : problem is being discussed

This did not happen again.


--
---------------------------------------
Malte Schröder
[email protected]
ICQ# 68121508
---------------------------------------



2007-01-26 18:18:48

by Adrian Bunk

Subject: 2.6.20-rc6: known regressions with patches

This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
with patches available.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author
of a patch that caused a breakage, or someone I consider possibly involved
in some other way with one or more of these issues.

Due to the large number of recipients, please trim the Cc when answering.


Subject : RAID: annoying "hunk_aligned_read : non aligned" messages
References : http://bugzilla.kernel.org/show_bug.cgi?id=7835
Submitter : Duncan <[email protected]>
Handled-By : Neil Brown <[email protected]>
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=7835
Status : patch available


Subject : ACPI: fix cpufreq regression
References : http://lkml.org/lkml/2007/1/16/120
Submitter : Ingo Molnar <[email protected]>
Caused-By : Dave Jones <[email protected]>
            commit 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e
Handled-By : Ingo Molnar <[email protected]>
Patch : http://lkml.org/lkml/2007/1/16/120
Status : patch available


Subject : MIPS Malta: CONFIG_MTD=n compile error
References : http://lkml.org/lkml/2007/1/25/122
Submitter : Jan Altenberg <[email protected]>
Caused-By : Ralf Baechle <[email protected]>
            commit b228f4c54df37b53c6f364aa7f3efa4280bcc4f0
Handled-By : Jan Altenberg <[email protected]>
Patch : http://lkml.org/lkml/2007/1/25/122
Status : patch available


Subject : NFS triggers WARN_ON() in invalidate_inode_pages2_range()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7826
Submitter : Andrew Clayton <[email protected]>
Caused-By : Andrew Morton <[email protected]>
            commit 8258d4a574d3a8c01f0ef68aa26b969398a0e140
Handled-By : Trond Myklebust <[email protected]>
Patch : http://lkml.org/lkml/2007/1/24/323
Status : patch available

2007-01-26 18:49:36

by Adrian Bunk

Subject: [2.6 patch] fix OCFS2 compile error

On Fri, Jan 26, 2007 at 01:22:34PM +1100, Eyal Lebedinsky wrote:
> I should have added that this is on Debian stable:
> $ gcc --version
> gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
>
> Eyal Lebedinsky wrote:
> > i386
> > Practically all modules selected.
> >
> > Building modules, stage 2.
> > MODPOST 1931 modules
>...
> > WARNING: "__udivdi3" [fs/ocfs2/ocfs2.ko] undefined!
> > make[1]: *** [__modpost] Error 1

Thanks for your report.

I don't know why gcc 3.3 generates this now since ocfs2_backup_super_blkno()
seems to be unused, but there is a bug for 32 bit systems that should be
fixed:

Commit 50af94b14c98f5769860a282a397c6f3b135c8a8 adds:
  offset /= sb->s_blocksize;

That is:
  u64 = u64 / long

Not a problem on 64-bit architectures, but on 32-bit architectures gcc
turns the 64-bit division into a call to __udivdi3, which the kernel
does not provide.

This patch fixes it by using sb->s_blocksize_bits instead of sb->s_blocksize.

Signed-off-by: Adrian Bunk <[email protected]>

--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -587,7 +587,7 @@ static inline u64 ocfs2_backup_super_blkno(struct super_block *sb, int index)
 
 	if (index >= 0 && index < OCFS2_MAX_BACKUP_SUPERBLOCKS) {
 		offset <<= (2 * index);
-		offset /= sb->s_blocksize;
+		offset >>= sb->s_blocksize_bits;
 		return offset;
 	}
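
For readers who want to check the equivalence behind this change, here is a
minimal user-space sketch. It assumes the block size is a power of two and
that s_blocksize_bits is its base-2 logarithm. struct sb_sketch and both
helper names are hypothetical stand-ins, not kernel code. In user space
libgcc supplies __udivdi3, so both forms link there; only the shift form
avoids the helper call on a 32-bit kernel build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct super_block fields used here. */
struct sb_sketch {
	unsigned long s_blocksize;      /* block size in bytes, a power of two */
	unsigned char s_blocksize_bits; /* log2(s_blocksize) */
};

/* Byte offset to block number with a 64-bit division (pre-patch form). */
uint64_t blkno_by_division(const struct sb_sketch *sb, uint64_t offset)
{
	return offset / sb->s_blocksize;
}

/* Byte offset to block number with a right shift (patched form). */
uint64_t blkno_by_shift(const struct sb_sketch *sb, uint64_t offset)
{
	/* The shift only matches the division if the two fields agree. */
	assert(sb->s_blocksize == 1UL << sb->s_blocksize_bits);
	return offset >> sb->s_blocksize_bits;
}
```

With a 4096-byte block size (s_blocksize_bits = 12), both helpers map a
byte offset of 1 GiB to the same block number.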



2007-01-26 18:56:53

by Mark Fasheh

Subject: Re: Linux 2.6.20-rc6 - build failure

On Thu, Jan 25, 2007 at 10:10:00PM +1100, Eyal Lebedinsky wrote:
> WARNING: "__udivdi3" [fs/ocfs2/ocfs2.ko] undefined!

Doh! This one is almost definitely my fault. Does the attached patch get rid
of that warning for you?


From: Mark Fasheh <[email protected]>
ocfs2: fix thinko in ocfs2_backup_super_blkno()

Fix a bug which was introduced when I synced up ocfs2_fs.h with ocfs2-tools.
We can't do u64/u32 in kernel.

Signed-off-by: Mark Fasheh <[email protected]>
---
fs/ocfs2/ocfs2_fs.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h
index c99e905..e61e218 100644
--- a/fs/ocfs2/ocfs2_fs.h
+++ b/fs/ocfs2/ocfs2_fs.h
@@ -587,7 +587,7 @@ static inline u64 ocfs2_backup_super_blkno(struct super_block *sb, int index)
 
 	if (index >= 0 && index < OCFS2_MAX_BACKUP_SUPERBLOCKS) {
 		offset <<= (2 * index);
-		offset /= sb->s_blocksize;
+		offset >>= sb->s_blocksize_bits;
 		return offset;
 	}

--
1.4.4.2

2007-01-26 19:04:46

by Michal Piotrowski

Subject: Re: 2.6.20-rc6: known unfixed regressions (part 2)

Hi,

Adrian Bunk wrote:
> This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
> that are not yet fixed in Linus' tree.
>
> If you find your name in the Cc header, you are either submitter of one
> of the bugs, maintainer of an affected subsystem or driver, a patch
> of yours caused a breakage or I'm considering you in any other way possibly
> involved with one or more of these issues.
>
> Due to the huge number of recipients, please trim the Cc when answering.
>
>
> Subject : SELinux compile error with CONFIG_XFRM=n
> References : http://lkml.org/lkml/2007/1/25/233
> Submitter : Michal Piotrowski <[email protected]>
> Caused-By : Venkat Yekkirala <[email protected]>
> commit 334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e
> Handled-By : Venkat Yekkirala <[email protected]>
> David Miller <[email protected]>
> Status : problem is being discussed
>

Venkat Yekkirala has sent me a patch that fixed this build problem.

Thanks!

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

2007-01-26 19:10:48

by Venkat Yekkirala

Subject: RE: 2.6.20-rc6: known unfixed regressions (part 2)

FYI- The patch can be viewed here:

http://marc.theaimsgroup.com/?l=linux-netdev&m=116983297300536&w=2

> -----Original Message-----
> From: Michal Piotrowski [mailto:[email protected]]
> Sent: Friday, January 26, 2007 1:04 PM
> To: Adrian Bunk
> Cc: Linus Torvalds; Andrew Morton; Linux Kernel Mailing List;
> Uwe Bugla;
> [email protected]; [email protected];
> [email protected]; Gerhard Dirschl; Christoph Hellwig;
> [email protected]; Michal Piotrowski; Venkat Yekkirala; David Miller;
> [email protected]; [email protected]; [email protected]; Livio
> Soares; Paul Mackerras; [email protected]; Cijoml Cijomlovic
> Cijomlov; Nick Piggin; [email protected]; [email protected]; Malte
> Schröder; Vladimir V. Saveliev; [email protected]; Sami Farin;
> David Chinner; [email protected]
> Subject: Re: 2.6.20-rc6: known unfixed regressions (part 2)
>
>
> Hi,
>
> Adrian Bunk wrote:
> > This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
> > that are not yet fixed in Linus' tree.
> >
> > If you find your name in the Cc header, you are either submitter of one
> > of the bugs, maintainer of an affected subsystem or driver, a patch
> > of yours caused a breakage or I'm considering you in any other way possibly
> > involved with one or more of these issues.
> >
> > Due to the huge number of recipients, please trim the Cc when answering.
> >
> >
> > Subject : SELinux compile error with CONFIG_XFRM=n
> > References : http://lkml.org/lkml/2007/1/25/233
> > Submitter : Michal Piotrowski <[email protected]>
> > Caused-By : Venkat Yekkirala <[email protected]>
> > commit 334c85569b8adeaa820c0f2fab3c8f0a9dc8b92e
> > Handled-By : Venkat Yekkirala <[email protected]>
> > David Miller <[email protected]>
> > Status : problem is being discussed
> >
>
> Venkat Yekkirala has sent me a patch that fixed this build problem.
>
> Thanks!
>
> Regards,
> Michal
>
> --
> Michal K. K. Piotrowski
> LTG - Linux Testers Group
> (http://www.stardust.webpages.pl/ltg/)
>

2007-01-26 19:50:33

by Mark Fasheh

Subject: Re: [2.6 patch] fix OCFS2 compile error

On Fri, Jan 26, 2007 at 07:49:42PM +0100, Adrian Bunk wrote:
> I don't know why gcc 3.3 generates this now since ocfs2_backup_super_blkno()
> seems to be unused, but there is a bug for 32 bit systems that should be
> fixed:
>
> Commit 50af94b14c98f5769860a282a397c6f3b135c8a8 adds:
> offset /= sb->s_blocksize;
>
> That is:
> u64 = u64 / long
>
> Not a problem on 64bit architectures, but obviously a problem on 32 bit
> architectures.
>
> This patch fixes it by using sb->s_blocksize_bits instead of sb->s_blocksize.

Thanks Adrian... I think we both hit upon this at about the same time :) My
patch made it into ocfs2.git first (for obvious reasons)...
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[email protected]

2007-01-26 19:53:14

by Adrian Bunk

Subject: Re: [2.6 patch] fix OCFS2 compile error

On Fri, Jan 26, 2007 at 11:47:13AM -0800, Mark Fasheh wrote:
> On Fri, Jan 26, 2007 at 07:49:42PM +0100, Adrian Bunk wrote:
> > I don't know why gcc 3.3 generates this now since ocfs2_backup_super_blkno()
> > seems to be unused, but there is a bug for 32 bit systems that should be
> > fixed:
> >
> > Commit 50af94b14c98f5769860a282a397c6f3b135c8a8 adds:
> > offset /= sb->s_blocksize;
> >
> > That is:
> > u64 = u64 / long
> >
> > Not a problem on 64bit architectures, but obviously a problem on 32 bit
> > architectures.
> >
> > This patch fixes it by using sb->s_blocksize_bits instead of sb->s_blocksize.
>
> Thanks Adrian... I think we both hit upon this at about the same time :) My
> patch made it into ocfs2.git first (for obvious reasons)...

That's unfair. ;-)

> --Mark

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-01-27 17:27:54

by Adrian Bunk

Subject: Re: 2.6.20-rc6: known unfixed regressions (part 2)

On Fri, Jan 26, 2007 at 07:16:25PM +0100, Malte Schröder wrote:
> On Friday 26 January 2007 19:11, Adrian Bunk wrote:
> > Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
> > References : http://lkml.org/lkml/2007/1/7/117
> > http://lkml.org/lkml/2007/1/10/202
> > Submitter : Malte Schröder <[email protected]>
> > Handled-By : Vladimir V. Saveliev <[email protected]>
> > Nick Piggin <[email protected]>
> > Patch : http://lkml.org/lkml/2007/1/10/202
> > Status : problem is being discussed
>
> This did not happen again.

Unless I misunderstand this issue, we want this fixed before 2.6.20
because otherwise many people might see this BUG message.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-01-27 17:31:55

by Adrian Bunk

Subject: 2.6.20-rc6: known unfixed regressions (v2) (part 1)

This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject : NULL pointer dereference at as_move_to_dispatch()
References : http://lkml.org/lkml/2007/1/22/141
Submitter : Andrew Vasquez <[email protected]>
Status : unknown


Subject : reboot instead of powerdown (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2006/12/25/40
             http://bugzilla.kernel.org/show_bug.cgi?id=7828
Submitter : Berthold Cogel <[email protected]>
            François Valenduc <[email protected]>
Handled-By : Alan Stern <[email protected]>
Status : problem is being debugged


Subject : usb somehow broken (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2007/1/11/146
Submitter : Prakash Punnoor <[email protected]>
Handled-By : Oliver Neukum <[email protected]>
             Alan Stern <[email protected]>
Status : problem is being debugged


Subject : fix geode_configure()
References : http://lkml.org/lkml/2007/1/9/216
Submitter : Lennart Sorensen <[email protected]>
Caused-By : takada <[email protected]>
            commit e4f0ae0ea63caceff37a13f281a72652b7ea71ba
Handled-By : takada <[email protected]>
             Lennart Sorensen <[email protected]>
Status : patches are being discussed

2007-01-27 17:39:00

by Adrian Bunk

Subject: Re: 2.6.20-rc6: known unfixed regressions (part 2)

On Sat, Jan 27, 2007 at 06:28:01PM +0100, Adrian Bunk wrote:
> On Fri, Jan 26, 2007 at 07:16:25PM +0100, Malte Schröder wrote:
> > On Friday 26 January 2007 19:11, Adrian Bunk wrote:
> > > Subject : BUG: at mm/truncate.c:60 cancel_dirty_page() (reiserfs)
> > > References : http://lkml.org/lkml/2007/1/7/117
> > > http://lkml.org/lkml/2007/1/10/202
> > > Submitter : Malte Schröder <[email protected]>
> > > Handled-By : Vladimir V. Saveliev <[email protected]>
> > > Nick Piggin <[email protected]>
> > > Patch : http://lkml.org/lkml/2007/1/10/202
> > > Status : problem is being discussed
> >
> > This did not happen again.
>
> Unless I misunderstand this issue, we want this fixed before 2.6.20
> because otherwise many people might see this BUG message.

/me looks:
The warning was already removed post -rc6.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-01-27 17:42:23

by Adrian Bunk

Subject: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject : problems with CD burning
References : http://www.spinics.net/lists/linux-ide/msg06545.html
Submitter : Uwe Bugla <[email protected]>
Status : unknown


Subject : pktcdvd fails with pata_amd
References : http://bugzilla.kernel.org/show_bug.cgi?id=7810
             http://lkml.org/lkml/2007/1/25/128
Submitter : Gerhard Dirschl <[email protected]>
Caused-By : Christoph Hellwig <[email protected]>
            commit 3b00315799d78f76531b71435fbc2643cd71ae4c
            commit 406c9b605cbc45151c03ac9a3f95e9acf050808c
Status : problem is being debugged


Subject : powerpc64: performance monitor exception
References : http://ozlabs.org/pipermail/linuxppc-dev/2007-January/030045.html
Submitter : Livio Soares <[email protected]>
Caused-By : Paul Mackerras <[email protected]>
            commit d04c56f73c30a5e593202ecfcf25ed43d42363a2
Status : problem is being discussed


Subject : BUG: at fs/inotify.c:172 set_dentry_child_flags()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7785
Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
Handled-By : Nick Piggin <[email protected]>
Status : problem is being debugged


2007-01-27 17:44:13

by Adrian Bunk

Subject: 2.6.20-rc6: known regressions with patches (v2)

This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
with patches available.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject : MIPS Malta: CONFIG_MTD=n compile error
References : http://lkml.org/lkml/2007/1/25/122
Submitter : Jan Altenberg <[email protected]>
Caused-By : Ralf Baechle <[email protected]>
            commit b228f4c54df37b53c6f364aa7f3efa4280bcc4f0
Handled-By : Jan Altenberg <[email protected]>
Patch : http://lkml.org/lkml/2007/1/25/122
Status : patch available


Subject : NFS triggers WARN_ON() in invalidate_inode_pages2_range()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7826
Submitter : Andrew Clayton <[email protected]>
Caused-By : Andrew Morton <[email protected]>
            commit 8258d4a574d3a8c01f0ef68aa26b969398a0e140
Handled-By : Trond Myklebust <[email protected]>
Patch : http://lkml.org/lkml/2007/1/24/323
Status : patch available



2007-01-27 18:01:34

by Linus Torvalds

Subject: Re: 2.6.20-rc6: known unfixed regressions (part 2)



On Sat, 27 Jan 2007, Adrian Bunk wrote:
>
> Unless I misunderstand this issue, we want this fixed before 2.6.20
> because otherwise many people might see this BUG message.

The warning was removed for other reasons, I'll re-instate it after 2.6.20
has been released so that we can resolve all filesystem things (as it is,
right now, it's not any worse than it ever has been, and the kernel will
shut up about filesystems doing something strange).

But I think the Reiserfs problem already got fixed, and wouldn't warn any
more even if the WARN_ON() was still there..

Linus

2007-01-27 20:47:07

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - suspend lockdep warning

On Wed, 2007-01-24 at 18:58 -0800, Linus Torvalds wrote:
> It's been more than a week since -rc5, but I blame everybody (including
> me) being away for Linux.conf.au and then me waiting for a few days
> afterwards to let everybody sync up.

2.6.20-rc6-git (today) on a dual core laptop:

PM: Preparing system for mem sleep
Disabling non-boot CPUs ...

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc6 #3
-------------------------------------------------------
pm-suspend/3601 is trying to acquire lock:
(cpu_bitmask_lock){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f

but task is already holding lock:
(workqueue_mutex){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (workqueue_mutex){--..}:
[<c0140880>] __lock_acquire+0x8dd/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<c0136d14>] __create_workqueue+0x61/0x136
[<f8bfe62e>] cpufreq_governor_dbs+0xa1/0x30e [cpufreq_ondemand]
[<c02b2c3c>] __cpufreq_governor+0x9e/0xd2
[<c02b2df7>] __cpufreq_set_policy+0x187/0x209
[<c02b3056>] store_scaling_governor+0x164/0x1b1
[<c02b24f9>] store+0x37/0x48
[<c01aeb8d>] sysfs_write_file+0xb3/0xdb
[<c0175e0f>] vfs_write+0xaf/0x163
[<c017645d>] sys_write+0x3d/0x61
[<c0103f8c>] sysenter_past_esp+0x5d/0x99
[<ffffffff>] 0xffffffff

-> #2 (dbs_mutex){--..}:
[<c0140880>] __lock_acquire+0x8dd/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<f8bfe612>] cpufreq_governor_dbs+0x85/0x30e [cpufreq_ondemand]
[<c02b2c3c>] __cpufreq_governor+0x9e/0xd2
[<c02b2df7>] __cpufreq_set_policy+0x187/0x209
[<c02b3056>] store_scaling_governor+0x164/0x1b1
[<c02b24f9>] store+0x37/0x48
[<c01aeb8d>] sysfs_write_file+0xb3/0xdb
[<c0175e0f>] vfs_write+0xaf/0x163
[<c017645d>] sys_write+0x3d/0x61
[<c0103f8c>] sysenter_past_esp+0x5d/0x99
[<ffffffff>] 0xffffffff

-> #1 (&policy->lock){--..}:
[<c0140880>] __lock_acquire+0x8dd/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<c02b2ea2>] cpufreq_set_policy+0x29/0x79
[<c02b3804>] cpufreq_add_dev+0x342/0x48a
[<c025a799>] sysdev_driver_register+0x5f/0xa9
[<c02b2ad5>] cpufreq_register_driver+0xac/0x175
[<c046fd3f>] centrino_init+0x9b/0xa2
[<c01004c4>] init+0x11b/0x2c8
[<c0104c87>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff

-> #0 (cpu_bitmask_lock){--..}:
[<c0140781>] __lock_acquire+0x7de/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<c0144840>] lock_cpu_hotplug+0x6c/0x78
[<c02b3264>] cpufreq_driver_target+0x28/0x5e
[<c02b398e>] cpufreq_cpu_callback+0x42/0x52
[<c0133cd3>] notifier_call_chain+0x20/0x31
[<c0133d00>] raw_notifier_call_chain+0x8/0xa
[<c014452e>] _cpu_down+0x47/0x1fb
[<c01448c7>] disable_nonboot_cpus+0x7b/0x100
[<c014853f>] enter_state+0x91/0x1bb
[<c01486ef>] state_store+0x86/0x9c
[<c01ae8e8>] subsys_attr_store+0x20/0x25
[<c01aeb8d>] sysfs_write_file+0xb3/0xdb
[<c0175e0f>] vfs_write+0xaf/0x163
[<c017645d>] sys_write+0x3d/0x61
[<c0103f8c>] sysenter_past_esp+0x5d/0x99
[<ffffffff>] 0xffffffff

other info that might help us debug this:

4 locks held by pm-suspend/3601:
#0: (pm_mutex){--..}, at: [<c01484ee>] enter_state+0x40/0x1bb
#1: (cpu_add_remove_lock){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f
#2: (cache_chain_mutex){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f
#3: (workqueue_mutex){--..}, at: [<c032cd2b>] mutex_lock+0x1c/0x1f

stack backtrace:
[<c0104ff6>] show_trace_log_lvl+0x1a/0x2f
[<c01056af>] show_trace+0x12/0x14
[<c0105761>] dump_stack+0x16/0x18
[<c013f101>] print_circular_bug_tail+0x5f/0x68
[<c0140781>] __lock_acquire+0x7de/0xa04
[<c0140c90>] lock_acquire+0x56/0x6f
[<c032cb80>] __mutex_lock_slowpath+0xe5/0x274
[<c032cd2b>] mutex_lock+0x1c/0x1f
[<c0144840>] lock_cpu_hotplug+0x6c/0x78
[<c02b3264>] cpufreq_driver_target+0x28/0x5e
[<c02b398e>] cpufreq_cpu_callback+0x42/0x52
[<c0133cd3>] notifier_call_chain+0x20/0x31
[<c0133d00>] raw_notifier_call_chain+0x8/0xa
[<c014452e>] _cpu_down+0x47/0x1fb
[<c01448c7>] disable_nonboot_cpus+0x7b/0x100
[<c014853f>] enter_state+0x91/0x1bb
[<c01486ef>] state_store+0x86/0x9c
[<c01ae8e8>] subsys_attr_store+0x20/0x25
[<c01aeb8d>] sysfs_write_file+0xb3/0xdb
[<c0175e0f>] vfs_write+0xaf/0x163
[<c017645d>] sys_write+0x3d/0x61
[<c0103f8c>] sysenter_past_esp+0x5d/0x99
=======================
Breaking affinity for irq 1
Breaking affinity for irq 12
Breaking affinity for irq 21
Breaking affinity for irq 22
Breaking affinity for irq 219
CPU 1 is now offline
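
The cycle in the report above can be reproduced with a small user-space
sketch. This is only one reading of the chain, assuming that each entry #1
to #3 was acquired while the lock in the entry below it was held; it is not
lockdep's actual implementation. The enum, struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The four locks named in the report. */
enum lock_id {
	CPU_BITMASK_LOCK,
	POLICY_LOCK,
	DBS_MUTEX,
	WORKQUEUE_MUTEX,
	NR_LOCKS
};

/* held_before[a][b] is true when b has been acquired while a was held. */
struct lock_graph {
	bool held_before[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: is "to" reachable from "from" along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->held_before[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* True if taking "wanted" while holding "held" would close a cycle. */
bool would_deadlock(const struct lock_graph *g, int held, int wanted)
{
	bool seen[NR_LOCKS] = { false };

	assert(held >= 0 && held < NR_LOCKS);
	assert(wanted >= 0 && wanted < NR_LOCKS);
	/* A cycle forms when "held" is already reachable from "wanted". */
	return reachable(g, wanted, held, seen);
}

/* Dependencies as read from chain entries #1..#3 (an interpretation). */
struct lock_graph graph_from_report(void)
{
	struct lock_graph g = { 0 };

	g.held_before[CPU_BITMASK_LOCK][POLICY_LOCK] = true;
	g.held_before[POLICY_LOCK][DBS_MUTEX] = true;
	g.held_before[DBS_MUTEX][WORKQUEUE_MUTEX] = true;
	return g;
}
```

Under that reading, asking for cpu_bitmask_lock while holding
workqueue_mutex closes the loop, which is the situation pm-suspend is in.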


2007-01-27 20:55:05

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Wed, 2007-01-24 at 18:58 -0800, Linus Torvalds wrote:
> It's been more than a week since -rc5, but I blame everybody (including
> me) being away for Linux.conf.au and then me waiting for a few days
> afterwards to let everybody sync up.

Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd
(sky2: power management/MSI workaround) makes the problem go away.

With the commit it breaks sky2 resume on my laptop:

1. request_irq in early resume is triggering:
BUG: sleeping function called from invalid context
at /home/tglx/work/kernel/vanilla/linux-2.6/mm/slab.c:3034

This is easily resolvable by moving the request_irq into the normal resume
path. There is no need to have this in early resume.

2. The network device is unusable after resume. The only way to resurrect
it is: rmmod/insmod.

The reason is that the driver grabs the normal PCI irq on resume, but
the pci express resume routes it away. All we get is an unhandled
spurious interrupt on the irq line which was used by the net device
before suspend:

irq 219, desc: c045bb80, depth: 0, count: 9607, unhandled: 0
->handle_irq(): c0155c20, handle_bad_irq+0x0/0x1f0
->chip(): c0418920, no_irq_chip+0x0/0x40
->action(): 00000000
IRQ_DISABLED set
unexpected IRQ trap at vector db

tglx


2007-01-27 22:11:11

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - suspend / resume ata_piix

On Wed, 2007-01-24 at 18:58 -0800, Linus Torvalds wrote:
> It's been more than a week since -rc5, but I blame everybody (including
> me) being away for Linux.conf.au and then me waiting for a few days
> afterwards to let everybody sync up.

ata_piix survives exactly one suspend/resume cycle. After resuming the
second time the disk is no longer usable.

After the first resume a simple "emacs -nw bla.txt" already takes ~45 sec
to launch, but there are no kernel messages.

During the second resume the ATA interrupt gets disabled due to an
unhandled interrupt.

This is 100% reproducible. So I can provide as much info as needed.

tglx

Boot:

SCSI subsystem initialized
libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00ac7
ata_piix 0000:00:1f.2: MAP [ P0 P2 XX XX ]
ata_piix 0000:00:1f.2: invalid MAP value 0
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 22 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x18D0 ctl 0x18C6 bmdma 0x18B0 irq 21
ata2: SATA max UDMA/133 cmd 0x18C8 ctl 0x18C2 bmdma 0x18B8 irq 21
scsi0 : ata_piix
PM: Adding info for No Bus:host0
ata1.00: ATA-7, max UDMA/133, 195371568 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
PM: Adding info for No Bus:host1
scsi 0:0:0:0: Direct-Access ATA ST9100824AS 3.14 PQ: 0 ANSI: 5
PM: Adding info for scsi:0:0:0:0
SCSI device sda: 195371568 512-byte hdwr sectors (100030 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sda: 195371568 512-byte hdwr sectors (100030 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 < sda5 > sda3
sd 0:0:0:0: Attached scsi disk sda

1st Suspend:

ata_piix 0000:00:1f.2: suspend
ACPI: PCI interrupt for device 0000:00:1f.2 disabled
PIIX_IDE 0000:00:1f.1: suspend
....
PIIX_IDE 0000:00:1f.1: LATE suspend

1st Resume:

ata1.00: configured for UDMA/133
SCSI device sda: 195371568 512-byte hdwr sectors (100030 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

2nd Suspend as above

2nd resume:
Initializing CPU#1
irq 21: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<f88e5b02>] (ata_interrupt+0x0/0x1cd [libata])
Disabling IRQ #21
....
ata1.00: qc timeout (cmd 0xe1)
ata1.00: failed to spin up (err_mask=0x4)
ata1.00: failed to set xfermode (err_mask=0x40)
ata1.00: limiting speed to UDMA/100
ata1: failed to recover some devices, retrying in 5 secs
....
sd 0:0:0:0: rejecting I/O to offline device
....
ata1.00: ATA-7, max UDMA/133, 195371568 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
sd 0:0:0:0: rejecting I/O to offline device
.....


2007-01-27 22:40:19

by Jeff Garzik

Subject: Re: Linux 2.6.20-rc6 - suspend / resume ata_piix

Thomas Gleixner wrote:
> On Wed, 2007-01-24 at 18:58 -0800, Linus Torvalds wrote:
>> It's been more than a week since -rc5, but I blame everybody (including
>> me) being away for Linux.conf.au and then me waiting for a few days
>> afterwards to let everybody sync up.
>
> ata_piix survives exactly one suspend/resume cycle. After resuming the
> second time the disk is no longer usable.
>
> After the first resume a simple "emacs -nw bla.txt" already takes ~45 sec
> to launch, but there are no kernel messages.
>
> During the second resume the ATA interrupt gets disabled due to an
> unhandled interrupt.
>
> This is 100% reproducible. So I can provide as much info as needed.

Is this a regression, or behavior that's always been present?

If it's a regression, what changeset caused the problem?

Jeff



2007-01-27 22:43:31

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - suspend / resume ata_piix

On Sat, 2007-01-27 at 17:40 -0500, Jeff Garzik wrote:
> > During the second resume the ATA interrupt gets disabled due to an
> > unhandled interrupt.
> >
> > This is 100% reproducible. So I can provide as much info as needed.
>
> Is this a regression, or behavior that's always been present?

No, that's new. Something post 2.6.19

> If it's a regression, what changeset caused the problem?

Hey. I just discovered that crap. I'm going to bisect tomorrow. Bed time
here in good old Europe. :)

tglx


2007-01-28 13:34:18

by Uwe Bugla

Subject: Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)


-------- Original Message --------
Date: Sat, 27 Jan 2007 18:42:30 +0100
From: Adrian Bunk <[email protected]>
To: Linus Torvalds <[email protected]>, Andrew Morton <[email protected]>
Subject: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

> This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
> that are not yet fixed in Linus' tree.
>
> If you find your name in the Cc header, you are either submitter of one
> of the bugs, maintainer of an affected subsystem or driver, a patch
> of yours caused a breakage or I'm considering you in any other way possibly
> involved with one or more of these issues.
>
> Due to the huge number of recipients, please trim the Cc when answering.
>
>
> Subject : problems with CD burning
> References : http://www.spinics.net/lists/linux-ide/msg06545.html
> Submitter : Uwe Bugla <[email protected]>
> Status : unknown
>
>
> Subject : pktcdvd fails with pata_amd
> References : http://bugzilla.kernel.org/show_bug.cgi?id=7810
> http://lkml.org/lkml/2007/1/25/128
> Submitter : Gerhard Dirschl <[email protected]>
> Caused-By : Christoph Hellwig <[email protected]>
> commit 3b00315799d78f76531b71435fbc2643cd71ae4c
> commit 406c9b605cbc45151c03ac9a3f95e9acf050808c
> Status : problem is being debugged
>
>
> Subject : powerpc64: performance monitor exception
> References : http://ozlabs.org/pipermail/linuxppc-dev/2007-January/030045.html
> Submitter : Livio Soares <[email protected]>
> Caused-By : Paul Mackerras <[email protected]>
> commit d04c56f73c30a5e593202ecfcf25ed43d42363a2
> Status : problem is being discussed
>
>
> Subject : BUG: at fs/inotify.c:172 set_dentry_child_flags()
> References : http://bugzilla.kernel.org/show_bug.cgi?id=7785
> Submitter : Cijoml Cijomlovic Cijomlov <[email protected]>
> Handled-By : Nick Piggin <[email protected]>
> Status : problem is being debugged
>

Hi everybody,
the problem I already reported for earlier release candidates of kernel 2.6.20
(rc1 to rc5) unfortunately still persists.

The regression has become more extreme. In earlier release candidates
nerolinux recognized my burning devices at least on the first start, and
then never again on any later start. In rc6 the situation is different:

The CD and DVD burning devices aren't recognized even once, and the drive
seek errors I already reported are still there.

nerolinux runs excellently with kernel 2.6.19.2, but only shows an
“image recorder” (i.e. no burning device at all) in kernel 2.6.20-rc6.

I still hope that this terrible bug will not be part of the final version
of 2.6.20!

Regards

Uwe

P. S.: I already reported that 2.6.20-rc4-mm1 is not bootable at all.


2007-01-28 22:05:03

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - suspend / resume ata_piix

On Sat, 2007-01-27 at 23:44 +0100, Thomas Gleixner wrote:
> > If it's a regression, what changeset caused the problem?
>
> Hey. I just discovered that crap. I'm going to bisect tomorrow. Bed time
> here in good old Europe. :)

It seems to be there in 2.6.18 already, although it takes more
suspend/resume cycles to show up. So it's just the surfacing of some
longer-standing problem that went unnoticed.

tglx


2007-01-29 06:26:17

by Mike Galbraith

Subject: Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

On Sun, 2007-01-28 at 14:33 +0100, Uwe Bugla wrote:
> -------- Original Message --------
> Date: Sat, 27 Jan 2007 18:42:30 +0100
> From: Adrian Bunk <[email protected]>
> To: Linus Torvalds <[email protected]>, Andrew Morton <[email protected]>
> Subject: 2.6.20-rc6: known unfixed regressions (v2) (part 2)
>
> > This email lists some known regressions in 2.6.20-rc6 compared to 2.6.19
> > that are not yet fixed in Linus' tree.
> >
> > If you find your name in the Cc header, you are either submitter of one
> > of the bugs, maintainer of an affected subsystem or driver, a patch
> > of yours caused a breakage or I'm considering you in any other way possibly
> > involved with one or more of these issues.
> >
> > Due to the huge number of recipients, please trim the Cc when answering.
> >
> >
> > Subject : problems with CD burning
> > References : http://www.spinics.net/lists/linux-ide/msg06545.html
> > Submitter : Uwe Bugla <[email protected]>
> > Status : unknown

<snip>

> Hi everybody,
> the problem I already reported for earlier release candidates of kernel 2.6.20
> (rc1 to rc5) unfortunately still persists.
>
> The regression has become more extreme. In earlier release candidates
> nerolinux recognized my burning devices at least on the first start, and
> then never again on any later start. In rc6 the situation is different:
>
> The CD and DVD burning devices aren't recognized even once, and the drive
> seek errors I already reported are still there.
>
> nerolinux runs excellently with kernel 2.6.19.2, but only shows an
> “image recorder” (i.e. no burning device at all) in kernel 2.6.20-rc6.
>
> I still hope that this terrible bug will not be part of the final version
> of 2.6.20!
>
> Regards
>
> Uwe

FWIW, I just tried it with 2.6.20-rc6, and can confirm. Once nero is
run, the kernel never gives up retrying whatever command failed, so I
get...

[ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete DataRequest }
[ 4362.981475] ide: failed opcode was: unknown
[ 4362.986183] hdd: drive not ready for command

endlessly.

-Mike

2007-01-29 07:00:30

by Andrew Morton

Subject: Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

On Mon, 29 Jan 2007 07:26:03 +0100
Mike Galbraith <[email protected]> wrote:

> FWIW, I just tried it with 2.6.20-rc6, and can confirm. Once nero is
> run, the kernel never gives up retrying whatever command failed, so I
> get...
>
> [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> [ 4362.981475] ide: failed opcode was: unknown
> [ 4362.986183] hdd: drive not ready for command
>
> endlessly.

Do you have time to bisect it?

2007-01-29 07:08:28

by Mike Galbraith

Subject: Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)

On Sun, 2007-01-28 at 22:48 -0800, Andrew Morton wrote:
> On Mon, 29 Jan 2007 07:26:03 +0100
> Mike Galbraith <[email protected]> wrote:
>
> > FWIW, I just tried it with 2.6.20-rc6, and can confirm. Once nero is
> > run, the kernel never gives up retrying whatever command failed, so I
> > get...
> >
> > [ 4362.972995] hdd: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> > [ 4362.981475] ide: failed opcode was: unknown
> > [ 4362.986183] hdd: drive not ready for command
> >
> > endlessly.
>
> Do you have time to bisect it?

Unfortunately, I'm git impaired. I am rummaging as we speak though.

-Mike

2007-01-29 07:24:12

by Linus Torvalds

Subject: Re: 2.6.20-rc6: known unfixed regressions (v2) (part 2)



On Mon, 29 Jan 2007, Mike Galbraith wrote:
>
> Unfortunately, I'm git impaired. I am rummaging as we speak though.

Ok, I'm personally heading to bed, but it really should be as simple as

- get the git tree in the first place
- do

git bisect start
git bisect good v2.6.19
git bisect bad v2.6.20-rc2
.. it will pick a point for you to try ..
.. compile, boot, test ..

"git bisect {good|bad}" depending on results

- until (found)

(Of course, you should check that -rc2 really is bad to make sure. I think
that's what Uwe reported, though. And I don't think we've done anything
after -rc2 that could impact this, so I don't doubt it).

Linus
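
The loop described above is a binary search over the commits between a known-good and a known-bad point. A minimal sketch in C, assuming a purely linear history indexed by integers and a hypothetical `is_bad()` test standing in for "compile, boot, test" (real `git bisect` also copes with merges, which this ignores):

```c
/* Hypothetical stand-in for "compile, boot, test": returns 1 if the
 * commit at index `commit` shows the bug. The regression is assumed
 * to have been introduced at index `first_bad` and never fixed. */
static int is_bad(int commit, int first_bad)
{
    return commit >= first_bad;
}

/* Find the first bad commit, given that `good` is known good and
 * `bad` is known bad (good < first_bad <= bad).
 * *steps counts how many commits had to be built and tested. */
static int bisect(int good, int bad, int first_bad, int *steps)
{
    *steps = 0;
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;  /* "it will pick a point" */
        (*steps)++;
        if (is_bad(mid, first_bad))
            bad = mid;                      /* git bisect bad  */
        else
            good = mid;                     /* git bisect good */
    }
    return bad;                             /* "until (found)" */
}
```

Under these assumptions, a range of 1000 commits needs at most 10 build-and-boot cycles, which is why bisecting is practical even for someone who has not used git before.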

2007-01-29 08:50:51

by Ingo Molnar

Subject: Re: 2.6.20-rc6: known regressions with patches


* Adrian Bunk <[email protected]> wrote:

> Subject : ACPI: fix cpufreq regression
> References : http://lkml.org/lkml/2007/1/16/120
> Submitter : Ingo Molnar <[email protected]>
> Caused-By : Dave Jones <[email protected]>
> commit 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e
> Handled-By : Ingo Molnar <[email protected]>
> Patch : http://lkml.org/lkml/2007/1/16/120
> Status : patch available

this is commit e4233dec749a3519069d9390561b5636a75c7579 meanwhile.

Ingo

2007-01-29 13:05:50

by Dave Jones

Subject: Re: 2.6.20-rc6: known regressions with patches

On Mon, Jan 29, 2007 at 09:45:48AM +0100, Ingo Molnar wrote:
>
> * Adrian Bunk <[email protected]> wrote:
>
> > Subject : ACPI: fix cpufreq regression
> > References : http://lkml.org/lkml/2007/1/16/120
> > Submitter : Ingo Molnar <[email protected]>
> > Caused-By : Dave Jones <[email protected]>
> > commit 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e
> > Handled-By : Ingo Molnar <[email protected]>
> > Patch : http://lkml.org/lkml/2007/1/16/120
> > Status : patch available
>
> this is commit e4233dec749a3519069d9390561b5636a75c7579 meanwhile.

Thanks for pushing this on in my absence btw.
FWIW, it has my belated ACK :)
I've asked for it to be in 19.3 too as the same bug exists there.

Dave

--
http://www.codemonkey.org.uk

2007-01-29 19:34:36

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Does this fix it?
---
drivers/net/sky2.c | 43 ++++++++++++++++++-------------------------
1 file changed, 18 insertions(+), 25 deletions(-)

--- sky2-2.6.orig/drivers/net/sky2.c 2007-01-29 10:05:12.000000000 -0800
+++ sky2-2.6/drivers/net/sky2.c 2007-01-29 10:29:56.000000000 -0800
@@ -3675,6 +3675,12 @@
sky2_write32(hw, B0_IMSK, 0);
sky2_power_aux(hw);

+ /* Turn off IRQ to avoid power management bug (see resume) */
+ if (hw->msi) {
+ free_irq(pdev->irq, hw);
+ pci_disable_msi(pdev);
+ }
+
pci_save_state(pdev);
pci_enable_wake(pdev, pci_choose_state(pdev, state), wol);
pci_set_power_state(pdev, pci_choose_state(pdev, state));
@@ -3700,6 +3706,18 @@

sky2_write32(hw, B0_IMSK, Y2_IS_BASE);

+ /* Can't re-enable MSI because kernel resume ordering is broken
+ * and calls device resume before ACPI (BIOS) is called.
+ * BIOS then resets device to INTx!
+ */
+ if (hw->msi) {
+ err = request_irq(pdev->irq, sky2_intr, IRQF_SHARED,
+ hw->dev[0]->name, hw);
+ if (err)
+ goto out;
+ hw->msi = 0;
+ }
+
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (netif_running(dev)) {
@@ -3721,29 +3739,6 @@
pci_disable_device(pdev);
return err;
}
-
-/* BIOS resume runs after device (it's a bug in PM)
- * as a temporary workaround on suspend/resume leave MSI disabled
- */
-static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
-
- free_irq(pdev->irq, hw);
- if (hw->msi) {
- pci_disable_msi(pdev);
- hw->msi = 0;
- }
- return 0;
-}
-
-static int sky2_resume_early(struct pci_dev *pdev)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- struct net_device *dev = hw->dev[0];
-
- return request_irq(pdev->irq, sky2_intr, IRQF_SHARED, dev->name, hw);
-}
#endif

static void sky2_shutdown(struct pci_dev *pdev)
@@ -3783,8 +3778,6 @@
#ifdef CONFIG_PM
.suspend = sky2_suspend,
.resume = sky2_resume,
- .suspend_late = sky2_suspend_late,
- .resume_early = sky2_resume_early,
#endif
.shutdown = sky2_shutdown,
};

2007-01-29 20:09:50

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 2007-01-29 at 11:31 -0800, Stephen Hemminger wrote:
> Does this fix it?

Don't know.

> --- sky2-2.6.orig/drivers/net/sky2.c 2007-01-29 10:05:12.000000000 -0800
> +++ sky2-2.6/drivers/net/sky2.c 2007-01-29 10:29:56.000000000 -0800
> @@ -3675,6 +3675,12 @@
> sky2_write32(hw, B0_IMSK, 0);
> sky2_power_aux(hw);
>

patching file drivers/net/sky2.c
Hunk #1 FAILED at 3675.
Hunk #2 succeeded at 3625 (offset -81 lines).
Hunk #3 succeeded at 3738 with fuzz 1 (offset -1 lines).
Hunk #4 succeeded at 3668 with fuzz 2 (offset -110 lines).
1 out of 4 hunks FAILED -- saving rejects to file drivers/net/sky2.c.rej

# grep -c sky2_power_aux drivers/net/sky2.c
0

Shrug.

tglx


2007-01-29 21:41:34

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 29 Jan 2007 21:10:30 +0100
Thomas Gleixner <[email protected]> wrote:

> On Mon, 2007-01-29 at 11:31 -0800, Stephen Hemminger wrote:
> > Does this fix it?
>
> Don't know.

Sorry it was against the last patch I sent to Jeff for netdev.
Here is against 2.6.20-rc6

---
drivers/net/sky2.c | 43 ++++++++++++++++++-------------------------
1 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index a2e804d..d85de63 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -3598,6 +3598,12 @@ static int sky2_suspend(struct pci_dev *
}
}

+ /* Turn off IRQ to avoid power management bug (see resume) */
+ if (hw->msi) {
+ free_irq(pdev->irq, hw);
+ pci_disable_msi(pdev);
+ }
+
sky2_write32(hw, B0_IMSK, 0);
pci_save_state(pdev);
sky2_set_power_state(hw, pstate);
@@ -3619,6 +3625,18 @@ static int sky2_resume(struct pci_dev *p

sky2_write32(hw, B0_IMSK, Y2_IS_BASE);

+ /* Can't re-enable MSI because kernel resume ordering is broken
+ * and calls device resume before ACPI (BIOS) is called.
+ * BIOS then resets device to INTx!
+ */
+ if (hw->msi) {
+ err = request_irq(pdev->irq, sky2_intr, IRQF_SHARED,
+ hw->dev[0]->name, hw);
+ if (err)
+ goto out;
+ hw->msi = 0;
+ }
+
for (i = 0; i < hw->ports; i++) {
struct net_device *dev = hw->dev[i];
if (netif_running(dev)) {
@@ -3639,29 +3657,6 @@ static int sky2_resume(struct pci_dev *p
out:
return err;
}
-
-/* BIOS resume runs after device (it's a bug in PM)
- * as a temporary workaround on suspend/resume leave MSI disabled
- */
-static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
-
- free_irq(pdev->irq, hw);
- if (hw->msi) {
- pci_disable_msi(pdev);
- hw->msi = 0;
- }
- return 0;
-}
-
-static int sky2_resume_early(struct pci_dev *pdev)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- struct net_device *dev = hw->dev[0];
-
- return request_irq(pdev->irq, sky2_intr, IRQF_SHARED, dev->name, hw);
-}
#endif

static struct pci_driver sky2_driver = {
@@ -3672,8 +3667,6 @@ static struct pci_driver sky2_driver = {
#ifdef CONFIG_PM
.suspend = sky2_suspend,
.resume = sky2_resume,
- .suspend_late = sky2_suspend_late,
- .resume_early = sky2_resume_early,
#endif
};

--
1.4.1

2007-01-29 22:22:42

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 2007-01-29 at 13:38 -0800, Stephen Hemminger wrote:
> Sorry it was against the last patch I sent to Jeff for netdev.
> Here is against 2.6.20-rc6

Still the same problem. The only difference of this patch to the
previous version is, that the unhandled interrupt message is gone.

As I said before:

Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added
this hackery in the first place, makes the device survive
suspend/resume.

tglx


2007-01-29 22:26:30

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 29 Jan 2007 23:23:21 +0100
Thomas Gleixner <[email protected]> wrote:

> On Mon, 2007-01-29 at 13:38 -0800, Stephen Hemminger wrote:
> > Sorry it was against the last patch I sent to Jeff for netdev.
> > Here is against 2.6.20-rc6
>
> Still the same problem. The only difference of this patch to the
> previous version is, that the unhandled interrupt message is gone.
>
> As I said before:
>
> Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added
> this hackery in the first place, makes the device survive
> suspend/resume.
>
> tglx
>
>

But the fix is necessary on laptops where ACPI messes with MSI/INTx
on resume.

--
Stephen Hemminger <[email protected]>

2007-01-29 22:30:51

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 2007-01-29 at 14:23 -0800, Stephen Hemminger wrote:
> > Still the same problem. The only difference of this patch to the
> > previous version is, that the unhandled interrupt message is gone.
> >
> > As I said before:
> >
> > Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added
> > this hackery in the first place, makes the device survive
> > suspend/resume.

> But the fix is necessary on laptops where ACPI messes with MSI/INTx
> on resume.

And the fix is unnecessary and counterproductive on laptops where ACPI
does the right thing.

tglx


2007-01-29 22:37:40

by Linus Torvalds

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage



On Mon, 29 Jan 2007, Thomas Gleixner wrote:
>
> Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added
> this hackery in the first place, makes the device survive
> suspend/resume.

I suspect some BIOSes do *not* screw up the MSI thing on resume, and
others do.

I would suggest that the real fix is to not do that kind of hackery at
suspend/resume time (because we can't know what the heck the BIOS does),
and instead just do one of two cases:

- since MSI is known to be broken for the sky2 driver due to firmware
bugs, just disable it by default if CONFIG_PM is enabled. The
advantages of MSI just aren't all that compelling. Possibly add a
command line option to force MSI to be enabled regardless.

Simple, direct, and should work for everybody.

- Just add a command line option to disable MSI for people that it breaks for.

I don't actually like this one. It defaults to the unsafe behaviour,
and while that makes sense in a "well, your machine is broken anyway"
kind of way, the thing is, the advantages of MSI just aren't big enough
to warrant defaulting to a known-unsafe thing, even if only a small
percentage of machines are affected.

With _eventually_ maybe having a third possible situation:

- some way of figuring it out dynamically.

The third case doesn't seem to be very likely in the short term, though,
which is why I'd suggest one of the first two (the first one being
probably the best one).

Comments?

Linus
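
The first option amounts to a default that depends on whether power management is built in, plus an explicit user override. A minimal sketch in C, for illustration only: `msi_choice`, `MSI_AUTO` and `want_msi()` are hypothetical names, not the sky2 driver's actual parameters, and `config_pm` is passed as a runtime argument here instead of being a compile-time `CONFIG_PM` check.

```c
/* Hypothetical tri-state user setting; not an existing sky2 option. */
enum msi_choice { MSI_AUTO, MSI_FORCE_ON, MSI_FORCE_OFF };

/* Returns 1 if the driver should try MSI, 0 if it should stay on INTx.
 * config_pm: nonzero when suspend/resume support is built in. */
static int want_msi(int config_pm, enum msi_choice user)
{
    if (user == MSI_FORCE_ON)
        return 1;           /* user asserts the firmware is sane */
    if (user == MSI_FORCE_OFF)
        return 0;           /* user opts out explicitly */
    return !config_pm;      /* safe default: no MSI when PM is enabled */
}
```

The second option would differ only in the last line, returning 1 unconditionally, which is exactly the "defaults to the unsafe behaviour" objection.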

2007-01-29 22:38:45

by Frederic Riss

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Monday 29 January 2007 at 23:23 +0100, Thomas Gleixner wrote:
> On Mon, 2007-01-29 at 13:38 -0800, Stephen Hemminger wrote:
> > Sorry it was against the last patch I sent to Jeff for netdev.
> > Here is against 2.6.20-rc6
>
> Still the same problem. The only difference of this patch to the
> previous version is, that the unhandled interrupt message is gone.
>
> As I said before:
>
> Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added
> this hackery in the first place, makes the device survive
> suspend/resume.

I see the same symptoms on my Intel Mac Mini, and reverting the commit
also allows the driver to seemingly resume correctly.

However after coming out of sleep I need to reconfigure the network
interface. No need to rmmod/insmod, just ifdown/ifup is sufficient (but
of course shouldn't be necessary, should it?). If I don't reconfigure
it, ping from/to the box will work, but nothing more complicated like
ssh will go through.

Fred.

2007-01-29 22:43:32

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 29 Jan 2007 14:37:23 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 29 Jan 2007, Thomas Gleixner wrote:
> >
> > Reverting commit 44ade178249fe53d055fd92113eaa271e06acddd, which added
> > this hackery in the first place, makes the device survive
> > suspend/resume.
>
> I suspect some BIOSes do *not* screw up the MSI thing on resume, and
> others do.
>
> I would suggest that the real fix is to not do that kind of hackery at
> suspend/resume time (because we can't know what the heck the BIOS does),
> and instead just do one of two cases:
>
> - since MSI is known to be broken for the sky2 driver due to firmware
> bugs, just disable it by default if CONFIG_PM is enabled. The
> advantages of MSI just aren't all that compelling. Possibly add a
> command line option to force MSI to be enabled regardless.

MSI works fine for almost all systems (except AMD systems where
MSI is broken for ALL devices).

> Simple, direct, and should work for everybody.
>
> - Just add a command line option to disable MSI for people that it breaks for.
>
> I don't actually like this one. It defaults to the unsafe behaviour,
> and while that makes sense in a "well, your machine is broken anyway"
> kind of way, the thing is, the advantages of MSI just aren't big enough
> to warrant defaulting to a known-unsafe thing, even if only a small
> percentage of machines are affected.

A module option to opt out (disable_msi) already exists.

2007-01-29 22:44:56

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 2007-01-29 at 23:38 +0100, Frédéric Riss wrote:
> I see the same symptoms on my Intel Mac Mini, and reverting the commit
> also allows the driver to seemingly resume correctly.
>
> However after coming out of sleep I need to reconfigure the network
> interface. No need to rmmod/insmod, just ifdown/ifup is sufficient (but
> of course shouldn't be necessary, should it?). If I don't reconfigure
> it, ping from/to the box will work, but nothing more complicated like
> ssh will go through.

That's probably a userspace problem. Are you using DHCP ?

tglx




2007-01-29 22:51:04

by Frederic Riss

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Monday 29 January 2007 at 23:45 +0100, Thomas Gleixner wrote:
> On Mon, 2007-01-29 at 23:38 +0100, Frédéric Riss wrote:
> > I see the same symptoms on my Intel Mac Mini, and reverting the commit
> > also allows the driver to seemingly resume correctly.
> >
> > However after coming out of sleep I need to reconfigure the network
> > interface. No need to rmmod/insmod, just ifdown/ifup is sufficient (but
> > of course shouldn't be necessary, should it?). If I don't reconfigure
> > it, ping from/to the box will work, but nothing more complicated like
> > ssh will go through.
>
> That's probably a userspace problem. Are you using DHCP ?

Yep DHCP. Is that a known issue? I never had to reconfigure with older
kernels.

Fred.

2007-01-29 22:57:06

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 2007-01-29 at 23:50 +0100, Frédéric Riss wrote:
> > That's probably a userspace problem. Are you using DHCP ?
>
> Yep DHCP. Is that a known issue? I never had to reconfigure with older
> kernels.

Is dhclient running after resume ? What's the output of ifconfig (before
you do ifdown/up) ? Have you checked the syslog ?

tglx


2007-01-29 23:04:16

by Linus Torvalds

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage



On Mon, 29 Jan 2007, Stephen Hemminger wrote:
>
> MSI works fine for almost all systems (except AMD systems where
> MSI is broken for ALL devices).

Why do you ignore reality?

MSI does *not* work fine, exactly because the firmware screws it up.

The fact that on a "hardware level" it may work is totally irrelevant. The
*only* thing that matters is what people actually see.

"Positivism" may not be a hot philosophy these days any more, but dang, it
certainly is better than what you seem to espouse: "in theory things work
fine".

And if you don't like positivism, how about just simple scientific method:
a theory is *proven*wrong* by a single observation to the opposite. And we
have several people standing up saying that your theory is wrong.

Linus

2007-01-29 23:26:45

by Frederic Riss

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Monday 29 January 2007 at 23:57 +0100, Thomas Gleixner wrote:
> On Mon, 2007-01-29 at 23:50 +0100, Frédéric Riss wrote:
> > > That's probably a userspace problem. Are you using DHCP ?
> >
> > Yep DHCP. Is that a known issue? I never had to reconfigure with older
> > kernels.
>
> Is dhclient running after resume ?

The process is of course in the process list, if that's what you mean by
'running'.

> What's the output of ifconfig (before you do ifdown/up) ?

The output is always the same modulo the transmitted packet numbers:

eth0 Link encap:Ethernet HWaddr 00:16:CB:A2:E4:43
inet addr:192.168.0.101 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::216:cbff:fea2:e443/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:269 errors:0 dropped:0 overruns:0 frame:0
TX packets:57 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:72528 (70.8 KiB) TX bytes:7900 (7.7 KiB)
Interrupt:17

The RX/TX counts are reset to 0 after a resume.

> Have you checked the syslog ?

Yes of course. Nothing interesting.

Fred.

2007-01-29 23:37:19

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Tue, 2007-01-30 at 00:26 +0100, Frédéric Riss wrote:
> > Have you checked the syslog ?
> Yes of course. Nothing interesting.

Just got the same issue on one of my test boxen. Different network card
though. The interface comes up fine, but DNS is not working. ifdown/up
resolves it.

/me keeps an eye on that.

tglx




2007-01-29 23:42:14

by Thomas Gleixner

Subject: [PATCH] sky2: fix MSI related resume breakage

Commit 44ade178249fe53d055fd92113eaa271e06acddd breaks sane
MSI/ACPI/BIOS combinations. It's impossible to keep broken and sane
MSI/ACPI/BIOSes happy at the same time.

Revert the patch and disable MSI for sky2 when CONFIG_PM is enabled.

Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index a2e804d..420fef7 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -91,7 +91,11 @@ static int copybreak __read_mostly = 128;
module_param(copybreak, int, 0);
MODULE_PARM_DESC(copybreak, "Receive copy threshold");

+#ifdef CONFIG_PM
+static int disable_msi = 1;
+#else
static int disable_msi = 0;
+#endif
module_param(disable_msi, int, 0);
MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");

@@ -3601,6 +3605,7 @@ static int sky2_suspend(struct pci_dev *pdev, pm_message_t state)
sky2_write32(hw, B0_IMSK, 0);
pci_save_state(pdev);
sky2_set_power_state(hw, pstate);
+
return 0;
}

@@ -3640,28 +3645,6 @@ out:
return err;
}

-/* BIOS resume runs after device (it's a bug in PM)
- * as a temporary workaround on suspend/resume leave MSI disabled
- */
-static int sky2_suspend_late(struct pci_dev *pdev, pm_message_t state)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
-
- free_irq(pdev->irq, hw);
- if (hw->msi) {
- pci_disable_msi(pdev);
- hw->msi = 0;
- }
- return 0;
-}
-
-static int sky2_resume_early(struct pci_dev *pdev)
-{
- struct sky2_hw *hw = pci_get_drvdata(pdev);
- struct net_device *dev = hw->dev[0];
-
- return request_irq(pdev->irq, sky2_intr, IRQF_SHARED, dev->name, hw);
-}
#endif

static struct pci_driver sky2_driver = {
@@ -3672,8 +3655,6 @@ static struct pci_driver sky2_driver = {
#ifdef CONFIG_PM
.suspend = sky2_suspend,
.resume = sky2_resume,
- .suspend_late = sky2_suspend_late,
- .resume_early = sky2_resume_early,
#endif
};



2007-01-29 23:47:49

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 29 Jan 2007 15:04:06 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 29 Jan 2007, Stephen Hemminger wrote:
> >
> > MSI works fine for almost all systems (except AMD systems where
> > MSI is broken for ALL devices).
>
> Why do you ignore reality?
>
> MSI does *not* work fine, exactly because the firmware screws it up.
>
> The fact that on a "hardware level" it may work is totally irrelevant. The
> *only* thing that matters is what people actually see.
>
> "Positivism" may not be a hot philosophy these days any more, but dang, it
> certainly is better than what you seem to espouse: "in theory things work
> fine".
>
> And if you don't like positivism, how about just simple scientific method:
> a theory is *proven*wrong* by a single observation to the opposite. And we
> have several people standing up saying that your theory is wrong.
>
> Linus

Why do you insist on maintaining the wrong initialization order
on resume? When I raised the issue, Len brought up that the resume
order did not match spec, but then there has been slow progress
in fixing it (it's buried in -mm tree).





--
Stephen Hemminger <[email protected]>

2007-01-29 23:53:17

by Stephen Hemminger

Subject: [PATCH] block MSI on Sony

The Sony VAIO BIOS resets to INTx on resume. This happens
after device resume, so device irq's get misrouted.

This hack turns off MSI on this laptop, until power management
initialization order is fixed.

Signed-off-by: Stephen Hemminger <[email protected]>

---
drivers/pci/quirks.c | 32 ++++++++++++++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ef882a8..9a64179 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -21,6 +21,7 @@ #include <linux/pci.h>
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/acpi.h>
+#include <linux/dmi.h>
#include "pci.h"

/* The Mellanox Tavor device gives false positive parity errors
@@ -1779,6 +1780,37 @@ static void __devinit quirk_nvidia_ck804
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_CK804_PCIE,
quirk_nvidia_ck804_msi_ht_cap);
+
+/* On Sony VAIO laptop, BIOS resets MSI during resume. */
+static __initdata struct dmi_system_id sony_dmi_table[] = {
+ {
+ .ident = "Sony Vaio",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "PCG-"),
+ },
+ },
+ {
+ .ident = "Sony Vaio",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "VGN-"),
+ },
+ },
+ { }
+};
+
+static void __init quirk_sony_msi(struct pci_dev *dev)
+{
+ if (!dmi_check_system(sony_dmi_table))
+ return;
+
+ pci_msi_quirk = 1;
+ printk(KERN_WARNING "PCI: MSI sony quirk detected. pci_msi_quirk set.\n");
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801BA_6,
+ quirk_sony_msi);
+
#endif /* CONFIG_PCI_MSI */

EXPORT_SYMBOL(pcie_mch_quirk);
--
1.4.1

2007-01-30 00:12:38

by Linus Torvalds

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage



On Mon, 29 Jan 2007, Stephen Hemminger wrote:
>
> Why do you insist on maintaining the wrong initialization order
> on resume? When I raised the issue, Len brought up that the resume
> order did not match spec, but then there has been slow progress
> in fixing it (it's buried in -mm tree).

It's not getting merged, SINCE IT DOESN'T WORK. It causes all sorts of
problems, because ACPI requires all kinds of things to be up and running
in order to actually work, and that in turn breaks all the devices that
have different ordering constraints.

ACPI is a piece of sh*t. It asks the OS to do impossible things, like
running it early in the config sequence when it then at the same time
wants to depend on stuff that are there *late* in the sequence. It's not
the first time this insane situation has happened, either.

But we'll try to merge the patch that totally switches around the whole
initialization order hopefully early after 2.6.20. But no way in hell do
we do it now, and I personally suspect we'll end up reverting it when we do
try it just because it will probably break other things. But we'll see.

In the meantime, sky2 doesn't work with MSI.

Linus

2007-01-30 00:18:58

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 29 Jan 2007 16:12:27 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 29 Jan 2007, Stephen Hemminger wrote:
> >
> > Why do you insist on maintaining the wrong initialization order
> > on resume? When I raised the issue, Len brought up that the resume
> > order did not match spec, but then there has been slow progress
> > in fixing it (it's buried in -mm tree).
>
> It's not getting merged, SINCE IT DOESN'T WORK. It causes all sorts of
> problems, because ACPI requires all kinds of things to be up and running
> in order to actually work, and that in turn breaks all the devices that
> have different ordering constraints.
>
> ACPI is a piece of sh*t. It asks the OS to do impossible things, like
> running it early in the config sequence when it then at the same time
> wants to depend on stuff that are there *late* in the sequence. It's not
> the first time this insane situation has happened, either.

You will find no argument from me with that statement.

> But we'll try to merge the patch that totally switches around the whole
> initialization order hopefully early after 2.6.20. But no way in hell do
> we do it now, and I personally suspect we'll end up reverting it when we do
> try it just because it will probably break other things. But we'll see.
>
> In the meantime, sky2 doesn't work with MSI

On one and only one platform. It works fine on others. Don't blame the
driver, stop it in PCI.


--
Stephen Hemminger <[email protected]>

2007-01-30 00:22:15

by Thomas Gleixner

Subject: Re: [PATCH] block MSI on Sony

On Mon, 2007-01-29 at 15:50 -0800, Stephen Hemminger wrote:
> The Sony VAIO BIOS resets to INTx on resume. This happens
> after device resume, so device irq's get misrouted.

Err? My Sony VAIO does _NOT_ do that. It works fine without that.
It's just the sky2 hackery which fucked up things.

tglx




2007-01-30 00:24:28

by Stephen Hemminger

Subject: Re: [PATCH] block MSI on Sony

On Tue, 30 Jan 2007 01:22:54 +0100
Thomas Gleixner <[email protected]> wrote:

> On Mon, 2007-01-29 at 15:50 -0800, Stephen Hemminger wrote:
> > The Sony VAIO BIOS resets to INTx on resume. This happens
> > after device resume, so device irq's get misrouted.
>
> Err? My Sony VAIO does _NOT_ do that. It works fine without that.
> It's just the sky2 hackery which fucked up things.
>

What machine and BIOS version?


--
Stephen Hemminger <[email protected]>

2007-01-30 00:26:00

by Linus Torvalds

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage



On Mon, 29 Jan 2007, Stephen Hemminger wrote:
>
> On one and only one platform. It works fine on others. Don't blame the
> driver, stop it in PCI.

How sure are you that it's only those Sony laptops?

Linus

2007-01-30 00:26:09

by Thomas Gleixner

Subject: Re: [PATCH] block MSI on Sony

On Tue, 2007-01-30 at 01:22 +0100, Thomas Gleixner wrote:
> On Mon, 2007-01-29 at 15:50 -0800, Stephen Hemminger wrote:
> > The Sony VAIO BIOS resets to INTx on resume. This happens
> > after device resume, so device irq's get misrouted.
>
> Err? My Sony VAIO does _NOT_ do that. It works fine without that.
> It's just the sky2 hackery which fucked up things.

And how is this going to solve the breakage on Frederic's box?

> I see the same symptoms on my Intel Mac Mini, and reverting the commit
> also allows the driver to seemingly resume correctly.

Still it stands:

Your sky2 patch #44ade178249fe53d055fd92113eaa271e06acddd is broken.

Just get it.

tglx


2007-01-30 00:29:31

by Stephen Hemminger

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Mon, 29 Jan 2007 16:25:48 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 29 Jan 2007, Stephen Hemminger wrote:
> >
> > On one and only one platform. It works fine on others. Don't blame the
> > driver, stop it in PCI.
>
> How sure are you that it's only those Sony laptops?
>

I do not underestimate the ability of BIOS writers to
screw things up.

--
Stephen Hemminger <[email protected]>

2007-01-30 00:30:53

by Thomas Gleixner

Subject: Re: [PATCH] block MSI on Sony

On Mon, 2007-01-29 at 16:21 -0800, Stephen Hemminger wrote:
> > > The Sony VAIO BIOS resets to INTx on resume. This happens
> > > after device resume, so device irq's get misrouted.
> >
> > Err? My Sony VAIO does _NOT_ do that. It works fine without that.
> > It's just the sky2 hackery which fucked up things.
>
> What machine and BIOS version?

VGN-SZ2XP_C
BIOS: R0081N0

tglx


2007-01-30 00:34:19

by Stephen Hemminger

Subject: Re: [PATCH] block MSI on Sony

On Tue, 30 Jan 2007 01:31:33 +0100
Thomas Gleixner <[email protected]> wrote:

> On Mon, 2007-01-29 at 16:21 -0800, Stephen Hemminger wrote:
> > > > The Sony VAIO BIOS resets to INTx on resume. This happens
> > > > after device resume, so device irq's get misrouted.
> > >
> > > Err? My Sony VAIO does _NOT_ do that. It works fine without that.
> > > It's just the sky2 hackery which fucked up things.
> >
> > What machine and BIOS version?
>
> VGN-SZ2XP_C
> BIOS: R0081N0

Mine is:
VGN-N170G
BIOS: R0020J4

It might be a BIOS bug that has been fixed, but updating the
BIOS requires Windows. It checks for some ID so even Wine
won't work.


--
Stephen Hemminger <[email protected]>
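
When comparing the two machines, it helps to remember how the proposed quirk table is matched. Assuming `DMI_MATCH` performs a substring comparison on the DMI string (the behaviour of `strstr()`), an entry with product name "VGN-" matches both model strings reported in this thread. A user-space sketch of that rule, with `product_matches()` as a hypothetical helper and not a kernel function:

```c
#include <string.h>

/* Hypothetical user-space helper mimicking a substring-style DMI match:
 * returns 1 if `substr` occurs anywhere in `product`, else 0. */
static int product_matches(const char *product, const char *substr)
{
    return strstr(product, substr) != NULL;
}
```

Under that assumption, both "VGN-N170G" and "VGN-SZ2XP_C" match the "VGN-" entry, so the quirk as posted would also turn MSI off on the VGN-SZ2XP_C, which is reported above to resume correctly without it.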

2007-01-30 06:56:49

by Ingo Molnar

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage


* Linus Torvalds <[email protected]> wrote:

> On Mon, 29 Jan 2007, Stephen Hemminger wrote:
> >
> > On one and only one platform. It works fine on others. Don't blame
> > the driver, stop it in PCI.
>
> How sure are you that it's only those Sony laptops?

i'm wondering, could we go with Thomas' temporary patch that disables
sky2 MSI if CONFIG_PM is enabled - we could revert that after 2.6.20.
It's not like MSI is a life and death feature. On IO-APIC systems
vectors are abundant and in any case we share irqs just fine. The true
advantage of MSI is minimal. (MSI-X has the potential to be better by
being message based, but in reality it still goes through the full IRQ
layer.) MSI might be useful on really, really large systems - but i
really hope those really large systems dont rely on CONFIG_PM. Meanwhile
Thomas' patch maximizes the amount of working hardware (it has the
chance to produce working systems in 100% of the cases) - which is a few
orders of magnitude more important than IRQ management micro-costs. Am i
missing anything?

Ingo

2007-01-30 07:39:17

by Jeff Garzik

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Ingo Molnar wrote:
> i'm wondering, could we go with Thomas' temporary patch that disables
> sky2 MSI if CONFIG_PM is enabled - we could revert that after 2.6.20.
> It's not like MSI is a life and death feature. On IO-APIC systems
> vectors are abundant and in any case we share irqs just fine. The true
> advantage of MSI is minimal. (MSI-X has the potential to be better by
> being message based, but in reality it still goes through the full IRQ
> layer.) MSI might be useful on really, really large systems - but i
> really hope those really large systems dont rely on CONFIG_PM. Meanwhile
> Thomas' patch maximizes the amount of working hardware (it has the
> chance to produce working systems in 100% of the cases) - which is a few
> orders of magnitude more important than IRQ management micro-costs. Am i
> missing anything?


Sharing irqs /sucks/. I routinely have to fight a USB device dying,
because the ATA device is causing an interrupt storm, or vice versa.
/Very/ common headache.

Other than that, they use a tiny bit fewer CPU cycles, and allow
simplification of the interrupt handler (saving another few CPU cycles).

The biggest benefit is (a) for hardware designers, where MSI means a
cleaner h/w design, and (b) preparation of drivers and the kernel
systems for MSI-only hardware.

At present only high end hardware is MSI-only (like infiniband), but
that's the future direction.

Jeff


2007-01-30 07:54:59

by Ingo Molnar

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage


* Jeff Garzik <[email protected]> wrote:

> Ingo Molnar wrote:
> >i'm wondering, could we go with Thomas' temporary patch that disables
> >sky2 MSI if CONFIG_PM is enabled - we could revert that after 2.6.20.
> >It's not like MSI is a life and death feature. On IO-APIC systems
> >vectors are abundant and in any case we share irqs just fine. The
> >true advantage of MSI is minimal. (MSI-X has the potential to be
> >better by being message based, but in reality it still goes through
> >the full IRQ layer.) MSI might be useful on really, really large
> >systems - but i really hope those really large systems dont rely on
> >CONFIG_PM. Meanwhile Thomas' patch maximizes the amount of working
> >hardware (it has the chance to produce working systems in 100% of the
> >cases) - which is a few orders of magnitude more important than IRQ
> >management micro-costs. Am i missing anything?
>
>
> Sharing irqs /sucks/. I routinely have to fight a USB device dying,
> because the ATA device is causing an interrupt storm, or vice versa.
> /Very/ common headache.

Yeah. Admittedly, ATA is very special because it is still edge-triggered
most of the time (for legacy reasons):

14: 389907 0 IO-APIC-edge ide0

so if it shares an irq with a device that has level-triggered
assumptions, those two dont intermix very well. That's why i have the
delayed-disable patches (see the two patches below), which will unify
the two methods, and the irq flow handling method will be mostly a
'performance hint' not a correctness issue. This has been in -rt for
quite a few weeks now and it works well.
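
As a rough illustration of the delayed-disable idea described above, here is a
small userspace model in C. It is a sketch, not the genirq code: the struct and
function names (irq_model, model_disable, model_interrupt, model_enable) are
hypothetical, and the replay on enable is an assumption about how a recorded
interrupt would be delivered later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one IRQ line; not kernel code. */
struct irq_model {
    bool soft_disabled;   /* software asked for "disable", hardware not yet told */
    bool hw_masked;       /* line is masked at the interrupt controller */
    bool pending;         /* an interrupt arrived while soft-disabled */
    unsigned int handled; /* number of times the device handler ran */
};

/* Delayed disable: only record the request, leave the hardware alone. */
static void model_disable(struct irq_model *d)
{
    assert(d != NULL);
    d->soft_disabled = true;
}

/* An interrupt arrives from the hardware. */
static void model_interrupt(struct irq_model *d)
{
    assert(d != NULL);
    if (d->hw_masked)
        return;               /* a masked line delivers nothing */
    if (d->soft_disabled) {
        d->hw_masked = true;  /* mask lazily, now that it actually fired */
        d->pending = true;    /* remember it so the event is not lost */
        return;
    }
    d->handled++;
}

/* Enable: unmask and replay anything that arrived in the meantime. */
static void model_enable(struct irq_model *d)
{
    assert(d != NULL);
    d->soft_disabled = false;
    d->hw_masked = false;
    if (d->pending) {
        d->pending = false;
        d->handled++;         /* assumed replay of the recorded interrupt */
    }
}
```

In this model a disable/enable pair with no interrupt in between never touches
the mask at all, which is the "performance hint" side of the argument; an edge
that does arrive while disabled is recorded rather than dropped.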

btw., it would be great if you could help us here: could you perhaps,
from a past example, outline a specific case of such an ATA/USB IRQ
storm and how it occurred (precisely) - and what the fix was? I'd like to
analyze a specific case to make sure the genirq layer recovers from such
cases more gracefully. In general, i think the IRQ subsystem needs to
become more failure-resilient and needs to become more auto-learning
(and these two dont stand in the way of good performance). This problem
of shared IRQs will be with us for at least another 10 years, if not
more. (for example ISA is /still/ not dead everywhere and it was already
legacy technology 15 years ago when Linux was started.)

Ingo

------------------->
Subject: irq: do not mask interrupts by default
From: Ingo Molnar <[email protected]>

never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change
could recover lost edges (or even other types of lost interrupts) by
conservatively only masking interrupts after they happen. (NOTE: with
this change the highlevel irq-disable code still soft-disables this IRQ
line - and if such an interrupt happens then the IRQ flow handler keeps
the IRQ masked.)

mark i8259A controllers as 'never loses an edge'.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/i386/kernel/i8259.c | 1 +
arch/x86_64/kernel/i8259.c | 1 +
kernel/irq/chip.c | 17 ++++++++++-------
3 files changed, 12 insertions(+), 7 deletions(-)

Index: linux/arch/i386/kernel/i8259.c
===================================================================
--- linux.orig/arch/i386/kernel/i8259.c
+++ linux/arch/i386/kernel/i8259.c
@@ -41,6 +41,7 @@ static void mask_and_ack_8259A(unsigned
static struct irq_chip i8259A_chip = {
.name = "XT-PIC",
.mask = disable_8259A_irq,
+ .disable = disable_8259A_irq,
.unmask = enable_8259A_irq,
.mask_ack = mask_and_ack_8259A,
};
Index: linux/arch/x86_64/kernel/i8259.c
===================================================================
--- linux.orig/arch/x86_64/kernel/i8259.c
+++ linux/arch/x86_64/kernel/i8259.c
@@ -103,6 +103,7 @@ static void mask_and_ack_8259A(unsigned
static struct irq_chip i8259A_chip = {
.name = "XT-PIC",
.mask = disable_8259A_irq,
+ .disable = disable_8259A_irq,
.unmask = enable_8259A_irq,
.mask_ack = mask_and_ack_8259A,
};
Index: linux/kernel/irq/chip.c
===================================================================
--- linux.orig/kernel/irq/chip.c
+++ linux/kernel/irq/chip.c
@@ -202,10 +202,6 @@ static void default_enable(unsigned int
*/
static void default_disable(unsigned int irq)
{
- struct irq_desc *desc = irq_desc + irq;
-
- if (!(desc->status & IRQ_DELAYED_DISABLE))
- desc->chip->mask(irq);
}

/*
@@ -270,13 +266,18 @@ handle_simple_irq(unsigned int irq, stru

if (unlikely(desc->status & IRQ_INPROGRESS))
goto out_unlock;
- desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
kstat_cpu(cpu).irqs[irq]++;

action = desc->action;
- if (unlikely(!action || (desc->status & IRQ_DISABLED)))
+ if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
+ if (desc->chip->mask)
+ desc->chip->mask(irq);
+ desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
+ desc->status |= IRQ_PENDING;
goto out_unlock;
+ }

+ desc->status &= ~(IRQ_REPLAY | IRQ_WAITING | IRQ_PENDING);
desc->status |= IRQ_INPROGRESS;
spin_unlock(&desc->lock);

@@ -368,11 +369,13 @@ handle_fasteoi_irq(unsigned int irq, str

/*
* If its disabled or no action available
- * keep it masked and get out of here
+ * then mask it and get out of here:
*/
action = desc->action;
if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
desc->status |= IRQ_PENDING;
+ if (desc->chip->mask)
+ desc->chip->mask(irq);
goto out;
}

---------------------->
Subject: genirq: remove IRQ_DISABLED
From: Ingo Molnar <[email protected]>

now that disable_irq() defaults to delayed-disable semantics, the
IRQ_DISABLED flag is not needed anymore.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/arm/kernel/irq.c | 3 +--
arch/i386/kernel/io_apic.c | 4 +---
arch/powerpc/platforms/powermac/pic.c | 2 --
arch/x86_64/kernel/io_apic.c | 4 +---
include/linux/irq.h | 7 +++----
5 files changed, 6 insertions(+), 14 deletions(-)

Index: linux/arch/arm/kernel/irq.c
===================================================================
--- linux.orig/arch/arm/kernel/irq.c
+++ linux/arch/arm/kernel/irq.c
@@ -159,8 +159,7 @@ void __init init_IRQ(void)
int irq;

for (irq = 0; irq < NR_IRQS; irq++)
- irq_desc[irq].status |= IRQ_NOREQUEST | IRQ_DELAYED_DISABLE |
- IRQ_NOPROBE;
+ irq_desc[irq].status |= IRQ_NOREQUEST | IRQ_NOPROBE;

#ifdef CONFIG_SMP
bad_irq_desc.affinity = CPU_MASK_ALL;
Index: linux/arch/i386/kernel/io_apic.c
===================================================================
--- linux.orig/arch/i386/kernel/io_apic.c
+++ linux/arch/i386/kernel/io_apic.c
@@ -1275,11 +1275,9 @@ static void ioapic_register_intr(int irq
trigger == IOAPIC_LEVEL)
set_irq_chip_and_handler_name(irq, &ioapic_chip,
handle_fasteoi_irq, "fasteoi");
- else {
- irq_desc[irq].status |= IRQ_DELAYED_DISABLE;
+ else
set_irq_chip_and_handler_name(irq, &ioapic_chip,
handle_edge_irq, "edge");
- }
set_intr_gate(vector, interrupt[irq]);
}

Index: linux/arch/powerpc/platforms/powermac/pic.c
===================================================================
--- linux.orig/arch/powerpc/platforms/powermac/pic.c
+++ linux/arch/powerpc/platforms/powermac/pic.c
@@ -305,8 +305,6 @@ static int pmac_pic_host_map(struct irq_
level = !!(level_mask[hw >> 5] & (1UL << (hw & 0x1f)));
if (level)
desc->status |= IRQ_LEVEL;
- else
- desc->status |= IRQ_DELAYED_DISABLE;
set_irq_chip_and_handler(virq, &pmac_pic, level ?
handle_level_irq : handle_edge_irq);
return 0;
Index: linux/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/io_apic.c
+++ linux/arch/x86_64/kernel/io_apic.c
@@ -810,11 +810,9 @@ static void ioapic_register_intr(int irq
trigger == IOAPIC_LEVEL)
set_irq_chip_and_handler_name(irq, &ioapic_chip,
handle_fasteoi_irq, "fasteoi");
- else {
- irq_desc[irq].status |= IRQ_DELAYED_DISABLE;
+ else
set_irq_chip_and_handler_name(irq, &ioapic_chip,
handle_edge_irq, "edge");
- }
}
static void __init setup_IO_APIC_irq(int apic, int pin, int idx, int irq)
{
Index: linux/include/linux/irq.h
===================================================================
--- linux.orig/include/linux/irq.h
+++ linux/include/linux/irq.h
@@ -57,10 +57,9 @@ typedef void fastcall (*irq_flow_handler
#define IRQ_NOPROBE 0x00020000 /* IRQ is not valid for probing */
#define IRQ_NOREQUEST 0x00040000 /* IRQ cannot be requested */
#define IRQ_NOAUTOEN 0x00080000 /* IRQ will not be enabled on request irq */
-#define IRQ_DELAYED_DISABLE 0x00100000 /* IRQ disable (masking) happens delayed. */
-#define IRQ_WAKEUP 0x00200000 /* IRQ triggers system wakeup */
-#define IRQ_MOVE_PENDING 0x00400000 /* need to re-target IRQ destination */
-#define IRQ_NO_BALANCING 0x00800000 /* IRQ is excluded from balancing */
+#define IRQ_WAKEUP 0x00100000 /* IRQ triggers system wakeup */
+#define IRQ_MOVE_PENDING 0x00200000 /* need to re-target IRQ destination */
+#define IRQ_NO_BALANCING 0x00400000 /* IRQ is excluded from balancing */

#ifdef CONFIG_IRQ_PER_CPU
# define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

2007-01-30 08:03:09

by Jeff Garzik

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Ingo Molnar wrote:
> btw., it would be great if you could help us here: could you perhaps,
> from a past example, outline a specific case of such an ATA/USB IRQ
> storm and how it occurred (precisely) - and what the fix was? I'd like to
> analyze a specific case to make sure the genirq layer recovers from such
> cases more gracefully. In general, i think the IRQ subsystem needs to
> become more failure-resilient and needs to become more auto-learning
> (and these two dont stand in the way of good performance). This problem
> of shared IRQs will be with us for at least another 10 years, if not
> more. (for example ISA is /still/ not dead everywhere and it was already
> legacy technology 15 years ago when Linux was started.)


Easy to name an example, as they are pretty generic. When sharing irqs
-- usually ATA is configured to PCI native (IO-APIC-fasteoi) -- any
interrupt storm causes the other devices sharing that irq to crap
themselves (kernel turns off irq, suggests irqpoll, etc.)
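
The "kernel turns off irq" behaviour can be sketched as a counting heuristic.
This is a minimal userspace model in C, assuming a window of 100000 interrupts
and a limit of 99900 unhandled ones; those thresholds are from memory of
kernel/irq/spurious.c and should be treated as assumptions, and all names here
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed thresholds; check kernel/irq/spurious.c for the real values. */
#define WINDOW_IRQS     100000u
#define UNHANDLED_LIMIT  99900u

struct line_stats {
    unsigned int irq_count;  /* interrupts seen in the current window */
    unsigned int unhandled;  /* of those, how many no handler claimed */
    bool disabled;           /* line was shut off as "nobody cared" */
};

/* Record one interrupt; 'handled' is true if any sharer claimed it. */
static void note_interrupt_model(struct line_stats *s, bool handled)
{
    assert(s != NULL);
    if (s->disabled)
        return;
    if (!handled)
        s->unhandled++;
    if (++s->irq_count < WINDOW_IRQS)
        return;
    /* Window complete: decide, then start a fresh window. */
    if (s->unhandled > UNHANDLED_LIMIT)
        s->disabled = true;  /* every device sharing the line loses its IRQ */
    s->irq_count = 0;
    s->unhandled = 0;
}
```

The point of the model is the last comment: the decision is per line, not per
device, so a storm from one sharer takes the well-behaved sharers down with it.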

ATA is unfortunately easier to cause interrupt storms than most because
the standard PCI IDE definition has __no__ possible way to indicate
certain interrupt conditions are pending. You have to /know/ that you
are expecting an interrupt, which causes problems if the hardware
decides to send the interrupt early or late, rather than when it's
expected. Most modern hardware has a read/write/clear interrupt status
register that gives you an immediate summary of the pending interrupt
conditions, and an easy way to ack the pending events. ATA does not
have any such capability.
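
To make the contrast concrete, here is a sketch of what a handler looks like
when the hardware does have such a status register. It is a userspace model in
C with a hypothetical device (modern_dev) and hypothetical return codes; it
assumes a register with one bit per pending condition that the driver clears
after reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum irq_ret { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical device with a readable, clearable interrupt status register. */
struct modern_dev {
    uint32_t irq_status;  /* one bit per pending interrupt condition */
    unsigned int events;  /* conditions serviced so far */
};

static enum irq_ret modern_dev_irq(struct modern_dev *dev)
{
    uint32_t pending;

    assert(dev != NULL);
    pending = dev->irq_status;       /* read: what is pending right now? */
    if (pending == 0)
        return MODEL_IRQ_NONE;       /* not ours; let the other sharers look */
    dev->irq_status &= ~pending;     /* ack exactly the conditions we saw */
    for (; pending != 0; pending &= pending - 1)
        dev->events++;               /* service one condition per set bit */
    return MODEL_IRQ_HANDLED;
}
```

A legacy ATA handler has no equivalent of the first read, so it cannot answer
"not ours" reliably on a shared line without knowing that a command is in
flight.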

That said, stuff like AHCI or sata_sil or sata_sil24 do have modern
designs with the expected interrupt status register(s), so they do not
suffer from the problems suffered by the more legacy-like hardware
(ata_piix, sata_via, pata_*)

Jeff


2007-01-30 08:04:37

by Ingo Molnar

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage


* Jeff Garzik <[email protected]> wrote:

> Sharing irqs /sucks/. [...]

btw., MSI is not really needed to avoid the sharing of irqs: x86 has 224
IRQ vectors which is abundant for all but the largest boxes. Even the
smallest laptop tends to have an IO-APIC with at least 24 pins - which
is enough to never have to share irqs. How system designers can still
end up with mapping so many devices to the same pin is really their
fault.

so MSI's only true accomplishment AFAICS is that it now says on the
hardware level that "you must not share IRQs". Well, doh...

Ingo

2007-01-30 08:09:52

by Ingo Molnar

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage


* Jeff Garzik <[email protected]> wrote:

> Easy to name an example, as they are pretty generic. When sharing
> irqs -- usually ATA is configured to PCI native (IO-APIC-fasteoi) --
> any interrupt storm causes the other devices sharing that irq to crap
> themselves (kernel turns off irq, suggests irqpoll, etc.)

ok. Can you suggest any way for me to reproduce such a bug artificially
on a test system? [i have both old and new systems, so if you can think
of a way for me to trigger this i'd be happy to try]

I /think/ my two patches should automatically avoid the 'cap themselves'
effect you outlined: the absolutely worst case should be that we'll have
twice the IRQ rate of the optimal one - but no irq storm nor lost
interrupts should happen due to irq trigger type mismatches, ever - as
long as the basic mapping of device to IRQ is correct. [ I tried to push
to include this in v2.6.20 but i lost that argument ;-) ]

Ingo

2007-01-30 08:14:23

by Ingo Molnar

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage


* Ingo Molnar <[email protected]> wrote:

> I /think/ my two patches should automatically avoid the 'cap themselves'
^--crap
> effect you outlined: the absolutely worst case should be that we'll
> have twice the IRQ rate of the optimal one - but no irq storm nor lost
> interrupts should happen due to irq trigger type mismatches, ever - as
> long as the basic mapping of device to IRQ is correct. [ I tried to
> push to include this in v2.6.20 but i lost that argument ;-) ]

2007-01-30 08:59:29

by Len Brown

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Monday 29 January 2007 19:12, Linus Torvalds wrote:
>
> On Mon, 29 Jan 2007, Stephen Hemminger wrote:
> >
> > Why do you insist on maintaining the wrong initialization order
> > on resume? When I raised the issue, Len brought up that the resume
> > order did not match spec, but then there has been slow progress
> > in fixing it (it's buried in -mm tree).
>
> It's not getting merged, SINCE IT DOESN'T WORK. It causes all sorts of
> problems, because ACPI requires all kinds of things to be up and running
> in order to actually work, and that in turn breaks all the devices that
> have different ordering constraints.
>
> ACPI is a piece of sh*t. It asks the OS to do impossible things, like
> running it early in the config sequence when it then at the same time
> wants to depend on stuff that are there *late* in the sequence. It's not
> the first time this insane situation has happened, either.

And it will not be the last:-)

There are really two cases, one is easy, one hard:

1. The ACPI spec, our knowledge of the HW, and talking to our own BIOS
folks tell us quite a bit about how things are supposed to work.

2. "Windows Bug Compatibility" (tm)
When OEMs build systems and test them only with Windows, then
the implementation quirks of Windows get ingrained in the platforms.
Linux then tries to run on the same platform and wonders why
the BIOS does "unusual" things. The answer is because it has been
only tested on Windows and BIOS quirks slip through Windows testing.

To be fair, the exact same thing would happen in reverse to Windows
if vendors only tested with Linux.

http://www.linuxfirmwarekit.org/ is intended to help mitigate some of this
problem. So at least vendors that care about Linux can make sure that
they minimize the curve balls they throw us.

An example of a recent curve ball is when the BIOS supplies two APIC (MADT)
tables. Well, the spec says there should be only one... We have proof
that Windows doesn't use the 1st for enumerating processors because
Windows works on a box with a garbled 1st table.
If we prove that Windows doesn't use the second either then it means
they enumerate processors via the DSDT -- which means bringing up
the ACPI interpreter before bringing up SMP -- and that would require
a significant change to Linux boot sequence...

> But we'll try to merge the patch that totally switches around the whole
> initialization order hopefully early after 2.6.20. But no way in hell do
> we do it now, and I personally suspect we'll end reverting it when we do
> try it just because it will probably break other things. But we'll see.

I agree with this plan, and I concur with your outlook.

I think Rafael is holding the ball here as we wait for an SMP-safe freezer:
http://lists.osdl.org/pipermail/linux-pm/2006-December/004233.html

cheers,
-Len

2007-01-30 16:00:57

by Rafael J. Wysocki

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Hi,

On Tuesday, 30 January 2007 09:57, Len Brown wrote:
> On Monday 29 January 2007 19:12, Linus Torvalds wrote:
> >
> > On Mon, 29 Jan 2007, Stephen Hemminger wrote:
> > >
> > > Why do you insist on maintaining the wrong initialization order
> > > on resume? When I raised the issue, Len brought up that the resume
> > > order did not match spec, but then there has been slow progress
> > > in fixing it (it's buried in -mm tree).
> >
> > It's not getting merged, SINCE IT DOESN'T WORK. It causes all sorts of
> > problems, because ACPI requires all kinds of things to be up and running
> > in order to actually work, and that in turn breaks all the devices that
> > have different ordering constraints.
> >
> > ACPI is a piece of sh*t. It asks the OS to do impossible things, like
> > running it early in the config sequence when it then at the same time
> > wants to depend on stuff that are there *late* in the sequence. It's not
> > the first time this insane situation has happened, either.
>
> And it will not be the last:-)
>
> There are really two cases, one is easy, one hard:
>
> 1. The ACPI spec and our knowledge of how the HW and talking to our own BIOS
> folks tells us quite a bit about how things are supposed to work.
>
> 2. "Windows Bug Compatibility" (tm)
> When OEMs build systems and test them only with Windows, then
> the implementation quirks of Windows get ingrained in the platforms.
> Linux then tries to run on the same platform and wonders why
> the BIOS does "unusual" things. The answer is because it has been
> only tested on Windows and BIOS quirks slip through Windows testing.
>
> To be fair, the exact same thing would happen in reverse to Windows
> if vendors only tested with Linux.
>
> http://www.linuxfirmwarekit.org/ is intended to help mitigate some of this
> problem. So at least vendors that care about Linux can make sure that
> they minimize the curve balls they throw us.
>
> An example of a recent curve ball is when the BIOS supplies two APIC (MADT)
> tables. Well, the spec says there should be only one... We have proof
> that Windows doesn't use the 1st for enumerating processors because
> Windows works on a box with a garbled 1st table.
> If we prove that Windows doesn't use the second either then it means
> they enumerate processors via the DSDT -- which means bringing up
> the ACPI interpreter before bringing up SMP -- and that would require
> a significant change to Linux boot sequence...
>
> > But we'll try to merge the patch that totally switches around the whole
> > initialization order hopefully early after 2.6.20. But no way in hell do
> > we do it now, and I personally suspect we'll end reverting it when we do
> > try it just because it will probably break other things. But we'll see.
>
> I agree with this plan, and I concur with your outlook.
>
> I think Rafael is holding the ball here as we wait for an SMP-safe freezer:
> http://lists.osdl.org/pipermail/linux-pm/2006-December/004233.html

Well, no longer. :-)

The freezer in 2.6.20-rc6 should be SMP-safe and the patches to change
the suspend-resume code ordering are in -mm:

pm-change-code-ordering-in-mainc.patch
swsusp-change-code-ordering-in-diskc.patch
swsusp-change-code-order-in-diskc-fix.patch
swsusp-change-code-ordering-in-userc.patch
swsusp-change-code-ordering-in-userc-sanity.patch
swsusp-change-pm_ops-handling-by-userland-interface.patch

I have no problems whatsoever with these patches on SMP boxes and if anyone
has, please let me know.

Greetings,
Rafael


--
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King

2007-01-30 21:28:14

by Nigel Cunningham

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Hi.

On Tue, 2007-01-30 at 17:01 +0100, Rafael J. Wysocki wrote:
> The freezer in 2.6.20-rc6 should be SMP-safe and the patches to change
> the suspend-resume code ordering are in -mm:
>
> pm-change-code-ordering-in-mainc.patch
> swsusp-change-code-ordering-in-diskc.patch
> swsusp-change-code-order-in-diskc-fix.patch
> swsusp-change-code-ordering-in-userc.patch
> swsusp-change-code-ordering-in-userc-sanity.patch
> swsusp-change-pm_ops-handling-by-userland-interface.patch
>
> I have no problems whatsoever with these patches on SMP boxes and if anyone
> has, please let me know.

I've been running an SMP box here with the matching changes for
Suspend2, with no problems. I believe the algorithm looks good.

Regards,

Nigel

2007-01-31 15:28:00

by Jeff Garzik

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Ingo Molnar wrote:
> * Jeff Garzik <[email protected]> wrote:
>
>> Easy to name an example, as they are pretty generic. When sharing
>> irqs -- usually ATA is configured to PCI native (IO-APIC-fasteoi) --
>> any interrupt storm causes the other devices sharing that irq to crap
>> themselves (kernel turns off irq, suggests irqpoll, etc.)
>
> ok. Can you suggest any way for me to reproduce such a bug artificially
> on a test system? [i have both old and new systems, so if you can think
> of a way for me to trigger this i'd be happy to try]

Should be pretty easy. With either the old-IDE driver or libata,
complete a command without acknowledging an interrupt. For libata, that
means poking around in ata_host_intr() and avoiding well-built hardware
like AHCI. Anything that uses ata_piix driver, basically all Intel
machines, should be applicable in the "not well built" category... :)

Jeff



2007-01-31 17:40:17

by Ingo Molnar

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage


* Jeff Garzik <[email protected]> wrote:

> >ok. Can you suggest any way for me to reproduce such a bug
> >artificially on a test system? [i have both old and new systems, so
> >if you can think of a way for me to trigger this i'd be happy to try]
>
> Should be pretty easy. With either the old-IDE driver or libata,
> complete a command without acknowledging an interrupt. For libata,
> that means poking around in ata_host_intr() and avoiding well-built
> hardware like AHCI. Anything that uses ata_piix driver, basically all
> Intel machines, should be applicable in the "not well built"
> category... :)

ok, here's one victi^H^H^H^H testbox that seems to match your
description:

18: 3 0 IO-APIC-fasteoi uhci_hcd:usb3, ohci1394
19: 2413090 0 IO-APIC-fasteoi uhci_hcd:usb2, libata
22: 168 0 IO-APIC-fasteoi HDA Intel
23: 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb5

so i should try to generate some missing ACK [this meaning a missing
driver-level ack, right?] on IRQ#19's libata handler - and i should
expect a screaming interrupt? Or non-working USB? Or both?
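
For picking such a candidate out of /proc/interrupts, a tiny helper is enough.
This is a sketch in C that assumes handler names on one line are separated by
commas, as in the listing above; the function name irq_sharers is hypothetical
and it expects a line that has at least one handler.

```c
#include <assert.h>
#include <stddef.h>

/* Count handlers on one /proc/interrupts line: names are comma-separated. */
static int irq_sharers(const char *line)
{
    int n = 1;  /* a line with a handler but no comma has one sharer */

    assert(line != NULL);
    for (; *line != '\0'; line++)
        if (*line == ',')
            n++;
    return n;
}
```

Any line for which this returns more than 1 is a shared IRQ, and IRQ 19 above
is the one shared between USB and libata.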

[ i can hunt for other hardware if this doesnt look broken enough to you
:-) ]

Ingo

2007-01-31 17:52:51

by Jeff Garzik

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Ingo Molnar wrote:
> 19: 2413090 0 IO-APIC-fasteoi uhci_hcd:usb2, libata

Yep, that's a good candidate for such experiments :)

Jeff



2007-01-31 20:12:15

by Thomas Gleixner

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

On Wed, 2007-01-31 at 12:52 -0500, Jeff Garzik wrote:
> Ingo Molnar wrote:
> > 19: 2413090 0 IO-APIC-fasteoi uhci_hcd:usb2, libata
>
> Yep, that's a good candidate for such experiments :)

Happens to be the same thing, which causes a stale interrupt on the
second suspend/resume cycle.

tglx


2007-02-01 12:49:54

by Pavel Machek

Subject: Re: Linux 2.6.20-rc6 - sky2 resume breakage

Hi!

> If we prove that Windows doesn't use the second either then it means
> they enumerate processors via the DSDT -- which means bringing up
> the ACPI interpreter before bringing up SMP -- and that would require
> a significant change to Linux boot sequence...

Well, as we can do cpu hotplug these days... we could do this. Just
boot up with single cpu, then bring up additional cpus at runtime...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-02-07 11:57:41

by Conke Hu

Subject: [LIBATA BUG] sr.c: TEST_UNIT_READY error

Hi,
TEST_UNIT_READY in get_capabilities (drivers/scsi/sr.c line 743, or
see below) always returns error.

---------------- code begin -----------------------------
retries = 0;
do {
memset((void *)cmd, 0, MAX_COMMAND_SIZE);
cmd[0] = TEST_UNIT_READY;

the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
0, &sshdr, SR_TIMEOUT,
MAX_RETRIES);

retries++;
} while (retries < 5 &&
(!scsi_status_is_good(the_result) ||
(scsi_sense_valid(&sshdr) &&
sshdr.sense_key == UNIT_ATTENTION)));
---------------- code end -----------------------------

I debugged all kernel versions from 2.6.17 to 2.6.20 on several AMD
and other vendors' PATA/IDE controllers, and I get the_result==0x8000002
and retries==5; on silicon image 3132, i get the_result=0x2eb.
Does 0x8000002 mean ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)?
what's wrong?
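
The question about 0x8000002 can be checked by splitting the result word into
its four bytes. This is a minimal sketch in C, assuming the usual SCSI
mid-layer layout (status, message, host and driver bytes from low to high);
the struct and function names are hypothetical, and the two constants are
copied here as assumptions rather than included from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the kernel's SCSI headers of that era. */
#define SAM_STAT_CHECK_CONDITION 0x02
#define DRIVER_SENSE             0x08

struct result_fields {
    uint8_t status;  /* bits 0-7: SCSI status reported by the device */
    uint8_t msg;     /* bits 8-15: message byte */
    uint8_t host;    /* bits 16-23: host adapter code */
    uint8_t driver;  /* bits 24-31: driver code set by the mid-layer */
};

/* Split a 32-bit SCSI result word into its four byte-wide fields. */
static void split_result(uint32_t result, struct result_fields *out)
{
    assert(out != NULL);
    out->status = (uint8_t)(result & 0xff);
    out->msg    = (uint8_t)((result >> 8) & 0xff);
    out->host   = (uint8_t)((result >> 16) & 0xff);
    out->driver = (uint8_t)((result >> 24) & 0xff);
}
```

Under that layout 0x8000002 has driver byte 0x08 and status byte 0x02, which
matches the (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION reading, while
0x2eb would have a status byte of 0xeb and a message byte of 0x02.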


Conke



2007-02-07 12:40:36

by Jeff Garzik

Subject: Re: [LIBATA BUG] sr.c: TEST_UNIT_READY error

Conke Hu wrote:
> Hi,
> TEST_UNIT_READY in get_capabilities (drivers/scsi/sr.c line 743, or
> see below) always returns error.
>
> ---------------- code begin -----------------------------
> retries = 0;
> do {
> memset((void *)cmd, 0, MAX_COMMAND_SIZE);
> cmd[0] = TEST_UNIT_READY;
>
> the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
> 0, &sshdr, SR_TIMEOUT,
> MAX_RETRIES);
>
> retries++;
> } while (retries < 5 &&
> (!scsi_status_is_good(the_result) ||
> (scsi_sense_valid(&sshdr) &&
> sshdr.sense_key == UNIT_ATTENTION)));
> ---------------- code end -----------------------------
>
> I debugged all kernel versions from 2.6.17 to 2.6.20 on several AMD
> and other vendor's PATA/IDE controllers, and I get the_result==0x8000002
> and retries==5; on silicon image 3132, i get the_result=0x2eb.
> Does 0x8000002 mean ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)?
> what's wrong?

What does the sense data returned in the sense buffer say is wrong?

Jeff



2007-02-08 11:29:36

by Conke Hu

Subject: Re: [LIBATA BUG] sr.c: TEST_UNIT_READY error

On Wed, 2007-02-07 at 07:40 -0500, Jeff Garzik wrote:
> Conke Hu wrote:
> > Hi,
> > TEST_UNIT_READY in get_capabilities (drivers/scsi/sr.c line 743, or
> > see below) always returns error.
> >
> > ---------------- code begin -----------------------------
> > retries = 0;
> > do {
> > memset((void *)cmd, 0, MAX_COMMAND_SIZE);
> > cmd[0] = TEST_UNIT_READY;
> >
> > the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
> > 0, &sshdr, SR_TIMEOUT,
> > MAX_RETRIES);
> >
> > retries++;
> > } while (retries < 5 &&
> > (!scsi_status_is_good(the_result) ||
> > (scsi_sense_valid(&sshdr) &&
> > sshdr.sense_key == UNIT_ATTENTION)));
> > ---------------- code end -----------------------------
> >
> > I debugged all kernel versions from 2.6.17 to 2.6.20 on several AMD
> > and other vendor's PATA/IDE controllers, and I get the_result==0x8000002
> > and retries==5; on silicon image 3132, i get the_result=0x2eb.
> > Does 0x8000002 mean ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)?
> > what's wrong?
>
> What does the sense data returned in the sense buffer say is wrong?
>
> Jeff

I dump scsi_sense_hdr as follows:
sshdr.response_code = 0x70
sshdr.sense_key = 0x2
sshdr.asc = 0x3a
sshdr.ascq = 0x1
sshdr.additional_length = 0x0

the sense_key is 0x2 (NOT_READY), but the expected UNIT_ATTENTION :(
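
Given those values, retries==5 may simply be what the loop condition produces.
The sketch below is a userspace model in C of the loop quoted earlier, with
hypothetical stand-ins for scsi_execute_req() and scsi_status_is_good() (the
latter simplified so that only GOOD counts as good). ASC 0x3a is "medium not
present", so the stub models a drive with no disc in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_READY      0x02
#define UNIT_ATTENTION 0x06

/* Simplified stand-in for scsi_status_is_good(): only GOOD (0x00) counts. */
static bool status_good_model(uint32_t result)
{
    return (result & 0xfe) == 0;
}

/* Hypothetical stand-in for scsi_execute_req(): returns the result word. */
typedef uint32_t (*exec_fn)(uint8_t *sense_key, bool *sense_valid);

/* Same control flow as the TEST_UNIT_READY loop; returns the retry count. */
static int tur_loop_model(exec_fn exec)
{
    int retries = 0;
    uint32_t result;
    uint8_t key;
    bool valid;

    assert(exec != NULL);
    do {
        key = 0;
        valid = false;
        result = exec(&key, &valid);
        retries++;
    } while (retries < 5 &&
             (!status_good_model(result) ||
              (valid && key == UNIT_ATTENTION)));
    return retries;
}

/* Drive with no medium: CHECK CONDITION plus NOT_READY sense, every time. */
static uint32_t exec_no_medium(uint8_t *key, bool *valid)
{
    *key = NOT_READY;
    *valid = true;
    return 0x8000002u;
}

/* Drive that is ready: GOOD status, no sense data. */
static uint32_t exec_ready(uint8_t *key, bool *valid)
{
    (void)key;
    *valid = false;
    return 0;
}
```

In this model the first clause of the condition keeps the loop going for any
non-good status, whatever the sense key is, so a NOT_READY answer runs all
five iterations just as a UNIT_ATTENTION one would.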

BTW, I am sorry for a mistake, Sil3132 also returns 0x8000002, not 0x2eb
as I said in the first mail. In a word, all cases return "the_result" as
0x8000002.

Conke



2007-02-13 07:30:44

by Conke Hu

Subject: Re: [LIBATA BUG] sr.c: TEST_UNIT_READY error

On 2/2/07, Conke Hu <[email protected]> wrote:
> On Wed, 2007-02-07 at 07:40 -0500, Jeff Garzik wrote:
> > Conke Hu wrote:
> > > Hi,
> > > TEST_UNIT_READY in get_capabilities (drivers/scsi/sr.c line 743, or
> > > see below) always returns error.
> > >
> > > ---------------- code begin -----------------------------
> > > retries = 0;
> > > do {
> > > memset((void *)cmd, 0, MAX_COMMAND_SIZE);
> > > cmd[0] = TEST_UNIT_READY;
> > >
> > > the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
> > > 0, &sshdr, SR_TIMEOUT,
> > > MAX_RETRIES);
> > >
> > > retries++;
> > > } while (retries < 5 &&
> > > (!scsi_status_is_good(the_result) ||
> > > (scsi_sense_valid(&sshdr) &&
> > > sshdr.sense_key == UNIT_ATTENTION)));
> > > ---------------- code end -----------------------------
> > >
> > > I debugged all kernel versions from 2.6.17 to 2.6.20 on several AMD
> > > and other vendor's PATA/IDE controllers, and I get the_result==0x8000002
> > > and retries==5; on silicon image 3132, i get the_result=0x2eb.
> > > Does 0x8000002 mean ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)?
> > > what's wrong?
> >
> > What does the sense data returned in the sense buffer say is wrong?
> >
> > Jeff
>
> I dump scsi_sense_hdr as follows:
> sshdr.response_code = 0x70
> sshdr.sense_key = 0x2
> sshdr.asc = 0x3a
> sshdr.ascq = 0x1
> sshdr.additional_length = 0x0
>
> the sense_key is 0x2 (NOT_READY), but the expected UNIT_ATTENTION :(
but "NOT" the expected UNIT_ATTENTION.

>
> BTW, I am sorry for a mistake, Sil3132 also returns 0x8000002, not 0x2eb
> as I said in the first mail. In a word, all cases return "the_result" as
> 0x8000002.
>


the bytes 0 ~ 13 in sense buffer are:
70 00 02 00 00 00 00 0a 00 00 00 00 3a
other bytes are all 0x00;

in fact this issue can be reproduced in any libata driver, either sata or pata.

Conke

2007-02-15 06:30:06

by Conke Hu

Subject: Re: [LIBATA BUG] sr.c: TEST_UNIT_READY error

On 2/13/07, Conke Hu <[email protected]> wrote:
> On 2/2/07, Conke Hu <[email protected]> wrote:
> > On Wed, 2007-02-07 at 07:40 -0500, Jeff Garzik wrote:
> > > Conke Hu wrote:
> > > > Hi,
> > > > TEST_UNIT_READY in get_capabilities (drivers/scsi/sr.c line 743, or
> > > > see below) always returns error.
> > > >
> > > > ---------------- code begin -----------------------------
> > > > retries = 0;
> > > > do {
> > > > memset((void *)cmd, 0, MAX_COMMAND_SIZE);
> > > > cmd[0] = TEST_UNIT_READY;
> > > >
> > > > the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
> > > > 0, &sshdr, SR_TIMEOUT,
> > > > MAX_RETRIES);
> > > >
> > > > retries++;
> > > > } while (retries < 5 &&
> > > > (!scsi_status_is_good(the_result) ||
> > > > (scsi_sense_valid(&sshdr) &&
> > > > sshdr.sense_key == UNIT_ATTENTION)));
> > > > ---------------- code end -----------------------------
> > > >
> > > > I debugged all kernel versions from 2.6.17 to 2.6.20 on several AMD
> > > > and other vendor's PATA/IDE controllers, and I get the_result==0x8000002
> > > > and retries==5; on silicon image 3132, i get the_result=0x2eb.
> > > > Does 0x8000002 mean ((DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION)?
> > > > what's wrong?
> > >
> > > What does the sense data returned in the sense buffer say is wrong?
> > >
> > > Jeff
> >
> > I dump scsi_sense_hdr as follows:
> > sshdr.response_code = 0x70
> > sshdr.sense_key = 0x2
> > sshdr.asc = 0x3a
> > sshdr.ascq = 0x1
> > sshdr.additional_length = 0x0
> >
> > the sense_key is 0x2 (NOT_READY), but the expected UNIT_ATTENTION :(
> but "NOT" the expected UNIT_ATTENTION.
>
> >
> > BTW, I am sorry for a mistake, Sil3132 also returns 0x8000002, not 0x2eb
> > as I said in the first mail. In a word, all cases return "the_result" as
> > 0x8000002.
> >
>
>
> the bytes 0 ~ 13 in sense buffer are:
> 70 00 02 00 00 00 00 0a 00 00 00 00 3a
> other bytes are all 0x00;
>
> in fact this issue can be reproduced in any libata driver, either sata or pata.
>
> Conke
>

[resend]
Any suggestions?