2006-11-08 02:33:51

by Linus Torvalds

Subject: Linux 2.6.19-rc5


Ok, things are finally calming down, it seems.

The -rc5 thing is mainly a few random architecture updates (arm, mips,
uml, avr, power) and the only really noticeable one there is likely some
fixes to the local APIC accesses on x86, which apparently fixes a few
machines.

The rest is really mostly one-liners (or close) to various subsystems. New
PCI ID's, trivial fixes, cifs, dvb, things like that. I'm feeling better
about this - there may be a -rc6, but maybe we don't even need one.

As usual, thanks to everybody who tested and chased down some of the
regressions,

Linus

---
Adrian Bunk (2):
[TIPC] net/tipc/port.c: fix NULL dereference
PCI: Let PCI_MULTITHREAD_PROBE depend on BROKEN

Akinobu Mita (4):
tokenring: fix module_init error handling
n2: fix confusing error code
edac_mc: fix error handling
sunrpc: add missing spin_unlock

Al Viro (8):
[IPV6]: File the fingerprints off ah6->spi/esp6->spi
[IPX]: Trivial parts of endianness annotations
[IPX]: Annotate and fix IPX checksum
[IPV6]: Fix ECN bug on big-endian
[NETFILTER] bug: NFULA_CFG_QTHRESH uses 32bit
[NETFILTER] bug: nfulnl_msg_config_mode ->copy_range is 32bit
[NETFILTER] bug: skb->protocol is already net-endian
[PKTGEN]: TCI endianness fixes

Alexey Dobriyan (1):
[GFS2] don't panic needlessly

Amol Lad (1):
drivers/isdn/hysdn/hysdn_sched.c: sleep after taking spinlock fix

Andreas Gruenbacher (1):
Fix user.* xattr permission check for sticky dirs

Andrew Morton (6):
find_bd_holder() fix
tidy "md: check bio address after mapping through partitions"
Add printk_timed_ratelimit()
schedule removal of FUTEX_FD
acpi_noirq section fix
spi section fix

Andy Fleming (2):
[POWERPC] Fix rmb() for e500-based machines it
[POWERPC] Fix oprofile support for e500 in arch/powerpc

Ankita Garg (1):
Fix for LKDTM MEM_SWAPOUT crashpoint

Atsushi Nemoto (2):
[MIPS] Fixup migration to GENERIC_TIME
[MIPS] Do not use -msym32 option for modules.

Auke Kok (1):
e1000: Fix regression: garbled stats and irq allocation during swsusp

Ben Dooks (5):
[ARM] 3915/1: S3C2412: Add s3c2410_gpio_getirq() to general gpio.c
[ARM] 3920/1: S3C24XX: Remove smdk2410_defconfig
[ARM] 3921/1: S3C24XX: remove bast_defconfig
[ARM] 3922/1: S3C24XX: update s3c2410_defconfig to 2.6.19-rc4
[ARM] 3923/1: S3C24XX: update s3c2410_defconfig with new drivers

Benjamin Herrenschmidt (2):
[POWERPC] Fix various offb issues
[POWERPC] Make alignment exception always check exception table

Bjorn Schneider (1):
USB: new VID/PID-combos for cp2101

Brice Goglin (1):
myri10ge: ServerWorks HT2000 PCI id is already defined in pci_ids.h

Daniel Drake (1):
jfs: Add splice support

Daniel Ritz (1):
usbtouchscreen: use endpoint address from endpoint descriptor

Daniel Yeisley (1):
init_reap_node() initialization fix

Dave Kleikamp (1):
JFS: Remove redundant xattr permission checking

David Brownell (3):
USB: fix compiler issues with newer gcc versions
USB: use MII hooks only if CONFIG_MII is enabled
[ARM] 3926/1: make timer led handle HZ != 100

David Härdeman (1):
V4L/DVB (4785): Budget-ci: Change DEBIADDR_IR to a safer default

David Rientjes (1):
net s2io: return on NULL dev_alloc_skb()

David S. Miller (7):
[APPLETALK]: Fix potential OOPS in atalk_sendmsg().
[XFRM] xfrm_user: Fix unaligned accesses.
[ETH1394]: Fix unaligned accesses.
[SPARC64]: Fix Tomatillo/Schizo IRQ handling.
[SPARC64]: Add some missing print_symbol() calls.
[SPARC64]: Fix futex_atomic_cmpxchg_inatomic implementation.
[SPARC]: Fix robust futex syscalls and wire up migrate_pages.

Dmitry Mishin (3):
[NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables
[NETFILTER]: ip_tables: compat code module refcounting fix
[IPV6]: Add ndisc_netdev_notifier unregister.

Dominic Cerquetti (1):
USB: xpad: additional USB id's added

Enrico Scholz (1):
[ARM] 3919/1: Fixed definition of some PXA270 CIF related registers

Erez Zilber (1):
IB/iser: Start connection after enabling iSER

Eric Sandeen (1):
fix UFS superblock alignment issues

Eric W. Biederman (3):
Improve the removed sysctl warnings
sysctl: allow a zero ctl_name in the middle of a sysctl table
sysctl: implement CTL_UNNUMBERED

Gautham R Shenoy (1):
Fix the spurious unlock_cpu_hotplug false warnings

Grant Grundler (1):
hid-core: big-endian fix fix

Greg Kroah-Hartman (2):
PCI: Revert "PCI: i386/x86_84: disable PCI resource decode on device disable"
USB: add another sierra wireless device id

Gui,Jian (1):
[POWERPC] Disallow kprobes on emulate_step and branch_taken

Haavard Skinnemoen (4):
AVR32: Get rid of board_early_init
AVR32: Fix thinko in generic_find_next_zero_le_bit()
AVR32: Wire up sys_epoll_pwait
AVR32: Add missing return instruction in __raw_writesb

Hartmut Hackmann (1):
V4L/DVB (4770): Fix mode switch of Compro Videomate T300

Heiko Carstens (4):
[NET]: fix uaccess handling
sys_pselect7 vs compat_sys_pselect7 uaccess error handling
[S390] revert add_active_range() usage patch.
[S390] IRQs too early enabled.

Herbert Xu (2):
[NET]: Fix segmentation of linear packets
[SCTP]: Always linearise packet on input

Hugh Dickins (3):
[POWERPC] Make current preempt-safe
[POWERPC] Make high hugepage areas preempt safe
[POWERPC] Make mmiowb's io_sync preempt safe

Jack Morgenstein (1):
IB/uverbs: Return sq_draining value in query_qp response

James Morris (3):
[IPV6]: fix lockup via /proc/net/ip6_flowlabel
[IPV6]: return EINVAL for invalid address with flowlabel lease request
[IPV6]: fix flowlabel seqfile handling

Jamie Lenehan (2):
sh: Fix IPR-IRQ's for IRQ-chip change breakage.
sh: Titan defconfig update.

Jan Luebbe (1):
USB: sierra: Fix id for Sierra Wireless MC8755 in new table

Jan Mate (1):
USB Storage: unusual_devs.h entry for Sony Ericsson P990i

Jan-Benedict Glaw (1):
Update for the srm_env driver.

Jan-Bernd Themann (1):
ehea: kzalloc GFP_ATOMIC fix

Jeff Dike (4):
uml: add _text definition to linker scripts
uml: add INITCALLS
uml: fix I/O hang
uml: include tidying

Jeff Garzik (1):
Revert "Add 0x7110 piix to ata_piix.c"

Jeff Mahoney (1):
reiserfs: reset errval after initializing bitmap cache

Jens Axboe (3):
CFQ: request <-> request merging rr_list fixup
Add 0x7110 piix to ata_piix.c
splice: fix problem introduced with inode diet

Jes Sorensen (1):
[IA64] don't double >> PAGE_SHIFT pointer for /dev/kmem access

Jiri Benc (1):
ieee80211: don't flood log with errors

Johannes Berg (1):
b44: change comment about irq mask register

Keith Owens (1):
[IA64] Correct definition of handle_IPI

Kenji Kaneshige (1):
[IA64] cpu-hotplug: Fixing confliction between CPU hot-add and IPI

Kevin Hilman (2):
[ARM] 3917/1: Fix dmabounce symbol exports
[ARM] 3918/1: ixp4xx irq-chip rework

Krishna Kumar (1):
RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count

Kristoffer Ericson (1):
video: Fix include in hp680_bl.

Larry Finger (1):
bcm43xx: fix unexpected LED control values in BCM4303 sprom

Larry Woodman (1):
[NET]: __alloc_pages() failures reported due to fragmentation

Lennert Buytenhek (3):
ep93xx_eth: fix RX/TXstatus ring full handling
ep93xx_eth: fix unlikely(x) > y test
ep93xx_eth: don't report RX errors

Linas Vepstas (1):
[POWERPC] Use 4kB iommu pages even on 64kB-page systems

Linus Torvalds (6):
i386: clean up io-apic accesses
i386: write IO APIC irq routing entries in correct order
Revert unintentional "volatile" changes in ipc/msg.c
Fix unlikely (but possible) race condition on task->user access
Make sure "user->sigpending" count is in sync
Linux 2.6.19-rc5

Manish Lachwani (1):
[MIPS] Add missing file for support of backplane on TX4927 based board

Martin Josefsson (1):
[NETFILTER]: nf_conntrack: add missing unlock in get_next_corpse()

Meelis Roos (1):
[NETFILTER]: silence a warning in ebtables

Michael Buesch (1):
bcm43xx: Fix low-traffic netdev watchdog TX timeouts

Michael Chan (1):
[TG3]: Fix 2nd ifup failure on 5752M.

Michael Halcrow (7):
eCryptfs: Clean up crypto initialization
eCryptfs: Hash code to new crypto API
eCryptfs: Cipher code to new crypto API
eCryptfs: Consolidate lower dentry_open's
eCryptfs: Remove ecryptfs_umount_begin
eCryptfs: Fix handling of lower d_count
eCryptfs: Fix pointer deref

Michael S. Tsirkin (1):
IB/mthca: Fix MAD extended header format for MAD_IFC firmware command

Naranjo Manuel Francisco (1):
USB: HID: add blacklist AIRcable USB, little beautification

NeilBrown (2):
md: check bio address after mapping through partitions.
md: send online/offline uevents when an md array starts/stops

nkalmala (1):
mm: un-needed add-store operation wastes a few bytes

OGAWA Hirofumi (4):
Cleanup read_pages()
cifs: ->readpages() fixes
fuse: ->readpages() cleanup
gfs2: ->readpages() fixes

Oleg Nesterov (2):
taskstats: fix sub-threads accounting
fix Documentation/accounting/getdelays.c buf size

Oliver Endriss (1):
V4L/DVB (4784): [saa7146_i2c] short_delay mode fixed for fast machines

Oliver Neukum (2):
USB: failure in usblp's error path
USB: usblp: fix system suspend for some systems

Paolo 'Blaisorblade' Giarrusso (11):
uml ubd driver: allow using up to 16 UBD devices
uml ubd driver: document some struct fields
uml ubd driver: var renames
uml ubd driver: give better names to some functions.
uml ubd driver: change ubd_lock to be a mutex
uml ubd driver: ubd_io_lock usage fixup
uml ubd driver: convert do_ubd to a boolean variable
uml ubd driver: reformat ubd_config
uml ubd driver: use bitfields where possible
uml ubd driver: do not store error codes as ->fd
uml ubd driver: various little changes

Patrick Caulfield (2):
[DLM] Fix kref_put oops
[DLM] fix oops in kref_put when removing a lockspace

Patrick McHardy (2):
[NETFILTER]: remove masq/NAT from ip6tables Kconfig help
[IPV6]: Give sit driver an appropriate module alias.

Paul Gortmaker (1):
[ARM] 3912/1: Make PXA270 advertise HWCAP_IWMMXT capability

Paul Mackerras (2):
IB/ehca: Fix eHCA driver compilation for uniprocessor
powerpc: Eliminate "exceeds stub group size" linker warning

Paul Moore (2):
[NetLabel]: protect the CIPSOv4 socket option from setsockopt()
[NETLABEL]: Fix build failure.

Paul Mundt (2):
sh: Wire up new syscalls.
sh: Update r7780rp_defconfig.

Pavel Emelianov (1):
Fix ipc entries removal

Pavel Roskin (1):
hostap_plx: fix CIS verification

Peer Chen (5):
[libata] sata_nv: Add PCI IDs
[libata] Add support for PATA controllers of MCP67 to pata_amd.c.
[libata] Add support for AHCI controllers of MCP67.
pci_ids.h: Add NVIDIA PCI ID
IDE: Add the support of nvidia PATA controllers of MCP67 to amd74xx.c

Peter Zijlstra (1):
lockdep: fix delayacct locking bug

Phil Dibowitz (1):
USB: usb-storage: Unusual_dev update

Rafael J. Wysocki (1):
swsusp: debugging

Ralf Baechle (26):
[MIPS] TX4927: Remove indent error message that somehow ended in the code.
[MIPS] Sort out missuse of __init for prom_getcmdline()
[MIPS] VSMP: Fix initialization ordering bug.
[MIPS] Flags must be unsigned long.
[MIPS] VSMP: Synchronize cp0 counters on bootup.
[MIPS] 16K & 64K page size fixes
[MIPS] SMTC: Fix crash if # of TC's > # of VPE's after pt_regs irq cleanup.
[MIPS] SMTC: Synchronize cp0 counters on bootup.
Revert "[MIPS] Make SPARSEMEM selectable on QEMU."
[MIPS] Fix merge screwup by patch(1)
[MIPS] IP27: Allow SMP ;-) Another changeset messed up by patch.
[MIPS] Fix warning about init_initrd() call if !CONFIG_BLK_DEV_INITRD.
[MIPS] Ocelot G: Fix : "CURRENTLY_UNUSED" is not defined warning.
[MIPS] Don't use R10000 llsc workaround version for all llsc-full processors.
[MIPS] Ocelot C: Fix large number of warnings.
[MIPS] Ocelot C: fix eth registration after conversion to platform_device
[MIPS] Ocelot C: Fix warning about missmatching format string.
[MIPS] Ocelot C: Fix mapping of ioport address range.
[MIPS] Ocelot 3: Fix large number of warnings.
[MIPS] SB1: On bootup only flush cache on local CPU.
[MIPS] Ocelot C: Fix MAC address detection after platform_device conversion.
[MIPS] Ocelot 3: Fix MAC address detection after platform_device conversion.
[MIPS] EV64120: Fix timer initialization for HZ != 100.
[MIPS] Make irq number allocator generally available for fixing EV64120.
[MIPS] EV64120: Fix PCI interrupt allocation.
[MIPS] Fix EV64120 and Ocelot builds by providing a plat_timer_setup().

Randy Dunlap (8):
[NET] sealevel: uses arp_broken_ops
[DCCP]: fix printk format warnings
SCSI: ISCSI build failure
V4L/DVB (4786): Pvrusb2: use NULL instead of 0
update some docbook comments
docbook: merge journal-api into filesystems.tmpl
lkdtm: cleanup headers and module_param/MODULE_PARM_DESC
Kconfig: remove redundant NETDEVICES depends

Ray Lehtiniemi (1):
[ARM] 3927/1: Allow show_mem() to work with holes in memory map.

Raymond Mantchala (1):
V4L/DVB (4787): Budget-ci: Inversion setting fixed for Technotrend 1500 T

Russ Anderson (1):
[IA64] MCA recovery: Montecito support

Sean Hefty (1):
RDMA/addr: Use client registration to fix module unload race

Srinivasa Ds (1):
NFS4: fix for recursive locking problem

Stephen Hemminger (4):
sky2: not experimental
skge, sky2, et all. gplv2 only
sky2: netpoll on dual port cards
[TCP]: Set default congestion control when no sysctl.

Stephen Rothwell (3):
Create compat_sys_migrate_pages
powerpc: wire up sys_migrate_pages
Fix sys_move_pages when a NULL node list is passed

Steve French (3):
[CIFS] Fix readdir breakage when blocksize set too small
[CIFS] Allow null user connections
[CIFS] report rename failure when target file is locked by Windows

Steve Wise (2):
IB/amso1100: Use dma_alloc_coherent() instead of kmalloc/dma_map_single
IB/amso1100: Fix incorrect pr_debug()

Steven Whitehouse (2):
[GFS2] Fix incorrect fs sync behaviour.
[GFS2] Fix OOM error handling

Tejun Heo (4):
sata_sis: fix flags handling for the secondary port
libata: unexport ata_dev_revalidate()
ata_piix: allow 01b MAP for both ICH6M and ICH7M
ahci: fix status register check in ahci_softreset

Thomas Klein (3):
ehea: Nullpointer dereferencation fix
ehea: Removed redundant define
ehea: 64K page support fix

Tilman Schmidt (1):
isdn/gigaset: convert warning message

Timur Tabi (1):
[POWERPC] qe_lib: qe_issue_cmd writes wrong value to CECDR

Trent Piepho (2):
V4L/DVB (4752): DVB: Add DVB_FE_CUSTOMISE support for MT2060
V4L/DVB (4751): Fix DBV_FE_CUSTOMISE for card drivers compiled into kernel

Troy Heber (1):
[IA64] move SAL_CACHE_FLUSH check later in boot

Vasily Averin (1):
[NETFILTER]: ip_tables: compat error way cleanup

Vlad Yasevich (2):
[SCTP]: Correctly set IP id for SCTP traffic
[SCTP]: Remove temporary associations from backlog and hash.

Yoichi Yuasa (3):
[MIPS] Yosemite: fix uninitialized variable in titan_i2c_xfer()
[MIPS] Fix warning of printk format in mips_srs_init()
[MIPS] Fix warning in mips-boards generic PCI

Yvan Seth (1):
ipmi_si_intf.c sets bad class_mask with PCI_DEVICE_CLASS


2006-11-08 09:43:43

by Nigel Cunningham

Subject: Re: Linux 2.6.19-rc5

Gidday.

On Tue, 2006-11-07 at 18:33 -0800, Linus Torvalds wrote:
> Ok, things are finally calming down, it seems.
>
> The -rc5 thing is mainly a few random architecture updates (arm, mips,
> uml, avr, power) and the only really noticeable one there is likely some
> fixes to the local APIC accesses on x86, which apparently fixes a few
> machines.
>
> The rest is really mostly one-liners (or close) to various subsystems. New
> PCI ID's, trivial fixes, cifs, dvb, things like that. I'm feeling better
> about this - there may be a -rc6, but maybe we don't even need one.
>
> As usual, thanks to everybody who tested and chased down some of the
> regressions,
>
> Linus

The patch etc doesn't seem to be available yet. (The front page is still
showing -rc4, for example).

Regards,

Nigel

2006-11-08 09:59:32

by Alessandro Suardi

Subject: Re: Linux 2.6.19-rc5

On 11/8/06, Nigel Cunningham <[email protected]> wrote:
> Gidday.
>
> On Tue, 2006-11-07 at 18:33 -0800, Linus Torvalds wrote:
> > Ok, things are finally calming down, it seems.
> >
> > The -rc5 thing is mainly a few random architecture updates (arm, mips,
> > uml, avr, power) and the only really noticeable one there is likely some
> > fixes to the local APIC accesses on x86, which apparently fixes a few
> > machines.
> >
> > The rest is really mostly one-liners (or close) to various subsystems. New
> > PCI ID's, trivial fixes, cifs, dvb, things like that. I'm feeling better
> > about this - there may be a -rc6, but maybe we don't even need one.
> >
> > As usual, thanks to everybody who tested and chased down some of the
> > regressions,
> >
> > Linus
>
> The patch etc doesn't seem to be available yet. (The front page is still
> showing -rc4, for example).

The patch is available, it's just the kernel.org home that
isn't updated.

http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.19-rc5.bz2

--alessandro

"...when I get it, I _get_ it"

(Lara Eidemiller)

2006-11-08 10:04:19

by Nigel Cunningham

Subject: Re: Linux 2.6.19-rc5

Hi.

On Wed, 2006-11-08 at 10:59 +0100, Alessandro Suardi wrote:
> On 11/8/06, Nigel Cunningham <[email protected]> wrote:
> > Gidday.
> >
> > On Tue, 2006-11-07 at 18:33 -0800, Linus Torvalds wrote:
> > > Ok, things are finally calming down, it seems.
> > >
> > > The -rc5 thing is mainly a few random architecture updates (arm, mips,
> > > uml, avr, power) and the only really noticeable one there is likely some
> > > fixes to the local APIC accesses on x86, which apparently fixes a few
> > > machines.
> > >
> > > The rest is really mostly one-liners (or close) to various subsystems. New
> > > PCI ID's, trivial fixes, cifs, dvb, things like that. I'm feeling better
> > > about this - there may be a -rc6, but maybe we don't even need one.
> > >
> > > As usual, thanks to everybody who tested and chased down some of the
> > > regressions,
> > >
> > > Linus
> >
> > The patch etc doesn't seem to be available yet. (The front page is still
> > showing -rc4, for example).
>
> The patch is available, it's just the kernel.org home that
> isn't updated.
>
> http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.19-rc5.bz2

Ta. I was more concerned that whoever needs to fix whatever's broken
knows the issue exists.

Regards,

Nigel

2006-11-08 14:19:25

by Gene Heskett

Subject: Re: Linux 2.6.19-rc5

On Wednesday 08 November 2006 04:59, Alessandro Suardi wrote:
>On 11/8/06, Nigel Cunningham <[email protected]> wrote:
>> Gidday.
>>
>> On Tue, 2006-11-07 at 18:33 -0800, Linus Torvalds wrote:
>> > Ok, things are finally calming down, it seems.
>> >
>> > The -rc5 thing is mainly a few random architecture updates (arm,
>> > mips, uml, avr, power) and the only really noticeable one there is
>> > likely some fixes to the local APIC accesses on x86, which apparently
>> > fixes a few machines.
>> >
>> > The rest is really mostly one-liners (or close) to various
>> > subsystems. New PCI ID's, trivial fixes, cifs, dvb, things like that.
>> > I'm feeling better about this - there may be a -rc6, but maybe we
>> > don't even need one.
>> >
>> > As usual, thanks to everybody who tested and chased down some of the
>> > regressions,
>> >
>> > Linus
>>
>> The patch etc doesn't seem to be available yet. (The front page is
>> still showing -rc4, for example).
>
>The patch is available, it's just the kernel.org home that
> isn't updated.
>
Tis now, I have it building.

>http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.19-rc5.bz2
>
>--alessandro
>
>"...when I get it, I _get_ it"
>
> (Lara Eidemiller)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2006-11-08 15:44:00

by Linus Torvalds

Subject: Re: Linux 2.6.19-rc5



On Wed, 8 Nov 2006, Nigel Cunningham wrote:
>
> The patch etc doesn't seem to be available yet. (The front page is still
> showing -rc4, for example).

It seems that mirroring is taking forever again. The patch and tar-balls
are definitely there on the master site, and even gitweb has mirrored out
(at least to one of the mirrors), but it looks like the mirroring hasn't
gotten to the kernel source "testing" directory yet.

Linus

2006-11-13 22:14:43

by Adrian Bunk

Subject: 2.6.19-rc5: known regressions with patches

This email lists some known regressions in 2.6.19-rc5 compared to 2.6.18
with patches available.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way
possibly involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject : x86_64: Fix partial page check to ensure unusable memory
is not being marked usable
References : http://lkml.org/lkml/2006/11/9/239
Submitter : Aaron Durbin <[email protected]>
Caused-By : Mel Gorman <[email protected]>
commit 5cb248abf5ab65ab543b2d5fc16c738b28031fc0
Patch : http://lkml.org/lkml/2006/11/9/239
Status : patch available


Subject : libata must be initialized earlier
References : http://ozlabs.org/pipermail/linuxppc-dev/2006-November/027945.html
Submitter : Paul Mackerras <[email protected]>
Handled-By : Brian King <[email protected]>
Patch : http://marc.theaimsgroup.com/?l=linux-ide&m=116169938407596&w=2
Status : patch available


2006-11-13 22:56:59

by Brian King

Subject: Re: 2.6.19-rc5: known regressions with patches

Adrian Bunk wrote:
> Subject : libata must be initialized earlier
> References : http://ozlabs.org/pipermail/linuxppc-dev/2006-November/027945.html
> Submitter : Paul Mackerras <[email protected]>
> Handled-By : Brian King <[email protected]>
> Patch : http://marc.theaimsgroup.com/?l=linux-ide&m=116169938407596&w=2
> Status : patch available

I just resubmitted this patch a few minutes ago.

Brian

--
Brian King
eServer Storage I/O
IBM Linux Technology Center

2006-11-13 23:19:50

by Linus Torvalds

Subject: Re: 2.6.19-rc5: known regressions with patches



On Mon, 13 Nov 2006, Brian King wrote:

> Adrian Bunk wrote:
> > Subject : libata must be initialized earlier
> > References : http://ozlabs.org/pipermail/linuxppc-dev/2006-November/027945.html
> > Submitter : Paul Mackerras <[email protected]>
> > Handled-By : Brian King <[email protected]>
> > Patch : http://marc.theaimsgroup.com/?l=linux-ide&m=116169938407596&w=2
> > Status : patch available
>
> I just resubmitted this patch a few minutes ago.

I definitely want an ACK on this from Jeff - I'll take a few broken ppc64
machines any day over the worry that there might be problems elsewhere.

Jeff? Ack, Nack, or "I'll push it to you through my git tree", please..

Linus

2006-11-14 02:36:13

by Jeff Garzik

Subject: Re: 2.6.19-rc5: known regressions with patches

Linus Torvalds wrote:
>
> On Mon, 13 Nov 2006, Brian King wrote:
>
>> Adrian Bunk wrote:
>>> Subject : libata must be initialized earlier
>>> References : http://ozlabs.org/pipermail/linuxppc-dev/2006-November/027945.html
>>> Submitter : Paul Mackerras <[email protected]>
>>> Handled-By : Brian King <[email protected]>
>>> Patch : http://marc.theaimsgroup.com/?l=linux-ide&m=116169938407596&w=2
>>> Status : patch available
>> I just resubmitted this patch a few minutes ago.
>
> I definitely want an ACK on this from Jeff - I'll take a few broken ppc64
> machines any day over the worry that there might be problems elsewhere.
>
> Jeff? Ack, Nack, or "I'll push it to you through my git tree", please..

Reluctant ACK. But this whole subsys_init() mess is highly fragile, and
this is going to change again once a new dependency arises :/

Jeff



2006-11-15 10:21:18

by Adrian Bunk

Subject: 2.6.19-rc5: known regressions (v3)

This email lists some known regressions in 2.6.19-rc5 compared to 2.6.18
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject : PCI MSI setting corrupted during resume
References : http://bugzilla.kernel.org/show_bug.cgi?id=7479
Submitter : Stephen Hemminger <[email protected]>
Status : unknown


Subject : SMP kernel can not generate ISA irq properly
References : http://lkml.org/lkml/2006/10/22/15
http://lkml.org/lkml/2006/11/10/142
Submitter : Komuro <[email protected]>
Handled-By : "Eric W. Biederman" <[email protected]>
Ingo Molnar <[email protected]>
Status : problem is being debugged


Subject : ThinkPad R50p: boot fail with (lapic && on_battery)
References : http://lkml.org/lkml/2006/10/31/333
Submitter : Ernst Herzberg <[email protected]>
Handled-By : Len Brown <[email protected]>
Status : problem is being debugged


Subject : x86_64: Bad page state in process 'swapper'
References : http://lkml.org/lkml/2006/11/10/135
http://lkml.org/lkml/2006/11/10/208
Submitter : Andre Noll <[email protected]>
Handled-By : Andi Kleen <[email protected]>
Status : Andi is investigating


Subject : x86_64: oprofile doesn't work
References : http://lkml.org/lkml/2006/10/27/3
Submitter : Prakash Punnoor <[email protected]>
Status : unknown


Subject : unable to rip cd
References : http://lkml.org/lkml/2006/10/13/100
http://lkml.org/lkml/2006/11/8/42
Submitter : Alex Romosan <[email protected]>
Handled-By : Jens Axboe <[email protected]>
Status : Jens is investigating


Subject : can't disable OHCI wakeup via sysfs
References : http://lkml.org/lkml/2006/11/11/33
Submitter : Andrey Borzenkov <[email protected]>
Handled-By : Alan Stern <[email protected]>
Patch : http://lkml.org/lkml/2006/11/13/261
Status : patch available


2006-11-15 10:35:43

by Jens Axboe

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wed, Nov 15 2006, Adrian Bunk wrote:
> Subject : unable to rip cd
> References : http://lkml.org/lkml/2006/10/13/100
> http://lkml.org/lkml/2006/11/8/42
> Submitter : Alex Romosan <[email protected]>
> Handled-By : Jens Axboe <[email protected]>
> Status : Jens is investigating

it's fixed and the patch has been merged.

--
Jens Axboe

2006-11-15 10:36:05

by Eric Dumazet

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:

> Subject : x86_64: oprofile doesn't work
> References : http://lkml.org/lkml/2006/10/27/3
> Submitter : Prakash Punnoor <[email protected]>
> Status : unknown
>

I confirm I got this one too.
On a working kernel on an Opteron, we have normally 4 directories
in /dev/oprofile :

# ls -ld /dev/oprofile/?
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/0
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/1
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/2
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/3

With linux-2.6.19-rc5, the first one (0) is missing and we get 1,2,3

Maybe the 'bug' is in oprofile tools, that currently expect to find '0'

Eric

2006-11-15 10:50:43

by Andi Kleen

Subject: Re: 2.6.19-rc5: known regressions (v3)


> On a working kernel on an Opteron, we have normally 4 directories
> in /dev/oprofile :
>
> # ls -ld /dev/oprofile/?
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/0
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/1
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/2
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/3
>
> With linux-2.6.19-rc5, the first one (0) is missing and we get 1,2,3

That's because 0 was never available. It is used by the NMI watchdog.
The new kernel doesn't give it to oprofile anymore.

> Maybe the 'bug' is in oprofile tools, that currently expect to find '0'

Yes, it's likely a user space issue.

-Andi

2006-11-15 10:53:37

by Adrian Bunk

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wed, Nov 15, 2006 at 11:35:05AM +0100, Jens Axboe wrote:
> On Wed, Nov 15 2006, Adrian Bunk wrote:
> > Subject : unable to rip cd
> > References : http://lkml.org/lkml/2006/10/13/100
> > http://lkml.org/lkml/2006/11/8/42
> > Submitter : Alex Romosan <[email protected]>
> > Handled-By : Jens Axboe <[email protected]>
> > Status : Jens is investigating
>
> it's fixed and the patch has been merged.

Thanks for the information, I've removed it from my list.

> Jens Axboe

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-11-15 11:06:25

by Brice Goglin

Subject: Re: 2.6.19-rc5: known regressions (v3)

Adrian Bunk wrote:
> Subject : unable to rip cd
> References : http://lkml.org/lkml/2006/10/13/100
> http://lkml.org/lkml/2006/11/8/42
> Submitter : Alex Romosan <[email protected]>
> Handled-By : Jens Axboe <[email protected]>
> Status : Jens is investigating

I think this one is already fixed.

Brice




commit 616e8a091a035c0bd9b871695f4af191df123caa
author Jens Axboe <[email protected]> 1163437499 +0100
committer Linus Torvalds <[email protected]> 1163440020 -0800

[PATCH] Fix bad data direction in SG_IO

Contrary to what the name misleads you to believe, SG_DXFER_TO_FROM_DEV
is really just a normal read seen from the device side.

This patch fixes http://lkml.org/lkml/2006/10/13/100
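
The direction rule stated in the commit message above can be sketched as follows. This is only an illustrative Python sketch, not the kernel code (which is C): the function name is_write() is hypothetical, and it assumes a request that actually carries data. The constant values are the ones defined in <scsi/sg.h>.

```python
# dxfer_direction constants, with the values defined in <scsi/sg.h>.
SG_DXFER_NONE = -1
SG_DXFER_TO_DEV = -2
SG_DXFER_FROM_DEV = -3
SG_DXFER_TO_FROM_DEV = -4


def is_write(dxfer_direction):
    """Return True if a data-carrying SG_IO request writes to the device.

    Hypothetical helper for illustration only. Per the commit message,
    SG_DXFER_TO_FROM_DEV is treated as a normal read, so only
    SG_DXFER_TO_DEV counts as a write.
    """
    if dxfer_direction == SG_DXFER_TO_DEV:
        return True
    if dxfer_direction in (SG_DXFER_FROM_DEV, SG_DXFER_TO_FROM_DEV):
        return False
    # Anything else is not a valid direction for a request with data.
    raise ValueError("invalid dxfer_direction: %r" % (dxfer_direction,))
```

The point of the sketch is the second branch: treating SG_DXFER_TO_FROM_DEV like a write is the kind of mix-up the commit describes.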


2006-11-15 12:07:37

by Alan

Subject: Re: 2.6.19-rc5: known regressions (v3)

> Subject : PCI MSI setting corrupted during resume
> References : http://bugzilla.kernel.org/show_bug.cgi?id=7479
> Submitter : Stephen Hemminger <[email protected]>
> Status : unknown

This is one of the minor resume problems as far as I can tell. I believe
the patches I posted for having a resume quirk run on each device if
appropriate should correctly resolve these. See the patch I sent to l/k.

There are a variety of other resume quirks we definitely require.

Alan

2006-11-15 16:08:34

by Stephen Hemminger

Subject: Re: 2.6.19-rc5: known regressions (v3)


>
> Subject : PCI MSI setting corrupted during resume
> References : http://bugzilla.kernel.org/show_bug.cgi?id=7479
> Submitter : Stephen Hemminger <[email protected]>
> Status : unknown
>
Turns out this isn't a regression, it was always there. It has to do with ACPI
clearing state on resume. MSI wasn't being used the same in older kernels so
it didn't show up.

2006-11-15 16:38:13

by Eric W. Biederman

Subject: Re: 2.6.19-rc5: known regressions (v3)

Stephen Hemminger <[email protected]> writes:

>>
>> Subject : PCI MSI setting corrupted during resume
>> References : http://bugzilla.kernel.org/show_bug.cgi?id=7479
>> Submitter : Stephen Hemminger <[email protected]>
>> Status : unknown
>>
> Turns out this isn't a regression, it was always there. It has to do with ACPI
> clearing state on resume. MSI wasn't being used the same in older kernels so
> it didn't show up.

Ok. Do we know enough to fix the MSI case?

Eric

2006-11-15 16:42:59

by William Cohen

Subject: Re: 2.6.19-rc5: known regressions (v3)

Andi Kleen wrote:
>>On a working kernel on an Opteron, we have normally 4 directories
>>in /dev/oprofile :
>>
>># ls -ld /dev/oprofile/?
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/0
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/1
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/2
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/3
>>
>>With linux-2.6.19-rc5, the first one (0) is missing and we get 1,2,3
>
>
> That's because 0 was never available. It is used by the NMI watchdog.
> The new kernel doesn't give it to oprofile anymore.
>
>
>>Maybe the 'bug' is in oprofile tools, that currently expect to find '0'
>
>
> Yes, it's likely a user space issue.
>
> -Andi

OProfile has a simplistic view of the performance monitoring hardware. The
routines in libop/op_alloc_counter.c determine what set of performance registers
is available from the processor in use. There is no check to see what registers
are actually available in the /dev/oprofile directory.

opcontrol executes ophelp to determine which specific counters to count which
events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
actual selection. It seems what is needed is for map_event_to_counter() to check
to see which counters are available and mark the others as unavailable.

-Will
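
To illustrate the check described above: the counters the kernel actually exposes can be enumerated from user space by listing the numeric directories under /dev/oprofile. What follows is a minimal shell sketch, not the oprofile fix itself. The function name list_counters and its optional directory argument are assumptions made for this example.

```sh
# list_counters [DIR]: print the numeric counter directories found under DIR,
# one per line. DIR defaults to /dev/oprofile (the mount point used in this
# thread). The function name and argument are hypothetical.
list_counters() {
    dir=${1:-/dev/oprofile}
    for path in "$dir"/*; do
        # Only directories can be counters; plain files such as "buffer" are skipped.
        [ -d "$path" ] || continue
        name=${path##*/}
        case $name in
            ''|*[!0-9]*) continue ;;   # skip non-numeric entries such as "stats"
        esac
        printf '%s\n' "$name"
    done
}
```

A tool could compare this list against the counters it expects from the CPU type and mark the missing ones as unavailable before assigning events.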

2006-11-15 16:48:30

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)


> OProfile has a simplistic view of the performance monitoring hardware. The
> routines in libop/op_alloc_counter.c determine what set of performance registers
> is available from the processor in use. There is no check to see what registers
> are actually available in the /dev/oprofile directory.
>
> opcontrol executes ophelp to determine which specific counters to count which
> events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
> actual selection. It seems what is needed is for map_event_to_counter() to check
> to see which counters are available and mark the others as unavailable

Thanks for the explanation. Can you please fix it and release a new version?
Documentation/Changes could be adapted then.

-Andi

2006-11-15 18:44:16

by Andrew Morton

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Wed, 15 Nov 2006 17:48:05 +0100
Andi Kleen <[email protected]> wrote:

>
> > OProfile has a simplistic view of the performance monitoring hardware. The
> > routines in libop/op_alloc_counter.c determine what set of performance registers
> > is available from the processor in use. There is no check to see what registers
> > are actually available in the /dev/oprofile directory.
> >
> > opcontrol executes ophelp to determine which specific counters to count which
> > events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
> > actual selection. It seems what is needed is for map_event_to_counter() to check
> > to see which counters are available and mark the others as unavailable
>
> Thanks for the explanation. Can you please fix it and release a new version?
> Documentation/Changes could be adapted then.
>

Meanwhile we should restore the NMI counter to fix this bug.

2006-11-15 18:45:59

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Wednesday 15 November 2006 19:39, Andrew Morton wrote:
> On Wed, 15 Nov 2006 17:48:05 +0100
> Andi Kleen <[email protected]> wrote:
>
> >
> > > OProfile has a simplistic view of the performance monitoring hardware. The
> > > routines in libop/op_alloc_counter.c determine what set of performance registers
> > > is available from the processor in use. There is no check to see what registers
> > > are actually available in the /dev/oprofile directory.
> > >
> > > opcontrol executes ophelp to determine which specific counters to count which
> > > events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
> > > actual selection. It seems what is needed is for map_event_to_counter() to check
> > > to see which counters are available and mark the others as unavailable
> >
> > Thanks for the explanation. Can you please fix it and release a new version?
> > Documentation/Changes could be adapted then.
> >
>
> Meanwhile we should restore the NMI counter to fix this bug.

No, it was always oprofile who was buggy here, silently taking
the nmi watchdog away.

-Andi

2006-11-15 19:12:28

by Linus Torvalds

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)



On Wed, 15 Nov 2006, Andi Kleen wrote:
> >
> > Meanwhile we should restore the NMI counter to fix this bug.
>
> No, it was always oprofile who was buggy here, silently taking
> the nmi watchdog away.

Andi, your "blame game" doesn't matter.

The fact is, it used to work, and the kernel changed interfaces, so now it
doesn't.

In other words, a kernel interface to user land changed. THAT IS ALWAYS A
BUG. We don't change UI.

Yes, "oprofile" should be fixed to not depend on that, but the kernel
shouldn't change the interfaces, and we should add back the zero entry.

Linus

2006-11-15 19:24:03

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)


> The fact is, it used to work, and the kernel changed interfaces, so now it
> doesn't.

No, it didn't work. oprofile may have done something, but it
just silently killed the NMI watchdog in the process.
That was never acceptable.

Now we do proper accounting of NMI sources and also proper allocation
of performance counters.


> Yes, "oprofile" should be fixed to not depend on that, but the kernel
> shouldn't change the interfaces, and we should add back the zero entry.

That would break the nmi watchdog again.

Anyways, there is a sysctl to disable the nmi watchdog if someone
is desperate.

But I think it is clearly oprofile who did wrong here and needs
to be fixed.

-Andi

2006-11-15 20:26:18

by Andrew Morton

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Wed, 15 Nov 2006 20:23:53 +0100
Andi Kleen <[email protected]> wrote:

>
> > The fact is, it used to work, and the kernel changed interfaces, so now it
> > doesn't.
>
> No, it didn't work. oprofile may have done something, but it
> just silently killed the NMI watchdog in the process.
> That was never acceptable.

But people could get profiles out. I know, I've seen them!

> Now we do proper accounting of NMI sources and also proper allocation
> of performance counters.
>
>
> > Yes, "oprofile" should be fixed to not depend on that, but the kernel
> > shouldn't change the interfaces, and we should add back the zero entry.
>
> That would break the nmi watchdog again.
>
> Anyways, there is a sysctl to disable the nmi watchdog if someone
> is desperate.
>
> But I think it is clearly oprofile who did wrong here and needs
> to be fixed.
>

Is it correct to say that oprofile-on-2.6.18 works, and that
oprofile-on-2.6.19-rc5 does not?

Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail
in some particular scenarios?

If it's really true that oprofile is simply busted then that's a serious
problem and we should find some way of unbusting it. If that means just
adding a dummy "0" entry which always returns zero or something like that,
then fine.

But we can't just go and bust it.

2006-11-15 21:20:31

by Eric W. Biederman

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

Andrew Morton <[email protected]> writes:

> Is it correct to say that oprofile-on-2.6.18 works, and that
> oprofile-on-2.6.19-rc5 does not?
>
> Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail
> in some particular scenarios?
>
> If it's really true that oprofile is simply busted then that's a serious
> problem and we should find some way of unbusting it. If that means just
> adding a dummy "0" entry which always returns zero or something like that,
> then fine.
>
> But we can't just go and bust it.

The simple question. If we turn off the NMI watchdog on 2.6.19-rc5
does oprofile work? I believe that is what Andi said.

The description I read was a resource conflict. The resources oprofile
just expects it can use are already in use so we tell it no and
the user space oprofile doesn't cope.

Now I don't know if the interface allows us to rename the interfaces
from 1 2 3 to 0 1 2. If we can then that looks like something we can
fix. Otherwise from the description I tend to agree with Andi.

The user space application assumed it owned hardware that it did not.

Hmm. I bet if nothing else we could move the NMI watchdog from 0 to 3
and make things work that way...


Eric

2006-11-15 21:35:12

by Andrew Morton

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Wed, 15 Nov 2006 14:18:24 -0700
[email protected] (Eric W. Biederman) wrote:

> Andrew Morton <[email protected]> writes:
>
> > Is it correct to say that oprofile-on-2.6.18 works, and that
> > oprofile-on-2.6.19-rc5 does not?
> >
> > Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail
> > in some particular scenarios?
> >
> > If it's really true that oprofile is simply busted then that's a serious
> > problem and we should find some way of unbusting it. If that means just
> > adding a dummy "0" entry which always returns zero or something like that,
> > then fine.
> >
> > But we can't just go and bust it.
>
> The simple question. If we turn off the NMI watchdog on 2.6.19-rc5
> does oprofile work? I believe that is what Andi said.
>
> The description I read was a resource conflict. The resources oprofile
> just expects it can use are already in use so we tell it no and
> the user space oprofile doesn't cope.

That would have been a bug in earlier kernels.

> Now I don't know if the interface allows us to rename the interfaces
> from 1 2 3 to 0 1 2. If we can then that looks like something we can
> fix. Otherwise from the description I tend to agree with Andi.
>
> The user space application assumed it owned hardware that it did not.
>
> Hmm. I bet if nothing else we could move the NMI watchdog from 0 to 3
> and make things work that way...

Surely the appropriate behaviour is to allow oprofile to steal the NMI and
to then put the NMI back to doing the watchdog thing after oprofile has
finished with it.

If that's not a feasible thing to do for 2.6.19 then some short-term
hack which makes oprofile work again is needed.

2006-11-15 22:32:12

by Adrian Bunk

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wed, Nov 15, 2006 at 12:06:22PM +0100, Brice Goglin wrote:
> Adrian Bunk wrote:
> > Subject : unable to rip cd
> > References : http://lkml.org/lkml/2006/10/13/100
> > http://lkml.org/lkml/2006/11/8/42
> > Submitter : Alex Romosan <[email protected]>
> > Handled-By : Jens Axboe <[email protected]>
> > Status : Jens is investigating
>
> I think this one is already fixed.

Thanks for this information (Jens already told me the same).

> Brice

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-11-16 03:21:39

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Wed, Nov 15, 2006 at 12:21:18PM -0800, Andrew Morton wrote:
> Andi Kleen <[email protected]> wrote:
>
> >
> > > The fact is, it used to work, and the kernel changed interfaces, so now it
> > > doesn't.
> >
> > No, it didn't work. oprofile may have done something, but it
> > just silently killed the NMI watchdog in the process.
> > That was never acceptable.
>
> But people could get profiles out. I know, I've seen them!

Just the nmi watchdog was gone then.

>
> > Now we do proper accounting of NMI sources and also proper allocation
> > of performance counters.
> >
> >
> > > Yes, "oprofile" should be fixed to not depend on that, but the kernel
> > > shouldn't change the interfaces, and we should add back the zero entry.
> >
> > That would break the nmi watchdog again.
> >
> > Anyways, there is a sysctl to disable the nmi watchdog if someone
> > is desperate.
> >
> > But I think it is clearly oprofile who did wrong here and needs
> > to be fixed.
> >
>
> Is it correct to say that oprofile-on-2.6.18 works, and that
> oprofile-on-2.6.19-rc5 does not?
>
> Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail

echo 0 > /proc/sys/kernel/nmi_watchdog before the oprofile module is loaded.
With builtin oprofile probably nmi_watchdog=0
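
The workaround just described could be wrapped in a small helper. This is only a sketch under stated assumptions: the function name disable_nmi_watchdog is made up for this example, and the optional path argument exists only so the function can be pointed at another file. The sysctl path itself is the one given above.

```sh
# disable_nmi_watchdog [PATH]: write 0 to the NMI watchdog sysctl.
# PATH defaults to /proc/sys/kernel/nmi_watchdog, as named in this message.
# The function name and the optional argument are hypothetical.
disable_nmi_watchdog() {
    ctl=${1:-/proc/sys/kernel/nmi_watchdog}
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not root, or sysctl not present?)" >&2
        return 1
    fi
    echo 0 > "$ctl"
}
```

Since the write has to happen before the oprofile module is loaded, a caller would run something like `disable_nmi_watchdog && modprobe oprofile` before starting the profiler.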

> in some particular scenarios?

On x86-64 and on newer i386 machines (based on DMI year)


>
> If it's really true that oprofile is simply busted then that's a serious
> problem and we should find some way of unbusting it. If that means just
> adding a dummy "0" entry which always returns zero or something like that,
> then fine.

That could be probably done.

> But we can't just go and bust it.

It just did something unbelievably broken before. I would say it busted
itself.

-Andi

2006-11-16 05:08:54

by Andrew Morton

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Thu, 16 Nov 2006 04:21:09 +0100
Andi Kleen <[email protected]> wrote:

> >
> > If it's really true that oprofile is simply busted then that's a serious
> > problem and we should find some way of unbusting it. If that means just
> > adding a dummy "0" entry which always returns zero or something like that,
> > then fine.
>
> That could be probably done.

I'm told that this is exactly what it was doing before it got changed.

> > But we can't just go and bust it.
>
> It just did something unbelievably broken before.

What did it do?

> I would say it busted
> itself.

It gave profiles, which was fairly handy.

2006-11-16 07:04:51

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Thursday 16 November 2006 06:05, Andrew Morton wrote:
> On Thu, 16 Nov 2006 04:21:09 +0100
> Andi Kleen <[email protected]> wrote:
>
> > >
> > > If it's really true that oprofile is simply busted then that's a serious
> > > problem and we should find some way of unbusting it. If that means just
> > > adding a dummy "0" entry which always returns zero or something like that,
> > > then fine.
> >
> > That could be probably done.
>
> I'm told that this is exactly what it was doing before it got changed.

Hmm, ok perhaps that can be arranged again.

The trouble is that I want to use this performance counter for
other purposes too, so we would run into trouble again
if oprofile keeps stealing it.

> > > But we can't just go and bust it.
> >
> > It just did something unbelievably broken before.
>
> What did it do?

Silently kill the nmi watchdog.

>
> > I would say it busted
> > itself.
>
> It gave profiles, which was fairly handy.

I'm sure it can be fixed there. Ok ok I keep sounding like a sysfs maintainer
now @)

-Andi

2006-11-16 11:05:20

by Mikael Pettersson

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

Andrew Morton writes:
> Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> to then put the NMI back to doing the watchdog thing after oprofile has
> finished with it.

Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
the in-kernel API allowing real performance counter drivers like
oprofile (and perfctr) to claim the HW from the NMI watchdog,
do their work, and then release it which resumed the watchdog.

Note that oprofile (and perfctr) didn't do anything behind the
NMI watchdog's back. They went via the API. Nothing dodgy going on.

2006-11-16 15:36:36

by William Cohen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

Andi Kleen wrote:
> On Thursday 16 November 2006 06:05, Andrew Morton wrote:
>
>>On Thu, 16 Nov 2006 04:21:09 +0100
>>Andi Kleen <[email protected]> wrote:
>>
>>
>>>>If it's really true that oprofile is simply busted then that's a serious
>>>>problem and we should find some way of unbusting it. If that means just
>>>>adding a dummy "0" entry which always returns zero or something like that,
>>>>then fine.
>>>
>>>That could be probably done.
>>
>>I'm told that this is exactly what it was doing before it got changed.
>
>
> Hmm, ok perhaps that can be arranged again.
>
> The trouble is that I want to use this performance counter for
> other purposes too, so we would run into trouble again
> if oprofile keeps stealing it.

What other purposes do you see the performance counters useful for? To collect
information on process characteristics so they can be scheduled more efficiently?

Is this going to require sharing the nmi interrupt and knowing which perfcounter
register triggered the interrupt to get the correct action? Currently the
oprofile interrupt handler assumes any performance monitoring counter it sees
overflowing is something it should count.

-Will

2006-11-16 15:48:16

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)


> What other purposes do you see the performance counters useful for?

Export one to user space as a cycle counter for benchmarking. RDTSC doesn't
do this job anymore.

> To collect information on process characteristics so they can be scheduled more efficiently?

That might happen at some point in the future, but I would expect
us to wait for CPUs with more performance counters first.

> Is this going to require sharing the nmi interrupt and knowing which perfcounter
> register triggered the interrupt to get the correct action? Currently the
> oprofile interrupt handler assumes any performance monitoring counter it sees
> overflowing is something it should count.

Yes. That needs to be fixed.

-Andi

2006-11-16 20:29:41

by Andrew Morton

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Thu, 16 Nov 2006 11:55:46 +0100
Mikael Pettersson <[email protected]> wrote:

> Andrew Morton writes:
> > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > to then put the NMI back to doing the watchdog thing after oprofile has
> > finished with it.
>
> Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> the in-kernel API allowing real performance counter drivers like
> oprofile (and perfctr) to claim the HW from the NMI watchdog,
> do their work, and then release it which resumed the watchdog.

OK. But from Andi's comments it seems that the NMI watchdog was failing to
resume its operation.

> Note that oprofile (and perfctr) didn't do anything behind the
> NMI watchdog's back. They went via the API. Nothing dodgy going on.

2006-11-16 21:34:41

by Stephane Eranian

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

Hello,

On Thu, Nov 16, 2006 at 10:34:56AM -0500, William Cohen wrote:
>
> Is this going to require sharing the nmi interrupt and knowing which perfcounter
> register triggered the interrupt to get the correct action? Currently the
> oprofile interrupt handler assumes any performance monitoring counter it sees
> overflowing is something it should count.
>
Yes, you need to share the NMI interrupt. In my next perfmon patch you will
see that this can be made to work. You just need to add one check in the
NMI handler callback: is it for me or else try perfmon? Perfmon can auto-detect
if NMI is active and give up the right counter (there is an API to check
what is reserved). The interface propagates the list of available counters
to apps which then pass the information onto libpfm which tries to use
the remaining counters.

--
-Stephane
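
The idea of passing the list of available counters on to applications comes down to a set difference: all hardware counters minus those reported as reserved. The shell sketch below shows only that arithmetic. It does not call the perfmon API (which is not shown in this thread), and the function name free_counters and its argument convention are hypothetical.

```sh
# free_counters TOTAL [RESERVED...]: print every counter index in
# 0..TOTAL-1 that does not appear in the RESERVED list, one per line.
# Name and calling convention are assumptions for illustration only.
free_counters() {
    total=$1
    shift
    i=0
    while [ "$i" -lt "$total" ]; do
        taken=no
        for r in "$@"; do
            if [ "$r" -eq "$i" ]; then
                taken=yes
            fi
        done
        if [ "$taken" = no ]; then
            printf '%s\n' "$i"
        fi
        i=$((i + 1))
    done
}
```

For the four-counter Opteron case discussed earlier in the thread, with counter 0 held by the NMI watchdog, the call would be `free_counters 4 0`.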

2006-11-17 10:08:14

by Mikael Pettersson

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

Andrew Morton writes:
> On Thu, 16 Nov 2006 11:55:46 +0100
> Mikael Pettersson <[email protected]> wrote:
>
> > Andrew Morton writes:
> > > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > > to then put the NMI back to doing the watchdog thing after oprofile has
> > > finished with it.
> >
> > Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> > the in-kernel API allowing real performance counter drivers like
> > oprofile (and perfctr) to claim the HW from the NMI watchdog,
> > do their work, and then release it which resumed the watchdog.
>
> OK. But from Andi's comments it seems that the NMI watchdog was failing to
> resume its operation.

It certainly worked when I originally implemented it. If it didn't work
that way before 2.6.19-rc1 butchered it then that would have been a bug
that should have been fixed.

2006-11-17 10:18:16

by Andrew Morton

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Fri, 17 Nov 2006 10:59:07 +0100
Mikael Pettersson <[email protected]> wrote:

> Andrew Morton writes:
> > On Thu, 16 Nov 2006 11:55:46 +0100
> > Mikael Pettersson <[email protected]> wrote:
> >
> > > Andrew Morton writes:
> > > > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > > > to then put the NMI back to doing the watchdog thing after oprofile has
> > > > finished with it.
> > >
> > > Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> > > the in-kernel API allowing real performance counter drivers like
> > > oprofile (and perfctr) to claim the HW from the NMI watchdog,
> > > do their work, and then release it which resumed the watchdog.
> >
> > OK. But from Andi's comments it seems that the NMI watchdog was failing to
> > resume its operation.
>
> It certainly worked when I originally implemented it. If it didn't work
> that way before 2.6.19-rc1 butchered it then that would have been a bug
> that should have been fixed.

Oh. OK.

Meanwhile, 2.6.19-rc6 remains unfixed.

2006-11-17 10:30:05

by Andi Kleen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

On Friday 17 November 2006 10:59, Mikael Pettersson wrote:

> It certainly worked when I originally implemented it.

I don't think so. NMI watchdog never recovered no matter if oprofile
used the counter or not.

-Andi

2006-11-19 03:07:26

by Bill Davidsen

Subject: Re: [discuss] Re: 2.6.19-rc5: known regressions (v3)

Andrew Morton wrote:
> On Fri, 17 Nov 2006 10:59:07 +0100
> Mikael Pettersson <[email protected]> wrote:
>
>> Andrew Morton writes:
>> > On Thu, 16 Nov 2006 11:55:46 +0100
>> > Mikael Pettersson <[email protected]> wrote:
>> >
>> > > Andrew Morton writes:
>> > > > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
>> > > > to then put the NMI back to doing the watchdog thing after oprofile has
>> > > > finished with it.
>> > >
>> > > Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
>> > > the in-kernel API allowing real performance counter drivers like
>> > > oprofile (and perfctr) to claim the HW from the NMI watchdog,
>> > > do their work, and then release it which resumed the watchdog.
>> >
>> > OK. But from Andi's comments it seems that the NMI watchdog was failing to
>> > resume its operation.
>>
>> It certainly worked when I originally implemented it. If it didn't work
>> that way before 2.6.19-rc1 butchered it then that would have been a bug
>> that should have been fixed.
>
> Oh. OK.
>
> Meanwhile, 2.6.19-rc6 remains unfixed.
>
Has anyone verified that nmi watchdog works at all in 2.6.19-rc6? I
haven't built a kernel since rc2, other things have been taking my time.

--
Bill Davidsen <[email protected]>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one errors occurs during
wildcard (glob) expansion.

2006-11-22 10:27:51

by Eric Dumazet

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
> On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
> > Subject : x86_64: oprofile doesn't work
> > References : http://lkml.org/lkml/2006/10/27/3
> > Submitter : Prakash Punnoor <[email protected]>
> > Status : unknown
>

I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.

# opcontrol --setup --event=RESOURCE_STALLS:1000 --vmlinux=$VMFILE
# opcontrol --start
/usr/bin/opcontrol: line 911: /dev/oprofile/0/enabled: No such file or
directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/kernel: No such file or
directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/unit_mask: No such file or
directory
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.

# ls -l /dev/oprofile/
total 0
drwxr-xr-x 1 root root 0 Nov 22 11:18 1
-rw-r--r-- 1 root root 0 Nov 22 11:18 backtrace_depth
-rw-r--r-- 1 root root 0 Nov 22 11:18 buffer
-rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_size
-rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_watershed
-rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_buffer_size
-rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_type
-rw-rw-rw- 1 root root 0 Nov 22 11:18 dump
-rw-r--r-- 1 root root 0 Nov 22 11:18 enable
-rw-r--r-- 1 root root 0 Nov 22 11:18 pointer_size
drwxr-xr-x 1 root root 0 Nov 22 11:18 stats
# dmesg | grep oprofile
oprofile: using NMI interrupt.
# opcontrol --version
opcontrol: oprofile 0.9.2 compiled on Nov 22 2006 11:24:09

Eric


2006-11-22 10:42:30

by Andi Kleen

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wednesday 22 November 2006 11:28, Eric Dumazet wrote:
> On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
> > On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
> > > Subject : x86_64: oprofile doesn't work
> > > References : http://lkml.org/lkml/2006/10/27/3
> > > Submitter : Prakash Punnoor <[email protected]>
> > > Status : unknown
> >
>
> I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.

oprofile is still broken because it cannot deal with the lack of perfctr 0.
You can disable the nmi watchdog as a workaround.

-Andi

2006-11-22 18:01:06

by William Cohen

Subject: Re: 2.6.19-rc5: known regressions (v3)

Eric Dumazet wrote:
> On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
>
>>On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
>>
>>>Subject : x86_64: oprofile doesn't work
>>>References : http://lkml.org/lkml/2006/10/27/3
>>>Submitter : Prakash Punnoor <[email protected]>
>>>Status : unknown
>>
>
> I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.
>
> # opcontrol --setup --event=RESOURCE_STALLS:1000 --vmlinux=$VMFILE
> # opcontrol --start
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/enabled: No such file or
> directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/event: No such file or directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/count: No such file or directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/kernel: No such file or
> directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/user: No such file or directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/unit_mask: No such file or
> directory
> Using 2.6+ OProfile kernel interface.
> Reading module info.
> Using log file /var/lib/oprofile/oprofiled.log
> Daemon started.
> Profiler running.
>
> # ls -l /dev/oprofile/
> total 0
> drwxr-xr-x 1 root root 0 Nov 22 11:18 1
> -rw-r--r-- 1 root root 0 Nov 22 11:18 backtrace_depth
> -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer
> -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_size
> -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_watershed
> -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_buffer_size
> -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_type
> -rw-rw-rw- 1 root root 0 Nov 22 11:18 dump
> -rw-r--r-- 1 root root 0 Nov 22 11:18 enable
> -rw-r--r-- 1 root root 0 Nov 22 11:18 pointer_size
> drwxr-xr-x 1 root root 0 Nov 22 11:18 stats
> # dmesg | grep oprofile
> oprofile: using NMI interrupt.
> # opcontrol --version
> opcontrol: oprofile 0.9.2 compiled on Nov 22 2006 11:24:09
>
> Eric

Could you try the patch for op_allocate.c that I posted on the oprofile mailing
list last week (November 17, 2006) and see if that resolves the problem you are
having?

http://sourceforge.net/mailarchive/message.php?msg_id=37316102

-Will


2006-11-22 18:06:57

by William Cohen

Subject: Re: 2.6.19-rc5: known regressions (v3)

Eric Dumazet wrote:
> On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
>
>>On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
>>
>>>Subject : x86_64: oprofile doesn't work
>>>References : http://lkml.org/lkml/2006/10/27/3
>>>Submitter : Prakash Punnoor <[email protected]>
>>>Status : unknown
>>
>
> I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.
>
> # opcontrol --setup --event=RESOURCE_STALLS:1000 --vmlinux=$VMFILE
> # opcontrol --start
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/enabled: No such file or
> directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/event: No such file or directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/count: No such file or directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/kernel: No such file or
> directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/user: No such file or directory
> /usr/bin/opcontrol: line 911: /dev/oprofile/0/unit_mask: No such file or
> directory
> Using 2.6+ OProfile kernel interface.
> Reading module info.
> Using log file /var/lib/oprofile/oprofiled.log
> Daemon started.
> Profiler running.
>
> # ls -l /dev/oprofile/
> total 0
> drwxr-xr-x 1 root root 0 Nov 22 11:18 1
> -rw-r--r-- 1 root root 0 Nov 22 11:18 backtrace_depth
> -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer
> -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_size
> -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_watershed
> -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_buffer_size
> -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_type
> -rw-rw-rw- 1 root root 0 Nov 22 11:18 dump
> -rw-r--r-- 1 root root 0 Nov 22 11:18 enable
> -rw-r--r-- 1 root root 0 Nov 22 11:18 pointer_size
> drwxr-xr-x 1 root root 0 Nov 22 11:18 stats
> # dmesg | grep oprofile
> oprofile: using NMI interrupt.
> # opcontrol --version
> opcontrol: oprofile 0.9.2 compiled on Nov 22 2006 11:24:09
>
> Eric

You will also need another patch, checked into the oprofile CVS last week and mentioned here:

http://sourceforge.net/mailarchive/message.php?msg_id=35422937

-Will


Attachments:
opalloc.diff (538.00 B)

2006-11-22 18:26:26

by Eric Dumazet

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wednesday 22 November 2006 19:05, William Cohen wrote:

> You will also need another patch checked into the oprofile cvs last week
> mentioned:
>
> http://sourceforge.net/mailarchive/message.php?msg_id=35422937
>
> -Will

Thank you William.

I confirm that the CVS oprofile version plus the patches you gave here works
with linux-2.6.19-rc6 on i386, whether or not the NMI watchdog is disabled
(with or without nmi_watchdog=0 in the boot params).

Eric

2006-11-22 18:47:59

by Andrew Morton

Subject: Re: 2.6.19-rc5: known regressions (v3)

On Wed, 22 Nov 2006 11:36:14 +0100
Andi Kleen <[email protected]> wrote:

> On Wednesday 22 November 2006 11:28, Eric Dumazet wrote:
> > On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
> > > On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
> > > > Subject : x86_64: oprofile doesn't work
> > > > References : http://lkml.org/lkml/2006/10/27/3
> > > > Submitter : Prakash Punnoor <[email protected]>
> > > > Status : unknown
> > >
> >
> > I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.
>
> oprofile is still broken because it cannot deal with the lack of perfctr 0.

The kernel is still broken because we changed the interface.

> You can disable the nmi watchdog as a workaround.

I don't understand why you think this is acceptable.

2006-12-16 11:20:29

by Ray Lee

Subject: Re: 2.6.19-rc5: known regressions (v3)

On 11/22/06, Andrew Morton <[email protected]> wrote:
> On Wed, 22 Nov 2006 11:36:14 +0100
> Andi Kleen <[email protected]> wrote:
>
> > On Wednesday 22 November 2006 11:28, Eric Dumazet wrote:
> > > On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
> > > > On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
> > > > > Subject : x86_64: oprofile doesn't work
> > > > > References : http://lkml.org/lkml/2006/10/27/3
> > > > > Submitter : Prakash Punnoor <[email protected]>
> > > > > Status : unknown
> > > >
> > >
> > > I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.
> >
> > oprofile is still broken because it cannot deal with the lack of perfctr 0.
>
> The kernel is still broken because we changed the interface.

I just got bit by this on 2.6.20-latest (well, of two days ago anyway)
while trying to debug another transient 'kacpid sucks all available
cpu time'. But that's okay, I'm sure it will happen again in a week or
two.

In the meantime, who won this pis^H^H^H discussion?

Mikael Pettersson wrote:
> Andrew Morton writes:
> > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > to then put the NMI back to doing the watchdog thing after oprofile has
> > finished with it.
>
> Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> the in-kernel API allowing real performance counter drivers like
> oprofile (and perfctr) to claim the HW from the NMI watchdog,
> do their work, and then release it which resumed the watchdog.
>
> Note that oprofile (and perfctr) didn't do anything behind the
> NMI watchdog's back. They went via the API. Nothing dodgy going on.

Well, that seems clear.