2011-02-08 00:24:23

by Linus Torvalds

Subject: Linux 2.6.38-rc4

No travel or cyclone-dodging this time, so as promised, the -rc's are
now back to the usual weekly schedule.

There's nothing much that stands out here. Some arch updates (arm and
powerpc), the usual driver updates: dri (radeon/i915), network cards,
sound, media, scsi, some filesystem updates (cifs, btrfs), and some
random stuff to round it all out (networking, watchpoints,
tracepoints, etc).

Pretty small, all in all. I'd obviously prefer it to be even smaller,
and I actually dropped a pull request or two, but for being -rc4 this
is by no means horrible. As long as it keeps shrinking, I'll be happy.

Linus

---

Aaro Koskinen (3):
arm: mach-omap2: voltage: debugfs: fix memory leak
arm: mach-omap2: board-rm680: fix rm680_vemmc regulator constraints
arm: mach-omap2: mux: free allocated memory on error exit

Ajit Khaparde (3):
be2net: fix a crash seen during insmod/rmmod test
be2net: remove netif_stop_queue being called before register_netdev.
MAINTAINERS: update email ids of the be2net driver maintainers.

Akinobu Mita (1):
[S390] use asm-generic/cacheflush.h

Alan Cox (1):
depca: Fix warnings

Alex Deucher (10):
drm/radeon/kms: rv6xx+ thermal sensor fixes
drm/radeon/kms: switch back to min->max pll post divider iteration
drm/radeon/kms: add pll debugging output
drm/radeon/kms: add new pll algo for avivo asics
drm/radeon/kms: Enable new pll calculation for avivo+ asics
drm/radeon: remove 0x4243 pci id
drm/radeon/kms: add updated ib_execute function for evergreen
drm/radeon/kms/evergreen: always set certain VGT regs at CP init
drm/radeon/kms: fix s/r issues with bios scratch regs
drm/radeon/kms: dynamically allocate power state space

Alexey Charkov (1):
btrfs: Drop __exit attribute on btrfs_exit_compress

Amerigo Wang (1):
sound: silent echo'ed messages in Makefile

Andrea Arcangeli (1):
mm: when migrate_pages returns 0, all pages must have been released

Andy Gospodarek (1):
gro: reset skb_iif on reuse

Andy Robinson (1):
ALSA: HDA: cxt5066 - Use asus model for Asus U50F, select
correct SPDIF output

Anton Blanchard (6):
powerpc/numa: Only use active VPHN count fields
powerpc/numa: Check for all VPHN changes
powerpc/numa: Add length when creating OF properties via VPHN
powerpc/numa: Disable VPHN on dedicated processor partitions
powerpc/numa: Fix bug in unmap_cpu_from_node
powerpc: Fix hcall tracepoint recursion

Arnaldo Carvalho de Melo (1):
perf stat: Fix aggreate counter reading accounting

Ben Dooks (3):
MAINTAINERS: move s3c2410 drivers to ARM/SAMSUNG ARM
MAINTAINERS: fixup file entries for "SIMTEC EB2410ITX (BAST)"
MAINTAINERS: fixup Simtec support email entries

Ben Hutchings (1):
arm/ixp4xx: Rename FREQ macro to avoid collisions

Ben Skeggs (1):
drm/nv50: fix display on 0x50

Benjamin Herrenschmidt (2):
powerpc: Pass the right cpu_spec to ->setup_cpu() on 64-bit
powerpc: Fix some 6xx/7xxx CPU setup functions

Boaz Harrosh (1):
Revert "exofs: Set i_mapping->backing_dev_info anyway"

Bob Copeland (2):
ath5k: fix error handling in ath5k_hw_dma_stop
ath5k: correct endianness of frame duration

Chaoming Li (1):
rtlwifi: Fix firmware upload errors

Chris Mason (2):
Btrfs: catch errors from btrfs_sync_log
Btrfs: avoid uninit variable warnings in ordered-data.c

Chris Wilson (10):
drm/i915/sdvo: If at first we don't succeed in reading the response, wait
drm: Add an interface to reset the device
drm/i915: Reset state after a GPU reset or resume
drm/i915/crt: Force the initial probe after reset
drm/i915: Reset crtc after resume
drm: Don't switch fb when disabling an output
drm: Simplify and defend later checks when disabling a crtc
drm: Avoid leak of adjusted mode along quick set_mode paths
drm/i915: Suppress spurious vblank interrupts
drm/i915: Only bind to function 0 of the PCI device

Christoph Hellwig (2):
hfsplus: fix failed mount handling
hfsplus: fix up a comparism in hfsplus_file_extend

Chuck Ebbert (4):
CAN: softing driver depends on IOMEM
atl1c: Add missing PCI device ID
hfsplus: do not leak buffer on error
hfsplus: fix two memory leaks in wrapper.c

Clemens Ladisch (1):
ALSA: oxygen: fix output routing on Xonar DG

David Dillow (1):
[SCSI] fix incorrect value of SCSI_MAX_SG_CHAIN_SEGMENTS due to
include file ordering

David Henningsson (3):
ALSA: HDA: Refactor some redundant code for Conexant 5066/205xx
ALSA: HDA: Add a new model "asus" for Conexant 5066/205xx
ALSA: HDA: Fix microphone(s) on Lenovo Edge 13

David S. Miller (5):
ipv6: Remove route peer binding assertions.
niu: Fix races between up/down and get_stats.
net: Fix bug in compat SIOCGETSGCNT handling.
net: Support compat SIOCGETVIFCNT ioctl in ipv4.
net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.

Eric Dumazet (4):
perf: Fix alloc_callchain_buffers()
econet: remove compiler warnings
net: add kmemcheck annotation in __alloc_skb()
epoll: epoll_wait() should not use timespec_add_ns()

Eric W. Biederman (3):
net: Fix ip link add netns oops
net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT
net: Fix ipv6 neighbour unregister_sysctl_table warning

Fabio Estevam (2):
ARM: imx: Add VPR200 and MX51_3DS entries to uncompress.h
ARM: mach-imx/mach-mx25_3ds: Fix section type

Francois Romieu (2):
r8169: RxFIFO overflow oddities with 8168 chipsets.
r8169: prevent RxFIFO induced loops in the irq handler.

Frank Blaschka (1):
qeth: add more strict MTU checking

H. Peter Anvin (2):
x86-32: Make sure the stack is set up before we use it
x86, nx: Mark the ACPI resume trampoline code as +x

Heiko Carstens (1):
[S390] tlb: fix build error caused by THP

Herbert Xu (1):
gro: Reset dev pointer on reuse

Huang Weiyi (1):
omap1: remove duplicated #include

Ian Campbell (1):
xen: netfront: handle incoming GSO SKBs which are not CHECKSUM_PARTIAL

Ian Kent (1):
Btrfs: Fix memory leak on finding existing super

Ivan Vecera (1):
r8169: use RxFIFO overflow workaround for 8168c chipset.

James Bottomley (1):
[SCSI] libsas: fix runaway error handler problem

Jan Glauber (1):
[S390] qdio: prevent compile warning under CONFIG_32BIT

Janusz Krzysztofik (2):
ASoC: Amstrad Delta: fix const related build error
ASoC: CX20442: fix NULL pointer dereference

Jarkko Nikula (1):
ASoC: Fix module refcount for auxiliary devices

Jarod Wilson (8):
[media] rc/mce: add mappings for missing keys
[media] hdpvr: fix up i2c device registration
[media] lirc_zilog: z8 on usb doesn't like back-to-back i2c_master_send
[media] ir-kbd-i2c: improve remote behavior with z8 behind usb
[media] rc/ir-lirc-codec: add back debug spew
[media] rc: use time unit conversion macros correctly
[media] mceusb: really fix remaining keybounce issues
[media] rc/streamzap: fix reporting response times

Javi Merino (1):
sched, docs: Update schedstats documentation to version 15

Jean-François Moine (3):
[media] gspca - zc3xx: Bad delay when given by a table
[media] gspca - zc3xx: Fix bad images with the sensor hv7131r
[media] gspca - zc3xx: Discard the partial frames

Jeff Layton (13):
cifs: fix two compiler warning about uninitialized vars
cifs: handle cancelled requests better
cifs: send an NT_CANCEL request when a process is signalled
cifs: simplify SMB header check routine
cifs: don't pop a printk when sending on a socket is interrupted
cifs: force a reconnect if there are too many MIDs in flight
cifs: make CIFS depend on CRYPTO_MD4
cifs: clean up some compiler warnings
cifs: fix length checks in checkSMB
cifs: fix length vs. total_read confusion in cifs_demultiplex_thread
cifs: enable signing flag in SMB header when server has it on
cifs: don't send an echo request unless NegProt has been done
cifs: remove checks for ses->status == CifsExiting

Jesse Larrew (3):
powerpc/pseries: Fix typo in VPHN comments
powerpc/pseries: Fix brace placement in numa.c
powerpc/pseries: Remove unnecessary variable initializations in numa.c

Jin Dongming (3):
thp: fix splitting of hwpoisoned hugepages
thp: fix the wrong reported address of hwpoisoned hugepages
thp: fix unsuitable behavior for hwpoisoned tail page

Johannes Weiner (3):
memcg: prevent endless loop when charging huge pages
memcg: prevent endless loop when charging huge pages to near-limit group
memcg: never OOM when charging huge pages

Josef Bacik (9):
Btrfs: fix check_path_shared so it returns the right value
Btrfs: do not release more reserved bytes to the
global_block_rsv than we need
Btrfs: use the global block reserve if we cannot reserve space
Btrfs: do error checking in btrfs_del_csums
Btrfs: handle no memory properly in prepare_pages
Btrfs: make shrink_delalloc a little friendlier
fs: make block fiemap mapping length at least blocksize long
Btrfs: make sure search_bitmap finds something in remove_from_bitmap
Btrfs: exclude super blocks when we read in block groups

Julia Lawall (3):
OMAP: PM: SmartReflex: Add missing IS_ERR test
fs/btrfs/inode.c: Add missing IS_ERR test
include/net/genetlink.h: Allow genlmsg_cancel to accept a NULL argument

KAMEZAWA Hiroyuki (1):
memcg: fix event counting breakage from recent THP update

Kashyap, Desai (6):
[SCSI] mpt2sas: Fix device removal handshake for zoned devices
[SCSI] mpt2sas: fix internal device reset for older firmware
prior to MPI Rev K
[SCSI] mpt2sas: Correct resizing calculation for max_queue_depth
[SCSI] mpt2sas: Fix the race between broadcast asyn event and
scsi command completion
[SCSI] mpt2sas: Kernel Panic during Large Topology discovery
[SCSI] mpt2sas: fix Integrated Raid unsynced on shutdown problem

Keith Packard (1):
drm: Only set DPMS ON when actually configuring a mode

Ken Kawasaki (1):
axnet_cs: reduce delay time at ei_rx_overrun

Kevin Hilman (1):
OMAP3: PM: fix save secure RAM to restore MPU power state

Krzysztof Hałasa (1):
IXP4xx: Fix qmgr_release_queue() flushing unexpected queue entries.

Kurt Van Dijck (1):
net: fix validate_link_af in rtnetlink core

Li Zefan (8):
btrfs: Fix threshold calculation for block groups smaller than 1GB
btrfs: Add helper function free_bitmap()
btrfs: Free fully occupied bitmap in cluster
btrfs: Update stats when allocating from a cluster
btrfs: Add a helper try_merge_free_space()
btrfs: Check mergeable free space when removing a cluster
Btrfs: Fix memory leak at umount
Btrfs: Fix file clone when source offset is not 0

Linus Lüssing (1):
batman-adv: Fix kernel panic when fetching vis data on a vis server

Linus Torvalds (1):
Linux 2.6.38-rc4

Lucas Stach (1):
drm/nouveau: correctly pair hwmon_init and hwmon_fini

Luciano Coelho (1):
MAINTAINERS: update information for the wl12xx driver

Manjunathappa, Prakash (1):
ASoC: DaVinci: fix kernel panic due to uninitialized platform_data

Marcelo Roberto Jimenez (1):
RTC: Prevents a division by zero in kernel code.

Marcin Slusarz (3):
watchdog: Fix broken nowatchdog logic
watchdog: Fix sysctl consistency
watchdog: Don't change watchdog state on read of sysctl

Marek Vasut (1):
OMAP1: Fix non-working LCD on OMAP310

Martin Schwidefsky (2):
[S390] pgtable_list corruption
[S390] missing sacf in uaccess

Mathias Krause (1):
wl12xx: fix use after free

Mathieu Desnoyers (1):
tracepoints: Fix section alignment using pointer array

Matt Turner (2):
amd-k7-agp: remove non-x86 code
Revert "agp: AMD AGP is used on UP1100 & UP1500 alpha boxen"

Matthieu CASTET (1):
x86, nx: Don't force pages RW when setting NX bits

Miao Xie (2):
Btrfs: Don't return acl info when mounting with noacl option
Btrfs: Fix memory leak in writepage fixup work

Michael S. Tsirkin (1):
vhost: rcu annotation fixup

Michal Hocko (2):
memsw: handle swapaccount kernel parameter correctly
memsw: deprecate noswapaccount kernel parameter and schedule it
for removal

Michal Simek (3):
microblaze: Fix DTB passing from bootloader
microblaze: Fix unaligned issue on MMU system with BS=0 DIV=1
microblaze: Fix ASM optimized code for LE

Michel Lespinasse (1):
mlock: operate on any regions with protection != PROT_NONE

Mika Westerberg (1):
ARM: 6652/1: ep93xx: correct the end address of the AC97 memory resource

Minchan Kim (1):
mm/migration: fix page corruption during hugepage migration

Ming Lei (1):
arm: omap4: panda: remove usb_nop_xceiv_register(v1)

Mitko Haralanov (1):
IB/qib: Hold link for TX SERDES settings

Mohammed Shafi Shajakhan (1):
ath9k: Fix memory leak due to failed PAPRD frames

Namhyung Kim (2):
vfs: sparse: remove a warning on OPEN_FMODE()
vfs: sparse: add __FMODE_EXEC

NickCheng (1):
[SCSI] arcmsr: Fix the issue of system hangup after commands
timeout on ARC-1200

Oliver Hartkopp (1):
slcan: fix referenced website in Kconfig help text

Pablo Neira Ayuso (3):
netfilter: ctnetlink: fix missing refcount increment during dumps
netfilter: arpt_mangle: fix return values of checkentry
netfilter: ecache: always set events bits, filter them later

Pavel Emelyanov (1):
bridge: Don't put partly initialized fdb into hash

Pavel Shilovsky (1):
CIFS: Fix variable types in cifs_iovec_read/write (try #2)

Peter Chubb (1):
tcp_ecn is an integer not a boolean

Peter Zijlstra (3):
perf: Fix reading in perf_event_read()
sched: Fix update_curr_rt()
lockdep, timer: Fix del_timer_sync() annotation

Rajkumar Manoharan (2):
ath9k_hw: Fix system hang when resuming from S3/S4
ath9k: Fix power save usage count imbalance on deinit

Ralf Thielow (1):
RDMA/amso1100: Fix compile warnings

Randy Dunlap (2):
gpu/stub: fix acpi_video build error, fix stub kconfig dependencies
gpu/stub: fix acpi_video build error, fix stub kconfig dependencies

Roland Dreier (1):
net: Add default_mtu() methods to blackhole dst_ops

Russell King (3):
[media] fix saa7111 non-detection
ARM: Update mach-types
ALSA: AACI: allow writes to MAINCR to take effect

Sascha Hauer (4):
ARM i.MX28: fix bit operation
ARM i.MX28: use correct register for setting the rate
ARM i.MX23/28: remove secondary field from struct clk. It's unused
ARM i.MX23: use correct register for setting the rate

Scott Wood (2):
powerpc: Fix pfn_valid() when memory starts at a non-zero address
powerpc/book3e: Protect complex macro args in mmu-book3e.h

Sebastian Ott (1):
[S390] reset default for CONFIG_CHSC_SCH

Shawn Guo (1):
ARM: mxs: fix clock base address missing

Shirish Pargaonkar (2):
cifs: No need to check crypto blockcipher allocation
cifs: Possible slab memory corruption while updating extended
stats (repost)

Stanislav Fomichev (1):
cifs: add check for kmalloc in parse_dacl

Stanislaw Gruszka (3):
ath9k: fix race conditions when stop device
ath9k_htc: fix race conditions when stop device
dl2k: nulify fraginfo after unmap

Stefan Haberland (1):
[S390] dasd: prevent panic with unresumed devices

Stefan Weil (8):
drm/radeon: Fix wrong boolean operator
OMAP: PM: SmartReflex: Fix possible memory leak
OMAP: PM: SmartReflex: Fix possible null pointer read access
enc28j60: Fix reading of transmit status vector
vxge: Fix wrong boolean operator
isdn: icn: Fix potentially wrong string handling
s390: Fix wrong size in memcmp (netiucv)
s390: Fix possibly wrong size in strncmp (smsgiucv)

Stephane Eranian (1):
perf: Fix Pentium4 raw event validation

Stephen Kitt (1):
agp: ensure GART has an address before enabling it

Stephen Warren (1):
ASoC: Fix mask/val_mask confusion snd_soc_dapm_put_volsw()

Steve French (1):
[CIFS] Update cifs minor version

Steve Wise (3):
RDMA/cxgb4: Limit MAXBURST EQ context field to 256B
RDMA/cxgb4: Set the correct device physical function for iWARP connections
RDMA/ucma: Copy iWARP route information on queries

Steven Rostedt (2):
tracing: Replace trace_event struct array with pointer array
tracing: Replace syscall_meta_data struct array with pointer array

Suresh Siddha (2):
x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
x86, mm: avoid possible bogus tlb entries by clearing prev
mm_cpumask after switching mm

Sven Eckelmann (3):
batman-adv: Remove vis info on hashing errors
batman-adv: Remove vis info element in free_info
batman-adv: Make vis info stack traversal threadsafe

Takashi Iwai (2):
ALSA: hda - Fix memory leaks in conexant jack arrays
ALSA: use linux/io.h to fix compile warnings

Tejun Heo (1):
RDMA: Update missed conversion of flush_scheduled_work()

Tero Roponen (1):
Btrfs: Free correct pointer after using strsep

Tetsuo Handa (3):
CRED: Fix kernel panic upon security_file_alloc() failure.
CRED: Fix BUG() upon security_cred_alloc_blank() failure
CRED: Fix memory and refcount leaks upon security_prepare_creds() failure

Thomas Gleixner (3):
genirq: Prevent irq storm on migration
genirq: Add missing status flags to modification mask
m32r: Fixup last __do_IRQ leftover

Thomas Jacob (1):
netfilter: xt_iprange: Incorrect xt_iprange boundary check for IPv6

Thomas Weber (1):
OMAP3: Devkit8000: Change lcd power pin

Tom Herbert (1):
net: Check rps_flow_table when RPS map length is 1

Tsutomu Itoh (5):
btrfs: fix return value check of btrfs_join_transaction()
btrfs: check return value of btrfs_start_ioctl_transaction() properly
btrfs: checking NULL or not in some functions
btrfs: fix return value check of btrfs_start_transaction()
btrfs: cleanup error handling in btrfs_unlink_inode()

Ursula Braun (3):
qeth: show new mac-address if its setting fails
qeth: allow HiperSockets framesize change in suspend
qeth: allow OSA CHPARM change in suspend state

Uwe Kleine-König (2):
ARM: mxs: acknowledge gpio irq
ARM: mxs/imx28: remove now unused clock lookup "fec.0"

Vasiliy Kulikov (2):
net: can: at91_can: world-writable sysfs files
net: can: janz-ican3: world-writable sysfs termination file

Vladislav Zolotarov (1):
bnx2x: multicasts in NPAR mode

Yan, Zheng (1):
Btrfs: Fix page count calculation

Yaniv Rosner (5):
bnx2x: Remove setting XAUI low-power for BCM8073
bnx2x: Fix LED blink rate on BCM84823
bnx2x: Fix port swap for BCM8073
bnx2x: Fix potential link loss in multi-function mode
bnx2x: Update bnx2x version to 1.62.00-5

Yevgeny Petrilin (1):
mlx4_core: Add ConnectX-3 device IDs

liubo (3):
btrfs: fix uncheck memory allocation in btrfs_submit_compressed_read
btrfs: fix several uncheck memory allocations
btrfs: fix missing break in switch phrase

[email protected] (1):
caif: bugfix - add caif headers for userspace usage.


2011-02-08 10:17:49

by Borislav Petkov

Subject: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Mon, Feb 07, 2011 at 04:23:37PM -0800, Linus Torvalds wrote:
> No travel or cyclone-dodging this time, so as promised, the -rc's are
> now back to the usual weekly schedule.
>
> There's nothing much that stands out here. Some arch updates (arm and
> powerpc), the usual driver updates: dri (radeon/i915), network cards,
> sound, media, scisi, some filesystem updates (cifs, btrfs), and some
> random stuff to round it all out (networking, watchpoints,
> tracepoints, etc).
>
> Pretty small, all in all. I'd obviously prefer it to be even smaller,
> and I actually dropped a pull request or two, but for being -rc4 this
> is by no means horrible. As long as it keeps shrinking, I'll be happy.

So, I'm getting the warning below early in the boot process. And yes, I
didn't have it on -rc3. 2.6.38-rc4-00001-g1e554e3-dirty means a debug
diff on top of -rc4 which shouldn't have anything to do with this splat
since all it does is a couple of printk's due to -rc3 not suspending to
disk properly in some cases.

Now, I'm not going to even pretend to understand the code but here's
what I can read out, you tell me whether it makes sense.

spawn_ksoftirqd() is one of the early initcalls that gets called and
its notifier callback does kthread_bind() and you can follow in the
backtrace below that this thing comes down to del_timer_sync() which
does the lockdep annotation. Now, problem as I see it, is that hardirqs
were disabled when we were called although it doesn't say so in the
irqtrace events dump after the calltrace: "hardirqs last enabled at
(999)" and our irq event stamp is 1000.

There are actually at least three del_timer_sync()'s inflight so the
problem could be there somewhere, I dunno.

So, is it a wrong lockdep annotation or is it a real problem? I've
attached dmesg and config.

Thanks.

[ 0.023211] Setting APIC routing to flat
[ 0.024378] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.034634] CPU0: AMD Phenom(tm) II X4 940 Processor stepping 02
[ 0.034997] calling trace_init_flags_sys_exit+0x0/0x12 @ 1
[ 0.034997] initcall trace_init_flags_sys_exit+0x0/0x12 returned 0 after 0 usecs
[ 0.034997] calling trace_init_flags_sys_enter+0x0/0x12 @ 1
[ 0.034997] initcall trace_init_flags_sys_enter+0x0/0x12 returned 0 after 0 usecs
[ 0.034997] calling init_hw_perf_events+0x0/0xbfa @ 1
[ 0.034997] Performance Events: AMD PMU driver.
[ 0.035003] ... version: 0
[ 0.035140] ... bit width: 48
[ 0.035278] ... generic registers: 4
[ 0.035415] ... value mask: 0000ffffffffffff
[ 0.035554] ... max period: 00007fffffffffff
[ 0.035692] ... fixed-purpose events: 0
[ 0.035829] ... event mask: 000000000000000f
[ 0.036035] initcall init_hw_perf_events+0x0/0xbfa returned 0 after 1952 usecs
[ 0.036276] calling migration_init+0x0/0x6d @ 1
[ 0.036418] initcall migration_init+0x0/0x6d returned 0 after 0 usecs
[ 0.036558] calling spawn_ksoftirqd+0x0/0x52 @ 1
[ 0.036786] System has AMD C1E enabled
[ 0.036937] Switch to broadcast mode on CPU0
[ 0.037063] ------------[ cut here ]------------
[ 0.037210] WARNING: at kernel/lockdep.c:3151 check_flags+0x63/0x179()
[ 0.037349] Hardware name: System Product Name
[ 0.037487] Modules linked in:
[ 0.037658] Pid: 1, comm: swapper Not tainted 2.6.38-rc4-00001-g1e554e3-dirty #11
[ 0.037898] Call Trace:
[ 0.037997] [<ffffffff81039906>] ? warn_slowpath_common+0x85/0x9d
[ 0.037997] [<ffffffff81048251>] ? del_timer_sync+0x0/0xa0
[ 0.037997] [<ffffffff81039938>] ? warn_slowpath_null+0x1a/0x1c
[ 0.037997] [<ffffffff81066f1c>] ? check_flags+0x63/0x179
[ 0.037997] [<ffffffff8106c664>] ? lock_acquire+0x4c/0x192
[ 0.037997] [<ffffffff81048273>] ? del_timer_sync+0x22/0xa0
[ 0.037997] [<ffffffff81048292>] ? del_timer_sync+0x41/0xa0
[ 0.037997] [<ffffffff81048251>] ? del_timer_sync+0x0/0xa0
[ 0.037997] [<ffffffff81439969>] ? schedule_timeout+0x35c/0x3bb
[ 0.037997] [<ffffffff81047e9e>] ? process_timeout+0x0/0x10
[ 0.037997] [<ffffffff814399e6>] ? schedule_timeout_uninterruptible+0x1e/0x20
[ 0.037997] [<ffffffff81036c44>] ? wait_task_inactive+0x181/0x1cd
[ 0.037997] [<ffffffff810584bc>] ? kthread_bind+0x1c/0x6d
[ 0.037997] [<ffffffff81436007>] ? cpu_callback+0x87/0x3e8
[ 0.037997] [<ffffffff81931ed1>] ? spawn_ksoftirqd+0x0/0x52
[ 0.037997] [<ffffffff81931ef5>] ? spawn_ksoftirqd+0x24/0x52
[ 0.037997] [<ffffffff810001f2>] ? do_one_initcall+0x57/0x133
[ 0.037997] [<ffffffff8192258a>] ? kernel_init+0x67/0x1c1
[ 0.037997] [<ffffffff81002f94>] ? kernel_thread_helper+0x4/0x10
[ 0.037997] [<ffffffff81032a78>] ? finish_task_switch+0x80/0xec
[ 0.037997] [<ffffffff8143c499>] ? _raw_spin_unlock_irq+0x3b/0x58
[ 0.037997] [<ffffffff8143cb44>] ? restore_args+0x0/0x30
[ 0.037997] [<ffffffff81922523>] ? kernel_init+0x0/0x1c1
[ 0.037997] [<ffffffff81002f90>] ? kernel_thread_helper+0x0/0x10
[ 0.037997] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 0.037997] possible reason: unannotated irqs-off.
[ 0.037997] irq event stamp: 1000
[ 0.037997] hardirqs last enabled at (999): [<ffffffff8143c48e>] _raw_spin_unlock_irq+0x30/0x58
[ 0.037997] hardirqs last disabled at (998): [<ffffffff8143ba38>] _raw_spin_lock_irq+0x19/0x79
[ 0.037997] softirqs last enabled at (908): [<ffffffff81040864>] __do_softirq+0x2a0/0x2f0
[ 0.037997] softirqs last disabled at (1000): [<ffffffff81048273>] del_timer_sync+0x22/0xa0
[ 0.038006] initcall spawn_ksoftirqd+0x0/0x52 returned 0 after 1952 usecs
[ 0.039000] calling init_workqueues+0x0/0x34d @ 1
[ 0.040114] initcall init_workqueues+0x0/0x34d returned 0 after 976 usecs
[ 0.041001] calling init_call_single_data+0x0/0xaa @ 1
[ 0.041143] initcall init_call_single_data+0x0/0xaa returned 0 after 0 usecs


--
Regards/Gruss,
Boris.


Attachments:
(No filename) (5.78 kB)
dmesg.log (102.49 kB)
config (61.34 kB)
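
A note for readers of this thread: the "unannotated irqs-off" message comes from a consistency check between the real interrupt state and the state the lock validator has recorded. The following is a minimal userspace sketch of that idea, not the kernel's code. The names cpu_model, model_irq_disable(), model_raw_irq_disable() and model_check_flags() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct cpu_model {
	bool hw_irqs_enabled;      /* what the CPU flags actually say */
	bool tracked_irqs_enabled; /* what the lock validator believes */
};

enum flag_check { FLAGS_OK, UNANNOTATED_IRQS_OFF, UNANNOTATED_IRQS_ON };

/* Annotated disable: updates the hardware state and the tracker together. */
void model_irq_disable(struct cpu_model *c)
{
	assert(c != NULL);
	c->hw_irqs_enabled = false;
	c->tracked_irqs_enabled = false;
}

/* Raw disable: changes the hardware state only; the tracker is not told. */
void model_raw_irq_disable(struct cpu_model *c)
{
	assert(c != NULL);
	c->hw_irqs_enabled = false;
}

/* Compare the two views and report which kind of mismatch, if any, exists. */
enum flag_check model_check_flags(const struct cpu_model *c)
{
	assert(c != NULL);
	if (!c->hw_irqs_enabled && c->tracked_irqs_enabled)
		return UNANNOTATED_IRQS_OFF;
	if (c->hw_irqs_enabled && !c->tracked_irqs_enabled)
		return UNANNOTATED_IRQS_ON;
	return FLAGS_OK;
}
```

Starting from a state where both fields are true, a call to model_raw_irq_disable() followed by model_check_flags() yields UNANNOTATED_IRQS_OFF. That mirrors the report above, where the irqtrace dump still lists hardirqs as last enabled at stamp 999 although they were in fact off when the lock annotation ran.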

2011-02-08 10:40:47

by Peter Zijlstra

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, 2011-02-08 at 11:17 +0100, Borislav Petkov wrote:
>
> So, I'm getting the warning below early in the boot process. And yes, I
> didn't have it on -rc3. 2.6.38-rc4-00001-g1e554e3-dirty means a debug
> diff ontop of -rc4 which shouldn't have anything to do with this splat
> since all it does is a couple of printk's due to -rc3 not suspending to
> disk properly in some cases.
>
> Now, I'm not going to even pretend to understand the code but here's
> what I can read out, you tell me whether it makes sense.
>
> spawn_ksoftirqd() is one of the early initcalls that gets called and
> it's notifier callback does kthread_bind() and you can follow in the
> backtrace below that this thing comes down to del_timer_sync() which
> does the lockdep annotation. Now, problem as I see it, is that hardirqs
> were disabled when we were called although it doesn't say so in the
> irqtrace events dump after the calltrace: "hardirqs last enabled at
> (999)" and our irq event stamp is 1000.
>
> There are actually at least three del_timer_sync()'s inflight so the
> problem could be there somewhere, I dunno.
>
> So, is it a wrong lockdep annotation or is it a real problem? I've
> attached dmesg and config.

Argh! It's an annotation nightmare that.. it didn't trigger for me when
running that because I didn't have DEBUG_LOCKDEP=y.

OK, let me try and come up with another way to annotate this
del_timer_sync() muck, the trouble is we want that lock to be called
with BH disabled, but simply doing local_bh_disable()/local_bh_enable()
has the nasty side effect of calling __do_softirq(). Faking IRQ state
will trip this check_flags() debug muck..

We used to have local_irq_disable()/local_irq_enable() around it, but
then people wanted to use del_timer_sync() from softirq context..
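
The trade-off described above can be sketched in a small userspace model. This is an illustration under stated assumptions, not kernel code: bh_model and the model_* functions are hypothetical names. The model captures only the behaviour discussed in this thread, namely that the full enable path handles pending softirqs in the caller's context, while the underscore variant merely drops the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of bottom-half disabling. */
struct bh_model {
	int  bh_disable_depth; /* nesting count of BH-disabled sections */
	bool softirq_pending;  /* a softirq was raised and has not run yet */
	int  ran_in_caller;    /* softirq batches handled in the caller's context */
};

void model_bh_disable(struct bh_model *m)
{
	m->bh_disable_depth++;
}

/* Modelled on _local_bh_enable(): only drops the count. */
void model_bh_enable_quiet(struct bh_model *m)
{
	assert(m->bh_disable_depth > 0);
	m->bh_disable_depth--;
}

/*
 * Modelled on local_bh_enable(): drops the count and, once the outermost
 * section ends, runs whatever is pending right here in the caller.
 */
void model_bh_enable(struct bh_model *m)
{
	assert(m->bh_disable_depth > 0);
	m->bh_disable_depth--;
	if (m->bh_disable_depth == 0 && m->softirq_pending) {
		m->softirq_pending = false;
		m->ran_in_caller++;
	}
}
```

With a softirq pending, model_bh_enable() increments ran_in_caller, which is the side effect that makes the plain disable/enable pair unattractive for a pure lockdep annotation. model_bh_enable_quiet() avoids it but leaves the work pending.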


2011-02-08 12:11:26

by Yong Zhang

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, Feb 08, 2011 at 11:41:52AM +0100, Peter Zijlstra wrote:
> Argh! Its an annotation nightmare that.. it didn't trigger for me when
> running that because I didn't have DEBUG_LOCKDEP=y.

Me too...

>
> OK, let me try and come up with another way to annotate this
> del_timer_sync() muck,

Is my previous patch acceptable?

I've inlined it below (updated against linux-2.6.38-rc4).

---
From: Yong Zhang <[email protected]>
Subject: [PATCH 1/2] softirq: introduce local_bh_enable_force_wake()

If there is pending softirq, don't handle it in the caller's
context, invoke ksoftirqd directly instead.

del_timer_sync() will be the first caller.

Signed-off-by: Yong Zhang <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/bottom_half.h | 1 +
kernel/softirq.c | 21 +++++++++++++++------
2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index 27b1bcf..665d697 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -5,5 +5,6 @@ extern void local_bh_disable(void);
 extern void _local_bh_enable(void);
 extern void local_bh_enable(void);
 extern void local_bh_enable_ip(unsigned long ip);
+extern void local_bh_enable_force_wake(void);

 #endif /* _LINUX_BH_H */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 68eb5ef..3c05dfa 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -154,9 +154,9 @@ void _local_bh_enable(void)

 EXPORT_SYMBOL(_local_bh_enable);

-static inline void _local_bh_enable_ip(unsigned long ip)
+static inline void _local_bh_enable_ip(unsigned long ip, bool force_wake)
 {
-	WARN_ON_ONCE(in_irq() || irqs_disabled());
+	WARN_ON_ONCE(in_irq() || (!force_wake && irqs_disabled()));
 #ifdef CONFIG_TRACE_IRQFLAGS
 	local_irq_disable();
 #endif
@@ -171,8 +171,12 @@ static inline void _local_bh_enable_ip(unsigned long ip)
 	 */
 	sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);

-	if (unlikely(!in_interrupt() && local_softirq_pending()))
-		do_softirq();
+	if (unlikely(!in_interrupt() && local_softirq_pending())) {
+		if (!force_wake)
+			do_softirq();
+		else
+			wakeup_softirqd();
+	}

 	dec_preempt_count();
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -183,16 +187,21 @@ static inline void _local_bh_enable_ip(unsigned long ip)

 void local_bh_enable(void)
 {
-	_local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	_local_bh_enable_ip((unsigned long)__builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(local_bh_enable);

 void local_bh_enable_ip(unsigned long ip)
 {
-	_local_bh_enable_ip(ip);
+	_local_bh_enable_ip(ip, false);
 }
 EXPORT_SYMBOL(local_bh_enable_ip);

+void local_bh_enable_force_wake(void)
+{
+	_local_bh_enable_ip((unsigned long)__builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(local_bh_enable_force_wake);
 /*
  * We restart softirq processing MAX_SOFTIRQ_RESTART times,
  * and we fall back to softirqd after that.
--
1.7.1
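
The decision this patch adds at the end of the enable path can be summarised as a three-way choice. The sketch below is a hypothetical userspace restatement of the patched condition. bh_action and model_bh_enable_action() are invented names, and the real function also handles preempt counts and IRQ-flag tracing, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the enable path; not kernel identifiers. */
enum bh_action {
	BH_NOTHING,     /* nothing pending, or we are in interrupt context */
	BH_RUN_INLINE,  /* handle pending softirqs in the caller's context */
	BH_WAKE_THREAD  /* defer pending softirqs to the softirq thread */
};

/* Restates the patched condition as a pure function of its three inputs. */
enum bh_action model_bh_enable_action(bool in_interrupt, bool pending,
				      bool force_wake)
{
	if (in_interrupt || !pending)
		return BH_NOTHING;
	return force_wake ? BH_WAKE_THREAD : BH_RUN_INLINE;
}

/* Usage example: exercises each outcome once. */
void model_self_check(void)
{
	assert(model_bh_enable_action(false, true, false) == BH_RUN_INLINE);
	assert(model_bh_enable_action(false, true, true) == BH_WAKE_THREAD);
	assert(model_bh_enable_action(true, true, true) == BH_NOTHING);
	assert(model_bh_enable_action(false, false, true) == BH_NOTHING);
}
```

Only the force_wake argument separates the last two outcomes, so existing callers that pass false keep their current behaviour in this model.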

2011-02-08 12:14:25

by Yong Zhang

Subject: [PATCH 2/2] timer: use local_bh_enable_force_wake() in del_timer_sync()

From: Yong Zhang <[email protected]>
Subject: [PATCH 2/2] timer: use local_bh_enable_force_wake() in del_timer_sync()

raw_local_irq_save()/raw_local_irq_restore() is also not suitable
here because lockdep will report "possible reason: unannotated irqs-off"

To cure this issue, use local_bh_enable_force_wake() here in case we
have a pending softirq left during this time.

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Yong Zhang <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/timer.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index d53ce66..1654dc1 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -969,14 +969,10 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
 int del_timer_sync(struct timer_list *timer)
 {
 #ifdef CONFIG_LOCKDEP
-	unsigned long flags;
-
-	raw_local_irq_save(flags);
 	local_bh_disable();
 	lock_map_acquire(&timer->lockdep_map);
 	lock_map_release(&timer->lockdep_map);
-	_local_bh_enable();
-	raw_local_irq_restore(flags);
+	local_bh_enable_force_wake();
 #endif
 	/*
 	 * don't use it in hardirq context, because it
--
1.7.1

2011-02-08 13:35:10

by Yong Zhang

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, Feb 08, 2011 at 08:11:08PM +0800, Yong Zhang wrote:
> From: Yong Zhang <[email protected]>
> Subject: [PATCH 1/2] softirq: introduce loacal_bh_enable_force_wake()
>
> If there is pending softirq, don't handle it in the caller's
> context, invoke ksoftirqd directly instead.
>
> del_timer_sync() will be the first caller.
>
> Signed-off-by: Yong Zhang <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> include/linux/bottom_half.h | 1 +
> kernel/softirq.c | 21 +++++++++++++++------
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
> index 27b1bcf..665d697 100644
> --- a/include/linux/bottom_half.h
> +++ b/include/linux/bottom_half.h
> @@ -5,5 +5,6 @@ extern void local_bh_disable(void);
> extern void _local_bh_enable(void);
> extern void local_bh_enable(void);
> extern void local_bh_enable_ip(unsigned long ip);
> +extern void local_bh_enable_force_wake(void);
>
> #endif /* _LINUX_BH_H */
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 68eb5ef..3c05dfa 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -154,9 +154,9 @@ void _local_bh_enable(void)
>
> EXPORT_SYMBOL(_local_bh_enable);
>
> -static inline void _local_bh_enable_ip(unsigned long ip)
> +static inline void _local_bh_enable_ip(unsigned long ip, bool force_wake)
> {
> - WARN_ON_ONCE(in_irq() || irqs_disabled());
> + WARN_ON_ONCE(in_irq() || (!force_wake && irqs_disabled()));

Only suppressing the warning is not enough here :(

> #ifdef CONFIG_TRACE_IRQFLAGS
> local_irq_disable();


The "called with irqs_disabled" semantic shows up in several places
in _local_bh_enable_ip().

Need more thinking about it...

2011-02-08 13:47:21

by Peter Zijlstra

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, 2011-02-08 at 21:34 +0800, Yong Zhang wrote:
> On Tue, Feb 08, 2011 at 08:11:08PM +0800, Yong Zhang wrote:
> > From: Yong Zhang <[email protected]>
> > Subject: [PATCH 1/2] softirq: introduce local_bh_enable_force_wake()
> >
> > If there is pending softirq, don't handle it in the caller's
> > context, invoke ksoftirqd directly instead.
> >
> > del_timer_sync() will be the first caller.
> >
> > Signed-off-by: Yong Zhang <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > ---
> > include/linux/bottom_half.h | 1 +
> > kernel/softirq.c | 21 +++++++++++++++------
> > 2 files changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
> > index 27b1bcf..665d697 100644
> > --- a/include/linux/bottom_half.h
> > +++ b/include/linux/bottom_half.h
> > @@ -5,5 +5,6 @@ extern void local_bh_disable(void);
> > extern void _local_bh_enable(void);
> > extern void local_bh_enable(void);
> > extern void local_bh_enable_ip(unsigned long ip);
> > +extern void local_bh_enable_force_wake(void);
> >
> > #endif /* _LINUX_BH_H */
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 68eb5ef..3c05dfa 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -154,9 +154,9 @@ void _local_bh_enable(void)
> >
> > EXPORT_SYMBOL(_local_bh_enable);
> >
> > -static inline void _local_bh_enable_ip(unsigned long ip)
> > +static inline void _local_bh_enable_ip(unsigned long ip, bool force_wake)
> > {
> > - WARN_ON_ONCE(in_irq() || irqs_disabled());
> > + WARN_ON_ONCE(in_irq() || (!force_wake && irqs_disabled()));
>
> Only suppressing the warning is not enough here :(
>
> > #ifdef CONFIG_TRACE_IRQFLAGS
> > local_irq_disable();
>
>
> The "called with irqs_disabled" semantic shows up in several places
> in _local_bh_enable_ip().
>
> Need more thinking about it...

Well that and modifying softirq bits for a lockdep annotation really
feels wrong.

2011-02-08 14:16:57

by Peter Zijlstra

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, 2011-02-08 at 14:48 +0100, Peter Zijlstra wrote:
>
> Well that and modifying softirq bits for a lockdep annotation really
> feels wrong.

How about we revert it for this release and try again later?

---
Subject: lockdep, timer: Revert the del_timer_sync() annotation

Both attempts at trying to allow softirq usage failed, revert for this
release and try again later.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/timer.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 343ff27..c848cd8 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -959,7 +959,7 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
*
* Synchronization rules: Callers must prevent restarting of the timer,
* otherwise this function is meaningless. It must not be called from
- * hardirq contexts. The caller must not hold locks which would prevent
+ * interrupt contexts. The caller must not hold locks which would prevent
* completion of the timer's handler. The timer's handler must not call
* add_timer_on(). Upon exit the timer is not queued and the handler is
* not running on any CPU.
@@ -971,12 +971,10 @@ int del_timer_sync(struct timer_list *timer)
#ifdef CONFIG_LOCKDEP
unsigned long flags;

- raw_local_irq_save(flags);
- local_bh_disable();
+ local_irq_save(flags);
lock_map_acquire(&timer->lockdep_map);
lock_map_release(&timer->lockdep_map);
- _local_bh_enable();
- raw_local_irq_restore(flags);
+ local_irq_restore(flags);
#endif
/*
* don't use it in hardirq context, because it

2011-02-08 15:16:06

by Ingo Molnar

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)


* Peter Zijlstra <[email protected]> wrote:

> On Tue, 2011-02-08 at 14:48 +0100, Peter Zijlstra wrote:
> >
> > Well that and modifying softirq bits for a lockdep annotation really
> > feels wrong.
>
> How about we revert it for this release and try again later?

That's the sanest way forward i think ...

I've queued up the revert.

Thanks,

Ingo

2011-02-08 15:52:58

by Peter Zijlstra

Subject: [tip:core/urgent] Revert "lockdep, timer: Fix del_timer_sync() annotation"

Commit-ID: 7ff207928eb0761fa6b6c39eda82ac07a5241acf
Gitweb: http://git.kernel.org/tip/7ff207928eb0761fa6b6c39eda82ac07a5241acf
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 8 Feb 2011 15:18:00 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 8 Feb 2011 16:18:39 +0100

Revert "lockdep, timer: Fix del_timer_sync() annotation"

Both attempts at trying to allow softirq usage for
del_timer_sync() failed (produced bogus warnings),
so revert the commit for this release:

f266a5110d45: lockdep, timer: Fix del_timer_sync() annotation

and try again later.

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Yong Zhang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
LKML-Reference: <1297174680.13327.107.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/timer.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index d53ce66..d645992 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -959,7 +959,7 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
*
* Synchronization rules: Callers must prevent restarting of the timer,
* otherwise this function is meaningless. It must not be called from
- * hardirq contexts. The caller must not hold locks which would prevent
+ * interrupt contexts. The caller must not hold locks which would prevent
* completion of the timer's handler. The timer's handler must not call
* add_timer_on(). Upon exit the timer is not queued and the handler is
* not running on any CPU.
@@ -971,12 +971,10 @@ int del_timer_sync(struct timer_list *timer)
#ifdef CONFIG_LOCKDEP
unsigned long flags;

- raw_local_irq_save(flags);
- local_bh_disable();
+ local_irq_save(flags);
lock_map_acquire(&timer->lockdep_map);
lock_map_release(&timer->lockdep_map);
- _local_bh_enable();
- raw_local_irq_restore(flags);
+ local_irq_restore(flags);
#endif
/*
* don't use it in hardirq context, because it

2011-02-08 20:28:24

by Eric W. Biederman

Subject: Heads up Linux 2.6.38-rc4 compile problems.


A quick heads up. 2.6.38-rc4 looks like the worst kernel I've ever tried to
test. It boots up properly and looks ok, but I can't get it to even
compile the programs I usually test with. 2.6.38-rc3 at least managed
that yesterday.

From 2.6.38-rc3 I managed to figure out that tftp transfers over ipv6
with curl are dying with a getpeername problem.

Eric

2011-02-08 20:45:12

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Tue, Feb 8, 2011 at 12:28 PM, Eric W. Biederman
<[email protected]> wrote:
>
> A quick heads up. 2.6.38-rc4 looks like the worst kernel I've ever tried to
> test. It boots up properly and looks ok, but I can't get it to even
> compile the programs I usually test with. 2.6.38-rc3 at least managed
> that yesterday.

gcc dying with ICE or SIGSEGV? Or what? Seriously lacking information here.

Are you using btrfs or cifs (the two filesystems that had
bigger-than-average changes)?

Any chance to bisect it (even if just partially - a couple of
bisection runs would already narrow it down quite a bit)?

> From 2.6.38-rc3 I managed to figure out that tftp transfers over ipv6
> with curl are dying with a getpeername problem.

Hmm. Adding Davem for that one, although I'd expect you to be able to
figure out some networking problem on your own and perhaps give a
better report?

Linus

2011-02-09 01:46:46

by Yong Zhang

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, Feb 8, 2011 at 10:18 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2011-02-08 at 14:48 +0100, Peter Zijlstra wrote:
>>
>> Well that and modifying softirq bits for a lockdep annotation really
>> feels wrong.
>
> How about we revert it for this release and try again later?

No problem.

I'll take another look at this issue later.

Thanks,
Yong

2011-02-09 09:02:09

by Eric W. Biederman

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Linus Torvalds <[email protected]> writes:

> On Tue, Feb 8, 2011 at 12:28 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> A quick heads up.  2.6.38-rc4 looks like the worst kernel I've ever tried to
>> test.  It boots up properly and looks ok, but I can't get it to even
>> compile the programs I usually test with. 2.6.38-rc3 at least managed
>> that yesterday.
>
> gcc dying with ICE or SIGSEGV? Or what? Seriously lacking information
> here.
>
> Are you using btrfs or cifs (the two filesystems that had
> bigger-than-average changes)?
>
> Any chance to bisect it (even if just partially - a couple of
> bisection runs would already narrow it down quite a bit)?

What I have been able to do today on this one is narrow it down a
little and provide a reasonable description of one of the problems.

In a Fedora 12 world, in a subdirectory on ext4 with the applications
pinned into a mount namespace, I am seeing compiles of gcc-4.4.3 fail. A
quick survey of the compile failures makes it look like they are
deterministic; at least I see the same error messages in multiple
failed builds.

The machine is a dual socket quad core non hyperthread machine with 10G of RAM.
The cpus are: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz

In the case that fails I am simultaneously compiling a 32bit gcc cross
compiler for x86_64 and i386, for the same source but with different
object directories. The compile failures happen exactly as before if I
rerun the build commands. I'm still trying to figure out why the
individual files are failing. I expect that if I do, that will give me a much
bigger clue.

What is interesting is that this fails exactly the same way in at least three invocations
of the command, and I expect in all of them, with failures taking about 30 minutes.
I might try a bisect tomorrow.

The unfortunate thing is that this weird compile failure during a
parallel build doesn't tell me much yet about what the root cause is.

I have attached my build scripts in case that helps with understanding.
At the moment I am still trying to break this down into something smaller
so I can see just what the heck is going on, and I will keep poking at
it until I get somewhere.

The annoying thing is that, looking at the other failures I was seeing
before the compile fell apart, I think I might be on top of half a
dozen other regressions as well. I don't have a clue how I am going to
get through all of those regressions before 2.6.38 is out.

Eric

The failures look like:

$ make -C gcc-stage1-x86_64 all-gcc all-target-libgcc install-gcc install-target-libgcc DESTDIR=/bld/Across/Across-2.0.0/sysroot-x86_64
gcc -qlanglvl=ansi -c -DHAVE_CONFIG_H -g -O2 -I. -I/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/../include -W -Wall
-Wwrite-strings -Wc++-compat -Wstrict-prototypes /bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c -o fibheap.o
gcc: unrecognized option '-qlanglvl=ansi'
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c: In function ‘fibheap_union’:
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:151: warning: implicit declaration of function ‘free’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:151: warning: incompatible implicit declaration of built-in function ‘free’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:156: warning: incompatible implicit declaration of built-in function ‘free’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:172: warning: incompatible implicit declaration of built-in function ‘free’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c: In function ‘fibheap_extract_min’:
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:190: warning: incompatible implicit declaration of built-in function ‘free’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c: In function ‘fibheap_delete_node’:
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:258: error: ‘LONG_MIN’ undeclared (first use in this function)
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:258: error: (Each undeclared identifier is reported only once
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:258: error: for each function it appears in.)
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c: In function ‘fibheap_delete’:
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:269: warning: incompatible implicit declaration of built-in function ‘free’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c: In function ‘fibheap_consolidate’:
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:360: warning: implicit declaration of function ‘memset’
/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:360: warning: incompatible implicit declaration of built-in function ‘memset’


$ make -C gcc-stage1-i386 all-gcc all-target-libgcc install-gcc install-target-libgcc DESTDIR=/bld/Across/Across-2.0.0/sysroot-i386
make: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386'
make[1]: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/libiberty'
make[2]: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/libiberty/testsuite'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/libiberty/testsuite'
make[1]: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/libiberty'
make[1]: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/intl'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/intl'
make[1]: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/build-i686-pc-linux-gnu/libiberty'
make[2]: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/build-i686-pc-linux-gnu/libiberty/testsuite'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/build-i686-pc-linux-gnu/libiberty/testsuite'
make[1]: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/build-i686-pc-linux-gnu/libiberty'
make[1]: Entering directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/build-i686-pc-linux-gnu/fixincludes'
gcc -g -O2 -o fixincl fixincl.o fixtests.o fixfixes.o server.o procopen.o fixlib.o fixopts.o ../libiberty/libiberty.a
fixincl.o: In function `quoted_file_exists':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:624: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:624: undefined reference to `_sch_istable'
fixincl.o: In function `egrep_test':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:600: undefined reference to `xregexec'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:600: undefined reference to `xregexec'
fixincl.o: In function `test_for_changes':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:1211: undefined reference to `xregexec'
fixincl.o: In function `extract_quoted_files':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:711: undefined reference to `xregexec'
fixincl.o: In function `initialize':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:245: undefined reference to `_sch_istable'
fixincl.o: In function `main':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:139: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixincl.c:157: undefined reference to `_sch_istable'
fixtests.o: In function `machine_name_test':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixtests.c:79: undefined reference to `xregexec'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixtests.c:104: undefined reference to `xregexec'
fixfixes.o: In function `gnu_type_fix':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:681: undefined reference to `xregexec'
fixfixes.o: In function `emit_gnu_type':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:144: undefined reference to `_sch_toupper'
fixfixes.o: In function `wrap_fix':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:599: undefined reference to `xregexec'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:617: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:621: undefined reference to `_sch_toupper'
fixfixes.o: In function `machine_name_fix':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:508: undefined reference to `xregexec'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:539: undefined reference to `xregexec'
fixfixes.o: In function `format_fix':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:282: undefined reference to `xregexec'
fixfixes.o: In function `format_write':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:198: undefined reference to `_sch_istable'
fixfixes.o: In function `char_macro_use_fix':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:332: undefined reference to `xregexec'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:439: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:448: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:451: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:453: undefined reference to `_sch_istable'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixfixes.c:448: undefined reference to `_sch_istable'
server.o:/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/server.c:124: more undefined references to `_sch_istable' follow
fixlib.o: In function `compile_re':
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixlib.c:193: undefined reference to `xregcomp'
/bld/Across/Across-2.0.0/gcc-4.4.3/fixincludes/fixlib.c:198: undefined reference to `xregerror'
collect2: ld returned 1 exit status
make[1]: *** [full-stamp] Error 1
make[1]: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386/build-i686-pc-linux-gnu/fixincludes'
make: *** [all-build-fixincludes] Error 2
make: Leaving directory `/bld/Across/Across-2.0.0/gcc-stage1-i386'




Attachments:
Across.spec (2.07 kB)
build.sh (2.79 kB)

2011-02-09 14:59:23

by Alex Riesen

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Wed, Feb 9, 2011 at 10:01, Eric W. Biederman <[email protected]> wrote:
> The failures look like:
>
> $ make -C gcc-stage1-x86_64 all-gcc all-target-libgcc install-gcc install-target-libgcc DESTDIR=/bld/Across/Across-2.0.0/sysroot-x86_64
> gcc -qlanglvl=ansi -c -DHAVE_CONFIG_H -g -O2 -I. -I/bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/../include  -W -Wall
> -Wwrite-strings -Wc++-compat -Wstrict-prototypes /bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c -o fibheap.o
> gcc: unrecognized option '-qlanglvl=ansi'
> /bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c: In function ‘fibheap_union’:
> /bld/Across/Across-2.0.0/gcc-4.4.3/libiberty/fibheap.c:151: warning: implicit declaration of function ‘free’

Maybe some files (stdlib.h or malloc.h) return no data when read?
Then, gcc still "compiles" them, but all the declarations are missing. And I would
expect gcc to complain if zero blocks returned, and truncated files are very likely to
abort compilation.

Can the command be strace'd immediately after the failure?
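
The empty-read hypothesis above could be checked with something like the
following minimal sketch. It is not from the thread: check_read_matches_size()
is a hypothetical helper that compares the size fstat() reports for a file
with the number of bytes read() actually returns. It assumes a regular file,
for example one of the suspect headers.

```c
/* Hypothetical helper, not from the thread: detect a file whose
 * contents come back shorter (or longer) than its reported size. */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the number of bytes read equals st_size, 1 on a
 * mismatch, and -1 if the file cannot be opened, stat'ed, or read. */
int check_read_matches_size(const char *path)
{
    struct stat st;
    char buf[4096];
    off_t total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    /* Read to EOF, counting what the kernel really hands back. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;
    close(fd);
    if (n < 0)
        return -1;
    return total == st.st_size ? 0 : 1;
}
```

Calling it on each header that fibheap.c pulls in, right after a failed
build, would show whether any file comes back short: a return value of 1
means the byte count read does not match the size the file claims.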

2011-02-09 16:03:58

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Wed, Feb 9, 2011 at 6:59 AM, Alex Riesen <[email protected]> wrote:
>
> Maybe some files (stdlib.h or malloc.h) return no data when read?
> Then, gcc still "compiles" them, but all the declarations are missing. And I would
> expect gcc to complain if zero blocks returned, and truncated files are very likely to
> abort compilation.

Well, the thing is, Eric said he was using ext4.

And there are absolutely no changes I can see after -rc3 that would
affect anything like this. No VFS layer changes that look at all
likely, there are no ext4 changes at all, and the VM changes that are there
look rather unlikely too (ie they are about corner cases in page
migration and transparent hugepage support - not to mention that it's
almost certainly not some VM race or whatever if they are
deterministic).

The only unusual thing in Eric's setup is the mount namespace usage,
but nothing has changed wrt that, at least since -rc3.

Linus

2011-02-09 17:08:49

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (test_nx: BUG)


test_nx BUGs.
CONFIG_DEBUG_RODATA=y
(nearly allmodconfig, with a few changes)

Is that expected?


[ 132.533559] Testing NX protection
[ 132.533583] BUG: unable to handle kernel paging request at ffffffffa05fa178
[ 132.533730] IP: [<ffffffffa05f905e>] fudze_exception_table+0x4c/0x58 [test_nx]
[ 132.533835] PGD 1a14067 PUD 1a18063 PMD 78b1c067 PTE 800000006cebf161
[ 132.534073] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
[ 132.534132] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
[ 132.534138] CPU 0
[ 132.534140] Modules linked in: test_nx(+) af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod mousedev joydev evdev mac_hid snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq usbmouse snd_seq_device usbkbd usbhid snd_pcm hid 8250_pnp rtc_cmos snd_timer sr_mod pcspkr rtc_core dcdbas i2c_i801 cdrom 8250 snd sg rtc_lib processor tg3 serial_core soundcore thermal_sys intel_agp iTCO_wdt snd_page_alloc button iTCO_vendor_support intel_gtt hwmon unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
[ 132.534237]
[ 132.534241] Pid: 2526, comm: modprobe Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
[ 132.534245] RIP: 0010:[<ffffffffa05f905e>] [<ffffffffa05f905e>] fudze_exception_table+0x4c/0x58 [test_nx]
[ 132.534252] RSP: 0018:ffff88006cec1ec8 EFLAGS: 00010246
[ 132.534254] RAX: ffffffffa05fa178 RBX: 0000000000000001 RCX: 0000000000000000
[ 132.534257] RDX: 0000000000000000 RSI: ffff88006cec1ef8 RDI: ffffffffa05f909d
[ 132.534261] RBP: ffff88006cec1ec8 R08: 0000000000000001 R09: 0000000000000000
[ 132.534263] R10: 0000000000000000 R11: ffff88006cec1d68 R12: ffff88006cec1ef8
[ 132.534266] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 132.534270] FS: 00007f420c3536f0(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 132.534274] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 132.534277] CR2: ffffffffa05fa178 CR3: 000000006cef0000 CR4: 00000000000006f0
[ 132.534280] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 132.534283] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 132.534287] Process modprobe (pid: 2526, threadinfo ffff88006cec0000, task ffff88006ce93000)
[ 132.534289] Stack:
[ 132.534291] ffff88006cec1ee8 ffffffffa05f9094 ffffffffa05fb080 ffffffffa05f90d7
[ 132.534299] ffff88006cec1f18 ffffffffa05f911d 00000000000090c3 0000000000000000
[ 132.534304] ffff88006cec1f18 ffffffffa05fb080 ffff88006cec1f48 ffffffff810020e3
[ 132.534310] Call Trace:
[ 132.534318] [<ffffffffa05f9094>] test_address+0x2a/0x33 [test_nx]
[ 132.534325] [<ffffffffa05f90d7>] ? test_NX+0x0/0x180 [test_nx]
[ 132.534330] [<ffffffffa05f911d>] test_NX+0x46/0x180 [test_nx]
[ 132.534336] [<ffffffff810020e3>] do_one_initcall+0xa9/0x1ef
[ 132.534341] [<ffffffff810d6674>] sys_init_module+0x12b/0x307
[ 132.534345] [<ffffffff8100e942>] system_call_fastpath+0x16/0x1b
[ 132.534347] Code: 00 e8 4b 9d f5 e0 48 c7 c7 53 a0 5f a0 31 c0 48 ff 05 c7 22 00 00 e8 36 9d f5 e0 48 ff 05 c3 22 00 00 eb 11 48 8b 05 62 21 00 00 <48> 89 30 48 ff 05 b8 22 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f
[ 132.534381] RIP [<ffffffffa05f905e>] fudze_exception_table+0x4c/0x58 [test_nx]
[ 132.534386] RSP <ffff88006cec1ec8>
[ 132.534388] CR2: ffffffffa05fa178
[ 132.534392] ---[ end trace c7905109946d6306 ]---

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-09 17:10:49

by Arjan van de Ven

Subject: Re: Linux 2.6.38-rc4 (test_nx: BUG)

On 2/9/2011 9:08 AM, Randy Dunlap wrote:
> test_nx BUGs.
> CONFIG_DEBUG_RODATA=y
> (nearly allmodconfig, with a few changes)
>
> Is that expected?
>
this test should pass...
so something broke it (I'll call that a success for the test ;-)

2011-02-09 17:25:01

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (hysdn: BUG)

on x86_64. no HYSDN hardware found (correct).
Nearly allmodconfig.


[ 65.397577] HYSDN: module Rev: 1.6.6.6 loaded
[ 65.397584] HYSDN: network interface Rev: 1.8.6.4
[ 65.398057] HYSDN: 0 card(s) found.
[ 65.398121] BUG: unable to handle kernel paging request at ffffffffa06c99f0
[ 65.398269] IP: [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
[ 65.398379] PGD 1a14067 PUD 1a18063 PMD 6f6c1067 PTE 800000006ce8c161
[ 65.398613] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
[ 65.398805] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/irq
[ 65.398864] CPU 0
[ 65.398913] Modules linked in: hysdn(+) kernelcapi af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod mousedev joydev evdev mac_hid snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq usbmouse usbkbd snd_seq_device usbhid snd_pcm hid snd_timer 8250_pnp tg3 pcspkr sr_mod rtc_cmos dcdbas sg snd cdrom i2c_i801 rtc_core iTCO_wdt 8250 processor rtc_lib soundcore iTCO_vendor_support serial_core thermal_sys intel_agp snd_page_alloc intel_gtt button hwmon unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
[ 65.400030]
[ 65.400030] Pid: 2497, comm: modprobe Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
[ 65.400030] RIP: 0010:[<ffffffffa06c68ba>] [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
[ 65.400030] RSP: 0018:ffff88006eec1e68 EFLAGS: 00010206
[ 65.400030] RAX: ffffffffa06c99f1 RBX: ffffffffa06c99e9 RCX: ffff88007c4159a0
[ 65.400030] RDX: 000000000c960c24 RSI: 0000000000000024 RDI: ffffffffa06c99e9
[ 65.400030] RBP: ffff88006eec1e78 R08: 0000000000000000 R09: 0000000000000000
[ 65.400030] R10: 0000000000000000 R11: ffffffff8124698c R12: ffff88006eec1e88
[ 65.400030] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 65.400030] FS: 00007fe1ae6c76f0(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 65.400030] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 65.400030] CR2: ffffffffa06c99f0 CR3: 000000006eee3000 CR4: 00000000000006f0
[ 65.400030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 65.400030] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 65.400030] Process modprobe (pid: 2497, threadinfo ffff88006eec0000, task ffff88006cf88000)
[ 65.400030] Stack:
[ 65.400030] ffff88006eec1e78 0000000000000000 ffff88006eec1eb8 ffffffffa06c3bf8
[ 65.400030] ffff88006cf88000 0000000000000000 0000000000000000 00000000b3346a0d
[ 65.400030] 0000000000000000 ffffffffa062b000 ffff88006eec1f18 ffffffffa062b0e2
[ 65.400030] Call Trace:
[ 65.400030] [<ffffffffa06c3bf8>] hysdn_procconf_init+0xf5/0x133 [hysdn]
[ 65.400030] [<ffffffffa062b000>] ? hysdn_init+0x0/0x1000 [hysdn]
[ 65.400030] [<ffffffffa062b0e2>] hysdn_init+0xe2/0x1000 [hysdn]
[ 65.400030] [<ffffffff810020e3>] do_one_initcall+0xa9/0x1ef
[ 65.400030] [<ffffffff810d6674>] sys_init_module+0x12b/0x307
[ 65.400030] [<ffffffff8100e942>] system_call_fastpath+0x16/0x1b
[ 65.400030] Code: e5 53 48 83 ec 08 0f 1f 44 00 00 be 3a 00 00 00 e8 71 39 c0 e0 48 85 c0 74 1e 48 8d 58 02 be 24 00 00 00 48 89 df e8 5b 39 c0 e0 <c6> 40 ff 00 48 ff 05 db 79 00 00 eb 0e 48 ff 05 da 79 00 00 48
[ 65.400030] RIP [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
[ 65.400030] RSP <ffff88006eec1e68>
[ 65.400030] CR2: ffffffffa06c99f0
[ 65.400030] ---[ end trace bf14fd4acc41f5a9 ]---

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-09 17:27:45

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (tty/ifx6x60: BUG)

x86_64, nearly allmodconfig.


[ 86.956907] BUG: unable to handle kernel paging request at ffffffffa062c5d8
[ 86.958336] IP: [<ffffffff814461b1>] spi_register_driver+0x15/0x6c
[ 86.958349] PGD 1a14067 PUD 1a18063 PMD 6f558067 PTE 800000006ed06161
[ 86.958359] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
[ 86.958364] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
[ 86.958370] CPU 0
[ 86.958372] Modules linked in: ifx6x60(+) af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod mousedev joydev evdev mac_hid snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq usbmouse usbkbd snd_seq_device usbhid hid snd_pcm sr_mod tg3 pcspkr dcdbas 8250_pnp snd_timer rtc_cmos cdrom i2c_i801 sg rtc_core iTCO_wdt snd 8250 iTCO_vendor_support rtc_lib processor serial_core soundcore thermal_sys intel_agp snd_page_alloc intel_gtt hwmon button unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
[ 86.958466]
[ 86.958471] Pid: 2523, comm: modprobe Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
[ 86.958474] RIP: 0010:[<ffffffff814461b1>] [<ffffffff814461b1>] spi_register_driver+0x15/0x6c
[ 86.958480] RSP: 0018:ffff88006cd63ef8 EFLAGS: 00010286
[ 86.958483] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006cd63e98
[ 86.958487] RDX: 0000000000000000 RSI: 000000143f08b654 RDI: ffffffffa062c5a0
[ 86.958490] RBP: ffff88006cd63ef8 R08: 0000000000000000 R09: 0000000000000000
[ 86.958493] R10: 0000000000000000 R11: 2222222222222222 R12: ffffffffa01c2000
[ 86.958496] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 86.958501] FS: 00007f71c311a6f0(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 86.958504] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 86.958507] CR2: ffffffffa062c5d8 CR3: 000000007033a000 CR4: 00000000000006f0
[ 86.958511] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 86.958514] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 86.958518] Process modprobe (pid: 2523, threadinfo ffff88006cd62000, task ffff88006cfa8000)
[ 86.958521] Stack:
[ 86.958523] ffff88006cd63f18 ffffffffa01c2133 ffff88006cd63f18 ffffffffa062e140
[ 86.958529] ffff88006cd63f48 ffffffff810020e3 ffffffffa062e140 0000000000000001
[ 86.958535] 000000000061b470 0000000000000000 ffff88006cd63f78 ffffffff810d6674
[ 86.958541] Call Trace:
[ 86.958551] [<ffffffffa01c2133>] ifx_spi_init+0x133/0x1000 [ifx6x60]
[ 86.958557] [<ffffffff810020e3>] do_one_initcall+0xa9/0x1ef
[ 86.958564] [<ffffffff810d6674>] sys_init_module+0x12b/0x307
[ 86.958569] [<ffffffff8100e942>] system_call_fastpath+0x16/0x1b
[ 86.958571] Code: e1 10 00 48 ff 05 e8 b9 94 01 41 58 5b 44 89 e0 41 5c 41 5d c9 c3 55 48 89 e5 0f 1f 44 00 00 48 ff 05 34 bb 94 01 48 83 7f 08 00 <48> c7 47 38 70 75 cb 81 74 0f 48 c7 47 58 f4 50 44 81 48 ff 05
[ 86.958613] RIP [<ffffffff814461b1>] spi_register_driver+0x15/0x6c
[ 86.958617] RSP <ffff88006cd63ef8>
[ 86.958620] CR2: ffffffffa062c5d8
[ 86.958625] ---[ end trace 99fc207b918ffa02 ]---

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-09 17:30:15

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

x86_64, nearly allmodconfig. No target hardware.


[ 144.508473] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 144.509901] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
[ 144.512026] CPU 1
[ 144.512026] Modules linked in: target_core_mod(-) configfs af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod mousedev joydev evdev mac_hid snd_hda_codec_analog usbmouse snd_hda_intel snd_hda_codec usbkbd usbhid hid snd_hwdep snd_seq 8250_pnp snd_seq_device dcdbas pcspkr sr_mod i2c_i801 cdrom tg3 sg iTCO_wdt snd_pcm 8250 iTCO_vendor_support rtc_cmos serial_core rtc_core snd_timer rtc_lib snd processor button thermal_sys soundcore intel_agp hwmon snd_page_alloc intel_gtt unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
[ 144.512026]
[ 144.512026] Pid: 2597, comm: rmmod Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
[ 144.512026] RIP: 0010:[<ffffffff810c3e5f>] [<ffffffff810c3e5f>] __lock_acquire+0xd8/0x4e8
[ 144.512026] RSP: 0018:ffff88006df1bb78 EFLAGS: 00010006
[ 144.512026] RAX: 0000000000000002 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
[ 144.512026] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6be3
[ 144.512026] RBP: ffff88006df1bbd8 R08: 0000000000000001 R09: 0000000000000000
[ 144.512026] R10: 0000000000000006 R11: ffffffffa06ab0ef R12: 0000000000000000
[ 144.512026] R13: ffff88006dec3000 R14: 0000000000000000 R15: 0000000000000000
[ 144.512026] FS: 00007fe0320d36f0(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
[ 144.512026] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 144.512026] CR2: 0000003fadc7bf20 CR3: 000000006de4f000 CR4: 00000000000006e0
[ 144.512026] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 144.512026] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 144.512026] Process rmmod (pid: 2597, threadinfo ffff88006df1a000, task ffff88006dec3000)
[ 144.512026] Stack:
[ 144.512026] 0000000000000118 ffff8800000003a7 0000000000000005 00000000810b175b
[ 144.512026] ffff88006df1bba8 0000000000000000 ffff88006dec3000 0000000000000000
[ 144.512026] ffff88006dec3000 ffffffffa06ab0ef 0000000000000001 0000000000000000
[ 144.512026] Call Trace:
[ 144.512026] [<ffffffffa06ab0ef>] ? spin_lock+0x15/0x1e [configfs]
[ 144.512026] [<ffffffff810c436f>] lock_acquire+0x100/0x150
[ 144.512026] [<ffffffffa06ab0ef>] ? spin_lock+0x15/0x1e [configfs]
[ 144.512026] [<ffffffffa06ac40f>] ? detach_groups+0x91/0x12e [configfs]
[ 144.512026] [<ffffffffa06ac40f>] ? detach_groups+0x91/0x12e [configfs]
[ 144.512026] [<ffffffff81556300>] _raw_spin_lock+0x44/0xaf
[ 144.512026] [<ffffffffa06ab0ef>] ? spin_lock+0x15/0x1e [configfs]
[ 144.512026] [<ffffffff810c47db>] ? lock_release_nested+0xfb/0x133
[ 144.512026] [<ffffffffa06ab0ef>] spin_lock+0x15/0x1e [configfs]
[ 144.512026] [<ffffffffa06ab144>] dget+0x2e/0x56 [configfs]
[ 144.512026] [<ffffffffa06ac3a4>] detach_groups+0x26/0x12e [configfs]
[ 144.512026] [<ffffffffa06ac363>] configfs_detach_group+0x2d/0x48 [configfs]
[ 144.512026] [<ffffffffa06ac41f>] detach_groups+0xa1/0x12e [configfs]
[ 144.512026] [<ffffffffa06ac363>] configfs_detach_group+0x2d/0x48 [configfs]
[ 144.512026] [<ffffffffa06ac41f>] detach_groups+0xa1/0x12e [configfs]
[ 144.512026] [<ffffffffa06ac363>] configfs_detach_group+0x2d/0x48 [configfs]
[ 144.512026] [<ffffffffa06ac41f>] detach_groups+0xa1/0x12e [configfs]
[ 144.512026] [<ffffffffa06ac363>] configfs_detach_group+0x2d/0x48 [configfs]
[ 144.512026] [<ffffffffa06ac41f>] detach_groups+0xa1/0x12e [configfs]
[ 144.512026] [<ffffffffa06ac363>] configfs_detach_group+0x2d/0x48 [configfs]
[ 144.512026] [<ffffffffa06ace26>] configfs_unregister_subsystem+0x105/0x194 [configfs]
[ 144.512026] [<ffffffffa06baf55>] target_core_exit_configfs+0x185/0x1eb [target_core_mod]
[ 144.512026] [<ffffffff810d46a8>] sys_delete_module+0x2d6/0x368
[ 144.512026] [<ffffffff8155602d>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 144.512026] [<ffffffff81555fb7>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 144.512026] [<ffffffff8100e942>] system_call_fastpath+0x16/0x1b
[ 144.512026] Code: 05 8f 32 8d 01 e8 6c b1 fb ff 48 ff 05 8b 32 8d 01 48 ff 05 8c 32 8d 01 48 ff 05 95 32 8d 01 e9 e3 03 00 00 48 ff 05 81 32 8d 01 <48> 81 3b 40 5f 26 82 75 07 48 ff 05 81 32 8d 01 83 fe 01 77 13
[ 144.512026] RIP [<ffffffff810c3e5f>] __lock_acquire+0xd8/0x4e8
[ 144.512026] RSP <ffff88006df1bb78>
[ 144.512026] ---[ end trace 37e0ba5347875330 ]---

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-09 17:37:59

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (other bugs)


There are also bugs or other oopses etc. with ipmi, x25 (rmmod hangs system),
and (usb gadget) g_audio, but I don't have logs for them yet...
Someone else could do those.

Oh, and ext4, which was reported on the ext4 mailing list.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-09 18:25:01

by Alan

Subject: Re: Linux 2.6.38-rc4 (tty/ifx6x60: BUG)

On Wed, 9 Feb 2011 09:26:33 -0800
Randy Dunlap <[email protected]> wrote:

> x86_64, nearly allmodconfig.

Russ posted a set of updates for ifx6x60 a while back which fix this (2nd
Feb was the first posting of it)


2011-02-09 19:00:59

by Linus Torvalds

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

On Wed, Feb 9, 2011 at 9:28 AM, Randy Dunlap <[email protected]> wrote:
> x86_64, nearly allmodconfig. No target hardware.
>
>
> [ 144.508473] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 144.509901] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
> [ 144.512026] CPU 1
> [ 144.512026]
> [ 144.512026] Pid: 2597, comm: rmmod Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
> [ 144.512026] RIP: 0010:[<ffffffff810c3e5f>] [<ffffffff810c3e5f>] __lock_acquire+0xd8/0x4e8
> [ 144.512026] RSP: 0018:ffff88006df1bb78 EFLAGS: 00010006
> [ 144.512026] RAX: 0000000000000002 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000

The code disassembles to

0: 8d 01 lea (%rcx),%eax
2: e8 6c b1 fb ff callq 0xfffffffffffbb173
7: 48 ff 05 8b 32 8d 01 incq 0x18d328b(%rip) # 0x18d3299
e: 48 ff 05 8c 32 8d 01 incq 0x18d328c(%rip) # 0x18d32a1
15: 48 ff 05 95 32 8d 01 incq 0x18d3295(%rip) # 0x18d32b1
1c: e9 e3 03 00 00 jmpq 0x404
21: 48 ff 05 81 32 8d 01 incq 0x18d3281(%rip) # 0x18d32a9
28:* 48 81 3b 40 5f 26 82 cmpq $0xffffffff82265f40,(%rbx) <-- trapping instruction
2f: 75 07 jne 0x38
31: 48 ff 05 81 32 8d 01 incq 0x18d3281(%rip) # 0x18d32b9
38: 83 fe 01 cmp $0x1,%esi

and %rbx (and %rdi) contains the poison pattern for free'd memory (0x6b6b6b..).
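
The poison observation can be checked mechanically. The following is a minimal sketch, assuming the SLUB free-poison byte is 0x6b (POISON_FREE in include/linux/poison.h). The helper names and the offset bound are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte that SLUB writes over freed objects (POISON_FREE in poison.h). */
#define POISON_FREE_BYTE 0x6bULL

/* Build a 64-bit word with every byte set to the poison value. */
uint64_t poison_word(void)
{
	return POISON_FREE_BYTE * 0x0101010101010101ULL;
}

/*
 * Hypothetical helper: does 'reg' look like a pointer that was loaded from
 * freed (poisoned) memory and then had a small field offset added to it?
 * 'max_offset' bounds the structure offsets we are willing to accept.
 */
bool looks_like_poisoned_pointer(uint64_t reg, uint64_t max_offset)
{
	uint64_t base = poison_word();

	return reg >= base && reg - base <= max_offset;
}

/* Example with the RBX value from the oops quoted above. */
void example_check_rbx(void)
{
	assert(looks_like_poisoned_pointer(0x6b6b6b6b6b6b6be3ULL, 0x1000));
}
```

With the RBX value from the report, the distance from the all-0x6b word is 0x78, which is consistent with a field at a small offset inside an already-freed object. The value is also a non-canonical x86_64 address, which fits the "general protection fault" rather than a page fault. Which structure was freed is not identified by this check.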

> [ 144.512026] Process rmmod (pid: 2597, threadinfo ffff88006df1a000, task ffff88006dec3000)

.. and that's likely not a very commonly tested case.

> [ 144.512026] [<ffffffffa06ace26>] configfs_unregister_subsystem+0x105/0x194 [configfs]
> [ 144.512026] [<ffffffffa06baf55>] target_core_exit_configfs+0x185/0x1eb [target_core_mod]
> [ 144.512026] [<ffffffff810d46a8>] sys_delete_module+0x2d6/0x368

The target_core_exit_configfs() code looks _very_ broken. It looks
broken for two reasons:

- it's very different from the cleanup code for the "failed to init"
case in target_core_init_configfs, which does a lot less (see the
"out:" code there)

- it seems to do a lot of manual freeing of the
"su_group.default_groups" stuff etc, which is all internal configfs
stuff, and seems to be used by the register/unregister phases.

So somebody who knows configfs better should really check that
cleanup, but it looks like target-core is just totally broken for the
rmmod case.

Added more people to the cc. Nicholas, Joel and James. Guys: please
check the insmod/rmmod case with
(a) spinlock debugging and lockdep enabled
(b) SLUB poisoning enabled.
ie all of these should be on:

CONFIG_SLUB_DEBUG_ON=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_STACKTRACE=y

and you might also want to add CONFIG_DEBUG_PAGEALLOC to the mix.

Linus

2011-02-09 19:44:26

by Linus Torvalds

Subject: Re: Linux 2.6.38-rc4 (hysdn: BUG)

On Wed, Feb 9, 2011 at 9:24 AM, Randy Dunlap <[email protected]> wrote:
>
> on x86_64. no HYSDN hardware found (correct).
> Nearly allmodconfig.
>
>
> [ 65.397577] HYSDN: module Rev: 1.6.6.6 loaded
> [ 65.397584] HYSDN: network interface Rev: 1.8.6.4
> [ 65.398057] HYSDN: 0 card(s) found.
> [ 65.398121] BUG: unable to handle kernel paging request at ffffffffa06c99f0
> [ 65.398269] IP: [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
> [ 65.398379] PGD 1a14067 PUD 1a18063 PMD 6f6c1067 PTE 800000006ce8c161
> [ 65.398613] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
> [ 65.400030]
> [ 65.400030] Pid: 2497, comm: modprobe Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
> [ 65.400030] RIP: 0010:[<ffffffffa06c68ba>] [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
> [ 65.400030] RSP: 0018:ffff88006eec1e68 EFLAGS: 00010206
> [ 65.400030] RAX: ffffffffa06c99f1 RBX: ffffffffa06c99e9 RCX: ffff88007c4159a0

The instruction sequence decodes to

1e: be 24 00 00 00 mov $0x24,%esi
23: 48 89 df mov %rbx,%rdi
26: e8 5b 39 c0 e0 callq 0xffffffffe0c03986
2b:* c6 40 ff 00 movb $0x0,-0x1(%rax) <-- trapping instruction

which seems to be this

p = strchr(rev, '$');
*--p = 0;

code. And yes, it's total crap, because while "p" and "rev" are "char
*", the string that is passed in is actually of type "const char *",
so that function is seriously broken. It's also seriously broken to
not test that "p" is non-NULL - the function would just break if there
is a colon in the string but not a '$'.

And hysdn_procconf_init() passes in a constant string to the thing:

static char *hysdn_procconf_revision = "$Revision: 1.8.6.4 $";

What happens is that it breaks when we mark the constant section as
read-only, because you have CONFIG_DEBUG_RODATA enabled.
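
The "Oops: 0003" and PTE values in the quoted report are consistent with that reading. As a rough sketch, the low bits of the x86 page-fault error code and the writable bit of a page-table entry can be decoded like this (the struct and function names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code, printed after "Oops:". */
#define PF_PROT  (1u << 0) /* set: protection violation; clear: page not present */
#define PF_WRITE (1u << 1) /* set: write access; clear: read access */
#define PF_USER  (1u << 2) /* set: fault taken in user mode */

/* Bit 1 of an x86 page-table entry is the read/write (writable) bit. */
#define PTE_RW   (1ULL << 1)

struct pf_info {
	bool protection_violation;
	bool write;
	bool user_mode;
};

/* Split an error code into the three flags defined above. */
struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info = {
		.protection_violation = (code & PF_PROT) != 0,
		.write = (code & PF_WRITE) != 0,
		.user_mode = (code & PF_USER) != 0,
	};
	return info;
}

/* True if the page-table entry allows writes. */
bool pte_is_writable(uint64_t pte)
{
	return (pte & PTE_RW) != 0;
}

/* Example with the values from the hysdn oops quoted above. */
void example_decode_hysdn_oops(void)
{
	struct pf_info info = decode_pf_error(0x0003);

	assert(info.protection_violation && info.write && !info.user_mode);
	assert(!pte_is_writable(0x800000006ce8c161ULL));
}
```

For error code 0003 this gives a kernel-mode write to a page that is present but not writable, and the PTE from the report has its writable bit clear. The faulting address ffffffffa06c99f0 is also RAX minus one, matching the "movb $0x0,-0x1(%rax)" instruction.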

So the fix seems to be to
- fix the prototype for hysdn_getrev() to not have "const".
- fix hysdn_procconf_init() to not pass in a constant string to it
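
For illustration only, here is a userspace sketch of a parser that avoids both problems: it never writes to its input and it checks for a missing '$'. This is not the attached patch (which is not reproduced in this archive). The name get_revision(), the caller-supplied buffer, and the "???" fallback are all assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extract "1.8.6.4" from "$Revision: 1.8.6.4 $" into
 * 'out' without modifying 'revision'. Falls back to "???" when the string
 * has no ':' or no closing '$', or when the result does not fit.
 */
const char *get_revision(const char *revision, char *out, size_t outlen)
{
	const char *start, *end;
	size_t len;

	assert(revision != NULL && out != NULL && outlen > 0);

	start = strchr(revision, ':');
	if (!start)
		goto unknown;
	start++;                          /* skip the ':' itself */
	while (*start == ' ')
		start++;                  /* skip spaces after the ':' */

	end = strchr(start, '$');
	if (!end)
		goto unknown;             /* the case the old code did not check */
	while (end > start && end[-1] == ' ')
		end--;                    /* trim spaces before the '$' */

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		goto unknown;

	memcpy(out, start, len);
	out[len] = '\0';
	return out;

unknown:
	strncpy(out, "???", outlen - 1);
	out[outlen - 1] = '\0';
	return out;
}
```

Because the result is copied into a buffer owned by the caller, the input can stay "const char *" and can live in read-only data. A real fix could equally keep the in-place edit and pass a writable array instead; this sketch does not claim to match what was actually merged.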

The minimal patch would appear to be something like the appended. UNTESTED!

Btw, all of this code seems to go back to before the git history even
started, so it doesn't seem to be new. I assume you haven't tried
booting these all-module kernels before? Or is it just the
DEBUG_RODATA thing that is new for you?

Linus


Attachments:
patch.diff (1.65 kB)

2011-02-09 20:02:43

by Nicholas A. Bellinger

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

On Wed, 2011-02-09 at 11:00 -0800, Linus Torvalds wrote:
> On Wed, Feb 9, 2011 at 9:28 AM, Randy Dunlap <[email protected]> wrote:
> > x86_64, nearly allmodconfig. No target hardware.
> >
> >
> > [ 144.508473] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> > [ 144.509901] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
> > [ 144.512026] CPU 1
> > [ 144.512026]
> > [ 144.512026] Pid: 2597, comm: rmmod Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
> > [ 144.512026] RIP: 0010:[<ffffffff810c3e5f>] [<ffffffff810c3e5f>] __lock_acquire+0xd8/0x4e8
> > [ 144.512026] RSP: 0018:ffff88006df1bb78 EFLAGS: 00010006
> > [ 144.512026] RAX: 0000000000000002 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
>
> The code disassembles to
>
> 0: 8d 01 lea (%rcx),%eax
> 2: e8 6c b1 fb ff callq 0xfffffffffffbb173
> 7: 48 ff 05 8b 32 8d 01 incq 0x18d328b(%rip) # 0x18d3299
> e: 48 ff 05 8c 32 8d 01 incq 0x18d328c(%rip) # 0x18d32a1
> 15: 48 ff 05 95 32 8d 01 incq 0x18d3295(%rip) # 0x18d32b1
> 1c: e9 e3 03 00 00 jmpq 0x404
> 21: 48 ff 05 81 32 8d 01 incq 0x18d3281(%rip) # 0x18d32a9
> 28:* 48 81 3b 40 5f 26 82 cmpq $0xffffffff82265f40,(%rbx) <-- trapping instruction
> 2f: 75 07 jne 0x38
> 31: 48 ff 05 81 32 8d 01 incq 0x18d3281(%rip) # 0x18d32b9
> 38: 83 fe 01 cmp $0x1,%esi
>
> and %rbx (and %rdi) contains the poison pattern for free'd memory (0x6b6b6b..).
>
> > [ 144.512026] Process rmmod (pid: 2597, threadinfo ffff88006df1a000, task ffff88006dec3000)
>
> .. and that's likely not a very commonly tested case.
>
> > [ 144.512026] [<ffffffffa06ace26>] configfs_unregister_subsystem+0x105/0x194 [configfs]
> > [ 144.512026] [<ffffffffa06baf55>] target_core_exit_configfs+0x185/0x1eb [target_core_mod]
> > [ 144.512026] [<ffffffff810d46a8>] sys_delete_module+0x2d6/0x368
>
> The target_core_exit_configfs() code looks _very_ broken. It looks
> broken for two reasons:
>
> - it's very different from the cleanup code for the "failed to init"
> case in target_core_init_configfs, which does a lot less (see the
> "out:" code there)
>

When registering a top level struct configfs_subsystem to appear under

/sys/kernel/config/$SUBSYSTEM

the releasing of the top-level default group via
configfs_unregister_subsystem() during a failure in
target_core_init_configfs() is done for us, but we are still missing the
extra config_item_put()'s on the sub top-level groups (Joel, please
correct me)

The original 'out:' failure path code does not call config_item_put() on
these default groups, because config_group_init_type_name() has only
initialized struct config_group until configfs_register_subsystem() is
called to register the top level struct config_subsystem.

With the current 'out:' path being broken, to address the first point I
think moving the following code chunk in target_core_init_configfs to
before the configfs_register_subsystem() would make sense so that
configfs_register_subsystem() will fail last:

/*
* Register built-in RAMDISK subsystem logic for virtual LUN 0
*/
ret = rd_module_init();
if (ret < 0)
goto out;

if (core_dev_setup_virtual_lun0() < 0)
goto out;

return 0;

However looking at fs/configfs/dir.c:configfs_register_subsystem(), I
think the caller is still expected to release any sub top-level struct
config_group->default_groups[] w/ config_item_put() even though
unlink_group() is called from the configfs_attach_group() failure path..
(Joel..?)
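
The ordering idea above, running the steps that can fail before the registration call so that the error path has less to undo, can be sketched with stub functions. Everything here is a hypothetical stand-in: setup_ramdisk(), setup_lun0(), register_subsystem() and the fail_* flags are not the target_core functions, and the sketch makes no claim about what configfs itself requires from the caller on failure:

```c
#include <assert.h>
#include <stdbool.h>

/* Failure knobs and state for the stubs; all names are hypothetical. */
bool fail_ramdisk, fail_lun0, fail_register;
int ramdisk_ready, lun0_ready, subsystem_registered;

int setup_ramdisk(void)
{
	if (fail_ramdisk)
		return -1;
	ramdisk_ready = 1;
	return 0;
}

void teardown_ramdisk(void)
{
	assert(ramdisk_ready);
	ramdisk_ready = 0;
}

int setup_lun0(void)
{
	if (fail_lun0)
		return -1;
	lun0_ready = 1;
	return 0;
}

void teardown_lun0(void)
{
	assert(lun0_ready);
	lun0_ready = 0;
}

int register_subsystem(void)
{
	if (fail_register)
		return -1;
	subsystem_registered = 1;
	return 0;
}

int init_sketch(void)
{
	int ret;

	ret = setup_ramdisk();
	if (ret < 0)
		goto out;

	ret = setup_lun0();
	if (ret < 0)
		goto out_ramdisk;

	/* Registration goes last: if it fails, only local steps are undone. */
	ret = register_subsystem();
	if (ret < 0)
		goto out_lun0;

	return 0;

out_lun0:
	teardown_lun0();
out_ramdisk:
	teardown_ramdisk();
out:
	return ret;
}
```

Each label undoes exactly the steps that succeeded before the failure, in reverse order, so no path has to unregister something that was never registered.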

> - it seems to do a lot of manual freeing of the
> "su_group.default_groups" stuff etc, which is all internal configfs
> stuff, and seems to be used by the register/unregister phases.
>

The specific rmmod issue with SLUB poisoning was reported by Fubo
Chen to linux-scsi in the last few weeks. The patch to address the proper
release of the top-level + sub top-level struct configfs_subsystem's
default_groups in target_core_exit_configfs() has been committed into
the upstream tree in lio-core-2.6.git/linus-38-rc3 and sent out to
linux-scsi here:

[PATCH] target: Fix top-level configfs_subsystem default_group shutdown breakage
http://marc.info/?l=linux-scsi&m=129662389218924&w=2

> So somebody who knows configfs better should really check that
> cleanup, but it looks like target-core is just totally broken for the
> rmmod case.
>
> Added more people to the cc. Nicholas, Joel and James. Guys: please
> check the insmod/rmmod case with
> (a) spinlock debugging and lockdep enabled
> (b) SLUB poisoning enabled.
> ie all of these should be on:
>
> CONFIG_SLUB_DEBUG_ON=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_LOCKDEP=y
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> CONFIG_STACKTRACE=y
>
> and you might also want to add CONFIG_DEBUG_PAGEALLOC to the mix.
>

<nod> I believe the above patch resolves the specific rmmod issue.
However, during SLUB poisoning testing we also came across errors with
the incorrect use of struct config_item_operations->release() in the
target_core_configfs.c and target_core_fabric_configfs.c code. The
patches to address these were included in the last series to James here:

[PATCH 00/12] target: Updates for .38-rc4
http://marc.info/?l=linux-scsi&m=129680191624837&w=2

Note that this series for-38 mainline needs to be applied on top of the
original update series after the drivers/target/ mainline merge:

[PATCH 00/24] target updates for .38-rc3 (v2)
http://marc.info/?l=linux-scsi&m=129632617326015&w=2

The entire series is available from

git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-38-rc4

James, please review + sign-off so we can get these updates into mainline.

2011-02-09 20:14:01

by James Bottomley

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

On Wed, 2011-02-09 at 12:02 -0800, Nicholas A. Bellinger wrote:
> On Wed, 2011-02-09 at 11:00 -0800, Linus Torvalds wrote:
> > On Wed, Feb 9, 2011 at 9:28 AM, Randy Dunlap <[email protected]> wrote:
> > > x86_64, nearly allmodconfig. No target hardware.
> > >
> > >
> > > [ 144.508473] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> > > [ 144.509901] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
> > > [ 144.512026] CPU 1
> > > [ 144.512026]
> > > [ 144.512026] Pid: 2597, comm: rmmod Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
> > > [ 144.512026] RIP: 0010:[<ffffffff810c3e5f>] [<ffffffff810c3e5f>] __lock_acquire+0xd8/0x4e8
> > > [ 144.512026] RSP: 0018:ffff88006df1bb78 EFLAGS: 00010006
> > > [ 144.512026] RAX: 0000000000000002 RBX: 6b6b6b6b6b6b6be3 RCX: 0000000000000000
> >
> > The code disassembles to
> >
> > 0: 8d 01 lea (%rcx),%eax
> > 2: e8 6c b1 fb ff callq 0xfffffffffffbb173
> > 7: 48 ff 05 8b 32 8d 01 incq 0x18d328b(%rip) # 0x18d3299
> > e: 48 ff 05 8c 32 8d 01 incq 0x18d328c(%rip) # 0x18d32a1
> > 15: 48 ff 05 95 32 8d 01 incq 0x18d3295(%rip) # 0x18d32b1
> > 1c: e9 e3 03 00 00 jmpq 0x404
> > 21: 48 ff 05 81 32 8d 01 incq 0x18d3281(%rip) # 0x18d32a9
> > 28:* 48 81 3b 40 5f 26 82 cmpq $0xffffffff82265f40,(%rbx) <-- trapping instruction
> > 2f: 75 07 jne 0x38
> > 31: 48 ff 05 81 32 8d 01 incq 0x18d3281(%rip) # 0x18d32b9
> > 38: 83 fe 01 cmp $0x1,%esi
> >
> > and %rbx (and %rdi) contains the poison pattern for free'd memory (0x6b6b6b..).
> >
> > > [ 144.512026] Process rmmod (pid: 2597, threadinfo ffff88006df1a000, task ffff88006dec3000)
> >
> > .. and that's likely not a very commonly tested case.
> >
> > > [ 144.512026] [<ffffffffa06ace26>] configfs_unregister_subsystem+0x105/0x194 [configfs]
> > > [ 144.512026] [<ffffffffa06baf55>] target_core_exit_configfs+0x185/0x1eb [target_core_mod]
> > > [ 144.512026] [<ffffffff810d46a8>] sys_delete_module+0x2d6/0x368
> >
> > The target_core_exit_configfs() code looks _very_ broken. It looks
> > broken for two reasons:
> >
> > - it's very different from the cleanup code for the "failed to init"
> > case in target_core_init_configfs, which does a lot less (see the
> > "out:" code there)
> >
>
> When registering a top level struct configfs_subsystem to appear under
>
> /sys/kernel/config/$SUBSYSTEM
>
> the releasing of the top-level default group via
> configfs_unregister_subsystem() during a failure in
> target_core_init_configfs() is done for us, but we are still missing the
> extra config_item_put()'s on the sub top-level groups (Joel, please
> correct me)
>
> The original 'out:' failure path code does not call config_item_put() on
> these default groups, because config_group_init_type_name() has only
> initialized struct config_group until configfs_register_subsystem() is
> called to register the top level struct config_subsystem.
>
> With the current 'out:' path being broken, to address the first point I
> think moving the following code chunk in target_core_init_configfs to
> before the configfs_register_subsystem() would make sense so that
> configfs_register_subsystem() will fail last:
>
> /*
> * Register built-in RAMDISK subsystem logic for virtual LUN 0
> */
> ret = rd_module_init();
> if (ret < 0)
> goto out;
>
> if (core_dev_setup_virtual_lun0() < 0)
> goto out;
>
> return 0;
>
> However looking at fs/configfs/dir.c:configfs_register_subsystem(), I
> think the caller is still expected to release any sub top-level struct
> config_group->default_groups[] w/ config_item_put() even though
> unlink_group() is called from the configfs_attach_group() failure path..
> (Joel..?)
>
> > - it seems to do a lot of manual freeing of the
> > "su_group.default_groups" stuff etc, which is all internal configfs
> > stuff, and seems to be used by the register/unregister phases.
> >
>
> The specific rmmod issue with SLUB poisoning was reported by Fubo
> Chen to linux-scsi in the last few weeks. The patch to address the proper
> release of the top-level + sub top-level struct configfs_subsystem's
> default_groups in target_core_exit_configfs() has been committed into
> the upstream tree in lio-core-2.6.git/linus-38-rc3 and sent out to
> linux-scsi here:
>
> [PATCH] target: Fix top-level configfs_subsystem default_group shutdown breakage
> http://marc.info/?l=linux-scsi&m=129662389218924&w=2
>
> > So somebody who knows configfs better should really check that
> > cleanup, but it looks like target-core is just totally broken for the
> > rmmod case.
> >
> > Added more people to the cc. Nicholas, Joel and James. Guys: please
> > check the insmod/rmmod case with
> > (a) spinlock debugging and lockdep enabled
> > (b) SLUB poisoning enabled.
> > ie all of these should be on:
> >
> > CONFIG_SLUB_DEBUG_ON=y
> > CONFIG_DEBUG_SPINLOCK=y
> > CONFIG_DEBUG_MUTEXES=y
> > CONFIG_DEBUG_LOCK_ALLOC=y
> > CONFIG_PROVE_LOCKING=y
> > CONFIG_LOCKDEP=y
> > CONFIG_DEBUG_LOCKDEP=y
> > CONFIG_TRACE_IRQFLAGS=y
> > CONFIG_DEBUG_SPINLOCK_SLEEP=y
> > CONFIG_STACKTRACE=y
> >
> > and you might also want to add CONFIG_DEBUG_PAGEALLOC to the mix.
> >
>
> <nod> I believe the above patch resolves the specific rmmod issue.
> However, during SLUB poisoning testing we also came across errors with
> the incorrect use of struct config_item_operations->release() in
> target_core_configfs.c and target_core_fabric_configfs.c code. The
> series to address these was included in the last series to James here:
>
> [PATCH 00/12] target: Updates for .38-rc4
> http://marc.info/?l=linux-scsi&m=129680191624837&w=2
>
> Note that this series for-38 mainline needs to be applied on top of the
> original update series after the drivers/target/ mainline merge:
>
> [PATCH 00/24] target updates for .38-rc3 (v2)
> http://marc.info/?l=linux-scsi&m=129632617326015&w=2
>
> The entire series is available from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-38-rc4
>
> James, please review + sign-off so we can get these updates into mainline.

Firstly, could we get the serious bug fixes identified and separated
from the general enhancement updates, so they can go in a fixes tree
without depending on enhancements? The former category would include
the /proc interface removal, since we don't want the legacy interface to
be in a released kernel.

James

2011-02-09 20:20:42

by Nicholas A. Bellinger

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

On Wed, 2011-02-09 at 14:13 -0600, James Bottomley wrote:
> On Wed, 2011-02-09 at 12:02 -0800, Nicholas A. Bellinger wrote:
> > On Wed, 2011-02-09 at 11:00 -0800, Linus Torvalds wrote:
> > > On Wed, Feb 9, 2011 at 9:28 AM, Randy Dunlap <[email protected]> wrote:
> > > > x86_64, nearly allmodconfig. No target hardware.

<SNIP>

> > <nod> I believe the above patch resolves the specific rmmod issue.
> > However, during SLUB poisoning testing we also came across errors with
> > the incorrect use of struct config_item_operations->release() in
> > target_core_configfs.c and target_core_fabric_configfs.c code. The
> > series to address these was included in the last series to James here:
> >
> > [PATCH 00/12] target: Updates for .38-rc4
> > http://marc.info/?l=linux-scsi&m=129680191624837&w=2
> >
> > Note that this series for-38 mainline needs to be applied on top of the
> > original update series after the drivers/target/ mainline merge:
> >
> > [PATCH 00/24] target updates for .38-rc3 (v2)
> > http://marc.info/?l=linux-scsi&m=129632617326015&w=2
> >
> > The entire series is available from
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-38-rc4
> >
> > James, please review + sign-off so we can get these updates into mainline.
>
> Firstly, could we get the serious bug fixes identified and separated
> from the general enhancement updates, so they can go in a fixes tree
> without depending on enhancements? The former category would include
> the /proc interface removal, since we don't want the legacy interface to
> be in a released kernel.
>

Everything in those two series should be considered bug fixes and
immediate for-38 mainline material.

The target_core_mib.c statistics logic using procfs seq_list() has been
removed in [PATCH 12/12] of the most recent series above.

Thanks,

--nab

2011-02-09 20:29:05

by James Bottomley

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

On Wed, 2011-02-09 at 12:20 -0800, Nicholas A. Bellinger wrote:
> On Wed, 2011-02-09 at 14:13 -0600, James Bottomley wrote:
> > Firstly, could we get the serious bug fixes identified and separated
> > from the general enhancement updates, so they can go in a fixes tree
> > without depending on enhancements? The former category would include
> > the /proc interface removal, since we don't want the legacy interface to
> > be in a released kernel.
> >
>
> Everything in those two series should be considered bug fixes and
> immediate for-38 mainline material.

Things like this:

target: remove EXTRA_CFLAGS
target: Remove unnecessary container_of() pointer check
target: Remove unnecessary se_clear_dev_ports legacy code
target: Remove spurious double cast from structure macro accessors
target: Convert TMR REQ/RSP definitions to target namespace
target: Minor sparse warning fixes and annotations
target: Remove unneeded test of se_cmd

Are not serious bug fixes. I could go either way on some of the error path changes.

James

2011-02-09 20:44:52

by Nicholas A. Bellinger

Subject: Re: Linux 2.6.38-rc4 (target_core: rmmod GP fault)

On Wed, 2011-02-09 at 14:28 -0600, James Bottomley wrote:
> On Wed, 2011-02-09 at 12:20 -0800, Nicholas A. Bellinger wrote:
> > On Wed, 2011-02-09 at 14:13 -0600, James Bottomley wrote:
> > > Firstly, could we get the serious bug fixes identified and separated
> > > from the general enhancement updates, so they can go in a fixes tree
> > > without depending on enhancements? The former category would include
> > > the /proc interface removal, since we don't want the legacy interface to
> > > be in a released kernel.
> > >
> >
> > Everything in those two series should be considered bug fixes and
> > immediate for-38 mainline material.
>
> Things like this:
>
> target: remove EXTRA_CFLAGS
> target: Remove unnecessary container_of() pointer check
> target: Remove unnecessary se_clear_dev_ports legacy code
> target: Remove spurious double cast from structure macro accessors
> target: Convert TMR REQ/RSP definitions to target namespace

This is an important one: without the TMR_* prefix on the task
management request/response definitions, we run into problems with the
existing include/scsi/scsi.h message codes.

> target: Minor sparse warning fixes and annotations
> target: Remove unneeded test of se_cmd
>
> Are not serious bug fixes. I could go either way on some of the error path changes.
>

Yes, these others are minor items that have been submitted by people
(hch, roland, jesper, danc) who have been reviewing target code since it
was merged for .38-rc1.

Considering how minor these are, I would prefer to have them merged
rather than deferred to for-39. If not, I will need to respin the tree
without their cleanups, if that is what you prefer to be sent to Linus
for-38. There are also 3 other bugfix patches in the upstream LIO tree
that I have been saving for-38-rc5 since for-38-rc4 was cut (considering
the two outstanding series). I would like to have these included as well.

Do you want me to send them all to the linux-scsi list again...?

--nab






2011-02-09 21:27:48

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (hysdn: BUG)

On Wed, 9 Feb 2011 11:44:00 -0800 Linus Torvalds wrote:

> On Wed, Feb 9, 2011 at 9:24 AM, Randy Dunlap <[email protected]> wrote:
> >
> > on x86_64. no HYSDN hardware found (correct).
> > Nearly allmodconfig.
> >
> >
> > [ 65.397577] HYSDN: module Rev: 1.6.6.6 loaded
> > [ 65.397584] HYSDN: network interface Rev: 1.8.6.4
> > [ 65.398057] HYSDN: 0 card(s) found.
> > [ 65.398121] BUG: unable to handle kernel paging request at ffffffffa06c99f0
> > [ 65.398269] IP: [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
> > [ 65.398379] PGD 1a14067 PUD 1a18063 PMD 6f6c1067 PTE 800000006ce8c161
> > [ 65.398613] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
> > [ 65.400030]
> > [ 65.400030] Pid: 2497, comm: modprobe Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
> > [ 65.400030] RIP: 0010:[<ffffffffa06c68ba>] [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
> > [ 65.400030] RSP: 0018:ffff88006eec1e68 EFLAGS: 00010206
> > [ 65.400030] RAX: ffffffffa06c99f1 RBX: ffffffffa06c99e9 RCX: ffff88007c4159a0
>
> The instruction sequence decodes to
>
> 1e: be 24 00 00 00 mov $0x24,%esi
> 23: 48 89 df mov %rbx,%rdi
> 26: e8 5b 39 c0 e0 callq 0xffffffffe0c03986
> 2b:* c6 40 ff 00 movb $0x0,-0x1(%rax) <-- trapping instruction
>
> which seems to be this
>
> p = strchr(rev, '$');
> *--p = 0;
>
> code. And yes, it's total crap, because while "p" and "rev" are "char
> *", the string that is passed in is actually of type "const char *",
> so that function is seriously broken. It's also seriously broken to
> not test that "p" is non-NULL - the function would just break if there
> is a colon in the string but not a '$'.
>
> And hysdn_procconf_init() passes in a constant string to the thing:
>
> static char *hysdn_procconf_revision = "$Revision: 1.8.6.4 $";
>
> What happens is that it breaks when we mark the constant section as
> read-only, because you have CONFIG_DEBUG_RODATA enabled.
>
> So the fix seems to be to
> - fix the prototype for hysdn_getrev() to not have "const".
> - fix hysdn_procconf_init() to not pass in a constant string to it
>
> The minimal patch would appear to be something like the appended. UNTESTED!

for your patch:

Tested-and-acked-by: Randy Dunlap <[email protected]>

> Btw, all of this code seems to go back to before the git history even
> started, so it doesn't seem to be new. I assume you haven't tried
> booting these all-module kernels before? Or is it just the
> DEBUG_RODATA thing that is new for you?

Neither is new. I tested and reported many-modules on 2.6.37-rc1 and
reported these 2 bugs:

https://bugzilla.kernel.org/show_bug.cgi?id=22912
https://bugzilla.kernel.org/show_bug.cgi?id=22882

and that was with CONFIG_DEBUG_RODATA=y.
I don't know how hysdn was missed at that time.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-09 21:57:17

by David Miller

Subject: Re: Linux 2.6.38-rc4 (hysdn: BUG)

From: Randy Dunlap <[email protected]>
Date: Wed, 9 Feb 2011 13:25:29 -0800

> On Wed, 9 Feb 2011 11:44:00 -0800 Linus Torvalds wrote:
>
>> On Wed, Feb 9, 2011 at 9:24 AM, Randy Dunlap <[email protected]> wrote:
>> >
>> > on x86_64.  no HYSDN hardware found (correct).
>> > Nearly allmodconfig.
>> >
>> >
>> > [   65.397577] HYSDN: module Rev: 1.6.6.6 loaded
>> > [   65.397584] HYSDN: network interface Rev: 1.8.6.4
>> > [   65.398057] HYSDN: 0 card(s) found.
>> > [   65.398121] BUG: unable to handle kernel paging request at ffffffffa06c99f0
>> > [   65.398269] IP: [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
>> > [   65.398379] PGD 1a14067 PUD 1a18063 PMD 6f6c1067 PTE 800000006ce8c161
>> > [   65.398613] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC
>> > [   65.400030]
>> > [   65.400030] Pid: 2497, comm: modprobe Not tainted 2.6.38-rc4 #1 0TY565/OptiPlex 745
>> > [   65.400030] RIP: 0010:[<ffffffffa06c68ba>]  [<ffffffffa06c68ba>] hysdn_getrev+0x2e/0x50 [hysdn]
>> > [   65.400030] RSP: 0018:ffff88006eec1e68  EFLAGS: 00010206
>> > [   65.400030] RAX: ffffffffa06c99f1 RBX: ffffffffa06c99e9 RCX: ffff88007c4159a0
>>
>> The instruction sequence decodes to
>>
>> 1e: be 24 00 00 00 mov $0x24,%esi
>> 23: 48 89 df mov %rbx,%rdi
>> 26: e8 5b 39 c0 e0 callq 0xffffffffe0c03986
>> 2b:* c6 40 ff 00 movb $0x0,-0x1(%rax) <-- trapping instruction
>>
>> which seems to be this
>>
>> p = strchr(rev, '$');
>> *--p = 0;
>>
>> code. And yes, it's total crap, because while "p" and "rev" are "char
>> *", the string that is passed in is actually of type "const char *",
>> so that function is seriously broken. It's also seriously broken to
>> not test that "p" is non-NULL - the function would just break if there
>> is a colon in the string but not a '$'.
>>
>> And hysdn_procconf_init() passes in a constant string to the thing:
>>
>> static char *hysdn_procconf_revision = "$Revision: 1.8.6.4 $";
>>
>> What happens is that it breaks when we mark the constant section as
>> read-only, because you have CONFIG_DEBUG_RODATA enabled.
>>
>> So the fix seems to be to
>> - fix the prototype for hysdn_getrev() to not have "const".
>> - fix hysdn_procconf_init() to not pass in a constant string to it
>>
>> The minimal patch would appear to be something like the appended. UNTESTED!
>
> for your patch:
>
> Tested-and-acked-by: Randy Dunlap <[email protected]>

This stuff just prints out a CVS revision string that hasn't changed
in 10 years into the kernel log.

I propose we just kill this stuff off completely.

I note that there is code in other places of this driver that copies
the read-only revision string into a local string buffer and then
passes it into the hysdn_getrev() function; it just doesn't happen in
this one spot.

Anyways, I think I'll fix this like so:

--------------------
isdn: hysdn: Kill (partially buggy) CVS revision log reporting.

Some cases try to modify const strings, and in any event the
CVS revision strings have not changed in over ten years making
these printouts completely worthless.

Just kill all of this stuff off.

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
drivers/isdn/hysdn/hysdn_defs.h | 2 --
drivers/isdn/hysdn/hysdn_init.c | 26 +-------------------------
drivers/isdn/hysdn/hysdn_net.c | 3 ---
drivers/isdn/hysdn/hysdn_procconf.c | 3 +--
4 files changed, 2 insertions(+), 32 deletions(-)

diff --git a/drivers/isdn/hysdn/hysdn_defs.h b/drivers/isdn/hysdn/hysdn_defs.h
index 729df40..18b801a 100644
--- a/drivers/isdn/hysdn/hysdn_defs.h
+++ b/drivers/isdn/hysdn/hysdn_defs.h
@@ -227,7 +227,6 @@ extern hysdn_card *card_root; /* pointer to first card */
/*************************/
/* im/exported functions */
/*************************/
-extern char *hysdn_getrev(const char *);

/* hysdn_procconf.c */
extern int hysdn_procconf_init(void); /* init proc config filesys */
@@ -259,7 +258,6 @@ extern int hysdn_tx_cfgline(hysdn_card *, unsigned char *,

/* hysdn_net.c */
extern unsigned int hynet_enable;
-extern char *hysdn_net_revision;
extern int hysdn_net_create(hysdn_card *); /* create a new net device */
extern int hysdn_net_release(hysdn_card *); /* delete the device */
extern char *hysdn_net_getname(hysdn_card *); /* get name of net interface */
diff --git a/drivers/isdn/hysdn/hysdn_init.c b/drivers/isdn/hysdn/hysdn_init.c
index b7cc5c2..0ab42ac 100644
--- a/drivers/isdn/hysdn/hysdn_init.c
+++ b/drivers/isdn/hysdn/hysdn_init.c
@@ -36,7 +36,6 @@ MODULE_DESCRIPTION("ISDN4Linux: Driver for HYSDN cards");
MODULE_AUTHOR("Werner Cornelius");
MODULE_LICENSE("GPL");

-static char *hysdn_init_revision = "$Revision: 1.6.6.6 $";
static int cardmax; /* number of found cards */
hysdn_card *card_root = NULL; /* pointer to first card */
static hysdn_card *card_last = NULL; /* pointer to first card */
@@ -49,25 +48,6 @@ static hysdn_card *card_last = NULL; /* pointer to first card */
/* Additionally newer versions may be activated without rebooting. */
/****************************************************************************/

-/******************************************************/
-/* extract revision number from string for log output */
-/******************************************************/
-char *
-hysdn_getrev(const char *revision)
-{
- char *rev;
- char *p;
-
- if ((p = strchr(revision, ':'))) {
- rev = p + 2;
- p = strchr(rev, '$');
- *--p = 0;
- } else
- rev = "???";
- return rev;
-}
-
-
/****************************************************************************/
/* init_module is called once when the module is loaded to do all necessary */
/* things like autodetect... */
@@ -175,13 +155,9 @@ static int hysdn_have_procfs;
static int __init
hysdn_init(void)
{
- char tmp[50];
int rc;

- strcpy(tmp, hysdn_init_revision);
- printk(KERN_NOTICE "HYSDN: module Rev: %s loaded\n", hysdn_getrev(tmp));
- strcpy(tmp, hysdn_net_revision);
- printk(KERN_NOTICE "HYSDN: network interface Rev: %s \n", hysdn_getrev(tmp));
+ printk(KERN_NOTICE "HYSDN: module loaded\n");

rc = pci_register_driver(&hysdn_pci_driver);
if (rc)
diff --git a/drivers/isdn/hysdn/hysdn_net.c b/drivers/isdn/hysdn/hysdn_net.c
index feec8d8..11f2cce 100644
--- a/drivers/isdn/hysdn/hysdn_net.c
+++ b/drivers/isdn/hysdn/hysdn_net.c
@@ -26,9 +26,6 @@
unsigned int hynet_enable = 0xffffffff;
module_param(hynet_enable, uint, 0);

-/* store the actual version for log reporting */
-char *hysdn_net_revision = "$Revision: 1.8.6.4 $";
-
#define MAX_SKB_BUFFERS 20 /* number of buffers for keeping TX-data */

/****************************************************************************/
diff --git a/drivers/isdn/hysdn/hysdn_procconf.c b/drivers/isdn/hysdn/hysdn_procconf.c
index 96b3e39..5fe83bd 100644
--- a/drivers/isdn/hysdn/hysdn_procconf.c
+++ b/drivers/isdn/hysdn/hysdn_procconf.c
@@ -23,7 +23,6 @@
#include "hysdn_defs.h"

static DEFINE_MUTEX(hysdn_conf_mutex);
-static char *hysdn_procconf_revision = "$Revision: 1.8.6.4 $";

#define INFO_OUT_LEN 80 /* length of info line including lf */

@@ -404,7 +403,7 @@ hysdn_procconf_init(void)
card = card->next; /* next entry */
}

- printk(KERN_NOTICE "HYSDN: procfs Rev. %s initialised\n", hysdn_getrev(hysdn_procconf_revision));
+ printk(KERN_NOTICE "HYSDN: procfs initialised\n");
return (0);
} /* hysdn_procconf_init */

--
1.7.4
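
For readers who want to see the failure mode in isolation: the removed
hysdn_getrev() wrote a NUL into its argument and never checked the second
strchr() result. A minimal userspace sketch of a non-mutating alternative
follows. The function name extract_revision() and its return convention are
hypothetical and were never part of the driver; the point is only that the
input stays "const" and the result goes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the hysdn driver): copy the revision number
 * out of a CVS keyword string such as "$Revision: 1.8.6.4 $" into a
 * caller-supplied buffer. The input is never written to, so it may safely
 * live in read-only memory.
 *
 * Returns 0 on success, -1 if the keyword is malformed or 'out' is too small.
 */
int extract_revision(const char *keyword, char *out, size_t outlen)
{
	const char *start, *end;
	size_t len;

	assert(keyword != NULL && out != NULL);

	start = strchr(keyword, ':');
	if (start == NULL)
		return -1;		/* no colon at all */
	start++;
	while (*start == ' ')
		start++;

	end = strchr(start, '$');
	if (end == NULL)
		return -1;		/* colon present, closing '$' missing */
	while (end > start && end[-1] == ' ')
		end--;			/* trim the blank before '$' */

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;		/* empty revision or buffer too small */

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

A caller would pass a small stack buffer, much like the "char tmp[50]" that
hysdn_init() used, and print the buffer only when the return value is 0.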

2011-02-09 22:00:41

by David Miller

Subject: Re: Linux 2.6.38-rc4 (other bugs)

From: Randy Dunlap <[email protected]>
Date: Wed, 9 Feb 2011 09:36:56 -0800

>
> There are also bugs or other oopses etc. with ipmi, x25 (rmmod hangs system),
> and (usb gadget) g_audio, but I don't have logs for them yet...
> Someone else could do those.

Randy if you could simply get a CPU program counter when the x25
rmmod hang happens, I can probably fix it quickly.

But if you don't have time I can fiddle around with it myself.

2011-02-09 22:01:41

by Linus Torvalds

Subject: Re: Linux 2.6.38-rc4 (hysdn: BUG)

On Wed, Feb 9, 2011 at 1:57 PM, David Miller <[email protected]> wrote:
>
> I propose we just kill this stuff off completely.
>
> I note that there is code in other places of this driver that copies
> the read-only revision string into a local string buffer and then
> passes it into the hysdn_getrev() function; it just doesn't happen in
> this one spot.
>
> Anyways, I think I'll fix this like so:

Ack from me. Removing terminally buggy and pointless code is certainly
better than trying to fix it up. No amount of lipstick will make that
pig look good.

Linus

2011-02-09 22:18:41

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (other bugs)

On Wed, 09 Feb 2011 14:01:15 -0800 (PST) David Miller wrote:

> From: Randy Dunlap <[email protected]>
> Date: Wed, 9 Feb 2011 09:36:56 -0800
>
> >
> > There are also bugs or other oopses etc. with ipmi, x25 (rmmod hangs system),
> > and (usb gadget) g_audio, but I don't have logs for them yet...
> > Someone else could do those.
>
> Randy if you could simply get a CPU program counter when the x25
> rmmod hang happens, I can probably fix it quickly.
>
> But if you don't have time I can fiddle around with it myself.

I can probably get it this (WED.) evening.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-10 05:00:12

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (other bugs: x25)

On Wed, 09 Feb 2011 14:01:15 -0800 (PST) David Miller wrote:

> From: Randy Dunlap <[email protected]>
> Date: Wed, 9 Feb 2011 09:36:56 -0800
>
> >
> > There are also bugs or other oopses etc. with ipmi, x25 (rmmod hangs system),
> > and (usb gadget) g_audio, but I don't have logs for them yet...
> > Someone else could do those.
>
> Randy if you could simply get a CPU program counter when the x25
> rmmod hang happens, I can probably fix it quickly.
>
> But if you don't have time I can fiddle around with it myself.
> --

Hi Dave,

Here's what I captured before the system hung and the beeper stayed
on constantly. ;)


[ 300.488956] calling x25_init+0x0/0x106 [x25] @ 2571
[ 300.494055] NET: Registered protocol family 9
[ 300.498609] X.25 for Linux Version 0.2
[ 300.503565] initcall x25_init+0x0/0x106 [x25] returned 0 after 9284 usecs
[ 303.931229] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 303.934923] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
[ 303.934923] CPU 1
[ 303.934923] Modules linked in: x25(-) af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod joydev mousedev evdev mac_hid snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usbmouse usbkbd usbhid snd_pcm hid snd_timer sr_mod tg3 pcspkr rtc_cmos dcdbas sg snd iTCO_wdt cdrom i2c_i801 rtc_core processor iTCO_vendor_support rtc_lib 8250_pnp soundcore thermal_sys intel_agp button intel_gtt snd_page_alloc hwmon unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
[ 303.934923]
[ 303.934923] Pid: 2573, comm: rmmod Not tainted 2.6.38-rc4 #3 0TY565/OptiPlex 745
[ 303.934923] RIP: 0010:[<ffffffffa069c131>] [<ffffffffa069c131>] x25_link_free+0x41/0x81 [x25]
[ 303.934923] RSP: 0018:ffff880061777eb8 EFLAGS: 00010202
[ 303.934923] RAX: 6b6b6b6b6b6b6b6b RBX: ffffffffa06a03d0 RCX: 0010000000004040
[ 303.934923] RDX: 00000000001c0016 RSI: ffff88005ecc6438 RDI: 0000000000000216
[ 303.934923] RBP: ffff880061777ec8 R08: ffff88007f004c80 R09: 000000000000005a
[ 303.934923] R10: ffffffff8182c266 R11: ffff880061777e18 R12: ffffffffa06a03d0
[ 303.934923] R13: 00007fff94141220 R14: 0000000000000000 R15: 0000000000000001
[ 303.934923] FS: 00007fc688ac36f0(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
[ 303.934923] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 303.934923] CR2: 00000000008faf20 CR3: 000000006873c000 CR4: 00000000000006e0
[ 303.934923] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 303.934923] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 303.934923] Process rmmod (pid: 2573, threadinfo ffff880061776000, task ffff88005cec3000)
[ 303.934923] Stack:
[ 303.934923] ffffffffa06a0b10 0000000000000000 ffff880061777ed8 ffffffffa069c085
[ 303.934923] ffff880061777f78 ffffffff810d3e94 ffffffffa06a0b10 0000000000000880
[ 303.934923] ffff880061777f14 ffffffff8155da0e ffff880061777f28 0000000081136a5f
[ 303.934923] Call Trace:
[ 303.934923] [<ffffffffa069c085>] x25_exit+0x21/0x8c [x25]
[ 303.934923] [<ffffffff810d3e94>] sys_delete_module+0x2d6/0x368
[ 303.934923] [<ffffffff8155da0e>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 303.934923] [<ffffffff810fd3f2>] ? audit_syscall_entry+0x172/0x1a5
[ 303.934923] [<ffffffff8155d998>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 303.934923] [<ffffffff8100e942>] system_call_fastpath+0x16/0x1b
[ 303.934923] Code: 56 6a 00 00 e8 6e 22 ec e0 48 8b 1d ba 42 00 00 4c 8b 23 eb 2e 48 89 df 48 ff 05 4b 6a 00 00 e8 62 d1 ff ff 48 8b 43 10 4c 89 e3 <48> 8b 80 80 04 00 00 65 ff 08 49 8b 04 24 48 ff 05 22 6a 00 00
[ 303.934923] RIP [<ffffffffa069c131>] x25_link_free+0x41/0x81 [x25]
[ 303.934923] RSP <ffff880061777eb8>


HTH.
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-10 05:48:23

by David Miller

Subject: Re: Linux 2.6.38-rc4 (other bugs: x25)

From: Randy Dunlap <[email protected]>
Date: Wed, 9 Feb 2011 20:58:42 -0800

> Here's what I captured before the system hung and the beeper stayed
> on constantly. ;)

:-)

> [ 303.931229] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 303.934923] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
> [ 303.934923] CPU 1
> [ 303.934923] Modules linked in: x25(-) af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod joydev mousedev evdev mac_hid snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usbmouse usbkbd usbhid snd_pcm hid snd_timer sr_mod tg3 pcspkr rtc_cmos dcdbas sg snd iTCO_wdt cdrom i2c_i801 rtc_core processor iTCO_vendor_support rtc_lib 8250_pnp soundcore thermal_sys intel_agp button intel_gtt snd_page_alloc hwmon unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
> [ 303.934923]
> [ 303.934923] Pid: 2573, comm: rmmod Not tainted 2.6.38-rc4 #3 0TY565/OptiPlex 745
> [ 303.934923] RIP: 0010:[<ffffffffa069c131>] [<ffffffffa069c131>] x25_link_free+0x41/0x81 [x25]

Ok, a GPF in x25_link_free().

This code simply traverses the x25_neigh_list, unlinking and releasing
each entry it finds.

Every node entry which is added to this list is a dynamically allocated
entry. See x25_link_device_up(), which is the only place where a
list_add() is performed on the x25_neigh_list.

The device should be accessible and the dev_put() should not cause
trouble because we grabbed a reference to this device when
x25_link_device_up() added the new x25_neigh to the list.

I can't see anything here that should barf like this.

I also can't see anything "const" in the x25 protocol code that might
be trampled upon.

I'm assuming in all of this that it's a write to a read-only location
which is causing this GPF, via CONFIG_DEBUG_RODATA.

Playing around with config options and looking at the various x86_64 asm
in these different cases seems to suggest that it's indeed the dev_put()
that is causing the GPF.

Network devices use per-cpu refcounts.

We know that at some point in the past, the ref bump worked, because
we did a dev_hold() when we added the referencing x25_neigh entry to
the list.

For some reason now it fails.

RAX is where the per-cpu base pointer should be, and in your dump
that's:

[ 303.934923] RAX: 6b6b6b6b6b6b6b6b RBX: ffffffffa06a03d0 RCX: 0010000000004040

Which is the SLAB free poison value.

So it seems like the network device at nb->dev has been freed for some
reason.
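
As an aside on how that register value is recognised: with slab debugging
enabled, freed objects are overwritten with the byte 0x6b, so any pointer
loaded from a freed object reads back as a run of 0x6b bytes. The userspace
sketch below only imitates that effect with memset(); the struct layouts and
helper names are made up for illustration and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* byte value the slab debug code writes into freed objects */

/* Illustrative stand-ins only; not the real net_device / x25_neigh layouts. */
struct fake_dev {
	int refcnt;
};

struct fake_neigh {
	struct fake_dev *dev;
	unsigned int state;
};

/* Imitate what happens to an object's memory once it has been freed. */
void poison_object(void *obj, size_t size)
{
	memset(obj, POISON_FREE, size);
}

/* Read the raw bits of a pointer field without dereferencing it. */
uintptr_t pointer_bits(struct fake_dev *const *field)
{
	uintptr_t raw;

	assert(sizeof(raw) == sizeof(*field));
	memcpy(&raw, field, sizeof(raw));
	return raw;
}

/* Nonzero if a pointer-sized value consists entirely of the poison byte. */
int looks_like_free_poison(uintptr_t value)
{
	uintptr_t pattern;

	memset(&pattern, POISON_FREE, sizeof(pattern));
	return value == pattern;
}
```

On a 64-bit machine the poisoned pointer field reads as 0x6b6b6b6b6b6b6b6b,
the same value seen in RAX in the dump.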

Weird....

Oh, the bug is obvious... 'nb' is freed right before we dereference
'nb->dev', duh.

Please try this fix:

--------------------
x25: Do not reference freed memory.

In x25_link_free(), we destroy 'nb' before dereferencing
'nb->dev'. Don't do this, because 'nb' might be freed
by then.

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
net/x25/x25_link.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/x25/x25_link.c b/net/x25/x25_link.c
index 4cbc942..2130692 100644
--- a/net/x25/x25_link.c
+++ b/net/x25/x25_link.c
@@ -396,9 +396,12 @@ void __exit x25_link_free(void)
write_lock_bh(&x25_neigh_list_lock);

list_for_each_safe(entry, tmp, &x25_neigh_list) {
+ struct net_device *dev;
+
nb = list_entry(entry, struct x25_neigh, node);
+ dev = nb->dev;
__x25_remove_neigh(nb);
- dev_put(nb->dev);
+ dev_put(dev);
}
write_unlock_bh(&x25_neigh_list_lock);
}
--
1.7.4
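
The ordering rule behind this fix (read every field you still need before
the object can go away) can be sketched in plain userspace C. Everything
below is a simplified stand-in: fake_neigh, fake_dev and the helper names
are hypothetical, free() plays the role of __x25_remove_neigh() dropping the
last reference, and a plain integer replaces the per-cpu device refcount.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a refcounted network device. */
struct fake_dev {
	int refcnt;
};

/* Simplified stand-in for struct x25_neigh on a singly linked list. */
struct fake_neigh {
	struct fake_neigh *next;
	struct fake_dev *dev;
};

void fake_dev_hold(struct fake_dev *dev)
{
	dev->refcnt++;
}

void fake_dev_put(struct fake_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Add a neighbour entry and take a device reference on its behalf. */
int neigh_add(struct fake_neigh **head, struct fake_dev *dev)
{
	struct fake_neigh *nb = malloc(sizeof(*nb));

	if (nb == NULL)
		return -1;
	fake_dev_hold(dev);
	nb->dev = dev;
	nb->next = *head;
	*head = nb;
	return 0;
}

/* Release every entry; returns how many were released. */
int neigh_free_all(struct fake_neigh **head)
{
	struct fake_neigh *nb = *head;
	int released = 0;

	while (nb != NULL) {
		/* Save everything we need BEFORE the entry is freed. */
		struct fake_neigh *next = nb->next;
		struct fake_dev *dev = nb->dev;

		free(nb);		/* 'nb' must not be touched after this */
		fake_dev_put(dev);	/* uses the saved pointer, not nb->dev */
		released++;
		nb = next;
	}
	*head = NULL;
	return released;
}
```

Swapping the free() and the "dev = nb->dev" load would reproduce the bug:
the code would usually still appear to work, which is why it took
DEBUG_PAGEALLOC and slab poisoning to expose it.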

2011-02-10 06:31:06

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (other bugs: x25)

On 02/09/11 21:48, David Miller wrote:
> From: Randy Dunlap <[email protected]>
> Date: Wed, 9 Feb 2011 20:58:42 -0800
>
>> Here's what I captured before the system hung and the beeper stayed
>> on constantly. ;)
>
> :-)
>
>> [ 303.931229] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>> [ 303.934923] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1/6-1.3/devnum
>> [ 303.934923] CPU 1
>> [ 303.934923] Modules linked in: x25(-) af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod joydev mousedev evdev mac_hid snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usbmouse usbkbd usbhid snd_pcm hid snd_timer sr_mod tg3 pcspkr rtc_cmos dcdbas sg snd iTCO_wdt cdrom i2c_i801 rtc_core processor iTCO_vendor_support rtc_lib 8250_pnp soundcore thermal_sys intel_agp button intel_gtt snd_page_alloc hwmon unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
>> [ 303.934923]
>> [ 303.934923] Pid: 2573, comm: rmmod Not tainted 2.6.38-rc4 #3 0TY565/OptiPlex 745
>> [ 303.934923] RIP: 0010:[<ffffffffa069c131>] [<ffffffffa069c131>] x25_link_free+0x41/0x81 [x25]
>
> Ok, a GPF in x25_link_free().
>
> This code simply traverses the x25_neigh_list, unlinking and releasing
> each entry it finds.
>
> Every node entry which is added to this list is a dynamically allocated
> entry. See x25_link_device_up(), which is the only place where a
> list_add() is performed on the x25_neigh_list.
>
> The device should be accessible and the dev_put() should not cause
> trouble because we grabbed a reference to this device when
> x25_link_device_up() added the new x25_neigh to the list.
>
> I can't see anything here that should barf like this.
>
> I also can't see anything "const" in the x25 protocol code that might
> be trampled upon.
>
> I'm assuming in all of this that it's a write to a read-only location
> which is causing this GPF, via CONFIG_DEBUG_RODATA.
>
> Playing around with config options and looking at the various x86_64 asm
> in these different cases seems to suggest that it's indeed the dev_put()
> that is causing the GPF.
>
> Network devices use per-cpu refcounts.
>
> We know that at some point in the past, the ref bump worked, because
> we did a dev_hold() when we added the referencing x25_neigh entry to
> the list.
>
> For some reason now it fails.
>
> RAX is where the per-cpu base pointer should be, and in your dump
> that's:
>
> [ 303.934923] RAX: 6b6b6b6b6b6b6b6b RBX: ffffffffa06a03d0 RCX: 0010000000004040
>
> Which is the SLAB free poison value.
>
> So it seems like the network device at nb->dev has been freed for some
> reason.
>
> Weird....
>
> Oh, the bug is obvious... 'nb' is freed right before we dereference
> 'nb->dev', duh.
>
> Please try this fix:

Yes, that survives 5 loads/rmmods. Thanks.

Tested-and-acked-by: Randy Dunlap <[email protected]>


> --------------------
> x25: Do not reference freed memory.
>
> In x25_link_free(), we destroy 'nb' before dereferencing
> 'nb->dev'. Don't do this, because 'nb' might be freed
> by then.
>
> Reported-by: Randy Dunlap <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> ---
> net/x25/x25_link.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/net/x25/x25_link.c b/net/x25/x25_link.c
> index 4cbc942..2130692 100644
> --- a/net/x25/x25_link.c
> +++ b/net/x25/x25_link.c
> @@ -396,9 +396,12 @@ void __exit x25_link_free(void)
> write_lock_bh(&x25_neigh_list_lock);
>
> list_for_each_safe(entry, tmp, &x25_neigh_list) {
> + struct net_device *dev;
> +
> nb = list_entry(entry, struct x25_neigh, node);
> + dev = nb->dev;
> __x25_remove_neigh(nb);
> - dev_put(nb->dev);
> + dev_put(dev);
> }
> write_unlock_bh(&x25_neigh_list_lock);
> }


--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-10 06:34:27

by David Miller

Subject: Re: Linux 2.6.38-rc4 (other bugs: x25)

From: Randy Dunlap <[email protected]>
Date: Wed, 09 Feb 2011 22:29:50 -0800

> On 02/09/11 21:48, David Miller wrote:
>> Please try this fix:
>
> Yes, that survives 5 loads/rmmods. Thanks.
>
> Tested-and-acked-by: Randy Dunlap <[email protected]>

Thanks a lot Randy!

2011-02-10 19:36:43

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (other bugs: ipmi Oops)

On Wed, 9 Feb 2011 09:36:56 -0800 Randy Dunlap wrote:

>
> There are also bugs or other oopses etc. with ipmi, x25 (rmmod hangs system),
> and (usb gadget) g_audio, but I don't have logs for them yet...
> Someone else could do those.
>
> Oh, and ext4, which was reported on the ext4 mailing list.


Loading ipmi_si module a second time causes an Oops:


[ 66.603456] calling ipmi_init_msghandler_mod+0x0/0x1000 [ipmi_msghandler] @ 2443
[ 66.611879] ipmi message handler version 39.2
[ 66.617403] initcall ipmi_init_msghandler_mod+0x0/0x1000 [ipmi_msghandler] returned 0 after 5548 usecs
[ 66.707849] calling init_ipmi_si+0x0/0x4cd [ipmi_si] @ 2443
[ 66.713933] IPMI System Interface driver.
[ 66.718605] ipmi_si: Adding default-specified kcs state machine
[ 66.724835] ipmi_si: Trying default-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
[ 66.735302] ipmi_si: Interface detection failed
[ 66.762751] ipmi_si: Adding default-specified smic state machine
[ 66.769292] ipmi_si: Trying default-specified smic state machine at i/o address 0xca9, slave address 0x0, irq 0
[ 66.779869] ipmi_si: Interface detection failed
[ 66.800186] ipmi_si: Adding default-specified bt state machine
[ 66.806567] ipmi_si: Trying default-specified bt state machine at i/o address 0xe4, slave address 0x0, irq 0
[ 66.816866] ipmi_si: Interface detection failed
[ 66.836603] ipmi_si: Unable to find any System Interface(s)
[ 66.842704] initcall init_ipmi_si+0x0/0x4cd [ipmi_si] returned -19 after 125889 usecs
[ 68.105422] calling init_ipmi_si+0x0/0x4cd [ipmi_si] @ 2453
[ 68.111878] IPMI System Interface driver.
[ 68.116576] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 68.120143] last sysfs file: /sys/module/ipmi_msghandler/initstate
[ 68.120143] CPU 0
[ 68.120143] Modules linked in: ipmi_si(+) ipmi_msghandler af_packet nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod snd_hda_codec_analog joydev mousedev evdev mac_hid snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device usbmouse snd_pcm usbkbd usbhid snd_timer hid snd sr_mod tg3 dcdbas cdrom pcspkr rtc_cmos i2c_i801 soundcore sg iTCO_wdt rtc_core processor snd_page_alloc iTCO_vendor_support intel_agp rtc_lib thermal_sys 8250_pnp intel_gtt button hwmon unix ide_pci_generic ide_core ata_generic pata_acpi ata_piix sd_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ssb mmc_core pcmcia pcmcia_core firmware_class ehci_hcd usbcore nls_base [last unloaded: microcode]
[ 68.120143]
[ 68.120143] Pid: 2453, comm: modprobe Not tainted 2.6.38-rc4 #4 0TY565/OptiPlex 745
[ 68.120143] RIP: 0010:[<ffffffff813fc579>] [<ffffffff813fc579>] put_driver+0x10/0x22
[ 68.120143] RSP: 0018:ffff88005ee8be98 EFLAGS: 00010202
[ 68.120143] RAX: ffffffffa06a8430 RBX: ffffffffa06bd430 RCX: ffff88007c4187b0
[ 68.120143] RDX: 00000000000b000a RSI: 0000000fdc10217c RDI: ffffffffa06a8430
[ 68.120143] RBP: ffff88005ee8be98 R08: 0000000000000000 R09: 0000000000000000
[ 68.120143] R10: ffffffff812be7f7 R11: ffff88005ee8bcd8 R12: ffffffffa06b86e9
[ 68.120143] R13: 0000000000000000 R14: 0000000fdbc6e8dc R15: 0000000000000000
[ 68.120143] FS: 00007fd2014156f0(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 68.120143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 68.120143] CR2: ffffffffa06a8490 CR3: 000000005cec7000 CR4: 00000000000006f0
[ 68.120143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 68.120143] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 68.120143] Process modprobe (pid: 2453, threadinfo ffff88005ee8a000, task ffff8800606ab000)
[ 68.120143] Stack:
[ 68.120143] ffff88005ee8bec8 ffffffff813fc64b 0000000000000000 ffffffffa06b86e9
[ 68.340115] 0000000000000000 0000000fdbc6e8dc ffff88005ee8bed8 ffffffff8137f5de
[ 68.340115] ffff88005ee8bf18 ffffffffa06b888d ffff88005ee8bf18 ffffffff810b5cee
[ 68.340115] Call Trace:
[ 68.340115] [<ffffffff813fc64b>] driver_register+0xc0/0x1b2
[ 68.340115] [<ffffffffa06b86e9>] ? init_ipmi_si+0x0/0x4cd [ipmi_si]
[ 68.340115] [<ffffffff8137f5de>] pnp_register_driver+0x28/0x31
[ 68.340115] [<ffffffffa06b888d>] init_ipmi_si+0x1a4/0x4cd [ipmi_si]
[ 68.340115] [<ffffffff810b5cee>] ? ktime_get+0x88/0xde
[ 68.340115] [<ffffffffa06b86e9>] ? init_ipmi_si+0x0/0x4cd [ipmi_si]
[ 68.340115] [<ffffffff810020a6>] do_one_initcall+0x6c/0x1e3
[ 68.340115] [<ffffffff810d4998>] sys_init_module+0x12b/0x307
[ 68.340115] [<ffffffff8100e902>] system_call_fastpath+0x16/0x1b
[ 68.340115] Code: 85 f6 75 d9 48 89 df e8 6f de ff ff 48 ff 05 d7 af 80 01 58 5b 41 5c 41 5d c9 c3 55 48 89 e5 0f 1f 44 00 00 48 ff 05 c7 af 80 01 <48> 8b 7f 60 e8 38 27 ec ff 48 ff 05 bf af 80 01 c9 c3 55 48 89
[ 68.340115] RIP [<ffffffff813fc579>] put_driver+0x10/0x22
[ 68.340115] RSP <ffff88005ee8be98>
[ 68.340115] CR2: ffffffffa06a8490
[ 68.340115] ---[ end trace 0c99b6dc9c0e95a3 ]---


This is on x86_64, almost allmodconfig.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-10 20:03:47

by Linus Torvalds

Subject: Re: Linux 2.6.38-rc4 (other bugs: ipmi Oops)

On Thu, Feb 10, 2011 at 11:34 AM, Randy Dunlap <[email protected]> wrote:
>
> Loading ipmi_si module a second time causes an Oops:
>
> [   68.120143] RIP: 0010:[<ffffffff813fc579>]  [<ffffffff813fc579>] put_driver+0x10/0x22

The disassembly is

55 push %rbp
48 89 e5 mov %rsp,%rbp
0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
48 ff 05 c7 af 80 01 incq 0x180afc7(%rip) # 0x180aff2
* 48 8b 7f 60 mov 0x60(%rdi),%rdi <-- trapping instruction
e8 38 27 ec ff callq 0xffffffffffec276c
48 ff 05 bf af 80 01 incq 0x180afbf(%rip) # 0x180affa
c9 leaveq
c3 retq

which is the access of "drv->p" in that function:

kobject_put(&drv->p->kobj);

so "drv" that was passed in was just bogus. (it's
"0xffffffffa06a8430", looks like it's the DEBUG_PAGEALLOC that has
caused the page to be free'd).

> [   68.340115] Call Trace:
> [   68.340115]  [<ffffffff813fc64b>] driver_register+0xc0/0x1b2
> [   68.340115]  [<ffffffff8137f5de>] pnp_register_driver+0x28/0x31
> [   68.340115]  [<ffffffffa06b888d>] init_ipmi_si+0x1a4/0x4cd [ipmi_si]
> [   68.340115]  [<ffffffff810020a6>] do_one_initcall+0x6c/0x1e3
> [   68.340115]  [<ffffffff810d4998>] sys_init_module+0x12b/0x307

And I think that - as usual - the problem is that the damn driver
cleanup is very ugly, and has this duplicate set of code to unregister
all the random crap. Except one of the duplicates is missing one case.
I think the bug was introduced by Bjorn Helgaas in commit 9e368fa011d4
("ipmi: add PNP discovery (ACPI namespace via PNPACPI)") which added
the acpi pnp case, but only unregistered it on the regular module exit
path, not on the "module loaded with no pnp devices" path.

Does this patch fix it? And Corey - this is a good example of why the
code shouldn't duplicate the "unregister stuff" in the module load
error case vs the module exit path, and there should be a shared
"cleanup()" function that is called by both. Can this be cleaned up,
please?

PATCH IS UNTESTED!

Linus


Attachments:
patch.diff (567.00 B)
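
The shared "cleanup()" idea requested above can be sketched as follows. This
is a userspace illustration under stated assumptions, not the ipmi_si code:
struct drv_state, drv_init(), drv_exit() and drv_cleanup() are hypothetical
names, and booleans stand in for real driver registrations. The key property
is that both the "no devices found" path and the module exit path go through
one function, so a newly added registration cannot be forgotten on one side.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for what the module has registered so far. */
struct drv_state {
	bool pci_registered;
	bool pnp_registered;
	int unregister_calls;	/* counts simulated unregister operations */
};

/*
 * Single teardown routine shared by the init error path and the exit path.
 * Each step is guarded by its flag, so calling it twice is harmless.
 */
void drv_cleanup(struct drv_state *s)
{
	if (s->pnp_registered) {
		s->pnp_registered = false;	/* stands in for unregistering the PNP driver */
		s->unregister_calls++;
	}
	if (s->pci_registered) {
		s->pci_registered = false;	/* stands in for unregistering the PCI driver */
		s->unregister_calls++;
	}
}

/* Module init: register everything, bail out via drv_cleanup() if nothing is found. */
int drv_init(struct drv_state *s, int devices_found)
{
	s->pci_registered = true;
	s->pnp_registered = true;

	if (devices_found == 0) {
		drv_cleanup(s);		/* same teardown as the exit path */
		return -ENODEV;
	}
	return 0;
}

/* Module exit: nothing but the shared teardown. */
void drv_exit(struct drv_state *s)
{
	drv_cleanup(s);
}
```

With this shape, adding a third registration means touching drv_init() and
drv_cleanup() only; there is no second unregister list to keep in sync.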

2011-02-10 20:09:19

by Corey Minyard

Subject: Re: Linux 2.6.38-rc4 (other bugs: ipmi Oops)

On 02/10/2011 02:03 PM, Linus Torvalds wrote:
> On Thu, Feb 10, 2011 at 11:34 AM, Randy Dunlap <[email protected]> wrote:
>> Loading ipmi_si module a second time causes an Oops:
>>
>> [ 68.120143] RIP: 0010:[<ffffffff813fc579>] [<ffffffff813fc579>] put_driver+0x10/0x22
> The disassembly is
>
> 55 push %rbp
> 48 89 e5 mov %rsp,%rbp
> 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 48 ff 05 c7 af 80 01 incq 0x180afc7(%rip) # 0x180aff2
> * 48 8b 7f 60 mov 0x60(%rdi),%rdi <-- trapping instruction
> e8 38 27 ec ff callq 0xffffffffffec276c
> 48 ff 05 bf af 80 01 incq 0x180afbf(%rip) # 0x180affa
> c9 leaveq
> c3 retq
>
> which is the access of "drv->p" in that function:
>
> kobject_put(&drv->p->kobj);
>
> so "drv" that was passed in was just bogus. (it's
> "0xffffffffa06a8430", looks like it's the DEBUG_PAGEALLOC that has
> caused the page to be free'd).
>
>> [ 68.340115] Call Trace:
>> [ 68.340115] [<ffffffff813fc64b>] driver_register+0xc0/0x1b2
>> [ 68.340115] [<ffffffff8137f5de>] pnp_register_driver+0x28/0x31
>> [ 68.340115] [<ffffffffa06b888d>] init_ipmi_si+0x1a4/0x4cd [ipmi_si]
>> [ 68.340115] [<ffffffff810020a6>] do_one_initcall+0x6c/0x1e3
>> [ 68.340115] [<ffffffff810d4998>] sys_init_module+0x12b/0x307
> And I think that - as usual - the problem is that the damn driver
> cleanup is very ugly, and has this duplicate set of code to unregister
> all the random crap. Except one of the duplicates is missing one case.
> I think the bug was introduced by Bjorn Helgaas in commit 9e368fa011d4
> ("ipmi: add PNP discovery (ACPI namespace via PNPACPI)") which added
> the acpi pnp case, but only unregistered it on the regular module exit
> path, not on the "module loaded with no pnp devices" path.
Yes, I already have a patch (that was neglected) from Peter Huewe to fix
this problem. I'll send it today once I finish testing it.

> Does this patch fix it? And Corey - this is a good example of why the
> code shouldn't duplicate the "unregister stuff" in the module load
> error case vs the module exit path, and there should be a shared
> "cleanup()" function that is called by both. Can this be cleaned up,
> please?
I will work on that.

-corey

2011-02-10 21:42:35

by Randy Dunlap

Subject: Re: Linux 2.6.38-rc4 (other bugs: ipmi Oops)

On 02/10/11 12:03, Linus Torvalds wrote:
> On Thu, Feb 10, 2011 at 11:34 AM, Randy Dunlap <[email protected]> wrote:
>>
>> Loading ipmi_si module a second time causes an Oops:
>>
>> [ 68.120143] RIP: 0010:[<ffffffff813fc579>] [<ffffffff813fc579>] put_driver+0x10/0x22
>
> The disassembly is
>
> 55 push %rbp
> 48 89 e5 mov %rsp,%rbp
> 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 48 ff 05 c7 af 80 01 incq 0x180afc7(%rip) # 0x180aff2
> * 48 8b 7f 60 mov 0x60(%rdi),%rdi <-- trapping instruction
> e8 38 27 ec ff callq 0xffffffffffec276c
> 48 ff 05 bf af 80 01 incq 0x180afbf(%rip) # 0x180affa
> c9 leaveq
> c3 retq
>
> which is the access of "drv->p" in that function:
>
> kobject_put(&drv->p->kobj);
>
> so "drv" that was passed in was just bogus. (it's
> "0xffffffffa06a8430", looks like it's the DEBUG_PAGEALLOC that has
> caused the page to be free'd).
>
>> [ 68.340115] Call Trace:
>> [ 68.340115] [<ffffffff813fc64b>] driver_register+0xc0/0x1b2
>> [ 68.340115] [<ffffffff8137f5de>] pnp_register_driver+0x28/0x31
>> [ 68.340115] [<ffffffffa06b888d>] init_ipmi_si+0x1a4/0x4cd [ipmi_si]
>> [ 68.340115] [<ffffffff810020a6>] do_one_initcall+0x6c/0x1e3
>> [ 68.340115] [<ffffffff810d4998>] sys_init_module+0x12b/0x307
>
> And I think that - as usual - the problem is that the damn driver
> cleanup is very ugly, and has this duplicate set of code to unregister
> all the random crap. Except one of the duplicates is missing one case.
> I think the bug was introduced by Bjorn Helgaas in commit 9e368fa011d4
> ("ipmi: add PNP discovery (ACPI namespace via PNPACPI)") which added
> the acpi pnp case, but only unregistered it on the regular module exit
> path, not on the "module loaded with no pnp devices" path.
>
> Does this patch fix it? And Corey - this is a good example of why the
> code shouldn't duplicate the "unregister stuff" in the module load
> error case vs the module exit path, and there should be a shared
> "cleanup()" function that is called by both. Can this be cleaned up,
> please?
>
> PATCH IS UNTESTED!

That works.

Acked-and-tested-by: Randy Dunlap <[email protected]>

thanks,
--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-02-13 17:40:06

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Wed, Feb 9, 2011 at 8:02 AM, Linus Torvalds
<[email protected]> wrote:
>
> Well, the thing is, Eric said he was using ext4.
>
> And there are absolutely no changes I can see after -rc3 that would
> affect anything like this.

Hmm. Eric - mind testing current -git?

J. R. Okajima found a possible problem with the new RCU filename
lookup, which could corrupt the filp_cache. I'd expect the normal
result to be an oops, but maybe there could be memory corruption. And
the easiest way to trigger it would probably be to have lots of
concurrent fs activity with renames.

Now, it's not new to -rc4: the whole rcu lookup thing was merged into
-rc1. But since I still don't see anything that looks likely to be
introduced after -rc3, it might not hurt to think that maybe it's just
rare enough that you just thought -rc3 was ok, and then you were
unlucky with -rc4.

Linus

2011-02-14 02:05:00

by Eric W. Biederman

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Linus Torvalds <[email protected]> writes:

> On Wed, Feb 9, 2011 at 8:02 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Well, the thing is, Eric said he was using ext4.
>>
>> And there are absolutely no changes I can see after -rc3 that would
>> affect anything like this.
>
> Hmm. Eric - mind testing current -git?

Sorry for taking so long to get back to this. I came down with
a nasty cold and haven't had much time.

While I haven't been doing anything, the machine has still been running
the builds, so I have some interesting test results.

The build failures appear to have been due to a corrupted ccache. A
coworker turned off using the ccache and the compiles started working
again. Unfortunately I can't qualify when my ccache got corrupted,
or give a hint at which kernel bug caused the corrupted cache. I
expected it happened in whatever I tested just before -rc3.


There is something corrupting my page tables.

messages:Feb 13 12:50:00 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88028688b748 pmd:28688b067
messages:Feb 13 12:50:00 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88028688b748 pmd:28688b067
messages:Feb 13 12:52:17 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff880011065748 pmd:11065067
messages:Feb 13 12:52:17 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff880011065748 pmd:11065067
messages:Feb 13 12:52:27 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff8802460d3748 pmd:2460d3067
messages:Feb 13 12:52:27 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff8802460d3748 pmd:2460d3067
messages-20110213:Feb 7 05:50:21 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff8801d256b748 pmd:1d256b067
messages-20110213:Feb 7 05:50:21 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff8801d256b748 pmd:1d256b067
messages-20110213:Feb 7 18:34:32 bs38 kernel: BUG: Bad page map in process Mlag pte:ffff8800cad2d748 pmd:cad2d067
messages-20110213:Feb 7 18:34:33 bs38 kernel: BUG: Bad page map in process Mlag pte:ffff8800cad2d748 pmd:cad2d067
messages-20110213:Feb 7 18:35:11 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88003c021748 pmd:3c021067
messages-20110213:Feb 7 18:35:12 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88003c021748 pmd:3c021067
messages-20110213:Feb 8 04:08:26 bs38 kernel: BUG: Bad page map in process IgmpSnooping pte:ffff880288b29748 pmd:288b29067
messages-20110213:Feb 8 04:08:26 bs38 kernel: BUG: Bad page map in process IgmpSnooping pte:ffff880288b29748 pmd:288b29067
messages-20110213:Feb 10 14:21:34 bs38 kernel: BUG: Bad page map in process pylint pte:ffff8802984d7c28 pmd:2984d7067
messages-20110213:Feb 10 14:21:35 bs38 kernel: BUG: Bad page map in process pylint pte:ffff8802984d7c28 pmd:2984d7067
messages-20110213:Feb 11 00:02:32 bs38 kernel: BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:57]
messages-20110213:Feb 11 02:03:33 bs38 kernel: BUG: Bad page map in process configure pte:ffff880299b1b748 pmd:299b1b067
messages-20110213:Feb 11 02:03:33 bs38 kernel: BUG: Bad page map in process configure pte:ffff880299b1b748 pmd:299b1b067
messages-20110213:Feb 11 17:16:36 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88013efa9748 pmd:13efa9067
messages-20110213:Feb 11 17:16:37 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88013efa9748 pmd:13efa9067

> J. R. Okajima found a possible problem with the new RCU filename
> lookup, which could corrupt the filp_cache. I'd expect the normal
> result to be an oops, but maybe there could be memory corruption. And
> the easiest way to trigger it would probably be to have lots of
> concurrent fs activity with renames.

It does look like I have seen something like that. I will update
shortly and hopefully I can see something tomorrow.

I still have about half a dozen unclassified failures of my tests under
-rc4 that I haven't seen anywhere. But at least I have them all
running.

> Now, it's not new to -rc4: the whole rcu lookup thing was merged into
> -rc1. But since I still don't see anything that looks likely to be
> introduced after -rc3, it might not hurt to think that maybe it's just
> rare enough that you just thought -rc3 was ok, and then you were
> unlucky with -rc4.

I have some unexpected kernel crashes as well.
With 2.6.38-rc3 (I think this was a git snapshot) I saw:

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
<1>IP: [<ffffffff81069008>] do_raw_spin_lock+0x9/0x1a
<4>PGD 0
<0>Oops: 0002 [#1] SMP
<0>last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
<4>CPU 5
<4>Modules linked in: macvtap ipt_LOG xt_limit ipt_REJECT xt_hl xt_state dummy tulip xt_tcpudp iptable_filter inet_diag veth macvlan nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc dm_mirror dm_region_hash dm_log uinput bonding ipv6 kvm_intel kvm fuse xt_multiport iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun 8021q serio_raw sg shpchp pcspkr i5k_amb iTCO_wdt iTCO_vendor_support i2c_i801 i5400_edac ioatdma ghes microcode edac_core hed dca radeon ttm drm_kms_helper drm hwmon sr_mod i2c_algo_bit i2c_core uhci_hcd igb ehci_hcd cdrom netxen_nic dm_mod [last unloaded: mperf]
<4>
<4>Pid: 57, comm: kswapd0 Tainted: G B 2.6.38-rc3-355347.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
<4>RIP: 0010:[<ffffffff81069008>] [<ffffffff81069008>] do_raw_spin_lock+0x9/0x1a
<4>RSP: 0000:ffff880296ee5a90 EFLAGS: 00010246
<4>RAX: 0000000000000100 RBX: ffff880072d529b0 RCX: ffff880296ee5bf8
<4>RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000008
<4>RBP: ffff880296ee5a90 R08: dead000000200200 R09: dead000000100100
<4>R10: 0000000000014a0c R11: 00000000000149b8 R12: 0000000000000000
<4>R13: ffffea00060d7cc8 R14: ffff880296ee5c80 R15: 0000000000000001
<4>FS: 0000000000000000(0000) GS:ffff8800cfd40000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000008 CR3: 0000000001803000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process kswapd0 (pid: 57, threadinfo ffff880296ee4000, task ffff88029adc6040)
<0>Stack:
<4> ffff880296ee5aa0 ffffffff813d0a0c ffff880296ee5ad0 ffffffff810d30cd
<4> ffffea00060bdcb8 ffffea00060d7cc8 0000000000000000 ffff880072d529b1
<4> ffff880296ee5b80 ffffffff810d3633 ffffea00060bdcb8 ffffffff8181ff70
<0>Call Trace:
<4> [<ffffffff813d0a0c>] _raw_spin_lock+0x9/0xb
<4> [<ffffffff810d30cd>] __page_lock_anon_vma+0x3a/0x54
<4> [<ffffffff810d3633>] page_referenced+0xaf/0x240
<4> [<ffffffff810bbbe4>] ? pageout+0x223/0x233
<4> [<ffffffff810bcfda>] shrink_page_list+0x154/0x49e
<4> [<ffffffff810bd762>] shrink_inactive_list+0x234/0x386
<4> [<ffffffff810b79da>] ? determine_dirtyable_memory+0x18/0x21
<4> [<ffffffff810bdede>] shrink_zone+0x356/0x418
<4> [<ffffffff810b3eef>] ? zone_watermark_ok_safe+0x9c/0xa9
<4> [<ffffffff810bed0e>] kswapd+0x4f6/0x84d
<4> [<ffffffff810be818>] ? kswapd+0x0/0x84d
<4> [<ffffffff81057de9>] kthread+0x7d/0x85
<4> [<ffffffff810037a4>] kernel_thread_helper+0x4/0x10
<4> [<ffffffff81057d6c>] ? kthread+0x0/0x85
<4> [<ffffffff810037a0>] ? kernel_thread_helper+0x0/0x10
<0>Code: 00 00 01 74 05 e8 49 be 18 00 c9 c3 55 48 89 e5 f0 ff 07 c9 c3 55 48 89 e5 f0 81 07 00 00 00 01 c9 c3 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c9 c3 55 48 89 e5
<1>RIP [<ffffffff81069008>] do_raw_spin_lock+0x9/0x1a
<4> RSP <ffff880296ee5a90>
<0>CR2: 0000000000000008

With 2.6.38-rc4 I have seen:
<0>general protection fault: 0000 [#1] SMP
<0>last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
<4>CPU 6
<4>Modules linked in: dummy tulip xt_tcpudp iptable_filter inet_diag veth macvlan nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc dm_mirror dm_region_hash dm_log uinput bonding ipv6 kvm_intel kvm fuse xt_multiport iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun 8021q iTCO_wdt iTCO_vendor_support i5k_amb i5400_edac ioatdma edac_core dca i2c_i801 serio_raw shpchp sg pcspkr ghes microcode hed radeon ttm drm_kms_helper drm sr_mod hwmon i2c_algo_bit i2c_core igb netxen_nic cdrom ehci_hcd uhci_hcd dm_mod [last unloaded: mperf]
<4>
<4>Pid: 7643, comm: netnsd Not tainted 2.6.38-rc4-355739.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
<4>RIP: 0010:[<ffffffff810326b0>] [<ffffffff810326b0>] post_schedule+0x7/0x4e
<4>RSP: 0000:ffff8802981c5bf8 EFLAGS: 00010287
<4>RAX: 0000000000000006 RBX: ffff100367f45c28 RCX: ffff8801a6af0dc0
<4>RDX: ffff8802981c5fd8 RSI: ffff8801a6af0dc0 RDI: ffff100367f45c28
<4>RBP: ffff8802981c5c08 R08: ffff8802981c4000 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800036f2a00
<4>R13: ffff880296bc2a00 R14: ffff8801a6af1068 R15: 0000000000000006
<4>FS: 0000000000000000(0000) GS:ffff8800cfd80000(0063) knlGS:00000000f74e76d0
<4>CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<4>CR2: 00000000ffd70f80 CR3: 0000000297dc9000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process netnsd (pid: 7643, threadinfo ffff8802981c4000, task ffff8801d2f1a260)
<0>Stack:
<4> ffff100367f45c28 ffff8800036f2a00 ffff8802981c5cb8 ffffffff813cf98c
<4> ffff8802981c5ca8 00000000000118c0 ffff8802981c5c28 ffff8802981c5c28
<4> 00000000000118c0 ffff8801d2f1a260 00000000000118c0 ffff8802981c5fd8
<0>Call Trace:
<4> [<ffffffff813cf98c>] schedule+0x544/0x577
<4> [<ffffffff813cfb4f>] schedule_timeout+0x22/0xbb
<4> [<ffffffff813d0a5f>] ? _raw_spin_unlock_irqrestore+0x11/0x13
<4> [<ffffffff81058427>] ? prepare_to_wait_exclusive+0x70/0x7b
<4> [<ffffffff813386e5>] __skb_recv_datagram+0x1ec/0x264
<4> [<ffffffff810e3da8>] ? arch_local_irq_save+0x16/0x1c
<4> [<ffffffff8133877e>] ? receiver_wake_function+0x0/0x1a
<4> [<ffffffff8133877c>] skb_recv_datagram+0x1f/0x21
<4> [<ffffffff813aefeb>] unix_accept+0x55/0x103
<4> [<ffffffff8132efcb>] sys_accept4+0xf3/0x1c3
<4> [<ffffffff81076155>] ? compat_sys_wait4+0x26/0xc3
<4> [<ffffffff813d0a4c>] ? _raw_spin_lock_irq+0x1a/0x1c
<4> [<ffffffff8104f34a>] ? do_sigaction+0x168/0x179
<4> [<ffffffff8102e15b>] ? ia32_restore_sigcontext+0x136/0x15c
<4> [<ffffffff81353b97>] compat_sys_socketcall+0x17d/0x186
<4> [<ffffffff8102cd90>] sysenter_dispatch+0x7/0x2e
<0>Code: 49 89 c4 8b 75 e8 48 89 df 31 c9 e8 a3 d4 ff ff 4c 89 e6 48 89 df e8 ae e3 39 00 48 83 c4 20 5b 41 5c c9 c3 55 48 89 e5 41 54 53 <83> bf 74 08 00 00 00 48 89 fb 74 36 e8 4d e3 39 00 49 89 c4 48
<1>RIP [<ffffffff810326b0>] post_schedule+0x7/0x4e
<4> RSP <ffff8802981c5bf8>


With 2.6.38-rc4 I have seen:
<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
<1>IP: [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c
<4>PGD 15a66d067 PUD 15a65a067 PMD 0
<0>Oops: 0002 [#1] SMP
<0>last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
<4>CPU 5
<4>Modules linked in: macvtap ipt_LOG xt_limit ipt_REJECT xt_hl xt_state dummy tulip xt_tcpudp iptable_filter inet_diag veth macvlan nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc dm_mirror dm_region_hash dm_log uinput bonding ipv6 kvm_intel kvm fuse xt_multiport iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun 8021q i5k_amb i5400_edac edac_core iTCO_wdt iTCO_vendor_support ioatdma dca i2c_i801 shpchp sg ghes hed pcspkr serio_raw microcode radeon ttm drm_kms_helper drm sr_mod cdrom ehci_hcd hwmon i2c_algo_bit i2c_core netxen_nic uhci_hcd igb dm_mod [last unloaded: mperf]
<4>
<4>Pid: 24433, comm: netnsd Tainted: G B 2.6.38-rc4-355739.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
<4>RIP: 0010:[<ffffffff811016cb>] [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c
<4>RSP: 0018:ffff8802633c9bb8 EFLAGS: 00010213
<4>RAX: ffffffff8141c100 RBX: ffff880128e3d600 RCX: ffff880128e3d738
<4>RDX: 0000000000000000 RSI: ffff880128e3d740 RDI: ffffffff818022c0
<4>RBP: ffff8802633c9c18 R08: 0000000000000004 R09: ffff880128e3d638
<4>R10: ffff8802633c9c65 R11: 0000000000000000 R12: ffff880128e3d748
<4>R13: 0000000000000004 R14: ffff880128e3d600 R15: ffff880128e3d6b8
<4>FS: 0000000000000000(0000) GS:ffff8800cfd40000(0063) knlGS:00000000f746b6d0
<4>CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
<4>CR2: 0000000000000008 CR3: 00000001e4181000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process netnsd (pid: 24433, threadinfo ffff8802633c8000, task ffff880296870000)
<0>Stack:
<4> ffff88004ec85000 00b2130a00000000 ffff8802633c8000 ffff880128e3d65c
<4> ffff8802633c9be8 ffff8801c6826cc0 ffff8802633c9c48 ffff8802633c9c58
<4> 0000000000000002 ffff88019d1f2500 00000000000013bf ffff8802633c9c48
<0>Call Trace:
<4> [<ffffffff8113c8bc>] proc_flush_task+0xae/0x1d2
<4> [<ffffffff8104061a>] release_task+0x35/0x3b9
<4> [<ffffffff81040f53>] wait_consider_task+0x5b5/0x911
<4> [<ffffffff810413a6>] do_wait+0xf7/0x222
<4> [<ffffffff8104266f>] sys_wait4+0x99/0xbc
<4> [<ffffffff8104038f>] ? child_wait_callback+0x0/0x53
<4> [<ffffffff81076155>] compat_sys_wait4+0x26/0xc3
<4> [<ffffffff813d0a4c>] ? _raw_spin_lock_irq+0x1a/0x1c
<4> [<ffffffff8104f34a>] ? do_sigaction+0x168/0x179
<4> [<ffffffff810024c1>] ? do_notify_resume+0x27/0x69
<4> [<ffffffff8102d9e0>] sys32_waitpid+0xb/0xd
<4> [<ffffffff8102cd90>] sysenter_dispatch+0x7/0x2e
<0>Code: 00 49 89 87 80 00 00 00 49 89 8f 88 00 00 00 48 89 11 49 8b 47 68 ff 05 28 04 72 00 ff 80 f0 00 00 00 eb 33 49 8b b7 88 00 00 00 <48> 89 72 08 48 89 16 48 8b 90 e8 00 00 00 48 89 88 e8 00 00 00
<1>RIP [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c
<4> RSP <ffff8802633c9bb8>
<0>CR2: 0000000000000008

Eric

2011-02-14 02:45:41

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Sun, Feb 13, 2011 at 6:04 PM, Eric W. Biederman
<[email protected]> wrote:
>
> The build failures appear to have been due to a corrupted ccache. A
> coworker turned off using the ccache and the compiles started working
> again. Unfortunately I can't qualify when my ccache got corrupted,
> or give a hint at which kernel bug caused the corrupted cache. I
> expected it happened in whatever I tested just before -rc3.

Ok, that certainly explains how it was reproducible, and why it would
show up in rc4 despite there not being a lot of reasons for any of the
post-rc3 changes to introduce anything like that.

It does sound like memory corruption. I'm not at all sure that it's
the rcu lookup thing (although it's a possible case), and especially
if you've been playing around with some of the more experimental VM
features (memcg? transparent hugepage? migration/compaction?) it could
easily be something there. There's been several bug-fixes in those
areas.

Having SLUB debugging on would be a good start. Obviously,
CONFIG_DEBUG_PAGEALLOC would be wonderful, but it's expensive as heck,
so it can be a bit painful to use on a machine that is actually used
for real work. But it can really help pinpoint those kinds of
problems.

> There is something corrupting my page tables.
>
> messages:Feb 13 12:50:00 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88028688b748 pmd:28688b067
> messages:Feb 13 12:50:00 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff88028688b748 pmd:28688b067
> messages:Feb 13 12:52:17 bs38 kernel: BUG: Bad page map in process [manager] pte:ffff880011065748 pmd:11065067

Odd pattern. That is a totally invalid pte, and I do not see where the
pattern would come from. It's a kernel pointer, afaik, and obviously
shouldn't show up in the pte.

But it could be the result of a use-after-free. Or a double free.
Which I _think_ is that rcu lookup bug pattern, but I may be barking
up the wrong tree. Again, SLUB or PAGEALLOC debugging would probably
give more information.
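
A small arithmetic check on the quoted "Bad page map" values, offered as a sketch rather than a diagnosis. It assumes the x86_64 direct-map base (PAGE_OFFSET) of 0xffff880000000000 used by kernels of this era; that constant, the mask and the function names are assumptions of this example, not something stated in the thread. Under that assumption, each bad pte value, read as a kernel virtual address, falls inside the very page-table page that the reported pmd points at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: direct-map base (PAGE_OFFSET) of x86_64 kernels of this era. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
/* Bits 12..51 of a page-table entry hold the physical frame address. */
#define ENTRY_ADDR_MASK 0x000ffffffffff000ULL

/* Physical address of the page-table page a pmd entry refers to. */
static uint64_t pmd_table_phys(uint64_t pmd)
{
    return pmd & ENTRY_ADDR_MASK;
}

/* Does the pte value, taken as a direct-map pointer, land in that same page? */
static bool pte_value_points_into_own_table(uint64_t pte, uint64_t pmd)
{
    if (pte < DIRECT_MAP_BASE)
        return false;
    return ((pte - DIRECT_MAP_BASE) & ENTRY_ADDR_MASK) == pmd_table_phys(pmd);
}

/* The pte/pmd pairs below are copied from the log lines quoted in the thread. */
static void check_thread_values(void)
{
    assert(pte_value_points_into_own_table(0xffff88028688b748ULL, 0x28688b067ULL));
    assert(pte_value_points_into_own_table(0xffff880011065748ULL, 0x11065067ULL));
    assert(pte_value_points_into_own_table(0xffff8802460d3748ULL, 0x2460d3067ULL));
    assert(pte_value_points_into_own_table(0xffff8802984d7c28ULL, 0x2984d7067ULL));
}
```

What that pattern says about the cause is not something this check can decide; it only makes the "kernel pointer in a pte" observation concrete.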

I'm adding Andrew to the cc too, in case it's simply some of the VM patches.

> I have some unexpected kernel crashes as well.
> With 2.6.38-rc3 (something I think this was a git snapshot) I saw:
>
> <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

The instruction is the "lock xadd %ax,(%rdi)" that is the actual
locked spin-lock instruction. It's this:

spin_lock(&root_anon_vma->lock);

in __page_lock_anon_vma(), and %rdi is 8. Which is consistent with
root_anon_vma being NULL.
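
Why a fault address of 8 points at a NULL base pointer can be shown with offsetof(). The struct below is a hypothetical layout chosen for illustration (one 8-byte member ahead of the lock), not a copy of the kernel's definition; the names are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the offsets matter here. */
struct fake_spinlock {
    uint32_t slock;
};

struct fake_anon_vma {
    uint64_t root;              /* pointer-sized on x86_64 */
    struct fake_spinlock lock;  /* therefore at offset 8 */
};

/* Address that "&base->member" evaluates to for a given base and offset. */
static uint64_t member_address(uint64_t base, size_t offset)
{
    return base + offset;
}

static void check_null_base(void)
{
    size_t off = offsetof(struct fake_anon_vma, lock);

    assert(off == 8);
    /* With a NULL base, taking &base->lock yields address 8,
     * which is the value in %rdi and CR2 in the oops. */
    assert(member_address(0, off) == 0x8);
}
```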

> <0>Call Trace:
> <4> [<ffffffff813d0a0c>] _raw_spin_lock+0x9/0xb
> <4> [<ffffffff810d30cd>] __page_lock_anon_vma+0x3a/0x54
> <4> [<ffffffff810d3633>] page_referenced+0xaf/0x240
> <4> [<ffffffff810bcfda>] shrink_page_list+0x154/0x49e
> <4> [<ffffffff810bd762>] shrink_inactive_list+0x234/0x386
> <4> [<ffffffff810bdede>] shrink_zone+0x356/0x418
> <4> [<ffffffff810bed0e>] kswapd+0x4f6/0x84d
> <4> [<ffffffff81057de9>] kthread+0x7d/0x85
> <4> [<ffffffff810037a4>] kernel_thread_helper+0x4/0x10

It goes without saying that root_anon_vma shouldn't have been NULL
here. But maybe this triggers something for Andrew?

> With 2.6.38-rc4 I have seen:
> <0>general protection fault: 0000 [#1] SMP
> <4>RIP: 0010:[<ffffffff810326b0>] [<ffffffff810326b0>] post_schedule+0x7/0x4e
> <4>RSP: 0000:ffff8802981c5bf8 EFLAGS: 00010287
> <4>RAX: 0000000000000006 RBX: ffff100367f45c28 RCX: ffff8801a6af0dc0
> <4>RDX: ffff8802981c5fd8 RSI: ffff8801a6af0dc0 RDI: ffff100367f45c28
> <0>Call Trace:
> <4> [<ffffffff813cf98c>] schedule+0x544/0x577
> <4> [<ffffffff813cfb4f>] schedule_timeout+0x22/0xbb
> <4> [<ffffffff813386e5>] __skb_recv_datagram+0x1ec/0x264
> <4> [<ffffffff8133877c>] skb_recv_datagram+0x1f/0x21
> <4> [<ffffffff813aefeb>] unix_accept+0x55/0x103
> <4> [<ffffffff8132efcb>] sys_accept4+0xf3/0x1c3
> <4> [<ffffffff81353b97>] compat_sys_socketcall+0x17d/0x186
> <4> [<ffffffff8102cd90>] sysenter_dispatch+0x7/0x2e
> <0>Code: 49 89 c4 8b 75 e8 48 89 df 31 c9 e8 a3 d4 ff ff 4c 89 e6 48 89 df e8 ae e3 39 00 48 83 c4 20 5b 41 5c c9 c3 55 48 89 e5 41 54 53 <83> bf 74 08 00 00 00 48 89 fb 74 36 e8 4d e3 39 00 49 89 c4 48
> <1>RIP [<ffffffff810326b0>] post_schedule+0x7/0x4e

This is the very first memory access in post_schedule, the

if (rq->post_schedule) {

load. (The trapping instruction is "cmpl $0x0,0x874(%rdi)".) With %rdi
being corrupt, and the resulting pointer being invalid, it looks like.

Odd, and looks pretty random. Maybe it really is just memory corruption.
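
One reason this shows up as a general protection fault rather than a page fault: on x86_64 an access through a non-canonical address raises #GP. The sketch below assumes 48-bit virtual addressing (bits 63..47 must all be equal), which is an assumption about the hardware and not stated in the thread; the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Canonical under 48-bit addressing: bits 63..47 are all 0 or all 1. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}

/* Values below are taken from the register dump quoted above. */
static void check_thread_values(void)
{
    uint64_t rdi = 0xffff100367f45c28ULL;

    /* The address touched by "cmpl $0x0,0x874(%rdi)" is non-canonical. */
    assert(!is_canonical_48(rdi + 0x874));
    /* The stack pointer from the same dump is an ordinary kernel address. */
    assert(is_canonical_48(0xffff8802981c5bf8ULL));
}
```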

> With 2.6.38-rc4 I have seen:
> <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> <1>IP: [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c
> <0>Call Trace:
> <4> [<ffffffff8113c8bc>] proc_flush_task+0xae/0x1d2
> <4> [<ffffffff8104061a>] release_task+0x35/0x3b9
> <4> [<ffffffff81040f53>] wait_consider_task+0x5b5/0x911
> <4> [<ffffffff810413a6>] do_wait+0xf7/0x222
> <4> [<ffffffff8104266f>] sys_wait4+0x99/0xbc
> <4> [<ffffffff81076155>] compat_sys_wait4+0x26/0xc3
> <4> [<ffffffff8102d9e0>] sys32_waitpid+0xb/0xd
> <4> [<ffffffff8102cd90>] sysenter_dispatch+0x7/0x2e
> <0>Code: 00 49 89 87 80 00 00 00 49 89 8f 88 00 00 00 48 89 11 49 8b 47 68 ff 05 28 04 72 00 ff 80 f0 00 00 00 eb 33 49 8b b7 88 00 00 00 <48> 89 72 08 48 89 16 48 8b 90 e8 00 00 00 48 89 88 e8 00 00 00
> <1>RIP [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c

I dunno. That instruction sequence looks like a list_del(), but I'm
not certain ("mov %rsi,0x8(%rdx) ; mov %rdx,(%rsi)"). With %rdx being
NULL. But shrink_dcache tends to be where a lot of random memory
corruption ends up then blowing up (because the dcache is very
pointer-intensive, and it can be a large cache), so again, I don't
think the oops really tells us anything. It looks more like the
symptom rather than a cause.
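
If the sequence is indeed a list unlink, as tentatively read above, the two stores map onto a doubly linked list removal like the minimal sketch below. The struct and function names are illustrative, and the register assignment in the comments is an assumption matching the quoted instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Unlink an entry from a circular doubly linked list. */
static void node_unlink(struct list_node *entry)
{
    struct list_node *next = entry->next;   /* would be %rdx */
    struct list_node *prev = entry->prev;   /* would be %rsi */

    next->prev = prev;      /* mov %rsi,0x8(%rdx) */
    prev->next = next;      /* mov %rdx,(%rsi)   */
}

/* Address the first store writes to, for a given value of "next". */
static uintptr_t first_store_address(uintptr_t next)
{
    return next + offsetof(struct list_node, prev);
}

static void check_unlink(void)
{
    struct list_node a, b, c;

    a.next = &b; b.next = &c; c.next = &a;
    a.prev = &c; b.prev = &a; c.prev = &b;

    node_unlink(&b);
    assert(a.next == &c);
    assert(c.prev == &a);
}
```

With a NULL "next", the first store lands at offset sizeof(void *), which is 8 on a 64-bit target and matches the fault address in the oops.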

Linus

2011-02-14 03:40:55

by Eric W. Biederman

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Linus Torvalds <[email protected]> writes:

> On Sun, Feb 13, 2011 at 6:04 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> The build failures appear to have been due to a corrupted ccache. A
>> coworker turned off using the ccache and the compiles started working
>> again.  Unfortunately I can't qualify when my ccache got corrupted,
>> or give a hint at which kernel bug caused the corrupted cache.  I
>> expected it happened in whatever I tested just before -rc3.
>
> Ok, that certainly explains how it was reproducible, and why it would
> show up in rc4 despite there not being a lot of reasons for any of the
> post-rc3 changes to introduce anything like that.
>
> It does sound like memory corruption. I'm not at all sure that it's
> the rcu lookup thing (although it's a possible case), and especially
> if you've been playing around with some of the more experimental VM
> features (memcg? transparent hugepage? migration/compaction?) it could
> easily be something there. There's been several bug-fixes in those
> areas.

I wish. Our builds trigger the OOM killer way too frequently, but I
haven't figured out the magic invocation to get the memory control
groups to prevent that from happening.

This is a distribution like build so practically everything is enabled.
Let's see. Memory control groups are in there but unused. Transparent
huge pages are not enabled. Memory migration and compaction are not enabled.

We use kvm a little bit too, but most of our stuff uses namespaces and in
particular the network namespace for testing. And I haven't seen any
problems in the one or two tests that use vms.

> Having SLUB debugging on would be a good start. Obviously,
> CONFIG_DEBUG_PAGEALLOC would be wonderful, but it's expensive as heck,
> so it can be a bit painful to use on a machine that is actually used
> for real work. But it can really help pinpoint those kinds of
> problems.

If the problems persist I will look at what I can start turning on in
that direction.

>> There is something corrupting my page tables.
>>
>> messages:Feb 13 12:50:00 bs38 kernel: BUG: Bad page map in process [manager]  pte:ffff88028688b748 pmd:28688b067
>> messages:Feb 13 12:50:00 bs38 kernel: BUG: Bad page map in process [manager]  pte:ffff88028688b748 pmd:28688b067
>> messages:Feb 13 12:52:17 bs38 kernel: BUG: Bad page map in process [manager]  pte:ffff880011065748 pmd:11065067
>
> Odd pattern. That is a totally invalid pte, and I do not see what the
> pattern would come from. It's a kernel pointer, afaik, and obviously
> shouldn't show up in the pte.
>
> But it could be the result of a use-after-free. Or a double free.
> Which I _think_ is that rcu lookup bug pattern, but I may be barking
> up the wrong tree. Again, SLUB or PAGEALLOC debugging would probably
> give more information.
>
> I'm adding Andrew to the cc too, in case it's simply some of the VM patches.
>
>> I have some unexpected kernel crashes as well.
>> With 2.6.38-rc3 (something I think this was a git snapshot) I saw:
>>
>> <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>
> The instruction is the "lock xadd %ax,(%rdi)" that is the actual
> locked spin-lock instruction. It's this:
>
> spin_lock(&root_anon_vma->lock);
>
> in __page_lock_anon_vma(), and %rdi is 8. Which is consistent with
> root_anon_vma being NULL.
>
>> <0>Call Trace:
>> <4> [<ffffffff813d0a0c>] _raw_spin_lock+0x9/0xb
>> <4> [<ffffffff810d30cd>] __page_lock_anon_vma+0x3a/0x54
>> <4> [<ffffffff810d3633>] page_referenced+0xaf/0x240
>> <4> [<ffffffff810bcfda>] shrink_page_list+0x154/0x49e
>> <4> [<ffffffff810bd762>] shrink_inactive_list+0x234/0x386
>> <4> [<ffffffff810bdede>] shrink_zone+0x356/0x418
>> <4> [<ffffffff810bed0e>] kswapd+0x4f6/0x84d
>> <4> [<ffffffff81057de9>] kthread+0x7d/0x85
>> <4> [<ffffffff810037a4>] kernel_thread_helper+0x4/0x10
>
> It goes without saying that root_anon_vma shouldn't have been NULL
> here. But maybe this triggers something for Andrew?
>
>> With 2.6.38-rc4 I have seen:
>> <0>general protection fault: 0000 [#1] SMP
>> <4>RIP: 0010:[<ffffffff810326b0>]  [<ffffffff810326b0>] post_schedule+0x7/0x4e
>> <4>RSP: 0000:ffff8802981c5bf8  EFLAGS: 00010287
>> <4>RAX: 0000000000000006 RBX: ffff100367f45c28 RCX: ffff8801a6af0dc0
>> <4>RDX: ffff8802981c5fd8 RSI: ffff8801a6af0dc0 RDI: ffff100367f45c28
>> <0>Call Trace:
>> <4> [<ffffffff813cf98c>] schedule+0x544/0x577
>> <4> [<ffffffff813cfb4f>] schedule_timeout+0x22/0xbb
>> <4> [<ffffffff813386e5>] __skb_recv_datagram+0x1ec/0x264
>> <4> [<ffffffff8133877c>] skb_recv_datagram+0x1f/0x21
>> <4> [<ffffffff813aefeb>] unix_accept+0x55/0x103
>> <4> [<ffffffff8132efcb>] sys_accept4+0xf3/0x1c3
>> <4> [<ffffffff81353b97>] compat_sys_socketcall+0x17d/0x186
>> <4> [<ffffffff8102cd90>] sysenter_dispatch+0x7/0x2e
>> <0>Code: 49 89 c4 8b 75 e8 48 89 df 31 c9 e8 a3 d4 ff ff 4c 89 e6 48 89 df e8 ae e3 39 00 48 83 c4 20 5b 41 5c c9 c3 55 48 89 e5 41 54 53 <83> bf 74 08 00 00 00 48 89 fb 74 36 e8 4d e3 39 00 49 89 c4 48
>> <1>RIP  [<ffffffff810326b0>] post_schedule+0x7/0x4e
>
> This is the very first memory access in post_schedule, the
>
> if (rq->post_schedule) {
>
> load. (trapping instruction is "cmpl $0x0,0x874(%rdi)". With %rdi
> being corrupt, and the resulting pointer being invalid, it looks like.
>
> Odd, and looks pretty random. Maybe it really is just memory corruption.
>
>> With 2.6.38-rc4 I have seen:
>> <1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> <1>IP: [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c
>> <0>Call Trace:
>> <4> [<ffffffff8113c8bc>] proc_flush_task+0xae/0x1d2
>> <4> [<ffffffff8104061a>] release_task+0x35/0x3b9
>> <4> [<ffffffff81040f53>] wait_consider_task+0x5b5/0x911
>> <4> [<ffffffff810413a6>] do_wait+0xf7/0x222
>> <4> [<ffffffff8104266f>] sys_wait4+0x99/0xbc
>> <4> [<ffffffff81076155>] compat_sys_wait4+0x26/0xc3
>> <4> [<ffffffff8102d9e0>] sys32_waitpid+0xb/0xd
>> <4> [<ffffffff8102cd90>] sysenter_dispatch+0x7/0x2e
>> <0>Code: 00 49 89 87 80 00 00 00 49 89 8f 88 00 00 00 48 89 11 49 8b 47 68 ff 05 28 04 72 00 ff 80 f0 00 00 00 eb 33 49 8b b7 88 00 00 00 <48> 89 72 08 48 89 16 48 8b 90 e8 00 00 00 48 89 88 e8 00 00 00
>> <1>RIP  [<ffffffff811016cb>] shrink_dcache_parent+0x104/0x23c
>
> I dunno. That instruction sequence looks like a list_del(), but I'm
> not certain ("mov %rsi,0x8(%rdx) ; mov %rdx,(%rsi)"). With %rdx being
> NULL. But shrink_dcache tends to be where a lot of random memory
> corruption ends up then blowing up (because the dcache is very
> pointer-intensive, and it can be a large cache), so again, I don't
> think the oops really tells us anything. It looks more like the
> symptom rather than a cause.

Agreed. Except for the pmd corruption I haven't seen any of these more
than once.

Eric

2011-02-14 05:34:46

by Eric W. Biederman

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.


And for completeness. When I was rebooting v2.6.38-rc4 to start running
795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 I hit this.

Sigh. I wish crash worked on something besides Red Hat's enterprise
kernels. Then I could use the system core file I have to do more than
extract the dmesg.

Eric

<0>------------[ cut here ]------------
<2>kernel BUG at mm/filemap.c:125!
<0>invalid opcode: 0000 [#1] SMP
<0>last sysfs file: /sys/devices/virtual/net/sit0/uevent
<4>CPU 5
<4>Modules linked in: sit tunnel4 macvtap ipt_LOG xt_limit ipt_REJECT xt_hl xt_state dummy tulip xt_tcpudp iptable_filter inet_diag veth macvlan nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc dm_mirror dm_region_hash dm_log uinput bonding ipv6 kvm_intel kvm fuse xt_multiport iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun 8021q i5k_amb iTCO_wdt i5400_edac ioatdma iTCO_vendor_support edac_core i2c_i801 dca shpchp ghes hed microcode sg serio_raw pcspkr radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ehci_hcd uhci_hcd sr_mod cdrom igb netxen_nic dm_mod [last unloaded: mperf]
<4>
<4>Pid: 2611, comm: umount Tainted: G B 2.6.38-rc4-355739.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
<4>RIP: 0010:[<ffffffff810b0cd7>] [<ffffffff810b0cd7>] __remove_from_page_cache+0x54/0xb9
<4>RSP: 0018:ffff880296397c28 EFLAGS: 00010046
<4>RAX: 0000000000000000 RBX: ffffea00086cf1b8 RCX: 00000000ffffffc8
<4>RDX: 0000000000000009 RSI: 0000000000000009 RDI: ffff8802affece00
<4>RBP: ffff880296397c38 R08: 0000000000000000 R09: de80000000000000
<4>R10: ffff88028a4f6aa8 R11: ffff8802affece00 R12: ffff8801a7588578
<4>R13: ffff8801a7588590 R14: ffff880296397cc8 R15: ffffffffffffffff
<4>FS: 00007fbde33db760(0000) GS:ffff8800cfd40000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000003928271610 CR3: 0000000295e3b000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process umount (pid: 2611, threadinfo ffff880296396000, task ffff88007d62d960)
<0>Stack:
<4> ffffea00086cf1b8 0000000000000000 ffff880296397c68 ffffffff810b0d75
<4> ffff880296397cc8 ffffea00086cf1b8 ffff8801a7588578 ffff8801a7588578
<4> ffff880296397c88 ffffffff810b9b5e 0000000000000000 0000000000000005
<0>Call Trace:
<4> [<ffffffff810b0d75>] remove_from_page_cache+0x39/0x5c
<4> [<ffffffff810b9b5e>] truncate_inode_page+0x63/0x77
<4> [<ffffffff810b9c3c>] truncate_inode_pages_range+0xca/0x2e9
<4> [<ffffffff811876ae>] ? ext4_discard_preallocations+0x88/0x309
<4> [<ffffffff810b9e68>] truncate_inode_pages+0xd/0xf
<4> [<ffffffff8116716a>] ext4_evict_inode+0x4b/0x213
<4> [<ffffffff8110389d>] evict+0x1f/0x88
<4> [<ffffffff81103945>] dispose_list+0x3f/0xce
<4> [<ffffffff81104652>] ? evict_inodes+0xe4/0x124
<4> [<ffffffff8110467b>] evict_inodes+0x10d/0x124
<4> [<ffffffff810f1e2c>] generic_shutdown_super+0x60/0xf0
<4> [<ffffffff810f1ede>] kill_block_super+0x22/0x65
<4> [<ffffffff810f2139>] deactivate_locked_super+0x21/0x41
<4> [<ffffffff810f2c2d>] deactivate_super+0x35/0x39
<4> [<ffffffff81106fa3>] mntput_no_expire+0xcb/0xd0
<4> [<ffffffff81107ae4>] sys_umount+0x2e9/0x317
<4> [<ffffffff810f8afe>] ? path_put+0x1d/0x21
<4> [<ffffffff81002992>] system_call_fastpath+0x16/0x1b
<0>Code: be 09 00 00 00 48 89 df e8 5b 45 01 00 48 8b 03 a9 00 00 08 00 74 0d be 16 00 00 00 48 89 df e8 44 45 01 00 8b 43 0c 85 c0 78 02 <0f> 0b 48 8b 03 a8 10 74 57 49 8b 44 24 68 f6 40 20 01 75 4c 48
<1>RIP [<ffffffff810b0cd7>] __remove_from_page_cache+0x54/0xb9
<4> RSP <ffff880296397c28>

2011-02-14 14:51:46

by Yong Zhang

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Tue, Feb 08, 2011 at 03:18:00PM +0100, Peter Zijlstra wrote:
> Subject: lockdep, timer: Revert the del_timer_sync() annotation
>
> Both attempts at trying to allow softirq usage failed, revert for this
> release and try again later.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/timer.c | 8 +++-----
> 1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 343ff27..c848cd8 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -959,7 +959,7 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
> *
> * Synchronization rules: Callers must prevent restarting of the timer,
> * otherwise this function is meaningless. It must not be called from
> - * hardirq contexts. The caller must not hold locks which would prevent
> + * interrupt contexts. The caller must not hold locks which would prevent

I think we don't need to revert this comment.

> * completion of the timer's handler. The timer's handler must not call
> * add_timer_on(). Upon exit the timer is not queued and the handler is
> * not running on any CPU.
> @@ -971,12 +971,10 @@ int del_timer_sync(struct timer_list *timer)
> #ifdef CONFIG_LOCKDEP
> unsigned long flags;
>
> - raw_local_irq_save(flags);
> - local_bh_disable();
> + local_irq_save(flags);

Going back to local_irq_save()/local_irq_restore() doesn't prevent
it from being used in softirq context.

Thanks,
Yong

2011-02-14 15:27:08

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Sun, Feb 13, 2011 at 9:34 PM, Eric W. Biederman
<[email protected]> wrote:
>
> And for completeness. When I was rebooting v2.6.38-rc4 to start running
> 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 I hit this.

.. but this was while still running the older kernel, right?

> <2>kernel BUG at mm/filemap.c:125!

I suspect this is "normal" after page table corruption. Any page that
was mapped but overwritten by the corruption would never get unmapped
(since it can't be found in the page tables), and then you trigger the

BUG_ON(page_mapped(page));

in __remove_from_page_cache() at umount time. Your register state
shows that %eax is 0, and that's the count that we tested
("page->_mapcount" is -1 when there are no mappings, so you have one
lost mapping reference to that page).
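
To make the counting convention concrete, here is a minimal sketch of the check, using a hypothetical one-field stand-in for struct page. The names page_sketch, page_mapped_sketch() and mapping_refs() are invented for illustration; only the "-1 means no mappings" convention comes from the explanation above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the field discussed here. */
struct page_sketch {
    int mapcount; /* -1 when nothing maps the page, 0 for one mapping, ... */
};

/* True while at least one mapping still references the page. */
static bool page_mapped_sketch(const struct page_sketch *page)
{
    return page->mapcount >= 0;
}

/* Number of mappings implied by the biased counter. */
static int mapping_refs(const struct page_sketch *page)
{
    return page->mapcount + 1;
}
```

With a counter value of 0, as in %eax above, the sketch reports the page as still mapped with exactly one reference, which is the condition the BUG_ON() trips on.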

So that oops isn't all that interesting, I'm afraid.

Linus

2011-02-14 16:37:46

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Mon, Feb 14, 2011 at 7:37 AM, Eric W. Biederman
<[email protected]> wrote:
>
> 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 is not faring too well.
>
> The Bad PMDs may be happening more frequently but the oops that killed
> me was a NULL pointer dereference in acct_collect this time. Ugh.

So you also have a fair amount of those user-level SIGSEGV reports.
Which is consistent with memory corruption - most of the time the
corruption is not something that gets caught as a kernel data
structure corruption, but some random other data.

The PTE corruption does show some interesting patterns, though:

- it's always two consecutive page table entries (that have the same
value, and it looks like a kernel pointer)

This implies to me that it's a list operation. Please enable
CONFIG_DEBUG_LIST.

The fact that the words are the same also tends to imply that it's
likely a bogus "list_init()" on free'd (or re-used) memory.

- The values have a pattern, they look like this:

ffff88000aea5748
ffff88000af0d748
ffff88000af0f748
ffff88001dae1748
ffff88004b41f748
ffff8800aeb67748
ffff8801178f5748
ffff880192d85748
ffff8801e07a9748
ffff8801e50ef748
ffff880282177748

which means that they are always at the same offset (0x1748) of a
8kB allocation

- The page table addresses have a pattern too (the count there is the
uniq count - there's one pair of addresses that shows up twice):

1 00000000082e9000
1 00000000082ea000
1 000000000bae9000
1 000000000baea000
1 00000000c2ce9000
1 00000000c2cea000
1 00000000eeae9000
1 00000000eeaea000
1 00000000ef4e9000
1 00000000ef4ea000
1 00000000f04e9000
1 00000000f04ea000
1 00000000f3ce9000
1 00000000f3cea000
1 00000000f42e9000
1 00000000f42ea000
2 00000000f50e9000
2 00000000f50ea000
1 00000000f66e9000
1 00000000f66ea000

and turning "virtual address" into "page table address" (shift down
by page size, shift up by page table entry size), you get

00041748
00041750
0005d748
0005d750
00616748
00616750
00775748
00775750
0077a748
0077a750
00782748
00782750
0079e748
0079e750
007a1748
007a1750
007a8748
007a8750
007b3748
007b3750

which shows the same 0x748 pattern (the "0x750" pattern is just the
next word address). Which is *exactly* what you'd expect from an empty
list (list pointer pointing to itself, and the low 12 bits are
identical in virtual address - the high bits will obviously differ,
since they are all about the allocation of the page tables
themselves).
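
The arithmetic in the list above can be checked with a few lines of C. This is a sketch under stated assumptions: 4kB pages, 8-byte page table entries, and an 8kB-aligned allocation. The macro and function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12        /* assumed 4kB pages */
#define SKETCH_PTE_SHIFT  3         /* assumed 8-byte page table entries */
#define SKETCH_ALLOC_SIZE 0x2000ULL /* assumed 8kB, 8kB-aligned allocation */

/* Offset of a kernel pointer inside its 8kB allocation. */
static uint64_t offset_in_alloc(uint64_t kptr)
{
    return kptr & (SKETCH_ALLOC_SIZE - 1);
}

/* "Shift down by page size, shift up by page table entry size". */
static uint64_t pte_byte_offset(uint64_t vaddr)
{
    return (vaddr >> SKETCH_PAGE_SHIFT) << SKETCH_PTE_SHIFT;
}

/* Low 12 bits, i.e. the position inside a single 4kB page. */
static uint64_t offset_in_page(uint64_t addr)
{
    return addr & ((1ULL << SKETCH_PAGE_SHIFT) - 1);
}
```

For example, offset_in_alloc(0xffff88000aea5748) gives 0x1748, and pte_byte_offset(0x82e9000) gives 0x41748, whose low 12 bits are 0x748.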

In other words: I can pretty much guarantee that this is a "struct
list" that is in a 8kB allocation at offset 0x1748. And that gets
re-initialized after it got freed.
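
As a rough illustration of that signature, the sketch below places a list node at offset 0x1748 of a hypothetical 8kB object and re-initializes it. The struct layout, big_object, init_list_head() and looks_like_empty_list() are all assumptions made for the example; the real structure is unknown at this point in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for a doubly linked list node. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

#define LIST_OFFSET 0x1748 /* offset reported in the thread */

/* Hypothetical 8kB object with a list node at the reported offset. */
struct big_object {
    unsigned char pad[LIST_OFFSET];
    struct list_head list;
    unsigned char tail[0x2000 - LIST_OFFSET - sizeof(struct list_head)];
};

/* Initializing an empty list writes the node's own address twice. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Two consecutive identical words that both point at themselves. */
static int looks_like_empty_list(const void *p)
{
    uintptr_t words[2];

    memcpy(words, p, sizeof(words));
    return words[0] == words[1] && words[0] == (uintptr_t)p;
}
```

If the memory behind such a node has already been freed and reused as a page table, the same two stores would show up as a pair of identical bogus entries, which matches the pattern described above.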

Now, I don't know what the actual 8kB allocation is. And most
structures end up having very different offsets based on various
config options, so I can't even guess. And it is possible that there
is some other reason for the 8kB thing (for example, you clearly are
doing things with networking and promiscuous mode, and maybe the
particular skb allocation pattern or something ends up using a SLUB
entry that is always two pages, etc.).

Can anybody see any other patterns?

Linus

2011-02-14 16:59:04

by Mike Snitzer

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Hey Eric,

On Mon, Feb 14, 2011 at 12:34 AM, Eric W. Biederman
<[email protected]> wrote:
>
> And for completeness. When I was rebooting v2.6.38-rc4 to start running
> 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 I hit this.
>
> Sigh. I wish crash worked on something besides Red Hat's enterprise
> kernels. Then I could use the system core file I have to do more than
> extract the dmesg.

Then you should cc [email protected] (now cc'd) and work with
Dave Anderson (e.g. get him your vmlinux and core files, which version
of crash you're using and how it fails). Dave does an amazing job of
working through crash issues which are reported against upstream
kernels -- the key first step is the report.

Mike

2011-02-14 17:39:54

by Eric W. Biederman

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Linus Torvalds <[email protected]> writes:

> On Mon, Feb 14, 2011 at 7:37 AM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 is not faring too well.
>>
>> The Bad PMDs may be happening more frequently but the oops that killed
>> me was a NULL pointer dereference in acct_collect this time.  Ugh.
>
> So you also have a fair amount of those user-level SIGSEGV reports.
> Which is consistent with memory corruption - most of the time the
> corruption is not something that gets caught as a kernel data
> structure corruption, but some random other data.
>
> The PTE corruption does show some interesting patterns, though:
>
> - it's always two consecutive page table entries (that have the same
> value, and it looks like a kernel pointer)
>
> This implies to me that it's a list operation. Please enable
> CONFIG_DEBUG_LIST.
>
> The fact that the words are the same also tends to imply that it's
> likely a bogus "list_init()" on free'd (or re-used) memory.
>
> - The values have a pattern, they look like this:
>
> ffff88000aea5748
> ffff88000af0d748
> ffff88000af0f748
> ffff88001dae1748
> ffff88004b41f748
> ffff8800aeb67748
> ffff8801178f5748
> ffff880192d85748
> ffff8801e07a9748
> ffff8801e50ef748
> ffff880282177748
>
> which means that they are always at the same offset (0x1748) of a
> 8kB allocation
>
> - The page table addresses have a pattern too (the count there is the
> uniq count - there's one pair of addresses that shows up twice):
>
> 1 00000000082e9000
> 1 00000000082ea000
> 1 000000000bae9000
> 1 000000000baea000
> 1 00000000c2ce9000
> 1 00000000c2cea000
> 1 00000000eeae9000
> 1 00000000eeaea000
> 1 00000000ef4e9000
> 1 00000000ef4ea000
> 1 00000000f04e9000
> 1 00000000f04ea000
> 1 00000000f3ce9000
> 1 00000000f3cea000
> 1 00000000f42e9000
> 1 00000000f42ea000
> 2 00000000f50e9000
> 2 00000000f50ea000
> 1 00000000f66e9000
> 1 00000000f66ea000
>
> and turning "virtual address" into "page table address" (shift down
> by page size, shift up by page table entry size), you get
>
> 00041748
> 00041750
> 0005d748
> 0005d750
> 00616748
> 00616750
> 00775748
> 00775750
> 0077a748
> 0077a750
> 00782748
> 00782750
> 0079e748
> 0079e750
> 007a1748
> 007a1750
> 007a8748
> 007a8750
> 007b3748
> 007b3750
>
> which shows the same 0x748 pattern (the "1750" pattern is just the
> next word address). Which is *exactly* what you'd expect from an empty
> list (list pointer pointing to itself, and the low 12 bits are
> identical in virtual address - the high bits will obviously differ,
> since they are all about the allocation of the page tables
> themselves).
>
> In other words: I can pretty much guarantee that this is a "struct
> list" that is in a 8kB allocation at offset 0x1748. And that gets
> re-initialized after it got freed.

Interesting.

> Now, I don't know what the actual 8kB allocation is. And most
> structures end up having very different offsets based on various
> config options, so I can't even guess. And it is possible that there
> is some other reason for the 8kB thing (for example, you clearly are
> doing things with networking and promiscuous mode, and maybe the
> particular skb allocation pattern or something ends up using a SLUB
> entry that is always two pages, etc.).

It could be. I also use a lot of transient network namespaces, so
potentially it could be just about anything in the networking stack.
They make testing all kinds of networking behavior easy, especially when
all you have is a single machine. Since we sniff the traffic to make
certain the right traffic is in transit we also get a lot of network
interfaces in promiscuous mode.

Eric

2011-02-14 17:50:24

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Mon, Feb 14, 2011 at 8:37 AM, Linus Torvalds
<[email protected]> wrote:
>
> In other words: I can pretty much guarantee that this is a "struct
> list" that is in a 8kB allocation at offset 0x1748. And that gets
> re-initialized after it got freed.

Btw, can you just make that 'vmlinux' image (not the compressed one,
the actual final object file that I could do "objdump" on) available
somewhere, or just send me the .config file to try to reproduce it
closely enough to see what the 0x1748 offset might be.

As mentioned, those offsets change (often wildly) depending on config
options, and I don't think I've seen the .config file you have (or if
I've seen it, I've misplaced it).

Linus

2011-02-14 18:14:41

by Linus Torvalds

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Mon, Feb 14, 2011 at 9:49 AM, Linus Torvalds
<[email protected]> wrote:
>
> Btw, can you just make that 'vmlinux' image [..]

.. actually, since you have lots of modules, the .config file is a
much better option. If it's in some module, the vmlinux file won't
contain it.

(Although best guess would be for it to be some core data structure,
so who knows?)

Linus

2011-02-14 18:24:36

by Andi Kleen

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Linus Torvalds <[email protected]> writes:

>
> In other words: I can pretty much guarantee that this is a "struct
> list" that is in a 8kB allocation at offset 0x1748. And that gets
> re-initialized after it got freed.

If it's that, then DEBUG_PAGEALLOC should have a good chance of catching
it.

-Andi

--
[email protected] -- Speaking for myself only

2011-02-14 18:53:53

by Thomas Gleixner

Subject: Re: lockdep: possible reason: unannotated irqs-off. (was: Re: Linux 2.6.38-rc4)

On Mon, 14 Feb 2011, Yong Zhang wrote:

> On Tue, Feb 08, 2011 at 03:18:00PM +0100, Peter Zijlstra wrote:
> > Subject: lockdep, timer: Revert the del_timer_sync() annotation
> >
> > Both attempts at trying to allow softirq usage failed, revert for this
> > release and try again later.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > ---
> > kernel/timer.c | 8 +++-----
> > 1 files changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index 343ff27..c848cd8 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -959,7 +959,7 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
> > *
> > * Synchronization rules: Callers must prevent restarting of the timer,
> > * otherwise this function is meaningless. It must not be called from
> > - * hardirq contexts. The caller must not hold locks which would prevent
> > + * interrupt contexts. The caller must not hold locks which would prevent
>
> I think we don't need to revert this comment.

That does not matter. It breaks stuff left and right and we need to go
back to the old (maybe less broken) state in that phase of -rc. It's
that simple.

> > * completion of the timer's handler. The timer's handler must not call
> > * add_timer_on(). Upon exit the timer is not queued and the handler is
> > * not running on any CPU.
> > @@ -971,12 +971,10 @@ int del_timer_sync(struct timer_list *timer)
> > #ifdef CONFIG_LOCKDEP
> > unsigned long flags;
> >
> > - raw_local_irq_save(flags);
> > - local_bh_disable();
> > + local_irq_save(flags);
>
> Going back to local_irq_save()/local_irq_restore() doesn't prevent
> it from using in softirq context.

That does not matter. It goes back to status quo and does not
introduce new problems. As the changelog says:

> > Both attempts at trying to allow softirq usage failed, revert for this
> > release and try again later.

So it's not forgotten. It's just not fixable right now.

Thanks,

tglx

2011-02-14 19:45:43

by Eric W. Biederman

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

Linus Torvalds <[email protected]> writes:

> On Mon, Feb 14, 2011 at 9:49 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Btw, can you just make that 'vmlinux' image [..]
>
> .. actually, since you have lots of modules, the .config file is a
> much better option. If it's in some module, the vmlinux file won't
> contain it.
>
> (Although best guess would be for it to be some core data structure,
> so who knows?)

Yeah I will get you the whole pile of rpms with everything in them
in a bit.

Meanwhile.

With CONFIG_DEBUG_PAGEALLOC I don't have a kernel that successfully
boots. I am having trouble reading the output, and it doesn't look like
it is consistently failing in the same place.

My three boot attempts with CONFIG_DEBUG_PAGEALLOC are below.

Eric


ERROR: kmemcheck: Fatal error
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810df982>] perf_event_update_userpage+0x42/0xe0
Oops: 0002 [#1] SMP
last sysfs file:
Stack:
Call Trace:
<NMI>
<<EOE>>
<#DB>
<<EOE>>
Code: 48 89 fb 48 c7 c7 c0 a3 a1 81 48 83 ec 18 48 c7 04 24 40 f9 0d 81 e8 be 79 fa ff 48 8b 83 e8 02 00 00 48 85 c0 74 57 48 8b 40 58 <83> 40 08 01 31 d2 f6 83 30 01 00 00 01 8b 4b 58 8b b3 10 01 00
RIP [<ffffffff810df982>] perf_event_update_userpage+0x42/0xe0
CR2: 0000000000000008
Kernel panic - not syncing: Attempted to kill init!



Then when I crank the loglevel up to 8 I see:
..........................................................................................................
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 ro root=
LABEL=/ rhgb quiet crashkernel=128M rd_NO_PLYMOUTH console=tty0 console=ttyS0,9
600 LANG=en_US.UTF-8 loglevel=8
[Linux-bzImage, setup=0x3800, size=0x3ebb90]
initrd /boot/initramfs-2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64.img
[Linux-initrd @ 0x36d47000, 0x12a8eb7 bytes]

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 ([email protected]) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Mon Feb 14 09:30:51 PST 2011
Command line: ro root=LABEL=/ rhgb quiet crashkernel=128M rd_NO_PLYMOUTH console=tty0 console=ttyS0,9600 LANG=en_US.UTF-8 loglevel=8
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cfef0000 (usable)
BIOS-e820: 00000000cfef0000 - 00000000cff04000 (ACPI data)
BIOS-e820: 00000000cff04000 - 00000000cff05000 (ACPI NVS)
BIOS-e820: 00000000cff05000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000002b0000000 (usable)
NX (Execute Disable) protection: active
DMI present.
DMI: X7DWU/X7DWU, BIOS 1.2 11/04/2008
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
No AGP bridge found
last_pfn = 0x2b0000 max_arch_pfn = 0x400000000
MTRR default type: uncachable
MTRR fixed ranges enabled:
00000-9FFFF write-back
A0000-BFFFF uncachable
C0000-CFFFF write-protect
D0000-E3FFF uncachable
E4000-FFFFF write-protect
MTRR variable ranges enabled:
0 base 00D0000000 mask 3FF0000000 uncachable
1 base 00E0000000 mask 3FE0000000 uncachable
2 base 0000000000 mask 3E00000000 write-back
3 base 0200000000 mask 3F80000000 write-back
4 base 0280000000 mask 3FE0000000 write-back
5 base 02A0000000 mask 3FF0000000 write-back
6 base 00CFF80000 mask 3FFFF80000 uncachable
7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
e820 update range: 00000000cff80000 - 0000000100000000 (usable) ==> (reserved)
last_pfn = 0xcfef0 max_arch_pfn = 0x400000000
found SMP MP-table at [ffff8800000f68b0] f68b0
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-00000000cfef0000
0000000000 - 00cfef0000 page 4k
kernel direct mapping tables up to cfef0000 @ 1f97b000-20000000
init_memory_mapping: 0000000100000000-00000002b0000000
0100000000 - 02b0000000 page 4k
kernel direct mapping tables up to 2b0000000 @ ce964000-cfef0000
RAMDISK: 36d47000 - 37ff0000
Reserving 128MB of memory at 736MB for crashkernel (System RAM: 11008MB)
ACPI: RSDP 00000000000f6880 00024 (v02 PTLTD )
ACPI: XSDT 00000000cfefcef6 000D4 (v01 PTLTD ? XSDT 06040000 LTP 00000000)
ACPI: FACP 00000000cff033a8 000F4 (v03 INTEL STOAKLEY 06040000 PTL 00000003)
ACPI: DSDT 00000000cfefeab8 0487C (v01 Intel SEABURG 06040000 MSFT 03000001)
ACPI: FACS 00000000cff04fc0 00040
ACPI: _MAR 00000000cff0349c 00030 (v01 Intel OEMDMAR 06040000 LOHR 00000001)
ACPI: TCPA 00000000cff034cc 00032 (v01 Intel STOAKLEY 06040000 LOHR 0000005A)
ACPI: APIC 00000000cff034fe 000C8 (v01 PTLTD ? APIC 06040000 LTP 00000000)
ACPI: MCFG 00000000cff035c6 0003C (v01 PTLTD MCFG 06040000 LTP 00000000)
ACPI: HPET 00000000cff03602 00038 (v01 PTLTD HPETTBL 06040000 LTP 00000001)
ACPI: BOOT 00000000cff0363a 00028 (v01 PTLTD $SBFTBL$ 06040000 LTP 00000001)
ACPI: SPCR 00000000cff03662 00050 (v01 PTLTD $UCRTBL$ 06040000 PTL 00000001)
ACPI: ERST 00000000cff036b2 00590 (v01 SMCI ERSTTBL 06040000 SMCI 00000001)
ACPI: HEST 00000000cff03c42 000A8 (v01 SMCI HESTTBL 06040000 SMCI 00000001)
ACPI: BERT 00000000cff03cea 00030 (v01 SMCI BERTTBL 06040000 SMCI 00000001)
ACPI: EINJ 00000000cff03d1a 00170 (v01 SMCI EINJTBL 06040000 SMCI 00000001)
ACPI: SLIC 00000000cff03e8a 00176 (v01 OEM_ID OEMTABLE 06040000 LTP 00000000)
ACPI: SSDT 00000000cfefe859 0025F (v01 PmRef Cpu0Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe7b3 000A6 (v01 PmRef Cpu7Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe70d 000A6 (v01 PmRef Cpu6Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe667 000A6 (v01 PmRef Cpu5Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe5c1 000A6 (v01 PmRef Cpu4Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe51b 000A6 (v01 PmRef Cpu3Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe475 000A6 (v01 PmRef Cpu2Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe3cf 000A6 (v01 PmRef Cpu1Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefcfca 01405 (v01 PmRef CpuPm 00003000 INTL 20050228)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-00000002b0000000
Initmem setup node 0 0000000000000000-00000002b0000000
NODE_DATA [00000002affec000 - 00000002afffffff]
[ffffea0000000000-ffffea000abfffff] PMD -> [ffff8802a5e00000-ffff8802afdfffff] on node 0
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x002b0000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0x00000010 -> 0x0000009d
0: 0x00000100 -> 0x000cfef0
0: 0x00100000 -> 0x002b0000
On node 0 totalpages: 2621053
DMA zone: 64 pages used for memmap
DMA zone: 6 pages reserved
DMA zone: 3911 pages, LIFO batch:0
DMA32 zone: 16320 pages used for memmap
DMA32 zone: 831280 pages, LIFO batch:31
Normal zone: 27648 pages used for memmap
Normal zone: 1741824 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec89000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec89000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a201 base: 0xfed00000
SMP: Allowing 8 CPUs, 0 hotplug CPUs
nr_irqs_gsi: 64
PM: Registered nosave memory: 000000000009d000 - 000000000009e000
PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
PM: Registered nosave memory: 00000000cfef0000 - 00000000cff04000
PM: Registered nosave memory: 00000000cff04000 - 00000000cff05000
PM: Registered nosave memory: 00000000cff05000 - 00000000d0000000
PM: Registered nosave memory: 00000000d0000000 - 00000000e0000000
PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000
PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000
PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
PM: Registered nosave memory: 00000000fee01000 - 00000000ff000000
PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
Allocating PCI resources starting at d0000000 (gap: d0000000:10000000)
Booting paravirtualized kernel on bare hardware
setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:8 nr_node_ids:1
PERCPU: Embedded 26 pages/cpu @ffff8800cfc00000 s77056 r8192 d21248 u262144
pcpu-alloc: s77056 r8192 d21248 u262144 alloc=1*2097152
pcpu-alloc: [0] 0 1 2 3 4 5 6 7
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 2577015
Policy zone: Normal
Kernel command line: ro root=LABEL=/ rhgb quiet crashkernel=128M rd_NO_PLYMOUTH console=tty0 console=ttyS0,9600 LANG=en_US.UTF-8 loglevel=8
PID hash table entries: 4096 (order: 3, 32768 bytes)
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 10058856k/11272192k available (4972k kernel code, 787980k absent, 425356k reserved, 5824k data, 1308k init)
SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is disabled.
NR_IRQS:4352 nr_irqs:1152 16
Extended CMOS year: 2000
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES: 8
... MAX_LOCK_DEPTH: 48
... MAX_LOCKDEP_KEYS: 8191
... CLASSHASH_SIZE: 4096
... MAX_LOCKDEP_ENTRIES: 16384
... MAX_LOCKDEP_CHAINS: 32768
... CHAINHASH_SIZE: 16384
memory used by lock dependency info: 5855 kB
per task-struct memory footprint: 1920 bytes
allocated 104857600 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
hpet clockevent registered
Fast TSC calibration using PIT
Detected 2500.101 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 5000.20 BogoMIPS (lpj=10000404)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys blkio
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
mce: CPU supports 6 MCE banks
CPU0: Thermal monitoring enabled (TM2)
using mwait in idle threads.
ACPI: Core revision 20110112
Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz stepping 06
Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
kmemcheck: Limiting number of CPUs to 1.
kmemcheck: Initialized
NMI watchdog enabled, takes one hw-pmu counter.
Brought up 1 CPUs
Total of 1 processors activated (5000.20 BogoMIPS).
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-09] at [mem 0xe0000000-0xe09fffff] (base 0xe0000000)
PCI: MMCONFIG at [mem 0xe0000000-0xe09fffff] reserved in E820
PCI: Using configuration type 1 for base access
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
Hardware name: X7DWU
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
<NMI> [<ffffffff8104eeaa>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81318f57>] ? vt_console_print+0xa7/0x390
[<ffffffff81318f1c>] ? vt_console_print+0x6c/0x390
[<ffffffff8104f025>] ? __call_console_drivers+0x75/0x90
[<ffffffff8104f085>] ? _call_console_drivers+0x45/0x70
[<ffffffff8104f59e>] ? console_unlock+0x11e/0x250
[<ffffffff8104f9ee>] ? vprintk+0x1fe/0x4c0
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814cd50c>] ? printk+0x3c/0x40
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff8104ee68>] ? warn_slowpath_common+0x38/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81011bf8>] ? x86_perf_event_update+0x28/0xc0
[<ffffffff81015ffc>] ? intel_pmu_handle_irq+0x1fc/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5107>] ? __atomic_notifier_call_chain+0x67/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d92>] ? do_nmi+0xd2/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff8112e60e>] ? kmem_cache_alloc+0x3e/0x160
[<ffffffff814d1580>] ? page_fault+0x0/0x30
<<EOE>> [<ffffffff8112e626>] ? kmem_cache_alloc+0x56/0x160
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff811a6838>] ? sysfs_add_file_mode+0x38/0xc0
[<ffffffff811a68cc>] ? sysfs_add_file+0xc/0x10
[<ffffffff811a69a1>] ? sysfs_create_file+0x21/0x30
[<ffffffff81338314>] ? device_create_file+0x14/0x20
[<ffffffff81338cb8>] ? device_add+0x128/0x650
[<ffffffff813391f9>] ? device_register+0x19/0x20
[<ffffffff813392fb>] ? device_create_vargs+0xfb/0x130
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff8110308f>] ? bdi_register+0x6f/0x1c0
[<ffffffff8127c183>] ? prop_local_init_percpu+0x43/0x50
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff81abdd36>] ? default_bdi_init+0x9f/0xaf
[<ffffffff810002ff>] ? do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] ? kernel_init+0x17c/0x205
[<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
[<ffffffff814d1394>] ? restore_args+0x0/0x30
[<ffffffff81aa0c4a>] ? kernel_init+0x0/0x205
[<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10
---[ end trace a7919e7f17c0a725 ]---
WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
Hardware name: X7DWU
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
<NMI> [<ffffffff8104eeaa>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81011bf8>] ? x86_perf_event_update+0x28/0xc0
[<ffffffff81015ffc>] ? intel_pmu_handle_irq+0x1fc/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5107>] ? __atomic_notifier_call_chain+0x67/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d92>] ? do_nmi+0xd2/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff8112e60e>] ? kmem_cache_alloc+0x3e/0x160
[<ffffffff814d1580>] ? page_fault+0x0/0x30
<<EOE>> [<ffffffff8112e626>] ? kmem_cache_alloc+0x56/0x160
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff811a6838>] ? sysfs_add_file_mode+0x38/0xc0
[<ffffffff811a68cc>] ? sysfs_add_file+0xc/0x10
[<ffffffff811a69a1>] ? sysfs_create_file+0x21/0x30
[<ffffffff81338314>] ? device_create_file+0x14/0x20
[<ffffffff81338cb8>] ? device_add+0x128/0x650
[<ffffffff813391f9>] ? device_register+0x19/0x20
[<ffffffff813392fb>] ? device_create_vargs+0xfb/0x130
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff8110308f>] ? bdi_register+0x6f/0x1c0
[<ffffffff8127c183>] ? prop_local_init_percpu+0x43/0x50
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff81abdd36>] ? default_bdi_init+0x9f/0xaf
[<ffffffff810002ff>] ? do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] ? kernel_init+0x17c/0x205
[<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
[<ffffffff814d1394>] ? restore_args+0x0/0x30
[<ffffffff81aa0c4a>] ? kernel_init+0x0/0x205
[<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10
---[ end trace a7919e7f17c0a726 ]---
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
\_SB_:_OSC evaluation returned wrong type
_OSC request data:1 7
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
HEST: Table parsing has been initialized.
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000cc000-0x000cffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000d3fff]
pci_root PNP0A03:00: host bridge window [mem 0x000d4000-0x000d7fff]
pci_root PNP0A03:00: host bridge window [mem 0x000d8000-0x000dbfff]
pci_root PNP0A03:00: host bridge window [mem 0x000dc000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xd0000000-0xfebfffff]
pci 0000:00:00.0: [8086:4003] type 0 class 0x000600
pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
pci 0000:00:00.0: PME# disabled
pci 0000:00:01.0: [8086:4021] type 1 class 0x000604
pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
pci 0000:00:01.0: PME# disabled
pci 0000:00:03.0: [8086:4023] type 1 class 0x000604
pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
pci 0000:00:03.0: PME# disabled
pci 0000:00:05.0: [8086:4025] type 1 class 0x000604
pci 0000:00:05.0: PME# supported from D0 D3hot D3cold
pci 0000:00:05.0: PME# disabled
pci 0000:00:07.0: [8086:4027] type 1 class 0x000604
pci 0000:00:07.0: PME# supported from D0 D3hot D3cold
pci 0000:00:07.0: PME# disabled
pci 0000:00:09.0: [8086:4029] type 1 class 0x000604
pci 0000:00:09.0: PME# supported from D0 D3hot D3cold
pci 0000:00:09.0: PME# disabled
pci 0000:00:0f.0: [8086:402f] type 0 class 0x000880
pci 0000:00:0f.0: reg 10: [mem 0xfe700000-0xfe703fff 64bit]
pci 0000:00:10.0: [8086:4030] type 0 class 0x000600
pci 0000:00:10.1: [8086:4030] type 0 class 0x000600
pci 0000:00:10.2: [8086:4030] type 0 class 0x000600
pci 0000:00:10.3: [8086:4030] type 0 class 0x000600
pci 0000:00:10.4: [8086:4030] type 0 class 0x000600
pci 0000:00:11.0: [8086:4031] type 0 class 0x000600
pci 0000:00:15.0: [8086:4035] type 0 class 0x000600
pci 0000:00:15.1: [8086:4035] type 0 class 0x000600
pci 0000:00:16.0: [8086:4036] type 0 class 0x000600
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2465 lockdep_trace_alloc+0xcc/0xe0()
Hardware name: X7DWU
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
[<ffffffff8104eeaa>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff810883dc>] ? lockdep_trace_alloc+0xcc/0xe0
[<ffffffff8112ca3c>] ? __kmalloc+0x4c/0x190
[<ffffffff81284f77>] ? kvasprintf+0x57/0x90
[<ffffffff81284f77>] ? kvasprintf+0x57/0x90
[<ffffffff81279eda>] ? kobject_set_name_vargs+0x3a/0x80
[<ffffffff81337cbc>] ? dev_set_name+0x3c/0x40
[<ffffffff81297301>] ? pci_setup_device+0x131/0x410
[<ffffffff814bc873>] ? pci_scan_single_device+0xf3/0x160
[<ffffffff81296820>] ? next_trad_fn+0x0/0x10
[<ffffffff8129777c>] ? pci_scan_slot+0xac/0x130
[<ffffffff814be568>] ? pci_scan_child_bus+0x29/0xb1
[<ffffffff81296565>] ? pci_bus_add_resource+0x55/0xa0
[<ffffffff814c395e>] ? pci_acpi_scan_root+0x2a4/0x2fb
[<ffffffff814c0e0a>] ? acpi_pci_root_add+0x197/0x33a
[<ffffffff812ca5a7>] ? acpi_device_probe+0x0/0x185
[<ffffffff812ca5f2>] ? acpi_device_probe+0x4b/0x185
[<ffffffff8133bc36>] ? driver_probe_device+0x96/0x1c0
[<ffffffff8133be03>] ? __driver_attach+0xa3/0xb0
[<ffffffff8133bd60>] ? __driver_attach+0x0/0xb0
[<ffffffff8133a8fe>] ? bus_for_each_dev+0x5e/0x90
[<ffffffff8133b899>] ? driver_attach+0x19/0x20
[<ffffffff8133b443>] ? bus_add_driver+0xc3/0x280
[<ffffffff81acc931>] ? acpi_pci_root_init+0x0/0x2d
[<ffffffff8133c081>] ? driver_register+0x71/0x140
[<ffffffff81acc931>] ? acpi_pci_root_init+0x0/0x2d
[<ffffffff812cad36>] ? acpi_bus_register_driver+0x3e/0x40
[<ffffffff81acc956>] ? acpi_pci_root_init+0x25/0x2d
[<ffffffff810002ff>] ? do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] ? kernel_init+0x17c/0x205
[<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
[<ffffffff814d1394>] ? restore_args+0x0/0x30
[<ffffffff81aa0c4a>] ? kernel_init+0x0/0x205
[<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10
---[ end trace a7919e7f17c0a727 ]---
pci 0000:00:16.1: [8086:4036] type 0 class 0x000600
ERROR: kmemcheck: Fatal error

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<ffffffff8112c880>] [<ffffffff8112c880>] kmem_cache_alloc_trace+0x90/0x160
RSP: 0018:ffff880299ccbaf0 EFLAGS: 00010106
RAX: ffff880299edd000 RBX: ffff8802a58039c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000007fffff RDI: ffff880299ede000
RBP: ffff880299ccbb30 R08: 0000000000000000 R09: ffff880299edd000
R10: 0000000000000002 R11: 0000000000000001 R12: ffff880299edd000
R13: 00000000000080d0 R14: 0000000000000286 R15: ffffffff81296b2f
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880299c98810 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[<ffffffff810370c4>] kmemcheck_error_save_bug+0x74/0xb0
[<ffffffff8103790b>] kmemcheck_show+0x8b/0xa0
[<ffffffff81037d0a>] kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] do_page_fault+0x41d/0x560
[<ffffffff814d15a5>] page_fault+0x25/0x30
[<ffffffff81296b2f>] alloc_pci_dev+0x1f/0x40
[<ffffffff814bc84b>] pci_scan_single_device+0xcb/0x160
[<ffffffff8129777c>] pci_scan_slot+0xac/0x130
[<ffffffff814be568>] pci_scan_child_bus+0x29/0xb1
[<ffffffff814c395e>] pci_acpi_scan_root+0x2a4/0x2fb
[<ffffffff814c0e0a>] acpi_pci_root_add+0x197/0x33a
[<ffffffff812ca5f2>] acpi_device_probe+0x4b/0x185
[<ffffffff8133bc36>] driver_probe_device+0x96/0x1c0
[<ffffffff8133be03>] __driver_attach+0xa3/0xb0
[<ffffffff8133a8fe>] bus_for_each_dev+0x5e/0x90
[<ffffffff8133b899>] driver_attach+0x19/0x20
[<ffffffff8133b443>] bus_add_driver+0xc3/0x280
[<ffffffff8133c081>] driver_register+0x71/0x140
[<ffffffff812cad36>] acpi_bus_register_driver+0x3e/0x40
[<ffffffff81acc956>] acpi_pci_root_init+0x25/0x2d
[<ffffffff810002ff>] do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] kernel_init+0x17c/0x205
[<ffffffff81003c24>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
pci 0000:00:1d.0: [8086:2688] type 0 class 0x000c03
pci 0000:00:1d.0: reg 20: [io 0x1800-0x181f]
pci 0000:00:1d.1: [8086:2689] type 0 class 0x000c03
pci 0000:00:1d.1: reg 20: [io 0x1820-0x183f]
pci 0000:00:1d.2: [8086:268a] type 0 class 0x000c03
pci 0000:00:1d.2: reg 20: [io 0x1840-0x185f]
pci 0000:00:1d.3: [8086:268b] type 0 class 0x000c03
pci 0000:00:1d.3: reg 20: [io 0x1860-0x187f]
pci 0000:00:1d.7: [8086:268c] type 0 class 0x000c03
pci 0000:00:1d.7: reg 10: [mem 0xdc904000-0xdc9043ff]
pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1d.7: PME# disabled
pci 0000:00:1e.0: [8086:244e] type 1 class 0x000604
pci 0000:00:1f.0: [8086:2670] type 0 class 0x000601
pci 0000:00:1f.1: [8086:269e] type 0 class 0x000101
pci 0000:00:1f.1: reg 10: [io 0x0000-0x0007]
pci 0000:00:1f.1: reg 14: [io 0x0000-0x0003]
pci 0000:00:1f.1: reg 18: [io 0x0000-0x0007]
pci 0000:00:1f.1: reg 1c: [io 0x0000-0x0003]
pci 0000:00:1f.1: reg 20: [io 0x1880-0x188f]
pci 0000:00:1f.2: [8086:2680] type 0 class 0x000101
pci 0000:00:1f.2: reg 10: [io 0x18c0-0x18c7]
pci 0000:00:1f.2: reg 14: [io 0x18b8-0x18bb]
pci 0000:00:1f.2: reg 18: [io 0x18b0-0x18b7]
pci 0000:00:1f.2: reg 1c: [io 0x1894-0x1897]
pci 0000:00:1f.2: reg 20: [io 0x18a0-0x18af]
pci 0000:00:1f.2: reg 24: [mem 0xdc904400-0xdc9047ff]
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:00:1f.3: [8086:269b] type 0 class 0x000c05
pci 0000:00:1f.3: reg 20: [io 0x1100-0x111f]
pci 0000:00:01.0: PCI bridge to [bus 01-01]
pci 0000:00:01.0: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:00:01.0: bridge window [mem 0xfff00000-0x000fffff] (disabled)
pci 0000:00:01.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:02:00.0: [8086:3500] type 1 class 0x000604
pci 0000:02:00.0: PME# supported from D0 D3hot D3cold
pci 0000:02:00.0: PME# disabled
pci 0000:02:00.3: [8086:350c] type 1 class 0x000604
pci 0000:02:00.3: PME# supported from D0 D3hot D3cold
pci 0000:02:00.3: PME# disabled
pci 0000:02:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
pci 0000:00:03.0: PCI bridge to [bus 02-05]
pci 0000:00:03.0: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:00:03.0: bridge window [mem 0xdc500000-0xdc5fffff]
pci 0000:00:03.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:03:00.0: [8086:3510] type 1 class 0x000604
pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
pci 0000:03:00.0: PME# disabled
pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
pci 0000:02:00.0: PCI bridge to [bus 03-04]
pci 0000:02:00.0: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:02:00.0: bridge window [mem 0xfff00000-0x000fffff] (disabled)
pci 0000:02:00.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:03:00.0: PCI bridge to [bus 04-04]
pci 0000:03:00.0: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:03:00.0: bridge window [mem 0xfff00000-0x000fffff] (disabled)
pci 0000:03:00.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:02:00.3: PCI bridge to [bus 05-05]
pci 0000:02:00.3: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:02:00.3: bridge window [mem 0xfff00000-0x000fffff] (disabled)
pci 0000:02:00.3: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:06:00.0: [4040:0100] type 0 class 0x000200
pci 0000:06:00.0: reg 10: [mem 0xdc000000-0xdc1fffff 64bit]
pci 0000:06:00.0: reg 20: [mem 0xd8000000-0xd9ffffff 64bit]
pci 0000:06:00.0: reg 30: [mem 0x00000000-0x0000ffff pref]
pci 0000:06:00.0: PME# supported from D0 D3hot D3cold
pci 0000:06:00.0: PME# disabled
pci 0000:06:00.1: [4040:0100] type 0 class 0x000200
pci 0000:06:00.1: reg 10: [mem 0xdc200000-0xdc3fffff 64bit]
pci 0000:06:00.1: reg 20: [mem 0xda000000-0xdbffffff 64bit]
pci 0000:06:00.1: PME# supported from D0 D3hot D3cold
pci 0000:06:00.1: PME# disabled
pci 0000:00:05.0: PCI bridge to [bus 06-06]
pci 0000:00:05.0: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:00:05.0: bridge window [mem 0xd8000000-0xdc3fffff]
pci 0000:00:05.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:00:07.0: PCI bridge to [bus 07-07]
pci 0000:00:07.0: bridge window [io 0xf000-0x0000] (disabled)
pci 0000:00:07.0: bridge window [mem 0xfff00000-0x000fffff] (disabled)
pci 0000:00:07.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:08:00.0: [8086:10a7] type 0 class 0x000200
pci 0000:08:00.0: reg 10: [mem 0xdc420000-0xdc43ffff]
pci 0000:08:00.0: reg 14: [mem 0xdc400000-0xdc41ffff]
pci 0000:08:00.0: reg 18: [io 0x2000-0x201f]
pci 0000:08:00.0: reg 1c: [mem 0xdc480000-0xdc483fff]
pci 0000:08:00.0: reg 30: [mem 0x00000000-0x0001ffff pref]
pci 0000:08:00.0: PME# supported from D0 D3hot D3cold
pci 0000:08:00.0: PME# disabled
pci 0000:08:00.1: [8086:10a7] type 0 class 0x000200
pci 0000:08:00.1: reg 10: [mem 0xdc460000-0xdc47ffff]
pci 0000:08:00.1: reg 14: [mem 0xdc440000-0xdc45ffff]
pci 0000:08:00.1: reg 18: [io 0x2020-0x203f]
pci 0000:08:00.1: reg 1c: [mem 0xdc484000-0xdc487fff]
pci 0000:08:00.1: reg 30: [mem 0x00000000-0x0001ffff pref]
pci 0000:08:00.1: PME# supported from D0 D3hot D3cold
pci 0000:08:00.1: PME# disabled
pci 0000:00:09.0: PCI bridge to [bus 08-08]
pci 0000:00:09.0: bridge window [io 0x2000-0x2fff]
pci 0000:00:09.0: bridge window [mem 0xdc400000-0xdc4fffff]
pci 0000:00:09.0: bridge window [mem 0xfff00000-0x000fffff pref] (disabled)
pci 0000:09:01.0: [1002:515e] type 0 class 0x000300
pci 0000:09:01.0: reg 10: [mem 0xd0000000-0xd7ffffff pref]
pci 0000:09:01.0: reg 14: [io 0x3000-0x30ff]
pci 0000:09:01.0: reg 18: [mem 0xdc600000-0xdc60ffff]
pci 0000:09:01.0: reg 30: [mem 0x00000000-0x0001ffff pref]
pci 0000:09:01.0: supports D1 D2
pci 0000:00:1e.0: PCI bridge to [bus 09-09] (subtractive decode)
pci 0000:00:1e.0: bridge window [io 0x3000-0x3fff]
pci 0000:00:1e.0: bridge window [mem 0xdc600000-0xdc6fffff]
pci 0000:00:1e.0: bridge window [mem 0xd0000000-0xd7ffffff 64bit pref]
pci 0000:00:1e.0: bridge window [io 0x0000-0x0cf7] (subtractive decode)
pci 0000:00:1e.0: bridge window [io 0x0d00-0xffff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0x000a0000-0x000bffff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0x000cc000-0x000cffff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0x000d0000-0x000d3fff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0x000d4000-0x000d7fff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0x000d8000-0x000dbfff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0x000dc000-0x000dffff] (subtractive decode)
pci 0000:00:1e.0: bridge window [mem 0xd0000000-0xfebfffff] (subtractive decode)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3.BMF0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3.BMF0.BPD0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P5._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P7._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P9._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIB._PRT]
pci0000:00: Requesting ACPI _OSC control (0x1d)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000200
IP: [<ffffffff8101484b>] intel_pmu_enable_all+0xab/0x110
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<ffffffff8101484b>] [<ffffffff8101484b>] intel_pmu_enable_all+0xab/0x110
RSP: 0018:ffff8800cfc07cf8 EFLAGS: 00010292
RAX: ffff880299cd0000 RBX: ffff880299c98800 RCX: 0000000000000002
RDX: ffff880299cd0000 RSI: 0000000000000001 RDI: 0000000000000246
RBP: ffff8800cfc07d28 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80000001 R14: 0000000080000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8802a58182e4 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
BUG: unable to handle kernel paging request at 0000000181037f51
IP: [<ffffffff81038420>] kmemcheck_shadow_set+0x10/0x20
PGD 0
Oops: 0002 [#2] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<ffffffff81038420>] [<ffffffff81038420>] kmemcheck_shadow_set+0x10/0x20
RSP: 0018:ffff8800cfc079d8 EFLAGS: 00010082
RAX: 000000020206ffa6 RBX: 0000000181037f51 RCX: ffff880299c98000
RDX: 0000000081038055 RSI: 0000000081038054 RDI: 0000000181037f51
RBP: ffff8800cfc079d8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800cfc079d8
R13: ffffffff81038055 R14: ffff880299c98000 R15: ffff880299c98ae8
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880299c98910 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff880299cca000, task ffff880299cd0000)

RIP [<ffff8800cfc079c8>] 0xffff8800cfc079c8
RSP <00000001cfc079d8>
CR2: 0000000181037f51
---[ end trace a7919e7f17c0a728 ]---
swapper used greatest stack depth: 2752 bytes left
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: swapper Tainted: G D W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
<NMI> [<ffffffff814cd3b6>] ? panic+0x79/0x193
[<ffffffff810533eb>] ? do_exit+0x7fb/0x8d0
[<ffffffff814d23d5>] ? oops_end+0xa5/0xf0
[<ffffffff81030c70>] ? no_context+0xf0/0x260
[<ffffffff81030f05>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff81038055>] ? kmemcheck_shadow_lookup+0x45/0x70
[<ffffffff81030fce>] ? bad_area_nosemaphore+0xe/0x10
[<ffffffff814d4f52>] ? do_page_fault+0x482/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff81038055>] ? kmemcheck_shadow_lookup+0x45/0x70
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81038055>] ? kmemcheck_shadow_lookup+0x45/0x70
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff81038055>] ? kmemcheck_shadow_lookup+0x45/0x70
[<ffffffff81037177>] ? kmemcheck_read_strict+0x47/0xb0
[<ffffffff81011bf8>] ? x86_perf_event_update+0x28/0xc0
[<ffffffff81037225>] ? kmemcheck_read+0x45/0x70
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff810372e6>] ? kmemcheck_show_addr+0x46/0x70
[<ffffffff810377c9>] ? kmemcheck_show_all+0x39/0x60
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff81037d0a>] ? kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d1850>] ? error_exit+0x30/0xb0
[<ffffffff810df9ed>] ? perf_event_update_userpage+0xad/0xe0
[<ffffffff810df940>] ? perf_event_update_userpage+0x0/0xe0
[<ffffffff81011c36>] ? x86_perf_event_update+0x66/0xc0
[<ffffffff81014849>] ? intel_pmu_enable_all+0xa9/0x110
[<ffffffff8101606e>] ? intel_pmu_handle_irq+0x26e/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5121>] ? __atomic_notifier_call_chain+0x81/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff81322f90>] ? serial8250_console_putchar+0x0/0x40
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d3d>] ? do_nmi+0x7d/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff81322f90>] ? serial8250_console_putchar+0x0/0x40
[<ffffffff81322863>] ? io_serial_in+0x13/0x20
<<EOE>>
...................................................................................
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 ro root=LABEL=/ rhgb quiet crashkernel=128M rd_NO_PLYMOUTH console=tty0 console=ttyS0,9600 LANG=en_US.UTF-8 loglevel=8
[Linux-bzImage, setup=0x3800, size=0x3ebb90]
initrd /boot/initramfs-2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64.img
[Linux-initrd @ 0x36d47000, 0x12a8eb7 bytes]

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 ([email protected]) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #1 SMP Mon Feb 14 09:30:51 PST 2011
Command line: ro root=LABEL=/ rhgb quiet crashkernel=128M rd_NO_PLYMOUTH console=tty0 console=ttyS0,9600 LANG=en_US.UTF-8 loglevel=8
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cfef0000 (usable)
BIOS-e820: 00000000cfef0000 - 00000000cff04000 (ACPI data)
BIOS-e820: 00000000cff04000 - 00000000cff05000 (ACPI NVS)
BIOS-e820: 00000000cff05000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 00000002b0000000 (usable)
NX (Execute Disable) protection: active
DMI present.
DMI: X7DWU/X7DWU, BIOS 1.2 11/04/2008
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
No AGP bridge found
last_pfn = 0x2b0000 max_arch_pfn = 0x400000000
MTRR default type: uncachable
MTRR fixed ranges enabled:
00000-9FFFF write-back
A0000-BFFFF uncachable
C0000-CFFFF write-protect
D0000-E3FFF uncachable
E4000-FFFFF write-protect
MTRR variable ranges enabled:
0 base 00D0000000 mask 3FF0000000 uncachable
1 base 00E0000000 mask 3FE0000000 uncachable
2 base 0000000000 mask 3E00000000 write-back
3 base 0200000000 mask 3F80000000 write-back
4 base 0280000000 mask 3FE0000000 write-back
5 base 02A0000000 mask 3FF0000000 write-back
6 base 00CFF80000 mask 3FFFF80000 uncachable
7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
e820 update range: 00000000cff80000 - 0000000100000000 (usable) ==> (reserved)
last_pfn = 0xcfef0 max_arch_pfn = 0x400000000
found SMP MP-table at [ffff8800000f68b0] f68b0
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-00000000cfef0000
0000000000 - 00cfef0000 page 4k
kernel direct mapping tables up to cfef0000 @ 1f97b000-20000000
init_memory_mapping: 0000000100000000-00000002b0000000
0100000000 - 02b0000000 page 4k
kernel direct mapping tables up to 2b0000000 @ ce964000-cfef0000
RAMDISK: 36d47000 - 37ff0000
Reserving 128MB of memory at 736MB for crashkernel (System RAM: 11008MB)
ACPI: RSDP 00000000000f6880 00024 (v02 PTLTD )
ACPI: XSDT 00000000cfefcef6 000D4 (v01 PTLTD ? XSDT 06040000 LTP 00000000)
ACPI: FACP 00000000cff033a8 000F4 (v03 INTEL STOAKLEY 06040000 PTL 00000003)
ACPI: DSDT 00000000cfefeab8 0487C (v01 Intel SEABURG 06040000 MSFT 03000001)
ACPI: FACS 00000000cff04fc0 00040
ACPI: _MAR 00000000cff0349c 00030 (v01 Intel OEMDMAR 06040000 LOHR 00000001)
ACPI: TCPA 00000000cff034cc 00032 (v01 Intel STOAKLEY 06040000 LOHR 0000005A)
ACPI: APIC 00000000cff034fe 000C8 (v01 PTLTD ? APIC 06040000 LTP 00000000)
ACPI: MCFG 00000000cff035c6 0003C (v01 PTLTD MCFG 06040000 LTP 00000000)
ACPI: HPET 00000000cff03602 00038 (v01 PTLTD HPETTBL 06040000 LTP 00000001)
ACPI: BOOT 00000000cff0363a 00028 (v01 PTLTD $SBFTBL$ 06040000 LTP 00000001)
ACPI: SPCR 00000000cff03662 00050 (v01 PTLTD $UCRTBL$ 06040000 PTL 00000001)
ACPI: ERST 00000000cff036b2 00590 (v01 SMCI ERSTTBL 06040000 SMCI 00000001)
ACPI: HEST 00000000cff03c42 000A8 (v01 SMCI HESTTBL 06040000 SMCI 00000001)
ACPI: BERT 00000000cff03cea 00030 (v01 SMCI BERTTBL 06040000 SMCI 00000001)
ACPI: EINJ 00000000cff03d1a 00170 (v01 SMCI EINJTBL 06040000 SMCI 00000001)
ACPI: SLIC 00000000cff03e8a 00176 (v01 OEM_ID OEMTABLE 06040000 LTP 00000000)
ACPI: SSDT 00000000cfefe859 0025F (v01 PmRef Cpu0Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe7b3 000A6 (v01 PmRef Cpu7Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe70d 000A6 (v01 PmRef Cpu6Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe667 000A6 (v01 PmRef Cpu5Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe5c1 000A6 (v01 PmRef Cpu4Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe51b 000A6 (v01 PmRef Cpu3Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe475 000A6 (v01 PmRef Cpu2Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefe3cf 000A6 (v01 PmRef Cpu1Tst 00003000 INTL 20050228)
ACPI: SSDT 00000000cfefcfca 01405 (v01 PmRef CpuPm 00003000 INTL 20050228)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-00000002b0000000
Initmem setup node 0 0000000000000000-00000002b0000000
NODE_DATA [00000002affec000 - 00000002afffffff]
[ffffea0000000000-ffffea000abfffff] PMD -> [ffff8802a5e00000-ffff8802afdfffff] on node 0
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x002b0000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0x00000010 -> 0x0000009d
0: 0x00000100 -> 0x000cfef0
0: 0x00100000 -> 0x002b0000
On node 0 totalpages: 2621053
DMA zone: 64 pages used for memmap
DMA zone: 6 pages reserved
DMA zone: 3911 pages, LIFO batch:0
DMA32 zone: 16320 pages used for memmap
DMA32 zone: 831280 pages, LIFO batch:31
Normal zone: 27648 pages used for memmap
Normal zone: 1741824 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec89000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec89000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a201 base: 0xfed00000
SMP: Allowing 8 CPUs, 0 hotplug CPUs
nr_irqs_gsi: 64
PM: Registered nosave memory: 000000000009d000 - 000000000009e000
PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
PM: Registered nosave memory: 00000000cfef0000 - 00000000cff04000
PM: Registered nosave memory: 00000000cff04000 - 00000000cff05000
PM: Registered nosave memory: 00000000cff05000 - 00000000d0000000
PM: Registered nosave memory: 00000000d0000000 - 00000000e0000000
PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000
PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000
PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
PM: Registered nosave memory: 00000000fee01000 - 00000000ff000000
PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
Allocating PCI resources starting at d0000000 (gap: d0000000:10000000)
Booting paravirtualized kernel on bare hardware
setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:8 nr_node_ids:1
PERCPU: Embedded 26 pages/cpu @ffff8800cfc00000 s77056 r8192 d21248 u262144
pcpu-alloc: s77056 r8192 d21248 u262144 alloc=1*2097152
pcpu-alloc: [0] 0 1 2 3 4 5 6 7
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 2577015
Policy zone: Normal
Kernel command line: ro root=LABEL=/ rhgb quiet crashkernel=128M rd_NO_PLYMOUTH console=tty0 console=ttyS0,9600 LANG=en_US.UTF-8 loglevel=8
PID hash table entries: 4096 (order: 3, 32768 bytes)
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 10058856k/11272192k available (4972k kernel code, 787980k absent, 425356k reserved, 5824k data, 1308k init)
SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is disabled.
NR_IRQS:4352 nr_irqs:1152 16
Extended CMOS year: 2000
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES: 8
... MAX_LOCK_DEPTH: 48
... MAX_LOCKDEP_KEYS: 8191
... CLASSHASH_SIZE: 4096
... MAX_LOCKDEP_ENTRIES: 16384
... MAX_LOCKDEP_CHAINS: 32768
... CHAINHASH_SIZE: 16384
memory used by lock dependency info: 5855 kB
per task-struct memory footprint: 1920 bytes
allocated 104857600 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
hpet clockevent registered
Fast TSC calibration using PIT
Detected 2500.222 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 5000.44 BogoMIPS (lpj=10000888)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys blkio
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
mce: CPU supports 6 MCE banks
CPU0: Thermal monitoring enabled (TM2)
using mwait in idle threads.
ACPI: Core revision 20110112
Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz stepping 06
Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
kmemcheck: Limiting number of CPUs to 1.
kmemcheck: Initialized
NMI watchdog enabled, takes one hw-pmu counter.
Brought up 1 CPUs
Total of 1 processors activated (5000.44 BogoMIPS).
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-09] at [mem 0xe0000000-0xe09fffff] (base 0xe0000000)
PCI: MMCONFIG at [mem 0xe0000000-0xe09fffff] reserved in E820
PCI: Using configuration type 1 for base access
------------[ cut here ]------------
------------[ cut here ]------------
WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
Hardware name: X7DWU
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
<NMI> [<ffffffff8104eeaa>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81318f57>] ? vt_console_print+0xa7/0x390
[<ffffffff81318f1c>] ? vt_console_print+0x6c/0x390
[<ffffffff8104f025>] ? __call_console_drivers+0x75/0x90
[<ffffffff8104f085>] ? _call_console_drivers+0x45/0x70
[<ffffffff8104f59e>] ? console_unlock+0x11e/0x250
[<ffffffff8104f9ee>] ? vprintk+0x1fe/0x4c0
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814cd50c>] ? printk+0x3c/0x40
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff8104ee68>] ? warn_slowpath_common+0x38/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81011bf8>] ? x86_perf_event_update+0x28/0xc0
[<ffffffff81015ffc>] ? intel_pmu_handle_irq+0x1fc/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5107>] ? __atomic_notifier_call_chain+0x67/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d92>] ? do_nmi+0xd2/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff810372ee>] ? kmemcheck_show_addr+0x4e/0x70
<<EOE>> [<ffffffff810377c9>] ? kmemcheck_show_all+0x39/0x60
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff81037d0a>] ? kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff811004fb>] ? kstrdup+0x4b/0x70
[<ffffffff811a739c>] ? sysfs_new_dirent+0x10c/0x120
[<ffffffff811a8123>] ? sysfs_do_create_link+0x93/0x230
[<ffffffff814d1850>] ? error_exit+0x30/0xb0
[<ffffffff811a82ce>] ? sysfs_create_link+0xe/0x10
[<ffffffff81338f0b>] ? device_add+0x37b/0x650
[<ffffffff813391f9>] ? device_register+0x19/0x20
[<ffffffff813392fb>] ? device_create_vargs+0xfb/0x130
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff8110308f>] ? bdi_register+0x6f/0x1c0
[<ffffffff8127c183>] ? prop_local_init_percpu+0x43/0x50
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff81abdd36>] ? default_bdi_init+0x9f/0xaf
[<ffffffff810002ff>] ? do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] ? kernel_init+0x17c/0x205
[<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
[<ffffffff814d1394>] ? restore_args+0x0/0x30
[<ffffffff81aa0c4a>] ? kernel_init+0x0/0x205
[<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10
---[ end trace a7919e7f17c0a725 ]---
WARNING: at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb9/0xd0()
Hardware name: X7DWU
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
<NMI> [<ffffffff8104eeaa>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff81037d39>] ? kmemcheck_fault+0xb9/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81011bf8>] ? x86_perf_event_update+0x28/0xc0
[<ffffffff81015ffc>] ? intel_pmu_handle_irq+0x1fc/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5107>] ? __atomic_notifier_call_chain+0x67/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d92>] ? do_nmi+0xd2/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff810372ee>] ? kmemcheck_show_addr+0x4e/0x70
<<EOE>> [<ffffffff810377c9>] ? kmemcheck_show_all+0x39/0x60
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff81037d0a>] ? kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff811004fb>] ? kstrdup+0x4b/0x70
[<ffffffff811a739c>] ? sysfs_new_dirent+0x10c/0x120
[<ffffffff811a8123>] ? sysfs_do_create_link+0x93/0x230
[<ffffffff814d1850>] ? error_exit+0x30/0xb0
[<ffffffff811a82ce>] ? sysfs_create_link+0xe/0x10
[<ffffffff81338f0b>] ? device_add+0x37b/0x650
[<ffffffff813391f9>] ? device_register+0x19/0x20
[<ffffffff813392fb>] ? device_create_vargs+0xfb/0x130
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff8110308f>] ? bdi_register+0x6f/0x1c0
[<ffffffff8127c183>] ? prop_local_init_percpu+0x43/0x50
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff81abdd36>] ? default_bdi_init+0x9f/0xaf
[<ffffffff810002ff>] ? do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] ? kernel_init+0x17c/0x205
[<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
[<ffffffff814d1394>] ? restore_args+0x0/0x30
[<ffffffff81aa0c4a>] ? kernel_init+0x0/0x205
[<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10
---[ end trace a7919e7f17c0a726 ]---
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2465 lockdep_trace_alloc+0xcc/0xe0()
Hardware name: X7DWU
Modules linked in:
Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
[<ffffffff8104eeaa>] ? warn_slowpath_common+0x7a/0xb0
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff8104eef5>] ? warn_slowpath_null+0x15/0x20
[<ffffffff810883dc>] ? lockdep_trace_alloc+0xcc/0xe0
[<ffffffff8112e5f8>] ? kmem_cache_alloc+0x28/0x160
[<ffffffff811a72cc>] ? sysfs_new_dirent+0x3c/0x120
[<ffffffff811a8123>] ? sysfs_do_create_link+0x93/0x230
[<ffffffff814d1850>] ? error_exit+0x30/0xb0
[<ffffffff811a82ce>] ? sysfs_create_link+0xe/0x10
[<ffffffff81338f0b>] ? device_add+0x37b/0x650
[<ffffffff813391f9>] ? device_register+0x19/0x20
[<ffffffff813392fb>] ? device_create_vargs+0xfb/0x130
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff8110308f>] ? bdi_register+0x6f/0x1c0
[<ffffffff8127c183>] ? prop_local_init_percpu+0x43/0x50
[<ffffffff81abdc97>] ? default_bdi_init+0x0/0xaf
[<ffffffff81abdd36>] ? default_bdi_init+0x9f/0xaf
[<ffffffff810002ff>] ? do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] ? kernel_init+0x17c/0x205
[<ffffffff81003c24>] ? kernel_thread_helper+0x4/0x10
[<ffffffff814d1394>] ? restore_args+0x0/0x30
[<ffffffff81aa0c4a>] ? kernel_init+0x0/0x205
[<ffffffff81003c20>] ? kernel_thread_helper+0x0/0x10
---[ end trace a7919e7f17c0a727 ]---
ERROR: kmemcheck: Fatal error

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<ffffffff811004fb>] [<ffffffff811004fb>] kstrdup+0x4b/0x70
RSP: 0018:ffff880299ccbc10 EFLAGS: 00010102
RAX: ffff8802a584f688 RBX: 0000000000000008 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff8802a584f66c RDI: ffff8802a584f68c
RBP: ffff880299ccbc30 R08: 0000000000000000 R09: ffff8802a5850688
R10: ffff880000000000 R11: 0000000000000001 R12: ffff8802a584f668
R13: 00000000000000d0 R14: ffff8802a584f668 R15: 000000000000a1ff
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880299c98810 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[<ffffffff810370c4>] kmemcheck_error_save_bug+0x74/0xb0
[<ffffffff8103790b>] kmemcheck_show+0x8b/0xa0
[<ffffffff81037d0a>] kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] do_page_fault+0x41d/0x560
[<ffffffff814d15a5>] page_fault+0x25/0x30
[<ffffffff811a739c>] sysfs_new_dirent+0x10c/0x120
[<ffffffff811a8123>] sysfs_do_create_link+0x93/0x230
[<ffffffff811a82ce>] sysfs_create_link+0xe/0x10
[<ffffffff81338f0b>] device_add+0x37b/0x650
[<ffffffff813391f9>] device_register+0x19/0x20
[<ffffffff813392fb>] device_create_vargs+0xfb/0x130
[<ffffffff8110308f>] bdi_register+0x6f/0x1c0
[<ffffffff81abdd36>] default_bdi_init+0x9f/0xaf
[<ffffffff810002ff>] do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] kernel_init+0x17c/0x205
[<ffffffff81003c24>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
\_SB_:_OSC evaluation returned wrong type
_OSC request data:1 7
ERROR: kmemcheck: Fatal error

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<ffffffff8107664f>] [<ffffffff8107664f>] down_timeout+0x2f/0x60
RSP: 0018:ffff880299ccbbd0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880299c6e0c0 RCX: 0000000000000000
RDX: 000000000000f101 RSI: 0000000000000000 RDI: ffff880299c6e0c0
RBP: ffff880299ccbbf0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000286
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880299c98810 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[<ffffffff810370c4>] kmemcheck_error_save_bug+0x74/0xb0
[<ffffffff8103790b>] kmemcheck_show+0x8b/0xa0
[<ffffffff81037d0a>] kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] do_page_fault+0x41d/0x560
[<ffffffff814d15a5>] page_fault+0x25/0x30
[<ffffffff812c7523>] acpi_os_wait_semaphore+0x96/0x13c
[<ffffffff812f2f3b>] acpi_ut_acquire_mutex+0x84/0x101
[<ffffffff812e8b2b>] acpi_ns_walk_namespace+0xea/0x17b
[<ffffffff812e5c96>] acpi_walk_namespace+0x8e/0xc8
[<ffffffff812e83cc>] acpi_ns_initialize_objects+0x93/0x185
[<ffffffff812f0fbb>] acpi_initialize_objects+0x85/0xd3
[<ffffffff81acbf3c>] acpi_bus_init+0x9f/0x1dd
[<ffffffff81acc0eb>] acpi_init+0x71/0xd8
[<ffffffff810002ff>] do_one_initcall+0x3f/0x180
[<ffffffff81aa0dc6>] kernel_init+0x17c/0x205
[<ffffffff81003c24>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
BUG: unable to handle kernel NULL pointer dereference at 0000000000000064
IP: [<ffffffff810df97e>] perf_event_update_userpage+0x3e/0xe0
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<ffffffff810df97e>] [<ffffffff810df97e>] perf_event_update_userpage+0x3e/0xe0
RSP: 0018:ffff8800cfc07cc8 EFLAGS: 00010206
RAX: 000000000000000c RBX: ffff880299cbc4c8 RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000156 RDI: 0000000000001000
RBP: ffff880299ccb8d0 R08: 0000000000000080 R09: ffffffff817a4b41
R10: ffff880000000001 R11: 0000000000000000 R12: 0000000000000008
R13: ffff880299e21208 R14: 0000000000000013 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8802a58182e4 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
BUG: unable to handle kernel paging request at 0000000181037f51
IP: [<0000000181037f51>] 0x181037f51
PGD 0
Oops: 0010 [#2] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Tainted: G W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1 X7DWU/X7DWU
RIP: 0010:[<0000000181037f51>] [<0000000181037f51>] 0x181037f51
RSP: 0018:ffff8800cfc079b8 EFLAGS: 00010092
RAX: 0000000000000048 RBX: ffff8800cfc07c18 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000046
RBP: ffff8800cfc079c8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800cfc07cc8
R13: ffff8800cfc07c18 R14: ffff8800cfc079b8 R15: ffffffff81037f51
FS: 0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8802a58182e4 CR3: 0000000001a03000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff880299cca000, task ffff880299cd0000)
Stack:
ffff8800cfc079d8 ffffffff81038055 ffff880299c98000 ffff880299c98ae8
ffff8800cfc07a08 ffffffff81037177 ffff880299c98000 ffff8800cfc07c18
ffff880299c98aef ffffffff810df972 ffff8800cfc07a38 ffffffff81037225
Call Trace:
<NMI>
[<ffffffff81038055>] ? kmemcheck_shadow_lookup+0x45/0x70
[<ffffffff81037177>] ? kmemcheck_read_strict+0x47/0xb0
[<ffffffff810df972>] ? perf_event_update_userpage+0x32/0xe0
[<ffffffff81037225>] ? kmemcheck_read+0x45/0x70
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff810372e6>] ? kmemcheck_show_addr+0x46/0x70
[<ffffffff810377c9>] ? kmemcheck_show_all+0x39/0x60
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff81037d0a>] ? kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d1850>] ? error_exit+0x30/0xb0
[<ffffffff810df9ed>] ? perf_event_update_userpage+0xad/0xe0
[<ffffffff810df940>] ? perf_event_update_userpage+0x0/0xe0
[<ffffffff81014849>] ? intel_pmu_enable_all+0xa9/0x110
[<ffffffff8101606e>] ? intel_pmu_handle_irq+0x26e/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5121>] ? __atomic_notifier_call_chain+0x81/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff81322f90>] ? serial8250_console_putchar+0x0/0x40
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d3d>] ? do_nmi+0x7d/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff81322f90>] ? serial8250_console_putchar+0x0/0x40
[<ffffffff81322863>] ? io_serial_in+0x13/0x20
<<EOE>>
Code: Bad RIP value.
RIP [<0000000181037f51>] 0x181037f51
RSP <ffffffff81037f51>
CR2: 0000000181037f51
---[ end trace a7919e7f17c0a728 ]---
swapper used greatest stack depth: 3344 bytes left
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: swapper Tainted: G D W 2.6.38-rc4-357858.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
<NMI> [<ffffffff814cd3b6>] ? panic+0x79/0x193
[<ffffffff810533eb>] ? do_exit+0x7fb/0x8d0
[<ffffffff814d23d5>] ? oops_end+0xa5/0xf0
[<ffffffff81030c70>] ? no_context+0xf0/0x260
[<ffffffff81030f05>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff81030fce>] ? bad_area_nosemaphore+0xe/0x10
[<ffffffff814d4f52>] ? do_page_fault+0x482/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff814d15a5>] ? page_fault+0x25/0x30
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff81038055>] ? kmemcheck_shadow_lookup+0x45/0x70
[<ffffffff81037177>] ? kmemcheck_read_strict+0x47/0xb0
[<ffffffff810df972>] ? perf_event_update_userpage+0x32/0xe0
[<ffffffff81037225>] ? kmemcheck_read+0x45/0x70
[<ffffffff81037f51>] ? kmemcheck_pte_lookup+0x11/0x50
[<ffffffff810372e6>] ? kmemcheck_show_addr+0x46/0x70
[<ffffffff810377c9>] ? kmemcheck_show_all+0x39/0x60
[<ffffffff810378bc>] ? kmemcheck_show+0x3c/0xa0
[<ffffffff81037d0a>] ? kmemcheck_fault+0x8a/0xd0
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d4eed>] ? do_page_fault+0x41d/0x560
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d0528>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff814d1850>] ? error_exit+0x30/0xb0
[<ffffffff810df9ed>] ? perf_event_update_userpage+0xad/0xe0
[<ffffffff810df940>] ? perf_event_update_userpage+0x0/0xe0
[<ffffffff81014849>] ? intel_pmu_enable_all+0xa9/0x110
[<ffffffff8101606e>] ? intel_pmu_handle_irq+0x26e/0x4b0
[<ffffffff814d2b78>] ? perf_event_nmi_handler+0x58/0xe0
[<ffffffff814d507d>] ? notifier_call_chain+0x4d/0x70
[<ffffffff814d5121>] ? __atomic_notifier_call_chain+0x81/0xa0
[<ffffffff814d50a0>] ? __atomic_notifier_call_chain+0x0/0xa0
[<ffffffff81322f90>] ? serial8250_console_putchar+0x0/0x40
[<ffffffff814d5151>] ? atomic_notifier_call_chain+0x11/0x20
[<ffffffff814d518e>] ? notify_die+0x2e/0x30
[<ffffffff814d1d3d>] ? do_nmi+0x7d/0x290
[<ffffffff814d18f0>] ? nmi+0x20/0x39
[<ffffffff81322f90>] ? serial8250_console_putchar+0x0/0x40
[<ffffffff81322863>] ? io_serial_in+0x13/0x20
<<EOE>>

2011-02-14 20:41:11

by Andrew Morton

Subject: Re: Heads up Linux 2.6.38-rc4 compile problems.

On Mon, 14 Feb 2011 11:44:47 -0800
Eric W. Biederman wrote:

> My three boot attempts with CONFIG_DEBUG_PAGEALLOC are below.
>
>
> ERROR: kmemcheck: Fatal error

It might be that kmemcheck and DEBUG_PAGEALLOC don't get along well -
try disabling kmemcheck?

It's worth persisting with DEBUG_PAGEALLOC, please - if we can get it
working then it should be able to find a stray write into a freed 8k page
with precision.
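
As a rough sketch of that suggestion, the relevant .config lines for the
next boot attempt might look something like the fragment below. The
kmemcheck option name (CONFIG_KMEMCHECK) is an assumption here;
CONFIG_DEBUG_PAGEALLOC is the option named in the quoted report.

```
# Assumed option name for kmemcheck; leave it unset for this test
# CONFIG_KMEMCHECK is not set
# Keep page-allocation debugging on so a stray write into a freed page faults
CONFIG_DEBUG_PAGEALLOC=y
```

With kmemcheck out of the picture, any fault that remains should be easier
to attribute to the page-allocation debugging alone.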

2011-02-15 14:08:11

by Dave Anderson

Subject: Re: [Crash-utility] Heads up Linux 2.6.38-rc4 compile problems.



----- Original Message -----
> Hey Eric,
>
> On Mon, Feb 14, 2011 at 12:34 AM, Eric W. Biederman wrote:
> >
> > And for completeness. When I was rebooting v2.6.38-rc4 to start running
> > 795abaf1e4e188c4171e3cd3dbb11a9fcacaf505 I hit this.
> >
> > Sigh. I wish crash worked on something besides Red Hat's enterprise
> > kernels. Then I could use the system core file I have to do more than
> > extract the dmesg.

I update the upstream version of crash typically once a month.

Perhaps you are using an older version? I just built a 2.6.38-rc4
kernel, and the latest version of crash (5.1.2) works OK with it:

[root@hp-z400-02 ~]# crash

crash 5.1.2
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

KERNEL: /vmlinux
DUMPFILE: /dev/mem
CPUS: 6
DATE: Tue Feb 15 08:57:17 2011
UPTIME: 00:00:54
LOAD AVERAGE: 0.37, 0.10, 0.04
TASKS: 132
NODENAME: hp-z400-02.lab.bos.redhat.com
RELEASE: 2.6.38-rc4
VERSION: #1 SMP Mon Feb 14 17:41:17 EST 2011
MACHINE: x86_64 (3067 Mhz)
MEMORY: 4 GB
PID: 1539
COMMAND: "crash"
TASK: ffff8801363f9710 [THREAD_INFO: ffff880135f2a000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)

crash>

Dave

>
> Then you should cc the crash-utility list (now cc'd) and work with
> Dave Anderson (e.g. get him your vmlinux and core files, which version
> of crash you're using and how it fails). Dave does an amazing job of
> working through crash issues which are reported against upstream
> kernels -- the key first step is the report.
>
> Mike
>
> --
> Crash-utility mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/crash-utility

2011-02-17 19:34:17

by Kees Cook

Subject: Re: Linux 2.6.38-rc4 (test_nx: BUG)

Hi Arjan,

On Wed, Feb 09, 2011 at 09:10:48AM -0800, Arjan van de Ven wrote:
> On 2/9/2011 9:08 AM, Randy Dunlap wrote:
> >test_nx BUGs.
> >CONFIG_DEBUG_RODATA=y
> >(nearly allmodconfig, with a few changes)
> >
> >Is that expected?
> >
> this test should pass...
> so something broke it (I'll call that a success for the test ;-)

I would suspect that the code in fudze_exception_table() isn't legal any
more due to the RO patches. (i.e. I think test_nx.c needs to be updated.)

-Kees

--
Kees Cook
Ubuntu Security Team