2009-12-24 22:00:32

by Linus Torvalds

[permalink] [raw]
Subject: Linux 2.6.33-rc2 - Merry Christmas ...



.. or wahetever it is you'll be celebrating today/tomorrow.

And if you aren't celebrating anything at all, but instead sitting in your
dark basement feeling lonely and bored, you can at least try out the
latest -rc kernel. Because it's better than moping around doing nothing.

And if you _are_ celebrating, but are having a bit of an overload of ham
(or are getting fed up with losing all your clothes in strip dreidel and
having people point and snigger at you, if that's what gets you through
the holidays), take a break, compile a kernel, and try it out for size.

Because while your pants may not fit you for the next month, the new
kernel might just fit your computer perfectly.

The dirstat says it all: mostly drivers (much of it staging updates,
especially as nouveau also counts as staging even if it's not physically
there). But a smattering of changes all over (scheduler, vfs, arch, kfifo
etc):

2.9% Documentation/
2.4% arch/arm/
3.4% arch/powerpc/
2.7% arch/x86/
9.4% arch/
9.8% drivers/gpu/drm/nouveau/
2.8% drivers/gpu/drm/radeon/
15.4% drivers/gpu/drm/
6.4% drivers/platform/x86/
12.1% drivers/staging/dst/
13.4% drivers/staging/sm7xx/
27.1% drivers/staging/
10.3% drivers/usb/serial/
11.9% drivers/usb/
69.3% drivers/
3.7% fs/
6.4% include/linux/
6.5% include/
3.4% kernel/
2.4% sound/pci/hda/
3.3% sound/

I'd have been happier with a smaller -rc2, but I've seen worse too, so I'm
not going to complain.

And now I'm off to peel potatoes. And then to eat them.

Have fun,

Linus

---
Al Viro (4):
fix braindamage in audit_tree.c untag_chunk()
fix more leaks in audit_tree.c tag_chunk()
Fix f_flags/f_mode in case of lookup_instantiate_filp() from open(pathname, 3)
Sanitize f_flags helpers

Alan Cox (1):
jfs: Fix 32bit build warning

Alan Stern (5):
PM: Use pm_runtime_put_sync in system resume
PM: Runtime PM documentation update
USB: power management documentation update
USB: rename usb_configure_device
USB: fix bugs in usb_(de)authorize_device

Albert Herranz (3):
powerpc/gamecube/wii: Fix off-by-one error in ugecon/usbgecko_udbg
powerpc/gc/wii: hlwd-pic: convert irq_desc.lock to raw_spinlock
powerpc/gc/wii: Remove get_irq_desc()

Alex Chiang (11):
ACPI: processor: call _PDC early
ACPI: processor: introduce arch_has_acpi_pdc
ACPI: processor: unify arch_acpi_processor_init_pdc
ACPI: processor: factor out common _PDC settings
ACPI: processor: finish unifying arch_acpi_processor_init_pdc()
ACPI: processor: unify arch_acpi_processor_cleanup_pdc
ACPI: processor: introduce acpi_processor_alloc_pdc()
ACPI: processor: change acpi_processor_eval_pdc interface
ACPI: processor: open code acpi_processor_cleanup_pdc
ACPI: processor: change acpi_processor_set_pdc() interface
ACPI: processor: remove _PDC object list from struct acpi_processor

Alex Deucher (6):
drm/radeon/kms/atom: fill in proper defines for digital setup
drm/radeon/kms: fix legacy rmx
drm/radeon/kms: set proper default tv standard
drm/radeon/kms: add cvt mode if we only have lvds w/h and no edid (v4)
drm/radeon/kms: never combine LVDS with another encoder
drm/radeon/kms: add definitions for v4 power tables

Alexey Dobriyan (4):
powerpc/iseries: Convert to proc_fops
netns: fix net.ipv6.route.gc_min_interval_ms in netns
asus_acpi: convert to seq_file
toshiba_acpi: convert to seq_file

Alexey Starikovskiy (1):
ACPI: EC: Fix MSI DMI detection

Anand Gadiyar (1):
ARM: 5853/1: ARM: Fix build break on ARM v6 and v7

Anatolij Gustschin (1):
powerpc/44x: Extend Katmai dts for ADMA and RAID56 support

Andi Kleen (4):
HWPOISON: Add PROC_FS dependency to hwpoison injector v2
DRM: Rename clamp variable
SYSCTL: Print binary sysctl warnings (nearly) only once
SYSCTL: Add a mutex to the page_alloc zone order sysctl

Andreas Gruenbacher (1):
Remove obsolete comment in fs.h

Andreas Herrmann (1):
x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space

Andreas Mohr (2):
USB: ftdi_sio: isolate all device IDs to new ftdi_sio_ids.h header
USB: ftdi_sio: sort PID/VID entries in new ftdi_sio_ids.h header

Andrei Emeltchenko (1):
Bluetooth: Fix L2CAP locking scheme regression

Andres Salomon (1):
watchdog: update geodewdt for new MFGPT API

Andrew Lunn (1):
Staging: batman-adv: Add Kconfig dependancies on PROC_FS and PACKET.

Andrew Morton (1):
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: avoid cross-CPU interrupts by using smp_call_function_any()

Anisse Astier (3):
MAINTAINERS: add maintainer for msi-wmi driver
ALSA: hda - Add STAC9205 PCI_QUIRK for Dell Vostro 1700
wmi: Free the allocated acpi objects through wmi_get_event_data

Anton Blanchard (3):
powerpc/defconfigs: Reduce 64bit vmlinux by making acenic and cramfs modules
powerpc/defconfigs: Disable token ring in powerpc defconfigs
powerpc/defconfigs: Set HZ=100 on pseries and ppc64 defconfigs

Anton Vorontsov (4):
powerpc/fsl_pci: Fix P2P bridge handling for MPC83xx PCIe controllers
powerpc/83xx/suspend: Clear deep_sleeping after devices resume
powerpc/83xx/suspend: Save and restore SICRL, SICRH and SCCR
powerpc/83xx: Add power management support for MPC8315E-RDB boards

Arnaldo Carvalho de Melo (1):
perf session: Make events_stats u64 to avoid overflow on 32-bit arches

Arnaud Mandy (1):
USB: musb: gadget: set otg tranceiver to idle when registering gadget

Arnd Bergmann (2):
drm: convert drm_ioctl to unlocked_ioctl
fs/compat_ioctl.c: fix build error when !BLOCK

Bartlomiej Zolnierkiewicz (1):
pata_cmd64x: fix overclocking of UDMA0-2 modes

Ben Skeggs (5):
drm/nv40: implement ctxprog/state generation
drm/nv50: fix two potential suspend/resume oopses
drm/nouveau: prevent all channel creation if accel not available
drm/nv50: fix suspend/resume delays without firmware present
drm/nouveau: fix bug causing pinned buffers to lose their NO_EVICT flag

Benjamin Herrenschmidt (2):
powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM
powerpc: Fix MSI support on U4 bridge PCIe slot

Bernd Porr (2):
Staging: comedi: usbdux.c: fix locking up of the driver when the comedi ringbuffer runs empty
Staging: comedi: removed "depricated" from COMEDI_CB_BLOCK

Bill Gatliff (1):
USB: Fix double-linking of drivers/usb/otg when ULPI is selected

Blaise Gassend (1):
USB: serial: Extra device/vendor ID for mos7840 driver

Bob Gleitsmann (1):
drm/mm: fix logic for selection of best fit block

Borislav Petkov (7):
x86, msr: msrs_alloc/free for CONFIG_SMP=n
amd64_edac: fix K8 chip select reporting
amd64_edac: fix driver instance freeing
amd64_edac: make driver loading more robust
amd64_edac: fix forcing module load/unload
amd64_edac: restrict PCI config space access
edac, pci: remove pesky debug printk

Breno Leitao (2):
bnx2: reset_task is crashing the kernel. Fixing it.
bnx2: fixing a timout error due not refreshing TX timers correctly

Bruce Allan (1):
e1000e: LED settings in EEPROM ignored on 82571 and 82572

Bryan Wu (1):
USB: musb: workaround Blackfin FIFO anomalies

Carlos R. Mafra (1):
ACPI: do not select ACPI_DOCK from ATA_ACPI

Christoph Hellwig (1):
libata: use the WRITE_SAME_16 define

Clemens Ladisch (2):
sound: sgio2audio/pdaudiocf/usb-audio: initialize PCM buffer
USB: emi62: fix crash when trying to load EMI 6|2 firmware

Cliff Cai (4):
USB: musb: fix compiling warning with min() macro
USB: musb: correct DMA address for tx
USB: audio gadget: fix wTotalLength calculation
USB: audio gadget: free alsa devices when unloading

Coly Li (2):
ocfs2: explicit declare uninitialized var in user_cluster_connect()
ocfs2: replace u8 by __u8 in ocfs2_fs.h

Cory Maccarrone (1):
i2c-omap: Don't write IE state in unidle if 0

Daniel T Chen (1):
ALSA: hda: Set Front Mic to input vref 50% for Lenovo 3000 Y410

Daniele Calore (1):
alpha: Wire up missing/new syscalls

Dave Airlie (1):
drm/radeon: fix build on 64-bit with some compilers.

David Daney (2):
powerpc: Convert BUG() to use unreachable()
alpha: Convert BUG() to use unreachable()

David Gibson (1):
powerpc/mm: Fix stupid bug in subpge protection handling

David Miller (1):
sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

David Vrabel (1):
MAINTAINERS: update entries for WUSB, UWB and WLP subsystems

Dmitry Eremin-Solenikov (4):
powerpc/83xx: mpc8349emitx - add gpio controller declarations
powerpc/83xx: mpc8349emitx - populate I2C busses in device tree
powerpc/83xx: mpc8349emitx - add OF descriptions of LocalBus devices
powerpc/83xx: mpc8349emitx - add leds-gpio binding

Dmitry Monakhov (7):
ext3: quota macros cleanup [V2]
Add unlocked version of inode_add_bytes() function
quota: decouple fs reserved space from quota reservation
ext4: Convert to generic reserved quota's space management.
quota: Move duplicated code to separate functions
ext4: Fix potential quota deadlock
ext4: fix sleep inside spinlock issue with quota and dealloc (#14739)

Dmitry Torokhov (4):
tc1100-wmi - switch to using attribute group
tc1100-wmi - add error handling for device registration
tc1100-wmi - switch to using dev_pm_ops
dell-wmi: do not keep driver loaded on unsupported boxes

Dominik Brodowski (1):
resources: fix call to alignf() in allocate_resource()

Donny Kurnia (1):
USB: option: support hi speed for modem Haier CE100

Einar R?nkaru (2):
ALSA: hda - Fixed internal mic initialization for Dell Vostro 1015
ALSA: hda - Make use of beep device found in Dell Vostro 1015n

Eric Millbrandt (1):
ASoC: Do not write to invalid registers on the wm9712.

Eric Sandeen (4):
ext3: Remove outdated comment about lock_super()
ext3: ext3_mark_recovery_complete() doesn't need to use lock_super
ext3: Replace lock/unlock_super() with an explicit lock for the orphan list
ext3: Replace lock/unlock_super() with an explicit lock for resizing

Evgeniy Polyakov (1):
pohmelfs needs I_LOCK

Felipe Balbi (4):
USB: musb: move musb_remove to __exit
USB: musb: MAINTAINERS: Fix my tree's address
USB: musb: do not work if no gadget driver is loaded
usb: otg: isp1301_omap: fix compile error

Felix Radensky (1):
powerpc/85xx: Workaround MPC8572/MPC8536 GPIO 1 errata.

Florian Fainelli (1):
ALSA: sound/core/pcm_timer.c: use lib/gcd.c

Francisco Jerez (4):
drm/nv04-nv40: Fix "conflicting memory types" when saving/restoring VGA fonts.
drm/i2c/ch7006: Fix load detection false positives right after system init.
drm/nouveau: Fix up buffer eviction, and evict them to GART, if possible.
drm/nv10: Add the initial graph context and soft methods needed for LMA.

Frederic Weisbecker (4):
sched: Teach might_sleep() about preemptible RCU
perf events, x86/stacktrace: Make stack walking optional
perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
hw-breakpoints: Fix hardware breakpoints -> perf events dependency

Gautham R Shenoy (2):
powerpc/pseries: Don't panic when H_PROD fails during cpu-online.
powerpc/pseries: Make declarations of cpu_hotplug_driver_lock() ANSI compatible.

George Kadianakis (3):
Staging: fix rtl8187se compilation errors with mac80211
staging: fix rtl8192e compilation errors with mac80211
staging: fix rtl8192su compilation errors with mac80211

Greg Kroah-Hartman (2):
Staging: batman: fix debug Kconfig option
Staging: dst: remove from the tree

Guennadi Liakhovetski (3):
sh: dmaengine support for sh7724.
ASoC: wm8974: fix a wrong bit definition
ASoC: add missing parameter to mx27vis_hifi_hw_free()

Gustavo F. Padovan (2):
Bluetooth: Fix unset of RemoteBusy flag for L2CAP
Bluetooth: Ack L2CAP I-frames before retransmit missing packet

H Hartley Sweeten (2):
pata_octeon_cf: use resource_size(), to fix resource sizing bug
[WATCHDOG] use resource_size()

H. Peter Anvin (3):
x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
Makefile: Unexport LC_ALL instead of clearing it

Hector Martin (3):
ALSA: HDA: simplify Aspire 8930G verb array
ALSA: HDA: remove useless mixers on Aspire 8930G
ALSA: HDA: add powersaving hook for Realtek

Heiko Carstens (2):
[S390] wire up sys_recvmmsg
[S390] Use strim instead of strstrip to avoid false warnings.

Ian Abbott (1):
Staging: comedi: jr3_pci: Don't ioremap too much space. Check result.

Imre Kaloz (1):
[WATCHDOG] iTCO_wdt: add PCI ID for the Intel EP80579 (Tolapai) SoC

Ingo Molnar (1):
sched: Make warning less noisy

J. Bruce Fields (1):
nfsd: fix "insecure" export option

Jan Beulich (1):
USB: fix section mismatch in early ehci dbgp

Jan Glauber (2):
[S390] qdio: remove superfluous log entries and WARN_ONs.
[S390] qdio: add counter for input queue full condition

Jan Kara (5):
ocfs2: Always include ACL support
ocfs2: Make acl use the default
ocfs2: Set MS_POSIXACL on remount
quota: Fix 64-bit limits setting on 32-bit archs
quota: Improve checking of quota file header

Jaroslav Kysela (1):
ALSA: hda/realtek: Remove extra .capsrc_nids initialization for ALC889_INTEL

Jean PIHET (1):
ARM: 5849/1: ARMv7: fix Oprofile events count

Jeff Garzik (2):
sata_mv: remove pointless NULL test
Revert "pata_cmd64x: implement serialization as per notes"

Jeff Liu (1):
ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock()

Jerome Glisse (2):
drm/radeon/kms: Avoid crash when trying to cleanup uninitialized structure
drm/radeon/kms: Check module arguments to be valid V2

Joe Perches (1):
sched: Use pr_fmt() and pr_<level>()

Jon Smirl (1):
ASoC: Fix disable of SPDIF on STAC9766 codec

Jonathan Woithe (1):
fujitu-laptop: fix tests of acpi_evaluate_integer() return value

Julia Lawall (6):
[S390] drivers: Correct size given to memset
ALSA: Use kzalloc for allocating only one thing
drivers/gpu: Use kzalloc for allocating only one thing
Staging: wlan-ng: fix Correct size given to memset
Staging: batman-adv: introduce missing kfree
USB: gadget: Use ERR_PTR/IS_ERR

Kailang Yang (1):
ALSA: hda - More ALC663 fixes and support of compatible chips

Kay Sievers (2):
vfs: get_sb_single() - do not pass options twice
devtmpfs: unlock mutex in case of string allocation error

Krzysztof Helt (2):
ALSA: fix incorrect rounding direction in snd_interval_ratnum()
ALSA: sbawe: fix memory detection

Kuninori Morimoto (2):
ASoC: ak4642: Add default return value in ak4642_modinit
sh: mach-ecovec24: setup.c detailed correction

Linus Torvalds (3):
Revert "time: Remove xtime_cache"
Revert "x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle"
Linux 2.6.33-rc2

Maarten Maathuis (1):
drm/nouveau: use drm debug levels

Magnus Damm (1):
serial: sh-sci: earlyprintk zero uartclk fix

Manjunatha GK (1):
i2c-omap: OMAP3: Fix I2C lockup during timeout/error cases

Marcin Ko?cielnicki (4):
drm/nouveau: Kill global state in NvShadowBIOS
drm/nouveau: Kill global state in BIOS script interpreter
drm/nv04: Fix NV04 set_operation software method.
drm/nouveau: Add proper error handling to nouveau_card_init

Marek Ol??k (2):
drm/radeon/kms: allow rendering while no colorbuffer is set on r300
drm/radeon/kms: add 3DC compression support

Marin Mitov (1):
drm/kms: silencing a false positive warning.

Mark Ware (1):
powerpc/cpm2_pic: Allow correct flow_types for port C interrupts

Markus Pietrek (1):
sh: Ensure all PG_dcache_dirty pages are written back.

Martin Schwidefsky (1):
[S390] rename NT_PRXSTATUS to NT_S390_HIGHREGS

Masami Hiramatsu (4):
perf probe: Fix libdwarf include path for Debian
perf probe: Check whether debugfs path is correct
kprobe-tracer: Check new event/group name
perf probe: Check new event name

Mattia Dongili (3):
sony-laptop: add AVMode key mapping
sony-laptop: rfkill support for newer models
sony-laptop: enumerate rfkill devices using SN06

Maulik Mankad (2):
USB: musb: Fix null pointer dereference issue
USB: musb: Fix array index out of bounds issue

Mel Gorman (1):
powerpc/pseries: Select XICS and PCI_MSI PSERIES

Michael Chan (1):
bnx2: Fix bnx2_netif_stop() merge error.

Michael Cree (1):
alpha: Add minimal support for software performance events

Michael Hennerich (1):
Driver core: export platform_device_register_data as a GPL symbol

Michael Holzheu (1):
[S390] tape: Add pr_fmt() macro to all tape source files

Mike Rapoport (1):
ARM: 5857/1: ARM: dmabounce: fix build

Nageswari Srinivasan (1):
TI DaVinci EMAC: Fix MDIO bus frequency configuration

Neil Campbell (1):
powerpc: Handle VSX alignment faults correctly in little-endian mode

Nitin Gupta (1):
Staging: ramzswap: remove ARM specific d-cache hack

Nobuhiro Iwamatsu (1):
sh: dmaengine support for SH7785

Oleg Nesterov (1):
[S390] ptrace: dont abuse PT_PTRACED

Oliver Neukum (1):
Bluetooth: Prevent ill-timed autosuspend in USB driver

Pallipadi, Venkatesh (1):
x86: Reenable TSC sync check at boot, even with NONSTOP_TSC

Paul Mundt (4):
sh: Fix up MAX_DMA_CHANNELS definition when DMA is disabled.
sh: Restore bl bit toggling in idle loop.
sh: Only use bl bit toggling for sleeping idle.
serial: sh-sci: Convert tremaining ctrl_xxx I/O routines to __raw_xxx.

Peter Feuerer (2):
acerhdf: add new BIOS versions
drivers/platform/x86/acerhdf.c: check BIOS information whether it begins with string of table

Peter Huewe (2):
Staging: panel: Fix compilation error with custom lcd charset
Staging: panel: Adjust range for PANEL_KEYPAD in Kconfig

Peter Korsgaard (1):
powerpc/gpio: support gpio_to_irq()

Peter Oberparleiter (1):
[S390] cio: fix channel path vary

Peter Zijlstra (20):
sched: Mark boot-cpu active before smp_init()
sched: Fix task_hot() test order
sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE
sched: Use TASK_WAKING for fork wakups
sched: Ensure set_task_cpu() is never called on blocked tasks
sched: Fix sched_exec() balancing
sched: Fix select_task_rq() vs hotplug issues
sched: Move kthread_bind() back to kthread.c
sched: Add pre and post wakeup hooks
sched: Remove the cfs_rq dependency from set_task_cpu()
sched: Simplify set_task_cpu()
perf events: Dont report side-band events on each cpu for per-task-per-cpu events
sched: Move TASK_STATE_TO_CHAR_STR near the TASK_state bits
sched: Add missing state chars to TASK_STATE_TO_CHAR_STR
sched: Update task_state_arraypwith new states
sched: Assert task state bits at build time
sched: Fix broken assertion
sched: Restore printk sanity
sched: Fix hotplug hang
sched: Revert 738d2be, simplify set_task_cpu()

Phil Carmody (4):
Driver core: device_attribute parameters can often be const*
Driver core: bin_attribute parameters can often be const*
Driver core: driver_attribute parameters can often be const*
driver core: Prevent reference to freed memory on error path

Phillip Lougher (3):
bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
bzip2: Add missing checks for malloc returning NULL
initramfs: add missing decompressor error check

Rafael Avila de Espindola (1):
ALSA: hda - Add support for the new 27 inch IMacs

Rafael J. Wysocki (4):
sched: Make wakeup side and atomic variants of completion API irq safe
PM: Make the initcall_debug style timing for suspend/resume complete
PM: Measure device suspend and resume times
PM / Runtime: Use device type and device class callbacks

Rafa? Mi?ecki (2):
drm/radeon/kms: prevent parallel AtomBIOS calls
drm/radeon/kms: enable memory clock reading on legacy (V2)

Randy Dunlap (6):
lib/string.c: fix kernel-doc warnings
mm tracing: cleanup Documentation/trace/events-kmem.txt
kfifo: fix Error/broken kernel-doc notation
Staging: rtl8192x: fix printk formats
Staging/vt66*: kconfig, depends on WLAN
USB core: fix recent kernel-doc warnings

Robert Hancock (1):
libata: fix reporting of drained bytes when clearing DRQ

Robert Jennings (2):
mm: Add notifier in pageblock isolation for balloon drivers
powerpc: Make the CMM memory hotplug aware

Robert P. J. Day (1):
perf events: Remove unused perf_counter.h header file

Roel Kluin (8):
sound/oss/pss: Fix test of unsigned in pss_reset_dsp() and pss_download_boot()
Bluetooth: Fix PTR_ERR return of wrong pointer in hidp_setup_hid()
powerpc/85xx: Wrong variable returned on error
[S390] dasd: PTR_ERR return of wrong pointer in
[S390] s390: PTR_ERR return of wrong pointer in fallback_init_cip()
[S390] tty: PTR_ERR return of wrong pointer in fs3270_open()
broadcom: bcm54xx_shadow_read() errors ignored in bcm54xx_adjust_rxrefclk()
Staging: rtl8192su: fix test for negative error in rtl8192_rx_isr()

Roger Oksanen (1):
e100: Fix broken cbs accounting due to missing memset.

Roland Dreier (3):
x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
alloc_file(): simplify handling of mnt_clone_write() errors
anonfd: Allow making anon files read-only

Russell King (13):
ARM: Convert VFP/Crunch/XscaleCP thread_release() to exit_thread()
ARM: Kill CONFIG_CPU_32
ALSA: AACI: simplify codec rate information
ALSA: AACI: cleanup aaci_pcm_hw_params
ALSA: AACI: factor common hw_params logic into aaci_pcm_hw_params
ALSA: AACI: add double-rate support
ALSA: AACI: switch to per-pcm locking
ARM: add missing include to nwflash.c
ARM: Fix wrong shared bit for CPU write buffer bug test
ARM: fix PAGE_KERNEL
ARM: footbridge: trim down old ISA rtc setup
ARM: dma-isa: request cascade channel after registering it
VIDEO: cyberpro: pci_request_regions needs a persistent name

Sachin P. Sant (1):
powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.

Saeed Bishara (6):
sata_mv: increase PIO IORDY timeout
sata_mv: support clkdev framework
sata_mv: add power management support for the platform driver
sata_mv: move the PCI bar description initialization code
sata_mv: store the board_idx into the host private data
sata_mv: add power management support for the PCI controllers.

Sandeep Gopalpet (3):
gianfar: Fix a filer bug
gianfar: Fix stats support
gianfar: Fix bit definitions of IMASK_GRSC and IMASK_GTSC

Sean MacLennan (1):
powerpc/44x: Increase warp SD buffer

Sebastian Andrzej Siewior (2):
powerpc/fsl: try to explain why the interrupt numbers are off by 16
Doc/stable rules: add new cherry-pick logic

Sebastian Ott (1):
[S390] cio: fix drvdata usage for the console subchannel

Sergei Shtylyov (3):
pata_hpt3x2n: fix clock turnaround
USB: musb_gadget: fix kernel oops in txstate()
USB: musb: gadget_ep0: avoid SetupEnd interrupt

Shaohua Li (1):
ACPI: fix OSC regression that caused aer and pciehp not to load

Sheng Yang (1):
x86: Add IA32_TSC_AUX MSR and use it

Simon Horman (1):
timers: Remove duplicate setting of new_base in __mod_timer()

Sonic Zhang (2):
pata_bf54x: handle portmuxing of pins through GPIO PORTs
i2c-bfin-twi: fix CLKDIV calculation

Stefan Bader (1):
acerhdf: limit modalias matching to supported

Stefan Haberland (1):
[S390] dasd: move dasd-diag kmsg to dasd

Stefani Seibold (10):
kfifo: move struct kfifo in place
kfifo: move out spinlock
kfifo: cleanup namespace
kfifo: rename kfifo_put... into kfifo_in... and kfifo_get... into kfifo_out...
kfifo: fix warn_unused_result
kfifo: add DEFINE_KFIFO and friends, add very tiny functions
kfifo: add kfifo_skip, kfifo_from_user and kfifo_to_user
kfifo: add record handling functions
media video cx23888 driver: ported to new kfifo API
Fix usb_serial_probe() problem introduced by the recent kfifo changes

Stephane Glondu (1):
staging: rtl8192su: add USB VID/PID for HWNUm-300

Stephen Hemminger (1):
netxen: use module parameter correctly

Sunil Mushran (3):
ocfs2/cluster: Make fence method configurable - v2
fiemap: Add new extent flag FIEMAP_EXTENT_SHARED
ocfs2: Use FIEMAP_EXTENT_SHARED

Suresh Siddha (2):
x86, cpuid: Add "volatile" to asm in native_cpuid()
x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system

Swaminathan S (2):
USB: musb: Populate the VBUS GPIO with the correct GPIO number
USB: musb: fix for crash in DM646x USB when (CPPI)DMA is enabled

Takashi Iwai (6):
ALSA: hda - Fix missing capsrc_nids for ALC88x
ALSA: hda - Fix quirk for Maxdata obook4-1
ALSA: aaci - Fix a typo
ALSA: hda - Fix NULL dereference with enable_beep=0 option
ALSA: hda - Add MSI blacklist
ALSA: hda - Set mixer name after codec patch

Tao Ma (6):
ocfs2: Find proper end cpos for a leaf refcount block.
ocfs2: refcounttree.c cleanup.
ocfs2: Add reflinked file's inode to inode hash eariler.
ocfs2: Set i_nlink properly during reflink.
ocfs2/trivial: Use proper mask for 2 places in hearbeat.c
ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c

Thadeu Lima de Souza Cascardo (1):
classmate-laptop: add support for Classmate PC ACPI devices

Thomas Gleixner (9):
signal: Fix racy access to __task_cred in kill_pid_info_as_uid()
signals: Fix more rcu assumptions
sys: Fix missing rcu protection for __task_cred() access
clockevents: Prevent clockevent_devices list corruption on cpu hotplug
sched: Use rcu in sys_sched_getscheduler/sys_sched_getparam()
sched: Use rcu in sched_get/set_affinity()
sched: Use rcu in sched_get_rr_param()
devtmpfs: Convert dirlock to a mutex
Driver-core: Fix bogus 0 error return in device_add()

Thomas Hellstrom (3):
drm/vmwgfx: Fix unlocked ioctl and add proper access control
drm/vmwgfx: Return -ERESTARTSYS when interrupted by a signal.
drm/vmwgfx: Use TTM handles instead of SIDs as user-space surface handles.

Thomas Renninger (2):
acer-wmi, msi-wmi: Remove needless DMI MODULE_ALIAS
hp-wmi: Fix two memleaks

Tiger Yang (1):
ocfs2: return -EAGAIN instead of EAGAIN in dlm

Tristan Ye (2):
Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?
Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.

Uwe Kleine-K?nig (2):
can/at91: don't check platform_get_irq's return value against zero
ASoC: sh: FSI:: don't check platform_get_irq's return value against zero

Wu Zhangjin (1):
Staging: sm7xx: add a new framebuffer driver

Xiaotian Feng (1):
sched: Fix set_cpu_active() in cpu_down()

Yang Hongyang (1):
ipv6: fix an oops when force unload ipv6 module

Yang Li (2):
powerpc/mm: Fix typo of cpumask_clear_cpu()
powerpc/mpic: Fix problem that affinity is not updated

Yin Kangkai (1):
jbd: jbd-debug and jbd2-debug should be writable

Yinghai Lu (2):
x86: Fix checking of SRAT when node 0 ram is not from 0
x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA

Yong Zhang (2):
powerpc/iseries: use DECLARE_COMPLETION_ONSTACK for non-constant completion
mISDN: use DECLARE_COMPLETION_ONSTACK for non-constant completion

Zhang Rui (1):
ACPI: disable _OSI(Windows 2009) on Asus K50IJ

[email protected] (1):
x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk

pancho horrillo (2):
USB: add device ID for Apple Cinema Display 23in 2007
USB: Fix a bug on appledisplay.c regarding signedness

wanzongshun (3):
ARM: 5854/1: fix compiling error for NUC900
ARM: 5855/1: putc support for nuc900
ARM: 5856/1: Fix bug of uart0 platfrom data for nuc900


2009-12-25 10:28:07

by Ingo Molnar

[permalink] [raw]
Subject: -tip: origin tree boot crash


Today's -tip crashed during bootup on one of my testsystems:

[ 28.616777] initcall eeepc_laptop_init+0x0/0x4d returned -19 after 29853 usecs
[ 28.624002] calling dell_wmi_init+0x0/0x126 @ 1
[ 28.629128] device: 'input1': device_add
[ 28.633113] PM: Adding info for No Bus:input1
[ 28.637678] input: Dell WMI hotkeys as /class/input/input1
[ 28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0)
[ 28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014
[ 28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70
[ 28.654439] *pde = 00000000
[ 28.654439] Oops: 0000 [#1] DEBUG_PAGEALLOC
[ 28.654439] last sysfs file:
[ 28.654439] Modules linked in:
[ 28.654439]
[ 28.654439] Pid: 1, comm: swapper Not tainted 2.6.33-rc2-tip-00158-g2bd999a-dirty #2238 A8N-E/System Product Name
[ 28.654439] EIP: 0060:[<c17f7f21>] EFLAGS: 00010282 CPU: 0
[ 28.654439] EIP is at wmi_install_notify_handler+0x31/0x70
[ 28.654439] EAX: 00000006 EBX: c17f55d0 ECX: 0000009d EDX: fffffff4
[ 28.654439] ESI: 00000000 EDI: 00000000 EBP: f7052f90 ESP: f7052f84
[ 28.654439] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 28.654439] Process swapper (pid: 1, ti=f7052000 task=f7070000 task.ti=f7052000)
[ 28.654439] Stack:
[ 28.654439] fffffff4 00000000 00000006 f7052fa4 c2165992 aa66398a 00000006 aa66398a
[ 28.654439] <0> f7052fd0 c1001122 c1d58799 c2165874 00000001 0000749d 00000000 c2165874
[ 28.654439] <0> c219a214 c21271d6 00000000 f7052fe4 c2127262 00000000 000000e0 00000000
[ 28.654439] Call Trace:
[ 28.654439] [<c2165992>] ? dell_wmi_init+0x11e/0x126
[ 28.654439] [<c1001122>] ? do_one_initcall+0x32/0x1d0
[ 28.654439] [<c2165874>] ? dell_wmi_init+0x0/0x126
[ 28.654439] [<c2165874>] ? dell_wmi_init+0x0/0x126
[ 28.654439] [<c21271d6>] ? kernel_init+0x0/0xe0
[ 28.654439] [<c2127262>] ? kernel_init+0x8c/0xe0
[ 28.654439] [<c10033c6>] ? kernel_thread_helper+0x6/0x10
[ 28.654439] Code: 89 5d f8 89 75 fc e8 cf b4 80 ff 85 d2 89 d3 89 ce 74 2f 85 c0 74 2b 8d 55 f4 e8 0b fd ff ff 8b 55 f4 b8 06 00 00 00 85 d2 74 09 <8b> 4a 20 b0 15 85 c9 74 1e 8b 5d f8 8b 75 fc 89 ec 5d c3 8d 74
[ 28.654439] EIP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70 SS:ESP 0068:f7052f84
[ 28.654439] CR2: 0000000000000014
[ 28.819532] ---[ end trace 6cbbad4ca20c038d ]---
[ 28.824158] Kernel panic - not syncing: Fatal exception
[ 28.829394] Pid: 1, comm: swapper Tainted: G D 2.6.33-rc2-tip-00158-g2bd999a-dirty #2238
[ 28.838180] Call Trace:
[ 28.840626] [<c1aa71dc>] ? printk+0x1d/0x1f
[ 28.844909] [<c1aa7110>] panic+0x48/0xf7
[ 28.848930] [<c1aab609>] oops_end+0xb9/0xd0
[ 28.853210] [<c1020086>] no_context+0xc6/0x160
[ 28.857752] [<c10201b0>] __bad_area_nosemaphore+0x90/0x140
[ 28.863334] [<c1020277>] bad_area_nosemaphore+0x17/0x20
[ 28.868656] [<c1aacf5d>] do_page_fault+0x2dd/0x380
[ 28.873545] [<c17f55d0>] ? dell_wmi_notify+0x0/0x170
[ 28.878604] [<c1aacc80>] ? do_page_fault+0x0/0x380
[ 28.883494] [<c1aaaa90>] error_code+0x70/0x80
[ 28.887947] [<c17f55d0>] ? dell_wmi_notify+0x0/0x170
[ 28.893007] [<c17f7f21>] ? wmi_install_notify_handler+0x31/0x70
[ 28.899024] [<c2165992>] dell_wmi_init+0x11e/0x126
[ 28.903910] [<c1001122>] do_one_initcall+0x32/0x1d0
[ 28.908885] [<c2165874>] ? dell_wmi_init+0x0/0x126
[ 28.913774] [<c2165874>] ? dell_wmi_init+0x0/0x126
[ 28.918662] [<c21271d6>] ? kernel_init+0x0/0xe0
[ 28.923289] [<c2127262>] kernel_init+0x8c/0xe0
[ 28.927831] [<c10033c6>] kernel_thread_helper+0x6/0x10

The crash is due to this commit from yesterday's (v2.6.33-rc2) upstream tree:

| commit 1fdd407f4e3f2ecb453954cbebb6c22491c61853
| Author: Dmitry Torokhov <[email protected]>
| Date: Thu Dec 17 22:19:42 2009 -0800
|
| dell-wmi: do not keep driver loaded on unsupported boxes
|
| There is no point in having the driver loaded in memory if we fail
| to locate particular WMI GUID.
|
| Signed-off-by: Dmitry Torokhov <[email protected]>
| Acked-by: Matthew Garrett <[email protected]>
| Signed-off-by: Len Brown <[email protected]>

I've reverted the commit from -tip for now.

Thanks,

Ingo

2009-12-25 19:49:50

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

On Fri, Dec 25, 2009 at 11:27:31AM +0100, Ingo Molnar wrote:
>
> Today's -tip crashed during bootup on one of my testsystems:
>

...

>
> The crash is due to this commit from yesterday's (v2.6.33-rc2) upstream tree:
>
> | commit 1fdd407f4e3f2ecb453954cbebb6c22491c61853
> | Author: Dmitry Torokhov <[email protected]>
> | Date: Thu Dec 17 22:19:42 2009 -0800
> |
> | dell-wmi: do not keep driver loaded on unsupported boxes
> |
> | There is no point in having the driver loaded in memory if we fail
> | to locate particular WMI GUID.
> |
> | Signed-off-by: Dmitry Torokhov <[email protected]>
> | Acked-by: Matthew Garrett <[email protected]>
> | Signed-off-by: Len Brown <[email protected]>
>
> I've reverted the commit from -tip for now.
>

Hmm, the patch is busted in one way, but it should not be crashing like
that still... I wonder what is going on. Still, the patch below should
help it a bit.

--
Dmitry

dell-wmi - fix condition to abort driver loading

From: Dmitry Torokhov <[email protected]>

The commit 1fdd407f4e3f2ecb453954cbebb6c22491c61853 incorrectly made driver
abort loading when known GUID is present when it should have done exactly
the opposite.

Signed-off-by: Dmitry Torokhov <[email protected]>
---

drivers/platform/x86/dell-wmi.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..c980782 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -324,7 +324,7 @@ static int __init dell_wmi_init(void)
{
int err;

- if (wmi_has_guid(DELL_EVENT_GUID)) {
+ if (!wmi_has_guid(DELL_EVENT_GUID)) {
printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
return -ENODEV;
}

2009-12-25 20:00:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: Linux 2.6.33-rc2 - Merry Christmas ...

Hi,

the r8169 driver fails loading here with the following message:

[ 0.353955] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 0.354258] r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[ 0.354391] r8169 0000:02:00.0: PCI INT A disabled
[ 0.354527] r8169: probe of 0000:02:00.0 failed with error -22

Machine is Acer Aspire One, Atom N270 CPU.

Actually, the breakage seems to have appeared a bit earlier, sometime
between .32 and .33-rc1 as the bisection result shows:

ac1aa47b131416a6ff37eb1005a0a1d2541aad6c is the first bad commit
commit ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
Author: Jesse Barnes <[email protected]>
Date: Mon Oct 26 13:20:44 2009 -0700

PCI: determine CLS more intelligently

Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right. On most configurations, the chance is that
firmware configures the correct value during boot.

This patch makes pci_init() determine CLS by looking at what firmware
has configured. It scans all devices and if all non-zero values
agree, the value is used. If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used. arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.

ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.

While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Miller <[email protected]>
Acked-by: Greg KH <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Jesse Barnes <[email protected]>

:040000 040000 8b20ad60ed3e273b74bfb588dbea6948547e8de1 92498585770ca360c2716c0d3c55d5f4a37356a1 M arch
:040000 040000 56d1abd61286dd303bb37c2002b699e526988f85 3f20bba2d1e107a80a738e3561fc0fa92e0c4024 M drivers
:040000 040000 26d85393248c542ca2cea0e3ac4ceabd0ea659aa 326cbb98321cd4e490d888f48bc584b3662c8f06 M include

And since this result looks unrelated to my eye I did the whole
bisection search twice just to make sure. In both cases I get the same
result with the following log:

git bisect start
# bad: [6b7b284958d47b77d06745b36bc7f36dab769d9b] Linux 2.6.33-rc2
git bisect bad 6b7b284958d47b77d06745b36bc7f36dab769d9b
# good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect good 22763c5cf3690a681551162c15d34d935308c8d7
# good: [f6c4c8195b5e7878823caa1181be404d9e86d369] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good f6c4c8195b5e7878823caa1181be404d9e86d369
# bad: [37222e1c9ee3ce587f5b41fed868bd8a592a992f] Merge branch 'for-linus' of git://neil.brown.name/md
git bisect bad 37222e1c9ee3ce587f5b41fed868bd8a592a992f
# bad: [5f1141eb352ea79d849920039503e40dd623fffa] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
git bisect bad 5f1141eb352ea79d849920039503e40dd623fffa
# good: [f71eaf68406cfee91b6a96bcdf7ce33dc78829c5] Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
git bisect good f71eaf68406cfee91b6a96bcdf7ce33dc78829c5
# good: [6ee738610f41b59733f63718f0bdbcba7d3a3f12] drm/nouveau: Add DRM driver for NVIDIA GPUs
git bisect good 6ee738610f41b59733f63718f0bdbcba7d3a3f12
# good: [aa65607373a4daf2010e8c3867b6317619f3c1a3] Add missing alignment check in arch/score sys_mmap()
git bisect good aa65607373a4daf2010e8c3867b6317619f3c1a3
# bad: [f6e1d8cc38b3776038fb15d3acc82ed8bb552f82] x86/PCI: MMCONFIG: add lookup function
git bisect bad f6e1d8cc38b3776038fb15d3acc82ed8bb552f82
# bad: [865df576e8fc70daf297b53e61a4fbefc719d065] PCI: improve discovery/configuration messages
git bisect bad 865df576e8fc70daf297b53e61a4fbefc719d065
# bad: [af5a8ee05404112f38fb2904747c688bdc31a746] x86/PCI: use -DDEBUG when CONFIG_PCI_DEBUG set
git bisect bad af5a8ee05404112f38fb2904747c688bdc31a746
# bad: [c91d3376e5f4277173a22f0ef9989125c318bacb] vsprintf: add %pR support for IRQ and DMA resources
git bisect bad c91d3376e5f4277173a22f0ef9989125c318bacb
# bad: [98e724c791924c0dfc5b1dcf053ed3841cc89c78] PCI: pci_dfl_cache_line_size is __devinitdata
git bisect bad 98e724c791924c0dfc5b1dcf053ed3841cc89c78
# bad: [ac1aa47b131416a6ff37eb1005a0a1d2541aad6c] PCI: determine CLS more intelligently
git bisect bad ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
# good: [99935a7a59eaca0292c1a5880e10bae03f4a5e3d] x86/PCI: read root resources from IOH on Intel
git bisect good 99935a7a59eaca0292c1a5880e10bae03f4a5e3d

Config, dmesg and lspci output is attached.

--
Regards/Gruss,
Boris.


Attachments:
(No filename) (4.80 kB)
.config (59.25 kB)
dmesg.log (40.45 kB)
lspci.log (28.97 kB)
Download all attachments

2009-12-25 21:50:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: Linux 2.6.33-rc2 - Merry Christmas ...

On Fri, Dec 25, 2009 at 09:00:17PM +0100, Borislav Petkov wrote:
> Hi,
>
> the r8169 driver fails loading here with the following message:
>
> [ 0.353955] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 0.354258] r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [ 0.354391] r8169 0000:02:00.0: PCI INT A disabled
> [ 0.354527] r8169: probe of 0000:02:00.0 failed with error -22
>
> Machine is Acer Aspire One, Atom N270 CPU.
>
> Actually, the breakage seems to have appeared a bit earlier, sometime
> between .32 and .33-rc1 as the bisection result shows:
>
> ac1aa47b131416a6ff37eb1005a0a1d2541aad6c is the first bad commit
> commit ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
> Author: Jesse Barnes <[email protected]>
> Date: Mon Oct 26 13:20:44 2009 -0700
>
> PCI: determine CLS more intelligently
>
> Till now, CLS has been determined either by arch code or as
> L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86 doesn't
> always get it right. On most configurations, the chance is that
> firmware configures the correct value during boot.
>
> This patch makes pci_init() determine CLS by looking at what firmware
> has configured. It scans all devices and if all non-zero values
> agree, the value is used. If none is configured or there is a
> disagreement, pci_dfl_cache_line_size is used. arch can set the dfl
> value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
> override the actual one.
>
> ia64, x86 and sparc64 updated to set the default cls instead of the
> actual one.
>
> While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
> in pci.h and drop private declarations from arch code.

Ok here's what happens:

pci_apply_final_quirks() dumps on the console

[ 0.369252] PCI: CLS 0 bytes, default 64

which means that it hasn't fallen back to setting the default cache line
size. Also, the call

pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &tmp);

sets tmp = 0 and the following condition hits everytime

if (!cls)
cls = tmp;
if (!tmp || cls == tmp)
continue;

Which means that we never get around the set the default CLS.

The following dirty fix solves the issue on my machine:

--
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 7cfa7c3..9854c26 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2629,7 +2629,10 @@ static int __init pci_apply_final_quirks(void)
if (!pci_cache_line_size) {
printk(KERN_DEBUG "PCI: CLS %u bytes, default %u\n",
cls << 2, pci_dfl_cache_line_size << 2);
- pci_cache_line_size = cls;
+ if (!cls)
+ pci_cache_line_size = pci_dfl_cache_line_size;
+ else
+ pci_cache_line_size = cls;
}

return 0;


--
Regards/Gruss,
Boris.

2009-12-26 06:00:38

by Jesse Barnes

[permalink] [raw]
Subject: Re: Linux 2.6.33-rc2 - Merry Christmas ...

On Fri, 25 Dec 2009 22:50:32 +0100
Borislav Petkov <[email protected]> wrote:

> On Fri, Dec 25, 2009 at 09:00:17PM +0100, Borislav Petkov wrote:
> > Hi,
> >
> > the r8169 driver fails loading here with the following message:
> >
> > [ 0.353955] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> > [ 0.354258] r8169 0000:02:00.0: PCI INT A -> GSI 17 (level, low)
> > -> IRQ 17 [ 0.354391] r8169 0000:02:00.0: PCI INT A disabled
> > [ 0.354527] r8169: probe of 0000:02:00.0 failed with error -22
> >
> > Machine is Acer Aspire One, Atom N270 CPU.
> >
> > Actually, the breakage seems to have appeared a bit earlier,
> > sometime between .32 and .33-rc1 as the bisection result shows:
> >
> > ac1aa47b131416a6ff37eb1005a0a1d2541aad6c is the first bad commit
> > commit ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
> > Author: Jesse Barnes <[email protected]>
> > Date: Mon Oct 26 13:20:44 2009 -0700
> >
> > PCI: determine CLS more intelligently
> >
> > Till now, CLS has been determined either by arch code or as
> > L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86
> > doesn't always get it right. On most configurations, the chance is
> > that firmware configures the correct value during boot.
> >
> > This patch makes pci_init() determine CLS by looking at what
> > firmware has configured. It scans all devices and if all non-zero
> > values agree, the value is used. If none is configured or there is
> > a disagreement, pci_dfl_cache_line_size is used. arch can set the
> > dfl value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
> > override the actual one.
> >
> > ia64, x86 and sparc64 updated to set the default cls instead of
> > the actual one.
> >
> > While at it, declare pci_cache_line_size and
> > pci_dfl_cache_line_size in pci.h and drop private declarations from
> > arch code.
>
> Ok here's what happens:
>
> pci_apply_final_quirks() dumps on the console
>
> [ 0.369252] PCI: CLS 0 bytes, default 64
>
> which means that it hasn't fallen back to setting the default cache
> line size. Also, the call
>
> pci_read_config_byte(dev, PCI_CACHE_LINE_SIZE, &tmp);
>
> sets tmp = 0 and the following condition hits everytime
>
> if (!cls)
> cls = tmp;
> if (!tmp || cls == tmp)
> continue;
>
> Which means that we never get around the set the default CLS.
>
> The following dirty fix solves the issue on my machine:
>
> --
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 7cfa7c3..9854c26 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2629,7 +2629,10 @@ static int __init pci_apply_final_quirks(void)
> if (!pci_cache_line_size) {
> printk(KERN_DEBUG "PCI: CLS %u bytes, default %u\n",
> cls << 2, pci_dfl_cache_line_size << 2);
> - pci_cache_line_size = cls;
> + if (!cls)
> + pci_cache_line_size =
> pci_dfl_cache_line_size;
> + else
> + pci_cache_line_size = cls;
> }

There should be a fix queued in my for-linus branch (I've been out this
week so haven't sent the pull req yet), can you give it a try and make
sure?

Jesse

2009-12-26 08:02:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: Linux 2.6.33-rc2 - Merry Christmas ...

On Fri, Dec 25, 2009 at 10:00:31PM -0800, Jesse Barnes wrote:
> There should be a fix queued in my for-linus branch (I've been out this
> week so haven't sent the pull req yet), can you give it a try and make
> sure?

Yep, 2820f333e3b4ad96590093efbed7b3400bcf492b fixes it.

Thanks.

--
Regards/Gruss,
Boris.

2009-12-26 09:36:28

by Borislav Petkov

[permalink] [raw]
Subject: EHCI resume sysfs duplicates (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Hi,

I'm getting the following warning upon resume.
Kernel is 2.6.33-rc2 + one unrelated PCI fix¹

------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:477 sysfs_add_one+0x107/0x121()
Hardware name: System Product Name
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:12.2/usb1/1-2/1-2:1.0/ep_81'
Modules linked in: binfmt_misc kvm_amd kvm powernow_k8 cpufreq_ondemand cpufreq_powersave cpufreq_userspace freq_table c
pufreq_conservative ipv6 vfat fat 8250_pnp 8250 pcspkr serial_core ohci_hcd k10temp edac_core
Pid: 279, comm: khubd Not tainted 2.6.33-rc2-00001-g6d7daec #1
Call Trace:
[<ffffffff8103b010>] warn_slowpath_common+0x7c/0x94
[<ffffffff8103b07f>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8112c448>] sysfs_add_one+0x107/0x121
[<ffffffff8118b8c1>] ? kobject_add_internal+0x9e/0x1a0
[<ffffffff8112c9d8>] create_dir+0x5d/0x98
[<ffffffff8112ca50>] sysfs_create_dir+0x3d/0x50
[<ffffffff813eb98b>] ? _raw_spin_unlock+0x35/0x52
[<ffffffff8118b902>] kobject_add_internal+0xdf/0x1a0
[<ffffffff8118ba99>] kobject_add_varg+0x41/0x50
[<ffffffff8118bb63>] kobject_add+0x64/0x66
[<ffffffff8118b79a>] ? kobject_get+0x1a/0x22
[<ffffffff812930dc>] device_add+0xad/0x4e6
[<ffffffff81195a7b>] ? __raw_spin_lock_init+0x31/0x52
[<ffffffff81293533>] device_register+0x1e/0x22
[<ffffffff812ea5ae>] usb_create_ep_devs+0xfd/0x127
[<ffffffff812e3604>] create_intf_ep_devs+0x50/0x6c
[<ffffffff812e4a81>] usb_set_interface+0x228/0x23a
[<ffffffff813eb938>] ? _raw_spin_unlock_irqrestore+0x5b/0x79
[<ffffffff812de26d>] usb_reset_and_verify_device+0x412/0x497
[<ffffffff812de577>] usb_port_resume+0x164/0x25b
[<ffffffff813e9404>] ? mutex_unlock+0xe/0x10
[<ffffffff812edc84>] generic_resume+0x1c/0x1e
[<ffffffff812e5b1c>] usb_resume_device+0x3a/0x3c
[<ffffffff812e61a9>] usb_resume_both+0x70/0x10a
[<ffffffff812e6e6f>] usb_external_resume_device+0x39/0x77
[<ffffffff812dfd51>] hub_thread+0x4eb/0xfc6
[<ffffffff813eb8c0>] ? _raw_spin_unlock_irq+0x41/0x5e
[<ffffffff81057a00>] ? autoremove_wake_function+0x0/0x39
[<ffffffff813eb938>] ? _raw_spin_unlock_irqrestore+0x5b/0x79
[<ffffffff812df866>] ? hub_thread+0x0/0xfc6
[<ffffffff81057571>] kthread+0x7f/0x87
[<ffffffff810030e4>] kernel_thread_helper+0x4/0x10
[<ffffffff813eb8c0>] ? _raw_spin_unlock_irq+0x41/0x5e
[<ffffffff813ebe54>] ? restore_args+0x0/0x30
[<ffffffff810574f2>] ? kthread+0x0/0x87
[<ffffffff810030e0>] ? kernel_thread_helper+0x0/0x10
---[ end trace 044a12284a40512a ]---

.config and dmesg attached.


¹http://git.kernel.org/?p=linux/kernel/git/jbarnes/pci-2.6.git;a=commit;h=2820f333e3b4ad96590093efbed7b3400bcf492b
--
Regards/Gruss,
Boris.


Attachments:
(No filename) (2.59 kB)
.config (60.32 kB)
dmesg.log (52.40 kB)
Download all attachments

2009-12-26 09:45:14

by Borislav Petkov

[permalink] [raw]
Subject: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Hi,

this jumped into dmesg upon resume (.config and dmesg are attached in
the previous "EHCI resume sysfs duplicates..." message in this thread):

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.33-rc2-00001-g6d7daec #1
-------------------------------------------------------
Xorg/3076 is trying to acquire lock:
(&dev->struct_mutex){+.+.+.}, at: [<ffffffff81223fd4>] drm_mmap+0x38/0x5c

but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff810b7509>] sys_mmap_pgoff+0xd6/0x1b4

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++++}:
[<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff810bb2b5>] might_fault+0x95/0xb8
[<ffffffff810e87d6>] filldir+0x75/0xd0
[<ffffffff8112be2a>] sysfs_readdir+0x10f/0x149
[<ffffffff810e895b>] vfs_readdir+0x6b/0xa8
[<ffffffff810e8ae1>] sys_getdents+0x81/0xd1
[<ffffffff810022f2>] system_call_fastpath+0x16/0x1b

-> #2 (sysfs_mutex){+.+.+.}:
[<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
[<ffffffff8112c488>] sysfs_addrm_start+0x26/0x28
[<ffffffff8112c940>] sysfs_remove_dir+0x52/0x8d
[<ffffffff8118b6f9>] kobject_del+0x16/0x37
[<ffffffff8118b758>] kobject_release+0x3e/0x66
[<ffffffff8118c5b5>] kref_put+0x43/0x4d
[<ffffffff8118b674>] kobject_put+0x47/0x4b
[<ffffffff813e11c1>] cacheinfo_cpu_callback+0xa2/0xdb
[<ffffffff8105c317>] notifier_call_chain+0x37/0x63
[<ffffffff8105c3c7>] raw_notifier_call_chain+0x14/0x16
[<ffffffff813d58ec>] _cpu_down+0x1a5/0x29a
[<ffffffff8103c851>] disable_nonboot_cpus+0x74/0x10d
[<ffffffff8107793e>] hibernation_snapshot+0x99/0x1d3
[<ffffffff81077b46>] hibernate+0xce/0x172
[<ffffffff810768d4>] state_store+0x5c/0xd3
[<ffffffff8118b48b>] kobj_attr_store+0x17/0x19
[<ffffffff8112b4bd>] sysfs_write_file+0x108/0x144
[<ffffffff810daf53>] vfs_write+0xb2/0x153
[<ffffffff810db0b7>] sys_write+0x4a/0x71
[<ffffffff810022f2>] system_call_fastpath+0x16/0x1b

-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
[<ffffffff8103c980>] get_online_cpus+0x3c/0x50
[<ffffffff81014c1a>] mtrr_del_page+0x3e/0x13c
[<ffffffff81014d5f>] mtrr_del+0x47/0x4f
[<ffffffff8121c23b>] drm_rmmap_locked+0xdc/0x1a2
[<ffffffff812226e3>] drm_master_destroy+0x86/0x11f
[<ffffffff8118c5b5>] kref_put+0x43/0x4d
[<ffffffff812225c4>] drm_master_put+0x20/0x2b
[<ffffffff8121ea71>] drm_release+0x54b/0x688
[<ffffffff810dbb24>] __fput+0x125/0x1e7
[<ffffffff810dbc00>] fput+0x1a/0x1c
[<ffffffff810d8d02>] filp_close+0x5d/0x67
[<ffffffff810d8db9>] sys_close+0xad/0xe7
[<ffffffff810022f2>] system_call_fastpath+0x16/0x1b

-> #0 (&dev->struct_mutex){+.+.+.}:
[<ffffffff81069170>] __lock_acquire+0x1023/0x16fd
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
[<ffffffff81223fd4>] drm_mmap+0x38/0x5c
[<ffffffff810c34f5>] mmap_region+0x2e0/0x4ff
[<ffffffff810c39a4>] do_mmap_pgoff+0x290/0x2f3
[<ffffffff810b7529>] sys_mmap_pgoff+0xf6/0x1b4
[<ffffffff8100719b>] sys_mmap+0x22/0x27
[<ffffffff810022f2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by Xorg/3076:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff810b7509>] sys_mmap_pgoff+0xd6/0x1b4

stack backtrace:
Pid: 3076, comm: Xorg Tainted: G W 2.6.33-rc2-00001-g6d7daec #1
Call Trace:
[<ffffffff81067c10>] print_circular_bug+0xae/0xbd
[<ffffffff81069170>] __lock_acquire+0x1023/0x16fd
[<ffffffff81223fd4>] ? drm_mmap+0x38/0x5c
[<ffffffff8106993c>] lock_acquire+0xf2/0x116
[<ffffffff81223fd4>] ? drm_mmap+0x38/0x5c
[<ffffffff81223fd4>] ? drm_mmap+0x38/0x5c
[<ffffffff81223fd4>] ? drm_mmap+0x38/0x5c
[<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
[<ffffffff81223fd4>] ? drm_mmap+0x38/0x5c
[<ffffffff81067526>] ? mark_held_locks+0x52/0x70
[<ffffffff810d55f6>] ? kmem_cache_alloc+0xc2/0x168
[<ffffffff810c3452>] ? mmap_region+0x23d/0x4ff
[<ffffffff810677b1>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff81223fd4>] drm_mmap+0x38/0x5c
[<ffffffff813eac54>] ? __down_write_nested+0x1c/0xcc
[<ffffffff810c34f5>] mmap_region+0x2e0/0x4ff
[<ffffffff810c39a4>] do_mmap_pgoff+0x290/0x2f3
[<ffffffff810b7529>] sys_mmap_pgoff+0xf6/0x1b4
[<ffffffff810677b1>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff813eae16>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100719b>] sys_mmap+0x22/0x27
[<ffffffff810022f2>] system_call_fastpath+0x16/0x1b
[drm] Setting GART location based on new memory map
[drm] Loading RV635 CP Microcode
platform r600_cp.0: firmware: using built-in firmware radeon/RV635_pfp.bin
platform r600_cp.0: firmware: using built-in firmware radeon/RV635_me.bin
[drm] Resetting GPU
[drm] writeback test succeeded in 1 usecs


--
Regards/Gruss,
Boris.

2009-12-26 20:18:08

by Len Brown

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

> [ 28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0)
> [ 28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014
> [ 28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70

Rather than reverting the broken patch that caused wmi to load,
does applying this patch to deal with the broken error handling
cause the oops to go away?

thanks,
-Len

>From d11e073ee3e3091d9190dace97ce480e960cca1b Mon Sep 17 00:00:00 2001
From: Len Brown <[email protected]>
Date: Fri, 25 Dec 2009 23:14:26 -0500
Subject: [PATCH] Revert "wmi: Free the allocated acpi objects through wmi_get_event_data"
X-Patchwork-Hint: ignore

This reverts commit 3e9b988e4edf065d39c1343937f717319b1c1065.

Reported-by: Sedat Dilek <[email protected]>
Tested-by Maciej Rutecki <[email protected]>
Signed-off-by: Len Brown <[email protected]>
---
drivers/platform/x86/dell-wmi.c | 1 -
drivers/platform/x86/hp-wmi.c | 2 --
drivers/platform/x86/wmi.c | 4 ++--
3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..46244c6 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -238,7 +238,6 @@ static void dell_wmi_notify(u32 value, void *context)
input_sync(dell_wmi_input_dev);
}
}
- kfree(obj);
}


diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
index 8781d8f..222ab57 100644
--- a/drivers/platform/x86/hp-wmi.c
+++ b/drivers/platform/x86/hp-wmi.c
@@ -388,8 +388,6 @@ static void hp_wmi_notify(u32 value, void *context)
} else
printk(KERN_INFO "HP WMI: Unknown key pressed - %x\n",
eventcode);
-
- kfree(obj);
}

static int __init hp_wmi_input_setup(void)
diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
index 9f93d6c..e425a86 100644
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -540,8 +540,8 @@ EXPORT_SYMBOL_GPL(wmi_remove_notify_handler);
/**
* wmi_get_event_data - Get WMI data associated with an event
*
- * @event: Event to find
- * @out: Buffer to hold event data. out->pointer should be freed with kfree()
+ * @event - Event to find
+ * &out - Buffer to hold event data
*
* Returns extra data associated with an event in WMI.
*/
--
1.6.6.rc4.11.g129a5

2009-12-26 20:19:14

by Len Brown

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

> dell-wmi - fix condition to abort driver loading


applied to acpi-test

thanks,
-Len Brown, Intel Open Source Technology Center

> From: Dmitry Torokhov <[email protected]>
>
> The commit 1fdd407f4e3f2ecb453954cbebb6c22491c61853 incorrectly made driver
> abort loading when known GUID is present when it should have done exactly
> the opposite.
>
> Signed-off-by: Dmitry Torokhov <[email protected]>
> ---
>
> drivers/platform/x86/dell-wmi.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>
> diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
> index 916ccb2..c980782 100644
> --- a/drivers/platform/x86/dell-wmi.c
> +++ b/drivers/platform/x86/dell-wmi.c
> @@ -324,7 +324,7 @@ static int __init dell_wmi_init(void)
> {
> int err;
>
> - if (wmi_has_guid(DELL_EVENT_GUID)) {
> + if (!wmi_has_guid(DELL_EVENT_GUID)) {
> printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
> return -ENODEV;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2009-12-27 04:20:19

by Len Brown

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

On Sat, 26 Dec 2009, Len Brown wrote:

> > [ 28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0)
> > [ 28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014
> > [ 28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70
>
> Rather than reverting the broken patch that caused wmi to load,
> does applying this patch to deal with the broken error handling
> cause the oops to go away?

> Subject: [PATCH] Revert "wmi: Free the allocated acpi objects through wmi_get_event_data"
>
> This reverts commit 3e9b988e4edf065d39c1343937f717319b1c1065.

These kfree's look correct, assuming we properly check
the return status. So perhaps instead you can test
the patch below?

thanks,
-Len

>From 5caa3ab36da77d59017cff9b9d1e910862b489e7 Mon Sep 17 00:00:00 2001
Message-Id: <5caa3ab36da77d59017cff9b9d1e910862b489e7.1261887124.git.len.brown@intel.com>
In-Reply-To: <51b0f1c2b8c32ee44ff01ef74599a1f17e4fc565.1261887124.git.len.brown@intel.com>
References: <51b0f1c2b8c32ee44ff01ef74599a1f17e4fc565.1261887124.git.len.brown@intel.com>
From: Len Brown <[email protected]>
Date: Sat, 26 Dec 2009 23:02:24 -0500
Subject: [PATCH 3/3] dell-wmi, hp-wmi, msi-wmi: check wmi_get_event_data() return value
X-Patchwork-Hint: ignore

When acpi_evaluate_object() is passed ACPI_ALLOCATE_BUFFER,
the caller must kfree the returned buffer if AE_OK is returned.

The callers of wmi_get_event_data() pass ACPI_ALLOCATE_BUFFER,
and thus must check its return value before accessing
or kfree() on the buffer.

Signed-off-by: Len Brown <[email protected]>
---
drivers/platform/x86/dell-wmi.c | 7 ++++++-
drivers/platform/x86/hp-wmi.c | 7 ++++++-
drivers/platform/x86/msi-wmi.c | 7 ++++++-
3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 4c7e702..500af8c 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -202,8 +202,13 @@ static void dell_wmi_notify(u32 value, void *context)
struct acpi_buffer response = { ACPI_ALLOCATE_BUFFER, NULL };
static struct key_entry *key;
union acpi_object *obj;
+ acpi_status status;

- wmi_get_event_data(value, &response);
+ status = wmi_get_event_data(value, &response);
+ if (status != AE_OK) {
+ printk(KERN_INFO "dell-wmi: bad event status 0x%x\n", status);
+ return;
+ }

obj = (union acpi_object *)response.pointer;

diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
index 18bf741..5b648f0 100644
--- a/drivers/platform/x86/hp-wmi.c
+++ b/drivers/platform/x86/hp-wmi.c
@@ -338,8 +338,13 @@ static void hp_wmi_notify(u32 value, void *context)
static struct key_entry *key;
union acpi_object *obj;
int eventcode;
+ acpi_status status;

- wmi_get_event_data(value, &response);
+ status = wmi_get_event_data(value, &response);
+ if (status != AE_OK) {
+ printk(KERN_INFO "hp-wmi: bad event status 0x%x\n", status);
+ return;
+ }

obj = (union acpi_object *)response.pointer;

diff --git a/drivers/platform/x86/msi-wmi.c b/drivers/platform/x86/msi-wmi.c
index f746c67..f5f70d4 100644
--- a/drivers/platform/x86/msi-wmi.c
+++ b/drivers/platform/x86/msi-wmi.c
@@ -149,8 +149,13 @@ static void msi_wmi_notify(u32 value, void *context)
static struct key_entry *key;
union acpi_object *obj;
ktime_t cur;
+ acpi_status status;

- wmi_get_event_data(value, &response);
+ status = wmi_get_event_data(value, &response);
+ if (status != AE_OK) {
+ printk(KERN_INFO DRV_PFX "bad event status 0x%x\n", status);
+ return;
+ }

obj = (union acpi_object *)response.pointer;

--
1.6.0.6

2009-12-28 00:40:36

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Hi

Quick analysis is here.

> Hi,
>
> this jumped into dmesg upon resume (.config and dmesg are attached in
> the previous "EHCI resume sysfs duplicates..." message in this thread):
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.33-rc2-00001-g6d7daec #1
> -------------------------------------------------------
> Xorg/3076 is trying to acquire lock:
> (&dev->struct_mutex){+.+.+.}, at: [<ffffffff81223fd4>] drm_mmap+0x38/0x5c
>
> but task is already holding lock:
> (&mm->mmap_sem){++++++}, at: [<ffffffff810b7509>] sys_mmap_pgoff+0xd6/0x1b4
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&mm->mmap_sem){++++++}:
> [<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
> [<ffffffff8106993c>] lock_acquire+0xf2/0x116
> [<ffffffff810bb2b5>] might_fault+0x95/0xb8 <- mmap_sem
> [<ffffffff810e87d6>] filldir+0x75/0xd0 <- sysfs_mutex
> [<ffffffff8112be2a>] sysfs_readdir+0x10f/0x149
> [<ffffffff810e895b>] vfs_readdir+0x6b/0xa8
> [<ffffffff810e8ae1>] sys_getdents+0x81/0xd1
> [<ffffffff810022f2>] system_call_fastpath+0x16/0x1b
>
> -> #2 (sysfs_mutex){+.+.+.}:
> [<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
> [<ffffffff8106993c>] lock_acquire+0xf2/0x116
> [<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
> [<ffffffff8112c488>] sysfs_addrm_start+0x26/0x28 <- sysfs_mutex
> [<ffffffff8112c940>] sysfs_remove_dir+0x52/0x8d
> [<ffffffff8118b6f9>] kobject_del+0x16/0x37
> [<ffffffff8118b758>] kobject_release+0x3e/0x66
> [<ffffffff8118c5b5>] kref_put+0x43/0x4d
> [<ffffffff8118b674>] kobject_put+0x47/0x4b
> [<ffffffff813e11c1>] cacheinfo_cpu_callback+0xa2/0xdb
> [<ffffffff8105c317>] notifier_call_chain+0x37/0x63
> [<ffffffff8105c3c7>] raw_notifier_call_chain+0x14/0x16
> [<ffffffff813d58ec>] _cpu_down+0x1a5/0x29a <- cpu_hotplug.lock
> [<ffffffff8103c851>] disable_nonboot_cpus+0x74/0x10d
> [<ffffffff8107793e>] hibernation_snapshot+0x99/0x1d3
> [<ffffffff81077b46>] hibernate+0xce/0x172
> [<ffffffff810768d4>] state_store+0x5c/0xd3
> [<ffffffff8118b48b>] kobj_attr_store+0x17/0x19
> [<ffffffff8112b4bd>] sysfs_write_file+0x108/0x144
> [<ffffffff810daf53>] vfs_write+0xb2/0x153
> [<ffffffff810db0b7>] sys_write+0x4a/0x71
> [<ffffffff810022f2>] system_call_fastpath+0x16/0x1b
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
> [<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
> [<ffffffff8106993c>] lock_acquire+0xf2/0x116
> [<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
> [<ffffffff8103c980>] get_online_cpus+0x3c/0x50 <- cpu_hotplug.lock
> [<ffffffff81014c1a>] mtrr_del_page+0x3e/0x13c
> [<ffffffff81014d5f>] mtrr_del+0x47/0x4f
> [<ffffffff8121c23b>] drm_rmmap_locked+0xdc/0x1a2
> [<ffffffff812226e3>] drm_master_destroy+0x86/0x11f
> [<ffffffff8118c5b5>] kref_put+0x43/0x4d
> [<ffffffff812225c4>] drm_master_put+0x20/0x2b
> [<ffffffff8121ea71>] drm_release+0x54b/0x688 <- dev->struct_mutex
> [<ffffffff810dbb24>] __fput+0x125/0x1e7
> [<ffffffff810dbc00>] fput+0x1a/0x1c
> [<ffffffff810d8d02>] filp_close+0x5d/0x67
> [<ffffffff810d8db9>] sys_close+0xad/0xe7
> [<ffffffff810022f2>] system_call_fastpath+0x16/0x1b
>
> -> #0 (&dev->struct_mutex){+.+.+.}:
> [<ffffffff81069170>] __lock_acquire+0x1023/0x16fd
> [<ffffffff8106993c>] lock_acquire+0xf2/0x116
> [<ffffffff813e9e5c>] mutex_lock_nested+0x63/0x354
> [<ffffffff81223fd4>] drm_mmap+0x38/0x5c <- dev->struct_mutex
> [<ffffffff810c34f5>] mmap_region+0x2e0/0x4ff
> [<ffffffff810c39a4>] do_mmap_pgoff+0x290/0x2f3
> [<ffffffff810b7529>] sys_mmap_pgoff+0xf6/0x1b4
> [<ffffffff8100719b>] sys_mmap+0x22/0x27 <- mmap_sem
> [<ffffffff810022f2>] system_call_fastpath+0x16/0x1b

This output seems to suggest need to fix drm.


2009-12-28 09:45:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash


* Len Brown <[email protected]> wrote:

> On Sat, 26 Dec 2009, Len Brown wrote:
>
> > > [ 28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0)
> > > [ 28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014
> > > [ 28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70
> >
> > Rather than reverting the broken patch that caused wmi to load,
> > does applying this patch to deal with the broken error handling
> > cause the oops to go away?
>
> > Subject: [PATCH] Revert "wmi: Free the allocated acpi objects through wmi_get_event_data"
> >
> > This reverts commit 3e9b988e4edf065d39c1343937f717319b1c1065.
>
> These kfree's look correct, assuming we properly check
> the return status. So perhaps instead you can test
> the patch below?

Applied it to tip:out-of-tree for testing, and have dropped the revert as
well. Will let you know how it goes. (if you dont hear from me later today you
an assume it's all fixed.)

Thanks,

Ingo

2009-12-28 12:02:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash


* Ingo Molnar <[email protected]> wrote:

> * Len Brown <[email protected]> wrote:
>
> > On Sat, 26 Dec 2009, Len Brown wrote:
> >
> > > > [ 28.643216] evbug.c: Connected device: input1 (Dell WMI hotkeys at wmi/input0)
> > > > [ 28.650449] BUG: unable to handle kernel NULL pointer dereference at 00000014
> > > > [ 28.654439] IP: [<c17f7f21>] wmi_install_notify_handler+0x31/0x70
> > >
> > > Rather than reverting the broken patch that caused wmi to load,
> > > does applying this patch to deal with the broken error handling
> > > cause the oops to go away?
> >
> > > Subject: [PATCH] Revert "wmi: Free the allocated acpi objects through wmi_get_event_data"
> > >
> > > This reverts commit 3e9b988e4edf065d39c1343937f717319b1c1065.
> >
> > These kfree's look correct, assuming we properly check
> > the return status. So perhaps instead you can test
> > the patch below?
>
> Applied it to tip:out-of-tree for testing, and have dropped the revert as
> well. Will let you know how it goes. (if you dont hear from me later today
> you an assume it's all fixed.)

Still a very similar looking crash (attached). I went for the plain revert in
tip:out-of-tree again.

(Note that the system does not have this hardware, and that it's booted with
the driver built-in. So the relevant codepath should be very simple. Config
attached.)

Ingo

[ 27.447053] initcall compal_init+0x0/0xf7 returned -19 after 3 usecs
[ 27.453409] calling dell_wmi_init+0x0/0x129 @ 1
[ 27.458255] PM: Adding info for No Bus:input3
[ 27.462676] input: Dell WMI hotkeys as /class/input/input3
[ 27.468179] BUG: unable to handle kernel NULL pointer dereference at 00000014
[ 27.472165] IP: [<c1f26aa8>] wmi_install_notify_handler+0x28/0x80
[ 27.472165] *pde = 00000000
[ 27.472165] Oops: 0000 [#1] PREEMPT SMP
[ 27.472165] last sysfs file:
[ 27.472165]
[ 27.472165] Pid: 1, comm: swapper Not tainted 2.6.33-rc2-tip-00212-g3c2365e-dirty #3297 A8N-E/System Product Name
[ 27.472165] EIP: 0060:[<c1f26aa8>] EFLAGS: 00010282 CPU: 0
[ 27.472165] EIP is at wmi_install_notify_handler+0x28/0x80
[ 27.472165] EAX: fffffff4 EBX: c1f1e200 ECX: 22b612b0 EDX: c2b2d1e0
[ 27.472165] ESI: 00000000 EDI: 00000001 EBP: f64b3f84 ESP: f64b3f78
[ 27.472165] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 27.472165] Process swapper (pid: 1, ti=f64b3000 task=f64d0000 task.ti=f64b3000)
[ 27.472165] Stack:
[ 27.472165] fffffff4 00000000 00000006 f64b3f98 c28d70d7 64a0689d 00000006 64a0689d
[ 27.472165] <0> f64b3fc4 c100112b c255aa54 c28d6fb6 00000001 00000003 00000000 c28d6fb6
[ 27.472165] <0> c294a50c c2882310 00000000 f64b3fd0 c28822fd c29493e8 f64b3fe4 c2882392
[ 27.472165] Call Trace:
[ 27.472165] [<c28d70d7>] ? dell_wmi_init+0x121/0x129
[ 27.472165] [<c100112b>] ? do_one_initcall+0x2b/0x1c0
[ 27.472165] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
[ 27.472165] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
[ 27.472165] [<c2882310>] ? kernel_init+0x0/0xc9
[ 27.472165] [<c28822fd>] ? do_basic_setup+0x44/0x57
[ 27.472165] [<c2882392>] ? kernel_init+0x82/0xc9
[ 27.472165] [<c1039816>] ? kernel_thread_helper+0x6/0x10
[ 27.472165] Code: 00 00 00 55 89 e5 56 53 83 ec 04 0f 1f 44 00 00 89 d3 89 ce 85 c0 74 2b 85 d2 74 27 8d 55 f4 e8 2f fd ff ff 8b 45 f4 85 c0 74 48 <8b> 50 20 85 d2 74 21 b8 15 00 00 00 59 5b 5e 5d c3 8d b4 26 00
[ 27.472165] EIP: [<c1f26aa8>] wmi_install_notify_handler+0x28/0x80 SS:ESP 0068:f64b3f78
[ 27.472165] CR2: 0000000000000014
[ 27.638896] ---[ end trace 7a45c2b3ab0f183e ]---
[ 27.643524] Kernel panic - not syncing: Fatal exception
[ 27.648758] Pid: 1, comm: swapper Tainted: G D 2.6.33-rc2-tip-00212-g3c2365e-dirty #3297
[ 27.657546] Call Trace:
[ 27.659997] [<c2170a4a>] ? printk+0x1d/0x1f
[ 27.664281] [<c217097f>] panic+0x52/0x100
[ 27.668390] [<c103ce33>] oops_end+0xb3/0xc0
[ 27.672673] [<c1064d74>] no_context+0xb4/0xd0
[ 27.677124] [<c1064e27>] __bad_area_nosemaphore+0x97/0x140
[ 27.682706] [<c1082437>] ? vprintk+0x297/0x420
[ 27.687248] [<c1064ee7>] bad_area_nosemaphore+0x17/0x20
[ 27.692570] [<c10652c6>] do_page_fault+0x296/0x350
[ 27.697460] [<c1f1e200>] ? dell_wmi_notify+0x0/0x1c0
[ 27.702518] [<c1065030>] ? do_page_fault+0x0/0x350
[ 27.707408] [<c2173a16>] error_code+0x66/0x70
[ 27.711863] [<c1f1e200>] ? dell_wmi_notify+0x0/0x1c0
[ 27.716924] [<c1065030>] ? do_page_fault+0x0/0x350
[ 27.721810] [<c1f26aa8>] ? wmi_install_notify_handler+0x28/0x80
[ 27.727827] [<c28d70d7>] dell_wmi_init+0x121/0x129
[ 27.732713] [<c100112b>] do_one_initcall+0x2b/0x1c0
[ 27.737688] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
[ 27.742576] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
[ 27.747465] [<c2882310>] ? kernel_init+0x0/0xc9
[ 27.752094] [<c28822fd>] do_basic_setup+0x44/0x57
[ 27.756896] [<c2882392>] kernel_init+0x82/0xc9
[ 27.761436] [<c1039816>] kernel_thread_helper+0x6/0x10


Attachments:
(No filename) (4.82 kB)
config (73.58 kB)
Download all attachments

2009-12-28 15:04:40

by Paul Rolland

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

Hello,

On Mon, 28 Dec 2009 13:01:25 +0100
Ingo Molnar <[email protected]> wrote:

> (Note that the system does not have this hardware, and that it's booted
> with the driver built-in. So the relevant codepath should be very simple.
> Config attached.)
>
> Ingo
>
> [ 27.447053] initcall compal_init+0x0/0xf7 returned -19 after 3 usecs
> [ 27.453409] calling dell_wmi_init+0x0/0x129 @ 1
> [ 27.458255] PM: Adding info for No Bus:input3
> [ 27.462676] input: Dell WMI hotkeys as /class/input/input3
> [ 27.468179] BUG: unable to handle kernel NULL pointer dereference at
> 00000014 [ 27.472165] IP: [<c1f26aa8>]
> wmi_install_notify_handler+0x28/0x80 [ 27.472165] *pde = 00000000
> [ 27.472165] Oops: 0000 [#1] PREEMPT SMP
> [ 27.472165] last sysfs file:
> [ 27.472165]
> [ 27.472165] Pid: 1, comm: swapper Not tainted
> 2.6.33-rc2-tip-00212-g3c2365e-dirty #3297 A8N-E/System Product Name
> [ 27.472165] EIP: 0060:[<c1f26aa8>] EFLAGS: 00010282 CPU: 0
> [ 27.472165] EIP is at wmi_install_notify_handler+0x28/0x80
> [ 27.472165] EAX: fffffff4 EBX: c1f1e200 ECX: 22b612b0 EDX: c2b2d1e0
> [ 27.472165] ESI: 00000000 EDI: 00000001 EBP: f64b3f84 ESP: f64b3f78
> [ 27.472165] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 27.472165] Process swapper (pid: 1, ti=f64b3000 task=f64d0000
> task.ti=f64b3000) [ 27.472165] Stack: [ 27.472165] fffffff4 00000000
> 00000006 f64b3f98 c28d70d7 64a0689d 00000006 64a0689d [ 27.472165] <0>
> f64b3fc4 c100112b c255aa54 c28d6fb6 00000001 00000003 00000000 c28d6fb6
> [ 27.472165] <0> c294a50c c2882310 00000000 f64b3fd0 c28822fd c29493e8
> f64b3fe4 c2882392 [ 27.472165] Call Trace: [ 27.472165]
> [<c28d70d7>] ? dell_wmi_init+0x121/0x129 [ 27.472165] [<c100112b>] ?
> do_one_initcall+0x2b/0x1c0 [ 27.472165] [<c28d6fb6>] ?
> dell_wmi_init+0x0/0x129 [ 27.472165] [<c28d6fb6>] ?
> dell_wmi_init+0x0/0x129 [ 27.472165] [<c2882310>] ?
> kernel_init+0x0/0xc9 [ 27.472165] [<c28822fd>] ?
> do_basic_setup+0x44/0x57 [ 27.472165] [<c2882392>] ?
> kernel_init+0x82/0xc9 [ 27.472165] [<c1039816>] ?
> kernel_thread_helper+0x6/0x10 [ 27.472165] Code: 00 00 00 55 89 e5 56
> 53 83 ec 04 0f 1f 44 00 00 89 d3 89 ce 85 c0 74 2b 85 d2 74 27 8d 55 f4
> e8 2f fd ff ff 8b 45 f4 85 c0 74 48 <8b> 50 20 85 d2 74 21 b8 15 00 00 00
> 59 5b 5e 5d c3 8d b4 26 00 [ 27.472165] EIP: [<c1f26aa8>]
> wmi_install_notify_handler+0x28/0x80 SS:ESP 0068:f64b3f78 [ 27.472165]
> CR2: 0000000000000014 [ 27.638896] ---[ end trace 7a45c2b3ab0f183e ]---
> [ 27.643524] Kernel panic - not syncing: Fatal exception [ 27.648758]
> Pid: 1, comm: swapper Tainted: G D
> 2.6.33-rc2-tip-00212-g3c2365e-dirty #3297 [ 27.657546] Call Trace:
> [ 27.659997] [<c2170a4a>] ? printk+0x1d/0x1f [ 27.664281]
> [<c217097f>] panic+0x52/0x100 [ 27.668390] [<c103ce33>]
> oops_end+0xb3/0xc0 [ 27.672673] [<c1064d74>] no_context+0xb4/0xd0
> [ 27.677124] [<c1064e27>] __bad_area_nosemaphore+0x97/0x140
> [ 27.682706] [<c1082437>] ? vprintk+0x297/0x420 [ 27.687248]
> [<c1064ee7>] bad_area_nosemaphore+0x17/0x20 [ 27.692570] [<c10652c6>]
> do_page_fault+0x296/0x350 [ 27.697460] [<c1f1e200>] ?
> dell_wmi_notify+0x0/0x1c0 [ 27.702518] [<c1065030>] ?
> do_page_fault+0x0/0x350 [ 27.707408] [<c2173a16>] error_code+0x66/0x70
> [ 27.711863] [<c1f1e200>] ? dell_wmi_notify+0x0/0x1c0
> [ 27.716924] [<c1065030>] ? do_page_fault+0x0/0x350
> [ 27.721810] [<c1f26aa8>] ? wmi_install_notify_handler+0x28/0x80
> [ 27.727827] [<c28d70d7>] dell_wmi_init+0x121/0x129
> [ 27.732713] [<c100112b>] do_one_initcall+0x2b/0x1c0
> [ 27.737688] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
> [ 27.742576] [<c28d6fb6>] ? dell_wmi_init+0x0/0x129
> [ 27.747465] [<c2882310>] ? kernel_init+0x0/0xc9
> [ 27.752094] [<c28822fd>] do_basic_setup+0x44/0x57
> [ 27.756896] [<c2882392>] kernel_init+0x82/0xc9
> [ 27.761436] [<c1039816>] kernel_thread_helper+0x6/0x10

Looks like it is very similar to what I have when I run :
modprobe dell-wmi
on my Dell laptop (Vostro 1520).

Message from syslogd@tux at Dec 28 16:00:22 ...
kernel:Oops: 0000 [#1] SMP

Message from syslogd@tux at Dec 28 16:00:22 ...
kernel:last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

Message from syslogd@tux at Dec 28 16:00:22 ...
kernel:Stack:

Message from syslogd@tux at Dec 28 16:00:22 ...
kernel:Call Trace:

Message from syslogd@tux at Dec 28 16:00:22 ...
kernel:Code: 65 f8 48 89 f3 49 89 d4 48 85 f6 74 35 48 85 ff 74 30 48 8d 75 e8 e8 27 fd ff ff 48 8b 55 e8 b8 06 00 00 00 48 85 d2 74 09 b0 15 <48> 83 7a 30 00 74 20 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44

Message from syslogd@tux at Dec 28 16:00:22 ...
kernel:CR2: 0000000100000024
Dec 28 16:00:22 tux kernel: ACPI: WMI: Mapper loaded
Dec 28 16:00:22 tux kernel: input: Dell WMI hotkeys as /devices/virtual/input/input13
Dec 28 16:00:22 tux kernel: BUG: unable to handle kernel paging request at 0000000100000024
Dec 28 16:00:22 tux kernel: IP: [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi]
Dec 28 16:00:22 tux kernel: PGD 565f3067 PUD 0
Dec 28 16:00:22 tux kernel: Oops: 0000 [#1] SMP
Dec 28 16:00:22 tux kernel: last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
Dec 28 16:00:22 tux kernel: CPU 1
Dec 28 16:00:22 tux kernel: Pid: 20215, comm: modprobe Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520
Dec 28 16:00:22 tux kernel: RIP: 0010:[<ffffffffa0003789>] [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi]
Dec 28 16:00:22 tux kernel: RSP: 0018:ffff88009434bed8 EFLAGS: 00010202
Dec 28 16:00:22 tux kernel: RAX: 0000000000000015 RBX: ffffffffa014f0f0 RCX: 000000000000009d
Dec 28 16:00:22 tux kernel: RDX: 00000000fffffff4 RSI: ffff88009434be98 RDI: ffff88009434be8c
Dec 28 16:00:22 tux kernel: RBP: ffff88009434bef8 R08: ffff88009434be88 R09: 0000000000000010
Dec 28 16:00:22 tux kernel: R10: ffff88009434bee0 R11: 0000000000000006 R12: 0000000000000000
Dec 28 16:00:22 tux kernel: R13: 0000000002188470 R14: 0000000000000000 R15: 0000000000000000
Dec 28 16:00:22 tux kernel: FS: 00007fb6aa7446f0(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
Dec 28 16:00:22 tux kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 28 16:00:22 tux kernel: CR2: 0000000100000024 CR3: 00000000bd319000 CR4: 00000000000406e0
Dec 28 16:00:22 tux kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 28 16:00:22 tux kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 28 16:00:22 tux kernel: Process modprobe (pid: 20215, threadinfo ffff88009434a000, task ffff8800a3a0ace0)
Dec 28 16:00:22 tux kernel: Stack:
Dec 28 16:00:22 tux kernel: ffff88009434bef8 00000000fffffff4 0000000000000000 ffffffffa0152000
Dec 28 16:00:22 tux kernel: <0> ffff88009434bf18 ffffffffa0152142 0000000000000000 0000000000000000
Dec 28 16:00:22 tux kernel: <0> ffff88009434bf48 ffffffff810001d7 0000000000000000 ffffffffa014f780
Dec 28 16:00:22 tux kernel: Call Trace:
Dec 28 16:00:22 tux kernel: [<ffffffffa0152000>] ? dell_wmi_init+0x0/0x14a [dell_wmi]
Dec 28 16:00:22 tux kernel: [<ffffffffa0152142>] dell_wmi_init+0x142/0x14a [dell_wmi]
Dec 28 16:00:22 tux kernel: [<ffffffff810001d7>] do_one_initcall+0x37/0x190
Dec 28 16:00:22 tux kernel: [<ffffffff81074408>] sys_init_module+0xd8/0x250
Dec 28 16:00:22 tux kernel: [<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
Dec 28 16:00:22 tux kernel: Code: 65 f8 48 89 f3 49 89 d4 48 85 f6 74 35 48 85 ff 74 30 48 8d 75 e8 e8 27 fd ff ff 48 8b 55 e8 b8 06 00 00 00 48 85 d2 74 09 b0 15 <48> 83 7a 30 00 74 20 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44
Dec 28 16:00:22 tux kernel: RIP [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi]
Dec 28 16:00:22 tux kernel: RSP <ffff88009434bed8>
Dec 28 16:00:22 tux kernel: CR2: 0000000100000024
Dec 28 16:00:22 tux kernel: ---[ end trace d18b623021a6c139 ]---


--
Paul Rolland E-Mail : rol(at)witbe.net
CTO - Witbe.net SA Tel. +33 (0)1 47 67 77 77
Les Collines de l'Arche Fax. +33 (0)1 47 67 77 99
F-92057 Paris La Defense RIPE : PR12-RIPE
LinkedIn : http://www.linkedin.com/in/paulrolland

This is dedicated to all the ones who want to control Internet, its
content or its usage :

"I worry about my child and the Internet all the time, even though she's
too young to have logged on yet. Here's what I worry about. I worry that 10
or 15 years from now, she will come to me and say 'Daddy, where were you
when they took freedom of the press away from the Internet?'"
--Mike Godwin, Electronic Frontier Foundation

2009-12-28 16:30:07

by Paul Rolland

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

Hello,

On Mon, 28 Dec 2009 16:02:12 +0100
Paul Rolland (ポール・ロラン) <[email protected]> wrote:

> Message from syslogd@tux at Dec 28 16:00:22 ...
> kernel:CR2: 0000000100000024
> Dec 28 16:00:22 tux kernel: ACPI: WMI: Mapper loaded
> Dec 28 16:00:22 tux kernel: input: Dell WMI hotkeys
> as /devices/virtual/input/input13 Dec 28 16:00:22 tux kernel: BUG: unable
> to handle kernel paging request at 0000000100000024 Dec 28 16:00:22 tux
> kernel: IP: [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80
> [wmi] Dec 28 16:00:22 tux kernel: PGD 565f3067 PUD 0 Dec 28 16:00:22 tux
> kernel: Oops: 0000 [#1] SMP Dec 28 16:00:22 tux kernel: last sysfs
> file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq Dec 28
> 16:00:22 tux kernel: CPU 1 Dec 28 16:00:22 tux kernel: Pid: 20215, comm:
> modprobe Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520 Dec 28 16:00:22 tux
> kernel: RIP: 0010:[<ffffffffa0003789>] [<ffffffffa0003789>]
> wmi_install_notify_handler+0x39/0x80 [wmi] Dec 28 16:00:22 tux kernel:
> RSP: 0018:ffff88009434bed8 EFLAGS: 00010202 Dec 28 16:00:22 tux kernel:
> RAX: 0000000000000015 RBX: ffffffffa014f0f0 RCX: 000000000000009d Dec 28
> 16:00:22 tux kernel: RDX: 00000000fffffff4 RSI: ffff88009434be98 RDI:
> ffff88009434be8c Dec 28 16:00:22 tux kernel: RBP: ffff88009434bef8 R08:
> ffff88009434be88 R09: 0000000000000010 Dec 28 16:00:22 tux kernel: R10:
> ffff88009434bee0 R11: 0000000000000006 R12: 0000000000000000 Dec 28
> 16:00:22 tux kernel: R13: 0000000002188470 R14: 0000000000000000 R15:
> 0000000000000000 Dec 28 16:00:22 tux kernel: FS: 00007fb6aa7446f0(0000)
> GS:ffff880028300000(0000) knlGS:0000000000000000 Dec 28 16:00:22 tux
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 28 16:00:22
> tux kernel: CR2: 0000000100000024 CR3: 00000000bd319000 CR4:
> 00000000000406e0 Dec 28 16:00:22 tux kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000 Dec 28 16:00:22 tux kernel: DR3:
> 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 28
> 16:00:22 tux kernel: Process modprobe (pid: 20215, threadinfo
> ffff88009434a000, task ffff8800a3a0ace0) Dec 28 16:00:22 tux kernel:
> Stack: Dec 28 16:00:22 tux kernel: ffff88009434bef8 00000000fffffff4
> 0000000000000000 ffffffffa0152000 Dec 28 16:00:22 tux kernel: <0>
> ffff88009434bf18 ffffffffa0152142 0000000000000000 0000000000000000 Dec
> 28 16:00:22 tux kernel: <0> ffff88009434bf48 ffffffff810001d7
> 0000000000000000 ffffffffa014f780 Dec 28 16:00:22 tux kernel: Call Trace:
> Dec 28 16:00:22 tux kernel: [<ffffffffa0152000>] ?
> dell_wmi_init+0x0/0x14a [dell_wmi] Dec 28 16:00:22 tux kernel:
> [<ffffffffa0152142>] dell_wmi_init+0x142/0x14a [dell_wmi] Dec 28 16:00:22
> tux kernel: [<ffffffff810001d7>] do_one_initcall+0x37/0x190 Dec 28
> 16:00:22 tux kernel: [<ffffffff81074408>] sys_init_module+0xd8/0x250 Dec
> 28 16:00:22 tux kernel: [<ffffffff810024ab>]
> system_call_fastpath+0x16/0x1b Dec 28 16:00:22 tux kernel: Code: 65 f8 48
> 89 f3 49 89 d4 48 85 f6 74 35 48 85 ff 74 30 48 8d 75 e8 e8 27 fd ff ff
> 48 8b 55 e8 b8 06 00 00 00 48 85 d2 74 09 b0 15 <48> 83 7a 30 00 74 20 48
> 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 Dec 28 16:00:22 tux kernel: RIP
> [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi] Dec 28
> 16:00:22 tux kernel: RSP <ffff88009434bed8> Dec 28 16:00:22 tux kernel:
> CR2: 0000000100000024 Dec 28 16:00:22 tux kernel: ---[ end trace
> d18b623021a6c139 ]---


I've fixed this one by changing wmi.c :

--- wmi.c 2009-12-28 17:11:52.000000000 +0100
+++ wmi.c.changed 2009-12-28 17:12:17.000000000 +0100
@@ -488,12 +488,13 @@
{
struct wmi_block *block;
acpi_status status;
+ bool err;

if (!guid || !handler)
return AE_BAD_PARAMETER;

- find_guid(guid, &block);
- if (!block)
+ err = find_guid(guid, &block);
+ if (!err)
return AE_NOT_EXIST;

if (block->handler)

Signed-off-by: [email protected] <Paul Rolland>

but now I have :
input: Dell WMI hotkeys as /devices/virtual/input/input13
dell-wmi: Unable to register notify handler - 6
sys_init_module: 'dell_wmi'->init suspiciously returned 6, it should follow 0/-E convention
sys_init_module: loading module anyway...
Pid: 2220, comm: modprobe Not tainted 2.6.33-rc2 #1
Call Trace:
[<ffffffff8107450a>] sys_init_module+0x1da/0x250
[<ffffffff810024ab>] system_call_fastpath+0x16/0x1b


Then, I tried to remove dell-wmi to fix the warning, and then I have :
BUG: unable to handle kernel NULL pointer dereference at 00000000000008b0
IP: [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
PGD bb506067 PUD bd170067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
CPU 1
Pid: 3833, comm: rmmod Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520
RIP: 0010:[<ffffffffa024971b>] [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
RSP: 0000:ffff8800bd3e7ea8 EFLAGS: 00010202
RAX: 000000000000000a RBX: 0000000000000880 RCX: 000000000000009d
RDX: 0000000000000000 RSI: ffff8800bd3e7e68 RDI: 0000000000000880
RBP: ffff8800bd3e7eb8 R08: ffff8800bd3e7e58 R09: 0000000000000010
R10: ffff8800bd3e7eb0 R11: 0000000000000006 R12: ffffffffa0251780
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f628c4c46f0(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000008b0 CR3: 00000000bb429000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 3833, threadinfo ffff8800bd3e6000, task ffff88013e43d9c0)
Stack:
ffff8800bd3e7ec8 0000000000000880 ffff8800bd3e7ec8 ffffffffa0251374
<0> ffff8800bd3e7f78 ffffffff81072091 ffff88010aa39040 ffff88013e43d9c0
<0> ffffffffa0251780 ffffffff00000880 ffff8800bd3e7f14 ffffffff81024461
Call Trace:
[<ffffffffa0251374>] dell_wmi_exit+0x10/0x1e [dell_wmi]
[<ffffffff81072091>] sys_delete_module+0x1b1/0x290
[<ffffffff81024461>] ? do_page_fault+0x131/0x2d0
[<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
Code: 55 b8 01 10 00 00 48 89 e5 48 83 ec 10 48 85 ff 74 3f 48 8d 75 f8 e8 95 fd ff ff 48 8b 7d f8 b8 06 00 00 00 48 85 ff 74 28 b0 0a <48> 83 7f 30 00 74 1f 31 f6 e8 37 ff ff ff 48 8b 55 f8 48 c7 42
RIP [<ffffffffa024971b>] wmi_remove_notify_handler+0x2b/0x60 [wmi]
RSP <ffff8800bd3e7ea8>
CR2: 00000000000008b0
---[ end trace b8447e2a4bf54240 ]---

Paul

2009-12-28 16:55:18

by Paul Rolland

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

Hello,

> Message from syslogd@tux at Dec 28 16:00:22 ...
> kernel:CR2: 0000000100000024
> Dec 28 16:00:22 tux kernel: ACPI: WMI: Mapper loaded
> Dec 28 16:00:22 tux kernel: input: Dell WMI hotkeys
> as /devices/virtual/input/input13 Dec 28 16:00:22 tux kernel: BUG: unable
> to handle kernel paging request at 0000000100000024 Dec 28 16:00:22 tux
> kernel: IP: [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80
> [wmi] Dec 28 16:00:22 tux kernel: PGD 565f3067 PUD 0 Dec 28 16:00:22 tux
> kernel: Oops: 0000 [#1] SMP Dec 28 16:00:22 tux kernel: last sysfs
> file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq Dec 28
> 16:00:22 tux kernel: CPU 1 Dec 28 16:00:22 tux kernel: Pid: 20215, comm:
> modprobe Not tainted 2.6.33-rc2 #1 0T816J/Vostro 1520 Dec 28 16:00:22 tux
> kernel: RIP: 0010:[<ffffffffa0003789>] [<ffffffffa0003789>]
> wmi_install_notify_handler+0x39/0x80 [wmi] Dec 28 16:00:22 tux kernel:
> RSP: 0018:ffff88009434bed8 EFLAGS: 00010202 Dec 28 16:00:22 tux kernel:
> RAX: 0000000000000015 RBX: ffffffffa014f0f0 RCX: 000000000000009d Dec 28
> 16:00:22 tux kernel: RDX: 00000000fffffff4 RSI: ffff88009434be98 RDI:
> ffff88009434be8c Dec 28 16:00:22 tux kernel: RBP: ffff88009434bef8 R08:
> ffff88009434be88 R09: 0000000000000010 Dec 28 16:00:22 tux kernel: R10:
> ffff88009434bee0 R11: 0000000000000006 R12: 0000000000000000 Dec 28
> 16:00:22 tux kernel: R13: 0000000002188470 R14: 0000000000000000 R15:
> 0000000000000000 Dec 28 16:00:22 tux kernel: FS: 00007fb6aa7446f0(0000)
> GS:ffff880028300000(0000) knlGS:0000000000000000 Dec 28 16:00:22 tux
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 28 16:00:22
> tux kernel: CR2: 0000000100000024 CR3: 00000000bd319000 CR4:
> 00000000000406e0 Dec 28 16:00:22 tux kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000 Dec 28 16:00:22 tux kernel: DR3:
> 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 28
> 16:00:22 tux kernel: Process modprobe (pid: 20215, threadinfo
> ffff88009434a000, task ffff8800a3a0ace0) Dec 28 16:00:22 tux kernel:
> Stack: Dec 28 16:00:22 tux kernel: ffff88009434bef8 00000000fffffff4
> 0000000000000000 ffffffffa0152000 Dec 28 16:00:22 tux kernel: <0>
> ffff88009434bf18 ffffffffa0152142 0000000000000000 0000000000000000 Dec
> 28 16:00:22 tux kernel: <0> ffff88009434bf48 ffffffff810001d7
> 0000000000000000 ffffffffa014f780 Dec 28 16:00:22 tux kernel: Call Trace:
> Dec 28 16:00:22 tux kernel: [<ffffffffa0152000>] ?
> dell_wmi_init+0x0/0x14a [dell_wmi] Dec 28 16:00:22 tux kernel:
> [<ffffffffa0152142>] dell_wmi_init+0x142/0x14a [dell_wmi] Dec 28 16:00:22
> tux kernel: [<ffffffff810001d7>] do_one_initcall+0x37/0x190 Dec 28
> 16:00:22 tux kernel: [<ffffffff81074408>] sys_init_module+0xd8/0x250 Dec
> 28 16:00:22 tux kernel: [<ffffffff810024ab>]
> system_call_fastpath+0x16/0x1b Dec 28 16:00:22 tux kernel: Code: 65 f8 48
> 89 f3 49 89 d4 48 85 f6 74 35 48 85 ff 74 30 48 8d 75 e8 e8 27 fd ff ff
> 48 8b 55 e8 b8 06 00 00 00 48 85 d2 74 09 b0 15 <48> 83 7a 30 00 74 20 48
> 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 Dec 28 16:00:22 tux kernel: RIP
> [<ffffffffa0003789>] wmi_install_notify_handler+0x39/0x80 [wmi] Dec 28
> 16:00:22 tux kernel: RSP <ffff88009434bed8> Dec 28 16:00:22 tux kernel:
> CR2: 0000000100000024 Dec 28 16:00:22 tux kernel: ---[ end trace
> d18b623021a6c139 ]---
>

Fixed this completely with the following two patches on top of 2.6.33-rc2 :

--- wmi.c.orig 2009-12-28 17:27:15.000000000 +0100
+++ wmi.c 2009-12-28 17:39:01.000000000 +0100
@@ -488,12 +488,13 @@
{
struct wmi_block *block;
acpi_status status;
+ bool err;

if (!guid || !handler)
return AE_BAD_PARAMETER;

- find_guid(guid, &block);
- if (!block)
+ err = find_guid(guid, &block);
+ if (!err)
return AE_NOT_EXIST;

if (block->handler)
@@ -517,12 +518,13 @@
{
struct wmi_block *block;
acpi_status status;
+ bool err;

if (!guid)
return AE_BAD_PARAMETER;

- find_guid(guid, &block);
- if (!block)
+ err = find_guid(guid, &block);
+ if (!err)
return AE_NOT_EXIST;

if (!block->handler)

--- dell-wmi.c.orig 2009-12-28 17:49:09.000000000 +0100
+++ dell-wmi.c 2009-12-28 17:41:36.000000000 +0100
@@ -343,7 +343,7 @@
printk(KERN_ERR
"dell-wmi: Unable to register notify handler - %d\n",
err);
- return err;
+ return -err;
}

return 0;


Signed-off-by: [email protected] <Paul Rolland>

Paul

2009-12-28 20:17:27

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

Hi Paul,

On Mon, Dec 28, 2009 at 05:53:01PM +0100, Paul Rolland wrote:
>
> Fixed this completely with the following two patches on top of 2.6.33-rc2 :
>
> --- wmi.c.orig 2009-12-28 17:27:15.000000000 +0100
> +++ wmi.c 2009-12-28 17:39:01.000000000 +0100
> @@ -488,12 +488,13 @@
> {
> struct wmi_block *block;
> acpi_status status;
> + bool err;
>
> if (!guid || !handler)
> return AE_BAD_PARAMETER;
>
> - find_guid(guid, &block);
> - if (!block)
> + err = find_guid(guid, &block);
> + if (!err)
> return AE_NOT_EXIST;
>
> if (block->handler)
> @@ -517,12 +518,13 @@
> {
> struct wmi_block *block;
> acpi_status status;
> + bool err;
>
> if (!guid)
> return AE_BAD_PARAMETER;
>
> - find_guid(guid, &block);
> - if (!block)
> + err = find_guid(guid, &block);
> + if (!err)
> return AE_NOT_EXIST;
>
> if (!block->handler)
>
> --- dell-wmi.c.orig 2009-12-28 17:49:09.000000000 +0100
> +++ dell-wmi.c 2009-12-28 17:41:36.000000000 +0100
> @@ -343,7 +343,7 @@
> printk(KERN_ERR
> "dell-wmi: Unable to register notify handler - %d\n",
> err);
> - return err;
> + return -err;

This still leaks AE_* error codes to the upper layers which do not care
for them and expect the errors from Exxxx namespace.

I wonder why wmi is not using standard error codes but instead decided
to come up with it's own.

--
Dmitry

2009-12-30 06:15:14

by Len Brown

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

Thanks for the quick test/debug Paul,

> Fixed this completely with the following two patches on top of 2.6.33-rc2 :
>
> --- wmi.c.orig 2009-12-28 17:27:15.000000000 +0100
> +++ wmi.c 2009-12-28 17:39:01.000000000 +0100
> @@ -488,12 +488,13 @@
> {
> struct wmi_block *block;
> acpi_status status;
> + bool err;
>
> if (!guid || !handler)
> return AE_BAD_PARAMETER;
>
> - find_guid(guid, &block);
> - if (!block)
> + err = find_guid(guid, &block);
> + if (!err)

Unfortunately, find_guid() returns 1 for success
and 0 for failure, making this code look backwards.

I've applied this, but changed it to if (!find_guid())
to be consistent with the other callers.

> return AE_NOT_EXIST;
>
> if (block->handler)
> @@ -517,12 +518,13 @@
> {
> struct wmi_block *block;
> acpi_status status;
> + bool err;
>
> if (!guid)
> return AE_BAD_PARAMETER;
>
> - find_guid(guid, &block);
> - if (!block)
> + err = find_guid(guid, &block);
> + if (!err)
> return AE_NOT_EXIST;
>
> if (!block->handler)
>
> --- dell-wmi.c.orig 2009-12-28 17:49:09.000000000 +0100
> +++ dell-wmi.c 2009-12-28 17:41:36.000000000 +0100
> @@ -343,7 +343,7 @@
> printk(KERN_ERR
> "dell-wmi: Unable to register notify handler - %d\n",
> err);
> - return err;
> + return -err;

This change is not quite correct,
but the correct fix is already in the acpi-test tree in the patch
"dell-wmi: sys_init_module: 'dell_wmi'->init suspiciously returned 21,
it should follow 0/-E convention"

I'll reply with both of the patches in a sec.
Please let me know if they are sufficient to fix your system.

thanks,
-Len Brown, Intel Open Source Technology Center

> }
>
> return 0;
>
>
> Signed-off-by: [email protected] <Paul Rolland>
>
> Paul
>

2009-12-30 06:19:39

by Len Brown

[permalink] [raw]
Subject: [PATCH] wmi: check find_guid() return value to prevent oops

From: [email protected] <Paul Rolland>

Signed-off-by: [email protected] <Paul Rolland>
Signed-off-by: Len Brown <[email protected]>
---
drivers/platform/x86/wmi.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
index 9f93d6c..cc9ad74 100644
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -492,8 +492,7 @@ wmi_notify_handler handler, void *data)
if (!guid || !handler)
return AE_BAD_PARAMETER;

- find_guid(guid, &block);
- if (!block)
+ if (!find_guid(guid, &block))
return AE_NOT_EXIST;

if (block->handler)
@@ -521,8 +520,7 @@ acpi_status wmi_remove_notify_handler(const char *guid)
if (!guid)
return AE_BAD_PARAMETER;

- find_guid(guid, &block);
- if (!block)
+ if (!find_guid(guid, &block))
return AE_NOT_EXIST;

if (!block->handler)
--
1.6.6.rc4.11.g129a5

2009-12-30 06:21:54

by Len Brown

[permalink] [raw]
Subject: [PATCH] dell-wmi: sys_init_module: 'dell_wmi'->init suspiciously returned 21, it should follow 0/-E convention

From: Len Brown <[email protected]>

wmi_install_notify_handler() returns an acpi_error,
but dell_wmi_init() needs return a -errno style error.

Signed-off-by: Len Brown <[email protected]>
---
drivers/platform/x86/dell-wmi.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 916ccb2..4c7e702 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -323,6 +323,7 @@ static int __init dell_wmi_input_setup(void)
static int __init dell_wmi_init(void)
{
int err;
+ acpi_status status;

if (wmi_has_guid(DELL_EVENT_GUID)) {
printk(KERN_WARNING "dell-wmi: No known WMI GUID found\n");
@@ -336,14 +337,14 @@ static int __init dell_wmi_init(void)
if (err)
return err;

- err = wmi_install_notify_handler(DELL_EVENT_GUID,
+ status = wmi_install_notify_handler(DELL_EVENT_GUID,
dell_wmi_notify, NULL);
- if (err) {
+ if (ACPI_FAILURE(status)) {
input_unregister_device(dell_wmi_input_dev);
printk(KERN_ERR
"dell-wmi: Unable to register notify handler - %d\n",
- err);
- return err;
+ status);
+ return -ENODEV;
}

return 0;
--
1.6.6.rc4.11.g129a5

2009-12-30 07:15:20

by Paul Rolland

[permalink] [raw]
Subject: Re: -tip: origin tree boot crash

Hi Len,

On Wed, 30 Dec 2009 01:14:48 -0500 (EST)
Len Brown <[email protected]> wrote:

> Thanks for the quick test/debug Paul,
>
> > Fixed this completely with the following two patches on top of
> > 2.6.33-rc2 :
> >
> > --- wmi.c.orig 2009-12-28 17:27:15.000000000 +0100
> > +++ wmi.c 2009-12-28 17:39:01.000000000 +0100
> > @@ -488,12 +488,13 @@
> > {
> > struct wmi_block *block;
> > acpi_status status;
> > + bool err;
> >
> > if (!guid || !handler)
> > return AE_BAD_PARAMETER;
> >
> > - find_guid(guid, &block);
> > - if (!block)
> > + err = find_guid(guid, &block);
> > + if (!err)
>
> Unfortunately, find_guid() returns 1 for success
> and 0 for failure, making this code look backwards.
>
> I've applied this, but changed it to if (!find_guid())
> to be consistent with the other callers.

That's ok, thanks for reviewing !

> > --- dell-wmi.c.orig 2009-12-28 17:49:09.000000000 +0100
> > +++ dell-wmi.c 2009-12-28 17:41:36.000000000 +0100
> > @@ -343,7 +343,7 @@
> > printk(KERN_ERR
> > "dell-wmi: Unable to register notify handler -
> > %d\n", err);
> > - return err;
> > + return -err;
>
> This change is not quite correct,
> but the correct fix is already in the acpi-test tree in the patch
> "dell-wmi: sys_init_module: 'dell_wmi'->init suspiciously returned 21,
> it should follow 0/-E convention"

Yes, Dmitry sent a mail indicating that yesterday. I must admit I've been
too fast with that part of my fix ,(

> I'll reply with both of the patches in a sec.
> Please let me know if they are sufficient to fix your system.

They are. You can add a :
Tested-by: Paul Rolland <[email protected]>
for what it matters ;)

Paul

2009-12-30 21:11:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)



On Mon, 28 Dec 2009, KOSAKI Motohiro wrote:
> > [<ffffffff810694c0>] __lock_acquire+0x1373/0x16fd
> > [<ffffffff8106993c>] lock_acquire+0xf2/0x116
> > [<ffffffff810bb2b5>] might_fault+0x95/0xb8 <- mmap_sem
> > [<ffffffff810e87d6>] filldir+0x75/0xd0 <- sysfs_mutex
> > [<ffffffff8112be2a>] sysfs_readdir+0x10f/0x149
>
> This output seems to suggest need to fix drm.

Actually, this painful dependency may technically be a bug in drm, but at
the same time, it's really filldir() that makes it _very_ hard for certain
locking scenarios because it has that callback to the VFS layer that takes
the mmap_sem, and sysfs itself wants the sysfs_mutex in paths where it is
not at all unreasonable to hold the mmap_sem.

We've seen it several times (yes, mostly with drm, but it's been seen with
others too), and it's very annoying. It can be fixed by having very
careful readdir implementations, but I really would blame sysfs in
particular for having a very annoying lock reversal issue when used
reasonably.

So the optimal situation would be for sysfs to not have that annoying lock
dependency, and it would really have to be sysfs_readdir() that drops the
sysfs_mutex around the filldir call (and that obviously implies having to
re-validate and be really careful).

Added Eric and Greg to the cc, in case the sysfs people want to solve it.

The other alternative would be to fix this on a VFS layer by changing how
'readdir/filldir' interacts, and instead make it fill in a kernel buffer -
avoiding the mmap_sem issue entirely. And than later (in readdir) that
kernel buffer could be copied to user space without holding any other
locks.

I like the VFS approach because I think we could possibly use that as a
first approach to eventually try to think about caching readdir() results
at a VFS level - readdir() is currently the _only_ main filesystem
callback that always calls into the low-level filesystem, and always takes
a lot of locks. I'm adding Al to the Cc for that - he knows about this
issue from me previously thinking aloud along these lines.

And yes, one option would be to just fix drm - by avoiding calling any
sysfs functions while holding the mmap_lock (either in the mmap callback
or the page fault paths). However, as mentioned, I really do think that
the blame can be laid on sysfs for trying to be a nice generic interface,
but having a damn inconvenient locking model.

Linus

2009-12-30 21:34:50

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Linus Torvalds <[email protected]> writes:

> We've seen it several times (yes, mostly with drm, but it's been seen with
> others too), and it's very annoying. It can be fixed by having very
> careful readdir implementations, but I really would blame sysfs in
> particular for having a very annoying lock reversal issue when used
> reasonably.

Maybe. The mnmap_sem has some interesting issues all of it's own.
What reasonable thing is the drm doing that is causing problems?

> So the optimal situation would be for sysfs to not have that annoying lock
> dependency, and it would really have to be sysfs_readdir() that drops the
> sysfs_mutex around the filldir call (and that obviously implies having to
> re-validate and be really careful).
>
> Added Eric and Greg to the cc, in case the sysfs people want to solve it.

There are scalability reasons for dropping the sysfs_mutex in sysfs_readdir
and I have some tenative patches for that. I will take a look after I
come back from the holidays, in a couple of days. I don't understand
the issue as described.

> And yes, one option would be to just fix drm - by avoiding calling any
> sysfs functions while holding the mmap_lock (either in the mmap callback
> or the page fault paths). However, as mentioned, I really do think that
> the blame can be laid on sysfs for trying to be a nice generic interface,
> but having a damn inconvenient locking model.

Could be. I have simplified the sysfs locking quite a bit this last
round. I don't know if there is much more than corner cases left to
improve.

Eric

2009-12-30 22:03:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)



On Wed, 30 Dec 2009, Eric W. Biederman wrote:

> Linus Torvalds <[email protected]> writes:
>
> > We've seen it several times (yes, mostly with drm, but it's been seen with
> > others too), and it's very annoying. It can be fixed by having very
> > careful readdir implementations, but I really would blame sysfs in
> > particular for having a very annoying lock reversal issue when used
> > reasonably.
>
> Maybe. The mnmap_sem has some interesting issues all of it's own.
> What reasonable thing is the drm doing that is causing problems?

The details are in the original thread on lkml, but it boils down to
basically (the below may not be the exact sequence, but it's close)

- drm_mmap (called with mmap_sem) takes 'dev->struct_mutex' to protect
it's own device data (very reasonable)

- drm_release takes 'dev->struct_mutex' again to protect its own data,
and calls "mtrr_del_page()" which ends up taking cpu_hotplug.lock.

Again, that doesn't sound "wrong" in any way.

- hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) -> ..
kref_put .. -> sysfs_addrm_start (sysfs_mutex)

Again, nothing suspicious or "bad", and this part of the dependency
chain has nothing to do with the DRM code itself.

- sysfs_readdir() (and this is the big problem) holds sysfs_mutex in its
readdir implementation over the call to filldir. And filldir copies the
data to user space, so now you have sysfs_mutex -> mmap_sem.

See? None of the chains look bad. Except sysfs_readdir() obviously has
that sysfs_mutex -> mmap_sem thing, which is _very_ annoying, because now
you end up with a chain like

mmap_sem -> dev->struct_mutex -> cpu_hotplug.lock -> sysfs_mutex -> mmap_sem

and I think you'll agree that of all the lock chains, the place to break
the association is at sysfs_mutex. And the obvious place to break it would
be that last "sysfs_mutex -> mmap_sem" stage.

> > Added Eric and Greg to the cc, in case the sysfs people want to solve it.
>
> There are scalability reasons for dropping the sysfs_mutex in sysfs_readdir
> and I have some tenative patches for that. I will take a look after I
> come back from the holidays, in a couple of days. I don't understand
> the issue as described.

Ok, hopefully the above chain explains it to you, and also makes it clear
that it's rather hard to break anywhere else, and it's not somebody else
doing anything "obviously bogus".

Btw, the scalability issues with readdir() in general is why I'd be open
to try to change the rules for filldir(), and always make it a kernel
space copy. I suspect a number of users would like being able to use
spinlocks over filldir, but it's currently impossible.

However, we have a lot of filldir implementations (knfsd and several
different system call interfaces), and while some of them already use
kernel buffers (eg knfsd) others would need to allocate temporary storage
and then do a double copy. And I suspect even things like knfsd do things
like sleep and take locks, so it's possible that actually getting to the
point where filldir could be spinlock-safe would be infeasible.

Linus

2009-12-31 08:40:31

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Linus Torvalds <[email protected]> writes:

> On Wed, 30 Dec 2009, Eric W. Biederman wrote:
>
>> Linus Torvalds <[email protected]> writes:
>>
>> > We've seen it several times (yes, mostly with drm, but it's been seen with
>> > others too), and it's very annoying. It can be fixed by having very
>> > careful readdir implementations, but I really would blame sysfs in
>> > particular for having a very annoying lock reversal issue when used
>> > reasonably.
>>
>> Maybe. The mnmap_sem has some interesting issues all of it's own.
>> What reasonable thing is the drm doing that is causing problems?
>
> The details are in the original thread on lkml, but it boils down to
> basically (the below may not be the exact sequence, but it's close)

Thanks.

> - drm_mmap (called with mmap_sem) takes 'dev->struct_mutex' to protect
> it's own device data (very reasonable)
>
> - drm_release takes 'dev->struct_mutex' again to protect its own data,
> and calls "mtrr_del_page()" which ends up taking cpu_hotplug.lock.
>
> Again, that doesn't sound "wrong" in any way.
>
> - hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) -> ..
> kref_put .. -> sysfs_addrm_start (sysfs_mutex)
>
> Again, nothing suspicious or "bad", and this part of the dependency
> chain has nothing to do with the DRM code itself.

kobject_del with a lock held scares me.

There is a possible deadlock (that lockdep is ignorant of) if you hold
a lock over sysfs_deactivate() and if any sysfs file takes that lock.

I won't argue with a claim of inconvenient locking semantics here, and
this is different to the problem you are seeing (except that fixing this
problem would happen to fix the filldir issue).

> - sysfs_readdir() (and this is the big problem) holds sysfs_mutex in its
> readdir implementation over the call to filldir. And filldir copies the
> data to user space, so now you have sysfs_mutex -> mmap_sem.
>
> See? None of the chains look bad. Except sysfs_readdir() obviously has
> that sysfs_mutex -> mmap_sem thing, which is _very_ annoying, because now
> you end up with a chain like
>
> mmap_sem -> dev->struct_mutex -> cpu_hotplug.lock -> sysfs_mutex -> mmap_sem
>
> and I think you'll agree that of all the lock chains, the place to break
> the association is at sysfs_mutex. And the obvious place to break it would
> be that last "sysfs_mutex -> mmap_sem" stage.

I agree that fixing sysfs_readdir to not hold the sysfs_mutex over filldir
is useful to reduce the lock hold time if nothing else.

The cheap fix here is mostly a matter of grabbing a reference to the
sysfs_dirent and then revalidating that the reference is still useful
after we reacquire the sysfs_mutex. If not we already have the code for
restarting from just an offset. We just don't want to use it too much as
that will give us O(n^2) times for sysfs readdir.

I will see if I can dig up or regenerate my patch in the next couple of days.

>> > Added Eric and Greg to the cc, in case the sysfs people want to solve it.
>>
>> There are scalability reasons for dropping the sysfs_mutex in sysfs_readdir
>> and I have some tenative patches for that. I will take a look after I
>> come back from the holidays, in a couple of days. I don't understand
>> the issue as described.
>
> Ok, hopefully the above chain explains it to you, and also makes it clear
> that it's rather hard to break anywhere else, and it's not somebody else
> doing anything "obviously bogus".

We very definitely have an ABBA deadlock with sysfs_deactivate and the
cpu_hotplug.lock. arch/x86/kernel/microcode_core.c:reload_store() is the
code for a sysfs file that when written to calls get_online_cpus().

Regardless of what we do with sysfs_readdir we need to see if we can
fix cpu_down(), to remove this nasty deadlock.

Eric

2009-12-31 08:40:41

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Linus Torvalds <[email protected]> writes:

> On Wed, 30 Dec 2009, Eric W. Biederman wrote:
>
>> Linus Torvalds <[email protected]> writes:
>>
>> > We've seen it several times (yes, mostly with drm, but it's been seen with
>> > others too), and it's very annoying. It can be fixed by having very
>> > careful readdir implementations, but I really would blame sysfs in
>> > particular for having a very annoying lock reversal issue when used
>> > reasonably.
>>
>> Maybe. The mnmap_sem has some interesting issues all of it's own.
>> What reasonable thing is the drm doing that is causing problems?
>
> The details are in the original thread on lkml, but it boils down to
> basically (the below may not be the exact sequence, but it's close)

Thanks.

> - drm_mmap (called with mmap_sem) takes 'dev->struct_mutex' to protect
> it's own device data (very reasonable)
>
> - drm_release takes 'dev->struct_mutex' again to protect its own data,
> and calls "mtrr_del_page()" which ends up taking cpu_hotplug.lock.
>
> Again, that doesn't sound "wrong" in any way.
>
> - hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) -> ..
> kref_put .. -> sysfs_addrm_start (sysfs_mutex)
>
> Again, nothing suspicious or "bad", and this part of the dependency
> chain has nothing to do with the DRM code itself.

kobject_del with a lock held scares me.

There is a possible deadlock (that lockdep is ignorant of) if you hold
a lock over sysfs_deactivate() and if any sysfs file takes that lock.

I won't argue with a claim of inconvenient locking semantics here, and
this is different to the problem you are seeing (except that fixing this
problem would happen to fix the filldir issue).

> - sysfs_readdir() (and this is the big problem) holds sysfs_mutex in its
> readdir implementation over the call to filldir. And filldir copies the
> data to user space, so now you have sysfs_mutex -> mmap_sem.
>
> See? None of the chains look bad. Except sysfs_readdir() obviously has
> that sysfs_mutex -> mmap_sem thing, which is _very_ annoying, because now
> you end up with a chain like
>
> mmap_sem -> dev->struct_mutex -> cpu_hotplug.lock -> sysfs_mutex -> mmap_sem
>
> and I think you'll agree that of all the lock chains, the place to break
> the association is at sysfs_mutex. And the obvious place to break it would
> be that last "sysfs_mutex -> mmap_sem" stage.

I agree that fixing sysfs_readdir to not hold the sysfs_mutex over filldir
is useful to reduce the lock hold time if nothing else.

The cheap fix here is mostly a matter of grabbing a reference to the
sysfs_dirent and then revalidating that the reference is still useful
after we reacquire the sysfs_mutex. If not we already have the code for
restarting from just an offset. We just don't want to use it too much as
that will give us O(n^2) times for sysfs readdir.

I will see if I can dig up or regenerate my patch in the next couple of days.

>> > Added Eric and Greg to the cc, in case the sysfs people want to solve it.
>>
>> There are scalability reasons for dropping the sysfs_mutex in sysfs_readdir
>> and I have some tenative patches for that. I will take a look after I
>> come back from the holidays, in a couple of days. I don't understand
>> the issue as described.
>
> Ok, hopefully the above chain explains it to you, and also makes it clear
> that it's rather hard to break anywhere else, and it's not somebody else
> doing anything "obviously bogus".

We very definitely have an ABBA deadlock with sysfs_deactivate and the
cpu_hotplug.lock. arch/x86/kernel/microcode_core.c:reload_store() is the
code for a sysfs file that when written to calls get_online_cpus().

Regardless of what we do with sysfs_readdir we need to see if we can
fix cpu_down(), to remove this nasty deadlock.

Eric

2009-12-31 19:05:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)



On Thu, 31 Dec 2009, Eric W. Biederman wrote:
> >
> > - hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) -> ..
> > kref_put .. -> sysfs_addrm_start (sysfs_mutex)
> >
> > Again, nothing suspicious or "bad", and this part of the dependency
> > chain has nothing to do with the DRM code itself.
>
> kobject_del with a lock held scares me.

I would not object at _all_ if sysfs fixed the locking here instead of in
filldir.

The problem is that releasing objects (with kref_put() and friends) is
something that is _commonly_ done with various locks held.

Btw, that "cpu_down()" is by no means the only case. I would suggest you
just google for

sysfs_mutex lockdep

and you'll find a _lot_ of cases, most of them not involving drm at all,
but ext4 and btrfs.

(Side note: almost all of them tend to _also_ have mmap_sem in the chain:
that's usually the thing that "closes the deal").

> There is a possible deadlock (that lockdep is ignorant of) if you hold
> a lock over sysfs_deactivate() and if any sysfs file takes that lock.
>
> I won't argue with a claim of inconvenient locking semantics here, and
> this is different to the problem you are seeing (except that fixing this
> problem would happen to fix the filldir issue).

I suspect that filldir is almost always implicated because mmap_sem is so
hard to do just one way: both page faulting and mmap have it held, and so
a lot of locks need to be gotten _after_ it, while filldir very often has
the exact reverse requirement.

So that's why filldir is kind of special (and the fundamental _reason_ it
is special is exactly because pretty much all other VFS operations work
with generic caches, and the actual filesystem only fills in the caches,
it doesn't copy to user space directly while holding any locks - although
ioctl's sometimes have the same issue as filldir for all the same
reasons).

Anyway, I'm in _no_ way saying that you need to break it at filldir: the
reason I pick on filldir is because I hate it, and think that it's a
really annoying special case at the VFS level. But from a sysfs
standpoint, I could well see that there are worse problems than that kind
of annoying VFS problem.

So if you can break it at that kref_put layer (which leads to releasing a
sysfs object etc), then that would be great. In fact, it would be better,
since kref_put and friends are in many ways "more fundamental" than some
filldir special case that we _could_ fix in other ways.

> The cheap fix here is mostly a matter of grabbing a reference to the
> sysfs_dirent and then revalidating that the reference is still useful
> after we reacquire the sysfs_mutex. If not we already have the code for
> restarting from just an offset. We just don't want to use it too much as
> that will give us O(n^2) times for sysfs readdir.

Well, the _trivial_ fix is to just move the mutex_lock/unlock _inside_ the
loop instead of of outside. Something like the appended might just work,
and is the really stupid approach.

Totally untested. And it will do a _lot_ more sysfs mutex accesses, since
now it will lock/unlock around each entry we return.

A smarter thing to do would probably be to rewrite the 's_sibling' search
to instead insert a fake entry in the list, so that we don't have to
traverse the s_sibling list every time for each entry (which is O(n**2) in
size of the directory, and just generally horribly evil and crap code).

Linus

---
fs/sysfs/dir.c | 34 +++++++++++++++++++---------------
1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..2d0fd42 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -847,29 +847,33 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
}
- if ((filp->f_pos > 1) && (filp->f_pos < INT_MAX)) {
- mutex_lock(&sysfs_mutex);
+ while ((filp->f_pos > 1) && (filp->f_pos < INT_MAX)) {
+ const char * name;
+ int len, err;

+ mutex_lock(&sysfs_mutex);
/* Skip the dentries we have already reported */
pos = parent_sd->s_dir.children;
while (pos && (filp->f_pos > pos->s_ino))
pos = pos->s_sibling;

- for ( ; pos; pos = pos->s_sibling) {
- const char * name;
- int len;
+ /* This is ok even with 'pos == NULL' */
+ sysfs_get_active(pos);
+ mutex_unlock(&sysfs_mutex);
+ if (!pos) {
+ filp->f_pos = INT_MAX;
+ break;
+ }

- name = pos->s_name;
- len = strlen(name);
- filp->f_pos = ino = pos->s_ino;
+ name = pos->s_name;
+ len = strlen(name);
+ filp->f_pos = ino = pos->s_ino;

- if (filldir(dirent, name, len, filp->f_pos, ino,
- dt_type(pos)) < 0)
- break;
- }
- if (!pos)
- filp->f_pos = INT_MAX;
- mutex_unlock(&sysfs_mutex);
+ err = filldir(dirent, name, len, filp->f_pos, ino, dt_type(pos));
+ sysfs_put_active(pos);
+
+ if (err < 0)
+ break;
}
return 0;
}

2010-01-01 13:58:14

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability


When sysfs_readdir stops short we now cache the next sysfs_dirent to
return to user space in filp->private_data. There is no impact on the
rest of sysfs by doing this and in the common case it allows us to
pick up exactly where we left off with no seeking.

Additionally I drop and regrab the sysfs_mutex around filldir to avoid
a page fault arbitrarily increasing the hold time on the sysfs_mutex.

Signed-off-by: Eric W. Biederman <[email protected]>
---

I haven't stressed this patch but it is sound in principle, and is a
general sysfs improvement, regardless of any locking issues.

fs/sysfs/dir.c | 85 ++++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..ad8a57e 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -827,19 +827,51 @@ static inline unsigned char dt_type(struct sysfs_dirent *sd)
return (sd->s_mode >> 12) & 15;
}

+static int sysfs_dir_release(struct inode *inode, struct file *filp)
+{
+ sysfs_put(filp->private_data);
+ return 0;
+}
+
+static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ if (pos) {
+ int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
+ pos->s_parent == parent_sd &&
+ ino == pos->s_ino;
+ sysfs_put(pos);
+ if (valid)
+ return pos;
+ }
+ pos = parent_sd->s_dir.children;
+ while (pos && (ino > pos->s_ino))
+ pos = pos->s_sibling;
+ return pos;
+}
+
+static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ pos = sysfs_dir_pos(parent_sd, ino, pos);
+ if (pos)
+ pos = pos->s_sibling;
+ return pos;
+}
+
static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct sysfs_dirent * parent_sd = dentry->d_fsdata;
- struct sysfs_dirent *pos;
+ struct sysfs_dirent *pos = filp->private_data;
ino_t ino;

- if (filp->f_pos == 0) {
+ if (!pos && filp->f_pos == 0) {
ino = parent_sd->s_ino;
if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
}
- if (filp->f_pos == 1) {
+ if (!pos && filp->f_pos == 1) {
if (parent_sd->s_parent)
ino = parent_sd->s_parent->s_ino;
else
@@ -847,29 +879,35 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
}
- if ((filp->f_pos > 1) && (filp->f_pos < INT_MAX)) {
- mutex_lock(&sysfs_mutex);
-
- /* Skip the dentries we have already reported */
- pos = parent_sd->s_dir.children;
- while (pos && (filp->f_pos > pos->s_ino))
- pos = pos->s_sibling;
+ /* EOF? */
+ if (!pos && filp->f_pos > 2)
+ return 0;

- for ( ; pos; pos = pos->s_sibling) {
- const char * name;
- int len;
-
- name = pos->s_name;
- len = strlen(name);
- filp->f_pos = ino = pos->s_ino;
+ mutex_lock(&sysfs_mutex);
+ for (pos = sysfs_dir_pos(parent_sd, filp->f_pos, pos);
+ pos;
+ pos = sysfs_dir_next_pos(parent_sd, filp->f_pos, pos)) {
+ const char * name;
+ unsigned int type;
+ int len, ret;
+
+ name = pos->s_name;
+ len = strlen(name);
+ ino = pos->s_ino;
+ type = dt_type(pos);
+ filp->f_pos = ino;
+ filp->private_data = sysfs_get(pos);

- if (filldir(dirent, name, len, filp->f_pos, ino,
- dt_type(pos)) < 0)
- break;
- }
- if (!pos)
- filp->f_pos = INT_MAX;
mutex_unlock(&sysfs_mutex);
+ ret = filldir(dirent, name, len, filp->f_pos, ino, type);
+ mutex_lock(&sysfs_mutex);
+ if (ret < 0)
+ break;
+ }
+ mutex_unlock(&sysfs_mutex);
+ if (!pos) { /* EOF */
+ filp->f_pos = 3;
+ filp->private_data = NULL;
}
return 0;
}
@@ -878,5 +916,6 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
const struct file_operations sysfs_dir_operations = {
.read = generic_read_dir,
.readdir = sysfs_readdir,
+ .release = sysfs_dir_release,
.llseek = generic_file_llseek,
};
--
1.6.5.2.143.g8cc62

2010-01-01 15:16:54

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected (was: Re: Linux 2.6.33-rc2 - Merry Christmas ...)

Linus Torvalds <[email protected]> writes:

> On Thu, 31 Dec 2009, Eric W. Biederman wrote:
>> >
>> > - hibernate ends up with the sequence: _cpu_down (cpu_hotplug.lock) -> ..
>> > kref_put .. -> sysfs_addrm_start (sysfs_mutex)
>> >
>> > Again, nothing suspicious or "bad", and this part of the dependency
>> > chain has nothing to do with the DRM code itself.
>>
>> kobject_del with a lock held scares me.
>
> I would not object at _all_ if sysfs fixed the locking here instead of in
> filldir.

I just sent you my sysfs filldir scalability patch, so we can take that
red-herring off the plate.

The problem as I see it is that kobject_del is convenient.
kobject_del waits until all of the sysfs show and store methods for
that kobject have stopped executing. Which imposes the rule that
kobject_del can not be called with any locks held that are taken in a
sysfs show or store method. This is all invisible to lockdep as the
wait is done with a completion and not a lock.

Which unfortunately means fixing filldir only removes some noise from
the picture, and completely hides the problem from lockdep.


....

Looking at the case I am familiar with in the networking layer I think
I have stumbled on a way to sort out this locking problem.

Today the network layer effectively does:
rtnl_lock();
device_del(dev);
rtnl_unlock();
kobject_put(dev);


sysfs_deactivate happens in the device_del(), but if we were to move
sysfs_deactivate into the final kobject_put then in theory we can
continue to block and be friendly but not need to be called with
locations where locks are held.

The core idea is to allow unlisting devices from sysfs under a lock
while still waiting for all users to complete after it is safe to
drop the lock.

Does that work for the cpu hotplug case? Doing everything from
notifiers makes me suspect it will fail.


....


Either way we will need some lockdep warnings for sysfs_deactivate
so that the problem does not continue to hide and silently foul
things up. So I will see if I can cook something.

Eric

2010-01-01 15:34:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability

On Fri, Jan 01, 2010 at 05:58:00AM -0800, Eric W. Biederman wrote:
>
> When sysfs_readdir stops short we now cache the next sysfs_dirent to
> return to user space in filp->private_data. There is no impact on the
> rest of sysfs by doing this and in the common case it allows us to
> pick up exactly where we left off with no seeking.
>
> Additionally I drop and regrab the sysfs_mutex around filldir to avoid
> a page fault arbitrarily increasing the hold time on the sysfs_mutex.
>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
>
> I haven't stressed this patch but it is sound in principle, and is a
> general sysfs improvement, regardless of any locking issues.

I've slapped it ontop of v2.6.33-rc2-249-gcd6e125 here and the circular
locking warning is gone. I'll keep an eye on it in the next couple of
days, just in case.

Tested-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

2010-01-02 02:57:28

by Tejun Heo

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Hello, Eric.

On 01/02/2010 12:16 AM, Eric W. Biederman wrote:
>>> kobject_del with a lock held scares me.
>>
>> I would not object at _all_ if sysfs fixed the locking here instead of in
>> filldir.
>
> I just sent you my sysfs filldir scalability patch, so we can take that
> red-herring off the plate.
>
> The problem as I see it is that kobject_del is convenient.
> kobject_del waits until all of the sysfs show and store methods for
> that kobject have stopped executing. Which imposes the rule that
> kobject_del can not be called with any locks held that are taken in a
> sysfs show or store method. This is all invisible to lockdep as the
> wait is done with a completion and not a lock.

The synchronization against read/write ops is in sysfs_deactivate on
purpose so that drivers (most common users) don't have to worry about
sysfs ops accessing different parts of data structures once
device_del() is complete. Implementing the exlusion at the driver
level is possible but not easy because some hardware devices are
represented with complex data structures, some of them are reused when
devices are exchanged and some sysfs ops end up accessing the
hardware. So, it's often not possible to simply disassociate the data
structure and float it till the last reference goes away. There needs
to be a synchronization point where the driver can tell that nothing
is accessing released data structure or hardware resource after it and
it's far easier to define it at the sysfs level.

> sysfs_deactivate happens in the device_del(), but if we were to move
> sysfs_deactivate into the final kobject_put then in theory we can
> continue to block and be friendly but not need to be called with
> locations where locks are held.

Nobody would know when that final put will actually happen. In
progress sysfs ops might access the hardware after the hardware is
gone or replaced with another unit.

> Either way we will need some lockdep warnings for sysfs_deactivate
> so that the problem does not continue to hide and silently foul
> things up. So I will see if I can cook something.

I don't think this is really relevant to the problem at hand but
adding lockdep annotations would definitely be beneficial, which BTW
is another reason to leave the synchronization in sysfs_deactivate as
the trade off is between deadlocks which can be detected somewhat
reliably with lockdep and scary race conditions which may involve
hardware in mysterious ways.

Thanks.

--
tejun

2010-01-02 06:03:07

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability v2

On Fri, Jan 01, 2010 at 03:10:49PM -0800, Linus Torvalds wrote:
>
> So I guess that's an "Ack", although I'd prefer it to get some more
> testing and perhaps go through Greg's tree as sysfs patches usually go.
>
> And by "testing" I mean both the "yes, this second version also breaks the
> lockdep chain and avoids the warning", but also some kind of actual
> testing of /sysfs itself. If there is any.

I'll queue this up in my tree on Monday to get it some testing.

thanks,

greg k-h

2010-01-02 23:57:47

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Add lockdep annotations for the sysfs active reference

On 01/03/2010 06:37 AM, Eric W. Biederman wrote:
>
> Holding locks over device_del -> kobject_del -> sysfs_deactivate can
> cause deadlocks if those same locks are grabbed in sysfs show or store
> methods.
>
> The I model s_active count + completion as a sleeping read/write lock.
> I describe to lockdep sysfs_get_active as a read_trylock,
> sysfs_put_active as a read_unlock, and sysfs_deactivate as a
> write_lock and write_unlock pair. This seems to capture the essence
> for purposes of finding deadlocks, and in my testing gives finds real
> issues and ignores non-issues.
>
> This brings us back to holding locks over kobject_del is a problem
> that ideally we should find a way of addressing, but at least lockdep
> can tell us about the problems instead of requiring developers to debug
> rare strange system deadlocks, that happen when sysfs files are removed
> while being written to.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Looks good to me.

Acked-by: Tejun Heo <[email protected]>

Thanks for doing this.

--
tejun

2010-01-03 00:27:11

by Tejun Heo

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Hello,

On 01/03/2010 06:49 AM, Eric W. Biederman wrote:
>>> sysfs_deactivate happens in the device_del(), but if we were to move
>>> sysfs_deactivate into the final kobject_put then in theory we can
>>> continue to block and be friendly but not need to be called with
>>> locations where locks are held.
>>
>> Nobody would know when that final put will actually happen. In
>> progress sysfs ops might access the hardware after the hardware is
>> gone or replaced with another unit.
>
> Alright than that is a bad possible split of the functionality. Which
> is all I was suggesting splitting the functionality not doing away
> with the wait or moving it to a point where the wait would not work.
> It was simply my bad assumption that the final kobject_put would
> happen before the module that controlled that kobject could be
> removed.

The module should stay around. The severing is necessary to protect
driver internal data structures and possibly removed or reattached (to
a different driver) hardware.

> I still think it might make sense to separate kobject_del into two
> parts. One that we call with the locks held and one without, but that
> does seem to be applicable to only a very small set of cases and our
> problems appear to be much larger than that.

If such separation is necessary, we can implement the split interface
while leaving kobject_del() as is feature-wise and convert the
offending ones to use the split interface but I think it would be
better to simply fix the offending ones if there aren't too many and
they're easily fixable. Let's see how many lockdep warnings turn up.

> For the moment I have generated a patch that does the lockdep
> annotations, and I have found that a simple:
>
> find /sys -type f | xargs cat {} > /dev/null
>
> trivially generates lockdep warnings. In particular:

(cc'ing Dmitry, Hi!)

> [ 165.049042]
> [ 165.049044] =======================================================
> [ 165.052761] [ INFO: possible circular locking dependency detected ]
> [ 165.052761] 2.6.33-rc2x86_64 #3
> [ 165.052761] -------------------------------------------------------
> [ 165.052761] cat/5026 is trying to acquire lock:
> [ 165.052761] (&serio->drv_mutex){+.+.+.}, at: [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
> [ 165.052761]
> [ 165.052761] but task is already holding lock:
> [ 165.089443] (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
> [ 165.089443]
> [ 165.089443] which lock already depends on the new lock.
> [ 165.089443]
> [ 165.089443]
> [ 165.089443] the existing dependency chain (in reverse order) is:
> [ 165.089443]
> [ 165.089443] -> #1 (s_active){++++.+}:
> [ 165.089443] [<ffffffff81054956>] validate_chain+0xa25/0xd1d
> [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
> [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
> [ 165.089443] [<ffffffff810e8202>] sysfs_addrm_finish+0xba/0x125
> [ 165.089443] [<ffffffff810e68b0>] sysfs_hash_and_remove+0x4f/0x6b
> [ 165.089443] [<ffffffff810e94cf>] remove_files+0x1f/0x2c
> [ 165.089443] [<ffffffff810e9561>] sysfs_remove_group+0x85/0xb4
> [ 165.089443] [<ffffffff81331f0f>] psmouse_disconnect+0x33/0x147
> [ 165.089443] [<ffffffff8132687b>] serio_disconnect_driver+0x2d/0x3a
> [ 165.089443] [<ffffffff81326898>] serio_driver_remove+0x10/0x14
> [ 165.089443] [<ffffffff812077f0>] __device_release_driver+0x67/0xb0
> [ 165.089443] [<ffffffff81207857>] device_release_driver+0x1e/0x2b
> [ 165.089443] [<ffffffff81326e68>] serio_disconnect_port+0x60/0x69
> [ 165.089443] [<ffffffff8132757a>] serio_thread+0x170/0x34a
> [ 165.089443] [<ffffffff810470e7>] kthread+0x7d/0x85
> [ 165.089443] [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
> [ 165.089443]
> [ 165.089443] -> #0 (&serio->drv_mutex){+.+.+.}:
> [ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
> [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
> [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
> [ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
> [ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
> [ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
> [ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
> [ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
> [ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
> [ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
> [ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
> [ 165.089443]
> [ 165.089443] other info that might help us debug this:
> [ 165.089443]
> [ 165.089443] 3 locks held by cat/5026:
> [ 165.089443] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff810e715a>] sysfs_read_file+0x39/0x145
> [ 165.089443] #1: (s_active){++++.+}, at: [<ffffffff810e84d0>] sysfs_get_active_two+0x1f/0x43
> [ 165.089443] #2: (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
> [ 165.089443]
> [ 165.089443] stack backtrace:
> [ 165.089443] Pid: 5026, comm: cat Not tainted 2.6.33-rc2x86_64 #3
> [ 165.089443] Call Trace:
> [ 165.089443] [<ffffffff810538f3>] print_circular_bug+0xb3/0xc1
> [ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
> [ 165.089443] [<ffffffff81052fb6>] ? trace_hardirqs_on_caller+0x10b/0x12f
> [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
> [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
> [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
> [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
> [ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
> [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
> [ 165.089443] [<ffffffff8132ee41>] ? atkbd_show_extra+0x0/0x28
> [ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
> [ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
> [ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
> [ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
> [ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
> [ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
> [ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
>
> Suggestions on how to sort out this other set of issues are welcome.

Ummm... read of an input sysfs node can trigger
serio_disconnect_port() under serio->drv_mutex, which unfortunately
would need to wait for completion of in-progress sysfs ops thus
creating possibility for AB-BA deadlock. Dmitry, is it possible to
make serio_disconnect_port() asynchronous from the sysfs ops (ie. put
it in a work or something)?

Thanks.

--
tejun

2010-01-02 21:37:36

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] sysfs: Add lockdep annotations for the sysfs active reference


Holding locks over device_del -> kobject_del -> sysfs_deactivate can
cause deadlocks if those same locks are grabbed in sysfs show or store
methods.

The I model s_active count + completion as a sleeping read/write lock.
I describe to lockdep sysfs_get_active as a read_trylock,
sysfs_put_active as a read_unlock, and sysfs_deactivate as a
write_lock and write_unlock pair. This seems to capture the essence
for purposes of finding deadlocks, and in my testing gives finds real
issues and ignores non-issues.

This brings us back to holding locks over kobject_del is a problem
that ideally we should find a way of addressing, but at least lockdep
can tell us about the problems instead of requiring developers to debug
rare strange system deadlocks, that happen when sysfs files are removed
while being written to.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/sysfs/dir.c | 14 ++++++++++++--
fs/sysfs/sysfs.h | 15 +++++++++++++++
2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5d88e30..5c4703d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -106,8 +106,10 @@ static struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
return NULL;

t = atomic_cmpxchg(&sd->s_active, v, v + 1);
- if (likely(t == v))
+ if (likely(t == v)) {
+ rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
return sd;
+ }
if (t < 0)
return NULL;

@@ -130,6 +132,7 @@ static void sysfs_put_active(struct sysfs_dirent *sd)
if (unlikely(!sd))
return;

+ rwsem_release(&sd->dep_map, 1, _RET_IP_);
v = atomic_dec_return(&sd->s_active);
if (likely(v != SD_DEACTIVATED_BIAS))
return;
@@ -194,15 +197,21 @@ static void sysfs_deactivate(struct sysfs_dirent *sd)
BUG_ON(sd->s_sibling || !(sd->s_flags & SYSFS_FLAG_REMOVED));
sd->s_sibling = (void *)&wait;

+ rwsem_acquire(&sd->dep_map, 0, 0, _RET_IP_);
/* atomic_add_return() is a mb(), put_active() will always see
* the updated sd->s_sibling.
*/
v = atomic_add_return(SD_DEACTIVATED_BIAS, &sd->s_active);

- if (v != SD_DEACTIVATED_BIAS)
+ if (v != SD_DEACTIVATED_BIAS) {
+ lock_contended(&sd->dep_map, _RET_IP_);
wait_for_completion(&wait);
+ }

sd->s_sibling = NULL;
+
+ lock_acquired(&sd->dep_map, _RET_IP_);
+ rwsem_release(&sd->dep_map, 1, _RET_IP_);
}

static int sysfs_alloc_ino(ino_t *pino)
@@ -345,6 +354,7 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)

atomic_set(&sd->s_count, 1);
atomic_set(&sd->s_active, 0);
+ sysfs_dirent_init_lockdep(sd);

sd->s_name = name;
sd->s_mode = mode;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index ca52e7b..cdd9377 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -8,6 +8,7 @@
* This file is released under the GPLv2.
*/

+#include <linux/lockdep.h>
#include <linux/fs.h>

struct sysfs_open_dirent;
@@ -50,6 +51,9 @@ struct sysfs_inode_attrs {
struct sysfs_dirent {
atomic_t s_count;
atomic_t s_active;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
struct sysfs_dirent *s_parent;
struct sysfs_dirent *s_sibling;
const char *s_name;
@@ -84,6 +88,17 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
return sd->s_flags & SYSFS_TYPE_MASK;
}

+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define sysfs_dirent_init_lockdep(sd) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ lockdep_init_map(&sd->dep_map, "s_active", &__key, 0); \
+} while(0)
+#else
+#define sysfs_dirent_init_lockdep(sd) do {} while(0)
+#endif
+
/*
* Context structure to be used while adding/removing nodes.
*/
--
1.6.5.2.143.g8cc62

2010-01-02 21:49:28

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Tejun Heo <[email protected]> writes:

> Hello, Eric.
>
> On 01/02/2010 12:16 AM, Eric W. Biederman wrote:
>>>> kobject_del with a lock held scares me.
>>>
>>> I would not object at _all_ if sysfs fixed the locking here instead of in
>>> filldir.
>>
>> I just sent you my sysfs filldir scalability patch, so we can take that
>> red-herring off the plate.
>>
>> The problem as I see it is that kobject_del is convenient.
>> kobject_del waits until all of the sysfs show and store methods for
>> that kobject have stopped executing. Which imposes the rule that
>> kobject_del can not be called with any locks held that are taken in a
>> sysfs show or store method. This is all invisible to lockdep as the
>> wait is done with a completion and not a lock.
>
> The synchronization against read/write ops is in sysfs_deactivate on
> purpose so that drivers (most common users) don't have to worry about
> sysfs ops accessing different parts of data structures once
> device_del() is complete. Implementing the exlusion at the driver
> level is possible but not easy because some hardware devices are
> represented with complex data structures, some of them are reused when
> devices are exchanged and some sysfs ops end up accessing the
> hardware. So, it's often not possible to simply disassociate the data
> structure and float it till the last reference goes away. There needs
> to be a synchronization point where the driver can tell that nothing
> is accessing released data structure or hardware resource after it and
> it's far easier to define it at the sysfs level.
>
>> sysfs_deactivate happens in the device_del(), but if we were to move
>> sysfs_deactivate into the final kobject_put then in theory we can
>> continue to block and be friendly but not need to be called with
>> locations where locks are held.
>
> Nobody would know when that final put will actually happen. In
> progress sysfs ops might access the hardware after the hardware is
> gone or replaced with another unit.

Alright than that is a bad possible split of the functionality. Which
is all I was suggesting splitting the functionality not doing away
with the wait or moving it to a point where the wait would not work.
It was simply my bad assumption that the final kobject_put would
happen before the module that controlled that kobject could be
removed.

I still think it might make sense to separate kobject_del into two
parts. One that we call with the locks held and one without, but that
does seem to be applicable to only a very small set of cases and our
problems appear to be much larger than that.

For the moment I have generated a patch that does the lockdep
annotations, and I have found that a simple:

find /sys -type f | xargs cat {} > /dev/null

trivially generates lockdep warnings. In particular:

[ 165.049042]
[ 165.049044] =======================================================
[ 165.052761] [ INFO: possible circular locking dependency detected ]
[ 165.052761] 2.6.33-rc2x86_64 #3
[ 165.052761] -------------------------------------------------------
[ 165.052761] cat/5026 is trying to acquire lock:
[ 165.052761] (&serio->drv_mutex){+.+.+.}, at: [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
[ 165.052761]
[ 165.052761] but task is already holding lock:
[ 165.089443] (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
[ 165.089443]
[ 165.089443] which lock already depends on the new lock.
[ 165.089443]
[ 165.089443]
[ 165.089443] the existing dependency chain (in reverse order) is:
[ 165.089443]
[ 165.089443] -> #1 (s_active){++++.+}:
[ 165.089443] [<ffffffff81054956>] validate_chain+0xa25/0xd1d
[ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
[ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
[ 165.089443] [<ffffffff810e8202>] sysfs_addrm_finish+0xba/0x125
[ 165.089443] [<ffffffff810e68b0>] sysfs_hash_and_remove+0x4f/0x6b
[ 165.089443] [<ffffffff810e94cf>] remove_files+0x1f/0x2c
[ 165.089443] [<ffffffff810e9561>] sysfs_remove_group+0x85/0xb4
[ 165.089443] [<ffffffff81331f0f>] psmouse_disconnect+0x33/0x147
[ 165.089443] [<ffffffff8132687b>] serio_disconnect_driver+0x2d/0x3a
[ 165.089443] [<ffffffff81326898>] serio_driver_remove+0x10/0x14
[ 165.089443] [<ffffffff812077f0>] __device_release_driver+0x67/0xb0
[ 165.089443] [<ffffffff81207857>] device_release_driver+0x1e/0x2b
[ 165.089443] [<ffffffff81326e68>] serio_disconnect_port+0x60/0x69
[ 165.089443] [<ffffffff8132757a>] serio_thread+0x170/0x34a
[ 165.089443] [<ffffffff810470e7>] kthread+0x7d/0x85
[ 165.089443] [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
[ 165.089443]
[ 165.089443] -> #0 (&serio->drv_mutex){+.+.+.}:
[ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
[ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
[ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
[ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
[ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
[ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
[ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
[ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
[ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
[ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
[ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
[ 165.089443]
[ 165.089443] other info that might help us debug this:
[ 165.089443]
[ 165.089443] 3 locks held by cat/5026:
[ 165.089443] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff810e715a>] sysfs_read_file+0x39/0x145
[ 165.089443] #1: (s_active){++++.+}, at: [<ffffffff810e84d0>] sysfs_get_active_two+0x1f/0x43
[ 165.089443] #2: (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
[ 165.089443]
[ 165.089443] stack backtrace:
[ 165.089443] Pid: 5026, comm: cat Not tainted 2.6.33-rc2x86_64 #3
[ 165.089443] Call Trace:
[ 165.089443] [<ffffffff810538f3>] print_circular_bug+0xb3/0xc1
[ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
[ 165.089443] [<ffffffff81052fb6>] ? trace_hardirqs_on_caller+0x10b/0x12f
[ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
[ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
[ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
[ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
[ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
[ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
[ 165.089443] [<ffffffff8132ee41>] ? atkbd_show_extra+0x0/0x28
[ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
[ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
[ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
[ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
[ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
[ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
[ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b

Suggestions on how to sort out this other set of issues are welcome.

Eric

2010-01-01 18:56:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability



On Fri, 1 Jan 2010, Eric W. Biederman wrote:
>
> When sysfs_readdir stops short we now cache the next sysfs_dirent to
> return to user space in filp->private_data. There is no impact on the
> rest of sysfs by doing this and in the common case it allows us to
> pick up exactly where we left off with no seeking.
>
> Additionally I drop and regrab the sysfs_mutex around filldir to avoid
> a page fault arbitrarily increasing the hold time on the sysfs_mutex.

Ok, looks mostly sane, but a few things look odd.

>
> - if (filp->f_pos == 0) {
> + if (!pos && filp->f_pos == 0) {
> ino = parent_sd->s_ino;
> if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
> filp->f_pos++;
> }
> - if (filp->f_pos == 1) {
> + if (!pos && filp->f_pos == 1) {
> if (parent_sd->s_parent)
> ino = parent_sd->s_parent->s_ino;
> else
> @@ -847,29 +879,35 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
> if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
> filp->f_pos++;
> }
> - if ((filp->f_pos > 1) && (filp->f_pos < INT_MAX)) {
> - mutex_lock(&sysfs_mutex);
> -
> - /* Skip the dentries we have already reported */
> - pos = parent_sd->s_dir.children;
> - while (pos && (filp->f_pos > pos->s_ino))
> - pos = pos->s_sibling;
> + /* EOF? */
> + if (!pos && filp->f_pos > 2)
> + return 0;

These are all incorrect in the presense of 'lseek'. You can't do that

if (!pos && "test filp->f_pos")

thing, because you get all the wrong results for both the case of an lseek
before doing any readdir (which is undefined behavior, so I guess that's
technically ok) _and_ for the 'lseek back to zero _after_ doing a readdir'
case (which is _not_ undefined behavior!

It looks like it might be easy to fix by making a sysfs_llseek() function
that does something like

.. sysfs_llseek(..)
{
mutex_lock(&sysfs_mutex);
sysfs_release();
filp->private_data = NULL;
mutex_unlock(&sysfs_mutex);

return generic_file_llseek(..);
}

or similar. Except themn you'll need to change the EOF condition testing
and turn it into a re-validation event. Or maybe do the re-validation in
sysfs_llseek() itself, rather than just dropping the cached data.

Hmm? I haven't thought it through very deeply, so maybe I'm missing
something.

Linus

2010-01-01 23:11:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability v2



On Fri, 1 Jan 2010, Eric W. Biederman wrote:
> mutex_unlock(&sysfs_mutex);
> + ret = filldir(dirent, name, len, filp->f_pos, ino, type);
> + mutex_lock(&sysfs_mutex);
> + if (ret < 0)
> + break;
> + }
> + mutex_unlock(&sysfs_mutex);
> + if ((filp->f_pos > 1) && !pos) { /* EOF */
> + filp->f_pos = INT_MAX;
> + filp->private_data = NULL;
> }
> return 0;

That

mutex_lock(&sysfs_mutex);
if (ret < 0)
break;

looks just silly. We know 'pos' is non-NULL, so the break will effectively
just be a "mutex_unlock + return 0", and we just did the mutex_lock, so
why not instead do

if (ret < 0)
return 0;
mutex_lock(&sysfs_mutex);

there?

Not that it really _matters_, but it seems way clearer, no?

But other than that mindless nit, I can't see anything wrong with your
logic, and it looks ok to me from just reading the patch itself.

So I guess that's an "Ack", although I'd prefer it to get some more
testing and perhaps go through Greg's tree as sysfs patches usually go.

And by "testing" I mean both the "yes, this second version also breaks the
lockdep chain and avoids the warning", but also some kind of actual
testing of /sysfs itself. If there is any.

Linus

2010-01-01 22:44:01

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability v2


When sysfs_readdir stops short we now cache the next
sysfs_dirent to return to user space in filp->private_data.
There is no impact on the rest of sysfs by doing this and
in the common case it allows us to pick up exactly where
we left off with no seeking.

Additionally I drop and regrab the sysfs_mutex around
filldir to avoid a page fault abritrarily increasing the
hold time on the sysfs_mutex.

v2: Returned to using INT_MAX as the EOF condition.
seekdir is ambiguous unless all directory entries have
a unique f_pos value.

Signed-off-by: Eric W. Biederman <[email protected]>
---

Linus good catch.

fs/sysfs/dir.c | 82 +++++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index f05f230..5d88e30 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -827,11 +827,46 @@ static inline unsigned char dt_type(struct sysfs_dirent *sd)
return (sd->s_mode >> 12) & 15;
}

+static int sysfs_dir_release(struct inode *inode, struct file *filp)
+{
+ sysfs_put(filp->private_data);
+ return 0;
+}
+
+static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ if (pos) {
+ int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
+ pos->s_parent == parent_sd &&
+ ino == pos->s_ino;
+ sysfs_put(pos);
+ if (valid)
+ return pos;
+ }
+ pos = NULL;
+ if ((ino > 1) && (ino < INT_MAX)) {
+ pos = parent_sd->s_dir.children;
+ while (pos && (ino > pos->s_ino))
+ pos = pos->s_sibling;
+ }
+ return pos;
+}
+
+static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
+ ino_t ino, struct sysfs_dirent *pos)
+{
+ pos = sysfs_dir_pos(parent_sd, ino, pos);
+ if (pos)
+ pos = pos->s_sibling;
+ return pos;
+}
+
static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
struct dentry *dentry = filp->f_path.dentry;
struct sysfs_dirent * parent_sd = dentry->d_fsdata;
- struct sysfs_dirent *pos;
+ struct sysfs_dirent *pos = filp->private_data;
ino_t ino;

if (filp->f_pos == 0) {
@@ -847,29 +882,31 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
if (filldir(dirent, "..", 2, filp->f_pos, ino, DT_DIR) == 0)
filp->f_pos++;
}
- if ((filp->f_pos > 1) && (filp->f_pos < INT_MAX)) {
- mutex_lock(&sysfs_mutex);
-
- /* Skip the dentries we have already reported */
- pos = parent_sd->s_dir.children;
- while (pos && (filp->f_pos > pos->s_ino))
- pos = pos->s_sibling;
-
- for ( ; pos; pos = pos->s_sibling) {
- const char * name;
- int len;
-
- name = pos->s_name;
- len = strlen(name);
- filp->f_pos = ino = pos->s_ino;
+ mutex_lock(&sysfs_mutex);
+ for (pos = sysfs_dir_pos(parent_sd, filp->f_pos, pos);
+ pos;
+ pos = sysfs_dir_next_pos(parent_sd, filp->f_pos, pos)) {
+ const char * name;
+ unsigned int type;
+ int len, ret;
+
+ name = pos->s_name;
+ len = strlen(name);
+ ino = pos->s_ino;
+ type = dt_type(pos);
+ filp->f_pos = ino;
+ filp->private_data = sysfs_get(pos);

- if (filldir(dirent, name, len, filp->f_pos, ino,
- dt_type(pos)) < 0)
- break;
- }
- if (!pos)
- filp->f_pos = INT_MAX;
mutex_unlock(&sysfs_mutex);
+ ret = filldir(dirent, name, len, filp->f_pos, ino, type);
+ mutex_lock(&sysfs_mutex);
+ if (ret < 0)
+ break;
+ }
+ mutex_unlock(&sysfs_mutex);
+ if ((filp->f_pos > 1) && !pos) { /* EOF */
+ filp->f_pos = INT_MAX;
+ filp->private_data = NULL;
}
return 0;
}
@@ -878,5 +915,6 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
const struct file_operations sysfs_dir_operations = {
.read = generic_read_dir,
.readdir = sysfs_readdir,
+ .release = sysfs_dir_release,
.llseek = generic_file_llseek,
};
--
1.6.5.2.143.g8cc62

2010-01-02 15:40:41

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Cache the last sysfs_dirent to improve readdir scalability v2

On Fri, Jan 01, 2010 at 03:10:49PM -0800, Linus Torvalds wrote:
> And by "testing" I mean both the "yes, this second version also breaks the
> lockdep chain and avoids the warning",

Yes, it survived a couple of suspend/resume cycles so far.

--
Regards/Gruss,
Boris.

2010-01-03 02:06:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Tejun Heo <[email protected]> writes:

> Hello,
>
> On 01/03/2010 06:49 AM, Eric W. Biederman wrote:
>>>> sysfs_deactivate happens in the device_del(), but if we were to move
>>>> sysfs_deactivate into the final kobject_put then in theory we can
>>>> continue to block and be friendly but not need to be called with
>>>> locations where locks are held.
>>>
>>> Nobody would know when that final put will actually happen. In
>>> progress sysfs ops might access the hardware after the hardware is
>>> gone or replaced with another unit.
>>
>> Alright than that is a bad possible split of the functionality. Which
>> is all I was suggesting splitting the functionality not doing away
>> with the wait or moving it to a point where the wait would not work.
>> It was simply my bad assumption that the final kobject_put would
>> happen before the module that controlled that kobject could be
>> removed.
>
> The module should stay around. The severing is necessary to protect
> driver internal data structures and possibly removed or reattached (to
> a different driver) hardware.

Removed driver hardware isn't something sysfs can really guard
against, although it can help to make the window of vulnerability
smaller. Protecting driver internal data structures if we can does
seem reasonable.

The case I was thinking of in particular is when someone does:
"rmmod driver" I think device_del protects from the code going away
today.

>> I still think it might make sense to separate kobject_del into two
>> parts. One that we call with the locks held and one without, but that
>> does seem to be applicable to only a very small set of cases and our
>> problems appear to be much larger than that.
>
> If such separation is necessary, we can implement the split interface
> while leaving kobject_del() as is feature-wise and convert the
> offending ones to use the split interface but I think it would be
> better to simply fix the offending ones if there aren't too many and
> they're easily fixable. Let's see how many lockdep warnings turn up.

- We have the network stack.
I have hacked around that (when I thought it was a singleton)
by introducing the idiom:

if (!rtnl_trylock())
return restart_sysscall();

But that isn't sustainable, as there is already one new entry that
just does rntl_lock unconditionally.

Maybe we can move the device_del out from under the rtnl_lock, but I
have my doubts. Certainly the proc and sysctl bits (which have the
same issue look more difficult.

- We almost have an issue in ext4.
Device_del is certainly called under lock_kernel() and lock_super().

- We have what a cpu_hotplug.lock issue with
/sys/devices/system/cpu/cpuN/microcode/reload, a variant of the problem
that triggered this discussion and it looks very non-trivial to solve.

So I'm not certain what to say except that we have longstanding problems.

Eric

2010-01-03 04:59:48

by Tejun Heo

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Hello,

On 01/03/2010 11:06 AM, Eric W. Biederman wrote:
> Removed driver hardware isn't something sysfs can really guard
> against, although it can help to make the window of vulnerability
> smaller.

It can't protect against removal itself per-se but it does give the
driver a boundary which it can depend on while implementing hot
unplugging. Hardwares which support hot unplugging can cope with
surprise removal and has mechanisms to detect and handle them but
software part still is tricky and driver needs to have a boundary
after which it can declare a device gone.

> Protecting driver internal data structures if we can does
> seem reasonable.

Also the case of driver detaching (and another driver attaching).

> The case I was thinking of in particular is when someone does:
> "rmmod driver" I think device_del protects from the code going away
> today.

Nope, that's protected by reference counting via fops and/or other
stuff.

>> If such separation is necessary, we can implement the split interface
>> while leaving kobject_del() as is feature-wise and convert the
>> offending ones to use the split interface but I think it would be
>> better to simply fix the offending ones if there aren't too many and
>> they're easily fixable. Let's see how many lockdep warnings turn up.
>
> - We have the network stack.
> I have hacked around that (when I thought it was a singleton)
> by introducing the idiom:
>
> if (!rtnl_trylock())
> return restart_sysscall();
>
> But that isn't sustainable, as there is already one new entry that
> just does rntl_lock unconditionally.
>
> Maybe we can move the device_del out from under the rtnl_lock, but I
> have my doubts. Certainly the proc and sysctl bits (which have the
> same issue look more difficult.
>
> - We almost have an issue in ext4.
> Device_del is certainly called under lock_kernel() and lock_super().
>
> - We have what a cpu_hotplug.lock issue with
> /sys/devices/system/cpu/cpuN/microcode/reload, a variant of the problem
> that triggered this discussion and it looks very non-trivial to solve.
>
> So I'm not certain what to say except that we have longstanding problems.

It's interesting that the above cases arn't common drivers. AFAICS,
the problem cases would usually be cases like above where the user is
a rather complex software entity or drivers which implement some form
of self detaching via sysfs. For the former group, I agree that
splitting deleting and draining (or simply skipping the draining part
or active reference counting both of which basically achieve the same
thing) would be an easy way out as it would be generally easy to leave
the data structures dangling till the references go away.

How about simply introducing an interface to mark sysfs nodes which
don't require active reference counting and using them on those nodes?

Thanks.

--
tejun

2010-01-03 05:38:16

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Tejun Heo <[email protected]> writes:

> It's interesting that the above cases arn't common drivers. AFAICS,
> the problem cases would usually be cases like above where the user is
> a rather complex software entity or drivers which implement some form
> of self detaching via sysfs. For the former group, I agree that
> splitting deleting and draining (or simply skipping the draining part
> or active reference counting both of which basically achieve the same
> thing) would be an easy way out as it would be generally easy to leave
> the data structures dangling till the references go away.
>
> How about simply introducing an interface to mark sysfs nodes which
> don't require active reference counting and using them on those nodes?

That might work. However it does not seem to address the case of
bond_sysfs, especially with someone doing rmmod bonding.

I think the brainstorm is on the right track. I think we just need to look
at a few more cases in depth so that we can see a pattern and generalize
what can be done.

Eric

2010-01-03 06:03:53

by Tejun Heo

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Hello,

On 01/03/2010 02:38 PM, Eric W. Biederman wrote:
>> How about simply introducing an interface to mark sysfs nodes which
>> don't require active reference counting and using them on those nodes?
>
> That might work. However it does not seem to address the case of
> bond_sysfs, especially with someone doing rmmod bonding.

Ah... okay, now I remember this. Yeah, I ripped off module ref
counting from sysfs ops. I completely forgot about that and was
thinking we still had module ref counting on sysfs ops. The logical
thing to do would be restoring module ref counting on sysfs ops which
won't go through active ref counting. ie. Let the interface which
switch off active ref counting require module owner so that it uses
module ref counting instead of active ref counting.

Thanks.

--
tejun

2010-01-03 07:47:54

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

On Sun, Jan 03, 2010 at 09:32:06AM +0900, Tejun Heo wrote:
> On 01/03/2010 06:49 AM, Eric W. Biederman wrote:
>
> > For the moment I have generated a patch that does the lockdep
> > annotations, and I have found that a simple:
> >
> > find /sys -type f | xargs cat {} > /dev/null
> >
> > trivially generates lockdep warnings. In particular:
>
> (cc'ing Dmitry, Hi!)

Hi Tejun! ;)

>
> > [ 165.049042]
> > [ 165.049044] =======================================================
> > [ 165.052761] [ INFO: possible circular locking dependency detected ]
> > [ 165.052761] 2.6.33-rc2x86_64 #3
> > [ 165.052761] -------------------------------------------------------
> > [ 165.052761] cat/5026 is trying to acquire lock:
> > [ 165.052761] (&serio->drv_mutex){+.+.+.}, at: [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
> > [ 165.052761]
> > [ 165.052761] but task is already holding lock:
> > [ 165.089443] (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
> > [ 165.089443]
> > [ 165.089443] which lock already depends on the new lock.
> > [ 165.089443]
> > [ 165.089443]
> > [ 165.089443] the existing dependency chain (in reverse order) is:
> > [ 165.089443]
> > [ 165.089443] -> #1 (s_active){++++.+}:
> > [ 165.089443] [<ffffffff81054956>] validate_chain+0xa25/0xd1d
> > [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
> > [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
> > [ 165.089443] [<ffffffff810e8202>] sysfs_addrm_finish+0xba/0x125
> > [ 165.089443] [<ffffffff810e68b0>] sysfs_hash_and_remove+0x4f/0x6b
> > [ 165.089443] [<ffffffff810e94cf>] remove_files+0x1f/0x2c
> > [ 165.089443] [<ffffffff810e9561>] sysfs_remove_group+0x85/0xb4
> > [ 165.089443] [<ffffffff81331f0f>] psmouse_disconnect+0x33/0x147
> > [ 165.089443] [<ffffffff8132687b>] serio_disconnect_driver+0x2d/0x3a
> > [ 165.089443] [<ffffffff81326898>] serio_driver_remove+0x10/0x14
> > [ 165.089443] [<ffffffff812077f0>] __device_release_driver+0x67/0xb0
> > [ 165.089443] [<ffffffff81207857>] device_release_driver+0x1e/0x2b
> > [ 165.089443] [<ffffffff81326e68>] serio_disconnect_port+0x60/0x69
> > [ 165.089443] [<ffffffff8132757a>] serio_thread+0x170/0x34a
> > [ 165.089443] [<ffffffff810470e7>] kthread+0x7d/0x85
> > [ 165.089443] [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
> > [ 165.089443]
> > [ 165.089443] -> #0 (&serio->drv_mutex){+.+.+.}:
> > [ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
> > [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
> > [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
> > [ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
> > [ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
> > [ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
> > [ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
> > [ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
> > [ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
> > [ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
> > [ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
> > [ 165.089443]
> > [ 165.089443] other info that might help us debug this:
> > [ 165.089443]
> > [ 165.089443] 3 locks held by cat/5026:
> > [ 165.089443] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff810e715a>] sysfs_read_file+0x39/0x145
> > [ 165.089443] #1: (s_active){++++.+}, at: [<ffffffff810e84d0>] sysfs_get_active_two+0x1f/0x43
> > [ 165.089443] #2: (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
> > [ 165.089443]
> > [ 165.089443] stack backtrace:
> > [ 165.089443] Pid: 5026, comm: cat Not tainted 2.6.33-rc2x86_64 #3
> > [ 165.089443] Call Trace:
> > [ 165.089443] [<ffffffff810538f3>] print_circular_bug+0xb3/0xc1
> > [ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
> > [ 165.089443] [<ffffffff81052fb6>] ? trace_hardirqs_on_caller+0x10b/0x12f
> > [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
> > [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
> > [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
> > [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
> > [ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
> > [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
> > [ 165.089443] [<ffffffff8132ee41>] ? atkbd_show_extra+0x0/0x28
> > [ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
> > [ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
> > [ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
> > [ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
> > [ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
> > [ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
> > [ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
> >
> > Suggestions on how to sort out this other set of issues are welcome.
>
> Ummm... read of an input sysfs node can trigger

Read? I checked and I do not see where read would cause disconnect.
Also, disconnect only involves unbinding driver from the port, not the
destruction of the port itself (children may be destroyed but they have
different locks).

> serio_disconnect_port() under serio->drv_mutex, which unfortunately
> would need to wait for completion of in-progress sysfs ops thus
> creating possibility for AB-BA deadlock.

I think that we are dealing with different drv->mutex instances here.

> Dmitry, is it possible to
> make serio_disconnect_port() asynchronous from the sysfs ops (ie. put
> it in a work or something)?

I am not sure it is needed. Also in the trace presented
serio_disconnect_port() is called from kseriod which certainly does not
access sysfs...

Overall I am not concerned about lockdep bitching about serio because it
still bitches if you simply reload psmouse on a box with Synaptics with a
pass-through port even though there are nested annotations and it is
silent first time around.

Out of curiosity, do yo uknow what caused psmouse disconnect and what
kind of mouse is in the box?

--
Dmitry

2010-01-03 10:57:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Dmitry Torokhov <[email protected]> writes:

> On Sun, Jan 03, 2010 at 09:32:06AM +0900, Tejun Heo wrote:
>> On 01/03/2010 06:49 AM, Eric W. Biederman wrote:
>>
>> > For the moment I have generated a patch that does the lockdep
>> > annotations, and I have found that a simple:
>> >
>> > find /sys -type f | xargs cat {} > /dev/null
>> >
>> > trivially generates lockdep warnings. In particular:
>>
>> (cc'ing Dmitry, Hi!)
>
> Hi Tejun! ;)
>
>>
>> > [ 165.049042]
>> > [ 165.049044] =======================================================
>> > [ 165.052761] [ INFO: possible circular locking dependency detected ]
>> > [ 165.052761] 2.6.33-rc2x86_64 #3
>> > [ 165.052761] -------------------------------------------------------
>> > [ 165.052761] cat/5026 is trying to acquire lock:
>> > [ 165.052761] (&serio->drv_mutex){+.+.+.}, at: [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
>> > [ 165.052761]
>> > [ 165.052761] but task is already holding lock:
>> > [ 165.089443] (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
>> > [ 165.089443]
>> > [ 165.089443] which lock already depends on the new lock.
>> > [ 165.089443]
>> > [ 165.089443]
>> > [ 165.089443] the existing dependency chain (in reverse order) is:
>> > [ 165.089443]
>> > [ 165.089443] -> #1 (s_active){++++.+}:
>> > [ 165.089443] [<ffffffff81054956>] validate_chain+0xa25/0xd1d
>> > [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
>> > [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
>> > [ 165.089443] [<ffffffff810e8202>] sysfs_addrm_finish+0xba/0x125
>> > [ 165.089443] [<ffffffff810e68b0>] sysfs_hash_and_remove+0x4f/0x6b
>> > [ 165.089443] [<ffffffff810e94cf>] remove_files+0x1f/0x2c
>> > [ 165.089443] [<ffffffff810e9561>] sysfs_remove_group+0x85/0xb4
>> > [ 165.089443] [<ffffffff81331f0f>] psmouse_disconnect+0x33/0x147
>> > [ 165.089443] [<ffffffff8132687b>] serio_disconnect_driver+0x2d/0x3a
>> > [ 165.089443] [<ffffffff81326898>] serio_driver_remove+0x10/0x14
>> > [ 165.089443] [<ffffffff812077f0>] __device_release_driver+0x67/0xb0
>> > [ 165.089443] [<ffffffff81207857>] device_release_driver+0x1e/0x2b
>> > [ 165.089443] [<ffffffff81326e68>] serio_disconnect_port+0x60/0x69
>> > [ 165.089443] [<ffffffff8132757a>] serio_thread+0x170/0x34a
>> > [ 165.089443] [<ffffffff810470e7>] kthread+0x7d/0x85
>> > [ 165.089443] [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
>> > [ 165.089443]
>> > [ 165.089443] -> #0 (&serio->drv_mutex){+.+.+.}:
>> > [ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
>> > [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
>> > [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
>> > [ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
>> > [ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
>> > [ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
>> > [ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
>> > [ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
>> > [ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
>> > [ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
>> > [ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
>> > [ 165.089443]
>> > [ 165.089443] other info that might help us debug this:
>> > [ 165.089443]
>> > [ 165.089443] 3 locks held by cat/5026:
>> > [ 165.089443] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff810e715a>] sysfs_read_file+0x39/0x145
>> > [ 165.089443] #1: (s_active){++++.+}, at: [<ffffffff810e84d0>] sysfs_get_active_two+0x1f/0x43
>> > [ 165.089443] #2: (s_active){++++.+}, at: [<ffffffff810e84dd>] sysfs_get_active_two+0x2c/0x43
>> > [ 165.089443]
>> > [ 165.089443] stack backtrace:
>> > [ 165.089443] Pid: 5026, comm: cat Not tainted 2.6.33-rc2x86_64 #3
>> > [ 165.089443] Call Trace:
>> > [ 165.089443] [<ffffffff810538f3>] print_circular_bug+0xb3/0xc1
>> > [ 165.089443] [<ffffffff81054642>] validate_chain+0x711/0xd1d
>> > [ 165.089443] [<ffffffff81052fb6>] ? trace_hardirqs_on_caller+0x10b/0x12f
>> > [ 165.089443] [<ffffffff810553d3>] __lock_acquire+0x785/0x7dc
>> > [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
>> > [ 165.089443] [<ffffffff81056112>] lock_acquire+0x5a/0x74
>> > [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
>> > [ 165.089443] [<ffffffff814378ed>] mutex_lock_interruptible_nested+0x4a/0x307
>> > [ 165.089443] [<ffffffff8132ecaa>] ? atkbd_attr_show_helper+0x28/0x6e
>> > [ 165.089443] [<ffffffff8132ee41>] ? atkbd_show_extra+0x0/0x28
>> > [ 165.089443] [<ffffffff8132ecaa>] atkbd_attr_show_helper+0x28/0x6e
>> > [ 165.089443] [<ffffffff8132ed81>] atkbd_do_show_extra+0x13/0x15
>> > [ 165.089443] [<ffffffff812049b6>] dev_attr_show+0x20/0x43
>> > [ 165.089443] [<ffffffff810e71db>] sysfs_read_file+0xba/0x145
>> > [ 165.089443] [<ffffffff8109f507>] vfs_read+0xab/0x147
>> > [ 165.089443] [<ffffffff8109f85c>] sys_read+0x47/0x70
>> > [ 165.089443] [<ffffffff81001f2b>] system_call_fastpath+0x16/0x1b
>> >
>> > Suggestions on how to sort out this other set of issues are welcome.
>>
>> Ummm... read of an input sysfs node can trigger
>
> Read? I checked and I do not see where read would cause disconnect.
> Also, disconnect only involves unbinding driver from the port, not the
> destruction of the port itself (children may be destroyed but they have
> different locks).
>
>> serio_disconnect_port() under serio->drv_mutex, which unfortunately
>> would need to wait for completion of in-progress sysfs ops thus
>> creating possibility for AB-BA deadlock.
>
> I think that we are dealing with different drv->mutex instances here.
>
>> Dmitry, is it possible to
>> make serio_disconnect_port() asynchronous from the sysfs ops (ie. put
>> it in a work or something)?
>
> I am not sure it is needed. Also in the trace presented
> serio_disconnect_port() is called from kseriod which certainly does not
> access sysfs...
>
> Overall I am not concerned about lockdep bitching about serio because it
> still bitches if you simply reload psmouse on a box with Synaptics with a
> pass-through port even though there are nested annotations and it is
> silent first time around.

This is a new lockdep annotation, and looking into it this appears to
be a true possible deadlock in the serio/sysfs interactions.

We have serio_pin_driver() called from all of the sysfs attributes
which does:
mutex_lock(&serio->drv_mutex);

We have serio_disconnect_driver() called on an unplug which does:
mutex_lock(&serio->drv_mutex);

The deadlock potential is if someone reads say the psmouse rate
sysfs file while the mouse is being unplugged. There is a race
such that we can have:

sysfs_read_file()
fill_read_buffer()
sysfs_get_active_two()
psmouse_attr_show_helper()
serio_pin_driver()
serio_disconnect_driver()
mutex_lock(&serio->drv_mutex);
<-----------------> mutex_lock(&serio_drv_mutex);
psmouse_disconnect()
sysfs_remove_group(... psmouse_attr_group);
....
sysfs_deactivate();
wait_for_completion();


So it is unlikely but possible to deadlock by accessing a serio
attribute of a serio device that is being removed.

What to do about it is another question. It has just recently come to my
attention that we have more events like this

> Out of curiosity, do yo uknow what caused psmouse disconnect and what
> kind of mouse is in the box?

It is a simple ps2mouse connected through a kvm, and the kvm was not
switched to the machine in question during the run.

I am trying to wrap my head around what to do with this sysfs_deactivate
deadlock scenario, (other drivers also hold unfortunate locks over the
removal of sysfs files, and it just happens that the ps2mouse case was the first
one I reproduced), and it was interesting because I had not seen it before.

Eric

2010-01-03 11:14:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

[email protected] (Eric W. Biederman) writes:

> What to do about it is another question. It has just recently come to my
> attention that we have more events like this

In the specific case of serio what gets us in trouble is
the call to sysfs_remove_group.

If instead of independent calls to sysfs_create_group/sysfs_remove_group,
you could move the groups into a list on dev->groups than we could solve
two problems.

- Userspace would see all of the attributes when the hotplug event is
fired remove races.

- We would not hold serio->drv_mutex over sysfs_remove_group so there
would not be a possible deadlock on device removal.

Does that change sound possible?

Eric

2010-01-04 18:58:13

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

On Sun, Jan 03, 2010 at 02:57:15AM -0800, Eric W. Biederman wrote:
> Dmitry Torokhov <[email protected]> writes:
>
> >
> > Overall I am not concerned about lockdep bitching about serio because it
> > still bitches if you simply reload psmouse on a box with Synaptics with a
> > pass-through port even though there are nested annotations and it is
> > silent first time around.
>
> This is a new lockdep annotation, and looking into it this appears to
> be a true possible deadlock in the serio/sysfs interactions.
>
> We have serio_pin_driver() called from all of the sysfs attributes
> which does:
> mutex_lock(&serio->drv_mutex);
>
> We have serio_disconnect_driver() called on an unplug which does:
> mutex_lock(&serio->drv_mutex);
>
> The deadlock potential is if someone reads say the psmouse rate
> sysfs file while the mouse is being unplugged. There is a race
> such that we can have:
>
> sysfs_read_file()
> fill_read_buffer()
> sysfs_get_active_two()
> psmouse_attr_show_helper()
> serio_pin_driver()
> serio_disconnect_driver()
> mutex_lock(&serio->drv_mutex);
> <-----------------> mutex_lock(&serio_drv_mutex);
> psmouse_disconnect()
> sysfs_remove_group(... psmouse_attr_group);
> ....
> sysfs_deactivate();
> wait_for_completion();
>
>
> So it is unlikely but possible to deadlock by accessing a serio
> attribute of a serio device that is being removed.

Hmm, I guess I was too quick dismissing lockdep complaints here. Now
that sysfs remove waits deadlock indeed is possible. Actually the locks
on serio->drv_mutex in attributes were added to make sure we don't
access device that was unbound from the driver through stale sysfs
attributes.

>
> What to do about it is another question.

I think we should simply not take serio->drv_mutex in attributes and use
driver-private mutex to serialize "set" methods that may alter device
state.

--
Dmitry

2010-01-04 19:16:41

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

On Sun, Jan 03, 2010 at 03:14:18AM -0800, Eric W. Biederman wrote:
> [email protected] (Eric W. Biederman) writes:
>
> > What to do about it is another question. It has just recently come to my
> > attention that we have more events like this
>
> In the specific case of serio what gets us in trouble is
> the call to sysfs_remove_group.
>
> If instead of independent calls to sysfs_create_group/sysfs_remove_group,
> you could move the groups into a list on dev->groups than we could solve
> two problems.
>
> - Userspace would see all of the attributes when the hotplug event is
> fired remove races.
>
> - We would not hold serio->drv_mutex over sysfs_remove_group so there
> would not be a possible deadlock on device removal.
>
> Does that change sound possible?

No, because attributes in question belong to driver+device combo. The
device will not go away when driver is unbound but we do want to remove
attributes.

--
Dmitry

2010-01-04 19:43:28

by Eric W. Biederman

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Dmitry Torokhov <[email protected]> writes:

> On Sun, Jan 03, 2010 at 02:57:15AM -0800, Eric W. Biederman wrote:
>> Dmitry Torokhov <[email protected]> writes:
>>
>> >
>> > Overall I am not concerned about lockdep bitching about serio because it
>> > still bitches if you simply reload psmouse on a box with Synaptics with a
>> > pass-through port even though there are nested annotations and it is
>> > silent first time around.
>>
>> This is a new lockdep annotation, and looking into it this appears to
>> be a true possible deadlock in the serio/sysfs interactions.
>>
>> We have serio_pin_driver() called from all of the sysfs attributes
>> which does:
>> mutex_lock(&serio->drv_mutex);
>>
>> We have serio_disconnect_driver() called on an unplug which does:
>> mutex_lock(&serio->drv_mutex);
>>
>> The deadlock potential is if someone reads say the psmouse rate
>> sysfs file while the mouse is being unplugged. There is a race
>> such that we can have:
>>
>> sysfs_read_file()
>> fill_read_buffer()
>> sysfs_get_active_two()
>> psmouse_attr_show_helper()
>> serio_pin_driver()
>> serio_disconnect_driver()
>> mutex_lock(&serio->drv_mutex);
>> <-----------------> mutex_lock(&serio_drv_mutex);
>> psmouse_disconnect()
>> sysfs_remove_group(... psmouse_attr_group);
>> ....
>> sysfs_deactivate();
>> wait_for_completion();
>>
>>
>> So it is unlikely but possible to deadlock by accessing a serio
>> attribute of a serio device that is being removed.
>
> Hmm, I guess I was too quick dismissing lockdep complaints here. Now
> that sysfs remove waits deadlock indeed is possible. Actually the locks
> on serio->drv_mutex in attributes were added to make sure we don't
> access device that was unbound from the driver through stale sysfs
> attributes.

Cool. So we have solved the problem generically but we have left over
layer specific solutions. That seems like a good problem to have.

>> What to do about it is another question.
>
> I think we should simply not take serio->drv_mutex in attributes and use
> driver-private mutex to serialize "set" methods that may alter device
> state.

Do you have any ideas what those might be? It looks like we are only
talking about psmouse and atkbd. So the audit for this chunk should
not be too bad.

The psmouse code already has a mutex on it's set operations only the
atkbd does not. The atkbd code does do a driver stop/start, which is
similar (but race prone without the serio->drv_mutex).

Except for the lack of atkbd_enable/disable locking the patch below should
be good. Opinions from someone who knows the serio code better than I do
would be helpful.

Eric


---

From: Eric W. Biederman <[email protected]>
Subject: [PATCH] serio: Remove uneeded and deadlock prone serio_pin_driver

sysfs_remove_group waits for sysfs attributes to be removed
so we don't need to take a mutex in each of the attributes to
prevent remove while the code in the attribute is running.

This removes a theoretical deadlock possibility of a keyboard
or mouse hotplug and someone accessing a sysfs attribute.

Signed-off-by: Eric W. Biederman <[email protected]>
---
drivers/input/keyboard/atkbd.c | 27 +--------------------------
drivers/input/mouse/psmouse-base.c | 32 +++-----------------------------
include/linux/serio.h | 19 -------------------
3 files changed, 4 insertions(+), 74 deletions(-)

diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c
index a357357..7200100 100644
--- a/drivers/input/keyboard/atkbd.c
+++ b/drivers/input/keyboard/atkbd.c
@@ -1234,22 +1234,8 @@ static ssize_t atkbd_attr_show_helper(struct device *dev, char *buf,
ssize_t (*handler)(struct atkbd *, char *))
{
struct serio *serio = to_serio_port(dev);
- int retval;
-
- retval = serio_pin_driver(serio);
- if (retval)
- return retval;

- if (serio->drv != &atkbd_drv) {
- retval = -ENODEV;
- goto out;
- }
-
- retval = handler((struct atkbd *)serio_get_drvdata(serio), buf);
-
-out:
- serio_unpin_driver(serio);
- return retval;
+ return handler((struct atkbd *)serio_get_drvdata(serio), buf);
}

static ssize_t atkbd_attr_set_helper(struct device *dev, const char *buf, size_t count,
@@ -1259,22 +1245,11 @@ static ssize_t atkbd_attr_set_helper(struct device *dev, const char *buf, size_t
struct atkbd *atkbd;
int retval;

- retval = serio_pin_driver(serio);
- if (retval)
- return retval;
-
- if (serio->drv != &atkbd_drv) {
- retval = -ENODEV;
- goto out;
- }
-
atkbd = serio_get_drvdata(serio);
atkbd_disable(atkbd);
retval = handler(atkbd, buf, count);
atkbd_enable(atkbd);

-out:
- serio_unpin_driver(serio);
return retval;
}

diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
index fd0bc09..59754d3 100644
--- a/drivers/input/mouse/psmouse-base.c
+++ b/drivers/input/mouse/psmouse-base.c
@@ -1447,24 +1447,10 @@ ssize_t psmouse_attr_show_helper(struct device *dev, struct device_attribute *de
struct serio *serio = to_serio_port(dev);
struct psmouse_attribute *attr = to_psmouse_attr(devattr);
struct psmouse *psmouse;
- int retval;
-
- retval = serio_pin_driver(serio);
- if (retval)
- return retval;
-
- if (serio->drv != &psmouse_drv) {
- retval = -ENODEV;
- goto out;
- }

psmouse = serio_get_drvdata(serio);

- retval = attr->show(psmouse, attr->data, buf);
-
-out:
- serio_unpin_driver(serio);
- return retval;
+ return attr->show(psmouse, attr->data, buf);
}

ssize_t psmouse_attr_set_helper(struct device *dev, struct device_attribute *devattr,
@@ -1475,18 +1461,9 @@ ssize_t psmouse_attr_set_helper(struct device *dev, struct device_attribute *dev
struct psmouse *psmouse, *parent = NULL;
int retval;

- retval = serio_pin_driver(serio);
- if (retval)
- return retval;
-
- if (serio->drv != &psmouse_drv) {
- retval = -ENODEV;
- goto out_unpin;
- }
-
retval = mutex_lock_interruptible(&psmouse_mutex);
if (retval)
- goto out_unpin;
+ goto out;

psmouse = serio_get_drvdata(serio);

@@ -1516,8 +1493,7 @@ ssize_t psmouse_attr_set_helper(struct device *dev, struct device_attribute *dev

out_unlock:
mutex_unlock(&psmouse_mutex);
- out_unpin:
- serio_unpin_driver(serio);
+ out:
return retval;
}

@@ -1579,9 +1555,7 @@ static ssize_t psmouse_attr_set_protocol(struct psmouse *psmouse, void *data, co
}

mutex_unlock(&psmouse_mutex);
- serio_unpin_driver(serio);
serio_unregister_child_port(serio);
- serio_pin_driver_uninterruptible(serio);
mutex_lock(&psmouse_mutex);

if (serio->drv != &psmouse_drv) {
diff --git a/include/linux/serio.h b/include/linux/serio.h
index e2f3044..813d26c 100644
--- a/include/linux/serio.h
+++ b/include/linux/serio.h
@@ -136,25 +136,6 @@ static inline void serio_continue_rx(struct serio *serio)
spin_unlock_irq(&serio->lock);
}

-/*
- * Use the following functions to pin serio's driver in process context
- */
-static inline int serio_pin_driver(struct serio *serio)
-{
- return mutex_lock_interruptible(&serio->drv_mutex);
-}
-
-static inline void serio_pin_driver_uninterruptible(struct serio *serio)
-{
- mutex_lock(&serio->drv_mutex);
-}
-
-static inline void serio_unpin_driver(struct serio *serio)
-{
- mutex_unlock(&serio->drv_mutex);
-}
-
-
#endif

/*
--
1.6.5.2.143.g8cc62

2010-01-04 21:12:52

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

On Mon, Jan 04, 2010 at 11:43:10AM -0800, Eric W. Biederman wrote:
> Dmitry Torokhov <[email protected]> writes:
>
> > On Sun, Jan 03, 2010 at 02:57:15AM -0800, Eric W. Biederman wrote:
> >> Dmitry Torokhov <[email protected]> writes:
> >>
> >> >
> >> > Overall I am not concerned about lockdep bitching about serio because it
> >> > still bitches if you simply reload psmouse on a box with Synaptics with a
> >> > pass-through port even though there are nested annotations and it is
> >> > silent first time around.
> >>
> >> This is a new lockdep annotation, and looking into it this appears to
> >> be a true possible deadlock in the serio/sysfs interactions.
> >>
> >> We have serio_pin_driver() called from all of the sysfs attributes
> >> which does:
> >> mutex_lock(&serio->drv_mutex);
> >>
> >> We have serio_disconnect_driver() called on an unplug which does:
> >> mutex_lock(&serio->drv_mutex);
> >>
> >> The deadlock potential is if someone reads say the psmouse rate
> >> sysfs file while the mouse is being unplugged. There is a race
> >> such that we can have:
> >>
> >> sysfs_read_file()
> >> fill_read_buffer()
> >> sysfs_get_active_two()
> >> psmouse_attr_show_helper()
> >> serio_pin_driver()
> >> serio_disconnect_driver()
> >> mutex_lock(&serio->drv_mutex);
> >> <-----------------> mutex_lock(&serio_drv_mutex);
> >> psmouse_disconnect()
> >> sysfs_remove_group(... psmouse_attr_group);
> >> ....
> >> sysfs_deactivate();
> >> wait_for_completion();
> >>
> >>
> >> So it is unlikely but possible to deadlock by accessing a serio
> >> attribute of a serio device that is being removed.
> >
> > Hmm, I guess I was too quick dismissing lockdep complaints here. Now
> > that sysfs remove waits deadlock indeed is possible. Actually the locks
> > on serio->drv_mutex in attributes were added to make sure we don't
> > access device that was unbound from the driver through stale sysfs
> > attributes.
>
> Cool. So we have solved the problem generically but we have left over
> layer specific solutions. That seems like a good problem to have.
>
> >> What to do about it is another question.
> >
> > I think we should simply not take serio->drv_mutex in attributes and use
> > driver-private mutex to serialize "set" methods that may alter device
> > state.
>
> Do you have any ideas what those might be? It looks like we are only
> talking about psmouse and atkbd. So the audit for this chunk should
> not be too bad.

Right, only these 2.

>
> The psmouse code already has a mutex on it's set operations only the
> atkbd does not. The atkbd code does do a driver stop/start, which is
> similar (but race prone without the serio->drv_mutex).
>
> Except for the lack of atkbd_enable/disable locking the patch below should
> be good. Opinions from someone who knows the serio code better than I do
> would be helpful.

Thanks Eric, this looks good. I'll add the missing mutex to atkbd and
apply. I think it can wait for .34 though - the window is quite small.

--
Dmitry

2010-01-04 23:04:44

by Tejun Heo

[permalink] [raw]
Subject: Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

Hello,

On 01/05/2010 04:43 AM, Eric W. Biederman wrote:
>>> So it is unlikely but possible to deadlock by accessing a serio
>>> attribute of a serio device that is being removed.
>>
>> Hmm, I guess I was too quick dismissing lockdep complaints here. Now
>> that sysfs remove waits deadlock indeed is possible. Actually the locks
>> on serio->drv_mutex in attributes were added to make sure we don't
>> access device that was unbound from the driver through stale sysfs
>> attributes.
>
> Cool. So we have solved the problem generically but we have left over
> layer specific solutions. That seems like a good problem to have.

This is way too cool. Only if we can have more moments like this. :-)

--
tejun

2010-01-17 16:27:23

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Add lockdep annotations for the sysfs active reference

On Sat, 02 Jan 2010 13:37:12 -0800
[email protected] (Eric W. Biederman) wrote:

>
> Holding locks over device_del -> kobject_del -> sysfs_deactivate can
> cause deadlocks if those same locks are grabbed in sysfs show or store
> methods.
>
> The I model s_active count + completion as a sleeping read/write lock.
> I describe to lockdep sysfs_get_active as a read_trylock,
> sysfs_put_active as a read_unlock, and sysfs_deactivate as a
> write_lock and write_unlock pair. This seems to capture the essence
> for purposes of finding deadlocks, and in my testing gives finds real
> issues and ignores non-issues.
>
> This brings us back to holding locks over kobject_del is a problem
> that ideally we should find a way of addressing, but at least lockdep
> can tell us about the problems instead of requiring developers to
> debug rare strange system deadlocks, that happen when sysfs files are
> removed while being written to.

The model has hit a possible deadlock in pcmcia, and the lockdep warning
comes when I unplug my wlan card from pcmcia slot.

Looks like socket->skt_mutex is held in remove path, and it is also
grabbed in .stor method.


[ 9207.755883] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0
[ 9207.786583]
[ 9207.786586] =======================================================
[ 9207.786595] [ INFO: possible circular locking dependency detected ]
[ 9207.786602] 2.6.33-rc4-wl #8
[ 9207.786607] -------------------------------------------------------
[ 9207.786614] pccardd/841 is trying to acquire lock:
[ 9207.786620] (s_active){++++.+}, at: [<ffffffff811637f1>] sysfs_addrm_finish+0x36/0x55
[ 9207.786643]
[ 9207.786645] but task is already holding lock:
[ 9207.786651] (&socket->skt_mutex){+.+.+.}, at: [<ffffffff812ed04c>] pccardd+0x15d/0x25f
[ 9207.786669]
[ 9207.786671] which lock already depends on the new lock.
[ 9207.786674]
[ 9207.786679]
[ 9207.786680] the existing dependency chain (in reverse order) is:
[ 9207.786687]
[ 9207.786688] -> #1 (&socket->skt_mutex){+.+.+.}:
[ 9207.786702] [<ffffffff810796c0>] __lock_acquire+0xb73/0xd2b
[ 9207.786716] [<ffffffff8107a36b>] lock_acquire+0xe1/0x105
[ 9207.786726] [<ffffffff813b5ac5>] __mutex_lock_common+0x59/0x49d
[ 9207.786741] [<ffffffff813b5fbe>] mutex_lock_nested+0x39/0x3e
[ 9207.786752] [<ffffffff812ef3ed>] pccard_store_resource+0x6b/0xc5
[ 9207.786763] [<ffffffff812a55da>] dev_attr_store+0x20/0x22
[ 9207.786775] [<ffffffff8116259a>] sysfs_write_file+0x108/0x144
[ 9207.786787] [<ffffffff8110cf48>] vfs_write+0xae/0x10b
[ 9207.786798] [<ffffffff8110d065>] sys_write+0x4a/0x6e
[ 9207.786808] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
[ 9207.786822]
[ 9207.786824] -> #0 (s_active){++++.+}:
[ 9207.786835] [<ffffffff8107956a>] __lock_acquire+0xa1d/0xd2b
[ 9207.786847] [<ffffffff8107a36b>] lock_acquire+0xe1/0x105
[ 9207.786857] [<ffffffff81163230>] sysfs_deactivate+0x8b/0xe0
[ 9207.786868] [<ffffffff811637f1>] sysfs_addrm_finish+0x36/0x55
[ 9207.786879] [<ffffffff81161c26>] sysfs_hash_and_remove+0x53/0x6a
[ 9207.786890] [<ffffffff811629fd>] sysfs_remove_file+0x15/0x17
[ 9207.786900] [<ffffffff812a650e>] device_remove_file+0x17/0x19
[ 9207.786911] [<ffffffff81213ef7>] pci_remove_sysfs_dev_files+0x6b/0x10c
[ 9207.786924] [<ffffffff8120dce1>] pci_stop_bus_device+0x55/0x83
[ 9207.786936] [<ffffffff8120dd99>] pci_remove_bus_device+0x1a/0xba
[ 9207.786947] [<ffffffff8120de5f>] pci_remove_behind_bridge+0x26/0x3f
[ 9207.786958] [<ffffffff812efb96>] cb_free+0x4a/0x4f
[ 9207.786969] [<ffffffff812ec682>] socket_shutdown+0x91/0xfd
[ 9207.786979] [<ffffffff812ec88a>] socket_remove+0x4e/0x57
[ 9207.786989] [<ffffffff812ed08b>] pccardd+0x19c/0x25f
[ 9207.787000] [<ffffffff810674dd>] kthread+0x7f/0x87
[ 9207.787011] [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[ 9207.787023]
[ 9207.787025] other info that might help us debug this:
[ 9207.787027]
[ 9207.787034] 1 lock held by pccardd/841:
[ 9207.787039] #0: (&socket->skt_mutex){+.+.+.}, at: [<ffffffff812ed04c>] pccardd+0x15d/0x25f
[ 9207.787058]
[ 9207.787060] stack backtrace:
[ 9207.787068] Pid: 841, comm: pccardd Not tainted 2.6.33-rc4-wl #8
[ 9207.787069] Call Trace:
[ 9207.787069] [<ffffffff8107871d>] print_circular_bug+0xa8/0xb6
[ 9207.787069] [<ffffffff8107956a>] __lock_acquire+0xa1d/0xd2b
[ 9207.787069] [<ffffffff811637f1>] ? sysfs_addrm_finish+0x36/0x55
[ 9207.787069] [<ffffffff8107a36b>] lock_acquire+0xe1/0x105
[ 9207.787069] [<ffffffff811637f1>] ? sysfs_addrm_finish+0x36/0x55
[ 9207.787069] [<ffffffff81163230>] sysfs_deactivate+0x8b/0xe0
[ 9207.787069] [<ffffffff811637f1>] ? sysfs_addrm_finish+0x36/0x55
[ 9207.787069] [<ffffffff8107760e>] ? trace_hardirqs_off+0xd/0xf
[ 9207.787069] [<ffffffff813b5897>] ? __mutex_unlock_slowpath+0x119/0x14e
[ 9207.787069] [<ffffffff811637f1>] sysfs_addrm_finish+0x36/0x55
[ 9207.787069] [<ffffffff81161c26>] sysfs_hash_and_remove+0x53/0x6a
[ 9207.787069] [<ffffffff811629fd>] sysfs_remove_file+0x15/0x17
[ 9207.787069] [<ffffffff812a650e>] device_remove_file+0x17/0x19
[ 9207.787069] [<ffffffff81213ef7>] pci_remove_sysfs_dev_files+0x6b/0x10c
[ 9207.787069] [<ffffffff8120dce1>] pci_stop_bus_device+0x55/0x83
[ 9207.787069] [<ffffffff8120dd99>] pci_remove_bus_device+0x1a/0xba
[ 9207.787069] [<ffffffff8120de5f>] pci_remove_behind_bridge+0x26/0x3f
[ 9207.787069] [<ffffffff812efb96>] cb_free+0x4a/0x4f
[ 9207.787069] [<ffffffff812ec682>] socket_shutdown+0x91/0xfd
[ 9207.787069] [<ffffffff812ec88a>] socket_remove+0x4e/0x57
[ 9207.787069] [<ffffffff812ed08b>] pccardd+0x19c/0x25f
[ 9207.787069] [<ffffffff812eceef>] ? pccardd+0x0/0x25f
[ 9207.787069] [<ffffffff810674dd>] kthread+0x7f/0x87
[ 9207.787069] [<ffffffff8100aa64>] kernel_thread_helper+0x4/0x10
[ 9207.787069] [<ffffffff813b8014>] ? restore_args+0x0/0x30
[ 9207.787069] [<ffffffff8106745e>] ? kthread+0x0/0x87
[ 9207.787069] [<ffffffff8100aa60>] ? kernel_thread_helper+0x0/0x10
[ 9207.788150] device: '0000:16:00.0': device_unregister
[ 9207.788166] PM: Removing info for pci:0000:16:00.0
[ 9207.788405] bus: 'pci': remove device 0000:16:00.0


2010-01-17 17:18:58

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Add lockdep annotations for the sysfs active reference

Ming Lei <[email protected]> writes:

> On Sat, 02 Jan 2010 13:37:12 -0800
> [email protected] (Eric W. Biederman) wrote:
>
>>
>> Holding locks over device_del -> kobject_del -> sysfs_deactivate can
>> cause deadlocks if those same locks are grabbed in sysfs show or store
>> methods.
>>
>> The I model s_active count + completion as a sleeping read/write lock.
>> I describe to lockdep sysfs_get_active as a read_trylock,
>> sysfs_put_active as a read_unlock, and sysfs_deactivate as a
>> write_lock and write_unlock pair. This seems to capture the essence
>> for purposes of finding deadlocks, and in my testing gives finds real
>> issues and ignores non-issues.
>>
>> This brings us back to holding locks over kobject_del is a problem
>> that ideally we should find a way of addressing, but at least lockdep
>> can tell us about the problems instead of requiring developers to
>> debug rare strange system deadlocks, that happen when sysfs files are
>> removed while being written to.
>
> The model has hit a possible deadlock in pcmcia, and the lockdep warning
> comes when I unplug my wlan card from pcmcia slot.
>
> Looks like socket->skt_mutex is held in remove path, and it is also
> grabbed in .stor method.


Looking a little closer this is simultaneously a legitimate problem
and also a false positive.

This is only legitimate if you add/remove a cardbus bridge, plugged into
another cardbus bridge, which I think is unlikely but physically possible.

Eric

2010-01-17 18:03:56

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [PATCH] sysfs: Add lockdep annotations for the sysfs active reference

Hey,

On Sun, Jan 17, 2010 at 09:18:34AM -0800, Eric W. Biederman wrote:
> Ming Lei <[email protected]> writes:
>
> > On Sat, 02 Jan 2010 13:37:12 -0800
> > [email protected] (Eric W. Biederman) wrote:
> >
> >>
> >> Holding locks over device_del -> kobject_del -> sysfs_deactivate can
> >> cause deadlocks if those same locks are grabbed in sysfs show or store
> >> methods.
> >>
> >> The I model s_active count + completion as a sleeping read/write lock.
> >> I describe to lockdep sysfs_get_active as a read_trylock,
> >> sysfs_put_active as a read_unlock, and sysfs_deactivate as a
> >> write_lock and write_unlock pair. This seems to capture the essence
> >> for purposes of finding deadlocks, and in my testing gives finds real
> >> issues and ignores non-issues.
> >>
> >> This brings us back to holding locks over kobject_del is a problem
> >> that ideally we should find a way of addressing, but at least lockdep
> >> can tell us about the problems instead of requiring developers to
> >> debug rare strange system deadlocks, that happen when sysfs files are
> >> removed while being written to.
> >
> > The model has hit a possible deadlock in pcmcia, and the lockdep warning
> > comes when I unplug my wlan card from pcmcia slot.
> >
> > Looks like socket->skt_mutex is held in remove path, and it is also
> > grabbed in .stor method.
>
>
> Looking a little closer this is simultaneously a legitimate problem
> and also a false positive.
>
> This is only legitimate if you add/remove a cardbus bridge, plugged into
> another cardbus bridge, which I think is unlikely but physically possible.

Unfortunately, it is not a false positive, as removing a PCMCIA device
racing with "pccardctl eject" seems to trigger this path as well. Patch is
being prepared...

Best,
Dominik