2010-11-24 00:45:39

by Andrew Morton

Subject: mmotm 2010-11-23-16-12 uploaded

The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to

http://userweb.kernel.org/~akpm/mmotm/

and will soon be available at

git://zen-kernel.org/kernel/mmotm.git

It contains the following patches against 2.6.37-rc3:

leds-fix-bug-with-reading-nas-ss4200-dmi-code.patch
include-linux-fsh-fix-userspace-build.patch
nommu-yield-cpu-while-disposing-vm.patch
uml-disable-winch-irq-before-freeing-handler-data.patch
arch-x86-kernel-entry_64s-fix-build-with-gas-2161.patch
memcg-fix-false-positive-vm_bug-on-non-smp.patch
memcg-fix-false-positive-vm_bug-on-non-smp-fix.patch
linux-next.patch
next-remove-localversion.patch
i-need-old-gcc.patch
aesni-nfg.patch
arch-alpha-kernel-systblss-remove-debug-check.patch
sgi-xpc-xpc-fails-to-discover-partitions-with-all-nasids-above-128.patch
fuse-fix-attributes-after-openo_trunc.patch
drivers-leds-leds-lp5521c-change-some-macros-to-functions.patch
drivers-leds-leds-lp5523c-change-some-macros-to-functions.patch
drivers-leds-leds-lp5521c-adjust-delays-and-add-comments-to-them.patch
drivers-leds-leds-lp5523c-adjust-delays-and-add-comments-to-them.patch
drivers-leds-leds-lp5521c-perform-sw-reset-before-detection.patch
drivers-leds-leds-lp5523c-perform-sw-reset-before-detection.patch
memcg-avoid-deadlock-between-move-charge-and-try_charge.patch
cgroups-make-swap-accounting-default-behavior-configurable.patch
cgroups-make-swap-accounting-default-behavior-configurable-update.patch
mm-page_allocc-fix-build_all_zonelist-where-percpu_alloc-is-wrongly-called-under-stop_machine_run.patch
mm-page_allocc-fix-build_all_zonelist-where-percpu_alloc-is-wrongly-called-under-stop_machine_run-cleanup.patch
mm-remove-call-to-find_vma-in-pagewalk-for-non-hugetlbfs.patch
pagemap-set-pagemap-walk-limit-to-pmd-boundary.patch
drivers-misc-isl29020c-remove-incorrect-kfree-in-isl29020_remove.patch
backlight-grab-ops_lock-before-testing-bd-ops.patch
reiserfs-fix-inode-mutex-reiserfs-lock-misordering.patch
scripts-fix-gfp-translate-for-recent-changes-to-gfph.patch
scripts-fix-gfp-translate-for-recent-changes-to-gfph-fix.patch
mm-vmap-area-cache.patch
arch-arm-plat-omap-iovmmc-fix-end-address-of-vm-area-comparation-in-alloc_iovm_area.patch
backlight-fix-88pm860x_bl-macro-collision.patch
cciss-fix-botched-tag-masking-for-scsi-tape-commands.patch
arch-x86-kernel-entry_32s-i386-too.patch
arch-x86-include-asm-fixmaph-mark-__set_fixmap_offset-as-__always_inline.patch
ibm_rtl-fix-printk-format-warning.patch
acerhdf-add-support-for-aspire-1410-bios-v13314.patch
arch-x86-kernel-apic-io_apicc-fix-warning.patch
x86-olpc-add-xo-1-suspend-resume-support.patch
fs-btrfs-inodec-eliminate-memory-leak.patch
btrfs-dont-dereference-extent_mapping-if-null.patch
cifs-dont-overwrite-dentry-name-in-d_revalidate.patch
cpufreq-fix-ondemand-governor-powersave_bias-execution-time-misuse.patch
drivers-dma-use-the-ccflag-y-instead-of-extra_cflags.patch
drivers-dma-ioat-use-the-ccflag-y-instead-of-extra_cflags.patch
jfs-dont-overwrite-dentry-name-in-d_revalidate.patch
powerpc-enable-arch_dma_addr_t_64bit-with-arch_phys_addr_t_64bit.patch
debugfs-remove-module_exit.patch
drivers-gpu-drm-radeon-atomc-fix-warning.patch
irq-use-per_cpu-kstat_irqs.patch
irq-use-per_cpu-kstat_irqs-checkpatch-fixes.patch
drivers-leds-leds-lp5521c-fix-potential-buffer-overflow.patch
leds-route-kbd-leds-through-the-generic-leds-layer.patch
mips-enable-arch_dma_addr_t_64bit-with-highmem-64bit_phys_addr-64bit.patch
isdn-capi-unregister-capictr-notifier-after-init-failure.patch
isdn-capi-make-kcapi-use-a-separate-workqueue.patch
drivers-video-backlight-l4f00242t03c-make-1-bit-signed-field-unsigned.patch
drivers-video-backlight-l4f00242t03c-full-implement-fb-power-states-for-this-lcd.patch
btusb-patch-add_apple_macbookpro62.patch
atmel_serial-fix-rts-high-after-initialization-in-rs485-mode.patch
atmel_serial-fix-rts-high-after-initialization-in-rs485-mode-fix.patch
drivers-message-fusion-mptsasc-fix-warning.patch
hpsa-remove-incorrect-redefinition-of-pci_device_id_hp_cissf.patch
drivers-block-makefile-replace-the-use-of-module-objs-with-module-y.patch
drivers-block-aoe-makefile-replace-the-use-of-module-objs-with-module-y.patch
vfs-remove-a-warning-on-open_fmode.patch
vfs-add-__fmode_exec.patch
n_hdlc-fix-read-and-write-locking.patch
n_hdlc-fix-read-and-write-locking-update.patch
mm.patch
mm-page-allocator-adjust-the-per-cpu-counter-threshold-when-memory-is-low.patch
mm-vmstat-use-a-single-setter-function-and-callback-for-adjusting-percpu-thresholds.patch
mm-vmstat-use-a-single-setter-function-and-callback-for-adjusting-percpu-thresholds-fix.patch
mm-vmstat-use-a-single-setter-function-and-callback-for-adjusting-percpu-thresholds-update.patch
mm-vmstat-use-a-single-setter-function-and-callback-for-adjusting-percpu-thresholds-fix-set_pgdat_percpu_threshold-dont-use-for_each_online_cpu.patch
mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure.patch
writeback-integrated-background-writeback-work.patch
writeback-trace-wakeup-event-for-background-writeback.patch
writeback-stop-background-kupdate-works-from-livelocking-other-works.patch
writeback-stop-background-kupdate-works-from-livelocking-other-works-update.patch
writeback-avoid-livelocking-wb_sync_all-writeback.patch
writeback-avoid-livelocking-wb_sync_all-writeback-update.patch
writeback-check-skipped-pages-on-wb_sync_all.patch
writeback-check-skipped-pages-on-wb_sync_all-update.patch
writeback-check-skipped-pages-on-wb_sync_all-update-fix.patch
writeback-io-less-balance_dirty_pages.patch
writeback-consolidate-variable-names-in-balance_dirty_pages.patch
writeback-per-task-rate-limit-on-balance_dirty_pages.patch
writeback-per-task-rate-limit-on-balance_dirty_pages-fix.patch
writeback-prevent-duplicate-balance_dirty_pages_ratelimited-calls.patch
writeback-account-per-bdi-accumulated-written-pages.patch
writeback-bdi-write-bandwidth-estimation.patch
writeback-bdi-write-bandwidth-estimation-fix.patch
writeback-show-bdi-write-bandwidth-in-debugfs.patch
writeback-quit-throttling-when-bdi-dirty-pages-dropped-low.patch
writeback-reduce-per-bdi-dirty-threshold-ramp-up-time.patch
writeback-make-reasonable-gap-between-the-dirty-background-thresholds.patch
writeback-scale-down-max-throttle-bandwidth-on-concurrent-dirtiers.patch
writeback-add-trace-event-for-balance_dirty_pages.patch
writeback-make-nr_to_write-a-per-file-limit.patch
writeback-make-nr_to_write-a-per-file-limit-fix.patch
sync_inode_metadata-fix-comment.patch
mm-page-writebackc-fix-__set_page_dirty_no_writeback-return-value.patch
vmscan-factor-out-kswapd-sleeping-logic-from-kswapd.patch
mm-find_get_pages_contig-fixlet.patch
fs-mpagec-consolidate-code.patch
fs-mpagec-consolidate-code-checkpatch-fixes.patch
mm-convert-sprintf_symbol-to-%ps.patch
mm-smaps-export-mlock-information.patch
mm-compaction-add-trace-events-for-memory-compaction-activity.patch
mm-vmscan-convert-lumpy_mode-into-a-bitmask.patch
mm-vmscan-reclaim-order-0-and-use-compaction-instead-of-lumpy-reclaim.patch
mm-vmscan-reclaim-order-0-and-use-compaction-instead-of-lumpy-reclaim-fix.patch
mm-migration-allow-migration-to-operate-asynchronously-and-avoid-synchronous-compaction-in-the-faster-path.patch
mm-migration-allow-migration-to-operate-asynchronously-and-avoid-synchronous-compaction-in-the-faster-path-fix.patch
mm-migration-cleanup-migrate_pages-api-by-matching-types-for-offlining-and-sync.patch
mm-compaction-perform-a-faster-migration-scan-when-migrating-asynchronously.patch
mm-vmscan-rename-lumpy_mode-to-reclaim_mode.patch
mm-deactivate-invalidated-pages.patch
mm-deactivate-invalidated-pages-fix.patch
mm-remove-unused-get_vm_area_node.patch
mm-remove-gfp-mask-from-pcpu_get_vm_areas.patch
mm-unify-module_alloc-code-for-vmalloc.patch
oom-allow-a-non-cap_sys_resource-proces-to-oom_score_adj-down.patch
mm-clear-pageerror-bit-in-msync-fsync.patch
frv-duplicate-output_buffer-of-e03.patch
frv-duplicate-output_buffer-of-e03-checkpatch-fixes.patch
hpet-factor-timer-allocate-from-open.patch
kernel-power-changed-makefile-to-use-proper-ccflag-flag.patch
um-mark-config_highmem-as-broken.patch
arch-um-drivers-linec-safely-iterate-over-list-of-winch-handlers.patch
kmsg_dump-constrain-mtdoops-and-ramoops-to-perform-their-actions-only-for-kmsg_dump_panic.patch
kmsg_dump-add-kmsg_dump-calls-to-the-reboot-halt-poweroff-and-emergency_restart-paths.patch
set_rtc_mmss-show-warning-message-only-once.patch
include-linux-kernelh-abs-fix-handling-of-32-bit-unsigneds-on-64-bit.patch
include-linux-kernelh-abs-fix-handling-of-32-bit-unsigneds-on-64-bit-fix.patch
add-the-common-dma_addr_t-typedef-to-include-linux-typesh.patch
dca-remove-unneeded-null-check.patch
scripts-get_maintainerpl-make-rolestats-the-default.patch
scripts-get_maintainerpl-use-git-fallback-more-often.patch
maintainers-intel-gfx-is-a-subscribers-only-mailing-list.patch
percpucounter-optimize-__percpu_counter_add-a-bit-through-the-use-of-this_cpu-operations.patch
drivers-mmc-host-omapc-use-resource_size.patch
drivers-mmc-host-omap_hsmmcc-use-resource_size.patch
scripts-checkpatchpl-add-check-for-multiple-terminating-semicolons-and-casts-of-vmalloc.patch
checkpatchpl-fix-cast-detection.patch
fs-select-fix-information-leak-to-userspace.patch
fs-select-fix-information-leak-to-userspace-fix.patch
epoll-convert-max_user_watches-to-long.patch
binfmt_elf-cleanups.patch
drivers-rtc-rtc-omapc-fix-a-memory-leak.patch
rtc-add-real-time-clock-driver-for-nvidia-tegra.patch
drivers-gpio-cs5535-gpioc-add-some-additional-cs5535-specific-gpio-functionality.patch
drivers-staging-olpc_dcon-convert-to-new-cs5535-gpio-api.patch
cyber2000fb-avoid-palette-corruption-at-higher-clocks.patch
jbd-remove-dependency-on-__gfp_nofail.patch
memcg-add-page_cgroup-flags-for-dirty-page-tracking.patch
memcg-document-cgroup-dirty-memory-interfaces.patch
memcg-document-cgroup-dirty-memory-interfaces-fix.patch
memcg-create-extensible-page-stat-update-routines.patch
memcg-add-lock-to-synchronize-page-accounting-and-migration.patch
memcg-use-zalloc-rather-than-mallocmemset.patch
fs-proc-basec-kernel-latencytopc-convert-sprintf_symbol-to-%ps.patch
fs-proc-basec-kernel-latencytopc-convert-sprintf_symbol-to-%ps-checkpatch-fixes.patch
proc-use-unsigned-long-inside-proc-statm.patch
exec_domain-establish-a-linux32-domain-on-config_compat-systems.patch
rapidio-use-common-destid-storage-for-endpoints-and-switches.patch
rapidio-integrate-rio_switch-into-rio_dev.patch
fs-execc-provide-the-correct-process-pid-to-the-pipe-helper.patch
nfc-driver-for-nxp-semiconductors-pn544-nfc-chip.patch
nfc-driver-for-nxp-semiconductors-pn544-nfc-chip-update.patch
remove-dma64_addr_t.patch
pps-trivial-fixes.patch
pps-declare-variables-where-they-are-used-in-switch.patch
pps-fix-race-in-pps_fetch-handler.patch
pps-unify-timestamp-gathering.patch
pps-access-pps-device-by-direct-pointer.patch
pps-convert-printk-pr_-to-dev_.patch
pps-move-idr-stuff-to-ppsc.patch
pps-add-async-pps-event-handler.patch
pps-add-async-pps-event-handler-fix.patch
pps-dont-disable-interrupts-when-using-spin-locks.patch
pps-use-bug_on-for-kernel-api-safety-checks.patch
pps-simplify-conditions-a-bit.patch
ntp-add-hardpps-implementation.patch
pps-capture-monotonic_raw-timestamps-as-well.patch
pps-add-kernel-consumer-support.patch
pps-add-parallel-port-pps-client.patch
pps-add-parallel-port-pps-signal-generator.patch
memstick-a-few-changes-to-core.patch
memstick-add-support-for-legacy-memorysticks.patch
memstick-add-driver-for-ricoh-r5c592-card-reader.patch
memstick-add-driver-for-ricoh-r5c592-card-reader-fix.patch
memstick-core-fix-device_register-error-handling.patch
w1-ds2423-counter-driver-and-documentation.patch
w1-ds2423-counter-driver-and-documentation-fix.patch
romfs-have-romfs_fsh-pull-in-necessary-headers.patch
decompressors-add-missing-init-ie-__init.patch
decompressors-get-rid-of-set_error_fn-macro.patch
decompressors-include-linux-slabh-in-linux-decompress-mmh.patch
decompressors-remove-unused-function-from-lib-decompress_unlzmac.patch
make-sure-nobodys-leaking-resources.patch
journal_add_journal_head-debug.patch
releasing-resources-with-children.patch
make-frame_pointer-default=y.patch
mutex-subsystem-synchro-test-module.patch
mutex-subsystem-synchro-test-module-add-missing-header-file.patch
slab-leaks3-default-y.patch
put_bh-debug.patch
add-debugging-aid-for-memory-initialisation-problems.patch
workaround-for-a-pci-restoring-bug.patch
prio_tree-debugging-patch.patch
single_open-seq_release-leak-diagnostics.patch
add-a-refcount-check-in-dput.patch
getblk-handle-2tb-devices.patch
memblock-add-input-size-checking-to-memblock_find_region.patch
memblock-add-input-size-checking-to-memblock_find_region-fix.patch


2010-11-24 04:53:38

by Valdis Klētnieks

Subject: mmotm 2010-11-23 - lockdep whinge in e1000e driver

On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
>
> http://userweb.kernel.org/~akpm/mmotm/

Whinges during boot while bringing up the ethernet interface:

[ 1.081504] ===================================================
[ 1.081507] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 1.081509] ---------------------------------------------------
[ 1.081512] include/linux/inetdevice.h:208 invoked rcu_dereference_check() without protection!
[ 1.081514]
[ 1.081515] other info that might help us debug this:
[ 1.081516]
[ 1.081518]
[ 1.081518] rcu_scheduler_active = 1, debug_locks = 1
[ 1.081521] 3 locks held by swapper/1:
[ 1.081523] #0: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff812d0b57>] device_lock+0xf/0x11
[ 1.081534] #1: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff812d0b57>] device_lock+0xf/0x11
[ 1.081541] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff8142dee8>] rtnl_lock+0x12/0x14
[ 1.081549]
[ 1.081550] stack backtrace:
[ 1.081553] Pid: 1, comm: swapper Not tainted 2.6.37-rc3-mmotm1123 #3
[ 1.081555] Call Trace:
[ 1.081562] [<ffffffff81069580>] lockdep_rcu_dereference+0x9d/0xa5
[ 1.081567] [<ffffffff8147b235>] __in_dev_get_rcu.clone.12+0x3f/0x47
[ 1.081571] [<ffffffff8147b24d>] inet_get_link_af_size+0x10/0x1f
[ 1.081575] [<ffffffff8142ce16>] if_nlmsg_size+0xd5/0x111
[ 1.081579] [<ffffffff8142ecf6>] rtmsg_ifinfo+0x1f/0xeb
[ 1.081584] [<ffffffff8105d78e>] ? raw_notifier_call_chain+0xf/0x11
[ 1.081589] [<ffffffff81421ee7>] register_netdevice+0x3ea/0x410
[ 1.081593] [<ffffffff81421f47>] register_netdev+0x3a/0x4c
[ 1.081599] [<ffffffff81551cc2>] e1000_probe+0x986/0xb6f
[ 1.081604] [<ffffffff81237b2e>] local_pci_probe+0x3f/0x70
[ 1.081608] [<ffffffff81237eae>] pci_device_probe+0x65/0x96
[ 1.081614] [<ffffffff8115a82a>] ? sysfs_create_link+0xe/0x10
[ 1.081617] [<ffffffff812d0fe0>] driver_probe_device+0xe8/0x182
[ 1.081621] [<ffffffff812d10c4>] __driver_attach+0x4a/0x6b
[ 1.081625] [<ffffffff812d107a>] ? __driver_attach+0x0/0x6b
[ 1.081629] [<ffffffff812d01cf>] bus_for_each_dev+0x57/0x83
[ 1.081633] [<ffffffff812d0ca5>] driver_attach+0x19/0x1b
[ 1.081637] [<ffffffff812d08e7>] bus_add_driver+0xae/0x205
[ 1.081641] [<ffffffff812d1324>] driver_register+0xb5/0x122
[ 1.081646] [<ffffffff81b455cb>] ? e1000_init_module+0x0/0x3e
[ 1.081650] [<ffffffff812380e4>] __pci_register_driver+0x61/0xcd
[ 1.081654] [<ffffffff81b455cb>] ? e1000_init_module+0x0/0x3e
[ 1.081658] [<ffffffff81b45607>] e1000_init_module+0x3c/0x3e
[ 1.081663] [<ffffffff810002ff>] do_one_initcall+0x7a/0x12f
[ 1.081668] [<ffffffff81b1fd08>] kernel_init+0x15d/0x1e7
[ 1.081672] [<ffffffff810035d4>] kernel_thread_helper+0x4/0x10
[ 1.081678] [<ffffffff8102f845>] ? finish_task_switch+0x3f/0xe3
[ 1.081682] [<ffffffff8155b5c0>] ? restore_args+0x0/0x30
[ 1.081686] [<ffffffff81b1fbab>] ? kernel_init+0x0/0x1e7
[ 1.081690] [<ffffffff810035d0>] ? kernel_thread_helper+0x0/0x10
[ 1.081731] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:24:e8:c6:ad:17


Attachments:
(No filename) (227.00 B)

2010-11-24 04:55:49

by Valdis Klētnieks

Subject: mmotm 2010-11-23 - WARNING: at drivers/tty/tty_io.c:1331

On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
>
> http://userweb.kernel.org/~akpm/mmotm/

Seen during boot:

[ 22.859616] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[ 23.015434] ------------[ cut here ]------------
[ 23.015443] WARNING: at drivers/tty/tty_io.c:1331 tty_open+0x2a2/0x49a()
[ 23.015446] Hardware name: Latitude E6500
[ 23.015448] Modules linked in:
[ 23.015453] Pid: 1207, comm: plymouthd Not tainted 2.6.37-rc3-mmotm1123 #3
[ 23.015455] Call Trace:
[ 23.015461] [<ffffffff8103b189>] warn_slowpath_common+0x80/0x98
[ 23.015465] [<ffffffff8103b1b6>] warn_slowpath_null+0x15/0x17
[ 23.015469] [<ffffffff8128a3ab>] tty_open+0x2a2/0x49a
[ 23.015475] [<ffffffff810fd53f>] chrdev_open+0x11d/0x146
[ 23.015479] [<ffffffff810fd422>] ? chrdev_open+0x0/0x146
[ 23.015483] [<ffffffff810f7b4c>] __dentry_open+0x31a/0x483
[ 23.015488] [<ffffffff810f88fe>] nameidata_to_filp+0x50/0x57
[ 23.015492] [<ffffffff81105e53>] do_last+0x448/0x5b2
[ 23.015497] [<ffffffff81229229>] ? __raw_spin_lock_init+0x31/0x50
[ 23.015501] [<ffffffff81106205>] do_filp_open+0x248/0x64a
[ 23.015507] [<ffffffff810a5c4d>] ? trace_preempt_on+0x15/0x28
[ 23.015511] [<ffffffff81110cda>] ? alloc_fd+0x17c/0x18e
[ 23.015516] [<ffffffff8155adfc>] ? _raw_spin_unlock+0x30/0x69
[ 23.015521] [<ffffffff8155e3b8>] ? sub_preempt_count+0x35/0x49
[ 23.015525] [<ffffffff81110cda>] ? alloc_fd+0x17c/0x18e
[ 23.015529] [<ffffffff810f8965>] do_sys_open+0x60/0xfb
[ 23.015533] [<ffffffff8155a64b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 23.015537] [<ffffffff810f8a1b>] sys_open+0x1b/0x1d
[ 23.015542] [<ffffffff8100277b>] system_call_fastpath+0x16/0x1b
[ 23.015545] ---[ end trace 12db3a7ab6675b51 ]---


Attachments:
(No filename) (227.00 B)

2010-11-24 05:02:01

by Valdis Klētnieks

Subject: mmotm 2010-11-23 + autogroups -> inconsistent lock state

On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
>
> http://userweb.kernel.org/~akpm/mmotm/

(I appear to be on a roll tonight - 3 splats before I even had a chance to log in. :)

mmotm + Ingo's cleanup of Mike's autogroups patch.

[ 114.569222] =================================
[ 114.578171] [ INFO: inconsistent lock state ]
[ 114.578171] 2.6.37-rc3-mmotm1123 #3
[ 114.578171] ---------------------------------
[ 114.578171] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 114.578171] kworker/0:0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 114.578171] (&(&sighand->siglock)->rlock){?.+...}, at: [<ffffffff8104bfb1>] __lock_task_sighand+0x88/0xd6
[ 114.578171] {HARDIRQ-ON-W} state was registered at:
[ 114.578171] [<ffffffff8106a9a9>] __lock_acquire+0x358/0xd4e
[ 114.578171] [<ffffffff8106b8b1>] lock_acquire+0x100/0x126
[ 114.578171] [<ffffffff8155a849>] _raw_spin_lock+0x36/0x45
[ 114.578171] [<ffffffff81030bc6>] sched_autogroup_fork+0x30/0x61
[ 114.578171] [<ffffffff8103995a>] copy_process+0x994/0x1325
[ 114.578171] [<ffffffff8103a4ca>] do_fork+0x1ae/0x3e3
[ 114.578171] [<ffffffff81009603>] kernel_thread+0x6b/0x6d
[ 114.578171] [<ffffffff8105832e>] kthreadd+0xdd/0x11f
[ 114.578171] [<ffffffff810035d4>] kernel_thread_helper+0x4/0x10
[ 114.578171] irq event stamp: 1137212
[ 114.578171] hardirqs last enabled at (1137209): [<ffffffff8155ae6f>] _raw_spin_unlock_irqrestore+0x3a/0x80
[ 114.578171] hardirqs last disabled at (1137210): [<ffffffff8155b467>] save_args+0x67/0x70
[ 114.578171] softirqs last enabled at (1137212): [<ffffffff810414f3>] _local_bh_enable+0xe/0x10
[ 114.578171] softirqs last disabled at (1137211): [<ffffffff81041edd>] irq_enter+0x3d/0x6f
[ 114.578171]
[ 114.578171] other info that might help us debug this:
[ 114.578171] 3 locks held by kworker/0:0/0:
[ 114.578171] #0: (&(&new_timer->it_lock)->rlock){-.....}, at: [<ffffffff81056e7b>] posix_timer_fn+0x24/0xc7
[ 114.578171] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81056d77>] rcu_read_lock+0x0/0x35
[ 114.578171] #2: (rcu_read_lock){.+.+..}, at: [<ffffffff8104a6e6>] rcu_read_lock+0x0/0x35
[ 114.578171]
[ 114.578171] stack backtrace:
[ 114.578171] Pid: 0, comm: kworker/0:0 Tainted: G W 2.6.37-rc3-mmotm1123 #3
[ 114.578171] Call Trace:
[ 114.578171] <IRQ> [<ffffffff8106a467>] valid_state+0x17c/0x18e
[ 114.578171] [<ffffffff81069d2c>] ? check_usage_forwards+0x0/0x87
[ 114.578171] [<ffffffff8106a558>] mark_lock+0xdf/0x1d8
[ 114.578171] [<ffffffff81069d2c>] ? check_usage_forwards+0x0/0x87
[ 114.578171] [<ffffffff8106a928>] __lock_acquire+0x2d7/0xd4e
[ 114.578171] [<ffffffff8106a4a6>] ? mark_lock+0x2d/0x1d8
[ 114.578171] [<ffffffff8104bfb1>] ? __lock_task_sighand+0x88/0xd6
[ 114.578171] [<ffffffff8106b8b1>] lock_acquire+0x100/0x126
[ 114.578171] [<ffffffff8104bfb1>] ? __lock_task_sighand+0x88/0xd6
[ 114.578171] [<ffffffff8155a942>] _raw_spin_lock_irqsave+0x44/0x57
[ 114.578171] [<ffffffff8104bfb1>] ? __lock_task_sighand+0x88/0xd6
[ 114.578171] [<ffffffff8104bfb1>] __lock_task_sighand+0x88/0xd6
[ 114.578171] [<ffffffff8104c6b3>] send_sigqueue+0x51/0x162
[ 114.578171] [<ffffffff81056e42>] posix_timer_event+0x3f/0x54
[ 114.578171] [<ffffffff81056ea1>] posix_timer_fn+0x4a/0xc7
[ 114.578171] [<ffffffff812294fd>] ? do_raw_spin_unlock+0xd0/0xfa
[ 114.578171] [<ffffffff8105bb7e>] __run_hrtimer+0x13e/0x27a
[ 114.578171] [<ffffffff81056e57>] ? posix_timer_fn+0x0/0xc7
[ 114.578171] [<ffffffff8105c5f3>] hrtimer_interrupt+0xea/0x1d6
[ 114.578171] [<ffffffff8101ad4f>] smp_apic_timer_interrupt+0x74/0x87
[ 114.578171] [<ffffffff81003193>] apic_timer_interrupt+0x13/0x20
[ 114.578171] <EOI> [<ffffffff81000cf5>] ? cpu_idle+0x42/0x14e
[ 114.578171] [<ffffffff81000dd5>] ? cpu_idle+0x122/0x14e
[ 114.578171] [<ffffffff81b57170>] start_secondary+0x1a9/0x1ad


Attachments:
(No filename) (227.00 B)

2010-11-24 13:56:08

by Zimny Lech

Subject: Re: mmotm 2010-11-23-16-12 uploaded

Ave

2010/11/24 <[email protected]>:
> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to

So far, so good - eight builds and one error (AFAICS a known issue)

'make CONFIG_DEBUG_SECTION_MISMATCH=y'
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
drivers/built-in.o: In function `timblogiw_close':
/home/test/linux-2.6-mm/drivers/media/video/timblogiw.c:704: undefined reference to `dma_release_channel'
drivers/built-in.o: In function `buffer_release':
/home/test/linux-2.6-mm/drivers/media/video/timblogiw.c:595: undefined reference to `dma_sync_wait'
drivers/built-in.o: In function `timblogiw_open':
/home/test/linux-2.6-mm/drivers/media/video/timblogiw.c:671: undefined reference to `__dma_request_channel'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [sub-make] Error 2






--
Glory!
N.P.S.

Glory to you, Satan, praise in the heights
Of Heaven, where you reigned; glory in the depths
Of Hell, where, vanquished, you abide in proud silence!
Grant that my soul may rest with You in the shade
Of the Tree of Knowledge, when it spreads its boughs
Like the vault of a church that shall not pass away!

2010-11-24 18:51:48

by Randy Dunlap

Subject: Re: mmotm 2010-11-23-16-12 uploaded (olpc)

On Tue, 23 Nov 2010 16:13:06 -0800 [email protected] wrote:

> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
>
> http://userweb.kernel.org/~akpm/mmotm/
>
> and will soon be available at
>
> git://zen-kernel.org/kernel/mmotm.git


make[4]: *** No rule to make target `arch/x86/platform/olpc/olpc-xo1-wakeup.c', needed by `arch/x86/platform/olpc/olpc-xo1-wakeup.o'.


The source file is olpc-xo1-wakeup.S, so I guess it needs a special Makefile rule?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2010-11-24 19:13:58

by Andres Salomon

Subject: Re: mmotm 2010-11-23-16-12 uploaded (olpc)

On Wed, 24 Nov 2010 10:51:26 -0800
Randy Dunlap <[email protected]> wrote:

> On Tue, 23 Nov 2010 16:13:06 -0800 [email protected] wrote:
>
> > The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
> >
> > http://userweb.kernel.org/~akpm/mmotm/
> >
> > and will soon be available at
> >
> > git://zen-kernel.org/kernel/mmotm.git
>
>
> make[4]: *** No rule to make target
> `arch/x86/platform/olpc/olpc-xo1-wakeup.c', needed by
> `arch/x86/platform/olpc/olpc-xo1-wakeup.o'.
>
>
> It's olpc-xo1-wakeup.S, so I guess it needs a special makefile rule ??
>

I had trouble with this as well (and after flailing at it a bit, ended
up just dropping the olpc pm stuff from my tree for now). The build
failure is definitely config-specific. I suspected that it needs
something like the following, but failed to figure it out:

foo-y := olpc-xo1-wakeup.o
obj-$(CONFIG_OLPC_XO1) += olpc-xo1.o foo.o

2010-11-24 19:42:21

by Randy Dunlap

Subject: [PATCH -mmotm/-next] media: fix timblogiw kconfig & build error

From: Randy Dunlap <[email protected]>

timblogiw uses dma_*() interfaces and selects TIMB_DMA for that
support. However, drivers/dma/ is not built unless
CONFIG_DMA_ENGINE is enabled, so select/enable that symbol as well.

drivers/built-in.o: In function `timblogiw_close':
timblogiw.c:(.text+0x4419fe): undefined reference to `dma_release_channel'
drivers/built-in.o: In function `buffer_release':
timblogiw.c:(.text+0x441a8d): undefined reference to `dma_sync_wait'
drivers/built-in.o: In function `timblogiw_open':
timblogiw.c:(.text+0x44212b): undefined reference to `__dma_request_channel'

Signed-off-by: Randy Dunlap <[email protected]>
---
drivers/media/video/Kconfig | 1 +
1 file changed, 1 insertion(+)

--- mmotm-2010-1123-1612.orig/drivers/media/video/Kconfig
+++ mmotm-2010-1123-1612/drivers/media/video/Kconfig
@@ -669,6 +669,7 @@ config VIDEO_HEXIUM_GEMINI
config VIDEO_TIMBERDALE
tristate "Support for timberdale Video In/LogiWIN"
depends on VIDEO_V4L2 && I2C
+ select DMA_ENGINE
select TIMB_DMA
select VIDEO_ADV7180
select VIDEOBUF_DMA_CONTIG

2010-11-24 20:25:46

by Mike Galbraith

Subject: Re: mmotm 2010-11-23 + autogroups -> inconsistent lock state

On Wed, 2010-11-24 at 00:01 -0500, [email protected] wrote:
> On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> > The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
> >
> > http://userweb.kernel.org/~akpm/mmotm/
>
> (I appear to be on a roll tonight - 3 splats before I even had a chance to login. :)
>
> mmotm + Ingo's cleanup of Mike's autogroups patch.
...

Sorry for the slow response; I've been trying to use some of my last few
vacation days on vacation stuff ;-)

The patch below should run gripe-free. I suppose I should learn to turn on
lockdep and whatnot when tinkering/testing.

Unfortunately, tip's update_shares() changes are still being difficult.

static void update_shares(int cpu)
{
	struct cfs_rq *cfs_rq;
	struct rq *rq = cpu_rq(cpu);

	rcu_read_lock();
	for_each_leaf_cfs_rq(rq, cfs_rq)
		update_shares_cpu(cfs_rq->tg, cpu);
	rcu_read_unlock();
}

Despite task groups being freed via RCU, update_shares_cpu() hits freed
memory and explodes, and nothing I've tried has been able to stop it.
The only thing I haven't tried (aside from the right thing ;) is to take
RCU out of the picture entirely.


From: Mike Galbraith <[email protected]>
Date: Sat, 20 Nov 2010 12:35:00 -0700
Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups

A recurring complaint from CFS users is that parallel kbuild has a negative
impact on desktop interactivity. This patch implements an idea from Linus:
automatically create task groups. Currently, only per-session autogroups
are implemented, but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited pointer to
a refcounted autogroup struct containing a task group pointer, with all
tasks pointing to the init_task_group by default. When a task calls setsid(),
a new task group is created, the process is moved into the new task group,
and a reference to the previous task group is dropped. Child processes
inherit this task group thereafter and increase its refcount. When the
last thread of a process exits, the process's reference is dropped, so
that when the last process referencing an autogroup exits, the autogroup
is destroyed.
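To make these lifetime rules concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not code from the patch: struct signal_model, model_fork(), model_setsid() and model_exit() are hypothetical names, the refcount is a plain int rather than an atomic or kref, and there is no locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins: not the kernel's structures. */
struct autogroup {
	int refcount;	/* signal structs currently pointing at this group */
};

struct signal_model {
	struct autogroup *autogroup;	/* inherited across fork() */
};

static int groups_alive;	/* live autogroups, to observe destruction */

static struct autogroup *autogroup_create(void)
{
	struct autogroup *ag = malloc(sizeof(*ag));

	if (!ag)
		abort();
	ag->refcount = 1;	/* the creator's reference */
	groups_alive++;
	return ag;
}

static struct autogroup *autogroup_get(struct autogroup *ag)
{
	ag->refcount++;
	return ag;
}

static void autogroup_put(struct autogroup *ag)
{
	assert(ag->refcount > 0);
	if (--ag->refcount == 0) {	/* last reference gone: destroy */
		groups_alive--;
		free(ag);
	}
}

/* fork(): the child inherits the parent's autogroup and pins it. */
static void model_fork(struct signal_model *child,
		       const struct signal_model *parent)
{
	child->autogroup = autogroup_get(parent->autogroup);
}

/* setsid(): move to a brand-new group, then drop the old reference. */
static void model_setsid(struct signal_model *sig)
{
	struct autogroup *prev = sig->autogroup;

	sig->autogroup = autogroup_create();
	autogroup_put(prev);
}

/* Last thread of the process exits: drop the process's reference. */
static void model_exit(struct signal_model *sig)
{
	autogroup_put(sig->autogroup);
	sig->autogroup = NULL;
}
```

For example, a shell that calls model_setsid() and then model_fork()s a make process ends up sharing a fresh group with it; that group is freed only after both have gone through model_exit(), while the default group survives as long as anything still references it.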

At runqueue selection time, IFF a task has no cgroup assignment, its current
autogroup is used.
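A simplified C model of that selection rule follows, assuming "no cgroup assignment" means the task is still in the root task group (the real autogroup_task_group() called from task_group() may check additional conditions). struct task_model, effective_task_group() and autogroup_enabled are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler's structures. */
struct task_group {
	const char *name;
};

struct autogroup {
	struct task_group *tg;	/* the session's own task group */
};

struct task_model {
	struct task_group *cgroup_tg;	/* cpu cgroup group; root if unassigned */
	struct autogroup *autogroup;	/* inherited per-session autogroup */
};

static struct task_group root_task_group = { "root" };
static bool autogroup_enabled = true;	/* models sched_autogroup_enabled */

/*
 * Group used at runqueue selection time: a task left in the root group
 * (no cgroup assignment) is redirected to its session's autogroup while
 * the feature is enabled; an explicit cgroup assignment always wins.
 */
static struct task_group *effective_task_group(const struct task_model *p)
{
	assert(p->cgroup_tg != NULL);	/* every task belongs to some group */
	if (autogroup_enabled && p->cgroup_tg == &root_task_group && p->autogroup)
		return p->autogroup->tg;
	return p->cgroup_tg;
}
```

Turning autogroup_enabled off sends unassigned tasks back to the root group without touching their autogroup pointers, which mirrors the on-the-fly toggle described in the changelog.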

Autogroup bandwidth is controllable by setting its nice level through the
proc filesystem. cat /proc/<pid>/autogroup displays the task's group and the
group's nice level. echo <nice level> > /proc/<pid>/autogroup sets the task
group's shares to the weight of a nice <level> task. Setting the nice level is
rate-limited for !admin users due to the abuse risk of task group locking.
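As a rough illustration of setting a group's shares from a nice level, the sketch below maps nice to a weight using the nice-to-weight table from kernel/sched.c of this era (prio_to_weight[], where nice 0 is 1024 and each nice step changes the weight by roughly 1.25x). That the set-nice path indexes exactly this table is an assumption based on the changelog wording; autogroup_shares_for_nice() is a hypothetical helper, and the rate limiting for !admin users is not modelled.

```c
#include <assert.h>

/* Nice-to-weight table as in kernel/sched.c (prio_to_weight[]). */
static const unsigned long prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

/*
 * Shares a group would get for "echo <nice> > /proc/<pid>/autogroup",
 * assuming the group is given the weight of a single task at that nice.
 */
static unsigned long autogroup_shares_for_nice(int nice)
{
	assert(nice >= -20 && nice <= 19);	/* valid nice range only */
	return prio_to_weight[nice + 20];
}
```

So a session renice'd to 5 would compete for CPU like a single nice-5 task (weight 335) against other groups at the default weight of 1024.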

The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
selected, but it can be disabled via the boot option noautogroup, and it can
also be turned on/off on the fly via

echo [01] > /proc/sys/kernel/sched_autogroup_enabled

which will automatically move tasks to/from the root task group.

Signed-off-by: Mike Galbraith <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Markus Trippelsdorf <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/kernel-parameters.txt | 2
fs/proc/base.c | 79 +++++++++++
include/linux/sched.h | 23 +++
init/Kconfig | 12 +
kernel/fork.c | 5
kernel/sched.c | 13 +
kernel/sched_autogroup.c | 240 ++++++++++++++++++++++++++++++++++++
kernel/sched_autogroup.h | 23 +++
kernel/sched_debug.c | 29 ++--
kernel/sys.c | 4
kernel/sysctl.c | 11 +
11 files changed, 423 insertions(+), 18 deletions(-)

Index: linux-2.6.37.git/include/linux/sched.h
===================================================================
--- linux-2.6.37.git.orig/include/linux/sched.h
+++ linux-2.6.37.git/include/linux/sched.h
@@ -509,6 +509,8 @@ struct thread_group_cputimer {
spinlock_t lock;
};

+struct autogroup;
+
/*
* NOTE! "signal_struct" does not have it's own
* locking, because a shared signal_struct always
@@ -576,6 +578,9 @@ struct signal_struct {

struct tty_struct *tty; /* NULL if no tty */

+#ifdef CONFIG_SCHED_AUTOGROUP
+ struct autogroup *autogroup;
+#endif
/*
* Cumulative resource counters for dead threads in the group,
* and for reaped dead child processes forked by this group.
@@ -1931,6 +1936,24 @@ int sched_rt_handler(struct ctl_table *t

extern unsigned int sysctl_sched_compat_yield;

+#ifdef CONFIG_SCHED_AUTOGROUP
+extern unsigned int sysctl_sched_autogroup_enabled;
+
+extern void sched_autogroup_create_attach(struct task_struct *p);
+extern void sched_autogroup_detach(struct task_struct *p);
+extern void sched_autogroup_fork(struct signal_struct *sig);
+extern void sched_autogroup_exit(struct signal_struct *sig);
+#ifdef CONFIG_PROC_FS
+extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
+extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
+#endif
+#else
+static inline void sched_autogroup_create_attach(struct task_struct *p) { }
+static inline void sched_autogroup_detach(struct task_struct *p) { }
+static inline void sched_autogroup_fork(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+#endif
+
#ifdef CONFIG_RT_MUTEXES
extern int rt_mutex_getprio(struct task_struct *p);
extern void rt_mutex_setprio(struct task_struct *p, int prio);
Index: linux-2.6.37.git/kernel/sched.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched.c
+++ linux-2.6.37.git/kernel/sched.c
@@ -78,6 +78,7 @@

#include "sched_cpupri.h"
#include "workqueue_sched.h"
+#include "sched_autogroup.h"

#define CREATE_TRACE_POINTS
#include <trace/events/sched.h>
@@ -268,6 +269,10 @@ struct task_group {
struct task_group *parent;
struct list_head siblings;
struct list_head children;
+
+#ifdef CONFIG_SCHED_AUTOGROUP
+ struct autogroup *autogroup;
+#endif
};

#define root_task_group init_task_group
@@ -605,11 +610,14 @@ static inline int cpu_of(struct rq *rq)
*/
static inline struct task_group *task_group(struct task_struct *p)
{
+ struct task_group *tg;
struct cgroup_subsys_state *css;

css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
lockdep_is_held(&task_rq(p)->lock));
- return container_of(css, struct task_group, css);
+ tg = container_of(css, struct task_group, css);
+
+ return autogroup_task_group(p, tg);
}

/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
@@ -2006,6 +2014,7 @@ static void sched_irq_time_avg_update(st
#include "sched_idletask.c"
#include "sched_fair.c"
#include "sched_rt.c"
+#include "sched_autogroup.c"
#include "sched_stoptask.c"
#ifdef CONFIG_SCHED_DEBUG
# include "sched_debug.c"
@@ -7979,7 +7988,7 @@ void __init sched_init(void)
#ifdef CONFIG_CGROUP_SCHED
list_add(&init_task_group.list, &task_groups);
INIT_LIST_HEAD(&init_task_group.children);
-
+ autogroup_init(&init_task);
#endif /* CONFIG_CGROUP_SCHED */

#if defined CONFIG_FAIR_GROUP_SCHED && defined CONFIG_SMP
Index: linux-2.6.37.git/kernel/fork.c
===================================================================
--- linux-2.6.37.git.orig/kernel/fork.c
+++ linux-2.6.37.git/kernel/fork.c
@@ -174,8 +174,10 @@ static inline void free_signal_struct(st

static inline void put_signal_struct(struct signal_struct *sig)
{
- if (atomic_dec_and_test(&sig->sigcnt))
+ if (atomic_dec_and_test(&sig->sigcnt)) {
+ sched_autogroup_exit(sig);
free_signal_struct(sig);
+ }
}

void __put_task_struct(struct task_struct *tsk)
@@ -904,6 +906,7 @@ static int copy_signal(unsigned long clo
posix_cpu_timers_init_group(sig);

tty_audit_fork(sig);
+ sched_autogroup_fork(sig);

sig->oom_adj = current->signal->oom_adj;
sig->oom_score_adj = current->signal->oom_score_adj;
Index: linux-2.6.37.git/kernel/sys.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sys.c
+++ linux-2.6.37.git/kernel/sys.c
@@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
err = session;
out:
write_unlock_irq(&tasklist_lock);
- if (err > 0)
+ if (err > 0) {
proc_sid_connector(group_leader);
+ sched_autogroup_create_attach(group_leader);
+ }
return err;
}

Index: linux-2.6.37.git/kernel/sched_debug.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched_debug.c
+++ linux-2.6.37.git/kernel/sched_debug.c
@@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
}
#endif

+#if defined(CONFIG_CGROUP_SCHED) && \
+ (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+static void task_group_path(struct task_group *tg, char *buf, int buflen)
+{
+ /* may be NULL if the underlying cgroup isn't fully-created yet */
+ if (!tg->css.cgroup) {
+ if (!autogroup_path(tg, buf, buflen))
+ buf[0] = '\0';
+ return;
+ }
+ cgroup_path(tg->css.cgroup, buf, buflen);
+}
+#endif
+
static void
print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
{
@@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
char path[64];

rcu_read_lock();
- cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
+ task_group_path(task_group(p), path, sizeof(path));
rcu_read_unlock();
SEQ_printf(m, " %s", path);
}
@@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
read_unlock_irqrestore(&tasklist_lock, flags);
}

-#if defined(CONFIG_CGROUP_SCHED) && \
- (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
-static void task_group_path(struct task_group *tg, char *buf, int buflen)
-{
- /* may be NULL if the underlying cgroup isn't fully-created yet */
- if (!tg->css.cgroup) {
- buf[0] = '\0';
- return;
- }
- cgroup_path(tg->css.cgroup, buf, buflen);
-}
-#endif
-
void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
Index: linux-2.6.37.git/fs/proc/base.c
===================================================================
--- linux-2.6.37.git.orig/fs/proc/base.c
+++ linux-2.6.37.git/fs/proc/base.c
@@ -1407,6 +1407,82 @@ static const struct file_operations proc

#endif

+#ifdef CONFIG_SCHED_AUTOGROUP
+/*
+ * Print out autogroup related information:
+ */
+static int sched_autogroup_show(struct seq_file *m, void *v)
+{
+ struct inode *inode = m->private;
+ struct task_struct *p;
+
+ p = get_proc_task(inode);
+ if (!p)
+ return -ESRCH;
+ proc_sched_autogroup_show_task(p, m);
+
+ put_task_struct(p);
+
+ return 0;
+}
+
+static ssize_t
+sched_autogroup_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *offset)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ struct task_struct *p;
+ char buffer[PROC_NUMBUF];
+ long nice;
+ int err;
+
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count))
+ return -EFAULT;
+
+ err = strict_strtol(strstrip(buffer), 0, &nice);
+ if (err)
+ return -EINVAL;
+
+ p = get_proc_task(inode);
+ if (!p)
+ return -ESRCH;
+
+ err = nice;
+ err = proc_sched_autogroup_set_nice(p, &err);
+ if (err)
+ count = err;
+
+ put_task_struct(p);
+
+ return count;
+}
+
+static int sched_autogroup_open(struct inode *inode, struct file *filp)
+{
+ int ret;
+
+ ret = single_open(filp, sched_autogroup_show, NULL);
+ if (!ret) {
+ struct seq_file *m = filp->private_data;
+
+ m->private = inode;
+ }
+ return ret;
+}
+
+static const struct file_operations proc_pid_sched_autogroup_operations = {
+ .open = sched_autogroup_open,
+ .read = seq_read,
+ .write = sched_autogroup_write,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
+
static ssize_t comm_write(struct file *file, const char __user *buf,
size_t count, loff_t *offset)
{
@@ -2733,6 +2809,9 @@ static const struct pid_entry tgid_base_
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
+#ifdef CONFIG_SCHED_AUTOGROUP
+ REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
+#endif
REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
INF("syscall", S_IRUSR, proc_pid_syscall),
Index: linux-2.6.37.git/kernel/sched_autogroup.h
===================================================================
--- /dev/null
+++ linux-2.6.37.git/kernel/sched_autogroup.h
@@ -0,0 +1,23 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg);
+
+#else /* !CONFIG_SCHED_AUTOGROUP */
+
+static inline void autogroup_init(struct task_struct *init_task) { }
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+ return tg;
+}
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+ return 0;
+}
+#endif
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6.37.git/kernel/sched_autogroup.c
===================================================================
--- /dev/null
+++ linux-2.6.37.git/kernel/sched_autogroup.c
@@ -0,0 +1,240 @@
+#ifdef CONFIG_SCHED_AUTOGROUP
+
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/utsname.h>
+
+unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+
+struct autogroup {
+ struct kref kref;
+ struct task_group *tg;
+ struct rw_semaphore lock;
+ unsigned long id;
+ int nice;
+};
+
+static struct autogroup autogroup_default;
+static atomic_t autogroup_seq_nr;
+
+static void autogroup_init(struct task_struct *init_task)
+{
+ autogroup_default.tg = &init_task_group;
+ init_task_group.autogroup = &autogroup_default;
+ kref_init(&autogroup_default.kref);
+ init_rwsem(&autogroup_default.lock);
+ init_task->signal->autogroup = &autogroup_default;
+}
+
+static inline void autogroup_free(struct task_group *tg)
+{
+ kfree(tg->autogroup);
+}
+
+static inline void autogroup_destroy(struct kref *kref)
+{
+ struct autogroup *ag = container_of(kref, struct autogroup, kref);
+
+ sched_destroy_group(ag->tg);
+}
+
+static inline void autogroup_kref_put(struct autogroup *ag)
+{
+ kref_put(&ag->kref, autogroup_destroy);
+}
+
+static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
+{
+ kref_get(&ag->kref);
+ return ag;
+}
+
+static inline struct autogroup *autogroup_create(void)
+{
+ struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+ struct task_group *tg;
+
+ if (!ag)
+ goto out_fail;
+
+ tg = sched_create_group(&init_task_group);
+
+ if (IS_ERR(tg))
+ goto out_free;
+
+ kref_init(&ag->kref);
+ init_rwsem(&ag->lock);
+ ag->id = atomic_inc_return(&autogroup_seq_nr);
+ ag->tg = tg;
+ tg->autogroup = ag;
+
+ return ag;
+
+out_free:
+ kfree(ag);
+out_fail:
+ if (printk_ratelimit())
+ printk(KERN_WARNING "autogroup_create: %s failure.\n",
+ ag ? "sched_create_group()" : "kmalloc()");
+
+ return autogroup_kref_get(&autogroup_default);
+}
+
+static inline bool
+task_wants_autogroup(struct task_struct *p, struct task_group *tg)
+{
+ if (tg != &root_task_group)
+ return false;
+
+ if (p->sched_class != &fair_sched_class)
+ return false;
+
+ /*
+ * We can only assume the task group can't go away on us if
+ * autogroup_move_group() can see us on ->thread_group list.
+ */
+ if (p->flags & PF_EXITING)
+ return false;
+
+ return true;
+}
+
+static inline struct task_group *
+autogroup_task_group(struct task_struct *p, struct task_group *tg)
+{
+ int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
+
+ if (enabled && task_wants_autogroup(p, tg))
+ return p->signal->autogroup->tg;
+
+ return tg;
+}
+
+static void
+autogroup_move_group(struct task_struct *p, struct autogroup *ag)
+{
+ struct autogroup *prev;
+ struct task_struct *t;
+ unsigned long flags;
+
+ if (!lock_task_sighand(p, &flags)) {
+ WARN_ON(1);
+ return;
+ }
+
+ prev = p->signal->autogroup;
+ if (prev == ag) {
+ unlock_task_sighand(p, &flags);
+ return;
+ }
+
+ p->signal->autogroup = autogroup_kref_get(ag);
+ t = p;
+
+ do {
+ sched_move_task(t);
+ } while_each_thread(p, t);
+
+ unlock_task_sighand(p, &flags);
+ autogroup_kref_put(prev);
+}
+
+/* Allocates GFP_KERNEL, cannot be called under any spinlock */
+void sched_autogroup_create_attach(struct task_struct *p)
+{
+ struct autogroup *ag = autogroup_create();
+
+ autogroup_move_group(p, ag);
+ /* drop extra reference added by autogroup_create() */
+ autogroup_kref_put(ag);
+}
+EXPORT_SYMBOL(sched_autogroup_create_attach);
+
+/* Cannot be called under siglock. Currently has no users */
+void sched_autogroup_detach(struct task_struct *p)
+{
+ autogroup_move_group(p, &autogroup_default);
+}
+EXPORT_SYMBOL(sched_autogroup_detach);
+
+void sched_autogroup_fork(struct signal_struct *sig)
+{
+ struct task_struct *p = current;
+
+ spin_lock_irq(&p->sighand->siglock);
+ sig->autogroup = autogroup_kref_get(p->signal->autogroup);
+ spin_unlock_irq(&p->sighand->siglock);
+}
+
+void sched_autogroup_exit(struct signal_struct *sig)
+{
+ autogroup_kref_put(sig->autogroup);
+}
+
+static int __init setup_autogroup(char *str)
+{
+ sysctl_sched_autogroup_enabled = 0;
+
+ return 1;
+}
+
+__setup("noautogroup", setup_autogroup);
+
+#ifdef CONFIG_PROC_FS
+
+/* Takes sleeping locks (ag->lock); must not be called with siglock held. */
+int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
+{
+ static unsigned long next = INITIAL_JIFFIES;
+ struct autogroup *ag;
+ int err;
+
+ if (*nice < -20 || *nice > 19)
+ return -EINVAL;
+
+ err = security_task_setnice(current, *nice);
+ if (err)
+ return err;
+
+ if (*nice < 0 && !can_nice(current, *nice))
+ return -EPERM;
+
+ /* this is a heavy operation taking global locks.. */
+ if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
+ return -EAGAIN;
+
+ next = HZ / 10 + jiffies;
+ ag = autogroup_kref_get(p->signal->autogroup);
+
+ down_write(&ag->lock);
+ err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
+ if (!err)
+ ag->nice = *nice;
+ up_write(&ag->lock);
+
+ autogroup_kref_put(ag);
+
+ return err;
+}
+
+void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
+{
+ struct autogroup *ag = autogroup_kref_get(p->signal->autogroup);
+
+ down_read(&ag->lock);
+ seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
+ up_read(&ag->lock);
+
+ autogroup_kref_put(ag);
+}
+#endif /* CONFIG_PROC_FS */
+
+#ifdef CONFIG_SCHED_DEBUG
+static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
+{
+ return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
+#endif /* CONFIG_SCHED_AUTOGROUP */
Index: linux-2.6.37.git/kernel/sysctl.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sysctl.c
+++ linux-2.6.37.git/kernel/sysctl.c
@@ -382,6 +382,17 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+#ifdef CONFIG_SCHED_AUTOGROUP
+ {
+ .procname = "sched_autogroup_enabled",
+ .data = &sysctl_sched_autogroup_enabled,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+#endif
#ifdef CONFIG_PROVE_LOCKING
{
.procname = "prove_locking",
Index: linux-2.6.37.git/init/Kconfig
===================================================================
--- linux-2.6.37.git.orig/init/Kconfig
+++ linux-2.6.37.git/init/Kconfig
@@ -728,6 +728,18 @@ config NET_NS

endif # NAMESPACES

+config SCHED_AUTOGROUP
+ bool "Automatic process group scheduling"
+ select CGROUPS
+ select CGROUP_SCHED
+ select FAIR_GROUP_SCHED
+ help
+ This option optimizes the scheduler for common desktop workloads by
+ automatically creating and populating task groups. This separation
+ of workloads isolates aggressive CPU burners (like build jobs) from
+ desktop applications. Task group autogeneration is currently based
+ upon task session.
+
config MM_OWNER
bool

Index: linux-2.6.37.git/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.37.git.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.37.git/Documentation/kernel-parameters.txt
@@ -1622,6 +1622,8 @@ and is between 256 and 4096 characters.
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.

+ noautogroup Disable scheduler automatic task group creation.
+
nobats [PPC] Do not use BATs for mapping kernel lowmem
on "Classic" PPC cores.

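The /proc/<pid>/autogroup write path above turns a nice value into group shares with prio_to_weight[*nice + 20]. The following user-space sketch models only that arithmetic. The helper name autogroup_nice_to_shares is hypothetical, and the security, can_nice() and rate-limit checks from proc_sched_autogroup_set_nice() are deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Nice-to-weight table as found in kernel/sched.c (prio_to_weight[]):
 * index 0 is nice -20, index 20 is nice 0, index 39 is nice 19.
 */
static const int nice_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/*
 * Hypothetical helper: applies the same range check as
 * proc_sched_autogroup_set_nice() and returns the weight that would be
 * handed to sched_group_set_shares(), or -EINVAL for an out-of-range value.
 * Permission and rate-limit checks are intentionally omitted.
 */
static int autogroup_nice_to_shares(int nice)
{
	if (nice < -20 || nice > 19)
		return -EINVAL;

	/* After the range check, nice + 20 is always a valid table index. */
	assert(nice + 20 >= 0 && nice + 20 < 40);
	return nice_to_weight[nice + 20];
}
```

With this table, writing 0 keeps the group at the default weight of 1024, while writing 5 (assuming the write passes the permission checks) would set the group's shares to 335. Each nice step changes the weight by roughly a factor of 1.25.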



2010-11-24 20:39:30

by Mike Galbraith

Subject: Re: mmotm 2010-11-23 + autogroups -> inconsistent lock state

On Wed, 2010-11-24 at 13:25 -0700, Mike Galbraith wrote:

> The below should run gripe free.

In <= 2.6.37-rc3 kernel I mean. The tip version is still explosive.

-Mike

2010-11-25 06:09:49

by Valdis Klētnieks

Subject: Re: mmotm 2010-11-23 + autogroups -> inconsistent lock state

On Wed, 24 Nov 2010 13:25:25 MST, Mike Galbraith said:
> On Wed, 2010-11-24 at 00:01 -0500, [email protected] wrote:
> > On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> > > The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
> > >
> > > http://userweb.kernel.org/~akpm/mmotm/
> >
> > (I appear to be on a roll tonight - 3 splats before I even had a chance to login. :)
> >
> > mmotm + Ingo's cleanup of Mike's autogroups patch.
> ...
>
> Sorry for slow response, been trying to use some of my last few vacation
> days on vacation stuff ;-)
>
> The below should run gripe free. Suppose I should learn to turn on
> lockdep and whatnot when tinkering/testing.

Yes, this version runs quietly, thanks.



2010-11-25 15:14:44

by Kyle McMartin

Subject: Re: mmotm 2010-11-23 - WARNING: at drivers/tty/tty_io.c:1331

On Tue, Nov 23, 2010 at 11:55:39PM -0500, [email protected] wrote:
> On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> > The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
> >
> > http://userweb.kernel.org/~akpm/mmotm/
>
> Seen during boot:
>
> [ 23.015448] Modules linked in:
> [ 23.015453] Pid: 1207, comm: plymouthd Not tainted 2.6.37-rc3-mmotm1123 #3
> [ 23.015455] Call Trace:

I've been trying to figure this one out for a while, without much luck.
(Users are seeing it in 2.6.36 as well.)

I *think* (I added a rawhide debugging patch to print the tty->name)
that plymouth is always opening tty7 to cause this. My guess is the BKL
removal has exposed some kind of race, but it's not obvious to me (and
there's many other bugs to sort through too. :(

CC-ing Jiri since he seems to be the poor guy who's been poking this
recently (there's a good few threads about this (though the others look
like an ldisc attach race...)) I wouldn't think that's the case here
since N_TTY is the default...

--Kyle

2010-11-25 16:45:04

by Jiri Slaby

Subject: Re: mmotm 2010-11-23 - WARNING: at drivers/tty/tty_io.c:1331

On 11/25/2010 04:14 PM, Kyle McMartin wrote:
> On Tue, Nov 23, 2010 at 11:55:39PM -0500, [email protected] wrote:
>> On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
>>> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
>>>
>>> http://userweb.kernel.org/~akpm/mmotm/
>>
>> Seen during boot:
>>
>> [ 23.015448] Modules linked in:
>> [ 23.015453] Pid: 1207, comm: plymouthd Not tainted 2.6.37-rc3-mmotm1123 #3
>> [ 23.015455] Call Trace:
>
> I've been trying to figure this one out for a while, without much luck.
> (Users are seeing it in 2.6.36 as well.)
>
> I *think* (I added a rawhide debugging patch to print the tty->name)
> that plymouth is always opening tty7 to cause this. My guess is the BKL
> removal has exposed some kind of race, but it's not obvious to me (and
> there's many other bugs to sort through too. :(
>
> CC-ing Jiri since he seems to be the poor guy who's been poking this
> recently (there's a good few threads about this (though the others look
> like an ldisc attach race...)) I wouldn't think that's the case here
> since N_TTY is the default...

Ok, tty_reopen is called without TTY_LDISC set. For further
considerations, note tty_lock is held in tty_open. TTY_LDISC is cleared in:

1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
section tty_lock is held.

2) tty_release via tty_ldisc_release till the end of tty existence. If
tty->count <= 1, tty_lock is taken, TTY_CLOSING bit set and then
tty_ldisc_release called. tty_reopen checks TTY_CLOSING before checking
TTY_LDISC.

3) tty_set_ldisc from tty_ldisc_halt to tty_ldisc_enable. We take
tty_lock, set TTY_LDISC_CHANGING, put tty_lock, do some other work, take
tty_lock, call tty_ldisc_enable, put tty_lock.

So the only option I see is 3) and we should do:
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1310,7 +1310,8 @@ static int tty_reopen(struct tty_struct *tty)
{
struct tty_driver *driver = tty->driver;

- if (test_bit(TTY_CLOSING, &tty->flags))
+ if (test_bit(TTY_CLOSING, &tty->flags) ||
+ test_bit(TTY_LDISC_CHANGING, &tty->flags))
return -EIO;

if (driver->type == TTY_DRIVER_TYPE_PTY &&

Alan, Greg?

thanks,
--
js
suse labs

2010-11-25 16:51:19

by Jiri Slaby

Subject: Re: mmotm 2010-11-23 - WARNING: at drivers/tty/tty_io.c:1331

On 11/25/2010 05:44 PM, Jiri Slaby wrote:
> On 11/25/2010 04:14 PM, Kyle McMartin wrote:
>> On Tue, Nov 23, 2010 at 11:55:39PM -0500, [email protected] wrote:
>>> On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
>>>> The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
>>>>
>>>> http://userweb.kernel.org/~akpm/mmotm/
>>>
>>> Seen during boot:
>>>
>>> [ 23.015448] Modules linked in:
>>> [ 23.015453] Pid: 1207, comm: plymouthd Not tainted 2.6.37-rc3-mmotm1123 #3
>>> [ 23.015455] Call Trace:
>>
>> I've been trying to figure this one out for a while, without much luck.
>> (Users are seeing it in 2.6.36 as well.)
>>
>> I *think* (I added a rawhide debugging patch to print the tty->name)
>> that plymouth is always opening tty7 to cause this. My guess is the BKL
>> removal has exposed some kind of race, but it's not obvious to me (and
>> there's many other bugs to sort through too. :(
>>
>> CC-ing Jiri since he seems to be the poor guy who's been poking this
>> recently (there's a good few threads about this (though the others look
>> like an ldisc attach race...)) I wouldn't think that's the case here
>> since N_TTY is the default...
>
> Ok, tty_reopen is called without TTY_LDISC set. For further
> considerations, note tty_lock is held in tty_open. TTY_LDISC is cleared in:
>
> 1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
> section tty_lock is held.
>
> 2) tty_release via tty_ldisc_release till the end of tty existence. If
> tty->count <= 1, tty_lock is taken, TTY_CLOSING bit set and then
> tty_ldisc_release called. tty_reopen checks TTY_CLOSING before checking
> TTY_LDISC.
>
> 3) tty_set_ldisc from tty_ldisc_halt to tty_ldisc_enable. We take
> tty_lock, set TTY_LDISC_CHANGING, put tty_lock, do some other work, take
> tty_lock, call tty_ldisc_enable, put tty_lock.

Oh, "do some other work" includes tty_ldisc_halt where TTY_LDISC is
cleared and tty_lock is _not_ held.

> So the only option I see is 3) and we should do:
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1310,7 +1310,8 @@ static int tty_reopen(struct tty_struct *tty)
> {
> struct tty_driver *driver = tty->driver;
>
> - if (test_bit(TTY_CLOSING, &tty->flags))
> + if (test_bit(TTY_CLOSING, &tty->flags) ||
> + test_bit(TTY_LDISC_CHANGING, &tty->flags))
> return -EIO;
>
> if (driver->type == TTY_DRIVER_TYPE_PTY &&
>
> Alan, Greg?
>
> thanks,
--
js
suse labs

2010-11-25 17:16:36

by Jiri Slaby

Subject: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

There are many WARNINGs like the following reported nowadays:
WARNING: at drivers/tty/tty_io.c:1331 tty_open+0x2a2/0x49a()
Hardware name: Latitude E6500
Modules linked in:
Pid: 1207, comm: plymouthd Not tainted 2.6.37-rc3-mmotm1123 #3
Call Trace:
[<ffffffff8103b189>] warn_slowpath_common+0x80/0x98
[<ffffffff8103b1b6>] warn_slowpath_null+0x15/0x17
[<ffffffff8128a3ab>] tty_open+0x2a2/0x49a
[<ffffffff810fd53f>] chrdev_open+0x11d/0x146
...

This means tty_reopen is called without TTY_LDISC set. For further
considerations, note tty_lock is held in tty_open. TTY_LDISC is cleared in:
1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
section tty_lock is held.

2) tty_release via tty_ldisc_release till the end of tty existence. If
tty->count <= 1, tty_lock is taken, TTY_CLOSING bit set and then
tty_ldisc_release called. tty_reopen checks TTY_CLOSING before checking
TTY_LDISC.

3) tty_set_ldisc from tty_ldisc_halt to tty_ldisc_enable. We:
* take tty_lock, set TTY_LDISC_CHANGING, put tty_lock
* call tty_ldisc_halt (clear TTY_LDISC), tty_lock is _not_ held
* do some other work
* take tty_lock, call tty_ldisc_enable (set TTY_LDISC), put
tty_lock

So the only option I see is 3). The solution is to check
TTY_LDISC_CHANGING along with TTY_CLOSING in tty_reopen.

Nicely reproducible with two processes:
while (1) {
fd = open("/dev/ttyS1", O_RDWR);
if (fd < 0) {
warn("open");
continue;
}
close(fd);
}
--------
while (1) {
fd = open("/dev/ttyS1", O_RDWR);
ld1 = 0; ld2 = 2;
while (1) {
ioctl(fd, TIOCSETD, &ld1);
ioctl(fd, TIOCSETD, &ld2);
}
close(fd);
}

Signed-off-by: Jiri Slaby <[email protected]>
Reported-by: <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Alan Cox <[email protected]>
---
drivers/tty/tty_io.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index c05c5af..878f6d6 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1310,7 +1310,8 @@ static int tty_reopen(struct tty_struct *tty)
{
struct tty_driver *driver = tty->driver;

- if (test_bit(TTY_CLOSING, &tty->flags))
+ if (test_bit(TTY_CLOSING, &tty->flags) ||
+ test_bit(TTY_LDISC_CHANGING, &tty->flags))
return -EIO;

if (driver->type == TTY_DRIVER_TYPE_PTY &&
--
1.7.3.1
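The effect of this one-line change can be modelled as a predicate over the tty flag word. In the sketch below the bit numbers and all function names are hypothetical and chosen for this model only. Following the analysis in this thread, the WARN is treated as "tty_reopen() got past its early check while TTY_LDISC was clear".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit numbers for this model only, not the kernel's values. */
enum { M_TTY_CLOSING = 0, M_TTY_LDISC = 1, M_TTY_LDISC_CHANGING = 2 };

static bool flag_set(unsigned long flags, int bit)
{
	return (flags >> bit) & 1UL;
}

/* Early check in tty_reopen() before the patch: only TTY_CLOSING refuses. */
static int reopen_check_before(unsigned long flags)
{
	return flag_set(flags, M_TTY_CLOSING) ? -EIO : 0;
}

/* After the patch: TTY_LDISC_CHANGING also makes the reopen fail with -EIO. */
static int reopen_check_after(unsigned long flags)
{
	if (flag_set(flags, M_TTY_CLOSING) ||
	    flag_set(flags, M_TTY_LDISC_CHANGING))
		return -EIO;
	return 0;
}

/* A reopen that passes the check while TTY_LDISC is clear would hit the WARN. */
static bool reopen_would_warn(int (*check)(unsigned long), unsigned long flags)
{
	return check(flags) == 0 && !flag_set(flags, M_TTY_LDISC);
}
```

In the tty_set_ldisc() window (TTY_LDISC_CHANGING set, TTY_LDISC already cleared by tty_ldisc_halt), the old check lets the reopen through and the patched one returns -EIO. A flag word with neither bit set, as on the hangup path discussed later in the thread, still slips past the patched check.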

2010-11-25 17:59:19

by Kyle McMartin

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On Thu, Nov 25, 2010 at 06:16:23PM +0100, Jiri Slaby wrote:
> - if (test_bit(TTY_CLOSING, &tty->flags))
> + if (test_bit(TTY_CLOSING, &tty->flags) ||
> + test_bit(TTY_LDISC_CHANGING, &tty->flags))
> return -EIO;
>

Doh, nice catch. I just built a couple test images and sent them out to
the reporters for confirmation.

Thanks!
Kyle

2010-11-26 00:28:45

by Kyle McMartin

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On Thu, Nov 25, 2010 at 06:16:23PM +0100, Jiri Slaby wrote:
> - if (test_bit(TTY_CLOSING, &tty->flags))
> + if (test_bit(TTY_CLOSING, &tty->flags) ||
> + test_bit(TTY_LDISC_CHANGING, &tty->flags))
> return -EIO;
>
> if (driver->type == TTY_DRIVER_TYPE_PTY &&

Unfortunately, users report this doesn't seem to fix things for them
(built against 2.6.36 (plus another patch you wrote iirc.))

https://bugzilla.redhat.com/show_bug.cgi?id=630464#c27

I tried reverting the TTY patches between 2.6.36 and 2.6.35 and getting
them to test that, and it seems ok:

https://bugzilla.redhat.com/show_bug.cgi?id=630464#c30

So I guess there must be a race here somewhere... I'll keep looking. :/

I would imagine it's something that's probably existed since the dawn of
time but the BKL has just papered over entirely.

Thanks for trying!
--Kyle

2010-11-26 07:46:25

by Jiri Slaby

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On 11/26/2010 01:28 AM, Kyle McMartin wrote:
> On Thu, Nov 25, 2010 at 06:16:23PM +0100, Jiri Slaby wrote:
>> - if (test_bit(TTY_CLOSING, &tty->flags))
>> + if (test_bit(TTY_CLOSING, &tty->flags) ||
>> + test_bit(TTY_LDISC_CHANGING, &tty->flags))
>> return -EIO;
>>
>> if (driver->type == TTY_DRIVER_TYPE_PTY &&
>
> Unfortunately, users report this doesn't seem to fix things for them
> (built against 2.6.36 (plus another patch you wrote iirc.))

Which patches exactly do you have? You need three of mine in 2.6.36.

> https://bugzilla.redhat.com/show_bug.cgi?id=630464#c27

regards,
--
js
suse labs

2010-11-26 13:28:05

by Kyle McMartin

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On Fri, Nov 26, 2010 at 08:46:18AM +0100, Jiri Slaby wrote:
> On 11/26/2010 01:28 AM, Kyle McMartin wrote:
> > On Thu, Nov 25, 2010 at 06:16:23PM +0100, Jiri Slaby wrote:
> >> - if (test_bit(TTY_CLOSING, &tty->flags))
> >> + if (test_bit(TTY_CLOSING, &tty->flags) ||
> >> + test_bit(TTY_LDISC_CHANGING, &tty->flags))
> >> return -EIO;
> >>
> >> if (driver->type == TTY_DRIVER_TYPE_PTY &&
> >
> > Unfortunately, users report this doesn't seem to fix things for them
> > (built against 2.6.36 (plus another patch you wrote iirc.))
>
> Which patches exactly do you have? You need three of mine in 2.6.36.
>

Just tty-restore-tty_ldisc_wait_idle.patch on top of 2.6.36.1, I'll grab
the other two now.

--Kyle

2010-11-26 16:46:41

by Daniel Drake

Subject: Re: mmotm 2010-11-23-16-12 uploaded (olpc)

On 24 November 2010 18:51, Randy Dunlap <[email protected]> wrote:
> make[4]: *** No rule to make target `arch/x86/platform/olpc/olpc-xo1-wakeup.c', needed by `arch/x86/platform/olpc/olpc-xo1-wakeup.o'.
>
>
> It's olpc-xo1-wakeup.S, so I guess it needs a special makefile rule ??

Works if you build it in, but fails as above as a module.

And it looks like making it work as a module is not as easy as we
thought. I'll discuss this with Andres and get a new patch submitted
soon.

Daniel

2010-11-27 03:00:05

by Kyle McMartin

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On Fri, Nov 26, 2010 at 08:46:18AM +0100, Jiri Slaby wrote:
> >> - if (test_bit(TTY_CLOSING, &tty->flags))
> >> + if (test_bit(TTY_CLOSING, &tty->flags) ||
> >> + test_bit(TTY_LDISC_CHANGING, &tty->flags))
> >> return -EIO;
> >>
> >> if (driver->type == TTY_DRIVER_TYPE_PTY &&
> >
> > Unfortunately, users report this doesn't seem to fix things for them
> > (built against 2.6.36 (plus another patch you wrote iirc.))
>
> Which patches exactly do you have? You need three of mine in 2.6.36.
>
> > https://bugzilla.redhat.com/show_bug.cgi?id=630464#c27
>

Hrm, I'm still seeing it on top of Linus' latest with that patch. :/

Even more bizarrely, I tried to come up with ways this could be failing,
and decided to test a few things...

I set_bit(TTY_DEBUG, &tty->flags) just before returning from
tty_init_dev (which, afaict, should be called for vc/tty$n and ptys?)
and then checked it with a similar WARN_ON in tty_reopen, and found that
I was hitting it fairly regularly.

As far as I can tell, for this to occur, we'd need something to open
/dev/tty1 first, which hits the tty_init_dev, and something else to very
closely follow that, hit the linking of driver->ttys[idx] and so skip
into tty_reopen, and smack into my WARN_ON.

Of course, given the locking, I have no idea how it could possibly be
happening.

I'm poking around to see, I think maybe something might be dropping
locks in the callchain that gives us a window where this might be
possible... I don't see any other way we could end up with tty1 having
TTY_LDISC unset.

(I'm poking in some more debugging, and moving the 'linking in' of the
device until after tty_ldisc_setup in tty_init_dev, but I'm not
particularly hopeful.)

--Kyle

2010-11-27 08:50:46

by Jiri Slaby

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On 11/27/2010 03:59 AM, Kyle McMartin wrote:
> I'm poking around to see, I think maybe something might be dropping
> locks in the callchain that gives us a window where this might be
> possible...

Of course, that's the case:
clear_bit(TTY_LDISC, &tty->flags);
tty_unlock();
cancel_delayed_work_sync(&tty->buf.work);
mutex_unlock(&tty->ldisc_mutex);

tty_lock();
mutex_lock(&tty->ldisc_mutex);

in tty_ldisc_hangup. Hence my point 1) from previous posts doesn't hold too:
1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
section tty_lock is held.

I will check, how to fix this.

thanks,
--
js
suse labs
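To see why point 1) does not hold, the quoted tty_ldisc_hangup() sequence can be modelled as a list of steps. After each step, the model records whether the hangup path still holds tty_lock and whether TTY_LDISC is set. This is a simplified, hypothetical model: the struct and function names are invented. It also assumes, as the analysis above suggests, that this path sets neither TTY_CLOSING nor TTY_LDISC_CHANGING. It only counts the steps at which a concurrent reopen could slip in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One step of the (simplified) __tty_hangup()/tty_ldisc_hangup() sequence. */
struct hangup_step {
	const char *what;
	bool tty_lock_held;	/* hangup path holds tty_lock after this step */
	bool ldisc_set;		/* TTY_LDISC is set after this step */
};

/* Order taken from the snippet quoted above; the model itself is hypothetical. */
static const struct hangup_step hangup_steps[] = {
	{ "enter, tty_lock held",            true,  true  },
	{ "clear_bit(TTY_LDISC)",            true,  false },
	{ "tty_unlock()",                    false, false },
	{ "cancel_delayed_work_sync()",      false, false },
	{ "mutex_unlock(&tty->ldisc_mutex)", false, false },
	{ "tty_lock()",                      true,  false },
	{ "mutex_lock(&tty->ldisc_mutex)",   true,  false },
	{ "later: tty_ldisc_enable()",       true,  true  },
};

/*
 * tty_open() holds tty_lock around tty_reopen(), so a concurrent reopen can
 * only run where the hangup path has dropped it. Assuming no flag checked
 * by tty_reopen() is set here, each such step is a point where the reopen
 * passes its checks yet finds TTY_LDISC clear.
 */
static size_t count_reopen_windows(const struct hangup_step *s, size_t n)
{
	size_t windows = 0;

	for (size_t i = 0; i < n; i++)
		if (!s[i].tty_lock_held && !s[i].ldisc_set)
			windows++;
	return windows;
}
```

In this model, the three steps between tty_unlock() and tty_lock() form the window. A check keyed on TTY_LDISC_CHANGING cannot close it, which is consistent with the warning still being reported with the first patch applied.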

2010-11-27 09:44:01

by Jiri Slaby

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On 11/27/2010 09:50 AM, Jiri Slaby wrote:
> On 11/27/2010 03:59 AM, Kyle McMartin wrote:
>> I'm poking around to see, I think maybe something might be dropping
>> locks in the callchain that gives us a window where this might be
>> possible...
>
> Of course, that's the case:
> clear_bit(TTY_LDISC, &tty->flags);
> tty_unlock();
> cancel_delayed_work_sync(&tty->buf.work);
> mutex_unlock(&tty->ldisc_mutex);
>
> tty_lock();
> mutex_lock(&tty->ldisc_mutex);
>
> in tty_ldisc_hangup. Hence my point 1) from previous posts doesn't hold too:
> 1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
> section tty_lock is held.
>
> I will check, how to fix this.

Reproducible with 2 running processes from the attachment.

regards,
--
js
suse labs


Attachments:
tty_reopen.c (1.20 kB)
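The attached tty_reopen.c is not reproduced in this archive. Below is a hypothetical, bounded rewrite of the two loops from the patch description. The function names and the `rounds` parameter are invented, whereas the original loops run forever. Each function is meant to run in its own process against the same port, for example /dev/ttyS1.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Loop 1: open and close the port `rounds` times.
 * Returns the number of opens that failed.
 */
static int open_close_loop(const char *path, int rounds)
{
	int failures = 0;

	assert(path != NULL && rounds >= 0);
	for (int i = 0; i < rounds; i++) {
		int fd = open(path, O_RDWR);

		if (fd < 0) {
			failures++;	/* the original loop warn()s and retries */
			continue;
		}
		close(fd);
	}
	return failures;
}

/*
 * Loop 2: keep one descriptor open and switch the line discipline between
 * 0 (N_TTY) and 2 via TIOCSETD, `rounds` times.
 * Returns 0 on success, -1 if open() or ioctl() fails.
 */
static int ldisc_flip_loop(const char *path, int rounds)
{
	int ld1 = 0, ld2 = 2;
	int fd;

	assert(path != NULL && rounds >= 0);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;
	for (int i = 0; i < rounds; i++) {
		if (ioctl(fd, TIOCSETD, &ld1) < 0 ||
		    ioctl(fd, TIOCSETD, &ld2) < 0) {
			close(fd);
			return -1;
		}
	}
	close(fd);
	return 0;
}
```

To exercise the race, run open_close_loop("/dev/ttyS1", N) in one process while another runs ldisc_flip_loop("/dev/ttyS1", N). Both need read/write access to the device node. On a non-tty such as /dev/null, the TIOCSETD call simply fails and ldisc_flip_loop() returns -1.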

2010-11-27 15:11:12

by Jiri Slaby

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On 11/27/2010 10:43 AM, Jiri Slaby wrote:
> On 11/27/2010 09:50 AM, Jiri Slaby wrote:
>> On 11/27/2010 03:59 AM, Kyle McMartin wrote:
>>> I'm poking around to see, I think maybe something might be dropping
>>> locks in the callchain that gives us a window where this might be
>>> possible...
>>
>> Of course, that's the case:
>> clear_bit(TTY_LDISC, &tty->flags);
>> tty_unlock();
>> cancel_delayed_work_sync(&tty->buf.work);
>> mutex_unlock(&tty->ldisc_mutex);
>>
>> tty_lock();
>> mutex_lock(&tty->ldisc_mutex);
>>
>> in tty_ldisc_hangup. Hence my point 1) from previous posts doesn't hold too:
>> 1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
>> section tty_lock is held.
>>
>> I will check, how to fix this.
>
> Reproducible with 2 running processes from the attachment.

Is it fixed with the attached proof-of-concept patch?

So you need:
THIS ONE
TTY: don't allow reopen when ldisc is changing
TTY: ldisc, fix open flag handling
Char: TTY, restore tty_ldisc_wait_idle

The last one is in 2.6.37-rc2 already.

thanks,
--
js
suse labs


Attachments:
0001-TTY-open-hangup-race-fixup.patch (2.27 kB)

2010-11-27 23:53:59

by Kyle McMartin

Subject: Re: [PATCH 1/1] TTY: don't allow reopen when ldisc is changing

On Sat, Nov 27, 2010 at 04:11:06PM +0100, Jiri Slaby wrote:
> Is it fixed with the attached proof-of-concept patch?
>
> So you need:
> THIS ONE
> TTY: don't allow reopen when ldisc is changing
> TTY: ldisc, fix open flag handling
> Char: TTY, restore tty_ldisc_wait_idle
>
> The last one is in 2.6.37-rc2 already.

I shoved them all into a build and sent it out for testing. The tester who
was hitting it very frequently reports he hasn't seen it, so I think
you've managed to close the race window. Awesome!

To confirm, I also rebooted my laptop repeatedly and didn't hit the
WARN_ON.

--Kyle

2010-12-02 18:49:07

by Paul E. McKenney

Subject: Re: mmotm 2010-11-23 + autogroups -> inconsistent lock state

On Wed, Nov 24, 2010 at 01:25:25PM -0700, Mike Galbraith wrote:
> On Wed, 2010-11-24 at 00:01 -0500, [email protected] wrote:
> > On Tue, 23 Nov 2010 16:13:06 PST, [email protected] said:
> > > The mm-of-the-moment snapshot 2010-11-23-16-12 has been uploaded to
> > >
> > > http://userweb.kernel.org/~akpm/mmotm/
> >
> > (I appear to be on a roll tonight - 3 splats before I even had a chance to login. :)
> >
> > mmotm + Ingo's cleanup of Mike's autogroups patch.
> ...
>
> Sorry for the slow response; I've been trying to use some of my last few
> vacation days on vacation stuff ;-)
>
> The patch below should run gripe-free. I suppose I should learn to turn on
> lockdep and whatnot when tinkering/testing.
>
> Unfortunately, tip's update_shares() changes are still being difficult.
>
> static void update_shares(int cpu)
> {
> struct cfs_rq *cfs_rq;
> struct rq *rq = cpu_rq(cpu);
>
> rcu_read_lock();
> for_each_leaf_cfs_rq(rq, cfs_rq)
> update_shares_cpu(cfs_rq->tg, cpu);
> rcu_read_unlock();
> }
>
> Despite task groups being freed via RCU, update_shares_cpu() hits freed
> memory and explodes, and nothing I've tried has been able to stop it.
> The only thing I haven't tried (aside from the right thing;) is to take
> rcu out of the picture entirely.

Is your new autogroup structure retaining a pointer to memory that
is freed by RCU?

If so, you will need to NULL out that pointer before the memory
in question is passed to call_rcu(). (Or before the call to
synchronize_rcu(), as the case may be.)

Thanx, Paul
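
Paul's rule (clear the retained pointer before the memory is handed to call_rcu()) can be illustrated with a small single-threaded model. struct owner, struct group, deferred_free() and run_grace_period() are hypothetical stand-ins: the last two play the roles of call_rcu() and the end of a grace period, and are not kernel APIs. The model only captures program order; in kernel code the store would go through rcu_assign_pointer() and readers would load the pointer with rcu_dereference() inside rcu_read_lock()/rcu_read_unlock().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct group { int shares; };

/* A long-lived object that retains a pointer to RCU-freed memory. */
struct owner { struct group *cached; };

/* Hypothetical deferred-free queue standing in for call_rcu(). */
#define MAX_PENDING 8
static struct group *pending[MAX_PENDING];
static size_t npending;

static void deferred_free(struct group *g)
{
	assert(npending < MAX_PENDING);
	pending[npending++] = g;
}

/* Stands in for the end of a grace period: free everything queued. */
static void run_grace_period(void)
{
	while (npending)
		free(pending[--npending]);
}

/* Reader: only dereferences a pointer it finds published. */
static int read_shares(const struct owner *o)
{
	struct group *g = o->cached;

	return g ? g->shares : -1;
}

/* Correct retire order: unpublish first, then queue for freeing. */
static void retire_group(struct owner *o)
{
	struct group *g = o->cached;

	o->cached = NULL;
	deferred_free(g);
}
```

The ordering inside retire_group() is the point: a reader that arrives after the pointer is cleared sees NULL, while readers that already loaded the old pointer are what the grace period protects. Queuing the memory first and clearing the pointer later would let a new reader pick up memory that is already on its way to being freed.
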

> From: Mike Galbraith <[email protected]>
> Date: Sat, 20 Nov 2010 12:35:00 -0700
> Subject: [PATCH] sched: Improve desktop interactivity: Implement automated per session task groups
>
> A recurring complaint from CFS users is that parallel kbuild has a negative
> impact on desktop interactivity. This patch implements an idea from Linus,
> to automatically create task groups. Currently, only per session autogroups
> are implemented, but the patch leaves the way open for enhancement.
>
> Implementation: each task's signal struct contains an inherited pointer to
> a refcounted autogroup struct containing a task group pointer, the default
> for all tasks pointing to the init_task_group. When a task calls setsid(),
> a new task group is created, the process is moved into the new task group,
> and a reference to the previous task group is dropped. Child processes
> inherit this task group thereafter, and increase its refcount. When the
> last thread of a process exits, the process's reference is dropped, such
> that when the last process referencing an autogroup exits, the autogroup
> is destroyed.
>
> At runqueue selection time, IFF a task has no cgroup assignment, its current
> autogroup is used.
>
> Autogroup bandwidth is controllable via setting its nice level through the
> proc filesystem. cat /proc/<pid>/autogroup displays the task's group and the
> group's nice level. echo <nice level> > /proc/<pid>/autogroup sets the task
> group's shares to the weight of a nice <level> task. Setting the nice level is
> rate limited for !admin users due to the abuse risk of task group locking.
>
> The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is
> selected, but can be disabled via the boot option noautogroup, and can also
> be turned on/off on the fly via..
> echo [01] > /proc/sys/kernel/sched_autogroup_enabled.
> ..which will automatically move tasks to/from the root task group.
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Markus Trippelsdorf <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 2
> fs/proc/base.c | 79 +++++++++++
> include/linux/sched.h | 23 +++
> init/Kconfig | 12 +
> kernel/fork.c | 5
> kernel/sched.c | 13 +
> kernel/sched_autogroup.c | 240 ++++++++++++++++++++++++++++++++++++
> kernel/sched_autogroup.h | 23 +++
> kernel/sched_debug.c | 29 ++--
> kernel/sys.c | 4
> kernel/sysctl.c | 11 +
> 11 files changed, 423 insertions(+), 18 deletions(-)
>
> Index: linux-2.6.37.git/include/linux/sched.h
> ===================================================================
> --- linux-2.6.37.git.orig/include/linux/sched.h
> +++ linux-2.6.37.git/include/linux/sched.h
> @@ -509,6 +509,8 @@ struct thread_group_cputimer {
> spinlock_t lock;
> };
>
> +struct autogroup;
> +
> /*
> * NOTE! "signal_struct" does not have it's own
> * locking, because a shared signal_struct always
> @@ -576,6 +578,9 @@ struct signal_struct {
>
> struct tty_struct *tty; /* NULL if no tty */
>
> +#ifdef CONFIG_SCHED_AUTOGROUP
> + struct autogroup *autogroup;
> +#endif
> /*
> * Cumulative resource counters for dead threads in the group,
> * and for reaped dead child processes forked by this group.
> @@ -1931,6 +1936,24 @@ int sched_rt_handler(struct ctl_table *t
>
> extern unsigned int sysctl_sched_compat_yield;
>
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +extern unsigned int sysctl_sched_autogroup_enabled;
> +
> +extern void sched_autogroup_create_attach(struct task_struct *p);
> +extern void sched_autogroup_detach(struct task_struct *p);
> +extern void sched_autogroup_fork(struct signal_struct *sig);
> +extern void sched_autogroup_exit(struct signal_struct *sig);
> +#ifdef CONFIG_PROC_FS
> +extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
> +extern int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice);
> +#endif
> +#else
> +static inline void sched_autogroup_create_attach(struct task_struct *p) { }
> +static inline void sched_autogroup_detach(struct task_struct *p) { }
> +static inline void sched_autogroup_fork(struct signal_struct *sig) { }
> +static inline void sched_autogroup_exit(struct signal_struct *sig) { }
> +#endif
> +
> #ifdef CONFIG_RT_MUTEXES
> extern int rt_mutex_getprio(struct task_struct *p);
> extern void rt_mutex_setprio(struct task_struct *p, int prio);
> Index: linux-2.6.37.git/kernel/sched.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/sched.c
> +++ linux-2.6.37.git/kernel/sched.c
> @@ -78,6 +78,7 @@
>
> #include "sched_cpupri.h"
> #include "workqueue_sched.h"
> +#include "sched_autogroup.h"
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/sched.h>
> @@ -268,6 +269,10 @@ struct task_group {
> struct task_group *parent;
> struct list_head siblings;
> struct list_head children;
> +
> +#ifdef CONFIG_SCHED_AUTOGROUP
> + struct autogroup *autogroup;
> +#endif
> };
>
> #define root_task_group init_task_group
> @@ -605,11 +610,14 @@ static inline int cpu_of(struct rq *rq)
> */
> static inline struct task_group *task_group(struct task_struct *p)
> {
> + struct task_group *tg;
> struct cgroup_subsys_state *css;
>
> css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
> lockdep_is_held(&task_rq(p)->lock));
> - return container_of(css, struct task_group, css);
> + tg = container_of(css, struct task_group, css);
> +
> + return autogroup_task_group(p, tg);
> }
>
> /* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
> @@ -2006,6 +2014,7 @@ static void sched_irq_time_avg_update(st
> #include "sched_idletask.c"
> #include "sched_fair.c"
> #include "sched_rt.c"
> +#include "sched_autogroup.c"
> #include "sched_stoptask.c"
> #ifdef CONFIG_SCHED_DEBUG
> # include "sched_debug.c"
> @@ -7979,7 +7988,7 @@ void __init sched_init(void)
> #ifdef CONFIG_CGROUP_SCHED
> list_add(&init_task_group.list, &task_groups);
> INIT_LIST_HEAD(&init_task_group.children);
> -
> + autogroup_init(&init_task);
> #endif /* CONFIG_CGROUP_SCHED */
>
> #if defined CONFIG_FAIR_GROUP_SCHED && defined CONFIG_SMP
> Index: linux-2.6.37.git/kernel/fork.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/fork.c
> +++ linux-2.6.37.git/kernel/fork.c
> @@ -174,8 +174,10 @@ static inline void free_signal_struct(st
>
> static inline void put_signal_struct(struct signal_struct *sig)
> {
> - if (atomic_dec_and_test(&sig->sigcnt))
> + if (atomic_dec_and_test(&sig->sigcnt)) {
> + sched_autogroup_exit(sig);
> free_signal_struct(sig);
> + }
> }
>
> void __put_task_struct(struct task_struct *tsk)
> @@ -904,6 +906,7 @@ static int copy_signal(unsigned long clo
> posix_cpu_timers_init_group(sig);
>
> tty_audit_fork(sig);
> + sched_autogroup_fork(sig);
>
> sig->oom_adj = current->signal->oom_adj;
> sig->oom_score_adj = current->signal->oom_score_adj;
> Index: linux-2.6.37.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/sys.c
> +++ linux-2.6.37.git/kernel/sys.c
> @@ -1080,8 +1080,10 @@ SYSCALL_DEFINE0(setsid)
> err = session;
> out:
> write_unlock_irq(&tasklist_lock);
> - if (err > 0)
> + if (err > 0) {
> proc_sid_connector(group_leader);
> + sched_autogroup_create_attach(group_leader);
> + }
> return err;
> }
>
> Index: linux-2.6.37.git/kernel/sched_debug.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/sched_debug.c
> +++ linux-2.6.37.git/kernel/sched_debug.c
> @@ -87,6 +87,20 @@ static void print_cfs_group_stats(struct
> }
> #endif
>
> +#if defined(CONFIG_CGROUP_SCHED) && \
> + (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> +static void task_group_path(struct task_group *tg, char *buf, int buflen)
> +{
> + /* may be NULL if the underlying cgroup isn't fully-created yet */
> + if (!tg->css.cgroup) {
> + if (!autogroup_path(tg, buf, buflen))
> + buf[0] = '\0';
> + return;
> + }
> + cgroup_path(tg->css.cgroup, buf, buflen);
> +}
> +#endif
> +
> static void
> print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
> {
> @@ -115,7 +129,7 @@ print_task(struct seq_file *m, struct rq
> char path[64];
>
> rcu_read_lock();
> - cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
> + task_group_path(task_group(p), path, sizeof(path));
> rcu_read_unlock();
> SEQ_printf(m, " %s", path);
> }
> @@ -147,19 +161,6 @@ static void print_rq(struct seq_file *m,
> read_unlock_irqrestore(&tasklist_lock, flags);
> }
>
> -#if defined(CONFIG_CGROUP_SCHED) && \
> - (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
> -static void task_group_path(struct task_group *tg, char *buf, int buflen)
> -{
> - /* may be NULL if the underlying cgroup isn't fully-created yet */
> - if (!tg->css.cgroup) {
> - buf[0] = '\0';
> - return;
> - }
> - cgroup_path(tg->css.cgroup, buf, buflen);
> -}
> -#endif
> -
> void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
> {
> s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
> Index: linux-2.6.37.git/fs/proc/base.c
> ===================================================================
> --- linux-2.6.37.git.orig/fs/proc/base.c
> +++ linux-2.6.37.git/fs/proc/base.c
> @@ -1407,6 +1407,82 @@ static const struct file_operations proc
>
> #endif
>
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +/*
> + * Print out autogroup related information:
> + */
> +static int sched_autogroup_show(struct seq_file *m, void *v)
> +{
> + struct inode *inode = m->private;
> + struct task_struct *p;
> +
> + p = get_proc_task(inode);
> + if (!p)
> + return -ESRCH;
> + proc_sched_autogroup_show_task(p, m);
> +
> + put_task_struct(p);
> +
> + return 0;
> +}
> +
> +static ssize_t
> +sched_autogroup_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *offset)
> +{
> + struct inode *inode = file->f_path.dentry->d_inode;
> + struct task_struct *p;
> + char buffer[PROC_NUMBUF];
> + long nice;
> + int err;
> +
> + memset(buffer, 0, sizeof(buffer));
> + if (count > sizeof(buffer) - 1)
> + count = sizeof(buffer) - 1;
> + if (copy_from_user(buffer, buf, count))
> + return -EFAULT;
> +
> + err = strict_strtol(strstrip(buffer), 0, &nice);
> + if (err)
> + return -EINVAL;
> +
> + p = get_proc_task(inode);
> + if (!p)
> + return -ESRCH;
> +
> + err = nice;
> + err = proc_sched_autogroup_set_nice(p, &err);
> + if (err)
> + count = err;
> +
> + put_task_struct(p);
> +
> + return count;
> +}
> +
> +static int sched_autogroup_open(struct inode *inode, struct file *filp)
> +{
> + int ret;
> +
> + ret = single_open(filp, sched_autogroup_show, NULL);
> + if (!ret) {
> + struct seq_file *m = filp->private_data;
> +
> + m->private = inode;
> + }
> + return ret;
> +}
> +
> +static const struct file_operations proc_pid_sched_autogroup_operations = {
> + .open = sched_autogroup_open,
> + .read = seq_read,
> + .write = sched_autogroup_write,
> + .llseek = seq_lseek,
> + .release = single_release,
> +};
> +
> +#endif /* CONFIG_SCHED_AUTOGROUP */
> +
> static ssize_t comm_write(struct file *file, const char __user *buf,
> size_t count, loff_t *offset)
> {
> @@ -2733,6 +2809,9 @@ static const struct pid_entry tgid_base_
> #ifdef CONFIG_SCHED_DEBUG
> REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
> #endif
> +#ifdef CONFIG_SCHED_AUTOGROUP
> + REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
> +#endif
> REG("comm", S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
> #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
> INF("syscall", S_IRUSR, proc_pid_syscall),
> Index: linux-2.6.37.git/kernel/sched_autogroup.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6.37.git/kernel/sched_autogroup.h
> @@ -0,0 +1,23 @@
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +
> +static inline struct task_group *
> +autogroup_task_group(struct task_struct *p, struct task_group *tg);
> +
> +#else /* !CONFIG_SCHED_AUTOGROUP */
> +
> +static inline void autogroup_init(struct task_struct *init_task) { }
> +
> +static inline struct task_group *
> +autogroup_task_group(struct task_struct *p, struct task_group *tg)
> +{
> + return tg;
> +}
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
> +{
> + return 0;
> +}
> +#endif
> +
> +#endif /* CONFIG_SCHED_AUTOGROUP */
> Index: linux-2.6.37.git/kernel/sched_autogroup.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.37.git/kernel/sched_autogroup.c
> @@ -0,0 +1,240 @@
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +
> +#include <linux/proc_fs.h>
> +#include <linux/seq_file.h>
> +#include <linux/kallsyms.h>
> +#include <linux/utsname.h>
> +
> +unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
> +
> +struct autogroup {
> + struct kref kref;
> + struct task_group *tg;
> + struct rw_semaphore lock;
> + unsigned long id;
> + int nice;
> +};
> +
> +static struct autogroup autogroup_default;
> +static atomic_t autogroup_seq_nr;
> +
> +static void autogroup_init(struct task_struct *init_task)
> +{
> + autogroup_default.tg = &init_task_group;
> + init_task_group.autogroup = &autogroup_default;
> + kref_init(&autogroup_default.kref);
> + init_rwsem(&autogroup_default.lock);
> + init_task->signal->autogroup = &autogroup_default;
> +}
> +
> +static inline void autogroup_free(struct task_group *tg)
> +{
> + kfree(tg->autogroup);
> +}
> +
> +static inline void autogroup_destroy(struct kref *kref)
> +{
> + struct autogroup *ag = container_of(kref, struct autogroup, kref);
> +
> + sched_destroy_group(ag->tg);
> +}
> +
> +static inline void autogroup_kref_put(struct autogroup *ag)
> +{
> + kref_put(&ag->kref, autogroup_destroy);
> +}
> +
> +static inline struct autogroup *autogroup_kref_get(struct autogroup *ag)
> +{
> + kref_get(&ag->kref);
> + return ag;
> +}
> +
> +static inline struct autogroup *autogroup_create(void)
> +{
> + struct autogroup *ag = kzalloc(sizeof(*ag), GFP_KERNEL);
> + struct task_group *tg;
> +
> + if (!ag)
> + goto out_fail;
> +
> + tg = sched_create_group(&init_task_group);
> +
> + if (IS_ERR(tg))
> + goto out_free;
> +
> + kref_init(&ag->kref);
> + init_rwsem(&ag->lock);
> + ag->id = atomic_inc_return(&autogroup_seq_nr);
> + ag->tg = tg;
> + tg->autogroup = ag;
> +
> + return ag;
> +
> +out_free:
> + kfree(ag);
> +out_fail:
> + if (printk_ratelimit())
> + printk(KERN_WARNING "autogroup_create: %s failure.\n",
> + ag ? "sched_create_group()" : "kmalloc()");
> +
> + return autogroup_kref_get(&autogroup_default);
> +}
> +
> +static inline bool
> +task_wants_autogroup(struct task_struct *p, struct task_group *tg)
> +{
> + if (tg != &root_task_group)
> + return false;
> +
> + if (p->sched_class != &fair_sched_class)
> + return false;
> +
> + /*
> + * We can only assume the task group can't go away on us if
> + * autogroup_move_group() can see us on ->thread_group list.
> + */
> + if (p->flags & PF_EXITING)
> + return false;
> +
> + return true;
> +}
> +
> +static inline struct task_group *
> +autogroup_task_group(struct task_struct *p, struct task_group *tg)
> +{
> + int enabled = ACCESS_ONCE(sysctl_sched_autogroup_enabled);
> +
> + if (enabled && task_wants_autogroup(p, tg))
> + return p->signal->autogroup->tg;
> +
> + return tg;
> +}
> +
> +static void
> +autogroup_move_group(struct task_struct *p, struct autogroup *ag)
> +{
> + struct autogroup *prev;
> + struct task_struct *t;
> + unsigned long flags;
> +
> + if (!lock_task_sighand(p, &flags)) {
> + WARN_ON(1);
> + return;
> + }
> +
> + prev = p->signal->autogroup;
> + if (prev == ag) {
> + unlock_task_sighand(p, &flags);
> + return;
> + }
> +
> + p->signal->autogroup = autogroup_kref_get(ag);
> + t = p;
> +
> + do {
> + sched_move_task(t);
> + } while_each_thread(p, t);
> +
> + unlock_task_sighand(p, &flags);
> + autogroup_kref_put(prev);
> +}
> +
> +/* Allocates GFP_KERNEL, cannot be called under any spinlock */
> +void sched_autogroup_create_attach(struct task_struct *p)
> +{
> + struct autogroup *ag = autogroup_create();
> +
> + autogroup_move_group(p, ag);
> + /* drop extra reference added by autogroup_create() */
> + autogroup_kref_put(ag);
> +}
> +EXPORT_SYMBOL(sched_autogroup_create_attach);
> +
> +/* Cannot be called under siglock. Currently has no users */
> +void sched_autogroup_detach(struct task_struct *p)
> +{
> + autogroup_move_group(p, &autogroup_default);
> +}
> +EXPORT_SYMBOL(sched_autogroup_detach);
> +
> +void sched_autogroup_fork(struct signal_struct *sig)
> +{
> + struct task_struct *p = current;
> +
> + spin_lock_irq(&p->sighand->siglock);
> + sig->autogroup = autogroup_kref_get(p->signal->autogroup);
> + spin_unlock_irq(&p->sighand->siglock);
> +}
> +
> +void sched_autogroup_exit(struct signal_struct *sig)
> +{
> + autogroup_kref_put(sig->autogroup);
> +}
> +
> +static int __init setup_autogroup(char *str)
> +{
> + sysctl_sched_autogroup_enabled = 0;
> +
> + return 1;
> +}
> +
> +__setup("noautogroup", setup_autogroup);
> +
> +#ifdef CONFIG_PROC_FS
> +
> +/* Called with siglock held. */
> +int proc_sched_autogroup_set_nice(struct task_struct *p, int *nice)
> +{
> + static unsigned long next = INITIAL_JIFFIES;
> + struct autogroup *ag;
> + int err;
> +
> + if (*nice < -20 || *nice > 19)
> + return -EINVAL;
> +
> + err = security_task_setnice(current, *nice);
> + if (err)
> + return err;
> +
> + if (*nice < 0 && !can_nice(current, *nice))
> + return -EPERM;
> +
> + /* this is a heavy operation taking global locks.. */
> + if (!capable(CAP_SYS_ADMIN) && time_before(jiffies, next))
> + return -EAGAIN;
> +
> + next = HZ / 10 + jiffies;
> + ag = autogroup_kref_get(p->signal->autogroup);
> +
> + down_write(&ag->lock);
> + err = sched_group_set_shares(ag->tg, prio_to_weight[*nice + 20]);
> + if (!err)
> + ag->nice = *nice;
> + up_write(&ag->lock);
> +
> + autogroup_kref_put(ag);
> +
> + return err;
> +}
> +
> +void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m)
> +{
> + struct autogroup *ag = autogroup_kref_get(p->signal->autogroup);
> +
> + down_read(&ag->lock);
> + seq_printf(m, "/autogroup-%ld nice %d\n", ag->id, ag->nice);
> + up_read(&ag->lock);
> +
> + autogroup_kref_put(ag);
> +}
> +#endif /* CONFIG_PROC_FS */
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
> +{
> + return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
> +}
> +#endif /* CONFIG_SCHED_DEBUG */
> +
> +#endif /* CONFIG_SCHED_AUTOGROUP */
> Index: linux-2.6.37.git/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.37.git.orig/kernel/sysctl.c
> +++ linux-2.6.37.git/kernel/sysctl.c
> @@ -382,6 +382,17 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> +#ifdef CONFIG_SCHED_AUTOGROUP
> + {
> + .procname = "sched_autogroup_enabled",
> + .data = &sysctl_sched_autogroup_enabled,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = &zero,
> + .extra2 = &one,
> + },
> +#endif
> #ifdef CONFIG_PROVE_LOCKING
> {
> .procname = "prove_locking",
> Index: linux-2.6.37.git/init/Kconfig
> ===================================================================
> --- linux-2.6.37.git.orig/init/Kconfig
> +++ linux-2.6.37.git/init/Kconfig
> @@ -728,6 +728,18 @@ config NET_NS
>
> endif # NAMESPACES
>
> +config SCHED_AUTOGROUP
> + bool "Automatic process group scheduling"
> + select CGROUPS
> + select CGROUP_SCHED
> + select FAIR_GROUP_SCHED
> + help
> + This option optimizes the scheduler for common desktop workloads by
> + automatically creating and populating task groups. This separation
> + of workloads isolates aggressive CPU burners (like build jobs) from
> + desktop applications. Task group autogeneration is currently based
> + upon task session.
> +
> config MM_OWNER
> bool
>
> Index: linux-2.6.37.git/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.37.git.orig/Documentation/kernel-parameters.txt
> +++ linux-2.6.37.git/Documentation/kernel-parameters.txt
> @@ -1622,6 +1622,8 @@ and is between 256 and 4096 characters.
> noapic [SMP,APIC] Tells the kernel to not make use of any
> IOAPICs that may be present in the system.
>
> + noautogroup Disable scheduler automatic task group creation.
> +
> nobats [PPC] Do not use BATs for mapping kernel lowmem
> on "Classic" PPC cores.
>
>
>
>
>

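The two checks the quoted patch applies when a nice level is written to /proc/<pid>/autogroup, the nice-to-weight lookup through prio_to_weight[] and the HZ/10 rate limit for non-admin writers, can be modelled as below. MODEL_HZ = 100 and the explicit now/is_admin parameters are assumptions that replace jiffies and capable(CAP_SYS_ADMIN) so the logic runs in user space; the security_task_setnice() and can_nice() checks are left out. The table holds CFS's prio_to_weight[] values as I understand them for kernels of this era.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_HZ 100UL  /* assumed tick rate for the model */

/* CFS weight per nice level, indexed by nice + 20. */
static const int prio_to_weight[40] = {
 /* -20 */ 88761, 71755, 56483, 46273, 36291,
 /* -15 */ 29154, 23254, 18705, 14949, 11916,
 /* -10 */  9548,  7620,  6100,  4904,  3906,
 /*  -5 */  3121,  2501,  1991,  1586,  1277,
 /*   0 */  1024,   820,   655,   526,   423,
 /*   5 */   335,   272,   215,   172,   137,
 /*  10 */   110,    87,    70,    56,    45,
 /*  15 */    36,    29,    23,    18,    15,
};

/* Returns the new group weight, or a negative errno. */
static int model_set_nice(int nice, unsigned long now, bool is_admin)
{
	static unsigned long next;  /* mirrors the function-local "next" */

	assert(is_admin || !is_admin); /* both paths share the range check */
	if (nice < -20 || nice > 19)
		return -EINVAL;

	/* time_before(now, next), written wrap-safe like the kernel macro */
	if (!is_admin && (long)(now - next) < 0)
		return -EAGAIN;

	next = now + MODEL_HZ / 10;
	return prio_to_weight[nice + 20];
}
```

As in the posted code, a successful admin write also pushes next forward, so it can delay a following non-admin write.
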
2010-12-03 03:58:27

by Mike Galbraith

Subject: Re: mmotm 2010-11-23 + autogroups -> inconsistent lock state

On Thu, 2010-12-02 at 10:16 -0800, Paul E. McKenney wrote:
> On Wed, Nov 24, 2010 at 01:25:25PM -0700, Mike Galbraith wrote:
>
> > Despite task groups being freed via rcu, update_shares_cpu() hits freed
> > memory and explodes, and nothing I've tried has been able to stop it.
> > The only thing I haven't tried (aside from the right thing;) is to take
> > rcu out of the picture entirely.
>
> Is your new autogroup structure retaining a pointer to memory that
> is freed by RCU?

That turned out to be a typo that left a freed cfs_rq registered. No dark
elves (memory ordering), just a defenseless little typo.

-Mike
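
The reference-counting lifecycle described in the autogroup changelog earlier in this thread (a group is inherited on fork, replaced on setsid(), and destroyed when its last holder exits) can be sketched as a small user-space model. ag_model, sig_model and the model_* functions are hypothetical names; the model collapses the patch's create/get/put sequence in sched_autogroup_create_attach() into one step and ignores locking, task groups and the scheduler entirely.

```c
#include <assert.h>
#include <stdlib.h>

struct ag_model {
	int refcount;
	unsigned long id;
};

/* Stands in for signal_struct: one reference per process. */
struct sig_model { struct ag_model *autogroup; };

static unsigned long ag_seq;   /* like autogroup_seq_nr */
static int ag_destroyed;       /* counts destroyed groups for the demo */

static struct ag_model *ag_get(struct ag_model *ag)
{
	ag->refcount++;
	return ag;
}

static void ag_put(struct ag_model *ag)
{
	assert(ag->refcount > 0);
	if (--ag->refcount == 0) {
		ag_destroyed++;
		free(ag);
	}
}

/* A new group starts with one reference, owned by the caller. */
static struct ag_model *ag_create(void)
{
	struct ag_model *ag = malloc(sizeof(*ag));

	assert(ag != NULL);
	ag->refcount = 1;
	ag->id = ++ag_seq;
	return ag;
}

/* fork(): the child's signal struct inherits the parent's group. */
static void model_fork(struct sig_model *child, const struct sig_model *parent)
{
	child->autogroup = ag_get(parent->autogroup);
}

/* setsid(): attach to a new group, then drop the previous reference. */
static void model_setsid(struct sig_model *sig)
{
	struct ag_model *prev = sig->autogroup;

	sig->autogroup = ag_create(); /* takes over the creation reference */
	ag_put(prev);
}

/* Last thread of a process exits: drop the process's reference. */
static void model_exit(struct sig_model *sig)
{
	ag_put(sig->autogroup);
	sig->autogroup = NULL;
}
```

Dropping the old reference only after the new group is attached mirrors autogroup_move_group(), which puts prev after switching the signal struct over.
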