Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 is currently at
http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
and will later appear at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
Please check kernel.org before using zip.com.au.
- Added the `bk-scsi-target' tree to the -mm lineup. It is managed by James
Bottomley.
- Some enhancements to the ext3 block reservation code here. Please cc
[email protected] on oops reports ;)
- There's a patch here which will cause warnings if a PCI device driver is
removed without having called pci_disable_device(). Please try to cc the
appropriate mailing list or maintainer when reporting any instances.
Changes since 2.6.9-rc1-mm4:
linus.patch
bk-acpi.patch
bk-agpgart.patch
bk-alsa.patch
bk-cpufreq.patch
bk-driver-core.patch
bk-ia64.patch
bk-ieee1394.patch
bk-input.patch
bk-netdev.patch
bk-pci.patch
bk-pnp.patch
bk-power.patch
bk-scsi.patch
bk-scsi-target.patch
bk-usb.patch
bk-watchdog.patch
External trees
-pkt_act-fix.patch
-ksysfs-build-fix.patch
-ppc-build-fix.patch
-ppc64-allow-sd_nodes_per_domain-to-be-overridden.patch
-ppc64-fix-hang-on-oprofile-shutdown.patch
-ppc64-fix-__rw_yield-prototype.patch
-ppc64-be-resilient-against-sysfs-pci-config-accesses.patch
-ppc64-cut-down-paca-footprint.patch
-ppc64-fix-boot-memory-reporting.patch
-ppc64-fix-power5-js20-smp-init.patch
-cleanup-fix-lost-ticks-handling-on-x86-64.patch
-factor-out-common-asm-hardirqh-code.patch
-scsi-qla2xxx-fix-inline-compile-errors.patch
-add-pci_fixup_enable-pass.patch
-cleanup-ptrace-stops-and-remove-notify_parent.patch
-cleanup-ptrace-stops-and-remove-notify_parent-extra.patch
-ptrace-api-preservation.patch
-nix-rusage_group.patch
-i386-syscall-tracing-of-bogus-system-calls.patch
-make-single-step-into-signal-delivery-stop-in-handler.patch
-cdrom-range-fixes.patch
-vsxxxaac-fixups.patch
-disambiguate-espc-clones.patch
-allow-cluster-wide-flock.patch
-allow-cluster-wide-flock-update.patch
-filemap-read-fix.patch
-fix-f_version-optimization-for-get_tgid_list.patch
-kernel-sysfs-events-layer.patch
-centralize-some-nls-helpers.patch
-remove-unused-sysctls-from-kernel-personalityc.patch
-fs-compatc-rwsem-instead-of-bkl-around-ioctl32_hash_table.patch
-small-wait_on_page_writeback_range-optimization.patch
-3w-xxxxc-queue-depth.patch
-md-correct-working_disk-counts-for-raid5-and-raid6.patch
-knfsd-calls-to-break_lease-in-nfsd-should-be-o_nonblocking.patch
-knfsd-return-eacces-instead-of-estale-for-certain-filehandle-lookup-failures.patch
-knfsd-fix-incorrect-indentation-in-fh_verify.patch
-nfsd4-support-acl_support-attribute.patch
-knfsd-trivial-cleanup-of-nfs4statec.patch
-nfsd4-could-leak-a-stateid-in-an-error-path.patch
-nfsd4-postpone-release-of-stateowner-on-close.patch
-nfsd4-store-current-tgid-instead-of-lockowner-hash-in-fl_pid.patch
-knfsd-remove-redundant-initialization-in-nfsd4_lockt.patch
-remove-in-kernel-init_module-cleanup_module-stubs.patch
-remove-ext2_panic.patch
-s390-export-copy_in_user.patch
-s390-minmax-removal-arch-s390-kernel-debugc.patch
-s390-packed-stack-vs-cpu-hotplug.patch
-s390-lcs-multicast-deadlock.patch
-allow-i8042-register-location-override-2.patch
-zlib_inflate-move-zlib_inflatesync-friends.patch
-zlib_inflate-make-zlib_inflate_trees_fixed-generate-the-table.patch
-ppc32-switch-arch-ppc-boot-to-lib-zlib_inflate.patch
-ext3-dreference-of-sb-preceeds-check.patch
-fbdev-speed-up-scrolling-of-tdfxfb.patch
-fbdev-ppc-crash-and-other-fixes-for-rivafb.patch
-fbcon-take-over-console-on-driver-registration.patch
-fbdev-clean-up-framebuffer-initialization.patch
-fbdev-add-module_init-and-fb_get_options-per-driver.patch
-remove-bogus-memset-from-cpqfc-driver.patch
-hpt366-ptr-use-before-null-check.patch
-crypto-teac-xtea_encrypt-should-use-xtea_delta.patch
-aio-dio-oops-fix.patch
-riscom8-build-fix.patch
-use-for_each_cpu-in-oprofile-code.patch
-fix-oprofile-vfree-warning-on-error.patch
-speed-up-oprofile-buffer-drain-code.patch
-speed-up-oprofile-buffer-drain-code-fix.patch
-cdu31a-build-fix.patch
-synclinkc-kernel-janitor-changes.patch
-adfs-add-static.patch
-isofs-add-static.patch
-correct-elf-section-used-for-out-of-line-spinlocks.patch
-tsc-synchronisation-cleanup.patch
-add-static-in-affs.patch
-add-static-in-afs.patch
-add-static-in-befs.patch
-codemercs-io-warrior-support.patch
-fat-use-hlist_head-for-fat_inode_hashtable-1-4.patch
-fat-rewrite-the-cache-for-file-allocation-table-lookup.patch
-fat-cache-lock-from-per-sb-to-per-inode-3-4.patch
-fat-the-inode-hash-from-per-module-to-per-sb-4-4.patch
-uml-avoid-using-elv_queue_empty.patch
-uml-avoid-forcing-use-of-the-no-op-scheduler.patch
-uml-correct-the-failure-path-in-start_io_thread.patch
-fix-address_spacei_mmap-comment.patch
-remove-mod_incdec_use_count-users-that-got-back-in.patch
-dont-mention-mod_incdec_use_count-in-documentation.patch
Merged
+remove-set_fs-from-compat-sched-affinity-syscalls.patch
Remove the set_fs hack in the compat affinity calls.
+allow-compat-long-sized-bitmasks-in-affinity-code.patch
compat_sys_sched_getaffinity() fix
+fix-schedstats-null-deref-in-sched_exec.patch
Fix an oops
+rock-fix.patch
Fix the rock.c driver
-es7000-subarch-update.patch
+2681-es7000-subarch-update.patch
New es7000 update
+exec-fix-posix-timers-leak-and-pending-signal-loss.patch
Fix some leaks
+fix-abi-in-set_mempolicy.patch
Fix up the numa memory policy stuff
+ksysfs-warning-fix.patch
+kobject_uevent-warning-fix.patch
+fix-smm-failures-on-e750x-systems.patch
+vsxxxaac-fixups.patch
+allow-i8042-register-location-override-2.patch
+tmscsim-build-fix.patch
Various fixes for various people's bk trees
+swsusp-documentation-update.patch
+small-cleanups-for-swsusp.patch
+swsusp-kill-crash-when-too-much-memory-is-free.patch
+swsusp-progress-in-percent.patch
+swsusp-clean-up-reading.patch
+swsusp-another-simplification.patch
+acpi-proc-simplify-error-handling.patch
swsusp stuff
+ppc64-lparcfg-fixes-for-processor-counts.patch
+ppc64-lparcfg-whitespace-and-wordwrap-cleanup.patch
+ppc64-remove-spinline-config-option.patch
+ppc64-rtas-error-logs-can-appear-twice-in-dmesg.patch
+ppc64-enable-numa-api.patch
+ppc64-give-the-kernel-an-opd-section.patch
+ppc64-use-nm-synthetic-where-available.patch
+ppc64-clean-up-kernel-command-line-code.patch
+ppc64-remove-unused-ppc64_calibrate_delay.patch
+ppc64-remove-eeh-command-line-device-matching-code.patch
+ppc64-use-early_param.patch
+ppc64-restore-smt-enabled=off-kernel-command-line-option.patch
+ppc64-enable-power5-low-power-mode-in-idle-loop.patch
+ppc64-clean-up-idle-loop-code.patch
+ppc64-remove-wno-uninitialized.patch
+ppc64-fix-real-bugs-uncovered-by-wno-uninitialized-removal.patch
+ppc64-fix-spurious-warnings-uncovered-by-wno-uninitialized-removal.patch
+hvc-uninitialised-variable.patch
+ppc64-improved-vsid-allocation-algorithm.patch
+ppc64fix-missing-register-in-altivec-context-switch.patch
+ppc32-remove-wno-uninitialized.patch
+ppc32-pmac-cpufreq-for-ibook-2-600.patch
ppc[64] updates
-pid_max-fix.patch
Dropped - wli fixed this by other means.
+lockmeter.patch
Repaired lockmeter patch
+ext3-reservations-spelling-fixes.patch
+ext3-reservations-renumber-the-ext3-reservations-ioctls.patch
+ext3-reservations-remove-unneeded-declaration.patch
+ext3-reservations-turn-ext3-per-sb-reservations-list-into-an-rbtree.patch
+ext3-reservations-split-the-reserve_window-struct-into-two.patch
+ext3-reservations-smp-protect-the-reservation-during-allocation.patch
ext3 block reservation enhancements: fix a few things and use an rbtree
+sched-trivial-sched-changes.patch
+sched-add-cpu_down_prepare-notifier.patch
+sched-integrate-cpu-hotplug-and-sched-domains.patch
+sched-arch_destroy_sched_domains-warning-fix.patch
+sched-sched-add-load-balance-flag.patch
+sched-remove-disjoint-numa-domains-setup.patch
+sched-make-domain-setup-overridable.patch
+sched-make-domain-setup-overridable-rename.patch
+sched-ia64-add-disjoint-numa-domain-support.patch
+sched-fix-domain-debug-for-isolcpus.patch
+sched-enable-sd_load_balance.patch
+sched-hotplug-add-a-cpu_down_failed-notifier.patch
+sched-use-cpu_down_failed-notifier.patch
+sched-fixes-for-ia64-domain-setup.patch
CPU scheduler work.
-journal_clean_checkpoint_list-latency-fix.patch
-filemap_sync-latency-fix.patch
-pty_write-latency-fix.patch
Dropped these scheduling latency changes - let's see what Ingo's ones look
like
propagate-pci_enable_device-errors.patch
+move-syscall-declarations-from-linux-keyh-2.patch
+make-key-management-use-syscalls-not-prctls-build-fix.patch
Key management code updates
+cachefs-return-the-right-error-upon-invalid-mount.patch
+remove-error-from-linux-cachefsh.patch
+cachefs-warning-fix-2.patch
+cachefs-linkage-fix-2.patch
-cachefs-linkage-fix.patch
Various updates to cachefs
+cpusets-display-allowed-masks-in-proc-status.patch
+cpusets-simplify-cpus_allowed-setting-in-attach.patch
+cpusets-remove-useless-validation-check.patch
+cpusets-fix-possible-race-in-cpuset_tasks_read.patch
+cpusets-interoperate-with-hotplug-online-maps.patch
cpusets fixes/updates
+stop-reiser4-from-turning-itself-on-by-default.patch
reiser4 Kconfig fix
+kallsyms-fix-sparc-gibberish.patch
Fix endianness in the new kallsyms handling code
+m32r-update-for-profiling.patch
+m32r-update-zone_sizes_init.patch
+m32r-update-to-fix-compile-errors.patch
+m32r-update-uaccessh.patch
+m32r-update-checksum-functions.patch
+m32r-update-cf-pcmcia-drivers.patch
+m32r-update-headers-to-remove-useless-ibcs2-support-code.patch
+atomic_inc_return-for-m32r-re.patch
m32r updates
-lighten-mmlist_lock.patch
Dropped - Hugh had second thoughts
+misrouted-irq-recovery-take-2-fix.patch
+misrouted-irq-recovery-docs.patch
Smarter workarounds for ia32 IRQ routing problems
+cfq-iosched-v2.patch
Major revamp of the CFQ IO scheduler
+dont-export-blkdev_open-and-def_blk_ops.patch
+remove-dead-code-from-fs-mbcachec.patch
+remove-posix_acl_masq_nfs_mode.patch
+make-kmem_find_general_cachep-static-in-slabc.patch
+dont-export-shmem_file_setup.patch
+remove-pm_find-unexport-pm_send.patch
+remove-dead-code-and-exports-from-signalc.patch
+mark-md_interrupt_thread-static.patch
+unexport-proc_sys_root.patch
+mark-dq_list_lock-static.patch
+unexport-is_subdir-and-shrink_dcache_anon.patch
+unexport-devfs_mk_symlink.patch
+unexport-do_execve-do_select.patch
+unexport-exit_mm.patch
+unexport-files_lock-and-put_filp.patch
+remove-exports-from-audit-code.patch
+unexport-f_delown.patch
+unexport-lookup_create.patch
+remove-wake_up_all_sync.patch
+remove-set_fs_root-set_fs_pwd.patch
Little fixes/cleanups
+add-prctl-to-modify-current-comm.patch
Allow current->comm to be modified via prctl()
+md-remove-md_flush_all.patch
+md-make-retry_list-non-global-in-raid1-and-multipath.patch
+md-rationalise-issue_flush-function-in-md-personalities.patch
+md-rationalise-unplug-functions-in-md.patch
+md-make-sure-md-always-uses-rdev_dec_pending-properly.patch
+md-fix-two-little-bugs-in-raid10.patch
+md-modify-locking-when-accessing-subdevices-in-md.patch
RAID update
+blk-max_sectors-tunables.patch
Make the per-queue max_sectors tunable for latency purposes
+generic-acl-support-for-permission.patch
+generic-acl-support-for-permission-fix.patch
+generic-acl-support-for-permission-keyfs-fix.patch
ACL code consolidation
+device-driver-for-the-sgi-system-clock-mmtimer.patch
New device driver
+rtl8150-fix.patch
Net driver fix
+close-race-with-preempt-and-modular-pm_idle-callbacks.patch
Fix a PM idle-handler race
+cacheline-align-pagevec-structure.patch
Fine-tune the pagevec size
+hvcs-fix-to-replace-yield-with-tty_wait_until_sent-in.patch
HVCS driver fix
+fbdev-remove-unnecessary-banshee_wait_idle-from-tdfxfb.patch
+fbdev-fix-logo-drawing-failure-for-vga16fb.patch
+fbdev-initialize-i810fb-after-agpgart.patch
+fbdev-fix-userland-compile-breakage.patch
+fbcon-fix-setup-boot-options-of-fbcon.patch
+fbdev-pass-struct-device-to-class_simple_device_add.patch
+fbdev-add-tile-blitting-support.patch
fbdev update
+fix-for-spurious-interrupts-on-e100-resume.patch
e100 PM resume workaround
+r8169-miscalculation-of-available-tx-descriptors.patch
+r8169-hint-for-tx-flow-control.patch
+r8169-tso-support.patch
+r8169-mac-identifier-extracted-from-realteks-driver-v22.patch
net driver update
+uml-remove-ghash.patch
+uml-eliminate-useless-thread-field.patch
+uml-fix-scheduler-race.patch
+uml-fix-binary-layout-assumption.patch
+uml-disable-pending-signals-across-a-reboot.patch
+uml-refer-to-config_usermode-not-to-config_um.patch
+uml-remove-commented-old-code-in-kconfig.patch
+uml-smp-build-fix.patch
+uml-remove-config_uml_smp.patch
UML updates
+highmem-flushes.patch
Missing dcache flushes in the bounce buffering code
+add-support-for-word-length-uart-registers.patch
Serial driver fix
+compile-fix-3c59x-for-eisa-without-pci.patch
Net driver build fix
+atomic_inc_return-for-i386.patch
+atomic_inc_return-for-x86_64.patch
+atomic_inc_return-for-arm.patch
+atomic_inc_return-for-arm26.patch
+atomic_inc_return-for-sparc64.patch
atomic_inc_return() for various architectures
+fix-uninitialized-warnings-in-mempolicyc.patch
Warning fixes
+online-cpu-with-maxcpus-option-panics.patch
Fix a crash with maxcpus=
+remove-dead-exports-from-fs-fat.patch
+fat-use-hlist_head-for-fat_inode_hashtable-1-6.patch
+fat-rewrite-the-cache-for-file-allocation-table-lookup.patch
+fat-cache-lock-from-per-sb-to-per-inode-3-6.patch
+fat-the-inode-hash-from-per-module-to-per-sb-4-6.patch
+fat-fix-the-race-bitween-fat_free-and-fat_get_cluster.patch
+fat-remove-debug_pr-6-6.patch
fatfs update
+small-linux-hardirqh-tweaks.patch
hardirq.h fixes
+bsd-disklabel-handle-more-than-8-partitions.patch
Fix BSD disklabels
+asm-softirqh-crept-back-in-h8300-and-sh64.patch
Remove unneeded files (again)
+mark-amiflop-non-unloadable.patch
amiflop.c fixlet
+thinkpad-fnfx-key-driver.patch
Thinkpad function key fixes
+netpoll-endian-fixes.patch
netpoll fixes on big-endian
+rewrite-alloc_pidmap.patch
Clean up alloc_pidmap()
+missing-pci_disable_device.patch
Add a warning to check that drivers have called pci_disable_device() (Uses
CONFIG_DEBUG_KERNEL, and shouldn't).
+fbdev-radeonfb-remove-bogus-radeonfb_read-write.patch
radeonfb fix
+add-missing-pci_disable_device-for-pci-based-usb-hcd.patch
+add-missing-pci_disable_device-for-e1000.patch
Add pci_disable_device() to a couple of drivers
+next_thread-bug-fixes.patch
Remove some suspect BUG()s from next_thread().
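
The add-prctl-to-modify-current-comm.patch entry above lets a task rename
itself. As a rough userspace sketch only: it assumes the interface matches the
PR_SET_NAME option documented in prctl(2), and it reads the name back with
PR_GET_NAME, which is a later mainline addition and not part of this patch.
The helper names set_comm() and get_comm() are made up for illustration.

```python
import ctypes
import os

# Option numbers as defined in mainline <linux/prctl.h>; assumed to match
# what the -mm patch exposes.
PR_SET_NAME = 15
PR_GET_NAME = 16
TASK_COMM_LEN = 16  # size of current->comm, including the trailing NUL

# Load the C library of the running process so we can call prctl().
_libc = ctypes.CDLL(None, use_errno=True)


def set_comm(name: str) -> None:
    """Set the calling thread's comm; longer names are cut to 15 bytes."""
    raw = name.encode()[:TASK_COMM_LEN - 1]
    buf = ctypes.create_string_buffer(raw, TASK_COMM_LEN)
    rc = _libc.prctl(PR_SET_NAME, buf,
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_comm() -> str:
    """Read the calling thread's comm back (needs PR_GET_NAME support)."""
    buf = ctypes.create_string_buffer(TASK_COMM_LEN)
    rc = _libc.prctl(PR_GET_NAME, buf,
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.value.decode()


if __name__ == "__main__":
    set_comm("mm5-demo")
    print(get_comm())
```

The new name is what tools such as ps and top report for the task. Truncating
in userspace just mirrors the fixed size of the comm field.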
number of patches in -mm: 432
number of changesets in external trees: 554
number of patches in -mm only: 416
total patches: 970
All 432 patches:
linus.patch
remove-set_fs-from-compat-sched-affinity-syscalls.patch
Remove set_fs() from compat sched affinity syscalls
allow-compat-long-sized-bitmasks-in-affinity-code.patch
Allow compat long sized bitmasks in affinity code
distinct-tgid-tid-cpu-usage.patch
distinct tgid/tid CPU usage
fix-schedstats-null-deref-in-sched_exec.patch
fix schedstats null deref in sched_exec
rock-fix.patch
rock.c: fix double-kfree()
2681-es7000-subarch-update.patch
ES7000 subarch update
exec-fix-posix-timers-leak-and-pending-signal-loss.patch
exec: fix posix-timers leak and pending signal loss
fix-abi-in-set_mempolicy.patch
Fix ABI in set_mempolicy()
__set_page_dirty_nobuffers-mappings.patch
__set_page_dirty_nobuffers mappings
sysfs-backing-store-prepare-file_operations.patch
sysfs backing store - prepare sysfs_file_operations helpers
sysfs-backing-store-prepare-file_operations-fix.patch
fix oops with firmware loading
sysfs-backing-store-add-sysfs_dirent.patch
sysfs backing store - add sysfs_dirent structure
sysfs-backing-store-use-sysfs_dirent-tree-in-removal.patch
sysfs backing store: use sysfs_dirent based tree in file removal
sysfs-backing-store-use-sysfs_dirent-tree-in-dir-file_operations.patch
sysfs backing store: use sysfs_dirent based tree in dir file operations
sysfs-backing-store-stop-pinning-dentries-inodes-for-leaves.patch
sysfs backing store: stop pinning dentries/inodes for leaf entries
bk-acpi.patch
acpi-compile-fix.patch
acpi-compile-fix
acpi-x86_64-build-fix.patch
acpi x86_64 build fix
bk-agpgart.patch
bk-alsa.patch
bk-cpufreq.patch
bk-driver-core.patch
ksysfs-warning-fix.patch
ksysfs warning fix
kobject_uevent-warning-fix.patch
kobject_uevent warning fix
bk-ia64.patch
bk-ieee1394.patch
bk-input.patch
fix-smm-failures-on-e750x-systems.patch
fix SMM failures on E750x systems
vsxxxaac-fixups.patch
vsxxxaa.c fixups
allow-i8042-register-location-override-2.patch
allow i8042 register location override #2
bk-netdev.patch
bk-pci.patch
bk-pnp.patch
bk-power.patch
bk-scsi.patch
bk-scsi-target.patch
tmscsim-build-fix.patch
tmscsim-build-fix
bk-usb.patch
bk-watchdog.patch
mm.patch
add -mmN to EXTRAVERSION
mm-swsusp-make-sure-we-do-not-return-to-userspace-where-image-is-on-disk.patch
-mm swsusp: make sure we do not return to userspace where image is on disk
mm-swsusp-copy_page-is-harmfull.patch
-mm swsusp: copy_page is harmfull
swsusp-fix-highmem.patch
swsusp: fix highmem
swsusp-do-not-disable-platform-swsusp-because-s4bios-is-available.patch
swsusp: do not disable platform swsusp because S4bios is available
swsusp-fix-default-powerdown-mode.patch
swsusp: fix default powerdown mode
mark-old-power-managment-as-deprecated-and-clean-it-up.patch
Mark old power managment as deprecated and clean it up
use-global-system_state-to-avoid-system-state-confusion.patch
Use global system_state to avoid system-state confusion
swsusp-error-do-not-oops-after-allocation-failure.patch
swsusp: do not oops after allocation failure
swsusp-documentation-update.patch
swsusp: Documentation update
small-cleanups-for-swsusp.patch
Small cleanups for swsusp
swsusp-kill-crash-when-too-much-memory-is-free.patch
swsusp: kill crash when too much memory is free
swsusp-progress-in-percent.patch
swsusp: progress in percent
swsusp-clean-up-reading.patch
swsusp: clean up reading
swsusp-another-simplification.patch
swsusp: another simplification
acpi-proc-simplify-error-handling.patch
acpi proc: simplify error handling
pegasus-fixes.patch
pegasus.c fixes
pointer-dereference-before-null-check-in-acpi-thermal-driver.patch
Pointer dereference before NULL check in ACPI thermal driver
network-packet-tracer-module-using-kprobes-interface.patch
Network packet tracer module using kprobes interface.
kgdb-ga.patch
kgdb stub for ia32 (George Anzinger's one)
kgdb: warning fix
kgdb buffer overflow fix
kgdb: warning fix
kgdb: CONFIG_DEBUG_INFO fix
x86_64 fixes
correct kgdb.txt Documentation link (against 2.6.1-rc1-mm2)
kgdb: fix for recent gcc
kgdb warning fixes
THREAD_SIZE fixes for kgdb
Fix stack overflow test for non-8k stacks
kgdb-ga.patch fix for i386 single-step into sysenter
fix TRAP_BAD_SYSCALL_EXITS on i386
add TRAP_BAD_SYSCALL_EXITS config for i386
kgdb-is-incompatible-with-kprobes.patch
kgdb-is-incompatible-with-kprobes
kgdboe-netpoll.patch
kgdb-over-ethernet via netpoll
kgdboe: fix configuration of MAC address
kgdb-x86_64-support.patch
kgdb-x86_64-support.patch for 2.6.2-rc1-mm3
kgdb-x86_64-warning-fixes
kgdb-ia64-support.patch
IA64 kgdb support
ia64 kgdb repair and cleanup
ia64 kgdb fix
kgdb-ia64-fixes.patch
kgdb: ia64 fixes
make-tree_lock-an-rwlock.patch
make mapping->tree_lock an rwlock
must-fix.patch
must fix lists update
must fix list update
mustfix update
must-fix update
mustfix lists
ppc64-lparcfg-fixes-for-processor-counts.patch
ppc64: lparcfg fixes for processor counts
ppc64-lparcfg-whitespace-and-wordwrap-cleanup.patch
ppc64: lparcfg whitespace and wordwrap cleanup.
ppc64-remove-spinline-config-option.patch
ppc64: remove SPINLINE config option
ppc64-rtas-error-logs-can-appear-twice-in-dmesg.patch
ppc64: RTAS error logs can appear twice in dmesg
ppc64-enable-numa-api.patch
ppc64: Enable NUMA API
ppc64-give-the-kernel-an-opd-section.patch
ppc64: give the kernel an OPD section
ppc64-use-nm-synthetic-where-available.patch
ppc64: use nm --synthetic where available
ppc64-clean-up-kernel-command-line-code.patch
ppc64: clean up kernel command line code
ppc64-remove-unused-ppc64_calibrate_delay.patch
ppc64: remove unused ppc64_calibrate_delay
ppc64-remove-eeh-command-line-device-matching-code.patch
ppc64: remove EEH command line device matching code
ppc64-use-early_param.patch
ppc64: use early_param
ppc64-restore-smt-enabled=off-kernel-command-line-option.patch
ppc64: restore smt-enabled=off kernel command line option
ppc64-enable-power5-low-power-mode-in-idle-loop.patch
ppc64: enable POWER5 low power mode in idle loop
ppc64-clean-up-idle-loop-code.patch
ppc64: clean up idle loop code
ppc64-remove-wno-uninitialized.patch
ppc64: remove -Wno-uninitialized
ppc64-fix-real-bugs-uncovered-by-wno-uninitialized-removal.patch
ppc64: Fix real bugs uncovered by -Wno-uninitialized removal
ppc64-fix-spurious-warnings-uncovered-by-wno-uninitialized-removal.patch
ppc64: Fix spurious warnings uncovered by -Wno-uninitialized removal
hvc-uninitialised-variable.patch
hvc: uninitialised variable
ppc64-improved-vsid-allocation-algorithm.patch
ppc64: improved VSID allocation algorithm
ppc64fix-missing-register-in-altivec-context-switch.patch
ppc64: fix missing register in altivec context switch
ppc32-remove-wno-uninitialized.patch
ppc32: remove -Wno-uninitialized
ppc32-pmac-cpufreq-for-ibook-2-600.patch
ppc32: pmac cpufreq for ibook 2 600
lazy-tsss-i-o-bitmap-copy-for-x86-64.patch
lazy TSS's I/O bitmap copy for x86-64
lazy-tsss-i-o-bitmap-copy-for-x86-64-fix.patch
lazy-tsss-i-o-bitmap-copy-for-x86-64-fix
ppc64-reloc_hide.patch
invalidate_inodes-speedup.patch
invalidate_inodes speedup
more invalidate_inodes speedup fixes
dev-mem-restriction-patch.patch
/dev/mem restriction patch
get_user_pages-handle-VM_IO.patch
fix get_user_pages() against mappings of /dev/mem
jbd-remove-livelock-avoidance.patch
JBD: remove livelock avoidance code in journal_dirty_data()
journal_add_journal_head-debug.patch
journal_add_journal_head-debug
list_del-debug.patch
list_del debug check
lockmeter.patch
lockmeter
ia64 CONFIG_LOCKMETER fix
lockmeter-build-fix
lockmeter for x86_64
unplug-can-sleep.patch
unplug functions can sleep
firestream-warnings.patch
firestream warnings
ext3_rsv_cleanup.patch
ext3 block reservation patch set -- ext3 preallocation cleanup
ext3_rsv_base.patch
ext3 block reservation patch set -- ext3 block reservation
ext3 reservations: fix performance regression
ext3 block reservation patch set -- mount and ioctl feature
ext3 block reservation patch set -- dynamically increase reservation window
ext3 reservation ifdef cleanup patch
ext3 reservation max window size check patch
ext3 reservation file ioctl fix
ext3-reservation-default-on.patch
ext3 reservation: default to on
ext3-lazy-discard-reservation-window-patch.patch
ext3 lazy discard reservation window patch
ext3 discard reservation in last iput fix patch
Fix lazy reservation discard
ext3 reservations: bad_inode fix
ext3 reservation discard race fix
ext3-reservations-spelling-fixes.patch
ext3 reservations: Spelling fixes
ext3-reservations-renumber-the-ext3-reservations-ioctls.patch
ext3 reservations: Renumber the ext3 reservations ioctls
ext3-reservations-remove-unneeded-declaration.patch
ext3 reservations: Remove unneeded declaration.
ext3-reservations-turn-ext3-per-sb-reservations-list-into-an-rbtree.patch
ext3 reservations: Turn ext3 per-sb reservations list into an rbtree.
ext3-reservations-split-the-reserve_window-struct-into-two.patch
ext3 reservations: Split the "reserve_window" struct into two
ext3-reservations-smp-protect-the-reservation-during-allocation.patch
ext3 reservations: SMP-protect the reservation during allocation
tty_io-hangup-locking.patch
tty_io.c hangup locking
perfctr-core.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][1/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: core
CONFIG_PERFCTR=n build fix
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][6/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: misc
perfctr-i386.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][2/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: i386
perfctr #if/#ifdef cleanup
perfctr Dothan support
perfctr x86_tests build fix
perfctr x86 init bug
perfctr: K8 fix for internal benchmarking code
perfctr x86 update
perfctr-prescott-fix.patch
Prescott fix for perfctr
perfctr-x86_64.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][3/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: x86_64
perfctr-ppc.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][4/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: PowerPC
perfctr ppc32 update
perfctr update 4/6: PPC32 cleanups
perfctr ppc32 buglet fix
perfctr-virtualised-counters.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][5/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: virtualised counters
perfctr update 6/6: misc minor cleanups
perfctr update 3/6: __user annotations
perfctr-cpus_complement-fix
perfctr cpumask cleanup
perfctr SMP hang fix
make-perfctr_virtual-default-in-kconfig-match-recommendation.patch
Make PERFCTR_VIRTUAL default in Kconfig match recommendation in help text
perfctr-ifdef-cleanup.patch
perfctr ifdef cleanup
perfctr-update-2-6-kconfig-related-updates.patch
perfctr update 2/6: Kconfig-related updates
perfctr-update-5-6-reduce-stack-usage.patch
perfctr update 5/6: reduce stack usage
perfctr-low-level-documentation.patch
perfctr low-level documentation
perfctr documentation update
perfctr-inheritance-1-3-driver-updates.patch
perfctr inheritance 1/3: driver updates
perfctr inheritance illegal sleep bug
perfctr-inheritance-2-3-kernel-updates.patch
perfctr inheritance 2/3: kernel updates
perfctr-inheritance-3-3-documentation-updates.patch
perfctr inheritance 3/3: documentation updates
perfctr-inheritance-locking-fix.patch
perfctr inheritance locking fix
ext3-online-resize-patch.patch
ext3: online resizing
ext3-online-resize-warning-fix
sched-trivial-sched-changes.patch
sched: trivial sched changes
sched-add-cpu_down_prepare-notifier.patch
sched: add CPU_DOWN_PREPARE notifier
sched-integrate-cpu-hotplug-and-sched-domains.patch
sched: integrate cpu hotplug and sched domains
sched-arch_destroy_sched_domains-warning-fix.patch
sched: arch_destroy_sched_domains warning fix
sched-sched-add-load-balance-flag.patch
sched: sched add load balance flag
sched-remove-disjoint-numa-domains-setup.patch
sched: remove disjoint NUMA domains setup
sched-make-domain-setup-overridable.patch
sched: make domain setup overridable
sched-make-domain-setup-overridable-rename.patch
sched-make-domain-setup-overridable: rename IDLE
sched-ia64-add-disjoint-numa-domain-support.patch
sched: IA64 add disjoint NUMA domain support
sched-fix-domain-debug-for-isolcpus.patch
sched: fix domain debug for isolcpus
sched-enable-sd_load_balance.patch
sched: enable SD_LOAD_BALANCE
sched-hotplug-add-a-cpu_down_failed-notifier.patch
sched: hotplug add a CPU_DOWN_FAILED notifier
sched-use-cpu_down_failed-notifier.patch
sched: use CPU_DOWN_FAILED notifier
sched-fixes-for-ia64-domain-setup.patch
sched: fixes for ia64 domain setup
nicksched.patch
nicksched
nicksched-sched_fifo-fix.patch
nicksched: SCHED_FIFO fix
sched-smtnice-fix.patch
sched: SMT nice fix
ext3_bread-cleanup.patch
ext3_bread() cleanup
pcmcia-implement-driver-model-support.patch
pcmcia: implement driver model support
pcmcia-update-network-drivers.patch
pcmcia: update network drivers
pcmcia-update-wireless-drivers.patch
pcmcia: update wireless drivers
pcmcia-fix-eject-lockup.patch
pcmcia: fix eject lockup
pcmcia-add-hotplug-support.patch
pcmcia: add *hotplug support
linux-2.6.8.1-49-rpc_workqueue.patch
nfs: RPC: Convert rpciod into a work queue for greater flexibility
linux-2.6.8.1-50-rpc_queue_lock.patch
nfs: RPC: Remove the rpc_queue_lock global spinlock
dvdrw-support-for-267-bk13.patch
DVD+RW support for 2.6.7-bk13
packet-writing-credits.patch
packet-writing: add credits
cdrw-packet-writing-support-for-267-bk13.patch
CDRW packet writing support
packet: remove #warning
packet writing: door unlocking fix
pkt_lock_door() warning fix
Fix race in pktcdvd kernel thread handling
Fix open/close races in pktcdvd
packet writing: review fixups
Remove pkt_dev from struct pktcdvd_device
packet writing: convert to seq_file
dvd-rw-packet-writing-update.patch
Packet writing support for DVD-RW and DVD+RW discs.
Get blockdev size right in pktcdvd after switching discs
packet-writing-docco.patch
packet writing documentation
Trivial CDRW packet writing doc update
control-pktcdvd-with-an-auxiliary-character-device.patch
Control pktcdvd with an auxiliary character device
Subject: Re: 2.6.8-rc2-mm2
control-pktcdvd-with-an-auxiliary-character-device-fix
simplified-request-size-handling-in-cdrw-packet-writing.patch
Simplified request size handling in CDRW packet writing
fix-setting-of-maximum-read-speed-in-cdrw-packet-writing.patch
Fix setting of maximum read speed in CDRW packet writing
packet-writing-reporting-fix.patch
Packet writing reporting fixes
speed-up-the-cdrw-packet-writing-driver.patch
Speed up the cdrw packet writing driver
packet-writing-avoid-bio-hackery.patch
packet writing: avoid BIO hackery
cdrom-buffer-size-fix.patch
cdrom: buffer sizing fix
cpufreq-driver-for-nforce2-kernel-267.patch
cpufreq driver for nForce2
allow-modular-ide-pnp.patch
allow modular ide-pnp
create-nodemask_t.patch
Create nodemask_t
nodemask fix
nodemask build fix
b44-add-47xx-support.patch
b44: add 47xx support
allow-x86_64-to-reenable-interrupts-on-contention.patch
Allow x86_64 to reenable interrupts on contention
serial-cs-and-unusable-port-size-ranges.patch
serial-cs and unusable port size ranges
add-support-for-it8212-ide-controllers.patch
Add support for IT8212 IDE controllers
i386-hotplug-cpu.patch
i386 Hotplug CPU
hotplug-cpu-fix-apic-queued-timer-vector-race.patch
Hotplug cpu: Fix APIC queued timer vector race
hotplug-cpu-move-cpu_online_map-clear-to-__cpu_disable.patch
Hotplug cpu: Move cpu_online_map clear to __cpu_disable
igxb-speedup.patch
igxb speedup
serialize-access-to-ide-devices.patch
serialize access to ide devices
remove-unconditional-pci-acpi-irq-routing.patch
remove unconditional PCI ACPI IRQ routing
propagate-pci_enable_device-errors.patch
propagate pci_enable_device() errors
disable-atykb-warning.patch
disable atykb "too many keys pressed" warning
add-some-key-management-specific-error-codes.patch
Add some key management specific error codes
keys-new-error-codes-for-alpha-mips-pa-risc-sparc-sparc64.patch
keys: new error codes for Alpha, MIPS, PA-RISC, Sparc & Sparc64
implement-in-kernel-keys-keyring-management.patch
implement in-kernel keys & keyring management
keys build fix
keys & keyring management update patch
implement-in-kernel-keys-keyring-management-update-build-fix
implement-in-kernel-keys-keyring-management-update-build-fix-2
key management patch cleanup
make-key-management-code-use-new-the-error-codes.patch
Make key management code use the new error codes
keys-permission-fix.patch
keys: permission fix
keys-keyring-management-keyfs-patch.patch
keys & keyring management: keyfs patch
keyfs-build-fix.patch
keyfs build fix
implement-in-kernel-keys-keyring-management-afs-workaround.patch
implement-in-kernel-keys-keyring-management afs workaround
support-supplementary-information-for-request-key.patch
Support supplementary information for request-key
make-key-management-use-syscalls-not-prctls.patch
Make key management use syscalls not prctls
move-syscall-declarations-from-linux-keyh-2.patch
Move syscall declarations from linux/key.h #2
make-key-management-use-syscalls-not-prctls-build-fix.patch
make-key-management-use-syscalls-not-prctls build fix
export-file_ra_state_init-again.patch
Export file_ra_state_init() again
cachefs-filesystem.patch
CacheFS filesystem
cachefs-return-the-right-error-upon-invalid-mount.patch
CacheFS: return the right error upon invalid mount
remove-error-from-linux-cachefsh.patch
Remove #error from linux/cachefs.h
cachefs-warning-fix-2.patch
cachefs warning fix 2
cachefs-linkage-fix-2.patch
cachefs linkage fix
cachefs-build-fix.patch
cachefs build fix
cachefs-documentation.patch
CacheFS documentation
add-page-becoming-writable-notification.patch
Add page becoming writable notification
provide-a-filesystem-specific-syncable-page-bit.patch
Provide a filesystem-specific sync'able page bit
provide-a-filesystem-specific-syncable-page-bit-fix.patch
provide-a-filesystem-specific-syncable-page-bit-fix
make-afs-use-cachefs.patch
Make AFS use CacheFS
ide-probe.patch
ide probe
268-rc3-jffs2-unable-to-read-filesystems.patch
jffs2 unable to read filesystems
qlogic-isp2x00-remove-needless-busyloop.patch
QLogic ISP2x00: remove needless busyloop
jffs2-mount-options-discarded.patch
JFFS2 mount options discarded
assign_irq_vector-section-fix.patch
assign_irq_vector __init section fix
find_isa_irq_pin-should-not-be-__init.patch
find_isa_irq_pin should not be __init
kexec-i8259-shutdowni386.patch
kexec: i8259-shutdown.i386
kexec-i8259-shutdown-x86_64.patch
kexec: x86_64 i8259 shutdown
kexec-apic-virtwire-on-shutdowni386patch.patch
kexec: apic-virtwire-on-shutdown.i386.patch
kexec-apic-virtwire-on-shutdownx86_64.patch
kexec: apic-virtwire-on-shutdown.x86_64
kexec-ioapic-virtwire-on-shutdowni386.patch
kexec: ioapic-virtwire-on-shutdown.i386
kexec-ioapic-virtwire-on-shutdownx86_64.patch
kexec: ioapic-virtwire-on-shutdown.x86_64
kexec-e820-64bit.patch
kexec: e820-64bit
kexec-kexec-generic.patch
kexec: kexec-generic
kexec-machine_shutdownx86_64.patch
kexec: machine_shutdown.x86_64
kexec-kexecx86_64.patch
kexec: kexec.x86_64
kexec-machine_shutdowni386.patch
kexec: machine_shutdown.i386
kexec-kexeci386.patch
kexec: kexec.i386
kexec-use_mm.patch
kexec: use_mm
kexec-kexecppc.patch
kexec: kexec.ppc
kexec-ppc-kexec-kconfig-misplacement.patch
kexec ppc KEXEC Kconfig misplacement
new-bitmap-list-format-for-cpusets.patch
new bitmap list format (for cpusets)
cpusets-big-numa-cpu-and-memory-placement.patch
cpusets - big numa cpu and memory placement
cpusets-dont-export-proc_cpuset_operations.patch
Cpusets - Dont export proc_cpuset_operations
cpusets-display-allowed-masks-in-proc-status.patch
cpusets: display allowed masks in proc status
cpusets-simplify-cpus_allowed-setting-in-attach.patch
cpusets: simplify cpus_allowed setting in attach
cpusets-remove-useless-validation-check.patch
cpusets: remove useless validation check
cpusets-config_cpusets-depends-on-smp.patch
Cpusets: CONFIG_CPUSETS depends on SMP
cpusets-tasks-file-simplify-format-fixes.patch
Cpusets tasks file: simplify format, fixes
cpusets-fix-possible-race-in-cpuset_tasks_read.patch
cpusets: fix possible race in cpuset_tasks_read()
cpusets-simplify-memory-generation.patch
Cpusets: simplify memory generation
cpusets-interoperate-with-hotplug-online-maps.patch
cpusets: interoperate with hotplug online maps
reiser4-sb_sync_inodes.patch
reiser4: vfs: add super_operations.sync_inodes()
reiser4-sb_sync_inodes-cleanup.patch
reiser4-sb_sync_inodes-cleanup
reiser4-allow-drop_inode-implementation.patch
reiser4: export vfs inode.c symbols
reiser4-allow-drop_inode-implementation-cleanup.patch
reiser4-allow-drop_inode-implementation-cleanup
reiser4-truncate_inode_pages_range.patch
reiser4: vfs: add truncate_inode_pages_range()
reiser4-truncate_inode_pages_range-cleanup.patch
reiser4-truncate_inode_pages_range-cleanup
reiser4-export-remove_from_page_cache.patch
reiser4: export pagecache add/remove functions to modules
reiser4-export-page_cache_readahead.patch
reiser4: export page_cache_readahead to modules
reiser4-reget-page-mapping.patch
reiser4: vfs: re-check page->mapping after calling try_to_release_page()
reiser4-rcu-barrier.patch
reiser4: add rcu_barrier() synchronization point
reiser4-rcu-barrier-fix.patch
reiser4-rcu-barrier fix
reiser4-export-inode_lock.patch
reiser4: export inode_lock to modules
reiser4-export-inode_lock-cleanup.patch
reiser4-export-inode_lock-cleanup
reiser4-export-pagevec-funcs.patch
reiser4: export pagevec functions to modules
reiser4-export-pagevec-funcs-cleanup.patch
reiser4-export-pagevec-funcs-cleanup
reiser4-export-radix_tree_preload.patch
reiser4: export radix_tree_preload() to modules
reiser4-radix-tree-tag.patch
reiser4: add new radix tree tag
reiser4-radix_tree_lookup_slot.patch
reiser4: add radix_tree_lookup_slot()
reiser4-aliased-dir.patch
reiser4: vfs: handle aliased directories
reiser4-kobject-umount-race.patch
reiser4: introduce filesystem kobjects
reiser4-kobject-umount-race-cleanup.patch
reiser4-kobject-umount-race-cleanup
reiser4-perthread-pages.patch
reiser4: per-thread page pools
reiser4-unstatic-kswapd.patch
reiser4: make kswapd() unstatic for debug
reiser4-include-reiser4.patch
reiser4: add to build system
reiser4-4kstacks-fix.patch
reiser4-4kstacks-fix
stop-reiser4-from-turning-itself-on-by-default.patch
Stop reiser4 from turning itself on by default
reiser4-doc.patch
reiser4: documentation
reiser4-doc-update.patch
Update Documentation/Changes for reiser4
reiser4-only.patch
reiser4: main fs
reiser4-debug-build-fix.patch
reiser4-debug-build-fix
reiser4-prefetch-warning-fix.patch
reiser4: prefetch warning fix
reiser4-mode-fix.patch
reiser4: mode type fix
reiser4-get_context_ok-warning-fixes.patch
reiser4: get_context_ok() warning fixes
reiser4-remove-debug.patch
reiser4: remove debug stuff
reiser4-spinlock-debugging-build-fix-2.patch
reiser4-spinlock-debugging-build-fix-2
reiser4-sparc64-build-fix.patch
reiser4 sparc64 build fix
sys_reiser4-sparc64-build-fix.patch
sys_reiser4 sparc64 build fix
reiser4-printk-warning-fixes.patch
reiser4 printk warning fixes
add-acpi-based-floppy-controller-enumeration.patch
Add ACPI-based floppy controller enumeration.
add-acpi-based-floppy-controller-enumeration-fix.patch
add-acpi-based-floppy-controller-enumeration fix
update-acpi-floppy-enumeration.patch
update ACPI floppy enumeration
possible-dcache-bug-debugging-patch.patch
Possible dcache BUG: debugging patch
kallsyms-data-size-reduction--lookup-speedup.patch
kallsyms data size reduction / lookup speedup
inconsistent-kallsyms-fix.patch
Inconsistent kallsyms fix
kallsyms-correct-type-char-in-proc-kallsyms.patch
kallsyms: correct type char in /proc/kallsyms
kallsyms-fix-sparc-gibberish.patch
kallsyms: fix sparc gibberish
tioccons-security.patch
TIOCCONS security
fix-process-start-times.patch
Fix reporting of process start times
fix-comment-in-include-linux-nodemaskh.patch
Fix comment in include/linux/nodemask.h
x86-build-issue-with-software-suspend-code.patch
Fix x86 build issue with software suspend code
hpt366c-wrong-timings-used-since-268.patch
hpt366.c: wrong timings
move-waitqueue-functions-to-kernel-waitc.patch
move waitqueue functions to kernel/wait.c
standardize-bit-waiting-data-type.patch
standardize bit waiting data type
provide-a-filesystem-specific-syncable-page-bit-fix-2.patch
provide-a-filesystem-specific-syncable-page-bit-fix-2
consolidate-bit-waiting-code-patterns.patch
consolidate bit waiting code patterns
consolidate-bit-waiting-code-patterns-cleanup
__wait_on_bit-fix
eliminate-bh-waitqueue-hashtable.patch
eliminate bh waitqueue hashtable
eliminate-bh-waitqueue-hashtable-fix.patch
wait_on_bit_lock() must test_and_set_bit(), not test_bit()
eliminate-inode-waitqueue-hashtable.patch
eliminate inode waitqueue hashtable
move-wait-ops-contention-case-completely-out-of-line.patch
move wait ops' contention case completely out of line
reduce-number-of-parameters-to-__wait_on_bit-and-__wait_on_bit_lock.patch
reduce number of parameters to __wait_on_bit() and __wait_on_bit_lock()
document-wake_up_bits-requirement-for-preceding-memory-barriers.patch
document wake_up_bit()'s requirement for preceding memory barriers
3c59x-pm-fix.patch
3c59x: enable power management unconditionally
serial-mpsc-driver.patch
Serial MPSC driver
serial-add-support-for-non-standard-xtals-to-16c950-driver.patch
serial: add support for non-standard XTALs to 16c950 driver
add-support-for-possio-gcc-aka-pcmcia-siemens-mc45.patch
Add support for Possio GCC AKA PCMCIA Siemens MC45
searching-for-parameters-in-make-menuconfig.patch
searching for parameters in 'make menuconfig'
menuconfig-regex-search-dependencies.patch
menuconfig: regex search + dependencies
add-smc91x-ethernet-for-lpd7a40x.patch
add SMC91x ethernet for LPD7A40X
m32r-base.patch
m32r architecture
m32r-update-for-profiling.patch
m32r: update for profiling
m32r-update-zone_sizes_init.patch
m32r: update zone_sizes_init()
m32r-update-to-fix-compile-errors.patch
m32r: update to fix compile errors
m32r-update-uaccessh.patch
m32r: update uaccess.h
m32r-update-checksum-functions.patch
m32r: update checksum functions
m32r-update-cf-pcmcia-drivers.patch
m32r: update CF/PCMCIA drivers
m32r-update-headers-to-remove-useless-ibcs2-support-code.patch
m32r: update headers to remove useless iBCS2 support code
atomic_inc_return-for-m32r-re.patch
atomic_inc_return for m32r
m32r-change-from-export_symbol_novers-to-export_symbol.patch
m32r: change from EXPORT_SYMBOL_NOVERS to EXPORT_SYMBOL
m32r-modify-sys_ipc-to-remove-useless-ibcs2-support-code.patch
m32r: modify sys_ipc() to remove useless iBCS2 support code
m32r-add-elf-machine-code.patch
m32r: add ELF machine code
m32r-upgrade-to-2681-kernel.patch
m32r: upgrade to 2.6.8.1 kernel
m32r-support-a-new-bootloader-m32r-g00ff.patch
m32r: support a new bootloader "m32r-g00ff"
m32r-modify-io-routines-for-m32700ut-cf-access.patch
m32r: modify IO routines for m32700ut CF access
vm-pageout-throttling.patch
vm: pageout throttling
fix-race-in-sysfs_read_file-and-sysfs_write_file.patch
Fix race in sysfs_read_file() and sysfs_write_file()
possible-race-in-sysfs_read_file-and-sysfs_write_file-update.patch
Possible race in sysfs_read_file() and sysfs_write_file()
md-add-interface-for-userspace-monitoring-of-events.patch
md: add interface for userspace monitoring of events.
lazy-tsss-i-o-bitmap-copy-for-i386.patch
lazy TSS's I/O bitmap copy for i386
pnpbios-parser-bugfix.patch
pnpbios parser bugfix
unreachable-code-in-ext3_direct_io.patch
unreachable code in ext3_direct_IO()
fix-for-nforce2-secondary-ide-getting-wrong-irq.patch
Fix for NForce2 secondary IDE getting wrong IRQ
revert-allow-oem-written-modules-to-make-calls-to-ia64-oem-sal-functions.patch
revert "allow OEM written modules to make calls to ia64 OEM SAL functions"
shmem-dont-slab_hwcache_align.patch
shmem: don't SLAB_HWCACHE_ALIGN
shmem-inodes-and-links-need-lowmem.patch
shmem: inodes and links need lowmem
shmem-no-sbinfo-for-shm-mount.patch
shmem: no sbinfo for shm mount
shmem-no-sbinfo-for-tmpfs-mount.patch
shmem: no sbinfo for tmpfs mount?
shmem-avoid-the-shmem_inodes-list.patch
shmem: avoid the shmem_inodes list
shmem-rework-majmin-and-zero_page.patch
shmem: rework majmin and ZERO_PAGE
shmem-copyright-file_setup-trivia.patch
shmem: Copyright file_setup trivia
allocate-correct-amount-of-memory-for-pid-hash.patch
Allocate correct amount of memory for pid hash
misrouted-irq-recovery-take-2.patch
Misrouted IRQ recovery, take 2
misrouted-irq-recovery-take-2-cleanup.patch
misrouted-irq-recovery-take-2 cleanup
misrouted-irq-recovery-take-2-fix.patch
misrouted-irq-recovery-take-2 fix
misrouted-irq-recovery-docs.patch
misrouted-irq-recovery documentation
explicity-align-tss-stack.patch
explicitly align tss->stack
check-checksums-for-bnep.patch
Check checksums for BNEP
remember-to-check-return-value-from-__copy_to_user-in.patch
__copy_to_user() check in cdrom_read_cdda_old()
cfq-iosched-v2.patch
CFQ iosched v2
dont-export-blkdev_open-and-def_blk_ops.patch
don't export blkdev_open and def_blk_ops
remove-dead-code-from-fs-mbcachec.patch
remove dead code from fs/mbcache.c
remove-posix_acl_masq_nfs_mode.patch
remove posix_acl_masq_nfs_mode
make-kmem_find_general_cachep-static-in-slabc.patch
make kmem_find_general_cachep static in slab.c
dont-export-shmem_file_setup.patch
don't export shmem_file_setup
remove-pm_find-unexport-pm_send.patch
remove pm_find, unexport pm_send
remove-dead-code-and-exports-from-signalc.patch
remove dead code and exports from signal.c
mark-md_interrupt_thread-static.patch
mark md_interrupt_thread static
unexport-proc_sys_root.patch
unexport proc_sys_root
mark-dq_list_lock-static.patch
mark dq_list_lock static
unexport-is_subdir-and-shrink_dcache_anon.patch
unexport is_subdir and shrink_dcache_anon
unexport-devfs_mk_symlink.patch
unexport devfs_mk_symlink
unexport-do_execve-do_select.patch
unexport do_execve/do_select
unexport-exit_mm.patch
unexport exit_mm
unexport-files_lock-and-put_filp.patch
unexport files_lock and put_filp
remove-exports-from-audit-code.patch
remove exports from audit code
unexport-f_delown.patch
unexport f_delown
unexport-lookup_create.patch
unexport lookup_create
remove-wake_up_all_sync.patch
remove wake_up_all_sync
remove-set_fs_root-set_fs_pwd.patch
remove set_fs_root/set_fs_pwd
add-prctl-to-modify-current-comm.patch
Add prctl to modify current->comm
md-remove-md_flush_all.patch
md: remove md_flush_all
md-make-retry_list-non-global-in-raid1-and-multipath.patch
md: make retry_list non-global in raid1 and multipath
md-rationalise-issue_flush-function-in-md-personalities.patch
md: rationalise issue_flush function in md personalities
md-rationalise-unplug-functions-in-md.patch
md: rationalise unplug functions in md
md-make-sure-md-always-uses-rdev_dec_pending-properly.patch
md: make sure md always uses rdev_dec_pending properly
md-fix-two-little-bugs-in-raid10.patch
md: fix two little bugs in raid10
md-modify-locking-when-accessing-subdevices-in-md.patch
md: modify locking when accessing subdevices in md
blk-max_sectors-tunables.patch
blk: max_sectors tunables
generic-acl-support-for-permission.patch
generic acl support for ->permission
generic-acl-support-for-permission-fix.patch
generic acl support for ->permission fix
generic-acl-support-for-permission-keyfs-fix.patch
generic-acl-support-for-permission-keyfs-fix
device-driver-for-the-sgi-system-clock-mmtimer.patch
device driver for the SGI system clock, mmtimer
rtl8150-fix.patch
rtl8150 fix
close-race-with-preempt-and-modular-pm_idle-callbacks.patch
Close race with preempt and modular pm_idle callbacks
cacheline-align-pagevec-structure.patch
Cacheline align pagevec structure
hvcs-fix-to-replace-yield-with-tty_wait_until_sent-in.patch
HVCS fix to replace yield with tty_wait_until_sent in hvcs_close
fbdev-remove-unnecessary-banshee_wait_idle-from-tdfxfb.patch
fbdev: remove unnecessary banshee_wait_idle from tdfxfb
fbdev-fix-logo-drawing-failure-for-vga16fb.patch
fbdev: fix logo drawing failure for vga16fb
fbdev-initialize-i810fb-after-agpgart.patch
fbdev: Initialize i810fb after agpgart
fbdev-fix-userland-compile-breakage.patch
fbdev: Fix userland compile breakage
fbcon-fix-setup-boot-options-of-fbcon.patch
fbcon: Fix setup boot options of fbcon
fbdev-pass-struct-device-to-class_simple_device_add.patch
fbdev: Pass struct device to class_simple_device_add
fbdev-add-tile-blitting-support.patch
fbdev: Add Tile Blitting support
fix-for-spurious-interrupts-on-e100-resume.patch
Fix for spurious interrupts on e100 resume
r8169-miscalculation-of-available-tx-descriptors.patch
r8169: miscalculation of available Tx descriptors
r8169-hint-for-tx-flow-control.patch
r8169: hint for Tx flow control
r8169-tso-support.patch
r8169: TSO support.
r8169-mac-identifier-extracted-from-realteks-driver-v22.patch
r8169: Mac identifier extracted from Realtek's driver v2.2
uml-remove-ghash.patch
uml: remove ghash.h
uml-eliminate-useless-thread-field.patch
uml: eliminate useless thread field
uml-fix-scheduler-race.patch
uml: fix scheduler race
uml-fix-binary-layout-assumption.patch
uml: fix binary layout assumption
uml-disable-pending-signals-across-a-reboot.patch
uml: disable pending signals across a reboot
uml-refer-to-config_usermode-not-to-config_um.patch
uml: refer to CONFIG_USERMODE, not to CONFIG_UM
uml-remove-commented-old-code-in-kconfig.patch
uml: remove commented old code in Kconfig
uml-smp-build-fix.patch
uml: smp build fix
uml-remove-config_uml_smp.patch
uml: remove CONFIG_UML_SMP
highmem-flushes.patch
block highmem flushes
add-support-for-word-length-uart-registers.patch
Add support for word-length UART registers
compile-fix-3c59x-for-eisa-without-pci.patch
compile fix 3c59x for eisa without pci
atomic_inc_return-for-i386.patch
atomic_inc_return() for i386
atomic_inc_return-for-x86_64.patch
atomic_inc_return() for x86_64
atomic_inc_return-for-arm.patch
atomic_inc_return() for arm
atomic_inc_return-for-arm26.patch
atomic_inc_return() for arm26
atomic_inc_return-for-sparc64.patch
atomic_inc_return() for sparc64
show-aggregate-per-process-counters-in-proc-pid-stat-2.patch
show aggregate per-process counters in /proc/PID/stat 2
fix-uninitialized-warnings-in-mempolicyc.patch
fix uninitialized warnings in mempolicy.c
online-cpu-with-maxcpus-option-panics.patch
Online CPU with maxcpus option panics
remove-dead-exports-from-fs-fat.patch
remove dead exports from fs/fat/
fat-use-hlist_head-for-fat_inode_hashtable-1-6.patch
FAT: use hlist_head for fat_inode_hashtable
fat-rewrite-the-cache-for-file-allocation-table-lookup.patch
FAT: rewrite the cache for file allocation table lookup
fat-cache-lock-from-per-sb-to-per-inode-3-6.patch
FAT: cache lock from per sb to per inode
fat-the-inode-hash-from-per-module-to-per-sb-4-6.patch
FAT: the inode hash from per module to per sb
fat-fix-the-race-bitween-fat_free-and-fat_get_cluster.patch
FAT: Fix the race between fat_free() and fat_get_cluster()
fat-remove-debug_pr-6-6.patch
FAT: remove debug_pr()
small-linux-hardirqh-tweaks.patch
small <linux/hardirq.h> tweaks
bsd-disklabel-handle-more-than-8-partitions.patch
BSD disklabel: handle more than 8 partitions
asm-softirqh-crept-back-in-h8300-and-sh64.patch
<asm/softirq.h> crept back in h8300 and sh64
mark-amiflop-non-unloadable.patch
mark amiflop non-unloadable
thinkpad-fnfx-key-driver.patch
thinkpad fn+fx key driver
netpoll-endian-fixes.patch
netpoll endian fixes
rewrite-alloc_pidmap.patch
pidhashing: rewrite alloc_pidmap()
missing-pci_disable_device.patch
missing pci_disable_device()
fbdev-radeonfb-remove-bogus-radeonfb_read-write.patch
fbdev/radeonfb: Remove bogus radeonfb_read/write
add-missing-pci_disable_device-for-pci-based-usb-hcd.patch
add missing pci_disable_device for PCI-based USB HCD
add-missing-pci_disable_device-for-e1000.patch
add missing pci_disable_device for e1000
next_thread-bug-fixes.patch
next_thread() BUG fixes
Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> +sched-trivial-sched-changes.patch
> +sched-add-cpu_down_prepare-notifier.patch
> +sched-integrate-cpu-hotplug-and-sched-domains.patch
> +sched-arch_destroy_sched_domains-warning-fix.patch
> +sched-sched-add-load-balance-flag.patch
> +sched-remove-disjoint-numa-domains-setup.patch
> +sched-make-domain-setup-overridable.patch
> +sched-make-domain-setup-overridable-rename.patch
> +sched-ia64-add-disjoint-numa-domain-support.patch
> +sched-fix-domain-debug-for-isolcpus.patch
> +sched-enable-sd_load_balance.patch
> +sched-hotplug-add-a-cpu_down_failed-notifier.patch
> +sched-use-cpu_down_failed-notifier.patch
> +sched-fixes-for-ia64-domain-setup.patch
>
> CPU scheduler work.
>
In particular, anyone who was having trouble with sched-domains and/or CPU
hotplug, please test this.
It is supposed to fix all known issues, but some of the patches are fairly
involved and have not been tested on the problem hardware, so there could
still be some bugs. Please let me know if anything goes wrong.
Also, ia64 sched-domains setup is possibly still broken. If anyone boots this
on an Altix, please send over the full dmesg! Thanks.
> +lockmeter.patch
>
> Repaired lockmeter patch
This one is still needlessly messing around in procfs internals.
On Monday 13 of September 2004 10:50, Andrew Morton wrote:
>
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
I can't build it on x86-64:
LD init/built-in.o
LD .tmp_vmlinux1
fs/built-in.o(.text+0xd1893): In function `mask_ok_common':
: undefined reference to `vfs_permission'
make: *** [.tmp_vmlinux1] Error 1
The .config is attached.
Greets,
RJW
--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> consolidate-bit-waiting-code-patterns.patch
> eliminate-bh-waitqueue-hashtable.patch
> eliminate-bh-waitqueue-hashtable-fix.patch
> eliminate-inode-waitqueue-hashtable.patch
> move-wait-ops-contention-case-completely-out-of-line.patch
> reduce-number-of-parameters-to-__wait_on_bit-and-__wait_on_bit_lock.patch
> document-wake_up_bits-requirement-for-preceding-memory-barriers.patch
For a general status update, suparna and I are working on the aio
integration with all this (well, thus far mostly suparna).
-- wli
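The eliminate-bh-waitqueue-hashtable-fix.patch quoted above is titled
"wait_on_bit_lock() must test_and_set_bit(), not test_bit()". A minimal
user-space sketch of why a bit lock needs an atomic test-and-set follows. It
uses C11 atomics rather than the kernel bitops, and every sketch_* name is
hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Racy variant: the test and the set are two separate steps, so two
 * threads can both see the bit clear and both believe they hold the lock.
 */
static bool sketch_racy_trylock(atomic_ulong *word, unsigned int bit)
{
    assert(bit < SKETCH_BITS_PER_LONG);
    unsigned long mask = 1UL << bit;

    if (atomic_load(word) & mask)
        return false;
    atomic_fetch_or(word, mask);
    return true;
}

/*
 * Test-and-set variant: a single atomic read-modify-write returns the old
 * value, so exactly one caller observes the bit going from 0 to 1.
 */
static bool sketch_trylock_bit(atomic_ulong *word, unsigned int bit)
{
    assert(bit < SKETCH_BITS_PER_LONG);
    unsigned long mask = 1UL << bit;
    unsigned long old = atomic_fetch_or(word, mask);

    return (old & mask) == 0;
}

/* Release the lock by clearing the bit. */
static void sketch_unlock_bit(atomic_ulong *word, unsigned int bit)
{
    assert(bit < SKETCH_BITS_PER_LONG);
    atomic_fetch_and(word, ~(1UL << bit));
}
```

In a single thread both variants behave the same. The difference only shows
up under contention, where the racy variant can hand the lock to two callers.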
Rafael J. Wysocki writes:
> On Monday 13 of September 2004 10:50, Andrew Morton wrote:
> >
> > Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> I can't build it on x86-64:
>
> LD init/built-in.o
> LD .tmp_vmlinux1
> fs/built-in.o(.text+0xd1893): In function `mask_ok_common':
> : undefined reference to `vfs_permission'
> make: *** [.tmp_vmlinux1] Error 1
reiser4 wasn't updated during vfs_permission/generic_permission
conversion. Evil conspiracy is obviously underway.
Untested patch is below.
Andrew, please apply.
Nikita.
----------------------------------------------------------------------
--- perm.c 2004-05-17 14:04:55.000000000 +0400
+++ perm.c.new 2004-09-13 15:07:10.432547928 +0400
@@ -13,7 +13,7 @@
static int
mask_ok_common(struct inode *inode, int mask)
{
- return vfs_permission(inode, mask);
+ return generic_permission(inode, mask, NULL);
}
static int
----------------------------------------------------------------------
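For readers unfamiliar with the conversion, here is a rough user-space sketch
of what the one-line change above amounts to: the two-argument helper keeps
its shape and calls a three-argument check with no ACL hook. The types, the
-EAGAIN fallback convention and all sketch_* names are assumptions made for
illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct inode; only the "other" rwx bits are modelled. */
struct sketch_inode {
    unsigned int mode;
};

/*
 * Optional ACL hook. Sketch convention (an assumption): return 0 to grant,
 * -EACCES to deny, -EAGAIN to fall back to the mode bits.
 */
typedef int (*sketch_check_acl_t)(struct sketch_inode *inode, int mask);

/* Three-argument check: ask the hook first if there is one. */
static int sketch_generic_permission(struct sketch_inode *inode, int mask,
                                     sketch_check_acl_t check_acl)
{
    assert(inode != NULL);
    unsigned int want = (unsigned int)mask & 7u;

    if (check_acl != NULL) {
        int err = check_acl(inode, mask);
        if (err != -EAGAIN)
            return err;
    }
    return ((inode->mode & want) == want) ? 0 : -EACCES;
}

/* Hypothetical hook that grants every request. */
static int sketch_acl_grant_all(struct sketch_inode *inode, int mask)
{
    (void)inode;
    (void)mask;
    return 0;
}

/* Shape of the patched helper: no ACL hook, so NULL is passed. */
static int sketch_mask_ok_common(struct sketch_inode *inode, int mask)
{
    return sketch_generic_permission(inode, mask, NULL);
}
```

With NULL as the hook, the result depends only on the mode bits, which is the
behaviour the old two-argument call provided.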
>
> The .config is attached.
>
> Greets,
> RJW
>
Update:
On Monday 13 of September 2004 12:48, Rafael J. Wysocki wrote:
> On Monday 13 of September 2004 10:50, Andrew Morton wrote:
> >
> > Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> I can't build it on x86-64:
>
> LD init/built-in.o
> LD .tmp_vmlinux1
> fs/built-in.o(.text+0xd1893): In function `mask_ok_common':
> : undefined reference to `vfs_permission'
> make: *** [.tmp_vmlinux1] Error 1
It's reiser4, apparently:
CC fs/reiser4/plugin/security/perm.o
fs/reiser4/plugin/security/perm.c: In function `mask_ok_common':
fs/reiser4/plugin/security/perm.c:16: warning: implicit declaration of
function `vfs_permission'
Greets,
RJW
--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"
Just don't set ->permission for reiser4 and kill the whole perm_plugin
bullshit. This fixes the issue by removing a few hundred lines of code, which
is always a good idea.
OK, starfire and qlogicisp broke, plus some NUMA stuff in mm/mempolicy.c.
Full error log below. Config is (on ia32):
ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/config.numaq
The NUMA one is either cpusets-big-numa-cpu-and-memory-placement.patch
or create-nodemask_t.patch by the looks of it. The only thing touching
starfire is bk-netdev.patch, but as I get very similar errors from qlogicisp
maybe someone's been futzing with readw/writew ?
M.
drivers/net/starfire.c: In function `starfire_init_one':
drivers/net/starfire.c:924: warning: passing arg 1 of `readb' makes pointer from integer without a cast
drivers/net/starfire.c:930: warning: passing arg 1 of `readb' makes pointer from integer without a cast
drivers/net/starfire.c:935: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:937: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:940: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:944: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `mdio_read':
drivers/net/starfire.c:1087: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `mdio_write':
drivers/net/starfire.c:1100: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_open':
drivers/net/starfire.c:1123: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1124: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1162: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1169: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1177: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1179: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1180: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1181: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1182: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1183: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1185: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1189: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1196: warning: passing arg 2 of `writeb' makes pointer from integer without a cast
drivers/net/starfire.c:1199: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1200: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1201: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1205: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1206: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1207: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1213: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1215: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1217: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1219: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1232: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1238: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1240: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1241: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1257: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1260: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `tx_timeout':
drivers/net/starfire.c:1312: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `init_ring':
drivers/net/starfire.c:1356: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c: In function `start_tx':
drivers/net/starfire.c:1477: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `intr_handler':
drivers/net/starfire.c:1505: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1522: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1562: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1593: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `__netdev_rx':
drivers/net/starfire.c:1710: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c: In function `refill_rx_ring':
drivers/net/starfire.c:1779: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_media_change':
drivers/net/starfire.c:1839: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1841: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1849: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_error':
drivers/net/starfire.c:1865: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `get_stats':
drivers/net/starfire.c:1891: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1892: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1893: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1895: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1895: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1896: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1898: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1898: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1901: warning: passing arg 1 of `readw' makes pointer from integer without a cast
drivers/net/starfire.c:1902: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1903: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1904: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1905: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1906: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `set_rx_mode':
drivers/net/starfire.c:1961: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1962: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1963: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1967: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1968: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1969: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1990: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1991: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1992: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1995: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1998: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_close':
drivers/net/starfire.c:2099: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:2106: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:2109: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:2110: warning: passing arg 1 of `readl' makes pointer from integer without a cast
mm/mempolicy.c: In function `get_zonemask':
mm/mempolicy.c:419: `maxnode' undeclared (first use in this function)
mm/mempolicy.c:419: (Each undeclared identifier is reported only once
mm/mempolicy.c:419: for each function it appears in.)
drivers/scsi/qlogicisp.c: In function `isp_inw':
drivers/scsi/qlogicisp.c:632: warning: passing arg 1 of `readw' makes pointer from integer without a cast
drivers/scsi/qlogicisp.c: In function `isp_outw':
drivers/scsi/qlogicisp.c:641: warning: passing arg 2 of `writew' makes pointer from integer without a cast
Martin wrote:
> The NUMA one is either cpusets-big-numa-cpu-and-memory-placement.patch
> or create-nodemask_t.patch by the looks of it
The numa one, with the following errors:
mm/mempolicy.c: In function `get_zonemask':
mm/mempolicy.c:419: error: `maxnode' undeclared (first use in this function)
is due to fix-abi-in-set_mempolicy.patch.
See my fix on lkml:
Subject: [PATCH] undo more numa maxnode confusions
Date: Mon, 13 Sep 2004 05:58:48 -0700
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
--Paul Jackson <[email protected]> wrote (on Monday, September 13, 2004 08:18:41 -0700):
> Martin wrote:
>> The NUMA one is either cpusets-big-numa-cpu-and-memory-placement.patch
>> or create-nodemask_t.patch by the looks of it
>
> The numa one, with the following errors:
>
> mm/mempolicy.c: In function `get_zonemask':
> mm/mempolicy.c:419: error: `maxnode' undeclared (first use in this function)
>
> is due to fix-abi-in-set_mempolicy.patch.
>
> See my fix on lkml:
>
> Subject: [PATCH] undo more numa maxnode confusions
> Date: Mon, 13 Sep 2004 05:58:48 -0700
That worked - thanks.
The others seem only to be warnings, and are allegedly no worse than before,
so maybe it'll work now ;-)
M.
> so maybe it'll work now ;-)
It's not working for me - on a small ia64 SN2, it crashes during boot.
Somewhere between patches 32 and 42 of Andrew's broken-out set of 436
patches ... I'm still in the binary search loop.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
--- ./fs/proc/array.c.nt 2004-09-13 18:56:17.000000000 +0400
+++ ./fs/proc/array.c 2004-09-13 19:13:03.749684712 +0400
@@ -338,6 +338,7 @@ static int do_task_stat(struct task_stru
spin_lock_irq(&task->sighand->siglock);
num_threads = atomic_read(&task->signal->count);
collect_sigign_sigcatch(task, &sigign, &sigcatch);
+ spin_unlock_irq(&task->sighand->siglock);
/* add up live thread stats at the group level */
if (whole) {
@@ -350,8 +351,6 @@ static int do_task_stat(struct task_stru
t = next_thread(t);
} while (t != task);
}
-
- spin_unlock_irq(&task->sighand->siglock);
}
if (task->signal) {
if (task->signal->tty) {
On Monday, September 13, 2004 2:22 am, Nick Piggin wrote:
> In particular, anyone who was having trouble with sched-domains and/or CPU
> hotplug please test this.
>
> It is supposed to fix all known issues, but some patches are fairly
> involved, and not having been tested on problem hardware, there could be
> still some bugs. Please let me know if anything goes bug.
>
> Also, ia64 sched-domains setup is possibly still broken. If anyone boots
> this on an Altix, please send over the full dmesg! Thanks.
Didn't you get my last mail about this? Looks like the lack
of !defined(SD_NODE_INIT) in sched.h made its way to Andrew. Here's the
dmesg from a 2p, 1 node box. I'll send out a more complete one later (unless
Paul beat me to it, I'm still only part way through my lkml mailbox).
Thanks,
Jesse
Jesse wrote:
> I'll send out a more complete one later (unless
> Paul beat me to it,
See my patch posted a few hours ago:
[Patch] Fix sched make domain setup overridable
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
On Monday, September 13, 2004 11:06 am, Paul Jackson wrote:
> Jesse wrote:
> > I'll send out a more complete one later (unless
> > Paul beat me to it,
>
> See my patch posted a few hours ago:
>
> [Patch] Fix sched make domain setup overridable
Yeah, I saw that, thanks. I meant a more complete dmesg (i.e. one for a
bigger system). I've got a 32p reserved for later today.
Jesse
Kirill Korotaev <[email protected]> wrote:
>
> Hello Andrew,
>
> Please replace patch next_thread-bug-fixes.patch in -mm5 tree with the
> last diff-next_thread I sent to you.
I was planning on replacing it with Ingo's patch.
--- linux/fs/proc/array.c.orig
+++ linux/fs/proc/array.c
@@ -356,7 +356,7 @@ static int do_task_stat(struct task_stru
stime = task->signal->stime;
}
}
- if (whole) {
+ if (whole && task->sighand) {
Is there some reason why your patch is better? If so, please do a full
resend.
> And it looks like thread loop in do_task_stat() doesn't require siglock
> lock, so you can add the patch attached to reduce lock area.
hm, OK.
* Andrew Morton <[email protected]>:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
Some badness with OHCI USB: USB devices just 'aren't there' for me.
- good -
Sep 9 18:51:58 tienel kernel: ohci_hcd 0000:02:00.0: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
Sep 9 18:51:58 tienel kernel: ohci_hcd 0000:02:00.0: irq 19, pci mem 0xf4000000
Sep 9 18:51:58 tienel kernel: ohci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 1
Sep 9 18:51:58 tienel kernel: hub 1-0:1.0: USB hub found
Sep 9 18:51:58 tienel kernel: hub 1-0:1.0: 4 ports detected
Sep 9 18:51:58 tienel kernel: USB Universal Host Controller Interface driver v2.2
Sep 9 18:51:58 tienel kernel: usb 1-1: new full speed USB device using address 2
- -
- bad (as in 2.6.9-rc1-mm5) -
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: irq 19, pci mem 0xf4000000
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 1
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: remove, state 0
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: USB bus 1 deregistered
Sep 13 23:01:19 tienel kernel: ohci_hcd: probe of 0000:02:00.0 failed with error -16
- -
--
Psi -- <http://www.iki.fi/pasi.savolainen>
On Monday 13 of September 2004 10:50, Andrew Morton wrote:
>
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> and will later appear at
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
It does not compile on SMP x86-64 w/ NUMA:
CC arch/x86_64/ia32/ia32_ioctl.o
In file included from fs/compat_ioctl.c:63,
from arch/x86_64/ia32/ia32_ioctl.c:14:
include/linux/reiserfs_fs.h:441: error: redefinition of `struct key'
include/linux/reiserfs_fs.h: In function `le_key_k_offset':
include/linux/reiserfs_fs.h:608: error: structure has no member named `u'
include/linux/reiserfs_fs.h:609: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `le_key_k_type':
include/linux/reiserfs_fs.h:620: error: structure has no member named `u'
include/linux/reiserfs_fs.h:621: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_le_key_k_offset':
include/linux/reiserfs_fs.h:633: error: structure has no member named `u'
include/linux/reiserfs_fs.h:634: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_le_key_k_type':
include/linux/reiserfs_fs.h:647: error: structure has no member named `u'
include/linux/reiserfs_fs.h:648: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `cpu_key_k_offset':
include/linux/reiserfs_fs.h:677: error: structure has no member named `u'
include/linux/reiserfs_fs.h:678: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `cpu_key_k_type':
include/linux/reiserfs_fs.h:684: error: structure has no member named `u'
include/linux/reiserfs_fs.h:685: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_cpu_key_k_offset':
include/linux/reiserfs_fs.h:691: error: structure has no member named `u'
include/linux/reiserfs_fs.h:692: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_cpu_key_k_type':
include/linux/reiserfs_fs.h:699: error: structure has no member named `u'
include/linux/reiserfs_fs.h:700: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `cpu_key_k_offset_dec':
include/linux/reiserfs_fs.h:707: error: structure has no member named `u'
include/linux/reiserfs_fs.h:709: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `le_key_version':
include/linux/reiserfs_fs.h:1869: error: structure has no member named `u'
make[1]: *** [arch/x86_64/ia32/ia32_ioctl.o] Error 1
make: *** [arch/x86_64/ia32] Error 2
The .config is available at:
http://www.sisk.pl/kernel/040913/2.6.9-rc1-mm5-NUMA.config
Greets,
RJW
--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"
On Monday, September 13, 2004 11:10 am, Jesse Barnes wrote:
> On Monday, September 13, 2004 11:06 am, Paul Jackson wrote:
> > Jesse wrote:
> > > I'll send out a more complete one later (unless
> > > Paul beat me to it,
> >
> > See my patch posted a few hours ago:
> >
> > [Patch] Fix sched make domain setup overridable
>
> Yeah, I saw that, thanks. I meant a more complete dmesg (i.e. one for a
> bigger system). I've got a 32p reserved for later today.
Here's one from a 32p, 16 node machine (captured while scsi was still coming
up, but you probably don't care about that).
Jesse
On an Altix with the default config (smp+preempt, see
arch/ia64/configs/sn2_defconfig), I'm getting this:
bad: scheduling while atomic!
Call Trace:
[<a000000100017380>] show_stack+0x80/0xa0
sp=e0001c3004adfc40 bsp=e0001c3004ad9098
[<a0000001006bcc70>] schedule+0x11f0/0x16a0
sp=e0001c3004adfe10 bsp=e0001c3004ad8f78
[<a000000100018530>] cpu_idle+0x5b0/0x620
sp=e0001c3004adfe30 bsp=e0001c3004ad8ee8
[<a000000100059a10>] start_secondary+0x2d0/0x300
sp=e0001c3004adfe30 bsp=e0001c3004ad8eb0
[<a000000100008580>] _start+0x260/0x290
sp=e0001c3004adfe30 bsp=e0001c3004ad8eb0
The messages began right after I logged out of an ssh session and haven't
stopped yet.
Jesse
Shortly after the backtrace I've already posted, I got one panic that looked
like this:
Warning: kfree_skb on hard IRQ a0000001006443d0
Unable to handle kernel paging request at virtual address 600000000001e8e0
Warning: kfree_skb on hard IRQ a0000001006443d0
Unable to handle kernel paging request at virtual address 600000000001e8e0
sshd[8790]: Oops 8804682956800 [1]
Modules linked in:
Pid: 8790, CPU 1, comm: sshd
psr : 0000101308526030 ifs : 80000028b0815428 ip : [<2000000000573670>]
Not tainted
ip is at 0x2000000000573670
unat: 0000000000000000 pfs : c000000000000288 rsc : 000000000000000f
rnat: 0000000000000000 bsps: 60000fff7fffc418 pr : 000000000001a529
ldrs: 0000000002100000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : 4000000000042010 b6 : 2000000000573520 b7 : 0000000000000000
f6 : 000000000000000000000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1 : 2000000000684200 r2 : c000000000000288 r3 : 0000000000000001
r8 : 600000000001e8e0 r9 : 0000000000000000 r10 : 0000000000000000
r11 : 60000fffffffafa0 r12 : 60000fffffff7020 r13 : 20000000007392e0
r14 : 0000000000000000 r15 : 0000000000000006 r16 : 0000000005a6a5a9
r17 : 0000000000000000 r18 : 600000000001e8f0 r19 : 600000000001e8e0
r20 : 60000fffffff7040 r21 : 60000fffffff7050 r22 : 0000000000000010
r23 : 60000fff7fffc418 r24 : 0000000000000000 r25 : 0000000000000000
r26 : c00000000000038a r27 : 000000000000000f r28 : 2000000000617e20
r29 : 00001213085a6010 r30 : 60000fffffff7244 r31 : 600000000001eaf4
r32 : 0000000000000002 r33 : 0000000000000000 r34 : 200000000009ae00
r35 : 6000000000024b28 r36 : 6000000000024c10 r37 : 2000000000086610
r38 : c000000000000288 r39 : 6000000000024b20 r40 : 0000000000000002
r41 : 600000000001dcd0 r42 : 200000000009ae00 r43 : 0000000000000001
r44 : 0000000000000000 r45 : 0000000000000000 r46 : 0000000000000006
r47 : 0000000000000000 r48 : 2000000000083060 r49 : c00000000000048e
r50 : 6000000000024b20 r51 : 0000000000000002 r52 : 6000000000027db0
r53 : 6000000000024c38 r54 : 0000000000000002 r55 : 0000000000000000
r56 : 2000000000a647c0 r57 : 0000000000000000 r58 : 60000000000349d0
r59 : 2000000000a540d8 r60 : 60000fffffffaff8 r61 : 0000000000000000
r62 : 6000000000024b70 r63 : 2000000000082d90 r64 : c00000000000058f
r65 : 0000000005a5a969 r66 : 6000000000027e60 r67 : 0000000000000002
r68 : 0000000000000002 r69 : 0000000000000000 r70 : 200000000009ae00
r71 : 6000000000027e68
Kernel panic - not syncing: Aiee, killing interrupt handler!
Rebooting in 5 seconds..
The ip above is in sshd presumably, and the warning message corresponds to
somewhere in tcp_recvmsg:
a0000001006434e0 T tcp_recvmsg
a000000100644760 t tcp_close_state
Is this a known problem?
Thanks,
Jesse
On Mon, 13 Sep 2004 14:56:31 -0700
Jesse Barnes <[email protected]> wrote:
> Shortly after the backtrace I've already posted, I got one panic that looked
> like this:
Do you have PREEMPT enabled with VLAN? If so, that's been fixed
recently, it was some buggy RCU locking in the VLAN code.
On Monday, September 13, 2004 3:36 pm, David S. Miller wrote:
> On Mon, 13 Sep 2004 14:56:31 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > Shortly after the backtrace I've already posted, I got one panic that
> > looked like this:
>
> Do you have PREEMPT enabled with VLAN? If so, that's been fixed
> recently, it was some buggy RCU locking in the VLAN code.
Nope, VLAN isn't set:
[jbarnes@tomahawk linux-2.6.9-rc1-mm5]$ grep VLAN .config
# CONFIG_VLAN_8021Q is not set
On Mon, 13 Sep 2004 15:44:07 -0700
Jesse Barnes <[email protected]> wrote:
> Nope, VLAN isn't set:
> [jbarnes@tomahawk linux-2.6.9-rc1-mm5]$ grep VLAN .config
> # CONFIG_VLAN_8021Q is not set
Hmmm, then that's a really strange backtrace. What networking
driver are you using?
> The messages began right after I logged out of an ssh session and haven't
I got the same messages, on another Altix sn2_defconfig, after I had:
1) logged in as the only, root user,
2) played around a bit, then
3) issued a 'reboot' command.
Then bam - lots of "scheduling while atomic!" complaints. The shutdown
did succeed, though, and shut the complaints off.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373
On Monday, September 13, 2004 3:47 pm, David S. Miller wrote:
> On Mon, 13 Sep 2004 15:44:07 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > Nope, VLAN isn't set:
> > [jbarnes@tomahawk linux-2.6.9-rc1-mm5]$ grep VLAN .config
> > # CONFIG_VLAN_8021Q is not set
>
> Hmmm, then that's a really strange backtrace. What networking
> driver are you using?
tg3. I saw one trace that included do_poll (iirc) and another last week that
had sys_select in it. I'll try to gather some more info.
Jesse
On Mon, 13 Sep 2004 16:54:27 -0700
Jesse Barnes <[email protected]> wrote:
> tg3. I saw one trace that included do_poll (iirc) and another last week that
> had sys_select in it. I'll try to gather some more info.
What you're seeing might be due to the bug fixed by this patch:
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/09/13 12:58:04-07:00 [email protected]
# [NET]: Fix missing spin lock in lltx path.
#
# This fixes a silly missing spin lock in the relock path. For some
# reason it seems to still work when you don't have spinlock debugging
# enabled.
#
# Please apply.
#
# Thanks to Arjan's spinlock debug kernel for finding it.
#
# Signed-off-by: Andi Kleen <[email protected]>
# Signed-off-by: David S. Miller <[email protected]>
#
# net/sched/sch_generic.c
# 2004/09/13 12:57:46-07:00 [email protected] +3 -1
# [NET]: Fix missing spin lock in lltx path.
#
# This fixes a silly missing spin lock in the relock path. For some
# reason it seems to still work when you don't have spinlock debugging
# enabled.
#
# Please apply.
#
# Thanks to Arjan's spinlock debug kernel for finding it.
#
# Signed-off-by: Andi Kleen <[email protected]>
# Signed-off-by: David S. Miller <[email protected]>
#
diff -Nru a/net/sched/sch_generic.c b/net/sched/sch_generic.c
--- a/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
+++ b/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
@@ -148,8 +148,10 @@
spin_lock(&dev->queue_lock);
return -1;
}
- if (ret == NETDEV_TX_LOCKED && nolock)
+ if (ret == NETDEV_TX_LOCKED && nolock) {
+ spin_lock(&dev->queue_lock);
goto collision;
+ }
}
/* NETDEV_TX_BUSY - we need to requeue */
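
To illustrate the invariant this patch restores, here is a minimal userspace
sketch in C. It assumes, going only by the context lines of the diff, that the
code at the collision label expects dev->queue_lock to be held again, just as
the NETDEV_TX_OK path retakes it before returning. The toy_lock type,
restart_sketch() and the XMIT_* names are hypothetical stand-ins, not kernel
APIs, and the return values are illustrative only.

```c
#include <assert.h>

/* Toy lock that only records whether it is held; stands in for dev->queue_lock. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

/* Hypothetical driver results, loosely modelled on NETDEV_TX_OK / NETDEV_TX_LOCKED. */
enum xmit_result { XMIT_OK, XMIT_LOCKED };

/*
 * Called with queue_lock held and expected to return with it held on
 * every path (an assumption drawn from the diff context).
 */
static int restart_sketch(struct toy_lock *queue_lock, enum xmit_result ret,
                          int *requeued)
{
    toy_lock_release(queue_lock);       /* lock is dropped around the driver call */

    if (ret == XMIT_LOCKED) {
        toy_lock_acquire(queue_lock);   /* the step the patch adds */
        goto collision;
    }

    toy_lock_acquire(queue_lock);       /* success path already retook the lock */
    return 0;

collision:
    /* The requeue path touches the queue, so the lock must be held here. */
    assert(queue_lock->held);
    (*requeued)++;
    return -1;
}
```

Without the added acquire, the assert under the collision label would fire in
this model, which is the userspace analogue of what a spinlock-debugging kernel
would flag.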
On Monday, September 13, 2004 4:55 pm, David S. Miller wrote:
> On Mon, 13 Sep 2004 16:54:27 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > tg3. I saw one trace that included do_poll (iirc) and another last week
> > that had sys_select in it. I'll try to gather some more info.
>
> What you're seeing might be due to the bug fixed by this patch:
> spin_lock(&dev->queue_lock);
> return -1;
> }
> - if (ret == NETDEV_TX_LOCKED && nolock)
> + if (ret == NETDEV_TX_LOCKED && nolock) {
> + spin_lock(&dev->queue_lock);
> goto collision;
> + }
> }
>
> /* NETDEV_TX_BUSY - we need to requeue */
Ok, I guess that would explain why I haven't seen this in 2.6.9-rc2. I was
getting my backtraces confused too--I've only seen this one for this bug.
I'll keep an eye out and report anything I see with the latest bk tree.
Thanks,
Jesse
On Mon, 13 Sep 2004 17:03:48 -0700
Jesse Barnes <[email protected]> wrote:
> On Monday, September 13, 2004 4:55 pm, David S. Miller wrote:
> > On Mon, 13 Sep 2004 16:54:27 -0700
> >
> > Jesse Barnes <[email protected]> wrote:
> > > tg3. I saw one trace that included do_poll (iirc) and another last week
> > > that had sys_select in it. I'll try to gather some more info.
> >
> > What you're seeing might be due to the bug fixed by this patch:
..
> Ok, I guess that would explain why I haven't seen this in 2.6.9-rc2. I was
> getting my backtraces confused too--I've only seen this one for this bug.
> I'll keep an eye out and report anything I see with the latest bk tree.
The patch isn't in the tree yet, so you would still see the problem in
2.6.9-rc2.
Please try to get a clean backtrace with a current tree plus
the patch I posted, and I'll scratch my head some more.
:-)
I'm experiencing TCP related oopses with this kernel (not seen in -mm4),
.config file attached.
Here are two backtraces, the first happened a few seconds after logging
in via ssh, the second happened soon after boot (using selinux=0, just to
make sure).
Oops #1:
-----------
KERNEL: assertion (!skb_queue_empty(&sk->sk_write_queue)) failed at net/ipv4/tcp_timer.c (322)
Unable to handle kernel NULL pointer dereference at virtual address 00000048
printing eip:
c03022c2
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: ipv6 e1000 3c59x ac
CPU: 0
EIP: 0060:[<c03022c2>] Not tainted VLI
EFLAGS: 00010246 (2.6.9-rc1-mm5)
EIP is at tcp_retransmit_skb+0x89/0x340
eax: 00000000 ebx: 00000000 ecx: f7718960 edx: 00000000
esi: f740c2a0 edi: f740c0a8 ebp: c0460f64 esp: c0460f48
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0460000 task=c039dac0)
Stack: f740c0a8 00000000 0000056e f740c2a0 f740c0a8 f740c2a0 f740c10c c0460fa0
c03044b2 c0387ed4 c038901c c038615b 00000142 c0460fb8 f888bb2f f709a778
f70791c0 c181110c 00000001 f740c0a8 f740c2a0 f740c0c8 c0460fb8 c03048af
Call Trace:
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c03044b2>] tcp_retransmit_timer+0xe9/0x434
[<c03048af>] tcp_write_timer+0xb2/0xcd
[<c01249c0>] run_timer_softirq+0xbf/0x17f
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c0112535>] smp_apic_timer_interrupt+0xf2/0xf4
[<c01066ca>] apic_timer_interrupt+0x1a/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
=======================
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c03044b2>] tcp_retransmit_timer+0xe9/0x434
[<c03048af>] tcp_write_timer+0xb2/0xcd
[<c01249c0>] run_timer_softirq+0xbf/0x17f
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c0112535>] smp_apic_timer_interrupt+0xf2/0xf4
[<c01066ca>] apic_timer_interrupt+0x1a/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
Code: 89 45 ec 8b 47 78 be f5 ff ff ff 89 c2 c1 fa 02 01 d0 8b 97 84 00 00 00 39 c2 0f 4f d0 8b 47 60 39 d0 0f 8f b3 01 00 00 8b 75 f0 <8b> 53 48 8b 4e 10 39 ca 79 5c 39 4b 4c 79 08 0f 0b c3 03 14 61
<0>Kernel panic - not syncing: Fatal exception in interrupt
Oops #2:
-----------
(gdb) l *0xc02fac2c
0xc02fac2c is in tcp_time_to_recover (net/ipv4/tcp_input.c:1352).
1350 static inline int tcp_skb_timedout(struct tcp_opt *tp, struct sk_buff *skb)
1351 {
1352 return (tcp_time_stamp - TCP_SKB_CB(skb)->when > tp->rto);
1353 }
1354
Unable to handle kernel NULL pointer dereference at virtual address 00000050
printing eip:
c02fac2c
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: ipv6 e1000 3c59x ac
CPU: 0
EIP: 0060:[<c02fac2c>] Not tainted VLI
EFLAGS: 00010246 (2.6.9-rc1-mm5)
EIP is at tcp_time_to_recover+0x1d0/0x214
eax: fffcc289 ebx: f77a6320 ecx: 00000002 edx: 00000000
esi: 00000003 edi: f77a6128 ebp: c0460ddc esp: c0460dc4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0460000 task=c039dac0)
Stack: 00000246 fffcc3b1 00000001 f77a6320 00000000 49a2fa4f c0460e20 c02fb752
c0460e20 c02fc1b1 00000000 00010800 49a2fa4f 037a6320 00000001 00000000
00000106 00000004 49a2f4d3 f77a6128 00000003 f77a6320 49a2fa4f c0460e60
Call Trace:
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c02fb752>] tcp_fastretrans_alert+0x146/0x6ed
[<c02fca42>] tcp_ack+0x260/0x5df
[<c02ff67e>] tcp_rcv_established+0x5d0/0x868
[<c0308265>] tcp_v4_do_rcv+0x101/0x103
[<c0308a73>] tcp_v4_rcv+0x80c/0x920
[<c02ed407>] ip_local_deliver+0xa0/0x26d
[<c02edb43>] ip_rcv+0x381/0x4f9
[<c02da8e3>] netif_receive_skb+0x1f7/0x224
[<c02da995>] process_backlog+0x85/0x135
[<c02daacb>] net_rx_action+0x86/0x136
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c01089ed>] do_IRQ+0x185/0x1cf
[<c0106648>] common_interrupt+0x18/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
=======================
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c02fb752>] tcp_fastretrans_alert+0x146/0x6ed
[<c02fca42>] tcp_ack+0x260/0x5df
[<c02ff67e>] tcp_rcv_established+0x5d0/0x868
[<c0308265>] tcp_v4_do_rcv+0x101/0x103
[<c0308a73>] tcp_v4_rcv+0x80c/0x920
[<c02ed407>] ip_local_deliver+0xa0/0x26d
[<c02edb43>] ip_rcv+0x381/0x4f9
[<c02da8e3>] netif_receive_skb+0x1f7/0x224
[<c02da995>] process_backlog+0x85/0x135
[<c02daacb>] net_rx_action+0x86/0x136
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c01089ed>] do_IRQ+0x185/0x1cf
[<c0106648>] common_interrupt+0x18/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
Code: 83 c4 0c 5b 5e 5f 5d c3 8b 92 7c 01 00 00 83 c2 01 e9 7a fe ff ff 8d 47 64 8b 57 64 39 c2 b8 00 00 00 00 0f 44 d0 a1 a0 f5 39 c0 <2b> 42 50 3b 83 94 00 00 00 77 c7 e9 7b fe ff ff c7 45 f0 00 00
<0>Kernel panic - not syncing: Fatal exception in interrupt
--
James Morris
<[email protected]>
Jesse Barnes wrote:
> On Monday, September 13, 2004 11:10 am, Jesse Barnes wrote:
>
>>On Monday, September 13, 2004 11:06 am, Paul Jackson wrote:
>>
>>>Jesse wrote:
>>>
>>>>I'll send out a more complete one later (unless
>>>>Paul beat me to it,
Sorry, I actually did read your mail about the SD_NODE_INIT thing. It
slipped my mind :(
>>>
>>>See my patch posted a few hours ago:
>>>
>>> [Patch] Fix sched make domain setup overridable
>>
>>Yeah, I saw that, thanks. I meant a more complete dmesg (i.e. one for a
>>bigger system). I've got a 32p reserved for later today.
>
>
> Here's one from a 32p, 16 node machine (captured while scsi was still coming
> up, but you probably don't care about that).
>
OK, in that case you'll also need the attached patch.
Sigh. We'll get there one day.
On Mon, 13 Sep 2004 20:25:38 -0400 (EDT)
James Morris <[email protected]> wrote:
> I'm experiencing TCP related oopses with this kernel (not seen in -mm4),
> .config file attached.
>
> Here are two backtraces, the first happened a few seconds after logging
> in via ssh, the second happened soon after boot (using selinux=0, just to
> make sure).
I think I fixed this one yesterday. Callers of tcp_fragment()
in tcp_output.c were not accounting packets correctly. I
believe this is what will fix it, and this is in Linus's
tree already.
I guess you have an e1000 in this box? :)
(either that or some other card whose driver
enables TSO by default)
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/09/10 15:21:43-07:00 [email protected]
# [TCP]: Fix packet counting when fragmenting already sent packets.
#
# Calls to tcp_fragment() change the tso_factor of
# an SKB, so we need to deal with that.
#
# Signed-off-by: David S. Miller <[email protected]>
#
# net/ipv4/tcp_output.c
# 2004/09/10 15:21:13-07:00 [email protected] +12 -2
# [TCP]: Fix packet counting when fragmenting already sent packets.
#
# Calls to tcp_fragment() change the tso_factor of
# an SKB, so we need to deal with that.
#
# Signed-off-by: David S. Miller <[email protected]>
#
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c 2004-09-13 18:51:38 -07:00
+++ b/net/ipv4/tcp_output.c 2004-09-13 18:51:38 -07:00
@@ -681,8 +681,12 @@
TCP_SKB_CB(skb)->when = tcp_time_stamp;
if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC)))
break;
- /* Advance the send_head. This one is sent out. */
+
+ /* Advance the send_head. This one is sent out.
+ * This call will increment packets_out.
+ */
update_send_head(sk, tp, skb);
+
tcp_minshall_update(tp, mss_now, skb);
sent_pkts = 1;
}
@@ -968,11 +972,17 @@
return -EAGAIN;
if (skb->len > cur_mss) {
+ int old_factor = TCP_SKB_CB(skb)->tso_factor;
+ int new_factor;
+
if (tcp_fragment(sk, skb, cur_mss))
return -ENOMEM; /* We'll try again later. */
/* New SKB created, account for it. */
- tcp_inc_pcount(&tp->packets_out, skb);
+ new_factor = TCP_SKB_CB(skb)->tso_factor;
+ tcp_dec_pcount_explicit(&tp->packets_out,
+ new_factor - old_factor);
+ tcp_inc_pcount(&tp->packets_out, skb->next);
}
/* Collapse two adjacent packets if worthwhile and we can. */
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.
The following 3 updates address various issues expressed to me in
unrelated threads or messages, and while none of them are particularly
pressing each does resolve a concern I've deemed valid.
-- wli
On Mon, Sep 13, 2004 at 07:28:27PM -0700, William Lee Irwin III wrote:
> I was informed that the vendor component of the copyright can't be
> clobbered without more care, so this patch retains the older vendor,
> updating it only to reflect the appropriate time period.
/proc/ breaks when PID_MAX_LIMIT is elevated on 32-bit, so this patch
lowers it there. Compiletested on x86-64.
Index: mm5-2.6.9-rc1/include/linux/threads.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/threads.h 2004-08-13 22:36:12.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/threads.h 2004-09-13 16:28:38.791798576 -0700
@@ -30,6 +30,6 @@
/*
* A maximum of 4 million PIDs should be enough for a while:
*/
-#define PID_MAX_LIMIT (4*1024*1024)
+#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
#endif
On Mon, Sep 13, 2004 at 07:31:14PM -0700, William Lee Irwin III wrote:
> /proc/ breaks when PID_MAX_LIMIT is elevated on 32-bit, so this patch
> lowers it there. Compiletested on x86-64.
The pid_max sysctl doesn't enforce PID_MAX_LIMIT or sane lower bounds.
RESERVED_PIDS + 1 is the minimum pid_max that won't break alloc_pidmap(),
and PID_MAX_LIMIT may not be aligned to 8*PAGE_SIZE boundaries for
unusual values of PAGE_SIZE, so this also rounds up PID_MAX_LIMIT to it.
Compiletested on x86-64.
Index: mm5-2.6.9-rc1/kernel/pid.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/pid.c 2004-09-13 16:30:21.980111568 -0700
+++ mm5-2.6.9-rc1/kernel/pid.c 2004-09-13 16:33:06.324127480 -0700
@@ -36,7 +36,10 @@
#define RESERVED_PIDS 300
-#define PIDMAP_ENTRIES (PID_MAX_LIMIT/PAGE_SIZE/8)
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+
+#define PIDMAP_ENTRIES ((PID_MAX_LIMIT + 8*PAGE_SIZE - 1)/PAGE_SIZE/8)
#define BITS_PER_PAGE (PAGE_SIZE*8)
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE-1)
#define mk_pid(map, off) (((map) - pidmap_array)*BITS_PER_PAGE + (off))
Index: mm5-2.6.9-rc1/kernel/sysctl.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/sysctl.c 2004-09-13 16:27:44.621033784 -0700
+++ mm5-2.6.9-rc1/kernel/sysctl.c 2004-09-13 16:40:46.358191672 -0700
@@ -68,6 +68,7 @@
extern int sched_base_timeslice;
extern int sched_min_base;
extern int sched_max_base;
+extern int pid_max_min, pid_max_max;
#if defined(CONFIG_X86_LOCAL_APIC) && defined(__i386__)
int unknown_nmi_panic;
@@ -577,7 +578,10 @@
.data = &pid_max,
.maxlen = sizeof (int),
.mode = 0644,
- .proc_handler = &proc_dointvec,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = sysctl_intvec,
+ .extra1 = &pid_max_min,
+ .extra2 = &pid_max_max,
},
{
.ctl_name = KERN_PANIC_ON_OOPS,
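
As a rough illustration of the two changes in this patch, the following C
sketch compares the truncating and the rounded-up PIDMAP_ENTRIES formulas and
models a range-checked write to pid_max. The helper names and the
EX_RESERVED_PIDS constant are hypothetical, the page sizes are example values,
and write_pid_max() is only a model: how the real sysctl handler reports an
out-of-range value is not represented here.

```c
#include <assert.h>

/* Example value only; the real constant lives in kernel/pid.c. */
#define EX_RESERVED_PIDS 300

/* Old formula: truncating division, yields 0 when limit < 8 * page_size. */
static unsigned long pidmap_entries_old(unsigned long limit,
                                        unsigned long page_size)
{
    return limit / page_size / 8;
}

/* Formula from the patch: round up to a whole bitmap page. */
static unsigned long pidmap_entries_new(unsigned long limit,
                                        unsigned long page_size)
{
    return (limit + 8 * page_size - 1) / page_size / 8;
}

/*
 * Hypothetical model of a range-checked write to pid_max:
 * a value outside [min, max] is not stored.
 */
static int write_pid_max(int *pid_max, int requested, int min, int max)
{
    if (requested < min || requested > max)
        return -1;
    *pid_max = requested;
    return 0;
}
```

With 4 KB pages and a limit of 4*1024*1024 both formulas give 128 entries.
With 64 KB pages and a limit of 32768, the old formula gives 0 entries and
the rounded-up one gives 1.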
On Mon, Sep 13, 2004 at 07:31:14PM -0700, William Lee Irwin III wrote:
> /proc/ breaks when PID_MAX_LIMIT is elevated on 32-bit, so this patch
> lowers it there. Compiletested on x86-64.
[...]
> -#define PID_MAX_LIMIT (4*1024*1024)
> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
Index: mm5-2.6.9-rc1/include/linux/threads.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/threads.h 2004-08-13 22:36:12.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/threads.h 2004-09-13 19:30:47.552374432 -0700
@@ -30,6 +30,6 @@
/*
* A maximum of 4 million PIDs should be enough for a while:
*/
-#define PID_MAX_LIMIT (4*1024*1024)
+#define PID_MAX_LIMIT (sizeof(long) > 4 ? 4*1024*1024 : PID_MAX_DEFAULT)
#endif
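
The only difference between the two versions of the threads.h patch is the
constant compared against sizeof(long). Since sizeof yields a size in bytes,
the "> 32" test is false wherever long is 4 or 8 bytes, so the first version
would select PID_MAX_DEFAULT on 64-bit as well. A small C sketch of the two
expressions follows; EX_PID_MAX_DEFAULT and the function names are
hypothetical, and the value 0x8000 is assumed for illustration only.

```c
#include <assert.h>

/* Assumed example value; see include/linux/threads.h for the real one. */
#define EX_PID_MAX_DEFAULT 0x8000

/* First version: compares a byte count against 32. */
static int limit_first_version(void)
{
    return sizeof(long) > 32 ? 4 * 1024 * 1024 : EX_PID_MAX_DEFAULT;
}

/* Corrected version: true only when long is wider than 4 bytes. */
static int limit_corrected(void)
{
    return sizeof(long) > 4 ? 4 * 1024 * 1024 : EX_PID_MAX_DEFAULT;
}
```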
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
>> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>> and will later appear at
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
>> Please check kernel.org before using zip.com.au.
On Mon, Sep 13, 2004 at 07:25:30PM -0700, William Lee Irwin III wrote:
> The following 3 updates address various issues expressed to me in
> unrelated threads or messages, and while none of them are particularly
> pressing each does resolve a concern I've deemed valid.
I was informed that the vendor component of the copyright can't be
clobbered without more care, so this patch retains the older vendor,
updating it only to reflect the appropriate time period.
Index: mm5-2.6.9-rc1/kernel/pid.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/pid.c 2004-09-13 16:27:52.608819456 -0700
+++ mm5-2.6.9-rc1/kernel/pid.c 2004-09-13 16:30:21.980111568 -0700
@@ -1,7 +1,8 @@
/*
* Generic pidhash and scalable, time-bounded PID allocator
*
- * (C) 2002-2004 William Irwin, Oracle
+ * (C) 2002-2003 William Irwin, IBM
+ * (C) 2004 William Irwin, Oracle
* (C) 2002-2004 Ingo Molnar, Red Hat
*
* pid-structures are backing objects for tasks sharing a given ID to chain
On Monday, September 13, 2004 7:02 pm, Nick Piggin wrote:
> Sorry, I actually did read your mail about the SD_NODE_INIT thing. It
> slipped my mind :(
Ok, just wanted to make sure I hadn't been spamlisted or something :)
> OK, in that case you'll also need the attached patch.
> Sigh. We'll get there one day.
Ok, I'll give it a try.
Jesse
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.
Not all binfmts page align ->end_code and ->start_code, so the task_mmu
statistics calculations need to perform this allocation themselves.
Index: mm5-2.6.9-rc1/fs/proc/task_mmu.c
===================================================================
--- mm5-2.6.9-rc1.orig/fs/proc/task_mmu.c 2004-09-13 16:27:35.915357248 -0700
+++ mm5-2.6.9-rc1/fs/proc/task_mmu.c 2004-09-13 19:43:19.681033496 -0700
@@ -9,7 +9,7 @@
unsigned long data, text, lib;
data = mm->total_vm - mm->shared_vm - mm->stack_vm;
- text = (mm->end_code - mm->start_code) >> 10;
+ text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
buffer += sprintf(buffer,
"VmSize:\t%8lu kB\n"
@@ -36,7 +36,8 @@
int *data, int *resident)
{
*shared = mm->shared_vm;
- *text = (mm->end_code - mm->start_code) >> PAGE_SHIFT;
+ *text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
+ >> PAGE_SHIFT;
*data = mm->total_vm - mm->shared_vm - *text;
*resident = mm->rss;
return mm->total_vm;
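The effect of rounding the text boundaries to page boundaries can be seen with a small user-space sketch. The macros below follow the usual kernel definitions but assume 4 KiB pages; the function names and the addresses in the usage note are hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT 12                       /* assumed: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* Text size in pages: round the start down and the end up. */
static unsigned long text_pages_aligned(unsigned long start_code,
                                        unsigned long end_code)
{
    assert(end_code >= start_code);
    return (PAGE_ALIGN(end_code) - (start_code & PAGE_MASK)) >> PAGE_SHIFT;
}

/* The pre-patch calculation, kept for comparison. */
static unsigned long text_pages_unaligned(unsigned long start_code,
                                          unsigned long end_code)
{
    return (end_code - start_code) >> PAGE_SHIFT;
}
```

With a hypothetical start_code of 0x08048100 and end_code of 0x0804a080, the text touches three pages, but the unaligned calculation reports only one.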
On Mon, Sep 13, 2004 at 07:53:04PM -0700, William Lee Irwin III wrote:
> Not all binfmts page align ->end_code and ->start_code, so the task_mmu
> statistics calculations need to perform this allocation themselves.
s/allocation/alignment/
-- wli
On Mon, 13 Sep 2004, David S. Miller wrote:
> I think I fixed this one yesterday. Callers of tcp_fragment()
> in tcp_output.c were not accounting packets correctly. I
> believe this is what will fix it, and this is in Linus's
> tree already.
This patch is also in -mm5 (linus.patch), and the oopses go away when I
back it out.
> I guess you have an e1000 in this box? :)
Yes.
- James
--
James Morris
<[email protected]>
David S. Miller <[email protected]> wrote:
>
> @@ -968,11 +972,17 @@
> return -EAGAIN;
>
> if (skb->len > cur_mss) {
> + int old_factor = TCP_SKB_CB(skb)->tso_factor;
> + int new_factor;
> +
> if (tcp_fragment(sk, skb, cur_mss))
> return -ENOMEM; /* We'll try again later. */
>
> /* New SKB created, account for it. */
> - tcp_inc_pcount(&tp->packets_out, skb);
> + new_factor = TCP_SKB_CB(skb)->tso_factor;
> + tcp_dec_pcount_explicit(&tp->packets_out,
> + new_factor - old_factor);
That should be tcp_inc_pcount_explicit.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.
> - Added the `bk-scsi-target' tree to the -mm lineup. It is managed by James
> Bottomley
> - Some enhancements to the ext3 block reservation code here. Please cc
> [email protected] on oops reports ;)
> - There's a patch here which will cause warnings if a PCI device driver is
> removed without having called pci_disable_device(). Please try to cc the
> appropriate mailing list or maintainer when reporting any instances.
I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance;
kernels using periodic timer interrupts are under realtime constraints
to complete whatever work they perform within timer interrupts before
the next timer interrupt arrives lest they livelock, performing no work
whatsoever apart from servicing timer interrupts. The latency of the
cacheline bounce for prof_buffer contributes to the time spent in the
timer interrupt, hence it must be amortized when remote access latencies
or deviations from fair exclusive cacheline acquisition may cause
cacheline bounces to take longer than the interval between timer ticks.
What this patch does is to create a per-cpu open-addressed hashtable
indexed by profile buffer slot holding values representing the number
of pending profile buffer hits. When this hashtable overflows, one
iterates over the hashtable accounting each of the pairs of profile
buffer slots and hit counts to the global profile buffer. Zero is a
legitimate profile buffer slot, so zero hit counts represent unused
hashtable entries. The hashtable is furthermore protected from reentry
into the timer interrupt by interrupt disablement. read_proc_profile()
does not flush the per-cpu hashtables because flushing may cause
timeslice overrun on the systems where prof_buffer cacheline bounces
are so problematic as to livelock the timer interrupt.
This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable because numerous hits may be held for each of its entries.
Before the patch, the number of atomic increments equaled the number of
hits; after the patch, that same number of hits (the sum of the hit
counts held in the hashtable) costs only one atomic_add() per entry in
the per-cpu hashtable. The savings are nondeterministic, but as profile
hits tend to be concentrated in a very small number of profile buffer
slots during any given timing interval, each flush is likely to stand in
for a very large number of atomic increments. This amortization
of atomic increments does not depend on the hash function, only the
(lack of) scattering of profile buffer hits.
I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bugreporters could do that bit for me.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-13 21:36:35.498912144 -0700
@@ -12,10 +12,18 @@
#include <linux/profile.h>
#include <asm/sections.h>
+struct profile_hit {
+ unsigned long pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
+
static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [NR_PROFILE_HIT], cpu_profile_hits);
+#endif /* CONFIG_SMP */
static int __init profile_setup(char * str)
{
@@ -181,6 +189,41 @@
EXPORT_SYMBOL_GPL(profile_event_register);
EXPORT_SYMBOL_GPL(profile_event_unregister);
+#ifdef CONFIG_SMP
+void profile_hit(int type, void *__pc)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ cpu = get_cpu();
+ i = primary = pc & (NR_PROFILE_HIT - 1);
+ secondary = ((~pc << 1) | 1) & (NR_PROFILE_HIT - 1);
+ hits = per_cpu(cpu_profile_hits, cpu);
+ local_irq_save(flags);
+ do {
+ if (hits[i].pc == pc) {
+ hits[i].hits++;
+ goto out;
+ } else if (!hits[i].hits) {
+ hits[i].pc = pc;
+ hits[i].hits = 1;
+ goto out;
+ } else
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+ atomic_inc(&prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i)
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+#else
void profile_hit(int type, void *__pc)
{
unsigned long pc;
@@ -190,6 +233,7 @@
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
}
+#endif
void profile_tick(int type, struct pt_regs *regs)
{
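The probing scheme in the SMP profile_hit() above can be exercised in user space to see the amortization at work. The following is a single-threaded sketch, not the kernel code: the table and buffer sizes are hypothetical and deliberately tiny, plain counters stand in for atomic_t, and global_updates is an added counter that tallies how many atomic operations would have been issued. flush_table() is also an addition for the sake of the demonstration.

```c
#include <assert.h>
#include <string.h>

#define NR_SLOTS 8UL    /* hypothetical profile buffer length */
#define NR_HIT   4UL    /* hypothetical table size; must be a power of two */

struct hit { unsigned long pc, hits; };

static unsigned long global_buf[NR_SLOTS]; /* stands in for prof_buffer */
static struct hit table[NR_HIT];           /* stands in for one cpu's table */
static unsigned long global_updates;       /* would-be atomic operations */

/* Push every pending (slot, count) pair to the global buffer. */
static void flush_table(void)
{
    unsigned long i;

    for (i = 0; i < NR_HIT; ++i) {
        if (!table[i].hits)
            continue;
        global_buf[table[i].pc] += table[i].hits;
        global_updates++;
    }
    memset(table, 0, sizeof(table));
}

static void record_hit(unsigned long pc)
{
    unsigned long i, primary, secondary;

    assert(pc < NR_SLOTS);
    i = primary = pc & (NR_HIT - 1);
    /* The stride is forced odd, hence coprime with the power-of-two
     * table size, so the probe visits every entry exactly once. */
    secondary = ((~pc << 1) | 1) & (NR_HIT - 1);
    do {
        if (table[i].pc == pc) {
            table[i].hits++;
            return;
        } else if (!table[i].hits) {
            table[i].pc = pc;
            table[i].hits = 1;
            return;
        }
        i = (i + secondary) & (NR_HIT - 1);
    } while (i != primary);

    /* Table full: account this hit directly, then flush the rest. */
    global_buf[pc]++;
    global_updates++;
    flush_table();
}
```

Recording the same slot repeatedly leaves the global buffer untouched until a flush, at which point all of those hits cost a single update. Filling the table with four distinct slots and then recording a fifth forces the overflow path.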
On Tue, 14 Sep 2004 13:34:20 +1000
Herbert Xu <[email protected]> wrote:
> > @@ -968,11 +972,17 @@
> > return -EAGAIN;
> >
> > if (skb->len > cur_mss) {
> > + int old_factor = TCP_SKB_CB(skb)->tso_factor;
> > + int new_factor;
> > +
> > if (tcp_fragment(sk, skb, cur_mss))
> > return -ENOMEM; /* We'll try again later. */
> >
> > /* New SKB created, account for it. */
> > - tcp_inc_pcount(&tp->packets_out, skb);
> > + new_factor = TCP_SKB_CB(skb)->tso_factor;
> > + tcp_dec_pcount_explicit(&tp->packets_out,
> > + new_factor - old_factor);
>
> That should be tcp_inc_pcount_explicit.
Better fix is to transpose the factors in the subtraction.
That's what I was trying to do here.
Good eyes Herbert.
James, does this make your problem go away?
Thanks for testing.
===== net/ipv4/tcp_output.c 1.57 vs edited =====
--- 1.57/net/ipv4/tcp_output.c 2004-09-12 16:17:23 -07:00
+++ edited/net/ipv4/tcp_output.c 2004-09-13 21:36:59 -07:00
@@ -991,7 +991,7 @@
/* New SKB created, account for it. */
new_factor = TCP_SKB_CB(skb)->tso_factor;
tcp_dec_pcount_explicit(&tp->packets_out,
- new_factor - old_factor);
+ old_factor - new_factor);
tcp_inc_pcount(&tp->packets_out, skb->next);
}
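The sign problem discussed above is easy to check with plain integers. In this sketch the two helpers are user-space stand-ins whose semantics (subtract or add the given amount) are assumed from the kernel function names; the account_*() functions and the factors in the usage note are hypothetical.

```c
/* User-space stand-ins; semantics assumed from the names. */
static void dec_pcount_explicit(int *count, int amt) { *count -= amt; }
static void inc_pcount_explicit(int *count, int amt) { *count += amt; }

/*
 * packets_out after one skb of old_factor segments is split into a head
 * of new_factor segments and a tail (skb->next) of tail_factor segments.
 */

/* As first posted: decrements by a negative amount, i.e. increments. */
static int account_as_posted(int packets_out, int old_factor,
                             int new_factor, int tail_factor)
{
    dec_pcount_explicit(&packets_out, new_factor - old_factor);
    inc_pcount_explicit(&packets_out, tail_factor);
    return packets_out;
}

/* Herbert's suggestion: keep the operands, switch to the inc helper. */
static int account_inc_variant(int packets_out, int old_factor,
                               int new_factor, int tail_factor)
{
    inc_pcount_explicit(&packets_out, new_factor - old_factor);
    inc_pcount_explicit(&packets_out, tail_factor);
    return packets_out;
}

/* The follow-up patch: keep the dec helper, transpose the operands. */
static int account_transposed(int packets_out, int old_factor,
                              int new_factor, int tail_factor)
{
    dec_pcount_explicit(&packets_out, old_factor - new_factor);
    inc_pcount_explicit(&packets_out, tail_factor);
    return packets_out;
}
```

With a hypothetical packets_out of 10 and a 4-segment skb split into a 1-segment head and a 3-segment tail, both fixes leave the count at 10, while the version as posted inflates it to 16.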
William, any reason not to fully per-cpu the profile buffer
and then only traverse the array when the user attempts to
capture the counters?
Then we can undo the atomics altogether, as well as the cacheline
traffic, for the extremely common case.
Are there space concerns?
On Mon, 13 Sep 2004, David S. Miller wrote:
> James, does this make your problem go away?
Looks like it.
- James
--
James Morris
<[email protected]>
William Lee Irwin III <[email protected]> wrote:
>
> read_proc_profile()
> does not flush the per-cpu hashtables because flushing may cause
> timeslice overrun on the systems where prof_buffer cacheline bounces
> are so problematic as to livelock the timer interrupt.
That's a bit of a problem, isn't it? As we can accumulate an arbitrarily
large number of hits within the hash table is it not possible that the
/proc/profile results could be grossly inaccurate?
If you had two front-ends per cpu to the profiling buffer then the CPU
which is running the /proc/profile read could tell all the other CPUs to
flip to their alternate buffer and can then perform accumulation at its
leisure.
How does oprofile get around this? I guess in most modes the CPUs are not
synchronised.
One wonders how long we should keep flogging the /proc/profile profiling
code. What systems are seeing this livelock?
William Lee Irwin III <[email protected]> wrote:
>> read_proc_profile()
>> does not flush the per-cpu hashtables because flushing may cause
>> timeslice overrun on the systems where prof_buffer cacheline bounces
>> are so problematic as to livelock the timer interrupt.
On Mon, Sep 13, 2004 at 10:05:21PM -0700, Andrew Morton wrote:
> That's a bit of a problem, isn't it? As we can accumulate an arbitrarily
> large number of hits within the hash table is it not possible that the
> /proc/profile results could be grossly inaccurate?
> If you had two front-ends per cpu to the profiling buffer then the CPU
> which is running the /proc/profile read could tell all the other CPUs to
> flip to their alternate buffer and can then perform accumulation at its
> leisure.
This is superior to no flushing; I'll implement that and send out an
incremental update (or if preferred, an update of this patch).
On Mon, Sep 13, 2004 at 10:05:21PM -0700, Andrew Morton wrote:
> How does oprofile get around this? I guess in most modes the CPUs are not
> synchronised.
> One wonders how long we should keep flogging the /prof/profile profiling
> code. What systems are seeing this livelock?
The original bits were merely a consolidation extracted from a since-
dropped feature patch and an unrelated feature patch from mingo and
arjanv; this is an unrelated fix for SGI's stability issue on larger
Altixen. I personally intend to do no further adjustments.
-- wli
On Mon, Sep 13, 2004 at 10:05:07PM -0700, David S. Miller wrote:
> William, any reason not to fully per-cpu the profile buffer
> and then only traverse the array when the user attempts to
> capture the counters?
> Then we can undo the atomics altogether, as well as the cacheline
> traffic, for the extremely common case.
> Are there space concerns?
This was my original approach (modulo eliminating the global buffer
and the atomic operations), but space concerns stymied it, as the
profile buffer can be several megabytes large. It would likely perform
better in general if admissible, for whatever value performance is
considered to have.
There is also an unusual facet to this; the TLB overhead of a loop like:
for (i = 0; i < prof_len; ++i) {
for_each_online_cpu(cpu)
global_buf[i] += per_cpu(cpu_prof_buffer, cpu)[i];
}
is very large and caused "effective nontermination", otherwise known as
"exhausting the user's patience", on SGI's systems after about half an
hour. So some TLB overhead amortization is necessary for this to be
feasible. I suspect iterating over pages of the profile buffer and
storing intermediate results for a page full of profile buffer hits
in a buffer page may suffice though I've not tried it.
-- wli
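The page-at-a-time accumulation suggested in the last paragraph above might look roughly like the following user-space sketch. It is not from the thread and, like the suggestion itself, has not been tried on the affected machines. The sizes are hypothetical, CHUNK stands in for one page worth of counters, and plain arrays stand in for the per-cpu buffers.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS  4      /* hypothetical */
#define PROF_LEN 64     /* hypothetical */
#define CHUNK    16     /* stands in for one page worth of counters */

static unsigned int cpu_buf[NR_CPUS][PROF_LEN];
static unsigned int global_buf[PROF_LEN];

static void accumulate_chunked(void)
{
    unsigned int scratch[CHUNK];
    size_t base, i, n;
    int cpu;

    for (base = 0; base < PROF_LEN; base += CHUNK) {
        n = PROF_LEN - base < CHUNK ? PROF_LEN - base : CHUNK;
        for (i = 0; i < n; ++i)
            scratch[i] = 0;
        /* Each cpu's chunk is walked contiguously, so one translation
         * serves a whole chunk instead of a single counter. */
        for (cpu = 0; cpu < NR_CPUS; ++cpu)
            for (i = 0; i < n; ++i)
                scratch[i] += cpu_buf[cpu][base + i];
        /* One pass over the matching chunk of the global buffer. */
        for (i = 0; i < n; ++i)
            global_buf[base + i] += scratch[i];
    }
}
```

The result is the same as the loop quoted above; only the traversal order changes, moving the per-cpu iteration outside the per-counter iteration within each chunk.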
On Mon, 13 Sep 2004 22:32:18 -0700
William Lee Irwin III <[email protected]> wrote:
> This was my original approach (modulo eliminating the global buffer
> and the atomic operations), but space concerns stymied it, as the
> profile buffer can be several megabytes large. It would likely perform
> better in general if admissible, for whatever value performance is
> considered to have.
>
> There is also an unusual facet to this; the TLB overhead of a loop like:
> for (i = 0; i < prof_len; ++i) {
> for_each_online_cpu(cpu)
> global_buf[i] += per_cpu(cpu_prof_buffer, cpu)[i];
> }
> is very large and caused "effective nontermination", otherwise known as
> "exhausting the user's patience", on SGI's systems after about half an
> hour. So some TLB overhead amortization is necessary for this to be
> feasible. I suspect iterating over pages of the profile buffer and
> storing intermediate results for a page full of profile buffer hits
> in a buffer page may suffice though I've not tried it.
I bet that, like we found out about page tables on 64-bit, these
profile buffers are sparsely populated with hits. So perhaps a
per-cpu bitmap that indicates regions that might have any hits
at all, allowing large amounts of skipping and thus amortizing the
scan cost.
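One way to read the bitmap suggestion above is sketched below in user-space C. Nothing here comes from a posted patch: the structure, the function names, and the region size are all hypothetical, and a single unsigned long serves as the per-cpu bitmap.

```c
#include <assert.h>

#define PROF_LEN    64   /* hypothetical */
#define REGION_SIZE 16   /* hypothetical counters per bitmap bit */
#define NR_REGIONS  (PROF_LEN / REGION_SIZE)

struct cpu_prof {
    unsigned int counts[PROF_LEN];
    unsigned long dirty;    /* bit r set: region r may hold hits */
};

static void cpu_prof_hit(struct cpu_prof *p, unsigned long slot)
{
    assert(slot < PROF_LEN);
    p->counts[slot]++;
    p->dirty |= 1UL << (slot / REGION_SIZE);
}

/* Fold one cpu's counts into global[]; returns the regions scanned. */
static unsigned int cpu_prof_drain(struct cpu_prof *p, unsigned int *global)
{
    unsigned int r, i, scanned = 0;

    for (r = 0; r < NR_REGIONS; ++r) {
        if (!(p->dirty & (1UL << r)))
            continue;       /* no hits recorded here: skip the region */
        for (i = r * REGION_SIZE; i < (r + 1) * REGION_SIZE; ++i) {
            global[i] += p->counts[i];
            p->counts[i] = 0;
        }
        scanned++;
    }
    p->dirty = 0;
    return scanned;
}
```

If hits really are concentrated in a few regions, the drain touches only those regions and skips the rest of the buffer.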
On Mon, 13 Sep 2004 22:32:18 -0700 William Lee Irwin III wrote:
>> There is also an unusual facet to this; the TLB overhead of a loop like:
[...]
>> is very large and caused "effective nontermination", otherwise known as
>> "exhausting the user's patience", on SGI's systems after about half an
>> hour. So some TLB overhead amortization is necessary for this to be
>> feasible. I suspect iterating over pages of the profile buffer and
>> storing intermediate results for a page full of profile buffer hits
>> in a buffer page may suffice though I've not tried it.
On Mon, Sep 13, 2004 at 10:49:43PM -0700, David S. Miller wrote:
> I bet that, like we found out about page tables on 64-bit, these
> profile buffers are sparsely populated with hits. So perhaps a
> per-cpu bitmap that indicates regions that might have any hits
> at all, allowing large amounts of skipping and thus amortizing the
> scan cost.
Well, that would speed it up, but the catastrophe was avoided in the
older patches by just processing all the hits for one cpu at a time,
and the buffering methods above for your suggested accounting
structures likely work well enough that the overhead of processing unused
portions of the bitmap can be ignored. I don't really want to go about
addressing performance issues besides effective or actual
nontermination for this code, and would rather leave highly efficient
methods to oprofile (in fact, some others believe that even bugfixes
for such issues should be ignored for kernel/profile.c, contrary to my
notion that it shouldn't crash systems regardless of their size).
-- wli
On Mon, Sep 13, 2004 at 11:10:23PM -0700, William Lee Irwin III wrote:
> Well, that would speed it up, but the catastrophe was avoided in the
> older patches by just processing all the hits for one cpu at a time,
> and the buffering methods above for your suggested accounting
> structures likely work well enough the overhead of processing unused
> portions of the bitmap can be ignored. I don't really want to go about
> addressing performance issues besides effective or actual
> nontermination for this code, and would rather leave highly efficient
> methods to oprofile (in fact, some others believe that even bugfixes
> for such issues should be ignored for kernel/profile.c, contrary to my
> notion that it shouldn't crash systems regardless of their size).
s/portions of the bitmap/portions of the profile buffer/
--- ./kernel/exit.c.nt 2004-09-13 18:00:12.727181136 +0400
+++ ./kernel/exit.c 2004-09-13 18:00:51.864231400 +0400
@@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
task_t fastcall *next_thread(const task_t *p)
{
#ifdef CONFIG_SMP
- if (!p->sighand)
- BUG();
- if (!spin_is_locked(&p->sighand->siglock) &&
- !rwlock_is_locked(&tasklist_lock))
+ if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
BUG();
#endif
return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);
On Mon, Sep 13, 2004 at 10:05:21PM -0700, Andrew Morton wrote:
>> That's a bit of a problem, isn't it? As we can accumulate an arbitrarily
>> large number of hits within the hash table is it not possible that the
>> /proc/profile results could be grossly inaccurate?
>> If you had two front-ends per cpu to the profiling buffer then the CPU
>> which is running the /proc/profile read could tell all the other CPUs to
>> flip to their alternate buffer and can then perform accumulation at its
>> leisure.
On Mon, Sep 13, 2004 at 10:21:18PM -0700, William Lee Irwin III wrote:
> This is superior to no flushing; I'll implement that and send out an
> incremental update (or if preferred, an update of this patch).
I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance;
kernels using periodic timer interrupts are under realtime constraints
to complete whatever work they perform within timer interrupts before
the next timer interrupt arrives lest they livelock, performing no work
whatsoever apart from servicing timer interrupts. The latency of the
cacheline bounce for prof_buffer contributes to the time spent in the
timer interrupt, hence it must be amortized when remote access latencies
or deviations from fair exclusive cacheline acquisition may cause
cacheline bounces to take longer than the interval between timer ticks.
What this patch does is to create a per-cpu open-addressed hashtable
indexed by profile buffer slot holding values representing the number
of pending profile buffer hits. When this hashtable overflows, one
iterates over the hashtable accounting each of the pairs of profile
buffer slots and hit counts to the global profile buffer. Zero is a
legitimate profile buffer slot, so zero hit counts represent unused
hashtable entries. The hashtable is furthermore protected from reentry
into the timer interrupt by interrupt disablement. In order to "flush"
the pending profile hits for read_profile(), this patch actually
creates a pair of per-cpu profile buffers, and at the time of
read_profile() IPI's all cpus to get them to flip between their pairs
of profile buffers, doing all the work to flush the profile hits from
the older per-cpu buffers in the context of the caller of read_profile(),
with exclusion provided by a semaphore ensuring that only one caller of
profile_flip_buffers() may execute at a time and interrupt disablement
to prevent buffer flip IPI's from altering the hashtables or flip state
while an update is in progress. The flip state is per-cpu so that
remote cpus need only disable interrupts locally for synchronization,
which is both simple and busywait-free for remote cpus, and the flip
states all change in tandem with the cpu requesting the update waiting
for the completion of smp_call_function() for notification that all
cpus have finished flipping their buffers. The IPI handler merely
toggles the flip state (which is an array index) between 0 and 1.
This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable because numerous hits may be held for each of its entries.
Before the patch, the number of atomic increments equaled the number of
hits; after the patch, that same number of hits (the sum of the hit
counts held in the hashtable) costs only one atomic_add() per entry in
the per-cpu hashtable. The savings are nondeterministic, but as profile
hits tend to be concentrated in a very small number of profile buffer
slots during any given timing interval, each flush is likely to stand in
for a very large number of atomic increments. This amortization
of atomic increments does not depend on the hash function, only the
(lack of) scattering of profile buffer hits.
I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bugreporters could do that bit for me.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-13 23:12:27.574463744 -0700
@@ -11,11 +11,21 @@
#include <linux/cpumask.h>
#include <linux/profile.h>
#include <asm/sections.h>
+#include <asm/semaphore.h>
+
+struct profile_hit {
+ unsigned long pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [2][NR_PROFILE_HIT], cpu_profile_hits);
+static DEFINE_PER_CPU(int, cpu_profile_flip);
+#endif /* CONFIG_SMP */
static int __init profile_setup(char * str)
{
@@ -181,6 +191,74 @@
EXPORT_SYMBOL_GPL(profile_event_register);
EXPORT_SYMBOL_GPL(profile_event_unregister);
+#ifdef CONFIG_SMP
+static void __profile_flip_buffers(void *unused)
+{
+ int cpu = get_cpu();
+ unsigned long flags;
+
+ local_irq_save(flags);
+ per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
+ local_irq_restore(flags);
+ put_cpu();
+}
+
+static void profile_flip_buffers(void)
+{
+ static DECLARE_MUTEX(profile_flip_mutex);
+ int i, j, cpu;
+
+ down(&profile_flip_mutex);
+ j = per_cpu(cpu_profile_flip, smp_processor_id());
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
+ for (i = 0; i < NR_PROFILE_HIT; ++i) {
+ if (!hits[i].hits)
+ continue;
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ }
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
+void profile_hit(int type, void *__pc)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ cpu = get_cpu();
+ i = primary = pc & (NR_PROFILE_HIT - 1);
+ secondary = ((~pc << 1) | 1) & (NR_PROFILE_HIT - 1);
+ hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ local_irq_save(flags);
+ do {
+ if (hits[i].pc == pc) {
+ hits[i].hits++;
+ goto out;
+ } else if (!hits[i].hits) {
+ hits[i].pc = pc;
+ hits[i].hits = 1;
+ goto out;
+ } else
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+ atomic_inc(&prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i)
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+#else /* !CONFIG_SMP */
+#define profile_flip_buffers() do { } while (0)
+
void profile_hit(int type, void *__pc)
{
unsigned long pc;
@@ -190,6 +268,7 @@
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
}
+#endif /* !CONFIG_SMP */
void profile_tick(int type, struct pt_regs *regs)
{
@@ -256,6 +335,7 @@
char * pnt;
unsigned int sample_step = 1 << prof_shift;
+ profile_flip_buffers();
if (p >= (prof_len+1)*sizeof(unsigned int))
return 0;
if (count > (prof_len+1)*sizeof(unsigned int) - p)
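The buffer-flipping scheme in the patch above can be modeled in a few lines of single-threaded user-space C. This is a simplification under stated assumptions: plain arrays replace the per-cpu hashtables, a loop replaces the IPI, and there is no locking because nothing runs concurrently. All names and sizes are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS  2   /* hypothetical */
#define NR_SLOTS 8   /* hypothetical */

static unsigned int pending[NR_CPUS][2][NR_SLOTS]; /* two front-ends per cpu */
static int flip[NR_CPUS];                           /* active front-end index */
static unsigned int global_buf[NR_SLOTS];

/* Writer side: a hit lands in the cpu's currently active front-end. */
static void hit(int cpu, unsigned int slot)
{
    assert(cpu >= 0 && cpu < NR_CPUS && slot < NR_SLOTS);
    pending[cpu][flip[cpu]][slot]++;
}

/* Reader side: switch every cpu over, then drain the old front-ends. */
static void flip_and_drain(void)
{
    int cpu, old = flip[0];   /* all flip states change in tandem */
    unsigned int i;

    /* In the patch this step is an IPI to each cpu; here it is a loop. */
    for (cpu = 0; cpu < NR_CPUS; ++cpu)
        flip[cpu] = !flip[cpu];

    /* Writers now use the other front-end, so the old one is quiescent
     * and can be folded into the global buffer without racing them. */
    for (cpu = 0; cpu < NR_CPUS; ++cpu) {
        for (i = 0; i < NR_SLOTS; ++i)
            global_buf[i] += pending[cpu][old][i];
        memset(pending[cpu][old], 0, sizeof(pending[cpu][old]));
    }
}
```

Hits recorded after a flip stay invisible in the global buffer until the next flip, which is the behaviour a reader of /proc/profile would see between two reads.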
William Lee Irwin III <[email protected]> wrote:
>
A few comments which describe the design would be nice...
> +#ifdef CONFIG_SMP
> +static void __profile_flip_buffers(void *unused)
> +{
> + int cpu = get_cpu();
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
> + local_irq_restore(flags);
> + put_cpu();
> +}
hm. Does an IPI handler need to disable local IRQs?
> +static void profile_flip_buffers(void)
> +{
> + static DECLARE_MUTEX(profile_flip_mutex);
> + int i, j, cpu;
> +
> + down(&profile_flip_mutex);
> + j = per_cpu(cpu_profile_flip, smp_processor_id());
Is this preempt-safe?
> + on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
> + for_each_online_cpu(cpu) {
> + struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
William Lee Irwin III <[email protected]> wrote:
[...]
On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> A few comments which describe the design would be nice...
Okay, I'll add a few in another update. I suppose what's going on may
not be as obvious to everyone else even with the code in hand.
On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
>> + local_irq_save(flags);
>> + per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
>> + local_irq_restore(flags);
>> + put_cpu();
>> +}
On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> hm. Does an IPI handler need to disable local IRQs?
It's for exclusion from the timer interrupt. It looks like ia32 enters
the calls with interrupts disabled, so it's probably safe to assume
it's called with disabled interrupts on all architectures (and any
architecture that doesn't is already broken by other callers elsewhere). I'll send
out an update with the explicit interrupt disablement removed.
William Lee Irwin III <[email protected]> wrote:
>> + down(&profile_flip_mutex);
>> + j = per_cpu(cpu_profile_flip, smp_processor_id());
On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> Is this preempt-safe?
Yes. It's irrelevant which cpu's cpu_profile_flip is sampled. But
it's not cpu hotplug -safe, as the cpu may be offlined and the per-cpu
storage freed in the duration between calling smp_processor_id()
and dereferencing the offset from the start of the per-cpu area.
Disabling preemption while it's being sampled (no longer than that is
necessary) would repair it for cpu hotplug, as it would then have a
valid cpu (the one on which it's executing) while the flip state is
being sampled (it can't change because we own the semaphore, and won't
vary by cpu unless the on_each_cpu() is in flight, but we have to have
a valid cpu number to sample it). The cpucontrol semaphore would
be excessively heavyweight and we'd either have to conditionally
compile out the native semaphore for the cpu hotplug case or otherwise
acquire two semaphores in succession.
This raises an interesting question of how on earth for_each_online_cpu()
is handled by cpu hotplug, but I don't feel responsible for answering it.
So, my preferred fix is the following, with which I'll send out an
updated patch if everyone agrees:
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 23:12:27.574463744 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 00:10:29.820081944 -0700
@@ -209,7 +209,8 @@
int i, j, cpu;
down(&profile_flip_mutex);
- j = per_cpu(cpu_profile_flip, smp_processor_id());
+ j = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
for_each_online_cpu(cpu) {
struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
>> A few comments which describe the design would be nice...
On Tue, Sep 14, 2004 at 12:55:44AM -0700, William Lee Irwin III wrote:
> Okay, I'll add a few in another update. I suppose what's going on may
> not be as obvious to everyone else even with the code in hand.
The comments and all other issues raised in your reply have been
addressed in the following updated patch, in which I also shrank the
hashtable entries' fields to u32 for 64-bit machines, on which the full
precision of an unsigned long is unnecessary, added some commentary to
the beginning of the file describing its contents and the recent major
work done on it, and simplified the secondary hash function. I also
presume silence to be assent regarding the hotplug (not preempt) fix.
-- wli
I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance;
kernels using periodic timer interrupts are under realtime constraints
to complete whatever work they perform within timer interrupts before
the next timer interrupt arrives lest they livelock, performing no work
whatsoever apart from servicing timer interrupts. The latency of the
cacheline bounce for prof_buffer contributes to the time spent in the
timer interrupt, hence it must be amortized when remote access latencies
or deviations from fair exclusive cacheline acquisition may cause
cacheline bounces to take longer than the interval between timer ticks.
What this patch does is to create a per-cpu open-addressed hashtable
indexed by profile buffer slot holding values representing the number
of pending profile buffer hits. When this hashtable overflows, one
iterates over the hashtable accounting each of the pairs of profile
buffer slots and hit counts to the global profile buffer. Zero is a
legitimate profile buffer slot, so zero hit counts represent unused
hashtable entries. The hashtable is furthermore protected from reentry
into the timer interrupt by interrupt disablement. In order to "flush"
the pending profile hits for read_profile(), this patch actually
creates a pair of per-cpu profile buffers, and at the time of
read_profile() IPI's all cpus to get them to flip between their pairs
of profile buffers, doing all the work to flush the profile hits from
the older per-cpu buffers in the context of the caller of read_profile(),
with exclusion provided by a semaphore ensuring that only one caller of
profile_flip_buffers() may execute at a time and interrupt disablement
to prevent buffer flip IPI's from altering the hashtables or flip state
while an update is in progress. The flip state is per-cpu so that
remote cpus need only disable interrupts locally for synchronization,
which is both simple and busywait-free for remote cpus, and the flip
states all change in tandem with the cpu requesting the update waiting
for the completion of smp_call_function() for notification that all
cpus have finished flipping their buffers. The IPI handler merely
toggles the flip state (which is an array index) between 0 and 1.
This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable because numerous hits may be held for each of its entries.
This reduces what was before the patch a number of atomic increments
equal to what after the patch becomes the sum of the hits held for each
entry in the hashtable, to a number of atomic_add()'s equal to the
number of entries in the per_cpu hashtable. This is nondeterministic,
but as the profile hits tend to be concentrated in a very small number
of profile buffer slots during any given timing interval, is likely to
represent a very large number of atomic increments. This amortization
of atomic increments does not depend on the hash function, only the
(lack of) scattering of profile buffer hits.
I also took the liberty of adding some commentary to the comments at
the beginning of the file reflecting the major work done on profile.c
in recent months and describing what the file implements.
I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bugreporters could do that bit for me.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 01:27:49.675716672 -0700
@@ -1,5 +1,16 @@
/*
* linux/kernel/profile.c
+ * Simple profiling. Manages a direct-mapped profile hit count buffer,
+ * with configurable resolution, support for restricting the cpus on
+ * which profiling is done, and switching between cpu time and
+ * schedule() calls via kernel command line parameters passed at boot.
+ *
+ * Scheduler profiling support, Arjan van de Ven and Ingo Molnar,
+ * Red Hat, July 2004
+ * Consolidation of architecture support code for profiling,
+ * William Irwin, Oracle, July 2004
+ * Amortized hit count accounting via per-cpu open-addressed hashtables
+ * to resolve timer interrupt livelocks, William Irwin, Oracle, 2004
*/
#include <linux/config.h>
@@ -11,11 +22,21 @@
#include <linux/cpumask.h>
#include <linux/profile.h>
#include <asm/sections.h>
+#include <asm/semaphore.h>
+
+struct profile_hit {
+ u32 pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [2][NR_PROFILE_HIT], cpu_profile_hits);
+static DEFINE_PER_CPU(int, cpu_profile_flip);
+#endif /* CONFIG_SMP */
static int __init profile_setup(char * str)
{
@@ -181,6 +202,100 @@
EXPORT_SYMBOL_GPL(profile_event_register);
EXPORT_SYMBOL_GPL(profile_event_unregister);
+#ifdef CONFIG_SMP
+/*
+ * Each cpu has a pair of open-addressed hashtables for pending
+ * profile hits. read_profile() IPI's all cpus to request them
+ * to flip buffers and flushes their contents to prof_buffer itself.
+ * Flip requests are serialized by the profile_flip_mutex. The sole
+ * use of having a second hashtable is for avoiding cacheline
+ * contention that would otherwise happen during flushes of pending
+ * profile hits required for the accuracy of reported profile hits
+ * and so resurrect the interrupt livelock issue.
+ *
+ * The open-addressed hashtables are indexed by profile buffer slot
+ * and hold the number of pending hits to that profile buffer slot on
+ * a cpu in an entry. When the hashtable overflows, all pending hits
+ * are accounted to their corresponding profile buffer slots with
+ * atomic_add() and the hashtable emptied. As numerous pending hits
+ * may be accounted to a profile buffer slot in a hashtable entry,
+ * this amortizes a number of atomic profile buffer increments likely
+ * to be far larger than the number of entries in the hashtable,
+ * particularly given that the number of distinct profile buffer
+ * positions to which hits are accounted during short intervals (e.g.
+ * several seconds) is usually very small. Exclusion from buffer
+ * flipping is provided by interrupt disablement (note that for
+ * SCHED_PROFILING profile_hit() may be called from process context).
+ * The hash function is meant to be lightweight as opposed to strong,
+ * and was vaguely inspired by ppc64 firmware-supported inverted
+ * pagetable hash functions, but doesn't use finite collision chains.
+ *
+ * -- wli
+ */
+static void __profile_flip_buffers(void *unused)
+{
+ int cpu = smp_processor_id();
+
+ per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
+}
+
+static void profile_flip_buffers(void)
+{
+ static DECLARE_MUTEX(profile_flip_mutex);
+ int i, j, cpu;
+
+ down(&profile_flip_mutex);
+ j = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
+ for (i = 0; i < NR_PROFILE_HIT; ++i) {
+ if (!hits[i].hits)
+ continue;
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ }
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
+void profile_hit(int type, void *__pc)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ i = primary = pc & (NR_PROFILE_HIT - 1);
+ secondary = ~(pc << 1) & (NR_PROFILE_HIT - 1);
+ cpu = get_cpu();
+ hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ local_irq_save(flags);
+ do {
+ if (hits[i].pc == pc) {
+ hits[i].hits++;
+ goto out;
+ } else if (!hits[i].hits) {
+ hits[i].pc = pc;
+ hits[i].hits = 1;
+ goto out;
+ } else
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+ atomic_inc(&prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i)
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+#else /* !CONFIG_SMP */
+#define profile_flip_buffers() do { } while (0)
+
void profile_hit(int type, void *__pc)
{
unsigned long pc;
@@ -190,6 +305,7 @@
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
}
+#endif /* !CONFIG_SMP */
void profile_tick(int type, struct pt_regs *regs)
{
@@ -256,6 +372,7 @@
char * pnt;
unsigned int sample_step = 1 << prof_shift;
+ profile_flip_buffers();
if (p >= (prof_len+1)*sizeof(unsigned int))
return 0;
if (count > (prof_len+1)*sizeof(unsigned int) - p)
Rafael J. Wysocki writes:
> On Monday 13 of September 2004 10:50, Andrew Morton wrote:
> >
> > Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> >
> > and will later appear at
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
>
> It does not compile on SMP x86-64 w/ NUMA:
>
> CC arch/x86_64/ia32/ia32_ioctl.o
> In file included from fs/compat_ioctl.c:63,
> from arch/x86_64/ia32/ia32_ioctl.c:14:
> include/linux/reiserfs_fs.h:441: error: redefinition of `struct key'
> include/linux/reiserfs_fs.h: In function `le_key_k_offset':
include/linux/key.h defines struct key that conflicts with reiserfs'
struct key. As a temporary fix turn off CONFIG_KEYS (or
CONFIG_REISERFS_FS :)).
Correct solution is to put both structs into proper namespaces by
prefixing them.
> Greets,
> RJW
>
Nikita.
Nikita Danilov <[email protected]> wrote:
>
> include/linux/key.h defines struct key that conflicts with reiserfs'
> struct key. As a temporary fix turn off CONFIG_KEYS (or
> CONFIG_REISERFS_FS :)).
>
> Correct solution is to put both structs into proper namespaces by
> prefixing them.
struct key was pretty dumb of both of you, but reiserfs was dumb first.
David, what do you want it renamed to?
On Monday 13 September 2004 10:50, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> and will later appear at
>
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
100% reproducible under heavy IO load:
Sep 14 11:42:59 odyssey kernel: journal_bmap: journal block not found at
offset 2060 on hda12
Sep 14 11:42:59 odyssey kernel: Aborting journal on device hda12.
Sep 14 11:42:59 odyssey kernel: EXT3-fs error (device hda12) in
ext3_dirty_inode: IO failure
Sep 14 11:43:00 odyssey kernel: ext3_abort called.
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12):
ext3_journal_start: Detected aborted journal
Sep 14 11:43:00 odyssey kernel: Remounting filesystem read-only
Sep 14 11:43:00 odyssey kernel: ext3_reserve_inode_write: aborting
transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs
error (device hda12) in ext3_reserve_inode_write: Journal has aborted
Sep 14 11:43:00 odyssey kernel: ext3_reserve_inode_write: aborting
transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs
error (device hda12) in ext3_reserve_inode_write: Journal has aborted
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12) in
ext3_orphan_del: Journal has aborted
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12) in ext3_truncate:
Journal has aborted
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:01 odyssey last message repeated 17 times
Sep 14 11:43:01 odyssey kernel: or (device hda12) in start_transaction:
Journal has aborted
Sep 14 11:43:01 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:02 odyssey last message repeated 53 times
Sep 14 11:43:02 odyssey kernel: EXT3-fs error (device hda12) in staror (device
hda12) in start_transaction: Journal has aborted
Sep 14 11:43:02 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:03 odyssey last message repeated 53 times
Sep 14 11:43:03 odyssey kernel: EXT3-fs error (device hda12) in staror (device
hda12) in start_transaction: Journal has aborted
Sep 14 11:43:03 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:34 odyssey last message repeated 147542 times
On Mon, 13 Sep 2004 19:31:14 -0700, William Lee Irwin III wrote:
> -#define PID_MAX_LIMIT (4*1024*1024)
> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
An architecture with sizeof(long) > 32? -- Most impressive.
Roger
On 2004-09-14T12:55:27,
Roger Luethi <[email protected]> said:
> > -#define PID_MAX_LIMIT (4*1024*1024)
> > +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
> An architecture with sizeof(long) > 32? -- Most impressive.
x86_64, s390x, ppc64...
On Mon, Sep 13, 2004 at 09:47:48PM -0700, William Lee Irwin III wrote:
> timer interrupt, usually at boot. The following patch attempts to
> amortize the atomic operations done on the profile buffer to address
> this stability concern. This patch has nothing to do with performance;
isn't it *much* simpler and much more efficient to just have a per-cpu
idle function? I seriously doubt you'll get simultaneous collisions on
anything but the 'halt' instruction in the idle function.
On 2004-09-14T13:10:24,
Lars Marowsky-Bree <[email protected]> said:
> > > -#define PID_MAX_LIMIT (4*1024*1024)
> > > +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
> > An architecture with sizeof(long) > 32? -- Most impressive.
> x86_64, s390x, ppc64...
yesyes. I can't tell the difference between bytes and bits either.
Forget it ;-)
On Tue, 14 Sep 2004 13:10:24 +0200, Lars Marowsky-Bree wrote:
> On 2004-09-14T12:55:27,
> Roger Luethi <[email protected]> said:
>
> > > -#define PID_MAX_LIMIT (4*1024*1024)
> > > +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
> > An architecture with sizeof(long) > 32? -- Most impressive.
>
> x86_64, s390x, ppc64...
Really.
> > Correct solution is to put both structs into proper namespaces by
> > prefixing them.
>
> struct key was pretty dumb of both of you, but reiserfs was dumb first.
Well, I argue that it wasn't that dumb - in this case it's meant to be a
generic mechanism usable by everything in the kernel or userspace that needs
authentication, authorisation, or crypto tokens. I use EXT3 rather than
ReiserFS, so it didn't become an issue.
> David, what do you want it renamed to?
key_struct? token? key_token?
Possibly ticket or principal, though they make it sound like it's specifically
for Kerberos, so perhaps not.
What I need is a thesaurus.
JamesM: any good suggestion as to a name?
David
On Tue, 14 Sep 2004, David Howells wrote:
> > David, what do you want it renamed to?
>
> key_struct? token? key_token?
>
> Possibly ticket or principal, though they make it sound like it's specifically
> for Kerberos, so perhaps not.
Then there's the related problem of what to do about the naming of
key_alloc(), key.h etc.
What about 'akey', where a is for authentication or access.
- James
--
James Morris
<[email protected]>
> What about 'akey', where a is for authentication or access.
How about struct key_cookie? Though I think I like struct key_token better. I
like struct key even better though:-)
David
On Mon, 13 Sep 2004 19:31:14 -0700, William Lee Irwin III wrote:
>> -#define PID_MAX_LIMIT (4*1024*1024)
>> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
On Tue, Sep 14, 2004 at 12:55:27PM +0200, Roger Luethi wrote:
> An architecture with sizeof(long) > 32? -- Most impressive.
Did the correction not arrive?
-- wli
On Mon, Sep 13, 2004 at 09:47:48PM -0700, William Lee Irwin III wrote:
>> timer interrupt, usually at boot. The following patch attempts to
>> amortize the atomic operations done on the profile buffer to address
>> this stability concern. This patch has nothing to do with performance;
On Tue, Sep 14, 2004 at 01:34:19PM +0200, Andrea Arcangeli wrote:
> isn't it *much* simpler and much more efficient to just have a per-cpu
> idle function? I seriously doubt you'll get simultaneous collisions on
> anything but the 'halt' instruction in the idle function.
Sampling the profile buffer at regular intervals shows far less than
256 distinct functions hit in 1s intervals even with all cpus busy. As
for whether that would be sufficient, that will have to be answered by
those who reported the bug. I suppose to test whether things besides
idling do cause this problem, one would boot with a restricted
prof_cpu_mask, load all cpus on the machine, set the prof_cpu_mask to
unrestricted, and see if it livelocks before the load terminates.
-- wli
On Tue, 14 Sep 2004 08:41:44 -0700, William Lee Irwin III wrote:
> On Mon, 13 Sep 2004 19:31:14 -0700, William Lee Irwin III wrote:
> >> -#define PID_MAX_LIMIT (4*1024*1024)
> >> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
>
> On Tue, Sep 14, 2004 at 12:55:27PM +0200, Roger Luethi wrote:
> > An architecture with sizeof(long) > 32? -- Most impressive.
>
> Did the correction not arrive?
Must have missed it.
Roger
On Tuesday, September 14, 2004 9:05 am, Andrea Arcangeli wrote:
> It's probably worth measuring. The real bottleneck happens when all
> cpus try to get an exclusive lock on the same cacheline at the *same*
> time. 1 second is a pretty long time; if there's no contention for the
> cacheline, things are normally ok.
Right, we want to avoid that heavy contention.
> This is basically the same issue we had with RCU: all timers fired
> at the same wall clock time, and all of them tried to change bits in the
> same cacheline at the same time, which is a workload that collapses a
> 512-way machine ;). The profile timer is no different.
>
> Simply removing the idle time accounting would fix it. This cripples
> functionality a little bit, but it'll be a very good way to
> test if my theory is correct, or if you truly need some per-cpu logic in
> the profiler.
>
> You could also fake it, have a per-cpu counter only for the current->pid
> case, and then once somebody reads /proc/profile, you flush the total
> per-cpu count to the counter in the buffer that corresponds to the EIP
> of the idle func.
>
> Before deciding I'd suggest having a look to see how the below patch
> compares to your approach in performance terms.
It looks like the 512p we have here is pretty heavily reserved this week, so
I'm not sure if I'll be able to test this (someone else might, John?). I
think the balance we're looking for is between simplicity and non-brokenness.
Builtin profiling is *supposed* to be simple and dumb, and were it not for
the readprofile times, I'd say per-cpu would be the way to go just because it
retains the simplicity of the current approach while allowing it to work on
large machines (as well as limiting the performance impact of builtin
profiling in general). wli's approach seems like a reasonable tradeoff
though, assuming what you suggest doesn't work.
Thanks,
Jesse
On Tue, Sep 14, 2004 at 08:51:03AM -0700, William Lee Irwin III wrote:
> On Mon, Sep 13, 2004 at 09:47:48PM -0700, William Lee Irwin III wrote:
> >> timer interrupt, usually at boot. The following patch attempts to
> >> amortize the atomic operations done on the profile buffer to address
> >> this stability concern. This patch has nothing to do with performance;
>
> On Tue, Sep 14, 2004 at 01:34:19PM +0200, Andrea Arcangeli wrote:
> > isn't it *much* simpler and much more efficient to just have a per-cpu
> > idle function? I seriously doubt you'll get simultaneous collisions on
> > anything but the 'halt' instruction in the idle function.
>
> Sampling the profile buffer at regular intervals shows far less than
> 256 distinct functions hit in 1s intervals even with all cpus busy. As
> for whether that would be sufficient, that will have to be answered by
> those who reported the bug. I suppose to test whether things besides
> idling do cause this problem, one would boot with a restricted
> prof_cpu_mask, load all cpus on the machine, set the prof_cpu_mask to
> unrestricted, and see if it livelocks before the load terminates.
It's probably worth measuring. The real bottleneck happens when all
cpus try to get an exclusive lock on the same cacheline at the *same*
time. 1 second is a pretty long time; if there's no contention for the
cacheline, things are normally ok.
This is basically the same issue we had with RCU: all timers fired
at the same wall clock time, and all of them tried to change bits in the
same cacheline at the same time, which is a workload that collapses a
512-way machine ;). The profile timer is no different.
Simply removing the idle time accounting would fix it. This cripples
functionality a little bit, but it'll be a very good way to
test if my theory is correct, or if you truly need some per-cpu logic in
the profiler.
You could also fake it, have a per-cpu counter only for the current->pid
case, and then once somebody reads /proc/profile, you flush the total
per-cpu count to the counter in the buffer that corresponds to the EIP
of the idle func.
Before deciding I'd suggest having a look to see how the below patch
compares to your approach in performance terms.
--- sles/arch/ia64/kernel/time.c.~1~ 2004-08-25 02:47:33.000000000 +0200
+++ sles/arch/ia64/kernel/time.c 2004-09-14 18:01:39.792182008 +0200
@@ -206,6 +206,9 @@ ia64_do_profile (struct pt_regs * regs)
if (!prof_buffer)
return;
+ if (!current->pid)
+ return;
+
ip = instruction_pointer(regs);
/* Conserve space in histogram by encoding slot bits in address
* bits 2 and 3 rather than bits 0 and 1.
On Tue, 14 Sep 2004 09:41:57 -0700, William Lee Irwin III wrote:
> Please check to see that the above message arrived.
It's in the archive. Sorry for the noise.
Roger
On Tue, Sep 14, 2004 at 06:31:43PM +0200, Andrea Arcangeli wrote:
> per-cpu certainly sounds simple enough conceptually, so if you can
> notice any slowdown even with idle loop ruled out, per-cpu is sure
> better.
> This bouncing is likely to hurt smaller SMP too (but once the cpu is
> idle it's normally not too bad, since it only hurts reschedule
> latency: we remain stuck in the timer irq for a bit longer than we
> should), but duplicating the ram of the array there doesn't look as nice
> as it would be on the altix; not all SMP have tons of ram. So an
> intermediate solution for this problem still sounds worthwhile for the
> normal smp case.
Could you clarify whether you deem the per-cpu hashtable-based
amortization acceptable or whether this refers to per-cpu profile
buffers? I devised the hashtables to address space footprint concerns,
so I'm in a pickle if both have pending objections.
Thanks.
-- wli
On Tuesday, September 14, 2004 9:05 am, Andrea Arcangeli wrote:
>> Before deciding I'd suggest having a look to see how the below patch
>> compares to your approach in performance terms.
On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
> It looks like the 512p we have here is pretty heavily reserved this week, so
> I'm not sure if I'll be able to test this (someone else might, John?). I
> think the balance we're looking for is between simplicity and
> non-brokenness. Builtin profiling is *supposed* to be simple and dumb,
> and were it not for the readprofile times, I'd say per-cpu would be
> the way to go just because it retains the simplicity of the current
> approach while allowing it to work on large machines (as well as
> limiting the performance impact of builtin profiling in general).
> wli's approach seems like a reasonable tradeoff though, assuming what
> you suggest doesn't work.
Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
hotplug-ize this?
-- wli
Atop the prior per-cpu hashtable patch. It turns out that ia64 limits
the size of per-cpu areas to the area covered by a single TLB entry,
and worse yet, as short format VHPT is being used, this TLB entry is
limited to the PAGE_SIZE of the region used for kernel data.
In order to address this, the following patch dynamically allocates
the per-cpu hashtables at boot-time. It probably needs adjustments
for cpu hotplug.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 01:27:49.675716672 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 10:20:43.589942872 -0700
@@ -34,7 +34,7 @@
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
#ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct profile_hit [2][NR_PROFILE_HIT], cpu_profile_hits);
+static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
static DEFINE_PER_CPU(int, cpu_profile_flip);
#endif /* CONFIG_SMP */
@@ -273,6 +273,10 @@
secondary = ~(pc << 1) & (NR_PROFILE_HIT - 1);
cpu = get_cpu();
hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ if (!hits) {
+ put_cpu();
+ return;
+ }
local_irq_save(flags);
do {
if (hits[i].pc == pc) {
@@ -423,17 +427,58 @@
.write = write_profile,
};
+#ifdef CONFIG_SMP
+static void __init profile_nop(void *unused)
+{
+}
+#endif
+
static int __init create_proc_profile(void)
{
struct proc_dir_entry *entry;
+ int cpu;
+ (void)cpu;
if (!prof_on)
return 0;
+#ifdef CONFIG_SMP
+ for_each_online_cpu(cpu) {
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (struct profile_hit *)get_zeroed_page(GFP_KERNEL);
+ if (!per_cpu(cpu_profile_hits, cpu)[0])
+ goto out_cleanup;
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (struct profile_hit *)get_zeroed_page(GFP_KERNEL);
+ if (per_cpu(cpu_profile_hits, cpu)[1])
+ continue;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[0]);
+ goto out_cleanup;
+ }
+#endif /* CONFIG_SMP */
if (!(entry = create_proc_entry("profile", S_IWUSR | S_IRUGO, NULL)))
return 0;
entry->proc_fops = &proc_profile_operations;
entry->size = (1+prof_len) * sizeof(atomic_t);
return 0;
+#ifdef CONFIG_SMP
+out_cleanup:
+ prof_on = 0;
+ mb();
+ on_each_cpu(profile_nop, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ unsigned long kvaddr
+ = (unsigned long)per_cpu(cpu_profile_hits, cpu)[0];
+
+ if (!kvaddr)
+ break;
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ free_page(kvaddr);
+ kvaddr = (unsigned long)per_cpu(cpu_profile_hits, cpu)[1];
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ free_page(kvaddr);
+ }
+ return -1;
+#endif /* CONFIG_SMP */
}
module_init(create_proc_profile);
#endif /* CONFIG_PROC_FS */
On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
>> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
>> hotplug-ize this?
On Tue, Sep 14, 2004 at 01:02:20PM -0700, William Lee Irwin III wrote:
> Okay, here's an attempt to hotplug-ize it. I have no clue whether this
> actually works, compiles, or follows whatever rules there are about
> dynamically allocated data referenced by per_cpu areas.
Take 2: actually register the notifier I wrote.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 10:20:43.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 12:56:33.871160032 -0700
@@ -20,6 +20,7 @@
#include <linux/notifier.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/cpu.h>
#include <linux/profile.h>
#include <asm/sections.h>
#include <asm/semaphore.h>
@@ -297,6 +298,44 @@
local_irq_restore(flags);
put_cpu();
}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int __devinit profile_cpu_callback(struct notifier_block *info,
+ unsigned long action, void *__cpu)
+{
+ int cpu = (unsigned long)__cpu;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ per_cpu(cpu_profile_flip, cpu) = 0;
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ return NOTIFY_BAD;
+ if (!per_cpu(cpu_profile_hits, cpu)[0])
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (per_cpu(cpu_profile_hits, cpu)[0])
+ break;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ return NOTIFY_BAD;
+ break;
+ case CPU_ONLINE:
+ cpu_set(cpu, prof_cpu_mask);
+ break;
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ cpu_clear(cpu, prof_cpu_mask);
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[0]);
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ break;
+ }
+ return NOTIFY_OK;
+}
+#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
@@ -459,6 +498,7 @@
return 0;
entry->proc_fops = &proc_profile_operations;
entry->size = (1+prof_len) * sizeof(atomic_t);
+ hotcpu_notifier(profile_cpu_callback, 0);
return 0;
#ifdef CONFIG_SMP
out_cleanup:
On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
>> It looks like the 512p we have here is pretty heavily reserved this
>> week, so I'm not sure if I'll be able to test this (someone else
>> might, John?). I think the balance we're looking for is between
>> simplicity and non-brokenness. Builtin profiling is *supposed* to be
>> simple and dumb, and were it not for the readprofile times, I'd say
>> per-cpu would be the way to go just because it retains the simplicity
>> of the current approach while allowing it to work on large machines
>> (as well as limiting the performance impact of builtin profiling in
>> general). wli's approach seems like a reasonable tradeoff though,
>> assuming what you suggest doesn't work.
On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
> hotplug-ize this?
Okay, here's an attempt to hotplug-ize it. I have no clue whether this
actually works, compiles, or follows whatever rules there are about
dynamically allocated data referenced by per_cpu areas.
-- wli
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 10:20:43.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 12:52:16.064352624 -0700
@@ -20,6 +20,7 @@
#include <linux/notifier.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/cpu.h>
#include <linux/profile.h>
#include <asm/sections.h>
#include <asm/semaphore.h>
@@ -297,6 +298,44 @@
local_irq_restore(flags);
put_cpu();
}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int __devinit profile_cpu_callback(struct notifier_block *info,
+ unsigned long action, void *__cpu)
+{
+ int cpu = (unsigned long)__cpu;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ per_cpu(cpu_profile_flip, cpu) = 0;
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ return NOTIFY_BAD;
+ if (!per_cpu(cpu_profile_hits, cpu)[0])
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (per_cpu(cpu_profile_hits, cpu)[0])
+ break;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ return NOTIFY_BAD;
+ break;
+ case CPU_ONLINE:
+ cpu_set(cpu, prof_cpu_mask);
+ break;
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ cpu_clear(cpu, prof_cpu_mask);
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[0]);
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ break;
+ }
+ return NOTIFY_OK;
+}
+#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
On Tuesday, September 14, 2004 9:05 am, Andrea Arcangeli wrote:
>>> Before deciding I'd suggest having a look to see how the below patch
>>> compares to your approach in performance terms.
On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
>> It looks like the 512p we have here is pretty heavily reserved this
>> week, so I'm not sure if I'll be able to test this (someone else
>> might, John?). I think the balance we're looking for is between
>> simplicity and non-brokenness. Builtin profiling is *supposed* to be
>> simple and dumb, and were it not for the readprofile times, I'd say
>> per-cpu would be the way to go just because it retains the
>> simplicity of the current approach while allowing it to work on
>> large machines (as well as limiting the performance impact of
>> builtin profiling in general). wli's approach seems like a
>> reasonable tradeoff though, assuming what you suggest doesn't work.
On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
> hotplug-ize this?
Successfully tested on x86-64.
-- wli
On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
>>> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
>>> hotplug-ize this?
On Tue, Sep 14, 2004 at 01:02:20PM -0700, William Lee Irwin III wrote:
>> Okay, here's an attempt to hotplug-ize it. I have no clue whether this
>> actually works, compiles, or follows whatever rules there are about
>> dynamically allocated data referenced by per_cpu areas.
On Tue, Sep 14, 2004 at 01:04:53PM -0700, William Lee Irwin III wrote:
> Take 2: actually register the notifier I wrote.
As pointed out by John Hawkes, I forgot to flush the pending hits at
the time of profile buffer reset. The following patch, atop the cpu
hotplug notifier bits, does so.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 12:56:33.871160032 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 13:43:55.826117208 -0700
@@ -37,6 +37,7 @@
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
static DEFINE_PER_CPU(int, cpu_profile_flip);
+static DECLARE_MUTEX(profile_flip_mutex);
#endif /* CONFIG_SMP */
static int __init profile_setup(char * str)
@@ -242,7 +243,6 @@
static void profile_flip_buffers(void)
{
- static DECLARE_MUTEX(profile_flip_mutex);
int i, j, cpu;
down(&profile_flip_mutex);
@@ -261,6 +261,22 @@
up(&profile_flip_mutex);
}
+static void profile_discard_flip_buffers(void)
+{
+ static DECLARE_MUTEX(profile_flip_mutex);
+ int i, cpu;
+
+ down(&profile_flip_mutex);
+ i = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
void profile_hit(int type, void *__pc)
{
unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
@@ -338,6 +354,7 @@
#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
+#define profile_discard_flip_buffers() do { } while (0)
void profile_hit(int type, void *__pc)
{
@@ -456,7 +473,7 @@
return -EINVAL;
}
#endif
-
+ profile_discard_flip_buffers();
memset(prof_buffer, 0, prof_len * sizeof(atomic_t));
return count;
}
On Tue, Sep 14, 2004 at 01:04:53PM -0700, William Lee Irwin III wrote:
>> Take 2: actually register the notifier I wrote.
On Tue, Sep 14, 2004 at 02:04:22PM -0700, William Lee Irwin III wrote:
> As pointed out by John Hawkes, I forgot to flush the pending hits at
> the time of profile buffer reset. The following patch, atop the cpu
> hotplug notifier bits, does so.
Repost with corrected patch.
As pointed out by John Hawkes, I forgot to flush the pending hits at
the time of profile buffer reset. The following patch, atop the cpu
hotplug notifier bits, does so.
Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 13:46:05.151456768 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 14:03:01.854894352 -0700
@@ -37,6 +37,7 @@
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
static DEFINE_PER_CPU(int, cpu_profile_flip);
+static DECLARE_MUTEX(profile_flip_mutex);
#endif /* CONFIG_SMP */
static int __init profile_setup(char * str)
@@ -242,7 +243,6 @@
static void profile_flip_buffers(void)
{
- static DECLARE_MUTEX(profile_flip_mutex);
int i, j, cpu;
down(&profile_flip_mutex);
@@ -261,6 +261,21 @@
up(&profile_flip_mutex);
}
+static void profile_discard_flip_buffers(void)
+{
+ int i, cpu;
+
+ down(&profile_flip_mutex);
+ i = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
void profile_hit(int type, void *__pc)
{
unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
@@ -338,6 +353,7 @@
#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
+#define profile_discard_flip_buffers() do { } while (0)
void profile_hit(int type, void *__pc)
{
@@ -456,7 +472,7 @@
return -EINVAL;
}
#endif
-
+ profile_discard_flip_buffers();
memset(prof_buffer, 0, prof_len * sizeof(atomic_t));
return count;
}
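For readers following the reset path, here is a minimal userspace sketch of the idea behind profile_discard_flip_buffers(): each CPU fills one of two buffers, a flip switches every CPU to its other buffer, and a reset has to throw away whatever was pending in the old buffer before the global counts are cleared. This is a single-threaded model and not the kernel code. The *_sketch names, the array sizes, flip_all() and reset_profile() are assumptions made for illustration, the mutex and the cross-CPU call are left out, and a full buffer simply counts straight into the global table.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count */
#define NR_HITS_SKETCH 8   /* hypothetical slots per per-CPU buffer */
#define NR_PC_SKETCH   64  /* hypothetical size of the global table */

struct hit_sketch {
	unsigned long pc;
	unsigned long hits;
};

/* Two buffers per CPU; cpu_flip selects the one currently being filled. */
static struct hit_sketch cpu_hits[NR_CPUS_SKETCH][2][NR_HITS_SKETCH];
static int cpu_flip[NR_CPUS_SKETCH];
static unsigned long global_hits[NR_PC_SKETCH]; /* stands in for prof_buffer */

/* Record one hit in the buffer this CPU is currently filling. */
static void record_hit(int cpu, unsigned long pc)
{
	struct hit_sketch *buf = cpu_hits[cpu][cpu_flip[cpu]];
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH && pc < NR_PC_SKETCH);
	for (i = 0; i < NR_HITS_SKETCH; i++) {
		if (buf[i].hits == 0 || buf[i].pc == pc) {
			buf[i].pc = pc;
			buf[i].hits++;
			return;
		}
	}
	global_hits[pc]++; /* buffer full: the model counts globally instead */
}

/* Switch every CPU to its other buffer; return the index that was active. */
static int flip_all(void)
{
	int cpu, old = cpu_flip[0];

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		cpu_flip[cpu] = !cpu_flip[cpu];
	return old;
}

/* Flip, then fold the now-idle buffers into the global table. */
static void flip_buffers(void)
{
	int cpu, j, old = flip_all();

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct hit_sketch *buf = cpu_hits[cpu][old];

		for (j = 0; j < NR_HITS_SKETCH; j++)
			global_hits[buf[j].pc] += buf[j].hits;
		memset(buf, 0, sizeof(cpu_hits[cpu][old]));
	}
}

/* Flip, then zero the now-idle buffers without folding them in. */
static void discard_flip_buffers(void)
{
	int cpu, old = flip_all();

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		memset(cpu_hits[cpu][old], 0, sizeof(cpu_hits[cpu][old]));
}

/* Model of a profile reset: drop pending hits, then clear the totals. */
static void reset_profile(void)
{
	discard_flip_buffers();
	memset(global_hits, 0, sizeof(global_hits));
}
```

Without the discard step in reset_profile(), hits recorded just before the reset would still sit in a per-CPU buffer and would be folded back into the freshly cleared totals on the next flip, which is the problem the patch addresses.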
On Tue, Sep 14, 2004 at 12:55:27PM +0200, Roger Luethi wrote:
>>> An architecture with sizeof(long) > 32? -- Most impressive.
On Tue, 14 Sep 2004 08:41:44 -0700, William Lee Irwin III wrote:
>> Did the correction not arrive?
On Tue, Sep 14, 2004 at 05:47:50PM +0200, Roger Luethi wrote:
> Must have missed it.
Date: Mon, 13 Sep 2004 19:38:30 -0700
From: William Lee Irwin III <[email protected]>
To: Andrew Morton <[email protected]>
Cc: [email protected], Albert Cahalan <[email protected]>
Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines
Message-ID: <[email protected]>
Please check to see that the above message arrived.
Thanks.
-- wli
On Monday, September 13, 2004 4:55 pm, David S. Miller wrote:
> diff -Nru a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> --- a/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
> +++ b/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
> @@ -148,8 +148,10 @@
> spin_lock(&dev->queue_lock);
> return -1;
> }
> - if (ret == NETDEV_TX_LOCKED && nolock)
> + if (ret == NETDEV_TX_LOCKED && nolock) {
> + spin_lock(&dev->queue_lock);
> goto collision;
> + }
> }
>
> /* NETDEV_TX_BUSY - we need to requeue */
Ok, is *this* the sort of thing you'd expect this patch to fix? I've seen it
on a couple of different machines now (one 32p and one 8p), but I haven't
seen it since applying the above to the BK tree as of this morning. Either
way, I'll keep pounding on different machines using the BK tree + your patch
to see what problems I run into.
Thanks,
Jesse
bad: scheduling while atomic!
Call Trace:
[<a000000100017320>] show_stack+0x80/0xa0
sp=e00002bc38dffbd0 bsp=e00002bc38df9250
[<a000000100017370>] dump_stack+0x30/0x60
sp=e00002bc38dffda0 bsp=e00002bc38df9238
[<a0000001006a7500>] schedule+0x1160/0x1520
sp=e00002bc38dffda0 bsp=e00002bc38df9128
[<a0000001006a8430>] schedule_timeout+0xf0/0x200
sp=e00002bc38dffdc0 bsp=e00002bc38df90f0
[<a000000100192e40>] sys_poll+0x520/0x7c0
sp=e00002bc38dffe00 bsp=e00002bc38df9018
[<a00000010000f4c0>] ia64_ret_from_syscall+0x0/0x20
sp=e00002bc38dffe30 bsp=e00002bc38df8fd8
Warning: kfree_skb on hard IRQ a0000001005dcba0
bad: scheduling while atomic!
Call Trace:
[<a000000100017320>] show_stack+0x80/0xa0
sp=e00002bc38dffc40 bsp=e00002bc38df9100
[<a000000100017370>] dump_stack+0x30/0x60
sp=e00002bc38dffe10 bsp=e00002bc38df90e8
[<a0000001006a7500>] schedule+0x1160/0x1520
sp=e00002bc38dffe10 bsp=e00002bc38df8fd8
[<a00000010000fa20>] skip_rbs_switch+0x90/0xf0
sp=e00002bc38dffe30 bsp=e00002bc38df8fd8
Unable to handle kernel paging request at virtual address 20000000001bcab0
ls[11638]: Oops 4294967296 [1]
Modules linked in:
Pid: 11638, CPU 2, comm: ls
psr : 00001013081a6018 ifs : 8000000000000003 ip : [<20000000001bcab0>]
Not tainted
ip is at 0x20000000001bcab0
unat: 0000000000000000 pfs : c000000000000207 rsc : 000000000000000f
rnat: 0000000000000000 bsps: 60000fff7fffc3f0 pr : 0000000005a6a9e9
ldrs: 0000000000880000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : 20000000001b9390 b6 : 20000000001b9360 b7 : 2000000000173d30
f6 : 1003ecccccccccccccccd f7 : 1003e0000000000000007
f8 : 1000c9404000000000000 f9 : 0ffff8000000000000000
f10 : 1003e0000000000002501 f11 : 1000c9403fffff6bfc000
r1 : 200000000029c200 r2 : 2000000000304e58 r3 : 60000fffffffafb0
r8 : 0000000000000009 r9 : 00000000fbad8001 r10 : 0000000000000c00
r11 : 20000000002ffa98 r12 : 60000fffffffafb0 r13 : 2000000000081de0
r14 : 20000000000a9238 r15 : 20000000000a9240 r16 : 000000000011d360
r17 : 20000000001b9360 r18 : 200000000009c000 r19 : 200000000009c228
r20 : 0000000200000000 r21 : 0000000100000000 r22 : 0000000000000000
r23 : 200000000003e16c r24 : 4000000000001bd0 r25 : 200000000003e0d0
r26 : 6000000000001da8 r27 : 200000000029c200 r28 : 20000000003008d0
r29 : 20000000003008c8 r30 : 20000000000804c8 r31 : 000000000000142b
r32 : 0000000000000000 r33 : 60000fffffffb42c r34 : 400000000000ba00
Kernel panic - not syncing: Aiee, killing interrupt handler!
Rebooting in 5 seconds..
On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
> the readprofile times, I'd say per-cpu would be the way to go just because it
> retains the simplicity of the current approach while allowing it to work on
> large machines (as well as limiting the performance impact of builtin
> profiling in general). wli's approach seems like a reasonable tradeoff
> though, assuming what you suggest doesn't work.
per-cpu certainly sounds simple enough conceptually, so if you can
notice any slowdown even with the idle loop ruled out, per-cpu is surely
better.
This bouncing is likely to hurt smaller SMP machines too (though once the
cpu is idle it's normally not too bad, since it only hurts reschedule
latency: we remain stuck in the timer irq for a bit longer than we
should). But duplicating the ram of the array there doesn't look as nice
as it would on the altix, since not all SMP machines have tons of ram.
So an intermediate solution for this problem still sounds worthwhile for
the normal smp case.
On Mon, Sep 13, 2004 at 07:53:04PM -0700, William Lee Irwin III wrote:
>> Not all binfmts page align ->end_code and ->start_code, so the task_mmu
>> statistics calculations need to perform this allocation themselves.
On Mon, Sep 13, 2004 at 07:54:58PM -0700, William Lee Irwin III wrote:
> s/allocation/alignment/
Andi Kleen requested that the number of pagetable pages in use by a
process be reported in /proc/$PID/status; this patch implements that.
Atop the text reporting fix. Compile-tested on x86-64.
Index: mm5-2.6.9-rc1/arch/i386/mm/hugetlbpage.c
===================================================================
--- mm5-2.6.9-rc1.orig/arch/i386/mm/hugetlbpage.c 2004-08-13 22:37:42.000000000 -0700
+++ mm5-2.6.9-rc1/arch/i386/mm/hugetlbpage.c 2004-09-15 03:31:26.914794288 -0700
@@ -247,6 +247,7 @@
page = pmd_page(*pmd);
pmd_clear(pmd);
+ mm->nr_ptes--;
dec_page_state(nr_page_table_pages);
page_cache_release(page);
}
Index: mm5-2.6.9-rc1/arch/ppc64/mm/hugetlbpage.c
===================================================================
--- mm5-2.6.9-rc1.orig/arch/ppc64/mm/hugetlbpage.c 2004-09-13 16:27:32.000000000 -0700
+++ mm5-2.6.9-rc1/arch/ppc64/mm/hugetlbpage.c 2004-09-15 03:32:25.375906848 -0700
@@ -213,6 +213,7 @@
}
page = pmd_page(*pmd);
pmd_clear(pmd);
+ mm->nr_ptes--;
dec_page_state(nr_page_table_pages);
pte_free_tlb(tlb, page);
}
Index: mm5-2.6.9-rc1/fs/proc/task_mmu.c
===================================================================
--- mm5-2.6.9-rc1.orig/fs/proc/task_mmu.c 2004-09-13 19:43:19.000000000 -0700
+++ mm5-2.6.9-rc1/fs/proc/task_mmu.c 2004-09-15 03:42:42.746052320 -0700
@@ -18,12 +18,14 @@
"VmData:\t%8lu kB\n"
"VmStk:\t%8lu kB\n"
"VmExe:\t%8lu kB\n"
- "VmLib:\t%8lu kB\n",
+ "VmLib:\t%8lu kB\n"
+ "VmPTE:\t%8lu kB\n",
(mm->total_vm - mm->reserved_vm) << (PAGE_SHIFT-10),
mm->locked_vm << (PAGE_SHIFT-10),
mm->rss << (PAGE_SHIFT-10),
data << (PAGE_SHIFT-10),
- mm->stack_vm << (PAGE_SHIFT-10), text, lib);
+ mm->stack_vm << (PAGE_SHIFT-10), text, lib,
+ (PTRS_PER_PTE*sizeof(pte_t)*mm->nr_ptes) >> 10);
return buffer;
}
Index: mm5-2.6.9-rc1/include/linux/sched.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/sched.h 2004-09-14 14:44:05.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/sched.h 2004-09-15 03:22:38.650102728 -0700
@@ -227,7 +227,7 @@
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
unsigned long rss, total_vm, locked_vm, shared_vm;
- unsigned long exec_vm, stack_vm, reserved_vm, def_flags;
+ unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes;
unsigned long saved_auxv[42]; /* for /proc/PID/auxv */
Index: mm5-2.6.9-rc1/kernel/fork.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/fork.c 2004-09-14 14:45:49.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/fork.c 2004-09-15 03:23:33.238803984 -0700
@@ -308,6 +308,7 @@
atomic_set(&mm->mm_count, 1);
init_rwsem(&mm->mmap_sem);
mm->core_waiters = 0;
+ mm->nr_ptes = 0;
mm->page_table_lock = SPIN_LOCK_UNLOCKED;
mm->ioctx_list_lock = RW_LOCK_UNLOCKED;
mm->ioctx_list = NULL;
Index: mm5-2.6.9-rc1/mm/memory.c
===================================================================
--- mm5-2.6.9-rc1.orig/mm/memory.c 2004-09-13 16:27:46.000000000 -0700
+++ mm5-2.6.9-rc1/mm/memory.c 2004-09-15 03:30:32.241105952 -0700
@@ -114,6 +114,7 @@
page = pmd_page(*dir);
pmd_clear(dir);
dec_page_state(nr_page_table_pages);
+ tlb->mm->nr_ptes--;
pte_free_tlb(tlb, page);
}
@@ -163,7 +164,6 @@
spin_lock(&mm->page_table_lock);
if (!new)
return NULL;
-
/*
* Because we dropped the lock, we should re-check the
* entry, as somebody else could have populated it..
@@ -172,6 +172,7 @@
pte_free(new);
goto out;
}
+ mm->nr_ptes++;
inc_page_state(nr_page_table_pages);
pmd_populate(mm, pmd, new);
}
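The VmPTE line converts a count of pagetable pages into kilobytes. As a worked sketch of that arithmetic, assuming nothing beyond the expression in the task_mmu.c hunk: the function name vm_pte_kb and its parameters are hypothetical, and in the patch the inputs are PTRS_PER_PTE, sizeof(pte_t) and mm->nr_ptes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Kilobytes of memory held in pagetable pages.
 * ptrs_per_pte: entries in one pagetable page (PTRS_PER_PTE in the patch)
 * pte_size:     bytes per entry (sizeof(pte_t) in the patch)
 * nr_ptes:      pagetable pages currently in use (mm->nr_ptes in the patch)
 */
static unsigned long vm_pte_kb(unsigned long ptrs_per_pte, size_t pte_size,
			       unsigned long nr_ptes)
{
	assert(ptrs_per_pte > 0 && pte_size > 0);
	/* bytes per pagetable page, times pages, shifted down to kB */
	return (ptrs_per_pte * pte_size * nr_ptes) >> 10;
}
```

With 512 entries of 8 bytes each, as on x86-64, every pagetable page contributes 4 kB, so vm_pte_kb(512, 8, 3) reports 12 kB. With 1024 entries of 4 bytes, as on non-PAE i386, the per-page figure is also 4 kB.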
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> +cfq-iosched-v2.patch
> Major revamp of the CFQ IO scheduler
While editing some files while booted into 2.6.9-rc1-mm5:
# ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cfq_iosched:1359
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: st sr_mod floppy usbserial parport_pc lp parport snd_seq_oss snd_seq_device snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_ioctl32 thermal processor fan button battery snd_intel8x0 snd_ac97_codec snd_pcm snd_timer ipv6 ac snd soundcore snd_page_alloc af_packet joydev usbhid ehci_hcd e1000 uhci_hcd usbcore hw_random evdev dm_mod ext3 jbd aic79xx ata_piix libata sd_mod scsi_mod
Pid: 9615, comm: cc1 Not tainted 2.6.9-rc1-mm5
RIP: 0010:[<ffffffff80290ab6>] <ffffffff80290ab6>{cfq_put_request+166}
RSP: 0000:ffffffff804c8638 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 000001017e2c3b80 RCX: 00000000000049f2
RDX: 0000000000000001 RSI: 000001017e75cd10 RDI: 000001000b5d57c0
RBP: 000001017e75cd10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 000001016d1b3db0
R13: 000001017d142c08 R14: 000001017fff1400 R15: 0000000000000001
FS: 0000002a9588d6e0(0000) GS:ffffffff8055c880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000548000 CR3: 0000000000101000 CR4: 00000000000006e0
Process cc1 (pid: 9615, threadinfo 000001011f720000, task 000001012d4897e0)
Stack: 0000000000001000 000001016d1b3db0 000001017e2c3b80 0000000000000001
0000000000000001 000001017e2c3b80 0000000000000200 ffffffff8028527f
0000010163320300 ffffffff80287bfb
Call Trace:<IRQ> <ffffffff8028527f>{elv_put_request+15} <ffffffff80287bfb>{__blk_put_request+139}
<ffffffff80287d33>{end_that_request_last+243} <ffffffffa0006178>{:scsi_mod:scsi_end_request+200}
<ffffffffa00063f0>{:scsi_mod:scsi_io_completion+576}
<ffffffffa0000506>{:scsi_mod:scsi_finish_command+214}
<ffffffffa0000e4a>{:scsi_mod:scsi_softirq+234} <ffffffff8013df61>{__do_softirq+113}
<ffffffff8013e015>{do_softirq+53} <ffffffff80113f1f>{do_IRQ+335}
<ffffffff80110c97>{ret_from_intr+0} <EOI>
Code: 0f 0b 26 9b 38 80 ff ff ff ff 4f 05 ff c8 41 89 44 95 58 0f
RIP <ffffffff80290ab6>{cfq_put_request+166} RSP <ffffffff804c8638>
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
-- wli
On Wed, Sep 15 2004, William Lee Irwin III wrote:
> On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> > +cfq-iosched-v2.patch
> > Major revamp of the CFQ IO scheduler
>
> While editing some files while booted into 2.6.9-rc1-mm5:
>
> # ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at cfq_iosched:1359
Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
etc)?
--
Jens Axboe
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
>>> +cfq-iosched-v2.patch
>>> Major revamp of the CFQ IO scheduler
On Wed, Sep 15 2004, William Lee Irwin III wrote:
>> While editing some files while booted into 2.6.9-rc1-mm5:
>> # ----------- [cut here ] --------- [please bite here ] ---------
>> Kernel BUG at cfq_iosched:1359
On Wed, Sep 15, 2004 at 01:38:34PM +0200, Jens Axboe wrote:
> Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
> etc)?
2 Maxtor Atlas10K 10Krpm U320 disks attached to some aic7902's. No
binary or 3rd-party modules anywhere near the box' fs or even the
network the thing is on. lspci output follows.
-- wli
0000:00:00.0 Host bridge: Intel Corp. Workstation Memory Controller Hub (rev 08)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Capabilities: [40] #09 [a105]
0000:00:00.1 Class ff00: Intel Corp. Memory Controller Hub Error Reporting Register (rev 08)
Subsystem: Intel Corp. Memory Controller Hub Error Reporting Register
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
0000:00:03.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A1 (rev 08) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=00, secondary=02, subordinate=04, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa400000-fa8fffff
Prefetchable memory behind bridge: 00000000bfe00000-00000000bfe00000
Expansion ROM at 0000d000 [disabled] [size=4K]
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0141]
0000:00:04.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B0 (rev 08) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fa900000-feafffff
Prefetchable memory behind bridge: 00000000bff00000-00000000dfe00000
BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0141]
0000:00:08.0 System peripheral: Intel Corp. Memory Controller Hub Extended Configuration Registers (rev 08)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
0000:00:1d.0 USB Controller: Intel Corp. 82801EB USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 201
Region 4: I/O ports at e080 [size=32]
0000:00:1d.1 USB Controller: Intel Corp. 82801EB USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin B routed to IRQ 209
Region 4: I/O ports at e400 [size=32]
0000:00:1d.2 USB Controller: Intel Corp. 82801EB USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin C routed to IRQ 169
Region 4: I/O ports at e480 [size=32]
0000:00:1d.7 USB Controller: Intel Corp. 82801EB USB2 (rev 02) (prog-if 20 [EHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin D routed to IRQ 193
Region 0: Memory at febff400 (32-bit, non-prefetchable)
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] #0a [20a0]
0000:00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB PCI Bridge (rev c2) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: fa300000-fa3fffff
Prefetchable memory behind bridge: fff00000-000fffff
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB LPC Interface Controller (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
0000:00:1f.2 IDE interface: Intel Corp. 82801EB Ultra ATA Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: Intel Corp. 82801EB Ultra ATA Storage Controller
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 169
Region 0: I/O ports at <unassigned>
Region 1: I/O ports at <unassigned>
Region 2: I/O ports at <unassigned>
Region 3: I/O ports at <unassigned>
Region 4: I/O ports at fc00 [size=16]
0000:00:1f.3 SMBus: Intel Corp. 82801EB SMBus Controller (rev 02)
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 5
Region 4: I/O ports at e800 [size=32]
0000:00:1f.5 Multimedia audio controller: Intel Corp. 82801EB AC'97 Audio Controller (rev 02)
Subsystem: Intel Corp.: Unknown device e801
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin B routed to IRQ 217
Region 0: I/O ports at ec00
Region 1: I/O ports at e880 [size=64]
Region 2: Memory at febffc00 (32-bit, non-prefetchable) [size=512]
Region 3: Memory at febff800 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
0000:01:02.0 Ethernet controller: Intel Corp. 82541GI Gigabit Ethernet Controller
Subsystem: Intel Corp.: Unknown device 3408
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), cache line size 10
Interrupt: pin A routed to IRQ 217
Region 0: Memory at fa3e0000 (32-bit, non-prefetchable) [size=180000000]
Region 1: Memory at fa3c0000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at cc80 [size=64]
Expansion ROM at 00020000 [disabled]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
0000:02:00.0 PCI bridge: Intel Corp.: Unknown device 0320 (rev 08) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=02, secondary=04, subordinate=04, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa400000-fa6fffff
Prefetchable memory behind bridge: 00000000bfe00000-00000000bfe00000
Expansion ROM at 0000d000 [disabled] [size=4K]
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d8]
0000:02:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A (rev 08) (prog-if 20 [IO(X)-APIC])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at fa8fe000 (32-bit, non-prefetchable)
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
0000:02:00.2 PCI bridge: Intel Corp.: Unknown device 0321 (rev 08) (prog-if 00 [Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=02, secondary=03, subordinate=03, sec-latency=64
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000fff00000-0000000000000000
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d8]
0000:02:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller B (rev 08) (prog-if 20 [IO(X)-APIC])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at fa8ff000 (32-bit, non-prefetchable)
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
0000:04:03.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
Subsystem: Adaptec: Unknown device ffff
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (10000ns min, 6250ns max), cache line size 10
Interrupt: pin A routed to IRQ 177
Region 0: I/O ports at d400 [size=180000000]
Region 1: Memory at fa6fc000 (64-bit, non-prefetchable) [disabled] [size=8K]
Region 3: I/O ports at d000 [size=256]
Expansion ROM at ffffffff3ff00000 [disabled]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [94]
0000:04:03.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
Subsystem: Adaptec: Unknown device ffff
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (10000ns min, 6250ns max), cache line size 10
Interrupt: pin B routed to IRQ 185
Region 0: I/O ports at dc00 [size=180000000]
Region 1: Memory at fa6fe000 (64-bit, non-prefetchable) [disabled] [size=8K]
Region 3: I/O ports at d800 [size=256]
Expansion ROM at ffffffff3ff00000 [disabled]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [94]
0000:05:00.0 VGA compatible controller: nVidia Corporation: Unknown device 00fd (rev a2) (prog-if 00 [VGA])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Interrupt: pin A routed to IRQ 11
Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=feae0000]
Region 1: Memory at c0000000 (32-bit, prefetchable) [size=256M]
Region 2: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Expansion ROM at 00020000 [disabled]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [78] #10 [0011]
On Wed, Sep 15 2004, William Lee Irwin III wrote:
> On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> >>> +cfq-iosched-v2.patch
> >>> Major revamp of the CFQ IO scheduler
>
> On Wed, Sep 15 2004, William Lee Irwin III wrote:
> >> While editing some files while booted into 2.6.9-rc1-mm5:
> >> # ----------- [cut here ] --------- [please bite here ] ---------
> >> Kernel BUG at cfq_iosched:1359
>
> On Wed, Sep 15, 2004 at 01:38:34PM +0200, Jens Axboe wrote:
> > Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
> > etc)?
>
> 2 Maxtor Atlas10K 10Krpm U320 disks attached to some aic7902's. No
> binary or 3rd-party modules anywhere near the box' fs or even the
> network the thing is on. lspci output follows.
Hmm, I can only see this happening if rq->flags has its direction bit
changed between the allocation time and the time of freeing. I'll look
over scsi and see if I can find any traces of that, don't see any
immediately.
--
Jens Axboe
On Wed, Sep 15 2004, Jens Axboe wrote:
> On Wed, Sep 15 2004, William Lee Irwin III wrote:
> > On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> > >>> +cfq-iosched-v2.patch
> > >>> Major revamp of the CFQ IO scheduler
> >
> > On Wed, Sep 15 2004, William Lee Irwin III wrote:
> > >> While editing some files while booted into 2.6.9-rc1-mm5:
> > >> # ----------- [cut here ] --------- [please bite here ] ---------
> > >> Kernel BUG at cfq_iosched:1359
> >
> > On Wed, Sep 15, 2004 at 01:38:34PM +0200, Jens Axboe wrote:
> > > Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
> > > etc)?
> >
> > 2 Maxtor Atlas10K 10Krpm U320 disks attached to some aic7902's. No
> > binary or 3rd-party modules anywhere near the box' fs or even the
> > network the thing is on. lspci output follows.
>
> Hmm, I can only see this happening if rq->flags has its direction bit
> changed between the allocation time and the time of freeing. I'll look
> over scsi and see if I can find any traces of that, don't see any
> immediately.
Can you try if this works?
--- linux-2.6.9-rc1-mm5/drivers/block/cfq-iosched.c~ 2004-09-15 14:50:14.941876065 +0200
+++ linux-2.6.9-rc1-mm5/drivers/block/cfq-iosched.c 2004-09-15 14:51:09.889996813 +0200
@@ -195,6 +195,7 @@
 	unsigned int in_flight : 1;
 	unsigned int accounted : 1;
 	unsigned int is_sync : 1;
+	unsigned int is_write : 1;
 };
 
 static struct cfq_queue *cfq_find_cfq_hash(struct cfq_data *, unsigned long);
@@ -1353,12 +1354,12 @@
 		if (crq->io_context)
 			put_io_context(crq->io_context->ioc);
 
+		BUG_ON(!cfqq->allocated[crq->is_write]);
+		cfqq->allocated[crq->is_write]--;
+
 		mempool_free(crq, cfqd->crq_pool);
 		rq->elevator_private = NULL;
 
-		BUG_ON(!cfqq->allocated[rw]);
-		cfqq->allocated[rw]--;
-
 		smp_mb();
 		cfq_check_waiters(q, cfqq);
 		cfq_put_queue(cfqq);
@@ -1415,6 +1416,7 @@
 		crq->io_context = cic;
 		crq->service_start = crq->queue_start = 0;
 		crq->in_flight = crq->accounted = crq->is_sync = 0;
+		crq->is_write = rw;
 		rq->elevator_private = crq;
 		cfqq->allocated[rw]++;
 		cfqq->alloc_limit[rw] = 0;
--
Jens Axboe
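The idea behind the patch above, recording the direction once at allocation time and using that recorded value when freeing, can be sketched in user space as follows. This is an illustration under assumptions, not the kernel implementation: fixed_request, fixed_queue, FIXED_FLAG_WRITE, and the helpers are hypothetical names, and only the is_write bitfield mirrors a name from the patch.

```c
/* Hypothetical stand-ins for illustration; only is_write echoes the patch. */
enum { FIXED_READ = 0, FIXED_WRITE = 1 };
#define FIXED_FLAG_WRITE 0x1u	/* assumed position of the direction bit */

struct fixed_request {
	unsigned int flags;
	unsigned int is_write : 1;	/* direction latched at allocation */
};

struct fixed_queue {
	int allocated[2];	/* outstanding requests per direction */
};

static void fixed_alloc(struct fixed_queue *q, struct fixed_request *rq)
{
	/* Read the flags once and remember the answer in the request. */
	rq->is_write = (rq->flags & FIXED_FLAG_WRITE) ? FIXED_WRITE : FIXED_READ;
	q->allocated[rq->is_write]++;
}

/* Returns 0 on success, -1 where a BUG_ON() would fire. */
static int fixed_free(struct fixed_queue *q, const struct fixed_request *rq)
{
	/* Use the latched direction; later changes to flags are ignored. */
	if (!q->allocated[rq->is_write])
		return -1;
	q->allocated[rq->is_write]--;
	return 0;
}
```

With this arrangement, a request allocated as a read and then re-flagged as a write is still accounted against the read counter when it is freed, so the counters return to zero.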
On Wed, Sep 15 2004, Jens Axboe wrote:
>> Hmm, I can only see this happening if rq->flags has its direction bit
>> changed between the allocation time and the time of freeing. I'll look
>> over scsi and see if I can find any traces of that, don't see any
>> immediately.
On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
> Can you try if this works?
Booting it ASAP.
-- wli
On Wed, Sep 15 2004, Jens Axboe wrote:
>>> Hmm, I can only see this happening if rq->flags has its direction bit
>>> changed between the allocation time and the time of freeing. I'll look
>>> over scsi and see if I can find any traces of that, don't see any
>>> immediately.
On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
>> Can you try if this works?
On Wed, Sep 15, 2004 at 05:53:55AM -0700, William Lee Irwin III wrote:
> Booting it ASAP.
It appears to have lasted enough hours to call it an improvement. I'll
leave it running for a while longer just in case.
-- wli
On Wed, Sep 15 2004, Jens Axboe wrote:
>>>> Hmm, I can only see this happening if rq->flags has its direction bit
>>>> changed between the allocation time and the time of freeing. I'll look
>>>> over scsi and see if I can find any traces of that, don't see any
>>>> immediately.
On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
>>> Can you try if this works?
On Wed, Sep 15, 2004 at 05:53:55AM -0700, William Lee Irwin III wrote:
>> Booting it ASAP.
On Wed, Sep 15, 2004 at 05:38:19PM -0700, William Lee Irwin III wrote:
> It appears to have lasted enough hours to call it an improvement. I'll
> leave it running for a while longer just in case.
Okay, it got well over 8 solid hours, so I'm going to move on to booting
something else.
-- wli
On Wed, Sep 15 2004, William Lee Irwin III wrote:
> On Wed, Sep 15 2004, Jens Axboe wrote:
> >>>> Hmm, I can only see this happening if rq->flags has its direction bit
> >>>> changed between the allocation time and the time of freeing. I'll look
> >>>> over scsi and see if I can find any traces of that, don't see any
> >>>> immediately.
>
> On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
> >>> Can you try if this works?
>
> On Wed, Sep 15, 2004 at 05:53:55AM -0700, William Lee Irwin III wrote:
> >> Booting it ASAP.
>
> On Wed, Sep 15, 2004 at 05:38:19PM -0700, William Lee Irwin III wrote:
> > It appears to have lasted enough hours to call it an improvement. I'll
> > leave it running for a while longer just in case.
>
> Okay, it got well over 8 solid hours, so I'm going to move on to booting
> something else.
Thanks for testing. I'm concluding that the patch most likely fixed your
problem.
--
Jens Axboe