2004-09-13 08:53:43

by Andrew Morton

Subject: 2.6.9-rc1-mm5


Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 is currently at

http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/

and will later appear at

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/

Please check kernel.org before using zip.com.au.



- Added the `bk-scsi-target' tree to the -mm lineup. It is managed by James
Bottomley.

- Some enhancements to the ext3 block reservation code here. Please cc
[email protected] on oops reports ;)

- There's a patch here which will cause warnings if a PCI device driver is
removed without having called pci_disable_device(). Please try to cc the
appropriate mailing list or maintainer when reporting any instances.




Changes since 2.6.9-rc1-mm4:


linus.patch
bk-acpi.patch
bk-agpgart.patch
bk-alsa.patch
bk-cpufreq.patch
bk-driver-core.patch
bk-ia64.patch
bk-ieee1394.patch
bk-input.patch
bk-netdev.patch
bk-pci.patch
bk-pnp.patch
bk-power.patch
bk-scsi.patch
bk-scsi-target.patch
bk-usb.patch
bk-watchdog.patch

External trees

-pkt_act-fix.patch
-ksysfs-build-fix.patch
-ppc-build-fix.patch
-ppc64-allow-sd_nodes_per_domain-to-be-overridden.patch
-ppc64-fix-hang-on-oprofile-shutdown.patch
-ppc64-fix-__rw_yield-prototype.patch
-ppc64-be-resilient-against-sysfs-pci-config-accesses.patch
-ppc64-cut-down-paca-footprint.patch
-ppc64-fix-boot-memory-reporting.patch
-ppc64-fix-power5-js20-smp-init.patch
-cleanup-fix-lost-ticks-handling-on-x86-64.patch
-factor-out-common-asm-hardirqh-code.patch
-scsi-qla2xxx-fix-inline-compile-errors.patch
-add-pci_fixup_enable-pass.patch
-cleanup-ptrace-stops-and-remove-notify_parent.patch
-cleanup-ptrace-stops-and-remove-notify_parent-extra.patch
-ptrace-api-preservation.patch
-nix-rusage_group.patch
-i386-syscall-tracing-of-bogus-system-calls.patch
-make-single-step-into-signal-delivery-stop-in-handler.patch
-cdrom-range-fixes.patch
-vsxxxaac-fixups.patch
-disambiguate-espc-clones.patch
-allow-cluster-wide-flock.patch
-allow-cluster-wide-flock-update.patch
-filemap-read-fix.patch
-fix-f_version-optimization-for-get_tgid_list.patch
-kernel-sysfs-events-layer.patch
-centralize-some-nls-helpers.patch
-remove-unused-sysctls-from-kernel-personalityc.patch
-fs-compatc-rwsem-instead-of-bkl-around-ioctl32_hash_table.patch
-small-wait_on_page_writeback_range-optimization.patch
-3w-xxxxc-queue-depth.patch
-md-correct-working_disk-counts-for-raid5-and-raid6.patch
-knfsd-calls-to-break_lease-in-nfsd-should-be-o_nonblocking.patch
-knfsd-return-eacces-instead-of-estale-for-certain-filehandle-lookup-failures.patch
-knfsd-fix-incorrect-indentation-in-fh_verify.patch
-nfsd4-support-acl_support-attribute.patch
-knfsd-trivial-cleanup-of-nfs4statec.patch
-nfsd4-could-leak-a-stateid-in-an-error-path.patch
-nfsd4-postpone-release-of-stateowner-on-close.patch
-nfsd4-store-current-tgid-instead-of-lockowner-hash-in-fl_pid.patch
-knfsd-remove-redundant-initialization-in-nfsd4_lockt.patch
-remove-in-kernel-init_module-cleanup_module-stubs.patch
-remove-ext2_panic.patch
-s390-export-copy_in_user.patch
-s390-minmax-removal-arch-s390-kernel-debugc.patch
-s390-packed-stack-vs-cpu-hotplug.patch
-s390-lcs-multicast-deadlock.patch
-allow-i8042-register-location-override-2.patch
-zlib_inflate-move-zlib_inflatesync-friends.patch
-zlib_inflate-make-zlib_inflate_trees_fixed-generate-the-table.patch
-ppc32-switch-arch-ppc-boot-to-lib-zlib_inflate.patch
-ext3-dreference-of-sb-preceeds-check.patch
-fbdev-speed-up-scrolling-of-tdfxfb.patch
-fbdev-ppc-crash-and-other-fixes-for-rivafb.patch
-fbcon-take-over-console-on-driver-registration.patch
-fbdev-clean-up-framebuffer-initialization.patch
-fbdev-add-module_init-and-fb_get_options-per-driver.patch
-remove-bogus-memset-from-cpqfc-driver.patch
-hpt366-ptr-use-before-null-check.patch
-crypto-teac-xtea_encrypt-should-use-xtea_delta.patch
-aio-dio-oops-fix.patch
-riscom8-build-fix.patch
-use-for_each_cpu-in-oprofile-code.patch
-fix-oprofile-vfree-warning-on-error.patch
-speed-up-oprofile-buffer-drain-code.patch
-speed-up-oprofile-buffer-drain-code-fix.patch
-cdu31a-build-fix.patch
-synclinkc-kernel-janitor-changes.patch
-adfs-add-static.patch
-isofs-add-static.patch
-correct-elf-section-used-for-out-of-line-spinlocks.patch
-tsc-synchronisation-cleanup.patch
-add-static-in-affs.patch
-add-static-in-afs.patch
-add-static-in-befs.patch
-codemercs-io-warrior-support.patch
-fat-use-hlist_head-for-fat_inode_hashtable-1-4.patch
-fat-rewrite-the-cache-for-file-allocation-table-lookup.patch
-fat-cache-lock-from-per-sb-to-per-inode-3-4.patch
-fat-the-inode-hash-from-per-module-to-per-sb-4-4.patch
-uml-avoid-using-elv_queue_empty.patch
-uml-avoid-forcing-use-of-the-no-op-scheduler.patch
-uml-correct-the-failure-path-in-start_io_thread.patch
-fix-address_spacei_mmap-comment.patch
-remove-mod_incdec_use_count-users-that-got-back-in.patch
-dont-mention-mod_incdec_use_count-in-documentation.patch

Merged

+remove-set_fs-from-compat-sched-affinity-syscalls.patch

Remove the set_fs hack in the compat affinity calls.

+allow-compat-long-sized-bitmasks-in-affinity-code.patch

compat_sys_sched_getaffinity() fix

+fix-schedstats-null-deref-in-sched_exec.patch

Fix an oops

+rock-fix.patch

Fix a double-kfree() in rock.c

-es7000-subarch-update.patch
+2681-es7000-subarch-update.patch

New es7000 update

+exec-fix-posix-timers-leak-and-pending-signal-loss.patch

Fix a posix-timers leak and pending-signal loss in exec

+fix-abi-in-set_mempolicy.patch

Fix the ABI of set_mempolicy() in the NUMA memory policy code

+ksysfs-warning-fix.patch
+kobject_uevent-warning-fix.patch
+fix-smm-failures-on-e750x-systems.patch
+vsxxxaac-fixups.patch
+allow-i8042-register-location-override-2.patch
+tmscsim-build-fix.patch

Various fixes for various people's bk trees

+swsusp-documentation-update.patch
+small-cleanups-for-swsusp.patch
+swsusp-kill-crash-when-too-much-memory-is-free.patch
+swsusp-progress-in-percent.patch
+swsusp-clean-up-reading.patch
+swsusp-another-simplification.patch
+acpi-proc-simplify-error-handling.patch

swsusp stuff

+ppc64-lparcfg-fixes-for-processor-counts.patch
+ppc64-lparcfg-whitespace-and-wordwrap-cleanup.patch
+ppc64-remove-spinline-config-option.patch
+ppc64-rtas-error-logs-can-appear-twice-in-dmesg.patch
+ppc64-enable-numa-api.patch
+ppc64-give-the-kernel-an-opd-section.patch
+ppc64-use-nm-synthetic-where-available.patch
+ppc64-clean-up-kernel-command-line-code.patch
+ppc64-remove-unused-ppc64_calibrate_delay.patch
+ppc64-remove-eeh-command-line-device-matching-code.patch
+ppc64-use-early_param.patch
+ppc64-restore-smt-enabled=off-kernel-command-line-option.patch
+ppc64-enable-power5-low-power-mode-in-idle-loop.patch
+ppc64-clean-up-idle-loop-code.patch
+ppc64-remove-wno-uninitialized.patch
+ppc64-fix-real-bugs-uncovered-by-wno-uninitialized-removal.patch
+ppc64-fix-spurious-warnings-uncovered-by-wno-uninitialized-removal.patch
+hvc-uninitialised-variable.patch
+ppc64-improved-vsid-allocation-algorithm.patch
+ppc64fix-missing-register-in-altivec-context-switch.patch
+ppc32-remove-wno-uninitialized.patch
+ppc32-pmac-cpufreq-for-ibook-2-600.patch

ppc[64] updates

-pid_max-fix.patch

Dropped - wli fixed this by other means.

+lockmeter.patch

Repaired lockmeter patch

+ext3-reservations-spelling-fixes.patch
+ext3-reservations-renumber-the-ext3-reservations-ioctls.patch
+ext3-reservations-remove-unneeded-declaration.patch
+ext3-reservations-turn-ext3-per-sb-reservations-list-into-an-rbtree.patch
+ext3-reservations-split-the-reserve_window-struct-into-two.patch
+ext3-reservations-smp-protect-the-reservation-during-allocation.patch

ext3 block reservation enhancements: fix a few things and use an rbtree

+sched-trivial-sched-changes.patch
+sched-add-cpu_down_prepare-notifier.patch
+sched-integrate-cpu-hotplug-and-sched-domains.patch
+sched-arch_destroy_sched_domains-warning-fix.patch
+sched-sched-add-load-balance-flag.patch
+sched-remove-disjoint-numa-domains-setup.patch
+sched-make-domain-setup-overridable.patch
+sched-make-domain-setup-overridable-rename.patch
+sched-ia64-add-disjoint-numa-domain-support.patch
+sched-fix-domain-debug-for-isolcpus.patch
+sched-enable-sd_load_balance.patch
+sched-hotplug-add-a-cpu_down_failed-notifier.patch
+sched-use-cpu_down_failed-notifier.patch
+sched-fixes-for-ia64-domain-setup.patch

CPU scheduler work.

-journal_clean_checkpoint_list-latency-fix.patch
-filemap_sync-latency-fix.patch
-pty_write-latency-fix.patch

Dropped these scheduling latency changes - let's see what Ingo's ones look
like

propagate-pci_enable_device-errors.patch

+move-syscall-declarations-from-linux-keyh-2.patch
+make-key-management-use-syscalls-not-prctls-build-fix.patch

Key management code updates

+cachefs-return-the-right-error-upon-invalid-mount.patch
+remove-error-from-linux-cachefsh.patch
+cachefs-warning-fix-2.patch
+cachefs-linkage-fix-2.patch
-cachefs-linkage-fix.patch

Various updates to cachefs

+cpusets-display-allowed-masks-in-proc-status.patch
+cpusets-simplify-cpus_allowed-setting-in-attach.patch
+cpusets-remove-useless-validation-check.patch
+cpusets-fix-possible-race-in-cpuset_tasks_read.patch
+cpusets-interoperate-with-hotplug-online-maps.patch

cpusets fixes/updates

+stop-reiser4-from-turning-itself-on-by-default.patch

reiser4 Kconfig fix

+kallsyms-fix-sparc-gibberish.patch

Fix endianness in the new kallsyms handling code

+m32r-update-for-profiling.patch
+m32r-update-zone_sizes_init.patch
+m32r-update-to-fix-compile-errors.patch
+m32r-update-uaccessh.patch
+m32r-update-checksum-functions.patch
+m32r-update-cf-pcmcia-drivers.patch
+m32r-update-headers-to-remove-useless-ibcs2-support-code.patch
+atomic_inc_return-for-m32r-re.patch

m32r updates

-lighten-mmlist_lock.patch

Dropped - Hugh had second thoughts

+misrouted-irq-recovery-take-2-fix.patch
+misrouted-irq-recovery-docs.patch

Smarter workarounds for ia32 IRQ routing problems

+cfq-iosched-v2.patch

Major revamp of the CFQ IO scheduler

+dont-export-blkdev_open-and-def_blk_ops.patch
+remove-dead-code-from-fs-mbcachec.patch
+remove-posix_acl_masq_nfs_mode.patch
+make-kmem_find_general_cachep-static-in-slabc.patch
+dont-export-shmem_file_setup.patch
+remove-pm_find-unexport-pm_send.patch
+remove-dead-code-and-exports-from-signalc.patch
+mark-md_interrupt_thread-static.patch
+unexport-proc_sys_root.patch
+mark-dq_list_lock-static.patch
+unexport-is_subdir-and-shrink_dcache_anon.patch
+unexport-devfs_mk_symlink.patch
+unexport-do_execve-do_select.patch
+unexport-exit_mm.patch
+unexport-files_lock-and-put_filp.patch
+remove-exports-from-audit-code.patch
+unexport-f_delown.patch
+unexport-lookup_create.patch
+remove-wake_up_all_sync.patch
+remove-set_fs_root-set_fs_pwd.patch

Little fixes/cleanups

+add-prctl-to-modify-current-comm.patch

Allow current->comm to be modified via prctl()
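For anyone wanting to poke at this from userspace: per the prctl(2) man page, the option is PR_SET_NAME (since 2.6.9), with a matching PR_GET_NAME read-back that only arrived in 2.6.11. The sketch below assumes a kernel and libc that have both; demo_set_comm() is a hypothetical helper, not part of the patch.

```c
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper (not from the patch): rename the calling thread via
 * PR_SET_NAME, then read the name back with PR_GET_NAME (2.6.11 and later).
 * current->comm is a 16-byte buffer (TASK_COMM_LEN), so names longer than
 * 15 characters are silently truncated.
 * Returns 0 on success, -1 if either prctl() call fails. */
int demo_set_comm(const char *name, char out[16])
{
    if (prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL) != 0)
        return -1;
    memset(out, 0, 16);
    if (prctl(PR_GET_NAME, (unsigned long)out, 0UL, 0UL, 0UL) != 0)
        return -1;
    return 0;
}
```

Calling demo_set_comm("mm5-demo", buf) should make "mm5-demo" show up in ps and /proc/self/status, and a 26-character name comes back cut to its first 15 characters.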

+md-remove-md_flush_all.patch
+md-make-retry_list-non-global-in-raid1-and-multipath.patch
+md-rationalise-issue_flush-function-in-md-personalities.patch
+md-rationalise-unplug-functions-in-md.patch
+md-make-sure-md-always-uses-rdev_dec_pending-properly.patch
+md-fix-two-little-bugs-in-raid10.patch
+md-modify-locking-when-accessing-subdevices-in-md.patch

RAID update

+blk-max_sectors-tunables.patch

Make the per-queue max_sectors tunable for latency purposes

+generic-acl-support-for-permission.patch
+generic-acl-support-for-permission-fix.patch
+generic-acl-support-for-permission-keyfs-fix.patch

ACL code consolidation

+device-driver-for-the-sgi-system-clock-mmtimer.patch

New device driver

+rtl8150-fix.patch

Net driver fix

+close-race-with-preempt-and-modular-pm_idle-callbacks.patch

Fix a PM idle-handler race

+cacheline-align-pagevec-structure.patch

Fine-tune the pagevec size

+hvcs-fix-to-replace-yield-with-tty_wait_until_sent-in.patch

HVCS driver fix

+fbdev-remove-unnecessary-banshee_wait_idle-from-tdfxfb.patch
+fbdev-fix-logo-drawing-failure-for-vga16fb.patch
+fbdev-initialize-i810fb-after-agpgart.patch
+fbdev-fix-userland-compile-breakage.patch
+fbcon-fix-setup-boot-options-of-fbcon.patch
+fbdev-pass-struct-device-to-class_simple_device_add.patch
+fbdev-add-tile-blitting-support.patch

fbdev update

+fix-for-spurious-interrupts-on-e100-resume.patch

e100 PM resume workaround

+r8169-miscalculation-of-available-tx-descriptors.patch
+r8169-hint-for-tx-flow-control.patch
+r8169-tso-support.patch
+r8169-mac-identifier-extracted-from-realteks-driver-v22.patch

net driver update

+uml-remove-ghash.patch
+uml-eliminate-useless-thread-field.patch
+uml-fix-scheduler-race.patch
+uml-fix-binary-layout-assumption.patch
+uml-disable-pending-signals-across-a-reboot.patch
+uml-refer-to-config_usermode-not-to-config_um.patch
+uml-remove-commented-old-code-in-kconfig.patch
+uml-smp-build-fix.patch
+uml-remove-config_uml_smp.patch

UML updates

+highmem-flushes.patch

Missing dcache flushes in the bounce buffering code

+add-support-for-word-length-uart-registers.patch

Serial driver fix

+compile-fix-3c59x-for-eisa-without-pci.patch

Net driver build fix

+atomic_inc_return-for-i386.patch
+atomic_inc_return-for-x86_64.patch
+atomic_inc_return-for-arm.patch
+atomic_inc_return-for-arm26.patch
+atomic_inc_return-for-sparc64.patch

atomic_inc_return() for various architectures
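The semantics these ports have to provide: atomic_inc_return() increments the counter and returns the new value. A hypothetical userspace analogue using C11 atomics (demo_atomic_inc_return() is made up for illustration; the real per-arch versions are inline asm or locked sequences):

```c
#include <stdatomic.h>

/* Hypothetical userspace analogue, not the kernel implementation:
 * atomically add 1 and return the NEW value, which is what the kernel's
 * atomic_inc_return() hands back. */
static inline int demo_atomic_inc_return(atomic_int *v)
{
    /* atomic_fetch_add() returns the value from before the addition,
     * so add 1 to report the post-increment value. */
    return atomic_fetch_add(v, 1) + 1;
}
```

Returning the post-increment value lets a caller both bump a counter and learn which slot it got in one atomic step, with no separate atomic_read() race.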

+fix-uninitialized-warnings-in-mempolicyc.patch

Warning fixes

+online-cpu-with-maxcpus-option-panics.patch

Fix a crash with maxcpus=

+remove-dead-exports-from-fs-fat.patch
+fat-use-hlist_head-for-fat_inode_hashtable-1-6.patch
+fat-rewrite-the-cache-for-file-allocation-table-lookup.patch
+fat-cache-lock-from-per-sb-to-per-inode-3-6.patch
+fat-the-inode-hash-from-per-module-to-per-sb-4-6.patch
+fat-fix-the-race-bitween-fat_free-and-fat_get_cluster.patch
+fat-remove-debug_pr-6-6.patch

fatfs update

+small-linux-hardirqh-tweaks.patch

hardirq.h fixes

+bsd-disklabel-handle-more-than-8-partitions.patch

Fix BSD disklabels

+asm-softirqh-crept-back-in-h8300-and-sh64.patch

Remove unneeded files (again)

+mark-amiflop-non-unloadable.patch

amiflop.c fixlet

+thinkpad-fnfx-key-driver.patch

Thinkpad function key fixes

+netpoll-endian-fixes.patch

netpoll fixes on big-endian

+rewrite-alloc_pidmap.patch

Clean up alloc_pidmap()

+missing-pci_disable_device.patch

Add a warning to check that drivers have called pci_disable_device() (it uses
CONFIG_DEBUG_KERNEL, and shouldn't).

+fbdev-radeonfb-remove-bogus-radeonfb_read-write.patch

radeonfb fix

+add-missing-pci_disable_device-for-pci-based-usb-hcd.patch
+add-missing-pci_disable_device-for-e1000.patch

Add pci_disable_device() to a couple of drivers

+next_thread-bug-fixes.patch

Remove some suspect BUG()s from next_thread().



number of patches in -mm: 432
number of changesets in external trees: 554
number of patches in -mm only: 416
total patches: 970




All 432 patches:


linus.patch

remove-set_fs-from-compat-sched-affinity-syscalls.patch
Remove set_fs() from compat sched affinity syscalls

allow-compat-long-sized-bitmasks-in-affinity-code.patch
Allow compat long sized bitmasks in affinity code

distinct-tgid-tid-cpu-usage.patch
distinct tgid/tid CPU usage

fix-schedstats-null-deref-in-sched_exec.patch
fix schedstats null deref in sched_exec

rock-fix.patch
rock.c: fix double-kfree()

2681-es7000-subarch-update.patch
ES7000 subarch update

exec-fix-posix-timers-leak-and-pending-signal-loss.patch
exec: fix posix-timers leak and pending signal loss

fix-abi-in-set_mempolicy.patch
Fix ABI in set_mempolicy()

__set_page_dirty_nobuffers-mappings.patch
__set_page_dirty_nobuffers mappings

sysfs-backing-store-prepare-file_operations.patch
sysfs backing store - prepare sysfs_file_operations helpers

sysfs-backing-store-prepare-file_operations-fix.patch
fix oops with firmware loading

sysfs-backing-store-add-sysfs_dirent.patch
sysfs backing store - add sysfs_dirent structure

sysfs-backing-store-use-sysfs_dirent-tree-in-removal.patch
sysfs backing store: use sysfs_dirent based tree in file removal

sysfs-backing-store-use-sysfs_dirent-tree-in-dir-file_operations.patch
sysfs backing store: use sysfs_dirent based tree in dir file operations

sysfs-backing-store-stop-pinning-dentries-inodes-for-leaves.patch
sysfs backing store: stop pinning dentries/inodes for leaf entries

bk-acpi.patch

acpi-compile-fix.patch
acpi-compile-fix

acpi-x86_64-build-fix.patch
acpi x86_64 build fix

bk-agpgart.patch

bk-alsa.patch

bk-cpufreq.patch

bk-driver-core.patch

ksysfs-warning-fix.patch
ksysfs warning fix

kobject_uevent-warning-fix.patch
kobject_uevent warning fix

bk-ia64.patch

bk-ieee1394.patch

bk-input.patch

fix-smm-failures-on-e750x-systems.patch
fix SMM failures on E750x systems

vsxxxaac-fixups.patch
vsxxxaa.c fixups

allow-i8042-register-location-override-2.patch
allow i8042 register location override #2

bk-netdev.patch

bk-pci.patch

bk-pnp.patch

bk-power.patch

bk-scsi.patch

bk-scsi-target.patch

tmscsim-build-fix.patch
tmscsim-build-fix

bk-usb.patch

bk-watchdog.patch

mm.patch
add -mmN to EXTRAVERSION

mm-swsusp-make-sure-we-do-not-return-to-userspace-where-image-is-on-disk.patch
-mm swsusp: make sure we do not return to userspace where image is on disk

mm-swsusp-copy_page-is-harmfull.patch
-mm swsusp: copy_page is harmful

swsusp-fix-highmem.patch
swsusp: fix highmem

swsusp-do-not-disable-platform-swsusp-because-s4bios-is-available.patch
swsusp: do not disable platform swsusp because S4bios is available

swsusp-fix-default-powerdown-mode.patch
swsusp: fix default powerdown mode

mark-old-power-managment-as-deprecated-and-clean-it-up.patch
Mark old power management as deprecated and clean it up

use-global-system_state-to-avoid-system-state-confusion.patch
Use global system_state to avoid system-state confusion

swsusp-error-do-not-oops-after-allocation-failure.patch
swsusp: do not oops after allocation failure

swsusp-documentation-update.patch
swsusp: Documentation update

small-cleanups-for-swsusp.patch
Small cleanups for swsusp

swsusp-kill-crash-when-too-much-memory-is-free.patch
swsusp: kill crash when too much memory is free

swsusp-progress-in-percent.patch
swsusp: progress in percent

swsusp-clean-up-reading.patch
swsusp: clean up reading

swsusp-another-simplification.patch
swsusp: another simplification

acpi-proc-simplify-error-handling.patch
acpi proc: simplify error handling

pegasus-fixes.patch
pegasus.c fixes

pointer-dereference-before-null-check-in-acpi-thermal-driver.patch
Pointer dereference before NULL check in ACPI thermal driver

network-packet-tracer-module-using-kprobes-interface.patch
Network packet tracer module using kprobes interface.

kgdb-ga.patch
kgdb stub for ia32 (George Anzinger's one)
kgdb: warning fix
kgdb buffer overflow fix
kgdb: warning fix
kgdb: CONFIG_DEBUG_INFO fix
x86_64 fixes
correct kgdb.txt Documentation link (against 2.6.1-rc1-mm2)
kgdb: fix for recent gcc
kgdb warning fixes
THREAD_SIZE fixes for kgdb
Fix stack overflow test for non-8k stacks
kgdb-ga.patch fix for i386 single-step into sysenter
fix TRAP_BAD_SYSCALL_EXITS on i386
add TRAP_BAD_SYSCALL_EXITS config for i386

kgdb-is-incompatible-with-kprobes.patch
kgdb-is-incompatible-with-kprobes

kgdboe-netpoll.patch
kgdb-over-ethernet via netpoll
kgdboe: fix configuration of MAC address

kgdb-x86_64-support.patch
kgdb-x86_64-support.patch for 2.6.2-rc1-mm3
kgdb-x86_64-warning-fixes

kgdb-ia64-support.patch
IA64 kgdb support
ia64 kgdb repair and cleanup
ia64 kgdb fix

kgdb-ia64-fixes.patch
kgdb: ia64 fixes

make-tree_lock-an-rwlock.patch
make mapping->tree_lock an rwlock

must-fix.patch
must fix lists update
must fix list update
mustfix update
must-fix update
mustfix lists

ppc64-lparcfg-fixes-for-processor-counts.patch
ppc64: lparcfg fixes for processor counts

ppc64-lparcfg-whitespace-and-wordwrap-cleanup.patch
ppc64: lparcfg whitespace and wordwrap cleanup.

ppc64-remove-spinline-config-option.patch
ppc64: remove SPINLINE config option

ppc64-rtas-error-logs-can-appear-twice-in-dmesg.patch
ppc64: RTAS error logs can appear twice in dmesg

ppc64-enable-numa-api.patch
ppc64: Enable NUMA API

ppc64-give-the-kernel-an-opd-section.patch
ppc64: give the kernel an OPD section

ppc64-use-nm-synthetic-where-available.patch
ppc64: use nm --synthetic where available

ppc64-clean-up-kernel-command-line-code.patch
ppc64: clean up kernel command line code

ppc64-remove-unused-ppc64_calibrate_delay.patch
ppc64: remove unused ppc64_calibrate_delay

ppc64-remove-eeh-command-line-device-matching-code.patch
ppc64: remove EEH command line device matching code

ppc64-use-early_param.patch
ppc64: use early_param

ppc64-restore-smt-enabled=off-kernel-command-line-option.patch
ppc64: restore smt-enabled=off kernel command line option

ppc64-enable-power5-low-power-mode-in-idle-loop.patch
ppc64: enable POWER5 low power mode in idle loop

ppc64-clean-up-idle-loop-code.patch
ppc64: clean up idle loop code

ppc64-remove-wno-uninitialized.patch
ppc64: remove -Wno-uninitialized

ppc64-fix-real-bugs-uncovered-by-wno-uninitialized-removal.patch
ppc64: Fix real bugs uncovered by -Wno-uninitialized removal

ppc64-fix-spurious-warnings-uncovered-by-wno-uninitialized-removal.patch
ppc64: Fix spurious warnings uncovered by -Wno-uninitialized removal

hvc-uninitialised-variable.patch
hvc: uninitialised variable

ppc64-improved-vsid-allocation-algorithm.patch
ppc64: improved VSID allocation algorithm

ppc64fix-missing-register-in-altivec-context-switch.patch
ppc64: fix missing register in altivec context switch

ppc32-remove-wno-uninitialized.patch
ppc32: remove -Wno-uninitialized

ppc32-pmac-cpufreq-for-ibook-2-600.patch
ppc32: pmac cpufreq for ibook 2 600

lazy-tsss-i-o-bitmap-copy-for-x86-64.patch
lazy TSS's I/O bitmap copy for x86-64

lazy-tsss-i-o-bitmap-copy-for-x86-64-fix.patch
lazy-tsss-i-o-bitmap-copy-for-x86-64-fix

ppc64-reloc_hide.patch

invalidate_inodes-speedup.patch
invalidate_inodes speedup
more invalidate_inodes speedup fixes

dev-mem-restriction-patch.patch
/dev/mem restriction patch

get_user_pages-handle-VM_IO.patch
fix get_user_pages() against mappings of /dev/mem

jbd-remove-livelock-avoidance.patch
JBD: remove livelock avoidance code in journal_dirty_data()

journal_add_journal_head-debug.patch
journal_add_journal_head-debug

list_del-debug.patch
list_del debug check

lockmeter.patch
lockmeter
ia64 CONFIG_LOCKMETER fix
lockmeter-build-fix
lockmeter for x86_64

unplug-can-sleep.patch
unplug functions can sleep

firestream-warnings.patch
firestream warnings

ext3_rsv_cleanup.patch
ext3 block reservation patch set -- ext3 preallocation cleanup

ext3_rsv_base.patch
ext3 block reservation patch set -- ext3 block reservation
ext3 reservations: fix performance regression
ext3 block reservation patch set -- mount and ioctl feature
ext3 block reservation patch set -- dynamically increase reservation window
ext3 reservation ifdef cleanup patch
ext3 reservation max window size check patch
ext3 reservation file ioctl fix

ext3-reservation-default-on.patch
ext3 reservation: default to on

ext3-lazy-discard-reservation-window-patch.patch
ext3 lazy discard reservation window patch
ext3 discard reservation in last iput fix patch
Fix lazy reservation discard
ext3 reservations: bad_inode fix
ext3 reservation discard race fix

ext3-reservations-spelling-fixes.patch
ext3 reservations: Spelling fixes

ext3-reservations-renumber-the-ext3-reservations-ioctls.patch
ext3 reservations: Renumber the ext3 reservations ioctls

ext3-reservations-remove-unneeded-declaration.patch
ext3 reservations: Remove unneeded declaration.

ext3-reservations-turn-ext3-per-sb-reservations-list-into-an-rbtree.patch
ext3 reservations: Turn ext3 per-sb reservations list into an rbtree.

ext3-reservations-split-the-reserve_window-struct-into-two.patch
ext3 reservations: Split the "reserve_window" struct into two

ext3-reservations-smp-protect-the-reservation-during-allocation.patch
ext3 reservations: SMP-protect the reservation during allocation

tty_io-hangup-locking.patch
tty_io.c hangup locking

perfctr-core.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][1/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: core
CONFIG_PERFCTR=n build fix
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][6/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: misc

perfctr-i386.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][2/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: i386
perfctr #if/#ifdef cleanup
perfctr Dothan support
perfctr x86_tests build fix
perfctr x86 init bug
perfctr: K8 fix for internal benchmarking code
perfctr x86 update

perfctr-prescott-fix.patch
Prescott fix for perfctr

perfctr-x86_64.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][3/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: x86_64

perfctr-ppc.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][4/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: PowerPC
perfctr ppc32 update
perfctr update 4/6: PPC32 cleanups
perfctr ppc32 buglet fix

perfctr-virtualised-counters.patch
From: Mikael Pettersson <[email protected]>
Subject: [PATCH][5/6] perfctr-2.7.3 for 2.6.7-rc1-mm1: virtualised counters
perfctr update 6/6: misc minor cleanups
perfctr update 3/6: __user annotations
perfctr-cpus_complement-fix
perfctr cpumask cleanup
perfctr SMP hang fix

make-perfctr_virtual-default-in-kconfig-match-recommendation.patch
Make PERFCTR_VIRTUAL default in Kconfig match recommendation in help text

perfctr-ifdef-cleanup.patch
perfctr ifdef cleanup

perfctr-update-2-6-kconfig-related-updates.patch
perfctr update 2/6: Kconfig-related updates

perfctr-update-5-6-reduce-stack-usage.patch
perfctr update 5/6: reduce stack usage

perfctr-low-level-documentation.patch
perfctr low-level documentation
perfctr documentation update

perfctr-inheritance-1-3-driver-updates.patch
perfctr inheritance 1/3: driver updates
perfctr inheritance illegal sleep bug

perfctr-inheritance-2-3-kernel-updates.patch
perfctr inheritance 2/3: kernel updates

perfctr-inheritance-3-3-documentation-updates.patch
perfctr inheritance 3/3: documentation updates

perfctr-inheritance-locking-fix.patch
perfctr inheritance locking fix

ext3-online-resize-patch.patch
ext3: online resizing
ext3-online-resize-warning-fix

sched-trivial-sched-changes.patch
sched: trivial sched changes

sched-add-cpu_down_prepare-notifier.patch
sched: add CPU_DOWN_PREPARE notifier

sched-integrate-cpu-hotplug-and-sched-domains.patch
sched: integrate cpu hotplug and sched domains

sched-arch_destroy_sched_domains-warning-fix.patch
sched: arch_destroy_sched_domains warning fix

sched-sched-add-load-balance-flag.patch
sched: sched add load balance flag

sched-remove-disjoint-numa-domains-setup.patch
sched: remove disjoint NUMA domains setup

sched-make-domain-setup-overridable.patch
sched: make domain setup overridable

sched-make-domain-setup-overridable-rename.patch
sched-make-domain-setup-overridable: rename IDLE

sched-ia64-add-disjoint-numa-domain-support.patch
sched: IA64 add disjoint NUMA domain support

sched-fix-domain-debug-for-isolcpus.patch
sched: fix domain debug for isolcpus

sched-enable-sd_load_balance.patch
sched: enable SD_LOAD_BALANCE

sched-hotplug-add-a-cpu_down_failed-notifier.patch
sched: hotplug add a CPU_DOWN_FAILED notifier

sched-use-cpu_down_failed-notifier.patch
sched: use CPU_DOWN_FAILED notifier

sched-fixes-for-ia64-domain-setup.patch
sched: fixes for ia64 domain setup

nicksched.patch
nicksched

nicksched-sched_fifo-fix.patch
nicksched: SCHED_FIFO fix

sched-smtnice-fix.patch
sched: SMT nice fix

ext3_bread-cleanup.patch
ext3_bread() cleanup

pcmcia-implement-driver-model-support.patch
pcmcia: implement driver model support

pcmcia-update-network-drivers.patch
pcmcia: update network drivers

pcmcia-update-wireless-drivers.patch
pcmcia: update wireless drivers

pcmcia-fix-eject-lockup.patch
pcmcia: fix eject lockup

pcmcia-add-hotplug-support.patch
pcmcia: add *hotplug support

linux-2.6.8.1-49-rpc_workqueue.patch
nfs: RPC: Convert rpciod into a work queue for greater flexibility

linux-2.6.8.1-50-rpc_queue_lock.patch
nfs: RPC: Remove the rpc_queue_lock global spinlock

dvdrw-support-for-267-bk13.patch
DVD+RW support for 2.6.7-bk13

packet-writing-credits.patch
packet-writing: add credits

cdrw-packet-writing-support-for-267-bk13.patch
CDRW packet writing support
packet: remove #warning
packet writing: door unlocking fix
pkt_lock_door() warning fix
Fix race in pktcdvd kernel thread handling
Fix open/close races in pktcdvd
packet writing: review fixups
Remove pkt_dev from struct pktcdvd_device
packet writing: convert to seq_file

dvd-rw-packet-writing-update.patch
Packet writing support for DVD-RW and DVD+RW discs.
Get blockdev size right in pktcdvd after switching discs

packet-writing-docco.patch
packet writing documentation
Trivial CDRW packet writing doc update

control-pktcdvd-with-an-auxiliary-character-device.patch
Control pktcdvd with an auxiliary character device
Subject: Re: 2.6.8-rc2-mm2
control-pktcdvd-with-an-auxiliary-character-device-fix

simplified-request-size-handling-in-cdrw-packet-writing.patch
Simplified request size handling in CDRW packet writing

fix-setting-of-maximum-read-speed-in-cdrw-packet-writing.patch
Fix setting of maximum read speed in CDRW packet writing

packet-writing-reporting-fix.patch
Packet writing reporting fixes

speed-up-the-cdrw-packet-writing-driver.patch
Speed up the cdrw packet writing driver

packet-writing-avoid-bio-hackery.patch
packet writing: avoid BIO hackery

cdrom-buffer-size-fix.patch
cdrom: buffer sizing fix

cpufreq-driver-for-nforce2-kernel-267.patch
cpufreq driver for nForce2

allow-modular-ide-pnp.patch
allow modular ide-pnp

create-nodemask_t.patch
Create nodemask_t
nodemask fix
nodemask build fix

b44-add-47xx-support.patch
b44: add 47xx support

allow-x86_64-to-reenable-interrupts-on-contention.patch
Allow x86_64 to reenable interrupts on contention

serial-cs-and-unusable-port-size-ranges.patch
serial-cs and unusable port size ranges

add-support-for-it8212-ide-controllers.patch
Add support for IT8212 IDE controllers

i386-hotplug-cpu.patch
i386 Hotplug CPU

hotplug-cpu-fix-apic-queued-timer-vector-race.patch
Hotplug cpu: Fix APIC queued timer vector race

hotplug-cpu-move-cpu_online_map-clear-to-__cpu_disable.patch
Hotplug cpu: Move cpu_online_map clear to __cpu_disable

igxb-speedup.patch
igxb speedup

serialize-access-to-ide-devices.patch
serialize access to ide devices

remove-unconditional-pci-acpi-irq-routing.patch
remove unconditional PCI ACPI IRQ routing

propagate-pci_enable_device-errors.patch
propagate pci_enable_device() errors

disable-atykb-warning.patch
disable atykb "too many keys pressed" warning

add-some-key-management-specific-error-codes.patch
Add some key management specific error codes

keys-new-error-codes-for-alpha-mips-pa-risc-sparc-sparc64.patch
keys: new error codes for Alpha, MIPS, PA-RISC, Sparc & Sparc64

implement-in-kernel-keys-keyring-management.patch
implement in-kernel keys & keyring management
keys build fix
keys & keyring management update patch
implement-in-kernel-keys-keyring-management-update-build-fix
implement-in-kernel-keys-keyring-management-update-build-fix-2
key management patch cleanup

make-key-management-code-use-new-the-error-codes.patch
Make key management code use the new error codes

keys-permission-fix.patch
keys: permission fix

keys-keyring-management-keyfs-patch.patch
keys & keyring management: keyfs patch

keyfs-build-fix.patch
keyfs build fix

implement-in-kernel-keys-keyring-management-afs-workaround.patch
implement-in-kernel-keys-keyring-management afs workaround

support-supplementary-information-for-request-key.patch
Support supplementary information for request-key

make-key-management-use-syscalls-not-prctls.patch
Make key management use syscalls not prctls

move-syscall-declarations-from-linux-keyh-2.patch
Move syscall declarations from linux/key.h #2

make-key-management-use-syscalls-not-prctls-build-fix.patch
make-key-management-use-syscalls-not-prctls build fix

export-file_ra_state_init-again.patch
Export file_ra_state_init() again

cachefs-filesystem.patch
CacheFS filesystem

cachefs-return-the-right-error-upon-invalid-mount.patch
CacheFS: return the right error upon invalid mount

remove-error-from-linux-cachefsh.patch
Remove #error from linux/cachefs.h

cachefs-warning-fix-2.patch
cachefs warning fix 2

cachefs-linkage-fix-2.patch
cachefs linkage fix

cachefs-build-fix.patch
cachefs build fix

cachefs-documentation.patch
CacheFS documentation

add-page-becoming-writable-notification.patch
Add page becoming writable notification

provide-a-filesystem-specific-syncable-page-bit.patch
Provide a filesystem-specific sync'able page bit

provide-a-filesystem-specific-syncable-page-bit-fix.patch
provide-a-filesystem-specific-syncable-page-bit-fix

make-afs-use-cachefs.patch
Make AFS use CacheFS

ide-probe.patch
ide probe

268-rc3-jffs2-unable-to-read-filesystems.patch
jffs2 unable to read filesystems

qlogic-isp2x00-remove-needless-busyloop.patch
QLogic ISP2x00: remove needless busyloop

jffs2-mount-options-discarded.patch
JFFS2 mount options discarded

assign_irq_vector-section-fix.patch
assign_irq_vector __init section fix

find_isa_irq_pin-should-not-be-__init.patch
find_isa_irq_pin should not be __init

kexec-i8259-shutdowni386.patch
kexec: i8259-shutdown.i386

kexec-i8259-shutdown-x86_64.patch
kexec: x86_64 i8259 shutdown

kexec-apic-virtwire-on-shutdowni386patch.patch
kexec: apic-virtwire-on-shutdown.i386.patch

kexec-apic-virtwire-on-shutdownx86_64.patch
kexec: apic-virtwire-on-shutdown.x86_64

kexec-ioapic-virtwire-on-shutdowni386.patch
kexec: ioapic-virtwire-on-shutdown.i386

kexec-ioapic-virtwire-on-shutdownx86_64.patch
kexec: ioapic-virtwire-on-shutdown.x86_64

kexec-e820-64bit.patch
kexec: e820-64bit

kexec-kexec-generic.patch
kexec: kexec-generic

kexec-machine_shutdownx86_64.patch
kexec: machine_shutdown.x86_64

kexec-kexecx86_64.patch
kexec: kexec.x86_64

kexec-machine_shutdowni386.patch
kexec: machine_shutdown.i386

kexec-kexeci386.patch
kexec: kexec.i386

kexec-use_mm.patch
kexec: use_mm

kexec-kexecppc.patch
kexec: kexec.ppc

kexec-ppc-kexec-kconfig-misplacement.patch
kexec ppc KEXEC Kconfig misplacement

new-bitmap-list-format-for-cpusets.patch
new bitmap list format (for cpusets)

cpusets-big-numa-cpu-and-memory-placement.patch
cpusets - big numa cpu and memory placement

cpusets-dont-export-proc_cpuset_operations.patch
Cpusets - Don't export proc_cpuset_operations

cpusets-display-allowed-masks-in-proc-status.patch
cpusets: display allowed masks in proc status

cpusets-simplify-cpus_allowed-setting-in-attach.patch
cpusets: simplify cpus_allowed setting in attach

cpusets-remove-useless-validation-check.patch
cpusets: remove useless validation check

cpusets-config_cpusets-depends-on-smp.patch
Cpusets: CONFIG_CPUSETS depends on SMP

cpusets-tasks-file-simplify-format-fixes.patch
Cpusets tasks file: simplify format, fixes

cpusets-fix-possible-race-in-cpuset_tasks_read.patch
cpusets: fix possible race in cpuset_tasks_read()

cpusets-simplify-memory-generation.patch
Cpusets: simplify memory generation

cpusets-interoperate-with-hotplug-online-maps.patch
cpusets: interoperate with hotplug online maps

reiser4-sb_sync_inodes.patch
reiser4: vfs: add super_operations.sync_inodes()

reiser4-sb_sync_inodes-cleanup.patch
reiser4-sb_sync_inodes-cleanup

reiser4-allow-drop_inode-implementation.patch
reiser4: export vfs inode.c symbols

reiser4-allow-drop_inode-implementation-cleanup.patch
reiser4-allow-drop_inode-implementation-cleanup

reiser4-truncate_inode_pages_range.patch
reiser4: vfs: add truncate_inode_pages_range()

reiser4-truncate_inode_pages_range-cleanup.patch
reiser4-truncate_inode_pages_range-cleanup

reiser4-export-remove_from_page_cache.patch
reiser4: export pagecache add/remove functions to modules

reiser4-export-page_cache_readahead.patch
reiser4: export page_cache_readahead to modules

reiser4-reget-page-mapping.patch
reiser4: vfs: re-check page->mapping after calling try_to_release_page()

reiser4-rcu-barrier.patch
reiser4: add rcu_barrier() synchronization point

reiser4-rcu-barrier-fix.patch
reiser4-rcu-barrier fix

reiser4-export-inode_lock.patch
reiser4: export inode_lock to modules

reiser4-export-inode_lock-cleanup.patch
reiser4-export-inode_lock-cleanup

reiser4-export-pagevec-funcs.patch
reiser4: export pagevec functions to modules

reiser4-export-pagevec-funcs-cleanup.patch
reiser4-export-pagevec-funcs-cleanup

reiser4-export-radix_tree_preload.patch
reiser4: export radix_tree_preload() to modules

reiser4-radix-tree-tag.patch
reiser4: add new radix tree tag

reiser4-radix_tree_lookup_slot.patch
reiser4: add radix_tree_lookup_slot()

reiser4-aliased-dir.patch
reiser4: vfs: handle aliased directories

reiser4-kobject-umount-race.patch
reiser4: introduce filesystem kobjects

reiser4-kobject-umount-race-cleanup.patch
reiser4-kobject-umount-race-cleanup

reiser4-perthread-pages.patch
reiser4: per-thread page pools

reiser4-unstatic-kswapd.patch
reiser4: make kswapd() unstatic for debug

reiser4-include-reiser4.patch
reiser4: add to build system

reiser4-4kstacks-fix.patch
reiser4-4kstacks-fix

stop-reiser4-from-turning-itself-on-by-default.patch
Stop reiser4 from turning itself on by default

reiser4-doc.patch
reiser4: documentation

reiser4-doc-update.patch
Update Documentation/Changes for reiser4

reiser4-only.patch
reiser4: main fs

reiser4-debug-build-fix.patch
reiser4-debug-build-fix

reiser4-prefetch-warning-fix.patch
reiser4: prefetch warning fix

reiser4-mode-fix.patch
reiser4: mode type fix

reiser4-get_context_ok-warning-fixes.patch
reiser4: get_context_ok() warning fixes

reiser4-remove-debug.patch
reiser4: remove debug stuff

reiser4-spinlock-debugging-build-fix-2.patch
reiser4-spinlock-debugging-build-fix-2

reiser4-sparc64-build-fix.patch
reiser4 sparc64 build fix

sys_reiser4-sparc64-build-fix.patch
sys_reiser4 sparc64 build fix

reiser4-printk-warning-fixes.patch
reiser4 printk warning fixes

add-acpi-based-floppy-controller-enumeration.patch
Add ACPI-based floppy controller enumeration.

add-acpi-based-floppy-controller-enumeration-fix.patch
add-acpi-based-floppy-controller-enumeration fix

update-acpi-floppy-enumeration.patch
update ACPI floppy enumeration

possible-dcache-bug-debugging-patch.patch
Possible dcache BUG: debugging patch

kallsyms-data-size-reduction--lookup-speedup.patch
kallsyms data size reduction / lookup speedup

inconsistent-kallsyms-fix.patch
Inconsistent kallsyms fix

kallsyms-correct-type-char-in-proc-kallsyms.patch
kallsyms: correct type char in /proc/kallsyms

kallsyms-fix-sparc-gibberish.patch
kallsyms: fix sparc gibberish

tioccons-security.patch
TIOCCONS security

fix-process-start-times.patch
Fix reporting of process start times

fix-comment-in-include-linux-nodemaskh.patch
Fix comment in include/linux/nodemask.h

x86-build-issue-with-software-suspend-code.patch
Fix x86 build issue with software suspend code

hpt366c-wrong-timings-used-since-268.patch
hpt366.c: wrong timings

move-waitqueue-functions-to-kernel-waitc.patch
move waitqueue functions to kernel/wait.c

standardize-bit-waiting-data-type.patch
standardize bit waiting data type

provide-a-filesystem-specific-syncable-page-bit-fix-2.patch
provide-a-filesystem-specific-syncable-page-bit-fix-2

consolidate-bit-waiting-code-patterns.patch
consolidate bit waiting code patterns
consolidate-bit-waiting-code-patterns-cleanup
__wait_on_bit-fix

eliminate-bh-waitqueue-hashtable.patch
eliminate bh waitqueue hashtable

eliminate-bh-waitqueue-hashtable-fix.patch
wait_on_bit_lock() must test_and_set_bit(), not test_bit()

eliminate-inode-waitqueue-hashtable.patch
eliminate inode waitqueue hashtable

move-wait-ops-contention-case-completely-out-of-line.patch
move wait ops' contention case completely out of line

reduce-number-of-parameters-to-__wait_on_bit-and-__wait_on_bit_lock.patch
reduce number of parameters to __wait_on_bit() and __wait_on_bit_lock()

document-wake_up_bits-requirement-for-preceding-memory-barriers.patch
document wake_up_bit()'s requirement for preceding memory barriers

3c59x-pm-fix.patch
3c59x: enable power management unconditionally

serial-mpsc-driver.patch
Serial MPSC driver

serial-add-support-for-non-standard-xtals-to-16c950-driver.patch
serial: add support for non-standard XTALs to 16c950 driver

add-support-for-possio-gcc-aka-pcmcia-siemens-mc45.patch
Add support for Possio GCC AKA PCMCIA Siemens MC45

searching-for-parameters-in-make-menuconfig.patch
searching for parameters in 'make menuconfig'

menuconfig-regex-search-dependencies.patch
menuconfig: regex search + dependencies

add-smc91x-ethernet-for-lpd7a40x.patch
add SMC91x ethernet for LPD7A40X

m32r-base.patch
m32r architecture

m32r-update-for-profiling.patch
m32r: update for profiling

m32r-update-zone_sizes_init.patch
m32r: update zone_sizes_init()

m32r-update-to-fix-compile-errors.patch
m32r: update to fix compile errors

m32r-update-uaccessh.patch
m32r: update uaccess.h

m32r-update-checksum-functions.patch
m32r: update checksum functions

m32r-update-cf-pcmcia-drivers.patch
m32r: update CF/PCMCIA drivers

m32r-update-headers-to-remove-useless-ibcs2-support-code.patch
m32r: update headers to remove useless iBCS2 support code

atomic_inc_return-for-m32r-re.patch
atomic_inc_return for m32r

m32r-change-from-export_symbol_novers-to-export_symbol.patch
m32r: change from EXPORT_SYMBOL_NOVERS to EXPORT_SYMBOL

m32r-modify-sys_ipc-to-remove-useless-ibcs2-support-code.patch
m32r: modify sys_ipc() to remove useless iBCS2 support code

m32r-add-elf-machine-code.patch
m32r: add ELF machine code

m32r-upgrade-to-2681-kernel.patch
m32r: upgrade to 2.6.8.1 kernel

m32r-support-a-new-bootloader-m32r-g00ff.patch
m32r: support a new bootloader "m32r-g00ff"

m32r-modify-io-routines-for-m32700ut-cf-access.patch
m32r: modify IO routines for m32700ut CF access

vm-pageout-throttling.patch
vm: pageout throttling

fix-race-in-sysfs_read_file-and-sysfs_write_file.patch
Fix race in sysfs_read_file() and sysfs_write_file()

possible-race-in-sysfs_read_file-and-sysfs_write_file-update.patch
Possible race in sysfs_read_file() and sysfs_write_file()

md-add-interface-for-userspace-monitoring-of-events.patch
md: add interface for userspace monitoring of events.

lazy-tsss-i-o-bitmap-copy-for-i386.patch
lazy TSS's I/O bitmap copy for i386

pnpbios-parser-bugfix.patch
pnpbios parser bugfix

unreachable-code-in-ext3_direct_io.patch
unreachable code in ext3_direct_IO()

fix-for-nforce2-secondary-ide-getting-wrong-irq.patch
Fix for NForce2 secondary IDE getting wrong IRQ

revert-allow-oem-written-modules-to-make-calls-to-ia64-oem-sal-functions.patch
revert "allow OEM written modules to make calls to ia64 OEM SAL functions"

shmem-dont-slab_hwcache_align.patch
shmem: don't SLAB_HWCACHE_ALIGN

shmem-inodes-and-links-need-lowmem.patch
shmem: inodes and links need lowmem

shmem-no-sbinfo-for-shm-mount.patch
shmem: no sbinfo for shm mount

shmem-no-sbinfo-for-tmpfs-mount.patch
shmem: no sbinfo for tmpfs mount?

shmem-avoid-the-shmem_inodes-list.patch
shmem: avoid the shmem_inodes list

shmem-rework-majmin-and-zero_page.patch
shmem: rework majmin and ZERO_PAGE

shmem-copyright-file_setup-trivia.patch
shmem: Copyright file_setup trivia

allocate-correct-amount-of-memory-for-pid-hash.patch
Allocate correct amount of memory for pid hash

misrouted-irq-recovery-take-2.patch
Misrouted IRQ recovery, take 2

misrouted-irq-recovery-take-2-cleanup.patch
misrouted-irq-recovery-take-2 cleanup

misrouted-irq-recovery-take-2-fix.patch
misrouted-irq-recovery-take-2 fix

misrouted-irq-recovery-docs.patch
misrouted-irq-recovery documentation

explicity-align-tss-stack.patch
explicitly align tss->stack

check-checksums-for-bnep.patch
Check checksums for BNEP

remember-to-check-return-value-from-__copy_to_user-in.patch
__copy_to_user() check in cdrom_read_cdda_old()

cfq-iosched-v2.patch
CFQ iosched v2

dont-export-blkdev_open-and-def_blk_ops.patch
don't export blkdev_open and def_blk_ops

remove-dead-code-from-fs-mbcachec.patch
remove dead code from fs/mbcache.c

remove-posix_acl_masq_nfs_mode.patch
remove posix_acl_masq_nfs_mode

make-kmem_find_general_cachep-static-in-slabc.patch
make kmem_find_general_cachep static in slab.c

dont-export-shmem_file_setup.patch
don't export shmem_file_setup

remove-pm_find-unexport-pm_send.patch
remove pm_find, unexport pm_send

remove-dead-code-and-exports-from-signalc.patch
remove dead code and exports from signal.c

mark-md_interrupt_thread-static.patch
mark md_interrupt_thread static

unexport-proc_sys_root.patch
unexport proc_sys_root

mark-dq_list_lock-static.patch
mark dq_list_lock static

unexport-is_subdir-and-shrink_dcache_anon.patch
unexport is_subdir and shrink_dcache_anon

unexport-devfs_mk_symlink.patch
unexport devfs_mk_symlink

unexport-do_execve-do_select.patch
unexport do_execve/do_select

unexport-exit_mm.patch
unexport exit_mm

unexport-files_lock-and-put_filp.patch
unexport files_lock and put_filp

remove-exports-from-audit-code.patch
remove exports from audit code

unexport-f_delown.patch
unexport f_delown

unexport-lookup_create.patch
unexport lookup_create

remove-wake_up_all_sync.patch
remove wake_up_all_sync

remove-set_fs_root-set_fs_pwd.patch
remove set_fs_root/set_fs_pwd

add-prctl-to-modify-current-comm.patch
Add prctl to modify current->comm

md-remove-md_flush_all.patch
md: remove md_flush_all

md-make-retry_list-non-global-in-raid1-and-multipath.patch
md: make retry_list non-global in raid1 and multipath

md-rationalise-issue_flush-function-in-md-personalities.patch
md: rationalise issue_flush function in md personalities

md-rationalise-unplug-functions-in-md.patch
md: rationalise unplug functions in md

md-make-sure-md-always-uses-rdev_dec_pending-properly.patch
md: make sure md always uses rdev_dec_pending properly

md-fix-two-little-bugs-in-raid10.patch
md: fix two little bugs in raid10

md-modify-locking-when-accessing-subdevices-in-md.patch
md: modify locking when accessing subdevices in md

blk-max_sectors-tunables.patch
blk: max_sectors tunables

generic-acl-support-for-permission.patch
generic acl support for ->permission

generic-acl-support-for-permission-fix.patch
generic acl support for ->permission fix

generic-acl-support-for-permission-keyfs-fix.patch
generic-acl-support-for-permission-keyfs-fix

device-driver-for-the-sgi-system-clock-mmtimer.patch
device driver for the SGI system clock, mmtimer

rtl8150-fix.patch
rtl8150 fix

close-race-with-preempt-and-modular-pm_idle-callbacks.patch
Close race with preempt and modular pm_idle callbacks

cacheline-align-pagevec-structure.patch
Adjust align pagevec structure

hvcs-fix-to-replace-yield-with-tty_wait_until_sent-in.patch
HVCS fix to replace yield with tty_wait_until_sent in hvcs_close

fbdev-remove-unnecessary-banshee_wait_idle-from-tdfxfb.patch
fbdev: remove unnecessary banshee_wait_idle from tdfxfb

fbdev-fix-logo-drawing-failure-for-vga16fb.patch
fbdev: fix logo drawing failure for vga16fb

fbdev-initialize-i810fb-after-agpgart.patch
fbdev: Initialize i810fb after agpgart

fbdev-fix-userland-compile-breakage.patch
fbdev: Fix userland compile breakage

fbcon-fix-setup-boot-options-of-fbcon.patch
fbcon: Fix setup boot options of fbcon

fbdev-pass-struct-device-to-class_simple_device_add.patch
fbdev: Pass struct device to class_simple_device_add

fbdev-add-tile-blitting-support.patch
fbdev: Add Tile Blitting support

fix-for-spurious-interrupts-on-e100-resume.patch
Fix for spurious interrupts on e100 resume

r8169-miscalculation-of-available-tx-descriptors.patch
r8169: miscalculation of available Tx descriptors

r8169-hint-for-tx-flow-control.patch
r8169: hint for Tx flow control

r8169-tso-support.patch
r8169: TSO support.

r8169-mac-identifier-extracted-from-realteks-driver-v22.patch
r8169: Mac identifier extracted from Realtek's driver v2.2

uml-remove-ghash.patch
uml: remove ghash.h

uml-eliminate-useless-thread-field.patch
uml: eliminate useless thread field

uml-fix-scheduler-race.patch
uml: fix scheduler race

uml-fix-binary-layout-assumption.patch
uml: fix binary layout assumption

uml-disable-pending-signals-across-a-reboot.patch
uml: disable pending signals across a reboot

uml-refer-to-config_usermode-not-to-config_um.patch
uml: refer to CONFIG_USERMODE, not to CONFIG_UM

uml-remove-commented-old-code-in-kconfig.patch
uml: remove commented old code in Kconfig

uml-smp-build-fix.patch
uml: smp build fix

uml-remove-config_uml_smp.patch
uml: remove CONFIG_UML_SMP

highmem-flushes.patch
block highmem flushes

add-support-for-word-length-uart-registers.patch
Add support for word-length UART registers

compile-fix-3c59x-for-eisa-without-pci.patch
compile fix 3c59x for eisa without pci

atomic_inc_return-for-i386.patch
atomic_inc_return() for i386

atomic_inc_return-for-x86_64.patch
atomic_inc_return() for x86_64

atomic_inc_return-for-arm.patch
atomic_inc_return() for arm

atomic_inc_return-for-arm26.patch
atomic_inc_return() for arm26

atomic_inc_return-for-sparc64.patch
atomic_inc_return() for sparc64

show-aggregate-per-process-counters-in-proc-pid-stat-2.patch
show aggregate per-process counters in /proc/PID/stat 2

fix-uninitialized-warnings-in-mempolicyc.patch
fix uninitialized warnings in mempolicy.c

online-cpu-with-maxcpus-option-panics.patch
Online CPU with maxcpus option panics

remove-dead-exports-from-fs-fat.patch
remove dead exports from fs/fat/

fat-use-hlist_head-for-fat_inode_hashtable-1-6.patch
FAT: use hlist_head for fat_inode_hashtable

fat-rewrite-the-cache-for-file-allocation-table-lookup.patch
FAT: rewrite the cache for file allocation table lookup

fat-cache-lock-from-per-sb-to-per-inode-3-6.patch
FAT: cache lock from per sb to per inode

fat-the-inode-hash-from-per-module-to-per-sb-4-6.patch
FAT: the inode hash from per module to per sb

fat-fix-the-race-bitween-fat_free-and-fat_get_cluster.patch
FAT: Fix the race between fat_free() and fat_get_cluster()

fat-remove-debug_pr-6-6.patch
FAT: remove debug_pr()

small-linux-hardirqh-tweaks.patch
small <linux/hardirq.h> tweaks

bsd-disklabel-handle-more-than-8-partitions.patch
BSD disklabel: handle more than 8 partitions

asm-softirqh-crept-back-in-h8300-and-sh64.patch
<asm/softirq.h> crept back in h8300 and sh64

mark-amiflop-non-unloadable.patch
mark amiflop non-unloadable

thinkpad-fnfx-key-driver.patch
thinkpad fn+fx key driver

netpoll-endian-fixes.patch
netpoll endian fixes

rewrite-alloc_pidmap.patch
pidhashing: rewrite alloc_pidmap()

missing-pci_disable_device.patch
missing pci_disable_device()

fbdev-radeonfb-remove-bogus-radeonfb_read-write.patch
fbdev/radeonfb: Remove bogus radeonfb_read/write

add-missing-pci_disable_device-for-pci-based-usb-hcd.patch
add missing pci_disable_device for PCI-based USB HCD

add-missing-pci_disable_device-for-e1000.patch
add missing pci_disable_device for e1000

next_thread-bug-fixes.patch
next_thread() BUG fixes
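The eliminate-bh-waitqueue-hashtable-fix entry above says that wait_on_bit_lock() must use test_and_set_bit(), not test_bit(). The minimal userspace sketch below shows why. It replays one fixed interleaving of two contenders and uses C11 atomics as a stand-in for the kernel's bitops. The `_sketch` helpers and `owners_with_*` functions are hypothetical illustrations, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit 0 of a flags word plays the role of the lock bit. This models the
 * kernel's bitops with C11 atomics; it is not the kernel implementation. */
#define LOCK_BIT 0UL

/* Like test_bit(): observes the bit but does not claim it. */
static bool test_bit_sketch(unsigned long nr, atomic_ulong *word)
{
    return (atomic_load(word) >> nr) & 1UL;
}

/* Like test_and_set_bit(): sets the bit and returns its old value in one
 * atomic step, so exactly one caller can observe "it was clear". */
static bool test_and_set_bit_sketch(unsigned long nr, atomic_ulong *word)
{
    return (atomic_fetch_or(word, 1UL << nr) >> nr) & 1UL;
}

/* Contenders A and B both check the bit before either sets it.
 * Returns how many of them believe they acquired the lock. */
static int owners_with_test_bit(void)
{
    atomic_ulong word = 0;
    bool a_busy = test_bit_sketch(LOCK_BIT, &word);
    bool b_busy = test_bit_sketch(LOCK_BIT, &word);
    atomic_fetch_or(&word, 1UL << LOCK_BIT); /* A sets the bit...   */
    atomic_fetch_or(&word, 1UL << LOCK_BIT); /* ...and so does B    */
    return !a_busy + !b_busy;                /* both think they own it */
}

/* Same interleaving, but the check and the set are one atomic operation. */
static int owners_with_test_and_set_bit(void)
{
    atomic_ulong word = 0;
    bool a_was_set = test_and_set_bit_sketch(LOCK_BIT, &word);
    bool b_was_set = test_and_set_bit_sketch(LOCK_BIT, &word);
    assert(atomic_load(&word) & (1UL << LOCK_BIT)); /* lock is now held */
    return !a_was_set + !b_was_set;
}
```

owners_with_test_bit() returns 2 and owners_with_test_and_set_bit() returns 1. A plain test followed by a separate set leaves a window in which both waiters see the bit clear.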




2004-09-13 09:22:43

by Nick Piggin

Subject: Re: 2.6.9-rc1-mm5

Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/

> +sched-trivial-sched-changes.patch
> +sched-add-cpu_down_prepare-notifier.patch
> +sched-integrate-cpu-hotplug-and-sched-domains.patch
> +sched-arch_destroy_sched_domains-warning-fix.patch
> +sched-sched-add-load-balance-flag.patch
> +sched-remove-disjoint-numa-domains-setup.patch
> +sched-make-domain-setup-overridable.patch
> +sched-make-domain-setup-overridable-rename.patch
> +sched-ia64-add-disjoint-numa-domain-support.patch
> +sched-fix-domain-debug-for-isolcpus.patch
> +sched-enable-sd_load_balance.patch
> +sched-hotplug-add-a-cpu_down_failed-notifier.patch
> +sched-use-cpu_down_failed-notifier.patch
> +sched-fixes-for-ia64-domain-setup.patch
>
> CPU scheduler work.
>

In particular, anyone who was having trouble with sched-domains and/or CPU
hotplug please test this.

It is supposed to fix all known issues, but some of the patches are fairly involved
and have not been tested on the problem hardware, so there could still be some bugs.
Please let me know if anything goes wrong.

Also, ia64 sched-domains setup is possibly still broken. If anyone boots this
on an Altix, please send over the full dmesg! Thanks.

2004-09-13 10:20:42

by Christoph Hellwig

Subject: Re: 2.6.9-rc1-mm5

> +lockmeter.patch
>
> Repaired lockmeter patch

This one is still needlessly messing around in procfs internals.

2004-09-13 10:49:29

by Rafael J. Wysocki

Subject: Re: 2.6.9-rc1-mm5

On Monday 13 of September 2004 10:50, Andrew Morton wrote:
>
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/

I can't build it on x86-64:

LD init/built-in.o
LD .tmp_vmlinux1
fs/built-in.o(.text+0xd1893): In function `mask_ok_common':
: undefined reference to `vfs_permission'
make: *** [.tmp_vmlinux1] Error 1

The .config is attached.

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"


Attachments:
(No filename) (636.00 B)
2.6.9-rc1-mm5.config (36.97 kB)

2004-09-13 11:01:28

by William Lee Irwin III

Subject: Re: 2.6.9-rc1-mm5

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> consolidate-bit-waiting-code-patterns.patch
> eliminate-bh-waitqueue-hashtable.patch
> eliminate-bh-waitqueue-hashtable-fix.patch
> eliminate-inode-waitqueue-hashtable.patch
> move-wait-ops-contention-case-completely-out-of-line.patch
> reduce-number-of-parameters-to-__wait_on_bit-and-__wait_on_bit_lock.patch
> document-wake_up_bits-requirement-for-preceding-memory-barriers.patch

For a general status update, suparna and I are working on the aio
integration with all this (well, thus far mostly suparna).


-- wli

2004-09-13 11:13:21

by Nikita Danilov

Subject: Re: 2.6.9-rc1-mm5

Rafael J. Wysocki writes:
> On Monday 13 of September 2004 10:50, Andrew Morton wrote:
> >
> > Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> I can't build it on x86-64:
>
> LD init/built-in.o
> LD .tmp_vmlinux1
> fs/built-in.o(.text+0xd1893): In function `mask_ok_common':
> : undefined reference to `vfs_permission'
> make: *** [.tmp_vmlinux1] Error 1

reiser4 wasn't updated during the vfs_permission -> generic_permission
conversion. Evil conspiracy is obviously underway.

Untested patch is below.

Andrew, please apply.

Nikita.
----------------------------------------------------------------------
--- perm.c 2004-05-17 14:04:55.000000000 +0400
+++ perm.c.new 2004-09-13 15:07:10.432547928 +0400
@@ -13,7 +13,7 @@
static int
mask_ok_common(struct inode *inode, int mask)
{
- return vfs_permission(inode, mask);
+ return generic_permission(inode, mask, NULL);
}

static int
----------------------------------------------------------------------
>
> The .config is attached.
>
> Greets,
> RJW
>
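Nikita's patch above replaces vfs_permission(inode, mask) with generic_permission(inode, mask, NULL), where the NULL means that no ACL callback is supplied. The code below is a rough userspace model of the mode-bit check such a call performs. It assumes simplified structures; `inode_sketch`, `cred_sketch` and `generic_permission_sketch` are hypothetical names. The real function also honours capabilities such as CAP_DAC_OVERRIDE and checks group membership with in_group_p(), and this sketch leaves both out.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Requested-access bits; the values mirror the kernel's MAY_* (1, 2, 4). */
#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

/* Hypothetical, stripped-down inode: only what the check needs. */
struct inode_sketch {
    uid_t uid;
    gid_t gid;
    unsigned int mode; /* e.g. 0640 */
};

/* Hypothetical caller credentials (no supplementary groups). */
struct cred_sketch {
    uid_t fsuid;
    gid_t fsgid;
};

/* Simplified model of generic_permission(inode, mask, NULL): pick the
 * owner, group or other rwx triplet and require every requested bit.
 * Capabilities, supplementary groups and ACLs are deliberately ignored. */
static int generic_permission_sketch(const struct inode_sketch *inode,
                                     const struct cred_sketch *cred, int mask)
{
    unsigned int mode = inode->mode;

    if (cred->fsuid == inode->uid)
        mode >>= 6;            /* owner bits */
    else if (cred->fsgid == inode->gid)
        mode >>= 3;            /* group bits */

    if ((mode & (unsigned int)mask & (MAY_READ | MAY_WRITE | MAY_EXEC))
        == (unsigned int)mask)
        return 0;
    return -EACCES;
}
```

For a 0640 file, the owner gets read and write, a group member gets read only, and everyone else is refused.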

2004-09-13 11:15:31

by Rafael J. Wysocki

Subject: Re: 2.6.9-rc1-mm5

Update:

On Monday 13 of September 2004 12:48, Rafael J. Wysocki wrote:
> On Monday 13 of September 2004 10:50, Andrew Morton wrote:
> >
> > Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> I can't build it on x86-64:
>
> LD init/built-in.o
> LD .tmp_vmlinux1
> fs/built-in.o(.text+0xd1893): In function `mask_ok_common':
> : undefined reference to `vfs_permission'
> make: *** [.tmp_vmlinux1] Error 1

It's reiser4, apparently:

CC fs/reiser4/plugin/security/perm.o
fs/reiser4/plugin/security/perm.c: In function `mask_ok_common':
fs/reiser4/plugin/security/perm.c:16: warning: implicit declaration of function `vfs_permission'

Greets,
RJW



2004-09-13 13:41:22

by Christoph Hellwig

Subject: Re: 2.6.9-rc1-mm5

just don't set ->permission for reiser4 and kill the whole perm_plugin
bullshit. This fixes the issue by removing a few hundred lines of code, which
is always a good idea.
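Christoph's suggestion relies on the VFS falling back to a generic check when a filesystem leaves ->permission unset. The sketch below illustrates that dispatch pattern. It assumes, as his remark implies, that a NULL hook selects the generic path. All `_sk` names are hypothetical, and the generic check is reduced to a single mode-bit test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical inode and operations table, reduced to the one hook. */
struct inode_sk;
struct inode_ops_sk {
    int (*permission)(struct inode_sk *inode, int mask);
};
struct inode_sk {
    const struct inode_ops_sk *i_op;
    unsigned int mode; /* "other" bits only, to keep the fallback tiny */
};

/* Hypothetical generic fallback: grant if every requested bit is set. */
static int generic_check_sk(struct inode_sk *inode, int mask)
{
    return ((inode->mode & (unsigned int)mask) == (unsigned int)mask)
        ? 0 : -EACCES;
}

/* Dispatch: use the filesystem's hook if it provides one, otherwise
 * fall back to the generic check. */
static int permission_sk(struct inode_sk *inode, int mask)
{
    if (inode->i_op && inode->i_op->permission)
        return inode->i_op->permission(inode, mask);
    return generic_check_sk(inode, mask);
}

/* A hook that only forwards to the generic check adds nothing: leaving
 * ->permission NULL yields the same answer with less code. */
static int forwarding_hook_sk(struct inode_sk *inode, int mask)
{
    return generic_check_sk(inode, mask);
}
```

With this structure, a plugin layer whose only job is to reach the generic check can be dropped by leaving the pointer NULL.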

2004-09-13 15:17:01

by Martin J. Bligh

Subject: Re: 2.6.9-rc1-mm5

OK, starfire and qlogicisp broke, plus some NUMA stuff in mm/mempolicy.c.
Full error log below. Config is (on ia32):

ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/config.numaq

The NUMA one is either cpusets-big-numa-cpu-and-memory-placement.patch
or create-nodemask_t.patch by the looks of it. The only thing touching
starfire is bk-netdev.patch, but as I get very similar errors from qlogicisp,
maybe someone's been futzing with readw/writew?

M.

drivers/net/starfire.c: In function `starfire_init_one':
drivers/net/starfire.c:924: warning: passing arg 1 of `readb' makes pointer from integer without a cast
drivers/net/starfire.c:930: warning: passing arg 1 of `readb' makes pointer from integer without a cast
drivers/net/starfire.c:935: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:937: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:940: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:944: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `mdio_read':
drivers/net/starfire.c:1087: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `mdio_write':
drivers/net/starfire.c:1100: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_open':
drivers/net/starfire.c:1123: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1124: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1162: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1169: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1177: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1179: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1180: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1181: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1182: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1183: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1185: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1189: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1196: warning: passing arg 2 of `writeb' makes pointer from integer without a cast
drivers/net/starfire.c:1199: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1200: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1201: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1205: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1206: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1207: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1213: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1215: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1217: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1219: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1232: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1238: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1240: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1241: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1257: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1260: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `tx_timeout':
drivers/net/starfire.c:1312: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `init_ring':
drivers/net/starfire.c:1356: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c: In function `start_tx':
drivers/net/starfire.c:1477: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `intr_handler':
drivers/net/starfire.c:1505: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1522: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1562: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1593: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `__netdev_rx':
drivers/net/starfire.c:1710: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c: In function `refill_rx_ring':
drivers/net/starfire.c:1779: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_media_change':
drivers/net/starfire.c:1839: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1841: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:1849: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_error':
drivers/net/starfire.c:1865: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `get_stats':
drivers/net/starfire.c:1891: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1892: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1893: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1895: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1895: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1896: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1898: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1898: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1901: warning: passing arg 1 of `readw' makes pointer from integer without a cast
drivers/net/starfire.c:1902: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1903: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1904: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1905: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:1906: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c: In function `set_rx_mode':
drivers/net/starfire.c:1961: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1962: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1963: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1967: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1968: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1969: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1990: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1991: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1992: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1995: warning: passing arg 2 of `writew' makes pointer from integer without a cast
drivers/net/starfire.c:1998: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c: In function `netdev_close':
drivers/net/starfire.c:2099: warning: passing arg 1 of `readl' makes pointer from integer without a cast
drivers/net/starfire.c:2106: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:2109: warning: passing arg 2 of `writel' makes pointer from integer without a cast
drivers/net/starfire.c:2110: warning: passing arg 1 of `readl' makes pointer from integer without a cast
mm/mempolicy.c: In function `get_zonemask':
mm/mempolicy.c:419: `maxnode' undeclared (first use in this function)
mm/mempolicy.c:419: (Each undeclared identifier is reported only once
mm/mempolicy.c:419: for each function it appears in.)
drivers/scsi/qlogicisp.c: In function `isp_inw':
drivers/scsi/qlogicisp.c:632: warning: passing arg 1 of `readw' makes pointer from integer without a cast
drivers/scsi/qlogicisp.c: In function `isp_outw':
drivers/scsi/qlogicisp.c:641: warning: passing arg 2 of `writew' makes pointer from integer without a cast

2004-09-13 15:20:34

by Paul Jackson

Subject: Re: 2.6.9-rc1-mm5

Martin wrote:
> The NUMA one is either cpusets-big-numa-cpu-and-memory-placement.patch
> or create-nodemask_t.patch by the looks of it

The numa one, with the following errors:

mm/mempolicy.c: In function `get_zonemask':
mm/mempolicy.c:419: error: `maxnode' undeclared (first use in this function)

is due to fix-abi-in-set_mempolicy.patch.

See my fix on lkml:

Subject: [PATCH] undo more numa maxnode confusions
Date: Mon, 13 Sep 2004 05:58:48 -0700

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2004-09-13 16:17:04

by Martin J. Bligh

Subject: Re: 2.6.9-rc1-mm5

--Paul Jackson <[email protected]> wrote (on Monday, September 13, 2004 08:18:41 -0700):

> Martin wrote:
>> The NUMA one is either cpusets-big-numa-cpu-and-memory-placement.patch
>> or create-nodemask_t.patch by the looks of it
>
> The numa one, with the following errors:
>
> mm/mempolicy.c: In function `get_zonemask':
> mm/mempolicy.c:419: error: `maxnode' undeclared (first use in this function)
>
> is due to fix-abi-in-set_mempolicy.patch.
>
> See my fix on lkml:
>
> Subject: [PATCH] undo more numa maxnode confusions
> Date: Mon, 13 Sep 2004 05:58:48 -0700

That worked - thanks.

The others seem only to be warnings, and are allegedly no worse than before,
so maybe it'll work now ;-)

M.

2004-09-13 16:24:15

by Paul Jackson

Subject: Re: 2.6.9-rc1-mm5

> so maybe it'll work now ;-)

It's not working for me: on a small ia64 SN2, it crashes during boot.
The culprit is somewhere between patches 32 and 42 of Andrew's broken-out set
of 436 patches ... I'm still in the binary search loop.
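Bisecting a broken-out series works like any binary search: boot with part of the series applied, keep whichever half still contains the crash, and repeat, so even the full 436-patch set needs at most nine boots. A minimal sketch in C, assuming the failure is monotonic (once some patch breaks the boot, every longer prefix stays broken); boots_ok, simulated_boot() and simulated_culprit are hypothetical stand-ins for building and booting the tree:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test: build with patches 1..n_applied, boot, report success. */
typedef bool (*boot_test_fn)(int n_applied);

/*
 * Return the first patch whose addition breaks the boot, given that
 * patches 1..good boot and patches 1..bad do not. Assumes that once the
 * boot breaks, every longer prefix of the series stays broken.
 */
int first_bad_patch(int good, int bad, boot_test_fn boots_ok)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (boots_ok(mid))
            good = mid;   /* culprit lies after mid */
        else
            bad = mid;    /* culprit is mid or earlier */
    }
    return bad;
}

/* Hypothetical simulation: pretend patch number simulated_culprit is bad. */
int simulated_culprit = 37;

bool simulated_boot(int n_applied)
{
    return n_applied < simulated_culprit;
}
```

For example, first_bad_patch(32, 42, simulated_boot) probes patches 37, 34, 35 and 36 and reports 37 as the culprit.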

2004-09-13 15:17:08

by Kirill Korotaev

Subject: Re: 2.6.9-rc1-mm5

--- ./fs/proc/array.c.nt 2004-09-13 18:56:17.000000000 +0400
+++ ./fs/proc/array.c 2004-09-13 19:13:03.749684712 +0400
@@ -338,6 +338,7 @@ static int do_task_stat(struct task_stru
spin_lock_irq(&task->sighand->siglock);
num_threads = atomic_read(&task->signal->count);
collect_sigign_sigcatch(task, &sigign, &sigcatch);
+ spin_unlock_irq(&task->sighand->siglock);

/* add up live thread stats at the group level */
if (whole) {
@@ -350,8 +351,6 @@ static int do_task_stat(struct task_stru
t = next_thread(t);
} while (t != task);
}
-
- spin_unlock_irq(&task->sighand->siglock);
}
if (task->signal) {
if (task->signal->tty) {


Attachments:
diff-task_stat-mm5 (662.00 B)

2004-09-13 17:25:22

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5

On Monday, September 13, 2004 2:22 am, Nick Piggin wrote:
> In particular, anyone who was having trouble with sched-domains and/or CPU
> hotplug please test this.
>
> It is supposed to fix all known issues, but some patches are fairly
> involved, and not having been tested on problem hardware, there could be
> still some bugs. Please let me know if anything goes bug.
>
> Also, ia64 sched-domains setup is possibly still broken. If anyone boots
> this on an Altix, please send over the full dmesg! Thanks.

Didn't you get my last mail about this? It looks like the version that made
its way to Andrew lacks the !defined(SD_NODE_INIT) check in sched.h. Here's the
dmesg from a 2p, 1-node box; I'll send out a more complete one later (unless
Paul beat me to it; I'm still only partway through my lkml mailbox).
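The override pattern referred to here works by having the generic header define its default only when no architecture header has already supplied one. A minimal sketch, assuming the arch definition is seen first; struct sd_params, its fields and the numeric values are made up for illustration and are not the real sched_domain initializer:

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for a scheduler-domain descriptor. */
struct sd_params {
    int min_interval;
    int max_interval;
    int busy_factor;
};

/* 1) An architecture header (assumed to be included first) provides its
 *    own initializer for large NUMA machines. Values are illustrative. */
#define SD_NODE_INIT { .min_interval = 8, .max_interval = 32, .busy_factor = 32 }

/* 2) The generic header supplies a fallback only if no override exists.
 *    Without this guard the generic #define would redefine the macro
 *    (GCC warns) and the generic values would win. */
#if !defined(SD_NODE_INIT)
#define SD_NODE_INIT { .min_interval = 1, .max_interval = 4, .busy_factor = 8 }
#endif

static struct sd_params node_domain = SD_NODE_INIT;

int node_domain_busy_factor(void)
{
    assert(node_domain.min_interval <= node_domain.max_interval);
    return node_domain.busy_factor;
}
```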

Thanks,
Jesse


Attachments:
(No filename) (808.00 B)
dmesg.txt (8.72 kB)

2004-09-13 18:06:50

by Paul Jackson

Subject: Re: 2.6.9-rc1-mm5

Jesse wrote:
> I'll send out a more complete one later (unless
> Paul beat me to it,

See my patch posted a few hours ago:

[Patch] Fix sched make domain setup overridable

2004-09-13 18:10:25

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5

On Monday, September 13, 2004 11:06 am, Paul Jackson wrote:
> Jesse wrote:
> > I'll send out a more complete one later (unless
> > Paul beat me to it,
>
> See my patch posted a few hours ago:
>
> [Patch] Fix sched make domain setup overridable

Yeah, I saw that, thanks. I meant a more complete dmesg (i.e. one for a
bigger system). I've got a 32p reserved for later today.

Jesse

2004-09-13 20:04:53

by Andrew Morton

Subject: Re: 2.6.9-rc1-mm5

Kirill Korotaev <[email protected]> wrote:
>
> Hello Andrew,
>
> Please replace patch next_thread-bug-fixes.patch in -mm5 tree with the
> last diff-next_thread I sent to you.

I was planning on replacing it with Ingo's patch.

--- linux/fs/proc/array.c.orig
+++ linux/fs/proc/array.c
@@ -356,7 +356,7 @@ static int do_task_stat(struct task_stru
stime = task->signal->stime;
}
}
- if (whole) {
+ if (whole && task->sighand) {

Is there some reason why your patch is better? If so, please do a full
resend.

> And it looks like thread loop in do_task_stat() doesn't require siglock
> lock, so you can add the patch attached to reduce lock area.

hm, OK.
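Kirill's change is the usual narrowing of a critical section: read what the lock protects, drop the lock, then do the longer walk outside it. A sketch of that shape, using a pthread mutex as a stand-in for siglock; struct task_model and its fields are hypothetical, and the assumption that the per-thread loop is safe without the lock is taken from Kirill's claim rather than checked here:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of the data do_task_stat() touches. */
struct task_model {
    pthread_mutex_t siglock;   /* stand-in for task->sighand->siglock */
    int num_threads;           /* protected by siglock */
    const long *thread_utime;  /* assumed safe to read without siglock */
    int n_listed;
};

long task_stat_utime(struct task_model *t, int *num_threads_out)
{
    long total = 0;

    pthread_mutex_lock(&t->siglock);
    *num_threads_out = t->num_threads;  /* needs the lock */
    pthread_mutex_unlock(&t->siglock);  /* drop it as early as possible */

    /* The summing loop now runs outside the critical section. */
    for (int i = 0; i < t->n_listed; i++)
        total += t->thread_utime[i];

    assert(total >= 0);
    return total;
}
```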

2004-09-13 20:30:25

by Pasi Savolainen

Subject: Re: 2.6.9-rc1-mm5

* Andrew Morton <[email protected]>:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/

Some badness with OHCI USB: USB devices just 'aren't there' for me.

- good -
Sep 9 18:51:58 tienel kernel: ohci_hcd 0000:02:00.0: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
Sep 9 18:51:58 tienel kernel: ohci_hcd 0000:02:00.0: irq 19, pci mem 0xf4000000
Sep 9 18:51:58 tienel kernel: ohci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 1
Sep 9 18:51:58 tienel kernel: hub 1-0:1.0: USB hub found
Sep 9 18:51:58 tienel kernel: hub 1-0:1.0: 4 ports detected
Sep 9 18:51:58 tienel kernel: USB Universal Host Controller Interface driver v2.2
Sep 9 18:51:58 tienel kernel: usb 1-1: new full speed USB device using address 2
- -

- bad (as in 2.6.9-rc1-mm5) -
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: irq 19, pci mem 0xf4000000
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 1
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: remove, state 0
Sep 13 23:01:19 tienel kernel: ohci_hcd 0000:02:00.0: USB bus 1 deregistered
Sep 13 23:01:19 tienel kernel: ohci_hcd: probe of 0000:02:00.0 failed with error -16
- -
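The final line reports the probe's return value as a raw negative errno; -16 is -EBUSY ("Device or resource busy"). A small helper for decoding such values in userspace, assuming a Linux C library, where the classic errno numbers match the kernel's; probe_error_text() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a kernel-style negative return code such as
 * the "-16" in the probe message back into readable text. */
const char *probe_error_text(int ret)
{
    assert(ret < 0);          /* kernel convention: failure is -errno */
    return strerror(-ret);
}
```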

--
Psi -- <http://www.iki.fi/pasi.savolainen>

2004-09-13 21:07:02

by Rafael J. Wysocki

Subject: Re: 2.6.9-rc1-mm5

On Monday 13 of September 2004 10:50, Andrew Morton wrote:
>
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> and will later appear at
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/

It does not compile on SMP x86-64 w/ NUMA:

CC arch/x86_64/ia32/ia32_ioctl.o
In file included from fs/compat_ioctl.c:63,
from arch/x86_64/ia32/ia32_ioctl.c:14:
include/linux/reiserfs_fs.h:441: error: redefinition of `struct key'
include/linux/reiserfs_fs.h: In function `le_key_k_offset':
include/linux/reiserfs_fs.h:608: error: structure has no member named `u'
include/linux/reiserfs_fs.h:609: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `le_key_k_type':
include/linux/reiserfs_fs.h:620: error: structure has no member named `u'
include/linux/reiserfs_fs.h:621: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_le_key_k_offset':
include/linux/reiserfs_fs.h:633: error: structure has no member named `u'
include/linux/reiserfs_fs.h:634: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_le_key_k_type':
include/linux/reiserfs_fs.h:647: error: structure has no member named `u'
include/linux/reiserfs_fs.h:648: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `cpu_key_k_offset':
include/linux/reiserfs_fs.h:677: error: structure has no member named `u'
include/linux/reiserfs_fs.h:678: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `cpu_key_k_type':
include/linux/reiserfs_fs.h:684: error: structure has no member named `u'
include/linux/reiserfs_fs.h:685: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_cpu_key_k_offset':
include/linux/reiserfs_fs.h:691: error: structure has no member named `u'
include/linux/reiserfs_fs.h:692: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `set_cpu_key_k_type':
include/linux/reiserfs_fs.h:699: error: structure has no member named `u'
include/linux/reiserfs_fs.h:700: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `cpu_key_k_offset_dec':
include/linux/reiserfs_fs.h:707: error: structure has no member named `u'
include/linux/reiserfs_fs.h:709: error: structure has no member named `u'
include/linux/reiserfs_fs.h: In function `le_key_version':
include/linux/reiserfs_fs.h:1869: error: structure has no member named `u'
make[1]: *** [arch/x86_64/ia32/ia32_ioctl.o] Error 1
make: *** [arch/x86_64/ia32] Error 2

The .config is available at:
http://www.sisk.pl/kernel/040913/2.6.9-rc1-mm5-NUMA.config
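The first error is the informative one: C struct tags share a single namespace per translation unit, so when two headers that each define struct key end up in the same file (here via fs/compat_ioctl.c), the compiler rejects the second definition, which is consistent with the later "no member named `u'" errors. A generic sketch of one common way out, giving one definition a subsystem-prefixed tag; all names and members below are invented, and this is not necessarily how the -mm tree resolved it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: another subsystem's header owns the generic tag. */
struct key {
    int serial;
};

/* Hypothetical filesystem key under a prefixed tag, so both definitions
 * can coexist in one translation unit. Layout is illustrative only. */
struct fs_disk_key {
    uint32_t dir_id;
    uint32_t objectid;
    union {
        uint64_t offset_v2;
        struct {
            uint32_t offset;
            uint32_t uniqueness;
        } v1;
    } u;
};

static inline uint32_t fs_key_dir(const struct fs_disk_key *k)
{
    assert(k != 0);
    return k->dir_id;
}
```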

Greets,
RJW

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2004-09-13 21:31:23

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5

On Monday, September 13, 2004 11:10 am, Jesse Barnes wrote:
> On Monday, September 13, 2004 11:06 am, Paul Jackson wrote:
> > Jesse wrote:
> > > I'll send out a more complete one later (unless
> > > Paul beat me to it,
> >
> > See my patch posted a few hours ago:
> >
> > [Patch] Fix sched make domain setup overridable
>
> Yeah, I saw that, thanks. I meant a more complete dmesg (i.e. one for a
> bigger system). I've got a 32p reserved for later today.

Here's one from a 32p, 16 node machine (captured while scsi was still coming
up, but you probably don't care about that).

Jesse


Attachments:
(No filename) (590.00 B)
dmesg.txt (25.33 kB)

2004-09-13 21:49:44

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5 scheduling while atomic

On an Altix with the default config (smp+preempt, see
arch/ia64/configs/sn2_defconfig), I'm getting this:

bad: scheduling while atomic!

Call Trace:
[<a000000100017380>] show_stack+0x80/0xa0
sp=e0001c3004adfc40 bsp=e0001c3004ad9098
[<a0000001006bcc70>] schedule+0x11f0/0x16a0
sp=e0001c3004adfe10 bsp=e0001c3004ad8f78
[<a000000100018530>] cpu_idle+0x5b0/0x620
sp=e0001c3004adfe30 bsp=e0001c3004ad8ee8
[<a000000100059a10>] start_secondary+0x2d0/0x300
sp=e0001c3004adfe30 bsp=e0001c3004ad8eb0
[<a000000100008580>] _start+0x260/0x290
sp=e0001c3004adfe30 bsp=e0001c3004ad8eb0

The messages began right after I logged out of an ssh session and haven't
stopped yet.
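The warning comes from a sanity check in schedule(): spin_lock() and preempt_disable() raise a per-thread preempt count, and entering the scheduler while that count is non-zero is reported as "scheduling while atomic". In this trace the check fires on the path from cpu_idle() into schedule(). A toy model of the check, with invented names and none of the kernel's details (the PREEMPT_ACTIVE bit, the separate hardirq/softirq fields):

```c
#include <assert.h>
#include <stdio.h>

/* Toy model, not kernel code: a single counter standing in for
 * preempt_count, plus a tally of warnings for testing. */
static int toy_preempt_count;
static int toy_warnings;

void toy_preempt_disable(void)
{
    toy_preempt_count++;
}

void toy_preempt_enable(void)
{
    assert(toy_preempt_count > 0);   /* unbalanced enable */
    toy_preempt_count--;
}

void toy_schedule(void)
{
    if (toy_preempt_count != 0) {
        toy_warnings++;
        fprintf(stderr, "bad: scheduling while atomic!\n");
    }
    /* ... a real scheduler would pick the next task here ... */
}
```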

Jesse

2004-09-13 21:56:55

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

Shortly after the backtrace I've already posted, I got one panic that looked
like this:

Warning: kfree_skb on hard IRQ a0000001006443d0
Unable to handle kernel paging request at virtual address 600000000001e8e0
Warning: kfree_skb on hard IRQ a0000001006443d0
Unable to handle kernel paging request at virtual address 600000000001e8e0
sshd[8790]: Oops 8804682956800 [1]
Modules linked in:

Pid: 8790, CPU 1, comm: sshd
psr : 0000101308526030 ifs : 80000028b0815428 ip : [<2000000000573670>]
Not tainted
ip is at 0x2000000000573670
unat: 0000000000000000 pfs : c000000000000288 rsc : 000000000000000f
rnat: 0000000000000000 bsps: 60000fff7fffc418 pr : 000000000001a529
ldrs: 0000000002100000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : 4000000000042010 b6 : 2000000000573520 b7 : 0000000000000000
f6 : 000000000000000000000 f7 : 000000000000000000000
f8 : 000000000000000000000 f9 : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1 : 2000000000684200 r2 : c000000000000288 r3 : 0000000000000001
r8 : 600000000001e8e0 r9 : 0000000000000000 r10 : 0000000000000000
r11 : 60000fffffffafa0 r12 : 60000fffffff7020 r13 : 20000000007392e0
r14 : 0000000000000000 r15 : 0000000000000006 r16 : 0000000005a6a5a9
r17 : 0000000000000000 r18 : 600000000001e8f0 r19 : 600000000001e8e0
r20 : 60000fffffff7040 r21 : 60000fffffff7050 r22 : 0000000000000010
r23 : 60000fff7fffc418 r24 : 0000000000000000 r25 : 0000000000000000
r26 : c00000000000038a r27 : 000000000000000f r28 : 2000000000617e20
r29 : 00001213085a6010 r30 : 60000fffffff7244 r31 : 600000000001eaf4
r32 : 0000000000000002 r33 : 0000000000000000 r34 : 200000000009ae00
r35 : 6000000000024b28 r36 : 6000000000024c10 r37 : 2000000000086610
r38 : c000000000000288 r39 : 6000000000024b20 r40 : 0000000000000002
r41 : 600000000001dcd0 r42 : 200000000009ae00 r43 : 0000000000000001
r44 : 0000000000000000 r45 : 0000000000000000 r46 : 0000000000000006
r47 : 0000000000000000 r48 : 2000000000083060 r49 : c00000000000048e
r50 : 6000000000024b20 r51 : 0000000000000002 r52 : 6000000000027db0
r53 : 6000000000024c38 r54 : 0000000000000002 r55 : 0000000000000000
r56 : 2000000000a647c0 r57 : 0000000000000000 r58 : 60000000000349d0
r59 : 2000000000a540d8 r60 : 60000fffffffaff8 r61 : 0000000000000000
r62 : 6000000000024b70 r63 : 2000000000082d90 r64 : c00000000000058f
r65 : 0000000005a5a969 r66 : 6000000000027e60 r67 : 0000000000000002
r68 : 0000000000000002 r69 : 0000000000000000 r70 : 200000000009ae00
r71 : 6000000000027e68
Kernel panic - not syncing: Aiee, killing interrupt handler!
Rebooting in 5 seconds..

The ip above is presumably in sshd, and the address in the warning message
falls somewhere inside tcp_recvmsg:

a0000001006434e0 T tcp_recvmsg
a000000100644760 t tcp_close_state
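Placing an address this way is a lookup for the last symbol starting at or below it: a0000001006443d0 lies between tcp_recvmsg (a0000001006434e0) and tcp_close_state (a000000100644760), so it is 0xef0 bytes into tcp_recvmsg. A sketch of that lookup over a System.map-style table sorted by address; struct ksym and sym_lookup() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical System.map entry: start address and symbol name. */
struct ksym {
    uint64_t addr;
    const char *name;
};

/*
 * tab must be sorted by ascending addr. Returns the symbol with the
 * largest start address <= addr (each symbol is assumed to extend up to
 * the next one), or NULL if addr precedes every entry.
 */
const struct ksym *sym_lookup(const struct ksym *tab, size_t n,
                              uint64_t addr, uint64_t *offset)
{
    size_t lo = 0, hi = n;   /* binary search over [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the count of entries starting at or below addr. */
    if (lo == 0)
        return NULL;
    if (offset)
        *offset = addr - tab[lo - 1].addr;
    return &tab[lo - 1];
}
```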

Is this a known problem?

Thanks,
Jesse

2004-09-13 22:39:19

by David Miller

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Mon, 13 Sep 2004 14:56:31 -0700
Jesse Barnes <[email protected]> wrote:

> Shortly after the backtrace I've already posted, I got one panic that looked
> like this:

Do you have PREEMPT enabled with VLAN? If so, that was fixed
recently; it was some buggy RCU locking in the VLAN code.

2004-09-13 22:46:33

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Monday, September 13, 2004 3:36 pm, David S. Miller wrote:
> On Mon, 13 Sep 2004 14:56:31 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > Shortly after the backtrace I've already posted, I got one panic that
> > looked like this:
>
> Do you have PREEMPT enabled with VLAN? If so, that's been fixed
> recently, it was some buggy RCU locking in the VLAN code.

Nope, VLAN isn't set:
[jbarnes@tomahawk linux-2.6.9-rc1-mm5]$ grep VLAN .config
# CONFIG_VLAN_8021Q is not set

2004-09-13 22:49:44

by David Miller

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Mon, 13 Sep 2004 15:44:07 -0700
Jesse Barnes <[email protected]> wrote:

> Nope, VLAN isn't set:
> [jbarnes@tomahawk linux-2.6.9-rc1-mm5]$ grep VLAN .config
> # CONFIG_VLAN_8021Q is not set

Hmmm, then that's a really strange backtrace. What networking
driver are you using?

2004-09-13 22:59:27

by Paul Jackson

Subject: Re: 2.6.9-rc1-mm5 scheduling while atomic

> The messages began right after I logged out of an ssh session and haven't

I got the same messages, on another Altix sn2_defconfig, after I had:

1) logged in as the only, root user,
2) played around a bit, then
3) issued a 'reboot' command.

Then, bam: lots of "scheduling while atomic!" complaints. The
shutdown did succeed, though, and that shut the complaints off.

2004-09-13 23:54:46

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Monday, September 13, 2004 3:47 pm, David S. Miller wrote:
> On Mon, 13 Sep 2004 15:44:07 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > Nope, VLAN isn't set:
> > [jbarnes@tomahawk linux-2.6.9-rc1-mm5]$ grep VLAN .config
> > # CONFIG_VLAN_8021Q is not set
>
> Hmmm, then that's a really strange backtrace. What networking
> driver are you using?

tg3. I saw one trace that included do_poll (iirc) and another last week that
had sys_select in it. I'll try to gather some more info.

Jesse

2004-09-13 23:58:01

by David Miller

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Mon, 13 Sep 2004 16:54:27 -0700
Jesse Barnes <[email protected]> wrote:

> tg3. I saw one trace that included do_poll (iirc) and another last week that
> had sys_select in it. I'll try to gather some more info.

What you're seeing might be due to the bug fixed by this patch:

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/09/13 12:58:04-07:00 [email protected]
# [NET]: Fix missing spin lock in lltx path.
#
# This fixes a silly missing spin lock in the relock path. For some
# reason it seems to still work when you don't have spinlock debugging
# enabled.
#
# Please apply.
#
# Thanks to Arjan's spinlock debug kernel for finding it.
#
# Signed-off-by: Andi Kleen <[email protected]>
# Signed-off-by: David S. Miller <[email protected]>
#
# net/sched/sch_generic.c
# 2004/09/13 12:57:46-07:00 [email protected] +3 -1
# [NET]: Fix missing spin lock in lltx path.
#
# This fixes a silly missing spin lock in the relock path. For some
# reason it seems to still work when you don't have spinlock debugging
# enabled.
#
# Please apply.
#
# Thanks to Arjan's spinlock debug kernel for finding it.
#
# Signed-off-by: Andi Kleen <[email protected]>
# Signed-off-by: David S. Miller <[email protected]>
#
diff -Nru a/net/sched/sch_generic.c b/net/sched/sch_generic.c
--- a/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
+++ b/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
@@ -148,8 +148,10 @@
spin_lock(&dev->queue_lock);
return -1;
}
- if (ret == NETDEV_TX_LOCKED && nolock)
+ if (ret == NETDEV_TX_LOCKED && nolock) {
+ spin_lock(&dev->queue_lock);
goto collision;
+ }
}

/* NETDEV_TX_BUSY - we need to requeue */
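The bug fixed here is a lock-balance error: the queue lock is released around the driver's transmit call, and the NETDEV_TX_LOCKED path jumped to `collision` without re-acquiring it, although (judging by the fix) the code there expects it held. A toy model of the invariant "every path leaves the lock held", using an invented lock type that asserts on misuse rather than a real spinlock; all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock that only records whether it is held. */
struct toy_lock {
    bool held;
};

static void toy_spin_lock(struct toy_lock *l)
{
    assert(!l->held);   /* taking it twice would deadlock a real spinlock */
    l->held = true;
}

static void toy_spin_unlock(struct toy_lock *l)
{
    assert(l->held);    /* releasing a lock nobody holds */
    l->held = false;
}

enum toy_tx { TOY_TX_OK, TOY_TX_LOCKED, TOY_TX_BUSY };

/* Entered with the queue lock held; every path must leave it held. */
int toy_xmit_one(struct toy_lock *queue, enum toy_tx driver_says,
                 int *collisions)
{
    toy_spin_unlock(queue);          /* driver runs without the queue lock */

    if (driver_says == TOY_TX_OK) {
        toy_spin_lock(queue);
        return 1;                    /* transmitted */
    }
    if (driver_says == TOY_TX_LOCKED) {
        toy_spin_lock(queue);        /* the re-lock the patch adds */
        goto collision;
    }

    toy_spin_lock(queue);            /* TOY_TX_BUSY: requeue under the lock */
    return 0;

collision:
    assert(queue->held);             /* collision handling expects the lock */
    (*collisions)++;
    return 0;
}
```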

2004-09-14 00:03:59

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Monday, September 13, 2004 4:55 pm, David S. Miller wrote:
> On Mon, 13 Sep 2004 16:54:27 -0700
>
> Jesse Barnes <[email protected]> wrote:
> > tg3. I saw one trace that included do_poll (iirc) and another last week
> > that had sys_select in it. I'll try to gather some more info.
>
> What you're seeing might be due to the bug fixed by this patch:

> spin_lock(&dev->queue_lock);
> return -1;
> }
> - if (ret == NETDEV_TX_LOCKED && nolock)
> + if (ret == NETDEV_TX_LOCKED && nolock) {
> + spin_lock(&dev->queue_lock);
> goto collision;
> + }
> }
>
> /* NETDEV_TX_BUSY - we need to requeue */

Ok, I guess that would explain why I haven't seen this in 2.6.9-rc2. I was
getting my backtraces confused too--I've only seen this one for this bug.
I'll keep an eye out and report anything I see with the latest bk tree.

Thanks,
Jesse

2004-09-14 00:23:06

by David Miller

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Mon, 13 Sep 2004 17:03:48 -0700
Jesse Barnes <[email protected]> wrote:

> On Monday, September 13, 2004 4:55 pm, David S. Miller wrote:
> > On Mon, 13 Sep 2004 16:54:27 -0700
> >
> > Jesse Barnes <[email protected]> wrote:
> > > tg3. I saw one trace that included do_poll (iirc) and another last week
> > > that had sys_select in it. I'll try to gather some more info.
> >
> > What you're seeing might be due to the bug fixed by this patch:
..
> Ok, I guess that would explain why I haven't seen this in 2.6.9-rc2. I was
> getting my backtraces confused too--I've only seen this one for this bug.
> I'll keep an eye out and report anything I see with the latest bk tree.

The patch isn't in the tree yet, so you would still see the problem in
2.6.9-rc2.

Please try to get a clean backtrace with a current tree plus
the patch I posted, and I'll scratch my head some more.
:-)

2004-09-14 00:27:02

by James Morris

Subject: Re: 2.6.9-rc1-mm5: TCP oopses

I'm experiencing TCP-related oopses with this kernel (not seen in -mm4);
.config file attached.

Here are two backtraces: the first happened a few seconds after logging
in via ssh, and the second happened soon after boot (using selinux=0, just to
make sure).

Oops #1:
-----------

KERNEL: assertion (!skb_queue_empty(&sk->sk_write_queue)) failed at net/ipv4/tcp_timer.c (322)
Unable to handle kernel NULL pointer dereference at virtual address 00000048
printing eip:
c03022c2
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: ipv6 e1000 3c59x ac
CPU: 0
EIP: 0060:[<c03022c2>] Not tainted VLI
EFLAGS: 00010246 (2.6.9-rc1-mm5)
EIP is at tcp_retransmit_skb+0x89/0x340
eax: 00000000 ebx: 00000000 ecx: f7718960 edx: 00000000
esi: f740c2a0 edi: f740c0a8 ebp: c0460f64 esp: c0460f48
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0460000 task=c039dac0)
Stack: f740c0a8 00000000 0000056e f740c2a0 f740c0a8 f740c2a0 f740c10c c0460fa0
c03044b2 c0387ed4 c038901c c038615b 00000142 c0460fb8 f888bb2f f709a778
f70791c0 c181110c 00000001 f740c0a8 f740c2a0 f740c0c8 c0460fb8 c03048af
Call Trace:
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c03044b2>] tcp_retransmit_timer+0xe9/0x434
[<c03048af>] tcp_write_timer+0xb2/0xcd
[<c01249c0>] run_timer_softirq+0xbf/0x17f
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c0112535>] smp_apic_timer_interrupt+0xf2/0xf4
[<c01066ca>] apic_timer_interrupt+0x1a/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
=======================
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c03044b2>] tcp_retransmit_timer+0xe9/0x434
[<c03048af>] tcp_write_timer+0xb2/0xcd
[<c01249c0>] run_timer_softirq+0xbf/0x17f
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c0112535>] smp_apic_timer_interrupt+0xf2/0xf4
[<c01066ca>] apic_timer_interrupt+0x1a/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
Code: 89 45 ec 8b 47 78 be f5 ff ff ff 89 c2 c1 fa 02 01 d0 8b 97 84 00 00 00 39 c2 0f 4f d0 8b 47 60 39 d0 0f 8f b3 01 00 00 8b 75 f0 <8b> 53 48 8b 4e 10 39 ca 79 5c 39 4b 4c 79 08 0f 0b c3 03 14 61
<0>Kernel panic - not syncing: Fatal exception in interrupt



Oops #2:
-----------

(gdb) l *0xc02fac2c
0xc02fac2c is in tcp_time_to_recover (net/ipv4/tcp_input.c:1352).

1350 static inline int tcp_skb_timedout(struct tcp_opt *tp, struct sk_buff *skb)
1351 {
1352 return (tcp_time_stamp - TCP_SKB_CB(skb)->when > tp->rto);
1353 }
1354


Unable to handle kernel NULL pointer dereference at virtual address 00000050
printing eip:
c02fac2c
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: ipv6 e1000 3c59x ac
CPU: 0
EIP: 0060:[<c02fac2c>] Not tainted VLI
EFLAGS: 00010246 (2.6.9-rc1-mm5)
EIP is at tcp_time_to_recover+0x1d0/0x214
eax: fffcc289 ebx: f77a6320 ecx: 00000002 edx: 00000000
esi: 00000003 edi: f77a6128 ebp: c0460ddc esp: c0460dc4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0460000 task=c039dac0)
Stack: 00000246 fffcc3b1 00000001 f77a6320 00000000 49a2fa4f c0460e20 c02fb752
c0460e20 c02fc1b1 00000000 00010800 49a2fa4f 037a6320 00000001 00000000
00000106 00000004 49a2f4d3 f77a6128 00000003 f77a6320 49a2fa4f c0460e60
Call Trace:
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c02fb752>] tcp_fastretrans_alert+0x146/0x6ed
[<c02fca42>] tcp_ack+0x260/0x5df
[<c02ff67e>] tcp_rcv_established+0x5d0/0x868
[<c0308265>] tcp_v4_do_rcv+0x101/0x103
[<c0308a73>] tcp_v4_rcv+0x80c/0x920
[<c02ed407>] ip_local_deliver+0xa0/0x26d
[<c02edb43>] ip_rcv+0x381/0x4f9
[<c02da8e3>] netif_receive_skb+0x1f7/0x224
[<c02da995>] process_backlog+0x85/0x135
[<c02daacb>] net_rx_action+0x86/0x136
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c01089ed>] do_IRQ+0x185/0x1cf
[<c0106648>] common_interrupt+0x18/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
=======================
[<c0106b21>] show_stack+0x7a/0x90
[<c0106ca2>] show_registers+0x152/0x1ca
[<c0106ea9>] die+0x100/0x186
[<c0115809>] do_page_fault+0x2dc/0x5d0
[<c0106765>] error_code+0x2d/0x38
[<c02fb752>] tcp_fastretrans_alert+0x146/0x6ed
[<c02fca42>] tcp_ack+0x260/0x5df
[<c02ff67e>] tcp_rcv_established+0x5d0/0x868
[<c0308265>] tcp_v4_do_rcv+0x101/0x103
[<c0308a73>] tcp_v4_rcv+0x80c/0x920
[<c02ed407>] ip_local_deliver+0xa0/0x26d
[<c02edb43>] ip_rcv+0x381/0x4f9
[<c02da8e3>] netif_receive_skb+0x1f7/0x224
[<c02da995>] process_backlog+0x85/0x135
[<c02daacb>] net_rx_action+0x86/0x136
[<c0120f24>] __do_softirq+0x64/0xd2
[<c01091aa>] do_softirq+0x47/0x4f
[<c01089ed>] do_IRQ+0x185/0x1cf
[<c0106648>] common_interrupt+0x18/0x20
[<c0103e97>] cpu_idle+0x38/0x5a
[<c042f85a>] start_kernel+0x196/0x1d5
[<c0100211>] 0xc0100211
Code: 83 c4 0c 5b 5e 5f 5d c3 8b 92 7c 01 00 00 83 c2 01 e9 7a fe ff ff 8d 47 64 8b 57 64 39 c2 b8 00 00 00 00 0f 44 d0 a1 a0 f5 39 c0 <2b> 42 50 3b 83 94 00 00 00 77 c7 e9 7b fe ff ff c7 45 f0 00 00
<0>Kernel panic - not syncing: Fatal exception in interrupt


--
James Morris
<[email protected]>


Attachments:
config.txt (25.58 kB)

2004-09-14 02:02:34

by Nick Piggin

Subject: Re: 2.6.9-rc1-mm5

Jesse Barnes wrote:
> On Monday, September 13, 2004 11:10 am, Jesse Barnes wrote:
>
>>On Monday, September 13, 2004 11:06 am, Paul Jackson wrote:
>>
>>>Jesse wrote:
>>>
>>>>I'll send out a more complete one later (unless
>>>>Paul beat me to it,

Sorry, I actually did read your mail about the SD_NODE_INIT thing. It
slipped my mind :(

>>>
>>>See my patch posted a few hours ago:
>>>
>>> [Patch] Fix sched make domain setup overridable
>>
>>Yeah, I saw that, thanks. I meant a more complete dmesg (i.e. one for a
>>bigger system). I've got a 32p reserved for later today.
>
>
> Here's one from a 32p, 16 node machine (captured while scsi was still coming
> up, but you probably don't care about that).
>

OK, in that case you'll also need the attached patch.
Sigh. We'll get there one day.


Attachments:
ia64-make-node-balance.patch (709.00 B)

2004-09-14 02:13:35

by David Miller

Subject: Re: 2.6.9-rc1-mm5: TCP oopses

On Mon, 13 Sep 2004 20:25:38 -0400 (EDT)
James Morris <[email protected]> wrote:

> I'm experiencing TCP related oopses with this kernel (not seen in -mm4),
> .config file attached.
>
> Here are two backtraces, the first happened a few seconds after logging
> in via ssh, the second happened soon after boot (using selinux=0, just to
> make sure).

I think I fixed this one yesterday. Callers of tcp_fragment()
in tcp_output.c were not accounting packets correctly. I
believe the patch below will fix it, and it is already in Linus's
tree.

I guess you have an e1000 in this box? :)
(either that or some other card whose driver
enables TSO by default)

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/09/10 15:21:43-07:00 [email protected]
# [TCP]: Fix packet counting when fragmenting already sent packets.
#
# Calls to tcp_fragment() change the tso_factor of
# an SKB, so we need to deal with that.
#
# Signed-off-by: David S. Miller <[email protected]>
#
# net/ipv4/tcp_output.c
# 2004/09/10 15:21:13-07:00 [email protected] +12 -2
# [TCP]: Fix packet counting when fragmenting already sent packets.
#
# Calls to tcp_fragment() change the tso_factor of
# an SKB, so we need to deal with that.
#
# Signed-off-by: David S. Miller <[email protected]>
#
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c 2004-09-13 18:51:38 -07:00
+++ b/net/ipv4/tcp_output.c 2004-09-13 18:51:38 -07:00
@@ -681,8 +681,12 @@
TCP_SKB_CB(skb)->when = tcp_time_stamp;
if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC)))
break;
- /* Advance the send_head. This one is sent out. */
+
+ /* Advance the send_head. This one is sent out.
+ * This call will increment packets_out.
+ */
update_send_head(sk, tp, skb);
+
tcp_minshall_update(tp, mss_now, skb);
sent_pkts = 1;
}
@@ -968,11 +972,17 @@
return -EAGAIN;

if (skb->len > cur_mss) {
+ int old_factor = TCP_SKB_CB(skb)->tso_factor;
+ int new_factor;
+
if (tcp_fragment(sk, skb, cur_mss))
return -ENOMEM; /* We'll try again later. */

/* New SKB created, account for it. */
- tcp_inc_pcount(&tp->packets_out, skb);
+ new_factor = TCP_SKB_CB(skb)->tso_factor;
+ tcp_dec_pcount_explicit(&tp->packets_out,
+ new_factor - old_factor);
+ tcp_inc_pcount(&tp->packets_out, skb->next);
}

/* Collapse two adjacent packets if worthwhile and we can. */
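The accounting rule behind this fix can be stated as an invariant: packets_out should equal the sum of the packet counts (tso_factor) of the segments in flight, and tcp_fragment() can change those counts, so a split must remove the old segment's contribution and add both pieces. A sketch of that invariant only, with invented names and a simplified ceil(len / mss) packet count; it does not reproduce tcp_inc_pcount() and friends line for line:

```c
#include <assert.h>

/* Hypothetical toy segment: len bytes, sent as pcount wire packets. */
struct toy_seg {
    unsigned int len;
    unsigned int pcount;
};

/* Simplified packet count: ceil(len / mss). */
static unsigned int pcount_for(unsigned int len, unsigned int mss)
{
    return (len + mss - 1) / mss;
}

/* Split an already-counted segment after cur_mss bytes, keeping
 * *packets_out equal to the sum of pcounts in flight. Returns the tail. */
struct toy_seg toy_fragment(struct toy_seg *seg, unsigned int cur_mss,
                            unsigned int mss, unsigned int *packets_out)
{
    struct toy_seg tail;
    unsigned int old_factor = seg->pcount;

    assert(seg->len > cur_mss);
    tail.len = seg->len - cur_mss;
    tail.pcount = pcount_for(tail.len, mss);
    seg->len = cur_mss;
    seg->pcount = pcount_for(seg->len, mss);

    /* Remove the old contribution, add both halves. */
    assert(*packets_out >= old_factor);
    *packets_out = *packets_out - old_factor + seg->pcount + tail.pcount;
    return tail;
}
```

Splitting a 2000-byte segment at 500 bytes with a 1000-byte MSS turns 2 packets into 1 + 2, so the total in flight really does change.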

2004-09-14 02:28:05

by William Lee Irwin III

Subject: [pidhashing] [0/3] pid allocator updates

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.

The following 3 updates address various issues raised with me in
unrelated threads or messages; while none of them is particularly
pressing, each resolves a concern I've deemed valid.


-- wli

2004-09-14 02:34:19

by William Lee Irwin III

Subject: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Mon, Sep 13, 2004 at 07:28:27PM -0700, William Lee Irwin III wrote:
> I was informed that the vendor component of the copyright can't be
> clobbered without more care, so this patch retains the older vendor,
> updating it only to reflect the appropriate time period.

/proc/ breaks when PID_MAX_LIMIT is elevated on 32-bit, so this patch
lowers it there. Compiletested on x86-64.


Index: mm5-2.6.9-rc1/include/linux/threads.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/threads.h 2004-08-13 22:36:12.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/threads.h 2004-09-13 16:28:38.791798576 -0700
@@ -30,6 +30,6 @@
/*
* A maximum of 4 million PIDs should be enough for a while:
*/
-#define PID_MAX_LIMIT (4*1024*1024)
+#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)

#endif

2004-09-14 02:41:26

by William Lee Irwin III

Subject: [pidhashing] [3/3] enforce PID_MAX_LIMIT in sysctls

On Mon, Sep 13, 2004 at 07:31:14PM -0700, William Lee Irwin III wrote:
> /proc/ breaks when PID_MAX_LIMIT is elevated on 32-bit, so this patch
> lowers it there. Compiletested on x86-64.

The pid_max sysctl enforces neither PID_MAX_LIMIT nor a sane lower bound.
RESERVED_PIDS + 1 is the minimum pid_max that won't break alloc_pidmap(),
and PID_MAX_LIMIT may not be a multiple of 8*PAGE_SIZE for unusual
values of PAGE_SIZE, so this also rounds PID_MAX_LIMIT up to the next
multiple of 8*PAGE_SIZE when computing PIDMAP_ENTRIES.
Compiletested on x86-64.

Index: mm5-2.6.9-rc1/kernel/pid.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/pid.c 2004-09-13 16:30:21.980111568 -0700
+++ mm5-2.6.9-rc1/kernel/pid.c 2004-09-13 16:33:06.324127480 -0700
@@ -36,7 +36,10 @@

#define RESERVED_PIDS 300

-#define PIDMAP_ENTRIES (PID_MAX_LIMIT/PAGE_SIZE/8)
+int pid_max_min = RESERVED_PIDS + 1;
+int pid_max_max = PID_MAX_LIMIT;
+
+#define PIDMAP_ENTRIES ((PID_MAX_LIMIT + 8*PAGE_SIZE - 1)/PAGE_SIZE/8)
#define BITS_PER_PAGE (PAGE_SIZE*8)
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE-1)
#define mk_pid(map, off) (((map) - pidmap_array)*BITS_PER_PAGE + (off))
Index: mm5-2.6.9-rc1/kernel/sysctl.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/sysctl.c 2004-09-13 16:27:44.621033784 -0700
+++ mm5-2.6.9-rc1/kernel/sysctl.c 2004-09-13 16:40:46.358191672 -0700
@@ -68,6 +68,7 @@
extern int sched_base_timeslice;
extern int sched_min_base;
extern int sched_max_base;
+extern int pid_max_min, pid_max_max;

#if defined(CONFIG_X86_LOCAL_APIC) && defined(__i386__)
int unknown_nmi_panic;
@@ -577,7 +578,10 @@
.data = &pid_max,
.maxlen = sizeof (int),
.mode = 0644,
- .proc_handler = &proc_dointvec,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = sysctl_intvec,
+ .extra1 = &pid_max_min,
+ .extra2 = &pid_max_max,
},
{
.ctl_name = KERN_PANIC_ON_OOPS,
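
PIDMAP_ENTRIES counts bitmap pages, each covering 8*PAGE_SIZE pids. With truncating division, a limit that is smaller than (or not a multiple of) 8*PAGE_SIZE loses its last partial page; for instance, a hypothetical 32768-pid limit with 64 KiB pages yields zero pages. The user-space sketch below compares the two formulas and adds a hypothetical bounds check mirroring the [extra1, extra2] range that proc_dointvec_minmax is given here. The helper names are illustrative only, not kernel functions.

```c
#include <assert.h>

/* Original PIDMAP_ENTRIES arithmetic: truncates a partial last page. */
static unsigned long pidmap_entries_trunc(unsigned long limit,
					  unsigned long page_size)
{
	return limit / page_size / 8;
}

/* Patched arithmetic: round limit up to a multiple of 8*page_size first. */
static unsigned long pidmap_entries_roundup(unsigned long limit,
					    unsigned long page_size)
{
	return (limit + 8 * page_size - 1) / page_size / 8;
}

/* Hypothetical helper: does a requested pid_max lie within [min, max]? */
static int pid_max_in_range(int val, int min, int max)
{
	return val >= min && val <= max;
}
```

With 4 KiB pages and a 4M limit both formulas give 128 pages; the rounding only matters when the limit is not already page-aligned.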

2004-09-14 02:43:09

by William Lee Irwin III

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Mon, Sep 13, 2004 at 07:31:14PM -0700, William Lee Irwin III wrote:
> /proc/ breaks when PID_MAX_LIMIT is elevated on 32-bit, so this patch
> lowers it there. Compiletested on x86-64.
[...]
> -#define PID_MAX_LIMIT (4*1024*1024)
> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)

Index: mm5-2.6.9-rc1/include/linux/threads.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/threads.h 2004-08-13 22:36:12.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/threads.h 2004-09-13 19:30:47.552374432 -0700
@@ -30,6 +30,6 @@
/*
* A maximum of 4 million PIDs should be enough for a while:
*/
-#define PID_MAX_LIMIT (4*1024*1024)
+#define PID_MAX_LIMIT (sizeof(long) > 4 ? 4*1024*1024 : PID_MAX_DEFAULT)

#endif
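
The difference between the two postings can be checked in user space. sizeof yields a size in bytes, so `sizeof(long) > 32` is false for any realistic long and the first version always selects PID_MAX_DEFAULT; comparing against 4 (or, equivalently on machines with 8-bit bytes, comparing `sizeof(long) * CHAR_BIT` against 32) selects the large limit only where long is 64 bits. PID_MAX_DEFAULT is assumed to be 0x8000 here, as in 2.6-era include/linux/threads.h.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: matches the 2.6-era definition in include/linux/threads.h. */
#define PID_MAX_DEFAULT 0x8000

/* First posting: compares a byte count against a bit count. */
#define PID_MAX_LIMIT_V1 (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
/* Corrected posting: 64-bit longs are 8 bytes. */
#define PID_MAX_LIMIT_V2 (sizeof(long) > 4 ? 4*1024*1024 : PID_MAX_DEFAULT)
/* Same test spelled in bits. */
#define PID_MAX_LIMIT_V3 \
	(sizeof(long) * CHAR_BIT > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)

static long pid_max_limit_v1(void) { return PID_MAX_LIMIT_V1; }
static long pid_max_limit_v2(void) { return PID_MAX_LIMIT_V2; }
static long pid_max_limit_v3(void) { return PID_MAX_LIMIT_V3; }
```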

2004-09-14 02:34:20

by William Lee Irwin III

Subject: [pidhashing] [1/3] retain older vendor copyright

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
>> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>> and will later appear at
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
>> Please check kernel.org before using zip.com.au.

On Mon, Sep 13, 2004 at 07:25:30PM -0700, William Lee Irwin III wrote:
> The following 3 updates address various issues expressed to me in
> unrelated threads or messages, and while none of them are particularly
> pressing each does resolve a concern I've deemed valid.

I was informed that the vendor component of the copyright can't be
clobbered without more care, so this patch retains the older vendor,
updating it only to reflect the appropriate time period.

Index: mm5-2.6.9-rc1/kernel/pid.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/pid.c 2004-09-13 16:27:52.608819456 -0700
+++ mm5-2.6.9-rc1/kernel/pid.c 2004-09-13 16:30:21.980111568 -0700
@@ -1,7 +1,8 @@
/*
* Generic pidhash and scalable, time-bounded PID allocator
*
- * (C) 2002-2004 William Irwin, Oracle
+ * (C) 2002-2003 William Irwin, IBM
+ * (C) 2004 William Irwin, Oracle
* (C) 2002-2004 Ingo Molnar, Red Hat
*
* pid-structures are backing objects for tasks sharing a given ID to chain

2004-09-14 02:17:02

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5

On Monday, September 13, 2004 7:02 pm, Nick Piggin wrote:
> Sorry, I actually did read your mail about the SD_NODE_INIT thing. It
> slipped my mind :(

Ok, just wanted to make sure I hadn't been spamlisted or something :)

> OK, in that case you'll also need the attached patch.
> Sigh. We'll get there one day.

Ok, I'll give it a try.

Jesse

2004-09-14 02:57:49

by William Lee Irwin III

Subject: [procfs] [1/1] fix task_mmu.c text size reporting

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.

Not all binfmts page align ->end_code and ->start_code, so the task_mmu
statistics calculations need to perform this allocation themselves.

Index: mm5-2.6.9-rc1/fs/proc/task_mmu.c
===================================================================
--- mm5-2.6.9-rc1.orig/fs/proc/task_mmu.c 2004-09-13 16:27:35.915357248 -0700
+++ mm5-2.6.9-rc1/fs/proc/task_mmu.c 2004-09-13 19:43:19.681033496 -0700
@@ -9,7 +9,7 @@
unsigned long data, text, lib;

data = mm->total_vm - mm->shared_vm - mm->stack_vm;
- text = (mm->end_code - mm->start_code) >> 10;
+ text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
buffer += sprintf(buffer,
"VmSize:\t%8lu kB\n"
@@ -36,7 +36,8 @@
int *data, int *resident)
{
*shared = mm->shared_vm;
- *text = (mm->end_code - mm->start_code) >> PAGE_SHIFT;
+ *text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
+ >> PAGE_SHIFT;
*data = mm->total_vm - mm->shared_vm - *text;
*resident = mm->rss;
return mm->total_vm;
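
A worked example of the text-size change, assuming 4 KiB pages and hypothetical start_code/end_code values that are not page aligned: the raw byte difference reports 7 kB, while the page-aligned form counts the three pages the text actually spans, 12 kB. The MODEL_* macros are local stand-ins for PAGE_SHIFT, PAGE_MASK and PAGE_ALIGN.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the kernel takes these from the architecture. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))
#define MODEL_PAGE_ALIGN(addr) (((addr) + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK)

/* Text size in kB before the patch: raw byte difference. */
static unsigned long text_kb_raw(unsigned long start_code,
				 unsigned long end_code)
{
	return (end_code - start_code) >> 10;
}

/* Text size in kB after the patch: whole pages spanned by the text. */
static unsigned long text_kb_aligned(unsigned long start_code,
				     unsigned long end_code)
{
	return (MODEL_PAGE_ALIGN(end_code) - (start_code & MODEL_PAGE_MASK)) >> 10;
}
```

For already page-aligned values both forms agree, which is why the discrepancy only shows up with binfmts that leave the boundaries unaligned.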

2004-09-14 02:58:29

by William Lee Irwin III

Subject: Re: [procfs] [1/1] fix task_mmu.c text size reporting

On Mon, Sep 13, 2004 at 07:53:04PM -0700, William Lee Irwin III wrote:
> Not all binfmts page align ->end_code and ->start_code, so the task_mmu
> statistics calculations need to perform this allocation themselves.

s/allocation/alignment/


-- wli

2004-09-14 03:07:40

by James Morris

Subject: Re: 2.6.9-rc1-mm5: TCP oopses

On Mon, 13 Sep 2004, David S. Miller wrote:

> I think I fixed this one yesterday. Callers of tcp_fragment()
> in tcp_output.c were not accounting packets correctly. I
> believe this is what will fix it, and this is in Linus's
> tree already.

This patch is also in -mm5 (linus.patch), and the oopses go away when I
back it out.

> I guess you have an e1000 in this box? :)

Yes.


- James
--
James Morris
<[email protected]>


2004-09-14 03:34:43

by Herbert Xu

Subject: Re: 2.6.9-rc1-mm5: TCP oopses

David S. Miller <[email protected]> wrote:
>
> @@ -968,11 +972,17 @@
> return -EAGAIN;
>
> if (skb->len > cur_mss) {
> + int old_factor = TCP_SKB_CB(skb)->tso_factor;
> + int new_factor;
> +
> if (tcp_fragment(sk, skb, cur_mss))
> return -ENOMEM; /* We'll try again later. */
>
> /* New SKB created, account for it. */
> - tcp_inc_pcount(&tp->packets_out, skb);
> + new_factor = TCP_SKB_CB(skb)->tso_factor;
> + tcp_dec_pcount_explicit(&tp->packets_out,
> + new_factor - old_factor);

That should be tcp_inc_pcount_explicit.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2004-09-14 04:48:12

by William Lee Irwin III

Subject: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.
> - Added the `bk-scsi-target' tree to the -mm lineup. It is managed by James
> Bottomley
> - Some enhancements to the ext3 block reservation code here. Please cc
> [email protected] on oops reports ;)
> - There's a patch here which will cause warnings if a PCI device driver is
> removed without having called pci_disable_device(). Please try to cc the
> appropriate mailing list or maintainer when reporting any instances.

I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance.
Kernels using periodic timer interrupts are under a realtime constraint:
they must complete whatever work they perform in a timer interrupt
before the next timer interrupt arrives, lest they livelock, doing no
work whatsoever apart from servicing timer interrupts. The latency of
the cacheline bounce for prof_buffer adds to the time spent in the
timer interrupt, so it must be amortized wherever remote access
latencies or unfair exclusive cacheline acquisition may make a
cacheline bounce take longer than the interval between timer ticks.

This patch creates a per-cpu open-addressed hashtable, keyed by
profile buffer slot, whose values are the number of pending hits for
that slot. When this hashtable overflows, every (slot, hit count) pair
in it is accounted to the global profile buffer and the table is
cleared. Zero is a legitimate profile buffer slot, so a zero hit count
(not a zero slot) marks an unused hashtable entry. The hashtable is
also protected against reentry from the timer interrupt by disabling
interrupts. read_proc_profile()
does not flush the per-cpu hashtables because flushing may cause
timeslice overrun on the systems where prof_buffer cacheline bounces
are so problematic as to livelock the timer interrupt.
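
The probe loop in profile_hit() relies on NR_PROFILE_HIT being a power of two: the `| 1` forces the secondary stride to be odd, and an odd stride is coprime with a power-of-two table size, so the do/while walk visits every slot exactly once before returning to the primary slot. A small user-space check of that property, assuming a 256-slot table (the patch uses PAGE_SIZE/sizeof(struct profile_hit)); probe_coverage() is a hypothetical test helper.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_SLOTS 256UL  /* must be a power of two, like NR_PROFILE_HIT */

/* Count distinct slots visited by the patch's probe sequence for pc. */
static unsigned long probe_coverage(unsigned long pc)
{
	unsigned char seen[MODEL_NR_SLOTS];
	unsigned long primary = pc & (MODEL_NR_SLOTS - 1);
	unsigned long secondary = ((~pc << 1) | 1) & (MODEL_NR_SLOTS - 1);
	unsigned long i = primary, visited = 0;

	memset(seen, 0, sizeof(seen));
	do {
		if (!seen[i]) {
			seen[i] = 1;
			visited++;
		}
		i = (i + secondary) & (MODEL_NR_SLOTS - 1);
	} while (i != primary);
	return visited;   /* equals MODEL_NR_SLOTS for every pc */
}
```

This is why the loop only falls through to the flush when the table is genuinely full, rather than when one probe chain happens to be exhausted.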

This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable, because each of its entries may hold numerous hits. Before
the patch, every hit cost one atomic increment; after it, all the hits
held in the hashtable, however many there are, are accounted with one
atomic_add() per hashtable entry. The savings are nondeterministic, but
as profile hits tend to be concentrated in a very small number of
profile buffer slots during any given timing interval, each flush is
likely to replace a very large number of atomic increments. This
amortization of atomic increments does not depend on the hash function,
only on the (lack of) scattering of profile buffer hits.
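
The amortization argument can be seen in a user-space model of this profile_hit() logic, without per-cpu data, interrupts or atomics; global_ops counts the writes that would be atomic operations on prof_buffer. The table size, the model_* names and model_drain() are assumptions for illustration (this first patch does not flush on read).

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_HIT 64UL  /* assumption: small power of two; the patch uses a page */

struct model_hit {
	unsigned long pc, hits;
};

struct model_profile {
	struct model_hit table[MODEL_NR_HIT]; /* stand-in for one cpu's hashtable */
	unsigned long *buffer;                /* stand-in for prof_buffer */
	unsigned long global_ops;             /* writes made to buffer */
};

/* Record one hit; on overflow, push everything pending into buffer. */
static void model_hit(struct model_profile *p, unsigned long pc)
{
	unsigned long primary = pc & (MODEL_NR_HIT - 1);
	unsigned long secondary = ((~pc << 1) | 1) & (MODEL_NR_HIT - 1);
	unsigned long i = primary;

	do {
		if (p->table[i].pc == pc) {
			p->table[i].hits++;
			return;
		} else if (!p->table[i].hits) {
			p->table[i].pc = pc;
			p->table[i].hits = 1;
			return;
		}
		i = (i + secondary) & (MODEL_NR_HIT - 1);
	} while (i != primary);

	/* Table full: account this hit and every pending entry. */
	p->buffer[pc]++;
	p->global_ops++;
	for (i = 0; i < MODEL_NR_HIT; i++) {
		p->buffer[p->table[i].pc] += p->table[i].hits;
		p->global_ops++;
	}
	memset(p->table, 0, sizeof(p->table));
}

/* Hypothetical: push out whatever is still pending, skipping unused entries. */
static void model_drain(struct model_profile *p)
{
	unsigned long i;

	for (i = 0; i < MODEL_NR_HIT; i++) {
		if (!p->table[i].hits)
			continue;
		p->buffer[p->table[i].pc] += p->table[i].hits;
		p->global_ops++;
	}
	memset(p->table, 0, sizeof(p->table));
}
```

Feeding 100000 hits spread over eight slots into this model never fills the table, so draining it costs eight writes to the global buffer instead of 100000.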

I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bug reporters could do that bit for me.

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-13 21:36:35.498912144 -0700
@@ -12,10 +12,18 @@
#include <linux/profile.h>
#include <asm/sections.h>

+struct profile_hit {
+ unsigned long pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
+
static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [NR_PROFILE_HIT], cpu_profile_hits);
+#endif /* CONFIG_SMP */

static int __init profile_setup(char * str)
{
@@ -181,6 +189,41 @@
EXPORT_SYMBOL_GPL(profile_event_register);
EXPORT_SYMBOL_GPL(profile_event_unregister);

+#ifdef CONFIG_SMP
+void profile_hit(int type, void *__pc)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ cpu = get_cpu();
+ i = primary = pc & (NR_PROFILE_HIT - 1);
+ secondary = ((~pc << 1) | 1) & (NR_PROFILE_HIT - 1);
+ hits = per_cpu(cpu_profile_hits, cpu);
+ local_irq_save(flags);
+ do {
+ if (hits[i].pc == pc) {
+ hits[i].hits++;
+ goto out;
+ } else if (!hits[i].hits) {
+ hits[i].pc = pc;
+ hits[i].hits = 1;
+ goto out;
+ } else
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+ atomic_inc(&prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i)
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+#else
void profile_hit(int type, void *__pc)
{
unsigned long pc;
@@ -190,6 +233,7 @@
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
}
+#endif

void profile_tick(int type, struct pt_regs *regs)
{

2004-09-14 04:55:45

by David Miller

Subject: Re: 2.6.9-rc1-mm5: TCP oopses

On Tue, 14 Sep 2004 13:34:20 +1000
Herbert Xu <[email protected]> wrote:

> > @@ -968,11 +972,17 @@
> > return -EAGAIN;
> >
> > if (skb->len > cur_mss) {
> > + int old_factor = TCP_SKB_CB(skb)->tso_factor;
> > + int new_factor;
> > +
> > if (tcp_fragment(sk, skb, cur_mss))
> > return -ENOMEM; /* We'll try again later. */
> >
> > /* New SKB created, account for it. */
> > - tcp_inc_pcount(&tp->packets_out, skb);
> > + new_factor = TCP_SKB_CB(skb)->tso_factor;
> > + tcp_dec_pcount_explicit(&tp->packets_out,
> > + new_factor - old_factor);
>
> That should be tcp_inc_pcount_explicit.

Better fix is to transpose the factors in the subtraction.
That's what I was trying to do here.

Good eyes Herbert.

2004-09-14 04:58:09

by David Miller

Subject: Re: 2.6.9-rc1-mm5: TCP oopses


James, does this make your problem go away?

Thanks for testing.

===== net/ipv4/tcp_output.c 1.57 vs edited =====
--- 1.57/net/ipv4/tcp_output.c 2004-09-12 16:17:23 -07:00
+++ edited/net/ipv4/tcp_output.c 2004-09-13 21:36:59 -07:00
@@ -991,7 +991,7 @@
/* New SKB created, account for it. */
new_factor = TCP_SKB_CB(skb)->tso_factor;
tcp_dec_pcount_explicit(&tp->packets_out,
- new_factor - old_factor);
+ old_factor - new_factor);
tcp_inc_pcount(&tp->packets_out, skb->next);
}
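
The accounting can be checked with a small user-space model, assuming (hypothetically) that packets_out is a plain int and that an skb's tso_factor is its length divided by the MSS, rounded up. Splitting an already-counted skb should leave the total unchanged: the head shrinks from old_factor to new_factor segments and the new tail adds its own factor. account_split() and segs() are illustrative helpers, not kernel functions.

```c
#include <assert.h>

/* Segments needed to carry len bytes at the given MSS (ceiling division). */
static int segs(int len, int mss)
{
	return (len + mss - 1) / mss;
}

/*
 * An already-sent skb of skb_len bytes, counted in packets_out as
 * segs(skb_len, mss) segments, is split so the head keeps mss bytes.
 * transposed == 0: subtract old_factor - new_factor (the fix above).
 * transposed == 1: subtract new_factor - old_factor (the first version).
 */
static int account_split(int packets_out, int skb_len, int mss, int transposed)
{
	int old_factor = segs(skb_len, mss);         /* before tcp_fragment() */
	int new_factor = segs(mss, mss);             /* shrunken head */
	int next_factor = segs(skb_len - mss, mss);  /* new tail skb */
	int delta = transposed ? new_factor - old_factor
			       : old_factor - new_factor;

	packets_out -= delta;        /* models tcp_dec_pcount_explicit() */
	packets_out += next_factor;  /* models tcp_inc_pcount(..., skb->next) */
	return packets_out;
}
```

For a 4-segment skb, the corrected order leaves packets_out at 4, while the transposed order subtracts a negative delta and inflates it to 10.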

2004-09-14 05:07:06

by David Miller

Subject: Re: [profile] amortize atomic hit count increments


William, any reason not to fully per-cpu the profile buffer
and then only traverse the array when the user attempts to
capture the counters?

Then we can undo the atomics altogether, as well as the cacheline
traffic, for the extremely common case.

Are there space concerns?

2004-09-14 05:09:49

by James Morris

Subject: Re: 2.6.9-rc1-mm5: TCP oopses

On Mon, 13 Sep 2004, David S. Miller wrote:

> James, does this make your problem go away?

Looks like it.


- James
--
James Morris
<[email protected]>


2004-09-14 05:08:36

by Andrew Morton

Subject: Re: [profile] amortize atomic hit count increments

William Lee Irwin III <[email protected]> wrote:
>
> read_proc_profile()
> does not flush the per-cpu hashtables because flushing may cause
> timeslice overrun on the systems where prof_buffer cacheline bounces
> are so problematic as to livelock the timer interrupt.

That's a bit of a problem, isn't it? As we can accumulate an arbitrarily
large number of hits within the hash table, is it not possible that the
/proc/profile results could be grossly inaccurate?

If you had two front-ends per cpu to the profiling buffer then the CPU
which is running the /proc/profile read could tell all the other CPUs to
flip to their alternate buffer and can then perform accumulation at its
leisure.

How does oprofile get around this? I guess in most modes the CPUs are not
synchronised.

One wonders how long we should keep flogging the /proc/profile profiling
code. What systems are seeing this livelock?

2004-09-14 05:21:59

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

William Lee Irwin III <[email protected]> wrote:
>> read_proc_profile()
>> does not flush the per-cpu hashtables because flushing may cause
>> timeslice overrun on the systems where prof_buffer cacheline bounces
>> are so problematic as to livelock the timer interrupt.

On Mon, Sep 13, 2004 at 10:05:21PM -0700, Andrew Morton wrote:
> That's a bit of a problem, isn't it? As we can accumulate an arbitrarily
> large number of hits within the hash table is it not possible that the
> /proc/profile results could be grossly inaccurate?
> If you had two front-ends per cpu to the profiling buffer then the CPU
> which is running the /proc/profile read could tell all the other CPUs to
> flip to their alternate buffer and can then perform accumulation at its
> leisure.

This is superior to no flushing; I'll implement that and send out an
incremental update (or if preferred, an update of this patch).


On Mon, Sep 13, 2004 at 10:05:21PM -0700, Andrew Morton wrote:
> How does oprofile get around this? I guess in most modes the CPUs are not
> synchronised.
> One wonders how long we should keep flogging the /proc/profile profiling
> code. What systems are seeing this livelock?

The original bits were merely a consolidation extracted from a since-
dropped feature patch and an unrelated feature patch from mingo and
arjanv; this is an unrelated fix for SGI's stability issue on larger
Altixen. I personally intend to do no further adjustments.


-- wli

2004-09-14 05:32:29

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 10:05:07PM -0700, David S. Miller wrote:
> William, any reason not to fully per-cpu the profile buffer
> and then only traverse the array when the user attempts to
> capture the counters?
> Then we can undo the atomics altogether, as well as the cacheline
> traffic, for the extremely common case.
> Are there space concerns?

This was my original approach (modulo eliminating the global buffer
and the atomic operations), but space concerns stymied it, as the
profile buffer can be several megabytes large. It would likely perform
better in general if admissible, for whatever value performance is
considered to have.

There is also an unusual facet to this; the TLB overhead of a loop like:
for (i = 0; i < prof_len; ++i) {
for_each_online_cpu(cpu)
global_buf[i] += per_cpu(cpu_prof_buffer, cpu)[i];
}
is very large and caused "effective nontermination", otherwise known as
"exhausting the user's patience", on SGI's systems after about half an
hour. So some TLB overhead amortization is necessary for this to be
feasible. I suspect iterating over pages of the profile buffer and
storing intermediate results for a page full of profile buffer hits
in a buffer page may suffice though I've not tried it.
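
A sketch of the page-at-a-time idea, in user space and with hypothetical sizes (MODEL_CHUNK stands in for one page of counters): each chunk of the global buffer is summed from every cpu's buffer into a scratch array before being stored, so the inner loop walks one cpu's buffer sequentially instead of touching every cpu's buffer for each slot. It only checks that the reordering gives the same totals as the naive loop; it says nothing about actual TLB behavior.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_CPUS  4
#define MODEL_PROF_LEN 10000UL
#define MODEL_CHUNK    1024UL   /* assumption: one page worth of counters */

static unsigned int cpu_buf[MODEL_NR_CPUS][MODEL_PROF_LEN];

/* Naive order: for every slot, visit every cpu's buffer. */
static void merge_naive(unsigned int *global)
{
	unsigned long i;
	int cpu;

	for (i = 0; i < MODEL_PROF_LEN; i++)
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			global[i] += cpu_buf[cpu][i];
}

/* Chunked order: one cpu at a time within each chunk, via a scratch array. */
static void merge_chunked(unsigned int *global)
{
	static unsigned int scratch[MODEL_CHUNK];
	unsigned long base, i, n;
	int cpu;

	for (base = 0; base < MODEL_PROF_LEN; base += MODEL_CHUNK) {
		n = MODEL_PROF_LEN - base < MODEL_CHUNK ?
		    MODEL_PROF_LEN - base : MODEL_CHUNK;
		memset(scratch, 0, n * sizeof(scratch[0]));
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			for (i = 0; i < n; i++)
				scratch[i] += cpu_buf[cpu][base + i];
		for (i = 0; i < n; i++)
			global[base + i] += scratch[i];
	}
}
```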


-- wli

2004-09-14 05:51:45

by David Miller

Subject: Re: [profile] amortize atomic hit count increments

On Mon, 13 Sep 2004 22:32:18 -0700
William Lee Irwin III <[email protected]> wrote:

> This was my original approach (modulo eliminating the global buffer
> and the atomic operations), but space concerns stymied it, as the
> profile buffer can be several megabytes large. It would likely perform
> better in general if admissible, for whatever value performance is
> considered to have.
>
> There is also an unusual facet to this; the TLB overhead of a loop like:
> for (i = 0; i < prof_len; ++i) {
> for_each_online_cpu(cpu)
> global_buf[i] += per_cpu(cpu_prof_buffer, cpu)[i];
> }
> is very large and caused "effective nontermination", otherwise known as
> "exhausting the user's patience", on SGI's systems after about half an
> hour. So some TLB overhead amortization is necessary for this to be
> feasible. I suspect iterating over pages of the profile buffer and
> storing intermediate results for a page full of profile buffer hits
> in a buffer page may suffice though I've not tried it.

I bet that, like we found out about page tables on 64-bit, these
profile buffers are sparsely populated with hits. So perhaps a
per-cpu bitmap that indicates regions that might have any hits
at all, allowing large amounts of skipping and thus amortizing the
scan cost.
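
The per-cpu bitmap suggestion, sketched in user space with hypothetical names and sizes: each cpu keeps one bit per MODEL_REGION counters, set whenever a counter in that region is bumped, so the merge skips regions that were never touched. Synchronization between the recording cpu and the merging cpu is deliberately left out.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sizes: 64K counters, one bitmap bit per 512 counters. */
#define MODEL_PROF_LEN    65536UL
#define MODEL_REGION      512UL
#define MODEL_NR_REGIONS  (MODEL_PROF_LEN / MODEL_REGION)
#define MODEL_BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_BITMAP_LONGS \
	((MODEL_NR_REGIONS + MODEL_BITS_PER_UL - 1) / MODEL_BITS_PER_UL)

struct model_cpu_prof {
	unsigned int hits[MODEL_PROF_LEN];
	unsigned long dirty[MODEL_BITMAP_LONGS]; /* bit r: region r may have hits */
};

static int region_dirty(const struct model_cpu_prof *p, unsigned long r)
{
	return (p->dirty[r / MODEL_BITS_PER_UL] >> (r % MODEL_BITS_PER_UL)) & 1UL;
}

/* Record a hit and mark its region. */
static void model_record(struct model_cpu_prof *p, unsigned long slot)
{
	unsigned long r = slot / MODEL_REGION;

	p->hits[slot]++;
	p->dirty[r / MODEL_BITS_PER_UL] |= 1UL << (r % MODEL_BITS_PER_UL);
}

/* Fold marked regions into global, clear them, return regions scanned. */
static unsigned long model_merge(struct model_cpu_prof *p, unsigned int *global)
{
	unsigned long r, i, scanned = 0;

	for (r = 0; r < MODEL_NR_REGIONS; r++) {
		if (!region_dirty(p, r))
			continue;   /* untouched: skip all MODEL_REGION counters */
		for (i = r * MODEL_REGION; i < (r + 1) * MODEL_REGION; i++) {
			global[i] += p->hits[i];
			p->hits[i] = 0;
		}
		p->dirty[r / MODEL_BITS_PER_UL] &= ~(1UL << (r % MODEL_BITS_PER_UL));
		scanned++;
	}
	return scanned;
}
```

With hits in only two regions, the merge scans 2 of the 128 regions; how much this saves in practice depends on how sparse real profiles are.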

2004-09-14 06:10:36

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Mon, 13 Sep 2004 22:32:18 -0700 William Lee Irwin III wrote:
>> There is also an unusual facet to this; the TLB overhead of a loop like:
[...]
>> is very large and caused "effective nontermination", otherwise known as
>> "exhausting the user's patience", on SGI's systems after about half an
>> hour. So some TLB overhead amortization is necessary for this to be
>> feasible. I suspect iterating over pages of the profile buffer and
>> storing intermediate results for a page full of profile buffer hits
>> in a buffer page may suffice though I've not tried it.

On Mon, Sep 13, 2004 at 10:49:43PM -0700, David S. Miller wrote:
> I bet that, like we found out about page tables on 64-bit, these
> profile buffers are sparsely populated with hits. So perhaps a
> per-cpu bitmap that indicates regions that might have any hits
> at all, allowing large amounts of skipping and thus amortizing the
> scan cost.

Well, that would speed it up, but the catastrophe was avoided in the
older patches by just processing all the hits for one cpu at a time,
and the buffering methods above for your suggested accounting
structures likely work well enough that the overhead of processing unused
portions of the bitmap can be ignored. I don't really want to go about
addressing performance issues besides effective or actual
nontermination for this code, and would rather leave highly efficient
methods to oprofile (in fact, some others believe that even bugfixes
for such issues should be ignored for kernel/profile.c, contrary to my
notion that it shouldn't crash systems regardless of their size).


-- wli

2004-09-14 06:18:12

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 11:10:23PM -0700, William Lee Irwin III wrote:
> Well, that would speed it up, but the catastrophe was avoided in the
> older patches by just processing all the hits for one cpu at a time,
> and the buffering methods above for your suggested accounting
> structures likely work well enough that the overhead of processing unused
> portions of the bitmap can be ignored. I don't really want to go about
> addressing performance issues besides effective or actual
> nontermination for this code, and would rather leave highly efficient
> methods to oprofile (in fact, some others believe that even bugfixes
> for such issues should be ignored for kernel/profile.c, contrary to my
> notion that it shouldn't crash systems regardless of their size).

s/portions of the bitmap/portions of the profile buffer/

2004-09-14 06:32:03

by Kirill Korotaev

Subject: Re: 2.6.9-rc1-mm5

--- ./kernel/exit.c.nt 2004-09-13 18:00:12.727181136 +0400
+++ ./kernel/exit.c 2004-09-13 18:00:51.864231400 +0400
@@ -848,10 +848,7 @@ asmlinkage long sys_exit(int error_code)
task_t fastcall *next_thread(const task_t *p)
{
#ifdef CONFIG_SMP
- if (!p->sighand)
- BUG();
- if (!spin_is_locked(&p->sighand->siglock) &&
- !rwlock_is_locked(&tasklist_lock))
+ if (!rwlock_is_locked(&tasklist_lock) || p->pids[PIDTYPE_TGID].nr == 0)
BUG();
#endif
return pid_task(p->pids[PIDTYPE_TGID].pid_list.next, PIDTYPE_TGID);


Attachments:
diff-next_thread (524.00 B)

2004-09-14 06:46:42

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 10:05:21PM -0700, Andrew Morton wrote:
>> That's a bit of a problem, isn't it? As we can accumulate an arbitrarily
>> large number of hits within the hash table is it not possible that the
>> /proc/profile results could be grossly inaccurate?
>> If you had two front-ends per cpu to the profiling buffer then the CPU
>> which is running the /proc/profile read could tell all the other CPUs to
>> flip to their alternate buffer and can then perform accumulation at its
>> leisure.

On Mon, Sep 13, 2004 at 10:21:18PM -0700, William Lee Irwin III wrote:
> This is superior to no flushing; I'll implement that and send out an
> incremental update (or if preferred, an update of this patch).

I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance.
Kernels using periodic timer interrupts are under a realtime constraint:
they must complete whatever work they perform in a timer interrupt
before the next timer interrupt arrives, lest they livelock, doing no
work whatsoever apart from servicing timer interrupts. The latency of
the cacheline bounce for prof_buffer adds to the time spent in the
timer interrupt, so it must be amortized wherever remote access
latencies or unfair exclusive cacheline acquisition may make a
cacheline bounce take longer than the interval between timer ticks.

This patch creates a per-cpu open-addressed hashtable, keyed by
profile buffer slot, whose values are the number of pending hits for
that slot. When this hashtable overflows, every (slot, hit count) pair
in it is accounted to the global profile buffer. Zero is a legitimate
profile buffer slot, so a zero hit count (not a zero slot) marks an
unused hashtable entry. The hashtable is also protected against
reentry from the timer interrupt by disabling interrupts.

In order to "flush" the pending profile hits for read_profile(), this
patch actually creates a pair of per-cpu hashtables. At read_profile()
time it IPIs all cpus to make them flip between their pair of
hashtables, and then does all the work of flushing the profile hits
from the older per-cpu hashtables in the context of the caller of
read_profile(). A semaphore ensures that only one caller of
profile_flip_buffers() executes at a time, and interrupt disablement
prevents buffer flip IPIs from altering the hashtables or the flip
state while an update is in progress. The flip state is per-cpu, so
remote cpus need only disable interrupts locally for synchronization,
which is both simple and busywait-free for them. The flip states all
change in tandem: the cpu requesting the update waits for
smp_call_function() to complete as notification that all cpus have
finished flipping their buffers. The IPI handler merely toggles the
flip state (which is an array index) between 0 and 1.
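
A single-threaded model of the flip protocol described above, with hypothetical names: the IPIs are replaced by direct calls, the semaphore and interrupt disablement are omitted, and a direct-indexed array stands in for each hashtable. It illustrates the key invariant: hits recorded after a flip land in the other array, so the reader can drain the old one without competing with the timer tick.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_CPUS  4
#define MODEL_PROF_LEN 64

/* Simplification: a direct-indexed pending array instead of a hashtable. */
struct model_cpu {
	unsigned long pending[2][MODEL_PROF_LEN];
	int flip;   /* index of the array that new hits go to */
};

static struct model_cpu cpus[MODEL_NR_CPUS];
static unsigned long prof_buffer_model[MODEL_PROF_LEN];

/* Timer-tick side: bump the live array (interrupts off in the real code). */
static void model_tick(int cpu, unsigned long slot)
{
	cpus[cpu].pending[cpus[cpu].flip][slot]++;
}

/* IPI side: toggle the flip state between 0 and 1. */
static void model_flip(int cpu)
{
	cpus[cpu].flip = !cpus[cpu].flip;
}

/* Reader side: flip every cpu, then drain the arrays they stopped using. */
static void model_read_flush(void)
{
	int cpu, j = cpus[0].flip;   /* all cpus flip in tandem */
	unsigned long i;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_flip(cpu);     /* on_each_cpu() plus waiting, in the patch */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		for (i = 0; i < MODEL_PROF_LEN; i++)
			prof_buffer_model[i] += cpus[cpu].pending[j][i];
		memset(cpus[cpu].pending[j], 0, sizeof(cpus[cpu].pending[j]));
	}
}
```

A hit recorded between two reads stays in the live array until the next flip, so each read reports everything recorded before it started.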

This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable, because each of its entries may hold numerous hits. Before
the patch, every hit cost one atomic increment; after it, all the hits
held in the hashtable, however many there are, are accounted with one
atomic_add() per hashtable entry. The savings are nondeterministic, but
as profile hits tend to be concentrated in a very small number of
profile buffer slots during any given timing interval, each flush is
likely to replace a very large number of atomic increments. This
amortization of atomic increments does not depend on the hash function,
only on the (lack of) scattering of profile buffer hits.

I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bug reporters could do that bit for me.

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-13 23:12:27.574463744 -0700
@@ -11,11 +11,21 @@
#include <linux/cpumask.h>
#include <linux/profile.h>
#include <asm/sections.h>
+#include <asm/semaphore.h>
+
+struct profile_hit {
+ unsigned long pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))

static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [2][NR_PROFILE_HIT], cpu_profile_hits);
+static DEFINE_PER_CPU(int, cpu_profile_flip);
+#endif /* CONFIG_SMP */

static int __init profile_setup(char * str)
{
@@ -181,6 +191,74 @@
EXPORT_SYMBOL_GPL(profile_event_register);
EXPORT_SYMBOL_GPL(profile_event_unregister);

+#ifdef CONFIG_SMP
+static void __profile_flip_buffers(void *unused)
+{
+ int cpu = get_cpu();
+ unsigned long flags;
+
+ local_irq_save(flags);
+ per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
+ local_irq_restore(flags);
+ put_cpu();
+}
+
+static void profile_flip_buffers(void)
+{
+ static DECLARE_MUTEX(profile_flip_mutex);
+ int i, j, cpu;
+
+ down(&profile_flip_mutex);
+ j = per_cpu(cpu_profile_flip, smp_processor_id());
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
+ for (i = 0; i < NR_PROFILE_HIT; ++i) {
+ if (!hits[i].hits)
+ continue;
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ }
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
+void profile_hit(int type, void *__pc)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ cpu = get_cpu();
+ i = primary = pc & (NR_PROFILE_HIT - 1);
+ secondary = ((~pc << 1) | 1) & (NR_PROFILE_HIT - 1);
+ hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ local_irq_save(flags);
+ do {
+ if (hits[i].pc == pc) {
+ hits[i].hits++;
+ goto out;
+ } else if (!hits[i].hits) {
+ hits[i].pc = pc;
+ hits[i].hits = 1;
+ goto out;
+ } else
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+ atomic_inc(&prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i)
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+#else /* !CONFIG_SMP */
+#define profile_flip_buffers() do { } while (0)
+
void profile_hit(int type, void *__pc)
{
unsigned long pc;
@@ -190,6 +268,7 @@
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
}
+#endif /* !CONFIG_SMP */

void profile_tick(int type, struct pt_regs *regs)
{
@@ -256,6 +335,7 @@
char * pnt;
unsigned int sample_step = 1 << prof_shift;

+ profile_flip_buffers();
if (p >= (prof_len+1)*sizeof(unsigned int))
return 0;
if (count > (prof_len+1)*sizeof(unsigned int) - p)

2004-09-14 06:54:28

by Andrew Morton

Subject: Re: [profile] amortize atomic hit count increments

William Lee Irwin III <[email protected]> wrote:
>

A few comments which describe the design would be nice...

> +#ifdef CONFIG_SMP
> +static void __profile_flip_buffers(void *unused)
> +{
> + int cpu = get_cpu();
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
> + local_irq_restore(flags);
> + put_cpu();
> +}

hm. Does an IPI handler need to disable local IRQs?

> +static void profile_flip_buffers(void)
> +{
> + static DECLARE_MUTEX(profile_flip_mutex);
> + int i, j, cpu;
> +
> + down(&profile_flip_mutex);
> + j = per_cpu(cpu_profile_flip, smp_processor_id());

Is this preempt-safe?

> + on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
> + for_each_online_cpu(cpu) {
> + struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];


2004-09-14 07:56:57

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

William Lee Irwin III <[email protected]> wrote:
[...]

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> A few comments which describe the design would be nice...

Okay, I'll add a few in another update. I suppose what's going on may
not be as obvious to everyone else even with the code in hand.


On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
>> + local_irq_save(flags);
>> + per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
>> + local_irq_restore(flags);
>> + put_cpu();
>> +}

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> hm. Does an IPI handler need to disable local IRQs?

It's for exclusion from the timer interrupt. It looks like ia32 enters
the calls with interrupts disabled, so it's probably safe to assume
it's called with interrupts disabled on all architectures (any
architecture that doesn't is already broken by other callers elsewhere).
I'll send out an update with the explicit interrupt disablement removed.


William Lee Irwin III <[email protected]> wrote:
>> + down(&profile_flip_mutex);
>> + j = per_cpu(cpu_profile_flip, smp_processor_id());

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
> Is this preempt-safe?

Yes. It's irrelevant which cpu's cpu_profile_flip is sampled. But
it's not cpu-hotplug-safe: the cpu may be offlined, and its per-cpu
storage freed, between the call to smp_processor_id() and the
dereference of the offset into the per-cpu area. Disabling preemption
while the flip state is sampled (and no longer than that) would repair
it for cpu hotplug, since we would then hold a valid cpu number (the
one we're executing on) for the duration of the sample. The flip state
itself can't change because we own the semaphore, and it won't vary by
cpu unless an on_each_cpu() is in flight; we just need a valid cpu
number to sample it. Taking the cpucontrol semaphore instead would be
excessively heavyweight, and we'd either have to conditionally compile
out the native semaphore for the cpu hotplug case or acquire two
semaphores in succession.

This raises an interesting question of how on earth for_each_online_cpu()
is handled by cpu hotplug, but I don't feel responsible for answering it.

So, my preferred fix is the following, with which I'll send out an
updated patch if everyone agrees:

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 23:12:27.574463744 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 00:10:29.820081944 -0700
@@ -209,7 +209,8 @@
int i, j, cpu;

down(&profile_flip_mutex);
- j = per_cpu(cpu_profile_flip, smp_processor_id());
+ j = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
for_each_online_cpu(cpu) {
struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];

2004-09-14 08:48:38

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 11:52:25PM -0700, Andrew Morton wrote:
>> A few comments which describe the design would be nice...

On Tue, Sep 14, 2004 at 12:55:44AM -0700, William Lee Irwin III wrote:
> Okay, I'll add a few in another update. I suppose what's going on may
> not be as obvious to everyone else even with the code in hand.

The comments and all other issues raised in your reply have been
addressed in the following updated patch, in which I also shrank the
hashtable entries' fields to u32 for 64-bit machines, on which the full
precision of an unsigned long is unnecessary, added some commentary to
the beginning of the file describing its contents and the recent major
work done on it, and simplified the secondary hash function. I also
presume silence to be assent regarding the hotplug (not preempt) fix.


-- wli

I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance;
kernels using periodic timer interrupts are under realtime constraints
to complete whatever work they perform within timer interrupts before
the next timer interrupt arrives lest they livelock, performing no work
whatsoever apart from servicing timer interrupts. The latency of the
cacheline bounce for prof_buffer contributes to the time spent in the
timer interrupt, hence it must be amortized when remote access latencies
or deviations from fair exclusive cacheline acquisition may cause
cacheline bounces to take longer than the interval between timer ticks.

What this patch does is to create a per-cpu open-addressed hashtable
indexed by profile buffer slot holding values representing the number
of pending profile buffer hits. When this hashtable overflows, one
iterates over the hashtable accounting each of the pairs of profile
buffer slots and hit counts to the global profile buffer. Zero is a
legitimate profile buffer slot, so zero hit counts represent unused
hashtable entries. The hashtable is furthermore protected from reentry
into the timer interrupt by interrupt disablement. In order to "flush"
the pending profile hits for read_profile(), this patch actually
creates a pair of per-cpu profile buffers, and at the time of
read_profile() IPI's all cpus to get them to flip between their pairs
of profile buffers, doing all the work to flush the profile hits from
the older per-cpu buffers in the context of the caller of read_profile(),
with exclusion provided by a semaphore ensuring that only one caller of
profile_flip_buffers() may execute at a time and interrupt disablement
to prevent buffer flip IPI's from altering the hashtables or flip state
while an update is in progress. The flip state is per-cpu so that
remote cpus need only disable interrupts locally for synchronization,
which is both simple and busywait-free for remote cpus, and the flip
states all change in tandem with the cpu requesting the update waiting
for the completion of smp_call_function() for notification that all
cpus have finished flipping their buffers. The IPI handler merely
toggles the flip state (which is an array index) between 0 and 1.

This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable, because numerous hits may be held in each of its entries.
Before the patch, every hit costs one atomic increment; after it, a
flush costs one atomic_add() per occupied entry in the per-cpu
hashtable, however many hits that entry has accumulated. The savings
are nondeterministic, but as profile hits tend to be concentrated in a
very small number of profile buffer slots during any given interval,
each atomic_add() is likely to replace a very large number of atomic
increments. This amortization of atomic increments does not depend on
the hash function, only on the (lack of) scattering of profile buffer
hits.

I also took the liberty of extending the comments at the beginning of
the file to describe what the file implements and the major work done
on profile.c in recent months.

I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bug reporters could do that bit for me.

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 01:27:49.675716672 -0700
@@ -1,5 +1,16 @@
/*
* linux/kernel/profile.c
+ * Simple profiling. Manages a direct-mapped profile hit count buffer,
+ * with configurable resolution, support for restricting the cpus on
+ * which profiling is done, and switching between cpu time and
+ * schedule() calls via kernel command line parameters passed at boot.
+ *
+ * Scheduler profiling support, Arjan van de Ven and Ingo Molnar,
+ * Red Hat, July 2004
+ * Consolidation of architecture support code for profiling,
+ * William Irwin, Oracle, July 2004
+ * Amortized hit count accounting via per-cpu open-addressed hashtables
+ * to resolve timer interrupt livelocks, William Irwin, Oracle, 2004
*/

#include <linux/config.h>
@@ -11,11 +22,21 @@
#include <linux/cpumask.h>
#include <linux/profile.h>
#include <asm/sections.h>
+#include <asm/semaphore.h>
+
+struct profile_hit {
+ u32 pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))

static atomic_t *prof_buffer;
static unsigned long prof_len, prof_shift;
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [2][NR_PROFILE_HIT], cpu_profile_hits);
+static DEFINE_PER_CPU(int, cpu_profile_flip);
+#endif /* CONFIG_SMP */

static int __init profile_setup(char * str)
{
@@ -181,6 +202,100 @@
EXPORT_SYMBOL_GPL(profile_event_register);
EXPORT_SYMBOL_GPL(profile_event_unregister);

+#ifdef CONFIG_SMP
+/*
+ * Each cpu has a pair of open-addressed hashtables for pending
+ * profile hits. read_profile() IPI's all cpus to request them
+ * to flip buffers and flushes their contents to prof_buffer itself.
+ * Flip requests are serialized by the profile_flip_mutex. The sole
+ * use of having a second hashtable is for avoiding cacheline
+ * contention that would otherwise happen during flushes of pending
+ * profile hits required for the accuracy of reported profile hits
+ * and so resurrect the interrupt livelock issue.
+ *
+ * The open-addressed hashtables are indexed by profile buffer slot
+ * and hold the number of pending hits to that profile buffer slot on
+ * a cpu in an entry. When the hashtable overflows, all pending hits
+ * are accounted to their corresponding profile buffer slots with
+ * atomic_add() and the hashtable emptied. As numerous pending hits
+ * may be accounted to a profile buffer slot in a hashtable entry,
+ * this amortizes a number of atomic profile buffer increments likely
+ * to be far larger than the number of entries in the hashtable,
+ * particularly given that the number of distinct profile buffer
+ * positions to which hits are accounted during short intervals (e.g.
+ * several seconds) is usually very small. Exclusion from buffer
+ * flipping is provided by interrupt disablement (note that for
+ * SCHED_PROFILING profile_hit() may be called from process context).
+ * The hash function is meant to be lightweight as opposed to strong,
+ * and was vaguely inspired by ppc64 firmware-supported inverted
+ * pagetable hash functions, but doesn't use finite collision chains.
+ *
+ * -- wli
+ */
+static void __profile_flip_buffers(void *unused)
+{
+ int cpu = smp_processor_id();
+
+ per_cpu(cpu_profile_flip, cpu) = !per_cpu(cpu_profile_flip, cpu);
+}
+
+static void profile_flip_buffers(void)
+{
+ static DECLARE_MUTEX(profile_flip_mutex);
+ int i, j, cpu;
+
+ down(&profile_flip_mutex);
+ j = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[j];
+ for (i = 0; i < NR_PROFILE_HIT; ++i) {
+ if (!hits[i].hits)
+ continue;
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ }
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
+void profile_hit(int type, void *__pc)
+{
+ unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+ int i, cpu;
+ struct profile_hit *hits;
+
+ if (prof_on != type || !prof_buffer)
+ return;
+ pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+ i = primary = pc & (NR_PROFILE_HIT - 1);
+ secondary = ~(pc << 1) & (NR_PROFILE_HIT - 1);
+ cpu = get_cpu();
+ hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ local_irq_save(flags);
+ do {
+ if (hits[i].pc == pc) {
+ hits[i].hits++;
+ goto out;
+ } else if (!hits[i].hits) {
+ hits[i].pc = pc;
+ hits[i].hits = 1;
+ goto out;
+ } else
+ i = (i + secondary) & (NR_PROFILE_HIT - 1);
+ } while (i != primary);
+ atomic_inc(&prof_buffer[pc]);
+ for (i = 0; i < NR_PROFILE_HIT; ++i)
+ atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+ local_irq_restore(flags);
+ put_cpu();
+}
+#else /* !CONFIG_SMP */
+#define profile_flip_buffers() do { } while (0)
+
void profile_hit(int type, void *__pc)
{
unsigned long pc;
@@ -190,6 +305,7 @@
pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
}
+#endif /* !CONFIG_SMP */

void profile_tick(int type, struct pt_regs *regs)
{
@@ -256,6 +372,7 @@
char * pnt;
unsigned int sample_step = 1 << prof_shift;

+ profile_flip_buffers();
if (p >= (prof_len+1)*sizeof(unsigned int))
return 0;
if (count > (prof_len+1)*sizeof(unsigned int) - p)

2004-09-14 09:08:05

by Nikita Danilov

Subject: Re: 2.6.9-rc1-mm5

Rafael J. Wysocki writes:
> On Monday 13 of September 2004 10:50, Andrew Morton wrote:
> >
> > Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> >
> > and will later appear at
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
>
> It does not compile on SMP x86-64 w/ NUMA:
>
> CC arch/x86_64/ia32/ia32_ioctl.o
> In file included from fs/compat_ioctl.c:63,
> from arch/x86_64/ia32/ia32_ioctl.c:14:
> include/linux/reiserfs_fs.h:441: error: redefinition of `struct key'
> include/linux/reiserfs_fs.h: In function `le_key_k_offset':

include/linux/key.h defines struct key that conflicts with reiserfs'
struct key. As a temporary fix turn off CONFIG_KEYS (or
CONFIG_REISERFS_FS :)).

Correct solution is to put both structs into proper namespaces by
prefixing them.

> Greets,
> RJW
>

Nikita.

2004-09-14 09:14:21

by Andrew Morton

Subject: Re: 2.6.9-rc1-mm5

Nikita Danilov <[email protected]> wrote:
>
> include/linux/key.h defines struct key that conflicts with reiserfs'
> struct key. As a temporary fix turn off CONFIG_KEYS (or
> CONFIG_REISERFS_FS :)).
>
> Correct solution is to put both structs into proper namespaces by
> prefixing them.

struct key was pretty dumb of both of you, but reiserfs was dumb first.

David, what do you want it renamed to?

2004-09-14 09:56:00

by Lorenzo Allegrucci

Subject: Re: 2.6.9-rc1-mm5

On Monday 13 September 2004 10:50, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 Is currently at
>
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
>
> and will later appear at
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/

100% reproducible under heavy IO load:

Sep 14 11:42:59 odyssey kernel: journal_bmap: journal block not found at
offset 2060 on hda12
Sep 14 11:42:59 odyssey kernel: Aborting journal on device hda12.
Sep 14 11:42:59 odyssey kernel: EXT3-fs error (device hda12) in
ext3_dirty_inode: IO failure
Sep 14 11:43:00 odyssey kernel: ext3_abort called.
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12):
ext3_journal_start: Detected aborted journal
Sep 14 11:43:00 odyssey kernel: Remounting filesystem read-only
Sep 14 11:43:00 odyssey kernel: ext3_reserve_inode_write: aborting
transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs
error (device hda12) in ext3_reserve_inode_write: Journal has aborted
Sep 14 11:43:00 odyssey kernel: ext3_reserve_inode_write: aborting
transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs
error (device hda12) in ext3_reserve_inode_write: Journal has aborted
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12) in
ext3_orphan_del: Journal has aborted
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12) in ext3_truncate:
Journal has aborted
Sep 14 11:43:00 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:01 odyssey last message repeated 17 times
Sep 14 11:43:01 odyssey kernel: or (device hda12) in start_transaction:
Journal has aborted
Sep 14 11:43:01 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:02 odyssey last message repeated 53 times
Sep 14 11:43:02 odyssey kernel: EXT3-fs error (device hda12) in staror (device
hda12) in start_transaction: Journal has aborted
Sep 14 11:43:02 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:03 odyssey last message repeated 53 times
Sep 14 11:43:03 odyssey kernel: EXT3-fs error (device hda12) in staror (device
hda12) in start_transaction: Journal has aborted
Sep 14 11:43:03 odyssey kernel: EXT3-fs error (device hda12) in
start_transaction: Journal has aborted
Sep 14 11:43:34 odyssey last message repeated 147542 times

2004-09-14 10:56:53

by Roger Luethi

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Mon, 13 Sep 2004 19:31:14 -0700, William Lee Irwin III wrote:
> -#define PID_MAX_LIMIT (4*1024*1024)
> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)

An architecture with sizeof(long) > 32? -- Most impressive.

Roger

2004-09-14 11:15:22

by Lars Marowsky-Bree

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On 2004-09-14T12:55:27,
Roger Luethi <[email protected]> said:

> > -#define PID_MAX_LIMIT (4*1024*1024)
> > +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
> An architecture with sizeof(long) > 32? -- Most impressive.

x86_64, s390x, ppc64...


2004-09-14 11:40:43

by Andrea Arcangeli

Subject: Re: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 09:47:48PM -0700, William Lee Irwin III wrote:
> timer interrupt, usually at boot. The following patch attempts to
> amortize the atomic operations done on the profile buffer to address
> this stability concern. This patch has nothing to do with performance;

isn't it *much* simpler and much more efficient to just have a per-cpu
idle function? I seriously doubt you'll get simultaneous collisions on
anything but the 'halt' instruction in the idle function.

2004-09-14 12:08:08

by Lars Marowsky-Bree

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On 2004-09-14T13:10:24,
Lars Marowsky-Bree <[email protected]> said:

> > > -#define PID_MAX_LIMIT (4*1024*1024)
> > > +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
> > An architecture with sizeof(long) > 32? -- Most impressive.
> x86_64, s390x, ppc64...

yesyes. I can't tell the difference between bytes and bits either.
Forget it ;-)


2004-09-14 12:12:49

by Roger Luethi

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Tue, 14 Sep 2004 13:10:24 +0200, Lars Marowsky-Bree wrote:
> On 2004-09-14T12:55:27,
> Roger Luethi <[email protected]> said:
>
> > > -#define PID_MAX_LIMIT (4*1024*1024)
> > > +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
> > An architecture with sizeof(long) > 32? -- Most impressive.
>
> x86_64, s390x, ppc64...

Really.

2004-09-14 13:27:38

by David Howells

Subject: Re: 2.6.9-rc1-mm5


> > Correct solution is to put both structs into proper namespaces by
> > prefixing them.
>
> struct key was pretty dumb of both of you, but reiserfs was dumb first.

Well, I argue that it wasn't that dumb - in this case it's meant to be a
generic mechanism usable by everything in the kernel or userspace that needs
authentication, authorisation, or crypto tokens. I use EXT3 rather than
ReiserFS, so it didn't become an issue.

> David, what do you want it renamed to?

key_struct? token? key_token?

Possibly ticket or principal, though they make it sound like it's specifically
for Kerberos, so perhaps not.

What I need is a thesaurus.

JamesM: any good suggestion as to a name?

David

2004-09-14 14:25:24

by James Morris

Subject: Re: 2.6.9-rc1-mm5

On Tue, 14 Sep 2004, David Howells wrote:

> > David, what do you want it renamed to?
>
> key_struct? token? key_token?
>
> Possibly ticket or principal, though they make it sound like it's specifically
> for Kerberos, so perhaps not.

Then there's the related problem of what to do about the naming of
key_alloc(), key.h etc.

What about 'akey', where a is for authentication or access.


- James
--
James Morris
<[email protected]>



2004-09-14 15:43:07

by David Howells

Subject: Re: 2.6.9-rc1-mm5


> What about 'akey', where a is for authentication or access.

How about struct key_cookie? Though I think I like struct key_token better. I
like struct key even better though:-)

David

2004-09-14 15:47:00

by William Lee Irwin III

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Mon, 13 Sep 2004 19:31:14 -0700, William Lee Irwin III wrote:
>> -#define PID_MAX_LIMIT (4*1024*1024)
>> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)

On Tue, Sep 14, 2004 at 12:55:27PM +0200, Roger Luethi wrote:
> An architecture with sizeof(long) > 32? -- Most impressive.

Did the correction not arrive?


-- wli

2004-09-14 15:55:56

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Mon, Sep 13, 2004 at 09:47:48PM -0700, William Lee Irwin III wrote:
>> timer interrupt, usually at boot. The following patch attempts to
>> amortize the atomic operations done on the profile buffer to address
>> this stability concern. This patch has nothing to do with performance;

On Tue, Sep 14, 2004 at 01:34:19PM +0200, Andrea Arcangeli wrote:
> isn't it *much* simpler and much more efficient to just have a per-cpu
> idle function? I seriously doubt you'll get simultaneous collisions on
> anything but the 'halt' instruction in the idle function.

Sampling the profile buffer at regular intervals shows far fewer than
256 distinct functions hit in 1s intervals even with all cpus busy. As
for whether that would be sufficient, that will have to be answered by
those who reported the bug. I suppose to test whether things besides
idling do cause this problem, one would boot with a restricted
prof_cpu_mask, load all cpus on the machine, set the prof_cpu_mask to
unrestricted, and see if it livelocks before the load terminates.


-- wli

2004-09-14 15:59:29

by Roger Luethi

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Tue, 14 Sep 2004 08:41:44 -0700, William Lee Irwin III wrote:
> On Mon, 13 Sep 2004 19:31:14 -0700, William Lee Irwin III wrote:
> >> -#define PID_MAX_LIMIT (4*1024*1024)
> >> +#define PID_MAX_LIMIT (sizeof(long) > 32 ? 4*1024*1024 : PID_MAX_DEFAULT)
>
> On Tue, Sep 14, 2004 at 12:55:27PM +0200, Roger Luethi wrote:
> > An architecture with sizeof(long) > 32? -- Most impressive.
>
> Did the correction not arrive?

Must have missed it.

Roger

2004-09-14 16:18:11

by Jesse Barnes

Subject: Re: [profile] amortize atomic hit count increments

On Tuesday, September 14, 2004 9:05 am, Andrea Arcangeli wrote:
> It's probably worth measuring. The real bottleneck happens when all
> cpus try to get an exclusive lock on the same cacheline at the *same*
> time. 1 second is a pretty long time; if there's no contention on the
> cacheline, things are normally ok.

Right, we want to avoid that heavy contention.

> This is basically the same issue we had with RCU: all timers fired
> at the same wall clock time, and all of them tried to change bits in the
> same cacheline at the same time, a workload that collapses a
> 512-way machine ;). The profile timer is no different.
>
> Simply removing the idle time accounting would fix it. This cripples
> functionality a little bit, but it'll be a very good way to test whether
> my theory is correct, or whether you truly need some per-cpu logic in
> the profiler.
>
> You could also fake it: have a per-cpu counter only for the !current->pid
> (idle) case, and then once somebody reads /proc/profile, flush the total
> per-cpu count to the counter in the buffer that corresponds to the EIP
> of the idle func.
>
> Before deciding, I'd suggest having a look at how the patch below
> compares to your approach in performance terms.

It looks like the 512p we have here is pretty heavily reserved this week, so
I'm not sure if I'll be able to test this (someone else might, John?). I
think the balance we're looking for is between simplicity and non-brokenness.
Builtin profiling is *supposed* to be simple and dumb, and were it not for
the readprofile times, I'd say per-cpu would be the way to go just because it
retains the simplicity of the current approach while allowing it to work on
large machines (as well as limiting the performance impact of builtin
profiling in general). wli's approach seems like a reasonable tradeoff
though, assuming what you suggest doesn't work.

Thanks,
Jesse

2004-09-14 16:13:31

by Andrea Arcangeli

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 08:51:03AM -0700, William Lee Irwin III wrote:
> On Mon, Sep 13, 2004 at 09:47:48PM -0700, William Lee Irwin III wrote:
> >> timer interrupt, usually at boot. The following patch attempts to
> >> amortize the atomic operations done on the profile buffer to address
> >> this stability concern. This patch has nothing to do with performance;
>
> On Tue, Sep 14, 2004 at 01:34:19PM +0200, Andrea Arcangeli wrote:
> > isn't it *much* simpler and much more efficient to just have a per-cpu
> > idle function? I seriously doubt you'll get simultaneous collisions on
> > anything but the 'halt' instruction in the idle function.
>
> Sampling the profile buffer at regular intervals shows far fewer than
> 256 distinct functions hit in 1s intervals even with all cpus busy. As
> for whether that would be sufficient, that will have to be answered by
> those who reported the bug. I suppose to test whether things besides
> idling do cause this problem, one would boot with a restricted
> prof_cpu_mask, load all cpus on the machine, set the prof_cpu_mask to
> unrestricted, and see if it livelocks before the load terminates.

It's probably worth measuring. The real bottleneck happens when all
cpus try to get an exclusive lock on the same cacheline at the *same*
time. 1 second is a pretty long time; if there's no contention on the
cacheline, things are normally ok.

This is basically the same issue we had with RCU: all timers fired
at the same wall clock time, and all of them tried to change bits in the
same cacheline at the same time, a workload that collapses a
512-way machine ;). The profile timer is no different.

Simply removing the idle time accounting would fix it. This cripples
functionality a little bit, but it'll be a very good way to test whether
my theory is correct, or whether you truly need some per-cpu logic in
the profiler.

You could also fake it: have a per-cpu counter only for the !current->pid
(idle) case, and then once somebody reads /proc/profile, flush the total
per-cpu count to the counter in the buffer that corresponds to the EIP
of the idle func.

Before deciding, I'd suggest having a look at how the patch below
compares to your approach in performance terms.

--- sles/arch/ia64/kernel/time.c.~1~ 2004-08-25 02:47:33.000000000 +0200
+++ sles/arch/ia64/kernel/time.c 2004-09-14 18:01:39.792182008 +0200
@@ -206,6 +206,9 @@ ia64_do_profile (struct pt_regs * regs)
if (!prof_buffer)
return;

+ if (!current->pid)
+ return;
+
ip = instruction_pointer(regs);
/* Conserve space in histogram by encoding slot bits in address
* bits 2 and 3 rather than bits 0 and 1.

2004-09-14 17:23:16

by Roger Luethi

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Tue, 14 Sep 2004 09:41:57 -0700, William Lee Irwin III wrote:
> Please check to see that the above message arrived.

It's in the archive. Sorry for the noise.

Roger

2004-09-14 17:10:26

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 06:31:43PM +0200, Andrea Arcangeli wrote:
> per-cpu certainly sounds simple enough conceptually, so if you can
> notice any slowdown even with the idle loop ruled out, per-cpu is surely
> better.
> This bouncing is likely to hurt smaller SMP too (but when the cpu is
> idle it's normally not too bad, since it only hurts reschedule
> latency: we remain stuck in the timer irq for a bit longer than we
> should), but duplicating the array's RAM there doesn't look as nice
> as it would on the Altix; not all SMP machines have tons of RAM. So an
> intermediate solution for this problem still sounds worthwhile for the
> normal SMP case.

Could you clarify whether you deem the per-cpu-hashtable-based
amortization acceptable, or whether your objection refers to per-cpu
profile buffers? I devised the hashtables to address space footprint
concerns, so I'm in a pickle if both have pending objections.

Thanks.


-- wli

2004-09-14 19:04:41

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tuesday, September 14, 2004 9:05 am, Andrea Arcangeli wrote:
>> Before deciding, I'd suggest having a look at how the patch below
>> compares to your approach in performance terms.

On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
> It looks like the 512p we have here is pretty heavily reserved this week, so
> I'm not sure if I'll be able to test this (someone else might, John?). I
> think the balance we're looking for is between simplicity and
> non-brokenness. Builtin profiling is *supposed* to be simple and dumb,
> and were it not for the readprofile times, I'd say per-cpu would be
> the way to go just because it retains the simplicity of the current
> approach while allowing it to work on large machines (as well as
> limiting the performance impact of builtin profiling in general).
> wli's approach seems like a reasonable tradeoff though, assuming what
> you suggest doesn't work.

Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
hotplug-ize this?


-- wli

Atop the prior per-cpu hashtable patch. It turns out that ia64 limits
the size of the per-cpu area to what a single TLB entry covers, and
worse yet, since the short-format VHPT is in use, that TLB entry is
limited to the PAGE_SIZE of the region used for kernel data.

In order to address this, the following patch dynamically allocates
the per-cpu hashtables at boot time. It probably needs adjustments
for cpu hotplug.


Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 01:27:49.675716672 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 10:20:43.589942872 -0700
@@ -34,7 +34,7 @@
static int prof_on;
static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
#ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct profile_hit [2][NR_PROFILE_HIT], cpu_profile_hits);
+static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
static DEFINE_PER_CPU(int, cpu_profile_flip);
#endif /* CONFIG_SMP */

@@ -273,6 +273,10 @@
secondary = ~(pc << 1) & (NR_PROFILE_HIT - 1);
cpu = get_cpu();
hits = per_cpu(cpu_profile_hits, cpu)[per_cpu(cpu_profile_flip, cpu)];
+ if (!hits) {
+ put_cpu();
+ return;
+ }
local_irq_save(flags);
do {
if (hits[i].pc == pc) {
@@ -423,17 +427,58 @@
.write = write_profile,
};

+#ifdef CONFIG_SMP
+static void __init profile_nop(void *unused)
+{
+}
+#endif
+
static int __init create_proc_profile(void)
{
struct proc_dir_entry *entry;
+ int cpu;

+ (void)cpu;
if (!prof_on)
return 0;
+#ifdef CONFIG_SMP
+ for_each_online_cpu(cpu) {
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (struct profile_hit *)get_zeroed_page(GFP_KERNEL);
+ if (!per_cpu(cpu_profile_hits, cpu)[0])
+ goto out_cleanup;
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (struct profile_hit *)get_zeroed_page(GFP_KERNEL);
+ if (per_cpu(cpu_profile_hits, cpu)[1])
+ continue;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[0]);
+ goto out_cleanup;
+ }
+#endif /* CONFIG_SMP */
if (!(entry = create_proc_entry("profile", S_IWUSR | S_IRUGO, NULL)))
return 0;
entry->proc_fops = &proc_profile_operations;
entry->size = (1+prof_len) * sizeof(atomic_t);
return 0;
+#ifdef CONFIG_SMP
+out_cleanup:
+ prof_on = 0;
+ mb();
+ on_each_cpu(profile_nop, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ unsigned long kvaddr
+ = (unsigned long)per_cpu(cpu_profile_hits, cpu)[0];
+
+ if (!kvaddr)
+ break;
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ free_page(kvaddr);
+ kvaddr = (unsigned long)per_cpu(cpu_profile_hits, cpu)[1];
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ free_page(kvaddr);
+ }
+ return -1;
+#endif /* CONFIG_SMP */
}
module_init(create_proc_profile);
#endif /* CONFIG_PROC_FS */
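The allocate-or-roll-back pattern in create_proc_profile() can be modelled in user space. This is a sketch under stated assumptions: calloc() stands in for get_zeroed_page(), a plain array indexed by CPU number stands in for DEFINE_PER_CPU, and NR_CPUS_DEMO, PAGE_BYTES, alloc_profile_buffers() and free_profile_buffers() are hypothetical names. The point it illustrates is that only a pair of pointers now lives in the per-cpu area, so that area no longer grows with NR_PROFILE_HIT.

```c
#include <stdlib.h>

#define NR_CPUS_DEMO 4     /* hypothetical CPU count */
#define PAGE_BYTES   4096  /* stands in for one get_zeroed_page() */

/* Each per-CPU slot holds two pointers, not two full hit arrays. */
static void *cpu_profile_hits[NR_CPUS_DEMO][2];

/* Release every buffer that exists; safe on partially allocated state. */
static void free_profile_buffers(void)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
		for (int i = 0; i < 2; i++) {
			free(cpu_profile_hits[cpu][i]);  /* free(NULL) is a no-op */
			cpu_profile_hits[cpu][i] = NULL;
		}
	}
}

/* Allocate both zeroed buffers for CPUs 0..ncpus-1; undo all on failure. */
static int alloc_profile_buffers(int ncpus)
{
	for (int cpu = 0; cpu < ncpus && cpu < NR_CPUS_DEMO; cpu++) {
		for (int i = 0; i < 2; i++) {
			cpu_profile_hits[cpu][i] = calloc(1, PAGE_BYTES);
			if (!cpu_profile_hits[cpu][i]) {
				free_profile_buffers();
				return -1;
			}
		}
	}
	return 0;
}
```

Unlike the patch, which stops its cleanup loop at the first CPU without a buffer, the sketch simply visits every slot and relies on free(NULL) being harmless. The patch also clears prof_on and runs an empty function on every CPU via on_each_cpu() before freeing, presumably so that no CPU is still inside profile_hit() using a buffer; the single-threaded sketch has no equivalent step.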

2004-09-14 20:14:18

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
>> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
>> hotplug-ize this?

On Tue, Sep 14, 2004 at 01:02:20PM -0700, William Lee Irwin III wrote:
> Okay, here's an attempt to hotplug-ize it. I have no clue whether this
> actually works, compiles, or follows whatever rules there are about
> dynamically allocated data referenced by per_cpu areas.

Take 2: actually register the notifier I wrote.


Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 10:20:43.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 12:56:33.871160032 -0700
@@ -20,6 +20,7 @@
#include <linux/notifier.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/cpu.h>
#include <linux/profile.h>
#include <asm/sections.h>
#include <asm/semaphore.h>
@@ -297,6 +298,44 @@
local_irq_restore(flags);
put_cpu();
}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int __devinit profile_cpu_callback(struct notifier_block *info,
+ unsigned long action, void *__cpu)
+{
+ int cpu = (unsigned long)__cpu;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ per_cpu(cpu_profile_flip, cpu) = 0;
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ return NOTIFY_BAD;
+ if (!per_cpu(cpu_profile_hits, cpu)[0])
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (per_cpu(cpu_profile_hits, cpu)[0])
+ break;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ return NOTIFY_BAD;
+ break;
+ case CPU_ONLINE:
+ cpu_set(cpu, prof_cpu_mask);
+ break;
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ cpu_clear(cpu, prof_cpu_mask);
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[0]);
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ break;
+ }
+ return NOTIFY_OK;
+}
+#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)

@@ -459,6 +498,7 @@
return 0;
entry->proc_fops = &proc_profile_operations;
entry->size = (1+prof_len) * sizeof(atomic_t);
+ hotcpu_notifier(profile_cpu_callback, 0);
return 0;
#ifdef CONFIG_SMP
out_cleanup:

2004-09-14 20:08:08

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
>> It looks like the 512p we have here is pretty heavily reserved this
>> week, so I'm not sure if I'll be able to test this (someone else
>> might, John?). I think the balance we're looking for is between
>> simplicity and non-brokenness. Builtin profiling is *supposed* to be
>> simple and dumb, and were it not for the readprofile times, I'd say
>> per-cpu would be the way to go just because it retains the simplicity
>> of the current approach while allowing it to work on large machines
>> (as well as limiting the performance impact of builtin profiling in
>> general). wli's approach seems like a reasonable tradeoff though,
>> assuming what you suggest doesn't work.

On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
> hotplug-ize this?

Okay, here's an attempt to hotplug-ize it. I have no clue whether this
actually works, compiles, or follows whatever rules there are about
dynamically allocated data referenced by per_cpu areas.


-- wli

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 10:20:43.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 12:52:16.064352624 -0700
@@ -20,6 +20,7 @@
#include <linux/notifier.h>
#include <linux/mm.h>
#include <linux/cpumask.h>
+#include <linux/cpu.h>
#include <linux/profile.h>
#include <asm/sections.h>
#include <asm/semaphore.h>
@@ -297,6 +298,44 @@
local_irq_restore(flags);
put_cpu();
}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int __devinit profile_cpu_callback(struct notifier_block *info,
+ unsigned long action, void *__cpu)
+{
+ int cpu = (unsigned long)__cpu;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ per_cpu(cpu_profile_flip, cpu) = 0;
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ per_cpu(cpu_profile_hits, cpu)[1]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!per_cpu(cpu_profile_hits, cpu)[1])
+ return NOTIFY_BAD;
+ if (!per_cpu(cpu_profile_hits, cpu)[0])
+ per_cpu(cpu_profile_hits, cpu)[0]
+ = (void *)get_zeroed_page(GFP_KERNEL);
+ if (per_cpu(cpu_profile_hits, cpu)[0])
+ break;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ return NOTIFY_BAD;
+ break;
+ case CPU_ONLINE:
+ cpu_set(cpu, prof_cpu_mask);
+ break;
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ cpu_clear(cpu, prof_cpu_mask);
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[0]);
+ per_cpu(cpu_profile_hits, cpu)[0] = NULL;
+ free_page((unsigned long)per_cpu(cpu_profile_hits, cpu)[1]);
+ per_cpu(cpu_profile_hits, cpu)[1] = NULL;
+ break;
+ }
+ return NOTIFY_OK;
+}
+#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
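The callback's life cycle can be sketched as a small state machine in user space. Assumptions: the enum values below are local stand-ins for the kernel's CPU_UP_PREPARE, CPU_ONLINE, CPU_UP_CANCELED and CPU_DEAD notifications, calloc()/free() stand in for get_zeroed_page()/free_page(), a bitmask stands in for prof_cpu_mask, and every identifier is hypothetical.

```c
#include <stdlib.h>

#define NR_CPUS_DEMO 4     /* hypothetical CPU count */
#define PAGE_BYTES   4096  /* stands in for one get_zeroed_page() */

/* Local stand-ins for hotplug notifications and notifier results. */
enum demo_action { UP_PREPARE, ONLINE, UP_CANCELED, DEAD };
enum { DEMO_OK = 0, DEMO_BAD = 1 };

static void *hits[NR_CPUS_DEMO][2];  /* two hit buffers per CPU */
static int flip[NR_CPUS_DEMO];       /* index of the live buffer */
static unsigned long prof_mask;      /* bit n set => CPU n is profiled */

/* Free both buffers of one CPU and clear the pointers so nothing dangles. */
static void release_cpu_buffers(int cpu)
{
	for (int i = 0; i < 2; i++) {
		free(hits[cpu][i]);
		hits[cpu][i] = NULL;
	}
}

static int profile_cpu_event(enum demo_action action, int cpu)
{
	switch (action) {
	case UP_PREPARE:
		/* Allocate before the CPU can take its first profiling tick. */
		flip[cpu] = 0;
		for (int i = 0; i < 2; i++) {
			if (!hits[cpu][i])
				hits[cpu][i] = calloc(1, PAGE_BYTES);
			if (!hits[cpu][i]) {
				release_cpu_buffers(cpu);
				return DEMO_BAD;       /* veto the bring-up */
			}
		}
		break;
	case ONLINE:
		prof_mask |= 1UL << cpu;       /* start accepting hits */
		break;
	case UP_CANCELED:
	case DEAD:
		prof_mask &= ~(1UL << cpu);    /* stop accepting hits first */
		release_cpu_buffers(cpu);
		break;
	}
	return DEMO_OK;
}
```

One design point the sketch makes explicit: every free is followed by clearing the pointer. In the posted hunk, the CPU_UP_PREPARE failure path frees buffer [1] without setting it to NULL, and it appears that the CPU_UP_CANCELED notification sent after a vetoed bring-up would then free that page a second time.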

2004-09-14 20:19:28

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tuesday, September 14, 2004 9:05 am, Andrea Arcangeli wrote:
>>> Before deciding I'd suggest having a look and seeing how the below patch
>>> compares to your approach in performance terms.

On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
>> It looks like the 512p we have here is pretty heavily reserved this
>> week, so I'm not sure if I'll be able to test this (someone else
>> might, John?). I think the balance we're looking for is between
>> simplicity and non-brokenness. Builtin profiling is *supposed* to be
>> simple and dumb, and were it not for the readprofile times, I'd say
>> per-cpu would be the way to go just because it retains the
>> simplicity of the current approach while allowing it to work on
>> large machines (as well as limiting the performance impact of
>> builtin profiling in general). wli's approach seems like a
>> reasonable tradeoff though, assuming what you suggest doesn't work.

On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
> hotplug-ize this?

Successfully tested on x86-64.

-- wli

2004-09-14 21:14:07

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 12:00:30PM -0700, William Lee Irwin III wrote:
>>> Goddamn fscking short-format VHPT crap. Rusty, how the hell do I
>>> hotplug-ize this?

On Tue, Sep 14, 2004 at 01:02:20PM -0700, William Lee Irwin III wrote:
>> Okay, here's an attempt to hotplug-ize it. I have no clue whether this
>> actually works, compiles, or follows whatever rules there are about
>> dynamically allocated data referenced by per_cpu areas.

On Tue, Sep 14, 2004 at 01:04:53PM -0700, William Lee Irwin III wrote:
> Take 2: actually register the notifier I wrote.

As pointed out by John Hawkes, I forgot to flush the pending hits at
the time of profile buffer reset. The following patch, atop the cpu
hotplug notifier bits, does so.

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 12:56:33.871160032 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 13:43:55.826117208 -0700
@@ -37,6 +37,7 @@
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
static DEFINE_PER_CPU(int, cpu_profile_flip);
+static DECLARE_MUTEX(profile_flip_mutex);
#endif /* CONFIG_SMP */

static int __init profile_setup(char * str)
@@ -242,7 +243,6 @@

static void profile_flip_buffers(void)
{
- static DECLARE_MUTEX(profile_flip_mutex);
int i, j, cpu;

down(&profile_flip_mutex);
@@ -261,6 +261,22 @@
up(&profile_flip_mutex);
}

+static void profile_discard_flip_buffers(void)
+{
+ static DECLARE_MUTEX(profile_flip_mutex);
+ int i, cpu;
+
+ down(&profile_flip_mutex);
+ i = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
void profile_hit(int type, void *__pc)
{
unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
@@ -338,6 +354,7 @@
#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
+#define profile_discard_flip_buffers() do { } while (0)

void profile_hit(int type, void *__pc)
{
@@ -456,7 +473,7 @@
return -EINVAL;
}
#endif
-
+ profile_discard_flip_buffers();
memset(prof_buffer, 0, prof_len * sizeof(atomic_t));
return count;
}

2004-09-14 21:20:25

by William Lee Irwin III

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 01:04:53PM -0700, William Lee Irwin III wrote:
>> Take 2: actually register the notifier I wrote.

On Tue, Sep 14, 2004 at 02:04:22PM -0700, William Lee Irwin III wrote:
> As pointed out by John Hawkes, I forgot to flush the pending hits at
> the time of profile buffer reset. The following patch, atop the cpu
> hotplug notifier bits, does so.

Repost with corrected patch.


As pointed out by John Hawkes, I forgot to flush the pending hits at
the time of profile buffer reset. The following patch, atop the cpu
hotplug notifier bits, does so.

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-14 13:46:05.151456768 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-14 14:03:01.854894352 -0700
@@ -37,6 +37,7 @@
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits);
static DEFINE_PER_CPU(int, cpu_profile_flip);
+static DECLARE_MUTEX(profile_flip_mutex);
#endif /* CONFIG_SMP */

static int __init profile_setup(char * str)
@@ -242,7 +243,6 @@

static void profile_flip_buffers(void)
{
- static DECLARE_MUTEX(profile_flip_mutex);
int i, j, cpu;

down(&profile_flip_mutex);
@@ -261,6 +261,21 @@
up(&profile_flip_mutex);
}

+static void profile_discard_flip_buffers(void)
+{
+ int i, cpu;
+
+ down(&profile_flip_mutex);
+ i = per_cpu(cpu_profile_flip, get_cpu());
+ put_cpu();
+ on_each_cpu(__profile_flip_buffers, NULL, 0, 1);
+ for_each_online_cpu(cpu) {
+ struct profile_hit *hits = per_cpu(cpu_profile_hits, cpu)[i];
+ memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+ }
+ up(&profile_flip_mutex);
+}
+
void profile_hit(int type, void *__pc)
{
unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
@@ -338,6 +353,7 @@
#endif /* CONFIG_HOTPLUG_CPU */
#else /* !CONFIG_SMP */
#define profile_flip_buffers() do { } while (0)
+#define profile_discard_flip_buffers() do { } while (0)

void profile_hit(int type, void *__pc)
{
@@ -456,7 +472,7 @@
return -EINVAL;
}
#endif
-
+ profile_discard_flip_buffers();
memset(prof_buffer, 0, prof_len * sizeof(atomic_t));
return count;
}
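The reset path relies on the two-buffer flip: each CPU records hits into buffer flip[cpu], a flip toggles that index on every CPU, and the buffer that was live before the flip is then zeroed instead of being folded into the histogram. A single-threaded sketch, assuming a plain loop in place of on_each_cpu(__profile_flip_buffers, ...) and no locking; NCPU, NHIT, struct hit, flip_all() and discard_pending() are hypothetical.

```c
#include <string.h>

#define NCPU 2   /* hypothetical CPU count */
#define NHIT 8   /* hypothetical entries per hit buffer */

struct hit { unsigned long pc; unsigned int hits; };

static struct hit bufs[NCPU][2][NHIT];
static int flip[NCPU];  /* all CPUs flip together, so these stay equal */

/* Stand-in for running __profile_flip_buffers on each CPU. */
static void flip_all(void)
{
	for (int cpu = 0; cpu < NCPU; cpu++)
		flip[cpu] = !flip[cpu];
}

/* Drop hits gathered since the last flip instead of flushing them. */
static void discard_pending(void)
{
	int old = flip[0];  /* index that was live before the flip */

	flip_all();         /* new hits now land in the other buffer */
	for (int cpu = 0; cpu < NCPU; cpu++)
		memset(bufs[cpu][old], 0, sizeof(bufs[cpu][old]));
}
```

As I read John Hawkes's point, without this step hits gathered before a write to /proc/profile would survive the reset and later be flushed into the freshly zeroed prof_buffer.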

2004-09-14 17:02:47

by William Lee Irwin III

Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines

On Tue, Sep 14, 2004 at 12:55:27PM +0200, Roger Luethi wrote:
>>> An architecture with sizeof(long) > 32? -- Most impressive.

On Tue, 14 Sep 2004 08:41:44 -0700, William Lee Irwin III wrote:
>> Did the correction not arrive?

On Tue, Sep 14, 2004 at 05:47:50PM +0200, Roger Luethi wrote:
> Must have missed it.

Date: Mon, 13 Sep 2004 19:38:30 -0700
From: William Lee Irwin III <[email protected]>
To: Andrew Morton <[email protected]>
Cc: [email protected], Albert Cahalan <[email protected]>
Subject: Re: [pidhashing] [2/3] lower PID_MAX_LIMIT for 32-bit machines
Message-ID: <[email protected]>

Please check to see that the above message arrived.

Thanks.


-- wli

2004-09-14 22:03:55

by Jesse Barnes

Subject: Re: 2.6.9-rc1-mm5 bug in tcp_recvmsg?

On Monday, September 13, 2004 4:55 pm, David S. Miller wrote:
> diff -Nru a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> --- a/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
> +++ b/net/sched/sch_generic.c 2004-09-13 16:38:39 -07:00
> @@ -148,8 +148,10 @@
> spin_lock(&dev->queue_lock);
> return -1;
> }
> - if (ret == NETDEV_TX_LOCKED && nolock)
> + if (ret == NETDEV_TX_LOCKED && nolock) {
> + spin_lock(&dev->queue_lock);
> goto collision;
> + }
> }
>
> /* NETDEV_TX_BUSY - we need to requeue */

Ok, is *this* the sort of thing you'd expect this patch to fix? I've seen it
on a couple of different machines now (one 32p and one 8p), but I haven't
seen it since applying the above to the BK tree as of this morning. Either
way, I'll keep pounding on different machines using the BK tree + your patch
to see what problems I run into.

Thanks,
Jesse

bad: scheduling while atomic!

Call Trace:
[<a000000100017320>] show_stack+0x80/0xa0
sp=e00002bc38dffbd0 bsp=e00002bc38df9250
[<a000000100017370>] dump_stack+0x30/0x60
sp=e00002bc38dffda0 bsp=e00002bc38df9238
[<a0000001006a7500>] schedule+0x1160/0x1520
sp=e00002bc38dffda0 bsp=e00002bc38df9128
[<a0000001006a8430>] schedule_timeout+0xf0/0x200
sp=e00002bc38dffdc0 bsp=e00002bc38df90f0
[<a000000100192e40>] sys_poll+0x520/0x7c0
sp=e00002bc38dffe00 bsp=e00002bc38df9018
[<a00000010000f4c0>] ia64_ret_from_syscall+0x0/0x20
sp=e00002bc38dffe30 bsp=e00002bc38df8fd8
Warning: kfree_skb on hard IRQ a0000001005dcba0
bad: scheduling while atomic!

Call Trace:
[<a000000100017320>] show_stack+0x80/0xa0
sp=e00002bc38dffc40 bsp=e00002bc38df9100
[<a000000100017370>] dump_stack+0x30/0x60
sp=e00002bc38dffe10 bsp=e00002bc38df90e8
[<a0000001006a7500>] schedule+0x1160/0x1520
sp=e00002bc38dffe10 bsp=e00002bc38df8fd8
[<a00000010000fa20>] skip_rbs_switch+0x90/0xf0
sp=e00002bc38dffe30 bsp=e00002bc38df8fd8
Unable to handle kernel paging request at virtual address 20000000001bcab0
ls[11638]: Oops 4294967296 [1]
Modules linked in:

Pid: 11638, CPU 2, comm: ls
psr : 00001013081a6018 ifs : 8000000000000003 ip : [<20000000001bcab0>]
Not tainted
ip is at 0x20000000001bcab0
unat: 0000000000000000 pfs : c000000000000207 rsc : 000000000000000f
rnat: 0000000000000000 bsps: 60000fff7fffc3f0 pr : 0000000005a6a9e9
ldrs: 0000000000880000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : 20000000001b9390 b6 : 20000000001b9360 b7 : 2000000000173d30
f6 : 1003ecccccccccccccccd f7 : 1003e0000000000000007
f8 : 1000c9404000000000000 f9 : 0ffff8000000000000000
f10 : 1003e0000000000002501 f11 : 1000c9403fffff6bfc000
r1 : 200000000029c200 r2 : 2000000000304e58 r3 : 60000fffffffafb0
r8 : 0000000000000009 r9 : 00000000fbad8001 r10 : 0000000000000c00
r11 : 20000000002ffa98 r12 : 60000fffffffafb0 r13 : 2000000000081de0
r14 : 20000000000a9238 r15 : 20000000000a9240 r16 : 000000000011d360
r17 : 20000000001b9360 r18 : 200000000009c000 r19 : 200000000009c228
r20 : 0000000200000000 r21 : 0000000100000000 r22 : 0000000000000000
r23 : 200000000003e16c r24 : 4000000000001bd0 r25 : 200000000003e0d0
r26 : 6000000000001da8 r27 : 200000000029c200 r28 : 20000000003008d0
r29 : 20000000003008c8 r30 : 20000000000804c8 r31 : 000000000000142b
r32 : 0000000000000000 r33 : 60000fffffffb42c r34 : 400000000000ba00
Kernel panic - not syncing: Aiee, killing interrupt handler!
Rebooting in 5 seconds..

2004-09-14 17:02:47

by Andrea Arcangeli

Subject: Re: [profile] amortize atomic hit count increments

On Tue, Sep 14, 2004 at 09:16:48AM -0700, Jesse Barnes wrote:
> the readprofile times, I'd say per-cpu would be the way to go just because it
> retains the simplicity of the current approach while allowing it to work on
> large machines (as well as limiting the performance impact of builtin
> profiling in general). wli's approach seems like a reasonable tradeoff
> though, assuming what you suggest doesn't work.

per-cpu certainly sounds simple enough conceptually, so if you can
notice any slowdown even with the idle loop ruled out, per-cpu is surely
better.

This bouncing is likely to hurt smaller SMP too (though once the cpu is
idle it's normally not too bad, since it only hurts reschedule latency:
we remain stuck in the timer irq for a bit longer than we should), but
duplicating the array's RAM there doesn't look as nice as it would on
the Altix; not all SMP machines have tons of RAM. So an intermediate
solution to this problem still sounds worthwhile for the normal SMP
case.
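A rough way to quantify the RAM concern: the shared histogram is (1 + prof_len) * sizeof(atomic_t) bytes (the size reported for /proc/profile), so full per-cpu copies multiply that by the number of CPUs. The sketch assumes prof_len = text_bytes >> prof_shift and a 4-byte atomic_t; the function name and the example figures are hypothetical.

```c
#include <stddef.h>

/* Bytes needed if every CPU kept a private copy of the histogram. */
static size_t percpu_profile_bytes(size_t text_bytes, unsigned prof_shift,
				   unsigned ncpus)
{
	size_t prof_len = text_bytes >> prof_shift;  /* histogram entries */
	size_t one_copy = (1 + prof_len) * 4;        /* assumes 4-byte atomic_t */

	return one_copy * ncpus;
}
```

With a hypothetical 4 MiB kernel text and prof_shift 2, one copy is about 4 MiB, a 4-way box would spend about 16 MiB on copies, and a 512p machine about 2 GiB.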

2004-09-15 10:52:21

by William Lee Irwin III

Subject: [procfs] [2/1] report per-process pagetable usage

On Mon, Sep 13, 2004 at 07:53:04PM -0700, William Lee Irwin III wrote:
>> Not all binfmts page align ->end_code and ->start_code, so the task_mmu
>> statistics calculations need to perform this allocation themselves.

On Mon, Sep 13, 2004 at 07:54:58PM -0700, William Lee Irwin III wrote:
> s/allocation/alignment/

Andi Kleen requested that the number of pagetable pages in use by a
process be reported in /proc/$PID/status; this patch implements that.
Atop the text reporting fix. Compiletested on x86-64.

Index: mm5-2.6.9-rc1/arch/i386/mm/hugetlbpage.c
===================================================================
--- mm5-2.6.9-rc1.orig/arch/i386/mm/hugetlbpage.c 2004-08-13 22:37:42.000000000 -0700
+++ mm5-2.6.9-rc1/arch/i386/mm/hugetlbpage.c 2004-09-15 03:31:26.914794288 -0700
@@ -247,6 +247,7 @@

page = pmd_page(*pmd);
pmd_clear(pmd);
+ mm->nr_ptes--;
dec_page_state(nr_page_table_pages);
page_cache_release(page);
}
Index: mm5-2.6.9-rc1/arch/ppc64/mm/hugetlbpage.c
===================================================================
--- mm5-2.6.9-rc1.orig/arch/ppc64/mm/hugetlbpage.c 2004-09-13 16:27:32.000000000 -0700
+++ mm5-2.6.9-rc1/arch/ppc64/mm/hugetlbpage.c 2004-09-15 03:32:25.375906848 -0700
@@ -213,6 +213,7 @@
}
page = pmd_page(*pmd);
pmd_clear(pmd);
+ mm->nr_ptes--;
dec_page_state(nr_page_table_pages);
pte_free_tlb(tlb, page);
}
Index: mm5-2.6.9-rc1/fs/proc/task_mmu.c
===================================================================
--- mm5-2.6.9-rc1.orig/fs/proc/task_mmu.c 2004-09-13 19:43:19.000000000 -0700
+++ mm5-2.6.9-rc1/fs/proc/task_mmu.c 2004-09-15 03:42:42.746052320 -0700
@@ -18,12 +18,14 @@
"VmData:\t%8lu kB\n"
"VmStk:\t%8lu kB\n"
"VmExe:\t%8lu kB\n"
- "VmLib:\t%8lu kB\n",
+ "VmLib:\t%8lu kB\n"
+ "VmPTE:\t%8lu kB\n",
(mm->total_vm - mm->reserved_vm) << (PAGE_SHIFT-10),
mm->locked_vm << (PAGE_SHIFT-10),
mm->rss << (PAGE_SHIFT-10),
data << (PAGE_SHIFT-10),
- mm->stack_vm << (PAGE_SHIFT-10), text, lib);
+ mm->stack_vm << (PAGE_SHIFT-10), text, lib,
+ (PTRS_PER_PTE*sizeof(pte_t)*mm->nr_ptes) >> 10);
return buffer;
}

Index: mm5-2.6.9-rc1/include/linux/sched.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/sched.h 2004-09-14 14:44:05.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/sched.h 2004-09-15 03:22:38.650102728 -0700
@@ -227,7 +227,7 @@
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
unsigned long rss, total_vm, locked_vm, shared_vm;
- unsigned long exec_vm, stack_vm, reserved_vm, def_flags;
+ unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes;

unsigned long saved_auxv[42]; /* for /proc/PID/auxv */

Index: mm5-2.6.9-rc1/kernel/fork.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/fork.c 2004-09-14 14:45:49.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/fork.c 2004-09-15 03:23:33.238803984 -0700
@@ -308,6 +308,7 @@
atomic_set(&mm->mm_count, 1);
init_rwsem(&mm->mmap_sem);
mm->core_waiters = 0;
+ mm->nr_ptes = 0;
mm->page_table_lock = SPIN_LOCK_UNLOCKED;
mm->ioctx_list_lock = RW_LOCK_UNLOCKED;
mm->ioctx_list = NULL;
Index: mm5-2.6.9-rc1/mm/memory.c
===================================================================
--- mm5-2.6.9-rc1.orig/mm/memory.c 2004-09-13 16:27:46.000000000 -0700
+++ mm5-2.6.9-rc1/mm/memory.c 2004-09-15 03:30:32.241105952 -0700
@@ -114,6 +114,7 @@
page = pmd_page(*dir);
pmd_clear(dir);
dec_page_state(nr_page_table_pages);
+ tlb->mm->nr_ptes--;
pte_free_tlb(tlb, page);
}

@@ -163,7 +164,6 @@
spin_lock(&mm->page_table_lock);
if (!new)
return NULL;
-
/*
* Because we dropped the lock, we should re-check the
* entry, as somebody else could have populated it..
@@ -172,6 +172,7 @@
pte_free(new);
goto out;
}
+ mm->nr_ptes++;
inc_page_state(nr_page_table_pages);
pmd_populate(mm, pmd, new);
}
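The new VmPTE line reports nr_ptes * PTRS_PER_PTE * sizeof(pte_t) bytes shifted down to kB, with nr_ptes raised where a pte page is populated and lowered where one is cleared. A user-space sketch, assuming x86-64-like constants (512 entries of 8 bytes, so each pte table is one 4 kB page); struct demo_mm and the helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PTRS_PER_PTE 512   /* assumption: x86-64 value */
typedef uint64_t demo_pte_t;    /* assumption: 8-byte pte */

struct demo_mm { unsigned long nr_ptes; };

/* Mirrors mm->nr_ptes++ next to pmd_populate(). */
static void pte_page_added(struct demo_mm *mm)   { mm->nr_ptes++; }
/* Mirrors mm->nr_ptes-- next to pmd_clear(). */
static void pte_page_removed(struct demo_mm *mm) { mm->nr_ptes--; }

/* Value printed on the VmPTE line, in kB. */
static unsigned long vm_pte_kb(const struct demo_mm *mm)
{
	return (DEMO_PTRS_PER_PTE * sizeof(demo_pte_t) * mm->nr_ptes) >> 10;
}
```

On i386 without PAE the product comes out the same, 4 kB per pte table (1024 entries of 4 bytes).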

2004-09-15 11:36:59

by William Lee Irwin III

Subject: Re: 2.6.9-rc1-mm5

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> +cfq-iosched-v2.patch
> Major revamp of the CFQ IO scheduler

While editing some files while booted into 2.6.9-rc1-mm5:

# ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cfq_iosched:1359
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: st sr_mod floppy usbserial parport_pc lp parport snd_seq_oss snd_seq_device snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_ioctl32 thermal processor fan button battery snd_intel8x0 snd_ac97_codec snd_pcm snd_timer ipv6 ac snd soundcore snd_page_alloc af_packet joydev usbhid ehci_hcd e1000 uhci_hcd usbcore hw_random evdev dm_mod ext3 jbd aic79xx ata_piix libata sd_mod scsi_mod
Pid: 9615, comm: cc1 Not tainted 2.6.9-rc1-mm5
RIP: 0010:[<ffffffff80290ab6>] <ffffffff80290ab6>{cfq_put_request+166}
RSP: 0000:ffffffff804c8638 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 000001017e2c3b80 RCX: 00000000000049f2
RDX: 0000000000000001 RSI: 000001017e75cd10 RDI: 000001000b5d57c0
RBP: 000001017e75cd10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 000001016d1b3db0
R13: 000001017d142c08 R14: 000001017fff1400 R15: 0000000000000001
FS: 0000002a9588d6e0(0000) GS:ffffffff8055c880(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000548000 CR3: 0000000000101000 CR4: 00000000000006e0
Process cc1 (pid: 9615, threadinfo 000001011f720000, task 000001012d4897e0)
Stack: 0000000000001000 000001016d1b3db0 000001017e2c3b80 0000000000000001
0000000000000001 000001017e2c3b80 0000000000000200 ffffffff8028527f
0000010163320300 ffffffff80287bfb
Call Trace:<IRQ> <ffffffff8028527f>{elv_put_request+15} <ffffffff80287bfb>{__blk_put_request+139}
<ffffffff80287d33>{end_that_request_last+243} <ffffffffa0006178>{:scsi_mod:scsi_end_request+200}
<ffffffffa00063f0>{:scsi_mod:scsi_io_completion+576}
<ffffffffa0000506>{:scsi_mod:scsi_finish_command+214}
<ffffffffa0000e4a>{:scsi_mod:scsi_softirq+234} <ffffffff8013df61>{__do_softirq+113}
<ffffffff8013e015>{do_softirq+53} <ffffffff80113f1f>{do_IRQ+335}
<ffffffff80110c97>{ret_from_intr+0} <EOI>

Code: 0f 0b 26 9b 38 80 ff ff ff ff 4f 05 ff c8 41 89 44 95 58 0f
RIP <ffffffff80290ab6>{cfq_put_request+166} RSP <ffffffff804c8638>
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!


-- wli

2004-09-15 11:40:52

by Jens Axboe

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, William Lee Irwin III wrote:
> On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> > +cfq-iosched-v2.patch
> > Major revamp of the CFQ IO scheduler
>
> While editing some files while booted into 2.6.9-rc1-mm5:
>
> # ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at cfq_iosched:1359

Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
etc)?

--
Jens Axboe
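"->allocated is unbalanced" refers to a per-direction counter that should go up once when a request is set up and down once when it is put. A minimal user-space model of that invariant, with assert() standing in for a BUG_ON() before the decrement; struct demo_queue and both helpers are hypothetical and are not the cfq-iosched code.

```c
#include <assert.h>

/* Hypothetical queue with one counter per direction (0 = read, 1 = write). */
struct demo_queue { unsigned int allocated[2]; };

/* Called once when a request is handed out. */
static void demo_set_request(struct demo_queue *q, int rw)
{
	q->allocated[rw]++;
}

/* Called once when the request is freed; an unmatched put trips the check. */
static void demo_put_request(struct demo_queue *q, int rw)
{
	assert(q->allocated[rw] > 0);  /* stands in for a BUG_ON() */
	q->allocated[rw]--;
}
```

An extra put, or a put charged to a different direction than the matching set, drains the counter early and trips the check; the oops above, raised from cfq_put_request() in softirq completion context, would be consistent with that kind of imbalance.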

2004-09-15 12:29:34

by William Lee Irwin III

Subject: Re: 2.6.9-rc1-mm5

On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
>>> +cfq-iosched-v2.patch
>>> Major revamp of the CFQ IO scheduler

On Wed, Sep 15 2004, William Lee Irwin III wrote:
>> While editing some files while booted into 2.6.9-rc1-mm5:
>> # ----------- [cut here ] --------- [please bite here ] ---------
>> Kernel BUG at cfq_iosched:1359

On Wed, Sep 15, 2004 at 01:38:34PM +0200, Jens Axboe wrote:
> Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
> etc)?

2 Maxtor Atlas10K 10Krpm U320 disks attached to some aic7902's. No
binary or 3rd-party modules anywhere near the box's fs or even the
network the thing is on. lspci output follows.


-- wli

0000:00:00.0 Host bridge: Intel Corp. Workstation Memory Controller Hub (rev 08)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Capabilities: [40] #09 [a105]

0000:00:00.1 Class ff00: Intel Corp. Memory Controller Hub Error Reporting Register (rev 08)
Subsystem: Intel Corp. Memory Controller Hub Error Reporting Register
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

0000:00:03.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A1 (rev 08) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=00, secondary=02, subordinate=04, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa400000-fa8fffff
Prefetchable memory behind bridge: 00000000bfe00000-00000000bfe00000
Expansion ROM at 0000d000 [disabled] [size=4K]
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0141]

0000:00:04.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port B0 (rev 08) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fa900000-feafffff
Prefetchable memory behind bridge: 00000000bff00000-00000000dfe00000
BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
Address: fee00000 Data: 0000
Capabilities: [64] #10 [0141]

0000:00:08.0 System peripheral: Intel Corp. Memory Controller Hub Extended Configuration Registers (rev 08)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

0000:00:1d.0 USB Controller: Intel Corp. 82801EB USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 201
Region 4: I/O ports at e080 [size=32]

0000:00:1d.1 USB Controller: Intel Corp. 82801EB USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin B routed to IRQ 209
Region 4: I/O ports at e400 [size=32]

0000:00:1d.2 USB Controller: Intel Corp. 82801EB USB (rev 02) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin C routed to IRQ 169
Region 4: I/O ports at e480 [size=32]

0000:00:1d.7 USB Controller: Intel Corp. 82801EB USB2 (rev 02) (prog-if 20 [EHCI])
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin D routed to IRQ 193
Region 0: Memory at febff400 (32-bit, non-prefetchable)
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] #0a [20a0]

0000:00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB PCI Bridge (rev c2) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: fa300000-fa3fffff
Prefetchable memory behind bridge: fff00000-000fffff
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

0000:00:1f.0 ISA bridge: Intel Corp. 82801EB LPC Interface Controller (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

0000:00:1f.2 IDE interface: Intel Corp. 82801EB Ultra ATA Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: Intel Corp. 82801EB Ultra ATA Storage Controller
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 169
Region 0: I/O ports at <unassigned>
Region 1: I/O ports at <unassigned>
Region 2: I/O ports at <unassigned>
Region 3: I/O ports at <unassigned>
Region 4: I/O ports at fc00 [size=16]

0000:00:1f.3 SMBus: Intel Corp. 82801EB SMBus Controller (rev 02)
Subsystem: Intel Corp.: Unknown device 24d0
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 5
Region 4: I/O ports at e800 [size=32]

0000:00:1f.5 Multimedia audio controller: Intel Corp. 82801EB AC'97 Audio Controller (rev 02)
Subsystem: Intel Corp.: Unknown device e801
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin B routed to IRQ 217
Region 0: I/O ports at ec00
Region 1: I/O ports at e880 [size=64]
Region 2: Memory at febffc00 (32-bit, non-prefetchable) [size=512]
Region 3: Memory at febff800 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:01:02.0 Ethernet controller: Intel Corp. 82541GI Gigabit Ethernet Controller
Subsystem: Intel Corp.: Unknown device 3408
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), cache line size 10
Interrupt: pin A routed to IRQ 217
Region 0: Memory at fa3e0000 (32-bit, non-prefetchable) [size=180000000]
Region 1: Memory at fa3c0000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at cc80 [size=64]
Expansion ROM at 00020000 [disabled]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
0000:02:00.0 PCI bridge: Intel Corp.: Unknown device 0320 (rev 08) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=02, secondary=04, subordinate=04, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa400000-fa6fffff
Prefetchable memory behind bridge: 00000000bfe00000-00000000bfe00000
Expansion ROM at 0000d000 [disabled] [size=4K]
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d8]
0000:02:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A (rev 08) (prog-if 20 [IO(X)-APIC])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at fa8fe000 (32-bit, non-prefetchable)
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:02:00.2 PCI bridge: Intel Corp.: Unknown device 0321 (rev 08) (prog-if 00 [Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Bus: primary=02, secondary=03, subordinate=03, sec-latency=64
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000fff00000-0000000000000000
BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
Capabilities: [44] #10 [0071]
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d8]
0000:02:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller B (rev 08) (prog-if 20 [IO(X)-APIC])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: Memory at fa8ff000 (32-bit, non-prefetchable)
Capabilities: [44] #10 [0001]
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

0000:04:03.0 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
Subsystem: Adaptec: Unknown device ffff
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (10000ns min, 6250ns max), cache line size 10
Interrupt: pin A routed to IRQ 177
Region 0: I/O ports at d400 [size=180000000]
Region 1: Memory at fa6fc000 (64-bit, non-prefetchable) [disabled] [size=8K]
Region 3: I/O ports at d000 [size=256]
Expansion ROM at ffffffff3ff00000 [disabled]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [94]
0000:04:03.1 SCSI storage controller: Adaptec AIC-7902 U320 (rev 03)
Subsystem: Adaptec: Unknown device ffff
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (10000ns min, 6250ns max), cache line size 10
Interrupt: pin B routed to IRQ 185
Region 0: I/O ports at dc00 [size=180000000]
Region 1: Memory at fa6fe000 (64-bit, non-prefetchable) [disabled] [size=8K]
Region 3: I/O ports at d800 [size=256]
Expansion ROM at ffffffff3ff00000 [disabled]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [94]
0000:05:00.0 VGA compatible controller: nVidia Corporation: Unknown device 00fd (rev a2) (prog-if 00 [VGA])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, cache line size 10
Interrupt: pin A routed to IRQ 11
Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=feae0000]
Region 1: Memory at c0000000 (32-bit, prefetchable) [size=256M]
Region 2: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
Expansion ROM at 00020000 [disabled]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [78] #10 [0011]

2004-09-15 12:43:06

by Jens Axboe

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, William Lee Irwin III wrote:
> On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> >>> +cfq-iosched-v2.patch
> >>> Major revamp of the CFQ IO scheduler
>
> On Wed, Sep 15 2004, William Lee Irwin III wrote:
> >> While editing some files while booted into 2.6.9-rc1-mm5:
> >> # ----------- [cut here ] --------- [please bite here ] ---------
> >> Kernel BUG at cfq_iosched:1359
>
> On Wed, Sep 15, 2004 at 01:38:34PM +0200, Jens Axboe wrote:
> > Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
> > etc)?
>
> 2 Maxtor Atlas10K 10Krpm U320 disks attached to some aic7902's. No
> binary or 3rd-party modules anywhere near the box' fs or even the
> network the thing is on. lspci output follows.

Hmm, I can only see this happening if rq->flags has its direction bit
changed between the allocation time and the time of freeing. I'll look
over SCSI and see if I can find any traces of that; I don't see any
immediately.

--
Jens Axboe

2004-09-15 12:55:00

by Jens Axboe

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, Jens Axboe wrote:
> On Wed, Sep 15 2004, William Lee Irwin III wrote:
> > On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> > >>> +cfq-iosched-v2.patch
> > >>> Major revamp of the CFQ IO scheduler
> >
> > On Wed, Sep 15 2004, William Lee Irwin III wrote:
> > >> While editing some files while booted into 2.6.9-rc1-mm5:
> > >> # ----------- [cut here ] --------- [please bite here ] ---------
> > >> Kernel BUG at cfq_iosched:1359
> >
> > On Wed, Sep 15, 2004 at 01:38:34PM +0200, Jens Axboe wrote:
> > > Hmm, ->allocated is unbalanced. What is your io setup like (adapter,
> > > etc)?
> >
> > 2 Maxtor Atlas10K 10Krpm U320 disks attached to some aic7902's. No
> > binary or 3rd-party modules anywhere near the box' fs or even the
> > network the thing is on. lspci output follows.
>
> Hmm, I can only see this happening if rq->flags has its direction bit
> changed between the allocation time and the time of freeing. I'll look
> over scsi and see if I can find any traces of that, don't see any
> immediately.

Can you try if this works?

--- linux-2.6.9-rc1-mm5/drivers/block/cfq-iosched.c~ 2004-09-15 14:50:14.941876065 +0200
+++ linux-2.6.9-rc1-mm5/drivers/block/cfq-iosched.c 2004-09-15 14:51:09.889996813 +0200
@@ -195,6 +195,7 @@
unsigned int in_flight : 1;
unsigned int accounted : 1;
unsigned int is_sync : 1;
+ unsigned int is_write : 1;
};

static struct cfq_queue *cfq_find_cfq_hash(struct cfq_data *, unsigned long);
@@ -1353,12 +1354,12 @@
if (crq->io_context)
put_io_context(crq->io_context->ioc);

+ BUG_ON(!cfqq->allocated[crq->is_write]);
+ cfqq->allocated[crq->is_write]--;
+
mempool_free(crq, cfqd->crq_pool);
rq->elevator_private = NULL;

- BUG_ON(!cfqq->allocated[rw]);
- cfqq->allocated[rw]--;
-
smp_mb();
cfq_check_waiters(q, cfqq);
cfq_put_queue(cfqq);
@@ -1415,6 +1416,7 @@
crq->io_context = cic;
crq->service_start = crq->queue_start = 0;
crq->in_flight = crq->accounted = crq->is_sync = 0;
+ crq->is_write = rw;
rq->elevator_private = crq;
cfqq->allocated[rw]++;
cfqq->alloc_limit[rw] = 0;

--
Jens Axboe

2004-09-15 12:57:50

by William Lee Irwin III

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, Jens Axboe wrote:
>> Hmm, I can only see this happening if rq->flags has its direction bit
>> changed between the allocation time and the time of freeing. I'll look
>> over scsi and see if I can find any traces of that, don't see any
>> immediately.

On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
> Can you try if this works?

Booting it ASAP.


-- wli

2004-09-16 00:45:11

by William Lee Irwin III

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, Jens Axboe wrote:
>>> Hmm, I can only see this happening if rq->flags has its direction bit
>>> changed between the allocation time and the time of freeing. I'll look
>>> over scsi and see if I can find any traces of that, don't see any
>>> immediately.

On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
>> Can you try if this works?

On Wed, Sep 15, 2004 at 05:53:55AM -0700, William Lee Irwin III wrote:
> Booting it ASAP.

It appears to have lasted enough hours to call it an improvement. I'll
leave it running for a while longer just in case.


-- wli

2004-09-16 05:45:04

by William Lee Irwin III

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, Jens Axboe wrote:
>>>> Hmm, I can only see this happening if rq->flags has its direction bit
>>>> changed between the allocation time and the time of freeing. I'll look
>>>> over scsi and see if I can find any traces of that, don't see any
>>>> immediately.

On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
>>> Can you try if this works?

On Wed, Sep 15, 2004 at 05:53:55AM -0700, William Lee Irwin III wrote:
>> Booting it ASAP.

On Wed, Sep 15, 2004 at 05:38:19PM -0700, William Lee Irwin III wrote:
> It appears to have lasted enough hours to call it an improvement. I'll
> leave it running for a while longer just in case.

Okay, it got well over 8 solid hours, so I'm going to move on to booting
something else.


-- wli

2004-09-16 05:47:25

by Jens Axboe

Subject: Re: 2.6.9-rc1-mm5

On Wed, Sep 15 2004, William Lee Irwin III wrote:
> On Wed, Sep 15 2004, Jens Axboe wrote:
> >>>> Hmm, I can only see this happening if rq->flags has its direction bit
> >>>> changed between the allocation time and the time of freeing. I'll look
> >>>> over scsi and see if I can find any traces of that, don't see any
> >>>> immediately.
>
> On Wed, Sep 15, 2004 at 02:50:57PM +0200, Jens Axboe wrote:
> >>> Can you try if this works?
>
> On Wed, Sep 15, 2004 at 05:53:55AM -0700, William Lee Irwin III wrote:
> >> Booting it ASAP.
>
> On Wed, Sep 15, 2004 at 05:38:19PM -0700, William Lee Irwin III wrote:
> > It appears to have lasted enough hours to call it an improvement. I'll
> > leave it running for a while longer just in case.
>
> Okay, it got well over 8 solid hours, so I'm going to move on to booting
> something else.

Thanks for testing; I'm concluding that it most likely fixed your
problem.

--
Jens Axboe