2006-01-07 13:22:40

by Andrew Morton

Subject: 2.6.15-mm2


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/

This should be somewhat less buggy than 2.6.15-mm1.


Changes since 2.6.15-mm1:

linus.patch
git-acpi.patch
git-agpgart.patch
git-arm.patch
git-blktrace.patch
git-block.patch
git-cfq.patch
git-cifs.patch
git-drm.patch
git-audit.patch
git-infiniband.patch
git-input.patch
git-libata-all.patch
git-mmc.patch
git-netdev-all.patch
git-ntfs.patch
git-ocfs2.patch
git-powerpc.patch
git-serial.patch
git-sym2.patch
git-sas-jg.patch
git-watchdog.patch
git-xfs.patch
git-cryptodev.patch

Subsystem trees

+revert-mm-page_state-fixes.patch

This was a deoptimisation

+asm-generic-atomich-needs-typesh.patch

Build fix

+md-support-check-without-repair-of-raid10-arrays.patch

Linus seemed to drop this, and I'm not sure it's right any more. Waiting
for Neil to return.

-hfsplus-oops-fix.patch
-nbd-fix-tx-rx-race-condition.patch
-nbd-fix-tx-rx-race-condition-update.patch
-knfsd-fix-hash-function-for-ip-addresses-on-64bit-little-endian-machines.patch
-alpha-dma_map_page-fix.patch
-gregkh-i2c-i2c-i801-explicitly-set-smbauxctl.patch
-gregkh-i2c-i2c-ds1337-init.patch
-gregkh-i2c-hwmon-adm1025-adm1026-remove-deprecated-symbols.patch
-gregkh-i2c-hwmon-lm85-adt7463-vrm-10.patch
-gregkh-i2c-hwmon-w83627thf-fix-vrm-and-vid.patch
-gregkh-i2c-hwmon-w83627thf-vid-documentation-update.patch
-gregkh-i2c-i2c-rtc8564-remove-duplicate-bcd-macros.patch
-gregkh-i2c-i2c-parport-barco-lpt-dvi.patch
-gregkh-i2c-hwmon-vt8231-new-driver.patch
-gregkh-i2c-i2c-drop-driver-flags-01-df-dummy.patch
-gregkh-i2c-i2c-drop-driver-flags-02-df-notify.patch
-gregkh-i2c-i2c-drop-driver-flags-03-flags.patch
-gregkh-i2c-i2c-client-use-01-drop-multiple-use-flag.patch
-gregkh-i2c-i2c-client-use-02-make-use-flag-default.patch
-gregkh-i2c-i2c-client-use-03-allow-multiple-use.patch
-gregkh-i2c-i2c-doc-porting-clients-update.patch
-gregkh-i2c-i2c-core-get-client-is-gone.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-01-core.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-02-chips.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-03-hwmon.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-04-macintosh.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-05-media.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-06-oss.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-07-ppc.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-08-acorn.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-09-video.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-10-arm.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-11-documentation.patch
-gregkh-i2c-i2c-drop-driver-owner-and-name-12-fix-debug.patch
-gregkh-i2c-i2c-driver-owner-cleanup-01.patch
-gregkh-i2c-i2c-driver-owner-cleanup-02.patch
-gregkh-i2c-i2c-driver-owner-cleanup-03.patch
-gregkh-i2c-i2c-dev-dynamic_class.patch
-gregkh-i2c-hwmon-w83792d-misc-cleanups.patch
-gregkh-i2c-hwmon-w83792d-simplify-in-low-bits.patch
-gregkh-i2c-hwmon-vrm-via.patch
-gregkh-i2c-hwmon-it87-vrm-fits-in-u8.patch
-gregkh-i2c-i2c-id-cleanups.patch
-gregkh-i2c-i2c-drop-useless-command-callbacks.patch
-gregkh-i2c-i2c-update-command-documentation.patch
-gregkh-i2c-i2c-drop-driver-flags-04-drop-outdated-comments.patch
-gregkh-i2c-i2c-ibm_iic-hwmon-class.patch
-gregkh-i2c-i2c-nforce2-support-nforce4-mcp04.patch
-gregkh-i2c-i2c-mv64xxx-abort-fix.patch
-i8042-add-oqo-zepto-to-noloop-dmi-table.patch
-remove-duplicated-pci-id.patch
-libata_suspend.patch
-libata_suspend-fix.patch
-swsusp-resume_store-retval-fix.patch
-netfilter-fix-handling-of-module-param-dcc_timeout-in-ip_conntrack_ircc.patch
-git-serial-build-fix.patch
-git-alsa-vs-git-pcmcia.patch
-mm-fix-__alloc_pages-cpuset-alloc_-flags.patch
-mm-gfp_atomic-comment.patch
-memhotplug-__add_section-remove-unused-pgdat-definition.patch
-memhotplug-register_-and-unregister_memory_notifier-should-be-global.patch
-memhotplug-register_memory-should-be-global.patch
-reiser4-truncate_inode_pages_range.patch
-madvise-remove-remove-pages-from-tmpfs-shm-backing-store.patch
-hugetlb-remove-duplicate-i_size-check.patch
-hugetlb-rename-find_lock_page-to.patch
-hugetlb-reorganize-hugetlb_fault-to-prepare-for-cow.patch
-hugetlb-copy-on-write-support.patch
-dequeue-a-huge-page-near-to-this-node.patch
-add-numa-policy-support-for-huge-pages.patch
-add-numa-policy-support-for-huge-pages-fix.patch
-add-numa-policy-support-for-huge-pages-fix-fix.patch
-remove-old-node-based-policy-interface-from-mempolicyc.patch
-hugepages-fold-find_or_alloc_pages-into-huge_no_page.patch
-mm-kvaddr_to_nid-not-used-in-common-code.patch
-mm-pfn_to_pgdat-not-used-in-common-code.patch
-mm-remove-arch-independent-nodes_span_other_nodes.patch
-shut-up-warnings-in-ipc-shmc.patch
-flatmem-split-out-memory-model.patch
-sparsemem-provide-pfn_to_nid.patch
-mm-free_pages_and_swap_cache-opt.patch
-mm-pagealloc-opt.patch
-mm-set_page_refs-opt.patch
-mm-microopt-conditions.patch
-mm-remove-bad_range.patch
-mm-remove-pcp-low.patch
-mm-page_state-fixes.patch
-mm-page_alloc-cleanups.patch
-mm-optimize-numa-policy-handling-in-slab-allocator.patch
-mm-free-pages-from-local-pcp-lists-under-tight-memory-conditions.patch
-cleanup-bootmem-allocator-and-fix-alloc_bootmem_low.patch
-frv-clean-up-bootmem-allocators-page-freeing-algorithm.patch
-find_lock_page-call-__lock_page-directly.patch
-kill-last-zone_reclaim-bits.patch
-kill-last-zone_reclaim-bits-fix.patch
-mm-dma32-zone-statistics.patch
-mm-bad_page-opt.patch
-mm-rmap-opt.patch
-mm-pfault-opt.patch
-mm-pcp-drain-tweak.patch
-vmscan-balancing-fix.patch
-consolidate-lru_add_drain-and-lru_drain_cache.patch
-mm-add-populated_zone-helper.patch
-simplify-build_zonelists_node-by-removing-the-case.patch
-move-determination-of-policy_zone-into-page-allocator.patch
-fix-zone-policy-determination.patch
-build_zonelists_node-rename-args.patch
-atomic_long_t-include-asm-generic-atomich-v2.patch
-atomic_long_t-include-asm-generic-atomich-v2-fix.patch
-atomic_long_t-include-asm-generic-atomich-v2-fix-2.patch
-atomic_long_t-include-asm-generic-atomich-v2-fix-3.patch
-atomic_long_t-include-asm-generic-atomich-v2-fix-4.patch
-mm-page_state-opt.patch
-mm-page_state-opt-fix.patch
-mm-page_state-opt-docs.patch
-x25-fix-for-broken-x25-module.patch
-selinux-array_size-cleanups.patch
-selinux-more-array_size-cleanups.patch
-keys-remove-key-duplication.patch
-security-possible-cleanups.patch
-ppc32-remove-jumbo-member-from-ocp_func_emac_data.patch
-arch-ppc-kernel-idlec-dont-declare-cpu-variable-in-non-smp-kernels.patch
-macintosh-dont-store-i2c_add_driver-return-if-no-further-processing-done.patch
-ppc32-remove-useless-file-arch-ppc-platforms-mpc5200c.patch
-ppc32-serial-fix-compiler-errors-with-gcc-4x-in.patch
-ppc32-serial-change-mpc52xx_uartc-to-use-the-low.patch
-ppc32-fix-static-io-mapping-for-freescale-mpc52xx.patch
-ppc32-modify-freescale-mpc52xx-irq-mapping-to-_not_.patch
-ppc32-remove-__init-qualifier-from-mpc52xx-pci.patch
-ppc32-fix-mpc52xx-configuration-space-access.patch
-ppc32-fix-mpc52xx-pci-init-in-cas-the-bootloader.patch
-ppc32-allows-compilation-of-a-mpc52xx-kernel-without.patch
-ppc32-re-add-embed_configc-to-ml300-ep405.patch
-therm_adt746x-quiet-fan-speed-change-messages.patch
-nommu-provide-shared-writable-mmap-support-on-ramfs.patch
-nommu-provide-shared-writable-mmap-support-on-ramfs-tidy.patch
-nommu-make-sysv-ipc-shm-use-ramfs-facilities-on-nommu.patch
-frv-implement-futex-operations-for-frv.patch
-frv-make-futex-code-compilable-on-nommu.patch
-frv-fix-frv-signal-handling.patch
-frv-improve-signal-handling.patch
-remove-include-asm-mips-riscos-syscallh.patch
-x86-gdt-alignment-fix.patch
-i386-dont-blindly-enable-interrupts-in-die.patch
-i386-move-simd-initialization.patch
-i386-move-simd-initialization-fix.patch
-i386-fix-bound-check-idt-gate.patch
-x86-cr4-is-valid-on-some-486s.patch
-x86-pnp-segments-in-segment-h.patch
-x86-always-relax-segments.patch
-x86-apm-seg-in-gdt.patch
-x86-deprecate-obsolete-ldt-accessors.patch
-x86-pnp-byte-granularity.patch
-x86-fixed-pnp-bios-limits.patch
-x86-stop-deleting-nt.patch
-x86-apm-is-on-cpu-zero-only.patch
-x86-deprecate-useless-bug.patch
-x86-handle-wsign-compare-in-bitops.patch
-mark-rodata-section-read-only-generic-infrastructure.patch
-mark-rodata-section-read-only-x86-parts.patch
-mark-rodata-section-read-only-generic-x86-64-bugfix.patch
-mark-rodata-section-read-only-x86-64-support.patch
-mark-rodata-section-read-only-make-some-datastructures-const.patch
-debug-option-to-write-protect-rodata-x86-support-warning-fix.patch
-i386-sparsemem-for-single-node-systems.patch
-allow-flatmem-to-be-disabled-when-only-sparsemem-is-implemented.patch
-x86-convert-bigsmp-to-use-flat-physical-mode.patch
-x86-make-bigsmp-the-default-mode-if-config_hotplug_cpu.patch
-reboot-through-the-bios-on-newer-hp-laptops.patch
-x86-change-page-attr-fix.patch
-x86-change-page-attr-fix-fix.patch
-x86-change-page-attr-fix-fix-fix.patch
-x86-missing-printk-newline-in-apic-boot-option-parser.patch
-x86-fls-in-asm.patch
-arch-i386-kernel-msrc-removed-unused-variable.patch
-arch-i386-kernel-cpuidc-unused-variable.patch
-i386-gpio-driver-for-amd-cs5535-cs5536-fix.patch
-base-support-for-amd-geode-gx-lx-processors.patch
-base-support-for-amd-geode-gx-lx-processors-tidy.patch
-geode-lx-hw-rng-support.patch
-geode-lx-hw-rng-support-fix.patch
-apm-screen-blanking-fix.patch
-apm-screen-blanking-fix-tidy.patch
-fix-cpu-frequency-detection-in-arch-i386-kernel-timers-timer_tsccrecalibrate_cpu_khz.patch
-mpspec-remove-unneeded-packed-attribute.patch
-i386-ioapic-virtual-wire-mode-fix.patch
-i386-handle-hp-laptop-rebooting-properly.patch
-cpu-hotplug-x86_64-disable-interrupt-in-play_dead.patch
-alpha-convert-to-generic-irq-framework-generic-part.patch
-alpha-convert-to-generic-irq-framework-alpha-part.patch
-swsusp-remove-encryption.patch
-swsusp-introduce-the-swap-map-structure.patch
-swsusp-improve-freeing-of-memory.patch
-swsusp-improve-freeing-of-memory-fix.patch
-swsusp-fix-enough_free_mem.patch
-oss-remove-deprecated-pm-interface-from-ad1848-driver.patch
-oss-remove-deprecated-pm-interface-from-cs4281-driver.patch
-oss-remove-deprecated-pm-interface-from-cs46xx-driver.patch
-oss-remove-deprecated-pm-interface-from-maestro-driver.patch
-oss-remove-deprecated-pm-interface-from-nm256-driver.patch
-oss-remove-deprecated-pm-interface-from-opl3sa2-driver.patch
-swsusp-drop-duplicate-prototypes.patch
-swsusp-limit-image-size.patch
-swsusp-make-image-size-limit-tunable.patch
-mm-add-a-new-function-needed-for-swap-suspend.patch
-swsusp-improve-handling-of-swap-partitions.patch
-swsusp-save-image-header-first.patch
-dont-freeze-firewire-on-suspend.patch
-m32r-trivial-fix-to-remove-unused-instructions.patch
-m32r-support-m32104ut-target-platform.patch
-m32r-update-syscall-macros-for-mmu-less.patch
-m32r-update-_port2addr-to-use.patch
-m32r-fix-m32104-cache-flushing-routines.patch
-m32r-remove-unnecessary-icu_data_t.patch
-m68knommu-enable_irq-disable_irq.patch
-m68knommu-remove-enable_irq_nosync.patch
-cris-kgdb-remove-double_this.patch
-uml-use-kstrdup.patch
-uml-non-void-functions-should-return-something.patch
-uml-formatting-changes.patch
-uml-use-array_size.patch
-uml-remove-unneeded-structure-field.patch
-uml-move-mconsole-support-out-of-generic-code.patch
-uml-add-static-initializations-and-declarations.patch
-uml-line_setup-interface-change.patch
-uml-move-console-configuration.patch
-uml-simplify-console-opening-closing-and-irq-registration.patch
-uml-fix-flip_buf-full-handling.patch
-uml-add-throttling-to-console-driver.patch
-uml-separate-libc-dependent-umid-code.patch
-uml-umid-cleanup.patch
-uml-sigwinch-handling-cleanup.patch
-uml-better-diagnostics-for-broken-configs.patch
-uml-add-mconsole_reply-variant-with-length-param.patch
-uml-capture-printk-output-for-mconsole-stack.patch
-uml-capture-printk-output-for-mconsole-sysrq.patch
-uml-fix-whitespace-in-mconsole-driver.patch
-uml-free-network-irq-correctly.patch
-s390-atomic-primitives.patch
-s390-cms-volume-label-definitions.patch
-s390-uaccess-warnings.patch
-s390-rt_sigreturn-fix.patch
-s390-update-default-configuration.patch
-s390-cputime_t-fixes.patch
-s390-re-activated-path-detection.patch
-s390-move-s390_root_dev_-out-of-the-cio-layer.patch
-s390-biodasdprrd-ioctl-return-code.patch
-s390-dasd-failfast-support.patch
-s390-add-oprofile-callgraph-support.patch
-s390-in-kernel-crypto-rename.patch
-s390-sha256-support.patch
-s390-aes-support.patch
-s390-in-kernel-crypto-test-vectors.patch
-s390-qdio-v=v-pass-through.patch
-s390-introduce-struct-subchannel_id.patch
-s390-introduce-for_each_subchannel.patch
-s390-introduce-struct-channel_subsystem.patch
-s390-convert-proc-cio_ignore.patch
-s390-multiple-subchannel-sets-support.patch
-s390-add-support-for-cex2a-crypto-cards.patch
-s390-fix-missing-release-function-and-cosmetic-changes.patch
-s390-fix-invalid-return-code-in-sclp_cpi.patch
-s390-cleanup-kconfig.patch
-mmci-kunmap_atomic-unmaps-virtual-address-not-page.patch
-protect-remove_proc_entry.patch
-add-udev-support-to-parport_pc.patch
-add-udev-support-to-parport_pc-tidy.patch
-i2o-changed-i2o-api-to-create-i2o-messages-in-kernel.patch
-i2o-changed-i2o-api-to-create-i2o-messages-in-kernel-fix.patch
-i2o-sparc-fixes.patch
-i2o-remove-wrong-i2o-device-class.patch
-i2o-bugfixes.patch
-i2o-beautifying.patch
-i2o-optimizing.patch
-i2o-lindent-run.patch
-fuse-clean-up-fuse_lookup.patch
-fuse-clean-up-page-offset-calculation.patch
-fuse-bump-interface-version.patch
-fuse-add-frsize-to-statfs-reply.patch
-fuse-support-caching-negative-dentries.patch
-fuse-add-code-documentation.patch
-fuse-fail-file-operations-on-bad-inode.patch
-fuse-clean-up-request-size-limit-checking.patch
-fuse-make-maximum-write-data-configurable.patch
-fuse-ensure-progress-in-read-and-write.patch
-fuse-check-file-type-in-lookup.patch
-parport-buffer-overflow-fix.patch
-parport-phase-fixes.patch
-parport-daisy-chain-end-detection-fix.patch
-parport-daisy-chain-device-id-reading-fix.patch
-parport-daisy-chain-device-id-reading-fix-2.patch
-parport-use-complete-slab-buffer.patch
-parport-constification.patch
-parport-constification-fix.patch
-parport-debug_parport-build-fix.patch
-parport-kconfig-dependency-fixes.patch
-parport-include-fixes.patch
-parport-export-parport_get_port.patch
-simplify-parport_pc_pcmcia-dependencies.patch
-include-linux-parport_pch-extern-inline-static-inline.patch
-ipmi-use-refcount-in-message-handler-avoid-list_for_each_safe_rcu.patch
-kernel-modulec-removed-dead-code.patch
-jbd-split-checkpoint-lists.patch
-keep-nfsd-from-exiting-when-seeing-recv-errors.patch
-knfsd-check-error-status-from-vfs_getattr-and-i_op-fsync.patch
-knfsd-reduce-stack-consumption.patch
-device-mapper-add-dm_find_md.patch
-device-mapper-add-dm_get_md.patch
-device-mapper-ioctl-event-on-rename.patch
-device-mapper-snapshot-metadata-reading-separation.patch
-device-mapper-remove-unused-definition.patch
-device-mapper-scanf-sector-format-change.patch
-device-mapper-raid1-add-default-mirror.patch
-device-mapper-rename-frozen_bdev.patch
-device-mapper-make-lock_fs-optional.patch
-device-mapper-ioctl-add-skip-lock_fs-flag.patch
-drivers-md-kcopydc-if-0-kcopyd_cancel.patch
-dm-crypt-zero-key-before-freeing-it.patch
-make-dm-mirror-not-issue-invalid-resync-requests.patch
-md-improve-raid1-io-barrier-concept.patch
-md-improve-raid10-io-barrier-concept.patch
-md-small-cleanups-for-raid5.patch
-md-allow-dirty-raid-arrays-to-be-started-at-boot.patch
-md-move-bitmap_create-to-after-md-array-has-been-initialised.patch
-md-write-intent-bitmap-support-for-raid10.patch
-md-fix-raid6-resync-check-repair-code.patch
-md-improve-handing-of-read-errors-with-raid6.patch
-md-attempt-to-auto-correct-read-errors-in-raid1.patch
-md-tidyup-some-issues-with-raid1-resync-and-prepare-for-catching-read-errors.patch
-md-better-handling-for-read-error-in-raid1-during-resync.patch
-md-handle-errors-when-read-only.patch
-md-fix-up-some-rdev-rcu-locking-in-raid5-6.patch
-md-support-check-without-repair-of-raid10-arrays.patch
-md-allow-raid1-to-check-consistency.patch
-md-make-sure-read-error-on-last-working-drive-of-raid1-actually-returns-failure.patch
-md-auto-correct-correctable-read-errors-in-raid10.patch
-md-raid10-read-error-handling-resync-and-read-only.patch
-md-make-proc-mdstat-pollable.patch
-md-clean-up-page-related-names-in-md.patch
-md-convert-md-to-use-kzalloc-throughout.patch
-md-tidy-up-raid5-6-hash-table-code.patch
-md-convert-various-kmap-calls-to-kmap_atomic.patch
-md-convert-recently-exported-symbol-to-gpl.patch
-md-break-out-of-a-loop-that-doesnt-need-to-run-to-completion.patch
-md-remove-personality-numbering-from-md.patch
-md-fix-possible-problem-in-raid1-raid10-error-overwriting.patch
-md-remove-inappropriate-limits-in-md-bitmap-configuration.patch
-md-define-and-use-safe_put_page-for-md.patch
-md-helper-function-to-match-commands-written-to-sysfs-files.patch
-md-fix-typo-in-comment.patch
-md-make-a-couple-of-names-in-mdc-static.patch
-md-make-a-couple-of-names-in-mdc-static-fix.patch
-md-make-sure-bitmap-updates-are-visible-through-filesystem.patch
-md-fix-rdev-pending-counts-in-raid1.patch
-md-allow-chunk_size-to-be-settable-through-sysfs.patch
-md-allow-md-array-component-size-to-be-accessed-and-set-via-sysfs.patch
-md-expose-md-metadata-format-in-sysfs.patch
-md-allow-array-level-to-be-set-textually-via-sysfs.patch
-md-count-corrected-read-errors-per-drive.patch
-md-allow-md-raid_disks-to-be-settable.patch
-md-keep-better-track-of-dev-array-size-when-assembling-md-arrays.patch
-md-expose-device-slot-information-via-sysfs.patch
-md-export-rdev-data_offset-via-sysfs.patch
-md-export-rdev-data_offset-via-sysfs-fix.patch
-md-allow-available-size-of-component-devices-to-be-set-via-sysfs.patch
-md-allow-available-size-of-component-devices-to-be-set-via-sysfs-fix.patch
-md-support-adding-new-devices-to-md-arrays-via-sysfs.patch
-md-allow-sync-speed-to-be-controlled-per-device.patch
-decrease-number-of-pointer-derefs-in-nfnetlink_queuec.patch
-decrease-number-of-pointer-derefs-in-nf_conntrack_corec.patch

Merged

+git-acpi-memhotplug-build-fix.patch

Fix git-acpi.patch

+git-blkdev-fixup.patch

Fix rejects in git-blkdev.patch

+git-cfq-fixup.patch

Fix rejects in git-cfq.patch

+git-libata-all-fixup.patch

Fix rejects in git-libata-all.patch

+git-libata-all-pata_amd-build-fix.patch

Fix git-libata-all.patch

+fix-sys-class-net-if-wireless-without-dev-get_wireless_stats.patch
+fix-sys-class-net-if-wireless-without-dev-get_wireless_stats-fix.patch

Bring back a /sys file needed by userspace wireless control apps.

+bonding-sparse-warnings-fix.patch

Bonding fixlet

+pass-proper-device-from-buslogic-to-scsi-layer.patch

Buslogic driver fix

+areca-raid-driver-arcmsr-cleanups.patch

Clean up areca-raid-linux-scsi-driver.patch

+gregkh-usb-usb-remove-usb_audio-and-usb_midi-drivers.patch
+gregkh-usb-usb-drivers-usb-core-message.c-make-usb_get_string-static.patch

USB tree updates

+arm-netwinder-watchdog-wdt977-update.patch

Netwinder driver fix

-x86_64-cpu-pda-cleanup.patch
-x86_64-remove-kdb-vector.patch
+x86_64-cpu-pda-cleanup.patch
+x86_64-remove-kdb-vector.patch
+x86_64-sparse-warnings-fix.patch
+x86_64-invalid-operand-name.patch
+x86_64-allow-setting-rf-in-eflags.patch
+x86_64-io-apic-memorize.patch
+x86_64-cleanup-enter_lazy_tlb.patch

x86_64 tree updates

+i386-io_apic-use-correct-index-variable-when-computing-the.patch

x86 io-apic fix

+fix-compilation-with-config_memory_hotplug=y-and-gcc41.patch

Build fix

+swap-migration-v5-sys_migrate_pages-interface-x86_64-fix.patch

Fix the swap migration patch

+oom-kill-of-current-task.patch

Optimise oom-killing of current task

+acx-driver-update.patch

Update acx wireless driver

+xfrm-sparse-warning-fix.patch

Sparse fix

+frv-suppress-configuration-of-certain-features-for-frv.patch
+frv-drop-8-16-bit-xchg-and-cmpxchg.patch
+frv-drop-unsupported-debugging-features.patch
+frv-implement-and-export-various-things-required-by-modules.patch
+frv-support-module-exception-tables.patch
+frv-supply-various-missing-i-o-access-primitives.patch
+frv-add-module-support-stubs.patch
+frv-add-pci_iomap.patch
+frv-fix-pcmcia-configuration.patch
+frv-force-serial-driver-inclusion.patch
+frv-make-get_user-macro-cast-pointers.patch
+frv-miscellaneous-changes.patch
+frv-fix-uninitialised-variable-in-atm-nicstar-driver.patch
+frv-fix-uninitialised-variable-in-serverworks-driver.patch

FRV updates

+ia64-use-i386-dmi_scanc.patch

ia64 uses some of ia32's DMI code.

+uml-move-libc-dependent-code-from-signal_userc.patch
+uml-move-libc-dependent-code-from-trap_userc.patch
+uml-merge-trap_userc-and-trap_kernc.patch
+uml-whitespace-cleanup.patch
+uml-prevent-mode_skas=n-and-mode_tt=n.patch

UML updates

+consolidate-asm-futexh.patch

Cleanup

-support-for-preadv-pwritev.patch
-support-for-preadv-pwritev-fix.patch

Dropped these - they might come back if we have a good reason, but they
touch the syscall tables and cause me grief.

+parport_pc-arm-build-fix.patch
+parport-bring-back-an-unused-phase-for-ppdev-ioctl.patch

parport fixes

+simplify-k_getrusage.patch
+avoid-taking-global-tasklist_lock-for-single-threadedprocess-at-getrusage.patch
+avoid-taking-global-tasklist_lock-for-single-threadedprocess-at-getrusage-tidy.patch

SMP optimisation

+drivers-isdn-add-missing-includes.patch
+drivers-isdn-hardware-eicon-os_4bric-correct-the-xdiloadfile-signature.patch

ISDN fixes

+dump_thread-cleanup.patch

Code consolidation

+cciss-avoid-defining-useless-major_nr-macro.patch

Cleanup

+remove-set_fs-in-stop_machine.patch

Remove unneeded code

+kdump-documentation-update.patch

Documentation

-unshare-system-call-system-call-handler-function.patch
-unshare-system-call-system-call-registration-for-i386.patch
-unshare-system-call-system-call-registration-for-powerpc.patch
-unshare-system-call-system-call-registration-for-ppc.patch
-unshare-system-call-system-call-registration-for-x86_64.patch
-unshare-system-call-allow-unsharing-of-filesystem.patch
-unshare-system-call-allow-unsharing-of-namespace.patch
-unshare-system-call-allow-unsharing-of-vm.patch
-unshare-system-call-allow-unsharing-of-files.patch

These caused an oops - dropped.

+mutex-subsystem-add-atomic_xchg-to-all-arches.patch
+mutex-subsystem-add-typecheck_fntype-function.patch
+mutex-subsystem-add-asm-generic-mutex-h-implementations.patch
+mutex-subsystem-memory-ordering-fixes.patch
+mutex-subsystem-add-include-asm-i386-mutexh.patch
+mutex-subsystem-add-include-asm-x86_64-mutexh.patch
+mutex-subsystem-add-include-asm-arm-mutexh.patch
+mutex-subsystem-add-include-asm-arm-mutexh-fix.patch
+mutex-subsystem-add-default-include-asm-mutexh-files.patch
+mutex-subsystem-core.patch
+mutex-subsystem-documentation.patch
+mutex-subsystem-debugging-code.patch
+mutex-subsystem-more-debugging-code.patch
+mutex-subsystem-synchro-test-module.patch
+mutex-subsystem-semaphore-to-mutex-xfs.patch
+mutex-subsystem-semaphore-to-mutex-vfs-i_sem.patch
+mutex-subsystem-semaphore-to-mutex-vfs-i_sem-more.patch
+mutex-subsystem-semaphore-to-mutex-vfs-i_sem-fixes.patch
+mutex-subsystem-semaphore-to-mutex-vfs-i_sem-fixes-2.patch
+mutex-subsystem-semaphore-to-mutex-vfs-i_sem-fixes-3.patch
+mutex-subsystem-semaphore-to-mutex-vfs-sb-s_lock.patch
+mutex-subsystem-semaphore-to-completion-sx8.patch
+mutex-subsystem-semaphore-to-completion-cpu3wdt.patch
+mutex-subsystem-semaphore-to-completion-ide-gendev_rel_sem.patch
+mutex-subsystem-semaphore-to-completion-drivers-block-loopc.patch
+reiser4-i_sem-mutex-switch.patch

mutex stuff

-time-i386-conversion-part-2-move-timer_tscc-to-tscc.patch
-time-i386-conversion-part-3-rework-tsc-support.patch
-time-i386-conversion-part-4-acpi-pm-variable-renaming-and-config-change.patch
-time-i386-conversion-part-5-enable-generic-timekeeping.patch
-time-i386-conversion-part-6-remove-old-code.patch
+time-i386-conversion-part-2-rework-tsc-support.patch
+time-i386-conversion-part-3-enable-generic-timekeeping.patch
+time-i386-conversion-part-4-remove-old-timer_opts-code.patch
+time-i386-conversion-part-5-acpi-pm-variable-renaming-and-config-change.patch
+time-fix-cpu-frequency-detection.patch

New version of the time patches.

+kprobes-conversion-from-kcalloc-to-kzalloc.patch

kzalloc conversion

+drivers-media-conversions-from-kmallocmemset-to-kzcalloc.patch

kzalloc conversion

-ide-compat_semaphore-to-completion.patch

Dropped - is in the mutex patches

+fbcon-dont-call-set_par-in-fbcon_init-if-vc_mode==kd_graphics.patch

Fix loading of fbcon drivers while in X.

+make-__always_inline-actually-force-always-inlining.patch
+kbuild-call-gcc_version-earlier.patch
+enable-unit-at-a-time-optimisations-for-gcc4.patch
+mark-several-functions-__always_inline.patch
+mark-some-key-vfs-functions-as-__always_inline.patch
+uninline-capable.patch
+unlinline-a-bunch-of-other-functions.patch
+pktcdvd-un-inline-some-functions.patch
+make-inline-no-longer-mandatory-for-gcc-4x.patch

Futz with inlining.

+vfa-at-functions-core.patch
+vfs-at-functions-i386.patch
+vfs-at-functions-x86_64.patch

Add new filesystem syscalls: openat() and friends. These behave like open()
and friends, except that they take a directory fd argument. To quote the
openat() description: "The openat() function is identical to the open()
function except that the path argument is interpreted relative to the
starting point implied by the fd argument. If the fd argument has the special
value AT_FDCWD, a relative path argument will be resolved relative to the
current working directory. If the path argument is absolute, the fd argument
is ignored."
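
The path-resolution rule quoted above can be tried from userspace with a
small sketch like the one below. It assumes a libc that exposes openat() with
the signature later standardised in POSIX.1-2008; the helper names write_at()
and read_at() are made up for illustration and are not part of these patches.

```c
#define _POSIX_C_SOURCE 200809L /* ask libc for openat(), AT_FDCWD, O_DIRECTORY */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create or truncate `name`, resolved relative to the
 * directory referred to by dirfd, and write `msg` into it.
 * Returns 0 on success, -1 on failure.
 */
static int write_at(int dirfd, const char *name, const char *msg)
{
	size_t len = strlen(msg);
	ssize_t n;
	int fd;

	fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	n = write(fd, msg, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}

/*
 * Hypothetical helper: read at most size - 1 bytes of `name`, resolved
 * relative to dirfd, into buf and NUL-terminate the result.
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t read_at(int dirfd, const char *name, char *buf, size_t size)
{
	ssize_t n;
	int fd;

	assert(size > 0); /* room is needed for the terminating NUL */
	fd = openat(dirfd, name, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, size - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

A caller would obtain dirfd with something like open("/tmp", O_RDONLY |
O_DIRECTORY) and then pass a relative name. Passing AT_FDCWD instead makes
the helpers resolve the name against the current working directory, and an
absolute name makes the dirfd argument irrelevant, as the quoted text says.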

+fix-some-f_ops-abuse-in-acpi.patch
+fix-input-layer-f_ops-abuse.patch
+fix-cifs-bugs-wrt-writing-to-f_ops.patch
+mark-f_ops-const-in-the-inode.patch

Preparation for moving the file_operations tables into .rodata.

+docs-update-typos-corrections-and-additions-to-applying-patchestxt.patch
+docs-update-missing-files-and-descriptions-for-filesystems-00-index.patch
+docs-update-small-spelling-formating-etc-fixes-for-filesystems-ext3txt.patch
+docs-update-remove-obsolete-patch-from-lockstxt.patch
+docs-update-small-fixes-to-stable_kernel_rulestxt.patch

Documentation fixes

+debug-shared-irqs-fix.patch

Fix debug-shared-irqs.patch



All 1179 patches:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/patch-list



2006-01-07 13:23:24

by Andrew Morton

Subject: Re: 2.6.15-mm2

Andrew Morton <[email protected]> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/
>

This hasn't mirrored yet - it should be there in an hour or so.

2006-01-07 15:05:45

by Reuben Farrelly

Subject: Re: 2.6.15-mm2

Hi,

On 8/01/2006 2:22 a.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/
>
> This should be somewhat less buggy than 2.6.15-mm1.
>
>
> Changes since 2.6.15-mm1:
>
> linus.patch
> git-acpi.patch
> git-agpgart.patch
> git-arm.patch
> git-blktrace.patch
> git-block.patch
> git-cfq.patch
> git-cifs.patch
> git-drm.patch
> git-audit.patch
> git-infiniband.patch
> git-input.patch
> git-libata-all.patch
> git-mmc.patch
> git-netdev-all.patch
> git-ntfs.patch
> git-ocfs2.patch
> git-powerpc.patch
> git-serial.patch
> git-sym2.patch
> git-sas-jg.patch
> git-watchdog.patch
> git-xfs.patch
> git-cryptodev.patch

Seeing multiple problems with this release...

1. Nasty oops when rebooting into -mm2. Then I rebooted back into -mm1 and it
happened again - so I cold booted this time and the problem went away.

Linux version 2.6.15-mm2 ([email protected]) (gcc version 4.1.0 20051222 (Red Hat 4.1.0-0.12)) #1 SMP Sun Jan 8 02:58:06 NZDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fe2f800 (usable)
BIOS-e820: 000000003fe2f800 - 000000003fe3f8e3 (ACPI NVS)
BIOS-e820: 000000003ff2f800 - 000000003ff30000 (ACPI NVS)
BIOS-e820: 000000003ff30000 - 000000003ff40000 (ACPI data)
BIOS-e820: 000000003ff40000 - 000000003fff0000 (ACPI NVS)
BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fed13000 - 00000000fed1a000 (reserved)
BIOS-e820: 00000000fed1c000 - 00000000feda0000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
Detected 2800.349 MHz processor.
Built 1 zonelists
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Kernel command line: ro root=/dev/md0 panic=60 console=ttyS0,57600
CPU 0 irqstacks, hard=c0423000 soft=c0421000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1033452k/1046716k available (2238k kernel code, 12600k reserved, 739k data, 200k init, 129212k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5607.23 BogoMIPS (lpj=11214461)
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c0424000 soft=c0422000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5600.64 BogoMIPS (lpj=11201280)
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Total of 2 processors activated (11207.87 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
migration_cost=74
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG
ACPI: Subsystem revision 20051216
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: Power Resource [URP2] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0
ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI: Bridge: 0000:00:01.0
IO window: disabled.
MEM window: ffa00000-ffafffff
PREFETCH window: fdf00000-fdffffff
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: ff600000-ff6fffff
PREFETCH window: fdb00000-fdbfffff
PCI: Bridge: 0000:00:1c.1
IO window: a000-afff
MEM window: ff700000-ff7fffff
PREFETCH window: fdc00000-fdcfffff
PCI: Bridge: 0000:00:1c.2
IO window: disabled.
MEM window: ff800000-ff8fffff
PREFETCH window: fdd00000-fddfffff
PCI: Bridge: 0000:00:1c.3
IO window: disabled.
MEM window: ff900000-ff9fffff
PREFETCH window: fde00000-fdefffff
PCI: Bridge: 0000:00:1e.0
IO window: b000-bfff
MEM window: ff500000-ff5fffff
PREFETCH window: fe000000-fe7fffff
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 177
PCI: Enabling device 0000:00:1c.1 (0106 -> 0107)
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered<6>Time: tsc clocksource has been installed.

io scheduler deadline registered
io scheduler cfq registered
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 177
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 169
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
assign_interrupt_mode Found MSI capability
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Processor [CPU2] (supports 8 throttling states)
Real Time Clock Driver v1.12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
?serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 18 (level, low) -> IRQ 185
0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
sky2 Cannot find PowerManagement capability, aborting.
sky2: probe of 0000:04:00.0 failed with error -5
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
QLogic Fibre Channel HBA Driver
ahci: probe of 0000:00:1f.2 failed with error -12
ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c0234702
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 1
EIP: 0060:[<c0234702>] Not tainted VLI
EFLAGS: 00010206 (2.6.15-mm2)
EIP is at make_class_name+0x28/0x8d
eax: 00000000 ebx: ffffffff ecx: ffffffff edx: c19d9224
esi: 00000009 edi: 00000000 ebp: 00000000 esp: c1921d9c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, threadinfo=c1921000 task=c1920a70)
Stack: <0>c19d9224 c03a9158 c19d9224 c03a9158 c03a9160 c0234925 c03a90e0 00000000
<0>00000246 c19d9224 c19d9000 c19d9030 00000002 c02349db c19d90e4 c0253218
<0>c19d92c0 00000000 00000000 c0276693 00000000 c0279391 c035749f c1961640
Call Trace:
[<c0234925>] class_device_del+0x9f/0x14d
[<c02349db>] class_device_unregister+0x8/0x10
[<c0253218>] scsi_remove_host+0xb8/0xf8
[<c0276693>] ata_host_remove+0xe/0x18
[<c0279391>] ata_device_add+0x2d3/0xb99
[<c02b6fb0>] pci_mmcfg_write+0xd3/0x103
[<c01eb713>] pci_bus_write_config_byte+0x4e/0x58
[<c02b67d3>] pcibios_set_master+0x74/0x8c
[<c027a2e5>] ata_pci_init_one+0x32c/0x38e
[<c01eb7ea>] pci_bus_read_config_word+0x62/0x6c
[<c01ef8bd>] pci_get_subsys+0x6c/0xe0
[<c027e334>] piix_init_one+0x18e/0x33a
[<c01ef259>] pci_device_probe+0x40/0x5b
[<c0233ed7>] driver_probe_device+0x35/0x98
[<c0234038>] __driver_attach+0x8a/0x8c
[<c02339a7>] bus_for_each_dev+0x39/0x57
[<c0233e4c>] driver_attach+0x16/0x1a
[<c0233fae>] __driver_attach+0x0/0x8c
[<c023365b>] bus_add_driver+0x6f/0x126
[<c01ef3f1>] __pci_register_driver+0x7d/0xac
[<c04023e9>] piix_init+0xc/0x1e
[<c01003c8>] init+0xff/0x324
[<c01002c9>] init+0x0/0x324
[<c0100d35>] kernel_thread_helper+0x5/0xb
Code: 89 c8 c3 55 57 56 53 83 ec 04 89 04 24 89 c2 8b 40 48 8b 38 31 ed bb ff ff
ff ff 89 d9 89 e8 f2 ae f7 d1 49 89 ce 8b 7a 08 89 d9 <f2> ae f7 d1 49 89 ca 8d
4e 02 8d 04 0a ba d0 00 00 00 e8 22 cf
<0>Kernel panic - not syncing: Attempted to kill init!
<0>Rebooting in 60 seconds..0


2. Notice above how the sky2 driver bails out:

ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
sky2 Cannot find PowerManagement capability, aborting.
sky2: probe of 0000:04:00.0 failed with error -5

This has happened a number of times in the last few days, and I suspect it is
unrelated to the oops that followed above.

This driver worked fine in 2.6.15-rc5-mm3, and seems to work OK when built as a
module. But most of the time (not all the time) it doesn't like being
statically built in and fails with the above error. Changes to this driver have
been fairly small lately so I'm not sure if it's the driver or something else
like ACPI that is the root cause.

3. The boot process with -mm2 was pretty lengthy. There were two periods when
the whole system slowed to a crawl. The first was when starting cups; it came
back to life and continued booting about 30s later. The second was when
starting hpijs; it didn't come back to life at all and I had to reboot. No
output to the console for either, unfortunately.

Back on -mm1 for now, box has only got a PCI graphics card and I'm not running
X, DRM or AGP, so by all accounts -mm1 is OK for me ;-)

config up at the usual place http://www.reub.net/files/kernel/configs/

reuben



2006-01-07 15:09:03

by Jesper Juhl

Subject: Re: 2.6.15-mm2

On 1/7/06, Andrew Morton <[email protected]> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/
>
> This should be somewhat less buggy than 2.6.15-mm1.
>
For some maybe. For me it's just as broken as 2.6.15-mm1 :-(

I'll turn on all debug switches and try and collect some crash dumps.
If there's anything specific you want me to try, let me know.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-07 16:20:33

by Adrian Bunk

Subject: 2.6.15-mm2: why is __get_page_state() global again?

On Sat, Jan 07, 2006 at 05:22:21AM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.15-mm1:
>...
> +revert-mm-page_state-fixes.patch
>
> This was a deoptimisation
>...

From reading the patch description I don't understand why this makes
__get_page_state() global again.

Is there a reason or is this accidental?

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-01-07 18:00:41

by Adrian Bunk

Subject: [-mm patch] drivers/block/amiflop.c: fix compilation

add-block_device_operationsgetgeo-block-device-method.patch causes the
following compile error:

<-- snip -->

...
CC drivers/block/amiflop.o
drivers/block/amiflop.c: In function `fd_getgeo':
drivers/block/amiflop.c:1431: warning: implicit declaration of function `minor'
...
LD .tmp_vmlinux1
...
drivers/built-in.o(.text+0x215cc): In function `fd_getgeo':
: undefined reference to `minor'
make: *** [.tmp_vmlinux1] Error 1

<-- snip -->


Signed-off-by: Adrian Bunk <[email protected]>

--- linux-2.6.15-mm2-m68k/drivers/block/amiflop.c.old 2006-01-07 18:22:21.000000000 +0100
+++ linux-2.6.15-mm2-m68k/drivers/block/amiflop.c 2006-01-07 18:22:30.000000000 +0100
@@ -1428,7 +1428,7 @@

static int fd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
- int drive = minor(bdev->bd_dev) & 3;
+ int drive = MINOR(bdev->bd_dev) & 3;

geo->heads = unit[drive].type->heads;
geo->sectors = unit[drive].dtype->sects * unit[drive].type->sect_mult;
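
For readers unfamiliar with the distinction, here is a minimal userspace sketch of what the fixed line computes. The macros are copied by hand from the 2.6 kernel's include/linux/kdev_t.h (20 minor bits), which is an assumption about the tree in use; drive_from_dev() and the device number in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-copied userspace versions of the kernel's dev_t helpers
 * (assumed to match include/linux/kdev_t.h in 2.6). */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (mi))

/* Hypothetical helper: the same expression as the fixed line in
 * fd_getgeo(). The low two bits of the minor number pick one of
 * four drives. */
unsigned int drive_from_dev(uint32_t dev)
{
    return MINOR(dev) & 3;
}
```

With these definitions, drive_from_dev(MKDEV(2, 5)) evaluates to 1, because minor 5 is binary 101 and only the low two bits are kept.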

2006-01-07 18:19:29

by Adrian Bunk

Subject: [-mm patch] drivers/acpi/: make two functions static

On Sat, Jan 07, 2006 at 05:22:21AM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.15-mm1:
>...
> +fix-some-f_ops-abuse-in-acpi.patch
>...

After this patch, we can make two functions static.


Signed-off-by: Adrian Bunk <[email protected]>

---

drivers/acpi/processor_thermal.c | 6 +++---
drivers/acpi/processor_throttling.c | 6 +++---
include/acpi/processor.h | 6 ------
3 files changed, 6 insertions(+), 12 deletions(-)

--- linux-2.6.15-mm2-full/include/acpi/processor.h.old 2006-01-07 17:52:17.000000000 +0100
+++ linux-2.6.15-mm2-full/include/acpi/processor.h 2006-01-07 17:52:28.000000000 +0100
@@ -223,9 +223,6 @@
/* in processor_throttling.c */
int acpi_processor_get_throttling_info(struct acpi_processor *pr);
int acpi_processor_set_throttling(struct acpi_processor *pr, int state);
-ssize_t acpi_processor_write_throttling(struct file *file,
- const char __user * buffer,
- size_t count, loff_t * data);
extern struct file_operations acpi_processor_throttling_fops;

/* in processor_idle.c */
@@ -237,9 +234,6 @@

/* in processor_thermal.c */
int acpi_processor_get_limit_info(struct acpi_processor *pr);
-ssize_t acpi_processor_write_limit(struct file *file,
- const char __user * buffer,
- size_t count, loff_t * data);
extern struct file_operations acpi_processor_limit_fops;

#ifdef CONFIG_CPU_FREQ
--- linux-2.6.15-mm2-full/drivers/acpi/processor_throttling.c.old 2006-01-07 17:52:37.000000000 +0100
+++ linux-2.6.15-mm2-full/drivers/acpi/processor_throttling.c 2006-01-07 17:52:54.000000000 +0100
@@ -306,9 +306,9 @@
PDE(inode)->data);
}

-ssize_t acpi_processor_write_throttling(struct file * file,
- const char __user * buffer,
- size_t count, loff_t * data)
+static ssize_t acpi_processor_write_throttling(struct file * file,
+ const char __user * buffer,
+ size_t count, loff_t * data)
{
int result = 0;
struct seq_file *m = (struct seq_file *)file->private_data;
--- linux-2.6.15-mm2-full/drivers/acpi/processor_thermal.c.old 2006-01-07 17:53:04.000000000 +0100
+++ linux-2.6.15-mm2-full/drivers/acpi/processor_thermal.c 2006-01-07 17:53:16.000000000 +0100
@@ -348,9 +348,9 @@
PDE(inode)->data);
}

-ssize_t acpi_processor_write_limit(struct file * file,
- const char __user * buffer,
- size_t count, loff_t * data)
+static ssize_t acpi_processor_write_limit(struct file * file,
+ const char __user * buffer,
+ size_t count, loff_t * data)
{
int result = 0;
struct seq_file *m = (struct seq_file *)file->private_data;
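
The reason this works, presumably, is that the two write handlers are no longer called by name from other files and are only reached through the file_operations tables, which stay extern in the header. A minimal sketch of that pattern in plain C follows; toy_file_ops, toy_write and toy_fops are invented names standing in for the real structures, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations: a single hook. */
struct toy_file_ops {
    long (*write)(const char *buf, size_t count);
};

/* Internal linkage: the symbol is invisible outside this file ... */
static long toy_write(const char *buf, size_t count)
{
    (void)buf;              /* the sketch ignores the data itself */
    return (long)count;     /* pretend everything was written */
}

/* ... but callers in other files can still reach it through the
 * exported operations table. */
const struct toy_file_ops toy_fops = {
    .write = toy_write,
};
```

Another file would declare `extern const struct toy_file_ops toy_fops;` and call `toy_fops.write(buf, len)`, without ever needing a prototype for toy_write().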

2006-01-07 18:21:17

by Adrian Bunk

Subject: [-mm patch] kernel/synchro-test.c: make 5 functions static

On Sat, Jan 07, 2006 at 05:22:21AM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.15-mm1:
>...
> +mutex-subsystem-synchro-test-module.patch
>...
> mutex stuff
>...

This patch makes five needlessly global functions static.


Signed-off-by: Adrian Bunk <[email protected]>

---

kernel/synchro-test.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.15-mm2-full/kernel/synchro-test.c.old 2006-01-07 17:26:30.000000000 +0100
+++ linux-2.6.15-mm2-full/kernel/synchro-test.c 2006-01-07 17:27:11.000000000 +0100
@@ -221,7 +221,7 @@
schedule();
}

-int mutexer(void *arg)
+static int mutexer(void *arg)
{
unsigned int N = (unsigned long) arg;

@@ -243,7 +243,7 @@
complete_and_exit(&mx_comp[N], 0);
}

-int semaphorer(void *arg)
+static int semaphorer(void *arg)
{
unsigned int N = (unsigned long) arg;

@@ -265,7 +265,7 @@
complete_and_exit(&sm_comp[N], 0);
}

-int reader(void *arg)
+static int reader(void *arg)
{
unsigned int N = (unsigned long) arg;

@@ -289,7 +289,7 @@
complete_and_exit(&rd_comp[N], 0);
}

-int writer(void *arg)
+static int writer(void *arg)
{
unsigned int N = (unsigned long) arg;

@@ -313,7 +313,7 @@
complete_and_exit(&wr_comp[N], 0);
}

-int downgrader(void *arg)
+static int downgrader(void *arg)
{
unsigned int N = (unsigned long) arg;


2006-01-07 19:32:18

by Brice Goglin

Subject: Re: 2.6.15-mm2

0000:00:00.0 Host bridge: Intel Corporation Mobile 915GM/PM/GMS/910GML Express Processor to DRAM Controller (rev 03)
Subsystem: IBM: Unknown device 0575
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Capabilities: <available only to root>
00: 86 80 90 25 06 01 90 20 03 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 75 05
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00

0000:00:01.0 PCI bridge: Intel Corporation Mobile 915GM/PM Express PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x08 (32 bytes)
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: a8100000-a81fffff
Prefetchable memory behind bridge: c0000000-c7ffffff
BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
Capabilities: <available only to root>
00: 86 80 91 25 07 05 10 00 03 00 04 06 08 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 20 20 00 20
20: 10 a8 10 a8 00 c0 f0 c7 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 0b 01 0c 00

0000:00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x08 (32 bytes)
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: a8200000-a82fffff
Prefetchable memory behind bridge: 00000000fff00000-0000000000000000
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: <available only to root>
00: 86 80 60 26 07 05 10 00 03 00 04 06 08 00 81 00
10: 00 00 00 00 00 00 00 00 00 02 02 00 f0 00 00 20
20: 20 a8 20 a8 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 04 00

0000:00:1c.2 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 3 (rev 03) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x08 (32 bytes)
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: a8300000-a83fffff
Prefetchable memory behind bridge: 00000000c8000000-00000000c8000000
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: <available only to root>
00: 86 80 64 26 07 05 10 00 03 00 04 06 08 00 81 00
10: 00 00 00 00 00 00 00 00 00 03 03 00 30 30 00 00
20: 30 a8 30 a8 01 c8 01 c8 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 03 04 00

0000:00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 03) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 0565
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 169
Region 4: I/O ports at 1800 [size=32]
00: 86 80 58 26 05 00 80 02 03 00 03 0c 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 18 00 00 00 00 00 00 00 00 00 00 14 10 65 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 00

0000:00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 (rev 03) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 0565
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin B routed to IRQ 225
Region 4: I/O ports at 1820 [size=32]
00: 86 80 59 26 05 00 80 02 03 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 21 18 00 00 00 00 00 00 00 00 00 00 14 10 65 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 02 00 00

0000:00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 (rev 03) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 0565
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin C routed to IRQ 233
Region 4: I/O ports at 1840 [size=32]
00: 86 80 5a 26 05 00 80 02 03 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 41 18 00 00 00 00 00 00 00 00 00 00 14 10 65 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 03 00 00

0000:00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 (rev 03) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 0565
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin D routed to IRQ 50
Region 4: I/O ports at 1860 [size=32]
00: 86 80 5b 26 05 00 80 02 03 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 61 18 00 00 00 00 00 00 00 00 00 00 14 10 65 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 04 00 00

0000:00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 03) (prog-if 20 [EHCI])
Subsystem: IBM: Unknown device 0566
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin D routed to IRQ 50
Region 0: Memory at a8000000 (32-bit, non-prefetchable) [size=1K]
Capabilities: <available only to root>
00: 86 80 5c 26 06 01 90 02 03 20 03 0c 00 00 00 00
10: 00 00 00 a8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 66 05
30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 04 00 00

0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev d3) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=04, subordinate=07, sec-latency=64
I/O behind bridge: 00004000-00007fff
Memory behind bridge: a8400000-b7ffffff
Prefetchable memory behind bridge: 00000000d0000000-00000000d7f00000
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: <available only to root>
00: 86 80 48 24 07 01 10 00 d3 01 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 04 07 40 40 70 80 22
20: 40 a8 f0 b7 01 d0 f1 d7 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 04 00

0000:00:1e.2 Multimedia audio controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller (rev 03)
Subsystem: IBM: Unknown device 0567
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 185
Region 0: I/O ports at 1c00 [size=256]
Region 1: I/O ports at 1880 [size=64]
Region 2: Memory at a8000800 (32-bit, non-prefetchable) [size=512]
Region 3: Memory at a8000400 (32-bit, non-prefetchable) [size=256]
Capabilities: <available only to root>
00: 86 80 6e 26 07 00 90 02 03 00 01 04 00 00 00 00
10: 01 1c 00 00 81 18 00 00 00 08 00 a8 00 04 00 a8
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 67 05
30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00

0000:00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface Bridge (rev 03)
Subsystem: IBM: Unknown device 0568
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
00: 86 80 41 26 07 00 00 02 03 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 68 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller (rev 03) (prog-if 80 [Master])
Subsystem: IBM: Unknown device 056a
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 0: I/O ports at <unassigned>
Region 1: I/O ports at <unassigned>
Region 2: I/O ports at <unassigned>
Region 3: I/O ports at <unassigned>
Region 4: I/O ports at 18c0 [size=16]
Capabilities: <available only to root>
00: 86 80 53 26 05 00 b8 02 03 80 01 01 00 00 00 00
10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
20: c1 18 00 00 00 00 00 00 00 00 00 00 14 10 6a 05
30: 00 00 00 00 70 00 00 00 00 00 00 00 ff 00 00 00

0000:00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
Subsystem: IBM: Unknown device 056b
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 11
Region 4: I/O ports at 18e0 [size=32]
00: 86 80 6a 26 01 01 80 02 03 00 05 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: e1 18 00 00 00 00 00 00 00 00 00 00 14 10 6b 05
30: 00 00 00 00 00 00 00 00 00 00 00 00 0b 01 00 00

0000:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon Mobility M300] (prog-if 00 [VGA])
Subsystem: IBM: Unknown device 056e
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 169
Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 2000 [size=256]
Region 2: Memory at a8100000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at a8120000 [disabled] [size=128K]
Capabilities: <available only to root>
00: 02 10 60 54 07 01 10 00 00 00 00 03 08 00 00 00
10: 08 00 00 c0 01 20 00 00 00 00 10 a8 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6e 05
30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00

0000:02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751M Gigabit Ethernet PCI Express (rev 11)
Subsystem: IBM: Unknown device 0577
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 169
Region 0: Memory at a8200000 (64-bit, non-prefetchable) [size=64K]
Capabilities: <available only to root>
00: e4 14 7d 16 06 01 10 00 11 00 00 02 08 00 00 00
10: 04 00 20 a8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 07 00 00 00 14 10 77 05
30: 00 00 00 00 48 00 00 00 00 00 00 00 0b 01 00 00

0000:04:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev 8d)
Subsystem: IBM: Unknown device 056c
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 168
Interrupt: pin A routed to IRQ 169
Region 0: Memory at a8400000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=04, secondary=05, subordinate=08, sec-latency=176
Memory window 0: d0000000-d1fff000 (prefetchable)
Memory window 1: aa000000-abfff000
I/O window 0: 00004000-000040ff
I/O window 1: 00004400-000044ff
BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset- 16bInt+ PostWrite+
16-bit legacy interface ports at 0001
00: 80 11 76 04 07 00 10 02 8d 00 07 06 00 a8 82 00
10: 00 00 40 a8 dc 00 00 02 04 05 08 b0 00 00 00 d0
20: 00 f0 ff d1 00 00 00 aa 00 f0 ff ab 00 40 00 00
30: fc 40 00 00 00 44 00 00 fc 44 00 00 0b 01 80 05
40: 14 10 6c 05 01 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000:04:02.0 Network controller: Intel Corporation PRO/Wireless 2200BG (rev 05)
Subsystem: Intel Corporation: Unknown device 2711
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (750ns min, 6000ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 58
Region 0: Memory at a8401000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <available only to root>
00: 86 80 20 42 16 01 90 02 05 00 80 02 08 40 00 00
10: 00 10 40 a8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 11 27
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 03 18


Attachments:
config-2.6.15-mm2 (40.60 kB)
dmesg-2.6.15 (14.44 kB)
dmesg-2.6.15-mm2 (15.05 kB)
lspci (13.43 kB)

2006-01-07 20:49:54

by Alexey Dobriyan

Subject: 2.6.15-mm2: alpha broken

alpha Just Broken (TM)
----------------------------------------------------------------------------
CC arch/alpha/kernel/asm-offsets.s
In file included from include/asm/user.h:5,
from include/linux/user.h:1,
from include/linux/kernel.h:16,
from include/linux/spinlock.h:54,
from include/linux/capability.h:45,
from include/linux/sched.h:7,
from arch/alpha/kernel/asm-offsets.c:9:
include/linux/ptrace.h: In function `ptrace_link':
include/linux/ptrace.h:100: error: dereferencing pointer to incomplete type
include/linux/ptrace.h: In function `ptrace_unlink':
include/linux/ptrace.h:105: error: dereferencing pointer to incomplete type
make[1]: *** [arch/alpha/kernel/asm-offsets.s] Error 1

2006-01-07 20:51:38

by Andrew James Wade

Subject: Badness in __mutex_unlock_slowpath

Hello,

I got this when "amaroK" started playing:

Badness in __mutex_unlock_slowpath at kernel/mutex.c:214
[<c03538e8>] __mutex_unlock_slowpath+0x56/0x1a2
[<c0302f08>] snd_pcm_oss_write+0x0/0x1e0
[<c0302f3c>] snd_pcm_oss_write+0x34/0x1e0
[<c0302f08>] snd_pcm_oss_write+0x0/0x1e0
[<c0148221>] vfs_write+0x83/0x122
[<c0148a36>] sys_write+0x3c/0x63
[<c0102ba3>] sysenter_past_esp+0x54/0x75

(2.6.15-mm2)

The sound was garbled with amaroK, but fine with xmms.
Note: I'm underclocking by a couple of percent.

HTH,
Andrew Wade


Attachments:
(No filename) (531.00 B)
.config (31.20 kB)

2006-01-07 21:04:24

by Dave Jones

Subject: Re: 2.6.15-mm2

On Sat, Jan 07, 2006 at 02:31:58PM -0500, Brice Goglin wrote:
> Andrew Morton wrote:
>
> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/
> >
> >This should be somewhat less buggy than 2.6.15-mm1.
> >
> >
> Hi Andrew,
>
> I get several problems on my Thinkpad T43:
> 1) agpgart does not load, it looks like something is returning ENODEV.
> Reverting git-agpgart fixes it.

Are you sure this device ...

> 0000:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon Mobility M300] (prog-if 00 [VGA])
> Subsystem: IBM: Unknown device 056e
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x08 (32 bytes)
> Interrupt: pin A routed to IRQ 169
> Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
> Region 1: I/O ports at 2000 [size=256]
> Region 2: Memory at a8100000 (32-bit, non-prefetchable) [size=64K]
> Expansion ROM at a8120000 [disabled] [size=128K]
> Capabilities: <available only to root>
> 00: 02 10 60 54 07 01 10 00 00 00 00 03 08 00 00 00
> 10: 08 00 00 c0 01 20 00 00 00 00 10 a8 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6e 05
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00

is actually AGP ? (lspci -vvv of that device [as root] will tell you)

Dave

2006-01-07 21:13:52

by Arjan van de Ven

Subject: Re: Badness in __mutex_unlock_slowpath

On Sat, 2006-01-07 at 15:51 -0500, Andrew James Wade wrote:
> Hello,
>
> I got this when "amaroK" started playing:
>
> Badness in __mutex_unlock_slowpath at kernel/mutex.c:214
> [<c03538e8>] __mutex_unlock_slowpath+0x56/0x1a2
> [<c0302f08>] snd_pcm_oss_write+0x0/0x1e0
> [<c0302f3c>] snd_pcm_oss_write+0x34/0x1e0
> [<c0302f08>] snd_pcm_oss_write+0x0/0x1e0
> [<c0148221>] vfs_write+0x83/0x122
> [<c0148a36>] sys_write+0x3c/0x63
> [<c0102ba3>] sysenter_past_esp+0x54/0x75
>
> (2.6.15-mm2)


this looks like a really evil alsa bug:

(pre mutex code below)

static ssize_t snd_pcm_oss_write(struct file *file, const char __user *buf, size_t count, loff_t *offset)
{
snd_pcm_oss_file_t *pcm_oss_file;
snd_pcm_substream_t *substream;
long result;

pcm_oss_file = file->private_data;
substream = pcm_oss_file->streams[SNDRV_PCM_STREAM_PLAYBACK];
if (substream == NULL)
return -ENXIO;
up(&file->f_dentry->d_inode->i_sem);
result = snd_pcm_oss_write1(substream, buf, count);
down(&file->f_dentry->d_inode->i_sem);


this is a .write method of a driver, which doesn't run with i_sem held at all.
Best guess I have is that this code has up() and down() confused and switched...
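
To make the failure mode concrete, here is a minimal single-threaded sketch in plain C. toy_mutex and its functions are hypothetical and are not the kernel's struct mutex; they only model "unlock while not held", which is the kind of thing a debugging mutex can warn about. With a counting semaphore an unbalanced up() would, as far as I can tell, just raise the count and go unnoticed.

```c
#include <assert.h>

/* Hypothetical toy lock with a debug counter; not the kernel API. */
struct toy_mutex {
    int locked;        /* 1 while held */
    int bad_unlocks;   /* unlock attempts made while not held */
};

void toy_lock(struct toy_mutex *m)
{
    assert(!m->locked);    /* single-threaded sketch: never contended */
    m->locked = 1;
}

void toy_unlock(struct toy_mutex *m)
{
    if (!m->locked) {
        m->bad_unlocks++;  /* where a debugging mutex would complain */
        return;
    }
    m->locked = 0;
}

/* Same shape as the quoted snd_pcm_oss_write(): drop the lock around
 * the real work, then take it again. This is only balanced if the
 * caller already holds the lock on entry. */
void toy_write(struct toy_mutex *m)
{
    toy_unlock(m);
    /* ... the actual write would happen here ... */
    toy_lock(m);
}
```

Calling toy_write() on a lock that nobody holds records one bad unlock and returns with the lock taken, so the next caller of toy_lock() would never get it.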



2006-01-07 21:26:56

by Brice Goglin

Subject: Re: 2.6.15-mm2

Dave Jones wrote:

>Are you sure this device ...
>
> > 0000:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon Mobility M300] (prog-if 00 [VGA])
> > Subsystem: IBM: Unknown device 056e
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> > Latency: 0, Cache Line Size: 0x08 (32 bytes)
> > Interrupt: pin A routed to IRQ 169
> > Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
> > Region 1: I/O ports at 2000 [size=256]
> > Region 2: Memory at a8100000 (32-bit, non-prefetchable) [size=64K]
> > Expansion ROM at a8120000 [disabled] [size=128K]
> > Capabilities: <available only to root>
> > 00: 02 10 60 54 07 01 10 00 00 00 00 03 08 00 00 00
> > 10: 08 00 00 c0 01 20 00 00 00 00 10 a8 00 00 00 00
> > 20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6e 05
> > 30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00
>
>is actually AGP ? (lspci -vvv of that device [as root] will tell you)
>
> Dave
>
>
>

Hi Dave,

It might be a PCI Express, I'm not sure. Here's lspci -vvv as root.
Where am I supposed to look ?

0000:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon
Mobility M300] (prog-if 00 [VGA])
Subsystem: IBM: Unknown device 056e
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 169
Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 2000 [size=256]
Region 2: Memory at a8100000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at a8120000 [disabled] [size=128K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] #10 [0001]
Capabilities: [80] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000

Assuming this is a PCI Express card, then what is the proper fix ?
Should I prevent my initscript from loading agpgart (actually intel_agp)
at all ? (I guess udev or hotplug is trying to load it here). Is there
something like agpgart for PCI express ? Or is it useless ?

Brice

2006-01-07 21:30:33

by David Miller

Subject: Re: 2.6.15-mm2

From: Brice Goglin <[email protected]>
Date: Sat, 07 Jan 2006 16:26:44 -0500

> It might be a PCI Express, I'm not sure.

It is PCI Express, I have the same thing in my T43p.
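
For anyone else wondering where to look in that lspci output: the
"Capabilities: [58] #10" line is the giveaway. Capability ID 0x10 is the
PCI Express capability (this lspci just doesn't know its name), while an
AGP device would list capability ID 0x02. A minimal sketch of the same
check over a raw config-space dump follows. The walk is the usual way a
capability list is followed; the function names are made up for
illustration, and the bytes at 0x50 and above are reconstructed from the
capability offsets lspci printed, not read from the hex dump.

```c
#include <stdint.h>
#include <string.h>

#define PCI_STATUS          0x06  /* status register, low byte */
#define PCI_STATUS_CAP_LIST 0x10  /* "Cap+" in lspci */
#define PCI_CAPABILITY_LIST 0x34  /* offset of first capability */

#define PCI_CAP_ID_PM  0x01  /* Power Management */
#define PCI_CAP_ID_AGP 0x02  /* Accelerated Graphics Port */
#define PCI_CAP_ID_MSI 0x05  /* Message Signalled Interrupts */
#define PCI_CAP_ID_EXP 0x10  /* PCI Express */

/* Return the config-space offset of capability cap_id, or 0 if absent. */
static int find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	int ttl = 48;  /* guard against a looping list */
	uint8_t pos;

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;  /* next-capability pointer */
	}
	return 0;
}

/* Hypothetical sample matching the lspci listing: PM at 0x50,
 * capability 0x10 at 0x58, MSI at 0x80, end of list. */
static void fill_sample_config(uint8_t cfg[256])
{
	memset(cfg, 0, 256);
	cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST;
	cfg[PCI_CAPABILITY_LIST] = 0x50;
	cfg[0x50] = PCI_CAP_ID_PM;  cfg[0x51] = 0x58;
	cfg[0x58] = PCI_CAP_ID_EXP; cfg[0x59] = 0x80;
	cfg[0x80] = PCI_CAP_ID_MSI; cfg[0x81] = 0x00;
}
```

With the sample filled in, find_capability(cfg, PCI_CAP_ID_EXP) finds
the entry at 0x58 and find_capability(cfg, PCI_CAP_ID_AGP) comes back
empty, which is the PCI Express answer.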

2006-01-07 21:31:30

by Andrew Morton

Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> ...
> QLogic Fibre Channel HBA Driver
> ahci: probe of 0000:00:1f.2 failed with error -12

It's odd that the ahci driver returned -EBUSY. Maybe this is due to "we
have legacy mode, but all ports are unavailable" in ata_pci_init_one().

> ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
> ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c0234702
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> last sysfs file:
> Modules linked in:
> CPU: 1
> EIP: 0060:[<c0234702>] Not tainted VLI
> EFLAGS: 00010206 (2.6.15-mm2)
> EIP is at make_class_name+0x28/0x8d
> eax: 00000000 ebx: ffffffff ecx: ffffffff edx: c19d9224
> esi: 00000009 edi: 00000000 ebp: 00000000 esp: c1921d9c
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 1, threadinfo=c1921000 task=c1920a70)
> Stack: <0>c19d9224 c03a9158 c19d9224 c03a9158 c03a9160 c0234925 c03a90e0 00000000
> <0>00000246 c19d9224 c19d9000 c19d9030 00000002 c02349db c19d90e4 c0253218
> <0>c19d92c0 00000000 00000000 c0276693 00000000 c0279391 c035749f c1961640
> Call Trace:
> [<c0234925>] class_device_del+0x9f/0x14d
> [<c02349db>] class_device_unregister+0x8/0x10
> [<c0253218>] scsi_remove_host+0xb8/0xf8
> [<c0276693>] ata_host_remove+0xe/0x18
> [<c0279391>] ata_device_add+0x2d3/0xb99
> [<c02b6fb0>] pci_mmcfg_write+0xd3/0x103
> [<c01eb713>] pci_bus_write_config_byte+0x4e/0x58
> [<c02b67d3>] pcibios_set_master+0x74/0x8c
> [<c027a2e5>] ata_pci_init_one+0x32c/0x38e
> [<c01eb7ea>] pci_bus_read_config_word+0x62/0x6c
> [<c01ef8bd>] pci_get_subsys+0x6c/0xe0
> [<c027e334>] piix_init_one+0x18e/0x33a
> [<c01ef259>] pci_device_probe+0x40/0x5b
> [<c0233ed7>] driver_probe_device+0x35/0x98
> [<c0234038>] __driver_attach+0x8a/0x8c
> [<c02339a7>] bus_for_each_dev+0x39/0x57
> [<c0233e4c>] driver_attach+0x16/0x1a
> [<c0233fae>] __driver_attach+0x0/0x8c
> [<c023365b>] bus_add_driver+0x6f/0x126
> [<c01ef3f1>] __pci_register_driver+0x7d/0xac
> [<c04023e9>] piix_init+0xc/0x1e
> [<c01003c8>] init+0xff/0x324
> [<c01002c9>] init+0x0/0x324
> [<c0100d35>] kernel_thread_helper+0x5/0xb
> Code: 89 c8 c3 55 57 56 53 83 ec 04 89 04 24 89 c2 8b 40 48 8b 38 31 ed bb ff ff
> ff ff 89 d9 89 e8 f2 ae f7 d1 49 89 ce 8b 7a 08 89 d9 <f2> ae f7 d1 49 89 ca 8d
> 4e 02 8d 04 0a ba d0 00 00 00 e8 22 cf

ata_device_add() has given up, has called ata_host_remove() and then we
presumably oopsed over incompletely initialised class stuff. It's likely
that this oops is a second bug - a consequence of the -EBUSY.

>
>
> 2. Notice above how the sky2 driver is being bailed out:
>
> ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
> sky2 Cannot find PowerManagement capability, aborting.
> sky2: probe of 0000:04:00.0 failed with error -5
>
> This has happened a number of times in the last few days, and I suspect is
> unrelated to the oops that followed above.
>
> This driver worked fine in 2.6.15-rc5-mm3, and seems to work OK when built as a
> module. But most of the time (not all the time) it doesn't like being
> statically built in and fails with the above error. Changes to this driver have
> been fairly small lately so I'm not sure if it's the driver or something else
> like ACPI that is the root cause.

Could be acpi, yes.

Parenthetically, I wouldn't have thought that this error should be fatal
for the driver.

> 3. The boot up process with -mm2 was pretty lengthy, I had two periods of time
> when the whole system just came to a crawl, first time was when starting cups,
> and it came back to life and continued booting about 30s later. Next when
> starting hpijs it didn't come to life at all and I had to reboot. No output to
> the console for either, unfortunately.

Don't know, sorry. But this kernel had oopsed, hadn't it?
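
A side note on the numbers in these probe failures: on Linux, errno 12
is ENOMEM and 16 is EBUSY, so "failed with error -12" reads as -ENOMEM,
and the sky2 "error -5" is -EIO. A small sketch for decoding them, using
only the standard <errno.h> constants; the helper name errno_name is
made up here and covers just a handful of codes.

```c
#include <errno.h>

/* Map a (possibly negated) errno value, as printed in
 * "probe of ... failed with error -N", to its symbolic name.
 * Only a few codes are handled; everything else is "unknown". */
static const char *errno_name(int err)
{
	if (err < 0)
		err = -err;

	switch (err) {
	case EIO:
		return "EIO";
	case ENOMEM:
		return "ENOMEM";
	case EBUSY:
		return "EBUSY";
	case ENODEV:
		return "ENODEV";
	default:
		return "unknown";
	}
}
```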

2006-01-07 21:42:19

by Dave Jones

Subject: Re: 2.6.15-mm2

On Sat, Jan 07, 2006 at 04:26:44PM -0500, Brice Goglin wrote:

> It might be a PCI Express, I'm not sure. Here's lspci -vvv as root.
> Where am I supposed to look ?
>
> 0000:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon
> Mobility M300] (prog-if 00 [VGA])
> Subsystem: IBM: Unknown device 056e
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR+ FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR-
> Latency: 0, Cache Line Size: 0x08 (32 bytes)
> Interrupt: pin A routed to IRQ 169
> Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
> Region 1: I/O ports at 2000 [size=256]
> Region 2: Memory at a8100000 (32-bit, non-prefetchable) [size=64K]
> Expansion ROM at a8120000 [disabled] [size=128K]
> Capabilities: [50] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] #10 [0001]
> Capabilities: [80] Message Signalled Interrupts: 64bit+
> Queue=0/0 Enable-
> Address: 0000000000000000 Data: 0000
>
> Assuming this is a PCI Express card, then what is the proper fix ?

It's PCIE.

> Should I prevent my initscript from loading agpgart (actually intel_agp)
> at all ? (I guess udev or hotplug is trying to load it here). Is there
> something like agpgart for PCI express ? Or is it useless ?

it's useless. though the loading of it shouldn't harm anything.
Does it spew warnings during your boot ?

Dave

2006-01-07 21:41:46

by Arjan van de Ven

Subject: Re: 2.6.15-mm2


> Assuming this is a PCI Express card, then what is the proper fix ?
> Should I prevent my initscript from loading agpgart (actually intel_agp)
> at all ? (I guess udev or hotplug is trying to load it here). Is there
> something like agpgart for PCI express ? Or is it useless ?

PCI express neither needs nor can use AGP.

(and to be honest, AGP is one of those things that is best compiled into
the kernel if you need it. AGP deals with system/memory resources, and
some bioses fuck that up. If agp is built in, it'll be fixed in time.
There have been a series of bugs about it against fedora, until we
made it compiled in .. and poof.. bugs gone)


2006-01-07 21:50:58

by Brice Goglin

Subject: Re: 2.6.15-mm2

Dave Jones wrote:

> > Should I prevent my initscript from loading agpgart (actually intel_agp)
> > at all ? (I guess udev or hotplug is trying to load it here). Is there
> > something like agpgart for PCI express ? Or is it useless ?
>
>it's useless. though the loading of it shouldn't harm anything.
>Does it spew warnings during your boot ?
>
>
No, I don't see any warning/problem.

Thanks.
Brice

2006-01-07 21:55:19

by Chuck Ebbert

Subject: Re: 2.6.15-mm2

sysfs-crash-debugging.patch (from Adrian Bunk) has this:

+ printk(KERN_ALERT "last sysfs file: %s\n", last_sysfs_file);

but davej has changed all the messages around it to KERN_EMERG in his earlier
printk-levels-for-i386-oops-code.patch

--
Chuck
Currently reading: _Sleepside: The Collected Fantasies Of Greg Bear_

2006-01-07 22:04:45

by Chuck Ebbert

Subject: Re: 2.6.15-mm2

printk-levels-for-i386-oops-code.patch has two minor problems:

> @@ -178,14 +178,15 @@ void show_stack(struct task_struct *task
> }
>
> stack = esp;
> + printk(KERN_EMERG);
> for(i = 0; i < kstack_depth_to_print; i++) {
> if (kstack_end(stack))
> break;
> if (i && ((i % 8) == 0))
> - printk("\n ");
> + printk("\n " KERN_EMERG);

Should be:
+ printk("\n" KERN_EMERG " ");

> printk("%08lx ", *stack++);
> }
> - printk("\nCall Trace:\n");
> + printk("\n" KERN_EMERG "Call Trace:\n");
> show_trace(task, esp);
> }


And:

> @@ -236,17 +237,17 @@ void show_registers(struct pt_regs *regs
> if (in_kernel) {
> u8 __user *eip;
>
> - printk("\nStack: ");
> + printk("\n" KERN_EMERG "Stack: ");
> show_stack(NULL, (unsigned long*)esp);
>
> - printk("Code: ");
> + printk(KERN_EMERG "Code: ");
>
> eip = (u8 __user *)regs->eip - 43;
> for (i = 0; i < 64; i++, eip++) {
> unsigned char c;
>
> if (eip < (u8 __user *)PAGE_OFFSET || __get_user(c, eip)) {
> - printk(" Bad EIP value.");
> + printk(KERN_EMERG " Bad EIP value.");

The above one-line change should not be made -- it's in the middle of a line.

> break;
> }
> if (eip == (u8 __user *)regs->eip)
--
Chuck
Currently reading: _Sleepside: The Collected Fantasies Of Greg Bear_
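
The reason the order matters: in these kernels a log level such as
KERN_EMERG is just a string literal ("<0>") pasted in front of the
message, and printk only treats it as a level when it sits at the very
start of a line. Anywhere else it is printed literally, which is
consistent with the stray "<0>" in the stack dump of the oops quoted
earlier in this thread. A small user-space sketch of the two
concatenations; the helper names are made up, and KERN_EMERG is
redefined locally to mirror the kernel's definition of that era.

```c
#include <string.h>

/* Local stand-in for the kernel macro (assumption: "<0>", as in 2.6.15). */
#define KERN_EMERG "<0>"

/* Separator as written in the patch: level marker after the indent. */
static const char *patch_separator(void)
{
	return "\n       " KERN_EMERG;
}

/* Separator as suggested in the review: level marker right after '\n'. */
static const char *fixed_separator(void)
{
	return "\n" KERN_EMERG "       ";
}

/* Return 1 if the level marker immediately follows the first newline,
 * i.e. it would be parsed as a log level and not printed as text. */
static int level_at_line_start(const char *s)
{
	const char *nl = strchr(s, '\n');

	if (nl == NULL)
		return 0;
	return strncmp(nl + 1, KERN_EMERG, strlen(KERN_EMERG)) == 0;
}
```

level_at_line_start() holds for fixed_separator() and fails for
patch_separator(), where the marker lands after the indent.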

2006-01-07 22:06:05

by Reuben Farrelly

Subject: Re: 2.6.15-mm2



On 8/01/2006 10:31 a.m., Andrew Morton wrote:
> Reuben Farrelly <[email protected]> wrote:
>> ...
>> QLogic Fibre Channel HBA Driver
>> ahci: probe of 0000:00:1f.2 failed with error -12
>
> It's odd that the ahci driver returned -EBUSY. Maybe this is due to "we
> have legacy mode, but all ports are unavailable" in ata_pci_init_one().

I've now removed this driver from my .config via menuconfig, I certainly don't
have the hardware and have no idea whatsoever how it came to be built in.
Although I guess it shouldn't be blowing up even if that is the case?

>> ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
>> ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
>> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>> printing eip:
>> c0234702
>> *pde = 00000000
>> Oops: 0000 [#1]
>> SMP
>> last sysfs file:
>> Modules linked in:
>> CPU: 1
>> EIP: 0060:[<c0234702>] Not tainted VLI
>> EFLAGS: 00010206 (2.6.15-mm2)
>> EIP is at make_class_name+0x28/0x8d
>> eax: 00000000 ebx: ffffffff ecx: ffffffff edx: c19d9224
>> esi: 00000009 edi: 00000000 ebp: 00000000 esp: c1921d9c
>> ds: 007b es: 007b ss: 0068
>> Process swapper (pid: 1, threadinfo=c1921000 task=c1920a70)
>> Stack: <0>c19d9224 c03a9158 c19d9224 c03a9158 c03a9160 c0234925 c03a90e0 00000000
>> <0>00000246 c19d9224 c19d9000 c19d9030 00000002 c02349db c19d90e4 c0253218
>> <0>c19d92c0 00000000 00000000 c0276693 00000000 c0279391 c035749f c1961640
>> Call Trace:
>> [<c0234925>] class_device_del+0x9f/0x14d
>> [<c02349db>] class_device_unregister+0x8/0x10
>> [<c0253218>] scsi_remove_host+0xb8/0xf8
>> [<c0276693>] ata_host_remove+0xe/0x18
>> [<c0279391>] ata_device_add+0x2d3/0xb99
>> [<c02b6fb0>] pci_mmcfg_write+0xd3/0x103
>> [<c01eb713>] pci_bus_write_config_byte+0x4e/0x58
>> [<c02b67d3>] pcibios_set_master+0x74/0x8c
>> [<c027a2e5>] ata_pci_init_one+0x32c/0x38e
>> [<c01eb7ea>] pci_bus_read_config_word+0x62/0x6c
>> [<c01ef8bd>] pci_get_subsys+0x6c/0xe0
>> [<c027e334>] piix_init_one+0x18e/0x33a
>> [<c01ef259>] pci_device_probe+0x40/0x5b
>> [<c0233ed7>] driver_probe_device+0x35/0x98
>> [<c0234038>] __driver_attach+0x8a/0x8c
>> [<c02339a7>] bus_for_each_dev+0x39/0x57
>> [<c0233e4c>] driver_attach+0x16/0x1a
>> [<c0233fae>] __driver_attach+0x0/0x8c
>> [<c023365b>] bus_add_driver+0x6f/0x126
>> [<c01ef3f1>] __pci_register_driver+0x7d/0xac
>> [<c04023e9>] piix_init+0xc/0x1e
>> [<c01003c8>] init+0xff/0x324
>> [<c01002c9>] init+0x0/0x324
>> [<c0100d35>] kernel_thread_helper+0x5/0xb
>> Code: 89 c8 c3 55 57 56 53 83 ec 04 89 04 24 89 c2 8b 40 48 8b 38 31 ed bb ff ff
>> ff ff 89 d9 89 e8 f2 ae f7 d1 49 89 ce 8b 7a 08 89 d9 <f2> ae f7 d1 49 89 ca 8d
>> 4e 02 8d 04 0a ba d0 00 00 00 e8 22 cf
>
> ata_device_add() has given up, has called ata_host_remove() and then we
> presumably oopsed over incompletely initialised class stuff. It's likely
> that this oops is a second bug - a consequence of the -EBUSY.
>
>>
>> 2. Notice above how the sky2 driver is being bailed out:
>>
>> ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
>> sky2 Cannot find PowerManagement capability, aborting.
>> sky2: probe of 0000:04:00.0 failed with error -5
>>
>> This has happened a number of times in the last few days, and I suspect is
>> unrelated to the oops that followed above.
>>
>> This driver worked fine in 2.6.15-rc5-mm3, and seems to work OK when built as a
>> module. But most of the time (not all the time) it doesn't like being
>> statically built in and fails with the above error. Changes to this driver have
>> been fairly small lately so I'm not sure if it's the driver or something else
>> like ACPI that is the root cause.
>
> Could be acpi, yes.
>
> Parenthetically, I wouldn't have thought that this error should be fatal
> for the driver.

lspci -vv shows that when the driver fails we see this:

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR+ <PERR-

and when it works we see this:

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>SERR- <PERR-

I'm not sure if that's a consequence of the fail, or the cause of it.

>> 3. The boot up process with -mm2 was pretty lengthy, I had two periods of time
>> when the whole system just came to a crawl, first time was when starting cups,
>> and it came back to life and continued booting about 30s later. Next when
>> starting hpijs it didn't come to life at all and I had to reboot. No output to
>> the console for either, unfortunately.
>
> Don't know, sorry. But this kernel had oopsed, hadn't it?

I reloaded multiple times, the oopsing only occurred till I did a full cold
boot, and then it came right (but until then I had the oops twice in a row
across a warm reboot).

If I have time to play later on today I'll see if I can get more info.

reuben


2006-01-07 22:13:25

by Dave Jones

Subject: Re: 2.6.15-mm2

On Sat, Jan 07, 2006 at 04:50:48PM -0500, Brice Goglin wrote:
> Dave Jones wrote:
>
> > > Should I prevent my initscript from loading agpgart (actually intel_agp)
> > > at all ? (I guess udev or hotplug is trying to load it here). Is there
> > > something like agpgart for PCI express ? Or is it useless ?
> >
> >it's useless. though the loading of it shouldn't harm anything.
> >Does it spew warnings during your boot ?
> >
> >
> No, I don't see any warning/problem.

I'm curious how you noticed this change of behaviour at all then :-)
(The only user visible change is that it no longer prints anything
about agpgart during boot. Was that what tipped you off?)

Dave

2006-01-07 22:26:42

by Brice Goglin

Subject: Re: 2.6.15-mm2

Dave Jones wrote:

>On Sat, Jan 07, 2006 at 04:50:48PM -0500, Brice Goglin wrote:
> > Dave Jones wrote:
> >
> > > > Should I prevent my initscript from loading agpgart (actually intel_agp)
> > > > at all ? (I guess udev or hotplug is trying to load it here). Is there
> > > > something like agpgart for PCI express ? Or is it useless ?
> > >
> > >it's useless. though the loading of it shouldn't harm anything.
> > >Does it spew warnings during your boot ?
> > >
> > >
> > No, I don't see any warning/problem.
>
>I'm curious how you noticed this change of behaviour at all then :-)
>(The only user visible change is that it no longer prints anything
> about agpgart during boot. Was that what tipped you off?)
>
>
I simply noticed modprobe saying that agpgart didn't get loaded ("No
such device") during hardware detection. Nothing else. Nothing bad then.

Brice

2006-01-07 22:58:19

by Andrew Morton

Subject: Re: 2.6.15-mm2

Brice Goglin <[email protected]> wrote:
>
> 2) acpi-cpufreq does not load either, returns ENODEV too. It's probably
> git-acpi. I tried to revert it but there are lots of other patches
> depending on it, so I finally gave up.

OK, let me try to reproduce this. acpi and cpufreq are fully merged up, so
this bug may well be in mainline now.

> 3) wpa_supplicant does not find my WPA network anymore (while iwlist
> scanning sees). I didn't see anything relevant in dmesg. My driver is
> ipw2200.

It's things like this which make me consider a career in carpentry.

I assume 2.6.15 works OK?

Unfortunately I don't know diddly about wpa_supplicant (how come FC5-test1
doesn't ship it?)

2006-01-07 23:15:32

by Reuben Farrelly

Subject: Re: 2.6.15-mm2

Replying to myself is not a good thing but:

On 8/01/2006 11:06 a.m., Reuben Farrelly wrote:
>
>
> On 8/01/2006 10:31 a.m., Andrew Morton wrote:
>> Reuben Farrelly <[email protected]> wrote:
>>> ...
>>> QLogic Fibre Channel HBA Driver
>>> ahci: probe of 0000:00:1f.2 failed with error -12
>>
>> It's odd that the ahci driver returned -EBUSY. Maybe this is due to "we
>> have legacy mode, but all ports are unavailable" in ata_pci_init_one().
>
> I've now removed this driver from my .config via menuconfig, I certainly
> don't have the hardware and have no idea whatsoever how it came to be
> built in. Although I guess it shouldn't be blowing up even if that is
> the case?

I thought I'd clear up that I only removed the QLogic driver, and not AHCI ;-)

reuben

2006-01-07 23:38:43

by Brice Goglin

Subject: Re: 2.6.15-mm2

Andrew Morton wrote:

>Brice Goglin <[email protected]> wrote:
>
>
>>2) acpi-cpufreq does not load either, returns ENODEV too. It's probably
>> git-acpi. I tried to revert it but there are lots of other patches
>> depending on it, so I finally gave up.
>>
>>
>
>OK, let me try to reproduce this. acpi and cpufreq are fully merged up, so
>this bug may well be in mainline now.
>
>
>
>> 3) wpa_supplicant does not find my WPA network anymore (while iwlist
>> scanning sees). I didn't see anything relevant in dmesg. My driver is
>> ipw2200.
>>
>>
>
>It's things like this which make me consider a career in carpentry.
>
>I assume 2.6.15 works OK?
>
>

2.6.15 and 2.6.15-git3 both don't show any of these issues. Did acpi and
cpufreq get merged after -git3 ?

thanks,
Brice

2006-01-07 23:40:29

by Andrew Morton

Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> Replying to myself is not a good thing but:
>
> On 8/01/2006 11:06 a.m., Reuben Farrelly wrote:
> >
> >
> > On 8/01/2006 10:31 a.m., Andrew Morton wrote:
> >> Reuben Farrelly <[email protected]> wrote:
> >>> ...
> >>> QLogic Fibre Channel HBA Driver
> >>> ahci: probe of 0000:00:1f.2 failed with error -12
> >>
> >> It's odd that the ahci driver returned -EBUSY. Maybe this is due to "we
> >> have legacy mode, but all ports are unavailable" in ata_pci_init_one().
> >
> > I've now removed this driver from my .config via menuconfig, I certainly
> > don't have the hardware and have no idea whatsoever how it came to be
> > built in. Although I guess it shouldn't be blowing up even if that is
> > the case?
>
> I thought I'd clear up that I only removed the QLogic driver, and not AHCI ;-)
>

That message was caused by the ahci driver.

2006-01-07 23:49:00

by Andrew Morton

Subject: Re: 2.6.15-mm2: alpha broken

Alexey Dobriyan <[email protected]> wrote:
>
> alpha Just Broken (TM)
> ----------------------------------------------------------------------------
> CC arch/alpha/kernel/asm-offsets.s
> In file included from include/asm/user.h:5,
> from include/linux/user.h:1,
> from include/linux/kernel.h:16,
> from include/linux/spinlock.h:54,
> from include/linux/capability.h:45,
> from include/linux/sched.h:7,
> from arch/alpha/kernel/asm-offsets.c:9:
> include/linux/ptrace.h: In function `ptrace_link':
> include/linux/ptrace.h:100: error: dereferencing pointer to incomplete type
> include/linux/ptrace.h: In function `ptrace_unlink':
> include/linux/ptrace.h:105: error: dereferencing pointer to incomplete type
> make[1]: *** [arch/alpha/kernel/asm-offsets.s] Error 1

This is caused by the inclusion of user.h in kernel.h added by
dump_thread-cleanup.patch.

Fix:

--- 25-alpha/include/linux/kernel.h~dump_thread-cleanup-fix 2006-01-07 15:46:50.000000000 -0800
+++ 25-alpha-akpm/include/linux/kernel.h 2006-01-07 15:47:20.000000000 -0800
@@ -13,7 +13,6 @@
#include <linux/types.h>
#include <linux/compiler.h>
#include <linux/bitops.h>
-#include <linux/user.h>
#include <asm/byteorder.h>
#include <asm/bug.h>

@@ -48,6 +47,8 @@ extern int console_printk[];
#define default_console_loglevel (console_printk[3])

struct completion;
+struct pt_regs;
+struct user;

/**
* might_sleep - annotation for functions that can sleep
@@ -124,7 +125,6 @@ extern int __kernel_text_address(unsigne
extern int kernel_text_address(unsigned long addr);
extern int session_of_pgrp(int pgrp);

-struct pt_regs;
extern void dump_thread(struct pt_regs *regs, struct user *dump);

#ifdef CONFIG_PRINTK
_
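
The fix works because a prototype that only passes pointers needs
nothing more than the struct names; the full definitions are required
only where the members are actually touched. A stand-alone sketch of
that idea, with toy struct bodies and a toy dump_thread() body that are
purely illustrative and not the kernel's:

```c
/* "Header" part: forward declarations are enough for a prototype
 * that takes pointers, so no heavyweight #include is needed here. */
struct pt_regs;
struct user;

void dump_thread(struct pt_regs *regs, struct user *dump);

/* "Implementation" part: the complete types are visible only where
 * members are dereferenced. Field names below are hypothetical. */
struct pt_regs {
	unsigned long eip;
};

struct user {
	unsigned long start_code;
};

void dump_thread(struct pt_regs *regs, struct user *dump)
{
	/* Toy body: copy one value across to show member access works. */
	dump->start_code = regs->eip;
}
```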

2006-01-08 00:29:04

by Alexey Dobriyan

Subject: [PATCH -mm] fixup *at syscalls additions (alpha, sparc64)

Signed-off-by: Alexey Dobriyan <[email protected]>
---

Apply after dump_thread-cleanup.patch fixup.

--- linux-2.6.15-mm2/arch/alpha/kernel/osf_sys.c
+++ linux-1/arch/alpha/kernel/osf_sys.c
@@ -960,7 +960,7 @@ osf_utimes(char __user *filename, struct
return -EFAULT;
}

- return do_utimes(filename, tvs ? ktvs : NULL);
+ return do_utimes(AT_FDCWD, filename, tvs ? ktvs : NULL);
}

#define MAX_SELECT_SECONDS \
--- linux-2.6.15-mm2/arch/sparc64/kernel/sys_sparc32.c
+++ linux-1/arch/sparc64/kernel/sys_sparc32.c
@@ -820,7 +820,7 @@ asmlinkage long sys32_utimes(char __user
return -EFAULT;
}

- return do_utimes(filename, (tvs ? &ktvs[0] : NULL));
+ return do_utimes(AT_FDCWD, filename, (tvs ? &ktvs[0] : NULL));
}

/* These are here just in case some old sparc32 binary calls it. */

2006-01-08 00:37:20

by Alexey Dobriyan

Subject: [PATCH -mm] Fixup arch/alpha/mm/init.c compilation

CC arch/alpha/mm/init.o
In file included from include/asm/tlb.h:10,
from arch/alpha/mm/init.c:32:
include/asm-generic/tlb.h: In function `tlb_flush_mmu':
include/asm-generic/tlb.h:77: warning: implicit declaration of function `release_pages'
include/asm-generic/tlb.h: In function `tlb_remove_page':
include/asm-generic/tlb.h:106: warning: implicit declaration of function `page_cache_release'

Signed-off-by: Alexey Dobriyan <[email protected]>
---

arch/alpha/mm/init.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/alpha/mm/init.c
+++ b/arch/alpha/mm/init.c
@@ -7,6 +7,7 @@
/* 2.3.x zone allocator, 1999 Andrea Arcangeli <[email protected]> */

#include <linux/config.h>
+#include <linux/pagemap.h>
#include <linux/signal.h>
#include <linux/sched.h>
#include <linux/kernel.h>

2006-01-08 00:40:20

by Alexander Gran

Subject: Re: 2.6.15-mm2

On Saturday, 7 January 2006 14:22, you wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/
>
> This should be somewhat less buggy than 2.6.15-mm1.

Yep. I can boot and use X. You're getting this mail from 2.6.15-mm2.
However, I still get these:
EDAC PCI- Detected Parity Error on 0000:00:1e.0
And the system is quite sluggish.
Dunno why, perhaps because of EDAC.

regards
Alex

--
Encrypted Mails welcome.
PGP-Key at http://zodiac.dnsalias.org/misc/pgpkey.asc | Key-ID: 0x6D7DD291



2006-01-08 07:47:45

by Chuck Ebbert

Subject: Re: Badness in __mutex_unlock_slowpath

In-Reply-To: <200601071551.20344.ajwade@cpe001346162bf9-cm0011ae8cd564.cpe.net.cable.rogers.com>

On Sat, 7 Jan 2006 at 15:51:19 -0500 Andrew James Wade wrote:


> I got this when "amaroK" started playing:
>
> Badness in __mutex_unlock_slowpath at kernel/mutex.c:214
> [<c03538e8>] __mutex_unlock_slowpath+0x56/0x1a2
> [<c0302f08>] snd_pcm_oss_write+0x0/0x1e0
> [<c0302f3c>] snd_pcm_oss_write+0x34/0x1e0
> [<c0302f08>] snd_pcm_oss_write+0x0/0x1e0
> [<c0148221>] vfs_write+0x83/0x122
> [<c0148a36>] sys_write+0x3c/0x63
> [<c0102ba3>] sysenter_past_esp+0x54/0x75


The thread doing the unlock does not own the mutex.

Same exact check is made a few lines later in debug_mutex_unlock().

And debugging gets turned off after the first debug message prints,
so there could be other problems that are not reported.

--
Chuck
Currently reading: _Thud!_ by Terry Pratchett
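
To make the ownership check concrete, here is a toy single-threaded
model of what the debug code complains about. It is not the kernel's
mutex implementation; the struct and function names are invented, and
it only tracks who holds the lock without providing any real mutual
exclusion. A .write method entered without the lock held, which unlocks
first and locks afterwards, trips the owner check on the very first
call:

```c
#include <errno.h>

/* Toy owner-tracking lock: records whether it is held and by whom. */
struct toy_mutex {
	int locked;
	int owner;  /* thread id of the holder, valid only if locked */
};

static int toy_lock(struct toy_mutex *m, int tid)
{
	if (m->locked)
		return -EBUSY;  /* a real mutex would sleep here */
	m->locked = 1;
	m->owner = tid;
	return 0;
}

static int toy_unlock(struct toy_mutex *m, int tid)
{
	if (!m->locked || m->owner != tid)
		return -EPERM;  /* the "Badness" case: caller is not the owner */
	m->locked = 0;
	return 0;
}

/* Shape of the questioned code path: unlock, do the work, lock again.
 * Returns how many of the two lock operations were refused. */
static int write_unlock_first(struct toy_mutex *m, int tid)
{
	int violations = 0;

	if (toy_unlock(m, tid) != 0)
		violations++;
	/* ... the actual write would happen here ... */
	if (toy_lock(m, tid) != 0)
		violations++;
	return violations;
}
```

Starting from an unlocked toy_mutex, write_unlock_first() reports one
violation (the unlock) and returns with the lock taken, which is the
opposite of what the caller had on entry.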

2006-01-08 08:16:43

by Brown, Len

Subject: RE: 2.6.15-mm2


>2) acpi-cpufreq does not load either, returns ENODEV too. It's probably
>git-acpi. I tried to revert it but there are lots of other patches
>depending on it, so I finally gave up.

Brice,
Can you try the converse?
Apply the acpi patch (which is included in -mm)
without the rest of the mm tree to see if that broke acpi-cpufreq?:

http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.15/acpi-test-20051216-2.6.15.diff.bz2

thanks,
-Len

2006-01-08 08:20:16

by Brown, Len

Subject: RE: 2.6.15-mm2


>> 2. Notice above how the sky2 driver is being bailed out:
>>
>> ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
>> sky2 Cannot find PowerManagement capability, aborting.
>> sky2: probe of 0000:04:00.0 failed with error -5
>>
>> ...so I'm not sure if it's the driver or something else
>> like ACPI that is the root cause.
>
>Could be acpi, yes.

Any difference if you boot with "acpi=off" or "pci=noacpi"?
If that fixes it, then ACPI is somehow involved in the problem.
If it doesn't fix it, then ACPI is not involved.

thanks,
-Len

2006-01-08 08:53:34

by Ingo Molnar

Subject: Re: Badness in __mutex_unlock_slowpath


* Arjan van de Ven <[email protected]> wrote:

> this looks like a really evil alsa bug:
>
> (pre mutex code below)

> up(&file->f_dentry->d_inode->i_sem);
> result = snd_pcm_oss_write1(substream, buf, count);
> down(&file->f_dentry->d_inode->i_sem);

> this is a .write method of a driver, which doesn't run with i_sem held
> at all. Best guess I have is that this code has up() and down()
> confused and switched...

well snd_pcm_oss_read1() is not using the mutex at all - nor any other
functions here. So the patch below removes the i_mutex use. _If_ some
synchronization is needed it would be needed in the read1 case too: it
is destructive to a sound stream when it is 'read' and when it is
'written' just as much.

the bug could cause inode corruption on the VFS level: one thread
unlocks an inode it doesn't own - this could surprise another thread
holding that mutex and could allow a third thread to lock it and thus
two threads would be in a critical section - bad.

Ingo

--
remove bogus i_mutex use from sound/core/oss/pcm_oss.c.

Signed-off-by: Ingo Molnar <[email protected]>

----

sound/core/oss/pcm_oss.c | 2 --
1 files changed, 2 deletions(-)

Index: linux/sound/core/oss/pcm_oss.c
===================================================================
--- linux.orig/sound/core/oss/pcm_oss.c
+++ linux/sound/core/oss/pcm_oss.c
@@ -2135,9 +2135,7 @@ static ssize_t snd_pcm_oss_write(struct
substream = pcm_oss_file->streams[SNDRV_PCM_STREAM_PLAYBACK];
if (substream == NULL)
return -ENXIO;
- mutex_unlock(&file->f_dentry->d_inode->i_mutex);
result = snd_pcm_oss_write1(substream, buf, count);
- mutex_lock(&file->f_dentry->d_inode->i_mutex);
#ifdef OSS_DEBUG
printk("pcm_oss: write %li bytes (wrote %li bytes)\n", (long)count, (long)result);
#endif

2006-01-08 09:40:18

by Reuben Farrelly

Subject: Re: 2.6.15-mm2

Hi,

On 8/01/2006 9:19 p.m., Brown, Len wrote:
>
>>> 2. Notice above how the sky2 driver is being bailed out:
>>>
>>> ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
>>> sky2 Cannot find PowerManagement capability, aborting.
>>> sky2: probe of 0000:04:00.0 failed with error -5
>>>
>>> ...so I'm not sure if it's the driver or something else
>>> like ACPI that is the root cause.
>> Could be acpi, yes.
>
> Any difference if you boot with "acpi=off" or "pci=noacpi"?
> If that fixes it, then ACPI is somehow involved in the problem.
> If it doesn't fix it, then ACPI is not involved.

Big difference, but probably not the sort of difference you were hoping for ;)


kernel /vmlinuz-2.6.15-mm2 ro root=/dev/md0 panic=60 console=ttyS0,57600 single
acpi=off
[Linux-bzImage, setup=0x1400, size=0x1842ed]

Linux version 2.6.15-mm2 ([email protected]) (gcc version 4.1.0 20051222
(Red Hat 4.1.0-0.12)) #1 SMP Sun Jan 8 11:50:13 NZDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fe2f800 (usable)
BIOS-e820: 000000003fe2f800 - 000000003fe3f8e3 (ACPI NVS)
BIOS-e820: 000000003ff2f800 - 000000003ff30000 (ACPI NVS)
BIOS-e820: 000000003ff30000 - 000000003ff40000 (ACPI data)
BIOS-e820: 000000003ff40000 - 000000003fff0000 (ACPI NVS)
BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fed13000 - 00000000fed1a000 (reserved)
BIOS-e820: 00000000fed1c000 - 00000000feda0000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: Product ID: Grantsdale-G APIC at: 0xFEE00000
Processor #0 15:3 APIC version 20
I/O APIC #2 Version 32 at 0xFEC00000.
Enabling APIC mode: Flat. Using 1 I/O APICs
Processors: 1
Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
Detected 2800.280 MHz processor.
Built 1 zonelists
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Kernel command line: ro root=/dev/md0 panic=60 console=ttyS0,57600 single acpi=off
CPU 0 irqstacks, hard=c0405000 soft=c0403000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1033572k/1046716k available (2142k kernel code, 12480k reserved, 715k
data, 200k init, 129212k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5607.11 BogoMIPS (lpj=11214232)
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Total of 1 processors activated (5607.11 BogoMIPS).
ExtINT not setup in hardware but reported by MP table
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
Brought up 1 CPUs
migration_cost=0
NET: Registered protocol family 16
PCI: Using configuration type 1
ACPI: Subsystem revision 20051216
ACPI: Interpreter disabled.
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
PCI: Using IRQ router PIIX/ICH [8086/2640] at 0000:00:1f.0
PCI->APIC IRQ transform: 0000:00:1d.0[A] -> IRQ 209
PCI->APIC IRQ transform: 0000:00:1d.1[B] -> IRQ 193
PCI->APIC IRQ transform: 0000:00:1d.2[C] -> IRQ 185
PCI->APIC IRQ transform: 0000:00:1d.3[D] -> IRQ 169
PCI->APIC IRQ transform: 0000:00:1d.7[A] -> IRQ 209
PCI->APIC IRQ transform: 0000:00:1f.1[A] -> IRQ 185
PCI->APIC IRQ transform: 0000:00:1f.2[B] -> IRQ 193
PCI->APIC IRQ transform: 0000:00:1f.3[B] -> IRQ 193
PCI->APIC IRQ transform: 0000:04:00.0[A] -> IRQ 177
PCI->APIC IRQ transform: 0000:06:00.0[A] -> IRQ 201
PCI->APIC IRQ transform: 0000:06:02.0[A] -> IRQ 185
PCI: Bridge: 0000:00:01.0
IO window: disabled.
MEM window: ffa00000-ffafffff
PREFETCH window: fdf00000-fdffffff
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: ff600000-ff6fffff
PREFETCH window: fdb00000-fdbfffff
PCI: Bridge: 0000:00:1c.1
IO window: a000-afff
MEM window: ff700000-ff7fffff
PREFETCH window: fdc00000-fdcfffff
PCI: Bridge: 0000:00:1c.2
IO window: disabled.
MEM window: ff800000-ff8fffff
PREFETCH window: fdd00000-fddfffff
PCI: Bridge: 0000:00:1c.3
IO window: disabled.
MEM window: ff900000-ff9fffff
PREFETCH window: fde00000-fdefffff
PCI: Bridge: 0000:00:1e.0
IO window: b000-bfff
MEM window: ff500000-ff5fffff
PREFETCH window: fe000000-fe7fffff
PCI: No IRQ known for interrupt pin A of device 0000:00:01.0. Probably buggy MP
table.
PCI: No IRQ known for interrupt pin A of device 0000:00:1c.0. Probably buggy MP
table.
PCI: No IRQ known for interrupt pin B of device 0000:00:1c.1. Probably buggy MP
table.
PCI: No IRQ known for interrupt pin C of device 0000:00:1c.2. Probably buggy MP
table.
PCI: No IRQ known for interrupt pin D of device 0000:00:1c.3. Probably buggy MP
table.
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered<6>Time: tsc clocksource has been installed.

PCI: No IRQ known for interrupt pin A of device 0000:00:01.0. Probably buggy MP
table.
pcie_portdrv_probe->Dev[2585:8086] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
PCI: No IRQ known for interrupt pin A of device 0000:00:1c.0. Probably buggy MP
table.
pcie_portdrv_probe->Dev[2660:8086] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
PCI: No IRQ known for interrupt pin B of device 0000:00:1c.1. Probably buggy MP
table.
pcie_portdrv_probe->Dev[2662:8086] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
PCI: No IRQ known for interrupt pin C of device 0000:00:1c.2. Probably buggy MP
table.
pcie_portdrv_probe->Dev[2664:8086] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
PCI: No IRQ known for interrupt pin D of device 0000:00:1c.3. Probably buggy MP
table.
pcie_portdrv_probe->Dev[2666:8086] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Real Time Clock Driver v1.12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
?serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
sky2 v0.11 addr 0xff720000 irq 177 Yukon-EC (0xb6) rev 1
sky2 eth0: addr 00:11:11:43:05:2f
sky2 0000:04:00.0: pci express error (0x0)
sky2 0000:04:00.0: pci express error (0x0)
sky2 0000:04:00.0: pci express error (0x0)
sky2 0000:04:00.0: pci express error (0x0)
sky2 0000:04:00.0: pci express error (0x0)
sky2 0000:04:00.0: pci express error (0x0)

<last few lines repeat over and over neverending>

reuben

2006-01-08 12:14:37

by Alexey Dobriyan

Subject: Re: 2.6.15-mm2: alpha broken

On Sat, Jan 07, 2006 at 03:48:42PM -0800, Andrew Morton wrote:
> Alexey Dobriyan <[email protected]> wrote:
> > from arch/alpha/kernel/asm-offsets.c:9:
> > include/linux/ptrace.h: In function `ptrace_link':
> > include/linux/ptrace.h:100: error: dereferencing pointer to incomplete type
> > include/linux/ptrace.h: In function `ptrace_unlink':
> > include/linux/ptrace.h:105: error: dereferencing pointer to incomplete type
> > make[1]: *** [arch/alpha/kernel/asm-offsets.s] Error 1
>
> This is caused by the inclusion of user.h in kernel.h added by
> dump_thread-cleanup.patch.
>
> Fix:
>
> --- 25-alpha/include/linux/kernel.h~dump_thread-cleanup-fix 2006-01-07 15:46:50.000000000 -0800
> +++ 25-alpha-akpm/include/linux/kernel.h 2006-01-07 15:47:20.000000000 -0800
> @@ -13,7 +13,6 @@
> #include <linux/types.h>
> #include <linux/compiler.h>
> #include <linux/bitops.h>
> -#include <linux/user.h>
> #include <asm/byteorder.h>
> #include <asm/bug.h>
>
> @@ -48,6 +47,8 @@ extern int console_printk[];
> #define default_console_loglevel (console_printk[3])
>
> struct completion;
> +struct pt_regs;
> +struct user;
>
> /**
> * might_sleep - annotation for functions that can sleep
> @@ -124,7 +125,6 @@ extern int __kernel_text_address(unsigne
> extern int kernel_text_address(unsigned long addr);
> extern int session_of_pgrp(int pgrp);
>
> -struct pt_regs;
> extern void dump_thread(struct pt_regs *regs, struct user *dump);
>
> #ifdef CONFIG_PRINTK
> _

Yum. It also removes a truckload of warnings from the parisc build:

include/linux/kernel.h:128: warning: "struct user" declared inside parameter list
include/linux/kernel.h:128: warning: its scope is only this definition or declaration, which is probably not what you want
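
The warning is easier to see in isolation. What follows is only a minimal
standalone sketch, not the real kernel headers: the struct bodies and the
body of dump_thread() are invented for illustration. It shows why two
file-scope forward declarations are enough for the prototype, so kernel.h
does not need to pull in user.h at all:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Forward declarations at file scope. The types stay incomplete here,
 * but every later mention of them refers to these same two types.
 */
struct pt_regs;
struct user;

/*
 * Without the declarations above, GCC treats "struct user" as a new type
 * whose scope is this prototype only, which is what the warning says.
 */
void dump_thread(struct pt_regs *regs, struct user *dump);

/* Hypothetical bodies; in the kernel these come from per-arch headers. */
struct pt_regs { unsigned long pc; };
struct user { unsigned long start_code; };

void dump_thread(struct pt_regs *regs, struct user *dump)
{
	assert(regs != NULL && dump != NULL);
	dump->start_code = regs->pc;	/* placeholder work, for the sketch only */
}
```

Only code that dereferences the pointers needs the full definitions, so
headers that merely declare dump_thread() stay independent of user.h.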

2006-01-08 12:25:14

by Andrew Morton

Subject: Re: 2.6.15-mm2

Brice Goglin <[email protected]> wrote:
>
> Andrew Morton wrote:
>
> >Brice Goglin <[email protected]> wrote:
> >
> >
> >>2) acpi-cpufreq does not load either, returns ENODEV too. It's probably
> >> git-acpi. I tried to revert it but there are lots of other patches
> >> depending on it, so I finally gave up.
> >>
> >>
> >
> >OK, let me try to reproduce this. acpi and cpufreq are fully merged up, so
> >this bug may well be in mainline now.
> >
> >
> >
> >> 3) wpa_supplicant does not find my WPA network anymore (while iwlist
> >> scanning sees). I didn't see anything relevant in dmesg. My driver is
> >> ipw2200.
> >>
> >>
> >
> >It's things like this which make me consider a career in carpentry.
> >
> >I assume 2.6.15 works OK?
> >
> >
>
> 2.6.15 and 2.6.15-git3 both don't show any of these issues. Did acpi and
> cpufreq get merged after -git3 ?
>

Well whatever bug it is, it's in Linus's tree now. Happens for me too.

I traced the failure down as far as acpi_processor_get_performance_info(),
where it's failing here:

status = acpi_get_handle(pr->handle, "_PCT", &handle);
if (ACPI_FAILURE(status)) {
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"ACPI-based processor performance control unavailable\n"));
return_VALUE(-ENODEV);
}


2006-01-08 12:29:10

by Andrew Morton

Subject: Re: 2.6.15-mm2

Brice Goglin <[email protected]> wrote:
>
> >> 3) wpa_supplicant does not find my WPA network anymore (while iwlist
> >> scanning sees). I didn't see anything relevant in dmesg. My driver is
> >> ipw2200.

And this one I don't have a clue about. Can you test the next git
snapshot, or otherwise work out a bit more about it?

2006-01-08 14:14:10

by Brice Goglin

Subject: Re: 2.6.15-mm2

Andrew Morton wrote:

>Brice Goglin <[email protected]> wrote:
>
>
>> >> 3) wpa_supplicant does not find my WPA network anymore (while iwlist
>> >> scanning sees). I didn't see anything relevant in dmesg. My driver is
>> >> ipw2200.
>>
>>
>
>And this one I don't have a clue about. Can you test the next git
>snapshot, or otherwise work out a bit more about it?
>
>
I just worked it out a bit more and finally got it to work... Looks like
wpa_supplicant needs a less flexible configuration now that some of my
neighbors got a wireless access point for christmas. Let's say this bug
does not exist. I'll try to check whether it's really worse on -mm than
on vanilla.
Sorry about that.

Brice

2006-01-08 14:25:56

by Brice Goglin

Subject: Re: 2.6.15-mm2

Brown, Len wrote:

>
>
>
>>2) acpi-cpufreq does not load either, returns ENODEV too. It's probably
>>git-acpi. I tried to revert it but there are lots of other patches
>>depending on it, so I finally gave up.
>>
>>
>
>Brice,
>Can you try the converse?
>Apply the acpi patch (which is included in -mm)
>without the rest of the mm tree to see if that broke acpi-cpufreq?:
>
>http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.15/acpi-test-20051216-2.6.15.diff.bz2
>
>thanks,
>-Len
>
>

Len,

This patch applied on top of 2.6.15 breaks acpi-cpufreq in the same way.

Brice

2006-01-08 14:38:59

by Brice Goglin

Subject: Re: 2.6.15-mm2

Andrew Morton wrote:

>Well whatever bug it is, it's in Linus's tree now. Happens for me too.
>
>
>
Happens for you in -git4 ? Does not here...

Brice



>I traced the failure down as far as acpi_processor_get_performance_info(),
>where it's failing here:
>
> status = acpi_get_handle(pr->handle, "_PCT", &handle);
> if (ACPI_FAILURE(status)) {
> ACPI_DEBUG_PRINT((ACPI_DB_INFO,
> "ACPI-based processor performance control unavailable\n"));
> return_VALUE(-ENODEV);
> }
>
>
>
>

2006-01-08 17:58:33

by Brown, Len

Subject: RE: 2.6.15-mm2


>>>2) acpi-cpufreq does not load either, returns ENODEV too.
>>>It's probably git-acpi. I tried to revert it but there are
>>>lots of other patches depending on it, so I finally gave up.
>>>
>>>
>>
>>Brice,
>>Can you try the converse?
>>Apply the acpi patch (which is included in -mm)
>>without the rest of the mm tree to see if that broke acpi-cpufreq?:
>>
>>http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.15/acpi-test-20051216-2.6.15.diff.bz2
>>
>>thanks,
>>-Len
>>
>>
>
>Len,
>
>This patch applied on top of 2.6.15 breaks acpi-cpufreq in the
>same way.

Ah good!
Can you test the _PDC patch here all by itself?

http://bugzilla.kernel.org/show_bug.cgi?id=5483#c17

If this patch is what causes the failure, please
build with CONFIG_ACPI_DEBUG=y and attach the dmesg
from the failure, as well as the output from acpidump,
available in the latest pmtools here:

http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils

thanks,
-Len

2006-01-08 18:09:14

by Brown, Len

Subject: RE: 2.6.15-mm2

>> Any difference if you boot with "acpi=off" or "pci=noacpi"?
>> If that fixes it, then ACPI is somehow involved in the problem.
>> If it doesn't fix it, then ACPI is not involved.

>Big difference, but probably not the sort of difference you
>were hoping for ;)

>PCI: No IRQ known for interrupt pin C of device 0000:00:1c.2.
>Probably buggy MP table.

Yeah, that's no help. Sorry, debugging the legacy MPS
code is where I draw the line:-) I guess if you want to compare
with and without ACPI you have to go all the way down to
UP/PIC mode, (maxcpus=1 noapic, with and without acpi=off)
but unless that fails with acpi and works without, we may
not be able to tell much about the failure from it.

thanks
-Len

2006-01-08 18:18:43

by Brown, Len

Subject: RE: 2.6.15-mm2


>> 2.6.15 and 2.6.15-git3 both don't show any of these issues.
>> Did acpi and cpufreq get merged after -git3 ?
>>
>
>Well whatever bug it is, it's in Linus's tree now. Happens for me too.
>
>I traced the failure down as far as acpi_processor_get_performance_info(),
>where it's failing here:
>
> status = acpi_get_handle(pr->handle, "_PCT", &handle);
> if (ACPI_FAILURE(status)) {
> ACPI_DEBUG_PRINT((ACPI_DB_INFO,
> "ACPI-based processor performance control unavailable\n"));
> return_VALUE(-ENODEV);
> }

No, acpi was not merged after 2.6.15 -- see if cpufreq changed.

-Len

2006-01-08 18:56:57

by Andrew Morton

Subject: Re: 2.6.15-mm2

Brice Goglin <[email protected]> wrote:
>
> Andrew Morton wrote:
>
> >Well whatever bug it is, it's in Linus's tree now. Happens for me too.
> >
> >
> >
> Happens for you in -git4 ?

Yup.

> Does not here...

grr.

2006-01-09 17:47:04

by Jesper Juhl

Subject: Re: 2.6.15-mm2

On 1/7/06, Jesper Juhl <[email protected]> wrote:
> On 1/7/06, Andrew Morton <[email protected]> wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/
> >
> > This should be somewhat less buggy than 2.6.15-mm1.
> >
> For some maybe. For me it's just as broken as 2.6.15-mm1 :-(
>
> I'll turn on all debug switches and try and collect some crash dumps.
> If there's anything specific you want me to try, let me know.
>

Ok, got some info.

Following Dave's theory that, given that I hit bad_page() sometimes
during the crash that might actually be the first thing I hit before
hitting BUG(), I added an infinite loop at the end of bad_page() so it
would stay on screen for me to write down - and bad_page() turns out
to actually *be* the very first thing I hit.

Here's what bad_page printed for me :

Bad page state in process 'kded'
[<c0103e77>] dump_stack+0x17/0x20
[<c0148999>] bad_page+0x69/0x160
[<c0148e92>] __free_pages_ok+0xa2/0x120
[<c0149c7f>] __free_pages+0x2f/0x60
[<c02acb63>] sg_page_free+0x23/0x30
[<c02abdb3>] sg_remove_scat+0x63/0xe0
[<c02ac80d>] __sg_remove_sfp+0x4d/0xc0
[<c02ac927>] sg_remove_sfp+0xa7/0x120
[<c02a8b39>] sg_release+0x49/0xc0
[<c0166827>] __fput+0x167/0x1b0
[<c01666ab>] fput+0x3b/0x50
[<c0164efc>] filp_close+0x3c/0x80
[<c0164fa9>] sys_close+0x69/0x90
[<c0103009>] syscall_call+0x7/0xb
Hexdump:
000: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00
010: 50 81 e8 c1 50 81 e8 c1 ff ff ff ff 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-09 17:57:56

by Dave Jones

Subject: Re: 2.6.15-mm2

On Mon, Jan 09, 2006 at 06:47:01PM +0100, Jesper Juhl wrote:

> Here's what bad_page printed for me :
>
> Bad page state in process 'kded'
> [<c0103e77>] dump_stack+0x17/0x20
> [<c0148999>] bad_page+0x69/0x160

Odd, there should be more state between the 'Bad page'
and the backtrace.

printk(KERN_EMERG "Bad page state in process '%s'\n"
"page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
"Trying to fix it up, but a reboot is needed\n"

Did you aggressively trim that, or did it for some
reason not get printed ?

Dave

2006-01-09 18:01:48

by Jesper Juhl

Subject: Re: 2.6.15-mm2

On 1/9/06, Dave Jones <[email protected]> wrote:
> On Mon, Jan 09, 2006 at 06:47:01PM +0100, Jesper Juhl wrote:
>
> > Here's what bad_page printed for me :
> >
> > Bad page state in process 'kded'
> > [<c0103e77>] dump_stack+0x17/0x20
> > [<c0148999>] bad_page+0x69/0x160
>
> Odd, there should be more state between the 'Bad page'
> and the backtrace.
>
> printk(KERN_EMERG "Bad page state in process '%s'\n"
> "page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
> "Trying to fix it up, but a reboot is needed\n"
>
> Did you aggressively trim that, or did it for some
> reason not get printed ?
>

I did not trim that.

All I did was add

printk(KERN_EMERG "we hit bad page, looping forever\n");
while (1) {
mdelay(1000);
}

to the end of bad_page()

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-09 18:24:12

by Hugh Dickins

Subject: Re: 2.6.15-mm2

On Mon, 9 Jan 2006, Jesper Juhl wrote:
> On 1/9/06, Dave Jones <[email protected]> wrote:
> > On Mon, Jan 09, 2006 at 06:47:01PM +0100, Jesper Juhl wrote:
> >
> > > Here's what bad_page printed for me :
> > >
> > > Bad page state in process 'kded'
> > > [<c0103e77>] dump_stack+0x17/0x20
> > > [<c0148999>] bad_page+0x69/0x160
> >
> > Odd, there should be more state between the 'Bad page'
> > and the backtrace.
> >
> > printk(KERN_EMERG "Bad page state in process '%s'\n"
> > "page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
> > "Trying to fix it up, but a reboot is needed\n"
> >
> > Did you aggressively trim that, or did it for some
> > reason not get printed ?
> >
>
> I did not trim that.
>
> All I did was add
>
> printk(KERN_EMERG "we hit bad page, looping forever\n");
> while (1) {
> mdelay(1000);
> }
>
> to the end of bad_page()

I'm afraid someone has recently "tidied up" bad_page, and missed out
the most interesting KERN_EMERG of all. No promises that this will
actually help us more than the backtrace you've sent, but please give
it another go with patch below applied. Andrew, please pass along...


Restore KERN_EMERG to each line printed by bad_page.

Signed-off-by: Hugh Dickins <[email protected]>

--- 2.6.15-mm2/mm/page_alloc.c 2006-01-07 14:05:58.000000000 +0000
+++ linux/mm/page_alloc.c 2006-01-09 18:13:00.000000000 +0000
@@ -137,9 +137,9 @@ static inline int bad_range(struct zone
static void bad_page(struct page *page)
{
printk(KERN_EMERG "Bad page state in process '%s'\n"
- "page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
- "Trying to fix it up, but a reboot is needed\n"
- "Backtrace:\n",
+ KERN_EMERG "page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
+ KERN_EMERG "Trying to fix it up, but a reboot is needed\n"
+ KERN_EMERG "Backtrace:\n",
current->comm, page, (int)(2*sizeof(unsigned long)),
(unsigned long)page->flags, page->mapping,
page_mapcount(page), page_count(page));
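
Why the prefix matters on every line: adjacent string literals are
concatenated by the compiler, so a single leading KERN_EMERG only tags the
first line of the message. Lines without a prefix are logged at the default
level, which the console may well filter out. A minimal userspace sketch of
that effect (lines_without_level() is a hypothetical helper written for
this illustration, not a kernel function):

```c
#include <assert.h>
#include <string.h>

#define KERN_EMERG "<0>"	/* level-0 prefix, as defined in kernels of that era */

/* Count the lines in msg that do not start with a "<N>" level prefix. */
static int lines_without_level(const char *msg)
{
	const char *line = msg;
	int missing = 0;

	assert(msg != NULL);
	while (*line) {
		if (!(line[0] == '<' && line[1] >= '0' && line[1] <= '7' &&
		      line[2] == '>'))
			missing++;
		const char *nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;	/* step past the newline to the next line */
	}
	return missing;
}

/* Format string as it stood before the patch: one prefix for four lines. */
static const char fmt_before[] =
	KERN_EMERG "Bad page state in process '%s'\n"
	"page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
	"Trying to fix it up, but a reboot is needed\n"
	"Backtrace:\n";

/* Format string after the patch: every line carries its own prefix. */
static const char fmt_after[] =
	KERN_EMERG "Bad page state in process '%s'\n"
	KERN_EMERG "page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
	KERN_EMERG "Trying to fix it up, but a reboot is needed\n"
	KERN_EMERG "Backtrace:\n";
```

With these definitions, lines_without_level(fmt_before) reports three
untagged lines and lines_without_level(fmt_after) reports none.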

2006-01-09 18:48:57

by Jesper Juhl

Subject: Re: 2.6.15-mm2

On 1/9/06, Hugh Dickins <[email protected]> wrote:
> On Mon, 9 Jan 2006, Jesper Juhl wrote:
> > On 1/9/06, Dave Jones <[email protected]> wrote:
> > > On Mon, Jan 09, 2006 at 06:47:01PM +0100, Jesper Juhl wrote:
> > >
> > > > Here's what bad_page printed for me :
> > > >
> > > > Bad page state in process 'kded'
> > > > [<c0103e77>] dump_stack+0x17/0x20
> > > > [<c0148999>] bad_page+0x69/0x160
> > >
> > > Odd, there should be more state between the 'Bad page'
> > > and the backtrace.
> > >
> > > printk(KERN_EMERG "Bad page state in process '%s'\n"
> > > "page:%p flags:0x%0*lx mapping:%p mapcount:%d count:%d\n"
> > > "Trying to fix it up, but a reboot is needed\n"
> > >
> > > Did you aggressively trim that, or did it for some
> > > reason not get printed ?
> > >
> >
> > I did not trim that.
> >
> > All I did was add
> >
> > printk(KERN_EMERG "we hit bad page, looping forever\n");
> > while (1) {
> > mdelay(1000);
> > }
> >
> > to the end of bad_page()
>
> I'm afraid someone has recently "tidied up" bad_page, and missed out
> the most interesting KERN_EMERG of all. No promises that this will
> actually help us more than the backtrace you've sent, but please give
> it another go with patch below applied. Andrew, please pass along...
>
>
> Restore KERN_EMERG to each line printed by bad_page.
>
Ok, with that patch the page, flags, mapping, mapcount & count
information prints again.
I get the exact same backtrace as before though, but a slightly
different hexdump :

Bad page state in process 'kded'
page:c1e75400 flags:0x00000000 mapping:00000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0103e77>] dump_stack+0x17/0x20
[<c0148999>] bad_page+0x69/0x160
[<c0148e92>] __free_pages_ok+0xa2/0x120
[<c0149c7f>] __free_pages+0x2f/0x60
[<c02acb63>] sg_page_free+0x23/0x30
[<c02abdb3>] sg_remove_scat+0x63/0xe0
[<c02ac80d>] __sg_remove_sfp+0x4d/0xc0
[<c02ac927>] sg_remove_sfp+0xa7/0x120
[<c02a8b39>] sg_release+0x49/0xc0
[<c0166827>] __fput+0x167/0x1b0
[<c01666ab>] fput+0x3b/0x50
[<c0164efc>] filp_close+0x3c/0x80
[<c0164fa9>] sys_close+0x69/0x90
[<c0103009>] syscall_call+0x7/0xb
Hexdump:
000: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00
010: d0 53 e7 c1 d0 53 e7 c1 ff ff ff ff 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

One more note: the first time I booted the kernel with this patch X
failed to start. It started and blacked out the screen, at that point
the system hung, but not completely, I could still switch tty with
ctrl+alt+f? but all tty's I switched to were also completely black
with just a cursor in the top left corner. The system never recovered
from this state and I was forced to press the reset button.
The second time it booted up fine and launched kdm as usual to let me
log in and upon logging in it crashed as expected while launching KDE
(at that point I had switched over to tty1 and captured the crash dump
you see above).
So, it seems to me that we may have more than one bug.


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-09 19:16:08

by Hugh Dickins

Subject: Re: 2.6.15-mm2

On Mon, 9 Jan 2006, Jesper Juhl wrote:
> Ok, with that patch the page, flags, mapping, mapcount & count
> information prints again.

Good, thanks.

> I get the exact same backtrace as before though, but a slightly
> different hexdump :

(I find -mm's hexdump addition really irritating. Perhaps it could
be helpful if properly formatted, but not that dump of bytes.)

> Bad page state in process 'kded'
> page:c1e75400 flags:0x00000000 mapping:00000000 mapcount:1 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> [<c0103e77>] dump_stack+0x17/0x20
> [<c0148999>] bad_page+0x69/0x160
> [<c0148e92>] __free_pages_ok+0xa2/0x120
> [<c0149c7f>] __free_pages+0x2f/0x60
> [<c02acb63>] sg_page_free+0x23/0x30
> [<c02abdb3>] sg_remove_scat+0x63/0xe0
....

Having sent you the patch to restore the KERN_EMERGs, I then took a
look at drivers/scsi/sg.c, and it looks as if changes have gone into
2.6.15-git which might make more urgent a fix we knew would be needed
in some cases. Could you try the patch below and let us know if it
fixes your problems? Thanks...


Remove sg_rb_correct4mmap() and its nasty __put_page()s, which are liable
to do quite the wrong thing. Instead allocate pages with __GFP_COMP, then
high-orders should be safe for exposure to userspace by sg_vma_nopage(),
without any further manipulations. Based on original patch by Nick Piggin.

Signed-off-by: Hugh Dickins <[email protected]>

--- 2.6.15-mm2/drivers/scsi/sg.c 2006-01-09 11:36:26.000000000 +0000
+++ linux/drivers/scsi/sg.c 2006-01-09 18:46:17.000000000 +0000
@@ -1140,32 +1140,6 @@ sg_fasync(int fd, struct file *filp, int
return (retval < 0) ? retval : 0;
}

-/* When startFinish==1 increments page counts for pages other than the
- first of scatter gather elements obtained from alloc_pages().
- When startFinish==0 decrements ... */
-static void
-sg_rb_correct4mmap(Sg_scatter_hold * rsv_schp, int startFinish)
-{
- struct scatterlist *sg = rsv_schp->buffer;
- struct page *page;
- int k, m;
-
- SCSI_LOG_TIMEOUT(3, printk("sg_rb_correct4mmap: startFinish=%d, scatg=%d\n",
- startFinish, rsv_schp->k_use_sg));
- /* N.B. correction _not_ applied to base page of each allocation */
- for (k = 0; k < rsv_schp->k_use_sg; ++k, ++sg) {
- for (m = PAGE_SIZE; m < sg->length; m += PAGE_SIZE) {
- page = sg->page;
- if (startFinish)
- get_page(page);
- else {
- if (page_count(page) > 0)
- __put_page(page);
- }
- }
- }
-}
-
static struct page *
sg_vma_nopage(struct vm_area_struct *vma, unsigned long addr, int *type)
{
@@ -1237,10 +1211,7 @@ sg_mmap(struct file *filp, struct vm_are
sa += len;
}

- if (0 == sfp->mmap_called) {
- sg_rb_correct4mmap(rsv_schp, 1); /* do only once per fd lifetime */
- sfp->mmap_called = 1;
- }
+ sfp->mmap_called = 1;
vma->vm_flags |= VM_RESERVED;
vma->vm_private_data = sfp;
vma->vm_ops = &sg_mmap_vm_ops;
@@ -2395,8 +2366,6 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
SCSI_LOG_TIMEOUT(6,
printk("__sg_remove_sfp: bufflen=%d, k_use_sg=%d\n",
(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
- if (sfp->mmap_called)
- sg_rb_correct4mmap(&sfp->reserve, 0); /* undo correction */
sg_remove_scat(&sfp->reserve);
}
sfp->parentdp = NULL;
@@ -2478,9 +2447,9 @@ sg_page_malloc(int rqSz, int lowDma, int
return resp;

if (lowDma)
- page_mask = GFP_ATOMIC | GFP_DMA | __GFP_NOWARN;
+ page_mask = GFP_ATOMIC | GFP_DMA | __GFP_NOWARN | __GFP_COMP;
else
- page_mask = GFP_ATOMIC | __GFP_NOWARN;
+ page_mask = GFP_ATOMIC | __GFP_NOWARN | __GFP_COMP;

for (order = 0, a_size = PAGE_SIZE; a_size < rqSz;
order++, a_size <<= 1) ;
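
For readers unfamiliar with the "order" loop at the end of that hunk: it
rounds the request up to a power-of-two number of pages, and any order
above zero means one allocation spans several contiguous pages, which is
the case __GFP_COMP is meant to make safe. A standalone sketch of the same
arithmetic (SKETCH_PAGE_SIZE and order_for_size() are names made up for
this illustration; the real PAGE_SIZE depends on the architecture):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)	/* assumed page size for the sketch */

/*
 * Return the smallest order such that (SKETCH_PAGE_SIZE << order) >= rq_sz,
 * mirroring the for loop in sg_page_malloc(). If alloc_size is non-NULL it
 * receives the rounded-up size in bytes.
 */
static unsigned int order_for_size(size_t rq_sz, size_t *alloc_size)
{
	unsigned int order = 0;
	size_t a_size = SKETCH_PAGE_SIZE;

	/* Keep the sketch well away from shift overflow. */
	assert(rq_sz <= ((size_t)1 << 30));

	for (; a_size < rq_sz; order++, a_size <<= 1)
		;
	if (alloc_size)
		*alloc_size = a_size;
	return order;
}
```

For example, a 20000-byte request needs order 3, i.e. eight 4096-byte
pages (32768 bytes) handed back as a single high-order allocation.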

2006-01-09 19:21:09

by Hugh Dickins

Subject: Re: 2.6.15-mm2

I mistyped Doug's email address, sorry to Doug and all.


2006-01-09 19:39:25

by Jesper Juhl

Subject: Re: 2.6.15-mm2

On 1/9/06, Hugh Dickins <[email protected]> wrote:
> On Mon, 9 Jan 2006, Jesper Juhl wrote:
> > Ok, with that patch the page, flags, mapping, mapcount & count
> > information prints again.
>
> Good, thanks.
>
> > I get the exact same backtrace as before though, but a slightly
> > different hexdump :
>
> (I find -mm's hexdump addition really irritating. Perhaps it could
> be helpful if properly formatted, but not that dump of bytes.)
>
> > Bad page state in process 'kded'
> > page:c1e75400 flags:0x00000000 mapping:00000000 mapcount:1 count:0
> > Trying to fix it up, but a reboot is needed
> > Backtrace:
> > [<c0103e77>] dump_stack+0x17/0x20
> > [<c0148999>] bad_page+0x69/0x160
> > [<c0148e92>] __free_pages_ok+0xa2/0x120
> > [<c0149c7f>] __free_pages+0x2f/0x60
> > [<c02acb63>] sg_page_free+0x23/0x30
> > [<c02abdb3>] sg_remove_scat+0x63/0xe0
> ....
>
> Having sent you the patch to restore the KERN_EMERGs, I then took a
> look at drivers/scsi/sg.c, and it looks as if changes have gone into
> 2.6.15-git which might make more urgent a fix we knew would be needed
> in some cases. Could you try the patch below and let us know if it
> fixes your problems? Thanks...
>
>
> Remove sg_rb_correct4mmap() and its nasty __put_page()s, which are liable
> to do quite the wrong thing. Instead allocate pages with __GFP_COMP, then
> high-orders should be safe for exposure to userspace by sg_vma_nopage(),
> without any further manipulations. Based on original patch by Nick Piggin.
>

Unfortunately that patch doesn't change a thing (except some
addresses, but that's expected) :-(

Here's the crash I just got with that patch applied :

Bad page state in process 'kded'
page:c1e87d00 flags:0x00000000 mapping:00000000 mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0103e77>] dump_stack+0x17/0x20
[<c0148999>] bad_page+0x69/0x160
[<c0148e92>] __free_pages_ok+0xa2/0x120
[<c0149c7f>] __free_pages+0x2f/0x60
[<c02aca53>] sg_page_free+0x23/0x30
[<c02abcc3>] sg_remove_scat+0x63/0xe0
[<c02ac711>] __sg_remove_sfp+0x41/0xa0
[<c02ac927>] sg_remove_sfp+0xa7/0x120
[<c02a8b39>] sg_release+0x49/0xc0
[<c0166827>] __fput+0x167/0x1b0
[<c01666ab>] fput+0x3b/0x50
[<c0164efc>] filp_close+0x3c/0x80
[<c0164fa9>] sys_close+0x69/0x90
[<c0103009>] syscall_call+0x7/0xb
Hexdump:
000: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00
010: d0 7c e8 c1 d0 7c e8 c1 ff ff ff ff 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
040: 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00
050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-09 20:15:36

by Hugh Dickins

Subject: Re: 2.6.15-mm

On Mon, 9 Jan 2006, Jesper Juhl wrote:
> On 1/9/06, Hugh Dickins <[email protected]> wrote:
> >
> > Remove sg_rb_correct4mmap() and its nasty __put_page()s, which are liable
> > to do quite the wrong thing. Instead allocate pages with __GFP_COMP, then
> > high-orders should be safe for exposure to userspace by sg_vma_nopage(),
> > without any further manipulations. Based on original patch by Nick Piggin.
>
> Unfortunately that patch doesn't change a thing (except some
> addresses, but that's expected) :-(

Okay, thanks for trying. Maybe you need to revert to the 2.6.15
drivers/scsi/sg.c for now (does that work for you in the 2.6.15-mm2
kernel?), or you could first try this little patch on 2.6.15-mm2
(either with or without my earlier patch - which will be wanted,
but not so urgently). I've not attempted to review the changes
in detail, but this change (if no more) looks to be badly needed...
And it's 2.6.15-git needing the fix, not just -mm.


sg_page_malloc: clear the data buffer, not that extent of mem_map.

Signed-off-by: Hugh Dickins <[email protected]>

--- 2.6.15-mm2/drivers/scsi/sg.c 2006-01-07 14:05:49.000000000 +0000
+++ linux/drivers/scsi/sg.c 2006-01-09 20:03:59.000000000 +0000
@@ -2493,7 +2493,7 @@ sg_page_malloc(int rqSz, int lowDma, int
 	}
 	if (resp) {
 		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
-			memset(resp, 0, resSz);
+			memset(page_address(resp), 0, resSz);
 		if (retSzp)
 			*retSzp = resSz;
 	}
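
For anyone following along, here is a minimal userspace model (not kernel code) of what the one-line change does. Since the fix wraps `resp` in page_address(), `resp` is a page descriptor pointer, so the old memset() zeroed descriptors in mem_map instead of the buffer they describe. Every name ending in `_model` is made up for illustration. The one kernel detail assumed is that the map count is stored biased by -1, so a zeroed descriptor would read back as mapcount 1, which would fit the "mapcount:1" in the Bad page state reports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPAGES_MODEL 16      /* descriptors in the toy mem_map */
#define PAGE_BYTES_MODEL 64  /* tiny "page" so the model stays small */

/* Toy page descriptor; the real struct page has many more fields. */
struct page_model {
	unsigned long flags;
	int mapcount_biased;     /* assumption: -1 means "mapped zero times" */
};

static struct page_model mem_map_model[NPAGES_MODEL];
static unsigned char ram_model[NPAGES_MODEL][PAGE_BYTES_MODEL];

/* Put every descriptor in its "unmapped" state and dirty the data. */
static void model_init(void)
{
	for (size_t i = 0; i < NPAGES_MODEL; i++) {
		mem_map_model[i].flags = 0;
		mem_map_model[i].mapcount_biased = -1;
	}
	memset(ram_model, 0xAA, sizeof(ram_model));
}

/* Stand-in for page_address(): descriptor -> the data it describes. */
static void *page_address_model(struct page_model *page)
{
	assert(page >= mem_map_model && page < mem_map_model + NPAGES_MODEL);
	return ram_model[page - mem_map_model];
}

static int page_mapcount_model(const struct page_model *page)
{
	return page->mapcount_biased + 1;
}

/* Before the patch: zero len bytes starting at the descriptor itself. */
static void clear_before_patch(struct page_model *page, size_t len)
{
	memset(page, 0, len);
}

/* After the patch: zero the buffer the descriptor refers to. */
static void clear_after_patch(struct page_model *page, size_t len)
{
	memset(page_address_model(page), 0, len);
}
```

In this model, calling clear_before_patch() on a freshly initialised map leaves the data bytes untouched and makes page_mapcount_model() report 1 for the first descriptor; clear_after_patch() zeroes the data and leaves the descriptors alone.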

2006-01-09 20:30:28

by Jesper Juhl

Subject: Re: 2.6.15-mm

On 1/9/06, Hugh Dickins <[email protected]> wrote:
> On Mon, 9 Jan 2006, Jesper Juhl wrote:
> > On 1/9/06, Hugh Dickins <[email protected]> wrote:
> > >
> > > Remove sg_rb_correct4mmap() and its nasty __put_page()s, which are liable
> > > to do quite the wrong thing. Instead allocate pages with __GFP_COMP, then
> > > high-orders should be safe for exposure to userspace by sg_vma_nopage(),
> > > without any further manipulations. Based on original patch by Nick Piggin.
> >
> > Unfortunately that patch doesn't change a thing (except some
> > addresses, but that's expected) :-(
>
> Okay, thanks for trying. Maybe you need to revert to the 2.6.15
> drivers/scsi/sg.c for now (does that work for you in the 2.6.15-mm2
> kernel?), or you could first try this little patch on 2.6.15-mm2
> (either with or without my earlier patch - which will be wanted,
> but not so urgently). I've not attempted to review the changes
> in detail, but this change (if no more) looks to be badly needed...
> And it's 2.6.15-git needing the fix, not just -mm.
>
>
> sg_page_malloc: clear the data buffer, not that extent of mem_map.
>

Hugh, you're a genius!
I added your small patch on top of your previous one and now
2.6.15-mm2 doesn't crash any more :-)

Thanks a lot.


2.6.15-mm2 still has a few other problems for me though, which I'll
report in a short while in separate threads - a lot easier to
investigate now that the box no longer crashes while running that
kernel :)

Thank you again for your work in fixing this problem.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-09 20:41:27

by Hugh Dickins

Subject: Re: 2.6.15-mm

On Mon, 9 Jan 2006, Jesper Juhl wrote:
> On 1/9/06, Hugh Dickins <[email protected]> wrote:
> >
> > sg_page_malloc: clear the data buffer, not that extent of mem_map.
>
> Hugh, you're a genius!
> I added your small patch on top of your previous one and now
> 2.6.15-mm2 doesn't crash any more :-)

Great, thanks a lot for trying it. I'll rush that one to Linus now
(the __GFP_COMP patch can wait and go through the proper channels).

Hugh

2006-01-09 20:46:39

by Hugh Dickins

Subject: [PATCH] fix Jesper's sg_page_free Bad page states

sg_page_malloc: clear the data buffer, not that extent of mem_map.

Signed-off-by: Hugh Dickins <[email protected]>
---

drivers/scsi/sg.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

--- 2.6.15-git5/drivers/scsi/sg.c 2006-01-09 14:05:49.000000000 +0000
+++ linux/drivers/scsi/sg.c 2006-01-09 20:03:59.000000000 +0000
@@ -2493,7 +2493,7 @@ sg_page_malloc(int rqSz, int lowDma, int
 	}
 	if (resp) {
 		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
-			memset(resp, 0, resSz);
+			memset(page_address(resp), 0, resSz);
 		if (retSzp)
 			*retSzp = resSz;
 	}

2006-01-09 20:48:28

by Mike Christie

Subject: Re: 2.6.15-mm

Hugh Dickins wrote:

>
> --- 2.6.15-mm2/drivers/scsi/sg.c 2006-01-07 14:05:49.000000000 +0000
> +++ linux/drivers/scsi/sg.c 2006-01-09 20:03:59.000000000 +0000
> @@ -2493,7 +2493,7 @@ sg_page_malloc(int rqSz, int lowDma, int
>  	}
>  	if (resp) {
>  		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
> -			memset(resp, 0, resSz);
> +			memset(page_address(resp), 0, resSz);
>  		if (retSzp)
>  			*retSzp = resSz;
>  	}

Oops yeah, that is right. We switched from __get_free_pages to alloc_pages.

Will alloc_pages() always return lowmem pages or can it return highmem
pages? Just wondering because I guess if it can return highmem pages I
need to replace the page_address calls with kmap/kunmap ones, right?

2006-01-09 21:04:11

by Hugh Dickins

Subject: Re: 2.6.15-mm

On Mon, 9 Jan 2006, Mike Christie wrote:
>
> Oops yeah, that is right. We switched from __get_free_pages to alloc_pages.
>
> Will alloc_pages() always return lowmem pages or can it return highmem pages?
> Just wondering because I guess if it can return highmem pages I need to
> replace the page_address calls with kmap/kunmap ones, right?

Good thinking, but the page_address patch is safe for now. You can only
get highmem from alloc_pages if you say __GFP_HIGHMEM to it (perhaps
inside GFP_HIGHUSER), and at present you're not. You probably should,
depending on what the underlying device can handle: at present there's
lowDma choosing GFP_DMA, and that probably should be extended to cover
other possibilities. I'm not familiar with the driver end of these
things, James would give much better advice on how to proceed there:
or perhaps he'll advise that it's best left simply as is (with the
page_address fix) after all.

Hugh
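
The rule described in this reply can be written down as a small decision function. This is only a sketch: the flag values and all `model_` names are invented for illustration and are not the kernel's definitions, and the real allocator's zone selection is more involved than two bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real __GFP_* constants. */
#define MODEL_GFP_DMA     0x1u
#define MODEL_GFP_HIGHMEM 0x2u

enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_ZONE_HIGHMEM };

/* Highest zone the toy allocator may hand out for a given flag word. */
static enum model_zone model_highest_zone(unsigned int gfp)
{
	if (gfp & MODEL_GFP_HIGHMEM)
		return MODEL_ZONE_HIGHMEM;
	if (gfp & MODEL_GFP_DMA)
		return MODEL_ZONE_DMA;
	return MODEL_ZONE_NORMAL;
}

/*
 * page_address() alone is enough only if the page can never be highmem;
 * otherwise the caller would need kmap()/kunmap() around each access.
 */
static bool model_needs_kmap(unsigned int gfp)
{
	return model_highest_zone(gfp) == MODEL_ZONE_HIGHMEM;
}

/* Self-check covering the cases discussed in the thread. */
static void model_self_check(void)
{
	assert(!model_needs_kmap(0));               /* no highmem flag: lowmem only */
	assert(!model_needs_kmap(MODEL_GFP_DMA));   /* the lowDma case */
	assert(model_needs_kmap(MODEL_GFP_HIGHMEM));
}
```

As long as the driver never passes the highmem flag, model_needs_kmap() stays false, which is the sense in which the page_address fix is safe for now.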

2006-01-10 00:30:22

by Alexander Gran

Subject: Re: 2.6.15-mm2

On Sunday, 8 January 2006 02:50, you wrote:
> Can you try removing EDAC from .config?

Just did.

> I doubt if the cause is EDAC really. If you could investigate a bit
> further it'd help. mtrr? Run top? Generate a kernel profile? Is it just
> X being sluggish? (DRM/AGP?) etc.


EDAC errors are gone. System isn't sluggish ;)
However, one new error:
serial8250: too much work for irq3
serial8250: too much work for irq3

regards
Alex

--
Encrypted Mails welcome.
PGP-Key at http://zodiac.dnsalias.org/misc/pgpkey.asc | Key-ID: 0x6D7DD291



2006-01-10 01:23:08

by Andrew Morton

Subject: Re: 2.6.15-mm2

Alexander Gran <[email protected]> wrote:
>
> On Sunday, 8 January 2006 02:50, you wrote:
> > Can you try removing EDAC from .config?
>
> Just did.
>
> > I doubt if the cause is EDAC really. If you could investigate a bit
> > further it'd help. mtrr? Run top? Generate a kernel profile? Is it just
> > X being sluggish? (DRM/AGP?) etc.
>
>
> EDAC errors are gone. System isn't sluggish ;)

You're saying that enabling the EDAC driver made the system sluggish?

Did you look at the `top' output, or generate a kernel profile? That would
really help.

> However, one new error:
> serial8250: too much work for irq3
> serial8250: too much work for irq3

Was the serial port in use at the time? Does it work?

2006-01-10 10:16:07

by Reuben Farrelly

Subject: Re: 2.6.15-mm2

Hi again,

[removed cc Jeff Garzik and Stephen Hemminger, as this is not about sky2 or libata]

On 8/01/2006 10:31 a.m., Andrew Morton wrote:

>> 3. The boot up process with -mm2 was pretty lengthy, I had two periods of time
>> when the whole system just came to a crawl, first time was when starting cups,
>> and it came back to life and continued booting about 30s later. Next when
>> starting hpijs it didn't come to life at all and I had to reboot. No output to
>> the console for either, unfortunately.
>
> Don't know, sorry. But this kernel had oopsed, hadn't it?

This one is still present in -git6. The symptoms are that the kernel boots up,
the userspace applications start launching as the system starts to go to
runlevel 3, and then the system 'blocks' on $random_service (clamd, mysql and
vsftp and others). I've left it for 5 mins and it never continued on..

There's no oops, and nothing seems to be logged about it, I can hit enter and
the console jumps to a new line, so the machine doesn't lock up hard, it seems
to be getting 'stuck'.

I gathered this info via serial console:

SysRq : HELP : loglevel0-8 reBoot show-all-locks(D) tErm Full kIll saK showMem
Nice powerOff showPc unRaw Sync showTasks Unmount
dSysRq : Show Locks Held

Showing all blocking locks in the system:
S init: 1 [c1920a50, 116] (not blocked on mutex)
S migration/0: 2 [c1920030, 0] (not blocked on mutex)
S ksoftirqd/0: 3 [c1928a50, 134] (not blocked on mutex)
S watchdog/0: 4 [c1928540, 0] (not blocked on mutex)
S migration/1: 5 [c1928030, 0] (not blocked on mutex)
S ksoftirqd/1: 6 [c1930a50, 134] (not blocked on mutex)
S watchdog/1: 7 [c1930540, 0] (not blocked on mutex)
S events/0: 8 [c1930030, 110] (not blocked on mutex)
S events/1: 9 [c1950a50, 110] (not blocked on mutex)
S khelper: 10 [c1936a50, 111] (not blocked on mutex)
S kthread: 11 [c1950540, 111] (not blocked on mutex)
S kblockd/0: 14 [c1967540, 110] (not blocked on mutex)
S kblockd/1: 15 [c1967030, 110] (not blocked on mutex)
S kacpid: 16 [c1936540, 111] (not blocked on mutex)
S khubd: 108 [c1ba8a50, 110] (not blocked on mutex)
S pdflush: 160 [c1bdb540, 120] (not blocked on mutex)
D pdflush: 161 [c1bdb030, 115] (not blocked on mutex)
S kswapd0: 162 [c1be0a50, 117] (not blocked on mutex)
S aio/0: 163 [c1b51030, 111] (not blocked on mutex)
S aio/1: 164 [c1b51540, 111] (not blocked on mutex)
S kseriod: 252 [f7cdd030, 110] (not blocked on mutex)
S ata/0: 277 [f7cdd540, 111] (not blocked on mutex)
S ata/1: 278 [f7cd8a50, 111] (not blocked on mutex)
S scsi_eh_0: 280 [c1b55030, 110] (not blocked on mutex)
S scsi_eh_1: 281 [c1b55540, 111] (not blocked on mutex)
S scsi_eh_2: 282 [f7cd3030, 111] (not blocked on mutex)
S scsi_eh_3: 283 [c1b55a50, 110] (not blocked on mutex)
S kirqd: 368 [f7cd8030, 115] (not blocked on mutex)
S md5_raid1: 374 [f7cd8540, 110] (not blocked on mutex)
S md4_raid1: 378 [c1b4b030, 110] (not blocked on mutex)
S md3_raid1: 382 [c1967a50, 110] (not blocked on mutex)
D md2_raid1: 386 [c1b4b540, 110] (not blocked on mutex)
S md1_raid1: 391 [c1b51a50, 110] (not blocked on mutex)
D md0_raid1: 394 [c1b47540, 110] (not blocked on mutex)
D reiserfs/0: 395 [c1b39030, 110] (not blocked on mutex)
D reiserfs/1: 396 [f7c70030, 110] (not blocked on mutex)
S udevd: 446 [c1ba8030, 112] (not blocked on mutex)
S kjournald: 1299 [f7c67a50, 120] (not blocked on mutex)
S rc: 1370 [c1b39540, 118] (not blocked on mutex)
D syslogd: 1675 [f753d540, 116] (not blocked on mutex)
S klogd: 1678 [f75e3030, 115] (not blocked on mutex)
S named: 1689 [f753d030, 117] (not blocked on mutex)
S named: 1690 [f7673540, 116] (not blocked on mutex)
D named: 1691 [f767b540, 116] (not blocked on mutex)
S named: 1692 [f7673030, 116] (not blocked on mutex)
S named: 1693 [f7c6ca50, 116] (not blocked on mutex)
S portmap: 1715 [f74ee030, 116] (not blocked on mutex)
S mdadm: 1724 [f75d4030, 116] (not blocked on mutex)
S acpid: 1804 [f7620030, 123] (not blocked on mutex)
S hpiod: 1812 [c1b43a50, 116] (not blocked on mutex)
S hpiod: 1828 [f7696a50, 117] (not blocked on mutex)
S python: 1817 [f7c67030, 115] (not blocked on mutex)
D cupsd: 1826 [f771ca50, 116] (not blocked on mutex)
S sshd: 1879 [c1950030, 115] (not blocked on mutex)
S xinetd: 1888 [f74fea50, 115] (not blocked on mutex)
S ntpd: 1899 [c1b39a50, 116] (not blocked on mutex)
S apcupsd: 1917 [f75d4540, 116] (not blocked on mutex)
S apcupsd: 2198 [f74ee540, 115] (not blocked on mutex)
S nfsd: 1953 [c1936030, 119] (not blocked on mutex)
S nfsd: 1954 [f7696030, 119] (not blocked on mutex)
S nfsd: 1955 [f7c76540, 120] (not blocked on mutex)
S nfsd: 1956 [f7c6c030, 120] (not blocked on mutex)
S nfsd: 1957 [c1be0540, 120] (not blocked on mutex)
S nfsd: 1958 [f74eea50, 120] (not blocked on mutex)
S nfsd: 1959 [c1ba8540, 120] (not blocked on mutex)
S nfsd: 1960 [c1b43540, 121] (not blocked on mutex)
S lockd: 1962 [c1b43030, 120] (not blocked on mutex)
S rpciod/0: 1963 [f7594540, 112] (not blocked on mutex)
S rpciod/1: 1964 [f7594030, 110] (not blocked on mutex)
S rpc.mountd: 1966 [f7c76030, 119] (not blocked on mutex)
S rpc.idmapd: 1985 [f7c6c540, 116] (not blocked on mutex)
S vsftpd: 1995 [f7696540, 118] (not blocked on mutex)
S mysqld_safe: 2055 [f64cb540, 124] (not blocked on mutex)
S mysqld: 2086 [c1bdba50, 116] (not blocked on mutex)
S mysqld: 2087 [f64d4030, 119] (not blocked on mutex)
S mysqld: 2088 [f64d4a50, 116] (not blocked on mutex)
S mysqld: 2089 [f64cba50, 117] (not blocked on mutex)
S mysqld: 2090 [f771c540, 119] (not blocked on mutex)
S mysqld: 2097 [f760d540, 116] (not blocked on mutex)
S mysqld: 2098 [f760da50, 116] (not blocked on mutex)
S mysqld: 2104 [f7594a50, 116] (not blocked on mutex)
S mysqld: 2105 [f7663a50, 116] (not blocked on mutex)
S mysqld: 2106 [f753da50, 118] (not blocked on mutex)
S dhcpd: 2119 [f7cd3a50, 116] (not blocked on mutex)
S dovecot: 2128 [f64dd540, 115] (not blocked on mutex)
S dovecot-auth: 2138 [f75e3a50, 116] (not blocked on mutex)
S dovecot-auth: 2139 [c1b47a50, 116] (not blocked on mutex)
D imap: 2157 [c1b47030, 116] (not blocked on mutex)
S clamd: 2158 [f7663540, 118] (not blocked on mutex)
S S79amavisd: 2161 [f7620a50, 118] (not blocked on mutex)
D imap: 2164 [f7cd3540, 115] blocked on mutex: [f61e5d44]
{inode_init_once}
.. held by: imap: 2188 [f74fe540, 115]
... acquired at: real_lookup+0x1d/0xc6
S bash: 2165 [f7c67540, 120] (not blocked on mutex)
D amavisd: 2166 [f650ba50, 118] (not blocked on mutex)
D imap: 2174 [f7c76a50, 116] (not blocked on mutex)
X imap-login: 2176 [f771c030, 116] (not blocked on mutex)
X imap-login: 2178 [f74fe030, 116] (not blocked on mutex)
S imap-login: 2179 [f75e3540, 116] (not blocked on mutex)
S imap-login: 2180 [f767ba50, 116] (not blocked on mutex)
D imap: 2188 [f74fe540, 115] (not blocked on mutex)
S imap: 2196 [f650b540, 116] (not blocked on mutex)
X imap-login: 2197 [f7cdda50, 115] (not blocked on mutex)
X imap-login: 2202 [f767b030, 116] (not blocked on mutex)
D imap: 2213 [f6231a50, 115] (not blocked on mutex)
D dovecot-auth: 2222 [f64d4540, 118] (not blocked on mutex)
D dovecot-auth: 2223 [f64dd030, 116] (not blocked on mutex)
D dovecot-auth: 2224 [f760d030, 115] (not blocked on mutex)

---------------------------
| showing all locks held: |
---------------------------

#001: [f7dc8a4c] {alloc_super}
.. held by: pdflush: 161 [c1bdb030, 115]
... acquired at: sync_supers+0x8d/0xeb

#002: [f61e5d44] {inode_init_once}
.. held by: imap: 2188 [f74fe540, 115]
... acquired at: real_lookup+0x1d/0xc6

#003: [f61529c4] {inode_init_once}
.. held by: imap: 2157 [c1b47030, 116]
... acquired at: real_lookup+0x1d/0xc6

#004: [f700d9c4] {inode_init_once}
.. held by: named: 1691 [f767b540, 116]
... acquired at: reiserfs_file_write+0x1be/0x615

#005: [f68dcd44] {inode_init_once}
.. held by: cupsd: 1826 [f771ca50, 116]
... acquired at: sys_unlink+0x66/0x118

#006: [f6bd2d44] {inode_init_once}
.. held by: cupsd: 1826 [f771ca50, 116]
... acquired at: vfs_unlink+0xd4/0x150

=============================================

Is there any other information I can gather to help narrow this one down?

reuben


2006-01-10 10:31:15

by Andrew Morton

Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> Is there any other information I can gather to help narrow this one down?

sysrq-t, please.

2006-01-10 10:47:56

by Ingo Molnar

Subject: Re: 2.6.15-mm2


* Reuben Farrelly <[email protected]> wrote:

> >Don't know, sorry. But this kernel had oopsed, hadn't it?
>
> This one is still present in -git6. The symptoms are that the kernel
> boots up, the userspace applications start launching as the system
> starts to go to runlevel 3, and then the system 'blocks' on
> $random_service (clamd, mysql and vsftp and others). I've left it for
> 5 mins and it never continued on..
>
> There's no oops, and nothing seems to be logged about it, I can hit
> enter and the console jumps to a new line, so the machine doesn't lock
> up hard, it seems to be getting 'stuck'.

could you please also send me a SysRq-T (showTasks) output? [which will
also include all the stacktraces] (Please make sure you have
KALLSYMS_ALL enabled.)

Ingo

2006-01-10 10:52:38

by Ingo Molnar

Subject: Re: 2.6.15-mm2


* Ingo Molnar <[email protected]> wrote:

> could you please also send me a SysRq-T (showTasks) output? [which
> will also include all the stacktraces] (Please make sure you have
> KALLSYMS_ALL enabled.)

a wild guess: could you also apply the debug patch below (and please
keep CONFIG_DEBUG_MUTEXES enabled) - does it trigger anywhere during
your bootup sequence? [it doesn't trigger here on an ext3 based bootup
sequence]

Ingo

--
check that mutexes are used in TASK_RUNNING state. Using a mutex within
some wait-for-event loop could result in wakeups getting lost.

Signed-off-by: Ingo Molnar <[email protected]>

----

kernel/mutex-debug.c | 5 +++++
1 files changed, 5 insertions(+)

Index: linux/kernel/mutex-debug.c
===================================================================
--- linux.orig/kernel/mutex-debug.c
+++ linux/kernel/mutex-debug.c
@@ -385,6 +385,11 @@ void debug_mutex_init_waiter(struct mute
 	memset(waiter, 0x11, sizeof(*waiter));
 	waiter->magic = waiter;
 	INIT_LIST_HEAD(&waiter->list);
+	/*
+	 * Make sure mutexes are not acquired deep within some
+	 * waitqueue loop - wakeups could get lost:
+	 */
+	DEBUG_WARN_ON(current->state != TASK_RUNNING);
 }
 
 void debug_mutex_wake_waiter(struct mutex *lock, struct mutex_waiter *waiter)
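
A rough userspace model of the pattern this check is meant to catch, for readers unfamiliar with it. The assumption here, taken from the comment in the patch rather than from anything else in the thread, is that a contended mutex_lock() sleeps and returns with the task runnable again, so a state the caller set beforehand is silently overwritten. All `model_` names are hypothetical and nothing below is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_TASK_RUNNING, MODEL_TASK_INTERRUPTIBLE };

struct model_task {
	enum model_state state;
	int warnings;            /* how often the debug check fired */
};

/* Stand-in for the DEBUG_WARN_ON() added by the patch. */
static void model_debug_check(struct model_task *t)
{
	assert(t);
	if (t->state != MODEL_TASK_RUNNING)
		t->warnings++;
}

/*
 * Toy contended mutex_lock(): run the check, "sleep" until the lock is
 * free, and come back runnable. The caller's earlier state is gone.
 */
static void model_mutex_lock_contended(struct model_task *t)
{
	model_debug_check(t);
	t->state = MODEL_TASK_RUNNING;
}

/* Would a following schedule() actually put the task to sleep? */
static bool model_schedule_would_sleep(const struct model_task *t)
{
	return t->state != MODEL_TASK_RUNNING;
}

/* Wait-loop body that takes a mutex after announcing it wants to sleep. */
static bool model_wait_step_with_mutex(struct model_task *t)
{
	t->state = MODEL_TASK_INTERRUPTIBLE;   /* like set_current_state() */
	model_mutex_lock_contended(t);         /* clobbers the state */
	return model_schedule_would_sleep(t);
}
```

In this model a wait loop that sets its state and then takes a contended mutex produces one warning and would not sleep in the schedule() that follows. Whether anything like that is happening during the stuck boot is what the debug patch is meant to find out.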

2006-01-10 10:57:51

by Ingo Molnar

Subject: Re: 2.6.15-mm2


an additional patch you could apply is the one below - but I doubt it
makes a difference.

Ingo

--
add might_sleep() to the mutex_lock slowpath.

Signed-off-by: Ingo Molnar <[email protected]>

----

kernel/mutex.c | 1 +
1 files changed, 1 insertion(+)

Index: linux/kernel/mutex.c
===================================================================
--- linux.orig/kernel/mutex.c
+++ linux/kernel/mutex.c
@@ -133,6 +133,7 @@ __mutex_lock_common(struct mutex *lock,
 	struct mutex_waiter waiter;
 	unsigned int old_val;
 
+	might_sleep();
 	debug_mutex_init_waiter(&waiter);
 
 	spin_lock_mutex(&lock->wait_lock);

2006-01-10 10:59:59

by Reuben Farrelly

Subject: Re: 2.6.15-mm2


On 10/01/2006 11:30 p.m., Andrew Morton wrote:
> Reuben Farrelly <[email protected]> wrote:
>> Is there any other information I can gather to help narrow this one down?
>
> sysrq-t, please.

Here you go:

SysRq : Show State

sibling
task PC pid father child younger older
init S 00000000 0 1 0 2 (NOTLB)
c1921ec4 00000001 c0373f84 00000000 c0373f80 2d510cb0 00000033 000200d0
00000044 00000002 00000009 c1920b78 c1920a50 c1920540 c180c060 2d514cf2
00000033 00003a6e 00000001 c1921000 c0161800 00000282 c01234f1 c1921ed8
Call Trace:
[<c0161800>] link_path_walk+0x65/0xcb
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c012dae3>] add_wait_queue+0xf/0x30
[<c0123e06>] process_timeout+0x0/0x5
[<c016543b>] do_select+0x2db/0x2f5
[<c0164fe5>] __pollwait+0x0/0x97
[<c0165655>] sys_select+0x1e8/0x39c
[<c0102a57>] sysenter_past_esp+0x54/0x75
migration/0 S 00000082 0 2 1 3 (L-TLB)
c1927fb0 00000001 00000000 00000082 00000002 dffc3a1f 00000009 00000000
00000000 00000082 00000001 c1920158 c1920030 c036ea20 c1804060 e0360311
00000009 00000c60 00000000 c1927000 f7904f70 f7904f68 f7904f6c 00000296
Call Trace:
[<c0117c12>] complete+0x3a/0x4a
[<c0118a9f>] migration_thread+0x7a/0x104
[<c0118a25>] migration_thread+0x0/0x104
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
ksoftirqd/0 S C1804060 0 3 1 4 2 (L-TLB)
c1929fb0 c1920a50 c036e880 c1804060 c1921eac e6799862 00000006 f7a47d80
00000000 00000082 0000000a c1928b78 c1928a50 c036ea20 c1804060 e694e651
00000006 0000063d 00000000 c1929000 00000000 00193641 f7ca1030 0000000e
Call Trace:
[<c01206ef>] ksoftirqd+0xba/0xbc
[<c0120635>] ksoftirqd+0x0/0xbc
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
watchdog/0 S C01157B3 0 4 1 5 3 (L-TLB)
c192af88 00195ca5 c192af30 c01157b3 c1804060 2823ba88 00000034 00000000
00000002 c1928540 00000001 c1928668 c1928540 c036ea20 c1804060 2823c527
00000034 00000528 00000000 c192a000 00000000 00200282 c01234f1 c192af9c
Call Trace:
[<c01157b3>] activate_task+0x9d/0xe7
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c01373ce>] watchdog+0x0/0x62
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c01241a2>] msleep_interruptible+0x31/0x3f
[<c013740c>] watchdog+0x3e/0x62
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
migration/1 S 00000082 0 5 1 6 4 (L-TLB)
c192bfb0 00000000 00000001 00000082 00000002 c0f4a8b8 00000009 f7bd4380
00000000 00000082 00000001 c1928158 c1928030 f7a86540 c180c060 c0f791ee
00000009 00000da7 00000001 c192b000 00000000 f7377f68 f743c540 00000753
Call Trace:
[<c0118a9f>] migration_thread+0x7a/0x104
[<c0118a25>] migration_thread+0x0/0x104
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
ksoftirqd/1 S C180C060 0 6 1 7 5 (L-TLB)
c1931fb0 c1928030 c036e880 c180c060 c192bfb0 7bed7983 00000003 00000000
00000001 00000000 0000000a c1930b78 c1930a50 f7cd2030 c180c060 7c177ddd
00000003 00000c84 00000001 c1931000 c1804060 001a0cad 00000018 00000018
Call Trace:
[<c0117d3c>] set_user_nice+0xd0/0x10a
[<c01206ef>] ksoftirqd+0xba/0xbc
[<c0120635>] ksoftirqd+0x0/0xbc
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
watchdog/1 S C01157B3 0 7 1 8 6 (L-TLB)
c1932f88 001a29ef c1932f30 c01157b3 c1804060 282397ff 00000034 00000000
00000002 c1930540 00000009 c1930668 c1930540 c1920540 c180c060 2823a171
00000034 000004b0 00000001 c1932000 00000000 00000282 c01234f1 c1932f9c
Call Trace:
[<c01157b3>] activate_task+0x9d/0xe7
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c01373ce>] watchdog+0x0/0x62
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c01241a2>] msleep_interruptible+0x31/0x3f
[<c013740c>] watchdog+0x3e/0x62
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
events/0 S 00000000 0 8 1 9 7 (L-TLB)
c1935f40 00000000 c193be00 00000000 00000000 1057a540 00000034 00000003
c042e970 c042e964 0000000a c1930158 c1930030 c036ea20 c1804060 1057b3eb
00000034 0000081f 00000000 c1935000 00000000 00000001 c1935f40 c0117b55
Call Trace:
[<c0117b55>] __wake_up+0x32/0x43
[<c0129cfc>] worker_thread+0x14b/0x247
[<c0218d95>] flush_to_ldisc+0x0/0x110
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
events/1 S C1B3C000 0 9 1 10 8 (L-TLB)
c1951f40 c18ff800 c18fd440 c1b3c000 c1036760 e45128f1 00000033 00000282
c18fd440 c0373880 0000000a c1950b78 c1950a50 c1920540 c180c060 e4516b9a
00000033 00003c6e 00000001 c1951000 00000000 00000001 c1951f40 c0117b55
Call Trace:
[<c0117b55>] __wake_up+0x32/0x43
[<c0129cfc>] worker_thread+0x14b/0x247
[<c01525b3>] cache_reap+0x0/0x17a
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
khelper S 00800711 0 10 1 11 9 (L-TLB)
c1937f40 00000000 00000000 00800711 c1937f2c bbef926b 00000007 00000000
00000000 00000000 0000000a c1936b78 c1936a50 f7cb7030 c1804060 bc293651
00000007 000035af 00000000 c1937000 00000000 00000001 c1920540 c0117b55
Call Trace:
[<c0117b55>] __wake_up+0x32/0x43
[<c0129cfc>] worker_thread+0x14b/0x247
[<c0129908>] __call_usermodehelper+0x0/0x51
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kthread S C1B97030 0 11 1 14 162 10 (L-TLB)
c1960f40 f7cb7540 00800711 c1b97030 00000002 9f7eae71 00000009 f79a3880
00000000 f7904f34 00000009 c1950668 c1950540 f743c540 c1804060 9fa720c2
00000009 0000070f 00000000 c1960000 00000000 000008ca f7cb7540 c0117b55
Call Trace:
[<c0117b55>] __wake_up+0x32/0x43
[<c0129cfc>] worker_thread+0x14b/0x247
[<c012d83d>] keventd_create_kthread+0x0/0x46
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kblockd/0 S F7C7DFA8 0 14 11 15 (L-TLB)
c1969f40 00000019 c180c060 f7c7dfa8 f7dc872c 7125ee13 0000000a c012dc60
00000000 f7c7dfa8 0000000a c1967668 c1967540 c036ea20 c1804060 712612c5
0000000a 000014ca 00000000 c1969000 00000000 00000001 c1969f40 c0117b55
Call Trace:
[<c012dc60>] autoremove_wake_function+0x15/0x37
[<c0117b55>] __wake_up+0x32/0x43
[<c0129cfc>] worker_thread+0x14b/0x247
[<c01da433>] blk_unplug_work+0x0/0x6
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kblockd/1 S F7C7DFA8 0 15 11 16 14 (L-TLB)
c196af40 00000000 00000000 f7c7dfa8 f7dc872c b24106b9 00000021 c012dc60
00000000 f7c7dfa8 0000000a c1967158 c1967030 c1920540 c180c060 b2411da4
00000021 00000c8e 00000001 c196a000 00000000 00000001 c196af40 c0117b55
Call Trace:
[<c012dc60>] autoremove_wake_function+0x15/0x37
[<c0117b55>] __wake_up+0x32/0x43
[<c0129cfc>] worker_thread+0x14b/0x247
[<c01da433>] blk_unplug_work+0x0/0x6
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kacpid S C0116ECF 0 16 11 108 15 (L-TLB)
c19cbf40 00000001 c19cbf18 c0116ecf 00000000 01dac4a8 00000000 00000000
c18049c0 00000000 00000009 c1936668 c1936540 c1920540 c180c060 01db2fa0
00000000 000007e3 00000001 c19cb000 c1936540 c19cbf2c c0126d68 c1940254
Call Trace:
[<c0116ecf>] find_busiest_group+0x1ec/0x351
[<c0126d68>] do_sigaction+0x1a1/0x1f2
[<c0129bb1>] worker_thread+0x0/0x247
[<c0129cfc>] worker_thread+0x14b/0x247
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
khubd S 00000004 0 108 11 160 16 (L-TLB)
c1b95f84 00000001 f7d3d420 00000004 000003e8 3624fba8 00000002 c1b95f72
f7d3ea14 c0269480 0000000a c1b92668 c1b92540 c036ea20 c1804060 3642f8ca
00000002 00000a25 00000000 c1b95000 00000000 00000002 c036ea20 f7d3ea00
Call Trace:
[<c0269480>] hub_port_status+0x15/0x7e
[<c026a9b3>] hub_thread+0x0/0x112
[<c026aa97>] hub_thread+0xe4/0x112
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
pdflush S F7C0BF34 0 160 11 161 108 (L-TLB)
f7c0bf94 c0117305 00000002 f7c0bf34 c1950540 927ec3a2 00000000 c1920540
c1920540 c036e880 0000000a f7c0ab78 f7c0aa50 c1920540 c180c060 928dfdbd
00000000 000008c6 00000001 f7c0b000 00000000 00000286 c01188d7 00000286
Call Trace:
[<c0117305>] load_balance_newidle+0x42/0xf2
[<c01188d7>] set_cpus_allowed+0x73/0x114
[<c013f125>] pdflush+0x0/0x2d
[<c013f00c>] __pdflush+0x7c/0x195
[<c013f14d>] pdflush+0x28/0x2d
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
pdflush D F7C0CC9C 0 161 11 163 160 (L-TLB)
f7c0ccc0 0140ef60 00000001 f7c0cc9c f7c79fa8 70e90228 0000000a f7c0cc94
c012dc60 00000000 0000000a f7c0a668 f7c0a540 c036ea20 c1804060 70f43bb2
0000000a 000b28d7 00000000 f7c0c000 f7c0ccb8 c0117b55 00000000 00000000
Call Trace:
[<c012dc60>] autoremove_wake_function+0x15/0x37
[<c0117b55>] __wake_up+0x32/0x43
[<c0297589>] md_write_start+0xc7/0x159
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c028ef75>] make_request+0x58/0x458
[<c013ce17>] __alloc_pages+0x57/0x2d2
[<c01db8f0>] generic_make_request+0xb7/0x12a
[<c015194a>] cache_init_objs+0x43/0x85
[<c01db9a9>] submit_bio+0x46/0xd0
[<c0158eec>] bio_alloc_bioset+0xd9/0x195
[<c0158863>] submit_bh+0xc0/0x10d
[<c0158924>] ll_rw_block+0x74/0xc4
[<c01a7c04>] flush_commit_list+0x1d7/0x50e
[<c01ac397>] do_journal_end+0x80a/0x971
[<c01ab374>] journal_end_sync+0x62/0x73
[<c019a0ca>] reiserfs_sync_fs+0x4f/0x5c
[<c015a724>] sync_supers+0xa8/0xeb
[<c013e854>] wb_kupdate+0x3a/0x148
[<c013f125>] pdflush+0x0/0x2d
[<c013f043>] __pdflush+0xb3/0x195
[<c013f14d>] pdflush+0x28/0x2d
[<c013e81a>] wb_kupdate+0x0/0x148
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kswapd0 S F7C0A1E3 0 162 1 368 11 (L-TLB)
f7c0df90 f7c0a1da 0000000a f7c0a1e3 f7c0df9c 92eeead4 00000000 00000000
0000000a f7c0a030 00000008 f7c0a158 f7c0a030 c036ea20 c1804060 92efc785
00000000 00001a3c 00000000 f7c0d000 c0372060 c036e338 c0124213 f7c0d000
Call Trace:
[<c0124213>] free_uid+0x11/0x48
[<c011d088>] daemonize+0x1cb/0x22f
[<c0141f37>] kswapd+0x117/0x12e
[<c01029ba>] ret_from_fork+0x6/0x14
[<c0141e20>] kswapd+0x0/0x12e
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0141e20>] kswapd+0x0/0x12e
[<c0100fcd>] kernel_thread_helper+0x5/0xb
aio/0 S C0116ECF 0 163 11 164 161 (L-TLB)
c1b90f40 00000000 c1b90f18 c0116ecf 00000000 92eeead4 00000000 00000001
c180c9c0 00000000 00000009 c1b8e668 c1b8e540 c036ea20 c1804060 92effd03
00000000 00000a08 00000000 c1b90000 c1b8e540 c1b90f2c c0126d68 c1ba0d54
Call Trace:
[<c0116ecf>] find_busiest_group+0x1ec/0x351
[<c0126d68>] do_sigaction+0x1a1/0x1f2
[<c0129bb1>] worker_thread+0x0/0x247
[<c0129cfc>] worker_thread+0x14b/0x247
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
aio/1 S 00000000 0 164 11 252 163 (L-TLB)
c1b54f40 00000001 c180c9c0 00000000 c04123bc 92f00d5b 00000000 00000000
00000000 00000001 00000009 c1b51158 c1b51030 c1920a50 c180c060 92f0389f
00000000 0000072f 00000001 c1b54000 00000000 c1b54f2c c036ea20 c1b7a4d4
Call Trace:
[<c0129bb1>] worker_thread+0x0/0x247
[<c0129cfc>] worker_thread+0x14b/0x247
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kseriod S C0312ABC 0 252 11 277 164 (L-TLB)
f7cd4f80 00000001 f7cfc868 c0312abc 22222222 53e41492 00000002 00008080
00000000 f7ceb114 0000000a f7cd2668 f7cd2540 c036ea20 c1804060 53ef5a4a
00000002 00029830 00000000 f7cd4000 c0383060 f7d3d9c0 f7d3d9c0 c18ff100
Call Trace:
[<c0312abc>] __mutex_unlock_slowpath+0x6d/0x202
[<c0229030>] serio_thread+0x0/0x115
[<c02290ee>] serio_thread+0xbe/0x115
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
ata/0 S C0116ECF 0 277 11 278 252 (L-TLB)
f7cd3f40 00000001 f7cd3f18 c0116ecf 00000000 b6716cff 00000000 00000000
00000000 00000000 00000009 f7cd2b78 f7cd2a50 c1920a50 c1804060 b672017c
00000000 00000c77 00000000 f7cd3000 00000000 f7cd3f2c c1950540 f7ced454
Call Trace:
[<c0116ecf>] find_busiest_group+0x1ec/0x351
[<c0129bb1>] worker_thread+0x0/0x247
[<c0129cfc>] worker_thread+0x14b/0x247
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
ata/1 S C0116ECF 0 278 11 280 277 (L-TLB)
c1b96f40 00000000 c1b96f18 c0116ecf 00000000 b6702aea 00000000 00000001
c180c9c0 00000000 0000000a c1b92158 c1b92030 c1920540 c180c060 b6723300
00000000 000008fd 00000001 c1b96000 c1b92030 c1b96f2c c0126d68 c1bc1354
Call Trace:
[<c0116ecf>] find_busiest_group+0x1ec/0x351
[<c0126d68>] do_sigaction+0x1a1/0x1f2
[<c0129bb1>] worker_thread+0x0/0x247
[<c0129cfc>] worker_thread+0x14b/0x247
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
scsi_eh_0 S C031118F 0 280 11 281 278 (L-TLB)
c1b93fb8 c1804060 c03cdfe0 c031118f c1b93fc8 56a412ec 00000001 00000082
00000002 56734bc4 00000009 c1b92b78 c1b92a50 c036ea20 c1804060 56a4b87d
00000001 000005eb 00000000 c1b93000 00000001 00000bb2 00000001 c1b93000
Call Trace:
[<c031118f>] schedule+0x31f/0xd03
[<c0254522>] scsi_error_handler+0x0/0x9c
[<c0254565>] scsi_error_handler+0x43/0x9c
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
scsi_eh_1 S C031118F 0 281 11 282 280 (L-TLB)
f7ccefb8 c1804060 c03cdfe0 c031118f f7ccefc8 572a2f76 00000001 00000082
00000002 572a2f76 0000000a f7ccd158 f7ccd030 c036ea20 c1804060 575584e4
00000001 00000486 00000000 f7cce000 00000001 00000c04 00000000 f7cce000
Call Trace:
[<c031118f>] schedule+0x31f/0xd03
[<c0254522>] scsi_error_handler+0x0/0x9c
[<c0254565>] scsi_error_handler+0x43/0x9c
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
scsi_eh_2 S C031118F 0 282 11 283 281 (L-TLB)
f7cd0fb8 c1804060 c03cdfe0 c031118f f7cd0fc8 57e0ef48 00000001 00000082
00000002 57e0ef48 0000000a f7ccdb78 f7ccda50 c036ea20 c1804060 58062a15
00000001 0000049a 00000000 f7cd0000 00000001 00000b06 00000000 f7cd0000
Call Trace:
[<c031118f>] schedule+0x31f/0xd03
[<c0254522>] scsi_error_handler+0x0/0x9c
[<c0254565>] scsi_error_handler+0x43/0x9c
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
scsi_eh_3 S C031118F 0 283 11 374 282 (L-TLB)
f7cccfb8 c1804060 c03cdfe0 c031118f f7cccfc8 58b6b7bf 00000001 00000082
00000002 58b6b7bf 00000009 f7ccd668 f7ccd540 c036ea20 c1804060 58b6e355
00000001 00000532 00000000 f7ccc000 00000001 00000ade 00000000 f7ccc000
Call Trace:
[<c031118f>] schedule+0x31f/0xd03
[<c0254522>] scsi_error_handler+0x0/0x9c
[<c0254565>] scsi_error_handler+0x43/0x9c
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
kirqd S 00000000 0 368 1 445 162 (L-TLB)
f7cb8f98 00000000 00000002 00000000 c1804060 16c46026 00000034 00000000
00000000 c19cc1b8 0000000a f7cc1b78 f7cc1a50 c1920540 c180c060 16c47299
00000034 00000cd4 00000001 f7cb8000 c1804060 00000282 c01234f1 f7cb8fac
Call Trace:
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c011136e>] balanced_irq+0x7f/0xb5
[<c01112ef>] balanced_irq+0x0/0xb5
[<c0100fcd>] kernel_thread_helper+0x5/0xb
md5_raid1 S 00000008 0 374 11 378 283 (L-TLB)
f7cbaf48 f7cba000 c0297e08 00000008 00000001 2fef2d6a 00000033 f7cdc02c
c01da41a f7d83600 0000000a f7cc1668 f7cc1540 c1920540 c180c060 2ff29043
00000033 0133f6d5 00000001 f7cba000 00000000 00000282 c01234f1 f7cbaf5c
Call Trace:
[<c0297e08>] md_check_recovery+0x18/0x435
[<c01da41a>] generic_unplug_device+0x15/0x21
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c0296860>] md_thread+0x118/0x15a
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0296748>] md_thread+0x0/0x15a
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
md4_raid1 S 00000000 0 378 11 382 374 (L-TLB)
f7c7ef48 f7c7e000 c0297e08 00000000 00000000 4d06524d 00000034 00000000
00000002 f7d83680 0000000a c1bc2b78 c1bc2a50 c1920540 c180c060 4d3dface
00000034 00748f57 00000001 f7c7e000 00000000 00000282 c01234f1 f7c7ef5c
Call Trace:
[<c0297e08>] md_check_recovery+0x18/0x435
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c0296860>] md_thread+0x118/0x15a
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0296748>] md_thread+0x0/0x15a
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
md3_raid1 S 00000008 0 382 11 386 378 (L-TLB)
f7ca6f48 f7ca6000 c0297e08 00000008 00000001 452bef0e 00000034 00000000
00000002 f7d83700 0000000a c1b97668 c1b97540 c1920540 c180c060 45446c75
00000034 0018781a 00000001 f7ca6000 00000000 00000282 c01234f1 f7ca6f5c
Call Trace:
[<c0297e08>] md_check_recovery+0x18/0x435
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c0296860>] md_thread+0x118/0x15a
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0296748>] md_thread+0x0/0x15a
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
md2_raid1 D 00011210 0 386 11 390 382 (L-TLB)
f7c7de84 00000010 c18ec340 00011210 00000001 7826cfa1 00000014 c1000000
00011210 00000000 0000000a c1b8e158 c1b8e030 c036ea20 c1804060 782777ad
00000014 00006310 00000000 f7c7d000 c197de8c f76b1b00 c01591d3 f7cdc744
Call Trace:
[<c01591d3>] bio_clone+0x9c/0xae
[<c0291855>] md_super_wait+0xdf/0xf2
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c029941f>] bitmap_unplug+0x1c7/0x1ce
[<c01da3ac>] blk_remove_plug+0x26/0x5d
[<c0290011>] raid1d+0x7f/0x594
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c01236c9>] del_timer_sync+0xa/0x10
[<c0312389>] schedule_timeout+0x50/0xac
[<c0296789>] md_thread+0x41/0x15a
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0296748>] md_thread+0x0/0x15a
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
md1_raid1 S C1804060 0 390 11 393 386 (L-TLB)
c1b81f48 c1b81000 c0297e08 c1804060 0000000a 2e81c970 00000033 00000000
00000002 f7d83800 0000000a f7ca1b78 f7ca1a50 c036ea20 c1804060 2ead36a6
00000033 011f143c 00000000 c1b81000 00000000 00200282 c01234f1 c1b81f5c
Call Trace:
[<c0297e08>] md_check_recovery+0x18/0x435
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c0296860>] md_thread+0x118/0x15a
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0296748>] md_thread+0x0/0x15a
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
md0_raid1 D 02E9B0CB 0 393 11 394 390 (L-TLB)
f7c79e2c c04123bc 00000000 02e9b0cb 00000c80 70e8e00b 0000000a 00000246
f76666c0 f7d07660 0000000a c1b51668 c1b51540 c1920540 c180c060 70f4d67b
0000000a 00009eaf 00000001 f7c79000 f7cdc744 c1000000 00011210 00000000
Call Trace:
[<c0291855>] md_super_wait+0xdf/0xf2
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c02987f4>] write_sb_page+0x42/0x71
[<c02988e5>] write_page+0xc2/0x134
[<c0293355>] md_update_sb+0x7f/0x15b
[<c0297f6f>] md_check_recovery+0x17f/0x435
[<c028ffb3>] raid1d+0x21/0x594
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c01236c9>] del_timer_sync+0xa/0x10
[<c0312389>] schedule_timeout+0x50/0xac
[<c0296789>] md_thread+0x41/0x15a
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0296748>] md_thread+0x0/0x15a
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
reiserfs/0 D F7DBBC98 0 394 11 395 393 (L-TLB)
f7c74e9c f7bcce2c 00000000 f7dbbc98 00000000 7718f28d 0000000a ffffffff
00000000 00000000 0000000a f7c78668 f7c78540 c036ea20 c1804060 77190c89
0000000a 00000b06 00000000 f7c74000 c16f1540 00000001 00000003 f73849a4
Call Trace:
[<c03134aa>] __down+0xb7/0x11f
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c0310e07>] __down_failed+0x7/0xc
[<c01ac57e>] .text.lock.journal+0x8/0xfe
[<c01ab3f8>] flush_async_commits+0x73/0x75
[<c0129d6e>] worker_thread+0x1bd/0x247
[<c01ab385>] flush_async_commits+0x0/0x75
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
reiserfs/1 D 01406F60 0 395 11 394 (L-TLB)
c1b85db4 00000000 ffffffff 01406f60 00000082 7826ac88 00000014 f7dc872c
00000000 c1b85d90 0000000a f7c78b78 f7c78a50 c1920540 c180c060 78271282
00000014 00005b76 00000001 c1b85000 00000000 00000001 c1b85db4 c0117b55
Call Trace:
[<c0117b55>] __wake_up+0x32/0x43
[<c03122f0>] io_schedule+0x26/0x30
[<c0155877>] sync_buffer+0x30/0x33
[<c03124b4>] __wait_on_bit+0x42/0x5e
[<c0155847>] sync_buffer+0x0/0x33
[<c0312545>] out_of_line_wait_on_bit+0x75/0x7d
[<c0155847>] sync_buffer+0x0/0x33
[<c0158eec>] bio_alloc_bioset+0xd9/0x195
[<c012dc82>] wake_bit_function+0x0/0x3c
[<c01558db>] __wait_on_buffer+0x24/0x29
[<c01a78ae>] write_ordered_buffers+0x1cf/0x20e
[<c01a7469>] write_ordered_chunk+0x0/0x52
[<c0117305>] load_balance_newidle+0x42/0xf2
[<c013eaca>] do_writepages+0x23/0x37
[<c031118f>] schedule+0x31f/0xd03
[<c01a7e9f>] flush_commit_list+0x472/0x50e
[<c01ab3f8>] flush_async_commits+0x73/0x75
[<c0129d6e>] worker_thread+0x1bd/0x247
[<c01ab385>] flush_async_commits+0x0/0x75
[<c0117abc>] default_wake_function+0x0/0xc
[<c0129bb1>] worker_thread+0x0/0x247
[<c012d839>] kthread+0x93/0x97
[<c012d7a6>] kthread+0x0/0x97
[<c0100fcd>] kernel_thread_helper+0x5/0xb
udevd S C0373F84 0 445 1 1297 368 (NOTLB)
f7962ec4 3e8ac045 00000001 c0373f84 00000000 a499fb27 00000009 c013cd7f
000200d0 00000044 00000007 f7cb7668 f7cb7540 c036ea20 c1804060 a4baa159
00000009 00013a40 00000000 f7962000 c18e7500 c016cd2d f795d005 c0373f80
Call Trace:
[<c013cd7f>] get_page_from_freelist+0x6d/0xae
[<c016cd2d>] mntput_no_expire+0x13/0x5a
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c012dae3>] add_wait_queue+0xf/0x30
[<c0177311>] inotify_poll+0x29/0x55
[<c016543b>] do_select+0x2db/0x2f5
[<c0164fe5>] __pollwait+0x0/0x97
[<c0165655>] sys_select+0x1e8/0x39c
[<c0117abc>] default_wake_function+0x0/0xc
[<c0102a57>] sysenter_past_esp+0x54/0x75
kjournald S 00000000 0 1297 1 1357 445 (L-TLB)
f7ca8f74 c011bea4 f7ca8f80 00000000 00000000 fe56316f 00000003 f7a82458
00000000 00000246 00000003 c1b83668 c1b83540 c036ea20 c1804060 fe57c6c1
00000003 00002c60 00000000 f7ca8000 00000000 00000001 f7ca8f74 c0117b55
Call Trace:
[<c011bea4>] vprintk+0x275/0x2a1
[<c0117b55>] __wake_up+0x32/0x43
[<c01c4b0d>] kjournald+0x228/0x241
[<c01029ba>] ret_from_fork+0x6/0x14
[<c01c48e0>] commit_timeout+0x0/0x5
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c01c48e5>] kjournald+0x0/0x241
[<c0100fcd>] kernel_thread_helper+0x5/0xb
rc S 00000003 0 1357 1 1794 1662 1297 (NOTLB)
f7b5ef28 00000000 00000000 00000003 00000000 c092ae32 00000009 c0373fa8
f7c7ca50 000200d2 00000006 f7c7cb78 f7c7ca50 c036ea20 c1804060 c0c99f7b
00000009 00014d64 00000000 f7b5e000 0000000e c014003d c17d9a20 c17cb940
Call Trace:
[<c014003d>] __pagevec_lru_add_active+0xa8/0xb3
[<c0144d7b>] do_wp_page+0x251/0x2a3
[<c011ed8e>] do_wait+0x332/0x3d0
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c011eeca>] sys_wait4+0x31/0x35
[<c011eef5>] sys_waitpid+0x27/0x2b
[<c0102a57>] sysenter_past_esp+0x54/0x75
syslogd D F7CDC02C 0 1662 1 1665 1357 (NOTLB)
f71eaeb4 c01d848f 00000010 f7cdc02c f7c7dfa8 7162b6e9 0000000a f7bd4380
00000001 00000000 00000009 c1b3db78 c1b3da50 c1b8e030 c180c060 718720da
0000000a 0000309b 00000001 f71ea000 00000000 c0117b55 f743c540 00000000
Call Trace:
[<c01d848f>] elv_set_request+0x1e/0x33
[<c0117b55>] __wake_up+0x32/0x43
[<c03122f0>] io_schedule+0x26/0x30
[<c0138257>] sync_page+0x31/0x3c
[<c03124b4>] __wait_on_bit+0x42/0x5e
[<c0138226>] sync_page+0x0/0x3c
[<c013882f>] wait_on_page_bit+0x75/0x7d
[<c0138b40>] find_get_pages_tag+0x31/0x6f
[<c012dc82>] wake_bit_function+0x0/0x3c
[<c01383f9>] wait_on_page_writeback_range+0x9e/0x104
[<c0255b0c>] scsi_issue_flush_fn+0x37/0x39
[<c028ea19>] raid1_issue_flush+0x61/0xd0
[<c028e9b8>] raid1_issue_flush+0x0/0xd0
[<c0312abc>] __mutex_unlock_slowpath+0x6d/0x202
[<c0155d24>] do_fsync+0x78/0xbb
[<c0102a57>] sysenter_past_esp+0x54/0x75
klogd S C013CE17 0 1665 1 1676 1662 (NOTLB)
f797fd90 000000d0 00000000 c013ce17 00000044 6f018104 0000000a f7bd4380
00000000 000200d0 0000000a f7cbd668 f7cbd540 c1b3da50 c180c060 6f372d3f
0000000a 00006b70 00000001 f797f000 00000000 f6cbd000 f743c540 000004b0
Call Trace:
[<c013ce17>] __alloc_pages+0x57/0x2d2
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c0151b89>] cache_alloc_refill+0x54/0x210
[<c012dbc6>] prepare_to_wait_exclusive+0x12/0x4c
[<c030b7bb>] unix_wait_for_peer+0xb7/0xde
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c030c159>] unix_dgram_sendmsg+0x313/0x4e0
[<c016b5dc>] update_atime+0x45/0x70
[<c02a0375>] do_sock_write+0xa3/0xba
[<c02a04bb>] sock_aio_write+0x68/0x6c
[<c015488f>] do_sync_write+0xc3/0xff
[<c0117130>] load_balance+0x4e/0x1e1
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c0154a03>] vfs_write+0x138/0x13f
[<c0154ab5>] sys_write+0x41/0x6a
[<c0102a57>] sysenter_past_esp+0x54/0x75
named S F7BD4B80 0 1676 1 1677 1665 (NOTLB)
f7aa3f98 f7cb6f20 c0131948 f7bd4b80 00000001 ba820c84 00000007 00000001
00000000 00000000 00000009 c1b83158 c1b83030 c036ea20 c1804060 bab043c4
00000007 0007f858 00000000 f7aa3000 00000014 00000246 bfc30d28 00000008
Call Trace:
[<c0131948>] futex_wake_op+0x24a/0x44b
[<c0101d12>] sys_rt_sigsuspend+0xac/0xc8
[<c0102a57>] sysenter_past_esp+0x54/0x75
named S 00000010 0 1677 1 1678 1676 (NOTLB)
f7570ebc f7a3d180 00000010 00000010 b7a73324 7b9409ff 00000022 8009d6fc
f7570e90 c029fabb 00000009 c1b97b78 c1b97a50 c1920540 c180c060 7bb0308a
00000022 00009fba 00000001 f7570000 99120002 58186dc2 00000000 00000000
Call Trace:
[<c029fabb>] move_addr_to_user+0x49/0x54
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c0131505>] get_futex_key+0x39/0xee
[<c0117b01>] __wake_up_common+0x39/0x5b
[<c012dae3>] add_wait_queue+0xf/0x30
[<c01320b2>] futex_wait+0x1d2/0x23b
[<c0147ddc>] find_extend_vma+0x12/0x50
[<c0117abc>] default_wake_function+0x0/0xc
[<c0132362>] do_futex+0x49/0x89
[<c02a1b78>] sys_socketcall+0x26a/0x271
[<c013240d>] sys_futex+0x6b/0xbd
[<c0102a57>] sysenter_past_esp+0x54/0x75
named D 00000286 0 1678 1 1679 1677 (NOTLB)
f7cb6c24 c02a9645 00000004 00000286 c0151b89 b1c72edd 00000021 c197d01c
00000246 00000246 00000009 c1bccb78 c1bcca50 c1920540 c180c060 b20076d6
00000021 0002eb9f 00000001 f7cb6000 c02c4c94 80000000 00000000 f75ec1c0
Call Trace:
[<c02a9645>] dev_queue_xmit+0x96/0x2ad
[<c0151b89>] cache_alloc_refill+0x54/0x210
[<c02c4c94>] ip_finish_output+0x0/0x22f
[<c03134aa>] __down+0xb7/0x11f
[<c01a7440>] write_chunk+0x29/0x52
[<c0117abc>] default_wake_function+0x0/0xc
[<c0310e07>] __down_failed+0x7/0xc
[<c01ac57e>] .text.lock.journal+0x8/0xfe
[<c01a8316>] flush_journal_list+0x10f/0x63f
[<c01a8bc5>] flush_used_journal_lists+0xb1/0xce
[<c01abb84>] flush_old_journal_lists+0x4a/0x53
[<c01ac2f4>] do_journal_end+0x767/0x971
[<c01ab127>] journal_end+0x9d/0xd4
[<c019aaf4>] reiserfs_dirty_inode+0x77/0x82
[<c0172b98>] __mark_inode_dirty+0x28/0x177
[<c01962f9>] reiserfs_file_write+0x229/0x615
[<c01157b3>] activate_task+0x9d/0xe7
[<c016cd2d>] mntput_no_expire+0x13/0x5a
[<c0161800>] link_path_walk+0x65/0xcb
[<c01e8f90>] copy_to_user+0x4c/0x6a
[<c015d224>] cp_new_stat64+0xf6/0x108
[<c0154956>] vfs_write+0x8b/0x13f
[<c01960d0>] reiserfs_file_write+0x0/0x615
[<c0154ab5>] sys_write+0x41/0x6a
[<c0102a57>] sysenter_past_esp+0x54/0x75
named S 00000001 0 1679 1 1680 1678 (NOTLB)
f7389ebc c18049c0 00000069 00000001 00000001 dd9db4bd 00000026 f7bd4b80
00000001 c1bcca50 00000009 f7caeb78 f7caea50 c1920540 c180c060 dd9e01b7
00000026 000047e6 00000001 f7389000 00000000 00000282 c01234f1 f7389ed0
Call Trace:
[<c01234f1>] lock_timer_base+0x15/0x2f
[<c012358e>] __mod_timer+0x83/0x9e
[<c0312382>] schedule_timeout+0x49/0xac
[<c0131505>] get_futex_key+0x39/0xee
[<c0123e06>] process_timeout+0x0/0x5
[<c01320b2>] futex_wait+0x1d2/0x23b
[<c0147ddc>] find_extend_vma+0x12/0x50
[<c0117abc>] default_wake_function+0x0/0xc
[<c0132362>] do_futex+0x49/0x89
[<c01e8ffa>] copy_from_user+0x4c/0x82
[<c013240d>] sys_futex+0x6b/0xbd
[<c0102a57>] sysenter_past_esp+0x54/0x75
named S 00000000 0 1680 1 1702 1679 (NOTLB)
f71ebec4 00000001 c0373f84 00000000 c0373f80 46faf5f5 00000026 000200d0
00000044 00000002 00000009 f743c158 f743c030 c036ea20 c1804060 470586ad
00000026 0000379b 00000000 f71eb000 00000000 00200092 c0373f80 000200d0
Call Trace:
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c012dae3>] add_wait_queue+0xf/0x30
[<c02cb342>] tcp_poll+0x14/0x15b
[<c016543b>] do_select+0x2db/0x2f5
[<c0164fe5>] __pollwait+0x0/0x97
[<c0165655>] sys_select+0x1e8/0x39c
[<c0102a57>] sysenter_past_esp+0x54/0x75
portmap S 000000D0 0 1702 1 1711 1680 (NOTLB)
f7393f30 c0373f80 f7be8540 000000d0 00000000 8681e029 00000009 00000001
f7393f34 f7a2d284 00000004 f7be8668 f7be8540 c1920540 c180c060 86969e2d
00000009 00065fd1 00000001 f7393000 f7b73d18 00200246 c012dae3 f728d100
Call Trace:
[<c012dae3>] add_wait_queue+0xf/0x30
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c02a07ea>] sock_poll+0xc/0xe
[<c0165864>] do_pollfd+0x5b/0xad
[<c0165958>] do_poll+0xa2/0xc1
[<c0165aee>] sys_poll+0x177/0x239
[<c0164fe5>] __pollwait+0x0/0x97
[<c0102a57>] sysenter_past_esp+0x54/0x75
mdadm S 00000A00 0 1711 1 1791 1702 (NOTLB)
f7454e84 c1804060 f7cc1540 00000a00 f7cc1540 e6534d88 00000018 c0115e3c
32383880 000000b7 00000009 f7a41668 f7a41540 c1920540 c180c060 e654730a
00000018 00011f68 00000001 f7454000 00000000 ffffffff 01406f60 0000000a
Call Trace:
[<c0115e3c>] try_to_wake_up+0x6e/0x3dd
[<c03135be>] __down_interruptible+0xac/0x13c
[<c0117abc>] default_wake_function+0x0/0xc
[<c0310e13>] __down_failed_interruptible+0x7/0xc
[<c02984f3>] .text.lock.md+0xe3/0x130
[<c0147024>] vma_merge+0xc0/0x1a1
[<c029122e>] mddev_put+0x13/0x6c
[<c0170c9c>] seq_read+0x1ec/0x2c5
[<c0154718>] vfs_read+0x89/0x13d
[<c0170ab0>] seq_read+0x0/0x2c5
[<c0154a4b>] sys_read+0x41/0x6a
[<c0102a57>] sysenter_past_esp+0x54/0x75
acpid S 000200D0 0 1791 1 1799 1711 (NOTLB)
f7ae8f30 00000044 c013cd7f 000200d0 00000044 bf2526e8 00000009 000200d0
00000000 c0373f80 00000005 c1bd1b78 c1bd1a50 c036ea20 c1804060 bf4815c5
00000009 00036e00 00000000 f7ae8000 00000010 c16f7a00 f7b6c9c0 00000cf4
Call Trace:
[<c013cd7f>] get_page_from_freelist+0x6d/0xae
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c02a07ea>] sock_poll+0xc/0xe
[<c0165864>] do_pollfd+0x5b/0xad
[<c0165958>] do_poll+0xa2/0xc1
[<c0165aee>] sys_poll+0x177/0x239
[<c0164fe5>] __pollwait+0x0/0x97
[<c0102a57>] sysenter_past_esp+0x54/0x75
S50hplip S 00000003 0 1794 1357 1801 (NOTLB)
f7b92f28 000200d2 00000044 00000003 00000000 d821d839 00000009 c0373fa8
f7c73a50 000200d2 00000007 f7c73b78 f7c73a50 c036ea20 c1804060 d827abe2
00000009 000118b3 00000000 f7b92000 c0149870 c1b9712c c17ca860 fffba000
Call Trace:
[<c0149870>] anon_vma_prepare+0x20/0xc6
[<c0144ce8>] do_wp_page+0x1be/0x2a3
[<c011ed8e>] do_wait+0x332/0x3d0
[<c031118f>] schedule+0x31f/0xd03
[<c0117abc>] default_wake_function+0x0/0xc
[<c011eeca>] sys_wait4+0x31/0x35
[<c011eef5>] sys_waitpid+0x27/0x2b
[<c0102a57>] sysenter_past_esp+0x54/0x75
hpiod S 00000000 0 1799 1 1791 (NOTLB)
f79bce1c f79bcdd8 00000707 00000000 00000000 e22060cd 00000009 00000372
00000000 c0321bc0 0000000a f7a86668 f7a86540 c1920540 c180c060 e254825f
00000009 000125b9 00000001 f79bc000 00000000 00000000 00000000 00000000
Call Trace:
[<c03123a7>] schedule_timeout+0x6e/0xac
[<c02ca738>] inet_csk_wait_for_connect+0xea/0x112
[<c030b2c7>] unix_find_other+0xe6/0x121
[<c012dc4b>] autoremove_wake_function+0x0/0x37
[<c02ca7c1>] inet_csk_accept+0x61/0x134
[<c02e6cf4>] inet_accept+0x1f/0xa2
[<c016a2f5>] alloc_inode+0x15/0x135
[<c02a0ecc>] sys_accept+0x96/0x14f
[<c014589f>] do_no_page+0x165/0x27e
[<c01397d7>] filemap_nopage+0x0/0x3a3
[<c0145ae8>] __handle_mm_fault+0xc0/0x217
[<c02a19d4>] sys_socketcall+0xc6/0x271
[<c0114423>] do_page_fault+0x0/0x5b5
[<c0102a57>] sysenter_past_esp+0x54/0x75
bash S F78C4150 0 1801 1794 1802 (NOTLB)
f71b9f28 000200d2 00000031 f78c4150 c0138928 d821bbbc 00000009 c0139ad6
c1b97030 000200d2 00000005 c1b97158 c1b97030 c1920540 c180c060 d847a0ca
00000009 0000f6a8 00000001 f71b9000 c17f23c0 c014589f c17d02e0 c17ce140
Call Trace:
[<c0138928>] find_get_page+0x18/0x3a
[<c0139ad6>] filemap_nopage+0x2ff/0x3a3
[<c014589f>] do_no_page+0x165/0x27e
[<c0144d7b>] do_wp_page+0x251/0x2a3
[<c011ed8e>] do_wait+0x332/0x3d0
[<c0117abc>] default_wake_function+0x0/0xc
[<c011eeca>] sys_wait4+0x31/0x35
[<c011eef5>] sys_waitpid+0x27/0x2b
[<c0102a57>] sysenter_past_esp+0x54/0x75
python D F7DDB6A8 0 1802 1801 (NOTLB)
f7904b1c c01591d3 f7d89b50 f7ddb6a8 00000001 7eb66828 0000000a c028f2f7
00000001 00000000 00000007 f743c668 f743c540 c036ea20 c1804060 7ec2673f
0000000a 000bf8be 00000000 f7904000 00000000 00000001 f7904b1c c0117b55
Call Trace:
[<c01591d3>] bio_clone+0x9c/0xae
[<c028f2f7>] make_request+0x3da/0x458
[<c0117b55>] __wake_up+0x32/0x43
[<c03122f0>] io_schedule+0x26/0x30
[<c0155877>] sync_buffer+0x30/0x33
[<c03124b4>] __wait_on_bit+0x42/0x5e
[<c0155847>] sync_buffer+0x0/0x33
[<c0312545>] out_of_line_wait_on_bit+0x75/0x7d
[<c0155847>] sync_buffer+0x0/0x33
[<c012dc82>] wake_bit_function+0x0/0x3c
[<c01558db>] __wait_on_buffer+0x24/0x29
[<c01a2bf0>] search_by_key+0x12b/0xc4f
[<c01a2ea6>] search_by_key+0x3e1/0xc4f
[<c01236c9>] del_timer_sync+0xa/0x10
[<c0312389>] schedule_timeout+0x50/0xac
[<c0123e06>] process_timeout+0x0/0x5
[<c012dc25>] finish_wait+0x25/0x4b
[<c0191447>] reiserfs_read_locked_inode+0x69/0x105
[<c016ad00>] get_new_inode+0x5f/0x12f
[<c019156d>] reiserfs_iget+0x6f/0x86
[<c01913d0>] reiserfs_init_locked_inode+0x0/0xe
[<c018cdcc>] reiserfs_lookup+0xb2/0x117
[<c0160810>] real_lookup+0x1d/0xc6
[<c0160893>] real_lookup+0xa0/0xc6
[<c0160acb>] do_lookup+0x6d/0x78
[<c01611bc>] __link_path_walk+0x6e6/0xcc5
[<c016cd2d>] mntput_no_expire+0x13/0x5a
[<c0160d8e>] __link_path_walk+0x2b8/0xcc5
[<c01617e0>] link_path_walk+0x45/0xcb
[<c0153e23>] get_unused_fd+0x50/0xb2
[<c0161ad8>] path_lookup+0x92/0x15a
[<c0161bda>] __path_lookup_intent_open+0x3a/0x6f
[<c0161c27>] path_lookup_open+0x18/0x1d
[<c0162184>] open_namei+0x5a/0x5d2
[<c015ccf7>] vfs_stat+0x14/0x3a
[<c0153cb1>] filp_open+0x1c/0x35
[<c0153e23>] get_unused_fd+0x50/0xb2
[<c0153f4b>] do_sys_open+0x35/0xb6
[<c0102a57>] sysenter_past_esp+0x54/0x75

Showing all blocking locks in the system:
S init: 1 [c1920a50, 116] (not blocked on mutex)
S migration/0: 2 [c1920030, 0] (not blocked on mutex)
S ksoftirqd/0: 3 [c1928a50, 134] (not blocked on mutex)
S watchdog/0: 4 [c1928540, 0] (not blocked on mutex)
S migration/1: 5 [c1928030, 0] (not blocked on mutex)
S ksoftirqd/1: 6 [c1930a50, 134] (not blocked on mutex)
S watchdog/1: 7 [c1930540, 0] (not blocked on mutex)
S events/0: 8 [c1930030, 110] (not blocked on mutex)
S events/1: 9 [c1950a50, 110] (not blocked on mutex)
S khelper: 10 [c1936a50, 110] (not blocked on mutex)
S kthread: 11 [c1950540, 111] (not blocked on mutex)
S kblockd/0: 14 [c1967540, 110] (not blocked on mutex)
S kblockd/1: 15 [c1967030, 110] (not blocked on mutex)
S kacpid: 16 [c1936540, 111] (not blocked on mutex)
S khubd: 108 [c1b92540, 110] (not blocked on mutex)
S pdflush: 160 [f7c0aa50, 120] (not blocked on mutex)
D pdflush: 161 [f7c0a540, 115] (not blocked on mutex)
S kswapd0: 162 [f7c0a030, 117] (not blocked on mutex)
S aio/0: 163 [c1b8e540, 111] (not blocked on mutex)
S aio/1: 164 [c1b51030, 111] (not blocked on mutex)
S kseriod: 252 [f7cd2540, 110] (not blocked on mutex)
S ata/0: 277 [f7cd2a50, 111] (not blocked on mutex)
S ata/1: 278 [c1b92030, 110] (not blocked on mutex)
S scsi_eh_0: 280 [c1b92a50, 111] (not blocked on mutex)
S scsi_eh_1: 281 [f7ccd030, 110] (not blocked on mutex)
S scsi_eh_2: 282 [f7ccda50, 110] (not blocked on mutex)
S scsi_eh_3: 283 [f7ccd540, 111] (not blocked on mutex)
S kirqd: 368 [f7cc1a50, 115] (not blocked on mutex)
S md5_raid1: 374 [f7cc1540, 110] (not blocked on mutex)
S md4_raid1: 378 [c1bc2a50, 110] (not blocked on mutex)
S md3_raid1: 382 [c1b97540, 110] (not blocked on mutex)
D md2_raid1: 386 [c1b8e030, 110] (not blocked on mutex)
S md1_raid1: 390 [f7ca1a50, 110] (not blocked on mutex)
D md0_raid1: 393 [c1b51540, 110] (not blocked on mutex)
D reiserfs/0: 394 [f7c78540, 110] (not blocked on mutex)
D reiserfs/1: 395 [f7c78a50, 110] (not blocked on mutex)
S udevd: 445 [f7cb7540, 113] (not blocked on mutex)
S kjournald: 1297 [c1b83540, 122] (not blocked on mutex)
S rc: 1357 [f7c7ca50, 117] (not blocked on mutex)
D syslogd: 1662 [c1b3da50, 116] (not blocked on mutex)
S klogd: 1665 [f7cbd540, 115] (not blocked on mutex)
S named: 1676 [c1b83030, 116] (not blocked on mutex)
S named: 1677 [c1b97a50, 116] (not blocked on mutex)
D named: 1678 [c1bcca50, 116] (not blocked on mutex)
S named: 1679 [f7caea50, 116] (not blocked on mutex)
S named: 1680 [f743c030, 116] (not blocked on mutex)
S portmap: 1702 [f7be8540, 121] (not blocked on mutex)
S mdadm: 1711 [f7a41540, 116] (not blocked on mutex)
S acpid: 1791 [c1bd1a50, 120] (not blocked on mutex)
S S50hplip: 1794 [f7c73a50, 116] (not blocked on mutex)
S hpiod: 1799 [f7a86540, 115] (not blocked on mutex)
S bash: 1801 [c1b97030, 119] (not blocked on mutex)
D python: 1802 [f743c540, 118] (not blocked on mutex)

---------------------------
| showing all locks held: |
---------------------------

#001: [f7dbcc4c] {alloc_super}
.. held by: pdflush: 161 [f7c0a540, 115]
... acquired at: sync_supers+0x8d/0xeb

#002: [f73d62c4] {inode_init_once}
.. held by: python: 1802 [f743c540, 118]
... acquired at: real_lookup+0x1d/0xc6

#003: [f724eb84] {inode_init_once}
.. held by: named: 1678 [c1bcca50, 116]
... acquired at: reiserfs_file_write+0x1be/0x615

=============================================


reuben
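
For anyone skimming a dump of this size, the tasks of interest are the ones in state D (uninterruptible sleep); in the listing above those are pdflush (161), md2_raid1, md0_raid1, reiserfs/0, reiserfs/1, syslogd, one named thread (1678) and python. The following is a minimal sketch, not part of the original report, of how those rows could be picked out of the "Showing all blocking locks" listing. The row layout is assumed from the output pasted above, and the names ROW and blocked_tasks are hypothetical helpers invented for illustration.

```python
import re

# Assumed row layout, taken from the listing above, e.g.
#   D pdflush: 161 [f7c0a540, 115] (not blocked on mutex)
# Fields: state letter, task name, pid, [task pointer, priority].
ROW = re.compile(r"^([A-Z]) (\S+): (\d+) \[([0-9a-f]+), (\d+)\]")


def blocked_tasks(lines, state="D"):
    """Return (name, pid) pairs for every row whose state letter matches."""
    found = []
    for line in lines:
        m = ROW.match(line.strip())
        if m and m.group(1) == state:
            found.append((m.group(2), int(m.group(3))))
    return found


# A few rows copied from the dump above, used as sample input.
sample = """\
S pdflush: 160 [f7c0aa50, 120] (not blocked on mutex)
D pdflush: 161 [f7c0a540, 115] (not blocked on mutex)
D md2_raid1: 386 [c1b8e030, 110] (not blocked on mutex)
S md1_raid1: 390 [f7ca1a50, 110] (not blocked on mutex)
D python: 1802 [f743c540, 118] (not blocked on mutex)
""".splitlines()

for name, pid in blocked_tasks(sample):
    print(f"{pid:>5}  {name}")
```

Feeding the whole serial-console capture through blocked_tasks() should give the same short list of stuck tasks, whose call traces can then be read first.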

2006-01-10 11:34:35

by Ingo Molnar

Subject: Re: 2.6.15-mm2


Could you also enable CONFIG_FRAME_POINTER, to make the backtraces
easier to read?

Ingo

2006-01-10 12:29:16

by Reuben Farrelly

Subject: Re: 2.6.15-mm2

On 10/01/2006 11:47 p.m., Ingo Molnar wrote:
> * Reuben Farrelly <[email protected]> wrote:
>
>>> Don't know, sorry. But this kernel had oopsed, hadn't it?
>> This one is still present in -git6. The symptoms are that the kernel
>> boots up, the userspace applications start launching as the system
>> starts to go to runlevel 3, and then the system 'blocks' on
>> $random_service (clamd, mysql and vsftp and others). I've left it for
>> 5 mins and it never continued on..
>>
>> There's no oops, and nothing seems to be logged about it, I can hit
>> enter and the console jumps to a new line, so the machine doesn't lock
>> up hard, it seems to be getting 'stuck'.
>
> could you please also send me a SysRq-T (showTasks) output? [which will
> also include all the stacktraces] (Please make sure you have
> KALLSYMS_ALL enabled.)

OK, here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER
and CONFIG_DETECT_SOFTLOCKUP enabled, plus the
DEBUG_WARN_ON(current->state != TASK_RUNNING); patch from Ingo.

Linux version 2.6.15-git6 ([email protected]) (gcc version 4.1.0 20060106 (Red Hat 4.1.0-0.14)) #1 SMP Wed Jan 11 01:12:07 NZDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fe2f800 (usable)
BIOS-e820: 000000003fe2f800 - 000000003fe3f8e3 (ACPI NVS)
BIOS-e820: 000000003ff2f800 - 000000003ff30000 (ACPI NVS)
BIOS-e820: 000000003ff30000 - 000000003ff40000 (ACPI data)
BIOS-e820: 000000003ff40000 - 000000003fff0000 (ACPI NVS)
BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fed13000 - 00000000fed1a000 (reserved)
BIOS-e820: 00000000fed1c000 - 00000000feda0000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
Built 1 zonelists
Kernel command line: ro root=/dev/md0 panic=60 console=ttyS0,57600
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c043f000 soft=c043d000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2800.394 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1033336k/1046716k available (2185k kernel code, 12728k reserved, 900k data, 204k init, 129212k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5611.90 BogoMIPS (lpj=11223811)
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c0440000 soft=c043e000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5600.73 BogoMIPS (lpj=11201467)
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Total of 2 processors activated (11212.63 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: Power Resource [URP2] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI: Bridge: 0000:00:01.0
IO window: disabled.
MEM window: ffa00000-ffafffff
PREFETCH window: fdf00000-fdffffff
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: ff600000-ff6fffff
PREFETCH window: fdb00000-fdbfffff
PCI: Bridge: 0000:00:1c.1
IO window: a000-afff
MEM window: ff700000-ff7fffff
PREFETCH window: fdc00000-fdcfffff
PCI: Bridge: 0000:00:1c.2
IO window: disabled.
MEM window: ff800000-ff8fffff
PREFETCH window: fdd00000-fddfffff
PCI: Bridge: 0000:00:1c.3
IO window: disabled.
MEM window: ff900000-ff9fffff
PREFETCH window: fde00000-fdefffff
PCI: Bridge: 0000:00:1e.0
IO window: b000-bfff
MEM window: ff500000-ff5fffff
PREFETCH window: fe000000-fe7fffff
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 177
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 177
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 169
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
assign_interrupt_mode Found MSI capability
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
assign_interrupt_mode Found MSI capability
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Processor [CPU2] (supports 8 throttling states)
Real Time Clock Driver v1.12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
?serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 18 (level, low) -> IRQ 185
0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
sky2 v0.11 addr 0xff720000 irq 177 Yukon-EC (0xb6) rev 1
sky2 eth0: addr 00:11:11:43:05:2f
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 193
ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8804D00 ctl 0x0 bmdma 0x0 irq 193
ata2: SATA max UDMA/133 cmd 0xF8804D80 ctl 0x0 bmdma 0x0 irq 193
ata3: SATA max UDMA/133 cmd 0xF8804E00 ctl 0x0 bmdma 0x0 irq 193
ata4: SATA max UDMA/133 cmd 0xF8804E80 ctl 0x0 bmdma 0x0 irq 193
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: SATA link up 1.5 Gbps (SStatus 113)
ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: SATA link up 1.5 Gbps (SStatus 113)
ata3: dev 0 ATA-6, max UDMA/133, 156299375 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : ahci
ata4: SATA link down (SStatus 0)
scsi3 : ahci
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380013AS Rev: 3.18
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 >
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 156299375 512-byte hdwr sectors (80025 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
SCSI device sdc: 156299375 512-byte hdwr sectors (80025 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 58
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: irq 58, io mem 0xff4ff800
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
USB Universal Host Controller Interface driver v2.3
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 58
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 58, io base 0x0000cc00
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 193
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 193, io base 0x0000d000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 185
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 185, io base 0x0000d400
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 169
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 169, io base 0x0000d800
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
Initializing USB Mass Storage driver...
usb 5-1: new full speed USB device using uhci_hcd and address 2
usb 5-1: configuration #1 chosen from 1 choice
usb 5-2: new full speed USB device using uhci_hcd and address 3
usb 5-2: configuration #1 chosen from 1 choice
hub 5-2:1.0: USB hub found
hub 5-2:1.0: 4 ports detected
usb 5-2.1: new low speed USB device using uhci_hcd and address 4
usb 5-2.1: configuration #1 chosen from 1 choice
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
usbcore: registered new driver libusual
usbcore: registered new driver hiddev
input: Belkin Components Belkin OmniView KVM Switch as /class/input/input0
input: USB HID v1.00 Keyboard [Belkin Components Belkin OmniView KVM Switch] on usb-0000:00:1d.3-2.1
input: Belkin Components Belkin OmniView KVM Switch as /class/input/input1
input: USB HID v1.00 Mouse [Belkin Components Belkin OmniView KVM Switch] on usb-0000:00:1d.3-2.1
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
mice: PS/2 mouse device common for all mice
md: raid1 personality registered for level 1
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 9, 2621440 bytes)
TCP bind hash table entries: 65536 (order: 8, 1310720 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
IPv4 over IPv4 tunneling driver
ip_conntrack version 2.4 (8177 buckets, 65416 max) - 212 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
ipt_recent v0.3.1: Stephen Frost <[email protected]>. http://snowman.net/projects/ipt_recent/
arp_tables: (C) 2002 David S. Miller
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
p4-clockmod: P4/Xeon(TM) CPU On-Demand Clock Modulation available
Starting balanced_irq
Using IPI Shortcut mode
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb10 ...
md: adding sdb10 ...
md: sdb7 has different UUID to sdb10
md: sdb6 has different UUID to sdb10
md: sdb5 has different UUID to sdb10
md: sdb3 has different UUID to sdb10
md: sdb2 has different UUID to sdb10
md: adding sda10 ...
md: sda7 has different UUID to sdb10
md: sda6 has different UUID to sdb10
md: sda5 has different UUID to sdb10
md: sda3 has different UUID to sdb10
md: sda2 has different UUID to sdb10
md: created md5
md: bind<sda10>
md: bind<sdb10>
md: running: <sdb10><sda10>
raid1: raid set md5 active with 2 out of 2 mirrors
md5: bitmap initialized from disk: read 11/11 pages, set 3 bits, status: 0
created bitmap (161 pages) for device md5
md: considering sdb7 ...
md: adding sdb7 ...
md: sdb6 has different UUID to sdb7
md: sdb5 has different UUID to sdb7
md: sdb3 has different UUID to sdb7
md: sdb2 has different UUID to sdb7
md: adding sda7 ...
md: sda6 has different UUID to sdb7
md: sda5 has different UUID to sdb7
md: sda3 has different UUID to sdb7
md: sda2 has different UUID to sdb7
md: created md4
md: bind<sda7>
md: bind<sdb7>
md: running: <sdb7><sda7>
raid1: raid set md4 active with 2 out of 2 mirrors
md4: bitmap initialized from disk: read 4/4 pages, set 20 bits, status: 0
created bitmap (61 pages) for device md4
md: considering sdb6 ...
md: adding sdb6 ...
md: sdb5 has different UUID to sdb6
md: sdb3 has different UUID to sdb6
md: sdb2 has different UUID to sdb6
md: adding sda6 ...
md: sda5 has different UUID to sdb6
md: sda3 has different UUID to sdb6
md: sda2 has different UUID to sdb6
md: created md3
md: bind<sda6>
md: bind<sdb6>
md: running: <sdb6><sda6>
raid1: raid set md3 active with 2 out of 2 mirrors
md3: bitmap initialized from disk: read 1/1 pages, set 11 bits, status: 0
created bitmap (13 pages) for device md3
md: considering sdb5 ...
md: adding sdb5 ...
md: sdb3 has different UUID to sdb5
md: sdb2 has different UUID to sdb5
md: adding sda5 ...
md: sda3 has different UUID to sdb5
md: sda2 has different UUID to sdb5
md: created md2
md: bind<sda5>
md: bind<sdb5>
md: running: <sdb5><sda5>
raid1: raid set md2 active with 2 out of 2 mirrors
md2: bitmap initialized from disk: read 10/10 pages, set 187 bits, status: 0
created bitmap (150 pages) for device md2
md: considering sdb3 ...
md: adding sdb3 ...
md: sdb2 has different UUID to sdb3
md: adding sda3 ...
md: sda2 has different UUID to sdb3
md: created md1
md: bind<sda3>
md: bind<sdb3>
md: running: <sdb3><sda3>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 10/10 pages, set 5 bits, status: 0
created bitmap (150 pages) for device md1
md: considering sdb2 ...
md: adding sdb2 ...
md: adding sda2 ...
md: created md0
md: bind<sda2>
md: bind<sdb2>
md: running: <sdb2><sda2>
raid1: raid set md0 active with 2 out of 2 mirrors
md0: bitmap initialized from disk: read 12/12 pages, set 75 bits, status: 0
created bitmap (187 pages) for device md0
md: ... autorun DONE.
ReiserFS: md0: found reiserfs format "3.6" with standard journal
ReiserFS: md0: using ordered data mode
ReiserFS: md0: journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md0: checking transaction log (md0)
ReiserFS: md0: Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 204k freed
Write protecting the kernel read-only data: 341k
INIT: version 2.86 booting
Welcome to Fedora Core
Press 'I' to enter interactive startup.
Setting clock (utc): Wed Jan 11 01:16:06 NZDT 2006 [ OK ]
Starting udev:udevd-event[787]: udev_db_lookup_name: unable to open udev_db '/dev/.udev/db': No such file or directory
[ OK ]
Setting hostname tornado.reub.net: [ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1
/dev/sda1 has been mounted 30 times without being checked, check forced.
/dev/sda1: 39/6024 files (28.2% non-contiguous), 12389/24064 blocks
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Mounting local filesystems: [ OK ]
rm: cannot remove `/var/run/dovecot/login': Is a directory
Enabling swap space: [ OK ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Starting sysstat: Calling the system activity data collector (sadc):
[ OK ]
Checking for hardware changes [ OK ]
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: nat mangle filter [ OK ]
Unloading iptables modules: [ OK ]
Applying iptables firewall rules: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]
Bringing up interface gre0: [ OK ]
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]
Starting named: [ OK ]
Starting portmap: [ OK ]
Starting mdmonitor: [ OK ]
Mounting other filesystems: [ OK ]
Starting lm_sensors: Starting lm_sensors: i2c_adapter i2c-0: Unrecognized version/stepping 0x69 Defaulting to LM85.
[ OK ]
[ OK ]
Starting acpi daemon: [ OK ]
Starting hpiod: [ OK ]
Starting hpssd: [ OK ]
Starting cups:

<2 mins later>

SysRq : Show State

sibling
task PC pid father child younger older
init S C013EA16 0 1 0 2 (NOTLB)
c1924eb8 c0383400 c1924e70 c013ea16 000200d0 3eb3674a 00000019 00000000
000200d0 00000000 00000009 c1923b98 c1923a70 c1923550 c180f060 3eb3cee0
00000019 0000600b c1924000 00000282 c1924ecc c1924e9c 00000282 c180fee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c016816e>] do_select+0x28b/0x2a2
[<c0168374>] sys_select+0x1ce/0x374
[<c0102a7b>] sysenter_past_esp+0x54/0x75
migration/0 S C1923030 0 2 1 3 (L-TLB)
c192afac 00000086 0000001f c1923030 c1b43550 c52e7915 00000009 00000000
f741e680 00000001 00000001 c1923158 c1923030 c037da60 c1807060 c535cc9f
00000009 00000c07 c192a000 00000292 f6e68f48 c192af90 00000292 f6e68f44
Call Trace:
[<c01193f9>] migration_thread+0x7d/0x10b
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
ksoftirqd/0 S 00002040 0 3 1 4 2 (L-TLB)
c192cfa8 c1923c30 c037d880 00002040 c192cf44 0011523d 00000007 00000000
00000092 00000002 0000000a c192bb98 c192ba70 c037da60 c1807060 00141c2c
00000007 00000759 c192c000 00000246 c192ba70 c192cf8c 00000246 0000000e
Call Trace:
[<c012151f>] ksoftirqd+0xc4/0xc6
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
watchdog/0 S 00000001 0 4 1 5 3 (L-TLB)
c192df70 00000000 00000001 00000001 0019725e aaf8bf3a 00000019 00000002
00000000 c192df24 00000001 c192b678 c192b550 c037da60 c1807060 aaf8d6b0
00000019 0000074f c192d000 00000282 c192df84 c192df54 00000282 c1807ee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c032059a>] schedule_timeout_interruptible+0x17/0x19
[<c01251c5>] msleep_interruptible+0x34/0x43
[<c013901f>] watchdog+0x42/0x67
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
migration/1 S C180F9C0 0 5 1 6 4 (L-TLB)
c192efac 00000086 00000002 c180f9c0 c03219c3 c38ae12b 00000009 00000000
f769f700 00000000 00000001 c192b158 c192b030 f70b2a70 c180f060 c38c49b9
00000009 00000d49 c192e000 00000000 f7c19f48 f709c030 00000753 f70b2a70
Call Trace:
[<c01193f9>] migration_thread+0x7d/0x10b
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
ksoftirqd/1 S 0000A040 0 6 1 7 5 (L-TLB)
c1934fa8 c192b1f0 c037d880 0000a040 c1934f44 a97a01ad 00000001 00000001
00000000 00000000 0000000a c1933b98 c1933a70 c1923550 c180f060 a981a627
00000001 00001419 c1934000 00000246 c1933a70 c1934f8c 00000246 00000018
Call Trace:
[<c012151f>] ksoftirqd+0xc4/0xc6
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
watchdog/1 S 00000001 0 7 1 8 6 (L-TLB)
c1935f70 00000000 00000001 00000001 001a5d72 abaf76fc 00000019 00000001
00000000 c1935f24 00000009 c1933678 c1933550 c1923550 c180f060 abaf8725
00000019 00000988 c1935000 00000282 c1935f84 c1935f54 00000282 c180fee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c032059a>] schedule_timeout_interruptible+0x17/0x19
[<c01251c5>] msleep_interruptible+0x34/0x43
[<c013901f>] watchdog+0x42/0x67
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
events/0 S C1809178 0 8 1 9 7 (L-TLB)
c1938f38 c1807ee0 00000282 c1809178 c1938ed8 7125e045 00000019 c1938ef4
c012450e 00000000 0000000a c1933158 c1933030 c037da60 c1807060 71264f41
00000019 0000679c c1938000 00000001 c1938f18 00000246 c19072a0 c1938f24
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
events/1 S C1811178 0 9 1 10 8 (L-TLB)
c1951f38 c180fee0 00000282 c1811178 c1951ed8 719fad75 00000019 c1951ef4
c012450e 00000000 0000000a c1950b98 c1950a70 c1923550 c180f060 71a004ba
00000019 00004cad c1951000 00000001 c1951f18 00000246 c1907220 c1951f24
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
khelper S C193AF2C 0 10 1 11 9 (L-TLB)
c193af38 00800711 c193af20 c193af2c c0101068 97bb5059 00000009 00000000
00000000 c193aee4 00000008 c1939b98 c1939a70 f7c7da70 c1807060 97c68368
00000009 000026e6 c193a000 00000000 c193af18 c1923550 c19071a0 f7c7da70
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kthread S 00000002 0 11 1 14 162 10 (L-TLB)
c1960f38 f7cd7550 c180f060 00000002 00000000 3569a3fc 00000008 00000000
f769fd00 c1960f28 00000009 c1950678 c1950550 c1b7fa70 c1807060 359ad331
00000008 00000884 c1960000 00000000 0000041a f7cc4030 c195fca0 c1b7fa70
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kblockd/0 S 00000000 0 14 11 15 (L-TLB)
c196af38 c196aed4 c012f3d9 00000000 c1b91fa4 cb9b3ca3 00000009 c011842d
00000000 00000001 0000000a c1968678 c1968550 c037da60 c1807060 cb9b5a03
00000009 0000107e c196a000 00000001 c196af18 00000246 c193db20 c196af24
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kblockd/1 S 00000000 0 15 11 16 14 (L-TLB)
c196bf38 c196bed4 c012f3d9 00000000 c1b91fa4 c384124c 00000009 00000001
f741e680 00000001 0000000a c1968158 c1968030 c1b43550 c180f060 c38458bc
00000009 000027f3 c196b000 00000000 c196bf18 f7c7da70 c193daa0 c1b43550
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kacpid S C01177C5 0 16 11 108 15 (L-TLB)
c1b57f38 00000001 c1b57f10 c01177c5 00000000 01da9d16 00000000 00000000
c18079c0 00000000 00000009 c1939678 c1939550 c1923550 c180f060 01db1be1
00000000 000009d6 c1b57000 c03219a9 c1b57f38 00000246 c195fba0 c1b57f24
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
khubd S F7D52880 0 108 11 160 16 (L-TLB)
c1b44f7c 000003e8 f7d5d3a0 f7d52880 f7d52880 356556f2 00000002 c0273cee
00000004 f7d52880 0000000a c1b43b98 c1b43a70 c037da60 c1807060 35760561
00000002 00000bec c1b44000 00000000 00000246 c03999a8 c1b44f64 00000246
Call Trace:
[<c0275319>] hub_thread+0xd8/0x106
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
pdflush S 00000002 0 160 11 161 108 (L-TLB)
f7c27f88 f7c27f44 c0117c15 00000002 00000001 92dc34de 00000000 00000001
00000000 c037dc20 00000009 f7c26b98 f7c26a70 c1950550 c1807060 92defaf4
00000000 00000eef f7c27000 00000000 c011922d c1923550 00000003 c1950550
Call Trace:
[<c0140e01>] __pdflush+0x81/0x199
[<c0140f45>] pdflush+0x2c/0x32
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
pdflush D 00000000 0 161 11 163 160 (L-TLB)
f7c28c60 c1b98fa4 f741ab18 00000000 f7c28bfc caa787fa 00000009 c012f3d9
00000000 c1b98fa4 0000000a f7c26678 f7c26550 c1923550 c180f060 cad5c95a
00000009 0000081e f7c28000 f741ab08 00000246 f7db0948 f7c28c48 00000246
Call Trace:
[<c02a2f72>] md_write_start+0xbc/0x150
[<c029a659>] make_request+0x51/0x432
[<c01e1146>] generic_make_request+0xbe/0x13d
[<c01e120e>] submit_bio+0x49/0xd3
[<c015ad93>] submit_bh+0xc3/0x111
[<c015ae60>] ll_rw_block+0x7f/0xd3
[<c01ac41b>] flush_commit_list+0x1d3/0x503
[<c01b0d39>] do_journal_end+0x7c3/0x912
[<c01afce1>] journal_end_sync+0x65/0x77
[<c019e6d2>] reiserfs_sync_fs+0x57/0x67
[<c019e6ef>] reiserfs_write_super+0xd/0xf
[<c015ce15>] sync_supers+0xbe/0x103
[<c01405d8>] wb_kupdate+0x38/0x13e
[<c0140e3a>] __pdflush+0xba/0x199
[<c0140f45>] pdflush+0x2c/0x32
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
aio/0 S C01177C5 0 163 11 164 161 (L-TLB)
c1b8ff38 00000000 c1b8ff10 c01177c5 00000000 933ff7e0 00000000 00000001
c180f9c0 00000000 00000009 c1b8eb98 c1b8ea70 c037da60 c1807060 9341263b
00000000 00000c8b c1b8f000 c03219a9 c1b8ff38 00000246 c1bac6a0 c1b8ff24
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kswapd0 S 00000000 0 162 1 368 11 (L-TLB)
f7c29f88 f7c29f64 c01edef4 00000000 f7c2611c 933ff7e0 00000000 f7c26118
00000286 f7c26030 00000008 f7c26158 f7c26030 c037da60 c1807060 9340eda9
00000000 00001fe9 f7c29000 f7c29000 00000246 c0383484 f7c29f70 00000246
Call Trace:
[<c0143dab>] kswapd+0x114/0x128
[<c0100fe5>] kernel_thread_helper+0x5/0xb
aio/1 S C044A3BC 0 164 11 252 163 (L-TLB)
c1b40f38 c180f9c0 00000000 c044a3bc 00000080 934134da 00000000 00000000
00000000 00000000 00000009 c1b3d158 c1b3d030 c1923a70 c180f060 93416a14
00000000 0000089c c1b40000 00000000 c1b40f38 c037da60 c1bac620 c1923a70
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kseriod S 00000002 0 252 11 277 164 (L-TLB)
f7ceff78 c039ce48 00000000 00000002 00000001 52e70f94 00000002 f7ceff48
00000002 c039ce48 0000000a f7ced678 f7ced550 c037da60 c1807060 52f30b80
00000002 0002cbad f7cef000 f7ceff60 00000246 c0392cfc f7ceff60 00000246
Call Trace:
[<c02318e1>] serio_thread+0xb1/0x109
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
ata/0 S C01177C5 0 277 11 278 252 (L-TLB)
f7cecf38 00000001 f7cecf10 c01177c5 00000000 b6ac2621 00000000 00000001
00000000 00000000 00000009 f7ce0158 f7ce0030 c1950550 c1807060 b6acc04a
00000000 0000107e f7cec000 00000000 f7cecf38 c1923a70 f7cb8aa0 c1950550
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
ata/1 S C01177C5 0 278 11 280 277 (L-TLB)
f7ceef38 00000001 f7ceef10 c01177c5 00000000 b6aad46e 00000000 00000000
00000000 00000000 00000009 f7cedb98 f7ceda70 c1923a70 c180f060 b6ad0b1a
00000000 00000922 f7cee000 00000000 f7ceef38 c037da60 f7cb8a20 c1923a70
Call Trace:
[<c012b1b3>] worker_thread+0x12e/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
scsi_eh_0 S C03219C3 0 280 11 281 278 (L-TLB)
f7ce1fb0 00002000 f7ce1f44 c03219c3 f7ce1fc4 566df5ed 00000001 00000002
00000008 f7ce0b98 0000000a f7ce0b98 f7ce0a70 c037da60 c1807060 569effab
00000001 0000065b f7ce1000 c1807060 569ee457 00000001 00000cd1 f7ce1000
Call Trace:
[<c025e60a>] scsi_error_handler+0x48/0xa5
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
scsi_eh_1 S C03219C3 0 281 11 282 280 (L-TLB)
f7cf5fb0 00002000 f7cf5f44 c03219c3 f7cf5fc4 5724b4f1 00000001 00000002
00000009 f7cf1678 0000000a f7cf1678 f7cf1550 c037da60 c1807060 574feb15
00000001 00000595 f7cf5000 c1807060 574fd1f0 00000001 00000bee f7cf5000
Call Trace:
[<c025e60a>] scsi_error_handler+0x48/0xa5
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
scsi_eh_2 S C03219C3 0 282 11 283 281 (L-TLB)
f7cf0fb0 00000040 f7cf0f44 c03219c3 f7cf0fc4 58007f47 00000001 00000002
00000007 f7ced158 00000009 f7ced158 f7ced030 c1923550 c180f060 5800e2d9
00000001 000005ef f7cf0000 c1807060 5800c2ad 00000001 00000c3e f7cf0000
Call Trace:
[<c025e60a>] scsi_error_handler+0x48/0xa5
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
scsi_eh_3 S C03219C3 0 283 11 374 282 (L-TLB)
c1b3ffb0 00002140 c1b3ff44 c03219c3 c1b3ffc4 58b19b8e 00000001 00000000
00000000 c1b3d678 00000009 c1b43158 c1b43030 c1923a70 c180f060 58b1d409
00000001 00000495 c1b3f000 00000000 58b1b397 c037da60 00000cb1 c1923a70
Call Trace:
[<c025e60a>] scsi_error_handler+0x48/0xa5
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
kirqd S 0000001E 0 368 1 445 162 (L-TLB)
f7ccff88 00000000 00000000 0000001e 00000001 9e990eaa 00000019 00000000
f75ef380 00000246 0000000a f7ccd678 f7ccd550 c037da60 c1807060 9e993bc0
00000019 00002751 f7ccf000 00000282 f7ccff9c f7ccff6c 00000282 c1807ee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c032059a>] schedule_timeout_interruptible+0x17/0x19
[<c0111a85>] balanced_irq+0x8a/0xc0
[<c0100fe5>] kernel_thread_helper+0x5/0xb
md5_raid1 S F7DB0610 0 374 11 378 283 (L-TLB)
f7ccbf3c 00000008 f7ccbed0 f7db0610 f7ccbedc 918294c8 00000019 00000002
00000000 00000282 0000000a f7ccd158 f7ccd030 c1923550 c180f060 91bb8f46
00000019 02204783 f7ccb000 00000282 f7ccbf50 f7ccbf20 00000282 c180fee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c02a21e5>] md_thread+0x10f/0x14f
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
md4_raid1 S F7DB0410 0 378 11 382 374 (L-TLB)
c1bc9f3c 00000008 c1bc9ed0 f7db0410 c1bc9edc 91829d5b 00000019 00000002
00000000 00000282 0000000a c1bc7b98 c1bc7a70 f7c6f030 c1807060 918bb09d
00000019 00d3fe93 c1bc9000 00000000 c1bc9f50 f7ccd030 00000282 f7c6f030
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c02a21e5>] md_thread+0x10f/0x14f
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
md3_raid1 S F7DB0010 0 382 11 386 378 (L-TLB)
f7cc7f3c 00000008 f7cc7ed0 f7db0010 f7cc7edc 8d3a119d 00000019 00000002
00000000 00000282 0000000a c1b47b98 c1b47a70 c1923550 c180f060 8d5c6575
00000019 00224c3b f7cc7000 00000282 f7cc7f50 f7cc7f20 00000282 c180fee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c02a21e5>] md_thread+0x10f/0x14f
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
md2_raid1 D F7227200 0 386 11 390 382 (L-TLB)
c1b91e64 f7227200 00000001 f7227200 00000000 c2dc6fe1 00000010 00000001
00000001 00001000 0000000a c1b43678 c1b43550 c1923550 c180f060 c2dd5d38
00000010 00007fe0 c1b91000 c015b787 00000246 f7dd6f48 c1b91e4c 00000246
Call Trace:
[<c029d004>] md_super_wait+0xd5/0xea
[<c02a4f93>] bitmap_unplug+0x1d8/0x1df
[<c029b72b>] raid1d+0x7d/0x555
[<c02a211a>] md_thread+0x44/0x14f
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
md1_raid1 S F7DD6C10 0 390 11 393 386 (L-TLB)
c1b50f3c 00000008 c1b50ed0 f7dd6c10 c1b50edc 908ef164 00000019 00000002
00000000 00000282 0000000a c1bc7158 c1bc7030 c1bc7a70 c1807060 90b7b20a
00000019 01d324e4 c1b50000 00000282 c1b50f50 c1b50f20 00000282 c1807ee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c02a21e5>] md_thread+0x10f/0x14f
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
md0_raid1 D C1B98E24 0 393 11 394 390 (L-TLB)
c1b98e44 c197d300 c1903800 c1b98e24 c01e1146 caa79079 00000009 f7db0800
f7db0938 00000800 0000000a c1b8e158 c1b8e030 c037da60 c1807060 cad5fcef
00000009 00001c8f c1b98000 c197d300 00000246 f7db0948 c1b98e2c 00000246
Call Trace:
[<c029d004>] md_super_wait+0xd5/0xea
[<c029ec29>] md_update_sb+0xc9/0x153
[<c02a3a20>] md_check_recovery+0x182/0x437
[<c029b6cd>] raid1d+0x1f/0x555
[<c02a211a>] md_thread+0x44/0x14f
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
reiserfs/0 D C01183F2 0 394 11 395 393 (L-TLB)
f7c13d90 00000000 f7c13d24 c01183f2 f7c13d38 c2dc789f 00000010 c1b91fa4
f7405818 f7c13d5c 0000000a c1b92158 c1b92030 c037da60 c1807060 c2dcdf7a
00000010 00005dc5 f7c13000 c0118487 00000000 00000000 00000003 f7dd6e00
Call Trace:
[<c032047b>] io_schedule+0x26/0x30
[<c0157c7e>] sync_buffer+0x33/0x37
[<c0320663>] __wait_on_bit+0x45/0x62
[<c03206eb>] out_of_line_wait_on_bit+0x6b/0x73
[<c0157cef>] __wait_on_buffer+0x27/0x2d
[<c01ac0c8>] write_ordered_buffers+0x1b3/0x1fe
[<c01ac6ac>] flush_commit_list+0x464/0x503
[<c01afd6b>] flush_async_commits+0x78/0x7a
[<c012b221>] worker_thread+0x19c/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
reiserfs/1 D 00000000 0 395 11 394 (L-TLB)
f7c15e80 f7dd9c1c 00000000 00000000 c015dc50 cfa6ca71 00000009 00000000
00000000 00000000 0000000a c1b97b98 c1b97a70 c1923550 c180f060 cfa6e216
00000009 00000bda f7c15000 c16d0d60 c16e7720 c16ee120 c16d17e0 c16dfc00
Call Trace:
[<c03216e7>] __down+0xae/0x116
[<c031ef4a>] __down_failed+0xa/0x10
[<c01b0f17>] .text.lock.journal+0x8/0x111
[<c01afd6b>] flush_async_commits+0x78/0x7a
[<c012b221>] worker_thread+0x19c/0x226
[<c012ef67>] kthread+0x99/0x9d
[<c0100fe5>] kernel_thread_helper+0x5/0xb
udevd S C013EA16 0 445 1 1297 368 (NOTLB)
f7614eb8 c0383400 f7614e70 c013ea16 000200d0 3a68e9e7 00000008 00000000
000200d0 00000000 00000007 f7cc4158 f7cc4030 c037da60 c1807060 3a6a8cd6
00000008 00014e5d f7614000 c0383400 00000000 00000246 f765fa00 f7614ea4
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c016816e>] do_select+0x28b/0x2a2
[<c0168374>] sys_select+0x1ce/0x374
[<c0102a7b>] sysenter_past_esp+0x54/0x75
kjournald S 00000000 0 1297 1 1357 445 (L-TLB)
f7ccaf70 00000000 00000000 00000000 f7cc9da8 0a2c2793 00000004 00000032
c01183f2 f7ccaf2c 00000004 c1950158 c1950030 c037da60 c1807060 0a639496
00000004 00002f9f f7cca000 00000001 00000246 f77744b4 f7ccaf58 00000246
Call Trace:
[<c01c9985>] kjournald+0x22b/0x242
[<c0100fe5>] kernel_thread_helper+0x5/0xb
rc S 00000000 0 1357 1 1808 1662 1297 (NOTLB)
f7076f1c 00000044 00000003 00000000 000200d2 fa7112a3 00000008 f77fb550
000200d2 f7076f04 00000006 f77fb678 f77fb550 c037da60 c1807060 faa06439
00000008 00013318 f7076000 c17aa160 f7076efc 00000246 f7413e08 f7076f08
Call Trace:
[<c011fb02>] do_wait+0x304/0x39c
[<c011fc34>] sys_wait4+0x32/0x34
[<c011fc5d>] sys_waitpid+0x27/0x29
[<c0102a7b>] sysenter_past_esp+0x54/0x75
syslogd D 00000000 0 1662 1 1665 1357 (NOTLB)
f703be84 f703be20 c012f3d9 00000000 c1b91fa4 cbd82791 00000009 c011842d
00000000 00000001 00000009 c1bc2b98 c1bc2a70 c037da60 c1807060 cc08aca6
00000009 00001ebd f703b000 00000003 f7dd6e00 00000000 f703beec f703be70
Call Trace:
[<c032047b>] io_schedule+0x26/0x30
[<c0139f45>] sync_page+0x37/0x42
[<c0320663>] __wait_on_bit+0x45/0x62
[<c013a516>] wait_on_page_bit+0x6a/0x72
[<c013a0de>] wait_on_page_writeback_range+0x9b/0xf9
[<c013a2db>] filemap_fdatawait+0x4c/0x51
[<c0158197>] do_fsync+0x90/0xd9
[<c01581ed>] sys_fsync+0xd/0xf
[<c0102a7b>] sysenter_past_esp+0x54/0x75
klogd S C0115F42 0 1665 1 1676 1662 (NOTLB)
f7583d70 c16d1640 f7583d14 c0115f42 00000246 c3471a7e 00000009 00000000
f741e680 c1936dc0 0000000a f77fb158 f77fb030 c1bc2a70 c180f060 c36b6b1d
00000009 0000ae86 f7583000 00000000 0000004b f7c7da70 000002a3 c1bc2a70
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c0319665>] unix_wait_for_peer+0xd4/0xd8
[<c031a004>] unix_dgram_sendmsg+0x2ef/0x4fd
[<c02ac25f>] do_sock_write+0xa8/0xbe
[<c02ac3c6>] sock_aio_write+0x69/0x6d
[<c0156c5f>] do_sync_write+0xbb/0xf1
[<c0156dd4>] vfs_write+0x13f/0x146
[<c0156e7c>] sys_write+0x3d/0x64
[<c0102a7b>] sysenter_past_esp+0x54/0x75
named S 00000001 0 1676 1 1677 1665 (NOTLB)
f766df94 c01333dd f7693e8c 00000001 8005e828 cacebf41 00000007 00000001
00000000 00000000 00000009 c1b92678 c1b92550 c037da60 c1807060 cae3fe48
00000007 0006ff90 f766d000 bfc86fe8 00000246 c01ef0e6 bfc86fe8 f766df84
Call Trace:
[<c0101d55>] sys_rt_sigsuspend+0xab/0xc7
[<c0102a7b>] sysenter_past_esp+0x54/0x75
named S 00000001 0 1677 1 1678 1676 (NOTLB)
f6827ea8 c0115f42 f76feb80 00000001 00000001 c48052ee 00000015 00000001
f76feb80 c1b4ba70 00000009 f77dfb98 f77dfa70 c037da60 c1807060 c4814854
00000015 0000134f f6827000 00000000 00000000 c1b55a70 00000001 c1b4ba70
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c0133b00>] futex_wait+0x1dd/0x250
[<c0133dc8>] do_futex+0x49/0x83
[<c0133e67>] sys_futex+0x65/0xb8
[<c0102a7b>] sysenter_past_esp+0x54/0x75
named S C02B255C 0 1678 1 1679 1677 (NOTLB)
f6816ea8 00000000 f6816f4c c02b255c f6816f0c 266dda6f 00000019 c01ef0e6
f6816f0c f6816f4c 00000009 c1b55b98 c1b55a70 c1923550 c180f060 267c092e
00000019 000443a5 f6816000 f6816f0c 80105f54 80088c48 f6816f0c 00000008
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c0133b00>] futex_wait+0x1dd/0x250
[<c0133dc8>] do_futex+0x49/0x83
[<c0133e67>] sys_futex+0x65/0xb8
[<c0102a7b>] sysenter_past_esp+0x54/0x75
named S F76FEB80 0 1679 1 1680 1678 (NOTLB)
f77deea8 f77dee48 c0115f58 f76feb80 00000001 c48049ed 00000015 00000001
f76feb80 00000096 00000009 f7ccdb98 f7ccda70 c1923550 c180f060 c48113ce
00000015 00001881 f77de000 00000282 f77deebc f77dee8c 00000282 c180fee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c0133b00>] futex_wait+0x1dd/0x250
[<c0133dc8>] do_futex+0x49/0x83
[<c0133e67>] sys_futex+0x65/0xb8
[<c0102a7b>] sysenter_past_esp+0x54/0x75
named S C013EA16 0 1680 1 1702 1679 (NOTLB)
f6ee0eb8 c0383400 f6ee0e70 c013ea16 000200d0 266de319 00000019 00000000
000200d0 00000000 00000009 c1b4bb98 c1b4ba70 c037da60 c1807060 268b59ef
00000019 00000fc7 f6ee0000 f688307c f6ee0ea0 00000246 f6e6a518 f6ee0ea4
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c016816e>] do_select+0x28b/0x2a2
[<c0168374>] sys_select+0x1ce/0x374
[<c0102a7b>] sysenter_past_esp+0x54/0x75
portmap S F755DEF0 0 1702 1 1711 1680 (NOTLB)
f755df24 00000000 c17fa920 f755def0 c0383400 1bb69555 00000008 f6efbc98
f755ded4 00000246 00000004 f77df158 f77df030 c037da60 c1807060 1bcaafd5
00000008 0007d1ba f755d000 f68c7000 f76d5880 f755df9c f755df18 c0167dae
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c016865b>] do_poll+0x9a/0xb9
[<c01687e3>] sys_poll+0x169/0x226
[<c0102a7b>] sysenter_past_esp+0x54/0x75
mdadm S 00000096 0 1711 1 1791 1702 (NOTLB)
f7618e6c f7ccd030 f7618e04 00000096 00000001 0ebe717f 00000016 32388e4c
000000b7 00000000 00000009 f7065678 f7065550 c1923550 c180f060 0ebfc1d7
00000016 0001480d f7618000 00000000 ffffffff 013d1f60 ffffffff ffffffff
Call Trace:
[<c03217f2>] __down_interruptible+0xa3/0x133
[<c031ef5a>] __down_failed_interruptible+0xa/0x10
[<c02a3fba>] .text.lock.md+0xd7/0x11d
[<c017427c>] seq_read+0x1d8/0x2a4
[<c0156ae9>] vfs_read+0x89/0x144
[<c0156e18>] sys_read+0x3d/0x64
[<c0102a7b>] sysenter_past_esp+0x54/0x75
acpid S 000200D0 0 1791 1 1799 1711 (NOTLB)
f7c70f24 00000002 00000000 000200d0 00000000 554c894b 00000008 000000d0
f7c70f08 c013eaa9 00000004 f7c6f678 f7c6f550 c037da60 c1807060 555f2779
00000008 00033e39 f7c70000 f6916000 f771d780 f7c70f9c f7c70f18 c0167dae
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c016865b>] do_poll+0x9a/0xb9
[<c01687e3>] sys_poll+0x169/0x226
[<c0102a7b>] sysenter_past_esp+0x54/0x75
hpiod S 00000001 0 1799 1 1804 1791 (NOTLB)
f773ae0c 00200202 c0164105 00000001 c0383404 8d42e73d 00000009 00000000
00000000 f6923e20 00000009 f7691678 f7691550 f70b2030 c180f060 8d503e23
00000009 00035f90 f773a000 00000000 c016d45d c037da60 0000012c f70b2030
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c02d7525>] inet_csk_wait_for_connect+0xdb/0x104
[<c02d75b4>] inet_csk_accept+0x66/0x139
[<c02f41ad>] inet_accept+0x20/0xa2
[<c02acdcb>] sys_accept+0x88/0x119
[<c02ad734>] sys_socketcall+0xc1/0x254
[<c0102a7b>] sysenter_past_esp+0x54/0x75
hpiod S 00000000 0 1815 1 1804 (NOTLB)
f77d1cc4 00000000 00000000 00000000 00000000 8d42e73d 00000009 00000000
00000000 00000000 00000008 f70b2158 f70b2030 f7cc4550 c180f060 8d5081a4
00000009 00004381 f77d1000 00000000 00000000 c037da60 00000096 f7cc4550
Call Trace:
[<c0320544>] schedule_timeout+0x6f/0xae
[<c02aefee>] sk_wait_data+0xb3/0xec
[<c02d9bce>] tcp_recvmsg+0x385/0x720
[<c02af717>] sock_common_recvmsg+0x3d/0x53
[<c02abeb0>] sock_recvmsg+0xda/0xfe
[<c02ad117>] sys_recvfrom+0x79/0xbe
[<c02ad192>] sys_recv+0x36/0x38
[<c02ad7cb>] sys_socketcall+0x158/0x254
[<c0102a7b>] sysenter_past_esp+0x54/0x75
python S C013EA16 0 1804 1 1815 1799 (NOTLB)
f681ceb8 c0383400 f681ce70 c013ea16 000200d0 c24aa123 00000019 00000000
000200d0 00000000 0000000a f7cd7678 f7cd7550 c037da60 c1807060 c24b07ed
00000019 0000609e f681c000 00000282 f681cecc f681ce9c 00000282 c1807ee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c016816e>] do_select+0x28b/0x2a2
[<c0168374>] sys_select+0x1ce/0x374
[<c0102a7b>] sysenter_past_esp+0x54/0x75
S55cups S 00000000 0 1808 1357 1811 (NOTLB)
f7586f1c 00000044 00000003 00000000 000200d2 fb27d0c7 00000008 f70ba030
000200d2 f7586f04 00000003 f70ba158 f70ba030 c037da60 c1807060 fb2f2072
00000008 0000fd6c f7586000 c17ce140 f7586efc 00000246 f7413988 f7586f08
Call Trace:
[<c011fb02>] do_wait+0x304/0x39c
[<c011fc34>] sys_wait4+0x32/0x34
[<c011fc5d>] sys_waitpid+0x27/0x29
[<c0102a7b>] sysenter_past_esp+0x54/0x75
bash S 00000000 0 1811 1808 1812 (NOTLB)
f680ef1c c013a636 000000ac 00000000 c17f23c0 fb27c91b 00000008 f7c78550
000200d2 f680ef2c 00000001 f7c78678 f7c78550 c1923550 c180f060 fb4f30ba
00000008 000140c2 f680e000 c17c8aa0 00000000 00000246 f76dcb08 f680ef08
Call Trace:
[<c011fb02>] do_wait+0x304/0x39c
[<c011fc34>] sys_wait4+0x32/0x34
[<c011fc5d>] sys_waitpid+0x27/0x29
[<c0102a7b>] sysenter_past_esp+0x54/0x75
cupsd S F70B2A70 0 1812 1811 1813 (NOTLB)
f7073f44 f7073ee0 f7073000 f70b2a70 f7073ee0 cd3cd31c 00000019 c011b664
00000000 c17d35a0 00000009 f7c6f158 f7c6f030 c037da60 c1807060 cd3cfcfb
00000019 00001776 f7073000 00000282 f7073f58 f7073f28 00000282 c1807ee0
Call Trace:
[<c032051e>] schedule_timeout+0x49/0xae
[<c032059a>] schedule_timeout_interruptible+0x17/0x19
[<c0124ee2>] sys_nanosleep+0xcc/0x12f
[<c0102a7b>] sysenter_past_esp+0x54/0x75
cupsd S BFBE1C0C 0 1813 1812 1837 (NOTLB)
f6804f1c 00000000 00000000 bfbe1c0c bfbe1c0c bce98a58 00000010 f6804000
f6804f08 f6804ee8 00000001 f70b2b98 f70b2a70 c037da60 c1807060 bcebb163
00000010 000217ca f6804000 c03209b6 c0162893 00000246 f7d1c688 f6804f08
Call Trace:
[<c011fb02>] do_wait+0x304/0x39c
[<c011fc34>] sys_wait4+0x32/0x34
[<c011fc5d>] sys_waitpid+0x27/0x29
[<c0102a7b>] sysenter_past_esp+0x54/0x75
smb D 00000082 0 1837 1813 (NOTLB)
f6e68e28 f6e68de0 c02601f4 00000082 f7cbf3dc cccbd193 00000009 f7cbf3dc
f7cf39a0 00000001 00000007 f70d9b98 f70d9a70 c037da60 c1807060 ccd313bc
00000009 0001d758 f6e68000 00000003 f7db0800 f6e68e88 f6e68e94 f6e68e14
Call Trace:
[<c032047b>] io_schedule+0x26/0x30
[<c0139f45>] sync_page+0x37/0x42
[<c0320733>] __wait_on_bit_lock+0x40/0x63
[<c013a5f4>] __lock_page+0x64/0x6c
[<c013b737>] filemap_nopage+0x2e5/0x3a0
[<c0147681>] do_no_page+0x7a/0x266
[<c014799f>] __handle_mm_fault+0xbf/0x20a
[<c0114fbf>] do_page_fault+0x37b/0x5af
[<c0103673>] error_code+0x4f/0x54

Showing all blocking locks in the system:
S init: 1 [c1923a70, 116] (not blocked on mutex)
S migration/0: 2 [c1923030, 0] (not blocked on mutex)
S ksoftirqd/0: 3 [c192ba70, 134] (not blocked on mutex)
S watchdog/0: 4 [c192b550, 0] (not blocked on mutex)
S migration/1: 5 [c192b030, 0] (not blocked on mutex)
S ksoftirqd/1: 6 [c1933a70, 134] (not blocked on mutex)
S watchdog/1: 7 [c1933550, 0] (not blocked on mutex)
S events/0: 8 [c1933030, 110] (not blocked on mutex)
S events/1: 9 [c1950a70, 110] (not blocked on mutex)
S khelper: 10 [c1939a70, 112] (not blocked on mutex)
S kthread: 11 [c1950550, 111] (not blocked on mutex)
S kblockd/0: 14 [c1968550, 110] (not blocked on mutex)
S kblockd/1: 15 [c1968030, 110] (not blocked on mutex)
S kacpid: 16 [c1939550, 111] (not blocked on mutex)
S khubd: 108 [c1b43a70, 110] (not blocked on mutex)
S pdflush: 160 [f7c26a70, 120] (not blocked on mutex)
D pdflush: 161 [f7c26550, 115] (not blocked on mutex)
S aio/0: 163 [c1b8ea70, 111] (not blocked on mutex)
S kswapd0: 162 [f7c26030, 117] (not blocked on mutex)
S aio/1: 164 [c1b3d030, 111] (not blocked on mutex)
S kseriod: 252 [f7ced550, 110] (not blocked on mutex)
S ata/0: 277 [f7ce0030, 111] (not blocked on mutex)
S ata/1: 278 [f7ceda70, 111] (not blocked on mutex)
S scsi_eh_0: 280 [f7ce0a70, 110] (not blocked on mutex)
S scsi_eh_1: 281 [f7cf1550, 110] (not blocked on mutex)
S scsi_eh_2: 282 [f7ced030, 111] (not blocked on mutex)
S scsi_eh_3: 283 [c1b43030, 111] (not blocked on mutex)
S kirqd: 368 [f7ccd550, 115] (not blocked on mutex)
S md5_raid1: 374 [f7ccd030, 110] (not blocked on mutex)
S md4_raid1: 378 [c1bc7a70, 110] (not blocked on mutex)
S md3_raid1: 382 [c1b47a70, 110] (not blocked on mutex)
D md2_raid1: 386 [c1b43550, 110] (not blocked on mutex)
S md1_raid1: 390 [c1bc7030, 110] (not blocked on mutex)
D md0_raid1: 393 [c1b8e030, 110] (not blocked on mutex)
D reiserfs/0: 394 [c1b92030, 110] (not blocked on mutex)
D reiserfs/1: 395 [c1b97a70, 110] (not blocked on mutex)
S udevd: 445 [f7cc4030, 114] (not blocked on mutex)
S kjournald: 1297 [c1950030, 121] (not blocked on mutex)
S rc: 1357 [f77fb550, 117] (not blocked on mutex)
D syslogd: 1662 [c1bc2a70, 116] (not blocked on mutex)
S klogd: 1665 [f77fb030, 115] (not blocked on mutex)
S named: 1676 [c1b92550, 116] (not blocked on mutex)
S named: 1677 [f77dfa70, 116] (not blocked on mutex)
S named: 1678 [c1b55a70, 116] (not blocked on mutex)
S named: 1679 [f7ccda70, 116] (not blocked on mutex)
S named: 1680 [c1b4ba70, 116] (not blocked on mutex)
S portmap: 1702 [f77df030, 121] (not blocked on mutex)
S mdadm: 1711 [f7065550, 116] (not blocked on mutex)
S acpid: 1791 [f7c6f550, 121] (not blocked on mutex)
S hpiod: 1799 [f7691550, 116] (not blocked on mutex)
S hpiod: 1815 [f70b2030, 117] (not blocked on mutex)
S python: 1804 [f7cd7550, 115] (not blocked on mutex)
S S55cups: 1808 [f70ba030, 120] (not blocked on mutex)
S bash: 1811 [f7c78550, 123] (not blocked on mutex)
S cupsd: 1812 [f7c6f030, 116] (not blocked on mutex)
S cupsd: 1813 [f70b2a70, 122] (not blocked on mutex)
D smb: 1837 [f70d9a70, 118] (not blocked on mutex)

---------------------------
| showing all locks held: |
---------------------------

#001: [f7dd6a58] {alloc_super}
.. held by: pdflush: 161 [f7c26550, 115]
... acquired at: sync_supers+0xa3/0x103

=============================================

reuben

2006-01-10 12:43:11

by Andrew Morton

Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
> CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
> patch from Ingo.

This is quite ugly. I'd be suspecting a block layer problem: RAID or the
underlying device driver (ahci) has lost an IO.

2006-01-10 13:16:13

by Ingo Molnar

Subject: Re: 2.6.15-mm2


* Andrew Morton <[email protected]> wrote:

> Reuben Farrelly <[email protected]> wrote:
> >
> > Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
> > CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
> > patch from Ingo.
>
> This is quite ugly. I'd be suspecting a block layer problem: RAID or
> the underlying device driver (ahci) has lost an IO.

yeah, now it more looks like that to me too. What happens is a raid1
resync happens in the background - which is one of the more complex
raid1 workloads - and there've been a good number of md patches
recently. Reuben, does -git5 show the same symptoms?

Ingo

2006-01-10 21:20:23

by Serge E. Hallyn

Subject: Re: 2.6.15-mm2

Quoting Andrew Morton ([email protected]):
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm2/

With both this and 2.6.15-mm1, but not with 2.6.15, I get the following
error:

mm/built-in.o(*ABS*+0x39c3c7ac): In function `__crc___handle_mm_fault':
slab.c: multiple definition of `__crc___handle_mm_fault'
make: *** [.tmp_vmlinux1] Error 1

The culprit appears to be that there are two places where
__handle_mm_fault is EXPORT_SYMBOL_GPLed. The following trivial patch
fixes it for me.

Signed-off-by: Serge Hallyn <[email protected]>

Index: linux-2.6.15/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6.15.orig/arch/powerpc/kernel/ppc_ksyms.c 2006-01-10 04:59:11.000000000 -0600
+++ linux-2.6.15/arch/powerpc/kernel/ppc_ksyms.c 2006-01-10 09:07:40.000000000 -0600
@@ -240,8 +240,6 @@ EXPORT_SYMBOL(next_mmu_context);
EXPORT_SYMBOL(set_context);
#endif

-EXPORT_SYMBOL_GPL(__handle_mm_fault);
-
#ifdef CONFIG_PPC_STD_MMU_32
extern long mol_trampoline;
EXPORT_SYMBOL(mol_trampoline); /* For MOL */
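
A duplicate export like this can be caught before the link step by scanning the tree for symbols exported from more than one file. What follows is only a rough Python sketch of that idea, not an existing kernel tool: `duplicate_exports` is a made-up helper, and `mm/some_file.c` is a placeholder, since the error above only says the other export lives somewhere in mm/built-in.o.

```python
import re
from collections import defaultdict

# Matches EXPORT_SYMBOL(name) and EXPORT_SYMBOL_GPL(name) at the start of a line.
EXPORT_RE = re.compile(r"^\s*EXPORT_SYMBOL(?:_GPL)?\s*\(\s*(\w+)\s*\)", re.MULTILINE)


def duplicate_exports(sources):
    """sources maps a file name to its text.

    Returns {symbol: [files]} for every symbol exported more than once.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        for match in EXPORT_RE.finditer(text):
            seen[match.group(1)].append(name)
    return {sym: files for sym, files in seen.items() if len(files) > 1}


# Trimmed, partly hypothetical file contents for illustration only.
sample = {
    "mm/some_file.c": "EXPORT_SYMBOL_GPL(__handle_mm_fault);\n",
    "arch/powerpc/kernel/ppc_ksyms.c": (
        "EXPORT_SYMBOL(set_context);\n"
        "EXPORT_SYMBOL_GPL(__handle_mm_fault);\n"
    ),
}

print(duplicate_exports(sample))
```

On a real tree the `sources` dictionary would be filled by reading the .c files; arch-specific ksyms files would only matter for the architecture being built.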


thanks,
-serge

2006-01-11 02:24:54

by Paul Jackson

Subject: Re: 2.6.15-mm2: alpha broken

Andrew wrote:
> This is caused by the inclusion of user.h in kernel.h added by
> dump_thread-cleanup.patch.

This same build breakage showed up on ia64 sn2_defconfig,
and your patch fixes it nicely. Thanks.

Acked-by: Paul Jackson <[email protected]>


Adrian - I think that was your dump_thread-cleanup patch.

Please be sure to cross build other arch's when making non-local
changes, such as this one that affected the files:

arch/alpha/kernel/alpha_ksyms.c
arch/arm26/kernel/armksyms.c
arch/cris/kernel/crisksyms.c
arch/cris/kernel/process.c
arch/frv/kernel/frv_ksyms.c
arch/frv/kernel/process.c
arch/h8300/kernel/h8300_ksyms.c
arch/h8300/kernel/process.c
arch/m32r/kernel/m32r_ksyms.c
arch/m32r/kernel/process.c
arch/m68k/kernel/m68k_ksyms.c
arch/m68knommu/kernel/m68k_ksyms.c
arch/m68knommu/kernel/process.c
arch/s390/kernel/process.c
arch/sh64/kernel/process.c
arch/sh64/kernel/sh_ksyms.c
arch/sh/kernel/process.c
arch/sh/kernel/sh_ksyms.c
arch/sparc64/kernel/binfmt_aout32.c
arch/sparc64/kernel/sparc64_ksyms.c
arch/sparc/kernel/sparc_ksyms.c
arch/v850/kernel/process.c
arch/v850/kernel/v850_ksyms.c
fs/binfmt_aout.c
fs/binfmt_flat.c
include/asm-um/processor-generic.h
include/linux/kernel.h

Sure, it consumes some time, but better you do it once, than each of
several of us have to first do a bisection on Andrew's gazillion
patches to find the culprit, and then stare at the patch until the light
bulb goes on in our dimm brains, only to grep back through the lkml
messages to find that we are not alone in our misery.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-01-11 04:16:49

by NeilBrown

Subject: Re: 2.6.15-mm2

On Tuesday January 10, [email protected] wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > Reuben Farrelly <[email protected]> wrote:
> > >
> > > Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
> > > CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
> > > patch from Ingo.
> >
> > This is quite ugly. I'd be suspecting a block layer problem: RAID or
> > the underlying device driver (ahci) has lost an IO.
>
> yeah, now it more looks like that to me too. What happens is a raid1
> resync happens in the background - which is one of the more complex
> raid1 workloads - and there've been a good number of md patches
> recently. Reuben, does -git5 show the same symptoms?

There isn't a resync happening - if there was you would see a process
called
mdX_resync
(for some X).

What I see here is:
pdflush at:
Call Trace:
[<c02a2f72>] md_write_start+0xbc/0x150
[<c029a659>] make_request+0x51/0x432
[<c01e1146>] generic_make_request+0xbe/0x13d
[<c01e120e>] submit_bio+0x49/0xd3

So it is trying to write to a raid1 which was 'clean' and needs to
be marked 'dirty' (or 'active') before the first write.
md_write_start arranges for the array's thread to do this.
What is that thread doing?

md2_raid1 D F7227200 0 386 11 390 382 (L-TLB)
...
Call Trace:
[<c029d004>] md_super_wait+0xd5/0xea
[<c02a4f93>] bitmap_unplug+0x1d8/0x1df
[<c029b72b>] raid1d+0x7d/0x555
[<c02a211a>] md_thread+0x44/0x14f

It probably hasn't tried to write out the superblock, and just
now it is writing out some write-intent-bitmap entries and waiting
for the write to complete.

md_super_wait is waiting for 'pending_writes' to become zero.
It is incremented when any superblock or bitmap write starts, and
is decremented when that write completes.
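
To make the failure mode concrete, here is a toy model of such a counter in Python. It is not the md code: the `PendingWrites` class and its method names are invented for illustration, and a timeout stands in for what would be an indefinite sleep in the kernel. It shows that one completion that never arrives is enough to keep the waiter blocked.

```python
import threading


class PendingWrites:
    """Toy model of a 'pending writes' counter; not the md implementation."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def start_write(self):
        # Called when a superblock or bitmap write is submitted.
        with self._cond:
            self._count += 1

    def complete_write(self):
        # Called when a write finishes; wake waiters once nothing is pending.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_for_zero(self, timeout):
        """Return True once the counter reaches zero, False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


writes = PendingWrites()
writes.start_write()
writes.start_write()
writes.complete_write()            # only one of the two completions arrives
stuck = not writes.wait_for_zero(timeout=0.1)
print("waiter stuck:", stuck)

writes.complete_write()            # the missing completion finally shows up
released = writes.wait_for_zero(timeout=0.1)
print("waiter released:", released)
```

Without the timeout, the first `wait_for_zero` call would never return, which matches the D-state threads in the traces.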

So a lost write request in one of the components of the array could
cause this, but it is too easy to simply blame it on someone else....

But there is something I don't understand....

If md2_raid1 is in bitmap_unplug, that means there are outstanding
write requests to md2_raid1, so the one that pdflush is currently
generating cannot be the first.

This suggests that pdflush is not writing to md2, but to something
else.
Ahhhh.. md0_raid1 is also blocked:
Call Trace:
[<c029d004>] md_super_wait+0xd5/0xea
[<c029ec29>] md_update_sb+0xc9/0x153
[<c02a3a20>] md_check_recovery+0x182/0x437
[<c029b6cd>] raid1d+0x1f/0x555

It has just updated the superblocks for md0 and is waiting for those
writes to complete. But they don't seem to want to complete.

So it seems that two raid1 arrays are blocked in slightly different
places.

I'm tempted to blame the IO scheduler, only because there have been
vaguely similar problems in the recent past that can be avoided by
changing the scheduler.

Reuben: could you check what IO scheduler your drives are using, and
try changing it. I suspect they use 'as' by default. Try 'cfq' or
'deadline'.
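
For reference, 2.6 kernels normally expose the per-device scheduler through sysfs, with the active one shown in square brackets. The Python sketch below parses that format; it assumes the usual /sys/block/<dev>/queue/scheduler layout, and the helper names and sample string are illustrative rather than taken from Reuben's machine.

```python
def parse_scheduler(text):
    """Return (active, available) from text such as 'noop deadline [cfq]'."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]      # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def scheduler_path(device):
    """Assumed sysfs location of the scheduler attribute for a block device."""
    return f"/sys/block/{device}/queue/scheduler"


# Illustrative contents of the sysfs file, not captured from a real system.
sample = "noop anticipatory [deadline] cfq"
print(parse_scheduler(sample))
print(scheduler_path("sda"))
```

Reading that file for each member disk shows the current choice, and writing one of the listed names to it as root should switch the scheduler at run time, without rebooting with elevator=.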

NeilBrown

2006-01-11 05:16:31

by Reuben Farrelly

Subject: Re: 2.6.15-mm2



On 11/01/2006 5:16 p.m., Neil Brown wrote:
> On Tuesday January 10, [email protected] wrote:
>> * Andrew Morton <[email protected]> wrote:
>>
>>> Reuben Farrelly <[email protected]> wrote:
>>>> Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER,
>>>> CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING);
>>>> patch from Ingo.
>>> This is quite ugly. I'd be suspecting a block layer problem: RAID or
>>> the underlying device driver (ahci) has lost an IO.
>> yeah, now it more looks like that to me too. What happens is a raid1
>> resync happens in the background - which is one of the more complex
>> raid1 workloads - and there've been a good number of md patches
>> recently. Reuben, does -git5 show the same symptoms?
>
> There isn't a resync happening - if there was you would see a process
> called
> mdX_resync
> (for some X).
>
> What I see here is:
> pdflush at:
> Call Trace:
> [<c02a2f72>] md_write_start+0xbc/0x150
> [<c029a659>] make_request+0x51/0x432
> [<c01e1146>] generic_make_request+0xbe/0x13d
> [<c01e120e>] submit_bio+0x49/0xd3
>
> So it is trying to write to a raid1 which was 'clean' and needs to
> be marked 'dirty' (or 'active') before the first write.
> md_write_start arranges for the array's thread to do this.
> What is that thread doing?
>
> md2_raid1 D F7227200 0 386 11 390 382 (L-TLB)
> ...
> Call Trace:
> [<c029d004>] md_super_wait+0xd5/0xea
> [<c02a4f93>] bitmap_unplug+0x1d8/0x1df
> [<c029b72b>] raid1d+0x7d/0x555
> [<c02a211a>] md_thread+0x44/0x14f
>
> It probably hasn't tried to write out the superblock, and just
> now it is writing out some write-intent-bitmap entries and waiting
> for the write to complete.
>
> md_super_wait is waiting for 'pending_writes' to become zero.
> It is incremented when any superblock or bitmap write starts, and
> is decremented when that write completes.
>
> So a lost write request in one of the components of the array could
> cause this, but it is too easy to simply blame it on someone else....
>
> But there is something I don't understand....
>
> If md2_raid1 is in bitmap_unplug, that means there are outstanding
> write requests to md2_raid1, so the one that pdflush is currently
> generating cannot be the first.
>
> This suggests that pdflush is not writing to md2, but to something
> else.
> Ahhhh.. md0_raid1 is also blocked:
> Call Trace:
> [<c029d004>] md_super_wait+0xd5/0xea
> [<c029ec29>] md_update_sb+0xc9/0x153
> [<c02a3a20>] md_check_recovery+0x182/0x437
> [<c029b6cd>] raid1d+0x1f/0x555
>
> It has just updated the superblocks for md0 and is waiting for those
> writes to complete. But they don't seem to want to complete.
>
> So it seems that two raid1 arrays are blocked in slightly different
> places.
>
> I'm tempted to blame the IO scheduler, only because there have been
> vaguely similar problems in the recent past that can be avoided by
> changing the scheduler.
>
> Reuben: could you check what IO scheduler your drives are using, and
> try changing it. I suspect they use 'as' by default. Try 'cfq' or
> 'deadline'.

By default it was using 'deadline', but I just added elevator=as to my kernel
command line, and it still failed in the same way :( I'm building all four
schedulers into the kernel (should probably optimise that to one someday but not
now..)

I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
that would narrow it down to say, 200 or so patches ;-)

reuben


2006-01-11 05:30:40

by Andrew Morton

Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
> that would narrow it down to say, 200 or so patches ;-)

If -mm2 plus -mm2's linus.patch does not fail then
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt will
find the dud patch.

2006-01-11 05:31:27

by Andrew Morton

Subject: Re: 2.6.15-mm2

Andrew Morton <[email protected]> wrote:
>
> Reuben Farrelly <[email protected]> wrote:
> >
> > I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
> > that would narrow it down to say, 200 or so patches ;-)
>
> If -mm2 plus -mm2's linus.patch does not fail then
> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt will
> find the dud patch.

Actually 2.6.15-mm1 would be a better one to do the bisection on: it has
all the md- patches separated out.

2006-01-11 10:50:24

by Reuben Farrelly

Subject: Re: 2.6.15-mm2

On 11/01/2006 6:30 p.m., Andrew Morton wrote:
> Andrew Morton <[email protected]> wrote:
>> Reuben Farrelly <[email protected]> wrote:
>>> I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
>>> that would narrow it down to say, 200 or so patches ;-)
>> If -mm2 plus -mm2's linus.patch does not fail then
>> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt will
>> find the dud patch.
>
> Actually 2.6.15-mm1 would be a better one to do the bisection on: it has
> all the md- patches separated out.

I've done some more testing - which may change the suggested approach somewhat..

2.6.15-mm1 is OK, I'm running it now, rebooted probably 15 times and it's come
up every time.
2.6.15-git2 is OK, booted up to completion (tested once).
2.6.15-git3 was a dud, bootup hung
2.6.15- [linus.patch from -mm2, which is basically the same as -git3] won't boot
2.6.15-mm2 doesn't boot either, tested many times
2.6.15-git6 won't boot
2.6.15-git7 got stuck also, same issue

So some change that went in between -git2 and -git3 seems to have caused it.
Nothing from -git3 onwards has ever booted to completion.
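
The same conclusion can be read mechanically off the results. The short Python sketch below only restates the -gitN results listed above; `regression_window` is an illustrative helper, and it assumes that once a snapshot fails, every later one fails too.

```python
import re

# Boot results as reported above: True means it booted to completion.
results = {
    "2.6.15-git2": True,
    "2.6.15-git3": False,
    "2.6.15-git6": False,
    "2.6.15-git7": False,
}


def git_number(release):
    """Extract N from a name ending in -gitN."""
    return int(re.search(r"-git(\d+)$", release).group(1))


def regression_window(results):
    """Return (last_good, first_bad) among the -gitN snapshots tested."""
    last_good = None
    for release in sorted(results, key=git_number):
        if results[release]:
            last_good = release
        else:
            return last_good, release
    return last_good, None


print(regression_window(results))
```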

Is there any chance a patch came in, was queued in -mm but was never released in
any -mm (1|2) release before being sent to Linus/-gitX? (in this case, -git3).
The reason I suggest this is because -mm1 didn't have the problem, but in -mm2
it was visible by just the linus.patch from that release.

I'm not sure where this leaves quilt testing. Would quilt testing just narrow
me down to it being the linus.patch in mm which actually caused it? (Which I
already know is the source)..

I've put the -git7 .config and a tasks list up on
http://www.reub.net/files/kernel/ again in case anyone wants to verify that it
is still the same problem.

What a pain of a bug.

reuben





2006-01-11 11:06:22

by Andrew Morton

Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> On 11/01/2006 6:30 p.m., Andrew Morton wrote:
> > Andrew Morton <[email protected]> wrote:
> >> Reuben Farrelly <[email protected]> wrote:
> >>> I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
> >>> that would narrow it down to say, 200 or so patches ;-)
> >> If -mm2 plus -mm2's linus.patch does not fail then
> >> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt will
> >> find the dud patch.
> >
> > Actually 2.6.15-mm1 would be a better one to do the bisection on: it has
> > all the md- patches separated out.
>
> I've done some more testing - which may change the suggested approach somewhat..
>
> 2.6.15-mm1 is OK, I'm running it now, rebooted probably 15 times and it's come
> up every time.
> 2.6.15-git2 is OK, booted up to completion (tested once).
> 2.6.15-git3 was a dud, bootup hung

Ah.

> 2.6.15- [linus.patch from -mm2, which is basically the same as -git3] won't boot
> 2.6.15-mm2 doesn't boot either, tested many times
> 2.6.15-git6 won't boot
> 2.6.15-git7 got stuck also, same issue
>
> So some change that went in between -git2 and -git3 seems to have caused it.
> Nothing from -git3 onwards has ever booted to completion.
>
> Is there any chance a patch came in, was queued in -mm but was never released in
> any -mm (1|2) release before being sent to Linus/-gitX? (in this case, -git3).

Yes, people sneak stuff in at the last minute.

Neil thinks that an IO got lost. In the git2->git3 diff we have:

b/drivers/scsi/Kconfig | 10
b/drivers/scsi/ahci.c | 1
b/drivers/scsi/ata_piix.c | 5
b/drivers/scsi/libata-core.c | 145 +
b/drivers/scsi/libata-scsi.c | 48
b/drivers/scsi/libata.h | 4
b/drivers/scsi/sata_mv.c | 1
b/drivers/scsi/sata_promise.c | 1
b/drivers/scsi/sata_sil.c | 1
b/drivers/scsi/sata_sil24.c | 1
b/drivers/scsi/sata_sx4.c | 1
b/drivers/scsi/scsi_lib.c | 50
b/drivers/scsi/scsi_sysfs.c | 31
b/drivers/scsi/sd.c | 85 -
b/fs/bio.c | 26

Jens, Jeff: were any of those changes added in the final day or two, not
included in the trees which I pull?


>
> I'm not sure where this leaves quilt testing. Would quilt testing just narrow
> me down to it being the linus.patch in mm which actually caused it? (Which I
> already know is the source)..

Yes, there's not much point in that.

`git bisect' will find it.
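
The idea behind bisection is a binary search over an ordered list of changes, where each test halves the remaining range. The Python sketch below illustrates only that idea, not git's implementation: the change names, the culprit and the `first_bad` helper are all hypothetical, and the boot test is assumed to be reliable.

```python
def first_bad(candidates, is_bad):
    """Binary search for the first bad entry.

    candidates is ordered oldest to newest; the oldest entry is assumed to
    be known good and the newest known bad. Returns (first_bad, tests_run).
    """
    low, high = 0, len(candidates) - 1   # low: known good, high: known bad
    tests = 0
    while high - low > 1:
        mid = (low + high) // 2
        tests += 1
        if is_bad(candidates[mid]):
            high = mid
        else:
            low = mid
    return candidates[high], tests


# Hypothetical history of 200 changes with one culprit.
changes = [f"change-{i:03d}" for i in range(200)]
culprit = "change-137"

found, tests = first_bad(changes, lambda change: change >= culprit)
print(found, tests)
```

For a range of about 200 changes this needs at most eight build-and-boot cycles, which is why bisecting is practical even when the suspect diff is large.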

2006-01-11 11:11:49

by Jens Axboe

Subject: Re: 2.6.15-mm2

On Wed, Jan 11 2006, Andrew Morton wrote:
> Reuben Farrelly <[email protected]> wrote:
> >
> > On 11/01/2006 6:30 p.m., Andrew Morton wrote:
> > > Andrew Morton <[email protected]> wrote:
> > >> Reuben Farrelly <[email protected]> wrote:
> > >>> I'm tempted to see if I can narrow it down to a specific -gitX release, maybe
> > >>> that would narrow it down to say, 200 or so patches ;-)
> > >> If -mm2 plus -mm2's linus.patch does not fail then
> > >> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt will
> > >> find the dud patch.
> > >
> > > Actually 2.6.15-mm1 would be a better one to do the bisection on: it has
> > > all the md- patches separated out.
> >
> > I've done some more testing - which may change the suggested approach somewhat..
> >
> > 2.6.15-mm1 is OK, I'm running it now, rebooted probably 15 times and it's come
> > up every time.
> > 2.6.15-git2 is OK, booted up to completion (tested once).
> > 2.6.15-git3 was a dud, bootup hung
>
> Ah.
>
> > 2.6.15- [linus.patch from -mm2, which is basically the same as -git3] won't boot
> > 2.6.15-mm2 doesn't boot either, tested many times
> > 2.6.15-git6 won't boot
> > 2.6.15-git7 got stuck also, same issue
> >
> > So some change that went in between -git2 and -git3 seems to have caused it.
> > Nothing from -git3 onwards has ever booted to completion.
> >
> > Is there any chance a patch came in, was queued in -mm but was never released in
> > any -mm (1|2) release before being sent to Linus/-gitX? (in this case, -git3).
>
> Yes, people sneak stuff in at the last minute.
>
> Neil thinks that an IO got lost. In the git2->git3 diff we have:
>
> b/drivers/scsi/Kconfig | 10
> b/drivers/scsi/ahci.c | 1
> b/drivers/scsi/ata_piix.c | 5
> b/drivers/scsi/libata-core.c | 145 +
> b/drivers/scsi/libata-scsi.c | 48
> b/drivers/scsi/libata.h | 4
> b/drivers/scsi/sata_mv.c | 1
> b/drivers/scsi/sata_promise.c | 1
> b/drivers/scsi/sata_sil.c | 1
> b/drivers/scsi/sata_sil24.c | 1
> b/drivers/scsi/sata_sx4.c | 1
> b/drivers/scsi/scsi_lib.c | 50
> b/drivers/scsi/scsi_sysfs.c | 31
> b/drivers/scsi/sd.c | 85 -
> b/fs/bio.c | 26
>
> Jens, Jeff: were any of those changes added in the final day or two, not
> included in the trees which I pull?

Reuben, do you have any barrier= options in your fstab for any reiser
file system?
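
A quick way to check is to scan fstab for reiserfs entries whose option field contains barrier=. The Python sketch below does that on a sample string; the entries in it, including /dev/md9 and /data, are made up for illustration and `reiserfs_barrier_options` is a hypothetical helper.

```python
def reiserfs_barrier_options(fstab_text):
    """Return {mount_point: option} for reiserfs entries that set barrier=."""
    found = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != "reiserfs":
            continue                       # not a complete reiserfs entry
        for option in fields[3].split(","):
            if option.startswith("barrier="):
                found[fields[1]] = option
    return found


# Hypothetical fstab contents for illustration only.
sample = """\
/dev/md0   /      reiserfs  defaults              0 0
/dev/sda1  /boot  ext3      defaults              1 2
#/dev/md8  /old   reiserfs  barrier=flush         0 0
/dev/md9   /data  reiserfs  noatime,barrier=flush 0 0
"""

print(reiserfs_barrier_options(sample))
```

An empty result means no reiserfs mount asks for barriers explicitly through fstab; options passed by hand to mount would not show up here.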

--
Jens Axboe

2006-01-11 11:41:11

by Reuben Farrelly

Subject: Re: 2.6.15-mm2



On 12/01/2006 12:13 a.m., Jens Axboe wrote:
> On Wed, Jan 11 2006, Andrew Morton wrote:
>> Neil thinks that an IO got lost. In the git2->git3 diff we have:
>>
>> b/drivers/scsi/Kconfig | 10
>> b/drivers/scsi/ahci.c | 1
>> b/drivers/scsi/ata_piix.c | 5
>> b/drivers/scsi/libata-core.c | 145 +
>> b/drivers/scsi/libata-scsi.c | 48
>> b/drivers/scsi/libata.h | 4
>> b/drivers/scsi/sata_mv.c | 1
>> b/drivers/scsi/sata_promise.c | 1
>> b/drivers/scsi/sata_sil.c | 1
>> b/drivers/scsi/sata_sil24.c | 1
>> b/drivers/scsi/sata_sx4.c | 1
>> b/drivers/scsi/scsi_lib.c | 50
>> b/drivers/scsi/scsi_sysfs.c | 31
>> b/drivers/scsi/sd.c | 85 -
>> b/fs/bio.c | 26
>>
>> Jens, Jeff: were any of those changes added in the final day or two, not
>> included in the trees which I pull?
>
> Reuben, do you have any barrier= options in your fstab for any reiser
> file system?

None whatsoever:

/dev/md0 / reiserfs defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /proc proc defaults 0 0
sysfs /sys sysfs defaults 0 0
/dev/sda1 /boot ext3 defaults 1 2
#/dev/sdb1 /boot-2 ext3 defaults 1 2
/dev/md1 /home reiserfs defaults 0 0
/dev/md2 /var reiserfs defaults 0 0
/dev/md3 /var/www/cgi-bin reiserfs defaults 0 0
/dev/md4 /tmp reiserfs defaults 0 0
/dev/md5 /backup reiserfs defaults 0 0
/dev/sda8 /var/spool/squid-1 reiserfs noatime,notail 0 0
/dev/sdb8 /var/spool/squid-2 reiserfs noatime,notail 0 0
/dev/sda9 swap swap defaults 0 0
/dev/sdb9 swap swap defaults 0 0
/dev/sdc1 /store reiserfs defaults 0 0
/dev/shm /var/spool/amavisd/tmp tmpfs defaults,size=25m,mode=700,uid=508,gid=509, 0 0
/dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0

reuben

2006-01-11 11:54:37

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Thu, Jan 12 2006, Reuben Farrelly wrote:
>
>
> On 12/01/2006 12:13 a.m., Jens Axboe wrote:
> >On Wed, Jan 11 2006, Andrew Morton wrote:
> >>Neil thinks that an IO got lost. In the git2->git3 diff we have:
> >>
> >> b/drivers/scsi/Kconfig | 10
> >> b/drivers/scsi/ahci.c | 1
> >> b/drivers/scsi/ata_piix.c | 5
> >> b/drivers/scsi/libata-core.c | 145 +
> >> b/drivers/scsi/libata-scsi.c | 48
> >> b/drivers/scsi/libata.h | 4
> >> b/drivers/scsi/sata_mv.c | 1
> >> b/drivers/scsi/sata_promise.c | 1
> >> b/drivers/scsi/sata_sil.c | 1
> >> b/drivers/scsi/sata_sil24.c | 1
> >> b/drivers/scsi/sata_sx4.c | 1
> >> b/drivers/scsi/scsi_lib.c | 50
> >> b/drivers/scsi/scsi_sysfs.c | 31
> >> b/drivers/scsi/sd.c | 85 -
> >> b/fs/bio.c | 26
> >>
> >>Jens, Jeff: were any of those changes added in the final day or two, not
> >>included in the trees which I pull?
> >
> >Reuben, do you have any barrier= options in your fstab for any reiser
> >file system?
>
> None whatsoever:
>
> /dev/md0 / reiserfs defaults 0 0
> none /dev/pts devpts gid=5,mode=620 0 0
> none /dev/shm tmpfs defaults 0 0
> none /proc proc defaults 0 0
> sysfs /sys sysfs defaults 0 0
> /dev/sda1 /boot ext3 defaults 1 2
> #/dev/sdb1 /boot-2 ext3 defaults 1 2
> /dev/md1 /home reiserfs defaults 0 0
> /dev/md2 /var reiserfs defaults 0 0
> /dev/md3 /var/www/cgi-bin reiserfs defaults 0 0
> /dev/md4 /tmp reiserfs defaults 0 0
> /dev/md5 /backup reiserfs defaults 0 0
> /dev/sda8 /var/spool/squid-1 reiserfs noatime,notail 0 0
> /dev/sdb8 /var/spool/squid-2 reiserfs noatime,notail 0 0
> /dev/sda9 swap swap defaults 0 0
> /dev/sdb9 swap swap defaults 0 0
> /dev/sdc1 /store reiserfs defaults 0 0
> /dev/shm /var/spool/amavisd/tmp tmpfs defaults,size=25m,mode=700,uid=508,gid=509, 0 0
> /dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0

Then the barrier changes from git2 -> git3 should not have anything to
do with it. Strange... I guess you should try the git bisect method to
narrow it down.

--
Jens Axboe

2006-01-11 14:40:48

by Reuben Farrelly

[permalink] [raw]
Subject: Re: 2.6.15-mm2



On 12/01/2006 12:56 a.m., Jens Axboe wrote:
> On Thu, Jan 12 2006, Reuben Farrelly wrote:
>>
>> On 12/01/2006 12:13 a.m., Jens Axboe wrote:
>>> On Wed, Jan 11 2006, Andrew Morton wrote:
>>>> Neil thinks that an IO got lost. In the git2->git3 diff we have:
>>>>
>>>> b/drivers/scsi/Kconfig | 10
>>>> b/drivers/scsi/ahci.c | 1
>>>> b/drivers/scsi/ata_piix.c | 5
>>>> b/drivers/scsi/libata-core.c | 145 +
>>>> b/drivers/scsi/libata-scsi.c | 48
>>>> b/drivers/scsi/libata.h | 4
>>>> b/drivers/scsi/sata_mv.c | 1
>>>> b/drivers/scsi/sata_promise.c | 1
>>>> b/drivers/scsi/sata_sil.c | 1
>>>> b/drivers/scsi/sata_sil24.c | 1
>>>> b/drivers/scsi/sata_sx4.c | 1
>>>> b/drivers/scsi/scsi_lib.c | 50
>>>> b/drivers/scsi/scsi_sysfs.c | 31
>>>> b/drivers/scsi/sd.c | 85 -
>>>> b/fs/bio.c | 26
>>>>
>>>> Jens, Jeff: were any of those changes added in the final day or two, not
>>>> included in the trees which I pull?
>>> Reuben, do you have any barrier= options in your fstab for any reiser
>>> file system?
>> None whatsoever:
>>
>> /dev/md0 / reiserfs defaults 0 0
>> none /dev/pts devpts gid=5,mode=620 0 0
>> none /dev/shm tmpfs defaults 0 0
>> none /proc proc defaults 0 0
>> sysfs /sys sysfs defaults 0 0
>> /dev/sda1 /boot ext3 defaults 1 2
>> #/dev/sdb1 /boot-2 ext3 defaults 1 2
>> /dev/md1 /home reiserfs defaults 0 0
>> /dev/md2 /var reiserfs defaults 0 0
>> /dev/md3 /var/www/cgi-bin reiserfs defaults 0 0
>> /dev/md4 /tmp reiserfs defaults 0 0
>> /dev/md5 /backup reiserfs defaults 0 0
>> /dev/sda8 /var/spool/squid-1 reiserfs noatime,notail 0 0
>> /dev/sdb8 /var/spool/squid-2 reiserfs noatime,notail 0 0
>> /dev/sda9 swap swap defaults 0 0
>> /dev/sdb9 swap swap defaults 0 0
>> /dev/sdc1 /store reiserfs defaults 0 0
>> /dev/shm /var/spool/amavisd/tmp tmpfs defaults,size=25m,mode=700,uid=508,gid=509, 0 0
>> /dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0
>
> Then the barrier changes from git2 -> git3 should not have anything to
> do with it. Strange... I guess you should try the git bisect method to
> narrow it down.

Ok push came to shove, so I spent the evening (early hours of the morning
really) learning how to git my way around a little and use git bisect. Not bad,
people who come up with clever stuff like that would probably be clever enough
to be able to do kernel development or something ;-)

Anyway, humour aside, I've bisected down to six revisions:

[root@tornado linux-2.6]# git bisect good
Bisecting: 6 revisions left to test after this
[93c9338713d4e11102cd09b4670ad42a336b06a3] [BLOCK] update libata to use new blk_ordered for barriers
[root@tornado linux-2.6]#

however I'm not sure I can go a lot further now as the tree is failing to
compile at that point:

include/asm/mpspec_def.h:78: warning: 'packed' attribute ignored for field of type 'unsigned char[5u]'
block/ll_rw_blk.c:2421: error: conflicting types for 'blk_execute_rq_nowait'
include/linux/blkdev.h:617: error: previous declaration of 'blk_execute_rq_nowait' was here
make[1]: *** [block/ll_rw_blk.o] Error 1
make: *** [block] Error 2
[root@tornado linux-2.6]#

I'm guessing there is a block of changes that all go together around this point.

Here's my BISECT_LOG:

git-bisect start
# bad: [0aec63e67c69545ca757a73a66f5dcf05fa484bf] Fix posix-cpu-timers sched_time accumulation
git-bisect bad 0aec63e67c69545ca757a73a66f5dcf05fa484bf
# good: [2e3e13f8e9d9b2111404cdccaa4e1b988b70acce] i2c: i2c-i801 explicitly enables/disables PEC
git-bisect good 2e3e13f8e9d9b2111404cdccaa4e1b988b70acce
# good: [9bbc8346fb21fad3f678220b067450e436e45dbf] s390: fix invalid return code in sclp_cpi
git-bisect good 9bbc8346fb21fad3f678220b067450e436e45dbf
# bad: [221fc10ec89834329e5613e3cab4569ba22da410] fs/ufs: debug mode compilation failure
git-bisect bad 221fc10ec89834329e5613e3cab4569ba22da410
# good: [ddaf22abaa831763e75775e6d4c7693504237997] md: attempt to auto-correct read errors in raid1
git-bisect good ddaf22abaa831763e75775e6d4c7693504237997
# good: [d9d166c2a9d5d01af34396793950aa695883eed4] md: allow array level to be set textually via sysfs
git-bisect good d9d166c2a9d5d01af34396793950aa695883eed4
# bad: [e650c305ec3178818b317dad37a6d9c7fa8ba28d] [SCSI] scsi_end_async() needs to take an uptodate parameter
git-bisect bad e650c305ec3178818b317dad37a6d9c7fa8ba28d
# good: [64100099ed22f71cce656c5c2caecf5c9cf255dc] [BLOCK] mark some block/ variables cons
git-bisect good 64100099ed22f71cce656c5c2caecf5c9cf255dc

I'll leave the setup as it is right now so if there's an easy way to narrow it
down even further I can continue tomorrow.

Incidentally, I also tested -mm3, and the problem is still present there.

reuben



2006-01-11 14:50:37

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Thu, Jan 12 2006, Reuben Farrelly wrote:
>
>
> On 12/01/2006 12:56 a.m., Jens Axboe wrote:
> >On Thu, Jan 12 2006, Reuben Farrelly wrote:
> >>
> >>On 12/01/2006 12:13 a.m., Jens Axboe wrote:
> >>>On Wed, Jan 11 2006, Andrew Morton wrote:
> >>>>Neil thinks that an IO got lost. In the git2->git3 diff we have:
> >>>>
> >>>>b/drivers/scsi/Kconfig | 10
> >>>>b/drivers/scsi/ahci.c | 1
> >>>>b/drivers/scsi/ata_piix.c | 5
> >>>>b/drivers/scsi/libata-core.c | 145 +
> >>>>b/drivers/scsi/libata-scsi.c | 48
> >>>>b/drivers/scsi/libata.h | 4
> >>>>b/drivers/scsi/sata_mv.c | 1
> >>>>b/drivers/scsi/sata_promise.c | 1
> >>>>b/drivers/scsi/sata_sil.c | 1
> >>>>b/drivers/scsi/sata_sil24.c | 1
> >>>>b/drivers/scsi/sata_sx4.c | 1
> >>>>b/drivers/scsi/scsi_lib.c | 50
> >>>>b/drivers/scsi/scsi_sysfs.c | 31
> >>>>b/drivers/scsi/sd.c | 85 -
> >>>>b/fs/bio.c | 26
> >>>>
> >>>>Jens, Jeff: were any of those changes added in the final day or two, not
> >>>>included in the trees which I pull?
> >>>Reuben, do you have any barrier= options in your fstab for any reiser
> >>>file system?
> >>None whatsoever:
> >>
> >>/dev/md0 / reiserfs defaults 0 0
> >>none /dev/pts devpts gid=5,mode=620 0 0
> >>none /dev/shm tmpfs defaults 0 0
> >>none /proc proc defaults 0 0
> >>sysfs /sys sysfs defaults 0 0
> >>/dev/sda1 /boot ext3 defaults 1 2
> >>#/dev/sdb1 /boot-2 ext3 defaults 1 2
> >>/dev/md1 /home reiserfs defaults 0 0
> >>/dev/md2 /var reiserfs defaults 0 0
> >>/dev/md3 /var/www/cgi-bin reiserfs defaults 0 0
> >>/dev/md4 /tmp reiserfs defaults 0 0
> >>/dev/md5 /backup reiserfs defaults 0 0
> >>/dev/sda8 /var/spool/squid-1 reiserfs noatime,notail 0 0
> >>/dev/sdb8 /var/spool/squid-2 reiserfs noatime,notail 0 0
> >>/dev/sda9 swap swap defaults 0 0
> >>/dev/sdb9 swap swap defaults 0 0
> >>/dev/sdc1 /store reiserfs defaults 0 0
> >>/dev/shm /var/spool/amavisd/tmp tmpfs defaults,size=25m,mode=700,uid=508,gid=509, 0 0
> >>/dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0
> >
> >Then the barrier changes from git2 -> git3 should not have anything to
> >do with it. Strange... I guess you should try the git bisect method to
> >narrow it down.
>
> Ok push came to shove, so I spent the evening (early hours of the morning
> really) learning how to git my way around a little and use git bisect. Not
> bad, people who come up with clever stuff like that would probably be
> clever enough to be able to do kernel development or something ;-)
>
> Anyway, humour aside, I've bisected down to six revisions:
>
> [root@tornado linux-2.6]# git bisect good
> Bisecting: 6 revisions left to test after this
> [93c9338713d4e11102cd09b4670ad42a336b06a3] [BLOCK] update libata to use new
> blk_ordered for barriers
> [root@tornado linux-2.6]#
>
> however I'm not sure I can go a lot further now as the tree is failing to
> compile at that point:
>
> include/asm/mpspec_def.h:78: warning: 'packed' attribute ignored for field
> of type 'unsigned char[5u]'
> block/ll_rw_blk.c:2421: error: conflicting types for 'blk_execute_rq_nowait'
> include/linux/blkdev.h:617: error: previous declaration of
> 'blk_execute_rq_nowait' was here
> make[1]: *** [block/ll_rw_blk.o] Error 1
> make: *** [block] Error 2
> [root@tornado linux-2.6]#
>
> I'm guessing there are a block of changes that all go together around this
> point.

It's not too tricky, you just need to correct that function prototype.
Could you do that? Would be nice to know _exactly_ which libata
changeset caused this malfunction. But it does of course point at the
barrier changes for scsi/libata...

--
Jens Axboe

2006-01-11 14:53:24

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Wed, Jan 11 2006, Jens Axboe wrote:
> It's not too tricky, you just need to correct that function prototype.
> Could you do that? Would be nice to know _exactly_ which libata
> changeset caused this malfunction. But it does of course point at the
> barrier changes for scsi/libata...

You can also try something quicker - use a newer kernel known to exhibit
the problem, and apply this patch on top of that:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0302723..720ace4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -436,6 +436,7 @@ void md_super_write(mddev_t *mddev, mdk_
bio->bi_rw = rw;

atomic_inc(&mddev->pending_writes);
+#if 0
if (!test_bit(BarriersNotsupp, &rdev->flags)) {
struct bio *rbio;
rw |= (1<<BIO_RW_BARRIER);
@@ -444,6 +445,7 @@ void md_super_write(mddev_t *mddev, mdk_
rbio->bi_end_io = super_written_barrier;
submit_bio(rw, rbio);
} else
+#endif
submit_bio(rw, bio);
}
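
For anyone puzzled by how two added lines can disable the whole barrier path: the #if 0 / #endif pair removes both the if block and its trailing else, so the final submit_bio() call becomes unconditional. A minimal stand-alone sketch of the same preprocessor trick follows. The names pick_path, check_paths and enum submit_path are hypothetical and exist only for illustration; this is not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two submission paths; not kernel code. */
enum submit_path { PATH_BARRIER, PATH_PLAIN };

static enum submit_path pick_path(int barriers_supported)
{
	/* The parameter is unused while the block below is compiled out. */
	(void)barriers_supported;
#if 0
	if (barriers_supported) {
		return PATH_BARRIER;
	} else
#endif
		return PATH_PLAIN;	/* now reached unconditionally */
}

/* Both inputs take the plain path while the barrier branch is disabled. */
static void check_paths(void)
{
	assert(pick_path(1) == PATH_PLAIN);
	assert(pick_path(0) == PATH_PLAIN);
}
```

Changing the #if 0 to #if 1 restores the original two-way behaviour, which is why this form is handy for a quick test patch.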


--
Jens Axboe

2006-01-11 18:41:18

by Brice Goglin

[permalink] [raw]
Subject: Re: 2.6.15-mm2

Brice Goglin wrote:

>>>0000:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon Mobility M300] (prog-if 00 [VGA])
>>> Subsystem: IBM: Unknown device 056e
>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>> Latency: 0, Cache Line Size: 0x08 (32 bytes)
>>> Interrupt: pin A routed to IRQ 169
>>> Region 0: Memory at c0000000 (32-bit, prefetchable) [size=128M]
>>> Region 1: I/O ports at 2000 [size=256]
>>> Region 2: Memory at a8100000 (32-bit, non-prefetchable) [size=64K]
>>> Expansion ROM at a8120000 [disabled] [size=128K]
>>> Capabilities: <available only to root>
>>>00: 02 10 60 54 07 01 10 00 00 00 00 03 08 00 00 00
>>>10: 08 00 00 c0 01 20 00 00 00 00 10 a8 00 00 00 00
>>>20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 6e 05
>>>30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 01 00 00
>>>
>>>
>Assuming this is a PCI Express card, then what is the proper fix ?
>Should I prevent my initscript from loading agpgart (actually intel_agp)
>at all ? (I guess udev or hotplug is trying to load it here). Is there
>something like agpgart for PCI express ? Or is it useless ?
>
>

Hi Dave,

I'm coming back to this topic since I managed to get DRI to work with
the open source driver on 2.6.15 (I mean drivers/char/drm/radeon).
And it does not work on -mm (actually I only tried -mm3) since
apparently radeon loads drm, and drm needs some agp symbols. Both radeon
and drm (and agpgart) are built as modules here.
How are we supposed to get DRM to work on PCI Express cards if DRM needs
AGP and agpgart does not load when no AGP card is found? :)

drm: Unknown symbol agp_bind_memory
...
drm: Unknown symbol agp_backend_release
radeon: Unknown symbol drm_open
...
radeon: Unknown symbol drm_release

thanks,
Brice

2006-01-11 19:24:20

by Reuben Farrelly

[permalink] [raw]
Subject: Re: 2.6.15-mm2



On 12/01/2006 3:55 a.m., Jens Axboe wrote:
> On Wed, Jan 11 2006, Jens Axboe wrote:
>> It's not too tricky, you just need to correct that function prototype.
>> Could you do that? Would be nice to know _exactly_ which libata
>> changeset caused this malfunction. But it does of course point at the
>> barrier changes for scsi/libata...
>
> You can also try something quicker - use a newer kernel known to exhibit
> the problem, and apply this patch on top of that:
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0302723..720ace4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -436,6 +436,7 @@ void md_super_write(mddev_t *mddev, mdk_
> bio->bi_rw = rw;
>
> atomic_inc(&mddev->pending_writes);
> +#if 0
> if (!test_bit(BarriersNotsupp, &rdev->flags)) {
> struct bio *rbio;
> rw |= (1<<BIO_RW_BARRIER);
> @@ -444,6 +445,7 @@ void md_super_write(mddev_t *mddev, mdk_
> rbio->bi_end_io = super_written_barrier;
> submit_bio(rw, rbio);
> } else
> +#endif
> submit_bio(rw, bio);
> }

...and with that patch, I can now boot up 2.6.15-mm3 (repeated twice). So yes,
looks like that's where the problem lies.

Thanks Jens,
Reuben

2006-01-11 19:44:05

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Thu, Jan 12 2006, Reuben Farrelly wrote:
>
>
> On 12/01/2006 3:55 a.m., Jens Axboe wrote:
> >On Wed, Jan 11 2006, Jens Axboe wrote:
> >>It's not too tricky, you just need to correct that function prototype.
> >>Could you do that? Would be nice to know _exactly_ which libata
> >>changeset caused this malfunction. But it does of course point at the
> >>barrier changes for scsi/libata...
> >
> >You can also try something quicker - use a newer kernel known to exhibit
> >the problem, and apply this patch on top of that:
> >
> >diff --git a/drivers/md/md.c b/drivers/md/md.c
> >index 0302723..720ace4 100644
> >--- a/drivers/md/md.c
> >+++ b/drivers/md/md.c
> >@@ -436,6 +436,7 @@ void md_super_write(mddev_t *mddev, mdk_
> > bio->bi_rw = rw;
> >
> > atomic_inc(&mddev->pending_writes);
> >+#if 0
> > if (!test_bit(BarriersNotsupp, &rdev->flags)) {
> > struct bio *rbio;
> > rw |= (1<<BIO_RW_BARRIER);
> >@@ -444,6 +445,7 @@ void md_super_write(mddev_t *mddev, mdk_
> > rbio->bi_end_io = super_written_barrier;
> > submit_bio(rw, rbio);
> > } else
> >+#endif
> > submit_bio(rw, bio);
> > }
>
> ...and with that patch, I can now boot up 2.6.15-mm3 (repeated twice). So
> yes, looks like that's where the problem lies.

At least it shows that the problem is indeed barrier related. I don't
have the start of this thread, so can you please send me the output from
dmesg from this kernel boot? I'm curious whether the fallback triggers,
or if it's the barrier that fails instead.

--
Jens Axboe

2006-01-11 19:52:29

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Wed, Jan 11 2006, Jens Axboe wrote:
> On Thu, Jan 12 2006, Reuben Farrelly wrote:
> >
> >
> > On 12/01/2006 3:55 a.m., Jens Axboe wrote:
> > >On Wed, Jan 11 2006, Jens Axboe wrote:
> > >>It's not too tricky, you just need to correct that function prototype.
> > >>Could you do that? Would be nice to know _exactly_ which libata
> > >>changeset caused this malfunction. But it does of course point at the
> > >>barrier changes for scsi/libata...
> > >
> > >You can also try something quicker - use a newer kernel known to exhibit
> > >the problem, and apply this patch on top of that:
> > >
> > >diff --git a/drivers/md/md.c b/drivers/md/md.c
> > >index 0302723..720ace4 100644
> > >--- a/drivers/md/md.c
> > >+++ b/drivers/md/md.c
> > >@@ -436,6 +436,7 @@ void md_super_write(mddev_t *mddev, mdk_
> > > bio->bi_rw = rw;
> > >
> > > atomic_inc(&mddev->pending_writes);
> > >+#if 0
> > > if (!test_bit(BarriersNotsupp, &rdev->flags)) {
> > > struct bio *rbio;
> > > rw |= (1<<BIO_RW_BARRIER);
> > >@@ -444,6 +445,7 @@ void md_super_write(mddev_t *mddev, mdk_
> > > rbio->bi_end_io = super_written_barrier;
> > > submit_bio(rw, rbio);
> > > } else
> > >+#endif
> > > submit_bio(rw, bio);
> > > }
> >
> > ...and with that patch, I can now boot up 2.6.15-mm3 (repeated twice). So
> > yes, looks like that's where the problem lies.
>
> At least it shows that the problem is indeed barrier related. I don't
> have the start of this thread, so can you please send me the output from
> dmesg from this kernel boot? I'm curious whether the fallback triggers,
> or if it's the barrier that fails instead.

Or even better, please boot with this patch applied on top of the kernel
you just booted (the new one, with the md patch applied).

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4c5127e..07aee66 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1492,6 +1492,7 @@ static int sd_revalidate_disk(struct gen
ordered = QUEUE_ORDERED_DRAIN;

blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush);
+ printk("%s: ordered set to %d\n", disk->disk_name, ordered);

set_capacity(disk, sdkp->capacity);
kfree(buffer);

--
Jens Axboe

2006-01-11 20:30:15

by Dave Jones

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Wed, Jan 11, 2006 at 01:41:12PM -0500, Brice Goglin wrote:

> I'm coming back on this topic since I managed to get DRI to work with
> the open source driver on 2.6.15 (I mean drivers/char/drm/radeon).
> And it does not work on -mm (actually I only tried -mm3) since
> apparently radeon loads drm, and drm needs some agp symbols. Both radeon
> and drm (and agpgart) and built as module here.
> How are we supposed to get DRM to work on PCI Express cards if DRM needs
> AGP and agpgart does not load when no AGP card is found ? :)
>
> drm: Unknown symbol agp_bind_memory
> ...
> drm: Unknown symbol agp_backend_release
> radeon: Unknown symbol drm_open
> ...
> radeon: Unknown symbol drm_release

That's puzzling. It should still be loadable. All the current agpgart tree
is doing is basically enforcing agp=off if there's no agp card present.
That shouldn't prevent the module from actually loading, or its symbols being
referenced by other modules.

Hrmm, it's puzzling that you also are unable to resolve drm_open and drm_release.
That may be a follow-on failure from the first, but it seems unlikely.

DaveA, any clues ?

Dave

2006-01-11 21:44:50

by NeilBrown

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Wednesday January 11, [email protected] wrote:
>
> Then the barrier changes from git2 -> git3 should not have anything to
> do with it. Strange... I guess you should try the git bisect method to
> narrow it down.

Not true, though you seem to have already figured that out.

md uses barrier writes when writing the superblock. This is partly
because it seems like a good idea, but largely to test if barrier
writes are going to work on the component devices. If any device
claims not to support barriers, then raid1 will claim not to support
barriers.

And the strange hang happens while md is trying to update the
superblock.
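
To make that concrete, here is a rough model of the logic, not the actual md code. The struct and function names are made up for illustration, and the behaviour on a refused barrier (remember the refusal per device, then fall back to a plain write) is an assumption based on the fallback discussed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one component device; not the kernel's rdev. */
struct component {
	const char *name;
	bool barriers_ok;	/* cleared once a barrier write is refused */
};

/* The array can only claim barrier support if every component can. */
static bool array_supports_barriers(const struct component *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!devs[i].barriers_ok)
			return false;
	return true;
}

/*
 * Model of a superblock write used as a barrier probe.
 * Returns true if the write went out as a barrier write, false if it
 * was (or had to be retried as) a plain write.
 */
static bool write_superblock(struct component *dev, bool device_accepts_barrier)
{
	if (!dev->barriers_ok)
		return false;			/* already known not to work */
	if (!device_accepts_barrier) {
		dev->barriers_ok = false;	/* assumed: remember and fall back */
		return false;
	}
	return true;
}
```

In this model a single component refusing the barrier is enough to make array_supports_barriers() return false from then on.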

NeilBrown

2006-01-11 21:51:48

by Dave Airlie

[permalink] [raw]
Subject: Re: 2.6.15-mm2


> > How are we supposed to get DRM to work on PCI Express cards if DRM needs
> > AGP and agpgart does not load when no AGP card is found ? :)
> >
> > drm: Unknown symbol agp_bind_memory
> > ...
> > drm: Unknown symbol agp_backend_release
> > radeon: Unknown symbol drm_open
> > ...
> > radeon: Unknown symbol drm_release
>
> That's puzzling. It should still be loadable. All the current agpgart tree
> is doing is basically enforcing agp=off if there's no agp card present.
> That shouldn't prevent the module from actually loading, or it's symbols being
> referenced by other modules.
>
> Hrmm, it's puzzling that you also are unable to resolve drm_open and drm_release.
> That may be a follow-on failure from the first, but it seems unlikely.
>
> DaveA, any clues ?

That's just a cascaded failure, radeon gives out because drm won't load
because agpgart won't load... there must be a reason why agpgart doesn't
load... perhaps we've some issue when the backend isn't there or
something..

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG

2006-01-11 21:56:35

by Dave Jones

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Wed, Jan 11, 2006 at 09:50:31PM +0000, Dave Airlie wrote:

> > That's puzzling. It should still be loadable. All the current agpgart tree
> > is doing is basically enforcing agp=off if there's no agp card present.
> > That shouldn't prevent the module from actually loading, or it's symbols being
> > referenced by other modules.
> >
> > Hrmm, it's puzzling that you also are unable to resolve drm_open and drm_release.
> > That may be a follow-on failure from the first, but it seems unlikely.
>
> Thats' just a cascaded failure, radeon gives out because drm won't load
> because agpgart won't load... there must be a reason why agpgart doesn't
> load... perhaps we've some issue when the backend isn't there or
> something..

It may be that my current experiment is a really bad idea, and if it
causes drm heartburn, I'll drop it. But if you could take a peek
just in case drm is doing something silly I'd appreciate it.

Dave

2006-01-11 23:52:10

by Dave Airlie

[permalink] [raw]
Subject: Re: 2.6.15-mm2


> causes drm heartburn, I'll drop it. But if you could take a peek
> just incase drm is doing something silly I'd appreciate it.

I'll try and test -mm2 this evening..

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG

2006-01-12 03:49:45

by Reuben Farrelly

[permalink] [raw]
Subject: Re: 2.6.15-mm2



On 12/01/2006 8:53 a.m., Jens Axboe wrote:
> On Wed, Jan 11 2006, Jens Axboe wrote:
>> At least it shows that the problem is indeed barrier related. I don't
>> have the start of this thread, so can you please send me the output from
>> dmesg from this kernel boot? I'm curious whether the fallback triggers,
>> or if it's the barrier that fails instead.
>
> Or even better, please boot with this patch applied on top of the kernel
> you just booted (the new one, with the md patch applied).
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 4c5127e..07aee66 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1492,6 +1492,7 @@ static int sd_revalidate_disk(struct gen
> ordered = QUEUE_ORDERED_DRAIN;
>
> blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush);
> + printk("%s: ordered set to %d\n", disk->disk_name, ordered);
>
> set_capacity(disk, sdkp->capacity);
> kfree(buffer);

Here it is...

Linux version 2.6.15-mm3 ([email protected]) (gcc version 4.1.0 20060106 (Red Hat 4.1.0-0.14)) #4 SMP Thu Jan 12 16:26:28 NZDT 2006
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fe2f800 (usable)
BIOS-e820: 000000003fe2f800 - 000000003fe3f8e3 (ACPI NVS)
BIOS-e820: 000000003ff2f800 - 000000003ff30000 (ACPI NVS)
BIOS-e820: 000000003ff30000 - 000000003ff40000 (ACPI data)
BIOS-e820: 000000003ff40000 - 000000003fff0000 (ACPI NVS)
BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fed13000 - 00000000fed1a000 (reserved)
BIOS-e820: 00000000fed1c000 - 00000000feda0000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
On node 0 totalpages: 261679
DMA zone: 4096 pages, LIFO batch:0
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 32303 pages, LIFO batch:7
DMI 2.3 present.
ACPI: RSDP (v000 ACPIAM ) @ 0x000f4ee0
ACPI: RSDT (v001 INTEL D925XCV 0x20051110 MSFT 0x00000097) @ 0x3ff30000
ACPI: FADT (v002 INTEL D925XCV 0x20051110 MSFT 0x00000097) @ 0x3ff30200
ACPI: MADT (v001 INTEL D925XCV 0x20051110 MSFT 0x00000097) @ 0x3ff30390
ACPI: MCFG (v001 INTEL D925XCV 0x20051110 MSFT 0x00000097) @ 0x3ff30400
ACPI: ASF! (v016 LEGEND I865PASF 0x00000001 INTL 0x02002026) @ 0x3ff35fa0
ACPI: TCPA (v001 INTEL TBLOEMID 0x00000001 MSFT 0x00000097) @ 0x3ff36040
ACPI: WDDT (v001 INTEL OEMWDDT 0x00000001 INTL 0x02002026) @ 0x3ff36072
ACPI: DSDT (v001 INTEL D925XCV 0x00000001 INTL 0x02002026) @ 0x00000000
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
Detected 2800.337 MHz processor.
Built 1 zonelists
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Kernel command line: ro root=/dev/md0 panic=60 console=ttyS0,57600
CPU 0 irqstacks, hard=c040a000 soft=c0408000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1033552k/1046716k available (2161k kernel code, 12500k reserved, 713k data, 204k init, 129212k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5607.26 BogoMIPS (lpj=11214524)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 0000441d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 0000441d 00000000 00000000
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000180 0000441d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c040b000 soft=c0409000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5600.57 BogoMIPS (lpj=11201159)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 0000441d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 0000441d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000180 0000441d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Total of 2 processors activated (11207.84 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
migration_cost=128
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG
ACPI: Subsystem revision 20051216
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
Boot video device is 0000:06:00.0
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEGP._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: Power Resource [URP2] (off)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX3._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
PCI: Bridge: 0000:00:01.0
IO window: disabled.
MEM window: ffa00000-ffafffff
PREFETCH window: fdf00000-fdffffff
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: ff600000-ff6fffff
PREFETCH window: fdb00000-fdbfffff
PCI: Bridge: 0000:00:1c.1
IO window: a000-afff
MEM window: ff700000-ff7fffff
PREFETCH window: fdc00000-fdcfffff
PCI: Bridge: 0000:00:1c.2
IO window: disabled.
MEM window: ff800000-ff8fffff
PREFETCH window: fdd00000-fddfffff
PCI: Bridge: 0000:00:1c.3
IO window: disabled.
MEM window: ff900000-ff9fffff
PREFETCH window: fde00000-fdefffff
PCI: Bridge: 0000:00:1e.0
IO window: b000-bfff
MEM window: ff500000-ff5fffff
PREFETCH window: fe000000-fe7fffff
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:01.0 to 64
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1c.0 to 64
PCI: Enabling device 0000:00:1c.1 (0106 -> 0107)
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1c.1 to 64
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1c.2 to 64
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1c.3 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
Time: tsc clocksource has been installed.
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:01.0:pcie00]
Allocate Port Service[0000:00:01.0:pcie03]
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.0:pcie00]
Allocate Port Service[0000:00:1c.0:pcie03]
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1c.1 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.1:pcie00]
Allocate Port Service[0000:00:1c.1:pcie02]
Allocate Port Service[0000:00:1c.1:pcie03]
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1c.2 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.2:pcie00]
Allocate Port Service[0000:00:1c.2:pcie02]
Allocate Port Service[0000:00:1c.2:pcie03]
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1c.3 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.3:pcie00]
Allocate Port Service[0000:00:1c.3:pcie02]
Allocate Port Service[0000:00:1c.3:pcie03]
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Processor [CPU2] (supports 8 throttling states)
Real Time Clock Driver v1.12ac
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 18 (level, low) -> IRQ 185
0000:06:02.0: ttyS1 at I/O 0xbc00 (irq = 185) is a 16550A
0000:06:02.0: ttyS2 at I/O 0xbc08 (irq = 185) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
libata version 1.20 loaded.
ahci 0000:00:1f.2: version 1.2
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8804D00 ctl 0x0 bmdma 0x0 irq 193
ata2: SATA max UDMA/133 cmd 0xF8804D80 ctl 0x0 bmdma 0x0 irq 193
ata3: SATA max UDMA/133 cmd 0xF8804E00 ctl 0x0 bmdma 0x0 irq 193
ata4: SATA max UDMA/133 cmd 0xF8804E80 ctl 0x0 bmdma 0x0 irq 193
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:007f
ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: SATA link up 1.5 Gbps (SStatus 113)
ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:007f
ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: SATA link up 1.5 Gbps (SStatus 113)
ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:007f
ata3: dev 0 ATA-6, max UDMA/133, 156299375 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : ahci
ata4: SATA link down (SStatus 0)
scsi3 : ahci
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380013AS Rev: 3.18
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
sda: ordered set to 49
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
sda: ordered set to 49
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb: ordered set to 49
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb: ordered set to 49
sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 >
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 156299375 512-byte hdwr sectors (80025 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
sdc: ordered set to 49
SCSI device sdc: 156299375 512-byte hdwr sectors (80025 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
sdc: ordered set to 49
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 50
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: irq 50, io mem 0xff4ff800
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 50
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 50, io base 0x0000cc00
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 193, io base 0x0000d000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 185, io base 0x0000d400
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1d.3 to 64
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 169, io base 0x0000d800
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
Initializing USB Mass Storage driver...
usb 5-1: new full speed USB device using uhci_hcd and address 2
usb 5-1: configuration #1 chosen from 1 choice
usb 5-2: new full speed USB device using uhci_hcd and address 3
usb 5-2: configuration #1 chosen from 1 choice
hub 5-2:1.0: USB hub found
hub 5-2:1.0: 4 ports detected
usb 5-2.1: new low speed USB device using uhci_hcd and address 4
usb 5-2.1: configuration #1 chosen from 1 choice
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
usbcore: registered new driver libusual
usbcore: registered new driver hiddev
input: Belkin Components Belkin OmniView KVM Switch as /class/input/input0
input: USB HID v1.00 Keyboard [Belkin Components Belkin OmniView KVM Switch] on
usb-0000:00:1d.3-2.1
input: Belkin Components Belkin OmniView KVM Switch as /class/input/input1
input: USB HID v1.00 Mouse [Belkin Components Belkin OmniView KVM Switch] on
usb-0000:00:1d.3-2.1
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
mice: PS/2 mouse device common for all mice
md: raid1 personality registered for level 1
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
IPv4 over IPv4 tunneling driver
ip_conntrack version 2.4 (8177 buckets, 65416 max) - 212 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
ipt_recent v0.3.1: Stephen Frost <[email protected]>.
http://snowman.net/projects/ipt_recent/
arp_tables: (C) 2002 David S. Miller
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
p4-clockmod: P4/Xeon(TM) CPU On-Demand Clock Modulation available
Starting balanced_irq
Using IPI Shortcut mode
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb10 ...
md: adding sdb10 ...
md: sdb7 has different UUID to sdb10
md: sdb6 has different UUID to sdb10
md: sdb5 has different UUID to sdb10
md: sdb3 has different UUID to sdb10
md: sdb2 has different UUID to sdb10
md: adding sda10 ...
md: sda7 has different UUID to sdb10
md: sda6 has different UUID to sdb10
md: sda5 has different UUID to sdb10
md: sda3 has different UUID to sdb10
md: sda2 has different UUID to sdb10
md: created md5
md: bind<sda10>
md: bind<sdb10>
md: running: <sdb10><sda10>
raid1: raid set md5 active with 2 out of 2 mirrors
md5: bitmap initialized from disk: read 11/11 pages, set 3 bits, status: 0
created bitmap (161 pages) for device md5
md: considering sdb7 ...
md: adding sdb7 ...
md: sdb6 has different UUID to sdb7
md: sdb5 has different UUID to sdb7
md: sdb3 has different UUID to sdb7
md: sdb2 has different UUID to sdb7
md: adding sda7 ...
md: sda6 has different UUID to sdb7
md: sda5 has different UUID to sdb7
md: sda3 has different UUID to sdb7
md: sda2 has different UUID to sdb7
md: created md4
md: bind<sda7>
md: bind<sdb7>
md: running: <sdb7><sda7>
raid1: raid set md4 active with 2 out of 2 mirrors
md4: bitmap initialized from disk: read 4/4 pages, set 11 bits, status: 0
created bitmap (61 pages) for device md4
md: considering sdb6 ...
md: adding sdb6 ...
md: sdb5 has different UUID to sdb6
md: sdb3 has different UUID to sdb6
md: sdb2 has different UUID to sdb6
md: adding sda6 ...
md: sda5 has different UUID to sdb6
md: sda3 has different UUID to sdb6
md: sda2 has different UUID to sdb6
md: created md3
md: bind<sda6>
md: bind<sdb6>
md: running: <sdb6><sda6>
raid1: raid set md3 active with 2 out of 2 mirrors
md3: bitmap initialized from disk: read 1/1 pages, set 11 bits, status: 0
created bitmap (13 pages) for device md3
md: considering sdb5 ...
md: adding sdb5 ...
md: sdb3 has different UUID to sdb5
md: sdb2 has different UUID to sdb5
md: adding sda5 ...
md: sda3 has different UUID to sdb5
md: sda2 has different UUID to sdb5
md: created md2
md: bind<sda5>
md: bind<sdb5>
md: running: <sdb5><sda5>
raid1: raid set md2 active with 2 out of 2 mirrors
md2: bitmap initialized from disk: read 10/10 pages, set 84 bits, status: 0
created bitmap (150 pages) for device md2
md: considering sdb3 ...
md: adding sdb3 ...
md: sdb2 has different UUID to sdb3
md: adding sda3 ...
md: sda2 has different UUID to sdb3
md: created md1
md: bind<sda3>
md: bind<sdb3>
md: running: <sdb3><sda3>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 10/10 pages, set 5 bits, status: 0
created bitmap (150 pages) for device md1
md: considering sdb2 ...
md: adding sdb2 ...
md: adding sda2 ...
md: created md0
md: bind<sda2>
md: bind<sdb2>
md: running: <sdb2><sda2>
raid1: raid set md0 active with 2 out of 2 mirrors
md0: bitmap initialized from disk: read 12/12 pages, set 95 bits, status: 0
created bitmap (187 pages) for device md0
md: ... autorun DONE.
ReiserFS: md0: found reiserfs format "3.6" with standard journal
ReiserFS: md0: using ordered data mode
ReiserFS: md0: journal params: device md0, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md0: checking transaction log (md0)
ReiserFS: md0: Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 204k freed
Write protecting the kernel read-only data: 355k
hw_random hardware driver 1.0.0 loaded
ICH6: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 185
ICH6: chipset revision 3
ICH6: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: _NEC DVD_RW ND-2510A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:04:00.0 to 64
sky2 v0.11 addr 0xff720000 irq 177 Yukon-EC (0xb6) rev 1
sky2 eth0: addr 00:11:11:43:05:2f
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ReiserFS: md1: found reiserfs format "3.6" with standard journal
ReiserFS: md1: using ordered data mode
ReiserFS: md1: journal params: device md1, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md1: checking transaction log (md1)
ReiserFS: md1: Using r5 hash to sort names
ReiserFS: md2: found reiserfs format "3.6" with standard journal
ReiserFS: md2: using ordered data mode
ReiserFS: md2: journal params: device md2, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md2: checking transaction log (md2)
ReiserFS: md2: Using r5 hash to sort names
ReiserFS: md3: found reiserfs format "3.6" with standard journal
ReiserFS: md3: using ordered data mode
ReiserFS: md3: journal params: device md3, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md3: checking transaction log (md3)
ReiserFS: md3: Using r5 hash to sort names
ReiserFS: md4: found reiserfs format "3.6" with standard journal
ReiserFS: md4: using ordered data mode
ReiserFS: md4: journal params: device md4, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md4: checking transaction log (md4)
ReiserFS: md4: Using r5 hash to sort names
ReiserFS: md5: found reiserfs format "3.6" with standard journal
ReiserFS: md5: using ordered data mode
ReiserFS: md5: journal params: device md5, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md5: checking transaction log (md5)
ReiserFS: md5: Using r5 hash to sort names
ReiserFS: sda8: found reiserfs format "3.6" with standard journal
ReiserFS: sda8: using ordered data mode
ReiserFS: sda8: journal params: device sda8, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sda8: checking transaction log (sda8)
ReiserFS: sda8: Using r5 hash to sort names
ReiserFS: sdb8: found reiserfs format "3.6" with standard journal
ReiserFS: sdb8: using ordered data mode
ReiserFS: sdb8: journal params: device sdb8, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sdb8: checking transaction log (sdb8)
ReiserFS: sdb8: Using r5 hash to sort names
ReiserFS: sdc1: found reiserfs format "3.6" with standard journal
ReiserFS: sdc1: using ordered data mode
ReiserFS: sdc1: journal params: device sdc1, size 8192, journal first block 18,
max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sdc1: checking transaction log (sdc1)
ReiserFS: sdc1: Using r5 hash to sort names
Adding 248968k swap on /dev/sda9. Priority:-1 extents:1 across:248968k
Adding 248968k swap on /dev/sdb9. Priority:-2 extents:1 across:248968k
sky2 eth0: enabling interface
sky2 eth0: phy interrupt status 0x1c00 0xbc0c
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
GRE over IPv4 tunneling driver
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
i2c_adapter i2c-0: Unrecognized version/stepping 0x69 Defaulting to LM85.
Installing knfsd (copyright (C) 1996 [email protected]).

I'm seeing some other big problems with SATA which I'll post about soon in the
original thread.

reuben

2006-01-12 07:33:28

by Jens Axboe

Subject: Re: 2.6.15-mm2

On Thu, Jan 12 2006, Neil Brown wrote:
> On Wednesday January 11, [email protected] wrote:
> >
> > Then the barrier changes from git2 -> git3 should not have anything to
> > do with it. Strange... I guess you should try the git bisect method to
> > narrow it down.
>
> Not true, though you seem to have already figured that out.
>
> md uses barrier writes when writing the superblock. This is partly
> because it seems like a good idea, but largely to test if barrier
> writes are going to work on the component devices. If any device
> claims not to support barriers, then raid1 will claim not to support
> barriers.
>
> And the strange hang happens while md is trying to update the
> superblock.

Yeah, that's what I found out later on in the thread; indeed, killing that
barrier write made the problem disappear.
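
The rule Neil describes (raid1 only claims barrier support if every
component device does) can be sketched roughly as below. This is a
standalone illustration, not md code: struct component, its barriers_ok
field and array_supports_barriers() are hypothetical names made up for
the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a component device; not an md structure. */
struct component {
    const char *name;
    bool barriers_ok;   /* did the barrier superblock write succeed? */
};

/* The array advertises barrier support only if every component does. */
static bool array_supports_barriers(const struct component *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!devs[i].barriers_ok)
            return false;   /* one refusing device disables it for all */
    return true;
}
```

With sda2 and sdb2 both accepting the barrier write, the array would
report support; if either one refused, it would not.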

--
Jens Axboe

2006-01-12 08:00:59

by Tejun Heo

Subject: Re: 2.6.15-mm2

Hello, Reuben, Jens and all.

On Thu, Jan 12, 2006 at 04:49:30PM +1300, Reuben Farrelly wrote:
>
>
> On 12/01/2006 8:53 a.m., Jens Axboe wrote:
> >On Wed, Jan 11 2006, Jens Axboe wrote:
> >>At least it shows that the problem is indeed barrier related. I don't
> >>have the start of this thread, so can you please send me the output from
> >>dmesg from this kernel boot? I'm curious whether the fallback triggers,
> >>or if it's the barrier that fails instead.
> >
> >Or even better, please boot with this patch applied on top of the kernel
> >you just booted (the new one, with the md patch applied).
> >
> >diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >index 4c5127e..07aee66 100644
> >--- a/drivers/scsi/sd.c
> >+++ b/drivers/scsi/sd.c
> >@@ -1492,6 +1492,7 @@ static int sd_revalidate_disk(struct gen
> > ordered = QUEUE_ORDERED_DRAIN;
> >
> > blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush);
> >+ printk("%s: ordered set to %d\n", disk->disk_name, ordered);
> >
> > set_capacity(disk, sdkp->capacity);
> > kfree(buffer);
>
> Here it is...
>
> Linux version 2.6.15-mm3 ([email protected]) (gcc version 4.1.0
> 20060106 (Red Hat 4.1.0-0.14)) #4 SMP Thu Jan 12 16:26:28 NZDT 2006
[--snip--]
> ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA
> mode
> ahci 0000:00:1f.2: flags: 64bit ncq led slum part
> ata1: SATA max UDMA/133 cmd 0xF8804D00 ctl 0x0 bmdma 0x0 irq 193
> ata2: SATA max UDMA/133 cmd 0xF8804D80 ctl 0x0 bmdma 0x0 irq 193
> ata3: SATA max UDMA/133 cmd 0xF8804E00 ctl 0x0 bmdma 0x0 irq 193
> ata4: SATA max UDMA/133 cmd 0xF8804E80 ctl 0x0 bmdma 0x0 irq 193
> ata1: SATA link up 1.5 Gbps (SStatus 113)
> ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> 88:007f
> ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata1: dev 0 configured for UDMA/133
> scsi0 : ahci
> ata2: SATA link up 1.5 Gbps (SStatus 113)
> ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> 88:007f
> ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata2: dev 0 configured for UDMA/133
> scsi1 : ahci
> ata3: SATA link up 1.5 Gbps (SStatus 113)
> ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> 88:007f
> ata3: dev 0 ATA-6, max UDMA/133, 156299375 sectors: LBA48
> ata3: dev 0 configured for UDMA/133
> scsi2 : ahci
> ata4: SATA link down (SStatus 0)
> scsi3 : ahci
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> Vendor: ATA Model: ST380013AS Rev: 3.18
> Type: Direct-Access ANSI SCSI revision: 05
> SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: drive cache: write back
> sda: ordered set to 49

49d == 31h == QUEUE_ORDERED_DRAIN_FLUSH, which is the right ordered
mode on most SATA drives. A barrier is performed by the sequence

drain -> pre-flush -> barrier -> post-flush

If something went wrong and the above sequence got stuck, the
first suspect would be the 'draining' step. I'm attaching a patch which
adds a bunch of debug messages regarding ordered sequencing. Can you
please apply the patch and post the log messages? The patch is against
v2.6.15-mm2.
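
To make the "49d == 31h" decoding concrete, here is a small userspace
sketch. The individual flag values are an assumption, recalled from the
block layer header of that era (include/linux/blkdev.h); only the fact
that 0x31 is QUEUE_ORDERED_DRAIN_FLUSH comes from this thread.
describe_ordered() is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/* Flag values assumed from the 2.6.16-era include/linux/blkdev.h. */
enum {
    QUEUE_ORDERED_NONE      = 0x00,
    QUEUE_ORDERED_DRAIN     = 0x01,
    QUEUE_ORDERED_TAG       = 0x02,
    QUEUE_ORDERED_PREFLUSH  = 0x10,
    QUEUE_ORDERED_POSTFLUSH = 0x20,
    QUEUE_ORDERED_FUA       = 0x40,
    QUEUE_ORDERED_DRAIN_FLUSH = QUEUE_ORDERED_DRAIN |
            QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH,
};

/* Hypothetical helper: spell out the steps implied by an ordered mode. */
static const char *describe_ordered(unsigned ordered, char *buf, size_t len)
{
    snprintf(buf, len, "%s%s%s%s",
             (ordered & QUEUE_ORDERED_TAG) ? "tag" :
             (ordered & QUEUE_ORDERED_DRAIN) ? "drain" : "none",
             (ordered & QUEUE_ORDERED_PREFLUSH) ? " -> pre-flush" : "",
             ordered ? " -> barrier" : "",
             (ordered & QUEUE_ORDERED_POSTFLUSH) ? " -> post-flush" : "");
    return buf;
}
```

Passing 49 yields the four-step sequence quoted above, which is a quick
way to sanity-check an "ordered set to N" line from dmesg.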

I've also tested almost the same setup here (two Maxtor SATA drives
hanging off AHCI and grouped into /dev/md0), and it works fine. With
the debug patch applied, it prints something like the following while
mounting /dev/md0 on boot.

[start_ordered ] f7695078 -> f75b5d9c,f75b5e40,f75b5ee4 infl=0
[start_ordered ] f6029540 0 9767359 8 8 1 1 f764b000
[start_ordered ] BIO f6029540 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695078->f75b5d9c
[elv_next_request ] f75b5d9c (pre)
[blk_do_ordered ] seq=04 f75b5e40->00000000
[end_that_request_last] !ELVPRIV f75b5d9c 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f75b5e40->f75b5e40
[elv_next_request ] f75b5e40 (bar)
[blk_do_ordered ] seq=08 f75b5ee4->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f6029540 9767359 4096
[end_that_request_last] !ELVPRIV f75b5e40 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f75b5ee4->f75b5ee4
[elv_next_request ] f75b5ee4 (post)
[end_that_request_last] !ELVPRIV f75b5ee4 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
[start_ordered ] f7695078 -> f7d03208,f7d032ac,f7d03350 infl=0
[start_ordered ] f60294a0 0 9767359 8 8 1 1 f764a000
[start_ordered ] BIO f60294a0 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695078->f7d03208
[elv_next_request ] f7d03208 (pre)
[blk_do_ordered ] seq=04 f7d032ac->00000000
[end_that_request_last] !ELVPRIV f7d03208 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f7d032ac->f7d032ac
[elv_next_request ] f7d032ac (bar)
[blk_do_ordered ] seq=08 f7d03350->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f60294a0 9767359 4096
[end_that_request_last] !ELVPRIV f7d032ac 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f7d03350->f7d03350
[elv_next_request ] f7d03350 (post)
kjournald starting. Commit interval 5 seconds
[end_that_request_last] !ELVPRIV f7d03350 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
<< /dev/md0 got mounted >>
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
[start_ordered ] f7695ee8 -> f75b5d9c,f75b5e40,f75b5ee4 infl=0
[start_ordered ] f60294a0 0 9767359 8 8 1 1 f764b000
[start_ordered ] BIO f60294a0 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695ee8->f75b5d9c
[elv_next_request ] f75b5d9c (pre)
[blk_do_ordered ] seq=04 f75b5e40->00000000
[end_that_request_last] !ELVPRIV f75b5d9c 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f75b5e40->f75b5e40
[elv_next_request ] f75b5e40 (bar)
[blk_do_ordered ] seq=08 f75b5ee4->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f60294a0 9767359 4096
[end_that_request_last] !ELVPRIV f75b5e40 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f75b5ee4->f75b5ee4
[elv_next_request ] f75b5ee4 (post)
[end_that_request_last] !ELVPRIV f75b5ee4 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
[start_ordered ] f7695ee8 -> f7d03208,f7d032ac,f7d03350 infl=0
[start_ordered ] f60294a0 0 9767359 8 8 1 1 f764a000
[start_ordered ] BIO f60294a0 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695ee8->f7d03208
[elv_next_request ] f7d03208 (pre)
[blk_do_ordered ] seq=04 f7d032ac->00000000
[end_that_request_last] !ELVPRIV f7d03208 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f7d032ac->f7d032ac
[elv_next_request ] f7d032ac (bar)
[blk_do_ordered ] seq=08 f7d03350->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f60294a0 9767359 4096
[end_that_request_last] !ELVPRIV f7d032ac 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f7d03350->f7d03350
[elv_next_request ] f7d03350 (post)
[end_that_request_last] !ELVPRIV f7d03350 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
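
For reading the trace above: the ordseq/seq pairs (03/04, 07/08, 0f/10)
are consistent with ordseq being a bitmask of finished steps and seq
being the lowest step not yet finished. The sketch below replays that
in userspace. The QUEUE_ORDSEQ_* values are an assumption recalled from
include/linux/blkdev.h of that era; cur_seq(), complete_seq() and
replay() are illustrative names, not the kernel functions.

```c
#include <stdio.h>

/* Sequence bits assumed from the 2.6.16-era include/linux/blkdev.h. */
enum {
    QUEUE_ORDSEQ_STARTED   = 0x01,
    QUEUE_ORDSEQ_DRAIN     = 0x02,
    QUEUE_ORDSEQ_PREFLUSH  = 0x04,
    QUEUE_ORDSEQ_BAR       = 0x08,
    QUEUE_ORDSEQ_POSTFLUSH = 0x10,
    QUEUE_ORDSEQ_DONE      = 0x20,
};

/* Current step = lowest bit not yet set (0 if no sequence is running). */
static unsigned cur_seq(unsigned ordseq)
{
    unsigned bit = 1;

    if (!ordseq)
        return 0;
    while (ordseq & bit)
        bit <<= 1;
    return bit;
}

/* Mark step `seq` as finished and return the updated mask. */
static unsigned complete_seq(unsigned ordseq, unsigned seq)
{
    return ordseq | seq;
}

/* Replay one barrier: infl=0 in the trace, so DRAIN is already done. */
static void replay(void)
{
    unsigned ordseq = QUEUE_ORDSEQ_STARTED | QUEUE_ORDSEQ_DRAIN;

    while (cur_seq(ordseq) != QUEUE_ORDSEQ_DONE) {
        unsigned seq = cur_seq(ordseq);

        printf("ordseq=%02x seq=%02x\n", ordseq, seq);
        ordseq = complete_seq(ordseq, seq);
    }
}
```

A hung barrier would show up as one of these steps never being
completed, so the last ordseq/seq pair in the log points at the stuck
stage.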


And the patch follows.


diff --git a/block/elevator.c b/block/elevator.c
index 1b5b5d9..2bab695 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -37,6 +37,8 @@

#include <asm/uaccess.h>

+#define pd(fmt, args...) printk("[%-20s] "fmt, __FUNCTION__ , ##args)
+
static DEFINE_SPINLOCK(elv_list_lock);
static LIST_HEAD(elv_list);

@@ -351,8 +353,10 @@ void __elv_add_request(request_queue_t *
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
- } else if (!(rq->flags & REQ_ELVPRIV) && where == ELEVATOR_INSERT_SORT)
+ } else if (!(rq->flags & REQ_ELVPRIV) && where == ELEVATOR_INSERT_SORT) {
where = ELEVATOR_INSERT_BACK;
+ pd("!ELVPRIV %p %08lx inserting back\n", rq, rq->flags);
+ }

if (plug)
blk_plug_device(q);
@@ -528,6 +532,11 @@ struct request *elv_next_request(request
}
}

+ if (rq && (rq == &q->pre_flush_rq || rq == &q->post_flush_rq ||
+ rq == &q->bar_rq))
+ pd("%p (%s)\n", rq,
+ rq == &q->pre_flush_rq ?
+ "pre" : (rq == &q->post_flush_rq ? "post" : "bar"));
return rq;
}

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index ec27dda..12396e0 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -28,6 +28,8 @@
#include <linux/writeback.h>
#include <linux/blktrace_api.h>

+#define pd(fmt, args...) printk("[%-20s] "fmt, __FUNCTION__ , ##args)
+
/*
* for max sense size
*/
@@ -303,6 +305,8 @@ static inline void rq_init(request_queue
int blk_queue_ordered(request_queue_t *q, unsigned ordered,
prepare_flush_fn *prepare_flush_fn)
{
+ pd("%x->%x, ordseq=%x\n", q->next_ordered, ordered, q->ordseq);
+
if (ordered & (QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH) &&
prepare_flush_fn == NULL) {
printk(KERN_ERR "blk_queue_ordered: prepare_flush_fn required\n");
@@ -380,6 +384,9 @@ void blk_ordered_complete_seq(request_qu
struct request *rq;
int uptodate;

+ pd("ordseq=%02x seq=%02x orderr=%d error=%d\n",
+ q->ordseq, seq, q->orderr, error);
+
if (error && !q->orderr)
q->orderr = error;

@@ -392,6 +399,7 @@ void blk_ordered_complete_seq(request_qu
/*
* Okay, sequence complete.
*/
+ pd("sequence complete\n");
rq = q->orig_bar_rq;
uptodate = q->orderr ? q->orderr : 1;

@@ -446,6 +454,17 @@ static void queue_flush(request_queue_t
static inline struct request *start_ordered(request_queue_t *q,
struct request *rq)
{
+ pd("%p -> %p,%p,%p infl=%u\n",
+ rq, &q->pre_flush_rq, &q->bar_rq, &q->post_flush_rq, q->in_flight);
+ pd("%p %d %llu %lu %u %u %u %p\n", rq->bio, rq->errors,
+ (unsigned long long)rq->hard_sector, rq->hard_nr_sectors,
+ rq->current_nr_sectors, rq->nr_phys_segments, rq->nr_hw_segments,
+ rq->buffer);
+ struct bio *bio;
+ for (bio = rq->bio; bio; bio = bio->bi_next)
+ pd("BIO %p %llu %u\n",
+ bio, (unsigned long long)bio->bi_sector, bio->bi_size);
+
q->bi_size = 0;
q->orderr = 0;
q->ordered = q->next_ordered;
@@ -484,6 +503,7 @@ static inline struct request *start_orde
} else
q->ordseq |= QUEUE_ORDSEQ_PREFLUSH;

+ pd("ordered=%x in_flight=%u\n", q->ordered, q->in_flight);
if ((q->ordered & QUEUE_ORDERED_TAG) || q->in_flight == 0)
q->ordseq |= QUEUE_ORDSEQ_DRAIN;
else
@@ -503,8 +523,10 @@ int blk_do_ordered(request_queue_t *q, s

if (q->next_ordered != QUEUE_ORDERED_NONE) {
*rqp = start_ordered(q, rq);
+ pd("start_ordered %p->%p\n", rq, *rqp);
return 1;
} else {
+ pd("ORDERED_NONE, seen barrier\n");
/*
* This can happen when the queue switches to
* ORDERED_NONE while this request is on it.
@@ -521,6 +543,7 @@ int blk_do_ordered(request_queue_t *q, s
if (q->ordered & QUEUE_ORDERED_TAG) {
if (is_barrier && rq != &q->bar_rq)
*rqp = NULL;
+ pd("seq=%02x %p->%p\n", blk_ordered_cur_seq(q), rq, *rqp);
return 1;
}

@@ -544,6 +567,7 @@ int blk_do_ordered(request_queue_t *q, s
rq == &q->post_flush_rq))
*rqp = NULL;

+ pd("seq=%02x %p->%p\n", blk_ordered_cur_seq(q), rq, *rqp);
return 1;
}

@@ -576,6 +600,8 @@ static int flush_dry_bio_endio(struct bi
bio->bi_sector -= (q->bi_size >> 9);
q->bi_size = 0;

+ pd("BIO %p %llu %u\n",
+ bio, (unsigned long long)bio->bi_sector, bio->bi_size);
return 0;
}

@@ -589,6 +615,7 @@ static inline int ordered_bio_endio(stru
if (&q->bar_rq != rq)
return 0;

+ pd("q->orderr=%d error=%d\n", q->orderr, error);
/*
* Okay, this is the barrier request in progress, dry finish it.
*/
@@ -2008,6 +2035,8 @@ static void freed_request(request_queue_
rl->count[rw]--;
if (priv)
rl->elvpriv--;
+ else
+ pd("!priv, count=%u,%u elvpriv=%u\n", rl->count[0], rl->count[1], rl->elvpriv);

__freed_request(q, rw);

@@ -2074,6 +2103,8 @@ static struct request *get_request(reque
priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
if (priv)
rl->elvpriv++;
+ else
+ pd("!priv, count=%u,%u elvpriv=%u\n", rl->count[0], rl->count[1], rl->elvpriv);

spin_unlock_irq(q->queue_lock);

@@ -2839,6 +2870,7 @@ static int __make_request(request_queue_

barrier = bio_barrier(bio);
if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
+ pd("ORDERED_NONE, seen barrier\n");
err = -EOPNOTSUPP;
goto end_io;
}
@@ -3394,6 +3426,9 @@ void end_that_request_last(struct reques
if (end_io_error(uptodate))
error = !uptodate ? -EIO : uptodate;

+ if (!(req->flags & REQ_ELVPRIV))
+ pd("!ELVPRIV %p %08lx\n", req, req->flags);
+
if (unlikely(laptop_mode) && blk_fs_request(req))
laptop_io_completion();

diff --git a/drivers/scsi/sata_sil24.c b/drivers/scsi/sata_sil24.c
index 529ae66..569dd1f 100644
--- a/drivers/scsi/sata_sil24.c
+++ b/drivers/scsi/sata_sil24.c
@@ -551,6 +551,10 @@ static void sil24_qc_prep(struct ata_que
BUG();
}

+ if (qc->tf.command == ATA_CMD_FLUSH_EXT) {
+ printk("sil24: corrupting flush\n");
+ qc->tf.command = 0x3D;
+ }
ata_tf_to_fis(&qc->tf, prb->fis, 0);

if (qc->flags & ATA_QCFLAG_DMAMAP)
@@ -564,6 +568,14 @@ static int sil24_qc_issue(struct ata_que
struct sil24_port_priv *pp = ap->private_data;
dma_addr_t paddr = pp->cmd_block_dma + qc->tag * sizeof(*pp->cmd_block);

+ {
+ struct ata_taskfile *tf = &qc->tf;
+ printk("sil24: issuing %02x lba=%02x%02x%02x%02x%02x%02x nsect=%02x%02x f=%02x d=%02x\n",
+ tf->command, tf->hob_lbah, tf->hob_lbam, tf->hob_lbal,
+ tf->lbah, tf->lbam, tf->lbal, tf->hob_nsect, tf->nsect,
+ tf->feature, tf->device);
+ }
+
writel((u32)paddr, port + PORT_CMD_ACTIVATE);
return 0;
}

2006-01-12 08:20:54

by Jens Axboe

Subject: Re: 2.6.15-mm2

On Thu, Jan 12 2006, Tejun Heo wrote:
> Hello, Reuben, Jens and all.
>
> On Thu, Jan 12, 2006 at 04:49:30PM +1300, Reuben Farrelly wrote:
> >
> >
> > On 12/01/2006 8:53 a.m., Jens Axboe wrote:
> > >On Wed, Jan 11 2006, Jens Axboe wrote:
> > >>At least it shows that the problem is indeed barrier related. I don't
> > >>have the start of this thread, so can you please send me the output from
> > >>dmesg from this kernel boot? I'm curious whether the fallback triggers,
> > >>or if it's the barrier that fails instead.
> > >
> > >Or even better, please boot with this patch applied on top of the kernel
> > >you just booted (the new one, with the md patch applied).
> > >
> > >diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > >index 4c5127e..07aee66 100644
> > >--- a/drivers/scsi/sd.c
> > >+++ b/drivers/scsi/sd.c
> > >@@ -1492,6 +1492,7 @@ static int sd_revalidate_disk(struct gen
> > > ordered = QUEUE_ORDERED_DRAIN;
> > >
> > > blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush);
> > >+ printk("%s: ordered set to %d\n", disk->disk_name, ordered);
> > >
> > > set_capacity(disk, sdkp->capacity);
> > > kfree(buffer);
> >
> > Here it is...
> >
> > Linux version 2.6.15-mm3 ([email protected]) (gcc version 4.1.0
> > 20060106 (Red Hat 4.1.0-0.14)) #4 SMP Thu Jan 12 16:26:28 NZDT 2006
> [--snip--]
> > ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA
> > mode
> > ahci 0000:00:1f.2: flags: 64bit ncq led slum part
> > ata1: SATA max UDMA/133 cmd 0xF8804D00 ctl 0x0 bmdma 0x0 irq 193
> > ata2: SATA max UDMA/133 cmd 0xF8804D80 ctl 0x0 bmdma 0x0 irq 193
> > ata3: SATA max UDMA/133 cmd 0xF8804E00 ctl 0x0 bmdma 0x0 irq 193
> > ata4: SATA max UDMA/133 cmd 0xF8804E80 ctl 0x0 bmdma 0x0 irq 193
> > ata1: SATA link up 1.5 Gbps (SStatus 113)
> > ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> > 88:007f
> > ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> > ata1: dev 0 configured for UDMA/133
> > scsi0 : ahci
> > ata2: SATA link up 1.5 Gbps (SStatus 113)
> > ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> > 88:007f
> > ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> > ata2: dev 0 configured for UDMA/133
> > scsi1 : ahci
> > ata3: SATA link up 1.5 Gbps (SStatus 113)
> > ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> > 88:007f
> > ata3: dev 0 ATA-6, max UDMA/133, 156299375 sectors: LBA48
> > ata3: dev 0 configured for UDMA/133
> > scsi2 : ahci
> > ata4: SATA link down (SStatus 0)
> > scsi3 : ahci
> > Vendor: ATA Model: ST380817AS Rev: 3.42
> > Type: Direct-Access ANSI SCSI revision: 05
> > Vendor: ATA Model: ST380817AS Rev: 3.42
> > Type: Direct-Access ANSI SCSI revision: 05
> > Vendor: ATA Model: ST380013AS Rev: 3.18
> > Type: Direct-Access ANSI SCSI revision: 05
> > SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> > sda: Write Protect is off
> > sda: Mode Sense: 00 3a 00 00
> > SCSI device sda: drive cache: write back
> > sda: ordered set to 49
>
> 49d == 31h == QUEUE_ORDERED_DRAIN_FLUSH which is the right ordered
> mode on most SATA drives. Barrier is performed by
>
> drain -> pre-flush -> barrier -> post-flush
>
> sequence. If something went wrong and above sequence got stuck, the
> first suspect part would be 'draining'. I'm attaching a patch which
> adds a bunch of debug messages regarding ordered sequencing. Can you
> please apply the patch and post the log message? The patch is against
> v2.6.15-mm2.
>
> I've also tested almost the same setup here - 2 maxtor SATA drives
> hanging off AHCI and grouped into /dev/md0 and it works fine here. It
> prints something like the following while mounting /dev/md0 on boot
> with the debug patch applied.

It works fine for me as well, both with SATA using the drain-flush
approach and with SCSI + FUA _and_ drain-flush (modified SD to set that
instead).

Thanks for looking at this Tejun, we badly need it fixed asap!

--
Jens Axboe

2006-01-12 10:59:11

by Ulrich Mueller

Subject: Re: 2.6.15-mm2

>>>>> On Wed, 11 Jan 2006, Dave Airlie wrote:

>> > How are we supposed to get DRM to work on PCI Express cards if DRM
>> > needs AGP and agpgart does not load when no AGP card is found ? :)
>>
>> That's puzzling. It should still be loadable. All the current
>> agpgart tree is doing is basically enforcing agp=off if there's no
>> agp card present. That shouldn't prevent the module from actually
>> loading, or its symbols being referenced by other modules.
>>
>> Hrmm, it's puzzling that you also are unable to resolve drm_open
>> and drm_release. That may be a follow-on failure from the first,
>> but it seems unlikely.

> That's just a cascaded failure, radeon gives out because drm won't
> load because agpgart won't load... there must be a reason why
> agpgart doesn't load... perhaps we've some issue when the backend
> isn't there or something..

Same problem here with 2.6.15-mm3:

$ lspci -s 00:02.0 -v
00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (prog-if 00 [VGA])
Subsystem: Hewlett-Packard Company nx6110/nc6120
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
I/O ports at 7000 [size=8]
Memory at c0000000 (32-bit, prefetchable) [size=256M]
Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
Capabilities: [d0] Power Management version 2
$ modprobe -v agpgart
insmod /lib/modules/2.6.15-mm3/kernel/drivers/char/agp/agpgart.ko
FATAL: Error inserting agpgart (/lib/modules/2.6.15-mm3/kernel/drivers/char/agp/agpgart.ko): No such device

And as a consequence intel-agp, drm, and i915 cannot be loaded.

The problem disappears if I revert git-agpgart.patch.

2006-01-12 11:18:55

by Tejun Heo

Subject: Re: 2.6.15-mm2

On Thu, Jan 12, 2006 at 09:38:48PM +1300, Reuben Farrelly wrote:
[--snip--]
> [start_ordered ] f7e8a708 -> c1b028fc,c1b029a4,c1b02a4c infl=1
> [start_ordered ] f74b0e00 0 48869571 8 8 1 1 c1ba9000
> [start_ordered ] BIO f74b0e00 48869571 4096
> [start_ordered ] ordered=31 in_flight=1
> [blk_do_ordered ] start_ordered f7e8a708->00000000
> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
> [blk_do_ordered ] seq=02 c1b028fc->00000000
> [blk_do_ordered ] seq=02 c1b028fc->00000000
> [blk_do_ordered ] seq=02 c1b028fc->00000000

Yeap, this one is the offending one. 0xf74ccd98 got requeued in front
of pre-flush while draining and when it finished it didn't complete
draining thus hanging the queue. It seems like it's some kind of
special request which probably fails and got retried. Are you using
SMART or something which issues special commands to drives?

> [start_ordered ] c1b53660 -> c1b021c4,c1b0226c,c1b02314 infl=1
> [start_ordered ] f7e58d80 0 68436682 8 8 1 1 c1bbd000
> [start_ordered ] BIO f7e58d80 68436682 4096
> [start_ordered ] ordered=31 in_flight=1
> [blk_do_ordered ] start_ordered c1b53660->00000000
> [blk_ordered_complete_seq] ordseq=01 seq=02 orderr=0 error=0
> [blk_do_ordered ] seq=04 c1b021c4->c1b021c4
> [elv_next_request ] c1b021c4 (pre)
> [blk_do_ordered ] seq=04 c1b0226c->00000000
> [blk_do_ordered ] seq=04 c1b0226c->00000000
> [end_that_request_last] !ELVPRIV c1b021c4 02002318
> [blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
> [blk_do_ordered ] seq=08 c1b0226c->c1b0226c
> [elv_next_request ] c1b0226c (bar)
> [blk_do_ordered ] seq=08 c1b02314->00000000
> [ordered_bio_endio ] q->orderr=0 error=0
> [flush_dry_bio_endio ] BIO f7e58d80 68436682 4096
> [end_that_request_last] !ELVPRIV c1b0226c 000003d9
> [blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
> [blk_do_ordered ] seq=10 c1b02314->c1b02314
> [elv_next_request ] c1b02314 (post)
> [end_that_request_last] !ELVPRIV c1b02314 02002318
> [blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
> [blk_ordered_complete_seq] sequence complete
>

Can you please try the following debug patch. I've added a few more
debug messages to make things clearer.

diff --git a/block/elevator.c b/block/elevator.c
index 1b5b5d9..a0075aa 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -37,6 +37,9 @@

#include <asm/uaccess.h>

+#define pd(fmt, args...) printk("[%02d %-24s] "fmt, q->id, __FUNCTION__ , ##args)
+#define pd0(fmt, args...) printk("[na %-24s] "fmt, __FUNCTION__ , ##args)
+
static DEFINE_SPINLOCK(elv_list_lock);
static LIST_HEAD(elv_list);

@@ -296,6 +299,9 @@ void elv_requeue_request(request_queue_t
* it already went through dequeue, we need to decrement the
* in_flight count again
*/
+ if (q->ordseq)
+ pd("ordseq=%02x requeueing %p (flags=0x%lx) infl=%u\n",
+ q->ordseq, rq, rq->flags, q->in_flight);
if (blk_account_rq(rq)) {
q->in_flight--;
if (blk_sorted_rq(rq) && e->ops->elevator_deactivate_req_fn)
@@ -351,8 +357,10 @@ void __elv_add_request(request_queue_t *
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
- } else if (!(rq->flags & REQ_ELVPRIV) && where == ELEVATOR_INSERT_SORT)
+ } else if (!(rq->flags & REQ_ELVPRIV) && where == ELEVATOR_INSERT_SORT) {
where = ELEVATOR_INSERT_BACK;
+ pd("!ELVPRIV %p %08lx inserting back\n", rq, rq->flags);
+ }

if (plug)
blk_plug_device(q);
@@ -528,6 +536,11 @@ struct request *elv_next_request(request
}
}

+ if (rq && (rq == &q->pre_flush_rq || rq == &q->post_flush_rq ||
+ rq == &q->bar_rq))
+ pd("%p (%s)\n", rq,
+ rq == &q->pre_flush_rq ?
+ "pre" : (rq == &q->post_flush_rq ? "post" : "bar"));
return rq;
}

@@ -623,6 +636,9 @@ void elv_completed_request(request_queue
* Check if the queue is waiting for fs requests to be
* drained for flush sequence.
*/
+ if (q->ordseq)
+ pd("seq=%02x rq=%p (flags=0x%lx) infl=%u\n",
+ q->ordseq, rq, rq->flags, q->in_flight);
if (q->ordseq && q->in_flight == 0 &&
blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN &&
blk_ordered_req_seq(first_rq) > QUEUE_ORDSEQ_DRAIN) {
@@ -632,7 +648,9 @@ void elv_completed_request(request_queue

if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn)
e->ops->elevator_completed_req_fn(q, rq);
- }
+ } else if (q->ordseq)
+ pd("seq=%02x unacc %p (flags=0x%lx) infl=%u\n",
+ q->ordseq, rq, rq->flags, q->in_flight);
}

int elv_register_queue(struct request_queue *q)
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index ec27dda..494fe39 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -28,6 +28,9 @@
#include <linux/writeback.h>
#include <linux/blktrace_api.h>

+#define pd(fmt, args...) printk("[%02d %-24s] "fmt, q->id, __FUNCTION__ , ##args)
+#define pd0(fmt, args...) printk("[na %-24s] "fmt, __FUNCTION__ , ##args)
+
/*
* for max sense size
*/
@@ -303,6 +306,8 @@ static inline void rq_init(request_queue
int blk_queue_ordered(request_queue_t *q, unsigned ordered,
prepare_flush_fn *prepare_flush_fn)
{
+ pd("%x->%x, ordseq=%x\n", q->next_ordered, ordered, q->ordseq);
+
if (ordered & (QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH) &&
prepare_flush_fn == NULL) {
printk(KERN_ERR "blk_queue_ordered: prepare_flush_fn required\n");
@@ -380,6 +385,9 @@ void blk_ordered_complete_seq(request_qu
struct request *rq;
int uptodate;

+ pd("ordseq=%02x seq=%02x orderr=%d error=%d\n",
+ q->ordseq, seq, q->orderr, error);
+
if (error && !q->orderr)
q->orderr = error;

@@ -392,6 +400,7 @@ void blk_ordered_complete_seq(request_qu
/*
* Okay, sequence complete.
*/
+ pd("sequence complete\n");
rq = q->orig_bar_rq;
uptodate = q->orderr ? q->orderr : 1;

@@ -446,6 +455,18 @@ static void queue_flush(request_queue_t
static inline struct request *start_ordered(request_queue_t *q,
struct request *rq)
{
+ pd("%p -> %p,%p,%p ordcolor=%d infl=%u \n",
+ rq, &q->pre_flush_rq, &q->bar_rq, &q->post_flush_rq,
+ q->ordcolor, q->in_flight);
+ pd("%p %d %llu %lu %u %u %u %p\n", rq->bio, rq->errors,
+ (unsigned long long)rq->hard_sector, rq->hard_nr_sectors,
+ rq->current_nr_sectors, rq->nr_phys_segments, rq->nr_hw_segments,
+ rq->buffer);
+ struct bio *bio;
+ for (bio = rq->bio; bio; bio = bio->bi_next)
+ pd("BIO %p %llu %u\n",
+ bio, (unsigned long long)bio->bi_sector, bio->bi_size);
+
q->bi_size = 0;
q->orderr = 0;
q->ordered = q->next_ordered;
@@ -484,6 +505,7 @@ static inline struct request *start_orde
} else
q->ordseq |= QUEUE_ORDSEQ_PREFLUSH;

+ pd("ordered=%x in_flight=%u\n", q->ordered, q->in_flight);
if ((q->ordered & QUEUE_ORDERED_TAG) || q->in_flight == 0)
q->ordseq |= QUEUE_ORDSEQ_DRAIN;
else
@@ -503,8 +525,10 @@ int blk_do_ordered(request_queue_t *q, s

if (q->next_ordered != QUEUE_ORDERED_NONE) {
*rqp = start_ordered(q, rq);
+ pd("start_ordered %p->%p\n", rq, *rqp);
return 1;
} else {
+ pd("ORDERED_NONE, seen barrier\n");
/*
* This can happen when the queue switches to
* ORDERED_NONE while this request is on it.
@@ -521,6 +545,8 @@ int blk_do_ordered(request_queue_t *q, s
if (q->ordered & QUEUE_ORDERED_TAG) {
if (is_barrier && rq != &q->bar_rq)
*rqp = NULL;
+ pd("seq=%02x %p->%p (flag=0x%lx)\n",
+ blk_ordered_cur_seq(q), rq, *rqp, *rqp ? (*rqp)->flags : 0);
return 1;
}

@@ -544,6 +570,8 @@ int blk_do_ordered(request_queue_t *q, s
rq == &q->post_flush_rq))
*rqp = NULL;

+ pd("seq=%02x %p->%p (flags=0x%lx)\n",
+ blk_ordered_cur_seq(q), rq, *rqp, *rqp ? (*rqp)->flags : 0);
return 1;
}

@@ -576,6 +604,8 @@ static int flush_dry_bio_endio(struct bi
bio->bi_sector -= (q->bi_size >> 9);
q->bi_size = 0;

+ pd0("BIO %p %llu %u\n",
+ bio, (unsigned long long)bio->bi_sector, bio->bi_size);
return 0;
}

@@ -589,6 +619,7 @@ static inline int ordered_bio_endio(stru
if (&q->bar_rq != rq)
return 0;

+ pd("q->orderr=%d error=%d\n", q->orderr, error);
/*
* Okay, this is the barrier request in progress, dry finish it.
*/
@@ -1858,6 +1889,11 @@ blk_init_queue_node(request_fn_proc *rfn
if (!q)
return NULL;

+ {
+ static int qid;
+ q->id = qid++;
+ }
+
q->node = node_id;
if (blk_init_free_list(q))
goto out_init;
@@ -2008,6 +2044,8 @@ static void freed_request(request_queue_
rl->count[rw]--;
if (priv)
rl->elvpriv--;
+ else
+ pd("!priv, count=%u,%u elvpriv=%u\n", rl->count[0], rl->count[1], rl->elvpriv);

__freed_request(q, rw);

@@ -2074,6 +2112,8 @@ static struct request *get_request(reque
priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
if (priv)
rl->elvpriv++;
+ else
+ pd("!priv, count=%u,%u elvpriv=%u\n", rl->count[0], rl->count[1], rl->elvpriv);

spin_unlock_irq(q->queue_lock);

@@ -2839,6 +2879,7 @@ static int __make_request(request_queue_

barrier = bio_barrier(bio);
if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
+ pd("ORDERED_NONE, seen barrier\n");
err = -EOPNOTSUPP;
goto end_io;
}
@@ -3394,6 +3435,9 @@ void end_that_request_last(struct reques
if (end_io_error(uptodate))
error = !uptodate ? -EIO : uptodate;

+ if (!(req->flags & REQ_ELVPRIV))
+ pd0("!ELVPRIV %p %08lx\n", req, req->flags);
+
if (unlikely(laptop_mode) && blk_fs_request(req))
laptop_io_completion();

diff --git a/drivers/scsi/sata_sil24.c b/drivers/scsi/sata_sil24.c
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 991a5ca..e7a5df1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -427,6 +427,8 @@ struct request_queue
struct request *orig_bar_rq;
unsigned int bi_size;
struct blk_trace *blk_trace;
+
+ int id;
};

#define RQ_INACTIVE (-1)

2006-01-12 12:05:06

by Reuben Farrelly

Subject: Re: 2.6.15-mm2



On 13/01/2006 12:18 a.m., Tejun Heo wrote:
> On Thu, Jan 12, 2006 at 09:38:48PM +1300, Reuben Farrelly wrote:
> [--snip--]
>> [start_ordered ] f7e8a708 -> c1b028fc,c1b029a4,c1b02a4c infl=1
>> [start_ordered ] f74b0e00 0 48869571 8 8 1 1 c1ba9000
>> [start_ordered ] BIO f74b0e00 48869571 4096
>> [start_ordered ] ordered=31 in_flight=1
>> [blk_do_ordered ] start_ordered f7e8a708->00000000
>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>
> Yeap, this one is the offending one. 0xf74ccd98 got requeued in front
> of pre-flush while draining and when it finished it didn't complete
> draining thus hanging the queue. It seems like it's some kind of
> special request which probably fails and got retried. Are you using
> SMART or something which issues special commands to drives?

No SMART, although I should be (rebuilt the system a few months ago..and must
have missed it).

Are there any other things which could be contributing to this? <scratches head>

> Can you please try the following debug patch. I've added a few more
> debug messages to make things clearer.
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 1b5b5d9..a0075aa 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -37,6 +37,9 @@
>
> #include <asm/uaccess.h>
>
> +#define pd(fmt, args...) printk("[%02d %-24s] "fmt, q->id, __FUNCTION__ , ##args)
> +#define pd0(fmt, args...) printk("[na %-24s] "fmt, __FUNCTION__ , ##args)
> +
> static DEFINE_SPINLOCK(elv_list_lock);
> static LIST_HEAD(elv_list);

I'm applying to -mm3 - applies with some fuzz.

Here's the last few lines:

[01 elv_completed_request ] seq=03 unacc c1b351c4 (flags=0x2002318) infl=0
[01 blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[01 blk_do_ordered ] seq=08 c1b3526c->c1b3526c (flags=0xd9)
[01 elv_next_request ] c1b3526c (bar)
[01 blk_do_ordered ] seq=08 c1b35314->00000000 (flags=0x0)
[01 ordered_bio_endio ] q->orderr=0 error=0
[na flush_dry_bio_endio ] BIO f7eb95c0 48869571 4096
[na end_that_request_last ] !ELVPRIV c1b3526c 000003d9
[01 elv_completed_request ] seq=07 rq=c1b3526c (flags=0x3d9) infl=0
[01 blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[01 blk_do_ordered ] seq=10 c1b35314->c1b35314 (flags=0x2002018)
[01 elv_next_request ] c1b35314 (post)
[na end_that_request_last ] !ELVPRIV c1b35314 02002318
[01 elv_completed_request ] seq=0f unacc c1b35314 (flags=0x2002318) infl=0
[01 blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[01 blk_ordered_complete_seq] sequence complete
[02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
[01 start_ordered ] f7dd93c0 -> c1b351c4,c1b3526c,c1b35314 ordcolor=1 infl=0
[01 start_ordered ] f7e93d80 0 69641528 8 8 1 1 c1ba7000
[01 start_ordered ] BIO f7e93d80 69641528 4096
[01 start_ordered ] ordered=31 in_flight=0
[01 blk_do_ordered ] start_ordered f7dd93c0->c1b351c4
[01 elv_next_request ] c1b351c4 (pre)
[01 blk_do_ordered ] seq=04 c1b3526c->00000000 (flags=0x0)
[na end_that_request_last ] !ELVPRIV c1b351c4 02002318
[01 elv_completed_request ] seq=03 unacc c1b351c4 (flags=0x2002318) infl=0
[01 blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[01 blk_do_ordered ] seq=08 c1b3526c->c1b3526c (flags=0xd9)
[01 elv_next_request ] c1b3526c (bar)
[01 blk_do_ordered ] seq=08 c1b35314->00000000 (flags=0x0)
[01 ordered_bio_endio ] q->orderr=0 error=0
[na flush_dry_bio_endio ] BIO f7e93d80 69641528 4096
[na end_that_request_last ] !ELVPRIV c1b3526c 000003d9
[01 elv_completed_request ] seq=07 rq=c1b3526c (flags=0x3d9) infl=0
[01 blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[01 blk_do_ordered ] seq=10 c1b35314->c1b35314 (flags=0x2002018)
[01 elv_next_request ] c1b35314 (post)
[na end_that_request_last ] !ELVPRIV c1b35314 02002318
[01 elv_completed_request ] seq=0f unacc c1b35314 (flags=0x2002318) infl=0
[01 blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[01 blk_ordered_complete_seq] sequence complete
[02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
[01 start_ordered ] f7dd93c0 -> c1b351c4,c1b3526c,c1b35314 ordcolor=1 infl=0
[01 start_ordered ] f7e938c0 0 69641536 8 8 1 1 f7dae000
[01 start_ordered ] BIO f7e938c0 69641536 4096
[01 start_ordered ] ordered=31 in_flight=0
[01 blk_do_ordered ] start_ordered f7dd93c0->c1b351c4
[01 elv_next_request ] c1b351c4 (pre)
[01 blk_do_ordered ] seq=04 c1b3526c->00000000 (flags=0x0)
[na end_that_request_last ] !ELVPRIV c1b351c4 02002318
[01 elv_completed_request ] seq=03 unacc c1b351c4 (flags=0x2002318) infl=0
[01 blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[01 blk_do_ordered ] seq=08 c1b3526c->c1b3526c (flags=0xd9)
[01 elv_next_request ] c1b3526c (bar)
[01 blk_do_ordered ] seq=08 c1b35314->00000000 (flags=0x0)
[01 ordered_bio_endio ] q->orderr=0 error=0
[na flush_dry_bio_endio ] BIO f7e938c0 69641536 4096
[na end_that_request_last ] !ELVPRIV c1b3526c 000003d9
[01 elv_completed_request ] seq=07 rq=c1b3526c (flags=0x3d9) infl=0
[01 blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[01 blk_do_ordered ] seq=10 c1b35314->c1b35314 (flags=0x2002018)
[01 elv_next_request ] c1b35314 (post)
[na end_that_request_last ] !ELVPRIV c1b35314 02002318
[01 elv_completed_request ] seq=0f unacc c1b35314 (flags=0x2002318) infl=0
[01 blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[01 blk_ordered_complete_seq] sequence complete

The full 300k file is up at http://www.reub.net/files/kernel/ since it's too
big to send to everyone.

reuben




2006-01-12 12:34:13

by Ric Wheeler

Subject: Re: 2.6.15-mm2

Reuben Farrelly wrote:

>
>
> On 13/01/2006 12:18 a.m., Tejun Heo wrote:
>
>> On Thu, Jan 12, 2006 at 09:38:48PM +1300, Reuben Farrelly wrote:
>> [--snip--]
>>
>>> [start_ordered ] f7e8a708 -> c1b028fc,c1b029a4,c1b02a4c infl=1
>>> [start_ordered ] f74b0e00 0 48869571 8 8 1 1 c1ba9000
>>> [start_ordered ] BIO f74b0e00 48869571 4096
>>> [start_ordered ] ordered=31 in_flight=1
>>> [blk_do_ordered ] start_ordered f7e8a708->00000000
>>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>
>>
>> Yeap, this one is the offending one. 0xf74ccd98 got requeued in front
>> of pre-flush while draining and when it finished it didn't complete
>> draining thus hanging the queue. It seems like it's some kind of
>> special request which probably fails and got retried. Are you using
>> SMART or something which issues special commands to drives?
>
>
> No SMART, although I should be (rebuilt the system a few months
> ago..and must
> have missed it).
>
> Are there any other things which could be contributing to this?
> <scratches head>
>
Could this be hdparm or something tweaking the drive write cache
settings, etc?

ric

2006-01-12 12:39:21

by Reuben Farrelly

Subject: Re: 2.6.15-mm2



On 13/01/2006 1:31 a.m., Ric Wheeler wrote:
> Reuben Farrelly wrote:
>> On 13/01/2006 12:18 a.m., Tejun Heo wrote:
>>> On Thu, Jan 12, 2006 at 09:38:48PM +1300, Reuben Farrelly wrote:
>>> [--snip--]
>>>
>>>> [start_ordered ] f7e8a708 -> c1b028fc,c1b029a4,c1b02a4c infl=1
>>>> [start_ordered ] f74b0e00 0 48869571 8 8 1 1 c1ba9000
>>>> [start_ordered ] BIO f74b0e00 48869571 4096
>>>> [start_ordered ] ordered=31 in_flight=1
>>>> [blk_do_ordered ] start_ordered f7e8a708->00000000
>>>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>>>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>>
>>>
>>> Yeap, this one is the offending one. 0xf74ccd98 got requeued in front
>>> of pre-flush while draining and when it finished it didn't complete
>>> draining thus hanging the queue. It seems like it's some kind of
>>> special request which probably fails and got retried. Are you using
>>> SMART or something which issues special commands to drives?
>>
>>
>> No SMART, although I should be (rebuilt the system a few months
>> ago..and must
>> have missed it).
>>
>> Are there any other things which could be contributing to this?
>> <scratches head>
>>
> Could this be hdparm or something tweaking the drive write cache
> settings, etc?

hdparm isn't configured on the box by me or called by initscripts in Fedora
either, AFAIK.

reuben

2006-01-12 13:55:41

by Tejun Heo

Subject: Re: 2.6.15-mm2

Hello, again.

On Fri, Jan 13, 2006 at 01:39:18AM +1300, Reuben Farrelly wrote:
>
>
> On 13/01/2006 1:31 a.m., Ric Wheeler wrote:
> >Reuben Farrelly wrote:
> >>On 13/01/2006 12:18 a.m., Tejun Heo wrote:
> >>>On Thu, Jan 12, 2006 at 09:38:48PM +1300, Reuben Farrelly wrote:
> >>>[--snip--]
> >>>
> >>>>[start_ordered ] f7e8a708 -> c1b028fc,c1b029a4,c1b02a4c infl=1
> >>>>[start_ordered ] f74b0e00 0 48869571 8 8 1 1 c1ba9000
> >>>>[start_ordered ] BIO f74b0e00 48869571 4096
> >>>>[start_ordered ] ordered=31 in_flight=1
> >>>>[blk_do_ordered ] start_ordered f7e8a708->00000000
> >>>>[blk_do_ordered ] seq=02 f74ccd98->f74ccd98
> >>>>[blk_do_ordered ] seq=02 f74ccd98->f74ccd98
> >>>>[blk_do_ordered ] seq=02 c1b028fc->00000000
> >>>>[blk_do_ordered ] seq=02 c1b028fc->00000000
> >>>>[blk_do_ordered ] seq=02 c1b028fc->00000000
> >>>
> >>>
> >>>Yeap, this one is the offending one. 0xf74ccd98 got requeued in front
> >>>of pre-flush while draining and when it finished it didn't complete
> >>>draining thus hanging the queue. It seems like it's some kind of
> >>>special request which probably fails and got retried. Are you using
> >>>SMART or something which issues special commands to drives?
> >>
> >>
> >>No SMART, although I should be (rebuilt the system a few months
> >>ago..and must
> >>have missed it).
> >>
> >>Are there any other things which could be contributing to this?
> >><scratches head>
> >>
> >Could this be hdparm or something tweaking the drive write cache
> >settings, etc?
>
> hdparm isn't configured on the box by me or called by initscripts in Fedora
> either, AFAIK.
>

This is the offending part of your new log.

[02 start_ordered ] c1b36120 -> c1b35904,c1b359ac,c1b35a54 ordcolor=1 infl=1
[02 start_ordered ] f7eb91c0 0 68436682 8 8 1 1 f7dc0000
[02 start_ordered ] BIO f7eb91c0 68436682 4096
[02 start_ordered ] ordered=31 in_flight=1
[02 blk_do_ordered ] start_ordered c1b36120->00000000
[02 blk_do_ordered ] seq=02 f7e53660->f7e53660 (flags=0x32888)
[02 elv_completed_request ] seq=01 rq=f7dd7ba0 (flags=0x2000b44) infl=0
[02 blk_do_ordered ] seq=02 f7e53660->f7e53660 (flags=0x32b88)
[02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
[na flush_dry_bio_endio ] BIO c19c7580 48869579 4096
[na end_that_request_last ] !ELVPRIV c1b3526c 000003d9
[02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
[02 elv_completed_request ] seq=01 unacc f7e53660 (flags=0x32b88) infl=0
[na end_that_request_last ] !ELVPRIV c1b35314 02002318
[02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)

And I was wrong; it wasn't a special command being requeued. What
happens here is:

1. fs requests are happily being processed

2. barrier request comes at the head of the queue

3. ordered code translates it into a three-request sequence; a fs
request is still in flight, so it waits for the queue to be drained.

4. a REQ_SPECIAL | REQ_BLOCK_PC | REQ_QUIET request gets queued at
the head of the queue. (I have no idea where this comes from. sd
driver doesn't even handle PC requests. It will be just failed.
Some kind of hardware management stuff trying to probe MMC
devices?)

5. the in-flight fs request finishes, in_flight is now zero but the
head of queue is not the ordered sequence. It determines draining
isn't complete yet.

6. the special request from #4 got issued and completed, but due to
my stupid mistake, special requests don't check for draining
completion condition.

7. The queue is stuck now. SORRY. My apologies.

Reuben, can you please test the following patch? It's against -mm2
but should apply to -mm3 too. If you confirm this one, I'll submit to
Jens & Andrew with proper explanations and stuff. Thanks a lot for
all your time and trouble.


diff --git a/block/elevator.c b/block/elevator.c
index 1b5b5d9..f905e47 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -615,23 +615,23 @@ void elv_completed_request(request_queue
* request is released from the driver, io must be done
*/
if (blk_account_rq(rq)) {
- struct request *first_rq = list_entry_rq(q->queue_head.next);
-
q->in_flight--;
+ if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn)
+ e->ops->elevator_completed_req_fn(q, rq);
+ }

- /*
- * Check if the queue is waiting for fs requests to be
- * drained for flush sequence.
- */
- if (q->ordseq && q->in_flight == 0 &&
+ /*
+ * Check if the queue is waiting for fs requests to be
+ * drained for flush sequence.
+ */
+ if (unlikely(q->ordseq)) {
+ struct request *first_rq = list_entry_rq(q->queue_head.next);
+ if (q->in_flight == 0 &&
blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN &&
blk_ordered_req_seq(first_rq) > QUEUE_ORDSEQ_DRAIN) {
blk_ordered_complete_seq(q, QUEUE_ORDSEQ_DRAIN, 0);
q->request_fn(q);
}
-
- if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn)
- e->ops->elevator_completed_req_fn(q, rq);
}
}

2006-01-12 14:10:36

by Jens Axboe

Subject: Re: 2.6.15-mm2

On Thu, Jan 12 2006, Tejun Heo wrote:
> 4. a REQ_SPECIAL | REQ_BLOCK_PC | REQ_QUIET request gets queued at
> the head of the queue. (I have no idea where this comes from. sd
> driver doesn't even handle PC requests. It will be just failed.
> Some kind of hardware management stuff trying to probe MMC
> devices?)

But it does, sd understands these just fine (see references to
blk_pc_request()).

It could be coming from someone doing a blkdev_issue_flush, that will
even cause sd to queue such a request internally. So it isn't
necessarily from user space (it would have to be through SG_IO at that
point), and Reuben's boot log doesn't have any evidence of anything of
that nature being started. So I'm guessing it's the flush, raid1
propagates these flushes to the bottom devices when it sees one.

Your analysis looks correct though. Reuben, looking forward to hearing
whether this fixes your boot hang!

--
Jens Axboe

2006-01-12 14:20:35

by Tejun Heo

Subject: Re: 2.6.15-mm2

Jens Axboe wrote:
> On Thu, Jan 12 2006, Tejun Heo wrote:
>
>>4. a REQ_SPECIAL | REQ_BLOCK_PC | REQ_QUIET request gets queued at
>> the head of the queue. (I have no idea where this comes from. sd
>> driver doesn't even handle PC requests. It will be just failed.
>> Some kind of hardware management stuff trying to probe MMC
>> devices?)
>
>
> But it does, sd understands these just fine (see references to
> blk_pc_request()).
>
> It could be coming from someone doing a blkdev_issue_flush, that will
> even cause sd to queue such a request internally. So it isn't
> necessarily from user space (it would have to be through SG_IO at that
> point), and Reuben's boot log doesn't have any evidence of anything of
> that nature being started. So I'm guessing it's the flush, raid1
> propagates these flushes to the bottom devices when it sees one.
>
> Your analysis looks correct though. Reuben, looking forward to hearing
> whether this fixes your boot hang!
>

Ah... you're right. I was only staring at the !blk_fs_request() test in
sd_init_command(). It has early exit for blk_pc_request() of course.
It needs to handle SG_IO's & flushes. :-p

--
tejun

2006-01-12 17:11:56

by Dave Jones

Subject: Re: 2.6.15-mm2

On Thu, Jan 12, 2006 at 11:58:31AM +0100, Ulrich Mueller wrote:

> $ lspci -s 00:02.0 -v
> 00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (prog-if 00 [VGA])
> Subsystem: Hewlett-Packard Company nx6110/nc6120
> Flags: bus master, fast devsel, latency 0, IRQ 16
> Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
> I/O ports at 7000 [size=8]
> Memory at c0000000 (32-bit, prefetchable) [size=256M]
> Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
> Capabilities: [d0] Power Management version 2

Another one that advertises no AGP capabilities.
In this situation you shouldn't *need* agpgart. If it's PCI[E],
radeon will use pcigart.

Dave
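
For reference, "advertises no AGP capabilities" refers to the PCI
capability list in the lspci output above: its only entry is Power
Management at offset d0. Below is a minimal user-space sketch of that
check. It walks the capability list in a 256-byte config-space image and
looks for the AGP capability ID (0x02). The register offsets and IDs are
the standard PCI ones; the function names and the example image (a single
PM capability at 0xd0, mirroring the lspci output) are made up for
illustration and are not taken from agpgart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_STATUS           0x06  /* status register (low byte) */
#define PCI_STATUS_CAP_LIST  0x10  /* device implements a capability list */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability */
#define PCI_CAP_ID_PM        0x01  /* Power Management */
#define PCI_CAP_ID_AGP       0x02  /* Accelerated Graphics Port */

/*
 * Hypothetical helper: return the config-space offset of capability
 * cap_id, or 0 if the device does not advertise it.
 */
uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    assert(cfg != NULL);

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    int ttl = 48;  /* bound the walk in case the list loops */

    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xfc;  /* next-capability pointer */
    }
    return 0;
}

/*
 * Illustrative image only: one PM capability at 0xd0 and nothing else,
 * as in the lspci output quoted above.
 */
void make_example_config(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST;
    cfg[PCI_CAPABILITY_LIST] = 0xd0;
    cfg[0xd0] = PCI_CAP_ID_PM;
    cfg[0xd1] = 0x00;  /* end of list */
}
```

With the example image, find_capability(cfg, PCI_CAP_ID_PM) finds the
entry at 0xd0 and find_capability(cfg, PCI_CAP_ID_AGP) returns 0, which
is the situation an AGP-capability test would see on this device.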

2006-01-12 18:11:41

by Ulrich Mueller

[permalink] [raw]
Subject: Re: 2.6.15-mm2

>>>>> On Thu, 12 Jan 2006, Dave Jones wrote:

>> $ lspci -s 00:02.0 -v
>> 00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (prog-if 00 [VGA])
>> Subsystem: Hewlett-Packard Company nx6110/nc6120
>> Flags: bus master, fast devsel, latency 0, IRQ 16
>> Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
>> I/O ports at 7000 [size=8]
>> Memory at c0000000 (32-bit, prefetchable) [size=256M]
>> Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
>> Capabilities: [d0] Power Management version 2

> Another one that advertises no AGP capabilities.
> In this situation you shouldn't *need* agpgart. If it's PCI[E],
> radeon will use pcigart.

Problem is that i915 depends on DRM && AGP && AGP_INTEL.
And at the end of i{810,830,915}_dma.c there is the comment:
"All Intel graphics chipsets are treated as AGP, even if they are
really PCI-e."

2006-01-12 19:12:55

by Brice Goglin

[permalink] [raw]
Subject: Re: 2.6.15-mm2

Dave Jones wrote:

>On Thu, Jan 12, 2006 at 11:58:31AM +0100, Ulrich Mueller wrote:
>
> > $ lspci -s 00:02.0 -v
> > 00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (prog-if 00 [VGA])
> > Subsystem: Hewlett-Packard Company nx6110/nc6120
> > Flags: bus master, fast devsel, latency 0, IRQ 16
> > Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
> > I/O ports at 7000 [size=8]
> > Memory at c0000000 (32-bit, prefetchable) [size=256M]
> > Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
> > Capabilities: [d0] Power Management version 2
>
>Another one that advertises no AGP capabilities.
>In this situation you shouldn't *need* agpgart. If it's PCI[E],
>radeon will use pcigart.
>
>
Is this supposed to work soon? Looking at all the "agp_foo" symbols in the
drm module, there might be lots of work to do first, right? In this case, it
might be good to still be able to load agpgart on PCI-E machines for a
while?

Brice

2006-01-12 19:22:06

by Dave Jones

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Thu, Jan 12, 2006 at 02:12:23PM -0500, Brice Goglin wrote:
> Dave Jones wrote:
>
> >On Thu, Jan 12, 2006 at 11:58:31AM +0100, Ulrich Mueller wrote:
> >
> > > $ lspci -s 00:02.0 -v
> > > 00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (prog-if 00 [VGA])
> > > Subsystem: Hewlett-Packard Company nx6110/nc6120
> > > Flags: bus master, fast devsel, latency 0, IRQ 16
> > > Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
> > > I/O ports at 7000 [size=8]
> > > Memory at c0000000 (32-bit, prefetchable) [size=256M]
> > > Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
> > > Capabilities: [d0] Power Management version 2
> >
> >Another one that advertises no AGP capabilities.
> >In this situation you shouldn't *need* agpgart. If it's PCI[E],
> >radeon will use pcigart.
> >
> Is this supposed to work soon? Looking at all the "agp_foo" symbols in the
> drm module, there might be lots of work to do first, right? In this case, it
> might be good to still be able to load agpgart on PCI-E machines for a
> while?

Well, mainline does that already, and this stuff won't go anywhere
near there anytime soon. I'd like to understand why drm can't find
the symbols in the module, even though it's loaded, but between being
ill this week, and trying to get the Fedora 5 test2 kernel in shape,
I've not had much chance to dig into it yet.

Dave

2006-01-12 19:26:47

by Reuben Farrelly

[permalink] [raw]
Subject: Re: 2.6.15-mm2

Hi,

On 13/01/2006 2:55 a.m., Tejun Heo wrote:
> Hello, again.
>
> On Fri, Jan 13, 2006 at 01:39:18AM +1300, Reuben Farrelly wrote:
>>
>> On 13/01/2006 1:31 a.m., Ric Wheeler wrote:
>>> Reuben Farrelly wrote:
>>>> On 13/01/2006 12:18 a.m., Tejun Heo wrote:
>>>>> On Thu, Jan 12, 2006 at 09:38:48PM +1300, Reuben Farrelly wrote:
>>>>> [--snip--]
>>>>>
>>>>>> [start_ordered ] f7e8a708 -> c1b028fc,c1b029a4,c1b02a4c infl=1
>>>>>> [start_ordered ] f74b0e00 0 48869571 8 8 1 1 c1ba9000
>>>>>> [start_ordered ] BIO f74b0e00 48869571 4096
>>>>>> [start_ordered ] ordered=31 in_flight=1
>>>>>> [blk_do_ordered ] start_ordered f7e8a708->00000000
>>>>>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>>>>>> [blk_do_ordered ] seq=02 f74ccd98->f74ccd98
>>>>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>>>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>>>>> [blk_do_ordered ] seq=02 c1b028fc->00000000
>>>>>
>>>>> Yeap, this one is the offending one. 0xf74ccd98 got requeued in front
>>>>> of pre-flush while draining and when it finished it didn't complete
>>>>> draining thus hanging the queue. It seems like it's some kind of
>>>>> special request which probably fails and got retried. Are you using
>>>>> SMART or something which issues special commands to drives?
>>>>
>>>> No SMART, although I should be (rebuilt the system a few months
>>>> ago..and must
>>>> have missed it).
>>>>
>>>> Are there any other things which could be contributing to this?
>>>> <scratches head>
>>>>
>>> Could this be hdparm or something tweaking the drive write cache
>>> settings, etc?
>> hdparm isn't configured on the box by me or called by initscripts in Fedora
>> either, AFAIK.
>>
>
> This is the offending part of your new log.
>
> [02 start_ordered ] c1b36120 -> c1b35904,c1b359ac,c1b35a54 ordcolor=1 infl=1
> [02 start_ordered ] f7eb91c0 0 68436682 8 8 1 1 f7dc0000
> [02 start_ordered ] BIO f7eb91c0 68436682 4096
> [02 start_ordered ] ordered=31 in_flight=1
> [02 blk_do_ordered ] start_ordered c1b36120->00000000
> [02 blk_do_ordered ] seq=02 f7e53660->f7e53660 (flags=0x32888)
> [02 elv_completed_request ] seq=01 rq=f7dd7ba0 (flags=0x2000b44) infl=0
> [02 blk_do_ordered ] seq=02 f7e53660->f7e53660 (flags=0x32b88)
> [02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
> [na flush_dry_bio_endio ] BIO c19c7580 48869579 4096
> [na end_that_request_last ] !ELVPRIV c1b3526c 000003d9
> [02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
> [02 elv_completed_request ] seq=01 unacc f7e53660 (flags=0x32b88) infl=0
> [na end_that_request_last ] !ELVPRIV c1b35314 02002318
> [02 blk_do_ordered ] seq=02 c1b35904->00000000 (flags=0x0)
>
> And I was wrong, it wasn't special command being requeued. What
> happens here is....
>
> 1. fs requests are happily being processed
>
> 2. barrier request comes at the head of the queue
>
> 3. ordered code turns it into a three-request sequence; an fs
> request is still in flight, so it waits for the queue to be drained.
>
> 4. a REQ_SPECIAL | REQ_BLOCK_PC | REQ_QUIET request gets queued at
> the head of the queue. (I have no idea where this comes from. sd
> driver doesn't even handle PC requests. It will be just failed.
> Some kind of hardware management stuff trying to probe MMC
> devices?)
>
> 5. the in-flight fs request finishes, in_flight is now zero but the
> head of queue is not the ordered sequence. It determines draining
> isn't complete yet.
>
> 6. the special request from #4 got issued and completed, but due to
> my stupid mistake, special requests don't check for draining
> completion condition.
>
> 7. The queue is stuck now. SORRY. My apologies.
>
> Reuben, can you please test the following patch? It's against -mm2
> but should apply to -mm3 too. If you confirm this one, I'll submit to
> Jens & Andrew with proper explanations and stuff. Thanks a lot for
> all your time and trouble.
>
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 1b5b5d9..f905e47 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -615,23 +615,23 @@ void elv_completed_request(request_queue
>  	 * request is released from the driver, io must be done
>  	 */
>  	if (blk_account_rq(rq)) {
> -		struct request *first_rq = list_entry_rq(q->queue_head.next);
> -
>  		q->in_flight--;
> +		if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn)
> +			e->ops->elevator_completed_req_fn(q, rq);
> +	}
>
> -		/*
> -		 * Check if the queue is waiting for fs requests to be
> -		 * drained for flush sequence.
> -		 */
> -		if (q->ordseq && q->in_flight == 0 &&
> +	/*
> +	 * Check if the queue is waiting for fs requests to be
> +	 * drained for flush sequence.
> +	 */
> +	if (unlikely(q->ordseq)) {
> +		struct request *first_rq = list_entry_rq(q->queue_head.next);
> +		if (q->in_flight == 0 &&
>  		    blk_ordered_cur_seq(q) == QUEUE_ORDSEQ_DRAIN &&
>  		    blk_ordered_req_seq(first_rq) > QUEUE_ORDSEQ_DRAIN) {
>  			blk_ordered_complete_seq(q, QUEUE_ORDSEQ_DRAIN, 0);
>  			q->request_fn(q);
>  		}
> -
> -		if (blk_sorted_rq(rq) && e->ops->elevator_completed_req_fn)
> -			e->ops->elevator_completed_req_fn(q, rq);
>  	}
>  }

Indeed that seems to fix it. I've just booted -mm3 and it came up with no
problems at all.

Many thanks for the fix Tejun :)

reuben
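
The hang in steps 1-7 of the quoted analysis, and the effect of moving
the drain check out of the blk_account_rq() branch, can be replayed in a
small toy model. This is only a sketch and not the kernel code: the
struct, enum and function names are hypothetical, the queue is reduced to
a counter and a flag, and "accounted" stands in for blk_account_rq()
being true for fs requests and false for the PC request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage names; the kernel's QUEUE_ORDSEQ_* values differ. */
enum toy_seq { TOY_SEQ_DRAIN = 1, TOY_SEQ_PREFLUSH = 2 };

struct toy_queue {
    int in_flight;         /* accounted requests currently at the driver */
    enum toy_seq ordseq;   /* current stage of the ordered sequence */
    bool head_is_ordered;  /* head of queue belongs to the ordered sequence */
};

struct toy_request {
    bool accounted;        /* true for fs requests, false for PC requests */
};

typedef void (*toy_complete_fn)(struct toy_queue *, const struct toy_request *);

/* Draining is done once nothing is in flight and the sequence is at the head. */
static void toy_check_drain(struct toy_queue *q)
{
    if (q->ordseq == TOY_SEQ_DRAIN && q->in_flight == 0 && q->head_is_ordered)
        q->ordseq = TOY_SEQ_PREFLUSH;
}

/* Before the patch: the drain check runs only for accounted requests. */
void toy_complete_old(struct toy_queue *q, const struct toy_request *rq)
{
    if (rq->accounted) {
        q->in_flight--;
        assert(q->in_flight >= 0);
        toy_check_drain(q);
    }
}

/* After the patch: the drain check runs for every completed request. */
void toy_complete_new(struct toy_queue *q, const struct toy_request *rq)
{
    if (rq->accounted) {
        q->in_flight--;
        assert(q->in_flight >= 0);
    }
    toy_check_drain(q);
}

/* Replay steps 3-6 and report the stage the queue ends up in. */
enum toy_seq toy_replay(toy_complete_fn complete)
{
    /* Step 3: barrier at the head, one fs request still in flight. */
    struct toy_queue q = {
        .in_flight = 1, .ordseq = TOY_SEQ_DRAIN, .head_is_ordered = true
    };
    const struct toy_request fs_rq = { .accounted = true };
    const struct toy_request pc_rq = { .accounted = false };

    q.head_is_ordered = false;  /* step 4: PC request queued at the head */
    complete(&q, &fs_rq);       /* step 5: fs request finishes */
    q.head_is_ordered = true;   /* step 6: PC request is issued ... */
    complete(&q, &pc_rq);       /* ... and completes */
    return q.ordseq;
}
```

In this model toy_replay(toy_complete_old) ends in TOY_SEQ_DRAIN, i.e.
the queue never leaves the drain stage, while
toy_replay(toy_complete_new) advances to TOY_SEQ_PREFLUSH.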

2006-01-12 20:34:21

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.15-mm2

Reuben Farrelly <[email protected]> wrote:
>
> Indeed that seems to fix it. I've just booted -mm3 and it came up with no
> problems at all.

whew.

What about all the other problems? The oops under ata_device_add()?

And is it still saying this?

Alan Cox <[email protected]> wrote:
>
> On Iau, 2006-01-12 at 16:55 +1300, Reuben Farrelly wrote:
> > ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0
> > ata2: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x8 irq 0
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
>
> That is the critical bit. The SATA ports have no PCI resources assigned
> for bus mastering (BAR 4). libata should have driven the device PIO in
> this case but the resource should have been assigned.

2006-01-12 20:39:12

by Dave Airlie

[permalink] [raw]
Subject: Re: 2.6.15-mm2

>
> > Another one that advertises no AGP capabilities.
> > In this situation you shouldn't *need* agpgart. If it's PCI[E],
> > radeon will use pcigart.
>
> Problem is that i915 depends on DRM && AGP && AGP_INTEL.
> And at the end of i{810,830,915}_dma.c there is the comment:
> "All Intel graphics chipsets are treated as AGP, even if they are
> really PCI-e."
>

I've cc'ed Alan Hourihane, but from memory the Intel on-board graphics
chips don't advertise the AGP bit on the graphics controllers but work
using AGP...

I've got a PCIE chipset with Radeon on it, and in that case I could get
away without agpgart...

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG

2006-01-12 20:52:25

by Jeff Garzik

[permalink] [raw]
Subject: Re: 2.6.15-mm2

Andrew Morton wrote:
> Reuben Farrelly <[email protected]> wrote:
>
>> Indeed that seems to fix it. I've just booted -mm3 and it came up with no
>> problems at all.
>
>
> whew.
>
> What about all the other problems? The oops under ata_device_add()?
>
> And is it still saying this?

ACK the questions... I confess I dove into this earlier today, and
quickly got lost in the multiple bug reports :( Nothing against Reuben
-- we need more motivated bug reporters like him!

Jeff



2006-01-12 21:03:23

by Alan Hourihane

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Thu, 2006-01-12 at 20:37 +0000, Dave Airlie wrote:
> >
> > > Another one that advertises no AGP capabilities.
> > > In this situation you shouldn't *need* agpgart. If it's PCI[E],
> > > radeon will use pcigart.
> >
> > Problem is that i915 depends on DRM && AGP && AGP_INTEL.
> > And at the end of i{810,830,915}_dma.c there is the comment:
> > "All Intel graphics chipsets are treated as AGP, even if they are
> > really PCI-e."
> >
>
> I've cc'ed Alan Hourihane, but from memory the Intel on-board graphics
> chips don't advertise the AGP bit on the graphics controllers but work
> using AGP...
>
> I've got an PCIE chipset with Radeon on it, and in that case I could get
> away without agpgart...

Dave,

You're probably reading too much into that last statement.

I've never seen a pure PCI-e chipset from Intel (i.e. the ones without
integrated graphics) so that may not be true, but the ones with
integrated graphics are always treated as AGP based.

Alan.

2006-01-12 22:02:09

by Dave Airlie

[permalink] [raw]
Subject: Re: 2.6.15-mm2

> >
> > I've cc'ed Alan Hourihane, but from memory the Intel on-board graphics
> > chips don't advertise the AGP bit on the graphics controllers but work
> > using AGP...
> >
> > I've got an PCIE chipset with Radeon on it, and in that case I could get
> > away without agpgart...
>
> Dave,
>
> You're probably reading too much into that last statement.
>
> I've never seen a pure PCI-e chipset from Intel (i.e. the ones without
> integrated graphics) so that may not be true, but the ones with
> integrated graphics are always treated as AGP based.
>

I'll show you one at xdevconf if I can get there. It has just a PCI-E
root bridge and no graphics controller; we still init AGP on it, but I
don't think there is any need. However, all the integrated
graphics do use AGP even if they don't advertise it, which is
DaveJ's problem: he was trying not to load AGP if AGP wasn't
being advertised.

Dave.

2006-01-13 04:50:00

by Reuben Farrelly

[permalink] [raw]
Subject: Re: 2.6.15-mm2



On 13/01/2006 9:51 a.m., Jeff Garzik wrote:
> Andrew Morton wrote:
>> Reuben Farrelly <[email protected]> wrote:
>>
>>> Indeed that seems to fix it. I've just booted -mm3 and it came up
>>> with no problems at all.
>>
>>
>> whew.

Still up and running with 9 1/2 hrs uptime on the clock.

>> What about all the other problems? The oops under ata_device_add()?
>>
>> And is it still saying this?
>
> ACK the questions... I confess I dove into this earlier today, and
> quickly got lost in the multiple bug reports :( Nothing against Reuben
> -- we need more motivated bug reporters like him!

See http://www.reub.net/files/kernel/outstanding-kernel-bugs.txt for a list and
status of things which are affecting me. I'm trying to keep it up to date so
that it's useful - at the moment I need to keep a list else things get muddled.

Basically in a nutshell:

1. SATA crashing out when controller not configured:

ata1: SATA max UDMA/133 cmd 0x0 ctl 0x2 bmdma 0x0 irq 0

Call Trace:
[<c0103c5d>] show_stack+0x9b/0xc0
[<c0103de4>] show_registers+0x162/0x1e7
[<c0103f8f>] die+0x126/0x231
[<c01140db>] do_page_fault+0x271/0x5b9
[<c01037df>] error_code+0x4f/0x54
[<c023cabd>] class_device_del+0xa3/0x156

This is the next serious problem after the barrier/elevator bug, which is now
successfully resolved. Maybe half the time I boot up I need to reset the hardware as
this one triggers.
It's a good genuine Intel board with the latest BIOS, and this only started happening
recently, so I'm not yet convinced it's hardware.


2. SATA 'slow to respond'

Occasionally:

ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 193
ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8804D00 ctl 0x0 bmdma 0x0 irq 50
ata2: SATA max UDMA/133 cmd 0xF8804D80 ctl 0x0 bmdma 0x0 irq 50
ata3: SATA max UDMA/133 cmd 0xF8804E00 ctl 0x0 bmdma 0x0 irq 50
ata4: SATA max UDMA/133 cmd 0xF8804E80 ctl 0x0 bmdma 0x0 irq 50
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1 is slow to respond, please be patient
ata1 failed to respond (30 secs)
scsi0 : ahci
ata2: SATA link up 1.5 Gbps (SStatus 113)
ata2 is slow to respond, please be patient
ata2 failed to respond (30 secs)


3. Various other smaller problems, sky2 "Cannot find PowerManagement capability"
sometimes, cannot boot with acpi=off and also cannot boot with nosmp on an SMP
compiled kernel as SATA times out. Also "uhci_hcd 0000:00:1d.3: Unlink after
no-IRQ? Controller is probably using the wrong IRQ."
These are lower priority problems though.

The first ATA bug is a bit of a killer, so I'd appreciate it if it could be looked
at first. When testing out the barrier problem I had to do quite a good
number of extra reboots because the box kept oopsing on bootup :(

reuben


2006-01-13 08:31:52

by Alan Hourihane

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Fri, 2006-01-13 at 09:02 +1100, Dave Airlie wrote:
> > >
> > > I've cc'ed Alan Hourihane, but from memory the Intel on-board graphics
> > > chips don't advertise the AGP bit on the graphics controllers but work
> > > using AGP...
> > >
> > > I've got an PCIE chipset with Radeon on it, and in that case I could get
> > > away without agpgart...
> >
> > Dave,
> >
> > You're probably reading too much into that last statement.
> >
> > I've never seen a pure PCI-e chipset from Intel (i.e. the ones without
> > integrated graphics) so that may not be true, but the ones with
> > integrated graphics are always treated as AGP based.
> >
>
> I'll show you one at xdevconf if I can get there, it has just a PCI-E
> root bridge no graphics controller, we still init AGP on it but I
> don't think there is any need, however for all the integrated
> graphics, even if they don't advertise AGP they do use it which is
> DaveJ's problem that he was trying not to load AGP if the AGP was
> being advertised..

O.k. I didn't see the original thread to this. But yes, all integrated
graphics based Intel chipsets are AGP regardless of whether the chip
advertises it correctly.

Alan.

2006-01-13 14:11:57

by Adrian Bunk

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Tue, Jan 10, 2006 at 06:24:22PM -0800, Paul Jackson wrote:
> Andrew wrote:
> > This is caused by the inclusion of user.h in kernel.h added by
> > dump_thread-cleanup.patch.
>
> This same build breakage showed up on ia64 sn2_defconfig,
> and your patch fixes it nicely. Thanks.
>
> Acked-by: Paul Jackson <[email protected]>
>
>
> Andrian - I think that was your dump_thread-cleanup patch.

s/Andrian/Adrian/

> Please be sure to cross build other arch's when making non-local
> changes, such as this one that affected the files:
>
> arch/alpha/kernel/alpha_ksyms.c
> arch/arm26/kernel/armksyms.c
> arch/cris/kernel/crisksyms.c
> arch/cris/kernel/process.c
> arch/frv/kernel/frv_ksyms.c
> arch/frv/kernel/process.c
> arch/h8300/kernel/h8300_ksyms.c
> arch/h8300/kernel/process.c
> arch/m32r/kernel/m32r_ksyms.c
> arch/m32r/kernel/process.c
> arch/m68k/kernel/m68k_ksyms.c
> arch/m68knommu/kernel/m68k_ksyms.c
> arch/m68knommu/kernel/process.c
> arch/s390/kernel/process.c
> arch/sh64/kernel/process.c
> arch/sh64/kernel/sh_ksyms.c
> arch/sh/kernel/process.c
> arch/sh/kernel/sh_ksyms.c
> arch/sparc64/kernel/binfmt_aout32.c
> arch/sparc64/kernel/sparc64_ksyms.c
> arch/sparc/kernel/sparc_ksyms.c
> arch/v850/kernel/process.c
> arch/v850/kernel/v850_ksyms.c
> fs/binfmt_aout.c
> fs/binfmt_flat.c
> include/asm-um/processor-generic.h

None of the above was the problem.

> include/linux/kernel.h

This was the problem.

> Sure, it consumes some time, but better you do it once, than each of
> several of us have to first do a bisection on Andrew's gazillion
> patches to find the culprit, and then stare at the patch until the light
> bulb goes on in our dimm brains, only to grep back through the lkml
> messages to find that we are not alone in our misery.

It can always happen that one out of fifty patches breaks compilation on
some architectures, and I'm for sure not the one with the worst
errors/patches ratio.

-mm is an experimental kernel and although breakages are unfortunate,
they do happen.

I'm already compiling every single patch with both a non-modular and a
completely modular .config for i386. This is the amount of testing I can
afford. If this isn't considered enough I have to stop submitting
patches.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-01-13 15:53:24

by Paul Jackson

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

Adrian wrote:
> I'm already compiling every single patch with both a non-modular and a
> completely modular .config for i386. This is the amount of testing I can
> afford. If this isn't considered enough I have to stop submitting
> patches.

Whatever ... that's not my decision to make, either way.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-01-13 16:38:10

by Al Viro

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, Jan 13, 2006 at 03:11:54PM +0100, Adrian Bunk wrote:
> It can always happen that one out of fifty patches breaks compilation on
> some architectures, and I'm for sure not the one with the worst
> errors/patches ratio.
>
> -mm is an experimental kernel and although breakages are unfortunate,
> they do happen.
>
> I'm already compiling every single patch with both a non-modular and a
> completely modular .config for i386. This is the amount of testing I can
> afford. If this isn't considered enough I have to stop submitting
> patches.

This is not considered enough. If you can't be arsed to set up cross-compilers
on your box, please *stop* shitting in headers that might affect something
other than i386. It's not like doing a cross-toolchain was something hard...

2006-01-13 16:50:23

by Dave Jones

[permalink] [raw]
Subject: Re: 2.6.15-mm2

On Fri, Jan 13, 2006 at 08:32:05AM +0000, Alan Hourihane wrote:
> On Fri, 2006-01-13 at 09:02 +1100, Dave Airlie wrote:
> > > >
> > > > I've cc'ed Alan Hourihane, but from memory the Intel on-board graphics
> > > > chips don't advertise the AGP bit on the graphics controllers but work
> > > > using AGP...
> > > >
> > > > I've got an PCIE chipset with Radeon on it, and in that case I could get
> > > > away without agpgart...
> > >
> > > Dave,
> > >
> > > You're probably reading too much into that last statement.
> > >
> > > I've never seen a pure PCI-e chipset from Intel (i.e. the ones without
> > > integrated graphics) so that may not be true, but the ones with
> > > integrated graphics are always treated as AGP based.
> > >
> >
> > I'll show you one at xdevconf if I can get there, it has just a PCI-E
> > root bridge no graphics controller, we still init AGP on it but I
> > don't think there is any need, however for all the integrated
> > graphics, even if they don't advertise AGP they do use it which is
> > DaveJ's problem that he was trying not to load AGP if the AGP was
> > being advertised..
>
> O.k. I didn't see the original thread to this. But yes, all integrated
> graphics based Intel chipsets are AGP regardless if the chip doesn't
> advertise it correctly.

FWIW, I've dropped that change from agpgart.git. It caused more problems
than it was worth.

Dave

2006-01-13 18:11:27

by Paul Jackson

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

Adrian wrote:
> This is the amount of testing I can afford.

It sounds to me like you are saying that a minute of your time is
more valuable than a minute of each of several other people's time.

The only two people I gladly accept that argument from are Linus
and Andrew.

For the rest of us, it is important to minimize the total workload
of all of us combined, not to optimize our individual output.

What you don't test, several others of us get to test. Only it's often
more work, for -each- of us, as we each have to figure out which of
1000 patches caused the breakage.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-01-13 18:19:33

by Randy Dunlap

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, 13 Jan 2006, Paul Jackson wrote:

> Adrian wrote:
> > This is the amount of testing I can afford.
>
> It sounds to me like you are saying that a minute of your time is
> more valuable than a minute of each of several other people's time.
>
> The only two people I gladly accept that argument from are Linus
> and Andrew.
>
> For the rest of us, it is important to minimize the total workload
> of all of us combined, not to optimize our individual output.
>
> What you don't test, several others of us get to test. Only it's often
> more work, for -each- of us, as we each have to figure out which of
> 1000 patches caused the breakage.

I don't find building cross-toolchains quite as easy as Al does,
so I download and build with these (on i386):
http://developer.osdl.org/dev/plm/cross_compile/
as Andrew has also mentioned in the past.

Or one can submit kernel patches for builds to an OSDL
build machine which does 8 or 9 $ARCH builds.

--
~Randy

2006-01-13 19:05:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, 2006-01-13 at 10:19 -0800, Randy.Dunlap wrote:
> I don't find building cross-toolchains quite as easy as Al does,
> so I download and build with these (on i386):
> http://developer.osdl.org/dev/plm/cross_compile/
> as Andrew has also mentioned in the past.
>
> Or one can submit kernel patches for builds to an OSDL
> build machine which does 8 or 9 $ARCH builds.

Also crosstool produces quite a bunch of working cross tool chains out
of the box. As simple as Al said.

http://www.kegel.com/crosstool/

tglx


2006-01-13 19:30:18

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

Paul Jackson <[email protected]> wrote:
>
> Adrian wrote:
> > This is the amount of testing I can afford.
>
> It sounds to me like you are saying that a minute of your time is
> more valuable than a minute of each of several other people's time.
>
> The only two people I gladly accept that argument from are Linus
> and Andrew.

Yes and no. When I do a cross-compile of the whole -mm lineup I'm doing it
for probably 100 different developers' new work, so there are some efficiencies
there...

It's very easy to miss stuff due to .config selections though.

2006-01-13 21:05:20

by Adrian Bunk

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, Jan 13, 2006 at 10:10:54AM -0800, Paul Jackson wrote:
> Adrian wrote:
> > This is the amount of testing I can afford.
>
> It sounds to me like you are saying that a minute of your time is
> more valuable than a minute of each of several other people's time.
>
> The only two people I gladly accept that argument from are Linus
> and Andrew.
>
> For the rest of us, it is important to minimize the total workload
> of all of us combined, not to optimize our individual output.
>
> What you don't test, several others of us get to test. Only it's often
> more work, for -each- of us, as we each have to figure out which of
> 1000 patches caused the breakage.

I'm working against -mm, and there it's quite common that the kernel
doesn't build on the majority of architectures due to one or two dozen
bugs other people introduced.

I didn't know people consider the quality of my patches so far below average
that they want to require me to fix other people's compile errors first
and test the compilation on all 24 architectures before I'm allowed to
submit a patch that touches some architecture-independent code.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-01-13 21:08:49

by Adrian Bunk

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, Jan 13, 2006 at 10:19:30AM -0800, Randy.Dunlap wrote:
> On Fri, 13 Jan 2006, Paul Jackson wrote:
>
> > Adrian wrote:
> > > This is the amount of testing I can afford.
> >
> > It sounds to me like you are saying that a minute of your time is
> > more valuable than a minute of each of several other people's time.
> >
> > The only two people I gladly accept that argument from are Linus
> > and Andrew.
> >
> > For the rest of us, it is important to minimize the total workload
> > of all of us combined, not to optimize our individual output.
> >
> > What you don't test, several others of us get to test. Only it's often
> > more work, for -each- of us, as we each have to figure out which of
> > 1000 patches caused the breakage.
>
> I don't find building cross-toolchains quite as easy as Al does,
> so I download and build with these (on i386):
> http://developer.osdl.org/dev/plm/cross_compile/
> as Andrew has also mentioned in the past.
>
> Or one can submit kernel patches for builds to an OSDL
> build machine which does 8 or 9 $ARCH builds.

This leaves 15 or 16 other architectures for my puny 1.8 GHz CPU.

And does OSDL fix other people's compile breakages in the latest -mm
before I submit my patches, or am I required to play QA for every
single architecture before I can submit one single patch touching
architecture-independent files?

> ~Randy

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-01-13 21:12:26

by Randy Dunlap

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, 13 Jan 2006, Adrian Bunk wrote:

> On Fri, Jan 13, 2006 at 10:19:30AM -0800, Randy.Dunlap wrote:
> > On Fri, 13 Jan 2006, Paul Jackson wrote:
> >
> > > Adrian wrote:
> > > > This is the amount of testing I can afford.
> > >
> > > It sounds to me like you are saying that a minute of your time is
> > > more valuable than a minute of each of several other people's time.
> > >
> > > The only two people I gladly accept that argument from are Linus
> > > and Andrew.
> > >
> > > For the rest of us, it is important to minimize the total workload
> > > of all of us combined, not to optimize our individual output.
> > >
> > > What you don't test, several others of us get to test. Only it's often
> > > more work, for -each- of us, as we each have to figure out which of
> > > 1000 patches caused the breakage.
> >
> > I don't find building cross-toolchains quite as easy as Al does,
> > so I download and build with these (on i386):
> > http://developer.osdl.org/dev/plm/cross_compile/
> > as Andrew has also mentioned in the past.
> >
> > Or one can submit kernel patches for builds to an OSDL
> > build machine which does 8 or 9 $ARCH builds.
>
> This leaves 15 or 16 other architectures for my puny 1.8 GHz CPU.
>
> And does OSDL fix other people's compile breakages in the latest -mm
> before I submit my patches, or am I required to play QA for every
> single architecture before I can submit one single patch touching
> architecture-independent files?

-ETOOMUCHSARCASM
(from someone who also uses sarcasm often)

But seriously, I don't think anyone suggested anything quite
as extreme as your question implies.

--
~Randy

2006-01-13 21:33:01

by Adrian Bunk

[permalink] [raw]
Subject: Re: 2.6.15-mm2: alpha broken

On Fri, Jan 13, 2006 at 01:12:24PM -0800, Randy.Dunlap wrote:
> On Fri, 13 Jan 2006, Adrian Bunk wrote:
>
> > On Fri, Jan 13, 2006 at 10:19:30AM -0800, Randy.Dunlap wrote:
> > > On Fri, 13 Jan 2006, Paul Jackson wrote:
> > >
> > > > Adrian wrote:
> > > > > This is the amount of testing I can afford.
> > > >
> > > > It sounds to me like you are saying that a minute of your time is
> > > > more valuable than a minute of each of several other people's time.
> > > >
> > > > The only two people I gladly accept that argument from are Linus
> > > > and Andrew.
> > > >
> > > > For the rest of us, it is important to minimize the total workload
> > > > of all of us combined, not to optimize our individual output.
> > > >
> > > > What you don't test, several others of us get to test. Only it's often
> > > > more work, for -each- of us, as we each have to figure out which of
> > > > 1000 patches caused the breakage.
> > >
> > > I don't find building cross-toolchains quite as easy as Al does,
> > > so I download and build with these (on i386):
> > > http://developer.osdl.org/dev/plm/cross_compile/
> > > as Andrew has also mentioned in the past.
> > >
> > > Or one can submit kernel patches for builds to an OSDL
> > > build machine which does 8 or 9 $ARCH builds.
> >
> > This leaves 15 or 16 other architectures for my puny 1.8 GHz CPU.
> >
> > And does OSDL fix other people's compile breakages in the latest -mm
> > before I submit my patches, or am I required to play QA for every
> > single architecture before I can submit one single patch touching
> > architecture-independent files?
>
> -ETOOMUCHSARCASM
> (from someone who also uses sarcasm often)
>
> But seriously, I don't think anyone suggested anything quite
> as extreme as your question implies.

Where's the sarcasm?

Looking at [1], the change to include/linux/kernel.h I'm so heavily
criticized for in this thread broke the compilation of 2.6.15-mm2
on 3 or 4 out of our 24 architectures.

Which less extreme approach would for sure have prevented this?

> ~Randy

cu
Adrian

[1] http://l4x.org/k/

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-01-13 21:52:35

by Paul Jackson

Subject: Re: 2.6.15-mm2: alpha broken

> Which less extreme approach would for sure have prevented this?

Clearly, if what I said leads inexorably to such extreme measures,
then what I said is full of crock.

And Andrew made a good point, that matches up well with your mention
of your "puny 1.8 GHz CPU." There is some efficiency to be gained
from doing crosstool builds against 100 changes at once, rather than
100 developers each doing them for their one change.

Personally, I recommend a half-way effort. Do crosstool builds
against a few arch's when doing more risky stuff; sometimes batch
up crosstool builds for several fixes at once (which may mean that
I actually send in the fix before the crosstool build succeeds);
send in some fixes for what you see broken if it looks "easy enough"
to you; post a description of some of the other breakages you don't
have time or expertise to track down; sometimes just say to heck with
it, and don't crosstool test, or ignore some of the breakage if it
doesn't look like your problem.

But I'm still relatively junior around here. Those with more battle
scars than I are worth paying more attention to than I am.

My basic rule of thumb, that I suspect scales rather well, is to
try to clean up more crap of others than I drop on them. If we all
kept a slightly positive balance in our crap cleanup versus dropping
statement, then life would be good.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2006-01-13 22:19:57

by Andrew Morton

Subject: Re: 2.6.15-mm2: alpha broken

Paul Jackson <[email protected]> wrote:
>
> And Andrew made a good point,

It's January - time for my annual good point.

> that matches up well with your mention
> of your "puny 1.8 GHz CPU." There is some efficiency to be gained
> from doing crosstool builds against 100 changes at once, rather than
> 100 developers each doing them for their one change.

Of course, people other than myself can and do run cross-builds on -mm
trees, which is doubleplus efficient. It appears that Adrian is doing
this, for one.

So the 100-or-more people write their patches, they go into -mm and then I
and a few others do the cross-compiling, feed back or fix any problems.