ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
- Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
- Various random other things - nothing major.
Changes since 2.6.14-rc1-mm1:
linus.patch
git-cifs.patch
git-cryptodev.patch
git-drm.patch
git-ia64.patch
git-jfs.patch
git-libata-all.patch
git-mtd.patch
git-netdev-all.patch
git-nfs.patch
git-nfs-oops-fix.patch
git-ocfs2-prep.patch
git-ocfs2.patch
git-scsi-misc.patch
git-sas.patch
git-watchdog.patch
Subsystem trees
-raid6-altivec-fix.patch
-sharpsl-add-missing-hunk-from-backlight-update.patch
-mtd-update-sharpsl-partition-definitions.patch
-s390-default-configuration.patch
-s390-bl_dev-array-size.patch
-s390-crypto-driver-patch-take-2.patch
-s390-show_cpuinfo-fix.patch
-s390-diag-0x308-reipl.patch
-remove-arch-arm26-boot-compressed-hw-bsec.patch
-cpu-hotplug-breaks-wake_up_new_task.patch
-s390-kernel-stack-corruption.patch
-uml-_switch_to-code-consolidation.patch
-uml-breakpoint-an-arbitrary-thread.patch
-uml-remove-a-useless-include.patch
-uml-remove-an-unused-file.patch
-uml-remove-some-build-warnings.patch
-uml-preserve-errno-in-error-paths.patch
-uml-move-libc-code-out-of-mem_userc-and-tempfilec.patch
-uml-merge-mem_userc-and-memc.patch
-uml-return-a-real-error-code.patch
-uml-remove-include-of-asm-elfh.patch
-fix-up-some-pm_message_t-types.patch
-fix-mm-kconfig-spelling.patch
-x86_64-e820c-needs-module-h.patch
-seclvl-use-securityfs-tidy.patch
-seclvl-use-securityfs-fix.patch
-hdaps-driver-update.patch
-driver-core-fix-bus_rescan_devices-race-2.patch
-i2c-kill-an-unused-i2c_adapter-struct-member.patch
-fix-buffer-overrun-in-rpadlpar_sysfsc.patch
-ibmphp-use-dword-accessors-for-pci_rom_address.patch
-pciehp-use-dword-accessors-for-pci_rom_address.patch
-shpchp-use-dword-accessors-for-pci_rom_address.patch
-qla2xxx-use-dword-accessors-for-pci_rom_address.patch
-pci-convert-kcalloc-to-kzalloc.patch
-gregkh-usb-usb-gotemp.patch
-more-device-ids-for-option-card-driver.patch
-pcnet32-set_ringparam-implementation.patch
-pcnet32-set-min-ring-size-to-4.patch
-add-smp_mb__after_clear_bit-to-unlock_kiocb.patch
-joystick-vs-xorg-fix.patch
-codingstyle-memory-allocation.patch
-files-fix-preemption-issues.patch
-files-fix-preemption-issues-tidy.patch
-fat-miss-sync-issues-on-sync-mount-miss-sync-on-write.patch
-fix-pf-request-handling.patch
-i2o-remove-class-interface.patch
-i2o-remove-i2o_device_class.patch
-driver-core-allow-nesting-classes.patch
-driver-core-make-parent-class-define-subsystem.patch
-driver-core-pass-interface-to-class-intreface-methods.patch
-driver-core-send-hotplug-event-before-adding-class-interfaces.patch
-input-kill-devfs-references.patch
-input-prepare-to-sysfs-integration.patch
-input-convert-net-bluetooth-to-dynamic-input_dev-allocation.patch
-input-convert-drivers-macintosh-to-dynamic-input_dev-allocation.patch
-input-convert-konicawc-to-dynamic-input_dev-allocation.patch
-input-convert-onetouch-to-dynamic-input_dev-allocation.patch
-drivers-input-mouse-convert-to-dynamic-input_dev-allocation.patch
-drivers-input-keyboard-convert-to-dynamic-input_dev-allocation.patch
-drivers-input-touchscreen-convert-to-dynamic-input_dev-allocation.patch
-drivers-usb-input-convert-to-dynamic-input_dev-allocation.patch
-input-convert-ucb1x00-ts-to-dynamic-input_dev-allocation.patch
-input-convert-sound-ppc-beep-to-dynamic-input_dev-allocation.patch
-input-convert-sonypi-to-dynamic-input_dev-allocation.patch
-input-convert-driver-input-misc-to-dynamic-input_dev-allocation.patch
-drivers-input-joystick-convert-to-dynamic-input_dev-allocation.patch
-drivers-media-convert-to-dynamic-input_dev-allocation.patch
-input-show-sysfs-path-in-proc-bus-input-devices.patch
-input-export-input_dev-data-via-sysfs-attributes.patch
-input-core-implement-class-hierachy.patch
-input-core-implement-class-hierachy-hdaps-fixes.patch
-input-core-remove-custom-made-hotplug-handler.patch
-input-convert-input-handlers-to-class-interfaces.patch
-input-convert-to-seq_file.patch
-ide-fix-null-request-pointer-for-taskfile-ioctl.patch
Merged
+proc_task_root_link-c99-fix.patch
+lpfc-build-fix.patch
old gcc fixes
+hostap-fix-kbuild-warning.patch
Wrongly fix Kconfig screwup
+reboot-comment-and-factor-the-main-reboot-functions.patch
+suspend-cleanup-calling-of-power-off-methods.patch
Power management fixes
+pci_fixup_parent_subordinate_busnr-fixes.patch
PCI enumeration fix
+kdumpx86-add-note-type-nt_kdumpinfo-to-kernel-core-dumps.patch
kdump feature
+acpi-handle-fadt-20-xpmtmr-address-0-case.patch
ACPI pm_timer fix
+update-maintainers-list-with-the-kprobes-maintainers.patch
MAINTAINERS update
+v9fs-make-conv-functions-to-check-for-conv-buffer-overflow.patch
+v9fs-allocate-the-rwalk-qid-array-from-the-right-conv-buffer.patch
+v9fs-make-copy-of-the-transport-prototype-instead-of-using-it-directly.patch
+v9fs-replace-strlen-on-newly-allocated-by-__getname-buffers-to-path_max.patch
+v9fs-dont-free-root-dentry-inode-if-error-occurs-in-v9fs_get_sb.patch
v9fs updates
+ppc64-smu-driver-update-i2c-support.patch
+ppc64-smu-driver-update-i2c-support-fix.patch
Big update to the pmac platform driver
+acpi-disable-c2-c3-for-_all_-ibm-r40e-laptops-for-2613-bug-3549-update.patch
Fix acpi-disable-c2-c3-for-_all_-ibm-r40e-laptops-for-2613-bug-3549.patch
+cs5535-audio-alsa-driver.patch
+cleanup-for-cs5535-audio-driver.patch
New audio driver
+gregkh-driver-driver-ide-tape-sysfs.patch
+gregkh-driver-driver-fix-bus_rescan_devices.patch
+gregkh-driver-driver-device_is_registered.patch
+gregkh-driver-driver-fix-class-symlinks.patch
Driver tree updates
+drm_addmap_ioctl-warning-fix.patch
drm warning fix
+gregkh-i2c-i2c-maintainer.patch
+gregkh-i2c-hwmon-adm9240-update-01.patch
+gregkh-i2c-hwmon-adm9240-update-02.patch
+gregkh-i2c-hwmon-via686a-save-memory.patch
i2c tree updates
+fix-broken-nvidia-device-id-in-sata_nv.patch
SATA driver fix
+gregkh-pci-pci-remove-unused-scratch.patch
+gregkh-pci-pci-kzalloc.patch
+gregkh-pci-pci-fix-probe-warning.patch
+gregkh-pci-pci-buffer-overrun-rpaldpar.patch
PCI tree updates
+areca-raid-linux-scsi-driver-update.patch
Update areca-raid-linux-scsi-driver.patch
-scsi-sas-makefile-and-kconfig.patch
-sas_class-include-files-in-include-scsi-sas.patch
-sas-class-core-files.patch
-aic94xx-the-aic94xx-sas-lldd.patch
+git-sas.patch
Adaptec Serial Attached Storage tree
+gregkh-usb-ub-burn-cd-fix.patch
+gregkh-usb-usb-option-new-ids.patch
+gregkh-usb-usb-ftdi_sio-baud-rate-change.patch
+gregkh-usb-usb-pxa2xx_udc-build-fix.patch
+gregkh-usb-usb-sl811-minor-fixes.patch
+gregkh-usb-devfs-remove-usb-mode.patch
+gregkh-usb-usb-handoff-merge.patch
+gregkh-usb-usb-power-state-01.patch
+gregkh-usb-usb-power-state-02.patch
+gregkh-usb-usb-power-state-03.patch
+gregkh-usb-usb-power-state-04.patch
+gregkh-usb-usb-power-state-05.patch
+gregkh-usb-usb-uhci-01.patch
+gregkh-usb-usb-uhci-02.patch
+gregkh-usb-usb-gotemp.patch
USB tree updates
+gregkh-usb-usb-power-state-03-fix.patch
+gregkh-usb-usb-handoff-merge-usb-Makefile-fix.patch
+pegasus-ethernet-over-usb-driver-fixes.patch
+st5481_usb-build-fix.patch
Various USB fixes and enhancements
+x86_64-defconfig-update.patch
-x86_64-dma32-iommu.patch
-x86_64-dma32-srat32.patch
-x86_64-vm-holes-reserved.patch
+x86_64-dma32-srat32.patch
+x86_64-vm-holes-reserved.patch
+x86_64-hpet-regs.patch
+x86_64-no-idle-tick.patch
+x86_64-nohpet.patch
+x86_64-mce-thresh.patch
+x86_64-pat-base.patch
Various x86_64 tree updates
+x86_64-no-idle-tick-fix.patch
+x86_64-no-idle-tick-fix-2.patch
+x86_64-mce-thresh-fix.patch
+x86_64-mce-thresh-fix-2.patch
Fix them up.
+mm-move_pte-to-remap-zero_page-fix.patch
Fix mm-move_pte-to-remap-zero_page.patch
+eeproc-module_param_array-cleanup.patch
+b44-fix-suspend-resume.patch
+r8169-call-proper-vlan-receive-function.patch
net driver updates
+ppc32-cleanup-amcc-ppc44x-eval-board-u-boot-support.patch
+ppc32-ifdef-out-altivec-specific-code-in-__switch_to.patch
+ppc32-handle-access-to-non-present-io-ports-on-8xx.patch
ppc32 updates
+x86-initialise-tss-io_bitmap_owner-to-something.patch
+intel_cacheinfo-remove-max_cache_leaves-limit.patch
+i386-little-pgtableh-consolidation-vs-2-3level.patch
+x86-hot-plug-cpu-to-support-physical-add-of-new-processors.patch
x86 updates
+x86_64-dont-use-shortcut-when-using-send_ipi_all-in-flat-mode.patch
+x86_64-init-and-zap-low-address-mappings-on-demand-for-cpu-hotplug.patch
More x86_64 updates
+introduce-valid-callback-for-pm_ops.patch
Power management fixlet
+uml-dont-remove-umid-files-in-conflict-case.patch
+strlcat-use-for-uml-umidc.patch
+uml-dont-redundantly-mark-pte-as-newpage-in-pte_modify.patch
+uml-fix-hang-in-tt-mode-on-fault.patch
+uml-fix-condition-in-tlb-flush.patch
+uml-run-mconsole-sysrq-in-process-context.patch
+uml-avoid-fixing-faults-while-atomic.patch
+uml-fix-gfp_-flags-usage.patch
+uml-use-gfp_atomic-for-allocations-under-spinlocks.patch
+uml-replace-printk-with-stack-friendly-printf-to-report-console-failure.patch
UML updates
+xtensa-remove-io_remap_page_range-and-minor-clean-ups.patch
xtensa fix
+cm4040-cardman-4040-driver-update.patch
+cm4000-cardman-4000-driver-update.patch
Update the cardman pcmcia drivers in -mm.
-invalidate_inode_pages2_range-clean-pages-fix.patch
Wrong, dropped.
+ext3-ext_debug-build-fixes.patch
ext3 fixlet
+fix-bd_claim-error-code.patch
swapon() return code fix
+reiserfs-free-checking-cleanup.patch
reiserfs cleanup
+remove-hardcoded-send_sig_xxx-constants.patch
+cleanup-the-usage-of-send_sig_xxx-constants.patch
Use the #defines
+little-de_thread-cleanup.patch
+introduce-setup_timer-helper.patch
+introduce-setup_timer-helper-x86_64-fix.patch
+move-tasklist-walk-from-cfq-iosched-to-elevatorc.patch
Various code cleanups
+add-kthread_stop_sem.patch
New workqueue featurette
+switch-sibyte-profiling-driver-to-compat_ioctl.patch
+switch-sibyte-profiling-driver-to-compat_ioctl-fix.patch
+remove-drm-ioctl32-translations-from-sparc-and-parisc.patch
+tioc-compat-ioctl-handling.patch
ioctl() cleanups
+ntp-shift_right-cleanup.patch
NTP cleanup
+delete-2-unreachable-statements-in-drivers-block-paride-pfc.patch
+clarify-help-text-for-init_env_arg_limit.patch
+moving-kprobes-and-oprofile-to-instrumentation-support-menu.patch
Little fixes
+keys-add-possessor-permissions-to-keys.patch
Key management enhancement
+fat-cleanup-and-optimization-of-checksum.patch
+fat-remove-the-unneeded-vfat_find-in-vfat_rename.patch
+fat-remove-duplicate-directory-scanning-code.patch
fatfs updates
+i4l-update-hfc_usb-driver.patch
ISDN driver update
+pcmcia-use-runtime-suspend-resume-support-to-unify-all-suspend-code-paths-fix.patch
Fix pcmcia-use-runtime-suspend-resume-support-to-unify-all-suspend-code-paths.patch
+pcmcia-yenta-add-support-for-more-ti-bridges.patch
+pcmcia-yenta-optimize-interrupt-handler.patch
Cardbus driver updates
+sched-modified-nice-support-for-smp-load-balancing.patch
CPU scheduler improvement
+reiser4-ver_linux-dont-print-reiser4progs-version-if-none-found.patch
+reiser4-atime-update-fix.patch
+reiser4-use-try_to_freeze.patch
reiser4 fixes
+ide-move-config_ide_max_hwifs-into-linux-ideh.patch
IDE cleanup
+add-dm-snapshot-tutorial-in-documentation.patch
Devicemapper documentation
+documentation-ioctl-messtxt-start-annotating-i-o.patch
Updates to ioctl documentation
+tty-layer-buffering-revamp-icom-fixes.patch
+tty-layer-buffering-revamp-isdn-layer.patch
+driver-char-n_hdlcc-remove-unused-declaration.patch
More tty layer fallout fixes
All 484 patches:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/patch-list
On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
> git-ocfs2-prep.patch
> git-ocfs2.patch
As the truncate_inode_pages patch is now in Linus' git, it is
no longer in git-ocfs2.patch. -rc2-mm1 is effectively reverting it.
git-ocfs2-prep.patch should be removed.
Joel
--
"There is no sincerer love than the love of food."
- George Bernard Shaw
Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127
Hi,
On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
>
> - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
>
> - Various random other things - nothing major.
Overall boots up and looks fine, but still seeing this oops which comes up on
warm reboot intermittently:
ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
ahci(0000:00:1f.2) flags: 64bit ncq led slum part
ata1: SATA max UDMA/133 cmd 0xF8802D00 ctl 0x0 bmdma 0x0 irq 193
ata2: SATA max UDMA/133 cmd 0xF8802D80 ctl 0x0 bmdma 0x0 irq 193
ata3: SATA max UDMA/133 cmd 0xF8802E00 ctl 0x0 bmdma 0x0 irq 193
ata4: SATA max UDMA/133 cmd 0xF8802E80 ctl 0x0 bmdma 0x0 irq 193
ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : ahci
ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : ahci
ata3: no device found (phy stat 00000000)
scsi2 : ahci
ata4: no device found (phy stat 00000000)
scsi3 : ahci
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
scheduling while atomic: ksoftirqd/0/0x00000100/3
[<c0103ad0>] dump_stack+0x17/0x19
[<c031483a>] schedule+0x8ba/0xccb
[<c0315d17>] __down+0xe5/0x126
[<c0313f1a>] __down_failed+0xa/0x10
[<c0233f3d>] .text.lock.main+0x2b/0x3e
[<c022f90c>] device_del+0x35/0x5d
[<c025d71e>] scsi_target_reap+0x89/0xa3
[<c025ed5a>] scsi_device_dev_release+0x114/0x18b
[<c022f504>] device_release+0x1a/0x5a
[<c01e15c2>] kobject_cleanup+0x43/0x6b
[<c01e15f5>] kobject_release+0xb/0xd
[<c01e1e3c>] kref_put+0x2e/0x92
[<c01e160b>] kobject_put+0x14/0x16
[<c022f8d5>] put_device+0x11/0x13
[<c0256fd8>] scsi_put_command+0x7c/0x9e
[<c025b918>] scsi_next_command+0xf/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Unable to handle kernel paging request<5>SCSI device sda: 156301488 512-byte
hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
sda: at virtual address 6b6b6b6b
printing eip:
c025b81f
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file:
Modules linked in:
CPU: 0
EIP: 0060:[<c025b81f>] Not tainted VLI
EFLAGS: 00010292 (2.6.14-rc2-mm1)
EIP is at scsi_run_queue+0x12/0xb8
eax: 6b6b6b6b ebx: f7c36b70 ecx: 00000000 edx: 00000001
esi: f7c4eb6c edi: 00000246 ebp: c1911eac esp: c1911e98
ds: 007b es: 007b ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c1910000 task=c1942a90)
Stack: c1baf5f8 f7c36b70 f7c36b70 f7c4eb6c 00000246 c1911eb8 c025b91f f7c386e8
c1911ed0 c025b9db f7c36b70 f7c4eb6c 00000000 00000000 c1911f28 c025bdd4
00000001 00004f80 00000100 00000001 c1807ac0 00000000 00000000 00040000
Call Trace:
[<c0103a83>] show_stack+0x94/0xca
[<c0103c2c>] show_registers+0x15a/0x1ea
[<c0103e4a>] die+0x108/0x183
[<c03166cd>] do_page_fault+0x1ed/0x63d
[<c0103753>] error_code+0x4f/0x54
[<c025b91f>] scsi_next_command+0x16/0x19
[<c025b9db>] scsi_end_request+0x93/0xc5
[<c025bdd4>] scsi_io_completion+0x281/0x46a
[<c025c1c8>] scsi_generic_done+0x2d/0x3a
[<c0257746>] scsi_finish_command+0x7f/0x93
[<c025762b>] scsi_softirq+0xab/0x11c
[<c0121952>] __do_softirq+0x72/0xdc
[<c01219f3>] do_softirq+0x37/0x39
[<c0121eeb>] ksoftirqd+0x9f/0xf4
[<c012ff37>] kthread+0x99/0x9d
[<c01010b5>] kernel_thread_helper+0x5/0xb
Code: fd ff 8b 4d ec 8b 41 44 e8 e4 a6 0b 00 89 45 f0 89 d8 e8 34 c1 ff ff eb
b2 55 89 e5 57 56 53 83 ec 08 89 45 f0 8b 80 10 01 00 00 <8b> 38 80 b8 85 01
00 00 00 0f 88 8b 00 00 00 8b 47 44 e8 af a6
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 60 seconds..
This is not new to this -mm release (I had a screen dump of it 2 weeks ago but
I suspect it is actually a bit older than that even).
reuben
Reuben Farrelly <[email protected]> wrote:
>
> Hi,
>
> On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
> >
> > - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
> >
> > - Various random other things - nothing major.
>
> Overall boots up and looks fine, but still seeing this oops which comes up on
> warm reboot intermittently:
Nasty.
> ahci(0000:00:1f.2) AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA mode
> ahci(0000:00:1f.2) flags: 64bit ncq led slum part
> ata1: SATA max UDMA/133 cmd 0xF8802D00 ctl 0x0 bmdma 0x0 irq 193
> ata2: SATA max UDMA/133 cmd 0xF8802D80 ctl 0x0 bmdma 0x0 irq 193
> ata3: SATA max UDMA/133 cmd 0xF8802E00 ctl 0x0 bmdma 0x0 irq 193
> ata4: SATA max UDMA/133 cmd 0xF8802E80 ctl 0x0 bmdma 0x0 irq 193
> ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata1: dev 0 configured for UDMA/133
> scsi0 : ahci
> ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata2: dev 0 configured for UDMA/133
> scsi1 : ahci
> ata3: no device found (phy stat 00000000)
> scsi2 : ahci
> ata4: no device found (phy stat 00000000)
> scsi3 : ahci
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> scheduling while atomic: ksoftirqd/0/0x00000100/3
> [<c0103ad0>] dump_stack+0x17/0x19
> [<c031483a>] schedule+0x8ba/0xccb
> [<c0315d17>] __down+0xe5/0x126
> [<c0313f1a>] __down_failed+0xa/0x10
> [<c0233f3d>] .text.lock.main+0x2b/0x3e
> [<c022f90c>] device_del+0x35/0x5d
> [<c025d71e>] scsi_target_reap+0x89/0xa3
> [<c025ed5a>] scsi_device_dev_release+0x114/0x18b
> [<c022f504>] device_release+0x1a/0x5a
> [<c01e15c2>] kobject_cleanup+0x43/0x6b
> [<c01e15f5>] kobject_release+0xb/0xd
> [<c01e1e3c>] kref_put+0x2e/0x92
> [<c01e160b>] kobject_put+0x14/0x16
> [<c022f8d5>] put_device+0x11/0x13
> [<c0256fd8>] scsi_put_command+0x7c/0x9e
> [<c025b918>] scsi_next_command+0xf/0x19
> [<c025b9db>] scsi_end_request+0x93/0xc5
> [<c025bdd4>] scsi_io_completion+0x281/0x46a
> [<c025c1c8>] scsi_generic_done+0x2d/0x3a
> [<c0257746>] scsi_finish_command+0x7f/0x93
> [<c025762b>] scsi_softirq+0xab/0x11c
> [<c0121952>] __do_softirq+0x72/0xdc
> [<c01219f3>] do_softirq+0x37/0x39
> [<c0121eeb>] ksoftirqd+0x9f/0xf4
> [<c012ff37>] kthread+0x99/0x9d
> [<c01010b5>] kernel_thread_helper+0x5/0xb
There's a whole bunch of reasons why we cannot call scsi_target_reap() from
softirq context. klist_del() locking and whatever semaphore that's taking
are amongst them...
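To make the constraint concrete, here is an illustrative sketch (simplified names, not the actual scsi/driver-core code) of the shape the trace shows: the release path ends up in down(), which may sleep, while scsi_softirq() runs in softirq context where sleeping is forbidden.
#include <asm/semaphore.h>
/*
 * Illustrative only.  The trace above is:
 *
 *   scsi_softirq()                 <- softirq context: must not sleep
 *     -> scsi_put_command()
 *       -> ... -> device_del()
 *         -> klist_del()/down()    <- down() calls schedule() when
 *                                     the semaphore is contended
 */
static void release_path_sketch(struct semaphore *sem)
{
	down(sem);	/* may sleep -> "scheduling while atomic" */
	/* ... unlink the device ... */
	up(sem);
}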
> Unable to handle kernel paging request<5>SCSI device sda: 156301488 512-byte
> hdwr sectors (80026 MB)
> SCSI device sda: drive cache: write back
> SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> SCSI device sda: drive cache: write back
> sda: at virtual address 6b6b6b6b
> printing eip:
> c025b81f
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> last sysfs file:
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c025b81f>] Not tainted VLI
> EFLAGS: 00010292 (2.6.14-rc2-mm1)
> EIP is at scsi_run_queue+0x12/0xb8
> eax: 6b6b6b6b ebx: f7c36b70 ecx: 00000000 edx: 00000001
> esi: f7c4eb6c edi: 00000246 ebp: c1911eac esp: c1911e98
> ds: 007b es: 007b ss: 0068
> Process ksoftirqd/0 (pid: 3, threadinfo=c1910000 task=c1942a90)
> Stack: c1baf5f8 f7c36b70 f7c36b70 f7c4eb6c 00000246 c1911eb8 c025b91f f7c386e8
> c1911ed0 c025b9db f7c36b70 f7c4eb6c 00000000 00000000 c1911f28 c025bdd4
> 00000001 00004f80 00000100 00000001 c1807ac0 00000000 00000000 00040000
> Call Trace:
> [<c0103a83>] show_stack+0x94/0xca
> [<c0103c2c>] show_registers+0x15a/0x1ea
> [<c0103e4a>] die+0x108/0x183
> [<c03166cd>] do_page_fault+0x1ed/0x63d
> [<c0103753>] error_code+0x4f/0x54
> [<c025b91f>] scsi_next_command+0x16/0x19
> [<c025b9db>] scsi_end_request+0x93/0xc5
> [<c025bdd4>] scsi_io_completion+0x281/0x46a
> [<c025c1c8>] scsi_generic_done+0x2d/0x3a
> [<c0257746>] scsi_finish_command+0x7f/0x93
> [<c025762b>] scsi_softirq+0xab/0x11c
> [<c0121952>] __do_softirq+0x72/0xdc
> [<c01219f3>] do_softirq+0x37/0x39
> [<c0121eeb>] ksoftirqd+0x9f/0xf4
> [<c012ff37>] kthread+0x99/0x9d
> [<c01010b5>] kernel_thread_helper+0x5/0xb
> Code: fd ff 8b 4d ec 8b 41 44 e8 e4 a6 0b 00 89 45 f0 89 d8 e8 34 c1 ff ff eb
> b2 55 89 e5 57 56 53 83 ec 08 89 45 f0 8b 80 10 01 00 00 <8b> 38 80 b8 85 01
> 00 00 00 0f 88 8b 00 00 00 8b 47 44 e8 af a6
> <0>Kernel panic - not syncing: Fatal exception in interrupt
> <0>Rebooting in 60 seconds..
>
It oopsed as well. That might be a second bug.
>
> This is not new to this -mm release (I had a screen dump of it 2 weeks ago but
> I suspect it is actually a bit older than that even).
>
Thanks.
Build breaks with this config (x440/summit):
http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/elm3b67
arch/i386/kernel/built-in.o(.init.text+0x389d): In function `set_nmi_ipi_callback':
/usr/local/autobench/var/tmp/build/arch/i386/kernel/traps.c:727: undefined reference to `usb_early_handoff'
arch/i386/kernel/built-in.o(.init.text+0x4ee0): In function `smp_read_mpc':
/usr/local/autobench/var/tmp/build/include/asm-i386/mach-summit/mach_mpparse.h:35: undefined reference to `usb_early_handoff'
Plus it panics on boot on Power-4 LPAR
Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
softlockup thread 1 started up.
Processor 2 found.
softlockup thread 2 started up.
Processor 3 found.
Brought up 4 CPUs
softlockup thread 3 started up.
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging disabled
PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
<7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
ibm,os-term call failed -1
I see regression in tty update speed with ADOM (ncurses based
roguelike) [1].
Messages at the top ("goblin hits you") are printed slowly. An eye can
notice letter after letter printing.
2.6.14-rc2 is OK.
I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
it'll change something.
[1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
"Martin J. Bligh" <[email protected]> wrote:
>
> Build breaks with this config (x440/summit):
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/config/abat/elm3b67
>
> arch/i386/kernel/built-in.o(.init.text+0x389d): In function `set_nmi_ipi_callback':
> /usr/local/autobench/var/tmp/build/arch/i386/kernel/traps.c:727: undefined reference to `usb_early_handoff'
> arch/i386/kernel/built-in.o(.init.text+0x4ee0): In function `smp_read_mpc':
> /usr/local/autobench/var/tmp/build/include/asm-i386/mach-summit/mach_mpparse.h:35: undefined reference to `usb_early_handoff'
>
grr. David had a hack in there which caused my links to fail so I hacked
it out and broke yours.
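The generic way out of this class of `undefined reference' is a config-dependent stub, so arch code can reference the symbol unconditionally. A hypothetical sketch of that pattern (the symbol's type is assumed here; this is not the actual fix):
#ifdef CONFIG_USB
extern void usb_early_handoff(void);
#else
static inline void usb_early_handoff(void) { }	/* stub when USB is not built */
#endif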
> Plus it panics on boot on Power-4 LPAR
>
> Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
> Mount-cache hash table entries: 256
> softlockup thread 0 started up.
> Processor 1 found.
> softlockup thread 1 started up.
> Processor 2 found.
> softlockup thread 2 started up.
> Processor 3 found.
> Brought up 4 CPUs
> softlockup thread 3 started up.
> NET: Registered protocol family 16
> PCI: Probing PCI hardware
> IOMMU table initialized, virtual merging disabled
> PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
> Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
>
> <7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
> ibm,os-term call failed -1
There are ppc64 IOMMU changes in Linus's tree...
>> Plus it panics on boot on Power-4 LPAR
>>
>> Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
>> Mount-cache hash table entries: 256
>> softlockup thread 0 started up.
>> Processor 1 found.
>> softlockup thread 1 started up.
>> Processor 2 found.
>> softlockup thread 2 started up.
>> Processor 3 found.
>> Brought up 4 CPUs
>> softlockup thread 3 started up.
>> NET: Registered protocol family 16
>> PCI: Probing PCI hardware
>> IOMMU table initialized, virtual merging disabled
>> PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
>> Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
>>
>> <7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
>> ibm,os-term call failed -1
>
> There are ppc64 IOMMU changes in Linus's tree...
Thanks. will retest with just linus.patch to confirm
On Thu, Sep 22, 2005 at 11:50:29PM +0400, Alexey Dobriyan wrote:
> I see regression in tty update speed with ADOM (ncurses based
> roguelike) [1].
>
> Messages at the top ("goblin hits you") are printed slowly. An eye can
> notice letter after letter printing.
>
> 2.6.14-rc2 is OK.
>
> I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
> it'll change something.
>
> [1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
Scratch TTY revamp, the sucker is
fix-sys_poll-large-timeout-handling.patch
HZ=250 here.
------------------------------------------------------------------------
From: Nishanth Aravamudan <[email protected]>
The @timeout parameter to sys_poll() is in milliseconds but we compare it
to (MAX_SCHEDULE_TIMEOUT / HZ), which is (jiffies/jiffies-per-sec) or
seconds. That seems blatantly broken. This led to improper overflow
checking for @timeout. As Andrew Morton pointed out, the best fix is to
check for potential overflow first, then either select an indefinite value
or convert @timeout.
To achieve this and clean-up the code, change the prototype of the sys_poll
to make it clear that the parameter is in milliseconds and introduce a
variable, timeout_jiffies, to hold the corresponding jiffies value.
Signed-off-by: Nishanth Aravamudan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
fs/select.c | 36 ++++++++++++++++++++++++++----------
include/linux/syscalls.h | 2 +-
2 files changed, 27 insertions(+), 11 deletions(-)
diff -puN fs/select.c~fix-sys_poll-large-timeout-handling fs/select.c
--- devel/fs/select.c~fix-sys_poll-large-timeout-handling 2005-09-10 02:35:19.000000000 -0700
+++ devel-akpm/fs/select.c 2005-09-10 03:26:17.000000000 -0700
@@ -464,15 +464,18 @@ static int do_poll(unsigned int nfds, s
return count;
}
-asmlinkage long sys_poll(struct pollfd __user * ufds, unsigned int nfds, long timeout)
+asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
+ long timeout_msecs)
{
struct poll_wqueues table;
- int fdcount, err;
+ int fdcount, err;
+ int overflow;
unsigned int i;
struct poll_list *head;
struct poll_list *walk;
struct fdtable *fdt;
int max_fdset;
+ unsigned long timeout_jiffies;
/* Do a sanity check on nfds ... */
rcu_read_lock();
@@ -482,13 +485,26 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;
- if (timeout) {
- /* Careful about overflow in the intermediate values */
- if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT / HZ)
- timeout = (unsigned long)(timeout*HZ+999)/1000+1;
- else /* Negative or overflow */
- timeout = MAX_SCHEDULE_TIMEOUT;
- }
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to avoid
+ * converting any value to a numerically higher value, which
+ * could overflow.
+ */
+#if HZ > 1000
+ overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+#else
+ overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+#endif
+
+ /*
+ * If we would overflow in the conversion or a negative timeout
+ * is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
poll_initwait(&table);
@@ -519,7 +535,7 @@ asmlinkage long sys_poll(struct pollfd _
}
i -= pp->len;
}
- fdcount = do_poll(nfds, head, &table, timeout);
+ fdcount = do_poll(nfds, head, &table, timeout_jiffies);
/* OK, now copy the revents fields back to user space. */
walk = head;
diff -puN include/linux/syscalls.h~fix-sys_poll-large-timeout-handling include/linux/syscalls.h
--- devel/include/linux/syscalls.h~fix-sys_poll-large-timeout-handling 2005-09-10 02:35:19.000000000 -0700
+++ devel-akpm/include/linux/syscalls.h 2005-09-10 02:35:19.000000000 -0700
@@ -420,7 +420,7 @@ asmlinkage long sys_socketpair(int, int,
asmlinkage long sys_socketcall(int call, unsigned long __user *args);
asmlinkage long sys_listen(int, int);
asmlinkage long sys_poll(struct pollfd __user *ufds, unsigned int nfds,
- long timeout);
+ long timeout_msecs);
asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp);
asmlinkage long sys_epoll_create(int size);
_
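For readers puzzled by the #if HZ > 1000 above: one of the two conversion helpers multiplies its argument and the other divides, depending on which side of 1000 HZ falls, so the patch always converts in the direction that cannot grow the number. A simplified sketch of the helpers' shape for HZ values that divide 1000 evenly (the real versions in include/linux/jiffies.h also handle the general case):
static inline unsigned int jiffies_to_msecs(const unsigned long j)
{
#if HZ <= 1000
	return (1000 / HZ) * j;				/* multiplies: can overflow */
#else
	return (j + HZ / 1000 - 1) / (HZ / 1000);	/* divides: safe */
#endif
}
static inline unsigned long msecs_to_jiffies(const unsigned int m)
{
#if HZ <= 1000
	return (m + 1000 / HZ - 1) / (1000 / HZ);	/* divides: safe */
#else
	return m * (HZ / 1000);				/* multiplies: can overflow */
#endif
}
Hence: with HZ > 1000 the patch compares timeout_msecs against jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT) (a division), and with HZ <= 1000 it compares msecs_to_jiffies(timeout_msecs) (also a division) against MAX_SCHEDULE_TIMEOUT.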
Hi Andrew,
My ide-based AMD64 machine doesn't boot 2.6.14-rc2-mm1.
Known issue?
Thanks,
Badari
Badari Pulavarty <[email protected]> wrote:
>
> My ide-based AMD64 machine doesn't boot 2.6.14-rc2-mm1.
> Known issue?
Nope. How does that dmesg output differ from 2.6.14-rc2's?
On 23.09.2005 [01:49:26 +0400], Alexey Dobriyan wrote:
> On Thu, Sep 22, 2005 at 11:50:29PM +0400, Alexey Dobriyan wrote:
> > I see regression in tty update speed with ADOM (ncurses based
> > roguelike) [1].
> >
> > Messages at the top ("goblin hits you") are printed slowly. An eye can
> > notice letter after letter printing.
> >
> > 2.6.14-rc2 is OK.
> >
> > I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
> > it'll change something.
> >
> > [1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
>
> Scratch TTY revamp, the sucker is
> fix-sys_poll-large-timeout-handling.patch
>
> HZ=250 here.
Alexey,
Thanks for the report. I will take a look on my Thinkpad with HZ=250
under -mm2. I have some ideas for debugging it if I see the same
problem.
Thanks,
Nish
--"Martin J. Bligh" <[email protected]> wrote (on Thursday, September 22, 2005 13:14:11 -0700):
>>> Plus it panics on boot on Power-4 LPAR
>>>
>>> Memory: 30962716k/31457280k available (4308k kernel code, 494564k reserved, 1112k data, 253k bss, 420k init)
>>> Mount-cache hash table entries: 256
>>> softlockup thread 0 started up.
>>> Processor 1 found.
>>> softlockup thread 1 started up.
>>> Processor 2 found.
>>> softlockup thread 2 started up.
>>> Processor 3 found.
>>> Brought up 4 CPUs
>>> softlockup thread 3 started up.
>>> NET: Registered protocol family 16
>>> PCI: Probing PCI hardware
>>> IOMMU table initialized, virtual merging disabled
>>> PCI_DMA: iommu_table_setparms: /pci@3fffde0a000/pci@2,2 has missing tce entries !
>>> Kernel panic - not syncing: iommu_init_table: Can't allocate 1729382256943765922 bytes
>>>
>>> <7>RTAS: event: 3, Type: Internal Device Failure, Severity: 5
>>> ibm,os-term call failed -1
>>
>> There are ppc64 IOMMU changes in Linus's tree...
>
> Thanks. will retest with just linus.patch to confirm
Yeah, is broken there too. Borkage in mainline! ;-)
http://test.kernel.org/13316/debug/console.log
if someone wants to look ...
M.
On 9/22/05, Nishanth Aravamudan <[email protected]> wrote:
> On 23.09.2005 [01:49:26 +0400], Alexey Dobriyan wrote:
> > On Thu, Sep 22, 2005 at 11:50:29PM +0400, Alexey Dobriyan wrote:
> > > I see regression in tty update speed with ADOM (ncurses based
> > > roguelike) [1].
> > >
> > > Messages at the top ("goblin hits you") are printed slowly. An eye can
> > > notice letter after letter printing.
> > >
> > > 2.6.14-rc2 is OK.
> > >
> > > I'll try to revert tty-layer-buffering-revamp*.patch pieces and see if
> > > it'll change something.
> > >
> > > [1] http://adom.de/adom/download/linux/adom-111-elf.tar.gz (binary only)
> >
> > Scratch TTY revamp, the sucker is
> > fix-sys_poll-large-timeout-handling.patch
> >
> > HZ=250 here.
>
> Alexey,
>
> Thanks for the report. I will take a look on my Thinkpad with HZ=250
> under -mm2. I have some ideas for debugging it if I see the same
> problem.
I did not see any tty refresh problems on my TP with HZ=250 under
2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
adom binary you sent me. I even played two games just to make sure ;)
Is there any chance you can do an strace of the process while it is
slow to redraw your screen? Just to verify how poll() is being called
[if my patch is the problem, then poll() must be being used somewhat
differently than I expected -- e.g. a dependency on the broken
behavior]. The only thing I can think of right now is that I made
timeout_jiffies unsigned, when schedule_timeout() will treat it as
signed, but I'm not sure if that is the problem.
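For reference, schedule_timeout()'s 2.6 prototype (kernel/timer.c) is:
	fastcall signed long __sched schedule_timeout(signed long timeout);
so an unsigned timeout_jiffies larger than LONG_MAX would indeed arrive there as a negative value; values up to MAX_SCHEDULE_TIMEOUT (LONG_MAX) are fine.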
We may want to contact the adom author eventually to figure out how
poll() is being used in the Linux port, if strace is unable to help
further.
Thanks,
Nish
On Fri, Sep 23, 2005 at 10:12:11AM -0700, Nish Aravamudan wrote:
> I did not see any tty refresh problems on my TP with HZ=250 under
> 2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
> adom binary you sent me. I even played two games just to make sure ;)
The slowdown is HZ dependent:
* HZ=1000 - game is playable. If I didn't know the slowdown was there I
wouldn't notice it.
* HZ=100 - messages at the top are printed r e a l l y s l o w.
* HZ=250 - somewhere in the middle.
> Is there any chance you can do an strace of the process while it is
> slow to redraw your screen?
Typical pattern is:
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[11;18H\33[37m\33[40m[g] Gnome\r\33[12"..., 58) = 58
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[12;18H\33[37m\33[40m[h] Hurthling\r"..., 62) = 62
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0
write(1, "\33[13;18H\33[37m\33[40m[i] Orc\r\33[14d\33"..., 56) = 56
rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
I can send full strace log if needed.
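Assuming the pattern above is representative, a standalone timing reproducer (my own sketch, not from ADOM or ncurses) would look like this; on an affected kernel each zero-timeout poll() sleeps one jiffy instead of returning immediately, so the loop takes seconds at HZ=100:
#include <poll.h>
#include <stdio.h>
#include <sys/time.h>
int main(void)
{
	struct pollfd pfd = { .fd = 0, .events = POLLIN };
	struct timeval t0, t1;
	long us;
	int i;
	gettimeofday(&t0, NULL);
	for (i = 0; i < 1000; i++)
		poll(&pfd, 1, 0);	/* timeout == 0: must not sleep */
	gettimeofday(&t1, NULL);
	us = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
	printf("1000 zero-timeout polls: %ld us\n", us);
	return 0;
}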
On 23.09.2005 [22:42:16 +0400], Alexey Dobriyan wrote:
> On Fri, Sep 23, 2005 at 10:12:11AM -0700, Nish Aravamudan wrote:
> > I did not see any tty refresh problems on my TP with HZ=250 under
> > 2.6.14-rc2-mm1 (excuse the typo in my previous response) under the
> > adom binary you sent me. I even played two games just to make sure ;)
>
> The slowdown is HZ dependent:
> * HZ=1000 - game is playable. If I didn't know the slowdown was there I
> wouldn't notice it.
> * HZ=100 - messages at the top are printed r e a l l y s l o w.
> * HZ=250 - somewhere in the middle.
>
> > Is there any chance you can do an strace of the process while it is
> > slow to redraw your screen?
>
> Typical pattern is:
>
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> write(1, "\33[11;18H\33[37m\33[40m[g] Gnome\r\33[12"..., 58) = 58
> rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> write(1, "\33[12;18H\33[37m\33[40m[h] Hurthling\r"..., 62) = 62
> rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> poll([{fd=0, events=POLLIN}], 1, 0) = 0
> write(1, "\33[13;18H\33[37m\33[40m[i] Orc\r\33[14d\33"..., 56) = 56
> rt_sigaction(SIGTSTP, {0xb7f1e578, [], SA_RESTART}, NULL, 8) = 0
> rt_sigaction(SIGTSTP, {SIG_IGN}, {0xb7f1e578, [], SA_RESTART}, 8) = 0
>
> I can send full strace log if needed.
Nope, that helped tremendously! I think I know what the issue is (and
why it's HZ dependent).
In the current code, (2.6.13.2, e.g) we allow 0 timeout poll-requests to
be resolved as 0 jiffy requests. But in my patch, those requests become
1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
Care to try the following patch?
Note: I would be happy to not do the conditional and just have the patch
change the msecs_to_jiffies() line when assigning to timeout_jiffies.
But I figured it would be best to avoid *all* computations if we know
the resulting value is going to be 0. Hence all the tab changing.
Thanks,
Nish
Description: Modifying sys_poll() to handle large timeouts correctly
resulted in 0 being treated just like any other millisecond request,
while the current code treats it as an optimized case. Do the same in
the new code. Most of the code change is tabbing due to the inserted if.
Signed-off-by: Nishanth Aravamudan <[email protected]>
---
fs/select.c | 41 +++++++++++++++++++++++++----------------
1 files changed, 25 insertions(+), 16 deletions(-)
diff -urpN 2.6.14-rc2-mm1/fs/select.c 2.6.14-rc2-mm1-dev/fs/select.c
--- 2.6.14-rc2-mm1/fs/select.c 2005-09-23 11:52:36.000000000 -0700
+++ 2.6.14-rc2-mm1-dev/fs/select.c 2005-09-23 12:04:03.000000000 -0700
@@ -485,26 +485,35 @@ asmlinkage long sys_poll(struct pollfd _
if (nfds > max_fdset && nfds > OPEN_MAX)
return -EINVAL;
- /*
- * We compare HZ with 1000 to work out which side of the
- * expression needs conversion. Because we want to avoid
- * converting any value to a numerically higher value, which
- * could overflow.
- */
+ if (timeout_msecs) {
+ /*
+ * We compare HZ with 1000 to work out which side of the
+ * expression needs conversion. Because we want to
+ * avoid converting any value to a numerically higher
+ * value, which could overflow.
+ */
#if HZ > 1000
- overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
+ overflow = timeout_msecs >=
+ jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
#else
- overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
+ overflow = msecs_to_jiffies(timeout_msecs) >=
+ MAX_SCHEDULE_TIMEOUT;
#endif
- /*
- * If we would overflow in the conversion or a negative timeout
- * is requested, sleep indefinitely.
- */
- if (overflow || timeout_msecs < 0)
- timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
- else
- timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ /*
+ * If we would overflow in the conversion or a negative
+ * timeout is requested, sleep indefinitely.
+ */
+ if (overflow || timeout_msecs < 0)
+ timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
+ else
+ timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
+ } else {
+ /*
+ * 0 millisecond requests become 0 jiffy requests
+ */
+ timeout_jiffies = 0;
+ }
poll_initwait(&table);
On Fri, Sep 23, 2005 at 12:07:49PM -0700, Nishanth Aravamudan wrote:
> On 23.09.2005 [22:42:16 +0400], Alexey Dobriyan wrote:
> > poll([{fd=0, events=POLLIN}], 1, 0) = 0
> > I can send full strace log if needed.
>
> Nope, that helped tremendously! I think I know what the issue is (and
> why it's HZ dependent).
>
> In the current code, (2.6.13.2, e.g) we allow 0 timeout poll-requests to
> be resolved as 0 jiffy requests. But in my patch, those requests become
> 1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
>
> Care to try the following patch?
It works! Now, even with HZ=100, gameplay is smooth.
Andrew, please, apply.
> Description: Modifying sys_poll() to handle large timeouts correctly
> resulted in 0 being treated just like any other millisecond request,
> while the current code treats it as an optimized case. Do the same in
> the new code. Most of the code change is tabbing due to the inserted if.
> diff -urpN 2.6.14-rc2-mm1/fs/select.c 2.6.14-rc2-mm1-dev/fs/select.c
> --- 2.6.14-rc2-mm1/fs/select.c 2005-09-23 11:52:36.000000000 -0700
> +++ 2.6.14-rc2-mm1-dev/fs/select.c 2005-09-23 12:04:03.000000000 -0700
> @@ -485,26 +485,35 @@ asmlinkage long sys_poll(struct pollfd _
> if (nfds > max_fdset && nfds > OPEN_MAX)
> return -EINVAL;
>
> - /*
> - * We compare HZ with 1000 to work out which side of the
> - * expression needs conversion. Because we want to avoid
> - * converting any value to a numerically higher value, which
> - * could overflow.
> - */
> + if (timeout_msecs) {
> + /*
> + * We compare HZ with 1000 to work out which side of the
> + * expression needs conversion. Because we want to
> + * avoid converting any value to a numerically higher
> + * value, which could overflow.
> + */
> #if HZ > 1000
> - overflow = timeout_msecs >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
> + overflow = timeout_msecs >=
> + jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT);
> #else
> - overflow = msecs_to_jiffies(timeout_msecs) >= MAX_SCHEDULE_TIMEOUT;
> + overflow = msecs_to_jiffies(timeout_msecs) >=
> + MAX_SCHEDULE_TIMEOUT;
> #endif
>
> - /*
> - * If we would overflow in the conversion or a negative timeout
> - * is requested, sleep indefinitely.
> - */
> - if (overflow || timeout_msecs < 0)
> - timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
> - else
> - timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
> + /*
> + * If we would overflow in the conversion or a negative
> + * timeout is requested, sleep indefinitely.
> + */
> + if (overflow || timeout_msecs < 0)
> + timeout_jiffies = MAX_SCHEDULE_TIMEOUT;
> + else
> + timeout_jiffies = msecs_to_jiffies(timeout_msecs) + 1;
> + } else {
> + /*
> + * 0 millisecond requests become 0 jiffy requests
> + */
> + timeout_jiffies = 0;
> + }
>
> poll_initwait(&table);
>
On 23.09.2005 [23:42:53 +0400], Alexey Dobriyan wrote:
> On Fri, Sep 23, 2005 at 12:07:49PM -0700, Nishanth Aravamudan wrote:
> > On 23.09.2005 [22:42:16 +0400], Alexey Dobriyan wrote:
> > > poll([{fd=0, events=POLLIN}], 1, 0) = 0
>
> > > I can send full strace log if needed.
> >
> > Nope, that helped tremendously! I think I know what the issue is (and
> > why it's HZ dependent).
> >
> > In the current code, (2.6.13.2, e.g) we allow 0 timeout poll-requests to
> > be resolved as 0 jiffy requests. But in my patch, those requests become
> > 1 jiffy (which of course depends on HZ and gets quite long if HZ=100)!
> >
> > Care to try the following patch?
>
> It works! Now, even with HZ=100, gameplay is smooth.
>
> Andrew, please, apply.
Great! Thanks for the testing, Alexey.
-Nish
On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
[...]
> +reiser4-ver_linux-dont-print-reiser4progs-version-if-none-found.patch
> +reiser4-atime-update-fix.patch
> +reiser4-use-try_to_freeze.patch
>
> reiser4 fixes
Runs well, except that reiser4 seems to do bad things in do_sendfile.
I have apache2 running here and it refuses to serve my ~/public_html
homepage. /home is on a reiser4 partition, and while apache2 serves
pages fine from other filesystems, stracing the process while
requesting my homepage I get:
stat64("/home/mattia/public_html/index.html", {st_mode=S_IFREG|0644, st_size=2315, ...}) = 0
open("/home/mattia/public_html/index.html", O_RDONLY) = 12
setsockopt(11, SOL_TCP, TCP_NODELAY, [0], 4) = 0
setsockopt(11, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(11, [{"HTTP/1.1 200 OK\r\nDate: Sat, 24 S"..., 328}], 1) = 328
sendfile(11, 12, [0], 2315) = -1 EINVAL (Invalid argument)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
setsockopt(11, SOL_TCP, TCP_CORK, [0], 4) = 0
setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
read(11, 0x82297f0, 8000) = -1 EAGAIN (Resource temporarily unavailable)
write(10, "127.0.0.1 - - [24/Sep/2005:10:13"..., 95) = 95
close(11) = 0
read(5, 0xbfe4c4e3, 1) = -1 EAGAIN (Resource temporarily unavailable)
close(12) = 0
--
mattia
:wq!
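For context on that EINVAL: in 2.6, do_sendfile() rejects an input file whose filesystem doesn't supply a ->sendfile file operation. Paraphrasing the relevant check in fs/read_write.c (not a verbatim quote):
	retval = -EINVAL;
	if (!in_file->f_op || !in_file->f_op->sendfile)
		goto fput_in;
so if reiser4 isn't wiring up .sendfile in its file_operations (or the method it points at bails out), sendfile(2) fails with EINVAL exactly as the strace shows, and apache gives up on the page.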
On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
Herm... running almost well :) I just got the below allocation failure
(including /proc/slabinfo and /proc/vmstat; useful? I can provide more
info if it happens again. Ah, exim is just running for local delivery
purposes only.) I did see it previously in .14-rc1-mm1 only, but I didn't
find enough time to report it properly.
Linux version 2.6.14-rc2-mm1-1 (mattia@inferi) (gcc version 4.0.1 (Debian 4.0.1-2)) #1 PREEMPT Fri Sep 23 20:56:05 CEST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000c0000 - 00000000000d0000 (reserved)
BIOS-e820: 00000000000d8000 - 00000000000e0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
BIOS-e820: 000000000fef0000 - 000000000feff000 (ACPI data)
BIOS-e820: 000000000feff000 - 000000000ff00000 (ACPI NVS)
BIOS-e820: 000000000ff00000 - 000000000ff80000 (usable)
BIOS-e820: 000000000ff80000 - 0000000010000000 (reserved)
BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
255MB LOWMEM available.
On node 0 totalpages: 65408
DMA zone: 4096 pages, LIFO batch:2
DMA32 zone: 0 pages, LIFO batch:2
Normal zone: 61312 pages, LIFO batch:32
HighMem zone: 0 pages, LIFO batch:2
DMI present.
[...]
exim4: page allocation failure. order:1, mode:0x80000020
[<c0143698>] __alloc_pages+0x328/0x450
[<c0147150>] kmem_getpages+0x30/0xa0
[<c01480cf>] cache_grow+0xbf/0x1f0
[<c0148446>] cache_alloc_refill+0x246/0x280
[<c0148793>] __kmalloc+0x73/0x80
[<c0291cd8>] pskb_expand_head+0x58/0x150
[<c0297143>] skb_checksum_help+0x103/0x120
[<d0c6d1cc>] ip_nat_fn+0x1cc/0x240 [iptable_nat]
[<d0c763e8>] ip_conntrack_in+0x188/0x2c0 [ip_conntrack]
[<d0c6d45e>] ip_nat_local_fn+0x7e/0xc0 [iptable_nat]
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02e7c2b>] nf_iterate+0x6b/0xa0
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02e7cc4>] nf_hook_slow+0x64/0x140
[<c02b2670>] dst_output+0x0/0x30
[<c02b2670>] dst_output+0x0/0x30
[<c02b35ae>] ip_queue_xmit+0x23e/0x550
[<c02b2670>] dst_output+0x0/0x30
[<c01e1b9a>] __copy_to_user_ll+0x4a/0x90
[<c0293a6e>] memcpy_toiovec+0x6e/0x90
[<c02c4c75>] tcp_cwnd_restart+0x35/0xf0
[<c02c5276>] tcp_transmit_skb+0x426/0x780
[<c02c332e>] tcp_rcv_established+0x6e/0x8c0
[<c02c657d>] tcp_write_xmit+0x12d/0x3d0
[<c02c6855>] __tcp_push_pending_frames+0x35/0xb0
[<c02bad3c>] tcp_sendmsg+0xa3c/0xb50
[<c028c67f>] sock_aio_write+0xcf/0x120
[<c016029d>] do_sync_write+0xcd/0x130
[<c0131ed0>] autoremove_wake_function+0x0/0x60
[<c016047f>] vfs_write+0x17f/0x190
[<c016055b>] sys_write+0x4b/0x80
[<c01032a1>] syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: low 0, high 12, batch 2 used:8
cpu 0 cold: low 0, high 4, batch 1 used:3
DMA32 per-cpu: empty
Normal per-cpu:
cpu 0 hot: low 0, high 192, batch 32 used:14
cpu 0 cold: low 0, high 64, batch 16 used:51
HighMem per-cpu: empty
Free pages: 4112kB (0kB HighMem)
Active:46238 inactive:10857 dirty:16 writeback:0 unstable:0 free:1028 slab:4078 mapped:39343 pagetables:316
DMA free:1224kB min:128kB low:160kB high:192kB active:6812kB inactive:3684kB present:16384kB pages_scanned:36 all_unreclaimable? no
lowmem_reserve[]: 0 0 239 239
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 239 239
Normal free:2888kB min:1916kB low:2392kB high:2872kB active:178140kB inactive:39744kB present:245248kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 300*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1224kB
DMA32: empty
Normal: 722*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2888kB
HighMem: empty
Swap cache: add 33, delete 33, find 0/0, race 0+0
Free swap = 248864kB
Total swap = 248996kB
Free swap: 248864kB
65408 pages of RAM
0 pages of HIGHMEM
1529 reserved pages
46307 pages shared
0 pages swap cached
16 pages dirty
0 pages writeback
39343 pages mapped
4078 pages slab
316 pages pagetables
cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfs_write_data 36 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_read_data 32 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_inode_cache 3 14 560 7 1 : tunables 54 27 0 : slabdata 2 2 0
nfs_page 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 20 192 20 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 8 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
ip_conntrack_expect 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
ip_conntrack 1 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
scsi_cmd_cache 1 11 352 11 1 : tunables 54 27 0 : slabdata 1 1 0
d_cursor 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
file_fsdata 71 75 256 15 1 : tunables 120 60 0 : slabdata 5 5 0
dentry_fsdata 2188 3658 64 59 1 : tunables 120 60 0 : slabdata 62 62 0
fq 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
jnode 1869 4480 96 40 1 : tunables 120 60 0 : slabdata 112 112 0
txn_handle 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
txn_atom 1 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
plugin_set 73 118 64 59 1 : tunables 120 60 0 : slabdata 2 2 0
znode 4704 7888 224 17 1 : tunables 120 60 0 : slabdata 464 464 0
reiser4_inode 4057 4144 512 7 1 : tunables 54 27 0 : slabdata 592 592 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 32 60 128 30 1 : tunables 120 60 0 : slabdata 2 2 0
dm_tio 0 0 16 203 1 : tunables 120 60 0 : slabdata 0 0 0
dm_io 0 0 16 203 1 : tunables 120 60 0 : slabdata 0 0 0
uhci_urb_priv 1 92 40 92 1 : tunables 120 60 0 : slabdata 1 1 0
UNIX 77 77 352 11 1 : tunables 54 27 0 : slabdata 7 7 0
tcp_bind_bucket 15 203 16 203 1 : tunables 120 60 0 : slabdata 1 1 0
inet_peer_cache 1 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_alias 9 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_hash 9 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 31 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
arp_cache 3 30 128 30 1 : tunables 120 60 0 : slabdata 1 1 0
RAW 2 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
UDP 8 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
tw_sock_TCP 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
request_sock_TCP 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
TCP 15 16 960 4 1 : tunables 54 27 0 : slabdata 4 4 0
cfq_ioc_pool 0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
crq_pool 0 0 44 84 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 24 189 60 63 1 : tunables 120 60 0 : slabdata 3 3 0
mqueue_inode_cache 1 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
reiser_inode_cache 622 1450 392 10 1 : tunables 54 27 0 : slabdata 145 145 0
dnotify_cache 0 0 20 169 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_watch_cache 0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 160 24 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
fasync_cache 2 203 16 203 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 748 756 408 9 1 : tunables 54 27 0 : slabdata 84 84 0
posix_timers_cache 0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 6 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_ioc 51 127 28 127 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 2 10 380 10 1 : tunables 54 27 0 : slabdata 1 1 0
blkdev_requests 25 78 152 26 1 : tunables 120 60 0 : slabdata 3 3 0
biovec-(256) 260 260 3072 2 2 : tunables 24 12 0 : slabdata 130 130 0
biovec-128 264 265 1536 5 2 : tunables 24 12 0 : slabdata 53 53 0
biovec-64 272 275 768 5 1 : tunables 54 27 0 : slabdata 55 55 0
biovec-16 272 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
biovec-4 272 295 64 59 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 279 406 16 203 1 : tunables 120 60 0 : slabdata 2 2 0
bio 279 354 64 59 1 : tunables 120 60 0 : slabdata 6 6 0
file_lock_cache 21 44 88 44 1 : tunables 120 60 0 : slabdata 1 1 0
sock_inode_cache 110 110 352 11 1 : tunables 54 27 0 : slabdata 10 10 0
skbuff_fclone_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
skbuff_head_cache 696 696 160 24 1 : tunables 120 60 0 : slabdata 29 29 0
acpi_operand 828 828 40 92 1 : tunables 120 60 0 : slabdata 9 9 0
acpi_parse_ext 61 84 44 84 1 : tunables 120 60 0 : slabdata 1 1 0
acpi_parse 41 127 28 127 1 : tunables 120 60 0 : slabdata 1 1 0
acpi_state 28 78 48 78 1 : tunables 120 60 0 : slabdata 1 1 0
proc_inode_cache 215 360 332 12 1 : tunables 54 27 0 : slabdata 30 30 0
sigqueue 4 26 148 26 1 : tunables 120 60 0 : slabdata 1 1 0
radix_tree_node 3568 4046 276 14 1 : tunables 54 27 0 : slabdata 289 289 0
bdev_cache 7 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
sysfs_dir_cache 4059 4140 40 92 1 : tunables 120 60 0 : slabdata 45 45 0
mnt_cache 27 40 96 40 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 1113 1272 316 12 1 : tunables 54 27 0 : slabdata 106 106 0
dentry_cache 5085 7569 136 29 1 : tunables 120 60 0 : slabdata 261 261 0
filp 1512 1632 160 24 1 : tunables 120 60 0 : slabdata 68 68 0
names_cache 11 11 4096 1 1 : tunables 24 12 0 : slabdata 11 11 0
idr_layer_cache 93 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0
buffer_head 3942 20592 48 78 1 : tunables 120 60 0 : slabdata 264 264 0
mm_struct 77 77 576 7 1 : tunables 54 27 0 : slabdata 11 11 0
vm_area_struct 3512 3740 88 44 1 : tunables 120 60 0 : slabdata 85 85 0
fs_cache 77 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 78 99 448 9 1 : tunables 54 27 0 : slabdata 11 11 0
signal_cache 99 99 352 11 1 : tunables 54 27 0 : slabdata 9 9 0
sighand_cache 84 84 1312 3 1 : tunables 24 12 0 : slabdata 28 28 0
task_struct 93 93 1328 3 1 : tunables 24 12 0 : slabdata 31 31 0
anon_vma 1504 1695 8 339 1 : tunables 120 60 0 : slabdata 5 5 0
pgd 64 64 4096 1 1 : tunables 24 12 0 : slabdata 64 64 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 2 2 32768 1 8 : tunables 8 4 0 : slabdata 2 2 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 95 95 8192 1 2 : tunables 8 4 0 : slabdata 95 95 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 100 100 4096 1 1 : tunables 24 12 0 : slabdata 100 100 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 310 328 2048 2 1 : tunables 24 12 0 : slabdata 164 164 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 176 176 1024 4 1 : tunables 54 27 0 : slabdata 44 44 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 624 624 512 8 1 : tunables 54 27 0 : slabdata 78 78 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 150 150 256 15 1 : tunables 120 60 0 : slabdata 10 10 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 1702 1800 128 30 1 : tunables 120 60 0 : slabdata 60 60 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 2641 2891 64 59 1 : tunables 120 60 0 : slabdata 49 49 0
size-32 3020 3616 32 113 1 : tunables 120 60 0 : slabdata 32 32 0
kmem_cache 160 160 96 40 1 : tunables 120 60 0 : slabdata 4 4 0
and
cat /proc/vmstat
nr_dirty 6
nr_writeback 0
nr_unstable 0
nr_page_table_pages 299
nr_mapped 39613
nr_slab 4128
pgpgin 853871
pgpgout 697604
pswpin 0
pswpout 33
pgalloc_high 0
pgalloc_normal 7729542
pgalloc_dma 739299
pgfree 8475900
pgactivate 194732
pgdeactivate 167948
pgfault 4652531
pgmajfault 2200
pgrefill_high 0
pgrefill_normal 921490
pgrefill_dma 53701
pgsteal_high 0
pgsteal_normal 225142
pgsteal_dma 32821
pgscan_kswapd_high 0
pgscan_kswapd_normal 218790
pgscan_kswapd_dma 31262
pgscan_direct_high 0
pgscan_direct_normal 63855
pgscan_direct_dma 10391
pginodesteal 888
slabs_scanned 1641984
kswapd_steal 196892
kswapd_inodesteal 17749
pageoutrun 5595
allocstall 1531
pgrotated 71
nr_bounce 0
--
mattia
Mattia Dongili <[email protected]> wrote:
>
> On Wed, Sep 21, 2005 at 10:28:39PM -0700, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
>
> Herm... running almost fine :) I just got the allocation failure below
> (including /proc/slabinfo and /proc/vmstat - useful? I can provide more
> info if it happens again - ah, exim is running for local delivery
> purposes only). I did see it previously in .14-rc1-mm1 only, but I
> didn't find enough time to report it properly.
>
> ...
> exim4: page allocation failure. order:1, mode:0x80000020
Yes, it's expected that
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch will cause more
fragmentation and will hence cause higher-order allocation attempts to
fail.
I think I'll drop that one.
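For orientation, rmqueue_bulk() is the routine that refills a CPU's
per-cpu page (pcp) list from the buddy allocator. The idea behind the
patch named above was to pull one physically contiguous higher-order
block per refill and split it up, instead of taking order-0 pages one
at a time. A rough sketch of that idea follows (hypothetical helper,
not the actual patch; real code would also have to fix up each split
page's refcount):

static int refill_pcp_from_block(struct zone *zone, unsigned int order,
				 struct list_head *list)
{
	/* caller holds zone->lock with interrupts disabled */
	struct page *block = __rmqueue(zone, order);	/* one order-N block */
	int i;

	if (block == NULL)
		return 0;
	/* hand the block out as 1 << order individual order-0 pages */
	for (i = 0; i < (1 << order); i++)
		list_add_tail(&block[i].lru, list);
	return 1 << order;
}

Every refill done this way consumes an order-N block that a later
order-1 or order-2 request might have needed, which is why the failure
reports above were expected.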
Folks,
Upon quick testing the latest mm kernel it appears there's some kind of
race condition when using a dual-core CPU, especially when using XORG
and a USB keyboard (although PS/2 has the same issue): the keyboard
rate becomes too fast.
The same behaviour happens on the vanilla 2.6.13 kernel. I am reporting
this also to the XORG list in the hope of helping to debug this issue.
The platform is nForce4 SLI from ASUS (A8N-SLI Premium) with a dual-core
X2 Athlon 3800+ processor.
XORG version is 6.8.2 under Slackware 10.2.
uname -a reports: Linux blaze 2.6.14-rc2-mm1 #1 SMP Sun Sep 25 17:03:22
EDT 2005 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
AuthenticAMD GNU/Linux
kernel config, dmesg output and lspci -vvv will be attached below.
I have confirmed this with another fellow who is using the same setup
and is having the same issue. Also worth noting is that the SATA
performance is very poor: the hdparm results give ~33MB/s, where on
nforce2 the rates would previously be in the ~58MB/s range. In
comparison, SCSI rates are in the ~52MB/s range. This happens on both
the sata_nv and sata_sil controllers on this mainboard.
One of the workarounds for me is to turn down the keyboard rate in the
gnome-keyboard tool, which helps. Also, when browsing websites the USB
mouse has problems with scrolling, and the window painting seems very
slow; typing www. in the url bar can take up to 10 seconds before the
bar shows previously entered urls. Playing MP3s makes the music skip
very badly. I have not tried a UP kernel, but from the reports I've
read the issue is gone there even when using X, as noted here:
http://lists.freedesktop.org/archives/xorg/2005-September/010148.html
I can help debug this; if more info is needed, please CC me on the
responses.
Best Regards,
Paul B.
Paul Blazejowski <[email protected]> wrote:
>
> Upon quick testing the latest mm kernel it appears there's some kind of
> race condition when using a dual-core CPU, especially when using XORG
> and a USB keyboard (although PS/2 has the same issue): the keyboard
> rate becomes too fast.
>
> The same behaviour happens on the vanilla 2.6.13 kernel. I am reporting
> this also to the XORG list in the hope of helping to debug this issue.
Is it possible to narrow this down a bit further? Was 2.6.12 OK?
If we can identify two reasonably-close-in-time versions either side of the
regression then the next step would be to run `dmesg -s 1000000' under both
kernel versions, then run `diff -u dmesg.good dmesg.bad'.
I had the same problem with 2.6.12. I'll run some tests with older kernels.
On 9/25/05, Andrew Morton <[email protected]> wrote:
> Paul Blazejowski <[email protected]> wrote:
> >
> > Upon quick testing the latest mm kernel it appears there's some kind of
> > race condition when using a dual-core CPU, especially when using XORG
> > and a USB keyboard (although PS/2 has the same issue): the keyboard
> > rate becomes too fast.
> >
> > The same behaviour happens on the vanilla 2.6.13 kernel. I am reporting
> > this also to the XORG list in the hope of helping to debug this issue.
>
> Is it possible to narrow this down a bit further? Was 2.6.12 OK?
>
> If we can identify two reasonably-close-in-time versions either side of the
> regression then the next step would be to run `dmesg -s 1000000' under both
> kernel versions, then run `diff -u dmesg.good dmesg.bad'.
>
>
--
Carlo J. Calica
On Sun, 25 Sep 2005, Paul Blazejowski wrote:
> Upon quick testing the latest mm kernel it appears there's some kind of
> race condition when using a dual-core CPU, especially when using XORG
> and a USB keyboard (although PS/2 has the same issue): the keyboard
> rate becomes too fast.
Does the following patch by John Stultz fix the problem?
Tim
>From [email protected] Mon Sep 26 09:04:08 2005
Date: Mon, 19 Sep 2005 12:16:43 -0700
From: john stultz <[email protected]>
To: Andrew Morton <[email protected]>
Cc: lkml <[email protected]>, Andi Kleen <[email protected]>
Subject: [PATCH] x86-64: Fix bad assumption that dualcore cpus have synced
TSCs
Andrew,
This patch should resolve the issue seen in bugme bug #5105, where it
is assumed that dualcore x86_64 systems have synced TSCs. This is not
the case, and alternate timesources should be used instead.
For more details, see:
http://bugzilla.kernel.org/show_bug.cgi?id=5105
Please consider for inclusion in your tree.
thanks
-john
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -959,9 +959,6 @@ static __init int unsynchronized_tsc(voi
are handled in the OEM check above. */
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
return 0;
- /* All in a single socket - should be synchronized */
- if (cpus_weight(cpu_core_map[0]) == num_online_cpus())
- return 0;
#endif
/* Assume multi socket systems are not synchronized */
return num_online_cpus() > 1;
On Sat, Sep 24, 2005 at 11:23:39AM -0700, Andrew Morton wrote:
> >
> > ...
> > exim4: page allocation failure. order:1, mode:0x80000020
>
> Yes, it's expected that
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch will cause more
> fragmentation and will hence cause higher-order allocation attempts to
> fail.
>
> I think I'll drop that one.
It seems from the log messages that quite a few pages are hanging around in the cpu's cold pcp list even under low memory conditions. Below is a patch to reduce the upper bound on the cold pcp list (...this got increased by my previous change).
I think we should also drain the CPU's hot and cold pcps for GFP_KERNEL page requests (in the event the higher-order request cannot otherwise be serviced). This will still only drain the current CPU's pcps in an MP environment (leaving the other CPUs' lists intact). I will send that patch later today.
[PATCH]: Reduce the high mark in cpu's cold pcp list.
Signed-off-by: Rohit Seth <[email protected]>
--- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
+++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
@@ -1749,7 +1749,7 @@
pcp = &p->pcp[1]; /* cold*/
pcp->count = 0;
pcp->low = 0;
- pcp->high = 2 * batch;
+ pcp->high = batch / 2;
pcp->batch = max(1UL, batch/2);
INIT_LIST_HEAD(&pcp->list);
}
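For context, the initialization this hunk lands in pairs a generously
sized hot list with a small cold list. A rough reconstruction of the
surrounding 2.6.13-era code (from memory, for context only - not
verbatim):

static void setup_pageset_sketch(struct per_cpu_pageset *p,
				 unsigned long batch)
{
	struct per_cpu_pages *pcp;

	pcp = &p->pcp[0];		/* hot */
	pcp->count = 0;
	pcp->low = 2 * batch;
	pcp->high = 6 * batch;
	pcp->batch = max(1UL, 1 * batch);
	INIT_LIST_HEAD(&pcp->list);

	pcp = &p->pcp[1];		/* cold */
	pcp->count = 0;
	pcp->low = 0;
	pcp->high = 2 * batch;		/* the line the patch changes */
	pcp->batch = max(1UL, batch / 2);
	INIT_LIST_HEAD(&pcp->list);
}

Note that the cold list's own batch is batch / 2, so the patched high
mark of batch / 2 equals one cold-list batch - the point that causes
some confusion further down the thread.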
Hi again,
On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
>
> - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
>
> - Various random other things - nothing major.
Just noticed this oops from about 4am this morning. This would have been
at about the time when the normal daily cronjobs run, but the machine
shouldn't have been doing much else.
Sep 27 04:04:28 tornado kernel: smbd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:28 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:28 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:28 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:28 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:28 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:28 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:28 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:28 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:28 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:28 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:28 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:28 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:04:28 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:04:28 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:04:29 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:04:29 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:04:29 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:04:29 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:04:29 tornado kernel: [<c02a6691>] sock_sendmsg+0xbf/0xe3
Sep 27 04:04:29 tornado kernel: [<c02a77be>] sys_sendto+0xa5/0xbe
Sep 27 04:04:29 tornado kernel: [<c02a780d>] sys_send+0x36/0x38
Sep 27 04:04:29 tornado kernel: [<c02a7ef7>] sys_socketcall+0x134/0x251
Sep 27 04:04:29 tornado kernel: [<c0102b5b>] sysenter_past_esp+0x54/0x75
Sep 27 04:04:29 tornado kernel: Mem-info:
Sep 27 04:04:29 tornado kernel: DMA per-cpu:
Sep 27 04:04:29 tornado kernel: cpu 0 hot: low 0, high 12, batch 2 used:10
Sep 27 04:04:29 tornado kernel: cpu 0 cold: low 0, high 4, batch 1 used:3
Sep 27 04:04:29 tornado kernel: cpu 1 hot: low 0, high 12, batch 2 used:10
Sep 27 04:04:29 tornado kernel: cpu 1 cold: low 0, high 4, batch 1 used:3
Sep 27 04:04:29 tornado kernel: DMA32 per-cpu: empty
Sep 27 04:04:30 tornado kernel: Normal per-cpu:
Sep 27 04:04:30 tornado kernel: cpu 0 hot: low 0, high 384, batch 64 used:346
Sep 27 04:04:30 tornado kernel: cpu 0 cold: low 0, high 128, batch 32 used:115
Sep 27 04:04:30 tornado kernel: cpu 1 hot: low 0, high 384, batch 64 used:324
Sep 27 04:04:30 tornado kernel: cpu 1 cold: low 0, high 128, batch 32 used:112
Sep 27 04:04:30 tornado kernel: HighMem per-cpu:
Sep 27 04:04:30 tornado kernel: cpu 0 hot: low 0, high 96, batch 16 used:38
Sep 27 04:04:30 tornado kernel: cpu 0 cold: low 0, high 32, batch 8 used:27
Sep 27 04:04:30 tornado kernel: cpu 1 hot: low 0, high 96, batch 16 used:36
Sep 27 04:04:30 tornado kernel: cpu 1 cold: low 0, high 32, batch 8 used:5
Sep 27 04:04:30 tornado kernel: Free pages: 38404kB (2720kB HighMem)
Sep 27 04:04:31 tornado kernel: Active:139410 inactive:49515 dirty:135
writeback:1 unstable:0 free:9601 slab:54525 mapped:88304 pagetables:776
Sep 27 04:04:31 tornado kernel: DMA free:5828kB min:68kB low:84kB high:100kB
active:100kB inactive:944kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:04:31 tornado kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:04:31 tornado kernel: Normal free:29856kB min:3756kB low:4692kB
high:5632kB active:446760kB inactive:188768kB present:901120kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:04:31 tornado kernel: lowmem_reserve[]: 0 0 0 1009
Sep 27 04:04:32 tornado kernel: HighMem free:2720kB min:128kB low:160kB
high:192kB active:110784kB inactive:8344kB present:129212kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:04:32 tornado kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 04:04:32 tornado kernel: DMA: 803*4kB 167*8kB 50*16kB 15*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5828kB
Sep 27 04:04:32 tornado kernel: DMA32: empty
Sep 27 04:04:32 tornado kernel: Normal: 6744*4kB 360*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 29856kB
Sep 27 04:04:32 tornado kernel: HighMem: 654*4kB 13*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2720kB
Sep 27 04:04:32 tornado kernel: Swap cache: add 40, delete 40, find 1/2, race 0+0
Sep 27 04:04:32 tornado kernel: Free swap = 497820kB
Sep 27 04:04:32 tornado kernel: Total swap = 497936kB
Sep 27 04:04:32 tornado kernel: Free swap: 497820kB
Sep 27 04:04:32 tornado kernel: 261679 pages of RAM
Sep 27 04:04:32 tornado kernel: 32303 pages of HIGHMEM
Sep 27 04:04:32 tornado kernel: 3160 reserved pages
Sep 27 04:04:32 tornado kernel: 160186 pages shared
Sep 27 04:04:32 tornado kernel: 0 pages swap cached
Sep 27 04:04:33 tornado kernel: 135 pages dirty
Sep 27 04:04:33 tornado kernel: 1 pages writeback
Sep 27 04:04:33 tornado kernel: 88304 pages mapped
Sep 27 04:04:33 tornado kernel: 54527 pages slab
Sep 27 04:04:33 tornado kernel: 776 pages pagetables
Sep 27 04:04:59 tornado kernel: smtpd: page allocation failure. order:1,
mode:0x80000020
Sep 27 04:04:59 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
Sep 27 04:04:59 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
Sep 27 04:04:59 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
Sep 27 04:04:59 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
Sep 27 04:04:59 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
Sep 27 04:04:59 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
Sep 27 04:04:59 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
Sep 27 04:04:59 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
Sep 27 04:04:59 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
Sep 27 04:04:59 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
Sep 27 04:04:59 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
Sep 27 04:04:59 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
Sep 27 04:05:00 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
Sep 27 04:05:00 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
Sep 27 04:05:00 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
Sep 27 04:05:00 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
Sep 27 04:05:01 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
Sep 27 04:05:01 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
Sep 27 04:05:01 tornado kernel: [<c02a69c6>] sock_aio_write+0xbd/0xf6
Sep 27 04:05:01 tornado kernel: [<c0159767>] do_sync_write+0xbb/0x10a
Sep 27 04:05:01 tornado kernel: [<c01598f7>] vfs_write+0x141/0x148
Sep 27 04:05:02 tornado kernel: [<c015999f>] sys_write+0x3d/0x64
Sep 27 04:05:02 tornado kernel: [<c0102b5b>] sysenter_past_esp+0x54/0x75
Sep 27 04:05:02 tornado kernel: Mem-info:
Sep 27 04:05:02 tornado kernel: DMA per-cpu:
Sep 27 04:05:02 tornado kernel: cpu 0 hot: low 0, high 12, batch 2 used:4
Sep 27 04:05:02 tornado kernel: cpu 0 cold: low 0, high 4, batch 1 used:3
Sep 27 04:05:02 tornado kernel: cpu 1 hot: low 0, high 12, batch 2 used:10
Sep 27 04:05:03 tornado kernel: cpu 1 cold: low 0, high 4, batch 1 used:3
Sep 27 04:05:03 tornado kernel: DMA32 per-cpu: empty
Sep 27 04:05:03 tornado kernel: Normal per-cpu:
Sep 27 04:05:03 tornado kernel: cpu 0 hot: low 0, high 384, batch 64 used:23
Sep 27 04:05:04 tornado kernel: cpu 0 cold: low 0, high 128, batch 32 used:115
Sep 27 04:05:04 tornado kernel: cpu 1 hot: low 0, high 384, batch 64 used:383
Sep 27 04:05:04 tornado kernel: cpu 1 cold: low 0, high 128, batch 32 used:120
Sep 27 04:05:04 tornado kernel: HighMem per-cpu:
Sep 27 04:05:04 tornado kernel: cpu 0 hot: low 0, high 96, batch 16 used:89
Sep 27 04:05:04 tornado kernel: cpu 0 cold: low 0, high 32, batch 8 used:3
Sep 27 04:05:05 tornado kernel: cpu 1 hot: low 0, high 96, batch 16 used:5
Sep 27 04:05:05 tornado kernel: cpu 1 cold: low 0, high 32, batch 8 used:27
Sep 27 04:05:05 tornado kernel: Free pages: 39608kB (2144kB HighMem)
Sep 27 04:05:05 tornado kernel: Active:132565 inactive:56281 dirty:100
writeback:1 unstable:0 free:9902 slab:54546 mapped:88341 pagetables:776
Sep 27 04:05:05 tornado kernel: DMA free:4704kB min:68kB low:84kB high:100kB
active:224kB inactive:948kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:05:05 tornado kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 880 1006
Sep 27 04:05:05 tornado kernel: Normal free:32760kB min:3756kB low:4692kB
high:5632kB active:418168kB inactive:216412kB present:901120kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 0 1009
Sep 27 04:05:05 tornado kernel: HighMem free:2144kB min:128kB low:160kB
high:192kB active:111868kB inactive:7764kB present:129212kB pages_scanned:0
all_unreclaimable? no
Sep 27 04:05:05 tornado kernel: lowmem_reserve[]: 0 0 0 0
Sep 27 04:05:05 tornado kernel: DMA: 936*4kB 108*8kB 6*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4704kB
Sep 27 04:05:05 tornado kernel: DMA32: empty
Sep 27 04:05:05 tornado kernel: Normal: 7484*4kB 349*8kB 2*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32760kB
Sep 27 04:05:05 tornado kernel: HighMem: 510*4kB 13*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2144kB
Sep 27 04:05:05 tornado kernel: Swap cache: add 40, delete 40, find 1/2, race 0+0
Sep 27 04:05:05 tornado kernel: Free swap = 497820kB
Sep 27 04:05:05 tornado kernel: Total swap = 497936kB
Sep 27 04:05:05 tornado kernel: Free swap: 497820kB
Sep 27 04:05:05 tornado kernel: 261679 pages of RAM
Sep 27 04:05:05 tornado kernel: 32303 pages of HIGHMEM
Sep 27 04:05:05 tornado kernel: 3160 reserved pages
Sep 27 04:05:06 tornado kernel: 165825 pages shared
Sep 27 04:05:06 tornado kernel: 0 pages swap cached
Sep 27 04:05:06 tornado kernel: 100 pages dirty
Sep 27 04:05:06 tornado kernel: 1 pages writeback
Sep 27 04:05:06 tornado kernel: 88341 pages mapped
Sep 27 04:05:06 tornado kernel: 54546 pages slab
Sep 27 04:05:06 tornado kernel: 776 pages pagetables
reuben
Reuben Farrelly <[email protected]> wrote:
>
> On 22/09/2005 5:28 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/
> >
> > - Added git tree `git-sas.patch': Luben Tuikov's SAS driver and its support.
> >
> > - Various random other things - nothing major.
>
> Just noticed this oops from about 4am this morning. This would have been
> at about the time when the normal daily cronjobs run, but the machine
> shouldn't have been doing much else.
>
>
> Sep 27 04:04:28 tornado kernel: smbd: page allocation failure. order:1,
> mode:0x80000020
> Sep 27 04:04:28 tornado kernel: [<c0103ad0>] dump_stack+0x17/0x19
> Sep 27 04:04:28 tornado kernel: [<c013f84a>] __alloc_pages+0x2d8/0x3ef
> Sep 27 04:04:28 tornado kernel: [<c0142b32>] kmem_getpages+0x2c/0x91
> Sep 27 04:04:28 tornado kernel: [<c0144136>] cache_grow+0xa2/0x1aa
> Sep 27 04:04:28 tornado kernel: [<c0144810>] cache_alloc_refill+0x279/0x2bb
> Sep 27 04:04:28 tornado kernel: [<c0144da9>] __kmalloc+0xc7/0xe7
> Sep 27 04:04:28 tornado kernel: [<c02ab386>] pskb_expand_head+0x4b/0x11a
> Sep 27 04:04:28 tornado kernel: [<c02afd34>] skb_checksum_help+0xcb/0xe5
> Sep 27 04:04:28 tornado kernel: [<c0302b0d>] ip_nat_fn+0x16d/0x1bf
> Sep 27 04:04:28 tornado kernel: [<c0302cdc>] ip_nat_local_fn+0x57/0x8d
> Sep 27 04:04:28 tornado kernel: [<c03068ef>] nf_iterate+0x59/0x7d
> Sep 27 04:04:28 tornado kernel: [<c030695d>] nf_hook_slow+0x4a/0x109
> Sep 27 04:04:28 tornado kernel: [<c02ca035>] ip_queue_xmit+0x23c/0x4f5
> Sep 27 04:04:28 tornado kernel: [<c02da477>] tcp_transmit_skb+0x3ce/0x713
> Sep 27 04:04:29 tornado kernel: [<c02db53b>] tcp_write_xmit+0x124/0x37b
> Sep 27 04:04:29 tornado kernel: [<c02db7b3>] __tcp_push_pending_frames+0x21/0x70
> Sep 27 04:04:29 tornado kernel: [<c02d0b45>] tcp_sendmsg+0x9cc/0xabc
> Sep 27 04:04:29 tornado kernel: [<c02ed3dd>] inet_sendmsg+0x2e/0x4c
> Sep 27 04:04:29 tornado kernel: [<c02a6691>] sock_sendmsg+0xbf/0xe3
> Sep 27 04:04:29 tornado kernel: [<c02a77be>] sys_sendto+0xa5/0xbe
No, this is simply a warning - the kernel ran out of order-1 pages in the
page allocator. There have been several reports of this after
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch was merged,
which was rather expected.
I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
which address fragmentation at this level. If that code gets there then we
can take another look at
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
> No, this is simply a warning - the kernel ran out of order-1 pages in the
> page allocator. There have been several reports of this after
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch was merged,
> which was rather expected.
>
> I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
> which address fragmentation at this level. If that code gets there then we
> can take another look at
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
Me no understand. We're going to deliberately cause fragmentation in order
to defragment it again later ???
M.
> It seems from the log messages that quite a few pages are hanging around in the cpu's cold pcp list even under low memory conditions. Below is a patch to reduce the upper bound on the cold pcp list (...this got increased by my previous change).
>
> I think we should also drain the CPU's hot and cold pcps for GFP_KERNEL page requests (in the event the higher-order request cannot otherwise be serviced). This will still only drain the current CPU's pcps in an MP environment (leaving the other CPUs' lists intact). I will send that patch later today.
>
> [PATCH]: Reduce the high mark in cpu's cold pcp list.
>
> Signed-off-by: Rohit Seth <[email protected]>
>
>
> --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
> +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
> @@ -1749,7 +1749,7 @@
> pcp = &p->pcp[1]; /* cold*/
> pcp->count = 0;
> pcp->low = 0;
> - pcp->high = 2 * batch;
> + pcp->high = batch / 2;
> pcp->batch = max(1UL, batch/2);
> INIT_LIST_HEAD(&pcp->list);
> }
> -
I don't understand. How can you set the high watermark at half the batch
size? Makes no sense to me.
And can you give a stricter definition of what you mean by "low memory
conditions"? I agree we ought to empty the lists before going OOM or
anything, but not at the slightest feather of pressure ... the answer
lies somewhere in between ... but where?
M.
On Tue, 2005-09-27 at 11:57 -0700, Martin J. Bligh wrote:
> > It seems from the log messages that quite a few pages are hanging
> > around in the cpu's cold pcp list even under low memory conditions.
> > Below is a patch to reduce the upper bound on the cold pcp list
> > (...this got increased by my previous change).
> >
> > I think we should also drain the CPU's hot and cold pcps for
> > GFP_KERNEL page requests (in the event the higher-order request
> > cannot otherwise be serviced). This will still only drain the
> > current CPU's pcps in an MP environment (leaving the other CPUs'
> > lists intact). I will send that patch later today.
>
> >
> > [PATCH]: Reduce the high mark in cpu's cold pcp list.
> >
> > Signed-off-by: Rohit Seth <[email protected]>
> >
> >
> > --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
> > +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
> > @@ -1749,7 +1749,7 @@
> > pcp = &p->pcp[1]; /* cold*/
> > pcp->count = 0;
> > pcp->low = 0;
> > - pcp->high = 2 * batch;
> > + pcp->high = batch / 2;
> > pcp->batch = max(1UL, batch/2);
> > INIT_LIST_HEAD(&pcp->list);
> > }
> > -
>
> I don't understand. How can you set the high watermark at half the
> batch size? Makes no sense to me.
>
The batch size for the cold pcp list is initialized to batch/2 in the
code snippet above. So, this change sets the high water mark for the
cold list to the same value as that pcp's batch number.
> And can you give a stricter definition of what you mean by "low memory
> conditions"? I agree we ought to empty the lists before going OOM or
> anything, but not at the slightest feather of pressure ... the answer
> lies somewhere in between ... but where?
>
In the specific case of the dump information that Mattia sent earlier,
there is only 4M of free memory available at the time the order-1
request is failing.
In general, I think if a specific higher-order ( > 0) request that has
GFP_KERNEL set fails, then at least we should drain the pcps.
-rohit
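A drain of the kind proposed here would look roughly like the helper
the kernel already uses for suspend and CPU hot-unplug. The sketch
below is a simplified reconstruction modeled on the 2.6-era
__drain_pages(), with names and signatures from memory rather than
verbatim:

static void drain_cpu_pcps(unsigned int cpu)
{
	struct zone *zone;
	int i;

	/* caller must disable local interrupts so the lists aren't
	 * refilled underneath us, as drain_local_pages() does */
	for_each_zone(zone) {
		struct per_cpu_pageset *pset = &zone->pageset[cpu];

		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {	/* hot, cold */
			struct per_cpu_pages *pcp = &pset->pcp[i];

			/* give every cached page back to the buddy
			 * freelists as order-0 frees */
			pcp->count -= free_pages_bulk(zone, pcp->count,
						      &pcp->list, 0);
		}
	}
}

Freed order-0 buddies can then coalesce back into the higher-order
blocks the failing request wanted.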
>> > --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
>> > +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
>> > @@ -1749,7 +1749,7 @@
>> > pcp = &p->pcp[1]; /* cold*/
>> > pcp->count = 0;
>> > pcp->low = 0;
>> > - pcp->high = 2 * batch;
>> > + pcp->high = batch / 2;
>> > pcp->batch = max(1UL, batch/2);
>> > INIT_LIST_HEAD(&pcp->list);
>> > }
>> > -
>>
>> I don't understand. How can you set the high watermark at half the
>> batch size? Makes no sense to me.
>>
>
> The batch size for the cold pcp list is initialized to batch/2 in the
> code snippet above. So, this change sets the high water mark for the
> cold list to the same value as that pcp's batch number.
I must be being particularly dense today ... but:
pcp->high = batch / 2;
Looks like half the batch size to me, not the same?
>> And can you give a stricter definition of what you mean by "low memory
>> conditions"? I agree we ought to empty the lists before going OOM or
>> anything, but not at the slightest feather of pressure ... the answer
>> lies somewhere in between ... but where?
>>
>
> In the specific case of the dump information that Mattia sent earlier,
> there is only 4M of free memory available at the time the order-1
> request is failing.
>
> In general, I think if a specific higher-order ( > 0) request that has
> GFP_KERNEL set fails, then at least we should drain the pcps.
Mmmm. So every time we fork a process with 8K stacks, or allocate a frame
for jumbo ethernet, or NFS, you want to drain the lists? That seems to
wholly defeat the purpose.
Could you elaborate on what the benefits were from this change in the
first place? Some page colouring thing on ia64? It seems to have way more
downside than upside to me.
M.
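For concreteness, the fork case mentioned above comes down to an
order-1 GFP_KERNEL allocation - two physically contiguous pages for an
8K kernel stack. An illustrative sketch of a request of that shape (not
the actual fork path):

#include <linux/gfp.h>
#include <linux/errno.h>

/* Illustrative only: an order-1 request needs two physically
 * contiguous pages, exactly what a fragmented buddy allocator can
 * fail to find even with plenty of order-0 pages free. */
static int alloc_8k_stack_sketch(struct page **out)
{
	struct page *pages = alloc_pages(GFP_KERNEL, 1);  /* order 1 = 8K */

	if (pages == NULL)
		return -ENOMEM;
	*out = pages;
	return 0;
}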
On Tue, 2005-09-27 at 14:18 -0700, Martin J. Bligh wrote:
> >> > --- linux-2.6.13.old/mm/page_alloc.c 2005-09-26 10:57:07.000000000 -0700
> >> > +++ linux-2.6.13.work/mm/page_alloc.c 2005-09-26 10:47:57.000000000 -0700
> >> > @@ -1749,7 +1749,7 @@
> >> > pcp = &p->pcp[1]; /* cold*/
> >> > pcp->count = 0;
> >> > pcp->low = 0;
> >> > - pcp->high = 2 * batch;
> >> > + pcp->high = batch / 2;
> >> > pcp->batch = max(1UL, batch/2);
> >> > INIT_LIST_HEAD(&pcp->list);
> >> > }
> >> > -
> >>
> >> I don't understand. How can you set the high watermark at half the
> >> batch size? Makes no sense to me.
> >>
> >
> > The batch size for the cold pcp list is initialized to batch/2 in the
> > code snippet above. So, this change sets the high water mark for the
> > cold list to the same value as that pcp's batch number.
>
> I must be being particularly dense today ... but:
>
> pcp->high = batch / 2;
>
> Looks like half the batch size to me, not the same?
pcp->batch = max(1UL, batch/2); is the line of code that sets the
batch value for the cold pcp list. batch is just a number that we
computed earlier based on some parameters - so with batch == 32, the
cold list's own batch is 16, and the patched high mark of batch / 2 is
16 as well: equal to the cold list's batch, not half of it.
>
> >> And can you give a stricter definition of what you mean by "low memory
> >> conditions"? I agree we ought to empty the lists before going OOM or
> >> anything, but not at the slightest feather of pressure ... the answer
> >> lies somewhere in between ... but where?
> >>
> >
> > In the specific case of the dump information that Mattia sent earlier,
> > there is only 4M of free memory available at the time the order-1
> > request is failing.
> >
> > In general, I think if a specific higher-order ( > 0) request that has
> > GFP_KERNEL set fails, then at least we should drain the pcps.
>
> Mmmm. So every time we fork a process with 8K stacks, or allocate a frame
> for jumbo ethernet, or NFS, you want to drain the lists? That seems to
> wholly defeat the purpose.
>
Not every time there is a request for higher-order pages - that surely
would defeat the purpose of pcps. But my suggestion is only to drain
when the global pool is not able to service the request. In the
pathological case where higher-order and zero-order requests alternate,
you could get thrashing, with pages moving to the pcp only to move
straight back to the global list.
> Could you elaborate on what the benefits were from this change in the
> first place? Some page colouring thing on ia64? It seems to have way more
> downside than upside to me.
The original change was to try to allocate a higher order page to
service a batch size bulk request. This was with the hope that better
physical contiguity will spread the data better across big caches.
-rohit
>> I must be being particularly dense today ... but:
>>
>> pcp->high = batch / 2;
>>
>> Looks like half the batch size to me, not the same?
>
> pcp->batch = max(1UL, batch/2); is the line of code that sets the
> batch value for the cold pcp list. batch is just a number that we
> computed earlier based on some parameters.
Ah, OK, so I am being dense. Fair enough. But if there's a reason to do
that max, perhaps:
pcp->batch = max(1UL, batch/2);
pcp->high = pcp->batch;
would be more appropriate? The tradeoff is more frequent dump/fill
against better fragmentation, I suppose (at least if we don't refill
using higher-order allocs ;-)), which seems fair enough.
>> > In general, I think if a specific higher-order ( > 0) request that has
>> > GFP_KERNEL set fails, then at least we should drain the pcps.
>>
>> Mmmm. So every time we fork a process with 8K stacks, or allocate a frame
>> for jumbo ethernet, or NFS, you want to drain the lists? That seems to
>> wholly defeat the purpose.
>
> Not every time there is a request for higher-order pages - that surely
> would defeat the purpose of pcps. But my suggestion is only to drain
> when the global pool is not able to service the request. In the
> pathological case where higher-order and zero-order requests alternate,
> you could get thrashing, with pages moving to the pcp only to move
> straight back to the global list.
OK, seems fair enough. But there are multiple "harder and harder" attempts
within __alloc_pages to do that ... which one are you going for? Just
before we OOM / fail the alloc? That'd be hard to argue with, though I'm
unsure what the locking is to dump out other CPUs' queues - are you going
to do a global IPI and ask them to do it? That'd seem to cause a race to
refill (as you mention).
>> Could you elaborate on what the benefits were from this change in the
>> first place? Some page colouring thing on ia64? It seems to have way more
>> downside than upside to me.
>
> The original change was to try to allocate a higher order page to
> service a batch size bulk request. This was with the hope that better
> physical contiguity will spread the data better across big caches.
OK ... but it has an impact on fragmentation. How much benefit are you
getting?
M.
On Tue, 2005-09-27 at 14:59 -0700, Martin J. Bligh wrote:
> pcp->batch = max(1UL, batch/2);
> pcp->high = pcp->batch;
>
> would be more appropriate? The tradeoff is more frequent dump/fill
> against better fragmentation, I suppose (at least if we don't refill
> using higher-order allocs ;-)), which seems fair enough.
>
There are a couple of small changes, including this one, that I will be
sending out for this initialization routine.
> >
> > Not every time there is a request for higher-order pages - that surely
> > would defeat the purpose of pcps. But my suggestion is only to drain
> > when the global pool is not able to service the request. In the
> > pathological case where higher-order and zero-order requests alternate,
> > you could get thrashing, with pages moving to the pcp only to move
> > straight back to the global list.
>
> OK, seems fair enough. But there are multiple "harder and harder" attempts
> within __alloc_pages to do that ... which one are you going for? Just
> before we OOM / fail the alloc? That'd be hard to argue with, though I'm
> unsure what the locking is to dump out other CPUs' queues - are you going
> to do a global IPI and ask them to do it? That'd seem to cause a race to
> refill (as you mention).
>
I'm thinking of initiating this drain operation after the swapper daemon
is woken up. Hopefully that will allow other pages to be put back on the
freelist and reduce the possible thrashing of pages between the free
memory pool and the pcps.
As a first step, I will be draining the local cpu's pcp. IPI or lazy
purging of pcps could be used as a very last resort to drain other
CPUs' pcps in the scenarios where nothing else has worked to get more
pages. For these extreme low memory conditions I'm not sure we should
worry about thrashing any more than about having free pages lying
around and not getting used.
> >> Could you elaborate on what the benefits were from this change in the
> >> first place? Some page colouring thing on ia64? It seems to have way more
> >> downside than upside to me.
> >
> > The original change was to try to allocate a higher order page to
> > service a batch size bulk request. This was with the hope that better
> > physical contiguity will spread the data better across big caches.
>
> OK ... but it has an impact on fragmentation. How much benefit are you
> getting?
>
Benefit is in terms of reduced performance variation (and expected
throughput) of certain workloads from run to run on the same kernel.
-rohit
>> > Not every time there is a request for higher-order pages - that surely
>> > would defeat the purpose of pcps. But my suggestion is only to drain
>> > when the global pool is not able to service the request. In the
>> > pathological case where higher-order and zero-order requests alternate,
>> > you could get thrashing, with pages moving to the pcp only to move
>> > straight back to the global list.
>>
>> OK, seems fair enough. But there are multiple "harder and harder" attempts
>> within __alloc_pages to do that ... which one are you going for? Just
>> before we OOM / fail the alloc? That'd be hard to argue with, though I'm
>> unsure what the locking is to dump out other CPUs' queues - are you going
>> to do a global IPI and ask them to do it? That'd seem to cause a race to
>> refill (as you mention).
>>
>
> I'm thinking of initiating this drain operation after the swapper daemon
> is woken up. Hopefully that will allow other pages to be put back on the
> freelist and reduce the possible thrashing of pages between the free
> memory pool and the pcps.
OK, but waking up kswapd doesn't indicate a low memory condition.
It's standard procedure .... we'll have to wake it up whenever we dip
below the high watermarks. Perhaps before dropping into direct reclaim
would be more appropriate?
> As a first step, I will be draining the local cpu's pcp. IPI or lazy
> purging of pcps could be used as a very last resort to drain other
> CPUs' pcps in the scenarios where nothing else has worked to get more
> pages. For these extreme low memory conditions I'm not sure we should
> worry about thrashing any more than about having free pages lying
> around and not getting used.
Sounds fair.
>> >> Could you elaborate on what the benefits were from this change in the
>> >> first place? Some page colouring thing on ia64? It seems to have way more
>> >> downside than upside to me.
>> >
>> > The original change was to try to allocate a higher order page to
>> > service a batch size bulk request. This was with the hope that better
>> > physical contiguity will spread the data better across big caches.
>>
>> OK ... but it has an impact on fragmentation. How much benefit are you
>> getting?
>
> Benefit is in terms of reduced performance variation (and expected
> throughput) of certain workloads from run to run on the same kernel.
Mmmm. How much are you talking about in terms of throughput, and on what
platforms? All previous attempts to measure page colouring seemed to
indicate it did nothing at all - maybe some specific types of h/w are
more susceptible?
M.
On Tue, 2005-09-27 at 15:49 -0700, Martin J. Bligh wrote:
> >
> > I'm thinking of initiating this drain operation after the swapper daemon
> > is woken up. Hopefully that will allow other pages to be put back on the
> > freelist and reduce the possible thrashing of pages between the free
> > memory pool and the pcps.
>
> OK, but waking up kswapd doesn't indicate a low memory condition.
> It's standard procedure .... we'll have to wake it up whenever we dip
> below the high watermarks. Perhaps before dropping into direct reclaim
> would be more appropriate?
>
Agreed. That is a better place.
> >> >> Could you elaborate on what the benefits were from this change in the
> >> >> first place? Some page colouring thing on ia64? It seems to have way more
> >> >> downside than upside to me.
> >> >
> >> > The original change was to try to allocate a higher order page to
> >> > service a batch size bulk request. This was with the hope that better
> >> > physical contiguity will spread the data better across big caches.
> >>
> >> OK ... but it has an impact on fragmentation. How much benefit are you
> >> getting?
> >
> > Benefit is in terms of reduced performance variation (and expected
> > throughput) of certain workloads from run to run on the same kernel.
>
> Mmmm. How much are you talking about in terms of throughput, and on what
> platforms? All previous attempts to measure page colouring seemed to
> indicate it did nothing at all - maybe some specific types of h/w are
> more susceptible?
>
In terms of percentages, between 10-15% variation. Nothing out of the
ordinary about the platforms. Do you remember what workloads were run in
the previous attempts to see if there was any colouring effect? I agree
that with 2.6.x-based kernels there is a better handle on the variation
(as compared to 2.4), and the best results of 2.6 match the best results
of any colouring patch.
-rohit
On Mon, Sep 26, 2005 at 09:14:02AM +0200, Tim Schmielau wrote:
> On Sun, 25 Sep 2005, Paul Blazejowski wrote:
>
> > Upon quick testing the latest mm kernel it appears there's some kind of
> > race condition when using a dual-core CPU, especially when using XORG
> > and a USB keyboard (although PS/2 has the same issue): the keyboard
> > rate becomes too fast.
>
> Does the following patch by John Stultz fix the problem?
>
> Tim
Tim,
No, it does not; from my understanding it only pertains to x86_64, but I
currently run an i386 SMP-enabled kernel on the dual-core X2 processor.
Also worth noting is that I do not see any failures or errors in dmesg
related to lost timers. Perhaps this is something new? I even ran a
script from the bugzilla and the output matched on both CPUs.
Thanks,
Paul
>
>
> From [email protected] Mon Sep 26 09:04:08 2005
> Date: Mon, 19 Sep 2005 12:16:43 -0700
> From: john stultz <[email protected]>
> To: Andrew Morton <[email protected]>
> Cc: lkml <[email protected]>, Andi Kleen <[email protected]>
> Subject: [PATCH] x86-64: Fix bad assumption that dualcore cpus have synced
> TSCs
>
> Andrew,
> This patch should resolve the issue seen in bugme bug #5105, where it
> is assumed that dualcore x86_64 systems have synced TSCs. This is not
> the case, and alternate timesources should be used instead.
>
> For more details, see:
> http://bugzilla.kernel.org/show_bug.cgi?id=5105
>
>
> Please consider for inclusion in your tree.
>
> thanks
> -john
>
> diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
> --- a/arch/x86_64/kernel/time.c
> +++ b/arch/x86_64/kernel/time.c
> @@ -959,9 +959,6 @@ static __init int unsynchronized_tsc(voi
> are handled in the OEM check above. */
> if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> return 0;
> - /* All in a single socket - should be synchronized */
> - if (cpus_weight(cpu_core_map[0]) == num_online_cpus())
> - return 0;
> #endif
> /* Assume multi socket systems are not synchronized */
> return num_online_cpus() > 1;
>
>
>
On Sun, Sep 25, 2005 at 04:44:21PM -0700, Andrew Morton wrote:
> Paul Blazejowski <[email protected]> wrote:
> >
> > Upon quick testing the latest mm kernel it appears there's some kind of
> > race condition when using a dual-core CPU, especially when using XORG
> > and a USB keyboard (although PS/2 has the same issue): the keyboard
> > rate becomes too fast.
> >
> > The same behaviour happens on the vanilla 2.6.13 kernel. I am reporting
> > this also to the XORG list in the hope of helping to debug this issue.
>
> Is it possible to narrow this down a bit further? Was 2.6.12 OK?
>
> If we can identify two reasonably-close-in-time versions either side of the
> regression then the next step would be to run `dmesg -s 1000000' under both
> kernel versions, then run `diff -u dmesg.good dmesg.bad'.
>
>
No, 2.6.12 is not OK. I don't think there's any regression between the
recent kernels; it just does not work on any of the 3 I have tried so far.
I am attaching diffs from 2.6.12/2.6.13 against 2.6.14-rc2-mm1.
On 9/27/05, Paul Blazejowski <[email protected]> wrote:
> No, 2.6.12 is not OK. I don't think there's any regression between the
> recent kernels; it just does not work on any of the 3 I have tried so far.
>
Another data point:
I'm unable to reproduce this on a PATA install - specifically, booting
on a PATA HD with sata_nv as a module. When booting on a SATA HD with
sata_nv compiled in, I get the race. Setting the irq 1 and 5 (keyboard
and libata) handlers to cpu0 affinity, and X's affinity to cpu0, solves
the problem.
I haven't had time to try booting SATA with sata_nv as a module in an
initrd.
--
Carlo J. Calica
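Carlo's affinity workaround can be applied from userspace by writing a
CPU bitmask to each IRQ's procfs affinity file. A minimal sketch (run
as root; the IRQ numbers 1 and 5 are from his report - check
/proc/interrupts on your own box):

#include <stdio.h>

static int pin_irq_to_cpu0(int irq)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (f == NULL) {
		perror(path);
		return -1;
	}
	fputs("1\n", f);	/* bitmask 0x1 = CPU0 only */
	fclose(f);
	return 0;
}

int main(void)
{
	/* keyboard (1) and libata (5), per the report above */
	return (pin_irq_to_cpu0(1) || pin_irq_to_cpu0(5)) ? 1 : 0;
}

Pinning the interrupt sources and X to the same core means timestamps
are always taken from one TSC, which is consistent with the
unsynchronized-TSC theory discussed earlier in the thread.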
Martin, responding to Andrew:
> > I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
> > which address fragmentation at this level. If that code gets there then we
> > can take another look at
> > mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
>
> Me no understand. We're going to deliberately cause fragmentation in order
> to defragment it again later ???
I thought that the patches of Mel Gorman and Joel Schopp were reducing
fragmentation, not causing it.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
--Paul Jackson <[email protected]> wrote (on Sunday, October 02, 2005 10:13:19 -0700):
> Martin, responding to Andrew:
>> > I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
>> > which address fragmentation at this level. If that code gets there then we
>> > can take another look at
>> > mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
>>
>> Me no understand. We're going to deliberately cause fragmentation in order
>> to defragment it again later ???
>
> I thought that the patches of Mel Gorman and Joel Schopp were reducing
> fragmentation, not causing it.
They were, but mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk
seems to be going in the opposite direction.
M.
On Sun, 2005-10-02 at 14:31 -0700, Martin J. Bligh wrote:
>
> --Paul Jackson <[email protected]> wrote (on Sunday, October 02, 2005 10:13:19 -0700):
>
> > Martin, responding to Andrew:
> >> > I've dropped that patch. Joel Schopp is working on Mel Gorman's patches
> >> > which address fragmentation at this level. If that code gets there then we
> >> > can take another look at
> >> > mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch.
> >>
> >> Me no understand. We're going to deliberately cause fragmentation in order
> >> to defragment it again later ???
> >
> > I thought that the patches of Mel Gorman and Joel Schopp were reducing
> > fragmentation, not causing it.
>
> They were, but mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk
> seems to be going in the opposite direction.
The mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk patch tries to
allocate more physically contiguous pages for the pcp. This causes some
extra fragmentation at the higher orders, but has the potential benefit
of spreading data more uniformly across caches. I agree, though, that for
this scheme to work nicely we should have the capability of draining the
pcps so that higher-order requests can be serviced whenever possible.
-rohit
> The mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk patch tries to
> allocate more physically contiguous pages for the pcp. This causes some
> extra fragmentation at the higher orders, but has the potential benefit
> of spreading data more uniformly across caches. I agree, though, that for
> this scheme to work nicely we should have the capability of draining the
> pcps so that higher-order requests can be serviced whenever possible.
Unfortunately, I don't think it's that simple. We'll end up taking the
higher-order elements from the buddy into the caches, and using them
all piecemeal - i.e. fragmenting it all.
If we take lists of 0-order pages from the buddy, we're trying to use
up whatever dross was left over in there (from a fragmentation point of
view) first, before breaking into the more precious stuff (the
physically contiguous bits). That was why I wrote it that way in the
first place - it wasn't accidental ;-)
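The behaviour being defended looks roughly like this - a simplified
reconstruction of the stock bulk refill, not verbatim kernel code:

static int rmqueue_bulk_sketch(struct zone *zone, unsigned long count,
			       struct list_head *list)
{
	unsigned long flags;
	unsigned long allocated = 0;
	struct page *page;

	spin_lock_irqsave(&zone->lock, flags);
	while (allocated < count) {
		/* order 0: consumes leftover singleton pages first;
		 * the buddy only splits a larger block when the
		 * order-0 freelist is empty */
		page = __rmqueue(zone, 0);
		if (page == NULL)
			break;
		list_add_tail(&page->lru, list);
		allocated++;
	}
	spin_unlock_irqrestore(&zone->lock, flags);
	return allocated;
}

Each page comes off the lowest-order freelist that can satisfy it, so
the "dross" is used up before contiguous blocks get broken apart.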
From the direction the thread was going in previously, it sounded like
you were finding other ways to alleviate the colouring issue you were
seeing ... I was hoping that would fix things up enough that the desire
for higher-order allocations would disappear.
To be blunt about it ... making sure that we don't fall over on higher
order allocs seems to me to be more important than a bit of variability
in benchmark runs ...
M.