2005-02-10 10:39:54

by Andrew Morton

Subject: 2.6.11-rc3-mm2



ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/


- Added the mlock and !SCHED_OTHER Linux Security Module (rt-lsm.patch) for
the audio guys. It seems that nothing else is going to come along, and this
one is completely encapsulated.

- Various other stuff. If anyone has a patch in here which they think
should be in 2.6.11, please let me know. I'm intending to merge the
following into 2.6.11:

alpha-add-missing-dma_mapping_error.patch
fix-compat-shmget-overflow.patch
fix-shmget-for-ppc64-s390-64-sparc64.patch
binfmt_elf-clearing-bss-may-fail.patch
qlogic-warning-fixes.patch
oprofile-exittext-referenced-in-inittext.patch
force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch
oprofile-arm-xscale1-pmu-support-fix.patch




Changes since 2.6.11-rc3-mm1:


linus.patch
bk-agpgart.patch
bk-alsa.patch
bk-arm.patch
bk-cifs.patch
bk-cpufreq.patch
bk-drm-via.patch
bk-i2c.patch
bk-ieee1394.patch
bk-input.patch
bk-dtor-input.patch
bk-jfs.patch
bk-kbuild.patch
bk-kconfig.patch
bk-netdev.patch
bk-ntfs.patch
bk-scsi.patch
bk-scsi-rc-fixes.patch
bk-serial.patch
bk-usb.patch
bk-watchdog.patch
bk-xfs.patch

External bk trees.

-fix-an-error-in-proc-slabinfo-print.patch
-ibmveth-inlining-failure.patch
-fix-devfs-name-for-the-hvcs-driver.patch
-uml-compile-fixes.patch
-include-jiffies-fix-usecs_to_jiffies-jiffies_to_usecs-math.patch
-credits-update.patch
-nfsd-needs-exportfs.patch
-input-make-mousedevc-report-all-events-to-user-space-immediately.patch
-input-enable-hardware-tapping-for-alps-touchpads.patch
-input-fix-pointer-jumps-to-corner-of-screen-problem-on-alps-glidepoint-touchpads.patch
-input-add-support-for-synaptics-touchpad-scroll-wheels.patch
-driver-model-fix-types-in-usb.patch
-kswapd-throttling-fix.patch
-task_size-is-variable.patch
-use-mm_vm_size-in-exit_mmap.patch
-ppc64-correct-return-code-in-syscall-auditing.patch
-ppc64-show-1-for-physical_id-of-non-present-cpus.patch
-ppc64-replace-last-usage-of-vio-dma-mapping-routines.patch
-speedstep-libc-fix-frequency-multiplier-for-pentium4.patch
-x86_64-parse-noexec=.patch
-force-feedback-support-for-uinput.patch
-pcmcia-dc-initialisation-fix.patch
-scsi-megaraid_mmc-make-some-code-static.patch
-add-map_populate-sys_remap_file_pages-support-to-xfs.patch
-acpi-call-acpi_leave_sleep_state-before-resuming-devices.patch
-small-partitions-msdos-cleanups.patch
-scsi-sim710c-make-some-code-static.patch

Merged

+alpha-add-missing-dma_mapping_error.patch

Alpha build fix

+fix-compat-shmget-overflow.patch
+fix-shmget-for-ppc64-s390-64-sparc64.patch

Various compat code sign extension fixes

+binfmt_elf-clearing-bss-may-fail.patch

Fix the weird elf loading failure

+qlogic-warning-fixes.patch

smp_processor_id() warnings

+oprofile-exittext-referenced-in-inittext.patch

oprofile section fix

+force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch

Partly fix up the x86_64 noexec mapping problem.

+oprofile-arm-xscale1-pmu-support-fix.patch

oprofile fix

+nfsd--sgi-921857-find-broken-with-nohide-on-nfsv3.patch
+nfsd--exportfs-reduce-stack-usage.patch
+nfsd--svcrpc-add-a-per-flavor-set_client-method.patch
+nfsd--svcrpc-rename-pg_authenticate.patch
+nfsd--svcrpc-move-export-table-checks-to-a-per-program-pg_add_client-method.patch
+nfsd--nfs4-use-new-pg_set_client-method-to-simplify-nfs4-callback-authentication.patch
+nfsd--lockd-dont-try-to-match-callback-requests-against-export-table.patch
+nfsd--nfsd-remove-pg_authenticate-field.patch
+nfsd--global-static-cleanups-for-nfsd.patch
+nfsd--change-nfsd-reply-cache-to-use-listh-lists.patch

nfsd update

+acpi-fix-containers-notify-handler-to-handle-proper-cases-properly.patch
+acpi_power_off-bug-fix.patch

ACPI fixes

+update-to-ipmi-driver-to-support-old-dmi-spec.patch
+add-the-ipmi-smbus-driver-fix-fix.patch

IPMI fixes

+ohci1394-dma_pool_destroy-while-in_atomic-irqs_disabled.patch

1394 fix

+sbp2-fix-hang-on-unload.patch

Another 1394 fix

+serio-warning-fix.patch
+twidjoy-build-fix.patch

input code fixes

+compat-ioctl-for-submiting-urb.patch
+compat-ioctl-for-submiting-urb-fix.patch

Add compatibility emulation for the USB URB-direct-submission code.

+swapspace-layout-improvements-fix.patch

Fix swapspace-layout-improvements.patch

+fix-small-vmalloc-per-allocation-limit.patch

Make monster vmalloc()s work

+randomisation-global-sysctl-fix.patch

Fix randomisation-global-sysctl.patch

+fix-compilation-of-uml-after-the-stack-randomization-patches.patch

Fix randomisation-infrastructure.patch

+invalidate-range-of-pages-after-direct-io-write-fix-fix.patch
+write-and-wait-on-range-before-direct-io-read.patch
+only-unmap-what-intersects-a-direct_io-op.patch

Various optimisations for page unmapping, mainly related to direct-IO.

+net-s2io-replace-schedule_timeout-with-msleep.patch

cleanup

+ppc-ppc64-abstract-cpu_feature-checks.patch
+ppc32-dont-create-tmp_gas_check.patch
+ppc32-fix-mv64x60-register-relocation-bug-in-bootwrapper.patch

ppc32 updates

+ppc64-remove-unneeded-includes-from-pseries_nvramc.patch
+ppc64-collect-and-export-low-level-cpu-usage-statistics.patch
+ppc64-defconfig-updates.patch
+ppc64-distribute-export_symbols.patch
+ppc64-disable-hmt-for-rs64-cpus.patch
+use-vmlinux-during-make-install-on-ppc64.patch
+ppc64-functions-to-reserve-performance-monitor-hardware.patch

ppc64 updates

+mips-add-tanbac-tb0219-base-board-driver.patch

mips board driver

+refactor-i386-memory-setup.patch
+consolidate-set_max_mapnr_init-implementations.patch
+remove-free_all_bootmem-define.patch

x86 mm cleanups

+out-of-line-x86-put_user-implementation.patch

move x86 put_user() out of line.

+x86_64-hugetlb-fix.patch

hugepage fix

+swsusp-do-not-use-higher-order-memory-allocations-on-suspend-fix.patch
+swsusp-do-not-use-higher-order-memory-allocations-on-suspend-fix-fix.patch

Fix swsusp-do-not-use-higher-order-memory-allocations-on-suspend.patch

+fix-partial-sysrq-setting.patch

Fix a -mm-only sysrq patch

+touch_softlockup_watchdog.patch
+fix-softlockup-warning-in-swsuspend-resume.patch

Enhancements to detect-soft-lockups.patch

+add-struct-request-end_io-callback-fix.patch

Fix add-struct-request-end_io-callback.patch

+add-compiler-gcc4h.patch

cleanup

+rt-lsm.patch

realtime+mlock LSM

+convert-proc-driver-rtc-to-seq_file.patch

RTC driver /proc seqfile conversion

+drivers-char-lpc-race-fix.patch

Fix obscure line printer driver race

+clean-up-and-unify-asm-resourceh-files.patch

Code cleanup

-base-small-shrink-major_names-hash.patch

This broke stuff

+sort-fix.patch

Make the new sort function place things in sorted order.

-inotify.patch
-inotify-fix_find_inode.patch

I think my version is old, and it oopses.

+pcmcia-add-support-ti-pci4510-cardbus-bridge.patch
+pcmcia-update-vrc4171_card.patch

PCMCIA device updates

+nfsv4-deamon-always-supports-acls.patch

NFS ACL Kconfig fix

+add-do_proc_doulonglongvec_minmax-to-sysctl-functions.patch
+add-sysctl-interface-to-sched_domain-parameters.patch

We found the cause of the weird ia64 oops (extable sorting was wrong), so
bring these back.

+crashdump-routines-for-copying-dump-pages-fixes.patch
+crashdump-linear-raw-format-dump-file-access-coding-style.patch

crashdump coding cleanups

+radeonfb-fix-spurious-error-return-in-fbio_radeon_set_mirror.patch
+w100fb-make-blanking-function-interrupt-safe.patch
+kyrofb-copy__user-return-value-checks-added-to-kyro-fb.patch
+skeletonfb-documentation-fixes.patch
+intelfb-add-partial-support-915g-chipset.patch
+sisfb_compat_ioctl-warning-fix.patch
+sis-warning-fix.patch
+tridentfb-warning-fix.patch

fbdev updates and fixes

-raid5-overlapping-read-hack.patch

This got fixed for real

+md-fix-multipath-assembly-bug.patch
+md-raid-kconfig-cleanups-remove-experimental-tag-from-raid-6.patch

MD updates

+device-mapper-store-name-directly-against-device.patch
+device-mapper-record-restore-bio-state.patch
+device-mapper-export-map_info.patch

DM updates

+update-documentation-filesystems-locking.patch

Update the VFS locking doc for quotas

+hpet-setup-comment-fix.patch
+fs-ncpfs-ncplib_kernelc-make-a-function-static.patch
+kill-iphase5526.patch
+fs-nfs-make-some-code-static.patch
+i386-x86_64-acpi-sleepc-kill-unused-acpi_save_state_disk.patch
+smpbootc-cleanups.patch
+i386-kernel-i387c-misc-cleanups.patch
+i386-x86_64-i8259c-make-mask_and_ack_8259a-static.patch
+scsi-sym53c416c-make-a-function-static.patch
+scsi-ultrastorc-make-a-variable-static.patch
+tridentfbc-make-some-code-static.patch
+kernel-intermodulec-make-inter_module_get-static.patch

Various nanofixes
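
For anyone scripting against the list above: a line starting with "+" names a
patch added since 2.6.11-rc3-mm1, and a line starting with "-" names one that
was dropped (merged upstream or removed). Below is a minimal Python sketch of
splitting such a list. The function name split_changes and the sample input are
illustrative assumptions, not part of any -mm tooling.

```python
def split_changes(text):
    """Split changelog text into (added, dropped) lists of patch file names.

    Assumes the convention used in this announcement: '+name.patch' is an
    added patch, '-name.patch' is a dropped one, and anything else is
    commentary or an unmarked entry.
    """
    added, dropped = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.endswith(".patch"):
            continue  # free-text description or blank line
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            dropped.append(line[1:])
    return added, dropped


# Sample lines copied from the change list in this announcement.
sample = """\
+sort-fix.patch

Make the new sort function place things in sorted order.

-inotify.patch
-inotify-fix_find_inode.patch
"""

added, dropped = split_changes(sample)
print("added:", added)
print("dropped:", dropped)
```

Unmarked names such as linus.patch or the bk-*.patch entries are skipped by
this sketch, since they carry neither marker.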



number of patches in -mm: 585
number of changesets in external trees: 634
number of patches in -mm only: 564
total patches: 1198
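
The last figure appears to be the sum of the -mm-only patches and the
changesets in the external trees. That reading is an assumption (the
announcement does not spell it out), but the arithmetic is easy to check:

```python
# Figures copied from the summary above.
mm_only_patches = 564
external_changesets = 634
total_patches = 1198

# Assumption: "total patches" = -mm-only patches + external-tree changesets.
print(mm_only_patches + external_changesets == total_patches)
```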




All 585 patches:



linus.patch

alpha-add-missing-dma_mapping_error.patch
alpha: add missing dma_mapping_error

fix-compat-shmget-overflow.patch
Fix compat shmget overflow

fix-shmget-for-ppc64-s390-64-sparc64.patch
Fix shmget for ppc64, s390-64 & sparc64.

binfmt_elf-clearing-bss-may-fail.patch
binfmt_elf: clearing bss may fail

qlogic-warning-fixes.patch
more qlogic smp_processor_id() warning fixes

oprofile-exittext-referenced-in-inittext.patch
OProfile: exit.text referenced in init.text

force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch
Force read implies exec for all 32bit processes in x86-64

oprofile-arm-xscale1-pmu-support-fix.patch
OProfile: ARM/XScale1 PMU support fix

nfsd--sgi-921857-find-broken-with-nohide-on-nfsv3.patch
SGI 921857: find broken with nohide on NFSv3

nfsd--exportfs-reduce-stack-usage.patch
nfsd: exportfs: reduce stack usage

nfsd--svcrpc-add-a-per-flavor-set_client-method.patch
nfsd: svcrpc: add a per-flavor set_client method

nfsd--svcrpc-rename-pg_authenticate.patch
nfsd: svcrpc: rename pg_authenticate

nfsd--svcrpc-move-export-table-checks-to-a-per-program-pg_add_client-method.patch
nfsd: svcrpc: move export table checks to a per-program pg_add_client method

nfsd--nfs4-use-new-pg_set_client-method-to-simplify-nfs4-callback-authentication.patch
nfsd: nfs4: use new pg_set_client method to simplify nfs4 callback authentication

nfsd--lockd-dont-try-to-match-callback-requests-against-export-table.patch
nfsd: lockd: don't try to match callback requests against export table

nfsd--nfsd-remove-pg_authenticate-field.patch
nfsd: nfsd: remove pg_authenticate field

nfsd--global-static-cleanups-for-nfsd.patch
nfsd: global/static cleanups for nfsd

nfsd--change-nfsd-reply-cache-to-use-listh-lists.patch
nfsd: change nfsd reply cache to use list.h lists

ia64-config_apci_numa-fix.patch
ia64 CONFIG_APCI_NUMA fix

ia64-acpi-build-fix.patch
ia64 acpi build fix

add-try_acquire_console_sem.patch
Add try_acquire_console_sem

update-aty128fb-sleep-wakeup-code-for-new-powermac-changes.patch
update aty128fb sleep/wakeup code for new powermac changes

radeonfb-update.patch
radeonfb update

radeonfb-build-fix.patch
radeonfb-build-fix

acpi-sleep-while-atomic-during-s3-resume-from-ram.patch
acpi: sleep-while-atomic during S3 resume from ram

acpi-report-errors-in-fanc.patch
ACPI: report errors in fan.c

acpi-flush-tlb-when-pagetable-changed.patch
acpi: flush TLB when pagetable changed

fix-an-issue-in-acpi-processor-and-container-drivers-related-with-kobject_hotplug.patch
Fix an issue in ACPI processor and container drivers related with kobject_hotplug()

acpi-fix-containers-notify-handler-to-handle-proper-cases-properly.patch
acpi: fix container's notify handler to handle proper cases properly

acpi_power_off-bug-fix.patch
acpi_power_off bug fix

bk-agpgart.patch

bk-alsa.patch

fix-32-bit-calls-to-snd_pcm_channel_info.patch
Fix 32-bit calls to snd_pcm_channel_info()

bk-arm.patch

bk-cifs.patch

bk-cpufreq.patch

cpufreq-core-reduce-warning-messages.patch
cpufreq-core: reduce warning messages

bk-drm-via.patch

bk-i2c.patch

changes-to-the-i2c-driver-to-support-a-non-blocking-interface.patch
Changes to the I2C driver to support a non-blocking interface

minor-ipmi-enhancements.patch
Minor IPMI enhancements

update-to-ipmi-driver-to-support-old-dmi-spec.patch
Update to IPMI driver to support old DMI spec

modify-the-i801-i2c-driver-to-use-the-non-blocking-interface.patch
Modify the i801 I2C driver to use the non-blocking interface.

add-the-ipmi-smbus-driver.patch
Add the IPMI SMBus driver

add-the-ipmi-smbus-driver-fix.patch
ipmi-build-fix-42

add-the-ipmi-smbus-driver-fix-fix.patch
add-the-ipmi-smbus-driver-fix fix

bk-ieee1394.patch

ohci1394-dma_pool_destroy-while-in_atomic-irqs_disabled.patch
ohci1394: dma_pool_destroy while in_atomic() && irqs_disabled()
ohci1394-dma_pool_destroy-while-in_atomic-irqs_disabled-tidy
ohci1394-dma_pool_destroy-while-in_atomic-irqs_disabled-simplification

sbp2-fix-hang-on-unload.patch
sbp2: fix hang on unload

bk-input.patch

bk-dtor-input.patch

serio-warning-fix.patch
serio warning fix

twidjoy-build-fix.patch
twidjoy-build-fix

bk-jfs.patch

bk-kbuild.patch

bk-kconfig.patch

bk-netdev.patch

bk-ntfs.patch

bk-scsi.patch

bk-scsi-rc-fixes.patch

bk-serial.patch

bk-usb.patch

compat-ioctl-for-submiting-urb.patch
compat ioctl for submiting URB

compat-ioctl-for-submiting-urb-fix.patch
compat-ioctl-for-submiting-urb-fix

bk-watchdog.patch

bk-xfs.patch

mm.patch
add -mmN to EXTRAVERSION

vm-pageout-throttling.patch
vm: pageout throttling

orphaned-pagecache-memleak-fix.patch
orphaned pagecache memleak fix

swapspace-layout-improvements.patch
swapspace-layout-improvements

swapspace-layout-improvements-fix.patch
/proc/swaps negative Used

simpler-topdown-mmap-layout-allocator.patch
simpler topdown mmap layout allocator

vmscan-reclaim-swap_cluster_max-pages-in-a-single-pass.patch
vmscan: reclaim SWAP_CLUSTER_MAX pages in a single pass

fix-mincore-cornercases-overflow-caused-by-large-len.patch
Fix mincore cornercases: overflow caused by large "len"

fix-small-vmalloc-per-allocation-limit.patch
Fix small vmalloc per allocation limit

randomisation-global-sysctl.patch
Randomisation: global sysctl

randomisation-global-sysctl-fix.patch
randomisation-global-sysctl-fix

randomisation-infrastructure.patch
Randomisation: infrastructure

fix-compilation-of-uml-after-the-stack-randomization-patches.patch
Fix compilation of UML after the stack-randomization patches

randomisation-add-pf_randomize.patch
Randomisation: add PF_RANDOMIZE

randomisation-stack-randomisation.patch
Randomisation: stack randomisation

randomisation-mmap-randomisation.patch
Randomisation: mmap randomisation

randomisation-enable-by-default.patch
Randomisation: enable by default

randomisation-addr_no_randomize-personality.patch
Randomisation: add ADDR_NO_RANDOMIZE personality

randomisation-top-of-stack-randomization.patch
Randomisation: top-of-stack randomization

move-accounting-function-calls-out-of-critical-vm-code-pathspatch.patch
Move accounting function calls out of critical vm code paths

invalidate-range-of-pages-after-direct-io-write.patch
invalidate range of pages after direct IO write

invalidate-range-of-pages-after-direct-io-write-fix.patch
invalidate-range-of-pages-after-direct-io-write-fix

invalidate-range-of-pages-after-direct-io-write-fix-fix.patch
invalidate-range-of-pages-after-direct-io-write-fix-fix

write-and-wait-on-range-before-direct-io-read.patch
write and wait on range before direct io read

only-unmap-what-intersects-a-direct_io-op.patch
only unmap what intersects a direct_IO op

make-tree_lock-an-rwlock.patch
make mapping->tree_lock an rwlock

must-fix.patch
must fix lists update
must fix list update
mustfix update
must-fix update
mustfix lists

b44-bounce-buffer-fix.patch
b44 bounce buffering fix

net-s2io-replace-schedule_timeout-with-msleep.patch
net/s2io: replace schedule_timeout() with msleep()

ppc-ppc64-abstract-cpu_feature-checks.patch
PPC/PPC64: Abstract cpu_feature checks.

ppc32-dont-create-tmp_gas_check.patch
ppc32: Don't create .tmp_gas_check

ppc32-fix-mv64x60-register-relocation-bug-in-bootwrapper.patch
ppc32: fix mv64x60 register relocation bug in bootwrapper

ppc64-remove-unneeded-includes-from-pseries_nvramc.patch
remove unneeded includes from pSeries_nvram.c

ppc64-collect-and-export-low-level-cpu-usage-statistics.patch
ppc64: collect and export low-level cpu usage statistics

ppc64-move-systemcfg-out-of-heads.patch
ppc64: Move systemcfg out of head.S

ppc64-defconfig-updates.patch
ppc64: defconfig updates

ppc64-distribute-export_symbols.patch
ppc64: distribute EXPORT_SYMBOLs

ppc64-implement-a-vdso-and-use-it-for-signal-trampoline.patch
ppc64: Implement a vDSO and use it for signal trampoline

ppc64-generic-hotplug-cpu-support.patch
ppc64: generic hotplug cpu support

ppc64-disable-hmt-for-rs64-cpus.patch
ppc64: disable HMT for RS64 cpus

use-vmlinux-during-make-install-on-ppc64.patch
ppc64: use vmlinux during make install on ppc64

ppc64-functions-to-reserve-performance-monitor-hardware.patch
ppc64: functions to reserve performance monitor hardware

ppc64-reloc_hide.patch

agpgart-allow-multiple-backends-to-be-initialized.patch
agpgart: allow multiple backends to be initialized
agpgart-allow-multiple-backends-to-be-initialized fix
agpgart: add bridge assignment missed in agp_allocate_memory
x86_64 agp failure fix

agpgart-allow-multiple-backends-to-be-initialized-fix.patch
agpgart-allow-multiple-backends-to-be-initialized-fix

agpgart-add-agp_find_bridge-function.patch
agpgart: add agp_find_bridge function

agpgart-allow-drivers-to-allocate-memory-local-to.patch
agpgart: allow drivers to allocate memory local to the bridge

drm-add-support-for-new-multiple-agp-bridge-agpgart-api.patch
drm: add support for new multiple agp bridge agpgart api

fb-add-support-for-new-multiple-agp-bridge-agpgart-api.patch
fb: add support for new multiple agp bridge agpgart api

agpgart-add-bridge-parameter-to-driver-functions.patch
agpgart: add bridge parameter to driver functions

mips-add-tanbac-tb0219-base-board-driver.patch
mips: add TANBAC TB0219 base board driver

allow-hot-add-enabled-i386-numa-box-to-boot.patch
Allow hot-add enabled i386 NUMA box to boot

refactor-i386-memory-setup.patch
x86: refactor memory setup

consolidate-set_max_mapnr_init-implementations.patch
x86: consolidate set_max_mapnr_init() implementations

remove-free_all_bootmem-define.patch
x86: remove-free_all_bootmem() #define

out-of-line-x86-put_user-implementation.patch
out-of-line x86 "put_user()" implementation

x86_64-hugetlb-fix.patch
x86_64: hugetlb fix

xen-vmm-4-add-ptep_establish_new-to-make-va-available.patch
Xen VMM #4: add ptep_establish_new to make va available

xen-vmm-4-return-code-for-arch_free_page.patch
Xen VMM #4: return code for arch_free_page

xen-vmm-4-return-code-for-arch_free_page-fix.patch
Get rid of arch_free_page() warning

xen-vmm-4-runtime-disable-of-vt-console.patch
Xen VMM #4: runtime disable of VT console

xen-vmm-4-has_arch_dev_mem.patch
Xen VMM #4: HAS_ARCH_DEV_MEM

xen-vmm-4-split-free_irq-into-teardown_irq.patch
Xen VMM #4: split free_irq into teardown_irq

swsusp-do-not-use-higher-order-memory-allocations-on-suspend.patch
swsusp: do not use higher order memory allocations on suspend

swsusp-do-not-use-higher-order-memory-allocations-on-suspend-fix.patch
swsusp-do-not-use-higher-order-memory-allocations-on-suspend fix

swsusp-do-not-use-higher-order-memory-allocations-on-suspend-fix-fix.patch
swsusp-do-not-use-higher-order-memory-allocations-on-suspend fix fix

make-sysrq-f-call-oom_kill.patch
make sysrq-F call oom_kill()

allow-admin-to-enable-only-some-of-the-magic-sysrq-functions.patch
Allow admin to enable only some of the Magic-Sysrq functions

fix-partial-sysrq-setting.patch
Fix partial sysrq setting

sort-out-pci_rom_address_enable-vs-ioresource_rom_enable.patch
Sort out PCI_ROM_ADDRESS_ENABLE vs IORESOURCE_ROM_ENABLE

irqpoll.patch
irqpoll

poll-mini-optimisations.patch
poll: mini optimisations

mtrr-size-and-base-debug.patch
mtrr size-and-base debugging

cleanup-vc-array-access.patch
cleanup vc array access

remove-console_macrosh.patch
remove console_macros.h

merge-vt_struct-into-vc_data.patch
merge vt_struct into vc_data

merge-vt_struct-into-vc_data-fix.patch
merge-vt_struct-into-vc_data fix

jbd-journal-overflow-fix-2.patch
jbd: journal overflow fix #2

jbd-fix-against-journal-overflow.patch
JBD: reduce stack and number of journal descriptors

jbd-fix-against-journal-overflow-tidies.patch
jbd-fix-against-journal-overflow-tidies

jbd-log-space-management-optimization.patch
JBD: log space management optimization

factor-out-phase-6-of-journal_commit_transaction.patch
Factor out phase 6 of journal_commit_transaction

ext3-cleanup-1.patch
ext3 cleanup 1

ext3-free-block-accounting-fix.patch
ext3: free block accounting fix

ext3_test_root-speedup.patch
ext3_test_root() speedup

i4l-new-hfc_usb-driver-version.patch
i4l: new hfc_usb driver version

i4l-hfc-4s-and-hfc-8s-driver.patch
i4l: HFC-4S and HFC-8S driver

fix-race-between-the-nmi-code-and-the-cmos-clock.patch
Fix race between the NMI code and the CMOS clock

cant-unmount-bad-inode.patch
Can't unmount bad inode

iounmap-debugging.patch
iounmap debugging

fix-put_user-under-mmap_sem-in-sys_get_mempolicy.patch
fix put_user under mmap_sem in sys_get_mempolicy()

oss-support-for-ac97-low-power-codecs.patch
OSS Support for AC97 low power codecs

fix-kallsyms-insmod-rmmod-race.patch
Fix kallsyms/insmod/rmmod race

fix-kallsyms-insmod-rmmod-race-fix.patch
fix-kallsyms-insmod-rmmod-race fix

fix-kallsyms-insmod-rmmod-race-fix-fix.patch
fix-kallsyms-insmod-rmmod-race-fix-fix

d_drop-should-use-per-dentry-lock.patch
d_drop should use per dentry lock

detect-soft-lockups.patch
detect soft lockups

touch_softlockup_watchdog.patch
touch_softlockup_watchdog()

fix-softlockup-warning-in-swsuspend-resume.patch
fix softlockup warning in swsuspend resume

serialize-access-to-ide-devices.patch
serialize access to ide devices

add-struct-request-end_io-callback.patch
Add struct request end_io callback

add-struct-request-end_io-callback-fix.patch
add-struct-request-end_io-callback fix

rework-core-barrier-support.patch
rework core barrier support

scsi_io_completion-sense-copy.patch
scsi_io_completion sense copy

blk_execute_rq-oops-on-fast-completion.patch
blk_execute_rq() oops on fast completion

nls_cp936c-is-not-synchronized-with-ms-translation-table.patch
nls_cp936.c is not synchronized with M$'s translation table

annotate-proc-pid-maps-with--markers.patch
annotate /proc/<PID>/maps with [heap]/[stack]/[vdso] markers

serial-add-nec-vr4100-series-serial-support.patch
serial: add NEC VR4100 series serial support

sys_setpriority-euid-semantics-fix.patch
sys_setpriority() euid semantics fix

add-tcsbrkp-to-compat_ioctlh.patch
add TCSBRKP to compat_ioctl.h

areca-raid-linux-scsi-driver.patch
ARECA RAID Linux scsi driver

add-local-bio-pool-support-and-modify-dm.patch
add local bio pool support and modify dm

add-local-bio-pool-support-and-modify-dm-uninline-zero_fill_bio.patch
uninline-zero_fill_bio

minor-conceptual-fix-for-proc-kcore-header-size.patch
minor conceptual fix for /proc/kcore header size

floppy-add-sysfs-symlink.patch
floppy.c: add sysfs symlink

add-compiler-gcc4h.patch
add compiler-gcc4.h

rt-lsm.patch
RT-LSM

convert-proc-driver-rtc-to-seq_file.patch
convert /proc/driver/rtc to seq_file.

drivers-char-lpc-race-fix.patch
drivers/char/lp.c race fix

clean-up-and-unify-asm-resourceh-files.patch
clean up and unify asm-*/resource.h files

base-small-introduce-the-config_base_small-flag.patch
base-small: introduce the CONFIG_BASE_SMALL flag

base-small-shrink-chrdevs-hash.patch
base-small: shrink chrdevs hash

base-small-shrink-pid-tables.patch
base-small: shrink PID tables

base-small-shrink-uid-hash.patch
base-small: shrink UID hash

base-small-shrink-futex-queues.patch
base-small: shrink futex queues

base-small-shrink-timer-hashes.patch
base-small: shrink timer hashes

base-small-shrink-console-buffer.patch
base-small: shrink console buffer

lib-sort-heapsort-implementation-of-sort.patch
lib/sort: Heapsort implementation of sort()

sort-fix.patch
sort fix

sort-export.patch
sort export

sort-build-fix.patch
sort build fix

lib-sort-turn-off-self-test.patch
lib/sort: turn off self-test

lib-sort-replace-qsort-in-xfs.patch
lib/sort: Replace qsort in XFS

lib-sort-replace-insertion-sort-in-exception-tables.patch
lib/sort: Replace insertion sort in exception tables

lib-sort-replace-insertion-sort-in-ia64-exception-tables.patch
lib/sort: Replace insertion sort in IA64 exception tables

lib-sort-use-generic-sort-on-x86_64.patch
lib/sort: Use generic sort on x86_64

random-pt2-cleanup-waitqueue-logic-fix-missed-wakeup.patch
random: cleanup waitqueue logic, fix missed wakeup

random-pt2-kill-pool-clearing.patch
random: kill pool clearing

random-pt2-combine-legacy-ioctls.patch
random: combine legacy ioctls

random-pt2-re-init-all-pools-on-zero.patch
random: re-init all pools on zero

random-pt2-simplify-initialization.patch
random: simplify initialization

random-pt2-kill-memsets-of-static-data.patch
random: kill memsets of static data

random-pt2-kill-dead-extract_state-struct.patch
random: kill dead extract_state struct

random-pt2-kill-22-compat-waitqueue-defs.patch
random: kill 2.2 compat waitqueue defs

random-pt2-kill-redundant-rotate_left-definitions.patch
random: kill redundant rotate_left definitions

random-pt2-kill-redundant-rotate_left-definitions-fix.patch
rol32 thinko

random-pt2-kill-misnamed-log2.patch
random: kill misnamed log2

random-pt3-more-meaningful-pool-names.patch
random: More meaningful pool names

random-pt3-static-allocation-of-pools.patch
random: Static allocation of pools

random-pt3-static-sysctl-bits.patch
random: Static sysctl bits

random-pt3-catastrophic-reseed-checks.patch
random: Catastrophic reseed checks

random-pt3-entropy-reservation-accounting.patch
random: Entropy reservation accounting

random-pt3-reservation-flag-in-pool-struct.patch
random: Reservation flag in pool struct

random-pt3-reseed-pointer-in-pool-struct.patch
random: Reseed pointer in pool struct

random-pt3-break-up-extract_user.patch
random: Break up extract_user

random-pt3-remove-dead-md5-copy.patch
random: Remove dead MD5 copy

random-pt3-simplify-hash-folding.patch
random: Simplify hash folding

random-pt3-clean-up-hash-buffering.patch
random: Clean up hash buffering

random-pt3-remove-entropy-batching.patch
random: Remove entropy batching

random-pt4-create-new-rol32-ror32-bitops.patch
random: Create new rol32/ror32 bitops

random-pt4-use-them-throughout-the-tree.patch
random: Use them throughout the tree

random-pt4-kill-the-sha-variants.patch
random: Kill the SHA variants

random-pt4-cleanup-sha-interface.patch
random: Cleanup SHA interface

random-pt4-move-sha-code-to-lib.patch
random: Move SHA code to lib/

random-pt4-replace-sha-with-faster-version.patch
random: Replace SHA with faster version

random-pt4-replace-sha-with-faster-version-fix.patch
random-pt4-replace-sha-with-faster-version-fix

random-pt4-replace-sha-with-faster-version-fix-fix.patch
SHA1 clarify kerneldoc

random-pt4-replace-sha-with-faster-version-fix-fix-fix.patch
random-pt4-cleanup-sha-interface fix

random-pt4-update-cryptolib-to-use-sha-fro-lib.patch
random: Update cryptolib to use SHA fro lib

random-pt4-move-halfmd4-to-lib.patch
random: Move halfmd4 to lib

random-pt4-kill-duplicate-halfmd4-in-ext3-htree.patch
random: Kill duplicate halfmd4 in ext3 htree

random-pt4-kill-duplicate-halfmd4-in-ext3-htree-fix.patch
random-pt4-kill-duplicate-halfmd4-in-ext3-htree-fix

random-pt4-simplify-and-shrink-syncookie-code.patch
random: Simplify and shrink syncookie code

random-pt4-move-syncookies-to-net.patch
random: Move syncookies to net/

speedup-proc-pid-maps.patch
Speed up /proc/pid/maps

speedup-proc-pid-maps-fix.patch
Speed up /proc/pid/maps fix

speedup-proc-pid-maps-fix-fix.patch
speedup-proc-pid-maps fix fix

speedup-proc-pid-maps-fix-fix-fix.patch
speedup /proc/<pid>/maps(4th version)

fix-loss-of-records-on-size-4096-in-proc-pid-maps.patch
fix loss of records on size > 4096 in proc/<pid>/maps

speedup-proc-pid-maps-fix-fix-fix-fix.patch
speedup-proc-pid-maps-fix-fix-fix fix

posix-timers-tidy-up-clock-interfaces-and-consolidate-dispatch-logic.patch
posix-timers: tidy up clock interfaces and consolidate dispatch logic

posix-timers-high-resolution-cpu-clocks-for-posix-clock_-syscalls.patch
posix-timers: high-resolution CPU clocks for POSIX clock_* syscalls

posix-timers-tidy-up-clock-interfaces-and-consolidate-dispatch-logic-cleanup.patch
posix-timers: tidy up clock interfaces and consolidate dispatch logic cleanup

posix-timers-fix-posix-timers-signals-lock-order.patch
posix-timers: fix posix-timers signals lock order

posix-timers-cpu-clock-support-for-posix-timers.patch
posix-timers: CPU clock support for POSIX timers

posix-timers-cpu-clock-support-for-posix-timers-fix.patch
posix-timers: CPU clock support for POSIX timers (fix)

panic-in-check_process_timers.patch
PANIC in check_process_timers()

make-itimer_real-per-process.patch
make ITIMER_REAL per-process

make-itimer_prof-itimer_virtual-per-process.patch
make ITIMER_PROF, ITIMER_VIRTUAL per-process

make-rlimit_cpu-sigxcpu-per-process.patch
make RLIMIT_CPU/SIGXCPU per-process

pcmcia-add-support-ti-pci4510-cardbus-bridge.patch
pcmcia: Add support TI PCI4510 CardBus bridge

pcmcia-update-vrc4171_card.patch
pcmcia: update vrc4171_card

nfs-fix_vfsflock.patch
VFS: Fix structure initialization in locks_remove_flock()

nfs-flock.patch
NFS: Add emulation of BSD flock() in terms of POSIX locks on the server

nfsacl-return-enosys-for-rpc-programs-that-are-unavailable.patch
nfsacl: return -ENOSYS for RPC programs that are unavailable

nfsacl-add-missing-eopnotsupp-=-nfs3err_notsupp-mapping-in-nfsd.patch
nfsacl: add missing -EOPNOTSUPP => NFS3ERR_NOTSUPP mapping in nfsd

nfsacl-allow-multiple-programs-to-listen-on-the-same-port.patch
nfsacl: allow multiple programs to listen on the same port

nfsacl-allow-multiple-programs-to-share-the-same-transport.patch
nfsacl: allow multiple programs to share the same transport

nfsacl-lazy-rpc-receive-buffer-allocation.patch
nfsacl: lazy RPC receive buffer allocation

nfsacl-encode-and-decode-arbitrary-xdr-arrays.patch
nfsacl: encode and decode arbitrary XDR arrays

nfsacl-encode-and-decode-arbitrary-xdr-arrays-fix.patch
nfsacl-encode-and-decode-arbitrary-xdr-arrays-fix

nfsacl-add-noacl-nfs-mount-option.patch
nfsacl: add noacl nfs mount option

nfsacl-infrastructure-and-server-side-of-nfsacl.patch
nfsacl: infrastructure and server side of nfsacl

nfsv4-deamon-always-supports-acls.patch
NFSv4 deamon always supports acls

lib-sort-replace-qsort-in-nfs-acl-code.patch
lib/sort: Replace qsort in NFS ACL code

nfsacl-infrastructure-and-server-side-of-nfsacl-fix.patch
nfsacl-infrastructure-and-server-side-of-nfsacl fix

nfsacl-solaris-nfsacl-workaround.patch
nfsacl: solaris nfsacl workaround

nfsacl-client-side-of-nfsacl.patch
nfsacl: client side of nfsacl

nfsacl-client-side-of-nfsacl-fix.patch
nfsacl: Must not initialize inode->i_op to NULL

nfsacl-acl-umask-handling-workaround-in-nfs-client.patch
nfsacl: ACL umask handling workaround in nfs client

nfsacl-acl-umask-handling-workaround-in-nfs-client-fix.patch
ACL umask handling workaround in nfs client fix

nfsacl-cache-acls-on-the-nfs-client-side.patch
nfsacl: cache acls on the nfs client side

nfs-acl-build-fix-posix-acl-config-tidy.patch
NFS ACL build fix, POSIX ACL config tidy

kgdb-ga.patch
kgdb stub for ia32 (George Anzinger's one)
kgdbL warning fix
kgdb buffer overflow fix
kgdbL warning fix
kgdb: CONFIG_DEBUG_INFO fix
x86_64 fixes
correct kgdb.txt Documentation link (against 2.6.1-rc1-mm2)
kgdb: fix for recent gcc
kgdb warning fixes
THREAD_SIZE fixes for kgdb
Fix stack overflow test for non-8k stacks
kgdb-ga.patch fix for i386 single-step into sysenter
fix TRAP_BAD_SYSCALL_EXITS on i386
add TRAP_BAD_SYSCALL_EXITS config for i386
kgdb-is-incompatible-with-kprobes
kgdb-ga-build-fix
kgdb-ga-fixes
kgdb: kill off highmem_start_page

kgdboe-netpoll.patch
kgdb-over-ethernet via netpoll
kgdboe: fix configuration of MAC address

kgdb-x86_64-support.patch
kgdb-x86_64-support.patch for 2.6.2-rc1-mm3
kgdb-x86_64-warning-fixes
kgdb-x86_64-fix
kgdb-x86_64-serial-fix
kprobes exception notifier fix

dev-mem-restriction-patch.patch
/dev/mem restriction patch

dev-mem-restriction-patch-allow-reads.patch
dev-mem-restriction-patch: allow reads

journal_add_journal_head-debug.patch
journal_add_journal_head-debug

list_del-debug.patch
list_del debug check

page-owner-tracking-leak-detector.patch
Page owner tracking leak detector

make-page_owner-handle-non-contiguous-page-ranges.patch
make page_owner handle non-contiguous page ranges

unplug-can-sleep.patch
unplug functions can sleep

firestream-warnings.patch
firestream warnings

perfctr-core.patch
perfctr: core
perfctr: remove bogus perfctr_sample_thread() calls

perfctr-i386.patch
perfctr: i386

perfctr-x86-core-updates.patch
perfctr x86 core updates

perfctr-x86-driver-updates.patch
perfctr x86 driver updates

perfctr-x86-driver-cleanup.patch
perfctr: x86 driver cleanup

perfctr-prescott-fix.patch
Prescott fix for perfctr

perfctr-x86-update-2.patch
perfctr x86 update 2

perfctr-x86_64.patch
perfctr: x86_64

perfctr-x86_64-core-updates.patch
perfctr x86_64 core updates

perfctr-ppc.patch
perfctr: PowerPC

perfctr-ppc32-driver-update.patch
perfctr: ppc32 driver update

perfctr-ppc32-mmcr0-handling-fixes.patch
perfctr ppc32 MMCR0 handling fixes

perfctr-ppc32-update.patch
perfctr ppc32 update

perfctr-ppc32-update-2.patch
perfctr ppc32 update

perfctr-virtualised-counters.patch
perfctr: virtualised counters

perfctr-remap_page_range-fix.patch

virtual-perfctr-illegal-sleep.patch
virtual perfctr illegal sleep

make-perfctr_virtual-default-in-kconfig-match-recommendation.patch
Make PERFCTR_VIRTUAL default in Kconfig match recommendation in help text

perfctr-ifdef-cleanup.patch
perfctr ifdef cleanup

perfctr-update-2-6-kconfig-related-updates.patch
perfctr: Kconfig-related updates

perfctr-virtual-updates.patch
perfctr virtual updates

perfctr-virtual-cleanup.patch
perfctr: virtual cleanup

perfctr-ppc32-preliminary-interrupt-support.patch
perfctr ppc32 preliminary interrupt support

perfctr-update-5-6-reduce-stack-usage.patch
perfctr: reduce stack usage

perfctr-interrupt-support-kconfig-fix.patch
perfctr interrupt_support Kconfig fix

perfctr-low-level-documentation.patch
perfctr low-level documentation

perfctr-inheritance-1-3-driver-updates.patch
perfctr inheritance: driver updates

perfctr-inheritance-2-3-kernel-updates.patch
perfctr inheritance: kernel updates

perfctr-inheritance-3-3-documentation-updates.patch
perfctr inheritance: documentation updates

perfctr-inheritance-locking-fix.patch
perfctr inheritance locking fix

perfctr-api-changes-first-step.patch
perfctr API changes: first step

perfctr-virtual-update.patch
perfctr virtual update

perfctr-x86-64-ia32-emulation-fix.patch
perfctr x86-64 ia32 emulation fix

perfctr-sysfs-update-1-4-core.patch
perfctr sysfs update: core

perfctr-sysfs-update.patch
Perfctr sysfs update

perfctr-sysfs-update-2-4-x86.patch
perfctr sysfs update: x86

perfctr-sysfs-update-3-4-x86-64.patch
perfctr sysfs update: x86-64
perfctr: syscall numbers in x86-64 ia32-emulation
perfctr x86_64 native syscall numbers fix

perfctr-sysfs-update-4-4-ppc32.patch
perfctr sysfs update: ppc32

add-do_proc_doulonglongvec_minmax-to-sysctl-functions.patch
Add do_proc_doulonglongvec_minmax to sysctl functions
add-do_proc_doulonglongvec_minmax-to-sysctl-functions-fix
add-do_proc_doulonglongvec_minmax-to-sysctl-functions fix 2

add-sysctl-interface-to-sched_domain-parameters.patch
Add sysctl interface to sched_domain parameters

allow-modular-ide-pnp.patch
allow modular ide-pnp

allow-x86_64-to-reenable-interrupts-on-contention.patch
Allow x86_64 to reenable interrupts on contention

i386-cpu-hotplug-updated-for-mm.patch
i386 CPU hotplug updated for -mm

ppc64-fix-cpu-hotplug.patch
ppc64: fix hotplug cpu

disable-atykb-warning.patch
disable atykb "too many keys pressed" warning

export-file_ra_state_init-again.patch
Export file_ra_state_init() again

cachefs-filesystem.patch
CacheFS filesystem

numa-policies-for-file-mappings-mpol_mf_move-cachefs.patch
numa-policies-for-file-mappings-mpol_mf_move for cachefs

cachefs-release-search-records-lest-they-return-to-haunt-us.patch
CacheFS: release search records lest they return to haunt us

fix-64-bit-problems-in-cachefs.patch
Fix 64-bit problems in cachefs

cachefs-fixed-typos-that-cause-wrong-pointer-to-be-kunmapped.patch
cachefs: fixed typos that cause wrong pointer to be kunmapped

cachefs-return-the-right-error-upon-invalid-mount.patch
CacheFS: return the right error upon invalid mount

fix-cachefs-barrier-handling-and-other-kernel-discrepancies.patch
Fix CacheFS barrier handling and other kernel discrepancies

remove-error-from-linux-cachefsh.patch
Remove #error from linux/cachefs.h

cachefs-warning-fix-2.patch
cachefs warning fix 2

cachefs-linkage-fix-2.patch
cachefs linkage fix

cachefs-build-fix.patch
cachefs build fix

cachefs-documentation.patch
CacheFS documentation

add-page-becoming-writable-notification.patch
Add page becoming writable notification

add-page-becoming-writable-notification-fix.patch
do_wp_page_mk_pte_writable() fix

add-page-becoming-writable-notification-build-fix.patch
add-page-becoming-writable-notification build fix

provide-a-filesystem-specific-syncable-page-bit.patch
Provide a filesystem-specific sync'able page bit

provide-a-filesystem-specific-syncable-page-bit-fix.patch
provide-a-filesystem-specific-syncable-page-bit-fix

provide-a-filesystem-specific-syncable-page-bit-fix-2.patch
provide-a-filesystem-specific-syncable-page-bit-fix-2

make-afs-use-cachefs.patch
Make AFS use CacheFS

afs-cachefs-dependency-fix.patch
afs-cachefs-dependency-fix

split-general-cache-manager-from-cachefs.patch
Split general cache manager from CacheFS

turn-cachefs-into-a-cache-backend.patch
Turn CacheFS into a cache backend

rework-the-cachefs-documentation-to-reflect-fs-cache-split.patch
Rework the CacheFS documentation to reflect FS-Cache split

update-afs-client-to-reflect-cachefs-split.patch
Update AFS client to reflect CacheFS split

x86-rename-apic_mode_exint.patch
kexec: x86: rename APIC_MODE_EXINT

x86-local-apic-fix.patch
kexec: x86: local apic fix

x86_64-e820-64bit.patch
kexec: x86_64: e820 64bit fix

x86-i8259-shutdown.patch
kexec: x86: i8259 shutdown: disable interrupts

x86_64-i8259-shutdown.patch
kexec: x86_64: add i8259 shutdown method

x86-apic-virtwire-on-shutdown.patch
kexec: x86: restore apic virtual wire mode on shutdown

x86_64-apic-virtwire-on-shutdown.patch
kexec: x86_64: restore apic virtual wire mode on shutdown

vmlinux-fix-physical-addrs.patch
kexec: vmlinux: fix physical addresses

x86-vmlinux-fix-physical-addrs.patch
kexec: x86: vmlinux: fix physical addresses

x86_64-vmlinux-fix-physical-addrs.patch
kexec: x86_64: vmlinux: fix physical addresses

x86_64-entry64.patch
kexec: x86_64: add 64-bit entry

x86-config-kernel-start.patch
kexec: x86: add CONFIG_PHYSICAL_START

x86_64-config-kernel-start.patch
kexec: x86_64: add CONFIG_PHYSICAL_START

kexec-kexec-generic.patch
kexec: add kexec syscalls

kexec-kexec-generic-kexec-use-unsigned-bitfield.patch
kexec: use unsigned bitfield

x86-machine_shutdown.patch
kexec: x86: factor out apic shutdown code

x86-kexec.patch
kexec: x86 kexec core

x86-crashkernel.patch
crashdump: x86 crashkernel option

x86_64-machine_shutdown.patch
kexec: x86_64: factor out apic shutdown code

x86_64-kexec.patch
kexec: x86_64 kexec implementation

x86_64-crashkernel.patch
crashdump: x86_64: crashkernel option

kexec-ppc-support.patch
kexec: kexec ppc support

x86-crash_shutdown-nmi-shootdown.patch
crashdump: x86: add NMI handler to capture other CPUs

x86-crash_shutdown-snapshot-registers.patch
kexec: x86: snapshot registers during crash shutdown

x86-crash_shutdown-apic-shutdown.patch
kexec: x86 shutdown APICs during crash_shutdown

crashdump-documentation.patch
crashdump: documentation

crashdump-memory-preserving-reboot-using-kexec.patch
crashdump: memory preserving reboot using kexec

crashdump-routines-for-copying-dump-pages.patch
crashdump: routines for copying dump pages

crashdump-routines-for-copying-dump-pages-fixes.patch
crashdump-routines-for-copying-dump-pages-fixes

crashdump-elf-format-dump-file-access.patch
crashdump: elf format dump file access

crashdump-linear-raw-format-dump-file-access.patch
crashdump: linear raw format dump file access

crashdump-linear-raw-format-dump-file-access-coding-style.patch
crashdump-linear-raw-format-dump-file-access-coding-style

new-bitmap-list-format-for-cpusets.patch
new bitmap list format (for cpusets)

cpusets-big-numa-cpu-and-memory-placement.patch
cpusets - big numa cpu and memory placement

cpusets-config_cpusets-depends-on-smp.patch
Cpusets: CONFIG_CPUSETS depends on SMP

cpusets-move-cpusets-above-embedded.patch
move CPUSETS above EMBEDDED

cpusets-fix-cpuset_get_dentry.patch
cpusets: fix cpuset_get_dentry()

cpusets-fix-race-in-cpuset_add_file.patch
cpusets: fix race in cpuset_add_file()

cpusets-remove-more-casts.patch
cpusets: remove more casts

cpusets-make-config_cpusets-the-default-in-sn2_defconfig.patch
cpusets: make CONFIG_CPUSETS the default in sn2_defconfig

cpusets-document-proc-status-allowed-fields.patch
cpusets: document proc status allowed fields

cpusets-dont-export-proc_cpuset_operations.patch
Cpusets - Don't export proc_cpuset_operations

cpusets-display-allowed-masks-in-proc-status.patch
cpusets: display allowed masks in proc status

cpusets-simplify-cpus_allowed-setting-in-attach.patch
cpusets: simplify cpus_allowed setting in attach

cpusets-remove-useless-validation-check.patch
cpusets: remove useless validation check

cpusets-tasks-file-simplify-format-fixes.patch
Cpusets tasks file: simplify format, fixes

lib-sort-replace-open-coded-opids2-bubblesort-in-cpusets.patch
lib/sort: Replace open-coded O(pids**2) bubblesort in cpusets

cpusets-simplify-memory-generation.patch
Cpusets: simplify memory generation

cpusets-interoperate-with-hotplug-online-maps.patch
cpusets: interoperate with hotplug online maps

cpusets-alternative-fix-for-possible-race-in.patch
cpusets: alternative fix for possible race in cpuset_tasks_read()

cpusets-remove-casts.patch
cpusets: remove void* typecasts

reiser4-sb_sync_inodes.patch
reiser4: vfs: add super_operations.sync_inodes()

reiser4-allow-drop_inode-implementation.patch
reiser4: export vfs inode.c symbols

reiser4-truncate_inode_pages_range.patch
reiser4: vfs: add truncate_inode_pages_range()

reiser4-export-remove_from_page_cache.patch
reiser4: export pagecache add/remove functions to modules

reiser4-export-page_cache_readahead.patch
reiser4: export page_cache_readahead to modules

reiser4-reget-page-mapping.patch
reiser4: vfs: re-check page->mapping after calling try_to_release_page()

reiser4-rcu-barrier.patch
reiser4: add rcu_barrier() synchronization point

reiser4-export-inode_lock.patch
reiser4: export inode_lock to modules

reiser4-export-pagevec-funcs.patch
reiser4: export pagevec functions to modules

reiser4-export-radix_tree_preload.patch
reiser4: export radix_tree_preload() to modules

reiser4-export-find_get_pages.patch

reiser4-radix-tree-tag.patch
reiser4: add new radix tree tag

reiser4-radix_tree_lookup_slot.patch
reiser4: add radix_tree_lookup_slot()

reiser4-perthread-pages.patch
reiser4: per-thread page pools

reiser4-include-reiser4.patch
reiser4: add to build system

reiser4-doc.patch
reiser4: documentation

reiser4-only.patch
reiser4: main fs

reiser4-recover-read-performance.patch
reiser4: recover read performance

reiser4-export-find_get_pages_tag.patch
reiser4-export-find_get_pages_tag

reiser4-add-missing-context.patch

add-acpi-based-floppy-controller-enumeration.patch
Add ACPI-based floppy controller enumeration.

possible-dcache-bug-debugging-patch.patch
Possible dcache BUG: debugging patch

serial-add-support-for-non-standard-xtals-to-16c950-driver.patch
serial: add support for non-standard XTALs to 16c950 driver

add-support-for-possio-gcc-aka-pcmcia-siemens-mc45.patch
Add support for Possio GCC AKA PCMCIA Siemens MC45

generic-serial-cli-conversion.patch
generic-serial cli() conversion

specialix-io8-cli-conversion.patch
Specialix/IO8 cli() conversion

sx-cli-conversion.patch
SX cli() conversion

revert-allow-oem-written-modules-to-make-calls-to-ia64-oem-sal-functions.patch
revert "allow OEM written modules to make calls to ia64 OEM SAL functions"

md-add-interface-for-userspace-monitoring-of-events.patch
md: add interface for userspace monitoring of events.

make-acpi_bus_register_driver-consistent-with-pci_register_driver-again.patch
make acpi_bus_register_driver() consistent with pci_register_driver()

remove-lock_section-from-x86_64-spin_lock-asm.patch
remove LOCK_SECTION from x86_64 spin_lock asm

kfree_skb-dump_stack.patch
kfree_skb-dump_stack

cancel_rearming_delayed_work.patch
cancel_rearming_delayed_work()

ipvs-deadlock-fix.patch
ipvs deadlock fix

minimal-ide-disk-updates.patch
Minimal ide-disk updates

use-find_trylock_page-in-free_swap_and_cache-instead-of-hand-coding.patch
use find_trylock_page in free_swap_and_cache instead of hand coding

radeonfb-fix-spurious-error-return-in-fbio_radeon_set_mirror.patch
radeonfb: Fix spurious error return in FBIO_RADEON_SET_MIRROR

w100fb-make-blanking-function-interrupt-safe.patch
w100fb: Make blanking function interrupt safe

kyrofb-copy__user-return-value-checks-added-to-kyro-fb.patch
kyrofb: copy_*_user return value checks added to kyro fb

skeletonfb-documentation-fixes.patch
skeletonfb: Documentation fixes

intelfb-add-partial-support-915g-chipset.patch
intelfb: Add partial support 915G chipset

sisfb_compat_ioctl-warning-fix.patch
fbdev compat_ioctl warning fix

sis-warning-fix.patch
sis warning fix

tridentfbc-make-some-code-static.patch
tridentfb.c: make some code static

tridentfb-warning-fix.patch
tridentfb warning fix

md-fix-multipath-assembly-bug.patch
md: fix multipath assembly bug

md-raid-kconfig-cleanups-remove-experimental-tag-from-raid-6.patch
md: RAID Kconfig cleanups, remove experimental tag from RAID-6

device-mapper-store-name-directly-against-device.patch
device-mapper: Store name directly against device

device-mapper-record-restore-bio-state.patch
device-mapper: Record & restore bio state.

device-mapper-export-map_info.patch
device-mapper: Export map_info

figure-out-who-is-inserting-bogus-modules.patch
Figure out who is inserting bogus modules

detect-atomic-counter-underflows.patch
detect atomic counter underflows

update-documentation-filesystems-locking.patch
Update Documentation/filesystems/Locking

post-halloween-doc.patch
post halloween doc

periodically-scan-redzone-entries-and-slab-control-structures.patch
periodically scan redzone entries and slab control structures

fuse-maintainers-kconfig-and-makefile-changes.patch
Subject: [PATCH 1/11] FUSE - MAINTAINERS, Kconfig and Makefile changes

fuse-core.patch
Subject: [PATCH 2/11] FUSE - core

fuse-device-functions.patch
Subject: [PATCH 3/11] FUSE - device functions

fuse-device-functions-fix-race-in-interrupted-request.patch
fuse: fix race in interrupted request

fuse-device-functions-fix.patch
fuse: better error reporting in fuse_fill_super

fuse-fix-llseek-on-device.patch
FUSE: fix llseek on device

fuse-make-two-functions-static.patch
fuse: make two functions static

fuse-fix-variable-with-confusing-name.patch
fuse: fix variable with confusing name

fuse-read-only-operations.patch
Subject: [PATCH 4/11] FUSE - read-only operations

fuse-read-write-operations.patch
Subject: [PATCH 5/11] FUSE - read-write operations

fuse-read-write-operations-fix.patch
fuse: fix hard link operation

fuse-file-operations.patch
Subject: [PATCH 6/11] FUSE - file operations

fuse-mount-options.patch
Subject: [PATCH 7/11] FUSE - mount options

fuse-dont-check-against-zero-fsuid.patch
fuse: don't check against zero fsuid

fuse-remove-mount_max-and-user_allow_other-module-parameters.patch
fuse: remove mount_max and user_allow_other module parameters

fuse-extended-attribute-operations.patch
Subject: [PATCH 8/11] FUSE - extended attribute operations

fuse-readpages-operation.patch
Subject: [PATCH 9/11] FUSE - readpages operation

fuse-nfs-export.patch
Subject: [PATCH 10/11] FUSE - NFS export

fuse-direct-i-o.patch
Subject: [PATCH 11/11] FUSE - direct I/O

fuse-transfer-readdir-data-through-device.patch
fuse: transfer readdir data through device

cryptoapi-prepare-for-processing-multiple-buffers-at.patch
CryptoAPI: prepare for processing multiple buffers at a time

cryptoapi-update-padlock-to-process-multiple-blocks-at.patch
CryptoAPI: Update PadLock to process multiple blocks at once

update-email-address-of-andrea-arcangeli.patch
update email address of Andrea Arcangeli

compile-error-blackbird_load_firmware.patch
blackbird_load_firmware compile fix

i386-x86_64-apicc-make-two-functions-static.patch
i386/x86_64 apic.c: make two functions static

i386-cyrixc-make-a-function-static.patch
i386 cyrix.c: make a function static

mtrr-some-cleanups.patch
mtrr: some cleanups

i386-cpu-commonc-some-cleanups.patch
i386 cpu/common.c: some cleanups

i386-cpuidc-make-two-functions-static.patch
i386 cpuid.c: make two functions static

i386-efic-make-some-code-static.patch
i386 efi.c: make some code static

i386-x86_64-io_apicc-misc-cleanups.patch
i386/x86_64 io_apic.c: misc cleanups

i386-mpparsec-make-mp_processor_info-static.patch
i386 mpparse.c: make MP_processor_info static

i386-x86_64-msrc-make-two-functions-static.patch
i386/x86_64 msr.c: make two functions static

3w-abcdh-tw_device_extension-remove-an-unused-filed.patch
3w-abcd.h: TW_Device_Extension: remove an unused field

hpet-make-some-code-static.patch
hpet: make some code static

26-patch-i386-trapsc-make-a-function-static.patch
i386 traps.c: make a function static

i386-semaphorec-make-4-functions-static.patch
i386 semaphore.c: make 4 functions static

kill-aux_device_present.patch
kill aux_device_present

i386-setupc-make-4-variables-static.patch
i386 setup.c: make 4 variables static

mostly-i386-mm-cleanup.patch
(mostly i386) mm cleanup

update-email-address-of-benjamin-lahaise.patch
Update email address of Benjamin LaHaise

update-email-address-of-philip-blundell.patch
Update email address of Philip Blundell

kernel-acctc-make-a-function-static.patch
kernel/acct.c: make a function static

kernel-auditc-make-some-functions-static.patch
kernel/audit.c: make some functions static

kernel-capabilityc-make-a-spinlock-static.patch
kernel/capability.c: make a spinlock static

mm-thrashc-make-a-variable-static.patch
mm/thrash.c: make a variable static

lib-kernel_lockc-make-kernel_sem-static.patch
lib/kernel_lock.c: make kernel_sem static

saa7146_vv_ksymsc-remove-two-unused-export_symbol_gpls.patch
saa7146_vv_ksyms.c: remove two unused EXPORT_SYMBOL_GPL's

fix-placement-of-static-inline-in-nfsdh.patch
fix placement of static inline in nfsd.h

drivers-block-umemc-make-two-functions-static.patch
drivers/block/umem.c: make two functions static

drivers-block-xdc-make-a-variable-static.patch
drivers/block/xd.c: make a variable static

kernel-forkc-make-mm_cachep-static.patch
kernel/fork.c: make mm_cachep static

kernel-forkc-make-mm_cachep-static-fix.patch
kernel-forkc-make-mm_cachep-static fix

mm-page-writebackc-remove-an-unused-function.patch
mm/page-writeback.c: remove an unused function

mm-shmemc-make-a-struct-static.patch
mm/shmem.c: make a struct static

misc-isapnp-cleanups.patch
misc ISAPNP cleanups

some-pnp-cleanups.patch
some PNP cleanups

if-0-cx88_risc_disasm.patch
#if 0 cx88_risc_disasm

make-loglevels-in-init-mainc-a-little-more-sane.patch
Make loglevels in init/main.c a little more sane.

isicom-use-null-for-pointer.patch
sparse: use NULL for pointer

remove-bouncing-email-address-of-hennus-bergman.patch
remove bouncing email address of Hennus Bergman

cirrusfbc-make-some-code-static.patch
cirrusfb.c: make some code static

matroxfb_basec-make-some-code-static.patch
matroxfb_base.c: make some code static

matroxfb_basec-make-some-code-static-fix.patch
matroxfb_basec-make-some-code-static fix

asiliantfbc-make-some-code-static.patch
asiliantfb.c: make some code static

i386-apic-kconfig-cleanups.patch
i386 APIC Kconfig cleanups

security-seclvlc-make-some-code-static.patch
security/seclvl.c: make some code static

drivers-block-elevatorc-make-two-functions-static.patch
drivers/block/elevator.c: make two functions static

drivers-block-rdc-make-two-variables-static.patch
drivers/block/rd.c: make two variables static

loopc-make-two-functions-static.patch
loop.c: make two functions static

remove-bouncing-email-address-of-thomas-hood.patch
remove bouncing email address of Thomas Hood

fs-adfs-dir_fc-remove-an-unused-function.patch
fs/adfs/dir_f.c: remove an unused function

drivers-char-moxac-if-0-an-unused-function.patch
drivers/char/moxa.c: #if 0 an unused function

fs-lockd-clntprocc-make-2-functions-static.patch
fs/lockd/clntproc.c: make 2 functions static

oss-sb_cardc-no-need-to-include-mcah.patch
OSS sb_card.c: no need to include mca.h

ioschedc-use-proper-documentation-path.patch
*-iosched.c: Use proper documentation path

kernel-resourcec-make-resource_op-static.patch
kernel/resource.c: make resource_op static

kernel-power-mainc-make-pm_states-static.patch
kernel/power/main.c: make pm_states static

kernel-sysc-make-some-code-static.patch
kernel/sys.c: make some code static

scsi-ipsc-make-some-code-static.patch
SCSI ips.c: make some code static

scsi-psi240ic-make-4-functions-static.patch
SCSI psi240i.c: make 4 functions static

scsi-src-make-a-struct-static.patch
SCSI sr.c: make a struct static

small-drivers-video-kyro-cleanups.patch
small drivers/video/kyro/ cleanups

drivers-video-i810-make-some-code-static.patch
drivers/video/i810/: make some code static

floppyc-make-some-code-static.patch
floppy.c: make some code static

drivers-block-nbdc-make-3-functions-static.patch
drivers/block/nbd.c: make 3 functions static

drivers-block-cpqarrayc-small-cleanups.patch
drivers/block/cpqarray.c: small cleanups

pcxx-remove-obsolete-driver.patch
pcxx: Remove obsolete driver

pty-oops-fix.patch
pty oops fix

mark-the-mcd-cdrom-driver-as-broken.patch
mark the mcd cdrom driver as BROKEN

warning-fix-in-drivers-cdrom-mcdc.patch
warning fix in drivers/cdrom/mcd.c

wavefront-reduce-stack-usage.patch
wavefront: reduce stack usage

mm-page-writebackc-remove-an-unused-function-2.patch
mm/page-writeback.c: remove an unused function #2

generic_serialh-kill-incorrect-gs_debug-reference.patch
generic_serial.h: kill incorrect gs_debug reference

kernel-timerc-make-two-variables-static.patch
kernel/timer.c: make two variables static

remove-the-unused-oss-maestro_tablesh.patch
remove the unused OSS maestro_tables.h

fs-hfs-misc-cleanups.patch
fs/hfs/: misc cleanups

fs-hpfs-make-some-code-static.patch
fs/hpfs/: make some code static

fs-hfsplus-misc-cleanups.patch
fs/hfsplus/: misc cleanups

i386-x86_64-processc-make-hlt_counter-static.patch
i386/x86_64 process.c: make hlt_counter static

i386-mach-default-topologyc-make-cpu_devices-static.patch
i386/mach-default/topology.c: make cpu_devices static

i386-math-emu-misc-cleanups.patch
i386/math-emu/: misc cleanups

non-pc-parport-config-change.patch
non-PC parport config change

prism54-misc-cleanups.patch
prism54: misc cleanups

scsi-qlogicfcc-some-cleanups.patch
SCSI qlogicfc.c: some cleanups

scsi-qlogicispc-some-cleanups.patch
SCSI qlogicisp.c: some cleanups

savagefbc-make-some-code-static.patch
savagefb.c: make some code static

hpet-setup-comment-fix.patch
hpet setup comment fix

fs-ncpfs-ncplib_kernelc-make-a-function-static.patch
fs/ncpfs/ncplib_kernel.c: make a function static

kill-iphase5526.patch
kill IPHASE5526

fs-nfs-make-some-code-static.patch
fs/nfs/: make some code static

i386-x86_64-acpi-sleepc-kill-unused-acpi_save_state_disk.patch
i386/x86_64: acpi/sleep.c: kill unused acpi_save_state_disk

smpbootc-cleanups.patch
smp{,boot}.c cleanups

i386-kernel-i387c-misc-cleanups.patch
i386/kernel/i387.c: misc cleanups

i386-x86_64-i8259c-make-mask_and_ack_8259a-static.patch
i386/x86_64 i8259.c: make mask_and_ack_8259A static

scsi-sym53c416c-make-a-function-static.patch
SCSI sym53c416.c: make a function static

scsi-ultrastorc-make-a-variable-static.patch
SCSI ultrastor.c: make a variable static

kernel-intermodulec-make-inter_module_get-static.patch
kernel/intermodule.c: make inter_module_get static




2005-02-10 13:35:42

by Christoph Hellwig

Subject: Re: 2.6.11-rc3-mm2

On Thu, Feb 10, 2005 at 02:35:08AM -0800, Andrew Morton wrote:
>
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
>
>
> - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> It seems that nothing else is going to come along and this is completely
> encapsulated.

Even if we accept a module that grants capabilities to groups, this isn't fine
yet, because it supports only two specific capabilities (and even those two in
different ways!) instead of adding generic support for binding capabilities to
groups.

More comments on the actual code:


+#include <linux/module.h>
+#include <linux/security.h>
+
+#define RT_LSM "Realtime LSM " /* syslog module name prefix */
+#define RT_ERR "Realtime: " /* syslog error message prefix */

+#include <linux/vermagic.h>
+MODULE_INFO(vermagic,VERMAGIC_STRING);

This doesn't belong in a module.

+#define MY_NAME __stringify(KBUILD_MODNAME)

Please use a normal prefix. A module shouldn't behave differently depending on
what name you compile it under.
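
To illustrate the objection, here is a minimal userspace sketch of how a
stringified build name differs from a fixed prefix. STRINGIFY, RT_NAME and
names_agree() are hypothetical names made up for this example, and it assumes
KBUILD_MODNAME expands to a bare token, as the quoted #define implies.

```c
#include <stdio.h>
#include <string.h>

/* Userspace copies of the usual two-level stringification helpers. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Hypothetical default; in a kernel build this comes from the build system. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME realtime
#endif

#define MY_NAME STRINGIFY(KBUILD_MODNAME)	/* follows the build name */
#define RT_NAME "realtime"			/* fixed, independent of the build */

/* Returns 1 while the build-derived name still equals the fixed one. */
static int names_agree(void)
{
	return strcmp(MY_NAME, RT_NAME) == 0;
}
```

Compiling the same file with -DKBUILD_MODNAME=foo makes MY_NAME expand to
"foo", so names_agree() returns 0 and anything keyed on MY_NAME changes with
the object name.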

2005-02-10 18:56:04

by Robert Love

Subject: [patch] inotify for 2.6.11-rc3-mm2

On Thu, 2005-02-10 at 02:35 -0800, Andrew Morton wrote:

> -inotify.patch
> -inotify-fix_find_inode.patch
>
> I think my version is old, and it oopses.

It is old. I have sent you multiple updates. ;-)

Attached, find a patch against 2.6.11-rc3-mm2 of the latest inotify.

This version has numerous optimizations, bug fixes, and cleanups. It
introduces a generic notification layer to cleanly wrap both dnotify and
inotify hooks in fs/.

Pending is a data structure reorganization, to untangle some of the
locking.

Andrew, please apply!

Robert Love


inotify!

inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

* dnotify requires the opening of one fd per directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
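
For reference, the dnotify pattern criticized above looks roughly like the
sketch below. It uses the existing fcntl(F_NOTIFY) interface; dnotify_watch(),
demo_dnotify() and the /tmp paths are hypothetical names chosen for this
example, and the fallback constants are only there in case _GNU_SOURCE is
defined too late for glibc to expose them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes F_NOTIFY and DN_* only with this */
#endif
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks with the Linux UAPI values, in case the macro above came too late. */
#ifndef F_NOTIFY
#define F_NOTIFY	1026
#define DN_MODIFY	0x00000002
#define DN_CREATE	0x00000004
#define DN_MULTISHOT	0x80000000
#endif

static volatile sig_atomic_t dir_changed;

/* A plain SIGIO handler only learns that something changed somewhere. */
static void on_sigio(int sig)
{
	(void)sig;
	dir_changed = 1;
}

/* Returns an fd that must stay open for as long as the directory is watched. */
static int dnotify_watch(const char *dir)
{
	struct sigaction sa;
	int fd;

	assert(dir != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigio;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;

	fd = open(dir, O_RDONLY);	/* one open fd per watched directory */
	if (fd < 0)
		return -1;
	if (fcntl(fd, F_NOTIFY, DN_CREATE | DN_MODIFY | DN_MULTISHOT) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Demo: returns 1 if creating a file in a fresh directory raised SIGIO. */
static int demo_dnotify(void)
{
	char dir[] = "/tmp/dnotify-demo-XXXXXX";
	char path[64];
	int watch_fd, file, seen;

	if (!mkdtemp(dir))
		return -1;
	snprintf(path, sizeof(path), "%s/hello", dir);

	dir_changed = 0;
	watch_fd = dnotify_watch(dir);
	if (watch_fd < 0) {
		rmdir(dir);
		return -1;
	}
	file = open(path, O_CREAT | O_WRONLY, 0600);	/* triggers DN_CREATE */
	if (file >= 0)
		close(file);
	seen = dir_changed;

	close(watch_fd);
	unlink(path);
	rmdir(dir);
	return seen;
}
```

Note that the handler cannot tell which file changed, so a real user has to
rescan the directory and compare against cached stat data, which is the cost
described in the second bullet.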

inotify provides a more usable, simple, powerful solution to file change
notification:

* inotify's interface is a device node, not SIGIO. You open a
single fd to the device node, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure)
and Gamin (a FAM replacement).

Signed-off-by: Robert Love <[email protected]>

arch/sparc64/Kconfig | 13
drivers/char/Kconfig | 13
drivers/char/Makefile | 2
drivers/char/inotify.c | 1053 +++++++++++++++++++++++++++++++++++++++++++++
fs/attr.c | 33 -
fs/compat.c | 14
fs/file_table.c | 4
fs/inode.c | 3
fs/namei.c | 38 -
fs/open.c | 9
fs/read_write.c | 24 -
fs/super.c | 3
include/linux/fs.h | 7
include/linux/fsnotify.h | 235 ++++++++++
include/linux/inotify.h | 118 +++++
include/linux/miscdevice.h | 1
include/linux/sched.h | 2
kernel/user.c | 2
18 files changed, 1511 insertions(+), 63 deletions(-)

diff -urN linux-2.6.11-rc3-mm2/arch/sparc64/Kconfig linux-mm-inotify/arch/sparc64/Kconfig
--- linux-2.6.11-rc3-mm2/arch/sparc64/Kconfig 2005-02-10 13:17:32.212175080 -0500
+++ linux-mm-inotify/arch/sparc64/Kconfig 2005-02-10 13:18:40.358815216 -0500
@@ -88,6 +88,19 @@
bool
default y

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config SMP
bool "Symmetric multi-processing support"
---help---
diff -urN linux-2.6.11-rc3-mm2/drivers/char/inotify.c linux-mm-inotify/drivers/char/inotify.c
--- linux-2.6.11-rc3-mm2/drivers/char/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-mm-inotify/drivers/char/inotify.c 2005-02-10 13:18:40.360814912 -0500
@@ -0,0 +1,1053 @@
+/*
+ * drivers/char/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+static kmem_cache_t *inode_data_cachep;
+
+static int sysfs_attrib_max_user_devices;
+static int sysfs_attrib_max_user_watches;
+static unsigned int sysfs_attrib_max_queued_events;
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * For each inotify device, we need to keep track of the events queued on it,
+ * a list of the inodes that we are watching, and so on.
+ *
+ * This structure is protected by 'lock'. Lock ordering:
+ *
+ * dev->lock (protects dev)
+ * inode_lock (used to safely walk inode_in_use list)
+ * inode->i_lock (only needed for getting ref on inode_data)
+ */
+struct inotify_device {
+ wait_queue_head_t wait;
+ struct idr idr;
+ struct list_head events;
+ struct list_head watches;
+ spinlock_t lock;
+ unsigned int queue_size;
+ unsigned int event_count;
+ unsigned int max_events;
+ struct user_struct *user;
+};
+
+struct inotify_watch {
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+ struct inode *inode; /* associated inode */
+ struct inotify_device *dev; /* associated device */
+ struct list_head d_list; /* entry in device's list */
+ struct list_head i_list; /* entry in inotify_data's list */
+};
+
+/*
+ * A list of these is attached to each instance of the driver. In read(), this
+ * list is walked and all events that can fit in the buffer are returned.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event;
+ struct list_head list;
+ char *filename;
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", sysfs_attrib_max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ sysfs_attrib_max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", sysfs_attrib_max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ sysfs_attrib_max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", sysfs_attrib_max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ sysfs_attrib_max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void __get_inode_data(struct inotify_inode_data *data)
+{
+ atomic_inc(&data->count);
+}
+
+/*
+ * get_inode_data - pin an inotify_inode_data structure. Returns the structure
+ * if successful and NULL on failure, which can only occur if inotify_data is
+ * not yet allocated. The inode must be pinned prior to invocation.
+ */
+static inline struct inotify_inode_data * get_inode_data(struct inode *inode)
+{
+ struct inotify_inode_data *data;
+
+ spin_lock(&inode->i_lock);
+ data = inode->inotify_data;
+ if (data)
+ __get_inode_data(data);
+ spin_unlock(&inode->i_lock);
+
+ return data;
+}
+
+/*
+ * put_inode_data - drop our reference on an inotify_inode_data and the
+ * inode structure in which it lives. If the reference count on inotify_data
+ * reaches zero, free it.
+ */
+static inline void put_inode_data(struct inode *inode)
+{
+ //spin_lock(&inode->i_lock);
+ if (atomic_dec_and_test(&inode->inotify_data->count)) {
+ kmem_cache_free(inode_data_cachep, inode->inotify_data);
+ inode->inotify_data = NULL;
+ }
+ //spin_unlock(&inode->i_lock);
+}
+
+/*
+ * find_inode - resolve a user-given path to a specific inode and fill in nd
+ */
+static int find_inode(const char __user *dirname, struct nameidata *nd)
+{
+ int error;
+
+ error = __user_walk(dirname, LOOKUP_FOLLOW, nd);
+ if (error)
+ return error;
+
+ /* you can only watch an inode if you have read permissions on it */
+ return permission(nd->dentry->d_inode, MAY_READ, NULL);
+}
+
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *filename)
+{
+ struct inotify_kernel_event *kevent;
+
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (!kevent)
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (filename) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(filename) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ kevent->filename = kmalloc(len, GFP_ATOMIC);
+ if (!kevent->filename) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->filename, 0, len);
+ strncpy(kevent->filename, filename, strlen(filename));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->filename = NULL;
+ }
+
+ return kevent;
+}
+
+#define list_to_inotify_kernel_event(pos) \
+ list_entry((pos), struct inotify_kernel_event, list)
+
+#define inotify_dev_get_event(dev) \
+ (list_to_inotify_kernel_event(dev->events.next))
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *filename)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* drop this event if it is a dupe of the previous */
+ last = inotify_dev_get_event(dev);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->filename;
+
+ if (!filename && !lastname)
+ return;
+ if (filename && lastname && !strcmp(lastname, filename))
+ return;
+ }
+
+ /*
+ * the queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event
+ */
+ if (dev->event_count > dev->max_events)
+ return;
+
+ /* the queue has just overflowed and we need to notify user space */
+ if (dev->event_count == dev->max_events) {
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ goto add_event_to_queue;
+ }
+
+ kevent = kernel_event(watch->wd, mask, cookie, filename);
+
+add_event_to_queue:
+ if (!kevent)
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wait);
+}
+
+static inline int inotify_dev_has_events(struct inotify_device *dev)
+{
+ return !list_empty(&dev->events);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ struct inotify_kernel_event *kevent;
+
+ if (!inotify_dev_has_events(dev))
+ return;
+
+ kevent = inotify_dev_get_event(dev);
+ list_del_init(&kevent->list);
+ if (kevent->filename)
+ kfree(kevent->filename);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+ if (atomic_read(&dev->user->inotify_watches) >=
+ sysfs_attrib_max_user_watches)
+ return -ENOSPC;
+
+repeat:
+ if (!idr_pre_get(&dev->idr, GFP_KERNEL))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ if (ret == -EAGAIN) /* more memory is required, try again */
+ goto repeat;
+ else if (ret) /* the idr is full! */
+ return -ENOSPC;
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return 0;
+}
+
+/*
+ * inotify_dev_put_wd - release the given WD on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static int inotify_dev_put_wd(struct inotify_device *dev, s32 wd)
+{
+ if (!dev || wd < 0)
+ return -1;
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, wd);
+
+ return 0;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Grabs dev->lock, so the caller must not hold it.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (!watch)
+ return NULL;
+
+ watch->mask = mask;
+ watch->inode = inode;
+ watch->dev = dev;
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ if (inotify_dev_get_wd(dev, watch)) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ return watch;
+}
+
+/*
+ * delete_watch - removes the given 'watch' from the given 'dev'
+ *
+ * Caller must hold dev->lock.
+ */
+static void delete_watch(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ inotify_dev_put_wd(dev, watch->wd);
+ kmem_cache_free(watch_cachep, watch);
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Caller must hold dev->lock.
+ * FIXME: Needs inotify_data->lock too. Don't need dev->lock, just pin it.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ if (!inode->inotify_data)
+ return NULL;
+
+ list_for_each_entry(watch, &inode->inotify_data->watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * dev_find_wd - given a (dev,wd) pair, returns the matching inotify_watch
+ *
+ * Returns the results of looking up (dev,wd) in the idr layer. NULL is
+ * returned on error.
+ *
+ * The caller must hold dev->lock.
+ */
+static inline struct inotify_watch *dev_find_wd(struct inotify_device *dev,
+ u32 wd)
+{
+ return idr_find(&dev->idr, wd);
+}
+
+static int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * inotify_dev_add_watch - add the given watch to the given device instance
+ *
+ * Caller must hold dev->lock.
+ */
+static int inotify_dev_add_watch(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ if (!dev || !watch)
+ return -EINVAL;
+
+ list_add(&watch->d_list, &dev->watches);
+ return 0;
+}
+
+/*
+ * inotify_dev_rm_watch - remove the given watch from the given device
+ *
+ * Caller must hold dev->lock because we call inotify_dev_queue_event().
+ */
+static int inotify_dev_rm_watch(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ if (!watch)
+ return -EINVAL;
+
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ list_del_init(&watch->d_list);
+
+ return 0;
+}
+
+/*
+ * inode_add_watch - add a watch to the given inode
+ *
+ * Callers must hold dev->lock, because we call inode_find_dev().
+ */
+static int inode_add_watch(struct inode *inode, struct inotify_watch *watch)
+{
+ int ret;
+
+ if (!inode || !watch)
+ return -EINVAL;
+
+ spin_lock(&inode->i_lock);
+ if (!inode->inotify_data) {
+ /* inotify_data is not attached to the inode, so add it */
+ inode->inotify_data = kmem_cache_alloc(inode_data_cachep,
+ GFP_ATOMIC);
+ if (!inode->inotify_data) {
+ ret = -ENOMEM;
+ goto out_lock;
+ }
+
+ atomic_set(&inode->inotify_data->count, 0);
+ INIT_LIST_HEAD(&inode->inotify_data->watches);
+ spin_lock_init(&inode->inotify_data->lock);
+ } else if (inode_find_dev(inode, watch->dev)) {
+ /* a watch is already associated with this (inode,dev) pair */
+ ret = -EINVAL;
+ goto out_lock;
+ }
+ __get_inode_data(inode->inotify_data);
+ spin_unlock(&inode->i_lock);
+
+ list_add(&watch->i_list, &inode->inotify_data->watches);
+
+ return 0;
+out_lock:
+ spin_unlock(&inode->i_lock);
+ return ret;
+}
+
+static int inode_rm_watch(struct inode *inode,
+ struct inotify_watch *watch)
+{
+ if (!inode || !watch || !inode->inotify_data)
+ return -EINVAL;
+
+ list_del_init(&watch->i_list);
+
+ /* clean up inode->inotify_data */
+ put_inode_data(inode);
+
+ return 0;
+}
+
+/* Kernel API */
+
+/*
+ * inotify_inode_queue_event - queue an event with the given mask, cookie, and
+ * filename to any watches associated with the given inode.
+ *
+ * inode must be pinned prior to calling.
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ if (!inode->inotify_data)
+ return;
+
+ list_for_each_entry(watch, &inode->inotify_data->watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *filename)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ if (inode->inotify_data) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, filename);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+u32 inotify_get_cookie(void)
+{
+ atomic_inc(&inotify_cookie);
+ return atomic_read(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/*
+ * Caller must hold dev->lock.
+ */
+static void __remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ struct inode *inode;
+
+ inode = watch->inode;
+
+ inode_rm_watch(inode, watch);
+ inotify_dev_rm_watch(dev, watch);
+ delete_watch(dev, watch);
+
+ iput(inode);
+}
+
+/*
+ * remove_watch - remove a watch from both the device and the inode.
+ *
+ * watch->inode must be pinned. We drop a reference before returning. Grabs
+ * dev->lock.
+ */
+static void remove_watch(struct inotify_watch *watch)
+{
+ struct inotify_device *dev = watch->dev;
+
+ spin_lock(&dev->lock);
+ __remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+}
+
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ spin_lock(&inode_lock);
+
+ /*
+ * We hold the inode_lock, so the inodes are not going anywhere, and
+ * we grab a reference on inotify_data before walking its list of
+ * watches.
+ */
+ list_for_each_entry(inode, &inode_in_use, i_list) {
+ struct inotify_inode_data *inode_data;
+ struct inotify_watch *watch;
+
+ if (inode->i_sb != sb)
+ continue;
+
+ inode_data = get_inode_data(inode);
+ if (!inode_data)
+ continue;
+
+ list_for_each_entry(watch, &inode_data->watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT, 0,
+ NULL);
+ __remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ put_inode_data(inode);
+ }
+
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/*
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+ struct inotify_inode_data *data;
+
+ data = get_inode_data(inode);
+ if (!data)
+ return;
+ list_for_each_entry_safe(watch, next, &data->watches, i_list)
+ remove_watch(watch);
+ put_inode_data(inode);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* The driver interface is implemented below */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+
+ poll_wait(file, &dev->wait, wait);
+
+ if (inotify_dev_has_events(dev))
+ return POLLIN | POLLRDNORM;
+
+ return 0;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ DECLARE_WAITQUEUE(wait, current);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* We only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int has_events;
+
+ spin_lock(&dev->lock);
+ has_events = inotify_dev_has_events(dev);
+ spin_unlock(&dev->lock);
+ if (has_events)
+ break;
+
+ if (file->f_flags & O_NONBLOCK)
+ return -EAGAIN;
+
+ if (signal_pending(current))
+ return -EINTR;
+
+ add_wait_queue(&dev->wait, &wait);
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ schedule();
+
+ set_current_state(TASK_RUNNING);
+ remove_wait_queue(&dev->wait, &wait);
+ }
+
+ while (count >= event_size) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (!inotify_dev_has_events(dev)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ spin_unlock(&dev->lock);
+
+ /* We can't send this event, not enough space in the buffer */
+ if (event_size + kevent->event.len > count)
+ break;
+
+ /* Copy the entire event except the string to user space */
+ if (copy_to_user(buf, &kevent->event, event_size))
+ return -EFAULT;
+
+ buf += event_size;
+ count -= event_size;
+
+ /* Copy the filename to user space */
+ if (kevent->filename) {
+ if (copy_to_user(buf, kevent->filename,
+ kevent->event.len))
+ return -EFAULT;
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ spin_lock(&dev->lock);
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (atomic_read(&user->inotify_devs) >= sysfs_attrib_max_user_devices) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (!dev) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ atomic_inc(&user->inotify_devs);
+
+ idr_init(&dev->idr);
+
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wait);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = sysfs_attrib_max_queued_events;
+ dev->user = user;
+ spin_lock_init(&dev->lock);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(user);
+ return ret;
+}
+
+/*
+ * inotify_release_all_watches - destroy all watches on a given device
+ *
+ * FIXME: We need a lock on the watch here.
+ */
+static void inotify_release_all_watches(struct inotify_device *dev)
+{
+ struct inotify_watch *watch, *next;
+
+ list_for_each_entry_safe(watch, next, &dev->watches, d_list)
+ remove_watch(watch);
+}
+
+/*
+ * inotify_release_all_events - destroy all of the events on a given device
+ */
+static void inotify_release_all_events(struct inotify_device *dev)
+{
+ spin_lock(&dev->lock);
+ while (inotify_dev_has_events(dev))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+
+ inotify_release_all_watches(dev);
+ inotify_release_all_events(dev);
+
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+
+ kfree(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch;
+ struct nameidata nd;
+ int ret;
+
+ ret = find_inode((const char __user*) request->name, &nd);
+ if (ret)
+ return ret;
+
+ /* held in place by references in nd */
+ inode = nd.dentry->d_inode;
+
+ spin_lock(&dev->lock);
+
+ /*
+ * This handles the case of re-adding a directory we are already
+ * watching: we just update the mask and return the existing wd.
+ */
+ if (inotify_dev_is_watching_inode(dev, inode)) {
+ struct inotify_watch *owatch; /* the old watch */
+
+ owatch = inode_find_dev(inode, dev);
+ owatch->mask = request->mask;
+ spin_unlock(&dev->lock);
+ path_release(&nd);
+
+ return owatch->wd;
+ }
+
+ spin_unlock(&dev->lock);
+
+ watch = create_watch(dev, request->mask, inode);
+ if (!watch) {
+ path_release(&nd);
+ return -ENOSPC;
+ }
+
+ spin_lock(&dev->lock);
+
+ /* We can't add any more watches to this device */
+ if (inotify_dev_add_watch(dev, watch)) {
+ delete_watch(dev, watch);
+ spin_unlock(&dev->lock);
+ path_release(&nd);
+ return -EINVAL;
+ }
+
+ ret = inode_add_watch(inode, watch);
+ if (ret < 0) {
+ list_del_init(&watch->d_list);
+ delete_watch(dev, watch);
+ spin_unlock(&dev->lock);
+ path_release(&nd);
+ return ret;
+ }
+
+ spin_unlock(&dev->lock);
+
+ /*
+ * Demote the reference to nameidata to a reference to the inode held
+ * by the watch.
+ */
+ spin_lock(&inode_lock);
+ __iget(inode);
+ spin_unlock(&inode_lock);
+ path_release(&nd);
+
+ return watch->wd;
+}
+
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ int ret = 0;
+
+ /* hold dev->lock across both the lookup and the removal */
+ spin_lock(&dev->lock);
+ watch = dev_find_wd(dev, wd);
+ if (!watch) {
+ ret = -EINVAL;
+ goto out;
+ }
+ __remove_watch(watch, dev);
+
+out:
+ spin_unlock(&dev->lock);
+ return ret;
+}
+
+/*
+ * inotify_ioctl() - our device file's ioctl method
+ *
+ * The VFS serializes all of our calls via the BKL and we rely on that. We
+ * could, alternatively, grab dev->lock. Right now lower levels grab that
+ * where needed.
+ */
+static int inotify_ioctl(struct inode *ip, struct file *fp,
+ unsigned int cmd, unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ s32 wd;
+
+ dev = fp->private_data;
+ p = (void __user *) arg;
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (copy_from_user(&request, p, sizeof (request)))
+ return -EFAULT;
+ return inotify_add_watch(dev, &request);
+ case INOTIFY_IGNORE:
+ if (copy_from_user(&wd, p, sizeof (wd)))
+ return -EFAULT;
+ return inotify_ignore(dev, wd);
+ case FIONREAD:
+ return put_user(dev->queue_size, (int __user *) p);
+ default:
+ return -ENOTTY;
+ }
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (ret)
+ return ret;
+
+ sysfs_attrib_max_queued_events = 512;
+ sysfs_attrib_max_user_devices = 64;
+ sysfs_attrib_max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch), 0, SLAB_PANIC,
+ NULL, NULL);
+
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event), 0,
+ SLAB_PANIC, NULL, NULL);
+
+ inode_data_cachep = kmem_cache_create("inotify_inode_data_cache",
+ sizeof(struct inotify_inode_data), 0, SLAB_PANIC,
+ NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
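
A note on the filename padding in kernel_event() above: the branchy computation there rounds strlen(filename) + 1 up to the next multiple of sizeof(struct inotify_event). Below is a minimal user-space sketch of the same arithmetic, in case anyone wants to check the edge cases. padded_name_len() and its event_size parameter are illustrative names and not part of the patch; the real structure size comes from include/linux/inotify.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns the padded length
 * that kernel_event() would store in event.len for the given name.
 * event_size stands in for sizeof(struct inotify_event).
 */
size_t padded_name_len(const char *filename, size_t event_size)
{
	size_t len;

	assert(event_size > 0);

	len = strlen(filename) + 1;	/* include the trailing NUL */

	/* round up to the next multiple of event_size */
	return ((len + event_size - 1) / event_size) * event_size;
}
```

With an assumed event_size of 16, a 15-character name still fits in one slot, while a 16-character name spills into a second one because of the trailing NUL.
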
diff -urN linux-2.6.11-rc3-mm2/drivers/char/Kconfig linux-mm-inotify/drivers/char/Kconfig
--- linux-2.6.11-rc3-mm2/drivers/char/Kconfig 2005-02-10 13:17:46.349025952 -0500
+++ linux-mm-inotify/drivers/char/Kconfig 2005-02-10 13:18:40.361814760 -0500
@@ -62,6 +62,19 @@
depends on VT && !S390 && !USERMODE
default y

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config SERIAL_NONSTANDARD
bool "Non-standard serial port support"
---help---
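
For readers wondering what comes back from read() on the single open fd mentioned in the help text: inotify_read() copies out a fixed-size event header followed by event.len bytes of padded filename, repeated for as many events as fit. Here is a rough user-space sketch of walking such a buffer. struct sketch_event and count_events() are hypothetical names, and the field types are an assumption based on the fields the driver sets (wd, mask, cookie, len); the authoritative layout is struct inotify_event in include/linux/inotify.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout; mirrors the fields the driver fills in. */
struct sketch_event {
	int32_t wd;		/* watch descriptor */
	uint32_t mask;		/* event mask */
	uint32_t cookie;	/* cookie passed through from the queued event */
	uint32_t len;		/* bytes of padded name following the header */
};

/*
 * Walk a buffer as filled by read() and count the complete events in it.
 * Returns the number of events, or -1 if the buffer ends mid-event.
 */
int count_events(const char *buf, size_t n)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || n == 0);

	while (off < n) {
		struct sketch_event ev;

		if (n - off < sizeof(ev))
			return -1;	/* truncated header */
		memcpy(&ev, buf + off, sizeof(ev));
		off += sizeof(ev);

		if (ev.len > n - off)
			return -1;	/* truncated name */
		off += ev.len;		/* skip the padded name, if any */
		count++;
	}

	return count;
}
```

Since the driver only hands out whole events, a buffer returned by a single read() should never trip the truncation checks; they are there for callers that slice the buffer themselves.
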
diff -urN linux-2.6.11-rc3-mm2/drivers/char/Makefile linux-mm-inotify/drivers/char/Makefile
--- linux-2.6.11-rc3-mm2/drivers/char/Makefile 2005-02-10 13:17:46.352025496 -0500
+++ linux-mm-inotify/drivers/char/Makefile 2005-02-10 13:18:40.362814608 -0500
@@ -9,6 +9,8 @@

obj-y += mem.o random.o tty_io.o n_tty.o tty_ioctl.o

+
+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_LEGACY_PTYS) += pty.o
obj-$(CONFIG_UNIX98_PTYS) += pty.o
obj-y += misc.o
diff -urN linux-2.6.11-rc3-mm2/fs/attr.c linux-mm-inotify/fs/attr.c
--- linux-2.6.11-rc3-mm2/fs/attr.c 2005-02-10 13:17:35.850621952 -0500
+++ linux-mm-inotify/fs/attr.c 2005-02-10 13:19:39.655800704 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -107,31 +107,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -194,11 +171,9 @@
if (ia_valid & ATTR_SIZE)
up_write(&dentry->d_inode->i_alloc_sem);

- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.11-rc3-mm2/fs/compat.c linux-mm-inotify/fs/compat.c
--- linux-2.6.11-rc3-mm2/fs/compat.c 2005-02-10 13:17:35.890615872 -0500
+++ linux-mm-inotify/fs/compat.c 2005-02-10 13:18:45.405048072 -0500
@@ -36,7 +36,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1233,9 +1233,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.11-rc3-mm2/fs/file_table.c linux-mm-inotify/fs/file_table.c
--- linux-2.6.11-rc3-mm2/fs/file_table.c 2005-02-10 13:17:35.967604168 -0500
+++ linux-mm-inotify/fs/file_table.c 2005-02-10 13:18:45.406047920 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -122,6 +123,9 @@
struct vfsmount *mnt = file->f_vfsmnt;
struct inode *inode = dentry->d_inode;

+
+ fsnotify_close(dentry, inode, file->f_mode, dentry->d_name.name);
+
might_sleep();
/*
* The function eventpoll_release() should be the first called
diff -urN linux-2.6.11-rc3-mm2/fs/inode.c linux-mm-inotify/fs/inode.c
--- linux-2.6.11-rc3-mm2/fs/inode.c 2005-02-10 13:17:47.899790200 -0500
+++ linux-mm-inotify/fs/inode.c 2005-02-10 13:18:45.407047768 -0500
@@ -132,6 +132,9 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ inode->inotify_data = NULL;
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.11-rc3-mm2/fs/namei.c linux-mm-inotify/fs/namei.c
--- linux-2.6.11-rc3-mm2/fs/namei.c 2005-02-10 13:17:47.918787312 -0500
+++ linux-mm-inotify/fs/namei.c 2005-02-10 13:18:45.409047464 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1252,7 +1252,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1557,7 +1557,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1630,7 +1630,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1720,10 +1720,8 @@
}
}
up(&dentry->d_inode->i_sem);
- if (!error) {
- inode_dir_notify(dir, DN_DELETE);
- d_delete(dentry);
- }
+ if (!error)
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
dput(dentry);

return error;
@@ -1793,10 +1791,9 @@
up(&dentry->d_inode->i_sem);

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
- if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
- d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
- }
+ if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED))
+ fsnotify_unlink(dentry->d_inode, dir, dentry);
+
return error;
}

@@ -1870,7 +1867,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1943,7 +1940,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2107,6 +2104,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2128,18 +2126,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.11-rc3-mm2/fs/open.c linux-mm-inotify/fs/open.c
--- linux-2.6.11-rc3-mm2/fs/open.c 2005-02-10 13:17:36.101583800 -0500
+++ linux-mm-inotify/fs/open.c 2005-02-10 13:18:45.410047312 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -944,9 +944,14 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+ struct dentry *dentry;
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ dentry = f->f_dentry;
+ fsnotify_open(dentry, dentry->d_inode,
+ dentry->d_name.name);
fd_install(fd, f);
}
out:
@@ -998,7 +1003,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.11-rc3-mm2/fs/read_write.c linux-mm-inotify/fs/read_write.c
--- linux-2.6.11-rc3-mm2/fs/read_write.c 2005-02-10 13:17:48.151751896 -0500
+++ linux-mm-inotify/fs/read_write.c 2005-02-10 13:31:10.191823272 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -239,7 +239,10 @@
else
ret = do_sync_read(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ fsnotify_access(dentry, inode,
+ dentry->d_name.name);
current->rchar += ret;
}
current->syscr++;
@@ -287,7 +290,10 @@
else
ret = do_sync_write(file, buf, count, pos);
if (ret > 0) {
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+ fsnotify_modify(dentry, inode,
+ dentry->d_name.name);
current->wchar += ret;
}
current->syscw++;
@@ -523,9 +529,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+
+ if (type == READ)
+ fsnotify_access(dentry, inode, dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, inode, dentry->d_name.name);
+ }
return ret;
Efault:
ret = -EFAULT;
diff -urN linux-2.6.11-rc3-mm2/fs/super.c linux-mm-inotify/fs/super.c
--- linux-2.6.11-rc3-mm2/fs/super.c 2005-02-10 13:17:36.202568448 -0500
+++ linux-mm-inotify/fs/super.c 2005-02-10 13:18:45.413046856 -0500
@@ -37,9 +37,9 @@
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
#include <linux/kobject.h>
+#include <linux/fsnotify.h>
#include <asm/uaccess.h>

-
void get_filesystem(struct file_system_type *fs);
void put_filesystem(struct file_system_type *fs);
struct file_system_type *get_fs_type(const char *name);
@@ -229,6 +229,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.11-rc3-mm2/include/linux/fs.h linux-mm-inotify/include/linux/fs.h
--- linux-2.6.11-rc3-mm2/include/linux/fs.h 2005-02-10 13:17:49.275581048 -0500
+++ linux-mm-inotify/include/linux/fs.h 2005-02-10 13:18:45.415046552 -0500
@@ -26,6 +26,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -474,6 +475,10 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct inotify_inode_data *inotify_data;
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1374,7 +1379,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.11-rc3-mm2/include/linux/fsnotify.h linux-mm-inotify/include/linux/fsnotify.h
--- linux-2.6.11-rc3-mm2/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-mm-inotify/include/linux/fsnotify.h 2005-02-10 13:18:45.416046400 -0500
@@ -0,0 +1,235 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct inode *inode, struct inode *dir,
+ struct dentry *dentry)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ filename);
+ inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0, filename);
+ inotify_inode_queue_event(inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ inotify_inode_queue_event(inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0, filename);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct dentry *dentry, struct inode *inode,
+ mode_t mode, const char *filename)
+{
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - flush time!
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
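
The attribute-to-event mapping in fsnotify_change() above can be tried out in userspace. The following is only a sketch: the DEMO_ATTR_* bits are arbitrary placeholder values chosen for illustration (the kernel's real ATTR_* values are not shown in this patch), the DEMO_IN_* bits are copied from the inotify.h added by this patch, and demo_change_mask() is a hypothetical name. It folds the UID, GID and MODE branches into one test, which gives the same result as the separate branches in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder attribute bits, for illustration only. */
#define DEMO_ATTR_MODE  (1u << 0)
#define DEMO_ATTR_UID   (1u << 1)
#define DEMO_ATTR_GID   (1u << 2)
#define DEMO_ATTR_SIZE  (1u << 3)
#define DEMO_ATTR_ATIME (1u << 4)
#define DEMO_ATTR_MTIME (1u << 5)

/* Event bits as defined by inotify.h in this patch. */
#define DEMO_IN_ACCESS 0x00000001u
#define DEMO_IN_MODIFY 0x00000002u
#define DEMO_IN_ATTRIB 0x00000004u

/* Map a set of changed attributes to the inotify event mask. */
static uint32_t demo_change_mask(unsigned int ia_valid)
{
	const unsigned int both = DEMO_ATTR_ATIME | DEMO_ATTR_MTIME;
	uint32_t mask = 0;

	/* ownership and mode changes are attribute changes */
	if (ia_valid & (DEMO_ATTR_UID | DEMO_ATTR_GID | DEMO_ATTR_MODE))
		mask |= DEMO_IN_ATTRIB;
	/* a size change (truncate) counts as a modification */
	if (ia_valid & DEMO_ATTR_SIZE)
		mask |= DEMO_IN_MODIFY;
	/* both times at once implies a utime(s) call */
	if ((ia_valid & both) == both)
		mask |= DEMO_IN_ATTRIB;
	else if (ia_valid & DEMO_ATTR_ATIME)
		mask |= DEMO_IN_ACCESS;
	else if (ia_valid & DEMO_ATTR_MTIME)
		mask |= DEMO_IN_MODIFY;

	return mask;
}

/* A few spot checks that double as usage examples. */
static void demo_change_mask_examples(void)
{
	assert(demo_change_mask(DEMO_ATTR_UID) == DEMO_IN_ATTRIB);
	assert(demo_change_mask(DEMO_ATTR_ATIME) == DEMO_IN_ACCESS);
	assert(demo_change_mask(DEMO_ATTR_ATIME | DEMO_ATTR_MTIME) ==
	       DEMO_IN_ATTRIB);
}
```

Note that a lone atime update is reported as an access, while setting both times together is reported as an attribute change, matching the "both times implies a utime(s) call" comment in the patch.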
diff -urN linux-2.6.11-rc3-mm2/include/linux/inotify.h linux-mm-inotify/include/linux/inotify.h
--- linux-2.6.11-rc3-mm2/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-mm-inotify/include/linux/inotify.h 2005-02-10 13:18:45.417046248 -0500
@@ -0,0 +1,118 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When you are watching a directory, you will receive the filename for events
+ * such as IN_CREATE_FILE, IN_DELETE_FILE, IN_OPEN, IN_CLOSE, ..., relative to the wd.
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie used for synchronizing two events */
+ size_t len; /* length (including nulls) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* directory name */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+
+struct inotify_inode_data {
+ struct list_head watches; /* list of watches on this inode */
+ spinlock_t lock; /* lock protecting the struct */
+ atomic_t count; /* ref count */
+};
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern __u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline __u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.11-rc3-mm2/include/linux/miscdevice.h linux-mm-inotify/include/linux/miscdevice.h
--- linux-2.6.11-rc3-mm2/include/linux/miscdevice.h 2005-02-10 13:17:37.578359296 -0500
+++ linux-mm-inotify/include/linux/miscdevice.h 2005-02-10 13:18:45.418046096 -0500
@@ -2,6 +2,7 @@
#define _LINUX_MISCDEVICE_H
#include <linux/module.h>
#include <linux/major.h>
+#include <linux/device.h>

#define PSMOUSE_MINOR 1
#define MS_BUSMOUSE_MINOR 2
diff -urN linux-2.6.11-rc3-mm2/include/linux/sched.h linux-mm-inotify/include/linux/sched.h
--- linux-2.6.11-rc3-mm2/include/linux/sched.h 2005-02-10 13:17:49.435556728 -0500
+++ linux-mm-inotify/include/linux/sched.h 2005-02-10 13:18:45.419045944 -0500
@@ -403,6 +403,8 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.11-rc3-mm2/kernel/user.c linux-mm-inotify/kernel/user.c
--- linux-2.6.11-rc3-mm2/kernel/user.c 2005-02-10 13:17:49.580534688 -0500
+++ linux-mm-inotify/kernel/user.c 2005-02-10 13:18:45.420045792 -0500
@@ -120,6 +120,8 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);

new->mq_bytes = 0;
new->locked_shm = 0;
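
For readers who want to see how the cookie in fsnotify_move() is meant to be consumed, here is a small userspace sketch. It assumes each event read from the device is laid out as the fixed fields of struct inotify_event followed immediately by len bytes of name; the patch's header comment suggests this, but the driver code that produces the events is not shown in this part of the diff. The struct demo_event_hdr, demo_put_event() and demo_find_move() names are hypothetical, and the buffer is filled by hand rather than read from a real device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of an event, mirroring struct inotify_event from the patch. */
struct demo_event_hdr {
	int32_t  wd;      /* watch descriptor */
	uint32_t mask;    /* event bits */
	uint32_t cookie;  /* ties IN_MOVED_FROM to IN_MOVED_TO */
	size_t   len;     /* bytes of name that follow, NUL included */
};

#define DEMO_IN_MOVED_FROM 0x00000040u
#define DEMO_IN_MOVED_TO   0x00000080u

/* Helper: append one event to buf and return the new fill level. */
static size_t demo_put_event(char *buf, size_t cap, size_t off, int32_t wd,
			     uint32_t mask, uint32_t cookie, const char *name)
{
	struct demo_event_hdr ev;
	size_t len = name ? strlen(name) + 1 : 0;

	assert(off + sizeof(ev) + len <= cap);
	memset(&ev, 0, sizeof(ev));
	ev.wd = wd;
	ev.mask = mask;
	ev.cookie = cookie;
	ev.len = len;
	memcpy(buf + off, &ev, sizeof(ev));
	if (len)
		memcpy(buf + off + sizeof(ev), name, len);
	return off + sizeof(ev) + len;
}

/*
 * Walk the buffer and report the first IN_MOVED_FROM / IN_MOVED_TO pair
 * that shares a cookie. Returns 1 and sets *from and *to on success,
 * 0 if no matching pair is found.
 */
static int demo_find_move(const char *buf, size_t total,
			  const char **from, const char **to)
{
	const char *pending = NULL;
	uint32_t pending_cookie = 0;
	size_t off = 0;

	while (total - off >= sizeof(struct demo_event_hdr)) {
		struct demo_event_hdr ev;
		const char *name;

		memcpy(&ev, buf + off, sizeof(ev)); /* no unaligned access */
		off += sizeof(ev);
		if (ev.len > total - off)
			break;                      /* truncated event */
		name = ev.len ? buf + off : NULL;
		off += ev.len;

		if (ev.mask & DEMO_IN_MOVED_FROM) {
			pending = name;
			pending_cookie = ev.cookie;
		} else if ((ev.mask & DEMO_IN_MOVED_TO) && pending &&
			   ev.cookie == pending_cookie) {
			*from = pending;
			*to = name;
			return 1;
		}
	}
	return 0;
}
```

A real consumer would fill the buffer with read() on the inotify device and would probably have to tolerate unrelated events arriving between the two halves of a rename; this sketch only remembers the most recent IN_MOVED_FROM.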


2005-02-10 19:22:08

by Adrian Bunk

Subject: [2.6.11-rc3-mm2 patch] mxser.c: remove unused variable

The following warning comes from Linus' tree:

<-- snip -->

...
CC drivers/char/mxser.o
drivers/char/mxser.c: In function `mxser_initbrd':
drivers/char/mxser.c:551: warning: unused variable `flags'
...

<-- snip -->


The fix is simple:

Signed-off-by: Adrian Bunk <[email protected]>

--- linux-2.6.11-rc3-mm2-full/drivers/char/mxser.c.old 2005-02-10 19:58:36.000000000 +0100
+++ linux-2.6.11-rc3-mm2-full/drivers/char/mxser.c 2005-02-10 19:58:56.000000000 +0100
@@ -548,7 +548,6 @@
static int mxser_initbrd(int board, struct mxser_hwconf *hwconf)
{
struct mxser_struct *info;
- unsigned long flags;
int retval;
int i, n;


2005-02-10 20:01:18

by Andrew Morton

Subject: Re: 2.6.11-rc3-mm2

Christoph Hellwig <[email protected]> wrote:
>
> On Thu, Feb 10, 2005 at 02:35:08AM -0800, Andrew Morton wrote:
> >
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
> >
> >
> > - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> > It seems that nothing else is going to come along and this is completely
> > encapsulated.
>
> Even if we accept a module that grants capabilities to groups this isn't fine
> yet because it only supports two specific capabilities (and even those two in
> different ways!) instead of adding generic support to bind capabilities to
> groups.

I'm sure that got discussed somewhere in the 1000 emails which flew past
last time. Jack?

2005-02-10 20:53:30

by Jack O'Quin

Subject: Re: 2.6.11-rc3-mm2

[direct reply bounced, resending via gmail]

Andrew Morton <[email protected]> writes:

> Christoph Hellwig <[email protected]> wrote:
> >
> > On Thu, Feb 10, 2005 at 02:35:08AM -0800, Andrew Morton wrote:
> > >
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
> > >
> > >
> > > - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> > > It seems that nothing else is going to come along and this is completely
> > > encapsulated.
> >
> > Even if we accept a module that grants capabilities to groups this
> > isn't fine yet because it only supports two specific capabilities
> > (and even those two in different ways!) instead of adding generic
> > support to bind capabilities to groups.
>
> I'm sure that got discussed somewhere in the 1000 emails which flew past
> last time. Jack?

[adding cc: for the main discussion participants]

Most people felt that a more general capabilities module would be nice
to have. But, no one offered any code, or volunteered to work on it.

I have no objection to that approach, but am not willing or able to do
it myself. My opinion is that expanding the scope of the LSM would
significantly increase its security risk. That job needs to be done
very carefully, by someone with a deep understanding of the kernel's
internal use of capabilities.

Perhaps, Christoph's suggestion could become part of a more general
module, which might replace the RT-LSM in the 2.8 timeframe. Our LSM
is a modest solution aimed at solving the immediate needs of audio
developers and users with minimal impact on kernel security or
correctness.

2005-02-10 22:13:20

by Corey Minyard

Subject: Re: 2.6.11-rc3-mm2

Andrew Morton wrote:

>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
>
>
>- Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> It seems that nothing else is going to come along and this is completely
> encapsulated.
>
>- Various other stuff. If anyone has a patch in here which they think
> should be in 2.6.11, please let me know. I'm intending to merge the
> following into 2.6.11:
>
> alpha-add-missing-dma_mapping_error.patch
> fix-compat-shmget-overflow.patch
> fix-shmget-for-ppc64-s390-64-sparc64.patch
> binfmt_elf-clearing-bss-may-fail.patch
> qlogic-warning-fixes.patch
> oprofile-exittext-referenced-in-inittext.patch
> force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch
> oprofile-arm-xscale1-pmu-support-fix.patch
>
>
>
>
The following one should probably go in:

>+update-to-ipmi-driver-to-support-old-dmi-spec.patch
>
>
Systems with old data will not work correctly without it. There seem
to be a few of them out there.

-Corey

2005-02-10 22:43:39

by Benjamin Herrenschmidt

Subject: Re: 2.6.11-rc3-mm2

On Thu, 2005-02-10 at 02:35 -0800, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
>
>
> - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> It seems that nothing else is going to come along and this is completely
> encapsulated.
>
> - Various other stuff. If anyone has a patch in here which they think
> should be in 2.6.11, please let me know. I'm intending to merge the
> following into 2.6.11:
>
> alpha-add-missing-dma_mapping_error.patch
> fix-compat-shmget-overflow.patch
> fix-shmget-for-ppc64-s390-64-sparc64.patch
> binfmt_elf-clearing-bss-may-fail.patch
> qlogic-warning-fixes.patch
> oprofile-exittext-referenced-in-inittext.patch
> force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch
> oprofile-arm-xscale1-pmu-support-fix.patch

Without the aty128fb and radeonfb updates, current 2.6.11 is a
regression on pmac as it breaks sleep support on previously working
laptops. If you don't intend to get at least
try_to_acquire_console_sem() and aty128fb fix in, in which case i can
send you a minimal radeonfb patch, then I'll have to make another patch
for 2.6.11 that reverts some of the arch changes to re-enable sleep on
those machines.

Ben.


2005-02-10 22:57:21

by Andrew Morton

Subject: Re: 2.6.11-rc3-mm2

Benjamin Herrenschmidt <[email protected]> wrote:
>
> On Thu, 2005-02-10 at 02:35 -0800, Andrew Morton wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
> >
> >
> > - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> > It seems that nothing else is going to come along and this is completely
> > encapsulated.
> >
> > - Various other stuff. If anyone has a patch in here which they think
> > should be in 2.6.11, please let me know. I'm intending to merge the
> > following into 2.6.11:
> >
> > alpha-add-missing-dma_mapping_error.patch
> > fix-compat-shmget-overflow.patch
> > fix-shmget-for-ppc64-s390-64-sparc64.patch
> > binfmt_elf-clearing-bss-may-fail.patch
> > qlogic-warning-fixes.patch
> > oprofile-exittext-referenced-in-inittext.patch
> > force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch
> > oprofile-arm-xscale1-pmu-support-fix.patch
>
> Without the aty128fb and radeonfb updates, current 2.6.11 is a
> regression on pmac as it breaks sleep support on previously working
> laptops.

Is that worse than the risk of the large patch?

> If you don't intend to get at least
> try_to_acquire_console_sem() and aty128fb fix in, in which case i can
> send you a minimal radeonfb patch, then I'll have to make another patch
> for 2.6.11 that reverts some of the arch changes to re-enable sleep on
> those machines.

Ho hum. PM and fbdev are regularly broken anyway. Please always identify
the patches by name - it helps avoid mistakes.

These?

add-try_acquire_console_sem.patch
update-aty128fb-sleep-wakeup-code-for-new-powermac-changes.patch
radeonfb-update.patch
radeonfb-build-fix.patch

2005-02-10 23:18:29

by Adrian Bunk

Subject: Re: 2.6.11-rc3-mm2

On Thu, Feb 10, 2005 at 02:35:08AM -0800, Andrew Morton wrote:
>...
> - Various other stuff. If anyone has a patch in here which they think
> should be in 2.6.11, please let me know. I'm intending to merge the
> following into 2.6.11:
>
> alpha-add-missing-dma_mapping_error.patch
> fix-compat-shmget-overflow.patch
> fix-shmget-for-ppc64-s390-64-sparc64.patch
> binfmt_elf-clearing-bss-may-fail.patch
> qlogic-warning-fixes.patch
> oprofile-exittext-referenced-in-inittext.patch
> force-read-implies-exec-for-all-32bit-processes-in-x86-64.patch
> oprofile-arm-xscale1-pmu-support-fix.patch
>...

As described in the patch description, I'd like to see
mark-the-mcd-cdrom-driver-as-broken.patch in 2.6.11 .

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2005-02-10 23:33:40

by Benjamin Herrenschmidt

Subject: Re: 2.6.11-rc3-mm2

> > Without the aty128fb and radeonfb updates, current 2.6.11 is a
> > regression on pmac as it breaks sleep support on previously working
> > laptops.
>
> Is that worse than the risk of the large patch?

Well, it used to work upstream fine for some time now... The large patch
isn't risky imho, at least in the latest version I sent you. The bulk of
the changes are just code to re-initialize new chip that isn't executed
at all on earlier models. The main radeonfb code changes very little. I
haven't had a failure report with the latest patch yet.

> > If you don't intend to get at least
> > try_to_acquire_console_sem() and aty128fb fix in, in which case i can
> > send you a minimal radeonfb patch, then I'll have to make another patch
> > for 2.6.11 that reverts some of the arch changes to re-enable sleep on
> > those machines.
>
> Ho hum. PM and fbdev are regularly broken anyway. Please always identify
> the patches by name - it helps avoid mistakes.

Ahem ... not that badly broken on releases, I've been careful enough
that at least, powerbook sleep worked fine for some time now.

> These?
>
> add-try_acquire_console_sem.patch
> update-aty128fb-sleep-wakeup-code-for-new-powermac-changes.patch

Those 2 first at least yes

> radeonfb-update.patch
> radeonfb-build-fix.patch

And either the above, or I can do a minimal patch on radeonfb just
restoring sleep on earlier models (adding the pmac_feature call to
notify the arch code that we can wakeup the chip) if you don't want to
merge the bigger update.

Ben.


2005-02-11 00:04:58

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Thu, Feb 10, 2005 at 02:51:44PM -0600, Jack O'Quin wrote:
> [direct reply bounced, resending via gmail]
>
> Andrew Morton <[email protected]> writes:
>
> > Christoph Hellwig <[email protected]> wrote:
> > >
> > > On Thu, Feb 10, 2005 at 02:35:08AM -0800, Andrew Morton wrote:
> > > >
> > > >
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/
> > > >
> > > >
> > > > - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
> > > > It seems that nothing else is going to come along and this is completely
> > > > encapsulated.
> > >
> > > Even if we accept a module that grants capabilities to groups this
> > > isn't fine yet because it only supports two specific capabilities
> > > (and even those two in different ways!) instead of adding generic
> > > support to bind capabilities to groups.
> >
> > I'm sure that got discussed somewhere in the 1000 emails which flew past
> > last time. Jack?
>
> [adding cc: for the main discussion participants]
>
> Most people felt that a more general capabilities module would be nice
> to have. But, no one offered any code, or volunteered to work on it.

What happened to the RT rlimit code from Chris?

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 00:47:38

by Chris Wright

Subject: Re: 2.6.11-rc3-mm2

* Matt Mackall ([email protected]) wrote:
> What happened to the RT rlimit code from Chris?

I still have it, but I had the impression Ingo didn't like it as a long
term solution/hack (albeit small) to the scheduler. Whereas the rt-lsm
patch is wholly self-contained.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2005-02-11 02:10:21

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Thu, Feb 10, 2005 at 04:47:27PM -0800, Chris Wright wrote:
> * Matt Mackall ([email protected]) wrote:
> > What happened to the RT rlimit code from Chris?
>
> I still have it, but I had the impression Ingo didn't like it as a long
> term solution/hack (albeit small) to the scheduler. Whereas the rt-lsm
> patch is wholly self-contained.

I think it's important to recognize that we're trying to address an
issue that has a much wider potential audience than pro audio users,
and not very far off - what is high end audio performance today will be
expected desktop performance next year.

So I think it's critical that we find a solution that's appropriate for
_every single box_, because realistically vendors are going to ship
with this "wholly self-contained" feature turned on by default next
year, at which point the "containment" will be nil and whatever warts
it has will be with us forever.

The rlimit stuff is not perfect, but it's a much better fit for the
UNIX model generally, which is a fairly big win. Having it in the
system unconditionally doesn't trigger the gag reflex in quite the
same way as the LSM approach.

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 02:22:45

by Nick Piggin

Subject: Re: 2.6.11-rc3-mm2

On Thu, 2005-02-10 at 18:09 -0800, Matt Mackall wrote:
> On Thu, Feb 10, 2005 at 04:47:27PM -0800, Chris Wright wrote:
> > * Matt Mackall ([email protected]) wrote:
> > > What happened to the RT rlimit code from Chris?
> >
> > I still have it, but I had the impression Ingo didn't like it as a long
> > term solution/hack (albeit small) to the scheduler. Whereas the rt-lsm
> > patch is wholly self-contained.
>
> I think it's important to recognize that we're trying to address an
> issue that has a much wider potential audience than pro audio users,
> and not very far off - what is high end audio performance today will be
> expected desktop performance next year.
>
> So I think it's critical that we find a solution that's appropriate for
> _every single box_, because realistically vendors are going to ship
> with this "wholly self-contained" feature turned on by default next
> year, at which point the "containment" will be nil and whatever warts
> it has will be with us forever.
>
> The rlimit stuff is not perfect, but it's a much better fit for the
> UNIX model generally, which is a fairly big win. Having it in the
> system unconditionally doesn't trigger the gag reflex in quite the
> same way as the LSM approach.
>

Without considering the userspace aspect, RT rlimits is the best
implementation I have seen. All others either break RT scheduling
semantics, or don't allow any way for root to maintain control of
the system after giving out RT privileges.
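
To make the rlimit idea concrete: the RT rlimit code referred to in this thread is not shown here, so the sketch below instead uses RLIMIT_RTPRIO, the per-process limit on realtime priority that mainline Linux exposes through getrlimit(2) and setrlimit(2) since 2.6.12. It may differ from the patch under discussion. The function names are hypothetical, and the check deliberately ignores processes holding CAP_SYS_NICE, which are not bound by the limit.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Returns 1 if an unprivileged caller could request realtime priority
 * `prio` (realtime priorities run from 1 to 99) under the current soft
 * limit, 0 if not, and -1 if the limit cannot be read.
 */
static int demo_rtprio_allowed(int prio)
{
	struct rlimit rl;

	assert(prio >= 0);
	if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
		return -1;
	return (rlim_t)prio <= rl.rlim_cur ? 1 : 0;
}

/*
 * Lower the soft limit to zero. Lowering a soft limit needs no
 * privilege. Returns 0 on success, -1 on error.
 */
static int demo_rtprio_drop(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
		return -1;
	rl.rlim_cur = 0;
	return setrlimit(RLIMIT_RTPRIO, &rl);
}
```

The point relevant to the argument above is that the hard limit is set by the administrator and inherited by child processes, so root keeps a ceiling on what users can request without touching the scheduler's realtime semantics.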



2005-02-11 03:26:29

by Peter Williams

Subject: Re: 2.6.11-rc3-mm2

Nick Piggin wrote:
> On Thu, 2005-02-10 at 18:09 -0800, Matt Mackall wrote:
>
>>On Thu, Feb 10, 2005 at 04:47:27PM -0800, Chris Wright wrote:
>>
>>>* Matt Mackall ([email protected]) wrote:
>>>
>>>>What happened to the RT rlimit code from Chris?
>>>
>>>I still have it, but I had the impression Ingo didn't like it as a long
>>>term solution/hack (albeit small) to the scheduler. Whereas the rt-lsm
>>>patch is wholly self-contained.
>>
>>I think it's important to recognize that we're trying to address an
>>issue that has a much wider potential audience than pro audio users,
>>and not very far off - what is high end audio performance today will be
>>expected desktop performance next year.
>>
>>So I think it's critical that we find a solution that's appropriate for
>>_every single box_, because realistically vendors are going to ship
>>with this "wholly self-contained" feature turned on by default next
>>year, at which point the "containment" will be nil and whatever warts
>>it has will be with us forever.
>>
>>The rlimit stuff is not perfect, but it's a much better fit for the
>>UNIX model generally, which is a fairly big win. Having it in the
>>system unconditionally doesn't trigger the gag reflex in quite the
>>same way as the LSM approach.
>>
>
>
> Without considering the userspace aspect, RT rlimits is the best
> implementation I have seen. All others either break RT scheduling
> semantics, or don't allow any way for root to maintain control of
> the system after giving out RT privileges.

Personally, I think that the best approach to solving this problem is
from the privileges aspect. The ability to grant privileges to only set
RT policy is just an example of a general need for granting limited
privileges to a program and/or a user. So a solution that involved a
mechanism for granting a specified subset of root privileges to
specified users when running specified programs would have wider
application.

My limited understanding of SELinux (which may be mistaken) is that it
provides a basic framework for this level of privilege control and
perhaps the solution lies there.

Peter
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2005-02-11 03:41:47

by Paul Davis

Subject: Re: 2.6.11-rc3-mm2

[ the best solution is .... ]

[ my preferred solution is ... ]

[ it would be better if ... ]

[ this is a kludge and it should be done instead like ... ]

did nobody read what andrew wrote and what JOQ pointed out?

after weeks of debating this, no other conceptual solution emerged
that did not have at least as many problems as the RT LSM module, and
all other proposed solutions were also more invasive of other aspects
of kernel design and operations than RT LSM is.

--p

2005-02-11 05:04:56

by Nick Piggin

Subject: Re: 2.6.11-rc3-mm2

On Thu, 2005-02-10 at 22:41 -0500, Paul Davis wrote:
> [ the best solution is .... ]
>
> [ my preferred solution is ... ]
>
> [ it would be better if ... ]
>
> [ this is a kludge and it should be done instead like ... ]
>
> did nobody read what andrew wrote and what JOQ pointed out?
>
> after weeks of debating this, no other conceptual solution emerged
> that did not have at least as many problems as the RT LSM module, and
> all other proposed solutions were also more invasive of other aspects
> of kernel design and operations than RT LSM is.
>

Sure, it is quick and easy. Suits some. At least I do prefer
this to altering the semantics of realtime scheduling.

I can't say much about it because I'm not putting my hand up to
do anything. Just mentioning that rlimit would be better if not
for the userspace side of the equation. I think most were already
agreed on that point anyway though.

Nick



2005-02-11 05:09:53

by Peter Williams

Subject: Re: 2.6.11-rc3-mm2

Paul Davis wrote:
> [ the best solution is .... ]
>
> [ my preferred solution is ... ]
>
> [ it would be better if ... ]
>
> [ this is a kludge and it should be done instead like ... ]
>
> did nobody read what andrew wrote and what JOQ pointed out?
>
> after weeks of debating this, no other conceptual solution emerged
> that did not have at least as many problems as the RT LSM module, and
> all other proposed solutions were also more invasive of other aspects
> of kernel design and operations than RT LSM is.

As I see it, what I said was in support of RT LSM (or at least the
approach that RT LSM is taking) so why are you attacking me? I'm on
your side :-)

Peter
PS I'm withdrawing the "unprivileged real time" feature from the
spa_no_frills and zaphod schedulers in the PlugSched patch as a result
of the discussions on SCHED_ISO and RT rlimits because the discussion
convinced me that it's the wrong way to go.
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2005-02-11 06:14:39

by John Cherry

Subject: Re: 2.6.11-rc3-mm2 (compile stats)

Linux 2.6 (mm tree) Compile Statistics (gcc 3.4.1)

Web page with links to complete details:
http://developer.osdl.org/cherry/compile/

Kernel bzImage bzImage bzImage modules bzImage modules
(defconfig) (allno) (allyes) (allyes) (allmod) (allmod)
--------------- ---------- -------- -------- -------- -------- --------
2.6.11-rc3-mm2 14w/0e 0w/0e 192w/0e 6w/0e 19w/0e 172w/0e
2.6.11-rc3-mm1 13w/10e 0w/7e 196w/12e 6w/0e 18w/12e 177w/0e
2.6.11-rc2-mm2 15w/0e 0w/0e 201w/0e 6w/0e 18w/0e 182w/0e
2.6.11-rc2-mm1 15w/0e 0w/0e 306w/14e 6w/0e 18w/0e 294w/0e
2.6.11-rc1-mm2 21w/0e 0w/0e 316w/9e 6w/0e 22w/0e 294w/0e
2.6.11-rc1-mm1 21w/0e 0w/0e 319w/0e 6w/0e 23w/0e 298w/0e
2.6.10-mm3 21w/0e 0w/0e 320w/0e 6w/0e 23w/0e 299w/0e
2.6.10-mm2 21w/0e 0w/0e 440w/0e 6w/0e 23w/0e 420w/0e
2.6.10-mm1 12w/0e 0w/0e 414w/0e 6w/0e 17w/0e 399w/0e
2.6.10-rc3-mm1 12w/0e 0w/0e 414w/0e 6w/0e 16w/0e 401w/0e
2.6.10-rc2-mm4 15w/0e 1w/7e 421w/0e 6w/0e 16w/0e 408w/0e
2.6.10-rc2-mm3 15w/0e 0w/0e 1255w/12e 66w/0e 16w/0e 1507w/0e
2.6.10-rc2-mm2 15w/0e 0w/0e 1362w/15e 65w/0e 16w/0e 1612w/2e
2.6.10-rc2-mm1 15w/0e 0w/0e 1405w/11e 65w/0e 16w/0e 1652w/0e
2.6.10-rc1-mm5 16w/0e 0w/0e 1587w/0e 65w/0e 20w/0e 1834w/0e
2.6.10-rc1-mm4 16w/0e 0w/0e 1485w/9e 65w/0e 20w/0e 1732w/0e
(Compiles with gcc 3.2.2)
2.6.10-rc1-mm3 7w/31e 0w/9e 496w/141e 4w/0e 4w/50e 693w/83e
2.6.10-rc1-mm2 16w/1e 1w/1e 529w/1e 4w/0e 12w/1e 729w/0e
2.6.10-rc1-mm1 16w/1e 1w/1e 592w/1e 4w/0e 13w/1e 857w/0e
2.6.9-mm1 6w/1e 1w/1e 1761w/15e 65w/0e 9w/0e 2086w/0e
2.6.9-rc4-mm1 5w/0e 0w/0e 1766w/11e 43w/0e 6w/0e 1798w/0e
2.6.9-rc3-mm3 5w/0e 0w/0e 1756w/11e 43w/0e 4w/0e 1786w/0e
2.6.9-rc3-mm2 10w/0e 4w/9e 1754w/14e 43w/0e 4w/0e 1782w/1e
2.6.9-rc3-mm1 10w/0e 4w/10e 1768w/0e 43w/0e 4w/0e 1796w/0e
2.6.9-rc2-mm4 10w/0e 5w/0e 2573w/0e 41w/0e 4w/0e 2600w/0e
2.6.9-rc2-mm3 10w/0e 5w/0e 2400w/0e 41w/0e 4w/0e 2435w/0e
2.6.9-rc2-mm2 10w/0e 5w/0e 2919w/0e 41w/0e 4w/0e 2954w/0e
2.6.9-rc2-mm1 0w/0e 2w/0e 3541w/9e 41w/0e 3w/9e 3567w/0e
2.6.9-rc1-mm4 0w/0e 1w/0e 55w/0e 3w/0e 2w/0e 48w/0e
2.6.9-rc1-mm3 0w/0e 0w/0e 55w/13e 3w/0e 1w/0e 49w/1e
2.6.9-rc1-mm2 0w/0e 0w/0e 53w/11e 3w/0e 1w/0e 47w/0e
2.6.9-rc1-mm1 0w/0e 0w/0e 80w/0e 4w/0e 1w/0e 74w/0e
2.6.8.1-mm4 0w/0e 0w/0e 78w/0e 4w/0e 1w/0e 73w/0e
2.6.8.1-mm3 0w/96e 0w/0e 78w/97e 4w/0e 1w/0e 74w/89e
2.6.8.1-mm2 0w/96e 0w/0e 78w/97e 4w/0e 1w/0e 74w/89e
2.6.8.1-mm1 0w/0e 0w/0e 78w/0e 4w/0e 1w/0e 74w/0e
2.6.8-rc4-mm1 0w/0e 0w/5e 81w/0e 4w/0e 1w/0e 75w/0e
2.6.8-rc3-mm2 1w/7e 0w/5e 82w/8e 4w/0e 2w/8e 75w/0e
2.6.8-rc3-mm1 0w/0e 1w/5e 81w/9e 4w/0e 1w/0e 75w/0e
2.6.8-rc2-mm2 0w/0e 4w/5e 87w/9e 4w/0e 1w/0e 80w/0e
2.6.8-rc2-mm1 0w/0e 0w/0e 83w/9e 3w/0e 1w/0e 81w/0e
2.6.8-rc1-mm1 0w/0e 0w/0e 88w/9e 5w/0e 1w/0e 87w/0e
2.6.7-mm7 0w/0e 0w/0e 89w/9e 5w/0e 1w/0e 84w/0e
2.6.7-mm6 0w/0e 0w/0e 85w/9e 5w/0e 1w/0e 80w/0e
2.6.7-mm5 0w/0e 0w/0e 92w/0e 5w/0e 1w/0e 87w/0e
2.6.7-mm4 0w/0e 0w/0e 94w/0e 5w/0e 1w/0e 89w/0e
2.6.7-mm3 0w/0e 0w/0e 90w/6e 5w/0e 1w/0e 86w/0e
2.6.7-mm2 0w/0e 0w/0e 109w/0e 7w/0e 1w/0e 106w/0e
2.6.7-mm1 0w/0e 5w/0e 108w/0e 5w/0e 1w/0e 104w/0e
2.6.7-rc3-mm2 0w/0e 5w/0e 105w/10e 5w/0e 2w/0e 100w/2e
2.6.7-rc3-mm1 0w/0e 5w/0e 104w/10e 5w/0e 2w/0e 100w/2e
2.6.7-rc2-mm2 0w/0e 5w/0e 109w/10e 5w/0e 2w/0e 105w/2e
2.6.7-rc2-mm1 0w/0e 12w/0e 158w/13e 5w/0e 3w/0e 153w/4e
2.6.7-rc1-mm1 0w/0e 6w/0e 108w/0e 5w/0e 2w/0e 104w/0e
2.6.6-mm5 0w/0e 0w/0e 109w/5e 5w/0e 2w/0e 110w/0e
2.6.6-mm4 0w/0e 0w/0e 112w/9e 5w/0e 2w/5e 106w/1e
2.6.6-mm3 3w/9e 0w/0e 120w/26e 5w/0e 2w/0e 114w/10e
2.6.6-mm2 4w/11e 0w/0e 120w/24e 6w/0e 2w/0e 118w/9e
2.6.6-mm1 1w/0e 0w/0e 118w/25e 6w/0e 2w/0e 114w/10e
2.6.6-rc3-mm2 0w/0e 0w/0e 117w/ 0e 8w/0e 2w/0e 116w/0e
2.6.6-rc3-mm1 0w/0e 0w/0e 120w/10e 8w/0e 2w/0e 152w/2e
2.6.6-rc2-mm2 0w/0e 1w/5e 118w/ 0e 8w/0e 3w/0e 118w/0e
2.6.6-rc2-mm1 0w/0e 0w/0e 115w/ 0e 7w/0e 3w/0e 116w/0e
2.6.6-rc1-mm1 0w/0e 0w/7e 122w/ 0e 7w/0e 4w/0e 122w/0e
2.6.5-mm6 0w/0e 0w/0e 123w/ 0e 7w/0e 4w/0e 124w/0e
2.6.5-mm5 0w/0e 0w/0e 119w/ 0e 7w/0e 4w/0e 120w/0e
2.6.5-mm4 0w/0e 0w/0e 120w/ 0e 7w/0e 4w/0e 121w/0e
2.6.5-mm3 0w/0e 1w/0e 121w/12e 7w/0e 3w/0e 123w/0e
2.6.5-mm2 0w/0e 0w/0e 128w/12e 7w/0e 3w/0e 134w/0e
2.6.5-mm1 0w/0e 5w/0e 122w/ 0e 7w/0e 3w/0e 124w/0e
2.6.5-rc3-mm4 0w/0e 0w/0e 124w/ 0e 8w/0e 4w/0e 126w/0e
2.6.5-rc3-mm3 0w/0e 5w/0e 129w/14e 8w/0e 4w/0e 129w/6e
2.6.5-rc3-mm2 0w/0e 5w/0e 130w/14e 8w/0e 4w/0e 129w/6e
2.6.5-rc3-mm1 0w/0e 5w/0e 129w/ 0e 8w/0e 4w/0e 129w/0e
2.6.5-rc2-mm5 0w/0e 5w/0e 130w/ 0e 8w/0e 4w/0e 129w/0e
2.6.5-rc2-mm4 0w/0e 5w/0e 134w/ 0e 8w/0e 3w/0e 133w/0e
2.6.5-rc2-mm3 0w/0e 5w/0e 134w/ 0e 8w/0e 3w/0e 133w/0e
2.6.5-rc2-mm2 0w/0e 5w/0e 137w/ 0e 8w/0e 3w/0e 134w/0e
2.6.5-rc2-mm1 0w/0e 5w/0e 136w/ 0e 8w/0e 3w/0e 134w/0e
2.6.5-rc1-mm2 0w/0e 5w/0e 135w/ 5e 8w/0e 3w/0e 133w/0e
2.6.5-rc1-mm1 0w/0e 5w/0e 135w/ 5e 8w/0e 3w/0e 133w/0e
2.6.4-mm2 1w/2e 5w/2e 144w/10e 8w/0e 3w/2e 144w/0e
2.6.4-mm1 1w/0e 5w/0e 146w/ 5e 8w/0e 3w/0e 144w/0e
2.6.4-rc2-mm1 1w/0e 5w/0e 146w/12e 11w/0e 3w/0e 147w/2e
2.6.4-rc1-mm2 1w/0e 5w/0e 144w/ 0e 11w/0e 3w/0e 145w/0e
2.6.4-rc1-mm1 1w/0e 5w/0e 147w/ 5e 11w/0e 3w/0e 147w/0e
2.6.3-mm4 1w/0e 5w/0e 146w/ 0e 7w/0e 3w/0e 142w/0e
2.6.3-mm3 1w/2e 5w/2e 146w/15e 7w/0e 3w/2e 144w/5e
2.6.3-mm2 1w/8e 5w/0e 140w/ 0e 7w/0e 3w/0e 138w/0e
2.6.3-mm1 1w/0e 5w/0e 143w/ 5e 7w/0e 3w/0e 141w/0e
2.6.3-rc3-mm1 1w/0e 0w/0e 144w/13e 7w/0e 3w/0e 142w/3e
2.6.3-rc2-mm1 1w/0e 0w/265e 144w/ 5e 7w/0e 3w/0e 145w/0e
2.6.3-rc1-mm1 1w/0e 0w/265e 141w/ 5e 7w/0e 3w/0e 143w/0e
2.6.2-mm1 2w/0e 0w/264e 147w/ 5e 7w/0e 3w/0e 173w/0e
2.6.2-rc3-mm1 2w/0e 0w/265e 146w/ 5e 7w/0e 3w/0e 172w/0e
2.6.2-rc2-mm2 0w/0e 0w/264e 145w/ 5e 7w/0e 3w/0e 171w/0e
2.6.2-rc2-mm1 0w/0e 0w/264e 146w/ 5e 7w/0e 3w/0e 172w/0e
2.6.2-rc1-mm3 0w/0e 0w/265e 144w/ 8e 7w/0e 3w/0e 169w/0e
2.6.2-rc1-mm2 0w/0e 0w/264e 144w/ 5e 10w/0e 3w/0e 171w/0e
2.6.2-rc1-mm1 0w/0e 0w/264e 144w/ 5e 10w/0e 3w/0e 171w/0e
2.6.1-mm5 2w/5e 0w/264e 153w/11e 10w/0e 3w/0e 180w/0e
2.6.1-mm4 0w/821e 0w/264e 154w/ 5e 8w/1e 5w/0e 179w/0e
2.6.1-mm3 0w/0e 0w/0e 151w/ 5e 10w/0e 3w/0e 177w/0e
2.6.1-mm2 0w/0e 0w/0e 143w/ 5e 12w/0e 3w/0e 171w/0e
2.6.1-mm1 0w/0e 0w/0e 146w/ 9e 12w/0e 6w/0e 171w/0e
2.6.1-rc2-mm1 0w/0e 0w/0e 149w/ 0e 12w/0e 6w/0e 171w/4e
2.6.1-rc1-mm2 0w/0e 0w/0e 157w/15e 12w/0e 3w/0e 185w/4e
2.6.1-rc1-mm1 0w/0e 0w/0e 156w/10e 12w/0e 3w/0e 184w/2e
2.6.0-mm2 0w/0e 0w/0e 161w/ 0e 12w/0e 3w/0e 189w/0e
2.6.0-mm1 0w/0e 0w/0e 173w/ 0e 12w/0e 3w/0e 212w/0e

John



2005-02-11 06:34:22

by Peter Williams

Subject: Re: 2.6.11-rc3-mm2

Nick Piggin wrote:
> On Thu, 2005-02-10 at 22:41 -0500, Paul Davis wrote:
>
>> [ the best solution is .... ]
>>
>> [ my preferred solution is ... ]
>>
>> [ it would be better if ... ]
>>
>> [ this is a kludge and it should be done instead like ... ]
>>
>>did nobody read what andrew wrote and what JOQ pointed out?
>>
>>after weeks of debating this, no other conceptual solution emerged
>>that did not have at least as many problems as the RT LSM module, and
>>all other proposed solutions were also more invasive of other aspects
>>of kernel design and operations than RT LSM is.
>>
>
>
> Sure, it is quick and easy. Suits some. At least I do prefer
> this to altering the semantics of realtime scheduling.
>
> I can't say much about it because I'm not putting my hand up to
> do anything. Just mentioning that rlimit would be better if not
> for the userspace side of the equation. I think most were already
> agreed on that point anyway though.

I think that the rlimits are a good idea in themselves but not as a
solution to this problem. I.e. having a RT CPU rate rlimit should not
be a sufficient (or necessary for that matter) condition to change
policy to SCHED_OTHER or SCHED_RR but could still be used to limit the
possibility of lock out. (But I guess even that is a violation of RT
semantics?)

Peter
PS Zaphod's per task hard/soft CPU rate caps (which are the equivalent
of an rlimit on CPU usage rate) are only enforced for SCHED_NORMAL tasks
and should not (therefore) affect RT semantics.
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2005-02-11 06:43:12

by Nick Piggin

Subject: Re: 2.6.11-rc3-mm2

On Fri, 2005-02-11 at 17:34 +1100, Peter Williams wrote:
> Nick Piggin wrote:

> > I can't say much about it because I'm not putting my hand up to
> > do anything. Just mentioning that rlimit would be better if not
> > for the userspace side of the equation. I think most were already
> > agreed on that point anyway though.
>
> I think that the rlimits are a good idea in themselves but not as a
> solution to this problem. I.e. having a RT CPU rate rlimit should not
> be a sufficient (or necessary for that matter) condition to change
> policy to SCHED_OTHER or SCHED_RR but could still be used to limit the
> possibility of lock out.

Ah well that may be a good way to do it indeed. As I said, I
don't know much about privileges etc.

But I just want to be clear that I'm not trying to stop RT-LSM
going in (if only because I don't care one way or the other
about it).

> (But I guess even that is a violation of RT
> semantics?)
>

I'd have to re-read the standard, but it may not be. For example,
a compliant system advertises the minimum and maximum priority
levels available - you may be able to adjust these based on what
the rlimit is set to. On the other hand, yes it may violate the
standards.
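
The advertised priority range mentioned above can be queried from userspace. A minimal sketch, assuming a Linux system where Python's `os` module exposes the POSIX `sched_get_priority_min` and `sched_get_priority_max` calls; the helper name `priority_range` is made up for illustration:

```python
import os


def priority_range(policy):
    """Return the (min, max) static priority the system advertises for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


if __name__ == "__main__":
    # Print the advertised range for the normal and the two real-time policies.
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
        lo, hi = priority_range(getattr(os, name))
        print(f"{name}: {lo}..{hi}")
```

Whether these advertised values could legitimately track an rlimit is exactly the open question here; the sketch only shows where a program would look.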

Nick



2005-02-11 06:58:29

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Thu, Feb 10, 2005 at 10:41:28PM -0500, Paul Davis wrote:
> [ the best solution is .... ]
>
> [ my preferred solution is ... ]
>
> [ it would be better if ... ]
>
> [ this is a kludge and it should be done instead like ... ]
>
> did nobody read what andrew wrote and what JOQ pointed out?
>
> after weeks of debating this, no other conceptual solution emerged
> that did not have at least as many problems as the RT LSM module, and
> all other proposed solutions were also more invasive of other aspects
> of kernel design and operations than RT LSM is.

Eh? Chris Wright's original rlimits patch was very straightforward
(unlike some of the other rlimit-like patches that followed).
I haven't heard the downsides of it yet.

simple rlimits:
logical extension of standard, flexible interface
fine-grained per-process access to nice levels and priorities
managed with standard tools
fairly broad possible applications
clean enough to be added unconditionally
already doing mlock this way!

RT LSM:
new, narrow magic group interface (module parameters!)
boolean granularity of access to all RT levels and maybe mlock
potential interesting interaction with other LSMs
not orthogonal to mlock
not appropriate for every box out there
requires lsm and (sysfs or modprobe)

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 07:54:36

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> Eh? Chris Wright's original rlimits patch was very straightforward
> [...]

the problem is that it didnt solve the problem (unprivileged user can
lock up the system) in any way. So after it became visible that all the
existing 'dont allow users to lock up' solutions are too invasive, we
went to recommend the solution that introduces the least architectural
problems: RT-LSM.

Ingo

2005-02-11 08:15:28

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> > > What happened to the RT rlimit code from Chris?
> >
> > I still have it, but I had the impression Ingo didn't like it as a long
> > term solution/hack (albeit small) to the scheduler. Whereas the rt-lsm
> > patch is wholly self-contained.
>
> I think it's important to recognize that we're trying to address an
> issue that has a much wider potential audience than pro audio users,
> and not very far off - what is high end audio performance today will
> be expected desktop performance next year.

i disagree that desktop performance tomorrow will necessarily have to
utilize SCHED_FIFO. Today's desktop audio applications perform quite
good at SCHED_NORMAL priorities [with the 2.6.11 kernel that has more
interactivity/latency fixes such as PREEMPT_BKL].

I agree (and hope) that tomorrow's "stock" desktop will be based on
today's pro audio architectures, but tomorrow's CPUs will be much faster
and tomorrow's desktop apps dont want to spend 30%+ CPU time on creating
audio.

the pro applications will always want to have a 100% guarantee (it
really sucks to generate a nasty audio click during a live performance)
and want to utilize as much CPU time for audio as needed. They are also
clearly the most complex creators of audio so they go far above the
normal (and reasonable) CPU-use/latency expectations and tradeoffs of
the stock scheduler.

> So I think it's critical that we find solution that's appropriate for
> _every single box_, because realistically vendors are going to ship
> with this "wholly self-contained" feature turned on by default next
> year, at which point the "containment" will be nil and whatever warts
> it has will be with us forever.

an "RT priorities rlimit" is still not adequate as a desktop solution,
because it still allows the box to be locked up. Also, if it turns out
to be a mistake then it's already codified into the ABI, while RT-LSM is
much less 'persistent' and could be replaced much easier. RT-LSM is also
more flexible and more practical. (an rlimit needs changes across a
number of userspace components, delaying its adoption.)

> The rlimit stuff is not perfect, but it's a much better fit for the
> UNIX model generally, which is a fairly big win. [...]

a 'locked up box' is as far away from the UNIX model as it gets.

perhaps, if the need arises, we can add the RT-throttling sysctl (which
still wont give RT priorities to unprivileged users and would serve as a
way to throttle privileged RT tasks), which could thus make the RT-LSM
solution pretty safe. Right now Jack has its own watchdog thread which
should solve most of the lockup situations. Lets not overdesign the
solution, especially when we dont yet know what the problem really looks
like.

or an even simpler solution for the lockup problem would be a
kernel-based RT watchdog. In fact 2.6.11-rc3-mm2 already includes such a
watchdog (written by yours truly):

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/broken-out/detect-soft-lockups.patch

right now softlockup-detect runs at SCHED_FIFO prio 99 and only prints a
warning - but it could easily run at SCHED_FIFO prio 1 [to detect
lockups generated by all RT tasks] and it could actively try to renice
(or kill) tasks that run for too long. So very likely there will be an
easy upstream mechanism for any problem that could arise out of RT-LSM.

Ingo

2005-02-11 08:23:30

by Christoph Hellwig

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 09:14:22AM +0100, Ingo Molnar wrote:
> an "RT priorities rlimit" is still not adequate as a desktop solution,
> because it still allows the box to be locked up. Also, if it turns out
> to be a mistake then it's already codified into the ABI, while RT-LSM is
> much less 'persistent' and could be replaced much easier. RT-LSM is also
> more flexible and more practical. (an rlimit needs changes across a
> number of userspace components, delaying its adoption.)

Putting it into the tree means a guarantee we'll keep it going. It'd
probably be much better if Jack just keeps it separately. Especially as
his lack of even making it generic shows that he's unwilling to invest
work into it that doesn't benefit him personally.

2005-02-11 08:26:11

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 08:54:17AM +0100, Ingo Molnar wrote:
>
> * Matt Mackall <[email protected]> wrote:
>
> > Eh? Chris Wright's original rlimits patch was very straightforward
> > [...]
>
> the problem is that it didnt solve the problem (unprivileged user can
> lock up the system) in any way.

There are two separate but related problems:

a) need a way to give non-root access to SCHED_FIFO without other
privileges

b) would like a way to have RT-like capabilities without risk of DoS

The original rlimits patch solves (a), which is the pressing concern.

The existence of a satisfactory solution to related problem (b) has
yet to be demonstrated. And even if a solution for (b) is found that
is satisfactory for, say, high end audio users, it may not necessarily
be sufficient for everyone who might have wanted SCHED_FIFO for
non-root processes. So we still need a solution for (a).

> So after it became visible that all the
> existing 'dont allow users to lock up' solutions are too invasive, we
> went to recommend the solution that introduces the least architectural
> problems: RT-LSM.

RT-LSM introduces architectural problems in the form of bogus API. And
I claim that if RT-LSM becomes part of the mainline kernel, it -will-
become a default feature on the desktop in short order. The fact that
it's implemented as an LSM is meaningless if Redhat and SuSE ship it
on by default.

So the comparison boils down to putting a magic gid in a sysfs
file/module parameter or setting an rlimit with standard tools (PAM,
etc). I'm really boggled that anyone could prefer the former,
especially since we had almost this exact debate over what became the
mlock rlimit!

Here's Chris' patch for reference:

http://groups-beta.google.com/group/linux.kernel/msg/6408569e13ed6e80

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 08:42:08

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 09:14:22AM +0100, Ingo Molnar wrote:
>
> > I think it's important to recognize that we're trying to address an
> > issue that has a much wider potential audience than pro audio users,
> > and not very far off - what is high end audio performance today will
> > be expected desktop performance next year.
>
> i disagree that desktop performance tomorrow will necessarily have to
> utilize SCHED_FIFO. Today's desktop audio applications perform quite
> good at SCHED_NORMAL priorities [with the 2.6.11 kernel that has more
> interactivity/latency fixes such as PREEMPT_BKL].

Desktop performance tomorrow will want realtime audio AND video.
Think simultaneous record and playback of multiple high-definition
video streams. There's a demand for this; my company already sells it.

> the pro applications will always want to have a 100% guarantee (it
> really sucks to generate a nasty audio click during a live performance)
> and want to utilize as much CPU time for audio as needed. They are also
> clearly the most complex creators of audio so they go far above the
> normal (and reasonable) CPU-use/latency expectations and tradeoffs of
> the stock scheduler.

The pro will want to do his work on a stock desktop system. More
importantly, the hobbyist will want to do exactly what the pro is
doing on the same system.

> > So I think it's critical that we find solution that's appropriate for
> > _every single box_, because realistically vendors are going to ship
> > with this "wholly self-contained" feature turned on by default next
> > year, at which point the "containment" will be nil and whatever warts
> > it has will be with us forever.
>
> an "RT priorities rlimit" is still not adequate as a desktop solution,
> because it still allows the box to be locked up. Also, if it turns out
> to be a mistake then it's already codified into the ABI, while RT-LSM is
> much less 'persistent' and could be replaced much easier. RT-LSM is also
> more flexible and more practical. (an rlimit needs changes across a
> number of userspace components, delaying its adoption.)

I'm very suspicious about being able to rip out RT-LSM once it's
introduced. See devfs. And I think the adoption barrier thing is a red
herring as well: the current users are by and large compiling their
own RT-tuned kernels.

> > The rlimit stuff is not perfect, but it's a much better fit for the
> > UNIX model generally, which is a fairly big win. [...]
>
> a 'locked up box' is as far away from the UNIX model as it gets.

Rlimits are already the favored tool for dealing with the classic UNIX DoS:
the fork bomb. Turn off process limits, tada, locked up box.
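
The process limit in question is RLIMIT_NPROC. A small sketch of checking whether it is switched off for the current process, assuming a system where Python's `resource` module provides `RLIMIT_NPROC` (Linux and the BSDs do); the helper name `nproc_is_unlimited` is hypothetical:

```python
import resource


def nproc_is_unlimited():
    """True if the soft per-user process limit is unlimited (no fork-bomb guard)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("RLIMIT_NPROC soft/hard:", soft, hard)
    print("unlimited:", nproc_is_unlimited())
```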

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 08:49:21

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> Here's Chris' patch for reference:
>
> http://groups-beta.google.com/group/linux.kernel/msg/6408569e13ed6e80

how does this patch solve the separation of 'negative nice values' and
'RT priority rlimits'? In one piece of code it handles the rlimit value
as a 0-39 nice value, in another place it handles it as a limit for a
1-100 RT priority range. The two ranges overlap and have nothing to do
with each other. [*]

anyway, as long as it doesnt touch the scheduler runtime code (and it
doesnt), both types of solutions are fine to me - it's basically the
security-subsystem people's call.

if the patch solves the negative-nice-value and the RT-priority issues
at once, then it indeed looks more flexible (and more generic) than the
LSM solution. [**]

Ingo

[*] one acceptable way to 'merge' the two priority ranges would be to
introduce a unified priority range of 0-139: 0-39 would be for nice
values while 40-139 would be for RT priorities 1-99. NOTE: due to
rlimit semantics (users can always lower them without any security
checks), value 39 _must_ denote nice -20 and value 0 must denote
nice +19. I.e. it must be strictly in increasing priority order.

[**] in fact, the 'Gnome problem' wrt. suid/gid binaries would be solved
via the rlimit too.
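
The unified range from footnote [*] can be sketched as a pair of mappings. This is only an illustration of the arithmetic, not code from any patch. The footnote assigns 40-139 to RT priorities 1-99, which leaves one slot over; the sketch assumes RT priority p maps to 39 + p and leaves 139 unused. All function names are hypothetical.

```python
NICE_MIN, NICE_MAX = -20, 19   # standard nice range
RT_MIN, RT_MAX = 1, 99         # RT priorities as given in the footnote


def nice_to_unified(nice):
    """Map a nice value onto 0..39 so that a larger value means higher priority."""
    if not NICE_MIN <= nice <= NICE_MAX:
        raise ValueError("nice value out of range")
    return NICE_MAX - nice      # nice +19 -> 0, nice -20 -> 39


def rt_to_unified(rtprio):
    """Map an RT priority onto the range above the nice band (assumed 39 + p)."""
    if not RT_MIN <= rtprio <= RT_MAX:
        raise ValueError("RT priority out of range")
    return 39 + rtprio          # RT 1 -> 40, RT 99 -> 138


def within_limit(limit, requested):
    """An rlimit-style check: anything at or below the limit is permitted."""
    return requested <= limit


if __name__ == "__main__":
    limit = nice_to_unified(-5)                        # grant down to nice -5
    print(within_limit(limit, nice_to_unified(0)))     # nice 0 is weaker: allowed
    print(within_limit(limit, rt_to_unified(1)))       # any RT priority: refused
```

Because the scale is strictly increasing in priority, a user lowering the limit can only give up privileges, which is the property the footnote insists on.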

2005-02-11 08:59:01

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 09:48:43AM +0100, Ingo Molnar wrote:
>
> * Matt Mackall <[email protected]> wrote:
>
> > Here's Chris' patch for reference:
> >
> > http://groups-beta.google.com/group/linux.kernel/msg/6408569e13ed6e80
>
> how does this patch solve the separation of 'negative nice values' and
> 'RT priority rlimits'? In one piece of code it handles the rlimit value
> as a 0-39 nice value, in another place it handles it as a limit for a
> 1-100 RT priority range. The two ranges overlap and have nothing to do
> with each other. [*]

Read more closely: there are two independent limits in the patch,
RLIMIT_NICE and RLIMIT_RTPRIO. This lets us grant elevated nice
without SCHED_FIFO.
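
For illustration, here is how the two limits look from userspace on systems that expose them under these names. This is a sketch under assumptions: it relies on Python's `resource` module on a Linux kernel that carries RLIMIT_NICE and RLIMIT_RTPRIO, it looks the constants up defensively in case they are absent, and it only reads the limits. The helper name `read_sched_limits` is made up.

```python
import resource


def read_sched_limits():
    """Return {name: (soft, hard)} for the two limits, or None where unavailable."""
    limits = {}
    for name in ("RLIMIT_NICE", "RLIMIT_RTPRIO"):
        which = getattr(resource, name, None)   # missing on non-Linux builds
        limits[name] = resource.getrlimit(which) if which is not None else None
    return limits


if __name__ == "__main__":
    # The two limits are reported separately, so one can be raised without the other.
    for name, value in read_sched_limits().items():
        print(name, value)
```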

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 09:01:00

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> > i disagree that desktop performance tomorrow will necessarily have to
> > utilize SCHED_FIFO. Today's desktop audio applications perform quite
> > good at SCHED_NORMAL priorities [with the 2.6.11 kernel that has more
> > interactivity/latency fixes such as PREEMPT_BKL].
>
> Desktop performance tomorrow will want realtime audio AND video.
> Think simultaneous record and playback of multiple high-definition
> video streams. There's a demand for this; my company already sells it.

Tomorrow's hardware will have enough buffering as today's hardware has
for simpler tasks. Repeat after me: it likely _wont_ _need_ SCHED_FIFO.
Running tomorrow's hardware on today's boxes indeed pushes the system to
its limits, but tomorrow's hardware will be well-balanced just as much as
today's is - if nothing else then due to kernel drivers providing a
buffering guarantee.

think of SCHED_FIFO on the desktop as an ugly wart, a hammer, that
destroys the careful balance of priorities of SCHED_OTHER tasks. Yes, it
can be useful if you _need_ a scheduling guarantee due to physical
constraints, and it can be useful if the hardware (or the kernel) cannot
buffer enough, but otherwise, it only causes problems.

> I'm very suspicious about being able to rip out RT-LSM once it's
> introduced. [...]

yeah, i somewhat share that view. (despite all the promises from the
audio folks - if they are just half as aggressive resisting removal as
they were pushing integration then it will never be removed ;-)

but i'm not sure how rlimits will contain the whole problem - can
rlimits be restricted to a single app (jackd)? The most canonical use of
rlimits is per-user (per-group), so the rlimit could end up _widening_
the effects of the hack ...

> > > The rlimit stuff is not perfect, but it's a much better fit for the
> > > UNIX model generally, which is a fairly big win. [...]
> >
> > a 'locked up box' is as far away from the UNIX model as it gets.
>
> Rlimits are already the favored tool for dealing with the classic UNIX
> DoS: the fork bomb. Turn off process limits, tada, locked up box.

the big difference is that process limits are finegrained and it has a
single value (unlimited) that allows the DoS - while the RT-rlimits have
_one_ value that is safe, all the other values are unsafe!

Ingo

2005-02-11 09:05:02

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> Read more closely: there are two independent limits in the patch,
> RLIMIT_NICE and RLIMIT_RTPRIO. This lets us grant elevated nice
> without SCHED_FIFO.

ok, indeed.

Ingo

2005-02-11 09:12:00

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> So the comparison boils down to putting a magic gid in a sysfs
> file/module parameter or setting an rlimit with standard tools (PAM,
> etc). I'm really boggled that anyone could prefer the former,
> especially since we had almost this exact debate over what became the
> mlock rlimit!

the big difference to mlock is that for mlock there _is_ a _limit_. For
RT scheduling the priority is _NOT_ a _limit_. Okay? So you give the
false pretense of this being some kind of resource 'limit', while in
fact allowing SCHED_FIFO prio 1 alone enables unprivileged users to lock
up the system.

so i could agree with RLIMIT_NICE (which _is_ a limit), but
RLIMIT_RTPRIO sends the wrong message. The proper rlimit would be
RLIMIT_RT_CPU (the patch i did).

Ingo

2005-02-11 09:28:48

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 10:04:19AM +0100, Ingo Molnar wrote:
>
> * Matt Mackall <[email protected]> wrote:
>
> > So the comparison boils down to putting a magic gid in a sysfs
> > file/module parameter or setting an rlimit with standard tools (PAM,
> > etc). I'm really boggled that anyone could prefer the former,
> > especially since we had almost this exact debate over what became the
> > mlock rlimit!
>
> the big difference to mlock is that for mlock there _is_ a _limit_. For
> RT scheduling the priority is _NOT_ a _limit_. Okay? So you give the
> false pretense of this being some kind of resource 'limit', while in
> fact allowing SCHED_FIFO prio 1 alone enables unprivileged users to lock
> up the system.
>
> so i could agree with RLIMIT_NICE (which _is_ a limit), but
> RLIMIT_RTPRIO sends the wrong message. The proper rlimit would be
> RLIMIT_RT_CPU (the patch i did).

It's not a perfect fit, I'll readily agree.

But consider this: with RLIMIT_RTPRIO, I can restrict a user to the
lowest N RT priorities. Then at N+1, I have an RT watchdog taking care
of runaways, tickled by a SCHED_NORMAL task. So it can still be looked
at as a meaningful limit, just a bit different from the others.

The RT LSM gives full CAP_SYS_NICE out, so there's no way to guarantee
that the watchdog has higher priority.
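
The priority arithmetic behind this argument, as a rough sketch. It models only the policy check, not a real watchdog; the constant 99 is the top SCHED_FIFO priority on Linux, and the function names are hypothetical:

```python
RT_PRIO_MAX = 99  # highest SCHED_FIFO priority on Linux


def may_set_rt_priority(requested, rtprio_cap):
    """A user capped at rtprio_cap may request RT priorities 1..rtprio_cap only."""
    return 1 <= requested <= rtprio_cap


def watchdog_priority(rtprio_cap):
    """Priority one above the user cap, or None if the cap leaves no room."""
    if rtprio_cap >= RT_PRIO_MAX:
        return None
    return rtprio_cap + 1


if __name__ == "__main__":
    cap = 10
    print(watchdog_priority(cap))           # watchdog sits just above the cap
    print(may_set_rt_priority(11, cap))     # user cannot reach the watchdog's level
    print(watchdog_priority(RT_PRIO_MAX))   # full RT access: nowhere left to sit
```

With a cap of N the watchdog always outranks user tasks; when every RT level is handed out, as with full CAP_SYS_NICE, no such slot remains.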

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 09:40:42

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 09:59:42AM +0100, Ingo Molnar wrote:
>
> think of SCHED_FIFO on the desktop as an ugly wart, a hammer, that
> destroys the careful balance of priorities of SCHED_OTHER tasks. Yes, it
> can be useful if you _need_ a scheduling guarantee due to physical
> constraints, and it can be useful if the hardware (or the kernel) cannot
> buffer enough, but otherwise, it only causes problems.

Agreed. I think something short of full SCHED_FIFO will make most
desktop folks happy. But a) we still have to figure out exactly how to
do that and b) we still have to make everyone else happy. The embedded
folks (me included) would prefer to not run our realtime bits as root
too..

> but i'm not sure how rlimits will contain the whole problem - can
> rlimits be restricted to a single app (jackd)?

Yes. There's also the whole soft limit thing.

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 09:56:07

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> On Fri, Feb 11, 2005 at 09:59:42AM +0100, Ingo Molnar wrote:
> >
> > think of SCHED_FIFO on the desktop as an ugly wart, a hammer, that
> > destroys the careful balance of priorities of SCHED_OTHER tasks. Yes, it
> > can be useful if you _need_ a scheduling guarantee due to physical
> > constraints, and it can be useful if the hardware (or the kernel) cannot
> > buffer enough, but otherwise, it only causes problems.
>
> Agreed. I think something short of full SCHED_FIFO will make most
> desktop folks happy. [...]

ah, but it's not the desktop folks who have to be happy but users :-)
Really, if you ask any app designer then obviously 'the more CPU time we
get for sure, the better our app behaves'. So in that sense SCHED_OTHER
is a fair playground: if you behave nicely you'll have higher priority
and shorter latencies.

(there are things like SCHED_ISO but how good of a solution they are is
not yet clear.)

> [...] But a) we still have to figure out exactly how to do that and b)
> we still have to make everyone else happy. The embedded folks (me
> included) would prefer to not run our realtime bits as root too..

you don't have to - you can drop root after startup.

> > but i'm not sure how rlimits will contain the whole problem - can
> > rlimits be restricted to a single app (jackd)?
>
> Yes. There's also the whole soft limit thing.

i'm curious, how does this 'per-app' rlimit thing work? If a user has
jackd installed and runs it from X unprivileged, how does it get the
elevated rlimit? (while the rest of his desktop still runs with a safe
rlimit.) SELinux/RT-LSM could do this, but i'm not sure about how
rlimits give this to you.

Ingo

2005-02-11 16:28:48

by Yuval Tanny

Subject: Re: 2.6.11-rc3-mm2

In fs/Kconfig,

"See Documentation/filesystems/fscache.txt for more information." and
"See Documentation/filesystems/cachefs.txt for more information."

Should be changed to:

"See Documentation/filesystems/caching/fscache.txt for more
information." and "See Documentation/filesystems/caching/cachefs.txt for
more information."


Thanks,

Yuval


Andrew Morton wrote:

>cachefs-filesystem.patch
> CacheFS filesystem
>
>

2005-02-11 17:46:53

by Paul Davis

Subject: Re: 2.6.11-rc3-mm2

>introduced. See devfs. And I think the adoption barrier thing is a red
>herring as well: the current users are by and large compiling their
>own RT-tuned kernels.

not true. most people are using kernels built for specialized distros
or addons, such as CCRMA, Demudi, Ubuntu, or dyne:bolic.

--p

2005-02-11 17:45:45

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 10:53:27AM +0100, Ingo Molnar wrote:
>
> * Matt Mackall <[email protected]> wrote:
>
> > On Fri, Feb 11, 2005 at 09:59:42AM +0100, Ingo Molnar wrote:
> > >
> > > think of SCHED_FIFO on the desktop as an ugly wart, a hammer, that
> > > destroys the careful balance of priorities of SCHED_OTHER tasks. Yes, it
> > > can be useful if you _need_ a scheduling guarantee due to physical
> > > constraints, and it can be useful if the hardware (or the kernel) cannot
> > > buffer enough, but otherwise, it only causes problems.
> >
> > Agreed. I think something short of full SCHED_FIFO will make most
> > desktop folks happy. [...]
>
> ah, but it's not the desktop folks who have to be happy but users :-)
> Really, if you ask any app designer then obviously 'the more CPU time we
> get for sure, the better our app behaves'. So in that sense SCHED_OTHER
> is a fair playground: if you behave nicely you'll have higher priority
> and shorter latencies.
>
> (there are things like SCHED_ISO but how good of a solution they are is
> not yet clear.)
>
> > [...] But a) we still have to figure out exactly how to do that and b)
> > we still have to make everyone else happy. The embedded folks (me
> > included) would prefer to not run our realtime bits as root too..
>
> you don't have to - you can drop root after startup.
>
> > > but i'm not sure how rlimits will contain the whole problem - can
> > > rlimits be restricted to a single app (jackd)?
> >
> > Yes. There's also the whole soft limit thing.
>
> i'm curious, how does this 'per-app' rlimit thing work? If a user has
> jackd installed and runs it from X unprivileged, how does it get the
> elevated rlimit?

It needs a setuid launcher. It would be nice to be able to elevate the
rlimits of running processes but the API doesn't exist yet.

From the POV of accidental elevation to RT, soft limits are
sufficient. But we can't stop a user from exploiting an app they own
with RT privileges from elevating other apps via ptrace+exec or
whatever. Nor with RT-LSM.

> (while the rest of his desktop still runs with a safe
> rlimit.) SELinux/RT-LSM could do this, but i'm not sure about how
> rlimits give this to you.

How does it get done with RT-LSM? Setgid binaries? It only
discriminates on a group granularity. Or are you saying "and SELinux"
rather than "or SELinux"?

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 17:51:37

by Paul Davis

Subject: Re: 2.6.11-rc3-mm2

>RT-LSM introduces architectural problems in the form of bogus API. And

that may be true of LSM, but not RT-LSM in particular. RT-LSM doesn't
introduce *any* API whatsoever - it simply allows software to call
various existing APIs (mostly from POSIX) and have them not fail as
a result of not being root and/or not running on a capabilities-enabled
kernel without the required caps.

No audio apps "use" RT-LSM in any way - it just lets them do things
they otherwise could not do. And all the alternatives to RT-LSM have
this feature as well - controlling rlimits won't be done by the audio
apps, but by some part of the security infrastructure.

>it's implemented as an LSM is meaningless if Redhat and SuSE ship it
>on by default.

We haven't encouraged anyone to ship anything with it on by default:
the idea is for the module to be present and usable, not turned on.

--p

2005-02-11 17:55:31

by Ingo Molnar

Subject: Re: 2.6.11-rc3-mm2


* Matt Mackall <[email protected]> wrote:

> > > Yes. There's also the whole soft limit thing.
> >
> > i'm curious, how does this 'per-app' rlimit thing work? If a user has
> > jackd installed and runs it from X unprivileged, how does it get the
> > elevated rlimit?
>
> It needs a setuid launcher. It would be nice to be able to elevate the
> rlimits of running processes but the API doesn't exist yet.

With a setuid launcher you need _zero_ kernel help to get SCHED_FIFO: if
you have a launcher then already today it can just give SCHED_FIFO to
jackd and be done with it!

Ingo
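
The launcher approach mentioned above can be made concrete with a short sketch. It assumes the binary is installed setuid root; launch_realtime() and clamp_fifo_priority() are hypothetical names, and only standard calls (sched_setscheduler, setgid, setuid, execvp) are used. The scheduling policy is kept across exec, so the target starts as SCHED_FIFO without holding root.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    assert(lo <= hi);
    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Hypothetical launcher body, meant to run from a setuid-root binary.
 * Returns -1 with errno set on failure; does not return on success.
 */
int launch_realtime(int prio, char *const argv[])
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

    /* Needs root (or CAP_SYS_NICE); the policy survives the exec below. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return -1;

    /* Drop privileges for good before running the target: group first. */
    if (setgid(getgid()) == -1 || setuid(getuid()) == -1)
        return -1;

    execvp(argv[0], argv);
    return -1;   /* only reached if exec failed */
}
```

A launcher's main() would call something like launch_realtime(10, argv + 1). Deciding which programs may be started this way is left out here; a real launcher would have to restrict that, which is the policy question the thread is arguing about.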

2005-02-11 19:42:54

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 12:49:04PM -0500, Paul Davis wrote:
> >RT-LSM introduces architectural problems in the form of bogus API. And
>
> that may be true of LSM, but not RT-LSM in particular. RT-LSM doesn't
> introduce *any* API whatsoever - it simply allows software to call
> various existing APIs (mostly from POSIX) and have them not fail as
> a result of not being root and/or not running on a capabilities-enabled
> kernel without the required caps.

The API is the parameters to modprobe or sysfs.

> >it's implemented as an LSM is meaningless if Redhat and SuSE ship it
> >on by default.
>
> We haven't encouraged anyone to ship anything with it on by default:
> the idea is for the module to be present and usable, not turned on.

On as in turned on for build in the kernel config and shipped. But I
expect people will eventually actually ship it _on_ with a group
called 'rt' and possibly even put the primary user in there on install
unless you start slapping some big fat warnings on it. (I just noticed
the new Debian installer is putting the primary user in audio, cdrom,
video, etc.)

--
Mathematics is the supreme nostalgia of our time.

2005-02-11 19:58:14

by Lee Revell

Subject: Re: 2.6.11-rc3-mm2

On Fri, 2005-02-11 at 11:42 -0800, Matt Mackall wrote:
> On Fri, Feb 11, 2005 at 12:49:04PM -0500, Paul Davis wrote:
> > >RT-LSM introduces architectural problems in the form of bogus API. And
> >
> > that may be true of LSM, but not RT-LSM in particular. RT-LSM doesn't
> > introduce *any* API whatsoever - it simply allows software to call
> > various existing APIs (mostly from POSIX) and have them not fail as
> > a result of not being root and/or not running on a capabilities-enabled
> > kernel without the required caps.
>
> The API is the parameters to modprobe or sysfs.
>

I think you are talking about the API for root to administer it vs. the
(lack of) API for apps to use the RT capabilities. I think Paul's point
is that we can transparently replace it with something better (IMO the
RT rlimit is better) in the future, and the apps don't have to know
about it at all. Comparing it to devfs/udev is bogus because those are
way, way more complicated.

> > >it's implemented as an LSM is meaningless if Redhat and SuSE ship it
> > >on by default.
> >
> > We haven't encouraged anyone to ship anything with it on by default:
> > the idea is for the module to be present and usable, not turned on.
>
> On as in turned on for build in the kernel config and shipped. But I
> expect people will eventually actually ship it _on_ with a group
> called 'rt' and possibly even put the primary user in there on install
> unless you start slapping some big fat warnings on it. (I just noticed
> the new Debian installer is putting the primary user in audio, cdrom,
> video, etc.)
>

Sorry, if the distros are so dumb they need a big fat warning to know
that this is not a safe thing to enable by default, at least on anything
you would ever consider a multiuser system, then they get what they
deserve. If they have half a brain they will use the setgid approach
that Ingo suggested, and only enable this for apps like JACK and
cdrecord that have been fairly well audited and can be trusted to use
this feature (for example JACK has the internal watchdog to keep a bad
client from locking the system). Really it only makes sense for a
distro to enable this if the user selects the "low latency desktop" or
"multimedia desktop" or whatever install option and makes clear that
this profile is NOT suitable for a multiuser system.

Lee

2005-02-11 20:11:30

by Matt Mackall

Subject: Re: 2.6.11-rc3-mm2

On Fri, Feb 11, 2005 at 06:49:05PM +0100, Ingo Molnar wrote:
>
> * Matt Mackall <[email protected]> wrote:
>
> > > > Yes. There's also the whole soft limit thing.
> > >
> > > i'm curious, how does this 'per-app' rlimit thing work? If a user has
> > > jackd installed and runs it from X unprivileged, how does it get the
> > > elevated rlimit?
> >
> > It needs a setuid launcher. It would be nice to be able to elevate the
> > rlimits of running processes but the API doesn't exist yet.
>
> With a setuid launcher you need _zero_ kernel help to get SCHED_FIFO: if
> you have a launcher then already today it can just give SCHED_FIFO to
> jackd and be done with it!

I'm sure you know all this already but I'll spell it out so we're all
clear:

a) rlimits are tracked per-process so they're fundamentally
per-process
b) there are hard and soft limits, with soft always <= hard
c) only root can raise hard rlimits, but normal users can lower them
d) if a user owns a process, he can gain the privileges of that process
by various means, so in the strict sense per-process privileges are
meaningless - all privileges are per-uid
e) so we either need to segregate all privileged processes into
separate uid domains
f) or we're assuming non-malicious users and soft limits are
sufficient.

Now I suspect we don't want to insist people do (e) (though I'd
certainly encourage them to try).

Don't forget that the rlimits approach allows us to reserve the
highest priorities for root. I'm pretty sure an effective watchdog
policy can thus be implemented in userspace, which RT-LSM can't really
offer.

--
Mathematics is the supreme nostalgia of our time.
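
Points (a) through (c) above can be tried from an unprivileged process. The sketch below uses RLIMIT_NOFILE purely as a stand-in, since the RT rlimit discussed in this thread is still a proposal; set_soft_limit(), soft_limit() and hard_limit() are hypothetical helper names around the standard getrlimit/setrlimit calls.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Change only the soft limit of `resource`; the hard limit is left alone.
 * Returns 0 on success, -1 with errno set on failure (for example EINVAL
 * when new_soft is above the hard limit).
 */
int set_soft_limit(int resource, rlim_t new_soft)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) == -1)
        return -1;

    assert(rl.rlim_cur <= rl.rlim_max);   /* point (b): soft <= hard */
    rl.rlim_cur = new_soft;
    return setrlimit(resource, &rl);
}

/* Current soft limit of `resource`, or 0 if it cannot be read. */
rlim_t soft_limit(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) == -1)
        return 0;
    return rl.rlim_cur;
}

/* Current hard limit of `resource`, or 0 if it cannot be read. */
rlim_t hard_limit(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) == -1)
        return 0;
    return rl.rlim_max;
}
```

An ordinary user can move the soft limit down and back up as far as the hard limit without any privilege. Lowering the hard limit is also allowed but cannot be undone without root, which is why the soft limit only guards against accidents and not against a determined user.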

2005-02-12 14:56:25

by Henning Rohde

Subject: Re: 2.6.11-rc3-mm2

Hi,

Yuval Tanny wrote:
> Andrew Morton wrote:
>>cachefs-filesystem.patch
>> CacheFS filesystem
> ...

since you mention cachefs: do you know the status of NFS support?
Or is the project as dead as the mailing list?

Is there an all-in-one patch against the vanilla sources,
ideally including NFS support?


Thanks in advance,

Henning Rohde

2005-02-12 22:43:18

by Olaf Dietsche

Subject: Re: 2.6.11-rc3-mm2

Christoph Hellwig <[email protected]> writes:

> On Thu, Feb 10, 2005 at 02:35:08AM -0800, Andrew Morton wrote:
>>
>> - Added the mlock and !SCHED_OTHER Linux Security Module for the audio guys.
>> It seems that nothing else is going to come along and this is completely
>> encapsulated.
>
> Even if we accept a module that grants capabilities to groups this isn't fine
> yet because it only supports two specific capabilities (and even those two in
> different ways!) instead of adding generic support to bind capabilities to
> groups.

Unless I misunderstood the code, this one has been available for
quite some time: <http://www.olafdietsche.de/linux/accessfs/>
or a newer, self-contained version <http://lkml.org/lkml/2005/1/11/221>

Or you could use a real solution - filesystem capabilities:
<http://www.olafdietsche.de/linux/capability/> and if you don't like
this one :-), there's also an alternative existing here:
<http://www.stanford.edu/~luto/linux-fscap/>

Regards, Olaf.

2005-02-14 05:22:48

by Werner Almesberger

Subject: Re: 2.6.11-rc3-mm2

Ingo Molnar wrote:
> the pro applications will always want to have a 100% guarantee (it
> really sucks to generate a nasty audio click during a live performance)

... and the "generic kernels" distributions use will follow just
as swiftly, as soon as the feature appears stable enough. It even
makes sense: no need to switch kernels if "pro audio" applications
(or whatever else may end up wanting this) are added to the mix,
and fewer configurations to test.

You can run, but you cannot hide :-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2005-02-14 13:27:00

by Stefano Rivoir

Subject: Re: 2.6.11-rc3-mm2

On Thursday 10 February 2005 at 11:35, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc3/2.6.11-rc3-mm2/

I was trying to use the skge module for my Intel 3c940 card, in place of the
(working) sk98lin.

It gives the following:

Feb 14 14:16:35 nbsteu kernel: kobject_register failed for skge (-17)
Feb 14 14:16:35 nbsteu kernel: [kobject_register+81/96] kobject_register+0x51/0x60
Feb 14 14:16:35 nbsteu kernel: [bus_add_driver+82/192] bus_add_driver+0x52/0xc0
Feb 14 14:16:35 nbsteu kernel: [driver_register+40/48] driver_register+0x28/0x30
Feb 14 14:16:35 nbsteu kernel: [pci_register_driver+94/128] pci_register_driver+0x5e/0x80
Feb 14 14:16:35 nbsteu kernel: [sys_init_module+313/480] sys_init_module+0x139/0x1e0
Feb 14 14:16:35 nbsteu kernel: [sysenter_past_esp+82/117] sysenter_past_esp+0x52/0x75

Attached, .config and lspci -v

--
Stefano Rivoir


Attachments:
(No filename) (920.00 B)
lspci-v (5.47 kB)
.config (28.92 kB)
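
For readers decoding the log in this report: kernel functions conventionally return a negative errno value, so (-17) corresponds to errno 17, which is EEXIST on Linux. That is consistent with something named skge already being registered, though the report itself does not establish the cause. A small user-space sketch (describe_kernel_error() is a hypothetical helper) shows the mapping:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Turn a kernel-style negative return value (-errno) into the
 * corresponding C library error message.
 */
const char *describe_kernel_error(int ret)
{
    assert(ret < 0);          /* kernel error returns are negative */
    return strerror(-ret);
}
```

Calling describe_kernel_error(-17) yields the same text as strerror(EEXIST) on a system where EEXIST is 17.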

2005-02-18 16:49:13

by Robert Love

Subject: Re: [patch] inotify for 2.6.11-rc3-mm2

On Thu, 2005-02-10 at 13:47 -0500, Robert Love wrote:

> Attached, find a patch against 2.6.11-rc3-mm2 of the latest inotify.

Updated patch, fixes a bug.

Robert Love
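
The filename padding done in kernel_event() in the patch below amounts to rounding the length (including the terminating NUL) up to a multiple of sizeof(struct inotify_event). The user-space sketch here compares the patch's formulation with the usual round-up expression; both function names are hypothetical, and the event size of 16 used in the note after the code is an assumed value for illustration, since the structure definition is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Same steps as the patch: compute the remainder, then add it to len. */
size_t padded_len_patch(size_t len, size_t event_size)
{
    /* Wraps when len > event_size, but is overwritten in that case. */
    size_t rem = event_size - len;

    if (len > event_size) {
        rem = event_size - (len % event_size);
        if (len % event_size == 0)
            rem = 0;
    }
    return len + rem;
}

/* Conventional round-up to the next multiple of event_size. */
size_t padded_len_roundup(size_t len, size_t event_size)
{
    assert(len > 0 && event_size > 0);
    return ((len + event_size - 1) / event_size) * event_size;
}
```

With an event size of 16, a 3-character name (4 bytes with the NUL) is padded to 16 bytes, and a 16-character name (17 bytes) is padded to 32. The two functions agree for every length of at least 1.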


inotify, bitches

Signed-off-by: Robert Love <[email protected]>

arch/sparc64/Kconfig | 13
drivers/char/Kconfig | 13
drivers/char/Makefile | 2
drivers/char/inotify.c | 1053 +++++++++++++++++++++++++++++++++++++++++++++
drivers/char/misc.c | 14
fs/attr.c | 34 -
fs/compat.c | 14
fs/file_table.c | 4
fs/inode.c | 3
fs/namei.c | 38 -
fs/open.c | 9
fs/read_write.c | 28 -
fs/super.c | 3
include/linux/fs.h | 7
include/linux/fsnotify.h | 235 ++++++++++
include/linux/inotify.h | 118 +++++
include/linux/miscdevice.h | 5
include/linux/sched.h | 2
kernel/user.c | 2
19 files changed, 1522 insertions(+), 75 deletions(-)

diff -urN linux-2.6.10/arch/sparc64/Kconfig linux/arch/sparc64/Kconfig
--- linux-2.6.10/arch/sparc64/Kconfig 2004-12-24 16:35:25.000000000 -0500
+++ linux/arch/sparc64/Kconfig 2005-02-01 12:24:26.000000000 -0500
@@ -88,6 +88,19 @@
bool
default y

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config SMP
bool "Symmetric multi-processing support"
---help---
diff -urN linux-2.6.10/drivers/char/inotify.c linux/drivers/char/inotify.c
--- linux-2.6.10/drivers/char/inotify.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/drivers/char/inotify.c 2005-02-09 16:05:07.959265648 -0500
@@ -0,0 +1,1053 @@
+/*
+ * drivers/char/inotify.c - inode-based file event notifications
+ *
+ * Authors:
+ * John McCutchan <[email protected]>
+ * Robert Love <[email protected]>
+ *
+ * Copyright (C) 2005 John McCutchan
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/writeback.h>
+#include <linux/inotify.h>
+
+#include <asm/ioctls.h>
+
+static atomic_t inotify_cookie;
+static kmem_cache_t *watch_cachep;
+static kmem_cache_t *event_cachep;
+static kmem_cache_t *inode_data_cachep;
+
+static int sysfs_attrib_max_user_devices;
+static int sysfs_attrib_max_user_watches;
+static unsigned int sysfs_attrib_max_queued_events;
+
+/*
+ * struct inotify_device - represents an open instance of an inotify device
+ *
+ * For each inotify device, we need to keep track of the events queued on it,
+ * a list of the inodes that we are watching, and so on.
+ *
+ * This structure is protected by 'lock'. Lock ordering:
+ *
+ * dev->lock (protects dev)
+ * inode_lock (used to safely walk inode_in_use list)
+ * inode->i_lock (only needed for getting ref on inode_data)
+ */
+struct inotify_device {
+ wait_queue_head_t wait;
+ struct idr idr;
+ struct list_head events;
+ struct list_head watches;
+ spinlock_t lock;
+ unsigned int queue_size;
+ unsigned int event_count;
+ unsigned int max_events;
+ struct user_struct *user;
+};
+
+struct inotify_watch {
+ s32 wd; /* watch descriptor */
+ u32 mask; /* event mask for this watch */
+ struct inode *inode; /* associated inode */
+ struct inotify_device *dev; /* associated device */
+ struct list_head d_list; /* entry in device's list */
+ struct list_head i_list; /* entry in inotify_data's list */
+};
+
+/*
+ * A list of these is attached to each instance of the driver. In read(), this
+ * list is walked and all events that can fit in the buffer are returned.
+ */
+struct inotify_kernel_event {
+ struct inotify_event event;
+ struct list_head list;
+ char *filename;
+};
+
+static ssize_t show_max_queued_events(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%u\n", sysfs_attrib_max_queued_events);
+}
+
+static ssize_t store_max_queued_events(struct class_device *class,
+ const char *buf, size_t count)
+{
+ unsigned int max;
+
+ if (sscanf(buf, "%u", &max) > 0 && max > 0) {
+ sysfs_attrib_max_queued_events = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_devices(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", sysfs_attrib_max_user_devices);
+}
+
+static ssize_t store_max_user_devices(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ sysfs_attrib_max_user_devices = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static ssize_t show_max_user_watches(struct class_device *class, char *buf)
+{
+ return sprintf(buf, "%d\n", sysfs_attrib_max_user_watches);
+}
+
+static ssize_t store_max_user_watches(struct class_device *class,
+ const char *buf, size_t count)
+{
+ int max;
+
+ if (sscanf(buf, "%d", &max) > 0 && max > 0) {
+ sysfs_attrib_max_user_watches = max;
+ return strlen(buf);
+ }
+ return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(max_queued_events, S_IRUGO | S_IWUSR,
+ show_max_queued_events, store_max_queued_events);
+static CLASS_DEVICE_ATTR(max_user_devices, S_IRUGO | S_IWUSR,
+ show_max_user_devices, store_max_user_devices);
+static CLASS_DEVICE_ATTR(max_user_watches, S_IRUGO | S_IWUSR,
+ show_max_user_watches, store_max_user_watches);
+
+static inline void __get_inode_data(struct inotify_inode_data *data)
+{
+ atomic_inc(&data->count);
+}
+
+/*
+ * get_inode_data - pin an inotify_inode_data structure. Returns the structure
+ * if successful and NULL on failure, which can only occur if inotify_data is
+ * not yet allocated. The inode must be pinned prior to invocation.
+ */
+static inline struct inotify_inode_data * get_inode_data(struct inode *inode)
+{
+ struct inotify_inode_data *data;
+
+ spin_lock(&inode->i_lock);
+ data = inode->inotify_data;
+ if (data)
+ __get_inode_data(data);
+ spin_unlock(&inode->i_lock);
+
+ return data;
+}
+
+/*
+ * put_inode_data - drop our reference on an inotify_inode_data and the
+ * inode structure in which it lives. If the reference count on inotify_data
+ * reaches zero, free it.
+ */
+static inline void put_inode_data(struct inode *inode)
+{
+ //spin_lock(&inode->i_lock);
+ if (atomic_dec_and_test(&inode->inotify_data->count)) {
+ kmem_cache_free(inode_data_cachep, inode->inotify_data);
+ inode->inotify_data = NULL;
+ }
+ //spin_unlock(&inode->i_lock);
+}
+
+/*
+ * find_inode - resolve a user-given path to a specific inode and return a nd
+ */
+static int find_inode(const char __user *dirname, struct nameidata *nd)
+{
+ int error;
+
+ error = __user_walk(dirname, LOOKUP_FOLLOW, nd);
+ if (error)
+ return error;
+
+ /* you can only watch an inode if you have read permissions on it */
+ return permission(nd->dentry->d_inode, MAY_READ, NULL);
+}
+
+static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
+ const char *filename)
+{
+ struct inotify_kernel_event *kevent;
+
+ kevent = kmem_cache_alloc(event_cachep, GFP_ATOMIC);
+ if (!kevent)
+ return NULL;
+
+ /* we hand this out to user-space, so zero it just in case */
+ memset(&kevent->event, 0, sizeof(struct inotify_event));
+
+ kevent->event.wd = wd;
+ kevent->event.mask = mask;
+ kevent->event.cookie = cookie;
+ INIT_LIST_HEAD(&kevent->list);
+
+ if (filename) {
+ size_t len, rem, event_size = sizeof(struct inotify_event);
+
+ /*
+ * We need to pad the filename so as to properly align an
+ * array of inotify_event structures. Because the structure is
+ * small and the common case is a small filename, we just round
+ * up to the next multiple of the structure's sizeof. This is
+ * simple and safe for all architectures.
+ */
+ len = strlen(filename) + 1;
+ rem = event_size - len;
+ if (len > event_size) {
+ rem = event_size - (len % event_size);
+ if (len % event_size == 0)
+ rem = 0;
+ }
+ len += rem;
+
+ kevent->filename = kmalloc(len, GFP_ATOMIC);
+ if (!kevent->filename) {
+ kmem_cache_free(event_cachep, kevent);
+ return NULL;
+ }
+ memset(kevent->filename, 0, len);
+ strncpy(kevent->filename, filename, strlen(filename));
+ kevent->event.len = len;
+ } else {
+ kevent->event.len = 0;
+ kevent->filename = NULL;
+ }
+
+ return kevent;
+}
+
+#define list_to_inotify_kernel_event(pos) \
+ list_entry((pos), struct inotify_kernel_event, list)
+
+#define inotify_dev_get_event(dev) \
+ (list_to_inotify_kernel_event(dev->events.next))
+
+/*
+ * inotify_dev_queue_event - add a new event to the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_queue_event(struct inotify_device *dev,
+ struct inotify_watch *watch, u32 mask,
+ u32 cookie, const char *filename)
+{
+ struct inotify_kernel_event *kevent, *last;
+
+ /* drop this event if it is a dupe of the previous */
+ last = inotify_dev_get_event(dev);
+ if (dev->event_count && last->event.mask == mask &&
+ last->event.wd == watch->wd) {
+ const char *lastname = last->filename;
+
+ if (!filename && !lastname)
+ return;
+ if (filename && lastname && !strcmp(lastname, filename))
+ return;
+ }
+
+ /*
+ * the queue has already overflowed and we have already sent the
+ * Q_OVERFLOW event
+ */
+ if (dev->event_count > dev->max_events)
+ return;
+
+ /* the queue has just overflowed and we need to notify user space */
+ if (dev->event_count == dev->max_events) {
+ kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
+ goto add_event_to_queue;
+ }
+
+ kevent = kernel_event(watch->wd, mask, cookie, filename);
+
+add_event_to_queue:
+ if (!kevent)
+ return;
+
+ /* queue the event and wake up anyone waiting */
+ dev->event_count++;
+ dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
+ list_add_tail(&kevent->list, &dev->events);
+ wake_up_interruptible(&dev->wait);
+}
+
+static inline int inotify_dev_has_events(struct inotify_device *dev)
+{
+ return !list_empty(&dev->events);
+}
+
+/*
+ * inotify_dev_event_dequeue - destroy an event on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static void inotify_dev_event_dequeue(struct inotify_device *dev)
+{
+ struct inotify_kernel_event *kevent;
+
+ if (!inotify_dev_has_events(dev))
+ return;
+
+ kevent = inotify_dev_get_event(dev);
+ list_del_init(&kevent->list);
+ if (kevent->filename)
+ kfree(kevent->filename);
+
+ dev->event_count--;
+ dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
+
+ kmem_cache_free(event_cachep, kevent);
+}
+
+/*
+ * inotify_dev_get_wd - returns the next WD for use by the given dev
+ *
+ * This function can sleep.
+ */
+static int inotify_dev_get_wd(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ int ret;
+
+ if (atomic_read(&dev->user->inotify_watches) >=
+ sysfs_attrib_max_user_watches)
+ return -ENOSPC;
+
+repeat:
+ if (!idr_pre_get(&dev->idr, GFP_KERNEL))
+ return -ENOSPC;
+ spin_lock(&dev->lock);
+ ret = idr_get_new(&dev->idr, watch, &watch->wd);
+ spin_unlock(&dev->lock);
+ if (ret == -EAGAIN) /* more memory is required, try again */
+ goto repeat;
+ else if (ret) /* the idr is full! */
+ return -ENOSPC;
+
+ atomic_inc(&dev->user->inotify_watches);
+
+ return 0;
+}
+
+/*
+ * inotify_dev_put_wd - release the given WD on the given device
+ *
+ * Caller must hold dev->lock.
+ */
+static int inotify_dev_put_wd(struct inotify_device *dev, s32 wd)
+{
+ if (!dev || wd < 0)
+ return -1;
+
+ atomic_dec(&dev->user->inotify_watches);
+ idr_remove(&dev->idr, wd);
+
+ return 0;
+}
+
+/*
+ * create_watch - creates a watch on the given device.
+ *
+ * Grabs dev->lock, so the caller must not hold it.
+ */
+static struct inotify_watch *create_watch(struct inotify_device *dev,
+ u32 mask, struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
+ if (!watch)
+ return NULL;
+
+ watch->mask = mask;
+ watch->inode = inode;
+ watch->dev = dev;
+ INIT_LIST_HEAD(&watch->d_list);
+ INIT_LIST_HEAD(&watch->i_list);
+
+ if (inotify_dev_get_wd(dev, watch)) {
+ kmem_cache_free(watch_cachep, watch);
+ return NULL;
+ }
+
+ return watch;
+}
+
+/*
+ * delete_watch - removes the given 'watch' from the given 'dev'
+ *
+ * Caller must hold dev->lock.
+ */
+static void delete_watch(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ inotify_dev_put_wd(dev, watch->wd);
+ kmem_cache_free(watch_cachep, watch);
+}
+
+/*
+ * inode_find_dev - find the watch associated with the given inode and dev
+ *
+ * Caller must hold dev->lock.
+ * FIXME: Needs inotify_data->lock too. Don't need dev->lock, just pin it.
+ */
+static struct inotify_watch *inode_find_dev(struct inode *inode,
+ struct inotify_device *dev)
+{
+ struct inotify_watch *watch;
+
+ if (!inode->inotify_data)
+ return NULL;
+
+ list_for_each_entry(watch, &inode->inotify_data->watches, i_list) {
+ if (watch->dev == dev)
+ return watch;
+ }
+
+ return NULL;
+}
+
+/*
+ * dev_find_wd - given a (dev,wd) pair, returns the matching inotify_watch
+ *
+ * Returns the results of looking up (dev,wd) in the idr layer. NULL is
+ * returned on error.
+ *
+ * The caller must hold dev->lock.
+ */
+static inline struct inotify_watch *dev_find_wd(struct inotify_device *dev,
+ u32 wd)
+{
+ return idr_find(&dev->idr, wd);
+}
+
+static int inotify_dev_is_watching_inode(struct inotify_device *dev,
+ struct inode *inode)
+{
+ struct inotify_watch *watch;
+
+ list_for_each_entry(watch, &dev->watches, d_list) {
+ if (watch->inode == inode)
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * inotify_dev_add_watch - add the given watch to the given device instance
+ *
+ * Caller must hold dev->lock.
+ */
+static int inotify_dev_add_watch(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ if (!dev || !watch)
+ return -EINVAL;
+
+ list_add(&watch->d_list, &dev->watches);
+ return 0;
+}
+
+/*
+ * inotify_dev_rm_watch - remove the given watch from the given device
+ *
+ * Caller must hold dev->lock because we call inotify_dev_queue_event().
+ */
+static int inotify_dev_rm_watch(struct inotify_device *dev,
+ struct inotify_watch *watch)
+{
+ if (!watch)
+ return -EINVAL;
+
+ inotify_dev_queue_event(dev, watch, IN_IGNORED, 0, NULL);
+ list_del_init(&watch->d_list);
+
+ return 0;
+}
+
+/*
+ * inode_add_watch - add a watch to the given inode
+ *
+ * Callers must hold dev->lock, because we call inode_find_dev().
+ */
+static int inode_add_watch(struct inode *inode, struct inotify_watch *watch)
+{
+ int ret;
+
+ if (!inode || !watch)
+ return -EINVAL;
+
+ spin_lock(&inode->i_lock);
+ if (!inode->inotify_data) {
+ /* inotify_data is not attached to the inode, so add it */
+ inode->inotify_data = kmem_cache_alloc(inode_data_cachep,
+ GFP_ATOMIC);
+ if (!inode->inotify_data) {
+ ret = -ENOMEM;
+ goto out_lock;
+ }
+
+ atomic_set(&inode->inotify_data->count, 0);
+ INIT_LIST_HEAD(&inode->inotify_data->watches);
+ spin_lock_init(&inode->inotify_data->lock);
+ } else if (inode_find_dev(inode, watch->dev)) {
+ /* a watch is already associated with this (inode,dev) pair */
+ ret = -EINVAL;
+ goto out_lock;
+ }
+ __get_inode_data(inode->inotify_data);
+ spin_unlock(&inode->i_lock);
+
+ list_add(&watch->i_list, &inode->inotify_data->watches);
+
+ return 0;
+out_lock:
+ spin_unlock(&inode->i_lock);
+ return ret;
+}
+
+static int inode_rm_watch(struct inode *inode,
+ struct inotify_watch *watch)
+{
+ if (!inode || !watch || !inode->inotify_data)
+ return -EINVAL;
+
+ list_del_init(&watch->i_list);
+
+ /* clean up inode->inotify_data */
+ put_inode_data(inode);
+
+ return 0;
+}
+
+/* Kernel API */
+
+/*
+ * inotify_inode_queue_event - queue an event with the given mask, cookie, and
+ * filename to any watches associated with the given inode.
+ *
+ * inode must be pinned prior to calling.
+ */
+void inotify_inode_queue_event(struct inode *inode, u32 mask, u32 cookie,
+ const char *name)
+{
+ struct inotify_watch *watch;
+
+ if (!inode->inotify_data)
+ return;
+
+ list_for_each_entry(watch, &inode->inotify_data->watches, i_list) {
+ if (watch->mask & mask) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, mask, cookie, name);
+ spin_unlock(&dev->lock);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(inotify_inode_queue_event);
+
+void inotify_dentry_parent_queue_event(struct dentry *dentry, u32 mask,
+ u32 cookie, const char *filename)
+{
+ struct dentry *parent;
+ struct inode *inode;
+
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ inode = parent->d_inode;
+ if (inode->inotify_data) {
+ dget(parent);
+ spin_unlock(&dentry->d_lock);
+ inotify_inode_queue_event(inode, mask, cookie, filename);
+ dput(parent);
+ } else
+ spin_unlock(&dentry->d_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
+
+u32 inotify_get_cookie(void)
+{
+ /* one atomic op, so concurrent callers never get the same cookie */
+ return atomic_inc_return(&inotify_cookie);
+}
+EXPORT_SYMBOL_GPL(inotify_get_cookie);
+
+/*
+ * Caller must hold dev->lock.
+ */
+static void __remove_watch(struct inotify_watch *watch,
+ struct inotify_device *dev)
+{
+ struct inode *inode;
+
+ inode = watch->inode;
+
+ inode_rm_watch(inode, watch);
+ inotify_dev_rm_watch(dev, watch);
+ delete_watch(dev, watch);
+
+ iput(inode);
+}
+
+/*
+ * remove_watch - remove a watch from both the device and the inode.
+ *
+ * watch->inode must be pinned. We drop a reference before returning. Grabs
+ * dev->lock.
+ */
+static void remove_watch(struct inotify_watch *watch)
+{
+ struct inotify_device *dev = watch->dev;
+
+ spin_lock(&dev->lock);
+ __remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+}
+
+void inotify_super_block_umount(struct super_block *sb)
+{
+ struct inode *inode;
+
+ spin_lock(&inode_lock);
+
+ /*
+ * We hold the inode_lock, so the inodes are not going anywhere, and
+ * we grab a reference on inotify_data before walking its list of
+ * watches.
+ */
+ list_for_each_entry(inode, &inode_in_use, i_list) {
+ struct inotify_inode_data *inode_data;
+ struct inotify_watch *watch;
+
+ if (inode->i_sb != sb)
+ continue;
+
+ inode_data = get_inode_data(inode);
+ if (!inode_data)
+ continue;
+
+ list_for_each_entry(watch, &inode_data->watches, i_list) {
+ struct inotify_device *dev = watch->dev;
+ spin_lock(&dev->lock);
+ inotify_dev_queue_event(dev, watch, IN_UNMOUNT, 0,
+ NULL);
+ __remove_watch(watch, dev);
+ spin_unlock(&dev->lock);
+ }
+ put_inode_data(inode);
+ }
+
+ spin_unlock(&inode_lock);
+}
+EXPORT_SYMBOL_GPL(inotify_super_block_umount);
+
+/*
+ * inotify_inode_is_dead - an inode has been deleted, cleanup any watches
+ */
+void inotify_inode_is_dead(struct inode *inode)
+{
+ struct inotify_watch *watch, *next;
+ struct inotify_inode_data *data;
+
+ data = get_inode_data(inode);
+ if (!data)
+ return;
+ list_for_each_entry_safe(watch, next, &data->watches, i_list)
+ remove_watch(watch);
+ put_inode_data(inode);
+}
+EXPORT_SYMBOL_GPL(inotify_inode_is_dead);
+
+/* The driver interface is implemented below */
+
+static unsigned int inotify_poll(struct file *file, poll_table *wait)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+
+ poll_wait(file, &dev->wait, wait);
+
+ if (inotify_dev_has_events(dev))
+ return POLLIN | POLLRDNORM;
+
+ return 0;
+}
+
+static ssize_t inotify_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ size_t event_size;
+ struct inotify_device *dev;
+ char __user *start;
+ DECLARE_WAITQUEUE(wait, current);
+
+ start = buf;
+ dev = file->private_data;
+
+ /* We only hand out full inotify events */
+ event_size = sizeof(struct inotify_event);
+ if (count < event_size)
+ return 0;
+
+ while (1) {
+ int has_events;
+
+ spin_lock(&dev->lock);
+ has_events = inotify_dev_has_events(dev);
+ spin_unlock(&dev->lock);
+ if (has_events)
+ break;
+
+ if (file->f_flags & O_NONBLOCK)
+ return -EAGAIN;
+
+ if (signal_pending(current))
+ return -EINTR;
+
+ add_wait_queue(&dev->wait, &wait);
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ schedule();
+
+ set_current_state(TASK_RUNNING);
+ remove_wait_queue(&dev->wait, &wait);
+ }
+
+ while (count >= event_size) {
+ struct inotify_kernel_event *kevent;
+
+ spin_lock(&dev->lock);
+ if (!inotify_dev_has_events(dev)) {
+ spin_unlock(&dev->lock);
+ break;
+ }
+ kevent = inotify_dev_get_event(dev);
+ spin_unlock(&dev->lock);
+
+ /* We can't send this event, not enough space in the buffer */
+ if (event_size + kevent->event.len > count)
+ break;
+
+ /* Copy the entire event except the string to user space */
+ if (copy_to_user(buf, &kevent->event, event_size))
+ return -EFAULT;
+
+ buf += event_size;
+ count -= event_size;
+
+ /* Copy the filename to user space */
+ if (kevent->filename) {
+ if (copy_to_user(buf, kevent->filename,
+ kevent->event.len))
+ return -EFAULT;
+ buf += kevent->event.len;
+ count -= kevent->event.len;
+ }
+
+ spin_lock(&dev->lock);
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+ }
+
+ return buf - start;
+}
+
+static int inotify_open(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+ struct user_struct *user;
+ int ret;
+
+ user = get_uid(current->user);
+
+ if (atomic_read(&user->inotify_devs) >= sysfs_attrib_max_user_devices) {
+ ret = -EMFILE;
+ goto out_err;
+ }
+
+ dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
+ if (!dev) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ atomic_inc(&user->inotify_devs);
+
+ idr_init(&dev->idr);
+
+ INIT_LIST_HEAD(&dev->events);
+ INIT_LIST_HEAD(&dev->watches);
+ init_waitqueue_head(&dev->wait);
+
+ dev->event_count = 0;
+ dev->queue_size = 0;
+ dev->max_events = sysfs_attrib_max_queued_events;
+ dev->user = user;
+ spin_lock_init(&dev->lock);
+
+ file->private_data = dev;
+
+ return 0;
+out_err:
+ free_uid(user);
+ return ret;
+}
+
+/*
+ * inotify_release_all_watches - destroy all watches on a given device
+ *
+ * FIXME: We need a lock on the watch here.
+ */
+static void inotify_release_all_watches(struct inotify_device *dev)
+{
+ struct inotify_watch *watch, *next;
+
+ list_for_each_entry_safe(watch, next, &dev->watches, d_list)
+ remove_watch(watch);
+}
+
+/*
+ * inotify_release_all_events - destroy all of the events on a given device
+ */
+static void inotify_release_all_events(struct inotify_device *dev)
+{
+ spin_lock(&dev->lock);
+ while (inotify_dev_has_events(dev))
+ inotify_dev_event_dequeue(dev);
+ spin_unlock(&dev->lock);
+}
+
+static int inotify_release(struct inode *inode, struct file *file)
+{
+ struct inotify_device *dev;
+
+ dev = file->private_data;
+
+ inotify_release_all_watches(dev);
+ inotify_release_all_events(dev);
+
+ atomic_dec(&dev->user->inotify_devs);
+ free_uid(dev->user);
+
+ kfree(dev);
+
+ return 0;
+}
+
+static int inotify_add_watch(struct inotify_device *dev,
+ struct inotify_watch_request *request)
+{
+ struct inode *inode;
+ struct inotify_watch *watch;
+ struct nameidata nd;
+ int ret;
+
+ ret = find_inode((const char __user*) request->name, &nd);
+ if (ret)
+ return ret;
+
+ /* held in place by references in nd */
+ inode = nd.dentry->d_inode;
+
+ spin_lock(&dev->lock);
+
+ /*
+ * This handles the case of re-adding a directory we are already
+ * watching: we just update the mask and return the existing wd.
+ */
+ if (inotify_dev_is_watching_inode(dev, inode)) {
+ struct inotify_watch *owatch; /* the old watch */
+
+ owatch = inode_find_dev(inode, dev);
+ owatch->mask = request->mask;
+ spin_unlock(&dev->lock);
+ path_release(&nd);
+
+ return owatch->wd;
+ }
+
+ spin_unlock(&dev->lock);
+
+ watch = create_watch(dev, request->mask, inode);
+ if (!watch) {
+ path_release(&nd);
+ return -ENOSPC;
+ }
+
+ spin_lock(&dev->lock);
+
+ /* We can't add any more watches to this device */
+ if (inotify_dev_add_watch(dev, watch)) {
+ delete_watch(dev, watch);
+ spin_unlock(&dev->lock);
+ path_release(&nd);
+ return -EINVAL;
+ }
+
+ ret = inode_add_watch(inode, watch);
+ if (ret < 0) {
+ list_del_init(&watch->d_list);
+ delete_watch(dev, watch);
+ spin_unlock(&dev->lock);
+ path_release(&nd);
+ return ret;
+ }
+
+ spin_unlock(&dev->lock);
+
+ /*
+ * Demote the reference to nameidata to a reference to the inode held
+ * by the watch.
+ */
+ spin_lock(&inode_lock);
+ __iget(inode);
+ spin_unlock(&inode_lock);
+ path_release(&nd);
+
+ return watch->wd;
+}
+
+static int inotify_ignore(struct inotify_device *dev, s32 wd)
+{
+ struct inotify_watch *watch;
+ int ret = 0;
+
+ spin_lock(&dev->lock);
+ watch = dev_find_wd(dev, wd);
+ /* dev->lock stays held across __remove_watch() */
+ if (!watch) {
+ ret = -EINVAL;
+ goto out;
+ }
+ __remove_watch(watch, dev);
+
+out:
+ spin_unlock(&dev->lock);
+ return ret;
+}
+
+/*
+ * inotify_ioctl() - our device file's ioctl method
+ *
+ * The VFS serializes all of our calls via the BKL and we rely on that. We
+ * could, alternatively, grab dev->lock. Right now lower levels grab that
+ * where needed.
+ */
+static int inotify_ioctl(struct inode *ip, struct file *fp,
+ unsigned int cmd, unsigned long arg)
+{
+ struct inotify_device *dev;
+ struct inotify_watch_request request;
+ void __user *p;
+ s32 wd;
+
+ dev = fp->private_data;
+ p = (void __user *) arg;
+
+ switch (cmd) {
+ case INOTIFY_WATCH:
+ if (copy_from_user(&request, p, sizeof (request)))
+ return -EFAULT;
+ return inotify_add_watch(dev, &request);
+ case INOTIFY_IGNORE:
+ if (copy_from_user(&wd, p, sizeof (wd)))
+ return -EFAULT;
+ return inotify_ignore(dev, wd);
+ case FIONREAD:
+ return put_user(dev->queue_size, (int __user *) p);
+ default:
+ return -ENOTTY;
+ }
+}
+
+static struct file_operations inotify_fops = {
+ .owner = THIS_MODULE,
+ .poll = inotify_poll,
+ .read = inotify_read,
+ .open = inotify_open,
+ .release = inotify_release,
+ .ioctl = inotify_ioctl,
+};
+
+static struct miscdevice inotify_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "inotify",
+ .fops = &inotify_fops,
+};
+
+static int __init inotify_init(void)
+{
+ struct class_device *class;
+ int ret;
+
+ ret = misc_register(&inotify_device);
+ if (ret)
+ return ret;
+
+ sysfs_attrib_max_queued_events = 512;
+ sysfs_attrib_max_user_devices = 64;
+ sysfs_attrib_max_user_watches = 16384;
+
+ class = inotify_device.class;
+ class_device_create_file(class, &class_device_attr_max_queued_events);
+ class_device_create_file(class, &class_device_attr_max_user_devices);
+ class_device_create_file(class, &class_device_attr_max_user_watches);
+
+ atomic_set(&inotify_cookie, 0);
+
+ watch_cachep = kmem_cache_create("inotify_watch_cache",
+ sizeof(struct inotify_watch), 0, SLAB_PANIC,
+ NULL, NULL);
+
+ event_cachep = kmem_cache_create("inotify_event_cache",
+ sizeof(struct inotify_kernel_event), 0,
+ SLAB_PANIC, NULL, NULL);
+
+ inode_data_cachep = kmem_cache_create("inotify_inode_data_cache",
+ sizeof(struct inotify_inode_data), 0, SLAB_PANIC,
+ NULL, NULL);
+
+ printk(KERN_INFO "inotify device minor=%d\n", inotify_device.minor);
+
+ return 0;
+}
+
+module_init(inotify_init);
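
A note on the read path in inotify.c above: inotify_read() copies each event
as a fixed-size struct inotify_event header followed by event.len bytes of
filename, so a userspace reader has to walk the buffer record by record. A
minimal sketch of that walk, assuming the layout stays as posted. The names
struct sketch_event and sketch_count_events() are hypothetical and exist only
for illustration; they are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of struct inotify_event from this patch. */
struct sketch_event {
	int32_t  wd;		/* watch descriptor */
	uint32_t mask;		/* event mask */
	uint32_t cookie;	/* ties IN_MOVED_FROM to IN_MOVED_TO */
	size_t   len;		/* bytes of name following the header */
};

/*
 * Walk a buffer of n bytes as filled by read() and return the number of
 * complete records in it. A truncated trailing record is not counted.
 */
static size_t sketch_count_events(const char *buf, size_t n)
{
	size_t off = 0, events = 0;

	while (n - off >= sizeof(struct sketch_event)) {
		struct sketch_event ev;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&ev, buf + off, sizeof(ev));
		if (ev.len > n - off - sizeof(ev))
			break;	/* name would run past the buffer */
		off += sizeof(ev) + ev.len;
		events++;
	}
	return events;
}
```

A caller would pass the byte count returned by read() on the device as n, and
could use the same loop body to hand each header and name to its own handler.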
diff -urN linux-2.6.10/drivers/char/Kconfig linux/drivers/char/Kconfig
--- linux-2.6.10/drivers/char/Kconfig 2004-12-24 16:33:49.000000000 -0500
+++ linux/drivers/char/Kconfig 2005-01-18 16:11:08.000000000 -0500
@@ -62,6 +62,19 @@
depends on VT && !S390 && !USERMODE
default y

+config INOTIFY
+ bool "Inotify file change notification support"
+ default y
+ ---help---
+ Say Y here to enable inotify support and the /dev/inotify character
+ device. Inotify is a file change notification system and a
+ replacement for dnotify. Inotify fixes numerous shortcomings in
+ dnotify and introduces several new features. It allows monitoring
+ of both files and directories via a single open fd. Multiple file
+ events are supported.
+
+ If unsure, say Y.
+
config SERIAL_NONSTANDARD
bool "Non-standard serial port support"
---help---
diff -urN linux-2.6.10/drivers/char/Makefile linux/drivers/char/Makefile
--- linux-2.6.10/drivers/char/Makefile 2004-12-24 16:35:29.000000000 -0500
+++ linux/drivers/char/Makefile 2005-01-18 16:11:08.000000000 -0500
@@ -9,6 +9,8 @@

obj-y += mem.o random.o tty_io.o n_tty.o tty_ioctl.o

+
+obj-$(CONFIG_INOTIFY) += inotify.o
obj-$(CONFIG_LEGACY_PTYS) += pty.o
obj-$(CONFIG_UNIX98_PTYS) += pty.o
obj-y += misc.o
diff -urN linux-2.6.10/drivers/char/misc.c linux/drivers/char/misc.c
--- linux-2.6.10/drivers/char/misc.c 2004-12-24 16:35:28.000000000 -0500
+++ linux/drivers/char/misc.c 2005-01-18 16:11:08.000000000 -0500
@@ -207,10 +207,9 @@
int misc_register(struct miscdevice * misc)
{
struct miscdevice *c;
- struct class_device *class;
dev_t dev;
int err;
-
+
down(&misc_sem);
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -224,8 +223,7 @@
while (--i >= 0)
if ( (misc_minors[i>>3] & (1 << (i&7))) == 0)
break;
- if (i<0)
- {
+ if (i<0) {
up(&misc_sem);
return -EBUSY;
}
@@ -240,10 +238,10 @@
}
dev = MKDEV(MISC_MAJOR, misc->minor);

- class = class_simple_device_add(misc_class, dev,
- misc->dev, misc->name);
- if (IS_ERR(class)) {
- err = PTR_ERR(class);
+ misc->class = class_simple_device_add(misc_class, dev,
+ misc->dev, misc->name);
+ if (IS_ERR(misc->class)) {
+ err = PTR_ERR(misc->class);
goto out;
}

diff -urN linux-2.6.10/fs/attr.c linux/fs/attr.c
--- linux-2.6.10/fs/attr.c 2004-12-24 16:34:00.000000000 -0500
+++ linux/fs/attr.c 2005-01-31 15:52:37.000000000 -0500
@@ -10,7 +10,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/fcntl.h>
#include <linux/quotaops.h>
#include <linux/security.h>
@@ -103,31 +103,8 @@
out:
return error;
}
-
EXPORT_SYMBOL(inode_setattr);

-int setattr_mask(unsigned int ia_valid)
-{
- unsigned long dn_mask = 0;
-
- if (ia_valid & ATTR_UID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_GID)
- dn_mask |= DN_ATTRIB;
- if (ia_valid & ATTR_SIZE)
- dn_mask |= DN_MODIFY;
- /* both times implies a utime(s) call */
- if ((ia_valid & (ATTR_ATIME|ATTR_MTIME)) == (ATTR_ATIME|ATTR_MTIME))
- dn_mask |= DN_ATTRIB;
- else if (ia_valid & ATTR_ATIME)
- dn_mask |= DN_ACCESS;
- else if (ia_valid & ATTR_MTIME)
- dn_mask |= DN_MODIFY;
- if (ia_valid & ATTR_MODE)
- dn_mask |= DN_ATTRIB;
- return dn_mask;
-}
-
int notify_change(struct dentry * dentry, struct iattr * attr)
{
struct inode *inode = dentry->d_inode;
@@ -183,11 +160,10 @@
error = inode_setattr(inode, attr);
}
}
- if (!error) {
- unsigned long dn_mask = setattr_mask(ia_valid);
- if (dn_mask)
- dnotify_parent(dentry, dn_mask);
- }
+
+ if (!error)
+ fsnotify_change(dentry, ia_valid);
+
return error;
}

diff -urN linux-2.6.10/fs/compat.c linux/fs/compat.c
--- linux-2.6.10/fs/compat.c 2004-12-24 16:34:44.000000000 -0500
+++ linux/fs/compat.c 2005-02-04 12:07:47.000000000 -0500
@@ -35,7 +35,7 @@
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/dirent.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/highuid.h>
#include <linux/sunrpc/svc.h>
#include <linux/nfsd/nfsd.h>
@@ -1192,9 +1192,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ if (type == READ)
+ fsnotify_access(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, dentry->d_inode,
+ dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.10/fs/file_table.c linux/fs/file_table.c
--- linux-2.6.10/fs/file_table.c 2004-12-24 16:33:50.000000000 -0500
+++ linux/fs/file_table.c 2005-01-31 15:46:49.000000000 -0500
@@ -16,6 +16,7 @@
#include <linux/eventpoll.h>
#include <linux/mount.h>
#include <linux/cdev.h>
+#include <linux/fsnotify.h>

/* sysctl tunables... */
struct files_stat_struct files_stat = {
@@ -121,6 +122,9 @@
struct vfsmount *mnt = file->f_vfsmnt;
struct inode *inode = dentry->d_inode;

+
+ fsnotify_close(dentry, inode, file->f_mode, dentry->d_name.name);
+
might_sleep();
/*
* The function eventpoll_release() should be the first called
diff -urN linux-2.6.10/fs/inode.c linux/fs/inode.c
--- linux-2.6.10/fs/inode.c 2004-12-24 16:35:40.000000000 -0500
+++ linux/fs/inode.c 2005-01-18 16:11:08.000000000 -0500
@@ -130,6 +130,9 @@
#ifdef CONFIG_QUOTA
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
#endif
+#ifdef CONFIG_INOTIFY
+ inode->inotify_data = NULL;
+#endif
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_cdev = NULL;
diff -urN linux-2.6.10/fs/namei.c linux/fs/namei.c
--- linux-2.6.10/fs/namei.c 2004-12-24 16:34:30.000000000 -0500
+++ linux/fs/namei.c 2005-01-31 17:24:21.000000000 -0500
@@ -21,7 +21,7 @@
#include <linux/namei.h>
#include <linux/quotaops.h>
#include <linux/pagemap.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/smp_lock.h>
#include <linux/personality.h>
#include <linux/security.h>
@@ -1241,7 +1241,7 @@
DQUOT_INIT(dir);
error = dir->i_op->create(dir, dentry, mode, nd);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_create(dir, dentry, mode);
}
return error;
@@ -1555,7 +1555,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mknod(dir, dentry, mode, dev);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_mknod(dir, dentry, mode, dev);
}
return error;
@@ -1628,7 +1628,7 @@
DQUOT_INIT(dir);
error = dir->i_op->mkdir(dir, dentry, mode);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_mkdir(dir, dentry->d_name.name);
security_inode_post_mkdir(dir,dentry, mode);
}
return error;
@@ -1722,10 +1722,8 @@
}
}
up(&dentry->d_inode->i_sem);
- if (!error) {
- inode_dir_notify(dir, DN_DELETE);
- d_delete(dentry);
- }
+ if (!error)
+ fsnotify_rmdir(dentry, dentry->d_inode, dir);
dput(dentry);

return error;
@@ -1795,10 +1793,9 @@
up(&dentry->d_inode->i_sem);

/* We don't d_delete() NFS sillyrenamed files--they still exist. */
- if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
- d_delete(dentry);
- inode_dir_notify(dir, DN_DELETE);
- }
+ if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED))
+ fsnotify_unlink(dentry->d_inode, dir, dentry);
+
return error;
}

@@ -1872,7 +1869,7 @@
DQUOT_INIT(dir);
error = dir->i_op->symlink(dir, dentry, oldname);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, dentry->d_name.name);
security_inode_post_symlink(dir, dentry, oldname);
}
return error;
@@ -1945,7 +1942,7 @@
error = dir->i_op->link(old_dentry, dir, new_dentry);
up(&old_dentry->d_inode->i_sem);
if (!error) {
- inode_dir_notify(dir, DN_CREATE);
+ fsnotify_create(dir, new_dentry->d_name.name);
security_inode_post_link(old_dentry, dir, new_dentry);
}
return error;
@@ -2109,6 +2106,7 @@
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+ char *old_name;

if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
@@ -2130,18 +2128,18 @@
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);

+ old_name = fsnotify_oldname_init(old_dentry);
+
if (is_dir)
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
if (!error) {
- if (old_dir == new_dir)
- inode_dir_notify(old_dir, DN_RENAME);
- else {
- inode_dir_notify(old_dir, DN_DELETE);
- inode_dir_notify(new_dir, DN_CREATE);
- }
+ const char *new_name = old_dentry->d_name.name;
+ fsnotify_move(old_dir, new_dir, old_name, new_name);
}
+ fsnotify_oldname_free(old_name);
+
return error;
}

diff -urN linux-2.6.10/fs/open.c linux/fs/open.c
--- linux-2.6.10/fs/open.c 2004-12-24 16:33:50.000000000 -0500
+++ linux/fs/open.c 2005-02-02 11:26:06.000000000 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/smp_lock.h>
#include <linux/quotaops.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tty.h>
@@ -953,9 +953,14 @@
fd = get_unused_fd();
if (fd >= 0) {
struct file *f = filp_open(tmp, flags, mode);
+ struct dentry *dentry;
+
error = PTR_ERR(f);
if (IS_ERR(f))
goto out_error;
+ dentry = f->f_dentry;
+ fsnotify_open(dentry, dentry->d_inode,
+ dentry->d_name.name);
fd_install(fd, f);
}
out:
@@ -1007,7 +1012,7 @@
retval = err;
}

- dnotify_flush(filp, id);
+ fsnotify_flush(filp, id);
locks_remove_posix(filp, id);
fput(filp);
return retval;
diff -urN linux-2.6.10/fs/read_write.c linux/fs/read_write.c
--- linux-2.6.10/fs/read_write.c 2004-12-24 16:35:00.000000000 -0500
+++ linux/fs/read_write.c 2005-01-31 16:35:05.000000000 -0500
@@ -10,7 +10,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/smp_lock.h>
-#include <linux/dnotify.h>
+#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/module.h>
#include <linux/syscalls.h>
@@ -216,8 +216,11 @@
ret = file->f_op->read(file, buf, count, pos);
else
ret = do_sync_read(file, buf, count, pos);
- if (ret > 0)
- dnotify_parent(file->f_dentry, DN_ACCESS);
+ if (ret > 0) {
+ struct dentry *dentry = file->f_dentry;
+ fsnotify_access(dentry, inode,
+ dentry->d_name.name);
+ }
}
}

@@ -260,8 +263,11 @@
ret = file->f_op->write(file, buf, count, pos);
else
ret = do_sync_write(file, buf, count, pos);
- if (ret > 0)
- dnotify_parent(file->f_dentry, DN_MODIFY);
+ if (ret > 0) {
+ struct dentry *dentry = file->f_dentry;
+ fsnotify_modify(dentry, inode,
+ dentry->d_name.name);
+ }
}
}

@@ -493,9 +499,15 @@
out:
if (iov != iovstack)
kfree(iov);
- if ((ret + (type == READ)) > 0)
- dnotify_parent(file->f_dentry,
- (type == READ) ? DN_ACCESS : DN_MODIFY);
+ if ((ret + (type == READ)) > 0) {
+ struct dentry *dentry = file->f_dentry;
+ struct inode *inode = dentry->d_inode;
+
+ if (type == READ)
+ fsnotify_access(dentry, inode, dentry->d_name.name);
+ else
+ fsnotify_modify(dentry, inode, dentry->d_name.name);
+ }
return ret;
}

diff -urN linux-2.6.10/fs/super.c linux/fs/super.c
--- linux-2.6.10/fs/super.c 2004-12-24 16:34:33.000000000 -0500
+++ linux/fs/super.c 2005-01-31 14:53:38.000000000 -0500
@@ -37,9 +37,9 @@
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
#include <linux/kobject.h>
+#include <linux/fsnotify.h>
#include <asm/uaccess.h>

-
void get_filesystem(struct file_system_type *fs);
void put_filesystem(struct file_system_type *fs);
struct file_system_type *get_fs_type(const char *name);
@@ -227,6 +227,7 @@

if (root) {
sb->s_root = NULL;
+ fsnotify_sb_umount(sb);
shrink_dcache_parent(root);
shrink_dcache_anon(&sb->s_anon);
dput(root);
diff -urN linux-2.6.10/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.6.10/include/linux/fs.h 2004-12-24 16:34:27.000000000 -0500
+++ linux/include/linux/fs.h 2005-01-18 16:11:08.000000000 -0500
@@ -27,6 +27,7 @@
struct kstatfs;
struct vm_area_struct;
struct vfsmount;
+struct inotify_inode_data;

/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
@@ -473,6 +474,10 @@
struct dnotify_struct *i_dnotify; /* for directory notifications */
#endif

+#ifdef CONFIG_INOTIFY
+ struct inotify_inode_data *inotify_data;
+#endif
+
unsigned long i_state;
unsigned long dirtied_when; /* jiffies of first dirtying */

@@ -1353,7 +1358,7 @@
extern int do_remount_sb(struct super_block *sb, int flags,
void *data, int force);
extern sector_t bmap(struct inode *, sector_t);
-extern int setattr_mask(unsigned int);
+extern void setattr_mask(unsigned int, int *, u32 *);
extern int notify_change(struct dentry *, struct iattr *);
extern int permission(struct inode *, int, struct nameidata *);
extern int generic_permission(struct inode *, int,
diff -urN linux-2.6.10/include/linux/fsnotify.h linux/include/linux/fsnotify.h
--- linux-2.6.10/include/linux/fsnotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/fsnotify.h 2005-02-04 12:09:48.000000000 -0500
@@ -0,0 +1,235 @@
+#ifndef _LINUX_FS_NOTIFY_H
+#define _LINUX_FS_NOTIFY_H
+
+/*
+ * include/linux/fsnotify.h - generic hooks for filesystem notification, to
+ * reduce in-source duplication from both dnotify and inotify.
+ *
+ * We don't compile any of this away in some complicated menagerie of ifdefs.
+ * Instead, we rely on the code inside to optimize away as needed.
+ *
+ * (C) Copyright 2005 Robert Love
+ */
+
+#ifdef __KERNEL__
+
+#include <linux/dnotify.h>
+#include <linux/inotify.h>
+
+/*
+ * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
+ */
+static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
+ const char *old_name, const char *new_name)
+{
+ u32 cookie;
+
+ if (old_dir == new_dir)
+ inode_dir_notify(old_dir, DN_RENAME);
+ else {
+ inode_dir_notify(old_dir, DN_DELETE);
+ inode_dir_notify(new_dir, DN_CREATE);
+ }
+
+ cookie = inotify_get_cookie();
+
+ inotify_inode_queue_event(old_dir, IN_MOVED_FROM, cookie, old_name);
+ inotify_inode_queue_event(new_dir, IN_MOVED_TO, cookie, new_name);
+}
+
+/*
+ * fsnotify_unlink - file was unlinked
+ */
+static inline void fsnotify_unlink(struct inode *inode, struct inode *dir,
+ struct dentry *dentry)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_FILE, 0, dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_rmdir - directory was removed
+ */
+static inline void fsnotify_rmdir(struct dentry *dentry, struct inode *inode,
+ struct inode *dir)
+{
+ inode_dir_notify(dir, DN_DELETE);
+ inotify_inode_queue_event(dir, IN_DELETE_SUBDIR,0,dentry->d_name.name);
+ inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL);
+
+ inotify_inode_is_dead(inode);
+ d_delete(dentry);
+}
+
+/*
+ * fsnotify_create - filename was linked in
+ */
+static inline void fsnotify_create(struct inode *inode, const char *filename)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_FILE, 0, filename);
+}
+
+/*
+ * fsnotify_mkdir - directory 'name' was created
+ */
+static inline void fsnotify_mkdir(struct inode *inode, const char *name)
+{
+ inode_dir_notify(inode, DN_CREATE);
+ inotify_inode_queue_event(inode, IN_CREATE_SUBDIR, 0, name);
+}
+
+/*
+ * fsnotify_access - file was read
+ */
+static inline void fsnotify_access(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_ACCESS);
+ inotify_dentry_parent_queue_event(dentry, IN_ACCESS, 0,
+ filename);
+ inotify_inode_queue_event(inode, IN_ACCESS, 0, NULL);
+}
+
+/*
+ * fsnotify_modify - file was modified
+ */
+static inline void fsnotify_modify(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ dnotify_parent(dentry, DN_MODIFY);
+ inotify_dentry_parent_queue_event(dentry, IN_MODIFY, 0, filename);
+ inotify_inode_queue_event(inode, IN_MODIFY, 0, NULL);
+}
+
+/*
+ * fsnotify_open - file was opened
+ */
+static inline void fsnotify_open(struct dentry *dentry, struct inode *inode,
+ const char *filename)
+{
+ inotify_inode_queue_event(inode, IN_OPEN, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, IN_OPEN, 0, filename);
+}
+
+/*
+ * fsnotify_close - file was closed
+ */
+static inline void fsnotify_close(struct dentry *dentry, struct inode *inode,
+ mode_t mode, const char *filename)
+{
+ u32 mask;
+
+ mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+ inotify_dentry_parent_queue_event(dentry, mask, 0, filename);
+ inotify_inode_queue_event(inode, mask, 0, NULL);
+}
+
+/*
+ * fsnotify_change - notify_change event. file was modified and/or metadata
+ * was changed.
+ */
+static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
+{
+ int dn_mask = 0;
+ u32 in_mask = 0;
+
+ if (ia_valid & ATTR_UID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_GID) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+ if (ia_valid & ATTR_SIZE) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ /* both times implies a utime(s) call */
+ if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
+ {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ } else if (ia_valid & ATTR_ATIME) {
+ in_mask |= IN_ACCESS;
+ dn_mask |= DN_ACCESS;
+ } else if (ia_valid & ATTR_MTIME) {
+ in_mask |= IN_MODIFY;
+ dn_mask |= DN_MODIFY;
+ }
+ if (ia_valid & ATTR_MODE) {
+ in_mask |= IN_ATTRIB;
+ dn_mask |= DN_ATTRIB;
+ }
+
+ if (dn_mask)
+ dnotify_parent(dentry, dn_mask);
+ if (in_mask) {
+ inotify_inode_queue_event(dentry->d_inode, in_mask, 0, NULL);
+ inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+ dentry->d_name.name);
+ }
+}
+
+/*
+ * fsnotify_sb_umount - filesystem unmount
+ */
+static inline void fsnotify_sb_umount(struct super_block *sb)
+{
+ inotify_super_block_umount(sb);
+}
+
+/*
+ * fsnotify_flush - file is being closed; forwards to dnotify_flush()
+ */
+static inline void fsnotify_flush(struct file *filp, fl_owner_t id)
+{
+ dnotify_flush(filp, id);
+}
+
+#ifdef CONFIG_INOTIFY /* inotify helpers */
+
+/*
+ * fsnotify_oldname_init - save off the old filename before we change it
+ *
+ * this could be kstrdup if only we could add that to lib/string.c
+ */
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ char *old_name;
+
+ old_name = kmalloc(strlen(old_dentry->d_name.name) + 1, GFP_KERNEL);
+ if (old_name)
+ strcpy(old_name, old_dentry->d_name.name);
+ return old_name;
+}
+
+/*
+ * fsnotify_oldname_free - free the name we got from fsnotify_oldname_init
+ */
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+ kfree(old_name);
+}
+
+#else /* CONFIG_INOTIFY */
+
+static inline char *fsnotify_oldname_init(struct dentry *old_dentry)
+{
+ return NULL;
+}
+
+static inline void fsnotify_oldname_free(const char *old_name)
+{
+}
+
+#endif /* ! CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_FS_NOTIFY_H */
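
For readers checking the attribute mapping in fsnotify_change() above: the
only non-obvious rule is that setting both timestamps at once is treated as
a utime(s) call and reported as an attribute change, while a lone atime or
mtime update is reported as access or modify. A standalone sketch of the
inotify half of that mapping follows. The SK_ATTR_* values are illustrative
stand-ins rather than the kernel's ATTR_* definitions, the SK_IN_* values
are copied from the inotify.h in this patch, and sk_attr_to_in_mask() is a
hypothetical name.

```c
#include <stdint.h>

/* Stand-in attribute bits; the values are illustrative only. */
#define SK_ATTR_MODE	0x01u
#define SK_ATTR_UID	0x02u
#define SK_ATTR_GID	0x04u
#define SK_ATTR_SIZE	0x08u
#define SK_ATTR_ATIME	0x10u
#define SK_ATTR_MTIME	0x20u

/* Event bits as defined in include/linux/inotify.h in this patch. */
#define SK_IN_ACCESS	0x00000001u
#define SK_IN_MODIFY	0x00000002u
#define SK_IN_ATTRIB	0x00000004u

/* Translate a set of changed attributes into an inotify event mask. */
static uint32_t sk_attr_to_in_mask(unsigned int ia_valid)
{
	const unsigned int both = SK_ATTR_ATIME | SK_ATTR_MTIME;
	uint32_t mask = 0;

	if (ia_valid & (SK_ATTR_UID | SK_ATTR_GID | SK_ATTR_MODE))
		mask |= SK_IN_ATTRIB;
	if (ia_valid & SK_ATTR_SIZE)
		mask |= SK_IN_MODIFY;

	if ((ia_valid & both) == both)
		mask |= SK_IN_ATTRIB;	/* both times: treated as utime(s) */
	else if (ia_valid & SK_ATTR_ATIME)
		mask |= SK_IN_ACCESS;
	else if (ia_valid & SK_ATTR_MTIME)
		mask |= SK_IN_MODIFY;

	return mask;
}
```

The dnotify half follows the same shape with DN_* bits in place of IN_* bits.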
diff -urN linux-2.6.10/include/linux/inotify.h linux/include/linux/inotify.h
--- linux-2.6.10/include/linux/inotify.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/inotify.h 2005-02-09 16:02:58.291978072 -0500
@@ -0,0 +1,118 @@
+/*
+ * Inode based directory notification for Linux
+ *
+ * Copyright (C) 2005 John McCutchan
+ */
+
+#ifndef _LINUX_INOTIFY_H
+#define _LINUX_INOTIFY_H
+
+#include <linux/types.h>
+#include <linux/limits.h>
+
+/*
+ * struct inotify_event - structure read from the inotify device for each event
+ *
+ * When watching a directory, you receive the filename (relative to the wd)
+ * for events such as IN_CREATE_FILE, IN_DELETE_FILE, IN_OPEN, IN_CLOSE, ...
+ */
+struct inotify_event {
+ __s32 wd; /* watch descriptor */
+ __u32 mask; /* watch mask */
+ __u32 cookie; /* cookie used for synchronizing two events */
+ size_t len; /* length (including NULs) of name */
+ char name[0]; /* stub for possible name */
+};
+
+/*
+ * struct inotify_watch_request - represents a watch request
+ *
+ * Pass to the inotify device via the INOTIFY_WATCH ioctl
+ */
+struct inotify_watch_request {
+ char *name; /* directory name */
+ __u32 mask; /* event mask */
+};
+
+/* the following are legal, implemented events */
+#define IN_ACCESS 0x00000001 /* File was accessed */
+#define IN_MODIFY 0x00000002 /* File was modified */
+#define IN_ATTRIB 0x00000004 /* File changed attributes */
+#define IN_CLOSE_WRITE 0x00000008 /* Writable file was closed */
+#define IN_CLOSE_NOWRITE 0x00000010 /* Unwritable file closed */
+#define IN_OPEN 0x00000020 /* File was opened */
+#define IN_MOVED_FROM 0x00000040 /* File was moved from X */
+#define IN_MOVED_TO 0x00000080 /* File was moved to Y */
+#define IN_DELETE_SUBDIR 0x00000100 /* Subdir was deleted */
+#define IN_DELETE_FILE 0x00000200 /* Subfile was deleted */
+#define IN_CREATE_SUBDIR 0x00000400 /* Subdir was created */
+#define IN_CREATE_FILE 0x00000800 /* Subfile was created */
+#define IN_DELETE_SELF 0x00001000 /* Self was deleted */
+#define IN_UNMOUNT 0x00002000 /* Backing fs was unmounted */
+#define IN_Q_OVERFLOW 0x00004000 /* Event queue overflowed */
+#define IN_IGNORED 0x00008000 /* File was ignored */
+
+/* special flags */
+#define IN_ALL_EVENTS 0xffffffff /* All the events */
+#define IN_CLOSE (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
+
+#define INOTIFY_IOCTL_MAGIC 'Q'
+#define INOTIFY_IOCTL_MAXNR 2
+
+#define INOTIFY_WATCH _IOR(INOTIFY_IOCTL_MAGIC, 1, struct inotify_watch_request)
+#define INOTIFY_IGNORE _IOR(INOTIFY_IOCTL_MAGIC, 2, int)
+
+#ifdef __KERNEL__
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/config.h>
+
+struct inotify_inode_data {
+ struct list_head watches; /* list of watches on this inode */
+ spinlock_t lock; /* lock protecting the struct */
+ atomic_t count; /* ref count */
+};
+
+#ifdef CONFIG_INOTIFY
+
+extern void inotify_inode_queue_event(struct inode *, __u32, __u32,
+ const char *);
+extern void inotify_dentry_parent_queue_event(struct dentry *, __u32, __u32,
+ const char *);
+extern void inotify_super_block_umount(struct super_block *);
+extern void inotify_inode_is_dead(struct inode *);
+extern __u32 inotify_get_cookie(void);
+
+#else
+
+static inline void inotify_inode_queue_event(struct inode *inode,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_dentry_parent_queue_event(struct dentry *dentry,
+ __u32 mask, __u32 cookie,
+ const char *filename)
+{
+}
+
+static inline void inotify_super_block_umount(struct super_block *sb)
+{
+}
+
+static inline void inotify_inode_is_dead(struct inode *inode)
+{
+}
+
+static inline __u32 inotify_get_cookie(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_INOTIFY */
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_INOTIFY_H */
diff -urN linux-2.6.10/include/linux/miscdevice.h linux/include/linux/miscdevice.h
--- linux-2.6.10/include/linux/miscdevice.h 2004-12-24 16:34:58.000000000 -0500
+++ linux/include/linux/miscdevice.h 2005-01-18 16:11:08.000000000 -0500
@@ -2,6 +2,7 @@
#define _LINUX_MISCDEVICE_H
#include <linux/module.h>
#include <linux/major.h>
+#include <linux/device.h>

#define PSMOUSE_MINOR 1
#define MS_BUSMOUSE_MINOR 2
@@ -32,13 +33,13 @@

struct device;

-struct miscdevice
-{
+struct miscdevice {
int minor;
const char *name;
struct file_operations *fops;
struct list_head list;
struct device *dev;
+ struct class_device *class;
char devfs_name[64];
};

diff -urN linux-2.6.10/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.6.10/include/linux/sched.h 2004-12-24 16:33:59.000000000 -0500
+++ linux/include/linux/sched.h 2005-01-18 16:11:08.000000000 -0500
@@ -353,6 +353,8 @@
atomic_t processes; /* How many processes does this user have? */
atomic_t files; /* How many open files does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
+ atomic_t inotify_watches; /* How many inotify watches does this user have? */
+ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
/* protected by mq_lock */
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
unsigned long locked_shm; /* How many pages of mlocked shm ? */
diff -urN linux-2.6.10/kernel/user.c linux/kernel/user.c
--- linux-2.6.10/kernel/user.c 2004-12-24 16:34:31.000000000 -0500
+++ linux/kernel/user.c 2005-01-18 16:11:08.000000000 -0500
@@ -119,6 +119,8 @@
atomic_set(&new->processes, 0);
atomic_set(&new->files, 0);
atomic_set(&new->sigpending, 0);
+ atomic_set(&new->inotify_watches, 0);
+ atomic_set(&new->inotify_devs, 0);

new->mq_bytes = 0;
new->locked_shm = 0;
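
For readers following the header above from user space: struct inotify_event ends in a zero-length name[] and carries a len field, which suggests records of variable size. The following is only a sketch of how a reader might walk a buffer of such records. It assumes, without confirmation from the patch, that records are packed back to back and that exactly len bytes of name follow each fixed header. struct event_hdr, walk_events(), put_event() and tally_cb() are hypothetical names made up for this illustration, and the two IN_* values are copied from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the proposed header. */
#define IN_DELETE_FILE 0x00000200u
#define IN_CREATE_FILE 0x00000800u

/* Fixed part of a record; the name bytes are ASSUMED to follow directly. */
struct event_hdr {
	int32_t  wd;     /* watch descriptor */
	uint32_t mask;   /* event mask */
	uint32_t cookie; /* pairs up related events */
	size_t   len;    /* length of name, including nulls; 0 if none */
};

typedef void (*event_cb)(const struct event_hdr *h, const char *name, void *ctx);

/*
 * Walk n bytes of records, calling cb once per complete record.
 * Returns the number of records seen, or -1 if the buffer ends
 * in the middle of a header or a name.
 */
static int walk_events(const unsigned char *buf, size_t n, event_cb cb, void *ctx)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL);
	while (off < n) {
		struct event_hdr h;

		if (n - off < sizeof(h))
			return -1;
		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (h.len > n - off)
			return -1;
		cb(&h, (const char *)(buf + off), ctx);
		off += h.len;
		count++;
	}
	return count;
}

/* Test helper: append one record at offset off, return the new offset. */
static size_t put_event(unsigned char *buf, size_t off, int32_t wd,
			uint32_t mask, uint32_t cookie, const char *name)
{
	struct event_hdr h = { wd, mask, cookie, name ? strlen(name) + 1 : 0 };

	memcpy(buf + off, &h, sizeof(h));
	off += sizeof(h);
	if (h.len)
		memcpy(buf + off, name, h.len);
	return off + h.len;
}

/* Example consumer: count creates and deletes, remember the last name. */
struct tally {
	int created;
	int deleted;
	char last_name[64];
};

static void tally_cb(const struct event_hdr *h, const char *name, void *ctx)
{
	struct tally *t = ctx;
	size_t copy = h->len < sizeof(t->last_name) ? h->len : sizeof(t->last_name) - 1;

	if (h->mask & IN_CREATE_FILE)
		t->created++;
	if (h->mask & IN_DELETE_FILE)
		t->deleted++;
	if (copy) {
		memcpy(t->last_name, name, copy);
		t->last_name[copy] = '\0';
	}
}
```

Because len is a size_t, the header size differs between 32-bit and 64-bit builds; the sketch sidesteps that by using sizeof on the locally defined struct rather than a hard-coded offset.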


2005-02-18 17:24:27

by Al Viro

Subject: Re: [patch] inotify for 2.6.11-rc3-mm2

On Fri, Feb 18, 2005 at 11:40:59AM -0500, Robert Love wrote:
> inotify, bitches

/me does "pick a random function, find a race" again.

> +/*
> + * inode_add_watch - add a watch to the given inode
> + *
> + * Callers must hold dev->lock, because we call inode_find_dev().
> + */
> +static int inode_add_watch(struct inode *inode, struct inotify_watch *watch)
[snip]
> + list_add(&watch->i_list, &inode->inotify_data->watches);
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
... and that is protected by what?
> +
> + return 0;

Fix the damn locking, already.
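
To make the objection concrete: the quoted comment only requires dev->lock, while the list being modified hangs off the inode, and the patch's struct inotify_inode_data carries its own "lock protecting the struct". One possible reading is that the list_add() should happen under that per-inode lock. The sketch below illustrates the pattern in user space only; a pthread mutex stands in for the spinlock, a hand-rolled singly linked list stands in for list_head, and struct inode_data, struct watch, inode_add_watch() and run_demo() are hypothetical stand-ins, not the kernel code or the eventual fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_THREADS 4
#define PER_THREAD 1000

/* Stand-in for a watch linked onto an inode's list. */
struct watch {
	struct watch *next;
};

/* Stand-in for the per-inode data: the lock guards the list. */
struct inode_data {
	pthread_mutex_t lock;
	struct watch *watches;
};

/* Insert under the lock that belongs to the list's owner. */
static void inode_add_watch(struct inode_data *d, struct watch *w)
{
	assert(d != NULL && w != NULL);
	pthread_mutex_lock(&d->lock);
	w->next = d->watches;
	d->watches = w;
	pthread_mutex_unlock(&d->lock);
}

static int count_watches(struct inode_data *d)
{
	int n = 0;

	pthread_mutex_lock(&d->lock);
	for (struct watch *w = d->watches; w; w = w->next)
		n++;
	pthread_mutex_unlock(&d->lock);
	return n;
}

struct adder_arg {
	struct inode_data *d;
	struct watch *pool;
};

/* Each thread plays the role of a different device adding watches. */
static void *adder(void *p)
{
	struct adder_arg *a = p;

	for (int i = 0; i < PER_THREAD; i++)
		inode_add_watch(a->d, &a->pool[i]);
	return NULL;
}

/* Returns the number of watches on the list, or -1 if any were lost. */
static int run_demo(void)
{
	static struct watch pool[NR_THREADS][PER_THREAD];
	struct inode_data d = { .watches = NULL };
	struct adder_arg args[NR_THREADS];
	pthread_t tid[NR_THREADS];
	int started = 0;
	int n;

	pthread_mutex_init(&d.lock, NULL);
	for (int i = 0; i < NR_THREADS; i++) {
		args[i].d = &d;
		args[i].pool = pool[i];
		if (pthread_create(&tid[i], NULL, adder, &args[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	n = count_watches(&d);
	pthread_mutex_destroy(&d.lock);
	return n == started * PER_THREAD ? n : -1;
}
```

If each thread instead held only a private lock of its own (the analogue of a per-device lock), two threads could update d->watches at the same time and insertions could be lost.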

2005-02-18 17:54:55

by Robert Love

Subject: Re: [patch] inotify for 2.6.11-rc3-mm2

On Fri, 2005-02-18 at 17:24 +0000, Al Viro wrote:

> Fix the damn locking, already.

Fast as I can.

Robert Love