2009-06-13 18:36:25

by Bart Trojanowski

Subject: [v2.6.30 nfs+fscache] kswapd1: blocked for more than 120 seconds

Hi,

I ran into an fscache lockup. The code seems to be waiting for a page
write event that never comes. I am CC'ing the nfs and MM lists just in
case there is something obvious here.

More info follows; let me know if you need anything else. I'll leave
the box in this state until the kswapd hang (D state) causes me other
issues :)

I am running:
- on an 8-way amd64 system;
- linux v2.6.30 with the following enabled:
CONFIG_PREEMPT_VOLUNTARY=y
...
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
CONFIG_FSCACHE_HISTOGRAM=y
CONFIG_CACHEFILES=m
CONFIG_CACHEFILES_HISTOGRAM=y
...
CONFIG_NFS_FSCACHE=y
(see attached config for the rest)
- cachefilesd_0.9-2 (from debian/testing)
- nfs-utils-1.2.0-2 (from debian/testing)
- my $HOME dir is mounted with rw,nodev,fsc,udp,rsize=32768,wsize=32768,soft,intr,nfsvers=3
- my cachefilesd uses xfs on MD raid0 volume over two SATA disks

I installed and set up everything yesterday and was pretty happy with
how it worked. Today, I thought I would look at the /proc/fs/fscache/
stats. More specifically, I ran 'watch -n1 -d cat /proc/fs/fscache/*' in
one terminal. In another terminal I decided to time catting mail/**/*
to /dev/null before and after seeding the cache (and a remount). I
didn't get that far: cat hung in a D state. dmesg shows:

INFO: task kswapd0:423 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd0 D 0000000000000000 0 423 2
ffff88023e5a3950 0000000000000046 ffff88023e5a38c0 0000000000000002
0000000000000002 ffffe20005368108 0000000000012cc0 000000000000d4a8
ffff88023e5a8000 ffff88023f0bb160 ffff88023e5a8598 000000028035a005
Call Trace:
[<ffffffff8022653f>] ? default_spin_lock_flags+0x9/0xe
[<ffffffff805890de>] schedule+0xe/0x22
[<ffffffffa0171e24>] __fscache_wait_on_page_write+0x98/0xb4 [fscache]
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffffa01c1216>] nfs_fscache_release_page+0x53/0xcb [nfs]
[<ffffffffa019ca5c>] nfs_release_page+0x41/0x4a [nfs]
[<ffffffff80299fd8>] try_to_release_page+0x34/0x3d
[<ffffffff802a59bc>] shrink_page_list+0x4db/0x681
[<ffffffff802a4a34>] ? isolate_pages_global+0x192/0x232
[<ffffffff80374233>] ? kmem_free+0x2b/0x34
[<ffffffff802a61d7>] shrink_list+0x2af/0x5da
[<ffffffff80235a5d>] ? update_curr+0x75/0x152
[<ffffffff8023854b>] ? enqueue_entity+0x22b/0x234
[<ffffffff8022653f>] ? default_spin_lock_flags+0x9/0xe
[<ffffffff802a67a4>] shrink_zone+0x2a2/0x34f
[<ffffffff803b2173>] ? __up_read+0x92/0x9b
[<ffffffff802a6998>] ? shrink_slab+0x147/0x159
[<ffffffff802a70d3>] kswapd+0x473/0x63c
[<ffffffff802a48a2>] ? isolate_pages_global+0x0/0x232
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff802a6c60>] ? kswapd+0x0/0x63c
[<ffffffff802a6c60>] ? kswapd+0x0/0x63c
[<ffffffff80258783>] kthread+0x5b/0x88
[<ffffffff8020cd4a>] child_rip+0xa/0x20
[<ffffffff80258728>] ? kthread+0x0/0x88
[<ffffffff8020cd40>] ? child_rip+0x0/0x20

INFO: task kswapd1:424 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kswapd1 D 0000000000000000 0 424 2
ffff88023e5b1950 0000000000000046 0000000000000082 ffff88038a92aea0
0000000000000002 ffffffff8020c6ce 0000000000012cc0 000000000000d4a8
ffff88023e5a98b0 ffff88023f103160 ffff88023e5a9e48 000000073e5a98b0
Call Trace:
[<ffffffff8020c6ce>] ? common_interrupt+0xe/0x13
[<ffffffff8022653f>] ? default_spin_lock_flags+0x9/0xe
[<ffffffff805890de>] schedule+0xe/0x22
[<ffffffffa0171e24>] __fscache_wait_on_page_write+0x98/0xb4 [fscache]
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffffa0218276>] ? cachefiles_uncache_page+0x16/0x1a [cachefiles]
[<ffffffffa01c1216>] nfs_fscache_release_page+0x53/0xcb [nfs]
[<ffffffffa019ca5c>] nfs_release_page+0x41/0x4a [nfs]
[<ffffffff80299fd8>] try_to_release_page+0x34/0x3d
[<ffffffff802a59bc>] shrink_page_list+0x4db/0x681
[<ffffffff802a61d7>] shrink_list+0x2af/0x5da
[<ffffffff80235a5d>] ? update_curr+0x75/0x152
[<ffffffff80236416>] ? dequeue_entity+0x20/0x1e4
[<ffffffff8020a654>] ? __switch_to+0xb4/0x273
[<ffffffff8023636b>] ? set_next_entity+0x99/0xea
[<ffffffff8023f5b5>] ? finish_task_switch+0x57/0xf0
[<ffffffff802a67a4>] shrink_zone+0x2a2/0x34f
[<ffffffff803b2173>] ? __up_read+0x92/0x9b
[<ffffffff802a70d3>] kswapd+0x473/0x63c
[<ffffffff802a48a2>] ? isolate_pages_global+0x0/0x232
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff802a6c60>] ? kswapd+0x0/0x63c
[<ffffffff802a6c60>] ? kswapd+0x0/0x63c
[<ffffffff80258783>] kthread+0x5b/0x88
[<ffffffff8020cd4a>] child_rip+0xa/0x20
[<ffffffff80258728>] ? kthread+0x0/0x88
[<ffffffff8020cd40>] ? child_rip+0x0/0x20

INFO: task cat:6151 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cat D 0000000000000000 0 6151 6021
ffff8803c41997d8 0000000000000086 00000002c4199738 ffff88043e676c00
0000000000000002 ffff88043e676ce8 0000000000012cc0 000000000000d4a8
ffff88043e56ca10 ffff88023f0e8000 ffff88043e56cfa8 00000004000002b2
Call Trace:
[<ffffffff8058b097>] ? _spin_lock+0xe/0x12
[<ffffffff805890de>] schedule+0xe/0x22
[<ffffffff8058af12>] __down_read+0xa8/0xc2
[<ffffffffa02161de>] ? cachefiles_has_space+0x43/0x187 [cachefiles]
[<ffffffff8058a008>] down_read+0x1e/0x22
[<ffffffff80359282>] xfs_ilock+0x37/0x62
[<ffffffff80375c3e>] xfs_vm_bmap+0x2f/0x6c
[<ffffffffa02187a2>] cachefiles_read_or_alloc_pages+0x1df/0x95e [cachefiles]
[<ffffffff8029e89a>] ? __rmqueue+0x24/0x1f3
[<ffffffff8029eaee>] ? rmqueue_bulk+0x85/0x99
[<ffffffff802abb6b>] ? zone_statistics+0x65/0x6a
[<ffffffff80258ac0>] ? bit_waitqueue+0x17/0xa8
[<ffffffff80258ba7>] ? wake_up_bit+0x23/0x28
[<ffffffffa0170666>] ? fscache_run_op+0x2d/0x47 [fscache]
[<ffffffffa01718f3>] __fscache_read_or_alloc_pages+0x1d2/0x24f [fscache]
[<ffffffffa01c0e7a>] __nfs_readpages_from_fscache+0x84/0x171 [nfs]
[<ffffffffa01a755e>] nfs_readpages+0x11f/0x1d4 [nfs]
[<ffffffff802c182c>] ? alloc_pages_current+0xbe/0xc7
[<ffffffff802a2e56>] __do_page_cache_readahead+0x10d/0x197
[<ffffffff802a314f>] ondemand_readahead+0x17d/0x18f
[<ffffffff802a31db>] page_cache_async_readahead+0x7a/0x86
[<ffffffff8029c191>] generic_file_aio_read+0x27c/0x5c2
[<ffffffffa019d941>] nfs_file_read+0xeb/0xfe [nfs]
[<ffffffff802ccc62>] do_sync_read+0xec/0x132
[<ffffffff802121e6>] ? native_sched_clock+0x32/0x60
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff80589056>] ? thread_return+0x41/0xbb
[<ffffffff8038da48>] ? security_file_permission+0x16/0x18
[<ffffffff802cd85a>] vfs_read+0xb0/0x159
[<ffffffff802cd9d1>] sys_read+0x4c/0x74
[<ffffffff8020bd72>] system_call_fastpath+0x16/0x1b

INFO: task kslowd:6259 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kslowd D 0000000000000000 0 6259 2
ffff8802a8991420 0000000000000046 0000000000000082 ffff88038a92aaa0
0000000000000002 0000000000000010 0000000000012cc0 000000000000d4a8
ffff88042a9798b0 ffff88023f0f4a10 ffff88042a979e48 00000006803b1545
Call Trace:
[<ffffffff8022653f>] ? default_spin_lock_flags+0x9/0xe
[<ffffffff805890de>] schedule+0xe/0x22
[<ffffffffa0171e24>] __fscache_wait_on_page_write+0x98/0xb4 [fscache]
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffffa01c1216>] nfs_fscache_release_page+0x53/0xcb [nfs]
[<ffffffffa019ca5c>] nfs_release_page+0x41/0x4a [nfs]
[<ffffffff80299fd8>] try_to_release_page+0x34/0x3d
[<ffffffff802a59bc>] shrink_page_list+0x4db/0x681
[<ffffffff802a61d7>] shrink_list+0x2af/0x5da
[<ffffffff8022653f>] ? default_spin_lock_flags+0x9/0xe
[<ffffffff803b20d2>] ? __up_write+0x115/0x124
[<ffffffff802a67a4>] shrink_zone+0x2a2/0x34f
[<ffffffff802601f2>] ? getnstimeofday+0x5c/0xb8
[<ffffffff802a78c2>] try_to_free_pages+0x26f/0x3db
[<ffffffff802a48a2>] ? isolate_pages_global+0x0/0x232
[<ffffffff803b0799>] ? __prop_inc_single+0x37/0x3c
[<ffffffff802a1047>] __alloc_pages_internal+0x29b/0x448
[<ffffffff802c182c>] alloc_pages_current+0xbe/0xc7
[<ffffffff802c62b4>] new_slab+0x5c/0x287
[<ffffffff802c66e7>] __slab_alloc+0x208/0x3e9
[<ffffffff80374188>] ? kmem_zone_alloc+0x6c/0xb7
[<ffffffff80374188>] ? kmem_zone_alloc+0x6c/0xb7
[<ffffffff802c6bce>] kmem_cache_alloc+0x7c/0xf1
[<ffffffff80374188>] kmem_zone_alloc+0x6c/0xb7
[<ffffffff803741e7>] kmem_zone_zalloc+0x14/0x35
[<ffffffff8036db8d>] xfs_trans_dup+0x20/0xec
[<ffffffff8035c210>] xfs_itruncate_finish+0x1d6/0x28a
[<ffffffff80371f7d>] xfs_free_eofblocks+0x1c9/0x20f
[<ffffffff802ccb30>] ? do_sync_write+0xec/0x132
[<ffffffff80372953>] xfs_release+0x182/0x193
[<ffffffff803785e9>] xfs_file_release+0x15/0x19
[<ffffffff802cdf6d>] __fput+0xe4/0x1a5
[<ffffffff802ce04b>] fput+0x1d/0x1f
[<ffffffffa021842e>] cachefiles_write_page+0x1b4/0x205 [cachefiles]
[<ffffffffa0171421>] fscache_write_op+0xd9/0x17f [fscache]
[<ffffffffa016ffb6>] fscache_op_execute+0x41/0x6e [fscache]
[<ffffffff80299bbe>] slow_work_thread+0x280/0x44a
[<ffffffff8023cf59>] ? default_wake_function+0x12/0x14
[<ffffffff80258bac>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff8029993e>] ? slow_work_thread+0x0/0x44a
[<ffffffff8029993e>] ? slow_work_thread+0x0/0x44a
[<ffffffff80258783>] kthread+0x5b/0x88
[<ffffffff8020cd4a>] child_rip+0xa/0x20
[<ffffffff80258728>] ? kthread+0x0/0x88
[<ffffffff8020cd40>] ? child_rip+0x0/0x20

... and then it just repeats the stuff above.

-Bart

--
WebSig: http://www.jukie.net/~bart/sig/



2009-06-14 14:15:13

by Bart Trojanowski

Subject: Re: [v2.6.30 nfs+fscache] swapper: possible circular locking dependency detected

Hi,

Yesterday [1] I reported kswapd hanging in D state when using fscache
for nfs.

[1] http://article.gmane.org/gmane.linux.kernel.mm/35601

Since then I rebuilt the kernel with lockdep and booted to find out that
by default...

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.

... lockdep wasn't going to help. I increased the value by 128 (hence
the 'dirty' in the version) and now I am seeing the following report
shortly after boot.
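
For reference, the bump itself amounts to something like this (assuming the
constant is still the 48UL defined in include/linux/sched.h in this tree):

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ ... @@
-# define MAX_LOCK_DEPTH 48UL
+# define MAX_LOCK_DEPTH 176UL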

If you scroll down past the locking report, you'll see 'CacheFiles:
Lookup failed'. This shows up after heavy use (with cat) of my nfs
filesystem with fscache enabled. Around that time I saw my user space
program report 'Permission denied'.

Let me know if you need more info.

-Bart

---- 8< ----

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.30-kvm3-dirty #4
-------------------------------------------------------
swapper/0 is trying to acquire lock:
(&cwq->lock){..-...}, at: [<ffffffff80256c37>] __queue_work+0x1d/0x43

but task is already holding lock:
(&q->lock){-.-.-.}, at: [<ffffffff80235b6a>] __wake_up+0x27/0x55

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&q->lock){-.-.-.}:
[<ffffffff8026b7f6>] __lock_acquire+0x1350/0x16b4
[<ffffffff8026bc21>] lock_acquire+0xc7/0xf3
[<ffffffff805a22e1>] _spin_lock_irqsave+0x4f/0x86
[<ffffffff80235b6a>] __wake_up+0x27/0x55
[<ffffffff8025620b>] insert_work+0x9a/0xa6
[<ffffffff80256c49>] __queue_work+0x2f/0x43
[<ffffffff80256cec>] queue_work_on+0x4a/0x53
[<ffffffff80256e49>] queue_work+0x1f/0x21
[<ffffffff80255a2c>] call_usermodehelper_exec+0x8e/0xbc
[<ffffffff803be29d>] kobject_uevent_env+0x3b0/0x3ee
[<ffffffff803be2e6>] kobject_uevent+0xb/0xd
[<ffffffff803bd796>] kset_register+0x37/0x3f
[<ffffffff804402f7>] bus_register+0xf3/0x280
[<ffffffff80a2b7af>] platform_bus_init+0x2c/0x44
[<ffffffff80a2b865>] driver_init+0x1d/0x29
[<ffffffff80a04663>] kernel_init+0x123/0x18c
[<ffffffff8020ce8a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&cwq->lock){..-...}:
[<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

1 lock held by swapper/0:
#0: (&q->lock){-.-.-.}, at: [<ffffffff80235b6a>] __wake_up+0x27/0x55

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.30-kvm3-dirty #4
Call Trace:
<IRQ> [<ffffffff80269ffe>] print_circular_bug_tail+0xc1/0xcc
[<ffffffff8026b52b>] __lock_acquire+0x1085/0x16b4
[<ffffffff802685b4>] ? save_trace+0x3f/0xa6
[<ffffffff8026ba78>] ? __lock_acquire+0x15d2/0x16b4
[<ffffffff8026bc21>] lock_acquire+0xc7/0xf3
[<ffffffff80256c37>] ? __queue_work+0x1d/0x43
[<ffffffff805a22e1>] _spin_lock_irqsave+0x4f/0x86
[<ffffffff80256c37>] ? __queue_work+0x1d/0x43
[<ffffffff80256c37>] __queue_work+0x1d/0x43
[<ffffffff80256cec>] queue_work_on+0x4a/0x53
[<ffffffff80256e49>] queue_work+0x1f/0x21
[<ffffffff80256e66>] schedule_work+0x1b/0x1d
[<ffffffffa00e9268>] fscache_enqueue_operation+0xec/0x11e [fscache]
[<ffffffffa00fd662>] cachefiles_read_waiter+0xee/0x102 [cachefiles]
[<ffffffff80233a55>] __wake_up_common+0x4b/0x7a
[<ffffffff80235b80>] __wake_up+0x3d/0x55
[<ffffffff8025a2f1>] __wake_up_bit+0x31/0x33
[<ffffffff802a52af>] unlock_page+0x27/0x2b
[<ffffffff80300b21>] mpage_end_io_read+0x60/0x77
[<ffffffff802fbb93>] bio_endio+0x2f/0x31
[<ffffffff804bf170>] raid_end_bio_io+0x3c/0x8c
[<ffffffff804c0123>] raid1_end_read_request+0xb4/0x13b
[<ffffffff802124aa>] ? native_sched_clock+0x32/0x60
[<ffffffff802fbb93>] bio_endio+0x2f/0x31
[<ffffffff803ac3e2>] req_bio_endio+0xa7/0xc6
[<ffffffff803ac5b1>] __end_that_request_first+0x1b0/0x2ca
[<ffffffff803ac4d3>] ? __end_that_request_first+0xd2/0x2ca
[<ffffffff803ac6f5>] end_that_request_data+0x2a/0x5f
[<ffffffff803ad29e>] blk_end_io+0x22/0x7c
[<ffffffff803ad333>] blk_end_request+0x13/0x15
[<ffffffff8044e36e>] scsi_io_completion+0x1d8/0x458
[<ffffffff804475bf>] scsi_finish_command+0xf1/0xfa
[<ffffffff8044e728>] scsi_softirq_done+0x125/0x12e
[<ffffffff803b1ac5>] blk_done_softirq+0x81/0x91
[<ffffffff8024a714>] __do_softirq+0xbc/0x198
[<ffffffff8020cf8c>] call_softirq+0x1c/0x28
[<ffffffff8020e974>] do_softirq+0x50/0xb1
[<ffffffff8024a26c>] irq_exit+0x53/0x8d
[<ffffffff805a2e69>] do_IRQ+0xb1/0xc8
[<ffffffff8020c793>] ret_from_intr+0x0/0x16
<EOI> [<ffffffff8023fcf9>] finish_task_switch+0x40/0x111
[<ffffffff8025e71b>] ? __atomic_notifier_call_chain+0x0/0x87
[<ffffffff802259c5>] ? native_safe_halt+0xb/0xd
[<ffffffff802697d1>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8021345b>] default_idle+0x71/0xc2
[<ffffffff8025e793>] ? __atomic_notifier_call_chain+0x78/0x87
[<ffffffff8025e71b>] ? __atomic_notifier_call_chain+0x0/0x87
[<ffffffff802136e0>] c1e_idle+0x11e/0x125
[<ffffffff8025e7b1>] ? atomic_notifier_call_chain+0xf/0x11
[<ffffffff8020af32>] cpu_idle+0x62/0xa3
[<ffffffff8059ac0c>] start_secondary+0x1c1/0x1c5

---- 8< ----

CacheFiles: Lookup failed error -105
CacheFiles: Lookup failed error -105

INFO: task cat:8884 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cat D ffff880028012d00 0 8884 8832
ffff880436243998 0000000000000046 0000000000000000 ffff880028012d18
ffff880028012d00 00000000001d3000 000000000000d4e0 ffff880436378000
ffff88043f0d8000 ffff880436378598 0000000700000046 ffffc20000ead018
Call Trace:
[<ffffffff802697d1>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8059f527>] schedule+0xe/0x22
[<ffffffffa00e8136>] fscache_wait_bit+0xe/0x12 [fscache]
[<ffffffff8059fbd0>] __wait_on_bit+0x4c/0x7e
[<ffffffffa00e8128>] ? fscache_wait_bit+0x0/0x12 [fscache]
[<ffffffffa00e8128>] ? fscache_wait_bit+0x0/0x12 [fscache]
[<ffffffff8059fc71>] out_of_line_wait_on_bit+0x6f/0x7c
[<ffffffff8025a358>] ? wake_bit_function+0x0/0x2f
[<ffffffffa00eaaaa>] __fscache_read_or_alloc_pages+0x1a4/0x252 [fscache]
[<ffffffffa01d66ca>] ? nfs_readpage_from_fscache_complete+0x0/0x6b [nfs]
[<ffffffffa01d65dd>] __nfs_readpages_from_fscache+0x87/0x174 [nfs]
[<ffffffffa01bc58e>] nfs_readpages+0x11f/0x1d4 [nfs]
[<ffffffff802807d8>] ? cpuset_update_task_memory_state+0x6a/0x11e
[<ffffffff8028076e>] ? cpuset_update_task_memory_state+0x0/0x11e
[<ffffffff802cc566>] ? alloc_pages_current+0xbe/0xc7
[<ffffffff802ad8d4>] __do_page_cache_readahead+0x164/0x1f1
[<ffffffff802ad7f0>] ? __do_page_cache_readahead+0x80/0x1f1
[<ffffffff8026bc6b>] ? print_lock_contention_bug+0x1e/0x110
[<ffffffff802adc2d>] ondemand_readahead+0x1d4/0x1e9
[<ffffffff802adce7>] page_cache_sync_readahead+0x1c/0x1e
[<ffffffff802a6833>] generic_file_aio_read+0x23a/0x5d4
[<ffffffffa01d36f9>] ? nfs_have_delegation+0x0/0x8a [nfs]
[<ffffffffa01b2909>] nfs_file_read+0xeb/0xfe [nfs]
[<ffffffff802d7f36>] do_sync_read+0xec/0x132
[<ffffffff802dbb53>] ? cp_new_stat+0xe7/0xf4
[<ffffffff8025a31b>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff8039ba1c>] ? security_file_permission+0x16/0x18
[<ffffffff802d8b2e>] vfs_read+0xb0/0x159
[<ffffffff802d8ca5>] sys_read+0x4c/0x74
[<ffffffff8020bd72>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.

INFO: task top:10437 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
top D ffff88024d238680 0 10437 5425
ffff880085d53998 0000000000000046 0000000000000000 ffff88024d238698
ffff88024d238680 00000000001d3000 000000000000d4e0 ffff8803fa5b0000
ffff88043f0d8000 ffff8803fa5b0598 0000000700000046 ffffc20000ead018
Call Trace:
[<ffffffff802697d1>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8059f527>] schedule+0xe/0x22
[<ffffffffa00e8136>] fscache_wait_bit+0xe/0x12 [fscache]
[<ffffffff8059fbd0>] __wait_on_bit+0x4c/0x7e
[<ffffffffa00e8128>] ? fscache_wait_bit+0x0/0x12 [fscache]
[<ffffffffa00e8128>] ? fscache_wait_bit+0x0/0x12 [fscache]
[<ffffffff8059fc71>] out_of_line_wait_on_bit+0x6f/0x7c
[<ffffffff8025a358>] ? wake_bit_function+0x0/0x2f
[<ffffffffa00eaaaa>] __fscache_read_or_alloc_pages+0x1a4/0x252 [fscache]
[<ffffffffa01d66ca>] ? nfs_readpage_from_fscache_complete+0x0/0x6b [nfs]
[<ffffffffa01d65dd>] __nfs_readpages_from_fscache+0x87/0x174 [nfs]
[<ffffffffa01bc58e>] nfs_readpages+0x11f/0x1d4 [nfs]
[<ffffffff802807d8>] ? cpuset_update_task_memory_state+0x6a/0x11e
[<ffffffff8028076e>] ? cpuset_update_task_memory_state+0x0/0x11e
[<ffffffff802cc566>] ? alloc_pages_current+0xbe/0xc7
[<ffffffff802ad8d4>] __do_page_cache_readahead+0x164/0x1f1
[<ffffffff802ad7f0>] ? __do_page_cache_readahead+0x80/0x1f1
[<ffffffff802adc2d>] ondemand_readahead+0x1d4/0x1e9
[<ffffffff802adce7>] page_cache_sync_readahead+0x1c/0x1e
[<ffffffff802a6833>] generic_file_aio_read+0x23a/0x5d4
[<ffffffffa01d36f9>] ? nfs_have_delegation+0x0/0x8a [nfs]
[<ffffffffa01b2909>] nfs_file_read+0xeb/0xfe [nfs]
[<ffffffff802d7f36>] do_sync_read+0xec/0x132
[<ffffffff802d69bf>] ? nameidata_to_filp+0x46/0x57
[<ffffffff8025a31b>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff802d63b7>] ? fd_install+0x35/0x64
[<ffffffff8039ba1c>] ? security_file_permission+0x16/0x18
[<ffffffff802d8b2e>] vfs_read+0xb0/0x159
[<ffffffff802d8ca5>] sys_read+0x4c/0x74
[<ffffffff8020bd72>] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.


--
WebSig: http://www.jukie.net/~bart/sig/

2009-06-14 17:43:39

by Bart Trojanowski

Subject: Re: [v2.6.30 nfs+fscache] swapper: possible circular locking dependency detected

It's me again.

I am trying to decipher the lockdep report...

* Bart Trojanowski <[email protected]> [090614 10:15]:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.30-kvm3-dirty #4
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
> (&cwq->lock){..-...}, at: [<ffffffff80256c37>] __queue_work+0x1d/0x43
>
> but task is already holding lock:
> (&q->lock){-.-.-.}, at: [<ffffffff80235b6a>] __wake_up+0x27/0x55
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&q->lock){-.-.-.}:
> [<ffffffff8026b7f6>] __lock_acquire+0x1350/0x16b4
> [<ffffffff8026bc21>] lock_acquire+0xc7/0xf3
> [<ffffffff805a22e1>] _spin_lock_irqsave+0x4f/0x86
> [<ffffffff80235b6a>] __wake_up+0x27/0x55
> [<ffffffff8025620b>] insert_work+0x9a/0xa6
> [<ffffffff80256c49>] __queue_work+0x2f/0x43
> [<ffffffff80256cec>] queue_work_on+0x4a/0x53
> [<ffffffff80256e49>] queue_work+0x1f/0x21
<snip>

So, here I can see that we take the cwq->lock first, when __queue_work
does:

spin_lock_irqsave(&cwq->lock, flags);
insert_work(cwq, work, &cwq->worklist);
spin_unlock_irqrestore(&cwq->lock, flags);

and later take the q->lock when insert_work calls into __wake_up:

spin_lock_irqsave(&q->lock, flags);
__wake_up_common(q, mode, nr_exclusive, 0, key);
spin_unlock_irqrestore(&q->lock, flags);

But in the backtrace below, the order is reversed:

> stack backtrace:
> Pid: 0, comm: swapper Not tainted 2.6.30-kvm3-dirty #4
> Call Trace:
> <IRQ> [<ffffffff80269ffe>] print_circular_bug_tail+0xc1/0xcc
> [<ffffffff8026b52b>] __lock_acquire+0x1085/0x16b4
> [<ffffffff802685b4>] ? save_trace+0x3f/0xa6
> [<ffffffff8026ba78>] ? __lock_acquire+0x15d2/0x16b4
> [<ffffffff8026bc21>] lock_acquire+0xc7/0xf3
> [<ffffffff80256c37>] ? __queue_work+0x1d/0x43
> [<ffffffff805a22e1>] _spin_lock_irqsave+0x4f/0x86
> [<ffffffff80256c37>] ? __queue_work+0x1d/0x43
> [<ffffffff80256c37>] __queue_work+0x1d/0x43
> [<ffffffff80256cec>] queue_work_on+0x4a/0x53
> [<ffffffff80256e49>] queue_work+0x1f/0x21
> [<ffffffff80256e66>] schedule_work+0x1b/0x1d
> [<ffffffffa00e9268>] fscache_enqueue_operation+0xec/0x11e [fscache]
> [<ffffffffa00fd662>] cachefiles_read_waiter+0xee/0x102 [cachefiles]
> [<ffffffff80233a55>] __wake_up_common+0x4b/0x7a
> [<ffffffff80235b80>] __wake_up+0x3d/0x55
> [<ffffffff8025a2f1>] __wake_up_bit+0x31/0x33
> [<ffffffff802a52af>] unlock_page+0x27/0x2b

Here the __wake_up happens first, which takes the q->lock, and later the
__queue_work would take the cwq->lock.
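
Stripped of the kernel details, that looks like the classic ABBA pattern; a
rough userspace sketch of the two orderings (mutexes standing in for the two
spinlocks, everything else made up):

/* Mutex A stands in for cwq->lock, mutex B for the wait queue's q->lock.
 * Either path is fine on its own; run concurrently they can deadlock.
 * Build with -lpthread. */
#include <pthread.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER; /* cwq->lock */
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER; /* q->lock   */

/* queue_work() path: __queue_work() takes cwq->lock, then insert_work()
 * calls __wake_up(), which takes q->lock. */
static void *queue_work_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&A);
    pthread_mutex_lock(&B);
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}

/* unlock_page() path: __wake_up() takes q->lock, then the wake callback
 * (cachefiles_read_waiter -> fscache_enqueue_operation -> schedule_work)
 * ends up in __queue_work(), which wants cwq->lock. */
static void *wake_callback_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&B);
    pthread_mutex_lock(&A);
    pthread_mutex_unlock(&A);
    pthread_mutex_unlock(&B);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, queue_work_path, NULL);
    pthread_create(&t2, NULL, wake_callback_path, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}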

I am guessing that it's not safe for fscache to call out to queue_work
from this cachefiles_read_waiter() context (more specifically
fscache_enqueue_operation calls schedule_work). I don't have much
experience with lockdep... does that make any sense?

-Bart

--
WebSig: http://www.jukie.net/~bart/sig/

2009-06-15 12:20:05

by David Howells

Subject: Re: [v2.6.30 nfs+fscache] kswapd1: blocked for more than 120 seconds

Bart Trojanowski <[email protected]> wrote:

> - my cachefilesd uses xfs on MD raid0 volume over two SATA disks

Is it possible for you to try it with ext3 instead of XFS? I'd be
interested to know if this is something XFS-specific.

David

2009-06-15 12:37:09

by Bart Trojanowski

Subject: Re: [v2.6.30 nfs+fscache] kswapd1: blocked for more than 120 seconds

* David Howells <[email protected]> [090615 08:20]:
> Bart Trojanowski <[email protected]> wrote:
>
> > - my cachefilesd uses xfs on MD raid0 volume over two SATA disks
>
> Is it possible for you to try it with ext3 instead of XFS? I'd be
> interested to know if this is something XFS-specific.

Sure, I'll create a new lvm volume with ext3 on it and give it a try.
Can I just shut down cachefilesd, relocate the cache, and restart the
daemon without remounting the nfs volumes?

-Bart

--
WebSig: http://www.jukie.net/~bart/sig/

2009-06-15 14:04:17

by David Howells

Subject: Re: [v2.6.30 nfs+fscache] kswapd1: blocked for more than 120 seconds

Bart Trojanowski <[email protected]> wrote:

> Sure, I'll create a new lvm volume with ext3 on it and give it a try.
> Can I just shut down cachefilesd, relocate the cache, and restart the
> daemon without remounting the nfs volumes?

Yes... But it won't begin caching any files that are already in the inode
cache in memory until they're discarded from the icache and are iget'd
again... Which is something else I need to look at.

David

2009-06-17 12:05:01

by Bart Trojanowski

Subject: Re: [v2.6.30 nfs+fscache] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078

Hi David,

I still owe you a test with cachefilesd caching to ext3 (not xfs); I'll
have to free up some space on lvm to make a new volume.

I just saw another issue with fscache; setup is the same as in previous
emails.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa00e8cb6>] fscache_object_slow_work_execute+0x44c/0x814 [fscache]
PGD 396ef8067 PUD 4057a6067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/block/md1/size
CPU 7
Modules linked in: iptable_filter ip_tables x_tables tun virtio_rng virtio_pci virtio_ring virtio_net virtio_console virtio_blk virtio nfs lockd nfs_acl auth_rpcgss sunrpc kvm_amd kvm cachefiles fscache cpufreq_userspace cpufreq_powersave cpufreq_conservative ipv6 bridge stp ext2 mbcache loop pcspkr psmouse ata_generic pata_acpi rtc_cmos rtc_core rtc_lib button
Pid: 16705, comm: kslowd Not tainted 2.6.30-kvm3-dirty #4 empty
RIP: 0010:[<ffffffffa00e8cb6>] [<ffffffffa00e8cb6>] fscache_object_slow_work_execute+0x44c/0x814 [fscache]
RSP: 0018:ffff88037b2bde50 EFLAGS: 00010206
RAX: 0000000000000078 RBX: ffff88042d8f5de0 RCX: 00000000ffffffff
RDX: ffffffff805a21a3 RSI: ffff88042d8f5df8 RDI: 0000000000000246
RBP: ffff88037b2bdea0 R08: 0000000000000002 R09: ffffffffa00e8ca3
R10: ffffffffa00fa6e0 R11: ffffffff8023fcf9 R12: 0000000000000101
R13: ffff88042d8f5e78 R14: ffff88042d8f5dc0 R15: ffff88042d8f5e30
FS: 00007f1fcfaaa6e0(0000) GS:ffffc20000cda000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 0000000375cc0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kslowd (pid: 16705, threadinfo ffff88037b2bc000, task ffff880407d50000)
Stack:
ffffffff807beae0 0000000103bb5248 ffffffff807beae0 0000000000000101
ffff88037b2bde80 ffff88042d8f5e78 0000000000000101 0000000000000032
0000000000000064 ffff88037b2bdec0 ffff88037b2bdf20 ffffffff802a4166
Call Trace:
[<ffffffff802a4166>] slow_work_thread+0x278/0x43a
[<ffffffff8025a31b>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff802a3eee>] ? slow_work_thread+0x0/0x43a
[<ffffffff802a3eee>] ? slow_work_thread+0x0/0x43a
[<ffffffff80259f0c>] kthread+0x5b/0x88
[<ffffffff8020ce8a>] child_rip+0xa/0x20
[<ffffffff805a1f50>] ? _spin_unlock_irq+0x30/0x3b
[<ffffffff8020c850>] ? restore_args+0x0/0x30
[<ffffffff8023fcf9>] ? finish_task_switch+0x40/0x111
[<ffffffff80259e8c>] ? kthreadd+0x10f/0x134
[<ffffffff80259eb1>] ? kthread+0x0/0x88
[<ffffffff8020ce80>] ? child_rip+0x0/0x20
Code: 66 68 fd 49 8d 5e 20 4c 89 f7 48 8b 00 ff 50 18 48 89 df e8 9e 94 4b e0 49 8b 45 f0 41 c7 85 48 ff ff ff 06 00 00 00 48 83 c0 78 <f0> 0f ba 30 01 19 d2 85 d2 74 12 49 8b 7d f0 be 01 00 00 00 48
RIP [<ffffffffa00e8cb6>] fscache_object_slow_work_execute+0x44c/0x814 [fscache]
RSP <ffff88037b2bde50>
CR2: 0000000000000078
---[ end trace bc3032821b5333dd ]---

-Bart

--
WebSig: http://www.jukie.net/~bart/sig/

2009-06-18 13:20:30

by Bart Trojanowski

Subject: Re: [v2.6.30 nfs+fscache] lockdep: inconsistent lock state

Hi David,

I moved the cachefilesd dir to an ext3 filesystem.

# grep '^dir' /etc/cachefilesd.conf
dir /var/cache/fscache
# mount | grep /var/cache/fscache
/dev/mapper/vg-fscache on /var/cache/fscache type ext3 (rw,user_xattr)

Here is the latest lockdep event. I'd blame xfs, but I run xfs on other
systems without fscache... maybe the two just don't play well together.

-Bart

=================================
[ INFO: inconsistent lock state ]
2.6.30-kvm4-dirty #5
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd1/426 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&(&ip->i_lock)->mr_lock){++++?+}, at: [<ffffffff80365a9b>] xfs_ilock+0x60/0x7e
{RECLAIM_FS-ON-W} state was registered at:
[<ffffffff802694a9>] mark_held_locks+0x4d/0x6b
[<ffffffff80269575>] lockdep_trace_alloc+0xae/0xcf
[<ffffffff802d1838>] kmem_cache_alloc+0x27/0x10d
[<ffffffff80380b5c>] kmem_zone_alloc+0x6c/0xb7
[<ffffffff80380bbb>] kmem_zone_zalloc+0x14/0x35
[<ffffffff80356eff>] xfs_da_state_alloc+0x1a/0x1c
[<ffffffff8035e78d>] xfs_dir2_node_lookup+0x1c/0xee
[<ffffffff8035a11a>] xfs_dir_lookup+0x10b/0x158
[<ffffffff8037e716>] xfs_lookup+0x50/0xb3
[<ffffffff803876f2>] xfs_vn_lookup+0x44/0x83
[<ffffffff802e0050>] do_lookup+0xdc/0x1c0
[<ffffffff802e231b>] __link_path_walk+0x96a/0xdf9
[<ffffffff802e29b8>] path_walk+0x6e/0xd9
[<ffffffff802e2c09>] do_path_lookup+0x185/0x1a6
[<ffffffff802e391a>] do_filp_open+0x108/0x8ee
[<ffffffff802d6345>] do_sys_open+0x5b/0xe2
[<ffffffff802d63ff>] sys_open+0x20/0x22
[<ffffffff80209185>] init_post+0x3a/0x18d
[<ffffffff80a046c1>] kernel_init+0x181/0x18c
[<ffffffff8020ce8a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 228127
hardirqs last enabled at (228127): [<ffffffff802aaaa8>] free_hot_cold_page+0x1ae/0x26b
hardirqs last disabled at (228126): [<ffffffff802aab2b>] free_hot_cold_page+0x231/0x26b
softirqs last enabled at (227688): [<ffffffff8024a7d1>] __do_softirq+0x189/0x198
softirqs last disabled at (227683): [<ffffffff8020cf8c>] call_softirq+0x1c/0x28

other info that might help us debug this:
2 locks held by kswapd1/426:
#0: (shrinker_rwsem){++++..}, at: [<ffffffff802b1223>] shrink_slab+0x3d/0x159
#1: (iprune_mutex){+.+.-.}, at: [<ffffffff802eb18f>] shrink_icache_memory+0x50/0x246

stack backtrace:
Pid: 426, comm: kswapd1 Not tainted 2.6.30-kvm4-dirty #5
Call Trace:
[<ffffffff80268f0e>] print_usage_bug+0x1bc/0x1cd
[<ffffffff802168ae>] ? save_stack_trace+0x2f/0x4d
[<ffffffff80269b4c>] ? check_usage_forwards+0x0/0x9c
[<ffffffff80269217>] mark_lock+0x2f8/0x53d
[<ffffffff8026ac54>] __lock_acquire+0x812/0x16b4
[<ffffffff8026bbbd>] lock_acquire+0xc7/0xf3
[<ffffffff80365a9b>] ? xfs_ilock+0x60/0x7e
[<ffffffff8025d911>] down_write_nested+0x34/0x67
[<ffffffff80365a9b>] ? xfs_ilock+0x60/0x7e
[<ffffffff80365a9b>] xfs_ilock+0x60/0x7e
[<ffffffff8037d0b1>] xfs_reclaim+0x64/0xae
[<ffffffff80389a82>] xfs_fs_destroy_inode+0x3c/0x5c
[<ffffffff802eb017>] destroy_inode+0x3f/0x54
[<ffffffff802eb10b>] dispose_list+0xdf/0x113
[<ffffffff802eb34f>] shrink_icache_memory+0x210/0x246
[<ffffffff802b12ca>] shrink_slab+0xe4/0x159
[<ffffffff802b1a9c>] kswapd+0x4a7/0x646
[<ffffffff802af2ae>] ? isolate_pages_global+0x0/0x232
[<ffffffff8025a2f7>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff8026976d>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff802b15f5>] ? kswapd+0x0/0x646
[<ffffffff802b15f5>] ? kswapd+0x0/0x646
[<ffffffff80259ee8>] kthread+0x5b/0x88
[<ffffffff8020ce8a>] child_rip+0xa/0x20
[<ffffffff8020c850>] ? restore_args+0x0/0x30
[<ffffffff8023fcf8>] ? finish_task_switch+0x40/0x111
[<ffffffff80259e68>] ? kthreadd+0x10f/0x134
[<ffffffff80259e8d>] ? kthread+0x0/0x88
[<ffffffff8020ce80>] ? child_rip+0x0/0x20
Pid: 29607, comm: kslowd Not tainted 2.6.30-kvm4-dirty #5
Call Trace:
[<ffffffff80235b69>] ? __wake_up+0x27/0x55
[<ffffffffa02575cc>] cachefiles_read_waiter+0x5d/0x102 [cachefiles]
[<ffffffff80233a55>] __wake_up_common+0x4b/0x7a
[<ffffffff80235b7f>] __wake_up+0x3d/0x55
[<ffffffff8025a2cd>] __wake_up_bit+0x31/0x33
[<ffffffff802a51c6>] unlock_page+0x27/0x2b
[<ffffffffa0234fba>] ext3_truncate+0x4bb/0x8fd [ext3]
[<ffffffff802ba7d7>] ? unmap_mapping_range+0x232/0x241
[<ffffffff8026976d>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff802ba987>] vmtruncate+0xc4/0xe4
[<ffffffff802ebfae>] inode_setattr+0x30/0x12a
[<ffffffffa023299f>] ext3_setattr+0x198/0x1ff [ext3]
[<ffffffff802ec241>] notify_change+0x199/0x2e4
[<ffffffffa02545b1>] cachefiles_attr_changed+0x10c/0x181 [cachefiles]
[<ffffffffa0256110>] ? cachefiles_walk_to_object+0x68b/0x798 [cachefiles]
[<ffffffffa0254c72>] cachefiles_lookup_object+0xac/0xd4 [cachefiles]
[<ffffffffa017450f>] fscache_lookup_object+0x136/0x14e [fscache]
[<ffffffffa0174aad>] fscache_object_slow_work_execute+0x243/0x814 [fscache]
[<ffffffff802a4092>] slow_work_thread+0x278/0x43a
[<ffffffff8025a2f7>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff802a3e1a>] ? slow_work_thread+0x0/0x43a
[<ffffffff802a3e1a>] ? slow_work_thread+0x0/0x43a
[<ffffffff80259ee8>] kthread+0x5b/0x88
[<ffffffff8020ce8a>] child_rip+0xa/0x20
[<ffffffff805a1968>] ? _spin_unlock_irq+0x30/0x3b
[<ffffffff8020c850>] ? restore_args+0x0/0x30
[<ffffffff8023fcf8>] ? finish_task_switch+0x40/0x111
[<ffffffff80259e68>] ? kthreadd+0x10f/0x134
[<ffffffff80259e8d>] ? kthread+0x0/0x88
[<ffffffff8020ce80>] ? child_rip+0x0/0x20
CacheFiles: I/O Error: Readpage failed on backing file c0000000000830
FS-Cache: Cache cachefiles stopped due to I/O error

--
WebSig: http://www.jukie.net/~bart/sig/

2009-06-18 14:23:57

by David Howells

Subject: Re: [v2.6.30 nfs+fscache] lockdep: inconsistent lock state

Bart Trojanowski <[email protected]> wrote:

> Pid: 29607, comm: kslowd Not tainted 2.6.30-kvm4-dirty #5
> Call Trace:
> [<ffffffff80235b69>] ? __wake_up+0x27/0x55
> [<ffffffffa02575cc>] cachefiles_read_waiter+0x5d/0x102 [cachefiles]
> [<ffffffff80233a55>] __wake_up_common+0x4b/0x7a
> [<ffffffff80235b7f>] __wake_up+0x3d/0x55
> [<ffffffff8025a2cd>] __wake_up_bit+0x31/0x33
> [<ffffffff802a51c6>] unlock_page+0x27/0x2b
> [<ffffffffa0234fba>] ext3_truncate+0x4bb/0x8fd [ext3]
> [<ffffffff802ba7d7>] ? unmap_mapping_range+0x232/0x241
> [<ffffffff8026976d>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff802ba987>] vmtruncate+0xc4/0xe4
> [<ffffffff802ebfae>] inode_setattr+0x30/0x12a
> [<ffffffffa023299f>] ext3_setattr+0x198/0x1ff [ext3]
> [<ffffffff802ec241>] notify_change+0x199/0x2e4
> [<ffffffffa02545b1>] cachefiles_attr_changed+0x10c/0x181 [cachefiles]
> [<ffffffffa0256110>] ? cachefiles_walk_to_object+0x68b/0x798 [cachefiles]
> [<ffffffffa0254c72>] cachefiles_lookup_object+0xac/0xd4 [cachefiles]
> [<ffffffffa017450f>] fscache_lookup_object+0x136/0x14e [fscache]
> [<ffffffffa0174aad>] fscache_object_slow_work_execute+0x243/0x814 [fscache]
> [<ffffffff802a4092>] slow_work_thread+0x278/0x43a
> [<ffffffff8025a2f7>] ? autoremove_wake_function+0x0/0x3d
> [<ffffffff802a3e1a>] ? slow_work_thread+0x0/0x43a
> [<ffffffff802a3e1a>] ? slow_work_thread+0x0/0x43a
> [<ffffffff80259ee8>] kthread+0x5b/0x88
> [<ffffffff8020ce8a>] child_rip+0xa/0x20
> [<ffffffff805a1968>] ? _spin_unlock_irq+0x30/0x3b
> [<ffffffff8020c850>] ? restore_args+0x0/0x30
> [<ffffffff8023fcf8>] ? finish_task_switch+0x40/0x111
> [<ffffffff80259e68>] ? kthreadd+0x10f/0x134
> [<ffffffff80259e8d>] ? kthread+0x0/0x88
> [<ffffffff8020ce80>] ? child_rip+0x0/0x20
> CacheFiles: I/O Error: Readpage failed on backing file c0000000000830
> FS-Cache: Cache cachefiles stopped due to I/O error

Yeah, I know what that is. Patch attached.

David
---
From: David Howells <[email protected]>
Subject: [PATCH] CacheFiles: Don't write a full page if there's only a partial page to cache

cachefiles_write_page() writes a full page to the backing file for the last
page of the netfs file, even if the netfs file's last page is only a partial
page.

This causes the EOF on the backing file to be extended beyond the EOF of the
netfs, and thus the backing file will be truncated by cachefiles_attr_changed()
called from cachefiles_lookup_object().

So we need to limit the write we make to the backing file on that last page
such that it doesn't push the EOF too far.
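
To make the arithmetic concrete, the clamp works out like this (a standalone
userspace sketch with made-up sizes and 4K pages, not the kernel code itself):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

int main(void)
{
    /* A 10000-byte netfs file: its last page is index 2 (offset 8192),
     * and only 1808 bytes of that page are valid. */
    unsigned long long eof = 10000;              /* store_limit_l */
    unsigned long long pos = 2ULL << PAGE_SHIFT; /* 8192 */
    unsigned long long len = PAGE_SIZE;

    if ((eof & ~PAGE_MASK) && eof - pos < PAGE_SIZE)
        len = eof - pos;                         /* 1808, not 4096 */

    printf("write %llu bytes at %llu (eof %llu)\n", len, pos, eof);
    return 0;
}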


Also, if a backing file that has a partial page at the end is expanded, we
discard the partial page and refetch it on the basis that we then have a hole
in the file with invalid data, and should the power go out... A better way to
deal with this could be to record a note that the partial page contains invalid
data until the correct data is written into it.

This isn't a problem for netfs's that discard the whole backing file if the
file size changes (such as NFS).

Signed-off-by: David Howells <[email protected]>
---

fs/cachefiles/interface.c | 20 +++++++++++++++++---
fs/cachefiles/rdwr.c | 23 +++++++++++++++++++----
include/linux/fscache-cache.h | 3 +++
3 files changed, 39 insertions(+), 7 deletions(-)


diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 431accd..919a7b6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -403,12 +403,26 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
if (oi_size == ni_size)
return 0;

- newattrs.ia_size = ni_size;
- newattrs.ia_valid = ATTR_SIZE;
-
cachefiles_begin_secure(cache, &saved_cred);
mutex_lock(&object->backer->d_inode->i_mutex);
+
+ /* if there's an extension to a partial page at the end of the backing
+ * file, we need to discard the partial page so that we pick up new
+ * data after it */
+ if (oi_size & ~PAGE_MASK && ni_size > oi_size) {
+ _debug("discard tail %llx", oi_size);
+ newattrs.ia_valid = ATTR_SIZE;
+ newattrs.ia_size = oi_size & PAGE_MASK;
+ ret = notify_change(object->backer, &newattrs);
+ if (ret < 0)
+ goto truncate_failed;
+ }
+
+ newattrs.ia_valid = ATTR_SIZE;
+ newattrs.ia_size = ni_size;
ret = notify_change(object->backer, &newattrs);
+
+truncate_failed:
mutex_unlock(&object->backer->d_inode->i_mutex);
cachefiles_end_secure(cache, saved_cred);

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index a69787e..86639c1 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -801,7 +801,8 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page)
struct cachefiles_cache *cache;
mm_segment_t old_fs;
struct file *file;
- loff_t pos;
+ loff_t pos, eof;
+ size_t len;
void *data;
int ret;

@@ -835,15 +836,29 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page)
ret = -EIO;
if (file->f_op->write) {
pos = (loff_t) page->index << PAGE_SHIFT;
+
+ /* we mustn't write more data than we have, so we have
+ * to beware of a partial page at EOF */
+ eof = object->fscache.store_limit_l;
+ len = PAGE_SIZE;
+ if (eof & ~PAGE_MASK) {
+ ASSERTCMP(pos, <, eof);
+ if (eof - pos < PAGE_SIZE) {
+ _debug("cut short %llx to %llx",
+ pos, eof);
+ len = eof - pos;
+ ASSERTCMP(pos + len, ==, eof);
+ }
+ }
+
data = kmap(page);
old_fs = get_fs();
set_fs(KERNEL_DS);
ret = file->f_op->write(
- file, (const void __user *) data, PAGE_SIZE,
- &pos);
+ file, (const void __user *) data, len, &pos);
set_fs(old_fs);
kunmap(page);
- if (ret != PAGE_SIZE)
+ if (ret != len)
ret = -EIO;
}
fput(file);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 84d3532..97229be 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -374,6 +374,7 @@ struct fscache_object {
struct list_head dep_link; /* link in parent's dependents list */
struct list_head pending_ops; /* unstarted operations on this object */
pgoff_t store_limit; /* current storage limit */
+ loff_t store_limit_l; /* current storage limit */
};

extern const char *fscache_object_states[];
@@ -414,6 +415,7 @@ void fscache_object_init(struct fscache_object *object,
object->events = object->event_mask = 0;
object->flags = 0;
object->store_limit = 0;
+ object->store_limit_l = 0;
object->cache = cache;
object->cookie = cookie;
object->parent = NULL;
@@ -460,6 +462,7 @@ static inline void fscache_object_lookup_error(struct fscache_object *object)
static inline
void fscache_set_store_limit(struct fscache_object *object, loff_t i_size)
{
+ object->store_limit_l = i_size;
object->store_limit = i_size >> PAGE_SHIFT;
if (i_size & ~PAGE_MASK)
object->store_limit++;