2009-08-25 14:23:04

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: Drop mapped buffer_head check during page_mkwrite

Inorder to check whether the buffer_heads are mapped we need
to hold page lock. Otherwise a reclaim can cleanup the attached
buffer_heads. Instead of taking page lock and check whether
buffer_heads are mapped we let the write_begin/write_end callback
does the equivalent. It does have a performance impact in that we
are doing more work if we the buffer_heads are already mapped.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/inode.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..d40b97d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5281,12 +5281,6 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
else
len = PAGE_CACHE_SIZE;

- if (page_has_buffers(page)) {
- /* return if we have all the buffers mapped */
- if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
- ext4_bh_unmapped))
- goto out_unlock;
- }
/*
* OK, we need to fill the hole... Do write_begin write_end
* to do block allocation/reservation.We are not holding
--
1.6.4.1.174.g32f4c



2009-08-26 02:24:47

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Drop mapped buffer_head check during page_mkwrite

On Tue, Aug 25, 2009 at 07:52:59PM +0530, Aneesh Kumar K.V wrote:
> Inorder to check whether the buffer_heads are mapped we need
> to hold page lock. Otherwise a reclaim can cleanup the attached
> buffer_heads. Instead of taking page lock and check whether
> buffer_heads are mapped we let the write_begin/write_end callback
> does the equivalent. It does have a performance impact in that we
> are doing more work if we the buffer_heads are already mapped.

I'm not sure I understand the commit description. From the patch you
are removing the check to see if all of the buffers are mapped. But
the patch isn't moving the check to ext4_write_begin() or
ext4_write_end(). Are you saying the check is already in
ext4_write_begin()? It doesn't seem to be in ext4_write_end().

I see that we do call write_page_buffers() in ext4_write_begin(), and
in do_journal_get_write_access() we do check to see if the buffers are
present. But if they aren't, we don't return an error; we just fail
request journal write access for the buffer head.

Am I missing something? This patch doesn't feel complete, or the
commit description is very confusing....

- Ted

2009-08-26 05:14:57

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: Drop mapped buffer_head check during page_mkwrite

On Tue, Aug 25, 2009 at 10:24:45PM -0400, Theodore Tso wrote:
> On Tue, Aug 25, 2009 at 07:52:59PM +0530, Aneesh Kumar K.V wrote:
> > Inorder to check whether the buffer_heads are mapped we need
> > to hold page lock. Otherwise a reclaim can cleanup the attached
> > buffer_heads. Instead of taking page lock and check whether
> > buffer_heads are mapped we let the write_begin/write_end callback
> > does the equivalent. It does have a performance impact in that we
> > are doing more work if we the buffer_heads are already mapped.
>
> I'm not sure I understand the commit description. From the patch you
> are removing the check to see if all of the buffers are mapped. But
> the patch isn't moving the check to ext4_write_begin() or
> ext4_write_end(). Are you saying the check is already in
> ext4_write_begin()? It doesn't seem to be in ext4_write_end().
>
> I see that we do call write_page_buffers() in ext4_write_begin(), and
> in do_journal_get_write_access() we do check to see if the buffers are
> present. But if they aren't, we don't return an error; we just fail
> request journal write access for the buffer head.
>
> Am I missing something? This patch doesn't feel complete, or the
> commit description is very confusing....
>

In page_mkwrite if the blocks are already mapped we need not call
get_block. The purpose of the removed check was to avoid calling
write_begin/write_end if the pages are already mapped. Which inturn
avoid calling get_block. But in ext4_write_begin -> __block_prepar_write
we make sure the we don't call get_block if the buffer_head is mapped.

The problem with the current code is that since we don't take page lock
before looking at the attached buffer heads, there is a possibility that
the reclaim can free the attached buffer_heads. This actually caused the
below crash.

BUG: unable to handle kernel NULL pointer dereference at 00000014
IP: [<c04feb2b>] walk_page_buffers+0x1b/0x6b
*pdpt = 000000002786a001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:03/0000:03:01.0/local_cpus
Modules linked in: joydev nfs lockd nfs_acl auth_rpcgss bridge stp llc bnep sco
l2cap bluetooth sunrpc ipv6 p4_clockmod dm_multipath uinput ata_generic
pata_acpi qla2xxx e1000 i2c_piix4 scsi_transport_fc pata_serverworks scsi_tgt
serio_raw pcspkr mptspi mptscsih mptbase scsi_transport_spi radeon drm
i2c_algo_bit i2c_core [last unloaded: cpufreq_powersave]

Pid: 26387, comm: bash-shared-map Not tainted (2.6.29.4-1.el6.i686.PAE #1) IBM
eServer BladeCenter HS40 -[883961X]-
EIP: 0060:[<c04feb2b>] EFLAGS: 00210246 CPU: 2
EIP is at walk_page_buffers+0x1b/0x6b
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: f6d7933c EDI: 00000000 EBP: e7981d68 ESP: e7981d4c
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash-shared-map (pid: 26387, ti=e7980000 task=f2db58d0
task.ti=e7980000)
Stack:
00200246 00000000 00000000 c04811b4 00001000 f6d7933c 00000800 e7981dac
c04feed7 00001000 00000000 c04fedc3 c17790e0 0006e05c 00000000 f6d79318
e78cc780 0004bdfc 00000000 c17790e0 c17790e0 f2c3b328 00000001 c17790e0
Call Trace:
[<c04811b4>] ? lock_page+0x1f/0x34
[<c04feed7>] ? ext4_page_mkwrite+0xff/0x178
[<c04fedc3>] ? ext4_bh_unmapped+0x0/0x15
[<c0493c28>] ? __do_fault+0x128/0x39b
[<c04941e1>] ? handle_mm_fault+0x346/0x81e
[<c042d588>] ? find_busiest_group+0x2af/0x6e8
[<c0716e70>] ? do_page_fault+0x307/0x7ec
[<c044d398>] ? clocksource_read+0xc/0xf
[<c04293a1>] ? update_curr+0x183/0x18b
[<c044d398>] ? clocksource_read+0xc/0xf
[<c044d6fa>] ? getnstimeofday+0x59/0xec
[<c042f297>] ? rebalance_domains+0x6c/0x485
[<c07151a1>] ? _spin_lock_irq+0x21/0x25
[<c043e486>] ? run_timer_softirq+0x1ae/0x1c0
[<c042f6e3>] ? run_rebalance_domains+0x33/0x9d
[<c043a86c>] ? _local_bh_enable+0x8d/0x9d
[<c043a9a6>] ? __do_softirq+0x12a/0x139
[<c043aa1d>] ? do_softirq+0x68/0x7e
[<c043ab7d>] ? irq_exit+0x54/0x77
[<c0716b69>] ? do_page_fault+0x0/0x7ec
[<c07152f7>] ? error_code+0x77/0x7c
Code: 00 8d 65 f4 5b 5e 5f 5d 8d 67 f8 5f c3 90 90 90 55 89 e5 57 56 53 83 ec
10 0f 1f 44 00 00 8b 7d 0c 89 45 ec 89 d3 31 c0 89 4d e8 <8b> 4a 14 eb 39 8b 72
04 3b 45 08 89 75 f0 8d 34 08 73 05 3b 75
EIP: [<c04feb2b>] walk_page_buffers+0x1b/0x6b SS:ESP 0068:e7981d4c
---[ end trace 8f806517a1c38a37 ]---


-aneesh