I redid my tests on the last few ext-patch-queue snapshots.
The snapshot:
ext4-patch-queue-476b1104923c2228b48aee73070cddd5430b3b54.tar.gz from
062408 @2256hrs GMT works fine. No segfaults.
The latest snapshot:
ext4-patch-queue-b5db22ef52ed53d8e3fa978a5a29e1609c9333aa.tar.gz from
062508 @ 0019hrs GMT causes segmentation faults whenever I do extensive
copying of small files (my Gentoo portage tree & metadata cache folders) to
another ext4dev partition set up identically and then when I make a tarball
from this copied data.
I have been able to reproduce the segfaults consistently. Both partitions
were formatted with flex_bg & meta_bg features enabled (obviously no
resize_inodes) and are set for ordered data mode.
I assume delalloc is enabled although ext4-patch-queue does not yet print
message saying so on boot like mballoc does.
Any suggestions would be most appreciated.
Thanks,
Gary
Gary Hawco wrote:
> I redid my tests on the last few ext-patch-queue snapshots.
>
> The snapshot:
> ext4-patch-queue-476b1104923c2228b48aee73070cddd5430b3b54.tar.gz from
> 062408 @2256hrs GMT works fine. No segfaults.
>
> The latest snapshot:
> ext4-patch-queue-b5db22ef52ed53d8e3fa978a5a29e1609c9333aa.tar.gz from
> 062508 @ 0019hrs GMT causes segmentation faults whenever I do extensive
> copying of small files (my Gentoo portage tree & metadata cache folders) to
> another ext4dev partition set up identically and then when I make a tarball
> from this copied data.
>
> I have been able to reproduce the segfaults consistently. Both partitions
> were formatted with flex_bg & meta_bg features enabled (obviously no
> resize_inodes) and are set for ordered data mode.
>
> I assume delalloc is enabled although ext4-patch-queue does not yet print
> message saying so on boot like mballoc does.
>
> Any suggestions would be most appreciated.
Do you get kernel messages when this happens? If so can you provide them?
Thanks,
-Eric
More on my segmentation problems. Just looking at the end of each kernel
message, they always end with:
EIP [XXXXX] jbd2_journal_data_metadata-XXX
This occurs when I try to copy data from my main partition to a backup
partition or make a tarball.
Both partitions are setup with flex_bg,meta_bg,uninit_bg with ordered data
mode using noatime,nodiratime,journal_async_commit mount parameters.
Again, the snapshot from 062508/0019hrs GMT --
ext4-patch-queue-b5db22ef52ed53d8e3fa978a5a29e1609c9333aa.tar.gz
works fine patching a clean linux-2.6.26-rc6 source and exhibits no
problems whatsoever.
Starting with the next snapshot (062608/0042hrs GMT) is where my segfaults
start. This continues using the newest snapshot from 062608/2251hrs GMT.
I have patched the source from the individual snapshots mentioned as well
as just pulling and updating my ext4-patch-queue folder with the same
results. If I roll the kernel back to the 062508/0019 snapshot, the
problems vanish.
No errors noted during e2fsck -fvy on either partition.
This is driving me crazy!
Gary
On Thu, Jun 26, 2008 at 10:12:27PM +0000, Gary Hawco wrote:
> More on my segmentation problems. Just looking at the end of each kernel
> message, they always end with:
> EIP [XXXXX] jbd2_journal_data_metadata-XXX
Hmm... any chance you can take a picture of the crash message with a
digital camera and post the jpg? (I'm assuming you haven't been able
to capture the OOPS stack trace in /var/log/kern.log or
/var/log/messages or some such.)
> This occurs when I try to copy data from my main partition to a backup
> partition or make a tarball.
I am currently using the ext4 patch queue comit id #555132eb from
2008-06-30 13:04:35 -0400, and I can't reproduce it. I just tried
backing up by isync mail directory to a tar.gz file and then restored
it, with no problems. Both the source and backup partition are ext4
filesystems with flex_bg, meta_bg, and uninit_bg, located on separate
LVM logical volumes on my laptop.[1]
[1] http://thunk.org/tytso/blog/2008/06/30/ext4-is-now-the-primary-filesystem-on-my-laptop/
The partition is mounted using mount options:
noatime,errors=remount-ro,barrier=1,data=ordered
according to /proc/mounts.
The big difference between your mount options and mine is that I dont
have journal_async_commit as a mount option. Which would be largely
moot for me since LVM doesn't support barrier operations. :-( :-( :-(
You seem to be able to reproduce the problem at will. Could you try
removing te journal_sync_commit option and see if it makes the problem
go away? That would be very interesting if it were the case...
- Ted
Ted,
Did some more testing before replying.
Have two linux operating systems:
Both were setup identically, i.e, formatted with flex_bg & meta_bg. Tune2fs
used to enable uninit_bg. Ordered data mode. Grub used to pass boot
parameter of rootflags=commit=15. Fstab mount options of
noatime,nodiratime,journal_async_commit.
Slackware 12.1--this one uses its BSD style startup scripts. No problems
whatsoever with any of the snapshots including the newest from 30 June
2008/2219hrs.
Gentoo --this is the system I am having trouble with.
It will boot the snapshot from 062508/0019hrs fine. Everything works fine
here including my script that updates the portage/metadata trees, copies
them to another partition setup identically with ext4dev, i.e. flex_bg,
meta_bg & uninit_bg, then creates a tarball.
All of the snapshots after the 062508 timestamp through the 062708/2353hrs.
snapshot would segfault running this script. (Looking for a digital
camera, that uses flash media that windows or linux will recognize so I can
save and post the jpeg image)
Starting with today's snapshots rebased against 2.6.26-rc8 kernel a new
problem surfaced, and the old segfault issues disappeared. That is, I
could run my script without any problems.
My Gentoo uses an updated baselayout/OpenRC start configuration. It has a
/lib/rc folder as opposed to var/lib/init.d, containing several subfolders,
such as /lib/rc/init.d which cache dependencies and make for a very fast
boot and shutdown.
With today's snapshots I get one clean start, then I get errors where the
network interface eth0 does not initialize on startup but can manually be
pulled in by "dhcpcd eth0". If I delete the contents of /lib/rc/init.d on
the next restart the network interface initializes properly.
If I roll back to the 062508 kernel snapshot, this new problem goes away.
I tried removing journal_async_commit from fstab mount options without any
difference. I was also passing rootflags=commit=15 during bootup with
grub. I removed that parameter changing back to the default 5 second commit
interval without any improvement. This new startup style uses parallel
service starts. I changed back to series starting without any improvement.
I even tried switching from ordered data mode to writeback mode without
success.
So if I haven't thoroughly confused you or put you to sleep, to summarize:
Slackware loves all the snapshots.
Gentoo was fine through 062508/00119hrs. No problems. From 062608/0042hrs -
062708/2353 snapshots boot sequence was fine, but segfaulting running my
portage/metadata backup script (lots of small files).
Today's updates rebased against 2.6.26-rc8 are NOT segfaulting running the
backup script, but seem to be corrupting the /lib/rc/init.d/database files
after the first start.
I am willing to bet that Gentoo on the old baselayout/Non open-rc startup
up scripts would have no problems ala Slackware, but it's curious
everything was fine through the 062508/0019GMT snapshot. It seems that once
delalloc was brought back in with ordered data mode problems started to
arise. I tried to roll back the baselayout v2 to the older version 1.12,
but I broke the os and had to quickly reinstall using a recent tarball.
It's the only explanation why Gentoo is having problems, but Slackware is
not. And now that today with the latest rc-8 snapshots the initialization
of devices during startup is getting fubared, I am certain the
Baselayout2/open-rc-2.5 does not like the latest iterations of the
ext4-patch-queue kernel.
Hope this expose sheds some light on things.
Thanks again,
Gary
On Tue, Jul 01, 2008 at 12:00:46AM +0000, Gary Hawco wrote:
> Gentoo was fine through 062508/00119hrs. No problems. From 062608/0042hrs -
> 062708/2353 snapshots boot sequence was fine, but segfaulting running my
> portage/metadata backup script (lots of small files).
>
> Today's updates rebased against 2.6.26-rc8 are NOT segfaulting running the
> backup script, but seem to be corrupting the /lib/rc/init.d/database files
> after the first start.
>
> I am willing to bet that Gentoo on the old baselayout/Non open-rc startup
> up scripts would have no problems ala Slackware, but it's curious
> everything was fine through the 062508/0019GMT snapshot. It seems that once
> delalloc was brought back in with ordered data mode problems started to
> arise. I tried to roll back the baselayout v2 to the older version 1.12,
> but I broke the os and had to quickly reinstall using a recent tarball.
> It's the only explanation why Gentoo is having problems, but Slackware is
> not. And now that today with the latest rc-8 snapshots the initialization
> of devices during startup is getting fubared, I am certain the
> Baselayout2/open-rc-2.5 does not like the latest iterations of the
> ext4-patch-queue kernel.
Gary,
It's definitely the case that kernel oops indicates kernel bugs. It
might be the case that one distribution is better than another at
exposing the bug and making it visible, but that doesn't mean the bug
is in the distribution; in fact, if you can provoke a segfault, this
is *good* because it can help us track down the bug more
effectively/efficiently. Data corruption like you are seeing now is
worse, since it's actually much harder to track down.
Of course, in order to track down the segfault we really need the oops
messages. In the bad old days I would laborously copy down all of the
numbers and function names in the stack trace using pen and paper, and
then transcribe it into e-mail. (Yes, I also walked uphill through
the snow in the winter, in both directions. :-) Using a digital
camera is more convenient, but if it's too hard for you to grab one,
you can always fall back to the pen and paper model.
So if you have data corruption in your init.d files, does it go away
if you disable delalloc? What if you disable mballoc? Of course, do
this on Gentoo if it's better at provoking the filesystem bug.
Remember, from the point of view of filesystem developers, it's good
if you can provoke the bug; since it's only then that you can squash
it. :-)
- Ted
Ted,
Thanks for your quick reply. With the newest rc8-based snapshots I am no
longer getting segfaults, unless you wanted me to roll back to the
rc6-based snapshots to reproduce the segfaults.
Per your latest request I disabled delalloc, and voila, no more data
corruption! Then I enabled delalloc and disabled mballoc and file
corruption to /lib/rc/init.d/nettree returned. So delalloc is the culprit.
I hope this will help figure this out.
Thanks again,
Gary
On Tue, 2008-07-01 at 10:54 +0000, Gary Hawco wrote:
> Ted,
>
> Thanks for your quick reply. With the newest rc8-based snapshots I am no
> longer getting segfaults, unless you wanted me to roll back to the
> rc6-based snapshots to reproduce the segfaults.
>
> Per your latest request I disabled delalloc, and voila, no more data
> corruption! Then I enabled delalloc and disabled mballoc and file
> corruption to /lib/rc/init.d/nettree returned. So delalloc is the culprit.
>
> I hope this will help figure this out.
>
Could you try the following patch? It fixes the problem of update
on-disk size too early without block allocation, the problem is
introduced unintentionally by another bug fix patch added to the patch
queue yesterday.
Signed-off-by: Mingming Cao <[email protected]>
---
fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
Index: linux-2.6.26-rc8/fs/ext4/inode.c
===================================================================
--- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-01 15:13:00.000000000 -0700
+++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-01 15:34:19.000000000 -0700
@@ -1892,6 +1892,31 @@ out:
return ret;
}
+/*
+ * Check if we should update i_disksize
+ * when write to the end of file but not require block allocation
+ */
+static int ext4_da_should_update_i_disksize(struct page *page,
+ unsigned long offset)
+{
+ struct buffer_head *head, *bh;
+ unsigned int curr_off = 0;
+
+ head = page_buffers(page);
+ bh = head;
+ do {
+ unsigned int next_off = curr_off + bh->b_size;
+
+ if ((curr_off >= offset) &&
+ (!buffer_mapped || (buffer_delay(bh))) {
+ return 0;
+ }
+ curr_off = next_off;
+ } while ((bh = bh->b_this_page) != head);
+
+ return 1;
+}
+
static int ext4_da_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
@@ -1901,6 +1926,10 @@ static int ext4_da_write_end(struct file
int ret = 0, ret2;
handle_t *handle = ext4_journal_current_handle();
loff_t new_i_size;
+ unsigned long start, end;
+
+ start = pos & (PAGE_CACHE_SIZE - 1);
+ end = start + copied;
/*
* generic_write_end() will run mark_inode_dirty() if i_size
@@ -1910,8 +1939,7 @@ static int ext4_da_write_end(struct file
new_i_size = pos + copied;
if (new_i_size > EXT4_I(inode)->i_disksize)
- if (!walk_page_buffers(NULL, page_buffers(page),
- 0, len, NULL, ext4_bh_unmapped_or_delay)){
+ if (ext4_da_should_update_i_disksize(page, end))
/*
* Updating i_disksize when extending file without
* need block allocation
> Thanks again,
> Gary
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Mingming,
Can you post that patch somewhere for download? I access my email using
Windows Vista, not in linux, so it would be very laborious to hand copy
this patch and recreate it in linux.
Updated the 2.6.26-rc8 kernel with the latest snapshot from today at
1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow
the system to remount read/write on boot. But it worked fine in Slackware.
Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does
not like ext4.
And for clarification, the patch you sent me is not yet in the queue?
So to summarize my recent (as of the last seven days of snapshots) saga
with ext4-queue-patch in Gentoo
Through 062508/0019hrs no problems (although I do not believe delalloc was
enabled)
>From 062608/0042 - 062708/2353hrs had segfaulting when copying and tarring
small files. No other detectable issues.
>From 063008/1704 - 063008/2219hrs --no segfaulting but data corruption in
one initiation file causing network interfaces to have to be manually
activated. No data corruption if delalloc disabled
latest snapshot 070108/1833hrs --system unable to be remounted read/write
after fsck check. Manually booted to livecd with e2fsprogs-1.41WIP
(17-June-2008). Efsck shows no filesystem problems.
Rolling back to 063008/2219 snapshot fixes boot problems. No segfaulting.
No data corruption IF delalloc disabled.
Thanks,
Gary
On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote:
> Mingming,
>
> Can you post that patch somewhere for download? I access my email using
> Windows Vista, not in linux, so it would be very laborious to hand copy
> this patch and recreate it in linux.
>
Patch attached.
> Updated the 2.6.26-rc8 kernel with the latest snapshot from today at
> 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow
> the system to remount read/write on boot. But it worked fine in Slackware.
> Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does
> not like ext4.
>
The only commit made yesterday was add the
Add ext4-fix-online-resize-with-mballoc.patch
It does not seem to impact the boot.
Was the filesystem corrupted before you reboot with new kernel? If so
the next reboot will probably refuse to remount it as it need fsck.
> And for clarification, the patch you sent me is not yet in the queue?
>
Just pushed into the queue today. Please check.
> So to summarize my recent (as of the last seven days of snapshots) saga
> with ext4-queue-patch in Gentoo
>
> Through 062508/0019hrs no problems (although I do not believe delalloc was
> enabled)
> From 062608/0042 - 062708/2353hrs had segfaulting when copying and tarring
> small files. No other detectable issues.
> From 063008/1704 - 063008/2219hrs --no segfaulting but data corruption in
> one initiation file causing network interfaces to have to be manually
> activated. No data corruption if delalloc disabled
> latest snapshot 070108/1833hrs --system unable to be remounted read/write
> after fsck check. Manually booted to livecd with e2fsprogs-1.41WIP
> (17-June-2008). Efsck shows no filesystem problems.
> Rolling back to 063008/2219 snapshot fixes boot problems. No segfaulting.
> No data corruption IF delalloc disabled.
>
> Thanks,
> Gary
>
On Wed, 2008-07-02 at 10:19 -0700, Mingming Cao wrote:
> On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote:
> > Mingming,
> >
> > Can you post that patch somewhere for download? I access my email using
> > Windows Vista, not in linux, so it would be very laborious to hand copy
> > this patch and recreate it in linux.
> >
> Patch attached.
Please use this patch instead, after discuss with Ted, I found an issue
with the patch I sent to the list. ext4 patch queue is also updated with
latest patch.
Ext4: fix delalloc i_disksize early update issue
From: Mingming Cao <[email protected]>
Ext4_da_write_end() uses ext4_bh_unmapped_or_delay() function to check
if it extend the file size without need for allocation. But at that time
the buffer has not being dirtied yet (done in code later in
block_commit_write()), so it always return true and update i_disksize
(before block allocation). we could fix that ext4_da_write_end() to not
use this helper function.
This also fixed another issue:
The i_disksize is updated at ext4_da_write_end() time if
writes to the end of file, and the buffers are all have
blocks allocated. But in the case blocksize < pagesize, and
if the page has, say, the first buffer marked
as buffer_delay, and the write is to EOF and on the third buffer, which
has block already allocated, we certainly need to extend the i_disksize.
Signed-off-by: Mingming Cao <[email protected]>
---
fs/ext4/inode.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
Index: linux-2.6.26-rc8/fs/ext4/inode.c
===================================================================
--- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-02 09:53:42.000000000 -0700
+++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-02 13:22:52.000000000 -0700
@@ -1891,6 +1891,32 @@ out:
return ret;
}
+/*
+ * Check if we should update i_disksize
+ * when write to the end of file but not require block allocation
+ */
+static int ext4_da_should_update_i_disksize(struct page *page,
+ unsigned long offset)
+{
+ struct buffer_head *head, *bh;
+ unsigned int curr_off = 0;
+
+ head = page_buffers(page);
+ bh = head;
+ do {
+ unsigned int next_off = curr_off + bh->b_size;
+
+ if (curr_off <= offset && offset < next_off)
+ if (!buffer_mapped(bh) || (buffer_delay(bh)))
+ return 0;
+ else
+ return 1;
+ curr_off = next_off;
+ } while ((bh = bh->b_this_page) != head);
+
+ return 1;
+}
+
static int ext4_da_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
@@ -1900,6 +1926,10 @@ static int ext4_da_write_end(struct file
int ret = 0, ret2;
handle_t *handle = ext4_journal_current_handle();
loff_t new_i_size;
+ unsigned long start, end;
+
+ start = pos & (PAGE_CACHE_SIZE - 1);
+ end = start + copied;
/*
* generic_write_end() will run mark_inode_dirty() if i_size
@@ -1909,8 +1939,7 @@ static int ext4_da_write_end(struct file
new_i_size = pos + copied;
if (new_i_size > EXT4_I(inode)->i_disksize)
- if (!walk_page_buffers(NULL, page_buffers(page),
- 0, len, NULL, ext4_bh_unmapped_or_delay)){
+ if (ext4_da_should_update_i_disksize(page, end)) {
/*
* Updating i_disksize when extending file without
* need block allocation
[sending via gmail ]
On Thu, Jul 03, 2008 at 05:03:25PM +0530, Aneesh Kumar K.V wrote:
> On Wed, Jul 02, 2008 at 10:19:54AM -0700, Mingming Cao wrote:
> >
> > On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote:
> > > Mingming,
> > >
> > > Can you post that patch somewhere for download? I access my email using
> > > Windows Vista, not in linux, so it would be very laborious to hand copy
> > > this patch and recreate it in linux.
> > >
> > Patch attached.
> >
> > > Updated the 2.6.26-rc8 kernel with the latest snapshot from today at
> > > 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow
> > > the system to remount read/write on boot. But it worked fine in Slackware.
> > > Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does
> > > not like ext4.
> > >
> >
>
> I think we need to protect i_disksize update with i_data_sem. Otherwise
> a parallel writepages and write_end can cause issues. I guess that is
> what Gary is finding. I also did some cleanup for the patch
>
better one moving ext4_truncate i_disksize update under i_data_sem.
ext4_ext_truncate is already doing this.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fcaafe4..05e9790 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1893,18 +1893,29 @@ static int ext4_da_write_begin(struct file
*file, struct address_space *mapping,
/*
* Check if we should update i_disksize
* when write to the end of file but not require block allocation
+ * We check only the buffer head mapping the offset.
+ * ex: File with blocksize 1K page size 4K
+ * block 1 and 2 are holes, block 3 is mapped and half filled
+ * seek to block 1 and write ( marked the buffer delay )
+ * seek to block 3 and extent the end of file with end of file still
+ * falling within block 3. Here the writepages won't update the i_disksize
+ * properly because it allocate only block 1. So we need to update
+ * i_disksize in write_end checking only the offset
+ *
*/
-static int ext4_da_should_update_i_disksize(struct page *page,
- unsigned long offset)
+static int ext4_da_should_update_i_disksize(struct address_space *mapping,
+ struct page *page, unsigned long offset)
{
- struct buffer_head *bh;
- unsigned int idx;
int i;
+ unsigned int idx;
+ struct buffer_head *bh;
+ struct inode *inode = mapping->host;
+ unsigned blocksize = inode->i_sb->s_blocksize;
bh = page_buffers(page);
- idx = offset/bh->b_size;
+ idx = (offset + blocksize - 1)/blocksize;
- for (i=0; i < idx; i++)
+ for (i = 1; i < idx; i++)
bh = bh->b_this_page;
if (!buffer_mapped(bh) || (buffer_delay(bh)))
@@ -1934,15 +1945,20 @@ static int ext4_da_write_end(struct file *file,
new_i_size = pos + copied;
if (new_i_size > EXT4_I(inode)->i_disksize)
- if (ext4_da_should_update_i_disksize(page, end)) {
+ if (ext4_da_should_update_i_disksize(mapping, page, end)) {
/*
* Updating i_disksize when extending file without
* need block allocation
*/
- if (ext4_should_order_data(inode))
- ret = ext4_jbd2_file_inode(handle, inode);
+ down_write(&EXT4_I(inode)->i_data_sem);
+ if (new_i_size > EXT4_I(inode)->i_disksize) {
+ if (ext4_should_order_data(inode))
+ ret = ext4_jbd2_file_inode(handle,
+ inode);
- EXT4_I(inode)->i_disksize = new_i_size;
+ EXT4_I(inode)->i_disksize = new_i_size;
+ }
+ up_write(&EXT4_I(inode)->i_data_sem);
}
ret2 = generic_write_end(file, mapping, pos, len, copied,
page, fsdata);
@@ -2987,6 +3003,11 @@ void ext4_truncate(struct inode *inode)
*/
if (ext4_orphan_add(handle, inode))
goto out_stop;
+ /*
+ * From here we block out all ext4_get_block() callers who want to
+ * modify the block allocation tree.
+ */
+ down_write(&ei->i_data_sem);
/*
* The orphan list entry will now protect us from any crash which
@@ -2997,12 +3018,6 @@ void ext4_truncate(struct inode *inode)
*/
ei->i_disksize = inode->i_size;
- /*
- * From here we block out all ext4_get_block() callers who want to
- * modify the block allocation tree.
- */
- down_write(&ei->i_data_sem);
在 2008-07-03四的 19:37 +0530,Aneesh Kumar写道:
> [sending via gmail ]
>
> On Thu, Jul 03, 2008 at 05:03:25PM +0530, Aneesh Kumar K.V wrote:
> > On Wed, Jul 02, 2008 at 10:19:54AM -0700, Mingming Cao wrote:
> > >
> > > On Tue, 2008-07-01 at 17:50 +0000, Gary Hawco wrote:
> > > > Mingming,
> > > >
> > > > Can you post that patch somewhere for download? I access my email using
> > > > Windows Vista, not in linux, so it would be very laborious to hand copy
> > > > this patch and recreate it in linux.
> > > >
> > > Patch attached.
> > >
> > > > Updated the 2.6.26-rc8 kernel with the latest snapshot from today at
> > > > 1833hrs GMT. All hell broke loose in Gentoo, The new kernel wouldn't allow
> > > > the system to remount read/write on boot. But it worked fine in Slackware.
> > > > Gentoo with the experimental openrc-0.2.5 and baselayout2 apparently does
> > > > not like ext4.
> > > >
> > >
> >
> > I think we need to protect i_disksize update with i_data_sem. Otherwise
> > a parallel writepages and write_end can cause issues. I guess that is
> > what Gary is finding. I also did some cleanup for the patch
> >
>
> better one moving ext4_truncate i_disksize update under i_data_sem.
> ext4_ext_truncate is already doing this.
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fcaafe4..05e9790 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1893,18 +1893,29 @@ static int ext4_da_write_begin(struct file
> *file, struct address_space *mapping,
> /*
> * Check if we should update i_disksize
> * when write to the end of file but not require block allocation
> + * We check only the buffer head mapping the offset.
> + * ex: File with blocksize 1K page size 4K
> + * block 1 and 2 are holes, block 3 is mapped and half filled
> + * seek to block 1 and write ( marked the buffer delay )
> + * seek to block 3 and extent the end of file with end of file still
> + * falling within block 3. Here the writepages won't update the i_disksize
> + * properly because it allocate only block 1. So we need to update
> + * i_disksize in write_end checking only the offset
> + *
> */
Okay, Iwill add this comment
> -static int ext4_da_should_update_i_disksize(struct page *page,
> - unsigned long offset)
> +static int ext4_da_should_update_i_disksize(struct address_space *mapping,
> + struct page *page, unsigned long offset)
> {
> - struct buffer_head *bh;
> - unsigned int idx;
> int i;
> + unsigned int idx;
> + struct buffer_head *bh;
> + struct inode *inode = mapping->host;
> + unsigned blocksize = inode->i_sb->s_blocksize;
>
We could use page->mapping->host to get the inode pointer, so no need to
pass the mapping pointer. But I am fine with this.
> bh = page_buffers(page);
> - idx = offset/bh->b_size;
> + idx = (offset + blocksize - 1)/blocksize;
>
> - for (i=0; i < idx; i++)
> + for (i = 1; i < idx; i++)
> bh = bh->b_this_page;
>
Ack
> if (!buffer_mapped(bh) || (buffer_delay(bh)))
> @@ -1934,15 +1945,20 @@ static int ext4_da_write_end(struct file *file,
>
> new_i_size = pos + copied;
> if (new_i_size > EXT4_I(inode)->i_disksize)
> - if (ext4_da_should_update_i_disksize(page, end)) {
> + if (ext4_da_should_update_i_disksize(mapping, page, end)) {
> /*
> * Updating i_disksize when extending file without
> * need block allocation
> */
> - if (ext4_should_order_data(inode))
> - ret = ext4_jbd2_file_inode(handle, inode);
> + down_write(&EXT4_I(inode)->i_data_sem);
> + if (new_i_size > EXT4_I(inode)->i_disksize) {
> + if (ext4_should_order_data(inode))
> + ret = ext4_jbd2_file_inode(handle,
> + inode);
>
> - EXT4_I(inode)->i_disksize = new_i_size;
> + EXT4_I(inode)->i_disksize = new_i_size;
> + }
> + up_write(&EXT4_I(inode)->i_data_sem);
> }
> ret2 = generic_write_end(file, mapping, pos, len, copied,
> page, fsdata);
> @@ -2987,6 +3003,11 @@ void ext4_truncate(struct inode *inode)
> */
> if (ext4_orphan_add(handle, inode))
> goto out_stop;
> + /*
> + * From here we block out all ext4_get_block() callers who want to
> + * modify the block allocation tree.
> + */
> + down_write(&ei->i_data_sem);
>
> /*
> * The orphan list entry will now protect us from any crash which
> @@ -2997,12 +3018,6 @@ void ext4_truncate(struct inode *inode)
> */
> ei->i_disksize = inode->i_size;
>
> - /*
> - * From here we block out all ext4_get_block() callers who want to
> - * modify the block allocation tree.
> - */
> - down_write(&ei->i_data_sem);
> -
> if (n == 1) { /* direct blocks */
> ext4_free_data(handle, inode, NULL, i_data+offsets[0],
> i_data + EXT4_NDIR_BLOCKS);
>
Ack.
I will merge the update to the parent patch, thanx
Mingming
> -aneesh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html