Data journalling mode (data=journal) is known to be neglected by
developers and only minority of people is actually using it. This
mode is also less tested than the other two modes by the developers.
This creates a dangerous combination, because the option which seems
*safer* is actually less safe the others. So this commit adds a warning
message in case that data=journal mode is used, so the user is informed
that the mode might be removed in the future.
Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/super.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9ea71aa..9d189cf 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct super_block *sb,
sbi->s_min_batch_time = option;
break;
case Opt_data_journal:
+ ext4_msg(sb, KERN_WARNING,
+ "Using data=journal may be removed in the "
+ "future. Please, contact "
+ "[email protected] if you are "
+ "using this feature.");
data_opt = EXT4_MOUNT_JOURNAL_DATA;
goto datacheck;
case Opt_data_ordered:
--
1.7.4.4
On Tue, Jun 28, 2011 at 01:26:03PM +0200, Lukas Czerner wrote:
> Data journalling mode (data=journal) is known to be neglected by
> developers and only minority of people is actually using it. This
> mode is also less tested than the other two modes by the developers.
>
> This creates a dangerous combination, because the option which seems
> *safer* is actually less safe the others. So this commit adds a warning
> message in case that data=journal mode is used, so the user is informed
> that the mode might be removed in the future.
When I last benchmarked it, data=journal mode was considerably faster
than the other modes for sync heavy work loads, such as mail servers.
Have there been improvements to other (safe) modes that reverse this?
--
Bruce Guenter <[email protected]> http://untroubled.org/
On Tue, 28 Jun 2011, Bruce Guenter wrote:
> On Tue, Jun 28, 2011 at 01:26:03PM +0200, Lukas Czerner wrote:
> > Data journalling mode (data=journal) is known to be neglected by
> > developers and only minority of people is actually using it. This
> > mode is also less tested than the other two modes by the developers.
> >
> > This creates a dangerous combination, because the option which seems
> > *safer* is actually less safe the others. So this commit adds a warning
> > message in case that data=journal mode is used, so the user is informed
> > that the mode might be removed in the future.
>
> When I last benchmarked it, data=journal mode was considerably faster
> than the other modes for sync heavy work loads, such as mail servers.
> Have there been improvements to other (safe) modes that reverse this?
>
No, not any specific change that I know of. The problem with
data=journal from performance perspective is that, it can improve fsync
heavy loads, however just for a limited bandwidth. Because as you
probably know in this mode it will write all data twice, however it will
generate fewer seeks. So yes, for sync heavy work load with small writes
(such as mail servers might do) the performance might be better. But it
is always the matter of benchmarking your particular case to decide if
it really helps, it is not like this is a general fact that
data=journal implies better performance for fsync heavy writes.
Also note that the even though bandwidth limit might go away (or scale
up significantly) for newer drives like SSD's, the same is true for
seeks, so I guess data=journal would not have any benefits on future
drives. Also as I said those code paths are not as well tested as the
other modes.
Thanks!
-Lukas
On Tue, 28 Jun 2011, Lukas Czerner wrote:
> Data journalling mode (data=journal) is known to be neglected by
> developers and only minority of people is actually using it. This
> mode is also less tested than the other two modes by the developers.
>
> This creates a dangerous combination, because the option which seems
> *safer* is actually less safe the others. So this commit adds a warning
> message in case that data=journal mode is used, so the user is informed
> that the mode might be removed in the future.
Any comments on this ? Is that feasible to remove is someday ?
Thanks!
-Lukas
>
> Signed-off-by: Lukas Czerner <[email protected]>
> ---
> fs/ext4/super.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9ea71aa..9d189cf 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct super_block *sb,
> sbi->s_min_batch_time = option;
> break;
> case Opt_data_journal:
> + ext4_msg(sb, KERN_WARNING,
> + "Using data=journal may be removed in the "
> + "future. Please, contact "
> + "[email protected] if you are "
> + "using this feature.");
> data_opt = EXT4_MOUNT_JOURNAL_DATA;
> goto datacheck;
> case Opt_data_ordered:
>
On 2011-08-11, at 9:01 AM, Lukas Czerner wrote:
> On Tue, 28 Jun 2011, Lukas Czerner wrote:
>> Data journalling mode (data=journal) is known to be neglected by
>> developers and only minority of people is actually using it. This
>> mode is also less tested than the other two modes by the developers.
>>
>> This creates a dangerous combination, because the option which seems
>> *safer* is actually less safe the others. So this commit adds a warning
>> message in case that data=journal mode is used, so the user is informed
>> that the mode might be removed in the future.
>
> Any comments on this ? Is that feasible to remove is someday ?
I'm less in favour of removing data=journal. Jan made some fixes to
data=journal mode in the last few weeks, which means that people are
still using this.
>> Signed-off-by: Lukas Czerner <[email protected]>
>> ---
>> fs/ext4/super.c | 5 +++++
>> 1 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 9ea71aa..9d189cf 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct super_block *sb,
>> sbi->s_min_batch_time = option;
>> break;
>> case Opt_data_journal:
>> + ext4_msg(sb, KERN_WARNING,
>> + "Using data=journal may be removed in the "
>> + "future. Please, contact "
>> + "[email protected] if you are "
>> + "using this feature.");
>> data_opt = EXT4_MOUNT_JOURNAL_DATA;
>> goto datacheck;
>> case Opt_data_ordered:
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
On Thu, 11 Aug 2011, Andreas Dilger wrote:
> On 2011-08-11, at 9:01 AM, Lukas Czerner wrote:
> > On Tue, 28 Jun 2011, Lukas Czerner wrote:
> >> Data journalling mode (data=journal) is known to be neglected by
> >> developers and only minority of people is actually using it. This
> >> mode is also less tested than the other two modes by the developers.
> >>
> >> This creates a dangerous combination, because the option which seems
> >> *safer* is actually less safe the others. So this commit adds a warning
> >> message in case that data=journal mode is used, so the user is informed
> >> that the mode might be removed in the future.
> >
> > Any comments on this ? Is that feasible to remove is someday ?
>
> I'm less in favour of removing data=journal. Jan made some fixes to
> data=journal mode in the last few weeks, which means that people are
> still using this.
I think that Jan was actually the one who was in favour of this change
IIRC. But you're right that there are still some (very little possibly?)
users of this. But this change does not remove it, but just let the
users know that it might be removed someday, hence discouraging them from
using it.
Also we were discussing that several times, so I think that letting
users know that we are considering it is a good thing.
Thanks!
-Lukas
>
> >> Signed-off-by: Lukas Czerner <[email protected]>
> >> ---
> >> fs/ext4/super.c | 5 +++++
> >> 1 files changed, 5 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >> index 9ea71aa..9d189cf 100644
> >> --- a/fs/ext4/super.c
> >> +++ b/fs/ext4/super.c
> >> @@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct super_block *sb,
> >> sbi->s_min_batch_time = option;
> >> break;
> >> case Opt_data_journal:
> >> + ext4_msg(sb, KERN_WARNING,
> >> + "Using data=journal may be removed in the "
> >> + "future. Please, contact "
> >> + "[email protected] if you are "
> >> + "using this feature.");
> >> data_opt = EXT4_MOUNT_JOURNAL_DATA;
> >> goto datacheck;
> >> case Opt_data_ordered:
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
>
--
On 08/12/2011 09:16 AM, Lukas Czerner wrote:
> On Thu, 11 Aug 2011, Andreas Dilger wrote:
>
>> On 2011-08-11, at 9:01 AM, Lukas Czerner wrote:
>>> On Tue, 28 Jun 2011, Lukas Czerner wrote:
>>>> Data journalling mode (data=journal) is known to be neglected by
>>>> developers and only minority of people is actually using it. This
>>>> mode is also less tested than the other two modes by the developers.
>>>>
>>>> This creates a dangerous combination, because the option which seems
>>>> *safer* is actually less safe the others. So this commit adds a warning
>>>> message in case that data=journal mode is used, so the user is informed
>>>> that the mode might be removed in the future.
>>> Any comments on this ? Is that feasible to remove is someday ?
>> I'm less in favour of removing data=journal. Jan made some fixes to
>> data=journal mode in the last few weeks, which means that people are
>> still using this.
> I think that Jan was actually the one who was in favour of this change
> IIRC. But you're right that there are still some (very little possibly?)
> users of this. But this change does not remove it, but just let the
> users know that it might be removed someday, hence discouraging them from
> using it.
>
> Also we were discussing that several times, so I think that letting
> users know that we are considering it is a good thing.
>
> Thanks!
> -Lukas
I think that this will be very useful to have - users should definitely chime in
when they see this warning if they are using data journal mode.
The only work load that I know that benefits from a performance point of view is
one which involves an fsync() heavy, small file creation workload. Any workload
with larger files tends to lose roughly 50% of the write bandwidth under
streaming writes since we end up writing everything twice.
Regards,
Ric
>
>>>> Signed-off-by: Lukas Czerner<[email protected]>
>>>> ---
>>>> fs/ext4/super.c | 5 +++++
>>>> 1 files changed, 5 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>>> index 9ea71aa..9d189cf 100644
>>>> --- a/fs/ext4/super.c
>>>> +++ b/fs/ext4/super.c
>>>> @@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct super_block *sb,
>>>> sbi->s_min_batch_time = option;
>>>> break;
>>>> case Opt_data_journal:
>>>> + ext4_msg(sb, KERN_WARNING,
>>>> + "Using data=journal may be removed in the "
>>>> + "future. Please, contact "
>>>> + "[email protected] if you are "
>>>> + "using this feature.");
>>>> data_opt = EXT4_MOUNT_JOURNAL_DATA;
>>>> goto datacheck;
>>>> case Opt_data_ordered:
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> Cheers, Andreas
>>
>>
>>
>>
>>
>>
I don't know much about data=journal, but I've been running xfstests
with it, and it's a disaster, given that data=journal doesn't support
O_DIRECT. What kind of testing do people do on data=journal?
And worse, I consistently crash my machine running xfstests 074 on a
data=journal partition. I just repeated this to make sure, on
3.1.0-rc1; I've also seen it with 3.0.0. There's a BUG_ON firing in
jbd2_journal_dirty_metadata():
[ 2174.697692] ------------[ cut here ]------------
[ 2174.698627] kernel BUG at fs/jbd2/transaction.c:1112!
[ 2174.698627] invalid opcode: 0000 [#1] SMP
[ 2174.698627] CPU 11
...
[ 2174.698627] Call Trace:
[ 2174.698627] [<ffffffff8127f9a3>] __ext4_handle_dirty_metadata+0x83/0x120
[ 2174.698627] [<ffffffff8127fd1e>] ? __ext4_journal_get_write_access+0x3e/0x80
[ 2174.698627] [<ffffffff81253a78>] __ext4_journalled_writepage+0x338/0x3e0
[ 2174.698627] [<ffffffff81254244>] ext4_writepage+0x234/0x2f0
[ 2174.698627] [<ffffffff81158327>] __writepage+0x17/0x40
[ 2174.698627] [<ffffffff811597ae>] write_cache_pages+0x1fe/0x650
This is at the J_ASSERT_JH below:
/*
* Metadata already on the current transaction list doesn't
* need to be filed. Metadata on another transaction's list must
* be committing, and will be refiled once the commit completes:
* leave it alone for now.
*/
if (jh->b_transaction != transaction) {
JBUFFER_TRACE(jh, "already on other transaction");
J_ASSERT_JH(jh, jh->b_transaction ==
journal->j_committing_transaction); <===============
J_ASSERT_JH(jh, jh->b_next_transaction == transaction);
/* And this case is illegal: we can't reuse another
* transaction's data buffer, ever. */
goto out_unlock_bh;
}
Curt
On Fri, Aug 12, 2011 at 1:25 AM, Ric Wheeler <[email protected]> wrote:
> On 08/12/2011 09:16 AM, Lukas Czerner wrote:
>>
>> On Thu, 11 Aug 2011, Andreas Dilger wrote:
>>
>>> On 2011-08-11, at 9:01 AM, Lukas Czerner wrote:
>>>>
>>>> On Tue, 28 Jun 2011, Lukas Czerner wrote:
>>>>>
>>>>> Data journalling mode (data=journal) is known to be neglected by
>>>>> developers and only minority of people is actually using it. This
>>>>> mode is also less tested than the other two modes by the developers.
>>>>>
>>>>> This creates a dangerous combination, because the option which seems
>>>>> *safer* is actually less safe the others. So this commit adds a warning
>>>>> message in case that data=journal mode is used, so the user is informed
>>>>> that the mode might be removed in the future.
>>>>
>>>> Any comments on this ? Is that feasible to remove is someday ?
>>>
>>> I'm less in favour of removing data=journal. ?Jan made some fixes to
>>> data=journal mode in the last few weeks, which means that people are
>>> still using this.
>>
>> I think that Jan was actually the one who was in favour of this change
>> IIRC. But you're right that there are still some (very little possibly?)
>> users of this. But this change does not remove it, but just let the
>> users know that it might be removed someday, hence discouraging them from
>> using it.
>>
>> Also we were discussing that several times, so I think that letting
>> users know that we are considering it is a good thing.
>>
>> Thanks!
>> -Lukas
>
> I think that this will be very useful to have - users should definitely
> chime in when they see this warning if they are using data journal mode.
>
> The only work load that I know that benefits from a performance point of
> view is one which involves an fsync() heavy, small file creation workload.
> ?Any workload with larger files tends to lose roughly 50% of the write
> bandwidth under streaming writes since we end up writing everything twice.
>
> Regards,
>
> Ric
>
>
>>
>>>>> Signed-off-by: Lukas Czerner<[email protected]>
>>>>> ---
>>>>> fs/ext4/super.c | ? ?5 +++++
>>>>> 1 files changed, 5 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>>>> index 9ea71aa..9d189cf 100644
>>>>> --- a/fs/ext4/super.c
>>>>> +++ b/fs/ext4/super.c
>>>>> @@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct
>>>>> super_block *sb,
>>>>> ? ? ? ? ? ? ? ? ? ? ? ?sbi->s_min_batch_time = option;
>>>>> ? ? ? ? ? ? ? ? ? ? ? ?break;
>>>>> ? ? ? ? ? ? ? ?case Opt_data_journal:
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ext4_msg(sb, KERN_WARNING,
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"Using data=journal may be removed in
>>>>> the "
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"future. Please, contact "
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"[email protected] if you are
>>>>> "
>>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"using this feature.");
>>>>> ? ? ? ? ? ? ? ? ? ? ? ?data_opt = EXT4_MOUNT_JOURNAL_DATA;
>>>>> ? ? ? ? ? ? ? ? ? ? ? ?goto datacheck;
>>>>> ? ? ? ? ? ? ? ?case Opt_data_ordered:
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>
>>> Cheers, Andreas
>>>
>>>
>>>
>>>
>>>
>>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>
On Fri, 12 Aug 2011, Curt Wohlgemuth wrote:
> I don't know much about data=journal, but I've been running xfstests
> with it, and it's a disaster, given that data=journal doesn't support
> O_DIRECT. What kind of testing do people do on data=journal?
Short answer is probably none :). Even though that it seems like an
radical answer I believe that it is mostly true, because simply said
almost no-one care. But I think that Ted mentioned that he actually do
some tests with that mode, but I am not sure about that.
>
> And worse, I consistently crash my machine running xfstests 074 on a
> data=journal partition. I just repeated this to make sure, on
> 3.1.0-rc1; I've also seen it with 3.0.0. There's a BUG_ON firing in
> jbd2_journal_dirty_metadata():
>
> [ 2174.697692] ------------[ cut here ]------------
> [ 2174.698627] kernel BUG at fs/jbd2/transaction.c:1112!
> [ 2174.698627] invalid opcode: 0000 [#1] SMP
> [ 2174.698627] CPU 11
> ...
> [ 2174.698627] Call Trace:
> [ 2174.698627] [<ffffffff8127f9a3>] __ext4_handle_dirty_metadata+0x83/0x120
> [ 2174.698627] [<ffffffff8127fd1e>] ? __ext4_journal_get_write_access+0x3e/0x80
> [ 2174.698627] [<ffffffff81253a78>] __ext4_journalled_writepage+0x338/0x3e0
> [ 2174.698627] [<ffffffff81254244>] ext4_writepage+0x234/0x2f0
> [ 2174.698627] [<ffffffff81158327>] __writepage+0x17/0x40
> [ 2174.698627] [<ffffffff811597ae>] write_cache_pages+0x1fe/0x650
>
> This is at the J_ASSERT_JH below:
>
> /*
> * Metadata already on the current transaction list doesn't
> * need to be filed. Metadata on another transaction's list must
> * be committing, and will be refiled once the commit completes:
> * leave it alone for now.
> */
> if (jh->b_transaction != transaction) {
> JBUFFER_TRACE(jh, "already on other transaction");
> J_ASSERT_JH(jh, jh->b_transaction ==
> journal->j_committing_transaction); <===============
> J_ASSERT_JH(jh, jh->b_next_transaction == transaction);
> /* And this case is illegal: we can't reuse another
> * transaction's data buffer, ever. */
> goto out_unlock_bh;
> }
Wow, that backtrace seems like a flash-back to me I believe that I have
seen it several times, probably in different modes and probably already
fixed. Do no know about this one though.
Thanks!
-Lukas
>
> Curt
>
> On Fri, Aug 12, 2011 at 1:25 AM, Ric Wheeler <[email protected]> wrote:
> > On 08/12/2011 09:16 AM, Lukas Czerner wrote:
> >>
> >> On Thu, 11 Aug 2011, Andreas Dilger wrote:
> >>
> >>> On 2011-08-11, at 9:01 AM, Lukas Czerner wrote:
> >>>>
> >>>> On Tue, 28 Jun 2011, Lukas Czerner wrote:
> >>>>>
> >>>>> Data journalling mode (data=journal) is known to be neglected by
> >>>>> developers and only minority of people is actually using it. This
> >>>>> mode is also less tested than the other two modes by the developers.
> >>>>>
> >>>>> This creates a dangerous combination, because the option which seems
> >>>>> *safer* is actually less safe the others. So this commit adds a warning
> >>>>> message in case that data=journal mode is used, so the user is informed
> >>>>> that the mode might be removed in the future.
> >>>>
> >>>> Any comments on this ? Is that feasible to remove is someday ?
> >>>
> >>> I'm less in favour of removing data=journal. ?Jan made some fixes to
> >>> data=journal mode in the last few weeks, which means that people are
> >>> still using this.
> >>
> >> I think that Jan was actually the one who was in favour of this change
> >> IIRC. But you're right that there are still some (very little possibly?)
> >> users of this. But this change does not remove it, but just let the
> >> users know that it might be removed someday, hence discouraging them from
> >> using it.
> >>
> >> Also we were discussing that several times, so I think that letting
> >> users know that we are considering it is a good thing.
> >>
> >> Thanks!
> >> -Lukas
> >
> > I think that this will be very useful to have - users should definitely
> > chime in when they see this warning if they are using data journal mode.
> >
> > The only work load that I know that benefits from a performance point of
> > view is one which involves an fsync() heavy, small file creation workload.
> > ?Any workload with larger files tends to lose roughly 50% of the write
> > bandwidth under streaming writes since we end up writing everything twice.
> >
> > Regards,
> >
> > Ric
> >
> >
> >>
> >>>>> Signed-off-by: Lukas Czerner<[email protected]>
> >>>>> ---
> >>>>> fs/ext4/super.c | ? ?5 +++++
> >>>>> 1 files changed, 5 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >>>>> index 9ea71aa..9d189cf 100644
> >>>>> --- a/fs/ext4/super.c
> >>>>> +++ b/fs/ext4/super.c
> >>>>> @@ -1631,6 +1631,11 @@ static int parse_options(char *options, struct
> >>>>> super_block *sb,
> >>>>> ? ? ? ? ? ? ? ? ? ? ? ?sbi->s_min_batch_time = option;
> >>>>> ? ? ? ? ? ? ? ? ? ? ? ?break;
> >>>>> ? ? ? ? ? ? ? ?case Opt_data_journal:
> >>>>> + ? ? ? ? ? ? ? ? ? ? ? ext4_msg(sb, KERN_WARNING,
> >>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"Using data=journal may be removed in
> >>>>> the "
> >>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"future. Please, contact "
> >>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"[email protected] if you are
> >>>>> "
> >>>>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"using this feature.");
> >>>>> ? ? ? ? ? ? ? ? ? ? ? ?data_opt = EXT4_MOUNT_JOURNAL_DATA;
> >>>>> ? ? ? ? ? ? ? ? ? ? ? ?goto datacheck;
> >>>>> ? ? ? ? ? ? ? ?case Opt_data_ordered:
> >>>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> >>>> the body of a message to [email protected]
> >>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> >>>
> >>> Cheers, Andreas
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> >
>
--
On Fri, Aug 12, 2011 at 06:08:21PM +0200, Lukas Czerner wrote:
> On Fri, 12 Aug 2011, Curt Wohlgemuth wrote:
>
> > I don't know much about data=journal, but I've been running xfstests
> > with it, and it's a disaster, given that data=journal doesn't support
> > O_DIRECT. What kind of testing do people do on data=journal?
I have a rather long list of expected failures, mostly having to do
with xfstests assuming that O_DIRECT has to be supported. On my todo
list is to scrub through the list failures that I've seen, make sure
they are indeed related to O_DIRECT, and then see if I can figure out
some way of telling xfstests to skip O_DIRECT tests via some
environment variable or command line option.
For the record this is what I mostly expect (from a somewhat older
xfstests) in the data=journal case:
Ran: 001 002 005 006 007 011 013 014 053 069 070 074 075 076 077 088 089 100 105 112 113 123 124 125 126 127 128 129 130 131 132 133 135 141 169 184 193 198 204 207 208 209 210 211 212 213 214 215 219 221 223 224 225 226 228 230 231 232 233 234 235 236 237 239 240 243 245 246 247 248 249 256
Failures: 113 125 130 133 135 198 207 208 209 210 214 223 226 239 240 245
BTW, with the very latest xfstests, I'm seeing new across-the-board
(not just data=journal) failures for tests #62 (caused by the presence
of the lost+found directory and differences in error code returns for
xattrs) and #79 (a failure in the append-only handling which I don't
completely understand yet).
> Short answer is probably none :). Even though that it seems like an
> radical answer I believe that it is mostly true, because simply said
> almost no-one care. But I think that Ted mentioned that he actually do
> some tests with that mode, but I am not sure about that.
Yes, I'm testing with data=journal, and I haven't been able to
reproduce the crash which curtw is seeing (see above; test #74 passes
in my 2cpu/512mb KVM test environment). I'll add some instrumentation
to the BUG_ON in question and then try to reproduce it in curtw's
environment.
- Ted
On Fri, Aug 12, 2011 at 02:13:30PM -0400, Ted Ts'o wrote:
> I have a rather long list of expected failures, mostly having to do
> with xfstests assuming that O_DIRECT has to be supported. On my todo
> list is to scrub through the list failures that I've seen, make sure
> they are indeed related to O_DIRECT, and then see if I can figure out
> some way of telling xfstests to skip O_DIRECT tests via some
> environment variable or command line option.
If you do it please do it by returning a defined failure from the
test programs and then just exiting the test with _notrun. But given
that xfstests does a lot of O_DIRECT testing it might be quite involved.
To be honest I'd expect a Linux filesystem without O_DIRECT not working
overly well in practical setup - it's pretty widely used these days.
A better fix might be simply accept O_DIRECT for data=journal, but
use the pagecache with a use once hint.
> BTW, with the very latest xfstests, I'm seeing new across-the-board
> (not just data=journal) failures for tests #62 (caused by the presence
> of the lost+found directory
062 fails because Andreas changed the error returns from the xattr
calls. He sent me a patch to accept the new errors, but I'm still
undecided wether the new ones are correct enough. I'll wait another
kernel release to see if real users complain about the change, and
will apply it then.
> and differences in error code returns for
> xattrs) and #79 (a failure in the append-only handling which I don't
> completely understand yet).
This was discussed on fsdevel lately. All filesystem but xfs inherit
the append only bit from directories to files created inside them.
This not only is not very useful behaviour, but also exposes a bug
in the vfs that allows to create these new files, but fail the open
unless using O_APPEND which is against the posix create semantics.
We'll have to either adopt the oether filesystems to the xfs semantics,
or adopt xfs to the common dumb semantics and fix that O_CREAT bug.
On Fri, Aug 12, 2011 at 04:57:19PM -0400, Christoph Hellwig wrote:
> > and differences in error code returns for
> > xattrs) and #79 (a failure in the append-only handling which I don't
> > completely understand yet).
>
> This was discussed on fsdevel lately. All filesystem but xfs inherit
> the append only bit from directories to files created inside them.
> This not only is not very useful behaviour, but also exposes a bug
> in the vfs that allows to create these new files, but fail the open
> unless using O_APPEND which is against the posix create semantics.
I agree that disabling inheritance of the APPEND_FL flag makes sense.
I propose we do this for ext2/3/4 and any other file systems which is
currently following the ext2/3/4 behavior.
Any objections?
- Ted
This doesn't make much sense, and it exposes a bug in the kernel where
attempts to create a new file in an append-only directory using
O_CREAT will fail (but still leave a zero-length file). This was
discovered when xfstests #79 was generalized so it could run on all
file systems.
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 3 +--
include/linux/ext2_fs.h | 4 ++--
include/linux/ext3_fs.h | 4 ++--
3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e717dfd..be593d5 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -357,8 +357,7 @@ struct flex_groups {
/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
- EXT4_SYNC_FL | EXT4_IMMUTABLE_FL | EXT4_APPEND_FL |\
- EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
+ EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 53792bf..ce1b719 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -197,8 +197,8 @@ struct ext2_group_desc
/* Flags that should be inherited by new inodes from their parent. */
#define EXT2_FL_INHERITED (EXT2_SECRM_FL | EXT2_UNRM_FL | EXT2_COMPR_FL |\
- EXT2_SYNC_FL | EXT2_IMMUTABLE_FL | EXT2_APPEND_FL |\
- EXT2_NODUMP_FL | EXT2_NOATIME_FL | EXT2_COMPRBLK_FL|\
+ EXT2_SYNC_FL | EXT2_NODUMP_FL |\
+ EXT2_NOATIME_FL | EXT2_COMPRBLK_FL |\
EXT2_NOCOMP_FL | EXT2_JOURNAL_DATA_FL |\
EXT2_NOTAIL_FL | EXT2_DIRSYNC_FL)
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 67a803a..0244611 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -180,8 +180,8 @@ struct ext3_group_desc
/* Flags that should be inherited by new inodes from their parent. */
#define EXT3_FL_INHERITED (EXT3_SECRM_FL | EXT3_UNRM_FL | EXT3_COMPR_FL |\
- EXT3_SYNC_FL | EXT3_IMMUTABLE_FL | EXT3_APPEND_FL |\
- EXT3_NODUMP_FL | EXT3_NOATIME_FL | EXT3_COMPRBLK_FL|\
+ EXT3_SYNC_FL | EXT3_NODUMP_FL |\
+ EXT3_NOATIME_FL | EXT3_COMPRBLK_FL |\
EXT3_NOCOMPR_FL | EXT3_JOURNAL_DATA_FL |\
EXT3_NOTAIL_FL | EXT3_DIRSYNC_FL)
--
1.7.4.1.22.gec8e1.dirty
On Sat 13-08-11 22:06:17, Ted Tso wrote:
> On Fri, Aug 12, 2011 at 04:57:19PM -0400, Christoph Hellwig wrote:
> > > and differences in error code returns for
> > > xattrs) and #79 (a failure in the append-only handling which I don't
> > > completely understand yet).
> >
> > This was discussed on fsdevel lately. All filesystem but xfs inherit
> > the append only bit from directories to files created inside them.
> > This not only is not very useful behaviour, but also exposes a bug
> > in the vfs that allows to create these new files, but fail the open
> > unless using O_APPEND which is against the posix create semantics.
>
> I agree that disabling inheritance of the APPEND_FL flag makes sense.
> I propose we do this for ext2/3/4 and any other file systems which is
> currently following the ext2/3/4 behavior.
>
> Any objections?
I'm fine with that. It also looks more like a bug (ommission) than
anything else.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 8/13/11 9:28 PM, Theodore Ts'o wrote:
> This doesn't make much sense, and it exposes a bug in the kernel where
> attempts to create a new file in an append-only directory using
> O_CREAT will fail (but still leave a zero-length file). This was
> discovered when xfstests #79 was generalized so it could run on all
> file systems.
Curious about the status of this one; I think it makes sense to me, but
I don't think it ever made it upstream? I'd be willing to give it a:
Reviewed-by: Eric Sandeen <[email protected]>
Are there concerns about it or did it just slip through the cracks?
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/ext4.h | 3 +--
> include/linux/ext2_fs.h | 4 ++--
> include/linux/ext3_fs.h | 4 ++--
> 3 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index e717dfd..be593d5 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -357,8 +357,7 @@ struct flex_groups {
>
> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
> - EXT4_SYNC_FL | EXT4_IMMUTABLE_FL | EXT4_APPEND_FL |\
> - EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> + EXT4_SYNC_FL | EXT4_NODUMP_FL | EXT4_NOATIME_FL |\
> EXT4_NOCOMPR_FL | EXT4_JOURNAL_DATA_FL |\
> EXT4_NOTAIL_FL | EXT4_DIRSYNC_FL)
>
> diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
> index 53792bf..ce1b719 100644
> --- a/include/linux/ext2_fs.h
> +++ b/include/linux/ext2_fs.h
> @@ -197,8 +197,8 @@ struct ext2_group_desc
>
> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT2_FL_INHERITED (EXT2_SECRM_FL | EXT2_UNRM_FL | EXT2_COMPR_FL |\
> - EXT2_SYNC_FL | EXT2_IMMUTABLE_FL | EXT2_APPEND_FL |\
> - EXT2_NODUMP_FL | EXT2_NOATIME_FL | EXT2_COMPRBLK_FL|\
> + EXT2_SYNC_FL | EXT2_NODUMP_FL |\
> + EXT2_NOATIME_FL | EXT2_COMPRBLK_FL |\
> EXT2_NOCOMP_FL | EXT2_JOURNAL_DATA_FL |\
> EXT2_NOTAIL_FL | EXT2_DIRSYNC_FL)
>
> diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
> index 67a803a..0244611 100644
> --- a/include/linux/ext3_fs.h
> +++ b/include/linux/ext3_fs.h
> @@ -180,8 +180,8 @@ struct ext3_group_desc
>
> /* Flags that should be inherited by new inodes from their parent. */
> #define EXT3_FL_INHERITED (EXT3_SECRM_FL | EXT3_UNRM_FL | EXT3_COMPR_FL |\
> - EXT3_SYNC_FL | EXT3_IMMUTABLE_FL | EXT3_APPEND_FL |\
> - EXT3_NODUMP_FL | EXT3_NOATIME_FL | EXT3_COMPRBLK_FL|\
> + EXT3_SYNC_FL | EXT3_NODUMP_FL |\
> + EXT3_NOATIME_FL | EXT3_COMPRBLK_FL |\
> EXT3_NOCOMPR_FL | EXT3_JOURNAL_DATA_FL |\
> EXT3_NOTAIL_FL | EXT3_DIRSYNC_FL)
>
On Thu, Dec 22, 2011 at 12:06:36PM -0600, Eric Sandeen wrote:
> On 8/13/11 9:28 PM, Theodore Ts'o wrote:
> > This doesn't make much sense, and it exposes a bug in the kernel where
> > attempts to create a new file in an append-only directory using
> > O_CREAT will fail (but still leave a zero-length file). This was
> > discovered when xfstests #79 was generalized so it could run on all
> > file systems.
>
> Curious about the status of this one; I think it makes sense to me, but
> I don't think it ever made it upstream? I'd be willing to give it a:
>
> Reviewed-by: Eric Sandeen <[email protected]>
>
> Are there concerns about it or did it just slip through the cracks?
Yes, it's there. It hit upstream as of v3.2-rc1. Commit id 1cd9f097
- Ted
On 12/22/11 12:19 PM, Ted Ts'o wrote:
> On Thu, Dec 22, 2011 at 12:06:36PM -0600, Eric Sandeen wrote:
>> On 8/13/11 9:28 PM, Theodore Ts'o wrote:
>>> This doesn't make much sense, and it exposes a bug in the kernel where
>>> attempts to create a new file in an append-only directory using
>>> O_CREAT will fail (but still leave a zero-length file). This was
>>> discovered when xfstests #79 was generalized so it could run on all
>>> file systems.
>>
>> Curious about the status of this one; I think it makes sense to me, but
>> I don't think it ever made it upstream? I'd be willing to give it a:
>>
>> Reviewed-by: Eric Sandeen <[email protected]>
>>
>> Are there concerns about it or did it just slip through the cracks?
>
> Yes, it's there. It hit upstream as of v3.2-rc1. Commit id 1cd9f097
Argh sorry, how did I miss that, I thought my tree was up to date, oh well.
Sorry for the noise & thanks,
-Eric
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html