2000-12-13 05:16:22

by Anton Petrusevich

Subject: test12: innd bug came back?


Hi folks.

Today I saw the well-known "innd bug" (truncate(tm)), and my brother said
he had seen it with -test12-pre7. I don't know about -test12-pre3:
neither I nor my brother had noticed it since -test10, but we could have
missed it with -test12-pre3, and I didn't try any -test11 kernels. So
it was possibly introduced by changes between -test12-pre3 and
-test12-pre7, but I can definitely say it is present in -test12-final.

Another truncate(tm)?
--
Anton


2000-12-13 21:59:48

by Henrik Størner

Subject: Re: test12: innd bug came back?

In <[email protected]> Anton Petrusevich <[email protected]> writes:

>Today I saw the well-known "innd bug" (truncate(tm)), and my brother said
>he had seen it with -test12-pre7. I don't know about -test12-pre3:
>neither I nor my brother had noticed it since -test10, but we could have
>missed it with -test12-pre3, and I didn't try any -test11 kernels. So
>it was possibly introduced by changes between -test12-pre3 and
>-test12-pre7, but I can definitely say it is present in -test12-final.

Just to add a "me too" on this. I didn't report when I saw it last week,
because I was uncertain of exactly what might have caused it - I was
booting several different kernels at the time, including one from a
rescue disk (I was trying to salvage bits of a Win9x disk at the time -
don't ask for details!)

Alas, I lost the test program someone wrote to test for the truncate
problem, and due to moving I will not be able to test anything until
next Monday. But if needed, I can do some testing then. Something
definitely went wrong with innd during the test12 pre-patches.
--
Henrik Storner | "Crackers thrive on code secrecy. Cockroaches breed
<[email protected]> | in the dark. It's time to let the sunlight in."
|
| Eric S. Raymond, re. the Frontpage backdoor

2000-12-13 22:27:02

by Alexander Viro

Subject: Re: test12: innd bug came back?



On 13 Dec 2000, Henrik Størner wrote:

> Just to add a "me too" on this. I didn't report when I saw it last week,
> because I was uncertain of exactly what might have caused it - I was
> booting several different kernels at the time, including one from a
> rescue disk (I was trying to salvage bits of a Win9x disk at the time -
> don't ask for details!)
>
> Alas, I lost the test program someone wrote to test for the truncate
> problem, and due to moving I will not be able to test anything until
> next Monday. But if needed, I can do some testing then. Something
> definitely went wrong with innd during the test12 pre-patches.

It may be a side effect of removing partial_clear() in test12-final.
Relevant chunk (in mm/memory.c):
@@ -953,10 +914,6 @@
 		/* Ok, partially affected.. */
 		start += diff << PAGE_SHIFT;
 		len = (len - diff) << PAGE_SHIFT;
-		if (start & ~PAGE_MASK) {
-			partial_clear(mpnt, start);
-			start = (start + ~PAGE_MASK) & PAGE_MASK;
-		}
 		flush_cache_range(mm, start, end);
 		zap_page_range(mm, start, len);
 		flush_tlb_range(mm, start, end);
should actually be
@@ -954,7 +915,6 @@
 		start += diff << PAGE_SHIFT;
 		len = (len - diff) << PAGE_SHIFT;
 		if (start & ~PAGE_MASK) {
-			partial_clear(mpnt, start);
 			start = (start + ~PAGE_MASK) & PAGE_MASK;
 		}
 		flush_cache_range(mm, start, end);

IOW, we have an off-by-one when calling zap_page_range() and friends.
Cheers,
Al

2000-12-13 22:34:23

by Linus Torvalds

Subject: Re: test12: innd bug came back?

In article <[email protected]>,
Alexander Viro <[email protected]> wrote:
>
>
>On 13 Dec 2000, Henrik Størner wrote:
>
>> Just to add a "me too" on this. I didn't report when I saw it last week,
>> because I was uncertain of exactly what might have caused it - I was
>> booting several different kernels at the time, including one from a
>> rescue disk (I was trying to salvage bits of a Win9x disk at the time -
>> don't ask for details!)
>>
>> Alas, I lost the test program someone wrote to test for the truncate
>> problem, and due to moving I will not be able to test anything until
>> next Monday. But if needed, I can do some testing then. Something
>> definitely went wrong with innd during the test12 pre-patches.
>
>It may be a side effect of removing partial_clear() in test12-final.

No. If you read the code, partial_clear() has been a no-op for the
longest time (the "start & ~PAGE_MASK" thing could never trigger, as
"start" has been page-aligned for a long long while now).

So it must be something else.

Linus
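
The alignment argument above can be checked in user space. The sketch
below is not kernel code: it assumes 4 KiB pages (PAGE_SHIFT of 12, as on
i386) and defines local stand-ins for PAGE_SIZE and PAGE_MASK. The helper
names is_unaligned(), round_up_to_page() and alignment_demo() are made up
for illustration.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on i386. These are local stand-ins for the
 * kernel's macros, not the kernel headers themselves. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical helper: non-zero if addr is not on a page boundary.
 * This is the test that guarded partial_clear() in the removed hunk. */
static int is_unaligned(unsigned long addr)
{
	return (addr & ~PAGE_MASK) != 0;
}

/* Hypothetical helper: the rounding expression from the same hunk,
 * which rounds addr up to the next page boundary. */
static unsigned long round_up_to_page(unsigned long addr)
{
	return (addr + ~PAGE_MASK) & PAGE_MASK;
}

static void alignment_demo(void)
{
	/* Page-aligned start: the guard is false and rounding is a no-op. */
	assert(!is_unaligned(0x2000UL));
	assert(round_up_to_page(0x2000UL) == 0x2000UL);
	/* Unaligned start: the guard fires and start moves up a page. */
	assert(is_unaligned(0x2001UL));
	assert(round_up_to_page(0x2001UL) == 0x3000UL);
}
```

If "start" is always page-aligned by the time it reaches this code, only
the first pair of assertions describes what can happen, and both the
partial_clear() call and the rounding are dead code.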

2000-12-13 23:23:33

by Albert Cranford

Subject: Re: test12: innd bug came back?

And the problem started with pre8, not final.
I am currently investigating the differences between pre7 and pre8.
Albert
Linus Torvalds wrote:
>
> In article <[email protected]>,
> Alexander Viro <[email protected]> wrote:
> >
> >
> >On 13 Dec 2000, Henrik Størner wrote:
> >
> >> Just to add a "me too" on this. I didn't report when I saw it last week,
> >> because I was uncertain of exactly what might have caused it - I was
> >> booting several different kernels at the time, including one from a
> >> rescue disk (I was trying to salvage bits of a Win9x disk at the time -
> >> don't ask for details!)
> >>
> >> Alas, I lost the test program someone wrote to test for the truncate
> >> problem, and due to moving I will not be able to test anything until
> >> next Monday. But if needed, I can do some testing then. Something
> >> definitely went wrong with innd during the test12 pre-patches.
> >
> >It may be a side effect of removing partial_clear() in test12-final.
>
> No. If you read the code, partial_clear() has been a no-op for the
> longest time (the "start & ~PAGE_MASK" thing could never trigger, as
> "start" has been page-aligned for a long long while now).
>
> So it must be something else.
>
> Linus

--
Albert Cranford Deerfield Beach FL USA
[email protected]

2000-12-17 19:04:18

by Jorg de Jong

Subject: Re: test12: innd bug came back?

> >On 13 Dec 2000, Henrik Størner wrote:
> >
> >> Just to add a "me too" on this. I didn't report when I saw it last week,

I'd like to second that. ME TOO !
Since I switched to 2.4.0.test12 I again have the innd bug.
( well at least the same symptoms !)

No problems with test10 and test11. I have not used any pre kernels.

regards


--
Jorg de Jong
Work : mailto:[email protected]
Play : mailto:[email protected]

2000-12-17 22:05:52

by Alexander Viro

Subject: Re: test12: innd bug came back?



On Sun, 17 Dec 2000, Jorg de Jong wrote:

> > >On 13 Dec 2000, Henrik Størner wrote:
> > >
> > >> Just to add a "me too" on this. I didn't report when I saw it last week,
>
> I'd like to second that. ME TOO !
> Since I switched to 2.4.0.test12 I again have the innd bug.
> ( well at least the same symptoms !)

Guys, what blocksize are you using? BTW, old testcase was
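cat >foo.c <<EOF
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd;
	char c = 0;

	if (argc < 2)
		return 1;
	/* Cut the file down to its first 10 bytes. */
	truncate(argv[1], 10);
	/* Write one zero byte at offset 16384, leaving a hole after byte 10. */
	fd = open(argv[1], O_WRONLY);
	lseek(fd, 16384, SEEK_SET);
	write(fd, &c, 1);
	close(fd);
	return 0;
}
EOF
gcc foo.c
./a.out /tmp/something_old
od -c </tmp/something_old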
where something_old would be a file not touched for a long time (i.e.
completely out of cache). Buggy kernels would leave much more than
10 non-zero bytes. The correct result is a file with bytes 11-16385 being zero.
I doubt that it would be the same beast, though...
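
Reading the od output by eye is error-prone, so the expected result can
also be checked programmatically. This is a minimal sketch, assuming the
testcase above has already been run on the file. The function name
check_hole() and the two constants are illustrative and not part of the
original testcase.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define KEEP_LEN  10     /* length passed to truncate() in the testcase */
#define WRITE_OFF 16384  /* offset of the single byte written afterwards */

/* Hypothetical helper: returns 0 if the file is exactly WRITE_OFF + 1 bytes
 * long and every byte from offset KEEP_LEN onwards is zero, 1 if stale
 * data or a wrong length is found, -1 on I/O error. */
static int check_hole(const char *path)
{
	unsigned char buf[4096];
	off_t pos = 0;
	ssize_t n, i;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof buf)) > 0) {
		for (i = 0; i < n; i++, pos++) {
			if (pos >= KEEP_LEN && buf[i] != 0) {
				close(fd);
				return 1;	/* old contents resurfaced */
			}
		}
	}
	close(fd);
	if (n < 0)
		return -1;
	return pos == WRITE_OFF + 1 ? 0 : 1;
}
```

Offsets here are zero-based, so "bytes 11-16385" in the description
corresponds to offsets 10 through 16384. A return value of 1 on a file
that went through the testcase would point at the old truncate symptom.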

2000-12-17 22:14:44

by Henrik Størner

Subject: Re: test12: innd bug came back?

In <[email protected]> Alexander Viro <[email protected]> writes:

>On Sun, 17 Dec 2000, Jorg de Jong wrote:

>> > >On 13 Dec 2000, Henrik Størner wrote:
>> > >
>> > >> Just to add a "me too" on this. I didn't report when I saw it last week

>> I'd like to second that. ME TOO !
>> Since I switched to 2.4.0.test12 I again have the innd bug.
>> ( well at least the same symptoms !)

>Guys, what blocksize are you using?

I am using Reiserfs, and I hear it has some problems with the changes
introduced in pre12. So I will report back once the Reiserfs guys get
this settled.
--
Henrik Storner | "Crackers thrive on code secrecy. Cockroaches breed
<[email protected]> | in the dark. It's time to let the sunlight in."
|
| Eric S. Raymond, re. the Frontpage backdoor

2000-12-18 11:15:43

by Chris Mason

Subject: Re: test12: innd bug came back?



On 17 Dec 2000, Henrik Størner wrote:

> In <[email protected]> Alexander Viro <[email protected]> writes:
>
> >On Sun, 17 Dec 2000, Jorg de Jong wrote:
>
> >> > >On 13 Dec 2000, Henrik Størner wrote:
> >> > >
> >> > >> Just to add a "me too" on this. I didn't report when I saw it last week
>
> >> I'd like to second that. ME TOO !
> >> Since I switched to 2.4.0.test12 I again have the innd bug.
> >> ( well at least the same symptoms !)
>
> >Guys, what blocksize are you using?
>
> I am using Reiserfs, and I hear it has some problems with the changes
> introduced in pre12. So I will report back once the Reiserfs guys get
> this settled.

Ok, the reiserfs patches for test12 are on ftp.reiserfs.org/pub/2.4/beta,
please let me know if they work for you.

I just reran the test case on test12, with tails on and off and got the
correct results. There might be some interaction with the new O_SYNC code
I'm missing that is causing innd problems though (reiserfs still isn't
using the new sync stuff, working on it).

-chris


2000-12-18 11:56:18

by Jorg de Jong

Subject: Re: test12: innd bug came back?

Alexander Viro wrote:
>
> On Sun, 17 Dec 2000, Jorg de Jong wrote:
>
> > > >On 13 Dec 2000, Henrik Størner wrote:
> > > >
> > > >> Just to add a "me too" on this. I didn't report when I saw it last week,
> >
> > I'd like to second that. ME TOO !
> > Since I switched to 2.4.0.test12 I again have the innd bug.
> > ( well at least the same symptoms !)
>
> I.e. old contents resurfacing in active?

I tried your test program and got correct results: a file with bytes 11-16385 being zero.

I will try to give a description of my problems:
after a reboot inn is 're-using' existing messages to store new
messages. It seems that after a renumber command the active file
is corrected again. I have not checked to see if the active file
was corrupted before.
I am using a plain stock kernel, no other patches whatsoever,
but am using LVM.

The blocksize the ext2 filesystem is using is 1024.

--
Jorg de Jong
Work : mailto:[email protected]
Play : mailto:[email protected]
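
For anyone else answering the blocksize question: the value can be read
from a mounted filesystem without digging out the mkfs options. This is a
minimal sketch using POSIX statvfs(); the function name fs_block_size()
is made up here. It returns whatever the filesystem reports in f_bsize,
which for ext2 is expected to match the block size chosen at mke2fs time,
but that is an assumption worth cross-checking against the superblock.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/* Hypothetical helper: block size reported by the filesystem that holds
 * path, or 0 if statvfs() fails. */
static unsigned long fs_block_size(const char *path)
{
	struct statvfs sv;

	assert(path != NULL);
	if (statvfs(path, &sv) != 0)
		return 0;
	return sv.f_bsize;
}
```

Calling fs_block_size() on the news spool directory would give the number
to report; a result of 0 means the path could not be queried.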