2003-09-14 16:30:37

by Arjan Filius

Subject: Another ReiserFS (rpm database) issue (2.6.0-test5)

Hi,

I finally "solved" my "rpm --rebuilddb" problems.

My rpm database was on a ReiserFS (SCSI) partition, and for some while now
I've had problems. Mainly, at some point I was unable to install (rpm)
packages, and "rpm --rebuilddb" failed or just looped forever.

In near-panic mode I tried to rescue my system: I installed a similar
system (SuSE 8.2), copied the rpm database files to that system, and to my
surprise "rpm --rebuilddb" went smoothly without any error or problem.

Then I copied the "fixed" rpm database files back to the original place,
and I was able to install packages again; however, "rpm --rebuilddb"
still looped/hung forever.

In a more relaxed panic mode I searched for differences and, motivated by
the LargeFile ReiserFS problems, decided to try "rpm --rebuilddb" on a
fresh ext2 partition.
And with success!

I had no success with the original rpmdb dir on ext2 and the rpm rebuild
dir on reiserfs (another partition): same problem, it loops/hangs forever.

I think this problem may have started somewhere in 2.5, but I can't easily
test this.

Any ideas? (Other than banning reiserfs altogether.)

At this point I'm still able to reproduce the problem by running/debugging
(on a random reiserfs partition):
strace -f rpm --rebuilddb --dbpath /images/rpmtest/rpm/
<snip>
lseek(9, 37879808, SEEK_SET) = 37879808
write(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 34275328, SEEK_SET) = 34275328
write(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
lseek(9, 36110336, SEEK_SET) = 36110336
read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 7995392, SEEK_SET) = 7995392
read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 37879808, SEEK_SET) = 37879808
read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 34275328, SEEK_SET) = 34275328
read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
<and here it "hangs" forever>
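
All four offsets in the trace are exact multiples of the 65536-byte transfer size (blocks 578, 523, 551 and 122), so the access pattern is whole 64 KiB blocks at scattered positions in a single file. Below is a minimal sketch of a read-back check for that pattern. The function name check_blocks and the scratch file names are made up for illustration. It only verifies that data written at those offsets reads back unchanged; it is not a reproduction of the rpm hang.

```shell
#!/bin/sh
# Hypothetical helper, not part of rpm or reiserfsprogs.
# Writes one random 64 KiB block at each offset seen in the strace above,
# then reads every block back and compares it with what was written.
check_blocks() {
    dir=$1
    file="$dir/blocktest.dat"
    blk="$dir/blocktest.blk"
    bs=65536

    head -c "$bs" /dev/urandom > "$blk" || return 1

    for off in 37879808 34275328 36110336 7995392; do
        # The offsets are multiples of 65536, so seek in whole blocks.
        dd if="$blk" of="$file" bs="$bs" seek=$((off / bs)) count=1 \
            conv=notrunc 2>/dev/null || return 1
    done

    for off in 37879808 34275328 36110336 7995392; do
        dd if="$file" bs="$bs" skip=$((off / bs)) count=1 2>/dev/null \
            | cmp -s - "$blk" || { echo "mismatch at offset $off"; return 1; }
    done

    rm -f "$file" "$blk"
    echo "ok"
}
```

It could be called as check_blocks /images/rpmtest, once with and once without the mount option under suspicion.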

sizes of my rpmdb files:
rpmtest # ll rpm
total 142233
drwxr-xr-x 2 root root 320 Sep 14 18:16 .
drwxr-xr-x 5 root root 152 Sep 14 18:20 ..
-rw-r--r-- 1 root root 16384 Sep 14 18:16 conflictsindex.rpm
-rw-r--r-- 1 root root 83431424 Sep 14 18:16 fileindex.rpm
-rw-r--r-- 1 root root 57344 Sep 14 18:16 groupindex.rpm
-rw-r--r-- 1 root root 94208 Sep 14 18:16 nameindex.rpm
-rw-r--r-- 1 root root 54840904 Sep 14 18:16 packages.rpm
-rw-r--r-- 1 root root 331776 Sep 14 18:16 providesindex.rpm
-rw-r--r-- 1 root root 42246144 Sep 14 18:16 requiredby.rpm
-rw-r--r-- 1 root root 16384 Sep 14 18:16 triggerindex.rpm

Using suse 8.2/kernel 2.6.0-test5/rpm-3.0.6-478

Please CC me when replying.

Greetings,
--
Arjan Filius


2003-09-14 23:11:29

by Hans Reiser

Subject: Re: Another ReiserFS (rpm database) issue (2.6.0-test5)

It is interesting that we didn't get reports of corruption until
2.6.0-test* came out; there must be immensely more users.

Apologies for that bug. I need to review what was used for testing the
large writes patch; it must have been a test that does not write more
than 4 GB.....:-/

--
Hans


2003-09-15 08:40:34

by Oleg Drokin

Subject: Re: Another ReiserFS (rpm database) issue (2.6.0-test5)

Hello!

On Sun, Sep 14, 2003 at 06:30:33PM +0200, Arjan Filius wrote:
> lseek(9, 36110336, SEEK_SET) = 36110336
> read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
> lseek(9, 7995392, SEEK_SET) = 7995392
> read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
> lseek(9, 37879808, SEEK_SET) = 37879808
> read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
> lseek(9, 34275328, SEEK_SET) = 34275328
> read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
> <and here it "hangs" forever>

You mean, strace does not log more syscalls?

What if you mount your reiserfs partition with "-o nolargeio=1" mount option?
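
For reference, a sketch of what such a mount could look like. The device /dev/sdb1 is a placeholder (the thread never names it), the mount point /images is only guessed from the --dbpath used earlier, and the helper nolargeio_mount_cmd is invented here. It prints the command rather than running it, since the real thing needs root and an unmounted filesystem.

```shell
#!/bin/sh
# Hypothetical helper: prints the mount command instead of executing it.
# $1 = block device (placeholder), $2 = mount point (assumed).
nolargeio_mount_cmd() {
    printf 'mount -t reiserfs -o nolargeio=1 %s %s\n' "$1" "$2"
}

# Example with placeholder names; review the output, then run it as root
# after unmounting the filesystem.
nolargeio_mount_cmd /dev/sdb1 /images
```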

> -rw-r--r-- 1 root root 16384 Sep 14 18:16 conflictsindex.rpm
> -rw-r--r-- 1 root root 83431424 Sep 14 18:16 fileindex.rpm
> -rw-r--r-- 1 root root 57344 Sep 14 18:16 groupindex.rpm
> -rw-r--r-- 1 root root 94208 Sep 14 18:16 nameindex.rpm
> -rw-r--r-- 1 root root 54840904 Sep 14 18:16 packages.rpm
> -rw-r--r-- 1 root root 331776 Sep 14 18:16 providesindex.rpm
> -rw-r--r-- 1 root root 42246144 Sep 14 18:16 requiredby.rpm
> -rw-r--r-- 1 root root 16384 Sep 14 18:16 triggerindex.rpm

None of that fits into the "bigger than 4G" category.
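
The largest file in the listing, fileindex.rpm at 83431424 bytes, is just under 80 MiB, far below the 4 GiB (4294967296-byte) mark. A quick way to confirm this for a whole directory is sketched below. The name files_over_4g is hypothetical, and the "k" size suffix is assumed to be supported by the local find (it is in GNU find).

```shell
#!/bin/sh
# Hypothetical helper: lists regular files larger than 4 GiB under a directory.
# Assumes a find that accepts the "k" (KiB) size suffix, e.g. GNU find.
# 4 GiB = 4194304 KiB; "+4194304k" matches only sizes above that.
files_over_4g() {
    find "$1" -type f -size +4194304k
}
```

For the sizes listed above, files_over_4g /images/rpmtest/rpm should print nothing.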

Bye,
Oleg

2003-09-15 16:34:07

by Arjan Filius

Subject: Re: Another ReiserFS (rpm database) issue (2.6.0-test5)

Hello,

On Mon, 15 Sep 2003, Oleg Drokin wrote:

> Hello!
>
> On Sun, Sep 14, 2003 at 06:30:33PM +0200, Arjan Filius wrote:
> > lseek(9, 36110336, SEEK_SET) = 36110336
> > read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
> > lseek(9, 7995392, SEEK_SET) = 7995392
> > read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
> > lseek(9, 37879808, SEEK_SET) = 37879808
> > read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
> > lseek(9, 34275328, SEEK_SET) = 34275328
> > read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
> > <and here it "hangs" forever>
>
> You mean, strace does not log more syscalls?
That is correct, but it still keeps consuming a lot of CPU time.
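
A process that burns CPU while strace shows no further syscalls is usually spinning in user space rather than blocked in the kernel. One way to check this on Linux is to sample the process state and CPU counters from /proc. The helper name cpu_ticks below is made up, and the field positions follow the proc(5) layout of /proc/PID/stat.

```shell
#!/bin/sh
# Hypothetical helper: prints "state utime stime" for a PID, taken from
# /proc/PID/stat (fields 3, 14 and 15; times are in clock ticks).
cpu_ticks() {
    # Drop "pid (comm) " first, because comm may itself contain spaces;
    # after that, state is $1, utime is $12 and stime is $13.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $1, $12, $13 }'
}
```

Calling cpu_ticks on the rpm PID twice, a few seconds apart, gives a rough picture: state R with a growing utime suggests a user-space loop, while a growing stime points at work inside the kernel instead.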

>
> What if you mount your reiserfs partition with "-o nolargeio=1" mount option?

Hey! This seems to "fix" it!
With this option even my original "problem rpm database" is rebuilt in a
few minutes, without consuming that much memory and without any
errors!

Without "nolargeio=1" I had to add a lot of swap (on my 1.5 GB RAM
system), or else rpm just got terminated. And even with a lot of swap I
still got some fatal rpm errors.

So it seems the "nolargeio=1" solves all my problems.

Thanks!


>
> > -rw-r--r-- 1 root root 16384 Sep 14 18:16 conflictsindex.rpm
> > -rw-r--r-- 1 root root 83431424 Sep 14 18:16 fileindex.rpm
> > -rw-r--r-- 1 root root 57344 Sep 14 18:16 groupindex.rpm
> > -rw-r--r-- 1 root root 94208 Sep 14 18:16 nameindex.rpm
> > -rw-r--r-- 1 root root 54840904 Sep 14 18:16 packages.rpm
> > -rw-r--r-- 1 root root 331776 Sep 14 18:16 providesindex.rpm
> > -rw-r--r-- 1 root root 42246144 Sep 14 18:16 requiredby.rpm
> > -rw-r--r-- 1 root root 16384 Sep 14 18:16 triggerindex.rpm
>
> None of that fits into the "bigger than 4G" category.

Just to be sure, I also tried the largefile patch posted recently on this
list, but without success.

>
> Bye,
> Oleg
>
>

--
Arjan Filius

2003-09-16 08:50:04

by Oleg Drokin

Subject: Re: Another ReiserFS (rpm database) issue (2.6.0-test5)

Hello!

On Mon, Sep 15, 2003 at 06:34:00PM +0200, Arjan Filius wrote:

> > What if you mount your reiserfs partition with "-o nolargeio=1" mount option?
> Hey! This seems to "fix" it!
> With this option even my original "problem rpm database" is rebuilt in a
> few minutes, without consuming that much memory and without any
> errors!

That means you have an error in your rpm binary. You probably want to contact SuSE to get an updated version.

> Without "nolargeio=1" I had to add a lot of swap (on my 1.5 GB RAM
> system), or else rpm just got terminated. And even with a lot of swap I
> still got some fatal rpm errors.
> So it seems the "nolargeio=1" solves all my problems.

No, you just masked the problem, but the bug in your rpm binary is still present.

Bye,
Oleg