Hi,
Finally I "solved" my "rpm --rebuilddb" problems.
My rpm database was on a reiserfs (SCSI) partition, and for some while now I've had
problems. Mainly, at some point I was unable to install (rpm) packages, and
"rpm --rebuilddb" failed or just looped forever.
In almost panic mode I tried to rescue my system: I installed a similar
system (SuSE 8.2), copied the rpm database files to that system, and to my
surprise a "rpm --rebuilddb" went smoothly without any error or problem.
Then I copied the "fixed" rpm database files back to the original place, and I
was able to install packages again; however, a "rpm --rebuilddb" there still
looped/hung forever.
In a more relaxed panic mode I searched for differences, and motivated by
the LargeFile ReiserFS problems, I decided to try a "rpm --rebuilddb" on a
fresh ext2 partition.
And with success!
I had no success with the original rpmdb dir on ext2 and the rpmrebuild-dir
on reiserfs (another partition): same problem, it loops/hangs forever.
I think this problem may have started somewhere in 2.5, but I can't easily test
this.
Any ideas? (Except for banning reiserfs altogether.)
At this point I'm still able to reproduce the problem by running the
following under strace (on an arbitrary reiserfs partition):
strace -f rpm --rebuilddb --dbpath /images/rpmtest/rpm/
<snip>
lseek(9, 37879808, SEEK_SET) = 37879808
write(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 34275328, SEEK_SET) = 34275328
write(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
lseek(9, 36110336, SEEK_SET) = 36110336
read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 7995392, SEEK_SET) = 7995392
read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 37879808, SEEK_SET) = 37879808
read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 34275328, SEEK_SET) = 34275328
read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
<and here it "hangs" forever>
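To see which file offsets the excerpt above touches, the lseek/read/write
lines can be tallied with a short script. This is only a sketch in Python,
not something used in the original report; the names SAMPLE_TRACE,
offsets_by_operation and offsets_written_and_read are made up for
illustration, and the script assumes each read/write follows an lseek on the
same descriptor, as in the excerpt.

```python
import re
from collections import Counter

# The lseek/read/write lines from the strace excerpt, copied verbatim.
SAMPLE_TRACE = r'''
lseek(9, 37879808, SEEK_SET) = 37879808
write(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 34275328, SEEK_SET) = 34275328
write(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
lseek(9, 36110336, SEEK_SET) = 36110336
read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 7995392, SEEK_SET) = 7995392
read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 37879808, SEEK_SET) = 37879808
read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
lseek(9, 34275328, SEEK_SET) = 34275328
read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
'''

# lseek(fd, offset, SEEK_SET) = resulting_offset
LSEEK_RE = re.compile(r"lseek\((\d+), (\d+), SEEK_SET\) = (\d+)")
# Start of a read or write call on a descriptor
IO_RE = re.compile(r"\b(read|write)\((\d+), ")


def offsets_by_operation(lines):
    """Count (operation, offset) pairs, pairing each read/write with the
    offset set by the preceding lseek on the same file descriptor."""
    counts = Counter()
    position = {}  # fd -> offset returned by the most recent lseek
    for line in lines:
        seek = LSEEK_RE.search(line)
        if seek:
            position[int(seek.group(1))] = int(seek.group(3))
            continue
        io = IO_RE.search(line)
        if io and int(io.group(2)) in position:
            offset = position.pop(int(io.group(2)))
            counts[(io.group(1), offset)] += 1
    return counts


def offsets_written_and_read(counts):
    """Return the sorted offsets that appear in both a write and a read."""
    written = {off for (op, off) in counts if op == "write"}
    read = {off for (op, off) in counts if op == "read"}
    return sorted(written & read)


if __name__ == "__main__":
    counts = offsets_by_operation(SAMPLE_TRACE.splitlines())
    for (op, off), n in sorted(counts.items(), key=lambda item: item[0][1]):
        print(f"{op:5s} offset={off:>10d} count={n}")
    print("written and read back:", offsets_written_and_read(counts))
```

On the excerpt this shows that the last two reads go back to the two
offsets written at the start, and that every offset is in the tens of
megabytes. It says nothing about what the process does after the final
read, where strace stops logging.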
sizes of my rpmdb files:
rpmtest # ll rpm
total 142233
drwxr-xr-x 2 root root 320 Sep 14 18:16 .
drwxr-xr-x 5 root root 152 Sep 14 18:20 ..
-rw-r--r-- 1 root root 16384 Sep 14 18:16 conflictsindex.rpm
-rw-r--r-- 1 root root 83431424 Sep 14 18:16 fileindex.rpm
-rw-r--r-- 1 root root 57344 Sep 14 18:16 groupindex.rpm
-rw-r--r-- 1 root root 94208 Sep 14 18:16 nameindex.rpm
-rw-r--r-- 1 root root 54840904 Sep 14 18:16 packages.rpm
-rw-r--r-- 1 root root 331776 Sep 14 18:16 providesindex.rpm
-rw-r--r-- 1 root root 42246144 Sep 14 18:16 requiredby.rpm
-rw-r--r-- 1 root root 16384 Sep 14 18:16 triggerindex.rpm
Using suse 8.2/kernel 2.6.0-test5/rpm-3.0.6-478
Please CC me when replying.
Greetings,
--
Arjan Filius
mailto:[email protected]
It is interesting that we didn't get reports of corruption until
2.6.0-test* came out; there must be immensely more users.
Apologies for that bug. I need to review what was used for testing the
large writes patch; it must have been a test that does not write more
than 4 GB... :-/
--
Hans
Hello!
On Sun, Sep 14, 2003 at 06:30:33PM +0200, Arjan Filius wrote:
> lseek(9, 36110336, SEEK_SET) = 36110336
> read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
> lseek(9, 7995392, SEEK_SET) = 7995392
> read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
> lseek(9, 37879808, SEEK_SET) = 37879808
> read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
> lseek(9, 34275328, SEEK_SET) = 34275328
> read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
> <and here it "hangs" forever>
You mean, strace does not log more syscalls?
What if you mount your reiserfs partition with "-o nolargeio=1" mount option?
> -rw-r--r-- 1 root root 16384 Sep 14 18:16 conflictsindex.rpm
> -rw-r--r-- 1 root root 83431424 Sep 14 18:16 fileindex.rpm
> -rw-r--r-- 1 root root 57344 Sep 14 18:16 groupindex.rpm
> -rw-r--r-- 1 root root 94208 Sep 14 18:16 nameindex.rpm
> -rw-r--r-- 1 root root 54840904 Sep 14 18:16 packages.rpm
> -rw-r--r-- 1 root root 331776 Sep 14 18:16 providesindex.rpm
> -rw-r--r-- 1 root root 42246144 Sep 14 18:16 requiredby.rpm
> -rw-r--r-- 1 root root 16384 Sep 14 18:16 triggerindex.rpm
None of that fits into the "bigger than 4G" category.
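The comparison can be spelled out with the sizes from the listing. This is
just an illustrative sketch in Python; it assumes "4G" means 2**32 bytes,
and the names FOUR_GIB, SIZES and files_over_limit are made up here.

```python
# Assumption: "4G" is taken to mean 2**32 bytes (4 GiB).
FOUR_GIB = 2**32

# File sizes in bytes, copied from the directory listing quoted above.
SIZES = {
    "conflictsindex.rpm": 16384,
    "fileindex.rpm": 83431424,
    "groupindex.rpm": 57344,
    "nameindex.rpm": 94208,
    "packages.rpm": 54840904,
    "providesindex.rpm": 331776,
    "requiredby.rpm": 42246144,
    "triggerindex.rpm": 16384,
}


def files_over_limit(sizes, limit=FOUR_GIB):
    """Return the names of files whose size exceeds the limit, sorted."""
    return sorted(name for name, size in sizes.items() if size > limit)


if __name__ == "__main__":
    largest = max(SIZES, key=SIZES.get)
    print("largest file:", largest, SIZES[largest], "bytes")
    print("files over 4 GiB:", files_over_limit(SIZES))
    print("all files together:", sum(SIZES.values()), "bytes")
```

Even the largest file, fileindex.rpm, is around 80 MB, so neither a single
file nor all of them together comes anywhere near the 4 GB mark.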
Bye,
Oleg
Hello,
On Mon, 15 Sep 2003, Oleg Drokin wrote:
> Hello!
>
> On Sun, Sep 14, 2003 at 06:30:33PM +0200, Arjan Filius wrote:
> > lseek(9, 36110336, SEEK_SET) = 36110336
> > read(9, "\4\0\354\377\3\0\n0\344\377\326\377\344\377\0\0\0\0\0\0"..., 65536) = 65536
> > lseek(9, 7995392, SEEK_SET) = 7995392
> > read(9, "\2\0t@\0\0\366\377\0\0\341\377\357\377\0\0\0\0\0\0\0\0"..., 65536) = 65536
> > lseek(9, 37879808, SEEK_SET) = 37879808
> > read(9, "\4\0\352\377\3\0=@\342\377\324\377\342\377\0\0\0\0\0\0"..., 65536) = 65536
> > lseek(9, 34275328, SEEK_SET) = 34275328
> > read(9, "\0\0\372\377\0\0\366\377\0\0\337\377\355\377\0\0\0\0\0"..., 65536) = 65536
> > <and here it "hangs" forever>
>
> You mean, strace does not log more syscalls?
That is correct, but it still keeps consuming a lot of CPU time.
>
> What if you mount your reiserfs partition with "-o nolargeio=1" mount option?
Hey! This seems to "fix" it!
With this option even my original "problem rpm database" is rebuilt in a
few minutes, without consuming that much memory, and without any
errors!
Without "nolargeio=1" I had to add a lot of swap (on my 1.5 GB RAM
system), else rpm just got terminated. And even with a lot of swap I still got
some fatal rpm errors.
So it seems "nolargeio=1" solves all my problems.
Thanks!
>
> > -rw-r--r-- 1 root root 16384 Sep 14 18:16 conflictsindex.rpm
> > -rw-r--r-- 1 root root 83431424 Sep 14 18:16 fileindex.rpm
> > -rw-r--r-- 1 root root 57344 Sep 14 18:16 groupindex.rpm
> > -rw-r--r-- 1 root root 94208 Sep 14 18:16 nameindex.rpm
> > -rw-r--r-- 1 root root 54840904 Sep 14 18:16 packages.rpm
> > -rw-r--r-- 1 root root 331776 Sep 14 18:16 providesindex.rpm
> > -rw-r--r-- 1 root root 42246144 Sep 14 18:16 requiredby.rpm
> > -rw-r--r-- 1 root root 16384 Sep 14 18:16 triggerindex.rpm
>
> None of that fits into the "bigger than 4G" category.
Just to be sure, I also tried the largefile patch posted recently on this
list, but without success.
>
> Bye,
> Oleg
>
>
--
Arjan Filius
mailto:[email protected]
Hello!
On Mon, Sep 15, 2003 at 06:34:00PM +0200, Arjan Filius wrote:
> > What if you mount your reiserfs partition with "-o nolargeio=1" mount option?
> Hey! this seems to "fix" it!
> With this option even my original "problem rpm database" is rebuilt in a
> few minutes, and without consuming that much memory, and without any
> errors!
That means you have an error in your rpm binary. You probably want to contact SuSE to get an updated version.
> Without "nolargeio=1" I had to add a lot of swap (on my 1.5 GB RAM
> system), else rpm just got terminated. And even with a lot of swap I still got
> some fatal rpm errors.
> So it seems "nolargeio=1" solves all my problems.
No, you just masked the problem, but the bug in your rpm binary is still present.
Bye,
Oleg