2002-03-04 05:16:52

by Jon Masters

Subject: Loopback (2.4.18)

Hi,

I'm trying to use "multiCD" to back up around 3 lots of 25GB of data into
handy CD sized images. The software is a perl script which uses a loopback
mounted file on which a standard ext2 filesystem with data is written,
etc. I'm sure everyone gets the idea. These then get scp'd/burned.
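In case the idea is not obvious, the per-image step looks roughly like the sketch below. This is not multiCD's actual code: the function names, the 650MB default and the paths are assumptions for illustration only, and the second function needs root.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from multiCD.

# Create a zero-filled file of SIZE_MB megabytes to act as the CD image.
make_blank_image() {
    image=$1
    size_mb=${2:-650}    # assumed CD capacity in MB
    dd if=/dev/zero of="$image" bs=1048576 count="$size_mb" 2>/dev/null
}

# Put an ext2 filesystem on the image file and mount it via the loop driver.
# Needs root; whatever is copied under MOUNTPOINT ends up inside the image.
format_and_mount_image() {
    image=$1
    mountpoint=$2
    mke2fs -q -F "$image" &&
        mount -t ext2 -o loop "$image" "$mountpoint"
}
```

Something like `make_blank_image /tmp/cd01.img` followed by `format_and_mount_image /tmp/cd01.img /mnt/cd01`, then a copy and a `umount /mnt/cd01`, gives one image ready to scp or burn.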

Anyway, after about the 10th image the box starts having issues, which
ultimately result in a hard reset and resync of a bunch of software RAID5
arrays residing on it. The symptoms are as if it is not possible to fork a
new process, though it is responsive to ICMP, etc. I was advised
previously to try upgrading to 2.4.18 in response to an earlier problem
(presumed to be a VM issue), so I did that just now with no luck. Generally
the box runs fine on a daily basis, handles very high load and passes
memtest86.

Before anything else, can someone just let me know what the current status
of loopback in 2.4 is? I know it was very broken originally and was then
okish just prior to 2.4.17, and I am purely guessing that it's not OK once
more. Perhaps I should keep different kernels for different tasks
(yes that's meant to be a joke - it's probably only amusing to me...) :-)

Kernel 2.4.18 (stock Linus kernel from kernel.org), AMD 900MHz Duron CPU,
1.5GB SDRAM (highmem enabled), various RAID5 /dev/md0-/dev/md3, reiserfs,
NFS/Samba/Automounter/NIS/etc. etc. Just a fileserver really.

I reckon I'm going to end up having to copy the data to another machine
running 2.2 and run the backup from there - but first, time for sleep...

Thanks for any replies,

Jon.


2002-03-04 05:40:35

by Andrew Morton

Subject: Re: Loopback (2.4.18)

Jon Masters wrote:
>
> Hi,
>
> I'm trying to use "multiCD" to backup around 3 lots of 25GB of data in to
> handy CD sized images. The software is a perl script which uses a loopback
> mounted file on which a standard ext2 filesystem with data is written,
> etc. I'm sure everyone gets the idea. These then get scp'd/burned.
>

The loop driver does really naughty things which defeat the kernel's
management of dirty data. It's quite easy to livelock machines with
it, especially if you increase the dirty buffer thresholds.

Alas, it's tricky. I have three patches, none of which fix it.
Fourth time lucky, maybe.

I expect the problem will go away if you drop the dirty buffer
thresholds:

echo 10 0 0 0 500 3000 25 0 0 > /proc/sys/vm/bdflush
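To make the change easy to undo, the current settings can be saved first. The following is only a sketch: the function names and the save location are made up for illustration, and the field meanings noted in the comments are my reading of the 2.4 layout, so treat them as an assumption and check Documentation/sysctl/vm.txt for your kernel.

```shell
#!/bin/sh
# Hypothetical helpers. BDFLUSH can be pointed at an ordinary file for a
# dry run; the save location is an arbitrary choice.
BDFLUSH=${BDFLUSH:-/proc/sys/vm/bdflush}
BDFLUSH_SAVED=${BDFLUSH_SAVED:-/tmp/bdflush.saved}

# Save the current settings, then write the lowered thresholds.
# Assumption: field 1 is the dirty percentage that wakes bdflush and
# field 7 is the percentage at which writers are made to flush synchronously.
lower_bdflush() {
    cat "$BDFLUSH" > "$BDFLUSH_SAVED" &&
        echo "10 0 0 0 500 3000 25 0 0" > "$BDFLUSH"
}

# Write the previously saved settings back.
restore_bdflush() {
    cat "$BDFLUSH_SAVED" > "$BDFLUSH"
}
```

Run `lower_bdflush` as root before starting the backup and `restore_bdflush` once it has finished.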

Could you please try that? Also, if/when it locks up again,
the SYSRQ-P information will be interesting. Use the key
sequence several times, record the EIP values, look them up
after reboot. Probably, they point at shrink_cache().
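Looking the values up means finding the last symbol in System.map whose address is at or below the recorded EIP. A minimal sketch of that lookup follows; the function name and the default map path are assumptions, and it expects the map to match the kernel that was running.

```shell
#!/bin/sh
# Hypothetical helper: print the symbol that contains a given EIP.
#   $1 = EIP value as recorded from SysRq-P (hex, with or without 0x)
#   $2 = System.map of the running kernel (default path is an assumption)
lookup_eip() {
    eip=$(printf '%s' "$1" | tr 'A-F' 'a-f' | sed 's/^0[xX]//')
    map=${2:-/boot/System.map}
    awk -v eip="$eip" '
        {
            # Force string comparison; addresses are fixed-width lowercase hex.
            addr = $1 ""
            key = eip ""
            while (length(key) < length(addr)) key = "0" key
            # Keep the highest symbol address that does not exceed the EIP.
            if (length(key) == length(addr) && addr <= key && addr >= bestaddr) {
                bestaddr = addr
                best = $3
            }
        }
        END { if (best == "") exit 1; print best }
    ' "$map"
}
```

For example, `lookup_eip c012f0a4 /boot/System.map` prints the name of the function that address falls inside. SysRq has to be enabled beforehand (`echo 1 > /proc/sys/kernel/sysrq`) for the key sequence to do anything.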

Thanks.


2002-03-04 06:11:00

by Jon Masters

Subject: Re: Loopback (2.4.18)

On Sun, 3 Mar 2002, Andrew Morton wrote:

> The loop driver does really naughty things which defeat the kernel's
> management of dirty data. It's quite easy to livelock machines with
> it, especially if you increase the dirty buffer thresholds.

I'd appreciate it if you could go into more detail off-list, as I have
a general interest in this.

> I expect the problem will go away if you drop the dirty buffer
> thresholds:
>
> echo 10 0 0 0 500 3000 25 0 0 > /proc/sys/vm/bdflush
>
> Could you please try that?

Unfortunately not today, and probably not early this week either, as I'm
about to be 150 miles away from the office back in the land of
cs.nott.ac.uk and am planning to avoid late night sysrq fun over a serial
console this week :-)

> Also, if/when it locks up again, the SYSRQ-P information will be
> interesting. Use the key sequence several times, record the EIP values,
> look them up after reboot. Probably, they point at shrink_cache().

Well, at least this kernel should have kernel debugging enabled, so if/when
it happens again or I get a chance to look at it this week I'll look up
where it's happening - for now all I need to know is that loopback still
isn't kosher on 2.4 and I need to avoid it by using a 2.2 box or whatever.

Jon.