On Mon, 3 Dec 2007, Thomas Osterried wrote:
> On the machine which has troubles, the bug occurred within about 10 days.
> During these days, the amount of dirty pages increased, up to 400MB.
> I have tested kernels 2.6.19, 2.6.20, 2.6.22.1 and 2.6.22.10 (with our config),
> and even linux-2.6.20 from ubuntu-server. They have all shown that behaviour.
<CUT>
> 10 days ago, I installed kernel 2.6.18.5 on this machine (with backported
> 3ware controller code). I'm quite sure that this kernel will now fix our
> severe stability problems on this production machine (currently:
> Dirty: 472 kB, nr_dirty 118).
> If so, it's the "latest" kernel I found usable, after half a year of pain.
Strange; my tests show that both 2.6.18(.8) and 2.6.19(.7) are OK and that the
first bad kernel is 2.6.20.
BTW: Could someone please look at this problem? I feel a little ignored, and
in my situation this is a critical regression.
Best regards,
Krzysztof Olędzki
On Thu, 2007-12-13 at 16:17 +0100, Krzysztof Oledzki wrote:
>
> BTW: Could someone please look at this problem? I feel a little ignored and
> in my situation this is a critical regression.
I was hoping to get around to it today, but I guess tomorrow will have
to do :-/
So, it's ext3, dirty some pages, sync, and dirty doesn't fall to 0,
right?
Does it happen with other filesystems as well?
What are your ext3 mount options?
On Thu, 13 Dec 2007, Peter Zijlstra wrote:
>
> On Thu, 2007-12-13 at 16:17 +0100, Krzysztof Oledzki wrote:
>>
>
>> BTW: Could someone please look at this problem? I feel a little ignored and
>> in my situation this is a critical regression.
>
> I was hoping to get around to it today, but I guess tomorrow will have
> to do :-/
Thanks.
> So, it's ext3, dirty some pages, sync, and dirty doesn't fall to 0,
> right?
It not only doesn't fall, it continuously grows.
> Does it happen with other filesystems as well?
Don't know. I generally use only ext3, and I'm afraid I'm not able to
switch this system to another filesystem.
> What are your ext3 mount options?
/dev/root / ext3 rw,data=journal 0 0
/dev/VolGrp0/usr /usr ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/var /var ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/squid_spool /var/cache/squid/cd0 ext3 rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/squid_spool2 /var/cache/squid/cd1 ext3 rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/news_spool /var/spool/news ext3 rw,nosuid,nodev,noatime,data=ordered 0 0
Best regards,
Krzysztof Olędzki
On Thu, 13 Dec 2007, Krzysztof Oledzki wrote:
<CUT>
BTW: this regression also exists in 2.6.24-rc5. I'll try to find when it
was introduced, but that is hard to do on a highly critical production
system, especially since it takes ~2h after a reboot to be sure.
However, 2h is quite a good time; on other systems I have to wait ~2 months
to get 20MB of leaked memory:
# uptime
13:29:34 up 58 days, 13:04, 9 users, load average: 0.38, 0.27, 0.31
# sync;sync;sleep 1;sync;grep Dirt /proc/meminfo
Dirty: 23820 kB
Best regards,
Krzysztof Olędzki
http://bugzilla.kernel.org/show_bug.cgi?id=9182
On Sat, 15 Dec 2007, Krzysztof Oledzki wrote:
<CUT>
More news. I hope this time my problem gets more attention from developers,
since I now have much more information.
So far I found that:
- 2.6.20-rc4 - bad: http://bugzilla.kernel.org/attachment.cgi?id=14057
- 2.6.20-rc2 - bad: http://bugzilla.kernel.org/attachment.cgi?id=14058
- 2.6.20-rc1 - OK (probably; I need to wait a little longer to be 100% sure).
2.6.20-rc1 with 33m uptime:
~$ grep Dirt /proc/meminfo ;sync ; sleep 1 ; sync ; grep Dirt /proc/meminfo
Dirty: 10504 kB
Dirty: 0 kB
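The check used above (read `Dirty:` from /proc/meminfo, sync, read again) can be scripted so that every test kernel is measured the same way. This is a minimal Python sketch, assuming Python 3.3+ on Linux (for `os.sync()`); the function names are hypothetical and not part of any existing tool:

```python
import os
import re
import time
from typing import Tuple

# Matches a line such as "Dirty:           10504 kB" in /proc/meminfo.
_DIRTY_RE = re.compile(r"^Dirty:\s+(\d+)\s+kB$", re.MULTILINE)


def dirty_kb(meminfo_text: str) -> int:
    """Return the Dirty: value, in kB, from /proc/meminfo-style text."""
    match = _DIRTY_RE.search(meminfo_text)
    if match is None:
        raise ValueError("no 'Dirty:' line found")
    return int(match.group(1))


def dirty_before_and_after_sync(path: str = "/proc/meminfo",
                                delay: float = 1.0) -> Tuple[int, int]:
    """Mirror 'grep Dirt /proc/meminfo; sync; sleep 1; sync; grep Dirt ...'."""
    with open(path) as f:
        before = dirty_kb(f.read())
    os.sync()            # calls sync(2), like the shell 'sync' command
    time.sleep(delay)
    os.sync()
    with open(path) as f:
        after = dirty_kb(f.read())
    return before, after
```

On a kernel without the leak the second value should drop to 0 kB, as it did on 2.6.20-rc1 above; on the affected kernels in this report it stays non-zero even when the machine is idle.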
2.6.20-rc2 was released Dec 23/24 2006 (BAD)
2.6.20-rc1 was released Dec 13/14 2006 (GOOD?)
It seems that this bug was introduced exactly one year ago. Surprisingly,
dirty memory in 2.6.20-rc2/2.6.20-rc4 leaks _much_ faster than in
2.6.20-final and later kernels: it took only about 6h to reach 172MB.
So this bug might have been partially cured afterwards, but only a little.
There are three commits that may be somehow related:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=3e67c0987d7567ad666641164a153dca9a43b11d
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=5f2a105d5e33a038a717995d2738434f9c25aed2
I'm going to check the 2.6.20-rc1-git... releases, but it would be *very* nice
if someone could finally give me a hand and offer some hints for
debugging this problem.
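Because each test kernel needs hours of uptime before a verdict, narrowing the window between a known-good kernel (2.6.20-rc1) and a known-bad one (2.6.20-rc2) is essentially a binary search; `git bisect` automates the same idea over individual commits. This is a minimal Python sketch, assuming the regression is monotone (every build after the first bad one is also bad); `first_bad` and the `is_bad` predicate are hypothetical, with `is_bad` standing in for "boot it, wait, and check whether Dirty: falls to 0 after sync":

```python
from typing import Callable, Sequence


def first_bad(candidates: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad build in an ordered list.

    Assumes candidates[0] is known good, candidates[-1] is known bad, and
    that once a build is bad, every later build is bad too.
    """
    lo, hi = 0, len(candidates) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the first bad build is at mid or earlier
        else:
            lo = mid  # the first bad build is after mid
    return candidates[hi]
```

With, say, seven hypothetical snapshots between 2.6.20-rc1 and 2.6.20-rc2, this needs at most three test boots instead of up to seven. A wrong verdict (for example a "probably OK" that later turns bad) sends the search the wrong way, so borderline results are worth re-checking.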
Please note that none of my systems with kernels >= 2.6.20-rc1 is able to
reach 0 kB of dirty memory, even after many syncs, even when idle.
Best regards,
Krzysztof Olędzki