2006-01-28 14:14:05

by Denis Vlasenko

Subject: Recursive chmod/chown OOM kills box with 32MB RAM

I have an old PII server box with just 32MB of RAM.

Yesterday I was preparing it for use by other people, not just me,
and went on checking and tightening filesystem permissions with
chmod -R and/or chown -R.

On deep directories, the box started to OOM-kill processes en masse!

I updated the kernel to 2.6.15.1 - it doesn't help.
I stopped some of the more memory-hungry processes before running
chmod -R - that doesn't help either.

Details:

2006-01-28_13:35:57.55341 kern.notice: Linux version 2.6.15.1 (root@firebird) (gcc version 3.4.1) #1 Sat Jan 28 14:01:51 EET 2006
2006-01-28_13:35:57.55380 kern.info: BIOS-provided physical RAM map:
2006-01-28_13:35:57.55391 kern.warn: BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
2006-01-28_13:35:57.55402 kern.warn: BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
2006-01-28_13:35:57.55413 kern.warn: BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
2006-01-28_13:35:57.55423 kern.warn: BIOS-e820: 0000000000100000 - 0000000002000000 (usable)
2006-01-28_13:35:57.55434 kern.warn: BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
2006-01-28_13:35:57.55444 kern.warn: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
2006-01-28_13:35:57.55455 kern.warn: BIOS-e820: 00000000ffe80000 - 0000000100000000 (reserved)
2006-01-28_13:35:57.55466 kern.notice: 32MB LOWMEM available.
2006-01-28_13:35:57.55474 kern.debug: On node 0 totalpages: 8192
2006-01-28_13:35:57.55483 kern.debug: DMA zone: 4096 pages, LIFO batch:0
2006-01-28_13:35:57.55492 kern.debug: DMA32 zone: 0 pages, LIFO batch:0
2006-01-28_13:35:57.55502 kern.debug: Normal zone: 4096 pages, LIFO batch:0
2006-01-28_13:35:57.55511 kern.debug: HighMem zone: 0 pages, LIFO batch:0

Here is the first OOM kill in the last test run:

13:36:00.85 ip_conntrack version 2.4 (256 buckets, 2048 max) - 216 bytes per conntrack
13:36:04.10 NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
13:36:04.10 NFSD: recovery directory /var/lib/nfs/v4recovery doesn't exist
13:36:04.10 NFSD: starting 90-second grace period
13:39:27.71 SysRq : HELP : loglevel0-8 reBoot tErm Full kIll saK showMem Nice powerOff showPc unRaw Sync showTasks Unmount
13:39:28.21 SysRq : HELP : loglevel0-8 reBoot tErm Full kIll saK showMem Nice powerOff showPc unRaw Sync showTasks Unmount
13:40:26.63 SysRq : Changing Loglevel
13:40:26.64 Loglevel set to 9
13:41:07.46 oom-killer: gfp_mask=0x200d2, order=0
13:41:07.47 Mem-info:
13:41:07.47 DMA per-cpu:
13:41:07.47 cpu 0 hot: low 0, high 0, batch 1 used:0
13:41:07.47 cpu 0 cold: low 0, high 0, batch 1 used:0
13:41:07.47 DMA32 per-cpu: empty
13:41:07.47 Normal per-cpu:
13:41:07.48 cpu 0 hot: low 0, high 0, batch 1 used:0
13:41:07.48 cpu 0 cold: low 0, high 0, batch 1 used:0
13:41:07.48 HighMem per-cpu: empty
13:41:07.48 Free pages: 952kB (0kB HighMem)
13:41:07.48 Active:2217 inactive:2002 dirty:0 writeback:4 unstable:0 free:238 slab:1383 mapped:23 pagetables:243
13:41:07.48 DMA free:432kB min:360kB low:448kB high:540kB active:4556kB inactive:3988kB present:16384kB pages_scanned:390 all_unreclaimable? no
13:41:07.48 lowmem_reserve[]: 0 0 16 16
13:41:07.48 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
13:41:07.48 lowmem_reserve[]: 0 0 16 16
13:41:07.48 Normal free:520kB min:360kB low:448kB high:540kB active:4312kB inactive:4020kB present:16384kB pages_scanned:4 all_unreclaimable? no
13:41:07.48 lowmem_reserve[]: 0 0 0 0
13:41:07.48 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
13:41:07.48 lowmem_reserve[]: 0 0 0 0
13:41:07.48 DMA: 2*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 432kB
13:41:07.48 DMA32: empty
13:41:07.48 Normal: 40*4kB 1*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 520kB
13:41:07.48 HighMem: empty
13:41:07.48 Swap cache: add 6654, delete 6608, find 1126/2081, race 0+0
13:41:07.49 Free swap = 92080kB
13:41:07.49 Total swap = 98296kB
13:41:07.49 Free swap: 92080kB
13:41:07.49 8192 pages of RAM
13:41:07.49 0 pages of HIGHMEM
13:41:07.49 1301 reserved pages
13:41:07.49 3949 pages shared
13:41:07.49 46 pages swap cached
13:41:07.49 0 pages dirty
13:41:07.49 4 pages writeback
13:41:07.49 23 pages mapped
13:41:07.49 1383 pages slab
13:41:07.49 243 pages pagetables
13:41:07.49 Out of Memory: Killed process 1173 (top).
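
(Editorial note, not part of the original report.) The figures in the dump above are internally consistent, which a few lines of Python can confirm. This is only a sketch: it assumes 4 kB pages (the usual i386 page size), and the helper name `buddy_total_kb` is hypothetical.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as is usual on i386


def buddy_total_kb(line: str) -> int:
    """Sum the 'count*sizekB' terms of a per-zone free-list line."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)


# Free-list lines copied from the OOM report above.
dma = ("DMA: 2*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB "
       "0*512kB 0*1024kB 0*2048kB 0*4096kB = 432kB")
normal = ("Normal: 40*4kB 1*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 520kB")

free_kb = buddy_total_kb(dma) + buddy_total_kb(normal)
print("free from buddy lists:", free_kb, "kB")    # compare with "Free pages: 952kB"
print("free from page count: ", 238 * PAGE_KB, "kB")  # "free:238" in the summary line
print("slab:                 ", 1383 * PAGE_KB, "kB")  # "slab:1383"
print("total RAM:            ", 8192 * PAGE_KB, "kB")  # "8192 pages of RAM"
```

Both routes give the same free-memory figure, so the report itself is not garbled; the slab line is simply the page count from the summary converted to kB.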

The process tree on this box looked like this while I was doing the tests
(PIDs won't match, I rebooted the box):

PID TTY STAT TIME COMMAND
1 ? S 0:00 /bin/sh /init HOME=/ TERM=linux devfs=nomount
2 ? SWN 0:00 [ksoftirqd/0]
3 ? SW 0:00 [watchdog/0]
4 ? SW< 0:00 [events/0]
5 ? SW< 0:00 [khelper]
6 ? SW< 0:00 [kthread]
8 ? SW< 0:00 [kblockd/0]
164 ? SW< 0:00 [aio/0]
239 ? SW< 0:00 [kseriod]
349 ? SW< 0:00 [scsi_eh_0]
387 ? SW< 0:00 [ata/0]
648 ? SW< 0:00 [reiserfs/0]
940 ? SW 0:00 [pdflush]
943 ? SW 0:00 [pdflush]
1518 ? SW< 0:00 [nfsd4]
1533 ? SW< 0:00 [rpciod/0]
163 ? SW 0:00 [kswapd0]
559 ? S< 0:00 udevd
784 ? S 0:00 rpc.portmap
1088 ? S 0:00 inetd
1118 ? S 0:00 svscan /var/service PATH=/bin:/usr/bin
1122 ? S 0:00 supervise fw PATH=/bin:/usr/bin
1123 ? S 0:00 supervise sshd PATH=/bin:/usr/bin
1127 ? S 0:00 sshd -D -e -p22 -u0
1643 ? S 0:00 sshd -D -e -p22 -u0
1644 pts/0 S 0:00 -bash USER=vda LOGNAME=vda HOME=/home/vda PATH=/usr/bin:/bin:/usr/sbin:/sbin MAIL=/var/mail/vda SHELL=/bin/bash SSH_CLIENT=172.17.2.38 33504 22 SSH_TTY=/dev/pts/0 TERM=xterm
1653 pts/0 S 0:05 mc MANPATH=/usr/man HOSTNAME=pegasus.port.imtp.ilyichevsk.odessa.ua TERM=xterm SHELL=/bin/bash SSH_CLIENT=172.17.2.38 33504 22 SSH_TTY=/dev/pts/0 USER=vda LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=
1836 pts/0 S 0:00 /bin/bash -c psahe >psahe MANPATH=/usr/man HOSTNAME=pegasus.port.imtp.ilyichevsk.odessa.ua TERM=xterm SHELL=/bin/bash SSH_CLIENT=172.17.2.38 33504 22 SSH_TTY=/dev/pts/0 USER=vda LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg
1837 pts/0 S 0:00 /bin/sh /home/vda/bin/psahe MANPATH=/usr/man HOSTNAME=pegasus.port.imtp.ilyichevsk.odessa.ua SHELL=/bin/bash TERM=xterm SSH_CLIENT=172.17.2.38 33504 22 SSH_TTY=/dev/pts/0 USER=vda LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.
1838 pts/0 R 0:00 ps -AH e --width=500 MANPATH=/usr/man HOSTNAME=pegasus.port.imtp.ilyichevsk.odessa.ua TERM=xterm SHELL=/bin/bash SSH_CLIENT=172.17.2.38 33504 22 OLDPWD=/.local/tmp SSH_TTY=/dev/pts/0 USER=vda LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:
1839 pts/0 D 0:00 most MANPATH=/usr/man HOSTNAME=pegasus.port.imtp.ilyichevsk.odessa.ua TERM=xterm SHELL=/bin/bash SSH_CLIENT=172.17.2.38 33504 22 OLDPWD=/.local/tmp SSH_TTY=/dev/pts/0 USER=vda LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jp
1124 ? S 0:00 supervise ntp PATH=/bin:/usr/bin
1125 ? S 0:00 supervise log PATH=/bin:/usr/bin
1129 ? S 0:00 multilog t n5 /var/log/service/ntp
1131 ? S 0:00 supervise dhcp PATH=/bin:/usr/bin
1132 ? S 0:00 supervise log PATH=/bin:/usr/bin
1144 ? S 0:00 multilog t /var/log/service/dhcp
1145 ? S 0:00 supervise watcher PATH=/bin:/usr/bin
1152 ? S 0:00 /bin/sh ./run PATH=/bin:/usr/bin
1823 ? S 0:00 sleep 33
1150 ? S 0:00 supervise klog PATH=/bin:/usr/bin
1167 ? S 0:00 socklog ucspi
1151 ? S 0:00 supervise log PATH=/bin:/usr/bin
1158 ? S 0:00 svlogd -tt /var/log/service/klog
1155 ? S 0:00 supervise syslog PATH=/bin:/usr/bin
1168 ? S 0:00 socklog unix /dev/log PATH=/bin:/usr/bin PWD=/.local/var/service/syslog SHLVL=0 GID=50 UID=59
1156 ? S 0:00 supervise log PATH=/bin:/usr/bin
1174 ? S 0:00 svlogd /var/log/service/syslog
1161 ? S 0:00 supervise pop3 PATH=/bin:/usr/bin
1183 ? S 0:00 tcpserver -v -R -H -l 0 -c 40 0.0.0.0 pop3 setuidgid root qmail-popup checkpassword qmail-pop3d maildir
1162 ? S 0:00 supervise log PATH=/bin:/usr/bin
1199 ? S 0:00 multilog t /var/log/service/pop3
1171 ? S 0:00 supervise top PATH=/bin:/usr/bin
1200 ? S 0:14 top c s TERM=linux
1172 ? S 0:00 supervise mysql PATH=/bin:/usr/bin
1173 ? S 0:00 supervise log PATH=/bin:/usr/bin
1190 ? S 0:00 multilog t n5 /var/log/service/mysql
1198 ? S 0:00 supervise getty_tty2 PATH=/bin:/usr/bin
1598 tty2 S 0:00 agetty 38400 /dev/tty2 linux TERM=linux
1213 ? S 0:00 supervise getty_tty3 PATH=/bin:/usr/bin
1619 tty3 S 0:00 agetty 38400 /dev/tty3 linux TERM=linux
1218 ? S 0:00 supervise getty_tty4 PATH=/bin:/usr/bin
1224 tty4 S 0:00 agetty 38400 /dev/tty4 linux TERM=linux
1225 ? S 0:00 supervise getty_tty5 PATH=/bin:/usr/bin
1244 tty5 S 0:00 agetty 38400 /dev/tty5 linux TERM=linux
1235 ? S 0:00 supervise getty_tty6 PATH=/bin:/usr/bin
1243 tty6 S 0:00 agetty 38400 /dev/tty6 linux TERM=linux
1242 ? S 0:00 supervise getty_tty7 PATH=/bin:/usr/bin
1252 tty7 S 0:00 agetty 38400 /dev/tty7 linux TERM=linux
1253 ? S 0:00 supervise getty_tty8 PATH=/bin:/usr/bin
1260 tty8 S 0:00 agetty 38400 /dev/tty8 linux TERM=linux
1261 ? S 0:00 supervise getty_tty1 PATH=/bin:/usr/bin
1265 tty1 S 0:00 agetty 38400 /dev/tty1 linux TERM=linux
1274 ? S 0:00 supervise httpd PATH=/bin:/usr/bin
1276 ? S 0:00 tcpserver -v -R -H -l 0 -c 40 0.0.0.0 www setuidgid root httpd -X -f /.local/var/service/httpd/httpd.conf PATH=/bin:/usr/bin
1275 ? S 0:00 supervise log PATH=/bin:/usr/bin
1277 ? S 0:00 multilog t /var/log/service/httpd
1278 ? S 0:00 supervise httpd_ssl PATH=/bin:/usr/bin
1280 ? S 0:00 stunnel -d 443 -D 6 -p https.pem -S 0 -f -P none -N httpd -l setuidgid -- setuidgid root httpd -X -f /.local/var/service/httpd_ssl/httpd.conf
1279 ? S 0:00 supervise log PATH=/bin:/usr/bin
1332 ? S 0:00 multilog /var/log/service/httpd_ssl
1281 ? S 0:00 supervise qmail PATH=/bin:/usr/bin
1282 ? S 0:00 supervise log PATH=/bin:/usr/bin
1303 ? S 0:00 multilog t s300000 /var/log/service/qmail -@* status: local * remote * -@* new msg * -@* end msg * /var/log/service/qmail_nostatus -@* delivery *: success: * /var/log/service/qmail_nosuccess -@* info msg *: * -@* starting delivery *: * /var/log/service/qmail_problems
1296 ? S 0:00 supervise smtp PATH=/bin:/usr/bin
1299 ? S 0:00 tcpserver -v -R -H -l 0 -c 60 0.0.0.0 25 setuidgid mail smtpfront-qmail QMAILQUEUE=/usr/bin/qmail-scanner-queue.pl CVM_SASL_PLAIN=/usr/app/cvm-0.11/bin/cvm-unix
1297 ? S 0:00 supervise log PATH=/bin:/usr/bin
1304 ? S 0:00 multilog t s200000 /var/log/service/smtp
1310 ? S 0:00 supervise smb_n PATH=/bin:/usr/bin
1311 ? S 0:00 supervise log PATH=/bin:/usr/bin
1348 ? S 0:00 multilog t /var/log/service/smb_n
1320 ? S 0:00 supervise smb_s PATH=/bin:/usr/bin
1321 ? S 0:00 supervise log PATH=/bin:/usr/bin
1365 ? S 0:00 multilog t /var/log/service/smb_s
1322 ? S 0:00 supervise smb_w PATH=/bin:/usr/bin
1323 ? S 0:00 supervise log PATH=/bin:/usr/bin
1363 ? S 0:00 multilog t /var/log/service/smb_w
1346 ? S 0:00 supervise nfs PATH=/bin:/usr/bin
1364 ? S 0:00 /bin/sh ./run PATH=/bin:/usr/bin
1542 ? S 0:00 sleep 32000
1380 ? S 0:00 supervise proxy-tcp PATH=/bin:/usr/bin
1391 ? S 0:00 tcpserver -U -v -R -H -l 0 -c 100 0.0.0.0 9123 ./startproxy GID=50 UID=50
1381 ? S 0:00 supervise log PATH=/bin:/usr/bin
1404 ? S 0:00 multilog t /var/log/service/proxy-tcp
1392 ? S 0:00 supervise pgsql PATH=/bin:/usr/bin
1393 ? S 0:00 supervise log PATH=/bin:/usr/bin
1414 ? S 0:00 multilog t /var/log/service/pgsql
1396 ? S 0:00 supervise once PATH=/bin:/usr/bin
1405 ? S 0:00 supervise ovpn-1 PATH=/bin:/usr/bin
1406 ? S 0:00 supervise log PATH=/bin:/usr/bin
1445 ? S 0:00 multilog /var/log/service/ovpn-1
1443 ? S 0:00 supervise automount PATH=/bin:/usr/bin
1447 ? S 0:00 automount -f -s -v --timeout 15 /.local/mnt/auto program /root/bin/mapper.sh
1444 ? S 0:00 supervise log PATH=/bin:/usr/bin
1452 ? S 0:00 multilog t n5 /var/log/service/automount
1121 ? S 0:00 sleep 32000
1524 ? SW 0:00 [nfsd]
1525 ? SW 0:00 [nfsd]
1526 ? SW 0:00 [nfsd]
1527 ? SW 0:00 [nfsd]
1532 ? SW 0:00 [lockd]
1534 ? S 0:00 rpc.mountd
1536 ? S 0:00 rpc.statd

daemontools, bash and some other programs are compiled
against dietlibc or uclibc. top b n1 output:

16:01:33 up 15 min, 2 users, load average: 0,32, 0,16, 0,17
121 processes: 120 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 3,3% user 2,4% system 0,0% nice 14,1% iowait 80,0% idle
Mem: 27564k av, 25088k used, 2476k free, 0k shrd, 936k buff
12944k active, 2468k inactive
Swap: 98296k av, 0k used, 98296k free 9432k cached

(Swap usage is 0 here because the box was just rebooted. It starts to use swap
in normal use. A few minutes later: "98296k av, 152k used, 98144k free")

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
1870 root 17 0 2164 1168 804 R 8,1 4,2 0:00 0 top
1 root 15 0 1044 616 516 S 0,0 2,2 0:00 0 init
2 root 34 19 0 0 0 SWN 0,0 0,0 0:00 0 ksoftirqd/0
3 root RT 0 0 0 0 SW 0,0 0,0 0:00 0 watchdog/0
4 root 10 -5 0 0 0 SW< 0,0 0,0 0:00 0 events/0
5 root 11 -5 0 0 0 SW< 0,0 0,0 0:00 0 khelper
6 root 11 -5 0 0 0 SW< 0,0 0,0 0:00 0 kthread
8 root 10 -5 0 0 0 SW< 0,0 0,0 0:00 0 kblockd/0
164 root 17 -5 0 0 0 SW< 0,0 0,0 0:00 0 aio/0
163 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 kswapd0
239 root 10 -5 0 0 0 SW< 0,0 0,0 0:00 0 kseriod
349 root 11 -5 0 0 0 SW< 0,0 0,0 0:00 0 scsi_eh_0
387 root 11 -5 0 0 0 SW< 0,0 0,0 0:00 0 ata/0
559 root 16 -4 128 16 4 S < 0,0 0,0 0:00 0 udevd
648 root 10 -5 0 0 0 SW< 0,0 0,0 0:00 0 reiserfs/0
784 rpc 16 0 1444 560 460 S 0,0 2,0 0:00 0 rpc.portmap
940 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 pdflush
943 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 pdflush
1088 root 18 0 1344 460 384 S 0,0 1,6 0:00 0 inetd
1118 root 16 0 132 24 8 S 0,0 0,0 0:00 0 svscan
1121 root 18 0 1816 464 388 S 0,0 1,6 0:00 0 sleep
1122 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1123 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1124 root 18 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1125 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1127 root 16 0 2568 1192 940 S 0,0 4,3 0:01 0 sshd
1129 daemon 18 0 108 28 16 S 0,0 0,1 0:00 0 multilog
1131 root 18 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1132 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1144 daemon 17 0 112 32 16 S 0,0 0,1 0:00 0 multilog
1145 root 16 0 100 20 8 S 0,0 0,0 0:00 0 supervise
1150 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1151 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1152 root 16 0 1048 624 524 S 0,0 2,2 0:00 0 run
1155 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1156 root 16 0 100 20 8 S 0,0 0,0 0:00 0 supervise
1158 logger 18 0 120 40 20 S 0,0 0,1 0:00 0 svlogd
1161 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1162 root 16 0 104 16 8 S 0,0 0,0 0:00 0 supervise
1167 root 16 0 116 32 12 S 0,0 0,1 0:00 0 socklog
1168 logger 16 0 120 32 12 S 0,0 0,1 0:00 0 socklog
1171 root 16 0 104 16 8 S 0,0 0,0 0:00 0 supervise
1172 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1173 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1174 logger 18 0 116 40 20 S 0,0 0,1 0:00 0 svlogd
1183 root 17 0 128 48 32 S 0,0 0,1 0:00 0 tcpserver
1190 daemon 17 0 108 28 16 S 0,0 0,1 0:00 0 multilog
1198 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1199 mail 15 0 108 28 16 S 0,0 0,1 0:00 0 multilog
1200 user0 17 0 1840 1004 728 S 0,0 3,6 0:14 0 top
1213 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1218 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1224 root 16 0 1312 456 380 S 0,0 1,6 0:00 0 agetty
1225 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1235 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1242 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1243 root 16 0 1308 452 380 S 0,0 1,6 0:00 0 agetty
1244 root 16 0 1312 456 380 S 0,0 1,6 0:00 0 agetty
1252 root 16 0 1312 456 380 S 0,0 1,6 0:00 0 agetty
1253 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1260 root 16 0 1308 452 380 S 0,0 1,6 0:00 0 agetty
1261 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1265 root 17 0 1308 452 380 S 0,0 1,6 0:00 0 agetty
1274 root 16 0 100 20 8 S 0,0 0,0 0:00 0 supervise
1275 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1276 root 15 0 124 40 32 S 0,0 0,1 0:00 0 tcpserver
1277 apache 17 0 112 32 16 S 0,0 0,1 0:00 0 multilog
1278 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1279 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1280 root 16 0 2888 1340 1072 S 0,0 4,8 0:00 0 stunnel
1281 root 18 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1282 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1296 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1297 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1299 root 18 0 128 40 28 S 0,0 0,1 0:00 0 tcpserver
1303 mail 17 0 116 36 16 S 0,0 0,1 0:00 0 multilog
1304 mail 17 0 112 36 16 S 0,0 0,1 0:00 0 multilog
1310 root 17 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1311 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1320 root 17 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1321 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1322 root 17 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1323 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1332 apache 17 0 112 36 16 S 0,0 0,1 0:00 0 multilog
1346 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1348 logger 17 0 112 32 16 S 0,0 0,1 0:00 0 multilog
1363 logger 17 0 112 32 16 S 0,0 0,1 0:00 0 multilog
1364 root 16 0 1056 628 520 S 0,0 2,2 0:00 0 run
1365 logger 17 0 112 36 16 S 0,0 0,1 0:00 0 multilog
1380 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1381 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1391 daemon 18 0 124 36 28 S 0,0 0,1 0:00 0 tcpserver
1392 root 18 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1393 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1396 root 17 0 104 24 8 S 0,0 0,0 0:00 0 supervise
1404 daemon 16 0 108 28 16 S 0,0 0,1 0:00 0 multilog
1405 root 18 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1406 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1414 mysql 15 0 112 36 16 S 0,0 0,1 0:00 0 multilog
1443 root 16 0 100 16 8 S 0,0 0,0 0:00 0 supervise
1444 root 16 0 104 20 8 S 0,0 0,0 0:00 0 supervise
1445 daemon 15 0 112 32 16 S 0,0 0,1 0:00 0 multilog
1447 root 16 0 1440 648 540 S 0,0 2,3 0:00 0 automount
1452 logger 16 0 112 32 16 S 0,0 0,1 0:00 0 multilog
1518 root 10 -5 0 0 0 SW< 0,0 0,0 0:00 0 nfsd4
1524 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 nfsd
1525 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 nfsd
1526 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 nfsd
1527 root 15 0 0 0 0 SW 0,0 0,0 0:00 0 nfsd
1532 root 16 0 0 0 0 SW 0,0 0,0 0:00 0 lockd
1533 root 11 -5 0 0 0 SW< 0,0 0,0 0:00 0 rpciod/0
1534 root 17 0 1512 584 408 S 0,0 2,1 0:00 0 rpc.mountd
1536 root 16 0 3380 1392 952 S 0,0 5,0 0:00 0 rpc.statd
1542 root 18 0 1816 464 388 S 0,0 1,6 0:00 0 sleep
1598 root 16 0 1312 456 380 S 0,0 1,6 0:00 0 agetty
1619 root 16 0 1308 452 380 S 0,0 1,6 0:00 0 agetty
1644 root 16 0 1152 824 596 S 0,0 2,9 0:00 0 bash
1653 root 15 0 1412 948 592 S 0,0 3,4 0:05 0 mc
1854 root 18 0 1816 464 388 S 0,0 1,6 0:00 0 sleep
1869 root 18 0 1052 572 468 S 0,0 2,0 0:00 0 bash
--
vda


2006-01-28 15:01:48

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Saturday 28 January 2006 16:13, Denis Vlasenko wrote:
> I have an old PII server box with just 32MB of RAM.
>
> Yesterday I was preparing it for use by other people, not just me,
> and went on checking and tightening filesystem permissions with
> chmod -R and/or chown -R.
>
> On deep directories, the box started to OOM-kill processes en masse!
>
> I updated the kernel to 2.6.15.1 - it doesn't help.
> I stopped some of the more memory-hungry processes before running
> chmod -R - that doesn't help either.

More details which might be relevant.

I did not alter anything in /proc/sys/vm, maybe I should?

# cd /proc/sys/vm
# for a in *; do echo "$a: `cat "$a"`"; done
block_dump: 0
dirty_background_ratio: 10
dirty_expire_centisecs: 3000
dirty_ratio: 40
dirty_writeback_centisecs: 500
hugetlb_shm_group: 0
laptop_mode: 0
legacy_va_layout: 0
lowmem_reserve_ratio: 256 256 32
max_map_count: 65536
min_free_kbytes: 724
nr_hugepages: 0
nr_pdflush_threads: 2
overcommit_memory: 0
overcommit_ratio: 50
page-cluster: 3
swap_token_timeout: 300
swappiness: 60
vfs_cache_pressure: 100

SCSI controller and disks attached to it:

lspci says: "00:0b.0 SCSI storage controller: Adaptec AIC-7880U (rev 01)"
boot log:
13:48:34.50 ahc_pci:0:11:0: Using left over BIOS settings
13:48:34.50 scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
13:48:34.50 <Adaptec aic7880 Ultra SCSI adapter>
13:48:34.50 aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
13:48:34.50
13:48:34.50 Vendor: IBM Model: DNES-309170W Rev: SAH0
13:48:34.50 Type: Direct-Access ANSI SCSI revision: 03
13:48:34.50 scsi0:A:0:0: Tagged Queuing enabled. Depth 8
13:48:34.50 target0:0:0: Beginning Domain Validation
13:48:34.50 target0:0:0: wide asynchronous.
13:48:34.50 target0:0:0: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 8)
13:48:34.50 target0:0:0: Domain Validation skipping write tests
13:48:34.50 target0:0:0: Ending Domain Validation
13:48:34.50 Vendor: IBM Model: DNES-309170W Rev: SAH0
13:48:34.50 Type: Direct-Access ANSI SCSI revision: 03
13:48:34.50 scsi0:A:1:0: Tagged Queuing enabled. Depth 8
13:48:34.50 target0:0:1: Beginning Domain Validation
13:48:34.50 target0:0:1: wide asynchronous.
13:48:34.50 target0:0:1: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 8)
13:48:34.50 target0:0:1: Domain Validation skipping write tests
13:48:34.50 target0:0:1: Ending Domain Validation
13:48:34.50 Vendor: SEAGATE Model: ST39140N Rev: 1498
13:48:34.50 Type: Direct-Access ANSI SCSI revision: 02
13:48:34.50 scsi0:A:3:0: Tagged Queuing enabled. Depth 8
13:48:34.50 target0:0:3: Beginning Domain Validation
13:48:34.50 target0:0:3: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 15)
13:48:34.50 target0:0:3: Domain Validation skipping write tests
13:48:34.50 target0:0:3: Ending Domain Validation
13:48:34.50 libata version 1.20 loaded.
13:48:34.50 SCSI device sda: 17916240 512-byte hdwr sectors (9173 MB)
13:48:34.50 SCSI device sda: drive cache: write back
13:48:34.50 SCSI device sda: 17916240 512-byte hdwr sectors (9173 MB)
13:48:34.50 SCSI device sda: drive cache: write back
13:48:34.50 sda: sda1 sda2
13:48:34.50 sd 0:0:0:0: Attached scsi disk sda
13:48:34.51 SCSI device sdb: 17916240 512-byte hdwr sectors (9173 MB)
13:48:34.51 SCSI device sdb: drive cache: write back
13:48:34.51 SCSI device sdb: 17916240 512-byte hdwr sectors (9173 MB)
13:48:34.51 SCSI device sdb: drive cache: write back
13:48:34.51 sdb: sdb1 sdb2
13:48:34.51 sd 0:0:1:0: Attached scsi disk sdb
13:48:34.51 SCSI device sdc: 17783240 512-byte hdwr sectors (9105 MB)
13:48:34.51 SCSI device sdc: drive cache: write back
13:48:34.51 SCSI device sdc: 17783240 512-byte hdwr sectors (9105 MB)
13:48:34.51 SCSI device sdc: drive cache: write back
13:48:34.51 sdc: sdc1 sdc2 sdc3

--
vda

2006-01-28 16:12:00

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

[CCing namesys]

Narrowed it down to 100% reproducible case:

chown -Rc 0:<n> .

in a top directory of tree containing ~21938 files
on reiser3 partition:

/dev/sdc3 on /.3 type reiserfs (rw,noatime)

causes oom kill storm. "ls -lR", "find ." etc work fine.

I suspected that it is a leak in winbindd libnss module,
but chown does not seem to grow larger in top, and also
running it under softlimit -m 400000 still causes oom kills
while chown's RSS stays below 4MB.
--
vda

2006-01-30 06:12:04

by Hans Reiser

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko wrote:

>[CCing namesys]
>
>Narrowed it down to 100% reproducible case:
>
> chown -Rc 0:<n> .
>
>in a top directory of tree containing ~21938 files
>on reiser3 partition:
>
> /dev/sdc3 on /.3 type reiserfs (rw,noatime)
>
>causes oom kill storm. "ls -lR", "find ." etc work fine.
>
>I suspected that it is a leak in winbindd libnss module,
>but chown does not seem to grow larger in top, and also
>running it under softlimit -m 400000 still causes oom kills
>while chown's RSS stays below 4MB.
>--
>vda
>
>
>
>
Chris, would you like to handle this?

Thanks,

Hans

2006-01-30 13:22:56

by Chris Mason

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Monday 30 January 2006 01:11, Hans Reiser wrote:
> Denis Vlasenko wrote:
> >[CCing namesys]
> >
> >Narrowed it down to 100% reproducible case:
> >
> > chown -Rc 0:<n> .
> >
> >in a top directory of tree containing ~21938 files
> >on reiser3 partition:
> >
> > /dev/sdc3 on /.3 type reiserfs (rw,noatime)
> >
> >causes oom kill storm. "ls -lR", "find ." etc work fine.
> >
> >I suspected that it is a leak in winbindd libnss module,
> >but chown does not seem to grow larger in top, and also
> >running it under softlimit -m 400000 still causes oom kills
> >while chown's RSS stays below 4MB.

In order for the journaled filesystems to make sure the FS is consistent after
a crash, we need to keep some blocks in memory until other blocks have been
written. These blocks are pinned, and can't be freed until a certain amount
of io is done.

In the case of reiserfs, it might pin as much as the size of the journal at
any time. The default journal is 32MB, which is much too large for a system
with only 32MB of ram.

You can shrink the log of an existing filesystem. The minimum size is 513
blocks; you might try 1024 as a good starting point.

reiserfstune -s 1024 /dev/xxxx

The filesystem must be unmounted first.
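
(Editorial note, not part of the original message.) To put the block counts above into memory terms, here is a small sketch. It assumes the default 4 KiB reiserfs block size; the figure of 8192 blocks is not quoted in the thread but simply derived from the 32MB default journal at that block size, and `journal_mib` is a hypothetical helper name.

```python
BLOCK_SIZE = 4096  # assumption: default 4 KiB reiserfs block size


def journal_mib(blocks: int, block_size: int = BLOCK_SIZE) -> float:
    """Return the journal size in MiB for a given number of blocks."""
    return blocks * block_size / (1024 * 1024)


# 8192 is derived from 32 MB / 4 KiB; 1024 and 513 are the values from the message.
for blocks in (8192, 1024, 513):
    print(f"{blocks:5d} blocks -> {journal_mib(blocks):6.2f} MiB")
```

Under that assumption the suggested 1024-block journal could pin roughly 4 MiB instead of 32 MiB, and the 513-block minimum roughly 2 MiB.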

-chris

2006-01-30 15:37:13

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Monday 30 January 2006 08:11, Hans Reiser wrote:
> Denis Vlasenko wrote:
>
> >[CCing namesys]
> >
> >Narrowed it down to 100% reproducible case:
> >
> > chown -Rc 0:<n> .
> >
> >in a top directory of tree containing ~21938 files
> >on reiser3 partition:
> >
> > /dev/sdc3 on /.3 type reiserfs (rw,noatime)
> >
> >causes oom kill storm. "ls -lR", "find ." etc work fine.
> >
> >I suspected that it is a leak in winbindd libnss module,
> >but chown does not seem to grow larger in top, and also
> >running it under softlimit -m 400000 still causes oom kills

(typo, must be -m 40000000)

> >while chown's RSS stays below 4MB.
>
> Chris, would you like to handle this?


fs seems to be ok fsck-wise:

# mount -o remount,ro /.3
# reiserfsck /dev/sdc3
reiserfsck 3.6.11 (2003 http://www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at http://www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/sdc3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Mon Jan 30 14:11:15 2006
###########
Filesystem seems mounted read-only. Skipping journal replay.
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
Leaves 8075
Internal nodes 52
Directories 1792
Other files 31865
Data block pointers 1058363 (0 of them are zero)
Safe links 0
###########
reiserfsck finished at Mon Jan 30 14:13:28 2006
###########

However, there is one strange thing: I cannot umount it.
Why? There are no open files.

# umount /.3
umount: /.3: device is busy
# lsof -nP | grep -F '/.3'
# lsof -nP | grep -F 'sdc'
# uname -a
Linux pegasus 2.6.14.6 #1 SMP Mon Jan 30 08:46:20 EET 2006 i686 unknown unknown GNU/Linux

--
vda

2006-02-01 07:33:26

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Monday 30 January 2006 15:22, Chris Mason wrote:
> On Monday 30 January 2006 01:11, Hans Reiser wrote:
> > Denis Vlasenko wrote:
> > >[CCing namesys]
> > >
> > >Narrowed it down to 100% reproducible case:
> > >
> > > chown -Rc 0:<n> .
> > >
> > >in a top directory of tree containing ~21938 files
> > >on reiser3 partition:
> > >
> > > /dev/sdc3 on /.3 type reiserfs (rw,noatime)
> > >
> > >causes oom kill storm. "ls -lR", "find ." etc work fine.
> > >
> > >I suspected that it is a leak in winbindd libnss module,
> > >but chown does not seem to grow larger in top, and also
> > >running it under softlimit -m 400000 still causes oom kills
> > >while chown's RSS stays below 4MB.
>
> In order for the journaled filesystems to make sure the FS is consistent after
> a crash, we need to keep some blocks in memory until other blocks have been
> written. These blocks are pinned, and can't be freed until a certain amount
> of io is done.
>
> In the case of reiserfs, it might pin as much as the size of the journal at
> any time. The default journal is 32MB, which is much too large for a system
> with only 32MB of ram.
>
> You can shrink the log of an existing filesystem. The minimum size is 513
> blocks; you might try 1024 as a good starting point.
>
> reiserfstune -s 1024 /dev/xxxx
>
> The filesystem must be unmounted first.

Will try this and report the result.

Please consider printing a big fat warning at mount time if total RAM
on the system is close to sum of RAM space required for all currently
mounted reiserfs partitions...
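
(Editorial note, not part of the original message.) The request above is for a kernel-side warning at mount time. As a rough user-space illustration of the same arithmetic, the sketch below counts reiserfs entries in `mount`-style output and compares the worst-case pinned journal memory with the available RAM. It assumes every reiserfs filesystem uses the default 32 MB journal; the function names and the 0.5 threshold are hypothetical, chosen only for illustration.

```python
from typing import List, Optional

JOURNAL_KB = 32 * 1024  # assumption: each reiserfs mount has the default 32 MB journal


def reiserfs_mounts(mount_output: str) -> List[str]:
    """Return the mount points of reiserfs filesystems in `mount`-style output."""
    points = []
    for line in mount_output.splitlines():
        fields = line.split()
        # Expected form: <device> on <mountpoint> type <fstype> (<options>)
        if (len(fields) >= 5 and fields[1] == "on"
                and fields[3] == "type" and fields[4] == "reiserfs"):
            points.append(fields[2])
    return points


def journal_warning(mount_output: str, ram_kb: int,
                    fraction: float = 0.5) -> Optional[str]:
    """Warn if worst-case journal memory reaches `fraction` of RAM (arbitrary threshold)."""
    worst_case_kb = len(reiserfs_mounts(mount_output)) * JOURNAL_KB
    if worst_case_kb >= fraction * ram_kb:
        return (f"warning: reiserfs journals may pin up to {worst_case_kb} kB, "
                f"but only {ram_kb} kB of RAM is available")
    return None


# Mount lines and the "27564k av" memory figure as reported elsewhere in this thread.
SAMPLE = """/dev/sda2 on /.share type ext2 (rw,noatime,nogrpid)
/dev/sdb2 on /.1 type reiserfs (rw,noatime)
/dev/sdc2 on /.2 type reiserfs (rw,noatime)
/dev/sdc3 on /.3 type reiserfs (rw,noatime)"""

print(journal_warning(SAMPLE, 27564))
```

With three reiserfs mounts, the worst case under these assumptions is 96 MB of journal against about 27 MB of usable RAM, so the check fires.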
--
vda

2006-02-01 07:42:05

by Hans Reiser

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko wrote:

>On Monday 30 January 2006 15:22, Chris Mason wrote:
>
>
>>On Monday 30 January 2006 01:11, Hans Reiser wrote:
>>
>>
>>>Denis Vlasenko wrote:
>>>
>>>
>>>>[CCing namesys]
>>>>
>>>>Narrowed it down to 100% reproducible case:
>>>>
>>>> chown -Rc 0:<n> .
>>>>
>>>>in a top directory of tree containing ~21938 files
>>>>on reiser3 partition:
>>>>
>>>> /dev/sdc3 on /.3 type reiserfs (rw,noatime)
>>>>
>>>>causes oom kill storm. "ls -lR", "find ." etc work fine.
>>>>
>>>>I suspected that it is a leak in winbindd libnss module,
>>>>but chown does not seem to grow larger in top, and also
>>>>running it under softlimit -m 400000 still causes oom kills
>>>>while chown's RSS stays below 4MB.
>>>>
>>>>
>>In order for the journaled filesystems to make sure the FS is consistent after
>>a crash, we need to keep some blocks in memory until other blocks have been
>>written. These blocks are pinned, and can't be freed until a certain amount
>>of io is done.
>>
>>In the case of reiserfs, it might pin as much as the size of the journal at
>>any time. The default journal is 32MB, which is much too large for a system
>>with only 32MB of ram.
>>
>>You can shrink the log of an existing filesystem. The minimum size is 513
>>blocks; you might try 1024 as a good starting point.
>>
>>reiserfstune -s 1024 /dev/xxxx
>>
>>The filesystem must be unmounted first.
>>
>>
>
>Will try this and report the result.
>
>Please consider printing a big fat warning at mount time if total RAM
>on the system is close to sum of RAM space required for all currently
>mounted reiserfs partitions...
>--
>vda
>
>
>
>
I already suggested this to Chris. ;-) I agree.

Best,

Hans

2006-02-01 07:43:04

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Monday 30 January 2006 15:22, Chris Mason wrote:
> You can shrink the log of an existing filesystem. The minimum size is 513
> blocks, you might try 1024 as a good starting point.
>
> reiserfstune -s 1024 /dev/xxxx
>
> The filesystem must be unmounted first.

It refuses to unmount, for no obvious reason:

# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (ro,nogrpid)
none on /proc type proc (rw,nodiratime)
none on /sys type sysfs (rw)
none on /dev type ramfs (rw)
/dev/sda2 on /.share type ext2 (rw,noatime,nogrpid)
/dev/sdb2 on /.1 type reiserfs (rw,noatime)
/dev/sdc2 on /.2 type reiserfs (rw,noatime)
/dev/sdc3 on /.3 type reiserfs (rw,noatime)
/dev/sda2 on /.local type ext2 (rw,noatime,nogrpid)
none on /dev/pts type devpts (rw)
automount(pid1043) on /.local/mnt/auto type autofs (rw)
# umount /.3
umount: /.3: device is busy
# mount -o ro,remount /.3
# umount /.3
umount: /.3: device is busy
# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (ro,nogrpid)
none on /proc type proc (rw,nodiratime)
none on /sys type sysfs (rw)
none on /dev type ramfs (rw)
/dev/sda2 on /.share type ext2 (rw,noatime,nogrpid)
/dev/sdb2 on /.1 type reiserfs (rw,noatime)
/dev/sdc2 on /.2 type reiserfs (rw,noatime)
/dev/sdc3 on /.3 type reiserfs (ro,noatime)
/dev/sda2 on /.local type ext2 (rw,noatime,nogrpid)
none on /dev/pts type devpts (rw)
automount(pid1043) on /.local/mnt/auto type autofs (rw)
# reiserfstune /dev/sdc3
reiserfstune: Reiserfstune is not allowed to be run on mounted filesystem.
# lsof -nP | grep -F '/.3'
# lsof -nP | grep -F '/dev/sdc'

--
vda

2006-02-01 10:16:47

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 09:42, Denis Vlasenko wrote:
> On Monday 30 January 2006 15:22, Chris Mason wrote:
> > You can shrink the log of an existing filesystem. The minimum size is 513
> > blocks, you might try 1024 as a good starting point.
> >
> > reiserfstune -s 1024 /dev/xxxx
> >
> > The filesystem must be unmounted first.
>
> It doesn't want to, for no obvious reason:
>
> # mount
> rootfs on / type rootfs (rw)
> /dev/root on / type ext2 (ro,nogrpid)
> none on /proc type proc (rw,nodiratime)
> none on /sys type sysfs (rw)
> none on /dev type ramfs (rw)
> /dev/sda2 on /.share type ext2 (rw,noatime,nogrpid)
> /dev/sdb2 on /.1 type reiserfs (rw,noatime)
> /dev/sdc2 on /.2 type reiserfs (rw,noatime)
> /dev/sdc3 on /.3 type reiserfs (rw,noatime)
> /dev/sda2 on /.local type ext2 (rw,noatime,nogrpid)
> none on /dev/pts type devpts (rw)
> automount(pid1043) on /.local/mnt/auto type autofs (rw)
> # umount /.3
> umount: /.3: device is busy
> # mount -o ro,remount /.3
> # umount /.3
> umount: /.3: device is busy
> # reiserfstune /dev/sdc3
> reiserfstune: Reiserfstune is not allowed to be run on mounted filesystem.
> # lsof -nP | grep -F '/.3'
> # lsof -nP | grep -F '/dev/sdc'

I removed it from /etc/fstab; after a reboot:

# reiserfstune -s 1024 /dev/sdc3
reiserfstune: Journal device has not been specified. Assuming journal is on the main device (/dev/sdc3).

Current parameters:

Filesystem state: consistent

Reiserfs super block in block 16 on 0x823 of format 3.6 with standard journal
Count of blocks on the device: 2094474
Number of bitmaps: 64
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1019710
Root block: 32941
Filesystem is cleanly umounted
Tree height: 4
Hash function used to sort names: "r5"
Objectid map size 54, max 972
Journal parameters:
Device [0x0]
Magic [0x2cac04df]
Size 8193 blocks (including 1 for journal header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x0:
sb_version: 2
inode generation number: 293892
UUID: 7b3aa1ab-40bd-44da-98dc-b37b21f7add0
LABEL:
Set flags in SB:
ATTRIBUTES CLEAN
reiserfstune: Current journal parameters:
Device [0x0]
Magic [0x2cac04df]
Size 8193 blocks (including 1 for journal header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
WARNING: wrong transaction max size (1024). Changed to 511
reiserfstune: New journal parameters:
Device [0x0]
Magic [0x70dbc903]
Size 1024 blocks (including 1 for journal header) (first block 18)
Max transaction length 511 blocks
Max batch size 449 blocks
Max commit age 30
Reiserfs super block in block 16 on 0x823 of format 3.6 with non-standard journal
Count of blocks on the device: 2094474
Number of bitmaps: 64
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1019710
Root block: 32941
Filesystem is cleanly umounted
Tree height: 4
Hash function used to sort names: "r5"
Objectid map size 54, max 972
Journal parameters:
Device [0x0]
Magic [0x70dbc903]
Size 1024 blocks (including 1 for journal header) (first block 18)
Max transaction length 511 blocks
Max batch size 449 blocks
Max commit age 30
Blocks reserved by journal: 8193
Fs state field: 0x0:
sb_version: 2
inode generation number: 293892
UUID: 7b3aa1ab-40bd-44da-98dc-b37b21f7add0
LABEL:
Set flags in SB:
ATTRIBUTES CLEAN
reiserfstune: ATTENTION: YOU ARE ABOUT TO SETUP THE NEW JOURNAL FOR THE "/dev/sdc3"!
AREA OF "/dev/sdc3" DEDICATED FOR JOURNAL WILL BE ZEROED!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok

# mount /dev/sdc3 /.3 -o noatime
mount: you must specify the filesystem type

# dmesg | tail -4
br: topology change detected, propagating
br: port 1(ifi) entering forwarding state
FAT: bogus number of reserved sectors
VFS: Can't find a valid FAT filesystem on dev sdc3.

# reiserfsck /dev/sdc3
reiserfsck 3.6.11 (2003 http://www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at http://www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/sdc3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Failed to open the journal device ((null)).

Wow... 8(
--
vda

2006-02-01 11:50:05

by Vitaly Fertman

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 13:15, Denis Vlasenko wrote:
>
> # reiserfstune -s 1024 /dev/sdc3
> # mount /dev/sdc3 /.3 -o noatime
> mount: you must specify the filesystem type
>
> # dmesg | tail -4
> br: topology change detected, propagating
> br: port 1(ifi) entering forwarding state
> FAT: bogus number of reserved sectors
> VFS: Can't find a valid FAT filesystem on dev sdc3.
>
> # reiserfsck /dev/sdc3
> reiserfsck 3.6.11 (2003 http://www.namesys.com)

Your reiserfsprogs are old. Which kernel are you using?

As far as I can see, 3.6.11 did have that problem; however, I have no
problem with progs 3.6.19 and kernel 2.6.9, even after shrinking
the journal with reiserfstune 3.6.11.

--
Vitaly

2006-02-01 11:52:47

by Edward Shishkin

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko wrote:

>On Wednesday 01 February 2006 09:42, Denis Vlasenko wrote:
>
>
>>On Monday 30 January 2006 15:22, Chris Mason wrote:
>>
>>
>>>You can shrink the log of an existing filesystem. The minimum size is 513
>>>blocks, you might try 1024 as a good starting point.
>>>
>>>reiserfstune -s 1024 /dev/xxxx
>>>
>>>The filesystem must be unmounted first.
>>>
>>>
>>It doesn't want to, for no obvious reason:
>>
>># mount
>>rootfs on / type rootfs (rw)
>>/dev/root on / type ext2 (ro,nogrpid)
>>none on /proc type proc (rw,nodiratime)
>>none on /sys type sysfs (rw)
>>none on /dev type ramfs (rw)
>>/dev/sda2 on /.share type ext2 (rw,noatime,nogrpid)
>>/dev/sdb2 on /.1 type reiserfs (rw,noatime)
>>/dev/sdc2 on /.2 type reiserfs (rw,noatime)
>>/dev/sdc3 on /.3 type reiserfs (rw,noatime)
>>/dev/sda2 on /.local type ext2 (rw,noatime,nogrpid)
>>none on /dev/pts type devpts (rw)
>>automount(pid1043) on /.local/mnt/auto type autofs (rw)
>># umount /.3
>>umount: /.3: device is busy
>># mount -o ro,remount /.3
>># umount /.3
>>umount: /.3: device is busy
>># reiserfstune /dev/sdc3
>>reiserfstune: Reiserfstune is not allowed to be run on mounted filesystem.
>># lsof -nP | grep -F '/.3'
>># lsof -nP | grep -F '/dev/sdc'
>>
>>
>
>Removed it from /etc/fstab, after reboot:
>
># reiserfstune -s 1024 /dev/sdc3
>reiserfstune: Journal device has not been specified. Assuming journal is on the main device (/dev/sdc3).
>
>Current parameters:
>
>Filesystem state: consistent
>
>Reiserfs super block in block 16 on 0x823 of format 3.6 with standard journal
>Count of blocks on the device: 2094474
>Number of bitmaps: 64
>Blocksize: 4096
>Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1019710
>Root block: 32941
>Filesystem is cleanly umounted
>Tree height: 4
>Hash function used to sort names: "r5"
>Objectid map size 54, max 972
>Journal parameters:
> Device [0x0]
> Magic [0x2cac04df]
> Size 8193 blocks (including 1 for journal header) (first block 18)
> Max transaction length 1024 blocks
> Max batch size 900 blocks
> Max commit age 30
>Blocks reserved by journal: 0
>Fs state field: 0x0:
>sb_version: 2
>inode generation number: 293892
>UUID: 7b3aa1ab-40bd-44da-98dc-b37b21f7add0
>LABEL:
>Set flags in SB:
> ATTRIBUTES CLEAN
>reiserfstune: Current journal parameters:
> Device [0x0]
> Magic [0x2cac04df]
> Size 8193 blocks (including 1 for journal header) (first block 18)
> Max transaction length 1024 blocks
> Max batch size 900 blocks
> Max commit age 30
>WARNING: wrong transaction max size (1024). Changed to 511
>reiserfstune: New journal parameters:
> Device [0x0]
> Magic [0x70dbc903]
> Size 1024 blocks (including 1 for journal header) (first block 18)
> Max transaction length 511 blocks
> Max batch size 449 blocks
> Max commit age 30
>Reiserfs super block in block 16 on 0x823 of format 3.6 with non-standard journal
>Count of blocks on the device: 2094474
>Number of bitmaps: 64
>Blocksize: 4096
>Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1019710
>Root block: 32941
>Filesystem is cleanly umounted
>Tree height: 4
>Hash function used to sort names: "r5"
>Objectid map size 54, max 972
>Journal parameters:
> Device [0x0]
> Magic [0x70dbc903]
> Size 1024 blocks (including 1 for journal header) (first block 18)
> Max transaction length 511 blocks
> Max batch size 449 blocks
> Max commit age 30
>Blocks reserved by journal: 8193
>Fs state field: 0x0:
>sb_version: 2
>inode generation number: 293892
>UUID: 7b3aa1ab-40bd-44da-98dc-b37b21f7add0
>LABEL:
>Set flags in SB:
> ATTRIBUTES CLEAN
>reiserfstune: ATTENTION: YOU ARE ABOUT TO SETUP THE NEW JOURNAL FOR THE "/dev/sdc3"!
>AREA OF "/dev/sdc3" DEDICATED FOR JOURNAL WILL BE ZEROED!
>Continue (y/n):y
>Initializing journal - 0%....20%....40%....60%....80%....100%
>Syncing..ok
>
># mount /dev/sdc3 /.3 -o noatime
>mount: you must specify the filesystem type
>
># dmesg | tail -4
>br: topology change detected, propagating
>br: port 1(ifi) entering forwarding state
>FAT: bogus number of reserved sectors
>VFS: Can't find a valid FAT filesystem on dev sdc3.
>
># reiserfsck /dev/sdc3
>reiserfsck 3.6.11 (2003 http://www.namesys.com)
>
>*************************************************************
>** If you are using the latest reiserfsprogs and it fails **
>** please email bug reports to [email protected], **
>** providing as much information as possible -- your **
>** hardware, kernel, patches, settings, all reiserfsck **
>** messages (including version), the reiserfsck logfile, **
>** check the syslog file for any related information. **
>** If you would like advice on using this program, support **
>** is available for $25 at http://www.namesys.com/support.html. **
>*************************************************************
>
>Will read-only check consistency of the filesystem on /dev/sdc3
>Will put log info to 'stdout'
>
>Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
>Failed to open the journal device ((null)).
>
> Wow... 8(
>--
>vda
>
>

Would you try:
reiserfsck -j /dev/sdc3 /dev/sdc3

Thanks,
Edward.

2006-02-01 14:26:20

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 13:45, Vitaly Fertman wrote:
> On Wednesday 01 February 2006 13:15, Denis Vlasenko wrote:
> >
> > # reiserfstune -s 1024 /dev/sdc3
> > # mount /dev/sdc3 /.3 -o noatime
> > mount: you must specify the filesystem type
> >
> > # dmesg | tail -4
> > br: topology change detected, propagating
> > br: port 1(ifi) entering forwarding state
> > FAT: bogus number of reserved sectors
> > VFS: Can't find a valid FAT filesystem on dev sdc3.
> >
> > # reiserfsck /dev/sdc3
> > reiserfsck 3.6.11 (2003 http://www.namesys.com)
>
> your reiserfsprogs are old. which kernel are you using?

# uname -a
Linux pegasus 2.6.12.3-2 #1 SMP Thu Sep 15 11:04:37 EEST 2005 i686 unknown unknown GNU/Linux

> as I can see 3.6.11 had that problem indeed, however I have no
> problem with progs 3.6.19 and kernel 2.6.9 even after shrinking
> the journal with tune 3.6.11.

Updated to reiserfsprogs-3.6.19. How do I fix /dev/sdc3 now?
--
vda

2006-02-01 14:26:43

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 13:52, Edward Shishkin wrote:
> ># reiserfstune -s 1024 /dev/sdc3
> >reiserfstune: Journal device has not been specified. Assuming journal is on the main device (/dev/sdc3).
> >
> >Current parameters:
> >
> >Filesystem state: consistent
> >
> >Reiserfs super block in block 16 on 0x823 of format 3.6 with standard journal
> >Count of blocks on the device: 2094474
> >Number of bitmaps: 64
> >Blocksize: 4096
> >Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1019710
> >Root block: 32941
> >Filesystem is cleanly umounted
> >Tree height: 4
> >Hash function used to sort names: "r5"
> >Objectid map size 54, max 972
> >Journal parameters:
> > Device [0x0]
> > Magic [0x2cac04df]
> > Size 8193 blocks (including 1 for journal header) (first block 18)
> > Max transaction length 1024 blocks
> > Max batch size 900 blocks
> > Max commit age 30
> >Blocks reserved by journal: 0
> >Fs state field: 0x0:
> >sb_version: 2
> >inode generation number: 293892
> >UUID: 7b3aa1ab-40bd-44da-98dc-b37b21f7add0
> >LABEL:
> >Set flags in SB:
> > ATTRIBUTES CLEAN
> >reiserfstune: Current journal parameters:
> > Device [0x0]
> > Magic [0x2cac04df]
> > Size 8193 blocks (including 1 for journal header) (first block 18)
> > Max transaction length 1024 blocks
> > Max batch size 900 blocks
> > Max commit age 30
> >WARNING: wrong transaction max size (1024). Changed to 511
> >reiserfstune: New journal parameters:
> > Device [0x0]
> > Magic [0x70dbc903]
> > Size 1024 blocks (including 1 for journal header) (first block 18)
> > Max transaction length 511 blocks
> > Max batch size 449 blocks
> > Max commit age 30
> >Reiserfs super block in block 16 on 0x823 of format 3.6 with non-standard journal
> >Count of blocks on the device: 2094474
> >Number of bitmaps: 64
> >Blocksize: 4096
> >Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1019710
> >Root block: 32941
> >Filesystem is cleanly umounted
> >Tree height: 4
> >Hash function used to sort names: "r5"
> >Objectid map size 54, max 972
> >Journal parameters:
> > Device [0x0]
> > Magic [0x70dbc903]
> > Size 1024 blocks (including 1 for journal header) (first block 18)
> > Max transaction length 511 blocks
> > Max batch size 449 blocks
> > Max commit age 30
> >Blocks reserved by journal: 8193
> >Fs state field: 0x0:
> >sb_version: 2
> >inode generation number: 293892
> >UUID: 7b3aa1ab-40bd-44da-98dc-b37b21f7add0
> >LABEL:
> >Set flags in SB:
> > ATTRIBUTES CLEAN
> >reiserfstune: ATTENTION: YOU ARE ABOUT TO SETUP THE NEW JOURNAL FOR THE "/dev/sdc3"!
> >AREA OF "/dev/sdc3" DEDICATED FOR JOURNAL WILL BE ZEROED!
> >Continue (y/n):y
> >Initializing journal - 0%....20%....40%....60%....80%....100%
> >Syncing..ok
> >
> ># mount /dev/sdc3 /.3 -o noatime
> >mount: you must specify the filesystem type
> >
> ># dmesg | tail -4
> >br: topology change detected, propagating
> >br: port 1(ifi) entering forwarding state
> >FAT: bogus number of reserved sectors
> >VFS: Can't find a valid FAT filesystem on dev sdc3.
> >
> ># reiserfsck /dev/sdc3
> >reiserfsck 3.6.11 (2003 http://www.namesys.com)
> >
> >*************************************************************
> >** If you are using the latest reiserfsprogs and it fails **
> >** please email bug reports to [email protected], **
> >** providing as much information as possible -- your **
> >** hardware, kernel, patches, settings, all reiserfsck **
> >** messages (including version), the reiserfsck logfile, **
> >** check the syslog file for any related information. **
> >** If you would like advice on using this program, support **
> >** is available for $25 at http://www.namesys.com/support.html. **
> >*************************************************************
> >
> >Will read-only check consistency of the filesystem on /dev/sdc3
> >Will put log info to 'stdout'
> >
> >Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
> >Failed to open the journal device ((null)).
> >
> > Wow... 8(
> >--
> >vda
> >
> >
>
> would you try
> reiserfsck -j /dev/sdc3 /dev/sdc3

Updated tools to reiserfsprogs-3.6.19, then:

# reiserfsck -j /dev/sdc3 /dev/sdc3
reiserfsck 3.6.19 (2003 http://www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at http://www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/sdc3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Wed Feb 1 16:24:20 2006
###########
Replaying journal..
No transactions found
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
Leaves 8075
Internal nodes 52
Directories 1792
Other files 31865
Data block pointers 1058363 (0 of them are zero)
Safe links 0
###########
reiserfsck finished at Wed Feb 1 16:26:33 2006
###########

--
vda

2006-02-01 14:58:18

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 16:26, Denis Vlasenko wrote:
> Updated tools to reiserfsprogs-3.6.19, then:
>
> # reiserfsck -j /dev/sdc3 /dev/sdc3
> reiserfsck 3.6.19 (2003 http://www.namesys.com)
>
> *************************************************************
> ** If you are using the latest reiserfsprogs and it fails **
> ** please email bug reports to [email protected], **
> ** providing as much information as possible -- your **
> ** hardware, kernel, patches, settings, all reiserfsck **
> ** messages (including version), the reiserfsck logfile, **
> ** check the syslog file for any related information. **
> ** If you would like advice on using this program, support **
> ** is available for $25 at http://www.namesys.com/support.html. **
> *************************************************************
>
> Will read-only check consistency of the filesystem on /dev/sdc3
> Will put log info to 'stdout'
>
> Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
> ###########
> reiserfsck --check started at Wed Feb 1 16:24:20 2006
> ###########
> Replaying journal..
> No transactions found
> Checking internal tree..finished
> Comparing bitmaps..finished
> Checking Semantic tree:
> finished
> No corruptions found
> There are on the filesystem:
> Leaves 8075
> Internal nodes 52
> Directories 1792
> Other files 31865
> Data block pointers 1058363 (0 of them are zero)
> Safe links 0
> ###########
> reiserfsck finished at Wed Feb 1 16:26:33 2006
> ###########

I hit [reply] too fast.

reiserfsck 3.6.19 reports success now even without -j option,
but still (kernel 2.6.15.1):

# mount /dev/sdc3 /.3
mount: you must specify the filesystem type

How can I fix it?
--
vda

2006-02-01 15:00:10

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 16:25, Denis Vlasenko wrote:
> On Wednesday 01 February 2006 13:45, Vitaly Fertman wrote:
> > On Wednesday 01 February 2006 13:15, Denis Vlasenko wrote:
> > >
> > > # reiserfstune -s 1024 /dev/sdc3
> > > # mount /dev/sdc3 /.3 -o noatime
> > > mount: you must specify the filesystem type
> > >
> > > # dmesg | tail -4
> > > br: topology change detected, propagating
> > > br: port 1(ifi) entering forwarding state
> > > FAT: bogus number of reserved sectors
> > > VFS: Can't find a valid FAT filesystem on dev sdc3.
> > >
> > > # reiserfsck /dev/sdc3
> > > reiserfsck 3.6.11 (2003 http://www.namesys.com)
> >
> > your reiserfsprogs are old. which kernel are you using?
>
> # uname -a
> Linux pegasus 2.6.12.3-2 #1 SMP Thu Sep 15 11:04:37 EEST 2005 i686 unknown unknown GNU/Linux

I actually _looked at_ the uname output :) I am running the wrong kernel!
Rebooted into 2.6.15.1. Mount is still failing.
--
vda

2006-02-01 15:14:14

by be-news06

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko <[email protected]> wrote:
> # mount /dev/sdc3 /.3
> mount: you must specify the filesystem type

what happens if you actually specify the fs type?

Gruss
Bernd

2006-02-01 15:21:30

by be-news06

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko <[email protected]> wrote:
> # umount /.3
> umount: /.3: device is busy
> # lsof -nP | grep -F '/.3'
> # lsof -nP | grep -F 'sdc'

You can try "lsof +f -- /.3" also

Gruss
Bernd

2006-02-01 15:28:33

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 17:14, Bernd Eckenfels wrote:
> Denis Vlasenko <[email protected]> wrote:
> > # mount /dev/sdc3 /.3
> > mount: you must specify the filesystem type
>
> what happens if you actually specify the fs type?

It works.

# mount -t reiserfs /dev/sdc3 /.3
# umount /.3
# mount /dev/sdc3 /.3
mount: you must specify the filesystem type

--
vda

2006-02-02 07:25:47

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Monday 30 January 2006 15:22, Chris Mason wrote:
> > > chown -Rc 0:<n> .
> > >
> > >in a top directory of tree containing ~21938 files
> > >on reiser3 partition:
> > >
> > > /dev/sdc3 on /.3 type reiserfs (rw,noatime)
> > >
> > >causes oom kill storm. "ls -lR", "find ." etc work fine.
>
> In order for the journaled filesystems to make sure the FS is consistent after
> a crash, we need to keep some blocks in memory until other blocks have been
> written. These blocks are pinned, and can't be freed until a certain amount
> of io is done.
>
> In the case of reiserfs, it might pin as much as the size of the journal at
> any time. The default journal is 32MB, which is much too large for a system
> with only 32MB of ram.
>
> You can shrink the log of an existing filesystem. The minimum size is 513
> blocks, you might try 1024 as a good starting point.
>
> reiserfstune -s 1024 /dev/xxxx

I had reiserfsprogs 3.6.11, and reiserfstune (the above command) made my /dev/sdc3
unmountable without -t reiserfs. I upgraded reiserfsprogs to 3.6.19 and now
reiserfsck /dev/sdc3 reports no problems, but the mount problem persists:

# mount -t reiserfs /dev/sdc3 /.3
# umount /.3
# mount /dev/sdc3 /.3
mount: you must specify the filesystem type
# dmesg | tail -3
br: port 1(ifi) entering forwarding state
FAT: bogus number of reserved sectors
VFS: Can't find a valid FAT filesystem on dev sdc3.

"chown -Rc <n>:<m> ." now does not OOM kill the box, so this issue
is resolved, thanks!

Can I restore sdc3 somehow so that I won't need -t reiserfs in the mount command?
You can find the result of

dd if=/dev/sdc3 of=1m bs=1M count=1

at http://195.66.192.167/linux/1m
--
vda

2006-02-02 09:46:39

by Vitaly Fertman

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

> > reiserfstune -s 1024 /dev/xxxx
>
> I had reiserfsprogs 3.6.11 and reiserfstune (above command) made my /dev/sdc3
> unmountable without -t reiserfs. I upgraded reiserfsprogs to 3.6.19 and now
> reiserfsck /dev/sdc3 reports no problems, but mount problem persists:
>
> # mount -t reiserfs /dev/sdc3 /.3
> # umount /.3
> # mount /dev/sdc3 /.3
> mount: you must specify the filesystem type
> # dmesg | tail -3
> br: port 1(ifi) entering forwarding state
> FAT: bogus number of reserved sectors
> VFS: Can't find a valid FAT filesystem on dev sdc3.
>
> "chown -Rc <n>:<m> ." now does not OOM kill the box, so this issue
> is resolved, thanks!
>
> Can I restore sdc3 somehow that I won't need -t reiserfs in mount command?
> You can find result of
>
> dd if=/dev/sdc3 of=1m bs=1M count=1
>
> at http://195.66.192.167/linux/1m

The problem seems to be in the mount program; which version do you use?
I have no problem with your 1m image using mount versions 2.11z and
2.12c.

--
Vitaly
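
A minimal sketch, in Python, of how a userspace probe might recognise
reiserfs by its superblock magic. The 64KiB superblock offset follows
from "super block in block 16" with 4096-byte blocks in the reiserfstune
output above. The magic offset (52) and the three magic strings are
taken from the kernel's reiserfs headers as far as I know them. That an
older mount only knows the first two strings is one plausible
explanation for the behaviour seen here, not something confirmed in
this thread. The function names and the synthetic image are
illustrative only.

```python
SB_OFFSET = 64 * 1024   # block 16 * 4096 bytes, as reported by reiserfstune
MAGIC_OFFSET = 52       # position of the magic string inside the superblock

# Magic strings as defined in the kernel's reiserfs headers (to my knowledge).
ALL_MAGICS = {
    b"ReIsErFs": "reiserfs 3.5 format",
    b"ReIsEr2Fs": "reiserfs 3.6 format, standard journal",
    b"ReIsEr3Fs": "reiserfs with non-standard journal",
}

# Hypothetical table of an older probe that predates the third string.
OLD_MAGICS = {k: v for k, v in ALL_MAGICS.items() if k != b"ReIsEr3Fs"}


def probe_reiserfs(image, known_magics=ALL_MAGICS):
    """Return a description if `image` (bytes) carries a known magic, else None."""
    start = SB_OFFSET + MAGIC_OFFSET
    field = image[start:start + 10]
    for magic, description in known_magics.items():
        if field.startswith(magic):
            return description
    return None


def fake_image(magic):
    """Build a synthetic, otherwise zero-filled image carrying `magic`."""
    return bytes(SB_OFFSET + MAGIC_OFFSET) + magic + bytes(64)


if __name__ == "__main__":
    shrunk = fake_image(b"ReIsEr3Fs")
    print(probe_reiserfs(shrunk))              # recognised by the full table
    print(probe_reiserfs(shrunk, OLD_MAGICS))  # None: the older table misses it
```

The same function could be pointed at the bytes of the 1m image
mentioned above, since that image covers the first 1MiB of the
partition.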

2006-02-02 11:53:49

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Thursday 02 February 2006 11:42, Vitaly Fertman wrote:
> > > reiserfstune -s 1024 /dev/xxxx
> >
> > I had reiserfsprogs 3.6.11 and reiserfstune (above command) made my /dev/sdc3
> > unmountable without -t reiserfs. I upgraded reiserfsprogs to 3.6.19 and now
> > reiserfsck /dev/sdc3 reports no problems, but mount problem persists:
> >
> > # mount -t reiserfs /dev/sdc3 /.3
> > # umount /.3
> > # mount /dev/sdc3 /.3
> > mount: you must specify the filesystem type
> > # dmesg | tail -3
> > br: port 1(ifi) entering forwarding state
> > FAT: bogus number of reserved sectors
> > VFS: Can't find a valid FAT filesystem on dev sdc3.
> >
> > "chown -Rc <n>:<m> ." now does not OOM kill the box, so this issue
> > is resolved, thanks!
> >
> > Can I restore sdc3 somehow that I won't need -t reiserfs in mount command?
> > You can find result of
> >
> > dd if=/dev/sdc3 of=1m bs=1M count=1
> >
> > at http://195.66.192.167/linux/1m
>
> the problem seems to be in mount program, which version do you use?
> I still have no problem with your 1m image, mount version is 2.11z,
> 2.12c.

# mount --version
mount: mount-2.11p

Obviously you are right: mount reads some data from /dev/sdc3 itself
and decides to try specific fs types first instead of just asking
the kernel to autodetect:

...
...
stat64("/dev/sdc3", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 35), ...}) = 0
open("/dev/sdc3", O_RDONLY|O_LARGEFILE) = 4
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 576) = 576
_llseek(4, 512, [512], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
_llseek(4, 1024, [1024], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 228) = 228
_llseek(4, 1024, [1024], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
_llseek(4, 3072, [3072], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
_llseek(4, 8192, [8192], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1376) = 1376
_llseek(4, 8192, [8192], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64
_llseek(4, 8192, [8192], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0", 8) = 8
_llseek(4, 32768, [32768], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 112) = 112
_llseek(4, 32768, [32768], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
_llseek(4, 65536, [65536], SEEK_SET) = 0
read(4, "\212\365\37\0>\217\17\0\255\200\0\0\22\0\0\0\0\0\0\0\377"..., 64) = 64
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
close(4) = 0
open("/etc/filesystems", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/proc/filesystems", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7f000
read(4, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 1024) = 277
mount("/dev/sdc3", "/.3", "msdos", 0xc0ed0000, 0) = -1 EINVAL (Invalid argument)
mount("/dev/sdc3", "/.3", "vfat", 0xc0ed0000, 0) = -1 EINVAL (Invalid argument)
read(4, "", 1024) = 0
close(4) = 0
munmap(0xb7f7f000, 4096) = 0
rt_sigprocmask(SIG_UNBLOCK, ~[TRAP SEGV], NULL, 8) = 0
open("/usr/share/locale/ru_RU.KOI8-R/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/ru_RU.koi8r/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/ru_RU/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/ru.KOI8-R/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/ru.koi8r/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/ru/LC_MESSAGES/util-linux.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "mount: you must specify the file"..., 44) = 44

mount from busybox 1.0 works fine:

# strace busybox mount /dev/sdc3 /.3
execve("/sbin/busybox", ["busybox", "mount", "/dev/sdc3", "/.3"], [/* 24 vars */]) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE, {B38400 opost isig icanon echo ...}) = 0
getuid() = 0
getgid() = 0
getuid() = 0
getgid() = 0
setgid(0) = 0
setuid(0) = 0
brk(0) = 0x80f6000
brk(0x80f7000) = 0x80f7000
brk(0x80f8000) = 0x80f8000
brk(0x80f9000) = 0x80f9000
stat64("/dev/sdc3", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 35), ...}) = 0
open("/etc/filesystems", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/proc/filesystems", O_RDONLY|O_LARGEFILE) = 4
ioctl(4, SNDCTL_TMR_TIMEBASE, 0xbfb32740) = -1 ENOTTY (Inappropriate ioctl for device)
brk(0x80fa000) = 0x80fa000
read(4, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 4096) = 277
mount("/dev/sdc3", "/.3", "reiserfs", 0xc0ed0000, 0x80f6008) = 0
close(4) = 0
_exit(0)


--
vda
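
The fallback visible in both straces (open /proc/filesystems, then call
mount(2) with one type at a time) can be sketched in Python roughly as
follows. Only the first two entries of the sample listing appear in the
strace output; the rest of the listing, the function names and the
stand-in for the mount(2) call are made up for illustration.

```python
# Sample /proc/filesystems content. Only "nodev sysfs" and "nodev rootfs"
# are visible in the strace above; the remaining lines are invented.
SAMPLE = "nodev\tsysfs\nnodev\trootfs\nnodev\tproc\n\text2\n\tmsdos\n\tvfat\n\treiserfs\n"


def candidate_fstypes(proc_filesystems_text):
    """Return the block-device filesystem types, skipping "nodev" entries."""
    types = []
    for line in proc_filesystems_text.splitlines():
        flag, _, name = line.partition("\t")
        if flag == "nodev" or not name:
            continue
        types.append(name.strip())
    return types


def first_working_type(types, attempt):
    """Try each type in order; `attempt` stands in for the mount(2) call."""
    for fstype in types:
        if attempt(fstype):
            return fstype
    return None


if __name__ == "__main__":
    types = candidate_fstypes(SAMPLE)
    print(types)
    # Pretend that only a reiserfs mount succeeds on this device.
    print(first_working_type(types, lambda t: t == "reiserfs"))
```

A mount that stops after a few types, as the first strace shows with
msdos and vfat, never reaches reiserfs in a loop like this.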

2006-02-02 15:04:53

by Pavel Machek

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Hi!


> > >[CCing namesys]
> > >
> > >Narrowed it down to 100% reproducible case:
> > >
> > > chown -Rc 0:<n> .
> > >
> > >in a top directory of tree containing ~21938 files
> > >on reiser3 partition:
> > >
> > > /dev/sdc3 on /.3 type reiserfs (rw,noatime)
> > >
> > >causes oom kill storm. "ls -lR", "find ." etc work fine.
> > >
> > >I suspected that it is a leak in winbindd libnss module,
> > >but chown does not seem to grow larger in top, and also
> > >running it under softlimit -m 400000 still causes oom kills
> > >while chown's RSS stays below 4MB.
>
> In order for the journaled filesystems to make sure the FS is consistent after
> a crash, we need to keep some blocks in memory until other blocks have been
> written. These blocks are pinned, and can't be freed until a certain amount
> of io is done.
>
> In the case of reiserfs, it might pin as much as the size of the journal at
> any time. The default journal is 32MB, which is much too large for a system
> with only 32MB of ram.
>
> You can shrink the log of an existing filesystem. The minimum size is 513
> blocks, you might try 1024 as a good starting point.
>
> reiserfstune -s 1024 /dev/xxxx
>
> The filesystem must be unmounted first.

Could we refuse to mount filesystem unless journal_size <
physmem_size/2 or something like that?

I was not aware of this trap, and it seems unlikely that users know
about it...
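
A minimal sketch of the check I have in mind, written in Python for brevity rather than kernel C. The function name, the 4096-byte block size and the one-half threshold are assumptions for illustration only; none of this is existing reiserfs code.

```python
# Hypothetical mount-time sanity check: refuse (or warn) when the journal
# could pin half of physical RAM or more. Not actual reiserfs code.

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


def journal_fits(journal_blocks, phys_mem_bytes, block_size=BLOCK_SIZE):
    """Return True if the journal is strictly smaller than half of RAM."""
    journal_bytes = journal_blocks * block_size
    return journal_bytes < phys_mem_bytes // 2


ram = 32 * 1024 * 1024                    # the 32MB box from this thread
default_journal_blocks = ram // BLOCK_SIZE  # a 32MB journal is 8192 such blocks

print(journal_fits(default_journal_blocks, ram))  # default journal on this box
print(journal_fits(1024, ram))                    # after reiserfstune -s 1024
```

Under these assumptions a 1024-block journal is 4MB and passes on a 32MB machine, while the 32MB default journal does not.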
Pavel
--
Thanks, Sharp!

2006-02-02 15:17:39

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Sunday 22 January 2006 01:20, Pavel Machek wrote:
> > In the case of reiserfs, it might pin as much as the size of the journal at
> > any time. The default journal is 32MB, which is much too large for a system
> > with only 32MB of ram.
> >
> > You can shrink the log of an existing filesystem. The minimum size is 513
> > blocks, you might try 1024 as a good starting point.
> >
> > reiserfstune -s 1024 /dev/xxxx
> >
> > The filesystem must be unmounted first.
>
> Could we refuse to mount filesystem unless journal_size <
> physmem_size/2 or something like that?
>
> I was not aware of this trap, and it seems unlikely that users know
> about it...

Maybe the reiserfs code should use a journal of reduced size on lowmem boxes.
Basically "reiserfstune -s 1024 /dev/xxxx" on-the-fly.
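
Roughly this policy, as a sketch in Python. The divisor of 8 and the 4096-byte block size are assumptions picked for illustration, and the function is hypothetical. It only models the sizing decision; whether reiserfs could safely use just part of an on-disk journal is a separate question not settled here.

```python
# Hypothetical policy: cap the journal size actually used by available RAM.

MIN_JOURNAL_BLOCKS = 513  # minimum journal size quoted earlier in this thread
BLOCK_SIZE = 4096         # assumed filesystem block size in bytes


def effective_journal_blocks(on_disk_blocks, phys_mem_bytes,
                             ram_divisor=8, block_size=BLOCK_SIZE):
    """Pick how many journal blocks to use on this machine.

    The journal may use at most 1/ram_divisor of physical RAM, never more
    than the on-disk journal, and never less than the stated minimum.
    """
    ram_cap_blocks = (phys_mem_bytes // ram_divisor) // block_size
    capped = min(on_disk_blocks, ram_cap_blocks)
    return max(capped, MIN_JOURNAL_BLOCKS)
```

With these numbers a 32MB box and an 8192-block journal come out at 1024 blocks, the same value as the manual reiserfstune run, while a machine with 1GB of RAM keeps the full 8192.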
--
vda

2006-02-02 19:25:12

by Bill Davidsen

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko wrote:
> On Monday 30 January 2006 15:22, Chris Mason wrote:
>
>>On Monday 30 January 2006 01:11, Hans Reiser wrote:
>>
>>>Chris, would Denis Vlasenko wrote:
>>>
>>>>[CCing namesys]
>>>>
>>>>Narrowed it down to 100% reproducible case:
>>>>
>>>> chown -Rc 0:<n> .
>>>>
>>>>in a top directory of tree containing ~21938 files
>>>>on reiser3 partition:
>>>>
>>>> /dev/sdc3 on /.3 type reiserfs (rw,noatime)
>>>>
>>>>causes oom kill storm. "ls -lR", "find ." etc work fine.
>>>>
>>>>I suspected that it is a leak in winbindd libnss module,
>>>>but chown does not seem to grow larger in top, and also
>>>>running it under softlimit -m 400000 still causes oom kills
>>>>while chown's RSS stays below 4MB.
>>
>>In order for the journaled filesystems to make sure the FS is consistent after
>>a crash, we need to keep some blocks in memory until other blocks have been
>>written. These blocks are pinned, and can't be freed until a certain amount
>>of io is done.
>>
>>In the case of reiserfs, it might pin as much as the size of the journal at
>>any time. The default journal is 32MB, which is much too large for a system
>>with only 32MB of ram.
>>
>>You can shrink the log of an existing filesystem. The minimum size is 513
>>blocks, you might try 1024 as a good starting point.
>>
>>reiserfstune -s 1024 /dev/xxxx
>>
>>The filesystem must be unmounted first.
>
>
> Will try this and report the result.
>
> Please consider printing a big fat warning at mount time if total RAM
> on the system is close to sum of RAM space required for all currently
> mounted reiserfs partitions...

I would think that, rather than warn about the problem, it would be
better to teach the filesystem not to use all of memory, and to stop and
wait for i/o to finish rather than pin so many pages that the system
becomes unusable. That is, for definitions of usable which include not
killing random small processes because the filesystem code isn't playing nicely.

Would some global count of reiser_pinned_pages be possible as a way to
track the problem?
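
To make the idea concrete, here is a user-space sketch in Python of such a counter with a ceiling. The class and method names are made up for illustration and this is not how reiserfs currently accounts for pinned pages; it only shows the "wait for i/o instead of pinning more" behaviour.

```python
import threading


class PinnedPageBudget:
    """Global count of pinned pages with a hard ceiling (illustrative only)."""

    def __init__(self, max_pinned_pages):
        self.max_pinned_pages = max_pinned_pages
        self.pinned = 0
        self._cond = threading.Condition()

    def pin(self, pages, timeout=None):
        """Wait until `pages` more pages fit under the ceiling, then pin them.

        Returns False if the timeout expires before there is room.
        """
        with self._cond:
            def fits():
                return self.pinned + pages <= self.max_pinned_pages

            if not self._cond.wait_for(fits, timeout=timeout):
                return False
            self.pinned += pages
            return True

    def unpin(self, pages):
        """Called when i/o has completed and the pages may be freed."""
        with self._cond:
            self.pinned -= pages
            self._cond.notify_all()
```

A writer that cannot pin more pages blocks in pin() until some other path calls unpin(), so memory pressure turns into waiting rather than OOM kills. A request larger than the ceiling itself would wait forever without a timeout, so a real implementation would need to handle that case.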

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2006-02-02 19:30:57

by Bill Davidsen

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko wrote:
> On Monday 30 January 2006 15:22, Chris Mason wrote:
>
>>>> chown -Rc 0:<n> .
>>>>
>>>>in a top directory of tree containing ~21938 files
>>>>on reiser3 partition:
>>>>
>>>> /dev/sdc3 on /.3 type reiserfs (rw,noatime)
>>>>
>>>>causes oom kill storm. "ls -lR", "find ." etc work fine.
>>
>>In order for the journaled filesystems to make sure the FS is consistent after
>>a crash, we need to keep some blocks in memory until other blocks have been
>>written. These blocks are pinned, and can't be freed until a certain amount
>>of io is done.
>>
>>In the case of reiserfs, it might pin as much as the size of the journal at
>>any time. The default journal is 32MB, which is much too large for a system
>>with only 32MB of ram.
>>
>>You can shrink the log of an existing filesystem. The minimum size is 513
>>blocks, you might try 1024 as a good starting point.
>>
>>reiserfstune -s 1024 /dev/xxxx
>
>
> I had reiserfsprogs 3.6.11 and reiserfstune (above command) made my /dev/sdc3
> unmountable without -t reiserfs. I upgraded reiserfsprogs to 3.6.19 and now
> reiserfsck /dev/sdc3 reports no problems, but mount problem persists:
>
> # mount -t reiserfs /dev/sdc3 /.3
> # umount /.3
> # mount /dev/sdc3 /.3
> mount: you must specify the filesystem type
> # dmesg | tail -3
> br: port 1(ifi) entering forwarding state
> FAT: bogus number of reserved sectors
> VFS: Can't find a valid FAT filesystem on dev sdc3.
>
> "chown -Rc <n>:<m> ." now does not OOM kill the box, so this issue
> is resolved, thanks!
>
> Can I restore sdc3 somehow that I won't need -t reiserfs in mount command?
> You can find result of
>
> dd if=/dev/sdc3 of=1m bs=1M count=1
>
> at http://195.66.192.167/linux/1m

At the risk of stating the obvious:
1 - is reiser a module, and is it loaded?
2 - did this ever work? I think you said you removed the entry from
fstab, was the filetype there which made it work?

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2006-02-02 19:33:06

by Bill Davidsen

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

Denis Vlasenko wrote:

> Maybe the reiserfs code should use a journal of reduced size on lowmem boxes.
> Basically "reiserfstune -s 1024 /dev/xxxx" on-the-fly.

I posted something about that, but you really need to cover the case
where there are multiple mounts and the sum of all journal pins is kept
reasonable.
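
One way to handle the multiple-mount case, sketched in Python: scale every mount's pin limit down proportionally so that the limits never add up to more than one global budget. The function and the mount points other than /.3 are hypothetical, and proportional scaling is just one possible policy.

```python
def split_pin_budget(total_budget_pages, journal_pages_by_mount):
    """Give each mount a pin limit so the limits sum to at most the budget.

    If all journals already fit, every mount keeps its full journal size;
    otherwise each limit is scaled down in proportion to its journal.
    """
    wanted = sum(journal_pages_by_mount.values())
    if wanted <= total_budget_pages:
        return dict(journal_pages_by_mount)
    return {
        mount: pages * total_budget_pages // wanted
        for mount, pages in journal_pages_by_mount.items()
    }


# Example: three mounts sharing a budget of 2048 pages.
limits = split_pin_budget(2048, {"/": 8192, "/.3": 8192, "/home": 1024})
```

Integer division rounds each share down, so the total can fall slightly below the budget but never above it.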

See my earlier post if you care.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2006-02-03 05:57:45

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Thursday 02 February 2006 21:32, Bill Davidsen wrote:
> >>reiserfstune -s 1024 /dev/xxxx
> >
> > I had reiserfsprogs 3.6.11 and reiserfstune (above command) made my /dev/sdc3
> > unmountable without -t reiserfs. I upgraded reiserfsprogs to 3.6.19 and now
> > reiserfsck /dev/sdc3 reports no problems, but mount problem persists:
> >
> > # mount -t reiserfs /dev/sdc3 /.3
> > # umount /.3
> > # mount /dev/sdc3 /.3
> > mount: you must specify the filesystem type
> > # dmesg | tail -3
> > br: port 1(ifi) entering forwarding state
> > FAT: bogus number of reserved sectors
> > VFS: Can't find a valid FAT filesystem on dev sdc3.
> >
> > "chown -Rc <n>:<m> ." now does not OOM kill the box, so this issue
> > is resolved, thanks!
> >
> > Can I restore sdc3 somehow that I won't need -t reiserfs in mount command?
> > You can find result of
> >
> > dd if=/dev/sdc3 of=1m bs=1M count=1
> >
> > at http://195.66.192.167/linux/1m
>
> At the risk of stating the obvious:
> 1 - is reiser a module, and is it loaded?
> 2 - did this ever work? I think you said you removed the entry from
> fstab, was the filetype there which made it work?

1 - not a module, 2 - fstab line was "/dev/sdc3 /.3 auto noatime,rw 1 1".
But anyway, the mount problem is solved now: mount from util-linux 2.11p
tried to "autodetect" the fs and thought it was a FAT partition.

mount from busybox 1.0 works.
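
For anyone who wants to check what an autodetecting tool would see, here is a small Python sketch that looks for the reiserfs magic string in a dump such as the 1m file above. The superblock offset (64KB), the magic offset within it (52 bytes) and the three magic strings are taken from the kernel's reiserfs headers as far as I know them; the function name is made up. That reiserfstune switched the magic to the non-standard-journal variant, and that an older mount does not recognise it, is only my guess and is not confirmed.

```python
REISERFS_SB_OFFSET = 64 * 1024  # superblock location in the current format
MAGIC_OFFSET = 52               # offset of the magic string in the superblock

MAGICS = {
    b"ReIsErFs": "reiserfs 3.5 format",
    b"ReIsEr2Fs": "reiserfs 3.6 format",
    b"ReIsEr3Fs": "reiserfs with non-standard journal",
}


def detect_reiserfs(image):
    """Return a description of the reiserfs magic found in `image`, or None.

    `image` is the raw bytes from the start of the partition.
    """
    start = REISERFS_SB_OFFSET + MAGIC_OFFSET
    field = image[start:start + 10]
    for magic, name in MAGICS.items():
        if field.startswith(magic):
            return name
    return None
```

Usage would be along the lines of detect_reiserfs(open("1m", "rb").read()) on the dump made with dd.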
--
vda

2006-02-03 06:21:48

by Denis Vlasenko

Subject: Re: Recursive chmod/chown OOM kills box with 32MB RAM

On Wednesday 01 February 2006 17:21, Bernd Eckenfels wrote:
> Denis Vlasenko <[email protected]> wrote:
> > # umount /.3
> > umount: /.3: device is busy
> > # lsof -nP | grep -F '/.3'
> > # lsof -nP | grep -F 'sdc'
>
> You can try "lsof +f -- /.3" also

Didn't help, but later I found it.
Kernel nfsd daemon was keeping it busy.
It's not visible in lsof.
--
vda