Hello,
While doing some more testing, I found that calling swapoff()
on a regular file causes a kernel Oops. It should simply
return -1 with errno set to EINVAL. I'm using a Red Hat
2.4.20-13.9 kernel and opened RH bugzilla#91603 to document
it.
The reason I'm contacting the lkml is that there may be
other distributions affected, or the 2.5 kernel may have this
issue, too. I found the problem using the following
program:
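#include <errno.h>
#include <fcntl.h>      /* creat() */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/swap.h>   /* swapoff() */
#include <unistd.h>

int main(void)
{
    int fd, ret;

    if (geteuid() != 0) {
        puts("Must be super/root for this test!");
        return 1;
    }
    /* creat() returns -1 on failure, not 0 */
    fd = creat("./abcd", S_IRWXU);
    if (fd == -1) {
        printf("Unable to setup abcd\n");
        return 1;
    }
    close(fd);
    /* abcd is a plain file, not swap: expect -1 with errno == EINVAL */
    ret = swapoff("./abcd");
    if (ret == -1 && errno != EINVAL) {
        printf("%d returned instead of EINVAL.\n", errno);
        return 1;
    }
    unlink("./abcd");
    return 0;
}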
What I get in my logs is this:
May 25 12:59:58 dds kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 0000026e
May 25 12:59:58 dds kernel: printing eip:
May 25 12:59:58 dds kernel: c0149985
May 25 12:59:58 dds kernel: *pde = 00000000
May 25 12:59:58 dds kernel: Oops: 0002
May 25 12:59:58 dds kernel: parport_pc lp parport 3c59x ipv6 ipt_LOG ipt_state iptable_nat ip_conntrack iptable_filter ip_tables ide-scsi scsi_mod ide-cd cdrom loop lvm-mod keybdev mouse
May 25 12:59:58 dds kernel: CPU: 0
May 25 12:59:58 dds kernel: EIP: 0060:[<c0149985>] Not tainted
May 25 12:59:58 dds kernel: EFLAGS: 00010202
May 25 12:59:58 dds kernel:
May 25 12:59:58 dds kernel: EIP is at path_release [kernel] 0x15 (2.4.20-13.9)
May 25 12:59:58 dds kernel: eax: c1ac6f84 ebx: c2e5ff90 ecx: ffffffff edx: 00000246
May 25 12:59:58 dds kernel: esi: 00000002 edi: ffffffea ebp: c0c3cbe0 esp: c2e5ff84
May 25 12:59:58 dds kernel: ds: 0068 es: 0068 ss: 0068
May 25 12:59:58 dds kernel: Process sigtest (pid: 1900, stackpage=c2e5f000)
May 25 12:59:58 dds kernel: Stack: c037ae88 c013a831 c2e5ff90 c1ac6f84 00000246 00000003 c013f2f0 c1ac6f84
May 25 12:59:58 dds kernel: cf814000 c2e5e000 00000004 c2e5e000 40012820 bffff624 bffff5c8 c0109103
May 25 12:59:58 dds kernel: 080484e8 000001c0 4014e9a0 40012820 bffff624 bffff5c8 00000073 0000002b
May 25 12:59:58 dds kernel: Call Trace: [<c013a831>] sys_swapoff [kernel] 0x191 (0xc2e5ff88))
May 25 12:59:58 dds kernel: [<c013f2f0>] sys_open [kernel] 0x70 (0xc2e5ff9c))
May 25 12:59:58 dds kernel: [<c0109103>] system_call [kernel] 0x33 (0xc2e5ffc0))
May 25 12:59:58 dds kernel:
May 25 12:59:58 dds kernel:
May 25 12:59:58 dds kernel: Code: ff 4a 28 0f 94 c0 84 c0 75 02 5b c3 89 54 24 08 5b e9 65 c3
CPU is K6-2, fs is ext3 on ide. OS is RH9.
If any more info is needed, let me know. Please try the
program first, though. Hopefully, my machine isn't the only
one doing this.
-Steve Grubb
> program first, though. Hopefully, my machine isn't the only
> one doing this.
Well, I had this problem when running RH8 with kernel 2.4.20-8 from RH9.
Switching completely to RH9 solved it (and I'm now using 2.4.20-9, which
works, too). I'm using a 1GB swap file on an ext3 partition.
Daniel
>Well, I had this problem when using RH8, but using kernel
>2.4.20-8 from RH9.
OK, I've investigated this further. It seems that all Red
Hat 2.4.18 kernels are immune to the swapoff problem. All
2.4.20 kernels (including the brand new one) have this bug.
I don't think anyone has tried the program I sent, since it
would have caused an Oops and you'd see what I mean. I
don't think this is a Red Hat problem either; I think it
is generic to recent kernels.
Looking at the sys_swapoff function in mm/swapfile.c of the
2.4.20 kernel source, the bug goes like this. swapoff
checks permissions, which is OK. It then gets the nameidata
entry for the filename and checks whether the file is on
the swap list, but it's not (remember, mkswap was never
called). err is set to -EINVAL and it jumps to out_dput
(line 792). The kernel is unlocked and path_release() is
called (fs/namei.c line 253).
path_release() will unmount the entry, and this is where the
Oops occurs. It was never mounted; it is simply a regular
file. It seems like there should be some check, when err ==
-EINVAL, that the file is in fact mounted. Looking at
__mntput (which is an inline function, and maybe that's why
it's not in the Oops call stack), it implicitly trusts that
the mnt parameter is not NULL and is valid. dput() at least
checks for NULL and does nothing.
I'm not too familiar with the kernel internals, but maybe
someone else can take what I've said and figure out the
right fix.
Best Regards,
-Steve Grubb
On Sun, May 25, 2003 at 10:58:06AM -0700, Steve G wrote:
> Hello,
>
> What I get in my logs is this:
You didn't run that through ksymoops...
>You didn't run that through ksymoops...
Did you try the program? Does it Oops for you?
Here's the info from ksymoops:
>>EIP; c0149985 <path_release+15/30> <=====
>>eax; c1ac6f84 <_end+171cf04/10462fe0>
>>ebx; ce261f90 <_end+deb7f10/10462fe0>
>>ecx; ffffffff <END_OF_CODE+2f685ba0/????>
>>edx; 00200246 Before first symbol
>>edi; ffffffea <END_OF_CODE+2f685b8b/????>
>>ebp; c503f4e0 <_end+4c95460/10462fe0>
>>esp; ce261f84 <_end+deb7f04/10462fe0>
Trace; c013a831 <sys_swapoff+191/260>
Trace; c013f2f0 <sys_open+70/80>
Trace; c0109103 <system_call+33/40>
Code; c0149985 <path_release+15/30>
00000000 <_EIP>:
Code; c0149985 <path_release+15/30> <=====
0: ff 4a 28 decl 0x28(%edx) <=====
Code; c0149988 <path_release+18/30>
3: 0f 94 c0 sete %al
Code; c014998b <path_release+1b/30>
6: 84 c0 test %al,%al
Code; c014998d <path_release+1d/30>
8: 75 02 jne c <_EIP+0xc>
Code; c014998f <path_release+1f/30>
a: 5b pop %ebx
Code; c0149990 <path_release+20/30>
b: c3 ret
Code; c0149991 <path_release+21/30>
c: 89 54 24 08 mov %edx,0x8(%esp,1)
Code; c0149995 <path_release+25/30>
10: 5b pop %ebx
Code; c0149996 <path_release+26/30>
11: e9 65 c3 00 00 jmp c37b <_EIP+0xc37b>
2 warnings and 3 errors issued. Results may not be reliable.
Curiously, the brand new RH 2.4.20-18.9 kernel produces two
Oopses from a single run of the program. When I run it from a
gnome terminal, this is what comes immediately after the first
Oops:
>>EIP; c0134973 <__kmem_cache_alloc+73/e0> <=====
>>eax; 84ac6f83 Before first symbol
>>ebx; c1a61000 <_end+16b6f80/10462fe0>
>>ecx; c1ac1f20 <_end+1717ea0/10462fe0>
>>edx; c1ac6f8c <_end+171cf0c/10462fe0>
>>esi; c1ac6f84 <_end+171cf04/10462fe0>
>>edi; 00200246 Before first symbol
>>ebp; c1a61000 <_end+16b6f80/10462fe0>
>>esp; c8caff40 <_end+8905ec0/10462fe0>
Trace; c013432f <kmem_cache_alloc+f/20>
Trace; c014972d <getname+1d/b0>
Trace; c014a5ee <__user_walk+e/40>
Trace; c0146aa6 <sys_readlink+26/90>
Trace; c011e9a2 <sys_gettimeofday+22/60>
Trace; c0109103 <system_call+33/40>
Code; c0134973 <__kmem_cache_alloc+73/e0>
00000000 <_EIP>:
Code; c0134973 <__kmem_cache_alloc+73/e0> <=====
0: 89 48 04 mov %ecx,0x4(%eax) <=====
Code; c0134976 <__kmem_cache_alloc+76/e0>
3: 89 71 04 mov %esi,0x4(%ecx)
Code; c0134979 <__kmem_cache_alloc+79/e0>
6: eb d9 jmp ffffffe1 <_EIP+0xffffffe1>
Code; c013497b <__kmem_cache_alloc+7b/e0>
8: 8d 46 10 lea 0x10(%esi),%eax
Code; c013497e <__kmem_cache_alloc+7e/e0>
b: 8b 4e 10 mov 0x10(%esi),%ecx
Code; c0134981 <__kmem_cache_alloc+81/e0>
e: 39 c1 cmp %eax,%ecx
Code; c0134983 <__kmem_cache_alloc+83/e0>
10: 74 20 je 32 <_EIP+0x32>
Code; c0134985 <__kmem_cache_alloc+85/e0>
12: 8b 41 00 mov 0x0(%ecx),%eax
2 warnings and 3 errors issued. Results may not be reliable.
It would be nice to get confirmation of whether this is a
generic problem or just a RH problem. Can someone running
another distro or a plain vanilla kernel > 2.4.18 give the
program a try?
http://marc.theaimsgroup.com/?l=linux-kernel&m=105388560111905&w=2
Thanks,
Steve Grubb
On Thu, Jun 05, 2003 at 02:13:28PM -0700, Steve G wrote:
> >You didn't run that through ksymoops...
>
> Did you try the program? Does it Oops for you?
Didn't try it yet. Sorry. Will try later.