Hi Trond,
I'm using NFS (v2) over TCP (in an SSH tunnel).
Each time SSH dies before I unmount the NFS share, I have to run umount -f,
and I get a crash (only sysrq works).
Actually, the crash occurs a few seconds after umount -f.
It seems that killing SSH by hand does _not_ lead to a crash,
but a long network failure does.
I remember seeing this bug several times with all stable releases
from 2.6.7 to 2.6.11. I didn't try earlier versions.
I didn't see anything in the logs (after reboot). But I can't be sure
there was nothing in dmesg, since I didn't get a chance to chvt 1 and
see the console messages before rebooting (with sysrq).
Do you have any idea how to debug this?
Thanks,
Brice
On Friday 22.04.2005 at 14:32 (+0200), Brice Goglin wrote:
> I'm using NFS (v2) over TCP (in a SSH tunnel).
> Each time the SSH dies before a umount NFS, I have to umount -f
> and I get a crash (only sysrq works).
> Actually, the crash occurs a few seconds after umount -f.
I'll try to reproduce it. There has just been a discussion about
"umount -f" on the NFS mailing list ([email protected]), where Peter
Cendio said he was seeing the following Oops:
http://www.cendio.se/~peter/fc3-umount-crash.png
I am unable to reproduce Peter's crash, but I didn't try the scenario
that you describe above.
Cheers,
Trond
--
Trond Myklebust <[email protected]>
Brice Goglin wrote:
> Do you have any idea how to debug this?
No clue, but a question: is this a hard or soft mount? Could you post
your ssh and mount commands, munged as needed for security? That might
give someone a clue.
I did this "back when" but I don't recall having a problem with it.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
Bill Davidsen wrote:
> No clue, but a question: is this a hard or soft mount? Could you post
> your ssh and mount commands, munged as needed for security? That might
> give someone a clue.
The ssh command is just
$ ssh kwad -L 2249:localhost:2049 -L 2248:localhost:870 -N -f
(the NFS port 2049 is forwarded to local port 2249, while the mount port 870 is forwarded to 2248)
The options in /proc/mounts are
rw,v2,rsize=8192,wsize=8192,hard,tcp,nolock,addr=localhost
I just had another network failure. I ran umount -f from vt1 to see the
kernel messages. I waited for about a minute but didn't get any crash,
so I switched back to X... and got the crash then.
It looks like this crash doesn't want me to see any messages...
Brice
Brice Goglin wrote:
> So I switched back to X... and got the crash then.
> Looks like this crash doesn't want me to see any message...
I just got it through netconsole.
Unfortunately, the call trace doesn't appear.
Maybe netconsole didn't have time to send it before the crash.
Hope this helps.
Brice
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
RPC: error 5 connecting to server localhost
Unable to handle kernel paging request at virtual address ffffff98
printing eip:
e0aaa07a
*pde = 00002067
*pte = 00000000
Oops: 0002 [#1]
PREEMPT
Modules linked in: netconsole sd_mod usb_storage vfat fat loop isofs
zlib_inflate nls_cp850 nls_iso8859_15 smbfs nfs lockd sunrpc i915 tun
ipt_MASQUERADE iptable_nat ipt_state ip_conntrack iptable_filter
ip_tables floppy uhci_hcd ehci_hcd dm_mod snd_intel8x0 snd_ac97_codec
CPU: 0
EIP: 0060:[<e0aaa07a>] Not tainted VLI
EFLAGS: 00010297 (2.6.11=Macvin)
EIP is at rpc_wake_up_status+0x6a/0x80 [sunrpc]
eax: ffffff84 ebx: d0065888 ecx: 00000001 edx: c146e000
esi: fffffffb edi: d0065888 ebp: d0065800 esp: c146ef14
ds: 007b es: 007b ss: 0068
Process events/0 (pid: 3, threadinfo=c146e000 task=c1473020)
Stack: c146ef44 d0065800 00000283 fffffffb e0aa710e d0065888 fffffffb
00120dcb c1473184 00000000 d0065904
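A few values in this oops can be decoded with plain 32-bit arithmetic. The bash sketch below is an editorial aid, not part of the original thread; it assumes a bash build with 64-bit arithmetic, and the helper name to_signed32 is hypothetical. It only shows that esi (fffffffb, also on the stack) is -5, which would be -EIO and is consistent with the "RPC: error 5" lines, and that the faulting address ffffff98 equals eax + 0x14. Whether eax really held the pointer being written through is a guess that a disassembly of rpc_wake_up_status would have to confirm.

```shell
#!/usr/bin/env bash
# Decode a few 32-bit i386 values from the oops (assumes 64-bit bash arithmetic).

# Hypothetical helper: convert a 32-bit hex word (no 0x prefix) to signed decimal.
to_signed32() {
  local v=$(( 16#$1 ))
  if (( v >= 0x80000000 )); then
    v=$(( v - 0x100000000 ))
  fi
  echo "$v"
}

# esi = fffffffb -> -5; EIO is 5 on Linux, matching "RPC: error 5".
echo "esi = $(to_signed32 fffffffb)"

# eax + 0x14 reproduces the faulting virtual address from the oops.
printf 'eax + 0x14 = %x\n' $(( 16#ffffff84 + 0x14 ))

# x86 page-fault error code 0002: bit 0 clear = page not present,
# bit 1 set = write access (bit 2 clear = kernel mode).
code=0x0002
(( code & 1 )) && echo "protection fault" || echo "page not present"
(( code & 2 )) && echo "write access" || echo "read access"
```

Read together, these suggest (but do not prove) that rpc_wake_up_status was writing through a bogus pointer while waking tasks with status -EIO.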
On Thursday 05.05.2005 at 12:17 (+0200), Brice Goglin wrote:
> EIP is at rpc_wake_up_status+0x6a/0x80 [sunrpc]
Have you tried the attached patch? Andrew has already included it in the
-mm series.
Cheers,
Trond