Hello Trond, hello all,
I'm suffering from the long present bug in the nfs client.
This bug cause programs reading from NFS volume to stuck in D state forever.
This bug revealed only when client talks to NFS server with 3COM 3C905
NIC's ( well I'v triggered it with Intel eepro card too, but you have to
wait) and never with cheap slower cards like RTLxxxx, NE2000 clones. It
happens infrequently but inevitably. It happens more frequentlly on
2.4.17 kernel then on 2.4.21-pre5, when compiled by gcc 3.2.1 then gcc
2.95.3. Trond's NFS patches doesn't help on both kernels. It's not due
to packets loss (ok, it happens some times but rarely), it happens on
both 10 and 100 Mbps. This happens only on my StrongARM board (similar
to Brutus) with SMC's LAN91C111 ethernet chip. I 've not able to
reproduce this on PC, but i've head about very similar case:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0206.0/0066.html
It triggered simply by 'ls -lR /home>/dev/null&' and takes ~ half a
minute to happend.
If i insert a few printk's in the interrupt handler for NIC, it's gone !!!
IMHO this is due to the race in the nfs client.
Look at some logs from my system:
sh-2.03# mount
rootfs on / type rootfs (rw)
/dev/mtdblock4 on / type jffs2 (rw)
none on /proc type proc (rw)
none on /tmp type tmpfs (rw)
none on /dev/pts type devpts (rw)
infracvs:/group on /group type nfs
(rw,v2,rsize=4096,wsize=4096,soft,intr,udp,lock,addr=infracvs)
serov:/home on /home type nfs
(rw,v3,rsize=4096,wsize=4096,soft,intr,udp,lock,addr=serov)
sh-2.03# ps
PID Uid Stat Command
1 root S init
2 root S [keventd]
3 root S [ksoftirqd_CPU0]
4 root S [kswapd]
5 root S [bdflush]
6 root S [kupdated]
7 root S [mtdblockd]
8 root S [jffs2_gcd_mtd4]
102 root S dhcpcd
111 bin S portmap
113 root S [rpciod]
114 root S [lockd]
124 root S klogd
138 root S /usr/sbin/inetd
143 root S /www/sbin/sshd -f /www/etc/sshd_config
152 root S init
153 root S sh -login -i
158 root D ls -lR /home
159 root D ls -lR /home
179 root D ls -lR /home
183 root R ps
Part of output from Magic SysRq t with decoded symbols:
ls D C001EB8C 3216 165 153 (NOTLB)
Function entered at [<c001e990>] from [<c0109b14>]
schedule __rpc_execute
I've used /proc/sys/sunrpc/rpc_debug and /proc/sys/sunrpc/nfs_debug to
get some info, it was nothing interesting in it exept the fact that rpc
request wich was constantly reused after 'ls' stuck is appeared inthe
following message in the --rqstp- column.
sh-2.03# echo 1 > /proc/sys/sunrpc/rpc_debug
sh-2.03# dmesg -c -s 66666
-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait
-action- --exit--
20429 0001 0000 000000 c0eda960 100003 c8f89218 00000000 <NULL>
c0105d5c 0
10052 0001 0000 000000 c0eda960 100003 c8f8918c 00000000 <NULL>
c0105d5c 0
06851 0001 0000 000000 c0eda960 100003 c8f89100 00000000 <NULL>
c0105d5c 0
00673 0004 0000 000000 c0eda960 100003 c8f89074 00000000 <NULL>
c0105d5c 0
00368 0000 0081 -00110 c0eda960 100003 0 00003000 nfs_flushd
c006e290 c006e3c8
00002 0000 0081 -00110 c0e310a0 100003 0 00003000 nfs_flushd
c006e290 c006e3c8
c006e290 t nfs_flushd
c006e3c8 t nfs_flushd_exit
c0105d5c t call_status
>>>>> " " == Vladimir Serov <[email protected]> writes:
> interrupt handler for NIC, it's gone !!! IMHO this is due to
> the race in the nfs client.
Why would an NFS race show up only on PPC? Do you have a tcpdump?
Cheers,
Trond
Trond Myklebust wrote:
>>>>>>" " == Vladimir Serov <[email protected]> writes:
>>>>>>
>>>>>>
>
> > interrupt handler for NIC, it's gone !!! IMHO this is due to
> > the race in the nfs client.
>
>Why would an NFS race show up only on PPC? Do you have a tcpdump?
>
>
Hi , Trond
As I wrote , another persone has similar problems on PC's, as to me it
was a big suprise to see such a problem in nfs, cause i'm using it for
over 10 yers in a different setups's OS's , etc. Yes I have tcpdump ,
and as i wrote, nothing wrong is going on with packet receiption, where
is now corrupted packets , no error messages, NOTHING !!!! Just RPC
request gets lost, I mean not correctly connected to the some queue or
caller. It last for over a year , and is a big pain in the ass of
company i'm working for now.
With best regards, Vladimir.
>>>>> " " == Vladimir Serov <[email protected]> writes:
> Trond Myklebust wrote:
>>>>>>> " " == Vladimir Serov <[email protected]> writes:
>>>>>>>
>>
>> > interrupt handler for NIC, it's gone !!! IMHO this is due to
>> > the race in the nfs client.
>>
>> Why would an NFS race show up only on PPC? Do you have a
>> tcpdump?
>>
> Hi , Trond As I wrote , another persone has similar problems on
> PC's, as to me it was a big suprise to see such a problem in
No that wasn't the same problem. IIRC that other person had faulty
hardware. To my knowledge, there are no outstanding problems with
hangs under 2.4.x.
> nfs, cause i'm using it for over 10 yers in a different
> setups's OS's , etc. Yes I have tcpdump , and as i wrote,
> nothing wrong is going on with packet receiption, where is now
> corrupted packets , no error messages, NOTHING !!!! Just RPC
Can I see that tcpdump in order to judge that for myself?
Cheers,
Trond
Hi Trond ,
I'm calling you again, to the ask for the help with the same problem on
ARM nfs client.
I've managed to trace this problem down to the the specific routine
where control reaches uninterruptible sleep.
It's in 'fs/nfs/inode.c' : __nfs_revalidate_inode(struct nfs_server
*server, struct inode *inode):
while (NFS_REVALIDATING(inode)) {
dfprintk(PAGECACHE, "NFS: nfs_wait_on_inode()\n");
status = nfs_wait_on_inode(inode, NFS_INO_REVALIDATING);
if (status < 0)
goto out_nowait;
if (time_before(jiffies,NFS_READTIME(inode)+NFS_ATTRTIMEO(inode))) {
status = NFS_STALE(inode) ? -ESTALE : 0;
goto out_nowait;
}
}
control sleeps forever in nfs_wait_on_inode(). I can't bet, but as far
as i broused dmesg output every call to nfs_wait_on_inode() in this
place leads to uninterruptible sleep forever. I like to notice that
inode->i_count i'm printing is 2 when this problem happens instead of 1
when things are OK. Also the rpc client in this request is c0d75060
which is mentioned in rpc queue status:
-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait
-action- --exit--
09150 0001 0000 000000 c0d75060 100003 c8f99074 00000000 <NULL>
c00f17b8 0
00683 0000 0081 -00110 c0d75060 100003 0 00030000 nfs_flushd
c0066c04 c0066d3c
00002 0000 0081 -00110 c8f93c80 100003 0 00030000 nfs_flushd
c0066c04 c0066d3c
my current kernel is 2.4.21-pre6 based. I'm using HARD mounted nfs
volume now !!!
here is the tail of last dmesg output:
NFS: revalidating (9/1246183/1246183 count=1)
NFS: lock_kernel()
NFS: exit while (NFS_REVALIDATING(inode))
NFS call getattr
RPC: 25429 new task procpid 179
RPC: 25429 looking up UNIX cred
RPC: 25429 rpc_execute flgs 0
RPC: 25429 deleting timer
RPC: 25429 call_start nfs3 proc 1 (sync)
RPC: 25429 deleting timer
RPC: 25429 call_reserve
RPC: 25429 reserved req c8f99100 xid 00011353
RPC: 25429 deleting timer
RPC: 25429 call_reserveresult (status 0)
RPC: 25429 deleting timer
RPC: 25429 call_allocate (status 0)
RPC: allocated buffer c8e70800
RPC: 25429 deleting timer
RPC: 25429 call_encode (status 0)
RPC: 25429 marshaling UNIX cred c0db30a0
RPC: 25429 deleting timer
RPC: 25429 call_bind xprt c8f99000 is connected
RPC: 25429 deleting timer
RPC: 25429 call_transmit (status 0)
RPC: 25429 xprt_transmit(11353)
RPC: 25429 xprt_cwnd_limited cong = 0 cwnd = 512
RPC: udp_data_ready...
RPC: udp_data_ready client c8f99000
RPC: 25429 received reply
RPC: cong 256, cwnd was 512, now 512
RPC: 25429 has input (112 bytes)
RPC: xprt_sendmsg(0) = 104
RPC: 25429 xmit complete
RPC: wake_up_next(c8f9904c "xprt_resend")
RPC: wake_up_next(c8f99040 "xprt_sending")
RPC: 25429 deleting timer
RPC: 25429 call_status (status 112)
RPC: 25429 deleting timer
RPC: 25429 call_decode (status 112)
RPC: 25429 validating UNIX cred c0db30a0
RPC: 25429 call_decode result 0 decode 0xc0071920
RPC: 25429 deleting timer
RPC: 25429 exit() = 0
RPC: 25429 release task
RPC: 25429 disabling timer
RPC: 25429 deleting timer
RPC: wake_up_next(c8f9904c "xprt_resend")
RPC: wake_up_next(c8f99040 "xprt_sending")
RPC: 25429 release request c8f99100
RPC: wake_up_next(c8f99064 "xprt_backlog")
RPC: 25429 releasing UNIX cred c0db30a0
RPC: rpc_release_client(c0d75060, 3)
NFS reply getattr
NFS: nfs_refresh_inode(inode, &fattr)
NFS: refresh_inode(9/1246183 ct=1 info=0x6)
NFS: (9/1246183) revalidation complete
NFS: out:
NFS: out_nowait:
NFS: unlock_kernel()
NFS: dentry_delete(linux/802_11.h, 0)
RPC: looking up UNIX cred
RPC: looking up UNIX cred
RPC: looking up UNIX cred
RPC: looking up UNIX cred
RPC: looking up UNIX cred
RPC: looking up UNIX cred
RPC: looking up UNIX cred
NFS: revalidating (9/1246184/1246184 count=2)
NFS: lock_kernel()
NFS: nfs_wait_on_inode()
NFS: nfs_wait_on_inode: clnt=c0d75060 nfs_wait_event() UN interruptible
Cheers, Vladimir.
>>>>> Vladimir Serov <[email protected]> writes:
> when things are OK. Also the rpc client in this request is
> c0d75060 which is mentioned in rpc queue status:
> -pid- proc flgs status -client- -prog- --rqstp- -timeout
> -rpcwait -action- --exit--
> 09150 0001 0000 000000 c0d75060 100003 c8f99074 00000000
> <NULL> c00f17b8 0
Looks like there is a hanging GETATTR call from another process that
is blocking your process.
Which procedure does c00f17b8 correspond to? You can use gdb on the
'vmlinux' file (NB!: *not* the compressed vmlinuz), then
'disassemble 0xc00f17b8').
Cheers,
Trond
Hi Trond,
> > when things are OK. Also the rpc client in this request is
> > c0d75060 which is mentioned in rpc queue status:
>
> > -pid- proc flgs status -client- -prog- --rqstp- -timeout
> > -rpcwait -action- --exit--
> > 09150 0001 0000 000000 c0d75060 100003 c8f99074 00000000
> > <NULL> c00f17b8 0
>
>
>Looks like there is a hanging GETATTR call from another process that
>is blocking your process.
>
>Which procedure does c00f17b8 correspond to?
>
from the System.map :
c00f17b8 t call_status
Regards, Vladimir.
>>>>> " " == Vladimir Serov <[email protected]> writes:
>>
>> Looks like there is a hanging GETATTR call from another process
>> that is blocking your process.
>>
>> Which procedure does c00f17b8 correspond to?
>>
> from the System.map :
> c00f17b8 t call_status
OK. Looks as if the process isn't being woken up in
rpc_execute(). Does the following make any difference?
Cheers,
Trond
--- linux-2.4.21-up/include/linux/sunrpc/sched.h.orig 2003-05-13 17:51:59.000000000 +0200
+++ linux-2.4.21-up/include/linux/sunrpc/sched.h 2003-05-13 23:04:44.000000000 +0200
@@ -128,7 +128,12 @@
#define RPC_IS_RUNNING(t) (test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
#define rpc_set_running(t) (set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
-#define rpc_clear_running(t) (clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
+#define rpc_clear_running(t) \
+ do { \
+ smp_mb__before_clear_bit(); \
+ clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate); \
+ smp_mb__after_clear_bit(); \
+ } while(0)
#define rpc_set_sleeping(t) (set_bit(RPC_TASK_SLEEPING, &(t)->tk_runstate))
--- linux-2.4.21-up/include/linux/sunrpc/xprt.h.orig 2003-05-13 17:51:59.000000000 +0200
+++ linux-2.4.21-up/include/linux/sunrpc/xprt.h 2003-05-13 23:06:41.000000000 +0200
@@ -200,7 +200,12 @@
#define xprt_connected(xp) (test_bit(XPRT_CONNECT, &(xp)->sockstate))
#define xprt_set_connected(xp) (set_bit(XPRT_CONNECT, &(xp)->sockstate))
#define xprt_test_and_set_connected(xp) (test_and_set_bit(XPRT_CONNECT, &(xp)->sockstate))
-#define xprt_clear_connected(xp) (clear_bit(XPRT_CONNECT, &(xp)->sockstate))
+#define xprt_clear_connected(xp) \
+ do { \
+ smp_mb__before_clear_bit(); \
+ clear_bit(XPRT_CONNECT, &(xp)->sockstate); \
+ smp_mb__after_clear_bit(); \
+ } while(0)
#endif /* __KERNEL__*/
Hi Trond , Hi Russell,
I have some new info on this long standing and obscure problem with ARM
nfs client. I found the initial place of nfs failure. Sometimes 'ls -lR
/home' command sleeps inside __rpc_execute() function and doesn't get
really woken up. While everything looks like going correctly and
wake_up(&task->tk_wait) is called (with rpc_clear_sleeping(task)
previously done) in rpc_make_runnable() function, and the address of
wait queue ( &task->tk_wait ) used in __wait_event(task->tk_wait,
!RPC_IS_SLEEPING(task)) and wake_up(&task->tk_wait) is the same. But for
some reason task doesn't get woken up !!!
I'm not shure which layer of kernel code could be responsible for this,
and where to look in now.
Help, please !!!
My current kernel is 2.4.21-pre6 based with patches from 2.4.19-rmk7
applied (well partially, except ide and pci cause i don't have them,
board is mostly brutus). I'm using HARD mounted nfs
volume now !!! The tail of dmesg is following.
NFS: lookup(LC_MESSAGES/kmid.mo)
NFS: lookup nfs_cached_lookup()
NFS: lookup nfs_fhget()
NFS: nfs_fhget(LC_MESSAGES/kmid.mo fileid=3227729)
NFS: refresh_inode(9/3227729 ct=1 info=0x6)
NFS: __nfs_fhget(9/3227729 ct=1)
NFS: lookup nfs_renew_times()
NFS: lookup out:
NFS: revalidating (9/3227729/3227729 count=1)
NFS: lock_kernel()
NFS: exit while (NFS_REVALIDATING(inode))
NFS call getattr
RPC: 3871 new task procpid 171
RPC: 3871 looking up UNIX cred
RPC: 3871 rpc_execute flgs 0
RPC: 3871 deleting timer
RPC: 3871 call_start nfs3 proc 1 (sync)
RPC: 3871 deleting timer
RPC: 3871 call_reserve
RPC: 3871 reserved req c0e0f074 xid 00032f1d
RPC: 3871 deleting timer
RPC: 3871 call_reserveresult (status 0)
RPC: 3871 deleting timer
RPC: 3871 call_allocate (status 0)
RPC: allocated buffer c0dd1000
RPC: 3871 deleting timer
RPC: 3871 call_encode (status 0)
RPC: 3871 marshaling UNIX cred c8f3a0a0
RPC: 3871 deleting timer
RPC: 3871 call_bind xprt c0e0f000 is connected
RPC: 3871 deleting timer
RPC: 3871 call_transmit (status 0)
RPC: 3871 xprt_transmit(32f1d)
RPC: 3871 xprt_cwnd_limited cong = 0 cwnd = 512
RPC: xprt_sendmsg(0) = 104
RPC: 3871 xmit complete
RPC: 3871 sleep_on(queue "xprt_pending" time 1248461)
RPC: 3871 added to queue c0e0f058 "xprt_pending"
RPC: 3871 setting alarm for 4 ms
RPC: wake_up_next(c0e0f04c "xprt_resend")
RPC: wake_up_next(c0e0f040 "xprt_sending")
RPC: 3871 sync task going to sleep on c0e83e70
RPC: udp_data_ready...
RPC: udp_data_ready client c0e0f000
RPC: 3871 received reply
RPC: cong 256, cwnd was 512, now 512
RPC: wake_up_next(c0e0f04c "xprt_resend")
RPC: wake_up_next(c0e0f040 "xprt_sending")
RPC: 3871 has input (112 bytes)
RPC: 3871 __rpc_wake_up_task (now 1248462 inh 0)
RPC: 3871 disabling timer
RPC: 3871 removed from queue c0e0f058 "xprt_pending"
RPC: 3871 sync task was woken up on c0e83e70 in rpc_make_runnable() !!!
RPC: __rpc_wake_up_task done
NFS: dentry_delete(hu/LC_MESSAGES, 0)
NFS: dentry_delete(il/LC_MESSAGES, 0)
NFS: dentry_delete(locale/br, 0)
NFS: dentry_delete(locale/ca, 0)
NFS: dentry_delete(locale/cs, 0)
NFS: dentry_delete(locale/da, 0)
NFS: dentry_delete(share/ghostscript, 0)
NFS: dentry_delete(licq/qt-gui, 0)
NFS: dentry_delete(locale/de, 0)
NFS: dentry_delete(is/LC_MESSAGES, 0)
NFS: dentry_delete(it/LC_MESSAGES, 0)
NFS: dentry_delete(ja/LC_MESSAGES, 0)
NFS: dentry_delete(ko/LC_MESSAGES, 0)
NFS: dentry_delete(mk/LC_MESSAGES, 0)
NFS: dentry_delete(nl/LC_MESSAGES, 0)
RPC: 656 running timer
RPC: 656 timeout (default timer)
RPC: 656 __rpc_wake_up_task (now 1256944 inh 0)
RPC: 656 disabling timer
RPC: 656 removed from queue c0138a1c "nfs_flushd"
RPC: 656 added to queue c013e6e8 "schedq"
RPC: __rpc_wake_up_task done
RPC: switch to rpciod
RPC: rpc_schedule enter
RPC: 656 removed from queue c013e6e8 "schedq"
RPC: 656 rpc_execute flgs 81
RPC: 656 deleting timer
NFS: 656 flushd starting
NFS: 656 flushd back to sleep
RPC: 656 sleep_on(queue "nfs_flushd" time 1256944)
RPC: 656 added to queue c0138a1c "nfs_flushd"
RPC: 656 setting alarm for 30000 ms
RPC: rpc_schedule leave
RPC: rpciod back to sleep
RPC: 2 running timer
RPC: 2 timeout (default timer)
RPC: 2 __rpc_wake_up_task (now 1278435 inh 0)
RPC: 2 disabling timer
RPC: 2 removed from queue c0138a1c "nfs_flushd"
RPC: 2 added to queue c013e6e8 "schedq"
RPC: __rpc_wake_up_task done
RPC: switch to rpciod
RPC: rpc_schedule enter
RPC: 2 removed from queue c013e6e8 "schedq"
RPC: 2 rpc_execute flgs 81
RPC: 2 deleting timer
NFS: 2 flushd starting
NFS: 2 flushd back to sleep
RPC: 2 sleep_on(queue "nfs_flushd" time 1278435)
RPC: 2 added to queue c0138a1c "nfs_flushd"
RPC: 2 setting alarm for 30000 ms
RPC: rpc_schedule leave
RPC: rpciod back to sleep
-pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait
-action- --exit--
03871 0001 0000 000000 c8f5fa80 100003 c0e0f074 00000000 <NULL>
c00f17ac 0
00656 0000 0081 -00110 c8f5fa80 100003 0 00030000 nfs_flushd
c0066c08 c0066d40
00002 0000 0081 -00110 c0e15d80 100003 0 00030000 nfs_flushd
c0066c08 c0066d40
Cheers, Vladimir.
Did you try the patch I sent you last week? (appended below)
Cheers,
Trond
--- linux-2.4.21-up/include/linux/sunrpc/sched.h.orig 2003-05-13 17:51:59.000000000 +0200
+++ linux-2.4.21-up/include/linux/sunrpc/sched.h 2003-05-13 23:04:44.000000000 +0200
@@ -128,7 +128,12 @@
#define RPC_IS_RUNNING(t) (test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
#define rpc_set_running(t) (set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
-#define rpc_clear_running(t) (clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
+#define rpc_clear_running(t) \
+ do { \
+ smp_mb__before_clear_bit(); \
+ clear_bit(RPC_TASK_RUNNING, &(t)->tk_runstate); \
+ smp_mb__after_clear_bit(); \
+ } while(0)
#define rpc_set_sleeping(t) (set_bit(RPC_TASK_SLEEPING, &(t)->tk_runstate))
--- linux-2.4.21-up/include/linux/sunrpc/xprt.h.orig 2003-05-13 17:51:59.000000000 +0200
+++ linux-2.4.21-up/include/linux/sunrpc/xprt.h 2003-05-13 23:06:41.000000000 +0200
@@ -200,7 +200,12 @@
#define xprt_connected(xp) (test_bit(XPRT_CONNECT, &(xp)->sockstate))
#define xprt_set_connected(xp) (set_bit(XPRT_CONNECT, &(xp)->sockstate))
#define xprt_test_and_set_connected(xp) (test_and_set_bit(XPRT_CONNECT, &(xp)->sockstate))
-#define xprt_clear_connected(xp) (clear_bit(XPRT_CONNECT, &(xp)->sockstate))
+#define xprt_clear_connected(xp) \
+ do { \
+ smp_mb__before_clear_bit(); \
+ clear_bit(XPRT_CONNECT, &(xp)->sockstate); \
+ smp_mb__after_clear_bit(); \
+ } while(0)
#endif /* __KERNEL__*/
Trond Myklebust wrote:
>Did you try the patch I sent you last week? (appended below)
>
>Cheers,
> Trond
>
Yes I do, Trond.
It doesn't help, and probably shouldn't, because it's UP not SMP system.
Cheers, Vladimir.
>>>>> " " == Vladimir Serov <[email protected]> writes:
> Yes I do, Trond. It doesn't help, and probably shouldn't,
> because it's UP not SMP system.
Then I haven't a clue as to why the PPC case should differ from the
normal i386 case.
Have you been able to capture a trace of the __rpc_wake_up_task() call
that fails? So far I've only seen traces of what happens afterwards.
Cheers,
Trond
On Mon, May 19, 2003 at 05:20:27PM +0400, Vladimir Serov wrote:
> My current kernel is 2.4.21-pre6 based with patches from 2.4.19-rmk7
> applied (well partially, except ide and pci cause i don't have them,
> board is mostly brutus). I'm using HARD mounted nfs
> volume now !!! The tail of dmesg is following.
Looking back on stuff which happened a long time ago, there's a
possibility that there's an ordering issue with set_current_state.
Please note that this is affects _all_ 2.4 architectures.
I think this was discussed about 6 months ago, so I'm surprised this
hasn't made it into the 2.4.2x kernel (or no one else has seen the
problem.)
Please note that since I don't track 2.4 kernels currently, I've no
idea whether 2.4.21-pre/rc has this fix. (This means if Marcelo
drops or ignores the fix, it won't get re-sent.) Certainly gcc 3
requires this patch. I've not heard of gcc 2.95.x tripping over
this problem though.
The problem area is:
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
#ifdef CONFIG_SMP
#define set_task_state(tsk, state_value) \
set_mb((tsk)->state, (state_value))
#else
#define set_task_state(tsk, state_value) \
__set_task_state((tsk), (state_value))
#endif
#define __set_current_state(state_value) \
do { current->state = (state_value); } while (0)
#ifdef CONFIG_SMP
#define set_current_state(state_value) \
set_mb(current->state, (state_value))
#else
#define set_current_state(state_value) \
__set_current_state(state_value)
#endif
As they currently stand, there is no guarantee that the assignment to
"->state" occurs before the condition in wait_event is processed.
The attached patch should fix your problem. It should be applied to
2.4.2x. All architectures which do not provide set_mb() need to be
fixed.
--- ref/include/linux/sched.h Wed Mar 19 15:54:45 2003
+++ linux/include/linux/sched.h Wed May 14 09:45:31 2003
@@ -94,23 +94,13 @@
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
-#ifdef CONFIG_SMP
#define set_task_state(tsk, state_value) \
set_mb((tsk)->state, (state_value))
-#else
-#define set_task_state(tsk, state_value) \
- __set_task_state((tsk), (state_value))
-#endif
#define __set_current_state(state_value) \
do { current->state = (state_value); } while (0)
-#ifdef CONFIG_SMP
#define set_current_state(state_value) \
set_mb(current->state, (state_value))
-#else
-#define set_current_state(state_value) \
- __set_current_state(state_value)
-#endif
/*
* Scheduling policies
--- ref/include/asm-arm/system.h Wed Sep 11 17:30:52 2002
+++ linux/include/asm-arm/system.h Wed May 14 09:46:20 2003
@@ -46,6 +46,8 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define set_mb(var, value) do { var = value; mb(); } while (0)
+#define set_wmb(var, value) do { var = value; wmb(); } while (0)
#define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");
#define prepare_to_switch() do { } while(0)
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Wed, May 21, 2003 at 10:29:23AM +0100, Russell King wrote:
> On Mon, May 19, 2003 at 05:20:27PM +0400, Vladimir Serov wrote:
> > My current kernel is 2.4.21-pre6 based with patches from 2.4.19-rmk7
> > applied (well partially, except ide and pci cause i don't have them,
> > board is mostly brutus). I'm using HARD mounted nfs
> > volume now !!! The tail of dmesg is following.
>
> Looking back on stuff which happened a long time ago, there's a
> possibility that there's an ordering issue with set_current_state.
>
> Please note that this is affects _all_ 2.4 architectures.
>
> I think this was discussed about 6 months ago, so I'm surprised this
> hasn't made it into the 2.4.2x kernel (or no one else has seen the
> problem.)
Yes, it was first discovered 7 months ago, but it seems Marcelo didn't
merge the fix:
> Date: Fri, 18 Oct 2002 23:00:58 -0400 (EDT)
> From: Nicolas Pitre <[email protected]>
> To: Marcelo Tosatti <[email protected]>
> Subject: [PATCH] set_task_state() UP memory barriers
Nicolas included a more complete fix which updates all 2.4 architectures.
Nico - could you re-send your fix please?
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Russell King wrote:
>Looking back on stuff which happened a long time ago, there's a
>possibility that there's an ordering issue with set_current_state.
>
>Please note that this is affects _all_ 2.4 architectures.
>
>
..........................
>The attached patch should fix your problem. It should be applied to
>2.4.2x. All architectures which do not provide set_mb() need to be
>fixed.
>
>
>
Thanks a lot !!! It works !!!
With best regards, Vladimir.
On Wed, 21 May 2003, Russell King wrote:
> On Wed, May 21, 2003 at 10:29:23AM +0100, Russell King wrote:
> > On Mon, May 19, 2003 at 05:20:27PM +0400, Vladimir Serov wrote:
> > > My current kernel is 2.4.21-pre6 based with patches from 2.4.19-rmk7
> > > applied (well partially, except ide and pci cause i don't have them,
> > > board is mostly brutus). I'm using HARD mounted nfs
> > > volume now !!! The tail of dmesg is following.
> >
> > Looking back on stuff which happened a long time ago, there's a
> > possibility that there's an ordering issue with set_current_state.
> >
> > Please note that this is affects _all_ 2.4 architectures.
> >
> > I think this was discussed about 6 months ago, so I'm surprised this
> > hasn't made it into the 2.4.2x kernel (or no one else has seen the
> > problem.)
>
> Yes, it was first discovered 7 months ago, but it seems Marcelo didn't
> merge the fix:
>
> > Date: Fri, 18 Oct 2002 23:00:58 -0400 (EDT)
> > From: Nicolas Pitre <[email protected]>
> > To: Marcelo Tosatti <[email protected]>
> > Subject: [PATCH] set_task_state() UP memory barriers
>
> Nicolas included a more complete fix which updates all 2.4 architectures.
> Nico - could you re-send your fix please?
Here it is, again.
---------- Forwarded message ----------
>From [email protected] Wed May 21 10:55:10 2003
Date: Fri, 18 Oct 2002 23:00:58 -0400 (EDT)
From: Nicolas Pitre <[email protected]>
To: Marcelo Tosatti <[email protected]>
Subject: [PATCH] set_task_state() UP memory barriers
This fixes the UP set_task_state and set_current_state to ensure that we
don't re-order loads around the store for setting task->state.
The patch below has just been merged into 2.5 by Linus but 2.4 needs this
as well, especially if gcc-3.2.x is used to compile the kernel.
diff -ur linux-2.4.20-pre11/include/asm-arm/system.h linux/include/asm-arm/system.h
--- linux-2.4.20-pre11/include/asm-arm/system.h Mon Nov 27 20:07:59 2000
+++ linux/include/asm-arm/system.h Fri Oct 18 22:13:49 2002
@@ -39,6 +39,8 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define set_mb(var, value) do { var = value; mb(); } while (0)
+#define set_wmb(var, value) do { var = value; wmb(); } while (0)
#define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");
#define prepare_to_switch() do { } while(0)
diff -ur linux-2.4.20-pre11/include/asm-cris/system.h linux/include/asm-cris/system.h
--- linux-2.4.20-pre11/include/asm-cris/system.h Fri Aug 2 20:39:45 2002
+++ linux/include/asm-cris/system.h Fri Oct 18 22:13:49 2002
@@ -150,6 +150,8 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define set_mb(var, value) do { var = value; mb(); } while (0)
+#define set_wmb(var, value) do { var = value; wmb(); } while (0)
#ifdef CONFIG_SMP
#define smp_mb() mb()
diff -ur linux-2.4.20-pre11/include/asm-i386/system.h linux/include/asm-i386/system.h
--- linux-2.4.20-pre11/include/asm-i386/system.h Fri Oct 18 22:11:16 2002
+++ linux/include/asm-i386/system.h Fri Oct 18 22:13:49 2002
@@ -305,13 +305,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define set_mb(var, value) do { xchg(&var, value); } while (0)
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif
-#define set_mb(var, value) do { xchg(&var, value); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)
/* interrupt control.. */
Only in linux/include/asm-i386: system.h.orig
diff -ur linux-2.4.20-pre11/include/asm-m68k/system.h linux/include/asm-m68k/system.h
--- linux-2.4.20-pre11/include/asm-m68k/system.h Fri Aug 2 20:39:45 2002
+++ linux/include/asm-m68k/system.h Fri Oct 18 22:24:57 2002
@@ -82,7 +82,7 @@
#define mb() barrier()
#define rmb() barrier()
#define wmb() barrier()
-#define set_mb(var, value) do { xchg(&var, value); } while (0)
+#define set_mb(var, value) do { var = value; mb(); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)
#define smp_mb() barrier()
Only in linux/include/asm-parisc: system.h.orig
diff -ur linux-2.4.20-pre11/include/asm-sh/system.h linux/include/asm-sh/system.h
--- linux-2.4.20-pre11/include/asm-sh/system.h Sat Sep 8 15:29:09 2001
+++ linux/include/asm-sh/system.h Fri Oct 18 22:13:49 2002
@@ -100,7 +100,7 @@
#define smp_wmb() barrier()
#endif
-#define set_mb(var, value) do { xchg(&var, value); } while (0)
+#define set_mb(var, value) do { var = value; mb(); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)
/* Interrupt Control */
Only in linux/include/asm-sh: system.h.orig
diff -ur linux-2.4.20-pre11/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.4.20-pre11/include/linux/sched.h Fri Oct 18 22:11:19 2002
+++ linux/include/linux/sched.h Fri Oct 18 22:14:04 2002
@@ -94,23 +94,13 @@
#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)
-#ifdef CONFIG_SMP
#define set_task_state(tsk, state_value) \
set_mb((tsk)->state, (state_value))
-#else
-#define set_task_state(tsk, state_value) \
- __set_task_state((tsk), (state_value))
-#endif
#define __set_current_state(state_value) \
do { current->state = (state_value); } while (0)
-#ifdef CONFIG_SMP
#define set_current_state(state_value) \
set_mb(current->state, (state_value))
-#else
-#define set_current_state(state_value) \
- __set_current_state(state_value)
-#endif
/*
* Scheduling policies