Hello,
I am reporting a problem that occurs on a Core2 system (x86_64) running Linux
2.6.19.2 when I use a NAS: a Maxtor Shared Storage II 320 GB (running Linux
2.6.12 internally).
This NAS can be configured through its web interface to sleep after 30 minutes.
I mount a partition of this device through an fstab entry of this kind:
------------
//192.168.1.60/Archive /home/tuxico/NAS/Archive cifs noauto,users,iocharset=iso8859-1,noperm,nosetuids,noacl,sfu,file_mode=0600,dir_mode=0755,uid=tuxico,gid=users,credentials=/root/.credentials 0 0
------------
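An fstab entry is a single line of six whitespace-separated fields, with the mount options packed into the fourth one. The following is a minimal Python sketch, not part of the original report, that splits such an entry so the CIFS options can be checked one by one. The function name parse_fstab_entry is purely illustrative, and the sketch ignores comments and escaped spaces.

```python
def parse_fstab_entry(line):
    """Split one fstab line into its six fields and an options dict."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    device, mountpoint, fstype, optstring, dump, passno = fields
    options = {}
    for opt in optstring.split(","):
        key, sep, value = opt.partition("=")
        # Flag-style options such as "noauto" carry no value.
        options[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
        "dump": int(dump),
        "passno": int(passno),
    }


# The entry from the report, written as one logical line.
ENTRY = (
    "//192.168.1.60/Archive /home/tuxico/NAS/Archive cifs "
    "noauto,users,iocharset=iso8859-1,noperm,nosetuids,noacl,sfu,"
    "file_mode=0600,dir_mode=0755,uid=tuxico,gid=users,"
    "credentials=/root/.credentials 0 0"
)

entry = parse_fstab_entry(ENTRY)
print(entry["fstype"], entry["options"]["file_mode"])
```

A wrapped or truncated entry shows up immediately here, because the field count is no longer six.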
Under these circumstances, the Core2 system connected to it sometimes freezes
completely (mouse and keyboard are frozen, and no connection is possible from
an external system; sshd does not respond).
This occurs regularly (within 3-4 days), and the problem seems to be triggered
by the NAS device waking up.
Indeed, after I disabled the sleep feature of the device (via its web control
panel), I was unable to trigger the problem during at least 7 days of
uptime of the Core2 system.
I attach the .config, the output of lspci, and the CIFS logs that were
written to /var/log/messages (I don't know whether they are relevant, but
just in case...).
Thanks in advance for the help.
Best regards,
Eric
The cifs entries in the dmesg log do not indicate any errors, much less
show the cause of this particular problem.
The repeated entry:
CIFS VFS: Send error in SETFSUnixInfo = -5
is expected on connection to certain older versions of Samba servers (or
other servers that only partially support the current CIFS Unix
Extensions). It is harmless.
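As a side note, the -5 in that log line is a negated errno value. A small Python sketch for decoding such codes follows; it assumes a POSIX system, where errno 5 is EIO, and the helper name decode_kernel_error is made up for this example.

```python
import errno
import os


def decode_kernel_error(code):
    """Return (symbolic name, description) for a kernel-style negative errno."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, text = decode_kernel_error(-5)
print(name, "-", text)
```

This only names the error code; it says nothing about why the server rejected the request.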
It would be useful to know (e.g. if it is possible to trace the network
traffic on the server side on your NAS box) whether any network traffic
from the client is being sent when (or just before) the hang occurs.
It is possible that the restarting of the NAS box allows reconnection of
the smb/cifs session to proceed, which presumably could be hanging or
looping in the network adapter driver, the tcp stack or cifs on the
client, but it is hard to tell without more information. I don't know
much about either of the GigE drivers loaded on your system to determine
if there is an easy way to tell their state.
There are various ways to analyze system hangs including (at least in some
cases) getting a system dump which can be used to isolate the failing
location - hopefully
[email protected] wrote on 01/30/2007 06:37:48 AM:
[...]
Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com
First of all, thank you for your answer.
On Wednesday 31 January 2007 04:51:35 Steven French wrote:
> The cifs entries in the dmesg log do not indicate any errors, much less
> show the cause of this
> particular problem.
ok.
>
> The repeated entry:
> CIFS VFS: Send error in SETFSUnixInfo = -5
> is expected on connection to certain older versions of Samba servers (or
> other servers that
> only partially support the current CIFS Unix Extensions). It is harmless.
>
> It would be useful to know (e.g. if it is possible to trace the network
> traffic on the server side on your NAS box) whether
> any network traffic from the client is being sent when (or just before)
> the hang occurs.
Unfortunately, I can't easily do anything on the NAS box other than what it
was designed for. (An interesting project, openmss, exists for the first
version of the Maxtor Shared Storage, but that one is based on totally
different hardware.)
>
> It is possible that the restarting of the NAS box allows reconnection of
> the smb/cifs session to proceed
> which presumably could be hanging or looping in the network adapter
> driver, the tcp stack or cifs on
> the client, but it is hard to tell without more information. I don't
> know much about either of the
> GigE drivers loaded on your system to determine if there is an easy way to
> tell their state.
The network device I use is:
"D-Link System Inc DGE-528T Gigabit Ethernet Adapter (rev 10)",
and the driver used is the one for "Realtek 8169 PCI Gigabit Ethernet adapter"
(in 2.6.19.2), which is the only one that recognizes this device.
This problem also reminds me that when I compiled this driver without
"CONFIG_NET_ETHERNET" (in the section "Ethernet (10 or 100Mbit)"), I got
really poor performance from the network device. Maybe it is related, maybe not ;)
Does that give you any more ideas?
It might be worth contacting the r8169 maintainer, but I don't know who
that is.
>
> There are various ways to analyze system hangs including (at least in some
> cases) getting a system dump which
> can be used to isolate the failing location - hopefully
Could you point me to some useful URLs?
Thank you again.
Eric
>
> [email protected] wrote on 01/30/2007 06:37:48 AM:
[...]
> Steve French
> Senior Software Engineer
> Linux Technology Center - IBM Austin
> phone: 512-838-2294
> email: sfrench at-sign us dot ibm dot com
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Hello,
I am resending my email with a more appropriate title, following the brief
discussion with Steve French.
It seems possible that my problem is related to a bug in the r8169 network
driver.
Information follows. If someone needs more information to figure out how the
problem could be fixed, I will do my best to provide it.
Thanks in advance.
Best regards,
Eric Lacombe
---------------
> On Tuesday 30 January 2007 13:37:48 Eric Lacombe wrote:
> > [...]
On Wednesday 31 January 2007 04:51:35 Steve French wrote:
[...]
>
> There are various ways to analyze system hangs including (at least in some
> cases) getting a system dump which
> can be used to isolate the failing location - hopefully
I would like to have more detailed help, if possible.
Thanks.
Eric
Eric Lacombe <[email protected]> :
[...]
> That problem also remind me that when I compiled this driver without
> the "CONFIG_NET_ETHERNET" (in the section "Ethernet (10 or 100Mbit)"), I have
> really poor performance with the net device. Maybe it is related, or not ;)
>
> If it gives you more ideas ?
> Maybe it could be interesting to know about the r8169 maintainer, but I dont
> know who he is.
1. $ ls
arch crypto include kernel mm scripts
block Documentation init lib net security
COPYING drivers ipc MAINTAINERS README sound
CREDITS fs Kbuild Makefile REPORTING-BUGS usr
The maintainer of the r8169 driver is listed in the MAINTAINERS file.
2. Disabling CONFIG_NET_ETHERNET is a bad idea. Don't do that.
3. Use tethereal -w or tcpdump on the appropriate interface to save a
traffic dump.
4. Are you using a binary module for your video adapter ?
5. How does the 2.6.20 version of the r8169 driver behave ?
6. You could set up a cron job to record an ethtool register dump of
the 8169 at regular intervals. ifconfig and /proc/interrupts could
also exhibit some unusual drift as time passes.
7. A dmesg would be welcome.
8. Please open a PR at bugzilla.kernel.org.
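The periodic monitoring suggested in item 6 could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a tested tool: the interface name and log path are passed on the command line, the helper names are made up, and each snapshot is fsync'ed so that the last one has a chance of surviving a hard freeze. It relies on `ethtool -d` (register dump) and `ifconfig` being installed.

```python
import os
import subprocess
import sys
import time


def run(cmd):
    """Run a command and return its output, or a note if it could not run."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True, timeout=30)
        return result.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return "failed to run %s: %s\n" % (" ".join(cmd), exc)


def format_snapshot(stamp, sections):
    """Join (title, text) pairs into one timestamped block."""
    lines = ["==== %s ====" % stamp]
    for title, text in sections:
        lines.append("-- %s" % title)
        lines.append(text.rstrip("\n"))
    return "\n".join(lines) + "\n"


def append_synced(path, text):
    """Append text and force it to disk before returning."""
    with open(path, "a") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())


def collect(iface):
    """Gather the register dump, interface counters and interrupt counts."""
    with open("/proc/interrupts") as handle:
        interrupts = handle.read()
    sections = [
        ("ethtool -d %s" % iface, run(["ethtool", "-d", iface])),
        ("ifconfig %s" % iface, run(["ifconfig", iface])),
        ("/proc/interrupts", interrupts),
    ]
    return format_snapshot(time.strftime("%Y-%m-%d %H:%M:%S"), sections)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        append_synced(sys.argv[2], collect(sys.argv[1]))
    else:
        print("usage: snapshot.py IFACE LOGFILE")
```

Run from cron every few minutes (for instance with eth0 and a log file on a local disk as arguments, both of which are assumptions), the log can be compared across snapshots after a freeze to spot counters that stopped moving or jumped.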
[...]
> > There are various ways to analyze system hangs including (at least in some
> > cases) getting a system dump which
> > can be used to isolate the failing location - hopefully
>
> I would like to have more detailed help, if possible.
CONFIG_MAGIC_SYSRQ is set. Check that the magic sysrq is not disabled at
runtime through /etc/sysctl.conf. See Documentation/sysrq.txt for details.
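A quick way to check the sysctl.conf side of this is sketched below in Python. It assumes the usual setting name kernel.sysrq, with 0 meaning disabled, and the sample configuration text is made up for the example.

```python
def sysrq_setting(conf_text):
    """Return the last kernel.sysrq value set in sysctl.conf text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ("#" or ";").
        if not line or line[0] in "#;":
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() in ("kernel.sysrq", "kernel/sysrq"):
            value = int(val.strip())
    return value


def read_runtime_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return the value currently in effect (needs a mounted /proc)."""
    with open(path) as handle:
        return int(handle.read().strip())


# Made-up fragment, only to exercise the parser.
SAMPLE = """\
# sample sysctl.conf fragment
kernel.printk = 4 4 1 7
kernel.sysrq = 0
"""

print(sysrq_setting(SAMPLE))
```

On the affected machine, passing the contents of /etc/sysctl.conf to sysrq_setting and calling read_runtime_sysrq() shows whether the key combination would still work at the time of the hang.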
Please keep Steve French in the loop.
--
Ueimor
On Tuesday 13 February 2007 21:30:47 Francois Romieu wrote:
> Eric Lacombe <[email protected]> :
> [...]
>
> > That problem also remind me that when I compiled this driver without
> > the "CONFIG_NET_ETHERNET" (in the section "Ethernet (10 or 100Mbit)"), I
> > have really poor performance with the net device. Maybe it is related, or
> > not ;)
> >
> > If it gives you more ideas ?
> > Maybe it could be interesting to know about the r8169 maintainer, but I
> > dont know who he is.
>
> 1. $ ls
> arch crypto include kernel mm scripts
> block Documentation init lib net security
> COPYING drivers ipc MAINTAINERS README sound
> CREDITS fs Kbuild Makefile REPORTING-BUGS usr
>
> The maintainer of the r8169 driver is listed in the MAINTAINERS file.
I see, thanks ;)
(I thought the MAINTAINERS file was not fully maintained ;)
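For the record, looking up an entry in MAINTAINERS can be sketched as follows in Python. This assumes the layout in which each entry is a title line followed by one-letter tagged lines (such as P:, M:, L:, S:) and entries are separated by blank lines. The sample text and its names are invented, and find_maintainers is a hypothetical helper.

```python
def find_maintainers(text, keyword):
    """Return (title, fields) for entries whose title contains keyword."""
    matches = []
    for block in text.split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines or keyword.lower() not in lines[0].lower():
            continue
        fields = {}
        for line in lines[1:]:
            tag, sep, value = line.partition(":")
            # Only one-letter tags are treated as fields.
            if sep and len(tag) == 1:
                fields.setdefault(tag, []).append(value.strip())
        matches.append((lines[0], fields))
    return matches


# Invented sample in the assumed layout; not taken from a real tree.
SAMPLE = """\
EXAMPLE 1234 GIGABIT ETHERNET DRIVER
P:\tJane Doe
M:\tjane@example.org
L:\tnetdev@example.org
S:\tMaintained

EXAMPLE SOUND DRIVER
P:\tJohn Roe
M:\tjohn@example.org
S:\tOrphan
"""

for title, fields in find_maintainers(SAMPLE, "ethernet"):
    print(title, fields.get("P"), fields.get("M"))
```

With a real source tree, the text would come from the MAINTAINERS file at the top of the kernel sources and the keyword would be something like 8169.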
>
> 2. Disabling CONFIG_NET_ETHERNET is a bad idea. Don't do that.
ok, but why does it appear only inside the "ethernet 100" menu?
It is misleading, no?
>
> 3. See tethereal -w or tcpdump on the adequate interface to save a
> traffic dump.
Yes, but the problem is that I can't do that from the NAS box. I will try to
monitor the traffic from the system that freezes... For the moment I can't
monitor the network traffic from another PC, but I will be able to soon.
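A capture on the client side could be started along these lines. The Python sketch below only builds a tcpdump command line; the interface name eth0, the output path and the ring-buffer sizes are assumptions, and tcpdump_command is a made-up helper. The options used are -n (no name resolution), -U (write each packet as it arrives), -s 0 (full packets), -C and -W (rotate through a fixed number of size-limited files) and -w (output file).

```python
import shlex


def tcpdump_command(iface, peer, out_path, file_mb=50, file_count=10):
    """Build a tcpdump argument list capturing only traffic to/from peer."""
    return [
        "tcpdump", "-n", "-U",
        "-i", iface,
        "-s", "0",
        "-C", str(file_mb),      # rotate after this many million bytes
        "-W", str(file_count),   # keep at most this many files
        "-w", out_path,
        "host", peer,
    ]


cmd = tcpdump_command("eth0", "192.168.1.60", "/var/tmp/nas.pcap")
print(" ".join(shlex.quote(part) for part in cmd))
```

Limiting the capture to the NAS address and rotating files keeps the dump small enough to leave running for the 3-4 days it takes for the freeze to occur.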
>
> 4. Are you using a binary module for your video adapter ?
Yes. I suppose I have to unload it before doing further tests.
>
> 5. How does the 2.6.20 version of the r8169 driver behave ?
I haven't installed it yet, but I'll do it this evening.
>
> 6. You may setup a cron to monitor an ethtool dump of the register of
> the 8169 at regular interval. ifconfig and /proc/interrupts could
> exhibit some unusual drift as time passes on too.
I will do that. When I can set up a third system to monitor the traffic, I
will make the system that freezes keep sending that information to it.
>
> 7. A dmesg would be welcome.
I can do that this evening.
>
> 8. Please open a PR at bugzilla.kernel.org.
ok
>
> [...]
>
> > > There are various ways to analyze system hangs including (at least in
> > > some cases) getting a system dump which
> > > can be used to isolate the failing location - hopefully
> >
> > I would like to have more detailed help, if possible.
>
> CONFIG_MAGIC_SYSRQ is set. Check that the magic sysrq is not disabled at
> runtime through /etc/sysctl.conf. See Documentation/sysrq.txt for details.
ok
>
> Please keep Steve French in the loop.
ok
Thanks for your response ;)
Eric