2000-12-19 10:03:46

by Hans-Joachim Baader

Subject: 2.2.18: Thread problem with smbfs

Hi,

I have a strange problem with smbfs. My application creates threads
that copy files from a mounted SMB share to the local disk. When
I run the application normally, there's no problem. However, when
I run it in gdb 4.18 or 5.0, one of the threads (not always) goes into
the D state (uninterruptible sleep), and the whole program, including
gdb, hangs.

With strace, these are the last lines of output I get:

1854 sched_get_priority_max(0) = 0
1854 sched_get_priority_min(0) = 0
1854 brk(0x80ca000) = 0x80ca000
1854 pipe([9, 10]) = 0
1854 clone() = 1856
1854 write(10, "\300\357\215@\5\0\0\0\24\364\377\277\256^\204@\370\377\215@\240\353\215@\276\271w@Q\270w@\274Dx@\240\353\215@Q\270w@\274Dx@\240\353\215@\260\357\215@\304\357\215@H\364\377\277\300\357\215@\370\377\215@\240\353\215@d\364\377\277\276\271w@\274Dx@\260\357\215@\256^\204@\370\377\215@\276\271w@\274Dx@\260\357\215@\2\0\0\0T\365\377\277G\200\0@>[w@\324Vf@D:\1@`R\216@\3\0\0\0p\365\377\277", 148) = 148
1854 rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
1854 write(10, "\0!x@\0\0\0\0\360\365\377\277\0 q@\340`\f\10\0\0\0\200\0\0\0\0\f\0\0\0P\357\22@\f\0\0\0l\365\377\277\\.d@\204\342\22@\354\215\371\7\234\365\377\277\"@f@\314\233\315\4\250\365\377\277\\.d@\240\365\377\277A\245\0@X\340\22@@R\216@\7\0\0\0\216\244\0@\370\227v@\340`\f\10P\234\v\10|\263d@H\236v@D;w@\24\366\377\277\360\246\0@\0 q@2\0\0\0p\232w@x\340\22@", 148) = 148
1854 rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
1854 rt_sigsuspend([] <unfinished ...>

In the syslog I find the following:

Dec 18 19:07:58 George kernel: smb_get_length: recv error = 512
Dec 18 19:07:58 George kernel: smb_trans2_request: result=-512, setting invalid
Dec 18 19:07:59 George kernel: smb_retry: sucessful, new pid=16002, generation=38
Dec 18 19:07:59 George kernel: smb_get_length: recv error = 512
Dec 18 19:07:59 George kernel: smb_trans2_request: result=-512, setting invalid
Dec 18 19:07:59 George kernel: smb_retry: sucessful, new pid=16002, generation=39
Dec 18 19:07:59 George kernel: smb_get_length: recv error = 512
Dec 18 19:07:59 George kernel: smb_trans2_request: result=-512, setting invalid
Dec 18 19:08:00 George kernel: smb_retry: sucessful, new pid=16002, generation=40

and so on, endlessly. So, AFAIK, smbfs thinks it has lost connection and
tells smbmount to re-establish it, which succeeds (at least smbmount
thinks so). This happens several times per second.

However, with processes instead of threads, without the debugger, or
when reading from a local filesystem instead of an SMB filesystem, there
is no problem.

Kernel 2.2.18, smbfs as a module. I can provide more info if necessary.

Regards,
hjb
--
http://www.pro-linux.de/ - Germany's largest volunteer Linux support site


2000-12-19 11:29:03

by Urban Widmark

Subject: Re: 2.2.18: Thread problem with smbfs

On Tue, 19 Dec 2000, Hans-Joachim Baader wrote:

> and so on, endlessly. So, AFAIK, smbfs thinks it has lost connection and
> tells smbmount to re-establish it, which succeeds (at least smbmount
> thinks so). This happens several times per second.

-512 (-ERESTARTSYS) means that the recv was interrupted by a signal. Or
rather, it means the current process has a signal pending: maybe the recv
was interrupted, maybe there is a problem with the connection, so it is
better to reconnect.
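ERESTARTSYS (512) is a kernel-internal code. Whether user space ever sees it depends on how the signal is handled: if the signal is caught by a handler installed without SA_RESTART, the interrupted system call fails with errno set to EINTR instead of being restarted. The sketch below is plain POSIX user-space C, not smbfs code, and the names `on_alarm` and `read_interrupted_by_signal` are hypothetical; it shows a blocking read() on an empty pipe being interrupted that way by SIGALRM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: it only makes SIGALRM "caught" instead of fatal, so the
   signal interrupts the blocking read() rather than killing the process. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Blocks in read() on an empty pipe until SIGALRM fires about one second
   later. Returns read()'s result and stores errno in *err. */
ssize_t read_interrupted_by_signal(int *err)
{
    struct sigaction sa;
    int fds[2];
    char c;
    ssize_t n;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART, so read() is not restarted */
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);            /* setup failures are bugs in this sketch */
    rc = pipe(fds);
    assert(rc == 0);

    alarm(1);                   /* nothing is ever written to the pipe */
    n = read(fds[0], &c, 1);    /* blocks until SIGALRM arrives */
    *err = errno;

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Called from a test driver, read_interrupted_by_signal() returns -1 with *err set to EINTR after about one second. Setting sa.sa_flags to SA_RESTART instead would make the kernel restart the read() transparently, and this call would then block forever.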

Still, it's better than pre-2.2.18, where smbmount wouldn't stay alive...

I don't really know how signal delivery works within the kernel, but
smb_trans2_request tries to disable some signals. That does not work
(completely?) so either it needs fixing or the -512 errno needs to be
handled.
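User space has a comparable tool for "disabling" signals: a blocked signal stays pending and does not interrupt a blocking call until it is unblocked. The sketch below is only a user-space analogy; it does not reproduce what smb_trans2_request does inside the kernel, and `read_with_signal_blocked`, `late_writer`, `on_usr1` and `struct masked_result` are hypothetical names. It blocks SIGUSR1, sends it to the calling thread, and shows that a read() on a pipe still completes normally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t usr1_handled = 0;

static void on_usr1(int sig)
{
    (void)sig;
    usr1_handled = 1;
}

/* Helper thread: waits a second, then writes one byte so that the
   blocking read() in the calling thread completes normally. */
static void *late_writer(void *arg)
{
    int fd = *(int *)arg;
    ssize_t w;

    sleep(1);
    w = write(fd, "x", 1);
    (void)w;
    return NULL;
}

struct masked_result {
    ssize_t nread;    /* read() result while SIGUSR1 was blocked */
    int was_pending;  /* SIGUSR1 still pending after read() returned? */
    int handled;      /* handler ran once SIGUSR1 was unblocked? */
};

struct masked_result read_with_signal_blocked(void)
{
    struct masked_result r;
    struct sigaction sa;
    sigset_t block, pending;
    pthread_t tid;
    int fds[2];
    char c;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                      /* no SA_RESTART */
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);                      /* setup failures are bugs here */
    rc = pipe(fds);
    assert(rc == 0);

    /* Block SIGUSR1 in this thread, then send it to ourselves:
       it becomes pending instead of being delivered. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, NULL);
    pthread_kill(pthread_self(), SIGUSR1);

    rc = pthread_create(&tid, NULL, late_writer, &fds[1]);
    assert(rc == 0);

    /* Blocks for about one second; the pending but blocked signal
       does not interrupt it, so there is no EINTR here. */
    r.nread = read(fds[0], &c, 1);

    sigpending(&pending);
    r.was_pending = sigismember(&pending, SIGUSR1) == 1;

    /* Unblocking delivers the pending signal and runs the handler. */
    pthread_sigmask(SIG_UNBLOCK, &block, NULL);
    r.handled = usr1_handled;

    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return r;
}
```

The read() returns 1, the signal is still pending afterwards, and the handler only runs once the mask is lifted. The catch, as with any masking scheme, is that every signal that can reach the thread has to be covered; one that is left unblocked will still interrupt the call.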

Why is it so bad in gdb? Perhaps gdb causes more signals.
Why does one thread end up in D state? I don't know.


> Kernel 2.2.18, smbfs as a module. I can provide more info if necessary.

A small test program that causes this would be nice. The -512 is easy to
reproduce, but I haven't seen the 'D' state before.

If someone is interested, the relevant code is in fs/smbfs/sock.c
(smb_trans2_request, ..., _recvfrom)

/Urban

2000-12-20 21:12:06

by Hans-Joachim Baader

Subject: Re: 2.2.18: Thread problem with smbfs

Hi,

Urban Widmark wrote:

> I don't really know how signal delivery works within the kernel, but
> smb_trans2_request tries to disable some signals. That does not work
> (completely?) so either it needs fixing or the -512 errno needs to be
> handled.
>
> Why is it so bad in gdb? Perhaps gdb causes more signals.
> Why does one thread end up in D state? I don't know.
>
>
> > Kernel 2.2.18, smbfs as a module. I can provide more info if necessary.
>
> A small test program that causes this would be nice. The -512 is easy to
> reproduce, but I haven't seen the 'D' state before.
>
> If someone is interested, the relevant code is in fs/smbfs/sock.c
> (smb_trans2_request, ..., _recvfrom)

Here is a test program to reproduce this. Don't worry about
missing error checks and so on; it's just a quick hack.
Create the required files file1..file5 on an SMB share and edit
the SourcePath #define (the directory containing them) accordingly.
File sizes of 1-2 MB should suffice. Then run the program. It should
copy the files to the current directory. Then run it under gdb. It
should hang until you kill gdb.

I tested only with an NT 4 server (SP 5 or 6).

Regards,
hjb

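#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of the blocks we read from a file. */
#define ChunkSize 8192

/* Directory on the mounted SMB share from which we copy files.
   The file name is appended directly, so keep the trailing slash. */
#define SourcePath "/mnt/net/test/"

struct CopyThreadInfo
{
    char* src;
    char* dst;
};

/* Copies src to dst in ChunkSize blocks; returns 1 on success */
int CopyFile(const char* src, const char* dst)
{
    char buffer[ChunkSize];
    int f, g;
    ssize_t nRet, nDone, nWritten;
    int ok = 1;

    if ((f = open(src, O_RDONLY)) < 0)
        return 0;

    g = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (g < 0)
    {
        close(f);
        return 0;
    }

    while (ok)
    {
        nRet = read(f, buffer, sizeof(buffer));
        if (nRet < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (nRet < 0)
            ok = 0;                 /* read error */
        if (nRet <= 0)
            break;                  /* 0 means end of file */

        /* write() may write less than asked for, so loop until the
           whole chunk has been written. */
        for (nDone = 0; nDone < nRet; nDone += nWritten)
        {
            nWritten = write(g, buffer + nDone, nRet - nDone);
            if (nWritten < 0 && errno == EINTR)
                nWritten = 0;       /* interrupted: retry */
            else if (nWritten < 0)
            {
                ok = 0;             /* write error */
                break;
            }
        }
    }

    /* Both descriptors are closed on every path, including errors. */
    close(g);
    close(f);
    return ok;
}

/* Thread entry point; pthread_create() expects void *(*)(void *). */
void* Copy(void* arg)
{
    struct CopyThreadInfo* info = arg;

    CopyFile(info->src, info->dst);

    free(info->src);
    free(info->dst);
    free(info);
    return NULL;
}

/* Starts a detached thread that copies SourcePath followed by name
   to name in the current directory. */
void Fetch(const char* name)
{
    char src[4096];
    char dst[4096];

    pthread_attr_t attr;
    pthread_t tid;
    struct CopyThreadInfo* pCopy = malloc(sizeof(struct CopyThreadInfo));

    strcpy(src, SourcePath);
    strcat(src, name);
    strcpy(dst, name);

    pCopy->src = strdup(src);
    pCopy->dst = strdup(dst);

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&tid, &attr, Copy, pCopy);
    pthread_attr_destroy(&attr);
}

int main(void)
{
    Fetch("file1");
    Fetch("file2");
    Fetch("file3");
    Fetch("file4");
    Fetch("file5");

    /* Keep the process alive while the detached threads copy the
       files; the program never exits on its own, so kill it once
       the copies are done. */
    while (1)
        ;
    return 0;
}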

