Hi Petr,
What is the status of this bug?
http://bugzilla.kernel.org/show_bug.cgi?id=3328
I do not see anything in the history of fs/ncpfs that seems to suggest that this, rather critical, issue has been resolved. Is anyone working on it?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Hi Petr,
>
> What is the status of this bug?
>
> http://bugzilla.kernel.org/show_bug.cgi?id=3328
>
> I do not see anything in the history of fs/ncpfs that seems to suggest that this, rather critical, issue has been resolved. Is anyone working on it?
Nobody is working on it (at least to my knowledge), and to me it is
feature - it always worked this way, like smbfs did back in the past -
if you send signal 9 to process using mount point, and there is some
transaction in progress, nobody can correctly finish that transaction
anymore. Fixing it would require non-trivial amount of code, and given
that NCP itself is more or less dead protocol I do not feel that it is
necessary.
If you want to fix it, feel free. Culprit is RQ_INPROGRESS handling in
ncp_abort_request - it just aborts whole connection so it does not have
to provide temporary buffers and special handling for reply - as buffers
currently specified as reply buffers are owned by caller, so after
aborting request you cannot use them anymore.
Petr Vandrovec
Petr Vandrovec wrote:
>
> Nobody is working on it (at least to my knowledge), and to me it is
> feature - it always worked this way, like smbfs did back in the past -
> if you send signal 9 to process using mount point, and there is some
> transaction in progress, nobody can correctly finish that transaction
> anymore. Fixing it would require non-trivial amount of code, and
> given that NCP itself is more or less dead protocol I do not feel that
> it is necessary.
>
Someone needs to tell our customers then so they'll stop using it. :)
> If you want to fix it, feel free. Culprit is RQ_INPROGRESS handling
> in ncp_abort_request - it just aborts whole connection so it does not
> have to provide temporary buffers and special handling for reply - as
> buffers currently specified as reply buffers are owned by caller, so
> after aborting request you cannot use them anymore.
Do you have any pointers to how it was solved with smbfs? Relevant
patches perhaps? Provided a similar solution can be applied here.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Petr Vandrovec wrote:
>> Nobody is working on it (at least to my knowledge), and to me it is
>> feature - it always worked this way, like smbfs did back in the past -
>> if you send signal 9 to process using mount point, and there is some
>> transaction in progress, nobody can correctly finish that transaction
>> anymore. Fixing it would require non-trivial amount of code, and
>> given that NCP itself is more or less dead protocol I do not feel that
>> it is necessary.
>>
>
> Someone needs to tell our customers then so they'll stop using it. :)
When I asked at Brainshare 2001 when support for files over 4GB files
will be added to NCP, they told me that I should switch to CIFS or
NFS... Years after that confirmed it - only NW6.5SP3 finally got NCPs
to support for files over 4GB, although you could have such files even
on NW5.
>> If you want to fix it, feel free. Culprit is RQ_INPROGRESS handling
>> in ncp_abort_request - it just aborts whole connection so it does not
>> have to provide temporary buffers and special handling for reply - as
>> buffers currently specified as reply buffers are owned by caller, so
>> after aborting request you cannot use them anymore.
>
> Do you have any pointers to how it was solved with smbfs? Relevant
> patches perhaps? Provided a similar solution can be applied here.
I believe it was fixed when smbiod was introduced. When find_request()
returns failure, it simple throws away data received from network.
Unfortunately NCP does not run on top of TCP stream, but on top of
IPX/UDP, and so dropping reply is not sufficient - you must continue
resending request (so you must buffer it somewhere...) until you get
result from server - after you receive answer from server, you can
finally throw away both request & reply, and move on.
Petr
Sorry this took some time, I've been busy with other things.
Petr Vandrovec wrote:
>
> Unfortunately NCP does not run on top of TCP stream, but on top of
> IPX/UDP, and so dropping reply is not sufficient - you must continue
> resending request (so you must buffer it somewhere...) until you get
> result from server - after you receive answer from server, you can
> finally throw away both request & reply, and move on.
>
I don't quite understand why you need to resend. I did the following and
it seems to work fine with UDP:
diff --git a/fs/ncpfs/sock.c b/fs/ncpfs/sock.c
index e496d8b..5159bae 100644
--- a/fs/ncpfs/sock.c
+++ b/fs/ncpfs/sock.c
@@ -151,6 +153,8 @@ static inline int get_conn_number(struct
ncp_reply_header *rp)
return rp->conn_low | (rp->conn_high << 8);
}
+static void __ncp_next_request(struct ncp_server *server);
+
static inline void __ncp_abort_request(struct ncp_server *server,
struct ncp_request_reply *req, int err)
{
/* If req is done, we got signal, but we also received answer... */
@@ -163,7 +167,10 @@ static inline void __ncp_abort_request(struct
ncp_server *server, struct ncp_req
ncp_finish_request(req, err);
break;
case RQ_INPROGRESS:
- __abort_ncp_connection(server, req, err);
+ printk(KERN_INFO "ncpfs: Killing running
request!\n");
+ ncp_finish_request(req, err);
+ __ncp_next_request(server);
+// __abort_ncp_connection(server, req, err);
break;
}
}
@@ -754,7 +761,8 @@ static int ncp_do_request(struct ncp_server *server,
int size,
if (result < 0) {
/* There was a problem with I/O, so the connections is
* no longer usable. */
- ncp_invalidate_conn(server);
+ printk(KERN_INFO "ncpfs: Invalidating connection!\n");
+// ncp_invalidate_conn(server);
}
return result;
}
I'm not particularly proud of the second chunk though. Ideas on how to
handle when we actually get a transmission problem and not just getting
killed by a signal?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Sorry this took some time, I've been busy with other things.
>
> Petr Vandrovec wrote:
>> Unfortunately NCP does not run on top of TCP stream, but on top of
>> IPX/UDP, and so dropping reply is not sufficient - you must continue
>> resending request (so you must buffer it somewhere...) until you get
>> result from server - after you receive answer from server, you can
>> finally throw away both request & reply, and move on.
>>
>
> I don't quite understand why you need to resend. I did the following and
> it seems to work fine with UDP:
Hello,
create test scenario where first transmit of NCP request is lost by
network, and before resend you kill this process. So it stops
resending, but local sequence count is already incremented. Then when
next process tries to access ncpfs, server will ignore its requests as
it expects packet with sequence X, while packet with sequence X+1 arrived.
And unfortunately it is not possible to simple not increment sequence
number unless you get reply - when server receives two packets with same
sequence number, it simple resends answer it gave to first request,
without looking at request's body at all. So in this case server would
answer, but would gave you bogus answer.
So only solution (as far as I can tell) is to keep retrying request
until you get answer - only in this case you can be sure that client and
server state machines are in same state - your solution will work if
packet is never lost. But as we talk about UDP and real networks, this
assumption is not safe.
Petr
>
> diff --git a/fs/ncpfs/sock.c b/fs/ncpfs/sock.c
> index e496d8b..5159bae 100644
> --- a/fs/ncpfs/sock.c
> +++ b/fs/ncpfs/sock.c
> @@ -151,6 +153,8 @@ static inline int get_conn_number(struct
> ncp_reply_header *rp)
> return rp->conn_low | (rp->conn_high << 8);
> }
>
> +static void __ncp_next_request(struct ncp_server *server);
> +
> static inline void __ncp_abort_request(struct ncp_server *server,
> struct ncp_request_reply *req, int err)
> {
> /* If req is done, we got signal, but we also received answer... */
> @@ -163,7 +167,10 @@ static inline void __ncp_abort_request(struct
> ncp_server *server, struct ncp_req
> ncp_finish_request(req, err);
> break;
> case RQ_INPROGRESS:
> - __abort_ncp_connection(server, req, err);
> + printk(KERN_INFO "ncpfs: Killing running
> request!\n");
> + ncp_finish_request(req, err);
> + __ncp_next_request(server);
> +// __abort_ncp_connection(server, req, err);
> break;
> }
> }
> @@ -754,7 +761,8 @@ static int ncp_do_request(struct ncp_server *server,
> int size,
> if (result < 0) {
> /* There was a problem with I/O, so the connections is
> * no longer usable. */
> - ncp_invalidate_conn(server);
> + printk(KERN_INFO "ncpfs: Invalidating connection!\n");
> +// ncp_invalidate_conn(server);
> }
> return result;
> }
>
> I'm not particularly proud of the second chunk though. Ideas on how to
> handle when we actually get a transmission problem and not just getting
> killed by a signal?
>
> Rgds
>
Petr Vandrovec wrote:
>
> Hello,
> create test scenario where first transmit of NCP request is lost by
> network, and before resend you kill this process. So it stops
> resending, but local sequence count is already incremented. Then when
> next process tries to access ncpfs, server will ignore its requests as
> it expects packet with sequence X, while packet with sequence X+1
> arrived.
Figured something along those lines, but I couldn't find any docs on the
protocol so I wasn't sure. You wouldn't happen to have any pointers to
such docs?
>
> And unfortunately it is not possible to simple not increment sequence
> number unless you get reply - when server receives two packets with
> same sequence number, it simple resends answer it gave to first
> request, without looking at request's body at all. So in this case
> server would answer, but would gave you bogus answer.
>
This sounds promising though. In that case it wouldn't be necessary to
store the entire request, just the sequence number, right?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Petr Vandrovec wrote:
>> Hello,
>> create test scenario where first transmit of NCP request is lost by
>> network, and before resend you kill this process. So it stops
>> resending, but local sequence count is already incremented. Then when
>> next process tries to access ncpfs, server will ignore its requests as
>> it expects packet with sequence X, while packet with sequence X+1
>> arrived.
>
> Figured something along those lines, but I couldn't find any docs on the
> protocol so I wasn't sure. You wouldn't happen to have any pointers to
> such docs?
You can download NCP documentation from Novell website. Or you could,
couple of months ago. Unfortunately that documentation just describes
different NCP calls, not transport - to my knowledge transport layer was
never documented outside of Novell.
>> And unfortunately it is not possible to simple not increment sequence
>> number unless you get reply - when server receives two packets with
>> same sequence number, it simple resends answer it gave to first
>> request, without looking at request's body at all. So in this case
>> server would answer, but would gave you bogus answer.
>
> This sounds promising though. In that case it wouldn't be necessary to
> store the entire request, just the sequence number, right?
Not quite. If packet signatures are on then server needs to know packet
you sent so it can correctly compute secret used for next packet (it is
function of old secret, and data in current packet). As current
signatures implementation implement only partial signatures, you need
just first 64bytes of the packet same - but at that point it may be
better to just resend packet completely, as write request with correct
file handle, length, and offset, but with only ~50 bytes of valid data
is going to do lot of damage on the server. So I would recommend
resending original request...
Petr
Ok... how about this baby instead. I've replaced the stack allocated
request structure by one allocated with kmalloc() and reference counted
using an atomic_t. I couldn't see anything else that was associated to
the process, so I believe this should suffice.
(This is just a RFC. Once I get an ok from you I'll put together a more
proper patch mail)
diff --git a/fs/ncpfs/sock.c b/fs/ncpfs/sock.c
index e496d8b..fc6e02d 100644
--- a/fs/ncpfs/sock.c
+++ b/fs/ncpfs/sock.c
@@ -55,6 +55,7 @@ static int _send(struct socket *sock, const void
*buff, int len)
struct ncp_request_reply {
struct list_head req;
wait_queue_head_t wq;
+ atomic_t refs;
struct ncp_reply_header* reply_buf;
size_t datalen;
int result;
@@ -67,6 +68,32 @@ struct ncp_request_reply {
u_int32_t sign[6];
};
+static inline struct ncp_request_reply* ncp_alloc_req(void)
+{
+ struct ncp_request_reply *req;
+
+ req = kmalloc(sizeof(struct ncp_request_reply), GFP_KERNEL);
+ if (!req)
+ return NULL;
+
+ init_waitqueue_head(&req->wq);
+ atomic_set(&req->refs, (1));
+ req->status = RQ_IDLE;
+
+ return req;
+}
+
+static void ncp_req_get(struct ncp_request_reply *req)
+{
+ atomic_inc(&req->refs);
+}
+
+static void ncp_req_put(struct ncp_request_reply *req)
+{
+ if (atomic_dec_and_test(&req->refs))
+ kfree(req);
+}
+
void ncp_tcp_data_ready(struct sock *sk, int len)
{
struct ncp_server *server = sk->sk_user_data;
@@ -106,6 +133,7 @@ static inline void ncp_finish_request(struct
ncp_request_reply *req, int result)
req->result = result;
req->status = RQ_DONE;
wake_up_all(&req->wq);
+ ncp_req_put(req);
}
static void __abort_ncp_connection(struct ncp_server *server, struct
ncp_request_reply *aborted, int err)
@@ -308,6 +336,7 @@ static int ncp_add_request(struct ncp_server
*server, struct ncp_request_reply *
printk(KERN_ERR "ncpfs: tcp: Server died\n");
return -EIO;
}
+ ncp_req_get(req);
if (server->tx.creq || server->rcv.creq) {
req->status = RQ_QUEUED;
list_add_tail(&req->req, &server->tx.requests);
@@ -478,12 +507,6 @@ void ncpdgram_timeout_proc(struct work_struct *work)
mutex_unlock(&server->rcv.creq_mutex);
}
-static inline void ncp_init_req(struct ncp_request_reply* req)
-{
- init_waitqueue_head(&req->wq);
- req->status = RQ_IDLE;
-}
-
static int do_tcp_rcv(struct ncp_server *server, void *buffer, size_t len)
{
int result;
@@ -678,25 +701,32 @@ static int do_ncp_rpc_call(struct ncp_server
*server, int size,
struct ncp_reply_header* reply_buf, int max_reply_size)
{
int result;
- struct ncp_request_reply req;
-
- ncp_init_req(&req);
- req.reply_buf = reply_buf;
- req.datalen = max_reply_size;
- req.tx_iov[1].iov_base = server->packet;
- req.tx_iov[1].iov_len = size;
- req.tx_iovlen = 1;
- req.tx_totallen = size;
- req.tx_type = *(u_int16_t*)server->packet;
-
- result = ncp_add_request(server, &req);
+ struct ncp_request_reply *req;
+
+ req = ncp_alloc_req();
+ if (!req)
+ return -ENOMEM;
+
+ req->reply_buf = reply_buf;
+ req->datalen = max_reply_size;
+ req->tx_iov[1].iov_base = server->packet;
+ req->tx_iov[1].iov_len = size;
+ req->tx_iovlen = 1;
+ req->tx_totallen = size;
+ req->tx_type = *(u_int16_t*)server->packet;
+
+ result = ncp_add_request(server, req);
if (result < 0) {
return result;
}
- if (wait_event_interruptible(req.wq, req.status == RQ_DONE)) {
- ncp_abort_request(server, &req, -EIO);
- }
- return req.result;
+ if (wait_event_interruptible(req->wq, req->status == RQ_DONE))
+ result = -EINTR;
+ else
+ result = req->result;
+
+ ncp_req_put(req);
+
+ return result;
}
/*
@@ -751,11 +781,6 @@ static int ncp_do_request(struct ncp_server
*server, int size,
DDPRINTK("do_ncp_rpc_call returned %d\n", result);
- if (result < 0) {
- /* There was a problem with I/O, so the connections is
- * no longer usable. */
- ncp_invalidate_conn(server);
- }
return result;
}
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
ping!
Pierre Ossman wrote:
> Ok... how about this baby instead. I've replaced the stack allocated
> request structure by one allocated with kmalloc() and reference counted
> using an atomic_t. I couldn't see anything else that was associated to
> the process, so I believe this should suffice.
>
> (This is just a RFC. Once I get an ok from you I'll put together a more
> proper patch mail)
>
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Ok... how about this baby instead. I've replaced the stack allocated
> request structure by one allocated with kmalloc() and reference counted
> using an atomic_t. I couldn't see anything else that was associated to
> the process, so I believe this should suffice.
>
> (This is just a RFC. Once I get an ok from you I'll put together a more
> proper patch mail)
>
> - req.tx_type = *(u_int16_t*)server->packet;
> -
> - result = ncp_add_request(server, &req);
> + struct ncp_request_reply *req;
> +
> + req = ncp_alloc_req();
> + if (!req)
> + return -ENOMEM;
> +
> + req->reply_buf = reply_buf;
> + req->datalen = max_reply_size;
> + req->tx_iov[1].iov_base = server->packet;
> + req->tx_iov[1].iov_len = size;
> + req->tx_iovlen = 1;
> + req->tx_totallen = size;
> + req->tx_type = *(u_int16_t*)server->packet;
Problem is with these pointers - reply_buf & server->packet. Now code
will just read packet from server->packet, and write result to
reply_buf, most probably transmiting some random data to network, and
overwriting innocent memory on receiption... I believe that you need to
make copies of server->packet/size for transmission, and some simillar
solution for receive as well. As both request & response can be up to
~66000 bytes.
Petr
Petr Vandrovec wrote:
>
> Problem is with these pointers - reply_buf & server->packet. Now code
> will just read packet from server->packet, and write result to
> reply_buf, most probably transmiting some random data to network, and
> overwriting innocent memory on receiption... I believe that you need
> to make copies of server->packet/size for transmission, and some
> simillar solution for receive as well. As both request & response can
> be up to ~66000 bytes.
Hmm.. I thought server->packet was protected until the packet was
completed, independent of the process that issued it. Looking closer I
see that this isn't quite the case.
How about this... We allocate two buffers at startup, one for outgoing
and one for incoming. Then we use these during the actual transmission,
copying back and forth as need. Then we just need to avoid the final
response copy if the process has gone belly up.
Now my next question in that case is, what is the purpose of
server->packet. Couldn't this buffer be provided by the caller like the
response buffer?
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Petr Vandrovec wrote:
>> Problem is with these pointers - reply_buf & server->packet. Now code
>> will just read packet from server->packet, and write result to
>> reply_buf, most probably transmiting some random data to network, and
>> overwriting innocent memory on receiption... I believe that you need
>> to make copies of server->packet/size for transmission, and some
>> simillar solution for receive as well. As both request & response can
>> be up to ~66000 bytes.
>
> Hmm.. I thought server->packet was protected until the packet was
> completed, independent of the process that issued it. Looking closer I
> see that this isn't quite the case.
>
> How about this... We allocate two buffers at startup, one for outgoing
> and one for incoming. Then we use these during the actual transmission,
> copying back and forth as need. Then we just need to avoid the final
> response copy if the process has gone belly up.
You must not allow anybody to reuse transmit buffer until you are done
with all retransmits and received reply from server... That's why code
uses same buffer for both request and reply - you never need both, and
as maximum size is more or less same for both (65KB), it avoid problem
that you would need two 65KB buffers in worst case.
> Now my next question in that case is, what is the purpose of
> server->packet. Couldn't this buffer be provided by the caller like the
> response buffer?
Then you would need to do vmalloc (or maybe kmalloc for some cases) on
each request transmit & receive. And only very few callers provide
receive buffer - most of them does ncp_request() which receives reply to
server->packet, without doing any additional allocation - there are only
three callers which use ncp_request2 - two of them (ioctl,
ncp_read_bounce) do that because copy_to_user is not allowed while
ncp_server is locked, and third one (search for file set) does that
because caller may need to issue additional NCP calls while parsing its
result. But everybody else gets away with no memory allocation.
Petr
Sorry this took so long but I got occupied with other things and this
had to move down the pile a bit.
New patch which uses dedicated buffers for the currently active packet.
Also adds a new state RQ_ABANDONED which basically means "the caller no
longer cares about this request so the pointers are no longer valid". It
is used to determine if the global receive buffer should be copied to
the provided one upon completion.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
Pierre Ossman wrote:
> Sorry this took so long but I got occupied with other things and this
> had to move down the pile a bit.
>
> New patch which uses dedicated buffers for the currently active packet.
> Also adds a new state RQ_ABANDONED which basically means "the caller no
> longer cares about this request so the pointers are no longer valid". It
> is used to determine if the global receive buffer should be copied to
> the provided one upon completion.
Hello,
it would be nice if these two copies (request->txbuf, and
rxbuf->reply) could be eliminated, but I see no easy way how to do that...
> commit 166fb223f9507431fb97820549e3e41980987445
> Author: Pierre Ossman <[email protected]>
> Date: Mon Feb 19 11:34:43 2007 +0100
>
> ncpfs: make sure server connection survives a kill
>
> Use internal buffers instead of the ones supplied by the caller
> so that a caller can be interrupted without having to abort the
> entire ncp connection.
>
> Signed-off-by: Pierre Ossman <[email protected]>
Acked-by: Petr Vandrovec <[email protected]>
(modulo one thing below)
> diff --git a/include/linux/ncp_fs_sb.h b/include/linux/ncp_fs_sb.h
> index a503052..d5e7ffc 100644
> --- a/include/linux/ncp_fs_sb.h
> +++ b/include/linux/ncp_fs_sb.h
> @@ -50,6 +50,8 @@ struct ncp_server {
> int packet_size;
> unsigned char *packet; /* Here we prepare requests and
> receive replies */
> + unsigned char *txbuf; /* Storage for current requres */
Looks like a typo? requres => request ?
> + unsigned char *rxbuf; /* Storage for reply to current request */
>
> int lock; /* To prevent mismatch in protocols. */
> struct mutex mutex;
Petr
Petr Vandrovec wrote:
>
> Hello,
> it would be nice if these two copies (request->txbuf, and
> rxbuf->reply) could be eliminated, but I see no easy way how to do
> that...
>
At least we have the basic functionality now. One can start looking at
optimising it after that.
>
> Acked-by: Petr Vandrovec <[email protected]>
I'll send it on to Linus then.
>
> Looks like a typo? requres => request ?
>
Ooops. I'll fix that. :)
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org