2014-02-03 01:15:18

by David Fries

[permalink] [raw]
Subject: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

I could submit these patches as in, which would require the previous
set, or I could merge the documentation into the previous set and
resubmit them all since they haven't made it into the kernel tree yet.
Opinions?

Here's a small refcnt fix, skipping sending non-error messages, and
documentation and comment updates.

non-error error messages:
Currently every master or slave command is sending a response with
w1_netlink_send_error no matter if there is an error or not. This
makes commands like list slaves W1_CMD_LIST_SLAVES or W1_CMD_READ
return two messages, one with data and one without. That is a problem
with the list slaves because they are identical except for one having
data and one not, and since there could be no slaves known to the
kernel you can't just discard the no data case, unless the program
were to expect two replies. So I propose only sending the error reply
if there is an error, in which case there wouldn't be a normal reply
(such as read). This would mean commands like write would no longer
return a response unless there was an error. If an application wanted
to verify the kernel received the write message it could follow it by
a read to verify the data or just that read came after write and had a
response so write must have completed without error. I think it is
safe to do away with the extra replies. If someone sees a big enough
need for this, I could modify it so all commands return one response,
with commands like write always calling send error even if there
wasn't one.

It seems the way I read Documentation/connector/connector.txt not all
requests (from user space), must have a reply, and it does say
delivery isn't guaranteed as memory pressure can discard them, so I
think it's safe to drop the no error sending.

refcnt:
hub 7-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
which is where the ds2490 was attached, which it would normally
recover just fine, except this time the kernel just printed,
w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become free: refcnt=2.
and I had to reboot. It hasn't been reproduced by removing and
replugging the ds2490. In digging through the code I did find a
refcnt leak that could cause the problem. I verified it leaked a
refcnt by modifying my program to send a command without data, and
verified it is fixed after. I wouldn't have thought my program would
have sent an incomplete request, so I don't know what actually caused
the original refcnt problem, so it may still be there.


2014-02-03 01:15:31

by David Fries

[permalink] [raw]
Subject: [PATCH 4/4] w1: update cn_netlink_send documentation

Signed-off-by: David Fries <[email protected]>
---
Documentation/connector/connector.txt | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt
index e5c5f5e..9bdfc1a 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -24,7 +24,7 @@ netlink based networking for inter-process communication in a significantly
easier way:

int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
-void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask);
+void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);

struct cb_id
{
@@ -71,15 +71,19 @@ void cn_del_callback(struct cb_id *id);
struct cb_id *id - unique connector's user identifier.


-int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int gfp_mask);

Sends message to the specified groups. It can be safely called from
softirq context, but may silently fail under strong memory pressure.
If there are no listeners for given group -ESRCH can be returned.

struct cn_msg * - message header(with attached data).
+ u32 port - destination port.
+ If non-zero the message will be sent to the
+ given port, which should be set to the
+ original sender.
u32 __group - destination group.
- If __group is zero, then appropriate group will
+ If port and __group is zero, then appropriate group will
be searched through all registered connector users,
and message will be delivered to the group which was
created for user with the same ID as in msg.
--
1.7.10.4

2014-02-03 01:17:27

by David Fries

[permalink] [raw]
Subject: [PATCH 2/4] w1: only send_error when there is an error

Otherwise there's an extra reply being sent out for each async
message. Some commands such as W1_CMD_LIST_SLAVES will be identical
except one message has data and the other doesn't making it difficult
for a program to know if all the slaves just vanished or what happened.

Signed-off-by: David Fries <[email protected]>
---
drivers/w1/w1_netlink.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index a02704a..a6962f5 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -423,9 +423,11 @@ static void w1_process_cb(struct w1_master *dev, struct w1_async_cmd *async_cmd)
err = w1_process_command_master(dev, &node->block->msg,
node->m, cmd);

- w1_netlink_send_error(&node->block->msg, node->m, cmd,
- node->block->portid, err);
- err = 0;
+ if (err) {
+ w1_netlink_send_error(&node->block->msg, node->m, cmd,
+ node->block->portid, err);
+ err = 0;
+ }

cmd_data += cmd->len + sizeof(struct w1_netlink_cmd);
mlen -= cmd->len + sizeof(struct w1_netlink_cmd);
--
1.7.10.4

2014-02-03 01:17:35

by David Fries

[permalink] [raw]
Subject: [PATCH 1/4] w1: fix netlink refcnt leak on error path

If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
is taken when searching for the slave or master device. If there
isn't any following data m->len (mlen is a copy) is 0 and packing up
the message for later execution is skipped leaving nothing to
decrement the reference counts.

Way back when, m->len was checked before the search that increments the
reference count, but W1_LIST_MASTERS has no additional data, the check
was moved in 9be62e0b2fadaf5ff causing this bug.

This change reorders to put the check before the reference count is
incremented avoiding the problem.

Signed-off-by: David Fries <[email protected]>
---
drivers/w1/w1_netlink.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 5234964..a02704a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg,
struct w1_netlink_msg *w;
u32 *id;

- if (mcmd->type != W1_LIST_MASTERS) {
- printk(KERN_NOTICE "%s: msg: %x.%x, wrong type: %u, len: %u.\n",
- __func__, msg->id.idx, msg->id.val, mcmd->type, mcmd->len);
- return -EPROTO;
- }
-
cn = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!cn)
return -ENOMEM;
@@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct w1_async_cmd *async_cmd)
w1_netlink_send_error(&node->block->msg, node->m, cmd,
node->block->portid, err);

+ /* ref taken in w1_search_slave or w1_search_master_id when building
+ * the block
+ */
if (sl)
w1_unref_slave(sl);
else
@@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)

msg_len = msg->len;
while (msg_len && !err) {
- struct w1_reg_num id;
- u16 mlen = m->len;

dev = NULL;
sl = NULL;

- memcpy(&id, m->id.id, sizeof(id));
-#if 0
- printk("%s: %02x.%012llx.%02x: type=%02x, len=%u.\n",
- __func__, id.family, (unsigned long long)id.id, id.crc, m->type, m->len);
-#endif
if (m->len + sizeof(struct w1_netlink_msg) > msg_len) {
err = -E2BIG;
break;
}

+ /* execute on this thread, no need to process later */
+ if (m->type == W1_LIST_MASTERS) {
+ err = w1_process_command_root(msg, m, nsp->portid);
+ goto out_cont;
+ }
+
+ /* All following message types require additional data,
+ * check here before references are taken.
+ */
+ if (!m->len) {
+ err = -EPROTO;
+ goto out_cont;
+ }
+
+ /* both search calls take reference counts */
if (m->type == W1_MASTER_CMD) {
dev = w1_search_master_id(m->id.mst.id);
} else if (m->type == W1_SLAVE_CMD) {
- sl = w1_search_slave(&id);
+ sl = w1_search_slave((struct w1_reg_num *)m->id.id);
if (sl)
dev = sl->master;
} else {
- err = w1_process_command_root(msg, m, nsp->portid);
+ printk(KERN_NOTICE
+ "%s: msg: %x.%x, wrong type: %u, len: %u.\n",
+ __func__, msg->id.idx, msg->id.val,
+ m->type, m->len);
+ err = -EPROTO;
goto out_cont;
}

@@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
}

err = 0;
- if (!mlen)
- goto out_cont;

atomic_inc(&block->refcnt);
node->async.cb = w1_process_cb;
@@ -557,7 +564,8 @@ out_cont:
if (err)
w1_netlink_send_error(msg, m, NULL, nsp->portid, err);
msg_len -= sizeof(struct w1_netlink_msg) + m->len;
- m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct w1_netlink_msg) + m->len);
+ m = (struct w1_netlink_msg *)(((u8 *)m) +
+ sizeof(struct w1_netlink_msg) + m->len);

/*
* Let's allow requests for nonexisting devices.
--
1.7.10.4

2014-02-03 01:17:52

by David Fries

[permalink] [raw]
Subject: [PATCH 3/4] w1: document struct w1_netlink_msg and struct w1_netlink_cmd

I wasn't sure on the length, so I looked it up and documented it.

Signed-off-by: David Fries <[email protected]>
---
drivers/w1/w1_netlink.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/drivers/w1/w1_netlink.h b/drivers/w1/w1_netlink.h
index 1e9504e..c646a98 100644
--- a/drivers/w1/w1_netlink.h
+++ b/drivers/w1/w1_netlink.h
@@ -49,6 +49,19 @@ enum w1_netlink_message_types {
W1_LIST_MASTERS,
};

+/**
+ * struct w1_netlink_msg - holds w1 message type, id, and result
+ *
+ * @type: one of enum w1_netlink_message_types
+ * @status: kernel feedback for success 0 or errno failure value
+ * @len: length of data following w1_netlink_msg
+ * @id: union holding master bus id (msg.id) and slave device id (id[8]).
+ * @data: start address of any following data
+ *
+ * The base message structure for w1 messages over netlink.
+ * The netlink connector data sequence is, struct nlmsghdr, struct cn_msg,
+ * then one or more struct w1_netlink_msg (each with optional data).
+ */
struct w1_netlink_msg
{
__u8 type;
@@ -66,6 +79,7 @@ struct w1_netlink_msg

/**
* enum w1_commands - commands available for master or slave operations
+ *
* @W1_CMD_READ: read len bytes
* @W1_CMD_WRITE: write len bytes
* @W1_CMD_SEARCH: initiate a standard search, returns only the slave
@@ -93,6 +107,17 @@ enum w1_commands {
W1_CMD_MAX
};

+/**
+ * struct w1_netlink_cmd - holds the command and data
+ *
+ * @cmd: one of enum w1_commands
+ * @res: reserved
+ * @len: length of data following w1_netlink_cmd
+ * @data: start address of any following data
+ *
+ * One or more struct w1_netlink_cmd is placed starting at w1_netlink_msg.data
+ * each with optional data.
+ */
struct w1_netlink_cmd
{
__u8 cmd;
--
1.7.10.4

2014-02-03 23:59:43

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

Hi

03.02.2014, 05:15, "David Fries" <[email protected]>:

> ?I could submit these patches as in, which would require the previous
> ?set, or I could merge the documentation into the previous set and
> ?resubmit them all since they haven't made it into the kernel tree yet.
> ?Opinions?
>
> ?Here's a small refcnt fix, skipping sending non-error messages, and
> ?documentation and comment updates.
>
> ?non-error error messages:
> ?Currently every master or slave command is sending a response with
> ?w1_netlink_send_error no matter if there is an error or not. ?This
> ?makes commands like list slaves W1_CMD_LIST_SLAVES or W1_CMD_READ
> ?return two messages, one with data and one without. ?That is a problem
> ?with the list slaves because they are identical except for one having
> ?data and one not, and since there could be no slaves known to the
> ?kernel you can't just discard the no data case, unless the program
> ?were to expect two replies. ?So I propose only sending the error reply
> ?if there is an error, in which case there wouldn't be a normal reply
> ?(such as read). ?This would mean commands like write would no longer
> ?return a response unless there was an error. ?If an application wanted
> ?to verify the kernel received the write message it could follow it by
> ?a read to verify the data or just that read came after write and had a
> ?response so write must have completed without error. ?I think it is
> ?safe to do away with the extra replies. ?If someone sees a big enough
> ?need for this, I could modify it so all commands return one response,
> ?with commands like write always calling send error even if there
> ?wasn't one.

I created this protocol to handle cases like nothing is returned, but yet userspace knows
operations has been completed. Also, you can not really change it at this time - there are
already userspace application which may depend on the last ack to find out its request completed.

Reference counter fix is correct, please submit it in the separate patch.

2014-02-04 05:51:50

by David Fries

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

On Tue, Feb 04, 2014 at 03:59:38AM +0400, [email protected] wrote:
> Hi
>
> 03.02.2014, 05:15, "David Fries" <[email protected]>:
>
> > ?I could submit these patches as in, which would require the previous
> > ?set, or I could merge the documentation into the previous set and
> > ?resubmit them all since they haven't made it into the kernel tree yet.
> > ?Opinions?
> >
> > ?Here's a small refcnt fix, skipping sending non-error messages, and
> > ?documentation and comment updates.
> >
> > ?non-error error messages:
> > ?Currently every master or slave command is sending a response with
> > ?w1_netlink_send_error no matter if there is an error or not. ?This
> > ?makes commands like list slaves W1_CMD_LIST_SLAVES or W1_CMD_READ
> > ?return two messages, one with data and one without. ?That is a problem
> > ?with the list slaves because they are identical except for one having
> > ?data and one not, and since there could be no slaves known to the
> > ?kernel you can't just discard the no data case, unless the program
> > ?were to expect two replies. ?So I propose only sending the error reply
> > ?if there is an error, in which case there wouldn't be a normal reply
> > ?(such as read). ?This would mean commands like write would no longer
> > ?return a response unless there was an error. ?If an application wanted
> > ?to verify the kernel received the write message it could follow it by
> > ?a read to verify the data or just that read came after write and had a
> > ?response so write must have completed without error. ?I think it is
> > ?safe to do away with the extra replies. ?If someone sees a big enough
> > ?need for this, I could modify it so all commands return one response,
> > ?with commands like write always calling send error even if there
> > ?wasn't one.
>
> I created this protocol to handle cases like nothing is returned, but yet userspace knows
> operations has been completed. Also, you can not really change it at this time - there are
> already userspace application which may depend on the last ack to find out its request completed.
>
> Reference counter fix is correct, please submit it in the separate patch.

Help me understand what the protocol is supposed to be. Assuming
there aren't any errors, is there supposed to be a
w1_netlink_send_error generated reply per netlink packet (cn_msg), per
w1_netlink_msg, or per w1_netlink_cmd?

What about the cn_msg seq and ack values? I assume the kernel
response should carry the same seq number as the request, but what
should the ack be set to?

--
David Fries <[email protected]> PGP pub CB1EE8F0
http://fries.net/~david/

2014-02-04 23:59:01

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

Hi

04.02.2014, 09:51, "David Fries" <[email protected]>:
> Help me understand what the protocol is supposed to be. ?Assuming
> there aren't any errors, is there supposed to be a
> w1_netlink_send_error generated reply per netlink packet (cn_msg), per
> w1_netlink_msg, or per w1_netlink_cmd?

reply should be sent per cmd to specify each command status
If there is no cmd in request or we didn't get to it (like failed to reset device), we should send error.

Depending on how w1-msg + (optional) w1-cmd are packed, client can detect what exact error happend

> What about the cn_msg seq and ack values? ?I assume the kernel
> response should carry the same seq number as the request, but what
> should the ack be set to?

reply ack is seq + 1
seq is the same to highlight request it belongs to

2014-02-07 06:00:46

by David Fries

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

On Wed, Feb 05, 2014 at 03:48:45AM +0400, [email protected] wrote:
> Hi
>
> 04.02.2014, 09:51, "David Fries" <[email protected]>:
> > Help me understand what the protocol is supposed to be. ?Assuming
> > there aren't any errors, is there supposed to be a
> > w1_netlink_send_error generated reply per netlink packet (cn_msg), per
> > w1_netlink_msg, or per w1_netlink_cmd?
>
> reply should be sent per cmd to specify each command status
> If there is no cmd in request or we didn't get to it (like failed to reset device), we should send error.
>
> Depending on how w1-msg + (optional) w1-cmd are packed, client can detect what exact error happend
>
> > What about the cn_msg seq and ack values? ?I assume the kernel
> > response should carry the same seq number as the request, but what
> > should the ack be set to?
>
> reply ack is seq + 1
> seq is the same to highlight request it belongs to

Here's a patch to implement that. Is this what you have in mind?

>From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
From: David Fries <[email protected]>
Date: Thu, 6 Feb 2014 23:45:05 -0600
Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1

Netlink messages sent from the kernel consists of kernel generated
notifications for adds or removes, the error message (also indicates
the message has been processed), and the messages that have data to
return. The cn_msg ack is left alone for the first two, and when
returning data it is the sequence number + 1. Modifying the code to
the protocol standard.

Signed-off-by: David Fries <[email protected]>
---
Documentation/connector/connector.txt | 2 +-
drivers/w1/w1_netlink.c | 6 +-----
2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt
index 9bdfc1a..0bc3522 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -115,7 +115,7 @@ acknowledge number MUST be the same + 1.
If we receive a message and its sequence number is not equal to one we
are expecting, then it is a new message. If we receive a message and
its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
message + 1, then it is a new message.

Obviously, the protocol header contains the above id.
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 3c81689..d98b550 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -59,7 +59,6 @@ static void w1_send_slave(struct w1_master *dev, u64 rn)
avail = dev->priv_size - cmd->len;

if (avail < 8) {
- msg->ack++;
cn_netlink_send(msg, dev->portid, 0, GFP_KERNEL);

msg->len = sizeof(struct w1_netlink_msg) +
@@ -130,7 +129,6 @@ static int w1_get_slaves(struct w1_master *dev,
W1_ALARM_SEARCH : W1_SEARCH, w1_found_send_slave);
}

- msg->ack = 0;
cn_netlink_send(msg, dev->portid, 0, GFP_KERNEL);

dev->priv = NULL;
@@ -308,7 +306,7 @@ static int w1_process_command_root(struct cn_msg *msg,
cn->id.val = CN_W1_VAL;

cn->seq = msg->seq;
- cn->ack = 1;
+ cn->ack = msg->seq + 1;
cn->len = sizeof(struct w1_netlink_msg);
w = (struct w1_netlink_msg *)(cn + 1);

@@ -321,7 +319,6 @@ static int w1_process_command_root(struct cn_msg *msg,
list_for_each_entry(m, &w1_masters, w1_master_entry) {
if (cn->len + sizeof(*id) > PAGE_SIZE - sizeof(struct cn_msg)) {
cn_netlink_send(cn, portid, 0, GFP_KERNEL);
- cn->ack++;
cn->len = sizeof(struct w1_netlink_msg);
w->len = 0;
id = (u32 *)(w + 1);
@@ -332,7 +329,6 @@ static int w1_process_command_root(struct cn_msg *msg,
cn->len += sizeof(*id);
id++;
}
- cn->ack = 0;
cn_netlink_send(cn, portid, 0, GFP_KERNEL);
mutex_unlock(&w1_mlock);

--
1.7.10.4

2014-02-07 21:23:47

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

Hi

07.02.2014, 10:00, "David Fries" <[email protected]>:

> Here's a patch to implement that. ?Is this what you have in mind?
>
> From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
> From: David Fries <[email protected]>
> Date: Thu, 6 Feb 2014 23:45:05 -0600
> Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1
>
> Netlink messages sent from the kernel consists of kernel generated
> notifications for adds or removes, the error message (also indicates
> the message has been processed), and the messages that have data to
> return. ?The cn_msg ack is left alone for the first two, and when
> returning data it is the sequence number + 1. ?Modifying the code to
> the protocol standard.
>
> Signed-off-by: David Fries <[email protected]>

Yes, it looks right.
Can you also check that protocol documentation is correct?

2014-02-07 22:25:57

by David Fries

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

On Sat, Feb 08, 2014 at 01:23:43AM +0400, [email protected] wrote:
> Hi
>
> 07.02.2014, 10:00, "David Fries" <[email protected]>:
>
> > Here's a patch to implement that. ?Is this what you have in mind?
> >
> > From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
> > From: David Fries <[email protected]>
> > Date: Thu, 6 Feb 2014 23:45:05 -0600
> > Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1
> >
> > Netlink messages sent from the kernel consists of kernel generated
> > notifications for adds or removes, the error message (also indicates
> > the message has been processed), and the messages that have data to
> > return. ?The cn_msg ack is left alone for the first two, and when
> > returning data it is the sequence number + 1. ?Modifying the code to
> > the protocol standard.
> >
> > Signed-off-by: David Fries <[email protected]>
>
> Yes, it looks right.
> Can you also check that protocol documentation is correct?

Documentation/connector/connector.txt ? I found it a little unclear,
I'll see what I can do.

--
David Fries <[email protected]> PGP pub CB1EE8F0
http://fries.net/~david/

2014-02-07 22:35:40

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs



08.02.2014, 02:25, "David Fries" <[email protected]>:

>> ?Can you also check that protocol documentation is correct?
>
> Documentation/connector/connector.txt ? ?I found it a little unclear,
> I'll see what I can do.

No, I meant Documentation/w1