2018-02-07 22:25:11

by Simon Gaiser

[permalink] [raw]
Subject: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
concurrent xenstore accesses") made a subtle change to the semantic of
xenbus_dev_request_and_reply() and xenbus_transaction_end().

Before on an error response to XS_TRANSACTION_END
xenbus_dev_request_and_reply() would not decrement the active
transaction counter. But xenbus_transaction_end() has always counted the
transaction as finished regardless of the response.

The new behavior is that xenbus_dev_request_and_reply() and
xenbus_transaction_end() will always count the transaction as finished
regardless the response code (handled in xs_request_exit()).

But xenbus_dev_frontend tries to end a transaction on closing of the
device if the XS_TRANSACTION_END failed before. Trying to close the
transaction twice corrupts the reference count. So fix this by also
considering a transaction closed if we have sent XS_TRANSACTION_END once
regardless of the return code.

Cc: <[email protected]> # 4.11
Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Signed-off-by: Simon Gaiser <[email protected]>
---
drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
index f3b089b7c0b6..d2edbc79384a 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -365,7 +365,7 @@ void xenbus_dev_queue_reply(struct xb_req_data *req)
if (WARN_ON(rc))
goto out;
}
- } else if (req->msg.type == XS_TRANSACTION_END) {
+ } else if (req->type == XS_TRANSACTION_END) {
trans = xenbus_get_transaction(u, req->msg.tx_id);
if (WARN_ON(!trans))
goto out;
--
2.15.1



2018-02-07 22:25:31

by Simon Gaiser

[permalink] [raw]
Subject: [PATCH 2/2] xen: xenbus: WARN_ON XS_TRANSACTION_{START,END} misuse

As the previous commit shows it's quite easy to confuse the transaction
reference counting by ending a transaction twice. So at least try to
detect and report it.

Signed-off-by: Simon Gaiser <[email protected]>
---
drivers/xen/xenbus/xenbus_xs.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 3e59590c7254..aed954b09b9b 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -137,11 +137,20 @@ static uint32_t xs_request_enter(struct xb_req_data *req)

void xs_request_exit(struct xb_req_data *req)
{
+ unsigned int users_old;
+
spin_lock(&xs_state_lock);
+ users_old = xs_state_users;
xs_state_users--;
if ((req->type == XS_TRANSACTION_START && req->msg.type == XS_ERROR) ||
req->type == XS_TRANSACTION_END)
xs_state_users--;
+ if (WARN_ON(xs_state_users > users_old))
+ /*
+ * Someone misused XS_TRANSACTION_{START,END}. Reset the
+ * reference counter so we might survive.
+ */
+ xs_state_users = 0;
spin_unlock(&xs_state_lock);

if (xs_suspend_active && !xs_state_users)
--
2.15.1


2018-02-10 16:55:04

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling



On 02/07/2018 05:22 PM, Simon Gaiser wrote:
> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
> concurrent xenstore accesses") made a subtle change to the semantic of
> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>
> Before on an error response to XS_TRANSACTION_END
> xenbus_dev_request_and_reply() would not decrement the active
> transaction counter. But xenbus_transaction_end() has always counted the
> transaction as finished regardless of the response.
>
> The new behavior is that xenbus_dev_request_and_reply() and
> xenbus_transaction_end() will always count the transaction as finished
> regardless the response code (handled in xs_request_exit()).
>
> But xenbus_dev_frontend tries to end a transaction on closing of the
> device if the XS_TRANSACTION_END failed before. Trying to close the
> transaction twice corrupts the reference count. So fix this by also
> considering a transaction closed if we have sent XS_TRANSACTION_END once
> regardless of the return code.
>
> Cc: <[email protected]> # 4.11
> Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
> Signed-off-by: Simon Gaiser <[email protected]>


Reviewed-by: Boris Ostrovsky <[email protected]>

(although I'd prefer Juergen to also take a look at this)

> ---
> drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
> index f3b089b7c0b6..d2edbc79384a 100644
> --- a/drivers/xen/xenbus/xenbus_dev_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
> @@ -365,7 +365,7 @@ void xenbus_dev_queue_reply(struct xb_req_data *req)
> if (WARN_ON(rc))
> goto out;
> }
> - } else if (req->msg.type == XS_TRANSACTION_END) {
> + } else if (req->type == XS_TRANSACTION_END) {
> trans = xenbus_get_transaction(u, req->msg.tx_id);
> if (WARN_ON(!trans))
> goto out;
>

2018-02-10 16:58:41

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH 2/2] xen: xenbus: WARN_ON XS_TRANSACTION_{START,END} misuse



On 02/07/2018 05:22 PM, Simon Gaiser wrote:
> As the previous commit shows it's quite easy to confuse the transaction
> reference counting by ending a transaction twice. So at least try to
> detect and report it.
>
> Signed-off-by: Simon Gaiser <[email protected]>
> ---
> drivers/xen/xenbus/xenbus_xs.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index 3e59590c7254..aed954b09b9b 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -137,11 +137,20 @@ static uint32_t xs_request_enter(struct xb_req_data *req)
>
> void xs_request_exit(struct xb_req_data *req)
> {
> + unsigned int users_old;
> +
> spin_lock(&xs_state_lock);
> + users_old = xs_state_users;
> xs_state_users--;
> if ((req->type == XS_TRANSACTION_START && req->msg.type == XS_ERROR) ||
> req->type == XS_TRANSACTION_END)
> xs_state_users--;
> + if (WARN_ON(xs_state_users > users_old))


WARN_ON_ONCE()?

-boris


> + /*
> + * Someone misused XS_TRANSACTION_{START,END}. Reset the
> + * reference counter so we might survive.
> + */
> + xs_state_users = 0;
> spin_unlock(&xs_state_lock);
>
> if (xs_suspend_active && !xs_state_users)
>

2018-02-11 01:29:43

by Simon Gaiser

[permalink] [raw]
Subject: Re: [PATCH 2/2] xen: xenbus: WARN_ON XS_TRANSACTION_{START,END} misuse

Boris Ostrovsky:
> On 02/07/2018 05:22 PM, Simon Gaiser wrote:
>> + users_old = xs_state_users;
>> xs_state_users--;
>> if ((req->type == XS_TRANSACTION_START && req->msg.type == XS_ERROR) ||
>> req->type == XS_TRANSACTION_END)
>> xs_state_users--;
>> + if (WARN_ON(xs_state_users > users_old))
>
>
> WARN_ON_ONCE()?

Since we "fix" the wrong decrement by clamping at zero it should not
happen immediately again. But if you prefer _ONCE I can change it.


Attachments:
signature.asc (849.00 B)
OpenPGP digital signature

2018-02-11 15:29:43

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH 2/2] xen: xenbus: WARN_ON XS_TRANSACTION_{START,END} misuse



On 02/10/2018 08:27 PM, Simon Gaiser wrote:
> Boris Ostrovsky:
>> On 02/07/2018 05:22 PM, Simon Gaiser wrote:
>>> + users_old = xs_state_users;
>>> xs_state_users--;
>>> if ((req->type == XS_TRANSACTION_START && req->msg.type == XS_ERROR) ||
>>> req->type == XS_TRANSACTION_END)
>>> xs_state_users--;
>>> + if (WARN_ON(xs_state_users > users_old))
>>
>>
>> WARN_ON_ONCE()?
>
> Since we "fix" the wrong decrement by clamping at zero it should not
> happen immediately again. But if you prefer _ONCE I can change it.
>


If this error can happen once then someone at least theoretically can
construct a case when it is repeated. So let's switch to _ONCE() variant.

Thanks.
-boris

2018-02-12 08:55:23

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

On 07/02/18 23:22, Simon Gaiser wrote:
> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
> concurrent xenstore accesses") made a subtle change to the semantic of
> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>
> Before on an error response to XS_TRANSACTION_END
> xenbus_dev_request_and_reply() would not decrement the active
> transaction counter. But xenbus_transaction_end() has always counted the
> transaction as finished regardless of the response.

Which is correct now. Xenstore will free all transaction related
data regardless of the response. A once failed transaction can't
be repaired, it has to be repeated completely.

The real problem is decrementing the counter when XS_TRANSACTION_END
for a non-existing transaction is being sent.

> The new behavior is that xenbus_dev_request_and_reply() and
> xenbus_transaction_end() will always count the transaction as finished
> regardless the response code (handled in xs_request_exit()).

ENOENT should not decrement the transaction counter, while all
other responses to XS_TRANSACTION_END should still do so.

> But xenbus_dev_frontend tries to end a transaction on closing of the
> device if the XS_TRANSACTION_END failed before. Trying to close the
> transaction twice corrupts the reference count. So fix this by also
> considering a transaction closed if we have sent XS_TRANSACTION_END once
> regardless of the return code.

A transaction in the list of transactions should not considered to be
finished. Either it is not on the list or it is still pending.

>
> Cc: <[email protected]> # 4.11
> Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
> Signed-off-by: Simon Gaiser <[email protected]>

So: your patch is a band-aid trying to cure the symptoms, but not the
real problem. Please do it properly.

NAK.


Juergen

2018-02-12 11:56:37

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

On 12/02/18 09:49, Juergen Gross wrote:
> On 07/02/18 23:22, Simon Gaiser wrote:
>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>> concurrent xenstore accesses") made a subtle change to the semantic of
>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>
>> Before on an error response to XS_TRANSACTION_END
>> xenbus_dev_request_and_reply() would not decrement the active
>> transaction counter. But xenbus_transaction_end() has always counted the
>> transaction as finished regardless of the response.
>
> Which is correct now. Xenstore will free all transaction related
> data regardless of the response. A once failed transaction can't
> be repaired, it has to be repeated completely.
>
> The real problem is decrementing the counter when XS_TRANSACTION_END
> for a non-existing transaction is being sent.
>
>> The new behavior is that xenbus_dev_request_and_reply() and
>> xenbus_transaction_end() will always count the transaction as finished
>> regardless the response code (handled in xs_request_exit()).
>
> ENOENT should not decrement the transaction counter, while all
> other responses to XS_TRANSACTION_END should still do so.

Sorry, I stand corrected: the ENOENT case should never happen, as this
case is tested in xenbus_write_transaction(). It doesn't hurt to test
for ENOENT, though.

What should be handled is EINVAL: this would happen if a user specified
a string different from "T" and "F".


Juergen

2018-02-20 04:57:56

by Simon Gaiser

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

Juergen Gross:
> On 07/02/18 23:22, Simon Gaiser wrote:
>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>> concurrent xenstore accesses") made a subtle change to the semantic of
>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>
>> Before on an error response to XS_TRANSACTION_END
>> xenbus_dev_request_and_reply() would not decrement the active
>> transaction counter. But xenbus_transaction_end() has always counted the
>> transaction as finished regardless of the response.
>
> Which is correct now. Xenstore will free all transaction related
> data regardless of the response. A once failed transaction can't
> be repaired, it has to be repeated completely.

So if xenstore frees the transaction why should we keep it in the list
with pending transaction in xenbus_dev_frontend? That's exactly what
this patch fixes by always removing it from the list, not only on a
successful response (See below for the EINVAL case).

[...]
>> But xenbus_dev_frontend tries to end a transaction on closing of the
>> device if the XS_TRANSACTION_END failed before. Trying to close the
>> transaction twice corrupts the reference count. So fix this by also
>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>> regardless of the return code.
>
> A transaction in the list of transactions should not considered to be
> finished. Either it is not on the list or it is still pending.

With "considering a transaction closed" I mean "take the code path which
removes the transaction from the list with pending transactions".

From the follow-up mail:
>>> The new behavior is that xenbus_dev_request_and_reply() and
>>> xenbus_transaction_end() will always count the transaction as finished
>>> regardless the response code (handled in xs_request_exit()).
>>
>> ENOENT should not decrement the transaction counter, while all
>> other responses to XS_TRANSACTION_END should still do so.
>
> Sorry, I stand corrected: the ENOENT case should never happen, as this
> case is tested in xenbus_write_transaction(). It doesn't hurt to test
> for ENOENT, though.
>
> What should be handled is EINVAL: this would happen if a user specified
> a string different from "T" and "F".

Ok, I will handle those cases in xs_request_exit(). Although I don't
like that this depends on the internals of xenstore (At least to me it's
not obvious why it should only return ENOENT or EINVAL in this case).

In the xenbus_write_transaction() case checking the string before
sending the transaction (like the transaction itself is verified) would
avoid this problem.


Attachments:
signature.asc (849.00 B)
OpenPGP digital signature

2018-03-02 15:51:19

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

On 20/02/18 05:56, Simon Gaiser wrote:
> Juergen Gross:
>> On 07/02/18 23:22, Simon Gaiser wrote:
>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>
>>> Before on an error response to XS_TRANSACTION_END
>>> xenbus_dev_request_and_reply() would not decrement the active
>>> transaction counter. But xenbus_transaction_end() has always counted the
>>> transaction as finished regardless of the response.
>>
>> Which is correct now. Xenstore will free all transaction related
>> data regardless of the response. A once failed transaction can't
>> be repaired, it has to be repeated completely.
>
> So if xenstore frees the transaction why should we keep it in the list
> with pending transaction in xenbus_dev_frontend? That's exactly what
> this patch fixes by always removing it from the list, not only on a
> successful response (See below for the EINVAL case).

Aah, sorry, I seem to have misread my own coding. :-(

Yes, you are right. Sorry for not seeing it before.

>
> [...]
>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>> transaction twice corrupts the reference count. So fix this by also
>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>> regardless of the return code.
>>
>> A transaction in the list of transactions should not considered to be
>> finished. Either it is not on the list or it is still pending.
>
> With "considering a transaction closed" I mean "take the code path which
> removes the transaction from the list with pending transactions".
>
> From the follow-up mail:
>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>> xenbus_transaction_end() will always count the transaction as finished
>>>> regardless the response code (handled in xs_request_exit()).
>>>
>>> ENOENT should not decrement the transaction counter, while all
>>> other responses to XS_TRANSACTION_END should still do so.
>>
>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>> for ENOENT, though.
>>
>> What should be handled is EINVAL: this would happen if a user specified
>> a string different from "T" and "F".
>
> Ok, I will handle those cases in xs_request_exit(). Although I don't
> like that this depends on the internals of xenstore (At least to me it's
> not obvious why it should only return ENOENT or EINVAL in this case).
>
> In the xenbus_write_transaction() case checking the string before
> sending the transaction (like the transaction itself is verified) would
> avoid this problem.

Right. I'd prefer this solution.

Remains the only problem you tried to tackle with your second patch: a
kernel driver going crazy and ending transactions it never started (or
ending them multiple times). The EINVAL case can't happen here, but
ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
need to keep track of the transactions like in the user interface and
refuse ending an unknown transaction. Or you trust the kernel users.
Trying to fix the usage counter seems to be the wrong approach IMO.


Juergen

2018-03-02 15:57:45

by Simon Gaiser

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

Juergen Gross:
> On 20/02/18 05:56, Simon Gaiser wrote:
>> Juergen Gross:
>>> On 07/02/18 23:22, Simon Gaiser wrote:
>>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>>
>>>> Before on an error response to XS_TRANSACTION_END
>>>> xenbus_dev_request_and_reply() would not decrement the active
>>>> transaction counter. But xenbus_transaction_end() has always counted the
>>>> transaction as finished regardless of the response.
>>>
>>> Which is correct now. Xenstore will free all transaction related
>>> data regardless of the response. A once failed transaction can't
>>> be repaired, it has to be repeated completely.
>>
>> So if xenstore frees the transaction why should we keep it in the list
>> with pending transaction in xenbus_dev_frontend? That's exactly what
>> this patch fixes by always removing it from the list, not only on a
>> successful response (See below for the EINVAL case).
>
> Aah, sorry, I seem to have misread my own coding. :-(
>
> Yes, you are right. Sorry for not seeing it before.
>
>>
>> [...]
>>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>>> transaction twice corrupts the reference count. So fix this by also
>>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>>> regardless of the return code.
>>>
>>> A transaction in the list of transactions should not considered to be
>>> finished. Either it is not on the list or it is still pending.
>>
>> With "considering a transaction closed" I mean "take the code path which
>> removes the transaction from the list with pending transactions".
>>
>> From the follow-up mail:
>>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>>> xenbus_transaction_end() will always count the transaction as finished
>>>>> regardless the response code (handled in xs_request_exit()).
>>>>
>>>> ENOENT should not decrement the transaction counter, while all
>>>> other responses to XS_TRANSACTION_END should still do so.
>>>
>>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>>> for ENOENT, though.
>>>
>>> What should be handled is EINVAL: this would happen if a user specified
>>> a string different from "T" and "F".
>>
>> Ok, I will handle those cases in xs_request_exit(). Although I don't
>> like that this depends on the internals of xenstore (At least to me it's
>> not obvious why it should only return ENOENT or EINVAL in this case).
>>
>> In the xenbus_write_transaction() case checking the string before
>> sending the transaction (like the transaction itself is verified) would
>> avoid this problem.
>
> Right. I'd prefer this solution.
>
> Remains the only problem you tried to tackle with your second patch: a
> kernel driver going crazy and ending transactions it never started (or
> ending them multiple times). The EINVAL case can't happen here, but
> ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
> need to keep track of the transactions like in the user interface and
> refuse ending an unknown transaction. Or you trust the kernel users.
> Trying to fix the usage counter seems to be the wrong approach IMO.

The point of the second patch was to detect such bugs. This would have
saved quite some time to find this bug. I added the "fix" of the counter
I just because it was trivial after having the if there.

Adding tracking seems to be a quite complex solution for a _potential_
problem.

So I would go with checking ENOENT in xs_request_exit(). Should this be
WARN_ON_ONCE()? Since this normally should not happen I would say yes.

Should I keep the reference counter sanity check? And if yes, with the
"fix" to the counter?


Attachments:
signature.asc (849.00 B)
OpenPGP digital signature

2018-03-02 16:05:05

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH 1/2] xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

On 02/03/18 15:58, Simon Gaiser wrote:
> Juergen Gross:
>> On 20/02/18 05:56, Simon Gaiser wrote:
>>> Juergen Gross:
>>>> On 07/02/18 23:22, Simon Gaiser wrote:
>>>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>>>
>>>>> Before on an error response to XS_TRANSACTION_END
>>>>> xenbus_dev_request_and_reply() would not decrement the active
>>>>> transaction counter. But xenbus_transaction_end() has always counted the
>>>>> transaction as finished regardless of the response.
>>>>
>>>> Which is correct now. Xenstore will free all transaction related
>>>> data regardless of the response. A once failed transaction can't
>>>> be repaired, it has to be repeated completely.
>>>
>>> So if xenstore frees the transaction why should we keep it in the list
>>> with pending transaction in xenbus_dev_frontend? That's exactly what
>>> this patch fixes by always removing it from the list, not only on a
>>> successful response (See below for the EINVAL case).
>>
>> Aah, sorry, I seem to have misread my own coding. :-(
>>
>> Yes, you are right. Sorry for not seeing it before.
>>
>>>
>>> [...]
>>>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>>>> transaction twice corrupts the reference count. So fix this by also
>>>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>>>> regardless of the return code.
>>>>
>>>> A transaction in the list of transactions should not considered to be
>>>> finished. Either it is not on the list or it is still pending.
>>>
>>> With "considering a transaction closed" I mean "take the code path which
>>> removes the transaction from the list with pending transactions".
>>>
>>> From the follow-up mail:
>>>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>>>> xenbus_transaction_end() will always count the transaction as finished
>>>>>> regardless the response code (handled in xs_request_exit()).
>>>>>
>>>>> ENOENT should not decrement the transaction counter, while all
>>>>> other responses to XS_TRANSACTION_END should still do so.
>>>>
>>>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>>>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>>>> for ENOENT, though.
>>>>
>>>> What should be handled is EINVAL: this would happen if a user specified
>>>> a string different from "T" and "F".
>>>
>>> Ok, I will handle those cases in xs_request_exit(). Although I don't
>>> like that this depends on the internals of xenstore (At least to me it's
>>> not obvious why it should only return ENOENT or EINVAL in this case).
>>>
>>> In the xenbus_write_transaction() case checking the string before
>>> sending the transaction (like the transaction itself is verified) would
>>> avoid this problem.
>>
>> Right. I'd prefer this solution.
>>
>> Remains the only problem you tried to tackle with your second patch: a
>> kernel driver going crazy and ending transactions it never started (or
>> ending them multiple times). The EINVAL case can't happen here, but
>> ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
>> need to keep track of the transactions like in the user interface and
>> refuse ending an unknown transaction. Or you trust the kernel users.
>> Trying to fix the usage counter seems to be the wrong approach IMO.
>
> The point of the second patch was to detect such bugs. This would have
> saved quite some time to find this bug. I added the "fix" of the counter
> I just because it was trivial after having the if there.
>
> Adding tracking seems to be a quite complex solution for a _potential_
> problem.

I agree.

> So I would go with checking ENOENT in xs_request_exit(). Should this be
> WARN_ON_ONCE()? Since this normally should not happen I would say yes.

Yes, having a WARN_ON_ONCE here will help.

> Should I keep the reference counter sanity check? And if yes, with the
> "fix" to the counter?

I'd drop it. This really should not happen and blowing up kernel size
with checks of impossible situations isn't the way to go.

In case you really want to do something here you can add something like
ASSERT(xs_state_users) before decrementing the counter.


Juergen