The condition retry == 0 is theoretically possible even if 'client'
does not point to a valid element because no break was hit.
To only execute the dev_warn if actually a break within the loop was
hit, a separate variable is used that is only set if it is ensured to
point to a valid client struct.
Signed-off-by: Jakob Koschel <[email protected]>
---
drivers/virt/acrn/ioreq.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/virt/acrn/ioreq.c b/drivers/virt/acrn/ioreq.c
index 5ff1c53740c0..bdf7cedce926 100644
--- a/drivers/virt/acrn/ioreq.c
+++ b/drivers/virt/acrn/ioreq.c
@@ -211,7 +211,7 @@ static int ioreq_task(void *data)
*/
void acrn_ioreq_request_clear(struct acrn_vm *vm)
{
- struct acrn_ioreq_client *client;
+ struct acrn_ioreq_client *client, *iter;
bool has_pending = false;
unsigned long vcpu;
int retry = 10;
@@ -228,17 +228,20 @@ void acrn_ioreq_request_clear(struct acrn_vm *vm)
*/
do {
spin_lock_bh(&vm->ioreq_clients_lock);
- list_for_each_entry(client, &vm->ioreq_clients, list) {
- has_pending = has_pending_request(client);
- if (has_pending)
+ client = NULL;
+ list_for_each_entry(iter, &vm->ioreq_clients, list) {
+ has_pending = has_pending_request(iter);
+ if (has_pending) {
+ client = iter;
break;
+ }
}
spin_unlock_bh(&vm->ioreq_clients_lock);
if (has_pending)
schedule_timeout_interruptible(HZ / 100);
} while (has_pending && --retry > 0);
- if (retry == 0)
+ if (client)
dev_warn(acrn_dev.this_device,
"%s cannot flush pending request!\n", client->name);
base-commit: 34e047aa16c0123bbae8e2f6df33e5ecc1f56601
--
2.25.1
On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
> The condition retry == 0 is theoretically possible even if 'client'
> does not point to a valid element because no break was hit.
>
> To only execute the dev_warn if actually a break within the loop was
> hit, a separate variable is used that is only set if it is ensured to
> point to a valid client struct.
>
Hi Koschel
Thanks for you to help us to try to improve the code. Maybe you don't get the point.
The dev_warn should only been called when has_pending = true && retry == 0
If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
to schedule out to wait all the pending I/O requests would been completed.
Thanks.
> Signed-off-by: Jakob Koschel <[email protected]>
> ---
> drivers/virt/acrn/ioreq.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/virt/acrn/ioreq.c b/drivers/virt/acrn/ioreq.c
> index 5ff1c53740c0..bdf7cedce926 100644
> --- a/drivers/virt/acrn/ioreq.c
> +++ b/drivers/virt/acrn/ioreq.c
> @@ -211,7 +211,7 @@ static int ioreq_task(void *data)
> */
> void acrn_ioreq_request_clear(struct acrn_vm *vm)
> {
> - struct acrn_ioreq_client *client;
> + struct acrn_ioreq_client *client, *iter;
> bool has_pending = false;
> unsigned long vcpu;
> int retry = 10;
> @@ -228,17 +228,20 @@ void acrn_ioreq_request_clear(struct acrn_vm *vm)
> */
> do {
> spin_lock_bh(&vm->ioreq_clients_lock);
> - list_for_each_entry(client, &vm->ioreq_clients, list) {
> - has_pending = has_pending_request(client);
> - if (has_pending)
> + client = NULL;
> + list_for_each_entry(iter, &vm->ioreq_clients, list) {
> + has_pending = has_pending_request(iter);
> + if (has_pending) {
> + client = iter;
> break;
> + }
> }
> spin_unlock_bh(&vm->ioreq_clients_lock);
>
> if (has_pending)
> schedule_timeout_interruptible(HZ / 100);
> } while (has_pending && --retry > 0);
> - if (retry == 0)
> + if (client)
> dev_warn(acrn_dev.this_device,
> "%s cannot flush pending request!\n", client->name);
>
>
> base-commit: 34e047aa16c0123bbae8e2f6df33e5ecc1f56601
> --
> 2.25.1
>
> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>
> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>> The condition retry == 0 is theoretically possible even if 'client'
>> does not point to a valid element because no break was hit.
>>
>> To only execute the dev_warn if actually a break within the loop was
>> hit, a separate variable is used that is only set if it is ensured to
>> point to a valid client struct.
>>
> Hi Koschel
>
> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
> The dev_warn should only been called when has_pending = true && retry == 0
Maybe I don't understand but looking isolated at this function I could see a way to call
the dev_warn() with has_pending = false && retry == 0.
> list_for_each_entry(client, &vm->ioreq_clients, list) {
> has_pending = has_pending_request(client);
> if (has_pending)
> }
> spin_unlock_bh(&vm->ioreq_clients_lock);
imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>
> if (has_pending)
> schedule_timeout_interruptible(HZ / 100);
> } while (has_pending && --retry > 0);
since has_pending && --retry > 0 is no longer true the loop stops.
> if (retry == 0)
> dev_warn(acrn_dev.this_device,
> "%s cannot flush pending request!\n", client->name);
client->name is accessed since retry == 0 now, but client is not a valid struct ending up
in a type confusion.
>
> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
> to schedule out to wait all the pending I/O requests would been completed.
>
> Thanks.
Again, I'm not sure if this is realistically possible. I'm trying to remove
any use of the list iterator after the loop to make such potentially issues detectable
at compile time instead of relying on certain (difficult to maintain) conditions to be met
to avoid the type confusion.
Thanks,
Jakob
On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>
> > On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
> >
> > On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
> >>
> >>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >>>
> >>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
> >>>> The condition retry == 0 is theoretically possible even if 'client'
> >>>> does not point to a valid element because no break was hit.
> >>>>
> >>>> To only execute the dev_warn if actually a break within the loop was
> >>>> hit, a separate variable is used that is only set if it is ensured to
> >>>> point to a valid client struct.
> >>>>
> >>> Hi Koschel
> >>>
> >>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
> >>> The dev_warn should only been called when has_pending = true && retry == 0
> >>
> >> Maybe I don't understand but looking isolated at this function I could see a way to call
> >> the dev_warn() with has_pending = false && retry == 0.
> > Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
> > has_pending is true after schedule_timeout_interruptible and we even didn't check
> > there're other pending client on the ioreq_clients (because we can't make sure
> > when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>
> I'm sorry, I don't quite understand what you mean by that.
> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
> or not?
>
Yes, so what ? It just a hint there may have pending request.
Even retry == 0 && has_pending = true,
When we call dev_warn, the clent may not is pending.
> >>
> >>> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >>> has_pending = has_pending_request(client);
> >>> if (has_pending)
> >>> }
> >>> spin_unlock_bh(&vm->ioreq_clients_lock);
> >>
> >> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
> > What do you mean "client will not hold a valid list entry" ?
>
> Imagine a very simple example:
>
> struct acrn_ioreq_client *client;
> list_for_each_entry(client, &vm->ioreq_clients, list) {
> continue;
> }
>
> dev_warn(acrn_dev.this_device,
> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>
If there're pending request, we would call schedule to schedule out then schedule back
to check the list from the beginning. If there's no pending client, we point to the last
client and break out the while loop.
The code doesn't want to find the pending client and break out the while loop and call
dev_warn. Please see the function comment.
>
> Since there is no break for the list_for_each_entry() iterator to return early,
> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
> Essentially before the terminating condition of the list traversal the following code is called:
>
> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>
> resulting in a:
>
> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>
> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
> this call compute an invalid pointer.
> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
> not safe. Since the loop can never return early in the simple example above it will
> always break. On other cases (the one we are discussing here) there might be a chance that
> there is one code path (in theory) where the loop did not exit early and 'client'
> holds that 'invalid entry'.
>
> This would be the case with has_pending = false && retry == 0.
>
> I hope this makes sense.
>
> >
> >>
> >>>
> >>> if (has_pending)
> >>> schedule_timeout_interruptible(HZ / 100);
> >>> } while (has_pending && --retry > 0);
> >>
> >> since has_pending && --retry > 0 is no longer true the loop stops.
> >>
> >>> if (retry == 0)
> >>> dev_warn(acrn_dev.this_device,
> >>> "%s cannot flush pending request!\n", client->name);
> >> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
> >> in a type confusion.
> >>
> >>>
> >>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
> >>> to schedule out to wait all the pending I/O requests would been completed.
> >>>
> >>> Thanks.
> >>
> >> Again, I'm not sure if this is realistically possible. I'm trying to remove
> >> any use of the list iterator after the loop to make such potentially issues detectable
> > You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>
> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
> the do while loop.
>
> The only thing we know after the do while loop is that: !has_pending || retry == 0
> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>
> >> at compile time instead of relying on certain (difficult to maintain) conditions to be met
> >> to avoid the type confusion.
> >>
> >> Thanks,
> >> Jakob
>
> Jakob
>
> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
>
> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>
>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>
>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>> does not point to a valid element because no break was hit.
>>>>
>>>> To only execute the dev_warn if actually a break within the loop was
>>>> hit, a separate variable is used that is only set if it is ensured to
>>>> point to a valid client struct.
>>>>
>>> Hi Koschel
>>>
>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>> The dev_warn should only been called when has_pending = true && retry == 0
>>
>> Maybe I don't understand but looking isolated at this function I could see a way to call
>> the dev_warn() with has_pending = false && retry == 0.
> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
> has_pending is true after schedule_timeout_interruptible and we even didn't check
> there're other pending client on the ioreq_clients (because we can't make sure
> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
I'm sorry, I don't quite understand what you mean by that.
Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
or not?
>>
>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>> has_pending = has_pending_request(client);
>>> if (has_pending)
>>> }
>>> spin_unlock_bh(&vm->ioreq_clients_lock);
>>
>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
> What do you mean "client will not hold a valid list entry" ?
Imagine a very simple example:
struct acrn_ioreq_client *client;
list_for_each_entry(client, &vm->ioreq_clients, list) {
continue;
}
dev_warn(acrn_dev.this_device,
"%s cannot flush pending request!\n", client->name); /* NOT GOOD */
Since there is no break for the list_for_each_entry() iterator to return early,
client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
Essentially before the terminating condition of the list traversal the following code is called:
list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
resulting in a:
container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
&vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
this call compute an invalid pointer.
Therefore using 'client' as in the example above (e.g. client->name) after the loop is
not safe. Since the loop can never return early in the simple example above it will
always break. On other cases (the one we are discussing here) there might be a chance that
there is one code path (in theory) where the loop did not exit early and 'client'
holds that 'invalid entry'.
This would be the case with has_pending = false && retry == 0.
I hope this makes sense.
>
>>
>>>
>>> if (has_pending)
>>> schedule_timeout_interruptible(HZ / 100);
>>> } while (has_pending && --retry > 0);
>>
>> since has_pending && --retry > 0 is no longer true the loop stops.
>>
>>> if (retry == 0)
>>> dev_warn(acrn_dev.this_device,
>>> "%s cannot flush pending request!\n", client->name);
>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>> in a type confusion.
>>
>>>
>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>> to schedule out to wait all the pending I/O requests would been completed.
>>>
>>> Thanks.
>>
>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>> any use of the list iterator after the loop to make such potentially issues detectable
> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
I'm not exactly sure which loop you are referring to but no, I don't think we are still in
the do while loop.
The only thing we know after the do while loop is that: !has_pending || retry == 0
And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>> to avoid the type confusion.
>>
>> Thanks,
>> Jakob
Jakob
On Fri, Apr 01, 2022 at 10:50:51AM +0200, Jakob Koschel wrote:
>
>
> > On 1. Apr 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >
> > On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
> >>
> >>
> >>> On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
> >>>
> >>> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
> >>>>
> >>>>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
> >>>>>
> >>>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
> >>>>>>
> >>>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >>>>>>>
> >>>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
> >>>>>>>> The condition retry == 0 is theoretically possible even if 'client'
> >>>>>>>> does not point to a valid element because no break was hit.
> >>>>>>>>
> >>>>>>>> To only execute the dev_warn if actually a break within the loop was
> >>>>>>>> hit, a separate variable is used that is only set if it is ensured to
> >>>>>>>> point to a valid client struct.
> >>>>>>>>
> >>>>>>> Hi Koschel
> >>>>>>>
> >>>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
> >>>>>>> The dev_warn should only been called when has_pending = true && retry == 0
> >>>>>>
> >>>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
> >>>>>> the dev_warn() with has_pending = false && retry == 0.
> >>>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
> >>>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
> >>>>> there're other pending client on the ioreq_clients (because we can't make sure
> >>>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
> >>>>
> >>>> I'm sorry, I don't quite understand what you mean by that.
> >>>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
> >>>> or not?
> >>>>
> >>> Yes, so what ? It just a hint there may have pending request.
> >>
> >> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
> >> dependencies met:
> >>
> >> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
> >> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
> >> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
> > So you'd better to check when the client in ioreq_clients would been destroyed, right ?
>
> I'm afraid I don't know exactly what you mean here.
>
> More specifically:
> to check what? and I'm also not sure what you mean with "when client in ioreq_clients would been destroyed".
>
> Actually thinking about it more the check should be
>
> if (client && retry == 0)
>
> to be correct.
The client in ioreq_clients would always been "valid" (here valid means the client struct would not
been destroyed) when this function been called. That's guaranteed by the code logic.
>
> if has_pending == false I read the code as if no client was found that has a pending request
> so I don't follow how:
>
> "%s cannot flush pending request!\n", client->name);
>
> can be valid since no client has a pending request.
>
> >
> >>
> >> I'm *only* talking about the case where has_pending == false, in case that's not clear.
> >>
> >>
> >>> Even retry == 0 && has_pending = true,
> >>> When we call dev_warn, the clent may not is pending.
> >>>
> >>>
> >>>>>>
> >>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >>>>>>> has_pending = has_pending_request(client);
> >>>>>>> if (has_pending)
> >>>>>>> }
> >>>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
> >>>>>>
> >>>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
> >>>>> What do you mean "client will not hold a valid list entry" ?
> >>>>
> >>>> Imagine a very simple example:
> >>>>
> >>>> struct acrn_ioreq_client *client;
> >>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >>>> continue;
> >>>> }
> >>>>
> >>>> dev_warn(acrn_dev.this_device,
> >>>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
> >>>>
> >>> If there're pending request, we would call schedule to schedule out then schedule back
> >>> to check the list from the beginning. If there's no pending client, we point to the last
> >>> client and break out the while loop.
> >>>
> >>> The code doesn't want to find the pending client and break out the while loop and call
> >>> dev_warn. Please see the function comment.
> >>>
> >>>
> >>>>
> >>>> Since there is no break for the list_for_each_entry() iterator to return early,
> >>>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
> >>>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
> >>>> Essentially before the terminating condition of the list traversal the following code is called:
> >>>>
> >>>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
> >>>>
> >>>> resulting in a:
> >>>>
> >>>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
> >>>>
> >>>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
> >>>> this call compute an invalid pointer.
> >>>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
> >>>> not safe. Since the loop can never return early in the simple example above it will
> >>>> always break. On other cases (the one we are discussing here) there might be a chance that
> >>>> there is one code path (in theory) where the loop did not exit early and 'client'
> >>>> holds that 'invalid entry'.
> >>>>
> >>>> This would be the case with has_pending = false && retry == 0.
> >>>>
> >>>> I hope this makes sense.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> if (has_pending)
> >>>>>>> schedule_timeout_interruptible(HZ / 100);
> >>>>>>> } while (has_pending && --retry > 0);
> >>>>>>
> >>>>>> since has_pending && --retry > 0 is no longer true the loop stops.
> >>>>>>
> >>>>>>> if (retry == 0)
> >>>>>>> dev_warn(acrn_dev.this_device,
> >>>>>>> "%s cannot flush pending request!\n", client->name);
> >>>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
> >>>>>> in a type confusion.
> >>>>>>
> >>>>>>>
> >>>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
> >>>>>>> to schedule out to wait all the pending I/O requests would been completed.
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>
> >>>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
> >>>>>> any use of the list iterator after the loop to make such potentially issues detectable
> >>>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
> >>>>
> >>>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
> >>>> the do while loop.
> >>>>
> >>>> The only thing we know after the do while loop is that: !has_pending || retry == 0
> >>>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
> >>>>
> >>>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
> >>>>>> to avoid the type confusion.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Jakob
> >>>>
> >>>> Jakob
> >>
> >> Jakob
>
On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
>
>
> > On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
> >
> > On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
> >>
> >>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
> >>>
> >>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
> >>>>
> >>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >>>>>
> >>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
> >>>>>> The condition retry == 0 is theoretically possible even if 'client'
> >>>>>> does not point to a valid element because no break was hit.
> >>>>>>
> >>>>>> To only execute the dev_warn if actually a break within the loop was
> >>>>>> hit, a separate variable is used that is only set if it is ensured to
> >>>>>> point to a valid client struct.
> >>>>>>
> >>>>> Hi Koschel
> >>>>>
> >>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
> >>>>> The dev_warn should only been called when has_pending = true && retry == 0
> >>>>
> >>>> Maybe I don't understand but looking isolated at this function I could see a way to call
> >>>> the dev_warn() with has_pending = false && retry == 0.
> >>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
> >>> has_pending is true after schedule_timeout_interruptible and we even didn't check
> >>> there're other pending client on the ioreq_clients (because we can't make sure
> >>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
> >>
> >> I'm sorry, I don't quite understand what you mean by that.
> >> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
> >> or not?
> >>
> > Yes, so what ? It just a hint there may have pending request.
>
> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
> dependencies met:
>
> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
So you'd better to check when the client in ioreq_clients would been destroyed, right ?
>
> I'm *only* talking about the case where has_pending == false, in case that's not clear.
>
>
> > Even retry == 0 && has_pending = true,
> > When we call dev_warn, the clent may not is pending.
> >
> >
> >>>>
> >>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >>>>> has_pending = has_pending_request(client);
> >>>>> if (has_pending)
> >>>>> }
> >>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
> >>>>
> >>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
> >>> What do you mean "client will not hold a valid list entry" ?
> >>
> >> Imagine a very simple example:
> >>
> >> struct acrn_ioreq_client *client;
> >> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >> continue;
> >> }
> >>
> >> dev_warn(acrn_dev.this_device,
> >> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
> >>
> > If there're pending request, we would call schedule to schedule out then schedule back
> > to check the list from the beginning. If there's no pending client, we point to the last
> > client and break out the while loop.
> >
> > The code doesn't want to find the pending client and break out the while loop and call
> > dev_warn. Please see the function comment.
> >
> >
> >>
> >> Since there is no break for the list_for_each_entry() iterator to return early,
> >> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
> >> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
> >> Essentially before the terminating condition of the list traversal the following code is called:
> >>
> >> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
> >>
> >> resulting in a:
> >>
> >> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
> >>
> >> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
> >> this call compute an invalid pointer.
> >> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
> >> not safe. Since the loop can never return early in the simple example above it will
> >> always break. On other cases (the one we are discussing here) there might be a chance that
> >> there is one code path (in theory) where the loop did not exit early and 'client'
> >> holds that 'invalid entry'.
> >>
> >> This would be the case with has_pending = false && retry == 0.
> >>
> >> I hope this makes sense.
> >>
> >>>
> >>>>
> >>>>>
> >>>>> if (has_pending)
> >>>>> schedule_timeout_interruptible(HZ / 100);
> >>>>> } while (has_pending && --retry > 0);
> >>>>
> >>>> since has_pending && --retry > 0 is no longer true the loop stops.
> >>>>
> >>>>> if (retry == 0)
> >>>>> dev_warn(acrn_dev.this_device,
> >>>>> "%s cannot flush pending request!\n", client->name);
> >>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
> >>>> in a type confusion.
> >>>>
> >>>>>
> >>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
> >>>>> to schedule out to wait all the pending I/O requests would been completed.
> >>>>>
> >>>>> Thanks.
> >>>>
> >>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
> >>>> any use of the list iterator after the loop to make such potentially issues detectable
> >>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
> >>
> >> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
> >> the do while loop.
> >>
> >> The only thing we know after the do while loop is that: !has_pending || retry == 0
> >> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
> >>
> >>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
> >>>> to avoid the type confusion.
> >>>>
> >>>> Thanks,
> >>>> Jakob
> >>
> >> Jakob
>
> Jakob
>
On Fri, Apr 01, 2022 at 11:08:02AM +0200, Jakob Koschel wrote:
>
>
> > On 1. Apr 2022, at 11:05, Li Fei1 <[email protected]> wrote:
> >
> > On Fri, Apr 01, 2022 at 10:50:51AM +0200, Jakob Koschel wrote:
> >>
> >>
> >>> On 1. Apr 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >>>
> >>> On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
> >>>>
> >>>>
> >>>>> On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
> >>>>>
> >>>>> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
> >>>>>>
> >>>>>>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
> >>>>>>>
> >>>>>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
> >>>>>>>>
> >>>>>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
> >>>>>>>>>> The condition retry == 0 is theoretically possible even if 'client'
> >>>>>>>>>> does not point to a valid element because no break was hit.
> >>>>>>>>>>
> >>>>>>>>>> To only execute the dev_warn if actually a break within the loop was
> >>>>>>>>>> hit, a separate variable is used that is only set if it is ensured to
> >>>>>>>>>> point to a valid client struct.
> >>>>>>>>>>
> >>>>>>>>> Hi Koschel
> >>>>>>>>>
> >>>>>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
> >>>>>>>>> The dev_warn should only been called when has_pending = true && retry == 0
> >>>>>>>>
> >>>>>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
> >>>>>>>> the dev_warn() with has_pending = false && retry == 0.
> >>>>>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
> >>>>>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
> >>>>>>> there're other pending client on the ioreq_clients (because we can't make sure
> >>>>>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
> >>>>>>
> >>>>>> I'm sorry, I don't quite understand what you mean by that.
> >>>>>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
> >>>>>> or not?
> >>>>>>
> >>>>> Yes, so what ? It just a hint there may have pending request.
> >>>>
> >>>> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
> >>>> dependencies met:
> >>>>
> >>>> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
> >>>> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
> >>>> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
> >>> So you'd better to check when the client in ioreq_clients would been destroyed, right ?
> >>
> >> I'm afraid I don't know exactly what you mean here.
> >>
> >> More specifically:
> >> to check what? and I'm also not sure what you mean with "when client in ioreq_clients would been destroyed".
> >>
> >> Actually thinking about it more the check should be
> >>
> >> if (client && retry == 0)
> >>
> >> to be correct.
> > The client in ioreq_clients would always been "valid" (here valid means the client struct would not
> > been destroyed) when this function been called. That's guaranteed by the code logic.
>
> Now I'm very confused.
> Didn't you say the dev_msg() can be called with has_pending == false && retry == 0?
> Then the 'client' used in the dev_msg() cannot be valid.
I think I would not reply you before you understand how ACRN ioreq works.
>
> >>
> >> if has_pending == false I read the code as if no client was found that has a pending request
> >> so I don't follow how:
> >>
> >> "%s cannot flush pending request!\n", client->name);
> >>
> >> can be valid since no client has a pending request.
> >>
> >>>
> >>>>
> >>>> I'm *only* talking about the case where has_pending == false, in case that's not clear.
> >>>>
> >>>>
> >>>>> Even retry == 0 && has_pending = true,
> >>>>> When we call dev_warn, the clent may not is pending.
> >>>>>
> >>>>>
> >>>>>>>>
> >>>>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >>>>>>>>> has_pending = has_pending_request(client);
> >>>>>>>>> if (has_pending)
> >>>>>>>>> }
> >>>>>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
> >>>>>>>>
> >>>>>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
> >>>>>>> What do you mean "client will not hold a valid list entry" ?
> >>>>>>
> >>>>>> Imagine a very simple example:
> >>>>>>
> >>>>>> struct acrn_ioreq_client *client;
> >>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
> >>>>>> continue;
> >>>>>> }
> >>>>>>
> >>>>>> dev_warn(acrn_dev.this_device,
> >>>>>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
> >>>>>>
> >>>>> If there're pending request, we would call schedule to schedule out then schedule back
> >>>>> to check the list from the beginning. If there's no pending client, we point to the last
> >>>>> client and break out the while loop.
> >>>>>
> >>>>> The code doesn't want to find the pending client and break out the while loop and call
> >>>>> dev_warn. Please see the function comment.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Since there is no break for the list_for_each_entry() iterator to return early,
> >>>>>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
> >>>>>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
> >>>>>> Essentially before the terminating condition of the list traversal the following code is called:
> >>>>>>
> >>>>>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
> >>>>>>
> >>>>>> resulting in a:
> >>>>>>
> >>>>>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
> >>>>>>
> >>>>>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
> >>>>>> this call compute an invalid pointer.
> >>>>>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
> >>>>>> not safe. Since the loop can never return early in the simple example above it will
> >>>>>> always break. On other cases (the one we are discussing here) there might be a chance that
> >>>>>> there is one code path (in theory) where the loop did not exit early and 'client'
> >>>>>> holds that 'invalid entry'.
> >>>>>>
> >>>>>> This would be the case with has_pending = false && retry == 0.
> >>>>>>
> >>>>>> I hope this makes sense.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> if (has_pending)
> >>>>>>>>> schedule_timeout_interruptible(HZ / 100);
> >>>>>>>>> } while (has_pending && --retry > 0);
> >>>>>>>>
> >>>>>>>> since has_pending && --retry > 0 is no longer true the loop stops.
> >>>>>>>>
> >>>>>>>>> if (retry == 0)
> >>>>>>>>> dev_warn(acrn_dev.this_device,
> >>>>>>>>> "%s cannot flush pending request!\n", client->name);
> >>>>>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
> >>>>>>>> in a type confusion.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
> >>>>>>>>> to schedule out to wait all the pending I/O requests would been completed.
> >>>>>>>>>
> >>>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
> >>>>>>>> any use of the list iterator after the loop to make such potentially issues detectable
> >>>>>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
> >>>>>>
> >>>>>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
> >>>>>> the do while loop.
> >>>>>>
> >>>>>> The only thing we know after the do while loop is that: !has_pending || retry == 0
> >>>>>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
> >>>>>>
> >>>>>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
> >>>>>>>> to avoid the type confusion.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jakob
> >>>>>>
> >>>>>> Jakob
> >>>>
> >>>> Jakob
>
> On 1. Apr 2022, at 11:12, Li Fei1 <[email protected]> wrote:
>
> On Fri, Apr 01, 2022 at 11:08:02AM +0200, Jakob Koschel wrote:
>>
>>
>>> On 1. Apr 2022, at 11:05, Li Fei1 <[email protected]> wrote:
>>>
>>> On Fri, Apr 01, 2022 at 10:50:51AM +0200, Jakob Koschel wrote:
>>>>
>>>>
>>>>> On 1. Apr 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>>>
>>>>> On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
>>>>>>
>>>>>>
>>>>>>> On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
>>>>>>>
>>>>>>> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>>>>>>>>
>>>>>>>>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>>>>>>>>>
>>>>>>>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>>>>>>>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>>>>>>>>>> does not point to a valid element because no break was hit.
>>>>>>>>>>>>
>>>>>>>>>>>> To only execute the dev_warn if actually a break within the loop was
>>>>>>>>>>>> hit, a separate variable is used that is only set if it is ensured to
>>>>>>>>>>>> point to a valid client struct.
>>>>>>>>>>>>
>>>>>>>>>>> Hi Koschel
>>>>>>>>>>>
>>>>>>>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>>>>>>>>>> The dev_warn should only been called when has_pending = true && retry == 0
>>>>>>>>>>
>>>>>>>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
>>>>>>>>>> the dev_warn() with has_pending = false && retry == 0.
>>>>>>>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
>>>>>>>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
>>>>>>>>> there're other pending client on the ioreq_clients (because we can't make sure
>>>>>>>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>>>>>>>>
>>>>>>>> I'm sorry, I don't quite understand what you mean by that.
>>>>>>>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>>>>>>>> or not?
>>>>>>>>
>>>>>>> Yes, so what ? It just a hint there may have pending request.
>>>>>>
>>>>>> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
>>>>>> dependencies met:
>>>>>>
>>>>>> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
>>>>>> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
>>>>>> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
>>>>> So you'd better to check when the client in ioreq_clients would been destroyed, right ?
>>>>
>>>> I'm afraid I don't know exactly what you mean here.
>>>>
>>>> More specifically:
>>>> to check what? and I'm also not sure what you mean with "when client in ioreq_clients would been destroyed".
>>>>
>>>> Actually thinking about it more the check should be
>>>>
>>>> if (client && retry == 0)
>>>>
>>>> to be correct.
>>> The client in ioreq_clients would always been "valid" (here valid means the client struct would not
>>> been destroyed) when this function been called. That's guaranteed by the code logic.
>>
>> Now I'm very confused.
>> Didn't you say the dev_msg() can be called with has_pending == false && retry == 0?
>> Then the 'client' used in the dev_msg() cannot be valid.
> I think I would not reply you before you understand how ACRN ioreq works.
I've asked you before if dev_msg() can be called with such requirements. Here:
>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>> or not?
>>
> Yes, so what ? It just a hint there may have pending request.
I'm just making the point that in such a case 'client' cannot be valid. For that I don't
need to understand the entirety how ACRN ioreq. Hence I asked you that question.
If you can't keep this to an objective discussion I'll just leave it as is.
>
>>
>>>>
>>>> if has_pending == false I read the code as if no client was found that has a pending request
>>>> so I don't follow how:
>>>>
>>>> "%s cannot flush pending request!\n", client->name);
>>>>
>>>> can be valid since no client has a pending request.
>>>>
>>>>>
>>>>>>
>>>>>> I'm *only* talking about the case where has_pending == false, in case that's not clear.
>>>>>>
>>>>>>
>>>>>>> Even retry == 0 && has_pending = true,
>>>>>>> When we call dev_warn, the clent may not is pending.
>>>>>>>
>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>>>>>>> has_pending = has_pending_request(client);
>>>>>>>>>>> if (has_pending)
>>>>>>>>>>> }
>>>>>>>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
>>>>>>>>>>
>>>>>>>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>>>>>>>>> What do you mean "client will not hold a valid list entry" ?
>>>>>>>>
>>>>>>>> Imagine a very simple example:
>>>>>>>>
>>>>>>>> struct acrn_ioreq_client *client;
>>>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>>>> continue;
>>>>>>>> }
>>>>>>>>
>>>>>>>> dev_warn(acrn_dev.this_device,
>>>>>>>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>>>>>>>>
>>>>>>> If there're pending request, we would call schedule to schedule out then schedule back
>>>>>>> to check the list from the beginning. If there's no pending client, we point to the last
>>>>>>> client and break out the while loop.
>>>>>>>
>>>>>>> The code doesn't want to find the pending client and break out the while loop and call
>>>>>>> dev_warn. Please see the function comment.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Since there is no break for the list_for_each_entry() iterator to return early,
>>>>>>>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
>>>>>>>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
>>>>>>>> Essentially before the terminating condition of the list traversal the following code is called:
>>>>>>>>
>>>>>>>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>>>>>
>>>>>>>> resulting in a:
>>>>>>>>
>>>>>>>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>>>>>
>>>>>>>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
>>>>>>>> this call compute an invalid pointer.
>>>>>>>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
>>>>>>>> not safe. Since the loop can never return early in the simple example above it will
>>>>>>>> always break. On other cases (the one we are discussing here) there might be a chance that
>>>>>>>> there is one code path (in theory) where the loop did not exit early and 'client'
>>>>>>>> holds that 'invalid entry'.
>>>>>>>>
>>>>>>>> This would be the case with has_pending = false && retry == 0.
>>>>>>>>
>>>>>>>> I hope this makes sense.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> if (has_pending)
>>>>>>>>>>> schedule_timeout_interruptible(HZ / 100);
>>>>>>>>>>> } while (has_pending && --retry > 0);
>>>>>>>>>>
>>>>>>>>>> since has_pending && --retry > 0 is no longer true the loop stops.
>>>>>>>>>>
>>>>>>>>>>> if (retry == 0)
>>>>>>>>>>> dev_warn(acrn_dev.this_device,
>>>>>>>>>>> "%s cannot flush pending request!\n", client->name);
>>>>>>>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>>>>>>>>>> in a type confusion.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>>>>>>>>>> to schedule out to wait all the pending I/O requests would been completed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>>>>>>>>>> any use of the list iterator after the loop to make such potentially issues detectable
>>>>>>>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>>>>>>>>
>>>>>>>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
>>>>>>>> the do while loop.
>>>>>>>>
>>>>>>>> The only thing we know after the do while loop is that: !has_pending || retry == 0
>>>>>>>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>>>>>>>>
>>>>>>>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>>>>>>>>>> to avoid the type confusion.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jakob
>>>>>>>>
>>>>>>>> Jakob
>>>>>>
>>>>>> Jakob
> On 1. Apr 2022, at 11:05, Li Fei1 <[email protected]> wrote:
>
> On Fri, Apr 01, 2022 at 10:50:51AM +0200, Jakob Koschel wrote:
>>
>>
>>> On 1. Apr 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>
>>> On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
>>>>
>>>>
>>>>> On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
>>>>>
>>>>> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>>>>>>
>>>>>>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
>>>>>>>
>>>>>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>>>>>>>
>>>>>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>>>>>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>>>>>>>> does not point to a valid element because no break was hit.
>>>>>>>>>>
>>>>>>>>>> To only execute the dev_warn if actually a break within the loop was
>>>>>>>>>> hit, a separate variable is used that is only set if it is ensured to
>>>>>>>>>> point to a valid client struct.
>>>>>>>>>>
>>>>>>>>> Hi Koschel
>>>>>>>>>
>>>>>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>>>>>>>> The dev_warn should only been called when has_pending = true && retry == 0
>>>>>>>>
>>>>>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
>>>>>>>> the dev_warn() with has_pending = false && retry == 0.
>>>>>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
>>>>>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
>>>>>>> there're other pending client on the ioreq_clients (because we can't make sure
>>>>>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>>>>>>
>>>>>> I'm sorry, I don't quite understand what you mean by that.
>>>>>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>>>>>> or not?
>>>>>>
>>>>> Yes, so what ? It just a hint there may have pending request.
>>>>
>>>> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
>>>> dependencies met:
>>>>
>>>> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
>>>> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
>>>> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
>>> So you'd better to check when the client in ioreq_clients would been destroyed, right ?
>>
>> I'm afraid I don't know exactly what you mean here.
>>
>> More specifically:
>> to check what? and I'm also not sure what you mean with "when client in ioreq_clients would been destroyed".
>>
>> Actually thinking about it more the check should be
>>
>> if (client && retry == 0)
>>
>> to be correct.
> The client in ioreq_clients would always been "valid" (here valid means the client struct would not
> been destroyed) when this function been called. That's guaranteed by the code logic.
Now I'm very confused.
Didn't you say the dev_msg() can be called with has_pending == false && retry == 0?
Then the 'client' used in the dev_msg() cannot be valid.
>>
>> if has_pending == false I read the code as if no client was found that has a pending request
>> so I don't follow how:
>>
>> "%s cannot flush pending request!\n", client->name);
>>
>> can be valid since no client has a pending request.
>>
>>>
>>>>
>>>> I'm *only* talking about the case where has_pending == false, in case that's not clear.
>>>>
>>>>
>>>>> Even retry == 0 && has_pending = true,
>>>>> When we call dev_warn, the clent may not is pending.
>>>>>
>>>>>
>>>>>>>>
>>>>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>>>>> has_pending = has_pending_request(client);
>>>>>>>>> if (has_pending)
>>>>>>>>> }
>>>>>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
>>>>>>>>
>>>>>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>>>>>>> What do you mean "client will not hold a valid list entry" ?
>>>>>>
>>>>>> Imagine a very simple example:
>>>>>>
>>>>>> struct acrn_ioreq_client *client;
>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>> continue;
>>>>>> }
>>>>>>
>>>>>> dev_warn(acrn_dev.this_device,
>>>>>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>>>>>>
>>>>> If there're pending request, we would call schedule to schedule out then schedule back
>>>>> to check the list from the beginning. If there's no pending client, we point to the last
>>>>> client and break out the while loop.
>>>>>
>>>>> The code doesn't want to find the pending client and break out the while loop and call
>>>>> dev_warn. Please see the function comment.
>>>>>
>>>>>
>>>>>>
>>>>>> Since there is no break for the list_for_each_entry() iterator to return early,
>>>>>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
>>>>>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
>>>>>> Essentially before the terminating condition of the list traversal the following code is called:
>>>>>>
>>>>>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>>>
>>>>>> resulting in a:
>>>>>>
>>>>>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>>>
>>>>>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
>>>>>> this call compute an invalid pointer.
>>>>>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
>>>>>> not safe. Since the loop can never return early in the simple example above it will
>>>>>> always break. On other cases (the one we are discussing here) there might be a chance that
>>>>>> there is one code path (in theory) where the loop did not exit early and 'client'
>>>>>> holds that 'invalid entry'.
>>>>>>
>>>>>> This would be the case with has_pending = false && retry == 0.
>>>>>>
>>>>>> I hope this makes sense.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> if (has_pending)
>>>>>>>>> schedule_timeout_interruptible(HZ / 100);
>>>>>>>>> } while (has_pending && --retry > 0);
>>>>>>>>
>>>>>>>> since has_pending && --retry > 0 is no longer true the loop stops.
>>>>>>>>
>>>>>>>>> if (retry == 0)
>>>>>>>>> dev_warn(acrn_dev.this_device,
>>>>>>>>> "%s cannot flush pending request!\n", client->name);
>>>>>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>>>>>>>> in a type confusion.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>>>>>>>> to schedule out to wait all the pending I/O requests would been completed.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>>>>>>>> any use of the list iterator after the loop to make such potentially issues detectable
>>>>>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>>>>>>
>>>>>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
>>>>>> the do while loop.
>>>>>>
>>>>>> The only thing we know after the do while loop is that: !has_pending || retry == 0
>>>>>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>>>>>>
>>>>>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>>>>>>>> to avoid the type confusion.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jakob
>>>>>>
>>>>>> Jakob
>>>>
>>>> Jakob
> On 1. Apr 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>
> On Fri, Apr 01, 2022 at 09:16:48AM +0200, Jakob Koschel wrote:
>>
>>
>>> On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
>>>
>>> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>>>>
>>>>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
>>>>>
>>>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>>>>>
>>>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>>>>>
>>>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>>>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>>>>>> does not point to a valid element because no break was hit.
>>>>>>>>
>>>>>>>> To only execute the dev_warn if actually a break within the loop was
>>>>>>>> hit, a separate variable is used that is only set if it is ensured to
>>>>>>>> point to a valid client struct.
>>>>>>>>
>>>>>>> Hi Koschel
>>>>>>>
>>>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>>>>>> The dev_warn should only been called when has_pending = true && retry == 0
>>>>>>
>>>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
>>>>>> the dev_warn() with has_pending = false && retry == 0.
>>>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
>>>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
>>>>> there're other pending client on the ioreq_clients (because we can't make sure
>>>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>>>>
>>>> I'm sorry, I don't quite understand what you mean by that.
>>>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>>>> or not?
>>>>
>>> Yes, so what ? It just a hint there may have pending request.
>>
>> if has_pending == false && retry == 0 when calling dev_warn() there are very clear
>> dependencies met:
>>
>> * has_pending == false means that the list_for_each_entry() macro it *not* exit early.
>> * since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
>> * using client->name is not safe and will not point to a valid string (perhaps not even an address)
> So you'd better to check when the client in ioreq_clients would been destroyed, right ?
I'm afraid I don't know exactly what you mean here.
More specifically:
to check what? and I'm also not sure what you mean with "when client in ioreq_clients would been destroyed".
Actually thinking about it more the check should be
if (client && retry == 0)
to be correct.
if has_pending == false I read the code as if no client was found that has a pending request
so I don't follow how:
"%s cannot flush pending request!\n", client->name);
can be valid since no client has a pending request.
>
>>
>> I'm *only* talking about the case where has_pending == false, in case that's not clear.
>>
>>
>>> Even retry == 0 && has_pending = true,
>>> When we call dev_warn, the clent may not is pending.
>>>
>>>
>>>>>>
>>>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>>>> has_pending = has_pending_request(client);
>>>>>>> if (has_pending)
>>>>>>> }
>>>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
>>>>>>
>>>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>>>>> What do you mean "client will not hold a valid list entry" ?
>>>>
>>>> Imagine a very simple example:
>>>>
>>>> struct acrn_ioreq_client *client;
>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>> continue;
>>>> }
>>>>
>>>> dev_warn(acrn_dev.this_device,
>>>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>>>>
>>> If there're pending request, we would call schedule to schedule out then schedule back
>>> to check the list from the beginning. If there's no pending client, we point to the last
>>> client and break out the while loop.
>>>
>>> The code doesn't want to find the pending client and break out the while loop and call
>>> dev_warn. Please see the function comment.
>>>
>>>
>>>>
>>>> Since there is no break for the list_for_each_entry() iterator to return early,
>>>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
>>>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
>>>> Essentially before the terminating condition of the list traversal the following code is called:
>>>>
>>>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>
>>>> resulting in a:
>>>>
>>>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>>>
>>>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
>>>> this call compute an invalid pointer.
>>>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
>>>> not safe. Since the loop can never return early in the simple example above it will
>>>> always break. On other cases (the one we are discussing here) there might be a chance that
>>>> there is one code path (in theory) where the loop did not exit early and 'client'
>>>> holds that 'invalid entry'.
>>>>
>>>> This would be the case with has_pending = false && retry == 0.
>>>>
>>>> I hope this makes sense.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> if (has_pending)
>>>>>>> schedule_timeout_interruptible(HZ / 100);
>>>>>>> } while (has_pending && --retry > 0);
>>>>>>
>>>>>> since has_pending && --retry > 0 is no longer true the loop stops.
>>>>>>
>>>>>>> if (retry == 0)
>>>>>>> dev_warn(acrn_dev.this_device,
>>>>>>> "%s cannot flush pending request!\n", client->name);
>>>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>>>>>> in a type confusion.
>>>>>>
>>>>>>>
>>>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>>>>>> to schedule out to wait all the pending I/O requests would been completed.
>>>>>>>
>>>>>>> Thanks.
>>>>>>
>>>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>>>>>> any use of the list iterator after the loop to make such potentially issues detectable
>>>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>>>>
>>>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
>>>> the do while loop.
>>>>
>>>> The only thing we know after the do while loop is that: !has_pending || retry == 0
>>>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>>>>
>>>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>>>>>> to avoid the type confusion.
>>>>>>
>>>>>> Thanks,
>>>>>> Jakob
>>>>
>>>> Jakob
>>
>> Jakob
> On 1. Apr 2022, at 05:57, Li Fei1 <[email protected]> wrote:
>
> On Fri, Apr 01, 2022 at 05:22:36AM +0200, Jakob Koschel wrote:
>>
>>> On 1. Apr 2022, at 03:15, Li Fei1 <[email protected]> wrote:
>>>
>>> On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>>>>
>>>>> On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
>>>>>
>>>>> On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
>>>>>> The condition retry == 0 is theoretically possible even if 'client'
>>>>>> does not point to a valid element because no break was hit.
>>>>>>
>>>>>> To only execute the dev_warn if actually a break within the loop was
>>>>>> hit, a separate variable is used that is only set if it is ensured to
>>>>>> point to a valid client struct.
>>>>>>
>>>>> Hi Koschel
>>>>>
>>>>> Thanks for you to help us to try to improve the code. Maybe you don't get the point.
>>>>> The dev_warn should only been called when has_pending = true && retry == 0
>>>>
>>>> Maybe I don't understand but looking isolated at this function I could see a way to call
>>>> the dev_warn() with has_pending = false && retry == 0.
>>> Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
>>> has_pending is true after schedule_timeout_interruptible and we even didn't check
>>> there're other pending client on the ioreq_clients (because we can't make sure
>>> when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>>
>> I'm sorry, I don't quite understand what you mean by that.
>> Do you agree that has_pending = false && retry == 0 is possible when calling the dev_warn()
>> or not?
>>
> Yes, so what ? It just a hint there may have pending request.
if has_pending == false && retry == 0 when calling dev_warn() there are very clear
dependencies met:
* has_pending == false means that the list_for_each_entry() macro it *not* exit early.
* since list_for_each_entry() did *not* exit early, client will not hold a valid list entry
* using client->name is not safe and will not point to a valid string (perhaps not even an address)
I'm *only* talking about the case where has_pending == false, in case that's not clear.
> Even retry == 0 && has_pending = true,
> When we call dev_warn, the clent may not is pending.
>
>
>>>>
>>>>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>>>>> has_pending = has_pending_request(client);
>>>>> if (has_pending)
>>>>> }
>>>>> spin_unlock_bh(&vm->ioreq_clients_lock);
>>>>
>>>> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
>>> What do you mean "client will not hold a valid list entry" ?
>>
>> Imagine a very simple example:
>>
>> struct acrn_ioreq_client *client;
>> list_for_each_entry(client, &vm->ioreq_clients, list) {
>> continue;
>> }
>>
>> dev_warn(acrn_dev.this_device,
>> "%s cannot flush pending request!\n", client->name); /* NOT GOOD */
>>
> If there're pending request, we would call schedule to schedule out then schedule back
> to check the list from the beginning. If there's no pending client, we point to the last
> client and break out the while loop.
>
> The code doesn't want to find the pending client and break out the while loop and call
> dev_warn. Please see the function comment.
>
>
>>
>> Since there is no break for the list_for_each_entry() iterator to return early,
>> client *cannot* be a valid entry of the list. In fact if you look at the list_for_each_entry()
>> macro, it will be an 'bogus' pointer, pointing somewhere into 'vm'.
>> Essentially before the terminating condition of the list traversal the following code is called:
>>
>> list_entry(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>
>> resulting in a:
>>
>> container_of(&vm->ioreq_clients, struct acrn_ioreq_client *, list);
>>
>> &vm->ioreq_clients however is not contained in a struct acrn_ioreq_client, making
>> this call compute an invalid pointer.
>> Therefore using 'client' as in the example above (e.g. client->name) after the loop is
>> not safe. Since the loop can never return early in the simple example above it will
>> always break. On other cases (the one we are discussing here) there might be a chance that
>> there is one code path (in theory) where the loop did not exit early and 'client'
>> holds that 'invalid entry'.
>>
>> This would be the case with has_pending = false && retry == 0.
>>
>> I hope this makes sense.
>>
>>>
>>>>
>>>>>
>>>>> if (has_pending)
>>>>> schedule_timeout_interruptible(HZ / 100);
>>>>> } while (has_pending && --retry > 0);
>>>>
>>>> since has_pending && --retry > 0 is no longer true the loop stops.
>>>>
>>>>> if (retry == 0)
>>>>> dev_warn(acrn_dev.this_device,
>>>>> "%s cannot flush pending request!\n", client->name);
>>>> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
>>>> in a type confusion.
>>>>
>>>>>
>>>>> If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
>>>>> to schedule out to wait all the pending I/O requests would been completed.
>>>>>
>>>>> Thanks.
>>>>
>>>> Again, I'm not sure if this is realistically possible. I'm trying to remove
>>>> any use of the list iterator after the loop to make such potentially issues detectable
>>> You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
>>
>> I'm not exactly sure which loop you are referring to but no, I don't think we are still in
>> the do while loop.
>>
>> The only thing we know after the do while loop is that: !has_pending || retry == 0
>> And iff has_pending && retry == 0, then we shouldn't call the dev_warn().
>>
>>>> at compile time instead of relying on certain (difficult to maintain) conditions to be met
>>>> to avoid the type confusion.
>>>>
>>>> Thanks,
>>>> Jakob
>>
>> Jakob
Jakob
On Thu, Mar 31, 2022 at 01:20:50PM +0200, Jakob Koschel wrote:
>
> > On 30. Mar 2022, at 09:57, Li Fei1 <[email protected]> wrote:
> >
> > On Sat, Mar 19, 2022 at 09:38:19PM +0100, Jakob Koschel wrote:
> >> The condition retry == 0 is theoretically possible even if 'client'
> >> does not point to a valid element because no break was hit.
> >>
> >> To only execute the dev_warn if actually a break within the loop was
> >> hit, a separate variable is used that is only set if it is ensured to
> >> point to a valid client struct.
> >>
> > Hi Koschel
> >
> > Thanks for you to help us to try to improve the code. Maybe you don't get the point.
> > The dev_warn should only been called when has_pending = true && retry == 0
>
> Maybe I don't understand but looking isolated at this function I could see a way to call
> the dev_warn() with has_pending = false && retry == 0.
Yes, even has_pending = true && retry == 0 at the beginning, we could not make sure
has_pending is true after schedule_timeout_interruptible and we even didn't check
there're other pending client on the ioreq_clients (because we can't make sure
when we dev_warn this clent is still pending). So we just use dev_warn not higher log level.
>
> > list_for_each_entry(client, &vm->ioreq_clients, list) {
> > has_pending = has_pending_request(client);
> > if (has_pending)
> > }
> > spin_unlock_bh(&vm->ioreq_clients_lock);
>
> imagine has_pending == false && retry == 1 here, then client will not hold a valid list entry.
What do you mean "client will not hold a valid list entry" ?
>
> >
> > if (has_pending)
> > schedule_timeout_interruptible(HZ / 100);
> > } while (has_pending && --retry > 0);
>
> since has_pending && --retry > 0 is no longer true the loop stops.
>
> > if (retry == 0)
> > dev_warn(acrn_dev.this_device,
> > "%s cannot flush pending request!\n", client->name);
> client->name is accessed since retry == 0 now, but client is not a valid struct ending up
> in a type confusion.
>
> >
> > If retry > 0 and has_pending is true, we would call schedule_timeout_interruptible
> > to schedule out to wait all the pending I/O requests would been completed.
> >
> > Thanks.
>
> Again, I'm not sure if this is realistically possible. I'm trying to remove
> any use of the list iterator after the loop to make such potentially issues detectable
You may think we still in the loop (could we ?), even that we didn't write the list iterator then.
> at compile time instead of relying on certain (difficult to maintain) conditions to be met
> to avoid the type confusion.
>
> Thanks,
> Jakob
>