2014-11-12 03:55:08

by Dexuan Cui

[permalink] [raw]
Subject: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

In the case the user-space daemon crashes, hangs or is killed, we
need to down the semaphore, otherwise, after the daemon starts next
time, the obsolete data in fcopy_transaction.message or
fcopy_transaction.fcopy_msg will be used immediately.

Cc: K. Y. Srinivasan <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
---
drivers/hv/hv_fcopy.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 23b2ce2..177122a 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct *dummy)
* process the pending transaction.
*/
fcopy_respond_to_host(HV_E_FAIL);
+
+ /* In the case the user-space daemon crashes, hangs or is killed, we
+ * need to down the semaphore, otherwise, after the daemon starts next
+ * time, the obsolete data in fcopy_transaction.message or
+ * fcopy_transaction.fcopy_msg will be used immediately.
+ */
+ if (down_trylock(&fcopy_transaction.read_sema))
+ pr_debug("FCP: failed to acquire the semaphore\n");
+
}

static int fcopy_handle_handshake(u32 version)
--
1.9.1


2014-11-12 09:42:02

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

Dexuan Cui <[email protected]> writes:

> In the case the user-space daemon crashes, hangs or is killed, we
> need to down the semaphore, otherwise, after the daemon starts next
> time, the obsolete data in fcopy_transaction.message or
> fcopy_transaction.fcopy_msg will be used immediately.
>
> Cc: K. Y. Srinivasan <[email protected]>
> Signed-off-by: Dexuan Cui <[email protected]>

Reviewed-by: Vitaly Kuznetsov <[email protected]>

> ---
> drivers/hv/hv_fcopy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index 23b2ce2..177122a 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct *dummy)
> * process the pending transaction.
> */
> fcopy_respond_to_host(HV_E_FAIL);
> +
> + /* In the case the user-space daemon crashes, hangs or is killed, we
> + * need to down the semaphore, otherwise, after the daemon starts next
> + * time, the obsolete data in fcopy_transaction.message or
> + * fcopy_transaction.fcopy_msg will be used immediately.
> + */
> + if (down_trylock(&fcopy_transaction.read_sema))
> + pr_debug("FCP: failed to acquire the semaphore\n");
> +
> }
>
> static int fcopy_handle_handshake(u32 version)

--
Vitaly

2014-11-19 22:59:21

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure



> -----Original Message-----
> From: devel [mailto:[email protected]] On
> Behalf Of Dexuan Cui
> Sent: Tuesday, November 11, 2014 9:03 PM
> To: [email protected]; [email protected]; driverdev-
> [email protected]; [email protected]; [email protected];
> [email protected]
> Cc: Haiyang Zhang
> Subject: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
>
> In the case the user-space daemon crashes, hangs or is killed, we need to
> down the semaphore, otherwise, after the daemon starts next time, the
> obsolete data in fcopy_transaction.message or fcopy_transaction.fcopy_msg
> will be used immediately.
>
> Cc: K. Y. Srinivasan <[email protected]>
> Signed-off-by: Dexuan Cui <[email protected]>
> ---
> drivers/hv/hv_fcopy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> 23b2ce2..177122a 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> *dummy)
> * process the pending transaction.
> */
> fcopy_respond_to_host(HV_E_FAIL);
> +
> + /* In the case the user-space daemon crashes, hangs or is killed, we
> + * need to down the semaphore, otherwise, after the daemon starts
> next
> + * time, the obsolete data in fcopy_transaction.message or
> + * fcopy_transaction.fcopy_msg will be used immediately.
> + */
> + if (down_trylock(&fcopy_transaction.read_sema))
> + pr_debug("FCP: failed to acquire the semaphore\n");
> +
> }

When the daemon is killed, we currently reset the state in the release function. Why can't we cleanup the semaphore state (initialize) here as well.

K. Y
>
> static int fcopy_handle_handshake(u32 version)
> --
> 1.9.1
>
> _______________________________________________
> devel mailing list
> [email protected]
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

2014-11-20 07:48:38

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

> -----Original Message-----
> From: KY Srinivasan
> Sent: Thursday, November 20, 2014 6:59 AM
> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> > 23b2ce2..177122a 100644
> > --- a/drivers/hv/hv_fcopy.c
> > +++ b/drivers/hv/hv_fcopy.c
> > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> > *dummy)
> > * process the pending transaction.
> > */
> > fcopy_respond_to_host(HV_E_FAIL);
> > +
> > +/* In the case the user-space daemon crashes, hangs or is killed, we
> > + * need to down the semaphore, otherwise, after the daemon starts
> > next
> > + * time, the obsolete data in fcopy_transaction.message or
> > + * fcopy_transaction.fcopy_msg will be used immediately.
> > + */
> > +if (down_trylock(&fcopy_transaction.read_sema))
> > +pr_debug("FCP: failed to acquire the semaphore\n");
> > +
> > }
>
> When the daemon is killed, we currently reset the state in the release
> function. Why can't we cleanup the semaphore state (initialize) here as well.
>
> K. Y

Hi KY,
1) The down_trylock() here is necessary: the daemon can fail to respond in 5
seconds due to many reasons, e.g., the VM's CPU and I/O are too busy. In this
case, the daemon may become running later(NOTE: in this example, the daemon
is not killed), but from the host user's point of view, the PowerShell
copy-vmfile command has failed, so here we have to 'down' the semaphore
anyway, otherwise, the daemon can get obsolete data.

2) If we add a line
sema_init(&fcopy_transaction.read_sema, 0);
in fcopy_release(), it seems OK at a glance, but we have to handle the race
condition: the above down_trylock() and the sema_init() can, in theory, run
simultaneously on different virtual CPUs. It's tricky to address this.

3) So I think we can reuse the same semaphore without an actually unnecessary
re-initialization. :-)

Thanks,
-- Dexuan

2014-11-20 17:58:25

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure



> -----Original Message-----
> From: Dexuan Cui
> Sent: Wednesday, November 19, 2014 11:48 PM
> To: KY Srinivasan; [email protected]; linux-
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Cc: Haiyang Zhang
> Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
>
> > -----Original Message-----
> > From: KY Srinivasan
> > Sent: Thursday, November 20, 2014 6:59 AM
> > > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> > > 23b2ce2..177122a 100644
> > > --- a/drivers/hv/hv_fcopy.c
> > > +++ b/drivers/hv/hv_fcopy.c
> > > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> > > *dummy)
> > > * process the pending transaction.
> > > */
> > > fcopy_respond_to_host(HV_E_FAIL);
> > > +
> > > +/* In the case the user-space daemon crashes, hangs or is killed,
> > > +we
> > > + * need to down the semaphore, otherwise, after the daemon starts
> > > next
> > > + * time, the obsolete data in fcopy_transaction.message or
> > > + * fcopy_transaction.fcopy_msg will be used immediately.
> > > + */
> > > +if (down_trylock(&fcopy_transaction.read_sema))
> > > +pr_debug("FCP: failed to acquire the semaphore\n");
> > > +
> > > }
> >
> > When the daemon is killed, we currently reset the state in the release
> > function. Why can't we cleanup the semaphore state (initialize) here as
> well.
> >
> > K. Y
>
> Hi KY,
> 1) The down_trylock() here is necessary: the daemon can fail to respond in 5
> seconds due to many reasons, e.g., the VM's CPU and I/O are too busy. In
> this case, the daemon may become running later(NOTE: in this example, the
> daemon is not killed), but from the host user's point of view, the PowerShell
> copy-vmfile command has failed, so here we have to 'down' the semaphore
> anyway, otherwise, the daemon can get obsolete data.
>
> 2) If we add a line
> sema_init(&fcopy_transaction.read_sema, 0); in fcopy_release(), it seems
> OK at a glance, but we have to handle the race
> condition: the above down_trylock() and the sema_init() can, in theory, run
> simultaneously on different virtual CPUs. It's tricky to address this.
>
> 3) So I think we can reuse the same semaphore without an actually
> unnecessary re-initialization. :-)

Agreed; you may want to get rid of the pr_debug() call though.

Thanks,

K. Y
>
> Thanks,
> -- Dexuan

2014-11-21 02:42:05

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

> -----Original Message-----
> From: KY Srinivasan
> Sent: Friday, November 21, 2014 1:58 AM
> To: Dexuan Cui; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: Haiyang Zhang
> Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> > -----Original Message-----
> > From: Dexuan Cui
> > Sent: Wednesday, November 19, 2014 11:48 PM
> > To: KY Srinivasan; [email protected]; linux-
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Cc: Haiyang Zhang
> > Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> > failure
> >
> > > -----Original Message-----
> > > From: KY Srinivasan
> > > Sent: Thursday, November 20, 2014 6:59 AM
> > > > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> > > > 23b2ce2..177122a 100644
> > > > --- a/drivers/hv/hv_fcopy.c
> > > > +++ b/drivers/hv/hv_fcopy.c
> > > > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct
> work_struct
> > > > *dummy)
> > > > * process the pending transaction.
> > > > */
> > > > fcopy_respond_to_host(HV_E_FAIL);
> > > > +
> > > > +/* In the case the user-space daemon crashes, hangs or is killed,
> > > > +we
> > > > + * need to down the semaphore, otherwise, after the daemon starts
> > > > next
> > > > + * time, the obsolete data in fcopy_transaction.message or
> > > > + * fcopy_transaction.fcopy_msg will be used immediately.
> > > > + */
> > > > +if (down_trylock(&fcopy_transaction.read_sema))
> > > > +pr_debug("FCP: failed to acquire the semaphore\n");
> > > > +
> > > > }
> > >
> > > When the daemon is killed, we currently reset the state in the release
> > > function. Why can't we cleanup the semaphore state (initialize) here as
> > well.
> > >
> > > K. Y
> >
> > Hi KY,
> > 1) The down_trylock() here is necessary: the daemon can fail to respond
> in 5
> > seconds due to many reasons, e.g., the VM's CPU and I/O are too busy. In
> > this case, the daemon may become running later(NOTE: in this example,
> the
> > daemon is not killed), but from the host user's point of view, the
> PowerShell
> > copy-vmfile command has failed, so here we have to 'down' the
> semaphore
> > anyway, otherwise, the daemon can get obsolete data.
> >
> > 2) If we add a line
> > sema_init(&fcopy_transaction.read_sema, 0); in fcopy_release(), it seems
> > OK at a glance, but we have to handle the race
> > condition: the above down_trylock() and the sema_init() can, in theory,
> run
> > simultaneously on different virtual CPUs. It's tricky to address this.
> >
> > 3) So I think we can reuse the same semaphore without an actually
> > unnecessary re-initialization. :-)
>
> Agreed; you may want to get rid of the pr_debug() call though.
>
> Thanks,
>
> K. Y

The pr_debug() is added intentionally according to suggestion of
Redhat's Vitaly Kuznetsov in the bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1162100#c5

The function is declared with__must_check in include/linux/semaphore.h:
extern int __must_check down_trylock(struct semaphore *sem);

Without checking the return value, we'll get these warning if the
"Kernel hacking" options are enabled:

drivers/hv/hv_fcopy.c: In function 'fcopy_work_func':
drivers/hv/hv_fcopy.c:95:2: warning: ignoring return value of 'down_trylock', declared with attribute warn_unused_result [-Wunused-result]
(void)down_trylock(&fcopy_transaction.read_sema);
^

In practice, the message I add should be very rare since it's very unlikely to fail
to get the semaphore in this timeout case -- and in case this happens, it's actually
OK, because the driver has told the host user the PowerShell command should fail.

Thanks,
-- Dexuan

2014-11-21 18:29:56

by KY Srinivasan

[permalink] [raw]
Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure



> -----Original Message-----
> From: Dexuan Cui
> Sent: Thursday, November 20, 2014 6:41 PM
> To: KY Srinivasan; [email protected]; linux-
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Cc: Haiyang Zhang; Vitaly Kuznetsov
> Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
>
> > -----Original Message-----
> > From: KY Srinivasan
> > Sent: Friday, November 21, 2014 1:58 AM
> > To: Dexuan Cui; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Cc: Haiyang Zhang
> > Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on
> > transfer failure
> > > -----Original Message-----
> > > From: Dexuan Cui
> > > Sent: Wednesday, November 19, 2014 11:48 PM
> > > To: KY Srinivasan; [email protected]; linux-
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected]
> > > Cc: Haiyang Zhang
> > > Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on
> > > transfer failure
> > >
> > > > -----Original Message-----
> > > > From: KY Srinivasan
> > > > Sent: Thursday, November 20, 2014 6:59 AM
> > > > > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> > > > > 23b2ce2..177122a 100644
> > > > > --- a/drivers/hv/hv_fcopy.c
> > > > > +++ b/drivers/hv/hv_fcopy.c
> > > > > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct
> > work_struct
> > > > > *dummy)
> > > > > * process the pending transaction.
> > > > > */
> > > > > fcopy_respond_to_host(HV_E_FAIL);
> > > > > +
> > > > > +/* In the case the user-space daemon crashes, hangs or is
> > > > > +killed, we
> > > > > + * need to down the semaphore, otherwise, after the daemon
> > > > > +starts
> > > > > next
> > > > > + * time, the obsolete data in fcopy_transaction.message or
> > > > > + * fcopy_transaction.fcopy_msg will be used immediately.
> > > > > + */
> > > > > +if (down_trylock(&fcopy_transaction.read_sema))
> > > > > +pr_debug("FCP: failed to acquire the semaphore\n");
> > > > > +
> > > > > }
> > > >
> > > > When the daemon is killed, we currently reset the state in the
> > > > release function. Why can't we cleanup the semaphore state
> > > > (initialize) here as
> > > well.
> > > >
> > > > K. Y
> > >
> > > Hi KY,
> > > 1) The down_trylock() here is necessary: the daemon can fail to
> > > respond
> > in 5
> > > seconds due to many reasons, e.g., the VM's CPU and I/O are too
> > > busy. In this case, the daemon may become running later(NOTE: in
> > > this example,
> > the
> > > daemon is not killed), but from the host user's point of view, the
> > PowerShell
> > > copy-vmfile command has failed, so here we have to 'down' the
> > semaphore
> > > anyway, otherwise, the daemon can get obsolete data.
> > >
> > > 2) If we add a line
> > > sema_init(&fcopy_transaction.read_sema, 0); in fcopy_release(), it
> > > seems OK at a glance, but we have to handle the race
> > > condition: the above down_trylock() and the sema_init() can, in
> > > theory,
> > run
> > > simultaneously on different virtual CPUs. It's tricky to address this.
> > >
> > > 3) So I think we can reuse the same semaphore without an actually
> > > unnecessary re-initialization. :-)
> >
> > Agreed; you may want to get rid of the pr_debug() call though.
> >
> > Thanks,
> >
> > K. Y
>
> The pr_debug() is added intentionally according to suggestion of Redhat's
> Vitaly Kuznetsov in the bugzilla:
> https://bugzilla.redhat.com/show_bug.cgi?id=1162100#c5
>
> The function is declared with__must_check in include/linux/semaphore.h:
> extern int __must_check down_trylock(struct semaphore *sem);
>
> Without checking the return value, we'll get these warning if the "Kernel
> hacking" options are enabled:
>
> drivers/hv/hv_fcopy.c: In function 'fcopy_work_func':
> drivers/hv/hv_fcopy.c:95:2: warning: ignoring return value of 'down_trylock',
> declared with attribute warn_unused_result [-Wunused-result]
> (void)down_trylock(&fcopy_transaction.read_sema);
> ^
>
> In practice, the message I add should be very rare since it's very unlikely to
> fail to get the semaphore in this timeout case -- and in case this happens, it's
> actually OK, because the driver has told the host user the PowerShell
> command should fail.
Clearly, we don't want to ignore the return value (to avoid the warning); but that does not mean
that we need to print a message that is of questionable value. That said, I am fine with the code that
you currently have as this message is going to be very rare.

Regards,

K. Y
>
> Thanks,
> -- Dexuan

2014-11-27 02:41:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

On Tue, Nov 11, 2014 at 09:03:26PM -0800, Dexuan Cui wrote:
> In the case the user-space daemon crashes, hangs or is killed, we
> need to down the semaphore, otherwise, after the daemon starts next
> time, the obsolete data in fcopy_transaction.message or
> fcopy_transaction.fcopy_msg will be used immediately.
>
> Cc: K. Y. Srinivasan <[email protected]>
> Signed-off-by: Dexuan Cui <[email protected]>
> ---
> drivers/hv/hv_fcopy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index 23b2ce2..177122a 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct *dummy)
> * process the pending transaction.
> */
> fcopy_respond_to_host(HV_E_FAIL);
> +
> + /* In the case the user-space daemon crashes, hangs or is killed, we
> + * need to down the semaphore, otherwise, after the daemon starts next
> + * time, the obsolete data in fcopy_transaction.message or
> + * fcopy_transaction.fcopy_msg will be used immediately.
> + */
> + if (down_trylock(&fcopy_transaction.read_sema))
> + pr_debug("FCP: failed to acquire the semaphore\n");

Why is "FCP:" needed? pr_debug() should never need any type of prefix.

Please fix.

thanks,

greg k-h

2014-11-27 06:22:18

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

> -----Original Message-----
> From: Greg KH [mailto:[email protected]]
> Sent: Thursday, November 27, 2014 7:54 AM
> To: Dexuan Cui
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; KY Srinivasan;
> Haiyang Zhang; [email protected]
> Subject: Re: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
>
> On Tue, Nov 11, 2014 at 09:03:26PM -0800, Dexuan Cui wrote:
> > In the case the user-space daemon crashes, hangs or is killed, we
> > need to down the semaphore, otherwise, after the daemon starts next
> > time, the obsolete data in fcopy_transaction.message or
> > fcopy_transaction.fcopy_msg will be used immediately.
> >
> > Cc: K. Y. Srinivasan <[email protected]>
> > Signed-off-by: Dexuan Cui <[email protected]>
> > ---
> > drivers/hv/hv_fcopy.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> > index 23b2ce2..177122a 100644
> > --- a/drivers/hv/hv_fcopy.c
> > +++ b/drivers/hv/hv_fcopy.c
> > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> *dummy)
> > * process the pending transaction.
> > */
> > fcopy_respond_to_host(HV_E_FAIL);
> > +
> > + /* In the case the user-space daemon crashes, hangs or is killed, we
> > + * need to down the semaphore, otherwise, after the daemon starts
> next
> > + * time, the obsolete data in fcopy_transaction.message or
> > + * fcopy_transaction.fcopy_msg will be used immediately.
> > + */
> > + if (down_trylock(&fcopy_transaction.read_sema))
> > + pr_debug("FCP: failed to acquire the semaphore\n");
>
> Why is "FCP:" needed? pr_debug() should never need any type of prefix.
>
> Please fix.
>
> thanks,
>
> greg k-h

Ok, I'll send a v2 soon.

Thanks,
-- Dexuan