2010-02-23 17:01:28

by Michael S. Tsirkin

Subject: [PATCH 0/3] vhost: logging fixes

The following patches on top of net-next fix issues related to write
logging in vhost. This fixes all logging issues known to me; migration
now works for me under stress in both TX and RX directions.
Rusty's going on vacation, and I am guessing he won't have time to review
this: Gleb, Juan, Herbert, could one of you review this patchset please?

There's also the send queue full issue reported by
Sridhar Samudrala, for which I'm testing various fixes.
That patch is confined to vhost/net, though,
so there's no conflict; it will be posted separately.


Michael S. Tsirkin (3):
vhost: logging thinko fix
vhost: initialize log eventfd context pointer
vhost: fix get_user_pages_fast error handling

drivers/vhost/vhost.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)


2010-02-23 17:01:46

by Michael S. Tsirkin

Subject: [PATCH 2/3] vhost: initialize log eventfd context pointer

vq log eventfd context pointer needs to be initialized, otherwise
operation may fail or oops if log is enabled but log eventfd not set by
userspace.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/vhost/vhost.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c767279..d4f8fdf 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -121,6 +121,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->kick = NULL;
vq->call_ctx = NULL;
vq->call = NULL;
+ vq->log_ctx = NULL;
}

long vhost_dev_init(struct vhost_dev *dev,
--
1.7.0.18.g0d53a5
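
[Editorial sketch, not part of the patch.] The reset above matters because the logging path only signals when the pointer is non-NULL, so a stale pointer left over from before the reset could be dereferenced. The following userspace sketch illustrates that pattern under stated assumptions: `mock_log_ctx`, `mock_vq`, `mock_vq_reset` and `mock_vq_signal_log` are hypothetical stand-ins, not the kernel's structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an eventfd context; it only counts signals. */
struct mock_log_ctx {
	int signals;
};

/* Hypothetical, heavily reduced virtqueue: just the logging fields. */
struct mock_vq {
	int log_used;
	struct mock_log_ctx *log_ctx;
};

/* Reset every pointer the queue owns, including the log context. */
static void mock_vq_reset(struct mock_vq *vq)
{
	assert(vq != NULL);
	vq->log_used = 0;
	vq->log_ctx = NULL;	/* the assignment the patch adds */
}

/* Signal the log context only if userspace has set one.
 * Returns 1 if a signal was delivered, 0 otherwise. */
static int mock_vq_signal_log(struct mock_vq *vq)
{
	if (!vq->log_ctx)
		return 0;
	vq->log_ctx->signals++;
	return 1;
}
```

With the reset in place, a queue whose log eventfd was never set simply skips the signal instead of following whatever value the field happened to hold.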

2010-02-23 17:01:44

by Michael S. Tsirkin

Subject: [PATCH 1/3] vhost: logging math fix

vhost was doing some complex math to get
offset to log at, and got it wrong by a couple of bytes,
while in fact it's simple: get address where we write,
subtract start of buffer, add log base.

Do it this way.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/vhost/vhost.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6eb1525..c767279 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1004,10 +1004,12 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
if (unlikely(vq->log_used)) {
/* Make sure data is seen before log. */
smp_wmb();
- log_write(vq->log_base, vq->log_addr + sizeof *vq->used->ring *
- (vq->last_used_idx % vq->num),
- sizeof *vq->used->ring);
- log_write(vq->log_base, vq->log_addr, sizeof *vq->used->ring);
+ log_write(vq->log_base,
+ vq->log_addr + ((void *)used - (void *)vq->used),
+ sizeof *used);
+ log_write(vq->log_base,
+ vq->log_addr + offsetof(struct vring_used, idx),
+ sizeof vq->used->idx);
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
--
1.7.0.18.g0d53a5
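
[Editorial sketch, not part of the patch.] The offset arithmetic described above (address written, minus start of buffer, plus log base) can be tried in userspace. The structures below are simplified mock layouts and the function names are hypothetical; only the arithmetic mirrors the diff. The removed formula is included for comparison: it multiplies the slot number by the element size but leaves out the header that precedes the ring array, which is 4 bytes in this mock layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock used-ring element (assumed layout, for illustration only). */
struct mock_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Mock used ring: a small header followed by the element array. */
struct mock_used_ring {
	uint16_t flags;
	uint16_t idx;
	struct mock_used_elem ring[8];
};

/* New approach: address of the element written, minus the start of
 * the used ring, plus the log base address. */
static uint64_t log_offset_of_elem(uint64_t log_addr,
				   const struct mock_used_ring *used,
				   const struct mock_used_elem *elem)
{
	ptrdiff_t off = (const char *)elem - (const char *)used;

	assert(off >= 0 && (size_t)off < sizeof(*used));
	return log_addr + (uint64_t)off;
}

/* The index field is logged at its own offset within the ring. */
static uint64_t log_offset_of_idx(uint64_t log_addr)
{
	return log_addr + offsetof(struct mock_used_ring, idx);
}

/* Removed formula: slot number times element size, with no header. */
static uint64_t old_log_offset_of_elem(uint64_t log_addr, unsigned int slot)
{
	return log_addr + sizeof(struct mock_used_elem) * slot;
}
```

For slot 3 in this layout the two element formulas differ by exactly the size of the header, which is consistent with the "couple of bytes" mentioned in the commit message.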

2010-02-23 17:01:57

by Michael S. Tsirkin

Subject: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

get_user_pages_fast returns number of pages on success, negative value
on failure, but never 0. Fix vhost code to match this logic.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/vhost/vhost.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d4f8fdf..d003504 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
int bit = nr + (log % PAGE_SIZE) * 8;
int r;
r = get_user_pages_fast(log, 1, 1, &page);
- if (r)
+ if (r < 0)
return r;
+ BUG_ON(r != 1);
base = kmap_atomic(page, KM_USER0);
set_bit(bit, base);
kunmap_atomic(base, KM_USER0);
--
1.7.0.18.g0d53a5
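
[Editorial sketch, not part of the patch.] The change in the comparison can be illustrated in userspace. With the old `if (r)` test, a successful return of 1 was treated as a failure, so the bit was never set. In the sketch below `mock_pin_one_page` is a hypothetical stand-in that returns 1 on success or `-EFAULT` on failure, as described in the commit message; it is not the kernel function. The byte-wise bit store is also a simplification of the kernel's `set_bit()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096UL

/* Hypothetical stand-in for pinning one user page: returns the number
 * of pages pinned (1) on success, or a negative errno on failure. */
static int mock_pin_one_page(int mapped, unsigned char *backing,
			     unsigned char **page)
{
	if (!mapped)
		return -EFAULT;
	*page = backing;
	return 1;
}

/* Set bit 'nr' of the byte at 'log_addr', addressed through the page
 * that contains it. 'backing' must be MOCK_PAGE_SIZE bytes long. */
static int mock_set_bit_to_user(int nr, unsigned long log_addr,
				int mapped, unsigned char *backing)
{
	unsigned char *page;
	unsigned long bit = (unsigned long)nr + (log_addr % MOCK_PAGE_SIZE) * 8;
	int r;

	r = mock_pin_one_page(mapped, backing, &page);
	if (r < 0)		/* only negative values are errors */
		return r;
	assert(r == 1);		/* exactly one page was requested */
	assert(bit / 8 < MOCK_PAGE_SIZE);
	page[bit / 8] |= (unsigned char)(1u << (bit % 8));
	return 0;
}
```

An unmapped page takes the error path and returns `-EFAULT`, so the assertion is only reached after a successful pin.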

2010-02-23 17:35:17

by Gleb Natapov

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 06:57:58PM +0200, Michael S. Tsirkin wrote:
> get_user_pages_fast returns number of pages on success, negative value
> on failure, but never 0. Fix vhost code to match this logic.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> drivers/vhost/vhost.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index d4f8fdf..d003504 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
> int bit = nr + (log % PAGE_SIZE) * 8;
> int r;
> r = get_user_pages_fast(log, 1, 1, &page);
> - if (r)
> + if (r < 0)
> return r;
> + BUG_ON(r != 1);
Can't this be easily triggered from user space?

> base = kmap_atomic(page, KM_USER0);
> set_bit(bit, base);
> kunmap_atomic(base, KM_USER0);
> --
> 1.7.0.18.g0d53a5

--
Gleb.

2010-02-23 17:36:54

by Michael S. Tsirkin

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 07:34:34PM +0200, Gleb Natapov wrote:
> On Tue, Feb 23, 2010 at 06:57:58PM +0200, Michael S. Tsirkin wrote:
> > get_user_pages_fast returns number of pages on success, negative value
> > on failure, but never 0. Fix vhost code to match this logic.
> >
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> > drivers/vhost/vhost.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index d4f8fdf..d003504 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
> > int bit = nr + (log % PAGE_SIZE) * 8;
> > int r;
> > r = get_user_pages_fast(log, 1, 1, &page);
> > - if (r)
> > + if (r < 0)
> > return r;
> > + BUG_ON(r != 1);
> Can't this be easily triggered from user space?

I think no. get_user_pages_fast always returns number of pages
pinned (in this case always 1) or an error (< 0).
Anything else is a kernel bug.

> > base = kmap_atomic(page, KM_USER0);
> > set_bit(bit, base);
> > kunmap_atomic(base, KM_USER0);
> > --
> > 1.7.0.18.g0d53a5
>
> --
> Gleb.

2010-02-23 17:40:31

by Gleb Natapov

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 07:32:58PM +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2010 at 07:34:34PM +0200, Gleb Natapov wrote:
> > On Tue, Feb 23, 2010 at 06:57:58PM +0200, Michael S. Tsirkin wrote:
> > > get_user_pages_fast returns number of pages on success, negative value
> > > on failure, but never 0. Fix vhost code to match this logic.
> > >
> > > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > > ---
> > > drivers/vhost/vhost.c | 3 ++-
> > > 1 files changed, 2 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index d4f8fdf..d003504 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
> > > int bit = nr + (log % PAGE_SIZE) * 8;
> > > int r;
> > > r = get_user_pages_fast(log, 1, 1, &page);
> > > - if (r)
> > > + if (r < 0)
> > > return r;
> > > + BUG_ON(r != 1);
> > Can't this be easily triggered from user space?
>
> I think no. get_user_pages_fast always returns number of pages
> pinned (in this case always 1) or an error (< 0).
> Anything else is a kernel bug.
>
But what if page is unmapped from userspace?

> > > base = kmap_atomic(page, KM_USER0);
> > > set_bit(bit, base);
> > > kunmap_atomic(base, KM_USER0);
> > > --
> > > 1.7.0.18.g0d53a5
> >
> > --
> > Gleb.

--
Gleb.

2010-02-23 17:42:37

by Michael S. Tsirkin

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 07:39:52PM +0200, Gleb Natapov wrote:
> On Tue, Feb 23, 2010 at 07:32:58PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Feb 23, 2010 at 07:34:34PM +0200, Gleb Natapov wrote:
> > > On Tue, Feb 23, 2010 at 06:57:58PM +0200, Michael S. Tsirkin wrote:
> > > > get_user_pages_fast returns number of pages on success, negative value
> > > > on failure, but never 0. Fix vhost code to match this logic.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > > > ---
> > > > drivers/vhost/vhost.c | 3 ++-
> > > > 1 files changed, 2 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index d4f8fdf..d003504 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
> > > > int bit = nr + (log % PAGE_SIZE) * 8;
> > > > int r;
> > > > r = get_user_pages_fast(log, 1, 1, &page);
> > > > - if (r)
> > > > + if (r < 0)
> > > > return r;
> > > > + BUG_ON(r != 1);
> > > Can't this be easily triggered from user space?
> >
> > I think no. get_user_pages_fast always returns number of pages
> > pinned (in this case always 1) or an error (< 0).
> > Anything else is a kernel bug.
> >
> But what if page is unmapped from userspace?

Then we get -EFAULT

> > > > base = kmap_atomic(page, KM_USER0);
> > > > set_bit(bit, base);
> > > > kunmap_atomic(base, KM_USER0);
> > > > --
> > > > 1.7.0.18.g0d53a5
> > >
> > > --
> > > Gleb.
>
> --
> Gleb.

2010-02-23 17:43:48

by Gleb Natapov

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 07:39:08PM +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 23, 2010 at 07:39:52PM +0200, Gleb Natapov wrote:
> > On Tue, Feb 23, 2010 at 07:32:58PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, Feb 23, 2010 at 07:34:34PM +0200, Gleb Natapov wrote:
> > > > On Tue, Feb 23, 2010 at 06:57:58PM +0200, Michael S. Tsirkin wrote:
> > > > > get_user_pages_fast returns number of pages on success, negative value
> > > > > on failure, but never 0. Fix vhost code to match this logic.
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > > > > ---
> > > > > drivers/vhost/vhost.c | 3 ++-
> > > > > 1 files changed, 2 insertions(+), 1 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > index d4f8fdf..d003504 100644
> > > > > --- a/drivers/vhost/vhost.c
> > > > > +++ b/drivers/vhost/vhost.c
> > > > > @@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
> > > > > int bit = nr + (log % PAGE_SIZE) * 8;
> > > > > int r;
> > > > > r = get_user_pages_fast(log, 1, 1, &page);
> > > > > - if (r)
> > > > > + if (r < 0)
> > > > > return r;
> > > > > + BUG_ON(r != 1);
> > > > Can't this be easily triggered from user space?
> > >
> > > I think no. get_user_pages_fast always returns number of pages
> > > pinned (in this case always 1) or an error (< 0).
> > > Anything else is a kernel bug.
> > >
> > But what if page is unmapped from userspace?
>
> Then we get -EFAULT
>
Ah correct.

> > > > > base = kmap_atomic(page, KM_USER0);
> > > > > set_bit(bit, base);
> > > > > kunmap_atomic(base, KM_USER0);
> > > > > --
> > > > > 1.7.0.18.g0d53a5
> > > >
> > > > --
> > > > Gleb.
> >
> > --
> > Gleb.

--
Gleb.

2010-02-23 19:26:30

by Juan Quintela

Subject: Re: [PATCH 1/3] vhost: logging math fix

"Michael S. Tsirkin" <[email protected]> wrote:
> vhost was doing some complex math to get
> offset to log at, and got it wrong by a couple of bytes,
> while in fact it's simple: get address where we write,
> subtract start of buffer, add log base.
>
> Do it this way.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>

Reviewed-by: Juan Quintela <[email protected]>

> ---
> drivers/vhost/vhost.c | 10 ++++++----
> 1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 6eb1525..c767279 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1004,10 +1004,12 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
> if (unlikely(vq->log_used)) {
> /* Make sure data is seen before log. */

We explain what smp_wmb() does.

> smp_wmb();
> - log_write(vq->log_base, vq->log_addr + sizeof *vq->used->ring *
> - (vq->last_used_idx % vq->num),
> - sizeof *vq->used->ring);
> - log_write(vq->log_base, vq->log_addr, sizeof *vq->used->ring);
> + log_write(vq->log_base,
> + vq->log_addr + ((void *)used - (void *)vq->used),
> + sizeof *used);
> + log_write(vq->log_base,
> + vq->log_addr + offsetof(struct vring_used, idx),
> + sizeof vq->used->idx);

While we're here, can we add a comment explaining _what_ we are trying to
write to the log? Michael explains that it is the used element and the
index, but nothing in the code states that.

> if (vq->log_ctx)
> eventfd_signal(vq->log_ctx, 1);
> }

2010-02-23 19:31:24

by Juan Quintela

Subject: Re: [PATCH 2/3] vhost: initialize log eventfd context pointer

"Michael S. Tsirkin" <[email protected]> wrote:
> vq log eventfd context pointer needs to be initialized, otherwise
> operation may fail or oops if log is enabled but log eventfd not set by
> userspace.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>

Reviewed-by: Juan Quintela <[email protected]>

When log_ctx for device is created, it is copied to the vq. This reset
was missing.

> ---
> drivers/vhost/vhost.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index c767279..d4f8fdf 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -121,6 +121,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
> vq->kick = NULL;
> vq->call_ctx = NULL;
> vq->call = NULL;
> + vq->log_ctx = NULL;
> }
>
> long vhost_dev_init(struct vhost_dev *dev,

2010-02-23 19:57:51

by Juan Quintela

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

"Michael S. Tsirkin" <[email protected]> wrote:
> get_user_pages_fast returns number of pages on success, negative value
> on failure, but never 0. Fix vhost code to match this logic.

It can return 0 if you ask for 0 pages :)
From the comment:

* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
* were pinned, returns -errno.
*/

I agree that the code was wrong, but the BUG_ON() is not necessary
IMHO. The important bit is the change in the comparison.

Reviewed-by: Juan Quintela <[email protected]>


> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> drivers/vhost/vhost.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index d4f8fdf..d003504 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -646,8 +646,9 @@ static int set_bit_to_user(int nr, void __user *addr)
> int bit = nr + (log % PAGE_SIZE) * 8;
> int r;
> r = get_user_pages_fast(log, 1, 1, &page);
> - if (r)
> + if (r < 0)
> return r;
> + BUG_ON(r != 1);
> base = kmap_atomic(page, KM_USER0);
> set_bit(bit, base);
> kunmap_atomic(base, KM_USER0);

2010-02-23 22:42:19

by David Miller

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling


Just for the record I'm generally not interested in vhost
patches.

If it's a specific network one that will be merged via
the networking tree, yes please CC: me.

But if it's a bunch of changes to vhost.c and other pieces
of infrastructure, feel free to leave me out of it. It just
clutters my already overflowing inbox.

Thanks.

2010-02-24 05:41:08

by Michael S. Tsirkin

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 02:42:35PM -0800, David Miller wrote:
>
> Just for the record I'm generally not interested in vhost
> patches.
>
> If it's a specific network one that will be merged via
> the networking tree, yes please CC: me.
>
> But if it's a bunch of changes to vhost.c and other pieces
> of infrastructure, feel free to leave me out of it. It just
> clutters my already overflowing inbox.
>
> Thanks.

Dave, so while Rusty's on vacation, what's the best way to get vhost
infrastructure fixes in? Are you ok with getting pull requests and
merging them into net-next? That should keep the clutter in your inbox
to the minimum.

Of course network changes would still go the usual way.

--
MST

2010-02-24 07:04:15

by David Miller

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

From: "Michael S. Tsirkin" <[email protected]>
Date: Wed, 24 Feb 2010 07:37:37 +0200

> Dave, so while Rusty's on vacation, what's the best way to get vhost
> infrastructure fixes in? Are you ok with getting pull requests and
> merging them into net-next? That should keep the clutter in your inbox
> to the minimum.
>
> Of course network changes would still go the usual way.

Well, who is providing oversight of vhost work while he's
gone? Has he, implicitly or explicitly, appointed a maintainer
while he's away?

2010-02-24 07:39:50

by Michael S. Tsirkin

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

On Tue, Feb 23, 2010 at 11:04:28PM -0800, David Miller wrote:
> From: "Michael S. Tsirkin" <[email protected]>
> Date: Wed, 24 Feb 2010 07:37:37 +0200
>
> > Dave, so while Rusty's on vacation, what's the best way to get vhost
> > infrastructure fixes in? Are you ok with getting pull requests and
> > merging them into net-next? That should keep the clutter in your inbox
> > to the minimum.
> >
> > Of course network changes would still go the usual way.
>
> Well, who is providing oversight of vhost work while he's
> gone?

My plan was to get peer review of the patches before merging.
So far Juan Quintela and Gleb Natapov gave feedback.

> Has he, implicitly or explicitly, appointed a maintainer
> while he's away?

Implicitly, I guess. He said "if there's an issue Michael Tsirkin is the
best person to resolve it", this was wrt merging his virtio&lguest tree.
He didn't mention vhost, I wrote all of vhost though, there shouldn't be
an issue with that.

--
MST

2010-02-24 07:41:20

by David Miller

Subject: Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling

From: "Michael S. Tsirkin" <[email protected]>
Date: Wed, 24 Feb 2010 09:34:25 +0200

> Implicitly, I guess. He said "if there's an issue Michael Tsirkin is the
> best person to resolve it", this was wrt merging his virtio&lguest tree.
> He didn't mention vhost, I wrote all of vhost though, there shouldn't be
> an issue with that.

That's good enough for me.

Feel free to set up a tree for me to pull from.