Subject: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

When copying a large file over sftp over vsock, the data size is usually 32kB,
and kmalloc() can fail when it tries to allocate 32 contiguous 32kB regions.

Call Trace:
[<ffffffffb6a0df64>] dump_stack+0x97/0xdb
[<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
[<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
[<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
[<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
[<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
[<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
[<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
[<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
[<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
[<ffffffffb683ddce>] kthread+0xfd/0x105
[<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
[<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
[<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
[<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3

Work around this by using kvmalloc() instead, which falls back to vmalloc()
when a physically contiguous allocation cannot be satisfied.

Signed-off-by: Junichi Uekawa <[email protected]>
---

drivers/vhost/vsock.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 368330417bde..5703775af129 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
return NULL;
}

- pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
+ pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
if (!pkt->buf) {
kfree(pkt);
return NULL;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ec2c2afbf0d0..3a12aee33e92 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);

void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
{
- kfree(pkt->buf);
+ kvfree(pkt->buf);
kfree(pkt);
}
EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
--
2.37.3.998.g577e59143f-goog


2022-09-28 08:37:24

by Stefano Garzarella

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
>When copying a large file over sftp over vsock, data size is usually 32kB,
>and kmalloc seems to fail to try to allocate 32 32kB regions.
>
> Call Trace:
> [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> [<ffffffffb683ddce>] kthread+0xfd/0x105
> [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>
>Work around by doing kvmalloc instead.
>
>Signed-off-by: Junichi Uekawa <[email protected]>
>---
>
> drivers/vhost/vsock.c | 2 +-
> net/vmw_vsock/virtio_transport_common.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 368330417bde..5703775af129 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> return NULL;
> }
>
>- pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
>+ pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> if (!pkt->buf) {
> kfree(pkt);
> return NULL;
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index ec2c2afbf0d0..3a12aee33e92 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
>
> void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> {
>- kfree(pkt->buf);
>+ kvfree(pkt->buf);

virtio_transport_free_pkt() is also used in virtio_transport.c and
vsock_loopback.c, where pkt->buf is allocated with kmalloc(), but IIUC
kvfree() can be used on that memory too, so this should be fine.
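
For reference, kvfree() roughly does the following (a sketch of the
mm/util.c logic), which is why it is safe on both kmalloc()-backed and
kvmalloc()/vmalloc()-backed buffers:

    void kvfree(const void *addr)
    {
            if (is_vmalloc_addr(addr))
                    vfree(addr);
            else
                    kfree(addr);
    }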

> kfree(pkt);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
>--
>2.37.3.998.g577e59143f-goog
>

This issue should go away with Bobby's work on introducing sk_buff
[1], but we can queue this for now.

I'm not sure whether we should do the same in the virtio-vsock driver
(virtio_transport.c). Here in vhost-vsock the allocated buffer is only
used in the host, while in the virtio-vsock driver the buffer is exposed
to the device emulated in the host, so it should be physically contiguous
(if not, maybe we need to adjust virtio_vsock_rx_fill()).
So for now I think it is fine to use kvmalloc only in vhost-vsock
(eventually we can also use it in vsock_loopback), since Bobby's patch
should rework this code anyway:

Reviewed-by: Stefano Garzarella <[email protected]>

[1]
https://lore.kernel.org/lkml/65d117ddc530d12a6d47fcc45b38891465a90d9f.1660362668.git.bobby.eshleman@bytedance.com/

Thanks,
Stefano

2022-09-28 09:36:13

by Michael S. Tsirkin

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > When copying a large file over sftp over vsock, data size is usually 32kB,
> > and kmalloc seems to fail to try to allocate 32 32kB regions.
> >
> > Call Trace:
> > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> > [<ffffffffb683ddce>] kthread+0xfd/0x105
> > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> >
> > Work around by doing kvmalloc instead.
> >
> > Signed-off-by: Junichi Uekawa <[email protected]>

My worry here is that this is more of a workaround.
It would be better to not allocate memory so aggressively:
if we are so short on memory we should probably process
packets one at a time. Is that very hard to implement?



> > ---
> >
> > drivers/vhost/vsock.c | 2 +-
> > net/vmw_vsock/virtio_transport_common.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index 368330417bde..5703775af129 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> > return NULL;
> > }
> >
> > - pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> > + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> > if (!pkt->buf) {
> > kfree(pkt);
> > return NULL;
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index ec2c2afbf0d0..3a12aee33e92 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> >
> > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> > {
> > - kfree(pkt->buf);
> > + kvfree(pkt->buf);
>
> virtio_transport_free_pkt() is used also in virtio_transport.c and
> vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
> kvfree() can be used with that memory, so this should be fine.
>
> > kfree(pkt);
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> > --
> > 2.37.3.998.g577e59143f-goog
> >
>
> This issue should go away with the Bobby's work about introducing sk_buff
> [1], but we can queue this for now.
>
> I'm not sure if we should do the same also in the virtio-vsock driver
> (virtio_transport.c). Here in vhost-vsock the buf allocated is only used in
> the host, while in the virtio-vsock driver the buffer is exposed to the
> device emulated in the host, so it should be physically contiguous (if not,
> maybe we need to adjust virtio_vsock_rx_fill()).

More importantly, it needs to support the DMA API, which IIUC kvmalloc
memory does not.
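
For context, on the driver side the rx buffer is handed to the device
through scatterlist entries that the virtio core may DMA-map, and that
path assumes page-backed, physically contiguous memory. Roughly (an
abridged sketch of virtio_vsock_rx_fill(), so details may differ):

    struct scatterlist hdr, buf, *sgs[2];

    sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
    sgs[0] = &hdr;
    sg_init_one(&buf, pkt->buf, buf_len);
    sgs[1] = &buf;

    /* the device writes into pkt->buf directly, so here it must stay
     * a kmalloc()-style, physically contiguous allocation */
    virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);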

> So for now I think is fine to use kvmalloc only on vhost-vsock (eventually
> we can use it also in vsock_loopback), since the Bobby's patch should rework
> this code:
>
> Reviewed-by: Stefano Garzarella <[email protected]>
>
> [1] https://lore.kernel.org/lkml/65d117ddc530d12a6d47fcc45b38891465a90d9f.1660362668.git.bobby.eshleman@bytedance.com/
>
> Thanks,
> Stefano

2022-09-28 15:21:56

by Stefano Garzarella

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
>On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
>> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
>> > When copying a large file over sftp over vsock, data size is usually 32kB,
>> > and kmalloc seems to fail to try to allocate 32 32kB regions.
>> >
>> > Call Trace:
>> > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
>> > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
>> > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
>> > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
>> > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
>> > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
>> > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
>> > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
>> > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
>> > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
>> > [<ffffffffb683ddce>] kthread+0xfd/0x105
>> > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
>> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
>> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> >
>> > Work around by doing kvmalloc instead.
>> >
>> > Signed-off-by: Junichi Uekawa <[email protected]>
>
>My worry here is that this in more of a work around.
>It would be better to not allocate memory so aggressively:
>if we are so short on memory we should probably process
>packets one at a time. Is that very hard to implement?

Currently the "virtio_vsock_pkt" is allocated in the "handle_kick"
callback of the TX virtqueue. The packet is then multiplexed onto the
right socket queue, and user space can dequeue it whenever it wants.

So maybe we can stop processing the virtqueue if we are short on memory,
but when can we restart the TX virtqueue processing?

I think as long as the guest used only 4K buffers we had no problem, but
now that it can create larger buffers the host may not be able to
allocate them contiguously. Since there is no need for them to be
contiguous here, I think this patch is okay.
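
For context, the path this patch touches looks roughly like this (an
abridged sketch of vhost_vsock_alloc_pkt(), so details may differ):

    pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);   /* was kmalloc() */
    if (!pkt->buf) {
            kfree(pkt);
            return NULL;
    }

    /* on the vhost side the payload is only ever touched by CPU
     * copies (here, and later when user space dequeues it), so a
     * vmalloc-backed buffer is fine */
    nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
    if (nbytes != pkt->len) {
            virtio_transport_free_pkt(pkt);
            return NULL;
    }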

However, if we switch to sk_buff (as Bobby is already doing), maybe we
don't have this problem because I think there is some kind of
pre-allocated pool.

>
>
>
>> > ---
>> >
>> > drivers/vhost/vsock.c | 2 +-
>> > net/vmw_vsock/virtio_transport_common.c | 2 +-
>> > 2 files changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index 368330417bde..5703775af129 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>> > return NULL;
>> > }
>> >
>> > - pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
>> > + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
>> > if (!pkt->buf) {
>> > kfree(pkt);
>> > return NULL;
>> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > index ec2c2afbf0d0..3a12aee33e92 100644
>> > --- a/net/vmw_vsock/virtio_transport_common.c
>> > +++ b/net/vmw_vsock/virtio_transport_common.c
>> > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
>> >
>> > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
>> > {
>> > - kfree(pkt->buf);
>> > + kvfree(pkt->buf);
>>
>> virtio_transport_free_pkt() is used also in virtio_transport.c and
>> vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
>> kvfree() can be used with that memory, so this should be fine.
>>
>> > kfree(pkt);
>> > }
>> > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
>> > --
>> > 2.37.3.998.g577e59143f-goog
>> >
>>
>> This issue should go away with the Bobby's work about introducing sk_buff
>> [1], but we can queue this for now.
>>
>> I'm not sure if we should do the same also in the virtio-vsock driver
>> (virtio_transport.c). Here in vhost-vsock the buf allocated is only used in
>> the host, while in the virtio-vsock driver the buffer is exposed to the
>> device emulated in the host, so it should be physically contiguous (if not,
>> maybe we need to adjust virtio_vsock_rx_fill()).
>
>More importantly it needs to support DMA API which IIUC kvmalloc
>memory does not.
>

Right, good point!

Thanks,
Stefano

2022-09-28 18:17:07

by Bobby Eshleman

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 05:11:35PM +0200, Stefano Garzarella wrote:
> On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> > > On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > > > When copying a large file over sftp over vsock, data size is usually 32kB,
> > > > and kmalloc seems to fail to try to allocate 32 32kB regions.
> > > >
> > > > Call Trace:
> > > > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> > > > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> > > > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> > > > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> > > > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> > > > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> > > > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> > > > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> > > > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > > > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> > > > [<ffffffffb683ddce>] kthread+0xfd/0x105
> > > > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > >
> > > > Work around by doing kvmalloc instead.
> > > >
> > > > Signed-off-by: Junichi Uekawa <[email protected]>
> >
> > My worry here is that this in more of a work around.
> > It would be better to not allocate memory so aggressively:
> > if we are so short on memory we should probably process
> > packets one at a time. Is that very hard to implement?
>
> Currently the "virtio_vsock_pkt" is allocated in the "handle_kick" callback
> of TX virtqueue. Then the packet is multiplexed on the right socket queue,
> then the user space can de-queue it whenever they want.
>
> So maybe we can stop processing the virtqueue if we are short on memory, but
> when can we restart the TX virtqueue processing?
>
> I think as long as the guest used only 4K buffers we had no problem, but now
> that it can create larger buffers the host may not be able to allocate it
> contiguously. Since there is no need to have them contiguous here, I think
> this patch is okay.
>
> However, if we switch to sk_buff (as Bobby is already doing), maybe we don't
> have this problem because I think there is some kind of pre-allocated pool.
>
> >
> >
> >
> > > > ---
> > > >
> > > > drivers/vhost/vsock.c | 2 +-
> > > > net/vmw_vsock/virtio_transport_common.c | 2 +-
> > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index 368330417bde..5703775af129 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> > > > return NULL;
> > > > }
> > > >
> > > > - pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> > > > + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> > > > if (!pkt->buf) {
> > > > kfree(pkt);
> > > > return NULL;
> > > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > index ec2c2afbf0d0..3a12aee33e92 100644
> > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> > > >
> > > > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> > > > {
> > > > - kfree(pkt->buf);
> > > > + kvfree(pkt->buf);
> > >
> > > virtio_transport_free_pkt() is used also in virtio_transport.c and
> > > vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
> > > kvfree() can be used with that memory, so this should be fine.
> > >
> > > > kfree(pkt);
> > > > }
> > > > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> > > > --
> > > > 2.37.3.998.g577e59143f-goog
> > > >
> > >
> > > This issue should go away with the Bobby's work about introducing sk_buff
> > > [1], but we can queue this for now.
> > >

I also expect the sk_buff allocator can handle this. I've tested the
sk_buff patch with 64K payloads via uperf without issue.

2022-09-28 20:17:10

by Michael S. Tsirkin

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 05:11:35PM +0200, Stefano Garzarella wrote:
> On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> > > On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > > > When copying a large file over sftp over vsock, data size is usually 32kB,
> > > > and kmalloc seems to fail to try to allocate 32 32kB regions.
> > > >
> > > > Call Trace:
> > > > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> > > > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> > > > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> > > > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> > > > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> > > > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> > > > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> > > > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> > > > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > > > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> > > > [<ffffffffb683ddce>] kthread+0xfd/0x105
> > > > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > >
> > > > Work around by doing kvmalloc instead.
> > > >
> > > > Signed-off-by: Junichi Uekawa <[email protected]>
> >
> > My worry here is that this in more of a work around.
> > It would be better to not allocate memory so aggressively:
> > if we are so short on memory we should probably process
> > packets one at a time. Is that very hard to implement?
>
> Currently the "virtio_vsock_pkt" is allocated in the "handle_kick" callback
> of TX virtqueue. Then the packet is multiplexed on the right socket queue,
> then the user space can de-queue it whenever they want.
>
> So maybe we can stop processing the virtqueue if we are short on memory, but
> when can we restart the TX virtqueue processing?

Assuming you added at least one buffer, the time to restart would be
after that buffer has been used.


> I think as long as the guest used only 4K buffers we had no problem, but now
> that it can create larger buffers the host may not be able to allocate it
> contiguously. Since there is no need to have them contiguous here, I think
> this patch is okay.
>
> However, if we switch to sk_buff (as Bobby is already doing), maybe we don't
> have this problem because I think there is some kind of pre-allocated pool.
>
> >
> >
> >
> > > > ---
> > > >
> > > > drivers/vhost/vsock.c | 2 +-
> > > > net/vmw_vsock/virtio_transport_common.c | 2 +-
> > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index 368330417bde..5703775af129 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> > > > return NULL;
> > > > }
> > > >
> > > > - pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> > > > + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> > > > if (!pkt->buf) {
> > > > kfree(pkt);
> > > > return NULL;
> > > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > > index ec2c2afbf0d0..3a12aee33e92 100644
> > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> > > >
> > > > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> > > > {
> > > > - kfree(pkt->buf);
> > > > + kvfree(pkt->buf);
> > >
> > > virtio_transport_free_pkt() is used also in virtio_transport.c and
> > > vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
> > > kvfree() can be used with that memory, so this should be fine.
> > >
> > > > kfree(pkt);
> > > > }
> > > > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> > > > --
> > > > 2.37.3.998.g577e59143f-goog
> > > >
> > >
> > > This issue should go away with the Bobby's work about introducing sk_buff
> > > [1], but we can queue this for now.
> > >
> > > I'm not sure if we should do the same also in the virtio-vsock driver
> > > (virtio_transport.c). Here in vhost-vsock the buf allocated is only used in
> > > the host, while in the virtio-vsock driver the buffer is exposed to the
> > > device emulated in the host, so it should be physically contiguous (if not,
> > > maybe we need to adjust virtio_vsock_rx_fill()).
> >
> > More importantly it needs to support DMA API which IIUC kvmalloc
> > memory does not.
> >
>
> Right, good point!
>
> Thanks,
> Stefano

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

2022年9月29日(木) 0:11 Stefano Garzarella <[email protected]>:
>
> On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> >On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> >> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> >> > When copying a large file over sftp over vsock, data size is usually 32kB,
> >> > and kmalloc seems to fail to try to allocate 32 32kB regions.
> >> >
> >> > Call Trace:
> >> > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> >> > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> >> > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> >> > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> >> > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> >> > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> >> > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> >> > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> >> > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> >> > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> >> > [<ffffffffb683ddce>] kthread+0xfd/0x105
> >> > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> >> > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> >> >
> >> > Work around by doing kvmalloc instead.
> >> >
> >> > Signed-off-by: Junichi Uekawa <[email protected]>
> >
> >My worry here is that this in more of a work around.
> >It would be better to not allocate memory so aggressively:
> >if we are so short on memory we should probably process
> >packets one at a time. Is that very hard to implement?
>
> Currently the "virtio_vsock_pkt" is allocated in the "handle_kick"
> callback of TX virtqueue. Then the packet is multiplexed on the right
> socket queue, then the user space can de-queue it whenever they want.
>
> So maybe we can stop processing the virtqueue if we are short on memory,
> but when can we restart the TX virtqueue processing?
>
> I think as long as the guest used only 4K buffers we had no problem, but
> now that it can create larger buffers the host may not be able to
> allocate it contiguously. Since there is no need to have them contiguous
> here, I think this patch is okay.
>
> However, if we switch to sk_buff (as Bobby is already doing), maybe we
> don't have this problem because I think there is some kind of
> pre-allocated pool.
>

Thank you for the review! I was wondering if this is a reasonable workaround
(we found that this patch turns a reliably crashing system into a reliably
surviving system).


... Sounds like it is a reasonable patch to backport to older kernels?

> >
> >
> >
> >> > ---
> >> >
> >> > drivers/vhost/vsock.c | 2 +-
> >> > net/vmw_vsock/virtio_transport_common.c | 2 +-
> >> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >> > index 368330417bde..5703775af129 100644
> >> > --- a/drivers/vhost/vsock.c
> >> > +++ b/drivers/vhost/vsock.c
> >> > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> >> > return NULL;
> >> > }
> >> >
> >> > - pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> >> > + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> >> > if (!pkt->buf) {
> >> > kfree(pkt);
> >> > return NULL;
> >> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >> > index ec2c2afbf0d0..3a12aee33e92 100644
> >> > --- a/net/vmw_vsock/virtio_transport_common.c
> >> > +++ b/net/vmw_vsock/virtio_transport_common.c
> >> > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> >> >
> >> > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> >> > {
> >> > - kfree(pkt->buf);
> >> > + kvfree(pkt->buf);
> >>
> >> virtio_transport_free_pkt() is used also in virtio_transport.c and
> >> vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
> >> kvfree() can be used with that memory, so this should be fine.
> >>
> >> > kfree(pkt);
> >> > }
> >> > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> >> > --
> >> > 2.37.3.998.g577e59143f-goog
> >> >
> >>
> >> This issue should go away with the Bobby's work about introducing sk_buff
> >> [1], but we can queue this for now.
> >>
> >> I'm not sure if we should do the same also in the virtio-vsock driver
> >> (virtio_transport.c). Here in vhost-vsock the buf allocated is only used in
> >> the host, while in the virtio-vsock driver the buffer is exposed to the
> >> device emulated in the host, so it should be physically contiguous (if not,
> >> maybe we need to adjust virtio_vsock_rx_fill()).
> >
> >More importantly it needs to support DMA API which IIUC kvmalloc
> >memory does not.
> >
>
> Right, good point!
>
> Thanks,
> Stefano
>


--
Junichi Uekawa
Google

2022-09-29 07:23:31

by Michael S. Tsirkin

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, Sep 29, 2022 at 08:14:24AM +0900, Junichi Uekawa (上川純一) wrote:
> 2022年9月29日(木) 0:11 Stefano Garzarella <[email protected]>:
> >
> > On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> > >On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> > >> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > >> > When copying a large file over sftp over vsock, data size is usually 32kB,
> > >> > and kmalloc seems to fail to try to allocate 32 32kB regions.
> > >> >
> > >> > Call Trace:
> > >> > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> > >> > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> > >> > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> > >> > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> > >> > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> > >> > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> > >> > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> > >> > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> > >> > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > >> > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> > >> > [<ffffffffb683ddce>] kthread+0xfd/0x105
> > >> > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > >> > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> > >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > >> >
> > >> > Work around by doing kvmalloc instead.
> > >> >
> > >> > Signed-off-by: Junichi Uekawa <[email protected]>
> > >
> > >My worry here is that this in more of a work around.
> > >It would be better to not allocate memory so aggressively:
> > >if we are so short on memory we should probably process
> > >packets one at a time. Is that very hard to implement?
> >
> > Currently the "virtio_vsock_pkt" is allocated in the "handle_kick"
> > callback of TX virtqueue. Then the packet is multiplexed on the right
> > socket queue, then the user space can de-queue it whenever they want.
> >
> > So maybe we can stop processing the virtqueue if we are short on memory,
> > but when can we restart the TX virtqueue processing?
> >
> > I think as long as the guest used only 4K buffers we had no problem, but
> > now that it can create larger buffers the host may not be able to
> > allocate it contiguously. Since there is no need to have them contiguous
> > here, I think this patch is okay.
> >
> > However, if we switch to sk_buff (as Bobby is already doing), maybe we
> > don't have this problem because I think there is some kind of
> > pre-allocated pool.
> >
>
> Thank you for the review! I was wondering if this is a reasonable workaround (as
> we found that this patch makes a reliably crashing system into a
> reliably surviving system.)
>
>
> ... Sounds like it is a reasonable patch to use backported to older kernels?

Hmm. Good point about stable. OK.

Acked-by: Michael S. Tsirkin <[email protected]>


> > >
> > >
> > >
> > >> > ---
> > >> >
> > >> > drivers/vhost/vsock.c | 2 +-
> > >> > net/vmw_vsock/virtio_transport_common.c | 2 +-
> > >> > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > >> > index 368330417bde..5703775af129 100644
> > >> > --- a/drivers/vhost/vsock.c
> > >> > +++ b/drivers/vhost/vsock.c
> > >> > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> > >> > return NULL;
> > >> > }
> > >> >
> > >> > - pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> > >> > + pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> > >> > if (!pkt->buf) {
> > >> > kfree(pkt);
> > >> > return NULL;
> > >> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > >> > index ec2c2afbf0d0..3a12aee33e92 100644
> > >> > --- a/net/vmw_vsock/virtio_transport_common.c
> > >> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > >> > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> > >> >
> > >> > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> > >> > {
> > >> > - kfree(pkt->buf);
> > >> > + kvfree(pkt->buf);
> > >>
> > >> virtio_transport_free_pkt() is used also in virtio_transport.c and
> > >> vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
> > >> kvfree() can be used with that memory, so this should be fine.
> > >>
> > >> > kfree(pkt);
> > >> > }
> > >> > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> > >> > --
> > >> > 2.37.3.998.g577e59143f-goog
> > >> >
> > >>
> > >> This issue should go away with the Bobby's work about introducing sk_buff
> > >> [1], but we can queue this for now.
> > >>
> > >> I'm not sure if we should do the same also in the virtio-vsock driver
> > >> (virtio_transport.c). Here in vhost-vsock the buf allocated is only used in
> > >> the host, while in the virtio-vsock driver the buffer is exposed to the
> > >> device emulated in the host, so it should be physically contiguous (if not,
> > >> maybe we need to adjust virtio_vsock_rx_fill()).
> > >
> > >More importantly it needs to support DMA API which IIUC kvmalloc
> > >memory does not.
> > >
> >
> > Right, good point!
> >
> > Thanks,
> > Stefano
> >
>
>
> --
> Junichi Uekawa
> Google

2022-09-29 07:42:34

by Stefano Garzarella

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 04:02:12PM -0400, Michael S. Tsirkin wrote:
>On Wed, Sep 28, 2022 at 05:11:35PM +0200, Stefano Garzarella wrote:
>> On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
>> > On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
>> > > On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
>> > > > When copying a large file over sftp over vsock, data size is usually 32kB,
>> > > > and kmalloc seems to fail to try to allocate 32 32kB regions.
>> > > >
>> > > > Call Trace:
>> > > > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
>> > > > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
>> > > > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
>> > > > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
>> > > > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
>> > > > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
>> > > > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
>> > > > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
>> > > > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
>> > > > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
>> > > > [<ffffffffb683ddce>] kthread+0xfd/0x105
>> > > > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
>> > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> > > > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
>> > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> > > >
>> > > > Work around by doing kvmalloc instead.
>> > > >
>> > > > Signed-off-by: Junichi Uekawa <[email protected]>
>> >
>> > My worry here is that this in more of a work around.
>> > It would be better to not allocate memory so aggressively:
>> > if we are so short on memory we should probably process
>> > packets one at a time. Is that very hard to implement?
>>
>> Currently the "virtio_vsock_pkt" is allocated in the "handle_kick" callback
>> of TX virtqueue. Then the packet is multiplexed on the right socket queue,
>> then the user space can de-queue it whenever they want.
>>
>> So maybe we can stop processing the virtqueue if we are short on memory, but
>> when can we restart the TX virtqueue processing?
>
>Assuming you added at least one buffer, the time to restart would be
>after that buffer has been used.

Yes, but we still might not have enough contiguous pages to allocate,
so I would use kvmalloc anyway.

I agree that we should do better, and I hope that moving to sk_buff will
allow us to manage allocation better. Maybe after we merge that part we
should spend some time solving these problems.

Thanks,
Stefano

2022-09-29 08:01:31

by Stefano Garzarella

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, Sep 29, 2022 at 03:19:14AM -0400, Michael S. Tsirkin wrote:
>On Thu, Sep 29, 2022 at 08:14:24AM +0900, Junichi Uekawa (上川純一) wrote:
>> 2022年9月29日(木) 0:11 Stefano Garzarella <[email protected]>:
>> >
>> > On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
>> > >On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
>> > >> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
>> > >> > When copying a large file over sftp over vsock, data size is usually 32kB,
>> > >> > and kmalloc seems to fail to try to allocate 32 32kB regions.
>> > >> >
>> > >> > Call Trace:
>> > >> > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
>> > >> > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
>> > >> > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
>> > >> > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
>> > >> > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
>> > >> > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
>> > >> > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
>> > >> > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
>> > >> > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
>> > >> > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
>> > >> > [<ffffffffb683ddce>] kthread+0xfd/0x105
>> > >> > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
>> > >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> > >> > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
>> > >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> > >> >
>> > >> > Work around by doing kvmalloc instead.
>> > >> >
>> > >> > Signed-off-by: Junichi Uekawa <[email protected]>
>> > >
>> > >My worry here is that this in more of a work around.
>> > >It would be better to not allocate memory so aggressively:
>> > >if we are so short on memory we should probably process
>> > >packets one at a time. Is that very hard to implement?
>> >
>> > Currently the "virtio_vsock_pkt" is allocated in the "handle_kick"
>> > callback of TX virtqueue. Then the packet is multiplexed on the right
>> > socket queue, then the user space can de-queue it whenever they want.
>> >
>> > So maybe we can stop processing the virtqueue if we are short on memory,
>> > but when can we restart the TX virtqueue processing?
>> >
>> > I think as long as the guest used only 4K buffers we had no problem, but
>> > now that it can create larger buffers the host may not be able to
>> > allocate it contiguously. Since there is no need to have them contiguous
>> > here, I think this patch is okay.
>> >
>> > However, if we switch to sk_buff (as Bobby is already doing), maybe we
>> > don't have this problem because I think there is some kind of
>> > pre-allocated pool.
>> >
>>
>> Thank you for the review! I was wondering if this is a reasonable workaround (as
>> we found that this patch makes a reliably crashing system into a
>> reliably surviving system.)
>>
>>
>> ... Sounds like it is a reasonable patch to use backported to older kernels?
>
>Hmm. Good point about stable. OK.

Right, so in this case I think it is better to add a Fixes tag. Since we
used kmalloc from the beginning, we can use the following:

Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")

>
>Acked-by: Michael S. Tsirkin <[email protected]>
>

@Michael are you queueing this, or should it go through net tree?

Thanks,
Stefano

2022-09-29 08:01:31

by Michael S. Tsirkin

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, Sep 29, 2022 at 09:40:10AM +0200, Stefano Garzarella wrote:
> On Wed, Sep 28, 2022 at 04:02:12PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2022 at 05:11:35PM +0200, Stefano Garzarella wrote:
> > > On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> > > > > On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > > > > > When copying a large file over sftp over vsock, data size is usually 32kB,
> > > > > > and kmalloc seems to fail to try to allocate 32 32kB regions.
> > > > > >
> > > > > > Call Trace:
> > > > > > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> > > > > > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> > > > > > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> > > > > > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> > > > > > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> > > > > > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> > > > > > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> > > > > > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> > > > > > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > > > > > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> > > > > > [<ffffffffb683ddce>] kthread+0xfd/0x105
> > > > > > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > > > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > > > > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> > > > > > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > > > >
> > > > > > Work around by doing kvmalloc instead.
> > > > > >
> > > > > > Signed-off-by: Junichi Uekawa <[email protected]>
> > > >
> > > > My worry here is that this in more of a work around.
> > > > It would be better to not allocate memory so aggressively:
> > > > if we are so short on memory we should probably process
> > > > packets one at a time. Is that very hard to implement?
> > >
> > > Currently the "virtio_vsock_pkt" is allocated in the "handle_kick" callback
> > > of TX virtqueue. Then the packet is multiplexed on the right socket queue,
> > > then the user space can de-queue it whenever they want.
> > >
> > > So maybe we can stop processing the virtqueue if we are short on memory, but
> > > when can we restart the TX virtqueue processing?
> >
> > Assuming you added at least one buffer, the time to restart would be
> > after that buffer has been used.
>
> Yes, but we still might not have as many continuous pages to allocate, so I
> would use kvmalloc the same.


you would do something like
if (is_vmalloc_addr())
stop adding buffers.
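
As a purely hypothetical sketch (the field below is an assumption, not
actual kernel code), that could look like:

    pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
    if (!pkt->buf) {
            kfree(pkt);
            return NULL;
    }

    /* hypothetical: a vmalloc fallback means memory is tight, so tell
     * the TX handler to stop draining the virtqueue for now and resume
     * once already-queued packets have been consumed */
    if (is_vmalloc_addr(pkt->buf))
            vq->memory_pressure = true;     /* hypothetical field */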



> I agree that we should do better, I hope that moving to sk_buff will allow
> us to better manage allocation. Maybe after we merge that part we should
> spend some time to solve these problems.
>
> Thanks,
> Stefano

2022-09-29 08:53:16

by Michael S. Tsirkin

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, Sep 29, 2022 at 09:46:06AM +0200, Stefano Garzarella wrote:
> On Thu, Sep 29, 2022 at 03:19:14AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 29, 2022 at 08:14:24AM +0900, Junichi Uekawa (上川純一) wrote:
> > > 2022年9月29日(木) 0:11 Stefano Garzarella <[email protected]>:
> > > >
> > > > On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> > > > >On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> > > > >> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > > > >> > When copying a large file over sftp over vsock, data size is usually 32kB,
> > > > >> > and kmalloc seems to fail to try to allocate 32 32kB regions.
> > > > >> >
> > > > >> > Call Trace:
> > > > >> > [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> > > > >> > [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> > > > >> > [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> > > > >> > [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> > > > >> > [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> > > > >> > [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> > > > >> > [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> > > > >> > [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> > > > >> > [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > > > >> > [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> > > > >> > [<ffffffffb683ddce>] kthread+0xfd/0x105
> > > > >> > [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > > > >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > > >> > [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> > > > >> > [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> > > > >> >
> > > > >> > Work around by doing kvmalloc instead.
> > > > >> >
> > > > >> > Signed-off-by: Junichi Uekawa <[email protected]>
> > > > >
> > > > >My worry here is that this in more of a work around.
> > > > >It would be better to not allocate memory so aggressively:
> > > > >if we are so short on memory we should probably process
> > > > >packets one at a time. Is that very hard to implement?
> > > >
> > > > Currently the "virtio_vsock_pkt" is allocated in the "handle_kick"
> > > > callback of TX virtqueue. Then the packet is multiplexed on the right
> > > > socket queue, then the user space can de-queue it whenever they want.
> > > >
> > > > So maybe we can stop processing the virtqueue if we are short on memory,
> > > > but when can we restart the TX virtqueue processing?
> > > >
> > > > I think as long as the guest used only 4K buffers we had no problem, but
> > > > now that it can create larger buffers the host may not be able to
> > > > allocate it contiguously. Since there is no need to have them contiguous
> > > > here, I think this patch is okay.
> > > >
> > > > However, if we switch to sk_buff (as Bobby is already doing), maybe we
> > > > don't have this problem because I think there is some kind of
> > > > pre-allocated pool.
> > > >
> > >
> > > Thank you for the review! I was wondering if this is a reasonable workaround (as
> > > we found that this patch makes a reliably crashing system into a
> > > reliably surviving system.)
> > >
> > >
> > > ... Sounds like it is a reasonable patch to use backported to older kernels?
> >
> > Hmm. Good point about stable. OK.
>
> Right, so in this case I think is better to add a Fixes tag. Since we used
> kmalloc from the beginning we can use the following:
>
> Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
>
> >
> > Acked-by: Michael S. Tsirkin <[email protected]>
> >
>
> @Michael are you queueing this, or should it go through net tree?
>
> Thanks,
> Stefano

net tree would be preferable, my pull for this release is kind of ready ... kuba?

--
MST

2022-09-29 16:12:36

by Jakub Kicinski

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, 29 Sep 2022 03:49:18 -0400 Michael S. Tsirkin wrote:
> net tree would be preferable, my pull for this release is kind of ready ... kuba?

If we're talking about 6.1 - we can take it, no problem.

2022-09-29 17:15:08

by Michael S. Tsirkin

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, Sep 29, 2022 at 09:07:31AM -0700, Jakub Kicinski wrote:
> On Thu, 29 Sep 2022 03:49:18 -0400 Michael S. Tsirkin wrote:
> > net tree would be preferable, my pull for this release is kind of ready ... kuba?
>
> If we're talking about 6.1 - we can take it, no problem.

I think they want it in 6.0 as it fixes a crash.

2022-09-29 18:15:56

by Jakub Kicinski

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Thu, 29 Sep 2022 12:25:14 -0400 Michael S. Tsirkin wrote:
> On Thu, Sep 29, 2022 at 09:07:31AM -0700, Jakub Kicinski wrote:
> > On Thu, 29 Sep 2022 03:49:18 -0400 Michael S. Tsirkin wrote:
> > > net tree would be preferable, my pull for this release is kind of ready ... kuba?
> >
> > If we're talking about 6.1 - we can take it, no problem.
>
> I think they want it in 6.0 as it fixes a crash.

I thought it's just an OOM leading to send failure. Junichi could you
repost with the tags collected and the header for that stack trace
included? The line that says it's just an OOM...

"someone tried to alloc >32kB because it worked on a freshly booted
system" kind of cases are a dime a dozen, no?..

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

Hi,

It doesn't really crash the kernel; it causes send failures (I think the
error might not be handled quite right, but I'm not sure whether that's on
the sftp end or the vsock part). The symptom is sftp-over-vsock sitting
there, presumably failing to send any data anymore at some point.



[ 5071.211895] vhost-5837: page allocation failure: order:4, mode:0x24040c0
[ 5071.211900] CPU: 0 PID: 5859 Comm: vhost-5837 Tainted: G C
4.4.302-21729-g34b1226044a7 #1
[ 5071.211902] Hardware name: Google Atlas/Atlas, BIOS
Google_Atlas.11827.125.0 11/09/2020
[ 5071.211904] aaaaaaaaaaaaaaaa 0b812745f761e452 ffff8803b9a23bc0
ffffffffb6a0df64
[ 5071.211909] 0000000000000002 0000003000000020 0000000000000282
0b812745f761e452
[ 5071.211913] 0000000000000001 ffff8803b9a23c58 ffffffffb68d6aed
0000000200000040
[ 5071.211917] Call Trace:
[ 5071.211922] [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
[ 5071.211925] [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
[ 5071.211928] [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
[ 5071.211931] [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
[ 5071.211933] [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
[ 5071.211936] [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
[ 5071.211938] [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
[ 5071.211941] [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
[ 5071.211945] [<ffffffffc0689ab7>]
vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
[ 5071.211949] [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
[ 5071.211951] [<ffffffffb683ddce>] kthread+0xfd/0x105
[ 5071.211954] [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
[ 5071.211956] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
[ 5071.211959] [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
[ 5071.211961] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
[ 5071.212605] Mem-Info:
[ 5071.212609] active_anon:1454995 inactive_anon:383258 isolated_anon:0
active_file:968668 inactive_file:970304 isolated_file:0
unevictable:8176 dirty:892826 writeback:213 unstable:0
slab_reclaimable:67232 slab_unreclaimable:26022
mapped:1549569 shmem:1536241 pagetables:19892 bounce:0
free:22525 free_pcp:22 free_cma:0
[ 5071.212614] DMA free:15840kB min:12kB low:12kB high:16kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB
managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB
shmem:0kB slab_reclaimable:0kB slab_unreclaimable:64kB
kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes
[ 5071.212616] lowmem_reserve[]: 0 1864 15903 15903
[ 5071.212623] DMA32 free:58268kB min:1888kB low:2360kB high:2832kB
active_anon:521800kB inactive_anon:282196kB active_file:454328kB
inactive_file:455308kB unevictable:3996kB isolated(anon):0kB
isolated(file):0kB present:1992216kB managed:1912404kB mlocked:0kB
dirty:421280kB writeback:180kB mapped:678312kB shmem:669844kB
slab_reclaimable:37700kB slab_unreclaimable:13744kB
kernel_stack:2800kB pagetables:9932kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:256 all_unreclaimable? no
[ 5071.212625] lowmem_reserve[]: 0 0 14038 14038
[ 5071.212631] Normal free:15992kB min:14240kB low:17800kB
high:21360kB active_anon:5298180kB inactive_anon:1250836kB
active_file:3420344kB inactive_file:3425908kB unevictable:28708kB
isolated(anon):0kB isolated(file):0kB present:14663680kB
managed:14375736kB mlocked:80kB dirty:3150024kB writeback:672kB
mapped:5519964kB shmem:5475120kB slab_reclaimable:231228kB
slab_unreclaimable:90280kB kernel_stack:20752kB pagetables:69636kB
unstable:0kB bounce:0kB free_pcp:88kB local_pcp:88kB free_cma:0kB
writeback_tmp:0kB pages_scanned:5188 all_unreclaimable? no
[ 5071.212633] lowmem_reserve[]: 0 0 0 0
[ 5071.212637] DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB
(U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) =
15840kB
[ 5071.212651] DMA32: 233*4kB (UME) 360*8kB (UME) 71*16kB (UM) 56*32kB
(UME) 30*64kB (UME) 45*128kB (UME) 38*256kB (UME) 19*512kB (UME)
6*1024kB (UM) 3*2048kB (UME) 3*4096kB (UM) = 58452kB
[ 5071.212668] Normal: 887*4kB (UME) 1194*8kB (UME) 174*16kB (U)
0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
15884kB
[ 5071.212680] 3475331 total pagecache pages
[ 5071.212681] 6 pages in swap cache
[ 5071.212682] Swap cache stats: add 45354, delete 45348, find 0/1601
[ 5071.212683] Free swap = 23774356kB
[ 5071.212684] Total swap = 23882876kB
[ 5071.212686] 4167973 pages RAM
[ 5071.212687] 0 pages HighMem/MovableOnly
[ 5071.212688] 91962 pages reserved


2022年9月30日(金) 3:07 Jakub Kicinski <[email protected]>:
>
> On Thu, 29 Sep 2022 12:25:14 -0400 Michael S. Tsirkin wrote:
> > On Thu, Sep 29, 2022 at 09:07:31AM -0700, Jakub Kicinski wrote:
> > > On Thu, 29 Sep 2022 03:49:18 -0400 Michael S. Tsirkin wrote:
> > > > net tree would be preferable, my pull for this release is kind of ready ... kuba?
> > >
> > > If we're talking about 6.1 - we can take it, no problem.
> >
> > I think they want it in 6.0 as it fixes a crash.
>
> I thought it's just an OOM leading to send failure. Junichi could you
> repost with the tags collected and the header for that stack trace
> included? The line that says it's just an OOM...
>

I think this is what you asked for, but I don't quite know what you
mean by the tags.

> "someone tried to alloc >32kB because it worked on a freshly booted
> system" kind of cases are a dime a dozen, no?..

2022-09-30 01:54:11

by Jakub Kicinski

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Fri, 30 Sep 2022 09:49:54 +0900 Junichi Uekawa (上川純一) wrote:
> > > I think they want it in 6.0 as it fixes a crash.
> >
> > I thought it's just an OOM leading to send failure. Junichi could you
> > repost with the tags collected and the header for that stack trace
> > included? The line that says it's just an OOM...
> >
>
> I think this is what you asked for but I don't quite know what you
> mean by the tags

We call the Reviewed-by:, Acked-by:, Fixes:, etc. trailers "tags".
Don't worry though, you shared the full stack trace now; I'll fix the
message up when applying.
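
For illustration, the collected trailers on the final patch would look
something like this (addresses elided as elsewhere in this thread):

    Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
    Signed-off-by: Junichi Uekawa <[email protected]>
    Reviewed-by: Stefano Garzarella <[email protected]>
    Acked-by: Michael S. Tsirkin <[email protected]>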

2022-09-30 01:58:28

by patchwork-bot+netdevbpf

Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <[email protected]>:

On Wed, 28 Sep 2022 15:45:38 +0900 you wrote:
> When copying a large file over sftp over vsock, data size is usually 32kB,
> and kmalloc seems to fail to try to allocate 32 32kB regions.
>
> Call Trace:
> [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
> [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
> [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
> [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
> [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
> [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
> [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
> [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
> [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
> [<ffffffffb683ddce>] kthread+0xfd/0x105
> [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
> [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
> [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>
> [...]

Here is the summary with links:
- vhost/vsock: Use kvmalloc/kvfree for larger packets.
https://git.kernel.org/netdev/net/c/0e3f72931fc4

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html