2021-10-09 11:49:38

by Xianting Tian

Subject: [PATCH v10 0/3] make hvc pass dma capable memory to its backend

Dear all,

This patch series makes the hvc framework pass DMA-capable memory to
put_chars() of the hvc backend (e.g. virtio-console), and reverts commit
c4baad5029 ("virtio-console: avoid DMA from stack").

V1
virtio-console: avoid DMA from vmalloc area
https://lkml.org/lkml/2021/7/27/494

For the v1 patch, Arnd Bergmann suggested fixing the issue at its
source: make hvc pass DMA-capable memory to put_chars().
This suggestion is included in v2.

V2
[PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/1/8
[PATCH 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/1/9

For the v2 patch, Arnd Bergmann suggested making the new buffer part of
the hvc_struct structure and fixing the compile issue.
This suggestion is included in v3.

V3
[PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/3/1347
[PATCH v3 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/3/1348

For the v3 patch, Jiri Slaby suggested making 'char c[N_OUTBUF]' part of
hvc_struct, aligning 'hp->outbuf', and using struct_size() to
calculate the size of hvc_struct. This suggestion is included in
v4.

V4
[PATCH v4 0/2] make hvc pass dma capable memory to its backend
https://lkml.org/lkml/2021/8/5/1350
[PATCH v4 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/5/1351
[PATCH v4 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/5/1352

For the v4 patch, Arnd Bergmann suggested introducing another
array (cons_outbuf[]) for the buffer pointers next to the cons_ops[]
and vtermnos[] arrays. This fix is included in v5.

V5
Arnd Bergmann suggested using L1_CACHE_BYTES as the DMA alignment;
using 'sizeof(long)' as the DMA alignment is wrong. Fixed in v6.

V6
It contained a coding error. Fixed in v7, which worked normally
according to the test results.

V7
Greg KH suggested involving other developers to test and review the code.
Jiri Slaby suggested using a lockless buffer and fixing the DMA
alignment in a separate patch.
Both points are addressed in v8.

V8
It contained a coding error in the switch to the new buffer. Fixed in v9.

V9
It did not make things much clearer; the newly added buffers need more
comments and a lock to protect them. Fixed in v10.

********TEST STEPS*********
1. Configure the guest with console=hvc0
2. Start the guest
3. Log in to the guest
Welcome to Buildroot
buildroot login: root
#
# cat /proc/cmdline
console=hvc0,115200
#

drivers/tty/hvc/hvc_console.c | 38 +++++++++++++++++++++--------------
drivers/tty/hvc/hvc_console.h | 24 ++++++++++++++++++++--
drivers/char/virtio_console.c | 12 ++----------
3 files changed


2021-10-09 11:50:18

by Xianting Tian

Subject: [PATCH v10 1/3] tty: hvc: use correct dma alignment size

Use L1_CACHE_BYTES as the DMA alignment size; using 'sizeof(long)'
is wrong.
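
As a rough userspace illustration of what this alignment change does, the
sketch below compares where a 16-byte buffer lands under each alignment.
CACHE_LINE is an assumed stand-in for the kernel's L1_CACHE_BYTES (64 is
only a common value), and both struct names and the helper are made up
for this example; none of this is code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumption: stand-in for L1_CACHE_BYTES */
#define N_OUTBUF 16

/* Hypothetical struct: buffer aligned only to sizeof(long), as before. */
struct buf_long_aligned {
	char tag;
	char buf[N_OUTBUF] __attribute__((__aligned__(sizeof(long))));
};

/* Hypothetical struct: buffer aligned to a full cache line, as after. */
struct buf_line_aligned {
	char tag;
	char buf[N_OUTBUF] __attribute__((__aligned__(CACHE_LINE)));
};

/* A static instance inherits the 64-byte alignment of its type. */
static struct buf_line_aligned demo;

/* Returns nonzero when p starts on a cache-line boundary. */
static int starts_on_line(const void *p)
{
	return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With sizeof(long) alignment the buffer can sit in the same cache line as
the neighbouring field; with cache-line alignment it starts a line of its
own, at the cost of padding in front of it.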

Signed-off-by: Xianting Tian <[email protected]>
Reviewed-by: Shile Zhang <[email protected]>
---
drivers/tty/hvc/hvc_console.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5bb8c4e44..5957ab728 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -49,7 +49,7 @@
#define N_OUTBUF 16
#define N_INBUF 16

-#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
+#define __ALIGNED__ __attribute__((__aligned__(L1_CACHE_BYTES)))

static struct tty_driver *hvc_driver;
static struct task_struct *hvc_task;
--
2.17.1

2021-10-09 11:50:27

by Xianting Tian

Subject: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()

As is well known, an hvc backend can register its operations with the
hvc framework. The operations include put_chars(), get_chars() and so on.

Some hvc backends may do DMA in their operations, e.g. put_chars() of
virtio-console. But the hvc framework may pass DMA-incapable memory
to put_chars() under a specific configuration, which is explained in
commit c4baad5029 ("virtio-console: avoid DMA from stack"):
1, c[] is on stack,
hvc_console_print():
char c[N_OUTBUF] __ALIGNED__;
cons_ops[index]->put_chars(vtermnos[index], c, i);
2, ch is on stack,
static void hvc_poll_put_char(,,char ch)
{
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
int n;

do {
n = hp->ops->put_chars(hp->vtermno, &ch, 1);
} while (n <= 0);
}

Commit c4baad5029 is just a fix to avoid DMA from the stack memory that
the hvc framework passes to virtio-console in the code above. But I think
the fix is aggressive: it unconditionally uses kmemdup() to allocate a new
buffer from the kmalloc area and copies the data, whether or not the memory
is already in the kmalloc area. More importantly, the problem is better
fixed in the hvc framework, by changing it to never pass stack memory to
the put_chars() function in the first place. Otherwise, we will face the
same issue if a new hvc backend using DMA is added in the future.

This patch adds 'char cons_outbuf[]' to 'struct hvc_struct', so
hp->cons_outbuf is no longer stack memory and can be used in case 1
above. It also adds 'char outchar' to 'struct hvc_struct' for use in
case 2 above. Each of these buffers gets its own lock, instead of
using the global hvc lock.

Introduce another array (cons_hvcs[]) for hvc pointers next to the
cons_ops[] and vtermnos[] arrays. With this array, we can easily find
an hvc's cons_outbuf and its lock.

With this patch, we can revert the fix in c4baad5029.
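
The idea described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct console_dev,
console_dev_alloc(), console_put_char() and the recording backend are
hypothetical names, calloc() stands in for kzalloc(), and the locking
the patch adds is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_OUTBUF 16

/* Hypothetical backend callback, modelled loosely on put_chars(). */
typedef int (*put_chars_fn)(const char *buf, int count);

/* Hypothetical per-device state; the buffers live in the heap object. */
struct console_dev {
	put_chars_fn put_chars;
	size_t outbuf_size;
	char cons_outbuf[N_OUTBUF];	/* replaces an on-stack c[] */
	char outbuf[];			/* flexible array, sized at allocation */
};

/* Allocate the struct and its trailing outbuf in one zeroed block. */
static struct console_dev *console_dev_alloc(put_chars_fn fn,
					     size_t outbuf_size)
{
	struct console_dev *d = calloc(1, sizeof(*d) + outbuf_size);

	if (!d)
		return NULL;
	d->put_chars = fn;
	d->outbuf_size = outbuf_size;
	return d;
}

/* Copy the character into device memory before handing it to the backend. */
static int console_put_char(struct console_dev *d, char ch)
{
	d->cons_outbuf[0] = ch;
	return d->put_chars(d->cons_outbuf, 1);
}

/* Test backend: records the pointer and first byte it was given. */
static const char *last_buf;
static char last_char;

static int record_put_chars(const char *buf, int count)
{
	last_buf = buf;
	last_char = buf[0];
	return count;
}
```

The backend only ever sees a pointer into the heap-allocated device
structure, never the address of a local variable or function argument.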

Signed-off-by: Xianting Tian <[email protected]>
Signed-off-by: Shile Zhang <[email protected]>
---
drivers/tty/hvc/hvc_console.c | 37 +++++++++++++++++++++--------------
drivers/tty/hvc/hvc_console.h | 24 +++++++++++++++++++++--
2 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5bb8c4e44..4d8f112f2 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -41,16 +41,6 @@
*/
#define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */

-/*
- * These sizes are most efficient for vio, because they are the
- * native transfer size. We could make them selectable in the
- * future to better deal with backends that want other buffer sizes.
- */
-#define N_OUTBUF 16
-#define N_INBUF 16
-
-#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
-
static struct tty_driver *hvc_driver;
static struct task_struct *hvc_task;

@@ -142,6 +132,7 @@ static int hvc_flush(struct hvc_struct *hp)
static const struct hv_ops *cons_ops[MAX_NR_HVC_CONSOLES];
static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
{[0 ... MAX_NR_HVC_CONSOLES - 1] = -1};
+static struct hvc_struct *cons_hvcs[MAX_NR_HVC_CONSOLES];

/*
* Console APIs, NOT TTY. These APIs are available immediately when
@@ -151,9 +142,11 @@ static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
static void hvc_console_print(struct console *co, const char *b,
unsigned count)
{
- char c[N_OUTBUF] __ALIGNED__;
+ char *c;
unsigned i = 0, n = 0;
int r, donecr = 0, index = co->index;
+ unsigned long flags;
+ struct hvc_struct *hp;

/* Console access attempt outside of acceptable console range. */
if (index >= MAX_NR_HVC_CONSOLES)
@@ -163,6 +156,13 @@ static void hvc_console_print(struct console *co, const char *b,
if (vtermnos[index] == -1)
return;

+ hp = cons_hvcs[index];
+ if (!hp)
+ return;
+
+ c = hp->cons_outbuf;
+
+ spin_lock_irqsave(&hp->cons_outbuf_lock, flags);
while (count > 0 || i > 0) {
if (count > 0 && i < sizeof(c)) {
if (b[n] == '\n' && !donecr) {
@@ -191,6 +191,7 @@ static void hvc_console_print(struct console *co, const char *b,
}
}
}
+ spin_unlock_irqrestore(&hp->cons_outbuf_lock, flags);
hvc_console_flush(cons_ops[index], vtermnos[index]);
}

@@ -878,9 +879,13 @@ static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
int n;
+ unsigned long flags;

do {
- n = hp->ops->put_chars(hp->vtermno, &ch, 1);
+ spin_lock_irqsave(&hp->outchar_lock, flags);
+ hp->outchar = ch;
+ n = hp->ops->put_chars(hp->vtermno, &hp->outchar, 1);
+ spin_unlock_irqrestore(&hp->outchar_lock, flags);
} while (n <= 0);
}
#endif
@@ -922,8 +927,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
return ERR_PTR(err);
}

- hp = kzalloc(ALIGN(sizeof(*hp), sizeof(long)) + outbuf_size,
- GFP_KERNEL);
+ hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL);
if (!hp)
return ERR_PTR(-ENOMEM);

@@ -931,13 +935,14 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
hp->data = data;
hp->ops = ops;
hp->outbuf_size = outbuf_size;
- hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];

tty_port_init(&hp->port);
hp->port.ops = &hvc_port_ops;

INIT_WORK(&hp->tty_resize, hvc_set_winsz);
spin_lock_init(&hp->lock);
+ spin_lock_init(&hp->outchar_lock);
+ spin_lock_init(&hp->cons_outbuf_lock);
mutex_lock(&hvc_structs_mutex);

/*
@@ -964,6 +969,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
if (i < MAX_NR_HVC_CONSOLES) {
cons_ops[i] = ops;
vtermnos[i] = vtermno;
+ cons_hvcs[i] = hp;
}

list_add_tail(&(hp->next), &hvc_structs);
@@ -988,6 +994,7 @@ int hvc_remove(struct hvc_struct *hp)
if (hp->index < MAX_NR_HVC_CONSOLES) {
vtermnos[hp->index] = -1;
cons_ops[hp->index] = NULL;
+ cons_hvcs[hp->index] = NULL;
}

/* Don't whack hp->irq because tty_hangup() will need to free the irq. */
diff --git a/drivers/tty/hvc/hvc_console.h b/drivers/tty/hvc/hvc_console.h
index 18d005814..98f0ced83 100644
--- a/drivers/tty/hvc/hvc_console.h
+++ b/drivers/tty/hvc/hvc_console.h
@@ -32,13 +32,21 @@
*/
#define HVC_ALLOC_TTY_ADAPTERS 8

+/*
+ * These sizes are most efficient for vio, because they are the
+ * native transfer size. We could make them selectable in the
+ * future to better deal with backends that want other buffer sizes.
+ */
+#define N_OUTBUF 16
+#define N_INBUF 16
+
+#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
+
struct hvc_struct {
struct tty_port port;
spinlock_t lock;
int index;
int do_wakeup;
- char *outbuf;
- int outbuf_size;
int n_outbuf;
uint32_t vtermno;
const struct hv_ops *ops;
@@ -48,6 +56,18 @@ struct hvc_struct {
struct work_struct tty_resize;
struct list_head next;
unsigned long flags;
+
+ /* the buf is used in hvc console api for putting chars */
+ char cons_outbuf[N_OUTBUF] __ALIGNED__;
+ spinlock_t cons_outbuf_lock;
+
+ /* the buf is for putting single char to tty */
+ char outchar;
+ spinlock_t outchar_lock;
+
+ /* the buf is for putting chars to tty */
+ int outbuf_size;
+ char outbuf[0] __ALIGNED__;
};

/* implemented by a low level driver */
--
2.17.1

2021-10-09 11:51:38

by Xianting Tian

Subject: [PATCH v10 3/3] virtio-console: remove unnecessary kmemdup()

This reverts commit c4baad5029 ("virtio-console: avoid DMA from stack").

The hvc framework will never pass stack memory to the put_chars() function,
so the call to kmemdup() is unnecessary and can be removed.

Signed-off-by: Xianting Tian <[email protected]>
Reviewed-by: Shile Zhang <[email protected]>
---
drivers/char/virtio_console.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7..4ed3ffb1d 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1117,8 +1117,6 @@ static int put_chars(u32 vtermno, const char *buf, int count)
{
struct port *port;
struct scatterlist sg[1];
- void *data;
- int ret;

if (unlikely(early_put_chars))
return early_put_chars(vtermno, buf, count);
@@ -1127,14 +1125,8 @@ static int put_chars(u32 vtermno, const char *buf, int count)
if (!port)
return -EPIPE;

- data = kmemdup(buf, count, GFP_ATOMIC);
- if (!data)
- return -ENOMEM;
-
- sg_init_one(sg, data, count);
- ret = __send_to_port(port, sg, 1, count, data, false);
- kfree(data);
- return ret;
+ sg_init_one(sg, buf, count);
+ return __send_to_port(port, sg, 1, count, (void *)buf, false);
}

/*
--
2.17.1

2021-10-09 11:57:43

by Greg Kroah-Hartman

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()

On Sat, Oct 09, 2021 at 07:48:28PM +0800, Xianting Tian wrote:
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 5bb8c4e44..4d8f112f2 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -41,16 +41,6 @@
> */
> #define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */
>
> -/*
> - * These sizes are most efficient for vio, because they are the
> - * native transfer size. We could make them selectable in the
> - * future to better deal with backends that want other buffer sizes.
> - */
> -#define N_OUTBUF 16
> -#define N_INBUF 16
> -
> -#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
> -

Are you sure this applies on top of patch 1/3 here?

> +/*
> + * These sizes are most efficient for vio, because they are the
> + * native transfer size. We could make them selectable in the
> + * future to better deal with backends that want other buffer sizes.
> + */
> +#define N_OUTBUF 16
> +#define N_INBUF 16
> +
> +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))

Again, are you sure this is correct?

thanks,

greg k-h

2021-10-09 11:59:42

by Greg Kroah-Hartman

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()

On Sat, Oct 09, 2021 at 07:48:28PM +0800, Xianting Tian wrote:
> --- a/drivers/tty/hvc/hvc_console.h
> +++ b/drivers/tty/hvc/hvc_console.h
> @@ -32,13 +32,21 @@
> */
> #define HVC_ALLOC_TTY_ADAPTERS 8
>
> +/*
> + * These sizes are most efficient for vio, because they are the
> + * native transfer size. We could make them selectable in the
> + * future to better deal with backends that want other buffer sizes.
> + */
> +#define N_OUTBUF 16
> +#define N_INBUF 16
> +
> +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))

Does this conflict with what is in hvcs.c?

> +
> struct hvc_struct {
> struct tty_port port;
> spinlock_t lock;
> int index;
> int do_wakeup;
> - char *outbuf;
> - int outbuf_size;
> int n_outbuf;
> uint32_t vtermno;
> const struct hv_ops *ops;
> @@ -48,6 +56,18 @@ struct hvc_struct {
> struct work_struct tty_resize;
> struct list_head next;
> unsigned long flags;
> +
> + /* the buf is used in hvc console api for putting chars */
> + char cons_outbuf[N_OUTBUF] __ALIGNED__;
> + spinlock_t cons_outbuf_lock;

Did you look at the placement using pahole as to how this structure now
looks?

> +
> + /* the buf is for putting single char to tty */
> + char outchar;
> + spinlock_t outchar_lock;

So you have a lock for a character and a different one for a longer
string? Why can they not just use the same lock? Why are 2 needed at
all, can't you just use the first character of cons_outbuf[] instead?
Surely you do not have 2 sends happening at the same time, right?

> +
> + /* the buf is for putting chars to tty */
> + int outbuf_size;
> + char outbuf[0] __ALIGNED__;

I thought we were not allowing [0] anymore in kernel structures?

thanks,

greg k-h

2021-10-09 15:49:09

by Xianting Tian

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()


On 2021/10/9 at 7:58 PM, Greg KH wrote:
> Did you look at the placement using pahole as to how this structure now
> looks?

Thanks for all your comments. For this one, do you mean I need to remove
the blank line? Thanks.

2021-10-10 05:36:28

by Greg Kroah-Hartman

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()

On Sat, Oct 09, 2021 at 11:45:23PM +0800, Xianting Tian wrote:
>
> On 2021/10/9 at 7:58 PM, Greg KH wrote:
> > Did you look at the placement using pahole as to how this structure now
> > looks?
>
> Thanks for all your comments. For this one, do you mean I need to remove the
> blank line? Thanks.
>

No, I mean to use the tool 'pahole' to see the structure layout that you
just created and determine if it really is the best way to add these new
fields, especially as you are adding huge buffers with odd alignment.

thanks,

greg k-h

2021-10-14 08:36:15

by Xianting Tian

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()


On 2021/10/10 at 1:33 PM, Greg KH wrote:
> On Sat, Oct 09, 2021 at 11:45:23PM +0800, Xianting Tian wrote:
>> On 2021/10/9 at 7:58 PM, Greg KH wrote:
>>> Did you look at the placement using pahole as to how this structure now
>>> looks?
>> Thanks for all your comments. For this one, do you mean I need to remove the
>> blank line? Thanks.
>>
> No, I mean to use the tool 'pahole' to see the structure layout that you
> just created and determine if it really is the best way to add these new
> fields, especially as you are adding huge buffers with odd alignment.

Thanks.

Based on your comments, I removed 'char outchar' and left the position
of 'int outbuf_size' unchanged, to keep outbuf_size and lock in the same
cache line. hvc_struct now changes as below:

 struct hvc_struct {
        struct tty_port port;
        spinlock_t lock;
        int index;
        int do_wakeup;
-       char *outbuf;
        int outbuf_size;
        int n_outbuf;
        uint32_t vtermno;
@@ -48,6 +57,16 @@ struct hvc_struct {
        struct work_struct tty_resize;
        struct list_head next;
        unsigned long flags;
+
+       /*
+        * the buf is used in hvc console api for putting chars,
+        * and also used in hvc_poll_put_char() for putting single char.
+        */
+       char cons_outbuf[N_OUTBUF] __ALIGNED__;
+       spinlock_t cons_outbuf_lock;
+
+       /* the buf is used for putting chars to tty */
+       char outbuf[] __ALIGNED__;
 };

The pahole output for the above hvc_struct is below. Is it OK for you? Do
we need to pack the holes? Thanks.

struct hvc_struct {
    struct tty_port            port;                 /*     0 352 */
    /* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
    spinlock_t                 lock;                 /*   352 4 */
    int                        index;                /*   356 4 */
    int                        do_wakeup;            /*   360 4 */
    int                        outbuf_size;          /*   364 4 */
    int                        n_outbuf;             /*   368 4 */
    uint32_t                   vtermno;              /*   372 4 */
    const struct hv_ops  *     ops;                  /*   376 8 */
    /* --- cacheline 6 boundary (384 bytes) --- */
    int                        irq_requested;        /*   384 4 */
    int                        data;                 /*   388 4 */
    struct winsize             ws;                   /*   392 8 */
    struct work_struct         tty_resize;           /*   400 32 */
    struct list_head           next;                 /*   432 16 */
    /* --- cacheline 7 boundary (448 bytes) --- */
    long unsigned int          flags;                /*   448 8 */

    /* XXX 56 bytes hole, try to pack */

    /* --- cacheline 8 boundary (512 bytes) --- */
    char                       cons_outbuf[16];      /*   512 16 */
    spinlock_t                 cons_outbuf_lock;     /*   528 4 */

    /* XXX 44 bytes hole, try to pack */

    /* --- cacheline 9 boundary (576 bytes) --- */
    char                       outbuf[0];            /*   576 0 */

    /* size: 576, cachelines: 9, members: 17 */
    /* sum members: 476, holes: 2, sum holes: 100 */
};
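
The two holes reported above can be reproduced without pahole by
computing them from offsetof(). The sketch below uses a cut-down,
hypothetical struct (tail_demo) that keeps only the last few members;
an int stands in for spinlock_t and CACHE_LINE is an assumed 64-byte
stand-in for L1_CACHE_BYTES, so it approximates the real layout rather
than dumping it.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: stand-in for L1_CACHE_BYTES */
#define N_OUTBUF 16

/* Hypothetical, cut-down tail of the struct; int stands in for spinlock_t. */
struct tail_demo {
	unsigned long flags;
	char cons_outbuf[N_OUTBUF] __attribute__((__aligned__(CACHE_LINE)));
	int cons_outbuf_lock;
	char outbuf[] __attribute__((__aligned__(CACHE_LINE)));
};

/* Padding between the end of flags and the start of cons_outbuf. */
static size_t hole_before_cons_outbuf(void)
{
	return offsetof(struct tail_demo, cons_outbuf) -
	       (offsetof(struct tail_demo, flags) + sizeof(unsigned long));
}

/* Padding between the end of the lock and the start of outbuf. */
static size_t hole_before_outbuf(void)
{
	return offsetof(struct tail_demo, outbuf) -
	       (offsetof(struct tail_demo, cons_outbuf_lock) + sizeof(int));
}
```

On a target where unsigned long is 8 bytes and int is 4 bytes, the two
helpers should return 56 and 44, the same hole sizes pahole prints
above. Both holes come from rounding the next member up to a cache-line
boundary.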


>
> thanks,
>
> greg k-h

2021-10-14 08:43:37

by Greg Kroah-Hartman

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()

On Thu, Oct 14, 2021 at 04:34:59PM +0800, Xianting Tian wrote:
>     /* --- cacheline 7 boundary (448 bytes) --- */
>     long unsigned int          flags;                /*   448 8 */
>
>     /* XXX 56 bytes hole, try to pack */
>
>     /* --- cacheline 8 boundary (512 bytes) --- */
>     char                       cons_outbuf[16];      /*   512 16 */
>     spinlock_t                 cons_outbuf_lock;     /*   528 4 */
>
>     /* XXX 44 bytes hole, try to pack */

Why not move the spinlock up above the cons_outbuf? Will that not be a
bit better?

Anyway, this is all fine, and much better than before, thanks.

greg k-h
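
One possible reading of the reordering suggested above, sketched in
userspace C: placing the lock before the cache-line-aligned buffer
leaves the total size unchanged but moves the lock out of the buffer's
cache line. This interpretation is an assumption, not something stated
in the thread. Both structs are hypothetical cut-down versions, an int
stands in for spinlock_t, and CACHE_LINE is an assumed 64-byte stand-in
for L1_CACHE_BYTES.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: stand-in for L1_CACHE_BYTES */
#define N_OUTBUF 16

/* Layout as posted: lock placed after the aligned buffer. */
struct lock_after_buf {
	unsigned long flags;
	char cons_outbuf[N_OUTBUF] __attribute__((__aligned__(CACHE_LINE)));
	int cons_outbuf_lock;
	char outbuf[] __attribute__((__aligned__(CACHE_LINE)));
};

/* Suggested layout: lock moved above the aligned buffer. */
struct lock_before_buf {
	unsigned long flags;
	int cons_outbuf_lock;
	char cons_outbuf[N_OUTBUF] __attribute__((__aligned__(CACHE_LINE)));
	char outbuf[] __attribute__((__aligned__(CACHE_LINE)));
};

/* Index of the cache line that contains a given struct offset. */
static size_t line_of(size_t offset)
{
	return offset / CACHE_LINE;
}
```

In the first layout the lock and cons_outbuf share a cache line; in the
second the lock sits next to flags and the buffer has its line to
itself, while sizeof() stays the same for both.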

2021-10-14 08:57:33

by Xianting Tian

Subject: Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()


On 2021/10/14 at 4:41 PM, Greg KH wrote:
> On Thu, Oct 14, 2021 at 04:34:59PM +0800, Xianting Tian wrote:
>>     /* --- cacheline 7 boundary (448 bytes) --- */
>>     long unsigned int          flags;                /*   448 8 */
>>
>>     /* XXX 56 bytes hole, try to pack */
>>
>>     /* --- cacheline 8 boundary (512 bytes) --- */
>>     char                       cons_outbuf[16];      /*   512 16 */
>>     spinlock_t                 cons_outbuf_lock;     /*   528 4 */
>>
>>     /* XXX 44 bytes hole, try to pack */
> Why not move the spinlock up above the cons_outbuf? Will that not be a
> bit better?
Thanks, I will move it above cons_outbuf and send the v11 patches soon.
>
> Anyway, this is all fine, and much better than before, thanks.
>
> greg k-h