2016-12-08 17:40:24

by Steven Rostedt

Subject: [PATCH] tracing: Replace kmap with copy_from_user() in trace_marker writing

Instead of using get_user_pages_fast() and kmap_atomic() when writing
to the trace_marker file, just allocate enough space on the ring buffer
directly, and write into it via copy_from_user().

Writing into the trace_marker file used to allocate a temporary buffer
to perform the copy_from_user(), as we didn't want to write into the
ring buffer if the copy failed. But as a trace_marker write is supposed
to be extremely fast, and allocating memory causes other tracepoints to
trigger, Peter Zijlstra suggested using get_user_pages_fast() and
kmap_atomic() to keep the user space pages in memory and read them
directly. But Henrik Austad had issues with this because that caused
other tracepoints to trigger as well.

Instead, just allocate the space in the ring buffer and use
copy_from_user() directly. If it faults, return -EFAULT and write
"<faulted>" into the ring buffer.

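As a rough user-space illustration of the flow described above (not the
kernel code itself), the sketch below copies straight into an already
reserved slot and falls back to "<faulted>" when the copy fails.
mock_copy_from_user(), mark_write() and the "slot" buffer are hypothetical
names invented for this example; only the "<faulted>" string and the
-EFAULT return come from the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define FAULTED_STR  "<faulted>"
#define FAULTED_SIZE (sizeof(FAULTED_STR) - 1)	/* drop the trailing '\0' */

/*
 * Hypothetical stand-in for __copy_from_user_inatomic(): returns the
 * number of bytes NOT copied. A NULL source simulates a faulting page.
 */
static size_t mock_copy_from_user(char *dst, const char *src, size_t cnt)
{
	if (!src)
		return cnt;
	memcpy(dst, src, cnt);
	return 0;
}

/*
 * "slot" plays the role of the space already reserved in the ring buffer.
 * It must hold max(cnt, FAULTED_SIZE) + 2 bytes.
 */
static ssize_t mark_write(char *slot, const char *ubuf, size_t cnt)
{
	ssize_t written;

	assert(cnt > 0);	/* sketch only: empty writes are not handled */

	if (mock_copy_from_user(slot, ubuf, cnt)) {
		/* Copy faulted: record that in the event instead of the data. */
		memcpy(slot, FAULTED_STR, FAULTED_SIZE);
		cnt = FAULTED_SIZE;
		written = -EFAULT;
	} else {
		written = (ssize_t)cnt;
	}

	/* Terminate the entry, adding a newline if the writer left it out. */
	if (slot[cnt - 1] != '\n') {
		slot[cnt] = '\n';
		slot[cnt + 1] = '\0';
	} else {
		slot[cnt] = '\0';
	}

	return written;
}
```

In this sketch, mark_write(slot, NULL, 3) returns -EFAULT and leaves
"<faulted>\n" in the slot, while a successful copy returns the byte count.
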
Cc: Henrik Austad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Updates: d696b58ca2c3ca "tracing: Do not allocate buffer for trace_marker"
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
trace.c | 135 +++++++++++++++++-----------------------------------------------
1 file changed, 37 insertions(+), 98 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 60416bf7c591..e0f8d814cec6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5738,61 +5738,6 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
return 0;
}

-static inline int lock_user_pages(const char __user *ubuf, size_t cnt,
- struct page **pages, void **map_page,
- int *offset)
-{
- unsigned long addr = (unsigned long)ubuf;
- int nr_pages = 1;
- int ret;
- int i;
-
- /*
- * Userspace is injecting traces into the kernel trace buffer.
- * We want to be as non intrusive as possible.
- * To do so, we do not want to allocate any special buffers
- * or take any locks, but instead write the userspace data
- * straight into the ring buffer.
- *
- * First we need to pin the userspace buffer into memory,
- * which, most likely it is, because it just referenced it.
- * But there's no guarantee that it is. By using get_user_pages_fast()
- * and kmap_atomic/kunmap_atomic() we can get access to the
- * pages directly. We then write the data directly into the
- * ring buffer.
- */
-
- /* check if we cross pages */
- if ((addr & PAGE_MASK) != ((addr + cnt) & PAGE_MASK))
- nr_pages = 2;
-
- *offset = addr & (PAGE_SIZE - 1);
- addr &= PAGE_MASK;
-
- ret = get_user_pages_fast(addr, nr_pages, 0, pages);
- if (ret < nr_pages) {
- while (--ret >= 0)
- put_page(pages[ret]);
- return -EFAULT;
- }
-
- for (i = 0; i < nr_pages; i++)
- map_page[i] = kmap_atomic(pages[i]);
-
- return nr_pages;
-}
-
-static inline void unlock_user_pages(struct page **pages,
- void **map_page, int nr_pages)
-{
- int i;
-
- for (i = nr_pages - 1; i >= 0; i--) {
- kunmap_atomic(map_page[i]);
- put_page(pages[i]);
- }
-}
-
static ssize_t
tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
@@ -5803,13 +5748,15 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
struct print_entry *entry;
unsigned long irq_flags;
struct page *pages[2];
- void *map_page[2];
- int nr_pages = 1;
+ const char faulted[] = "<faulted>";
ssize_t written;
int offset;
int size;
int len;

+/* Used in tracing_mark_raw_write() as well */
+#define FAULTED_SIZE (sizeof(faulted) - 1) /* '\0' is already accounted for */
+
if (tracing_disabled)
return -EINVAL;

@@ -5821,30 +5768,31 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,

BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);

- nr_pages = lock_user_pages(ubuf, cnt, pages, map_page, &offset);
- if (nr_pages < 0)
- return nr_pages;
-
local_save_flags(irq_flags);
- size = sizeof(*entry) + cnt + 2; /* possible \n added */
+ size = sizeof(*entry) + cnt + 2; /* add '\0' and possible '\n' */
+
+ /* If less than "<faulted>", then make sure we can still add that */
+ if (cnt < FAULTED_SIZE)
+ size += FAULTED_SIZE - cnt;
+
buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
irq_flags, preempt_count());
- if (!event) {
+ if (unlikely(!event))
/* Ring buffer disabled, return as if not open for write */
- written = -EBADF;
- goto out_unlock;
- }
+ return -EBADF;

entry = ring_buffer_event_data(event);
entry->ip = _THIS_IP_;

- if (nr_pages == 2) {
- len = PAGE_SIZE - offset;
- memcpy(&entry->buf, map_page[0] + offset, len);
- memcpy(&entry->buf[len], map_page[1], cnt - len);
+ len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt);
+ if (len) {
+ memcpy(&entry->buf, faulted, FAULTED_SIZE);
+ cnt = FAULTED_SIZE;
+ written = -EFAULT;
} else
- memcpy(&entry->buf, map_page[0] + offset, cnt);
+ written = cnt;
+ len = cnt;

if (entry->buf[cnt - 1] != '\n') {
entry->buf[cnt] = '\n';
@@ -5854,12 +5802,8 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,

__buffer_unlock_commit(buffer, event);

- written = cnt;
-
- *fpos += written;
-
- out_unlock:
- unlock_user_pages(pages, map_page, nr_pages);
+ if (written > 0)
+ *fpos += written;

return written;
}
@@ -5875,15 +5819,16 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct raw_data_entry *entry;
+ const char faulted[] = "<faulted>";
unsigned long irq_flags;
struct page *pages[2];
- void *map_page[2];
- int nr_pages = 1;
ssize_t written;
int offset;
int size;
int len;

+#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+
if (tracing_disabled)
return -EINVAL;

@@ -5899,38 +5844,32 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,

BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);

- nr_pages = lock_user_pages(ubuf, cnt, pages, map_page, &offset);
- if (nr_pages < 0)
- return nr_pages;
-
local_save_flags(irq_flags);
size = sizeof(*entry) + cnt;
+ if (cnt < FAULT_SIZE_ID)
+ size += FAULT_SIZE_ID - cnt;
+
buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_RAW_DATA, size,
irq_flags, preempt_count());
- if (!event) {
+ if (!event)
/* Ring buffer disabled, return as if not open for write */
- written = -EBADF;
- goto out_unlock;
- }
+ return -EBADF;

entry = ring_buffer_event_data(event);

- if (nr_pages == 2) {
- len = PAGE_SIZE - offset;
- memcpy(&entry->id, map_page[0] + offset, len);
- memcpy(((char *)&entry->id) + len, map_page[1], cnt - len);
+ len = __copy_from_user_inatomic(&entry->id, ubuf, cnt);
+ if (len) {
+ entry->id = -1;
+ memcpy(&entry->buf, faulted, FAULTED_SIZE);
+ written = -EFAULT;
} else
- memcpy(&entry->id, map_page[0] + offset, cnt);
+ written = cnt;

__buffer_unlock_commit(buffer, event);

- written = cnt;
-
- *fpos += written;
-
- out_unlock:
- unlock_user_pages(pages, map_page, nr_pages);
+ if (written > 0)
+ *fpos += written;

return written;
}


2016-12-08 18:44:28

by Steven Rostedt

Subject: [PATCH v2] tracing: Replace kmap with copy_from_user() in trace_marker writing


Instead of using get_user_pages_fast() and kmap_atomic() when writing
to the trace_marker file, just allocate enough space on the ring buffer
directly, and write into it via copy_from_user().

Writing into the trace_marker file used to allocate a temporary buffer
to perform the copy_from_user(), as we didn't want to write into the
ring buffer if the copy failed. But as a trace_marker write is supposed
to be extremely fast, and allocating memory causes other tracepoints to
trigger, Peter Zijlstra suggested using get_user_pages_fast() and
kmap_atomic() to keep the user space pages in memory and read them
directly. But Henrik Austad had issues with this because that caused
other tracepoints to trigger as well.

Instead, just allocate the space in the ring buffer and use
copy_from_user() directly. If it faults, return -EFAULT and write
"<faulted>" into the ring buffer.

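Because the fallback string may be longer than what user space wrote, the
reservation has to be padded for short writes. The arithmetic from the diff
can be restated as the small sketch below. The helper names and the
"header" parameter (standing in for sizeof(*entry)) are assumptions made
for illustration; the constants mirror FAULTED_SIZE and FAULT_SIZE_ID from
the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FAULTED_SIZE  (sizeof("<faulted>") - 1)		/* no trailing '\0' */
#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))	/* raw marker: id + text */

static_assert(sizeof("<faulted>") - 1 == 9, "marker text is nine characters");

/* Text marker: payload plus '\0' and a possible '\n', never less than the marker. */
static size_t print_reserve_size(size_t header, size_t cnt)
{
	size_t size = header + cnt + 2;

	if (cnt < FAULTED_SIZE)
		size += FAULTED_SIZE - cnt;
	return size;
}

/* Raw marker: payload must be able to hold an int id followed by the marker. */
static size_t raw_reserve_size(size_t header, size_t cnt)
{
	size_t size = header + cnt;

	if (cnt < FAULT_SIZE_ID)
		size += FAULT_SIZE_ID - cnt;
	return size;
}
```

With a zero-byte header, a 3-byte text write reserves 11 bytes (9 for the
marker plus 2), while a 20-byte write reserves 22.
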
Link: http://lkml.kernel.org/r/[email protected]

Cc: Ingo Molnar <[email protected]>
Cc: Henrik Austad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Updates: d696b58ca2c3ca "tracing: Do not allocate buffer for trace_marker"
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---

Changes from v1: Removed unused variables.

kernel/trace/trace.c | 139 ++++++++++++++-------------------------------------
1 file changed, 37 insertions(+), 102 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 60416bf7c591..6f420d7b703b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5738,61 +5738,6 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
return 0;
}

-static inline int lock_user_pages(const char __user *ubuf, size_t cnt,
- struct page **pages, void **map_page,
- int *offset)
-{
- unsigned long addr = (unsigned long)ubuf;
- int nr_pages = 1;
- int ret;
- int i;
-
- /*
- * Userspace is injecting traces into the kernel trace buffer.
- * We want to be as non intrusive as possible.
- * To do so, we do not want to allocate any special buffers
- * or take any locks, but instead write the userspace data
- * straight into the ring buffer.
- *
- * First we need to pin the userspace buffer into memory,
- * which, most likely it is, because it just referenced it.
- * But there's no guarantee that it is. By using get_user_pages_fast()
- * and kmap_atomic/kunmap_atomic() we can get access to the
- * pages directly. We then write the data directly into the
- * ring buffer.
- */
-
- /* check if we cross pages */
- if ((addr & PAGE_MASK) != ((addr + cnt) & PAGE_MASK))
- nr_pages = 2;
-
- *offset = addr & (PAGE_SIZE - 1);
- addr &= PAGE_MASK;
-
- ret = get_user_pages_fast(addr, nr_pages, 0, pages);
- if (ret < nr_pages) {
- while (--ret >= 0)
- put_page(pages[ret]);
- return -EFAULT;
- }
-
- for (i = 0; i < nr_pages; i++)
- map_page[i] = kmap_atomic(pages[i]);
-
- return nr_pages;
-}
-
-static inline void unlock_user_pages(struct page **pages,
- void **map_page, int nr_pages)
-{
- int i;
-
- for (i = nr_pages - 1; i >= 0; i--) {
- kunmap_atomic(map_page[i]);
- put_page(pages[i]);
- }
-}
-
static ssize_t
tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
@@ -5802,14 +5747,14 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
struct ring_buffer *buffer;
struct print_entry *entry;
unsigned long irq_flags;
- struct page *pages[2];
- void *map_page[2];
- int nr_pages = 1;
+ const char faulted[] = "<faulted>";
ssize_t written;
- int offset;
int size;
int len;

+/* Used in tracing_mark_raw_write() as well */
+#define FAULTED_SIZE (sizeof(faulted) - 1) /* '\0' is already accounted for */
+
if (tracing_disabled)
return -EINVAL;

@@ -5821,30 +5766,31 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,

BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);

- nr_pages = lock_user_pages(ubuf, cnt, pages, map_page, &offset);
- if (nr_pages < 0)
- return nr_pages;
-
local_save_flags(irq_flags);
- size = sizeof(*entry) + cnt + 2; /* possible \n added */
+ size = sizeof(*entry) + cnt + 2; /* add '\0' and possible '\n' */
+
+ /* If less than "<faulted>", then make sure we can still add that */
+ if (cnt < FAULTED_SIZE)
+ size += FAULTED_SIZE - cnt;
+
buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
irq_flags, preempt_count());
- if (!event) {
+ if (unlikely(!event))
/* Ring buffer disabled, return as if not open for write */
- written = -EBADF;
- goto out_unlock;
- }
+ return -EBADF;

entry = ring_buffer_event_data(event);
entry->ip = _THIS_IP_;

- if (nr_pages == 2) {
- len = PAGE_SIZE - offset;
- memcpy(&entry->buf, map_page[0] + offset, len);
- memcpy(&entry->buf[len], map_page[1], cnt - len);
+ len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt);
+ if (len) {
+ memcpy(&entry->buf, faulted, FAULTED_SIZE);
+ cnt = FAULTED_SIZE;
+ written = -EFAULT;
} else
- memcpy(&entry->buf, map_page[0] + offset, cnt);
+ written = cnt;
+ len = cnt;

if (entry->buf[cnt - 1] != '\n') {
entry->buf[cnt] = '\n';
@@ -5854,12 +5800,8 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,

__buffer_unlock_commit(buffer, event);

- written = cnt;
-
- *fpos += written;
-
- out_unlock:
- unlock_user_pages(pages, map_page, nr_pages);
+ if (written > 0)
+ *fpos += written;

return written;
}
@@ -5875,15 +5817,14 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
struct ring_buffer_event *event;
struct ring_buffer *buffer;
struct raw_data_entry *entry;
+ const char faulted[] = "<faulted>";
unsigned long irq_flags;
- struct page *pages[2];
- void *map_page[2];
- int nr_pages = 1;
ssize_t written;
- int offset;
int size;
int len;

+#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+
if (tracing_disabled)
return -EINVAL;

@@ -5899,38 +5840,32 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,

BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);

- nr_pages = lock_user_pages(ubuf, cnt, pages, map_page, &offset);
- if (nr_pages < 0)
- return nr_pages;
-
local_save_flags(irq_flags);
size = sizeof(*entry) + cnt;
+ if (cnt < FAULT_SIZE_ID)
+ size += FAULT_SIZE_ID - cnt;
+
buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_RAW_DATA, size,
irq_flags, preempt_count());
- if (!event) {
+ if (!event)
/* Ring buffer disabled, return as if not open for write */
- written = -EBADF;
- goto out_unlock;
- }
+ return -EBADF;

entry = ring_buffer_event_data(event);

- if (nr_pages == 2) {
- len = PAGE_SIZE - offset;
- memcpy(&entry->id, map_page[0] + offset, len);
- memcpy(((char *)&entry->id) + len, map_page[1], cnt - len);
+ len = __copy_from_user_inatomic(&entry->id, ubuf, cnt);
+ if (len) {
+ entry->id = -1;
+ memcpy(&entry->buf, faulted, FAULTED_SIZE);
+ written = -EFAULT;
} else
- memcpy(&entry->id, map_page[0] + offset, cnt);
+ written = cnt;

__buffer_unlock_commit(buffer, event);

- written = cnt;
-
- *fpos += written;
-
- out_unlock:
- unlock_user_pages(pages, map_page, nr_pages);
+ if (written > 0)
+ *fpos += written;

return written;
}
--
2.1.0

2016-12-09 06:34:51

by Henrik Austad

Subject: [PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

Instead of using get_user_pages_fast() and kmap_atomic() when writing
to the trace_marker file, just allocate enough space on the ring buffer
directly, and write into it via copy_from_user().

Writing into the trace_marker file used to allocate a temporary buffer
to perform the copy_from_user(), as we didn't want to write into the
ring buffer if the copy failed. But as a trace_marker write is supposed
to be extremely fast, and allocating memory causes other tracepoints to
trigger, Peter Zijlstra suggested using get_user_pages_fast() and
kmap_atomic() to keep the user space pages in memory and read them
directly.

Instead, just allocate the space in the ring buffer and use
copy_from_user() directly. If it faults, return -EFAULT and write
"<faulted>" into the ring buffer.

On architectures without an arch-specific get_user_pages_fast(), the
old approach ends up in the generic get_user_pages_fast(), which grabs
mm->mmap_sem. Once that happens, writing to the trace_marker can
suddenly cause priority inversions.

This is a backport of Steven Rostedt's patch [1], applied to 3.10.x, so
the Signed-off-by chain is somewhat uncertain at this stage.

The patch compiles, boots and does not immediately explode on impact. By
definition [2] it must therefore be perfect.

1) https://www.spinics.net/lists/kernel/msg2400769.html
2) http://lkml.iu.edu/hypermail/linux/kernel/9804.1/0149.html

Cc: Ingo Molnar <[email protected]>
Cc: Henrik Austad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]

Suggested-by: Thomas Gleixner <[email protected]>
Used-to-be-signed-off-by: Steven Rostedt <[email protected]>
Backported-by: Henrik Austad <[email protected]>
Tested-by: Henrik Austad <[email protected]>
Signed-off-by: Henrik Austad <[email protected]>
---
kernel/trace/trace.c | 78 +++++++++++++++-------------------------------------
1 file changed, 22 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 18cdf91..94eb1ee 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4501,15 +4501,13 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
struct ring_buffer *buffer;
struct print_entry *entry;
unsigned long irq_flags;
- struct page *pages[2];
- void *map_page[2];
- int nr_pages = 1;
+ const char faulted[] = "<faulted>";
ssize_t written;
- int offset;
int size;
int len;
- int ret;
- int i;
+
+/* Used in tracing_mark_raw_write() as well */
+#define FAULTED_SIZE (sizeof(faulted) - 1) /* '\0' is already accounted for */

if (tracing_disabled)
return -EINVAL;
@@ -4520,60 +4518,34 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if (cnt > TRACE_BUF_SIZE)
cnt = TRACE_BUF_SIZE;

- /*
- * Userspace is injecting traces into the kernel trace buffer.
- * We want to be as non intrusive as possible.
- * To do so, we do not want to allocate any special buffers
- * or take any locks, but instead write the userspace data
- * straight into the ring buffer.
- *
- * First we need to pin the userspace buffer into memory,
- * which, most likely it is, because it just referenced it.
- * But there's no guarantee that it is. By using get_user_pages_fast()
- * and kmap_atomic/kunmap_atomic() we can get access to the
- * pages directly. We then write the data directly into the
- * ring buffer.
- */
BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);

- /* check if we cross pages */
- if ((addr & PAGE_MASK) != ((addr + cnt) & PAGE_MASK))
- nr_pages = 2;
-
- offset = addr & (PAGE_SIZE - 1);
- addr &= PAGE_MASK;
-
- ret = get_user_pages_fast(addr, nr_pages, 0, pages);
- if (ret < nr_pages) {
- while (--ret >= 0)
- put_page(pages[ret]);
- written = -EFAULT;
- goto out;
- }
+ local_save_flags(irq_flags);
+ size = sizeof(*entry) + cnt + 2; /* add '\0' and possible '\n' */

- for (i = 0; i < nr_pages; i++)
- map_page[i] = kmap_atomic(pages[i]);
+ /* If less than "<faulted>", then make sure we can still add that */
+ if (cnt < FAULTED_SIZE)
+ size += FAULTED_SIZE - cnt;

- local_save_flags(irq_flags);
- size = sizeof(*entry) + cnt + 2; /* possible \n added */
buffer = tr->trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
irq_flags, preempt_count());
- if (!event) {
- /* Ring buffer disabled, return as if not open for write */
- written = -EBADF;
- goto out_unlock;
- }
+
+ if (unlikely(!event))
+ /* Ring buffer disabled, return as if not open for write */
+ return -EBADF;

entry = ring_buffer_event_data(event);
entry->ip = _THIS_IP_;

- if (nr_pages == 2) {
- len = PAGE_SIZE - offset;
- memcpy(&entry->buf, map_page[0] + offset, len);
- memcpy(&entry->buf[len], map_page[1], cnt - len);
+ len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt);
+ if (len) {
+ memcpy(&entry->buf, faulted, FAULTED_SIZE);
+ cnt = FAULTED_SIZE;
+ written = -EFAULT;
} else
- memcpy(&entry->buf, map_page[0] + offset, cnt);
+ written = cnt;
+ len = cnt;

if (entry->buf[cnt - 1] != '\n') {
entry->buf[cnt] = '\n';
@@ -4583,16 +4555,10 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,

__buffer_unlock_commit(buffer, event);

- written = cnt;

- *fpos += written;
+ if (written > 0)
+ *fpos += written;

- out_unlock:
- for (i = 0; i < nr_pages; i++){
- kunmap_atomic(map_page[i]);
- put_page(pages[i]);
- }
- out:
return written;
}

--
2.7.4

2016-12-09 07:22:00

by Greg Kroah-Hartman

Subject: Re: [PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

On Fri, Dec 09, 2016 at 07:34:04AM +0100, Henrik Austad wrote:
> Instead of using get_user_pages_fast() and kmap_atomic() when writing
> to the trace_marker file, just allocate enough space on the ring buffer
> directly, and write into it via copy_from_user().
>
> Writing into the trace_marker file used to allocate a temporary buffer
> to perform the copy_from_user(), as we didn't want to write into the
> ring buffer if the copy failed. But as a trace_marker write is supposed
> to be extremely fast, and allocating memory causes other tracepoints to
> trigger, Peter Zijlstra suggested using get_user_pages_fast() and
> kmap_atomic() to keep the user space pages in memory and read them
> directly.
>
> Instead, just allocate the space in the ring buffer and use
> copy_from_user() directly. If it faults, return -EFAULT and write
> "<faulted>" into the ring buffer.
>
> On architectures without an arch-specific get_user_pages_fast(), the
> old approach ends up in the generic get_user_pages_fast(), which grabs
> mm->mmap_sem. Once that happens, writing to the trace_marker can
> suddenly cause priority inversions.
>
> This is a backport of Steven Rostedt's patch [1], applied to 3.10.x, so
> the Signed-off-by chain is somewhat uncertain at this stage.
>
> The patch compiles, boots and does not immediately explode on impact. By
> definition [2] it must therefore be perfect.
>
> 1) https://www.spinics.net/lists/kernel/msg2400769.html
> 2) http://lkml.iu.edu/hypermail/linux/kernel/9804.1/0149.html
>
> Cc: Ingo Molnar <[email protected]>
> Cc: Henrik Austad <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: [email protected]
>
> Suggested-by: Thomas Gleixner <[email protected]>
> Used-to-be-signed-off-by: Steven Rostedt <[email protected]>
> Backported-by: Henrik Austad <[email protected]>
> Tested-by: Henrik Austad <[email protected]>
> Signed-off-by: Henrik Austad <[email protected]>
> ---
> kernel/trace/trace.c | 78 +++++++++++++++-------------------------------------
> 1 file changed, 22 insertions(+), 56 deletions(-)

What is the git commit id of this patch in Linus's tree? And what
stable trees do you feel it should be applied to?

thanks,

greg k-h

2016-12-09 08:16:07

by Henrik Austad

Subject: Re: [PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

On Fri, Dec 09, 2016 at 08:22:05AM +0100, Greg KH wrote:
> On Fri, Dec 09, 2016 at 07:34:04AM +0100, Henrik Austad wrote:
> > Instead of using get_user_pages_fast() and kmap_atomic() when writing
> > to the trace_marker file, just allocate enough space on the ring buffer
> > directly, and write into it via copy_from_user().
> >
> > Writing into the trace_marker file used to allocate a temporary buffer
> > to perform the copy_from_user(), as we didn't want to write into the
> > ring buffer if the copy failed. But as a trace_marker write is supposed
> > to be extremely fast, and allocating memory causes other tracepoints to
> > trigger, Peter Zijlstra suggested using get_user_pages_fast() and
> > kmap_atomic() to keep the user space pages in memory and read them
> > directly.
> >
> > Instead, just allocate the space in the ring buffer and use
> > copy_from_user() directly. If it faults, return -EFAULT and write
> > "<faulted>" into the ring buffer.
> >
> > On architectures without an arch-specific get_user_pages_fast(), the
> > old approach ends up in the generic get_user_pages_fast(), which grabs
> > mm->mmap_sem. Once that happens, writing to the trace_marker can
> > suddenly cause priority inversions.
> >
> > This is a backport of Steven Rostedt's patch [1], applied to 3.10.x, so
> > the Signed-off-by chain is somewhat uncertain at this stage.
> >
> > The patch compiles, boots and does not immediately explode on impact. By
> > definition [2] it must therefore be perfect.
> >
> > 1) https://www.spinics.net/lists/kernel/msg2400769.html
> > 2) http://lkml.iu.edu/hypermail/linux/kernel/9804.1/0149.html
> >
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Henrik Austad <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: [email protected]
> >
> > Suggested-by: Thomas Gleixner <[email protected]>
> > Used-to-be-signed-off-by: Steven Rostedt <[email protected]>
> > Backported-by: Henrik Austad <[email protected]>
> > Tested-by: Henrik Austad <[email protected]>
> > Signed-off-by: Henrik Austad <[email protected]>
> > ---
> > kernel/trace/trace.c | 78 +++++++++++++++-------------------------------------
> > 1 file changed, 22 insertions(+), 56 deletions(-)
>
> What is the git commit id of this patch in Linus's tree? And what
> stable trees do you feel it should be applied to?

Ah, perhaps I jumped the gun here. I don't think Linus has picked this one
up yet, Steven sent out the patch yesterday.

Since then, I've backported it to 3.10 and run the first set of tests
overnight, and it looks good. So ideally this would find its way into
3.10(.104).

Do you want me to resubmit when Steven's patch is merged upstream?

-Henrik

2016-12-09 08:27:27

by Greg Kroah-Hartman

Subject: Re: [PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

On Fri, Dec 09, 2016 at 09:05:51AM +0100, Henrik Austad wrote:
> On Fri, Dec 09, 2016 at 08:22:05AM +0100, Greg KH wrote:
> > On Fri, Dec 09, 2016 at 07:34:04AM +0100, Henrik Austad wrote:
> > > Instead of using get_user_pages_fast() and kmap_atomic() when writing
> > > to the trace_marker file, just allocate enough space on the ring buffer
> > > directly, and write into it via copy_from_user().
> > >
> > > Writing into the trace_marker file used to allocate a temporary buffer
> > > to perform the copy_from_user(), as we didn't want to write into the
> > > ring buffer if the copy failed. But as a trace_marker write is supposed
> > > to be extremely fast, and allocating memory causes other tracepoints to
> > > trigger, Peter Zijlstra suggested using get_user_pages_fast() and
> > > kmap_atomic() to keep the user space pages in memory and read them
> > > directly.
> > >
> > > Instead, just allocate the space in the ring buffer and use
> > > copy_from_user() directly. If it faults, return -EFAULT and write
> > > "<faulted>" into the ring buffer.
> > >
> > > On architectures without an arch-specific get_user_pages_fast(), the
> > > old approach ends up in the generic get_user_pages_fast(), which grabs
> > > mm->mmap_sem. Once that happens, writing to the trace_marker can
> > > suddenly cause priority inversions.
> > >
> > > This is a backport of Steven Rostedt's patch [1], applied to 3.10.x, so
> > > the Signed-off-by chain is somewhat uncertain at this stage.
> > >
> > > The patch compiles, boots and does not immediately explode on impact. By
> > > definition [2] it must therefore be perfect.
> > >
> > > 1) https://www.spinics.net/lists/kernel/msg2400769.html
> > > 2) http://lkml.iu.edu/hypermail/linux/kernel/9804.1/0149.html
> > >
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Henrik Austad <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> > > Cc: Steven Rostedt <[email protected]>
> > > Cc: [email protected]
> > >
> > > Suggested-by: Thomas Gleixner <[email protected]>
> > > Used-to-be-signed-off-by: Steven Rostedt <[email protected]>
> > > Backported-by: Henrik Austad <[email protected]>
> > > Tested-by: Henrik Austad <[email protected]>
> > > Signed-off-by: Henrik Austad <[email protected]>
> > > ---
> > > kernel/trace/trace.c | 78 +++++++++++++++-------------------------------------
> > > 1 file changed, 22 insertions(+), 56 deletions(-)
> >
> > What is the git commit id of this patch in Linus's tree? And what
> > stable trees do you feel it should be applied to?
>
> Ah, perhaps I jumped the gun here. I don't think Linus has picked this one
> up yet, Steven sent out the patch yesterday.
>
> Since then, I've backported it to 3.10 and run the first set of tests
> overnight, and it looks good. So ideally this would find its way into
> 3.10(.104).
>
> Do you want me to resubmit when Steven's patch is merged upstream?

Yes please, we can't do anything until it is in Linus's tree, please see
Documentation/stable_kernel_rules.txt for how this all works.

thanks,

greg k-h

2016-12-09 13:56:30

by Steven Rostedt

Subject: Re: [PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

On Fri, 9 Dec 2016 09:05:51 +0100
Henrik Austad <[email protected]> wrote:


> Ah, perhaps I jumped the gun here. I don't think Linus has picked this one
> up yet, Steven sent out the patch yesterday.

Correct, and since it's rc8 and I've just finished testing this, I
think I'll just wait till the merge window to push it to Linus. I'll be
putting it into linux-next until then.

You're too quick, I didn't think you were going to push it right
away ;-)

>
> Since then, I've backported it to 3.10 and run the first set of tests
> overnight, and it looks good. So ideally this would find its way into
> 3.10(.104).
>
> Do you want me to resubmit when Steven's patch is merged upstream?

Yes, once the merge window opens (I'm guessing it may be this weekend?)
I'll submit it to Linus and after he pulls it, you can send your patch
to stable. I'll get notified when he does, and I can send you a poke to
let you know.

-- Steve