2013-09-13 02:02:18

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

Hi Steven,

This is a v2 patch set for realizing a part of "Integrated trace" feature which
is a trace merging system for a virtualization environment. Currently, trace-cmd
does not have following features yet:

a) Server and client for a virtualization environment
b) Structured message platform between guests and host
c) Agent feature of a client
d) Merge feature of trace data of multiple guests and host in chronological
order

This patch set supports above a) and b) features.

<overall view>

+------------+ +------------+
Guest | a), c) | | a), c) | client/agent
^ +------------+ +------------+
| ^ ^ ^ ^
============|===|=================|===|===========
| v b)v v b)v
v +----------------------------------+
Host | a) | server
+----------------------------------+
||output || ||
\/ \/ \/
/--------+ /--------+ /--------+
| 010101 | | 101010 | | 100101 | binary data
| 010100 | | 010100 | | 110011 |
+--------+ +--------+ +--------+
\ /
\-----------------------------------/
|| d)
\/
/-----------------------------------+
| (guest1) 123456: sched_switch... | text data
| (guest2) 123458: kmem_free... |
| (host) 123500: kvm_exit (guest1)|
| (host) 123510: kvm_entry(guest1)|
| (guest1) 123550: sched_switch... |
+-----------------------------------+

a) Server and client for a virtualization environment
trace-cmd has listen mode for network, but using network will be a high cost
operation for inducing a lot of memory copying. From kernel-3.6, the
virtio-console driver supports splice_write and ftrace supports "steal" for
fops. So, guest clients of trace-cmd can send trace data without copying memory
by using splice(2). If guest clients use virtio-serial, the server also needs to
support virtio-serial I/F.

b) Structured message platform between guests and a host
Currently, a server(clients) sends unstructured character string to
clients(server), so clients(server) must parse the unstructured messages.
Since it is hard to add complex contents in the protocol, structured binary
message trace-msg is introduced as the communication protocol.

c) Agent feature of a client
Current trace-cmd client can operate only as "record" mode, so the client
will send trace data to the server immediately. However, when an user tries to
collect trace data of multiple guests on a host, the user must log in to
each guest. This is hard to use, I think. So, trace-cmd client had better
support agent mode which receives a message from the server.

d) Merge feature of trace data of multiple guests and a host in chronological
order
Current trace-cmd has a merge feature for multiple machines whose times are
synchronized by NTP. When we use the feature, we execute "trace-cmd record"
with --date option on each machine, and then we run "trace-cmd report" with -i
option for each file.
However, there are cases that times of those machines cannot be synchronized.
For example, although multiple users can run guests on virtualization
environments (e.g. multi-tenant cloud hosting), there are no guarantee that
they use the same NTP server. Moreover, even if the times are synchronized,
trace data cannot exactly be merged because the NTP-synchronized time
granularity may not be enough fine for sorting guest-host switching events.
So, I'm considering that trace data use x86-tsc as timestamp in order to merge
trace data. By using x86-tsc, we can merge trace data even if time of those
machines is not synchronized when CPU has the invariant TSC feature or the
constant TSC feature. And the precision will be enough for understanding
operations of guests and host. However, TSC values on a guest are not equal to
the values on the host because
TSC_guest = TSC_host + TSC_offset.
This series actually doesn't support TSC offset, but I'd like to add such
feature to fix host/guest clock difference in the other series. TSC offset
values can be gotten as write_tsc_offset trace event from kernel-3.11.
(see https://lkml.org/lkml/2013/6/12/72)

For a), this patch introduces "virt-server" and "record --virt" modes for
achieving low-overhead communication of trace data of guests. "virt-server" is a
server mode for collecting trace data of guests. On the other hand,
"record --virt" mode is a guest client for sending trace data of the guest.
Although these functions are similar to "listen" and "record -N" modes each,
these do not use network but use virtio-serial for low-overhead communication.

For b), this patch series introduce specific message protocol in order to handle
communication messages with 8 commands. When we extend any messages, using
structured message will be easier than using unstructured message.

<How to use>
1. Run virt-server on a host
# trace-cmd virt-server

2. Make guest domain directory
# mkdir -p /tmp/trace-cmd/virt/<domain>
# chmod 710 /tmp/trace-cmd/virt/<domain>
# chgrp qemu /tmp/trace-cmd/virt/<domain>

3. Make FIFO on the host
# mkfifo /tmp/trace-cmd/virt/<domain>/trace-path-cpu{0,1,...,X}.{in,out}

4. Set up of virtio-serial pipe of a guest on the host
Add the following tags to domain XML files.
# virsh edit <domain>
<channel type='unix'>
<source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
<target type='virtio' name='agent-ctl-path'/>
</channel>
<channel type='pipe'>
<source path='/tmp/trace-cmd/virt/<domain>/trace-path-cpu0'/>
<target type='virtio' name='trace-path-cpu0'/>
</channel>
... (cpu1, cpu2, ...)

5. Boot the guest
# virsh start <domain>

6. Execute "record --virt" on the guest
# trace-cmd record --virt -e sched*

<Result>
I measured CPU usage outputted by top command on a guest when client sends
trace data. Client means "record -N"(NW) or "record --virt"(virtio-serial).

NW virtio-serial(splice)
client(fedora19) ~2.9[%] ~1.7[%]

<Future work>
- Add an agent mode based on "record --virt"
- Add a merging feature of trace data of guests and host to "report"

Changes in V2:
[1/5] Add a comment in open_udp()
[2/5] Regacy protocol support in order to keep backward compatibility

Thank you,

---

Yoshihiro YUNOMAE (5):
[CLEANUP] trace-cmd: Split out binding a port and fork reader from open_udp()
trace-cmd: Apply the trace-msg protocol for communication between a server and clients
trace-cmd: Use poll(2) to wait for a message
trace-cmd: Add virt-server mode for a virtualization environment
trace-cmd: Add --virt option for record mode


Documentation/trace-cmd-record.1.txt | 11
Documentation/trace-cmd-virt-server.1.txt | 89 +++
Makefile | 2
trace-cmd.c | 3
trace-cmd.h | 14
trace-listen.c | 601 ++++++++++++++++----
trace-msg.c | 874 +++++++++++++++++++++++++++++
trace-msg.h | 31 +
trace-output.c | 4
trace-record.c | 146 ++++-
trace-recorder.c | 54 +-
trace-usage.c | 10
12 files changed, 1678 insertions(+), 161 deletions(-)
create mode 100644 Documentation/trace-cmd-virt-server.1.txt
create mode 100644 trace-msg.c
create mode 100644 trace-msg.h

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]


2013-09-13 02:02:25

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: [PATCH V2 1/5] [CLEANUP] trace-cmd: Split out binding a port and fork reader from open_udp()

Split out binding a port and fork reader from open_udp() for avoiding duplicate
codes between listen mode and virt-server mode.

Changes in V2: Add a comment in open_udp()

Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
---
trace-listen.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/trace-listen.c b/trace-listen.c
index 8b8f02c..bf187c9 100644
--- a/trace-listen.c
+++ b/trace-listen.c
@@ -228,13 +228,12 @@ static void process_udp_child(int sfd, const char *host, const char *port,
#define START_PORT_SEARCH 1500
#define MAX_PORT_SEARCH 6000

-static int open_udp(const char *node, const char *port, int *pid,
- int cpu, int pagesize, int start_port)
+static int udp_bind_a_port(int start_port, int *sfd)
{
struct addrinfo hints;
struct addrinfo *result, *rp;
- int sfd, s;
char buf[BUFSIZ];
+ int s;
int num_port = start_port;

again:
@@ -250,15 +249,15 @@ static int open_udp(const char *node, const char *port, int *pid,
pdie("getaddrinfo: error opening udp socket");

for (rp = result; rp != NULL; rp = rp->ai_next) {
- sfd = socket(rp->ai_family, rp->ai_socktype,
- rp->ai_protocol);
- if (sfd < 0)
+ *sfd = socket(rp->ai_family, rp->ai_socktype,
+ rp->ai_protocol);
+ if (*sfd < 0)
continue;

- if (bind(sfd, rp->ai_addr, rp->ai_addrlen) == 0)
+ if (bind(*sfd, rp->ai_addr, rp->ai_addrlen) == 0)
break;

- close(sfd);
+ close(*sfd);
}

if (rp == NULL) {
@@ -270,6 +269,12 @@ static int open_udp(const char *node, const char *port, int *pid,

freeaddrinfo(result);

+ return num_port;
+}
+
+static void fork_udp_reader(int sfd, const char *node, const char *port,
+ int *pid, int cpu, int pagesize)
+{
*pid = fork();

if (*pid < 0)
@@ -279,6 +284,23 @@ static int open_udp(const char *node, const char *port, int *pid,
process_udp_child(sfd, node, port, cpu, pagesize);

close(sfd);
+}
+
+static int open_udp(const char *node, const char *port, int *pid,
+ int cpu, int pagesize, int start_port)
+{
+ int sfd;
+ int num_port;
+
+ /*
+ * udp_bind_a_port() currently does not return an error, but if that
+ * changes in the future, we have a check for it now.
+ */
+ num_port = udp_bind_a_port(start_port, &sfd);
+ if (num_port < 0)
+ return num_port;
+
+ fork_udp_reader(sfd, node, port, pid, cpu, pagesize);

return num_port;
}

2013-09-13 02:02:31

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: [PATCH V2 4/5] trace-cmd: Add virt-server mode for a virtualization environment

Add the virt-server mode for a virtualization environment based on the listen
mode for networking. This mode works like client/server mode over TCP/UDP,
but it uses virtio-serial channel instead of IP network. Using networking for
collecting trace data of guests is generally high overhead caused by processing
of the network stack.

We use virtio-serial for collecting trace data of guests. virtio-serial is a
simple communication path between the guest and the host. Moreover,
since virtio-serial and ftrace can use splice(2), memory copying is not
occurred on the guests. Therefore, total overhead for collecting trace data
of the guests will be reduced. The implementation of clients will be shown
in another patch.

virt-server uses two kinds of virtio-serial I/Fs:
(1) agent-ctl-path(UNIX domain socket)
=> control path of an agent trace-cmd each guest
(2) trace-path-cpuX(named pipe)
=> trace data path each vcpu

Those I/Fs must be defined as below paths:
(1) /tmp/trace-cmd/virt/agent-ctl-path
(2) /tmp/trace-cmd/virt/<guest domain>/trace-path-cpuX

If we run virt-server, agent-ctl-path I/F is automatically created because
virt-server operates as a server mode of UNIX domain socket. However,
trace-path-cpuX is not automatically created because we need to separate
trace data for each guests.

When the client uses virtio-serial, the client must notify the server of the
connection. This is because a virtio-serial I/F on the guest is a just character
device. In other words, the server cannot understand whether the client exists
or not even if the client opens the I/F. So, the server using virtio-serial
waits for the connection message MSG_TCONNECT from the client.
The server and the client operate as follows:

<server> <client>
wait for MSG_TCONNECT
open virtio-serial I/F
send MSG_TCONNECT
receive MSG_TCONNECT <----+
send MSG_RCONNECT
+---------------> receive MSG_RCONNECT
check "tracecmd-V2"
send cpus,pagesize,option(MSG_TINIT)
receive MSG_TINIT <-------+
print "cpus=XXX"
print "pagesize=XXX"
understand option
send port_array
+--MSG_RINIT-> receive MSG_RINIT
understand port_array
send meta data(MSG_SENDMETA)
receive MSG_SENDMETA <----+
record meta data
(snip)
send a message to finish sending meta data
| (MSG_FINMETA)
receive MSG_FINMETA <-----+
read block
--- start sending trace data on child processes ---

--- When client finishes sending trace data ---
send MSG_CLOSE
receive MSG_CLOSE <-------+
close(socket fd) close(socket fd)

<How to set up>
1. Run virt-server on a host before booting guests
# trace-cmd virt-server

2. Make guest domain directory
# mkdir -p /tmp/trace-cmd/virt/<domain>
# chmod 710 /tmp/trace-cmd/virt/<domain>
# chgrp qemu /tmp/trace-cmd/virt/<domain>

3. Make FIFO on the host
# mkfifo /tmp/trace-cmd/virt/<domain>/trace-path-cpu{0,1,...,X}.{in,out}

4. Set up of virtio-serial pipe of a guest on the host
Add the following tags to domain XML files.
# virsh edit <domain>
<channel type='unix'>
<source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
<target type='virtio' name='agent-ctl-path'/>
</channel>
<channel type='pipe'>
<source path='/tmp/trace-cmd/virt/<domain>/trace-path-cpu0'/>
<target type='virtio' name='trace-path-cpu0'/>
</channel>
... (cpu1, cpu2, ...)

5. Boot the guest
# virsh start <domain>

6. Check I/F of virtio-serial on the guest
# ls /dev/virtio-ports
...
agent-ctl-path
...
trace-path-cpu0
...

Next, the user will run trace-cmd with record --virt options or other options
for virtualization on the guest.

This patch adds only minimum features of virt-server as follows:
<Features>
- virt-server subcommand
- Create I/F directory(/tmp/trace-cmd/virt/)
- Use named pipe I/Fs of virtio-serial for trace data paths
- Use UNIX domain socket for connecting clients on guests
- Use splice(2) for collecting trace data of guests

<Restrictions>
- Use libvirt when we boot guests

Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
---
Documentation/trace-cmd-virt-server.1.txt | 89 ++++++
trace-cmd.c | 3
trace-cmd.h | 2
trace-listen.c | 434 ++++++++++++++++++++++++-----
trace-msg.c | 105 +++++++
trace-recorder.c | 54 +++-
trace-usage.c | 10 +
7 files changed, 602 insertions(+), 95 deletions(-)
create mode 100644 Documentation/trace-cmd-virt-server.1.txt

diff --git a/Documentation/trace-cmd-virt-server.1.txt b/Documentation/trace-cmd-virt-server.1.txt
new file mode 100644
index 0000000..4168a04
--- /dev/null
+++ b/Documentation/trace-cmd-virt-server.1.txt
@@ -0,0 +1,89 @@
+TRACE-CMD-VIRT-SERVER(1)
+========================
+
+NAME
+----
+trace-cmd-virt-server - listen for incoming connection to record tracing of
+ guests' clients
+
+SYNOPSIS
+--------
+*trace-cmd virt-server ['OPTIONS']
+
+DESCRIPTION
+-----------
+The trace-cmd(1) virt-server sets up UNIX domain socket I/F for communicating
+with guests' clients that run 'trace-cmd-record(1)' with the *--virt* option.
+When a connection is made, and the guest's client sends data, it will create a
+file called 'trace.DOMAIN.dat'. Where DOMAIN is the name of the guest named
+by libvirt.
+
+OPTIONS
+-------
+*-D*::
+ This options causes trace-cmd listen to go into a daemon mode and run in
+ the background.
+
+*-d* 'dir'::
+ This option specifies a directory to write the data files into.
+
+*-o* 'filename'::
+ This option overrides the default 'trace' in the 'trace.DOMAIN.dat' that
+ is created when guest's client connects.
+
+*-l* 'filename'::
+ This option writes the output messages to a log file instead of standard output.
+
+SET UP
+------
+Here, an example is written as follows:
+
+1. Run virt-server on a host
+ # trace-cmd virt-server
+
+2. Make guest domain directory
+ # mkdir -p /tmp/trace-cmd/virt/<DOMAIN>
+ # chmod 710 /tmp/trace-cmd/virt/<DOMAIN>
+ # chgrp qemu /tmp/trace-cmd/virt/<DOMAIN>
+
+3. Make FIFO on the host
+ # mkfifo /tmp/trace-cmd/virt/<DOMAIN>/trace-path-cpu{0,1,...,X}.{in,out}
+
+4. Set up of virtio-serial pipe of a guest on the host
+ Add the following tags to domain XML files.
+ # virsh edit <guest domain>
+ <channel type='unix'>
+ <source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
+ <target type='virtio' name='agent-ctl-path'/>
+ </channel>
+ <channel type='pipe'>
+ <source path='/tmp/trace-cmd/virt/<DOMAIN>/trace-path-cpu0'/>
+ <target type='virtio' name='trace-path-cpu0'/>
+ </channel>
+ ... (cpu1, cpu2, ...)
+
+5. Boot the guest
+ # virsh start <DOMAIN>
+
+6. Run the guest's client(see trace-cmd-record(1) with the *--virt* option)
+ # trace-cmd record -e sched* --virt
+
+SEE ALSO
+--------
+trace-cmd(1), trace-cmd-record(1), trace-cmd-report(1), trace-cmd-start(1),
+trace-cmd-stop(1), trace-cmd-extract(1), trace-cmd-reset(1),
+trace-cmd-split(1), trace-cmd-list(1)
+
+AUTHOR
+------
+Written by Yoshihiro YUNOMAE, <[email protected]>
+
+RESOURCES
+---------
+git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
+
+COPYING
+-------
+Copyright \(C) 2013 Hitachi, Ltd. Free use of this software is granted under
+the terms of the GNU Public License (GPL).
+
diff --git a/trace-cmd.c b/trace-cmd.c
index e6f5918..45a5bb4 100644
--- a/trace-cmd.c
+++ b/trace-cmd.c
@@ -219,7 +219,8 @@ int main (int argc, char **argv)
} else if (strcmp(argv[1], "mem") == 0) {
trace_mem(argc, argv);
exit(0);
- } else if (strcmp(argv[1], "listen") == 0) {
+ } else if (strcmp(argv[1], "listen") == 0 ||
+ strcmp(argv[1], "virt-server") == 0) {
trace_listen(argc, argv);
exit(0);
} else if (strcmp(argv[1], "split") == 0) {
diff --git a/trace-cmd.h b/trace-cmd.h
index a2958ac..ce3df2c 100644
--- a/trace-cmd.h
+++ b/trace-cmd.h
@@ -242,6 +242,7 @@ struct tracecmd_recorder *tracecmd_create_recorder_maxkb(const char *file, int c
struct tracecmd_recorder *tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *buffer);
struct tracecmd_recorder *tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer);
struct tracecmd_recorder *tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags, const char *buffer, int maxkb);
+struct tracecmd_recorder *tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd);

int tracecmd_start_recording(struct tracecmd_recorder *recorder, unsigned long sleep);
void tracecmd_stop_recording(struct tracecmd_recorder *recorder);
@@ -255,6 +256,7 @@ int tracecmd_msg_finish_sending_metadata(int fd);
void tracecmd_msg_send_close_msg();

/* for server */
+int tracecmd_msg_set_connection(int fd, const char *domain);
int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize);
int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports);
int tracecmd_msg_collect_metadata(int ifd, int ofd);
diff --git a/trace-listen.c b/trace-listen.c
index 280b1af..26d346c 100644
--- a/trace-listen.c
+++ b/trace-listen.c
@@ -23,9 +23,13 @@
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
+#include <grp.h>
+#include <sys/stat.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
+#include <sys/epoll.h>
+#include <sys/un.h>
#include <netdb.h>
#include <unistd.h>
#include <fcntl.h>
@@ -50,15 +54,23 @@ static int backlog = 5;

static int proto_ver;

-#define TEMP_FILE_STR "%s.%s:%s.cpu%d", output_file, host, port, cpu
-static char *get_temp_file(const char *host, const char *port, int cpu)
+#define TEMP_FILE_STR_NW "%s.%s:%s.cpu%d", output_file, host, port, cpu
+#define TEMP_FILE_STR_VIRT "%s.%s:%d.cpu%d", output_file, domain, virtpid, cpu
+static char *get_temp_file(const char *host, const char *port,
+ const char *domain, int virtpid, int cpu)
{
char *file = NULL;
int size;

- size = snprintf(file, 0, TEMP_FILE_STR);
- file = malloc_or_die(size + 1);
- sprintf(file, TEMP_FILE_STR);
+ if (host) {
+ size = snprintf(file, 0, TEMP_FILE_STR_NW);
+ file = malloc_or_die(size + 1);
+ sprintf(file, TEMP_FILE_STR_NW);
+ } else {
+ size = snprintf(file, 0, TEMP_FILE_STR_VIRT);
+ file = malloc_or_die(size + 1);
+ sprintf(file, TEMP_FILE_STR_VIRT);
+ }

return file;
}
@@ -81,11 +93,15 @@ static void signal_setup(int sig, sighandler_t handle)
sigaction(sig, &action, NULL);
}

-static void delete_temp_file(const char *host, const char *port, int cpu)
+static void delete_temp_file(const char *host, const char *port,
+ const char *domain, int virtpid, int cpu)
{
char file[MAX_PATH];

- snprintf(file, MAX_PATH, TEMP_FILE_STR);
+ if (host)
+ snprintf(file, MAX_PATH, TEMP_FILE_STR_NW);
+ else
+ snprintf(file, MAX_PATH, TEMP_FILE_STR_VIRT);
unlink(file);
}

@@ -113,8 +129,12 @@ static int process_option(char *option)
return 0;
}

+static struct tracecmd_recorder *recorder;
+
static void finish(int sig)
{
+ if (recorder)
+ tracecmd_stop_recording(recorder);
done = true;
}

@@ -184,7 +204,7 @@ static void process_udp_child(int sfd, const char *host, const char *port,

signal_setup(SIGUSR1, finish);

- tempfile = get_temp_file(host, port, cpu);
+ tempfile = get_temp_file(host, port, NULL, 0, cpu);
fd = open(tempfile, O_WRONLY | O_TRUNC | O_CREAT, 0644);
if (fd < 0)
pdie("creating %s", tempfile);
@@ -225,6 +245,28 @@ static void process_udp_child(int sfd, const char *host, const char *port,
exit(0);
}

+#define SLEEP_DEFAULT 1000
+
+static void process_virt_child(int fd, int cpu, int pagesize,
+ const char *domain, int virtpid)
+{
+ char *tempfile;
+
+ signal_setup(SIGUSR1, finish);
+ tempfile = get_temp_file(NULL, NULL, domain, virtpid, cpu);
+
+ recorder = tracecmd_create_recorder_virt(tempfile, cpu, fd);
+
+ do {
+ if (tracecmd_start_recording(recorder, SLEEP_DEFAULT) < 0)
+ break;
+ } while (!done);
+
+ tracecmd_free_recorder(recorder);
+ put_temp_file(tempfile);
+ exit(0);
+}
+
#define START_PORT_SEARCH 1500
#define MAX_PORT_SEARCH 6000

@@ -272,20 +314,37 @@ static int udp_bind_a_port(int start_port, int *sfd)
return num_port;
}

-static void fork_udp_reader(int sfd, const char *node, const char *port,
- int *pid, int cpu, int pagesize)
+static void fork_reader(int sfd, const char *node, const char *port,
+ int *pid, int cpu, int pagesize, const char *domain,
+ int virtpid)
{
*pid = fork();

if (*pid < 0)
- pdie("creating udp reader");
+ pdie("creating reader");

- if (!*pid)
- process_udp_child(sfd, node, port, cpu, pagesize);
+ if (!*pid) {
+ if (node)
+ process_udp_child(sfd, node, port, cpu, pagesize);
+ else
+ process_virt_child(sfd, cpu, pagesize, domain, virtpid);
+ }

close(sfd);
}

+static void fork_udp_reader(int sfd, const char *node, const char *port,
+ int *pid, int cpu, int pagesize)
+{
+ fork_reader(sfd, node, port, pid, cpu, pagesize, NULL, 0);
+}
+
+static void fork_virt_reader(int sfd, int *pid, int cpu, int pagesize,
+ const char *domain, int virtpid)
+{
+ fork_reader(sfd, NULL, NULL, pid, cpu, pagesize, domain, virtpid);
+}
+
static int open_udp(const char *node, const char *port, int *pid,
int cpu, int pagesize, int start_port)
{
@@ -305,7 +364,30 @@ static int open_udp(const char *node, const char *port, int *pid,
return num_port;
}

-static int communicate_with_client(int fd, int *cpus, int *pagesize)
+#define TRACE_CMD_DIR "/tmp/trace-cmd/"
+#define VIRT_DIR TRACE_CMD_DIR "virt/"
+#define VIRT_TRACE_CTL_SOCK VIRT_DIR "agent-ctl-path"
+#define TRACE_PATH_DOMAIN_CPU VIRT_DIR "%s/trace-path-cpu%d.out"
+
+static int open_virtio_serial_pipe(int *pid, int cpu, int pagesize,
+ const char *domain, int virtpid)
+{
+ char buf[PATH_MAX];
+ int fd;
+
+ snprintf(buf, PATH_MAX, TRACE_PATH_DOMAIN_CPU, domain, cpu);
+ fd = open(buf, O_RDONLY | O_NONBLOCK);
+ if (fd < 0) {
+ warning("open %s", buf);
+ return fd;
+ }
+
+ fork_virt_reader(fd, pid, cpu, pagesize, domain, virtpid);
+
+ return fd;
+}
+
+static int communicate_with_client_nw(int fd, int *cpus, int *pagesize)
{
char buf[BUFSIZ];
char *option;
@@ -404,12 +486,30 @@ static int communicate_with_client(int fd, int *cpus, int *pagesize)
return 0;
}

-static int create_client_file(const char *node, const char *port)
+static int communicate_with_client_virt(int fd, const char *domain, int *cpus, int *pagesize)
+{
+ proto_ver = V2_PROTOCOL;
+
+ if (tracecmd_msg_set_connection(fd, domain) < 0)
+ return -1;
+
+ /* read the CPU count, the page size, and options */
+ if (tracecmd_msg_initial_setting(fd, cpus, pagesize) < 0)
+ return -1;
+
+ return 0;
+}
+
+static int create_client_file(const char *node, const char *port,
+ const char *domain, int pid)
{
char buf[BUFSIZ];
int ofd;

- snprintf(buf, BUFSIZ, "%s.%s:%s.dat", output_file, node, port);
+ if (node)
+ snprintf(buf, BUFSIZ, "%s.%s:%s.dat", output_file, node, port);
+ else
+ snprintf(buf, BUFSIZ, "%s.%s:%d.dat", output_file, domain, pid);

ofd = open(buf, O_RDWR | O_CREAT | O_TRUNC, 0644);
if (ofd < 0)
@@ -418,7 +518,8 @@ static int create_client_file(const char *node, const char *port)
}

static void destroy_all_readers(int cpus, int *pid_array, const char *node,
- const char *port)
+ const char *port, const char *domain,
+ int virtpid)
{
int cpu;

@@ -426,36 +527,48 @@ static void destroy_all_readers(int cpus, int *pid_array, const char *node,
if (pid_array[cpu] > 0) {
kill(pid_array[cpu], SIGKILL);
waitpid(pid_array[cpu], NULL, 0);
- delete_temp_file(node, port, cpu);
+ delete_temp_file(node, port, domain, virtpid, cpu);
pid_array[cpu] = 0;
}
}
}

static int *create_all_readers(int cpus, const char *node, const char *port,
- int pagesize, int fd)
+ const char *domain, int virtpid, int pagesize, int fd)
{
char buf[BUFSIZ];
- int *port_array;
+ int *port_array = NULL;
int *pid_array;
int start_port;
int udp_port;
int cpu;
int pid;

- port_array = malloc_or_die(sizeof(int) * cpus);
+ if (node) {
+ port_array = malloc_or_die(sizeof(int) * cpus);
+ start_port = START_PORT_SEARCH;
+ }
pid_array = malloc_or_die(sizeof(int) * cpus);
memset(pid_array, 0, sizeof(int) * cpus);

- start_port = START_PORT_SEARCH;
-
- /* Now create a UDP port for each CPU */
+ /* Now create a reader for each CPU */
for (cpu = 0; cpu < cpus; cpu++) {
- udp_port = open_udp(node, port, &pid, cpu,
- pagesize, start_port);
- if (udp_port < 0)
- goto out_free;
- port_array[cpu] = udp_port;
+ if (node) {
+ udp_port = open_udp(node, port, &pid, cpu,
+ pagesize, start_port);
+ if (udp_port < 0)
+ goto out_free;
+ port_array[cpu] = udp_port;
+ /*
+ * due to some bugging finding ports,
+ * force search after last port
+ */
+ start_port = udp_port + 1;
+ } else {
+ if (open_virtio_serial_pipe(&pid, cpu, pagesize,
+ domain, virtpid) < 0)
+ goto out_free;
+ }
pid_array[cpu] = pid;
/*
* Due to some bugging finding ports,
@@ -482,7 +595,7 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
return pid_array;

out_free:
- destroy_all_readers(cpus, pid_array, node, port);
+ destroy_all_readers(cpus, pid_array, node, port, domain, virtpid);
return NULL;
}

@@ -524,7 +637,7 @@ static void stop_all_readers(int cpus, int *pid_array)
}

static void put_together_file(int cpus, int ofd, const char *node,
- const char *port)
+ const char *port, const char *domain, int virtpid)
{
char **temp_files;
int cpu;
@@ -533,25 +646,31 @@ static void put_together_file(int cpus, int ofd, const char *node,
temp_files = malloc_or_die(sizeof(*temp_files) * cpus);

for (cpu = 0; cpu < cpus; cpu++)
- temp_files[cpu] = get_temp_file(node, port, cpu);
+ temp_files[cpu] = get_temp_file(node, port, domain,
+ virtpid, cpu);

tracecmd_attach_cpu_data_fd(ofd, cpus, temp_files);
free(temp_files);
}

-static void process_client(const char *node, const char *port, int fd)
+static void process_client(const char *node, const char *port,
+ const char *domain, int virtpid, int fd)
{
int *pid_array;
int pagesize;
int cpus;
int ofd;

- if (communicate_with_client(fd, &cpus, &pagesize) < 0)
- return;
-
- ofd = create_client_file(node, port);
+ if (node) {
+ if (communicate_with_client_nw(fd, &cpus, &pagesize) < 0)
+ return;
+ } else {
+ if (communicate_with_client_virt(fd, domain, &cpus, &pagesize) < 0)
+ return;
+ }

- pid_array = create_all_readers(cpus, node, port, pagesize, fd);
+ ofd = create_client_file(node, port, domain, virtpid);
+ pid_array = create_all_readers(cpus, node, port, domain, virtpid, pagesize, fd);
if (!pid_array)
return;

@@ -570,9 +689,22 @@ static void process_client(const char *node, const char *port, int fd)
/* wait a little to have the readers clean up */
sleep(1);

- put_together_file(cpus, ofd, node, port);
+ put_together_file(cpus, ofd, node, port, domain, virtpid);

- destroy_all_readers(cpus, pid_array, node, port);
+ destroy_all_readers(cpus, pid_array, node, port, domain, virtpid);
+}
+
+static void process_client_nw(const char *node, const char *port, int fd)
+{
+ process_client(node, port, NULL, 0, fd);
+}
+
+static void process_client_virt(const char *domain, int virtpid, int fd)
+{
+ /* keep connection to qemu if clients on guests finish operation */
+ do {
+ process_client(NULL, NULL, domain, virtpid, fd);
+ } while (!done);
}

static int do_fork(int cfd)
@@ -599,8 +731,8 @@ static int do_fork(int cfd)
return 0;
}

-static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
- socklen_t peer_addr_len)
+static int do_connection(int cfd, struct sockaddr *peer_addr,
+ socklen_t *peer_addr_len, const char *domain, int virtpid)
{
char host[NI_MAXHOST], service[NI_MAXSERV];
int s;
@@ -610,21 +742,22 @@ static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
if (ret)
return ret;

- s = getnameinfo((struct sockaddr *)peer_addr, peer_addr_len,
- host, NI_MAXHOST,
- service, NI_MAXSERV, NI_NUMERICSERV);
-
- if (s == 0)
- plog("Connected with %s:%s\n",
- host, service);
- else {
- plog("Error with getnameinfo: %s\n",
- gai_strerror(s));
- close(cfd);
- return -1;
- }
-
- process_client(host, service, cfd);
+ if (peer_addr) {
+ s = getnameinfo(peer_addr, *peer_addr_len, host, NI_MAXHOST,
+ service, NI_MAXSERV, NI_NUMERICSERV);
+
+ if (s == 0)
+ plog("Connected with %s:%s\n",
+ host, service);
+ else {
+ plog("Error with getnameinfo: %s\n",
+ gai_strerror(s));
+ close(cfd);
+ return -1;
+ }
+ process_client_nw(host, service, cfd);
+ } else
+ process_client_virt(domain, virtpid, cfd);

close(cfd);

@@ -634,6 +767,77 @@ static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
return 0;
}

+static int do_connection_nw(int cfd, struct sockaddr *addr, socklen_t *addrlen)
+{
+ return do_connection(cfd, addr, addrlen, NULL, 0);
+}
+
+#define LIBVIRT_DOMAIN_PATH "/var/run/libvirt/qemu/"
+
+/* We can convert pid to domain name of a guest when we use libvirt. */
+static char *get_guest_domain_from_pid(int pid)
+{
+ struct dirent *dirent;
+ char file_name[NAME_MAX];
+ char *file_name_ret, *domain;
+ char buf[BUFSIZ];
+ DIR *dir;
+ size_t doml;
+ int fd;
+
+ dir = opendir(LIBVIRT_DOMAIN_PATH);
+ if (!dir) {
+ if (errno == ENOENT)
+ warning("Only support for using libvirt");
+ return NULL;
+ }
+
+ for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
+ snprintf(file_name, NAME_MAX, LIBVIRT_DOMAIN_PATH"%s",
+ dirent->d_name);
+ file_name_ret = strstr(file_name, ".pid");
+ if (file_name_ret) {
+ fd = open(file_name, O_RDONLY);
+ if (fd < 0)
+ return NULL;
+ if (read(fd, buf, BUFSIZ) < 0)
+ return NULL;
+
+ if (pid == atoi(buf)) {
+ /* not include /var/run/libvirt/qemu */
+ doml = (size_t)(file_name_ret - file_name)
+ - strlen(LIBVIRT_DOMAIN_PATH);
+ domain = strndup(file_name +
+ strlen(LIBVIRT_DOMAIN_PATH),
+ doml);
+ plog("start %s:%d\n", domain, pid);
+ return domain;
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static int do_connection_virt(int cfd)
+{
+ struct ucred cr;
+ socklen_t cl;
+ int ret;
+ char *domain;
+
+ cl = sizeof(cr);
+ ret = getsockopt(cfd, SOL_SOCKET, SO_PEERCRED, &cr, &cl);
+ if (ret < 0)
+ return ret;
+
+ domain = get_guest_domain_from_pid(cr.pid);
+ if (!domain)
+ return -1;
+
+ return do_connection(cfd, NULL, NULL, domain, cr.pid);
+}
+
static int *client_pids;
static int saved_pids;
static int size_pids;
@@ -678,12 +882,11 @@ static void remove_process(int pid)

static void kill_clients(void)
{
- int status;
int i;

for (i = 0; i < saved_pids; i++) {
kill(client_pids[i], SIGINT);
- waitpid(client_pids[i], &status, 0);
+ waitpid(client_pids[i], NULL, 0);
}

saved_pids = 0;
@@ -702,31 +905,51 @@ static void clean_up(int sig)
} while (ret > 0);
}

-static void do_accept_loop(int sfd)
+static void do_accept_loop(int sfd, bool nw, struct sockaddr *addr,
+ socklen_t *addrlen)
{
- struct sockaddr_storage peer_addr;
- socklen_t peer_addr_len;
int cfd, pid;

- peer_addr_len = sizeof(peer_addr);
-
do {
- cfd = accept(sfd, (struct sockaddr *)&peer_addr,
- &peer_addr_len);
+ cfd = accept(sfd, addr, addrlen);
printf("connected!\n");
if (cfd < 0 && errno == EINTR)
continue;
if (cfd < 0)
pdie("connecting");

- pid = do_connection(cfd, &peer_addr, peer_addr_len);
+ if (nw)
+ pid = do_connection_nw(cfd, addr, addrlen);
+ else
+ pid = do_connection_virt(cfd);
if (pid > 0)
add_process(pid);

} while (!done);
}

-static void do_listen(char *port)
+static void do_accept_loop_nw(int sfd)
+{
+ struct sockaddr_storage peer_addr;
+ socklen_t peer_addr_len;
+
+ peer_addr_len = sizeof(peer_addr);
+
+ do_accept_loop(sfd, true, (struct sockaddr *)&peer_addr,
+ &peer_addr_len);
+}
+
+static void do_accept_loop_virt(int sfd)
+{
+ struct sockaddr_un un_addr;
+ socklen_t un_addrlen;
+
+ un_addrlen = sizeof(un_addr);
+
+ do_accept_loop(sfd, false, (struct sockaddr *)&un_addr, &un_addrlen);
+}
+
+static void do_listen_nw(char *port)
{
struct addrinfo hints;
struct addrinfo *result, *rp;
@@ -764,11 +987,67 @@ static void do_listen(char *port)
if (listen(sfd, backlog) < 0)
pdie("listen");

- do_accept_loop(sfd);
+ do_accept_loop_nw(sfd);

kill_clients();
}

+static void make_virt_if_dir(void)
+{
+ struct group *group;
+
+ if (mkdir(TRACE_CMD_DIR, 0710) < 0) {
+ if (errno != EEXIST)
+ pdie("mkdir %s", TRACE_CMD_DIR);
+ }
+ /* QEMU operates as qemu:qemu */
+ chmod(TRACE_CMD_DIR, 0710);
+ group = getgrnam("qemu");
+ if (chown(TRACE_CMD_DIR, -1, group->gr_gid) < 0)
+ pdie("chown %s", TRACE_CMD_DIR);
+
+ if (mkdir(VIRT_DIR, 0710) < 0) {
+ if (errno != EEXIST)
+ pdie("mkdir %s", VIRT_DIR);
+ }
+ chmod(VIRT_DIR, 0710);
+ if (chown(VIRT_DIR, -1, group->gr_gid) < 0)
+ pdie("chown %s", VIRT_DIR);
+}
+
+static void do_listen_virt(void)
+{
+ struct sockaddr_un un_server;
+ struct group *group;
+ socklen_t slen;
+ int sfd;
+
+ make_virt_if_dir();
+
+ slen = sizeof(un_server);
+ sfd = socket(AF_UNIX, SOCK_STREAM, 0);
+ if (sfd < 0)
+ pdie("socket");
+
+ un_server.sun_family = AF_UNIX;
+ snprintf(un_server.sun_path, PATH_MAX, VIRT_TRACE_CTL_SOCK);
+
+ if (bind(sfd, (struct sockaddr *)&un_server, slen) < 0)
+ pdie("bind");
+ chmod(VIRT_TRACE_CTL_SOCK, 0660);
+ group = getgrnam("qemu");
+ if (chown(VIRT_TRACE_CTL_SOCK, -1, group->gr_gid) < 0)
+ pdie("fchown %s", VIRT_TRACE_CTL_SOCK);
+
+ if (listen(sfd, backlog) < 0)
+ pdie("listen");
+
+ do_accept_loop_virt(sfd);
+
+ unlink(VIRT_TRACE_CTL_SOCK);
+ kill_clients();
+}
+
static void start_daemon(void)
{
if (daemon(1, 0) < 0)
@@ -785,11 +1064,17 @@ void trace_listen(int argc, char **argv)
char *port = NULL;
int daemon = 0;
int c;
+ int nw = 0;
+ int virt = 0;

if (argc < 2)
usage(argv);

- if (strcmp(argv[1], "listen") != 0)
+ if ((nw = (strcmp(argv[1], "listen") == 0)))
+ ; /* do nothing */
+ else if ((virt = (strcmp(argv[1], "virt-server") == 0)))
+ ; /* do nothing */
+ else
usage(argv);

for (;;) {
@@ -810,6 +1095,8 @@ void trace_listen(int argc, char **argv)
usage(argv);
break;
case 'p':
+ if (virt)
+ die("-p only available with listen");
port = optarg;
break;
case 'd':
@@ -832,7 +1119,7 @@ void trace_listen(int argc, char **argv)
}
}

- if (!port)
+ if (!port && nw)
usage(argv);

if ((argc - optind) >= 2)
@@ -860,7 +1147,10 @@ void trace_listen(int argc, char **argv)
signal_setup(SIGINT, finish);
signal_setup(SIGTERM, finish);

- do_listen(port);
+ if (nw)
+ do_listen_nw(port);
+ else
+ do_listen_virt();

return;
}
diff --git a/trace-msg.c b/trace-msg.c
index 61bde54..0b3b356 100644
--- a/trace-msg.c
+++ b/trace-msg.c
@@ -59,6 +59,11 @@ typedef __be32 be32;

#define CPU_MAX 256

+/* use CONNECTION_MSG as a protocol version of trace-msg */
+#define MSG_VERSION "V2"
+#define CONNECTION_MSG "tracecmd-" MSG_VERSION
+#define CONNECTION_MSGSIZE sizeof(CONNECTION_MSG)
+
/* for both client and server */
bool use_tcp;
int cpu_count;
@@ -78,6 +83,10 @@ struct tracecmd_msg_str {
char *buf;
} __attribute__((packed));

+struct tracecmd_msg_rconnect {
+ struct tracecmd_msg_str str;
+};
+
struct tracecmd_msg_opt {
be32 size;
be32 opt_cmd;
@@ -104,6 +113,7 @@ struct tracecmd_msg_error {
be32 size;
be32 cmd;
union {
+ struct tracecmd_msg_rconnect rconnect;
struct tracecmd_msg_tinit tinit;
struct tracecmd_msg_rinit rinit;
struct tracecmd_msg_meta meta;
@@ -111,7 +121,10 @@ struct tracecmd_msg_error {
} __attribute__((packed));

enum tracecmd_msg_cmd {
+ MSG_ERROR = 0,
MSG_CLOSE = 1,
+ MSG_TCONNECT = 2,
+ MSG_RCONNECT = 3,
MSG_TINIT = 4,
MSG_RINIT = 5,
MSG_SENDMETA = 6,
@@ -122,6 +135,7 @@ struct tracecmd_msg {
be32 size;
be32 cmd;
union {
+ struct tracecmd_msg_rconnect rconnect;
struct tracecmd_msg_tinit tinit;
struct tracecmd_msg_rinit rinit;
struct tracecmd_msg_meta meta;
@@ -155,6 +169,16 @@ static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
memcpy(dest+offset, buf, buflen);
}

+static int make_rconnect(const char *buf, int buflen, struct tracecmd_msg *msg)
+{
+ u32 offset = offsetof(struct tracecmd_msg, data.rconnect.str.buf);
+
+ msg->data.rconnect.str.size = htonl(buflen);
+ bufcpy(msg, offset, buf, buflen);
+
+ return 0;
+}
+
enum msg_opt_command {
MSGOPT_USETCP = 1,
};
@@ -232,11 +256,13 @@ static int make_rinit(struct tracecmd_msg *msg)

msg->data.rinit.cpus = htonl(cpu_count);

- for (i = 0; i < cpu_count; i++) {
- /* + rrqports->cpus or rrqports->port_array[i] */
- offset += sizeof(be32);
- port = htonl(port_array[i]);
- bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+ if (port_array) {
+ for (i = 0; i < cpu_count; i++) {
+ /* + rrqports->cpus or rrqports->port_array[i] */
+ offset += sizeof(be32);
+ port = htonl(port_array[i]);
+ bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+ }
}

return 0;
@@ -248,6 +274,8 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
u32 len = 0;

switch (cmd) {
+ case MSG_RCONNECT:
+ return sizeof(msg->data.rconnect.str.size) + CONNECTION_MSGSIZE;
case MSG_TINIT:
len = sizeof(msg->data.tinit.cpus)
+ sizeof(msg->data.tinit.page_size)
@@ -285,6 +313,8 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
{
switch (cmd) {
+ case MSG_RCONNECT:
+ return make_rconnect(CONNECTION_MSG, CONNECTION_MSGSIZE, msg);
case MSG_TINIT:
return make_tinit(len, msg);
case MSG_RINIT:
@@ -425,6 +455,8 @@ static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
{
char msg_tmp[TRACECMD_MSG_MAX_LEN];
+ char *buf;
+ int offset = TRACECMD_MSG_HDR_LEN;
u32 cmd;
int ret;

@@ -437,8 +469,20 @@ static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)

*msg = (struct tracecmd_msg *)msg_tmp;
cmd = ntohl((*msg)->cmd);
- if (cmd == MSG_CLOSE)
+ switch (cmd) {
+ case MSG_RCONNECT:
+ offset += sizeof((*msg)->data.rconnect.str.size);
+ buf = tracecmd_msg_buf_access(*msg, offset);
+ /* Make sure the server is the tracecmd server */
+ if (memcmp(buf, CONNECTION_MSG,
+ ntohl((*msg)->data.rconnect.str.size) - 1) != 0) {
+ warning("server not tracecmd server");
+ return -EPROTONOSUPPORT;
+ }
+ break;
+ case MSG_CLOSE:
return -ECONNABORTED;
+ }

return 0;
}
@@ -495,7 +539,54 @@ static void error_operation_for_server(struct tracecmd_msg *msg)

cmd = ntohl(msg->cmd);

- warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+ if (cmd == MSG_ERROR)
+ plog("Receive error message: cmd=%d size=%d\n",
+ ntohl(msg->data.err.cmd), ntohl(msg->data.err.size));
+ else
+ warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+}
+
+int tracecmd_msg_set_connection(int fd, const char *domain)
+{
+ struct tracecmd_msg *msg;
+ char buf[TRACECMD_MSG_MAX_LEN] = {};
+ u32 cmd;
+ int ret;
+
+ /*
+ * Wait for connection msg by a client first.
+ * If a client uses virtio-serial, a connection message will
+ * not be sent immediately after accept(). connect() is called
+ * in QEMU, so the client can send the connection message
+ * after guest boots. Therefore, the virt-server patiently
+ * waits for the connection request of a client.
+ */
+ ret = tracecmd_msg_recv(fd, buf);
+ if (ret < 0) {
+ if (!buf[0]) {
+ /* No data means QEMU has already died. */
+ close(fd);
+ die("Connection refuesd: %s", domain);
+ }
+ return -ENOMSG;
+ }
+
+ msg = (struct tracecmd_msg *)buf;
+ cmd = ntohl(msg->cmd);
+ if (cmd == MSG_CLOSE)
+ return -ECONNABORTED;
+ else if (cmd != MSG_TCONNECT)
+ return -EINVAL;
+
+ ret = tracecmd_msg_send(fd, MSG_RCONNECT);
+ if (ret < 0)
+ goto error;
+
+ return 0;
+
+error:
+ error_operation_for_server(msg);
+ return ret;
}

#define MAX_OPTION_SIZE 4096
diff --git a/trace-recorder.c b/trace-recorder.c
index 520d486..8169dc3 100644
--- a/trace-recorder.c
+++ b/trace-recorder.c
@@ -149,19 +149,23 @@ tracecmd_create_buffer_recorder_fd2(int fd, int fd2, int cpu, unsigned flags,
recorder->fd1 = fd;
recorder->fd2 = fd2;

- path = malloc_or_die(strlen(buffer) + 40);
- if (!path)
- goto out_free;
-
- if (flags & TRACECMD_RECORD_SNAPSHOT)
- sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw", buffer, cpu);
- else
- sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw", buffer, cpu);
- recorder->trace_fd = open(path, O_RDONLY);
- if (recorder->trace_fd < 0)
- goto out_free;
-
- free(path);
+ if (buffer) {
+ path = malloc_or_die(strlen(buffer) + 40);
+ if (!path)
+ goto out_free;
+
+ if (flags & TRACECMD_RECORD_SNAPSHOT)
+ sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw",
+ buffer, cpu);
+ else
+ sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw",
+ buffer, cpu);
+ recorder->trace_fd = open(path, O_RDONLY);
+ if (recorder->trace_fd < 0)
+ goto out_free;
+
+ free(path);
+ }

if ((recorder->flags & TRACECMD_RECORD_NOSPLICE) == 0) {
ret = pipe(recorder->brass);
@@ -184,8 +188,9 @@ tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *
return tracecmd_create_buffer_recorder_fd2(fd, -1, cpu, flags, buffer, 0);
}

-struct tracecmd_recorder *
-tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer)
+static struct tracecmd_recorder *
+__tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
+ const char *buffer)
{
struct tracecmd_recorder *recorder;
int fd;
@@ -248,6 +253,25 @@ tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags,
goto out;
}

+struct tracecmd_recorder *
+tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
+ const char *buffer)
+{
+ return __tracecmd_create_buffer_recorder(file, cpu, flags, buffer);
+}
+
+struct tracecmd_recorder *
+tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd)
+{
+ struct tracecmd_recorder *recorder;
+
+ recorder = __tracecmd_create_buffer_recorder(file, cpu, 0, NULL);
+ if (recorder)
+ recorder->trace_fd = trace_fd;
+
+ return recorder;
+}
+
struct tracecmd_recorder *tracecmd_create_recorder_fd(int fd, int cpu, unsigned flags)
{
char *tracing;
diff --git a/trace-usage.c b/trace-usage.c
index b8f26e6..e6a239f 100644
--- a/trace-usage.c
+++ b/trace-usage.c
@@ -153,6 +153,16 @@ static struct usage_help usage_help[] = {
" -l logfile to write messages to.\n"
},
{
+ "virt-server",
+ "listen on a virtio-serial for trace clients",
+ " %s virt-server [-o file][-d dir][-l logfile]\n"
+ " Creates a socket to listen for clients.\n"
+ " -D create it in daemon mode.\n"
+ " -o file name to use for clients.\n"
+ " -d diretory to store client files.\n"
+ " -l logfile to write messages to.\n"
+ },
+ {
"list",
"list the available events, plugins or options",
" %s list [-e][-t][-o][-f [regex]]\n"

2013-09-13 02:02:45

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: [PATCH V2 5/5] trace-cmd: Add --virt option for record mode

Add --virt option for record mode for a virtualization environment.
If we use this option on a guest, we can send trace data in low-overhead.
This is because guests can send trace data to a host without copying the data
by using splice(2).

The format is:

trace-cmd record --virt -e sched*

<Note>
The client using virtio-serial does not wait for the connection message
"tracecmd" from the server. The client sends the connection message
MSG_TCONNECT first.

<Restriction>
This feature can use from kernel-3.6 which supports splice_read for ftrace
and splice_write for virtio-serial.

Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
---
Documentation/trace-cmd-record.1.txt | 11 ++++-
trace-cmd.h | 3 +
trace-msg.c | 80 ++++++++++++++++++++++++++++++++--
trace-msg.h | 4 ++
trace-record.c | 70 ++++++++++++++++++++++++++++--
5 files changed, 156 insertions(+), 12 deletions(-)

diff --git a/Documentation/trace-cmd-record.1.txt b/Documentation/trace-cmd-record.1.txt
index 832a257..7eb8ac9 100644
--- a/Documentation/trace-cmd-record.1.txt
+++ b/Documentation/trace-cmd-record.1.txt
@@ -240,6 +240,15 @@ OPTIONS
timestamp to gettimeofday which will allow wall time output from the
timestamps reading the created 'trace.dat' file.

+*--virt*::
+ This option is usded on a guest in a virtualization environment. If a host
+ is running "trace-cmd virt-server", this option is used to have the data
+ sent to the host with virtio-serial like *-N* option. (see also
+ trace-cmd-virt-server(1))
+
+ Note: This option is not supported with latency tracer plugins:
+ wakeup, wakeup_rt, irqsoff, preemptoff and preemptirqsoff
+
EXAMPLES
--------

@@ -302,7 +311,7 @@ SEE ALSO
--------
trace-cmd(1), trace-cmd-report(1), trace-cmd-start(1), trace-cmd-stop(1),
trace-cmd-extract(1), trace-cmd-reset(1), trace-cmd-split(1),
-trace-cmd-list(1), trace-cmd-listen(1)
+trace-cmd-list(1), trace-cmd-listen(1), trace-cmd-virt-server(1)

AUTHOR
------
diff --git a/trace-cmd.h b/trace-cmd.h
index ce3df2c..d69ea2e 100644
--- a/trace-cmd.h
+++ b/trace-cmd.h
@@ -250,7 +250,8 @@ void tracecmd_stat_cpu(struct trace_seq *s, int cpu);
long tracecmd_flush_recording(struct tracecmd_recorder *recorder);

/* for clients */
-int tracecmd_msg_send_init_data(int fd);
+int tracecmd_msg_connect_to_server(int fd);
+int tracecmd_msg_send_init_data_nw(int fd);
int tracecmd_msg_metadata_send(int fd, char *buf, int size);
int tracecmd_msg_finish_sending_metadata(int fd);
void tracecmd_msg_send_close_msg();
diff --git a/trace-msg.c b/trace-msg.c
index 0b3b356..4de1cf3 100644
--- a/trace-msg.c
+++ b/trace-msg.c
@@ -30,6 +30,7 @@
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
+#include <sys/stat.h>
#include <sys/types.h>
#include <linux/types.h>

@@ -72,6 +73,7 @@ int cpu_count;
static int psfd;
unsigned int page_size;
int *client_ports;
+int *virt_sfds;
bool send_metadata;

/* for server */
@@ -268,12 +270,20 @@ static int make_rinit(struct tracecmd_msg *msg)
return 0;
}

+static int make_error_msg(u32 len, struct tracecmd_msg *msg)
+{
+ bufcpy(msg, TRACECMD_MSG_HDR_LEN, errmsg, len);
+ return 0;
+}
+
static u32 tracecmd_msg_get_body_length(u32 cmd)
{
struct tracecmd_msg *msg;
u32 len = 0;

switch (cmd) {
+ case MSG_ERROR:
+ return ntohl(errmsg->size);
case MSG_RCONNECT:
return sizeof(msg->data.rconnect.str.size) + CONNECTION_MSGSIZE;
case MSG_TINIT:
@@ -302,6 +312,7 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
+ sizeof(msg->data.rinit.port_array);
case MSG_SENDMETA:
return TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN;
+ case MSG_TCONNECT:
case MSG_CLOSE:
case MSG_FINMETA:
break;
@@ -313,12 +324,15 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
{
switch (cmd) {
+ case MSG_ERROR:
+ return make_error_msg(len, msg);
case MSG_RCONNECT:
return make_rconnect(CONNECTION_MSG, CONNECTION_MSGSIZE, msg);
case MSG_TINIT:
return make_tinit(len, msg);
case MSG_RINIT:
return make_rinit(msg);
+ case MSG_TCONNECT:
case MSG_CLOSE:
case MSG_SENDMETA: /* meta data is not stored here. */
case MSG_FINMETA:
@@ -374,6 +388,12 @@ static int tracecmd_msg_send(int fd, u32 cmd)
return 0;
}

+static void tracecmd_msg_send_error(int fd, struct tracecmd_msg *msg)
+{
+ errmsg = msg;
+ tracecmd_msg_send(fd, MSG_ERROR);
+}
+
static int tracecmd_msg_read_extra(int fd, char *buf, u32 size, int *n)
{
int r = 0;
@@ -502,20 +522,36 @@ static int tracecmd_msg_send_and_wait_for_msg(int fd, u32 cmd, struct tracecmd_m
return 0;
}

-int tracecmd_msg_send_init_data(int fd)
+static int tracecmd_msg_send_init_data(int fd, bool nw)
{
- struct tracecmd_msg *msg;
+ struct tracecmd_msg *msg = NULL;
int i, cpus;
int ret;
+ char buf[PATH_MAX];

ret = tracecmd_msg_send_and_wait_for_msg(fd, MSG_TINIT, &msg);
if (ret < 0)
return ret;

cpus = ntohl(msg->data.rinit.cpus);
- client_ports = malloc_or_die(sizeof(int) * cpus);
- for (i = 0; i < cpus; i++)
- client_ports[i] = ntohl(msg->data.rinit.port_array[i]);
+ if (nw) {
+ client_ports = malloc_or_die(sizeof(int) * cpus);
+ for (i = 0; i < cpus; i++)
+ client_ports[i] =
+ ntohl(msg->data.rinit.port_array[i]);
+ } else {
+ virt_sfds = malloc_or_die(sizeof(int) * cpus);
+
+ /* Open data paths of virtio-serial */
+ for (i = 0; i < cpus; i++) {
+ snprintf(buf, PATH_MAX, TRACE_PATH_CPU, i);
+ virt_sfds[i] = open(buf, O_WRONLY);
+ if (virt_sfds[i] < 0) {
+ warning("Cannot open %s", TRACE_PATH_CPU, i);
+ return -errno;
+ }
+ }
+ }

/* Next, send meta data */
send_metadata = true;
@@ -523,6 +559,40 @@ int tracecmd_msg_send_init_data(int fd)
return 0;
}

+int tracecmd_msg_send_init_data_nw(int fd)
+{
+ return tracecmd_msg_send_init_data(fd, true);
+}
+
+static int tracecmd_msg_send_init_data_virt(int fd)
+{
+ return tracecmd_msg_send_init_data(fd, false);
+}
+
+int tracecmd_msg_connect_to_server(int fd)
+{
+ struct tracecmd_msg *msg = NULL;
+ int ret;
+
+ /* connect to a server */
+ ret = tracecmd_msg_send_and_wait_for_msg(fd, MSG_TCONNECT, &msg);
+ if (ret < 0) {
+ if (ret == -EPROTONOSUPPORT)
+ goto error;
+ return ret;
+ }
+
+ ret = tracecmd_msg_send_init_data_virt(fd);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+
+error:
+ tracecmd_msg_send_error(fd, msg);
+ return ret;
+}
+
static bool process_option(struct tracecmd_msg_opt *opt)
{
/* currently the only option we have is to us TCP */
diff --git a/trace-msg.h b/trace-msg.h
index b23e72b..502c1bf 100644
--- a/trace-msg.h
+++ b/trace-msg.h
@@ -2,6 +2,9 @@
#define _TRACE_MSG_H_

#include <stdbool.h>
+#define VIRTIO_PORTS "/dev/virtio-ports/"
+#define AGENT_CTL_PATH VIRTIO_PORTS "agent-ctl-path"
+#define TRACE_PATH_CPU VIRTIO_PORTS "trace-path-cpu%d"

#define UDP_MAX_PACKET (65536 - 20)
#define V2_MAGIC "677768\0"
@@ -17,6 +20,7 @@ extern int cpu_count;
extern unsigned int page_size;
extern int *client_ports;
extern bool send_metadata;
+extern int *virt_sfds;

/* for server */
extern bool done;
diff --git a/trace-record.c b/trace-record.c
index ebfe6c0..1b1d293 100644
--- a/trace-record.c
+++ b/trace-record.c
@@ -80,6 +80,9 @@ static int sfd;
/* Max size to let a per cpu file get */
static int max_kb;

+struct tracecmd_output *virt_handle;
+static bool virt;
+
static int do_ptrace;

static int filter_task;
@@ -1578,6 +1581,9 @@ static int create_recorder(struct buffer_instance *instance, int cpu, int extrac
if (client_ports) {
connect_port(cpu);
recorder = tracecmd_create_recorder_fd(client_ports[cpu], cpu, recorder_flags);
+ } else if (virt_sfds) {
+ recorder = tracecmd_create_recorder_fd(virt_sfds[cpu], cpu,
+ recorder_flags);
} else {
file = get_temp_file(instance, cpu);
recorder = create_recorder_instance(instance, file, cpu);
@@ -1613,7 +1619,7 @@ static void check_first_msg_from_server(int fd)
die("server not tracecmd server");
}

-static void communicate_with_listener_v1(int fd)
+static void communicate_with_listener_v1_nw(int fd)
{
char buf[BUFSIZ];
ssize_t n;
@@ -1676,9 +1682,9 @@ static void communicate_with_listener_v1(int fd)
}
}

-static void communicate_with_listener_v2(int fd)
+static void communicate_with_listener_v2_nw(int fd)
{
- if (tracecmd_msg_send_init_data(fd) < 0)
+ if (tracecmd_msg_send_init_data_nw(fd) < 0)
die("Cannot communicate with server");
}

@@ -1716,6 +1722,12 @@ static void check_protocol_version(int fd)
}
}

+static void communicate_with_listener_virt(int fd)
+{
+ if (tracecmd_msg_connect_to_server(fd) < 0)
+ die("Cannot communicate with server");
+}
+
static void setup_network(void)
{
struct tracecmd_output *handle;
@@ -1772,11 +1784,11 @@ again:
close(sfd);
goto again;
}
- communicate_with_listener_v2(sfd);
+ communicate_with_listener_v2_nw(sfd);
}

if (proto_ver == V1_PROTOCOL)
- communicate_with_listener_v1(sfd);
+ communicate_with_listener_v1_nw(sfd);

/* Now create the handle through this socket */
handle = tracecmd_create_init_fd_glob(sfd, listed_events);
@@ -1787,6 +1799,21 @@ again:
/* OK, we are all set, let'r rip! */
}

+static void setup_virtio(void)
+{
+ int fd;
+
+ fd = open(AGENT_CTL_PATH, O_RDWR);
+ if (fd < 0)
+ die("Cannot open %s", AGENT_CTL_PATH);
+
+ communicate_with_listener_virt(fd);
+
+ /* Now create the handle through this socket */
+ virt_handle = tracecmd_create_init_fd_glob(fd, listed_events);
+ tracecmd_msg_finish_sending_metadata(fd);
+}
+
static void finish_network(void)
{
if (proto_ver == V2_PROTOCOL)
@@ -1795,6 +1822,13 @@ static void finish_network(void)
free(host);
}

+static void finish_virt(void)
+{
+ tracecmd_msg_send_close_msg();
+ free(virt_handle);
+ free(virt_sfds);
+}
+
static void start_threads(void)
{
struct buffer_instance *instance;
@@ -1802,6 +1836,8 @@ static void start_threads(void)

if (host)
setup_network();
+ else if (virt)
+ setup_virtio();

/* make a thread for every CPU we have */
pids = malloc_or_die(sizeof(*pids) * cpu_count * (buffers + 1));
@@ -1846,6 +1882,9 @@ static void record_data(char *date2ts, struct trace_seq *s)
if (host) {
finish_network();
return;
+ } else if (virt) {
+ finish_virt();
+ return;
}

if (latency)
@@ -2337,6 +2376,7 @@ static void record_all_events(void)
}

enum {
+ OPT_virt = 252,
OPT_nosplice = 253,
OPT_funcstack = 254,
OPT_date = 255,
@@ -2408,6 +2448,7 @@ void trace_record (int argc, char **argv)
{"date", no_argument, NULL, OPT_date},
{"func-stack", no_argument, NULL, OPT_funcstack},
{"nosplice", no_argument, NULL, OPT_nosplice},
+ {"virt", no_argument, NULL, OPT_virt},
{"help", no_argument, NULL, '?'},
{NULL, 0, NULL, 0}
};
@@ -2519,6 +2560,8 @@ void trace_record (int argc, char **argv)
case 'o':
if (host)
die("-o incompatible with -N");
+ if (virt)
+ die("-o incompatible with --virt");
if (!record && !extract)
die("start does not take output\n"
"Did you mean 'record'?");
@@ -2550,6 +2593,8 @@ void trace_record (int argc, char **argv)
case 'N':
if (!record)
die("-N only available with record");
+ if (virt)
+ die("-N incompatible with --virt");
if (output)
die("-N incompatible with -o");
host = optarg;
@@ -2562,6 +2607,8 @@ void trace_record (int argc, char **argv)
max_kb = atoi(optarg);
break;
case 't':
+ if (virt)
+ die("-t incompatible with --virt");
use_tcp = 1;
break;
case 'b':
@@ -2588,6 +2635,17 @@ void trace_record (int argc, char **argv)
case OPT_nosplice:
recorder_flags |= TRACECMD_RECORD_NOSPLICE;
break;
+ case OPT_virt:
+ if (!record)
+ die("--virt only available with record");
+ if (host)
+ die("--virt incompatible with -N");
+ if (output)
+ die("--virt incompatible with -o");
+ if (use_tcp)
+ die("--virt incompatible with -t");
+ virt = true;
+ break;
default:
usage(argv);
}
@@ -2663,6 +2721,8 @@ void trace_record (int argc, char **argv)
latency = 1;
if (host)
die("Network tracing not available with latency tracer plugins");
+ if (virt)
+ die("Virtio-trace not available with latency tracer plugins");
}
if (fset < 0 && (strcmp(plugin, "function") == 0 ||
strcmp(plugin, "function_graph") == 0))

2013-09-13 02:03:03

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: [PATCH V2 3/5] trace-cmd: Use poll(2) to wait for a message

Use poll(2) to wait for a message. If a client/server cannot send a message for
any reasons, the current server/client will wait in a blocking read operation.
So, we use poll(2) for avoiding remaining in a blocking state.

Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
---
trace-msg.c | 42 ++++++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/trace-msg.c b/trace-msg.c
index cf82ff6..61bde54 100644
--- a/trace-msg.c
+++ b/trace-msg.c
@@ -396,6 +396,27 @@ error:
return -ENOMSG;
}

+#define MSG_WAIT_MSEC 5000
+
+/*
+ * A return value of 0 indicates time-out
+ */
+static int tracecmd_msg_recv_wait(int fd, char *buf, struct tracecmd_msg **msg)
+{
+ struct pollfd pfd;
+ int ret;
+
+ pfd.fd = fd;
+ pfd.events = POLLIN;
+ ret = poll(&pfd, 1, MSG_WAIT_MSEC);
+ if (ret < 0) {
+ return -errno;
+ } else if (ret == 0)
+ return -ETIMEDOUT;
+
+ return tracecmd_msg_recv(fd, buf);
+}
+
static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
{
return (void *)msg + offset;
@@ -407,9 +428,12 @@ static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
u32 cmd;
int ret;

- ret = tracecmd_msg_recv(fd, msg_tmp);
- if (ret < 0)
+ ret = tracecmd_msg_recv_wait(fd, msg_tmp, msg);
+ if (ret < 0) {
+ if (ret == -ETIMEDOUT)
+ warning("Connection timed out\n");
return ret;
+ }

*msg = (struct tracecmd_msg *)msg_tmp;
cmd = ntohl((*msg)->cmd);
@@ -487,9 +511,12 @@ int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize)
u32 size = 0;
u32 cmd;

- ret = tracecmd_msg_recv(fd, buf);
- if (ret < 0)
+ ret = tracecmd_msg_recv_wait(fd, buf, &msg);
+ if (ret < 0) {
+ if (ret == -ETIMEDOUT)
+ warning("Connection timed out\n");
return ret;
+ }

msg = (struct tracecmd_msg *)buf;
cmd = ntohl(msg->cmd);
@@ -625,9 +652,12 @@ int tracecmd_msg_collect_metadata(int ifd, int ofd)
int ret;

do {
- ret = tracecmd_msg_recv(ifd, buf);
+ ret = tracecmd_msg_recv_wait(ifd, buf, &msg);
if (ret < 0) {
- warning("reading client");
+ if (ret == -ETIMEDOUT)
+ warning("Connection timed out\n");
+ else
+ warning("reading client");
return ret;
}

2013-09-13 02:03:22

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: [PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

Apply trace-msg protocol for communication between a server and clients.

Currently, trace-listen(server) and trace-record -N(client) operate as follows:

<server> <client>
listen to socket fd
connect to socket fd
accept the client
send "tracecmd"
+------------> receive "tracecmd"
check "tracecmd"
send cpus
receive cpus <------------+
print "cpus=XXX"
send pagesize
|
receive pagesize <--------+
print "pagesize=XXX"
send option
|
receive option <----------+
understand option
send port_array
+------------> receive port_array
understand port_array
send meta data
receive meta data <-------+
record meta data
(snip)
read block
--- start sending trace data on child processes ---

--- When client finishes sending trace data ---
close(socket fd)
read size = 0
close(socket fd)

All messages are unstructured character strings, so server(client) using the
protocol must parse the unstructured messages. Since it is hard to
add complex contents in the protocol, structured binary message trace-msg
is introduced as the communication protocol.

By applying this patch, server and client operate as follows:

<server> <client>
listen to socket fd
connect to socket fd
accept the client
send "tracecmd"
+------------> receive "tracecmd"
check "tracecmd"
send "V2\0<MAGIC_NUMBER>\00" as the v2 protocol
receive "V2" <------------+
check "V2"
read "<MAGIC_NUMBER>\00"
send "V2"
+---------------> receive "V2"
check "V2"
send cpus,pagesize,option(MSG_TINIT)
receive MSG_TINIT <-------+
print "cpus=XXX"
print "pagesize=XXX"
understand option
send port_array
+--MSG_RINIT-> receive MSG_RINIT
understand port_array
send meta data(MSG_SENDMETA)
receive MSG_SENDMETA <----+
record meta data
(snip)
send a message to finish sending meta data
| (MSG_FINMETA)
receive MSG_FINMETA <-----+
read block
--- start sending trace data on child processes ---

--- When client finishes sending trace data ---
send MSG_CLOSE
receive MSG_CLOSE <-------+
close(socket fd) close(socket fd)

By introducing the v2 protocol, after the client checks "tracecmd", the client
will send "V2\0<MAGIC_NUMBER>\00\0". This complex message is used when the
new client tries to connect to the old server. The new client wants to check
whether the reply message from the server is "V2" or not. However, the old
server does not respond to the client before receiving cpu numbers, page size,
and options. Each message is separated with "\0" in the old server, so the
client send "V2" as cpu numbers, "<MAGIC_NUMBER>" as page size, and "0" as
no options. On the other hands, the old server will understand the messages
as cpus=0, pagesize=<MAGIC_NUMBER>, and options=0, and then the server will
send the message "\0" as port numbers. Then, the message which the client
receives is not "V2" but "\0", so the client will reconnect to the old server
as the v1 protocol.

Changes in V2: Regacy porotocol support in order to keep backward compatibility

Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
---
Makefile | 2
trace-cmd.h | 11 +
trace-listen.c | 133 +++++++----
trace-msg.c | 683 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
trace-msg.h | 27 ++
trace-output.c | 4
trace-record.c | 86 ++++++-
7 files changed, 880 insertions(+), 66 deletions(-)
create mode 100644 trace-msg.c
create mode 100644 trace-msg.h

diff --git a/Makefile b/Makefile
index 1964949..054f53d 100644
--- a/Makefile
+++ b/Makefile
@@ -314,7 +314,7 @@ KERNEL_SHARK_OBJS = $(TRACE_VIEW_OBJS) $(TRACE_GRAPH_OBJS) $(TRACE_GUI_OBJS) \
PEVENT_LIB_OBJS = event-parse.o trace-seq.o parse-filter.o parse-utils.o
TCMD_LIB_OBJS = $(PEVENT_LIB_OBJS) trace-util.o trace-input.o trace-ftrace.o \
trace-output.o trace-recorder.o trace-restore.o trace-usage.o \
- trace-blk-hack.o kbuffer-parse.o
+ trace-blk-hack.o kbuffer-parse.o trace-msg.o

PLUGIN_OBJS = plugin_hrtimer.o plugin_kmem.o plugin_sched_switch.o \
plugin_mac80211.o plugin_jbd2.o plugin_function.o plugin_kvm.o \
diff --git a/trace-cmd.h b/trace-cmd.h
index cbbc6ed..a2958ac 100644
--- a/trace-cmd.h
+++ b/trace-cmd.h
@@ -248,6 +248,17 @@ void tracecmd_stop_recording(struct tracecmd_recorder *recorder);
void tracecmd_stat_cpu(struct trace_seq *s, int cpu);
long tracecmd_flush_recording(struct tracecmd_recorder *recorder);

+/* for clients */
+int tracecmd_msg_send_init_data(int fd);
+int tracecmd_msg_metadata_send(int fd, char *buf, int size);
+int tracecmd_msg_finish_sending_metadata(int fd);
+void tracecmd_msg_send_close_msg();
+
+/* for server */
+int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize);
+int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports);
+int tracecmd_msg_collect_metadata(int ifd, int ofd);
+
/* --- Plugin handling --- */
extern struct plugin_option trace_ftrace_options[];

diff --git a/trace-listen.c b/trace-listen.c
index bf187c9..280b1af 100644
--- a/trace-listen.c
+++ b/trace-listen.c
@@ -33,6 +33,7 @@
#include <errno.h>

#include "trace-local.h"
+#include "trace-msg.h"

#define MAX_OPTION_SIZE 4096

@@ -45,10 +46,10 @@ static FILE *logfp;

static int debug;

-static int use_tcp;
-
static int backlog = 5;

+static int proto_ver;
+
#define TEMP_FILE_STR "%s.%s:%s.cpu%d", output_file, host, port, cpu
static char *get_temp_file(const char *host, const char *port, int cpu)
{
@@ -112,10 +113,9 @@ static int process_option(char *option)
return 0;
}

-static int done;
static void finish(int sig)
{
- done = 1;
+ done = true;
}

#define LOG_BUF_SIZE 1024
@@ -144,7 +144,7 @@ static void __plog(const char *prefix, const char *fmt, va_list ap,
fprintf(fp, "%.*s", r, buf);
}

-static void plog(const char *fmt, ...)
+void plog(const char *fmt, ...)
{
va_list ap;

@@ -153,7 +153,7 @@ static void plog(const char *fmt, ...)
va_end(ap);
}

-static void pdie(const char *fmt, ...)
+void pdie(const char *fmt, ...)
{
va_list ap;
char *str = "";
@@ -324,56 +324,78 @@ static int communicate_with_client(int fd, int *cpus, int *pagesize)

*cpus = atoi(buf);

- plog("cpus=%d\n", *cpus);
- if (*cpus < 0)
- return -1;
+ /* Is the client using the new protocol? */
+ if (!*cpus) {
+ if (memcmp(buf, "V2", 2) != 0) {
+ plog("Cannot handle the protocol %s", buf);
+ return -1;
+ }

- /* next read the page size */
- n = read_string(fd, buf, BUFSIZ);
- if (n == BUFSIZ)
- /** ERROR **/
- return -1;
+ /* read the rest of dummy data, but not use */
+ read(fd, buf, sizeof(V2_MAGIC)+1);

- *pagesize = atoi(buf);
+ proto_ver = V2_PROTOCOL;

- plog("pagesize=%d\n", *pagesize);
- if (*pagesize <= 0)
- return -1;
+ /* Let the client know we use v2 protocol */
+ write(fd, "V2", 2);

- /* Now the number of options */
- n = read_string(fd, buf, BUFSIZ);
- if (n == BUFSIZ)
- /** ERROR **/
- return -1;
+ /* read the CPU count, the page size, and options */
+ if (tracecmd_msg_initial_setting(fd, cpus, pagesize) < 0)
+ return -1;
+ } else {
+ /* The client is using the v1 protocol */

- options = atoi(buf);
+ plog("cpus=%d\n", *cpus);
+ if (*cpus < 0)
+ return -1;

- for (i = 0; i < options; i++) {
- /* next is the size of the options */
+ /* next read the page size */
n = read_string(fd, buf, BUFSIZ);
if (n == BUFSIZ)
/** ERROR **/
return -1;
- size = atoi(buf);
- /* prevent a client from killing us */
- if (size > MAX_OPTION_SIZE)
+
+ *pagesize = atoi(buf);
+
+ plog("pagesize=%d\n", *pagesize);
+ if (*pagesize <= 0)
return -1;
- option = malloc_or_die(size);
- do {
- t = size;
- s = 0;
- s = read(fd, option+s, t);
- if (s <= 0)
- return -1;
- t -= s;
- s = size - t;
- } while (t);

- s = process_option(option);
- free(option);
- /* do we understand this option? */
- if (!s)
+ /* Now the number of options */
+ n = read_string(fd, buf, BUFSIZ);
+ if (n == BUFSIZ)
+ /** ERROR **/
return -1;
+
+ options = atoi(buf);
+
+ for (i = 0; i < options; i++) {
+ /* next is the size of the options */
+ n = read_string(fd, buf, BUFSIZ);
+ if (n == BUFSIZ)
+ /** ERROR **/
+ return -1;
+ size = atoi(buf);
+ /* prevent a client from killing us */
+ if (size > MAX_OPTION_SIZE)
+ return -1;
+ option = malloc_or_die(size);
+ do {
+ t = size;
+ s = 0;
+ s = read(fd, option+s, t);
+ if (s <= 0)
+ return -1;
+ t -= s;
+ s = size - t;
+ } while (t);
+
+ s = process_option(option);
+ free(option);
+ /* do we understand this option? */
+ if (!s)
+ return -1;
+ }
}

if (use_tcp)
@@ -442,14 +464,20 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
start_port = udp_port + 1;
}

- /* send the client a comma deliminated set of port numbers */
- for (cpu = 0; cpu < cpus; cpu++) {
- snprintf(buf, BUFSIZ, "%s%d",
- cpu ? "," : "", port_array[cpu]);
- write(fd, buf, strlen(buf));
+ if (proto_ver == V2_PROTOCOL) {
+ /* send set of port numbers to the client */
+ if (tracecmd_msg_send_port_array(fd, cpus, port_array) < 0)
+ goto out_free;
+ } else {
+ /* send the client a comma deliminated set of port numbers */
+ for (cpu = 0; cpu < cpus; cpu++) {
+ snprintf(buf, BUFSIZ, "%s%d",
+ cpu ? "," : "", port_array[cpu]);
+ write(fd, buf, strlen(buf));
+ }
+ /* end with null terminator */
+ write(fd, "\0", 1);
}
- /* end with null terminator */
- write(fd, "\0", 1);

return pid_array;

@@ -528,7 +556,10 @@ static void process_client(const char *node, const char *port, int fd)
return;

/* Now we are ready to start reading data from the client */
- collect_metadata_from_client(fd, ofd);
+ if (proto_ver == V2_PROTOCOL)
+ tracecmd_msg_collect_metadata(fd, ofd);
+ else
+ collect_metadata_from_client(fd, ofd);

/* wait a little to let our readers finish reading */
sleep(1);
diff --git a/trace-msg.c b/trace-msg.c
new file mode 100644
index 0000000..cf82ff6
--- /dev/null
+++ b/trace-msg.c
@@ -0,0 +1,683 @@
+/*
+ * trace-msg.c : define message protocol for communication between clients and
+ * a server
+ *
+ * Copyright (C) 2013 Hitachi, Ltd.
+ * Created by Yoshihiro YUNOMAE <[email protected]>
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License (not later!)
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses>
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <errno.h>
+#include <poll.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <sys/types.h>
+#include <linux/types.h>
+
+#include "trace-cmd-local.h"
+#include "trace-msg.h"
+
+typedef __u32 u32;
+typedef __be32 be32;
+
+#define TRACECMD_MSG_MAX_LEN BUFSIZ
+
+ /* size + cmd */
+#define TRACECMD_MSG_HDR_LEN ((sizeof(be32)) + (sizeof(be32)))
+
+ /* + size of the metadata */
+#define TRACECMD_MSG_META_MIN_LEN \
+ ((TRACECMD_MSG_HDR_LEN) + (sizeof(be32)))
+
+ /* - header size for error msg */
+#define TRACECMD_MSG_META_MAX_LEN \
+((TRACECMD_MSG_MAX_LEN) - (TRACECMD_MSG_META_MIN_LEN) - TRACECMD_MSG_HDR_LEN)
+
+ /* size + opt_cmd + size of str */
+#define TRACECMD_OPT_MIN_LEN \
+ ((sizeof(be32)) + (sizeof(be32)) +(sizeof(be32)))
+
+
+#define CPU_MAX 256
+
+/* for both client and server */
+bool use_tcp;
+int cpu_count;
+
+/* for client */
+static int psfd;
+unsigned int page_size;
+int *client_ports;
+bool send_metadata;
+
+/* for server */
+static int *port_array;
+bool done;
+
+struct tracecmd_msg_str {
+ be32 size;
+ char *buf;
+} __attribute__((packed));
+
+struct tracecmd_msg_opt {
+ be32 size;
+ be32 opt_cmd;
+ struct tracecmd_msg_str str;
+};
+
+struct tracecmd_msg_tinit {
+ be32 cpus;
+ be32 page_size;
+ be32 opt_num;
+ struct tracecmd_msg_opt *opt;
+} __attribute__((packed));
+
+struct tracecmd_msg_rinit {
+ be32 cpus;
+ be32 port_array[CPU_MAX];
+} __attribute__((packed));
+
+struct tracecmd_msg_meta {
+ struct tracecmd_msg_str str;
+};
+
+struct tracecmd_msg_error {
+ be32 size;
+ be32 cmd;
+ union {
+ struct tracecmd_msg_tinit tinit;
+ struct tracecmd_msg_rinit rinit;
+ struct tracecmd_msg_meta meta;
+ } data;
+} __attribute__((packed));
+
+enum tracecmd_msg_cmd {
+ MSG_CLOSE = 1,
+ MSG_TINIT = 4,
+ MSG_RINIT = 5,
+ MSG_SENDMETA = 6,
+ MSG_FINMETA = 7,
+};
+
+struct tracecmd_msg {
+ be32 size;
+ be32 cmd;
+ union {
+ struct tracecmd_msg_tinit tinit;
+ struct tracecmd_msg_rinit rinit;
+ struct tracecmd_msg_meta meta;
+ struct tracecmd_msg_error err;
+ } data;
+} __attribute__((packed));
+
+struct tracecmd_msg *errmsg;
+
+static ssize_t msg_do_write_check(int fd, struct tracecmd_msg *msg)
+{
+ return __do_write_check(fd, msg, ntohl(msg->size));
+}
+
+static struct tracecmd_msg *tracecmd_msg_alloc(u32 size)
+{
+ size += TRACECMD_MSG_HDR_LEN;
+ return malloc(size);
+}
+
+static void tracecmd_msg_init(u32 cmd, u32 size, struct tracecmd_msg *msg)
+{
+ size += TRACECMD_MSG_HDR_LEN;
+ memset(msg, 0, size);
+ msg->size = htonl(size);
+ msg->cmd = htonl(cmd);
+}
+
+static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
+{
+ memcpy(dest+offset, buf, buflen);
+}
+
+enum msg_opt_command {
+ MSGOPT_USETCP = 1,
+};
+
+static struct tracecmd_msg_opt *tracecmd_msg_opt_alloc(u32 len)
+{
+ len += TRACECMD_OPT_MIN_LEN;
+ return malloc(len);
+}
+
+static void make_option(int opt_cmd, const char *buf,
+ struct tracecmd_msg_opt *opt)
+{
+ u32 buflen = 0;
+ u32 size = TRACECMD_OPT_MIN_LEN;
+
+ if (buf) {
+ buflen = strlen(buf);
+ size += buflen;
+ }
+
+ opt->size = htonl(size);
+ opt->opt_cmd = htonl(opt_cmd);
+ opt->str.size = htonl(buflen);
+
+ if (buf)
+ bufcpy(opt, TRACECMD_OPT_MIN_LEN, buf, buflen);
+}
+
+static int add_options_to_tinit(u32 len, struct tracecmd_msg *msg)
+{
+ struct tracecmd_msg_opt *opt;
+ int offset = offsetof(struct tracecmd_msg, data.tinit.opt);
+
+ if (use_tcp) {
+ opt = tracecmd_msg_opt_alloc(0);
+ if (!opt)
+ return -ENOMEM;
+
+ make_option(MSGOPT_USETCP, NULL, opt);
+ /* add option */
+ bufcpy(msg, offset, opt, ntohl(opt->size));
+ free(opt);
+ }
+
+ return 0;
+}
+
+static int make_tinit(u32 len, struct tracecmd_msg *msg)
+{
+ int opt_num = 0;
+ int ret = 0;
+
+ if (use_tcp)
+ opt_num++;
+
+ if (opt_num) {
+ ret = add_options_to_tinit(len, msg);
+ if (ret < 0)
+ return ret;
+ }
+
+ msg->data.tinit.cpus = htonl(cpu_count);
+ msg->data.tinit.page_size = htonl(page_size);
+ msg->data.tinit.opt_num = htonl(opt_num);
+
+ return 0;
+}
+
+static int make_rinit(struct tracecmd_msg *msg)
+{
+ int i;
+ u32 offset = TRACECMD_MSG_HDR_LEN;
+ be32 port;
+
+ msg->data.rinit.cpus = htonl(cpu_count);
+
+ for (i = 0; i < cpu_count; i++) {
+ /* + rrqports->cpus or rrqports->port_array[i] */
+ offset += sizeof(be32);
+ port = htonl(port_array[i]);
+ bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+ }
+
+ return 0;
+}
+
+static u32 tracecmd_msg_get_body_length(u32 cmd)
+{
+ struct tracecmd_msg *msg;
+ u32 len = 0;
+
+ switch (cmd) {
+ case MSG_TINIT:
+ len = sizeof(msg->data.tinit.cpus)
+ + sizeof(msg->data.tinit.page_size)
+ + sizeof(msg->data.tinit.opt_num);
+
+ /*
+ * If we are using IPV4 and our page size is greater than
+ * or equal to 64K, we need to punt and use TCP. :-(
+ */
+
+ /* TODO, test for ipv4 */
+ if (page_size >= UDP_MAX_PACKET) {
+ warning("page size too big for UDP using TCP "
+ "in live read");
+ use_tcp = true;
+ }
+
+ if (use_tcp)
+ len += TRACECMD_OPT_MIN_LEN;
+
+ return len;
+ case MSG_RINIT:
+ return sizeof(msg->data.rinit.cpus)
+ + sizeof(msg->data.rinit.port_array);
+ case MSG_SENDMETA:
+ return TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN;
+ case MSG_CLOSE:
+ case MSG_FINMETA:
+ break;
+ }
+
+ return 0;
+}
+
+static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
+{
+ switch (cmd) {
+ case MSG_TINIT:
+ return make_tinit(len, msg);
+ case MSG_RINIT:
+ return make_rinit(msg);
+ case MSG_CLOSE:
+ case MSG_SENDMETA: /* meta data is not stored here. */
+ case MSG_FINMETA:
+ break;
+ }
+
+ return 0;
+}
+
+static int tracecmd_msg_create(u32 cmd, struct tracecmd_msg **msg)
+{
+ u32 len = 0;
+ int ret = 0;
+
+ len = tracecmd_msg_get_body_length(cmd);
+ if (len > (TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN)) {
+ plog("Exceed maximum message size cmd=%d\n", cmd);
+ return -EINVAL;
+ }
+
+ *msg = tracecmd_msg_alloc(len);
+ if (!*msg)
+ return -ENOMEM;
+ tracecmd_msg_init(cmd, len, *msg);
+
+ ret = tracecmd_msg_make_body(cmd, len, *msg);
+ if (ret < 0)
+ free(*msg);
+
+ return ret;
+}
+
+static int tracecmd_msg_send(int fd, u32 cmd)
+{
+ struct tracecmd_msg *msg = NULL;
+ int ret = 0;
+
+ if (cmd > MSG_FINMETA) {
+ plog("Unsupported command: %d\n", cmd);
+ return -EINVAL;
+ }
+
+ ret = tracecmd_msg_create(cmd, &msg);
+ if (ret < 0)
+ return ret;
+
+ ret = msg_do_write_check(fd, msg);
+ if (ret < 0) {
+ free(msg);
+ return -ECOMM;
+ }
+
+ return 0;
+}
+
+static int tracecmd_msg_read_extra(int fd, char *buf, u32 size, int *n)
+{
+ int r = 0;
+
+ do {
+ r = read(fd, buf+*n, size);
+ if (r < 0) {
+ if (errno == EINTR)
+ continue;
+ return -errno;
+ } else if (!r)
+ return -ENOTCONN;
+ size -= r;
+ *n += r;
+ } while (size);
+
+ return 0;
+}
+
+/*
+ * Read header information of msg first, then read all data
+ */
+static int tracecmd_msg_recv(int fd, char *buf)
+{
+ struct tracecmd_msg *msg;
+ u32 size = 0;
+ int n = 0;
+ int ret;
+
+ ret = tracecmd_msg_read_extra(fd, buf, TRACECMD_MSG_HDR_LEN, &n);
+ if (ret < 0)
+ return ret;
+
+ msg = (struct tracecmd_msg *)buf;
+ size = ntohl(msg->size);
+ if (size > TRACECMD_MSG_MAX_LEN)
+ /* too big */
+ goto error;
+ else if (size < TRACECMD_MSG_HDR_LEN)
+ /* too small */
+ goto error;
+ else if (size > TRACECMD_MSG_HDR_LEN) {
+ size -= TRACECMD_MSG_HDR_LEN;
+ return tracecmd_msg_read_extra(fd, buf, size, &n);
+ }
+
+ return 0;
+error:
+ plog("Receive an invalid message(size=%d)\n", size);
+ return -ENOMSG;
+}
+
+static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
+{
+ return (void *)msg + offset;
+}
+
+static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
+{
+ char msg_tmp[TRACECMD_MSG_MAX_LEN];
+ u32 cmd;
+ int ret;
+
+ ret = tracecmd_msg_recv(fd, msg_tmp);
+ if (ret < 0)
+ return ret;
+
+ *msg = (struct tracecmd_msg *)msg_tmp;
+ cmd = ntohl((*msg)->cmd);
+ if (cmd == MSG_CLOSE)
+ return -ECONNABORTED;
+
+ return 0;
+}
+
+static int tracecmd_msg_send_and_wait_for_msg(int fd, u32 cmd, struct tracecmd_msg **msg)
+{
+ int ret;
+
+ ret = tracecmd_msg_send(fd, cmd);
+ if (ret < 0)
+ return ret;
+
+ ret = tracecmd_msg_wait_for_msg(fd, msg);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+int tracecmd_msg_send_init_data(int fd)
+{
+ struct tracecmd_msg *msg;
+ int i, cpus;
+ int ret;
+
+ ret = tracecmd_msg_send_and_wait_for_msg(fd, MSG_TINIT, &msg);
+ if (ret < 0)
+ return ret;
+
+ cpus = ntohl(msg->data.rinit.cpus);
+ client_ports = malloc_or_die(sizeof(int) * cpus);
+ for (i = 0; i < cpus; i++)
+ client_ports[i] = ntohl(msg->data.rinit.port_array[i]);
+
+ /* Next, send meta data */
+ send_metadata = true;
+
+ return 0;
+}
+
+static bool process_option(struct tracecmd_msg_opt *opt)
+{
+ /* currently the only option we have is to us TCP */
+ if (ntohl(opt->opt_cmd) == MSGOPT_USETCP) {
+ use_tcp = true;
+ return true;
+ }
+ return false;
+}
+
+static void error_operation_for_server(struct tracecmd_msg *msg)
+{
+ u32 cmd;
+
+ cmd = ntohl(msg->cmd);
+
+ warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+}
+
+#define MAX_OPTION_SIZE 4096
+
+int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize)
+{
+ struct tracecmd_msg *msg;
+ struct tracecmd_msg_opt *opt;
+ char buf[TRACECMD_MSG_MAX_LEN];
+ int offset = offsetof(struct tracecmd_msg, data.tinit.opt);
+ int options, i, s;
+ int ret;
+ u32 size = 0;
+ u32 cmd;
+
+ ret = tracecmd_msg_recv(fd, buf);
+ if (ret < 0)
+ return ret;
+
+ msg = (struct tracecmd_msg *)buf;
+ cmd = ntohl(msg->cmd);
+ if (cmd != MSG_TINIT) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ *cpus = ntohl(msg->data.tinit.cpus);
+ plog("cpus=%d\n", *cpus);
+ if (*cpus < 0) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ *pagesize = ntohl(msg->data.tinit.page_size);
+ plog("pagesize=%d\n", *pagesize);
+ if (*pagesize <= 0) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ options = ntohl(msg->data.tinit.opt_num);
+ for (i = 0; i < options; i++) {
+ offset += size;
+ opt = tracecmd_msg_buf_access(msg, offset);
+ size = ntohl(opt->size);
+ /* prevent a client from killing us */
+ if (size > MAX_OPTION_SIZE) {
+ plog("Exceed MAX_OPTION_SIZE\n");
+ ret = -EINVAL;
+ goto error;
+ }
+ s = process_option(opt);
+ /* do we understand this option? */
+ if (!s) {
+ plog("Cannot understand(%d:%d:%d)\n",
+ i, ntohl(opt->size), ntohl(opt->opt_cmd));
+ ret = -EINVAL;
+ goto error;
+ }
+ }
+
+ return 0;
+
+error:
+ error_operation_for_server(msg);
+ return ret;
+}
+
+int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports)
+{
+ int ret;
+
+ cpu_count = total_cpus;
+ port_array = ports;
+
+ ret = tracecmd_msg_send(fd, MSG_RINIT);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+void tracecmd_msg_send_close_msg()
+{
+ tracecmd_msg_send(psfd, MSG_CLOSE);
+}
+
+static void make_meta(const char *buf, int buflen, struct tracecmd_msg *msg)
+{
+ int offset = offsetof(struct tracecmd_msg, data.meta.str.buf);
+
+ msg->data.meta.str.size = htonl(buflen);
+ bufcpy(msg, offset, buf, buflen);
+}
+
+int tracecmd_msg_metadata_send(int fd, char *buf, int size)
+{
+ struct tracecmd_msg *msg;
+ int n, len;
+ int ret;
+ int count = 0;
+
+ ret = tracecmd_msg_create(MSG_SENDMETA, &msg);
+ if (ret < 0)
+ return ret;
+
+ n = size;
+ do {
+ if (n > TRACECMD_MSG_META_MAX_LEN) {
+ make_meta(buf+count, TRACECMD_MSG_META_MAX_LEN, msg);
+ n -= TRACECMD_MSG_META_MAX_LEN;
+ count += TRACECMD_MSG_META_MAX_LEN;
+ } else {
+ make_meta(buf+count, n, msg);
+ /*
+ * TRACECMD_MSG_META_MAX_LEN is stored in msg->size,
+ * so update the size to the correct value.
+ */
+ len = TRACECMD_MSG_META_MIN_LEN + n;
+ msg->size = htonl(len);
+ n = 0;
+ }
+
+ ret = msg_do_write_check(fd, msg);
+ if (ret < 0)
+ return ret;
+ } while (n);
+
+ return 0;
+}
+
+int tracecmd_msg_finish_sending_metadata(int fd)
+{
+ int ret;
+
+ ret = tracecmd_msg_send(fd, MSG_FINMETA);
+ if (ret < 0)
+ return ret;
+
+ /* psfd will be used for closing */
+ psfd = fd;
+ return 0;
+}
+
+int tracecmd_msg_collect_metadata(int ifd, int ofd)
+{
+ struct tracecmd_msg *msg;
+ char buf[TRACECMD_MSG_MAX_LEN];
+ u32 s, t, n, cmd;
+ int offset = TRACECMD_MSG_META_MIN_LEN;
+ int ret;
+
+ do {
+ ret = tracecmd_msg_recv(ifd, buf);
+ if (ret < 0) {
+ warning("reading client");
+ return ret;
+ }
+
+ msg = (struct tracecmd_msg *)buf;
+ cmd = ntohl(msg->cmd);
+ if (cmd == MSG_FINMETA) {
+ /* Finish receiving meta data */
+ break;
+ } else if (cmd != MSG_SENDMETA)
+ goto error;
+
+ n = ntohl(msg->data.meta.str.size);
+ t = n;
+ s = 0;
+ do {
+ s = write(ofd, buf+s+offset, t);
+ if (s < 0) {
+ if (errno == EINTR)
+ continue;
+ warning("writing to file");
+ return -errno;
+ }
+ t -= s;
+ s = n - t;
+ } while (t);
+ } while (cmd == MSG_SENDMETA);
+
+ /* check the finish message of the client */
+ while(!done) {
+ ret = tracecmd_msg_recv(ifd, buf);
+ if (ret < 0) {
+ warning("reading client");
+ return ret;
+ }
+
+ msg = (struct tracecmd_msg *)buf;
+ cmd = ntohl(msg->cmd);
+ if (cmd == MSG_CLOSE)
+ /* Finish this connection */
+ break;
+ else {
+ warning("Not accept the message %d", ntohl(msg->cmd));
+ ret = -EINVAL;
+ goto error;
+ }
+ }
+
+ return 0;
+
+error:
+ error_operation_for_server(msg);
+ return ret;
+}
diff --git a/trace-msg.h b/trace-msg.h
new file mode 100644
index 0000000..b23e72b
--- /dev/null
+++ b/trace-msg.h
@@ -0,0 +1,27 @@
+#ifndef _TRACE_MSG_H_
+#define _TRACE_MSG_H_
+
+#include <stdbool.h>
+
+#define UDP_MAX_PACKET (65536 - 20)
+#define V2_MAGIC "677768\0"
+
+#define V1_PROTOCOL 1
+#define V2_PROTOCOL 2
+
+/* for both client and server */
+extern bool use_tcp;
+extern int cpu_count;
+
+/* for client */
+extern unsigned int page_size;
+extern int *client_ports;
+extern bool send_metadata;
+
+/* for server */
+extern bool done;
+
+void plog(const char *fmt, ...);
+void pdie(const char *fmt, ...);
+
+#endif /* _TRACE_MSG_H_ */
diff --git a/trace-output.c b/trace-output.c
index bdb478d..6e1298b 100644
--- a/trace-output.c
+++ b/trace-output.c
@@ -36,6 +36,7 @@
#include <glob.h>

#include "trace-cmd-local.h"
+#include "trace-msg.h"
#include "version.h"

/* We can't depend on the host size for size_t, all must be 64 bit */
@@ -80,6 +81,9 @@ struct list_event_system {
static stsize_t
do_write_check(struct tracecmd_output *handle, void *data, tsize_t size)
{
+ if (send_metadata)
+ return tracecmd_msg_metadata_send(handle->fd, data, size);
+
return __do_write_check(handle->fd, data, size);
}

diff --git a/trace-record.c b/trace-record.c
index 0199627..ebfe6c0 100644
--- a/trace-record.c
+++ b/trace-record.c
@@ -45,6 +45,7 @@
#include <errno.h>

#include "trace-local.h"
+#include "trace-msg.h"

#define _STR(x) #x
#define STR(x) _STR(x)
@@ -59,29 +60,21 @@
#define STAMP "stamp"
#define FUNC_STACK_TRACE "func_stack_trace"

-#define UDP_MAX_PACKET (65536 - 20)
-
static int tracing_on_init_val;

static int rt_prio;

-static int use_tcp;
-
-static unsigned int page_size;
-
static int buffer_size;

static const char *output_file = "trace.dat";

static int latency;
static int sleep_time = 1000;
-static int cpu_count;
static int recorder_threads;
static int *pids;
static int buffers;

static char *host;
-static int *client_ports;
static int sfd;

/* Max size to let a per cpu file get */
@@ -99,6 +92,8 @@ static unsigned recorder_flags;
/* Try a few times to get an accurate date */
static int date2ts_tries = 5;

+static int proto_ver = V2_PROTOCOL;
+
struct func_list {
struct func_list *next;
const char *func;
@@ -1607,20 +1602,26 @@ static int create_recorder(struct buffer_instance *instance, int cpu, int extrac
exit(0);
}

-static void communicate_with_listener(int fd)
+static void check_first_msg_from_server(int fd)
{
char buf[BUFSIZ];
- ssize_t n;
- int cpu, i;

- n = read(fd, buf, 8);
+ read(fd, buf, 8);

/* Make sure the server is the tracecmd server */
if (memcmp(buf, "tracecmd", 8) != 0)
die("server not tracecmd server");
+}

- /* write the number of CPUs we have (in ASCII) */
+static void communicate_with_listener_v1(int fd)
+{
+ char buf[BUFSIZ];
+ ssize_t n;
+ int cpu, i;
+
+ check_first_msg_from_server(fd);

+ /* write the number of CPUs we have (in ASCII) */
sprintf(buf, "%d", cpu_count);

/* include \0 */
@@ -1675,6 +1676,46 @@ static void communicate_with_listener(int fd)
}
}

+static void communicate_with_listener_v2(int fd)
+{
+ if (tracecmd_msg_send_init_data(fd) < 0)
+ die("Cannot communicate with server");
+}
+
+static void check_protocol_version(int fd)
+{
+ char buf[BUFSIZ];
+
+ check_first_msg_from_server(fd);
+
+ /*
+ * Write the protocol version, the magic number, and the dummy
+ * option(0) (in ASCII). The client understands whether the client
+ * uses the v2 protocol or not by checking a reply message from the
+ * server. If the message is "V2", the server uses v2 protocol. On the
+ * other hands, if the message is just number strings, the server
+ * returned port numbers. So, in that time, the client understands the
+ * server uses the v1 protocol. However, the old server tells the
+ * client port numbers after reading cpu_count, page_size, and option.
+ * So, we add the dummy number (the magic number and 0 option) to the
+ * first client message.
+ */
+ write(fd, "V2\0"V2_MAGIC"0", sizeof(V2_MAGIC)+4);
+
+ /* read a reply message */
+ read(fd, buf, BUFSIZ);
+
+ if (!buf[0]) {
+ /* the server uses the v1 protocol, so we'll use it */
+ proto_ver = V1_PROTOCOL;
+ plog("Use the v1 protocol\n");
+ } else {
+ if (memcmp(buf, "V2", 2) != 0)
+ die("Cannot handle the protocol %s", buf);
+ /* OK, let's use v2 protocol */
+ }
+}
+
static void setup_network(void)
{
struct tracecmd_output *handle;
@@ -1703,6 +1744,7 @@ static void setup_network(void)
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;

+again:
s = getaddrinfo(server, port, &hints, &result);
if (s != 0)
die("getaddrinfo: %s", gai_strerror(s));
@@ -1723,16 +1765,32 @@ static void setup_network(void)

freeaddrinfo(result);

- communicate_with_listener(sfd);
+ if (proto_ver == V2_PROTOCOL) {
+ check_protocol_version(sfd);
+ if (proto_ver == V1_PROTOCOL) {
+ /* reconnect to the server for using the v1 protocol */
+ close(sfd);
+ goto again;
+ }
+ communicate_with_listener_v2(sfd);
+ }
+
+ if (proto_ver == V1_PROTOCOL)
+ communicate_with_listener_v1(sfd);

/* Now create the handle through this socket */
handle = tracecmd_create_init_fd_glob(sfd, listed_events);

+ if (proto_ver == V2_PROTOCOL)
+ tracecmd_msg_finish_sending_metadata(sfd);
+
/* OK, we are all set, let'r rip! */
}

static void finish_network(void)
{
+ if (proto_ver == V2_PROTOCOL)
+ tracecmd_msg_send_close_msg();
close(sfd);
free(host);
}

2013-10-11 01:39:41

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

Hi Steven,

Would you review this patch set?

Thanks,
Yoshihiro YUNOMAE

(2013/09/13 11:06), Yoshihiro YUNOMAE wrote:
> Hi Steven,
>
> This is a v2 patch set for realizing a part of "Integrated trace" feature which
> is a trace merging system for a virtualization environment. Currently, trace-cmd
> does not have following features yet:
>
> a) Server and client for a virtualization environment
> b) Structured message platform between guests and host
> c) Agent feature of a client
> d) Merge feature of trace data of multiple guests and host in chronological
> order
>
> This patch set supports above a) and b) features.
>
> <overall view>
>
> +------------+ +------------+
> Guest | a), c) | | a), c) | client/agent
> ^ +------------+ +------------+
> | ^ ^ ^ ^
> ============|===|=================|===|===========
> | v b)v v b)v
> v +----------------------------------+
> Host | a) | server
> +----------------------------------+
> ||output || ||
> \/ \/ \/
> /--------+ /--------+ /--------+
> | 010101 | | 101010 | | 100101 | binary data
> | 010100 | | 010100 | | 110011 |
> +--------+ +--------+ +--------+
> \ /
> \-----------------------------------/
> || d)
> \/
> /-----------------------------------+
> | (guest1) 123456: sched_switch... | text data
> | (guest2) 123458: kmem_free... |
> | (host) 123500: kvm_exit (guest1)|
> | (host) 123510: kvm_entry(guest1)|
> | (guest1) 123550: sched_switch... |
> +-----------------------------------+
>
> a) Server and client for a virtualization environment
> trace-cmd has listen mode for network, but using network will be a high cost
> operation for inducing a lot of memory copying. From kernel-3.6, the
> virtio-console driver supports splice_write and ftrace supports "steal" for
> fops. So, guest clients of trace-cmd can send trace data without copying memory
> by using splice(2). If guest clients use virtio-serial, the server also needs to
> support virtio-serial I/F.
>
> b) Structured message platform between guests and a host
> Currently, a server(clients) sends unstructured character string to
> clients(server), so clients(server) must parse the unstructured messages.
> Since it is hard to add complex contents in the protocol, structured binary
> message trace-msg is introduced as the communication protocol.
>
> c) Agent feature of a client
> Current trace-cmd client can operate only as "record" mode, so the client
> will send trace data to the server immediately. However, when an user tries to
> collect trace data of multiple guests on a host, the user must log in to
> each guest. This is hard to use, I think. So, trace-cmd client had better
> support agent mode which receives a message from the server.
>
> d) Merge feature of trace data of multiple guests and a host in chronological
> order
> Current trace-cmd has a merge feature for multiple machines whose times are
> synchronized by NTP. When we use the feature, we execute "trace-cmd record"
> with --date option on each machine, and then we run "trace-cmd report" with -i
> option for each file.
> However, there are cases that times of those machines cannot be synchronized.
> For example, although multiple users can run guests on virtualization
> environments (e.g. multi-tenant cloud hosting), there are no guarantee that
> they use the same NTP server. Moreover, even if the times are synchronized,
> trace data cannot exactly be merged because the NTP-synchronized time
> granularity may not be enough fine for sorting guest-host switching events.
> So, I'm considering that trace data use x86-tsc as timestamp in order to merge
> trace data. By using x86-tsc, we can merge trace data even if time of those
> machines is not synchronized when CPU has the invariant TSC feature or the
> constant TSC feature. And the precision will be enough for understanding
> operations of guests and host. However, TSC values on a guest are not equal to
> the values on the host because
> TSC_guest = TSC_host + TSC_offset.
> This series actually doesn't support TSC offset, but I'd like to add such
> feature to fix host/guest clock difference in the other series. TSC offset
> values can be gotten as write_tsc_offset trace event from kernel-3.11.
> (see https://lkml.org/lkml/2013/6/12/72)
>
> For a), this patch introduces "virt-server" and "record --virt" modes for
> achieving low-overhead communication of trace data of guests. "virt-server" is a
> server mode for collecting trace data of guests. On the other hand,
> "record --virt" mode is a guest client for sending trace data of the guest.
> Although these functions are similar to "listen" and "record -N" modes each,
> these do not use network but use virtio-serial for low-overhead communication.
>
> For b), this patch series introduce specific message protocol in order to handle
> communication messages with 8 commands. When we extend any messages, using
> structured message will be easier than using unstructured message.
>
> <How to use>
> 1. Run virt-server on a host
> # trace-cmd virt-server
>
> 2. Make guest domain directory
> # mkdir -p /tmp/trace-cmd/virt/<domain>
> # chmod 710 /tmp/trace-cmd/virt/<domain>
> # chgrp qemu /tmp/trace-cmd/virt/<domain>
>
> 3. Make FIFO on the host
> # mkfifo /tmp/trace-cmd/virt/<domain>/trace-path-cpu{0,1,...,X}.{in,out}
>
> 4. Set up of virtio-serial pipe of a guest on the host
> Add the following tags to domain XML files.
> # virsh edit <domain>
> <channel type='unix'>
> <source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
> <target type='virtio' name='agent-ctl-path'/>
> </channel>
> <channel type='pipe'>
> <source path='/tmp/trace-cmd/virt/<domain>/trace-path-cpu0'/>
> <target type='virtio' name='trace-path-cpu0'/>
> </channel>
> ... (cpu1, cpu2, ...)
>
> 5. Boot the guest
> # virsh start <domain>
>
> 6. Execute "record --virt" on the guest
> # trace-cmd record --virt -e sched*
>
> <Result>
> I measured CPU usage outputted by top command on a guest when client sends
> trace data. Client means "record -N"(NW) or "record --virt"(virtio-serial).
>
> NW virtio-serial(splice)
> client(fedora19) ~2.9[%] ~1.7[%]
>
> <Future work>
> - Add an agent mode based on "record --virt"
> - Add a merging feature of trace data of guests and host to "report"
>
> Changes in V2:
> [1/5] Add a comment in open_udp()
> [2/5] Regacy protocol support in order to keep backward compatibility
>
> Thank you,
>
> ---
>
> Yoshihiro YUNOMAE (5):
> [CLEANUP] trace-cmd: Split out binding a port and fork reader from open_udp()
> trace-cmd: Apply the trace-msg protocol for communication between a server and clients
> trace-cmd: Use poll(2) to wait for a message
> trace-cmd: Add virt-server mode for a virtualization environment
> trace-cmd: Add --virt option for record mode
>
>
> Documentation/trace-cmd-record.1.txt | 11
> Documentation/trace-cmd-virt-server.1.txt | 89 +++
> Makefile | 2
> trace-cmd.c | 3
> trace-cmd.h | 14
> trace-listen.c | 601 ++++++++++++++++----
> trace-msg.c | 874 +++++++++++++++++++++++++++++
> trace-msg.h | 31 +
> trace-output.c | 4
> trace-record.c | 146 ++++-
> trace-recorder.c | 54 +-
> trace-usage.c | 10
> 12 files changed, 1678 insertions(+), 161 deletions(-)
> create mode 100644 Documentation/trace-cmd-virt-server.1.txt
> create mode 100644 trace-msg.c
> create mode 100644 trace-msg.h
>

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-10-11 01:46:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

On Fri, 11 Oct 2013 10:39:35 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:

> Hi Steven,
>
> Would you review this patch set?

Yeah, I've been really busy lately and now I'm back to reviewing
patches. I should be able to get to it next week.

-- Steve

>
> Thanks,
> Yoshihiro YUNOMAE
>

2013-10-14 21:26:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

On Fri, 13 Sep 2013 11:06:27 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:


> <How to use>
> 1. Run virt-server on a host
> # trace-cmd virt-server
>
> 2. Make guest domain directory
> # mkdir -p /tmp/trace-cmd/virt/<domain>
> # chmod 710 /tmp/trace-cmd/virt/<domain>
> # chgrp qemu /tmp/trace-cmd/virt/<domain>

Quick comment. I think the above should be done by trace-cmd. At least
have options for it like:

trace-cmd virt-server -d /tmp/trace-cmd/virt/domain -m 710 -g qemu

Perhaps default some of those, and have trace-cmd print out:

Process Directory: /tmp/trace-cmd/virt/domain
Directory permission: 0710
Group: qemu

OK, now to look at the actual code ;-)

-- Steve


>
> 3. Make FIFO on the host
> # mkfifo /tmp/trace-cmd/virt/<domain>/trace-path-cpu{0,1,...,X}.{in,out}
>
> 4. Set up of virtio-serial pipe of a guest on the host
> Add the following tags to domain XML files.
> # virsh edit <domain>
> <channel type='unix'>
> <source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
> <target type='virtio' name='agent-ctl-path'/>
> </channel>
> <channel type='pipe'>
> <source path='/tmp/trace-cmd/virt/<domain>/trace-path-cpu0'/>
> <target type='virtio' name='trace-path-cpu0'/>
> </channel>
> ... (cpu1, cpu2, ...)
>
> 5. Boot the guest
> # virsh start <domain>
>
> 6. Execute "record --virt" on the guest
> # trace-cmd record --virt -e sched*
>

2013-10-15 02:21:22

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

On Fri, 13 Sep 2013 11:06:32 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:

> Apply trace-msg protocol for communication between a server and clients.
>
> Currently, trace-listen(server) and trace-record -N(client) operate as follows:
>
> <server> <client>
> listen to socket fd
> connect to socket fd
> accept the client
> send "tracecmd"
> +------------> receive "tracecmd"
> check "tracecmd"
> send cpus
> receive cpus <------------+
> print "cpus=XXX"
> send pagesize
> |
> receive pagesize <--------+
> print "pagesize=XXX"
> send option
> |
> receive option <----------+
> understand option
> send port_array
> +------------> receive port_array
> understand port_array
> send meta data
> receive meta data <-------+
> record meta data
> (snip)
> read block
> --- start sending trace data on child processes ---
>
> --- When client finishes sending trace data ---
> close(socket fd)
> read size = 0
> close(socket fd)

Note, this patch is filled with whitespace errors. Run checkpatch.pl on
it if you can.

I applied and fixed up the first patch.

Also, when I tested this patch I got:

Running in one terminal:

# trace-cmd listen -p 12345

And then in another terminal:

# trace-cmd record -N localhost:12345 -p function -e all
/debug/tracing/events/*/filter
plugin 'function'
Hit Ctrl^C to stop recording
trace-cmd: Connection refused
trace-cmd: Connection refused
trace-cmd: Connection refused
recorder error in splice output recorder error in splice output

recorder error in splice output
trace-cmd: Connection refused
recorder error in splice output

-- Steve



>
> All messages are unstructured character strings, so server(client) using the
> protocol must parse the unstructured messages. Since it is hard to
> add complex contents in the protocol, structured binary message trace-msg
> is introduced as the communication protocol.
>
> By applying this patch, server and client operate as follows:
>
> <server> <client>
> listen to socket fd
> connect to socket fd
> accept the client
> send "tracecmd"
> +------------> receive "tracecmd"
> check "tracecmd"
> send "V2\0<MAGIC_NUMBER>\00" as the v2 protocol
> receive "V2" <------------+
> check "V2"
> read "<MAGIC_NUMBER>\00"
> send "V2"
> +---------------> receive "V2"
> check "V2"
> send cpus,pagesize,option(MSG_TINIT)
> receive MSG_TINIT <-------+
> print "cpus=XXX"
> print "pagesize=XXX"
> understand option
> send port_array
> +--MSG_RINIT-> receive MSG_RINIT
> understand port_array
> send meta data(MSG_SENDMETA)
> receive MSG_SENDMETA <----+
> record meta data
> (snip)
> send a message to finish sending meta data
> | (MSG_FINMETA)
> receive MSG_FINMETA <-----+
> read block
> --- start sending trace data on child processes ---
>
> --- When client finishes sending trace data ---
> send MSG_CLOSE
> receive MSG_CLOSE <-------+
> close(socket fd) close(socket fd)
>
> By introducing the v2 protocol, after the client checks "tracecmd", the client
> will send "V2\0<MAGIC_NUMBER>\00\0". This complex message is used when the
> new client tries to connect to the old server. The new client wants to check
> whether the reply message from the server is "V2" or not. However, the old
> server does not respond to the client before receiving cpu numbers, page size,
> and options. Each message is separated with "\0" in the old server, so the
> client send "V2" as cpu numbers, "<MAGIC_NUMBER>" as page size, and "0" as
> no options. On the other hands, the old server will understand the messages
> as cpus=0, pagesize=<MAGIC_NUMBER>, and options=0, and then the server will
> send the message "\0" as port numbers. Then, the message which the client
> receives is not "V2" but "\0", so the client will reconnect to the old server
> as the v1 protocol.
>
> Changes in V2: Regacy porotocol support in order to keep backward compatibility
>
> Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
> ---
> Makefile | 2
> trace-cmd.h | 11 +
> trace-listen.c | 133 +++++++----
> trace-msg.c | 683 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> trace-msg.h | 27 ++
> trace-output.c | 4
> trace-record.c | 86 ++++++-
> 7 files changed, 880 insertions(+), 66 deletions(-)
> create mode 100644 trace-msg.c
> create mode 100644 trace-msg.h
>
> diff --git a/Makefile b/Makefile
> index 1964949..054f53d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -314,7 +314,7 @@ KERNEL_SHARK_OBJS = $(TRACE_VIEW_OBJS) $(TRACE_GRAPH_OBJS) $(TRACE_GUI_OBJS) \
> PEVENT_LIB_OBJS = event-parse.o trace-seq.o parse-filter.o parse-utils.o
> TCMD_LIB_OBJS = $(PEVENT_LIB_OBJS) trace-util.o trace-input.o trace-ftrace.o \
> trace-output.o trace-recorder.o trace-restore.o trace-usage.o \
> - trace-blk-hack.o kbuffer-parse.o
> + trace-blk-hack.o kbuffer-parse.o trace-msg.o
>
> PLUGIN_OBJS = plugin_hrtimer.o plugin_kmem.o plugin_sched_switch.o \
> plugin_mac80211.o plugin_jbd2.o plugin_function.o plugin_kvm.o \
> diff --git a/trace-cmd.h b/trace-cmd.h
> index cbbc6ed..a2958ac 100644
> --- a/trace-cmd.h
> +++ b/trace-cmd.h
> @@ -248,6 +248,17 @@ void tracecmd_stop_recording(struct tracecmd_recorder *recorder);
> void tracecmd_stat_cpu(struct trace_seq *s, int cpu);
> long tracecmd_flush_recording(struct tracecmd_recorder *recorder);
>
> +/* for clients */
> +int tracecmd_msg_send_init_data(int fd);
> +int tracecmd_msg_metadata_send(int fd, char *buf, int size);
> +int tracecmd_msg_finish_sending_metadata(int fd);
> +void tracecmd_msg_send_close_msg();
> +
> +/* for server */
> +int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize);
> +int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports);
> +int tracecmd_msg_collect_metadata(int ifd, int ofd);
> +
> /* --- Plugin handling --- */
> extern struct plugin_option trace_ftrace_options[];
>
> diff --git a/trace-listen.c b/trace-listen.c
> index bf187c9..280b1af 100644
> --- a/trace-listen.c
> +++ b/trace-listen.c
> @@ -33,6 +33,7 @@
> #include <errno.h>
>
> #include "trace-local.h"
> +#include "trace-msg.h"
>
> #define MAX_OPTION_SIZE 4096
>
> @@ -45,10 +46,10 @@ static FILE *logfp;
>
> static int debug;
>
> -static int use_tcp;
> -
> static int backlog = 5;
>
> +static int proto_ver;
> +
> #define TEMP_FILE_STR "%s.%s:%s.cpu%d", output_file, host, port, cpu
> static char *get_temp_file(const char *host, const char *port, int cpu)
> {
> @@ -112,10 +113,9 @@ static int process_option(char *option)
> return 0;
> }
>
> -static int done;
> static void finish(int sig)
> {
> - done = 1;
> + done = true;
> }
>
> #define LOG_BUF_SIZE 1024
> @@ -144,7 +144,7 @@ static void __plog(const char *prefix, const char *fmt, va_list ap,
> fprintf(fp, "%.*s", r, buf);
> }
>
> -static void plog(const char *fmt, ...)
> +void plog(const char *fmt, ...)
> {
> va_list ap;
>
> @@ -153,7 +153,7 @@ static void plog(const char *fmt, ...)
> va_end(ap);
> }
>
> -static void pdie(const char *fmt, ...)
> +void pdie(const char *fmt, ...)
> {
> va_list ap;
> char *str = "";
> @@ -324,56 +324,78 @@ static int communicate_with_client(int fd, int *cpus, int *pagesize)
>
> *cpus = atoi(buf);
>
> - plog("cpus=%d\n", *cpus);
> - if (*cpus < 0)
> - return -1;
> + /* Is the client using the new protocol? */
> + if (!*cpus) {
> + if (memcmp(buf, "V2", 2) != 0) {
> + plog("Cannot handle the protocol %s", buf);
> + return -1;
> + }
>
> - /* next read the page size */
> - n = read_string(fd, buf, BUFSIZ);
> - if (n == BUFSIZ)
> - /** ERROR **/
> - return -1;
> + /* read the rest of dummy data, but not use */
> + read(fd, buf, sizeof(V2_MAGIC)+1);
>
> - *pagesize = atoi(buf);
> + proto_ver = V2_PROTOCOL;
>
> - plog("pagesize=%d\n", *pagesize);
> - if (*pagesize <= 0)
> - return -1;
> + /* Let the client know we use v2 protocol */
> + write(fd, "V2", 2);
>
> - /* Now the number of options */
> - n = read_string(fd, buf, BUFSIZ);
> - if (n == BUFSIZ)
> - /** ERROR **/
> - return -1;
> + /* read the CPU count, the page size, and options */
> + if (tracecmd_msg_initial_setting(fd, cpus, pagesize) < 0)
> + return -1;
> + } else {
> + /* The client is using the v1 protocol */
>
> - options = atoi(buf);
> + plog("cpus=%d\n", *cpus);
> + if (*cpus < 0)
> + return -1;
>
> - for (i = 0; i < options; i++) {
> - /* next is the size of the options */
> + /* next read the page size */
> n = read_string(fd, buf, BUFSIZ);
> if (n == BUFSIZ)
> /** ERROR **/
> return -1;
> - size = atoi(buf);
> - /* prevent a client from killing us */
> - if (size > MAX_OPTION_SIZE)
> +
> + *pagesize = atoi(buf);
> +
> + plog("pagesize=%d\n", *pagesize);
> + if (*pagesize <= 0)
> return -1;
> - option = malloc_or_die(size);
> - do {
> - t = size;
> - s = 0;
> - s = read(fd, option+s, t);
> - if (s <= 0)
> - return -1;
> - t -= s;
> - s = size - t;
> - } while (t);
>
> - s = process_option(option);
> - free(option);
> - /* do we understand this option? */
> - if (!s)
> + /* Now the number of options */
> + n = read_string(fd, buf, BUFSIZ);
> + if (n == BUFSIZ)
> + /** ERROR **/
> return -1;
> +
> + options = atoi(buf);
> +
> + for (i = 0; i < options; i++) {
> + /* next is the size of the options */
> + n = read_string(fd, buf, BUFSIZ);
> + if (n == BUFSIZ)
> + /** ERROR **/
> + return -1;
> + size = atoi(buf);
> + /* prevent a client from killing us */
> + if (size > MAX_OPTION_SIZE)
> + return -1;
> + option = malloc_or_die(size);
> + do {
> + t = size;
> + s = 0;
> + s = read(fd, option+s, t);
> + if (s <= 0)
> + return -1;
> + t -= s;
> + s = size - t;
> + } while (t);
> +
> + s = process_option(option);
> + free(option);
> + /* do we understand this option? */
> + if (!s)
> + return -1;
> + }
> }
>
> if (use_tcp)
> @@ -442,14 +464,20 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
> start_port = udp_port + 1;
> }
>
> - /* send the client a comma deliminated set of port numbers */
> - for (cpu = 0; cpu < cpus; cpu++) {
> - snprintf(buf, BUFSIZ, "%s%d",
> - cpu ? "," : "", port_array[cpu]);
> - write(fd, buf, strlen(buf));
> + if (proto_ver == V2_PROTOCOL) {
> + /* send set of port numbers to the client */
> + if (tracecmd_msg_send_port_array(fd, cpus, port_array) < 0)
> + goto out_free;
> + } else {
> + /* send the client a comma deliminated set of port numbers */
> + for (cpu = 0; cpu < cpus; cpu++) {
> + snprintf(buf, BUFSIZ, "%s%d",
> + cpu ? "," : "", port_array[cpu]);
> + write(fd, buf, strlen(buf));
> + }
> + /* end with null terminator */
> + write(fd, "\0", 1);
> }
> - /* end with null terminator */
> - write(fd, "\0", 1);
>
> return pid_array;
>
> @@ -528,7 +556,10 @@ static void process_client(const char *node, const char *port, int fd)
> return;
>
> /* Now we are ready to start reading data from the client */
> - collect_metadata_from_client(fd, ofd);
> + if (proto_ver == V2_PROTOCOL)
> + tracecmd_msg_collect_metadata(fd, ofd);
> + else
> + collect_metadata_from_client(fd, ofd);
>
> /* wait a little to let our readers finish reading */
> sleep(1);
> diff --git a/trace-msg.c b/trace-msg.c
> new file mode 100644
> index 0000000..cf82ff6
> --- /dev/null
> +++ b/trace-msg.c
> @@ -0,0 +1,683 @@
> +/*
> + * trace-msg.c : define message protocol for communication between clients and
> + * a server
> + *
> + * Copyright (C) 2013 Hitachi, Ltd.
> + * Created by Yoshihiro YUNOMAE <[email protected]>
> + *
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; version 2 of the License (not later!)
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses>
> + *
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + */
> +
> +#include <errno.h>
> +#include <poll.h>
> +#include <fcntl.h>
> +#include <limits.h>
> +#include <stddef.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <arpa/inet.h>
> +#include <sys/types.h>
> +#include <linux/types.h>
> +
> +#include "trace-cmd-local.h"
> +#include "trace-msg.h"
> +
> +typedef __u32 u32;
> +typedef __be32 be32;
> +
> +#define TRACECMD_MSG_MAX_LEN BUFSIZ
> +
> + /* size + cmd */
> +#define TRACECMD_MSG_HDR_LEN ((sizeof(be32)) + (sizeof(be32)))
> +
> + /* + size of the metadata */
> +#define TRACECMD_MSG_META_MIN_LEN \
> + ((TRACECMD_MSG_HDR_LEN) + (sizeof(be32)))
> +
> + /* - header size for error msg */
> +#define TRACECMD_MSG_META_MAX_LEN \
> +((TRACECMD_MSG_MAX_LEN) - (TRACECMD_MSG_META_MIN_LEN) - TRACECMD_MSG_HDR_LEN)
> +
> + /* size + opt_cmd + size of str */
> +#define TRACECMD_OPT_MIN_LEN \
> + ((sizeof(be32)) + (sizeof(be32)) +(sizeof(be32)))
> +
> +
> +#define CPU_MAX 256
> +
> +/* for both client and server */
> +bool use_tcp;
> +int cpu_count;
> +
> +/* for client */
> +static int psfd;
> +unsigned int page_size;
> +int *client_ports;
> +bool send_metadata;
> +
> +/* for server */
> +static int *port_array;
> +bool done;
> +
> +struct tracecmd_msg_str {
> + be32 size;
> + char *buf;
> +} __attribute__((packed));
> +
> +struct tracecmd_msg_opt {
> + be32 size;
> + be32 opt_cmd;
> + struct tracecmd_msg_str str;
> +};
> +
> +struct tracecmd_msg_tinit {
> + be32 cpus;
> + be32 page_size;
> + be32 opt_num;
> + struct tracecmd_msg_opt *opt;
> +} __attribute__((packed));
> +
> +struct tracecmd_msg_rinit {
> + be32 cpus;
> + be32 port_array[CPU_MAX];
> +} __attribute__((packed));
> +
> +struct tracecmd_msg_meta {
> + struct tracecmd_msg_str str;
> +};
> +
> +struct tracecmd_msg_error {
> + be32 size;
> + be32 cmd;
> + union {
> + struct tracecmd_msg_tinit tinit;
> + struct tracecmd_msg_rinit rinit;
> + struct tracecmd_msg_meta meta;
> + } data;
> +} __attribute__((packed));
> +
> +enum tracecmd_msg_cmd {
> + MSG_CLOSE = 1,
> + MSG_TINIT = 4,
> + MSG_RINIT = 5,
> + MSG_SENDMETA = 6,
> + MSG_FINMETA = 7,
> +};
> +
> +struct tracecmd_msg {
> + be32 size;
> + be32 cmd;
> + union {
> + struct tracecmd_msg_tinit tinit;
> + struct tracecmd_msg_rinit rinit;
> + struct tracecmd_msg_meta meta;
> + struct tracecmd_msg_error err;
> + } data;
> +} __attribute__((packed));
> +
> +struct tracecmd_msg *errmsg;
> +
> +static ssize_t msg_do_write_check(int fd, struct tracecmd_msg *msg)
> +{
> + return __do_write_check(fd, msg, ntohl(msg->size));
> +}
> +
> +static struct tracecmd_msg *tracecmd_msg_alloc(u32 size)
> +{
> + size += TRACECMD_MSG_HDR_LEN;
> + return malloc(size);
> +}
> +
> +static void tracecmd_msg_init(u32 cmd, u32 size, struct tracecmd_msg *msg)
> +{
> + size += TRACECMD_MSG_HDR_LEN;
> + memset(msg, 0, size);
> + msg->size = htonl(size);
> + msg->cmd = htonl(cmd);
> +}
> +
> +static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
> +{
> + memcpy(dest+offset, buf, buflen);
> +}
> +
> +enum msg_opt_command {
> + MSGOPT_USETCP = 1,
> +};
> +
> +static struct tracecmd_msg_opt *tracecmd_msg_opt_alloc(u32 len)
> +{
> + len += TRACECMD_OPT_MIN_LEN;
> + return malloc(len);
> +}
> +
> +static void make_option(int opt_cmd, const char *buf,
> + struct tracecmd_msg_opt *opt)
> +{
> + u32 buflen = 0;
> + u32 size = TRACECMD_OPT_MIN_LEN;
> +
> + if (buf) {
> + buflen = strlen(buf);
> + size += buflen;
> + }
> +
> + opt->size = htonl(size);
> + opt->opt_cmd = htonl(opt_cmd);
> + opt->str.size = htonl(buflen);
> +
> + if (buf)
> + bufcpy(opt, TRACECMD_OPT_MIN_LEN, buf, buflen);
> +}
> +
> +static int add_options_to_tinit(u32 len, struct tracecmd_msg *msg)
> +{
> + struct tracecmd_msg_opt *opt;
> + int offset = offsetof(struct tracecmd_msg, data.tinit.opt);
> +
> + if (use_tcp) {
> + opt = tracecmd_msg_opt_alloc(0);
> + if (!opt)
> + return -ENOMEM;
> +
> + make_option(MSGOPT_USETCP, NULL, opt);
> + /* add option */
> + bufcpy(msg, offset, opt, ntohl(opt->size));
> + free(opt);
> + }
> +
> + return 0;
> +}
> +
> +static int make_tinit(u32 len, struct tracecmd_msg *msg)
> +{
> + int opt_num = 0;
> + int ret = 0;
> +
> + if (use_tcp)
> + opt_num++;
> +
> + if (opt_num) {
> + ret = add_options_to_tinit(len, msg);
> + if (ret < 0)
> + return ret;
> + }
> +
> + msg->data.tinit.cpus = htonl(cpu_count);
> + msg->data.tinit.page_size = htonl(page_size);
> + msg->data.tinit.opt_num = htonl(opt_num);
> +
> + return 0;
> +}
> +
> +static int make_rinit(struct tracecmd_msg *msg)
> +{
> + int i;
> + u32 offset = TRACECMD_MSG_HDR_LEN;
> + be32 port;
> +
> + msg->data.rinit.cpus = htonl(cpu_count);
> +
> + for (i = 0; i < cpu_count; i++) {
> + /* + rrqports->cpus or rrqports->port_array[i] */
> + offset += sizeof(be32);
> + port = htonl(port_array[i]);
> + bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
> + }
> +
> + return 0;
> +}
> +
> +static u32 tracecmd_msg_get_body_length(u32 cmd)
> +{
> + struct tracecmd_msg *msg;
> + u32 len = 0;
> +
> + switch (cmd) {
> + case MSG_TINIT:
> + len = sizeof(msg->data.tinit.cpus)
> + + sizeof(msg->data.tinit.page_size)
> + + sizeof(msg->data.tinit.opt_num);
> +
> + /*
> + * If we are using IPV4 and our page size is greater than
> + * or equal to 64K, we need to punt and use TCP. :-(
> + */
> +
> + /* TODO, test for ipv4 */
> + if (page_size >= UDP_MAX_PACKET) {
> + warning("page size too big for UDP using TCP "
> + "in live read");
> + use_tcp = true;
> + }
> +
> + if (use_tcp)
> + len += TRACECMD_OPT_MIN_LEN;
> +
> + return len;
> + case MSG_RINIT:
> + return sizeof(msg->data.rinit.cpus)
> + + sizeof(msg->data.rinit.port_array);
> + case MSG_SENDMETA:
> + return TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN;
> + case MSG_CLOSE:
> + case MSG_FINMETA:
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
> +{
> + switch (cmd) {
> + case MSG_TINIT:
> + return make_tinit(len, msg);
> + case MSG_RINIT:
> + return make_rinit(msg);
> + case MSG_CLOSE:
> + case MSG_SENDMETA: /* meta data is not stored here. */
> + case MSG_FINMETA:
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static int tracecmd_msg_create(u32 cmd, struct tracecmd_msg **msg)
> +{
> + u32 len = 0;
> + int ret = 0;
> +
> + len = tracecmd_msg_get_body_length(cmd);
> + if (len > (TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN)) {
> + plog("Exceed maximum message size cmd=%d\n", cmd);
> + return -EINVAL;
> + }
> +
> + *msg = tracecmd_msg_alloc(len);
> + if (!*msg)
> + return -ENOMEM;
> + tracecmd_msg_init(cmd, len, *msg);
> +
> + ret = tracecmd_msg_make_body(cmd, len, *msg);
> + if (ret < 0)
> + free(*msg);
> +
> + return ret;
> +}
> +
> +static int tracecmd_msg_send(int fd, u32 cmd)
> +{
> + struct tracecmd_msg *msg = NULL;
> + int ret = 0;
> +
> + if (cmd > MSG_FINMETA) {
> + plog("Unsupported command: %d\n", cmd);
> + return -EINVAL;
> + }
> +
> + ret = tracecmd_msg_create(cmd, &msg);
> + if (ret < 0)
> + return ret;
> +
> + ret = msg_do_write_check(fd, msg);
> + if (ret < 0) {
> + free(msg);
> + return -ECOMM;
> + }
> +
> + return 0;
> +}
> +
> +static int tracecmd_msg_read_extra(int fd, char *buf, u32 size, int *n)
> +{
> + int r = 0;
> +
> + do {
> + r = read(fd, buf+*n, size);
> + if (r < 0) {
> + if (errno == EINTR)
> + continue;
> + return -errno;
> + } else if (!r)
> + return -ENOTCONN;
> + size -= r;
> + *n += r;
> + } while (size);
> +
> + return 0;
> +}
> +
> +/*
> + * Read header information of msg first, then read all data
> + */
> +static int tracecmd_msg_recv(int fd, char *buf)
> +{
> + struct tracecmd_msg *msg;
> + u32 size = 0;
> + int n = 0;
> + int ret;
> +
> + ret = tracecmd_msg_read_extra(fd, buf, TRACECMD_MSG_HDR_LEN, &n);
> + if (ret < 0)
> + return ret;
> +
> + msg = (struct tracecmd_msg *)buf;
> + size = ntohl(msg->size);
> + if (size > TRACECMD_MSG_MAX_LEN)
> + /* too big */
> + goto error;
> + else if (size < TRACECMD_MSG_HDR_LEN)
> + /* too small */
> + goto error;
> + else if (size > TRACECMD_MSG_HDR_LEN) {
> + size -= TRACECMD_MSG_HDR_LEN;
> + return tracecmd_msg_read_extra(fd, buf, size, &n);
> + }
> +
> + return 0;
> +error:
> + plog("Receive an invalid message(size=%d)\n", size);
> + return -ENOMSG;
> +}
> +
> +static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
> +{
> + return (void *)msg + offset;
> +}
> +
> +static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
> +{
> + char msg_tmp[TRACECMD_MSG_MAX_LEN];
> + u32 cmd;
> + int ret;
> +
> + ret = tracecmd_msg_recv(fd, msg_tmp);
> + if (ret < 0)
> + return ret;
> +
> + *msg = (struct tracecmd_msg *)msg_tmp;
> + cmd = ntohl((*msg)->cmd);
> + if (cmd == MSG_CLOSE)
> + return -ECONNABORTED;
> +
> + return 0;
> +}
> +
> +static int tracecmd_msg_send_and_wait_for_msg(int fd, u32 cmd, struct tracecmd_msg **msg)
> +{
> + int ret;
> +
> + ret = tracecmd_msg_send(fd, cmd);
> + if (ret < 0)
> + return ret;
> +
> + ret = tracecmd_msg_wait_for_msg(fd, msg);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +int tracecmd_msg_send_init_data(int fd)
> +{
> + struct tracecmd_msg *msg;
> + int i, cpus;
> + int ret;
> +
> + ret = tracecmd_msg_send_and_wait_for_msg(fd, MSG_TINIT, &msg);
> + if (ret < 0)
> + return ret;
> +
> + cpus = ntohl(msg->data.rinit.cpus);
> + client_ports = malloc_or_die(sizeof(int) * cpus);
> + for (i = 0; i < cpus; i++)
> + client_ports[i] = ntohl(msg->data.rinit.port_array[i]);
> +
> + /* Next, send meta data */
> + send_metadata = true;
> +
> + return 0;
> +}
> +
> +static bool process_option(struct tracecmd_msg_opt *opt)
> +{
> + /* currently the only option we have is to us TCP */
> + if (ntohl(opt->opt_cmd) == MSGOPT_USETCP) {
> + use_tcp = true;
> + return true;
> + }
> + return false;
> +}
> +
> +static void error_operation_for_server(struct tracecmd_msg *msg)
> +{
> + u32 cmd;
> +
> + cmd = ntohl(msg->cmd);
> +
> + warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
> +}
> +
> +#define MAX_OPTION_SIZE 4096
> +
> +int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize)
> +{
> + struct tracecmd_msg *msg;
> + struct tracecmd_msg_opt *opt;
> + char buf[TRACECMD_MSG_MAX_LEN];
> + int offset = offsetof(struct tracecmd_msg, data.tinit.opt);
> + int options, i, s;
> + int ret;
> + u32 size = 0;
> + u32 cmd;
> +
> + ret = tracecmd_msg_recv(fd, buf);
> + if (ret < 0)
> + return ret;
> +
> + msg = (struct tracecmd_msg *)buf;
> + cmd = ntohl(msg->cmd);
> + if (cmd != MSG_TINIT) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + *cpus = ntohl(msg->data.tinit.cpus);
> + plog("cpus=%d\n", *cpus);
> + if (*cpus < 0) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + *pagesize = ntohl(msg->data.tinit.page_size);
> + plog("pagesize=%d\n", *pagesize);
> + if (*pagesize <= 0) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + options = ntohl(msg->data.tinit.opt_num);
> + for (i = 0; i < options; i++) {
> + offset += size;
> + opt = tracecmd_msg_buf_access(msg, offset);
> + size = ntohl(opt->size);
> + /* prevent a client from killing us */
> + if (size > MAX_OPTION_SIZE) {
> + plog("Exceed MAX_OPTION_SIZE\n");
> + ret = -EINVAL;
> + goto error;
> + }
> + s = process_option(opt);
> + /* do we understand this option? */
> + if (!s) {
> + plog("Cannot understand(%d:%d:%d)\n",
> + i, ntohl(opt->size), ntohl(opt->opt_cmd));
> + ret = -EINVAL;
> + goto error;
> + }
> + }
> +
> + return 0;
> +
> +error:
> + error_operation_for_server(msg);
> + return ret;
> +}
> +
> +int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports)
> +{
> + int ret;
> +
> + cpu_count = total_cpus;
> + port_array = ports;
> +
> + ret = tracecmd_msg_send(fd, MSG_RINIT);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +void tracecmd_msg_send_close_msg()
> +{
> + tracecmd_msg_send(psfd, MSG_CLOSE);
> +}
> +
> +static void make_meta(const char *buf, int buflen, struct tracecmd_msg *msg)
> +{
> + int offset = offsetof(struct tracecmd_msg, data.meta.str.buf);
> +
> + msg->data.meta.str.size = htonl(buflen);
> + bufcpy(msg, offset, buf, buflen);
> +}
> +
> +int tracecmd_msg_metadata_send(int fd, char *buf, int size)
> +{
> + struct tracecmd_msg *msg;
> + int n, len;
> + int ret;
> + int count = 0;
> +
> + ret = tracecmd_msg_create(MSG_SENDMETA, &msg);
> + if (ret < 0)
> + return ret;
> +
> + n = size;
> + do {
> + if (n > TRACECMD_MSG_META_MAX_LEN) {
> + make_meta(buf+count, TRACECMD_MSG_META_MAX_LEN, msg);
> + n -= TRACECMD_MSG_META_MAX_LEN;
> + count += TRACECMD_MSG_META_MAX_LEN;
> + } else {
> + make_meta(buf+count, n, msg);
> + /*
> + * TRACECMD_MSG_META_MAX_LEN is stored in msg->size,
> + * so update the size to the correct value.
> + */
> + len = TRACECMD_MSG_META_MIN_LEN + n;
> + msg->size = htonl(len);
> + n = 0;
> + }
> +
> + ret = msg_do_write_check(fd, msg);
> + if (ret < 0)
> + return ret;
> + } while (n);
> +
> + return 0;
> +}
> +
> +int tracecmd_msg_finish_sending_metadata(int fd)
> +{
> + int ret;
> +
> + ret = tracecmd_msg_send(fd, MSG_FINMETA);
> + if (ret < 0)
> + return ret;
> +
> + /* psfd will be used for closing */
> + psfd = fd;
> + return 0;
> +}
> +
> +int tracecmd_msg_collect_metadata(int ifd, int ofd)
> +{
> + struct tracecmd_msg *msg;
> + char buf[TRACECMD_MSG_MAX_LEN];
> + u32 s, t, n, cmd;
> + int offset = TRACECMD_MSG_META_MIN_LEN;
> + int ret;
> +
> + do {
> + ret = tracecmd_msg_recv(ifd, buf);
> + if (ret < 0) {
> + warning("reading client");
> + return ret;
> + }
> +
> + msg = (struct tracecmd_msg *)buf;
> + cmd = ntohl(msg->cmd);
> + if (cmd == MSG_FINMETA) {
> + /* Finish receiving meta data */
> + break;
> + } else if (cmd != MSG_SENDMETA)
> + goto error;
> +
> + n = ntohl(msg->data.meta.str.size);
> + t = n;
> + s = 0;
> + do {
> + s = write(ofd, buf+s+offset, t);
> + if (s < 0) {
> + if (errno == EINTR)
> + continue;
> + warning("writing to file");
> + return -errno;
> + }
> + t -= s;
> + s = n - t;
> + } while (t);
> + } while (cmd == MSG_SENDMETA);
> +
> + /* check the finish message of the client */
> + while(!done) {
> + ret = tracecmd_msg_recv(ifd, buf);
> + if (ret < 0) {
> + warning("reading client");
> + return ret;
> + }
> +
> + msg = (struct tracecmd_msg *)buf;
> + cmd = ntohl(msg->cmd);
> + if (cmd == MSG_CLOSE)
> + /* Finish this connection */
> + break;
> + else {
> + warning("Not accept the message %d", ntohl(msg->cmd));
> + ret = -EINVAL;
> + goto error;
> + }
> + }
> +
> + return 0;
> +
> +error:
> + error_operation_for_server(msg);
> + return ret;
> +}
> diff --git a/trace-msg.h b/trace-msg.h
> new file mode 100644
> index 0000000..b23e72b
> --- /dev/null
> +++ b/trace-msg.h
> @@ -0,0 +1,27 @@
> +#ifndef _TRACE_MSG_H_
> +#define _TRACE_MSG_H_
> +
> +#include <stdbool.h>
> +
> +#define UDP_MAX_PACKET (65536 - 20)
> +#define V2_MAGIC "677768\0"
> +
> +#define V1_PROTOCOL 1
> +#define V2_PROTOCOL 2
> +
> +/* for both client and server */
> +extern bool use_tcp;
> +extern int cpu_count;
> +
> +/* for client */
> +extern unsigned int page_size;
> +extern int *client_ports;
> +extern bool send_metadata;
> +
> +/* for server */
> +extern bool done;
> +
> +void plog(const char *fmt, ...);
> +void pdie(const char *fmt, ...);
> +
> +#endif /* _TRACE_MSG_H_ */
> diff --git a/trace-output.c b/trace-output.c
> index bdb478d..6e1298b 100644
> --- a/trace-output.c
> +++ b/trace-output.c
> @@ -36,6 +36,7 @@
> #include <glob.h>
>
> #include "trace-cmd-local.h"
> +#include "trace-msg.h"
> #include "version.h"
>
> /* We can't depend on the host size for size_t, all must be 64 bit */
> @@ -80,6 +81,9 @@ struct list_event_system {
> static stsize_t
> do_write_check(struct tracecmd_output *handle, void *data, tsize_t size)
> {
> + if (send_metadata)
> + return tracecmd_msg_metadata_send(handle->fd, data, size);
> +
> return __do_write_check(handle->fd, data, size);
> }
>
> diff --git a/trace-record.c b/trace-record.c
> index 0199627..ebfe6c0 100644
> --- a/trace-record.c
> +++ b/trace-record.c
> @@ -45,6 +45,7 @@
> #include <errno.h>
>
> #include "trace-local.h"
> +#include "trace-msg.h"
>
> #define _STR(x) #x
> #define STR(x) _STR(x)
> @@ -59,29 +60,21 @@
> #define STAMP "stamp"
> #define FUNC_STACK_TRACE "func_stack_trace"
>
> -#define UDP_MAX_PACKET (65536 - 20)
> -
> static int tracing_on_init_val;
>
> static int rt_prio;
>
> -static int use_tcp;
> -
> -static unsigned int page_size;
> -
> static int buffer_size;
>
> static const char *output_file = "trace.dat";
>
> static int latency;
> static int sleep_time = 1000;
> -static int cpu_count;
> static int recorder_threads;
> static int *pids;
> static int buffers;
>
> static char *host;
> -static int *client_ports;
> static int sfd;
>
> /* Max size to let a per cpu file get */
> @@ -99,6 +92,8 @@ static unsigned recorder_flags;
> /* Try a few times to get an accurate date */
> static int date2ts_tries = 5;
>
> +static int proto_ver = V2_PROTOCOL;
> +
> struct func_list {
> struct func_list *next;
> const char *func;
> @@ -1607,20 +1602,26 @@ static int create_recorder(struct buffer_instance *instance, int cpu, int extrac
> exit(0);
> }
>
> -static void communicate_with_listener(int fd)
> +static void check_first_msg_from_server(int fd)
> {
> char buf[BUFSIZ];
> - ssize_t n;
> - int cpu, i;
>
> - n = read(fd, buf, 8);
> + read(fd, buf, 8);
>
> /* Make sure the server is the tracecmd server */
> if (memcmp(buf, "tracecmd", 8) != 0)
> die("server not tracecmd server");
> +}
>
> - /* write the number of CPUs we have (in ASCII) */
> +static void communicate_with_listener_v1(int fd)
> +{
> + char buf[BUFSIZ];
> + ssize_t n;
> + int cpu, i;
> +
> + check_first_msg_from_server(fd);
>
> + /* write the number of CPUs we have (in ASCII) */
> sprintf(buf, "%d", cpu_count);
>
> /* include \0 */
> @@ -1675,6 +1676,46 @@ static void communicate_with_listener(int fd)
> }
> }
>
> +static void communicate_with_listener_v2(int fd)
> +{
> + if (tracecmd_msg_send_init_data(fd) < 0)
> + die("Cannot communicate with server");
> +}
> +
> +static void check_protocol_version(int fd)
> +{
> + char buf[BUFSIZ];
> +
> + check_first_msg_from_server(fd);
> +
> + /*
> + * Write the protocol version, the magic number, and the dummy
> + * option(0) (in ASCII). The client understands whether the client
> + * uses the v2 protocol or not by checking a reply message from the
> + * server. If the message is "V2", the server uses v2 protocol. On the
> + * other hands, if the message is just number strings, the server
> + * returned port numbers. So, in that time, the client understands the
> + * server uses the v1 protocol. However, the old server tells the
> + * client port numbers after reading cpu_count, page_size, and option.
> + * So, we add the dummy number (the magic number and 0 option) to the
> + * first client message.
> + */
> + write(fd, "V2\0"V2_MAGIC"0", sizeof(V2_MAGIC)+4);
> +
> + /* read a reply message */
> + read(fd, buf, BUFSIZ);
> +
> + if (!buf[0]) {
> + /* the server uses the v1 protocol, so we'll use it */
> + proto_ver = V1_PROTOCOL;
> + plog("Use the v1 protocol\n");
> + } else {
> + if (memcmp(buf, "V2", 2) != 0)
> + die("Cannot handle the protocol %s", buf);
> + /* OK, let's use v2 protocol */
> + }
> +}
> +
> static void setup_network(void)
> {
> struct tracecmd_output *handle;
> @@ -1703,6 +1744,7 @@ static void setup_network(void)
> hints.ai_family = AF_UNSPEC;
> hints.ai_socktype = SOCK_STREAM;
>
> +again:
> s = getaddrinfo(server, port, &hints, &result);
> if (s != 0)
> die("getaddrinfo: %s", gai_strerror(s));
> @@ -1723,16 +1765,32 @@ static void setup_network(void)
>
> freeaddrinfo(result);
>
> - communicate_with_listener(sfd);
> + if (proto_ver == V2_PROTOCOL) {
> + check_protocol_version(sfd);
> + if (proto_ver == V1_PROTOCOL) {
> + /* reconnect to the server for using the v1 protocol */
> + close(sfd);
> + goto again;
> + }
> + communicate_with_listener_v2(sfd);
> + }
> +
> + if (proto_ver == V1_PROTOCOL)
> + communicate_with_listener_v1(sfd);
>
> /* Now create the handle through this socket */
> handle = tracecmd_create_init_fd_glob(sfd, listed_events);
>
> + if (proto_ver == V2_PROTOCOL)
> + tracecmd_msg_finish_sending_metadata(sfd);
> +
> /* OK, we are all set, let'r rip! */
> }
>
> static void finish_network(void)
> {
> + if (proto_ver == V2_PROTOCOL)
> + tracecmd_msg_send_close_msg();
> close(sfd);
> free(host);
> }

2013-10-17 06:32:46

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: Re: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

Hi Steven,

(2013/10/15 6:26), Steven Rostedt wrote:
> On Fri, 13 Sep 2013 11:06:27 +0900
> Yoshihiro YUNOMAE <[email protected]> wrote:
>
>
>> <How to use>
>> 1. Run virt-server on a host
>> # trace-cmd virt-server
>>
>> 2. Make guest domain directory
>> # mkdir -p /tmp/trace-cmd/virt/<domain>
>> # chmod 710 /tmp/trace-cmd/virt/<domain>
>> # chgrp qemu /tmp/trace-cmd/virt/<domain>
>
> Quick comment. I think the above should be done by trace-cmd. At least
> have options for it like:
>
> trace-cmd virt-server -d /tmp/trace-cmd/virt/domain -m 710 -g qemu
>
> Perhaps default some of those, and have trace-cmd print out:
>
> Process Directory: /tmp/trace-cmd/virt/domain
> Directory permission: 0710
> Group: qemu

OK. As you say, when we know domains which we will boot, trace-cmd
should make those automatically. So, I'll add this feature.
Note that if we don't know domains when we boot virt-sevrer, we must
make those manually now.

In this patch set, virt-server always uses /tmp/trace-cmd/virt, so
we had better indicate only the domain name with d option, I think.

trace-cmd virt-server -d domain -m 710 -g qemu

What do you think about this?

Thanks,

Yoshihiro YUNOMAE

> OK, now to look at the actual code ;-)
>
> -- Steve
>
>
>>
>> 3. Make FIFO on the host
>> # mkfifo /tmp/trace-cmd/virt/<domain>/trace-path-cpu{0,1,...,X}.{in,out}
>>
>> 4. Set up of virtio-serial pipe of a guest on the host
>> Add the following tags to domain XML files.
>> # virsh edit <domain>
>> <channel type='unix'>
>> <source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
>> <target type='virtio' name='agent-ctl-path'/>
>> </channel>
>> <channel type='pipe'>
>> <source path='/tmp/trace-cmd/virt/<domain>/trace-path-cpu0'/>
>> <target type='virtio' name='trace-path-cpu0'/>
>> </channel>
>> ... (cpu1, cpu2, ...)
>>
>> 5. Boot the guest
>> # virsh start <domain>
>>
>> 6. Execute "record --virt" on the guest
>> # trace-cmd record --virt -e sched*
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-10-17 06:34:26

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: Re: Re: [PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

(2013/10/15 11:21), Steven Rostedt wrote:
> On Fri, 13 Sep 2013 11:06:32 +0900
> Yoshihiro YUNOMAE <[email protected]> wrote:
>
>> Apply trace-msg protocol for communication between a server and clients.
>>
>> Currently, trace-listen(server) and trace-record -N(client) operate as follows:
>>
>> <server> <client>
>> listen to socket fd
>> connect to socket fd
>> accept the client
>> send "tracecmd"
>> +------------> receive "tracecmd"
>> check "tracecmd"
>> send cpus
>> receive cpus <------------+
>> print "cpus=XXX"
>> send pagesize
>> |
>> receive pagesize <--------+
>> print "pagesize=XXX"
>> send option
>> |
>> receive option <----------+
>> understand option
>> send port_array
>> +------------> receive port_array
>> understand port_array
>> send meta data
>> receive meta data <-------+
>> record meta data
>> (snip)
>> read block
>> --- start sending trace data on child processes ---
>>
>> --- When client finishes sending trace data ---
>> close(socket fd)
>> read size = 0
>> close(socket fd)
>
> Note, this patch is filled with whitespace errors. Run checkpatch.pl on
> it if you can.

Oh sorry. I'll check it.

> I applied and fixed up the first patch.
>
> Also, when I tested this patch I got:
>
> Running in one terminal:
>
> # trace-cmd listen -p 12345
>
> And then in another terminal:
>
> # trace-cmd record -N localhost:12345 -p function -e all
> /debug/tracing/events/*/filter
> plugin 'function'
> Hit Ctrl^C to stop recording
> trace-cmd: Connection refused
> trace-cmd: Connection refused
> trace-cmd: Connection refused
> recorder error in splice output recorder error in splice output
>
> recorder error in splice output
> trace-cmd: Connection refused
> recorder error in splice output

It seems to be not due to applying my patch.
We cannot use "localhost" for trace-cmd(v1.2).
When we use "127.0.0.1", this problem does not occur.

Thanks,
Yoshihiro YUNOMAE

> -- Steve
>
>
>
>>
>> All messages are unstructured character strings, so server(client) using the
>> protocol must parse the unstructured messages. Since it is hard to
>> add complex contents in the protocol, structured binary message trace-msg
>> is introduced as the communication protocol.
>>
>> By applying this patch, server and client operate as follows:
>>
>> <server> <client>
>> listen to socket fd
>> connect to socket fd
>> accept the client
>> send "tracecmd"
>> +------------> receive "tracecmd"
>> check "tracecmd"
>> send "V2\0<MAGIC_NUMBER>\00" as the v2 protocol
>> receive "V2" <------------+
>> check "V2"
>> read "<MAGIC_NUMBER>\00"
>> send "V2"
>> +---------------> receive "V2"
>> check "V2"
>> send cpus,pagesize,option(MSG_TINIT)
>> receive MSG_TINIT <-------+
>> print "cpus=XXX"
>> print "pagesize=XXX"
>> understand option
>> send port_array
>> +--MSG_RINIT-> receive MSG_RINIT
>> understand port_array
>> send meta data(MSG_SENDMETA)
>> receive MSG_SENDMETA <----+
>> record meta data
>> (snip)
>> send a message to finish sending meta data
>> | (MSG_FINMETA)
>> receive MSG_FINMETA <-----+
>> read block
>> --- start sending trace data on child processes ---
>>
>> --- When client finishes sending trace data ---
>> send MSG_CLOSE
>> receive MSG_CLOSE <-------+
>> close(socket fd) close(socket fd)
>>
>> By introducing the v2 protocol, after the client checks "tracecmd", the client
>> will send "V2\0<MAGIC_NUMBER>\00\0". This complex message is used when the
>> new client tries to connect to the old server. The new client wants to check
>> whether the reply message from the server is "V2" or not. However, the old
>> server does not respond to the client before receiving cpu numbers, page size,
>> and options. Each message is separated with "\0" in the old server, so the
>> client send "V2" as cpu numbers, "<MAGIC_NUMBER>" as page size, and "0" as
>> no options. On the other hands, the old server will understand the messages
>> as cpus=0, pagesize=<MAGIC_NUMBER>, and options=0, and then the server will
>> send the message "\0" as port numbers. Then, the message which the client
>> receives is not "V2" but "\0", so the client will reconnect to the old server
>> as the v1 protocol.
>>
>> Changes in V2: Regacy porotocol support in order to keep backward compatibility
>>
>> Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
>> ---
>> Makefile | 2
>> trace-cmd.h | 11 +
>> trace-listen.c | 133 +++++++----
>> trace-msg.c | 683 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> trace-msg.h | 27 ++
>> trace-output.c | 4
>> trace-record.c | 86 ++++++-
>> 7 files changed, 880 insertions(+), 66 deletions(-)
>> create mode 100644 trace-msg.c
>> create mode 100644 trace-msg.h
>>
>> diff --git a/Makefile b/Makefile
>> index 1964949..054f53d 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -314,7 +314,7 @@ KERNEL_SHARK_OBJS = $(TRACE_VIEW_OBJS) $(TRACE_GRAPH_OBJS) $(TRACE_GUI_OBJS) \
>> PEVENT_LIB_OBJS = event-parse.o trace-seq.o parse-filter.o parse-utils.o
>> TCMD_LIB_OBJS = $(PEVENT_LIB_OBJS) trace-util.o trace-input.o trace-ftrace.o \
>> trace-output.o trace-recorder.o trace-restore.o trace-usage.o \
>> - trace-blk-hack.o kbuffer-parse.o
>> + trace-blk-hack.o kbuffer-parse.o trace-msg.o
>>
>> PLUGIN_OBJS = plugin_hrtimer.o plugin_kmem.o plugin_sched_switch.o \
>> plugin_mac80211.o plugin_jbd2.o plugin_function.o plugin_kvm.o \
>> diff --git a/trace-cmd.h b/trace-cmd.h
>> index cbbc6ed..a2958ac 100644
>> --- a/trace-cmd.h
>> +++ b/trace-cmd.h
>> @@ -248,6 +248,17 @@ void tracecmd_stop_recording(struct tracecmd_recorder *recorder);
>> void tracecmd_stat_cpu(struct trace_seq *s, int cpu);
>> long tracecmd_flush_recording(struct tracecmd_recorder *recorder);
>>
>> +/* for clients */
>> +int tracecmd_msg_send_init_data(int fd);
>> +int tracecmd_msg_metadata_send(int fd, char *buf, int size);
>> +int tracecmd_msg_finish_sending_metadata(int fd);
>> +void tracecmd_msg_send_close_msg();
>> +
>> +/* for server */
>> +int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize);
>> +int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports);
>> +int tracecmd_msg_collect_metadata(int ifd, int ofd);
>> +
>> /* --- Plugin handling --- */
>> extern struct plugin_option trace_ftrace_options[];
>>
>> diff --git a/trace-listen.c b/trace-listen.c
>> index bf187c9..280b1af 100644
>> --- a/trace-listen.c
>> +++ b/trace-listen.c
>> @@ -33,6 +33,7 @@
>> #include <errno.h>
>>
>> #include "trace-local.h"
>> +#include "trace-msg.h"
>>
>> #define MAX_OPTION_SIZE 4096
>>
>> @@ -45,10 +46,10 @@ static FILE *logfp;
>>
>> static int debug;
>>
>> -static int use_tcp;
>> -
>> static int backlog = 5;
>>
>> +static int proto_ver;
>> +
>> #define TEMP_FILE_STR "%s.%s:%s.cpu%d", output_file, host, port, cpu
>> static char *get_temp_file(const char *host, const char *port, int cpu)
>> {
>> @@ -112,10 +113,9 @@ static int process_option(char *option)
>> return 0;
>> }
>>
>> -static int done;
>> static void finish(int sig)
>> {
>> - done = 1;
>> + done = true;
>> }
>>
>> #define LOG_BUF_SIZE 1024
>> @@ -144,7 +144,7 @@ static void __plog(const char *prefix, const char *fmt, va_list ap,
>> fprintf(fp, "%.*s", r, buf);
>> }
>>
>> -static void plog(const char *fmt, ...)
>> +void plog(const char *fmt, ...)
>> {
>> va_list ap;
>>
>> @@ -153,7 +153,7 @@ static void plog(const char *fmt, ...)
>> va_end(ap);
>> }
>>
>> -static void pdie(const char *fmt, ...)
>> +void pdie(const char *fmt, ...)
>> {
>> va_list ap;
>> char *str = "";
>> @@ -324,56 +324,78 @@ static int communicate_with_client(int fd, int *cpus, int *pagesize)
>>
>> *cpus = atoi(buf);
>>
>> - plog("cpus=%d\n", *cpus);
>> - if (*cpus < 0)
>> - return -1;
>> + /* Is the client using the new protocol? */
>> + if (!*cpus) {
>> + if (memcmp(buf, "V2", 2) != 0) {
>> + plog("Cannot handle the protocol %s", buf);
>> + return -1;
>> + }
>>
>> - /* next read the page size */
>> - n = read_string(fd, buf, BUFSIZ);
>> - if (n == BUFSIZ)
>> - /** ERROR **/
>> - return -1;
>> + /* read the rest of dummy data, but not use */
>> + read(fd, buf, sizeof(V2_MAGIC)+1);
>>
>> - *pagesize = atoi(buf);
>> + proto_ver = V2_PROTOCOL;
>>
>> - plog("pagesize=%d\n", *pagesize);
>> - if (*pagesize <= 0)
>> - return -1;
>> + /* Let the client know we use v2 protocol */
>> + write(fd, "V2", 2);
>>
>> - /* Now the number of options */
>> - n = read_string(fd, buf, BUFSIZ);
>> - if (n == BUFSIZ)
>> - /** ERROR **/
>> - return -1;
>> + /* read the CPU count, the page size, and options */
>> + if (tracecmd_msg_initial_setting(fd, cpus, pagesize) < 0)
>> + return -1;
>> + } else {
>> + /* The client is using the v1 protocol */
>>
>> - options = atoi(buf);
>> + plog("cpus=%d\n", *cpus);
>> + if (*cpus < 0)
>> + return -1;
>>
>> - for (i = 0; i < options; i++) {
>> - /* next is the size of the options */
>> + /* next read the page size */
>> n = read_string(fd, buf, BUFSIZ);
>> if (n == BUFSIZ)
>> /** ERROR **/
>> return -1;
>> - size = atoi(buf);
>> - /* prevent a client from killing us */
>> - if (size > MAX_OPTION_SIZE)
>> +
>> + *pagesize = atoi(buf);
>> +
>> + plog("pagesize=%d\n", *pagesize);
>> + if (*pagesize <= 0)
>> return -1;
>> - option = malloc_or_die(size);
>> - do {
>> - t = size;
>> - s = 0;
>> - s = read(fd, option+s, t);
>> - if (s <= 0)
>> - return -1;
>> - t -= s;
>> - s = size - t;
>> - } while (t);
>>
>> - s = process_option(option);
>> - free(option);
>> - /* do we understand this option? */
>> - if (!s)
>> + /* Now the number of options */
>> + n = read_string(fd, buf, BUFSIZ);
>> + if (n == BUFSIZ)
>> + /** ERROR **/
>> return -1;
>> +
>> + options = atoi(buf);
>> +
>> + for (i = 0; i < options; i++) {
>> + /* next is the size of the options */
>> + n = read_string(fd, buf, BUFSIZ);
>> + if (n == BUFSIZ)
>> + /** ERROR **/
>> + return -1;
>> + size = atoi(buf);
>> + /* prevent a client from killing us */
>> + if (size > MAX_OPTION_SIZE)
>> + return -1;
>> + option = malloc_or_die(size);
>> + do {
>> + t = size;
>> + s = 0;
>> + s = read(fd, option+s, t);
>> + if (s <= 0)
>> + return -1;
>> + t -= s;
>> + s = size - t;
>> + } while (t);
>> +
>> + s = process_option(option);
>> + free(option);
>> + /* do we understand this option? */
>> + if (!s)
>> + return -1;
>> + }
>> }
>>
>> if (use_tcp)
>> @@ -442,14 +464,20 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
>> start_port = udp_port + 1;
>> }
>>
>> - /* send the client a comma deliminated set of port numbers */
>> - for (cpu = 0; cpu < cpus; cpu++) {
>> - snprintf(buf, BUFSIZ, "%s%d",
>> - cpu ? "," : "", port_array[cpu]);
>> - write(fd, buf, strlen(buf));
>> + if (proto_ver == V2_PROTOCOL) {
>> + /* send set of port numbers to the client */
>> + if (tracecmd_msg_send_port_array(fd, cpus, port_array) < 0)
>> + goto out_free;
>> + } else {
>> + /* send the client a comma deliminated set of port numbers */
>> + for (cpu = 0; cpu < cpus; cpu++) {
>> + snprintf(buf, BUFSIZ, "%s%d",
>> + cpu ? "," : "", port_array[cpu]);
>> + write(fd, buf, strlen(buf));
>> + }
>> + /* end with null terminator */
>> + write(fd, "\0", 1);
>> }
>> - /* end with null terminator */
>> - write(fd, "\0", 1);
>>
>> return pid_array;
>>
>> @@ -528,7 +556,10 @@ static void process_client(const char *node, const char *port, int fd)
>> return;
>>
>> /* Now we are ready to start reading data from the client */
>> - collect_metadata_from_client(fd, ofd);
>> + if (proto_ver == V2_PROTOCOL)
>> + tracecmd_msg_collect_metadata(fd, ofd);
>> + else
>> + collect_metadata_from_client(fd, ofd);
>>
>> /* wait a little to let our readers finish reading */
>> sleep(1);
>> diff --git a/trace-msg.c b/trace-msg.c
>> new file mode 100644
>> index 0000000..cf82ff6
>> --- /dev/null
>> +++ b/trace-msg.c
>> @@ -0,0 +1,683 @@
>> +/*
>> + * trace-msg.c : define message protocol for communication between clients and
>> + * a server
>> + *
>> + * Copyright (C) 2013 Hitachi, Ltd.
>> + * Created by Yoshihiro YUNOMAE <[email protected]>
>> + *
>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; version 2 of the License (not later!)
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses>
>> + *
>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + */
>> +
>> +#include <errno.h>
>> +#include <poll.h>
>> +#include <fcntl.h>
>> +#include <limits.h>
>> +#include <stddef.h>
>> +#include <stdio.h>
>> +#include <unistd.h>
>> +#include <arpa/inet.h>
>> +#include <sys/types.h>
>> +#include <linux/types.h>
>> +
>> +#include "trace-cmd-local.h"
>> +#include "trace-msg.h"
>> +
>> +typedef __u32 u32;
>> +typedef __be32 be32;
>> +
>> +#define TRACECMD_MSG_MAX_LEN BUFSIZ
>> +
>> + /* size + cmd */
>> +#define TRACECMD_MSG_HDR_LEN ((sizeof(be32)) + (sizeof(be32)))
>> +
>> + /* + size of the metadata */
>> +#define TRACECMD_MSG_META_MIN_LEN \
>> + ((TRACECMD_MSG_HDR_LEN) + (sizeof(be32)))
>> +
>> + /* - header size for error msg */
>> +#define TRACECMD_MSG_META_MAX_LEN \
>> +((TRACECMD_MSG_MAX_LEN) - (TRACECMD_MSG_META_MIN_LEN) - TRACECMD_MSG_HDR_LEN)
>> +
>> + /* size + opt_cmd + size of str */
>> +#define TRACECMD_OPT_MIN_LEN \
>> + ((sizeof(be32)) + (sizeof(be32)) +(sizeof(be32)))
>> +
>> +
>> +#define CPU_MAX 256
>> +
>> +/* for both client and server */
>> +bool use_tcp;
>> +int cpu_count;
>> +
>> +/* for client */
>> +static int psfd;
>> +unsigned int page_size;
>> +int *client_ports;
>> +bool send_metadata;
>> +
>> +/* for server */
>> +static int *port_array;
>> +bool done;
>> +
>> +struct tracecmd_msg_str {
>> + be32 size;
>> + char *buf;
>> +} __attribute__((packed));
>> +
>> +struct tracecmd_msg_opt {
>> + be32 size;
>> + be32 opt_cmd;
>> + struct tracecmd_msg_str str;
>> +};
>> +
>> +struct tracecmd_msg_tinit {
>> + be32 cpus;
>> + be32 page_size;
>> + be32 opt_num;
>> + struct tracecmd_msg_opt *opt;
>> +} __attribute__((packed));
>> +
>> +struct tracecmd_msg_rinit {
>> + be32 cpus;
>> + be32 port_array[CPU_MAX];
>> +} __attribute__((packed));
>> +
>> +struct tracecmd_msg_meta {
>> + struct tracecmd_msg_str str;
>> +};
>> +
>> +struct tracecmd_msg_error {
>> + be32 size;
>> + be32 cmd;
>> + union {
>> + struct tracecmd_msg_tinit tinit;
>> + struct tracecmd_msg_rinit rinit;
>> + struct tracecmd_msg_meta meta;
>> + } data;
>> +} __attribute__((packed));
>> +
>> +enum tracecmd_msg_cmd {
>> + MSG_CLOSE = 1,
>> + MSG_TINIT = 4,
>> + MSG_RINIT = 5,
>> + MSG_SENDMETA = 6,
>> + MSG_FINMETA = 7,
>> +};
>> +
>> +struct tracecmd_msg {
>> + be32 size;
>> + be32 cmd;
>> + union {
>> + struct tracecmd_msg_tinit tinit;
>> + struct tracecmd_msg_rinit rinit;
>> + struct tracecmd_msg_meta meta;
>> + struct tracecmd_msg_error err;
>> + } data;
>> +} __attribute__((packed));
>> +
>> +struct tracecmd_msg *errmsg;
>> +
>> +static ssize_t msg_do_write_check(int fd, struct tracecmd_msg *msg)
>> +{
>> + return __do_write_check(fd, msg, ntohl(msg->size));
>> +}
>> +
>> +static struct tracecmd_msg *tracecmd_msg_alloc(u32 size)
>> +{
>> + size += TRACECMD_MSG_HDR_LEN;
>> + return malloc(size);
>> +}
>> +
>> +static void tracecmd_msg_init(u32 cmd, u32 size, struct tracecmd_msg *msg)
>> +{
>> + size += TRACECMD_MSG_HDR_LEN;
>> + memset(msg, 0, size);
>> + msg->size = htonl(size);
>> + msg->cmd = htonl(cmd);
>> +}
>> +
>> +static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
>> +{
>> + memcpy(dest+offset, buf, buflen);
>> +}
>> +
>> +enum msg_opt_command {
>> + MSGOPT_USETCP = 1,
>> +};
>> +
>> +static struct tracecmd_msg_opt *tracecmd_msg_opt_alloc(u32 len)
>> +{
>> + len += TRACECMD_OPT_MIN_LEN;
>> + return malloc(len);
>> +}
>> +
>> +static void make_option(int opt_cmd, const char *buf,
>> + struct tracecmd_msg_opt *opt)
>> +{
>> + u32 buflen = 0;
>> + u32 size = TRACECMD_OPT_MIN_LEN;
>> +
>> + if (buf) {
>> + buflen = strlen(buf);
>> + size += buflen;
>> + }
>> +
>> + opt->size = htonl(size);
>> + opt->opt_cmd = htonl(opt_cmd);
>> + opt->str.size = htonl(buflen);
>> +
>> + if (buf)
>> + bufcpy(opt, TRACECMD_OPT_MIN_LEN, buf, buflen);
>> +}
>> +
>> +static int add_options_to_tinit(u32 len, struct tracecmd_msg *msg)
>> +{
>> + struct tracecmd_msg_opt *opt;
>> + int offset = offsetof(struct tracecmd_msg, data.tinit.opt);
>> +
>> + if (use_tcp) {
>> + opt = tracecmd_msg_opt_alloc(0);
>> + if (!opt)
>> + return -ENOMEM;
>> +
>> + make_option(MSGOPT_USETCP, NULL, opt);
>> + /* add option */
>> + bufcpy(msg, offset, opt, ntohl(opt->size));
>> + free(opt);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int make_tinit(u32 len, struct tracecmd_msg *msg)
>> +{
>> + int opt_num = 0;
>> + int ret = 0;
>> +
>> + if (use_tcp)
>> + opt_num++;
>> +
>> + if (opt_num) {
>> + ret = add_options_to_tinit(len, msg);
>> + if (ret < 0)
>> + return ret;
>> + }
>> +
>> + msg->data.tinit.cpus = htonl(cpu_count);
>> + msg->data.tinit.page_size = htonl(page_size);
>> + msg->data.tinit.opt_num = htonl(opt_num);
>> +
>> + return 0;
>> +}
>> +
>> +static int make_rinit(struct tracecmd_msg *msg)
>> +{
>> + int i;
>> + u32 offset = TRACECMD_MSG_HDR_LEN;
>> + be32 port;
>> +
>> + msg->data.rinit.cpus = htonl(cpu_count);
>> +
>> + for (i = 0; i < cpu_count; i++) {
>> + /* + rrqports->cpus or rrqports->port_array[i] */
>> + offset += sizeof(be32);
>> + port = htonl(port_array[i]);
>> + bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static u32 tracecmd_msg_get_body_length(u32 cmd)
>> +{
>> + struct tracecmd_msg *msg;
>> + u32 len = 0;
>> +
>> + switch (cmd) {
>> + case MSG_TINIT:
>> + len = sizeof(msg->data.tinit.cpus)
>> + + sizeof(msg->data.tinit.page_size)
>> + + sizeof(msg->data.tinit.opt_num);
>> +
>> + /*
>> + * If we are using IPV4 and our page size is greater than
>> + * or equal to 64K, we need to punt and use TCP. :-(
>> + */
>> +
>> + /* TODO, test for ipv4 */
>> + if (page_size >= UDP_MAX_PACKET) {
>> + warning("page size too big for UDP using TCP "
>> + "in live read");
>> + use_tcp = true;
>> + }
>> +
>> + if (use_tcp)
>> + len += TRACECMD_OPT_MIN_LEN;
>> +
>> + return len;
>> + case MSG_RINIT:
>> + return sizeof(msg->data.rinit.cpus)
>> + + sizeof(msg->data.rinit.port_array);
>> + case MSG_SENDMETA:
>> + return TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN;
>> + case MSG_CLOSE:
>> + case MSG_FINMETA:
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
>> +{
>> + switch (cmd) {
>> + case MSG_TINIT:
>> + return make_tinit(len, msg);
>> + case MSG_RINIT:
>> + return make_rinit(msg);
>> + case MSG_CLOSE:
>> + case MSG_SENDMETA: /* meta data is not stored here. */
>> + case MSG_FINMETA:
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tracecmd_msg_create(u32 cmd, struct tracecmd_msg **msg)
>> +{
>> + u32 len = 0;
>> + int ret = 0;
>> +
>> + len = tracecmd_msg_get_body_length(cmd);
>> + if (len > (TRACECMD_MSG_MAX_LEN - TRACECMD_MSG_HDR_LEN)) {
>> + plog("Exceed maximum message size cmd=%d\n", cmd);
>> + return -EINVAL;
>> + }
>> +
>> + *msg = tracecmd_msg_alloc(len);
>> + if (!*msg)
>> + return -ENOMEM;
>> + tracecmd_msg_init(cmd, len, *msg);
>> +
>> + ret = tracecmd_msg_make_body(cmd, len, *msg);
>> + if (ret < 0)
>> + free(*msg);
>> +
>> + return ret;
>> +}
>> +
>> +static int tracecmd_msg_send(int fd, u32 cmd)
>> +{
>> + struct tracecmd_msg *msg = NULL;
>> + int ret = 0;
>> +
>> + if (cmd > MSG_FINMETA) {
>> + plog("Unsupported command: %d\n", cmd);
>> + return -EINVAL;
>> + }
>> +
>> + ret = tracecmd_msg_create(cmd, &msg);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = msg_do_write_check(fd, msg);
>> + if (ret < 0) {
>> + free(msg);
>> + return -ECOMM;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tracecmd_msg_read_extra(int fd, char *buf, u32 size, int *n)
>> +{
>> + int r = 0;
>> +
>> + do {
>> + r = read(fd, buf+*n, size);
>> + if (r < 0) {
>> + if (errno == EINTR)
>> + continue;
>> + return -errno;
>> + } else if (!r)
>> + return -ENOTCONN;
>> + size -= r;
>> + *n += r;
>> + } while (size);
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Read header information of msg first, then read all data
>> + */
>> +static int tracecmd_msg_recv(int fd, char *buf)
>> +{
>> + struct tracecmd_msg *msg;
>> + u32 size = 0;
>> + int n = 0;
>> + int ret;
>> +
>> + ret = tracecmd_msg_read_extra(fd, buf, TRACECMD_MSG_HDR_LEN, &n);
>> + if (ret < 0)
>> + return ret;
>> +
>> + msg = (struct tracecmd_msg *)buf;
>> + size = ntohl(msg->size);
>> + if (size > TRACECMD_MSG_MAX_LEN)
>> + /* too big */
>> + goto error;
>> + else if (size < TRACECMD_MSG_HDR_LEN)
>> + /* too small */
>> + goto error;
>> + else if (size > TRACECMD_MSG_HDR_LEN) {
>> + size -= TRACECMD_MSG_HDR_LEN;
>> + return tracecmd_msg_read_extra(fd, buf, size, &n);
>> + }
>> +
>> + return 0;
>> +error:
>> + plog("Receive an invalid message(size=%d)\n", size);
>> + return -ENOMSG;
>> +}
>> +
>> +static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
>> +{
>> + return (void *)msg + offset;
>> +}
>> +
>> +static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
>> +{
>> + char msg_tmp[TRACECMD_MSG_MAX_LEN];
>> + u32 cmd;
>> + int ret;
>> +
>> + ret = tracecmd_msg_recv(fd, msg_tmp);
>> + if (ret < 0)
>> + return ret;
>> +
>> + *msg = (struct tracecmd_msg *)msg_tmp;
>> + cmd = ntohl((*msg)->cmd);
>> + if (cmd == MSG_CLOSE)
>> + return -ECONNABORTED;
>> +
>> + return 0;
>> +}
>> +
>> +static int tracecmd_msg_send_and_wait_for_msg(int fd, u32 cmd, struct tracecmd_msg **msg)
>> +{
>> + int ret;
>> +
>> + ret = tracecmd_msg_send(fd, cmd);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = tracecmd_msg_wait_for_msg(fd, msg);
>> + if (ret < 0)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +int tracecmd_msg_send_init_data(int fd)
>> +{
>> + struct tracecmd_msg *msg;
>> + int i, cpus;
>> + int ret;
>> +
>> + ret = tracecmd_msg_send_and_wait_for_msg(fd, MSG_TINIT, &msg);
>> + if (ret < 0)
>> + return ret;
>> +
>> + cpus = ntohl(msg->data.rinit.cpus);
>> + client_ports = malloc_or_die(sizeof(int) * cpus);
>> + for (i = 0; i < cpus; i++)
>> + client_ports[i] = ntohl(msg->data.rinit.port_array[i]);
>> +
>> + /* Next, send meta data */
>> + send_metadata = true;
>> +
>> + return 0;
>> +}
>> +
>> +static bool process_option(struct tracecmd_msg_opt *opt)
>> +{
>> + /* currently the only option we have is to us TCP */
>> + if (ntohl(opt->opt_cmd) == MSGOPT_USETCP) {
>> + use_tcp = true;
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> +static void error_operation_for_server(struct tracecmd_msg *msg)
>> +{
>> + u32 cmd;
>> +
>> + cmd = ntohl(msg->cmd);
>> +
>> + warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
>> +}
>> +
>> +#define MAX_OPTION_SIZE 4096
>> +
>> +int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize)
>> +{
>> + struct tracecmd_msg *msg;
>> + struct tracecmd_msg_opt *opt;
>> + char buf[TRACECMD_MSG_MAX_LEN];
>> + int offset = offsetof(struct tracecmd_msg, data.tinit.opt);
>> + int options, i, s;
>> + int ret;
>> + u32 size = 0;
>> + u32 cmd;
>> +
>> + ret = tracecmd_msg_recv(fd, buf);
>> + if (ret < 0)
>> + return ret;
>> +
>> + msg = (struct tracecmd_msg *)buf;
>> + cmd = ntohl(msg->cmd);
>> + if (cmd != MSG_TINIT) {
>> + ret = -EINVAL;
>> + goto error;
>> + }
>> +
>> + *cpus = ntohl(msg->data.tinit.cpus);
>> + plog("cpus=%d\n", *cpus);
>> + if (*cpus < 0) {
>> + ret = -EINVAL;
>> + goto error;
>> + }
>> +
>> + *pagesize = ntohl(msg->data.tinit.page_size);
>> + plog("pagesize=%d\n", *pagesize);
>> + if (*pagesize <= 0) {
>> + ret = -EINVAL;
>> + goto error;
>> + }
>> +
>> + options = ntohl(msg->data.tinit.opt_num);
>> + for (i = 0; i < options; i++) {
>> + offset += size;
>> + opt = tracecmd_msg_buf_access(msg, offset);
>> + size = ntohl(opt->size);
>> + /* prevent a client from killing us */
>> + if (size > MAX_OPTION_SIZE) {
>> + plog("Exceed MAX_OPTION_SIZE\n");
>> + ret = -EINVAL;
>> + goto error;
>> + }
>> + s = process_option(opt);
>> + /* do we understand this option? */
>> + if (!s) {
>> + plog("Cannot understand(%d:%d:%d)\n",
>> + i, ntohl(opt->size), ntohl(opt->opt_cmd));
>> + ret = -EINVAL;
>> + goto error;
>> + }
>> + }
>> +
>> + return 0;
>> +
>> +error:
>> + error_operation_for_server(msg);
>> + return ret;
>> +}
>> +
>> +int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports)
>> +{
>> + int ret;
>> +
>> + cpu_count = total_cpus;
>> + port_array = ports;
>> +
>> + ret = tracecmd_msg_send(fd, MSG_RINIT);
>> + if (ret < 0)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +void tracecmd_msg_send_close_msg()
>> +{
>> + tracecmd_msg_send(psfd, MSG_CLOSE);
>> +}
>> +
>> +static void make_meta(const char *buf, int buflen, struct tracecmd_msg *msg)
>> +{
>> + int offset = offsetof(struct tracecmd_msg, data.meta.str.buf);
>> +
>> + msg->data.meta.str.size = htonl(buflen);
>> + bufcpy(msg, offset, buf, buflen);
>> +}
>> +
>> +int tracecmd_msg_metadata_send(int fd, char *buf, int size)
>> +{
>> + struct tracecmd_msg *msg;
>> + int n, len;
>> + int ret;
>> + int count = 0;
>> +
>> + ret = tracecmd_msg_create(MSG_SENDMETA, &msg);
>> + if (ret < 0)
>> + return ret;
>> +
>> + n = size;
>> + do {
>> + if (n > TRACECMD_MSG_META_MAX_LEN) {
>> + make_meta(buf+count, TRACECMD_MSG_META_MAX_LEN, msg);
>> + n -= TRACECMD_MSG_META_MAX_LEN;
>> + count += TRACECMD_MSG_META_MAX_LEN;
>> + } else {
>> + make_meta(buf+count, n, msg);
>> + /*
>> + * TRACECMD_MSG_META_MAX_LEN is stored in msg->size,
>> + * so update the size to the correct value.
>> + */
>> + len = TRACECMD_MSG_META_MIN_LEN + n;
>> + msg->size = htonl(len);
>> + n = 0;
>> + }
>> +
>> + ret = msg_do_write_check(fd, msg);
>> + if (ret < 0)
>> + return ret;
>> + } while (n);
>> +
>> + return 0;
>> +}
>> +
>> +int tracecmd_msg_finish_sending_metadata(int fd)
>> +{
>> + int ret;
>> +
>> + ret = tracecmd_msg_send(fd, MSG_FINMETA);
>> + if (ret < 0)
>> + return ret;
>> +
>> + /* psfd will be used for closing */
>> + psfd = fd;
>> + return 0;
>> +}
>> +
>> +int tracecmd_msg_collect_metadata(int ifd, int ofd)
>> +{
>> + struct tracecmd_msg *msg;
>> + char buf[TRACECMD_MSG_MAX_LEN];
>> + u32 s, t, n, cmd;
>> + int offset = TRACECMD_MSG_META_MIN_LEN;
>> + int ret;
>> +
>> + do {
>> + ret = tracecmd_msg_recv(ifd, buf);
>> + if (ret < 0) {
>> + warning("reading client");
>> + return ret;
>> + }
>> +
>> + msg = (struct tracecmd_msg *)buf;
>> + cmd = ntohl(msg->cmd);
>> + if (cmd == MSG_FINMETA) {
>> + /* Finish receiving meta data */
>> + break;
>> + } else if (cmd != MSG_SENDMETA)
>> + goto error;
>> +
>> + n = ntohl(msg->data.meta.str.size);
>> + t = n;
>> + s = 0;
>> + do {
>> + s = write(ofd, buf+s+offset, t);
>> + if (s < 0) {
>> + if (errno == EINTR)
>> + continue;
>> + warning("writing to file");
>> + return -errno;
>> + }
>> + t -= s;
>> + s = n - t;
>> + } while (t);
>> + } while (cmd == MSG_SENDMETA);
>> +
>> + /* check the finish message of the client */
>> + while(!done) {
>> + ret = tracecmd_msg_recv(ifd, buf);
>> + if (ret < 0) {
>> + warning("reading client");
>> + return ret;
>> + }
>> +
>> + msg = (struct tracecmd_msg *)buf;
>> + cmd = ntohl(msg->cmd);
>> + if (cmd == MSG_CLOSE)
>> + /* Finish this connection */
>> + break;
>> + else {
>> + warning("Not accept the message %d", ntohl(msg->cmd));
>> + ret = -EINVAL;
>> + goto error;
>> + }
>> + }
>> +
>> + return 0;
>> +
>> +error:
>> + error_operation_for_server(msg);
>> + return ret;
>> +}
>> diff --git a/trace-msg.h b/trace-msg.h
>> new file mode 100644
>> index 0000000..b23e72b
>> --- /dev/null
>> +++ b/trace-msg.h
>> @@ -0,0 +1,27 @@
>> +#ifndef _TRACE_MSG_H_
>> +#define _TRACE_MSG_H_
>> +
>> +#include <stdbool.h>
>> +
>> +#define UDP_MAX_PACKET (65536 - 20)
>> +#define V2_MAGIC "677768\0"
>> +
>> +#define V1_PROTOCOL 1
>> +#define V2_PROTOCOL 2
>> +
>> +/* for both client and server */
>> +extern bool use_tcp;
>> +extern int cpu_count;
>> +
>> +/* for client */
>> +extern unsigned int page_size;
>> +extern int *client_ports;
>> +extern bool send_metadata;
>> +
>> +/* for server */
>> +extern bool done;
>> +
>> +void plog(const char *fmt, ...);
>> +void pdie(const char *fmt, ...);
>> +
>> +#endif /* _TRACE_MSG_H_ */
>> diff --git a/trace-output.c b/trace-output.c
>> index bdb478d..6e1298b 100644
>> --- a/trace-output.c
>> +++ b/trace-output.c
>> @@ -36,6 +36,7 @@
>> #include <glob.h>
>>
>> #include "trace-cmd-local.h"
>> +#include "trace-msg.h"
>> #include "version.h"
>>
>> /* We can't depend on the host size for size_t, all must be 64 bit */
>> @@ -80,6 +81,9 @@ struct list_event_system {
>> static stsize_t
>> do_write_check(struct tracecmd_output *handle, void *data, tsize_t size)
>> {
>> + if (send_metadata)
>> + return tracecmd_msg_metadata_send(handle->fd, data, size);
>> +
>> return __do_write_check(handle->fd, data, size);
>> }
>>
>> diff --git a/trace-record.c b/trace-record.c
>> index 0199627..ebfe6c0 100644
>> --- a/trace-record.c
>> +++ b/trace-record.c
>> @@ -45,6 +45,7 @@
>> #include <errno.h>
>>
>> #include "trace-local.h"
>> +#include "trace-msg.h"
>>
>> #define _STR(x) #x
>> #define STR(x) _STR(x)
>> @@ -59,29 +60,21 @@
>> #define STAMP "stamp"
>> #define FUNC_STACK_TRACE "func_stack_trace"
>>
>> -#define UDP_MAX_PACKET (65536 - 20)
>> -
>> static int tracing_on_init_val;
>>
>> static int rt_prio;
>>
>> -static int use_tcp;
>> -
>> -static unsigned int page_size;
>> -
>> static int buffer_size;
>>
>> static const char *output_file = "trace.dat";
>>
>> static int latency;
>> static int sleep_time = 1000;
>> -static int cpu_count;
>> static int recorder_threads;
>> static int *pids;
>> static int buffers;
>>
>> static char *host;
>> -static int *client_ports;
>> static int sfd;
>>
>> /* Max size to let a per cpu file get */
>> @@ -99,6 +92,8 @@ static unsigned recorder_flags;
>> /* Try a few times to get an accurate date */
>> static int date2ts_tries = 5;
>>
>> +static int proto_ver = V2_PROTOCOL;
>> +
>> struct func_list {
>> struct func_list *next;
>> const char *func;
>> @@ -1607,20 +1602,26 @@ static int create_recorder(struct buffer_instance *instance, int cpu, int extrac
>> exit(0);
>> }
>>
>> -static void communicate_with_listener(int fd)
>> +static void check_first_msg_from_server(int fd)
>> {
>> char buf[BUFSIZ];
>> - ssize_t n;
>> - int cpu, i;
>>
>> - n = read(fd, buf, 8);
>> + read(fd, buf, 8);
>>
>> /* Make sure the server is the tracecmd server */
>> if (memcmp(buf, "tracecmd", 8) != 0)
>> die("server not tracecmd server");
>> +}
>>
>> - /* write the number of CPUs we have (in ASCII) */
>> +static void communicate_with_listener_v1(int fd)
>> +{
>> + char buf[BUFSIZ];
>> + ssize_t n;
>> + int cpu, i;
>> +
>> + check_first_msg_from_server(fd);
>>
>> + /* write the number of CPUs we have (in ASCII) */
>> sprintf(buf, "%d", cpu_count);
>>
>> /* include \0 */
>> @@ -1675,6 +1676,46 @@ static void communicate_with_listener(int fd)
>> }
>> }
>>
>> +static void communicate_with_listener_v2(int fd)
>> +{
>> + if (tracecmd_msg_send_init_data(fd) < 0)
>> + die("Cannot communicate with server");
>> +}
>> +
>> +static void check_protocol_version(int fd)
>> +{
>> + char buf[BUFSIZ];
>> +
>> + check_first_msg_from_server(fd);
>> +
>> + /*
>> + * Write the protocol version, the magic number, and the dummy
>> + * option(0) (in ASCII). The client understands whether the client
>> + * uses the v2 protocol or not by checking a reply message from the
>> + * server. If the message is "V2", the server uses v2 protocol. On the
>> + * other hands, if the message is just number strings, the server
>> + * returned port numbers. So, in that time, the client understands the
>> + * server uses the v1 protocol. However, the old server tells the
>> + * client port numbers after reading cpu_count, page_size, and option.
>> + * So, we add the dummy number (the magic number and 0 option) to the
>> + * first client message.
>> + */
>> + write(fd, "V2\0"V2_MAGIC"0", sizeof(V2_MAGIC)+4);
>> +
>> + /* read a reply message */
>> + read(fd, buf, BUFSIZ);
>> +
>> + if (!buf[0]) {
>> + /* the server uses the v1 protocol, so we'll use it */
>> + proto_ver = V1_PROTOCOL;
>> + plog("Use the v1 protocol\n");
>> + } else {
>> + if (memcmp(buf, "V2", 2) != 0)
>> + die("Cannot handle the protocol %s", buf);
>> + /* OK, let's use v2 protocol */
>> + }
>> +}
>> +
>> static void setup_network(void)
>> {
>> struct tracecmd_output *handle;
>> @@ -1703,6 +1744,7 @@ static void setup_network(void)
>> hints.ai_family = AF_UNSPEC;
>> hints.ai_socktype = SOCK_STREAM;
>>
>> +again:
>> s = getaddrinfo(server, port, &hints, &result);
>> if (s != 0)
>> die("getaddrinfo: %s", gai_strerror(s));
>> @@ -1723,16 +1765,32 @@ static void setup_network(void)
>>
>> freeaddrinfo(result);
>>
>> - communicate_with_listener(sfd);
>> + if (proto_ver == V2_PROTOCOL) {
>> + check_protocol_version(sfd);
>> + if (proto_ver == V1_PROTOCOL) {
>> + /* reconnect to the server for using the v1 protocol */
>> + close(sfd);
>> + goto again;
>> + }
>> + communicate_with_listener_v2(sfd);
>> + }
>> +
>> + if (proto_ver == V1_PROTOCOL)
>> + communicate_with_listener_v1(sfd);
>>
>> /* Now create the handle through this socket */
>> handle = tracecmd_create_init_fd_glob(sfd, listed_events);
>>
>> + if (proto_ver == V2_PROTOCOL)
>> + tracecmd_msg_finish_sending_metadata(sfd);
>> +
>> /* OK, we are all set, let'r rip! */
>> }
>>
>> static void finish_network(void)
>> {
>> + if (proto_ver == V2_PROTOCOL)
>> + tracecmd_msg_send_close_msg();
>> close(sfd);
>> free(host);
>> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-10-17 21:11:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

On Thu, 17 Oct 2013 15:32:37 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:


> OK. As you say, when we know domains which we will boot, trace-cmd
> should make those automatically. So, I'll add this feature.
> Note that if we don't know domains when we boot virt-sevrer, we must
> make those manually now.
>
> In this patch set, virt-server always uses /tmp/trace-cmd/virt, so
> we had better indicate only the domain name with d option, I think.
>
> trace-cmd virt-server -d domain -m 710 -g qemu
>
> What do you think about this?

Yes, yes, of course. I only added the full path because I blindly cut
and pasted what was there.

-- Steve

2013-10-17 21:21:39

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

On Thu, 17 Oct 2013 15:34:17 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:
> recorder error in splice output
>
> It seems to be not due to applying my patch.
> We cannot use "localhost" for trace-cmd(v1.2).
> When we use "127.0.0.1", this problem does not occur.

You're right. Hmm, it's actually quite annoying that we can't use
localhost. I may have to fix that :-/

-- Steve

2013-10-18 02:19:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

On Fri, 13 Sep 2013 11:06:32 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:


> diff --git a/trace-msg.c b/trace-msg.c
> new file mode 100644
> index 0000000..cf82ff6
> --- /dev/null
> +++ b/trace-msg.c
> @@ -0,0 +1,683 @@
> +/*
> + * trace-msg.c : define message protocol for communication between clients and
> + * a server
> + *
> + * Copyright (C) 2013 Hitachi, Ltd.
> + * Created by Yoshihiro YUNOMAE <[email protected]>
> + *
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; version 2 of the License (not later!)
> + *

BTW, as this can be used in a library, would be OK with re-licensing
this as the LGPL v2.1?

I try to keep all the helper C files as LGPL and the program files as
GPL. I would have made all of trace-cmd LGPL, but there was some code I
needed that was GPL, and I could only use it if I had the program as
GPL code.

-- Steve


> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses>
> + *
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + */
> +

2013-10-18 02:32:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] trace-cmd: Add virt-server mode for a virtualization environment

On Fri, 13 Sep 2013 11:06:37 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:

> static int *create_all_readers(int cpus, const char *node, const char *port,
> - int pagesize, int fd)
> + const char *domain, int virtpid, int pagesize, int fd)
> {
> char buf[BUFSIZ];
> - int *port_array;
> + int *port_array = NULL;
> int *pid_array;
> int start_port;
> int udp_port;
> int cpu;
> int pid;
>
> - port_array = malloc_or_die(sizeof(int) * cpus);
> + if (node) {
> + port_array = malloc_or_die(sizeof(int) * cpus);
> + start_port = START_PORT_SEARCH;
> + }
> pid_array = malloc_or_die(sizeof(int) * cpus);
> memset(pid_array, 0, sizeof(int) * cpus);
>
> - start_port = START_PORT_SEARCH;
> -
> - /* Now create a UDP port for each CPU */
> + /* Now create a reader for each CPU */
> for (cpu = 0; cpu < cpus; cpu++) {
> - udp_port = open_udp(node, port, &pid, cpu,
> - pagesize, start_port);
> - if (udp_port < 0)
> - goto out_free;
> - port_array[cpu] = udp_port;
> + if (node) {
> + udp_port = open_udp(node, port, &pid, cpu,
> + pagesize, start_port);
> + if (udp_port < 0)
> + goto out_free;
> + port_array[cpu] = udp_port;
> + /*
> + * due to some bugging finding ports,

s/due/Due/

> + * force search after last port
> + */
> + start_port = udp_port + 1;
> + } else {
> + if (open_virtio_serial_pipe(&pid, cpu, pagesize,
> + domain, virtpid) < 0)
> + goto out_free;
> + }
> pid_array[cpu] = pid;
> /*
> * Due to some bugging finding ports,

Hmm, it seems that you added the start_port = udp_port + 1 above, but
shouldn't you remove the one here?

> @@ -482,7 +595,7 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
> return pid_array;
>
> out_free:
> - destroy_all_readers(cpus, pid_array, node, port);
> + destroy_all_readers(cpus, pid_array, node, port, domain, virtpid);
> return NULL;
> }
>
> @@ -524,7 +637,7 @@ static void stop_all_readers(int cpus, int *pid_array)
> }
>
> static void put_together_file(int cpus, int ofd, const char *node,
> - const char *port)
> + const char *port, const char *domain, int virtpid)
> {
> char **temp_files;
> int cpu;
> @@ -533,25 +646,31 @@ static void put_together_file(int cpus, int ofd, const char *node,
> temp_files = malloc_or_die(sizeof(*temp_files) * cpus);
>
> for (cpu = 0; cpu < cpus; cpu++)
> - temp_files[cpu] = get_temp_file(node, port, cpu);
> + temp_files[cpu] = get_temp_file(node, port, domain,
> + virtpid, cpu);
>
> tracecmd_attach_cpu_data_fd(ofd, cpus, temp_files);
> free(temp_files);
> }
>
> -static void process_client(const char *node, const char *port, int fd)
> +static void process_client(const char *node, const char *port,
> + const char *domain, int virtpid, int fd)
> {
> int *pid_array;
> int pagesize;
> int cpus;
> int ofd;
>
> - if (communicate_with_client(fd, &cpus, &pagesize) < 0)
> - return;
> -
> - ofd = create_client_file(node, port);
> + if (node) {
> + if (communicate_with_client_nw(fd, &cpus, &pagesize) < 0)

I take it _nw is for "network". If so, please use "*_net" instead. "nw"
is pretty meaningless.

This applies for all functions.

-- Steve

> + return;
> + } else {
> + if (communicate_with_client_virt(fd, domain, &cpus, &pagesize) < 0)
> + return;
> + }
>
> - pid_array = create_all_readers(cpus, node, port, pagesize, fd);
> + ofd = create_client_file(node, port, domain, virtpid);
> + pid_array = create_all_readers(cpus, node, port, domain, virtpid, pagesize, fd);
> if (!pid_array)
> return;
>
> @@ -570,9 +689,22 @@ static void process_client(const char *node, const char *port, int fd)
> /* wait a little to have the readers clean up */
> sleep(1);
>
> - put_together_file(cpus, ofd, node, port);
> + put_together_file(cpus, ofd, node, port, domain, virtpid);
>
> - destroy_all_readers(cpus, pid_array, node, port);
> + destroy_all_readers(cpus, pid_array, node, port, domain, virtpid);
> +}
> +
> +static void process_client_nw(const char *node, const char *port, int fd)
> +{
> + process_client(node, port, NULL, 0, fd);
> +}
> +
> +static void process_client_virt(const char *domain, int virtpid, int fd)
> +{
> + /* keep connection to qemu if clients on guests finish operation */
> + do {
> + process_client(NULL, NULL, domain, virtpid, fd);
> + } while (!done);
> }
>
> static int do_fork(int cfd)
> @@ -599,8 +731,8 @@ static int do_fork(int cfd)
> return 0;
> }
>
> -static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
> - socklen_t peer_addr_len)
> +static int do_connection(int cfd, struct sockaddr *peer_addr,
> + socklen_t *peer_addr_len, const char *domain, int virtpid)
> {
> char host[NI_MAXHOST], service[NI_MAXSERV];
> int s;
> @@ -610,21 +742,22 @@ static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
> if (ret)
> return ret;
>
> - s = getnameinfo((struct sockaddr *)peer_addr, peer_addr_len,
> - host, NI_MAXHOST,
> - service, NI_MAXSERV, NI_NUMERICSERV);
> -
> - if (s == 0)
> - plog("Connected with %s:%s\n",
> - host, service);
> - else {
> - plog("Error with getnameinfo: %s\n",
> - gai_strerror(s));
> - close(cfd);
> - return -1;
> - }
> -
> - process_client(host, service, cfd);
> + if (peer_addr) {
> + s = getnameinfo(peer_addr, *peer_addr_len, host, NI_MAXHOST,
> + service, NI_MAXSERV, NI_NUMERICSERV);
> +
> + if (s == 0)
> + plog("Connected with %s:%s\n",
> + host, service);
> + else {
> + plog("Error with getnameinfo: %s\n",
> + gai_strerror(s));
> + close(cfd);
> + return -1;
> + }
> + process_client_nw(host, service, cfd);
> + } else
> + process_client_virt(domain, virtpid, cfd);
>
> close(cfd);
>
> @@ -634,6 +767,77 @@ static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
> return 0;
> }
>
> +static int do_connection_nw(int cfd, struct sockaddr *addr, socklen_t *addrlen)
> +{
> + return do_connection(cfd, addr, addrlen, NULL, 0);
> +}
> +
> +#define LIBVIRT_DOMAIN_PATH "/var/run/libvirt/qemu/"
> +
> +/* We can convert pid to domain name of a guest when we use libvirt. */
> +static char *get_guest_domain_from_pid(int pid)
> +{
> + struct dirent *dirent;
> + char file_name[NAME_MAX];
> + char *file_name_ret, *domain;
> + char buf[BUFSIZ];
> + DIR *dir;
> + size_t doml;
> + int fd;
> +
> + dir = opendir(LIBVIRT_DOMAIN_PATH);
> + if (!dir) {
> + if (errno == ENOENT)
> + warning("Only support for using libvirt");
> + return NULL;
> + }
> +
> + for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
> + snprintf(file_name, NAME_MAX, LIBVIRT_DOMAIN_PATH"%s",
> + dirent->d_name);
> + file_name_ret = strstr(file_name, ".pid");
> + if (file_name_ret) {
> + fd = open(file_name, O_RDONLY);
> + if (fd < 0)
> + return NULL;
> + if (read(fd, buf, BUFSIZ) < 0)
> + return NULL;
> +
> + if (pid == atoi(buf)) {
> + /* not include /var/run/libvirt/qemu */
> + doml = (size_t)(file_name_ret - file_name)
> + - strlen(LIBVIRT_DOMAIN_PATH);
> + domain = strndup(file_name +
> + strlen(LIBVIRT_DOMAIN_PATH),
> + doml);
> + plog("start %s:%d\n", domain, pid);
> + return domain;
> + }
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static int do_connection_virt(int cfd)
> +{
> + struct ucred cr;
> + socklen_t cl;
> + int ret;
> + char *domain;
> +
> + cl = sizeof(cr);
> + ret = getsockopt(cfd, SOL_SOCKET, SO_PEERCRED, &cr, &cl);
> + if (ret < 0)
> + return ret;
> +
> + domain = get_guest_domain_from_pid(cr.pid);
> + if (!domain)
> + return -1;
> +
> + return do_connection(cfd, NULL, NULL, domain, cr.pid);
> +}
> +
> static int *client_pids;
> static int saved_pids;
> static int size_pids;
> @@ -678,12 +882,11 @@ static void remove_process(int pid)
>
> static void kill_clients(void)
> {
> - int status;
> int i;
>
> for (i = 0; i < saved_pids; i++) {
> kill(client_pids[i], SIGINT);
> - waitpid(client_pids[i], &status, 0);
> + waitpid(client_pids[i], NULL, 0);
> }
>
> saved_pids = 0;
> @@ -702,31 +905,51 @@ static void clean_up(int sig)
> } while (ret > 0);
> }
>
> -static void do_accept_loop(int sfd)
> +static void do_accept_loop(int sfd, bool nw, struct sockaddr *addr,
> + socklen_t *addrlen)
> {
> - struct sockaddr_storage peer_addr;
> - socklen_t peer_addr_len;
> int cfd, pid;
>
> - peer_addr_len = sizeof(peer_addr);
> -
> do {
> - cfd = accept(sfd, (struct sockaddr *)&peer_addr,
> - &peer_addr_len);
> + cfd = accept(sfd, addr, addrlen);
> printf("connected!\n");
> if (cfd < 0 && errno == EINTR)
> continue;
> if (cfd < 0)
> pdie("connecting");
>
> - pid = do_connection(cfd, &peer_addr, peer_addr_len);
> + if (nw)
> + pid = do_connection_nw(cfd, addr, addrlen);
> + else
> + pid = do_connection_virt(cfd);
> if (pid > 0)
> add_process(pid);
>
> } while (!done);
> }
>
> -static void do_listen(char *port)
> +static void do_accept_loop_nw(int sfd)
> +{
> + struct sockaddr_storage peer_addr;
> + socklen_t peer_addr_len;
> +
> + peer_addr_len = sizeof(peer_addr);
> +
> + do_accept_loop(sfd, true, (struct sockaddr *)&peer_addr,
> + &peer_addr_len);
> +}
> +
> +static void do_accept_loop_virt(int sfd)
> +{
> + struct sockaddr_un un_addr;
> + socklen_t un_addrlen;
> +
> + un_addrlen = sizeof(un_addr);
> +
> + do_accept_loop(sfd, false, (struct sockaddr *)&un_addr, &un_addrlen);
> +}
> +
> +static void do_listen_nw(char *port)
> {
> struct addrinfo hints;
> struct addrinfo *result, *rp;
> @@ -764,11 +987,67 @@ static void do_listen(char *port)
> if (listen(sfd, backlog) < 0)
> pdie("listen");
>
> - do_accept_loop(sfd);
> + do_accept_loop_nw(sfd);
>
> kill_clients();
> }
>
> +static void make_virt_if_dir(void)
> +{
> + struct group *group;
> +
> + if (mkdir(TRACE_CMD_DIR, 0710) < 0) {
> + if (errno != EEXIST)
> + pdie("mkdir %s", TRACE_CMD_DIR);
> + }
> + /* QEMU operates as qemu:qemu */
> + chmod(TRACE_CMD_DIR, 0710);
> + group = getgrnam("qemu");
> + if (chown(TRACE_CMD_DIR, -1, group->gr_gid) < 0)
> + pdie("chown %s", TRACE_CMD_DIR);
> +
> + if (mkdir(VIRT_DIR, 0710) < 0) {
> + if (errno != EEXIST)
> + pdie("mkdir %s", VIRT_DIR);
> + }
> + chmod(VIRT_DIR, 0710);
> + if (chown(VIRT_DIR, -1, group->gr_gid) < 0)
> + pdie("chown %s", VIRT_DIR);
> +}
> +
> +static void do_listen_virt(void)
> +{
> + struct sockaddr_un un_server;
> + struct group *group;
> + socklen_t slen;
> + int sfd;
> +
> + make_virt_if_dir();
> +
> + slen = sizeof(un_server);
> + sfd = socket(AF_UNIX, SOCK_STREAM, 0);
> + if (sfd < 0)
> + pdie("socket");
> +
> + un_server.sun_family = AF_UNIX;
> + snprintf(un_server.sun_path, PATH_MAX, VIRT_TRACE_CTL_SOCK);
> +
> + if (bind(sfd, (struct sockaddr *)&un_server, slen) < 0)
> + pdie("bind");
> + chmod(VIRT_TRACE_CTL_SOCK, 0660);
> + group = getgrnam("qemu");
> + if (chown(VIRT_TRACE_CTL_SOCK, -1, group->gr_gid) < 0)
> + pdie("fchown %s", VIRT_TRACE_CTL_SOCK);
> +
> + if (listen(sfd, backlog) < 0)
> + pdie("listen");
> +
> + do_accept_loop_virt(sfd);
> +
> + unlink(VIRT_TRACE_CTL_SOCK);
> + kill_clients();
> +}
> +
> static void start_daemon(void)
> {
> if (daemon(1, 0) < 0)
> @@ -785,11 +1064,17 @@ void trace_listen(int argc, char **argv)
> char *port = NULL;
> int daemon = 0;
> int c;
> + int nw = 0;
> + int virt = 0;
>
> if (argc < 2)
> usage(argv);
>
> - if (strcmp(argv[1], "listen") != 0)
> + if ((nw = (strcmp(argv[1], "listen") == 0)))
> + ; /* do nothing */
> + else if ((virt = (strcmp(argv[1], "virt-server") == 0)))
> + ; /* do nothing */
> + else
> usage(argv);
>
> for (;;) {
> @@ -810,6 +1095,8 @@ void trace_listen(int argc, char **argv)
> usage(argv);
> break;
> case 'p':
> + if (virt)
> + die("-p only available with listen");
> port = optarg;
> break;
> case 'd':
> @@ -832,7 +1119,7 @@ void trace_listen(int argc, char **argv)
> }
> }
>
> - if (!port)
> + if (!port && nw)
> usage(argv);
>
> if ((argc - optind) >= 2)
> @@ -860,7 +1147,10 @@ void trace_listen(int argc, char **argv)
> signal_setup(SIGINT, finish);
> signal_setup(SIGTERM, finish);
>
> - do_listen(port);
> + if (nw)
> + do_listen_nw(port);
> + else
> + do_listen_virt();
>
> return;
> }
> diff --git a/trace-msg.c b/trace-msg.c
> index 61bde54..0b3b356 100644
> --- a/trace-msg.c
> +++ b/trace-msg.c
> @@ -59,6 +59,11 @@ typedef __be32 be32;
>
> #define CPU_MAX 256
>
> +/* use CONNECTION_MSG as a protocol version of trace-msg */
> +#define MSG_VERSION "V2"
> +#define CONNECTION_MSG "tracecmd-" MSG_VERSION
> +#define CONNECTION_MSGSIZE sizeof(CONNECTION_MSG)
> +
> /* for both client and server */
> bool use_tcp;
> int cpu_count;
> @@ -78,6 +83,10 @@ struct tracecmd_msg_str {
> char *buf;
> } __attribute__((packed));
>
> +struct tracecmd_msg_rconnect {
> + struct tracecmd_msg_str str;
> +};
> +
> struct tracecmd_msg_opt {
> be32 size;
> be32 opt_cmd;
> @@ -104,6 +113,7 @@ struct tracecmd_msg_error {
> be32 size;
> be32 cmd;
> union {
> + struct tracecmd_msg_rconnect rconnect;
> struct tracecmd_msg_tinit tinit;
> struct tracecmd_msg_rinit rinit;
> struct tracecmd_msg_meta meta;
> @@ -111,7 +121,10 @@ struct tracecmd_msg_error {
> } __attribute__((packed));
>
> enum tracecmd_msg_cmd {
> + MSG_ERROR = 0,
> MSG_CLOSE = 1,
> + MSG_TCONNECT = 2,
> + MSG_RCONNECT = 3,
> MSG_TINIT = 4,
> MSG_RINIT = 5,
> MSG_SENDMETA = 6,
> @@ -122,6 +135,7 @@ struct tracecmd_msg {
> be32 size;
> be32 cmd;
> union {
> + struct tracecmd_msg_rconnect rconnect;
> struct tracecmd_msg_tinit tinit;
> struct tracecmd_msg_rinit rinit;
> struct tracecmd_msg_meta meta;
> @@ -155,6 +169,16 @@ static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
> memcpy(dest+offset, buf, buflen);
> }
>
> +static int make_rconnect(const char *buf, int buflen, struct tracecmd_msg *msg)
> +{
> + u32 offset = offsetof(struct tracecmd_msg, data.rconnect.str.buf);
> +
> + msg->data.rconnect.str.size = htonl(buflen);
> + bufcpy(msg, offset, buf, buflen);
> +
> + return 0;
> +}
> +
> enum msg_opt_command {
> MSGOPT_USETCP = 1,
> };
> @@ -232,11 +256,13 @@ static int make_rinit(struct tracecmd_msg *msg)
>
> msg->data.rinit.cpus = htonl(cpu_count);
>
> - for (i = 0; i < cpu_count; i++) {
> - /* + rrqports->cpus or rrqports->port_array[i] */
> - offset += sizeof(be32);
> - port = htonl(port_array[i]);
> - bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
> + if (port_array) {
> + for (i = 0; i < cpu_count; i++) {
> + /* + rrqports->cpus or rrqports->port_array[i] */
> + offset += sizeof(be32);
> + port = htonl(port_array[i]);
> + bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
> + }
> }
>
> return 0;
> @@ -248,6 +274,8 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
> u32 len = 0;
>
> switch (cmd) {
> + case MSG_RCONNECT:
> + return sizeof(msg->data.rconnect.str.size) + CONNECTION_MSGSIZE;
> case MSG_TINIT:
> len = sizeof(msg->data.tinit.cpus)
> + sizeof(msg->data.tinit.page_size)
> @@ -285,6 +313,8 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
> static int tracecmd_msg_make_body(u32 cmd, u32 len, struct tracecmd_msg *msg)
> {
> switch (cmd) {
> + case MSG_RCONNECT:
> + return make_rconnect(CONNECTION_MSG, CONNECTION_MSGSIZE, msg);
> case MSG_TINIT:
> return make_tinit(len, msg);
> case MSG_RINIT:
> @@ -425,6 +455,8 @@ static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
> static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
> {
> char msg_tmp[TRACECMD_MSG_MAX_LEN];
> + char *buf;
> + int offset = TRACECMD_MSG_HDR_LEN;
> u32 cmd;
> int ret;
>
> @@ -437,8 +469,20 @@ static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg **msg)
>
> *msg = (struct tracecmd_msg *)msg_tmp;
> cmd = ntohl((*msg)->cmd);
> - if (cmd == MSG_CLOSE)
> + switch (cmd) {
> + case MSG_RCONNECT:
> + offset += sizeof((*msg)->data.rconnect.str.size);
> + buf = tracecmd_msg_buf_access(*msg, offset);
> + /* Make sure the server is the tracecmd server */
> + if (memcmp(buf, CONNECTION_MSG,
> + ntohl((*msg)->data.rconnect.str.size) - 1) != 0) {
> + warning("server not tracecmd server");
> + return -EPROTONOSUPPORT;
> + }
> + break;
> + case MSG_CLOSE:
> return -ECONNABORTED;
> + }
>
> return 0;
> }
> @@ -495,7 +539,54 @@ static void error_operation_for_server(struct tracecmd_msg *msg)
>
> cmd = ntohl(msg->cmd);
>
> - warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
> + if (cmd == MSG_ERROR)
> + plog("Receive error message: cmd=%d size=%d\n",
> + ntohl(msg->data.err.cmd), ntohl(msg->data.err.size));
> + else
> + warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
> +}
> +
> +int tracecmd_msg_set_connection(int fd, const char *domain)
> +{
> + struct tracecmd_msg *msg;
> + char buf[TRACECMD_MSG_MAX_LEN] = {};
> + u32 cmd;
> + int ret;
> +
> + /*
> + * Wait for connection msg by a client first.
> + * If a client uses virtio-serial, a connection message will
> + * not be sent immediately after accept(). connect() is called
> + * in QEMU, so the client can send the connection message
> + * after guest boots. Therefore, the virt-server patiently
> + * waits for the connection request of a client.
> + */
> + ret = tracecmd_msg_recv(fd, buf);
> + if (ret < 0) {
> + if (!buf[0]) {
> + /* No data means QEMU has already died. */
> + close(fd);
> + die("Connection refuesd: %s", domain);
> + }
> + return -ENOMSG;
> + }
> +
> + msg = (struct tracecmd_msg *)buf;
> + cmd = ntohl(msg->cmd);
> + if (cmd == MSG_CLOSE)
> + return -ECONNABORTED;
> + else if (cmd != MSG_TCONNECT)
> + return -EINVAL;
> +
> + ret = tracecmd_msg_send(fd, MSG_RCONNECT);
> + if (ret < 0)
> + goto error;
> +
> + return 0;
> +
> +error:
> + error_operation_for_server(msg);
> + return ret;
> }
>
> #define MAX_OPTION_SIZE 4096
> diff --git a/trace-recorder.c b/trace-recorder.c
> index 520d486..8169dc3 100644
> --- a/trace-recorder.c
> +++ b/trace-recorder.c
> @@ -149,19 +149,23 @@ tracecmd_create_buffer_recorder_fd2(int fd, int fd2, int cpu, unsigned flags,
> recorder->fd1 = fd;
> recorder->fd2 = fd2;
>
> - path = malloc_or_die(strlen(buffer) + 40);
> - if (!path)
> - goto out_free;
> -
> - if (flags & TRACECMD_RECORD_SNAPSHOT)
> - sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw", buffer, cpu);
> - else
> - sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw", buffer, cpu);
> - recorder->trace_fd = open(path, O_RDONLY);
> - if (recorder->trace_fd < 0)
> - goto out_free;
> -
> - free(path);
> + if (buffer) {
> + path = malloc_or_die(strlen(buffer) + 40);
> + if (!path)
> + goto out_free;
> +
> + if (flags & TRACECMD_RECORD_SNAPSHOT)
> + sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw",
> + buffer, cpu);
> + else
> + sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw",
> + buffer, cpu);
> + recorder->trace_fd = open(path, O_RDONLY);
> + if (recorder->trace_fd < 0)
> + goto out_free;
> +
> + free(path);
> + }
>
> if ((recorder->flags & TRACECMD_RECORD_NOSPLICE) == 0) {
> ret = pipe(recorder->brass);
> @@ -184,8 +188,9 @@ tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *
> return tracecmd_create_buffer_recorder_fd2(fd, -1, cpu, flags, buffer, 0);
> }
>
> -struct tracecmd_recorder *
> -tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer)
> +static struct tracecmd_recorder *
> +__tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
> + const char *buffer)
> {
> struct tracecmd_recorder *recorder;
> int fd;
> @@ -248,6 +253,25 @@ tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags,
> goto out;
> }
>
> +struct tracecmd_recorder *
> +tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
> + const char *buffer)
> +{
> + return __tracecmd_create_buffer_recorder(file, cpu, flags, buffer);
> +}
> +
> +struct tracecmd_recorder *
> +tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd)
> +{
> + struct tracecmd_recorder *recorder;
> +
> + recorder = __tracecmd_create_buffer_recorder(file, cpu, 0, NULL);
> + if (recorder)
> + recorder->trace_fd = trace_fd;
> +
> + return recorder;
> +}
> +
> struct tracecmd_recorder *tracecmd_create_recorder_fd(int fd, int cpu, unsigned flags)
> {
> char *tracing;
> diff --git a/trace-usage.c b/trace-usage.c
> index b8f26e6..e6a239f 100644
> --- a/trace-usage.c
> +++ b/trace-usage.c
> @@ -153,6 +153,16 @@ static struct usage_help usage_help[] = {
> " -l logfile to write messages to.\n"
> },
> {
> + "virt-server",
> + "listen on a virtio-serial for trace clients",
> + " %s virt-server [-o file][-d dir][-l logfile]\n"
> + " Creates a socket to listen for clients.\n"
> + " -D create it in daemon mode.\n"
> + " -o file name to use for clients.\n"
> + " -d diretory to store client files.\n"
> + " -l logfile to write messages to.\n"
> + },
> + {
> "list",
> "list the available events, plugins or options",
> " %s list [-e][-t][-o][-f [regex]]\n"

2013-10-18 15:06:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

On Fri, 13 Sep 2013 11:06:27 +0900
Yoshihiro YUNOMAE <[email protected]> wrote:

> Hi Steven,
>
> This is a v2 patch set for realizing a part of "Integrated trace" feature which
> is a trace merging system for a virtualization environment. Currently, trace-cmd
> does not have following features yet:
>
> a) Server and client for a virtualization environment
> b) Structured message platform between guests and host
> c) Agent feature of a client
> d) Merge feature of trace data of multiple guests and host in chronological
> order
>

Hi Yoshihiro,

I do like your patch set, but I was thinking as this is becoming a new
ABI between the client and the server, I really want to spend a lot
more time looking at this before I pull it.

I'll be traveling again on Monday for 10 days (going to Kernel Summit
than the RT summit), and I'll try to review it more during my trip. But
even then, I really want to make sure the communication is robust. I
hate to have a "V3" and then need to support two older versions.

Thanks,

-- Steve

2013-10-22 08:53:10

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] trace-cmd: Support the feature recording trace data of guests on the host

Hi Steven,

(2013/10/19 0:06), Steven Rostedt wrote:
> On Fri, 13 Sep 2013 11:06:27 +0900
> Yoshihiro YUNOMAE <[email protected]> wrote:
>
>> Hi Steven,
>>
>> This is a v2 patch set for realizing a part of "Integrated trace" feature which
>> is a trace merging system for a virtualization environment. Currently, trace-cmd
>> does not have following features yet:
>>
>> a) Server and client for a virtualization environment
>> b) Structured message platform between guests and host
>> c) Agent feature of a client
>> d) Merge feature of trace data of multiple guests and host in chronological
>> order
>>
>
> Hi Yoshihiro,
>
> I do like your patch set, but I was thinking as this is becoming a new
> ABI between the client and the server, I really want to spend a lot
> more time looking at this before I pull it.
>
> I'll be traveling again on Monday for 10 days (going to Kernel Summit
> than the RT summit), and I'll try to review it more during my trip. But
> even then, I really want to make sure the communication is robust. I
> hate to have a "V3" and then need to support two older versions.

Sure, I agree with you:)
Thank you for spending a lot of time on reviewing this patch set.

Thanks,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-10-22 08:53:53

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: Re: Re: [PATCH V2 2/5] trace-cmd: Apply the trace-msg protocol for communication between a server and clients

(2013/10/18 11:19), Steven Rostedt wrote:
> On Fri, 13 Sep 2013 11:06:32 +0900
> Yoshihiro YUNOMAE <[email protected]> wrote:
>
>
>> diff --git a/trace-msg.c b/trace-msg.c
>> new file mode 100644
>> index 0000000..cf82ff6
>> --- /dev/null
>> +++ b/trace-msg.c
>> @@ -0,0 +1,683 @@
>> +/*
>> + * trace-msg.c : define message protocol for communication between clients and
>> + * a server
>> + *
>> + * Copyright (C) 2013 Hitachi, Ltd.
>> + * Created by Yoshihiro YUNOMAE <[email protected]>
>> + *
>> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; version 2 of the License (not later!)
>> + *
>
> BTW, as this can be used in a library, would be OK with re-licensing
> this as the LGPL v2.1?
>
> I try to keep all the helper C files as LGPL and the program files as
> GPL. I would have made all of trace-cmd LGPL, but there was some code I
> needed that was GPL, and I could only use it if I had the program as
> GPL code.

No problem.
When I resubmit this patch set, I'll change this license.

Thanks,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-10-22 08:55:41

by Yoshihiro YUNOMAE

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] trace-cmd: Add virt-server mode for a virtualization environment

(2013/10/18 11:32), Steven Rostedt wrote:
> On Fri, 13 Sep 2013 11:06:37 +0900
> Yoshihiro YUNOMAE <[email protected]> wrote:
>
>> static int *create_all_readers(int cpus, const char *node, const char *port,
>> - int pagesize, int fd)
>> + const char *domain, int virtpid, int pagesize, int fd)
>> {
>> char buf[BUFSIZ];
>> - int *port_array;
>> + int *port_array = NULL;
>> int *pid_array;
>> int start_port;
>> int udp_port;
>> int cpu;
>> int pid;
>>
>> - port_array = malloc_or_die(sizeof(int) * cpus);
>> + if (node) {
>> + port_array = malloc_or_die(sizeof(int) * cpus);
>> + start_port = START_PORT_SEARCH;
>> + }
>> pid_array = malloc_or_die(sizeof(int) * cpus);
>> memset(pid_array, 0, sizeof(int) * cpus);
>>
>> - start_port = START_PORT_SEARCH;
>> -
>> - /* Now create a UDP port for each CPU */
>> + /* Now create a reader for each CPU */
>> for (cpu = 0; cpu < cpus; cpu++) {
>> - udp_port = open_udp(node, port, &pid, cpu,
>> - pagesize, start_port);
>> - if (udp_port < 0)
>> - goto out_free;
>> - port_array[cpu] = udp_port;
>> + if (node) {
>> + udp_port = open_udp(node, port, &pid, cpu,
>> + pagesize, start_port);
>> + if (udp_port < 0)
>> + goto out_free;
>> + port_array[cpu] = udp_port;
>> + /*
>> + * due to some bugging finding ports,
>
> s/due/Due/

Thanks.

>> + * force search after last port
>> + */
>> + start_port = udp_port + 1;
>> + } else {
>> + if (open_virtio_serial_pipe(&pid, cpu, pagesize,
>> + domain, virtpid) < 0)
>> + goto out_free;
>> + }
>> pid_array[cpu] = pid;
>> /*
>> * Due to some bugging finding ports,
>
> Hmm, it seems that you added the start_port = udp_port + 1 above, but
> shouldn't you remove the one here?

Oh, you're right.
I'll delete it here.

>> @@ -482,7 +595,7 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
>> return pid_array;
>>
>> out_free:
>> - destroy_all_readers(cpus, pid_array, node, port);
>> + destroy_all_readers(cpus, pid_array, node, port, domain, virtpid);
>> return NULL;
>> }
>>
>> @@ -524,7 +637,7 @@ static void stop_all_readers(int cpus, int *pid_array)
>> }
>>
>> static void put_together_file(int cpus, int ofd, const char *node,
>> - const char *port)
>> + const char *port, const char *domain, int virtpid)
>> {
>> char **temp_files;
>> int cpu;
>> @@ -533,25 +646,31 @@ static void put_together_file(int cpus, int ofd, const char *node,
>> temp_files = malloc_or_die(sizeof(*temp_files) * cpus);
>>
>> for (cpu = 0; cpu < cpus; cpu++)
>> - temp_files[cpu] = get_temp_file(node, port, cpu);
>> + temp_files[cpu] = get_temp_file(node, port, domain,
>> + virtpid, cpu);
>>
>> tracecmd_attach_cpu_data_fd(ofd, cpus, temp_files);
>> free(temp_files);
>> }
>>
>> -static void process_client(const char *node, const char *port, int fd)
>> +static void process_client(const char *node, const char *port,
>> + const char *domain, int virtpid, int fd)
>> {
>> int *pid_array;
>> int pagesize;
>> int cpus;
>> int ofd;
>>
>> - if (communicate_with_client(fd, &cpus, &pagesize) < 0)
>> - return;
>> -
>> - ofd = create_client_file(node, port);
>> + if (node) {
>> + if (communicate_with_client_nw(fd, &cpus, &pagesize) < 0)
>
> I take it _nw is for "network". If so, please use "*_net" instead. "nw"
> is pretty meaningless.
>
> This applies for all functions.

OK, I'll rename all functions using _nw.

Thanks,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]