Subject: [PATCH trace-cmd V6 4/7] trace-cmd/virt-server: Add virt-server mode for a virtualization environment
From: Masami Hiramatsu
To: Steven Rostedt
Cc: Yoshihiro YUNOMAE, Aaron Fabbri, linux-kernel@vger.kernel.org, cti.systems-productivity-manager.ts@hitachi.com, Divya Vyas, Hidehiro Kawai, yoshihiro.yunomae@aktsk.jp
Date: Tue, 26 May 2015 15:55:30 +0900
Message-ID: <20150526065530.16023.20448.stgit@localhost.localdomain>
In-Reply-To: <20150526065522.16023.30813.stgit@localhost.localdomain>
References: <20150526065522.16023.30813.stgit@localhost.localdomain>

From: Yoshihiro YUNOMAE

Add a virt-server mode, based on the listen mode, for virtualization
environments. It works as a client/server mode over a virtio-serial
channel instead of TCP/UDP. Since the throughput of trace data can be
huge, a traditional IP network easily incurs high overhead. Using
virtio-serial reduces that overhead because it bypasses the guest and
host TCP/IP network stacks.

virt-server uses two kinds of virtio-serial I/Fs:
 (1) agent-ctl-path (UNIX domain socket)
      => control path of the trace-cmd agent in each guest
 (2) trace-path-cpuX (named pipe)
      => trace data path for each vcpu

These I/Fs must be defined at the following paths:
 (1) /tmp/trace-cmd/virt/agent-ctl-path
 (2) /tmp/trace-cmd/virt//trace-path-cpuX

When virt-server runs, the agent-ctl-path I/F is created automatically
because virt-server acts as the server side of the UNIX domain socket.
However, trace-path-cpuX is not created automatically because trace data
must be kept separate for each guest.

Over virtio-serial, the V2 protocol is slightly changed because the server
cannot notice when a client connects. The details are described in
Documentation/Protocol.txt.

NOTE: This feature requires SELinux to be disabled (or set to permissive)
because qemu has to open a (non-registered) UNIX domain socket.

1. Run virt-server on a host before booting guests
   # trace-cmd virt-server

2. Make guest domain directory
   # mkdir -p /tmp/trace-cmd/virt/
   # chmod 710 /tmp/trace-cmd/virt/
   # chgrp qemu /tmp/trace-cmd/virt/

3. Make FIFO on the host
   # mkfifo /tmp/trace-cmd/virt//trace-path-cpu{0,1,...,X}.{in,out}

4. Set up virtio-serial pipes of the guest on the host
   Add the following tags to domain XML files.
   # virsh edit
   ... (cpu1, cpu2, ...)

5. Boot the guest
   # virsh start

6. Check I/F of virtio-serial on the guest
   # ls /dev/virtio-ports
     ...
     agent-ctl-path
     ...
     trace-path-cpu0
     ...

Next, the user runs trace-cmd on the guest with the record --virt option
or other options for virtualization.

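As a rough illustration of the path layout described above, the sketch
below formats the host-side FIFO name that the server opens for one vcpu.
The two macros are quoted from this patch (trace-listen.c); the helper
trace_fifo_path() and the domain name "guest1" are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path layout quoted from the patch (VIRT_DIR, TRACE_PATH_DOMAIN_CPU). */
#define VIRT_DIR "/tmp/trace-cmd/virt/"
#define TRACE_PATH_DOMAIN_CPU VIRT_DIR "%s/trace-path-cpu%d.out"

/*
 * Hypothetical helper, not part of the patch: format the host-side FIFO
 * that carries trace data for one vcpu of one guest domain.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int trace_fifo_path(char *buf, size_t len, const char *domain, int cpu)
{
	int n;

	assert(buf != NULL && domain != NULL);
	n = snprintf(buf, len, TRACE_PATH_DOMAIN_CPU, domain, cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with the hypothetical domain "guest1" and cpu 0, the helper fills
buf with /tmp/trace-cmd/virt/guest1/trace-path-cpu0.out, which is the
".out" side of the FIFO pair created in step 3.
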
This patch adds only the minimum features of virt-server, as follows:
 - virt-server subcommand
 - Create I/F directory (/tmp/trace-cmd/virt/)
 - Use named pipe I/Fs of virtio-serial for trace data paths
 - Use UNIX domain socket for connecting clients on guests
 - Use splice(2) for collecting trace data of guests
 - libvirt is required for finding guest domain name
 - User must set up FIFOs by hand
 - VCPU hotplug is not supported
 - Interface directory is fixed
 - SELinux should be disabled

Signed-off-by: Yoshihiro YUNOMAE
Signed-off-by: Masami Hiramatsu
---
Changes in V5: Change patch description
               Update protocol document
Changes in V4: Fix some typos and cleanup
Changes in V3: Change _nw/_NW to _net/_NET
---
 Documentation/Protocol.txt                |   44 +++
 Documentation/trace-cmd-virt-server.1.txt |   89 ++++++
 trace-cmd.c                               |    3 
 trace-cmd.h                               |    2 
 trace-listen.c                            |  467 ++++++++++++++++++++++++-----
 trace-msg.c                               |  105 ++++++-
 trace-recorder.c                          |   50 ++-
 trace-usage.c                             |   10 +
 8 files changed, 667 insertions(+), 103 deletions(-)
 create mode 100644 Documentation/trace-cmd-virt-server.1.txt

diff --git a/Documentation/Protocol.txt b/Documentation/Protocol.txt
index 49f7766..52df89e 100644
--- a/Documentation/Protocol.txt
+++ b/Documentation/Protocol.txt
@@ -6,6 +6,7 @@ Index
 1. What is the trace-cmd protocol?
 2. Trace-cmd Protocol V1 (Obsolete)
 3. Trace-cmd Protocol V2
+4. Trace-cmd Protocol V2 in virt-server mode
 
 
 1. What is the trace-cmd protocol?
@@ -117,3 +118,46 @@ or not by checking the first message from the client. If client sends a positive
 number, it should be a V1 protocol client.
 
 
+4. Trace-cmd Protocol V2 in virt-server mode
+============================================
+
+In the virt-server mode, trace-cmd uses a control channel and
+trace data channels of virtio-serial to transfer trace data.
+
+Since the virtio-serial channel is just a character device
+on the guest, the server can not notice when a client attaches
+to (means opens) the channel.
Thus, the server waits for the
+connection message MSG_TCONNECT from the client on the control
+channel. The protocol flow is as follows:
+
+
+ Open a control channel
+ wait for MSG_TCONNECT
+                          open a virtio-serial channel
+                          send MSG_TCONNECT
+ receive MSG_TCONNECT <----+
+ send MSG_RCONNECT
+        +---------------> receive MSG_RCONNECT
+                          check "tracecmd-V2"
+                          send MSG_TINIT with cpus, pagesize and options
+ receive MSG_TINIT <-------+
+ parse the parameters
+ send MSG_RINIT with port_array
+       +----------------> receive MSG_RINIT
+                          get port_array
+                          send meta data(MSG_SENDMETA)
+ receive MSG_SENDMETA <----+
+ record meta data
+                          (snip)
+                          send a message to finish sending meta data
+                            | (MSG_FINMETA)
+ receive MSG_FINMETA <-----+
+ read block
+    --- start sending trace data on child processes ---
+
+    --- When client finishes sending trace data ---
+                          send MSG_CLOSE
+ receive MSG_CLOSE <-------+
+                          close the virtio-serial channel
+
+
diff --git a/Documentation/trace-cmd-virt-server.1.txt b/Documentation/trace-cmd-virt-server.1.txt
new file mode 100644
index 0000000..b775745
--- /dev/null
+++ b/Documentation/trace-cmd-virt-server.1.txt
@@ -0,0 +1,89 @@
+TRACE-CMD-VIRT-SERVER(1)
+========================
+
+NAME
+----
+trace-cmd-virt-server - listen for incoming connections to record tracing of
+                        guests' clients
+
+SYNOPSIS
+--------
+*trace-cmd virt-server* ['OPTIONS']
+
+DESCRIPTION
+-----------
+The trace-cmd(1) virt-server sets up UNIX domain socket I/F for communicating
+with guests' clients that run 'trace-cmd-record(1)' with the *--virt* option.
+When a connection is made, and the guest's client sends data, it will create a
+file called 'trace.DOMAIN.dat', where DOMAIN is the name of the guest named
+by libvirt.
+
+OPTIONS
+-------
+*-D*::
+    This option causes trace-cmd listen to go into a daemon mode and run in
+    the background.
+
+*-d* 'dir'::
+    This option specifies a directory to write the data files into.
+
+*-o* 'filename'::
+    This option overrides the default 'trace' in the 'trace.DOMAIN.dat' that
+    is created when guest's client connects.
+
+*-l* 'filename'::
+    This option writes the output messages to a log file instead of standard output.
+
+SETTING
+-------
+An example setup is as follows:
+
+1. Run virt-server on a host
+   # trace-cmd virt-server
+
+2. Make guest domain directory
+   # mkdir -p /tmp/trace-cmd/virt/
+   # chmod 710 /tmp/trace-cmd/virt/
+   # chgrp qemu /tmp/trace-cmd/virt/
+
+3. Make FIFO on the host
+   # mkfifo /tmp/trace-cmd/virt//trace-path-cpu{0,1,...,X}.{in,out}
+
+4. Set up virtio-serial pipes of a guest on the host
+   Add the following tags to domain XML files.
+   # virsh edit
+
+
+
+
+
+
+
+
+   ... (cpu1, cpu2, ...)
+
+5. Boot the guest
+   # virsh start
+
+6. Run the guest's client (see trace-cmd-record(1) with the *--virt* option)
+   # trace-cmd record -e sched* --virt
+
+SEE ALSO
+--------
+trace-cmd(1), trace-cmd-record(1), trace-cmd-report(1), trace-cmd-start(1),
+trace-cmd-stop(1), trace-cmd-extract(1), trace-cmd-reset(1),
+trace-cmd-split(1), trace-cmd-list(1)
+
+AUTHOR
+------
+Written by Masami Hiramatsu
+
+RESOURCES
+---------
+git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
+
+COPYING
+-------
+Copyright \(C) 2013,2014 Hitachi, Ltd. Free use of this software is
+granted under the terms of the GNU Public License (GPL).
+ diff --git a/trace-cmd.c b/trace-cmd.c index 4c5b564..29a2bb8 100644 --- a/trace-cmd.c +++ b/trace-cmd.c @@ -425,7 +425,8 @@ int main (int argc, char **argv) } else if (strcmp(argv[1], "mem") == 0) { trace_mem(argc, argv); exit(0); - } else if (strcmp(argv[1], "listen") == 0) { + } else if (strcmp(argv[1], "listen") == 0 || + strcmp(argv[1], "virt-server") == 0) { trace_listen(argc, argv); exit(0); } else if (strcmp(argv[1], "split") == 0) { diff --git a/trace-cmd.h b/trace-cmd.h index 1261e23..a93920f 100644 --- a/trace-cmd.h +++ b/trace-cmd.h @@ -257,6 +257,7 @@ struct tracecmd_recorder *tracecmd_create_recorder_maxkb(const char *file, int c struct tracecmd_recorder *tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *buffer); struct tracecmd_recorder *tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer); struct tracecmd_recorder *tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags, const char *buffer, int maxkb); +struct tracecmd_recorder *tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd); int tracecmd_start_recording(struct tracecmd_recorder *recorder, unsigned long sleep); void tracecmd_stop_recording(struct tracecmd_recorder *recorder); @@ -270,6 +271,7 @@ int tracecmd_msg_finish_sending_metadata(int fd); void tracecmd_msg_send_close_msg(void); /* for server */ +int tracecmd_msg_set_connection(int fd, const char *domain); int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize); int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports); int tracecmd_msg_collect_metadata(int ifd, int ofd); diff --git a/trace-listen.c b/trace-listen.c index 17ab184..718680f 100644 --- a/trace-listen.c +++ b/trace-listen.c @@ -23,9 +23,13 @@ #include #include #include +#include +#include #include #include #include +#include +#include #include #include #include @@ -50,19 +54,42 @@ static int backlog = 5; static int proto_ver; -#define 
TEMP_FILE_STR "%s.%s:%s.cpu%d", output_file, host, port, cpu -static char *get_temp_file(const char *host, const char *port, int cpu) +enum { + NET = 1, + VIRT = 2, +}; + +#define TEMP_FILE_STR_NET "%s.%s:%s.cpu%d", output_file, host, port, cpu +#define TEMP_FILE_STR_VIRT "%s.%s:%d.cpu%d", output_file, domain, virtpid, cpu +static char *get_temp_file(const char *host, const char *port, + const char *domain, int virtpid, int cpu, int mode) { char *file = NULL; int size; - size = snprintf(file, 0, TEMP_FILE_STR); - file = malloc_or_die(size + 1); - sprintf(file, TEMP_FILE_STR); + if (mode == NET) { + size = snprintf(file, 0, TEMP_FILE_STR_NET); + file = malloc_or_die(size + 1); + sprintf(file, TEMP_FILE_STR_NET); + } else if (mode == VIRT) { + size = snprintf(file, 0, TEMP_FILE_STR_VIRT); + file = malloc_or_die(size + 1); + sprintf(file, TEMP_FILE_STR_VIRT); + } return file; } +static char *get_temp_file_net(const char *host, const char *port, int cpu) +{ + return get_temp_file(host, port, NULL, 0, cpu, NET); +} + +static char *get_temp_file_virt(const char *domain, int virtpid, int cpu) +{ + return get_temp_file(NULL, NULL, domain, virtpid, cpu, VIRT); +} + static void put_temp_file(char *file) { free(file); @@ -81,11 +108,15 @@ static void signal_setup(int sig, sighandler_t handle) sigaction(sig, &action, NULL); } -static void delete_temp_file(const char *host, const char *port, int cpu) +static void delete_temp_file(const char *host, const char *port, + const char *domain, int virtpid, int cpu, int mode) { char file[MAX_PATH]; - snprintf(file, MAX_PATH, TEMP_FILE_STR); + if (mode == NET) + snprintf(file, MAX_PATH, TEMP_FILE_STR_NET); + else if (mode == VIRT) + snprintf(file, MAX_PATH, TEMP_FILE_STR_VIRT); unlink(file); } @@ -113,8 +144,12 @@ static int process_option(char *option) return 0; } +static struct tracecmd_recorder *recorder; + static void finish(int sig) { + if (recorder) + tracecmd_stop_recording(recorder); done = 1; } @@ -184,7 +219,7 @@ static void 
process_udp_child(int sfd, const char *host, const char *port, signal_setup(SIGUSR1, finish); - tempfile = get_temp_file(host, port, cpu); + tempfile = get_temp_file_net(host, port, cpu); fd = open(tempfile, O_WRONLY | O_TRUNC | O_CREAT, 0644); if (fd < 0) pdie("creating %s", tempfile); @@ -225,6 +260,28 @@ static void process_udp_child(int sfd, const char *host, const char *port, exit(0); } +#define SLEEP_DEFAULT 1000 + +static void process_virt_child(int fd, int cpu, int pagesize, + const char *domain, int virtpid) +{ + char *tempfile; + + signal_setup(SIGUSR1, finish); + tempfile = get_temp_file_virt(domain, virtpid, cpu); + + recorder = tracecmd_create_recorder_virt(tempfile, cpu, fd); + + do { + if (tracecmd_start_recording(recorder, SLEEP_DEFAULT) < 0) + break; + } while (!done); + + tracecmd_free_recorder(recorder); + put_temp_file(tempfile); + exit(0); +} + #define START_PORT_SEARCH 1500 #define MAX_PORT_SEARCH 6000 @@ -272,20 +329,37 @@ static int udp_bind_a_port(int start_port, int *sfd) return num_port; } -static void fork_udp_reader(int sfd, const char *node, const char *port, - int *pid, int cpu, int pagesize) +static void fork_reader(int sfd, const char *node, const char *port, + int *pid, int cpu, int pagesize, const char *domain, + int virtpid, int mode) { *pid = fork(); if (*pid < 0) - pdie("creating udp reader"); + pdie("creating reader"); - if (!*pid) - process_udp_child(sfd, node, port, cpu, pagesize); + if (!*pid) { + if (mode == NET) + process_udp_child(sfd, node, port, cpu, pagesize); + else if (mode == VIRT) + process_virt_child(sfd, cpu, pagesize, domain, virtpid); + } close(sfd); } +static void fork_udp_reader(int sfd, const char *node, const char *port, + int *pid, int cpu, int pagesize) +{ + fork_reader(sfd, node, port, pid, cpu, pagesize, NULL, 0, NET); +} + +static void fork_virt_reader(int sfd, int *pid, int cpu, int pagesize, + const char *domain, int virtpid) +{ + fork_reader(sfd, NULL, NULL, pid, cpu, pagesize, domain, virtpid, 
VIRT); +} + static int open_udp(const char *node, const char *port, int *pid, int cpu, int pagesize, int start_port) { @@ -305,6 +379,29 @@ static int open_udp(const char *node, const char *port, int *pid, return num_port; } +#define TRACE_CMD_DIR "/tmp/trace-cmd/" +#define VIRT_DIR TRACE_CMD_DIR "virt/" +#define VIRT_TRACE_CTL_SOCK VIRT_DIR "agent-ctl-path" +#define TRACE_PATH_DOMAIN_CPU VIRT_DIR "%s/trace-path-cpu%d.out" + +static int open_virtio_serial_pipe(int *pid, int cpu, int pagesize, + const char *domain, int virtpid) +{ + char buf[PATH_MAX]; + int fd; + + snprintf(buf, PATH_MAX, TRACE_PATH_DOMAIN_CPU, domain, cpu); + fd = open(buf, O_RDONLY | O_NONBLOCK); + if (fd < 0) { + warning("open %s", buf); + return fd; + } + + fork_virt_reader(fd, pid, cpu, pagesize, domain, virtpid); + + return fd; +} + /* Setup client who is using the v1 protocol */ static int client_initial_setting(int fd, char *buf, int *cpus, int *pagesize) { @@ -369,7 +466,7 @@ static int client_initial_setting(int fd, char *buf, int *cpus, int *pagesize) return 0; } -static int communicate_with_client(int fd, int *cpus, int *pagesize) +static int communicate_with_client_net(int fd, int *cpus, int *pagesize) { char buf[BUFSIZ]; int n; @@ -407,12 +504,32 @@ static int communicate_with_client(int fd, int *cpus, int *pagesize) return 0; } -static int create_client_file(const char *node, const char *port) +static int communicate_with_client_virt(int fd, const char *domain, int *cpus, int *pagesize) +{ + proto_ver = V2_PROTOCOL; + + if (tracecmd_msg_set_connection(fd, domain) < 0) + return -1; + + /* read the CPU count, the page size, and options */ + if (tracecmd_msg_initial_setting(fd, cpus, pagesize) < 0) + return -1; + + return 0; +} + +static int create_client_file(const char *node, const char *port, + const char *domain, int pid, int mode) { char buf[BUFSIZ]; int ofd; - snprintf(buf, BUFSIZ, "%s.%s:%s.dat", output_file, node, port); + if (mode == NET) + snprintf(buf, BUFSIZ, "%s.%s:%s.dat", 
output_file, node, port); + else if (mode == VIRT) + snprintf(buf, BUFSIZ, "%s.%s:%d.dat", output_file, domain, pid); + else + plog("create_client_file: Unsupported mode %d", mode); ofd = open(buf, O_RDWR | O_CREAT | O_TRUNC, 0644); if (ofd < 0) @@ -421,7 +538,8 @@ static int create_client_file(const char *node, const char *port) } static void destroy_all_readers(int cpus, int *pid_array, const char *node, - const char *port) + const char *port, const char *domain, + int virtpid, int mode) { int cpu; @@ -429,42 +547,50 @@ static void destroy_all_readers(int cpus, int *pid_array, const char *node, if (pid_array[cpu] > 0) { kill(pid_array[cpu], SIGKILL); waitpid(pid_array[cpu], NULL, 0); - delete_temp_file(node, port, cpu); + delete_temp_file(node, port, domain, virtpid, cpu, mode); pid_array[cpu] = 0; } } } static int *create_all_readers(int cpus, const char *node, const char *port, - int pagesize, int fd) + const char *domain, int virtpid, int pagesize, + int fd, int mode) { char buf[BUFSIZ]; - int *port_array; + int *port_array = NULL; int *pid_array; int start_port; int udp_port; int cpu; int pid; - port_array = malloc_or_die(sizeof(int) * cpus); + if (mode == NET) { + port_array = malloc_or_die(sizeof(int) * cpus); + start_port = START_PORT_SEARCH; + } pid_array = malloc_or_die(sizeof(int) * cpus); memset(pid_array, 0, sizeof(int) * cpus); - start_port = START_PORT_SEARCH; - - /* Now create a UDP port for each CPU */ + /* Now create a reader for each CPU */ for (cpu = 0; cpu < cpus; cpu++) { - udp_port = open_udp(node, port, &pid, cpu, - pagesize, start_port); - if (udp_port < 0) - goto out_free; - port_array[cpu] = udp_port; + if (node) { + udp_port = open_udp(node, port, &pid, cpu, + pagesize, start_port); + if (udp_port < 0) + goto out_free; + port_array[cpu] = udp_port; + /* + * Due to some bugging finding ports, + * force search after last port + */ + start_port = udp_port + 1; + } else { + if (open_virtio_serial_pipe(&pid, cpu, pagesize, + domain, virtpid) 
< 0) + goto out_free; + } pid_array[cpu] = pid; - /* - * Due to some bugging finding ports, - * force search after last port - */ - start_port = udp_port + 1; } if (proto_ver == V2_PROTOCOL) { @@ -485,7 +611,7 @@ static int *create_all_readers(int cpus, const char *node, const char *port, return pid_array; out_free: - destroy_all_readers(cpus, pid_array, node, port); + destroy_all_readers(cpus, pid_array, node, port, domain, virtpid, mode); return NULL; } @@ -527,7 +653,8 @@ static void stop_all_readers(int cpus, int *pid_array) } static void put_together_file(int cpus, int ofd, const char *node, - const char *port) + const char *port, const char *domain, int virtpid, + int mode) { char **temp_files; int cpu; @@ -536,25 +663,33 @@ static void put_together_file(int cpus, int ofd, const char *node, temp_files = malloc_or_die(sizeof(*temp_files) * cpus); for (cpu = 0; cpu < cpus; cpu++) - temp_files[cpu] = get_temp_file(node, port, cpu); + temp_files[cpu] = get_temp_file(node, port, domain, + virtpid, cpu, mode); tracecmd_attach_cpu_data_fd(ofd, cpus, temp_files); free(temp_files); } -static void process_client(const char *node, const char *port, int fd) +static void process_client(int fd, const char *node, const char *port, + const char *domain, int virtpid, int mode) { int *pid_array; int pagesize; int cpus; int ofd; - if (communicate_with_client(fd, &cpus, &pagesize) < 0) - return; - - ofd = create_client_file(node, port); - - pid_array = create_all_readers(cpus, node, port, pagesize, fd); + if (mode == NET) { + if (communicate_with_client_net(fd, &cpus, &pagesize) < 0) + return; + } else if (mode == VIRT) { + if (communicate_with_client_virt(fd, domain, &cpus, &pagesize) < 0) + return; + } else + pdie("process_client: Unsupported mode %d", mode); + + ofd = create_client_file(node, port, domain, virtpid, mode); + pid_array = create_all_readers(cpus, node, port, domain, virtpid, + pagesize, fd, mode); if (!pid_array) return; @@ -573,9 +708,22 @@ static void 
process_client(const char *node, const char *port, int fd) /* wait a little to have the readers clean up */ sleep(1); - put_together_file(cpus, ofd, node, port); + put_together_file(cpus, ofd, node, port, domain, virtpid, mode); + + destroy_all_readers(cpus, pid_array, node, port, domain, virtpid, mode); +} + +static void process_client_net(int fd, const char *node, const char *port) +{ + process_client(fd, node, port, NULL, 0, NET); +} - destroy_all_readers(cpus, pid_array, node, port); +static void process_client_virt(int fd, const char *domain, int virtpid) +{ + /* keep connection to qemu if clients on guests finish operation */ + do { + process_client(fd, NULL, NULL, domain, virtpid, VIRT); + } while (!done); } static int do_fork(int cfd) @@ -602,32 +750,104 @@ static int do_fork(int cfd) return 0; } -static int do_connection(int cfd, struct sockaddr_storage *peer_addr, - socklen_t peer_addr_len) +static int get_virtpid(int cfd) { - char host[NI_MAXHOST], service[NI_MAXSERV]; - int s; + struct ucred cr; + socklen_t cl; int ret; - ret = do_fork(cfd); - if (ret) + cl = sizeof(cr); + ret = getsockopt(cfd, SOL_SOCKET, SO_PEERCRED, &cr, &cl); + if (ret < 0) return ret; - s = getnameinfo((struct sockaddr *)peer_addr, peer_addr_len, - host, NI_MAXHOST, - service, NI_MAXSERV, NI_NUMERICSERV); + return cr.pid; +} - if (s == 0) - plog("Connected with %s:%s\n", - host, service); - else { - plog("Error with getnameinfo: %s\n", - gai_strerror(s)); - close(cfd); - return -1; +#define LIBVIRT_DOMAIN_PATH "/var/run/libvirt/qemu/" + +/* We can convert pid to domain name of a guest when we use libvirt. 
*/ +static char *get_guest_domain_from_pid(int pid) +{ + struct dirent *dirent; + char file_name[NAME_MAX]; + char *file_name_ret, *domain; + char buf[BUFSIZ]; + DIR *dir; + size_t doml; + int fd; + + dir = opendir(LIBVIRT_DOMAIN_PATH); + if (!dir) { + if (errno == ENOENT) + warning("Only support for using libvirt"); + return NULL; + } + + for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) { + snprintf(file_name, NAME_MAX, LIBVIRT_DOMAIN_PATH"%s", + dirent->d_name); + file_name_ret = strstr(file_name, ".pid"); + if (file_name_ret) { + fd = open(file_name, O_RDONLY); + if (fd < 0) + return NULL; + if (read(fd, buf, BUFSIZ) < 0) + return NULL; + + if (pid == atoi(buf)) { + /* not include /var/run/libvirt/qemu */ + doml = (size_t)(file_name_ret - file_name) + - strlen(LIBVIRT_DOMAIN_PATH); + domain = strndup(file_name + + strlen(LIBVIRT_DOMAIN_PATH), + doml); + plog("start %s:%d\n", domain, pid); + return domain; + } + } } - process_client(host, service, cfd); + return NULL; +} + +static int do_connection(int cfd, struct sockaddr *peer_addr, + socklen_t peer_addr_len, int mode) +{ + char host[NI_MAXHOST], service[NI_MAXSERV]; + int s, ret, virtpid; + char *domain = NULL; + + if (mode == VIRT) { + virtpid = get_virtpid(cfd); + if (virtpid < 0) + return virtpid; + + domain = get_guest_domain_from_pid(virtpid); + if (!domain) + return -1; + } + + ret = do_fork(cfd); + if (ret) + return ret; + + if (mode == NET) { + s = getnameinfo(peer_addr, peer_addr_len, host, NI_MAXHOST, + service, NI_MAXSERV, NI_NUMERICSERV); + + if (s == 0) + plog("Connected with %s:%s\n", + host, service); + else { + plog("Error with getnameinfo: %s\n", + gai_strerror(s)); + close(cfd); + return -1; + } + process_client_net(cfd, host, service); + } else if (mode == VIRT) + process_client_virt(cfd, domain, virtpid); close(cfd); @@ -681,12 +901,11 @@ static void remove_process(int pid) static void kill_clients(void) { - int status; int i; for (i = 0; i < saved_pids; i++) { 
kill(client_pids[i], SIGINT); - waitpid(client_pids[i], &status, 0); + waitpid(client_pids[i], NULL, 0); } saved_pids = 0; @@ -705,31 +924,38 @@ static void clean_up(int sig) } while (ret > 0); } -static void do_accept_loop(int sfd) +static void do_accept_loop(int sfd, int mode) { - struct sockaddr_storage peer_addr; - socklen_t peer_addr_len; + struct sockaddr addr; + socklen_t addrlen; int cfd, pid; - peer_addr_len = sizeof(peer_addr); + if (mode == NET) + addrlen = sizeof(struct sockaddr_storage); + else if (mode == VIRT) + addrlen = sizeof(struct sockaddr_un); + else + pdie("do_accept_loop: Unsupported mode %d", mode); do { - cfd = accept(sfd, (struct sockaddr *)&peer_addr, - &peer_addr_len); + cfd = accept(sfd, &addr, &addrlen); printf("connected!\n"); if (cfd < 0 && errno == EINTR) continue; if (cfd < 0) pdie("connecting"); - pid = do_connection(cfd, &peer_addr, peer_addr_len); + if (mode == NET) + pid = do_connection(cfd, &addr, addrlen, mode); + else if (mode == VIRT) + pid = do_connection(cfd, NULL, 0, mode); if (pid > 0) add_process(pid); } while (!done); } -static void do_listen(char *port) +static void do_listen_net(char *port) { struct addrinfo hints; struct addrinfo *result, *rp; @@ -767,8 +993,64 @@ static void do_listen(char *port) if (listen(sfd, backlog) < 0) pdie("listen"); - do_accept_loop(sfd); + do_accept_loop(sfd, NET); + + kill_clients(); +} + +static void make_virt_if_dir(void) +{ + struct group *group; + + if (mkdir(TRACE_CMD_DIR, 0710) < 0) { + if (errno != EEXIST) + pdie("mkdir %s", TRACE_CMD_DIR); + } + /* QEMU operates as qemu:qemu */ + chmod(TRACE_CMD_DIR, 0710); + group = getgrnam("qemu"); + if (chown(TRACE_CMD_DIR, -1, group->gr_gid) < 0) + pdie("chown %s", TRACE_CMD_DIR); + + if (mkdir(VIRT_DIR, 0710) < 0) { + if (errno != EEXIST) + pdie("mkdir %s", VIRT_DIR); + } + chmod(VIRT_DIR, 0710); + if (chown(VIRT_DIR, -1, group->gr_gid) < 0) + pdie("chown %s", VIRT_DIR); +} + +static void do_listen_virt(void) +{ + struct sockaddr_un 
un_server; + struct group *group; + socklen_t slen; + int sfd; + + make_virt_if_dir(); + + slen = sizeof(un_server); + sfd = socket(AF_UNIX, SOCK_STREAM, 0); + if (sfd < 0) + pdie("socket"); + + un_server.sun_family = AF_UNIX; + snprintf(un_server.sun_path, PATH_MAX, VIRT_TRACE_CTL_SOCK); + + if (bind(sfd, (struct sockaddr *)&un_server, slen) < 0) + pdie("bind"); + chmod(VIRT_TRACE_CTL_SOCK, 0660); + group = getgrnam("qemu"); + if (chown(VIRT_TRACE_CTL_SOCK, -1, group->gr_gid) < 0) + pdie("fchown %s", VIRT_TRACE_CTL_SOCK); + + if (listen(sfd, backlog) < 0) + pdie("listen"); + + do_accept_loop(sfd, VIRT); + unlink(VIRT_TRACE_CTL_SOCK); kill_clients(); } @@ -782,17 +1064,33 @@ enum { OPT_debug = 255, }; +static void parse_args_net(int c, char **argv, char **port) +{ + switch (c) { + case 'p': + *port = optarg; + break; + default: + usage(argv); + } +} + void trace_listen(int argc, char **argv) { char *logfile = NULL; char *port = NULL; int daemon = 0; + int mode = 0; int c; if (argc < 2) usage(argv); - if (strcmp(argv[1], "listen") != 0) + if (strcmp(argv[1], "listen") == 0) + mode = NET; + else if (strcmp(argv[1], "virt-server") == 0) + mode = VIRT; + else usage(argv); for (;;) { @@ -812,9 +1110,6 @@ void trace_listen(int argc, char **argv) case 'h': usage(argv); break; - case 'p': - port = optarg; - break; case 'd': output_dir = optarg; break; @@ -831,11 +1126,14 @@ void trace_listen(int argc, char **argv) debug = 1; break; default: - usage(argv); + if (mode == NET) + parse_args_net(c, argv, &port); + else + usage(argv); } } - if (!port) + if (!port && mode == NET) usage(argv); if ((argc - optind) >= 2) @@ -863,7 +1161,12 @@ void trace_listen(int argc, char **argv) signal_setup(SIGINT, finish); signal_setup(SIGTERM, finish); - do_listen(port); + if (mode == NET) + do_listen_net(port); + else if (mode == VIRT) + do_listen_virt(); + else + ; /* Not reached */ return; } diff --git a/trace-msg.c b/trace-msg.c index e3d4f3f..717089c 100644 --- a/trace-msg.c +++ 
b/trace-msg.c
@@ -59,6 +59,9 @@ typedef __be32 be32;
 
 #define CPU_MAX 256
 
+/* use CONNECT_MSG as a protocol version of trace-msg */
+#define CONNECT_MSG "tracecmd-V2"
+
 /* for both client and server */
 bool use_tcp;
 int cpu_count;
@@ -78,6 +81,10 @@ struct tracecmd_msg_str {
 	char *buf;
 } __attribute__((packed));
 
+struct tracecmd_msg_rconnect {
+	struct tracecmd_msg_str str;
+};
+
 struct tracecmd_msg_opt {
 	be32 size;
 	be32 opt_cmd;
@@ -104,6 +111,7 @@ struct tracecmd_msg_error {
 	be32 size;
 	be32 cmd;
 	union {
+		struct tracecmd_msg_rconnect rconnect;
 		struct tracecmd_msg_tinit tinit;
 		struct tracecmd_msg_rinit rinit;
 		struct tracecmd_msg_meta meta;
@@ -111,7 +119,10 @@ struct tracecmd_msg_error {
 } __attribute__((packed));
 
 enum tracecmd_msg_cmd {
+	MSG_ERROR = 0,
 	MSG_CLOSE = 1,
+	MSG_TCONNECT = 2,
+	MSG_RCONNECT = 3,
 	MSG_TINIT = 4,
 	MSG_RINIT = 5,
 	MSG_SENDMETA = 6,
@@ -122,6 +133,7 @@ struct tracecmd_msg {
 	be32 size;
 	be32 cmd;
 	union {
+		struct tracecmd_msg_rconnect rconnect;
 		struct tracecmd_msg_tinit tinit;
 		struct tracecmd_msg_rinit rinit;
 		struct tracecmd_msg_meta meta;
@@ -159,6 +171,16 @@ static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
 	memcpy(dest+offset, buf, buflen);
 }
 
+static int make_rconnect(const char *buf, int buflen, struct tracecmd_msg *msg)
+{
+	u32 offset = offsetof(struct tracecmd_msg, data.rconnect.str.buf);
+
+	msg->data.rconnect.str.size = htonl(buflen);
+	bufcpy(msg, offset, buf, buflen);
+
+	return 0;
+}
+
 enum msg_opt_command {
 	MSGOPT_USETCP = 1,
 };
@@ -236,11 +258,13 @@ static int make_rinit(struct tracecmd_msg *msg)
 
 	msg->data.rinit.cpus = htonl(cpu_count);
 
-	for (i = 0; i < cpu_count; i++) {
-		/* + rrqports->cpus or rrqports->port_array[i] */
-		offset += sizeof(be32);
-		port = htonl(port_array[i]);
-		bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+	if (port_array) {
+		for (i = 0; i < cpu_count; i++) {
+			/* + rrqports->cpus or rrqports->port_array[i] */
+			offset += sizeof(be32);
+			port = htonl(port_array[i]);
+			bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+		}
 	}
 
 	return 0;
@@ -252,6 +276,9 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
 	u32 len = 0;
 
 	switch (cmd) {
+	case MSG_RCONNECT:
+		return sizeof(msg->data.rconnect.str.size) +
+		       sizeof(CONNECT_MSG);
 	case MSG_TINIT:
 		len = sizeof(msg->data.tinit.cpus)
 		      + sizeof(msg->data.tinit.page_size)
@@ -288,6 +315,8 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
 static int tracecmd_msg_make_body(u32 cmd, struct tracecmd_msg *msg)
 {
 	switch (cmd) {
+	case MSG_RCONNECT:
+		return make_rconnect(CONNECT_MSG, sizeof(CONNECT_MSG), msg);
 	case MSG_TINIT:
 		return make_tinit(msg);
 	case MSG_RINIT:
@@ -423,6 +452,8 @@ static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)
 
 static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg *msg)
 {
+	int offset = TRACECMD_MSG_HDR_LEN;
+	char *buf;
 	u32 cmd;
 	int ret;
 
@@ -434,8 +465,20 @@ static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg *msg)
 	}
 
 	cmd = ntohl(msg->cmd);
-	if (cmd == MSG_CLOSE)
+	switch (cmd) {
+	case MSG_RCONNECT:
+		offset += sizeof(msg->data.rconnect.str.size);
+		buf = tracecmd_msg_buf_access(msg, offset);
+		/* Make sure the server is the tracecmd server */
+		if (memcmp(buf, CONNECT_MSG,
+			   ntohl(msg->data.rconnect.str.size) - 1) != 0) {
+			warning("server not tracecmd server");
+			return -EPROTONOSUPPORT;
+		}
+		break;
+	case MSG_CLOSE:
 		return -ECONNABORTED;
+	}
 
 	return 0;
 }
@@ -494,7 +537,55 @@ static void error_operation_for_server(struct tracecmd_msg *msg)
 
 	cmd = ntohl(msg->cmd);
 
-	warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+	if (cmd == MSG_ERROR)
+		plog("Receive error message: cmd=%d size=%d\n",
+		     ntohl(msg->data.err.cmd), ntohl(msg->data.err.size));
+	else
+		warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+}
+
+int tracecmd_msg_set_connection(int fd, const char *domain)
+{
+	struct tracecmd_msg *msg;
+	char buf[TRACECMD_MSG_MAX_LEN] = {};
+	u32 cmd;
+	int ret;
+
+	msg = (struct tracecmd_msg *)buf;
+
+	/*
+	 * Wait for connection msg by a client first.
+	 * If a client uses virtio-serial, a connection message will
+	 * not be sent immediately after accept(). connect() is called
+	 * in QEMU, so the client can send the connection message
+	 * after guest boots. Therefore, the virt-server patiently
+	 * waits for the connection request of a client.
+	 */
+	ret = tracecmd_msg_recv(fd, msg);
+	if (ret < 0) {
+		if (!buf[0]) {
+			/* No data means QEMU has already died. */
+			close(fd);
+			die("Connection refused: %s", domain);
+		}
+		return -ENOMSG;
+	}
+
+	cmd = ntohl(msg->cmd);
+	if (cmd == MSG_CLOSE)
+		return -ECONNABORTED;
+	else if (cmd != MSG_TCONNECT)
+		return -EINVAL;
+
+	ret = tracecmd_msg_send(fd, MSG_RCONNECT);
+	if (ret < 0)
+		goto error;
+
+	return 0;
+
+error:
+	error_operation_for_server(msg);
+	return ret;
 }
 
 #define MAX_OPTION_SIZE 4096
diff --git a/trace-recorder.c b/trace-recorder.c
index 66cad98..ad80d82 100644
--- a/trace-recorder.c
+++ b/trace-recorder.c
@@ -155,19 +155,23 @@ tracecmd_create_buffer_recorder_fd2(int fd, int fd2, int cpu, unsigned flags,
 	recorder->fd1 = fd;
 	recorder->fd2 = fd2;
 
-	path = malloc_or_die(strlen(buffer) + 40);
-	if (!path)
-		goto out_free;
+	if (buffer) {
+		path = malloc_or_die(strlen(buffer) + 40);
+		if (!path)
+			goto out_free;
 
-	if (flags & TRACECMD_RECORD_SNAPSHOT)
-		sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw", buffer, cpu);
-	else
-		sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw", buffer, cpu);
-	recorder->trace_fd = open(path, O_RDONLY);
-	if (recorder->trace_fd < 0)
-		goto out_free;
+		if (flags & TRACECMD_RECORD_SNAPSHOT)
+			sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw",
+				buffer, cpu);
+		else
+			sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw",
+				buffer, cpu);
+		recorder->trace_fd = open(path, O_RDONLY);
+		if (recorder->trace_fd < 0)
+			goto out_free;
 
-	free(path);
+		free(path);
+	}
 
 	if ((recorder->flags & TRACECMD_RECORD_NOSPLICE) == 0) {
 		ret = pipe(recorder->brass);
@@ -190,8 +194,9 @@ tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *
 	return tracecmd_create_buffer_recorder_fd2(fd, -1, cpu, flags, buffer, 0);
 }
 
-struct tracecmd_recorder *
-tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer)
+static struct tracecmd_recorder *
+__tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
+				  const char *buffer)
 {
 	struct tracecmd_recorder *recorder;
 	int fd;
@@ -254,6 +259,25 @@ tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags,
 	goto out;
 }
 
+struct tracecmd_recorder *
+tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
+				const char *buffer)
+{
+	return __tracecmd_create_buffer_recorder(file, cpu, flags, buffer);
+}
+
+struct tracecmd_recorder *
+tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd)
+{
+	struct tracecmd_recorder *recorder;
+
+	recorder = __tracecmd_create_buffer_recorder(file, cpu, 0, NULL);
+	if (recorder)
+		recorder->trace_fd = trace_fd;
+
+	return recorder;
+}
+
 struct tracecmd_recorder *tracecmd_create_recorder_fd(int fd, int cpu, unsigned flags)
 {
 	const char *tracing;
diff --git a/trace-usage.c b/trace-usage.c
index 520b14b..3d9b821 100644
--- a/trace-usage.c
+++ b/trace-usage.c
@@ -212,6 +212,16 @@ static struct usage_help usage_help[] = {
 		" -l logfile to write messages to.\n"
 	},
 	{
+		"virt-server",
+		"listen on a virtio-serial for trace clients",
+		" %s virt-server [-o file][-d dir][-l logfile]\n"
+		" Creates a socket to listen for clients.\n"
+		" -D create it in daemon mode.\n"
+		" -o file name to use for clients.\n"
+		" -d directory to store client files.\n"
+		" -l logfile to write messages to.\n"
+	},
+	{
 		"list",
 		"list the available events, plugins or options",
 		" %s list [-e [regex]][-t][-o][-f [regex]]\n"
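
Illustration only, not part of the patch: the MSG_RCONNECT reply that
tracecmd_msg_set_connection() sends can be pictured as a flat buffer of
big-endian fields followed by the version string. The sketch below packs and
checks such a buffer. rconnect_pack(), rconnect_check() and HDR_LEN are
hypothetical names. The 8-byte header (two be32 fields, size and cmd) and a
size field that counts the header are assumptions read off the struct layout
in the diff, not confirmed values of TRACECMD_MSG_HDR_LEN.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONNECT_MSG	"tracecmd-V2"	/* taken from the patch */
#define MSG_RCONNECT	3		/* taken from the patch */
#define HDR_LEN		8		/* assumption: be32 size + be32 cmd */

/*
 * Pack [size][cmd][str.size][string incl. NUL] into out.
 * Returns the total message length, or 0 if out is too small.
 */
size_t rconnect_pack(unsigned char *out, size_t cap)
{
	const uint32_t str_len = sizeof(CONNECT_MSG);	/* counts the NUL */
	const uint32_t total = HDR_LEN + sizeof(uint32_t) + str_len;
	uint32_t be;

	assert(out != NULL);
	if (cap < total)
		return 0;

	be = htonl(total);		/* assumption: size includes header */
	memcpy(out, &be, sizeof(be));
	be = htonl(MSG_RCONNECT);
	memcpy(out + 4, &be, sizeof(be));
	be = htonl(str_len);
	memcpy(out + 8, &be, sizeof(be));
	memcpy(out + 12, CONNECT_MSG, str_len);

	return total;
}

/* Return 1 if in holds a well-formed RCONNECT carrying CONNECT_MSG, else 0. */
int rconnect_check(const unsigned char *in, size_t len)
{
	uint32_t be, total, cmd, str_len;

	assert(in != NULL);
	if (len < HDR_LEN + sizeof(uint32_t))
		return 0;

	memcpy(&be, in, sizeof(be));
	total = ntohl(be);
	memcpy(&be, in + 4, sizeof(be));
	cmd = ntohl(be);
	memcpy(&be, in + 8, sizeof(be));
	str_len = ntohl(be);

	if (cmd != MSG_RCONNECT || total > len)
		return 0;
	/* Reject any advertised length other than the expected one. */
	if (str_len != sizeof(CONNECT_MSG) ||
	    total != HDR_LEN + sizeof(uint32_t) + str_len)
		return 0;

	return memcmp(in + 12, CONNECT_MSG, str_len) == 0;
}
```

Note that this check is deliberately stricter than the one in the patch: the
patch compares ntohl(str.size) - 1 bytes, using the length advertised by the
peer, whereas the sketch first requires that length to equal
sizeof(CONNECT_MSG) before comparing.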