Date: Mon, 13 Mar 2017 10:37:24 -0700
From: Andrei Vagin
To: Cyrill Gorcunov
CC: Andrey Vagin, Jason Baron, Andy Lutomirski
Subject: Re: [patch 2/3] kcmp: Add KCMP_EPOLL_TFD mode to compare epoll target files
Message-ID: <20170313173723.GA2855@outlook.office365.com>
References: <20170310082146.103151106@openvz.org>
In-Reply-To: <20170310082146.103151106@openvz.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="koi8-r"
Content-Disposition: inline
User-Agent: Mutt/1.7.1 (2016-10-04)
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 10, 2017 at 11:16:57AM +0300, Cyrill Gorcunov wrote:
> With current epoll architecture target files are addressed
> with file_struct and file descriptor number, where the last
> is not unique. Moreover files can be transferred from another
> process via unix socket, added into queue and closed then
> so we won't find this descriptor in the task fdinfo list.
>
> Thus to checkpoint and restore such processes CRIU needs to
> find out where exactly the target file is present to add it into
> epoll queue. For this sake one can use kcmp call where
> some particular target file from the queue is compared with
> arbitrary file passed as an argument.
>
> Because epoll target files can have same file descriptor
> number but different file_struct a caller should explicitly
> specify the offset within.
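
To make sure I read "offset within" correctly: as I understand it, toff selects the N-th entry (counting from 0) among the epoll items that share the same fd number. Below is a minimal userspace model of that walk, mirroring the loop in ep_find_tfd() from the patch. It is not kernel code; struct model_item and model_find_tfd() are names made up for illustration, and the array is assumed to be in the same order in which the kernel would traverse its tree.

```c
#include <stddef.h>

/* Hypothetical stand-in for an epoll item: the target fd number plus an
 * opaque identity for the underlying file. */
struct model_item {
	int fd;
	const void *file;
};

/*
 * Return the file of the toff-th item (counting from 0) whose fd number
 * equals tfd, or NULL when there are fewer matching items than that.
 */
static const void *model_find_tfd(const struct model_item *items, size_t n,
				  int tfd, unsigned long toff)
{
	for (size_t i = 0; i < n; i++) {
		if (items[i].fd != tfd)
			continue;	/* different fd number, does not count */
		if (toff == 0)
			return items[i].file;
		toff--;			/* skip this match, look for the next */
	}
	return NULL;
}
```

With two items registered under fd 5, toff 0 yields the first one in traversal order and toff 1 the second; any larger toff finds nothing, which would correspond to the -ENOENT path in the patch.
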
>
> To test if some particular file is matching entry inside
> epoll one have to
>
>  - fill kcmp_epoll_slot structure with epoll file descriptor,
>    target file number and target file offset (in case if only
>    one target is present then it should be 0)
>
>  - call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
>     - the kernel fetch file pointer matching file descriptor @fd of pid1
>     - lookups for file struct in epoll queue of pid2 and returns traditional
>       0,1,2 result for sorting purpose
>
> v2:
>  - Use KCMP_FILES salt for files comparision (for convenience sake,
>    since the pointers are file structs so user can lookup over previously
>    collected files tree)
>  - Make kcmp_epoll_target as a separate helper instead of opencoding
>    it with #ifdef
>
> v3:
>  - Use less if()s in kcmp_epoll_target for readability sake (by avagin@)
>  - Use u32 for kcmp_epoll_slot::toff instead of u64, which makes the less
>    memory pressue
>

Here is one question inline.

Acked-by: Andrei Vagin

> Signed-off-by: Cyrill Gorcunov
> CC: Al Viro
> CC: Andrew Morton
> CC: Andrey Vagin
> CC: Pavel Emelyanov
> CC: Michael Kerrisk
> CC: Kir Kolyshkin
> CC: Jason Baron
> CC: Andy Lutomirski
> ---
>  fs/eventpoll.c            |   42 +++++++++++++++++++++++++++++++++++
>  include/linux/eventpoll.h |    3 ++
>  include/uapi/linux/kcmp.h |   10 ++++++++
>  kernel/kcmp.c             |   55 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 110 insertions(+)
>
> Index: linux-ml.git/fs/eventpoll.c
> ===================================================================
> --- linux-ml.git.orig/fs/eventpoll.c
> +++ linux-ml.git/fs/eventpoll.c
> @@ -1000,6 +1000,48 @@ static struct epitem *ep_find(struct eve
>  	return epir;
>  }
>  
> +static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff)
> +{
> +	struct rb_node *rbp;
> +	struct epitem *epi;
> +
> +	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
> +		epi = rb_entry(rbp, struct epitem, rbn);
> +		if (epi->ffd.fd == tfd) {
> +			if (toff == 0)
> +				return epi;
> +			else
> +				toff--;
> +		}
> +		cond_resched();
> +	}
> +
> +	return NULL;
> +}
> +
> +struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
> +				     unsigned long toff)
> +{
> +	struct file *file_raw;
> +	struct eventpoll *ep;
> +	struct epitem *epi;
> +
> +	if (!is_file_epoll(file))
> +		return ERR_PTR(-EINVAL);
> +
> +	ep = file->private_data;
> +
> +	mutex_lock(&ep->mtx);
> +	epi = ep_find_tfd(ep, tfd, toff);
> +	if (epi)
> +		file_raw = epi->ffd.file;
> +	else
> +		file_raw = ERR_PTR(-ENOENT);
> +	mutex_unlock(&ep->mtx);
> +
> +	return file_raw;
> +}
> +
>  /*
>   * This is the callback that is passed to the wait queue wakeup
>   * mechanism. It is called by the stored file descriptors when they
> Index: linux-ml.git/include/linux/eventpoll.h
> ===================================================================
> --- linux-ml.git.orig/include/linux/eventpoll.h
> +++ linux-ml.git/include/linux/eventpoll.h
> @@ -14,6 +14,7 @@
>  #define _LINUX_EVENTPOLL_H
>  
>  #include
> +#include
>  
>  
>  /* Forward declarations to avoid compiler errors */
> @@ -22,6 +23,8 @@ struct file;
>  
>  #ifdef CONFIG_EPOLL
>  
> +struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
> +
>  /* Used to initialize the epoll bits inside the "struct file" */
>  static inline void eventpoll_init_file(struct file *file)
>  {
> Index: linux-ml.git/include/uapi/linux/kcmp.h
> ===================================================================
> --- linux-ml.git.orig/include/uapi/linux/kcmp.h
> +++ linux-ml.git/include/uapi/linux/kcmp.h
> @@ -1,6 +1,8 @@
>  #ifndef _UAPI_LINUX_KCMP_H
>  #define _UAPI_LINUX_KCMP_H
>  
> +#include
> +
>  /* Comparison type */
>  enum kcmp_type {
>  	KCMP_FILE,
> @@ -10,8 +12,16 @@ enum kcmp_type {
>  	KCMP_SIGHAND,
>  	KCMP_IO,
>  	KCMP_SYSVSEM,
> +	KCMP_EPOLL_TFD,
>  
>  	KCMP_TYPES,
>  };
>  
> +/* Slot for KCMP_EPOLL_TFD */
> +struct kcmp_epoll_slot {
> +	__u32 efd;		/* epoll file descriptor */
> +	__u32 tfd;		/* target file number */
> +	__u32 toff;		/* target offset within same numbered sequence */
> +};
> +
>  #endif /* _UAPI_LINUX_KCMP_H */
> Index: linux-ml.git/kernel/kcmp.c
> ===================================================================
> --- linux-ml.git.orig/kernel/kcmp.c
> +++ linux-ml.git/kernel/kcmp.c
> @@ -11,6 +11,10 @@
>  #include
>  #include
>  #include
> +#include
> +#include
> +#include
> +#include
>  
>  #include
>  
> @@ -94,6 +98,54 @@ static int kcmp_lock(struct mutex *m1, s
>  	return err;
>  }
>  
> +#ifdef CONFIG_EPOLL
> +static int kcmp_epoll_target(struct task_struct *task1,
> +			     struct task_struct *task2,
> +			     unsigned long idx1,
> +			     struct kcmp_epoll_slot __user *uslot)
> +{
> +	struct file *filp, *filp_epoll, *filp_tgt;
> +	struct kcmp_epoll_slot slot;
> +	struct files_struct *files;
> +
> +	if (copy_from_user(&slot, uslot, sizeof(slot)))
> +		return -EFAULT;
> +
> +	filp = get_file_raw_ptr(task1, idx1);
> +	if (!filp)
> +		return -EBADF;
> +
> +	files = get_files_struct(task2);
> +	if (!files)
> +		return -EBADF;
> +
> +	spin_lock(&files->file_lock);
> +	filp_epoll = fcheck_files(files, slot.efd);
> +	if (filp_epoll)
> +		get_file(filp_epoll);
> +	else
> +		filp_tgt = ERR_PTR(-EBADF);
> +	spin_unlock(&files->file_lock);
> +	put_files_struct(files);

Why can we not use fget here ^^^^ ?

> +
> +	if (filp_epoll) {
> +		filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff);
> +		fput(filp_epoll);
> +	}
> +
> +	return IS_ERR(filp_tgt) ? PTR_ERR(filp_tgt) :
> +		kcmp_ptr(filp, filp_tgt, KCMP_FILES);
> +}
> +#else
> +static int kcmp_epoll_target(struct task_struct *task1,
> +			     struct task_struct *task2,
> +			     unsigned long idx1,
> +			     struct kcmp_epoll_slot __user *uslot)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif
> +
>  SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>  		unsigned long, idx1, unsigned long, idx2)
>  {
> @@ -165,6 +217,9 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t
>  		ret = -EOPNOTSUPP;
>  #endif
>  		break;
> +	case KCMP_EPOLL_TFD:
> +		ret = kcmp_epoll_target(task1, task2, idx1, (void *)idx2);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
>