Date: Wed, 1 Mar 2017 15:05:09 -0800
From: Andrei Vagin
To: Cyrill Gorcunov
CC: Andrey Vagin
Subject: Re: [RFC v2 2/3] kcmp: Add KCMP_EPOLL_TFD mode to compare epoll target files
Message-ID: <20170301230508.GA19953@outlook.office365.com>
References: <20170221171255.023016858@openvz.org> <20170227224346.GA7101@outlook.office365.com> <20170228065306.GH22938@uranus> <20170228171246.GC28817@uranus.lan>
In-Reply-To: <20170228171246.GC28817@uranus.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 28, 2017 at 08:12:46PM +0300, Cyrill Gorcunov wrote:
> With current epoll architecture target files are addressed
> with file_struct and file descriptor number, where the last
> is not unique. Moreover files can be transferred from another
> process via unix socket, added into queue and closed then
> so we won't find this descriptor in the task fdinfo list.
>
> Thus to checkpoint and restore such processes CRIU needs to
> find out where exactly target file is present to add it into
> the epoll queue. For this sake one can use kcmp call where
> some particular target file from the queue is compared with
> arbitrary file passed as an argument.
>
> Because epoll target files can have same file descriptor
> number but different file_struct a caller should explicitly
> specify the offset within such entries.
>
> To test if some particular file is matching entry inside
> epoll one has to
>
>  - fill kcmp_epoll_slot structure with epoll file descriptor,
>    target file number and target file offset (in case if only
>    one target is present then it should be 0)
>
>  - call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
>     - the kernel fetches file pointer matching file descriptor @fd of pid1
>     - looks up the file struct in epoll queue of pid2 and returns traditional
>       0,1,2 result for sorting purpose
>
> v2:
>  - Use KCMP_FILES salt for files comparison (for convenience sake,
>    since the pointers are file structs so user can lookup over previously
>    collected files tree)
>  - Make kcmp_epoll_target as a separate helper instead of opencoding
>    it with #ifdef
>
> Signed-off-by: Cyrill Gorcunov
> CC: Al Viro
> CC: Andrew Morton
> CC: Andrey Vagin
> CC: Pavel Emelyanov
> CC: Michael Kerrisk
> CC: Kir Kolyshkin
> CC: Jason Baron
> CC: Andy Lutomirski
> ---
>  fs/eventpoll.c            |   42 +++++++++++++++++++++++++++++++++
>  include/linux/eventpoll.h |    3 ++
>  include/uapi/linux/kcmp.h |   10 +++++++
>  kernel/kcmp.c             |   58 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 113 insertions(+)
>
> Index: linux-ml.git/fs/eventpoll.c
> ===================================================================
> --- linux-ml.git.orig/fs/eventpoll.c
> +++ linux-ml.git/fs/eventpoll.c
> @@ -1000,6 +1000,48 @@ static struct epitem *ep_find(struct eve
>  	return epir;
>  }
>
> +static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff)
> +{
> +	struct rb_node *rbp;
> +	struct epitem *epi;
> +
> +	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
> +		epi = rb_entry(rbp, struct epitem, rbn);
> +		if (epi->ffd.fd == tfd) {
> +			if (toff == 0)
> +				return epi;
> +			else
> +				toff--;
> +		}
> +		cond_resched();
> +	}
> +
> +	return NULL;
> +}
> +
> +struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
> +				     unsigned long toff)
> +{
> +	struct file *file_raw;
> +	struct eventpoll *ep;
> +	struct epitem *epi;
> +
> +	if (!is_file_epoll(file))
> +		return ERR_PTR(-EINVAL);
> +
> +	ep = file->private_data;
> +
> +	mutex_lock(&ep->mtx);
> +	epi = ep_find_tfd(ep, tfd, toff);
> +	if (epi)
> +		file_raw = epi->ffd.file;
> +	else
> +		file_raw = ERR_PTR(-ENOENT);
> +	mutex_unlock(&ep->mtx);
> +
> +	return file_raw;
> +}
> +
>  /*
>   * This is the callback that is passed to the wait queue wakeup
>   * mechanism. It is called by the stored file descriptors when they
> Index: linux-ml.git/include/linux/eventpoll.h
> ===================================================================
> --- linux-ml.git.orig/include/linux/eventpoll.h
> +++ linux-ml.git/include/linux/eventpoll.h
> @@ -14,6 +14,7 @@
>  #define _LINUX_EVENTPOLL_H
>
>  #include
> +#include
>
>
>  /* Forward declarations to avoid compiler errors */
> @@ -22,6 +23,8 @@ struct file;
>
>  #ifdef CONFIG_EPOLL
>
> +struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
> +
>  /* Used to initialize the epoll bits inside the "struct file" */
>  static inline void eventpoll_init_file(struct file *file)
>  {
> Index: linux-ml.git/include/uapi/linux/kcmp.h
> ===================================================================
> --- linux-ml.git.orig/include/uapi/linux/kcmp.h
> +++ linux-ml.git/include/uapi/linux/kcmp.h
> @@ -1,6 +1,8 @@
>  #ifndef _UAPI_LINUX_KCMP_H
>  #define _UAPI_LINUX_KCMP_H
>
> +#include
> +
>  /* Comparison type */
>  enum kcmp_type {
>  	KCMP_FILE,
> @@ -10,8 +12,16 @@ enum kcmp_type {
>  	KCMP_SIGHAND,
>  	KCMP_IO,
>  	KCMP_SYSVSEM,
> +	KCMP_EPOLL_TFD,
>
>  	KCMP_TYPES,
>  };
>
> +/* Slot for KCMP_EPOLL_TFD */
> +struct kcmp_epoll_slot {
> +	__u32 efd;		/* epoll file descriptor */
> +	__u32 tfd;		/* target file number */
> +	__u32 toff;		/* target offset within same numbered sequence */
> +};
> +
>  #endif /* _UAPI_LINUX_KCMP_H */
> Index: linux-ml.git/kernel/kcmp.c
> ===================================================================
> --- linux-ml.git.orig/kernel/kcmp.c
> +++ linux-ml.git/kernel/kcmp.c
> @@ -11,6 +11,10 @@
>  #include
>  #include
>  #include
> +#include
> +#include
> +#include
> +#include
>
>  #include
>
> @@ -94,6 +98,57 @@ static int kcmp_lock(struct mutex *m1, s
>  	return err;
>  }
>
> +#ifdef CONFIG_EPOLL
> +static int kcmp_epoll_target(struct task_struct *task1,
> +			     struct task_struct *task2,
> +			     unsigned long idx1,
> +			     struct kcmp_epoll_slot __user *uslot)
> +{
> +	struct file *filp, *filp_epoll, *filp_tgt;
> +	struct kcmp_epoll_slot slot;
> +	struct files_struct *files;
> +	int ret;
> +
> +	if (copy_from_user(&slot, uslot, sizeof(slot)))
> +		return -EFAULT;
> +
> +	filp = get_file_raw_ptr(task1, idx1);
> +
> +	files = get_files_struct(task2);
> +	if (files) {
> +		spin_lock(&files->file_lock);
> +		filp_epoll = fcheck_files(files, slot.efd);
> +		if (filp_epoll)
> +			get_file(filp_epoll);
> +		spin_unlock(&files->file_lock);
> +		put_files_struct(files);
> +	} else
> +		filp_epoll = NULL;
> +
> +	if (filp && filp_epoll) {
> +		filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff);
> +		if (IS_ERR(filp_tgt))
> +			ret = PTR_ERR(filp_tgt);
> +		else
> +			ret = kcmp_ptr(filp, filp_tgt, KCMP_FILES);
> +	} else
> +		ret = -EBADF;
> +
> +	if (filp_epoll)
> +		fput(filp_epoll);
> +
> +	return ret;
> +}

I rewrote this function and I think it looks more readable now. What do you think?

static int kcmp_epoll_target(struct task_struct *task1,
			     struct task_struct *task2,
			     unsigned long idx1,
			     struct kcmp_epoll_slot __user *uslot)
{
	struct file *filp, *filp_epoll, *filp_tgt;
	struct kcmp_epoll_slot slot;
	struct files_struct *files;

	if (copy_from_user(&slot, uslot, sizeof(slot)))
		return -EFAULT;

	filp = get_file_raw_ptr(task1, idx1);
	if (!filp)
		return -EBADF;

	files = get_files_struct(task2);
	if (!files)
		return -EBADF;

	spin_lock(&files->file_lock);
	filp_epoll = fcheck_files(files, slot.efd);
	if (filp_epoll)
		get_file(filp_epoll);
	spin_unlock(&files->file_lock);
	put_files_struct(files);

	/* slot.efd may not be open in task2; do not pass NULL further down */
	if (!filp_epoll)
		return -EBADF;

	filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff);
	fput(filp_epoll);

	if (IS_ERR(filp_tgt))
		return PTR_ERR(filp_tgt);

	return kcmp_ptr(filp, filp_tgt, KCMP_FILES);
}

> +#else
> +static int kcmp_epoll_target(struct task_struct *task1,
> +			     struct task_struct *task2,
> +			     unsigned long idx1,
> +			     struct kcmp_epoll_slot __user *uslot)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif
> +
>  SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>  		unsigned long, idx1, unsigned long, idx2)
>  {
> @@ -165,6 +220,9 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t
>  		ret = -EOPNOTSUPP;
>  #endif
>  		break;
> +	case KCMP_EPOLL_TFD:
> +		ret = kcmp_epoll_target(task1, task2, idx1, (void *)idx2);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
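
As a side note on the tfd/toff semantics: the selection that ep_find_tfd() does in the quoted patch can be modeled in user space roughly as below. This is only a sketch under assumptions. struct item_sketch and find_tfd_sketch() are hypothetical names that do not exist in the kernel, and a plain array stands in for the rb-tree walk, so the order in which same-numbered entries are visited here is an assumption and not what the kernel guarantees.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for one epoll item: the descriptor number it was
 * registered under and an opaque id for the file it refers to.
 */
struct item_sketch {
	int fd;
	int file_id;
};

/*
 * Return the index of the toff-th item (counting from 0) registered under
 * tfd, or -1 if there is no such item. As in ep_find_tfd(), every match
 * either satisfies the request or uses up one unit of the offset.
 */
int find_tfd_sketch(const struct item_sketch *items, size_t n,
		    int tfd, unsigned long toff)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (items[i].fd != tfd)
			continue;
		if (toff == 0)
			return (int)i;
		toff--;
	}

	return -1;
}
```

With two items registered under the same number, toff = 0 picks the first one visited and toff = 1 the second, which is why a caller has to iterate over toff until the lookup fails to see every candidate.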