Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755094Ab3HEUxP (ORCPT ); Mon, 5 Aug 2013 16:53:15 -0400 Received: from smtp.infotech.no ([82.134.31.41]:50246 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753013Ab3HEUxN (ORCPT ); Mon, 5 Aug 2013 16:53:13 -0400 Message-ID: <5200108A.9020008@interlog.com> Date: Mon, 05 Aug 2013 16:52:26 -0400 From: Douglas Gilbert Reply-To: dgilbert@interlog.com User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7 MIME-Version: 1.0 To: vaughan CC: =?UTF-8?B?SsO2cm4gRW5nZWw=?= , JBottomley@parallels.com, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 0/4] [SCSI] sg: fix race condition in sg_open References: <1374075246-22923-1-git-send-email-vaughan.cao@oracle.com> <1374468033-8947-1-git-send-email-vaughan.cao@oracle.com> <20130722170338.GA15824@logfs.org> <51F9EB97.7070305@interlog.com> <51FC9449.4060906@interlog.com> <51FF0BC0.5020702@oracle.com> In-Reply-To: <51FF0BC0.5020702@oracle.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5683 Lines: 150 On 13-08-04 10:19 PM, vaughan wrote: > On 08/03/2013 01:25 PM, Douglas Gilbert wrote: >> On 13-08-01 01:01 AM, Douglas Gilbert wrote: >>> On 13-07-22 01:03 PM, Jörn Engel wrote: >>>> On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote: >>>>> >>>>> There is a race when open sg with O_EXCL flag. Also a race may >>>>> happen between >>>>> sg_open and sg_remove. >>>>> >>>>> Changes from v4: >>>>> * [3/4] use ERR_PTR series instead of adding another parameter in >>>>> sg_add_sfp >>>>> * [4/4] fix conflict for cherry-pick from v3. >>>>> >>>>> Changes from v3: >>>>> * release o_sem in sg_release(), not in sg_remove_sfp(). >>>>> * not set exclude with sfd_lock held. >>>>> >>>>> Vaughan Cao (4): >>>>> [SCSI] sg: use rwsem to solve race during exclusive open >>>>> [SCSI] sg: no need sg_open_exclusive_lock >>>>> [SCSI] sg: checking sdp->detached isn't protected when open >>>>> [SCSI] sg: push file descriptor list locking down to per-device >>>>> locking >>>>> >>>>> drivers/scsi/sg.c | 178 >>>>> +++++++++++++++++++++++++----------------------------- >>>>> 1 file changed, 83 insertions(+), 95 deletions(-) >>>> >>>> Patchset looks good to me, although I didn't test it on hardware yet. >>>> Signed-off-by: Joern Engel >>>> >>>> James, care to pick this up? >>> >>> Acked-by: Douglas Gilbert >>> >>> Tested O_EXCL with multiple processes and threads; passed. >>> sg driver prior to this patch had "leaky" O_EXCL logic >>> according to the same test. Block device passed. >>> >>> James, could you clean this up: >>> drivers/scsi/sg.c:242:6: warning: unused variable ‘res’ >>> [-Wunused-variable] >> >> Further testing suggests this patch on the sg driver is >> broken, so I'll rescind my ack. >> >> The case it is broken for is when a device is opened >> without O_EXCL. Now if, while it is open, a second >> thread/process tries to open the same device O_EXCL >> then IMO the second open should fail with EBUSY. >> >> My testing shows that O_EXCL opens properly deflect >> other O_EXCL opens. > Hi Doug, > > My test don't have this issue. The routine is something as below: > > I start three opens without O_EXCL, wait 30s each, and open with > O_EXCL|O_NONBLOCK, it failed with EBUSY. > And I also call myopen with/without O_EXCL many times in background at > the same time, and the test is passed. I don't know why it failed in > your test. > > Usage: myopen [-e][-n][-d delay] -f file > -e: exclude > -n: nonblock > -d: delay N seconds and then close. > > [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & > [1] 3417 > [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & > [2] 3418 > [root@vacaowol5 16835013]# ./myopen -f /dev/sg5 -d 30 & > [3] 3419 > [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug > max_active_device=6(origin 1) > def_reserved_size=32768 > >>> device=sg5 scsi5 chan=0 id=1 lun=0 em=0 sg_tablesize=55 excl=0 > FD(1): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0 > cmd_q=0 f_packid=0 k_orphan=0 closed=0 > No requests active > FD(2): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0 > cmd_q=0 f_packid=0 k_orphan=0 closed=0 > No requests active > FD(3): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0 > cmd_q=0 f_packid=0 k_orphan=0 closed=0 > No requests active > > [root@vacaowol5 16835013]# ./myopen -e -n -f /dev/sg5 -d 30 & > [4] 3422 > [3422:3351] /dev/sg5:exclude: Device or resource busy > > [4]+ Exit 1 ./myopen -e -n -f /dev/sg5 -d 30 > > [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug > max_active_device=6(origin 1) > def_reserved_size=32768 > >>> device=sg5 scsi5 chan=0 id=1 lun=0 em=0 sg_tablesize=55 excl=0 > FD(1): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0 > cmd_q=0 f_packid=0 k_orphan=0 closed=0 > No requests active > FD(2): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0 > cmd_q=0 f_packid=0 k_orphan=0 closed=0 > No requests active > FD(3): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0 > cmd_q=0 f_packid=0 k_orphan=0 closed=0 > No requests active > [root@vacaowol5 16835013]# cat /proc/scsi/sg/debug > [1] Done ./myopen -f /dev/sg5 -d 30 > [2]- Done ./myopen -f /dev/sg5 -d 30 > [3]+ Done ./myopen -f /dev/sg5 -d 30 > Hi, After the initial failures about 36 hours ago, retesting yesterday and today has not produced any unexpected failures. And I have been trying hard on lk 3.10.4 and lk 3.10.5 . My test program is a bit more intense than yours and can be found in the sg3_utils beta in the News section of this page: http://sg.danny.cz/sg/ It is in the examples directory, two variants called sg_tst_excl and sg_tst_excl2 . You will need a recent gcc compiler, IOW something that can compile c++11 . gcc 4.7.3 in Ubuntu 13.04 only just manages, fedora 19 should do better with gcc 4.8.1 . The threading is implemented using pthreads so it should be reliable. Typically I run multiple instances (processes) and each has multiple threads. One instance can run '-x' which will cause its first thread not to use O_EXCL **. All my tests currently use O_NONBLOCK and that leads to lots of EBUSYs (sometimes in the billions). Doug Gilbert ** Using '-x' on two instances will cause an expected failure so can be used as a control. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/