Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757219AbcJaQLK (ORCPT ); Mon, 31 Oct 2016 12:11:10 -0400 Received: from us-smtp-delivery-194.mimecast.com ([216.205.24.194]:45509 "EHLO us-smtp-delivery-194.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756980AbcJaQLI (ORCPT ); Mon, 31 Oct 2016 12:11:08 -0400 From: Trond Myklebust To: Sebastian Andrzej Siewior CC: Schumaker Anna , List Linux NFS Mailing , List Linux Kernel Mailing , "tglx@linutronix.de" Subject: Re: [PATCH v3] NFSv4: replace seqcount_t with a rw_semaphore Thread-Topic: [PATCH v3] NFSv4: replace seqcount_t with a rw_semaphore Thread-Index: AQHSM3mEHI2fh+m5skS58vgJ3aR4g6DCsFUAgAAHVwCAAAQfgA== Date: Mon, 31 Oct 2016 16:11:02 +0000 Message-ID: <40147FA4-9FD8-4349-B309-751DFCE875A2@primarydata.com> References: <20161021164727.24485-1-bigeasy@linutronix.de> <20161028210511.k7hrsfxovti7gqtu@linutronix.de> <747E2CCB-3D83-40FD-8B31-46AB5A5E8592@primarydata.com> <20161031131958.mrrez5t7sow75p6v@linutronix.de> <51E3F753-2D1F-4370-BFEB-BD8356D622A1@primarydata.com> <20161031155616.fqbqy53bwpanufhn@linutronix.de> In-Reply-To: <20161031155616.fqbqy53bwpanufhn@linutronix.de> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [50.124.59.220] x-ms-office365-filtering-correlation-id: db00e3b5-3f6c-419e-c717-08d401a882a4 x-microsoft-exchange-diagnostics: 1;BN6PR11MB1570;7:72+ZweCUc7gCmRbhIy1iVd9ZKLK1RgHa3DCEkntpIHl0Mg21pi/IZDfRFAKxawdHoL7qyp+lWuN0YWRTCPF321cEiGoXPKFo0QAfKGFy1fbNKHbnbjOcZSdh+ayE9cHPvRdvTSg2fygYFX5BrxLS5Uc/ZUJ717NZmBs8pGPtm8HY4MhiMbpVYIa1tMpSGGsmXBYsnxbSwtMel5N2ewYQRjLm7Jf6bWMtcPOQDClGSZtC52NA2OXE8JrVwPJBwj2zja/D4MpwSBJOim7fK4v/qZMK0/2LohqsmyhCyAj31hrbATmPzszX/fQXUk/5FfshsIf8Yhi7Pj9BmLo0hz/Fv1DNw7n/zG0HiCjP2o+TlKw=;20:TUbcAyglPJaXGamA/slSnBJiK+uaqorSVtdZEHYJJctYjUNLtViy6Wmtxhrq4tSxFeP6mEPdOK5ZzpALxUpuElFB5DLx8SKHUyF0xU/5N59nrIsRP+RtHRK7p4BpHDQk2L5nsPsLbmn+Y2M2zzYV2EcQpv92gri4mP6ZrcRDVg4= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BN6PR11MB1570; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(158342451672863)(788757137089); x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(6040176)(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046)(6042046)(6043046);SRVR:BN6PR11MB1570;BCL:0;PCL:0;RULEID:;SRVR:BN6PR11MB1570; x-forefront-prvs: 01128BA907 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(6009001)(7916002)(377424004)(24454002)(189002)(199003)(19580395003)(5002640100001)(19580405001)(92566002)(586003)(66066001)(102836003)(3846002)(6116002)(3280700002)(4326007)(68736007)(83716003)(87936001)(97736004)(11100500001)(81166006)(81156014)(8676002)(2900100001)(8936002)(4001150100001)(77096005)(189998001)(86362001)(3660700001)(101416001)(54356999)(106356001)(7846002)(50986999)(5660300001)(76176999)(36756003)(7736002)(305945005)(106116001)(99286002)(122556002)(10400500002)(6916009)(2950100002)(105586002)(93886004)(82746002)(33656002)(110136003)(2906002)(104396002);DIR:OUT;SFP:1102;SCL:1;SRVR:BN6PR11MB1570;H:BN6PR11MB1570.namprd11.prod.outlook.com;FPR:;SPF:None;PTR:InfoNoRecords;MX:1;A:1;LANG:en; spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-ID: <90ED8AEB11AB074BBFC448D8F9E326BA@namprd11.prod.outlook.com> MIME-Version: 1.0 X-OriginatorOrg: primarydata.com X-MS-Exchange-CrossTenant-originalarrivaltime: 31 Oct 2016 16:11:02.2291 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 03193ed6-8726-4bb3-a832-18ab0d28adb7 X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN6PR11MB1570 X-MC-Unique: 8_c2cDi2N_udYn6A3ZIo_w-1 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id u9VGBHoL029680 Content-Length: 2610 Lines: 45 > On Oct 31, 2016, at 11:56, Sebastian Andrzej Siewior wrote: > > On 2016-10-31 15:30:02 [+0000], Trond Myklebust wrote: >>> On Oct 31, 2016, at 09:19, Sebastian Andrzej Siewior wrote: >>> The list_for_each_entry() in nfs4_reclaim_open_state: >>> It seems that this lock protects the ->so_states list among other >>> atomic_t & flags members. So at the begin of the loop we inc ->count >>> ensuring that this field is not removed while we use it. So we drop the >>> ->so_lock loc during the loop it seems. And after nfs4_reclaim_locks() >>> invocation we nfs4_put_open_state() and grab the ->so_lock again. So if >>> we were the last user of this struct and we remove it, then the >>> following list_next_entry() invocation is a use-after-free. Even if we >>> use list_for_each_entry_safe() there is no guarantee that the following >>> member is still valid because it might have been removed by someone >>> invoking nfs4_put_open_state() on it, right? >>> So there is this. >>> >>> However to address my initial problem I have here a patch :) So it uses >>> a rw_semaphore which ensures that there is only one writer at a time or >>> multiple reader. So it should be basically what is happening now plus a >>> tiny tiny tiny lock plus lockdep coverage. I tried to this myself but I >>> don't manage to get into this code path at all so I might be doing >>> something wrong. >>> >>> Could you please check if this patch is working for you and whether my >>> list_for_each_entry() observation is correct or not? >>> >>> v2…v3: replace the seqlock with a RW semaphore. >>> >> >> NACK. That will deadlock. The reason why we use a seqlock there is precisely because we cannot allow ordinary RPC calls to lock out the recovery thread. > Hmmm. So this is getting invoked if I reboot the server? A restart of > nfs-kernel-server is the same thing? > Is the list_for_each_entry() observation I made correct? Yes, and yes. We can’t rely on the list pointers remaining correct, so we restart the list scan and we use the ops->state_flag_bit to signal whether or not state has been recovered for the entry being scanned. > >> If the server reboots, then ordinary RPC calls will fail until the recovery thread has had a chance to re-establish the state. > > This means that the ordinary RPC call won't return and fail but wait > with the lock held until the recovery thread did its thing? It uses the seqcount_t to figure out if a recovery occurred while an OPEN or CLOSE was being processed. If so, it schedules a new recovery of the stateids in question.