Date: Fri, 29 Mar 2024 18:06:09 -0700
Subject: Re: [PATCH 19/26] netfs: New writeback implementation
From: Vadim Fedorenko
To: Naveen Mamindlapalli, David Howells, Christian Brauner, Jeff Layton,
    Gao Xiang, Dominique Martinet
Cc: Matthew Wilcox, Steve French, Marc Dionne, Paulo Alcantara,
    Shyam Prasad N, Tom Talpey, Eric Van Hensbergen, Ilya Dryomov,
    netfs@lists.linux.dev, linux-cachefs@redhat.com,
    linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
    linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
    v9fs@lists.linux.dev, linux-erofs@lists.ozlabs.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Latchesar Ionkov, Christian Schoenebeck
References: <20240328163424.2781320-1-dhowells@redhat.com>
            <20240328163424.2781320-20-dhowells@redhat.com>

On 29/03/2024 10:34, Naveen Mamindlapalli wrote:
>> -----Original Message-----
>> From: David Howells
>> Sent: Thursday, March 28, 2024 10:04 PM
>> To: Christian Brauner; Jeff Layton; Gao Xiang; Dominique Martinet
>> Cc: David Howells; Matthew Wilcox; Steve French; Marc Dionne;
>> Paulo Alcantara; Shyam Prasad N; Tom Talpey; Eric Van Hensbergen;
>> Ilya Dryomov; netfs@lists.linux.dev; linux-cachefs@redhat.com;
>> linux-afs@lists.infradead.org; linux-cifs@vger.kernel.org;
>> linux-nfs@vger.kernel.org; ceph-devel@vger.kernel.org;
>> v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org;
>> linux-fsdevel@vger.kernel.org; linux-mm@kvack.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> Latchesar Ionkov; Christian Schoenebeck
>> Subject: [PATCH 19/26] netfs: New writeback implementation
>>
>> The current netfslib writeback implementation creates writeback requests
>> of contiguous folio data and then separately tiles subrequests over the
>> space twice, once for the server and once for the cache.  This creates a
>> few issues:
>>
>>  (1) Every time there's a discontiguity or a change between writing to
>>      only one destination or writing to both, it must create a new
>>      request.  This makes it harder to do vectored writes.
>>
>>  (2) The folios don't have the writeback mark removed until the end of
>>      the request - and a request could be hundreds of megabytes.
>>
>>  (3) In future, I want to support a larger cache granularity, which will
>>      require aggregation of some folios that contain unmodified data
>>      (which only need to go to the cache) and some which contain
>>      modifications (which need to be uploaded and stored to the cache) -
>>      but, currently, these are treated as discontiguous.
>>
>> There's also a move to get everyone to use writeback_iter() to extract
>> writable folios from the pagecache.
>> That said, currently writeback_iter() has some issues that make it less
>> than ideal:
>>
>>  (1) there's no way to cancel the iteration, even if you find a
>>      "temporary" error that means the current folio and all subsequent
>>      folios are going to fail;
>>
>>  (2) there's no way to filter the folios being written back - something
>>      that will impact Ceph with its ordered snap system;
>>
>>  (3) and if you get a folio you can't immediately deal with (say you need
>>      to flush the preceding writes), you are left with a folio hanging in
>>      the locked state for the duration, when really we should unlock it
>>      and relock it later.
>>
>> In this new implementation, I use writeback_iter() to pump folios,
>> progressively creating two parallel, but separate streams and cleaning up
>> the finished folios as the subrequests complete.  Either or both streams
>> can contain gaps, and the subrequests in each stream can be of variable
>> size, don't need to align with each other and don't need to align with
>> the folios.
>>
>> Indeed, subrequests can cross folio boundaries, may cover several folios
>> or a folio may be spanned by multiple subrequests, e.g.:
>>
>>          +---+---+-----+-----+---+----------+
>> Folios:  |   |   |     |     |   |          |
>>          +---+---+-----+-----+---+----------+
>>
>>            +------+------+     +----+----+
>> Upload:    |      |      |.....|    |    |
>>            +------+------+     +----+----+
>>
>>          +------+------+------+------+------+
>> Cache:   |      |      |      |      |      |
>>          +------+------+------+------+------+
>>
>> The progressive subrequest construction permits the algorithm to be
>> preparing both the next upload to the server and the next write to the
>> cache whilst the previous ones are already in progress.  Throttling can
>> be applied to control the rate of production of subrequests - and, in any
>> case, we probably want to write them to the server in ascending order,
>> particularly if the file will be extended.
>>
>> Content crypto can also be prepared at the same time as the subrequests
>> and run asynchronously, with the prepped requests being stalled until the
>> crypto catches up with them.  This might also be useful for transport
>> crypto, but that happens at a lower layer, so probably would be harder to
>> pull off.
>>
>> The algorithm is split into three parts:
>>
>>  (1) The issuer.  This walks through the data, packaging it up,
>>      encrypting it and creating subrequests.  The part of this that
>>      generates subrequests only deals with file positions and spans and
>>      so is usable for DIO/unbuffered writes as well as buffered writes.
>>
>>  (2) The collector.  This asynchronously collects completed subrequests,
>>      unlocks folios, frees crypto buffers and performs any retries.  This
>>      runs in a work queue so that the issuer can return to the caller for
>>      writeback (so that the VM can have its kswapd thread back) or async
>>      writes.
>>
>>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>>      subrequests to complete and then goes through the failed subrequests
>>      to reissue them.  This may involve reprepping them (with cifs, the
>>      credits must be renegotiated, and a subrequest may need splitting),
>>      and doing RMW for content crypto if there's a conflicting change on
>>      the server.
>>
>> [!] Note that some of the functions are prefixed with "new_" to avoid
>> clashes with existing functions.  These will be renamed in a later patch
>> that cuts over to the new algorithm.
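[Aside for anyone not following the writeback_iter() work: the pump loop
referred to above looks roughly like the sketch below.  This is the pattern
write_cache_pages() uses today, not code from this series, and
example_writeback()/example_write_one_folio() are made-up names.]

/*
 * Rough shape of a writeback_iter() pump loop (illustrative sketch only;
 * the example_* names are hypothetical).
 */
#include <linux/pagemap.h>
#include <linux/writeback.h>

/* Hypothetical per-folio writer: writes the folio back and unlocks it. */
static int example_write_one_folio(struct folio *folio,
				   struct writeback_control *wbc);

static int example_writeback(struct address_space *mapping,
			     struct writeback_control *wbc)
{
	struct folio *folio = NULL;	/* NULL tells the iterator to start */
	int error = 0;

	/*
	 * Each call hands back the next folio to write, locked, and takes
	 * the previous iteration's error back in via &error.  The walk only
	 * ends when NULL is returned - there is no clean way to bail out
	 * early, which is exactly point (1) above.
	 */
	while ((folio = writeback_iter(mapping, wbc, folio, &error)))
		error = example_write_one_folio(folio, wbc);

	return error;
}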
>>
>> Signed-off-by: David Howells
>> cc: Jeff Layton
>> cc: Eric Van Hensbergen
>> cc: Latchesar Ionkov
>> cc: Dominique Martinet
>> cc: Christian Schoenebeck
>> cc: Marc Dionne
>> cc: v9fs@lists.linux.dev
>> cc: linux-afs@lists.infradead.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org

[..snip..]

>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
>
> Missing mutex_unlock() before return.
>

mutex_unlock() happens in new_netfs_end_writethrough()

> Thanks,
> Naveen
>
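To spell the lock lifetime out, the caller-side shape is roughly the sketch
below.  This is illustrative only, not code from this series: the example_*
names are made up and the new_netfs_end_writethrough() argument list is
assumed/simplified here.

/* Illustrative caller shape of the writethrough window (sketch only). */
static ssize_t example_buffered_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct netfs_io_request *wreq;

	/* Takes ictx->wb_lock and holds it across the whole write window. */
	wreq = new_netfs_begin_writethrough(iocb, iov_iter_count(from));
	if (IS_ERR(wreq))
		return PTR_ERR(wreq);	/* the error path above already dropped ictx->wb_lock */

	/* ... copy folios into the pagecache, advancing the writethrough ... */

	/* ictx->wb_lock is held all the way to here and released inside. */
	return new_netfs_end_writethrough(wreq, iocb);
}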