summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorIlya Dryomov <idryomov@gmail.com>2015-12-28 13:18:34 +0300
committerSasha Levin <sasha.levin@oracle.com>2016-02-15 15:45:18 -0500
commit65a7633357c158ed164878d97e8a7fc09fb75946 (patch)
tree6859e3263b6e054949c3ee0ac547a112f993fd38 /include
parent92d4a19886bcce4cca4c9f993a7324ac607d142c (diff)
libceph: fix ceph_msg_revoke()
[ Upstream commit 67645d7619738e51c668ca69f097cb90b5470422 ] There are a number of problems with revoking a "was sending" message: (1) We never make any attempt to revoke data - only kvecs contibute to con->out_skip. However, once the header (envelope) is written to the socket, our peer learns data_len and sets itself to expect at least data_len bytes to follow front or front+middle. If ceph_msg_revoke() is called while the messenger is sending message's data portion, anything we send after that call is counted by the OSD towards the now revoked message's data portion. The effects vary, the most common one is the eventual hang - higher layers get stuck waiting for the reply to the message that was sent out after ceph_msg_revoke() returned and treated by the OSD as a bunch of data bytes. This is what Matt ran into. (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs is wrong. If ceph_msg_revoke() is called before the tag is sent out or while the messenger is sending the header, we will get a connection reset, either due to a bad tag (0 is not a valid tag) or a bad header CRC, which kind of defeats the purpose of revoke. Currently the kernel client refuses to work with header CRCs disabled, but that will likely change in the future, making this even worse. (3) con->out_skip is not reset on connection reset, leading to one or more spurious connection resets if we happen to get a real one between con->out_skip is set in ceph_msg_revoke() and before it's cleared in write_partial_skip(). Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never zero the tag or the header, i.e. send out tag+header regardless of when ceph_msg_revoke() is called. That way the header is always correct, no unnecessary resets are induced and revoke stands ready for disabled CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new "message out temp" and copy the header into it before sending. Cc: stable@vger.kernel.org # 4.0+ Reported-by: Matt Conner <matt.conner@keepertech.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Matt Conner <matt.conner@keepertech.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Diffstat (limited to 'include')
-rw-r--r--include/linux/ceph/messenger.h2
1 files changed, 1 insertions, 1 deletions
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index e15499422fdc..e91c6f15f6e8 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -224,6 +224,7 @@ struct ceph_connection {
struct ceph_entity_addr actual_peer_addr;
/* message out temps */
+ struct ceph_msg_header out_hdr;
struct ceph_msg *out_msg; /* sending message (== tail of
out_sent) */
bool out_msg_done;
@@ -233,7 +234,6 @@ struct ceph_connection {
int out_kvec_left; /* kvec's left in out_kvec */
int out_skip; /* skip this many bytes */
int out_kvec_bytes; /* total bytes left */
- bool out_kvec_is_msg; /* kvec refers to out_msg */
int out_more; /* there is more data after the kvecs */
__le64 out_temp_ack; /* for writing an ack */