aboutsummaryrefslogtreecommitdiff
path: root/fs/dlm/rcom.c
AgeCommit message (Collapse)Author
2012-08-08dlm: fix unlock balance warningsDavid Teigland
The in_recovery rw_semaphore has always been acquired and released by different threads by design. To work around the "BUG: bad unlock balance detected!" messages, adjust things so the dlm_recoverd thread always does both down_write and up_write. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use idr instead of list for recovered rsbsDavid Teigland
When a large number of resources are being recovered, a linear search of the recover_list takes a long time. Use an idr in place of a list. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use rsbtbl as resource directoryDavid Teigland
Remove the dir hash table (dirtbl), and use the rsb hash table (rsbtbl) as the resource directory. It has always been an unnecessary duplication of information. This improves efficiency by using a single rsbtbl lookup in many cases where both rsbtbl and dirtbl lookups were needed previously. This eliminates the need to handle cases of rsbtbl and dirtbl being out of sync. In many cases there will be memory savings because the dir hash table no longer exists. Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02dlm: fixes for nodir modeDavid Teigland
The "nodir" mode (statically assign master nodes instead of using the resource directory) has always been highly experimental, and never seriously used. This commit fixes a number of problems, making nodir much more usable. - Major change to recovery: recover all locks and restart all in-progress operations after recovery. In some cases it's not possible to know which in-progess locks to recover, so recover all. (Most require recovery in nodir mode anyway since rehashing changes most master nodes.) - Change the way nodir mode is enabled, from a command line mount arg passed through gfs2, into a sysfs file managed by dlm_controld, consistent with the other config settings. - Allow recovering MSTCPY locks on an rsb that has not yet been turned into a master copy. - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages from a previous, aborted recovery cycle. Base this on the local recovery status not being in the state where any nodes should be sending LOCK messages for the current recovery cycle. - Hold rsb lock around dlm_purge_mstcpy_locks() because it may run concurrently with dlm_recover_master_copy(). - Maintain highbast on process-copy lkb's (in addition to the master as is usual), because the lkb can switch back and forth between being a master and being a process copy as the master node changes in recovery. - When recovering MSTCPY locks, flag rsb's that have non-empty convert or waiting queues for granting at the end of recovery. (Rename flag from LOCKS_PURGED to RECOVER_GRANT and similar for the recovery function, because it's not only resources with purged locks that need grant a grant attempt.) - Replace a couple of unnecessary assertion panics with error messages. Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26dlm: limit rcom debug messagesDavid Teigland
Unify the checking for both types of ignored rcom messages, and replace the two log_debug statements with a single, rate limited debug message. Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04dlm: add node slots and generationDavid Teigland
Slot numbers are assigned to nodes when they join the lockspace. The slot number chosen is the minimum unused value starting at 1. Once a node is assigned a slot, that slot number will not change while the node remains a lockspace member. If the node leaves and rejoins it can be assigned a new slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal id's. Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10dlm: record full callback stateDavid Teigland
Change how callbacks are recorded for locks. Previously, information about multiple callbacks was combined into a couple of variables that indicated what the end result should be. In some situations, we could not tell from this combined state what the exact sequence of callbacks were, and would end up either delivering the callbacks in the wrong order, or suppress redundant callbacks incorrectly. This new approach records all the data for each callback, leaving no uncertainty about what needs to be delivered. Signed-off-by: David Teigland <teigland@redhat.com>
2009-11-30dlm: always use GFP_NOFSDavid Teigland
Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-21dlm: fix rcom_names message to selfDavid Teigland
The recent patch to validate data lengths in rcom_names messages failed to account for fake messages a node directs to itself before ever sending it. In this case we need to fill in the message length in the header for the validation code to use. Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-06dlm: proper types for asts and bastsDavid Teigland
Use proper types for ast and bast functions, and use consistent type for ast param. Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04dlm: verify that places expecting rcom_lock have packet long enoughAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04dlm: missing length check in check_config()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04dlm: use proper type for ->ls_recover_bufAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04dlm: do not byteswap rcom_configAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>
2008-02-04dlm: do not byteswap rcom_lockAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>
2008-01-30dlm: clean upsDavid Teigland
A couple small clean-ups. Remove unnecessary wrapper-functions in rcom.c, and remove unnecessary casting and an unnecessary ASSERT in util.c. Signed-off-by: David Teigland <teigland@redhat.com>
2007-10-10[DLM] block dlm_recv in recovery transitionDavid Teigland
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv threads while working in the dlm. This allows dlm_recv activity to be suspended when the lockspace transitions to, from and between recovery cycles. The specific bug prompting this change is one where an in-progress recovery cycle is aborted by a new recovery cycle. While dlm_recv was processing a recovery message, the recovery cycle was aborted and dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count on an rsb after dlm_recoverd had reset it to zero. This is fixed by suspending dlm_recv (taking write lock on the rwsem) before aborting the current recovery. The transitions to/from normal and recovery modes are simplified by using this new ability to block dlm_recv. The switch from normal to recovery mode means dlm_recv goes from processing locking messages, to saving them for later, and vice versa. Races are avoided by blocking dlm_recv when setting the flag that switches between modes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14[DLM] fix NULL ls usageDavid Teigland
Fix regression in recent patch "[DLM] variable allocation" which attempts to dereference an "ls" struct when it's NULL. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] variable allocationPatrick Caulfield
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[DLM] wait for config check during join [6/6]David Teigland
Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05[DLM] rename dlm_config_info fieldsDavid Teigland
Add a "ci_" prefix to the fields in the dlm_config_info struct so that we can use macros to add configfs functions to access them (in a later patch). No functional changes in this patch, just naming changes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05[DLM] change some log_error to log_debugDavid Teigland
Some common, non-error messages should use log_debug instead of log_error so they can be turned off. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05[DLM] add version checkDavid Teigland
Check if we receive a message from another lockspace member running a version of the dlm with an incompatible inter-node message protocol. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05[DLM] fix old rcom messagesDavid Teigland
A reply to a recovery message will often be received after the relevant recovery sequence has aborted and the next recovery sequence has begun. We need to ignore replies to these old messages from the previous recovery. There's already a way to do this for synchronous recovery requests using the rc_id number, but not for async. Each recovery sequence already has a locally unique sequence number associated with it. This patch adds a field to the rcom (recovery message) structure where this recovery sequence number can be placed, rc_seq. When a node sends a reply to a recovery request, it copies the rc_seq number it received into rc_seq_reply. When the first node receives the reply to its recovery message, it will check whether rc_seq_reply matches the current recovery sequence number, ls_recover_seq, and if not then it ignores the old reply. An old, inadequate approach to filtering out old replies (checking if the current stage of recovery has moved back to the start) has been removed from two spots. The protocol version number is changed to reflect the different rcom structures. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30[DLM] fix format warnings in rcom.c and recoverd.cRyusuke Konishi
This fixes the following gcc warnings generated on the architectures where uint64_t != unsigned long long (e.g. ppc64). fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t' fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t' fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t' fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t' fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t' Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30[DLM] don't accept replies to old recovery messagesDavid Teigland
We often abort a recovery after sending a status request to a remote node. We want to ignore any potential status reply we get from the remote node. If we get one of these unwanted replies, we've often moved on to the next recovery message and incremented the message sequence counter, so the reply will be ignored due to the seq number. In some cases, we've not moved on to the next message so the seq number of the reply we want to ignore is still correct, causing the reply to be accepted. The next recovery message will then mistake this old reply as a new one. To fix this, we add the flag RCOM_WAIT to indicate when we can accept a new reply. We clear this flag if we abort recovery while waiting for a reply. Before the flag is set again (to allow new replies) we know that any old replies will be rejected due to their sequence number. We also initialize the recovery-message sequence number to a random value when a lockspace is first created. This makes it clear when messages are being rejected from an old instance of a lockspace that has since been recreated. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30[DLM] fix size of STATUS_REPLY messageDavid Teigland
When the not_ready routine sends a "fake" status reply with blank status flags, it needs to use the correct size for a normal STATUS_REPLY by including the size of the would-be config parameters. We also fill in the non-existant config parameters with an invalid lvblen value so it's easier to notice if these invalid paratmers are ever being used. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30[DLM] status messages ping-pong between unmounted nodesDavid Teigland
Red Hat BZ 213682 If two nodes leave the lockspace (while unmounting the fs in the case of gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY messages will then ping-pong between the nodes when neither of them can find the lockspace in question any longer. We kill this by not sending another STATUS message when we get a STATUS_REPLY for an unknown lockspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-24[DLM] sequence number missing in not_ready replyDavid Teigland
When a status reply is sent for a lockspace that doesn't yet exist, the message sequence number from the sender was not being copied into the reply causing the sender to ignore the reply. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-09[DLM] reject replies to old requestsDavid Teigland
When recoveries are aborted by other recoveries we can get replies to status or names requests that we've given up on. This can cause problems if we're making another request and receive an old reply. Add a sequence number to status/names requests and reject replies that don't match. A field already exists for the seq number that's used in other message types. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-08-09[DLM] show nodeid for recovery messageDavid Teigland
To aid debugging, it's useful to be able to see what nodeid the dlm is waiting on for a message reply. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-23[DLM] Remove range locks from the DLMDavid Teigland
This patch removes support for range locking from the DLM Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-01-18[DLM] The core of the DLM for GFS2/CLVMDavid Teigland
This is the core of the distributed lock manager which is required to use GFS2 as a cluster filesystem. It is also used by CLVM and can be used as a standalone lock manager independantly of either of these two projects. It implements VAX-style locking modes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>