dlm: fixes for nodir mode

The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used.  This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
  all in-progress operations after recovery.  In some
  cases it's not possible to know which in-progess locks
  to recover, so recover all.  (Most require recovery
  in nodir mode anyway since rehashing changes most
  master nodes.)

- Change the way nodir mode is enabled, from a command
  line mount arg passed through gfs2, into a sysfs
  file managed by dlm_controld, consistent with the
  other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
  yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
  from a previous, aborted recovery cycle.  Base this
  on the local recovery status not being in the state
  where any nodes should be sending LOCK messages for the
  current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
  may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
  the master as is usual), because the lkb can switch
  back and forth between being a master and being a
  process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
  non-empty convert or waiting queues for granting
  at the end of recovery.  (Rename flag from LOCKS_PURGED
  to RECOVER_GRANT and similar for the recovery function,
  because it's not only resources with purged locks
  that need grant a grant attempt.)

- Replace a couple of unnecessary assertion panics with
  error messages.

Signed-off-by: David Teigland <teigland@redhat.com>
diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c
index 6565fd5..64d3e2b 100644
--- a/fs/dlm/rcom.c
+++ b/fs/dlm/rcom.c
@@ -492,30 +492,41 @@
 void dlm_receive_rcom(struct dlm_ls *ls, struct dlm_rcom *rc, int nodeid)
 {
 	int lock_size = sizeof(struct dlm_rcom) + sizeof(struct rcom_lock);
-	int stop, reply = 0;
+	int stop, reply = 0, lock = 0;
+	uint32_t status;
 	uint64_t seq;
 
 	switch (rc->rc_type) {
+	case DLM_RCOM_LOCK:
+		lock = 1;
+		break;
+	case DLM_RCOM_LOCK_REPLY:
+		lock = 1;
+		reply = 1;
+		break;
 	case DLM_RCOM_STATUS_REPLY:
 	case DLM_RCOM_NAMES_REPLY:
 	case DLM_RCOM_LOOKUP_REPLY:
-	case DLM_RCOM_LOCK_REPLY:
 		reply = 1;
 	};
 
 	spin_lock(&ls->ls_recover_lock);
+	status = ls->ls_recover_status;
 	stop = test_bit(LSFL_RECOVERY_STOP, &ls->ls_flags);
 	seq = ls->ls_recover_seq;
 	spin_unlock(&ls->ls_recover_lock);
 
 	if ((stop && (rc->rc_type != DLM_RCOM_STATUS)) ||
-	    (reply && (rc->rc_seq_reply != seq))) {
+	    (reply && (rc->rc_seq_reply != seq)) ||
+	    (lock && !(status & DLM_RS_DIR))) {
 		log_limit(ls, "dlm_receive_rcom ignore msg %d "
-			  "from %d %llu %llu seq %llu",
-			   rc->rc_type, nodeid,
+			  "from %d %llu %llu recover seq %llu sts %x gen %u",
+			   rc->rc_type,
+			   nodeid,
 			   (unsigned long long)rc->rc_seq,
 			   (unsigned long long)rc->rc_seq_reply,
-			   (unsigned long long)seq);
+			   (unsigned long long)seq,
+			   status, ls->ls_generation);
 		goto out;
 	}