tcm_fc: Convert to per-cpu command map pre-allocation of ft_cmd

This patch converts tcm_fc to use transport_init_session_tags()
pre-allocation logic for struct ft_cmd descriptors using per-cpu
session tag pooling in order to effectively avoid memory allocation
+ release for each received I/O.

It adds percpu_ida_alloc() in ft_recv_cmd() to obtain an tag and
locate ft_cmd from se_sess->sess_cmd_map[], and percpu_ida_free()
in ft_free_cmd() to release the tag based upon se_cmd->map_tag id.

It also uses a TCM_FC_DEFAULT_TAGS value of 512, that puts the
per se_sess->sess_cmd_map allocation at ~360K on 64-bit.

v2 changes:

  - Handle possible tag < 0 failure with GFP_ATOMIC

Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
diff --git a/drivers/target/tcm_fc/tcm_fc.h b/drivers/target/tcm_fc/tcm_fc.h
index 0dd54a4..752863a 100644
--- a/drivers/target/tcm_fc/tcm_fc.h
+++ b/drivers/target/tcm_fc/tcm_fc.h
@@ -22,6 +22,7 @@
 #define FT_NAMELEN 32		/* length of ASCII WWPNs including pad */
 #define FT_TPG_NAMELEN 32	/* max length of TPG name */
 #define FT_LUN_NAMELEN 32	/* max length of LUN name */
+#define TCM_FC_DEFAULT_TAGS 512	/* tags used for per-session preallocation */
 
 struct ft_transport_id {
 	__u8	format;