i40e/i40evf: Do not free the dummy packet buffer synchronously

The HW still needs to consume it and freeing it in the function
that created it would mean we will be racing with the HW. The
i40e_clean_tx_ring() routine will free up the buffer attached once
the HW has consumed it.  The clean_fdir_tx_irq function had to be fixed
to handle the freeing correctly.

Cases where we program more than one filter per flow (Ipv4), the
code had to be changed to allocate dummy buffer multiple times
since it will be freed by the clean routine.  This also fixes an issue
where the filter program routine was not checking if there were
descriptors available for programming a filter.

Change-ID: Idf72028fd873221934e319d021ef65a1e51acaf7
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 31709b8..440b671 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3087,16 +3087,33 @@
 		/* clear next_to_watch to prevent false hangs */
 		tx_buf->next_to_watch = NULL;
 
+		tx_desc->buffer_addr = 0;
+		tx_desc->cmd_type_offset_bsz = 0;
+		/* move past filter desc */
+		tx_buf++;
+		tx_desc++;
+		i++;
+		if (unlikely(!i)) {
+			i -= tx_ring->count;
+			tx_buf = tx_ring->tx_bi;
+			tx_desc = I40E_TX_DESC(tx_ring, 0);
+		}
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
 				 dma_unmap_addr(tx_buf, dma),
 				 dma_unmap_len(tx_buf, len),
 				 DMA_TO_DEVICE);
+		if (tx_buf->tx_flags & I40E_TX_FLAGS_FD_SB)
+			kfree(tx_buf->raw_buf);
 
+		tx_buf->raw_buf = NULL;
+		tx_buf->tx_flags = 0;
+		tx_buf->next_to_watch = NULL;
 		dma_unmap_len_set(tx_buf, len, 0);
+		tx_desc->buffer_addr = 0;
+		tx_desc->cmd_type_offset_bsz = 0;
 
-
-		/* move to the next desc and buffer to clean */
+		/* move us past the eop_desc for start of next FD desc */
 		tx_buf++;
 		tx_desc++;
 		i++;